How Dragon Years Shape Marriages and Births: Evidence from Statistical Analysis

Recently, I came across an interesting article published in the journal Significance, an official magazine of the Royal Statistical Society, the American Statistical Association, and the Statistical Society of Australia. Being a Chinese American, I’m always interested in learning about Chinese culture, in addition to the language. This article explored something I’ve heard a lot from my family but never thought about deeply: Do dragon years really lead to more marriages and births?


What Is This All About?

In Chinese astrology, each lunar year is assigned one of 12 animals. The dragon is considered the most powerful and auspicious. Growing up, I often heard my relatives say it’s best to get married or have children in a dragon year because it brings luck and prosperity.

The article shared the author’s personal story about how his Aunty Li would always nag him about getting married. But in the Year of the Dragon (2024), she suddenly stopped. Why? Because planning a wedding or having a baby in a dragon year takes time, and it was already too late for him to give her a “dragon wedding” or “dragon baby.” This story made me smile because it reminded me of my own family gatherings.


What Did the Research Find?

Researchers looked at birth and marriage data from 1970 to 2023 in six countries: Singapore, China, Malaysia, the UK, Kenya, and Mexico. Here are some highlights that stood out to me:

  • In Singapore, there was a strong positive dragon effect. The fertility rate increased by about 0.17 children per woman in dragon years, which is a noticeable boost.
  • In China, surprisingly, there wasn’t a big dragon effect overall. The researchers suggested this could be because of the one-child policy (1979–2015). Families couldn’t plan for a second dragon baby even if they wanted to.
  • In Malaysia, there was a small positive effect, but it wasn’t as strong as Singapore’s.
  • In countries with tiny Chinese populations (UK, Kenya, Mexico), there was no real dragon effect.
  • Snake years, which follow dragon years and are considered less lucky, showed slightly negative effects on fertility, though these were small and not consistent across countries.
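The paper’s exact model isn’t reproduced here, but the core idea of estimating a dragon-year effect can be sketched as a dummy-variable regression. Everything below is synthetic and for illustration only: the fertility numbers are made up, and the true “bump” of +0.17 is planted so we can check that the regression recovers it.

```python
import numpy as np

# Synthetic yearly fertility rates, 1970-2023, for illustration only.
rng = np.random.default_rng(0)
years = np.arange(1970, 2024)
dragon = ((years - 1976) % 12 == 0).astype(float)  # 1976, 1988, 2000, 2012
trend = years - years.mean()

# Simulate a dragon-year "bump" of +0.17 children per woman on a declining trend.
tfr = 1.8 - 0.01 * trend + 0.17 * dragon + rng.normal(0, 0.02, len(years))

# OLS with an intercept, a linear trend, and a dragon-year dummy.
X = np.column_stack([np.ones_like(trend), trend, dragon])
beta, *_ = np.linalg.lstsq(X, tfr, rcond=None)
print(f"estimated dragon effect: {beta[2]:+.3f} children per woman")
```

The dummy coefficient is the average extra fertility in dragon years after removing the trend, which is essentially what a reported effect like “0.17 children per woman” means.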

What About Marriage?

The study also looked at marriage rates among ethnic Chinese in Singapore. They expected an increase in dragon years, but the results were mixed. There was no clear pattern, and some dragon years actually had fewer marriages. So, while having a dragon baby seems to matter, a dragon wedding might not be as big of a deal in the data (even though aunties still care a lot about it!).


Why Does This Matter?

For me, reading this was a cool reminder of how cultural beliefs can actually show up in real data. It also shows how statistical models can help us separate superstition from reality. In Singapore, the effect was strong enough that even the prime minister encouraged citizens to “add a little dragon” in his Lunar New Year speech.

At the same time, the study reminded me that traditions, culture, and policies (like China’s one-child policy) all interact to shape what people decide to do with their lives.


Final Thoughts

As a student interested in computational linguistics and social data, I find studies like this inspiring. They connect language, culture, demographics, and data analysis in a meaningful way. Plus, it makes me think about how traditions continue to shape decisions, even in modern societies.

I wonder if my parents also hoped I would be a dragon baby. (Spoiler: I’m not, but at least I wasn’t born in the Year of the Snake either!)

If you’re curious about Chinese culture, statistics, or demographic trends, I highly recommend reading the full article here (if your school has access). Let me know if you want a follow-up post explaining how the statistical model in the paper worked.

— Andrew

ACL 2025 New Theme Track: Generalization of NLP Models

The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) will be happening in Vienna, Austria from July 27 to August 1. I won’t be attending in person, but as someone planning to study and do research in computational linguistics and NLP in college, I’ve been following the conference closely to keep up with the latest trends.

One exciting thing about this year’s ACL is its new theme track: Generalization of NLP Models. According to the official announcement:

“Following the success of the ACL 2020–2024 Theme tracks, we are happy to announce that ACL 2025 will have a new theme with the goal of reflecting and stimulating discussion about the current state of development of the field of NLP.

Generalization is crucial for ensuring that models behave robustly, reliably, and fairly when making predictions on data different from their training data. Achieving good generalization is critically important for models used in real-world applications, as they should emulate human-like behavior. Humans are known for their ability to generalize well, and models should aspire to this standard.

The theme track invites empirical and theoretical research and position and survey papers reflecting on the Generalization of NLP Models. The possible topics of discussion include (but are not limited to) the following:

  • How can we enhance the generalization of NLP models across various dimensions—compositional, structural, cross-task, cross-lingual, cross-domain, and robustness?
  • What factors affect the generalization of NLP models?
  • What are the most effective methods for evaluating the generalization capabilities of NLP models?
  • While Large Language Models (LLMs) significantly enhance the generalization of NLP models, what are the key limitations of LLMs in this regard?

The theme track submissions can be either long or short. We anticipate having a special session for this theme at the conference and a Thematic Paper Award in addition to other categories of awards.”

This year’s focus on generalization really highlights where the field is going—toward more robust, ethical, and real-world-ready NLP systems. It’s not just about making cool models anymore, but about making sure they work well across different languages, cultures, and use cases.

If you’re into reading papers like I am, especially ones that dig into how NLP systems can perform reliably on new or unexpected inputs, this theme track will be full of insights. I’m looking forward to checking out the accepted papers when they’re released.

You can read more at the official conference page: ACL 2025 Theme Track Announcement

— Andrew

America’s AI Action Plan: What It Means for the Future of U.S. AI Leadership

On July 23, 2025, the White House released the long-awaited “National AI Action Plan”, a major step following President Donald Trump’s Executive Order 14179 signed back in January. The goal? To remove barriers to American leadership in artificial intelligence—and the plan outlines how the U.S. government wants to make that happen.

The 28-page report touches on a lot of areas, but here are a few highlights that stood out to me as a student passionate about AI:

  • Open Access to AI Research: The plan calls for expanding access to government-funded AI models and datasets, helping more students, researchers, and small businesses innovate.
  • Workforce Development: There’s a strong emphasis on education, especially training the next generation of AI talent. This could open new opportunities for students like us to get involved earlier.
  • Streamlining Regulations: The plan pushes for cutting red tape that slows down AI development, while still upholding national security and ethical standards.
  • Government Use of AI: Agencies are encouraged to adopt AI technologies to boost efficiency and modernize services. This is another signal that AI is becoming a core part of public infrastructure.

It’s fascinating to see how quickly AI policy is evolving at the national level. I’ll be keeping an eye on how this action plan plays out, especially in terms of education and research access for younger students.

Read the full report here: whitehouse.gov/AIActionPlan

— Andrew

Attending SCiL 2025: My First In-Person Computational Linguistics Conference at the University of Oregon

This July, I had the amazing opportunity to attend the 2025 Society for Computation in Linguistics (SCiL) conference, held at the University of Oregon in Eugene from July 18 to 20. This wasn’t just my first academic conference in person. It was also my first time attending a conference where I was (surprisingly) the only high school student in the room.


Road Trip to Eugene and My Badge Moment

My family and I made the drive from Seattle to Eugene, a nearly 300-mile road trip along I-5. I was super excited (and a little nervous) to be attending a professional conference alongside professors, postdocs, and graduate students.

When I checked in, I got my conference badge and immediately noticed something funny. My badge just said “Andrew Li,” with no school or organization listed, while everyone else had theirs printed with their university or research institute. I guess Redmond High School isn’t in their system yet!


The Crowd: Grad Students, Professors, and Me

The SCiL crowd was mostly made up of college professors and graduate students. At first, I felt a little out of place sitting in rooms full of experts discussing topics in areas such as pragmatics and large language models. But once the sessions started, I realized that even as a student just starting out in the field, there was so much I could follow and even more that I wanted to learn.

The conference covered a wide range of topics, all tied together by a focus on computational modeling in linguistics. You can find the full conference schedule here.

I was especially drawn to Dr. Malihe Alikhani’s keynote presentation, “Theory of Mind in Generative Models: From Uncertainty to Shared Meaning”. Her talk explored how generative models can effectively facilitate communicative grounding by incorporating theory of mind alongside uncertainty and human feedback. What stood out to me most was the idea that positive friction can be intentionally built into conversational systems to encourage contemplative thinking, such as reflection on uncertain assumptions by both users and AI systems. I was also fascinated by how generative models embody core mechanisms of pragmatic reasoning, offering linguists and cognitive scientists both methodological challenges and opportunities to question how computational systems reflect and shape our understanding of meaning and interaction.


Networking and New Connections

While I didn’t get the chance to meet Prof. Jonathan Dunn in person as planned (he’s teaching “Computational Construction Grammar” at the LSA 2025 Summer Institute from July 24 through August 7 and won’t arrive until July 23), I still made some great new connections.

One of them was Andrew Liu, a graduate student at the University of Toronto. We chatted about his project, “Similarity, Transformation, and the Newly Found Invariance of Influence Functions,” which he’s presenting during the poster session. He was super friendly and shared valuable advice about studying and doing research in computational linguistics and NLP. Here’s his LinkedIn profile if you’d like to check out his work.

Talking with grad students made me realize how wide the field of computational linguistics really is. Everyone had a different background — some came from linguistics, others from computer science or cognitive science — but they were all united by a shared passion for understanding language through computation.


Final Thoughts

Attending SCiL 2025 was eye-opening. Even though I was probably the youngest person there, I felt inspired, welcomed, and challenged in the best way. It confirmed my passion for computational linguistics/NLP and reminded me how much more I want to learn.

If you’re a high school student curious about computational linguistics/NLP, don’t be intimidated by professional conferences. Dive in, listen closely, ask questions, and you might be surprised by how much you take away.

— Andrew

I-Language vs. E-Language: What Do They Mean in Computational Linguistics?

In the summer of 2025, I started working on a computational linguistics research project using Twitch data under the guidance of Dr. Sidney Wong, a Computational Sociolinguist. As someone who is still pretty new to this field, I was mainly focused on learning how to conduct literature reviews, help narrow down research topics, clean data, build models, and extract insights.

One day, Dr. Wong suggested I look into the concept of I-language vs. E-language from theoretical linguistics. At first, I wasn’t sure why this mattered. I thought, Isn’t language just… language?

But as I read more, I realized that understanding this distinction changes how we think about language data and what we’re actually modeling when we work with NLP.

In this post, I want to share what I’ve learned about I-language and E-language, and why this distinction is important for computational linguistics research.


What Is I-Language?

I-language stands for “internal language.” This idea was proposed by Noam Chomsky, who argued that language is fundamentally a mental system. I-language refers to the internal, cognitive grammar that allows us to generate and understand sentences. It is about:

  • The unconscious rules and structures stored in our minds
  • Our innate capacity for language
  • The mental system that explains why we can produce and interpret sentences we’ve never heard before

For example, if I say, “The cat sat on the mat,” I-language is the system in my brain that knows the sentence is grammatically correct and what it means, even though I may never have said that exact sentence before.

I-language focuses on competence (what we know about our language) rather than performance (how we actually use it in real life).
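One way to make the competence idea concrete is with a toy rule system: a few hand-written grammar rules that accept sentences they have never “seen” before. The grammar below is invented for illustration and is vastly simpler than any real mental grammar, but it shows how a small set of internal rules generalizes to novel sentences.

```python
# A toy "competence" model: hand-written categories and one sentence pattern.
# All word lists and the pattern itself are invented for illustration.
DET = {"the", "a"}
NOUN = {"cat", "dog", "mat", "rug"}
VERB = {"sat", "slept"}
PREP = {"on", "under"}

def is_sentence(text):
    # Accepts the pattern: Det Noun Verb Prep Det Noun
    # (e.g. "the cat sat on the mat")
    w = text.lower().split()
    return (len(w) == 6 and w[0] in DET and w[1] in NOUN and w[2] in VERB
            and w[3] in PREP and w[4] in DET and w[5] in NOUN)

print(is_sentence("a dog slept under the rug"))  # novel but grammatical: True
print(is_sentence("cat the on sat mat the"))     # same words, scrambled: False
```

The rules never stored “a dog slept under the rug”, yet they accept it, which is the I-language intuition: knowledge of structure, not a memorized list of utterances.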


What Is E-Language?

E-language stands for “external language.” This is the language we actually hear and see in the world, such as:

  • Conversations between Twitch streamers and their viewers
  • Tweets, Reddit posts, books, and articles
  • Any linguistic data that exists outside the mind

E-language is about observable language use. It includes everything from polished academic writing to messy chat messages filled with abbreviations, typos, and slang.

Instead of asking, “What knowledge do speakers have about their language?”, E-language focuses on, “What do speakers actually produce in practice?”


Why Does This Matter for Computational Linguistics?

When it comes to computational linguistics and NLP, this distinction affects:

1. What We Model

  • I-language-focused research tries to model the underlying grammatical rules and mental representations. For example, building a parser that captures syntax structures based on linguistic theory.
  • E-language-focused research uses real-world data to build models that predict or generate language based on patterns, regardless of theoretical grammar. For example, training a neural network on millions of Twitch comments to generate chat responses.

2. Research Goals

If your goal is to understand how humans process and represent language cognitively, you’re leaning towards I-language research. This includes computational psycholinguistics, cognitive modeling, and formal grammar induction.

If your goal is to build practical NLP systems for tasks like translation, summarization, or sentiment analysis, you’re focusing on E-language. These projects care about performance and usefulness, even if the model doesn’t match linguistic theory.


3. How Models Are Evaluated

I-language models are evaluated based on how well they align with linguistic theory or native speaker intuitions about grammaticality.

E-language models are evaluated using performance metrics, such as accuracy, BLEU scores, or perplexity, based on how well they handle real-world data.
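To see what an E-language-style metric like perplexity looks like in practice, here is a minimal sketch: a bigram model with add-one smoothing trained on a tiny made-up corpus. Real models train on millions of tokens; the point is only that the score comes from usage statistics, not from a grammar.

```python
import math
from collections import Counter

# Toy E-language-style evaluation: a bigram model scored by perplexity.
# The corpus and test sentences are invented for illustration.
corpus = "the cat sat on the mat the dog sat on the rug".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab = len(unigrams)

def bigram_prob(w1, w2):
    # Add-one (Laplace) smoothing so unseen bigrams get nonzero probability.
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab)

def perplexity(sentence):
    words = sentence.split()
    log_p = sum(math.log(bigram_prob(a, b)) for a, b in zip(words, words[1:]))
    return math.exp(-log_p / (len(words) - 1))

# A sentence matching the training data scores lower (better) perplexity
# than a scrambled version, even though no formal grammar is consulted.
print(perplexity("the cat sat on the mat"))
print(perplexity("mat the on sat cat the"))
```

Notice that the model never asks whether a sentence is grammatical; it only asks how probable its word sequences are given the observed data, which is exactly the E-language stance.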


My Thoughts as a Beginner

When Dr. Wong first told me about this distinction, I thought it was purely theoretical. But now, while working with Twitch data, I see the importance of both views.

For example:

  • If I want to study how syntax structures vary in Twitch chats, I need to think in terms of I-language to analyze grammar.
  • If I want to build an NLP model that generates Twitch-style messages, I need to focus on E-language to capture real-world usage patterns.

Neither approach is better than the other. They just answer different types of questions. I-language is about why language works the way it does, while E-language is about how language is actually used in the world.


Final Thoughts

Understanding I-language vs. E-language helps me remember that language isn’t just data for machine learning models. It’s a human system with deep cognitive and social layers. Computational linguistics becomes much more meaningful when we consider both perspectives: What does the data tell us? and What does it reveal about how humans think and communicate?

If you’re also just starting out in this field, I hope this post helps you see why these theoretical concepts matter for practical NLP and AI work. Let me know if you want a follow-up post about other foundational linguistics ideas for computational research.

— Andrew

What Is Computational Linguistics (and How Is It Different from NLP)?

When I first got interested in this field, I kept seeing the terms computational linguistics and natural language processing (NLP) used almost interchangeably. At first, I thought they were the same thing. By delving deeper through reading papers, taking courses, and conducting research, I realized that although they overlap significantly, they are not entirely identical.

So in this post, I want to explain the difference (and connection) between computational linguistics and NLP from the perspective of a high school student who’s just getting started, but really interested in understanding both the language and the tech behind today’s AI systems.


So, what is computational linguistics?

Computational linguistics is the science of using computers to understand and model human language. It’s rooted in linguistics, the study of how language works, and applies computational methods to test linguistic theories, analyze language structure, or build tools like parsers and grammar analyzers.

It’s a field that sits at the intersection of computer science and linguistics. Think syntax trees, morphology, phonology, semantics, and using code to work with all of those.

For example, in computational linguistics, you might:

  • Use code to analyze sentence structure in different languages
  • Create models that explain how children learn grammar rules
  • Explore how prosody (intonation and stress) changes meaning in speech
  • Study how regional dialects appear in online chat platforms like Twitch

In other words, computational linguistics is often about understanding language (how it’s structured, how it varies, and how we can model it with computers).


Then what is NLP?

Natural language processing (NLP) is a subfield of AI and computer science that focuses on building systems that can process and generate human language. It’s more application-focused. If you’ve used tools like ChatGPT, Google Translate, Siri, or even grammar checkers, you’ve seen NLP in action.

While computational linguistics asks, “How does language work, and how can we model it?”, NLP tends to ask, “How can we build systems that understand or generate language usefully?”

Examples of NLP tasks:

  • Sentiment analysis (e.g., labeling text as positive, negative, or neutral)
  • Machine translation
  • Named entity recognition (e.g., tagging names, places, dates)
  • Text summarization or question answering

In many cases, NLP researchers care more about whether a system works than whether it matches a formal linguistic theory. That doesn’t mean theory doesn’t matter, but the focus is more on performance and results.
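As a concrete taste of the first task above, here is a minimal lexicon-based sentiment labeler. It is a deliberately naive sketch: real NLP systems learn these associations from data, and the word lists here are invented for illustration.

```python
# A minimal lexicon-based sentiment labeler; the word lists are invented
# and real systems learn sentiment from data rather than fixed lists.
POSITIVE = {"great", "love", "awesome", "good", "fun"}
NEGATIVE = {"bad", "hate", "boring", "terrible", "awful"}

def label_sentiment(text):
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(label_sentiment("This stream is awesome, I love it"))  # positive
```

Even this toy version shows the engineering mindset: it is judged by whether the labels are useful, not by whether it models how humans actually process sentiment.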


So, what’s the difference?

The line between the two fields can get blurry (and many people work in both), but here’s how I think of it:

Computational Linguistics                      | NLP
----------------------------------------------|--------------------------------------------------
Rooted in linguistics                         | Rooted in computer science and AI
Focused on explaining and modeling language   | Focused on building tools and systems
Often theoretical or data-driven linguistics  | Often engineering-focused and performance-driven
Examples: parsing syntax, studying morphology | Examples: sentiment analysis, machine translation

Think of computational linguistics as the science of language and NLP as the engineering side of language technology.


Why this matters to me

As someone who’s really interested in computational linguistics, I find myself drawn to the linguistic side of things, like how language varies, how meaning is structured, and how AI models sometimes get things subtly wrong because they don’t “understand” language the way humans do.

At the same time, I still explore NLP, especially when working on applied projects like sentiment analysis or topic modeling. I think having a strong foundation in linguistics makes me a better NLP researcher (or student), because I’m more aware of the complexity and nuance of language.


Final thoughts

If you’re just getting started, you don’t have to pick one or the other. Read papers from both fields. Try projects that help you learn both theory and application. Over time, you’ll probably find yourself leaning more toward one, but having experience in both will only help.

I’m still learning, and I’m excited to keep going deeper into both sides. If you’re interested too, let me know! I’m always up for sharing reading lists, courses, or just thoughts on cool research.

— Andrew


AI-Driven Insights from the Class of 2025 Senior Exit Survey

In late June 2025, I led my nonprofit organization, Student Echo, in a collaboration with Redmond High School to analyze responses from the Class of 2025 Senior Exit Survey. This annual survey, organized by the school’s College & Career Center, collects information on seniors’ post-graduation plans.

While the survey covers multiple areas, our focus was on one key free-response question:
“What additional support do you need before you graduate?”

The College & Career Center team had limited tools to process and interpret open-ended responses at scale. That’s where we came in. Using Student Echo’s AI tools, we analyzed the free-text answers and uncovered themes that could help the school offer more effective and timely support for graduating seniors.
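Student Echo’s actual tools aren’t public, and the survey data is confidential, but the basic idea of surfacing themes from free-text responses can be sketched with simple keyword grouping. The responses and theme keywords below are invented for illustration; a production pipeline might use embeddings or topic modeling instead.

```python
from collections import Counter

# Hypothetical responses standing in for confidential survey data.
responses = [
    "I need help with my final transcript request",
    "Not sure how to submit scholarship forms",
    "Feeling burned out, could use mental health resources",
    "I think I'm good?",
    "Confused about the graduation checklist",
]

# Hand-picked theme keywords, invented for this sketch.
THEMES = {
    "transcripts & forms": {"transcript", "scholarship", "forms", "submit"},
    "wellbeing": {"burned", "mental", "health", "stress"},
    "graduation logistics": {"graduation", "checklist", "deadline"},
}

counts = Counter()
for response in responses:
    words = set(response.lower().replace(",", "").split())
    for theme, keywords in THEMES.items():
        if words & keywords:  # count each theme at most once per response
            counts[theme] += 1

for theme, n in counts.most_common():
    print(f"{theme}: {n} response(s)")
```

Note that a vague response like “I think I’m good?” matches no theme, which is exactly why flagging passive confusion took human judgment on top of the automated grouping.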


Recommendations:

  • Maintain a master checklist of graduation tasks.
  • Schedule quick counselor check-ins with all seniors.
  • Offer transcript and scholarship submission workshops.
  • Watch for students who indicate confusion passively (“I think I’m good?”).
  • Continue mental health messaging and support for burnout or senioritis.

These recommendations aim to make senior-year support more targeted, equitable, and proactive. We were especially excited to hear that the College & Career Center plans to share our findings with the Counseling Department Chair to explore ways to improve their processes based on our analysis.

The full report is available below.

— Andrew

Journals and Conferences for High School Students Interested in Computational Linguistics and NLP

As a high school student interested in studying computational linguistics and natural language processing (NLP) in college, I’ve always looked for ways to stay connected to the latest developments in the field. One of the most effective strategies I’ve found is diving into the world of academic activities: reading papers, following conference proceedings, and even working on papers of my own.

In this post, I’ve put together a list of reputable journals and major conferences in computational linguistics and NLP. These are the publications and venues I wish I had known about when I first started. If you’re just getting into the field, I hope this can serve as a useful starting point.

At the end, I’ve also included a quick update on my recent experiences with two conferences: NAACL 2025 and the upcoming SCiL 2025.

Part I: Journals
Here is a list of prominent journals suitable for publishing research in computational linguistics and natural language processing (NLP), based on their reputation, impact, and relevance to the field:

  1. Computational Linguistics
    • Published by MIT Press for the Association for Computational Linguistics (ACL) since 1988.
    • The primary archival journal for computational linguistics and NLP research, open access since 2009.
    • Focuses on computational and mathematical properties of language and NLP system design.
  2. Transactions of the Association for Computational Linguistics (TACL)
    • Sponsored by the ACL, open access, and archived in the ACL Anthology.
    • Publishes high-quality, peer-reviewed papers in NLP and computational linguistics.
  3. Journal of Machine Learning Research (JMLR)
    • Covers machine learning with some overlap in NLP, including computational linguistics applications.
    • Open access and highly regarded for theoretical and applied machine learning research.
  4. Journal of Artificial Intelligence Research (JAIR)
    • Publishes research in AI, including computational linguistics and NLP topics.
    • Open access with a broad scope in AI-related fields.
  5. Natural Language Engineering
    • Published by Cambridge University Press.
    • Focuses on practical applications of NLP and computational linguistics.
  6. Journal for Language Technology and Computational Linguistics (JLCL)
    • Published by the German Society for Computational Linguistics and Language Technology (GSCL).
    • Covers computational linguistics, language technology, and related topics.
  7. Language Resources and Evaluation
    • Focuses on language resources, evaluation methodologies, and computational linguistics.
    • Published by Springer, often includes papers on corpora and annotation.

Part II: Conferences
The following are the top-tier conferences in computational linguistics and NLP, known for their competitive acceptance rates (often around 25%) and high impact in the field:

  1. Annual Meeting of the Association for Computational Linguistics (ACL)
    • The flagship conference of the ACL, held annually in summer.
    • Covers all aspects of computational linguistics and NLP, highly prestigious.
  2. Empirical Methods in Natural Language Processing (EMNLP)
    • One of the top NLP conferences, focusing on empirical and data-driven NLP research.
    • Held annually.
  3. International Conference on Computational Linguistics (COLING)
    • A major international conference held biennially, covering a broad range of computational linguistics topics.
  4. North American Chapter of the Association for Computational Linguistics (NAACL)
    • The ACL’s North American chapter conference, held annually or biennially.
  5. European Chapter of the Association for Computational Linguistics (EACL)
    • The ACL’s European chapter conference, focusing on NLP research in Europe and beyond.
  6. Conference on Computational Natural Language Learning (CoNLL)
    • Focuses on computational learning approaches to NLP, sponsored by ACL SIGDAT.
    • Known for innovative research in natural language learning.
  7. Lexical and Computational Semantics and Semantic Evaluation (SemEval)
    • A workshop series under ACL, focusing on lexical semantics and evaluation tasks.
    • Highly regarded for shared tasks in NLP.
  8. International Joint Conference on Natural Language Processing (IJCNLP)
    • Held in Asia, often in collaboration with ACL or other organizations.
    • Covers a wide range of NLP topics with a regional focus.
  9. The Society for Computation in Linguistics (SCiL) conference
    • A newer and more specialized event compared to the well-established, top-tier conferences like ACL, EMNLP, COLING, NAACL, and EACL.
    • Began in 2018.
    • Narrower focus on mathematical and computational modeling within linguistics.
    • Frequently held as a sister society meeting alongside the LSA Annual Meeting.
  10. Conference on Neural Information Processing Systems (NeurIPS)
    • A premier venue for machine learning research
    • Publishes NLP-related papers; however, it is not a dedicated computational linguistics or NLP conference.

Part III: My Experience

NAACL 2025 took place in Albuquerque, New Mexico, from April 29 to May 4, 2025. As you might already know from my previous blog post, one of my co-authored papers was accepted to the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, part of NAACL 2025. Due to a scheduling conflict with school, I wasn’t able to attend in person—but I still participated remotely and followed the sessions virtually. It was an incredible opportunity to see the latest research and learn how experts in the field present and defend their work.

SCiL 2025 will be held from July 18 to July 20 at the University of Oregon, co-located with the LSA Summer Institute. I’ve already registered and am especially excited to meet some of the researchers whose work I’ve been reading. In particular, I’m hoping to connect with Prof. Jonathan Dunn, whose book Natural Language Processing for Corpus Linguistics I mentioned in a previous post. I’ll be sure to share a detailed reflection on the conference once I’m back.

If you’re interested in computational linguistics or NLP—even as a high school student—it’s never too early to start engaging with the academic community. Reading real papers, attending conferences, and publishing your own work can be a great way to learn, connect, and grow.

— Andrew

Ex Machina Gears Up for VEX Worlds 2026 in St. Louis

After an incredible season last year where our team, Ex Machina, competed at the VEX Robotics World Championship 2025, I’m excited to share that we’re back for another season! I’ll continue competing this season as a team member of Ex Machina, building on everything we learned from competing together at the global championship.


A New Season, A New Challenge

This year’s game for the VEX V5 Robotics Competition has been announced, and it looks both challenging and fun. Here is the official game reveal video so you can see what teams will be working on this season:

Watch the VEX V5 Robotics Competition 2026 Game Reveal

From the initial reveal, I can already tell that strategy, design innovation, and precise teamwork will be key to succeeding this year.


Balancing Robotics and College Applications

This season is going to be especially busy for me and my teammates. As rising seniors, we’re all deep into the college application process. Between essays, interviews, and preparing for upcoming deadlines, our schedules are definitely packed. But despite the workload, we’ve all decided to continue competing. Robotics has been such an important part of our high school journey, and we’re passionate about pushing ourselves further as a team in our final season together.


VEX Worlds 2026 Heads to St. Louis

There’s another big change this year: for 2026, the VEX Robotics World Championship is moving to St. Louis, Missouri! For the past few years, the event was held in Dallas, Texas, so this will be a new experience for everyone.

The championship will be held in April 2026 at the America’s Center Convention Complex in downtown St. Louis, with specific dates to be announced later. You can read more details about the upcoming event on the REC Foundation’s official page.

Here is a video introducing VEX Worlds 2026 in St. Louis to get you excited for what’s ahead:

VEX Robotics World Championship Heads to St. Louis in 2026


Looking Ahead

It feels both exciting and bittersweet to enter my final year of high school robotics. I know the journey ahead will be intense with balancing robot design, programming, and competition prep alongside college applications, but I’m ready for the challenge.

I’ll keep sharing updates about our season as we start building and competing, so stay tuned to see how Ex Machina continues to grow in 2026.

— Andrew

Summer Programs and Activities in Computational Linguistics: My Personal Experiences and Recommendations

If you’re a high school student interested in computational linguistics, you might be wondering: What are some ways to dive deeper into this field over the summer? As someone who loves language, AI, and everything in between, I’ve spent the past year researching programs and activities, and I wanted to share what I’ve learned (along with some of my personal experiences).


1. Summer Linguistic Institute for Youth Scholars (SLIYS)

What it is:
SLIYS is a two-week summer program run by The Ohio State University’s Department of Linguistics. It focuses on introducing high school students to language analysis and linguistic theory in a fun and rigorous way. Students get to explore syntax, morphology, phonetics, language universals, and even some computational topics.

My experience:
I’m super excited to share that I’ll be participating in SLIYS this summer (July 14 – 25, 2025). I was so happy to be accepted, and I’m looking forward to learning from real linguistics professors and meeting other students who are passionate about language. I’ll definitely share a reflection post after I finish the program, so stay tuned if you want an inside look!

Learn more about SLIYS here.


2. Summer Youth Camp for Computational Linguistics (SYCCL)

What it is:
SYCCL is a summer camp hosted by the Department of Linguistics and the Institute for Advanced Computational Science at Stony Brook University. It introduces high school students to computational linguistics and language technology, covering topics like language data, NLP tools, and coding for language analysis.

My experience:
I had planned to apply for SYCCL this year as well, but unfortunately, its schedule (July 6 – 18, 2025) conflicted with SLIYS, which I had already accepted. Another challenge I faced was that SYCCL’s website wasn’t updated until late April 2025, which is quite late compared to other summer programs. I had actually contacted the university earlier this year and they confirmed it would run again, but I didn’t see the application open until April. My advice is to check their website frequently starting early spring, and plan for potential conflicts with other summer programs.

Learn more about SYCCL here.


3. North American Computational Linguistics Open Competition (NACLO)

What it is:
NACLO is an annual computational linguistics competition for high school students across North America. It challenges students with problems in linguistics and language data analysis, testing their ability to decipher patterns in unfamiliar languages.

My experience:
I’ve tried twice to participate in NACLO at my local test center. Unfortunately, both times the test dates were weekdays that conflicted with my school final exams, so I had to miss them. If you’re planning to participate, I strongly recommend checking the schedule early to make sure it doesn’t overlap with finals or other major commitments. Despite missing it, I still find their practice problems online really fun and useful for thinking like a computational linguist.

Learn more about NACLO here.


4. LSA Summer Institute

What it is:
The Linguistic Society of America (LSA) Summer Institute is an intensive four-week program held every two years at different universities. It offers courses and workshops taught by top linguists and is known as one of the best ways to explore advanced topics in linguistics, including computational linguistics.

My experience:
I was planning to apply for the LSA Summer Institute this year. However, I found out that it is only open to individuals aged 18 and older. I contacted the LSA Institute Registration Office to ask if there could be any exceptions or special considerations for underage participants, but it was disappointing to receive their response: “Unfortunately, the age limit is firm and the organizers will not be considering any exceptions.” So if you’re thinking about applying, my advice is to check the age qualifications early before starting the application process.

Learn more about LSA Summer Institute here.


5. Local University Outreach Events and Courses

Another great way to explore linguistics and computational linguistics is by checking out courses or outreach events at local universities. For example, last summer I took LING 234 (Language and Diversity) at the University of Washington (Seattle). It was an eye-opening experience to study language variation, identity, and society from a college-level perspective. I wrote a reflection about it in my blog post from November 29, 2024. If your local universities offer summer courses for high school students, I highly recommend checking them out.


6. University-Affiliated AI4ALL Summer Programs for High School Students

What they are:
AI4ALL partners with universities to offer summer programs that introduce high school students to AI research, ethics, and applications, often including NLP and language technology projects. While these programs don’t focus solely on computational linguistics, they provide a great entry point into AI and machine learning, which are essential tools for language technology research.

About AI4ALL:
AI4ALL is a U.S.-based nonprofit focused on increasing diversity and inclusion in artificial intelligence (AI) education, research, development, and policy, particularly for historically underrepresented groups such as Black, Hispanic/Latinx, Indigenous, women, non-binary, low-income, and first-generation college students. Their mission is to make sure the next generation of AI researchers and developers reflects the diversity of the world.

Examples:

  • Stanford AI4ALL
  • Princeton AI4ALL
  • Carnegie Mellon AI4ALL

These programs are competitive and have different focus areas, but all aim to broaden participation in AI by empowering future researchers early.


Final Thoughts

I feel grateful to have these opportunities to grow my passion for computational linguistics, and I hope this list helps you plan your own summer learning journey. Whether you’re solving NACLO problems in your free time or spending two weeks at SLIYS like I will this summer, every step brings you closer to understanding how language and AI connect.

Let me know if you want a future post reviewing SLIYS after I complete it in July!

— Andrew
