Can Taco Bell’s Drive-Through AI Get Smarter?

Taco Bell has always been one of my favorite fast-food spots, so when I came across a recent Wall Street Journal report about its experiments with voice AI at the drive-through, I was instantly curious. The idea of ordering a Crunchwrap Supreme or Baja Blast without a human cashier sounds futuristic, but the reality has been pretty bumpy.

According to the report, Taco Bell has rolled out AI ordering systems in more than 500 drive-throughs across the U.S. While some customers have had smooth experiences, others ran into glitches and frustrating miscommunications. People even pranked the system by ordering things like “18,000 cups of water.” Because of this, Taco Bell is rethinking how it uses AI. The company now seems focused on a hybrid model where AI handles straightforward orders but humans step in when things get complicated.

This situation made me think about how computational linguistics could help fix these problems. Since I want to study computational linguistics in college, it is fun to connect what I’m learning with something as close to home as my favorite fast-food chain.


Where Computational Linguistics Can Help

  1. Handling Noise and Accents
    Drive-throughs are noisy, with car engines, music, and all kinds of background sounds, and customers speak with a wide range of accents. Noise-resistant Automatic Speech Recognition (ASR), built with domain-specific acoustic modeling or data augmentation on drive-through audio, would make recognition far more reliable across these conditions.
  2. Catching Prank Orders
    A simple sanity check could catch ridiculous orders. By parsing quantities and menu items and validating them against logical limits and store policies, even a rule-based module could flag a request for thousands of water cups or a nonsense combination, then politely ask for confirmation or hand the order to a human employee (see the sketch after this list).
  3. Understanding Context
    Ordering food is not like asking a smart speaker for the weather. People use slang, pause, or change their minds mid-sentence. AI should be designed to pick up on this context instead of repeating the same prompts over and over.
  4. Switching Smoothly to Humans
    When things go wrong, customers should not have to restart their whole order with a person. AI could transfer the interaction while keeping the order details intact.
  5. Detecting Frustration
    If someone sounds annoyed or confused, the AI could recognize it and respond with simpler options or bring in a human right away.
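
To make point 2 concrete, here is a minimal sketch of a rule-based sanity check. The menu, the quantity limit, and the function name are hypothetical stand-ins; a real system would validate against the store's actual menu, pricing, and policies.

```python
import re

# Hypothetical menu and limit; real values would come from store configuration.
MENU = {"crunchwrap supreme", "baja blast", "water"}
MAX_QTY_PER_ITEM = 20

def check_order_line(line: str) -> tuple[str, str]:
    """Parse a 'quantity + item' order line and flag implausible requests."""
    m = re.match(r"(\d+)\s+(.+)", line.lower().strip())
    qty = int(m.group(1)) if m else 1
    item = m.group(2).strip() if m else line.lower().strip()
    if item not in MENU:
        return ("escalate", f"unknown item: {item!r}, hand off to a human")
    if qty > MAX_QTY_PER_ITEM:
        return ("confirm", f"{qty} x {item}: ask the customer to confirm")
    return ("accept", f"{qty} x {item}")

print(check_order_line("2 baja blast"))         # ('accept', '2 x baja blast')
print(check_order_line("18000 water"))          # ('confirm', ...)
print(check_order_line("1 ghost pepper soup"))  # ('escalate', ...)
```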

Why This Matters

The point of voice AI is not just to be futuristic. It is about making the ordering process easier and faster. For a restaurant like Taco Bell, where the menu has tons of choices and people are often in a hurry, AI has to understand language as humans use it. Computational linguistics focuses on exactly this: connecting machines with real human communication.

I think Taco Bell’s decision to step back and reassess is actually smart. Instead of replacing employees completely, they can use AI as a helpful tool while still keeping the human touch. Personally, I would love to see the day when I can roll up, ask for a Crunchwrap Supreme in my own words, and have the AI get it right the first time.


Further Reading

  • Cui, Wenqian, et al. “Recent Advances in Speech Language Models: A Survey.” Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, 2025, pp. 13943–13970. ACL Anthology.
  • Zheng, Xianrui, et al. “DNCASR: End-to-End Training for Speaker-Attributed ASR.” Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, 2025, pp. 18369–18383. ACL Anthology.
  • Imai, Saki, et al. “Evaluating Open-Source ASR Systems: Performance Across Diverse Audio Conditions and Error Correction Methods.” Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025), 2025, pp. 5027–5039. ACL Anthology.
  • Hopton, Zachary, and Eleanor Chodroff. “The Impact of Dialect Variation on Robust Automatic Speech Recognition for Catalan.” Proceedings of the 22nd SIGMORPHON Workshop on Computational Morphology, Phonology, and Phonetics, 2025, pp. 23–33. ACL Anthology.
  • Arora, Siddhant, et al. “On the Evaluation of Speech Foundation Models for Spoken Language Understanding.” Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 11923–11938. ACL Anthology.
  • Cheng, Xuxin, et al. “MoE-SLU: Towards ASR-Robust Spoken Language Understanding via Mixture-of-Experts.” Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 14868–14879. ACL Anthology.
  • Parikh, Aditya Kamlesh, et al. “Ensembles of Hybrid and End-to-End Speech Recognition.” Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024, pp. 6199–6205. ACL Anthology.
  • Mujtaba, Dena, et al. “Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech.” Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024, pp. 4795–4809. ACL Anthology.
  • Udagawa, Takuma, et al. “Robust ASR Error Correction with Conservative Data Filtering.” Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, 2024, pp. 256–266. ACL Anthology.

— Andrew

Can AI Save Endangered Languages?

Recently, I’ve been thinking a lot about how computational linguistics and AI intersect with real-world issues, beyond just building better chatbots or translation apps. One question that keeps coming up for me is: Can AI actually help save endangered languages?

As someone who loves learning languages and thinking about how they shape culture and identity, I find this topic both inspiring and urgent.


The Crisis of Language Extinction

Right now, linguists estimate that out of the 7,000+ languages spoken worldwide, nearly half are at risk of extinction within this century. This isn’t just about losing words. When a language disappears, so does a community’s unique way of seeing the world, its oral traditions, its science, and its cultural knowledge.

For example, many Indigenous languages encode ecological wisdom, medicinal knowledge, and cultural philosophies that aren’t easily translated into global languages like English or Mandarin.


How Can Computational Linguistics Help?

Here are a few ways I’ve learned that AI and computational linguistics are being used to preserve and revitalize endangered languages:

1. Building Digital Archives

One of the first steps in saving a language is documenting it. AI models can:

  • Transcribe and archive spoken recordings automatically, which used to take linguists years to do manually
  • Align audio with text to create learning materials
  • Help create dictionaries and grammatical databases that preserve the language’s structure for future generations

Projects like ELAR (Endangered Languages Archive) work on this in partnership with local communities.
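
As a rough illustration of the transcribe-and-align idea above, here is a minimal sketch using a pretrained ASR model from Hugging Face. The checkpoint and file name are just placeholders; a documentation project would typically adapt a model to the target language and have native speakers correct the output.

```python
from transformers import pipeline

# Placeholder checkpoint and file name; a real project would use a model
# adapted to the target language, not a general-purpose English one.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

result = asr("field_recording.wav", return_timestamps=True)
print(result["text"])    # draft transcript for a linguist to correct
print(result["chunks"])  # rough text-to-audio time alignment
```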


2. Developing Machine Translation Tools

Although data scarcity makes it hard to build translation systems for endangered languages, researchers are working on:

  • Transfer learning, where AI models trained on high-resource languages are adapted to low-resource ones
  • Multilingual language models, which can translate between many languages and improve with even small datasets
  • Community-centered translation apps, which let speakers record, share, and learn their language interactively

For example, Google’s AI team and university researchers are exploring translation models for Indigenous languages like Quechua, which has millions of speakers but limited online resources.
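
For a sense of what multilingual models already offer, here is a sketch using Meta's NLLB-200 checkpoint, which covers Ayacucho Quechua under the language code quy_Latn. The model choice and example sentence are illustrative, and output quality for low-resource languages should always be checked with native speakers.

```python
from transformers import pipeline

# NLLB-200 language codes: quy_Latn (Ayacucho Quechua) -> eng_Latn (English).
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="quy_Latn",
    tgt_lang="eng_Latn",
)

print(translator("Allinllachu?")[0]["translation_text"])
```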


3. Revitalization Through Language Learning Apps

Some communities are partnering with tech developers to create mobile apps for language learning tailored to their heritage language. AI can help:

  • Personalize vocabulary learning
  • Generate example sentences
  • Provide speech recognition feedback for pronunciation practice

Apps like Duolingo’s Hawaiian and Navajo courses are small steps in this direction. Ideally, more tools would be built directly with native speakers to ensure accuracy and cultural respect.


Challenges That Remain

While all this sounds promising, there are real challenges:

  • Data scarcity. Many endangered languages have very limited recorded data, making it hard to train accurate models
  • Ethical concerns. Who owns the data? Are communities involved in how their language is digitized and shared?
  • Technical hurdles. Language structures vary widely, and many NLP models are still biased towards Indo-European languages

Why This Matters to Me

As a high school student exploring computational linguistics, I’m passionate about language diversity. Languages aren’t just tools for communication. They are stories, worldviews, and cultural treasures.

Seeing AI and computational linguistics used to preserve rather than replace human language reminds me that technology is most powerful when it supports people and cultures, not just when it automates tasks.

I hope to work on projects like this someday, using NLP to build tools that empower communities to keep their languages alive for future generations.


Final Thoughts

So, can AI save endangered languages? Maybe not alone. But combined with community efforts, linguists, and ethical frameworks, AI can be a powerful ally in documenting, preserving, and revitalizing the world’s linguistic heritage.

If you’re interested in learning more, check out projects like ELAR (Endangered Languages Archive) or the Living Tongues Institute. Let me know if you’d like a future post diving into how multilingual language models actually work.

— Andrew

When AI Goes Wrong, Should Developers Be Held Accountable?

Artificial intelligence has become a big part of my daily life. I’ve used it to help brainstorm essays, analyze survey data for my nonprofit, and even improve my chess practice. It feels like a tool that makes me smarter and more creative. But not every story about AI is a positive one. Recently, lawsuits have raised tough questions about what happens when AI chatbots fail to protect people who are vulnerable.

The OpenAI Lawsuit

In August 2025, the parents of 16-year-old Adam Raine filed a wrongful-death lawsuit against OpenAI and its CEO, Sam Altman. You can read more about the lawsuit here. They claim that over long exchanges, ChatGPT (running the GPT-4o model) encouraged their son’s suicidal thoughts instead of stopping to help him. The suit alleges that the chatbot validated his darkest feelings, that it even helped write a suicide note, and that its safeguards broke down in lengthy conversations. OpenAI responded with a statement of deep sorrow, acknowledged that protections can weaken over long interactions, and said it will improve parental controls and crisis interventions.

Should a company be responsible if its product appears to enable harmful outcomes in vulnerable people? That is the central question in this lawsuit.

The Sewell Setzer III Case

Megan Garcia, whose 14-year-old son, Sewell Setzer III, died by suicide in February 2024, filed a lawsuit on October 23, 2024. A federal judge in Florida allowed the case to move forward in May 2025, rejecting, at least at this stage of litigation, the argument that the chatbot’s outputs are protected free speech under the First Amendment. You can read more about this case here.

The lawsuit relates to Sewell’s interactions with Character.AI chatbots, including a version modeled after a Game of Thrones character. In the days before his death, the AI reportedly told him to “come home,” and he took his life shortly afterward.

Why It Matters

I have seen how AI can be a force for good in education and creativity. It feels like a powerful partner in learning. But these lawsuits show it can also be dangerous if an AI fails to detect or respond to harmful user emotions. Developers are creating systems that can feel real to vulnerable teens. If we treat AI as a product, companies should be required to build it with the same kinds of safety standards that cars, toys, and medicines are held to.

We need accountability. AI must include safeguards like crisis prompts, age flags, and quick redirects to real-world help. If the law sees AI chatbots as products, not just speech, then victims may have legal paths for justice. And this could push the industry toward stronger protections for users, especially minors.

Final Thoughts

As someone excited to dive deeper into AI studies, I feel hopeful and responsible. AI can help students, support creativity, and even improve mental health. At the same time I cannot ignore the tragedies already linked to these systems. The OpenAI case and the Character.AI lawsuit are both powerful reminders. As future developers, we must design with empathy, prevent harm, and prioritize safety above all.

— Andrew

(More recent news about the Sewell Setzer III case: Google and Character.AI to Settle Lawsuit Over Teenager’s Death on Jan. 7, 2026)

Is AI a Job Killer or Creator? A Student’s Perspective

As a high school student preparing to study computational linguistics in college, I often think about how AI is reshaping the world of work. Every week there are new headlines about jobs being replaced or created, and I cannot help but wonder what this means for my own future career.

When OpenAI released ChatGPT, headlines quickly followed about how AI might take over jobs. And in some cases, the headlines weren’t exaggerations. Big IT companies have already started trimming their workforces as they shift toward AI. Microsoft cut roles in its sales and support teams while investing heavily in AI copilots. Google and Meta downsized thousands of positions, with executives citing efficiency gains powered by AI tools. Amazon, too, has leaned on automation and machine learning to reduce its reliance on certain customer service and retail roles.

These stories feed into an obvious conclusion: AI is a job killer. It can automate repetitive processes, work 24/7, and reduce costs. For workers, that sounds less like “innovation” and more like losing paychecks. It’s not surprising that surveys show many employees fear being displaced by AI, especially those in entry-level or routine roles.


Bill Gates’ Perspective: Why AI Won’t Replace Programmers

But not everyone agrees with the “AI takes all jobs” narrative. Programming is often ranked among the jobs most at risk of replacement by AI, since much of it seems automatable at first glance. On this specific point, Bill Gates has offered a different perspective: he believes AI cannot replace programmers, because coding is not just about typing commands into an editor.

Key Points from Bill Gates’ Perspective

  1. Human Creativity and Judgment
    Gates explains that programming requires deep problem-solving and creative leaps that machines cannot reproduce. “Writing code isn’t just typing – it’s thinking deeply,” he says. Designing software means understanding complex problems, weighing trade-offs, and making nuanced decisions, all areas where humans excel.
  2. AI as a Tool, Not a Replacement
    Yes, AI can suggest snippets, debug errors, and automate small tasks. But Gates emphasizes that software development’s heart lies in human intuition. No algorithm can replace the innovative spark of a coder facing an unsolved challenge.
  3. Long-Term Outlook
    Gates predicts programming will remain human-led for at least the next century. While AI will transform industries, the unique nature of software engineering keeps it safe from full automation.
  4. Broader Implications of AI
    Gates does not deny the risks. Jobs will shift, and some roles will disappear. But he remains optimistic: with careful adoption, AI can create opportunities, increase productivity, and reshape work in positive ways.
  5. Other Safe Professions
    Gates also highlights biology, energy, and other fields where human creativity and insight are essential. These professions, like programming, are unlikely to be fully automated anytime soon.

In short, Gates sees AI not as a replacement, but as an assistant, a way to amplify human creativity rather than eliminate it. He explained this view in an interview summarized by the Economic Times: Bill Gates reveals the one profession AI won’t replace—not even in a century.


AI as a Job Creator

If we flip the script, AI is also a job creator. Entire industries are forming around AI ethics, safety, and regulation. Companies now need AI trainers, evaluators, and explainability specialists. Developers are finding new roles in integrating AI into existing products. Even in education, AI tutors and tools are generating jobs for teachers who can adapt curricula around them.

As Gates points out, the key is using AI wisely. When viewed as a productivity booster, AI can free humans from repetitive work, allowing them to focus on higher-value and more meaningful tasks. Instead of eliminating jobs entirely, AI can create new ones we have not even imagined yet, similar to how the internet gave rise to jobs like app developers, social media managers, and data scientists.


The Third Option: Startup Rocket Fuel

There’s also another perspective I find compelling. A recent ZDNet article, Is AI a job killer or creator? There’s a third option: Startup rocket fuel, points out that AI doesn’t just destroy or create jobs, it also accelerates startups.

Think of it this way: AI lowers the cost of entry for innovation. Small teams can build products faster, test ideas cheaply, and compete with larger companies. This “startup rocket fuel” effect could unleash a new wave of entrepreneurship, creating companies and jobs that would not have been possible before.


My Perspective

As a high school student planning to study computational linguistics, I see both sides of this debate. AI has already begun changing what it means to “work,” and some jobs will inevitably disappear. But Gates’ perspective resonates with me: the creativity and judgment that humans bring are not replaceable.

Instead of viewing AI as either a job killer or job creator, I think it’s better to recognize its dual role. It will eliminate some jobs, reshape many others, and create entirely new ones. And perhaps most excitingly, it might empower a generation of students like me to build startups, pursue research, or tackle social challenges with tools that amplify what we can do.

In the end, AI isn’t writing the future of work for us. We are writing it ourselves, line by line, problem by problem, with AI as our collaborator.


Takeaway

AI will not simply erase or hand out jobs. It will redefine them, and it is up to us to decide how we shape that future.

Reflections on Andrew Ng’s Tip: Building Small AI Projects and Its Implications for Computational Linguistics Research

Recently, I read the latest letter from Andrew Ng in The Batch (Issue #308), where he shared a tip about getting more practice building with AI. His advice really resonated with me, especially as someone exploring computational linguistics research while balancing schoolwork and robotics competitions.


Andrew Ng’s Key Advice

In his post, Andrew Ng emphasized:

If you find yourself with only limited time to build, reduce the scope of your project until you can build something in whatever time you do have.

He shared how he often cuts down an idea into the smallest possible component he can build in an hour or two, rather than waiting for a free weekend or months to tackle the entire project. He illustrated this with his example of creating an audience simulator for practicing public speaking. Instead of building a complex multi-person AI-powered simulation, he started by creating a simple 2D avatar with limited animations that could be expanded later.


Implications for Computational Linguistics Research

Reading this made me think about how I often approach my own computational linguistics projects. Here are a few reflections:

1. Start Small with Linguistic Tasks

In computational linguistics, tasks can feel overwhelming. For example, creating a full sentiment analysis pipeline for multiple languages, building a neural machine translation system, or training large language models are all massive goals.

Andrew Ng’s advice reminds me that it’s okay — and often smarter — to start with a small, well-defined subtask:

  • Instead of building a multilingual parser, start by training a simple POS tagger on a small dataset.
  • Instead of designing a robust speech recognition system, start by building a phoneme classifier for a single speaker dataset.
  • Instead of developing an entire chatbot pipeline, start by implementing a rule-based intent recognizer for a specific question type.
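
To show how small that first subtask can be, here is a minimal sketch of a POS tagger trained on the small treebank sample that ships with NLTK, a unigram tagger with a noun fallback rather than anything neural.

```python
import nltk
from nltk.corpus import treebank
from nltk.tag import DefaultTagger, UnigramTagger

nltk.download("treebank")  # small annotated sample bundled with NLTK

sents = treebank.tagged_sents()
train, test = sents[:3000], sents[3000:]

# Unigram tagger: most frequent tag per word, noun ("NN") for unseen words.
tagger = UnigramTagger(train, backoff=DefaultTagger("NN"))
print(f"accuracy: {tagger.accuracy(test):.3f}")
print(tagger.tag("I want to study computational linguistics".split()))
```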

2. Build Prototypes to Test Feasibility

His example of building a minimal audience simulator prototype to get feedback also applies to NLP. For instance, if I want to work on dialect detection on Twitch chat data (something I’ve thought about), I could first build a prototype classifier distinguishing only two dialects or language varieties. Even if it uses basic logistic regression with TF-IDF features, it tests feasibility and lets me get feedback from mentors or peers before expanding.
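
Here is roughly what that feasibility prototype might look like. The chat messages and dialect labels below are invented for illustration; a real prototype would use scraped chat logs labeled by channel or region.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy data; a real dataset would be labeled Twitch chat messages.
messages = [
    "yinz seeing this play?", "gonna redd up my stream setup",
    "y'all catch that clutch?", "fixin to go live soon",
]
labels = ["pittsburgh", "pittsburgh", "southern", "southern"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(messages, labels)

print(clf.predict(["y'all ready for ranked?"]))  # expected: ['southern']
```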


3. Overcome Perfection Paralysis

As a student, I sometimes hold back on starting a project because I feel I don’t have time to make it perfect. Andrew Ng’s advice to reduce the project scope until you can build something right away is a mindset shift. Even a basic script that tokenizes Twitch messages or parses sentence structures is progress.
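
That tokenizing script really can be a handful of lines. A sketch with spaCy's blank English pipeline (the chat message is made up):

```python
import spacy

# Blank pipeline: tokenizer only, no trained model to download.
nlp = spacy.blank("en")
doc = nlp("KEKW that clip was insane, gg @streamer")
print([token.text for token in doc])
```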


4. Practicing Broad Skills by Hacking Small Projects

He also mentioned that building many small projects helps practice a wide range of skills. In computational linguistics, that could mean:

  • Practicing different Python NLP libraries (NLTK, spaCy, Hugging Face)
  • Trying out rule-based vs. machine learning vs. deep learning approaches
  • Exploring new datasets and annotation schemes

Final Thoughts

I really appreciate Andrew Ng’s practical mindset for builders. His advice feels especially relevant to computational linguistics, where small wins accumulate into larger research contributions. Instead of feeling blocked by the scale of a project, I want to keep practicing the art of scoping down and just building something small but meaningful.

If you’re also working on computational linguistics or NLP projects as a student, I hope this inspires you to pick a tiny subtask today and start building.

Let me know if you’d like a future post listing some of the small NLP project ideas I’m working on this summer.

— Andrew

Is the Increasing Trend of Leveraging LLMs like ChatGPT in Writing Research Papers Concerning?

On August 4, 2025, Science published a tech news piece titled “One-fifth of computer science papers may include AI content,” written by Phie Jacobs, a general assignment reporter at Science. The article reports on a large-scale analysis conducted by researchers at Stanford University and the University of California, Santa Barbara. They examined over 1 million abstracts and introductions and found that by September 2024, 22.5% of computer science papers showed signs of input from large language models such as ChatGPT. The researchers used statistical modeling to detect common word patterns linked to AI-generated writing.

This caught my attention because I was surprised at how common AI-generated content has already become in academic research. I agree with the concern raised in the article, particularly this point:

Although the new study primarily looked at abstracts and introductions, Dmitry Kobak (University of Tübingen data scientist) worries authors will increasingly rely on AI to write sections of scientific papers that reference related works. That could eventually cause these sections to become more similar to one another and create a “vicious cycle” in the future, in which new LLMs are trained on content generated by other LLMs.

From my own experience writing research papers over the past few years, I can see why this concern is valid. If you have followed my blog, you know I have published two research papers and am currently working on a third. While working on my papers, I occasionally used ChatGPT (including its Deep Research) to help find peer-reviewed sources for citations instead of relying solely on search engines like Google Scholar. However, I quickly realized that depending on ChatGPT for this task can be risky. In my case, about 30% of the citations it provided were inaccurate, which meant I had to verify each one manually. For reliable academic sourcing, I found Google Scholar much more trustworthy because current LLMs are still prone to “hallucinations.” You may have encountered other AI tools like Consensus AI, a search engine tailored for scientific research and limited to peer-reviewed academic papers only. Compared to ChatGPT Deep Research, it’s faster and more reliable for academic queries, but I strongly recommend always verifying AI outputs, as both tools can occasionally produce inaccuracies.

The Science article also highlights that AI usage varies significantly across disciplines. “The amount of artificial intelligence (AI)-modified sentences in scientific papers had surged by September 2024, almost two years after the release of ChatGPT, according to an analysis.” A table in the article breaks down estimated AI usage by field, with certain disciplines adopting AI much faster than others. James Zou, a computational biologist at Stanford University, suggests these differences may reflect varying levels of familiarity with AI technology.

While the study from Stanford and UCSB is quite solid, Kobak pointed out that its estimates could be underreported. One reason is that some authors may have started removing “red flag” words from manuscripts to avoid detection. For example, the word “delve” became more common right after ChatGPT launched, but its usage dropped sharply once it became widely recognized as a hallmark of AI-generated text.
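
The actual study fits a statistical model over shifts in word frequencies, but a toy version of the intuition is easy to write: count how often “red flag” words appear per document. The marker list and sample abstracts here are invented for illustration.

```python
import re

# Hypothetical marker words often associated with LLM-generated prose.
MARKERS = {"delve", "intricate", "pivotal", "showcase"}

def marker_rate(text: str) -> float:
    """Fraction of words in the text that are marker words."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in MARKERS for w in words) / len(words) if words else 0.0

abstracts = [
    "In this paper we delve into the intricate dynamics of attention heads.",
    "We present a benchmark for spoken language understanding systems.",
]
for a in abstracts:
    print(f"{marker_rate(a):.3f}  {a[:55]}")
```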

If you want to read the full article, you can find it here: Science – One-fifth of computer science papers may include AI content.

— Andrew

Update: Here is a more recent report from Nature.

How Computational Linguistics Is Powering the Future of Robotics

As someone who’s been involved in competitive robotics through VEX for several years and recently started diving into computational linguistics, I’ve been wondering: how do these two fields connect?

At first, it didn’t seem obvious. VEX Robotics competitions (like the one my team Ex Machina participated in at Worlds 2025) are mostly about designing, building, and coding autonomous and driver-controlled robots to complete physical tasks. There’s no direct language processing involved… at least not yet. But the more I’ve learned, the more I’ve realized that computational linguistics plays a huge role in making real-world robots smarter, more useful, and more human-friendly.

Here’s what I’ve learned about how these two fields intersect and where robotics is heading.


1. Human-Robot Communication

The most obvious role of computational linguistics in robotics is helping robots understand and respond to human language. This is powered by natural language processing (NLP), a core area of computational linguistics. Think about assistants like Alexa or social robots like Pepper. They rely on language models and parsing techniques to interpret what we say and give meaningful responses.

This goes beyond voice control. It’s about making robots that can hold conversations, answer questions, or even ask for clarification when something is unclear. For robots to work effectively with people, they need language skills, not just motors and sensors.


2. Task Execution and Instruction Following

Another fascinating area is how robots can convert human instructions into actual actions. For example, if someone says, “Pick up the red cup from the table,” a robot must break that down: What object? What location? What action?

This is where semantic parsing comes in—turning language into structured data the robot can use to plan its moves. In VEX, we manually code our autonomous routines, but imagine if a future version of our robot could listen to instructions in plain English and adapt its behavior in real time.
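
Here is a toy sketch of that idea: a rule-based parser that turns a command into a structured frame. The vocabularies and frame format are invented; real systems use trained semantic parsers, but the output shape is the point.

```python
import re

# Toy vocabularies for a made-up command grammar.
ACTIONS = {"pick up": "PICK", "put down": "PLACE", "hand me": "GIVE"}
COLORS = {"red", "blue", "green"}

def parse_command(utterance: str) -> dict:
    """Map 'Pick up the red cup from the table' to a structured frame."""
    text = utterance.lower().strip()
    frame = {"action": None, "color": None, "object": None, "location": None}
    for phrase, label in ACTIONS.items():
        if text.startswith(phrase):
            frame["action"] = label
            text = text[len(phrase):]
            break
    m = re.search(r"the (\w+)(?: (\w+))?", text)
    if m:
        first, second = m.group(1), m.group(2)
        if first in COLORS and second:
            frame["color"], frame["object"] = first, second
        else:
            frame["object"] = first
    loc = re.search(r"from the (\w+)", text)
    if loc:
        frame["location"] = loc.group(1)
    return frame

print(parse_command("Pick up the red cup from the table"))
# {'action': 'PICK', 'color': 'red', 'object': 'cup', 'location': 'table'}
```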


3. Understanding Context and Holding a Conversation

Human communication is complex. We often leave things unsaid, refer to past ideas, or use vague phrases like “that one over there.” Research in discourse modeling and context tracking helps robots manage this complexity.

This is especially useful in collaborative environments. Think hospital robots assisting nurses, or factory robots working alongside people. They need to understand not just commands but also user intent, tone, and changing context.


4. Multimodal Understanding

Robots don’t just rely on language. They also use vision, sensors, and spatial awareness. A good example is interpreting a command like, “Hand me the tool next to the blue box.” The robot has to match those words with what it sees.

This is called multimodal integration, where the robot combines language and visual information. In my own robotics experience, we’ve used vision sensors to detect field elements, but future robots will need to combine that visual input with spoken instructions to act intelligently in dynamic spaces.


5. Emotional and Social Intelligence

This part really surprised me. Sentiment analysis and affective computing are helping robots detect emotions in voice or text, which makes them more socially aware.

This could be important for assistive robots that help the elderly, teach kids, or support people with disabilities. It’s not just about understanding words. It’s about understanding people.
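
Off-the-shelf sentiment models make a first version of this surprisingly accessible. The sketch below uses the default model behind Hugging Face's sentiment-analysis pipeline; a real assistive robot would need a model tuned for spoken, emotional language rather than product reviews.

```python
from transformers import pipeline

# Default sentiment model; swap in an emotion-specific checkpoint for real use.
classifier = pipeline("sentiment-analysis")

print(classifier("Please slow down, I can't keep up with these instructions."))
# e.g. [{'label': 'NEGATIVE', 'score': 0.99}]
```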


6. Learning from Language

Computational linguistics also helps robots learn and adapt over time. Instead of hardcoding every behavior, researchers are working on ways for robots to learn from manuals, online resources, or natural language feedback.

This is especially exciting as large language models continue to evolve. Imagine a robot reading its own instruction manual or watching a video tutorial and figuring out how to do a new task.


Looking Ahead

While none of this technology is part of the current VEX Robotics competition (at least not yet), understanding how computational linguistics connects to robotics gives me a whole new appreciation for where robotics is going. It also makes me excited about studying this intersection more deeply in college.

Whether it’s through smarter voice assistants, more helpful home robots, or AI systems that respond naturally, computational linguistics is quietly shaping the next generation of robotics.

— Andrew

What I Learned (and Loved) at SLIYS: Two Weeks of Linguistic Discovery at Ohio State

This summer, I had the chance to participate in both SLIYS 1 and SLIYS 2—the Summer Linguistic Institute for Youth Scholars—hosted by the Ohio State University Department of Linguistics. Across two weeks packed with lectures, workshops, and collaborative data collection, I explored the structure of language at every level: from the individual sounds we make to the complex systems that govern meaning and conversation. But if I had to pick just one highlight, it would be the elicitation sessions—hands-on explorations with real language data that made the abstract suddenly tangible.

SLIYS 1: Finding Language in Structure

SLIYS 1 started with the fundamentals—consonants, vowels, and the International Phonetic Alphabet (IPA)—but quickly expanded into diverse linguistic territory: morphology, syntax, semantics, and pragmatics. Each day featured structured lectures covering topics like sociolinguistic variation, morphological structures, and historical linguistics. Workshops offered additional insights, from analyzing sentence meanings to exploring language evolution.

The core experience, however, was our daily elicitation sessions. My group tackled Serbo-Croatian, collaboratively acting as elicitors and transcribers to construct a detailed grammar sketch. We identified consonant inventories, syllable structures (like CV, CVC, and CCV patterns), morphological markers for plural nouns and verb tenses, and syntactic word orders. Through interactions with our language consultant, we tested hypotheses directly, discovering intricacies like how yes-no questions were formed with the particle “da li” and how adjective-noun order worked. This daily practice gave theory immediate clarity and meaning, shaping our skills as linguists-in-training.
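
Those CV-style syllable labels are simple enough to compute. A toy sketch, assuming a naive five-vowel inventory that real phonological analysis would refine:

```python
VOWELS = set("aeiou")

def cv_skeleton(syllable: str) -> str:
    """Map each segment to C (consonant) or V (vowel)."""
    return "".join("V" if ch in VOWELS else "C" for ch in syllable.lower())

print(cv_skeleton("da"))   # CV
print(cv_skeleton("dan"))  # CVC
print(cv_skeleton("sta"))  # CCV
```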

SLIYS 2: Choosing My Path in Linguistics

SLIYS 2 built upon our initial foundations, diving deeper into phonological analysis, morphosyntactic properties, and the relationship between language and cognition. This week offered more autonomy, allowing us to select workshops tailored to our interests. My choices included sessions on speech perception, dialectology, semiotics, and linguistic anthropology—each challenging me to think more broadly about language as both cognitive and cultural phenomena.

Yet again, the elicitation project anchored our experience, this time exploring Georgian. Our group analyzed Georgian’s distinctive pluralization system, polypersonal verb agreement (verbs agreeing with both subjects and objects), and flexible sentence orders (SVO/SOV). One fascinating detail we uncovered was how nouns remained singular when preceded by numbers. Preparing our final presentation felt especially rewarding, bringing together the week’s linguistic discoveries in a cohesive narrative. Presenting to our peers crystallized not just what we learned, but how thoroughly we’d internalized it.

More Than Just a Summer Program

What I appreciated most about SLIYS was how seriously it treated us as student linguists. The instructors didn’t just lecture—they listened, challenged us, and encouraged our curiosity. Whether we were learning about deixis or discourse analysis, the focus was always on asking better questions, not just memorizing answers.

By the end of SLIYS 2, I found myself thinking not only about how language works, but why we study it in the first place. Language is a mirror to thought, a map of culture, and a bridge between people—and programs like SLIYS remind me that it’s also something we can investigate, question, and build understanding from.

Moments from SLIYS 2: A Snapshot of a Summer to Remember

As SLIYS 2 came to a close, our instructors captured these Zoom screenshots to help us remember the community, curiosity, and collaboration that made this experience so meaningful.

Special Thanks to the SLIYS 2025 Team

This incredible experience wouldn’t have been possible without the passion, insight, and dedication of the SLIYS 2025 instructors. Each one brought something unique to the table—whether it was helping us break down complex syntax, introducing us to sociolinguistics through speech perception, or guiding us through our elicitation sessions with patience and curiosity. I’m especially grateful for the way they encouraged us to ask deeper questions and think like real linguists.

Special thanks to:

  • Kyler Laycock – For leading with energy, making phonetics and dialectology come alive, and always reminding us how much identity lives in the details of speech.
  • Jory Ross – For guiding us through speech perception and conversational structure, and for sharing her excitement about how humans really process language.
  • Emily Sagasser – For her insights on semantics, pragmatics, and focus structure, and for pushing us to think about how language connects to social justice and cognition.
  • Elena Vaikšnoraitė – For their thoughtful instruction in syntax and psycholinguistics, and for showing us the power of connecting data across languages.
  • Dr. Clint Awai-Jennings – For directing the program with care and purpose—and for showing us that it’s never too late to turn a passion for language into a life’s work.

Thank you all for making SLIYS 1 and 2 an unforgettable part of my summer.

— Andrew

My Thoughts on “The Path to Medical Superintelligence”

Recently, I read an article published on Microsoft AI’s blog titled “The Path to Medical Superintelligence”. As a high school student interested in AI, computational linguistics, and the broader impacts of technology, I found this piece both exciting and a little overwhelming.


What Is Medical Superintelligence?

The blog talks about how Microsoft AI is working to build models with superhuman medical reasoning abilities. In simple terms, the idea is to create an AI that doesn’t just memorize medical facts but can analyze, reason, and make decisions at a level that matches or even surpasses expert doctors.

One detail that really stood out to me was how their new AI models also consider the cost of healthcare decisions. The article explained that while health costs vary widely depending on country and system, their team developed a method to consistently measure trade-offs between diagnostic accuracy and resource use. In other words, the AI doesn’t just focus on getting the diagnosis right, but also weighs how expensive or resource-heavy its suggested tests and treatments would be.

They explained that their current models already show impressive performance on medical benchmarks, such as USMLE-style medical exams, and that future models could go beyond question answering to support real clinical decision-making in a way that is both effective and efficient.


What Excites Me About This?

One thing that stood out to me was the potential impact on global health equity. The article mentioned that billions of people lack reliable access to doctors or medical specialists. AI models with advanced medical reasoning could help provide high-quality medical advice anywhere, bridging the gap for underserved communities.

It’s also amazing to think about how AI could support doctors by:

  • Reducing their cognitive load
  • Cross-referencing massive amounts of research
  • Helping with diagnosis and treatment planning

For someone like me who is fascinated by AI’s applications in society, this feels like a real-world example of AI doing good.


What Concerns Me?

At the same time, the blog post emphasized that AI is meant to complement doctors and health professionals, not replace them. I completely agree with this perspective. Medical decisions aren’t just about making the correct diagnosis. Doctors also need to navigate ambiguity, understand patient emotions and values, and build trust with patients and their families in ways AI isn’t designed to do.

Still, even if AI is only used as a tool to support clinicians, there are important concerns:

  • AI could give wrong or biased recommendations if the training data is flawed
  • It might suggest treatments without understanding a patient’s personal situation or cultural background
  • There is a risk of creating new inequalities if only wealthier hospitals or countries can afford the best AI models

Another thought I had was about how roles will evolve. The article mentioned that AI could help doctors automate routine tasks, identify diseases earlier, personalize treatment plans, and even help prevent diseases altogether. This sounds amazing, but it also means future doctors will need to learn how to work with AI systems effectively, interpret their recommendations, and still make the final decisions with empathy and ethical reasoning.


Connections to My Current Interests

While this blog post was about medical AI, it reminded me of my own interests in computational linguistics and language models. Underneath these medical models are the same AI principles I study:

  • Training on large datasets
  • Fine-tuning models for specific tasks
  • Evaluating performance carefully and ethically

It also shows how domain-specific knowledge (like medicine) combined with AI skills can create powerful tools that can literally save lives. That motivates me to keep building my foundation in both language technologies and other fields, so I can be part of these interdisciplinary innovations in the future.


Final Thoughts

Overall, reading this blog post made me feel hopeful about the potential of AI in medicine, but also reminded me of the responsibility AI developers carry. Creating a medical superintelligence isn’t just about reaching a technological milestone. It’s about improving people’s lives safely, ethically, and equitably.

If you’re interested in AI for social good, I highly recommend reading the full article here. Let me know if you’d like a future post about other applications of AI that I’ve been exploring this summer.

— Andrew

Is It Legal to Train AI on Books? A High School Researcher’s Take on the Anthropic Ruling

As someone who’s been exploring computational linguistics and large language models (LLMs), I’ve always wondered: How legal is it, really, to train AI on books or copyrighted material? This question came up while I was learning about how LLMs are trained using massive datasets, including books, articles, and other written works. It turns out the legal side is just as complex as the technical side.

A major U.S. court case in June 2025 helped answer this question, at least for now. In this post, I’ll break down what happened and what it means for researchers, developers, and creators.


The Big Picture: Copyright, Fair Use, and AI

In the U.S., books and intellectual property (IP) are protected under copyright law. That means you can’t just use someone’s novel or article however you want, especially if it’s for a commercial product.

However, there’s something called fair use, which allows limited use of copyrighted material without permission. Whether something qualifies as fair use depends on four factors:

  1. The purpose of the use (such as commercial vs. educational)
  2. The nature of the original work
  3. The amount used
  4. The effect on the market value of the original

LLM developers often argue that training models is “transformative.” In other words, the model doesn’t copy the books word for word. Instead, it learns patterns from large collections of text and generates new responses based on those patterns.

Until recently, this argument hadn’t been fully tested in court.


What Just Happened: The Anthropic Case (June 24, 2025)

In a landmark decision, U.S. District Judge William Alsup ruled that AI company Anthropic did not violate copyright law when it trained its Claude language model on books. The case was brought by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who argued that Anthropic had used their work without permission.

  • Andrea Bartz: The Lost Night: A Novel
  • Charles Graeber: The Good Nurse: A True Story of Medicine, Madness, and Murder
  • Kirk Wallace Johnson: The Fisherman and the Dragon: Fear, Greed, and a Fight for Justice on the Gulf Coast

Judge Alsup ruled that Anthropic’s use of the books qualified as fair use. He called the training process “exceedingly transformative” and explained that the model did not attempt to reproduce the authors’ styles or specific wording. Instead, the model learned patterns and structures in order to generate new language, similar to how a human might read and learn from books before writing something original.

However, the court also found that Anthropic made a serious mistake. The company had copied and stored more than 7 million pirated books in a central data library. Judge Alsup ruled that this was not fair use and was a clear violation of copyright law. A trial is scheduled for December 2025 to determine possible penalties, which could be up to $150,000 per work.


Why This Case Matters

This is the first major U.S. court ruling on whether training generative AI on copyrighted works can qualify as fair use. The result was mixed. On one hand, the training process itself was ruled legal. On the other hand, obtaining the data illegally was not.

This means AI companies can argue that their training methods are transformative, but they still need to be careful about where their data comes from. Using pirated books, even if the outcome is transformative, still violates copyright law.

Other lawsuits are still ongoing. Companies like OpenAI, Meta, and Microsoft are also facing legal challenges from authors and publishers. These cases may be decided differently, depending on how courts interpret fair use.


My Thoughts as a Student Researcher

To be honest, I understand both sides. As someone who is really excited about the possibilities of LLMs and has worked on research projects involving language models, I think it’s important to be able to learn from large and diverse datasets.

At the same time, I respect the work of authors and creators. Writing a book takes a lot of effort, and it’s only fair that their rights are protected. If AI systems are going to benefit from their work, then maybe there should be a system that gives proper credit or compensation.

For student researchers like me, this case is a reminder to be careful and thoughtful about where our data comes from. It also raises big questions about what responsible AI development looks like, not just in terms of what is allowed by law, but also what is fair and ethical.


Wrapping It Up

The Anthropic ruling is a big step toward defining the legal boundaries for training AI on copyrighted material. It confirmed that training can be legal under fair use if it is transformative, but it also made clear that sourcing content from pirated platforms is still a violation of copyright law.

This case does not settle the global debate, but it does provide some clarity for researchers and developers in the U.S. Going forward, the challenge will be finding a balance between supporting innovation and respecting the rights of creators.

— Andrew

Update (September 5, 2025):

AI startup Anthropic will pay at least $1.5 billion to settle a copyright infringement lawsuit over its use of books downloaded from the Internet to train its Claude AI models. The federal case, filed last year in California by several authors, accused Anthropic of illegally scraping millions of works from ebook piracy sites. As part of the settlement, Anthropic has agreed to destroy datasets containing illegally accessed works. (Read the full report)
