Journals and Conferences for High School Students Interested in Computational Linguistics and NLP

As a high school student interested in studying computational linguistics and natural language processing (NLP) in college, I’ve always looked for ways to stay connected to the latest developments in the field. One of the most effective strategies I’ve found is diving into the world of academic activities: reading papers, following conference proceedings, and even working on papers of my own.

In this post, I’ve put together a list of reputable journals and major conferences in computational linguistics and NLP. These are the publications and venues I wish I had known about when I first started. If you’re just getting into the field, I hope this can serve as a useful starting point.

At the end, I’ve also included a quick update on my recent experiences with two conferences: NAACL 2025 and the upcoming SCiL 2025.

Part I: Journals
Here is a list of prominent journals suitable for publishing research in computational linguistics and natural language processing (NLP), based on their reputation, impact, and relevance to the field:

  1. Computational Linguistics
    • Published by MIT Press for the Association for Computational Linguistics (ACL) since 1988.
    • The primary archival journal for computational linguistics and NLP research, open access since 2009.
    • Focuses on computational and mathematical properties of language and NLP system design.
  2. Transactions of the Association for Computational Linguistics (TACL)
    • Sponsored by the ACL, open access, and archived in the ACL Anthology.
    • Publishes high-quality, peer-reviewed papers in NLP and computational linguistics.
  3. Journal of Machine Learning Research (JMLR)
    • Covers machine learning broadly, with some overlap with NLP, including computational linguistics applications.
    • Open access and highly regarded for theoretical and applied machine learning research.
  4. Journal of Artificial Intelligence Research (JAIR)
    • Publishes research in AI, including computational linguistics and NLP topics.
    • Open access with a broad scope in AI-related fields.
  5. Natural Language Engineering
    • Published by Cambridge University Press.
    • Focuses on practical applications of NLP and computational linguistics.
  6. Journal for Language Technology and Computational Linguistics (JLCL)
    • Published by the German Society for Computational Linguistics and Language Technology (GSCL).
    • Covers computational linguistics, language technology, and related topics.
  7. Language Resources and Evaluation
    • Focuses on language resources, evaluation methodologies, and computational linguistics.
    • Published by Springer, often includes papers on corpora and annotation.

Part II: Conferences
The following are the top-tier conferences in computational linguistics and NLP, known for their competitive acceptance rates (often around 25%) and high impact in the field:

  1. Annual Meeting of the Association for Computational Linguistics (ACL)
    • The flagship conference of the ACL, held annually in summer.
    • Covers all aspects of computational linguistics and NLP, highly prestigious.
  2. Empirical Methods in Natural Language Processing (EMNLP)
    • One of the top NLP conferences, focusing on empirical and data-driven NLP research.
    • Held annually.
  3. International Conference on Computational Linguistics (COLING)
    • A major international conference held biennially, covering a broad range of computational linguistics topics.
  4. North American Chapter of the Association for Computational Linguistics (NAACL)
    • The ACL’s North American chapter conference, held annually or biennially.
  5. European Chapter of the Association for Computational Linguistics (EACL)
    • The ACL’s European chapter conference, focusing on NLP research in Europe and beyond.
  6. Conference on Computational Natural Language Learning (CoNLL)
    • Focuses on computational learning approaches to NLP, sponsored by ACL’s SIGNLL (Special Interest Group on Natural Language Learning).
    • Known for innovative research in natural language learning.
  7. International Workshop on Semantic Evaluation (SemEval)
    • A workshop series under the ACL, focusing on semantic analysis and evaluation tasks.
    • Highly regarded for its shared tasks in NLP.
  8. International Joint Conference on Natural Language Processing (IJCNLP)
    • Held in Asia, often in collaboration with ACL or other organizations.
    • Covers a wide range of NLP topics with a regional focus.
  9. The Society for Computation in Linguistics (SCiL) conference
    • A newer and more specialized event compared to the well-established, top-tier conferences like ACL, EMNLP, COLING, NAACL, and EACL.
    • Began in 2018.
    • Narrower focus on mathematical and computational modeling within linguistics.
    • Frequently held as a sister-society meeting alongside the LSA Annual Meeting.
  10. Conference on Neural Information Processing Systems (NeurIPS)
    • A premier venue for machine learning research.
    • Publishes NLP-related papers, though it is not a dedicated computational linguistics or NLP conference.

Part III: My Experience

NAACL 2025 took place in Albuquerque, New Mexico, from April 29 to May 4, 2025. As you might already know from my previous blog post, one of my co-authored papers was accepted to the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, part of NAACL 2025. Due to a scheduling conflict with school, I wasn’t able to attend in person—but I still participated remotely and followed the sessions virtually. It was an incredible opportunity to see the latest research and learn how experts in the field present and defend their work.

SCiL 2025 will be held from July 18 to July 20 at the University of Oregon, co-located with the LSA Summer Institute. I’ve already registered and am especially excited to meet some of the researchers whose work I’ve been reading. In particular, I’m hoping to connect with Prof. Jonathan Dunn, whose book Natural Language Processing for Corpus Linguistics I mentioned in a previous post. I’ll be sure to share a detailed reflection on the conference once I’m back.

If you’re interested in computational linguistics or NLP—even as a high school student—it’s never too early to start engaging with the academic community. Reading real papers, attending conferences, and publishing your own work can be a great way to learn, connect, and grow.

— Andrew

Ex Machina Gears Up for VEX Worlds 2026 in St. Louis

After an incredible season last year where our team, Ex Machina, competed at the VEX Robotics World Championship 2025, I’m excited to share that we’re back for another season! I’ll continue competing this season as a team member of Ex Machina, building on everything we learned from competing together at the global championship.


A New Season, A New Challenge

This year’s game for the VEX V5 Robotics Competition has been announced, and it looks both challenging and fun. Here is the official game reveal video so you can see what teams will be working on this season:

Watch the VEX V5 Robotics Competition 2026 Game Reveal

From the initial reveal, I can already tell that strategy, design innovation, and precise teamwork will be key to succeeding this year.


Balancing Robotics and College Applications

This season is going to be especially busy for me and my teammates. As rising seniors, we’re all deep into the college application process. Between essays, interviews, and preparing for upcoming deadlines, our schedules are definitely packed. But despite the workload, we’ve all decided to continue competing. Robotics has been such an important part of our high school journey, and we’re passionate about pushing ourselves further as a team in our final season together.


VEX Worlds 2026 Heads to St. Louis

There’s another big change this year: for 2026, the VEX Robotics World Championship is moving to St. Louis, Missouri! For the past few years, the event was held in Dallas, Texas, so this will be a new experience for everyone.

The championship will be held in April 2026 at the America’s Center Convention Complex in downtown St. Louis, with specific dates to be announced later. You can read more details about the upcoming event on the REC Foundation’s official page.

Here is a video introducing VEX Worlds 2026 in St. Louis to get you excited for what’s ahead:

VEX Robotics World Championship Heads to St. Louis in 2026


Looking Ahead

It feels both exciting and bittersweet to enter my final year of high school robotics. I know the journey ahead will be intense, balancing robot design, programming, and competition prep alongside college applications, but I’m ready for the challenge.

I’ll keep sharing updates about our season as we start building and competing, so stay tuned to see how Ex Machina continues to grow in 2026.

— Andrew

Summer Programs and Activities in Computational Linguistics: My Personal Experiences and Recommendations

If you’re a high school student interested in computational linguistics, you might be wondering: What are some ways to dive deeper into this field over the summer? As someone who loves language, AI, and everything in between, I’ve spent the past year researching programs and activities, and I wanted to share what I’ve learned (along with some of my personal experiences).


1. Summer Linguistic Institute for Youth Scholars (SLIYS)

What it is:
SLIYS is a two-week summer program run by The Ohio State University’s Department of Linguistics. It focuses on introducing high school students to language analysis and linguistic theory in a fun and rigorous way. Students get to explore syntax, morphology, phonetics, language universals, and even some computational topics.

My experience:
I’m super excited to share that I’ll be participating in SLIYS this summer (July 14 – 25, 2025). I was so happy to be accepted, and I’m looking forward to learning from real linguistics professors and meeting other students who are passionate about language. I’ll definitely share a reflection post after I finish the program, so stay tuned if you want an inside look!

Learn more about SLIYS here.


2. Summer Youth Camp for Computational Linguistics (SYCCL)

What it is:
SYCCL is a summer camp hosted by the Department of Linguistics and the Institute for Advanced Computational Science at Stony Brook University. It introduces high school students to computational linguistics and language technology, covering topics like language data, NLP tools, and coding for language analysis.

My experience:
I had planned to apply for SYCCL this year as well, but unfortunately, its schedule (July 6 – 18, 2025) conflicted with SLIYS, whose offer I had already accepted. Another challenge was that SYCCL’s website wasn’t updated until late April 2025, which is quite late compared to other summer programs. I had actually contacted the university earlier this year, and they confirmed it would run again, but I didn’t see the application open until April. My advice is to check their website frequently starting in early spring, and plan for potential conflicts with other summer programs.

Learn more about SYCCL here.


3. North American Computational Linguistics Open Competition (NACLO)

What it is:
NACLO is an annual computational linguistics competition for high school students across North America. It challenges students with problems in linguistics and language data analysis, testing their ability to decipher patterns in unfamiliar languages.

My experience:
I’ve tried twice to participate in NACLO at my local test center. Unfortunately, both times the test dates were weekdays that conflicted with my school final exams, so I had to miss them. If you’re planning to participate, I strongly recommend checking the schedule early to make sure it doesn’t overlap with finals or other major commitments. Despite missing it, I still find their practice problems online really fun and useful for thinking like a computational linguist.

Learn more about NACLO here.


4. LSA Summer Institute

What it is:
The Linguistic Society of America (LSA) Summer Institute is an intensive four-week program held every two years at different universities. It offers courses and workshops taught by top linguists and is known as one of the best ways to explore advanced topics in linguistics, including computational linguistics.

My experience:
I was planning to apply for the LSA Summer Institute this year. However, I found out that it is only open to individuals aged 18 and older. I contacted the LSA Institute Registration Office to ask if there could be any exceptions or special considerations for underage participants, but it was disappointing to receive their response: “Unfortunately, the age limit is firm and the organizers will not be considering any exceptions.” So if you’re thinking about applying, my advice is to check the age qualifications early before starting the application process.

Learn more about LSA Summer Institute here.


5. Local University Outreach Events and Courses

Another great way to explore linguistics and computational linguistics is by checking out courses or outreach events at local universities. For example, last summer I took LING 234 (Language and Diversity) at the University of Washington (Seattle). It was an eye-opening experience to study language variation, identity, and society from a college-level perspective. I wrote a reflection about it in my blog post from November 29, 2024. If your local universities offer summer courses for high school students, I highly recommend checking them out.


6. University-Affiliated AI4ALL Summer Programs for High School Students

What it is:
AI4ALL partners with universities to offer summer programs introducing high school students to AI research, ethics, and applications, often including NLP and language technology projects. While these programs are not focused solely on computational linguistics, they provide a great entry point into AI and machine learning, which are essential tools for language technology research.

About AI4ALL:
AI4ALL is a U.S.-based nonprofit focused on increasing diversity and inclusion in artificial intelligence (AI) education, research, development, and policy, particularly for historically underrepresented groups such as Black, Hispanic/Latinx, Indigenous, women, non-binary, low-income, and first-generation college students. Their mission is to make sure the next generation of AI researchers and developers reflects the diversity of the world.

Examples:

  • Stanford AI4ALL
  • Princeton AI4ALL
  • Carnegie Mellon AI4ALL

These programs are competitive and have different focus areas, but all aim to broaden participation in AI by empowering future researchers early.


Final Thoughts

I feel grateful to have these opportunities to grow my passion for computational linguistics, and I hope this list helps you plan your own summer learning journey. Whether you’re solving NACLO problems in your free time or spending two weeks at SLIYS like I will this summer, every step brings you closer to understanding how language and AI connect.

Let me know if you want a future post reviewing SLIYS after I complete it in July!

— Andrew

My Thoughts on “The Path to Medical Superintelligence”

Recently, I read an article published on Microsoft AI’s blog titled “The Path to Medical Superintelligence”. As a high school student interested in AI, computational linguistics, and the broader impacts of technology, I found this piece both exciting and a little overwhelming.


What Is Medical Superintelligence?

The blog talks about how Microsoft AI is working to build models with superhuman medical reasoning abilities. In simple terms, the idea is to create an AI that doesn’t just memorize medical facts but can analyze, reason, and make decisions at a level that matches or even surpasses expert doctors.

One detail that really stood out to me was how their new AI models also consider the cost of healthcare decisions. The article explained that while health costs vary widely depending on country and system, their team developed a method to consistently measure trade-offs between diagnostic accuracy and resource use. In other words, the AI doesn’t just focus on getting the diagnosis right, but also weighs how expensive or resource-heavy its suggested tests and treatments would be.

They explained that their current models already show impressive performance on medical benchmarks, such as USMLE-style medical exams, and that future models could go beyond question answering to support real clinical decision-making in a way that is both effective and efficient.


What Excites Me About This?

One thing that stood out to me was the potential impact on global health equity. The article mentioned that billions of people lack reliable access to doctors or medical specialists. AI models with advanced medical reasoning could help provide high-quality medical advice anywhere, bridging the gap for underserved communities.

It’s also amazing to think about how AI could support doctors by:

  • Reducing their cognitive load
  • Cross-referencing massive amounts of research
  • Helping with diagnosis and treatment planning

For someone like me who is fascinated by AI’s applications in society, this feels like a real-world example of AI doing good.


What Concerns Me?

At the same time, the blog post emphasized that AI is meant to complement doctors and health professionals, not replace them. I completely agree with this perspective. Medical decisions aren’t just about making the correct diagnosis. Doctors also need to navigate ambiguity, understand patient emotions and values, and build trust with patients and their families in ways AI isn’t designed to do.

Still, even if AI is only used as a tool to support clinicians, there are important concerns:

  • AI could give wrong or biased recommendations if the training data is flawed
  • It might suggest treatments without understanding a patient’s personal situation or cultural background
  • There is a risk of creating new inequalities if only wealthier hospitals or countries can afford the best AI models

Another thought I had was about how roles will evolve. The article mentioned that AI could help doctors automate routine tasks, identify diseases earlier, personalize treatment plans, and even help prevent diseases altogether. This sounds amazing, but it also means future doctors will need to learn how to work with AI systems effectively, interpret their recommendations, and still make the final decisions with empathy and ethical reasoning.


Connections to My Current Interests

While this blog post was about medical AI, it reminded me of my own interests in computational linguistics and language models. Underneath these medical models are the same AI principles I study:

  • Training on large datasets
  • Fine-tuning models for specific tasks
  • Evaluating performance carefully and ethically

It also shows how domain-specific knowledge (like medicine) combined with AI skills can create powerful tools that can literally save lives. That motivates me to keep building my foundation in both language technologies and other fields, so I can be part of these interdisciplinary innovations in the future.


Final Thoughts

Overall, reading this blog post made me feel hopeful about the potential of AI in medicine, but also reminded me of the responsibility AI developers carry. Creating a medical superintelligence isn’t just about reaching a technological milestone. It’s about improving people’s lives safely, ethically, and equitably.

If you’re interested in AI for social good, I highly recommend reading the full article here. Let me know if you want me to write a future post about other applications of AI that I’ve been exploring this summer.

— Andrew

How I Published My STEM Research in High School (and Where You Can Too)

Publishing as a high school student can be an exciting step toward academic growth and recognition. But if you’re anything like me when I started out, you’re probably wondering: Where do I even submit my work? And maybe more importantly, how do I avoid falling into the trap of predatory or low-quality journals?

In this post, I’ll walk through a curated list of reputable STEM journals that accept high school submissions—along with some honest thoughts from my own publishing journey. Whether you’re writing your first paper or looking for your next outlet, I hope this helps.


📚 10 Reputable Journals for High School Research (Especially STEM)

These are ranked loosely by selectiveness, peer-review rigor, and overall reputation. I’ve included each journal’s website, review cycle, and key details so you can compare.

  1. Columbia Junior Science Journal (CJSJ)
    Selection Rate: ~10–15% (very selective)
    Subjects: Natural sciences, engineering, social sciences
    Peer Review: Professional (Columbia faculty/editors)
    Cycle: Annual (6–9 months)
    🔗 cjsj.org
  2. Journal of Emerging Investigators (JEI)
    Selection Rate: ~70–75%
    Subjects: Biological/physical sciences (hypothesis-driven only)
    Peer Review: Graduate students and researchers
    Cycle: Rolling (7–8 months)
    🔗 emerginginvestigators.org
  3. STEM Fellowship Journal (SFJ)
    Selection Rate: ~15–20%
    Subjects: All STEM fields
    Peer Review: Canadian Science Publishing reviewers
    Cycle: Biannual (4–5 months)
    🔗 journal.stemfellowship.org
  4. International Journal of High School Research (IJHSR)
    Selection Rate: ~20–30%
    Subjects: STEM, behavioral, and social sciences
    Peer Review: Author-secured (3 academic reviewers)
    Cycle: Rolling (3–6 months)
    🔗 ijhsr.terrajournals.org
  5. The Young Researcher
    Selection Rate: ~20–25%
    Subjects: STEM, social sciences, humanities
    Peer Review: Faculty and researchers
    Cycle: Biannual (4–6 months)
    🔗 theyoungresearcher.com
  6. Journal of Student Research (JSR)
    Selection Rate: ~70–80%
    Subjects: All disciplines
    Peer Review: Faculty reviewers
    Cycle: Quarterly (6–7 months)
    🔗 jsr.org
  7. National High School Journal of Science (NHSJS)
    Selection Rate: ~20%
    Subjects: STEM and social sciences
    Peer Review: Student-led with academic oversight
    Cycle: Rolling (3–5 months)
    🔗 nhsjs.com
  8. Journal of High School Science (JHSS)
    Selection Rate: ~18%
    Subjects: STEM, arts (STEAM focus, quantitative research)
    Peer Review: Academic reviewers
    Cycle: Quarterly (4–6 months)
    🔗 jhss.scholasticahq.com
  9. Curieux Academic Journal
    Selection Rate: ~30–40%
    Subjects: STEM, humanities, social sciences
    Peer Review: Student-led with professional oversight
    Cycle: Monthly (fast-track: 2–5 weeks; standard: 1–3 months)
    🔗 curieuxacademicjournal.com
  10. Young Scientists Journal
    Selection Rate: ~40–50%
    Subjects: STEM (research, reviews, blogs)
    Peer Review: Student-led with expert input
    Cycle: Biannual (3–6 months)
    🔗 ysjournal.com

🧠 My Experience with JHSS, JSR, and NHSJS

1. Journal of High School Science (JHSS)
This was the first journal I submitted to, on November 13, 2024. The submission process was straightforward, and the portal clearly tracked every stage of the review. I received feedback on December 29, but unfortunately, the reviewer seemed unfamiliar with the field of large language models. The decision was based on two Likert-scale questions:

  • “The paper makes a significant contribution to scholarship.”
  • “The literature review was thorough given the objectives and content.”

The first was marked low, and the second was marked neutral. I shared the feedback with LLM researchers from top-tier universities, and they agreed the review wasn’t well-grounded. So heads up: JHSS does have a formal structure, but you may run into an occasional reviewer mismatch.

2. Journal of Student Research (JSR)
Originally, I was going to submit my second paper here. But I ended up choosing NHSJS because JSR’s review timeline was too long for my goals (6–7 months vs. NHSJS’s 3–5 months). That said, JSR has one of the clearest submission guides I’ve come across:
👉 JSR Submission Info
If you’re not in a rush and want a polished process, it’s a solid option.

3. National High School Journal of Science (NHSJS)
This is where I published my first solo-authored research paper (see my earlier post). What stood out to me:

  • Quick response times
  • Detailed and constructive reviewer feedback

My reviewers gave me 19 major and 6 minor suggestions, each with specific guidance. It was incredibly helpful as a student navigating scientific writing for the first time.

That said, the journal’s submission format was a bit confusing (e.g., its citation style is non-standard), and the guidelines weren’t always followed by other authors. I had to clarify formatting details directly with the editor. So: highly recommend NHSJS—just make sure you confirm your formatting expectations early.


Final Thoughts

If you’re serious about publishing your research, take time to explore your options. The review process can be slow and sometimes frustrating, but it’s one of the best ways to grow as a thinker and writer.

Let me know if you have any questions. I’d be happy to share more from my experience.

— Andrew

SCiL vs. ACL: What’s the Difference? (A Beginner’s Take from a High School Student)

As a high school student just starting to explore computational linguistics, I remember being confused by two organizations: SCiL (Society for Computation in Linguistics) and ACL (Association for Computational Linguistics). They both focus on language and computers, so at first, I assumed they were basically the same thing.

It wasn’t until recently that I realized they are actually two different academic communities. Each has its own focus, audience, and style of research. I’ve had the chance to engage with both, which helped me understand how they are connected and how they differ.

Earlier this year, I had the opportunity to co-author a paper that was accepted to a NAACL 2025 workshop (May 3–4). NAACL stands for the North American Chapter of the Association for Computational Linguistics. It is a regional chapter that serves researchers in the United States, Canada, and Mexico. NAACL follows ACL’s mission and guidelines but focuses on more local events and contributions.

This summer, I will be participating in SCiL 2025 (July 18–20), where I hope to meet researchers and learn more about how computational models are used to study language structure and cognition. Getting involved with both events helped me better understand what makes SCiL and ACL unique, so I wanted to share what I’ve learned for other students who might also be starting out.

SCiL and ACL: Same Field, Different Focus

Both SCiL and ACL are academic communities interested in studying human language using computational methods. However, they focus on different kinds of questions and attract different types of researchers.

Here’s how I would explain the difference.

SCiL (Society for Computation in Linguistics)

SCiL is more focused on using computational tools to support linguistic theory and cognitive science. Researchers here are often interested in how language works at a deeper level, including areas like syntax, semantics, and phonology.

The community is smaller and includes people from different disciplines like linguistics, psychology, and cognitive science. You are likely to see topics such as:

  • Computational models of language processing
  • Formal grammars and linguistic structure
  • Psycholinguistics and cognitive modeling
  • Theoretical syntax and semantics

If you are interested in how humans produce and understand language, and how computers can help us model that process, SCiL might be a great place to start.

ACL (Association for Computational Linguistics)

ACL has a broader and more applied focus. It is known for its work in natural language processing (NLP), artificial intelligence, and machine learning. The research tends to focus on building tools and systems that can actually use human language in practical ways.

The community is much larger and includes researchers from both academia and major tech companies like Google, OpenAI, Meta, and Microsoft. You will see topics such as:

  • Language models like GPT, BERT, and LLaMA
  • Machine translation and text summarization
  • Speech recognition and sentiment analysis
  • NLP benchmarks and evaluation methods

If you want to build or study real-world AI systems that use language, ACL is the place where a lot of that cutting-edge research is happening.

Which One Should You Explore First?

It really depends on what excites you most.

If you are curious about how language works in the brain or how to use computational tools to test theories of language, SCiL is a great choice. It is more theory-driven and focused on cognitive and linguistic insights.

If you are more interested in building AI systems, analyzing large datasets, or applying machine learning to text and speech, then ACL might be a better fit. It is more application-oriented and connected to the latest developments in NLP.

They both fall under the larger field of computational linguistics, but they come at it from different angles. SCiL is more linguistics-first, while ACL is more NLP-first.

Final Thoughts

I am still early in my journey, but understanding the difference between SCiL and ACL has already helped me navigate the field better. Each community asks different questions, uses different methods, and solves different problems, but both are helping to push the boundaries of how we understand and work with language.

I am looking forward to attending SCiL 2025 this summer, and I will definitely write about that experience afterward. In the meantime, I hope this post helps other students who are just starting out and wondering where to begin.

— Andrew

Is It Legal to Train AI on Books? A High School Researcher’s Take on the Anthropic Ruling

As someone who’s been exploring computational linguistics and large language models (LLMs), I’ve always wondered: How legal is it, really, to train AI on books or copyrighted material? This question came up while I was learning about how LLMs are trained using massive datasets, including books, articles, and other written works. It turns out the legal side is just as complex as the technical side.

A major U.S. court case in June 2025 helped answer this question, at least for now. In this post, I’ll break down what happened and what it means for researchers, developers, and creators.


The Big Picture: Copyright, Fair Use, and AI

In the U.S., books and intellectual property (IP) are protected under copyright law. That means you can’t just use someone’s novel or article however you want, especially if it’s for a commercial product.

However, there’s something called fair use, which allows limited use of copyrighted material without permission. Whether something qualifies as fair use depends on four factors:

  1. The purpose of the use (such as commercial vs. educational)
  2. The nature of the original work
  3. The amount used
  4. The effect on the market value of the original

LLM developers often argue that training models is “transformative.” In other words, the model doesn’t copy the books word for word. Instead, it learns patterns from large collections of text and generates new responses based on those patterns.
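To make the “learning patterns, not copying” idea concrete, here is a toy sketch I put together: a tiny word-bigram model in Python. This is my own illustrative example (real LLMs are vastly more complex neural networks), but it shows the basic intuition: the model stores statistics about which words follow which, then generates new sequences by recombining those statistics rather than reproducing the source text verbatim.

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Record which words follow each word: the 'patterns' the model learns."""
    counts = defaultdict(list)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev].append(nxt)
    return counts

def generate(counts, start, length=8, seed=0):
    """Sample a new sequence from the learned statistics, word by word."""
    rng = random.Random(seed)  # seeded for reproducibility
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:  # no observed continuation: stop
            break
        out.append(rng.choice(followers))
    return " ".join(out)

# Tiny toy "training corpus"; the generated text recombines its patterns.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigrams(corpus)
print(generate(model, "the"))
```

Of course, a bigram model this small can only shuffle its training words around, while modern LLMs learn far more abstract regularities; the legal argument is that both are doing statistical generalization rather than storage and retrieval of the originals.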

Until recently, this argument hadn’t been fully tested in court.


What Just Happened: The Anthropic Case (June 24, 2025)

In a landmark decision, U.S. District Judge William Alsup ruled that AI company Anthropic did not violate copyright law when it trained its Claude language model on books. The case was brought by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who argued that Anthropic had used their work without permission.

  • Andrea Bartz: The Lost Night: A Novel
  • Charles Graeber: The Good Nurse: A True Story of Medicine, Madness, and Murder
  • Kirk Wallace Johnson: The Fisherman and the Dragon: Fear, Greed, and a Fight for Justice on the Gulf Coast

Judge Alsup ruled that Anthropic’s use of the books qualified as fair use. He called the training process “exceedingly transformative” and explained that the model did not attempt to reproduce the authors’ styles or specific wording. Instead, the model learned patterns and structures in order to generate new language, similar to how a human might read and learn from books before writing something original.

However, the court also found that Anthropic made a serious mistake. The company had copied and stored more than 7 million pirated books in a central data library. Judge Alsup ruled that this was not fair use and was a clear violation of copyright law. A trial is scheduled for December 2025 to determine possible penalties, which could be up to $150,000 per work.


Why This Case Matters

This is the first major U.S. court ruling on whether training generative AI on copyrighted works can qualify as fair use. The result was mixed. On one hand, the training process itself was ruled legal. On the other hand, obtaining the data illegally was not.

This means AI companies can argue that their training methods are transformative, but they still need to be careful about where their data comes from. Using pirated books, even if the outcome is transformative, still violates copyright law.

Other lawsuits are still ongoing. Companies like OpenAI, Meta, and Microsoft are also facing legal challenges from authors and publishers. These cases may be decided differently, depending on how courts interpret fair use.


My Thoughts as a Student Researcher

To be honest, I understand both sides. As someone who is really excited about the possibilities of LLMs and has worked on research projects involving language models, I think it’s important to be able to learn from large and diverse datasets.

At the same time, I respect the work of authors and creators. Writing a book takes a lot of effort, and it’s only fair that their rights are protected. If AI systems are going to benefit from their work, then maybe there should be a system that gives proper credit or compensation.

For student researchers like me, this case is a reminder to be careful and thoughtful about where our data comes from. It also raises big questions about what responsible AI development looks like, not just in terms of what is allowed by law, but also what is fair and ethical.


Wrapping It Up

The Anthropic ruling is a big step toward defining the legal boundaries for training AI on copyrighted material. It confirmed that training can be legal under fair use if it is transformative, but it also made clear that sourcing content from pirated platforms is still a violation of copyright law.

This case does not settle the global debate, but it does provide some clarity for researchers and developers in the U.S. Going forward, the challenge will be finding a balance between supporting innovation and respecting the rights of creators.

— Andrew

Update (September 5, 2025):

AI startup Anthropic will pay at least $1.5 billion to settle a copyright infringement lawsuit over its use of books downloaded from the Internet to train its Claude AI models. The federal case, filed last year in California by several authors, accused Anthropic of illegally scraping millions of works from ebook piracy sites. As part of the settlement, Anthropic has agreed to destroy datasets containing illegally accessed works. (Read the full report)

Using LLMs to Hear What Students Are Really Saying

Earlier this year, I had the opportunity to lead my nonprofit, Student Echo (student-echo.org), in a collaboration with the Lake Washington School District to analyze student survey data using Large Language Models (LLMs).

With support from Dr. Tim Krieger (Director of Data and Research) and my high school principal, Ms. VanderVeer, we focused on extracting insights from open-ended responses—comments that often get overlooked because they’re hard to analyze at scale.

Our goal was simple: use LLMs to help educators better understand what students are actually saying—what they care about, where they’re struggling, and what they wish could be different.

The analysis has since been shared with district educators to help inform future improvements in the student experience. I’m excited to share the full report below, which walks through the methods, findings, and a few key takeaways from the project.

Stay tuned—more student voices coming soon.

— Andrew

Ex Machina Goes Global: VEX Worlds 2025 Recap

From May 6 to May 8, 2025, my team and I had the chance to compete in the VEX Robotics World Championship—held at the Kay Bailey Hutchison Convention Center in Dallas. This annual event brings together the top-performing teams from around the globe for the VEX IQ, VEX V5, and VEX U competitions. We were there to represent Team 66475C – Ex Machina in the VEX V5 High School division.

Since 2021, my teams have qualified for Worlds five years in a row—each time representing Washington as one of the state’s top contenders. This year, we were proud to win the State Championship, earn our ticket to Dallas, and compete in the Design Division, which included 83 qualified teams from all over the world.

And we made it count:
🏆 Design Division Champions
🌍 Top 8 globally among 831 teams
💥 Quarterfinalists overall

Huge thanks to our incredible partner team 1010G (TenTon Robotics) from British Columbia, Canada, who helped make our division title possible. If you’re curious about how it all unfolded, you can catch the recap here:
👉 Watch the recap


My Role

As Main Builder, I used 3D modeling software to design the robot, which made our planning and resource management far more efficient. I was hands-on with every part of the build, from the drive base to the various subsystems. I also managed our team of builders, making sure everyone’s work integrated cleanly with the overall design, fostering collaboration, and keeping standards high throughout the build process.


Participating in this kind of international competition is incredibly rewarding—not just for the technical skills, but for what it teaches you about teamwork, dealing with pressure, and adapting to the unexpected. And honestly, one of the best parts is just making friends from all over the world.

If you’re interested in robotics, I highly recommend giving this competition a shot.

Coming soon: I’ll be sharing updates on my summer AI projects—stay tuned!

— Andrew

Back from Hibernation — A Paper, a Robot, and a Lot of Tests

It’s been a while—almost three months since my last post. Definitely not my usual pace. I wanted to check in and share why the blog has been a bit quiet recently—and more importantly, what I’ve been working on behind the scenes.

First, April and May were a whirlwind: I had seven AP exams, school finals, and was deep in preparation for the VEX Robotics World Championship. Balancing school with intense robotics scrimmages and code debugging meant there were a lot of late nights and early mornings—and not much time to write.

But the biggest reason for the radio silence? I’ve been working on a research paper that got accepted to NAACL 2025.

Our NAACL 2025 Paper: “A Bag-of-Sounds Approach to Multimodal Hate Speech Detection”

Over the past few months, I’ve had the opportunity to co-author a paper with Dr. Sidney Wong, focusing on multimodal hate speech detection using audio data. The paper was accepted to the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages at NAACL 2025.

You can read the full paper here:
👉 A Bag-of-Sounds Approach to Multimodal Hate Speech Detection

What we did:
We explored a “bag-of-sounds” method, training our model on Mel spectrogram features extracted from spoken social media content in Dravidian languages—specifically Malayalam and Tamil. Unlike most hate speech systems that rely solely on text, we wanted to see how well speech-based signals alone could perform.
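To give a feel for the idea, here’s a toy sketch of a bag-of-sounds feature extractor. This is not our actual pipeline (the paper uses Mel spectrogram features); it approximates the concept with plain log-magnitude FFT frames, a tiny k-means codebook of “sound words,” and a histogram over them, just as bag-of-words counts word occurrences. All function names here (`frame_spectra`, `build_codebook`, `bag_of_sounds`) are made up for illustration.

```python
import numpy as np

def frame_spectra(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames; take log-magnitude FFT bins."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spectra = np.abs(np.fft.rfft(np.array(frames) * np.hanning(frame_len), axis=1))
    return np.log1p(spectra)  # compress dynamic range, roughly like a log-mel spectrogram

def build_codebook(spectra, k=8, iters=10, seed=0):
    """Tiny k-means over frame spectra; each centroid acts as one 'sound word'."""
    rng = np.random.default_rng(seed)
    codebook = spectra[rng.choice(len(spectra), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((spectra[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)          # nearest centroid per frame
        for j in range(k):
            members = spectra[labels == j]
            if len(members):
                codebook[j] = members.mean(0)
    return codebook

def bag_of_sounds(signal, codebook, frame_len=256, hop=128):
    """Normalized histogram of 'sound word' counts for one utterance."""
    spectra = frame_spectra(signal, frame_len, hop)
    dists = ((spectra[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    labels = dists.argmin(1)
    counts = np.bincount(labels, minlength=len(codebook)).astype(float)
    return counts / counts.sum()

# Toy demo: two synthetic 'utterances' at different pitches should land
# on different sound words, so their histograms should differ.
sr = 8000
t = np.arange(sr) / sr
low = np.sin(2 * np.pi * 220 * t)
high = np.sin(2 * np.pi * 1760 * t)
codebook = build_codebook(np.vstack([frame_spectra(low), frame_spectra(high)]))
v_low = bag_of_sounds(low, codebook)
v_high = bag_of_sounds(high, codebook)
print(round(float(np.abs(v_low - v_high).sum()), 2))
```

The resulting fixed-length histograms can then be fed to any off-the-shelf classifier, which is what makes the bag-of-sounds framing convenient: the classifier never needs to know the inputs were audio.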

How it went:
The results were mixed. Our system didn’t perform as well as we hoped on the final test set, but it showed real promise on the training and dev sets. The takeaway? With enough balanced, labeled audio data, speech can absolutely play a role in multimodal hate speech detection systems. It’s a step toward understanding language in more realistic, cross-modal contexts.

More importantly, this project helped me dive into the intersection of language, sound, and AI—and reminded me just how much we still have to learn when it comes to processing speech from low-resource languages.


Thanks for sticking around even when the blog went quiet. I’ll be back soon with a post about my experience at the VEX Robotics World Championship—stay tuned!

— Andrew
