ACL 2025 New Theme Track: Generalization in NLP Models

The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) will take place in Vienna, Austria from July 27 to August 1. I won’t be attending in person, but as someone planning to study and do research in computational linguistics and NLP in college, I’ve been following the conference closely to keep up with the latest trends.

One exciting thing about this year’s ACL is its new theme track: Generalization of NLP Models. According to the official announcement:

“Following the success of the ACL 2020–2024 Theme tracks, we are happy to announce that ACL 2025 will have a new theme with the goal of reflecting and stimulating discussion about the current state of development of the field of NLP.

Generalization is crucial for ensuring that models behave robustly, reliably, and fairly when making predictions on data different from their training data. Achieving good generalization is critically important for models used in real-world applications, as they should emulate human-like behavior. Humans are known for their ability to generalize well, and models should aspire to this standard.

The theme track invites empirical and theoretical research and position and survey papers reflecting on the Generalization of NLP Models. The possible topics of discussion include (but are not limited to) the following:

  • How can we enhance the generalization of NLP models across various dimensions—compositional, structural, cross-task, cross-lingual, cross-domain, and robustness?
  • What factors affect the generalization of NLP models?
  • What are the most effective methods for evaluating the generalization capabilities of NLP models?
  • While Large Language Models (LLMs) significantly enhance the generalization of NLP models, what are the key limitations of LLMs in this regard?

The theme track submissions can be either long or short. We anticipate having a special session for this theme at the conference and a Thematic Paper Award in addition to other categories of awards.”

This year’s focus on generalization really highlights where the field is going—toward more robust, ethical, and real-world-ready NLP systems. It’s not just about making cool models anymore, but about making sure they work well across different languages, cultures, and use cases.

If you’re into reading papers like I am, especially ones that dig into how NLP systems can perform reliably on new or unexpected inputs, this theme track will be full of insights. I’m looking forward to checking out the accepted papers when they’re released.

You can read more at the official conference page: ACL 2025 Theme Track Announcement

— Andrew

Is It Legal to Train AI on Books? A High School Researcher’s Take on the Anthropic Ruling

As someone who’s been exploring computational linguistics and large language models (LLMs), I’ve always wondered: How legal is it, really, to train AI on books or copyrighted material? This question came up while I was learning about how LLMs are trained using massive datasets, including books, articles, and other written works. It turns out the legal side is just as complex as the technical side.

A major U.S. court case in June 2025 helped answer this question, at least for now. In this post, I’ll break down what happened and what it means for researchers, developers, and creators.


The Big Picture: Copyright, Fair Use, and AI

In the U.S., books and intellectual property (IP) are protected under copyright law. That means you can’t just use someone’s novel or article however you want, especially if it’s for a commercial product.

However, there’s something called fair use, which allows limited use of copyrighted material without permission. Whether something qualifies as fair use depends on four factors:

  1. The purpose of the use (such as commercial vs. educational)
  2. The nature of the original work
  3. The amount used
  4. The effect on the market value of the original

LLM developers often argue that training models is “transformative.” In other words, the model doesn’t copy the books word for word. Instead, it learns patterns from large collections of text and generates new responses based on those patterns.

Until recently, this argument hadn’t been fully tested in court.


What Just Happened: The Anthropic Case (June 24, 2025)

In a landmark decision, U.S. District Judge William Alsup ruled that AI company Anthropic did not violate copyright law when it trained its Claude language model on books. The case was brought by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who argued that Anthropic had used their work without permission.

  • Andrea Bartz: The Lost Night: A Novel
  • Charles Graeber: The Good Nurse: A True Story of Medicine, Madness, and Murder
  • Kirk Wallace Johnson: The Fisherman and the Dragon: Fear, Greed, and a Fight for Justice on the Gulf Coast

Judge Alsup ruled that Anthropic’s use of the books qualified as fair use. He called the training process “exceedingly transformative” and explained that the model did not attempt to reproduce the authors’ styles or specific wording. Instead, the model learned patterns and structures in order to generate new language, similar to how a human might read and learn from books before writing something original.

However, the court also found that Anthropic made a serious mistake. The company had copied and stored more than 7 million pirated books in a central data library. Judge Alsup ruled that this was not fair use and was a clear violation of copyright law. A trial is scheduled for December 2025 to determine possible penalties, which could be up to $150,000 per work.


Why This Case Matters

This is the first major U.S. court ruling on whether training generative AI on copyrighted works can qualify as fair use. The result was mixed. On one hand, the training process itself was ruled legal. On the other hand, obtaining the data illegally was not.

This means AI companies can argue that their training methods are transformative, but they still need to be careful about where their data comes from. Using pirated books, even if the outcome is transformative, still violates copyright law.

Other lawsuits are still ongoing. Companies like OpenAI, Meta, and Microsoft are also facing legal challenges from authors and publishers. These cases may be decided differently, depending on how courts interpret fair use.


My Thoughts as a Student Researcher

To be honest, I understand both sides. As someone who is really excited about the possibilities of LLMs and has worked on research projects involving language models, I think it’s important to be able to learn from large and diverse datasets.

At the same time, I respect the work of authors and creators. Writing a book takes a lot of effort, and it’s only fair that their rights are protected. If AI systems are going to benefit from their work, then maybe there should be a system that gives proper credit or compensation.

For student researchers like me, this case is a reminder to be careful and thoughtful about where our data comes from. It also raises big questions about what responsible AI development looks like, not just in terms of what is allowed by law, but also what is fair and ethical.


Wrapping It Up

The Anthropic ruling is a big step toward defining the legal boundaries for training AI on copyrighted material. It confirmed that training can be legal under fair use if it is transformative, but it also made clear that sourcing content from pirated platforms is still a violation of copyright law.

This case does not settle the global debate, but it does provide some clarity for researchers and developers in the U.S. Going forward, the challenge will be finding a balance between supporting innovation and respecting the rights of creators.

— Andrew

Update (September 5, 2025):

AI startup Anthropic will pay at least $1.5 billion to settle a copyright infringement lawsuit over its use of books downloaded from the Internet to train its Claude AI models. The federal case, filed last year in California by several authors, accused Anthropic of illegally scraping millions of works from ebook piracy sites. As part of the settlement, Anthropic has agreed to destroy datasets containing illegally accessed works. (Read the full report)

My First Solo Publication: A Case Study on Sentiment Analysis in Survey Data

I’m excited to share that my first solo-authored research paper has just been published in the National High School Journal of Science! 🎉

The paper is titled “A Case Study of Sentiment Analysis on Survey Data Using LLMs versus Dedicated Neural Networks”, and it explores a question I’ve been curious about for a while: how do large language models (like GPT-4o or LLaMA-3) compare to task-specific neural networks when it comes to analyzing open-ended survey responses?

If you’ve read some of my earlier posts—like my reflection on the DravidianLangTech shared task or my thoughts on Jonathan Dunn’s NLP book—you’ll know that sentiment analysis has become a recurring theme in my work. From experimenting with XLM-RoBERTa on Tamil and Tulu to digging into how NLP can support corpus linguistics, this paper feels like the natural next step in that exploration.

Why This Matters to Me

Survey responses are messy. They’re full of nuance, ambiguity, and context—and yet they’re also where we hear people’s honest voices. I’ve always thought it would be powerful if AI could help us make sense of that kind of data, especially in educational or public health settings where understanding sentiment could lead to real change.

In this paper, I compare how LLMs and dedicated models handle that challenge. I won’t go into the technical details here (the paper does that!), but one thing that stood out to me was how surprisingly effective LLMs are—even without task-specific fine-tuning.

That said, they come with trade-offs: higher computational cost, more complexity, and the constant need to assess bias and interpretability. There’s still a lot to unpack in this space.
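To give a flavor of what “no task-specific fine-tuning” means in practice, here is a hypothetical zero-shot setup: you wrap the raw survey response in a prompt, hand it to an LLM, and parse the free-form reply back into a label. The label set and prompt wording below are illustrative placeholders, not the exact ones from my paper.

```python
# Illustrative label set; the actual labels and prompts are described in the paper.
LABELS = ["positive", "negative", "neutral", "mixed"]

def build_prompt(response):
    """Wrap a raw survey response in a zero-shot classification prompt."""
    return (
        "Classify the sentiment of the following survey response as one of: "
        + ", ".join(LABELS) + ".\n\n"
        + "Response: " + response + "\nSentiment:"
    )

def parse_label(model_reply):
    """Map a free-form model reply back onto the label set."""
    reply = model_reply.strip().lower()
    for label in LABELS:
        if label in reply:
            return label
    return "unknown"
```

The model call itself (to GPT-4o, Llama-3, and so on) would slot in between these two functions; everything around it is ordinary string handling, which is part of why zero-shot pipelines are so quick to stand up.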

Looking Ahead

This paper marks a milestone for me, not just academically but personally. It brings together things I’ve been learning in courses, competitions, side projects, and books—and puts them into conversation with each other. I’m incredibly grateful to the mentors and collaborators who supported me along the way.

If you’re interested in sentiment analysis, NLP for survey data, or just want to see what a high school research paper can look like in this space, I’d love for you to take a look:
🔗 Read the full paper here

Thanks again for following along this journey. Stay tuned!

Shared Task at DravidianLangTech 2025

In 2025, I had the privilege of participating in the shared task on Sentiment Analysis in Tamil and Tulu as part of the DravidianLangTech@NAACL 2025 conference. The task was both challenging and enlightening, as it required applying machine learning techniques to multilingual data with varying sentiment nuances. This post highlights the work I did, the methodology I followed, and the results I achieved.


The Task at Hand

The goal of the task was to classify text into one of four sentiment categories: Positive, Negative, Mixed Feelings, and Unknown State. The datasets provided were in Tamil and Tulu, which made it a fascinating opportunity to work with underrepresented languages.


Methodology

I implemented a pipeline to preprocess the data, tokenize it, train a transformer-based model, and evaluate its performance. I chose XLM-RoBERTa, a transformer pretrained on text in about 100 languages, which makes it well suited to multilingual data. Below is a concise breakdown of my approach:

  1. Data Loading and Inspection:
    • Used training, validation, and test datasets in .xlsx format.
    • Inspected the data for missing values and label distributions.
  2. Text Cleaning:
    • Created a custom function to clean text by removing unwanted characters, punctuation, and emojis.
    • Removed common stopwords to focus on meaningful content.
  3. Tokenization:
    • Tokenized the cleaned text using the pre-trained XLM-RoBERTa tokenizer with a maximum sequence length of 128.
  4. Model Setup:
    • Leveraged XLMRobertaForSequenceClassification with 4 output labels.
    • Configured TrainingArguments to train for 3 epochs with evaluation at the end of each epoch.
  5. Evaluation:
    • Evaluated the model on the validation set, achieving a Validation Accuracy of 59.12%.
  6. Saved Model:
    • Saved the trained model and tokenizer for reuse.
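The cleaning step above can be sketched in a few lines. This is a minimal illustration, not the exact function from my pipeline: the stopword set here is a tiny English placeholder (the real pipeline used fuller lists), and the regex keeps word characters in any script, so Tamil and Tulu text survives while punctuation and emoji-like symbols are stripped.

```python
import re

# Illustrative stopword set; the real pipeline used fuller, language-appropriate lists.
STOPWORDS = {"the", "a", "an", "and", "or", "is", "it", "of", "to"}

def clean_text(text):
    text = re.sub(r"http\S+", " ", text)   # drop URLs
    text = re.sub(r"[^\w\s]", " ", text)   # drop punctuation and emoji-like symbols
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return " ".join(tokens)

print(clean_text("This movie is great!!! 😀"))  # → "this movie great"
```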

Results

After training the model for three epochs, the validation accuracy was 59.12%. While there is room for improvement, this score demonstrates the model’s ability to handle complex sentiment nuances in low-resource languages like Tamil and Tulu.


The Code

Below is an overview of the steps in the code:

  • Preprocessing: Cleaned and tokenized the text to prepare it for model input.
  • Model Training: Used Hugging Face’s Trainer API to simplify the training process.
  • Evaluation: Compared predictions against ground truth to compute accuracy.

To make this process more accessible, I’ve attached the complete code as a downloadable file. However, for a quick overview, here’s a snippet from the code that demonstrates how the text was tokenized:

# Tokenize text data using the XLM-RoBERTa tokenizer
def tokenize_text(data, tokenizer, max_length=128):
    return tokenizer(
        data,
        truncation=True,
        padding='max_length',
        max_length=max_length,
        return_tensors="pt"
    )

train_tokenized = tokenize_text(train['cleaned'].tolist(), tokenizer)
val_tokenized = tokenize_text(val['cleaned'].tolist(), tokenizer)

This function ensures the input text is prepared correctly for the transformer model.


Reflections

Participating in this shared task was a rewarding experience. It highlighted the complexities of working with low-resource languages and the potential of transformers in tackling these challenges. Although the accuracy could be improved with hyperparameter tuning and advanced preprocessing, the results are a promising step forward.


Download the Code

I’ve attached the full code used for this shared task. Feel free to download it and explore the implementation in detail.


If you’re interested in multilingual NLP or sentiment analysis, I’d love to hear your thoughts or suggestions on improving this approach! Leave a comment below or connect with me via the blog.

I am back!

This will be a short post since I’m planning to post a more in-depth discussion on one thing that I’ve been up to over the summer. Between writing a research paper (currently under review by the Journal of High School Science) and founding a nonprofit called Student Echo, I’ve been keeping myself busy. Despite all this, I plan to post shorter updates more frequently here. Sorry for the wait—assuming anyone was actually waiting—but hey, here you go.

Here’s a bit more about what’s been keeping me occupied:
My Research Paper
Title: Comparing Performance of LLMs vs. Dedicated Neural Networks in Analyzing the Sentiment of Survey Responses
Abstract: Interpreting sentiment in open-ended survey data is a challenging but crucial task in the age of digital information. This paper studies the capabilities of three LLMs (Gemini-1.5-Flash, Llama-3-70B, and GPT-4o), comparing them to dedicated sentiment-analysis neural networks, namely RoBERTa-base-sentiment and DeBERTa-v3-base-absa. These models were evaluated on accuracy, precision, recall, and F1-score in determining the underlying sentiment of responses from two COVID-19 surveys. The results revealed that despite being designed for broader applications, all three LLMs generally outperformed the specialized neural networks, with the caveat that RoBERTa was the most precise at detecting negative sentiment. While LLMs are more resource-intensive than dedicated neural networks, their enhanced accuracy demonstrates their evolving potential and justifies the increased resource costs in sentiment analysis.
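The metrics in the abstract all reduce to simple counting over predicted and true labels, and the per-class view is exactly what surfaces findings like RoBERTa’s precision on negative sentiment. Here is a minimal sketch of that per-class computation (not the paper’s actual evaluation code, which would more likely use a library):

```python
def per_class_prf(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, by direct counting."""
    tp = sum(t == p == label for t, p in zip(y_true, y_pred))
    fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: one "neg" item was misclassified as "pos".
truth = ["neg", "pos", "neg", "neg", "pos"]
preds = ["neg", "pos", "pos", "neg", "pos"]
print(per_class_prf(truth, preds, "neg"))
```

In this toy run, every "neg" prediction is correct (precision 1.0) but one negative response was missed (recall 2/3), which is the precision/recall trade-off behind comparisons like the one in the abstract.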

My Nonprofit: Student Echo
Website: https://www.student-echo.org/
Student-Echo.org is a student-led non-profit organization with the mission of amplifying students’ voices through student-designed questionnaires, AI-based technology, and close collaboration among students, teachers, and school district educators.
