acl-2025 – CompLing Insights

The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) recently finished in Vienna, Austria from July 27 to August 1. The conference announced a few awards, one of which is Best Social Impact Paper. This award was given to two papers:

AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset (by Charles Nimo et al.)
The AI Gap: How Socioeconomic Status Affects Language Technology Interactions (by Elisa Bassignana, Amanda Cercas Curry, and Dirk Hovy).

In this blog post, I’ll talk about the second paper and share the findings from the paper and my thoughts on the topic. You can read the full paper here: https://aclanthology.org/2025.acl-long.914.pdf

What the Paper is About

This paper investigates how socioeconomic status (SES) influences interactions with language technologies, particularly large language models (LLMs) like ChatGPT, highlighting an emerging “AI Gap” that could exacerbate social inequalities. Drawing from the Technology Acceptance Model and prior work on digital divides, the authors argue that SES shapes technology adoption through factors like access, digital literacy, and linguistic habits, potentially biasing LLMs toward higher-SES patterns and underrepresenting lower-SES users.

Methods

The study surveys 1,000 English-speaking participants from the UK and US via Prolific, stratified by self-reported SES using the MacArthur scale (binned as low: 1-3, middle: 4-7, upper: 8-10). It collects sociodemographic data, usage patterns of language technologies (e.g., spell checkers, AI chatbots), and 6,482 real prompts from prior LLM interactions. Analysis includes statistical tests (e.g., chi-square for usage differences), linguistic metrics (e.g., prompt length, concreteness via Brysbaert et al.’s word ratings), topic modeling (using embeddings, UMAP, HDBSCAN, and GPT-4 for cluster descriptions), and markers of anthropomorphism (e.g., phatic expressions like “hi” and politeness markers like “thank you”).

Key Findings

Usage Patterns: Higher-SES individuals access more devices daily (e.g., laptops, smartwatches) and use LLMs more frequently (e.g., daily vs. rarely for lower SES). They employ LLMs for work/education (e.g., coding, data analysis, writing) and technical contexts, while lower-SES users favor entertainment, brainstorming, and general knowledge queries. Statistically significant differences exist in frequency (p < 0.001), contexts (p < 0.001), and tasks (p < 0.001).
Linguistic Differences in Prompts: Higher-SES prompts are shorter (avg. 18.4 words vs. 27.0 for low SES; p < 0.05) and more abstract (concreteness score: 2.57 vs. 2.66; p < 0.05). Lower-SES prompts show higher anthropomorphism (e.g., more phatic expressions) and concrete language. A bag-of-words classifier distinguishes SES groups (Macro-F1 39.25 vs. baseline 25.02).
Topics and Framing: Common topics (e.g., translation, mental health, medical advice, writing, text editing, finance, job, food) appear across groups, but framing varies—e.g., lower SES seeks debt reduction or low-skill jobs; higher SES focuses on investments, travel itineraries, or inclusivity. About 45% of prompts resemble search-engine queries, suggesting LLMs are replacing traditional searches.
User Perceptions: Trends indicate lower-SES users anthropomorphize more (e.g., metaphorical verbs like “ask”), while higher-SES use jargon (e.g., “generate”), though not statistically significant.

Discussion and Implications

The findings underscore how SES stratifies LLM use, with higher-SES benefiting more in professional/educational contexts, potentially widening inequalities as LLMs optimize for their patterns. Benchmarks may overlook lower-SES styles, leading to biases. The authors advocate the development of inclusive NLP technologies to accommodate different SES needs and habitus and mitigate the existing AI Gap.

Limitations and Ethics

Limited to Prolific crowdworkers (skewed middle/low SES, tech-savvy), subjective SES measures, and potential LLM-generated responses. Ethical compliance includes GDPR anonymity, opt-outs, and fair compensation (£9/hour).

Overall, the paper reveals SES-driven disparities in technology interactions, urging NLP development to address linguistic and habitual differences for equitable access and reduced digital divides.

My Takeaway

As a high school student who spends a lot of time thinking about fairness in AI, I find this paper important because it reminds us that bias is not just about language or culture, it can also be tied to socioeconomic status. This is something I had not thought much about before. If AI systems are trained mostly on data from higher SES groups, they might misunderstand or underperform for people from lower SES backgrounds. That could affect how well people can use AI for education, job searching, or even just getting accurate information online.

For me, the takeaway is that AI researchers need to test their models with SES diversity in mind, just like they do with gender or language diversity. And as someone interested in computational linguistics, it is inspiring to see that work like this is getting recognized with awards at ACL.

— Andrew

The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) will be happening in Vienna, Austria from July 27 to August 1. I won’t be attending in person, but as someone planning to study and do research in computational linguistics and NLP in college, I’ve been following the conference closely to keep up with the latest trends.

One exciting thing about this year’s ACL is its new theme track: Generalization of NLP Models. According to the official announcement:

“Following the success of the ACL 2020–2024 Theme tracks, we are happy to announce that ACL 2025 will have a new theme with the goal of reflecting and stimulating discussion about the current state of development of the field of NLP.

Generalization is crucial for ensuring that models behave robustly, reliably, and fairly when making predictions on data different from their training data. Achieving good generalization is critically important for models used in real-world applications, as they should emulate human-like behavior. Humans are known for their ability to generalize well, and models should aspire to this standard.

The theme track invites empirical and theoretical research and position and survey papers reflecting on the Generalization of NLP Models. The possible topics of discussion include (but are not limited to) the following:

How can we enhance the generalization of NLP models across various dimensions—compositional, structural, cross-task, cross-lingual, cross-domain, and robustness?
What factors affect the generalization of NLP models?
What are the most effective methods for evaluating the generalization capabilities of NLP models?
While Large Language Models (LLMs) significantly enhance the generalization of NLP models, what are the key limitations of LLMs in this regard?

The theme track submissions can be either long or short. We anticipate having a special session for this theme at the conference and a Thematic Paper Award in addition to other categories of awards.”

This year’s focus on generalization really highlights where the field is going—toward more robust, ethical, and real-world-ready NLP systems. It’s not just about making cool models anymore, but about making sure they work well across different languages, cultures, and use cases.

If you’re into reading papers like I am, especially ones that dig into how NLP systems can perform reliably on new or unexpected inputs, this theme track will be full of insights. I’m looking forward to checking out the accepted papers when they’re released.

You can read more at the official conference page: ACL 2025 Theme Track Announcement

— Andrew

Tag: acl-2025

ACL 2025 New Theme Track: Generalization in NLP Models