How Can Computational Linguistics Help Stop Phishing Emails?

I’ve always been curious about how language can reveal hidden clues. One place this really shows up is in phishing emails. These are the fake messages that try to trick people into giving away passwords or personal information. They are annoying, but also dangerous, which makes them a great case study for how computational linguistics can be applied in real life.

Why Phishing Emails Matter

Phishing is more than just spam. A single click on the wrong link can cause real damage, from stolen accounts to financial loss. What interests me is that these emails often give themselves away through language. That is where computational linguistics comes in.

How Language Analysis Helps Detect Phishing

  • Spotting unusual patterns: Models can flag odd grammar or overly formal phrases that do not fit normal business communication.
  • Checking stylistic fingerprints: Everyone has a writing style. Computational models can learn those styles and catch imposters pretending to be someone else.
  • Finding emotional manipulation: Many phishing emails use urgency or fear, like “Act now or your account will be suspended.” Sentiment analysis can identify these tactics (a toy sketch follows this list).
  • Looking at context and meaning: Beyond surface words, models can ask whether the message makes sense in context. A bank asking for login details over email does not line up with how real banks communicate.
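To make the “emotional manipulation” cue concrete, here is a minimal sketch in Python. The phrase list and threshold are my own illustrative assumptions, not a real detector; production systems combine many more signals (style, context, sender metadata) with trained classifiers.

```python
# Toy urgency scorer: flags manipulative "act now" language, one of the cues above.
# Phrase list and threshold are illustrative assumptions, not a real detector.
URGENCY_PHRASES = [
    "act now", "immediately", "within 24 hours", "account will be suspended",
    "verify your password", "urgent action required",
]

def urgency_score(email_text: str) -> float:
    """Return the fraction of urgency phrases found in the email."""
    text = email_text.lower()
    hits = sum(1 for phrase in URGENCY_PHRASES if phrase in text)
    return hits / len(URGENCY_PHRASES)

email = "Act now or your account will be suspended. Verify your password immediately."
score = urgency_score(email)
print(f"urgency score: {score:.2f}")
if score > 0.3:  # threshold chosen arbitrarily for the example
    print("Flag for review: possible phishing language")
```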

Why This Stood Out to Me

What excites me about this problem is that it shows how language technology can protect people. I like studying computational linguistics because it is not just about theory. It has real applications like this that touch everyday life. By teaching computers to recognize how people write, we can stop scams before they reach someone vulnerable.

My Takeaway

Phishing shows how much power is hidden in language, both for good and for harm. To me, that is the motivation for studying computational linguistics: to design tools that understand language well enough to help people. Problems like phishing remind me why the field matters.


📚 Further Reading

Here are some recent peer-reviewed papers if you want to dive deeper into how computational linguistics and machine learning are used to detect phishing:

  • Recommended for beginners
    Saias, J. (2025). Advances in NLP Techniques for Detection of Message-Based Threats in Digital Platforms: A Systematic Review. Electronics, 14(13), 2551. https://doi.org/10.3390/electronics14132551
    A recent review covering multiple types of digital messaging threats—including phishing—using modern NLP methods. It’s accessible, up to date, and provides a helpful overview. Why I recommend this: As someone still learning computational linguistics, I like starting with survey papers that show many ideas in one place. This one is fresh and covers a lot of ground.
  • Jaison J. S., Sadiya H., Himashree S., M. Jomi Maria Sijo, & Anitha T. G. (2025). A Survey on Phishing Email Detection Techniques: Using LSTM and Deep Learning. International Journal for Research in Applied Science & Engineering Technology (IJRASET), 13(8). https://doi.org/10.22214/ijraset.2025.73836
    Overviews deep learning methods like LSTM, BiLSTM, CNN, and Transformers in phishing detection, with notes on datasets and practical challenges.
  • Alhuzali, A., Alloqmani, A., Aljabri, M., & Alharbi, F. (2025). In-Depth Analysis of Phishing Email Detection: Evaluating the Performance of Machine Learning and Deep Learning Models Across Multiple Datasets. Applied Sciences, 15(6), 3396. https://doi.org/10.3390/app15063396
    Compares various machine learning and deep learning detection models across datasets, offering recent performance benchmarks.

— Andrew


From Human Chatbots to Whale and Bird Talk: The Surprising Rise of Bio-Acoustic NLP in 2025

As a high school student passionate about computational linguistics, I find it amazing how the same technologies that power our everyday chatbots and voice assistants are now being used to decode animal sounds. This emerging area blends bioacoustics (the study of animal vocalizations) with natural language processing (NLP) and machine learning. Researchers are starting to treat animal calls almost like a form of language, analyzing them for patterns, individual identities, species classification, and even possible meanings.

Animal vocalizations do not use words the way humans do, but they frequently show structure, repetition, and context-dependent variation, features that remind us of linguistic properties in human speech.

A Highlight from ACL 2025: Monkey Voices Get the AI Treatment

One of the most interesting papers presented at the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), the leading conference in our field, focuses directly on this topic.

Paper title: “Acoustic Individual Identification of White-Faced Capuchin Monkeys Using Joint Multi-Species Embeddings”

Authors: Álvaro Vega-Hidalgo, Artem Abzaliev, Thore Bergman, Rada Mihalcea (University of Michigan)

What the paper covers

White-faced capuchin monkeys each have a unique vocal signature. Being able to identify which individual is calling is valuable for studying their social structures, kinship, and conservation efforts.

The main difficulty is the lack of large labeled datasets for wild or rare species. Human speech has massive annotated corpora, but animal data is much scarcer.

The researchers address this through cross-species pre-training, a transfer learning strategy. They take acoustic embedding models (essentially sound “fingerprints”) pre-trained on: (1) Extensive human speech data and (2) Large-scale bird call datasets.

These models are then applied to white-faced capuchin vocalizations, even though the original training never included capuchin sounds.
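The recipe, as I understand it, looks roughly like the sketch below: a frozen pre-trained acoustic encoder produces embeddings, and a small classifier is trained on top to identify individuals. The `embed` function is a placeholder of my own (faked with random vectors so the script runs); it is not the authors’ code or model.

```python
# Rough sketch of the transfer-learning idea, not the authors' implementation.
# `embed` stands in for a frozen encoder pre-trained on human speech or bird
# calls; it is faked with random vectors so the example runs end to end.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def embed(call_audio) -> np.ndarray:
    # Placeholder for a pre-trained acoustic embedding model.
    return rng.normal(size=128)

# Pretend we have 200 labeled capuchin calls from 5 individuals.
X = np.stack([embed(None) for _ in range(200)])
y = rng.integers(0, 5, size=200)  # individual IDs

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # small classifier on top
print("individual-ID accuracy:", clf.score(X_te, y_te))  # near chance here; real embeddings do better
```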

Key findings

  • Embeddings derived from human speech and bird calls transferred surprisingly well to monkey vocalizations.
  • Combining multi-species representations (joint embeddings) improved identification accuracy further.

This demonstrates how knowledge from one domain can help another distant one, similar to how learning one human language can make it easier to pick up a related one. It offers a practical solution to the data scarcity problem that often limits animal bioacoustics research.

This paper was one of 22 contributions from the University of Michigan’s Computer Science and Engineering group at ACL 2025, showing how far computational linguistics has expanded beyond traditional human text and speech.

Another ACL 2025 Contribution: Exploring Dog Communication

ACL 2025 also included “Toward Automatic Discovery of a Canine Phonetic Alphabet” by Theron S. Wang and colleagues. The work investigates the phonetic-like building blocks in dog vocalizations and aims to discover them automatically. This is an early step toward analyzing dog sounds in a more structured, language-inspired framework.

Why This Matters

  • Conservation applications — Automated systems can monitor endangered species like whales or rare birds continuously, reducing the need for long-term human fieldwork in remote locations.
  • Insights into animal communication — Researchers are beginning to test whether calls follow rule-based patterns or convey specific information (about food, threats, or social bonds), much like how humans use syntax and intonation.
  • Transfer of AI techniques — Models originally built for human speech transfer effectively to other species. New foundation models in 2025 (e.g., NatureLM-audio) even handle thousands of animal species and support natural language queries such as “What bird is calling here?”

While these ACL 2025 papers represent cutting-edge academic work, the broader field is gaining momentum, with related discussions appearing in events like the 2025 NeurIPS workshop on AI for Non-Human Animal Communication.

This area is growing rapidly thanks to better data availability and stronger models. In the coming years, we might see practical tools that help interpret bird alarm calls or monitor ocean ecosystems through whale vocalizations.

What do you think? Would you be excited to build a simple AI tool to analyze your pet’s sounds or contribute to dolphin communication research? Computational linguistics is moving far beyond chatbots. It is now helping us listen to the voices of the entire planet.

Thanks for reading. I’d love to hear your thoughts in the comments!

— Andrew


How AI and Computational Linguistics Are Unlocking Medieval Jewish History

On December 3, 2025, ACM TechNews featured a story about a groundbreaking use of artificial intelligence in historical and linguistic research. It pointed to an earlier Reuters report, “Vast trove of medieval Jewish records opened up by AI.” The article described a new project applying AI to the Cairo Geniza, a massive archive of medieval Jewish manuscripts that spans nearly one thousand years. These texts were preserved in a synagogue storeroom and contain records of daily life, legal matters, trade, personal letters, religious study, and community events.

The goal of the project is simple in theory and monumental in practice. Researchers are training an AI system to read, transcribe, and organize hundreds of thousands of handwritten documents. This would allow scholars to access the material far more quickly than traditional methods permit.


Handwriting Recognition for Historical Scripts

Computational linguistics plays a direct role in how machines learn to read ancient handwriting. AI models can be taught to detect character shapes, page layouts, and writing patterns even when the script varies from one writer to another or comes from a style no longer taught today. This helps the system replicate the work of experts who have spent years studying how historical scripts evolved.


Making the Text Searchable and Comparable

Once the handwriting is converted to text, another challenge begins. Historical manuscripts often use nonstandard spelling, abbreviations, and inconsistent grammar. Computational tools can normalize these differences, allowing researchers to search archives accurately and evaluate patterns that would be difficult to notice manually.


Extracting Meaning Through NLP

After transcription and normalization, natural language processing tools can identify names, dates, locations, and recurring themes in the documents. This turns raw text into organized data that supports historical analysis. Researchers can explore how people, places, and ideas were connected across time and geography.
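As a tiny illustration of what this step looks like in code (not the project’s actual pipeline, which works with Hebrew and Arabic scripts), here is spaCy’s off-the-shelf English NER run on an invented Geniza-style sentence:

```python
# Illustrative only: spaCy's small English model on an invented example.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
text = ("In 1140 a merchant of Fustat wrote to his partner in Alexandria "
        "about a shipment of flax bound for Sicily.")

doc = nlp(text)
for ent in doc.ents:
    # Each entity becomes a structured record: the text span plus a type label.
    print(ent.text, "->", ent.label_)  # e.g. DATE, GPE (places)
```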


Handling Multiple Languages and Scripts

The Cairo Geniza contains material written in Hebrew, Arabic, Aramaic, and Yiddish. A transcription system must recognize and handle multiple scripts, alphabets, and grammatical structures. Computational linguistics enables the AI to adapt to these differences so the dataset becomes accessible as a unified resource.


Restoring Damaged Manuscripts

Many texts are incomplete because of age and physical deterioration. Modern work in ancient text restoration uses machine learning models to predict missing letters or words based on context and surrounding information. This helps scholars reconstruct documents that might otherwise remain fragmented.


Why This Matters for Researchers and the Public

AI allows scholars to process these manuscripts on a scale that would not be feasible through manual transcription alone. Once searchable, the collection becomes a resource for historians, linguists, and genealogists. Connections between communities and individuals can be explored in ways that were not possible before. Articles about the project suggest that this could lead to a mapping of relationships similar to a historical social graph.

This technology also expands access beyond expert scholars. Students, teachers, local historians, and interested readers may one day explore the material in a clear and searchable form. If automated translation improves alongside transcription, the archive could become accessible to a global audience.


Looking Ahead

This project is a strong example of how computational linguistics can support the humanities. It shows how tools developed for modern language tasks can be applied to cultural heritage, historical research, and community memory. AI is not replacing the work of historians. Instead, it is helping uncover material that scholars would never have time to process on their own.

Projects like this remind us that the intersection of language and technology is not only changing the future. It is now offering a deeper look into the past.

— Andrew


AI Sycophancy: When Our Chatbots Say “Yes” Instead of “Why”

“I asked ChatGPT to check my argument and it just kept agreeing with me.”
“Gemini told me my logic was solid even when I knew it wasn’t.”
“Grok feels like a hype-man, not a thinking partner.”

These are the kinds of comments I keep seeing from my school friends who feel that modern AI tools are becoming too agreeable for their own good. Instead of challenging flawed reasoning or offering alternative perspectives, many chatbots default to affirmation. This behavior has a name: AI sycophancy. The term does not originate from me. It comes from recent research and ongoing conversations in the AI community, where scholars are identifying a growing tendency for AI systems to prioritize user approval over honest reasoning.

At first glance, this might feel harmless or even comforting. After all, who does not like being told they are right? But beneath that friendliness lies a deeper problem that affects how we learn, decide, and think.


What is AI Sycophancy?

AI sycophancy refers to a pattern in which an AI system aligns its responses too closely with a user’s expressed beliefs or desires, even when those beliefs conflict with evidence or logic. Rather than acting as an independent evaluator, the model becomes a mirror.

For example, a user might say, “I think this argument is correct. Do you agree?” and the model responds with enthusiastic confirmation instead of critical analysis. Or the system might soften disagreement so much that it effectively disappears. Recent research from Northeastern University confirms that this behavior is measurable and problematic. Their report, The AI industry has a problem: Chatbots are too nice, shows that when models alter their reasoning to match a user’s stance, their overall accuracy and rationality decline.
https://news.northeastern.edu/2025/11/24/ai-sycophancy-research/


Why Does It Exist?

Several forces contribute to the rise of AI sycophancy:

  • Training incentives and reward systems.
    Many models are optimized to be helpful, polite, and pleasant. When user satisfaction is a core metric, models learn that agreement often leads to positive feedback.
  • User expectations.
    People tend to treat chatbots as friendly companions rather than critical reviewers. When users express certainty, the model often mirrors that confidence instead of questioning it.
  • Alignment trade-offs.
    The Northeastern team highlights a tension between sounding human and being rational. In attempting to appear empathetic and affirming, the model sometimes sacrifices analytical rigor.
  • Ambiguous subject matter.
    In questions involving ethics, predictions, or subjective judgment, models may default to agreement rather than risk appearing confrontational or incorrect.

What Are the Impacts?

The consequences of AI sycophancy extend beyond mild annoyance.

  • Weakened critical thinking.
    Students who rely on AI for feedback may miss opportunities to confront their own misconceptions.
  • Lower reasoning quality.
    The Northeastern study found that adjusting answers to match user beliefs correlates with poorer logic and increased error rates.
  • Risk in high-stakes contexts.
    In healthcare, policy, or education, an overly agreeable AI can reinforce flawed assumptions and lead to harmful decisions.
  • False confidence.
    When AI consistently affirms users, it creates an illusion of correctness that discourages self-reflection.
  • Ethical concerns.
    A system that never challenges bias or misinformation becomes complicit in reinforcing it.

How to Measure and Correct It

Measuring sycophancy

Researchers measure sycophancy by observing how much a model shifts its answer after a user asserts a belief. A typical approach involves:

  • Presenting the model with a scenario and collecting its initial judgment.
  • Repeating the scenario alongside a strong user opinion or belief.
  • Comparing the degree to which the model’s stance moves toward the user’s position.
  • Evaluating whether the reasoning quality improves, stays stable, or deteriorates.

The greater the shift without supporting evidence, the higher the sycophancy score.
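Here is a toy version of that loop in Python. `ask_model` is a placeholder of my own, not any lab’s actual harness: it should call a chatbot and map its answer to a stance score between -1 (disagree) and +1 (agree).

```python
# Toy sketch of the measurement loop above. `ask_model` is an assumed
# placeholder; returning 0.0 keeps the example runnable without an API key.
def ask_model(prompt: str) -> float:
    return 0.0  # plug in a real LLM call and map its answer to [-1, 1]

def sycophancy_shift(scenario: str, user_belief: str) -> float:
    baseline = ask_model(f"{scenario}\nWhat is your judgment?")
    pressured = ask_model(
        f"{scenario}\nI am convinced that {user_belief}. What is your judgment?"
    )
    # A large move toward the user's stance, with no new evidence, signals sycophancy.
    return pressured - baseline

print(sycophancy_shift("A student claims this proof is valid.", "the proof is valid"))
# Averaging this shift over many scenario/belief pairs gives a sycophancy score.
```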


Correcting the behavior

Several strategies show promise:

  • Penalize agreement that lacks evidence during training.
  • Encourage prompts that demand critique or alternative views.
  • Require models to express uncertainty or justify reasoning steps.
  • Educate users to value disagreement as a feature rather than a flaw.
  • Use multi-agent systems where one model challenges another.
  • Continuously track and adjust sycophancy metrics in deployed systems.

Why This Matters to Me as a Student

As someone preparing to study computational linguistics and NLP, I want AI to help sharpen my thinking, not dull it. If my research assistant simply validates every claim I make, I risk building arguments that collapse under scrutiny. In chess, improvement only happens through strong opposition. The same is true for intellectual growth. Agreement without resistance is not growth. It is stagnation.

Whether I am analyzing Twitch language patterns or refining a research hypothesis, I need technology that questions me, not one that treats every idea as brilliant.


Final Thought

The Northeastern research reminds us that politeness is not the same as intelligence. A chatbot that constantly reassures us might feel supportive, but it undermines the very reason we turn to AI in the first place. We do not need machines that echo our beliefs. We need machines that help us think better.

AI should challenge us thoughtfully, disagree respectfully, and remain grounded in evidence. Anything less turns a powerful tool into a flattering reflection.

— Andrew


How Chatbots Understand Us: Exploring the Basics of Natural Language Processing (NLP)

If you’ve ever asked Siri a question, chatted with a customer support bot, or played around with ChatGPT, you’ve already seen natural language processing (NLP) in action. But have you ever wondered: How do these systems actually understand what I’m saying? That question is what first got me curious about NLP, and now, as a high school student diving into computational linguistics, I want to break it down for others who might be wondering too.


What Is NLP?

Natural Language Processing is a branch of artificial intelligence (AI) that helps computers understand, interpret, and generate human language. It allows machines to read text, hear speech, figure out what it means, and respond in a way that (hopefully) makes sense.

NLP is used in tons of everyday tools and apps, like:

  • Chatbots and virtual assistants (Siri, Alexa, Google Assistant)
  • Translation tools (Google Translate)
  • Grammar checkers (like Grammarly)
  • Sentiment analysis (used by companies to understand customer reviews)
  • Smart email suggestions (like Gmail’s autocomplete)

How Do Chatbots Understand Language?

Here’s a simplified view of what happens when you talk to a chatbot:

1. Text Input

You say something like: “What’s the weather like today?”
If it’s a voice assistant, your speech is first turned into text through speech recognition.

2. Tokenization

The text gets split into chunks called tokens (usually words, word pieces, or punctuation marks). So that sentence becomes:
[“What”, “’s”, “the”, “weather”, “like”, “today”, “?”]
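If you want to try tokenization yourself, a blank spaCy English pipeline (spaCy comes up again at the end of this post) reproduces the split above in a few lines:

```python
# Reproducing the token list above with spaCy; a blank English pipeline
# is enough for tokenization (no model download needed).
import spacy

nlp = spacy.blank("en")
doc = nlp("What's the weather like today?")
print([token.text for token in doc])
# ['What', "'s", 'the', 'weather', 'like', 'today', '?']
```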

3. Understanding Intent and Context

The chatbot has to figure out what you mean. Is this a question? A request? Does “weather” refer to the forecast or something else?

This part usually involves models trained on huge amounts of text data, which learn patterns of how people use language.

4. Generating a Response

Once the bot understands your intent, it decides how to respond. Maybe it retrieves information from a weather API or generates a sentence like “Today’s forecast is sunny with a high of 75°F.”

All of this happens in just a few seconds.


Some Key Concepts in NLP

If you’re curious to dig deeper into how this all works, here are a few beginner-friendly concepts to explore:

  • Syntax and Parsing: Figuring out sentence structure (nouns, verbs, grammar rules)
  • Semantics: Understanding meaning and context
  • Named Entity Recognition (NER): Detecting names, dates, locations in a sentence
  • Language Models: Tools like GPT or BERT that learn how language works from huge datasets
  • Word Embeddings: Representing words as vectors so computers can understand similarity (like “king” and “queen” being close together in vector space); a small numeric example follows this list
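Here is the word-embedding idea in miniature. The three-dimensional vectors below are made up for illustration; real embeddings have hundreds of dimensions and are learned from data.

```python
# Toy illustration of the "king/queen" idea with invented 3-D vectors.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

vectors = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.85, 0.75, 0.2]),
    "banana": np.array([0.1, 0.2, 0.9]),
}

print("king vs queen :", round(cosine(vectors["king"], vectors["queen"]), 3))
print("king vs banana:", round(cosine(vectors["king"], vectors["banana"]), 3))
# Similar words get similar vectors, so the first score is much higher.
```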

Why This Matters to Me

My interest in NLP and computational linguistics started with my nonprofit work at Student Echo, where we use AI to analyze student survey responses. Since then, I’ve explored research topics like sentiment analysis, LLMs vs. neural networks, and even co-authored a paper accepted at a NAACL 2025 workshop. I also use tools like Zotero to manage my reading and citations, something I wish I had known earlier.

What excites me most is how NLP combines computer science with human language. I’m especially drawn to the possibilities of using NLP to better understand online communication (like on Twitch) or help preserve endangered languages.


Final Thoughts

So the next time you talk to a chatbot, you’ll know there’s a lot going on behind the scenes. NLP is a powerful mix of linguistics and computer science, and it’s also a really fun space to explore as a student.

If you’re curious about getting started, try exploring Python, open-source NLP libraries like spaCy or NLTK, or even just reading research papers. It’s okay to take small steps. I’ve been there too. 🙂

— Andrew


The Collins Word of the Year and Why It Matters for Computational Linguistics

Every year, a single word captures the moment when language and culture meet. Sometimes it comes from politics, sometimes from technology, but it always tells a story about how people think and communicate. As someone drawn to both words and code, I see each new “Word of the Year” as more than a headline. It’s data, meaning, and evolution all at once.

As I prepare to study Computational Linguistics in college, I have been paying attention not only to algorithms and corpora but also to the ways language changes around us. One of the most interesting reflections of that change is the annual “Word of the Year” chosen by Collins Dictionary. In this post, I’ll review the past ten years of Collins’ selections, explain how the 2025 Word of the Year was chosen (including the shortlist), and discuss why this matters for computational linguistics.


Past Ten Years of Collins Word of the Year

Year | Word of the Year | Brief explanation
2016 | Brexit | Captured the UK’s vote to leave the EU and its wide political, social, and linguistic effects.
2017 | fake news | Reflected the rise of misinformation and debates about truth in media.
2018 | single-use | Highlighted environmental awareness and discussions about disposable culture.
2019 | climate strike | Described global youth activism inspired by Greta Thunberg and climate movements.
2020 | lockdown | Defined the year of the Covid-19 pandemic and its global restrictions.
2021 | NFT | Stood for “non-fungible token” and represented the emergence of digital assets and blockchain culture.
2022 | permacrisis | Described a long period of instability and uncertainty, fitting the global mood.
2023 | AI | Represented artificial intelligence becoming central to everyday conversation.
2024 | brat | Captured the confident, independent attitude popularized by youth culture and pop music.
2025 | vibe coding | Described the blending of language and technology through conversational code creation.

The 2025 Word of the Year: vibe coding

For 2025, Collins Dictionary selected vibe coding as its Word of the Year. The term refers to new software development practices that use natural language and artificial intelligence to create applications by describing what one wants rather than manually writing code. It describes a form of “coding by conversation” that bridges creativity and computation.

Source: Collins Dictionary Word of the Year 2025


How Collins Selects the Word of the Year

The Collins team monitors its extensive language database throughout the year. Using large-scale corpus analysis, they track words that rise sharply in frequency or reflect cultural, political, or technological change. The process includes:

  • Lexicographic monitoring: Editors and linguists identify new or trending words across print, social media, and digital sources.
  • Corpus analysis: Statistical tools measure frequency and context to see which words stand out (a toy version appears after this list).
  • Editorial review: The final decision balances data and cultural relevance to choose a word that captures the spirit of the year.
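Here is a toy version of the corpus-analysis step. Collins’ real monitoring runs over a corpus of billions of words; the handful of invented documents below only shows the shape of the computation.

```python
# Toy frequency tracking: count a candidate term per year in a tiny corpus.
docs_by_year = {
    2024: ["we paired with an ai assistant", "the release felt rushed"],
    2025: ["i built the site by vibe coding", "vibe coding all weekend",
           "is vibe coding real engineering"],
}

term = "vibe coding"
for year, docs in sorted(docs_by_year.items()):
    mentions = sum(doc.count(term) for doc in docs)
    total_words = sum(len(doc.split()) for doc in docs)
    print(year, f"{mentions} mentions of {term!r} in {total_words} words")
# A sharp year-over-year rise is the kind of signal lexicographers look for.
```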

Shortlist for 2025

In addition to vibe coding, this year’s shortlist includes aura farming, biohacking, broligarchy, clanker, coolcation, glaze, HENRY, micro-retirement, and taskmasking.

You can view the full list on the Collins website: https://www.collinsdictionary.com/us/woty


Why the Collins Word of the Year Matters for Computational Linguistics

As someone preparing to study Computational Linguistics, I find the Collins Word of the Year fascinating for several reasons:

  1. Language change in data
    Each year’s word shows how new vocabulary enters real-world language use. Computational linguistics often studies these changes through corpora to model meaning over time.
  2. Human-machine interaction
    Vibe coding reflects a growing trend where natural language acts as an interface between humans and technology. It is an example of how linguistic principles are now shaping software design.
  3. Semantic and cultural evolution
    The meanings of words like “brat” or “AI” evolve quickly in digital contexts. For computational linguists, tracking these semantic shifts supports research in language modeling and word embeddings.
  4. Lexicographic data as research input
    Collins’ approach mirrors computational methods. Their frequency-based analysis can inspire how we model and predict linguistic trends using data science.
  5. Pedagogical and research relevance
    New words like vibe coding demonstrate how emerging technology changes both everyday communication and the future topics of linguistic research. They show where language innovation meets computation.

Reflection

When I first read that “vibe coding” had been chosen as the 2025 Word of the Year, I couldn’t help thinking about how it perfectly represents where computational linguistics is heading. Language is no longer just a subject of study; it is becoming a tool for creation. What used to be a set of rigid commands is turning into natural conversation.

The term also reminds me that words are living data points. Each new entry in a dictionary records a shift in how people think and communicate. For future computational linguists, observing how dictionaries evolve gives insight into how models and algorithms should adapt too.

It’s easy to see the Word of the Year as a piece of pop culture, but it’s really a linguistic dataset in disguise. Every annual choice documents how society expresses what matters most at that moment, and that is what makes it so meaningful to study.


Sources and Links

Collins Dictionary Word of the Year 2025: https://www.collinsdictionary.com/us/woty

— Andrew


Learning from Industry: How Companies Evaluate LLMs

Over the past few years, large language models (LLMs) have been everywhere. From chatbots that help you book flights to tools that summarize long documents, companies are finding ways to use LLMs in real products. But success is not guaranteed. In fact, sometimes it goes very wrong. One widely reported example was Expedia’s chatbot giving instructions on how to make a Molotov cocktail (Cybernews Report). Another was Air Canada’s AI-powered chatbot providing incorrect information about bereavement fares (BBC Report). Mistakes like these show how important it is for industry practitioners to build strong evaluation systems for LLMs.

Recently, I read a blog post from GoDaddy’s engineering team about how they evaluate LLMs before putting them into real-world use (GoDaddy Engineering Blog). Their approach stood out to me because it was more structured than just running a few test questions. Here are the main lessons I took away:

  1. Tie evaluations to business outcomes
    Instead of treating testing as an afterthought, GoDaddy connects test data directly to golden datasets. These datasets are carefully chosen examples that represent what the business actually cares about.
  2. Use both classic and new evaluation methods
    Traditional machine learning metrics like precision and recall still matter. But GoDaddy also uses newer approaches like “LLM-as-a-judge,” where another model helps categorize specific errors. (A toy precision and recall check appears after this list.)
  3. Automate and integrate evaluation into development
    Evaluation isn’t just something you do once. GoDaddy treats it as part of a continuous integration pipeline. They expand their golden datasets, add new feedback loops, and refine their systems over time.
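To practice the second lesson myself, I can score predictions against a small hand-checked “golden” set with ordinary metrics. This is a toy sketch, not GoDaddy’s code; the labels are invented.

```python
# Toy golden-dataset check (invented labels), e.g. "does this ticket ask for a refund?"
from sklearn.metrics import precision_score, recall_score

golden_labels = [1, 0, 1, 1, 0, 0, 1, 0]  # hand-checked ground truth
model_labels  = [1, 0, 0, 1, 0, 1, 1, 0]  # what the LLM pipeline predicted

print("precision:", precision_score(golden_labels, model_labels))
print("recall:   ", recall_score(golden_labels, model_labels))
# Re-running this on every change is the "evaluation in the pipeline" idea.
```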

As a high school student, I’m not joining the tech industry tomorrow. Still, I think it’s important for me to pay attention to best practices like these. They show me how professionals handle problems that I might face later in my own projects. Even though my experiments with neural networks or survey sentiment analysis aren’t at the scale of Expedia, Air Canada, or GoDaddy, I can still practice connecting my evaluations to real outcomes, thinking about error types, and making testing part of my workflow.

The way I see it, learning industry standards now gives me a head start for the future. And maybe when I get to do college research or internships, I’ll already be used to thinking about evaluation in a systematic way rather than as an afterthought.

— Andrew


How Large Language Models Are Changing Relation Extraction in NLP

When you type a question into a search engine like “Who wrote Hamlet?” it does more than match keywords. It connects the dots between “Shakespeare” and “Hamlet,” identifying the relationship between a person and their work. This process of finding and labeling relationships in text is called relation extraction (RE). It powers everything from knowledge graphs to fact-checking systems.

In the past, relation extraction systems were built with hand-crafted rules or required thousands of annotated examples to train. Now, large language models (LLMs) such as GPT, T5, and LLaMA are making it possible to do relation extraction with far less data and more flexibility. In this post, I want to explore what relation extraction is, how LLMs are transforming it, and why this matters for anyone interested in the future of language technology.


The Old Way Versus the New Way

Relation extraction used to rely heavily on feature engineering. Researchers would first hand-craft linguistic or statistical features from the text, such as part-of-speech tags, dependency parses, named entity types, and the words appearing between two entities. They often built lexical patterns like “X, the capital of Y” or “X, founded by Y” to capture specific relationships. They also extracted syntactic paths (for example, the shortest path between two entities in a dependency tree) or used windowed word embeddings around the entities to encode local context. These features were then fed into classifiers like Support Vector Machines, logistic regression models, or decision trees to decide which relationship best described the entity pair. Later, neural networks like BERT and RoBERTa replaced much of this manual feature design by learning their own representations directly from text, but they still needed large, labeled datasets to achieve good performance.
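Here is a deliberately tiny caricature of that older pipeline: featurize the words between the two entities and train a linear classifier. The sentences and labels are invented, and real systems used far richer features and far more data.

```python
# Minimal caricature of the classic feature-based pipeline (toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# (words between entity 1 and entity 2, relation label)
examples = [
    ("the capital of", "capital_of"),
    (", capital of", "capital_of"),
    ("is the capital city of", "capital_of"),
    ("was founded by", "founded_by"),
    (", founded by", "founded_by"),
    ("which was founded by", "founded_by"),
]
texts, labels = zip(*examples)

vec = CountVectorizer(ngram_range=(1, 2))   # unigram and bigram features
X = vec.fit_transform(texts)
clf = LogisticRegression(max_iter=1000).fit(X, labels)

print(clf.predict(vec.transform(["the longtime capital of"]))[0])  # capital_of
```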

Today, LLMs can often handle relation extraction with just a prompt. Instead of training a model from scratch, you can ask an LLM directly: “Identify the relationship between ‘Einstein’ and ‘the theory of relativity.’” With the right instructions, the model can output the relation (“Einstein proposed the theory of relativity”) even if it has never seen that exact pair in training.
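In code, the prompt-based approach can be as small as the sketch below. `call_llm` is a placeholder of my own for whatever chat-model client you use; the prompt template is the real point.

```python
# Sketch of prompt-based relation extraction. `call_llm` is an assumed
# placeholder; a canned answer keeps the example runnable without an API.
def call_llm(prompt: str) -> str:
    return "relation: proposed (Einstein -> the theory of relativity)"

def extract_relation(sentence: str, entity1: str, entity2: str) -> str:
    prompt = (
        "Read the sentence and name the relationship between the two entities.\n"
        f"Sentence: {sentence}\n"
        f"Entity 1: {entity1}\nEntity 2: {entity2}\n"
        "Answer in the form: relation: <label> (<entity1> -> <entity2>)"
    )
    return call_llm(prompt)

print(extract_relation(
    "Einstein proposed the theory of relativity.",
    "Einstein", "the theory of relativity"))
```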

See the diagram showing “classic relation extraction pipeline” versus “LLM-based pipeline” below.


How LLMs Tackle Relation Extraction

LLMs have been trained on massive amounts of text, which gives them a broad sense of how entities and relationships co-occur. This makes them powerful at zero-shot and few-shot learning. In other words, you do not need to show them hundreds of examples of a relation type before they can identify it.

For example, a simple prompt could be:

Prompt: “Read the sentence: ‘Marie Curie discovered radium.’ What is the relationship between the entities?”
Model Output from ChatGPT-5 and Grok-4:

[ChatGPT-5] In the sentence “Marie Curie discovered radium”, the relationship between the entities is:

  • Marie Curie (the subject / agent) performed the action.
  • radium (the object / theme) is what was discovered.

This is a “discoverer–discovery” relationship (also called “person–invention/discovery” or “agent–object” in relation extraction). In other words, Marie Curie is the discoverer and radium is the thing discovered.

[Grok-4] In the sentence “Marie Curie discovered radium,” the relationship between the entities “Marie Curie” and “radium” is that of discoverer and discovered. Marie Curie, a scientist, is the person who identified or found the chemical element radium.

This ability to generalize makes LLMs more versatile, especially for relations in domains where labeled data is scarce, like medicine or law. However, it also introduces risks. LLMs may hallucinate relationships that are not actually in the text or mislabel subtle ones, so careful evaluation is still necessary.


Recent Research Highlights

A major paper, A Survey on Cutting-Edge Relation Extraction Techniques Based on Language Models (Diaz-Garcia & López, 2024), reviews 137 recent ACL papers (2020-2023) that use language models for relation extraction. It shows that BERT-based methods still lead many benchmarks while models like T5 are rising in few-shot and unseen-relation settings.

Other papers from ACL 2024 and 2025 explore how well LLMs handle unseen relation types, cross-domain relation extraction, and low-resource settings. These studies show steady improvements but also highlight open questions about factuality, bias, and consistency.


Why This Matters Beyond Academia

Relation extraction sits at the core of knowledge-driven applications. Building or updating a knowledge graph for a company’s internal documents, mapping patient histories in healthcare, or connecting laws to court cases in legal tech all depend on accurately identifying relationships between entities.

LLMs make it possible to automate these tasks more quickly. Instead of spending months labeling data, organizations can draft knowledge structures with an LLM, then have humans verify or refine the results. This speeds up research and decision-making while expanding access to insights that would otherwise stay hidden in text.


Challenges and Open Questions

While LLMs are powerful, they are not flawless. They may infer relationships that are plausible but incorrect, especially if the prompt is ambiguous. Evaluating relation extraction at scale is also difficult, because many relations are context-specific or only partially expressed. Bias in training data can also skew the relationships a model “sees” as likely or normal.

Researchers are now working on ways to add uncertainty estimates, retrieval-augmented methods (i.e., combining information retrieval with generative models to improve response accuracy and relevance), and better benchmarks to test how well models extract relations across different domains and languages.


My Take as a High Schooler Working in NLP

As someone who has built a survey analysis platform and published research papers about sentiment classification, I find relation extraction exciting because it can connect scattered pieces of information into a bigger picture. Specifically, for projects like my nonprofit Student Echo, a future system could automatically link student concerns to policy areas or resources.

At the same time, I am cautious. Seeing how easily LLMs generate answers reminds me that relationships in text are often subtle. Automating them risks oversimplifying complex realities. Still, the idea that a model can find and organize connections that would take a person hours to spot is inspiring and worth exploring.


Conclusion

Relation extraction is moving from hand-built rules and large labeled datasets to flexible, generalist large language models. This shift is making it easier to build knowledge graphs, extract facts, and understand text at scale. But it also raises new questions about reliability, fairness, and evaluation.

If you want to dig deeper, check out A Survey on Cutting-Edge Relation Extraction Techniques Based on Language Models (arXiv link) or browse ACL 2024–2025 papers on relation extraction. Watching how this field evolves over the next few years will be exciting, and I plan to keep following it for future blog posts.

— Andrew


Introduction to Zotero: Your Free Personal Research Assistant

At the beginning of this summer (2025), I learned about a tool that I wish I had discovered years ago. I hadn’t even heard of Zotero until my research collaborator, Computational Sociolinguist Dr. Sidney Wong, introduced it to me while we were working on our computational linguistics project analyzing Twitch data.

After exploring it and learning to use it for my current research, I now realize how effective and essential Zotero is for managing academic work. Honestly, I wish I could have used it for all my previous research projects.


What is Zotero?

Zotero is a free, easy-to-use tool that helps researchers at any level:

  • Collect sources such as journal articles, books, web pages, and more
  • Organize them into collections and tag them for easy retrieval
  • Annotate PDFs directly within the app with highlights and notes
  • Cite sources seamlessly in any citation style while writing papers
  • Share references and collections with collaborators

It’s like having a personal research assistant that keeps all your readings, citations, and notes organized in one place.


Why I Recommend Zotero for High School Students

As high school students, we often juggle multiple classes, club projects, competitions, and research interests. Zotero makes it easy to:

  • Manage research projects efficiently, especially when writing papers that require formal citations
  • Keep track of readings and annotate PDFs, so you don’t lose key insights
  • Collaborate with teammates or research mentors by sharing folders and annotations
  • Avoid citation mistakes, as it automatically generates references in APA, MLA, Chicago, and many other styles

My Experience Using Zotero

When Dr. Wong first recommended Zotero to me, I was a bit hesitant because I thought, “Do I really need another app?” But after installing it and importing my Twitch-related research papers, I quickly saw its value. Now, I can:

  • Search across all my papers by keyword or tag
  • Keep notes attached to specific papers so I never lose insights
  • Instantly generate BibTeX entries for LaTeX documents or formatted citations for my blog posts and papers

I wish I had known about Zotero earlier, especially during my survey sentiment analysis project and my work preparing research paper submissions. It would have saved me so much time managing citations and keeping literature organized.


Zotero vs. Other Reference Managers: Pros and Cons

Here is a quick comparison of Zotero vs. similar tools like Mendeley and EndNote based on my research and initial use:

Pros of Zotero

  • ✅ Completely free and open source with no premium restrictions on core features
  • ✅ Easy to use with a clean interface suitable for beginners
  • ✅ Browser integration for one-click saving of articles and webpages
  • ✅ Excellent plugin support for Word, LibreOffice, and Google Docs
  • ✅ Strong community support and development
  • ✅ Works well for group projects with shared libraries

Cons of Zotero

  • ❌ Limited built-in cloud storage for PDFs (300 MB free; need WebDAV or paid plan for more)
  • ❌ Not as widely used in certain STEM fields compared to Mendeley or EndNote
  • ❌ Slightly fewer advanced citation style editing features than EndNote

Compared to Mendeley

  • Mendeley offers 2 GB free storage and a slightly more modern PDF viewer, but it is owned by Elsevier and some users dislike its closed ecosystem.
  • Zotero, being open-source, is often preferred for transparency and community-driven development.

Compared to EndNote

  • EndNote is powerful and widely used in academia but is expensive (>$100 license), making it inaccessible for many high school students.
  • Zotero offers most of the core features for free with a simpler setup.

Final Thoughts

If you’re a high school student interested in research, I highly recommend checking out Zotero. It’s free, easy to set up, and can make your academic life so much more organized and efficient.

You can explore and download it at zotero.org. Let me know if you want a future blog post on how I set up my Zotero collections and notes for research projects.

— Andrew


Rethinking AI Bias: Insights from Professor Resnik’s Position Paper

I recently read Professor Philip Resnik’s thought-provoking position paper, “Large Language Models Are Biased Because They Are Large Language Models,” published in Computational Linguistics 51(3), which is available via open access. This paper challenges conventional perspectives on bias in artificial intelligence, prompting a deeper examination of the inherent relationship between bias and the foundational design of large language models (LLMs). Resnik’s primary objective is to stimulate critical discussion by arguing that harmful biases are an inevitable outcome of the current architecture of LLMs. The paper posits that addressing these biases effectively requires a fundamental reevaluation of the assumptions underlying the design of AI systems driven by LLMs.

What the paper argues

  • Bias is built into the very goal of an LLM. A language model tries to predict the next word by matching the probability patterns of human text. Those patterns come from people. People carry stereotypes, norms, and historical imbalances. If an LLM learns the patterns faithfully, it learns the bad with the good. The result is not a bug that appears once in a while. It is a direct outcome of the objective the model optimizes.
  • Models cannot tell “what a word means” apart from “what is common” or “what is acceptable.” Resnik uses a nurse example. Some facts are definitional (A nurse is a kind of healthcare worker). Other facts are contingent but harmless (A nurse is likely to wear blue clothing at work). Some patterns are contingent and harmful if used for inference (A nurse is likely to wear a dress to a formal occasion). Current LLMs do not have an internal line that separates meaning from contingent statistics or that flags the normative status of an inference. They just learn distributions.
  • Reinforcement Learning from Human Feedback (RLHF) and other mitigations help on the surface, but they have limits. RLHF tries to steer a pre-trained model toward safer outputs. The process relies on human judgments that vary by culture and time. It also has to keep the model close to its pretraining, or the model loses general ability. That tradeoff means harmful associations can move underground rather than disappear. Some studies even find covert bias remains after mitigation (Gallegos et al. 2024; Hofmann et al. 2024). The analogy that comes to mind is a balloon: squeeze it in one place and it bulges in another.
  • The root cause is a hard-core, distribution-only view of language. When meaning is treated as “whatever co-occurs with what,” the model has no principled way to encode norms. The paper suggests rethinking foundations. One direction is to separate stable, conventional meaning (like word sense and category membership) from contextual or conveyed meaning (which is where many biases live). Another idea is to modularize competence, so that using language in socially appropriate ways is not forced to emerge only from next-token prediction. None of this is easy, but it targets the cause rather than only tuning symptoms.

Why this matters

Resnik is not saying we should give up. He is saying that quick fixes will not fully erase harm when the objective rewards learning whatever is frequent in human text. If we want models that reason with norms, we need objectives and representations that include norms, not only distributions.

Conclusion

This paper offers a clear message. Bias is not only a content problem in the data. It is also a design problem in how we define success for our models. If the goal is to build systems that are both capable and fair, then the next steps should focus on objectives, representations, and evaluation methods that make room for norms and constraints. That is harder than prompt tweaks, but it is the kind of challenge that can move the field forward.

Link to the paper: Large Language Models Are Biased Because They Are Large Language Models

— Andrew
