
Language models for the analysis of and interaction with climate change documents
Part of
- Tackling Climate Change with Machine Learning
Elena Volkanovska
Journal:

Environmental Data Science / Volume 4 / 2025

Published online by Cambridge University Press:

12 December 2025, e51
- Article
- Open access
Language models (LMs) have attracted the attention of researchers from the natural language processing (NLP) and machine learning (ML) communities working in specialized domains, including climate change. NLP and ML practitioners have been making efforts to reap the benefits of LMs of various sizes, including large language models, in order to both simplify and accelerate the processing of large collections of text data, and in doing so, help climate change stakeholders to gain a better understanding of past and current climate-related developments, thereby staying on top of both ongoing changes and increasing amounts of data. This paper presents a brief history of language models and traces how LMs evolved from their beginnings into an emerging technology for analysing and interacting with texts in the specialized domain of climate change. The paper reviews existing domain-specific LMs and systems based on general-purpose large language models for analysing climate change data, with special attention being paid to the LMs' and LM-based systems' functionalities, intended use and audience, architecture, the data used in their development, the applied evaluation methods, and their accessibility. The paper concludes with a brief overview of potential avenues for future research vis-à-vis the advantages and disadvantages of deploying LMs and LM-based solutions in a high-stakes scenario such as climate change research. For the convenience of readers, explanations of specialized terms used in NLP and ML are provided.

StudyTypeTeller—Large language models to automatically classify research study types for systematic reviews
Simona Emilova Doneva, Shirin de Viragh, Hanna Hubarava, Stefan Schandelmaier, Matthias Briel, Benjamin Victor Ineichen
Journal:

Research Synthesis Methods / Volume 16 / Issue 6 / November 2025

Published online by Cambridge University Press:

11 September 2025, pp. 1005-1024
- Article
- Open access
Abstract screening, a labor-intensive aspect of systematic review, is increasingly challenging due to the rising volume of scientific publications. Recent advances suggest that generative large language models like generative pre-trained transformer (GPT) could aid this process by classifying references into study types such as randomized-controlled trials (RCTs) or animal studies prior to abstract screening. However, it is unknown how these GPT models perform in classifying such scientific study types in the biomedical field. Additionally, their performance has not been directly compared with earlier transformer-based models like bidirectional encoder representations from transformers (BERT). To address this, we developed a human-annotated corpus of 2,645 PubMed titles and abstracts, annotated for 14 study types, including different types of RCTs and animal studies, systematic reviews, study protocols, case reports, as well as in vitro studies. Using this corpus, we compared the performance of GPT-3.5 and GPT-4 in automatically classifying these study types against established BERT models. Our results show that fine-tuned pretrained BERT models consistently outperformed GPT models, achieving F1-scores above 0.8, compared to approximately 0.6 for GPT models. Advanced prompting strategies did not substantially boost GPT performance. In conclusion, these findings highlight that, even though GPT models benefit from advanced capabilities and extensive training data, their performance in niche tasks like scientific multi-class study classification is inferior to smaller fine-tuned models. Nevertheless, the use of automated methods remains promising for reducing the volume of records, making the screening of large reference libraries more feasible. Our corpus is openly available and can be used to harness other natural language processing (NLP) approaches.
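For readers comparing the reported F1-scores, the sketch below shows one way to compute a macro-averaged F1 over several study-type classes. The abstract does not state which averaging the authors used, so macro averaging is an assumption here, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted class gets a false positive
            fn[gold] += 1  # the true class gets a false negative
    per_class = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 for every label
        # because each label occurs at least once in y_true or y_pred
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom)
    return sum(per_class) / len(per_class)


# Hypothetical toy labels, not taken from the StudyTypeTeller corpus.
gold = ["RCT", "RCT", "animal", "review"]
pred = ["RCT", "animal", "animal", "review"]
print(round(macro_f1(gold, pred), 4))  # 0.7778
```

Macro averaging weights rare study types equally with common ones, which matters when some of the 14 classes have few examples; a micro or weighted average would be dominated by the frequent classes.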

A team of three: the role of generative AI in the development of design automation systems for complex products
Alejandro Pradas Gomez, Maximilian Kretzschmar, Kristin Paetzold-Byhain, Ola Isaksson
Journal:

Proceedings of the Design Society / Volume 5 / August 2025

Published online by Cambridge University Press:

27 August 2025, pp. 309-318
- Article
- Open access
Given the rise of Generative AI and Large Language Models (LLMs), there is high interest in their use in the engineering design domain as well. Current research approaches fail to leverage LLMs' new orchestration capabilities and use LLMs in ways that expose their inherent weaknesses. We present a conceptual model to visualize the contribution of LLMs to design tasks and distribute ownership in the design activities: the triangle of design responsibility. A literature review of the design engineering field presents current uses of LLMs in this community. The model's comprehensibility is validated with industry practitioners via a survey. We identify future research directions in the field of complex product design. We hope that this model helps design automation developers, researchers and industry practitioners to position and assign responsibility effectively in their design automation implementations.

Small language models enable rapid and accurate extraction of structured data from unstructured text: An example with plants and their specialized metabolites
Lucas Busta, Alan R. Oyler
Journal:

Quantitative Plant Biology / Volume 6 / 2025

Published online by Cambridge University Press:

25 July 2025, e26
- Article
- Open access
Transformer-based large language models are receiving considerable attention because of their ability to analyse scientific literature. Small language models (SLMs), however, also have potential in this area as they have smaller compute footprints and allow users to keep data in-house. Here, we quantitatively evaluate the ability of SLMs to: (i) score references according to project-specific relevance and (ii) extract and structure data from unstructured sources (scientific abstracts). By comparing SLMs' outputs against those of a human on hundreds of abstracts, we found that (i) SLMs can effectively filter literature and extract structured information relatively accurately (error rates as low as 10%), but not with perfect yield (as low as 50% in some cases), (ii) there are tradeoffs between accuracy, model size and computing requirements and (iii) clearly written abstracts are needed to support accurate data extraction. We recommend advanced prompt engineering techniques, full-text resources and model distillation as future directions.

How Linguistics Learned to Stop Worrying and Love the Language Models
Richard Futrell, Kyle Mahowald
Journal:

Behavioral and Brain Sciences / Accepted manuscript

Published online by Cambridge University Press:

24 July 2025, pp. 1-98
- Article
Language models can produce fluent, grammatical text. Nonetheless, some maintain that language models don’t really learn language and also that, even if they did, that would not be informative for the study of human learning and processing. On the other side, there have been claims that the success of LMs obviates the need for studying linguistic theory and structure. We argue that both extremes are wrong. LMs can contribute to fundamental questions about linguistic structure, language processing, and learning. They force us to rethink arguments and ways of thinking that have been foundational in linguistics. While they do not replace linguistic structure and theory, they serve as model systems and working proofs of concept for gradient, usage-based approaches to language. We offer an optimistic take on the relationship between language models and linguistics.

Detection avoidance techniques for large language models
Sinclair Schneider, Florian Steuber, João A.G. Schneider, Gabi Dreo Rodosek
Journal:

Data & Policy / Volume 7 / 2025

Published online by Cambridge University Press:

10 March 2025, e29
- Article
- Open access
The increasing popularity of large language models has not only led to widespread use but has also brought various risks, including the potential for systematically spreading fake news. Consequently, the development of classification systems such as DetectGPT has become vital. These detectors are vulnerable to evasion techniques, as demonstrated in an experimental series: systematic changes to the generative models' temperature proved shallow-learning detectors to be the least reliable (Experiment 1). Fine-tuning the generative model via reinforcement learning circumvented BERT-based detectors (Experiment 2). Finally, rephrasing led to a >90% evasion of zero-shot detectors like DetectGPT, although texts stayed highly similar to the original (Experiment 3). A comparison with existing work highlights the better performance of the presented methods. Possible implications for society and further research are discussed.
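Experiment 1 varies the generative model's sampling temperature. As a rough illustration of what that parameter does, the following sketch applies temperature scaling to a toy list of next-token logits before softmax sampling; the logits and function names are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.1, 1.0, 100.0):
    # low temperature -> nearly one-hot; high temperature -> nearly uniform
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
print(sample_token(logits, 1.0, random.Random(0)))
```

Lower temperatures make generated text more predictable and higher ones more varied, which is why shifting the temperature changes the statistical fingerprint that a detector relies on.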

Can machine learning help accelerate article screening for systematic reviews? Yes, when article separability in embedding space is high
Farhan Ali, Amanda Swee-Ching Tan, Serena Jun-Wei Wang
Journal:

Research Synthesis Methods / Volume 16 / Issue 1 / January 2025

Published online by Cambridge University Press:

10 March 2025, pp. 194-210
- Article
- Open access
Systematic reviews play important roles, but the manual effort involved can be time-consuming given the growing literature. There is a need to use and evaluate automated strategies to accelerate systematic reviews. Here, we comprehensively tested machine learning (ML) models from classical and deep learning model families. We also assessed the performance of prompt engineering via few-shot learning of GPT-3.5 and GPT-4 large language models (LLMs). We further attempted to understand when ML models can help automate screening. These ML models were applied to actual datasets of systematic reviews in education. Results showed that the performance of classical and deep ML models varied widely across datasets, ranging from 1.2 to 75.6% of work saved at 95% recall. LLM prompt engineering produced similarly wide performance variation. We searched for various indicators of whether and how ML screening can help. We discovered that the separability of clusters of relevant versus irrelevant articles in high-dimensional embedding space can strongly predict whether ML screening can help (overall R = 0.81). This simple and generalizable heuristic applied well across datasets and different ML model families. In conclusion, ML screening performance varies tremendously, but researchers and software developers can consider using our cluster separability heuristic in various ways in an ML-assisted screening pipeline.
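The "work saved at 95% recall" figure is commonly reported as WSS@95. The sketch below uses one common formulation, WSS = (N - k)/N - (1 - R), where k is the number of top-ranked articles a reviewer must screen to reach recall R; the paper may use a variant, and the function name and toy data are hypothetical.

```python
import math


def wss_at_recall(scores, labels, target_recall=0.95):
    """Work saved over sampling at a target recall.

    Articles are screened in descending score order until the target share of
    relevant articles (labels == 1) has been found; then
    WSS = (N - k) / N - (1 - R), with k articles screened and R the target recall.
    """
    n = len(labels)
    n_relevant = sum(labels)
    if n == 0 or n_relevant == 0 or len(scores) != n:
        raise ValueError("need matching, non-empty inputs with >= 1 relevant article")
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    # the small epsilon guards against float noise in target_recall * n_relevant
    needed = math.ceil(target_recall * n_relevant - 1e-9)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    found = 0
    for screened, idx in enumerate(order, start=1):
        found += labels[idx]
        if found >= needed:
            return (n - screened) / n - (1 - target_recall)
    raise RuntimeError("target recall not reached")


# Hypothetical model scores for 10 articles, 2 of which are relevant.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
print(round(wss_at_recall(scores, labels), 2))  # 0.55
```

Subtracting (1 - R) compares the ranking against random screening, which would need to read a fraction R of the articles on average to reach recall R.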

22 - Construction Grammar and Language Models
from Part VI - Constructional Applications
- By Harish Tayyar Madabushi, Laurence Romain, Petar Milin, Dagmar Divjak
Edited by Mirjam Fried, Univerzita Karlova, Kiki Nikiforidou, University of Athens, Greece
Book:

The Cambridge Handbook of Construction Grammar

Published online:

30 January 2025

Print publication:

06 February 2025, pp 572-595
- Chapter
Summary

Recent progress in deep learning and natural language processing has given rise to powerful models that are primarily trained on a cloze-like task and show some evidence of having access to substantial linguistic information, including some constructional knowledge. This groundbreaking discovery presents an exciting opportunity for a synergistic relationship between computational methods and Construction Grammar research. In this chapter, we explore three distinct approaches to the interplay between computational methods and Construction Grammar: (i) computational methods for text analysis, (ii) computational Construction Grammar, and (iii) deep learning models, with a particular focus on language models. We touch upon the first two approaches as a contextual foundation for the use of computational methods before providing an accessible, yet comprehensive overview of deep learning models, which also addresses reservations construction grammarians may have. Additionally, we delve into experiments that explore the emergence of constructionally relevant information within these models while also examining the aspects of Construction Grammar that may pose challenges for these models. This chapter aims to foster collaboration between researchers in the fields of natural language processing and Construction Grammar. By doing so, we hope to pave the way for new insights and advancements in both these fields.
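The "cloze-like task" mentioned above is usually masked language modelling. The following sketch builds training inputs in the style popularized by BERT (select about 15% of tokens; replace 80% of those with a mask symbol, 10% with a random token, and leave 10% unchanged). It illustrates the general technique only and is not drawn from the chapter; the function name, tokens and vocabulary are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_for_cloze(tokens, vocab, mask_prob=0.15, rng=None):
    """BERT-style masking: returns (model inputs, prediction targets).

    Each token is selected with probability mask_prob. A selected token is
    replaced by MASK 80% of the time, by a random vocabulary token 10% of the
    time, and left unchanged 10% of the time; the model must predict the
    original token at every selected position.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    targets = [None] * len(tokens)  # None marks positions excluded from the loss
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue
        targets[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)
        # otherwise keep the original token in place
    return inputs, targets


sentence = "the constructions emerge from usage".split()
vocab = ["the", "a", "usage", "grammar", "emerge"]  # hypothetical toy vocabulary
print(mask_for_cloze(sentence, vocab, mask_prob=0.3, rng=random.Random(1)))
```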

Clinical information extraction for lower-resource languages and domains with few-shot learning using pretrained language models and prompting
Phillip Richter-Pechanski, Philipp Wiesenbach, Dominic Mathias Schwab, Christina Kiriakou, Nicolas Geis, Christoph Dieterich, Anette Frank
Journal:

Natural Language Processing / Volume 31 / Issue 5 / September 2025

Published online by Cambridge University Press:

31 October 2024, pp. 1210-1233
- Article
- Open access
A vast amount of clinical data are still stored in unstructured text. Automatic extraction of medical information from these data poses several challenges: high costs of clinical expertise, restricted computational resources, strict privacy regulations, and limited interpretability of model predictions. Recent domain adaptation and prompting methods using lightweight masked language models showed promising results with minimal training data and allow for application of well-established interpretability methods. We are the first to present a systematic evaluation of advanced domain-adaptation and prompting methods in a lower-resource medical domain task, performing multi-class section classification on German doctor's letters. We evaluate a variety of models, model sizes, (further-pre)training and task settings, and conduct extensive class-wise evaluations supported by Shapley values to validate the quality of small-scale training data and to ensure interpretability of model predictions. We show that in few-shot learning scenarios, a lightweight, domain-adapted pretrained language model, prompted with just 20 shots per section class, outperforms a traditional classification model, increasing accuracy from 48.6% to 79.1%. By using Shapley values for model selection and training data optimization, we could further increase accuracy up to 84.3%. Our analyses reveal that pretraining of masked language models on general-language data is important to support successful domain transfer to medical language, so that further-pretraining of general-language models on domain-specific documents can outperform models pretrained on domain-specific data only. Our evaluations show that applying prompting based on general-language pretrained masked language models combined with further-pretraining on medical-domain data achieves significant improvements in accuracy beyond traditional models with minimal training data.
Further performance improvements and interpretability of results can be achieved using interpretability methods such as Shapley values. Our findings highlight the feasibility of deploying powerful machine learning methods in clinical settings and can serve as a process-oriented guideline for lower-resource languages and domains, such as clinical information extraction projects.

ChatGPT: A Case Study on Copyright Challenges for Generative Artificial Intelligence Systems
Nicola Lucchi
Journal:

European Journal of Risk Regulation / Volume 15 / Issue 3 / September 2024

Published online by Cambridge University Press:

29 August 2023, pp. 602-624
- Article
- Open access
This article focuses on copyright issues pertaining to generative artificial intelligence (AI) systems, with particular emphasis on the ChatGPT case study as a primary exemplar. In order to generate high-quality outcomes, generative AI systems require substantial quantities of training data, which may frequently comprise copyright-protected information. This prompts inquiries into the legal principles of fair use, the creation of derivative works and the lawfulness of data gathering and utilisation. The utilisation of input data for the purpose of training and enhancing AI models presents significant concerns regarding potential violations of copyright. This paper offers suggestions for safeguarding the interests of copyright holders and competitors, while simultaneously addressing legal challenges and expediting the advancement of AI technologies. This study analyses the ChatGPT platform as a case example to explore the necessary modifications that copyright regulations must undergo to adequately tackle the intricacies of authorship and ownership in the realm of AI-generated creative content.

Chapter 10 - Discussion and Conclusions
from Part IV - Discussion
Elnora ten Wolde, Universität Graz, Austria
Book:

The English Binominal Noun Phrase

Published online:

29 June 2023

Print publication:

13 July 2023, pp 277-285
- Chapter
Summary

The final chapter briefly summarizes the key findings in Parts I and II and compares and discusses the strengths and weaknesses of the models discussed in Chapters 8 and 9. In particular, it is argued that one of the fundamental differences between these two approaches is the information they seek to model. FDG offers defined primitives and combinatorial constraints that function as a basis of analysis and constrain possible outcomes. In the context of this project, this means that FDG allows us to capture the distinction between the six of-binominal categories discussed by using the language-specific tools that already exist in the model. However, the FDG account lacks a network view of these phenomena. The CxG analysis offers a network perspective on the changes in constructions and links these constructions to more entrenched patterns in the language systems. This means that it can capture the co-evolution of constructions in the language system. Finally, this chapter discusses some remaining open questions and suggests potential avenues of future research.

Morphosyntactic probing of multilingual BERT models
Judit Acs, Endre Hamerlik, Roy Schwartz, Noah A. Smith, Andras Kornai
Journal:

Natural Language Engineering / Volume 30 / Issue 4 / July 2024

Published online by Cambridge University Press:

25 May 2023, pp. 753-792
- Article
- Open access
We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label, derived from the Universal Dependencies treebanks. We find that pre-trained Transformer models (mBERT and XLM-RoBERTa) learn features that attain strong performance across these tasks. We then apply two methods to locate, for each probing task, where the disambiguating information resides in the input. The first is a new perturbation method that “masks” various parts of context; the second is the classical method of Shapley values. The most intriguing finding that emerges is a strong tendency for the preceding context to hold more information relevant to the prediction than the following context.
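Shapley values, the second attribution method named above, assign each "player" (here, a region of the input such as the context before or after the target word) its average marginal contribution to a value function. A minimal exact computation follows, assuming a hypothetical value function that returns, for example, probe accuracy when only the given regions are visible; the region names are illustrative, and exact enumeration is exponential in the number of players, so it only suits a handful of regions.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition of the other players.

    phi_i = sum over S not containing i of
            |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S))
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical value function: the probe succeeds only if it sees both the
# preceding context and the target word; the following context adds nothing.
def probe_accuracy(visible):
    return 1.0 if {"preceding", "target"} <= visible else 0.0


print(shapley_values(["preceding", "target", "following"], probe_accuracy))
```

In this toy game the credit is split evenly between the preceding context and the target, and the following context receives zero, mirroring the kind of asymmetry the abstract reports.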

Sentence encoding for Dialogue Act classification
Nathan Duran, Steve Battle, Jim Smith
Journal:

Natural Language Engineering / Volume 29 / Issue 3 / May 2023

Published online by Cambridge University Press:

02 November 2021, pp. 794-823
- Article
- Open access
In this study, we investigate the process of generating single-sentence representations for the purpose of Dialogue Act (DA) classification, including several aspects of text pre-processing and input representation which are often overlooked or underreported within the literature, for example, the number of words to keep in the vocabulary or input sequences. We assess each of these with respect to two DA-labelled corpora, using a range of supervised models, which represent those most frequently applied to the task. Additionally, we compare context-free word embedding models with transfer learning via pre-trained language models, including several based on the transformer architecture, such as Bidirectional Encoder Representations from Transformers (BERT) and XLNET, which have thus far not been widely explored for the DA classification task. Our findings indicate that these text pre-processing considerations do have a statistically significant effect on classification accuracy. Notably, we found that viable input sequence lengths and vocabulary sizes can be much smaller than is typically used in DA classification experiments, yielding no significant improvements beyond certain thresholds. We also show that in some cases the contextual sentence representations generated by language models do not reliably outperform supervised methods. However, BERT and its derivative models do represent a significant improvement over supervised approaches and much of the previous work on DA classification.
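Two of the preprocessing choices the study examines, vocabulary size and input sequence length, can be made concrete with a short sketch: keep only the `max_vocab` most frequent words, map everything else to an unknown-word id, and truncate or pad each utterance to `max_len` tokens. The id conventions (0 for padding, 1 for unknown), function names and toy utterances are assumptions for illustration.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids (an assumed convention)


def build_vocab(utterances, max_vocab):
    """Keep only the max_vocab most frequent words; ids start after PAD/UNK."""
    counts = Counter(word for utt in utterances for word in utt)
    kept = [word for word, _ in counts.most_common(max_vocab)]
    return {word: idx for idx, word in enumerate(kept, start=2)}


def encode(utterance, vocab, max_len):
    """Map words to ids (UNK if out of vocabulary), then truncate/pad to max_len."""
    ids = [vocab.get(word, UNK) for word in utterance[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical, pre-tokenized dialogue utterances.
utterances = [["okay", "thanks"], ["okay", "yeah", "right"], ["okay", "thanks", "bye"]]
vocab = build_vocab(utterances, max_vocab=2)
print(vocab)  # {'okay': 2, 'thanks': 3}
print(encode(["okay", "yeah", "thanks", "bye"], vocab, max_len=3))  # [2, 1, 3]
print(encode(["thanks"], vocab, max_len=3))  # [3, 0, 0]
```

Shrinking `max_vocab` or `max_len` reduces model input size; the study's finding is that, beyond certain thresholds, larger values bring no significant accuracy gains.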

Joint optimization on decoding graphs using minimum classification error criterion
Abdelaziz A. Abdelhamid, Waleed H. Abdulla
Journal:

APSIPA Transactions on Signal and Information Processing / Volume 3 / 2014

Published online by Cambridge University Press:

29 April 2014, e6

Print publication:

2014
- Article
- Open access
Motivated by the inherent correlation between the speech features and their lexical words, we propose in this paper a new framework for learning the parameters of the corresponding acoustic and language models jointly. The proposed framework is based on discriminative training of the models' parameters using minimum classification error criterion. To verify the effectiveness of the proposed framework, a set of four large decoding graphs is constructed using weighted finite-state transducers as a composition of two sets of context-dependent acoustic models and two sets of n-gram-based language models. The experimental results conducted on this set of decoding graphs validated the effectiveness of the proposed framework when compared with four baseline systems based on maximum likelihood estimation and separate discriminative training of acoustic and language models in benchmark testing of two speech corpora, namely TIMIT and RM1.
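The minimum classification error (MCE) criterion is often formulated as a smoothed misclassification measure passed through a sigmoid. The sketch below follows that general textbook form, not the paper's exact formulation, which the abstract does not give; the scores stand for hypothetical discriminant values (for instance, log-likelihoods of the correct and competing hypotheses), and `eta`, `gamma` and `theta` are illustrative smoothing parameters.

```python
import math


def misclassification_measure(scores, correct, eta=1.0):
    """d = -g_k + (1/eta) * log( mean over j != k of exp(eta * g_j) ).

    d < 0 means the correct class k outscores a smoothed maximum of its
    competitors; larger eta makes the smoothing closer to a hard max.
    """
    competitors = [eta * g for j, g in enumerate(scores) if j != correct]
    if not competitors:
        raise ValueError("need at least one competing class")
    peak = max(competitors)  # log-mean-exp, shifted by the max for stability
    log_mean_exp = peak + math.log(
        sum(math.exp(c - peak) for c in competitors) / len(competitors)
    )
    return -scores[correct] + log_mean_exp / eta


def mce_loss(scores, correct, eta=1.0, gamma=1.0, theta=0.0):
    """Smoothed 0-1 loss: 1 / (1 + exp(-gamma * d + theta)), computed stably."""
    z = gamma * misclassification_measure(scores, correct, eta) - theta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


scores = [3.0, 0.0, 0.0]  # hypothetical discriminant scores for 3 hypotheses
print(mce_loss(scores, correct=0))  # small: the correct hypothesis wins
print(mce_loss(scores, correct=1))  # above 0.5: a competitor wins
```

Because the sigmoid is differentiable, this loss can be minimized by gradient-based updates of the acoustic and language model parameters, unlike a hard count of errors.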
