
Expanding conceptual histories: using contextualized word embeddings for the history and philosophy of the virtual particle concept

Published online by Cambridge University Press:  01 October 2025

Michael Zichert*
Affiliation:
Geschichte und Philosophie der modernen Naturwissenschaft, Technische Universität Berlin, Germany
Arno Simons
Affiliation:
Geschichte und Philosophie der modernen Naturwissenschaft, Technische Universität Berlin, Germany
Adrian Wüthrich
Affiliation:
Geschichte und Philosophie der modernen Naturwissenschaft, Technische Universität Berlin, Germany
*Corresponding author: Michael Zichert; Email: m.zichert@tu-berlin.de

Abstract

This article explores the potential of large language models (LLMs), particularly through the use of contextualized word embeddings, to trace the evolution of scientific concepts. It thus aims to extend the potential of LLMs, currently transforming much of humanities research, to the specialized field of history and philosophy of science. Using the concept of the virtual particle – a fundamental idea in understanding elementary particle interactions – as a case study, we domain-adapted a pretrained Bidirectional Encoder Representations from Transformers model on nearly a century of Physical Review publications. By employing semantic change detection techniques, we examined shifts in the meaning and usage of the term “virtual.” Our analysis reveals that the dominant meaning of “virtual” stabilized after the 1950s, aligning with the formalization of the virtual particle concept, while the polysemy of “virtual” continued to grow. Augmenting these findings with dependency parsing and qualitative analysis, we identify pivotal historical transitions in the term’s usage. In a broader methodological discussion, we address challenges such as the complex relationship between words and concepts, the influence of historical and linguistic biases in datasets, and the exclusion of mathematical formulas from text-based approaches.

Information

Type
Research Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open materials
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Plain language summary

This study explores how large language models (LLMs) can be used to track how scientific ideas and concepts change over time – a method known as digital conceptual history. LLMs are a type of artificial intelligence or statistical model that can be used to analyze, represent and generate text, among other tasks. The goal of this article is to demonstrate how these models can be applied to humanities research broadly and to the history and philosophy of science (HPS) specifically. As a case study in the field of physics, our research focuses on the concept of the virtual particle, a key idea in understanding how fundamental particles interact. We analyze data from nearly a century of Physical Review publications, a highly influential family of physics journals. Using a specific LLM called Bidirectional Encoder Representations from Transformers (BERT), we generate contextualized word embeddings (CWEs) for every occurrence of the term “virtual” in these articles. Word embeddings are representations of words as numerical vectors in a mathematical space, designed to capture the meaning of the words. CWEs go a step further by adjusting these representations based on the context of the word within a sentence or document.

We then use these word embeddings to employ semantic change detection techniques to identify, interpret and assess how the meaning and usage of the term “virtual” changed over time. Our analysis reveals that the dominant meaning of “virtual” stabilizes after the 1950s, aligning with the formalization of the virtual particle concept around the same time, following significant contributions by Richard Feynman and Freeman J. Dyson, two highly influential physicists in the field of particle physics. At the same time, the polysemy of “virtual” – that is, the coexistence of multiple meanings for a single word form – continues to grow, suggesting an expanding usage of the term in different meanings. To validate our findings, we use another computational method, dependency parsing, to identify the nouns with which the adjective “virtual” is most frequently associated. We also compare our findings to recent studies in the history of physics. We conclude the article with a discussion of how robust our findings are and how well suited our computational methods are to analyzing the development of scientific concepts.


Introduction

Large language models (LLMs) have the potential to significantly alter the landscape of humanities research, and with it, the history and philosophy of science (HPS). Their exceptional performance across diverse natural language processing (NLP) tasks positions them as powerful tools for addressing a range of challenges in HPS (see Simons, Zichert, and Wüthrich 2025 for a detailed discussion). For example, LLMs facilitate new ways of dealing with sources, enabling the retrieval, classification and contextual engagement with vast corpora of scientific texts, particularly through advanced retrieval-augmented generation (RAG) pipelines (Gao et al. 2024; Zhu et al. 2025). They also support the modeling of thematic structures in science by leveraging novel techniques such as BERTopic (Grootendorst 2022), which uses embeddings to uncover latent topics and thematic patterns across fields and time periods. Additionally, LLMs allow for detailed analyses of the evolving meanings of scientific concepts, a task particularly suited to the contextualized word embeddings (CWEs) LLMs generate (Kleymann, Niekler, and Burghardt 2022; Simons 2024b; Zichert 2023). In this article, we focus on this last area, demonstrating how CWEs can be used to trace the evolution of the concept of the virtual particle in physics – a concept central to explaining the interactions of elementary particles. This study is motivated by both a historical and philosophical interest in the concept’s role and meaning in theoretical and experimental physics, and a methodological interest in advancing computational approaches within the field of HPS.

Our approach applies the framework of digital Begriffsgeschichte (Wevers and Koolen 2020) to HPS. We explore the use of CWEs and other computational methods to scale up conceptual history, which has a rich qualitative tradition in HPS and beyond. Specifically, we domain-adapted a pretrained BERT model (Devlin et al. 2019) on the domain-specific language of nearly a century of Physical Review (PR) publications and extracted CWEs for all occurrences of the term “virtual,” which we used as the linguistic marker for the concept of the virtual particle.Footnote 1 While this choice reflects a significant but necessary trade-off between coverage and precision, focusing on the term “virtual” rather than the compound “virtual particle” allowed us to include precursor concepts or variants expressed by terms such as “virtual state” or “virtual transition” in our analysis. This is particularly important for periods in which the compound “virtual particle” was not yet in use but in which developments that would eventually contribute to the formation of the virtual particle concept were already occurring. To preserve the full range of usage and avoid prematurely narrowing the conceptual field, we did not exclude any occurrences of “virtual” from the analysis. We also deliberately chose not to extend our dataset to potential synonyms of “virtual,” in order to maintain a focused and coherent analysis in the present article.Footnote 2

The word embeddings of “virtual” were then used to employ semantic change detection (SCD), which aims to identify, interpret and assess shifts in lexical meaning over time using computational techniques. SCD has emerged as a distinct research field in recent years, supported by multiple survey studies (Periti and Montanelli 2024; Tahmasebi, Borin, and Jatowt 2021). While most studies focus on the technical implementation of SCD, there have also been calls for further evaluation of the methods through in-depth case studies backed by qualitative analysis (Kutuzov, Velldal, and Øvrelid 2022; Periti and Montanelli 2024). We hope to provide such a case study with this article. We apply various SCD metrics to the term “virtual” and from this reconstruct important aspects of the origin, usage and evolution of the concept of the virtual particle. As a consequence of our aforementioned focus on “virtual” as a marker for the virtual particle, we did not apply SCD metrics to the compound “virtual particle.” We applied two types of SCD metrics, one capturing change in dominant meaning and the other the degree of polysemy – i.e., the coexistence of multiple meanings for a single word form. For instance, the meaning of “virtual” in connection with “reality” differs from its meaning in connection with “particle.” To enhance our analysis, we incorporated dependency parsing, which provided deeper insights into the observed semantic shifts, further evaluated the results through statistical permutation testing, and contextualized our findings with qualitative analysis.

The “Computational HPS, conceptual history and the advent of LLMs” section provides an overview of computational approaches in HPS, with a particular focus on the transformative impact of CWEs on conceptual history. The second section, “The virtual particle concept,” delves into the historical and philosophical significance of the virtual particle concept. In the “Dataset and corpus creation” section, we describe the dataset used in our study, including its composition, preprocessing steps and the challenges associated with analyzing historical corpora. In the “Methods” section, we outline our methodological framework, detailing the domain adaptation of BERT, the extraction of CWEs, and the metrics used for SCD. The “Results” section presents our findings, describing the temporal development of “virtual” and highlighting shifts in its dominant meaning as well as changes in its degree of polysemy. Finally, in the “Discussion” section, we examine the broader implications of our findings for HPS and reflect on the potential of LLM-based methods to enrich conceptual history and related fields.Footnote 3

Computational HPS, conceptual history and the advent of LLMs

Tracing the emergence and development of scientific concepts has long been a core concern in HPS. One of the earliest examples of a conceptual history is Ludwik Fleck’s (1979 [1935]) in-depth study of the historical evolution of the concept of syphilis. Fleck demonstrated that scientific facts and the concepts underpinning them are not immutable but rather evolve through iterative interactions between social, cultural and institutional contexts. When Fleck’s work was rediscovered in the 1960s and beyond, it resonated with a growing focus on the historical, discursive and social dimensions of scientific knowledge (Foucault 1970; Kuhn 1962), and it helped inspire a flourishing tradition of conceptual history and analysis in HPS (cf. Feest and Steinle 2012; Hacking 1999). Subsequent studies have illuminated the development of diverse scientific and methodological concepts, such as probability (Hacking 1975), protein synthesis (Rheinberger 1997), quarks (Pickering 1999), objectivity (Daston and Galison 2007), temperature (Chang 2007) and electricity (Steinle 2016).

In parallel to this rich qualitative tradition, HPS scholars have increasingly turned to computational approaches to conceptual history and analysis. Early attempts were led by scientometricians, who sought to map the conceptual structure of science by tracing and clustering either (co-)citations, treated as symbols of concepts represented by the cited articles (Small 1973, 1978), or (co-)keywords used to index articles, viewed as indicators of semiotic translations within scientific discourse (Callon et al. 1983; Callon, Law, and Rip 1986). In recent years, advances in adjacent fields such as digital humanities and computational linguistics have inspired more sophisticated approaches to computational conceptual history and analysis in HPS. These newer methodologies build on earlier techniques like co-citation and co-word analysis while incorporating advances in text mining and machine learning to explore the shifting meaning of scientific concepts across vast corpora of scientific texts, enriching our understanding of their socio-historical dynamics (Chen, Ding, and Ma 2018; Gavin et al. 2019; Laubichler, Maienschein, and Renn 2019; Lean, Rivelli, and Pence 2023; Malaterre and Léonard 2024; Overton 2013; Wevers and Koolen 2020). However, while this integration of computational tools has opened new avenues for studying the discursive and structural dimensions of conceptual change, many approaches still struggle to effectively capture the context-dependent meanings of words and sentences.

A key breakthrough for computational HPS – and the computational humanities more broadly – was the introduction of transformer-based LLMs, with BERT (Devlin et al. 2019) standing out as particularly significant for conceptual history and analysis. Unlike static embedding models such as word2vec (Mikolov et al. 2013), which assign a single fixed vector to each word regardless of context, or unidirectional LLMs like GPT (Radford and Narasimhan 2018), which process text in a sequential manner and can only use preceding words (but not the following ones) to determine embeddings, BERT employs a bidirectional design that considers both preceding and succeeding words in a sentence simultaneously to generate unique CWEs that reflect their specific context. This capability is especially valuable for handling complex linguistic phenomena such as colexification, where a single word can carry multiple meanings depending on context, making BERT a powerful tool for analyzing the shifting meanings of concepts in large text corpora (Kutuzov, Velldal, and Øvrelid 2022; Periti and Montanelli 2024; Periti and Tahmasebi 2024; Wevers and Koolen 2020).

The application of CWE-based methods for conceptual history in HPS is still in its early stages, with only a few studies showcasing its potential. Kleymann, Niekler, and Burghardt (2022) analyzed the concept of theory in 3,737 Digital Humanities publications, beginning in their first case study with a broad mapping of its semantic space through frequency and co-occurrence analyses of theory-related terms and frameworks. In their second case study, they focused more narrowly on the term “theory” itself, clustering CWEs using a fine-tuned BERT model to identify distinct senses of the term and trace its semantic relationships with related terms, such as “model” and “method,” over several decades (1966–2020). Simons (2024b) evaluated five BERT-based models with varying degrees of domain-specific pretraining, including his own Astro-HEP-BERT (Simons 2024a), across tasks in word sense disambiguation, sense induction, clustering quality and lexical semantic change, using the term “Planck” as a test case. His study underscores the importance of domain-specific pretraining for analyzing scientific language and demonstrates the cost-effectiveness of adapting pretrained models for HPS research.

The virtual particle concept

The virtual particle – the focal concept in our case study – has been an integral element of particle physics for decades. Emerging from the frameworks of quantum field theory (QFT) and perturbation theory, it plays a crucial role not only in the theoretical description of quantum processes but also in the interpretation of experimental observations. It constitutes a powerful tool for simplifying communication, visualization (notably through Feynman diagrams, see Figure 1) and quantitative calculations of complex quantum field theoretical processes. But despite its success and widespread use, the concept holds different meanings and connotations within today’s particle physics community, and its historical origins and development remain topics of ongoing debate. The concept has also sparked significant philosophical debate, particularly regarding the ontological status of virtual particles (Jaeger 2019; Valente 2011) and the interplay between observational methods and the very definition of particles (Harlander, Martinez, and Schiemann 2023). As mediators of the so-called exchange forces, virtual particles may be considered responsible for the fundamental interactions of elementary particles. They also influence reaction rates in so-called higher-order processes. In this sense, they have detectable and real effects. However, they do not share the properties of real particles; for instance, the mass and energy of a virtual particle do not stand in the same relation as they would for a particle observed in the appropriate detectors. Crucially, virtual particles exist only in intermediate, unobservable phases of processes such as decays or scattering, making them an important but elusive element of particle physics.

Figure 1. Feynman diagram illustrating the electromagnetic interaction between two electrons ( $\mathrm {e^-}$ ) via the exchange of a virtual photon ( $\gamma $ ) (time axis oriented from bottom to top). The two electrons, represented by the external solid lines, are considered “real,” while the photon, represented by the internal wavy line, is considered “virtual.” Source: https://commons.wikimedia.org/wiki/File:Feynmandiagram.svg.

Recent works have shed considerable light on the historical issues concerning the origin and development of the concept. Ehberger (2025) examines the early stages of this development, spanning the 1920s to around 1950, and identifies a progression from “virtual oscillators” to “virtual transitions” and, ultimately, to “virtual particles.” Similarly focused on the concept’s early phase, Martinez (2024) explores the emergence and evolution of the notion of virtuality in nuclear physics during the 1920s and 1930s. Additional studies on the subsequent conceptual shift due to Feynman diagrams, introduced by Richard Feynman in 1948, and the associated calculation schemes testify to the relevance of the virtual particle concept in the evolution of theoretical and experimental particle physics (Blum 2017; Kaiser 2005; Wüthrich 2010). While these analyses provide valuable insights, they are limited by their focus on selected documents. In this study, we aim to go beyond qualitative case studies in order to develop a more comprehensive understanding of the concept’s history by analyzing a large dataset over an extended period. However, we use these studies to contextualize our quantitative findings. Specifically, we seek to identify key periods of historical development by examining shifts in the dominant meaning of the term “virtual” within our corpus and to explore the terminological diversity associated with the term by analyzing its polysemy over time.

Dataset and corpus creation

PR corpus

The first step in our analysis is the selection and creation of the corpus (for a detailed overview of the workflow, see Figure 3). Our dataset consists of a large collection of scientific articles from eight journals in the PR-family (692,212 articles in total). The corpus spans from the introduction of the concept of virtuality in quantum physics in 1924 up to 2022, the latest complete year available for analysis, making it well-suited for studying the history of the virtual particle concept. The PR-journals are highly influential in the field of physics (Bollen, Rodriguez, and Van de Sompel 2006), and qualitative investigations (Ehberger 2025; Martinez 2024) confirm their relevance in the emergence and establishment of the concept, with several key articles on the topic published in these journals (e.g., Bethe and Bacher 1936; Dyson 1949b; Feynman 1949). Through an agreement with the American Physical Society (APS), we have access to all normally restricted full texts, metadata and citation data from this period (American Physical Society 2023). We include eight relevant journals in our analysis: PR - Series II (all of physics until 1969), Reviews of Modern Physics (long review articles with broad disciplinary scope, since 1929), PR - Letters (short articles with high impact and broad disciplinary scope, since 1958), PR - A (atomic, molecular and optical physics, since 1970), PR - B (condensed matter and materials physics, since 1970), PR - C (nuclear physics, since 1970), PR - D (particle physics, field theory, gravitation and cosmology, since 1970) and PR - E (statistical, nonlinear, biological and soft matter physics, since 1993). To focus on long-term trends, newer journals (introduced after 2010) are excluded from the analysis. Figure 2 shows the temporal development of the corpus for each of the eight selected journals. The two dashed lines mark the primary disciplinary differentiations within the PR-journals in 1970 and around 2010, respectively.

Figure 2. Number of published articles per year for each journal in the PR-corpus. The first dashed line indicates the transition from Series II to PR A - D, while the second dashed line marks a subsequent disciplinary differentiation around 2010. The numbers in brackets denote the total article count per journal across the entire corpus, rounded to the nearest thousand.

Figure 3. Overview of the SCD workflow used in this study, showing the four main steps of the analysis: corpus creation (1), embedding (2), vector aggregation (3) and shift assessment (4). The vertical arrows indicate the chronological sequence of processes within each step. For vector aggregation, both form-based and sense-based aggregation methods are applied in parallel.

The dataset’s substantial size, comprising nearly 700,000 articles, makes it well suited for extensive analysis using computational methods. However, it also presents notable limitations, particularly concerning the early development of the concept. As a primarily US-based source written exclusively in English, it does not capture significant developments from other regions. For instance, many centers of the developing quantum theory in the 1920s and early 1930s were in Europe, particularly in German-speaking countries, the Netherlands and Denmark. Several important publications appeared in German-language journals like Die Naturwissenschaften or Zeitschrift für Physik, which are not included in our study. Although the center of research on quantum theories increasingly shifted to the United States from the 1930s onward – particularly after World War II – important research communities also emerged in the Soviet Union and Japan, which are similarly underrepresented (Ehberger 2025). Another issue is the relatively small number of articles in the corpus published before 1950 (approximately 12,000 articles, or just under 2%). This limited amount of historical text data presents a common challenge in the application of computational methods to conceptual history. While we attempt to address this limitation through permutation-based statistical testing, a more comprehensive analysis of the concept’s early phase would require the incorporation of additional text sources. In the case of the virtual particle, this would ideally involve using a multi-language BERT model to analyze articles from the entire international physics community, particularly from the 1920s to the 1940s.

Data preprocessing

Analyzing all nearly 700,000 articles in the entire corpus using word embeddings is impractical due to scalability issues. Instead, we first identify articles potentially relevant to the concept of the virtual particle through a keyword search for “virtual” in the full texts, abstracts and titles. Approximately half of the full texts are available as digitized and OCR-processed PDF files (331,210 entries before 2004), while the other half are in native digital XML format (329,880 entries from 2004 onwards). For processing the PDF files we use GROBIDFootnote 4, which allows parsing and restructuring of scientific publications in PDF format into uniformly TEI-formattedFootnote 5 XML files. To catch common OCR errors prevalent in the PDF-extracted text data, we apply basic cleaning steps such as removing special characters. Subsequently, citations and mathematical formulas are also removed from the text. GROBID offers automatic recognition of citations and formulas; however, it does not yield good results for this dataset and is also computationally intensive and time-consuming. Instead, we use regular expressions for the automated detection and removal of formulas in texts before 2004, as sketched below. The XML files from 2004 onward are cleaned using the standard XML tags for citations and mathematical formulas. Despite our best efforts, there remains a notable difference in data quality regarding citations and formulas before and after 2004. Although the formulas used likely reflect significant developments in the conceptualization of the virtual particle, there are currently no established tools for the content analysis of mathematical formulas in the context of conceptual history and the history of science. Therefore, this work focuses on the analysis of linguistic text data.
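To illustrate this kind of cleaning step, the following Python sketch removes citation brackets and formula-like spans with regular expressions; the patterns and the helper name clean_text are illustrative assumptions, not the exact rules used in the study.

```python
import re

# Illustrative patterns (assumptions), not the study's exact cleaning rules.
FORMULA_PATTERN = re.compile(
    r"\$[^$]+\$"                        # inline TeX remnants
    r"|\\\[[^\]]*\\\]"                  # display-math remnants
    r"|[=<>]\s*[A-Za-z0-9^{}/.+\-]+"    # crude OCR'd equation fragments
)
CITATION_PATTERN = re.compile(r"\[\s*\d+(?:\s*[,-]\s*\d+)*\s*\]")  # e.g., [12], [3-5]

def clean_text(text: str) -> str:
    """Remove citation brackets and formula-like spans, then normalize whitespace."""
    text = CITATION_PATTERN.sub(" ", text)
    text = FORMULA_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("The cross section [12] scales as $q^2/m^2$ for virtual photons."))
# -> "The cross section scales as for virtual photons."
```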

To ensure the efficient use of the BERT model, the texts are segmented into sentences. For this task, we utilize the large model of the Python NLP library SciSpaCyFootnote 6, which has been trained on a large corpus of scientific texts (albeit in bio-medicine), making it well suited for this purpose. We also use the model for dependency parsing, in which a sentence’s syntactic structure is derived by identifying how words are grammatically related through directed links. This is particularly helpful for analyzing adjectives like “virtual,” as it allows for accurate identification of the associated nouns; a sketch of this step follows below. We use these dependencies to evaluate and gain a deeper understanding of the observed semantic shifts. Following Laicher et al. (2021), we do not employ further preprocessing steps, such as lemmatization, as they do not seem to improve SCD in English texts. After data preparation, our final analysis corpus consists of 41,786 articles, containing 126,540 occurrences of “virtual.” We refer to this as our “virtual”-corpus from this point onward. The exact number of articles, “virtual”-embeddings and cleaned tokens in this corpus per year can be found in the appendix (Table A.2).
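A minimal sketch of the segmentation and dependency-parsing step, assuming SciSpaCy’s large scientific model en_core_sci_lg (the concrete model name is our assumption); it splits text into sentences and collects the noun heads that the adjective “virtual” modifies:

```python
import spacy

# Assumes: pip install scispacy plus the en_core_sci_lg model package.
nlp = spacy.load("en_core_sci_lg")

def virtual_dependencies(text: str):
    """Yield (sentence, noun lemma) pairs where 'virtual' modifies the noun."""
    doc = nlp(text)
    for sent in doc.sents:                              # sentence segmentation
        for token in sent:
            # "amod" marks an adjectival modifier; its head is the governed noun
            if token.lower_ == "virtual" and token.dep_ == "amod":
                yield sent.text, token.head.lemma_.lower()

text = ("The exchange of a virtual photon mediates the interaction. "
        "The nucleus passes through a virtual state.")
for sentence, noun in virtual_dependencies(text):
    print(noun, "<-", sentence)
```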

Methods

BERT and domain adaptation

For SCD using BERT, fine-tuning for downstream tasks is unnecessary, as the focus is on the learned word representations, i.e., the CWEs themselves. Previous studies have demonstrated that domain adaptation, either by retraining a pretrained model or training one from scratch on a domain-specific corpus, significantly enhances the quality of CWEs for specialized use cases, particularly in scientific contexts (Beltagy, Lo, and Cohan 2019; Lee et al. 2020; Zhang et al. 2024). This is particularly crucial for our study, as the dataset comprises highly specialized scientific texts in physics. At the time of our empirical analysis, popular domain-specific BERT models for the analysis of scientific texts included BioBERT (Lee et al. 2020) and SciBERT (Beltagy, Lo, and Cohan 2019). Additionally, astroBERTFootnote 7, a model trained specifically on astrophysics (Grezes et al. 2021), was already available. However, its narrow focus on astrophysics made it unsuitable for our broader investigation of physics language. Two additional physics-focused models, Astro-HEP-BERTFootnote 8 (Simons 2024a) and PhysBERT (Hellert, Montenegro, and Pollastro 2024), have since been developed, but were released too late to be included in our study. For a comprehensive overview of scientific LLMs, including those specific to physics, see Zhang et al. (2024).

For our analysis, we retrained the BERT-base-uncased modelFootnote 9 – a general-purpose model originally trained from scratch over 40 epochs on a dataset of Wikipedia articles and free books, totaling 3.3 billion tokens – on our “virtual”-corpus for five epochs using the masked language modeling objective outlined in Devlin et al. (2019). Additionally, we retrained and tested SciBERTFootnote 10, which was trained from scratch on 1.14 million biomedical and computer science papers, but found that our domain-adapted BERT-base outperformed it slightly in terms of training and validation loss. Based on the empirical findings of Martinc et al. (2020), we did not apply time-specific fine-tuning, as we assumed that the context-dependent nature of our model’s CWEs already makes them well-suited to their temporal context.
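The retraining step can be condensed into the following sketch using the Hugging Face libraries; the corpus file name, batch size and output directory are illustrative assumptions, while the five epochs and the masked language modeling objective follow the description above.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# One sentence (or text chunk) per line; "virtual_corpus.txt" is a placeholder.
dataset = load_dataset("text", data_files={"train": "virtual_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

# Randomly masks 15% of tokens per batch, as in the original BERT objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-pr-virtual", num_train_epochs=5,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()
```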

For inference, we fed segmented sentences into our model with a maximum sequence length of 512 tokens, and extracted the sum of the last four layers for each token. For words comprising multiple subword tokens, we stored the average embedding of these tokens. Given the contextual nature of our embeddings, each token occurrence results in one embedding vector. To reduce disk storage requirements, embeddings were saved only for meaningful words, excluding stop words, numbers and special characters, with the total count of these meaningful embeddings being approximately 100 million.
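In code, this extraction step might look like the following sketch: run a sentence through the model with hidden states enabled, sum the last four layers, and average over the subword pieces of the target word ("bert-pr-virtual" again stands in for the domain-adapted model and is a hypothetical path).

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-pr-virtual")  # hypothetical path
model = AutoModel.from_pretrained("bert-pr-virtual", output_hidden_states=True)
model.eval()

def embed_word(sentence: str, target: str = "virtual") -> torch.Tensor:
    enc = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**enc).hidden_states              # embedding layer + 12 layers
    summed = torch.stack(hidden[-4:]).sum(dim=0)[0]      # sum of the last four layers
    # indices of all subword pieces that belong to an occurrence of the target word
    piece_idx = [i for i, wid in enumerate(enc.word_ids(0)) if wid is not None
                 and sentence[enc.word_to_chars(wid).start:
                              enc.word_to_chars(wid).end].lower() == target]
    return summed[piece_idx].mean(dim=0)                 # average over subword pieces

vec = embed_word("The interaction is mediated by a virtual photon.")
print(vec.shape)  # torch.Size([768])
```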

SCD

General workflow

The basic procedure of SCD can be outlined as follows: given a diachronic corpus of documents $C = \bigcup _{t=1}^{T} C_{w}^{t}$ , where $C_{w}^{t}$ represents a subcorpus of documents at time t within the overall investigation period $[1, \ldots , T]$ that contains the target word w, the goal of SCD is to quantify the semantic shift $s_{w}$ for w between two time-specific subcorpora $C_{w}^{t}$ and $C_{w}^{t'}$ or across the entire corpus. In this study, we focus on two ways in which a semantic shift can manifest: first, as a change in the dominant meaning of a term, and second, as a change in the degree of its polysemy. Specifically, for our purposes the target word is “virtual,” the documents comprise all the full texts plus abstracts of the PR-corpus that contain “virtual” (our “virtual”-corpus), and the time interval is one year. We summarize the notations used throughout the study in Table 1.

Table 1. Reference table of notations used in this article

The generalized workflow required for performing contextualized SCD can be split into four steps (see Figure 3). The first step (Corpus Creation), including the selection of an appropriate dataset, data preprocessing and identifying the relevant articles, has already been described in the “Dataset and corpus creation” section. In the second step (Embedding), CWEs are generated for each occurrence of the target word in the corpus, using the domain-adapted BERT model introduced in the previous section. The set of all these embeddings in the time-specific subcorpus $C_{w}^{t}$ is expressed as $\Phi _{w}^{t} = \{e_{w,1}^{t}, \ldots , e_{w,I}^{t}\}$ , where $e_{w,i}^{t}$ represents a contextualized word embedding in the subcorpus and I denotes the number of all occurrences of w in it. In the third step (Vector Aggregation), the CWEs of a time period $\Phi _{w}^{t}$ are aggregated to represent the time-specific meanings of w. Two types of representations are defined: form-based approaches examine the high-level properties of the target word per time period by looking directly at the dominant sense of a word or the degree of polysemy. When considering the dominant meaning, word prototypes $\mu _{w}^{t}$ can be generated for each time interval, representing the average of all CWEs in $\Phi _{w}^{t}$ and thus providing an aggregated representation of the semantic properties of the target word in $C_{w}^{t}$ . When looking at polysemy at the high level, the aggregation step is usually skipped and the semantic shift of w is measured by directly comparing the degree of polysemy in the time-specific sets of CWEs $\Phi _{w}^{t}$ and $\Phi _{w}^{t'}$ . Sense-based approaches, in contrast, attempt to first capture the different time-specific senses or meanings of the target word in $C_{w}^{t}$ using clustering methods. Each time-specific meaning corresponds to a cluster of CWEs $\phi _{w,n}^{t}$ in the set of CWEs $\Phi _{w}^{t}$ .

We apply two clustering methods to identify meaning clusters. In k-means clustering (KM), CWEs are organized into a predefined number of clusters by iteratively updating cluster centers until they stabilize. Determining the optimal number of clusters is challenging; automated methods like the silhouette coefficient often fail to identify the actual number of meaning clusters (Martinc et al. 2020). Therefore, we set the number of clusters to $N = 10$ based on pragmatic considerations, aiming to strike a balance between achieving reasonably high silhouette scores and maintaining a number of clusters that can be meaningfully interpreted and qualitatively evaluated. During qualitative inspection, we assessed cluster coherence and interpretability by examining the top 20 words in each cluster, specifically checking whether they reflected distinct and recognizable senses. Affinity propagation (AP) identifies exemplars among data points and forms clusters without the need to pre-specify their number, by iteratively exchanging “messages” between data points to determine the clusters. However, the number of clusters often correlates with the number of input CWEs rather than actual meanings, potentially resulting in a large number of clusters (Montariol, Martinc, and Pivovarova 2021). Another drawback of AP is its high computational complexity of $O(n^2)$ . In our study, both clustering methods are applied to the entire “virtual”-corpus; however, it would also be feasible to employ time-specific clustering. In order to make the clusters usable for SCD, we calculate a cluster distribution $P_{w}^{t}$ for each time slice t. This distribution represents the relative frequency with which a specific CWE $e_{w,i}^{t}$ from the complete set of CWEs $\Phi _{w}^t$ at time t is assigned to a particular cluster $\phi _{w,n}^{t}$ within the same time slice. It is formally defined as follows:

$$\begin{align*}P_{w}^{t} = [p_{w,1}^{t}, p_{w,2}^{t}, \ldots, p_{w,N}^{t}], \quad \text{where } p_{w,n}^{t} = \frac{|\phi_{w,n}^{t}|}{|\Phi_{w}^{t}|}.\end{align*}$$
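A sketch of this sense-based aggregation with scikit-learn, assuming the embeddings matrix and per-occurrence year labels produced in the embedding step (random data stands in for them here):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 768))     # stand-in for the real "virtual" CWEs
years = rng.integers(1924, 2023, size=500)   # stand-in for per-occurrence years

N = 10  # number of clusters chosen on pragmatic grounds, as described above
labels = KMeans(n_clusters=N, n_init=10, random_state=0).fit_predict(embeddings)
# labels = AffinityPropagation().fit_predict(embeddings)  # alternative without fixed N

def cluster_distribution(year: int) -> np.ndarray:
    """P_w^t: relative frequency of each cluster among the year-t embeddings."""
    counts = np.bincount(labels[years == year], minlength=N)
    return counts / max(counts.sum(), 1)
```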

Once the time-specific representations are identified, they can be compared over time in the final step (Shift Assessment) to determine the extent of the semantic shift $s_w$ . The methods used to quantify this shift, split into those measuring the semantic shift in polysemy and those measuring it in dominant meaning, are introduced in the next sections.

Polysemy

We apply two methods to quantify the temporal development of a term’s polysemy. The first method is normalized Shannon entropy, which utilizes the cluster distribution to describe the degree of uncertainty in the distribution of embeddings across meaning clusters within a given time period (Baumann, Stephan, and Roth 2023; Giulianelli, Tredici, and Fernández 2020). Specifically, Shannon entropy quantifies the average amount of information needed to assign a given embedding to a particular sense cluster. In our case, it reflects how likely a specific occurrence of the term “virtual” corresponds to a distinct meaning – e.g., referring to the virtual photon. A low entropy value indicates that most embeddings are concentrated in a single cluster, suggesting stable or unambiguous usage. Conversely, a high entropy value means the embeddings are dispersed across multiple clusters, pointing to more variable or ambiguous usage and therefore greater polysemy. To ensure comparability of entropy values across different time periods, we use the normalized Shannon entropy $\eta (P_{w}^{t})$ , which ranges from 0 to 1 and is defined as follows:

$$\begin{align*}\eta(P_{w}^t) = \frac{H(P_{w}^t)}{\log(N)}, \quad \text{where } H(P_{w}^t) = -\sum_{n=1}^{N} p_{w,n}^{t} \log(p_{w,n}^{t}).\end{align*}$$
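Expressed in code, the normalized entropy of a cluster distribution (as returned by cluster_distribution above) is nearly a one-liner; this minimal sketch makes the zero-probability convention explicit.

```python
import numpy as np

def normalized_entropy(p: np.ndarray, n_clusters: int = 10) -> float:
    """Normalized Shannon entropy of a cluster distribution, ranging over [0, 1]."""
    nonzero = p[p > 0]                       # 0 * log(0) contributes nothing
    return float(-(nonzero * np.log(nonzero)).sum() / np.log(n_clusters))

print(normalized_entropy(np.full(10, 0.1)))  # uniform distribution -> 1.0
```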

The second method, average inner distance (AID), measures the variance of the CWEs $\Phi _{w}^{t}$ , reflecting the degree of polysemy of w in $C_{w}^{t}$ . In this approach, embeddings are not aggregated into meaning clusters or word prototypes. Instead, the average distances between all possible pairs of embeddings within a single time period are calculated. Intuitively, AID measures how dispersed or spread out the contextualized embeddings of “virtual” are in a given year. A higher AID value suggests that the term appears in a wider variety of contexts – implying greater polysemy – while a lower AID indicates more consistent, unambiguous usage (Periti and Montanelli 2024). This method is sometimes also referred to as self-similarity (Garí Soler and Apidianaki 2021). We employ Euclidean distance, denoted in the formula as $d(e_{w,i}^{t}, e_{w,j}^{t})$ . AID is defined as follows:

$$\begin{align*}\mbox{AID}(\Phi_{w}^{t}) = \frac{1}{|\Phi_{w}^{t}|} \sum_{e_{w,i}^{t},\, e_{w,j}^{t} \in \Phi_{w}^{t},\, i<j} d(e_{w,i}^{t}, e_{w,j}^{t}).\end{align*}$$
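A minimal AID sketch, reading the measure as the mean Euclidean distance over all embedding pairs of a year; the choice of normalization constant is our assumption and is immaterial for comparing values across years.

```python
import numpy as np
from scipy.spatial.distance import pdist

def aid(phi: np.ndarray) -> float:
    """Average inner distance of one year's embeddings (shape: occurrences x 768).

    pdist returns the Euclidean distances of all pairs i < j; we take their mean.
    """
    return float(pdist(phi, metric="euclidean").mean())

rng = np.random.default_rng(0)
print(aid(rng.normal(size=(50, 768))))  # toy input
```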

Dominant meaning

To assess the shift in dominant meaning in a form-based manner, cosine similarity (CS) can be used. CS measures the alignment between the vectors of two word prototypes $\mu _{w}^{t}$ and $\mu _{w}^{t'}$ by calculating the dot product of the vectors divided by the product of their norms (lengths). CS values range between $-1$ and 1, where a high value indicates vector alignment and a low value indicates opposition. We employ the variant inverted CS over word prototypes (PRT), which, according to Kutuzov, Velldal, and Øvrelid (2022), is better suited for quantifying the extent of the semantic shift. In our case, the word prototypes represent the dominant meaning of “virtual” at a given point in time, computed as the mean of all occurrences of “virtual” within that time slice. PRT therefore captures how the central tendency of a term’s usage – i.e., its dominant meaning – changes over time. PRT values are always greater than 1, where higher values signify a more pronounced shift. It is defined as follows:

$$\begin{align*}\mbox{PRT}(\mu_{w}^{t}, \mu_{w}^{t'}) = \frac{1}{\mbox{CS}(\mu_{w}^{t}, \mu_{w}^{t'})}, \quad \text{where } \mbox{CS}(\mu_{w}^{t}, \mu_{w}^{t'}) = \frac{\mu_{w}^{t} \cdot \mu_{w}^{t'}}{\left\|\mu_{w}^{t}\right\| \left\|\mu_{w}^{t'}\right\|}.\end{align*}$$
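A corresponding sketch of the prototype computation and the PRT metric, reusing the embeddings and years arrays assumed in the clustering sketch above:

```python
import numpy as np

def prototype(embeddings: np.ndarray, years: np.ndarray, year: int) -> np.ndarray:
    """mu_w^t: mean of all target-word embeddings in a given year."""
    return embeddings[years == year].mean(axis=0)

def prt(mu_a: np.ndarray, mu_b: np.ndarray) -> float:
    """Inverted cosine similarity of two word prototypes (higher = larger shift)."""
    cs = mu_a @ mu_b / (np.linalg.norm(mu_a) * np.linalg.norm(mu_b))
    return 1.0 / cs
```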

The shift in dominant meaning can also be assessed using meaning clusters (sense-based) through the Jensen–Shannon divergence (JSD). JSD, based on Shannon entropy, measures the divergence between the cluster distributions of a term across different time periods. This method considers not only the variation in the size of the clusters but also how the prominence of specific clusters varies across the different time periods (Giulianelli, Tredici, and Fernández 2020). In this way, JSD captures how much the sense profile of a term changes over time, helping to identify not just how ambiguous a term is at one moment (as Shannon entropy does), but how its dominant usages shift across periods. A high JSD value indicates significantly different cluster distributions, suggesting pronounced semantic shifts. Conversely, a low JSD value indicates relatively similar distributions, implying stability in the dominant meaning. JSD is defined as follows:

$$\begin{align*}\mbox{JSD}(P_{w}^{t},P_{w}^{t'}) = H\left(\frac{1}{2}\left(P_{w}^{t}+P_{w}^{t'}\right)\right) - \frac{1}{2} \left(H(P_{w}^{t})+H(P_{w}^{t'})\right).\end{align*}$$
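And a matching JSD sketch over two cluster distributions, following the formula above:

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    p = p[p > 0]                      # 0 * log(0) contributes nothing
    return float(-(p * np.log(p)).sum())

def jsd(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence between two cluster distributions."""
    m = 0.5 * (p + q)
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))
```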

Results

Temporal development of “virtual”

The first result of our study is the descriptive analysis of the “virtual”-corpus with regard to the temporal development of the term. Figure 4 shows the number of published articles per year containing “virtual” for the entire corpus (left) and their proportion per journal (right). The dashed lines in the left figure indicate two key disciplinary differentiations in the PR-journals: the transition from Series II to PR A - D in 1970, and the introduction of new journals like PR - X (2011) and PRX - Quantum (2021). To focus on long-term trends, these newer journals are excluded from the analysis. The decline in articles after 2010 is thus an artefact of the dataset and does not reflect overall trends in PR publications or physics. Notably, there is a low number of articles in the early phase of the study period, with only 384 publications in our corpus containing “virtual” before 1950, especially sparse before 1930 and during the war years (1942–1945). The exact number of articles, “virtual”-embeddings and cleaned tokens in the “virtual”-corpus per year can be found in the appendix (Table A.2). From 1950 onwards, the number of articles containing “virtual” increases steadily, with short periods of relative stagnation during the 1970s and 2010s, mirroring the broader increase in PR journal publications.

Figure 4. Overview of the PR corpus: The figure displays the total number of published articles per year containing “virtual” for the entire corpus (on the left) and their proportion (rolling mean over 3 years) per journal (on the right). For clarity, the proportions in PR - Letters and RMP are not shown.

The average share of articles containing “virtual” across all journals, as depicted in the right figure, is 6.04 percent over the entire period. In the pre-Feynman era (before 1950), this percentage generally remains lower, except for two notable peaks. In 1937, there is a temporary increase above 5 percent, driven by significant contributions from Bethe, Bacher and Livingston in RMP (Bethe and Bacher 1936; Bethe 1937; Livingston and Bethe 1937). The second peak in 1949 can likely be attributed to Feynman’s groundbreaking articles and their reception. For instance, with Space-Time Approach to Quantum Electrodynamics (Feynman 1949) – published in PR - Series II – Feynman introduced his eponymous diagrams for representing and analyzing quantum electrodynamic processes, which contributed significantly to the establishment of the concept of the virtual particle. In the same year, Freeman J. Dyson’s contributions, also published in Series II (Dyson 1949a, 1949b), further validated and established Feynman diagrams as a fundamental tool in QFT (Ehberger 2025; Wüthrich 2010). Following the publications by Feynman and Dyson, the prevalence of “virtual” steadily increased, culminating in a peak during the 1960s and 1970s. This relatively high ratio of articles containing “virtual” may, at least in part, be due to the rise of an alternative to QFT: the so-called S-matrix theory (Cushing 1990). In this new theory, intermediate states were always on-shell, such that it seems, at first sight, that “all talk of virtual particles was gone” (Kaiser 2005, 285). However, in other work by S-matrix theorists like Chew, Low, or Barut, the virtual particle concept seems to take center stage, and it even occurs explicitly in the title of one of their articles (Barut 1962; Chew and Low 1959). Subsequently, from the 1970s onward, QFT emerged as the dominant framework with the theoretical consolidation of a “standard model” of particle physics (Hoddeson et al. 1997; Salam 1968; Weinberg 1967). Finally, by the early 1980s, the proportion of articles containing “virtual” starts to decline to approximately 5 percent, gradually rising again from the 1990s onward, albeit not returning to the levels observed during the earlier peak period.

Zooming in on the individual journals or fields, articles containing “virtual” are notably prevalent in PR - D (particle physics, field theory, gravitation and cosmology) and PR - C (nuclear physics). Examination of arXiv classifications within PR - D reveals that nearly 90 percent of these articles fall under high-energy physics (HEP). Predominantly, these articles belong to the field of phenomenology (“hep-ph”), followed by experimental (“hep-ex”) and, to a lesser extent, theoretical high-energy physics (“hep-th”). These findings align with expectations, as the concept of the virtual particle holds significant relevance in these fields. The frequency of “virtual” in PR - D peaks in the 1970s, 1990s and 2000s, with drops in usage in between. Overall, it contributes approximately 27 percent of all articles containing the term “virtual” in the corpus, making it the largest source. Nuclear physics (PR - C) also features a significant percentage of articles containing “virtual,” comprising about 9 percent of the corpus. This aligns with recent research by Martinez on the origin of the notion of virtuality in modern physics (Martinez 2024). The proportion of relevant articles in PR - C increases steadily until the mid-1990s, plateaus until around 2010, and shows a recent decline. The term is less prevalent in the remaining journals, which will not be discussed in detail here for the sake of brevity. A table showing the top five journal-specific dependencies of the term “virtual” can be found in the appendix (Table A.1).

Dominant meaning becomes more stable

One key finding of our empirical study is that the dominant meaning of “virtual” becomes more stable over time. Figure 5 presents the results of the SCD calculations regarding the shifts in the dominant meaning throughout the entire investigation period. The left graph displays the PRT-values for “virtual,” i.e., the inverted CS of the word prototypes for each year relative to the preceding year. The right graph shows the JSD-values for both the KM and the AP-clustering. Due to the computational expense of AP-clustering, we randomly sampled approximately 25 percent of all embeddings, ensuring a minimum of 400 embeddings per year, where available.

Figure 5. Shifts in dominant meaning for “virtual,” using PRT (left) and JSD for k-means and AP-clustering (right) in the entire PR-corpus and over the entire investigation period.

The resulting conceptual development of “virtual” can be divided into two distinct phases. The first period, up until the 1950s, is characterized by pronounced fluctuations, indicating repeated conceptual reorientation during the early development of the concept, with no firmly established or dominant meaning. This trend can be seen in all three metrics, although the JSD values based on AP-clustering stabilize at around 0.4. Notably, peaks are observed in the late 1920s and early 1940s. Given the limited number of data points available for this period, it is important to emphasize that our results for this early period reflect general trends rather than individual peaks. To ensure the robustness of our results, we conduct permutation-based statistical tests, which are described in detail at the end of this section. From approximately 1950 onward, marking the beginning of the second phase, the dominant meaning begins to stabilize progressively, although a minor peak is observed in the early 1980s. This suggests that the term “virtual” was increasingly used only in the sense of the virtual particle or cognate concepts – in line with the influential contributions of Feynman at the end of the 1940s. Additional details on the shifts in dominant meaning in the field-specific journals can be found in the appendix (Figure A.1), indicating that the peak in the 1980s is mainly caused by a change in dominant meaning in PR - C (nuclear physics). We plan to conduct further research into the cause of this and other peaks. However, preliminary results indicate that around the time of the peak, the CS between “virtual” and “boson” also drops conspicuously. A possible interpretation of these changes is that the experimental detection of the W boson (Arnison et al. 1983) rendered such bosons less “virtual” in the sense of being unobservable or unobserved. This would lend support to the claim that the reference to something that is not directly observable is the essential aspect of the meaning of “virtual.”

Our findings regarding the stabilization of the dominant meaning of “virtual” are also supported by the time-specific dependencies, as shown in Table 2. From the 1920s to the 1940s, “virtual” is most often associated with terms as diverse as “cathode,” “height,” “orbit,” “level,” and “oscillator.” In the 1940s, “virtual quanta” came into use, prominently featured in Feynman’s first diagrams (Feynman 1949). With the onset of the post-Feynman era in the 1950s, “virtual photons” and “virtual states” become increasingly established as the dominant contexts. Notably, though, the concept of the virtual transition, which Ehberger (2025) describes as essential for the concept’s early development, only appears among the most frequent dependencies from the 1960s on. This might be due to our exclusion of German-language journals. From around 1990 onward, the dependency “correction” gains importance. These “virtual corrections” refer to parts of Feynman diagrams (or the corresponding mathematical expressions) involving the representation of a virtual particle. The increasing frequency of this use of “virtual” might be attributed to a growing interest in (and feasibility of) “higher order” calculations and precision measurements in various contexts, the most prominent being the search for the Higgs boson at the Large Electron–Positron Collider (LEP), in use at CERN from 1989 to 2000, the Tevatron (at Fermilab, 1983–2011), the planned Superconducting Super Collider (SSC, planned ca. 1983, cancelled in 1993), and the Large Hadron Collider (LHC), in use at CERN since 2009.Footnote 11 Nonetheless, “virtual photons” and “virtual states” remain the dominant contexts of use until the present, though less pronounced than in the 1960s and 1970s.

Table 2. Top four lemmatized dependencies of “virtual” per decade

Note: The number in brackets represents the share of the dependency in all dependencies of the decade.

The consistency of results across all three calculation methods, despite their different approaches, is also notable: the values of PRT strongly correlate with those of JSD (Pearson coefficient for PRT and JSD - KM: 0.96, PRT and JSD - AP: 0.8), as do those of the two JSD metrics (0.77). These high correlation values suggest that both clustering methods reliably identify the various meanings of “virtual,” indicating stable and meaningful results. To further ensure the robustness of our findings despite the relatively low frequency of “virtual” in the early years, we employ permutation-based statistical tests for the PRT metric, following the approach outlined in Liu, Medlar, and Glowacka (2021). Permutation tests can be used to assess whether the observed test statistic (i.e., the SCD metric) differs significantly from zero, therefore indicating a semantic shift between two time periods. These tests are particularly suitable for low-frequency data because they do not rely on large sample sizes or specific distributional assumptions; instead, they generate the sampling distribution from the available data itself. This is achieved through the random and repeated rearrangement of the “virtual”-embeddings across the two time periods by sampling without replacement and then recalculating the SCD metric for each permutation.Footnote 12 Following Liu, Medlar, and Glowacka (2021), we employ the Benjamini–Hochberg procedure to adjust the p-values for multiple comparisons, thereby limiting the false discovery rate. Applying this method to our data, we find that the semantic shifts in the dominant meaning of “virtual” based on PRT are significant for almost all time intervals. These findings support our conclusion regarding the general trend of the conceptual development while acknowledging variability in specific time periods. A detailed exemplary figure illustrating the results of the permutation tests for PRT can be found in the appendix (Figure A.2).
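The permutation-test logic can be sketched as follows for the PRT metric; the number of permutations, the one-sided counting and the add-one smoothing are our assumptions rather than the exact procedure of Liu, Medlar, and Glowacka (2021).

```python
import numpy as np

def permutation_test(phi_a: np.ndarray, phi_b: np.ndarray,
                     n_permutations: int = 1000, seed: int = 0) -> float:
    """p-value for the PRT shift between two years under random reshuffling."""
    def prt(a, b):
        mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
        return 1.0 / (mu_a @ mu_b / (np.linalg.norm(mu_a) * np.linalg.norm(mu_b)))

    observed = prt(phi_a, phi_b)
    pooled = np.vstack([phi_a, phi_b])
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                  # reassign embeddings without replacement
        count += prt(pooled[:len(phi_a)], pooled[len(phi_a):]) >= observed
    return (count + 1) / (n_permutations + 1)
```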

Polysemy increases

The second key finding of our empirical study is that the degree of polysemy of “virtual” increases. That is, while the dominant use remains the one associated with the aforementioned concepts, the usage of the term in other meanings keeps expanding. Figure 6 presents the development of the degree of polysemy for “virtual” in the entire PR-corpus and over the entire investigation period. The left graph shows the AID values, i.e., the average inner distances of all “virtual”-embeddings in a given year. The values for the normalized Shannon entropy are displayed in the right graph, again for both the KM and the AP-clustering (with the same random sampling as described in the previous section).

Figure 6. Changing degree of polysemy for “virtual,” using AID (left) and normalized Shannon entropy for k-means and AP-clustering (right) in the entire PR-corpus and over the entire investigation period.

Similar to the results regarding the dominant meaning, the degree of polysemy fluctuates significantly in the early phase of the concept. Notably, the values are particularly low in the mid to late 1920s and early 1940s. These results are expected given the limited number of articles during these periods, as a small number of embeddings implies a correspondingly low number of different meanings. From 1938 to 1940, however, the values for all calculation methods are particularly high. A clear explanation for this spike is not immediately apparent, as neither the examination of the dependencies nor the shift in dominant meaning during these years provides insight. The described peaks in PRT and JSD occur several years later. One possible explanation could be that a few but very dissimilar embeddings cause the peak. While the correlation coefficients between the metrics are again high (0.64 for AID and Shannon entropy (KM), 0.66 for AID and Shannon entropy (AP), and 0.94 for Shannon entropy (KM) and Shannon entropy (AP)), suggesting stable results, we were unable to identify a suitable method for statistical testing of polysemy. Further research and qualitative assessment of the relevant papers is required and planned. Consequently, our present analysis focuses, once again, on general trends rather than individual peaks.

From around 1950 or 1960, depending on the metric, the fluctuations become smaller and the degree of polysemy increases steadily. Notably, there is a brief spike in the early 1980s in the AID-values and another sharp increase in the 1990s, followed by a relative stabilization at a consistently high level in recent years. This increased polysemy in recent years is also reflected in the dependencies of “virtual” (Table 2), which show a decreasing total share of the top four usage contexts, indicating a more even distribution of usage contexts from the 1990s onward compared to earlier decades. This trend is supported by the introduction of the journal PR - E in 1993, which is characterized by distinct usage contexts differing from those of the other journals (see Table A.1). The Shannon entropy based on both clustering methods remains consistently high, staying at or above 0.8 from about the 1950s onward and reaching nearly maximal values around the 2000s in the case of k-means. From 2010 onward, there is a small decrease in polysemy, possibly due to the second disciplinary differentiation leading to a slightly less varied usage of the term across the remaining journals. The trends observed in field-specific journals generally align with the overall findings. The details can be found in the appendix (Figure A.3).

Discussion

In this study, we investigated the use of CWEs to trace the evolution of scientific concepts over time. Our objectives were twofold: first, to chart the historical development of the virtual particle – a foundational concept in understanding the interactions between elementary particles – through nearly a century of PR articles (1924–2022); and second, to advance computational methodologies within HPS. Accordingly, our discussion is organized into two primary sections: the first presents findings from our case study, focusing on lessons for the concept’s historical trajectory and associated semantic shifts; the second reflects on broader challenges and limitations encountered in this research, including obstacles in modeling conceptual networks, biases in datasets, and the evolving role of LLMs in shaping this domain.

Key findings from the case study

With the overarching objective to trace the historical development of the virtual particle, we employed various SCD metrics on a large number of CWEs of the term “virtual.” Our findings show that the dominant meaning of the term “virtual” becomes more stable over time while at the same time its degree of polysemy is increasing. This development can be divided into two distinct periods: an initial phase characterized by repeated conceptual reorientation with no firmly established meaning yet, and a second phase marked by the growing consolidation of the dominant meaning in the sense of the virtual particle, following Richard Feynman’s seminal works and their reception around 1950. At the same time, the degree of polysemy steadily increases throughout almost the entire investigation period, stabilizing only recently at a high level.

While these findings may appear contradictory at first, they are complementary upon closer inspection. Metrics for polysemy measure how dispersed the word embeddings are in the vector space, while metrics for dominant meaning track the stability of the primary cluster of embeddings over time. Our findings suggest that from the 1950s onward, the relative majority of the embeddings consistently centers around a usage tied to “virtual particle” (especially “virtual photons”), while the overall usage of the term “virtual” diversifies, possibly due to its uses in different fields like those in PR - E. As one reviewer helpfully pointed out, this pattern aligns with usage-based theories of semantic change, in which the stabilization of a dominant sense can occur alongside increasing polysemy. The reviewer also directed us to De Smet’s account of entrenchment in metaphorical extension, where terms become conventionalized in a specific sense without eliminating contextual variation (De Smet 2016). In this light, “virtual” appears to have become entrenched in its virtual particle meaning, while continuing to serve a broader range of functions across physics discourse. This process can be understood as one of conventionalization, through which a particular meaning becomes established within a specialist community, and specialization, whereby that meaning adopts a more narrowly defined role within expert discourse.

We combined our SCD-based approach with dependency parsing and qualitative assessments to evaluate the results. We find that the observed semantic shifts of “virtual” align with recent studies on the history of the virtual particle concept. This alignment is especially noticeable in the first phase of conceptual development, while the second phase – still under-researched – benefits from a more heuristic application of SCD. Notably, we identified an unexpected shift in the dominant meaning of “virtual” in the early 1980s that seems to be primarily driven by articles in nuclear physics (PR - C). We plan to investigate this peak further, along with a more in-depth discussion of the relevance of our findings for the history and philosophy of physics. As part of our evaluation strategy, we found dependency parsing to be a particularly effective tool – likely aided by the fact that “virtual,” the focus of our study, is an adjective. Nevertheless, we believe it holds broader potential as a resource-efficient evaluation method for future applications of SCD research.

Broader challenges and limitations

Beyond the empirical findings, this article aims to situate our case study within a broader discussion of the opportunities and challenges posed by LLMs for computational conceptual history in HPS. Three key issues arise in this context: the relationship between tracing words and tracing concepts, the inherent limitations and biases of historical datasets, and the handling of mathematical formulas.

The first issue concerns the relationship between SCD and conceptual history. While lexical SCD focuses on tracking the evolution of individual words, conceptual history aims to understand the development of concepts, which, as emphasized in the conceptual history literature, are best represented as networks or clusters of related words – often called “semantic spaces” (Gavin et al. 2019; Palti 2011; Wevers and Koolen 2020) – rather than as isolated words. To effectively mobilize SCD for conceptual history, it is essential to develop methods for modeling the semantic change of such spaces as a whole rather than just individual words. Alternatively, we must establish the conditions under which the detected semantic changes of individual words can reliably reflect changes in the meaning of broader concepts. In our study, we focused on the term “virtual” as a key marker of the virtual particle concept. This choice reflects a significant but necessary trade-off between coverage and precision, given the scale of the dataset and current methodological limitations. While it limits the scope of our study by not directly tracing the semantic shifts of the compound term “virtual particle,” focusing on “virtual” allowed us to cast a wider net and capture important variants of the virtual particle such as virtual photon or virtual level. Our analysis suggests that these were, respectively, the most important specific sense of virtual particle and one of the central precursor concepts out of which the virtual particle emerged. This approach aligns with insights from prototype theory in cognitive linguistics, which understands concepts as structured around central prototypes but extending to less typical, related expressions (Geeraerts 1989). As one reviewer helpfully noted, concepts often precede the emergence of a stable term and may be realized through a variety of linguistic forms. Focusing on “virtual” thus offers a way to approximate the broader conceptual space, even in periods when the specific terminology virtual particle is not yet in use. While analyzing word dependencies offered partial insights into the concept’s extended semantic network, this approach falls short of comprehensively mapping its full complexity. A more thorough delineation of the conceptual space using BERT models would require extracting and clustering CWEs for all relevant terms in context – a computationally intensive and still largely unexplored task. Some progress has been made in this area using more traditional co-occurrence methods (cf. Kleymann, Niekler, and Burghardt 2022; Malaterre and Léonard 2024) and a “human-in-the-loop” framework for modeling concepts with iteratively updated sentence embeddings based on user feedback (Fischer et al. 2024). Future work should address efficient methods for identifying and modeling dynamic relationships within these semantic spaces, enabling a more comprehensive representation of concepts as interconnected networks and bridging the gap between lexical SCD and conceptual history.
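To give a sense of what such an extraction step involves, here is a minimal sketch of collecting CWEs for a single target term with the Hugging Face transformers library. The generic bert-base-uncased checkpoint stands in for our domain-adapted model, which is not publicly available, and the exact-token match deliberately sidesteps subword alignment.

```python
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "bert-base-uncased"  # stand-in for our domain-adapted model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

def embed_target(sentence: str, target: str = "virtual") -> torch.Tensor:
    """Return one contextualized vector per occurrence of `target`
    (matched as a whole wordpiece) in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    vecs = [hidden[i] for i, tok in enumerate(tokens) if tok == target]
    return torch.stack(vecs) if vecs else torch.empty(0, hidden.shape[-1])

# Collect such vectors across a corpus, then cluster them (k-means,
# affinity propagation) to approximate the senses of the target term.
vecs = embed_target("The electrons interact by exchanging a virtual photon.")
```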

The second issue lies in the limitations and biases of historical datasets (Periti and Montanelli 2024; Wevers and Koolen 2020). Temporal sparsity often affects the availability of data from early stages of conceptual development, as older periods are underrepresented due to the scarcity of publications, archival losses, or limited digitization. This imbalance complicates efforts to trace the origins and early transformations of concepts, which are often the most critical yet least accessible aspects of their development. We tried to account for some of these issues by employing permutation-based statistical testing to verify the significance of our results, but further work is needed. Additionally, historical corpora frequently feature linguistic diversity and orthographic shifts, which add complexity to preprocessing and model training. Geographic and cultural biases compound these challenges, as historical datasets disproportionately reflect dominant scientific or cultural centers, both past and present. In our study, for instance, we analyzed an English-language journal family, which aligns with the current dominance of English as the lingua franca of science. However, this focus neglects earlier periods when other languages, such as German, played a pivotal role in physics. Even when multilingual resources are available, aligning and analyzing them presents additional challenges (cf. Periti and Montanelli 2024), requiring sophisticated cross-linguistic processing and semantic harmonization. Addressing these intertwined issues of temporal, geographic, and linguistic bias is essential to ensure that computational approaches to conceptual history provide a robust and inclusive understanding of the intellectual past.

The third issue concerns the treatment of mathematical formulas in the context of conceptual history and HPS. We excluded formulas from our analysis, as their inclusion risked introducing excessive noise – each symbol being tokenized as a single unit in BERT. However, the overall impact of omitting formulas remains difficult to estimate. On one hand, the symbols used in formulas are often explained in the surrounding text, which was included in our analysis. On the other hand, formulas can carry significant standalone meaning in ways not explicitly clarified in the accompanying text. In our case, for instance, they may represent different virtual entities, such as particles, states, or processes. Furthermore, changes in the frequency of formulas over time could have influenced their relevance to the conceptual evolution, making their exclusion potentially more or less consequential depending on the period in question.
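The tokenization problem is easy to reproduce. Running a formula through a standard BERT tokenizer (the generic bert-base-uncased checkpoint serves as a stand-in here) fragments it into single symbols and short pieces that carry little meaning on their own:

```python
from transformers import AutoTokenizer

# Stand-in tokenizer for illustration only.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tok.tokenize("E = m c^2"))
# The formula fragments into individual symbols and digits, which
# adds noise rather than usable context to the surrounding embeddings.
```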

Looking ahead, advances in LLMs – including improved multilingual and multimodal capabilities as well as prompt-based methods for modeling semantic change – are expected to open up new avenues for the historical analysis of scientific concepts. With regard to SCD, initial studies exploring zero-shot and few-shot approaches using generative LLMs such as (Chat)GPT suggest that, while promising, these models are still outperformed by BERT-based approaches and are less efficient in terms of cost and computational resources (Cassotti, De Pascale, and Tahmasebi 2024; Giulianelli et al. 2023; Periti and Montanelli 2024; Periti and Tahmasebi 2024; Periti, Dubossarsky, and Tahmasebi 2024). In particular, we believe that BERT-based models are still better suited for analyzing highly specialized language where domain adaptation plays a crucial role. Our own use of fine-tuned BERT embeddings in the domain of particle physics has proven effective in tracing semantic shifts within this highly specific context. Nevertheless, prompt-based approaches may soon yield broader insights across more diverse datasets. Future work could explore hybrid strategies, employing generative LLMs for initial exploration and BERT-based embeddings for more granular, domain-specific analysis. In addition, the rise of multimodal approaches (Wu et al. 2023; Yin et al. 2024), which integrate non-textual data such as visual and oral communication, offers a pathway for addressing some limitations inherent in text-only methods. These models may be used in future work to incorporate visual representations (e.g., Feynman diagrams, as in the case of virtual particles) and mathematical formulas alongside textual data to more fully capture how scientific meaning is constructed and evolves. Collectively, these innovations have the potential to support a more comprehensive and nuanced understanding of conceptual development within scientific practice.

Acknowledgments

We would like to thank the members of the DFG Research Unit “The Epistemology of the Large Hadron” for their valuable feedback at various stages of this work. We are especially grateful to Robert Harlander, Jean-Philippe Martinez, Rebecka Mähring and Friedrich Steinle, as well as the anonymous reviewers, for their insightful comments and helpful suggestions. This article is based on Michael Zichert’s MSc thesis, defended at the University of Leipzig (Computational Humanities Research Group), under the supervision of Adrian Wüthrich and Andreas Niekler. We would also like to thank Andreas Niekler for his additional feedback. Finally, we gratefully acknowledge the APS for granting access to the relevant full texts and metadata as well as the support by the Open Access Publication Fund of TU Berlin.

Data availability statement

The code used in this study is available at https://github.com/mZichert/scd_vp. Due to copyright restrictions regarding the PR journals, the dataset and the domain-adapted BERT model used in the study are not publicly available.

Disclosure of use of AI tools

This manuscript was prepared solely by the authors. ChatGPT4o was used to improve language, style and readability. All ideas, analyses and chapter drafts are the authors’ own work. No AI-generated content was included without human verification and revision. The authors take full responsibility for the final text.

Author contributions

M.Z.: Conceptualization, methodology, software, data curation, formal analysis, investigation, visualization, writing – original draft. A.S.: Conceptualization, validation, writing – original draft. A.W.: Conceptualization, validation, supervision, funding acquisition, writing – review and editing.

Funding statement

This work was supported by the DFG Research Unit “The Epistemology of the LHC” (Grant FOR 2063) as well as by the European Union under an ERC Consolidator Grant (Project No. 101044932, “Network Epistemology in Practice (NEPI)”). Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.

Competing interests

The authors declare none.

Ethical standards

The authors affirm this research did not involve human participants.

Appendix

A. Figures

Figure A.1. Shifts in dominant meaning in field-specific PR-journals for “virtual,” using PRT (left) and JSD for KM (right). For clarity, the rolling mean over three years is shown.

Figure A.2. P-values (unadjusted and adjusted with Benjamini–Hochberg procedure) for the permutation-based statistical testing of the PRT-metric for “virtual.” The testing was done for 100,000 iterations (r). The dashed red line marks the significance threshold of 0.05.

Figure A.3. Changing degree of polysemy in field-specific PR-journals for “virtual,” using AID (left) and normalized Shannon entropy for KM (right). For clarity, the rolling mean over three years is shown.

Tables

Table A.1. Top five lemmatized dependencies for “virtual” per field-specific journal

Note: The number in brackets represents the share of the dependency in all dependencies of the journal.

Table A.2. Detailed count of articles, “virtual”-embeddings and cleaned tokens in “virtual”-corpus per year

Footnotes

This research article was awarded the Open Materials badge for transparent practices. See the Data availability statement for details.

1 To clarify the distinction between a concept and its linguistic representations, we use italics to denote the concept (e.g., virtual particle) and quotation marks to refer to its linguistic expressions (e.g., “virtual particle”).

2 Synonym terms such as “intermediate” – particularly in early uses like “intermediate states” – were explored as possible precursors to “virtual” in Michael Zichert’s master’s thesis (Zichert 2023), on which this article builds. However, this analysis did not yield sufficiently robust results using the SCD approach. Related terms such as “virtuality” were also considered, as they refer to specific aspects of the concept (e.g., particles being off-shell), but for the present article we chose to focus exclusively on “virtual” in order to maintain clarity and analytic focus.

3 This article is an extended version of our previous work published in the proceedings of the Computational Humanities Research conference in 2024 (Aarhus) (Zichert and Wüthrich 2024). The present version has been further developed by situating the study more explicitly within the field of conceptual history in HPS, highlighting the methodological challenges of applying CWEs to this aim, reflecting on the potential of LLM-based methods to enrich conceptual history and related fields, and offering a more in-depth interpretation of the results.

4 GROBID stands for GeneRation Of BIbliographic Data (https://grobid.readthedocs.io/en/latest/).

11 For a non-technical overview of higher order calculations, see Zanderighi (2017).

12 We limit the number of permutations to a maximum of 100,000 per time interval, i.e., two subsequent years, to save computational resources.

References

American Physical Society. 2023. “APS Data Sets for Research.” https://journals.aps.org/datasets.
Arnison, G., Astbury, A., Aubert, B., Bacci, C., Bauer, G., Bézaguet, A., et al. 1983. “Experimental Observation of Isolated Large Transverse Energy Electrons with Associated Missing Energy at $\sqrt{s}=540$ GeV.” Physics Letters B 122, no. 1: 103–16.
Barut, Asim O. 1962. “Virtual Particles.” Physical Review 126, no. 5: 1873–75.
Baumann, Andreas, Stephan, Andreas, and Roth, Benjamin. 2023. “Seeing Through the Mess: Evolutionary Dynamics of Lexical Polysemy.” In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, edited by Bouamor, Houda, Pino, Juan, and Bali, Kalika, 8745–62. Association for Computational Linguistics.
Beltagy, Iz, Lo, Kyle, and Cohan, Arman. 2019. “SciBERT: A Pretrained Language Model for Scientific Text.” In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), edited by Inui, K., Jiang, J., Ng, V., and Wan, X., 3615–20. Association for Computational Linguistics.
Bethe, Hans Albrecht. 1937. “Nuclear Physics B. Nuclear Dynamics, Theoretical.” Reviews of Modern Physics 9, no. 2: 69–244.
Bethe, Hans Albrecht, and Bacher, Robert Fox. 1936. “Nuclear Physics A. Stationary States of Nuclei.” Reviews of Modern Physics 8, no. 2: 82–229.
Blum, Alexander S. 2017. “The State is not Abolished, it Withers Away: How Quantum Field Theory Became a Theory of Scattering.” Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics 60: 46–80.
Bollen, Johan, Rodriguez, Marko A., and Van de Sompel, Herbert. 2006. “Journal Status.” Scientometrics 69, no. 3: 669–87.
Callon, Michel, Courtial, Jean-Pierre, Turner, William A., and Bauin, Serge. 1983. “From Translations to Problematic Networks: An Introduction to Co-Word Analysis.” Social Science Information 22, no. 2: 191–235.
Callon, Michel, Law, John, and Rip, Arie. 1986. “Qualitative Scientometrics.” In Mapping the Dynamics of Science and Technology: Sociology of Science in the Real World, edited by Callon, Michel, Law, John, and Rip, Arie, 103–23. Macmillan.
Cassotti, Pierluigi, De Pascale, Stefano, and Tahmasebi, Nina. 2024. “Using Synchronic Definitions and Semantic Relations to Classify Semantic Change Types.” In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), edited by Ku, L.-W., Martins, A., and Srikumar, V., 4539–53. Association for Computational Linguistics.
Chang, Hasok. 2007. Inventing Temperature: Measurement and Scientific Progress. Oxford University Press.
Chen, Baitong, Ding, Ying, and Ma, Feicheng. 2018. “Semantic Word Shifts in a Scientific Domain.” Scientometrics 117, no. 1: 211–26.
Chew, Geoffrey F., and Low, Francis E. 1959. “Unstable Particles as Targets in Scattering Experiments.” Physical Review 113, no. 6: 1640–8.
Cushing, James T. 1990. Theory Construction and Selection in Modern Physics: The S Matrix. Cambridge University Press.
Daston, Lorraine, and Galison, Peter L. 2007. Objectivity. Zone Books.
De Smet, Hendrik. 2016. “How Gradual Change Progresses: The Interaction Between Convention and Innovation.” Language Variation and Change 28, no. 1: 83–102.
Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton, and Toutanova, Kristina. 2019. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), edited by Burstein, J., Doran, C., and Solorio, T., 4171–86. Association for Computational Linguistics.
Dyson, Freeman J. 1949a. “The Radiation Theories of Tomonaga, Schwinger, and Feynman.” Physical Review 75, no. 3: 486–502.
Dyson, Freeman J. 1949b. “The S Matrix in Quantum Electrodynamics.” Physical Review 75, no. 11: 1736–55.
Ehberger, Markus. 2025. Representing the Unobservable: The Formation of the Virtual Particle Concept in the Practice of Theory (1923–1949), Volume 68 of Science Networks. Historical Studies. Cham: Birkhäuser.
Feest, Uljana, and Steinle, Friedrich. 2012. Scientific Concepts and Investigative Practice. De Gruyter.
Feynman, Richard Phillips. 1949. “Space-Time Approach to Quantum Electrodynamics.” Physical Review 76, no. 6: 769–89.
Fischer, Tim, Schneider, Florian, Geislinger, Robert, Helfer, Florian, Koch, Gertraud, and Biemann, Chris. 2024. “Concept Over Time Analysis: Unveiling Temporal Patterns for Qualitative Data Analysis.” In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations), edited by Chang, Kai-Wei, Lee, Annie, and Rajani, Nazneen, 148–57. Association for Computational Linguistics.
Fleck, Ludwik. 1979. Genesis and Development of a Scientific Fact. University of Chicago Press.
Foucault, Michel. 1970. The Order of Things: An Archaeology of the Human Sciences. First published edition. Tavistock.
Gao, Yunfan, Xiong, Yun, Gao, Xinyu, Jia, Kangxiang, Pan, Jinliu, Bi, Yuxi, Dai, Yi, Sun, Jiawei, Wang, Meng, and Wang, Haofen. 2024. “Retrieval-Augmented Generation for Large Language Models: A Survey.”
Garí Soler, Aina, and Apidianaki, Marianna. 2021. “Let’s Play Mono-Poly: BERT Can Reveal Words’ Polysemy Level and Partitionability into Senses.” Transactions of the Association for Computational Linguistics 9: 825–44.
Gavin, Michael, Jennings, Collin, Kersey, Lauren, and Pasanek, Brad. 2019. “Spaces of Meaning: Conceptual History, Vector Semantics, and Close Reading.” In Debates in the Digital Humanities 2019, edited by Gold, Matthew K. and Klein, Lauren F., 243–67. University of Minnesota Press.
Geeraerts, Dirk. 1989. “Introduction: Prospects and Problems of Prototype Theory.” Linguistics 27, no. 4: 587–612.
Giulianelli, Mario, Tredici, Marco Del, and Fernández, Raquel. 2020. “Analysing Lexical Semantic Change with Contextualised Word Representations.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, edited by Jurafsky, D., Chai, J., Schluter, N., and Tetreault, J., 3960–73. Association for Computational Linguistics.
Giulianelli, Mario, Luden, Iris, Fernandez, Raquel, and Kutuzov, Andrey. 2023. “Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis.” In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), edited by Rogers, A., Boyd-Graber, J., and Okazaki, N., 3130–48. Association for Computational Linguistics.
Grezes, Felix, Blanco-Cuaresma, Sergi, Accomazzi, Alberto, Kurtz, Michael J., Shapurian, Golnaz, Henneken, Edwin, Grant, Carolyn S., Thompson, Donna M., Chyla, Roman, McDonald, Stephen, Hostetler, Timothy W., Templeton, Matthew R., Lockhart, Kelly E., Martinovic, Nemanja, Chen, Shinyi, Tanner, Chris, and Protopapas, Pavlo. 2021. “Building AstroBERT, a Language Model for Astronomy & Astrophysics.”
Grootendorst, Maarten. 2022. “BERTopic: Neural Topic Modeling with a Class-Based TF-IDF Procedure.”
Hacking, Ian. 1975. The Emergence of Probability: A Philosophical Study of Early Ideas About Probability, Induction and Statistical Inference. 1st published edition. Cambridge University Press.
Hacking, Ian. 1999. The Social Construction of What? Harvard University Press.
Harlander, Robert, Martinez, Jean-Philippe, and Schiemann, Gregor. 2023. “The End of the Particle Era?” The European Physical Journal H 48, no. 1: 6.
Hellert, Thorsten, Montenegro, João, and Pollastro, Andrea. 2024. “PhysBERT: A Text Embedding Model for Physics Scientific Literature.” APL Machine Learning 2, no. 4: 046105.
Hoddeson, Lillian, Brown, Laurie, Riordan, Michael, and Dresden, Max. 1997. The Rise of the Standard Model: Particle Physics in the 1960s and 1970s. Cambridge University Press.
Jaeger, Gregg. 2019. “Are Virtual Particles Less Real?” Entropy 21, no. 2.
Kaiser, David. 2005. “Physics and Feynman’s Diagrams.” American Scientist 93, no. 2.
Kleymann, Rabea, Niekler, Andreas, and Burghardt, Manuel. 2022. “Conceptual Forays: A Corpus-based Study of ‘Theory’ in Digital Humanities Journals.” Journal of Cultural Analytics 7, no. 4.
Kuhn, Thomas S. 1962. The Structure of Scientific Revolutions. University of Chicago Press.
Kutuzov, Andrey, Velldal, Erik, and Øvrelid, Lilja. 2022. “Contextualized Language Models for Semantic Change Detection: Lessons Learned.” Northern European Journal of Language Technology 8, no. 1.
Laicher, Severin, Kurtyigit, Sinan, Schlechtweg, Dominik, Kuhn, Jonas, and Schulte im Walde, Sabine. 2021. “Explaining and Improving BERT Performance on Lexical Semantic Change Detection.” In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, edited by Sorodoc, I.-T., Sushil, M., Takmaz, E., and Agirre, E., 192–202. Association for Computational Linguistics.
Laubichler, Manfred D., Maienschein, Jane, and Renn, Jürgen. 2019. “Computational History of Knowledge: Challenges and Opportunities.” Isis 110, no. 3: 502–12.
Lean, Oliver M., Rivelli, Luca, and Pence, Charles H. 2023. “Digital Literature Analysis for Empirical Philosophy of Science.” The British Journal for the Philosophy of Science 74, no. 4: 875–98.
Lee, Jinhyuk, Yoon, Wonjin, Kim, Sungdong, Kim, Donghyeon, Kim, Sunkyu, So, Chan Ho, and Kang, Jaewoo. 2020. “BioBERT: A Pre-Trained Biomedical Language Representation Model for Biomedical Text Mining.” Bioinformatics 36, no. 4: 1234–40.
Liu, Yan, Medlar, Alan, and Glowacka, Dorota. 2021. “Statistically Significant Detection of Semantic Shifts Using Contextual Word Embeddings.” In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, edited by Gao, Y., Eger, S., Zhao, W., Lertvittayakumjorn, P., and Fomicheva, M., 104–13. Association for Computational Linguistics.
Livingston, M. Stanley, and Bethe, Hans Albrecht. 1937. “Nuclear Physics C. Nuclear Dynamics, Experimental.” Reviews of Modern Physics 9, no. 3: 245–390.
Malaterre, Christophe, and Léonard, Martin. 2024. “Epistemic Markers in the Scientific Discourse.” Philosophy of Science 91, no. 1: 151–74.
Martinc, Matej, Kralj Novak, Petra, and Pollak, Senja. 2020. “Leveraging Contextual Embeddings for Detecting Diachronic Semantic Shift.” In Proceedings of the Twelfth Language Resources and Evaluation Conference, edited by Calzolari, N., Béchet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., and Piperidis, S., 4811–9. European Language Resources Association.
Martinc, Matej, Montariol, Syrielle, Zosa, Elaine, and Pivovarova, Lidia. 2020. “Capturing Evolution in Word Usage: Just Add More Clusters?” In Companion Proceedings of the Web Conference 2020, WWW ’20, 343–9. Association for Computing Machinery.
Martinez, Jean-Philippe. 2024. “Virtuality in Modern Physics in the 1920s and 1930s: Meaning(s) of an Emerging Notion.” Perspectives on Science 32, no. 3: 350–71.
Mikolov, Tomas, Chen, Kai, Corrado, Greg, and Dean, Jeffrey. 2013. “Efficient Estimation of Word Representations in Vector Space.”
Montariol, Syrielle, Martinc, Matej, and Pivovarova, Lidia. 2021. “Scalable and Interpretable Semantic Change Detection.” In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, edited by Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tur, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., and Zhou, Y., 4642–52. Association for Computational Linguistics.
Overton, James A. 2013. “‘Explain’ in Scientific Discourse.” Synthese 190, no. 8: 1383–405.
Palti, Elias Jose. 2011. “Reinhart Koselleck: His Concept of the Concept and Neo-Kantianism.” Contributions to the History of Concepts 6, no. 2: 1–20.
Periti, Francesco, Dubossarsky, Haim, and Tahmasebi, Nina. 2024. “(Chat)GPT v BERT: Dawn of Justice for Semantic Change Detection.” In Findings of the Association for Computational Linguistics: EACL 2024, edited by Graham, Y. and Purver, M., 420–36. Association for Computational Linguistics.
Periti, Francesco, and Montanelli, Stefano. 2024. “Lexical Semantic Change through Large Language Models: A Survey.” ACM Computing Surveys 56, no. 11: 282:1–38.
Periti, Francesco, and Tahmasebi, Nina. 2024. “A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change.” In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), edited by Duh, K., Gomez, H., and Bethard, S., 4262–82. Association for Computational Linguistics.
Pickering, Andrew. 1999. Constructing Quarks: A Sociological History of Particle Physics. University of Chicago Press.
Radford, Alec, and Narasimhan, Karthik. 2018. “Improving Language Understanding by Generative Pre-Training.”
Rheinberger, Hans-Jörg. 1997. Toward a History of Epistemic Things: Synthesizing Proteins in the Test Tube (Illustrated Edition). Stanford University Press.
Salam, A. 1968. “Elementary Particle Theory: Relativistic Groups and Analyticity.” In Proceedings of the Nobel Symposium, 1968, Stockholm, Sweden, 367.
Simons, Arno. 2024a. “Astro-HEP-BERT: A Bidirectional Language Model for Studying the Meanings of Concepts in Astrophysics and High Energy Physics.”
Simons, Arno. 2024b. “Meaning at the Planck Scale? Contextualized Word Embeddings for Doing History, Philosophy, and Sociology of Science.”
Simons, Arno, Zichert, Michael, and Wüthrich, Adrian. 2025. “Large Language Models for History, Philosophy, and Sociology of Science: Interpretive Uses, Methodological Challenges, and Critical Perspectives.” Prepublished, Preprint.
Small, Henry. 1973. “Co-Citation in the Scientific Literature: A New Measure of the Relationship Between Two Documents.” Journal of the American Society for Information Science 24, no. 4: 265–9.
Small, Henry G. 1978. “Cited Documents as Concept Symbols.” Social Studies of Science 8, no. 3: 327.
Steinle, Friedrich. 2016. Exploratory Experiments: Ampère, Faraday, and the Origins of Electrodynamics. University of Pittsburgh Press.
Tahmasebi, Nina, Borin, Lars, and Jatowt, Adam. 2021. “Survey of Computational Approaches to Lexical Semantic Change Detection.” In Computational Approaches to Semantic Change, 1–91.
Valente, Mario Bacelar. 2011. “Are Virtual Quanta Nothing but Formal Tools?” International Studies in the Philosophy of Science 25, no. 1: 39–53.
Weinberg, Steven. 1967. “A Model of Leptons.” Physical Review Letters 19: 1264–6.
Wevers, Melvin, and Koolen, Marijn. 2020. “Digital Begriffsgeschichte: Tracing Semantic Change Using Word Embeddings.” Historical Methods: A Journal of Quantitative and Interdisciplinary History 53, no. 4: 226–43.
Wu, Jiayang, Gan, Wensheng, Chen, Zefeng, Wan, Shicheng, and Yu, Philip S. 2023. “Multimodal Large Language Models: A Survey,” 2247–56. IEEE Computer Society.
Wüthrich, Adrian. 2010. The Genesis of Feynman Diagrams. Dordrecht: Springer.
Yin, Shukang, Fu, Chaoyou, Zhao, Sirui, Li, Ke, Sun, Xing, Xu, Tong, and Chen, Enhong. 2024. “A Survey on Multimodal Large Language Models.” National Science Review 11, no. 12: nwae403.
Zanderighi, Giulia. 2017. “The Two-Loop Explosion.” CERN Courier 57, no. 3: 19–22.
Zhang, Yu, Chen, Xiusi, Jin, Bowen, Wang, Sheng, Ji, Shuiwang, Wang, Wei, and Han, Jiawei. 2024. “A Comprehensive Survey of Scientific Large Language Models and their Applications in Scientific Discovery.” In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, edited by Al-Onaizan, Y., Bansal, M., and Chen, Y.-N., 8783–817. Association for Computational Linguistics.
Zhu, Yutao, Yuan, Huaying, Wang, Shuting, Liu, Jiongnan, Liu, Wenhan, Deng, Chenlong, Chen, Haonan, Liu, Z., Dou, Zhicheng, and Wen, Ji-Rong. 2025. “Large Language Models for Information Retrieval: A Survey.” ACM Transactions on Information Systems. Just Accepted.
Zichert, Michael. 2023. “Eine digitale Begriffsgeschichte des virtuellen Teilchens” [A digital conceptual history of the virtual particle]. M.Sc. thesis, University of Leipzig.
Zichert, Michael, and Wüthrich, Adrian. 2024. “Tracing the Development of the Virtual Particle Concept Using Semantic Change Detection.” In Proceedings of the Computational Humanities Research Conference 2024, Volume 3834 of CEUR Workshop Proceedings, edited by Haverals, W., Koolen, M., and Thompson, L., 848–68. CEUR.