1. Introduction
In the context of Engineering Design (ED), Natural Language Processing (NLP) techniques have enabled computer-based systems to process large volumes of unstructured technical information written in natural language. With the advent of Large Language Models (LLMs), the ability to extract and generate ED knowledge from unstructured text has significantly improved (Hue et al., 2023). LLMs are pretrained on vast corpora of textual data, learning to predict the next word in a sequence. After pretraining, these models can be given textual inputs (i.e., prompts) and generate textual responses as outputs. This approach, known as prompting, has been extensively used in the last year to extract and/or generate ED knowledge from technical documents such as design specifications, product reviews and patents. For instance, El Hassani et al. (2024) integrated LLMs into Failure-Mode-Effect-Analysis, using prompting to extract failure modes from customer reviews of vehicles. Additionally, they used LLMs to generate potential causes for a given failure mode. Similarly, Chen et al. (2024a) used LLMs with prompting to extract a Function-Behaviour-Structure ontology from engineering requirements, helping in the generation of new design concepts. Chen et al. (2024b) used various prompt strategies to extract problems from descriptions and generate TRIZ-based solutions.
These approaches use design theories to construct prompts that guide LLMs in extracting and/or generating ED concepts from text. By integrating design theories into textual prompts, these approaches operationalize design theories, enabling their practical exploitation for ED applications. However, in the field of ED, little attention has been devoted to assessing whether LLM-generated outputs truly align with the theoretical concepts of design theories. The evaluation of LLM-generated outputs is often omitted or inconsistent, typically relying on qualitative assessments of a limited number of manually selected cases. For example, Chen et al. (2024a) did not assess the quality of the Function-Behaviour-Structure concepts extracted from design statements. El Hassani et al. (2024) conducted a qualitative evaluation of only 100 failure modes extracted by prompting an LLM. Similarly, Akay and Kim (2021) qualitatively evaluated an LLM on 88 pairs of functional requirements and design parameters extracted through prompting from 11 scientific paper abstracts. These approaches rely on the generative capabilities of LLMs to extract and generate ED knowledge in an unsupervised manner, without providing a quantitative assessment of how well LLM-generated outputs match the theoretical definitions provided by design theories.
A quantitative assessment of the performance of LLMs could be achieved by translating the design theory into language. This translation step, which we call linguistic modelling, consists in creating a benchmark textual dataset, manually annotated by experts, containing instances of text that express the design theory under analysis. Such datasets are essential for systematically training, testing and comparing LLMs on ED tasks. However, creating such datasets presents significant challenges: (1) understanding how theoretical ED concepts are expressed in natural language is a critical prerequisite for accurately identifying them in unstructured text, since the only way by which designers, engineers, and humans in general can interact with an LLM is through natural language, which these systems process as both input and output; (2) natural language is ambiguous and often lacks the precision needed to clearly convey technical concepts; (3) manually annotating technical documents is expensive and time-consuming; (4) annotation requires domain-specific expertise; and (5) experts with such domain-specific knowledge are few and have limited time or interest in performing tasks like text annotation, which are often viewed as tedious and outside their primary objectives.
In this work, we present a linguistic modelling of the theoretical concepts from Axiomatic Design (AXD) theory (Suh, 2001). Specifically, we analysed how Functional Requirements and Design Parameters can be expressed in natural language. We argue that identifying AXD concepts in text is challenging even for experts in the field, and it is unlikely that prompting LLMs can perform this task reliably. To support this claim, we quantitatively assess the challenges of annotating these concepts in sentences extracted from patent documents. We used patents as our data source because they explain how a technology works, providing highly technical content. Additionally, these data are openly available for research and have been used in previous studies on AXD concept identification (Li and Tate, 2010). As a contribution, we release a benchmark textual dataset containing 6,000 patent sentences annotated with 19,555 AXD concepts, validated by domain experts.
2. Axiomatic Design
Design has been defined in a variety of ways depending on the specific context and field of interest. Among the many theories on defining and executing good design, Axiomatic Design (AXD) theory, introduced by Suh (2001), stands out as a principle-based theory for developing a scientific foundation of design. Suh (2001, p. 3) defines design as: “the interplay [mapping] between what we want to achieve and how we want to achieve it”. In AXD theory, Functional Requirements (FRs) define “what we want to achieve”, describing the functional needs of a system (e.g., product, process, software, or organization) within the functional domain. Conversely, Design Parameters (DPs) specify “how we want to achieve it”, representing the key physical variables in the physical domain that satisfy the FRs of a system. Suh (2001, p. 12, Table 1.1) further defines DPs as “machine, components and subcomponents” of a system and “physical variables” of a product. For example, an aluminium beverage container must “contain carbonated liquid” (i.e., FR), which can be achieved by setting the “volume of the can body” (i.e., DP). The primary goal of AXD theory is to minimize the trial-and-error nature of the design process by providing a scientific approach to mapping FRs into DPs. For readers familiar with AXD theory, this mapping can be expressed mathematically as FRs = [A]*DPs, where [A] is the design matrix that defines the relationships between the FRs and DPs. This ensures that, once customers’ needs are specified in terms of FRs, designers identify the most effective combination of DPs to meet all the FRs.
Two fundamental axioms guide the AXD process for developing effective designs. Firstly, the Independence Axiom states that a good design maintains FR independence, meaning each FR is satisfied by a unique DP. This results in an uncoupled design. Secondly, the Information Axiom states that among multiple uncoupled designs, the optimal one minimizes information content, ensuring a higher probability that DPs successfully fulfil the FRs. By following these principles, AXD improves the design process, leading to more reliable and optimized solutions.
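As an illustration of the mapping and of the Independence Axiom, the design equation for a system with two FRs and two DPs can be written out as below. The entry names A_{ij} follow common AXD notation and are an assumption of this sketch; they do not appear in the text above.

```latex
\begin{equation}
\begin{Bmatrix} FR_1 \\ FR_2 \end{Bmatrix}
=
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{Bmatrix} DP_1 \\ DP_2 \end{Bmatrix},
\qquad
\text{uncoupled design: } A_{12} = A_{21} = 0 .
\end{equation}
```

With a diagonal [A], each FR depends on exactly one DP, so that FR_1 = A_{11} DP_1 and FR_2 = A_{22} DP_2.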
2.1. The grammar of Axiomatic Design theory in natural language
Natural language inherently introduces complexity when used to express AXD concepts because it often lacks the precision required in technical definitions. Therefore, applying Suh’s (2001) definitions to natural language text can result in multiple interpretations. For instance, consider the sentence “The thermally resistant body of the beverage container is required to contain carbonated liquid by having a volume of 33 cl”. The phrase “contain carbonated liquid” refers to a FR. However, ambiguity arises in determining which terms in the sentence express the DP that satisfies the FR. In fact, the FR can be satisfied by (1) the “body of the beverage container” (which we call DP1), a physical component of the beverage container, or alternatively (2) the “volume of the body” (which we call DP2), a physical variable of the “body of the beverage container” component. Both DP1 and DP2 fit within Suh’s definition of DPs. This ambiguity makes annotating AXD concepts in text challenging, as it requires readers to subjectively interpret which natural language expressions match AXD theoretical concepts.
Furthermore, the phrase “thermally resistant body” may implicitly express the FR “maintaining the temperature of the carbonated liquid”, and it also suggests a corresponding DP, defined as the “thermal insulation of the body”, which is not explicitly stated in the sentence. This example shows how natural language indirectly conveys AXD concepts, relying on the reader’s background knowledge for interpretation. This can lead to subjective understanding of AXD concepts and different textual annotations. Although AXD theory offers a robust mathematical framework for modelling FRs, DPs and their mappings, it does not address how these concepts are expressed in natural language and how to consistently extract them from unstructured text. To address this gap, this section provides: a linguistic modelling of AXD concepts, and an analysis of how AXD concepts can be expressed in patent sentences.
This linguistic modelling step is fundamental for constructing a benchmark dataset to evaluate the performance of the LLM. As written above, the only means by which humans can interact with an LLM is through text, as these systems exclusively process text as both input and output.
In our linguistic modelling of AXD concepts, a FR is an “Action specified for the products or the system” (Suh, 2001, p. 12, Table 1.1) and consists of three components: a Doer, an Action and a Receiver. The Doer is the entity that performs the Action, while the Receiver is the entity that is affected by the Action (i.e., receives the Action). This linguistic framework is based on the Subject-Action-Object (SAO) NLP methodology, which employs grammatical patterns to extract subjects, actions, and objects from text (Li and Tate, 2010). In patent sentences, FRs are typically expressed as follows:
- 1) FR with actions expressed with an active verb: “The imaging device transmits the acquired image to the processing device 10.” (US11861893B2). Here, the “imaging device” (Doer) performs the Action “to transmit” on the “acquired image” (Receiver). In cases involving intransitive verbs (i.e., verbs that do not require a direct object), the Receiver of the Action remains unspecified. For example, in “While the vehicle is running, […]” (US8060303B2), the “vehicle” (Doer) performs the Action “running”, which does not act upon a Receiver.
- 2) FR with actions expressed with a passive verb: “The pump 77 is an electric pump that is driven by an electric motor.” (US11859581B2). Here, the Action “to drive” is performed by the “electric motor” (Doer) over the “pump” (Receiver). Moreover, in agentless passive constructions, the Doer may be omitted. For instance, in “The fluid such as water is now pumped into the drill string 12.” (US11542767B2), the Action “to pump” is performed on the “water” (Receiver) without specifying the Doer. 
- 3) FR with actions expressed with a past participle verb: in some cases, actions can be expressed using the past-participle form of verbs. For instance, in “When the airbag is deployed, the gas pumped into the airbag unfolds the airbag” (US2006197328A1), the past participle verb “pumped” modifies the noun “gas”, indicating that the gas (Receiver) has already been pumped (Action) before it initiates the unfolding of the airbag.
- 4) FR as noun phrases: in some cases, actions are expressed using nouns rather than verbs. These constructions can often be rephrased to make the Action explicit through verb use. For example, “The technology allows for the measurement of both vertical and horizontal angles […]” (US11859363B2) can be rephrased as “The technology measures both vertical and horizontal angles […]” where the action “to measure” (Action) is expressed with the verb “measures” instead of the noun “measurement”. 
- 5) FR as component names: actions are also implicitly expressed using component names. For instance, in “The apparatus comprises a water collection tube pivoted to one side of its balance point.” (US8739457B1) the component “water collection tube” implicitly refers to the action “to collect” (Action) performed by the “tube” (Doer) and received by the “water” (Receiver). 
- 6) FR as adjectives or adverbs: in some cases, actions are expressed using adjectives. For example, in “Preferably, the flap is pivotable about the real pivot axis from a closed position into an opened position.” (US10113939B2) the adjective “pivotable” implicitly conveys the Action “to pivot” performed by the “flap” (Doer). Similarly, FRs can be expressed by adverbs; for instance, in “First shaft 73 is rotably mounted on upright 37 and upright 39.” (US5941414A) the adverb “rotably” refers to the Action “to rotate” performed by the “first shaft” (Doer). 
In this work we focus only on FRs expressed with verb forms (cases 1-3), as verbs are the primary form identified by Suh (2001) for expressing FRs. Furthermore, verb forms generally have less subjective interpretations compared to cases 4-6, making it easier to establish clear annotation guidelines for text.
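The Doer-Action-Receiver structure of cases 1-3 can be sketched as a small data structure. This is a minimal illustration and not the format of the released dataset; the class name `FunctionalRequirement` and the method `missing_roles` are hypothetical, and the instances reuse the patent examples quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FunctionalRequirement:
    """A FR as a Doer-Action-Receiver triple; Doer and Receiver may be absent."""
    action: str
    doer: Optional[str] = None
    receiver: Optional[str] = None

    def missing_roles(self) -> list[str]:
        """Return the names of the roles the sentence leaves unspecified."""
        return [role for role in ("doer", "receiver") if getattr(self, role) is None]


# Case 1, transitive active verb: all three roles are present.
active = FunctionalRequirement(action="transmit", doer="imaging device", receiver="acquired image")
# Case 1, intransitive verb: no Receiver.
intransitive = FunctionalRequirement(action="run", doer="vehicle")
# Case 2, agentless passive: no Doer.
agentless_passive = FunctionalRequirement(action="pump", receiver="water")
```

Making Doer and Receiver optional reflects the observation that intransitive verbs and agentless passives leave one role unstated.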
In our linguistic modelling of AXD concepts, a DP is a “measurable attribute of a component that designers define and adjust within its feasible range to meet one or more FRs” (Suh, 2001). In patent sentences, DPs are typically expressed as follows:
- 1) DP as nouns with contextual specification: “Further, a temperature of the valve assembly is controlled based on the coolant flow.” (US2015361847A1). Here, the phrase “temperature of the valve assembly” expresses the DP. The phrase “of the valve assembly” provides additional context to the noun “temperature”, specifying that the temperature refers to the valve assembly.
- 2) DP as component names: DPs may also be expressed implicitly via component names. For example, in “The system 36 may also include, an air humidity sensor 52 […]” (US2015144504A1) the phrase “air humidity sensor” refers to a component name, but the word “humidity” implicitly indicates a DP that influences the component’s functionality.
- 3) DP as adjectives or adverbs: in some cases, DPs are expressed using adjectives rather than nouns. For instance, in “Accordingly, the outer face of the package box 90 is unsmooth” (US2007205130A1) the adjective “unsmooth” implicitly conveys the DP “roughness of the package box”. Similarly, DPs can be expressed using adverbs; for instance, in “[…] the inert gas flows more uniformly to the welding point” (US2024009751A1) the adverb “uniformly” can implicitly refer to the DP “flow rate of the inert gas”.
In this work we address only DPs expressed as nouns with contextual specification (case 1), as nouns are the primary form identified by Suh (2001) for expressing DPs. Moreover, this form is less subject to interpretation than cases 2-3, simplifying the development of clear annotation guidelines for text.
In our linguistic modelling of AXD concepts, the mappings between FRs and DPs, represented by the design matrix [A], are named Axiomatic Relations (AXRs). AXRs can take the form of explicit or implicit relations. Explicit relations in text clearly define both the entities and their semantic connection, while implicit relations omit either the entities, the nature of their relationship, or both (Giordano et al., 2024). According to this definition, AXRs can be expressed as follows:
- 1) Explicit Axiomatic Relations (AXRs): these relations consist of a FR, a DP, and an Axiomatic Pointer (AXP), which expresses the DP’s influence on the Action of the FR. For example, in the sentence “In response to the threshold electrical signal, the controller activates a second alarm […]” (US2005275519A1) the AXP “in response to” explicitly indicates the influence of the DP “threshold electrical signal” over the FR expressed with “the controller (Doer) activates (Action) a second alarm (Receiver)”.
- 2) Implicit Axiomatic Relations (AXRs): these relations do not contain an explicit AXP that links the DP to the FR, leaving the connection to be inferred from the sentence. For example, in “The secondary gas flow of the spray device 7 may produce a suction force in the second conduit to draw the dry powder 4 into the secondary gas flow.” (US11541383B2), while the DP “suction force” clearly influences the FR “draw (Action) the dry powder (Receiver)”, there are no words in the sentence that explicitly define the semantic link between them.
Table 1 summarizes the discussion provided above, aligning each AXD concept with its linguistic representation, accompanied by a definition and an illustrative example.
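The distinction between explicit and implicit AXRs reduces to whether an AXP is present, which can be sketched as follows. The class `AxiomaticRelation` and its field names are hypothetical and chosen only for illustration; the two instances reuse the patent examples above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AxiomaticRelation:
    """Link from a DP to the Action of a FR, with an optional pointer (AXP)."""
    dp: str
    fr_action: str
    axp: Optional[str] = None  # None when no pointer appears in the sentence

    @property
    def kind(self) -> str:
        """'explicit' if an AXP is annotated, otherwise 'implicit'."""
        return "explicit" if self.axp is not None else "implicit"


explicit = AxiomaticRelation(
    dp="threshold electrical signal", fr_action="activates", axp="in response to"
)
implicit = AxiomaticRelation(dp="suction force", fr_action="draw")
```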
2.2. NLP for Axiomatic Design
Several efforts have been made to model how FRs and DPs are expressed in technical documents.
Before the advent of LLMs, Chen et al. (2008) developed a retrieval system capable of extracting FRs expressed in product specifications using the “Verb-Noun” structure. Similarly, Li and Tate (2010) employed grammatical rules based on the Subject-Action-Object (SAO) structure to detect FRs in patents. These approaches rely on fixed grammatical patterns to model FRs linguistically but fail to capture the diverse and context-dependent ways FRs appear in technical documents, limiting their effectiveness.
After the advent of LLMs, Akay and Kim (2020) used SAO structures to create prompts. These prompts were designed to generate a synthetic dataset of sentences expressing FRs and DPs using OpenAI’s GPT-2. This dataset was then used to train a classifier designed to distinguish between FRs and DPs in unstructured text. Akay and Kim (2021) used BERT within a question-answering framework to extract FRs and DPs from design statements. Given a design statement as input, they used prompts such as “What is the aim?” to identify the FR of the system, followed by “How does it {FR}?” to extract the DP that satisfies the identified FR. Building on this, Akay et al. (2023) integrated an LLM with prompting to create a chatbot capable of extracting FRs and DPs from design specifications. Similarly, Kwon et al. (2024) used a BERT-based question-answering model to extract design requirements from technical specifications. These approaches rely on the capabilities of language models to extract FRs and DPs from text but lack both a linguistic modelling of AXD concepts and a quantitative assessment of how well the model outputs align with AXD theory.
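The two-step question-answering scheme described above can be sketched as a chain of two questions. This is only an illustration of the prompt chaining and not the cited authors' implementation; `answer(question, context)` is a hypothetical stand-in for any extractive question-answering model, and `extract_fr_dp` is a name introduced here.

```python
from typing import Callable

# Question templates quoted in the text above.
FR_QUESTION = "What is the aim?"
DP_QUESTION_TEMPLATE = "How does it {FR}?"


def extract_fr_dp(statement: str, answer: Callable[[str, str], str]) -> tuple[str, str]:
    """Ask for the FR first, then reuse the answer to ask for the DP.

    `answer(question, context)` is a hypothetical stand-in for a
    question-answering model; it is not part of any specific library.
    """
    fr = answer(FR_QUESTION, statement)
    dp = answer(DP_QUESTION_TEMPLATE.format(FR=fr), statement)
    return fr, dp
```

Because the second question embeds the first answer, an error in the extracted FR propagates directly into the DP query, which is one reason a quantitative benchmark is useful for this kind of pipeline.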
Table 1. Summary of the linguistic modelling of AXD concepts with definitions and example

3. Methodology
The methodology proposed in this work is composed of the following three phases: (1) Data Collection and Preprocessing, involving the retrieval and processing of patent documents to extract sentences to annotate; (2) Data Annotation, where sentences were annotated with AXD concepts by two experts (in NLP jargon called annotators); and (3) Annotation Evaluation, assessing annotation consistency and accuracy among the different annotators.
3.1. Data collection and preprocessing
We randomly collected 14,417 patents from the United States Patent and Trademark Office (USPTO) Bulk Storage System (https://bulkdata.uspto.gov/), focusing on the full text, particularly the description section, as it details the invention and is likely to contain AXD concepts. The description sections were divided into 4,408,204 sentences, which were then filtered to exclude sentences that were too short or too long based on word count. Sentences with 10 to 40 words (15 words above and below the median of 25 words) were retained, preserving 74.54% of the original set. After removing duplicates, the dataset was reduced to 2,929,742 sentences, representing 66.46% of the initial collection. From this dataset, we randomly sampled 750 sentences from each International Patent Classification (IPC) section to create a balanced sample of 6,000 sentences covering A-H patent domains.
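The filtering and sampling steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input format of `(ipc_section, sentence)` pairs, whitespace-based word counting, exact-match deduplication, and the function name `filter_and_sample` are all assumptions.

```python
import random
from collections import defaultdict

MIN_WORDS, MAX_WORDS = 10, 40   # length bounds stated in the text
PER_SECTION = 750               # sentences sampled per IPC section


def filter_and_sample(records, per_section=PER_SECTION, seed=0):
    """records: iterable of (ipc_section, sentence) pairs (assumed input format)."""
    seen = set()
    by_section = defaultdict(list)
    for section, sentence in records:
        n_words = len(sentence.split())   # whitespace tokenisation is an assumption
        if not MIN_WORDS <= n_words <= MAX_WORDS:
            continue                      # too short or too long
        if sentence in seen:
            continue                      # exact-match duplicate
        seen.add(sentence)
        by_section[section].append(sentence)
    rng = random.Random(seed)             # fixed seed for a reproducible sample
    return {
        section: rng.sample(sentences, min(per_section, len(sentences)))
        for section, sentences in sorted(by_section.items())
    }
```

Sampling the same number of sentences from each of the eight IPC sections (A-H) yields the balanced set of 8 × 750 = 6,000 sentences.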
3.2. Data annotation
We developed annotation guidelines to assist annotators in accurately identifying AXD concepts within the 6,000 sentences in the dataset. These guidelines include: (1) linguistic concepts and their corresponding annotations, as outlined in the “Linguistic Concept” column of Table 1; (2) definitions of the concepts, provided in the “Definition” column of Table 1; (3) illustrative examples for each concept, listed in the “Example” column of Table 1; (4) examples of annotated sentences to visually guide annotators on expected outputs; and (5) exclusions, including examples of elements that should not be annotated (e.g., auxiliary verbs such as “configured” in “the radiator 210 is configured to dissipate heat from the motor 200” (US11859481B2) should not be tagged because they only introduce the actual Action “dissipate”). Figure 1 shows how words are annotated based on the linguistic concepts in Table 1. For instance, “motor”, “controls” and “valve” are tagged as Doer, Action, and Receiver, forming a FR. The phrase “according to” is tagged as AXP, and “temperature threshold of the cooling system” is tagged as DP. Additionally, the semantic connection between DP and FR is marked with the AXR tag.
Two PhD students with expertise in ED annotated 4,800 sentences (2,400 each) following the established guidelines. To evaluate annotation consistency, both annotators also independently annotated the same additional 1,200 sentences. This phase produced a dataset of 6,000 distinct annotated sentences. The annotation was conducted using Doccano (https://doccano.github.io/doccano/), an open-source tool for text annotation that supports entity and relation labelling. The annotation process was conducted at a rate of 50 sentences per hour, amounting to approximately 120 hours of manual effort.

Figure 1. Example of annotated sentence with linguistic concepts of AXD
During the annotation phase, annotators met every 600 sentences to review annotation guidelines. If any updates to the guidelines were made, the annotators revised the previously annotated sentences. This iterative process was essential to ensure annotation accuracy and consistency.
3.3. Annotation evaluation
Following the annotation phase, we conducted an automated analysis to identify errors introduced by annotators throughout the annotation phase. We refer to this step as “face validity”. The face validity check flagged sentences where annotation rules were violated. For instance, when an AXR is found to link a Doer with an Action, an error is raised, since the AXR tag can only be used to link a DP to a FR. After the check, flagged sentences were corrected by the annotators, ensuring that the annotated dataset adheres to the annotation guidelines. Moreover, we calculated the number of errors made by annotators, providing valuable insights into the cognitive demands of the annotation task, the complexity of the annotation rules, and potential approaches for simplifying the annotation guidelines.
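A rule check of this kind can be sketched as below. The data layout (entity identifiers mapped to labels, relations as triples), the name `face_validity_errors`, and the choice to anchor the FR end of an AXR on its Action entity are assumptions of this sketch and may differ from the checks actually run on the dataset.

```python
# Allowed (source label, target label) per relation tag. Anchoring the FR end of
# an AXR on its Action entity is an assumption of this sketch.
ALLOWED_LINKS = {"AXR": ("DP", "Action")}


def face_validity_errors(entities, relations):
    """Return the relations that violate the link rules.

    entities:  {entity_id: label}
    relations: list of (tag, source_id, target_id)
    """
    errors = []
    for tag, source_id, target_id in relations:
        expected = ALLOWED_LINKS.get(tag)
        if expected is None:
            continue  # no rule registered for this tag
        actual = (entities.get(source_id), entities.get(target_id))
        if actual != expected:
            errors.append((tag, source_id, target_id, actual))
    return errors
```

For example, an AXR drawn from a Doer to an Action is reported together with the labels actually found, which gives annotators enough context to correct the flagged sentence.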
After completing the face validity check, we assessed the consistency of annotations between the two annotators. Consistency in this context refers to how similarly annotators interpreted and annotated AXD concepts in patent sentences. This analysis provided insights into the ambiguity of our annotation task. To measure consistency, the two annotators independently annotated a random sample of 1,200 sentences drawn from the initial set of 6,000 sentences. Then, we calculated the Inter-Annotator Agreement (IAA). IAA metrics evaluate the similarity between two sets of annotations; high IAA values indicate strong agreement, meaning that annotators consistently identify the same entities and relations for tagging. Among various standard IAA metrics discussed in the literature (Artstein et al., 2017), the F1 score was chosen as the IAA metric for this study. To compute the F1 score, we treated the annotations of one annotator (gold annotator) as the reference set and compared the annotations of the other annotator (test annotator) against it (Hripcsack and Rothschild, 2005). Precision was calculated as the number of correct annotations made by the test annotator (true positives) divided by the total number of annotations made by the test annotator. Recall was calculated as the number of correct annotations made by the test annotator divided by the total number of annotations in the reference set. The F1 score, calculated as the harmonic mean of precision and recall, provides a balanced measure of agreement between the gold and test annotators (Deleger et al., 2012). Notably, the F1 score is symmetric: switching the roles of the gold annotator and the test annotator does not affect the result.
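The computation described above can be sketched as follows. Representing each annotation as a hashable tuple and counting only exact matches as true positives are assumptions of this sketch; the function name `agreement_f1` is hypothetical.

```python
def agreement_f1(gold, test):
    """F1 agreement between two annotation sets.

    gold, test: sets of hashable annotations, for example
    (sentence_id, start, end, label). Exact matching is an assumption.
    """
    true_positives = len(gold & test)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(test)   # correct / all test annotations
    recall = true_positives / len(gold)      # correct / all gold annotations
    return 2 * precision * recall / (precision + recall)
```

Swapping the two arguments exchanges precision and recall, and the harmonic mean is unchanged by that exchange. Equivalently, the score equals 2·TP / (|gold| + |test|), which makes the symmetry visible directly.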
4. Axiomatic Design dataset
In this section we release the dataset along with detailed documentation for usage on GitHub (https://github.com/Marco-Consoloni/axiomatic-design-dataset). Additionally, we calculate descriptive statistics to quantitatively assess whether the linguistic modelling and the annotation process have successfully captured AXD concepts within patent sentences.
Table 2. Descriptive statistics of annotation results

Table 2 presents descriptive statistics for each linguistic concept, including: (1) the total number of annotated concepts, (2) the average number of concepts per sentence, (3) the number and percentage of sentences containing at least one instance of each concept, and (4) examples of the most frequently annotated concepts and their occurrences. Table 2 highlights that the total number of annotated FRs, DPs, and AXRs is 5,019, 3,169 and 878, respectively. Compared to previous studies on the extraction of AXD concepts using NLP tools, this work employs a significantly larger dataset. For example, Li and Tate (2010) analysed 15 pairs of FRs and DPs extracted from two patents. Akay and Kim (2020) tested their methodology on a case study consisting of 2 FRs and 4 DPs of a faucet and a refrigerator. Akay and Kim (2021) applied their approach to 88 pairs of FRs and DPs extracted from 11 abstracts of scientific papers. Similarly, Akay et al. (2021) used 18 pairs of FRs and DPs from two patent abstracts. Kwon et al. (2024) extracted 108 design requirements and 632 design parameters from 711 sentences of design specifications. These comparisons indicate that patents contain FRs and DPs, making them valuable resources for the analysis of AXD, and that our annotation process captures a broad range of AXD concepts, providing a robust dataset to investigate the linguistic expression of AXD concepts.
4.1. Analysis of Axiomatic Design in natural language
To gain a deeper understanding of how FRs, DPs and AXRs are expressed in our dataset, we analyse patent sentences with and without AXRs. Table 3 categorizes annotated sentences into six distinct cases, indicating the number of sentences that fall into each case.
Case 1 includes sentences in which none of the four linguistic concepts (FR, DP, AXR and AXP) are annotated. This case typically includes sentences describing structural aspects of patented devices such as couplings of components and spatial arrangements. For instance, “[…], the apparatus comprises a second collector which is connected to the second outlet.” (US11857982B2) details the elements that make up the apparatus of the patented device. Cases 2 and 3 include sentences containing only DPs and only FRs, respectively. Notably, since AXRs connect FRs to DPs, AXR and AXP tags cannot be present in cases 1–3. Case 2 typically extends Case 1 by specifying DPs. For example, the sentence “the electromagnetic radiation beam can include a plurality of laser pulses having a wavelength between 320 nanometers and 430 nanometers.” (US11857462B2) not only describes the components of the electromagnetic radiation beam but also specifies its DP “wavelength”. Case 3 typically includes sentences that describe the functioning of patented devices, expressing their FRs without explicitly specifying the DPs that regulate them. For instance, “The output may be displayed on a local terminal, or transmitted to a remote terminal, or both.” (US11857265B2) does not specify any DPs. Case 4 includes sentences in which FRs and DPs are annotated but no AXRs are annotated. Case 4 typically includes sentences containing DPs that do not influence the FRs expressed within the sentence. For instance, in the sentence “Then, the controller decelerates the rotation of the drum, […]” the DP “rotation of the drum” is the Receiver of the Action “decelerates”, performed by the Doer “controller”. Since the DP cannot be used by designers to regulate the Action, no AXR is contained in the sentence. Case 5 consists of sentences in which FRs, DPs, AXRs and AXPs are annotated. This case represents sentences that contain explicit AXRs. In contrast, Case 6 includes sentences where FRs, DPs and AXRs are annotated, but AXPs are absent. This case represents sentences that contain implicit AXRs.
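The six cases can be expressed as a small decision function over the set of concept tags found in a sentence. This is a sketch and not the authors' code: the function name `sentence_case` is hypothetical, and treating an AXR without both a FR and a DP, or an AXP without an AXR, as invalid input is an assumption based on the definitions above.

```python
def sentence_case(labels):
    """Map the set of concept tags annotated in one sentence to its case number.

    labels: subset of {"FR", "DP", "AXR", "AXP"}.
    """
    has_fr, has_dp = "FR" in labels, "DP" in labels
    has_axr, has_axp = "AXR" in labels, "AXP" in labels
    if has_axr and not (has_fr and has_dp):
        raise ValueError("an AXR requires both a FR and a DP")
    if has_axp and not has_axr:
        raise ValueError("an AXP is only expected together with an AXR")  # assumption
    if has_axr:
        return 5 if has_axp else 6   # explicit vs implicit relation
    if has_fr and has_dp:
        return 4                     # both present but not linked
    if has_fr:
        return 3
    if has_dp:
        return 2
    return 1                         # no AXD concept annotated
```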
Table 3. Descriptive statistics of Axiomatic Relations in patent sentences

In Table 3 the number of sentences which contain implicit AXRs is 410 (case 6), while the number of sentences which contain explicit AXRs is 226 (case 5). This indicates that implicit AXRs are more frequent than explicit ones. This suggests that AXRs are often not explicitly stated in patent sentences but must be inferred from context, making their identification subjective and context-dependent. The analysis of explicit and implicit AXRs can be used to convert implicit AXRs (case 6) into explicit ones (case 5), improving clarity in how design intent is communicated. For instance, the sentence “[…], when the communicator 110 receives the setting value change, the controller 100 may change the amount of detergent input.” (US11859327B2) can be rephrased as “[…], when the communicator 110 receives the setting value change, in response to the setting value change, the controller 100 may change the amount of detergent input.” The rephrased version includes the AXP “in response to” to explicitly clarify how the DP “setting value change” affects the FR “change the amount of detergent input”. Furthermore, we analysed sentences with AXRs which contain FRs that do not specify the Doer of the Action. This analysis can uncover gaps in technical communication that might hide critical details for ED tasks. For example, in the sentence “The existing well can also be pressurized to a desired pressure, […]” (US11859479B2), the FR involving the Receiver “existing well” and the Action “pressurized” is influenced by the DP “desired pressure”. However, the sentence does not explicitly state the Doer responsible for the Action “pressurized”. Such omissions could be crucial for engineers, potentially resulting in misinterpretations of design specifications and design flaws during implementation phases.
4.2. Results of annotation evaluation
In this section we evaluate the complexity of identifying AXD concepts within patent sentences. The face validity check conducted on the annotated sentences uncovered a total of 161 errors, accounting for 0.8% of the 19,555 annotated concepts. This suggests that accurately identifying AXD concepts demands attention, but that our annotation rules were not overly complex and were easy for annotators to follow.
Moreover, we calculated the IAA to evaluate the differences between the annotations made by the two annotators on 1,200 sentences. Table 4 shows the F1 scores for each linguistic concept listed in Table 1. The F1 score ranges from 0 to 1. An F1 score closer to 1 indicates that, on average, the two annotators made similar or identical annotations for that AXD concept. Conversely, lower F1 scores suggest less agreement, indicating discrepancies in how the annotators interpreted and labelled that AXD concept. Hence, the quantity 1−F1 provides an estimate of the disagreement between the annotators. Table 4 shows that the F1 scores for all AXD concepts are higher than their corresponding 1−F1 values, i.e., every F1 score exceeds 0.5. This indicates that our linguistic modelling of AXD concepts, along with the annotation guidelines, effectively supports the annotation of AXD concepts in patent sentences.
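To make the F1-based agreement measure concrete, the following minimal sketch computes a per-concept F1 score between two annotators. It rests on assumptions that are not taken from our annotation pipeline: each annotation is reduced to a hypothetical `(sentence_id, concept, span_text)` tuple, two annotations agree only when these tuples match exactly, and the function name `f1_agreement` is illustrative. The toy data are invented for the example and are not drawn from the dataset.

```python
from typing import Dict, Set, Tuple

# Hypothetical annotation format: (sentence_id, concept_label, annotated_text)
Annotation = Tuple[int, str, str]


def f1_agreement(gold: Set[Annotation], other: Set[Annotation]) -> Dict[str, float]:
    """Per-concept F1 between two annotators under exact-match agreement."""
    concepts = {label for _, label, _ in gold | other}
    scores: Dict[str, float] = {}
    for concept in concepts:
        g = {a for a in gold if a[1] == concept}
        o = {a for a in other if a[1] == concept}
        matched = len(g & o)  # annotations on which both annotators agree
        precision = matched / len(o) if o else 0.0
        recall = matched / len(g) if g else 0.0
        denom = precision + recall
        scores[concept] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy data: the second annotator omits one DP annotation.
gold = {(1, "DP", "message"), (1, "D", "secure enclosure"), (1, "A", "revoke")}
other = {(1, "D", "secure enclosure"), (1, "A", "revoke")}

scores = f1_agreement(gold, other)
disagreement = {concept: 1 - f1 for concept, f1 in scores.items()}
```

Swapping the two annotators exchanges precision and recall, so the F1 score is the same whichever annotator is taken as the reference. Relaxed matching criteria (for example, partial span overlap) would give different values.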
Table 4 highlights the uncertainty in identifying AXD concepts, driven by two key factors. The first is the complexity of the annotation task, which requires accurately identifying both entities and their relations. For example, two annotators agree on an FR only if both identify the same three entities (Doer, Action, and Receiver) and the same relations among them. This leads to lower IAA compared to simpler tasks, such as annotating single entities like patient or medication names, which typically achieve an IAA of around 0.8 (Deleger et al., 2012). The second is the inherent ambiguity of natural language, which often conveys technical concepts in a fragmented or implicit manner. For instance, in the sentence “Based on the message, the secure enclosure may then revoke the second user’s access to control the secure enclosure.” (US11859407B1), the gold annotator interpreted “message” as “type of message” and thus annotated it as a DP with an AXR with the A “revokes” performed by the D “secure enclosure”. Conversely, the second annotator did not annotate “message” as a DP, reasoning that the context does not provide sufficient evidence to confirm it as a DP controlled by the designer. However, the broader patent document contains the sentence “Such messages may take various other forms as well.” (US11859407B1), which clearly supports the interpretation of “messages” as “type of messages”. This example shows that annotating AXD concepts within text is a demanding task requiring significant effort even for AXD experts, and that the interpretation of AXD concepts often depends on understanding the broader system and the designer’s intent, which may not be expressed by individual patent sentences.
Table 4. Inter-Annotator Agreement (IAA): F1 score for each linguistic concept

5. Conclusions and future developments
The success of data-driven approaches heavily depends on the availability of high-quality ED datasets, which are essential for training and testing LLMs on ED tasks. Without ED datasets, it is challenging to move from ED theory to practical applications. To address this need, this work presents a formal process for manually annotating ED concepts in text and releases a comprehensive benchmark dataset consisting of 6,000 sentences and 19,555 AXD concepts. Moreover, this work enhances our understanding of how AXD concepts are expressed in patent sentences, bridging the gap between abstract AXD theory and its practical expression in natural language. While we have made specific assumptions in defining AXD concepts, we recognize the importance of standardization. Therefore, we plan to engage the ED community to clarify these definitions and establish shared standards.
A key limitation of this work is its focus on isolated sentences, whereas ED concepts are often interconnected and span multiple sentences. This may lead to partial interpretations of AXD concepts and potential annotation inconsistencies. To address this limitation, we plan to explore how ED relations propagate across sentences and to develop a linguistic modelling approach for extracting cross-sentence ED relations. Furthermore, this work offers quantitative insights into the challenges of identifying AXD concepts in patents and questions the reliability of prompting techniques for extracting ED concepts accurately.
For future work, we plan to leverage our dataset to assess LLM-generated outputs for extracting AXD concepts. This approach could be particularly valuable for extracting design information from patents and other technical documents, providing support for designers and engineers during the conceptual design phase. By identifying all potential FRs, DPs, and their relationships for a given product, this method can determine different product concepts, facilitating the exploration of alternative solutions. Additionally, designers can compare their product with those of competitors to identify and analyse the various design parameters that influence a specific functional requirement. By examining how these parameters differ across competing products, designers can determine which aspects can be optimized to enhance performance, meet user needs, or achieve a competitive advantage. Ultimately, automating the extraction of these concepts enables the analysis of technological trends within specific fields.
Acknowledgement
This research has been funded by PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - “FAIR - Future Artificial Intelligence Research” - Spoke 1 “Human-centered AI”, funded by the European Commission under the NextGeneration EU program and by the DETAILLs Project (DEsign Tools of Artificial Intelligence in Sustainability Living LabS) - European Union. Erasmus + KA2 - Cooperation partnership in higher education (Project Number: 2023-1-IT02-KA220-HED-000158755).
 
 




