1. Introduction
Road traffic injuries cause more than one million deaths worldwide every year. As a result, substantial engineering efforts are dedicated to reducing injury risks for both occupants and pedestrians in crash scenarios during the development of new vehicles (Gonter et al., 2021). Protection mechanisms include mechanical energy absorption structures, like side sills and longitudinal members, as well as restraint systems such as seats, seatbelts, and airbags. These systems undergo dynamic loads resulting in large deformations within a short period of time. The behavior is nonlinear due to large deformations, material fracture, as well as contact interactions (Beyer et al., 2021). By using virtual investigations based on numerical Finite Element (FE) crash simulations, flexibility during development is increased compared to purely physical crash-testing-based development processes. Nowadays, physical crash tests are used primarily for final system approval and validation of virtual models during development.
However, the design process remains complex and poses significant challenges to the engineers involved, especially due to growing demands from legal regulations and customer expectations, which are leading to increased system complexity. The design can thus be described as a substantial multi-objective optimization problem. To systematically handle optimization procedures, approaches include the use of less complex data-driven surrogate models, which are trained using data from high-fidelity FE simulation models (Horii, 2017; Büttner et al., 2023), Reinforcement Learning (RL) agents (Mathieu et al., 2024; Trilling et al., 2024), or specifically defined heuristics (Beyer et al., 2021). The results generated with these methods are often complex, and a considerable proportion of development time is spent on analyzing and understanding these systems. Approaches that use global sensitivity metrics, such as Sobol indices (Büttner et al., 2023), or comprehensible heuristics for layout definition (Beyer et al., 2021) can provide a superficial understanding of why a certain system configuration appears optimal. Frameworks for the Artificial Intelligence (AI)-based analysis of numerical simulation results in general show great potential in increasing development process quality and efficiency as well as improving system understanding.
Proposed methods include rule mining for bumper deformations (Diez et al., 2018), anomaly detection in crash-loaded structures (Kracker et al., 2023) and in crash sensor signals (Mathieu et al., 2023), as well as deep-learning plausibility checks (Bickel et al., 2023) and modelling cause-and-effect relationships (Conti and Kaijima, 2020).
However, all these approaches lack detailed datapoint-specific insights, the possibility of intuitive user interaction, and a direct link to the generation of technical documentation. Consequently, this work centers on gaining global and local insights with the use of eXplainable AI (xAI) as well as on interactive result representation and documentation with Large Language Models (LLMs). This will be discussed in the context of mechanical system optimization and validated within an industry use case.
2. State of the art – optimizing and understanding system behavior
The related work for the proposed framework involves optimization of crash-loaded mechanical systems (Section 2.1.), AI-driven analysis of FE simulation results (Section 2.2.), and utilization of xAI methodologies and LLMs to generally explain and understand complex behavior (Section 2.3.).
2.1. Optimization of crash-loaded systems
During the development process of new vehicle generations, simultaneous optimization of the vehicle structure and the restraint system is performed (Gonter et al., 2021). This ensures structural integrity and the lowest possible loads on occupants represented by Anthropomorphic Test Devices (ATD) in crash tests, commonly known as crash test dummies. Occupant safety, as discussed in Horii (2017) or Mathieu et al. (2024), typically focuses on the design of airbags and seatbelts with the aim of reducing loads on the ATD. Structural safety, as discussed in Beyer et al. (2021), Büttner et al. (2023), or Borse et al. (2024), focuses on optimizing energy absorption, maximum force, or intrusion depths. Approaches are similar for both fields and include the use of global optimization algorithms either with surrogate models or computationally efficient simulation models. Especially for the topology optimization of the 3D layout of structural components, graph-heuristic approaches have shown great potential (Beyer et al., 2021). In two dimensions, the latter approach can be extended by an RL-based heuristic (Trilling et al., 2024). Mathieu et al. (2024) also include the use of RL techniques to minimize occupant loads by design adjustments in the restraint system, and Borse et al. (2024) increase the structural performance of a crash box structure by adjusting the wall thicknesses.
2.2. AI-based analysis of numerical simulations
Aiming for increased quality and faster understanding of mechanical systems, a variety of approaches have been proposed in the literature to support engineers in analyzing simulation results. These focus either on the analysis of time-variant crash-loaded systems (Diez et al., 2018; Iza Teran and Garcke, 2019; Kracker et al., 2023) or statically loaded systems (Conti and Kaijima, 2020; Bickel et al., 2023). The purposes of these investigations are diverse, including the detection of outliers (Kracker et al., 2023), the identification of cause-and-effect relationships (Conti and Kaijima, 2020), as well as finding rules that describe how outcomes arise (Diez et al., 2018). Kracker et al. (2023) propose an approach to automatically identify outliers in the structural behavior of vehicle components subjected to crash load. Outlier scores are calculated using the kth-nearest-neighbor method within a dimensionally reduced representation of each component. Thole et al. (2010) analyze multiple crash simulations, identifying the components that contribute most significantly to the dominant differences in the results. Global understanding is gained by applying Principal Component Analysis (PCA) to detect deformation modes, similar to Kracker et al. (2023).
Additionally, to identify potential origins of scatter for a given state of interest, correlation clustering is applied, and each component at each time step is analyzed with the so-called difference PCA. Both approaches have been validated on large-scale crash simulation models. An alternative for representing numerical crash simulations in a low-dimensional space is proposed by Iza Teran and Garcke (2019), who use the Laplace-Beltrami and Fokker-Planck operators on FE mesh data. Diez et al. (2018) instead utilize a supervised learning technique, specifically decision trees, to extract rules in an IF-THEN format for deformation modes of a bumper in a frontal crash scenario. Hence, information from the simulation input and output is used for processing. In the work of Hahner et al. (2020), the temporal behavior of crash-loaded vehicle structures represented as oriented bounding boxes is learned using Long Short-Term Memory Networks. Two decoders are incorporated: one for reconstruction and one for predicting future time steps of the analyzed sequence. In addition to analysis of the latent space, predictions can also be performed. In the above-mentioned approaches, no direct classification is made regarding the plausibility of obtained simulation results. This aspect, however, is covered in the work of Bickel et al. (2023), who use a multilabel classification technique incorporating Convolutional Neural Networks. The multilabel classifier, trained on 60,000 simulations including different models, allows for defining the plausibility of the mesh, geometrical shape, or load values.
To increase the interpretability of static numerical FE simulations and define what may have caused a certain simulation result, Conti and Kaijima (2020) trained a Bayesian Network on design parameters and corresponding deflections of a beam under static load. The resulting probabilities can thus help to estimate an outcome and improve understanding of relationships within the design space.
2.3. Explanation and understanding of systems with artificial intelligence
To gain insight into why machine learning (ML) models make certain predictions given specific input data, xAI techniques such as counterfactuals or model-agnostic techniques, including SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), can be used (Molnar et al., 2022). These techniques provide valuable insights into the rationale of complex AI systems and improve trustworthiness. However, xAI techniques serve to gain transparency of the trained ML model, whereby explainability is only achieved by including domain knowledge (Roscher et al., 2020). In the field of production engineering, Feldkamp and Strassburger (2023) apply different model-agnostic techniques (SHAP, LIME, Anchor) to investigate the robustness in control factors of a plant simulation. Multiple relationships are modeled, including those between control factors and a robustness measure, revealing which factors contribute to potential robustness issues. Other promising results include the work of Lin et al. (2023), who propose a robust optimization framework that integrates an ML model to forecast tunnel-induced damage. The model is trained on static structural FE simulation data and is used to solve multi-objective optimization problems. Global importances based on SHAP values provide transparency, enhancing understanding and trust in the recommendations. It is important to highlight that what constitutes an interpretable model can vary based on the situation and target user group, making it challenging to determine the level of interpretability required (Molnar et al., 2022).
Apart from the xAI field, large language models (LLMs) have demonstrated substantial capabilities in reasoning over data and explaining certain behaviors in a human-relatable manner. Extensive pre-training equips these models with a large knowledge base, enabling the contextualization of data and generation of insightful interpretations. By presenting recommendations in natural language, accessibility and interpretability of information, especially for non-technical users, are increased. Hegselmann et al. (2024) analyze few-shot classification of tabular data with LLMs, highlighting their ability by benchmarking against classical approaches, including gradient-boosted trees. Their so-called TabLLM outperforms baseline models in the very-few-shot setting. However, when many training samples are available, classical ML models remain competitive, emphasizing the effective utilization of prior knowledge in the LLM. The actual use of the feature names and their relationships by the LLM is demonstrated by the observed performance drops when column names are removed. Bordt et al. (2024) confirm these findings, revealing that LLMs have memorized many popular tabular datasets verbatim. This aspect is particularly promising as data understanding forms the basis of reasoning and explanation. Roy et al. (2024) explore the use of LLMs to reduce the efforts of on-call engineers by automatically investigating the causes of incidents. The proposed ReAct agent is equipped with retrieval tools and evaluated on an out-of-distribution dataset of production incidents collected at Microsoft. Results show good retrieval and reasoning performances, with increased factual accuracy. Notably, the LLM agent is not fine-tuned as outlined in Ahmed et al. (2023) due to cost and time constraints. Hsu et al. (2024) combine xAI and LLM, where global importances based on SHAP values are explained in textual format to be understood by non-experts. This approach successfully transforms complex SHAP plots into short and easy-to-understand text outputs. In the field of mechanical engineering, Jadhav and Farimani (2024) propose the use of an LLM for mechanical design. The system specification is provided in natural language, and the LLM designer tests new designs by executing FE simulations to evaluate the performance. A success rate of up to 90 % for optimizing simple truss structures highlights future potential.
2.4. Derivation of research objectives
Based on the previously discussed literature, relevant gaps addressed in the present work are highlighted. Global optimization approaches focus on directly providing the best possible design but often lack insights into why this design is optimal. Currently, this can be achieved by manual comparison of the optimal design and initial design, which is complex, especially if many parameters, such as the 38 wall thicknesses in Büttner et al. (2023), are considered simultaneously. Global design space analysis, as realized based on SHAP values in Lin et al. (2023) or Sobol indices in Büttner et al. (2023), improves understanding but lacks datapoint-specific information. Unsupervised learning approaches are utilized by analysis methods for crash simulations to detect outliers (Thole et al., 2010; Iza Teran and Garcke, 2019; Kracker et al., 2023). The question of what may cause the outlier is partially covered by Diez et al. (2018) and Thole et al. (2010). However, the latter does not allow intuitive traceback to design parameters, which are relevant for understanding system behavior. The decision trees used in Diez et al. (2018) do not allow for an attribution to certain features, and the approach is only marginally feasible for simultaneous analysis of many parameters. Additionally, none of the approaches named offers the possibility of providing results in textual format for interaction and technical report generation. While Hsu et al. (2024) propose the idea of explaining SHAP values with an LLM, the encoding into the model is only in tabular format. Furthermore, the local representation for analyzing individual data points is not discussed, despite SHAP enabling this. Additionally, there is no integration of the trained predictive models that can provide on-demand predictions and contribute to system understanding. Consequently, the proposed framework will cover three main objectives:
- 1) Generating global importance and local design parameter contributions of crash-loaded systems through SHAP analysis of a data-driven surrogate model for system behavior understanding. 
- 2) Providing the obtained data and trained model executables of the SHAP analysis to an LLM enabling interactive exploration for engineers or automated technical report generation. 
- 3) Validating the usefulness of the framework using the example of an optimization for a crash-loaded side sill. 
These objectives aim to increase the interpretability of crash simulation results, so that the engineers involved can understand the system better and in less time. As this builds the foundation for developing innovative products and improving efficiency in the long term through automated documentation processes, the present work has high relevance to the field of AI-enhanced engineering.
3. Explanation of FE crash simulation results through xAI and LLMs
3.1. Framework
The foundation of the framework, depicted in Figure 1, is given by engineering domain knowledge. This includes the situational deployment of the framework in general, the definition of input and output data, as well as simulation data generation. The model input consists of design parameters, such as wall thicknesses or restraint force levels. The ML model is then trained to represent relations to so-called target values. A target value can be a single value such as a certain load on an ATD but can also be an objective function weighing multiple values such as maximum force and energy absorption. Due to the high performance on tabular data of medium-sized datasets and the efficient computation of SHAP values with TreeSHAP (Lundberg et al., 2020), a gradient-boosting tree ML model according to Chen and Guestrin (2016) is used. The ML model is trained to serve as a surrogate model for the FE simulation model and can optionally be passed to a differential evolution (DE) optimizer for optimization of the target value, which is known as metamodel-based optimization (Büttner et al., 2023). As a next step, the trained prediction model is analyzed using TreeSHAP to obtain the SHAP values as contributions of each design parameter to the target value for all simulation results. A background dataset, i.e., the training data, is used for SHAP value calculation to allow dependencies between features to be handled according to the rules dictated by causal inference (Lundberg et al., 2020; Janzing et al., 2020). The input and output data, SHAP values and the executables for the prediction and agnostic model are stored in a database.
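The optional metamodel-based optimization step can be illustrated with a minimal sketch. It assumes a classic DE/rand/1/bin scheme written in plain Python; the paper does not specify which DE variant or library is used. The function `toy_surrogate`, the two-parameter bounds, and all hyperparameters are hypothetical stand-ins for the trained gradient-boosting model and the real design space.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, generations=100,
                           weight=0.8, crossover=0.9, seed=0):
    """Minimize `objective` inside box `bounds` with a DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # pick three distinct population members other than i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover:
                    value = pop[a][d] + weight * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the design range
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_surrogate(x):
    # Hypothetical stand-in for the trained surrogate: the target value
    # peaks at the wall thicknesses (2.0, 1.5).
    return 10.0 - (x[0] - 2.0) ** 2 - (x[1] - 1.5) ** 2


bounds = [(1.0, 3.0), (1.0, 3.0)]  # hypothetical thickness limits in mm
# DE minimizes, so the target value is negated to maximize it.
best_x, best_neg = differential_evolution(lambda x: -toy_surrogate(x), bounds)
best_target = -best_neg
```

Because each objective call here is a cheap surrogate evaluation rather than an FE run, the optimizer can afford thousands of evaluations, which is the point of the metamodel-based setup.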
Input and output data as well as the SHAP values are serialized for each simulation and transformed into a natural-language string (Hegselmann et al., 2024), as the LLM understands natural language text input best. This is also done for the results obtained by the DE optimizer and statistical measures for the input and output data, e.g., mean and standard deviation. If too many text units (tokens) are required to fit the whole input at once, information is provided simulation-wise depending on user demand. The LLM functionality within the framework is realized with an Application Programming Interface (API) provided by OpenAI. The user can give instructions such as “Please predict the results for the target value for the input data XYZ!” to locally call the trained prediction and agnostic model executables. The concepts of system and difference SHAP, introduced later in Section 3.2, can also be executed to improve design comparisons and further analyze subsystem importance.
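The serialization step can be sketched as follows. This is a minimal illustration, not the template used in the framework: the sentence wording, the function name `serialize_run`, the millimetre unit, and the example values are all assumptions made for this sketch.

```python
def serialize_run(run_id, inputs, target_name, target_value, shap_values):
    """Turn one simulation record into a natural-language string for an LLM."""
    parts = [f"Simulation {run_id}."]
    for name, value in inputs.items():
        parts.append(f"The design parameter {name} is {value:.2f} mm.")
    parts.append(f"The target value {target_name} is {target_value:.2f}.")
    # List contributions from most to least influential (by magnitude).
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    for name, phi in ranked:
        direction = "increases" if phi >= 0 else "decreases"
        parts.append(f"{name} {direction} {target_name} by {abs(phi):.2f}.")
    return " ".join(parts)


# Hypothetical record with two of the wall thicknesses.
text = serialize_run(
    run_id="RUN_001",
    inputs={"R1": 2.4, "R2": 1.8},
    target_name="EM_rel",
    target_value=14.0,
    shap_values={"R2": -0.3, "R1": 0.9},
)
```

Producing one string per simulation keeps the records independent, so they can be passed to the LLM individually when the full dataset would exceed the token limit.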

Figure 1. Framework for explaining results from numerical finite element simulations
The LLM used is a company custom OpenAI Chat-GPT-4o version. This aspect is important, as simulation data for new products is confidential and data security issues must be considered. Aside from data and models, a context statement describing the general task is provided to the LLM. Within the context statement, predefined tasks, such as generating a standard technical report comparing the optimal with the initial variant, are specified. In contrast to existing approaches, where the obtained results must be explained and documented manually, a substantial amount of time is saved. As previously highlighted by Hegselmann et al. (2024), LLMs indeed rely on the given data, which allows the engineer to collaborate with the AI and interactively chat with the data and models, exploiting advanced reasoning capabilities. These capabilities have been shown in other fields such as production error explanation in Roy et al. (2024), mechanical design in Jadhav and Farimani (2024), as well as explanation of global feature importances based on SHAP values in Hsu et al. (2024). The obtained engineering explanation can be documented in a knowledge storage system and help to set up further analysis.
3.2. Exploiting SHAP value additivity for improved result comparison
For an improved system understanding and comparison of simulation configurations, the additivity of SHAP values (Lundberg et al., 2020) is exploited. On the one hand, this property allows for the aggregation of features describing certain subsystems according to the paradigm of systems engineering. The contributions specified by the SHAP value $\phi_i$ for each feature belonging to the subsystem $S$ can be summed up to an overall subsystem attribution $\phi_S$ (Eq. 1). From an engineering perspective, these importances can be traced back to subsystems that may be manufactured by a particular supplier. Subsequently, changes can be implemented to improve overall system behavior. On the other hand, it permits the subtraction of feature contributions for gaining insights into why results from two simulations A and B differ (Eq. 2). In the context of optimization, this may involve comparing the optimized system design with the initial system design. Note that $N$ is the overall number of input features. These functionalities are also integrated in the above framework. System SHAP and difference SHAP values can be calculated on demand by calling the corresponding executable.
$$\phi_S = \sum_{i \in S} \phi_i \tag{1}$$
$$f(x^A) - f(x^B) = \sum_{i = 1}^{N} \left(\phi_i^A - \phi_i^B\right) \tag{2}$$
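The two additivity properties can be checked numerically with a small sketch. Note the assumption: instead of TreeSHAP on a gradient-boosting model, the example computes exact Shapley values by brute-force subset enumeration for a hypothetical four-parameter function `toy_model`, averaging over a small background dataset. The subsystem grouping and all numbers are illustrative only.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values of `model` at `x` by enumerating feature subsets."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their value from x, all others from each
        # background row; the model output is averaged over the background.
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            total += model(mixed)
        return total / len(background)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi += w * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(x):
    # Hypothetical surrogate with an interaction between features 1 and 2.
    return 2.0 * x[0] + x[1] * x[2] + 0.5 * x[3]


background = [[1.0, 1.0, 1.0, 1.0], [2.0, 1.5, 1.0, 2.0], [1.5, 2.0, 2.0, 1.0]]
x_a = [2.5, 2.0, 1.5, 1.0]  # e.g. an optimized variant (hypothetical)
x_b = [1.0, 1.0, 2.0, 2.0]  # e.g. the initial variant (hypothetical)

phi_a = shapley_values(toy_model, x_a, background)
phi_b = shapley_values(toy_model, x_b, background)

# Eq. 1: system SHAP, summing contributions per (hypothetical) subsystem.
subsystems = {"outer_sill": [0, 1], "inner_sill": [2, 3]}
system_shap = {name: sum(phi_a[i] for i in idx) for name, idx in subsystems.items()}

# Eq. 2: difference SHAP between the two variants, per design parameter.
difference_shap = [a - b for a, b in zip(phi_a, phi_b)]
```

The expected value of the model over the background cancels in the subtraction, which is why the difference SHAP values sum exactly to the difference of the two predictions.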
4. Validation – sill optimization in pole side crash
The framework is validated using an industrial use case, an optimization of a side sill in a vehicle pole side crash as per the FMVSS-214 regulation. In the crash, the vehicle, initially moving sideways at a constant velocity, impacts a fixed pole at the location of the driver’s head center of gravity. This results in significant intrusions of the lateral structure of the vehicle. The side sill absorbs much of the vehicle’s kinetic energy through plastic deformation. Due to the high computing time required for full vehicle crash simulations, which can take up to several days on high-performance clusters, and the associated costs, a submodel setup is generated as depicted at the top of Figure 2.

Figure 2. Sill optimization in pole side crash using a submodel with point-masses
This model contains only the side sill and connected point masses and inertias representing the vehicle and its movement. The submodel computes approximately 400 times faster than the full vehicle crash simulation, allowing for efficient data generation. Such submodel structures are regularly used in optimization, where a feasible correspondence to the original model must be maintained (Büttner et al., 2023). As depicted in the lower left side of Figure 2, 14 wall thicknesses of this side sill profile are included as design parameters with defined boundaries for the optimization. A Latin Hypercube Sampling scheme is used to generate 556 design variants to be computed, ensuring comprehensive coverage of the permissible design range for each parameter. The target value to be optimized in this example is the mass-specific energy absorption at 80 mm of intrusion, EM_rel.
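How such a sampling plan covers each parameter range can be illustrated with a plain-Python Latin Hypercube sketch. The variant count and number of parameters follow the text; the thickness limits of 1.0 mm to 3.0 mm are hypothetical, and the actual sampling tool used for the study is not specified in the paper.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that every parameter hits each stratum once."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        # One random point inside each of n_samples equal-width strata,
        # then shuffle so strata are paired randomly across parameters.
        strata = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(strata)
        columns.append([lo + u * (hi - lo) for u in strata])
    return [list(row) for row in zip(*columns)]


n_variants = 556
bounds = [(1.0, 3.0)] * 14  # hypothetical limits for the 14 wall thicknesses, mm
designs = latin_hypercube(n_variants, bounds)
```

Each row of `designs` is one design variant to be simulated. Unlike plain random sampling, every one of the 556 equal-width intervals of every wall thickness is populated exactly once.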
4.1. Learning and optimizing system behavior
A gradient-boosting tree ML model is trained to predict EM_rel based on the simulation results of 556 variants, using the 14 wall thickness values as input features. The model accuracy, measured by the coefficient of determination ($R^2$), for the unseen test data is 0.80, as shown on the left side of Figure 3. If necessary, further evaluation of the model can be conducted using k-fold cross-validation. The slight overfitting of the prediction model may primarily be attributed to the high nonlinearity inherent in the crash event, evidenced by buckling of the walls. For the present case, accuracy is considered sufficient. Figure 3 also shows, on the right side, the optimization results for EM_rel. It increased by about 23 %, from 13.4 J/kg in the initial variant to 16.5 J/kg, and approximately doubled compared to the worst variant, which shows only 8.0 J/kg.
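For reference, the accuracy measure and the reported improvements can be reproduced with a few lines. The `y_true` and `y_pred` lists are hypothetical toy data used only to exercise the formula; the three EM_rel values are the ones quoted in the text.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical test-set targets and surrogate predictions.
y_true = [8.0, 10.0, 12.0, 14.0, 16.0]
y_pred = [9.0, 10.0, 11.0, 14.0, 17.0]
score = r_squared(y_true, y_pred)  # SS_res = 3, SS_tot = 40 -> 0.925

# EM_rel values quoted in the text, in J/kg.
em_initial, em_optimal, em_worst = 13.4, 16.5, 8.0
gain_vs_initial = (em_optimal - em_initial) / em_initial  # about 0.23
ratio_vs_worst = em_optimal / em_worst                    # about 2.06
```

An $R^2$ of 1 would mean the surrogate reproduces the held-out simulations exactly, while 0 corresponds to always predicting the mean.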

Figure 3. Scatter plot for true vs. predicted values and optimization results
4.2. Explaining the system behavior using SHAP and an LLM
At the top of Figure 4, the global importance analysis, derived from SHAP values, helps to identify significant design parameters (right) and subsystems (left) influencing the optimized target value EM_rel. Furthermore, it enables the verification of the learned relationships within the prediction model for reasonableness. The analysis reveals a higher importance of the outer sill wall thicknesses compared to those of the inner sill. This is reasonable, as the specific energy absorption EM_rel is evaluated at an intrusion of 80 mm. Globally important wall thicknesses are R1, R13, and R2, indicated by the large range of SHAP values for each design parameter. In the center of Figure 4, the SHAP values for a single simulation result are presented, illustrating the local contribution and, consequently, the importance of the parameters for that specific outcome. These waterfall plots depict the cumulative summation of each design parameter’s SHAP value, starting from the expected value at the bottom and progressing upwards, resulting in the predicted value at the top. This enables a detailed understanding of the prediction model’s reasoning behind its output for the given design parameter values. For clarity, only the five most influential design parameters are displayed in the plots, along with the aggregated influence of the remaining nine. For an arbitrarily selected reference variant RUN_429, counteracting contributions of R13/R1 and R2/R14 become evident. Relative to this reference variant, difference SHAP values are calculated in relation to the optimal configuration RUN_202 and are shown in the right plot. The main differences can be attributed to R1 and R2. As no counteracting influences are visible, the optimal solution seems plausible, as all design parameters contribute positively to EM_rel.
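The construction of such a waterfall plot can be sketched as a cumulative sum. The function `waterfall_steps`, the expected value, and the 14 SHAP values below are hypothetical and chosen only to mirror the layout described above (five named bars plus one aggregated bar for the remaining nine parameters).

```python
def waterfall_steps(expected_value, shap_values, top_k=5):
    """Return (label, contribution, running_total) rows from bottom to top."""
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    top, rest = ranked[:top_k], ranked[top_k:]
    bars = []
    if rest:
        # Aggregate all remaining parameters into a single bottom bar.
        bars.append((f"{len(rest)} other parameters", sum(v for _, v in rest)))
    bars.extend(reversed(top))  # most influential parameter ends up on top
    steps, running = [], expected_value
    for label, contribution in bars:
        running += contribution
        steps.append((label, contribution, running))
    return steps


# Hypothetical local SHAP values for the 14 wall thicknesses of one run.
shap_values = {"R1": 1.2, "R2": 0.8, "R13": -0.9, "R14": -0.5}
shap_values.update({f"R{i}": 0.01 * i for i in range(3, 13)})
expected_value = 12.0  # hypothetical mean prediction over the background data

steps = waterfall_steps(expected_value, shap_values)
prediction = steps[-1][2]  # running total after the last bar
```

By additivity, the final running total equals the model prediction for the run, so the plot accounts for the full gap between the expected value and the predicted value.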
Combining general engineering judgment with these analyses, which reveal the global importance and the local contributions of design parameters on the target value, allows for the rapid development of a comprehensive system understanding. However, engineers may find these plots and analyses unintuitive, or they may lack the necessary expertise for their proper interpretation. These concerns are addressed by using the LLM, as this interface allows engineers to analyze results in the form of text in a collaborative and interactive manner. An excerpt of a response comparing two simulation runs is visualized at the lower end of Figure 4. Text and figures similar to those displayed can be placed directly on slides, providing the engineer with the basis for a technical report of the optimization campaign. Aside from the analysis itself, synthesis in the form of recommendations for actions to perform certain changes, similar to the work of Jadhav and Farimani (2024), may be realized but is left for future work. The total execution time of the framework, from the training of the ML model to the point where the LLM is ready for interaction with text data and executables, is 85 seconds on a workstation computer for this example. The LLM is interfaced on the workstation but resides and executes on a separate machine. This execution time allows easy integration of the framework into the development process.

Figure 4. Results of explaining and documenting the system behavior in the validation use case
4.3. Discussion of results
The discussion of the results is based on the main objectives formulated in Section 2.4. Regarding the first objective, both local and global feature importances have been successfully extracted with the proposed framework using a gradient-boosting tree ML model and SHAP values within the validation example. This allows the rapid identification of important design parameters. Furthermore, the SHAP value for a design parameter represents the influence its parameter value has on the target variable in the simulation of interest. Such an importance measure is very intuitive to engineers. Consequently, this highlights the potential of data-driven prediction models and SHAP values for improving the analysis of numerical simulations in general, as this aspect has not yet been comprehensively covered in the literature. The second objective, namely to pass data from the SHAP analysis together with the prediction and agnostic model executables to an LLM in order to enable an intuitive and interactive analysis for the engineer, is also achieved. This confirms the findings of Hegselmann et al. (2024), who observed that additionally provided data can be used as a basis for text generation with an LLM. The added value of incorporating the LLM in this context is the improved result accessibility, either as a collaborative chat or as automated generation of technical reports, although the latter has not been explored further in this research. Furthermore, the proposed framework was successfully applied in an industry-relevant example of the side sill wall thickness optimization.
However, the proposed framework also has its limitations. First, the prediction model is unaware of physical relationships such as energy conservation. Although the training data is physically meaningful, this cannot be guaranteed for the rationale of the prediction model. Tracking accuracy and reasonableness, as well as preventing overfitting and bias in the training of the model through careful evaluation of the results, is crucial. Uncertainty measures like those used in Büttner et al. (2023) for predictions may further improve trustworthiness. Additionally, the literature highlights pitfalls when using SHAP, e.g., feature dependence or unjustified causal interpretation (Molnar et al., 2022). These need to be considered to ensure proper application of the framework from an AI perspective as well, making its use potentially more complex than it might initially appear. Although spot checks showed that the LLM reasons correctly about the system in the validation example without additional training, in analogy to Roy et al. (2024), hallucinations cannot be excluded for other applications or unchecked regions. This should be comprehensively evaluated in the next step, which can only be achieved by including domain knowledge for ensuring correct context and consistency, as emphasized in Roscher et al. (2020).
5. Summary and future work
Modern passive vehicle safety systems are primarily developed virtually, resulting in a high number of simulations due to system complexity and nonlinearity. Hence, analyzing simulation results in multiple crash scenarios is time-consuming and requires experienced engineers. To reduce development time and increase system understanding, this work proposes a new approach incorporating xAI and LLMs to enhance the analysis of numerical simulations. Determined SHAP values based on a trained prediction model serve as feature importance measures, indicating the contribution of design parameters to the model’s outcome. To further enhance result comparison, the additivity of SHAP values is exploited to introduce system and difference SHAP values. Available data of this analysis is transformed into a natural language string and provided along with executable functions to an LLM, which generates human-relatable textual explanations. This enables interactive analysis or can support technical report generation and thus makes complex results more accessible. The proposed framework was validated using a side sill optimization scenario in a vehicle pole side crash. The approach is modular and may be applied to similar problems in other domains such as thermal system development.
Future work can cover incorporating physical laws, such as energy conservation, directly into the prediction models to enhance the accuracy and trustworthiness of the results. To improve user interaction and trust, a systematic analysis of the limitations, the correctness, and the usefulness of the LLM explanations needs to be conducted. An exploration of different validation examples, such as the analysis of static structural simulations, can further demonstrate generalizability.
 
 



