
Enhancing glucose measurement efficiency: a non-invasive approach using machine learning

Published online by Cambridge University Press:  03 September 2025

Omer Faruk Goktas*
Affiliation:
Department of Electrical and Electronics Engineering, Ankara Yıldırım Beyazıt University, Ankara, Turkey
Ekin Demiray
Affiliation:
Department of Vocational Health School, Medical Services and Techniques, Ankara Yıldırım Beyazıt University, Ankara, Turkey
Ali Degirmenci
Affiliation:
Department of Electrical and Electronics Engineering, Ankara Yıldırım Beyazıt University, Ankara, Turkey
Ilyas Cankaya
Affiliation:
Department of Electrical and Electronics Engineering, Ankara Yıldırım Beyazıt University, Ankara, Turkey
*Corresponding author: Omer Faruk Goktas; Email: ofgoktas@aybu.edu.tr

Abstract

This research introduces a non-invasive approach to glucose monitoring, which is essential in many applications. The study developed a new glucose monitoring system based on machine learning techniques. The system analyzes reflection coefficient data gathered from glucose solutions with a Vector Network Analyzer. To demonstrate the system’s accuracy in predicting glucose levels, two distinct datasets were employed. The first dataset comprised glucose solutions with concentrations spanning from 0 to 200 g/L, while the second included solutions ranging from 15 000 to 20 000 mg/L for enhanced precision. Both datasets were measured with the system, and three machine learning algorithms (Decision Tree, Random Forest, and Support Vector Regression) were applied to the collected data. Furthermore, a grid search was employed to optimize the hyperparameters of each model. The findings revealed that Random Forest yielded the best results on both datasets. For the gram-scale dataset, the $R^2$ value was 0.9995, indicating that 99.95% of the glucose level variance was accounted for, with a low RMSE of 1.1589 mg/dL. For the milligram-scale dataset, the $R^2$ value was 0.9932 and the RMSE was 1.1119 mg/dL, confirming the model’s high accuracy. These experimental outcomes demonstrate that the proposed system can effectively predict glucose levels.

Information

Type
Research Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press in association with The European Microwave Association.

Introduction

Glucose is one of the most critical molecules for the growth and reproduction of numerous organisms. It also plays a key role in many biological reactions such as respiration and photosynthesis. In addition, glucose serves as a metabolic fuel for cells. Apart from its monomer form, polymers of glucose form part of the structure of plants as cellulose or are stored in animal cells as glycogen for energy storage [Reference Galant, Kaufman and Wilson1]. Moreover, glucose is an essential carbon source for the fermentation industry. Thus, glucose monitoring is especially beneficial for optimizing biomass production and the concentrations of end products such as lactic acid, amino acids, peptides, and alcohols [Reference Pontius, Semenova, Silina, Gernaey and Junicke2]. Determining glucose levels is also crucial for various applications such as monitoring plant health, determining fermentation parameters, and detecting blood glucose levels [Reference Perdomo, De La Paz, Del, Seker, Saha, Wang and Jaramillo-Botero3].

Various methods have been devised for glucose monitoring because of its pivotal role in many areas, and each of these techniques has its own advantages and disadvantages [Reference Mansour, Allam and Abdel-Rahman4, Reference Prakash and Gupta5]. For example, spectroscopic methods are accurate, but they have several drawbacks, such as the high cost of the chemicals used during the process and the need for multiple sample preparation steps [Reference Harnsoongnoen and Wanthong6]. High-performance liquid chromatography (HPLC), on the other hand, is a robust analytical technique frequently used for detecting glucose and other metabolites in food products, fermentation environments, antibiotics, pesticides, etc. [Reference Vlasiou7]. With its high sensitivity and resolution, HPLC is able to analyze a diverse array of compounds. Nevertheless, apart from its reliance on costly instruments, the HPLC approach entails time-consuming steps including sample preparation, filtration, and analysis [Reference El-Nabawy, Awad and Ibrahim8]. Additionally, comprehensive research is being conducted on the non-invasive determination of glucose levels in blood and other biological specimens. These non-invasive methodologies can be classified into two categories: electrochemical-based and electromagnetic-based. The use of electrochemical-based methods in glucose measurement is limited by their poor performance, which is greatly influenced by environmental conditions [Reference Shaker, Chen, Milligan and Qu9].

Electromagnetic sensing methods, on the other hand, use electromagnetic signals of different wavelengths. In electromagnetic sensing, the relationship between signal properties (magnitude, phase, frequency, etc.) and biological samples such as organic materials, tissues, or blood is measured. These methods can be used in a broad range of applications such as medicine, industrial biotechnology, and agriculture [Reference Tothill10]. Different types of electromagnetic waves can be used in various applications. For instance, optical methods such as Raman spectroscopy exploit nanometer wavelengths. Studies in the millimeter-wave band use millimeter-scale wavelengths. Additionally, microwaves are used in impedance spectroscopy [Reference Liakat, Bors, Xu, Woods, Doyle and Gmachl11]. Optical techniques aim to monitor and detect glucose in liquids using the scattering, absorption, and reflection properties of waves. On the other hand, millimeter-wave band detection, microwave band detection, and bioimpedance spectroscopy methods exploit the dielectric properties of glucose [Reference Xue, Thalmayer, Zeising, Fischer and Lübke12]. Since microwave and millimeter-wave radiation has lower energy per photon and scatters less, it can penetrate deeper into the tissue to reach regions with sufficient blood concentration and provide more accurate glucose monitoring. These methods are also more resistant than optical methods to environmental factors such as movement. Millimeter band methods are divided into three categories: reflection, transmission, and resonance perturbation. The reflection-based technique is also called the single-port method. When the glucose level increases or decreases, the amplitude and phase of the reflected signal change because the dielectric properties of the solution change, and this change can be monitored by measuring the reflection coefficient ($S_{11}$) [Reference Gonzales, Mobashsher and Abbosh13].

Besides its superior properties, electromagnetic glucose sensing is also a promising approach for monitoring blood glucose concentrations, since it enables the blood glucose levels of diabetic patients to be measured in a non-invasive manner [Reference Gonzales, Mobashsher and Abbosh13]. Because of the prevalence and importance of diabetes, studies of the microwave and millimeter-wave bands mainly focus on measuring the blood glucose levels of diabetic patients. In addition, these methods are efficient for non-invasive blood glucose monitoring in diabetes [Reference Long, Chen, Li, Xian and Peng14]. According to [Reference Sun, Saeedi, Karuranga, Pinkepank, Ogurtsova, Duncan, Stein, Basit, Chan, Mbanya and Pavkov15], the global prevalence of diabetes was approximately 536 million individuals in 2021, with projections indicating a rise to 783 million by 2045. However, little attention has been given to the electromagnetic determination of glucose concentration in other biological samples such as fermentation environments or food products [Reference Cano-Garcia, Gouzouasis, Sotiriou, Saha, Palikaras, Kosmas and Kallos16–Reference Omer, Safavi-Naeini, Hughson and Shaker18]. Therefore, it can be concluded that electromagnetic glucose sensing has great commercialization potential for various applications.

Integrating machine learning methods can significantly improve precision in many fields, including glucose-related applications. Thus, employing machine learning methods can greatly increase the accuracy of various estimations such as blood glucose levels or soil microbiota dynamics [Reference Jha and Ahmad19, Reference Shokrekhodaei, Cistola, Roberts and Quinones20]. For instance, Goktas et al. [Reference Goktas, Demiray, Degirmenci and Cankaya21] proposed a new method to determine the optimum operating point interval for Vector Network Analyzer (VNA) measurements of different glucose concentrations. This method is based on sliding windows and can be used in online applications due to its low time complexity. Additionally, it can be used to determine the operating point interval of different macromolecules.

Studies in the literature have mainly focused on determining different types of diabetes using machine learning methods. Previous reports demonstrated that these methods can successfully be used in the classification of diabetes mellitus. For instance, Monte-Moreno [Reference Monte-Moreno22] designed a non-invasive system incorporating a photoplethysmograph (PPG) sensor to measure blood glucose level and blood pressure. Machine learning methods were applied to the features extracted from the PPG waveform through the signal processing module. The performance of four different machine learning methods (linear regression [LR], neural networks [NN], support vector machines [SVM], and random forest [RF]) was analyzed. According to the results, the RF method achieved the best $R^2$ scores for predicting blood glucose level (0.90), systolic blood pressure (0.91), and diastolic blood pressure (0.89).

Furthermore, Shokrekhodaei et al. [Reference Shokrekhodaei, Cistola, Roberts and Quinones20] used machine learning methods to predict glucose levels from the data of a custom-built optical sensor. Measurements were obtained from this optical sensor at 18 different wavelengths between 410 nm and 940 nm. The relationship between glucose concentrations in aqueous solutions and wavelengths was analyzed. Wavelengths of 485 nm, 645 nm, 860 nm, and 940 nm showed a high correlation between transmission intensity and glucose concentration. Initially, multiple linear regression (MLR) and a feed-forward neural network (FFNN) were used to predict the glucose level. $R^2$ was 0.92 and RMSE was 16.2 mg/dL for MLR, while $R^2$ was 0.96 and RMSE was 11.1 mg/dL for the FFNN regression model. Due to the low performance of MLR and FFNN in regression, the data set was divided into 21 classes, each with a glucose range of 10 mg/dL, and analyzed using the classification methods k-nearest neighbors (KNN), decision tree (DT), and SVM. The average F1-scores of the classification methods were 0.98, 0.97, and 0.99 for KNN, DT, and SVM, respectively.

Naresh et al. [Reference Naresh, Nagaraju, Kollem, Kumar and Peddakrishna23] designed a dual-wavelength short near-infrared system that can measure glucose levels and categorize them into three classes: hypoglycemic, normal, and hyperglycemic. Wavelengths of 940 nm and 950 nm were chosen because 940 nm has a higher absorption coefficient for glucose and light at 950 nm passes more easily through tissue. An FFNN regression model was used to estimate the glucose level from the inputs of the designed system. The performance of the glucose level estimation was 0.99 in $R^2$, 2.49 mg/dL in MAE, 3.02 mg/dL in RMSE, and 9.16 (mg/dL)$^2$ in MSE. Furthermore, multilayer perceptron and KNN methods were used to classify glucose levels. According to the average accuracy scores, higher performance was obtained with the KNN method (0.983 in 5-fold, 0.980 in 10-fold, 0.978 in 15-fold).

Previous studies have demonstrated that machine learning techniques, a branch of artificial intelligence, can effectively estimate glucose levels in a non-invasive manner. However, most of the existing research has primarily focused on optical methods, each offering distinct advantages and drawbacks. While optical methods tend to provide higher accuracy, this precision is often highly sensitive to environmental conditions and the quality of the technology employed. In contrast, micro- and millimeter-wave methods are less influenced by environmental factors and present a promising alternative for more in-depth measurements, particularly in glucose quantification studies.

In light of these challenges, the present study introduces a novel system designed to accurately determine glucose concentrations using various machine learning techniques. The system aims to exploit the relationship between transmitted and reflected signals, which vary according to the glucose concentration in diluted samples (1–200 g/L, 200 samples in increments of 1 g/L; 15 000–20 000 mg/L, 200 samples in increments of 25 mg/L; a total of 400 samples). To create the datasets for machine learning analysis, glucose solutions with varying concentrations were prepared. The reflection coefficient parameters were determined using a Vector Network Analyzer (VNA). These measurements were then compiled into low- and high-precision datasets for further analysis.

Subsequently, three different machine learning methods (Decision Tree, Random Forest, and Support Vector Regression) were applied, along with hyperparameter optimization, to achieve the best possible performance for each technique. The experimental results revealed that the Random Forest method consistently outperformed the others, providing the most accurate predictions for both low- and high-precision glucose level estimation. To the best of our knowledge, this is the first study to use a large sample size to predict high glucose concentrations across diverse fields, including food, agriculture, energy, and medicine, utilizing machine learning techniques.

The paper is organized as follows. Section “Materials and Methods” describes the preparation of the solutions, the experimental setup, and the measurement methods, and gives a detailed explanation of the machine learning methods used. Section “Results and Discussion” presents the performance metrics employed to assess the machine learning methods, discusses the experimental results obtained on the gram and milligram scales, and compares them with previous reports in the literature. Section “Conclusion” gives concluding remarks and directions for future study.

Materials and methods

Chemicals and container

Glucose used in the assays was purchased from Merck/Germany. A mica glass container (100 mL total volume) was used for all assays. Each experiment was carried out with a 50 mL working volume and performed in triplicate to increase the repeatability of the results.

Calculation of the scattering parameters

When characterizing transmission lines, their scattering parameters are measured. Scattering parameters quantify the parts of an incoming electromagnetic wave that are transmitted and reflected by a network. They describe a circuit in matrix form without requiring knowledge of its constituent elements or internal properties. $S_{11}$ (the reflection coefficient) is the ratio of the amplitude of the reflected signal to the amplitude of the incident signal. Hence, in the present study, $S_{11}$ was calculated using Equations 1 and 2 [Reference Pozar24]. The measured magnitude values ($y$) were then converted to decibel (dB) values using Equation 3.

\begin{equation}\begin{bmatrix} V_1^- \\ V_2^- \end{bmatrix} = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix} \begin{bmatrix} V_1^+ \\ V_2^+ \end{bmatrix} \tag{1}\end{equation}
\begin{equation}\left. \begin{aligned} S_{11} &= \left. \frac{V_1^-}{V_1^+} \right|_{V_2^+ = 0}, & S_{12} &= \left. \frac{V_1^-}{V_2^+} \right|_{V_1^+ = 0} \\ S_{21} &= \left. \frac{V_2^-}{V_1^+} \right|_{V_2^+ = 0}, & S_{22} &= \left. \frac{V_2^-}{V_2^+} \right|_{V_1^+ = 0} \end{aligned} \right\} \tag{2}\end{equation}
\begin{equation}y(\mathrm{dB}) = 20 \log_{10}(y) \tag{3}\end{equation}
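As a minimal sketch of Equations 2 and 3, the reflection coefficient can be formed from the reflected and incident wave amplitudes at port 1 and its magnitude converted to decibels. The function names and the sample amplitudes below are illustrative assumptions, not values or code from the study.

```python
import math


def reflection_coefficient(v_reflected: complex, v_incident: complex) -> complex:
    """S11 = V1^- / V1^+ with port 2 matched (V2^+ = 0), as in Equation 2."""
    if v_incident == 0:
        raise ValueError("incident wave amplitude must be non-zero")
    return v_reflected / v_incident


def magnitude_to_db(y: float) -> float:
    """Equation 3: y(dB) = 20 * log10(y) for a linear magnitude y > 0."""
    if y <= 0:
        raise ValueError("magnitude must be positive")
    return 20.0 * math.log10(y)


# Hypothetical wave amplitudes (illustrative values, not measured data).
s11 = reflection_coefficient(0.1 + 0.0j, 1.0 + 0.0j)
s11_db = magnitude_to_db(abs(s11))
print(s11_db)
```

A reflected amplitude that is one tenth of the incident amplitude therefore corresponds to a reflection coefficient of about -20 dB.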

Sample preparation and vector network analyzer

In the first part of the experiment, a stock glucose solution (200 g/L) was prepared in a 50 mL sterile Falcon tube. This solution was transferred to the mica glass container after the dB value of the empty container had been measured as a blank sample. The glucose solution was then serially diluted with distilled water down to 1 g/L glucose, with a step size of 1 g/L per dilution. In the last step of this part, only distilled water (50 mL) was measured, making a total of 200 measurements.

In the second step, a 20 000 mg/L (20 g/L) stock glucose solution was prepared in a 50 mL sterile Falcon tube. This solution was transferred to the mica glass container after the dB value of the empty container had been measured as a blank sample. The glucose solution was serially diluted with distilled water down to 15 000 mg/L (15 g/L) glucose, with a step size of 25 mg/L per dilution. Distilled water (50 mL) was also measured, making a total of 200 measurements. To ensure reproducibility, the container was not moved during the experiments, and all dilutions were performed using micropipettes.

A VNA (Rohde & Schwarz/ZNB40-B22/Germany) equipped with a WR-28 adapter and a coaxial cable (miniband K-10) was used for the experiments. The $S_{11}$ parameter was monitored between 20 and 40 GHz. All equipment, including the adapter, was calibrated for 50 Ω impedance. All assays were performed at 25°C, and each glucose solution was measured five times using the VNA. The averaged spectra were then analyzed to differentiate the glucose concentrations. The experimental setup is shown in Fig. 1.

Figure 1. (a) Experimental setup for VNA-based microwave measurement of increasing glucose concentrations between 20 and 40 GHz. (b) Schematic representation of the system and experimental approach (T: 25°C, number of points: 2000, sweep: 10).
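The averaging of repeated sweeps described above can be sketched as a point-by-point mean over the repeated measurements of one solution. This is only an illustration: the text does not state whether averaging was done on dB or linear values, so the sketch averages whatever values it is given, and the sweep data are made up.

```python
from statistics import fmean


def average_sweeps(sweeps):
    """Average repeated S11 sweeps point by point.

    `sweeps` is a list of equally long lists: one list per repeated
    measurement and one value per frequency point.
    """
    if not sweeps:
        raise ValueError("at least one sweep is required")
    n_points = len(sweeps[0])
    if any(len(sweep) != n_points for sweep in sweeps):
        raise ValueError("all sweeps must have the same number of points")
    # zip(*sweeps) groups the readings taken at the same frequency point.
    return [fmean(readings) for readings in zip(*sweeps)]


# Five hypothetical sweeps with three frequency points each (values in dB).
sweeps = [
    [-10.0, -12.0, -15.0],
    [-10.2, -12.1, -15.3],
    [-9.8, -11.9, -14.7],
    [-10.1, -12.2, -15.1],
    [-9.9, -11.8, -14.9],
]
mean_spectrum = average_sweeps(sweeps)
print(mean_spectrum)
```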

Estimation of glucose level with machine learning methods

Regression is a technique that is used in data science and statistics to model the relationship between a dependent variable and one or more independent variables. The main purpose of this analysis is to predict how the dependent variable changes depending on the independent variables and to understand this change. In the present study, the estimation of glucose levels is performed with three different regression methods, namely DT, RF, and SVR. The employed methods are explained in the following subsections, respectively.

Decision tree

The decision tree (DT) is a supervised learning algorithm with a tree-like structure. It consists of a root node, decision nodes (branches), and leaf nodes. The root node is the initial node of the tree, where the data set is split according to specific criteria; the splitting continues at the decision nodes and ends at the leaf nodes. In DT regression, the value at a terminal node is the average response of the training observations that fall within the corresponding region. Consequently, when a new, unseen observation is encountered, the model predicts the response by using the average value of the corresponding region [Reference Czajkowski and Kretowski25]. As can be seen in Fig. 2, the DT makes decisions by dividing nodes into sub-nodes. This subdivision continues throughout the training process until only homogeneous nodes remain [Reference Sneha and Gangil26]. The prediction $\hat{y}_i$ for the $i^{th}$ sample in the test data set is made by averaging the target values of the training samples in the leaf node that the sample reaches.

Figure 2. Structure of DT.
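The leaf-averaging rule can be illustrated with the smallest possible regression tree, a single split (a "stump") on one feature. The sketch below is an assumption-laden toy: the split criterion (summed squared error), the function names, and the data are illustrative and are not taken from the study, which used deeper trees on full spectra.

```python
from statistics import fmean


def sum_squared_error(values):
    """Squared deviation of the values from their own mean."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(x, y):
    """Find the threshold on a 1-D feature that minimises the summed squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        cost = sum_squared_error(left) + sum_squared_error(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold, fmean(left), fmean(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]  # (threshold, left leaf mean, right leaf mean)


def predict_stump(stump, x_new):
    """Return the mean training target of the leaf that x_new falls into."""
    threshold, left_mean, right_mean = stump
    return left_mean if x_new <= threshold else right_mean


# Hypothetical data: S11 in dB at one frequency vs. glucose concentration in g/L.
x = [-30.0, -29.5, -29.0, -20.0, -19.5, -19.0]
y = [10.0, 12.0, 14.0, 100.0, 102.0, 104.0]
stump = fit_stump(x, y)
print(stump, predict_stump(stump, -29.7))
```

A full DT applies the same split search recursively inside each resulting region until a stopping rule, such as a maximum depth or a minimum number of samples per split, is met.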

Random forest

The random forest (RF) algorithm is an ensemble learning method that constructs a multitude of decision trees for classification or regression tasks. In this method, the bagging technique is employed to construct multiple decision trees that operate in parallel and independently. Bagging involves bootstrapping and aggregation. Bootstrapping is the process of creating various subsets of the training data by sampling with replacement. A DT regression model is created from each of these subsets. In the aggregation step, the outputs of the individual DT regression models are averaged to determine the final estimate. For an RF with $N$ decision trees, where $T_n(x)$ denotes the prediction of the $n^{th}$ tree, the prediction is calculated as given in Equation 4.

\begin{equation}\hat{f}_{rf}^{N}(x) = \frac{1}{N}\sum_{n = 1}^{N} T_n(x) \tag{4}\end{equation}

The structure of the RF method is shown in Fig. 3.

Figure 3. Visual illustration of RF.
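Bootstrapping and aggregation can be sketched with very small trees. In the toy example below, each "tree" is a single-split stump fitted on a bootstrap sample of a 1-D data set, and the forest prediction is the plain average of Equation 4. The stump stands in for a full decision tree, the per-split feature subsampling used by typical RF implementations is omitted, and all names and data are illustrative assumptions.

```python
import random
from statistics import fmean


def bootstrap_sample(x, y, rng):
    """Draw len(x) training pairs with replacement (the bootstrapping step)."""
    idx = [rng.randrange(len(x)) for _ in range(len(x))]
    return [x[i] for i in idx], [y[i] for i in idx]


def fit_stump(x, y):
    """Depth-1 regression tree on a 1-D feature; stands in for a full DT."""
    pairs = sorted(zip(x, y))
    overall = fmean(y)
    # Fallback when no split is possible: predict the overall mean everywhere.
    best = (float("inf"), pairs[0][0], overall, overall)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        lm, rm = fmean(left), fmean(right)
        cost = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    _, threshold, left_mean, right_mean = best
    return lambda x_new: left_mean if x_new <= threshold else right_mean


def fit_forest(x, y, n_trees, seed=0):
    """Fit n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(*bootstrap_sample(x, y, rng)) for _ in range(n_trees)]


def predict_forest(trees, x_new):
    """Equation 4: average of the N individual tree predictions."""
    return sum(tree(x_new) for tree in trees) / len(trees)


# Hypothetical 1-D data set (illustrative values only).
x = [float(i) for i in range(10)]
y = [2.0 * v for v in x]
trees = fit_forest(x, y, n_trees=16)
print(predict_forest(trees, 4.2))
```

Because every tree sees a different resample, the individual predictions differ, and averaging them reduces the variance of the final estimate.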

Support vector regression

The support vector machine (SVM) is a learning method developed by Vapnik that is used for classification and regression problems [Reference Sain and Vapnik27, Reference Smola and Schölkopf28].

Let $\left\{ (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \right\} \subset \chi \times \mathbf{R}$ be the training data, where $\chi$ denotes the space of input patterns. The goal is to obtain a function $f(x)$ that deviates by at most $\varepsilon$ from the targets $y_i$ for all training data and is at the same time as flat as possible. In other words, errors smaller than $\varepsilon$ are ignored, while deviations larger than $\varepsilon$ are not accepted.

The general mathematical form of support vector regression (SVR) is defined as in Equation 5.

\begin{equation}f(x) = w \cdot \varphi(x) + b \tag{5}\end{equation}

where $\varphi(x)$ denotes the non-linear mapping function into the feature space, $w$ corresponds to the weights, and $b$ is the bias. In the case of Equation 5, flatness means seeking a small $w$. One way to achieve this is to minimize the norm, $\|w\|^2 = \langle w, w \rangle$. The problem can therefore be written as a convex optimization problem, as in Equation 6.

\begin{equation}\begin{aligned} \text{minimize} \quad & \frac{1}{2}\|w\|^2 \\ \text{subject to} \quad & y_i - \langle w, \varphi(x_i) \rangle - b \leq \varepsilon \\ & \langle w, \varphi(x_i) \rangle + b - y_i \leq \varepsilon \end{aligned} \tag{6}\end{equation}

The formulation in Equation 6 assumes that a function $f$ exists that approximates all pairs $(x_i, y_i)$ with $\varepsilon$ precision, that is, that the convex optimization problem is feasible. In some cases, however, we may want to allow some errors. For this, slack variables $\xi_i, \xi_i^*$ can be added to cope with otherwise infeasible constraints of the optimization problem. This leads to the formulation in Equation 7, given by [Reference Sain and Vapnik27].

\begin{equation}\begin{aligned} \text{minimize} \quad & \frac{1}{2}\|w\|^2 + C\sum_{i = 1}^{n} \left( \xi_i + \xi_i^* \right) \\ \text{subject to} \quad & \left[ w \cdot \varphi(x_i) + b \right] - y_i \leq \varepsilon + \xi_i^* \\ & y_i - \left[ w \cdot \varphi(x_i) + b \right] \leq \varepsilon + \xi_i \\ & \xi_i, \xi_i^* \geq 0 \end{aligned} \tag{7}\end{equation}

Different kernel functions are proposed to improve the performance of SVR. Commonly used kernel functions include the linear, polynomial, radial basis function (RBF), and sigmoid kernels.

A graphical illustration of the SVR conversion process is presented in Fig. 4. The samples in the original (low-dimensional) input space are mapped to a (high-dimensional) feature space through the non-linear mapping function $\varphi(x)$. Data points located on or outside the $\varepsilon$-tube of the decision function are defined as support vectors and are marked with red stars. The right side of Fig. 4 shows the $\varepsilon$-insensitive loss function, where the slack variables $\xi_i$ and $\xi_i^*$ are employed to handle the permissible positive or negative errors.

Figure 4. Illustration of the mapping of the input space $\chi$ into a high-dimensional feature space and the soft margin loss setting for an SVR.
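The role of the slack variables can be made concrete with a short sketch of the $\varepsilon$-insensitive loss. For a fixed prediction $f(x_i)$, the smallest slack values that satisfy the constraints of Equation 7 are $\xi_i = \max(0, y_i - f(x_i) - \varepsilon)$ and $\xi_i^* = \max(0, f(x_i) - y_i - \varepsilon)$. The function names and numbers below are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """max(0, |y - f(x)| - epsilon): zero inside the tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def slack_variables(y_true, y_pred, epsilon):
    """Smallest (xi, xi_star) that satisfy the constraints of Equation 7.

    xi is positive when the target lies above the tube around the prediction,
    xi_star when it lies below the tube.
    """
    xi = max(0.0, (y_true - y_pred) - epsilon)
    xi_star = max(0.0, (y_pred - y_true) - epsilon)
    return xi, xi_star


# Hypothetical target and predictions with a tube half-width of 0.1.
inside = epsilon_insensitive_loss(10.0, 10.05, 0.1)   # within the tube
outside = epsilon_insensitive_loss(10.0, 10.5, 0.1)   # 0.5 away from the target
xi, xi_star = slack_variables(10.0, 10.5, 0.1)
print(inside, outside, xi, xi_star)
```

At most one of the two slack variables is non-zero for any sample, and their sum equals the $\varepsilon$-insensitive loss that the constant $C$ weights in the objective.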

Results and discussions

Performance metrics

In artificial intelligence problems, performance metrics are used to measure the success of models and to compare them with each other. To calculate the performance of a machine learning method, the data set is divided into two parts: training and test sets. The machine learning model is built with the training data, and its performance is calculated using the samples from the test data. Different performance measures have been developed in the literature for predictive modeling techniques. Commonly utilized performance metrics for regression problems are MAE, MSE, RMSE, and $R^2$.

MAE is determined by taking the average of the absolute differences between the observed values and the predicted values. MAE is defined as

\begin{equation}\mathrm{MAE} = \frac{\sum_{i = 1}^{N} \left| y_i - \hat{y}_i \right|}{N} \tag{8}\end{equation}

where ${y_i}$ corresponds to the ${i^{th}}$ observed value, $\hat y{}_i$ is the ${i^{th}}$ predicted value, and N equals the number of samples in the data set.

MSE is calculated as the mean square of the difference between the observed value and the predicted value. MSE emphasizes larger errors due to the presence of the squared term in the equation. MSE is calculated as

\begin{equation}\mathrm{MSE} = \frac{\sum_{i = 1}^{N} \left( y_i - \hat{y}_i \right)^2}{N} \tag{9}\end{equation}

RMSE is the square root of MSE. Taking the square root of the average squared error brings the error metric back to the original scale of the target variable. RMSE is computed as

\begin{equation}\mathrm{RMSE} = \sqrt{\frac{\sum_{i = 1}^{N} \left( y_i - \hat{y}_i \right)^2}{N}} \tag{10}\end{equation}

$R^2$, also known as the coefficient of determination, indicates the proportion of the variance in the target variable explained by the predictions of the regression model [Reference Isaac and Saha29]. $R^2$ is defined as

\begin{equation}R^2 = 1 - \frac{\sum_{i = 1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i = 1}^{N} \left( y_i - \overline{y} \right)^2} \tag{11}\end{equation}

where $y_i$ is the $i^{th}$ observed value, $\hat{y}_i$ is the $i^{th}$ predicted value, and $\overline{y}$ corresponds to the mean of the observed values. $R^2$ typically takes values between 0 and 1, although it can be negative when the model fits worse than a constant prediction of the mean. The closer the $R^2$ score is to 1, the better the model fits.
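Equations 8 to 11 translate directly into code. The sketch below computes the four metrics for a small set of made-up observed and predicted values; the function name and the numbers are illustrative assumptions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 as defined in Equations 8 to 11."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [obs - pred for obs, pred in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mean_obs = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                  # residual sum of squares
    ss_tot = sum((obs - mean_obs) ** 2 for obs in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical observed and predicted glucose values (illustrative only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

For these values the errors are -2, 2, -3, and 1, so MAE is 2.0, MSE is 4.5, RMSE is about 2.12, and $R^2$ is 1 - 18/500 = 0.964.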

K-fold cross-validation

To evaluate the performance of the models, the data set is divided into training and test sets. The machine learning model is built with the training set, and its performance is evaluated on the test set. When the data set is limited, a single split into training and test sets is insufficient to evaluate the performance of the model. In this case, a different approach called k-fold cross-validation is often used. In k-fold cross-validation, the data set is randomly divided into k equal parts. One part is reserved for testing, while the remaining k − 1 parts are used for training. This process is repeated k times, so that each part serves as the test set once, and the results of these iterations are then averaged. In this way, all samples in the data set are used for both training and testing of the model. A sample illustration of k-fold cross-validation for a k value of 5 is shown in Fig. 5.

Figure 5. K-fold cross-validation for k = 5.
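The splitting scheme can be sketched as a generator of index lists. The example below shuffles the sample indices once, deals them into k folds, and yields each fold as the test set in turn. The function name, the seed, and the sample count are illustrative assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 200 samples and k = 5, as in the illustration of Fig. 5.
splits = list(k_fold_indices(200, 5))
for train, test in splits:
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every sample is used for testing once and for training k − 1 times.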

Experimental results

In this study, a new machine learning-based system was established to measure glucose concentration in distilled water. Measurements were taken from the VNA via the WR-28 adapter, and the glucose level was estimated with three different machine learning methods: DT, RF, and SVR. The performances of these methods were then benchmarked. Each method has method-specific hyperparameters that need to be optimized for the data set at hand. To obtain the best performance for each compared method, a grid search was applied to these hyperparameters. In the grid search, the performance of the method is measured separately for each combination of hyperparameter values, and the combination that gives the best score on the performance metrics determines the final performance of the method. The hyperparameter search space of the compared methods is given in Table 1. In addition, k was set to 10 in the k-fold cross-validation, which was repeated 10 times with random shuffling in each repetition to increase the reliability of the results.
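The combination of grid search with repeated k-fold cross-validation can be sketched in a few lines. To keep the example self-contained, a simple k-nearest-neighbors regressor is used here as a stand-in model; it is not one of the methods benchmarked in this study, and the grid, data, and function names are illustrative assumptions.

```python
import itertools
import math
import random


def knn_predict(train_x, train_y, x_new, n_neighbors):
    """Stand-in model: mean target of the n_neighbors closest training points."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x_new))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)


def repeated_kfold_rmse(x, y, params, k=10, repeats=10, seed=0):
    """Mean test RMSE over `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        idx = list(range(len(x)))
        rng.shuffle(idx)  # new random partition in every repetition
        folds = [idx[i::k] for i in range(k)]
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            tx = [x[i] for i in train]
            ty = [y[i] for i in train]
            sq = [(y[i] - knn_predict(tx, ty, x[i], **params)) ** 2 for i in test]
            scores.append(math.sqrt(sum(sq) / len(sq)))
    return sum(scores) / len(scores)


def grid_search(x, y, grid):
    """Evaluate every hyperparameter combination and keep the lowest RMSE."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = repeated_kfold_rmse(x, y, params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical 1-D data set and a one-parameter grid (illustrative only).
x = [float(i) for i in range(40)]
y = [3.0 * v + 1.0 for v in x]
grid = {"n_neighbors": [1, 3, 15]}
best_params, best_score = grid_search(x, y, grid)
print(best_params, best_score)
```

With more than one entry in the grid dictionary, `itertools.product` enumerates every combination, which mirrors an exhaustive search over hyperparameter pairs such as tree count and maximum depth.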

Table 1. Hyperparameter search space of the methods

Analysis of the system on gram scale

The first data set was prepared by increasing the glucose concentration on the gram scale from 1 to 200 g/L. In this data set, the $R^2$ scores of the three compared methods are close to each other, at around 0.99 (Table 2). This high $R^2$ value (0.99) indicates that the predictions of the models are very close to the measurements taken. However, $R^2$ alone does not discriminate between the methods, so an analysis on a different performance metric is needed to evaluate the compared methods and determine the best one. For this reason, a detailed performance analysis was performed using RMSE, another regression performance metric frequently used in the literature. Figure 6 visualizes the RMSE scores of the benchmarked methods over the grid search range.

Figure 6. RMSE score of the benchmarked methods on gram scale analysis (a) DT, (b) RF, (c) SVR-linear, and (d) SVR-RBF.

Table 2. The best scores obtained in benchmarked methods in gram scale

The RMSE scores of the DT method over the specified range are shown in Fig. 6(a). Changing the minimum_sample_split hyperparameter changes the RMSE scores only slightly, provided that the maximum_depth hyperparameter remains the same. However, RMSE scores increase substantially when the maximum_depth hyperparameter falls below 4. Figure 6(b) presents the RMSE scores of the RF method. In the RF method, the RMSE results are very high for values of the number_of_trees hyperparameter up to 4. For values of number_of_trees greater than 4, the RMSE results are satisfactory and vary within a limited range. As can be seen from Fig. 6(b), the effect of the maximum_depth hyperparameter is quite limited compared to that of number_of_trees. The performance of the SVR with a linear kernel (SVR-linear) is shown in Fig. 6(c). In SVR-linear, the variation of RMSE scores over the grid search space is rather restricted. The difference between the highest (4.3036 mg/dL) and lowest (2.7976 mg/dL) RMSE scores is 1.5060 mg/dL. Figure 6(d) visualizes the RMSE scores obtained with the RBF kernel of the SVR (SVR-RBF). In SVR-RBF, the best scores are obtained in the range $2^{-10}$ to $2^{-8}$ for $\gamma$ and $2^{3}$ to $2^{13}$ for the cost (C). Outside of this range, the performance gradually decreases. The best RMSE scores achieved by each benchmarked method are given in Table 2. Based on the results in Table 2, the best scores in all performance metrics were obtained by the RF method with hyperparameters number_of_trees = 16 and maximum_depth = 16.

Analysis of the system on milligram scale

In the first part of the study, glucose concentrations between 0 and 200 g/L at 1 g/L intervals were measured via VNA in order to apply machine learning methods. In these experiments, the benchmarked methods achieved high $R^2$ and low RMSE scores. Thus, in the second step of the experiments, the sensitivity of the measurement method was tested for glucose concentrations between 15 000 and 20 000 mg/L at 25 mg/L intervals. The aim was to evaluate the efficiency of the system for highly sensitive analyses such as blood glucose levels. As with the gram-scale data set, the $R^2$ scores of the benchmarked methods in the milligram-scale data set are very close to each other and range between 0.9812 and 0.9923 (Table 3). Therefore, the RMSE scores of the benchmarked methods were analyzed in detail. Figure 7 shows the RMSE results of the DT, RF, SVR-linear, and SVR-RBF methods.
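
As a small illustration of the two concentration grids described above, the sketch below enumerates the levels and converts mg/L to mg/dL, the unit in which RMSE is reported. Whether both endpoints of each range were actually measured is an assumption made here, and the variable and function names are hypothetical.

```python
# Concentration levels as described in the text; including both endpoints is an assumption.
gram_scale_g_per_l = list(range(0, 201, 1))               # 0..200 g/L in 1 g/L steps
milligram_scale_mg_per_l = list(range(15000, 20001, 25))  # 15 000..20 000 mg/L in 25 mg/L steps

def mg_per_l_to_mg_per_dl(value_mg_per_l):
    """1 L = 10 dL, so a concentration in mg/L divided by 10 gives mg/dL."""
    return value_mg_per_l / 10.0

print("gram-scale levels:", len(gram_scale_g_per_l))
print("milligram-scale levels:", len(milligram_scale_mg_per_l))
print("milligram-scale step in mg/dL:", mg_per_l_to_mg_per_dl(25))
```

Under these assumptions each grid contains 201 levels, and the 25 mg/L step of the second data set corresponds to 2.5 mg/dL.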

Figure 7. RMSE score of the benchmarked methods on milligram-scale analysis (a) DT, (b) RF, (c) SVR-linear, and (d) SVR-RBF.

Table 3. Lowest performances obtained in benchmarked methods in milligram-scale

The performance of the DT method is shown in Fig. 7(a). For the same minimum_sample_split hyperparameter, the effect of changing the maximum_depth hyperparameter on the RMSE scores of the method is quite limited. For maximum_depth in the range 4 to 25, the performance is almost unchanged for all values of minimum_sample_split. At maximum_depth values lower than 4, performance decreases gradually. The performance of the RF method is presented in Fig. 7(b). The RF method performs best, and almost uniformly, for number_of_trees between $2^{5}$ and $2^{10}$ and maximum_depth between 4 and 32; outside this range the RMSE values gradually increase and the performance therefore decreases.

The RMSE values in the best-performing range vary between 1.1119 mg/dL and 1.2449 mg/dL, a variation of approximately 12%. Such a variation within a limited range of the same method shows the importance of tuning the method-specific hyperparameters. The RMSE scores of the SVR-linear method in the specified range are shown in Fig. 7(c). As in the DT method (Fig. 7(a)), the effect of varying the C hyperparameter in SVR-linear was rather limited when the $\varepsilon $ hyperparameter remained constant. When $\varepsilon $ is greater than $2^{-3}$, the performance of SVR-linear drops significantly. Figure 7(d) shows the RMSE scores of the SVR-RBF method. Similar to the result of the SVR-linear method, the effect of the hyperparameter C is fairly low. The performance of the method decreases steadily and significantly as the $\gamma $ value increases beyond $2^{-8}$. Table 3 shows the best scores (lowest errors) obtained by the compared methods on all performance metrics in the milligram-scale data set and the hyperparameters at which these scores were obtained. As with the gram-scale data set, the best scores were obtained by the RF method, here with the hyperparameters number_of_trees = $2^{8}$ and maximum_depth = 30.
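
As background for the sensitivity to $\varepsilon $ noted above, the following minimal sketch evaluates the standard $\varepsilon $-insensitive loss used by SVR for a few residuals. The residual values and the three $\varepsilon $ settings are illustrative assumptions, not values taken from the experiments.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """SVR loss: residuals inside the epsilon tube cost nothing; larger ones cost |residual| - epsilon."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# Hypothetical residual magnitudes (same units as the target).
residuals = [0.05, 0.1, 0.5, 2.0]

for epsilon in (2 ** -5, 2 ** -3, 2 ** 1):
    losses = [epsilon_insensitive_loss(0.0, r, epsilon) for r in residuals]
    ignored = sum(1 for loss in losses if loss == 0.0)
    print(f"epsilon = {epsilon}: {ignored} of {len(residuals)} residuals are not penalized")
```

As $\varepsilon $ grows, more residuals fall inside the tube and are not penalized during training, so the fitted function is constrained less tightly by the data. This is one plausible reading of why performance drops for large $\varepsilon $, offered here as an interpretation rather than a result of the study.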

To estimate glucose concentration non-invasively, we employed three machine learning algorithms: DT, RF, and SVR. These models were chosen due to their proven effectiveness in nonlinear regression tasks and ability to handle multivariate data [Reference Smola and Schölkopf28, Reference Breiman30, Reference Quinlan31]. The input features for the models are derived from the measured S-parameters obtained via a VNA, which capture the electromagnetic response of the sensing system influenced by glucose concentration. DT models learn a hierarchy of rules based on these features, allowing for interpretable decision-making, but may suffer from overfitting in complex datasets [Reference Breiman30]. RF, as an ensemble of DTs, mitigates overfitting by averaging predictions from multiple trees trained on random subsets of the data, increasing robustness and generalization [Reference Quinlan31]. SVR constructs a regression function in a high-dimensional feature space using kernel functions, making it particularly suitable for capturing the nonlinear relationship between VNA-derived features and glucose levels [Reference Smola and Schölkopf28]. These models were trained and evaluated using cross-validation to ensure reliable performance estimation and avoid bias. Their predictions were compared to determine the most accurate and robust approach for non-invasive glucose estimation.
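
The cross-validated evaluation mentioned above can be sketched as follows, assuming k = 5 folds as in Fig. 5. To keep the example short and self-contained, an ordinary least-squares line stands in for DT, RF, and SVR, and the data are synthetic; the function names and the shuffling seed are assumptions.

```python
import math
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and cut them into k nearly equal validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (a stand-in for DT, RF, or SVR)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def cross_validated_rmse(xs, ys, k=5):
    """Train on k - 1 folds, compute RMSE on the held-out fold, and repeat for every fold."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (a * xs[i] + b)) ** 2 for i in fold]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return scores

# Synthetic linear response with unit-variance noise (illustrative only, not the study data).
rng = random.Random(1)
xs = [float(c) for c in range(0, 201)]
ys = [2.0 * x + 5.0 + rng.gauss(0.0, 1.0) for x in xs]

fold_scores = cross_validated_rmse(xs, ys, k=5)
mean_rmse = sum(fold_scores) / len(fold_scores)
print("per-fold RMSE:", [round(s, 3) for s in fold_scores])
print("mean RMSE:", round(mean_rmse, 3))
```

Every sample is used for validation exactly once, so the mean of the per-fold scores is less dependent on one particular train/test split than a single hold-out estimate.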

Comparison of the system

Details of the studies in the literature in which glucose concentration was estimated by non-invasive measurements are given in Table 4. As can be seen from Table 4, the proposed system outperforms several previously reported glucose estimation studies. Among the previous works, the highest $R^2$ scores were 0.99 in [Reference Naresh, Nagaraju, Kollem, Kumar and Peddakrishna23] and 0.96 and 0.95 in [Reference Anupongongarch, Kaewgun, O’reilly and Khaomek32, Reference Larin, Eledrisi, Motamedi and Esenaliev33], which are quite similar to those obtained in the current study. The $R^2$ scores of [Reference Naresh, Nagaraju, Kollem, Kumar and Peddakrishna23] and of the proposed system indicate that 99% of the variance in the dependent variable is explained by the independent variables. Because $R^2$ does not separate these methods, it is more appropriate to perform the comparison on the RMSE metric, which is frequently used in comparing regression methods. The lowest RMSE scores among the values given in the literature were 3.02 mg/dL, 5.53 mg/dL, and 5.61 mg/dL in [Reference Naresh, Nagaraju, Kollem, Kumar and Peddakrishna23, Reference Xiao, Yu, Li, Song and Kikkawa34, Reference Jain, Maddila and Joshi35], respectively. Among the studies in Table 4, the lowest RMSE score was 3.02 mg/dL, whereas the current study achieved 1.11 mg/dL. Compared with these three closest studies, the RMSE of the proposed system is 63.18%, 79.93%, and 80.22% lower, respectively.
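
The relative RMSE reductions quoted above can be checked with a few lines of Python. The sketch assumes that the reduction is computed as (baseline - proposed) / baseline and uses the unrounded milligram-scale RMSE of 1.1119 mg/dL; both choices are assumptions about how the percentages were derived.

```python
def relative_reduction(baseline_rmse, proposed_rmse):
    """Percentage by which proposed_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse

proposed = 1.1119               # mg/dL, best milligram-scale RMSE reported for RF
baselines = [3.02, 5.53, 5.61]  # mg/dL, lowest literature RMSE values quoted in the text

reductions = [relative_reduction(b, proposed) for b in baselines]
for baseline, reduction in zip(baselines, reductions):
    print(f"baseline {baseline:.2f} mg/dL: proposed RMSE is {reduction:.2f}% lower")
```

The computed values agree with the quoted percentages to within about 0.05 percentage points; the small residual differences depend on whether 1.11 or 1.1119 mg/dL is used for the proposed system.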

Table 4. Performance comparison of the current work with existing studies

The results obtained in the current study are comparable with other reports in the literature. For example, Naresh et al. [Reference Naresh, Nagaraju, Kollem, Kumar and Peddakrishna23] proposed a dual-wavelength optical system using 950 nm and 940 nm to detect glucose non-invasively with 575 samples (460 allocated for training and 115 for testing). The $R^2$, MSE, MAE, and RMSE values were 0.99, 9.16 (mg/dL)$^2$, 2.49 mg/dL, and 3.02 mg/dL, respectively. Furthermore, Jain et al. [Reference Jain, Maddila and Joshi35] tested a non-invasive glucose detection system using near-infrared (NIR) absorbance and reflectance spectroscopy. In that study with 25 subjects, the $R^2$, MAE, and RMSE values were 0.908, 3.87 mg/dL, and 5.61 mg/dL, respectively. In another report, photoacoustic measurements were taken on glucose solutions ranging from 0 to 500 mg/dL. Calibration of the photoacoustic measurements by Gaussian kernel-based regression resulted in an RMSE, mean absolute relative difference (MARD), and mean absolute difference (MAD) of 7.64 mg/dL, 2.07%, and 5.23 mg/dL, respectively [Reference Pai, De and Banerjee36]. In the current study, by contrast, the $R^2$, MSE, MAE, and RMSE values were 0.9932, 1.28 (mg/dL)$^2$, 0.87 mg/dL, and 1.11 mg/dL, respectively. In comparison with the existing studies in the literature, the present study achieved a significant improvement in RMSE (63% relative to the lowest RMSE in Table 4). This enhancement can be attributed to several factors. First, the hyperparameters of the benchmarked models were tuned to obtain the best score for each method. Second, 400 samples in total were used for machine learning. Using such a large number of samples can improve accuracy and generalization, represent the data distribution better, and reduce bias, which makes the reported metrics more reliable. Third, serial dilution of glucose during the experiments may have provided a stable environment that supports repeatability.

Conclusion

In this study, a novel system for non-invasive glucose estimation using a VNA was proposed. Reflection coefficients from serially diluted glucose solutions were measured and used as input to machine learning models, namely DT, RF, and SVR. The experiments were conducted in two phases. In the first phase, coarse glucose estimation was performed in concentration increments of 1 g/L using these machine learning methods. Due to the high performance observed in glucose estimation on the gram scale, more precise experiments were conducted in the second phase, with a milligram-scale approach (increments of 25 mg/L in each dilution).

The results showed that RF achieved the best performance across both scales. For the gram scale, the model achieved an $R^2$ of 0.9995, RMSE of 1.1589 mg/dL, MAE of 0.8706 mg/dL, and MSE of 1.5060 (mg/dL)$^2$. For the milligram scale, the model delivered an $R^2$ of 0.9932, RMSE of 1.1119 mg/dL, MAE of 0.8734 mg/dL, and MSE of 1.2870 (mg/dL)$^2$. When compared to conventional glucose measurement methods, the findings demonstrate that the machine learning-based VNA approach offers a rapid and reliable alternative for glucose estimation. Furthermore, the proposed system has significant application potential in industries related to sugar, such as agriculture, food, medicine, and biotechnology. The exceptional performance of this system, which surpasses existing methods in the literature, positions it as a practical solution for real-world scenarios. Additionally, the promising results from this study suggest future applications. As a next step, the method could be extended to detect the concentration of various macromolecules. Moreover, performance could be further enhanced by evaluating additional S-parameters. In this context, the generalizability of the proposed system to different biological structures represents a valuable direction for future research. Beyond glucose, the detection and quantification of macromolecules such as proteins, lipids, or nucleic acids using similar non-invasive techniques may significantly broaden the scope of this method. Incorporating such targets into upcoming studies could contribute to the development of more comprehensive biosensing platforms suitable for diverse biomedical and industrial applications.

Acknowledgments

This work is supported by Ankara Yildirim Beyazit University’s Project Office Grant No. AYBU-BAP-2604.

Competing interests

The authors declare none.

Omer Faruk Goktas received his B.Sc. and M.Sc. degrees in Electrical and Electronics Engineering and is currently pursuing a Ph.D. degree in Electrical and Electronics Engineering at Ankara Yıldırım Beyazıt University. His research interests include microwave and millimeter-wave technologies, with a particular focus on glucose detection using advanced measurement techniques.

Dr. Ekin Demiray earned his Ph.D. in Biology from Ankara University, Ankara, Turkey, in 2019. Following his doctoral studies, he worked as a postdoctoral researcher at IMDEA Energy (Madrid, Spain) and North Carolina State University (Raleigh, USA). He is currently an Associate Professor at the Vocational Health School of Ankara Yıldırım Beyazıt University, Ankara, Turkey. His research focuses on second generation biofuel production from lignocellulosic biomass and the non-invasive monitoring of fermentation parameters.

Dr. Ali Degirmenci received the B.Sc. and M.Sc. degrees in Electrical and Electronics Engineering (EEE) from Eskisehir Osmangazi University, Eskisehir, Turkey, in 2013 and 2017, respectively, and the Ph.D. degree from the Department of EEE, Ankara Yildirim Beyazit University, Ankara, Turkey, in 2022. His research interests include data mining, image processing, machine learning, and deep learning with special interests in outlier detection.

Prof. Dr. İlyas Cankaya received his BS degree from Gazi University, Ankara, Türkiye, in 1990, and PhD from the Sussex University, UK, in 1998. He is currently a Professor in the Department of Electrical and Electronics Engineering, Ankara Yıldırım Beyazıt University in Türkiye. His primary research interests are nonlinear frequency response analysis, and biomedical signal processing.

References

Galant, AL, Kaufman, RC and Wilson, JD (2015) Glucose: Detection and analysis. Food Chemistry 188, 149–160. doi:10.1016/j.foodchem.2015.04.071
Pontius, K, Semenova, D, Silina, YE, Gernaey, KV and Junicke, H (2020) Automated electrochemical glucose biosensor platform as an efficient tool toward on-line fermentation monitoring: Novel application approaches and insights. Frontiers in Bioengineering and Biotechnology 8, 1–15. doi:10.3389/fbioe.2020.00436
Perdomo, SA, De La Paz, E, Del, Cano R, Seker, S, Saha, T, Wang, J and Jaramillo-Botero, A (2023) Non-invasive in-vivo glucose-based stress monitoring in plants. Biosensors and Bioelectronics 231. doi:10.1016/j.bios.2023.115300
Mansour, E, Allam, A and Abdel-Rahman, AB (2023) A novel approach to non-invasive blood glucose sensing based on a single-slot defected ground structure. International Journal of Microwave and Wireless Technologies 15(1), 32–40. doi:10.1017/S1759078722000174
Prakash, D and Gupta, N (2022) Applications of metamaterial sensors: A review. International Journal of Microwave and Wireless Technologies 14(1), 19–33. doi:10.1017/S1759078721000039
Harnsoongnoen, S and Wanthong, A (2017) Real-time monitoring of sucrose, sorbitol, D-glucose and D-fructose concentration by electromagnetic sensing. Food Chemistry 232, 566–570. doi:10.1016/j.foodchem.2017.04.054
Vlasiou, MC (2023) Cheese and Milk Adulteration: Detection with Spectroscopic Techniques and HPLC: Advantages and Disadvantages. Dairy 4(3), 509–514. doi:10.3390/dairy4030034
El-Nabawy, M, Awad, S and Ibrahim, A (2023) Validation of the methods for the non-milk fat detection in artificially adulterated milk with palm oil. Food Analytical Methods 16(4), 798–807. doi:10.1007/s12161-023-02465-w
Shaker, G, Chen, R, Milligan, B and Qu, T (2016) Ambient electromagnetic energy harvesting system for on-body sensors. Electronics Letters 52(22), 1834–1836. doi:10.1049/el.2016.3123
Tothill, IE (2001) Biosensors developments and potential applications in the agricultural diagnosis sector. Computers and Electronics in Agriculture 30(1–3), 205–218. doi:10.1016/S0168-1699(00)00165-4
Liakat, S, Bors, KA, Xu, L, Woods, CM, Doyle, J and Gmachl, CF (2014) Noninvasive in vivo glucose sensing on human subjects using mid-infrared light. Biomedical Optics Express 5(7). doi:10.1364/boe.5.002397
Xue, Y, Thalmayer, AS, Zeising, S, Fischer, G and Lübke, M (2022) Commercial and Scientific Solutions for Blood Glucose Monitoring—A Review. Sensors 22(2). doi:10.3390/s22020425
Gonzales, WV, Mobashsher, AT and Abbosh, A (2019) The progress of glucose monitoring—A review of invasive to minimally and non-invasive techniques, devices and sensors. Sensors (Switzerland) 19(4). doi:10.3390/s19040800
Long, H, Chen, B, Li, W, Xian, Y and Peng, Z (2020) Blood glucose detection based on Teager-Kaiser main energy of photoacoustic signal. Computers in Biology and Medicine 134. doi:10.1016/j.compbiomed.2021.104552
Sun, H, Saeedi, P, Karuranga, S, Pinkepank, M, Ogurtsova, K, Duncan, BB, Stein, C, Basit, A, Chan, JC, Mbanya, JC and Pavkov, ME (2022) IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Research and Clinical Practice 183. doi:10.1016/j.diabres.2021.109119
Cano-Garcia, H, Gouzouasis, I, Sotiriou, I, Saha, S, Palikaras, G, Kosmas, P and Kallos, E (2016) Reflection and transmission measurements using 60 GHz patch antennas in the presence of animal tissue for non-invasive glucose sensing. 2016 10th European Conference on Antennas and Propagation, EuCAP 1, 10–12. doi:10.1109/EuCAP.2016.7481178
Hofmann, M, Trenz, F, Weigel, R, Fischer, G and Kissinger, D (2012) A microwave sensing system for aqueous concentration measurements based on a microwave reflectometer. IEEE/MTT-S International Microwave Symposium Digest, 1–3. doi:10.1109/MWSYM.2012.6259771
Omer, AE, Safavi-Naeini, S, Hughson, R and Shaker, G (2020) Blood glucose level monitoring using an FMCW millimeter-wave radar sensor. Remote Sensing 12(3). doi:10.3390/rs12030385
Jha, SK and Ahmad, Z (2018) Soil microbial dynamics prediction using machine learning regression methods. Computers and Electronics in Agriculture 147, 158–165. doi:10.1016/j.compag.2018.02.024
Shokrekhodaei, M, Cistola, DP, Roberts, RC and Quinones, S (2021) Non-invasive glucose monitoring using optical sensor and machine learning techniques for diabetes applications. IEEE Access 9, 73029–73045. doi:10.1109/ACCESS.2021.3079182
Goktas, OF, Demiray, E, Degirmenci, A and Cankaya, I (2024) Real time non-invasive monitoring of glucose and nitrogen sources with a novel window sliding based algorithm. Engineering Science and Technology, an International Journal 58. doi:10.1016/j.jestch.2024.101845
Monte-Moreno, E (2011) Non-invasive estimate of blood glucose and blood pressure from a photoplethysmograph by means of machine learning techniques. Artificial Intelligence in Medicine 53(2), 127–138. doi:10.1016/j.artmed.2011.05.001
Naresh, M, Nagaraju, VS, Kollem, S, Kumar, J and Peddakrishna, S (2024) Non-invasive glucose prediction and classification using NIR technology with machine learning. Heliyon 10(7). doi:10.1016/j.heliyon.2024.e28720
Pozar, DM (2011) Microwave Engineering Theory and Techniques, 4 Edn. USA: John Wiley & Sons (accessed 28 August 2025).
Czajkowski, M and Kretowski, M (2016) The role of decision tree representation in regression problems – An evolutionary perspective. Applied Soft Computing Journal 48, 458–475. doi:10.1016/j.asoc.2016.07.007
Sneha, N and Gangil, T (2019) Analysis of diabetes mellitus for early prediction using optimal features selection. Journal of Big Data 6(1). doi:10.1186/s40537-019-0175-6
Sain, SR and Vapnik, VN (1996) The nature of statistical learning theory. Technometrics 38(4), 409. doi:10.2307/1271324
Smola, AJ and Schölkopf, B (2004) A Tutorial on Support Vector Regression. Statistics and Computing 14, 199–222. doi:10.1023/B:STCO.0000035301.49549.88
Isaac, N and Saha, AK (2024) Forecasting hydrogen vehicle refuelling for sustainable transportation: A light gradient-boosting machine model. Sustain 16(10), 1113. doi:10.3390/su16104055
Breiman, L (2001) Random forests. Machine Learning 45(1), 5–32. doi:10.1023/A:1010933404324
Quinlan, JR (1986) Induction of decision trees. Machine Learning 1(1), 81–106. doi:10.1007/bf00116251
Anupongongarch, P, Kaewgun, T, O’reilly, JA and Khaomek, P (2019) Development of a non-invasive blood glucose sensor. International Journal of Applied Biomedical Engineering 12(1), 13–19.
Larin, KV, Eledrisi, MS, Motamedi, M and Esenaliev, RO (2002) Noninvasive blood glucose monitoring with optical coherence tomography: A pilot study in human subjects. Diabetes Care 25(12), 2263–2267. doi:10.2337/diacare.25.12.2263
Xiao, X, Yu, Q, Li, Q, Song, H and Kikkawa, T (2021) Precise noninvasive estimation of glucose using UWB microwave with improved neural networks and hybrid optimization. IEEE Transactions on Instrumentation and Measurement 70, 1–10. doi:10.1109/TIM.2020.3010680
Jain, P, Maddila, R and Joshi, AM (2019) A precise non-invasive blood glucose measurement system using NIR spectroscopy and Huber’s regression model. Optical and Quantum Electronics 51(2), 1–15. doi:10.1007/s11082-019-1766-3
Pai, PP, De, A and Banerjee, S (2018) Accuracy enhancement for noninvasive glucose estimation using dual-wavelength photoacoustic measurements and kernel-based calibration. IEEE Transactions on Instrumentation and Measurement 67(1), 126–136. doi:10.1109/TIM.2017.2761237

Figure 1. (a) Experimental setup for VNA-based microwave measurement of increased glucose concentrations between 20 and 40 GHz (b) Schematic representation of system and experimental approaches (T: 25°C, number of point: 2000, sweep: 10).

Figure 2. Structure of DT.

Figure 3. Visual illustration of RF.

Figure 4. Illustration of mapping the input space X into a high-dimensional feature space and the soft margin loss setting for an SVR.

Figure 5. K-fold cross-validation for k = 5.

Table 1. Hyperparameter search space of the methods
