Classifying the physical activity indicator using machine learning and direct measurements: a feasibility study

Low levels of physical activity (PA) are related to an increased risk of death, hypertension, coronary disease, stroke, diabetes, and depression. Assessing a person's PA level is therefore essential to create training programs that help prevent such risks. However, current measurements of PA are mainly subjective and tend to underestimate or overestimate the PA level of a person. This article presents the results of a pilot cross-sectional feasibility study that aims to classify the PA level through direct and objective measurements. For this, direct measurements such as anthropometric and postural sway (PS) features were obtained from fifteen participants (8 men and 7 women). A Support Vector Machine (SVM) classifier was used to predict the level of PA. The classifier showed F1, recall, and precision scores around 80%, and after feature importance selection and hyperparameter tuning, they reached 100%. Results suggest that the use of direct measurements to classify the PA level is feasible and that there is a correlation between direct measurements and the International Physical Activity Questionnaire, Short Form (IPAQ-SF), an indirect measurement that is typically used to assess the level of PA. This classifier is intended as a tool that helps trainers and physicians endorse or adjust their physical training and rehabilitation procedures based on the objective evaluation of patients.


Introduction
Physical activity (PA) is defined as "any bodily movement produced by skeletal muscles that require energy expenditure" (Physical Activity, 2020). According to the World Health Organization (WHO), more than 1.4 billion adults are insufficiently active. In fact, insufficiently active people have a 20 to 30% increased risk of death compared to sufficiently active people. Conversely, regular PA can reduce the risk of hypertension, coronary heart disease, stroke, diabetes, cancer, and depression (Physical activity, 2020).
Although the benefits of regular PA are well known, many adults do not follow PA recommendations, and poor adherence to exercise programs is often reported (Theofilou & Saborit 2013). This drop in PA is attributed to inaction during leisure time and sedentary behavior (Thivel et al., 2018). Moreover, studies on PA and exercise programs report dropout rates of 20% to 50% within the first 3 to 6 months (Sáez, Solabarrieta, & Rubio, 2021; Viken et al., 2019). Factors associated with this attrition rate include motivation, time, and access to facilities, among others (Linke, Gallo, & Norman, 2011).
Assessing PA levels is essential to develop proper training and exercise programs. PA assessment is commonly conducted through subjective (indirect) and objective (direct) instruments (Kowalski, Rhodes, Naylor, Tuokko, & MacDonald, 2012). Indirect PA measurements rely on self-report mechanisms such as the International Physical Activity Questionnaire (IPAQ) (Cleland, Ferguson, Ellis, & Hunter, 2018; Craig et al., 2003). These measurements are practical, easy to administer, cheaper than direct measurements, and generally well accepted. However, their results are subjective, tend to over- or underestimate the PA level, and depend on the experience of the assessor (Kowalski et al., 2012; Sylvia, Bernstein, Hubbard, Keating, & Anderson, 2014).
Recently, postural sway (PS) and center of pressure (COP) measurements have been combined with various artificial intelligence and machine learning techniques. Applications include classifying and predicting levels of PA and activities of daily living using artificial neural networks (ANN) (Pires, Garcia, Pombo, Flórez-Revuelta, & Spinsante, 2017), as well as fall detection, balance feature analysis, and medical condition diagnosis. The techniques used include multiclass SVM, Random Forest, Multiple Layer Perceptrons (MLPs), Radial Basis Function Neural Networks (RBNs), Deep Belief Networks (DBNs), and adaptive radial basis function (RBF) and neural sliding-mode (ARNNSM) (Bao, Klatt, Whitney, Sienko, & Wiens, 2019; Costa et al., 2016; Pires et al., 2017; Sun, Hsieh, & Sosnoff, 2019; Yang & Gao, 2020). Based on the state of the art and current trends in artificial intelligence and machine learning algorithms, we hypothesized that machine learning techniques combined with the COP and anthropometric measurements of a person will help PA assessors, physicians, and policymakers to determine the PA level of people objectively and to adjust current training programs to enhance results and improve adherence.
In this paper, we present the methods, results, and discussion of a pilot cross-sectional feasibility study that tests whether the PA level of a person can be classified from direct measurements of the COP and anthropometric features using support vector machines (SVM). Predicted results were compared against the IPAQ-Short Form.

Participants
This is a cross-sectional study that included 15 healthy adults (8 men and 7 women) between the ages of 18 and 57 years. Participants were recruited from the Nueva Granada Military University research groups in Bogota, Colombia. After the study was clearly explained to all eligible participants, they signed a consent form before starting the study.

Study design and outcome measures
Eligible participants were asked to perform three evaluations. First, volunteers were asked to answer the International Physical Activity Questionnaire - Short Form (IPAQ-SF) (Craig et al., 2003). Then participants performed tests under two conditions, static and dynamic. In the first test, participants performed four static standing exercises.
Participants were asked to stand still for one minute, barefoot, with feet together and their arms and hands at their sides. This procedure was repeated three times for each exercise, and participants rested for two minutes between exercises of the static test to avoid fatigue.
The included exercises were as follows:
A foam pad of 45 x 45 x 5 cm was placed over the measurement surface to emulate an unstable surface.
The second test was performed under dynamic conditions. Participants were asked to adopt the same posture as in the static test and to keep the same time intervals for each exercise. Unlike in the first test, participants were asked to reach the maximum point in eight different directions without losing their posture and without taking their feet off the ground. The direction was given in random order as a voice command by the researcher.
Each one-minute trial was divided into eight equal parts to allow the participant to reach all the excursion directions. The rest interval between the static and dynamic tests was five minutes, as was the rest time between exercises.
In addition, the static test consisted of participants standing with feet together. The IPAQ-SF measures a person's PA level and classifies it as low, moderate, or high.
The eight excursion directions were as follows:
The main outcomes used to train the machine learning algorithms were the index of physical activity (IPA) and the PS features measured through the center of pressure (COP) in the anteroposterior (AP) and mediolateral (ML) directions. The feature extraction section further explains the processing methods and the features selected from the COP. In addition, demographic and anthropometric variables were collected.
Figure 1 shows the experimental setup used for both tests.

Data acquisition
PS features were captured using the Biosignalsplux force platform, which was integrated through an interface developed in C# with Visual Studio. The interface was designed to guide the assessor and participant throughout the trial and to ensure that all demographic and anthropometric data were collected.
The anthropometric data collected were as follows: 1. Height; 2. Right foot length; 3. Weight; 4. Body Mass Index. The interface used a sampling period of 10 ms (100 Hz) in order to avoid missing data. The interface also gave the researcher visual feedback on the participant's performance during the trial. Calibration and estimation of the center of pressure were based on the manufacturer's specifications (Center of Pressure, 2021).

Data analysis
Demographic, anthropometric, and main PS variables and the IPAQ-SF score were summarized and tested for normality using the Shapiro-Wilk test (Fregni & Illigens, 2018). After finding that the data were not normally distributed, the Wilcoxon test for continuous variables and Fisher's exact test for categorical variables were used to compare the characteristics of women and men. To determine whether any association existed between the IPAQ-SF score and the main COP features and anthropometric variables, we used the Kruskal-Wallis test. The significance threshold was set to p < 0.006. It was adjusted using Bonferroni's correction (Armstrong, 2014) because the participants' data were tested multiple times. Statistical tests were carried out in Stata BE 17.
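The testing sequence described above can be sketched in Python with SciPy. This is only an illustration under stated assumptions: the study ran its tests in Stata, the data below are synthetic, and the number of comparisons (`n_tests = 8`) is hypothetical, chosen only because 0.05 / 8 is close to the reported threshold of 0.006.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic, illustrative values only: one COP feature for three IPAQ-SF groups.
low = rng.exponential(scale=1.0, size=20)
moderate = rng.exponential(scale=1.5, size=20)
high = rng.exponential(scale=1.6, size=20)

# Shapiro-Wilk: a small p-value suggests the sample is not normally distributed.
w_stat, p_normal = stats.shapiro(np.concatenate([low, moderate, high]))

# Kruskal-Wallis: rank-based comparison across the three PA levels.
h_stat, p_kw = stats.kruskal(low, moderate, high)

# Bonferroni: divide the family-wise alpha by the number of tests performed.
alpha = 0.05
n_tests = 8  # assumption, not stated in the article
alpha_adjusted = alpha / n_tests
is_significant = p_kw < alpha_adjusted
```

Because the Kruskal-Wallis test works on ranks, it does not require the normality that the Shapiro-Wilk test rejected for these data.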

Data pre-processing and features extraction
Raw data were pre-processed by applying a fifth-order low-pass Butterworth filter with a cut-off frequency of 5 Hz. This frequency is considered to be above the normal PS frequency range and also rejects external disturbances (Doyle, Hsiao-Wecksler, Ragan, & Rosengren, 2007; Paillard & Noé, 2015). Variations in the starting position are expected even when participants are asked to place their feet on specific marks. Thus, we eliminated that offset from all samples by subtracting a high-order polynomial function (see Figure 2). After processing the raw data, we calculated fourteen kinematic metrics (Figure 3), based on the most common features stated in the literature (Bao et al., 2019; Lemay et al., 2014; Olivier, Viseu, Vignais, & Vuillerme, 2019), for each of the samples collected from our fifteen participants. Then, we combined those metrics with six statistical descriptors: 1) mean, 2) standard deviation, 3) maximum value, 4) variance, 5) mode, and 6) median. This process yielded a total of 26 features intended to describe the PS behavior under the experimental conditions. Anthropometric and independent variables such as type of test, surface, eyes condition, age, and gender were also added as features for the machine learning algorithms. The IPAQ-SF scores and classification were added as the target value. In total, 36 features were used to predict the IPAQ-SF score using machine learning techniques (see Table 1).
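A minimal sketch of this pre-processing chain is given below, assuming Python with NumPy and SciPy. The filter order, cut-off, and 100 Hz sampling rate (10 ms sample time) come from the text. The zero-phase `filtfilt` call, the polynomial degree of 3, and the definitions of path length and mean speed are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 100.0  # Hz, derived from the 10 ms sample time


def preprocess(cop, fs=FS, cutoff=5.0, order=5, poly_degree=3):
    """Low-pass filter a COP trace and remove its offset/drift."""
    b, a = butter(order, cutoff, btype="low", fs=fs)
    filtered = filtfilt(b, a, cop)  # zero-phase filtering (assumption)
    # Normalized time axis keeps the polynomial fit well conditioned.
    t = np.linspace(0.0, 1.0, len(cop))
    trend = np.polyval(np.polyfit(t, filtered, poly_degree), t)
    return filtered - trend


def path_length(signal):
    """Total distance travelled along one axis (sum of absolute steps)."""
    return float(np.sum(np.abs(np.diff(signal))))


def mean_speed(signal, fs=FS):
    """Path length divided by the trial duration in seconds."""
    return path_length(signal) * fs / (len(signal) - 1)


# Synthetic one-minute trace: offset + slow sway + 20 Hz disturbance.
time = np.arange(6000) / FS
raw = 12.0 + np.sin(2 * np.pi * 0.5 * time) + 0.5 * np.sin(2 * np.pi * 20 * time)
clean = preprocess(raw)
```

The same two functions would be applied separately to the AP and ML components of the COP.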

Classification: support vector machines (SVM)
Using the data above, we aimed to predict an individual's index of PA classified into three levels: Low, Moderate, and High. We estimated the IPAQ score by training a multi-class support vector machine with a Radial Basis Function (RBF) kernel (Zhihua & Li, 2012).
We adopted a one-vs-one approach for the multi-class classification because it is less sensitive to imbalanced data sets than the one-vs-all approach (Bao et al., 2019). When training an SVM with the RBF kernel, the hyperparameters C and gamma must be considered. The hyperparameter C trades off misclassification of training examples against simplicity of the decision surface, and gamma defines how much influence a single training example has (Pedregosa et al., 2011).
Abbreviations: ML, mediolateral direction; AP, anteroposterior direction; Max, maximum value of the variable.
An SVM constructs hyperplanes in a high-dimensional space, which can be used for classification and regression. The support vector classifier (SVC) chooses the classifier that separates the classes with maximum margin; in general, the larger the margin, the lower the generalization error of the classifier (1.4. Support Vector Machines, 2011; Costa et al., 2016). The SVM was implemented using the Scikit-learn 1.0 library (Pedregosa et al., 2011). Next, a brief description of the mathematical formulation of the SVC used is provided. For more detailed information, refer to the respective references (Pedregosa et al., 2011; Smola & Schölkopf, 2004).
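SVC solves the following primal problem:

$$\min_{w, b, \zeta} \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{n} \zeta_i$$

subject to

$$y_i \left( w^{T} \phi(x_i) + b \right) \geq 1 - \zeta_i, \quad \zeta_i \geq 0, \quad i = 1, \ldots, n.$$

Of note, the margin is maximized by minimizing $\|w\|^{2} = w^{T} w$, while incurring a penalty when a sample is misclassified or falls within the margin boundary. The strength of this penalty is controlled by the term C.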
This method constructs a Lagrange function from the primal problem. The resulting formulation, called the dual problem, makes the optimization problem easier to solve. In the Scikit-learn formulation (Pedregosa et al., 2011), the dual problem is:

$$\min_{\alpha} \; \frac{1}{2} \alpha^{T} Q \alpha - e^{T} \alpha \quad \text{subject to} \quad y^{T} \alpha = 0, \quad 0 \leq \alpha_i \leq C, \quad i = 1, \ldots, n,$$

where $e$ is the vector of all ones, $Q_{ij} = y_i y_j K(x_i, x_j)$, and $K(x_i, x_j) = \phi(x_i)^{T} \phi(x_j)$ is the kernel.

Table 1 (partial). Listed features include mean AP COP, mean ML COP speed, total ML COP distance travelled, total AP COP distance travelled, maximum ML COP frequency, and maximum AP COP frequency.

To obtain the best performance from our algorithm, we tuned the hyperparameters using RandomizedSearchCV (Li & Phung, 2014), with candidate values of C and gamma spaced exponentially. This method from the Scikit-learn library (Pedregosa et al., 2011) uses a cross-validated search over parameter settings to estimate the C and gamma values. We selected the best C value according to the best F1 score.
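The tuning step can be sketched with Scikit-learn as follows. This is a hedged illustration, not the authors' code: the data are synthetic (120 samples, 36 features, three classes, mirroring the sizes reported in the article), and the search ranges, the number of iterations, the feature scaling step, and the macro-averaged F1 scoring are all assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the study data: 120 samples, 36 features, 3 PA levels.
X, y = make_classification(
    n_samples=120, n_features=36, n_informative=10, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# RBF-kernel SVC; scikit-learn trains multi-class SVC with one-vs-one internally.
pipeline = make_pipeline(
    StandardScaler(),  # scaling is an assumption; RBF kernels are scale sensitive
    SVC(kernel="rbf", decision_function_shape="ovo"),
)

# Exponentially spaced candidates: sample C and gamma from log-uniform ranges.
param_distributions = {
    "svc__C": loguniform(1e-2, 1e3),      # hypothetical range
    "svc__gamma": loguniform(1e-4, 1e1),  # hypothetical range
}

search = RandomizedSearchCV(
    pipeline,
    param_distributions,
    n_iter=20,
    scoring="f1_macro",  # averaging scheme assumed; the article reports "F1"
    cv=5,
    random_state=0,
)
search.fit(X_train, y_train)
best_params = search.best_params_
```

After fitting, `search.best_estimator_` can be evaluated once on the held-out test set, so that the test data play no part in choosing C and gamma.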

Evaluation and feature importance
For the validation process, we divided our data randomly into a training set and a test set in proportions of 80% and 20%, respectively. Then, for performance validation, we calculated the F1 score, recall, and precision (Alzoman & Alenazi, 2021). Classification reports and confusion matrices were also used to show the performance of the SVM model. After evaluating model performance, the importance of each feature in the dataset was obtained using the feature importance property of the SVM model. This property gives a score for each feature, where a higher score indicates a more important or relevant feature. Features that were statistically significant for predicting the IPAQ score were also added as important features.
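As an illustration of how precision, recall, and F1 follow from a confusion matrix, the sketch below computes them per class and then macro-averages the F1 score. The matrix is hypothetical (24 test samples across three PA levels) and does not reproduce the study's results. Macro-averaging is an assumption, since the article does not state the averaging scheme.

```python
def per_class_metrics(confusion):
    """Return (precision, recall, f1) per class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(confusion[k]) - tp                       # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        results.append((precision, recall, f1))
    return results


# Hypothetical confusion matrix; rows/columns ordered Low, Moderate, High.
example = [
    [4, 1, 0],
    [1, 12, 1],
    [0, 1, 4],
]
metrics = per_class_metrics(example)
macro_f1 = sum(f1 for _, _, f1 in metrics) / len(metrics)
```

With small classes, a single misclassified sample shifts the per-class scores substantially, which is worth keeping in mind when reading results from a 24-sample test set.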

Results and discussion
Table 2 shows the baseline characteristics of the fifteen participants in the study. The main COP variables were selected according to their feature importance score, and data collected in the eyes-open, firm-surface, static test serve as the baseline values of these features. As the normality test found that the data were asymmetrical, we selected the median value and the standard deviation to describe the trend of some features grouped by gender.
The IPAQ-SF classification is also shown for each level and grouped by gender. The statistical significance of the anthropometric features was not assessed, since they are not expected to be comparable between groups. PS behavior at baseline did not differ significantly between groups.
Differences in the other trial conditions were also tested by comparing genders, and similar results were found. Thus, we combined data from women and men into the same dataset to feed our SVM algorithm.
Table 3 shows the comparison of the PA levels according to the anthropometric and main COP features grouped by test type. These results suggest that, depending on their IPAQ-SF score, people perform differently in ML displacement and ML total length under static conditions. Maximum frequencies of AP movement also seem to differ among IPAQ-SF scores under both test conditions. Differences in the COP features among IPAQ-SF levels were more significant under static test conditions than under dynamic test conditions. Of note, after the intervention, participants reported that the dynamic exercise was more complex, especially with eyes closed and on the foam surface.
As can be seen in the results, PS performance is related to the level of PA, especially when comparing low levels versus moderate and high levels. However, the IPAQ-SF score has limited ability to differentiate between moderate and high PA levels. These results support the findings in the literature (Alsubaie, 2020; Kiers et al., 2013), where, in general, people with moderate and high PA levels have more control over their PS than people with low PA levels.
Although participants with high and moderate IPAQ-SF scores had similar results under static test conditions, it is interesting that under dynamic test conditions, participants with a high IPAQ-SF score performed worse than participants with a moderate level of PA. Hence, their performance was similar to that of participants with a low PA level.
Table 4 shows the performance of the SVM classifier before and after feature importance selection. It also shows the tuned hyperparameters. As can be seen, high values of C were necessary to separate the classes, resulting in better performance of the algorithm. Although 360 samples were collected from the participants, the values were summarized into 120. This avoids having repeated values from the same subject performing the same exercise in both the training and test sets.
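The sample counts can be checked with a short sketch. It assumes that the 360 samples correspond to 15 participants, 8 exercises (4 static and 4 dynamic), and 3 repetitions, and that the repetitions were summarized by their mean. The article does not state how the values were summarized, so both points are assumptions, and the record layout is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_repetitions(records):
    """Average the repetitions of each (participant, exercise) pair."""
    grouped = defaultdict(list)
    for participant, exercise, _repetition, value in records:
        grouped[(participant, exercise)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical records: (participant, exercise, repetition, feature value).
records = [
    (p, e, r, float(p + e + r))
    for p in range(15)  # participants
    for e in range(8)   # exercises: 4 static + 4 dynamic (assumption)
    for r in range(3)   # repetitions per exercise
]

summary = summarize_repetitions(records)
n_test = round(0.2 * len(summary))  # 20% held out for testing
```

Under these assumptions, 15 x 8 x 3 = 360 raw samples reduce to 120 summarized samples, and a 20% test split holds out 24 of them.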
Acta Scientiarum. Technology, v. 45, e61317, 2023
Confusion matrices of the algorithm using all features and the fifteen top features are presented in Figure 3. In this case, 24 samples were randomly selected to be predicted. In general, the implemented SVM algorithms performed better when classifying participants with a moderate level of PA. Nonetheless, results suggest that it is possible to classify the PA level of a person using direct measurements such as anthropometric and COP features. The main limitations of this study include the limited number of participants for each level of PA. In addition, although participants were asked to perform the exercises in a similar way, they used different compensation techniques to balance their bodies and adjusted them on each repetition. The response might therefore have varied between trials.
On the other hand, because the IPAQ is an indirect measurement of the PA level, its assessment is subjective and might be over- or underestimated; the result of the test might not necessarily reflect the real PA status of a person. Nonetheless, the SVM classifier showed good performance, evidencing the relationship between the IPAQ-SF score and direct measurements. These results, above 95%, are comparable with those of other studies that applied similar ML techniques to direct human measurements such as ECG (Allam, Samantray, & Ari, 2020; Prakash & Ari, 2019) and EEG (Venkata Phanikrishna, Jaya Prakash, & Ari 2021), with similar average results.
Of note, these results are also comparable with the study of Liao, Wu, Wei, Chou, and Chang (2021), who used a similar database describing human balance and analyzed COP signals with decision trees and empirical mode decomposition to predict falls among older adults.
Although dynamic tests were performed to maximize any difference between participants depending on their PA level, no significant difference was evident in the study. This was attributed to the small proportion of participants with Low and High PA levels. Finally, we believe that, regardless of the type of PA, exercise has an impact on COP behavior. However, we recognize that the type of PA can be a confounder that influences the COP features. For future studies, we therefore plan to test the impact of different sports on COP behavior.

Conclusion
This feasibility study indicates that it is possible to classify the PA level of a person using direct measurements. The high F1 score, recall, and precision of the designed algorithms support the use of these techniques to predict the IPAQ-SF. However, future studies enrolling larger, balanced samples are needed to prove the main hypothesis of this article. For its part, the designed classifier can be a tool that objectively supports individuals' physical assessment and diagnosis processes. This system shows the possibility of using similar solutions to support diagnosis and PA assessment by specialized personnel. Future work intends to create a PA model that uses direct measurements to predict and assess the actual PA level of an individual. The authors believe that such a model might help support training and rehabilitation processes using assistive technologies.

Table 2. Baseline features by gender.

Table 3. PA levels according to direct measurements per test.

Table 4. Performance of the SVM classifier before and after feature importance selection.