Validity of the Polar H7 Heart Rate Sensor for Heart Rate Variability Analysis during Exercise in Different Age, Body Composition and Fitness Level Groups

This work aims to validate the Polar H7 heart rate (HR) sensor for heart rate variability (HRV) analysis at rest and during various exercise intensities in a cohort of male volunteers of different ages, body compositions and fitness levels. Cluster analysis was carried out to evaluate how these phenotypic characteristics influenced HR and HRV measurements. For this purpose, sixty-seven volunteers performed a test consisting of the following consecutive segments: sitting rest, three submaximal exercise intensities on a cycle ergometer and sitting recovery. The agreement between HRV indices derived from Polar H7 and a simultaneous electrocardiogram (ECG) was assessed using the concordance correlation coefficient (CCC). The percentage of subjects not reaching excellent agreement (CCC > 0.90) was higher for high-frequency power (PHF) than for low-frequency power (PLF) of HRV and increased with exercise intensity. A cluster of unfit, older volunteers with high trunk fat percentage showed the highest error in HRV indices. This study indicates that Polar H7 and ECG were interchangeable at rest. During exercise, HR and PLF showed excellent agreement between devices. However, during the highest exercise intensity, CCC for PHF was lower than 0.90 in as many as 60% of the volunteers. During recovery, HR but not HRV measurements were accurate. In conclusion, phenotypic differences between subjects can be one of the causes of disagreement between HR sensors and ECG devices. This should be considered specifically when using Polar H7 and, more generally, in the validation of any HR sensor for HRV analysis.


Introduction
Heart rate (HR) variability (HRV) is the oscillation in the intervals between consecutive heartbeats (RR intervals) [1]. In recent decades, the use of HRV has become popular, since it allows cardiac autonomic modulation to be assessed with simple, non-invasive techniques. In general, lower HRV has been associated with poorer prognosis in different clinical conditions, while higher HRV, especially in high-frequency oscillations, has been associated with better health. In particular, reduced HRV has been reported in several cardiovascular diseases and has been used for risk stratification, confirming its value as a […]

Values are expressed as mean ± standard deviation (SD). BMI = body mass index; PWC80% = Physical Work Capacity at 80% of maximum HR (208 − 0.7 × age in years), in watts per kg of body weight.

Procedure
All subjects completed one test session. Prior to the test, they were asked to adhere to the following instructions [18]: (1) avoid exercise or strenuous physical activity the day before the test; (2) drink plenty of fluids over the 24 h preceding the test; (3) get an adequate amount of sleep (6-8 h) the night before the test; (4) avoid substances such as tobacco, alcohol or stimulants (caffeine, theine, taurine, etc.) in the 8 h before the test; (5) avoid food intake for 3 h before the test; and (6) wear comfortable, loose-fitting clothing. Subjects' skin was prepared by shaving the electrode sites with a razor, cleaning the skin with alcohol and drying it with gauze. A 12-lead high-resolution Holter ECG was acquired. The 10 electrodes were placed as indicated by the manufacturer (H12+, Mortara Instrument, Milwaukee, WI, USA), ensuring that they did not interfere with the HR sensor strap (Polar H7, Polar Electro Oy).
The test was conducted in an environmentally controlled room (22-23 °C) between 16:00 and 20:00. It was divided into three consecutive segments: resting (SREST), cycling (SCY) and recovery (SREC). During SREST, volunteers were monitored while seated at rest for 5 min, without moving or talking. A transition period of 2-3 min was allowed to move from the chair to the cycle ergometer, i.e., from SREST to SCY. During this period, the subject rode the electrically braked cycle ergometer (Ergoselect 200 K, Ergoline; Bitz, Germany) at a 50 W workload and chose a cadence. That cadence was then maintained throughout the test, guided by the workload and cadence displayed on the cycle-ergometer screen. SCY was a submaximal cycle-ergometer test divided into three stages lasting 5 min each. To avoid a maximal exercise test, the maximum heart rate (HRmax) of each subject was estimated using the formula of Tanaka et al., HRmax = 208 − 0.7 × age (years) [19]. The workload was adjusted in each stage to reach 60, 70 and 80% of HRmax, and these stages are denoted SCY60, SCY70 and SCY80, respectively. Finally, during SREC, volunteers again remained seated for 5 min without moving or talking. Figure 1 shows an example of the temporal evolution of RR intervals from one subject throughout the entire test.
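The stage targets follow directly from the Tanaka formula. The sketch below computes them under one stated assumption: each target HR is simply the stated percentage of the estimated HRmax. How the workload was adjusted to reach each target is not modeled, and the function names are illustrative only.

```python
def tanaka_hrmax(age_years: float) -> float:
    """Estimated maximum heart rate (bpm) from Tanaka et al.: 208 - 0.7 * age."""
    return 208.0 - 0.7 * age_years


def stage_target_hr(age_years: float, percents=(60, 70, 80)) -> dict:
    """Target HR (bpm) for the SCY60, SCY70 and SCY80 cycling stages."""
    hrmax = tanaka_hrmax(age_years)
    # Integer percentages avoid float noise in the stage labels
    return {f"SCY{p}": hrmax * p / 100.0 for p in percents}


if __name__ == "__main__":
    # Hypothetical 40-year-old volunteer (HRmax = 180 bpm)
    for stage, hr in stage_target_hr(40).items():
        print(f"{stage}: {hr:.0f} bpm")
```

For this hypothetical 40-year-old volunteer, the sketch gives stage targets of 108, 126 and 144 bpm.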
Figure 1. Example of the RR intervals for one subject throughout the entire test. Dotted lines separate the different test segments: resting (SREST), cycling (SCY) and recovery (SREC). SCY was divided into three stages corresponding to 60, 70 and 80% of HRmax, denoted as SCY60, SCY70 and SCY80, respectively.

Data Recording
Subjects self-reported their birth date, current diseases and medication, and their anthropometric characteristics were assessed. Stature was measured to the nearest 0.001 m using a portable stadiometer (SECA 225, Hamburg, Germany). Subjects stood with their scapulae, buttocks and heels resting against a wall, heels together, feet forming a 45° angle and the head in the Frankfort plane. A portable body composition analyzer (TANITA BC-418MA; Tanita Corp., Tokyo, Japan) was used to measure body mass to the nearest 0.1 kg, with subjects in underwear and after urination. The TANITA BC-418MA was also used to estimate the percentage of body fat and trunk fat (r = 0.87, p < 0.001 vs. dual-energy X-ray absorptiometry) [20]. Body mass index (BMI) was calculated by dividing weight in kilograms by height in meters squared.
Beat-to-beat RR intervals with 1-ms resolution were obtained using a Polar V800 HR monitor together with a Polar H7 chest Soft Strap (Polar Electro Oy, henceforth referred to as PolarH7). Concomitantly, a 12-lead ECG was recorded at a sampling rate of 1000 Hz using a high-resolution Holter device (H12+, Mortara Instrument, henceforth referred to as ECG and used here as the reference).

VO2max can be estimated from submaximal exercise tests, a safe and feasible method with good validity against maximal tests (correlation coefficients: 0.69 to 0.98) [21]. The commonly used tests have stages of short or variable duration. Instead, an ad hoc test with 5-min stages was defined to allow reliable estimation of the low-frequency power of HRV. This enabled assessment of the HRV response to the sympathetic activity increasing with each cycling stage [22]. Cardiorespiratory fitness was assessed using the Physical Work Capacity (PWC) approach. PWC in watts was measured during SCY80 of the submaximal cycle-ergometer test and then divided by the subject's body weight (PWC80%, in W/kg). Instead of using fixed HR thresholds, this method incorporates the age-dependent decline of HRmax [23,24]. It has previously been used as an objective assessment of cardiorespiratory fitness [25,26].
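The two derived anthropometric and fitness quantities are simple ratios. A minimal sketch with hypothetical values; the function names are illustrative and not taken from the study's software:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def pwc80_per_kg(workload_scy80_w: float, weight_kg: float) -> float:
    """PWC80% in W/kg: workload (W) during SCY80 divided by body weight (kg)."""
    return workload_scy80_w / weight_kg


if __name__ == "__main__":
    # Hypothetical volunteer: 80 kg, 1.80 m, 160 W during SCY80
    print(f"BMI = {bmi(80.0, 1.80):.1f} kg/m^2")            # 24.7
    print(f"PWC80% = {pwc80_per_kg(160.0, 80.0):.2f} W/kg")  # 2.00
```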

Data Analysis and Processing
Raw RR interval time series recorded by PolarH7, RRP(i), were downloaded from the "Polar Flow" web platform. RR interval time series from the ECG, RRE(i), were extracted with a multi-lead, wavelet-based detector [27], using parameters optimized for noisy environments as described in [28]. Each beat detection was manually verified by an operator using a dedicated interface.
The delay between RRP(i) and RRE(i) was estimated as the time lag maximizing their cross-correlation over the first 3 min of the test, when the subject was at rest. Both series were then synchronized by compensating for this delay. The two series can have different lengths owing to, e.g., wrong or missed beat detections in the Polar data. An algorithm was therefore developed to match the RR intervals of both series. This allowed characterization of the agreement between the paired series RRP(ip) and RRE(ip), where ip denotes the indices of the beats matched in the two series.
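The synchronization and matching steps can be sketched as below. This is a hypothetical reconstruction, not the authors' algorithm, and rests on several assumptions:

- The lag is searched in beats rather than seconds, over the first 180 s of the ECG series.
- The ±20-beat search range and the 0.15-s matching tolerance are assumed values.
- An RR pair is kept only when both of its bounding beats are matched to consecutive beats in the other series.

The names `estimate_beat_lag` and `match_rr` are illustrative.

```python
import math
from itertools import accumulate


def beat_times(rr_ms):
    """Beat occurrence times (s), taking the first beat as the end of rr_ms[0]."""
    return [t / 1000.0 for t in accumulate(rr_ms)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def estimate_beat_lag(rr_e, rr_p, window_s=180.0, max_lag=20):
    """Beat lag k maximizing corr(rr_e[i], rr_p[i + k]) over the first window_s seconds."""
    n_win = sum(1 for t in beat_times(rr_e) if t <= window_s)
    best_lag, best_r = 0, -math.inf
    for k in range(-max_lag, max_lag + 1):
        pairs = [(rr_e[i], rr_p[i + k]) for i in range(n_win) if 0 <= i + k < len(rr_p)]
        if len(pairs) < 3:
            continue
        x, y = zip(*pairs)
        r = pearson(x, y)
        if r > best_r:
            best_lag, best_r = k, r
    return best_lag


def match_rr(rr_e, rr_p, lag, tol_s=0.15):
    """Paired (RR_E, RR_P) values after compensating the delay implied by `lag`."""
    t_e, t_p = beat_times(rr_e), beat_times(rr_p)
    i0 = max(0, -lag)                     # first ECG beat with a Polar counterpart
    delay = t_p[i0 + lag] - t_e[i0]       # time offset between the two series (s)
    t_p = [t - delay for t in t_p]
    matched = {}                          # ECG beat index -> Polar beat index
    j = 0
    for i, te in enumerate(t_e):
        while j < len(t_p) and t_p[j] < te - tol_s:
            j += 1                        # skip Polar beats with no ECG counterpart
        if j < len(t_p) and abs(t_p[j] - te) <= tol_s:
            matched[i] = j
            j += 1
    # RR interval i spans beats i-1 and i: keep it only if both beats are matched
    # to consecutive Polar beats, so an interval merged after a missed beat is dropped.
    return [(rr_e[i], rr_p[j]) for i, j in matched.items()
            if i > 0 and matched.get(i - 1) == j - 1]


if __name__ == "__main__":
    # Synthetic check: two extra leading Polar beats and one missed beat after 180 s
    rr_e = [800 + (i * 37) % 90 for i in range(300)]
    rr_p = [900, 950] + rr_e
    rr_p = rr_p[:252] + [rr_p[252] + rr_p[253]] + rr_p[254:]
    lag = estimate_beat_lag(rr_e, rr_p)
    print(lag, len(match_rr(rr_e, rr_p, lag)))
```

In the synthetic check, the missed beat removes two ECG intervals from the pairing rather than producing a spurious long pair.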

Temporal Domain
The following temporal HRV indices were studied [1]:

- Mean HR (MHR), obtained as the inverse of the mean RR interval.
- Standard deviation of normal-to-normal RR intervals (SDNN). SDNN is considered a measure of the total power of HRV and was calculated as the standard deviation of the NN intervals, i.e., the normal RR intervals after correction for ectopic beats [29].
- Root mean square of successive differences between adjacent normal-to-normal RR intervals (RMSSD). RMSSD is a measure of short-term variability and was computed as the root mean square of successive differences between adjacent NN intervals.

These indices were obtained from RRP(i) and RRE(i) in each segment of the test.
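The three temporal indices reduce to a few lines of arithmetic on an NN series. A minimal sketch, with three assumptions:

- The input has already been corrected for ectopic beats.
- MHR is expressed in beats per minute.
- SDNN uses the sample standard deviation (n − 1). The text does not state which definition was used.

```python
import math
import statistics


def mean_hr_bpm(nn_ms):
    """MHR: inverse of the mean NN interval, expressed in beats per minute."""
    return 60000.0 / statistics.fmean(nn_ms)


def sdnn_ms(nn_ms):
    """SDNN: standard deviation of the NN intervals (sample SD, an assumption)."""
    return statistics.stdev(nn_ms)


def rmssd_ms(nn_ms):
    """RMSSD: root mean square of successive differences between adjacent NN intervals."""
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    nn = [800, 810, 790, 800]  # hypothetical NN intervals (ms)
    print(mean_hr_bpm(nn), round(sdnn_ms(nn), 2), round(rmssd_ms(nn), 2))
```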

Frequency Domain
The instantaneous HR signal, dHR(n), was derived from both RRP(i) and RRE(i) and sampled at 4 Hz. The integral pulse frequency modulation (IPFM) model was used to account for the presence of ectopic beats [29]. The very low-frequency components of this signal, dMHR(n), were removed by high-pass filtering (cut-off 0.03 Hz). The result was normalized by dMHR(n) to obtain the modulating signal m(n) [30]:

$$m(n) = \frac{d_{HR}(n) - d_{MHR}(n)}{d_{MHR}(n)}$$
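The normalization step, with m(n) taken as (dHR(n) − dMHR(n))/dMHR(n), can be sketched as below. This uses two simpler stand-ins:

- The IPFM-based signal is replaced by linear interpolation of 60000/RR placed at each beat time. Ectopic beats are not handled.
- The 0.03 Hz filtering is approximated by a centered moving average about 1/0.03 s long.

Neither stand-in is the study's method, and the function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

FS_HZ = 4.0           # resampling rate used in the text
VLF_CUTOFF_HZ = 0.03  # cut-off frequency used in the text


def instantaneous_hr(rr_ms, fs=FS_HZ):
    """Evenly sampled HR (bpm) by linear interpolation of 60000/RR at each beat time.

    Simplified stand-in for the IPFM-based signal; ectopic beats are NOT handled."""
    t = [x / 1000.0 for x in accumulate(rr_ms)]
    hr = [60000.0 / x for x in rr_ms]
    out, n = [], 0
    while t[0] + n / fs <= t[-1]:
        tn = t[0] + n / fs
        k = bisect_right(t, tn) - 1       # beat interval containing tn
        if k >= len(t) - 1:
            out.append(hr[-1])
        else:
            frac = (tn - t[k]) / (t[k + 1] - t[k])
            out.append(hr[k] + frac * (hr[k + 1] - hr[k]))
        n += 1
    return out


def moving_average(x, width):
    """Centered moving average, with the window shrunk at the edges."""
    prefix = [0.0, *accumulate(x)]
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def modulating_signal(rr_ms, fs=FS_HZ, fc=VLF_CUTOFF_HZ):
    """m(n) = (dHR(n) - dMHR(n)) / dMHR(n), with dMHR(n) from a crude low-pass filter."""
    d_hr = instantaneous_hr(rr_ms, fs)
    d_mhr = moving_average(d_hr, width=round(fs / fc))  # about 133 samples (~33 s)
    return [(h - m) / m for h, m in zip(d_hr, d_mhr)]
```

A Butterworth filter applied forward and backward would be a closer match to a 0.03 Hz high-pass filter. The moving average simply keeps the sketch dependency-free.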
The smoothed pseudo Wigner-Ville distribution (SPWVD) was applied to m(n) to estimate its time-varying spectrum. Time and frequency smoothing windows were chosen as described in [17]. The instantaneous power in the low-frequency band, PLF(n), was obtained by integrating the SPWVD from 0.04 to 0.15 Hz at each time instant. The instantaneous power in the high-frequency band, PHF(n), was computed in a band centered on the respiratory frequency with a bandwidth of 0.25 Hz. Figure 2 shows an example of dHR(n), PLF(n) and PHF(n) obtained from RRE. In some analyses, the mean PLF and PHF of each test segment were calculated from PLF(n) and PHF(n).
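The band definitions can be illustrated with a stationary stand-in. The sketch swaps the time-varying SPWVD for a one-sided periodogram of a whole segment, computed by evaluating the DFT bin by bin. It therefore yields one PLF and one PHF value per segment rather than PLF(n) and PHF(n). The respiratory frequency is assumed to be known, and the function names are hypothetical.

```python
import cmath
import math


def band_power(x, fs, f_lo, f_hi):
    """Power of x in [f_lo, f_hi] Hz from a one-sided periodogram."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]            # remove the mean before the DFT
    df = fs / n                           # frequency resolution (Hz)
    k_lo, k_hi = math.ceil(f_lo / df), math.floor(f_hi / df)
    power = 0.0
    for k in range(max(k_lo, 1), min(k_hi, (n - 1) // 2) + 1):
        X = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(xc))
        power += 2.0 * abs(X) ** 2 / (fs * n) * df  # one-sided PSD times bin width
    return power


def lf_hf_power(m, fs, resp_freq_hz, hf_bandwidth_hz=0.25):
    """PLF over 0.04-0.15 Hz; PHF over a band centered on the respiratory frequency."""
    p_lf = band_power(m, fs, 0.04, 0.15)
    half = hf_bandwidth_hz / 2.0
    p_hf = band_power(m, fs, resp_freq_hz - half, resp_freq_hz + half)
    return p_lf, p_hf


if __name__ == "__main__":
    fs = 4.0
    t = [i / fs for i in range(1200)]     # hypothetical 5-min segment at 4 Hz
    m = [0.05 * math.sin(2 * math.pi * 0.1 * ti) + 0.02 * math.sin(2 * math.pi * 0.3 * ti)
         for ti in t]
    print(lf_hf_power(m, fs, resp_freq_hz=0.3))
```

For a sinusoid of amplitude A falling exactly on a DFT bin, the band power equals A²/2. In the synthetic example, PLF is therefore about 0.00125 and PHF about 0.0002.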
Sensors 2021, 21

Figure 2. Example of dHR(n), PLF(n) and PHF(n) obtained from RRE for one subject: resting segment (left) and cycling segment (right). Note that the axes have different scales. dHR(n) = instantaneous HR signal; PLF(n) = instantaneous low-frequency power; PHF(n) = instantaneous high-frequency power; RRE = RR interval series from the ECG.

Statistical Analysis
The normality of the data was checked with the Kolmogorov-Smirnov test. The data violated the normality assumption of parametric tests, and normality could not be achieved with commonly employed transformations. A non-parametric analysis was therefore performed. Descriptive values are presented as mean ± standard deviation (SD), and HRV values are reported as median and interquartile range. Statistical analyses were performed using IBM SPSS (version 25; Chicago, IL, USA). The significance level was set at p ≤ 0.05.
The Wilcoxon test for paired samples, the non-parametric equivalent of the paired-samples t-test, was used to determine differences between the temporal-domain HRV data obtained from PolarH7 and from the ECG. The magnitude of the differences was quantified by the effect size (ES):

$$ES = \frac{Z}{\sqrt{n}}$$

where Z is the Z-score of the Wilcoxon statistic and n is the total number of observations [31]. Differences were considered small when ES < 0.2, small to medium when ES = 0.2-0.5, medium to large when ES = 0.5-0.8 and large when ES > 0.8 [32].
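The effect-size step can be sketched as follows, with two stated assumptions. The absolute value of Z is used, so the sign of the difference does not matter. Exact boundary values (0.2, 0.5, 0.8) are assigned to the upper category, since the text does not say how ties at the thresholds were handled. The Z-score itself is taken as given, e.g., from the statistics package.

```python
import math


def wilcoxon_effect_size(z: float, n_obs: int) -> float:
    """ES = |Z| / sqrt(n), with n the total number of observations."""
    return abs(z) / math.sqrt(n_obs)


def classify_effect_size(es: float) -> str:
    """Labels used in the text; handling of exact boundary values is an assumption."""
    if es < 0.2:
        return "small"
    if es < 0.5:
        return "small to medium"
    if es <= 0.8:
        return "medium to large"
    return "large"


if __name__ == "__main__":
    # Hypothetical Z-score; n = 134 would correspond to 67 subjects x 2 devices
    es = wilcoxon_effect_size(-2.1, 134)
    print(round(es, 3), classify_effect_size(es))
```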
Lin's concordance correlation coefficient (CCC) was used to study the agreement between the following PolarH7-derived and ECG-derived signals: RR(ip), PHF(n) and PLF(n). CCC quantifies how far the observed data deviate from the line of perfect concordance, i.e., the 45° line on a square-axis scatter plot [33]. CCC was evaluated in each segment (SREST, SCY60, SCY70, SCY80 and SREC). A CCC value greater than 0.90 was considered "excellent" [34], and the percentage of subjects with CCC values below this threshold was reported for each segment.
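Lin's CCC combines correlation with a penalty for location and scale shifts. The standard formula is ρc = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²), using population (1/n) moments. A minimal sketch, together with the per-segment percentage of subjects who do not exceed 0.90; the function names are illustrative:

```python
def lin_ccc(x, y):
    """Lin's concordance correlation coefficient, using 1/n (population) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)


def percent_not_excellent(ccc_values, threshold=0.90):
    """Percentage of subjects whose CCC does not exceed the 'excellent' threshold."""
    below = sum(1 for c in ccc_values if c <= threshold)
    return 100.0 * below / len(ccc_values)


if __name__ == "__main__":
    # A constant offset lowers the CCC even though the Pearson correlation is 1
    print(lin_ccc([1, 2, 3, 4], [2, 3, 4, 5]))
    print(percent_not_excellent([0.95, 0.85, 0.92, 0.60]))
```

The first example shows why CCC, rather than Pearson correlation, suits device agreement. A systematic bias between PolarH7 and ECG lowers the CCC even when the two series are perfectly correlated.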
Cluster analysis was performed to identify groups of subjects with similar characteristics in terms of three variables of interest: age, body composition (trunk fat percentage) and fitness level (PWC80%). Trunk fat percentage was selected among the body composition variables because it is the most specific to the electrode placement area.
Following the methodology described in previous studies [35,36], two types of cluster analysis were combined: hierarchical clustering (Ward's method) and k-means clustering.

- First, univariate and multivariate outliers (the latter according to the Mahalanobis distance) were detected, to reduce the sensitivity of Ward's method to outliers.
- Second, hierarchical cluster analysis was used, since the number of clusters in the data was unknown beforehand. Examination of the dendrograms showed that a four-cluster solution produced good differentiation between groups.
- Finally, k-means clustering was performed specifying a four-cluster solution.

Compared with hierarchical methods, k-means cluster analysis is considered less sensitive to outliers and has been found to yield greater within-cluster homogeneity and between-cluster heterogeneity [35].
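The final k-means step can be sketched as Lloyd's algorithm seeded with given starting centroids, e.g., the cluster means of a Ward solution. Two assumptions apply. The three variables are z-scored before clustering, which the text does not state. The Ward stage and the Mahalanobis outlier screening are not reproduced. The function names are illustrative.

```python
import math
import statistics


def standardize(rows):
    """z-score each column (population SD); scaling is an assumption, not stated in the text."""
    cols = list(zip(*rows))
    stats = [(statistics.fmean(c), statistics.pstdev(c)) for c in cols]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's k-means starting from given centroids (e.g., Ward cluster means)."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (Euclidean distance)
        new_labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break                          # assignments stable: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:                    # keep the old centroid if a cluster empties
                centroids[k] = tuple(statistics.fmean(col) for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical (age, trunk fat %, PWC80%) rows, standardized before clustering
    rows = [(25, 15, 3.0), (30, 18, 2.8), (60, 32, 1.4), (65, 35, 1.2)]
    z = standardize(rows)
    labels, _ = kmeans(z, [z[0], z[2]])
    print(labels)
```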
To assess differences in the percentage of error for each HRV index between the four cluster groups, a Kruskal-Wallis test (the non-parametric equivalent of one-way analysis of variance, ANOVA) with Bonferroni correction was performed. The Dunn-Bonferroni post hoc method was used for pairwise comparisons. The relative error in HRV indices was calculated as the absolute error of the PolarH7 measurement with respect to the ECG measurement, divided by the reference ECG measurement, e.g., (SDNN_ECG − SDNN_PolarH7)/SDNN_ECG. This ratio was then multiplied by 100 to obtain the percentage of error (%Error). For the frequency-domain HRV variables, %Error was calculated from the mean value in each segment of the test. To evaluate the magnitude of the differences, ES was calculated as:

$$ES = \frac{H}{(n^2 - 1)/(n + 1)}$$

where H is the Kruskal-Wallis test statistic and n is the total number of observations [31].

Results

Table 2 shows the descriptive characteristics of the four cluster groups:

- CLUSTER A: high PWC80%.
- CLUSTER B: low PWC80% and low age.
- CLUSTER C: low PWC80%, high age and medium trunk fat percentage.
- CLUSTER D: low PWC80%, high age and high trunk fat percentage.

Table 3 shows the values of HRV indices obtained from PolarH7 and ECG. Mean PLF and PHF were calculated from PLF(n) and PHF(n) for each segment, unlike Table 4, where the instantaneous series were used. The Wilcoxon test for paired samples revealed that PHF and the temporal-domain HRV indices (MHR, SDNN and RMSSD) were lower at all cycling stages (SCY60, SCY70 and SCY80) when measured by PolarH7. PLF was lower when measured by PolarH7 only at the highest intensity (SCY80). The magnitude of all these differences was small to medium, i.e., 0.2-0.5 according to the effect sizes.

Notes to Table 3: Values are expressed as median and interquartile range. Segments are based on the test phases: resting (SREST), cycling (SCY) and recovery (SREC). SCY was divided into three stages at 60, 70 and 80% of HRmax, denoted as SCY60, SCY70 and SCY80, respectively. PLF = low-frequency power; PHF = high-frequency power; MHR = mean HR; SDNN = SD of the NN intervals; RMSSD = root mean square of successive differences between NN intervals. ES = effect size. * = significant differences between devices (p ≤ 0.05, Wilcoxon test for paired samples).

Table 4 shows CCC values for RR(ip), PLF(n) and PHF(n) and the percentage of subjects not reaching excellent agreement (CCC > 0.90) in each segment of the test. The number of subjects not reaching excellent agreement was:

- clearly higher for PHF(n) than for PLF(n) (χ²(1) = 45.52, p < 0.001; degrees of freedom in parentheses);
- increasing with exercise intensity (χ²(2) = 38.47, p < 0.001);
- lower during exercise than during SREC (χ²(1) = 42.31, p < 0.001).

When the analysis was performed separately for each identified cluster, CLUSTER A obtained the highest CCC values. CLUSTER D had the fewest subjects showing optimal agreement between devices in PHF(n). Owing to noise in the RRP(i) series during SREC, the instantaneous power could not be properly extracted in 4 volunteers, so the final sample for SREC was N = 63.

Notes to Table 4: Values are expressed as mean CCC and (percentage of subjects under the 0.90 threshold). Segments are based on the test phases: resting (SREST), cycling (SCY) and recovery (SREC). SCY was divided into three stages at 60, 70 and 80% of HRmax, denoted as SCY60, SCY70 and SCY80, respectively. The characteristics of each cluster were the following: CLUSTER A = high fitness; CLUSTER B = low fitness and low age; CLUSTER C = low fitness, high age and medium trunk fat percentage; CLUSTER D = low fitness, high age and high trunk fat percentage. RR(ip) = paired RR interval series; PLF(n) = instantaneous low-frequency power; PHF(n) = instantaneous high-frequency power.

Table 5 shows %Error for each HRV index. The Kruskal-Wallis test demonstrated significant differences between clusters in PHF at SREST and during exercise (SCY70 and SCY80). Regarding the temporal-domain HRV indices, SDNN showed significant differences between groups at SREST and during exercise (SCY60 and SCY80). RMSSD showed significant differences at the highest intensities (SCY70 and SCY80). For both PHF and the temporal-domain HRV indices, CLUSTER D was the group with the highest %Error. The magnitude of all these differences was small, i.e., <0.2 according to the effect sizes.

Notes to Table 5: Percentage error (%) values are expressed as median and interquartile range. The characteristics of each cluster were the following: CLUSTER A = high fitness; CLUSTER B = low fitness and low age; CLUSTER C = low fitness, high age and medium trunk fat percentage; CLUSTER D = low fitness, high age and high trunk fat percentage. Segments are based on the test phases: resting (SREST), cycling (SCY) and recovery (SREC). SCY was divided into three stages at 60, 70 and 80% of HRmax, denoted as SCY60, SCY70 and SCY80, respectively. PLF = low-frequency power; PHF = high-frequency power; SDNN = SD of the RR intervals; RMSSD = root mean square of successive differences between NN intervals. * = significant differences between clusters (p ≤ 0.05, Kruskal-Wallis test). A = different from CLUSTER A; B = different from CLUSTER B; C = different from CLUSTER C; D = different from CLUSTER D.
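The %Error and the Kruskal-Wallis effect size from the statistical analysis can be sketched as follows. Three assumptions apply:

- %Error follows the example formula as written, i.e., it is signed. Wrapping the numerator in `abs()` gives the strictly absolute version.
- The H statistic is computed without the tie correction that statistical packages usually apply.
- Dunn-Bonferroni post hoc comparisons are not reproduced.

The function names are illustrative.

```python
def percent_error(ecg_value, polar_value):
    """%Error = (ECG - PolarH7) / ECG * 100, following the example formula in the text."""
    return (ecg_value - polar_value) / ecg_value * 100.0


def average_ranks(values):
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                        # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        total += r * r / len(g)
        start += len(g)
    return 12.0 * total / (n * (n + 1)) - 3.0 * (n + 1)


def kruskal_wallis_es(h, n):
    """ES = H / ((n^2 - 1) / (n + 1)), which simplifies to H / (n - 1)."""
    return h / ((n ** 2 - 1) / (n + 1))


if __name__ == "__main__":
    # Hypothetical %Error values for three small groups
    groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    h = kruskal_wallis_h(groups)
    print(round(h, 3), round(kruskal_wallis_es(h, 9), 3))
```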

Discussion
In this study, HRV analysis from RR intervals provided by PolarH7 at rest and during various exercise intensities was validated against the same analysis from a simultaneous ECG recording. The Wilcoxon test showed many significant differences between devices in HRV indices during exercise. However, the effect sizes were small to medium, and the differences were of little practical relevance in the case of MHR. For the RR(ip), PLF(n) and PHF(n) signals, the percentage of subjects not reaching excellent agreement between devices (CCC > 0.90) increased with exercise intensity and was higher for PHF(n) than for PLF(n). Cluster analysis revealed that phenotypic characteristics such as age, body composition and fitness level influenced HRV measurements as well as the differences between PolarH7 and ECG. In particular, CLUSTER D was composed of subjects with low fitness, high age and high trunk fat percentage. It had the fewest subjects obtaining excellent agreement between devices for PHF(n) and the highest %Error for time- and frequency-domain HRV indices.
The large number of significant differences in the temporal-domain HRV indices (MHR, SDNN and RMSSD) between PolarH7 and ECG at all cycling stages could be due to a small but consistent difference between devices. Specifically, PolarH7 values were usually slightly lower than those measured by the ECG, which is consistent with the small to medium effect sizes obtained. For MHR, the values measured by the two devices in individual subjects agreed up to the second or third decimal place. Even if significant, such differences may not be meaningful in practice. Differences of less than one beat per minute are below what previous studies have reported as the smallest worthwhile change [37].
Regarding the analysis of the full paired series of RR intervals, excellent agreement between devices was found at rest, in accordance with previous validation studies of PolarH7/H10 HR sensors [10,12]. Our results also confirmed that the agreement between devices decreases with exercise intensity, as previously reported [11,12]. Despite this reduction, Gilgen-Ammann et al. proposed Polar H10 as the gold standard for RR interval assessment during intense activities for HR and HRV evaluation [12]. It should be noted, however, that only ten healthy, lean and physically fit volunteers were studied in [12]. Here, in contrast, we investigated a larger set of volunteers with a broader range of ages, body compositions and fitness levels. This may explain why we found a more noticeable reduction in agreement between devices during exercise.
Frequency-domain HRV indices, including PLF and PHF, have rarely been investigated in previous studies validating PolarH7/H10 HR sensors. Nevertheless, these frequency-domain signals were evaluated for the earlier Polar RS800 model. Differences between devices were reported to increase with exercise intensity and to be higher for PHF than for PLF [17]. In the present study, PLF showed excellent agreement at rest and throughout the exercise test, meaning that PolarH7 can follow HR oscillations up to 0.15 Hz. Still, the percentage of subjects reaching excellent agreement for PLF at the highest intensity (81% with CCC > 0.9 at 80% of HRmax) was lower than in [17] (96% with Pearson correlation coefficient > 0.8 at 80-100% of VO2max). This is possibly due to the greater heterogeneity of the present sample. For PHF, we found disagreement between PolarH7 and ECG at the highest intensities, in accordance with results reported for the Polar RS800. This disagreement possibly has a multifactorial etiology [17], including:

- the higher respiratory frequency and noise level during exercise;
- the processing performed by Polar when a beat cannot be detected;
- the body characteristics of the subjects.

Both PHF, which reflects vagal modulation of cardiac activity, and highly correlated time-domain HRV measures such as RMSSD [38] are commonly used to monitor autonomic status before, during and after exercise [39]. Caution is therefore advised when interpreting results obtained from PolarH7, especially at exercise intensities greater than 70% of HRmax.
Our clustering results confirmed the hypothesis that the phenotypic characteristics of the subjects are one of the causes of the observed differences between PolarH7 and ECG devices [17]. As initially postulated, CLUSTER D, containing subjects with low fitness, high age and high trunk fat percentage, showed the highest %Error for HRV indices. Its error reached 50% in PHF and 30% in RMSSD at intensities greater than 70% of HRmax. Nevertheless, even in CLUSTER D some subjects presented excellent agreement between devices, confirming that the characteristics of the subjects are not the only cause of disagreement. Future studies should clarify other possible reasons for the observed differences between devices.
The recovery from exercise was the period in which the lowest CCC values were measured, especially for the PLF and PHF signals. To our knowledge, this is the first time that PolarH7 validity has been analyzed during sitting recovery. The lack of agreement between devices could be due to the noisy signal recorded by PolarH7 in some volunteers, as illustrated in Figure 3. Consequently, despite the correction algorithms, 10% of PLF signals and 27% of PHF signals presented very low agreement (CCC < 0.1). Based on these results, assessment of HRV, particularly PLF and PHF, during the recovery period may not be reliable if PolarH7 is used. This is in line with Schneider et al., who recommended evaluating HR recovery rather than post-exercise HRV [7].

Strengths and Limitations
The present study has several strengths. First, the 67 volunteers were phenotypically varied. HR sensor validation studies are often carried out in groups of 20 or fewer young, lean, healthy and physically fit volunteers [10][11][12], in whom RR intervals may be easier to detect. Our larger sample of 20- to 70-year-old subjects with varied physical condition is therefore much more heterogeneous and representative of HR sensor users.

Second, our assessment of PolarH7 validity as a function of phenotypic characteristics, including age, is particularly relevant. The older population is growing worldwide, and especially in Europe [40], and advanced age is associated with changes in body composition and reduced cardiorespiratory fitness [41,42]. Taking these associations into account, cluster analysis was used to evaluate how the co-occurrence of these characteristics could affect HRV measurements obtained from PolarH7.

Third, cardiorespiratory fitness and excess body fat are strong predictors of mortality and cardiovascular disease risk, and age is the main risk factor for multimorbidity [40,42]. Accordingly, it is of special interest to evaluate the validity of these devices in subjects with these phenotypic characteristics. As discussed in the introduction, the applications of HRV in the evaluation and management of a wide range of diseases are growing. These inexpensive, easy-to-use devices could become a very useful e-health tool in primary care. The HRV measures that can be reliably assessed by HR sensors such as PolarH7 need to be established so that interpretations can be made safely.

Last but not least, all body measurements and signal recordings were performed in the laboratory under homogeneous conditions, enabling control of confounding factors and reproducibility of the study.
On the other hand, some limitations need to be acknowledged. According to the meta-analysis by Dobbs et al., the degree of absolute error between portable devices and ECG measurements was larger among studies involving a greater number of female subjects [9]. Here, only men were studied so that sex was not a confounding variable. Further research in other populations, including women and Black/African individuals, would make it possible to confirm the results of this study more broadly. Another potential limitation is that PolarH7 has been superseded by the Polar H10 strap. Even so, millions of users still wear a PolarH7 strap, and the performance of both straps seems to be similar during stationary exercise [43]. Future studies could extend this research to other HR sensors.

Conclusions
Three major findings have emerged from the present study.

First, assessment of HR and HRV in a relatively large and heterogeneous sample has confirmed that PolarH7 can accurately measure mean HR and low-frequency oscillations (up to 0.15 Hz) of HR at rest and during exercise. However, PolarH7 and ECG disagree when evaluating high-frequency HR oscillations during moderate- to high-intensity exercise.

Second, the validity of PolarH7 measurements during sitting recovery has been studied for the first time. The results support the notion that PolarH7 is appropriate for studying HR recovery rather than post-exercise HRV.

Third, clustering analysis shows that the agreement between PolarH7 and ECG devices varies with the characteristics of the subjects regarding age, body composition and fitness level.

Our results point to the need to ensure phenotypic variety in any validation study of HR sensors.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The datasets analyzed during the current study are available from the corresponding author on reasonable request.