Test-Retest Reliability and the Learning Effect on Isokinetic Fatigue in Female Master’s Cyclists

Background: Isokinetic exercise is commonly used as a benchmark for strength and performance. Objective: The purpose of this investigation was to establish isokinetic fatigue test-retest reliability and examine the learning effect when testing without familiarization. Methods: Twenty-two masters-aged (53±5 years), competitive female cyclists completed 3 separate 50-repetition knee flexion/extension tests on a Biodex, separated by one week with no familiarization. Test-retest reliability (intra-class correlation [ICC]), 95% confidence intervals (CI), and technical error of measurement (TEM) were calculated. Results: ICCs between trials exhibited excellent reliability during extension (.93–.97) and flexion (.93–.97) for all variables except time to peak torque (ICC = .35 and .45 for extension and flexion, respectively) and fatigue index (ICC = .47 for flexion). Relative TEM was minimal for extension between trial 1 and trial 2 (0.27%–0.97%) and between trial 2 and trial 3 (0.27%–1.45%) for all variables. Similar results were observed for flexion between trial 1 and trial 2 (0.87%–2.45%) and between trial 2 and trial 3 (0.54%–1.10%). No differences (Wilks' Λ, p > .05) existed between trials, indicating no learning effect associated with the tests. Conclusions: There was strong test-retest reliability in masters-aged, female athletes and no learning effect was associated with the Biodex during a knee extension/flexion fatigue protocol.

Another consideration during isokinetic evaluations is the learning effect related to repeated testing. To date, only two known investigations have examined the learning effect associated with the Biodex (Lund et al., 2005; Symons, Vandervoort, Rice, Overend, & Marsh, 2005). An initial study investigated the learning effect between two testing sessions separated by 2-10 days, suggesting test-retest reliability was high (ICC range = 0.84-0.94) between sessions during a 5-repetition, muscular strength protocol (Symons et al., 2005). Unfortunately, this investigation did not perform a 3rd follow-up test, which would have confirmed whether a learning effect occurred between trials. As a result, the overall learning effect cannot be determined. Another study suggested there was no learning effect associated with the Biodex during knee extension/flexion, demonstrating strong reliability (ICC range = 0.89-0.94) between multiple measurements taken on the same day and over a one-week period (Lund et al., 2005). However, it is important to note that in this investigation, participants were provided a familiarization trial in which to become acquainted with the equipment and procedures before data collection commenced (Lund et al., 2005). The implementation of a familiarization trial eliminates the ability to detect a true learning effect and limits the external validity of test-retest reliability for the Biodex isokinetic dynamometer. Altogether, based on previous literature, it remains inconclusive whether a true learning effect exists on the isokinetic dynamometer.
Strength testing during fatiguing protocols in populations such as masters athletes (MA) must be reliable in order to establish long-term efficacy for subsequent evaluations. Furthermore, it is imperative that the learning effect of the Biodex be evaluated without a familiarization trial to establish testing efficacy from a clinical perspective. Therefore, the purpose of this investigation was two-fold: 1) to establish test-retest reliability of the Biodex during fatiguing exercise in MA, and 2) to determine whether a true learning effect exists with the Biodex when utilized without a familiarization trial.

Participants
This study included 22 masters-aged female cyclists from the Southern region of the United States (Table 1). Cyclists were recruited because the push/pull nature of lower-body isokinetic exercise (i.e., knee extension/flexion) relates to the muscle activation patterns utilized during cycling (So, Ng, & Ng, 2005). Females were specifically recruited because they exhibit greater levels of internal motivation compared to males, which would minimize external (life-related) factors affecting testing variability (Gillet & Rosnet, 2008). MA classification requirements were based upon those set forth by the USA Cycling and World Masters Cycling organizations. Inclusion criteria were: a) age ≥ 35 years, b) not classified as an elite cyclist or competitor in an event based on the international cycling federation (Union Cycliste Internationale, UCI) standards, and c) not a member of a registered team under the UCI. For this investigation, MA were also required to have cycled at least 2 years for a minimum of 3 days per week (Glenn, Gray, Stewart, Moyen, Kavouras, & DiBrezzo, 2015; Glenn et al., 2016). Individuals experiencing acute or chronic lower-body musculoskeletal injuries were excluded from participation. With regard to the learning effect associated with the Biodex assessment, the rationale for using trained athletes was two-fold: 1) when determining the learning effect of a measure, the steadiness of the participant must be considered, and trained cyclists are familiar with the movement patterns associated with isokinetic knee extension/flexion (Lund et al., 2005; So et al., 2005); 2) subjects should be well motivated when determining learning effects, and athletes participating in competitive sports exhibit greater levels of intrinsic motivation compared to non-competitive counterparts (Frederick-Recascino & Schuster-Smith, 2003; Lund et al., 2005). As previously mentioned, females were specifically recruited because of their greater levels of internal motivation compared to males (Gillet & Rosnet, 2008).
Based on these parameters, we chose to test female cyclists. As an aging population (that is more prone to knee injuries) has not yet been investigated, we chose MA. All measures and procedures were approved by the University's Institutional Review Board prior to testing, and all subjects completed a health history questionnaire and signed a statement of informed consent prior to participation. Participant recruitment was completed via email, fliers, and visits to local cycling clubs and organizations.

Procedures
Food logs were distributed to all participants to record food and fluid intake for the 24 h prior to each trial. Participants were asked to replicate their 24-hour dietary intake from the first trial for all subsequent trials. To account for dietary intake affecting outcome measures on testing days, participants were required to fast for 3 h prior to each trial (Glenn et al., 2016). All participants refrained from vigorous exercise, alcohol, and caffeine during the 24 h prior to each trial. Participants verbally confirmed adherence to all controls prior to each trial. Additionally, participants were instructed to wear clothes and shoes in which they would normally exercise, and wore similar attire for all trials.
Participants reported to the laboratory for 3 visits. The initial visit included completion of an informed consent and health history questionnaire, demographic and body composition measurements, and baseline testing for the isokinetic exercise protocol (described in detail below). Body mass was assessed using a beam scale (Detecto 437 Eye-Level Weigh Beam Physician Scale, Irvington, NJ) and height was measured with a stadiometer (Detecto, Webb City, MO). Body fat and lean mass were measured via dual-energy x-ray absorptiometry (DXA; General Electric, Fairfield, CT). Prior to DXA analysis, proper calibration procedures and quality assurance analysis were followed as previously described (Glenn, Gray, & Vincenzo, 2014). In order to determine the learning effect associated with isokinetic exercise, the baseline evaluation was considered trial 1, and no familiarization with the equipment or protocol was provided. Participants were also not permitted to be in the laboratory while other evaluations were being conducted, to ensure that the initial introduction to the assessment was standardized (Brown & Weir, 2001). None of the participants had ever undergone isokinetic exercise testing prior to participating in this investigation.
After baseline testing, participants reported to the lab for trials 2 and 3 and completed the same isokinetic exercise protocol. To ensure any learning effects were solely associated with the isokinetic exercise protocol, all trials were separated by exactly 1 week; no variation in this was permitted. Trials for each participant were also scheduled at the same time of day (± 1 hour) to ensure chronobiological control (Altamirano, Coburn, Brown, & Judelson, 2012; Mota, Stock, Carillo, Olinghouse, Drusch, & Thompson, 2015). Finally, to mask performance results, participants were not permitted to see the real-time computer output during the testing procedure.

Isokinetic Exercise Testing
The Biodex System II Isokinetic Dynamometer (Biodex Medical, Inc., Shirley, NY) was used to measure isokinetic exercise variables. Once seated on the dynamometer, the participant was instructed to keep their back flat against the chair and was then stabilized using thigh, pelvic, and shoulder straps. The mechanical axis of the dynamometer was aligned with the knee of the participant's dominant leg, and the lateral femoral condyle was used as the landmark for setting the axis of rotation. After trial 1, chair and dynamometer settings were recorded to ensure consistent positioning for all subsequent sessions. Before testing, all participants received specific instructions to maximally extend and flex the knee joint through the full range of motion during each individual repetition throughout the evaluation. Calibration of the Biodex isokinetic dynamometer was performed according to manufacturer-established specifications. The protocol consisted of 50 repetitions with extension/flexion movement parameters set at 180°/240° per second, respectively (Glenn et al., 2016). To ensure maximal effort was given throughout the evaluation, strong verbal encouragement was provided during each evaluation (Glenn et al., 2016).
Variables used to determine test-retest reliability (determined a priori) included the following: a) peak torque (N·m), b) relative peak torque (based on body weight [%]), c) time to peak torque (ms), d) torque generated at 30° (N·m), e) torque generated at 0.18 s (N·m), f) work completed during the highest repetition (J), g) relative work completed (based on body weight [%]), h) total work completed (J), i) work completed during the initial 3rd of exercise (J), j) work completed during the middle 3rd of exercise (J), k) work completed during the final 3rd of exercise (J), l) fatigue index (%), m) average power (W), and n) average peak torque (N·m). All variables were calculated by the Biodex software with the exception of "work completed during the middle 3rd of exercise," which was determined by subtracting the work completed during the initial and final thirds of exercise from total work completed.
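The one derived variable described above is a plain subtraction. As a minimal sketch (the function name and the sample values are hypothetical and are not study data), it could be computed as follows:

```python
def middle_third_work(total_work_j, initial_third_j, final_third_j):
    """Work (J) in the middle third of the test.

    Computed as total work minus the work in the initial and final thirds,
    mirroring the subtraction described in the text.
    """
    middle = total_work_j - initial_third_j - final_third_j
    if middle < 0:
        # The two outer thirds cannot exceed the total; flag bad input.
        raise ValueError("initial + final thirds exceed total work")
    return middle


# Hypothetical values in joules, for illustration only.
example = middle_third_work(3000.0, 1200.0, 800.0)
```

The guard against a negative result is an added sanity check, not something the source describes.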

Statistical Analyses
Statistical Package for the Social Sciences (SPSS, version 22) was used to conduct analyses. Normality of the data was assessed with histograms and boxplots.
In order to test the degree of agreement between trials (1 vs. 2, 2 vs. 3, and 1 vs. 3), intra-class correlation coefficients (ICC) were calculated. The ICC gives a relative expression of reliability, and general guidelines suggest an ICC ≥ .75 indicates strong reliability (Little, Emery, Black, Scott, Meeuwisse, & Nettel-Aguirre, 2015; Portney & Watkins, 2000). For the purposes of this investigation, coefficients < 0.50 indicated poor reliability between trials, 0.50 to 0.74 indicated moderate reliability, and ≥ 0.75 indicated strong reliability (Little et al., 2015). For those variables in which there was poor reliability in a between-trial comparison, the 95% confidence intervals (CI) for all between-trial ICCs were compared. In those cases, reliability was considered acceptable if the 95% CIs of the between-trial comparisons overlapped, indicating that the ICC values did not differ significantly between comparisons (Little et al., 2015; Moyen, Ellis, Ciccone, Thurston, Cochrane, & Brown, 2014).
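To make the reliability bands concrete, the following is a minimal sketch of an ICC calculation for one pair of trials, together with the poor/moderate/strong classification given above. The text does not state which ICC model was used in SPSS, so the two-way random-effects, absolute-agreement, single-measures form, ICC(2,1), is an assumption here. The function names and the sample torque values are hypothetical.

```python
def icc_2_1(trial_a, trial_b):
    """ICC(2,1) for two trials measured on the same subjects.

    Assumed model: two-way random effects, absolute agreement, single
    measures. The source does not specify the model actually used.
    """
    n, k = len(trial_a), 2
    rows = list(zip(trial_a, trial_b))
    grand = (sum(trial_a) + sum(trial_b)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(trial_a) / n, sum(trial_b) / n]

    # Two-way ANOVA sums of squares: subjects (rows), trials (columns), error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def classify_icc(icc):
    """Reliability bands used in this investigation (Little et al., 2015)."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    return "strong"


# Hypothetical peak torque values (N·m) for four subjects, two trials.
t1 = [100.0, 120.0, 90.0, 110.0]
t2 = [102.0, 118.0, 90.0, 112.0]
label = classify_icc(icc_2_1(t1, t2))
```

Single-measures absolute agreement is a conservative choice; a consistency-type or average-measures ICC would generally give equal or higher values for the same data.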
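In conjunction with ICCs, technical error of measurement (TEM) was calculated for each variable between trials 1 vs. 2, 2 vs. 3, and 1 vs. 3. TEM is defined as the standard deviation between repeated measures; the lower the TEM obtained, the more accurate the measurement. Absolute (Equation 1) and relative (Equation 2) TEM were calculated as follows:

$$\text{Absolute TEM} = \sqrt{\frac{\sum_{i=1}^{n} d_i^{2}}{2n}} \qquad (1)$$

Where: d_i = difference between the two trial values for subject i; n = number of subjects.

$$\text{Relative TEM (\%)} = \frac{\text{Absolute TEM}}{\text{VAV}} \times 100 \qquad (2)$$

Where: Absolute TEM = TEM calculated in Equation 1; VAV = variable average value (calculated as the arithmetic mean of all subjects' means from two trials [i.e., the mean of 22 subject means]).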
Appropriate conditions to accurately measure TEM are that a) variables are always collected in the same measurement unit, b) calculations are only applied to the same measurement performed and/or the equipment utilized, c) calculations are only applied when using a similar (homogeneous) population (e.g., athletes), d) measurements must include a minimum of 20 participants, and e) measurements must be performed at the same time of day (Perrini et al., 2005). The sample size (n = 22), chronobiological considerations (± 1 hour), and participant homogeneity requirements (female masters cyclists) were satisfied in this investigation. As the Biodex was utilized for all trials and variable measurement units were consistent from trial to trial, all conditions were satisfied for TEM calculations.
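The TEM calculation can be illustrated with a short sketch. It assumes the standard two-trial form of absolute TEM, the square root of the summed squared between-trial differences divided by 2n, and takes relative TEM as absolute TEM divided by the variable average value (VAV), expressed as a percentage. The function names and sample values are hypothetical and are not study data.

```python
import math


def absolute_tem(trial_a, trial_b):
    """Absolute TEM for two trials: sqrt(sum(d_i^2) / (2n)).

    d_i is the between-trial difference for subject i (assumed standard form).
    """
    n = len(trial_a)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(trial_a, trial_b))
    return math.sqrt(sum_sq_diff / (2 * n))


def relative_tem(trial_a, trial_b):
    """Relative TEM (%): absolute TEM divided by the VAV, times 100.

    VAV is the mean of each subject's two-trial mean, as defined in the text.
    """
    subject_means = [(a + b) / 2 for a, b in zip(trial_a, trial_b)]
    vav = sum(subject_means) / len(subject_means)
    return absolute_tem(trial_a, trial_b) / vav * 100


# Hypothetical peak torque values (N·m) for four subjects, two trials.
t1 = [100.0, 120.0, 90.0, 110.0]
t2 = [102.0, 118.0, 90.0, 112.0]
abs_tem = absolute_tem(t1, t2)   # in the units of the variable
rel_tem = relative_tem(t1, t2)   # in percent
```

Because relative TEM is scaled by the VAV, it allows variables with different units (N·m, J, W, ms) to be compared on a common percentage scale.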
To determine the presence of a learning effect between testing trials, a repeated measures multivariate analysis of variance model (RM-MANOVA) was utilized for each of the Biodex variables. Greenhouse-Geisser corrections were implemented when sphericity violations occurred. When necessary (i.e., a significant F score), a Bonferroni adjustment was made for multiple pairwise comparisons during post hoc analysis. A learning effect was considered present when variables exhibited a significant performance improvement between trials 1 and 2, but not between trials 2 and 3 (Little et al., 2015).
Where appropriate, all variables are presented as mean ± SD.
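The decision rule for a learning effect can be sketched as follows. This is an illustration only: it assumes the unadjusted pairwise p-values have already been obtained from the post hoc analysis (which the study ran only after a significant omnibus test), and it does not reproduce the RM-MANOVA itself. The function names, the dictionary keys, and the sample values are hypothetical.

```python
def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1.0."""
    return min(1.0, p_value * n_comparisons)


def shows_learning_effect(means, p_raw, alpha=0.05, higher_is_better=True):
    """Apply the rule: significant improvement from trial 1 to 2, but not 2 to 3.

    means: (trial1, trial2, trial3) group means for one variable.
    p_raw: unadjusted pairwise p-values keyed '1v2', '2v3', '1v3'.
    """
    sign = 1 if higher_is_better else -1
    improved_12 = sign * (means[1] - means[0]) > 0
    improved_23 = sign * (means[2] - means[1]) > 0

    m = len(p_raw)  # number of pairwise comparisons being corrected for
    sig_12 = bonferroni(p_raw["1v2"], m) < alpha
    sig_23 = bonferroni(p_raw["2v3"], m) < alpha

    return (improved_12 and sig_12) and not (improved_23 and sig_23)


# Hypothetical example: a jump from trial 1 to 2 that then plateaus.
learned = shows_learning_effect(
    (100.0, 110.0, 111.0), {"1v2": 0.004, "2v3": 0.60, "1v3": 0.003}
)
# Hypothetical example: no meaningful change across trials.
stable = shows_learning_effect(
    (100.0, 101.0, 101.0), {"1v2": 0.40, "2v3": 0.70, "1v3": 0.45}
)
```

The `higher_is_better` flag is included because an improvement in a variable such as time to peak torque would appear as a decrease rather than an increase.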

RESULTS
In order to assess the test-retest reliability of knee extension/flexion on the Biodex isokinetic dynamometer, ICC values were calculated for each extension and flexion variable between trials. For all variables, ICCs were calculated between trial 1 vs. 2, trial 2 vs. 3, and trial 1 vs. 3. ICCs between the testing trials exhibited excellent agreement for the extension component of the protocol (Table 2); all but one variable demonstrated moderate to strong reliability across the 3 trial comparisons (i.e., ICC ≥ .50). Only time to peak torque exhibited poor reliability, for trial 1 vs. 3 (ICC = .35), during the extension phase. However, the 95% CI overlapped with the trial 1 vs. 2 and 2 vs. 3 CIs, indicating there were no significant differences between ICC values for this variable. During the extension component, the highest ICC values for between-trial comparisons were exhibited for peak torque (range: .93-.96), work completed during the highest repetition (range: .94-.96), total work completed (range: .95-.96), and average peak torque (range: .94-.97).
For each variable assessed during the flexion component, ICCs were calculated between trial 1 vs. trial 2, trial 2 vs. trial 3, and trial 1 vs. trial 3 (Table 3). ICCs between the trials exhibited strong agreement for the flexion component of the protocol. All but two variables demonstrated moderate to strong reliability across the 3 trial comparisons (i.e., ICC ≥ .50). Only time to peak torque and fatigue index exhibited poor reliability, for trial 1 vs. 3 (ICC = .45 and .47, respectively), during the flexion phase. However, for both variables, the 95% CIs overlapped with the trial 1 vs. 2 and 2 vs. 3 CIs, indicating there were no significant differences between ICC values for these variables. The highest ICC values between all trials were exhibited for peak torque (range: .93-.96), work completed during the highest repetition (range: .94-.96), total work completed (range: .95-.96), and average peak torque (range: .94-.97).
Absolute and relative TEMs were calculated for all variables. Relative TEM exhibited minimal measurement error between trial 1 vs. trial 2 (range: 0.27%-0.97% for all variables), trial 2 vs. trial 3 (range: 0.27%-1.45% for all variables), and trial 1 vs. trial 3 (range: 0.32%-1.30% for all variables) during the extension component of the isokinetic exercise protocol (Table 4). The flexion component of the protocol (Table 5) also exhibited low relative TEM between trial 1 vs. trial 2 (range: 0.87%-2.45% for all variables), trial 2 vs. trial 3 (range: 0.54%-1.10% for all variables), and trial 1 vs. trial 3 (range: 0.71%-2.26% for all variables).
Raw values from the extension and flexion components of the 50-repetition protocol are displayed in Tables 6 and 7, respectively. For the extension component of the protocol, RM-MANOVA indicated no significant differences (Wilks' Λ, p > .05) between the trials for any of the isokinetic exercise variables measured. These non-significant results were mirrored when evaluating the flexion component of the protocol (Wilks' Λ, p > .05). This indicates that there was no learning effect with the Biodex knee extension/flexion for female MA cyclists.

DISCUSSION
The purpose of this investigation was two-fold: 1) to establish test-retest reliability of the Biodex isokinetic dynamometer in female MA, and 2) to determine whether there is a learning effect associated with the Biodex isokinetic dynamometer when utilized without a familiarization trial. For reliability in MA, the results from this study indicate the Biodex exhibits strong test-retest consistency between trials.
Nearly all variables exhibited moderate to strong reliability (as defined by ICC ≥ .50), indicating strong test-retest reliability. There were also no performance improvements for any measured variables during knee extension/flexion between trials, suggesting there is no learning effect with the Biodex in female MA cyclists.
Based on the ICCs and 95% CIs, test-retest reliability on the Biodex isokinetic dynamometer is very high in female MA cyclists. Additionally, measurement errors between trials were extremely low (relative TEM ≤ 2.5%) for all extension (Table 4) and flexion (Table 5) variables. Previous investigations have examined test-retest reliability in young, healthy males and females, pediatrics, and untrained older males; however, these are the first data demonstrating these results in MA and an all-female subject sample (Brown, Whitehurst, Gilbert, & Buchalter, 1995; Feiring, Ellenbecker, & Derscheid, 1990; Symons et al., 2005; Tsiros et al., 2011). Not only are MA at a greater risk for lower-extremity injury when compared to younger, trained individuals, females are also at a greater risk for knee injury compared to males (Dugan, 2005; McKean et al., 2006). As a result, it is important that baseline testing results are accurate when used as an outcome measure for a training program or in rehabilitation from a knee injury. In non-research settings, it may take considerable time and financial resources to schedule on-site evaluations in clinics or performance facilities and, as a result, initial familiarization with the equipment/procedures may not be feasible. If a test is not reliable between visits, tracking longitudinal performance gains and recovery from injury would be difficult in MA, as measurement sensitivity may not be appropriate to detect minor improvements. Reliable measurements from trial to trial are critical for these athletes in order to determine minute changes in strength and evaluate agonist/antagonist ratios between the quadriceps and hamstrings musculature (Mota et al., 2015). It is also important to note that the MA cyclists utilized in this investigation would be a population likely to demonstrate the least amount of testing variability, based on familiarity with muscle recruitment patterns that mimic those used in sport-based settings and high intrinsic motivation to give maximal efforts each time (Frederick-Recascino & Schuster-Smith, 2003; So et al., 2005). Thus, these results cannot be extrapolated to an untrained population, which might display more variability between trials.

Table 3.
Intra-class correlations and limits of agreement for test-retest reliability during the flexion component of the 50-repetition protocol completed on the Biodex Isokinetic Dynamometer
Columns: T1-T2 ICC, 95% CI; T2-T3 ICC, 95% CI; T1-T3 ICC, 95% CI

The Biodex is a multifaceted tool providing numerous outcome performance variables. Nevertheless, when determining test-retest reliability of the Biodex, peak torque is commonly used as the outcome variable associated with measurement consistency (Bagley, McLeland, Arevalo, Brown, Coburn, & Galpin, 2016; Lund et al., 2005; McLeland, Ruas, Arevalo, Bagley, Ciccone, Brown, Coburn, Galpin, & Malyszek, 2016; Tsiros et al., 2011). These are the first data, in any population, to evaluate test-retest reliability of the following outcome variables (Tables 2 and 3): a) time to peak torque, b) torque generated at 30°, c) torque generated at 0.18 s, d) work completed during the highest repetition, e) relative work completed (based on body weight), f) total work completed, g) work completed during the initial 3rd of exercise, h) work completed during the middle 3rd of exercise, i) work completed during the final 3rd of exercise, j) fatigue index, k) average power, and l) average peak torque. In our study, the ICCs for peak torque were consistently high (> .75), indicating strong test-retest reliability. Additionally, most other variables demonstrated moderate to strong reliability (ICC ≥ .50). Only 1 variable in extension (time to peak torque) and 2 variables during flexion (time to peak torque and fatigue index) exhibited poor reliability for one of the between-trial comparisons (trial 1 vs. 3). Still, for these variables, the 95% CIs of all three comparisons (trial 1 vs. 2, trial 2 vs. 3, and trial 1 vs. 3) overlapped. This indicates that although the ICC was low for those variables, values were not significantly different from ICC values in the other 2 trial comparisons.
Additionally, it has been shown that the effect of the leg flexors on the outcome of the isokinetic fatigue test is minimal (Mota et al., 2015). Ensuring that all variables calculated by the Biodex software demonstrate strong test-retest reliability is important because different populations may have different goals associated with the Biodex evaluation. For example, an endurance athlete attempting to improve finishing speed during a race would require reliable measurements of "work completed during the final 3rd of exercise" to successfully track changes associated with training. Our study indicates that the Biodex can be considered a reliable tool for all of these measures.
Previously, the presence of a learning effect associated with the Biodex remained unclear, as only two studies had evaluated this concept, with conflicting results (Lund et al., 2005; Symons et al., 2005). Although the work by Lund et al. (2005) suggested there is no learning effect associated with the evaluation, the incorporation of a familiarization trial inherently invalidates these claims. The other investigation examining the learning effect of the Biodex suggested test-retest reliability was high (ICC = 0.84-0.94) during a 5-repetition, muscular strength protocol (Symons et al., 2005). However, as the work by Symons et al. (2005) only examined an initial test-retest design and did not account for additional follow-up assessments (i.e., a 3rd evaluation to determine changes in variability from initial testing), an overall learning effect cannot be determined. Results from the current investigation suggest there is no learning effect associated with the Biodex isokinetic dynamometer (based on non-significant Wilks' Λ, p > 0.05) when tested at 3 separate time points, 1 week apart.
While the outcomes of this investigation are novel and important for future research, there are certain constraints associated with the results that must be addressed. This investigation included athletes (specifically females) who were comfortable with the exercise patterns recruited, and it cannot be assumed untrained individuals would exhibit similar results, because these individuals tend to have lower intrinsic motivation and therefore may not give a maximal effort each time (Frederick-Recascino & Schuster-Smith, 2003). These data were also collected in trained, healthy, athletic individuals free from lower-extremity injury; thus, these results may not be valid for untrained individuals or those undergoing a rehabilitation program, given a potential increase in measurement variability. Finally, this investigation allotted exactly 1 week between testing trials, and participants were required to come at the same time of day for each trial. This chronobiological control may not always be feasible in a clinical setting, and its absence could lead to greater measurement variability between tests.
In this investigation, we present the first data exhibiting strong test-retest reliability of the Biodex isokinetic dynamometer in MA. Additionally, this is the first investigation to demonstrate strong test-retest reliability in an all-female subject population. Furthermore, we demonstrate that there is no learning effect associated with either the extension or flexion component of isokinetic knee exercise on the Biodex. When used in clinical or research settings, a familiarization protocol does not appear necessary before undergoing isokinetic exercise testing. The removal of a familiarization trial on the Biodex can save time and minimize financial requirements for athletes tracking longitudinal performance gains. However, these results only pertain to a highly trained, older female population and cannot be extrapolated to injured or non-athlete populations. Future investigations are required in lesser-trained individuals and/or individuals in clinical settings to further elucidate whether there is a learning effect associated with the Biodex isokinetic dynamometer.

CONCLUSIONS
There was strong test-retest reliability in masters-aged, female athletes. No learning effect was associated with the Biodex during a knee extension/flexion fatigue protocol, indicating that a familiarization protocol is not necessary for isokinetic testing.

Table 2.
Intra-class correlations and limits for test-retest reliability during the extension component of the 50-repetition protocol on the Biodex Isokinetic Dynamometer

Table 4.
Absolute and relative technical error of measurement calculations during the extension component of the 50-repetition protocol completed on the Biodex Isokinetic Dynamometer between testing trials
T1 = initial testing trial, T2 = second testing trial, T3 = third testing trial, TEM = technical error of measurement. All data are presented as mean ± SD (n = 22).

Table 5.
Absolute and relative technical error of measurement calculations during the flexion component of the 50-repetition protocol completed on the Biodex Isokinetic Dynamometer between testing trials

Table 6.
Raw values calculated during the extension component of the 50-repetition protocol completed on the Biodex Isokinetic Dynamometer

Table 7.
Raw values calculated during the flexion component of the 50-repetition protocol completed on the Biodex Isokinetic Dynamometer