Hematopathology / STANDARDIZATION OF RETICULOCYTE VALUES

Standardization of Reticulocyte Values in an Antidoping Context

Michael J. Ashenden, PhD,1 Ken Sharpe, PhD,2 Rasmus Damsgaard, MD,3 and Lisa Jarvis, PhD4

Article in American Journal of Clinical Pathology, July 2004. Source: PubMed.

Key Words: Recombinant human erythropoietin; Reticulocytes; Athletes; Blood tests; Doping; Erythropoiesis

DOI: 10.1309/1FAM1VT3N76GJGXV

Abstract

The lack of standardization of reticulocyte results hinders the ability of sports authorities to recognize the telltale fluctuations over time that are typical for athletes using illegal blood doping to improve their performance. Therefore, the aim of the present study was to devise a tenable approach for antidoping authorities to quantify instrument bias. We evaluated reticulocyte data derived during a 42-week period from 210 hospital patient blood samples measured in duplicate simultaneously on up to 11 hematology analyzers located in a single laboratory. We found that square root transformation of reticulocyte values enabled quantification of interinstrument bias by using the mean reticulocyte value of a cohort of approximately 54 subjects as a de facto calibration agent. We also demonstrated that measurement precision associated with low reticulocyte values was not inferior to that associated with higher values.
Am J Clin Pathol 2004;121:816-825
DOI: 10.1309/1FAM1VT3N76GJGXV

An independent examination commissioned by the World Anti-Doping Agency, Montreal, Canada, recommended that blood testing be used in conjunction with urine testing to deter the use of recombinant human erythropoietin (rHuEPO) in sports.1 In addition to the cost saving offered by the use of blood as a screening tool for subsequent urinalysis, the rationale for this recommendation included the capacity to target athletes for follow-up testing on the basis of “suspicious” blood profiles. For several weeks after ceasing use of rHuEPO, athletes will have elevated hemoglobin levels in tandem with suppressed reticulocyte production.2 When hemoglobin (g/L) and reticulocyte (%) values are substituted into an algorithm developed to detect athletes who have stopped using rHuEPO, an elevated OFF-hr model score (OFF-hr score = hemoglobin value – 60√reticulocyte value) can discriminate athletes who recently have stopped using rHuEPO from nonusers.3

However, the benefits associated with the enhanced “reach back” offered by scrutinizing hematologic parameters are tempered by concerns about the standardization of reticulocyte results, for which no universally accepted reference method exists.4 Issues pertinent to antidoping efforts include interchangeability of results derived from different platforms, the absence of a universal calibration material, and the precision of reticulocyte assays at low reticulocyte counts. Low reticulocyte counts are a hallmark of previous rHuEPO use, so that imprecision at such levels will dilute the ability of authorities to recognize signs of blood manipulation.
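The OFF-hr score defined above can be sketched in a few lines. This is a minimal illustration rather than code from the study: the function name `off_hr_score` and the example values are hypothetical, and the sketch assumes hemoglobin is supplied in g/L and reticulocytes as a percentage, as stated in the text.

```python
import math


def off_hr_score(hemoglobin_g_per_l, reticulocyte_percent):
    """OFF-hr score = hemoglobin (g/L) - 60 * sqrt(reticulocytes (%))."""
    if reticulocyte_percent < 0:
        raise ValueError("reticulocyte percentage cannot be negative")
    return hemoglobin_g_per_l - 60.0 * math.sqrt(reticulocyte_percent)


# Hypothetical values: the same hemoglobin with a suppressed reticulocyte
# percentage yields a higher score.
typical = off_hr_score(150.0, 1.0)      # 150 - 60 * 1.0
suppressed = off_hr_score(150.0, 0.25)  # 150 - 60 * 0.5
```

Because the reticulocyte term enters with a negative sign, a lower reticulocyte percentage raises the score, which is why suppressed reticulocyte production after rHuEPO use pushes the score upward.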
© American Society for Clinical Pathology

Thresholds for OFF-hr model scores have been published that enable practitioners to recognize the probability associated with unusual deviations from expected scores.3 However, these thresholds were derived using data collected only on the ADVIA 120 platform (Bayer, Tarrytown, NY), so that if reticulocyte values derived from other platforms are substituted into the equation, intermethod bias might compromise the integrity of this approach.

The absence of a reticulocyte calibration material that can be used by all methods prevents technicians from quantifying intermethod bias. Fresh blood is the ideal material as a calibrator for reticulocyte counts4; however, the poor stability of reticulocytes that undergo maturation and manufacturer-dependent calibration procedures that cannot be modified easily by the user4 confound the use of a whole-blood calibration material. Even if universal calibration material were available, it is unlikely that the precision reported for stabilized cells would replicate the precision of the instrument when measuring whole blood, because stabilizing the reticulocytes to confer a useful shelf life inevitably would alter cellular characteristics. This has been shown to influence the different dye-detection-algorithm scenarios used by the manufacturers of reticulocyte platforms in an unpredictable manner (L.J., unpublished observations).

The aims of the present study were as follows: (1) evaluate the characteristics of multiple reticulocyte platforms to assess whether values derived using different platforms conceivably could be substituted into the OFF-hr model without compromising the integrity of this approach; (2) explore the precision of low vs high reticulocyte values; and (3) develop an approach to adjust for bias in results from a specific instrument before its utilization in an antidoping setting.
Materials and Methods

Parallel Evaluation of 11 Reticulocyte Analyzers

We obtained blood samples from patients in a Minneapolis, MN, hospital. Five patients were selected randomly each week (unspecified sex), yielding a total of 210 samples during a 42-week period. Samples were obtained by venipuncture using K3EDTA tubes, and data were derived from analysis of these samples on 11 reticulocyte analyzers that were maintained at R&D Systems, Minneapolis (Table 1). With a few exceptions, each sample was measured in duplicate on each instrument. Owing to sample volume limitations, it was not possible to perform these measures on each sample using all instruments (on average, each sample was measured in duplicate on 8.8 instruments).

Reticulocyte percentages or square root–transformed reticulocyte percentages are reported for all analyses as indicated. Instruments were compared according to the method of Bland and Altman,5 in which the difference between (the average of duplicates from) 2 machines is plotted against the average of the 2 methods for each data point. We used the square root of reticulocyte percentages for our comparisons. The precision of each instrument is ascertained from the between-duplicate SD (of square root percentages), given as

$$\sqrt{\frac{\sum_{i=1}^{n} (x_{i1} - x_{i2})^2}{2n}}$$

where $x_{i1}$ and $x_{i2}$ are duplicate readings on the ith sample and n is the number of samples analyzed on a given instrument.

In addition to the aforementioned evaluations, we also examined different sources of variation for each instrument. Because 5 blood samples were analyzed on any given instrument at a single time point and this process was repeated multiple times throughout the 42-week period, it was possible to quantify 3 components of variation: variation between duplicates, variation between samples within each of the 42 weeks, and variation between weeks. By comparing the between- and within-weeks variability, it is possible to test the stability of the instrument throughout the year.
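The between-duplicate SD and the Bland and Altman comparison described above can be sketched as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, and the sketch assumes that each reading is square root transformed before duplicates are averaged (the text does not specify the order of these two steps).

```python
import math
from statistics import mean, stdev


def between_duplicate_sd(duplicates):
    """Between-duplicate SD on the square root scale.

    duplicates: list of (x1, x2) reticulocyte percentages, one pair per sample.
    Implements sqrt( sum((x_i1 - x_i2)^2) / (2n) ) on square root values.
    """
    n = len(duplicates)
    squared_diffs = sum(
        (math.sqrt(x1) - math.sqrt(x2)) ** 2 for x1, x2 in duplicates
    )
    return math.sqrt(squared_diffs / (2 * n))


def bland_altman(machine_a, machine_b):
    """Bland-Altman summary for 2 machines measuring the same samples.

    Each argument is a list of duplicate pairs (percentages) in the same
    sample order. Returns (averages, differences, mean_diff, sd_diff).
    """
    # Assumption: transform each reading, then average the duplicates.
    a = [mean(math.sqrt(x) for x in pair) for pair in machine_a]
    b = [mean(math.sqrt(x) for x in pair) for pair in machine_b]
    differences = [x - y for x, y in zip(a, b)]
    averages = [(x + y) / 2 for x, y in zip(a, b)]
    return averages, differences, mean(differences), stdev(differences)
```

Plotting `differences` against `averages` gives the Bland and Altman plot; `mean_diff` corresponds to the intermethod bias and `sd_diff` to the SD of the differences used to judge agreement.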
If an instrument is not stable, the variability between weeks will be greater than what can be accounted for by the variation between samples, within weeks.

Table 1. Instruments Used in the Comparison and the Approach Used for Staining of Reticulocytes*

Instrument No.  Make/Model                              Approach  Stain               Detection
1               Bayer ADVIA 120                         SV        Oxazine 750         Absorbance
2               Abbott CellDyn CD3200                   SV        New methylene blue  Light scatter
3               Abbott CellDyn CD3500                   SV†       New methylene blue  Light scatter
4               Abbott CellDyn CD3700                   SV†       New methylene blue  Light scatter
5               Abbott CellDyn CD4000                   F         CD4K 530            Fluorescence
6               Becton Dickinson ReticCount FACSCount   F†        Thiazole orange     Fluorescence
7               Bayer H*3                               SV†       Oxazine 750         Absorbance
8               Beckman Coulter STKS                    SV†       New methylene blue  VCS
9               Sysmex XE2100                           F         Stromatolyser-NR    Fluorescence
10              ABX Pentra 120 Retic                    F         Thiazole orange     Fluorescence
11              Beckman Coulter GENS                    SV        New methylene blue  VCS

F, fluorescence; SV, supravital; VCS, volume, conductivity, and light scatter.
* Manufacturer locations are as follows: Bayer, Tarrytown, NY; Abbott, Abbott Park, IL; Becton Dickinson, Franklin Lakes, NJ; Sysmex, Kobe, Japan; ABX, Montpellier, France; Beckman Coulter, Miami, FL.
† Depicts semiautomated or manual method.

Precision of Reticulocyte Values

To test the hypothesis that the precision of reticulocyte measurement increases with the reticulocyte level, we first determined the median value of each of the 210 blood samples across the 11 instruments. We then designated each blood sample as having a low (n = 66; range, <1.3%), medium (n = 77; range, 1.3%-2.0%), or high (n = 67; range, >2.0%) reticulocyte percentage. Because SD is the most useful measure of precision, we calculated the within-sample SD (ie, between duplicate readings) for each sample across all instruments.
This analysis was performed on the raw data set and on the data after they had been transformed using log and square root transformations.

Quantifying Instrument Bias

Based on the research used to derive the OFF-hr model, when an ADVIA 120 machine is used to measure reticulocytes in a cohort of male endurance athletes residing at sea level, the (square root–transformed) reticulocyte mean will be 1.12 (SD, 0.165, based on n = 192 elite male endurance athletes6). However, because it is highly improbable that any instrument will be identical to the “average” ADVIA 120 used to derive these values (data were collected using 13 ADVIA 120 machines), the reported sample mean will vary with the bias of the particular instrument used, relative to the expected mean of 1.12. Therefore, comparing the reported mean value derived from the sample group of athletes with the value 1.12 enables the technician to quantify instrument bias. The precision of the estimate of the sample mean will be influenced by the number of athletes tested and by the SD for the sampled population. We define a group of statistically sufficient size comprising elite male endurance athletes tested at sea level as a “samplator.”

An important caveat is the requirement that the SD of the samplator be comparable with the SD of the 192 elite male endurance athletes from which the expected values were derived. This is crucial because a SD beyond the expected 95% interval for a given sample size would reduce confidence that the samplator was comparable to the standard, disease- and drug-free population used in the original study. We recommend a preliminary empiric evaluation of a potential samplator data set, eg, via a dot plot, so that the technician can be satisfied the general distribution of square root reticulocytes is typical (ie, not bimodal or including unusual outliers). Having satisfied this preliminary screen, descriptive statistics should be calculated.
There is no plausible reason why a samplator population should have a lower than expected SD for square root reticulocytes; therefore, if the calculated SD is less than the lower limit, the data should be discarded and the process repeated. If, however, the SD of the samplator exceeds the upper limit, outliers should be removed to ascertain whether this brings the samplator SD back within the expected range. Outliers should be removed via the box plot criteria: square root reticulocyte values that are more than 1.5 times the interquartile range beyond the interquartile limits are excluded from the sample.7 If this does not reduce the SD to within the acceptable range, the data should be discarded and the process repeated. Inability to obtain a samplator with a SD within the expected range points to a possible fundamental anomaly in the instrument.

To demonstrate the adherence of actual data to this theory, data collected from competitors by the International Ski Federation during the 2001-2002 World Cup season were evaluated. Samples from all athletes were tested on a single machine (Sysmex R-500, Sysmex, Kobe, Japan); however, in many cases, fewer than 50 athletes were tested on a single day. Therefore, to provide meaningful sample sizes, when necessary, data obtained during consecutive days from athletes at the same competition venue were combined to give sample sizes near 50 (2 events with 73 and 103 samples were not altered). Means and SDs of square root–transformed reticulocyte percentages were compared with the expected mean ± SD value of 1.12 ± 0.165 derived from 192 elite male endurance athletes measured at sea level on the ADVIA 120 platform.6 We calculated the impact on OFF-hr model scores by multiplying the instrument bias by –60 (because OFF-hr = hemoglobin value – 60√reticulocyte value, a lower reported mean would increase notional OFF-hr model scores).
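The samplator checks described above (SD within an expected 95% interval, box plot outlier removal, bias estimation, and the resulting shift in OFF-hr score) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the text does not say how the 95% interval for the SD was computed, so the sketch assumes the usual chi-square sampling distribution of a variance under normality, approximated with the Wilson-Hilferty formula so that only the standard library is needed. Quartiles use the default method of `statistics.quantiles`, which may differ slightly from other box plot conventions. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, quantiles

EXPECTED_MEAN = 1.12   # square root scale, ADVIA 120, sea level (from the text)
EXPECTED_SD = 0.165    # from the 192 elite male endurance athletes


def sd_interval(n, sd=EXPECTED_SD, level=0.95):
    """Approximate expected interval for a sample SD of size n.

    Assumption: (n - 1) * s^2 / sd^2 follows a chi-square distribution;
    quantiles use the Wilson-Hilferty approximation.
    """
    k = n - 1
    z = NormalDist().inv_cdf(0.5 + level / 2)
    c = 2.0 / (9.0 * k)
    chi2_low = k * (1 - c - z * math.sqrt(c)) ** 3
    chi2_high = k * (1 - c + z * math.sqrt(c)) ** 3
    return sd * math.sqrt(chi2_low / k), sd * math.sqrt(chi2_high / k)


def remove_boxplot_outliers(values):
    """Drop values more than 1.5 x IQR beyond the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if q1 - 1.5 * iqr <= v <= q3 + 1.5 * iqr]


def instrument_bias(sqrt_reticulocytes):
    """Reported samplator mean minus the expected mean of 1.12."""
    return mean(sqrt_reticulocytes) - EXPECTED_MEAN


def delta_off_hr(bias):
    """Shift in OFF-hr score caused by a bias on the square root scale."""
    return -60.0 * bias
```

Under these assumptions, `sd_interval(54)` gives limits that round to 0.134 and 0.196, and `delta_off_hr(-0.080)` gives a shift of about 4.8 units, so a mean that reads 0.08 below the expected value inflates OFF-hr scores by roughly 5 units.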
Results

Comparison Between Instruments

The Bland and Altman analysis, plotting the difference between samples analyzed on 2 platforms against the average value of the respective methods, showed that the ADVIA 120 platform typically reported a higher reticulocyte value than the candidate instruments we evaluated, whether the candidate instrument relied on fluorescence (Figure 1) or nonfluorescence (Figure 2) to detect reticulocytes. The single exception to this tendency was the FACSCount platform, which read higher than the ADVIA 120 in our hands. Candidate instrument–ADVIA 120 agreement, as reflected by the SD of the differences between machines, demonstrated that the CD3200, CD4000, Sysmex XE2100, and FACSCount showed most consistent agreement with the ADVIA 120 machine in our hands.

With regard to candidate instrument–candidate instrument comparisons, the CD4000–Sysmex XE2100–FACSCount triumvirate showed notable intermethod agreement, and results derived from within this subgroup had the smallest SDs of any comparisons (Figure 1). Furthermore, whereas the FACSCount gave higher reticulocyte values on average than the CD4000 or Sysmex XE2100, the latter 2 machines showed virtually no intermethod bias.

Figure 1. Comparison of estimates of (square root of) reticulocytes in ~100 samples for instruments incorporating fluorescent reticulocyte enumeration (except the ADVIA 120, which uses absorbance). The plots are according to the Bland and Altman method,5 including mean difference (broken line) and standard deviation of the differences (upper left corner of graph) for each separate comparison. For proprietary information, see the text. Panels: ADVIA v CD4000; ADVIA v FACS; CD4000 v FACS; ADVIA v Sysmex 2100; CD4000 v Sysmex 2100; FACS v Sysmex 2100; ADVIA v ABX Pentra 120; CD4000 v ABX Pentra 120; FACS v ABX Pentra 120; Sysmex 2100 v ABX Pentra 120. x-axis: Average of the 2 Methods (Square Root of Percentage); y-axis: difference between the 2 instruments (eg, ADVIA 120 Minus CD4000). Panel SDs of the differences range from 0.098 to 0.261.

Precision of Different Instruments

Tests for between-week variation in excess of that accounted for by within-week variation gave statistically significant results for the ADVIA 120 and ABX Pentra 120 instruments. Time series plots (not shown) confirmed that the ADVIA 120 was noticeably aberrant during weeks 34 to 39, and the laboratory confirmed that this instrument was recalibrated around week 40. Primarily because the ADVIA 120 is the instrument of central interest, data from it for weeks 34 to 39 were omitted from all subsequent analyses. For the ABX Pentra 120, the problem was much greater, with erratic behavior observed throughout the study period.

The between-duplicates SD (of square root reticulocyte percentages) was considerably larger for the ABX Pentra 120 (SD = 0.081) than for most of the other instruments, for which values ranged from 0.028 (CD4000) to 0.050 (ADVIA 120). The 2 exceptions were the Bayer H*3 (SD = 0.057) and the Beckman Coulter STKS (SD = 0.076).
Values for the remaining instruments were as follows: 0.033 (Sysmex XE2100), 0.036 (CD3500), 0.041 (FACSCount), 0.044 (CD3700), 0.046 (CD3200), and 0.048 (Beckman Coulter GENS).

Figure 2. Comparison of estimates of (square root of) reticulocytes in ~100 samples for instruments incorporating nonfluorescent reticulocyte enumeration (the ADVIA 120 utilizes absorbance). The plots are according to the Bland and Altman method,5 including mean difference (broken line) and standard deviation of the differences (upper left corner of graph) for each separate comparison. For proprietary information, see the text. Panels: ADVIA v CD3200; ADVIA v H*3; CD3200 v H*3; ADVIA v STKS; CD3200 v STKS; H*3 v STKS; ADVIA v GEN-S; CD3200 v GEN-S; H*3 v GEN-S; STKS v GEN-S. x-axis: Average of the 2 Methods (Square Root of Percentage); y-axis: difference between the 2 instruments (eg, ADVIA 120 Minus CD3200). Panel SDs of the differences range from 0.145 to 0.258.

Precision of Reticulocyte Values at Low, Medium, and High Reticulocyte Percentages

In Table 2, the SDs of duplicate readings for the same blood sample are reported using 3 approaches to present reticulocyte data. The average values across all 11 instruments demonstrated that the SDs of log-transformed reticulocyte percentages decreased from low to medium to high values (Table 2). This tendency was uniform, with few exceptions (1 value for instruments 2, 4, 8, and 10 did not adhere to this trend). The trend was reversed for raw data, in which the SD between duplicates for high values was about double that of low values. SDs of square root–transformed values were consistent across the 3 levels. For illustrative purposes, coefficients of variation (CVs) were calculated using raw reticulocyte percentages, yielding values of 9.1%, 7.6%, and 5.9% for low, medium, and high values, respectively.

Samplator Data

When results obtained from groups of athletes tested at a moderate altitude (~1,600 m) were compared with values derived from groups tested at sea level, the mean value of square root–transformed reticulocyte percentages for the altitude groups was, on average, 0.05 units higher; P = .017. The SDs of 2 cohorts measured at altitude and 1 cohort measured at sea level were found to lie outside the expected 95% interval for the respective sample sizes (Figure 3A). Only 1 of these groups (n = 103) was found to contain outliers, and, in this case, the adjustment reduced the SD to within the expected range. For a sample of 54 subjects, the lower and upper limits of the 95% interval are 0.134 and 0.196, respectively.

Table 3 shows the difference between the reported mean value for square root–transformed reticulocyte percentages and the expected value of 1.12 (the gap between the data points and the solid line, Figure 3B).
This difference is an estimate of the bias associated with this particular instrument and would be the recommended adjustment to allow for candidate instrument–ADVIA 120 bias had any one of these groups been used as the samplator. Depending on which group was used, results from the Sysmex R-500 instrument would have influenced OFF-hr model scores by –1.7 to 4.8 units (Table 3).

Table 2. Within-Sample SD (Between Duplicate Readings) for Samples Classified as Having Low, Medium, or High Reticulocyte Percentages*

                 Log                     Raw                     Square Root
Instrument No.   Low    Medium  High     Low    Medium  High     Low    Medium  High
1                0.077  0.076   0.062    0.096  0.133   0.195    0.042  0.049   0.054
2                0.114  0.057   0.061    0.099  0.094   0.174    0.050  0.036   0.051
3                0.060  0.059   0.048    0.053  0.096   0.141    0.027  0.037   0.040
4                0.092  0.051   0.067    0.082  0.083   0.192    0.042  0.032   0.056
5                0.053  0.047   0.034    0.059  0.080   0.104    0.027  0.030   0.029
6                0.074  0.047   0.043    0.138  0.109   0.149    0.049  0.035   0.039
7                0.154  0.106   0.099    0.088  0.133   0.167    0.054  0.057   0.061
8                0.120  0.121   0.121    0.156  0.197   0.285    0.062  0.074   0.091
9                0.062  0.056   0.042    0.064  0.084   0.117    0.031  0.034   0.035
10               0.125  0.165   0.103    0.132  0.243   0.214    0.063  0.097   0.073
11               0.150  0.096   0.044    0.082  0.125   0.127    0.050  0.054   0.037
Average SD       0.098  0.080   0.066    0.095  0.125   0.170    0.045  0.049   0.051

* Low, <1.3%, mean, 1.08%; medium, 1.3%-2.0%, mean, 1.66%; high, >2%, mean, 2.97%; raw, SD of the values as percentages; log and square root, SD when a log or square root transformation is taken of raw readings before calculating SD, respectively. Instrument numbers correspond with the instrument numbers in Table 1.

Discussion

Measurement of Reticulocyte Counts

Reticulocytes are transitional RBCs, between nucleated RBCs and mature RBCs, which contain some stainable remnant messenger RNA.
With the exception of the FACSCount flow cytometer, which separates reticulocytes from mature RBCs using a “gate,” all analyzers evaluated in the present study use a cluster analysis technique to identify, classify, and separate RBCs from reticulocytes. Separation is complicated because messenger RNA represents a transient quantity that degrades within about 3 days after the reticulocyte is released into circulation from the bone marrow. Therefore, there is a continuum between reticulocytes and mature cells rather than a separate population of each. What is classified as a reticulocyte is somewhat arbitrary, and the algorithms used by manufacturers to classify reticulocytes typically are proprietary information and unavailable for general scrutiny. What is clear is that each algorithm uses a somewhat different approach to establish the cutoff point between reticulocytes and other blood cells.

Table 3. Sample SD and Mean for Square Root–Transformed Reticulocyte Percentages of Groups of Elite Male Cross-Country Skiers Tested During the 2001-2002 International Ski Federation World Cup Season Using a Sysmex R-500*

No. of Samples       Collection Date   SD     Mean   Mean – Expected  ∆OFF-hr Model Score
Sea level
48                   November 23       0.158  1.084  –0.036           2.2
61                   November 24       0.168  1.060  –0.060           3.6
43                   November 25       0.124  1.046  –0.074           4.4
39                   November 26-27    0.154  1.046  –0.074           4.4
34                   March 9-10        0.127  1.096  –0.024           1.4
32                   March 15-16       0.179  1.040  –0.080           4.8
Altitude (~1,600 m)
103                  December 14       0.191  1.149  +0.029           –1.7
(100)†               December 14       0.169  1.142  +0.022           –1.3
48                   December 15-16    0.208  1.132  +0.012           –0.7
73                   January 4         0.182  1.116  –0.004           0.2
52                   January 5-6       0.170  1.072  –0.048           2.9
52                   January 8         0.171  1.095  –0.025           1.5

Mean – expected, difference between the reported mean and 1.12 (the estimated mean for elite male endurance athletes tested at sea level on an ADVIA 120 platform); ∆OFF-hr model score (hemoglobin value – 60√reticulocyte value), difference of the Sysmex-derived score and the value expected had the sample been measured on an ADVIA 120 platform.
* Sysmex R-500, Sysmex, Kobe, Japan; ADVIA 120, Bayer, Tarrytown, NY.
† Values derived after 3 outliers are removed.

Figure 3. Sample standard deviations (A) and means (B) of square-root reticulocytes for cohorts of elite male cross-country skiers tested during the 2001-2002 International Ski Federation World Cup season. Solid lines depict the respective values (SD, 0.165; mean, 1.12) derived from a sample of 192 elite male endurance athletes tested at sea level (circles) on the ADVIA platform.6 Broken lines depict the upper and lower limits of the expected 95% interval for different sample sizes. Triangles depict groupings of athletes tested at altitude (~1,600 m above sea level); encircled data points represent the group (n = 103) before and after removal of outliers. x-axis: Sample Size; y-axis: Sample Standard Deviation (A) and Sample Mean (B), each as Square Root of Percent Reticulocytes.

As a preliminary step to ascertain whether it is feasible to interchange reticulocyte results from different platforms, it is insightful to evaluate the general agreement between different methods. In our hands, the intermethod comparison plots revealed that the CellDyn 3200, CellDyn 4000, FACSCount, and Sysmex XE2100 demonstrated sufficiently consistent results with the ADVIA 120 to recommend them as being candidate platforms with results that potentially could be interchanged with ADVIA 120 data. The criteria applied during this subjective evaluation were primarily the uniformity (depicted by the low SD of differences) and the absence of large disparities between samples measured simultaneously on the candidate instrument and an ADVIA 120. However, a prescient secondary consideration was the consistency of the intermethod agreement among the candidate instruments themselves. As depicted in Figure 1, the grouping of intermethod comparison values among the CellDyn 4000, FACSCount, and Sysmex XE2100 platforms was notably tighter than any other candidate instrument–candidate instrument comparison. Furthermore, each of these platforms uses fluorescent stains, which are known to demonstrate enhanced sensitivity in the identification of low reticulocyte numbers compared with supravital stains.8 This further predisposes their use in an antidoping setting in which low reticulocyte values are of primary interest.

Adjusting for Instrument Bias

It has been assumed that the manufacturing process and standardization of how each instrument model enumerates reticulocytes ensures that non–ADVIA 120/ADVIA 120 agreement is relatively reproducible across all machines of a given model (as distinct from bias due to errors in calibration).
Having established that a particular instrument model shows sufficient generic agreement with the ADVIA 120 platform to be considered a potential substitute, it is necessary to adjust for instrument-specific bias that might arise from errors in calibration (OFF-hr model thresholds were set based on the significance of deviations from the population mean value; therefore, instrument bias has the potential to invalidate the rationale for these thresholds). Several options exist, including development of regression equations and adjustments based on means and SDs of samples measured on both platforms. However, adjustment using such generic approaches relies on the presumption that the randomly selected candidate machine is calibrated identically to the “sister” instrument on which the adjustment factor was derived. Because reticulocyte calibration typically is factory set and problematic for an independent technician to verify, this assumption seems contentious.

By using the samplator concept, bias between the specific machine and the ADVIA 120 platform can be established, provided that a sufficiently large number of subjects are tested to be confident that the resulting sample mean will closely approximate the mean of the population from which the sample was drawn. Because a sample population of male endurance athletes is readily available to antidoping authorities (virtually all blood testing is carried out on athletes competing in endurance events), this approach provides a convenient and objective means of adjusting for interinstrument bias, despite the absence of a universal calibration material. A paper calibration of this difference will enable the (adjusted) value to be substituted into the OFF-hr model without compromising the model’s integrity.

Statistical theory shows that the SE of the sample mean is SD/√n, where n is the sample size and SD is the population SD.
Because it is not tenable to adjust reticulocyte percentages by less than 0.1 (values generally are reported to only 1 decimal place), and a 0.1 difference on the percentage scale translates into a difference of about 0.045 on the square root scale (for values around the expected mean value of 1.12), it would not be sensible to estimate the samplator mean to a greater accuracy than 0.045. Because 95% confidence intervals are given by the mean ± twice the SE, a samplator of 54 athletes with the expected SD of 0.165 represents the upper limit required for a precise estimate of the population mean, since (0.165/√54) × 2 = 0.045. Therefore, if 54 (or thereabouts) male endurance athletes were tested at sea level using a candidate machine, any difference between the group mean reported by the candidate machine and the expected mean of 1.12 (ie, the expected value if the athletes’ samples had been measured on an average ADVIA 120) represents the candidate instrument–ADVIA 120 bias. A retrospective evaluation of data collected by the International Ski Federation under field conditions (the analyzer typically was relocated to each competition venue) revealed that samples of around 50 male endurance athletes yielded SDs that were comparable with the expected SD of 0.165. However, the reported mean value tended to be higher when blood samples were analyzed in venues approximately 1,600 m above sea level, which we speculate is attributable to the physiologic effect of altitude on reticulocyte production (the 0.05 difference was comparable to the 0.08 difference found for altitudes of 1,730-2,220 m).6 This underlines the importance for the instrument bias to be established using blood samples collected from athletes at sea level, to ensure that the reported mean is directly comparable with the ADVIA 120–derived value of 1.12 appropriate for male endurance athletes at sea level. 
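The arithmetic behind the figure of 54 athletes can be retraced with a short sketch. It is illustrative only: the function names are hypothetical, the conversion from the percentage scale to the square root scale uses the first-order approximation d(√x) ≈ dx/(2√x), and the multiplier of 2 for the 95% confidence interval follows the text.

```python
import math


def sqrt_scale_step(percent_step, sqrt_mean):
    """Approximate change in sqrt(x) caused by a small change in x.

    First-order approximation: d(sqrt(x)) = dx / (2 * sqrt(x)).
    """
    return percent_step / (2.0 * sqrt_mean)


def ci_half_width(sd, n, multiplier=2.0):
    """Half-width of the confidence interval for a mean: multiplier * SD / sqrt(n)."""
    return multiplier * sd / math.sqrt(n)


def required_sample_size(sd, half_width, multiplier=2.0):
    """Smallest n whose confidence interval half-width does not exceed half_width."""
    return math.ceil((multiplier * sd / half_width) ** 2)


step = sqrt_scale_step(0.1, 1.12)         # about 0.045 on the square root scale
n_needed = required_sample_size(0.165, 0.045)
achieved = ci_half_width(0.165, n_needed)
```

With the expected SD of 0.165 and a target half-width of 0.045, the smallest adequate sample size works out to 54, consistent with the text.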
Depending on which sea level data were chosen, the instrument-specific bias as indicated by the samplator approach would have yielded a bias between 0.02 and 0.08 (on the square root scale) for the Sysmex R-500, which would have resulted in OFF-hr scores being altered by 1 and 5 units, respectively. To put this in context, a male endurance athlete will have an OFF-hr model score of approximately 80, with 1 in 10 athletes exceeding a score of 104.6 and 1 in 1,000 athletes exceeding a score of 125.6.3 It is noteworthy that a change in the OFF-hr model score of 1 to 5 units also could be the result of a bias of 0.5 g/dL or less (≤5 g/L) in the hemoglobin assay. According to the most recent College of American Pathologists survey results,9 the SD of hemoglobin measurements on any given sample typically is between 0.1 and 0.3 g/dL (1.0 and 3.0 g/L). This would give a 2 × SD range of ± 0.2 to 0.6 g/dL (2-6 g/L). When contrasted with these values, our results suggest that were the samplator concept used to quantify instrument bias, the magnitude of any subsequent error contributed by the reticulocyte component of the OFF-hr model would be comparable to that associated with the measurement of the hemoglobin concentration. Convenience dictates that the collection of blood samples from sufficient numbers of subjects to constitute a reasonable samplator be undertaken infrequently. To correctly emulate the approach described herein, blood samples should be obtained and analyzed at a single, sea level location, ideally on the same day (or on 2 consecutive days). It is acceptable to use sample sizes in excess of 54, and technicians must not subjectively eliminate data points or select subsets of available data in an effort to achieve favorable results. We propose that the samplator be run a minimum of once yearly on candidate machines, while simultaneously obtaining quality control values from instrument-specific controls as the samplator is run. 
This enables the technician to use the samplator to establish candidate instrument–ADVIA 120 bias, while in parallel using instrument-specific reticulocyte controls to monitor instrument drift over time. An additional samplator would need to be run only if values drifted beyond a designated range. For heightened control sensitivity, each laboratory should establish its own mean and acceptable range instead of using the generic reference range provided by the control lot manufacturer (which typically has limits that are appropriate in a clinical setting but not for the heightened sensitivity desirable in an antidoping scenario).

Interpretation and Precision of Reticulocyte Data

Whether the reticulocyte count is reported as a percentage of all RBCs (percent reticulocytes) or as a number per volume of blood (× 10³/µL [× 10⁹/L]), the distribution of values in the general population is skewed, with an extended tail reflecting people with high reticulocyte values (associated with numerous causes, including environmental, genetic, and/or pathologic circumstances). Previous empiric evaluation of data sets has demonstrated that a square root transformation of the reticulocyte percentages leads to (almost) constant variance and values that are close to being normally distributed,3 and this was borne out by our present results in which SD values of duplicates were consistent regardless of the reticulocyte percentage only if square roots were analyzed. However, an appraisal of the counting technique used by flow cytometers yields an additional rationale for the appropriateness of the square root transformation.
Assuming an accurate count of the number of reticulocytes among a fixed number of RBCs (eg, 35,000 cells, which is approximately the number of cells counted by contemporary hematology analyzers), the count from sample to sample is likely to have a distribution that is close to binomial. When there is a large sample (35,000 is very large) and a small probability that any 1 cell is a reticulocyte (up to 3% or 4% of a population is small in this context), the binomial distribution is closely approximated by a Poisson distribution. For a Poisson distribution, the mean is equal to the variance, and statistical theory shows that taking square roots will lead to (almost) constant variance (which implies constant SD) and values that are close to being normally distributed.

The legitimacy of square root transformations for percentage reticulocyte data has implications on several levels. First, it demonstrates unequivocally that in an antidoping setting, in which authorities seek to recognize a value that deviates substantially from an expected normal value, optimal resolution is provided when the units evaluated are the square root of the percentage of reticulocytes. While initially this novel approach might entail some mental gymnastics for practitioners accustomed to evaluating reticulocytes as an absolute count or a percentage, it demands no greater dexterity than required to alternate between absolute counts and percentage values, and once the data have been transformed, the interpretation of results is straightforward.
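As a hedged numerical check of the Poisson argument above, the following sketch computes the SD of reticulocyte counts before and after a square root transformation. The figure of 35,000 cells comes from the text; the function name, the chosen reticulocyte percentages, and the truncation of the sum at 12 SDs are illustrative assumptions. Only counting error is modelled, so biological and instrument variation are ignored.

```python
import math

N_CELLS = 35_000  # approximate number of RBCs counted per sample (from the text)


def sd_of_sqrt_poisson(lam: float) -> float:
    """SD of sqrt(X) for X ~ Poisson(lam), summing the pmf over mean +/- 12 SD."""
    spread = 12 * math.sqrt(lam)
    lo = max(0, int(lam - spread))
    hi = int(lam + spread) + 1
    mean_sqrt = 0.0
    mean_count = 0.0
    for k in range(lo, hi + 1):
        # Poisson pmf evaluated in log space to avoid overflow.
        p = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        mean_sqrt += p * math.sqrt(k)
        mean_count += p * k  # E[(sqrt X)^2] equals E[X]
    return math.sqrt(mean_count - mean_sqrt ** 2)


if __name__ == "__main__":
    scale = math.sqrt(100 / N_CELLS)  # converts sqrt(count) to sqrt(percent)
    for pct in (0.5, 1.0, 2.0, 4.0):  # illustrative reticulocyte percentages
        lam = N_CELLS * pct / 100                  # expected reticulocyte count
        sd_pct = 100 * math.sqrt(lam) / N_CELLS    # Poisson SD on the percent scale
        cv = 100 * sd_pct / pct                    # CV (%) on the percent scale
        sd_sqrt = scale * sd_of_sqrt_poisson(lam)  # SD on the sqrt(percent) scale
        print(f"{pct:4.1f}%  SD={sd_pct:.4f}  CV={cv:.1f}%  SD(sqrt)={sd_sqrt:.4f}")
```

In this idealized model the SD on the percent scale rises with the reticulocyte percentage and the CV falls, whereas the SD on the square root scale stays close to 0.5 × √(100/35,000), about 0.027, at every level.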
Second, our results refute the belief that “the measurement precision associated with low reticulocyte counts is inferior to the precision associated with higher reticulocyte counts.” Although precision typically is reported as a CV, a more useful measure of precision is the SD. Our results demonstrated that SD is reasonably constant between low and high reticulocyte values after they have been square root transformed, revealing that the precision of low reticulocyte values is not inferior to the precision of high values (the generality of this observation was demonstrated by the consistency of between-duplicates SD, even across different platforms). The promulgation of reduced precision at low reticulocyte levels is based on the universally reported finding that CVs increase as reticulocyte values decrease. CVs are most useful for data for which a log transformation is appropriate, because under these circumstances, the CV is expected to be constant regardless of the reticulocyte percentage.10 However, if, as shown by our data, a log transformation is not appropriate for reticulocyte percentages, then the CV also is inappropriate. The reason that lower CVs (and a correspondingly false indication of precision) typically are reported for higher reticulocyte counts is that, even though the SD tends to increase as the mean increases, division of the larger SD by its larger mean generates a smaller CV for reticulocyte data.

Substituting Non–ADVIA 120 Data Into Blood Models

To establish whether a candidate instrument can be designated as adequate for potential use in the context of deriving valid OFF-hr model scores, we propose that the following criteria be applied on a case-by-case basis (including ADVIA 120 instruments as well) rather than generically across a given instrument model.
First, because lack of access to another ADVIA 120 makes derivation of a Bland-Altman comparison problematic for many technicians, an alternative approach is required to lend credence to the presumption that what the candidate machine recognizes as a reticulocyte is comparable and consistent with what the ADVIA 120s (used in the OFF-hr model derivation) detected as reticulocytes. This criterion can be satisfied by establishing that the SD of between-sample variation of the candidate machine is within the range of 0.134 and 0.196 (see SDsamp; Appendix 1). Second, the precision of the candidate machine should be equal (or superior) to the precision of the ADVIA 120s used in the model derivation. This can be accomplished by running samples (eg, the samplator blood samples) in duplicate and confirming that the between-duplicate SD of the square root–transformed reticulocyte percentages (see SDdup, Appendix 1) is equal to or less than 0.05 (the value for the ADVIA 120s used in model derivation). The imprecision of the ADVIA 120 platform is inherently factored into the thresholds for the OFF-hr model; therefore, provided that the candidate instrument’s precision is within these bounds, any machine-specific variation should not (unduly) influence the integrity of the model.

We occasionally encountered readings that were divergent from the sample’s designated “true” value (ie, the median value for that blood sample measured across multiple instruments). Although such variation already is incorporated into the OFF-hr model thresholds, which were derived using a single measurement of each blood sample, when an athlete’s OFF-hr score exceeds a nominal threshold, it seems prudent to repeat the analysis to ratify the initial value. The average of multiple replicates provides a more precise estimate of the sample’s true value and, therefore, enhances the specificity of the OFF-hr model approach.

Summary

We have identified 3 key findings from this research.
First, results derived from several candidate platforms are sufficiently comparable with the ADVIA 120 method to allow values to be substituted into the OFF-hr model. Second, there exists a robust approach to adjust for the bias of a candidate instrument. Third, measurement precision for low reticulocyte values is not inferior to that for normal levels. These findings pave the way for antidoping authorities to harvest hematologic information from a host of diverse sources. Because the OFF-hr model bestows a reach-back of several weeks to detect previous rHuEPO use, frequent blood testing and the commensurate fear of detection this invokes should be a powerful deterrent against athletes manipulating their blood to seek an illegal performance advantage.

From the ¹Science and Industry Against Blood Doping Research Consortium, Gold Coast, and ²Department of Mathematics and Statistics, the University of Melbourne, Melbourne, Australia; ³Copenhagen Muscle Research Centre, Copenhagen, Denmark; and ⁴R&D Systems, Hematology Research, Minneapolis, MN.

Supported by the World Anti-Doping Agency, Montreal, Canada.

Address reprint requests to Dr Ashenden: Science and Industry Against Blood Doping (SIAB), Gold Coast QLD 4217, Australia.

Acknowledgments: We thank Jennifer Bauer, Hematology QC Manager, and the staff technologists at R&D Systems, and Robin Parisotto for preliminary discussions concerning the manuscript, and we acknowledge the United States Anti-Doping Agency, whose conference precipitated the intellectual discussions leading to this collaboration.

References

1. Peltre G, Holmann H. Validation of the Urine Test for Recombinant Human Erythropoietin. Montreal, Canada: World Anti-Doping Agency; 2003.
2. Parisotto R, Wu M, Ashenden MJ, et al. Detection of recombinant human erythropoietin abuse in athletes utilising markers of altered erythropoiesis. Haematologica. 2001;86:128-137.
3. Gore CJ, Parisotto R, Ashenden MJ, et al. Second-generation blood tests to detect erythropoietin abuse by athletes. Haematologica. 2003;88:333-344.
4. Buttarello M, Bulian P, Farina G, et al. Flow cytometer reticulocyte counting. Am J Clin Pathol. 2001;115:100-111.
5. Bland J, Altman D. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307-310.
6. Sharpe K, Hopkins W, Emslie K, et al. Development of reference ranges in elite athletes for markers of altered erythropoiesis. Haematologica. 2002;87:1248-1257.
7. Velleman PF, Hoaglin DC. Applications, Basics and Computing of Exploratory Data Analysis. Boston, MA: Duxbury Press; 1981.
8. National Committee for Clinical Laboratory Standards. Methods for Reticulocyte Counting (Flow Cytometry and Supravital Dyes); Approved Guideline. H44-A. Wayne, PA: NCCLS; 1997.
9. College of American Pathologists. 2003 Surveys: Participant Summaries for Hematology Automated Differentials FH1-B, FH2-B, FH3-B, FH4-B, FH6-B, FH8-B, FH9-B, FH10-B. Northfield, IL: College of American Pathologists; 2003.
10. Armitage P, Berry G. Statistical Methods in Medical Research. Oxford, England: Blackwell Scientific Publications; 1987.

Appendix 1

Samplator Protocol: Recommended Approach to Quantify and Adjust for Instrument Bias

1. Blood samples to be obtained from at least 54 male endurance athletes, at sea level at 1 location. All samples preferably obtained on the same day, but spread over 2 consecutive days at most. Reticulocyte percentages should be measured in duplicate.

2. Calculate the within-sample (between duplicates) SD using the formula:

$$\mathrm{SD_{dup}} = \sqrt{\frac{\sum_{i=1}^{n} (x_{i1} - x_{i2})^2}{2n}}$$

where $x_{i1}$ and $x_{i2}$ are the square roots of the duplicate reticulocyte percentage readings on the $i$th sample and $n$ is the number of samples. Should the value of SDdup exceed 0.05, the instrument should be checked and the procedure repeated.

3. If SDdup is 0.05 or less, for each sample, calculate $sr_i = (x_{i1} + x_{i2})/2$, the average of the duplicate (square root transformed) reticulocyte values.

4. Calculate the between-samples SD,

$$\mathrm{SD_{samp}} = \sqrt{\frac{\sum_{i=1}^{n} (sr_i - \overline{sr})^2}{n - 1}}$$

where $\overline{sr}$ is the mean of the $sr_i$ values.

5. If SDsamp is within the range 0.134 and 0.196, use the difference between $\overline{sr}$ and 1.12 as the paper correction for all subsequent (square root–transformed) percentage reticulocyte readings. For example, if $\overline{sr}$ = 1.084, then 0.036 (1.12 – 1.084) should be added to all subsequent (transformed) values (which will have the effect of reducing the OFF-hr model score [hemoglobin value – 60√reticulocyte value]), whereas if $\overline{sr}$ = 1.149, then 0.029 should be subtracted from all subsequent (transformed) values.

6. If SDsamp is less than 0.134, the instrument should be checked, and the whole procedure repeated.

7. If SDsamp is more than 0.196, use the box plot criteria (on the $n$ $sr_i$ values) to identify outliers in the sample. If outliers are identified and SDsamp is reduced to lie within the range (0.134-0.196) when the outliers are omitted, use the average of the remaining $sr_i$ values to determine the bias correction, as in 5. If no outliers are found or if the removal of outliers does not result in an acceptable value for SDsamp, the instrument should be checked and the whole procedure repeated.
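Steps 2 through 6 of the samplator protocol in Appendix 1 could be sketched as follows. This is a minimal illustration, not the authors' software: the function names, the dictionary layout, and the status strings are assumptions, and the box plot outlier screen of step 7 is only flagged, not implemented. The thresholds (0.05, 0.134, 0.196), the target mean of 1.12, and the OFF-hr formula come from the text.

```python
import math

TARGET_MEAN = 1.12              # ADVIA 120 mean of sqrt(% reticulocytes), from the text
MAX_SD_DUP = 0.05               # upper limit for the between-duplicate SD (step 2)
SD_SAMP_RANGE = (0.134, 0.196)  # acceptable between-sample SD (step 5)


def samplator(duplicates_pct):
    """Apply steps 2-6 to a list of (pct1, pct2) duplicate reticulocyte readings."""
    n = len(duplicates_pct)
    # All calculations are done on the square root scale.
    x = [(math.sqrt(a), math.sqrt(b)) for a, b in duplicates_pct]

    # Step 2: within-sample (between-duplicate) SD.
    sd_dup = math.sqrt(sum((x1 - x2) ** 2 for x1, x2 in x) / (2 * n))
    result = {"n": n, "sd_dup": sd_dup}
    if sd_dup > MAX_SD_DUP:
        result["status"] = "check_precision"
        return result

    # Step 3: average of each duplicate pair.
    sr = [(x1 + x2) / 2 for x1, x2 in x]

    # Step 4: between-sample SD around the mean of the pair averages.
    sr_mean = sum(sr) / n
    sd_samp = math.sqrt(sum((s - sr_mean) ** 2 for s in sr) / (n - 1))
    result.update(sr_mean=sr_mean, sd_samp=sd_samp)

    low, high = SD_SAMP_RANGE
    if sd_samp < low:
        result["status"] = "check_instrument"   # step 6
    elif sd_samp > high:
        result["status"] = "outlier_review"     # step 7 (not implemented here)
    else:
        result["status"] = "ok"
        result["correction"] = TARGET_MEAN - sr_mean  # step 5
    return result


def off_hr(hb_g_per_l, retic_pct, correction=0.0):
    """OFF-hr score = hemoglobin (g/L) - 60 * (sqrt(% reticulocytes) + correction)."""
    return hb_g_per_l - 60 * (math.sqrt(retic_pct) + correction)
```

With a samplator mean of 1.084, the correction is +0.036, which lowers every subsequent OFF-hr score by 60 × 0.036, roughly 2.2 units, in line with the worked example in step 5.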