Space-Time Clustering of Childhood Leukemia in San Francisco`

[CANCER RESEARCH 30, 1969—1973, July 1970]
Space-Time
Clustering of Childhood Leukemia in San Francisco'
Melville R. Klauber and Piero Mustacchi
University of Utah College ofMedicine, Salt Lake Cyty, Utah 84112, and University ofCalifornia School of
Medicine, San Francisco,California94122
SUMMARY
berland and Durham
clustering.
but found no significant
space-time
Mantel's regression approach was used to test for space-time
clustering for time of diagnosis and address of 149 leukemia
cases in children under the age of 15 years diagnosed in San
Francisco during the 20-year period 1946 to 1965. The cases
were partitioned into consecutive time intervals, and the sum
of the distances between pairs of cases falling within intervals
was compared to its expectation, assuming a random alloca
clustering for a case series which partially overlaps the series
presented in this communication. A deficit of leukemia cases
for all ages in the period 1949 to 1955 was discovered in those
San Francisco census tracts where there had been an absence
ofleukemia cases for the period 1946 to 1948.
tion of space points to time points. Five different interval sizes
were used: 0.5, 1, 2, 4, and 12 months. The 2-month intervals
MATERIALS AND METHODS
For San Francisco,
Mustacchi
(8) found
a form of spatial
were chosen in advance for significance testing purposes at the
San Francisco resident cases of leukemia in children under
age 15 years diagnosed during the 20-year period January 8,
and 2 to 14 years. Neither of the age groupings showed statis
1946 to January 7, 1966, were identified through the co
tically significant clustering for 2-month intervals. The only
operation of the San Francisco Department of Public Health;
analysis of this same type yielding a p value less than 0.05 was
for ages 2 to 14 years and with 12-month intervals (p 0.024). San Francisco Medical Society ; Stanford University Medical
A comparison of “within―
time interval average distance School ; and administrative, medical, and record room staffs of
every San Francisco hospital. Clinical charts and death certifi
between cases to “between―
time interval average distance
cates were searched for the diagnosis of leukemia.
indicated that “clustering―
in the series was weak. In addition,
Mantel's approach (6) was used in the present study. Two
Knox' s approach was used for descriptive purposes. The clus
distance measures were computed, X a spatial measure and Y a
tering appeared to be the result of a slight excess of pairs
temporal measure, for each of the n(n—l)/2 unordered pairs
within relatively large time and/or large space distances.
of cases. Let I and j index any pair of distinct cases. Statis
Data are presented indicating that a single analysis of a case
tically significant space-time clustering is evidenced by a large
series with a long time span can be subject to artifactual
departure (in the appropriate direction) of the statistic
space-time clustering occurring in the population at risk.
5% level. Separate
analyses
were performed
for ages 0 to 14
@
w
=
0.5 Z
=
@‘
X,@ Y@1
INTRODUCTION
Since the report of an apparent cluster of childhood leu
kemia by Heath and Hasterlik (3) in 1963, a number of objec
tive statistical methods have been proposed and applied to try
from
its expectation
based
on the assumption
of random
matching of the space and time coordinates. The spatial
measure chosen was the distance in feet between the cases
which was plotted to the nearest 100 feet. The temporal
to determine
whetheror not thereis significant
space-time
measure was defined as Y@1 1 if Cases i and / were diagnosed
clustering of this disease. Knox (5) used a 2 X 2 contingency
within the same time interval and Y,1 0 otherwise.
table approach for the set of all spatial and temporal distances
w is merely
the sum of spatial distances
between
all pairs of
(by residenceand time of onset)for leukemiacaseswith onset cases that were both diagnosed within the same time interval.
before age 6 in Northumberland and Durham (1951 to 1960).
A significant excess of cases less than 1 km apart in residence
and less than 60 days apart in time of onset was found. The
Five different time intervals were investigated: 0.5, 1, 2, 4,
and 12 months. The 2-month interval was the predetermined
value chosen for significance testing purposes at the 5% level.
same method was applied to data for leukemia cases with Each of the 5 sizes of intervals consisted of 1 or more consecu
onset under age 15 years for Portland, Ore., and its environs. A tive 0.5-month intervals: January 8 to January 22, January 23
significant excess was not found for cases less than 1 km and to February 7, February 8 to February 22, etc. This unusual
60 days apart, but interaction was best shown for distances arrangement of intervals was chosen in order to make the
less than 4 km and 250 days apart (7). Barton et al. (2) applied intervals approximately the same length and at the same time
the Barton-David approach (1) to Knox's data for Northum
to try to reduce the likelihood of misclassification. Each study
year, starting at January 8, was divided into 24 0.5-month
1 This
investigation
was
supported
by
USPHS
Grants
CA
05924
and 9751 and the Laurence and Alice Anspacher Myers Fund.
Received May 20, 1968; accepted March 20, 1970.
intervals,
7 of 16 days, 16 of 15 days, and 1 of 13 days (14
days in leap years). In the case series, the day of the month
was 1 for 37% of the cases and 15 for 32% of the cases. By
JULY 1970
Downloaded from cancerres.aacrjournals.org on June 16, 2017. © 1970 American Association for Cancer Research.
I969
Melville R. Klauber and Piero Mustacchi
“centering―the 0.5-month
intervals
extensive discussion of the method and its relationship
on the 1st and 15th days
to a
of themonth,it washopedto reducemisclassification
errors number of other approaches to the space-time clustering prob
that might arise if the division had been made closer to the 1st
and 15th days of the month.
Approximate p values were determin@4@y treating the
standardized
distributed
statistic
(W —Exp
W) /v'Var
lem.
It was necessary to reduce space-time clustering due to
varying population growth rates in different neighborhoods
during the 20-year period. The 1950 and 1960 census data for
San Francisco persons ages 0 to 14 years indicated that the
rate of population growth varied considerably from census
tract to census tract; 35% of the tracts had 25% or more
W as a normally
random variable. The reader is referred to Ref. 6
for the formulas usedto compute Exp Wand Var W and for
Table 1
Normal approximation p values for Mantel's approach for leukemia
under age 15 years in San Francisco (1946 to 1965)
by age group and size oftime intervals
deviation
at0.5
Age
mo.0—14
group
(yr)p
separate
mo.4
2—140.71 0.850.31
0.460.32
0.260.17
mo.2
mo.1
mo.4
may have acquired
mo.12
intervals
(15)
(38)
(64) (128) (385)
17,720
Between
17,520
17,540
17,570
17,500
(1,407)
intervals19,420
(1,664)16,740
(1,754)16,810
(1,728)16,820
(1,777)17,360
coordinates
of
the
residences
were
determined
to
based
on the
January
8, 1946 to January
7, 1951;
in parentheses.
the disease as a result of a congenital
con
dition. A supplementary analysis with reciprocal spatial and
temporal measures was performed to determine whether the
use of time intervals and a distance measure which does not
give extra weight to short distances obscured the degree of
clustering.
Two further analyses are presented for descriptive rather
than significance testing purposes. For each time interval size
used in the Mantel approach, a comparison was made of
“within―
time interval average distance between cases to the
“between―
time interval average distance. Knox's approach (5)
which compares the number of pairs of cases less than a given
spatial distance and less than a given temporal distance to
expectation
the
nearest 100 feet, and the averagedistances were rounded to the nearest
10 feet. Pooled averagesare given from the 4 separate 5-year analyses.
b No. of pairs shown
in 1960
The data were analyzed separately for the full 15-year age
group and ages 2 through 14 years. It was thought that clus
tering
might be enhanced by eliminating the children
diagnosed during the first 2 years of life, since these children
(611)
(103)
(194)
(25)b
(55)
17,650 17,670 17,670 17,700 17,740
(2,123)2—14Within
(2,540)17,360
intervals18,630
(2,631)17,060
(2,709)17,080
(2,679)17,270
spatial
population
(@;W - @;
Exp w)/-@J@;
Var W
intervals
Between
a The
5-year periods:
0.180.19 0.024
Table 2
A verage distance in feet between residences ofSan Francisco
childhood leukemia cases, within and between time intervals,
by age group and length oftime intervals
(ft)at:(yr)averageda0.5
Age
groupPairsDistance
mo.0—14Within
expected
January 8, 1951 to January 7, 1956, etc. Summary (normal)
p values were obtained with the statistic
mo.12
mo.2
mo.1
from
assumption that each tract had the same growth rate during
the previous 10 years for ages 0 to 14 years as the whole city
(9). In order to reduce the effect of this artifactual clustering
in the population at risk, the case series was analyzed in 4
was tried for the 16 combinations
obtainable
for
the spatial distances 0.25, 0.5, 1, and 2 miles with the tern
poral distances 30, 60, l20, and 365 days. Knox's approach
provides additional descriptive information over the average
distance approach in that it shows the excess of pairs within
Table 3
Observed/expected number ofpairs ofSan Francisco childhood leukemia cases diagnosed January 6, 1946, to January 7, 1966,
less than indicated separation in space (address) and time (data ofdiagnosis) by age group
Space
121)at:<30
149)at:Ages2—14
yr(n
distance
days<0.252/0.6
days<60
(miles)Ages0—14yr(n
days<120
days<365
days<30
(6)4/2.9
(8)14/8.4
(18)1/0.3
days<120
days<365
(2)1/0.8
(2)2/1.7
(4)9/5.1
(8)5/5.9
days<60
(11)<0.52/2.1
(4)a3/1.3
(30)<l4/5.8
(4)7/4.4
(14)10/9.5
(20)31/27.7
(48)1/1.2
(2)4/2.6
(70)<220/19.4
(8)12/12.1
(22)26/25.9
(42)79/75.7
(98)3/3.5
(6)8/7.4
(15)17/16.6
(30)50/49.6
(36)46/40.4
(68)100/86.6
(102)264/252.7
(139)12/16.6
(22)32/24.6
(49)68/55.3
(76)177/163.3
(10)19/17.3
(110)
a No. of individuals involved in pairs observed shown in parentheses.
1970
CANCER RESEARCH VOL. 30
Downloaded from cancerres.aacrjournals.org on June 16, 2017. © 1970 American Association for Cancer Research.
Space-Time
Clustering of Childhood
Leukemia
Table 4
Leukemia cases under age 15 years diagnosed with addresses in San Francisco by date ofdiagnosis, coordinates
(V = No. ofIOO fret east oforigin, W No. oflOO feet north oforigin), age in years, and sex
@
Coordinates
Coordinates
Coordinates
Coordinates
Coordinates
Year, _________________ Year, _________________ Year, _________________ Year, _________________ Year, __________________
mo./day V W Age Sex mo./day V W Age
mo./day
mo./day
mo./day
Sex
V W Age Sex
V W Age Sex
V W Age Sex
194619584/1691332M6/15725210F12/1286195OF1/133331013F7/1265542F5/1187260OM7/1662356M12/1911984M1/10179724M7/13482392M7/1483285M8/11933054F1
4/1
275
2 M
78
1947157617M10/20
3/1163619F2/13
5/1198 216103
1962842149M2/211053234M1/12682111F4/158819812F6/17312776M2/13303150Flo/l3232433F1/51003734F4/152822490M7/12292792M5/11081924F11/153652013M3/1
19513483122M1955
1243 8M F9/15
164249
2821210M F7/69/15208
7/1552
105282
3/20131
38714 1F F5/15
18571 1911 6F M5/288/1330
M194810/153503162M8/5843193M12/42773185M8/155534713F1/15137925F8/212033297M9/152002949F1/202431983F19529/13642503F195
125280
2773 12M
138 155 2 F 10/1 293 204
5/3
6 M 6/7 241 183 13 F
11/15231
5/2422030189
1541 3MF7/15
19491571049M1953
1/151852144M4/15
292286
1873 5M M3/297/1095 153312
M1/182691120M2/726117312F5/261223280F12/1923997M10/1135930214F3/12013450M3/13451783M6/1522715714M12/15322102M10/153362172F3/15
31114 5M
1/12
9/28 92 287 6 F
9/20 135 74 2 F
127251276 3M F7/15
12/15222
1/15116
352367
2853 1M F9/15
F1/1223022M2/12481842M4/11833291M1/151132164M2/73542970M19574/15992146F19654/1673294M2/152991460M6/153042098F4/212671380M12/28150903M4/12
19612583223F7/15
10/263283562572894 2M
19501813363F1954
various distances in both dimensions. Knox's approach was not
used for statistical significance testing; however, Knox's
statistic has the form of a Mantel Z-statistic and may be tested
by Mantel's approach.
RESULTS
For the predetermined time interval of 2 months for ages 0
to 14 years a nonsignificant p value of 0.32 was obtained; for
ages 2 to 14 years the p value was reduced only slightly (0.26).
Table 1 shows that there appears to be a trend of greater
space-time clustering with larger time intervals, especially for
the 2- to 14-year age groups. For 12-month intervals, the p
values are 0.19 for ages 0 to 14 years and 0.024 for ages 2 to
JULY
14 years. Evidently, removing cases diagnosed during the first
2 years of life did enhance the space-time clustering con
siderably when 12-month intervals were used.
The average distance in feet between residences of the
cases for those pairs within the same time intervals and
those pairs not in the same time intervals were computed
for each 5-year period, the 5 interval lengths, and the 2
age groupings. The 40 differences in averages were so
small that only the pooled averages are shown in Table
2; the largest difference in averages among the 4 5-year periods
was for 1956 to 1960 with 12-month intervals, where the
“withintime intervals― average was 15,740 feet and the
“betweentime intervals―average was 17,7 10 feet. In Table 2,
the pooled averages were obtained merely by summing the
distances between the indicated pairs from the individual
1970
Downloaded from cancerres.aacrjournals.org on June 16, 2017. © 1970 American Association for Cancer Research.
1971
Melville R. Klauber and Piero Mustacchi
5-year analyses and dividing by the number of pairs. In no case 62,500 and K@ = 60); for ages 2 to 14 years, the range
did the “withinintervals―pooled average distance exceed the of p values was 0.020 to 0.49 with 7 of the 25 values
less than 0.05 (p = 0.020 occurred with K5 62,500 and
“betweenintervals―average by as much as 1,000 feet.
120). Apparently, no serious division of “clusters―
The obse rved and expected pairs less than various K@
for the age group 2 to 14 years when the
space and time distances apart are shown in Table 3. occurred
These indicate that the smaller “within―
than “between―interval time measure was used. For ages 0 to 14 years,
average distance for intervals of 1 month or longer in the minimum p value obtained with the reciprocal trans
Table 2 were not due to excessive numbers of cases oc formation was 0.058 compared to 0.17 with the interval
distance measure; but with the former measure, 5 times
curring very close to each other. It appears
that there
are slightly more than expected pairs less than 0.25 mile as many forms of the test were tried as with the
apart, close to the expected number less than 0.5 and 1 latter, and none of all 30 results were significant at the
mile apart, and more than expected less than 2 miles 5% level.
The 6 of the 7 “significant―
p values were probably
apart. One of the largest observed to expected ratios
obtained, 9/5.1 for the age group 2 to 14 years, for little affected by the time measurement errors mentioned
pairs less than 0.25 mile and 365 days apart, involved above, since they involved the use of additive time con
only 11 cases. One case was in 3 pairs, 5 cases were in stants, K@ of 60 (days) or more.
2 pairs, and the remaining 5 cases appeared in only 1
pair. The data in Table 3 are taken from the 11,026 DISCUSSION
pairs obtainable from the 149 cases ages 0 to 14 and
the 7,260 pairs from the 121 cases ages 2 to 14 for the
The problem of assessing the statistical significance of
full 20-year period and hence may be affected by space a number of space-time clustering studies of a rare dis
time clustering in the population at risk. For descriptive ease is hard because the replication
of studies
under
purposes,
it was thought
desirable
to identify
the
extent
of excess pairs of “close―
cases for the whole period.
Table 4 provides a complete listing of cases in chrono
logical order of diagnosis and shows the coordinates of
the home address, sex, and age at time of diagnoses.
Further analysis of the present data clearly indicated
the necessity of dividing the 20-year period in order to
reduce
the
effect
of space-time
clustering
of the
popula
similar conditions
is difficult,
if not impossible.
Because
of differences
in geography
and population
density,
results
obtained
in 1 area cannot
be definitively
con
firmed or refuted
by a study in another
area. If one
wishes to repeat a study in the same area, since the
disease is rare, a great deal of time is required and, with
the passage of time, important changes in population
density and distribution can occur. This makes it desirable
to try a variety of space and time measures in order to
tion at risk. When the analysis was repeated with the
whole 20-year series as a single unit, all 10 p values were elucidate
the nature of the clustering,
if there be any.
reduced from what they were for the summary statistic When a number of measures are used for the same data,
from the four 5-year analyses. The largest absolute and the probability values obtained become nominal. Further
proportional reductions were 0.19 to 0.13 (ages 0 to 14 interpretation
of them is clouded
by the statistical
de
years,
I 2-month
intervals)
and
0.024
to 0.015
(ages 2 to
14 years, 12-month intervals), respectively.
By using the time measure which identifies whether or
not
pairs
of
cases
lie
in
the
same
time
interval,
the
problem of errors in time measurements was mitigated,
but this ran the risk of putting children whose address
and time of diagnosis were close into separate intervals.
For determination
of whether or not the separation of
cases into time intervals and the use of untransformed
spatial distances did in fact obscure clustering, the anal
ysis was repeated in the same way, except with different
space and time measures. The reciprocal measure sug
gested by Mantel (5) was used for both measures:
xii = l/(K@
+D11)
andY,1 l/(Kt+ T@1),
where K@ and K@ are constants and D11 and T@, are the
distance in feet and time separation in days, respectively,
between Cases i and j. The 25 combinations of constants
obtainable
from
the
values K5
100, 500, 2,500,
12,500, 62,500 and K@ = 15, 30, 60, 120, 365 were
tried for both age groups, 0 to 14 years and 2 to 14
years. The range of p values obtained for ages 0 to 14
years was 0.058 to 0.49 (p
0.058 occurred with K5
1972
pendence
of one such result
upon
another.
In the present
study,
the problem
of interpreting
the p values is
academic, since even if the space-time clustering is real, it
is weak.
The residence of the child in a relatively compact
highly urbanized city, such as San Francisco, may not be
an appropriate spatial location for studies of space-time
clustering. A child in San Francisco may easily come into
contact with large numbers of persons who live through
out the city, but a child living in a suburban
or rural
community
is likely to be in contact with far fewer per
sons. Likewise, an environmental
agent
present in a small area can potentially
higher percentage of a dense population
dense population of the same size.
which may be
affect a much
than of a less
Is this to say that investigation
of space-time clustering
of leukemia cases as a means of elucidating
the etiology
of the disease is bankrupt?
continued
use
of
the
No, what is needed
general
approach,
but
is some
with
more
meaningful observations. An unsuccessful attempt at this
was a study of hospital of birth clustering by Klauber
(4). Other attempts should be made with the use of
observations
of times and places consistent
hypothesis
of a given type of leukemogenic
with a given
exposure.
CANCER RESEARCH VOL. 30
Downloaded from cancerres.aacrjournals.org on June 16, 2017. © 1970 American Association for Cancer Research.
Space-Time
ACKNOWLEDGMENTS
Clustering of Childhood Leukemia
2. Barton, D. E., David, F. N., and Merrington, M. A Criterion for
Testing Contagion in Time and Space. Ann. Human Genet., Lon
We are grateful to Nathan Mantel of the National Cancer In
stitute for allowing us the use of a prepublication copy of Ref.
6 and for a number of suggestions that led to improvement in
this communication, especially the suggestion to split the time
period and to observe the effect of eliminating cases under the
age of 2 years. However, we bear the responsibility for any er
rors. We are indebted to Richard Walster for computer pro
gramming and the University of Utah Computer Center for time
on the UNIVAC 1180 computer.
REFERENCES
1. Barton, D. E., and David, F. N. Randomization Bases for Multi
variate Tests I. The Bivariate Case, Randomness of N Points in a
Plane. BulLInst. Intern. Statist.,39: 455—467,1962.
JULY
don,29: 97—102,1965.
3. Heath, C. W., and Hasterlik, R. J. Leukemia among Children in a
Suburban Community. Am. J. Med., 34: 796—812,1963.
4. Klauber, M. R. A Study of Clustering of Childhood Leukemia by
Hospital of Birth. Cancer Res., 28: 1790—1792, 1968.
5. Knox, G. Epidemiology of Children Leukemia in Northumberland
and Durham. Brit. J. Prevent. Social Med., 18: 17—24,1964.
6. Mantel, N. The Detection of DiseaseClustering and a Generalized
RegressionApproach. Cancer Res., 27: 209—220,1967.
7. Meighan, S. S., and Knox, G. Leukemia in Childhood: Epidemi
ology in Oregon. Cancer, 18: 8 11—814,1965.
8. Mustacchi, P. Some Intra-City Variations of Leukemia Incidence in
San Francisco. Cancer, 18: 362—368,1965.
9. U. S. Census of Housing: 1950 and 1960 Series HC(3) City Blocks.
San Francisco, Calif. : U. S. Dept. of Commerce, Bureau of the
Census.
1970
Downloaded from cancerres.aacrjournals.org on June 16, 2017. © 1970 American Association for Cancer Research.
1973
Space-Time Clustering of Childhood Leukemia in San Francisco
Melville R. Klauber and Piero Mustacchi
Cancer Res 1970;30:1969-1973.
Updated version
E-mail alerts
Reprints and
Subscriptions
Permissions
Access the most recent version of this article at:
http://cancerres.aacrjournals.org/content/30/7/1969
Sign up to receive free email-alerts related to this article or journal.
To order reprints of this article or to subscribe to the journal, contact the AACR Publications
Department at [email protected].
To request permission to re-use all or part of this article, contact the AACR Publications
Department at [email protected].
Downloaded from cancerres.aacrjournals.org on June 16, 2017. © 1970 American Association for Cancer Research.