G
Data Analysis of Table 1, Randich et al.
The Randich et al. (Ref. 1) paper is based on an analysis of compositional data provided by two secondary lead smelters to bullet manufacturers on their lead alloy shipments. For each element, Randich et al. provide three measurements from each of 28 lead (melt) lots being poured into molds. The measurements were taken at the beginning (B), middle (M), and end (E) “position” of each pour. In this appendix, the variability in the measurements within a lot (due to position) is compared with the variability across lots. Consistent patterns in the lots and positions are also investigated.
Let u_{ijk} denote the logarithm of the reported value in position i (i = 1, 2, 3, for B, M, E) in lot j (j = 1, …, 28), on element k (k = 1, …, 6, for Sb, Sn, Cu, As, Bi, and Ag). A simple additive model for u_{ijk} in terms of the two factors position and lot is

u_{ijk} = μ_{k} + ρ_{ik} + λ_{jk} + ε_{ijk},

where μ_{k} denotes the typical value of u_{ijk} over all positions and lots (usually estimated as the mean over all positions and lots, ū_{··k}); ρ_{ik} denotes the typical effect of position i for element k, above or below μ_{k} (usually estimated as the mean over all lots minus the overall mean, ū_{i·k} − ū_{··k}); λ_{jk} denotes the typical effect of lot j for element k, above or below μ_{k} (usually estimated as the mean over all positions minus the overall mean, ū_{·jk} − ū_{··k}); and ε_{ijk} is the error term that accounts for any difference that remains between u_{ijk} and the sum of the effects just defined (usually estimated as the residual r_{ijk} = u_{ijk} − ū_{i·k} − ū_{·jk} + ū_{··k}).
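The mean-based estimates of the additive model can be sketched in a few lines of Python for a single element. This is only an illustration, not the analysis code used for this appendix (which was run in R); the function name `additive_fit` and the 3 × 4 table `toy` are hypothetical, and the numbers are made up rather than taken from Randich et al.

```python
from statistics import mean

def additive_fit(table):
    """Fit u[i][j] = mu + rho[i] + lam[j] + r[i][j] using row and column means.

    table: list of rows (one per position), each a list of log values (one per lot).
    """
    n_pos, n_lot = len(table), len(table[0])
    mu = mean([v for row in table for v in row])            # overall mean
    rho = [mean(row) - mu for row in table]                 # position effects
    lam = [mean([table[i][j] for i in range(n_pos)]) - mu   # lot effects
           for j in range(n_lot)]
    resid = [[table[i][j] - mu - rho[i] - lam[j] for j in range(n_lot)]
             for i in range(n_pos)]
    return mu, rho, lam, resid

# Hypothetical log values: 3 positions (B, M, E) x 4 lots; not real data.
toy = [
    [6.0, 6.5, 5.5, 6.2],
    [6.1, 6.4, 5.6, 6.3],
    [5.9, 6.6, 5.4, 6.1],
]
mu, rho, lam, resid = additive_fit(toy)
print("overall:", round(mu, 3))
print("position effects:", [round(x, 3) for x in rho])
print("lot effects:", [round(x, 3) for x in lam])
```

By construction, the estimated position effects sum to zero, as do the lot effects and the residuals within each row and column.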
Because replicate measurements are not included in Table 1 of Randich et al., we are unable to assess the existence of an interaction term between position and lot; such an interaction, if it exists, must be incorporated into the error term, which also includes simple measurement error. The parameters of the model (μ_{k}, ρ_{ik}, λ_{jk}) can also be estimated via median polish (Ref. 2), which uses medians rather than means and thus provides more robust estimates, particularly when the data include a few outliers or extreme values that adversely affect sample means (but not sample medians). This additive model was verified for each element by using Tukey’s diagnostic plot for two-way tables (Refs. 2, 3).
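A minimal sketch of median polish follows, to show how sweeping out medians resists a single outlier. It is a plain-Python illustration with hypothetical names (`median_polish`, `toy`) and made-up data; the sweep order (rows first, then columns) and the stopping rule are assumptions of this sketch, not a description of the software used for this appendix.

```python
from statistics import median

def median_polish(table, max_iter=10, tol=1e-9):
    """Decompose table[i][j] into overall + row_eff[i] + col_eff[j] + resid[i][j]."""
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        change = 0.0
        # Sweep the median out of each row into the row effects.
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row_eff[i] += m
            change += abs(m)
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep the median out of each column into the column effects.
        for j in range(n_cols):
            m = median([resid[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                resid[i][j] -= m
            col_eff[j] += m
            change += abs(m)
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        if change < tol:  # nothing left to sweep out
            break
    return overall, row_eff, col_eff, resid

# Hypothetical exactly additive table (3 positions x 5 lots) with one low outlier.
rows, cols = [-1.0, 0.0, 1.0], [-2.0, -1.0, 0.0, 1.0, 2.0]
toy = [[10.0 + r + c for c in cols] for r in rows]
toy[2][1] -= 5.0  # a single unusually low value

overall, row_eff, col_eff, resid = median_polish(toy)
print(overall, row_eff, col_eff, resid[2][1])
```

In this toy case the outlier ends up entirely in one residual and leaves the row and column effects untouched, whereas mean-based estimates would spread it across the effects for that row and column.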
The conventional way to assess the significance of the two factors is to compare the variance of the position effects, Var(ρ̂_{ik}), and the variance of the lot effects, Var(λ̂_{jk}), each scaled to the level of a single observation, with the variance of the estimated error term, Var(r_{ijk}). Under the null hypothesis that all ρ_{ik} are zero (position has no particular effect on the measurements, beyond the anticipated measurement error), the ratio of 28·Var(ρ̂_{ik}) to Var(r_{ijk}) should follow an F distribution with 2 and 54 degrees of freedom; ratios that exceed 3.168 (the upper 5% point of that distribution) would be evidence that position affects measurements more than could be expected from mere measurement error.
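The comparison of scaled variances can be sketched as a small two-way analysis of variance without replication. The function name `two_way_anova` and the table `toy` are hypothetical, and the data are made up; with 3 positions and 28 lots the same arithmetic gives the 2, 27, and 54 degrees of freedom used in this appendix.

```python
def two_way_anova(table):
    """Mean squares and F ratios for an unreplicated two-way table.

    table: list of rows (positions), each a list of values (lots).
    """
    n_pos, n_lot = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_pos * n_lot)
    pos_eff = [sum(row) / n_lot - grand for row in table]
    lot_eff = [sum(table[i][j] for i in range(n_pos)) / n_pos - grand
               for j in range(n_lot)]
    # Sums of squares, each scaled to the level of a single observation.
    ss_pos = n_lot * sum(e * e for e in pos_eff)
    ss_lot = n_pos * sum(e * e for e in lot_eff)
    ss_res = sum((table[i][j] - grand - pos_eff[i] - lot_eff[j]) ** 2
                 for i in range(n_pos) for j in range(n_lot))
    df_pos, df_lot = n_pos - 1, n_lot - 1
    df_res = df_pos * df_lot
    ms_pos, ms_lot, ms_res = ss_pos / df_pos, ss_lot / df_lot, ss_res / df_res
    return {
        "df": (df_pos, df_lot, df_res),
        "ms": (ms_pos, ms_lot, ms_res),
        "F_position": ms_pos / ms_res,
        "F_lot": ms_lot / ms_res,
    }

# Hypothetical log values: 3 positions x 4 lots; not real data.
toy = [
    [6.0, 6.5, 5.5, 6.2],
    [6.1, 6.4, 5.6, 6.3],
    [5.9, 6.6, 5.4, 6.1],
]
result = two_way_anova(toy)
print(result["df"], round(result["F_position"], 3), round(result["F_lot"], 3))
```

Each F ratio would then be compared with the critical value of the F distribution for the corresponding degrees of freedom.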
Table G.1 below provides the results of the two-way analysis of variance with two factors, position and lot, for each element. The variances of the effects, scaled to the level of a single observation, are given in the column headed “Mean Sq”; the ratio of each mean square to the residual mean square is given under “F Value”; and the P value of this statistic is listed under “Pr(> F)”. For comparison, the equivalent mean square under the median polish analysis is also given; notice that, for the most part, the values are consistent with the mean squares given by the conventional analysis of variance, except for Sn, for which the mean square for position is almost 6 times smaller under the median polish (1.351 versus 0.2345).

TABLE G.1 Analyses of Variance for Log(Measurement) Using Table 1 in Randich et al. (Ref. 1)

Sb           Df     Sum Sq    Mean Sq     F Value     Pr(> F)   MS (median polish)
Position      2   0.001806   0.000903      2.9449     0.06111   0.004
Lot          27   0.111378   0.004125     13.4514   1.386e-15   0.0042
Residuals    54   0.016560   0.000307

Sn           Df     Sum Sq    Mean Sq     F Value     Pr(> F)   MS (median polish)
Position      2      2.701      1.351      7.5676    0.001267   0.2345
Lot          27    147.703      5.470     30.6527    <2.2e-16   6.0735
Residuals    54      9.637      0.178

Cu           Df     Sum Sq    Mean Sq     F Value     Pr(> F)   MS (median polish)
Position      2      0.006      0.003      0.1462      0.8643   0.00003
Lot          27    102.395      3.792    176.9645      <2e-16   4.1465
Residuals    54      1.157      0.021

As           Df     Sum Sq    Mean Sq     F Value     Pr(> F)   MS (median polish)
Position      2     0.0127     0.0063      2.1046      0.1318   0.0036
Lot          27    15.4211     0.5712    189.5335      <2e-16   0.5579
Residuals    54     0.1627     0.0030

Bi           Df     Sum Sq    Mean Sq     F Value     Pr(> F)   MS (median polish)
Position      2   0.000049   0.000024      0.3299      0.7204   0.0000
Lot          27   0.163701   0.006063     81.9890      <2e-16   0.0061
Residuals    54   0.003993   0.000074

Ag           Df     Sum Sq    Mean Sq     F Value     Pr(> F)   MS (median polish)
Position      2    0.00095    0.00047      1.6065        0.21   0.0000
Lot          27    1.95592    0.07244    245.6707      <2e-16   0.0735
Residuals    54    0.01592    0.00029

Only for Sn did the ratio of the mean square for position (B, M, E) to the residual mean square (1.351/0.178) exceed 3.168; for all other elements, this ratio was below this critical point. The significance for Sn may have come from the nonrobustness of the sample means caused by two unusually low values: Lot #424, E = 21 (B = 414, M = 414); and Lot #454, E = 45 (B = 377, M = 367). When median polish is used rather than the conventional analysis of variance, the ratio is 0.2345/0.178 = 1.317 (not significant). For all elements, the effect of lot is highly significant; differences among lots account for nearly all the variability in these data for all elements.
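The comparison with the critical value can be checked directly from the rounded mean squares printed in Table G.1. The short script below is only an arithmetic check with hypothetical variable names; because it starts from rounded values, its ratios can differ slightly from the printed F values.

```python
CRITICAL_F = 3.168  # critical value for F(2, 54) quoted in the text

# (position mean square, residual mean square) as printed in Table G.1
mean_squares = {
    "Sb": (0.000903, 0.000307),
    "Sn": (1.351, 0.178),
    "Cu": (0.003, 0.021),
    "As": (0.0063, 0.0030),
    "Bi": (0.000024, 0.000074),
    "Ag": (0.00047, 0.00029),
}

# Ratio of position mean square to residual mean square for each element.
ratios = {el: ms_pos / ms_res for el, (ms_pos, ms_res) in mean_squares.items()}
significant = [el for el, f in ratios.items() if f > CRITICAL_F]

# Same ratio for Sn, using the median polish mean square for position.
sn_median_polish_ms = 0.2345
sn_robust_ratio = sn_median_polish_ms / mean_squares["Sn"][1]

for el, f in ratios.items():
    print(f"{el}: ratio = {f:.3f}")
print("exceeds critical value:", significant)
print(f"Sn, median polish: ratio = {sn_robust_ratio:.3f}")
```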
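Table G.2 provides the estimates of the position and lot effects from the median polish. For each element, lots are listed across the columns and positions (1 = B, 2 = M, 3 = E) down the rows. The entries in the body of the table are the residuals, the “Column Effect” row gives the estimated lot effects, the “Row Effect” column gives the estimated position effects, and the entry where the two meet is the estimated common value.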
The analysis suggests that the variation observed in the measurements at different positions is not significantly larger than that observed from the analytical measurement error. All analyses were conducted with the statistics package R (Ref. 4).
TABLE G.2 Median Polish on Logarithms (Results Multiplied by 1,000 to Avoid Decimal Points)
Sb                423    424    425    426    427    429    444    445    446    447    448
1                  −7      0     −4    −10      6      0     19      7      1    −15      0
2                   0      0      0      0     −3     −1      0     −3      0      1      3
3                   9   −104      2     24      0      6     −5      0     −8      0     −5
Column Effect     −40      6     12     27    −56     57     34    −53      1     13     38

                  450    451    452    453    454    455    456    457    458    459    460
1                 −10     −1     −3      0      0      0      0     −2      0     −5     −4
2                   0      0      0      1      8     −4     −9      2      3      0      0
3                   3     11      8    −48    −33     12      5      0     −3      2     44
Column Effect     −16    −35     −9     −1     57    −53    −34     47    −49     52    −12

                  461    463    464    465    466    467  Row Effect
1                  66      0      0      1      0      4           0
2                  −5     −5     −4      0     −8      0           0
3                   0      5      0    −21     10     −2          −6
Column Effect     −32     53    −34    −37     23      1        6559

Sn                423    424    425    426    427    429    444    445    446    447    448
1                   0      0      0    −41    144    −45    271      0      0      0   −179
2                 127     69    −27      0   −192      0      0      4     61    −55      0
3                −120  −2800     11    148      0     60    −53    −42    −15    168      9
Column Effect   −1050    371   −625    672  −2909   1442   −659   −408   −884   −618    108

                  450    451    452    453    454    455    456    457    458    459    460
1                   0    605    −22   1428      0    −45     −6    240     41    −77     −5
2                  −9      0      0   −112     42      0     28    −30      0      0      0
3                 201   −313     83      0  −1944     99      0      0   −176     88    139
Column Effect    −122  −2328   −942  −5474    277    338    203  −1067   −349    849    787

                  461    463    464    465    466    467  Row Effect
1                 −22    −65      0    436      0    −54          69
2                   0      0     53    −71     −4      0           0
3                 118    112   −443      0     95     68        −112
Column Effect     908    933    938   −117    846    560        5586

Two unusual residuals: Lot #424, “E” = 21 (B = 414, M = 414); Lot #454, “E” = 45 (B = 377, M = 367)

Cu                423    424    425    426    427    429    444    445    446    447    448
1                −166    −19    −18     93     −2    −13      0     −8      0      0    106
2                   0      0      0      0      0      0      2      0     35     34    −23
3                  12     51      0   −121      0      0    −38      0    −43    −21      0
Column Effect     607    258    −94    418     80   −424    436    269    441    307  −1106

                  450    451    452    453    454    455    456    457    458    459    460
1                 −16    −27    −37     44      0     27     76     13      0    −53     −2
2                   0      0      0      0     52     −5      0      0      2      0      0
3                   0     24      0      0   −470      0      0      0     −5     49    288
Column Effect      30   −495  −1523    −30    630    448    330     30     50  −1894  −2405

                  461    463    464    465    466    467  Row Effect
1                  −2    691      0   −242     13    −24           2
2                   0      0    −28     10    −31      0           0
3                  19      0    857      0      0     11           0
Column Effect    −958  −4890  −1365   −255   −700   −357

As                423    424    425    426    427    429    444    445    446    447    448
1                −166    −19    −18     93     −2    −13      0     −8      0      0    106
2                   0      0      0      0      0      0      2      0     35     34    −23
3                  12     51      0   −121      0      0    −38      0    −43    −21      0
Column Effect     607    258    −94    418     80   −424    436    269    441    307  −1106

                  450    451    452    453    454    455    456    457    458    459    460
1                 −16    −27    −37     44      0     27     76     13      0    −53     −2
2                   0      0      0      0     52     −5      0      0      2      0      0
3                   0     24      0      0   −470      0      0      0     −5     49    288
Column Effect      30   −495  −1523    −30    630    448    330     30     50  −1894  −2405

                  461    463    464    465    466    467  Row Effect
1                  −2    691      0   −242     13    −24           2
2                   0      0    −28     10    −31      0           0
3                  19      0    857      0      0     11           0
Column Effect    −958  −4890  −1365   −255   −700   −357        4890

Bi                423    424    425    426    427    429    444    445    446    447    448
1                   0    −11      0      0     10    −10      0     10      0      0      0
2                 −10      0      0      0      0      0      0      0      0      9      0
3                   0      0      0     10      0      0      0      0      0      0      0
Column Effect      −5    −78    −46    −25    −25    −35     15     15     63     90     15

                  450    451    452    453    454    455    456    457    458    459    460
1                   0     −9      0     52      0      0      0      0      0      0      0
2                  −9      0     10      0      0    −11      0      0      0      0      0
3                   0      9      0    −11    −21      0     11      0      0     10     10
Column Effect      53     90    −25    −67    −35    −67    −67     34     25     34     15

REFERENCES
1. Randich, E.; Duerfeldt, W.; McLendon, W.; and Tobin, W. Foren. Sci. Int. 2002, 127, 174–191.
2. Tukey, J. W. Exploratory Data Analysis; Addison-Wesley: Reading, MA, 1977.
3. Mosteller, F. and Tukey, J. W. Data Analysis and Regression: A Second Course in Statistics; Addison-Wesley: Reading, MA, 1977, pp 192–199.
4. R. Copyright 2002, The R Development Core Team, Version 1.5.1 (2002-06-17), for the Linux operating system; see <http://www.r-project.org>.