STATISTICAL INVESTIGATION OF SAMPLE CHARACTERISTICS NEEDED FOR
ESTABLISHING THE ORIGIN OF URANIUM ORE CONCENTRATES/YELLOWCAKE
C. K. Bayne(a), J. M. Begovich(b), D. A. Bostick(b), J. A. Carter(b),
J. M. Giaquinto(b), J. Horita(b), D. L. Miller(c), I. D. Hutcheon(d),
M. J. Kristo(d), E. C. Ramon(d), M. Robel(d), S. A. Steward(d)
(a) Haselwood Services and Manufacturing, Inc., Oliver Springs, TN 37840
(b) Oak Ridge National Laboratory, Oak Ridge, TN 37831
(c) National Geospatial-Intelligence Agency, Bethesda, MD 20816
(d) Lawrence Livermore National Laboratory, Livermore, CA 94550
ABSTRACT
During the past few years, the need to link uranium ore concentrate (yellowcake) to a particular
geologic or geographic location has become a critical issue in identifying proliferant networks.
Uranium isotopic compositions and certain stable isotope ratios, along with trace element
concentrations, provide important information for statistically assessing whether yellowcake
samples share a common origin. The statistical review of chemical composition data was based on
both cluster analysis and principal component analysis. Sample groupings obtained by the two
techniques are compared for a series of yellowcake samples reportedly received by three
different routes.
INTRODUCTION
Determining the source of nuclear material is crucial for nuclear security and nuclear forensics. Uranium
milling is the process of converting raw ore as it arrives from mining operations into a product
known as uranium yellowcake (also commonly referred to as uranium ore concentrate). The
yellowcake from these process steps contains a host of impurities, which originate from the ore or
are picked up from contact with different chemical and physical media during processing.1
Three data sets of yellowcake material are identified as Line 0, Line 1, and Line 2. Line 0
samples are identified by numbers less than 100, Line 1 samples by 100-series numbers, and
Line 2 samples by 200- and 300-series numbers. Line 0 samples were chemically analyzed
primarily by inductively coupled plasma atomic emission spectrometry (ICP-AES), with initial
screening by spark source mass spectrometry (SSMS). Line 1 and Line 2 samples were measured by
ICP mass spectrometry (ICP-MS), with calibration curves obtained by the standard addition
technique. All samples of the yellowcake material were natural-abundance uranium (~0.711 wt %
²³⁵U). Elemental concentrations (µg/g) were measured on 23 samples for Line 0, 23 samples for
Line 1, and 35 samples for Line 2. The purpose of this exercise is to group the 81 samples into
one or more homogeneous clusters based on their elemental concentrations. A statistical grouping
of yellowcake samples may indicate that the samples in each cluster are derived from a similar
ore body or chemical processing method.
Concentrations of 15 elements (Na, Mg, Zr, Fe, Al, Mo, Ca, K, Zn, V, Mn, Cu, As, Sr, and Ti)
were selected to characterize the 81 yellowcake samples. These elements had concentrations above
the detection limits and were measured in all three data sets. Other authors2-4 have used other
elements and isotopes for clustering, particularly the isotopes of Pb.2,4
STANDARD DEVIATIONS
The elements with the greatest concentration spread are the most likely to reveal distinct clusters
of yellowcake samples. Elements with only a narrow range of concentrations make any clusters
that exist harder to identify. Table 1 lists summary statistics for the elemental concentrations,
ordered by decreasing standard deviation. The table is useful for ranking the classification
elements by their importance for clustering the yellowcake samples.
Table 1. Statistics for the 15 classification elements.
Element   N    Average (µg/g)   Standard Deviation (µg/g)   RSD (%)
Na        81   17118            23494                       137
Mg        81   3458             7271                        210
Zr        81   3139             6515                        208
Fe        81   2646             5016                        190
Al        81   1397             3320                        238
Mo        81   1195             1661                        139
Ca        81   1098             1030                        94
K         81   951              864                         91
Zn        81   87               414                         475
V         81   260              325                         125
Mn        81   158              153                         97
Cu        81   19               125                         650
As        81   58               97                          166
Sr        81   135              86                          64
Ti        81   56               55                          97
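The statistics in Table 1 can be reproduced with a short script. The sketch below assumes two things not stated in this report. First, the standard deviation is the usual sample standard deviation (n − 1 denominator). Second, RSD = 100 × standard deviation / average. The function name `summarize` and the toy concentrations are hypothetical; they are not the measured yellowcake data.

```python
import statistics


def summarize(concentrations):
    """Return (element, N, average, SD, RSD %) rows, largest SD first.

    `concentrations` maps an element symbol to its list of concentrations
    (ug/g). SD is the sample standard deviation (n - 1 denominator), which
    is an assumption about the convention used for Table 1.
    """
    rows = []
    for element, values in concentrations.items():
        average = statistics.mean(values)
        sd = statistics.stdev(values)
        rsd = 100.0 * sd / average  # relative standard deviation, %
        rows.append((element, len(values), average, sd, rsd))
    # Rank elements by spread, as in Table 1
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Hypothetical toy data (NOT the measured yellowcake concentrations)
toy = {
    "Na": [1000.0, 5000.0, 40000.0],
    "Mg": [200.0, 300.0, 9000.0],
    "Ti": [50.0, 55.0, 60.0],
}
for element, n, average, sd, rsd in summarize(toy):
    print(f"{element:<3}{n:>4}{average:>10.0f}{sd:>10.0f}{rsd:>6.0f}")
```

Sorting by standard deviation rather than by RSD mirrors the ordering of Table 1. Unstandardized distances are dominated by absolute spread, not relative spread.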
DATA CORRELATION
A linear relation between the concentrations of two elements can be measured by their correlation
coefficient. Correlation coefficients range from -1 to +1, with -1 indicating a perfect linear relation
with negative slope and +1 a perfect linear relation with positive slope. Values between the two
limits represent weaker linear relations, with zero indicating no linear relation at all. If two
elements have a strong linear relation (e.g., correlation > 0.9 or < -0.9), they provide essentially
the same information, because the concentrations of one element can be closely approximated from
the concentrations of the other element by a linear equation. Table 2 shows the strongest linear
relations among the 15 classification elements.
Table 2. Selected correlation coefficients for the 15 classification elements.
Correlations   Na     Mg
Zr             0.91
Fe                    0.96
Al                    0.96
The correlations in Table 2 show that the five elements with the largest standard deviations (Na, Mg,
Zr, Fe, and Al) can be represented by the two classification elements Na and Mg.
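Screening for redundant classification elements, as in Table 2, amounts to two steps. First, compute the Pearson correlation coefficient for every pair of elements. Then keep the pairs with |r| > 0.9. This is a minimal sketch under that reading. The functions `pearson` and `strong_pairs` and the toy data are hypothetical illustrations.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(concentrations, threshold=0.9):
    """List element pairs with |r| > threshold (candidates for redundancy)."""
    pairs = []
    for e1, e2 in combinations(sorted(concentrations), 2):
        r = pearson(concentrations[e1], concentrations[e2])
        if abs(r) > threshold:
            pairs.append((e1, e2, round(r, 3)))
    return pairs


# Hypothetical toy data: Zr tracks Na almost linearly, Ti does not
toy = {
    "Na": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Zr": [2.1, 3.9, 6.2, 7.8, 10.1],
    "Ti": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(strong_pairs(toy))
```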
CLUSTER ANALYSIS
Different clustering methods merge yellowcake samples based on different distance measures.
Ward's method5 was selected as the clustering method for the yellowcake samples; it uses the
within-cluster sum of squares (SSW) of the concentrations, added over all classification elements.
Suppose two clusters r and s are joined to form cluster t; then SSW_t ≥ SSW_r + SSW_s. The
increase in SSW_t due to joining clusters r and s is small if the two clusters are very close and
large if they are far apart. At each step, Ward's method joins the two clusters that give the
smallest increase in SSW_t.
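The SSW increase that Ward's method minimizes has a standard closed form. Joining clusters r and s, with n_r and n_s samples and centroids c_r and c_s, increases SSW by n_r n_s / (n_r + n_s) × ||c_r − c_s||². The sketch below checks this identity against the direct computation SSW_t − SSW_r − SSW_s. It uses hypothetical two-element (Na, Mg) toy clusters, and the helper names are illustrative only.

```python
def centroid(points):
    """Mean vector of a cluster (one coordinate per classification element)."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def ssw(points):
    """Within-cluster sum of squares, added over all classification elements."""
    c = centroid(points)
    return sum((p[j] - c[j]) ** 2 for p in points for j in range(len(c)))


def ward_increase(r, s):
    """Closed-form increase in SSW when clusters r and s are joined."""
    cr, cs = centroid(r), centroid(s)
    dist2 = sum((a - b) ** 2 for a, b in zip(cr, cs))
    return len(r) * len(s) / (len(r) + len(s)) * dist2


# Hypothetical (Na, Mg) toy clusters
r = [[1.0, 1.0], [2.0, 1.0]]
s = [[8.0, 9.0], [9.0, 9.0], [10.0, 8.0]]
direct = ssw(r + s) - ssw(r) - ssw(s)  # SSW_t - SSW_r - SSW_s
print(round(direct, 6), round(ward_increase(r, s), 6))  # both 138.033333
```

The closed form avoids recomputing SSW for the merged cluster, which is how agglomerative Ward implementations usually update merge costs.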
The clustering of yellowcake samples is displayed as a tree diagram, or dendrogram. Tree diagrams
start with each sample as an individual group, then merge sample groups as the distance between
clusters increases. The distance between two clusters is represented by a semi-partial R2 value,
which is the ratio of the increase in SSW from joining the two clusters (their between-cluster sum
of squares) to the total sum of squares of all samples.
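The semi-partial R2 for a single merge can be computed as follows. The sketch assumes the definition used by the SAS CLUSTER procedure. Under that definition, the SSW increase from joining the two clusters is divided by the total sum of squares of all samples about their overall mean. The function `semipartial_r2` and the toy clusters are hypothetical.

```python
def centroid(points):
    """Mean vector of a set of samples."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def ssw(points):
    """Sum of squares of the samples about their own centroid."""
    c = centroid(points)
    return sum((p[j] - c[j]) ** 2 for p in points for j in range(len(c)))


def semipartial_r2(r, s, all_samples):
    """(SSW increase from merging r and s) / (total sum of squares, T)."""
    increase = ssw(r + s) - ssw(r) - ssw(s)  # between-cluster SS, B_rs
    return increase / ssw(all_samples)


# Hypothetical (Na, Mg) toy clusters
r = [[1.0, 1.0], [2.0, 1.0]]
s = [[8.0, 9.0], [9.0, 9.0]]
u = [[1.0, 9.0], [2.0, 8.0]]
everything = r + s + u
print(round(semipartial_r2(r, u, everything), 4))  # nearer pair, smaller value
print(round(semipartial_r2(r, s, everything), 4))  # farther pair, larger value
```

Because T is fixed for a data set, the semi-partial R2 ranks merges exactly as the raw SSW increase does. Dividing by T simply puts the tree heights on a 0 to 1 scale.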
The 81 yellowcake samples were first clustered using all 15 classification elements. The
clustering was then repeated using only Na and Mg. Both clusterings gave the same result because
Na and Mg have the largest standard deviations and are highly correlated with the other elements
that have large standard deviations (Zr, Fe, and Al).
Figure 1 is a tree diagram for Ward's method with Na and Mg as the classification elements.
The tree diagram was cut at an arbitrarily chosen level to divide the yellowcake samples into four
clusters. Cluster #1 contains all Line 2 yellowcake samples plus Line 0 sample 11. Cluster #2 is a
mixture of yellowcake samples from the Line 0 and Line 1 data sets. Cluster #3 contains Line 0
yellowcake samples with high Mg concentrations. Cluster #4 contains Line 0 yellowcake samples
with high Na concentrations.
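Cutting the tree into four clusters is equivalent to stopping the agglomeration once four clusters remain. The following naive sketch repeats the Ward merge rule until k clusters are left, recomputing every pairwise merge cost at each step. That is acceptable for about 81 samples but not for large data sets. `ward_cluster` and the toy samples are hypothetical. It is not the SAS implementation cited in ref. 5, and ties are broken by taking the first pair found.

```python
def centroid(points):
    """Mean vector of a cluster."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def ward_increase(r, s):
    """Increase in within-cluster sum of squares if clusters r and s merge."""
    cr, cs = centroid(r), centroid(s)
    dist2 = sum((a - b) ** 2 for a, b in zip(cr, cs))
    return len(r) * len(s) / (len(r) + len(s)) * dist2


def ward_cluster(samples, k):
    """Merge samples with Ward's rule until k clusters remain.

    Returns the clusters as sorted lists of sample indices.
    """
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_increase([samples[m] for m in clusters[i]],
                                     [samples[m] for m in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(clusters)


# Hypothetical (Na, Mg) toy samples forming three obvious groups
samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0],
           [10.0, 11.0], [20.0, 0.0], [21.0, 0.0]]
print(ward_cluster(samples, 3))  # [[0, 1], [2, 3], [4, 5]]
```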
Four statistics (semi-partial R2, R2, pseudo F, and pseudo t2)6,7 are useful indicators of the
number of clusters. Large relative changes in these statistics, either increases or decreases, can
indicate the joining of two quite distinct clusters. Figure 2 plots these four statistics versus the
number of clusters. The semi-partial R2 plot indicates two or three clusters, and the R2 plot
indicates three or four clusters. The pseudo F plot is not very informative because the changes in
this statistic are relatively small for up to four clusters. The pseudo t2 plot indicates either five
or six clusters. Overall, these indicator plots suggest between two and six clusters.
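The sketch below computes R2, pseudo F and pseudo t2, assuming the definitions given in the SAS CLUSTER procedure documentation. The quantities are:

- T, the total sum of squares of all n samples.
- P_G, the summed within-cluster sum of squares for G clusters.
- W_K and W_L, the within-cluster sums of squares of clusters K and L.
- B_KL, the SSW increase from joining K and L.

The statistics are then R2 = 1 − P_G/T, pseudo F = [(T − P_G)/(G − 1)] / [P_G/(n − G)], and pseudo t2 = B_KL / [(W_K + W_L)/(n_K + n_L − 2)]. Function names and toy data are hypothetical. Pseudo F needs G > 1 and P_G > 0, and pseudo t2 is undefined when both clusters are single samples.

```python
def centroid(points):
    """Mean vector of a set of samples."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def ssw(points):
    """Sum of squares of the samples about their own centroid."""
    c = centroid(points)
    return sum((p[j] - c[j]) ** 2 for p in points for j in range(len(c)))


def r2_and_pseudo_f(clusters):
    """R2 and pseudo F for a partition given as a list of clusters."""
    samples = [p for cluster in clusters for p in cluster]
    n, g = len(samples), len(clusters)
    total = ssw(samples)                    # T
    within = sum(ssw(c) for c in clusters)  # P_G
    r2 = 1.0 - within / total
    pseudo_f = ((total - within) / (g - 1)) / (within / (n - g))
    return r2, pseudo_f


def pseudo_t2(k, l):
    """Pseudo t2 for joining clusters k and l; None when undefined."""
    w_k, w_l = ssw(k), ssw(l)
    dof = len(k) + len(l) - 2
    if dof <= 0 or w_k + w_l == 0.0:
        return None
    b_kl = ssw(k + l) - w_k - w_l
    return b_kl / ((w_k + w_l) / dof)


# Hypothetical three-cluster partition of (Na, Mg) toy samples
clusters = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[10.0, 10.0], [10.0, 11.0]],
    [[20.0, 0.0], [21.0, 0.0]],
]
r2, f = r2_and_pseudo_f(clusters)
print(round(r2, 4), round(f, 1))
print(pseudo_t2(clusters[0], clusters[1]))  # 400.0
```

Evaluating these statistics at each level of the tree, from 10 clusters down to 1, produces curves like those in Figure 2.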
[Figure 1 image: tree diagram of the 81 yellowcake samples (horizontal axis: Sample Number), with Clusters #1, #2, #3, and #4 marked.]
Figure 1. Cluster of 81 yellowcake samples using Na and Mg classification elements.
[Figure 2 image: four panels (Semi-Partial R2, R2, Pseudo F, and Pseudo t2), each plotted against Number of Clusters.]
Figure 2. Semi-partial R2, R2, Pseudo F, and Pseudo t2 for clustering using the
classification elements Na and Mg.
PRINCIPAL COMPONENTS
Principal component analysis (PCA) is a mathematical method to reduce the number of
classification variables measured on a sample. For this project, we would like to reduce
the 15 classification elements to a smaller number of descriptive principal components.
The descriptive principal components are linear combinations of the original
classification elements. The first principal component represents the direction of
greatest spread in the elemental concentrations. The second principal component is
orthogonal to the first and represents the direction of next-greatest spread. PCA
continues to extract orthogonal principal components until all the variation in the
elemental concentrations is represented. The objective of PCA is to represent most of
the elemental concentration variability by a few principal components.
For example, PC1 and PC2 for the 81 yellowcake samples are the following linear
combinations of the concentrations of the 15 classification elements:
PC1 = 0.969*Na - 0.003*Mg - 0.011*Fe + 0.005*Mo - 0.003*Al
- 0.006*Ca + 0.018*K - 0.001*Zn + 0.247*Zr - 0.002*V
- 0.0004*Cu - 0.0009*Mn - 0.0004*As - 0.0019*Sr + 0.0009*Ti.
PC2 = 0.001*Na + 0.763*Mg + 0.522*Fe + 0.151*Mo + 0.342*Al
+ 0.055*Ca + 0.024*K - 0.002*Zn + 0.029*Zr + 0.022*V
- 0.0003*Cu + 0.0123*Mn - 0.002*As - 0.003*Sr + 0.002*Ti.
PCA extracts 15 principal components to represent the total variation of the elemental
concentrations. However, PC1 and PC2 account for 98.4% of the total variation and
therefore contain essentially all of the variation information in the yellowcake data.
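Principal components are the eigenvectors of the covariance matrix of the classification elements, and each component's share of the total variation is its eigenvalue divided by the trace. The sketch below is a dependency-free illustration using power iteration with deflation. It assumes that the unstandardized PCA used the covariance matrix of the raw concentrations. That is suggested by the dominance of high-variance elements in PC1 and PC2, but it is not stated in the report. The function names and toy data are hypothetical, and a production analysis would use a library eigen-solver.

```python
import math


def covariance_matrix(samples):
    """Sample covariance matrix of the classification elements (columns)."""
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix
    by power iteration (start vector assumed not orthogonal to it)."""
    d = len(matrix)
    v = [1.0 + 0.1 * i for i in range(d)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue
    eigenvalue = sum(v[i] * matrix[i][j] * v[j]
                     for i in range(d) for j in range(d))
    return eigenvalue, v


def principal_components(samples, count):
    """[(fraction of total variation, loadings), ...] for `count` PCs."""
    cov = covariance_matrix(samples)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))  # trace = total variation
    components = []
    for _ in range(count):
        eigenvalue, v = leading_eigenpair(cov)
        components.append((eigenvalue / total, v))
        # Deflate: remove the component just found before the next pass
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components


# Hypothetical toy data: two strongly correlated "elements"
toy = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.0]]
for fraction, loadings in principal_components(toy, 2):
    print(f"{fraction:.3f}", [round(x, 3) for x in loadings])
```

The signs of the loadings are arbitrary. This is why published PC equations, such as those for PC1 and PC2, can differ in sign between software packages without any change in meaning.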
Examination of the principal components shows that Na and Zr are the dominant
elements contributing to PC1 and that Mg, Fe, Mo, and Al are the dominant elements
contributing to PC2. Table 2 lists a strong correlation between Na and Zr and strong
correlations between Mg and Fe and between Mg and Al. These relationships suggest
that a plot of Mg versus Na should look much like a plot of PC1 versus PC2, and a
comparison of the two plots confirms that they present the same information.
Figure 3 presents a plot of Mg versus Na with clusters that correspond to the four
clusters found by Ward's method in Figure 1.
[Figure 3 image: Magnesium (µg/g) versus Sodium (µg/g), with symbols for Line 0, Line 1, and Line 2 and Clusters #1 to #4 labeled.]
Figure 3. Mg versus Na plot for the 81 yellowcake samples with Clusters #1, #2,
#3, and #4 corresponding to clustering by Ward's method in Figure 1.
STANDARDIZED CLASSIFICATION ELEMENTS
The previous procedure clustered the yellowcake samples with the classification
elements effectively weighted by their standard deviations. Another option is to
standardize the classification elements so that all of them have the same standard
deviation and are therefore given equal weight in clustering the yellowcake samples. The
concentration of a standardized classification element is defined as:
$$\text{Standardized Concentration} = \frac{\text{Concentration} - \text{Average}}{\text{Standard Deviation}}$$
The averages and standard deviations of the concentrations for the classification elements
are given in Table 1.
After standardization, all 15 classification elements have an average of 0.00 and a
standard deviation of 1.00. However, the standardization transformation does not change
the correlations between classification elements, so the correlations given in Table 2
apply equally to the original and the standardized classification elements. Because Zr,
Fe, and Al remain highly correlated with Na or Mg, they add little independent
information. Therefore, the 12 standardized classification elements Na, Mg, Mo, Ca, K,
Zn, V, Mn, Cu, As, Sr, and Ti were used with Ward's clustering method and PCA.
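Two properties stated in the standardization paragraph can be checked directly. The standardized values have average 0 and standard deviation 1, and the correlation between two elements is unchanged. This sketch assumes the sample standard deviation (n − 1 denominator) is used in the standardization formula. The function names and toy concentrations are hypothetical.

```python
import math
import statistics


def standardize(values):
    """(concentration - average) / standard deviation for one element."""
    average = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - average) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy concentrations (ug/g) for two elements
na = [1200.0, 5400.0, 38000.0, 900.0, 16000.0]
zr = [300.0, 1500.0, 9800.0, 250.0, 4100.0]
z_na, z_zr = standardize(na), standardize(zr)
print(statistics.mean(z_na), statistics.stdev(z_na))  # ~0 and ~1
print(pearson(na, zr), pearson(z_na, z_zr))           # equal up to rounding
```

Correlation is invariant because standardization is a positive linear rescaling of each element separately. Euclidean distances between samples do change, however, which is why the Ward clustering can change.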
Figure 4 is the tree diagram for Ward's clustering of the yellowcake samples using
the 12 standardized classification elements. The tree diagram clusters the yellowcake
samples into the same four clusters as Figure 1 but also displays some smaller
sub-clusters.
Figure 5 plots the four indicator statistics versus the number of clusters. The
semi-partial R2 plot shows relatively large changes at six clusters, and the R2 plot gives
the same result. The pseudo F also shows its largest relative change at six clusters, and
the pseudo t2 shows a relatively large change at five clusters.
Figure 6 is a plot of PC1 versus PC2 for the 12 standardized classification elements
corresponding to Figure 4. PC1 and PC2 represent only 52.5% of the total variation of
the 12 standardized classification elements; ten principal components would be required
to represent 98.1% of the total variation. Nevertheless, the PC1 versus PC2 plot shows the
four main clusters. Additional sub-clusters can be identified in Cluster #1 and Cluster #4,
and Cluster #3 may also have a sub-cluster of one or two samples.
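The number of principal components needed to reach a given share of the total variation follows from the cumulative sum of the eigenvalues, taken in decreasing order. For standardized elements (a correlation-matrix PCA), the eigenvalues sum to the number of elements, here 12. The eigenvalues below are hypothetical and were not taken from this study. They serve only to show the bookkeeping, and the function names are illustrative.

```python
def explained_fractions(eigenvalues):
    """Cumulative fraction of total variation, largest eigenvalue first."""
    total = sum(eigenvalues)
    running, cumulative = 0.0, []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        cumulative.append(running / total)
    return cumulative


def components_needed(eigenvalues, target):
    """Smallest number of components whose cumulative fraction >= target."""
    for count, fraction in enumerate(explained_fractions(eigenvalues), start=1):
        if fraction >= target:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues of a 12 x 12 correlation matrix (sum = 12)
eigenvalues = [3.6, 2.4, 1.6, 1.2, 0.9, 0.7, 0.5, 0.4, 0.3, 0.2, 0.12, 0.08]
print(round(explained_fractions(eigenvalues)[1], 3))  # share of PC1 + PC2
print(components_needed(eigenvalues, 0.90))
```

When the variation is spread over many eigenvalues of similar size, as with standardized trace elements, many components are needed to reach a high threshold. In that case a two-component plot shows only part of the structure.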
CONCLUSIONS
Cluster analysis is a descriptive technique. The solution is not unique and depends
strongly on the methods and criteria used. Cluster analysis always produces clusters,
even if there is no group structure in the data. When applying a cluster analysis, we are
hypothesizing that clusters exist, but this assumption may be false or weak. With this
caution, the overall conclusion is that the 81 yellowcake samples form four main clusters
with some possible sub-clusters. Cluster #1 contains all Line 2 yellowcake samples plus
Line 0 sample 11. Cluster #2 is a mixture of yellowcake samples from the Line 0 and
Line 1 data sets. Cluster #3 contains only Line 0 yellowcake samples with high Mg
concentrations. Cluster #4 contains only Line 0 yellowcake samples with high Na
concentrations. Additional sub-clusters may be identified in Cluster #1 and Cluster #4,
and Cluster #3 may also have a sub-cluster of one or two samples. Replicate measurements
of elemental content were not performed on each sample. Because of this lack of
replication, some clusters may reflect unusual or outlier measurements rather than
different source locations. Additional information may be needed to resolve this
ambiguity and to determine why these observations are extreme.
Figure 4. Ward's clustering of yellowcake samples using 12 standardized classification elements (Na, Mg, Mo, Ca, K,
Zn, V, Mn, Cu, As, Sr, and Ti).
[Figure 5 image: four panels (Semi-Partial R2, R2, Pseudo F, and Pseudo t2), each plotted against Number of Clusters.]
Figure 5. Semi-partial R2, R2, Pseudo F, and Pseudo t2 for clustering using
standardized classification elements (Na, Mg, Mo, Ca, K, Zn, V, Mn,
Cu, As, Sr, and Ti).
[Figure 6 image: Principal Component 2 versus Principal Component 1, with symbols for Line 0, Line 1, and Line 2 and Clusters #1 to #4 labeled.]
Figure 6. PC1 versus PC2 for the 12 standardized classification elements used
for Ward's clustering method in Figure 4.
ACKNOWLEDGEMENT
This research was sponsored by the Office of Nonproliferation and International Security
(NA-24), National Nuclear Security Administration (NNSA), U.S. Department of Energy
under contract DE-AC05-00OR22725 with Oak Ridge National Laboratory managed and
operated by UT-Battelle, LLC.
REFERENCES
1. Uranium Producers of America, Conventional Mining and Milling of Uranium
Ore, http://www.uraniumproducersamerica.com/tech.html.
2. Švedkauskaite-LeGore, J., Rasmussen, G., Abousahl, S., and van Belle, P.,
Investigation of the sample characteristics needed for the determination of the
origin of uranium-bearing materials, Journal of Radioanalytical and Nuclear
Chemistry, Vol. 278, No. 1 (2008), pp. 201-209.
3. Fernandes, E. A. N., Tagliaferro, F. S., Bode, P., Bacchi, M. A., and Sarriés, G. A.,
Characterisation of Components of Waste Rock Piles of Future Uranium
Mining Activities in Brazil Using INAA and Statistical Data Treatment,
Journal of Radioanalytical and Nuclear Chemistry, Vol. 244, No. 3 (2000),
pp. 595-598.
4. Keegan, E., Richter, S., Kelly, I., Wong, H., Gadd, P., Kuehn, H., and Alonso-Munoz, A.,
The Provenance of Australian Uranium Ore Concentrates by Elemental and
Isotopic Analysis, Applied Geochemistry, Vol. 23, No. 4 (2008), pp. 765-777.
5. SAS Institute Inc., SAS/STAT™ Guide for Personal Computers, Version 6
Edition, Cary, NC: SAS Institute Inc. (1987), pp. 283-357.
6. Milligan, G. W., and Cooper, M. C., An Examination of Procedures for
Determining the Number of Clusters in a Data Set, Psychometrika, Vol. 50 (1985),
pp. 159-179.
7. Hawkins, D. M., Muller, M. W., and ten Kooden, J. A., Cluster Analysis, in
Topics in Applied Multivariate Analysis, ed. D. M. Hawkins,
Cambridge: Cambridge University Press (1982).