SAMPLE SIZE AND CONFIDENCE WHEN APPLYING THE NSSDA

Ariza López, Francisco Javier (*); Atkinson Gordo, Alan David (*)
(*) Grupo de Investigación en Ingeniería Cartográfica. Dpto. de Ingeniería Cartográfica, Geodésica y Fotogrametría. Universidad de Jaén. Campus “Las Lagunillas” s/n. 23071. Jaén (Spain).
e-mail: [email protected]. Tel: +34953212469
e-mail: [email protected]. Tel: +34927257195

ABSTRACT

In this work a simulation process is used to study the variation and stability of National Standard for Spatial Data Accuracy (NSSDA) results depending on the sample size. Empirical results show that the NSSDA underestimates the error level present in the population, and that the positional accuracy estimation has a variability of 11% when using the recommended sample size of 20 points. Simulation results indicate that samples of about one hundred points are needed to reach an effective confidence level of 95%. The NSSDA is a methodology of shared risk between users and producers when accuracy is “as expected”, but in other cases the relation is altered, as the simulation results demonstrate.

INTRODUCTION

Since positional quality is essential in cartographic production, all mapping agencies have used statistical methods, here called tests, for its control. Among the different methods used, we can highlight the National Map Accuracy Standard (USBB, 1947), the Engineering Map Accuracy Standard (ASCE, 1983; ASP, 1985; Veregin, 1989; Giordano and Veregin, 1994), the ASPRS standard (Merchant, 1987; ASPRS, 1989), and the more recent National Standard for Spatial Data Accuracy (FGDC, 1998).

The National Standard for Spatial Data Accuracy (NSSDA), established by the Federal Geographic Data Committee in 1998, is a statistical methodology for evaluating the positional quality of a Geographic Data Base (GDB). The NSSDA is of compulsory fulfilment for US federal agencies producing analogue and/or digital cartographic data, and is ever more widely used all over the world. As in other positional quality control procedures, the coordinates of a set of points in the GDB are compared to the coordinates of the same points in a higher accuracy source, mainly a field survey. In this way an RMSE is derived from the discrepancies between pairs of coordinates. The NSSDA does not carry out a study of the presence of systematic errors, as it considers that “they might have been eliminated in the best way” (FGDC, 1998). Therefore, the NSSDA focuses only on the study of data dispersion.

The NSSDA gives results in a more open way than the previous tests because it leaves to the user’s judgement whether or not the derived accuracy reaches expectations; in practical terms, whether the product passes or fails the user’s accuracy requirements. So acceptance or rejection is the responsibility of the user. The test only tells us: “the product has been checked/compiled for N meters of horizontal/vertical accuracy at the 95% confidence level”. Table 1 summarizes the steps for applying the standard.

From a statistical point of view, one of the most controversial aspects of all the above-mentioned methodologies for positional control is the number and distribution of the control points. With regard to the number, which is our interest here, it should always be large enough for the hypothesis of normality to be fulfilled, as determined by the laws of large numbers. For this reason recommendations always suggest at least 20 points (FGDC, 1998; MPLMIC, 1999).
Nevertheless, this size seems to be very small, and some authors (Li, 1991) and institutions (Ordnance Survey of Great Britain) suggest larger sizes. Obviously, since an elimination of gross errors should always be performed, a higher number should be used. The number of points should be large enough to ensure, with a given level of confidence, that a GDB with a non-acceptable quality level will not be acquired; on the other hand, it must be as low as possible in order to minimize the cost of the control (Ariza, 2002).

Under the assumption of no systematic errors, our research focuses on the study of the variability of the estimated positional (only horizontal) accuracy of a GDB when applying the NSSDA with different sample sizes. In order to enable the generalization of results, synthetic Normal N(µP = 0, σ²P = 1) distributed populations of data are used. The simulation follows the process described in “Positional quality control by means of the EMAS test and acceptance curves” (Ariza and Atkinson, 2005), also presented at this XXII International Cartographic Conference. The confidence of the results is analyzed with reference to the so-called user’s and producer’s risks.

Table 1.- Summary of the NSSDA when applied to the horizontal component

1. Select a sample of a minimum of 20 check points (n >= 20).
2. Compute the individual errors for each point i (t: higher-accuracy source; m: GDB being tested):
   e_xi = x_ti - x_mi
   e_yi = y_ti - y_mi
3. Compute the RMSE for each component:
   RMSE_X = sqrt( Σ e_xi² / n )
   RMSE_Y = sqrt( Σ e_yi² / n )
4. Compute the horizontal RMSE and the accuracy using the appropriate expression:
   RMSE_R = sqrt( RMSE_X² + RMSE_Y² )
   If RMSE_X = RMSE_Y:
       ACCURACY_R = 1.7308 · RMSE_R = 2.4477 · RMSE_X = 2.4477 · RMSE_Y
   If RMSE_X ≠ RMSE_Y and 0.6 < (RMSE_min / RMSE_max) < 1:
       ACCURACY_R ≈ 2.4477 · 0.5 · (RMSE_X + RMSE_Y)

Note: If the error is normally distributed and independent in each of the x- and y-components, the factor 2.4477 is used to compute horizontal accuracy at the 95% confidence level (Greenwalt and Shultz, 1962).
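As an illustration of the procedure in Table 1, the following Python sketch computes the NSSDA horizontal accuracy from a set of check points. It is a minimal sketch, not part of the standard: the function name, the input arrays and the n >= 20 check are our own choices.

```python
import numpy as np

def nssda_horizontal_accuracy(x_test, y_test, x_map, y_map):
    """Illustrative sketch of the NSSDA horizontal procedure (Table 1).

    x_test, y_test: coordinates from the higher-accuracy source.
    x_map,  y_map:  coordinates of the same points in the tested GDB.
    Assumes gross errors have been removed and no systematic bias remains.
    """
    x_test, y_test = np.asarray(x_test, float), np.asarray(y_test, float)
    x_map, y_map = np.asarray(x_map, float), np.asarray(y_map, float)
    n = x_test.size
    if n < 20:
        raise ValueError("the NSSDA recommends at least 20 check points")

    # Individual errors and per-component RMSE
    ex, ey = x_test - x_map, y_test - y_map
    rmse_x = np.sqrt(np.sum(ex ** 2) / n)
    rmse_y = np.sqrt(np.sum(ey ** 2) / n)
    rmse_r = np.sqrt(rmse_x ** 2 + rmse_y ** 2)

    # Horizontal accuracy at the 95% confidence level
    if np.isclose(rmse_x, rmse_y):
        return 1.7308 * rmse_r           # = 2.4477 * RMSE_X = 2.4477 * RMSE_Y
    ratio = min(rmse_x, rmse_y) / max(rmse_x, rmse_y)
    if 0.6 < ratio < 1.0:                # circular error still approximable
        return 2.4477 * 0.5 * (rmse_x + rmse_y)
    raise ValueError("error distribution too elliptical for the NSSDA formula")
```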
This presentation is organized in four sections: the first deals with the analysis of accuracy variability depending on sample size with fixed population variability; the second presents properties of this methodology (underestimation and risks) that have not been reported previously; the third presents the analysis of the user’s and producer’s risks when the variability of the populations is considered; finally, conclusions are presented.

Nowadays there is a proposal for the revision of the NSSDA based on various suggestions: those coming from the National Digital Elevation Program, oriented towards adding instructions on how to test and report vertical accuracy in areas where a normal distribution cannot always be assumed; the proposal of Tilley (2002) to classify accuracy results derived from the NSSDA; and the claim by McCollum (2003) that the Greenwalt and Shultz (1962) estimator is inappropriately used in the NSSDA to determine a probability of 95%. Thus our results may also provide ideas for the redefinition of this standard.

VARIABILITY OF NSSDA ACCURACY ESTIMATIONS

In this paper simulation has been used as the base tool for analyzing the behaviour of the NSSDA methodology. The simulation process is similar to that applied to other positional control methodologies (for more detail see Ariza and Atkinson, 2005); it basically consists of three main steps, sketched in the code after this list:

- Simulation of populations. A hundred synthetic populations of well-known parameters (µP = 0, σ²P = 1, where “P” means population) are derived from a controlled statistical random value generation process. Single population values are considered positional error values.
- Simulation of samples. A thousand samples of different sizes (n = 10, 20, 30, and so on) are extracted from each population. The NSSDA is applied to each sample as if it were a single positional control test.
- Statistical computations. Result values are aggregated, deriving mean errors and the variation of the error, the latter giving an idea of the stability and reliability of the process.
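A minimal sketch of this simulation process is given below. It keeps the paper’s 100 populations and 1000 samples, but the finite population size (POP_SIZE), the random seed and the use of the averaged NSSDA expression 2.4477 · 0.5 · (RMSE_X + RMSE_Y) throughout are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

N_POPULATIONS = 100   # synthetic populations, as in the paper
N_SAMPLES = 1000      # samples drawn from each population, as in the paper
POP_SIZE = 100_000    # points per synthetic population (assumed here)

def simulate_nssda(n, factor=2.4477):
    """Mean and deviation of NSSDA ACCURACY_R estimates for sample size n.

    Each population is N(0, 1) in the x- and y-components; the NSSDA
    estimate is computed for every sample and then aggregated.
    """
    estimates = []
    for _ in range(N_POPULATIONS):
        ex = rng.normal(0.0, 1.0, POP_SIZE)   # population of x errors
        ey = rng.normal(0.0, 1.0, POP_SIZE)   # population of y errors
        for _ in range(N_SAMPLES):
            idx = rng.integers(0, POP_SIZE, n)
            rmse_x = np.sqrt(np.mean(ex[idx] ** 2))
            rmse_y = np.sqrt(np.mean(ey[idx] ** 2))
            estimates.append(factor * 0.5 * (rmse_x + rmse_y))
    estimates = np.asarray(estimates)
    return estimates.mean(), estimates.std()

for n in (10, 20, 50, 100):   # reduce the constants above for a quick run
    mean, dev = simulate_nssda(n)
    print(f"n={n:4d}  ACCURACY_R = {mean:.3f} m +/- {dev:.3f} m")
```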
The results of the process are shown in Table 2. Because the simulation is performed using a Normal N(µP = 0, σ²P = 1) distributed population, the theoretical value to be detected by the NSSDA is ACCURACY_R = 2.447 m, which corresponds to a circular error estimation with a probability of 95%. Because of the large number of simulations, the final results are very sound, with stability values decreasing from 0.7% for samples of minimum size down to 0.2% for samples of maximum size.

As can be observed in Table 2, a mean value of 2.416 m ± 0.382 m is obtained for sample sizes of 10 points. This supposes a 15.8% variability with respect to the mean observed value, which is inadmissible. For the size recommended by the NSSDA (n = 20) the observed value is ACCURACY_R = 2.432 m ± 0.270 m. This value is 0.6% less than the corresponding theoretical value to be detected, and in this case the variability is in the order of 11%, so that the stated accuracy actually has an 89% confidence level. The variation range decreases as the sample size increases, so that for sample sizes of 700 points it is circa 1%. Taking a sample size of 95 points, the mean observed value is ACCURACY_R = 2.444 m ± 0.121 m and the variability is within ±5% of that value. So if we want to work with a confidence level of 95%, it is not advisable to use fewer than 95-100 points for the control sample. In this case the simulation variability is about 0.4%, which means a variation interval between ±4.6% and ±5.4%. Also, if we want to limit the variability to a maximum of 5% (= ±2.5%), a sample of at least n = 275 points will be needed.

It is also possible to obtain the maximum variation value for a given sample size: the deviation value presented in column (c) of Table 2 is multiplied by a factor K related to the desired confidence level and added to the mean value of the same table. For instance, for n = 275 and a desired confidence level of 95%, K = 1.96, so from Table 2 we obtain:

ACCURACY_R (maximum) = ACCURACY_R ± K · Deviation = 2.446 m ± 1.96 × 0.061 m = 2.446 m ± 0.120 m
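This computation is straightforward to script. The following hedged sketch reproduces the worked example for n = 275 using the values of Table 2; it assumes SciPy is available for the Normal quantile.

```python
from scipy.stats import norm

def accuracy_bounds(mean_acc, deviation, confidence=0.95):
    """Interval for ACCURACY_R given the simulation mean and deviation (Table 2)."""
    k = norm.ppf(0.5 + confidence / 2.0)   # K = 1.96 for a 95% confidence level
    return mean_acc - k * deviation, mean_acc + k * deviation

# Worked example from the text: n = 275 points
low, high = accuracy_bounds(2.446, 0.061)
print(f"ACCURACY_R in [{low:.3f} m, {high:.3f} m]")   # 2.446 m +/- 0.120 m
```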
Other studies based on population normality (Li, 1991) suggest sample sizes of the same order.

Table 2.- Mean ACCURACY_R values and variability obtained by simulation of samples and populations

n (a) | NSSDA m (b) | Dev. ±m (c) | Variation ±% (d) | Stab. % (e)
10 | 2.416 | 0.382 | 15.8 | 0.7
15 | 2.426 | 0.312 | 12.9 | 0.6
20 | 2.432 | 0.270 | 11.1 | 0.5
25 | 2.434 | 0.241 | 9.9 | 0.5
30 | 2.437 | 0.219 | 9.0 | 0.5
35 | 2.438 | 0.203 | 8.3 | 0.4
40 | 2.439 | 0.189 | 7.8 | 0.4
45 | 2.440 | 0.178 | 7.3 | 0.4
50 | 2.441 | 0.168 | 6.9 | 0.4
55 | 2.441 | 0.160 | 6.6 | 0.4
60 | 2.442 | 0.153 | 6.3 | 0.4
65 | 2.442 | 0.147 | 6.0 | 0.4
70 | 2.442 | 0.141 | 5.8 | 0.4
75 | 2.443 | 0.136 | 5.5 | 0.4
80 | 2.443 | 0.131 | 5.4 | 0.4
85 | 2.443 | 0.127 | 5.2 | 0.4
90 | 2.443 | 0.124 | 5.1 | 0.4
95 | 2.444 | 0.121 | 5.0 | 0.4
100 | 2.444 | 0.115 | 4.7 | 0.4
125 | 2.444 | 0.099 | 4.1 | 0.4
150 | 2.446 | 0.088 | 3.6 | 0.3
175 | 2.446 | 0.083 | 3.4 | 0.4
200 | 2.445 | 0.075 | 3.1 | 0.4
225 | 2.446 | 0.070 | 2.9 | 0.4
250 | 2.445 | 0.065 | 2.6 | 0.4
275 | 2.446 | 0.061 | 2.5 | 0.3
300 | 2.446 | 0.058 | 2.4 | 0.3
325 | 2.445 | 0.055 | 2.3 | 0.3
350 | 2.446 | 0.052 | 2.1 | 0.3
375 | 2.446 | 0.048 | 2.0 | 0.3
400 | 2.446 | 0.046 | 1.9 | 0.3
425 | 2.446 | 0.043 | 1.8 | 0.3
450 | 2.446 | 0.042 | 1.7 | 0.3
475 | 2.446 | 0.040 | 1.6 | 0.3
500 | 2.446 | 0.039 | 1.6 | 0.3
525 | 2.447 | 0.037 | 1.5 | 0.3
550 | 2.446 | 0.034 | 1.4 | 0.2
575 | 2.446 | 0.033 | 1.3 | 0.3
600 | 2.446 | 0.031 | 1.3 | 0.2
625 | 2.446 | 0.030 | 1.2 | 0.2
650 | 2.446 | 0.028 | 1.1 | 0.2
675 | 2.446 | 0.026 | 1.0 | 0.2
700 | 2.446 | 0.025 | 1.0 | 0.2
800 | 2.447 | 0.024 | 1.0 | 0.2

Columns: (a) size (number of points) of the 1000 random samples; (b) simulation mean observed value of the horizontal accuracy (ACCURACY_R) obtained by applying the NSSDA with a 95% confidence level; (c) mean deviation of the simulation process with respect to the mean observed value; (d) the previous deviation expressed as a percentage of the mean observed value of the horizontal accuracy; (e) stability of the process when using a hundred random populations distributed as N(0,1).

The same tendency is shown graphically in Figure 1, where the X-axis refers to the size of the control sample and the Y-axis to the mean population value observed (estimated) through the sample when using N(µP = 0, σ²P = 1) populations in the simulation process. The wider, red dashed line corresponds to the value to be theoretically detected by the NSSDA: ACCURACY_R = 2.447 m. The series of points are the results of the simulation; they show a very clear tendency, approaching the theoretical value from below as the sample size increases. The two other dashed lines represent the decreasing tendency of the variability of the mean values.

[Figure 1 here: accuracy estimated by the NSSDA for 1 − α = 95%; NSSDA estimate (m, 2.410–2.450) versus points per sample n (0–800)]

Figure 1.- Mean ACCURACY_R values (points) and variability (black dashed lines) obtained by simulation. The theoretical NSSDA value corresponding to a N(µP = 0, σ²P = 1) distributed population is shown in red

UNDERESTIMATION, USER’S AND PRODUCER’S RISK

As shown in Figure 1, the mean estimated value for an NSSDA control is smaller than the corresponding value for the N(µP = 0, σ²P = 1) distributed population. In other words, in mean values the NSSDA underestimates the error level of the population, or overestimates the accuracy. But the variability of the mean values can be above or below the mean tendency, so that for a specific control there is a probability of a better or a worse estimation of the RMSE_R, and that implies a certain kind of risk that we call producer’s and user’s risk. Figure 2 shows a graphical interpretation: the area between the upper dashed line and the horizontal red one, which corresponds to the expected population value (ACCURACY_R = 2.447 m for a N(µP = 0, σ²P = 1) distributed population), represents the producer’s risk. The same is true for the area between the lower dashed line and the same horizontal red line, but in this case corresponding to the user. The interpretation given here to such probabilities, or kinds of risk, is as follows:

Producer’s risk is the probability of an estimation of the population’s RMSE_R value greater than it actually is, so that the conclusion of the NSSDA is a worse accuracy than the true one. For a given sample size, for instance n1, this occurs proportionally to the ratio between segments A1B1 and A1D1 (producer’s risk = A1B1/A1D1), where segment A1B1 is the width of the producer’s risk area for n1, and A1D1 is the total variability of the mean estimation when the sample size is n1.

User’s risk is the probability of an estimation of the population’s RMSE_R value smaller than it actually is, so that the conclusion of the NSSDA is a better accuracy than the true one. For a given sample size, for instance n1, this occurs proportionally to the ratio between segments B1D1 and A1D1 (user’s risk = B1D1/A1D1), where segment B1D1 is the width of the user’s risk area for n1, and A1D1 is the total variability of the mean estimation when the sample size is n1.

[Figure 2 here: NSSDA estimates versus sample size n (10–70); the producer’s risk area (RP) lies between the estimate curve and the population NSSDA value, above it, and the user’s risk area (RU) below it; points A, B and D mark the segment widths used in the ratios]

Figure 2.- Probabilities of better and worse estimations of RMSE_R than the actual value (user’s and producer’s risk)

Numerically speaking, both probabilities are very similar and also tend towards equality (Table 3); curiously, the greater differences occur for the lower sample sizes, although these are limited to 4.2% for n = 10. This behaviour implies a certain equitability, because producer and user share, more or less, the same risk.

Table 3.- Probabilities of better and worse estimations of RMSE_R than the actual value (user’s and producer’s risk)

n | RU | RP
10 | 0.521 | 0.479
15 | 0.517 | 0.483
20 | 0.514 | 0.486
25 | 0.514 | 0.486
30 | 0.512 | 0.488
35 | 0.511 | 0.489
40 | 0.511 | 0.489
45 | 0.510 | 0.490
50 | 0.509 | 0.491
55 | 0.510 | 0.490
60 | 0.508 | 0.492
65 | 0.509 | 0.491
70 | 0.509 | 0.491
75 | 0.508 | 0.492
80 | 0.508 | 0.492
90 | 0.508 | 0.492
95 | 0.506 | 0.494
100 | 0.507 | 0.493
110 | 0.507 | 0.493
125 | 0.508 | 0.492
150 | 0.503 | 0.497
175 | 0.503 | 0.497
200 | 0.507 | 0.493
225 | 0.504 | 0.496
250 | 0.508 | 0.492
275 | 0.506 | 0.494
300 | 0.504 | 0.496
400 | 0.506 | 0.494
500 | 0.507 | 0.493
600 | 0.508 | 0.492
700 | 0.510 | 0.490
800 | 0.500 | 0.500
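The probabilities of Table 3 can be estimated directly from the simulated NSSDA values as the fractions falling below and above the theoretical population value. A minimal sketch, assuming an array of simulated ACCURACY_R estimates like the one built inside the simulation sketch above:

```python
import numpy as np

THEORETICAL = 2.447  # ACCURACY_R for a N(0, 1) distributed population

def user_producer_risk(estimates):
    """Fractions of simulated NSSDA estimates below/above the true value.

    Below the true value -> accuracy judged better than it is (user's risk);
    above it -> accuracy judged worse than it is (producer's risk).
    """
    estimates = np.asarray(estimates)
    ru = np.mean(estimates < THEORETICAL)    # user's risk
    rp = np.mean(estimates >= THEORETICAL)   # producer's risk
    return ru, rp
```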
VARIABILITY OF USER’S AND PRODUCER’S RISK

Until now we have worked under the assumption of a N(µP = 0, σ²P = 1) distributed population and analyzed what can occur when estimating from a sample of a given size. Now we analyze the behaviour of the NSSDA when a N(µP = 0, σ²P = 1) distributed population is expected but another normal N(µP = 0, σ²P = D²) distributed population (D ≠ 1) is actually at hand, determining the user’s and producer’s risks for that condition.

For this analysis we have used a simulation process similar to that described above, but changing the variation behaviour when creating the random populations. A set of normally distributed populations was synthetically created following N(µP = 0, σ²P = D²), where D = 0.8, 0.85, 0.9, 0.95, 1.00, 1.05, 1.10, 1.15 and 1.20. For each synthetic population a thousand samples of different sizes (n = 10, 20, 30, and so on) were extracted, and the NSSDA was applied to each sample as if it were a single positional control test.

The results of this process are presented in Figure 3, where the horizontal axis corresponds to sample size and the vertical axes to ACCURACY_R (left) and D (right). Figure 3 shows very similar tendency lines, but shifted vertically. The wider dashed line corresponds to the previously studied situation where the population follows N(µP = 0, σ²P = 1), and is therefore the same line as in Figure 1. Tendency curves above the wider dashed line correspond to the cases where D > 1, and curves below it to those where D < 1.

[Figure 3 here: ACCURACY_R (m, left axis, 1.9–3.0) and population deviation D (right axis, 0.8–1.2) versus sample size (0–140), one tendency curve per value of D]

Figure 3.- Evolution of ACCURACY_R values for different population deviations versus sample size

The different values of D can be considered, in relation to D = 1, as exigency ratios (ER) implying a detected nominal accuracy value when applying the NSSDA, and vice versa. This idea is presented in Table 4. For example, ER = 0.8 means that we require ACCURACY_R = 1.957 m but the estimation from the sample gives ACCURACY_R = 2.446 m, so that our exigency is 80% (= 1.957/2.446) of the actual accuracy of the product. The opposite case occurs when ER > 1: for instance, ER = 1.2 means that we require ACCURACY_R = 2.935 m but the estimation from the sample gives ACCURACY_R = 2.446 m, so the product is 120% (= 2.935/2.446) more accurate than required.

Table 4.- Exigency ratios or D values and corresponding detected nominal values for the NSSDA (ACCURACY_R)

ER | 0.80 | 0.85 | 0.90 | 0.95 | 1.00 | 1.05 | 1.10 | 1.15 | 1.20
ACCURACY_R (m) | 1.957 | 2.079 | 2.201 | 2.324 | 2.446 | 2.568 | 2.691 | 2.813 | 2.935

The variability of values around the mean described in the previous section now plays a very important role when studying the behaviour of the NSSDA when expecting a N(µP = 0, σ²P = 1) distributed population but actually working with a N(µP = 0, σ²P = D²) distributed population. In order to use an acceptance index in the form of a probability, we consider the following rules, illustrated by the sketch after this list:

R1: If D < 1 (ER > 1), the accuracy of the population is better than expected, which means that it would be considered as satisfactory or accepted. So when performing the simulation we count the number of cases where the observed value is ACCURACY_R < 2.447 m, expressed as an acceptance percentage of the total number of cases (the number of times we are able to say that accuracy is better than expected).

R2: If D > 1 (ER < 1), the accuracy of the population is worse than expected, which means that it would be considered as not satisfactory or not accepted. So when performing the simulation we again count the number of cases where the observed value is ACCURACY_R < 2.447 m, expressed as an acceptance percentage of the total number of cases (the number of times we are not able to say that accuracy is worse than expected).
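The acceptance index defined by rules R1 and R2 therefore reduces to counting the estimates that fall below the nominal value. A minimal sketch under the same assumptions as the earlier simulation code (parameter names and the per-sample generation shortcut are ours):

```python
import numpy as np

NOMINAL = 2.447  # expected ACCURACY_R for a N(0, 1) distributed population

def acceptance_percentage(n, d, n_samples=1000, seed=0):
    """Percentage of samples whose NSSDA estimate falls below NOMINAL
    when the true population is N(0, d**2) in each component."""
    rng = np.random.default_rng(seed)
    accepted = 0
    for _ in range(n_samples):
        ex = rng.normal(0.0, d, n)   # sample of x errors with deviation d
        ey = rng.normal(0.0, d, n)   # sample of y errors with deviation d
        acc = 2.4477 * 0.5 * (np.sqrt(np.mean(ex**2)) + np.sqrt(np.mean(ey**2)))
        accepted += acc < NOMINAL
    return 100.0 * accepted / n_samples

# Example: a worse-than-expected population (D = 1.2) checked with n = 20 points
print(acceptance_percentage(20, 1.2))
```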
Figure 4 shows the results obtained for the above-mentioned process, representing acceptance values as percentages. The wider black curve corresponds to the case D = 1 (labelled 1.00); its value is a little more than 50% because of the accuracy underestimation of the NSSDA. The red curves correspond to the cases where D < 1 (labelled 1.05 up to 1.20): here quality is better than expected, user acceptance increases, and the producer’s risk equals 100% minus the acceptance percentage (a good product can be rejected in this percentage of cases). The blue curves correspond to the cases where D > 1 (labelled 0.95 down to 0.80): here quality is worse than expected, and user acceptance is a risk (a bad product can be accepted in that percentage of cases) which decreases as the sample size n increases.

[Figure 4 here: acceptance (%) versus sample size (0–140), one curve per exigency ratio, labelled 0.80 to 1.20]

Figure 4.- Evolution of acceptance levels for different population deviations versus sample size

CONCLUSIONS

By means of a simulation-based methodology, the variations of the NSSDA ACCURACY_R and its risk behaviour have been analyzed. The statistical analysis is based on the use of normally distributed synthetic populations, which ensures the control of the process, the generality of the results, and their easy applicability to real cases. The main conclusions derived from the results are:

- The NSSDA has a slight tendency to underestimate the error level of the population, that is, to overestimate accuracy.
- For the minimum proposed sample size (20 points) the variability of the results is in the order of 11%, which actually means a confidence level of 89%.
- In order to have a 95% confidence level on the estimation, and variability within a range of ±5%, the sample size must be in the order of 100 points.
- Because of its statistical formulation, the NSSDA accuracy estimation gives similar user’s and producer’s risks, which means a shared-risk behaviour.
- If the variability of the population is greater or smaller than expected, the user’s and producer’s risks change. We have derived a family of curves that can be used by users to determine the sample size needed to limit their risk, and by producers to analyze the trade-offs between their product’s quality and its acceptance, in order to establish the capacity of the production process.

ACKNOWLEDGEMENTS

This work has been partially funded by the Spanish Ministry of Science and Technology under grant nº BIA2003-02234.

REFERENCES

ARIZA, F.J. (2002). Control de Calidad en la Producción Cartográfica. Ra-Ma.
ARIZA, F.J.; ATKINSON, A. (2005). Positional quality control by means of the EMAS test and acceptance curves. In Proceedings of the XXII International Cartographic Conference, La Coruña, Spain.
ASCE (1983). Map Uses, Scales and Accuracies for Engineering and Associated Purposes. American Society of Civil Engineers, Committee on Cartographic Surveying, Surveying and Mapping Division, New York.
ASP (1985). Accuracy Specification for Large-Scale Line Maps. PE&RS, vol. 51, nº 2.
ASPRS (1989). Accuracy Standards for Large Scale Maps. PE&RS, vol. 56, nº 7.
ATKINSON, A. (2005). Control de calidad posicional en cartografía: análisis de los principales estándares y propuesta de mejora. Doctoral thesis, Universidad de Jaén, Jaén.
FGDC (1998). Geospatial Positioning Accuracy Standards, National Standard for Spatial Data Accuracy. FGDC-STD-007-1998. http://www.fgdc.gov/
GIORDANO, A.; VEREGIN, H. (1994). Il controllo di qualità nei sistemi informativi territoriali. Il Cardo Editore, Venezia.
GREENWALT, C.; SHULTZ, M. (1962). Principles of Error Theory and Cartographic Applications. ACIC Technical Report nº 96. ACIC, St. Louis.
MCCOLLUM, J. (2003). Map Error and Root Mean Square. In Proceedings of the Sixteenth Annual Geographic Information Sciences Conference (TGGIS 2003), Towson University, Department of Geography and Environmental Planning.
MERCHANT, D. (1987). A Spatial Accuracy Specification for Large Scale Topographic Maps. PE&RS, vol. 53.
MPLMIC (1999). Positional Accuracy Handbook. Minnesota Planning Land Management Information Center. http://www.mnplan.state.mn.us/press/accurate.html
TILLEY, G. (2002). A Classification System for National Standards for Spatial Data Accuracy. In Proceedings of the Fifteenth Annual Geographic Information Sciences Conference (TGGIS 2002), Towson University, Department of Geography and Environmental Planning.
USBB (1947). United States National Map Accuracy Standards. U.S. Bureau of the Budget.
VEREGIN, H. (1989). Taxonomy of Errors in Spatial Data Bases. Technical Paper 89-12, NCGIA, Santa Barbara.