Folia Geobotanica 42: 209-216, 2007
TO SAMPLE OR NOT TO SAMPLE?
THAT IS THE QUESTION ... FOR THE VEGETATION SCIENTIST
Alessandro Chiarucci
Department of Environmental Science "G. Sarfatti", University of Siena, Via P.A. Mattioli 4, 53100 Siena, Italy;
e-mail [email protected]
Abstract: LÁJER (2007) raised the problem of using a non-random sample for statistical testing of plant
community data. He argued that this violates basic assumptions of the tests, resulting thus in non-significant
results. However, a huge part of present-day knowledge of vegetation science is still based on non-random,
preferentially collected data of plant communities. I argue that, given the inherent limits of preferential
sampling, a change of approach is now necessary, with the adoption of sampling based on random principles
seeming the obvious choice. However, a complete transition to random-based sampling designs in vegetation
science is limited by the as yet undefined nature of plant communities and by the still widespread opinion that plant
communities have a discrete nature. Randomly searching for such entities is almost impossible, given their
dependence on scale of observation, plot size and shape, and the need for finding well-defined types. I conclude
that the only way to solve this conundrum is to consider and study plant communities as operational units. If the
limits of the plant communities are defined operationally, they can be investigated using proper sampling
techniques and the collected data analyzed using adequate statistical tools.
Keywords: Plant communities, Vegetation sampling, Vegetation science
INTRODUCTION
LÁJER (2007) claimed that applying rigorous statistical tests on vegetation data collected
using the phytosociological method is not valid, because these data are collected using a
subjective selection of observational units, thus violating the properties of randomness,
known probability and independence of statistical sampling upon which statistical tests are
built. LÁJER (2007) remarked that in population and community ecology statistical inference
is valid only if probabilistic sampling is undertaken (KENKEL et al. 1989); otherwise, only a model-based
inference can be attempted, and this strictly depends on the assumptions of the adopted model
(STEVENS 1994). For anybody with a basic knowledge of statistics and sampling theory,
this statement is so obvious that it is trivial. However, several vegetation scientists continue to
ignore these arguments and collect their data on the basis of preferential choices, but then
process these data as if they were based on a probabilistic sampling design.
I will argue that the reason the process of sampling in vegetation science is still facing such
problems is linked to the definition of plant community itself. I will then discuss some
possible ways to overcome this long-standing conundrum.
Forum: Analysis of non-randomly sampled data sets in vegetation ecology
The nature of plant communities
The historically most important discussion in vegetation science was the long, and
sometimes harsh, debate on the nature of plant communities, with the community-unit theory
(CLEMENTS 1916) at one end and the individualistic hypothesis (GLEASON 1926) at the other
end. By the 1970s after the work of J.T. Curtis and R.H. Whittaker of the Wisconsin school, in
the English speaking world the continuum theory was widely accepted as a valid description
for vegetation (NICOLSON 2001). In Europe such agreement was never reached (MIRKIN
1987, MORAVEC 1989). Hence, vegetation and plant communities remain something for
which formal definitions have never been globally recognized.
In Europe, the community-unit theory still has many supporters and a lot of research
follows the phytosociological method. This discipline, representing an outcome of the
community-unit theory, searches for discrete plant communities by using a subjective
selection of sampling units, named relevés (EWALD 2003). In phytosociology, "The term
plant community involves the entirety of plant stands (vegetation units, phytocoenosis), as
well as the type which can be abstracted from these stands (basic type, elemental type); this
type is determined by a characteristic species combination and certain site conditions" (POTT
1996). Communities of any rank are designated as syntaxa (POTT 1996), with the basic level
being the association (BRAUN-BLANQUET 1965, WILLNER 2006). Thus, in present-day
phytosociology the association is considered a real object, the specimens of which should be
recognized in the field (see e.g. RIVAS-MARTÍNEZ 1996: 168) and recorded.
A different view is that based on a more general definition of plant communities.
According to PIELOU (1977), the term "community" refers to all the organisms belonging to a
taxonomic group higher than species and present within a delineated study area. The concept
used by PIELOU (1977), and others, refers to a pragmatic approach in defining plant
communities, which does not need to recognize the communities in the field. They can be
defined in a pragmatic way and a delineated area can be a small quadrat to a whole island, or
even a whole continent. This argument has been the subject of discussion also in recent years.
In a paper entitled "Does vegetation science exist", WILSON (1991) asked "do plant
communities exist in any more meaningful sense, as integrated, discrete entities?" and
concluded with the provocative sentence "Isn't it only the fact that we like vegetation that
keeps us going as vegetation scientists?". In commenting on this paper, PALMER & WHITE
(1994b) found a total lack of agreement on how ecosystems and communities were defined by
different authors and concluded that "Communities have no universally recognized
integrative, circumscribing force. Communities therefore must be defined and studied
operationally. If, as Wilson suggests, an integrative, circumscribing force is a necessary
precondition for the existence of vegetation science, then vegetation science has no hope of
discovering such a force". Paralleling a definition of ecosystem given by LINDEMAN (1942),
PALMER & WHITE (1994b) suggested that ecologists should "define community
operationally, with as little conceptual baggage as possible, so that we can put the debate
about their existence behind us". They then proposed to define the community as "the living
organisms present within a space-time unit of any magnitude". This definition corresponded
well to that given and commented on by PIELOU (1977).
Vegetation and vegetation sampling
According to ELZINGA et al. (2001), sampling is the process of selecting a part of
something with the intent of showing the quality, style, or nature of the whole. Also,
"A sample is a number of observations from a universe" (FORD 2001: 220) and the "universe
is the set of all observations that could possibly be obtained that follow the probability law"
(FORD 2001: 221). Translated into a vegetation description process, this means collecting
data, e.g. recording species composition in field quadrats within a certain vegetation (e.g. a
forest), and then generalizing the descriptive features, such as dominant and frequent species,
obtained from the sample (the quadrats) to the universe, that is the whole vegetation (e.g. the
whole forest). Sampling can be done also for other purposes, such as environmental
correlation, species association and patterns. WILSON (2007) claimed that these latter
purposes are less affected by sampling problems. I maintain that a biased sample is biased for any
purpose, but in this contribution I will concentrate on the issue of spatial inference on
community composition. This is a very typical argument also for traditional phytosociology
and other descriptive approaches. In fact, when a phytosociologist produces a vegetation map,
he/she describes the species composition of a whole part of a territory (the patches of the same
class) from sample data (the plots/relevés). Even if not explicit, this is a typical process of
spatial inference and its validity strictly depends on the representativeness of the sample.
If we wish to apply a robust sampling protocol, the statistical population of concern must be
formally specified at the very first step. Apart from practical difficulties in the field that limit
the application of formal protocols, the main problem in the process of sampling vegetation
arises from the lack of a unique definition of vegetation and plant community. How can the
population (in a statistical sense) of concern be defined and used for a probabilistic sampling,
if a simple, recognized, and unique definition of plant community or vegetation is missing?
This has clear consequences on the possibility of obtaining adequate data for statistical testing
(LÁJER 2007). In practical terms, two important issues to be decided when planning a field
sampling are: (i) where to locate sampling units and (once this has been decided), (ii) how
much area needs to be sampled to obtain adequate information on vegetation composition
and/or structure. These two issues are both dependent on the adopted definition of plant
communities. Other issues may also arise, such as the number of sampling units and their
shape, but they are less dependent on the definition of plant communities and I will not discuss
them here.
Location of sampling units
If the plant community is a discrete unit, as argued by phytosociologists (e.g.
RIVAS-MARTÍNEZ 1996, POTT 1996), then this must be recognized in the field before being
sampled. In contrast, if the plant community is an operational concept, as suggested by
PIELOU (1977) or PALMER & WHITE (1994b), this concept has to be formally defined before
the beginning of field sampling.
In the case of discrete units, it is impossible to decide a priori where to locate the sampling
units and how large these should be to represent the natural vegetation unit: these two issues
both depend on the floristic homogeneity and the presence of diagnostic species. It is only
possible to recognize a plant community directly in the field! In fact, the classic definition of
plant communities by BRAUN-BLANQUET (1965) markedly stressed the physiognomic and
floristic homogeneity of the area to be sampled, but did not provide indications of the
procedure for locating the observational units. BRAUN-BLANQUET (1965) wrote "As a
general rule, every sample plot, whether large or small, precisely delimited or not, should
show the greatest possible uniformity, not only in regard to its floristic composition, which
determines the appearance or physiognomy of the community, but also in regard to soil and
relief as far as these can be observed". This leads to a high subjectivity in locating the
observational units and makes it impossible to frame this process into a probabilistic
procedure. However, to frame this approach into the proper historical context, I recall that
Braun-Blanquet worked during the first half of the 20th century, when sampling theory was
not yet well developed.
Of course, phytosociologists often just collected data in patches of vegetation, without
taking into account the association concept in the field, and then analyzed the data using
multivariate methods to search for groupings. This is likely to reduce the subjectivity but the
exact sites for data collection are subjectively selected, thus violating the assumptions of
randomness, independence and known probability. Sometimes the homogeneous area to be
sampled can be delineated by more objective tools, such as remote sensing. Nevertheless, in
recent reviews of the phytosociological method, the selection of observational units is still
described as driven by the expertise of the vegetation scientist rather than by a probabilistic
approach (e.g. POTT 1996, RIVAS-MARTÍNEZ 1996). However, these authors often suggest
that statistically valid information can be extracted from the data collected using this
approach, thus implying that generalization can be made from them. As an example,
RIVAS-MARTÍNEZ (1996: 150) wrote that "... with all this information, by means of an
inductive and statistical method based on the trueness of the phytosociological sample of the
vegetation, (the phytosociology) attempts to create a hierarchical and universal typology, with
the association being the basic unit of the system". The reader is induced to imagine that a
level of generalization can be performed by using these data and the resulting synthesis.
However, as clearly stated by LÁJER (2007), and many others (e.g. KENKEL et al. 1989), this
is not possible when using statistical tools developed for design-based methods that need a
probabilistic sampling design.
In the description of the methods for data collection, usually the phytosociologists state that
observational units should be "representative" of the whole vegetation stand. However, it is
not specified whether the representativeness referred to is statistical (and thus defined by
independence, randomness and known probability of sampling) or whether, as is more likely,
it is somehow perceived by the surveyor as "typical", "expected", "frequent" or "usual" for
that observational unit under the particular condition (EWALD 2003, FERRETTI & CHIARUCCI
2003). Thus, when the representativeness of the observational unit is driven by the subjective
expertise of the vegetation scientist in the field, it is not possible to consider this process as
a true sampling. This process can be better described as an expert recognition of objects and
consequently the data cannot be analyzed by inferential statistics (LÁJER 2007). In addition,
a circular reasoning can emerge from the process of recognizing discrete objects in the field
before sampling and then analyzing these field data to make generalizations. This circularity
of reasoning was evidenced by many authors as one of the basic problems of present-day
phytosociology (see e.g. EWALD 2003, WILLNER 2006).
In the case of plant communities as operational units (PIELOU 1977, PALMER & WHITE
1994b) these problems do not arise, because it is possible to decide a priori that a plant
community is the assemblage of plant species within a certain fixed area, irrespective of
whether or not this represents a natural unit, a unit with a certain degree of species
interactions, or a unit linked by any other integrative, circumscribing force. This is a very
simple solution to a very complex problem! These units can, in fact, be located by any
probabilistic procedure and consequently do not face the problems associated with the use of
statistical tests. Restricted (or stratified) randomization can help to ensure that all the
vegetation types, including the rarest ones, are sampled. This approach retains a statistical
validity of the sample, whilst ensuring a more even spread of sample plots over the area.
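As a minimal sketch of restricted (stratified) randomization, the following Python fragment draws the same number of random points inside each stratum. The rectangular strata, the stratum names and the function names are hypothetical and serve only as an illustration; mapped vegetation types are usually irregular in shape, and the allocation of points among strata is a design decision that is not prescribed here.

```python
import random


def stratified_random_points(strata, n_per_stratum, seed=None):
    """Draw the same number of random points inside each stratum.

    strata maps a stratum name to a rectangle (x_min, x_max, y_min, y_max).
    Rectangles are a simplifying assumption made for this sketch.
    """
    rng = random.Random(seed)
    sample = {}
    for name, (x_min, x_max, y_min, y_max) in strata.items():
        # Within a stratum, x and y are drawn independently and uniformly.
        sample[name] = [
            (rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))
            for _ in range(n_per_stratum)
        ]
    return sample


def sampling_intensity(strata, n_per_stratum):
    """Points per unit area in each stratum: known, although unequal among strata."""
    return {
        name: n_per_stratum / ((x_max - x_min) * (y_max - y_min))
        for name, (x_min, x_max, y_min, y_max) in strata.items()
    }


# Hypothetical study area (metres): a widespread type and a rare one.
strata = {
    "common_type": (0.0, 900.0, 0.0, 1000.0),
    "rare_type": (900.0, 1000.0, 0.0, 1000.0),
}
points = stratified_random_points(strata, n_per_stratum=10, seed=1)
intensity = sampling_intensity(strata, n_per_stratum=10)
```

In this hypothetical layout the rare type is sampled nine times more intensively than the common one. The sampling intensity nevertheless remains known for every stratum, which is what allows design-based estimates to be weighted correctly.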
Size of sampling units
Another important problem is connected with the area to be sampled. Vegetation is
a natural phenomenon showing a strong spatial dependence. BRAUN-BLANQUET (1965)
indicated that if the uniform community is circumscribed it can be completely surveyed, but
"... if the uniform community is too extensive, the investigator must be satisfied with samples
of vegetation". It is then suggested that the minimum-area approach should be adopted to
determine the area size of each observational unit. At the time of Braun-Blanquet, the fact
that area is one of the most important factors influencing communities was already known
(ARRHENIUS 1921, GLEASON 1922), but the concept that spatial scale is itself a determinant
of species richness and composition was not assimilated by the schools devoted to the
vegetation description. In recent years there have been clear statements on the inapplicability
of the minimum-area concept (BARKMAN 1989, VAN DER MAAREL 1996), but these have
remained almost completely unheeded, and this concept is still used by present-day
phytosociological literature (see e.g. RIVAS-MARTÍNEZ 1996). In a recent analysis of two
large data sets (comprising 41,174 and 27,365 phytosociological relevés respectively), CHYTRÝ
& OTYPKOVÁ (2003) found that "the effect of variable plot sizes on vegetation analysis and
classification is not sufficiently known..." and "... in some situations, sampling in either small
or large plots may result in assignment of relevés to different phytosociological classes or
habitat types. Therefore defining vegetation and habitat types as scale-dependent concepts is
needed". The authors suggested that fixed size plots should be used.
In essence, it is not possible to make rigorous comparisons of species composition or
richness of plots that differ in area (CONNOR & MCCOY 1979, MAGURRAN 1988, PALMER &
WHITE 1994a, PEET et al. 1998, WILLIAMSON 2003). Also in this case, the solution provided
by considering plant communities as operational units (PIELOU 1977, PALMER & WHITE
1994b), and not as natural units, is simple and powerful because it allows planning
a probabilistic sampling design and using all the needed statistical tests, simply by deciding
that the analysis is performed at one spatial scale (or more than one).
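A short worked relation may illustrate why richness values from plots of different area are not directly comparable. It assumes, purely for illustration, the power-law form of the species-area relationship commonly attributed to ARRHENIUS (1921), where S is the number of species, A the plot area, and c and z are fitted constants; the value z = 0.25 is a hypothetical choice and not a result reported here.

```latex
S = c\,A^{z}
\quad\Longrightarrow\quad
\frac{S_2}{S_1} = \left(\frac{A_2}{A_1}\right)^{z},
\qquad
\text{e.g. } A_2 = 4A_1,\; z = 0.25
\;\Rightarrow\;
\frac{S_2}{S_1} = 4^{0.25} = \sqrt{2} \approx 1.41
```

Under these assumptions, a plot four times larger would be expected to hold about 41% more species in the very same vegetation, so a difference of this size between plots of unequal area says nothing about the communities themselves.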
What can we do? An operational approach
Obtaining results with a general validity from samples of plant communities is clearly
a formal problem that needs a similarly formal solution. If vegetation scientists continue to
subjectively decide where to locate the sample plots and how large these should be, the
criticisms made by LÁJER (2007) will remain valid and no generalization will be possible! As
clearly stated by LÁJER (2007), to be valid any statistical test should be applied to a sample
obtained by some sort of probabilistic procedure. This implies that all sampling units drawn
from the studied statistical population should have a known chance of being sampled and that
the sampling of each unit is independent from the sampling of any other. In addition, I would add
that outside the context of a formal sampling design, the use of terms like sample units and/or
sample plot may be questionable. However, how is it possible to probabilistically collect
a sample of something for which a formal definition is not available? It is first necessary to
agree on a definition of vegetation and plant communities! In my opinion, "vegetation is the
spatial and temporal coexistence of plants" (by "coexistence" I do not mean that a mechanism
is necessarily implied; by "plants" I mean individuals of the same or different species).
Vegetation is a general phenomenon that is investigated and modelled using the plant
community concept. Recalling the work of PIELOU (1977), PALMER & WHITE (1994b), and
many others, I (re)propose the following operational definition: "A plant community is
formed by all the plants living in a given unit of space and time". Plant communities defined in
such a way can show regularities in patterns and assembly rules at any spatial and temporal
scale and these are investigated by vegetation science. As a consequence, the study of
vegetation and plant communities can be performed at different spatial and temporal scales,
depending on the purpose of the study itself. If vegetation scientists used similarly operational
definitions, they would be able to use probabilistic sampling and statistical testing. As
suggested by LÁJER (2007), a correct sample of plant communities in an area can be obtained,
e.g. by a random selection of points within this area, by using independent coordinates for
each point, and then using fixed-area plots centred on these points. Probabilistic
approaches other than pure random sampling can be adopted to ensure adequate sampling of
habitat types or environmental strata, or to solve other special issues. WILSON (2007) stated that
a win-win solution is provided by restricted randomization!
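The random selection of points with independent coordinates, followed by fixed-area plots centred on them, can be sketched in Python as follows. The rectangular study area, the square plot shape, the function name and all numerical values are assumptions made for illustration. Keeping plot centres half a side away from the boundary, so that every plot lies inside the area, is only one possible way to handle edges.

```python
import random


def random_fixed_area_plots(x_min, x_max, y_min, y_max, n_plots, plot_side, seed=None):
    """Return square plots of fixed side length centred on random points.

    Each plot is a tuple (x0, y0, x1, y1) giving its lower-left and upper-right corners.
    """
    rng = random.Random(seed)
    half = plot_side / 2.0
    plots = []
    for _ in range(n_plots):
        # The two coordinates of each centre are drawn independently and uniformly.
        cx = rng.uniform(x_min + half, x_max - half)
        cy = rng.uniform(y_min + half, y_max - half)
        plots.append((cx - half, cy - half, cx + half, cy + half))
    return plots


# Hypothetical example: twenty 10 m x 10 m plots in a 1 km x 1 km study area.
plots = random_fixed_area_plots(0.0, 1000.0, 0.0, 1000.0,
                                n_plots=20, plot_side=10.0, seed=42)
```

The plot size is fixed before going to the field, so all plots describe the community at the same spatial scale. Plots drawn in this way may occasionally overlap; whether such cases are accepted or redrawn is a further design decision left open in this sketch.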
For those who still like to consider plant communities as natural entities to be identified in
the field using the expertise of the vegetation scientist, the solution is similarly simple: they
only need to avoid the term "sampling" and avoid formal statistical testing! They could
describe their work as "field recognition of plant communities" or "expert
classification/identification of plant communities". In this case, statistical tools can be used to
summarize data and make descriptive outputs, but not to provide generalizations by means of
statistical inference.
In conclusion, if vegetation scientists wish to sample, they need to clearly define the
statistical population of concern and this is possible only when using an operational definition
of plant communities; if vegetation scientists wish to recognize the plant communities as
natural objects they do not need to sample!
Acknowledgements: This paper was based on several readings and discussions I had with many colleagues and
friends, who helped me, over many years, to arrive at an operational view of plant communities; for this, I would
like to remember and acknowledge (more-or-less in chronological order) Stefano Mazzoleni, Simona
Maccherini, Barbara J. Anderson, Susan Walker, Lorenzo Fattorini. A special thanks is due to Marco Ferretti,
J. Bastow Wilson and Mike W. Palmer, who contributed very much to these ideas but also commented on an
earlier version of this unusual manuscript.
REFERENCES
ARRHENIUS O. (1921): Species and area. J. Ecol. 9: 95-99.
BARKMAN J.J. (1989): A critical evaluation of minimum area concepts. Vegetatio 85: 89-104.
BRAUN-BLANQUET J. (1965): Plant sociology. The study of plant communities (Facsimile of the edition of
1932). Hafner Publishing Company, New York and London.
CHYTRÝ M. & OTYPKOVÁ Z. (2003): Plot sizes used for phytosociological sampling of European vegetation. J.
Veg. Sci. 14: 563-570.
CLEMENTS F.E. (1916): Plant succession: an analysis of the development of vegetation. Carnegie Inst.,
Washington, DC.
CONNOR E.F. & MCCOY E.D. (1979): The statistics and biology of the species-area relationship. Amer.
Naturalist 113: 791-833.
ELZINGA C.L., SALZER D.W., WILLOUGHBY J.W. & GIBBS J.P. (2001): Monitoring plant and animal
population. Blackwell Science, Malden.
EWALD J. (2003): A critique for phytosociology. J. Veg. Sci. 14: 291-296.
FERRETTI M. & CHIARUCCI A. (2003): Design concepts adopted in long-term forest monitoring programs in
Europe - problems for the future? Sci. Total Environm. 310: 171-178.
FORD E.D. (2001): Scientific methods for ecological research. Cambridge University Press, Cambridge.
GLEASON H.A. (1922): On the relation between species and area. Ecology 3: 158-162.
GLEASON H.A. (1926): The individualistic concept of plant association. Bull. Torrey Bot. Club 53: 7-26.
KENKEL N.C., JUHÁSZ-NAGY P. & PODANI J. (1989): On sampling procedures in population and community
ecology. Vegetatio 83: 195-207.
LÁJER K. (2007): Statistical tests as inappropriate tools for data analysis performed on non-random samples of
plant communities. Folia Geobot. 42: 115-122.
LINDEMAN R.L. (1942): The trophic-dynamic aspect of ecology. Ecology 23: 399-418.
MAGURRAN A.E. (1988): Ecological diversity and its measurement. Chapman and Hall, London.
MIRKIN B.M. (1987): Paradigm change and vegetation classification in Soviet phytocoenology. Vegetatio 68:
131-138.
MORAVEC J. (1989): Influences of the individualistic concept of vegetation on syntaxonomy. Vegetatio 81:
29-39.
NICOLSON M. (2001): "Towards establishing ecology as a science instead of an art": the work of John T. Curtis
on the plant community continuum. Web Ecol. 2: 145.
PALMER M.W. & WHITE P.S. (1994a): Scale dependence and the species-area relationship. Amer. Naturalist
144: 717-740.
PALMER M.W. & WHITE P.S. (1994b): On the existence of ecological communities. J. Veg. Sci. 5: 279-282.
PEET R.K., WENTWORTH T.R. & WHITE P.S. (1998): A flexible, multipurpose method for recording vegetation
composition and structure. Castanea 63: 262-274.
PIELOU E.C. (1977): Mathematical ecology. John Wiley and Sons, New York.
POTT R. (1996): Plant communities as subject of research of phytosociology in Germany. In: LOIDI J. (ed.),
Avances en Fitosociología (Advances in phytosociology), Servicio Editorial Universidad del País Vasco,
Bilbao, pp. 115-124.
RIVAS-MARTÍNEZ S. (1996): La fitosociología en España. In: LOIDI J. (ed.), Avances en Fitosociología
(Advances in phytosociology), Servicio Editorial Universidad del País Vasco, Bilbao, pp. 149-174.
STEVENS D. (1994): Implementation of a national monitoring program. J. Environm. Managem. 42: 1-29.
VAN DER MAAREL E. (1996): Vegetation dynamics and dynamic vegetation science. Acta Bot. Neerl. 45:
421-442.
WILLIAMSON M. (2003): Species-area relationships at small scales in continuum vegetation. J. Ecol. 91:
904-907.
WILLNER W. (2006): The association concept revisited. Phytocoenologia 36: 67-76.
WILSON J.B. (1991): Does vegetation science exist? J. Veg. Sci. 2: 289-290.
WILSON J.B. (2007): Priorities in statistics, the sensitive feet of elephants, and don't transform data. Folia
Geobot. 42: 161-167.
Received 17 July 2006, revision received and accepted 4 December 2006