4. Documentation and metadata

Documentation and metadata are all about clarity and context. What is the data, and what is the
story of its creation? Researchers submitting good documentation and metadata enable archives
to create rich and detailed metadata that allows people to get the best from data. Good, detailed,
comprehensive data documentation leads to good data re-use. This is good for science and good for
your archive.
Documentation
Documentation is contextual material generated in the course of data creation, analysis, and
archiving. Without it, data cannot be meaningfully reused, because obtaining valid answers to new
research questions requires knowing the who, what, where, when and why of the data’s
creation.
A rich set of documentation should therefore include information on:
Objectives of data creation – hypotheses, operationalization, and funding proposals.
Methods of data collection – questionnaires, interview schedules, instructions, consent procedures.
Structure and relationship of files – organization of data, and naming conventions.
Quality assurance procedures – problems encountered and solutions applied, cleaning and data verification procedures.
Data manipulations – anonymisation undertaken; recoding.
The European Values Study Longitudinal Data File 1981-2008 (EVS, 2011) provides a good example
of study documentation.
Metadata
Good metadata enables data discovery and data re-use. That is why metadata is critical to archives
and repositories. As an agreed set of standards, it allows humans and machines to discover,
comprehend, and evaluate data across time and distance without having to access the data itself.
Furthermore, it facilitates bibliographic data citation and is indispensable in the long-term curation of
data, providing information on provenance and on technical and legal aspects.
The bulk of metadata associated with archived datasets is created in the ingest process. However,
this metadata is not static and can undergo changes even after a resource has been archived – for
example to document any preservation action involving changes to the data (e.g. migration to a
different format) or to document changes made to correct mistakes.
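One way to picture metadata that keeps changing after ingest is an append-only event log kept alongside the record. The following is a minimal Python sketch of that idea, not a prescribed archive workflow. The field names (`event_type`, `detail`, `timestamp`), the `record_event` helper, the study number and the file formats are all illustrative assumptions.

```python
from datetime import datetime, timezone


def record_event(metadata, event_type, detail, when=None):
    """Return a copy of `metadata` with one more entry in its event log.

    The input dict is not modified, so earlier states remain available.
    """
    when = when or datetime.now(timezone.utc)
    event = {
        "event_type": event_type,
        "detail": detail,
        "timestamp": when.isoformat(),
    }
    updated = dict(metadata)
    updated["events"] = list(metadata.get("events", [])) + [event]
    return updated


# Hypothetical record as it might look right after ingest.
ingested = {"study_number": "EX0001", "format": "SPSS portable", "events": []}

# A preservation action: migration to a different format.
migrated = record_event(
    ingested, "format_migration", "Migrated data file from SPSS portable to CSV"
)
migrated["format"] = "CSV"

# A later change made to correct a mistake.
corrected = record_event(migrated, "correction", "Fixed a mislabelled variable")
```

Because each call returns a new dict, the state recorded at ingest stays untouched while the log of later actions grows in chronological order.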
CESSDA AS
CESSDA House
Parkveien 20
5007 Bergen
NORWAY
phone: +47 55 58 21 18
e-mail: [email protected]
www.cessda.net
Consortium of
European Social Science
Data Archives
The main types of metadata are:
Descriptive: provides information on the intellectual content of a data collection. Researchers are
critical to producing descriptive metadata, providing information on the fundamental nature,
structure, context and sources of data. This metadata can take different forms, applicable at study,
case, and variable level.
Administrative: includes information that helps archives and repositories ingest and manage data for
preservation. Archives and repositories, along with researchers, are important creators of
administrative metadata. Examples of administrative metadata include data formats,
copyright ownership and the terms of re-use licenses. As the National Information Standards
Organization (NISO) points out, rights management metadata and preservation metadata are
subsets of administrative metadata that are often listed separately (NISO, 2004).
Structural: concerns (physical or logical) links between objects or between parts of a complex object.
Examples include the relationships between variables and cases in a dataset, different waves in a
longitudinal study, pages in an interview, chapters in a book, and images in a video.
Technical: gives information about MIME types, file formats, file size, encoding, or storage.
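Some technical metadata can be read directly from a file. The sketch below uses only the Python standard library to collect a file's name, extension, size and guessed MIME type. The `technical_metadata` helper, its key names and the sample file are assumptions made for illustration; guessing from the file extension is a rough heuristic, and dedicated format identification tools go further.

```python
import mimetypes
import tempfile
from pathlib import Path


def technical_metadata(path):
    """Collect basic technical metadata for one file."""
    path = Path(path)
    # guess_type looks only at the file name, not at the file content.
    mime_type, compression = mimetypes.guess_type(path.name)
    return {
        "file_name": path.name,
        "extension": path.suffix.lower(),
        "size_bytes": path.stat().st_size,
        "mime_type": mime_type,      # None when the extension is unknown
        "compression": compression,  # e.g. "gzip" for .gz files, else None
    }


# Demonstration on a throwaway file (hypothetical name and content).
with tempfile.TemporaryDirectory() as workdir:
    sample = Path(workdir) / "codebook.txt"
    sample.write_bytes(b"Q1: How satisfied are you?\n")
    meta = technical_metadata(sample)
```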
Preservation Description Information
In the OAIS reference model, an important function of metadata is to provide Preservation
Description Information (PDI). PDI is all the information needed to preserve data. At the same time,
PDI can engender trust, as it focuses particularly on “describing the past and
present states” of a given resource, “ensuring it is uniquely identifiable, and ensuring it has not been
unknowingly altered” (CCSDS, 2012, p. 4-29). PDI comprises five different types of metadata (see
box).
Preservation Description Information
Reference Information: information that helps unambiguously identify a resource, for example, a DOI®, an
archive internal study number, call number, ISSN, or bibliographic description.
Context Information: information on how a resource relates to its environment. This includes information on
related documents or datasets, but also reasons the resource was created.
Provenance Information: information documenting changes made to data since its creation. Typically, this
takes the form of “a set of accumulative, chronologically ordered records that describe the events in the life of
the content data” (Factor et al., 2009, no pag.).
Fixity Information: provides means to detect unauthorized (i.e. undocumented) changes to a resource, for
example by means of checksums or hash functions. It enables archives to monitor the stability, integrity, and
authenticity of their digital assets.
Access Rights Information: information on the license and legal conditions under which resources are
preserved, accessed, and disseminated.
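The checksum approach mentioned under Fixity Information can be sketched in a few lines of Python. This example assumes SHA-256 as the hash function and uses hypothetical helper names (`sha256_of`, `is_unchanged`) and a throwaway sample file; an archive may use a different algorithm and will store the reference checksum in its own metadata system.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, stored_checksum):
    """Compare the file's current checksum with the one recorded earlier."""
    return sha256_of(path) == stored_checksum


# Demonstration on a throwaway file (hypothetical name and content).
with tempfile.TemporaryDirectory() as workdir:
    data_file = Path(workdir) / "survey.csv"
    data_file.write_bytes(b"id,answer\n1,yes\n")

    stored = sha256_of(data_file)              # recorded at ingest
    intact = is_unchanged(data_file, stored)   # file not touched yet

    data_file.write_bytes(b"id,answer\n1,no\n")        # undocumented change
    altered = not is_unchanged(data_file, stored)      # mismatch is detected
```

A mismatch only shows that the bytes differ from the recorded state. Whether the change was authorized is answered by the provenance records, which is why the two kinds of information are kept together.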
Metadata schemes
Metadata regarded as “suitable” and necessary for describing a resource can differ depending on
subject discipline, type of resources to be described, intended uses or user communities. Accordingly,
many types of metadata schemes (also referred to as “element sets”) exist for differing needs and
disciplines. The semantics of a scheme provides definitions and meaning for each element. In
addition, there are established rules as to formulating content and representation (for example,
capitalization of terms). There may also be a controlled vocabulary for the values of elements.
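To make the three ingredients concrete (element definitions, content rules, and controlled vocabularies), here is a small Python sketch. The two-element scheme, the vocabulary terms and the `validate` helper are invented for illustration and do not reproduce any real metadata standard.

```python
# Hypothetical two-element scheme; real schemes define many more elements.
SCHEME = {
    "title": {
        "definition": "Name given to the resource",
        "vocabulary": None,  # free text, no controlled vocabulary
    },
    "resource_type": {
        "definition": "General kind of resource",
        "vocabulary": {"Dataset", "Text", "Image"},  # capitalized by rule
    },
}


def validate(record, scheme=SCHEME):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for element, value in record.items():
        rule = scheme.get(element)
        if rule is None:
            problems.append(f"unknown element: {element}")
        elif rule["vocabulary"] is not None and value not in rule["vocabulary"]:
            problems.append(
                f"{element}: '{value}' is not in the controlled vocabulary"
            )
    return problems


ok = validate({"title": "Example survey", "resource_type": "Dataset"})
bad = validate({"resource_type": "dataset", "colour": "blue"})
```

In this sketch `ok` is empty, while `bad` reports both the lower-case value that breaks the capitalization rule and the element the scheme does not define.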
Here are some examples of metadata schemes:
Data Documentation Initiative (DDI; http://www.ddialliance.org/): social science archiving
Dublin Core® Metadata Initiative (DCMI; http://dublincore.org/): focuses on networked resources
DataCite Metadata Schema (http://schema.datacite.org/): metadata properties chosen for the accurate and consistent identification of data for citation and retrieval purposes
Metadata Encoding and Transmission Standard (METS; http://www.loc.gov/standards/mets/): for librarianship
Preservation Metadata Maintenance Activity (PREMIS; http://www.loc.gov/standards/premis/): Data Dictionary for Preservation Metadata
ISO 19115 (http://www.iso.org/iso/catalogue_detail.htm?csnumber=26020): geographic data and associated services, spatial-temporal, data quality, access and rights to use
ISO 11179 (http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=35343): organizational metadata; managing elements in a registry to create a common understanding of data across and between organizations
Statistical Data and Metadata Exchange (SDMX; http://sdmx.org/): statistical data standard and harmonization
This list is far from comprehensive. Furthermore, these schemes are not mutually exclusive. For
example, DDI aligns with other metadata standards like Dublin Core, ISO 11179, SDMX, and ISO
19115. ISO 19115 influences the coding of the spatial coverage of a study, and SDMX maps to DDI 3
because its fields are closely related.
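As an illustration of what a record in one of these schemes can look like, the following Python sketch builds a simple Dublin Core description of the EVS dataset cited in this document, using the bibliographic details given in the reference list. The `dublin_core_record` helper and the wrapping `record` element are assumptions made for the example; only the `dc` namespace URI and the element names come from the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace of the fifteen-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML element holding one dc:* child per (name, value) pair."""
    root = ET.Element("record")  # wrapper element name is an assumption
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


evs = dublin_core_record([
    ("title", "European Values Study Longitudinal Data File 1981-2008 (EVS 1981-2008)"),
    ("creator", "EVS"),
    ("publisher", "GESIS Data Archive, Cologne"),
    ("date", "2011"),
    ("type", "Dataset"),
    ("identifier", "doi:10.4232/1.11005"),
])

xml_text = ET.tostring(evs, encoding="unicode")
```

A record this small supports discovery and citation, but it carries none of the variable-level detail that a scheme such as DDI is designed to hold.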
References
CCSDS. (2012). Reference Model for an Open Archival Information System (OAIS). Recommended Practice
(No. CCSDS 650.0-M-2). Retrieved from
http://public.ccsds.org/publications/archive/650x0m2.pdf
EVS. (2011). European Values Study Longitudinal Data File 1981-2008 (EVS 1981-2008), Version
2.0.0. GESIS Data Archive, Cologne. ZA4804. doi:10.4232/1.11005
Factor, M., Henis, E., Naor, D., Rabinovici-Cohen, S., Reshef, P., Ronen, S., … Guercio, M. (2009).
Authenticity and Provenance in Long Term Digital Preservation: Modeling and Implementation in
Preservation Aware Storage. Retrieved from
http://static.usenix.org/event/tapp09/tech/full_papers/factor/factor.pdf
National Information Standards Organization (NISO). (2004). Understanding Metadata.
Retrieved from http://www.niso.org/publications/press/UnderstandingMetadata.pdf
Further Reading
Bailey, Jefferson (2012): File Fixity and Digital Preservation Storage: More Results from the NDSA
Storage Survey. In: The Signal. Digital Preservation.
http://blogs.loc.gov/digitalpreservation/2012/03/file-fixity-and-digital-preservation-storage-more-results-from-the-ndsa-storage-survey/
Zwaard, Kate (2011): Hashing Out Digital Trust. In: The Signal. Digital Preservation.
http://blogs.loc.gov/digitalpreservation/2011/11/hashing-out-digital-trust/
This work is licensed under a Creative Commons Attribution 4.0 International License.