Recommendations on how to improve the quality of the data Version 2.3

Coordination Action for the
integration of Solar System
Infrastructures and Science
Project No.: 261618
Call: FP7-INFRA-2010-2
Title: Recommendations on how to improve the quality of data
Document No.: CASSIS Deliverable: D2.3 v2.3
Date: Dec 02, 2013
Editor: Christian Jacquey
Contributors: Jean Aboudarham, David Berghmans, Baptiste Cecconi, André Csillaghy, Véronique Delouille, Sébastien Hess
Distribution: Project
Recommendations on how to improve the quality of data
Deliverable: D2.3
Revision History
Version   Date           Released by      Detail
1.0       11-June-2012
1.1       14-June-2012
1.2       15-June-2012
1.3       19-June-2012
1.4       21-June-2012
1.5       21-June-2012
1.6       27-June-2012
1.7       29-June-2012
2.0       24-Nov-2013    C. Jacquey       Update document with EuroPlaNet and IMPEx information
2.1       29-Nov-2013    J. Aboudarham    Update with additional information on FITS standards
2.2       2-Dec-2013     J. Aboudarham    Change character font in accordance with CASSIS standards and correct some typos
2.3       08-Jan-2014    R.D. Bentley     Corrected things found during checking
Versions 1.0 to 1.7 were released by A. Csillaghy, C. Jacquey and J. Aboudarham.
1. Introduction
2. Description of the data in Solar System Sciences
2.1. Diversity of the field in Solar System Sciences
2.2. For who?
2.3. Who?
2.4. When?
2.5. Where?
2.6. What?
3. What does exist already in terms of describing data?
4. Different layers of interoperability
5. Levels of interoperability versus science cases
6. Layer-1: basic search of data
6.1. Matrix of the primary metadata used in the SOTERIA, HELIO and EUROPLANET RI services
6.2. Frequently used primary metadata
6.2.1. When?
6.2.2. Where?
6.2.3. What?
7. Summary of recommendations
8. List of useful links
8.1. Institutions
8.2. FITS information
8.3. Projects web pages and documents
8.4. Data Model, UCD, Metadata…
8.5. Tools for data mining and data visualization based on VO standards
8.6. Others
1. Introduction
The aim of this document is to provide recommendations for improving the quality of the
metadata in order to enhance the interoperability between the services that the EU projects
SOTERIA, HELIO and EUROPLANET RI provide.
Recently, IMPEx (Integrated Medium for Planetary Exploration) provided a data model to
interconnect planetary observations with numerical models. We shall take this work into
account in this document.
These recommendations could serve beyond the perimeter of these projects and could be
considered more generally for any data production or system related to solar system sciences.
In particular, they could help in the context of upcoming and future missions such as
SOLAR-ORBITER, ROSETTA and BEPI-COLOMBO.
This document follows
o the review (CASSIS/D2.1) of the services offered in these three projects and in the US VOs;
o the analysis (CASSIS/D2.2) of the metadata used in these services, focusing on how far
they support or, conversely, inhibit interoperability.
2. Description of the data in Solar System Sciences.
2.1. Diversity of the field in Solar System Sciences.
The Solar System sciences may be seen as consisting of two major disciplines, heliophysics
and planetology. Each one includes a number of sub-disciplines, which overlap depending on
the topic studied.
In heliophysics, we may identify sub-disciplines such as:
- Solar Physics
- Solar Corona Physics
- Interplanetary medium and solar wind physics
- Planetary (including Earth) magnetospheric physics
- Planetary (including Earth) ionospheric physics
In planetology, sub-disciplines are:
- Planetary (including Earth) magnetospheric physics
- Planetary (including Earth) ionospheric physics
- Atmospheric physics and chemistry
- Surface mineralogy, physics and chemistry
- Planet interior physics (geodynamics)
- Small bodies dynamics, physics and chemistry
- Interplanetary matter
The associated communities use data in ways that may differ depending on the sub-discipline
and the type of data exploited. Consequently, the way data are described is heterogeneous
across the various fields of the Solar System sciences: some communities have developed
standards and others have not. At present, a standardized description of all fields, detailed
enough for the data to be used directly by scientific tools, cannot be envisioned; such an
evolution will take years.
However, we can identify the main components on which the description of the data is built.
To this end, let us consider the questions we ask when we want to characterize a particular
set of data:
Important notice: the following applies to observational data. For simulation or model
result data, and for catalogues, there is an additional complexity that is often not yet solved,
in Solar System sciences as well as in astronomy. In space physics, this is covered by IMPEx
for the interplanetary medium and magnetospheres, and to a limited extent for ionospheres;
IMPEx is at a nearly operational stage and has begun to spread beyond the original IMPEx
participants (UCLA, LESIA, ...).
2.2. For who?
Descriptions of Solar System sciences data often do not exist, making it necessary to contact
the mission or instrument PI to get information about the data location, format, caveats and
context. Humans can usually do this easily, but at the expense of a lot of time. Machines could
speed up this process by thousands of times, but would need a more accessible description of
the data. Moreover, machines do not think or understand these descriptions in the strict sense:
at most, they can rely on predefined keywords to search, sort and analyze the data. To be
efficient, these keywords must be both precise (so that the result is relevant) and limited in
number (so that different datasets can be related to each other). This is already difficult
within a limited domain of astrophysics, but Solar System sciences cover many domains of
physics, which makes the task much more complex.
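The role of a small, precise keyword set can be illustrated with a minimal Python sketch in which datasets are related only through keywords belonging to a shared vocabulary. The vocabulary, dataset names and keywords below are hypothetical and chosen for illustration only.

```python
# Hypothetical shared vocabulary: a small set of precise keywords.
VOCABULARY = {"magnetic_field", "electrons", "ions", "solar_wind", "magnetosphere", "euv_image"}

# Hypothetical dataset descriptions; terms outside the vocabulary are ignored.
DATASETS = {
    "ds_A": {"magnetic_field", "solar_wind"},
    "ds_B": {"ions", "solar_wind", "my_custom_tag"},
    "ds_C": {"euv_image"},
}


def normalized_keywords(keywords):
    """Keep only the keywords that belong to the shared vocabulary."""
    return set(keywords) & VOCABULARY


def related_datasets(name, datasets):
    """Return the other datasets sharing at least one vocabulary keyword with `name`."""
    reference = normalized_keywords(datasets[name])
    return sorted(
        other
        for other, keywords in datasets.items()
        if other != name and reference & normalized_keywords(keywords)
    )


print(related_datasets("ds_A", DATASETS))  # ['ds_B']
```

A free-text tag such as my_custom_tag never links two datasets, which is why the keywords need to be both shared and limited in number.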
2.3. Who?
Who produces these data? Who is the contact person if we want information on them? What
rights are associated with them? It is generally easy to generate a description answering
these questions, and this block of information is present in most existing standards.
2.4. When?
What is the time period represented by the data? When were the data acquired? To which
time are they related, if they are simulation or predictive data? This information is crucial
when the data correspond to measurements or predictions related to dynamical phenomena.
This is the case, for example, in plasma, atmospheric and solar physics. In these fields, the
first criterion for searching data is the time period. In principle, this information is simple to
describe, as it consists of a start time and an end time bounding the period covered by the
data. However, it becomes more complicated when the data correspond to simulations or
predictions of propagating phenomena (e.g. Coronal Mass Ejections). In this case, the time
period can only be considered by also answering the question: Where?
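Since a time period reduces to a start time and an end time, the basic time search is an interval-overlap test. The following is a minimal Python sketch, assuming ISO 8601 timestamps in UTC and hypothetical granule names; it does not address the propagating-phenomena case discussed above.

```python
from datetime import datetime, timezone


def utc(text):
    """Parse an ISO 8601 timestamp without offset, assumed to be UTC."""
    return datetime.fromisoformat(text).replace(tzinfo=timezone.utc)


def overlaps(query_start, query_end, data_start, data_end):
    """True if the closed intervals [data_start, data_end] and [query_start, query_end] intersect."""
    return data_start <= query_end and data_end >= query_start


# Hypothetical granules, each described only by a start time and an end time.
granules = {
    "g1": (utc("2003-10-28T00:00:00"), utc("2003-10-28T12:00:00")),
    "g2": (utc("2003-10-29T00:00:00"), utc("2003-10-29T06:00:00")),
}

query_start, query_end = utc("2003-10-28T11:00:00"), utc("2003-10-28T23:59:59")
matches = [name for name, (start, end) in granules.items()
           if overlaps(query_start, query_end, start, end)]
print(matches)  # ['g1']
```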
2.5. Where?
From which location do the data provide information? What is the target? This question is
already more complex. In the case of observational data, the target can be local if the data
consist of in-situ measurements, or distant if they come from remote sensing. Some
observation types mix both, for example measurements of electromagnetic waves, which can
be generated locally by plasmas or come from distant emission sources.
In Solar System sciences, the description answering the "Where?" question can be structured
hierarchically. The first level is the object of the Solar System to which the data correspond,
which is generally the target of the observation. The target_class defined in the EPN
DataModel gives the possible values:
• asteroid, dwarf_planet, planet, satellite (types from the IAU nomenclature for object
types: http://planetarynames.wr.usgs.gov/Page/Planets)
• comet, exoplanet, interplanetary_medium (solar wind), ring, sample, sky, spacecraft,
spacejunk, star (Sun) (extra types defined for EPN)
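Because target_class is a closed list, a provider or a service can check its metadata against it before publishing. The following is a minimal Python sketch; trimming whitespace and lower-casing the input are assumptions of this sketch, not part of the EPN definition.

```python
# Allowed target_class values, copied from the list above.
IAU_TYPES = {"asteroid", "dwarf_planet", "planet", "satellite"}
EPN_EXTRA_TYPES = {
    "comet", "exoplanet", "interplanetary_medium", "ring", "sample",
    "sky", "spacecraft", "spacejunk", "star",
}
TARGET_CLASSES = IAU_TYPES | EPN_EXTRA_TYPES


def check_target_class(value):
    """Return the normalized value, or raise ValueError if it is not an allowed class.

    Trimming and lower-casing are assumptions of this sketch, not EPN rules.
    """
    normalized = value.strip().lower()
    if normalized not in TARGET_CLASSES:
        raise ValueError(f"unknown target_class: {value!r}")
    return normalized


print(check_target_class(" Comet "))  # comet
```

A region such as "magnetosphere" is rejected by this check: it belongs to the second level of the hierarchy rather than to target_class.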
The regions or sub-objects can be considered as a second level. In this class, we find regions
such as the magnetosphere, the atmosphere, the corona, Olympus Mons on Mars, etc.
Sub-objects can be Saturn's rings, the Mars polar cap, etc. However, this class becomes
difficult to describe if we want to be more precise. For example, it could be useful to indicate
whether a set of data corresponds to the "dayside magnetopause" (in magnetospheric physics)
or to the "Pavonis Mons volcano" (on Mars). But pushing the precision too far will obviously
make the data description too complex to manage. It was proposed at some point in EPN to
have a "target-region" and a "target-feature" (one level below), but these proved very difficult
to describe formally.
A third level consists of characterizing the target using coordinates. This is easily done in
astronomy because the data come only from remote sensing and there is a commonly used
coordinate system (right ascension and declination). The World Coordinate System (WCS)
used with the FITS format in astronomy is defined at http://fits.gsfc.nasa.gov/fits_wcs.html.
In the Solar System, this is much more complex because (i) the data come from remote
sensing or in-situ measurements, and (ii) there is a plethora of objects and associated
coordinate systems. In the case of the Sun, W.T. Thompson (2005) provided a detailed
description of the coordinate systems for solar image data, and W.T. Thompson (2010) then
described the use of the World Coordinate System for the STEREO spacecraft.
2.6. What?
What type of data is it (time series, images, distribution functions, ...)? What is the content
of the data? What physical parameters do the data provide? What are their uncertainties, and
how were they computed? Which instrument obtained them, and on board which observatory?
Which instrumental techniques were used? What were the experimental conditions?
Since a description answering these questions can rapidly become very complex, it is useful
to consider different levels of information:
- the data product level: this level indicates the type of data product (detailed values in
section 6.2.3, Data Product subsection);
- the basic instrument level: here, only the name of the observatory and the name of the
instrument are considered;
- the measurement type level: at this level, the description indicates the type of physical
quantities contained in the data. For example, the data contain "magnetic field
measurement", "electron measurement" or "Infrared part of the spectrum measurement". This
is much trickier than was thought at the beginning, and it is not yet implemented in EPN-TAP.
It must be further studied and defined; in version 0.3 of the Solar System UCDs document
(Cecconi et al.) submitted to the IVOA, we find: “Discuss if the initial goal of using UCDs
for ‘Measurement Types’ information is reached. Probably not.”
- the physical parameter level: this is the precise description of the content of the data. If
sufficiently developed, this level should allow scientific exploitation by tools able to interpret
the associated metadata. These metadata have to include all the needed information: the type
of parameter, the exact measured physical quantity, the units, the coordinate systems and, in
some cases, calibration parameters or indications of the experimental context. This level
rapidly becomes complex in Solar System sciences. For example, ion dynamic spectra from a
mass spectrometer require indicating the energy range, the angles defining the field of view,
the ion mass and mass-to-charge ratio, and possibly also the measured spacecraft potential. In
planetary surface imaging, many metadata are needed to extract scientific information while
taking into account the geometrical aspects (scaling, reflectance depending on the incidence,
...).
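The physical parameter level can be pictured as a structured record attached to each parameter, which a tool can check for completeness. The sketch below uses a hypothetical Python dataclass; the field names, units string and required-context list are illustrative assumptions, not an existing standard, and it reuses the ion-spectrometer example from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParameterDescription:
    """Hypothetical physical-parameter-level metadata for one parameter of a dataset."""
    name: str
    physical_quantity: str
    units: str
    coordinate_system: Optional[str] = None
    # Calibration parameters or experimental-context indications.
    context: dict = field(default_factory=dict)

    def missing_context(self, required):
        """Return the required context entries that are absent or empty."""
        return [key for key in required if self.context.get(key) in (None, "")]


# Context a tool would need for an ion spectrum from a mass spectrometer (illustrative list).
ION_SPECTRUM_CONTEXT = ["energy_range", "field_of_view", "ion_mass", "mass_per_charge",
                        "spacecraft_potential"]

ion_flux = ParameterDescription(
    name="ion_flux",
    physical_quantity="differential particle flux",
    units="cm-2 s-1 sr-1 eV-1",
    context={
        "energy_range": (10.0, 30000.0),   # eV, hypothetical values
        "field_of_view": "hypothetical description of the angular sectors",
        "ion_mass": 1.0,                   # amu (protons)
        "mass_per_charge": 1.0,            # amu/e
    },
)
print(ion_flux.missing_context(ION_SPECTRUM_CONTEXT))  # ['spacecraft_potential']
```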
3. What does exist already in terms of describing data?
Describing data in the solar system sciences has been approached in several different ways.
We briefly review these efforts in chronological order, in order to place our work in a broader
context.
Solar physics faced the need for common data descriptions as early as the start of SOHO.
There were no Web services then, but there was nevertheless a clear advantage in having
common descriptions of the data, as they made it possible to reuse many software utilities.
Thus, a simple common vocabulary was introduced. S. Freeland described "Tags"
recommended for use in FITS files, making these files readable by many programs in the IDL
SolarSoftWare (SSW) environment [1]. These tags are still widely used and are based on FITS
standards (Pence, W D, L Chiappetti, C G Page, R A Shaw, and E Stobie. 2010. “Definition
of the Flexible Image Transport System (FITS), Version 3.0.” Astronomy and Astrophysics
524 (November 22): A42. doi:10.1051/0004-6361/201015362.). They have spread widely in
this community. The following table shows some of these tags.
It is worth noting that one of the main concerns is already visible here: how to represent date
and time information.
[1] See page at http://www.mssl.ucl.ac.uk/surf/sswdoc/solarsoft/ssw_standards.html
SSW keywords for the solar software tree (from ssw_standards.html)
Tag (Status; Source): Definition
- DATE_OBS (Required; W. Thompson, R. Howard): The time and date of the start of the
observation, in Earth-adjusted UT, in a timestamp format recognized by the SSW ANYTIM
routine (*): CCSDS ASCII Calendar Segmented (ISO 8601). Example:
1988-01-18T17:20:43.123Z
- EXPTIME (Suggested; FITS Std): The duration of the observation, in seconds (floating
point). Permits calculation of the "mean time" of an observation with the agreed standard for
DATE_OBS.
- TIME (Suggested; W. Thompson, R. Howard, Yohkoh / SOHO): Milliseconds of day (long
integer).
- MJD (Suggested; W. Thompson, R. Howard, SOHO): Modified Julian Day (long integer).
Used in conjunction with TIME to provide efficient binary time representation.
- DAY (Optional; SMM / Yohkoh): Integer number of days since 1-1-79 (short integer). Used
in conjunction with TIME to provide efficient binary time representation.
- DATE-OBS (Suggested; FITS std): The date on which the observation was made, UT. SSW
utilities do not directly use this tag, but its inclusion in headers is suggested for FITS
conformability. (To access this tag in an IDL structure, remember to use DATE_d$OBS.)
- TIME-OBS (Suggested; Common usage): The time at which the start of the observation
occurred, UT. This tag is not explicitly defined in the FITS standard, but has been adopted into
common usage. (To access this tag in an IDL structure, remember to use TIME_d$OBS.)
(*) The ANYTIM routine can produce this format from all known time standards used in the
solar physics community.
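The "mean time" mentioned for EXPTIME is simply the start time plus half the duration. The following is a minimal Python sketch, assuming EXPTIME is in seconds as the table states and treating the UT of DATE_OBS as UTC (the "Earth-adjusted" correction is ignored); it is not the ANYTIM routine.

```python
from datetime import datetime, timedelta


def parse_date_obs(value):
    """Parse a DATE_OBS value in CCSDS ASCII Calendar Segmented (ISO 8601) form.

    A trailing 'Z' is replaced by an explicit '+00:00' offset, because
    datetime.fromisoformat only accepts 'Z' from Python 3.11 onwards.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def mean_time(date_obs, exptime_seconds):
    """Mid-point of the observation: DATE_OBS plus half of EXPTIME (seconds)."""
    return parse_date_obs(date_obs) + timedelta(seconds=exptime_seconds / 2)


print(mean_time("1988-01-18T17:20:43.123Z", 10.0).isoformat())
# 1988-01-18T17:20:48.123000+00:00
```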
The anytim.pro IDL routine is included in the generic library of SolarSoft. The function is
intended to be Y2K (Year 2000) compliant, as it properly handles all conversions within the
range January 1, 1950 through December 31, 2049. There are no time range restrictions for
either of the binary formats (TIME and DAY, or TIME and MJD) or for string formats that use
the fully qualified (4-digit) year, including the formal definition of DATE_OBS above.
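The TIME/MJD and TIME/DAY pairs of the table can be decoded with plain date arithmetic. The following is a minimal Python sketch, not the ANYTIM routine itself; it assumes that DAY counts from 0 on 1-1-79 and ignores leap seconds.

```python
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)  # MJD 0 begins at 1858-11-17T00:00
DAY_EPOCH = datetime(1979, 1, 1)    # assumption: DAY = 0 on 1-1-79


def from_mjd_time(mjd, time_ms):
    """Combine MJD (days) and TIME (milliseconds of day) into a naive UT datetime."""
    return MJD_EPOCH + timedelta(days=mjd, milliseconds=time_ms)


def from_day_time(day, time_ms):
    """Combine DAY (days since 1-1-79) and TIME (milliseconds of day)."""
    return DAY_EPOCH + timedelta(days=day, milliseconds=time_ms)


def to_mjd_time(moment):
    """Split a naive UT datetime into the (MJD, TIME) pair."""
    delta = moment - MJD_EPOCH
    return delta.days, delta.seconds * 1000 + delta.microseconds // 1000


start = datetime(1988, 1, 18, 17, 20, 43, 123000)  # the DATE_OBS example above
print(to_mjd_time(start))                          # (47178, 62443123)
print(from_day_time(3304, 62443123) == start)      # True
```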
Even if the time description is standardized, the meaning of time in an observation is not
always obvious. This is discussed in section 6.2.1.
The FITS format also provides coordinate keywords. The CRPIXn, CTYPEn and CDELTn
keywords give, respectively, the reference pixel (in many solar images, the location of the
Sun's centre), the type of coordinate along each axis of the 'image', and the pixel spacing in
the units of that axis; together they help to locate an observation on the Sun. In the case of
the heliosphere and of planets, the FITS format does not provide the description necessary to
locate observations correctly in a standard format. Some planetologists are in contact with the
IAU to update the FITS standard definition in order to solve this issue.
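For a simple image without rotation, these keywords define a linear mapping from pixel indices to coordinates, world = CRVAL + CDELT × (pixel − CRPIX) along each axis. The sketch below assumes a hypothetical header (all values invented), ignores rotation (CROTA or PC matrix) and treats the TAN projection as linear, which is only a reasonable approximation close to the Sun's disk; a full implementation should rely on a WCS library.

```python
def pixel_to_world(pixel, crpix, crval, cdelt):
    """Linear FITS WCS along one axis: world = CRVAL + CDELT * (pixel - CRPIX).

    Pixel indices follow the FITS convention (the centre of the first pixel is 1).
    Rotation (CROTA or PC matrix) and the celestial projection are ignored.
    """
    return crval + cdelt * (pixel - crpix)


# Hypothetical header of a 1024 x 1024 solar image whose reference pixel lies
# between the two central pixels and corresponds to the Sun's centre.
header = {
    "CTYPE1": "HPLN-TAN", "CUNIT1": "arcsec", "CRPIX1": 512.5, "CRVAL1": 0.0, "CDELT1": 2.0,
    "CTYPE2": "HPLT-TAN", "CUNIT2": "arcsec", "CRPIX2": 512.5, "CRVAL2": 0.0, "CDELT2": 2.0,
}


def pixel_to_helioprojective(x, y, hdr):
    """Approximate helioprojective (x, y) in arcsec for pixel (x, y)."""
    return (pixel_to_world(x, hdr["CRPIX1"], hdr["CRVAL1"], hdr["CDELT1"]),
            pixel_to_world(y, hdr["CRPIX2"], hdr["CRVAL2"], hdr["CDELT2"]))


print(pixel_to_helioprojective(1, 1024, header))  # (-1023.0, 1023.0)
```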
The 53 keywords [2] defined in the FITS standard are:
CROTAn EQUINOX NAXISn TBCOLn TUNITn AUTHOR CRPIXn EXTEND OBJECT
TDIMn TZEROn BITPIX CRVALn EXTLEVEL OBSERVER TDISPn XTENSION
BLANK CTYPEn EXTNAME ORIGIN TELESCOP BLOCKED DATAMAX EXTVER
PCOUNT TFIELDS BSCALE DATAMIN GCOUNT PSCALn TFORMn BUNIT DATE
GROUPS PTYPEn THEAP BZERO DATE-OBS HISTORY PZEROn TNULLn CDELTn
END INSTRUME REFERENC TSCALn COMMENT EPOCH NAXIS SIMPLE TTYPEn
In the early Internet age, the European Grid of Solar Observations (EGSO) published in 2003
a data model that went further than the FITS keywords (K. Reardon et al.,
http://www.mssl.ucl.ac.uk/~rdb/egso/documents/). This data model emphasized the
relationships between the different entities involved in solar observations rather than the
vocabulary. As a result, several different entities were described by the same words, which
prevented its widespread use as an interoperability reference document (although this was
not its purpose anyway).
At the same time, the IVOA was investing considerable resources in creating data models for
astronomical data as well as protocols to make services interoperable. As a result of this
major effort, the IVOA Architecture is a widely recognized set of models and protocols.
Although created for astronomy, the IVOA is also a vast source of resources for solar system
sciences. In particular, protocols such as SAMP and TAP can be used straightforwardly in our
domain. Nevertheless, the IVOA does not solve all the problems, in particular in the way it
deals with time: our domain needs many more ways of expressing dates and times.
Also in the framework of astronomy, the Unified Content Descriptors (UCDs) provide a
vocabulary into which the solar system sciences can plug. Here are two examples of UCDs
(http://cdsweb.u-strasbg.fr/UCD/tree/js/) for describing time:
• time.start
• time.end
However, UCDs as defined today are limited and do not provide ways to express all the
semantics available in our (highly dynamic) environment. For instance, there is no easy way
of distinguishing between observation times and event times, a distinction that is
fundamental in our field.
[2] Their definition can be found in:
http://www.mssl.ucl.ac.uk/grid/iau/extra/solarsoft/ssw_standards.html or in:
http://bass2000.bagn.obs-mip.fr/Tarbes/spip.php?rubrique22
But UCDs are probably not the right place to describe this distinction; this has to be
investigated. However, we can think of:
• time.start;obs and time.start;src
• time.end;obs and time.end;src
but the Solar System UCDs document (v0.3) states that “discussion [is] needed on ‘src’
(definition = Observed source viewed on the sky). For example, change definition so that src
can cover other sources than sky objects (such as samples).”
UCDs can be combined. However, there is no precise standard regarding the meaning of these
combinations, which makes them close to useless. Furthermore, UCDs do not provide any
indication about the format of the date/time.
Recently, the solar system community began discussions with the consortium that maintains
the UCDs, in order to extend them to Solar System data.
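Only the syntax of a UCD combination can be processed mechanically; its meaning depends on a convention. The sketch below (Python, hypothetical helper names) parses a combined UCD, whose first word is the primary one, and applies the tentative obs/src distinction mentioned above. It illustrates the problem rather than any standard.

```python
def split_ucd(ucd):
    """Split a combined UCD such as 'time.start;obs' into (primary word, [qualifiers])."""
    words = [word.strip() for word in ucd.split(";") if word.strip()]
    if not words:
        raise ValueError("empty UCD")
    return words[0], words[1:]


def time_role(ucd):
    """Classify a time UCD using the tentative ';obs' / ';src' convention discussed above.

    This convention is not standardized; unrecognized combinations are reported
    as 'unspecified', and non-time UCDs return None.
    """
    primary, qualifiers = split_ucd(ucd)
    if not primary.startswith("time."):
        return None
    if "src" in qualifiers:
        return "source"
    if "obs" in qualifiers:
        return "observation"
    return "unspecified"


for ucd in ["time.start;obs", "time.end;src", "time.start", "phys.magField"]:
    print(ucd, time_role(ucd))
```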
Following in the footsteps of the IVOA, the SPASE (Space Physics Archive Search and
Extract) data model emerged. It is one of the most mature data models in the solar system
sciences, with the possible exception of those used for Earth observations. SPASE played a
particularly important role in the early phase of HELIO. SPASE has received many extensions
and is widely used beyond the domain of physics in which it originated (heliophysics).
Notably, it serves as a basis for the IMPEx data model, which describes simulations of
planetary environments.
HELIO took a different approach. It assumed that (too) many data models were already
around and that, rather than inventing a new one, it would unify (or glue) them by defining a
mapping between the different data models using ontologies. This approach also leads
towards a vocabulary that does not enforce specific terms but recommends terms that can still
be mapped to the common vocabulary, leaving much more freedom. Nevertheless, an internal
HELIO data model has been used for orchestrating workflows, which are, in fact, a specific
type of combination of services.
HELIO also used the work done by the IVOA and introduced the PQL standard into solar
system sciences. PQL was designed as a REST-based query language for astronomy, and all
HELIO REST services use PQL as their query language. PQL works in a similar way to SQL,
hence its name and the structure of the language. Using SQL as a query standard in Web
services is very practical from a developer's point of view, but it often confuses users, as the
terms used to formulate a query follow relational database clauses, such as the WHERE
clause, that are unfamiliar to many.
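The REST, SQL-like style of such queries can be seen in a minimal Python sketch that builds a query URL. The parameter names STARTTIME, ENDTIME, FROM and WHERE, the endpoint and the table name are assumptions for illustration, not a verified description of the HELIO service interface.

```python
from urllib.parse import urlencode


def build_pql_url(base_url, start, end, table, where=None):
    """Build a PQL-style REST query URL with SQL-like FROM and WHERE parameters.

    The parameter names are assumptions for illustration only.
    """
    params = {"STARTTIME": start, "ENDTIME": end, "FROM": table}
    if where:
        params["WHERE"] = where
    return base_url + "?" + urlencode(params)


url = build_pql_url(
    "https://example.org/helio/query",  # hypothetical service endpoint
    "2003-10-28T00:00:00",
    "2003-10-29T00:00:00",
    "flare_list",                       # hypothetical table name
)
print(url)
```

The FROM and WHERE terms visible in the resulting URL are precisely the relational-database vocabulary that many users find unfamiliar.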
EUROPLANET RI followed an approach similar to the one pursued in astronomy and attempted
to build a data model able to cover the planetary sciences in all their diversity. The first step
was to study in depth the needs and uses in the different domains and communities, based on
use case analysis. This led to a first data model [3] based on mixed concepts coming from
IVOA/Characterization and SPASE. This data model was a first assessment of the required
metadata, and is thus rather complete. However, at some point it was decided to use TAP as
the main data access protocol for the project, and the data model then had to be rewritten to
allow its use with TAP services.
[3] http://typhon.obspm.fr/idis/docs/Data_Model_v118a.pdf
It was thereafter converted into a new one, EPN-DM [4], using the formalism of the
IVOA/Obscore, IVOA/VO-resources and IVOA/DataService standards. This work was
inspired by Obscore, but EPN did not use that standard because it was not ready at the time
and was still too oriented towards sky observations. This strategy was chosen because this
data model allows the use of the PDAP [5] protocol of the IPDA (International Planetary Data
Alliance) for simple requests and of the IVOA TAP [6] protocol for high-level queries. In
particular, making data compliant with this data model allows the use of IVOA tools such as
ALADIN, TOPCAT, VOSPEC, etc.
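A TAP service is queried over HTTP, and a synchronous query is a single URL. The following is a minimal Python sketch using the TAP 1.0 parameters REQUEST, LANG, QUERY and FORMAT; the service URL and schema are hypothetical, and the epn_core table and its granule_uid and target_name columns follow later EPN-TAP conventions, given for illustration only.

```python
from urllib.parse import urlencode


def tap_sync_url(service_url, adql, output_format="votable"):
    """URL of a synchronous TAP query (TAP 1.0 parameters REQUEST, LANG, QUERY, FORMAT)."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql, "FORMAT": output_format}
    return service_url.rstrip("/") + "/sync?" + urlencode(params)


url = tap_sync_url(
    "https://example.org/tap/",                                      # hypothetical service
    "SELECT TOP 10 granule_uid, target_name FROM myschema.epn_core",  # hypothetical schema
)
print(url)
```

The VOTable returned by such a request is the kind of file that tools such as TOPCAT can read directly.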
4. Different layers of interoperability
Interoperability is needed when a service needs to communicate with another service, for
instance to extract information or data. This is the case when an infrastructure is built with
distributed resources and/or tools.
Building an interoperable infrastructure (or Virtual Observatory) is a necessity for many
scientific disciplines and communities, in particular for astronomy and solar system sciences,
where observation instruments are usually built by independent groups. However,
interoperability can quickly drive up the complexity of such an infrastructure. This potential
complexity also induces risks and costs with regard to infrastructure maintenance and
development, and imposes constraints on the data providers (who do not necessarily have the
human resources to meet them). It is thus very important to identify compromises between
the needs of the scientists and the feasibility/reliability of the envisioned infrastructures.
Another key consideration is that the insertion of resources into a virtual observatory will
require effort from the data providers or the service developers. If the standards describing
the resources are too complicated, providers will be unwilling or unable to participate. It is
thus crucial to specify a progressive approach to which resource providers can adapt. The
most obvious way consists in defining layers of interoperability with respect to the
functionalities expected from a virtual observatory.
More pragmatically, data providers are more likely to make the required effort if there are
already existing tools or networks that may add value to their data. Experience has shown
that developing new data models that are not widely implemented by existing tools is useless.
The converse is also true, as new tools have to rely on existing data descriptions. Thus the
development of a common data description must involve a wide community, including large
data providers and the developers of widely used tools. The alternative is to develop small
extensions to existing standards and propose them to the authorities maintaining these
standards [e.g. the IMPEx data model is an extension to SPASE that is proposed to become
fully part of the standard].
The previous Recommendation R1, from version 1 of this document, was:
We recommend to consider several layers of interoperability corresponding to different layer
of functionality and to develop the standards accordingly
[4] http://voparis-europlanet.obspm.fr/docs/PlanetaryScienceResource-DM-latest.pdf
[5] http://planetarydata.org/projects/inactive-projects/data-access/documents/pdap-versions/pdap-v1.1/view
[6] http://www.ivoa.net/Documents/TAP/
But one has to keep in mind that data providers are unlikely to maintain more than one or two
layers of interoperability, and tool developers to implement more than a couple of them.
Interoperability layering can be useful for the development and evolution of standards, but it
should be limited, and done with a view to integrating the upper layers into the lower ones.
Of course, the search and exploitation layers require different approaches, but they should
not be completely disjoint. In particular, they should use similar dictionaries as much as
possible.
This is the tricky part: there are two ways to address the problem, and data providers may
choose which path to follow depending on their pre-existing infrastructure. The two paths are:
1. The data provider has a database. It can follow the path described here. The first layer
corresponds more or less to what is done in EPN-TAP, especially if "coverage" information
(temporal, spectral, spatial, ...) and target information (which is rather important!) are added.
2. The data provider has a web page with links to data files (in a VO format such as FITS,
VOTable or CDF...). In that case, they can start by adding SAMP to their web page. Then, if
they have tabular data, it is easy to transform the data into VOTables and follow the same
path (this requires relatively little effort). Finally, when the data are available and
VO-compliant, the heavier work of setting up a database and putting protocols on top of it can
be started.
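For a provider following path 2, converting tabular data to VOTable can be done with the Python standard library alone. The following is a minimal sketch producing a bare VOTable 1.3 document with TABLEDATA encoding; the column names and values are hypothetical, and a real file would also carry UCDs, descriptions and should be validated.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def rows_to_votable(fields, rows):
    """Serialize tabular data as a minimal VOTable 1.3 document (TABLEDATA encoding).

    `fields` is a list of (name, datatype, unit) tuples, `rows` a list of tuples.
    """
    root = ET.Element("VOTABLE", {"version": "1.3", "xmlns": VOTABLE_NS})
    table = ET.SubElement(ET.SubElement(root, "RESOURCE"), "TABLE")
    for name, datatype, unit in fields:
        attributes = {"name": name, "datatype": datatype}
        if datatype == "char":
            attributes["arraysize"] = "*"  # variable-length string
        if unit:
            attributes["unit"] = unit
        ET.SubElement(table, "FIELD", attributes)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical magnetic field time series.
xml_text = rows_to_votable(
    [("time", "char", None), ("b_total", "double", "nT")],
    [("2003-10-28T11:00:00", 12.5), ("2003-10-28T11:01:00", 13.1)],
)
print(xml_text)
```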
So we propose a new Recommendation R1:
We recommend considering several layers of interoperability corresponding to
different layers of functionality and developing the standards accordingly,
while keeping them as consistent as possible by using a common dictionary.
Two or three layers are probably the maximum needed.
In this document, we identify three layers of interoperability which relate to three classes of
functionality of an interoperable infrastructure:
Layer-1: Basic search of data.
This layer allows the development of search engines using basic criteria. We define those
basic criteria as being queries over
• when: the time parameter. It generally corresponds to a time period, which is the period
of observation in the case of observational data;
• where: this parameter indicates the object or region which is the target of the
observations. This can correspond to the observatory location in the case of in-situ
measurements, or to a distant target in the case of remote sensing observations;
• what: this information can include four main parameter classes, which can be used
separately or combined:
o the "measurement type" parameter;
o the "observatory" parameter;
o the "instrument" parameter;
o the "DataProductType".
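When the when/where/what criteria are stored as table columns, a Layer-1 search reduces to a simple query. The following is a minimal Python sketch generating ADQL, assuming EPN-TAP-style column names (time_min and time_max in Julian days, target_name, target_class, measurement_type, instrument_host_name for the observatory, instrument_name, dataproduct_type); the table name is hypothetical, and multi-valued fields and case-insensitive matching are deliberately ignored.

```python
# Columns used for the "where" and "what" criteria (EPN-TAP-style names, assumed).
CRITERIA_COLUMNS = ("target_name", "target_class", "measurement_type",
                    "instrument_host_name", "instrument_name", "dataproduct_type")


def adql_string(text):
    """Quote a value as an ADQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def layer1_query(table, time_min_jd=None, time_max_jd=None, **criteria):
    """Build an ADQL query combining the Layer-1 when / where / what criteria."""
    unknown = set(criteria) - set(CRITERIA_COLUMNS)
    if unknown:
        raise ValueError(f"unsupported criteria: {sorted(unknown)}")
    conditions = []
    # when: keep granules whose [time_min, time_max] period overlaps the requested one
    if time_min_jd is not None:
        conditions.append(f"time_max >= {time_min_jd}")
    if time_max_jd is not None:
        conditions.append(f"time_min <= {time_max_jd}")
    # where / what: exact matches, in a fixed column order
    for column in CRITERIA_COLUMNS:
        if criteria.get(column) is not None:
            conditions.append(f"{column} = {adql_string(criteria[column])}")
    query = f"SELECT * FROM {table}"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


print(layer1_query("myschema.epn_core", 2452940.5, 2452941.5,
                   target_class="planet", target_name="Saturn"))
```

Here JD 2452940.5 corresponds to 2003-10-28T00:00 UT, so the example asks for Saturn data overlapping that day.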
Layer-2: Precise search of data.
This layer allows the development of search engines using precise criteria, up to a sufficient
"physical parameter level".
Layer-3: Automated basic data processing and scientific analysis.
This layer allows data processing inside workflows without human intervention. In addition to
Layer-2 functionality, this requires standardizing the information on the structure and content
of the granules (i.e. files in general). For example, for data products such as spectral cubes or
particle distribution functions, it is necessary not only to define precisely the represented
physical parameter, but also to provide information on how it is structured inside the files.
This layer makes the data exploitable by high-level tools such as ALADIN, TOPCAT, AMDA,
etc. All the information these tools need must be provided. Part of this level can be seen as a
common core containing the information needed by, and sufficient for, a majority of tools. In
addition, the user may need access to detailed and specialized documentation and reference
information, which can be necessary for interpreting the data and possible instrumental
artefacts, or for generating high-level data products. This documentation generally consists of
articles published in scientific journals or books (e.g. ISSI books) or of Web pages provided by
the data provider. It can also be described as keywords for tools developed for more specific
scientific objectives (e.g. CLWeb [http://clweb.cesr.fr] for particle detector data).
Within the scope of the CASSIS project, we analyse only the requirements corresponding to
Layer-1, as identified from the SOTERIA, HELIO and EUROPLANET RI services.
Beyond this level, the data models in use are not yet stable and standardized enough.
Moreover, the diversity of data types, measurements and additional parameters is so
large in solar system sciences that a global approach seems difficult to envision. This
will be discussed in the last section.
5. Levels of interoperability versus science cases.
Planetology:
The interaction of Saturn's moon Enceladus with the Kronian magnetospheric environment is
one of the science cases analysed in EUROPLANET RI and reported on the Plasma Node site:
http://europlanet-plasmanode.oeaw.ac.at/science-case-3_3.html
Its summary is as follows:
"Enceladus, in Saturn's magnetosphere, is a geologically active moon, in the sense that it gets
stretched and compressed by tidal forces, whilst orbiting Saturn. This heats up the inner
regions of the moon and on the South Pole this results in a reservoir of liquid water. This
water is released into the Kronian magnetosphere through a series of vents [Hansen et al.,
2006; Porco et al., 2006]. These water plumes drastically influence the interaction of the
magnetoplasma with the moon; it creates a strong asymmetry [Saur et al., 2007]. The
sputtering off the moon, mass loading the system [Pontius and Hill, 2009] and these water
plumes create a torus at Enceladus orbit [Johnson et al., 2006]."
Such an object can attract the interest of several communities, including those studying
plasmas, dust, the dynamics of the moon's interior, the moon's atmosphere/exosphere, etc. It
is thus a good example of the value of developing a virtual observatory in solar system
sciences. We can easily imagine that a specialist of the moon's interior dynamics could be
interested in using dust or plasma data as markers for inferring the internal activity of
Enceladus. To this end, the first step of his/her study will be to find the corresponding
observations. Conversely, a plasma scientist may want to know where the hot spots
on the surface of Enceladus are, since they can be the sources of the geysers that interact with the
magnetospheric plasma and generate waves. The plasma scientist would then be interested in
finding radar or infrared observations of the surface of Enceladus.
Figure 1: Measurements of electric waves and electron fluxes in the vicinity of
Enceladus, observation of the Enceladus geysers in visible light, and temperature map of
Enceladus derived from infrared measurements.
A researcher specialized in one domain would like to send a virtual observatory a request
including the "when", "where" and "what" parameters. For example, a plasma
scientist would like to send a request such as: "I would like (what) observations in the infrared and
visible parts of the spectrum obtained (where) on Enceladus and (when) during this specific
time interval". The first result of this request is whether or not the searched data exist.
As relatively few data have been obtained on planets and moons (except the Earth), the
researcher will receive a limited list of data files and will be able to start analysing
them or evaluating their potential for collaboration with the specialists.
As illustrated by this use case, a search based on Layer-1 of interoperability is already very
useful. It will save the researcher time and effort, significantly help to stimulate
collaborations and, in the end, increase the scientific return of the data.
Heliophysics:
Space Weather activity is much concerned with shocks that propagate through the solar
system, from the Sun to the Earth. Three major phenomena are known to be able to produce shocks
in the solar wind that can affect the Earth's environment (e.g. particle bombardment of satellites,
expansion of the Earth's upper atmosphere increasing satellite drag and lowering their altitude, …):
solar flares (including filament eruptions), Coronal Mass Ejections (CMEs) and Coronal Holes (CHs).
In the last case, fast solar wind originating in CHs catches up with slow solar wind emitted by
the quiet Sun, producing a shock. While coronal holes are well observed and their effects anticipated
(unfortunately, they are not the most dangerous for the Earth's environment), scientists have been
looking for connections between flares and CMEs without success. Flares do not always
produce CMEs, and CMEs do not always appear with a flare. We know that both are strongly
related to the solar magnetic field, but not how!
So, to understand these phenomena (the same applies to the formation of CHs, which is still
not understood but is also related to the magnetic field), we need to be able to study a large set of flares
and CMEs, and their effects in the solar wind, in order to extract the statistical behaviour of
these phenomena. For instance, flares may occur on the far side of the Sun while their effects
propagate towards us, and some CMEs can be so faint that they are not detected near the Sun.
In both cases, being able to follow shocks in the solar wind may indicate that
something occurred.
To carry out this statistical study, Layer-1 of interoperability will provide all the needed
information: we need to query data at a given time (or time interval), at a given place on the
Sun or at a specific location in the solar system (which then allows the propagation to be followed).
The location on the Sun is a critical parameter for understanding the underlying magnetic
field behaviour, because it allows queries for data that may observe this region
with a high spatial resolution (e.g. polarization measurements). And, of course, to ensure
that we compare what can be compared, the "what" information of Layer-1 is mandatory.
There is, however, an important difference with the previous use case in planetology: there
were, are and will be many more observatories providing data on the Sun or on the Earth's space
environment. If the user request is too coarse, the search engine will return too much
output, which is difficult and time-consuming to handle. So, in the case of heliophysics:
- the "where" information should at least indicate the region (e.g. "solar wind near
Earth" or "solar corona");
- the "what" information (its "Measurement Type" or "DataProductType" elements) has to be
more detailed for heliophysics than for planetology. In particular, in solar physics it is
important to specify whether image/movie data have a wide (full Sun) or narrow field of view.
Note also that the phenomena studied in heliophysics propagate. Consequently, the
"when" information is more complicated to manage and characterize. The times can indicate
the periods of observation, these periods propagated to another location (with models or heliospheric
imaging), the periods of events, the lifetimes of structures, etc. This
is discussed in section 6.2.1.
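The simplest way to obtain a propagated time is ballistic propagation at a constant radial speed, t_arrival = t_obs + d / v. The sketch below illustrates only this approximation. The function name `ballistic_arrival` is hypothetical. Actual propagation services rely on models or heliospheric imaging, as noted above, rather than on a single constant speed.

```python
from datetime import datetime, timedelta

AU_KM = 149_597_870.7  # astronomical unit in km (IAU 2012 definition)


def ballistic_arrival(t_obs: datetime, distance_km: float,
                      speed_km_s: float) -> datetime:
    """Time at which a structure moving radially at constant speed covers distance_km.

    Ballistic approximation only: no acceleration, deceleration or
    interaction with the ambient solar wind is taken into account.
    """
    if speed_km_s <= 0:
        raise ValueError("speed must be positive")
    return t_obs + timedelta(seconds=distance_km / speed_km_s)


# Example: a solar-wind structure observed at t0 near the Sun, travelling at 400 km/s
t0 = datetime(2003, 10, 27, 4, 0, 0)
t1 = ballistic_arrival(t0, AU_KM, 400.0)
```

At 400 km/s, 1 AU (about 1.5 × 10^8 km) is covered in about 374,000 s, i.e. roughly 4.3 days. A search near the Earth for the effects of such a structure therefore has to be shifted by several days with respect to the observation time near the Sun.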
6. Layer-1: basic search of data.
6.1. Matrix of the primary metadata used in the SOTERIA, HELIO and EUROPLANET RI services.
Interoperability is built on protocols that allow the exchange of information and data
between services. The information on the data or the services needs to be expressed
in a common, machine-interpretable language. The metadata thus need to be
standardized and organized into data models.
In order to identify (i) the intersection and (ii) the union of their primary metadata, we
have analysed the services of SOTERIA, HELIO and EUROPLANET RI. These primary
metadata can broadly be considered as those corresponding to interoperability Layer-1. The results are reported in the following tables.
As complementary information sources we used:
- The document "Specification on Metadata Standards in Heliophysics" (HELIO N3.3
deliverable):
http://www.helio-vo.eu/internal/Documents/Deliverables/HELIO-N3-005d3_MetadataStandards_v1.7.pdf
- The documents describing the data models and the protocols used in EUROPLANET RI:
o The document describing the PDAP protocol:
http://planetarydata.org/projects/inactive-projects/data-access/documents/pdapversions/pdap-v1.1/view
o The document describing the EPN-TAP protocol:
http://voparis-europlanet.obspm.fr/Tdocum.shtml
o The document describing the EPN-DM data model:
http://voparis-europlanet.obspm.fr/docs/PlanetaryScienceResource-DM-latest.pdf
6.2. Frequently used primary metadata.
6.2.1. When?
Table 1.1 shows how the time parameter is handled in the various services of the
SOTERIA, HELIO and EUROPLANET RI projects. The time parameter is used
in all the services.
Format of the dates:
Most HELIO services give dates in the UTC system, using the CCSDS ASCII
Calendar Segmented Time Code format (ISO 8601):
YYYY-MM-DDTHH:MM:SS(.fff), e.g. 2003-10-27T04:00:00
This format is now used by nearly all recent databases.
EUROPLANET RI uses Julian days at the protocol level, but client and server interfaces can
easily convert between the two.
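The conversion between the ISO 8601 format above and Julian days can be sketched in a few lines. The sketch assumes that the ISO 8601 strings are UTC times without an explicit offset, and it ignores leap seconds, counting every UTC day as 86,400 s. Whether this matches the exact time scale expected by a given protocol has to be checked against that protocol's specification. The function names `iso_to_jd` and `jd_to_iso` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Julian Day of the Unix epoch 1970-01-01T00:00:00 UTC
JD_UNIX_EPOCH = 2440587.5
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_jd(iso: str) -> float:
    """ISO 8601 UTC string 'YYYY-MM-DDTHH:MM:SS[.fff]' (no offset) -> Julian Day."""
    t = datetime.fromisoformat(iso)
    if t.tzinfo is not None:
        raise ValueError("expected a UTC time without offset")
    t = t.replace(tzinfo=timezone.utc)
    return JD_UNIX_EPOCH + (t - UNIX_EPOCH).total_seconds() / 86400.0


def jd_to_iso(jd: float) -> str:
    """Julian Day -> ISO 8601 UTC string, rounded to the nearest millisecond."""
    ms = round((jd - JD_UNIX_EPOCH) * 86_400_000)
    t = UNIX_EPOCH + timedelta(milliseconds=ms)
    return t.strftime("%Y-%m-%dT%H:%M:%S") + f".{t.microsecond // 1000:03d}"
```

The result is rounded to milliseconds on the way back. A Julian day stored as a double cannot represent a time of day exactly, and truncating instead of rounding could turn 04:00:00 into 03:59:59.999.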
However, the SOTERIA services use different date formats, which are not homogeneous across services.
As dates are crucial and widely used parameters, they are exchanged between
interoperable services very often, in fact almost systematically. If each service uses its own
specific format, it is necessary (i) to integrate a translation functionality inside each of them
and (ii) to add a new metadata element indicating the date format used. Moreover, these formats
need to be maintained, and their specifications need to be registered somewhere and made
accessible. This clearly represents an additional maintenance burden and an unnecessary risk for sustainability. It seems highly preferable that everyone uses the same date
format in metadata.
Recommendation R2.1:
For developing interoperability, it is recommended that a standardised format
for writing the dates in the metadata should be adopted. This standardised
format should be ISO 8601.
Meaning of the time parameter:
In many services, the time parameter indicates the time interval of an observation, of a
data file or of a data set. In this case, the time parameter consists of a pair of
dates: START_TIME, END_TIME.
However, in some services, the time parameter has another meaning. It can indicate, for
example, a predicted or propagated time interval (e.g. in the Propagation service under
development in HELIO, in the one at CDPP, or in the context of Space Weather services). It
can also characterize the operating period of an instrument or of an observatory (e.g. in
the instrument descriptor of EUROPLANET RI). In catalogues of events or features, the
time parameters can indicate the start and end times of events or features, but also additional
characteristic times (e.g. peak time, growth time, visibility times, ...). In an interoperable
system, machines have to interpret the time parameters according to their actual meaning.
One possible solution to this issue is to associate with each time parameter an additional
metadata element indicating its meaning. For example, the values of this metadata element could
be:
• OST: Observation Start Time
• OET: Observation End Time
• EST: Event Start Time
• EET: Event End Time
The time of scientific interest is the time at the target (i.e. the event time).
Recommendation R2.2:
The time parameter may have different meanings. It is recommended that a way
be found to indicate clearly what the parameter is, for example by associating an
additional metadata providing its exact meaning.
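The qualifier proposed above can be sketched as an enumeration attached to each time value. In this minimal Python sketch, the four codes are the ones listed above, while the names `TimeMeaning`, `TaggedTime` and `interval` are hypothetical. A client that needs the time at the target selects the EST/EET values explicitly instead of guessing what a bare START_TIME means.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Iterable, Tuple


class TimeMeaning(Enum):
    """Qualifiers proposed in the text for time parameters."""
    OST = "Observation Start Time"
    OET = "Observation End Time"
    EST = "Event Start Time"
    EET = "Event End Time"


@dataclass(frozen=True)
class TaggedTime:
    value: datetime
    meaning: TimeMeaning


def interval(times: Iterable[TaggedTime], start: TimeMeaning,
             end: TimeMeaning) -> Tuple[datetime, datetime]:
    """Pick the (start, end) pair with the requested meanings; fail if absent."""
    by_meaning = {t.meaning: t.value for t in times}
    try:
        return by_meaning[start], by_meaning[end]
    except KeyError as missing:
        raise ValueError(f"no time tagged {missing.args[0].name}") from None
```

Because a missing qualifier raises an error instead of silently falling back to another time, an observation interval cannot be mistaken for an event interval.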
Table 1.1-a: Time parameters for the SOTERIA services
Table 1.1-b: Time parameters for the HELIO services
Table 1.1-c: Description of the StartTime parameter for the EUROPLANET RI
services, from the EPN Core Resource Definition
6.2.2. Where?
As indicated by Table 1.2, the "where" parameters are not used uniformly across the various
services. In SOTERIA, most services provide specific products that do not need to be
filtered by a location criterion: the user already knows what these products are. In HELIO, and
for the SolarMonitor of SOTERIA, the "where" parameters are relatively precise. They
include either a defined region or the coordinates of a location of interest in some pre-chosen
coordinate system. In EUROPLANET RI, in the first layer of interoperability, the "where"
parameters consist only of the specification of the solar system object that is the target
of the search. The EUROPLANET team attempted to find more precise metadata for
characterizing the location of the targets, but this is still an unsolved, complex issue. The
difficulties come from the fact that there are many objects in the solar system (planets, moons,
asteroids and comets) and that each of them has a number of coordinate systems (related
to its rotation, its magnetic field, the ecliptic plane, or a combination of the object with another
one, e.g. the Sun).
Obviously, the needs for a Layer-1 data search are not the same when the target is the Sun or
the Earth as when it is another object of the solar system. There is a major difference: there are
many observations of the Sun and the Earth, while only a very limited number of missions
have explored each of the other objects. If the "where" parameters only indicate that the user
is searching for data on the Sun or the Earth, the answer will be an extremely long list of URLs or
files, which will not be very useful. Conversely, if the user only specifies that he/she wants data
on Saturn or another particular object, the answer will be a limited and useful list of files.
With respect to Layer-1, we propose the following recommendations:
Recommendation R3.1:
The precision of the metadata characterising the location of the target of the
measurement should be adjusted as a function of the amount of the potentially
corresponding data.
Some regions are generic to many objects of the solar system. For example, all the
magnetized planets have magnetospheres structured into the same kinds of regions. It would be
interesting to study the possibility of building hierarchical descriptors like the IVOA UCDs
(Unified Content Descriptors) [7]. This approach has been adopted in SPASE.
Example from SPASE:
Earth.Magnetosphere.Magnetotail
Heliosphere.Heliosheath
Sun.Corona
There is a need for metadata describing spatial coordinates, but it is not the goal of UCDs to
go that deep into the description. That is why we have removed the previous Recommendation R3.2 ("The
generalization of UCD for describing the location of the data target should be studied and
experimented.") and replaced it with:
Recommendation R3.2:
The generalization of SPASE descriptors of target.region should be studied and
experimented with.
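Hierarchical descriptors such as the SPASE examples above lend themselves to simple prefix matching, where a request for a region also returns data from its sub-regions. The sketch below compares dotted names component by component. The function name `region_matches` is hypothetical. The `*` wildcard is also an assumption and is not part of SPASE: it would let a single request target the same region on any object, e.g. the magnetosphere of any magnetized planet.

```python
def region_matches(requested: str, region: str) -> bool:
    """True if `region` is `requested` or a sub-region of it.

    Names are compared component by component on the dots, so that
    "Earth.Magneto" does not match "Earth.Magnetosphere". A "*" component
    in `requested` (an assumption, not part of SPASE) matches any object.
    """
    req = requested.split(".")
    reg = region.split(".")
    if len(reg) < len(req):
        return False  # the record is less specific than the request
    return all(r == "*" or r == g for r, g in zip(req, reg))
```

Plain string prefix matching would not be enough here, since "Earth.Magneto" is a string prefix of "Earth.Magnetosphere" without being a parent region.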
Table 1.2-a: "Where" parameters for the SOTERIA services
[7] http://www.ivoa.net/Documents/latest/UCD.html
Table 1.2-b: "Where" parameters for the HELIO services
Table 1.2-c: "Where" parameters for the EUROPLANET RI services
6.2.3. What?
Table 1.3-a: "What" parameters for the SOTERIA services
Table 1.3-b: "What" parameters for the HELIO services
Table 1.3-c: "What" parameters for the EUROPLANET RI services
ª Observatory:
The couple {"Observatory", "Instrument"} are often used in requests for accessing the data. It
is the case in HELIO and SOTERIA, and it is one of the possibilities in EUROPLANET RI.
The "Observatory" metadata is relatively simple to manage. However, in some rare cases,
there are some possible ambiguities that can occur when there is not a unique name for some
observatories. For example, VENUS-EXPRESS is often named as VEX. In the case of
multiple probe missions, there are different ways for naming a particular one. For example,
the names third spacecraft of the CLUSTER mission can be CLUSTER-3, CLUSTER-C3 or
CLUSTER-RUMBA.
Two possibilities can be envisioned for resolving the names of observatories:
(1) as attempted in HELIO, to develop a semantic mapping service which will be able to
associate the different names of a same observatory. Such a service can understand that
CLUSTER-3, CLUSTER-C3 and CLUSTER-RUMBA has the same meaning.
(2) to establish a list of standardised observatory names, to archive it, to maintain it and to
publish it in order that this list could serve as the reference.
The second option seems much easier to develop and to maintain. Moreover, if using the
option (1), it will be necessary to build this reference list of observatory names. It should be
inside this reference list that the output of the semantic service should be taken, for being
thereafter used in the query language.
Recommendation R4.1:
We recommend establishing a reference list of observatory names. This list should
be maintained/updated, and made available on the Internet.
This should be done within the framework of the IAU. The IAU has a list of ground-based observatories,
maintained by the Minor Planet Center, which includes the majority of observatories. However, it
refuses to add an observatory that is not used for minor planet studies (e.g. the Nançay radio
instruments, in France). For space-based observatories, the list of missions of NASA SPICE [8]
can be used.
It is important to use the official names of space missions (the ones used by the space agencies),
while allowing alternate names in the client layers.
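Option (2) above can be combined with alternate names accepted in the client layers. One way to sketch this is a reference list that maps each official name to its accepted aliases, from which the client derives a lookup index. The entries and the names `build_alias_index` and `resolve_observatory` are illustrative only, and the entries do not constitute a reference list. The index refuses an alias that would point to two different observatories, so that queries are always expressed with a single official name.

```python
from typing import Dict, List

# Illustrative reference list: official name -> accepted alternate names.
REFERENCE_OBSERVATORIES: Dict[str, List[str]] = {
    "VENUS-EXPRESS": ["VEX"],
    "MARS-EXPRESS": ["MEX"],
}


def build_alias_index(reference: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every accepted spelling (case-insensitive) to its official name."""
    index: Dict[str, str] = {}
    for official, aliases in reference.items():
        for name in [official, *aliases]:
            key = name.strip().upper()
            if index.get(key, official) != official:
                raise ValueError(f"name {name!r} points to two observatories")
            index[key] = official
    return index


ALIAS_INDEX = build_alias_index(REFERENCE_OBSERVATORIES)


def resolve_observatory(name: str) -> str:
    """Return the official name for `name`, or raise KeyError if unknown."""
    try:
        return ALIAS_INDEX[name.strip().upper()]
    except KeyError:
        raise KeyError(f"unknown observatory name: {name!r}") from None
```

In this sketch the aliases only exist on the client side. What travels in the query is always the official name from the reference list.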
Recommendation R4.2:
It would be useful if this list could be inserted into a catalogue providing other
information such as the period of operation of the observatories, the targets of the
mission/observatory and the URL of the mission/observatory site. Moreover, in the
case of space missions, the name used must be the one used by the space agencies.
ª Instrument:
The requirements on the "Instrument" metadata are similar. There are of course many more
instruments than observatories. In addition to the potential causes of ambiguity met for
observatories, some difficulties can arise because the same name can be used for
several different instruments. For example, ASPERA is a particle spectrometer suite that
has flown on board both the Mars-Express and Venus-Express missions. This situation is frequent
for some instrument types, such as magnetometers, which are often called "MGF", "MAG",
"MFI" or "FGM". Thus, the instrument name should always be associated with
the observatory name to avoid ambiguity. A semantic mapping service may also
be envisioned for instruments, but it seems complicated to develop and maintain. Again, at
least in the short term, it seems preferable to establish a reference list of instrument names,
associated with (or linked to) the observatory names.
Recommendation R4.3:
We recommend establishing a reference list of instrument names. This list should
be maintained/updated, and made available on the Internet. The names should
correspond to the ones used in official archives.
Recommendation R4.4:
The reference list for instruments should be associated with / merged with / linked
to the mission/observatory list.
It would be useful to include some complementary information in the instrument
reference list, in particular the instrumental techniques. When possible, there should
also be a mandatory keyword describing the instrument type [e.g. Magnetometer], so that,
whatever the name of the instrument, it is possible to know what it is. This, together with the
associated mission/observatory, helps define a particular instrument. We have to keep in mind,
however, that the instrument type is closely related to the measurement type, which is not
always easy to define.
[8] http://naif.jpl.nasa.gov/naif/aboutspice.html
Recommendation R4.5:
As often as possible, the instrument type should be associated with each
instrument name and mission/observatory.
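Recommendations R4.3 to R4.5 can be sketched as a reference list keyed by the (observatory, instrument) pair and carrying a generic instrument type. The entries below reuse the ASPERA example of the text and add a magnetometer, for illustration only; names in official archives may carry version suffixes. `InstrumentEntry`, `lookup` and `by_type` are hypothetical names. The type keyword allows a search for, e.g., all magnetometers, whatever they are called on each mission.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class InstrumentEntry:
    observatory: str      # official observatory name (see R4.1)
    name: str             # instrument name as used in official archives
    instrument_type: str  # generic type keyword, e.g. "Magnetometer"


# Illustrative entries; the key is the (observatory, instrument name) pair,
# so the same instrument name on two missions stays unambiguous.
_ENTRIES = [
    InstrumentEntry("MARS-EXPRESS", "ASPERA", "Particle spectrometer"),
    InstrumentEntry("VENUS-EXPRESS", "ASPERA", "Particle spectrometer"),
    InstrumentEntry("VENUS-EXPRESS", "MAG", "Magnetometer"),
]
INSTRUMENTS: Dict[Tuple[str, str], InstrumentEntry] = {
    (e.observatory, e.name): e for e in _ENTRIES
}


def lookup(observatory: str, name: str) -> InstrumentEntry:
    """Find an instrument from its observatory and name (KeyError if unknown)."""
    return INSTRUMENTS[(observatory, name)]


def by_type(instrument_type: str) -> List[Tuple[str, str]]:
    """All (observatory, instrument) pairs sharing a generic instrument type."""
    return sorted(key for key, e in INSTRUMENTS.items()
                  if e.instrument_type == instrument_type)
```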
ª DataProductType:
The metadata "DataProductType" is relatively universal and depends only marginally, if at all,
on the domain. This metadata is easy to manage if its role is to indicate the type of
a data product with respect to general classes.
This metadata has been reviewed in the context of EUROPLANET RI based on the analysis
of the existing standards (IVOA, SPASE, PDS). The list of Data Product Types extracted
from the EPN-DM data model document is the following:
"DataProduct:
The data product type describes the high level scientific organization of the data product
being considered. The list of product values is:
• Image: associated scalar fields with two spatial axes, e.g. image with multiple color
planes, from multichannel cameras for example. Maps of planetary surfaces are
considered Images
• Spectrum: data product for which the spectral coverage is the primary attribute, e.g.
a set of spectra
• DynamicSpectrum: consecutive spectral measurements through time, organized as a
time series (one temporal axis and one spectral axis)
• SpectralCube: set of spectral measurements with 1D or 2D spatial coverage, e.g.
imaging spectroscopy. The choice between Image and SpectralCube is related to the
characteristics of the instrument
• Profile: scalar or vectorized measurements along one spatial dimension, e.g.
atmospheric profiles, atmospheric paths, sub-surface profiles, etc.
• Volume: any measurement with three spatial dimensions
• Movie: set of chronological 2D spatial measurements
• Cube: multidimensional data with three or more axes, e.g. all that is not described by
other 3D data types such as spectral cubes
• TimeSeries: measurements organized primarily as a function of time (with exception
of dynamical spectra). A light curve is a typical example of a time series dataset.
• Catalogue: it can be a list of events, a catalogue of object parameters, a list of feature,
etc., e.g. list of asteroid properties
• SpatialVector: list of vertex coordinates defining a vector, e.g. vector information
from GIS, spatial footprints, etc."
(From the EPN-DM datamodel document of EUROPLANET RI)
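If this list is adopted as a reference, a client or server can validate the DataProductType values it receives against it. The sketch below encodes the eleven values quoted above as a Python enumeration. The names `DataProductType` and `parse_product_type` are hypothetical. The exact strings serialized by EPN-TAP are not specified here and may differ from these spellings.

```python
from enum import Enum


class DataProductType(Enum):
    """Data product types as listed in the EPN-DM quotation above."""
    IMAGE = "Image"
    SPECTRUM = "Spectrum"
    DYNAMIC_SPECTRUM = "DynamicSpectrum"
    SPECTRAL_CUBE = "SpectralCube"
    PROFILE = "Profile"
    VOLUME = "Volume"
    MOVIE = "Movie"
    CUBE = "Cube"
    TIME_SERIES = "TimeSeries"
    CATALOGUE = "Catalogue"
    SPATIAL_VECTOR = "SpatialVector"


def parse_product_type(value: str) -> DataProductType:
    """Accept only values from the reference list; report the allowed ones otherwise."""
    try:
        return DataProductType(value)
    except ValueError:
        allowed = ", ".join(t.value for t in DataProductType)
        raise ValueError(f"unknown data product type {value!r}; "
                         f"expected one of: {allowed}") from None
```

Validation is case-sensitive in this sketch, so a misspelled value is reported rather than silently mapped to a nearby type.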
Such a list could be adopted as a reference list for solar system science. It remains general but
provides enough information to characterize data searches efficiently in most
cases. However, some additional or more precise metadata may be necessary. This is the
case in particular for solar physics, which needs to distinguish between wide and narrow fields of
view, but also for some planetary imagers.
Recommendation R4.6:
We recommend establishing a reference list of names of the data product types.
This list should be maintained/updated, and made available on the Internet.
Recommendation R4.7:
The data product type is information that is mostly non-dependent on the domains.
Such lists exist already in various standards such as IVOA, SPASE or PDS. As it is
easy and obvious, it is strongly recommended that a list coming from one of them
is adopted and, if needed, extended.
Remark: this has been done in EUROPLANET RI, using the IVOA standards.
ª Measurement Type:
When searching for data, a researcher looks for a particular physical parameter or quantity
that can be useful for his/her study. Generally, data requests to archives or services
use the "observatory"+"instrument" metadata. However, the user does not necessarily know
which instrument can provide such a physical parameter. This situation occurs in particular in
transverse studies based on observations from different domains (e.g. the
planetology use case presented in section 5). It is thus highly desirable to provide information
on the nature of the data content.
Ideally, we would need to describe the physical parameters contained in the data. However, this may
be difficult and complex, and it could become very demanding for the data providers. Many
semantic problems appear when we try to standardize the information at the parameter
level. Consider, for example, the data provided by the energetic ion
instrument EPIC-STICKS flying on board the GEOTAIL spacecraft (which has been used as
a test case in EUROPLANET RI). This instrument combines electrostatic analysers, time-of-flight
telescopes and solid-state detectors. It provides 3D distribution functions with the
ability to separate the ions as a function of their mass and their charge. The data are organised
(i) by detection head and (ii) by detector, and either (iii) by ion species (defined by its mass and
its charge), (iv) by velocity or (v) by energy.
In practice, this instrument continuously produces more than two hundred datasets. It is easy
to imagine that describing such data down to the parameter level is a complicated task.
Standardising this description implies standardising all the necessary information, such as the
way of characterising an ion species or an instrument-dependent species (e.g. energy or
velocity channels without ion separation), the coordinate system in which the data are
represented, the units, the nature of the quantity (counts, fluxes, ...) and several other information
elements depending on the data product types. Moreover, to make this complex description
usable in the interoperability paradigm, it needs to be structured. This particular problem of
the ion spectrometer has been addressed and solved in SPASE. But the SPASE data model
cannot be applied outside heliophysics and thus cannot cover solar system science as a
whole.
Problems of similar complexity can be found in all the fields of solar system
science. The EPN-DM data model has been developed in the context of EUROPLANET RI to
address this challenge. This model, based on the IVOA standards, is being published.
It will very likely need some years to become stable.
We may thus consider that, at present, there is no applicable solution for providing a
standardized description of solar system science data at the measured-parameter level.
Introducing the "Measurement Type" metadata could be considered as an intermediate option.
The measurement type characterizes the type of physical quantities contained in the data while
remaining at a high level of description. For example, the "MeasurementType" metadata can
indicate that the data contain magnetic field measurements, without specifying the
coordinate system, the cadence, the units, etc. Such metadata are much easier to manage
and still make the data search efficient.
From the data provider's side, this option also has great advantages. As this metadata is
simple, it will be easy to include in the description of his/her data. If a description at the
physical parameter level were required, it would be extremely costly
and resource-consuming. This would slow down development, and we can anticipate that it
would discourage data providers from publishing their data in a virtual
observatory.
Recommendation R4.8:
As a standardised description at the "physical parameter" level is very complex to
elaborate and to implement, we recommend introducing the intermediate level
characterizing the "Measurement Type".
Recommendation R4.9:
Reference lists of the "MeasurementType" metadata exist in the SPASE and EPN-DM
data models, only for the domain of heliophysics. The communities in other
fields should organize themselves in order to study this issue and to propose
reference lists for this metadata.
Recommendation R4.10:
The MeasurementType metadata should be developed in accordance with existing
standards, or by extending them if necessary.
To illustrate this last recommendation, the list of MeasurementType values
for plasmas used in EPN-DM is given below. Most of them already existed among the
IVOA UCDs but, as mentioned above, a conclusion of the Solar System UCDs
document v0.3 is that UCDs are probably not the best place to describe MeasurementType
in detail. This discussion is still in progress in IVOA.
UCDs that already exist in the IVOA:

UCD               Origin    Meaning
_________________________________________________________________
phys.electField   cds       Electric field
phys.magField     cds       Magnetic field
phys.electron     cds       Electron
em.gamma          cds       Gamma-ray part of the spectrum
em.X-ray          cds       X-ray part of the spectrum
em.UV             cds       Ultraviolet part of the spectrum
em.opt            cds       Optical part of the spectrum
em.IR             cds       Infrared part of the spectrum
em.radio          cds       Radio part of the spectrum

UCDs added in EPN-DM:

UCD               Origin    Meaning
_________________________________________________________________
phys.ion          EPN-DM    Ions
phys.dust         EPN-DM    Dust
phys.neutral      EPN-DM    Neutrals
7. Summary of recommendations.
By analysing the metadata used in the SOTERIA, HELIO and EUROPLANET RI projects, and
anticipating the interoperable infrastructures needed for the transverse science targeted by future
missions such as SOLAR-ORBITER, ROSETTA or BEPI-COLOMBO, we derived the
following recommendations:
ª Concerning the general approach:
Recommendation R1:
We recommend considering several layers of interoperability corresponding to
different layers of functionality and developing the standards accordingly,
although keeping as much consistency as possible between them by using a
common dictionary as much as possible. Two or three layers is probably the
maximum needed.
Considering the first layer of interoperability, which allows the user to query data, our
recommendations are:
ª Concerning the description of the time:
Recommendation R2.1:
For developing interoperability, it is recommended that a standardised format
for writing the dates in the metadata should be adopted. This standardised
format should be ISO 8601.
Recommendation R2.2:
The time parameter may have different meanings. It is recommended that a way
be found to indicate clearly what the parameter is, for example by associating an
additional metadata providing its exact meaning.
ª Concerning the description of the location of the targets of the observations:
Recommendation R3.1:
The precision of the metadata characterising the location of the target of the
measurement should be adjusted as a function of the amount of the potentially
corresponding data.
Recommendation R3.2:
The generalization of SPASE descriptors of target.region should be studied and
experimented with.
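As we understand them, SPASE region values are written as dot-separated hierarchical terms, which makes them easy to match at different levels of precision (see also R3.1). The Python sketch below assumes region values in that style; the dataset identifiers and the exact terms are illustrative and should be checked against the SPASE dictionary.

```python
# Illustrative region terms in a dotted, hierarchical style.
DATASET_REGIONS = {
    "dataset-1": "Earth.Magnetosphere.Magnetotail",
    "dataset-2": "Earth.Magnetosheath",
    "dataset-3": "Heliosphere.Inner",
}


def region_matches(query: str, region: str) -> bool:
    """True if `region` equals `query` or is a sub-region of it."""
    return region == query or region.startswith(query + ".")


def coarsen(region: str, depth: int) -> str:
    """Truncate a region term to its first `depth` levels (lower precision)."""
    return ".".join(region.split(".")[:depth])


def find_datasets(query: str):
    """Identifiers of datasets whose region lies within `query`."""
    return sorted(ds for ds, region in DATASET_REGIONS.items()
                  if region_matches(query, region))
```

A broad query such as `find_datasets("Earth")` returns both Earth datasets, while `find_datasets("Earth.Magnetosphere")` narrows the result to the magnetotail dataset.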
ª Concerning the content of the data:
Recommendation R4.1:
We recommend establishing a reference list of observatory names. This list
should be maintained/updated, and made available on the Internet.
Recommendation R4.2:
It would be useful if this list could be inserted into a catalogue providing other
information such as the period of operation of the observatories, the targets of
the mission/observatory and the URL of the mission/observatory web site.
Moreover, in the case of space missions, the name used must be the one used by
the space agencies.
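A minimal sketch of the catalogue suggested in R4.2, assuming Python dataclasses: each entry stores the name, the period of operation, the targets and the web site URL, which is enough to answer a query such as "which observatories were observing a given target on a given day". All entries, dates and URLs are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Observatory:
    name: str                 # for space missions, the name used by the agency
    start: date               # start of operations
    end: Optional[date]       # None while the observatory is still operating
    targets: Tuple[str, ...]  # targets of the mission/observatory
    url: str                  # mission/observatory web site

    def operating_on(self, day: date) -> bool:
        """True if `day` falls within the period of operation."""
        return self.start <= day and (self.end is None or day <= self.end)


# Placeholder entries: names, dates, targets and URLs are hypothetical.
CATALOGUE = [
    Observatory("MissionA", date(2001, 1, 1), None, ("Comet",),
                "https://example.org/mission-a"),
    Observatory("ObservatoryB", date(1996, 1, 1), date(2010, 12, 31), ("Sun",),
                "https://example.org/obs-b"),
]


def observatories_for(target: str, day: date) -> List[str]:
    """Names of catalogued observatories observing `target` on `day`."""
    return [o.name for o in CATALOGUE
            if target in o.targets and o.operating_on(day)]
```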
Recommendation R4.3:
We recommend establishing a reference list of instrument names. This list should
be maintained/updated, and made available on the Internet. The names should
correspond to those used in official archives.
Recommendation R4.4:
The reference list for instruments should be associated with / merged with /
linked to the mission/observatory list.
Recommendation R4.5:
Whenever possible, the instrument type should be associated with each
instrument name and mission/observatory.
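R4.3 to R4.5 can be sketched as an instrument list whose entries must point to the observatory reference list and carry an instrument type. The Python fragment below is illustrative only: the reference lists, instrument names and type terms are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical reference lists; real ones would be maintained and published online.
OBSERVATORIES = {"MissionA", "ObservatoryB"}
INSTRUMENT_TYPES = {"Imager", "Spectrometer", "Magnetometer"}


@dataclass(frozen=True)
class Instrument:
    name: str             # name as used in the official archive
    observatory: str      # must appear in the observatory reference list
    instrument_type: str  # must appear in the instrument-type list


def check(instrument: Instrument) -> List[str]:
    """Return a list of problems; an empty list means the entry is consistent."""
    problems = []
    if instrument.observatory not in OBSERVATORIES:
        problems.append(f"unknown observatory: {instrument.observatory}")
    if instrument.instrument_type not in INSTRUMENT_TYPES:
        problems.append(f"unknown instrument type: {instrument.instrument_type}")
    return problems


def by_observatory(instruments: List[Instrument]) -> Dict[str, List[str]]:
    """Group instrument names under the observatory that carries them."""
    index: Dict[str, List[str]] = {}
    for inst in instruments:
        index.setdefault(inst.observatory, []).append(inst.name)
    return index
```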
Recommendation R4.6:
We recommend establishing a reference list of names of the data product types.
This list should be maintained/updated, and made available on the Internet.
Recommendation R4.7:
The data product type is information that is largely independent of the
scientific domain. Such lists already exist in various standards such as IVOA,
SPASE or PDS. As this is easy and obvious, we strongly recommend adopting a list
from one of these standards and extending it if needed.
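R4.7 amounts to reusing an existing vocabulary and recording which terms are local extensions. In this Python sketch we assume, as the adopted standard, a subset of the IVOA ObsCore `dataproduct_type` values (`image`, `cube`, `spectrum`, `timeseries`); the extension term `dynamic_spectrum` is hypothetical.

```python
# Adopted standard: a subset of IVOA ObsCore dataproduct_type values (assumption).
STANDARD_TYPES = frozenset({"image", "cube", "spectrum", "timeseries"})
# Hypothetical local extension needed by one community.
LOCAL_EXTENSIONS = frozenset({"dynamic_spectrum"})


def build_vocabulary(standard, extensions):
    """Merge the adopted list and its extensions, recording each term's origin."""
    clash = standard & extensions
    if clash:
        raise ValueError(f"extensions redefine standard terms: {sorted(clash)}")
    vocabulary = {term: "standard" for term in standard}
    vocabulary.update({term: "extension" for term in extensions})
    return vocabulary


VOCABULARY = build_vocabulary(STANDARD_TYPES, LOCAL_EXTENSIONS)


def classify(product_type: str) -> str:
    """Return 'standard', 'extension' or 'unknown' for a product type label."""
    return VOCABULARY.get(product_type.strip().lower(), "unknown")
```

Refusing extensions that redefine standard terms keeps the adopted list authoritative while still allowing it to grow.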
Recommendation R4.8:
As a standardised description at the "physical parameter" level is very complex
to develop and to implement, we recommend introducing an intermediate level
that characterises the "Measurement Type".
Recommendation R4.9:
Reference lists for the "MeasurementType" metadata exist in the SPASE and EPN-DM
data models, but only for the domain of heliophysics. The communities in other
fields should organise themselves to study this issue and to propose reference
lists for this metadata.
Recommendation R4.10:
The MeasurementType metadata should be developed in accordance with existing
standards, extending them if necessary.
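To illustrate R4.8, the Python sketch below searches datasets at the "MeasurementType" level: providers keep their own parameter names, and only the measurement type has to come from a reference list. The type terms are written in the style of the SPASE MeasurementType values, but the list, dataset identifiers and parameter names are illustrative.

```python
# Reference list in the style of SPASE MeasurementType values (illustrative subset).
MEASUREMENT_TYPES = {"MagneticField", "ThermalPlasma", "EnergeticParticles"}

# Hypothetical datasets: parameter names are left to each provider.
DATASETS = [
    {"id": "ds-1", "measurement_type": "MagneticField",
     "parameters": ["Bx_GSE", "By_GSE", "Bz_GSE"]},
    {"id": "ds-2", "measurement_type": "MagneticField", "parameters": ["B_total"]},
    {"id": "ds-3", "measurement_type": "ThermalPlasma", "parameters": ["Np", "Vp"]},
]


def search(datasets, measurement_type):
    """Return ids of datasets of the requested measurement type."""
    if measurement_type not in MEASUREMENT_TYPES:
        raise ValueError(f"not in the reference list: {measurement_type!r}")
    return [d["id"] for d in datasets if d["measurement_type"] == measurement_type]
```

Both magnetic field datasets are found although their parameter names differ, which is the point of describing data at this intermediate level.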
At a more general level, we could add:
Recommendation R5.1:
The idea of managing many different data models with the help of a semantic
mapping service is interesting and offers a promising perspective. This approach
should be pursued, but it is still at the research stage. In the short term, it
seems difficult to apply to the solar system sciences as a whole.
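A toy version of the semantic mapping idea in R5.1, assuming two hypothetical data models (`model_a`, `model_b`) whose fields are each mapped to shared concept identifiers. Translating a record goes through the shared concepts, and anything without a mapping is reported rather than guessed. All field and concept names are invented for illustration.

```python
# Hypothetical mappings from each model's field names to shared concepts.
MAPPINGS = {
    "model_a": {"StartDate": "time.start", "StopDate": "time.stop",
                "ObservatoryName": "observatory"},
    "model_b": {"t_begin": "time.start", "t_end": "time.stop",
                "facility": "observatory"},
}


def translate(record, source, target):
    """Rename fields from `source` to `target`; return (result, unmapped fields)."""
    to_concept = MAPPINGS[source]
    from_concept = {concept: field for field, concept in MAPPINGS[target].items()}
    result, unmapped = {}, []
    for field, value in record.items():
        concept = to_concept.get(field)
        if concept in from_concept:
            result[from_concept[concept]] = value
        else:
            unmapped.append(field)
    return result, unmapped
```

A real service would also have to convert units, time scales and value vocabularies, not only rename fields, which is part of why the approach remains a research topic.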
Recommendation R5.2:
The IVOA standards cover astronomy, which includes multiple domains. As
experience in planetology has shown, some of these standards seem flexible enough
to be adapted to the solar system sciences, with extensions developed where
necessary. Using and extending the IVOA standards seems to be the easiest and
least costly way to develop interoperable infrastructures in the solar system
sciences.
This last recommendation applies to data formats and to search and exchange procedures. For
data description, however, the IVOA standards are not suited. This recommendation therefore
mainly concerns data formats (VOTable, FITS, …), search (TAP, EPN-TAP, …) and exchange (SAMP, …).
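As an example of reusing IVOA search protocols, the Python sketch below builds a synchronous TAP request carrying an ADQL query on an EPN-TAP table, selecting granules whose time interval overlaps the requested one. The service URL and schema name are hypothetical; the `/sync` endpoint with `REQUEST=doQuery` and `LANG=ADQL` follows the TAP specification, and the `epn_core` table with `target_name`, `time_min` and `time_max` (in Julian days) follows EPN-TAP conventions as we understand them, so both should be checked against the current documents. The Julian day conversion does not distinguish between UTC and other time scales.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

UNIX_EPOCH_JD = 2440587.5  # Julian day of 1970-01-01T00:00:00


def iso_to_jd(iso: str) -> float:
    """Convert an ISO 8601 date to a Julian day (time scales not distinguished)."""
    dt = datetime.fromisoformat(iso.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
    return dt.timestamp() / 86400.0 + UNIX_EPOCH_JD


def epn_tap_sync_url(base_url: str, schema: str, target: str,
                     start_iso: str, stop_iso: str) -> str:
    """Build a TAP sync URL selecting granules that overlap [start, stop]."""
    safe_target = target.replace("'", "''")  # escape quotes in ADQL literals
    adql = (
        f"SELECT * FROM {schema}.epn_core "
        f"WHERE target_name = '{safe_target}' "
        f"AND time_max >= {iso_to_jd(start_iso)} "
        f"AND time_min <= {iso_to_jd(stop_iso)}"
    )
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"


# Hypothetical service and schema names.
url = epn_tap_sync_url("https://example.org/tap", "example_schema", "Mars",
                       "2000-01-01T12:00:00Z", "2000-01-02T12:00:00Z")
```

The resulting URL can be opened with any HTTP client; unless another format is requested, a TAP service returns its result as a VOTable, which tools such as Topcat can read directly.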
8. List of useful links
8.1. Institutions
* IAU nomenclature for object types:
http://planetarynames.wr.usgs.gov/Page/Planets
* IVOA home page:
http://www.ivoa.net/
IVOA documents:
http://www.ivoa.net/documents/
* IPDA (International Planetary Data Alliance) home page:
http://planetarydata.org
8.2. FITS information
* World coordinate system for use in FITS format:
http://fits.gsfc.nasa.gov/fits_wcs.html
* Keywords recommended in FITS files for use with standard software:
http://www.mssl.ucl.ac.uk/surf/sswdoc/solarsoft/ssw_standards.html
* STEREO/SECCHI FITS Header Keyword Definition (which is widely used):
http://sohowww.nascom.nasa.gov/solarsoft/stereo/secchi/doc/FITS_keywords.pdf
8.3. Project web pages and documents
* HELIO home page:
http://www.helio-vo.eu/index.php
Useful HELIO documents:
* Heliospheric & Planetary Data:
http://www.helio-vo.eu/documents/public/HELIO_Planetary-Data_20100208.pdf
* Coordinate Systems:
http://www.helio-vo.eu/documents/public/HELIO_Coordinates_100322.pdf
* HELIO Data Model:
http://www.helio-vo.eu/documents/public/HELIO_UNIMAN_R1_004_TN_DataModel_v0.3.pdf
* SOTERIA home page:
http://soteria-space.eu/index.php
* EUROPLANET IDIS (Integrated and Distributed Information System) home page:
http://www.europlanet-idis.fi/index.php
EUROPLANET IDIS documents page:
http://www.europlanet-idis.fi/index.php?id=documents
Documents describing the data models and the protocols used in EUROPLANET RI:
o The document describing the PDAP protocol:
http://planetarydata.org/projects/inactive-projects/data-access/documents/pdapversions/pdap-v1.1/view
o The document describing the EPN-TAP protocol:
http://voparis-europlanet.obspm.fr/Tdocum.shtml
o The document describing the EPN-DM data model:
http://voparis-europlanet.obspm.fr/docs/PlanetaryScienceResource-DM-latest.pdf
* IMPEx home page:
http://impex-fp7.oeaw.ac.at
* VSO recommendations:
* Minimum Information for Solar Observations:
http://docs.virtualsolar.org/wiki/MinimumInformation
* Checklist for data description:
http://docs.virtualsolar.org/wiki/Checklists
8.4. Data Model, UCD, Metadata…
* Early data model for solar observations:
http://www.mssl.ucl.ac.uk/~rdb/egso/documents/
* The IMPEx Data Model:
http://impex.latmos.ipsl.fr/tools/DataModel.htm
* Explanation of the meaning of UCDs:
http://cdsweb.u-strasbg.fr/UCD/tree/js/
* IVOA document (version 0.2) on Solar System UCDs:
http://wiki.ivoa.net/internal/IVOA/InterOpSep2013Semantics/SolarSystemUCD-V02.pdf
Version 0.3 of this document:
http://typhon.obspm.fr/idis/docs/SolarSystemUCD-V03.pdf
* "Specification on Metadata Standards in Heliophysics" (HELIO N3.3 deliverable):
http://www.helio-vo.eu/internal/Documents/Deliverables/HELIO-N3-005d3_MetadataStandards_v1.7.pdf
8.5. Tools for data mining and data visualization based on VO standards
* Aladin:
http://aladin.u-strasbg.fr/aladin.gml
* Topcat:
http://www.star.bris.ac.uk/~mbt/topcat/
* AMDA:
http://cdpp-amda.cesr.fr/DDHTML/index.html
* CL Web:
http://clweb.cesr.fr
8.6. Others
* NASA list of missions:
http://naif.jpl.nasa.gov/naif/aboutspice.html
* SPASE:
• Home page:
http://www.spase-group.org
• Registry explorer:
http://www.spase-group.org/registry/explorer/
• Data Model explorer:
http://www.spase-group.org/data/explorer/