Coordination Action for the integration of Solar System Infrastructures and Science
Project No.: 261618
Call: FP7-INFRA-2010-2

Recommendations on how to improve the quality of the data
Version 2.3

Title: Recommendations on how to improve the quality of data
Document No.: CASSIS Deliverable: D2.3 v2.3
Date: Dec 02, 2013
Editor: Christian Jacquey
Contributors: Jean Aboudarham, David Berghmans, Baptiste Cecconi, André Csillaghy, Véronique Delouille, Sébastien Hess
Distribution: Project

Revision History

Version   Date           Released by      Detail
1.0       11-June-2012
1.1       14-June-2012
1.2       15-June-2012
1.3       19-June-2012
1.4       21-June-2012
1.5       21-June-2012
1.6       27-June-2012
1.7       29-June-2012
2.0       24-Nov-2013    C. Jacquey       Update document with EuroPlaNet and IMPEx information
2.1       29-Nov-2013    J. Aboudarham    Update with additional information on FITS standards
2.2       2-Dec-2013     J. Aboudarham    Change character font in accordance to CASSIS standards and correct some typos
2.3       08-Jan-2014    R.D. Bentley     Corrected things found during checking.

Released by (versions 1.0 to 1.7): A. Csillaghy, C. Jacquey, J. Aboudarham

Contents

1. Introduction
2. Description of the data in Solar System Sciences.
   2.1. Diversity of the field in Solar System Sciences.
   2.2. For who?
   2.3. Who?
   2.4. When?
   2.5. Where?
   2.6. What?
3. What does exist already in terms of describing data?
4. Different layers of interoperability
5. Levels of interoperability versus science cases.
6. Layer-1: basic search of data.
   6.1. Matrix of the primary metadata used in the SOTERIA, HELIO and EUROPLANET RI services.
   6.2. Frequently used primary metadata
      6.2.1. When ?
      6.2.2. Where ?
      6.2.3. What ?
7. Summary of recommendations.
8. List of useful links
   8.1. Institutions
   8.2. FITS information
   8.3. Projects web pages and documents
   8.4. Data Model, UCD, Metadata…
   8.5. Tools for data mining and data visualization based on VO standards
   8.6. Others

1. Introduction

The aim of this document is to provide recommendations for improving the quality of the metadata in order to enhance the interoperability between the services that the EU projects SOTERIA, HELIO and EUROPLANET RI provide. Recently, IMPEx (Integrated Medium for Planetary Exploration) provided a data model to interconnect planetary observations with numerical models. We take this work into account in this document.

These recommendations could serve beyond the perimeter of these projects and could be considered more generally in any data production and system related to solar system sciences. In particular, they could help in the context of upcoming and future missions such as SOLAR-ORBITER, ROSETTA and BEPI-COLOMBO.

This document follows:
- the review of the services offered in these three projects and in the US VOs as well (CASSIS/D2.1);
- the analysis (CASSIS/D2.2) of the metadata used in these services, focusing on how they support, or conversely inhibit, interoperability.

2. Description of the data in Solar System Sciences.

2.1. Diversity of the field in Solar System Sciences.
The Solar System sciences may be seen as consisting of two major disciplines, heliophysics and planetology. Each one includes a number of sub-disciplines, which intersect depending on the topic studied.

In heliophysics, we may identify sub-disciplines such as:
- Solar Physics
- Solar Corona Physics
- Interplanetary medium and solar wind physics
- Planetary (including Earth) magnetospheric physics
- Planetary (including Earth) ionospheric physics

In planetology, the sub-disciplines are:
- Planetary (including Earth) magnetospheric physics
- Planetary (including Earth) ionospheric physics
- Atmospheric physics and chemistry
- Surface mineralogy, physics and chemistry
- Planet interior physics (geodynamics)
- Small bodies dynamics, physics and chemistry
- Interplanetary matter

The associated communities use the data in ways that may differ depending on the sub-discipline and the type of data exploited. Consequently, the way of describing data is heterogeneous across the various fields of the Solar System sciences. Some communities developed standards and others did not. At present, a standardized description for all fields, at a level such that the data can be used by scientific tools, cannot be envisioned. Such an evolution will take years. However, we can identify the main components on which the description of the data is built. To this end, let us consider the questions that arise when we want to characterize a particular set of data.

Important notice: the following applies to observational data. For simulation or model result data, and for catalogues, there is an additional complexity that is often not yet solved, in Solar System Sciences and in Astronomy as well.
In space physics, for the interplanetary medium, for magnetospheres and in a limited way for ionospheres, this complexity is covered by IMPEx, which is at a nearly operational stage and has begun to spread beyond the original IMPEx program participants (UCLA, LESIA, ...).

2.2. For who?

A description of Solar System sciences data often does not exist, making it necessary to contact the mission or instrument PI to get information about the data location, format, caveats, and context. This is most often done by humans, easily but at the expense of a lot of time. Machines could speed up this process by thousands of times, but would need a more accessible description of the data. Moreover, machines do not think or understand these descriptions in the strictest sense. At most, they can rely on predefined keywords to search, sort and analyze the data. In order to be efficient, these keywords must be both precise (so that the result is relevant) and limited in number (so that different datasets can be related to each other). This is already a difficult task within a limited domain of astrophysics, but Solar System Sciences cover many domains of physics, which makes the task much more complex.

2.3. Who?

Who produces these data? Who is the contact person if we want information on them? What are the rights associated with them? It is generally easy to generate a description that answers these questions, and this block of information is present in most of the existing standards.

2.4. When?

What is the time period represented by the data? When were the data acquired? To which time are they related, if they are simulation or predictive data? This information is crucial when the data correspond to measurements or predictions related to dynamical phenomena. This is the case, for example, for plasmas, atmospheric physics or solar physics. In these fields, the first criterion for searching data is the time period.
The description of this information is in principle simple, as it consists of a start time and an end time bounding the period covered by the data. However, it becomes more complicated when the data correspond to simulations or predictions of propagating phenomena (e.g. Coronal Mass Ejections). In this case, the time period can only be considered by also answering the question: Where?

2.5. Where?

From which location do the data provide information? What is the target? This question can already be more complex. In the case of observational data, the target can be local if the data consist of in-situ measurements, or distant if they come from remote sensing. Some particular observation types mix both, for example the measurement of electromagnetic waves, which can be generated locally by plasmas or come from distant emission sources.

In Solar System sciences, the description of the data that answers the "Where?" question can be organized hierarchically. The first level consists of the object of the Solar System to which the data correspond, which is generally the target of the observation. The target_class defined in the EPN DataModel defines the possible values:
• asteroid, dwarf_planet, planet, satellite (types from the IAU nomenclature for object types: http://planetarynames.wr.usgs.gov/Page/Planets)
• comet, exoplanet, interplanetary_medium (solar wind), ring, sample, sky, spacecraft, spacejunk, star (Sun) (extra types defined for EPN)

The regions or sub-objects can be considered as a second level. In this class, we can find regions like magnetosphere, atmosphere, corona, Olympus Mons on Mars, etc. Sub-objects can be the Saturn rings, the Mars polar cap, etc. However, this class can become difficult to describe if we want to become more precise.
For example, it could be useful to indicate whether a set of data corresponds to the "dayside magnetopause" (in magnetospheric physics) or to the "Pavonis Mons volcano" (on Mars). But it is obvious that pushing the precision too far will lead to a complexity of the data description which could become impossible to manage. It was proposed in EPN at some point to have a "target-region" and a "target-feature" (which is a level below), but it appeared to be very difficult to describe them formally.

A third level consists of characterizing the target using coordinates. This is easily done in astronomy because in this case the data come only from remote sensing and there is a commonly used coordinate system (right ascension and declination). The world coordinate system is defined in Astronomy for use in the FITS format in http://fits.gsfc.nasa.gov/fits_wcs.html. In the Solar System, it is much more complex because (i) the data come from remote sensing or in-situ measurement, and (ii) there is a plethora of objects and associated coordinate systems. In the case of the Sun, W.T. Thompson (2005) provided a detailed description of the coordinate systems for solar image data, followed by an application of the world coordinate system to the STEREO probes in W.T. Thompson (2010).

2.6. What?

What is the type of data (time series, images, distribution functions, ...)? What is the content of the data? What are the physical parameters provided by these data? What are their uncertainties, and how were they computed? Which instrument obtained them, and onboard which observatory? Which instrumental techniques were used? What were the experimental conditions?
A description answering this question can rapidly become very complex. To provide one, it is useful to consider different levels of information:
- the data product level: this level indicates the type of data product (detailed values in section 6.2.3, Data Product subsection);
- the basic instrument level: here, only the name of the observatory and the name of the instrument are considered;
- the measurement type level: at this level, the description indicates the type of physical quantities contained in the data. For example, the data contain "magnetic field measurement", "electron measurement" or "Infrared part of the spectrum measurement". This is much trickier than was thought at the beginning, and it is not implemented yet in EPNTAP. This must be further studied and defined, but in version 0.3 of the Solar System UCDs document (Cecconi et al.) submitted to IVOA, we can find: “Discuss if the initial goal of using UCDs for “Measurement Types” information is reached. Probably not.”
- the physical parameter level: this is the precise description of the content of the data. Reaching this level, if sufficiently developed, should allow scientific exploitation with tools which can interpret the associated metadata. The metadata have to include all the needed information: the type of parameter, the exact measured physical quantity, the units, the coordinate systems, and in some cases calibration parameters or indications of the experimental context. This level rapidly becomes complex in Solar System sciences. For example, for ion dynamical spectra coming from a mass spectrometer, this requires indicating the energy range, the angles defining the field of view, the mass of the ions and their mass/charge ratio, and it may also be necessary to have the measurement of the spacecraft potential.
In planet surface imaging, many metadata are needed in order to extract scientific information by taking into account the geometrical aspects (scaling, reflectance depending on the incidence, ...).

3. What does exist already in terms of describing data?

Describing data in the solar system sciences has been approached in several different ways. We briefly review these efforts in chronological order, so as to place our work in a broader context.

Solar physics was already confronted with the need for common data descriptions at the start of SOHO. There were no Web Services then, but there was nevertheless a clear advantage in having common descriptions of the data, as this enabled many of the software utilities to be re-used. Thus, a simple common vocabulary was introduced. S. Freeland described “Tags” that were recommended for use in FITS files, making these files readable by many programs in the IDL SolarSoftWare (SSW) environment [1]. These tags are still widely used and are based on FITS standards (Pence, W D, L Chiappetti, C G Page, R A Shaw, and E Stobie. 2010. “Definition of the Flexible Image Transport System (FITS), Version 3.0.” Astronomy and Astrophysics 524 (November 22): A42. doi:10.1051/0004-6361/201015362.). They have spread widely in this community. The following table shows some of these tags. It is interesting to note that we already see here that one of the main concerns is how to represent date and time information.

[1] See page at http://www.mssl.ucl.ac.uk/surf/sswdoc/solarsoft/ssw_standards.html

SSW keywords for the solar software tree (from ssw_standards.html)

Tag: DATE_OBS
Status: Required
Source: W.Thompson, R.Howard
Definition: The time and date of the start of the observation, in Earth-adjusted UT, in a timestamp format recognized by the SSW ANYTIM routine (*). CCSDS ASCII Calendar Segmented (ISO 8601). Example: 1988-01-18T17:20:43.123Z

Tag: EXPTIME
Status: Suggested
Source: FITS Std
Definition: The duration of the observation, in seconds (floating point). Permits calculation of the "mean time" of an observation with the agreed standard for DATE_OBS.

Tag: TIME
Status: Suggested
Source: W.Thompson, R.Howard; Yohkoh / SOHO
Definition: Milliseconds of day (long integer).

Tag: MJD
Status: Suggested
Source: W.Thompson, R.Howard; SOHO
Definition: Modified Julian Day (long integer). Used in conjunction with TIME to provide efficient binary time representation.

Tag: DAY
Status: Optional
Source: SMM / Yohkoh
Definition: Integer number of days since 1-1-79 (short integer). Used in conjunction with TIME to provide efficient binary time representation.

Tag: DATE-OBS
Status: Suggested
Source: FITS std
Definition: The date on which the observation was made, UT. SSW utilities do not directly use this tag, but its inclusion in headers is suggested for FITS conformability. (To access this tag in an IDL structure, remember to use DATE_d$OBS.)

Tag: TIME-OBS
Status: Suggested
Source: Common usage
Definition: The time at which the start of the observation occurred, UT. This tag is not explicitly defined in the FITS standard, but has been adopted into common usage. (To access this tag in an IDL structure, remember to use TIME_d$OBS.)

(*) The ANYTIM routine can produce this format from all known time standards used in the solar physics community. The anytim.pro IDL routine is included in the generic library of SolarSoft. The function is intended to be Y2K (Year 2 thousand) compliant, as it properly handles all conversions within the range January 1, 1950 through December 31, 2049. There are no time range restrictions for either of the binary formats (TIME and DAY, or TIME and MJD) or for string formats which use the fully qualified (4 digit) year, including the formal definition of DATE_OBS above.

Although the time description is standardized, the meaning of time in an observation is not so obvious. This is discussed in section 6.2.1.

The FITS format also proposes coordinate values.
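Before turning to the coordinate keywords, the time tags listed above can be illustrated with a short sketch. It is not taken from SolarSoft: the function names are hypothetical, "Earth-adjusted UT" is simply treated as UTC, and only the ISO 8601 form shown in the DATE_OBS example is handled. Under these assumptions, it shows how EXPTIME permits the calculation of the "mean time" of an observation, and how the binary pair MJD and TIME maps onto the same instant as a DATE_OBS string.

```python
from datetime import datetime, timedelta, timezone

# Modified Julian Day 0 corresponds to 1858-11-17T00:00:00 UT.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def parse_date_obs(value: str) -> datetime:
    """Parse a DATE_OBS string such as 1988-01-18T17:20:43.123Z.

    Assumption: the value is UT and is treated here as UTC.
    """
    text = value.strip()
    if text.endswith("Z"):
        text = text[:-1]
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in text else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def mean_time(date_obs: str, exptime: float) -> datetime:
    """Mid-point of an observation: start time plus half of EXPTIME (seconds)."""
    return parse_date_obs(date_obs) + timedelta(seconds=exptime / 2.0)


def from_mjd_time(mjd: int, time_ms: int) -> datetime:
    """Combine MJD (whole days) and TIME (milliseconds of day) into one instant."""
    return MJD_EPOCH + timedelta(days=mjd, milliseconds=time_ms)


if __name__ == "__main__":
    start = "1988-01-18T17:20:43.123Z"
    print(mean_time(start, 10.0).isoformat())
    print(from_mjd_time(47178, 62443123).isoformat())
```

The DAY tag is left out on purpose, since the table does not say whether 1-1-79 counts as day 0 or day 1.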
The CRPIX, CTYPE and CDELT keywords, giving respectively the location of the Sun’s centre, the units of the axes of the ‘image’, and the pixel resolution in the given unit, help to locate an observation on the Sun. In the case of the heliosphere and planets, the FITS format does not provide the description needed to locate observations correctly in a standard format. Some planetologists are in contact with the IAU in order to update the FITS standard definition to solve this issue.

The 53 keywords [2] found in the FITS standard are:

AUTHOR, BITPIX, BLANK, BLOCKED, BSCALE, BUNIT, BZERO, CDELTn, COMMENT, CROTAn, CRPIXn, CRVALn, CTYPEn, DATAMAX, DATAMIN, DATE, DATE-OBS, END, EPOCH, EQUINOX, EXTEND, EXTLEVEL, EXTNAME, EXTVER, GCOUNT, GROUPS, HISTORY, INSTRUME, NAXIS, NAXISn, OBJECT, OBSERVER, ORIGIN, PCOUNT, PSCALn, PTYPEn, PZEROn, REFERENC, SIMPLE, TBCOLn, TDIMn, TDISPn, TELESCOP, TFIELDS, TFORMn, THEAP, TNULLn, TSCALn, TTYPEn, TUNITn, TZEROn, XTENSION

In the early Internet age, the European Grid on Solar Observations published in 2003 a data model that reached further than the FITS keywords (K. Reardon et al., http://www.mssl.ucl.ac.uk/~rdb/egso/documents/). This data model emphasized the relationships between the different entities involved in solar observations rather than the vocabulary. As a result, several different entities were described by the same words, which prevented its wide adoption as an interoperability reference document (but this was not its purpose anyway).

At the same time, the IVOA was investing a fair amount of resources to create data models for astronomical data, as well as protocols to make services interoperable. As a result of this major effort, the IVOA Architecture is a widely regarded set of models and protocols. Although created for astronomy, the IVOA is also a vast source of resources for solar system sciences. In particular, protocols like SAMP and TAP can be used straightforwardly in our domain. Nevertheless, the IVOA does not solve all the problems, in particular in the way it deals with time.
Our domain needs many more ways of expressing dates and times.

Also in the framework of astronomy, the Unified Content Descriptors (UCDs) provide a vocabulary into which the solar system sciences can plug. Here is an example of two UCDs (http://cdsweb.u-strasbg.fr/UCD/tree/js/) for describing time:
• time.start
• time.end

However, UCDs as defined today are limited and do not provide ways to express all the semantics available in our (highly dynamic) environment. For instance, there is no easy way of distinguishing between observation times and event times, a distinction which is fundamental in our field.

[2] Their definition can be found in: http://www.mssl.ucl.ac.uk/grid/iau/extra/solarsoft/ssw_standards.html or in: http://bass2000.bagn.obs-mip.fr/Tarbes/spip.php?rubrique22

But UCD is probably not the place to describe that; this has to be investigated. However, we can think of:
• time.start;obs and time.start;src
• time.end;obs and time.end;src
but we can find, in the Solar System UCDs document (V0.3), that “discussion [is] needed on “src” (definition = Observed source viewed on the sky). For example, change definition so that src can cover other sources than sky objects (such as samples).”

UCDs can be combined. However, there is no precise standard regarding the meaning of these combinations, rendering them close to useless. Recently, the solar system community began to talk to the consortium providing the UCDs about extending them to incorporate Solar System data as well. Furthermore, UCDs do not provide indications about the format of the date/time.

Following the path of the IVOA, the SPASE (Space Physics Archive Search and Extract) data model emerged. This is one of the most mature data models in the solar system sciences, except maybe for Earth observations. SPASE has played a particularly important role in the early phase of HELIO.
SPASE received many extensions and is widely used outside the limited domain of physics it was born in (Heliophysics). Notably, it serves as a basis for the IMPEx data model that describes simulations of the planetary environments.

HELIO took a different approach. It assumed that (too) many data models were already around and that, rather than inventing a new one, it would unify (or glue) them by defining a mapping between the different data models using ontologies. This approach is also closer to a vocabulary that does not enforce specific terms but recommends terms that can still be mapped to the common vocabulary, which leaves providers much more freedom. Nevertheless, an internal HELIO data model has been used for orchestrating workflows, which are, in fact, also a specific type of combination of services.

HELIO also used the work done by IVOA and introduced the PQL standard into solar system sciences. PQL has been designed as a REST-based query language for astronomy. All HELIO REST services use PQL as query language. PQL functions in a similar way to SQL, hence the name and the structure of the language. The use of SQL as a query standard in Web services is very practical from a developer's point of view, but it often confuses users, as the terms used to formulate a query follow relational database clauses that are unfamiliar to many, such as the WHERE clause.

EUROPLANET RI followed an approach similar to the one pursued in astronomy and attempted to build a data model able to cover the planetology sciences in all their diversity. The first step was to study in depth the needs and the uses in the different domains and communities, based on use case analysis. This led to a first data model [3] based on mixed concepts coming from IVOA/Characterization and SPASE. This data model was a first assessment of the required metadata. It is thus rather complete.
However, at some point, it was decided to use TAP as the main data access protocol for the project, and the data model then had to be rewritten to allow its use with TAP services. It was thereafter converted into a new one, EPN-DM [4], using the formalism given in the IVOA/Obscore, IVOA/VO-resources and IVOA/DataService standards. This work was inspired by "Obscore", but EPN did not use that standard because it was not ready and was still too oriented towards sky observations. This strategy was chosen because this data model allows the use of the PDAP [5] protocol from IPDA (International Planetary Data Alliance) for simple requests and of the TAP [6] protocol from IVOA for high-level queries. In particular, making data compliant with this data model allows the use of IVOA tools like ALADIN, TOPCAT, VOSPEC, etc.

[3] http://typhon.obspm.fr/idis/docs/Data_Model_v118a.pdf

4. Different layers of interoperability

Interoperability is needed when a service needs to communicate with another service, for instance to extract information or data. This is the case when an infrastructure is built with distributed resources and/or tools. Building an interoperable infrastructure (or Virtual Observatory) is a necessity for many scientific disciplines and communities, and in particular for astronomy and solar system sciences, where observation instruments are usually built by independent groups. However, interoperability can quickly introduce complexity in such an infrastructure. This potential complexity also induces risks and costs with regard to infrastructure maintenance and development, and brings constraints on the data providers (who do not necessarily have the human resources to support them). It is thus very important to identify compromises between the needs of the scientists and the feasibility/reliability of the envisioned infrastructures.
Another key consideration is that the insertion of resources into a virtual observatory will require effort from the data providers or the service developers. If the standards describing the resources are too complicated, they will not want to participate or will not be able to. It is thus crucial to specify a progressive approach to which the resource providers can adapt. The most obvious way is to define layers of interoperability with respect to the functionalities expected from a virtual observatory.

More pragmatically, data providers may invest the required effort if there are already existing tools or networks that can provide added value to their data. Experience has shown that developing new data models that are not widely implemented by the existing tools proves useless. The opposite is also true, as new tools have to rely on the existing data descriptions. Thus the development of a common data description must involve a wide community, including large data providers and the developers of widely used tools. The alternative is to make small extensions to existing standards, which can be proposed to the authorities maintaining them [e.g. the IMPEx data model is an extension to SPASE that is proposed to become fully part of the standard].
The previous Recommendation R1, from version 1 of this document, was:

We recommend to consider several layers of interoperability corresponding to different layer of functionality and to develop the standards accordingly

[4] http://voparis-europlanet.obspm.fr/docs/PlanetaryScienceResource-DM-latest.pdf
[5] http://planetarydata.org/projects/inactive-projects/data-access/documents/pdap-versions/pdap-v1.1/view
[6] http://www.ivoa.net/Documents/TAP/

But one has to keep in mind that data providers would not maintain more than 1 or 2 layers of interoperability, and that tool developers would not implement more than a couple of them. Interoperability layering can be useful for the development and evolution of standards, but it should be limited, and it should be done with a view to integrating the upper layers into the lower ones. Of course, the search and exploitation layers require different approaches, but they should not be totally disjoint. In particular, they should use similar dictionaries as much as possible.

This is the tricky part: there are 2 ways to address the problem, and data providers may choose which path to follow depending on their pre-existing infrastructure. Here are the 2 paths:
1. The data provider has a database. It can follow the path described here. The first layer is (more or less) what is done in EPN-TAP, especially if "coverage" information (temporal, spectral, spatial…) and target information are added (the latter is rather important!).
2. The data provider has a web page with links to data files (in a VO format like FITS, VOTable or CDF…). In that case, the provider can start by putting SAMP on the web page. Then, if the data are tabular, it is easy to transform them into VOTables and then follow the same path (this requires rather little effort). Finally, when the data are available and VO-compliant, the heavier work of setting up a database and putting protocols on top of it can be started.
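As an illustration of the second path, the step of transforming tabular data into a VOTable can be sketched with a few lines of standard-library code. This is a minimal sketch under stated assumptions, not a validated implementation: the function name, the table name, the column names and the example row are hypothetical, the UCD "meta.id;src" for the target column is an assumption (only time.start and time.end are quoted in this document), and the XML namespace declaration and schema validation required by a production VOTable are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def rows_to_votable(table_name, fields, rows):
    """Serialize tabular data as a minimal VOTable document (TABLEDATA form).

    fields: list of (name, datatype, ucd) tuples describing the columns.
    rows:   list of tuples, one value per field.
    """
    votable = ET.Element("VOTABLE", {"version": "1.3"})
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", {"name": table_name})
    for name, datatype, ucd in fields:
        attrs = {"name": name, "datatype": datatype, "ucd": ucd}
        if datatype == "char":
            attrs["arraysize"] = "*"  # variable-length character string
        ET.SubElement(table, "FIELD", attrs)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical columns and values, for illustration only.
    example_fields = [
        ("target_name", "char", "meta.id;src"),
        ("time_start", "char", "time.start"),
        ("time_end", "char", "time.end"),
    ]
    example_rows = [("Enceladus", "2010-01-01T00:00:00Z", "2010-01-01T06:00:00Z")]
    print(rows_to_votable("observations", example_fields, example_rows))
```

Attaching the time coverage and the target to each row at this stage keeps the later move to a database with "coverage" and target information, as in the first path, straightforward.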
So we propose a new Recommendation R1:

We recommend considering several layers of interoperability corresponding to different layers of functionality and developing the standards accordingly, while keeping as much consistency as possible between them by using a common dictionary wherever possible. Two or three layers is probably the maximum needed.

In this document, we identify three layers of interoperability, which relate to three classes of functionality of an interoperable infrastructure:

Layer-1: Basic search of data. This layer allows the development of search engines using basic criteria. We define those basic criteria as being queries over:
• when: the time parameter. It generally corresponds to a time period, which is the period of observation in the case of observational data;
• where: this parameter indicates the object or region which is the target of the observations. This can correspond to the observatory location in the case of in-situ measurement, or to a distant target in the case of remote sensing observation;
• what: this information can include four main parameter classes, which can be used separately or combined:
  o the "measurement type" parameter;
  o the "observatory" parameter;
  o the "instrument" parameter;
  o the "DataProductType".

Layer-2: Precise search of data. This layer allows the development of search engines using precise criteria, up to a sufficient "physical parameter level".

Layer-3: Automated basic data processing and scientific analysis. This layer allows data processing inside workflows without human intervention. In addition to the Layer-2 functionality, this requires the standardization of the information on the structure and the content of the granules (i.e. files in general).
For example, for data products such as spectral cubes or particle distribution functions, it is necessary not only to define precisely the represented physical parameter, but also to provide information on how it is structured inside the files. This layer allows the data to be exploited by high-level tools such as ALADIN, TOPCAT, AMDA, etc. All the information needed by these tools has to be provided. Part of this level can be seen as a common core containing the information that is needed by, and sufficient for, a majority of tools.

In addition, the user may need to have access to detailed and specialized documentation and reference information. These can be necessary for interpreting the data, for identifying possible instrumental artefacts or for generating high-level data products. This documentation generally consists of articles published in scientific journals or books (e.g. ISSI book) or of Web pages provided by the data provider. It can also be described as keywords for tools developed for more precise scientific objectives (CLWeb [http://clweb.cesr.fr] for particle detector data, ...).

In the scope of the CASSIS project, we aim at analysing the requirements corresponding to Layer-1 only, as they are identified from the services of the SOTERIA, HELIO and EUROPLANET RI projects. Beyond this level, the data models used are not yet stable and standardized enough. Moreover, the diversity of the data types, the measurements and the additional parameters is so large in the solar system sciences that it seems difficult to envision a global approach. This will be discussed in the last section.

5. Levels of interoperability versus science cases.
Planetology: The interaction of Saturn's moon Enceladus with the kronian magnetospheric environment is one of the science cases analysed in EUROPLANET RI and reported at the Plasma Node site: http://europlanet-plasmanode.oeaw.ac.at/science-case-3_3.html

This is the summary: "Enceladus, in Saturn's magnetosphere, is a geologically active moon, in the sense that it gets stretched and compressed by tidal forces, whilst orbiting Saturn. This heats up the inner regions of the moon and on the South Pole this results in a reservoir of liquid water. This water is released into the Kronian magnetosphere through a series of vents [Hansen et al., 2006; Porco et al., 2006]. These water plumes drastically influence the interaction of the magnetoplasma with the moon; it creates a strong asymmetry [Saur et al., 2007]. The sputtering off the moon, mass loading the system [Pontius and Hill, 2009] and these water plumes create a torus at Enceladus orbit [Johnson et al., 2006]."

Such an object can stimulate the interest of several communities, including those studying the plasmas, the dust, the moon interior dynamics, the moon atmosphere/exosphere, etc. It is thus a good example of the value of developing a virtual observatory in solar system sciences. We can easily imagine that a specialist of the moon interior dynamics could be interested in using dust or plasma data as a marker for inferring the internal activity of Enceladus. To this end, the first step in his/her study will be to find the corresponding observations. Conversely, a plasma scientist may want to know where the hot spots on the surface of Enceladus are, since they can be the sources of the geysers that interact with the magnetospheric plasma and generate waves. The plasma scientist would then be interested in finding radar or infrared observations of the Enceladus surface.
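Such a cross-domain search relies only on the three Layer-1 criteria (when, where, what). As a minimal sketch, and not a description of any existing service, the Python fragment below filters a small in-memory catalogue on these three criteria. The record layout (`Granule`), the function name `layer1_search`, the catalogue entries, their dates and their URLs are all invented for illustration; using UCD-like strings such as `em.IR` as "what" values is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Granule:
    """One catalogue entry described only by Layer-1 metadata (hypothetical layout)."""
    url: str
    target: str            # "where": target object of the observation
    measurement_type: str  # "what": high-level measurement type
    start: datetime        # "when": observation start
    end: datetime          # "when": observation end


def layer1_search(catalogue, target, measurement_types, start, end):
    """Return the granules on the given target, of one of the requested
    measurement types, whose time coverage overlaps [start, end]."""
    wanted = {m.lower() for m in measurement_types}
    return [
        g for g in catalogue
        if g.target.lower() == target.lower()
        and g.measurement_type.lower() in wanted
        and g.start <= end and g.end >= start  # interval overlap test
    ]


# Invented entries: they do not describe real data holdings.
CATALOGUE = [
    Granule("file://example/ir_map_001", "Enceladus", "em.IR",
            datetime(2010, 1, 1, 10, 0), datetime(2010, 1, 1, 12, 0)),
    Granule("file://example/plasma_001", "Enceladus", "phys.electron",
            datetime(2010, 1, 1, 11, 0), datetime(2010, 1, 1, 11, 30)),
    Granule("file://example/ir_map_002", "Titan", "em.IR",
            datetime(2010, 1, 1, 10, 0), datetime(2010, 1, 1, 12, 0)),
    Granule("file://example/ir_map_003", "Enceladus", "em.IR",
            datetime(2011, 6, 1, 10, 0), datetime(2011, 6, 1, 12, 0)),
]

# "Infrared or visible observations of Enceladus during this time interval".
hits = layer1_search(CATALOGUE, "Enceladus", ["em.IR", "em.opt"],
                     datetime(2010, 1, 1, 0, 0), datetime(2010, 1, 2, 0, 0))
print([g.url for g in hits])
```

An empty result is itself useful information: it tells the researcher that no matching observation is registered for that target and period.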
Figure 1: measurement of electric waves and electron fluxes in the vicinity of Enceladus, geyser observation at Enceladus in visible, and temperature map of Enceladus derived from infrared measurement.

A researcher specialised in one domain would like to send a request to a virtual observatory that includes the "when", "where" and "what" parameters. For example, a plasma scientist would like to send a request like: "I would like (what) observations in the infrared and visible parts of the spectrum obtained (where) on Enceladus and (when) during this specific time interval". Whether or not the requested data exist is the first result of this request. As there are not so many data obtained on planets and moons (except the Earth), the researcher will receive a limited list of data files, and he/she will be able to start analysing them or to evaluate their potential for collaboration with the specialists. As illustrated by this use case, a search using Layer-1 of interoperability is already very useful. It will save the researcher time and energy, it will significantly help to stimulate collaborations and, in the end, it will increase the scientific return of the data.

Heliophysics: Space Weather activity is very concerned with shocks that propagate through the solar system, from the Sun to the Earth. Three major phenomena are known to be able to produce shocks in the solar wind that can affect the Earth's environment (e.g. particle bombardment of satellites, contraction of Earth’s ionosphere changing satellites’ altitude, …): Solar flares (including filament eruptions), Coronal Mass Ejections (CMEs), and Coronal Holes (CH). In the last case, slow solar wind emitted by the quiet Sun is caught up by fast solar wind originating from a CH, producing a shock.
While Coronal Holes are well observed and their effects anticipated (unfortunately, they are not the most dangerous phenomenon for Earth’s environment), scientists have been looking for connections between flares and CMEs without success. Flares don’t always produce CMEs, and CMEs don’t always appear with a flare. We know that both are strongly related to the solar magnetic field, but not how. So, to understand those phenomena (and the same holds for the formation of CH, which is still not understood but is also related to the magnetic field), we need to be able to study a large set of flares and CMEs, and their effects in the solar wind, in order to extract the statistical behaviour of those phenomena. For instance, flares may occur on the far side of the Sun while their effects propagate towards us; and some CMEs can be so faint that they are not detected near the Sun. In both cases, being able to follow shocks in the solar wind may give an indication that something occurred.

To carry out this statistical study, Layer-1 of interoperability will provide all the needed information: we need to query data at a given time (possibly an interval), and at a given place on the Sun or a specific location in the solar system (which then allows the propagation to be followed). The location on the Sun is a critical parameter for understanding the underlying magnetic field behaviour, because it allows queries for data that might cover this region with a high spatial resolution (measuring polarization, …). And, of course, in order to ensure that we compare what can be compared, the “what” information of Layer-1 is mandatory.

There is however an important difference with the previous use case in planetology. There were, are and will be many more observatories providing data on the Sun or the Earth's space environment. If the user request includes only very simple elements, the answer from a search engine will be too large, and difficult and time-consuming to manage.
So, in the case of heliophysics:
- the "where" information should at least indicate the region (example: "solar wind near Earth", or "solar corona");
- the "what" information (its "Measurement Type" or "DataProductType" elements) has to be more detailed for heliophysics than in planetology. In particular, in solar physics it is important to specify whether the image/movie data have a wide (full Sun) or a narrow field of view.

We can also note that the phenomena studied in heliophysics propagate. Consequently, the "when" information is more complicated to manage and characterize. The times can indicate the periods of observation, these periods propagated (with models or heliospheric imaging) to another location, the periods of events, the lifetimes of structures, etc. This is discussed in section 6.2.1.

6. Layer-1: basic search of data.

6.1. Matrix of the primary metadata used in the SOTERIA, HELIO and EUROPLANET RI services.

Interoperability is built on protocols that allow the exchange of information and data between the services. The information on the data or the services needs to be expressed in a common, machine-interpretable language. The metadata thus need to be standardized and organized into data models. In order to identify (i) the intersection and (ii) the whole set of their primary metadata, we have analysed the services of SOTERIA, HELIO and EUROPLANET RI. These primary metadata can globally be considered as the ones corresponding to interoperability Layer-1. The results are reported in the following tables.
As complementary information sources we used:
- The document "Specification on Metadata Standards in Heliophysics" (HELIO N3.3 deliverable): http://www.helio-vo.eu/internal/Documents/Deliverables/HELIO-N3-005d3_MetadataStandards_v1.7.pdf
- The documents describing the data models and the protocols used in EUROPLANET RI:
o The document describing the PDAP protocol: http://planetarydata.org/projects/inactive-projects/data-access/documents/pdapversions/pdap-v1.1/view
o The document describing the EPN-TAP protocol: http://voparis-europlanet.obspm.fr/Tdocum.shtml
o The document describing the EPN-DM data model: http://voparis-europlanet.obspm.fr/docs/PlanetaryScienceResource-DM-latest.pdf

6.2. Frequently used primary metadata.

6.2.1. When ?

Table 1.1 indicates how the time parameter is managed in the various services of the SOTERIA, HELIO and EUROPLANET RI projects. We note that the time parameter is used in all the services.

Format of the dates: Most HELIO services give dates in UTC, using the CCSDS ASCII calendar segmented time code format (ISO 8601): YYYY-MM-DDTHH:MM:SS(.fff), example: 2003-10-27T04:00:00. This format is now used by nearly all recent databases. EUROPLANET RI uses Julian days at the protocol level, but client and server interfaces can easily convert between the two.

However, in the SOTERIA services, the format is different and not homogeneous across them. As dates are crucial and widely used parameters, they are exchanged between interoperable services very often, almost systematically in fact. If the services use their own specific formats, it is necessary (i) to integrate a translation functionality inside each of them and (ii) to add a new metadata indicating the format used for dates. Moreover, these formats need to be maintained. Their specifications need to be registered somewhere and to be accessible.
It is clear that this represents an additional maintenance burden and an unnecessary risk for sustainability. It seems highly preferable that everyone uses the same date format in metadata.

Recommendation R2.1: For developing interoperability, it is recommended that a standardised format for writing the dates in the metadata should be adopted. This standardised format should be ISO 8601.

Meaning of the time parameter: In many services, the time parameter indicates the time interval of an observation or the time interval of a data file or a data set. In this case, the time parameter consists of a pair of dates: START_TIME, END_TIME. However, in some services, the time parameter has another meaning. It can indicate for example a predicted or propagated time interval (for example in the Propagation service under development in HELIO, or the one at CDPP, or in the context of Space Weather services). It can also characterize the operation time period of an instrument or of an observatory (e.g. in the instrument descriptor of EUROPLANET RI). In the catalogues of events or features, the time parameters can indicate the start and end times of events or features, but also additional characteristic times (e.g. peak time, growth time, visibility times ...).

In an interoperable system, the machines have to interpret the time parameters with the right meaning. One possible solution would be to associate with each time parameter an additional metadata indicating its meaning. For example, the values of this metadata could be:
• OST: Observation Start Time
• OET: Observation End Time
• EST: Event Start Time
• EET: Event End Time
The interesting time for science is the time at the target (i.e. event time).

Recommendation R2.2: The time parameter may have different meanings. It is recommended that a way be found to indicate clearly what the parameter is, for example by associating an additional metadata providing its exact meaning.
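To make recommendations R2.1 and R2.2 more concrete, here is a minimal Python sketch, using only the standard library, of how a client layer might convert between ISO 8601 strings and Julian Dates and attach a meaning tag to a time value. The function names, the dictionary layout returned by `tag_time` and the decision to treat strings without an offset as UTC are assumptions made for this illustration; they are not taken from HELIO, SOTERIA or EUROPLANET RI. Leap seconds are ignored, as in Python's `datetime`.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

# Julian Date of 1970-01-01T00:00:00 UTC, used here as the reference epoch.
JD_UNIX_EPOCH = 2440587.5
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class TimeRole(Enum):
    """Meaning tags listed in the text as possible values."""
    OST = "Observation Start Time"
    OET = "Observation End Time"
    EST = "Event Start Time"
    EET = "Event End Time"


def iso_to_jd(text):
    """Convert an ISO 8601 string (taken as UTC if it has no offset) to a Julian Date."""
    t = datetime.fromisoformat(text)
    if t.tzinfo is None:
        t = t.replace(tzinfo=timezone.utc)
    return JD_UNIX_EPOCH + (t - UNIX_EPOCH) / timedelta(days=1)


def jd_to_iso(jd):
    """Convert a Julian Date to an ISO 8601 UTC string, rounded to the nearest
    second (a double-precision Julian Date cannot resolve much finer steps)."""
    seconds = round((jd - JD_UNIX_EPOCH) * 86400)
    return (UNIX_EPOCH + timedelta(seconds=seconds)).strftime("%Y-%m-%dT%H:%M:%S")


def tag_time(role, text):
    """Attach a meaning tag to a time value (hypothetical record layout)."""
    return {"role": role.name, "meaning": role.value,
            "iso8601": text, "julian_date": iso_to_jd(text)}


start = tag_time(TimeRole.OST, "2003-10-27T04:00:00")
print(start["role"], start["iso8601"], start["julian_date"])
print(jd_to_iso(start["julian_date"]))
```

The rounding in `jd_to_iso` is a deliberate choice: it keeps the round trip stable for dates given to the second, at the cost of dropping fractional seconds.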
Table 1.1-a: Time parameters for the SOTERIA services
Table 1.1-b: Time parameters for the HELIO services
Table 1.1-c: Description of the StartTime parameter for the EUROPLANET RI services, from the EPN Core Resource Definition

6.2.2. Where ?

As indicated by Table 1.2, the "where" parameters are not used uniformly across the various services. In SOTERIA, most of the services provide specific products that do not need to be filtered by a location criterion. The user already knows what these products are. In HELIO, and for the SolarMonitor of SOTERIA, the “where” parameters are relatively precise. They include either a defined region or the coordinates of a location of interest in some pre-chosen coordinate system. In EUROPLANET RI, in the first layer of interoperability, the "where" parameters consist only of the specification of the object of the solar system that is the target of the search. The EUROPLANET team attempted to find more precise metadata for characterizing the location of the targets, but this is still an unsolved, complex issue. The difficulties come from the fact that there are many objects in the solar system (planets, moons, asteroids and comets), and that for each one there are a number of coordinate systems (related to the rotation, the magnetic field, the ecliptic plane, or the combination of the object with another one, the Sun for example).

Obviously, the needs for a data search corresponding to Layer-1 are not the same if the target is the Sun or the Earth, or if it is another object of the solar system. There is a major difference, as there are many observations of the Sun or the Earth while only a very limited number of missions have explored each of the other objects.
If the "where" parameter only indicates that the user is searching for data on the Sun or the Earth, the answer will be an extremely long list of URLs or files, which will not be very useful. Conversely, if the user only specifies that he/she wants data on Saturn or another particular object, the answer will be a limited and useful list of files. With respect to Layer-1, we propose the following recommendations:

Recommendation R3.1: The precision of the metadata characterising the location of the target of the measurement should be adjusted as a function of the amount of the potentially corresponding data.

There are regions that are generic for many objects of the solar system. For example, the magnetospheres of the planets are structured into the same kinds of regions. It would be interesting to study the possibility of building hierarchical descriptors like the UCDs [7] (Unified Content Descriptors) used in IVOA. This approach has been adopted in SPASE. Example from SPASE:
Earth.Magnetosphere.Magnetotail
Heliosphere.Heliosheath
Sun.Corona

There is a need for metadata describing spatial coordinates, but it is not the goal of UCDs to go so deep in the description. That is why we remove the previous Recommendation R3.2 (The generalization of UCD for describing the location of the data target should be studied and experimented.) and replace it with:

Recommendation R3.2: The generalization of SPASE descriptors of target.region should be studied and experimented with.

[7] http://www.ivoa.net/Documents/latest/UCD.html

Table 1.2-a: "Where" parameters for the SOTERIA services
Table 1.2-b: "Where" parameters for the HELIO services
Table 1.2-c: "Where" parameters for the EUROPLANET RI services

6.2.3. What ?
Table 1.3-a: "What" parameters for the SOTERIA services
Table 1.3-b: "What" parameters for the HELIO services
Table 1.3-c: "What" parameters for the EUROPLANET RI services

Observatory: The pair {"Observatory", "Instrument"} is often used in requests for accessing the data. This is the case in HELIO and SOTERIA, and it is one of the possibilities in EUROPLANET RI. The "Observatory" metadata is relatively simple to manage. However, in some rare cases ambiguities can occur when an observatory does not have a unique name. For example, VENUS-EXPRESS is often named VEX. In the case of multi-spacecraft missions, there are different ways of naming a particular spacecraft. For example, the third spacecraft of the CLUSTER mission can be named CLUSTER-3, CLUSTER-C3 or CLUSTER-SAMBA.

Two possibilities can be envisioned for resolving the names of observatories:
(1) as attempted in HELIO, to develop a semantic mapping service able to associate the different names of the same observatory. Such a service can understand that CLUSTER-3, CLUSTER-C3 and CLUSTER-SAMBA designate the same spacecraft;
(2) to establish a list of standardised observatory names, and to archive, maintain and publish it so that it can serve as the reference.
The second option seems much easier to develop and to maintain. Moreover, option (1) would also require this reference list of observatory names: the output of the semantic service should be taken from the reference list before being used in the query language.

Recommendation R4.1: We recommend establishing a reference list of observatory names. This list should be maintained/updated, and made available on the Internet.

This should be done in the frame of IAU. IAU has a list of ground-based observatories, maintained by the Minor Planet Center.
The majority of observatories are there. However, the centre refuses to add an observatory if it is not used for minor planet studies (e.g. the Nançay radio instruments, in France). For space-based observatories, NASA SPICE [8] has a list of missions that can be used. It is important to use the official names of space missions (the ones used by space agencies), as well as to allow alternate names in the client layers.

Recommendation R4.2: It would be useful if this list could be inserted into a catalogue providing other information such as the period of operation of the observatories, the targets of the mission/observatory and the URL of the mission/observatory site. Moreover, in the case of space missions, the name used must be the one used by the space agencies.

Instrument: The requirements on the "Instrument" metadata are similar. There are of course many more instruments than observatories. In addition to the potential causes of ambiguity met for observatories, some difficulties can come from the fact that the same name can be used for several different instruments. For example, ASPERA is a particle spectrometer suite which has flown onboard both the Mars-Express and Venus-Express missions. This situation is often met with some instrument types such as magnetometers, which are often called "MGF", "MAG", "MFI" or "FGM". Thus, it seems that the instrument name should always be associated with the observatory name to avoid potential ambiguity. A semantic mapping service may also be envisioned for instruments, but it seems complicated to develop and maintain. Again, in the short term at least, it seems preferable to establish a reference list of instrument names, associated with (or linked to) the observatory list.

Recommendation R4.3: We recommend establishing a reference list of instrument names. This list should be maintained/updated, and made available on the Internet.
The names should correspond to the ones used in official archives.

Recommendation R4.4: The reference list for instruments should be associated with / merged with / linked to the mission/observatory list.

It would be interesting to include some complementary information in the instrument reference list, in particular the instrumental techniques. When possible, there should also be a mandatory keyword for the instrument description [e.g. Magnetometer], so that whatever the name of the instrument, it is possible to know what it is. This, together with the association with the mission/observatory, helps to define a particular instrument. But we have to keep in mind that it is closely related to the measurement type, which is not always easy to define.

[8] http://naif.jpl.nasa.gov/naif/aboutspice.html

Recommendation R4.5: As often as possible, the instrument type should be associated with each instrument name and mission/observatory.

DataProductType: The metadata "DataProductType" is relatively universal and is not, or only marginally, dependent on the domains. This metadata is easy to manage if its role is to indicate the type of a data product relative to general classes. This metadata has been reviewed in the context of EUROPLANET RI based on the analysis of the existing standards (IVOA, SPASE, PDS). The list of Data Product Types extracted from the EPN-DM data model document is the following:

"DataProduct: The data product type describes the high level scientific organization of the data product being considered. The list of product values is:
• Image: associated scalar fields with two spatial axes, e.g. image with multiple color planes, from multichannel cameras for example. Maps of planetary surfaces are considered Images
• Spectrum: data product for which the spectral coverage is the primary attribute, e.g. a set of spectra
• DynamicSpectrum: consecutive spectral measurements through time, organized as a time series (note: see Baptiste; 1D temporal, 1D spectral)
• SpectralCube: set of spectral measurements with 1D or 2D spatial coverage, e.g. imaging spectroscopy. The choice between Image and Spectral_cube is related to the characteristics of the instrument
• Profile: scalar or vectorized measurements along one spatial dimension, e.g. atmospheric profiles, atmospheric paths, sub-surface profiles, etc.
• Volume: any measurement with three spatial dimensions
• Movie: set of chronological 2D spatial measurements
• Cube: multidimensional data with three or more axes, e.g. all that is not described by other 3D data types such as spectral cubes
• TimeSeries: measurements organized primarily as a function of time (with exception of dynamical spectra). A light curve is a typical example of a time series dataset.
• Catalogue: it can be a list of events, a catalogue of object parameters, a list of features, etc., e.g. list of asteroid properties
• SpatialVector: list of summit coordinates defining a vector, e.g. vector information from GIS, spatial footprints."
(From the EPN-DM datamodel document of EUROPLANET RI)

Such a list could be adopted as a reference list for solar system science. It remains general but provides enough information to characterize the data search efficiently in most cases. However, some additional or more precise metadata may be necessary. This is the case in particular for solar physics, which needs to distinguish between wide and narrow fields of view, but also for some planetary imagers.

Recommendation R4.6: We recommend establishing a reference list of names of the data product types. This list should be maintained/updated, and made available on the Internet.

Recommendation R4.7: The data product type is information that is mostly non-dependent on the domains.
Such lists exist already in various standards such as IVOA, SPASE or PDS. As it is easy and obvious, it is strongly recommended that a list coming from one of them is adopted and, if needed, extended. Remark: this has been done in EUROPLANET RI, using the IVOA standards.

Measurement Type: When a researcher searches for data, he/she is looking for a particular physical parameter or quantity that can be useful for his/her study. Generally, requests for data in archives or services use the "observatory"+"instrument" metadata. However, the user does not necessarily know which instrument can provide such a physical parameter. This situation is met in particular in the case of transverse studies based on observations related to different domains (e.g. the planetology use case presented in section 5). It is thus highly desirable to provide information on the nature of the data content.

Ideally, we would need to describe the physical parameters contained in the data. But this may be difficult and complex, and it could become very demanding for the data providers. Many semantic problems appear when we want to standardize the information at the parameter level. Let us consider for example the case of the data provided by the energetic ion instrument EPIC-STICS flying onboard the GEOTAIL spacecraft (which has been used as a test case in EUROPLANET RI). This instrument combines electrostatic analysers, time-of-flight telescopes and solid-state detectors. It provides the 3D distribution functions with the ability to separate the ions as a function of their mass and their charge. The data are organised (i) by detection head and (ii) by detector, and either (iii) by ion species (defined by its mass and its charge), (iv) by velocity or (v) by energy. In practice, this instrument continuously produces more than two hundred datasets. It is easy to imagine that describing such data up to the parameter level is a complicated task.
Standardising this description implies standardising all the necessary information, such as the way of characterising an ion species or an instrument-dependent species (e.g., the energy or velocity channels without separation of ions), the coordinate system in which the data are represented, the units, the nature of the quantity (counts, fluxes, ...) and several other information elements depending on the data product types. Moreover, to make this complex description usable in the interoperability paradigm, it needs to be structured. This particular problem of the ion spectrometer has been addressed and solved in SPASE. But the SPASE data model cannot be applied outside heliophysics and thus cannot fit solar system science as a whole. Problems of similar complexity can be found in all the fields of solar system science.

The EPN-DM data model has been developed in the context of EUROPLANET RI with the aim of solving this challenge. This model, based on the IVOA standards, is being published. It is very likely that it will need some years to become stabilized. We may thus consider that there is no applicable solution at the present time for providing a standardized description of solar system science data at the measured parameter level.

Inserting the "Measurement Type" metadata could be considered as an intermediate option. The measurement type characterizes the type of physical quantities contained in the data while remaining at a high level of description. For example, the "MeasurementType" metadata can indicate that the data contain measurements of the magnetic field, but does not specify the coordinate system, the cadence, the units, etc. Such a metadata is much easier to manage, and nevertheless makes the data search efficient. From the data provider side, this option also has some great advantages.
As this metadata is simple, it will be easy for the data provider to take it into account in the description of his/her data. If a description at the physical parameter level were required, we can imagine that it would be extremely costly and resource consuming. This would interrupt development, and we can anticipate that it would not make the data providers enthusiastic about publishing their data in a virtual observatory.

Recommendation R4.8: As a standardised description at the "physical parameter" level is very complex to elaborate and to implement, we recommend introducing the intermediate level characterizing the "Measurement Type".

Recommendation R4.9: Reference lists of the "MeasurementType" metadata exist in the SPASE and EPN-DM data models, only for the domain of heliophysics. The communities in other fields should organize themselves in order to study this issue and to propose reference lists for this metadata.

Recommendation R4.10: The MeasurementType metadata should be developed in accordance with existing standards, or by extending them if necessary.

To illustrate this last recommendation, the list of the MeasurementType metadata values for plasmas used in the EPN-DM is given below. Most of them already existed amongst the IVOA UCDs but, as mentioned above in this document, a conclusion of the Solar System UCDs document v0.3 is that UCD is probably not the best place to describe MeasurementType in detail. This discussion is still in progress in IVOA.
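As a minimal illustration of why such a coarse tag already makes a search efficient, the Python sketch below builds an index from MeasurementType tags to (observatory, instrument) pairs, so that a user can find candidate instruments without knowing their names in advance. The tag strings follow the EPN-DM values listed in this section; the instrument-to-tag assignments, the names `INSTRUMENT_TAGS`, `build_index` and `instruments_measuring` are illustrative assumptions, not an official reference list. Keying on the pair rather than on the instrument name alone follows the remark that ASPERA flew on two missions.

```python
from collections import defaultdict

# Illustrative assignments only; a real reference list would be maintained
# by the data providers and archives.
INSTRUMENT_TAGS = {
    ("MARS-EXPRESS", "ASPERA"): {"phys.ion", "phys.electron"},
    ("VENUS-EXPRESS", "ASPERA"): {"phys.ion", "phys.electron"},
    ("GEOTAIL", "MGF"): {"phys.magField"},
    ("CLUSTER-3", "FGM"): {"phys.magField"},
}


def build_index(instrument_tags):
    """Invert the mapping: MeasurementType tag -> set of (observatory, instrument)."""
    index = defaultdict(set)
    for key, tags in instrument_tags.items():
        for tag in tags:
            index[tag].add(key)
    return index


INDEX = build_index(INSTRUMENT_TAGS)


def instruments_measuring(tag):
    """Return the sorted (observatory, instrument) pairs carrying the given tag."""
    return sorted(INDEX.get(tag, ()))


# A user interested in magnetic field data does not need to know that the
# relevant instruments are called "MGF" or "FGM".
print(instruments_measuring("phys.magField"))
```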
UCDs that already exist in the IVOA:
UCD              Origin  Meaning
_________________________________________________________________
phys.electField  cds     Electric field
phys.magField    cds     Magnetic field
phys.electron    cds     Electron
em.gamma         cds     Gamma rays part of the spectrum
em.X-ray         cds     X-ray part of the spectrum
em.UV            cds     Ultraviolet part of the spectrum
em.opt           cds     Optical part of the spectrum
em.IR            cds     Infrared part of the spectrum
em.radio         cds     Radio part of the spectrum

Added UCD in EPN-DM:
phys.ion         EPN-DM  Ions
phys.dust        EPN-DM  Dust
phys.neutral     EPN-DM  Neutrals

7. Summary of recommendations.

Analysing the metadata used in the SOTERIA, HELIO and EUROPLANET RI projects, and anticipating the interoperable infrastructures desirable for the transverse science targeted by future missions such as SOLAR-ORBITER, ROSETTA or BEPI-COLOMBO, we extracted the following recommendations:

Concerning the general approach:

Recommendation R1: We recommend considering several layers of interoperability corresponding to different layers of functionality and developing the standards accordingly, while keeping as much consistency as possible between them by using a common dictionary. Two or three layers are probably the maximum needed.

Considering the first layer of interoperability, which allows the user to query data, our recommendations are:

Concerning the description of the time:

Recommendation R2.1: For developing interoperability, it is recommended that a standardised format for writing the dates in the metadata should be adopted. This standardised format should be ISO 8601.

Recommendation R2.2: The time parameter may have different meanings. It is recommended that a way be found to indicate clearly what the parameter is, for example by associating an additional metadata providing its exact meaning.
Concerning the description of the location of the targets of the observations:

Recommendation R3.1: The precision of the metadata characterising the location of the target of the measurement should be adjusted as a function of the amount of the potentially corresponding data.

Recommendation R3.2: The generalization of SPASE descriptors of target.region should be studied and experimented with.

Concerning the content of the data:

Recommendation R4.1: We recommend establishing a reference list of observatory names. This list should be maintained/updated, and made available on the Internet.

Recommendation R4.2: It would be useful if this list could be inserted into a catalogue providing other information such as the period of operation of the observatories, the targets of the mission/observatory and the URL of the mission/observatory site. Moreover, in the case of space missions, the name used must be the one used by the space agencies.

Recommendation R4.3: We recommend establishing a reference list of instrument names. This list should be maintained/updated, and made available on the Internet. The names should correspond to the ones used in official archives.

Recommendation R4.4: The reference list for instruments should be associated with / merged with / linked to the mission/observatory list.

Recommendation R4.5: As often as possible, the instrument type should be associated with each instrument name and mission/observatory.

Recommendation R4.6: We recommend establishing a reference list of names of the data product types. This list should be maintained/updated, and made available on the Internet.

Recommendation R4.7: The data product type is information that is mostly non-dependent on the domains. Such lists exist already in various standards such as IVOA, SPASE or PDS.
As this is easy and straightforward, it is strongly recommended that a list from one of them be adopted and, if needed, extended.

Recommendation R4.8: As a standardised description at the "physical parameter" level is very complex to elaborate and to implement, we recommend introducing an intermediate level characterizing the "Measurement Type".

Recommendation R4.9: Reference lists for the "MeasurementType" metadata exist in the SPASE and EPN-DM data models, but only for the domain of heliophysics. The communities in other fields should organize themselves in order to study this issue and to propose reference lists for this metadata.

Recommendation R4.10: The MeasurementType metadata should be developed in accordance with existing standards, or by extending them if necessary.

At a more general level, we could add:

Recommendation 5.1: The idea of managing many different data models with the help of a semantic mapping service is interesting and is a promising prospect. This approach should be pursued, but it is still a research topic. In the short term, it seems difficult to apply to the solar system sciences as a whole.

Recommendation 5.2: The IVOA standards cover the description of astronomy, which includes multiple domains. As experimented in planetology, some of these standards seem to be flexible enough to be adapted to solar system sciences, by developing some extensions when necessary. Using and extending the IVOA standards seems to be the easiest and the least costly way of developing interoperable infrastructures in solar system sciences. This last recommendation holds for data format, search and exchange procedures. For data description, however, the IVOA standards are not well suited. This recommendation therefore concerns mainly data format (VOTable, FITS, …), search (TAP, EPN-TAP, …) and exchange (SAMP, …).

8. List of useful links

8.1. Institutions

* IAU nomenclature for object types: http://planetarynames.wr.usgs.gov/Page/Planets
* IVOA home page: http://www.ivoa.net/
  IVOA documents: http://www.ivoa.net/documents/
* IPDA (International Planetary Data Alliance) home page: http://planetarydata.org

8.2. FITS information

* World coordinate system for use in FITS format: http://fits.gsfc.nasa.gov/fits_wcs.html
* Tags recommended for FITS files to be used with standard software: http://www.mssl.ucl.ac.uk/surf/sswdoc/solarsoft/ssw_standards.html
* STEREO/SECCHI FITS Header Keyword Definition (which is widely used): http://sohowww.nascom.nasa.gov/solarsoft/stereo/secchi/doc/FITS_keywords.pdf

8.3. Projects web pages and documents

* HELIO home page: http://www.helio-vo.eu/index.php
  HELIO useful documents:
  * Heliospheric & Planetary Data: http://www.helio-vo.eu/documents/public/HELIO_Planetary-Data_20100208.pdf
  * Coordinate Systems: http://www.helio-vo.eu/documents/public/HELIO_Coordinates_100322.pdf
  * HELIO Data Model: http://www.heliovo.eu/documents/public/HELIO_UNIMAN_R1_004_TN_DataModel_v0.3.pdf
* SOTERIA home page: http://soteria-space.eu/index.php
* EUROPLANET IDIS (Integrated and Distributed Information System) home page: http://www.europlanet-idis.fi/index.php
  EUROPLANET IDIS documents page: http://www.europlanet-idis.fi/index.php?id=documents
  Documents describing the data models and the protocols used in EUROPLANET RI:
  o The document describing the PDAP protocol: http://planetarydata.org/projects/inactive-projects/data-access/documents/pdapversions/pdap-v1.1/view
  o The document describing the EPN-TAP protocol: http://voparis-europlanet.obspm.fr/Tdocum.shtml
  o The document describing the EPN-DM data model: http://voparis-europlanet.obspm.fr/docs/PlanetaryScienceResource-DM-latest.pdf
* IMPEx home page: http://impex-fp7.oeaw.ac.at
* VSO recommendations:
  * Minimum Information for Solar Observations:
    http://docs.virtualsolar.org/wiki/MinimumInformation
  * Checklist for data description: http://docs.virtualsolar.org/wiki/Checklists

8.4. Data Model, UCD, Metadata…

* Early data model for solar observations: http://www.mssl.ucl.ac.uk/~rdb/egso/documents/
* The IMPEx Data Model: http://impex.latmos.ipsl.fr/tools/DataModel.htm
* Explanation of the meaning of UCDs: http://cdsweb.u-strasbg.fr/UCD/tree/js/
* IVOA document (version 0.2) on Solar System UCDs: http://wiki.ivoa.net/internal/IVOA/InterOpSep2013Semantics/SolarSystemUCD-V02.pdf
  Version 0.3 of this document: http://typhon.obspm.fr/idis/docs/SolarSystemUCD-V03.pdf
* "Specification on Metadata Standards in Heliophysics" (HELIO N3.3 deliverable): http://www.helio-vo.eu/internal/Documents/Deliverables/HELIO-N3-005d3_MetadataStandards_v1.7.pdf

8.5. Tools for data mining and data visualization based on VO standards

* Aladin: http://aladin.u-strasbg.fr/aladin.gml
* Topcat: http://www.star.bris.ac.uk/~mbt/topcat/
* Amda: http://cdpp-amda.cesr.fr/DDHTML/index.html
* CL Web: http://clweb.cesr.fr

8.6. Others

* NASA list of missions: http://naif.jpl.nasa.gov/naif/aboutspice.html
* SPASE:
  • Home page: http://www.spase-group.org
  • Registry explorer: http://www.spase-group.org/registry/explorer/
  • Data Model explorer: http://www.spase-group.org/data/explorer/