Semantically Matching Tools and Data Collection
Content: A ToolMatch Use Case Extension
by
Matthew Ferritto
A Thesis Submitted to the Graduate
Faculty of Rensselaer Polytechnic Institute
in Partial Fulfillment of the
Requirements for the Degree of
MASTER OF SCIENCE
Major Subject: COMPUTER SCIENCE
Approved by the
examining committee:
Peter Fox
Thesis Advisor
Deborah McGuinness, Member
Jim Hendler, Member
Rensselaer Polytechnic Institute
Troy, NY
November 2014
(For Graduation December 2014)
Contents
Contents
List of Figures
List of Tables
Listings
Acknowledgements
Abstract
1 Introduction
    1.1 Initial Work
    1.2 Third Use Case
    1.3 Future Work
    1.4 Thesis Outline
2 Historical Background
    2.1 Ontologies
    2.2 SWRL
    2.3 Semantic Matching
3 ToolMatch Service
    3.1 ToolMatch Ontology
    3.2 ToolMatch Instances
    3.3 Semantic Matching of Tools and Data Collections
4 ToolMatch Web Service
    4.1 ToolMatch Web Service (Tools)
    4.2 ToolMatch Web Service (Data Collections)
5 Third Use Case
    5.1 Observation and Measurements Ontology
    5.2 Changes in the ToolMatch Ontology
    5.3 Semantic Matching
    5.4 Discussion and Conclusion
6 Future Work
7 Conclusion
References
Appendices
List of Figures
1 ToolMatch Ontology - Data Collections
2 ToolMatch Ontology - Tools
3 ToolMatch Instances - Tool and DataCollection
4 Inferencing Example
5 Web Service Tool List
6 Web Service Tool Splash Page
7 Web Service Tool Form
8 Web Service Data Collection Preliminary Form
9 Web Service Data Collection Main Form
10 ToolMatch Updated Ontology - Tool
11 ToolMatch Updated Ontology - Data Collection
12 ToolMatch Instances - Full
13 RDF/XML for Tool Instance (Panoply)
List of Tables
1 ToolMatch Classes
2 ToolMatch Object Predicates
3 ToolMatch Namespaces
Listings
1 ToolMatch Schema (N3)
2 Observation XML Example
Acknowledgments
I would like to express my gratitude to my advisor, Dr. Peter Fox, who guided me
throughout the whole thesis process. His vision, guidance and reassurance were all
instrumental in the writing of this thesis. I would also like to thank Dr. Jim Hendler
and Dr. Deborah McGuinness, who were both on my Master’s Thesis committee.
Additionally, I would like to thank Patrick West, Nancy Hoebelheinrich, and Chris
Lynnes for all their contributions, as well as for all of their collaboration on the
ToolMatch service. Their enthusiasm and work ethic made it a pleasure to work with
them.
Abstract
The ToolMatch service was developed to give data users the means to match their
data collections with a comprehensive list of useful, appropriate tools, and to point
data tool developers to data collections that will work with their tools. As such,
ToolMatch had an initial scope of two use cases. The first was the semantic matching
of data collections with tools, which allows data users to find and choose among a
list of otherwise separate and potentially hard-to-find tools that could work with
their data collections. The second (and more difficult) use case was the converse:
given a tool, semantically find which data collections the tool can use. If the first
use case is analogous to having nails and looking for a hammer, then the second can
be compared to having a hammer and looking for nails. It is much more difficult to
find data collections that may work with a given tool, since a tool user might not
know what to look for. Using the ToolMatch service, a tool user can easily find a
data collection to use with their tool. In both use cases, time and effort wasted
searching for the correct tool or data collection can be reduced or avoided
completely. The focus of this thesis is the implementation of these two use cases, as
well as an extension of the first use case, in which a data user with certain
semantics for a given data collection (such as a domain model) can find tools that
can be used with the content of that data. This matters because the content of a
data collection may not be appropriate for a tool within a certain domain model.
For example, rainfall or topographic data content that is part of a larger
hydrological model can be matched to tools that the model as a whole might not be
able to match. This expands the scope of the initial use case, in that data
collection content requires stricter matching than the characteristics of a data
collection alone. The requirements of this use case involve modifying and expanding
the ToolMatch conceptual model and ontology to allow for semantic matching between
data content and tools. These changes will also be reflected in the ToolMatch web
service, which allows users to add, update, or delete instances of the ToolMatch
ontology without requiring a full understanding of ontologies.
1 Introduction
For a given data collection, it is difficult to find the tools that can be used to work
with that data collection. In many cases, the information that Tool A works with
Data Collection B is somewhere on the Web, but not in a readily identifiable or
discoverable form. In other cases, particularly for more generalized tools, the
information does not exist at all until somebody tries to use the tool on a given
data collection. Conversely, for a given tool, it can be even more challenging to
find data collections that work properly with that tool.
Out of these two issues sprang two initial use cases for ToolMatch. The simplest
and most prevalent use case is for the user to find the tools that can be used with
a given data collection. For example, if a user has data collections that are
accessible via OPeNDAP Hyrax in HDF5 format, how could these data collections be
used? What tools are available to work with them, and where could one find them?
The second use case is the converse of the first: given a data tool, how can a user
find appropriate data collections that the tool will be able to work with? A common
analogy for the first use case is that someone has a nail but needs a hammer that
works with that nail. The second use case can be compared to having a hammer and
looking for the proper nails to use with it. It is much more difficult to find data
collections that may work with a given tool, since a tool user might not know what
to look for. Examples of this use case all center on the same premise. Given a tool,
how could a user find a data collection with measurements of atmospheric aerosol
optical depth sliced along latitude and longitude, returned as NetCDF data, and
accessible in MATLAB? Which data collections are available in Giovanni? The final
and perhaps clearest example involves three tools: HDFView, Ferret, and Panoply, all
of which can display data with different servers and different formats. HDFView can
display swath data in DFA as line plots; Ferret can display swath data via OPeNDAP
as a grid; Panoply can display swath data via OPeNDAP on a map. The second use case
addresses this issue of finding which data collections can be used by each of these
tools. These two use cases formed the backbone of the ToolMatch service.
1.1 Initial Work
The first iteration of ToolMatch involved the creation of an ontology based upon the
first two use cases, with a simple set of concepts and relationships for matching
tools with data collections and data collections with tools. In addition, a web
interface was created to display basic information about the ToolMatch model and to
collect information about the necessary characteristics of tools and data collections
using simple web-based forms. Given the simple set of concepts and relationships, the
form was able to return information about a tool or data collection using inferencing
rules. Users could then view tools and data collections, as well as make changes to
them, without needing to understand ontologies. Using the ToolMatch service, a data
user could easily find one or more tools that can visualize their data collection,
provided that the tools meet the requirements. Conversely, a tool user could discover
data collections to be used by their tool, rather than wasting time and effort
searching for data collections using conventional methods.
1.2 Third Use Case
Since the initial scoping of the ToolMatch service, a third use case has been
developed. This use case extends the first use case (semantic matching of tools and
data collections) with additional conditions: given a data collection and additional
semantics in mind for that data collection (such as a domain model), find tools that
can be used with the content of that data. For example, a researcher who has
identified certain types of rainfall or topographic measurements with a hydrological
model wants to know what tools will work with the measurement data. This use case
involves the definition and documentation of the extension, as well as the adaptation
of the conceptual model, the ontology, and the ToolMatch web service.
1.3 Future Work
With the first iteration of the ToolMatch service complete, we looked for ways to
expand ToolMatch. To make further progress on the viability and robustness of the
ToolMatch service, more instance data needs to be added to the knowledge store in the
form of different kinds of visualization tools, and more data collections need to be
added from a variety of science domains. Further populating the knowledge store will
both confirm that the ToolMatch service meets the requirements of all three use cases
and expand the applicability of the service to other science domains within the ESIP
(Earth Science Information Partners) Federation. This will lead to the development
of further use cases.
1.4 Thesis Outline
This thesis first provides a brief overview of ontologies and semantic matching
(section 2). This is followed by a detailed description of the ToolMatch service,
including the ontology and the semantic matching of tools and data collections
(section 3). The ToolMatch web service, an integral part of the use cases, comes next
(section 4). The third use case of the ToolMatch service is then explained, along
with its changes to the ToolMatch ontology and web service (section 5). Finally, the
future goals and work that ToolMatch hopes to achieve are discussed (section 6).
Additional diagrams and the full ToolMatch schema (in N3) are provided in the
appendices.
2 Historical Background
2.1 Ontologies
ToolMatch uses a lightweight ontology of concepts and properties relating to tools
and data collections. An ontology is a set of axioms that models a specific domain,
the concepts or objects within that domain, and the properties and relations between
those objects. Ontologies contain two types of properties: data properties and object
properties. Data properties hold between an individual and a literal, while object
properties hold between two separate individuals. Through the use of a reasoner, we
can infer relationships that are only implicitly contained in the ontology
(Bechhofer et al. 2004). Ontologies are still one of the main areas of research in
the realm of the semantic web, with approximately 59 percent of semantic web papers
in 2013 focusing on them (Menemencioglu and Orkak 2014). To achieve the semantic
matching between tools and data collections, ToolMatch uses SWRL (Semantic Web Rule
Language).
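As a rough illustration of the distinction between data properties and object
properties, the following Python sketch (not part of ToolMatch) tags a statement by
whether its object is a literal or another individual. The Individual and Literal
classes and the sample statements are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    """A named thing in the ontology (illustrative helper, not ToolMatch code)."""
    name: str


@dataclass(frozen=True)
class Literal:
    """A plain value such as a string or number."""
    value: str


def property_kind(obj):
    """A property is a data property when its object is a literal."""
    return "data property" if isinstance(obj, Literal) else "object property"


# Sample statements; the specific values are assumed for illustration only.
panoply = Individual("Panoply")
netcdf = Individual("NETCDF")
statements = [
    (panoply, "dc:title", Literal("Panoply")),   # individual -> literal
    (panoply, "hasInputFormat", netcdf),         # individual -> individual
]

kinds = {pred: property_kind(obj) for _, pred, obj in statements}
for pred, kind in kinds.items():
    print(pred, "is used here as a", kind)
```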
2.2 SWRL
SWRL is based on a combination of the OWL DL and OWL Lite sublanguages of the Web
Ontology Language with the RuleML sublanguages of the Rule Markup Language (Horrocks
et al. 2004). The advantage of SWRL is that it enables Horn-like rules to be used in
combination with any OWL knowledge store. A SWRL rule takes the form of an
implication between an antecedent, or body, and a consequent, also known as a head.
For any given rule, conditions are specified for the antecedent. If these conditions
hold, any conditions specified in the consequent must hold as well (Horrocks et al.
2004). A very simple example of a SWRL rule is shown below:
parent(?x,?y) ^ brother(?y,?z) -> uncle(?x,?z)
Given an individual (?x) who has a parent (?y), and another individual (?z) who is
the brother of that parent, it can be inferred that individual ?z is the uncle of
individual ?x. While very basic, this rule shows the potential of inferencing across
ontologies. In summary, SWRL allows for rule-based reasoning across ontologies, and
its rules can be used by a reasoner to perform inferencing. We use SWRL because it
provides more expressive power than OWL DL does by itself. For the ToolMatch service,
the lightweight ontology is combined with SWRL rules to determine matching tools and
data collections.
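To make the antecedent/consequent reading concrete, the sketch below applies the
uncle rule to a handful of facts in plain Python. It is only an illustration of how
a rule of this shape fires, not how a SWRL reasoner is implemented; the individuals
(alice, bob, and so on) are invented for the example.

```python
# Facts are (predicate, subject, object) triples; all individuals are made up.
facts = {
    ("parent", "alice", "bob"),    # alice has parent bob
    ("brother", "bob", "carl"),    # bob has brother carl
    ("parent", "dave", "erin"),    # erin has no recorded brother
}


def infer_uncles(facts):
    """Apply parent(?x,?y) ^ brother(?y,?z) -> uncle(?x,?z) once over the facts."""
    inferred = set()
    for pred1, x, y in facts:
        if pred1 != "parent":
            continue
        for pred2, y2, z in facts:
            # Both antecedent atoms must bind ?y to the same individual.
            if pred2 == "brother" and y2 == y:
                inferred.add(("uncle", x, z))
    return inferred


uncles = infer_uncles(facts)
print(sorted(uncles))
```

Only alice gains an uncle here, because dave's parent has no brother fact to join on.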
2.3 Semantic Matching
Semantic matching involves the matching of relationships or concepts in ontologies
that are semantically related, but often only implicitly. Much of the current work in
semantic matching revolves around discovering and matching web services. For
instance, to address the problem of service searching in service-oriented
applications, researchers described a method that improved the quality and precision
of discovered web services by using semantic web technologies (Lu, Hsu, and Kuo
2013). This proposal included a web service discovery method that combined WordNet,
domain ontologies, and SWRL rules in web service matchmaking. Another paper detailed
a framework for preference-based semantic matching between web services and security
policies (Alhazbi, Khan, and Erradi 2013). This again uses an ontology for domain
modeling, but instead uses a matching algorithm applied to a HermiT reasoner to
determine the best web security policy option to be mapped to provider capabilities.
Finally, another paper put forward an automated approach to semantic annotation based
on the DBpedia knowledge base, which provides a solid foundation for service
discovery and automated service composition (Zhang, Chen, and Feng 2013). These
methods all revolve around the creation of an ontology to represent concepts and
ideas through classes and object properties, along with inferencing or other
algorithms to semantically match web services. As will be explained in later
sections, ToolMatch extends semantic matching to the realm of data collections and
tools. In addition, the ToolMatch web service offers users who are not ontologists
the opportunity to contribute to and modify the ToolMatch knowledge store.
3 ToolMatch Service
In order to properly match tools and data collections, ToolMatch uses an ontology
and a set of rules that can help researchers determine what tools can be used given
a particular set of data, or find data that can be used within a tool. For the
purposes of the initial two use cases, we clarify that a data collection can be
"used" by a tool when the tool can visualize the given data collection (represented
by the visualizedBy object predicate).
3.1 ToolMatch Ontology
The schema displayed in Figure 1 shows the classes, object properties, and the
predicates between them relating to the data collection part of the ToolMatch
ontology. Here classes and object properties are entities and the object predicates
are the relationships between them. Object predicates relating to the DataCollection
class include hasDataFormat, isAccessedBy, hasAccessURL, usesConvention, and
visualizedBy. While some of these may seem self-explanatory, we will nevertheless go
through them for the sake of clarity.
To relate a data collection to the format in which it is stored (HDF, NetCDF, etc.),
the predicate hasDataFormat is used. The isAccessedBy predicate states that a data
collection can be accessed through a data server, where a data server is a piece of
software that can access, manipulate, and return data products derived from data
collections, including the data collections themselves. The hasAccessURL predicate
relates each data collection to a URL at which that data collection can be addressed.
The usesConvention predicate relates a data collection to a data convention, meaning
that the data collection follows a particular set of agreed-upon rules in its
representation and metadata. Other properties that describe a DataCollection instance
include dc:title and dc:description, which are data properties and incorporate DC
(Dublin Core) terms (DCMI Usage Board 2012). Each data collection can be uniquely
identified by either a DOI (Digital Object Identifier) or a GCMD-DIF (Global Change
Master Directory - Directory Interchange Format) Entry ID (NASA 2014). Unique
identifiers for data collections are discussed further in the next section.
Figure 1: The ToolMatch ontology defines a vocabulary for data collections. Rounded
nodes in the graph represent OWL classes. Edges (black dashed lines) represent
relationships (described as RDF properties) between classes, while solid blue lines
indicate that one class is a subclass of the class it points to. Different colors are
used to separate ToolMatch ontology terms from terms from other ontologies. This
diagram and other ontology diagrams were created using the CMAP ontology editor.
Figure 2 showcases the Tool class along with the object properties and predicates
related to it. The object predicates include canUseAccessProtocol, canUseDataServer,
hasInputFormat, hasOutputFormat, hasCapability, and isOfType. The first predicate,
canUseAccessProtocol, specifies the access protocols a tool can use to access data.
The DataAccessProtocol class refers to a standard set of regulations and requirements
governing the electronic access of data. Similar to isAccessedBy for data
collections, canUseDataServer indicates which instances of the DataServer class a
Tool instance can interact with. The hasInputFormat and hasOutputFormat predicates
detail what type of data a Tool instance can input or output, where the term
DataFormat refers to the organization of information according to preset
specifications for the storage of data. The hasCapability predicate specifies the
different facilities available through a Tool instance for visualizing data products.
The isOfType predicate relates a tool to a ToolType, which describes the way in which
the tool can be used (desktop app, browser app, web service, etc.). The terms
dc:title and dc:description are again used to describe the basic information about a
tool (DCMI Usage Board 2012). The URL for the homepage of a Tool instance is noted by
doap:homepage, where DOAP refers to the Description of a Project vocabulary. We list
the version of a Tool instance by relating a tool to doap:Version through
doap:release (Dumbill 2014).
Figure 2: The ToolMatch ontology defines a vocabulary for tools. Rounded nodes in
the graph represent OWL classes. Edges (black dashed lines) represent relationships
(described as RDF properties) between classes, while solid blue lines indicate that
one class is a subclass of the class it points to. Different colors are used to
separate ToolMatch ontology terms from terms from other ontologies. This diagram and
other ontology diagrams were created using the CMAP ontology editor.
The visualizedBy object property relates a DataCollection instance and a Tool
instance, and means that a data collection is able to be visualized by, or "matches",
a tool. How this matching is achieved will be explained in the following section. In
summary, the goal of ToolMatch was to develop a simple but effective ontology that
represents key aspects of both tools and data collections and allows for future
expansion. The ontology explained above forms the basis of the ToolMatch service and
allows for this effective representation. The complete list of ToolMatch classes and
object predicates can be viewed in Table 1 and Table 2, while the full ToolMatch
schema (in N3) is located in the Appendix.
Table 1: ToolMatch Classes: Classes are the main concepts of the ToolMatch ontology.
Classes may be the superclass or subclass of another class, or neither. Examples
(instances) of each class are also shown.
Class               Superclass      Subclass                 Example
DataServer          -               -                        OPeNDAP Hyrax
DataAccessProtocol  -               -                        DAP
DataCollection      -               -                        Aqua AIRS Level2 Plus AMSU
DataConvention      -               DataGridType             CF Convention
DataFormat          -               -                        HDF4, HDF5, NETCDF
DataGridType        DataConvention  -                        -
Tool                -               ToolType, VisualizeType  ERDDAP
ToolType            Tool            -                        Desktop app, Browser app, Web service
URL                 -               -                        -
VisualizeType       Tool            -                        Gridded, Mapped
Table 2: ToolMatch Object Predicates: Object predicates relate one class in the
ToolMatch ontology (domain) to another class or to a datatype (range).
Object Predicate      Domain                         Range
canUseAccessProtocol  Tool, ToolType, VisualizeType  DataAccessProtocol
canUseDataServer      Tool, ToolType, VisualizeType  DataServer
hasAccessURL          DataCollection                 URL
hasCapability         Tool, ToolType, VisualizeType  VisualizeType
hasDataFormat         DataCollection                 DataFormat
hasInputFormat        Tool, ToolType, VisualizeType  DataFormat
hasOutputFormat       Tool, ToolType, VisualizeType  DataFormat
isAccessedBy          Tool                           DataCollection, DataServer
isOfType              Tool, ToolType, VisualizeType  ToolType
providesAccess        DataServer                     DataAccessProtocol
usesConvention        DataCollection                 DataConvention, DataGridType
visualizedBy          DataCollection                 Tool, ToolType, VisualizeType
3.2 ToolMatch Instances
The instances of tools and data collections are shown in Figure 3. Note that a Tool
instance may be a gridding tool or mapping tool, or neither. These are subclasses of
the Tool class and further specify the abilities of a tool. A tool instance can be
uniquely identified by its name or by its URL. Note that if a tool belongs to a
larger toolset, we do not count the toolset as an instance, but rather each tool
within the larger grouping. While only a small grouping of tools is listed here, it
is sufficient for testing matches with data collections and vice versa. Continued
expansion and better testing of ToolMatch require that more instances of tools be
added to the triple store.
A data collection may be identified by either a DOI or a GCMD entry ID. The GCMD
database contains more than 30,000 metadata descriptions of earth science data
collections, and strives to achieve the overall goal of providing scientists with a
"comprehensive and high quality database to reduce overall expenditures for
scientific data collection and dissemination" (NASA 2014). The GCMD-DIF Entry ID is
determined by the metadata author for the data collection, and for ToolMatch it
serves as an identifier to complement a DOI. A DOI acts as a unique identifier for a
data collection, and allows for the construction and maintenance of metadata for that
collection. While every data collection will have a DOI, not every one will have a
GCMD-DIF Entry ID associated with it. Lastly, an access URL may be provided for a
data collection. The access URL acts as a location for the landing page, which is
maintained by the data collection creator. While an access URL is useful in locating
metadata for a collection, a DOI or a GCMD entry ID is preferred. As with tools,
more data collections are needed to better test and to expand the ToolMatch service.
Figure 3: Shown above are tool and data collection instances. Rounded nodes in
the graph represent OWL classes. Square nodes represent instances. Edges represent
relationships (described as RDF properties) between classes, while solid blue lines
indicate that one class is a subclass of the class it points to. Note that a Tool
instance may be a gridding tool or mapping tool, or neither. This diagram and other
ontology diagrams were created using the CMAP ontology editor.
3.3 Semantic Matching of Tools and Data Collections
Semantic matching is the primary goal of both ToolMatch use cases, be it matching
data collections with tools or vice versa. This can be achieved through inferencing
over ToolMatch ontological properties. SWRL (Semantic Web Rule Language) in
particular allows the creation of such rules. The SWRL rule developed for these use
cases (shown below), written in human-readable syntax, finds tools that can
visualize a given data collection and vice versa:
DataCollection(?dc) ^ Tool(?t) ^ hasDataFormat(?dc, ?df) ^ isAccessedBy(?dc, ?ds) ^
canUseAccessProtocol(?t, ?p) ^ providesAccess(?ds, ?p) ^ hasInputFormat(?t, ?df) ^
canUseDataServer(?t, ?ds) => visualizedBy(?dc, ?t)
Here we state that if a DataCollection instance (?dc) and a Tool instance (?t) share
the same DataFormat (through the hasDataFormat(?dc, ?df) and hasInputFormat(?t, ?df)
properties), use the same DataServer (indicated by the isAccessedBy(?dc, ?ds) and
canUseDataServer(?t, ?ds) properties), and the DataServer and the Tool both use the
same data access protocol (canUseAccessProtocol(?t, ?p) and providesAccess(?ds, ?p)),
then the DataCollection "matches" the Tool (visualizedBy(?dc, ?t)). The DataCollection
is the subject of visualizedBy, in keeping with the domain and range given in Table 2.
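The conditions of the matching rule can be sketched in plain Python as a join over
subject-predicate-object triples. This is an illustrative approximation rather than
the reasoner ToolMatch actually uses, and the instance data (the AIRS_L2, Hyrax,
Panoply, and HDFView facts and their property values) is assumed for the example.
Each match is recorded as a (data collection, tool) pair.

```python
from collections import defaultdict

# Hypothetical instance data; the names echo examples in the text, but the
# specific property values are assumptions made for illustration only.
triples = [
    ("AIRS_L2", "hasDataFormat", "HDF4"),
    ("AIRS_L2", "isAccessedBy", "Hyrax"),
    ("Hyrax", "providesAccess", "DAP"),
    ("Panoply", "hasInputFormat", "HDF4"),
    ("Panoply", "canUseDataServer", "Hyrax"),
    ("Panoply", "canUseAccessProtocol", "DAP"),
    ("HDFView", "hasInputFormat", "HDF4"),  # no server or protocol recorded
]
collections = {"AIRS_L2"}
tools = {"Panoply", "HDFView"}


def index(triples):
    """Map (subject, predicate) to the set of objects asserted for it."""
    idx = defaultdict(set)
    for s, p, o in triples:
        idx[(s, p)].add(o)
    return idx


def match(collections, tools, triples):
    """Return (data collection, tool) pairs satisfying the rule's conditions."""
    idx = index(triples)
    matches = set()
    for dc in collections:
        for t in tools:
            shared_formats = idx[(dc, "hasDataFormat")] & idx[(t, "hasInputFormat")]
            shared_servers = idx[(dc, "isAccessedBy")] & idx[(t, "canUseDataServer")]
            # At least one shared server must offer a protocol the tool can use.
            protocol_ok = any(
                idx[(ds, "providesAccess")] & idx[(t, "canUseAccessProtocol")]
                for ds in shared_servers
            )
            if shared_formats and protocol_ok:
                matches.add((dc, t))
    return matches


matches = match(collections, tools, triples)
print(matches)
```

In this sample data HDFView shares a format with the collection but has no data
server or protocol recorded, so it is not matched. This mirrors the point that
sparse metadata reduces the chance of finding a match.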
Figure 4 provides a concrete example of semantic matching in ToolMatch. Here we
are given a DataCollection instance, identified as the Aqua AIRS Level2 Plus AMSU,
which consists of "retrieved estimates of cloud and surface properties, plus profiles
of retrieved temperature, water vapor, ozone, carbon monoxide and methane" (AIRS
Science Team 2013). This data collection has certain metadata properties (data
server, data convention, data format, and grid type). From these properties we infer
that the given data collection can be visualized by (matches) the IDV, McIDAS-V,
and Panoply tools. If a data collection does not match any tool, this means either
that no tools currently present in the ToolMatch knowledge store share the same
metadata characteristics with the data collection, or that not enough information
about the data collection was entered by the data user. Therefore, the more metadata
a user enters about a tool or data collection, the more likely they are to find a
match.
Figure 4: Inferencing Example: Given a data collection with certain properties
described, we can infer what tools the data collection matches with. The "equivalent
class" section represents the given information for the first use case. The "subclass
of" section shows what is inferred from the given information, and is achieved by
semantic matching of metadata for both tools and collections.
4 ToolMatch Web Service
4.1 ToolMatch Web Service (Tools)
The goal of the ToolMatch web service is to create a simple but effective user
interface for data and tool users to find tools and/or data collections. For this
service we assume these users are not ontologists, but are still interested in
building formal models of an OWL ontology, testing the validity of the models,
expressing rules using ontological concepts, and retrieving information via
ontologically based queries. The service allows users to add, edit, and delete both
tools and data collections from the ToolMatch triple store. Users may also query the
ToolMatch triple store through the use of SPARQL, as well as view the ToolMatch
schema in a readable format.
Figure 5 displays an example of a layout of all current Tool instances. These tools
are queried via SPARQL from the ToolMatch triple store. Here only the name of the
tool itself is shown, but each entry leads to a splash page for a tool. Each tool
can be edited or deleted, and a user can find matching data collections that the tool
can visualize or map.
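The kind of SPARQL query that could back such a tool list can be sketched as follows.
The queries ToolMatch actually runs are not given in this section, so this is only a
plausible shape: the tm: namespace URI is a placeholder, and the use of the Dublin
Core elements namespace for dc:title is an assumption.

```python
# Hypothetical namespace: the real ToolMatch namespace is not reproduced here.
TOOLMATCH_NS = "http://example.org/toolmatch#"
# Assumed Dublin Core namespace for dc:title.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_instances_query(class_name):
    """Build a SPARQL SELECT query for all titled instances of a class."""
    return (
        f"PREFIX tm: <{TOOLMATCH_NS}>\n"
        f"PREFIX dc: <{DC_NS}>\n"
        "SELECT ?instance ?title\n"
        "WHERE {\n"
        f"  ?instance a tm:{class_name} ;\n"
        "            dc:title ?title .\n"
        "}\n"
        "ORDER BY ?title"
    )


tool_query = list_instances_query("Tool")
print(tool_query)
```

The same helper could be called with "DataCollection" to produce the corresponding
list for data collections.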
Figure 6 shows a splash page for an individual Tool instance. This splash page
displays the instance name, description, version, and access URL, as well as an image
for the tool and any input or output formats. This information is also retrieved from
the ToolMatch triple store via SPARQL queries. This page allows the user to see more
detailed information regarding a tool instance. If necessary, the user can update
any outdated information (such as the version of the tool) or correct any incorrect
information about the tool instance. Again, only the information needed to identify
the tool properly is shown.
Figure 5: The ToolMatch web service Tool List shows all Tool instances currently in
the ToolMatch triple store. Each entry links to a splash page for the tool, and can
be edited (via the tool form) or deleted, both of which use SPARQL queries to make
the necessary changes. Users may also find matching data collections, which will
display in a list similar to this one.
Figure 7 displays the ToolMatch tool form. This allows the user to enter or edit
information that describes a tool instance. This includes the tool name, description,
version, and the tool page URL, as well as other optional fields such as the tool
logo, input and output formats, tool capabilities, and the tool type. Once a tool is
submitted, RDF/XML for the tool instance is created and then stored in the ToolMatch
triple store as a graph. Sample RDF for the Tool instance Panoply can be viewed in
the Appendices.
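A minimal sketch of how submitted form fields might be turned into RDF/XML is given
below, using Python's standard xml.etree.ElementTree module. It is not the service's
actual implementation: the tm: namespace URI, the form field names, and the
ExampleTool values are hypothetical, and the Dublin Core elements namespace is
assumed for dc:title and dc:description.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"   # assumed Dublin Core namespace
DOAP = "http://usefulinc.com/ns/doap#"
TM = "http://example.org/toolmatch#"      # hypothetical ToolMatch namespace

# Register readable prefixes for serialization.
for prefix, uri in (("rdf", RDF), ("dc", DC), ("doap", DOAP), ("tm", TM)):
    ET.register_namespace(prefix, uri)


def tool_to_rdfxml(form):
    """Serialize submitted tool form fields as an RDF/XML string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    tool = ET.SubElement(root, f"{{{TM}}}Tool", {f"{{{RDF}}}about": form["uri"]})
    ET.SubElement(tool, f"{{{DC}}}title").text = form["name"]
    ET.SubElement(tool, f"{{{DC}}}description").text = form["description"]
    ET.SubElement(tool, f"{{{DOAP}}}homepage", {f"{{{RDF}}}resource": form["homepage"]})
    # Optional field: one hasInputFormat statement per selected format.
    for fmt in form.get("input_formats", []):
        ET.SubElement(tool, f"{{{TM}}}hasInputFormat", {f"{{{RDF}}}resource": TM + fmt})
    return ET.tostring(root, encoding="unicode")


# Placeholder form submission; every value here is invented for illustration.
example_form = {
    "uri": TM + "ExampleTool",
    "name": "ExampleTool",
    "description": "A placeholder visualization tool.",
    "homepage": "http://example.org/exampletool",
    "input_formats": ["NETCDF", "HDF5"],
}
rdfxml = tool_to_rdfxml(example_form)
print(rdfxml)
```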
Figure 6: The ToolMatch web service splash page displays essential information about
each Tool instance. This includes a brief description of the tool, the current version,
the URL for the tool page, input and output formats, tool capability and tool type.
Splash pages for data collections (not shown) are similar in this regard.
Figure 7: The ToolMatch web service tool form allows users to enter information to
add or update a Tool instance. This includes the tool name, description, version, and
the tool page URL, as well as other optional fields such as the tool logo, input and
output formats, tool capabilities and the tool type.
4.2 ToolMatch Web Service (Data Collections)
The web service for data collections is similar to that for tools, albeit with some differences. Figure 8 shows the first part of the data collection form, which asks the user to
enter one or more collection identifiers: a DOI, a GCMD entry ID, or an
access URL for the data collection. If the user enters a DOI or access URL, the landing page for the data collection is shown alongside the following form. Similarly, if a GCMD
entry ID is entered, the main form is pre-populated with information about the data
collection. This spares the user from having to search for all the relevant information regarding a data collection. Instead, a user simply has to know the DOI,
GCMD entry ID, or access URL for that data collection. Requiring less information from the
user reduces data entry errors and improves the accuracy of the information recorded about the data collection.
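The branching on identifier type described above can be sketched as follows. This is a minimal illustration, not the ToolMatch implementation: the function names, the DOI pattern, and the returned labels are all assumptions made for the example, and anything that is neither a DOI nor a URL is simply treated as a GCMD entry ID.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern: accepts a bare DOI, a "doi:" prefix, or a doi.org URL.
DOI_PATTERN = re.compile(
    r"^(?:doi:|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$", re.IGNORECASE
)


def classify_identifier(value: str) -> str:
    """Return 'doi', 'access_url', or 'gcmd_entry_id' for a submitted identifier."""
    value = value.strip()
    if DOI_PATTERN.match(value):
        return "doi"
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https", "ftp") and parsed.netloc:
        return "access_url"
    # Assumption of this sketch: everything else is a GCMD entry ID.
    return "gcmd_entry_id"


def next_step(value: str) -> str:
    """Decide how the main form is prepared for the given identifier."""
    if classify_identifier(value) == "gcmd_entry_id":
        return "prepopulate_form"   # metadata is looked up and filled in
    return "show_landing_page"      # landing page is shown alongside the form
```

For instance, `next_step("doi:10.1234/example")` selects the landing-page branch, while an identifier such as `"EXAMPLE_ENTRY_ID"` selects the pre-populated form.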
Figure 9 details the main input form for data collections. Similar to the tool form, the
data collection form asks for basic information about a data collection, including the
name, description, a DOI, GCMD Entry ID, an access URL, as well as information
about the data collection’s format, convention, and server accessibility. Once a data
collection is entered it is added to the ToolMatch triple store and is then displayed in
a list similar to tools where each entry leads to a splash page for each data collection.
From there data collections can be edited or deleted, or a user can search for matching
tools that can visualize or map the data collection. Data collections in the triple store
can also be queried through SPARQL.
The ToolMatch web service, in conclusion, is based on a simple ontology and set
of rules that describe what type of tools work with what type of data collections, and
vice versa. This helps to facilitate a crowd-sourced approach for domain experts who
are not ontologists by allowing ToolMatch clients direct access to the knowledge base
without necessarily needing to understand ontologies. However, users who wish to use
SPARQL can still query the ToolMatch triple store directly.
Figure 8: The ToolMatch web service data collection form prompts the user to enter a
data collection DOI, GCMD entry ID, or access URL. Once submitted, the following
form is pre-populated with information about the data collection if a GCMD entry
ID is entered. If a DOI or access URL is entered, the data collection landing page is
shown alongside the form. This prevents unnecessary searching of information about
the data collection.
Figure 9: The ToolMatch web service data collection form prompts users to enter
information to add or update a DataCollection instance. This includes the name,
description, a DOI, GCMD Entry ID, an access URL, as well as information about
the data collection’s format, convention, and server accessibility.
5 Third Use Case
As stated in the introduction, the third use case extends the scope of the first use
case but at the same time specifies its parameters. Given a data collection and with
additional semantics in mind for that data collection, find tools that can be used with
the content of that data. The third use case involves modification to the ToolMatch
conceptual model and ontology, as well as to the ToolMatch web service.
In order to implement the changes needed in the third use case, additions and modifications to the existing ToolMatch ontology must be made, many of which revolve
around the Tool and DataCollection classes. Because we are attempting to match
tools with data collection content, which includes measurements and sensor readings, ToolMatch will need a method of matching based upon these specific attributes.
This necessitates expanding both the Tool and DataCollection classes by incorporating
the OGC (Open Geospatial Consortium) and ISO Observations and Measurements ontology.
5.1 Observations and Measurements Ontology
The Observations and Measurements ontology (published as ISO/DIS 19156) defines
a conceptual model for observations, and for features involved in sampling when making observations (OGC 2011). The goal of the ontology is to enable interoperability
between scientific and technical communities. At the heart of the ontology is the
observation. An observation can be defined as “an act associated with a discrete time
instant or period through which a number, term or other symbol is assigned to a
phenomenon” (Cox 2012). This is most often achieved through the use of a sensor
or other instrument. The result of an observation, as described in the model itself, is
“an estimate of the value of a property of some feature” (Cox 2012).
The Observations and Measurements ontology defines, for each observation, a feature
of interest. This is incorporated through the General Feature Model (ISO 19109)
(Cox 2012). A feature of interest is defined as a “typed object with identity,” an example being vector data (Cox 2013). For each feature there is a feature type, which
is defined by a characteristic set of properties, such as “attributes, associations, operations”. These feature types are usually specific to an application domain, and
map to objects that exist in the real world, for instance a “road, mine, truck,
or storm” (Cox 2013). An XML observation example (taken from the OGC site) is
shown in the appendices. Given that a data collection has data content, which contains observations that relate to a general feature type, we can semantically match a
given tool with data collection content.
5.2 Changes in the ToolMatch Ontology
Figures 10 and 11 show the changes to the ToolMatch ontology. The Tool class will
undergo several changes. Each Tool instance will include the object predicate hasDomain, whose range will be the class Domain. The Domain class
can have multiple features. For example, given a hydrological model, features could
include sinks, flow direction, flow accumulation, watersheds, stream networks, etc.
The DataCollection class, on the other hand, sees several new additions. Each DataCollection instance will be linked with an observation instance by the object predicate
hasObservation. Each observation will have a feature of interest (through the predicate featureOfInterest), which holds conceptual significance within the application
domain. The matching here occurs between the features of interest of each data collection’s
observations and the features of each tool’s domain. This matching will
occur in addition to the previously stated conditions for matching, since the data
content must still have the same properties (data format, server accessibility, etc.).
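As a rough illustration of these additions, the sketch below models the new classes and predicates as plain Python data classes and computes the features that a tool and a data collection share. The class layout, the field names, and the example instances are assumptions made for this sketch; the actual ontology expresses these relationships in OWL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    features: frozenset  # features that fall within this domain


@dataclass(frozen=True)
class Tool:
    name: str
    domain: Domain  # corresponds to hasDomain


@dataclass(frozen=True)
class Observation:
    feature_of_interest: str  # corresponds to featureOfInterest


@dataclass(frozen=True)
class DataCollection:
    name: str
    observations: tuple  # corresponds to hasObservation


def shared_features(tool: Tool, collection: DataCollection) -> set:
    """Features observed in the collection that also belong to the tool's domain."""
    observed = {obs.feature_of_interest for obs in collection.observations}
    return observed & tool.domain.features


# Hypothetical instances, using the hydrological features named in the text.
hydrology = Domain(
    "hydrology",
    frozenset({"sinks", "flow direction", "flow accumulation",
               "watersheds", "stream networks"}),
)
example_tool = Tool("ExampleHydrologyTool", hydrology)
example_collection = DataCollection(
    "ExampleWatershedCollection", (Observation("watersheds"),)
)
```

A non-empty result from `shared_features(example_tool, example_collection)` corresponds to the additional matching condition introduced by the third use case; the format and server conditions would still have to hold separately.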
Figure 10: The updated ToolMatch ontology for tools takes into consideration the
conditions of the third use case. In addition to the notations described in previous
ontology figures, any changes are shown in purple. These changes include two classes
from the Observations and Measurements ontology, as well as two object predicates
detailing the relationship between the classes themselves and between the classes and
the Tool class.
Figure 11: The updated ToolMatch ontology for data collections
takes into consideration the conditions of the third use case. In addition to the notations described in previous ontology figures, any changes are shown in purple. These
changes include two classes from the Observations and Measurements ontology, as well
as two object predicates detailing the relationship between the classes themselves and
between the classes and the Data Collection class.
5.3 Semantic Matching
This expansion of the ToolMatch ontology necessitates an extension of the existing
SWRL rules as well. The third use case requires matching between data collection content and
tools, so the matching now takes the added
parameters of domain and features into account. Given data collection content, which has a domain
and features that fall within that domain, find tools that match that data collection content. If a tool meets all the requirements previously necessary for tools
and data collections, and if both the tool and the content have the same features in
an application domain (represented by hasDomain(?t, ?d) and featureOfInterest(?d,
?f) for a tool, and by hasObservation(?dc, ?o) and featureOfInterest(?o, ?f) for data collection content), then it can be stated that they match (visualizedBy(?t, ?dc)). The
updated SWRL rule for matching a given data collection to a tool is shown below:
DataCollection(?dc) ^ Tool(?t) ^ hasDataFormat(?dc, ?df) ^ isAccessedBy(?dc, ?ds) ^
canUseAccessProtocol(?t, ?p) ^ providesAccess(?ds, ?p) ^ hasInputFormat(?t, ?df) ^
canUseDataServer(?t, ?ds) ^ hasDomain(?t, ?d) ^ featureOfInterest(?d, ?f) ^
hasObservation(?dc, ?o) ^ featureOfInterest(?o, ?f)
=> visualizedBy(?t, ?dc)
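To make the effect of the rule concrete, the following sketch evaluates its body as a naive conjunctive query over a small in-memory set of facts. It is not a SWRL reasoner and is not how ToolMatch performs inference; the fact encoding as (predicate, subject, object) tuples, the function names, and all instance names are assumptions for illustration. The rule body includes the featureOfInterest(?d, ?f) atom that the text describes for tools.

```python
def match_atoms(atoms, facts, binding=None):
    """Yield every variable binding that satisfies all atoms in order."""
    binding = binding or {}
    if not atoms:
        yield binding
        return
    (pred, subj, obj), rest = atoms[0], atoms[1:]
    for fact_pred, fact_subj, fact_obj in facts:
        if fact_pred != pred:
            continue
        candidate = dict(binding)
        consistent = True
        for term, value in ((subj, fact_subj), (obj, fact_obj)):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            yield from match_atoms(rest, facts, candidate)


# Body of the updated rule; class atoms are encoded with a "type" predicate.
RULE_BODY = [
    ("type", "?dc", "DataCollection"), ("type", "?t", "Tool"),
    ("hasDataFormat", "?dc", "?df"), ("isAccessedBy", "?dc", "?ds"),
    ("canUseAccessProtocol", "?t", "?p"), ("providesAccess", "?ds", "?p"),
    ("hasInputFormat", "?t", "?df"), ("canUseDataServer", "?t", "?ds"),
    ("hasDomain", "?t", "?d"), ("featureOfInterest", "?d", "?f"),
    ("hasObservation", "?dc", "?o"), ("featureOfInterest", "?o", "?f"),
]

# Hypothetical facts: two tools that differ only in their domain.
FACTS = {
    ("type", "exampleCollection", "DataCollection"),
    ("hasDataFormat", "exampleCollection", "netcdf"),
    ("isAccessedBy", "exampleCollection", "exampleServer"),
    ("providesAccess", "exampleServer", "exampleProtocol"),
    ("hasObservation", "exampleCollection", "obs1"),
    ("featureOfInterest", "obs1", "watershed"),
    ("type", "hydrologyTool", "Tool"),
    ("hasInputFormat", "hydrologyTool", "netcdf"),
    ("canUseDataServer", "hydrologyTool", "exampleServer"),
    ("canUseAccessProtocol", "hydrologyTool", "exampleProtocol"),
    ("hasDomain", "hydrologyTool", "hydrology"),
    ("featureOfInterest", "hydrology", "watershed"),
    ("type", "weatherTool", "Tool"),
    ("hasInputFormat", "weatherTool", "netcdf"),
    ("canUseDataServer", "weatherTool", "exampleServer"),
    ("canUseAccessProtocol", "weatherTool", "exampleProtocol"),
    ("hasDomain", "weatherTool", "atmosphere"),
    ("featureOfInterest", "atmosphere", "storm"),
}


def infer_matches(facts):
    """Return the (tool, data collection) pairs for which the rule body holds."""
    return {(b["?t"], b["?dc"]) for b in match_atoms(RULE_BODY, facts)}
```

In this example both tools satisfy the format, server, and protocol conditions, but only the tool whose domain contains the observed feature is paired with the data collection.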
In addition to the previous matching between the DataFormat and DataServer of
a Tool and DataCollection instance, matching now occurs between the domain of a
Tool instance and DataCollection content. Specifically, if a content’s observation’s
featureOfInterest property belongs to a Tool’s domain, then the content and the
tool are said to be matched. As an example, suppose that a user has watershed
measurements and needs a tool to model those measurements. ArcGIS offers a
hydrological toolset, containing multiple tools that model the flow of water across a
surface (ArcGIS Resource Center 2011). While one or more tools may or may not
be useful with the content that a data user has, it would be time consuming to test
each tool within the toolset to determine usefulness. If a tool within the toolset and
the data content had the same features, and therefore the same domain, we could
determine that the tool could model the data collection content. From this matching,
a user would be able to find a tool within a hydrological toolset (based upon the
domain of the tool) to visualize the data content.
5.4 Discussion and Conclusion
The third use case expands the scope of the first use case, but at the same time
specifies what type of domain data collection content can have. In this manner we
can effectively match data collection content with tools that can be used with it.
Further testing of this use case requires the addition of more instances for both tools
and data collections. In addition, the ToolMatch web service must also be updated to
reflect the changes already detailed in the ToolMatch ontology and SWRL rules. The
primary changes would occur in the forms for both tools and data collections. The
tool form would be expanded to allow a tool user to input the domain of the tool, as
well as features that fall within a given domain. For data collections, the current form
in place would be altered to let users provide observations for data collection content.
Each observation would be associated with a feature of interest. In the future this
could be expanded to allow a data user to simply submit the data collection content
(as an RDF/XML or JSON object), and the form would automatically find what
tools would work with that content.
6 Future Work
As ToolMatch has begun adding information to the underlying Knowledge Store, it
has become increasingly clear that employing other sources of information such as
data catalogs, and data or tool registries is not only advisable but critical to leverage
and scale a service like ToolMatch. In-depth analysis of the types of data collections,
visualization tools, and technologies used by these data catalogs and registries will
be necessary in order to understand how the ToolMatch service can use them in a
practicable, scalable manner. This kind of analysis will also help the ToolMatch team
move toward the goal of demonstrating how the service can be incorporated into existing sets of information services found at data and archive centers.
Additionally, ToolMatch seeks to constantly incorporate new data collections and
new tools, as well as annotations for each. This will allow for more thorough testing
of the ToolMatch service on a wider scale for both tools and data collections. Finally,
while the ToolMatch web service is designed for “lay users”, or those who are less
familiar with ontologies and SPARQL, it is not designed for those who are experts in
those fields. Thus a further extension would be the option for users to submit data
collections or tools through the use of SPARQL update queries or through submitting
new triples directly to the ToolMatch knowledge base via a RESTful service.
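One possible shape for such a submission is sketched below: a small helper that assembles a SPARQL 1.1 INSERT DATA request for a new Tool instance. This is only an illustration under stated assumptions. The helper names, the instance and format URIs, and the use of rdfs:label for the tool name are hypothetical; only the schema namespace and the hasInputFormat property come from the ToolMatch schema listed in the Appendices, and no endpoint is contacted.

```python
TOOLMATCH_SCHEMA = "http://toolmatch.esipfed.org/schema#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_tool_insert(tool_uri: str, label: str, input_format_uris: list) -> str:
    """Build a SPARQL 1.1 INSERT DATA request describing one Tool instance."""
    statements = [
        f"<{tool_uri}> a tm:Tool",
        f'rdfs:label "{escape_literal(label)}"@en',  # assumption: name stored as a label
    ]
    statements += [f"tm:hasInputFormat <{uri}>" for uri in input_format_uris]
    body = " ;\n    ".join(statements) + " ."
    return (
        f"PREFIX tm: <{TOOLMATCH_SCHEMA}>\n"
        f"PREFIX rdfs: <{RDFS}>\n"
        "INSERT DATA {\n"
        f"  {body}\n"
        "}"
    )


# Hypothetical instance and format URIs, for illustration only.
update_request = build_tool_insert(
    "http://example.org/tools/ExampleTool",
    "Example Tool",
    ["http://example.org/formats/netcdf"],
)
```

The resulting string could then be sent to whatever update endpoint or RESTful wrapper the service eventually exposes.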
To date, most of the data collections and tools that have been included in the ToolMatch service have been NASA-generated. The initial interest in integrating a data
catalog (a project initiated by the ESIP Energy and Climate cluster known as the
Decision Support Tools Catalog and Community of Practice) fell through when development of the catalog was halted. In its place, other opportunities have arisen. In
particular, several other communities represented within the ESIP Federation, such as
USGS (U.S. Geological Survey) and NOAA (National Oceanic and Atmospheric Administration),
have expressed interest in using the ToolMatch service if it can meet their specific needs.
These communities are especially interested in exploring
the integration of data catalogs with ToolMatch.
As described in the previous section, the ToolMatch web service must also be changed
to reflect the modifications already present in the ToolMatch ontology and SWRL
rules for the third use case. The addition of more instances for tools, and specifically
for data collection content must be implemented. This will allow for sufficient testing
of the inferencing developed for the third use case, and will ensure that data users
can properly utilize the benefits of the ToolMatch service.
Finally, it is the hope of ToolMatch that even more use cases will be developed
that move beyond the scope of the first three use cases. This
could include a use case where rules map entire classes of tools to
classes of data collections. Such a use case is much broader in
scope, as a class here represents a large group of either tools or data collections
instead of just one. These are the types of use cases that ToolMatch hopes to tackle
in the future.
7 Conclusion
Using a simple ontology and a basic web service, ToolMatch is able to effectively allow data users to find tools to work with their data, and tool users to find
data collections that the tools can visualize or map. This can avoid the
time and effort wasted on conventional searching, as semantic matching does
the work for the user. A tool user would simply need to know the name of the
tool that they have, while a data user would only need a unique identifier for a data
collection (or the data collection itself). With the third use case, data users with
specific content could also find matching tools within a toolset that the model might
not fit itself. The ToolMatch service offers great potential in the area of semantic
matching, and promises to bring many benefits to the ESIP community, as well as
other scientific domains that it could possibly expand to.
Community testing of the ToolMatch service would help refine the understanding
of the use cases underlying the service, and further the definition, design and implementation of the service. In addition, that testing and refinement would further
demonstrate the utility of the underlying Semantic Web technologies that are focused upon science data collections and science data tools and move the science data
community forward in this area. While the ToolMatch service works with three
simple but effective use cases, making the service openly available to a broader community of
data users and tool developers could persuade others to utilize, improve and expand
the service. In addition, exploration of the factors involved in incorporating a ToolMatch service into other important, existing information service tools such as
data, tool, and service catalogs and registries, and into existing information service
suites offered by data centers and archives, would greatly inform and influence the
adoption and improvement of such a matching service.
References
GES DISC (Goddard Earth Sciences Data and Information Services Center).
“AIRX2RET Version 006: Aqua AIRS Level 2 Standard Physical Retrieval
(AIRS AMSU).” Accessed November 18, 2014.
http://disc.sci.gsfc.nasa.gov/datacollection/AIRX2RET V006.html?AIRX2RET.
Alhazbi, S., K.M. Khan, and A. Erradi. “Preference-based semantic matching of
web service security policies.” Paper presented at the World Congress on
Computer and Information Technology (WCCIT), Sousse, Tunisia, June 22-24,
2013.
ArcGIS Resource Center. “An Overview of the Hydrology Toolset”. Accessed
November 17, 2014.
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html
Cox, Simon. “Overview of Some Relevant Standards from ISO/TC 211.” Last
modified October 28, 2013.
https://www.seegrid.csiro.au/wiki/AppSchemas/IsoTc211Standards.
Cox, Simon. “OWL Representation of ISO 19156 (Observation Model).” Last
modified July 24, 2012.
http://def.seegrid.csiro.au/isotc211/iso19156/2011/observation.
Dublin Core Metadata Initiative. “DCMI Metadata Terms.” Last modified June
14, 2012. http://dublincore.org/documents/dcmi-terms/.
ESIP Tools Catalog. “Decision Support Tools Catalog and Community of
Practice.” Accessed November 13, 2014. http://dstccp.esipfed.org/esip/.
Dumbill, Edd. “Description of a Project.” Last modified January 26, 2014.
https://github.com/edumbill/doap/wiki.
Ghomari, L. and A.R. Ghomari. “A Comparative Study: Syntactic versus
Semantic Matching Systems.” Paper presented at the International Conference
on Complex, Intelligent and Software Intensive Systems, Fukuoka, Japan, March
16-19, 2009.
Horrocks, Ian, Peter F. Patel-Schneider, Harold Boley, Said Tabet, Benjamin
Grosof, and Mike Dean. “SWRL: A Semantic Web Rule Language Combining
OWL and RuleML.” Accessed November 17, 2014.
http://www.w3.org/Submission/SWRL/.
Kuhn, Werner. “A functional ontology of observation and measurement.” In
GeoSpatial Semantics, 26-43. Springer Berlin Heidelberg: 2009.
Lu, Shao Yuan, Kuo-Hsun Hsu, and Li-Jing Kuo. “A Semantic Service Match
Approach Based on WordNet and SWRL Rules.” Paper presented at the
e-Business Engineering (ICEBE), 2013 IEEE 10th International Conference,
Coventry, United Kingdom, September 11-13, 2013.
Luo, An, Yandong Wang, Lafang Wang, and You He. “Multi-level Semantic
matching of Geospatial Web Services.” Paper presented at the International
Conference on Geoinformatics, Fairfax, Virginia, August 12-14, 2009.
Menemencioglu, O and I.M. Orkak. “A Review on Semantic Web and Recent
Trends in Its Applications.” Paper presented at the IEEE International
Conference on Semantic Computing (ICSC), Newport Beach, California, June
16-18, 2014.
National Aeronautics and Space Administration. “Directory Interchange Format
(DIF) Writer’s Guide.” Last modified January 1, 2014.
http://gcmd.nasa.gov/add/difguide/.
Open Geospatial Consortium. “Observations and Measurements.” Accessed
November 18, 2014. http://www.opengeospatial.org/standards/om.
Peng, Hui. “Context Aware Semantic Web Services Description and Match.” Paper
presented at the International Conference on Cyber-Enabled Distributed
Computing and Knowledge Discovery (CyberC), Beijing, China, October 10-12,
2013.
Smith, Michael, Chris Welty, and Deborah McGuinness. “OWL Web Ontology
Language Guide.” Accessed November 17, 2014.
http://www.w3.org/TR/owl-guide/.
World Wide Web Consortium. “OWL Representation of ISO 19109 (General
Feature Model)”. Last modified July 1, 2012.
http://def.seegrid.csiro.au/isotc211/iso19109/2005/feature#.
World Wide Web Consortium. “OWL Web Ontology Language Reference.”
Accessed November 17, 2014. http://www.w3.org/TR/owl-ref/.
Zhang, Zhen, Shizhan Chen and Zhiyong Feng. “Semantic Annotation for Web
Services Based on DBpedia.” Paper presented at IEEE 7th International
Symposium on Service Oriented System Engineering (SOSE), Redwood City,
California, March 25-28, 2013.
Appendices
Listing 1: ToolMatch Schema (N3)
@prefix : <http://toolmatch.esipfed.org/schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix tw: <http://tw.rpi.edu/schema/> .
@prefix twi: <http://tw.rpi.edu/instances/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://toolmatch.esipfed.org/schema> a owl:Ontology ;
    rdfs:label "ToolMatch Ontology and Rules"@en ;
    dcterms:contributor twi:Christopher_Lynnes ,
        twi:PatrickWest ,
        <http://tw.rpi.edu/instances/person/MatthewFerritto> ,
        <http://tw.rpi.edu/instances/person/NancyHoebelheinrich> ;
    dcterms:creator twi:EricRozell ;
    dcterms:date "2014-07-01" ;
    dcterms:publisher twi:ESIPFederation ;
    dcterms:rights "This ontology is distributed under a Creative Commons Attribution License http://creativecommons.org/licenses/by/3.0/"@en ;
    rdfs:comment """ToolMatch is an ontology and set of rules that can help researchers determine what tools can be used given a particular set of data, or find data that can be used within a tool."""@en ;
    owl:imports time: ,
        foaf: ;
    owl:versionInfo "v 1.0 $Revision: 9878 $ $Author: pwest $ $Date: 2014-07-03 23:14:15 -0400 (Thu, 03 Jul 2014) $" .

:DataGridType a owl:Class ;
    rdfs:label "Data Grid Type"@en ;
    rdfs:comment "Gridded data (or raster data) are the result of converting scattered individual data points from one or more sources into a regular \"grid\" (or \"raster\") of calculated values based on spatial parameters"@en ;
    rdfs:subClassOf :DataConvention .

:canUseAccessProtocol a owl:ObjectProperty ;
    rdfs:label "Can Use Data Access Protocol"@en ;
    rdfs:comment """Specifies the access protocols that the tool can use to access data"""@en ;
    rdfs:domain :Tool ;
    rdfs:range :DataAccessProtocol .

:canUseDataServer a owl:ObjectProperty ;
    rdfs:label "Can Use Data Server"@en ;
    rdfs:comment """Specifies the type of data servers that the tool can interact with"""@en ;
    rdfs:domain :Tool ;
    rdfs:range :DataServer .

:hasAccessURL a owl:ObjectProperty ;
    rdfs:label "Has Access URL"@en ;
    rdfs:comment """A data collection can be accessed via a particular URL"""@en ;
    rdfs:domain :DataCollection ;
    rdfs:range :URL .

:hasCapability a owl:ObjectProperty ;
    rdfs:label "Has Capability"@en ;
    rdfs:comment """Specifies the different facilities available through the tool for visualizing data products"""@en ;
    rdfs:domain :Tool ;
    rdfs:range :VisualizeType .

:hasDataFormat a owl:ObjectProperty ;
    rdfs:label "Has Data Format"@en ;
    rdfs:comment """A Data Collection is stored in a particular data format, such as hdf5, netcdf"""@en ;
    rdfs:domain :DataCollection ;
    rdfs:range :DataFormat .

:hasInputFormat a owl:ObjectProperty ;
    rdfs:label "Has Input Format"@en ;
    rdfs:comment """A tool can input or import data products that are of a particular format, such as netcdf4, hdf5"""@en ;
    rdfs:domain :Tool ;
    rdfs:range :DataFormat .

:hasOutputFormat a owl:ObjectProperty ;
    rdfs:label "Has Output Format"@en ;
    rdfs:comment """A tool can output data products in a particular format, such as jpg, json, netcdf4"""@en ;
    rdfs:domain :Tool ;
    rdfs:range :DataFormat .

:isAccessedBy a owl:ObjectProperty ;
    rdfs:label "Is Accessed By"@en ;
    rdfs:comment """A Data Collection can be accessed by a particular Server"""@en ;
    rdfs:domain :DataCollection ;
    rdfs:range :DataServer .

:isOfType a owl:ObjectProperty ;
    rdfs:label "Is Of Type"@en ;
    rdfs:comment """The tool can be a desktop app, browser app, web service, etc..."""@en ;
    rdfs:domain :Tool ;
    rdfs:range :ToolType .

:providesAccess a owl:ObjectProperty ;
    rdfs:label "Provides Access"@en ;
    rdfs:comment """A Data Server can provide access to data and information using a particular protocol"""@en ;
    rdfs:domain :DataServer ;
    rdfs:range :DataAccessProtocol .

:usesConvention a owl:ObjectProperty ;
    rdfs:label "Uses Convention"@en ;
    rdfs:comment """A data collection follows a particular set of agreed upon rules in its representation and metadata"""@en ;
    rdfs:domain :DataCollection ;
    rdfs:range :DataConvention .

:visualizedBy a owl:ObjectProperty ;
    rdfs:label "Visualized By"@en ;
    rdfs:comment """A Data Collection can be visualized by a particular tool"""@en ;
    rdfs:domain :DataCollection ;
    rdfs:range :Tool .

:ToolType a owl:Class ;
    rdfs:label "Tool Type"@en ;
    rdfs:comment "The method in which a tool can be used, such as a desktop app, browser app, web service, etc..."@en .

:URL a owl:Class .

:DataAccessProtocol a owl:Class ;
    rdfs:label "Data Access Protocol"@en ;
    rdfs:comment "A standard set of regulations and requirements governing the access of data electronically"@en .

:DataConvention a owl:Class ;
    rdfs:label "Data Convention"@en ;
    rdfs:comment "A Data Convention is an established technique or practice related to the representation of data in data collections."@en .

:DataFormat a owl:Class ;
    rdfs:label "Data Format"@en ;
    rdfs:comment "The organization of information according to preset specifications for the storage of data"@en .

:DataServer a owl:Class ;
    rdfs:label "Data Server"@en ;
    rdfs:comment "A ToolMatch Data Server is a piece of software that can access, manipulate, and return data products derived from datasets, including the datasets themselves"@en .

:DataCollection a owl:Class ;
    rdfs:label "Data Collection"@en ;
    rdfs:comment "A collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer."@en .

:Tool a owl:Class ;
    rdfs:label "Tool"@en ;
    rdfs:comment "A ToolMatch Tool is a computer-based utility that can be used for data access, manipulation, and visualization."@en .
Listed above is the full ToolMatch schema in N3 format, with annotations. Listed
first are all object properties, followed by all class descriptions. Classes represent
concepts within the ontology, while object properties act as the relationships
between them. The N3 (Notation 3) format was developed with the purpose of
human readability, as opposed to RDF/XML.
Figure 12: Shown above are ToolMatch instances for (from top to bottom) data
formats, data grid types, data conventions, data access protocols, and data servers.
Rounded nodes in the graph represent OWL classes. Square nodes represent instances.
Edges represent relationships (described as RDF properties) between classes, while
solid blue lines indicate that one class is a subclass of the class it points to. Note that
a Tool instance may be a gridding tool or mapping tool, or neither. This diagram
and other ontology diagrams were created using the CMAP ontology editor.
Figure 13: The code shown above represents the RDF/XML for the Tool instance
Panoply. Namespaces are shown on the following page.
Table 3: A list of namespaces and prefixes used throughout the thesis.
Prefix  URI
dc      http://purl.org/dc/terms/
foaf    http://xmlns.com/foaf/0.1/
time    http://www.w3.org/2006/time#
twi     http://tw.rpi.edu/instances/
tw      http://tw.rpi.edu/schema/
xsd     http://www.w3.org/2001/XMLSchema#
rdf     http://www.w3.org/1999/02/22-rdf-syntax-ns#
dcat    http://www.w3.org/ns/dcat#
doap    http://usefulinc.com/ns/doap#
owl     http://www.w3.org/2002/07/owl#
gcmd    http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/
Listing 2: Observation XML Example
<!-- Observation example for sampling geometry extension for
     observations as defined in SOS extension TBD -->
<om:OM_Observation xmlns:om="http://www.opengis.net/om/2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:gml="http://www.opengis.net/gml/3.2"
    gml:id="obsTest1"
    xsi:schemaLocation="http://www.opengis.net/om/2.0 http://schemas.opengis.net/om/2.0/observation.xsd">
  <!-- optional description of observation -->
  <gml:description>Spatial observation test instance: water level</gml:description>
  <!-- optional name of observation -->
  <gml:name>Spatial observation test 1</gml:name>
  <!-- phenomenon time of observation -->
  <om:phenomenonTime>
    <gml:TimeInstant gml:id="pt1">
      <gml:timePosition>2010-03-08T16:22:25.00</gml:timePosition>
    </gml:TimeInstant>
  </om:phenomenonTime>
  <!-- result time is same as phenomenon time of observation -->
  <om:resultTime xlink:href="#pt1"/>
  <!-- link to DescribeSensor operation of SOS which is providing the
       sensor description -->
  <om:procedure xlink:href="http://mySOSURL?service=SOS&amp;request=DescribeSensor&amp;version=2.0.0&amp;procedureIdentifier=procedure1"/>
  <!-- parameter containing samplingPoint as defined in SOS 2.0
       Extension - Data Encoding Restriction -->
  <om:parameter>
    <om:NamedValue>
      <om:name xlink:href="http://www.opengis.net/req/omxml/2.0/data/samplingGeometry"/>
      <om:value>
        <gml:Point gml:id="SamplingPoint">
          <gml:pos srsName="urn:ogc:def:crs:EPSG:4326">52.9 7.52</gml:pos>
        </gml:Point>
      </om:value>
    </om:NamedValue>
  </om:parameter>
  <!-- a notional URN identifying the observed property -->
  <om:observedProperty xlink:href="http://sweet.jpl.nasa.gov/2.0/hydroSurface.owl#WaterHeight"/>
  <!-- a notional WFS call identifying the object regarding which the
       observation was made -->
  <om:featureOfInterest xlink:href="http://wfs.example.org?request=getFeature&amp;featureid=river1"/>
  <!-- The XML Schema type of the result is indicated using the value
       of the xsi:type attribute -->
  <om:result xsi:type="gml:MeasureType" uom="cm">28</om:result>
</om:OM_Observation>
Listed above is an example of XML for an observation. Important to note here is
om:featureOfInterest, which the Observations and Measurements ontology integrates
from the General Feature Model. This property will be incorporated into the
ToolMatch ontology to describe observations in data collections and used in the
matching of domains between data collection content and tools.
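As a small illustration of how this property might be read from submitted content, the sketch below extracts the feature of interest reference from an observation document using Python's standard XML library. The function name and the trimmed sample document are assumptions for this example; only the two namespace URIs and the element name are taken from the listing above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used in the observation listing.
NS = {
    "om": "http://www.opengis.net/om/2.0",
    "xlink": "http://www.w3.org/1999/xlink",
}


def feature_of_interest(observation_xml: str):
    """Return the xlink:href of om:featureOfInterest, or None if it is absent."""
    root = ET.fromstring(observation_xml)
    element = root.find("om:featureOfInterest", NS)
    if element is None:
        return None
    # Attributes are keyed by "{namespace-uri}localname" in ElementTree.
    return element.get(f"{{{NS['xlink']}}}href")


# Trimmed-down sample containing only the element of interest.
SAMPLE = """<om:OM_Observation xmlns:om="http://www.opengis.net/om/2.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <om:featureOfInterest
      xlink:href="http://wfs.example.org?request=getFeature&amp;featureid=river1"/>
</om:OM_Observation>"""
```

The returned reference could then be compared against the features of a tool's domain when matching content with tools.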