DataLift Use Case - Linking Geographical Entities v2

Linking Geographical Entities
DataLift Use Case
By: Ali Masri
Objectives
Find links between cities and transportation systems’ stop points
Use Geometric distance measure for matching
Query and visualize the results
Process
Getting the Data
Transform into RDF
Setting up Silk
Write the Linkage Rule
Matching Data
Query and Visualization
Data Sources
DBpedia
◦ Representing cities in France
◦ Used the SPARQL endpoint of DBpedia: http://fr.dbpedia.org/sparql
◦ Using the query:
SELECT ?lat ?lon ?ville ?nom WHERE {
  ?ville <http://dbpedia.org/ontology/country> <http://fr.dbpedia.org/resource/France> ;
         <http://fr.dbpedia.org/property/latitude> ?lat ;
         <http://fr.dbpedia.org/property/longitude> ?lon ;
         <http://xmlns.com/foaf/0.1/name> ?nom ;
         <http://dbpedia.org/ontology/wikiPageWikiLink> <http://fr.dbpedia.org/resource/Commune_française> .
} ORDER BY ?nom
SNCF
◦ Representing RER A stop points
◦ The format is GTFS, available from http://gtfs.s3.amazonaws.com/transilien-archiver_20150306_0205.zip
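The DBpedia SPARQL query can also be run from a script instead of the endpoint's web form. The sketch below is one possible way to do it and is not part of the original workflow: it uses only the Python standard library, sends the query as a standard SPARQL-protocol GET request, and writes the semicolon-separated layout used for the DBpedia data sample. The function names are illustrative, and it assumes the endpoint honours an Accept header asking for SPARQL JSON results.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://fr.dbpedia.org/sparql"  # endpoint named in the slides


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_csv(results, columns=("lat", "lon", "ville", "nom")):
    """Turn a SPARQL JSON result document into semicolon-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(columns)
    for row in results["results"]["bindings"]:
        # Unbound variables are written as empty cells.
        writer.writerow([row.get(c, {}).get("value", "") for c in columns])
    return out.getvalue()


def fetch_csv(query):
    """Run the query over the network and return the CSV text."""
    with urllib.request.urlopen(build_request(query)) as response:
        return bindings_to_csv(json.load(response))
```

Calling fetch_csv with the query text and saving the returned string to a file would give a CSV that can be uploaded to DataLift with the semicolon separator.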
DBpedia – Data Sample
lat;lon;ville;nom
50.376462;3.081655;http://fr.dbpedia.org/resource/Abbaye_des_Prés_de_Douai;Abbaye des Prés
50.248013;3.63553;http://fr.dbpedia.org/resource/Abbaye_Sainte-Élisabeth_du_Quesnoy;Abbaye Sainte-Élisabeth
48.728723;2.203225;http://fr.dbpedia.org/resource/Abbaye_Saint-Louis-du-Temple_de_Vauhallan;Abbaye Saint-Louis-du-Temple
49.04416667;7.183611111;http://fr.dbpedia.org/resource/Achen;Achen
45.38444444;-0.471944444;http://fr.dbpedia.org/resource/Agudelle;Agudelle
46.11666667;-0.9325;http://fr.dbpedia.org/resource/Aigrefeuille-d'Aunis;Aigrefeuille-d'Aunis
47.0775;-1.399166667;http://fr.dbpedia.org/resource/Aigrefeuille-sur-Maine;Aigrefeuille-sur-Maine
45.547176;6.612139;http://fr.dbpedia.org/resource/Villette_(Savoie);Aime
48.357778;7.319167;http://fr.dbpedia.org/resource/Albé;Albé
46.18777778;-0.907222222;http://fr.dbpedia.org/resource/Anais_(Charente-Maritime);Anais
SNCF – Data Sample
stop_id;stop_name;stop_desc;stop_lat;stop_lon
StopArea:DUA8754526;ABLON;;48.725069;2.419203
StopArea:DUA8738605;ACHERES GRAND CORMIER;;48.955294;2.09278
StopArea:DUA8738165;ACHERES VILLE;;48.969944;2.077125
StopArea:DUA8700147;AEROPORT CHARLES DE GAULLE 2 TGV;;49.004;2.567437
StopArea:DUA8727146;AEROPORT CHARLES DE GAULLE 1;;49.010452;2.559885
StopArea:DUA8738149;ANDRESY;;48.975237;2.050147
StopArea:DUA8754309;ANGERVILLE;;48.311296;2.003525
StopArea:DUA8775875;ANTONY;;48.754712;2.301724
StopArea:DUA8754546;ARPAJON;;48.586312;2.242379
StopArea:DUA8741071;Arquebuse;;49.004356;1.910785
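The slides do not say how the semicolon-separated stop file was produced from the GTFS archive. One plausible route, sketched here as an assumption, is to read stops.txt (a comma-separated file that GTFS places at the root of the archive) and re-emit the five columns shown above with a semicolon delimiter. The function name is illustrative, and filtering the stops down to a single line such as RER A is not covered.

```python
import csv
import io
import zipfile

COLUMNS = ["stop_id", "stop_name", "stop_desc", "stop_lat", "stop_lon"]


def stops_to_semicolon_csv(gtfs_zip, columns=COLUMNS):
    """Read stops.txt from a GTFS archive and re-emit chosen columns with ';'."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(columns)
    with zipfile.ZipFile(gtfs_zip) as archive:
        with archive.open("stops.txt") as raw:
            # GTFS text files are comma-separated UTF-8, sometimes with a BOM.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            for row in csv.DictReader(text):
                # Missing or empty fields become empty cells.
                writer.writerow([row.get(c) or "" for c in columns])
    return out.getvalue()
```

The argument can be a path to the downloaded zip file or any file-like object containing it.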
DataLift
We will use DataLift to transform the data into RDF and to query it
The latest version of DataLift is available at
◦ http://datalift.org/
Start by creating a new DataLift project
DataLift can be accessed at http://localhost:9091
We choose any name for the project and proceed to the next step
We click on the sources tab to add new sources to our project
Click on the plus button to add a new source
1. Upload the file
2. Add a name and a description if needed
3. Choose the separator option (in our case it is a semicolon)
4. Click create to proceed
Do the same for the other file
Now our sources are ready
The next step is to convert them into RDF
We go back to the description tab to access our modules
Our files are in CSV format
We choose the “Direct mapping CSV to RDF” module
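To give an idea of what a direct mapping from CSV to RDF produces, here is a minimal sketch of the general technique: each row becomes a subject, each column becomes a predicate, and each non-empty cell becomes a literal. It is not DataLift's implementation. The URI patterns (base URI plus row number for subjects, base URI plus column name for predicates) are hypothetical, and the URIs DataLift actually generates may look different.

```python
import csv
import io


def direct_mapping(csv_text, base_uri, delimiter=";"):
    """Map each CSV row to a subject and each non-empty cell to one triple."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    for number, row in enumerate(reader, start=1):
        subject = f"<{base_uri}/{number}>"  # hypothetical row URI
        for column, value in row.items():
            if column is None or not value:
                continue  # skip empty cells and cells without a header
            predicate = f"<{base_uri}#{column}>"  # hypothetical column URI
            # Escape backslashes and quotes for an N-Triples string literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples
```

Each returned string is one N-Triples line, so joining them with newlines gives a file similar in spirit to the RDF source created by the module.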
Choose the source to convert
Optionally, give your own names to the new source and the graph
Do the same for the second file
The RDF files are generated
Click on any RDF file to see what it looks like
For the next step we need the generated RDF files
We go back to the description tab and select the “Data export to RDF file” module
Choose the source to export and the RDF export format
The same is done for the second file
Silk
For data linking we will use the Silk Workbench
The latest version of Silk is available at
◦ http://silk.wbsg.de/
Run your Silk Workbench application and open your browser at http://localhost:9000
Click on open workspace
Add a new project by clicking on the project button
Choose any name and click Ok to proceed
Now we have to import the RDF files generated by DataLift
Click on the resources button to do so
Upload the files and choose the name to be displayed
Do this for all of your files, then close the dialog to continue
Now we have to create a new source and link it to the files we uploaded previously
Our files are RDF dumps, so navigate to this window
Select the name, the file and the format
Do the same for the second file
To add the linkage rule, click on the linking button
Specify the name of the rule, the source and the target
Specify an output file to put our results in
Choose the name of the file and the export format
Click open to start writing the linkage rule
Drag the properties, the transformation methods and the distance measure
We want to link a city and a stop point that are within 5 km of each other
We concatenate the latitude and the longitude of each source with a space between them
Then we use the geographical distance measure with a threshold of 5 km
Click on generate links to proceed
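The logic of this linkage rule can be sketched outside Silk as follows. This is only an illustration of the idea: it parses the concatenated "latitude longitude" string, computes a great-circle distance with the haversine formula, and accepts the pair when the distance is at most 5 km. The function names and the choice of the haversine formula are assumptions, and Silk's own geographical distance measure may be implemented differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def parse_point(text):
    """Parse a concatenated "latitude longitude" string into two floats."""
    lat, lon = text.split()
    return float(lat), float(lon)


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_link(source_point, target_point, threshold_km=5.0):
    """Return True when the two points lie within the distance threshold."""
    distance = haversine_km(parse_point(source_point), parse_point(target_point))
    return distance <= threshold_km
```

For example, is_link("48.969944 2.077125", "48.975237 2.050147") compares the sample stops ACHERES VILLE and ANDRESY, which lie roughly 2 km apart, so the pair would be accepted under a 5 km threshold.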
Click start to begin the matching process
Click here to proceed
Now the links are generated
Check the properties and choose whether each link is correct or not
Once you are satisfied with the results, click on reference links to export the links
Choose the output file and the name of the linking property
Click export to proceed
To reach the output file, navigate to [YOUR_USER_NAME]/.silk/output
Now we have to add the output file to DataLift
Go to the sources tab and add the RDF file
To import it into our triple store, go to the “Import of RDF source” module
Click submit to continue
Now all of our sources are ready
We go to the SPARQL endpoint
Be careful to choose the internal triple store as your repository
We write our SPARQL query that selects the latitude, longitude and name of the matched entities
Click execute to see the results
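The slides do not reproduce the SPARQL query itself. As a rough illustration of what it computes, the sketch below performs the same join in plain Python: it reads the exported links as N-Triples lines whose subject and object are both URIs, then looks up the name and coordinates of each linked entity. The function names, the dictionary layout and the assumption that each link points from a city to a stop point are all hypothetical, and the linking property is whatever name was chosen at export time.

```python
import re

# Matches simple N-Triples lines whose subject, predicate and object are URIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.$")


def read_links(ntriples_text):
    """Return (source, target) pairs from URI-only N-Triples lines."""
    links = []
    for line in ntriples_text.splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            source, _property, target = match.groups()
            links.append((source, target))
    return links


def matched_rows(links, cities, stops):
    """Join each link with the (name, lat, lon) tuples of both linked entities."""
    rows = []
    for source, target in links:
        if source in cities and target in stops:
            rows.append(cities[source] + stops[target])
    return rows
```

Here cities and stops are dictionaries that map an entity URI to a (name, latitude, longitude) tuple, so each output row holds the name and coordinates of a city followed by those of its matched stop point.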
Use SparqlViz to view the results on the map
The results show a good matching between the entities
Thank you
Please check http://datalift.org/?page_id=45 for the latest use cases and tutorials