Triplestores and SPARQL
Notes and Exercises
Kevin Page & John Pybus
Tue 9th July 2013

1 Content to negotiate and how to SPARQL

1.1 More RDF and content negotiation with the Audio File Repository

1. Use the AskApache HTTP Headers Tool again with this URI:
   http://jamendo.legacy.audiofiles.linkedmusic.org/audiofile/98933
   but this time set the accept type to “audio/mpeg”. Give the 303 location to your web browser (or, if your browser doesn’t have a handler for audio, download the file and then play it).
2. Now use Morph to retrieve and view the RDF. Follow your nose to Jamendo. Make a note of the Jamendo recordings the audio files are encodings of (from the RDF).
3. Also try:
   ○ http://jamendo.legacy.audiofiles.linkedmusic.org/audiofile/71263
   ○ http://jamendo.legacy.audiofiles.linkedmusic.org/audiofile/71265

1.2 A simple SPARQL query to the Audio File Repository

1. We’ve put a web interface for the Audio File Repository SPARQL endpoint up at:
   http://jamendo.legacy.audiofiles.linkedmusic.org/snorql/
   a. Open it in your browser.
   b. Note that non-human clients can connect directly to the endpoint using the SPARQL protocol.
2. When the Collection Builder “grounds” a collection, it queries the Audio File Repository with a simple query to find any audio files which encode the abstract recordings listed in the (ungrounded) collection:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX mo: <http://purl.org/ontology/mo/>
    SELECT ?audiofile
    WHERE {
      <http://dbtune.org/jamendo/track/98933> mo:available_as ?audiofile .
      ?audiofile a mo:AudioFile .
    }

3. Substituting in the Jamendo recordings you noted above, try out the query on the Audio File Repository.

1.3 SPARQLing with Jamendo at dbtune

1. The Collection Builder creates its collections using the RDF and SPARQL endpoint available from dbtune:
   ○ http://dbtune.org/jamendo/
2. There is a similar SPARQL endpoint web interface for the Jamendo data:
   ○ http://dbtune.org/jamendo/store/user/query
   ○ Be sure to select “SPARQL”.
3. Enter the query on slide 9, “SPARQL Queries (3)”.
   ○ Do you receive the expected match?
4. Change the SELECT statement to return all variables by replacing “?artistname” with “*”, i.e.
   ○ SELECT * WHERE {
   ○ Resubmit the query.
5. Try changing the graph pattern. Instead of matching against the title of the album, find all artists who have made albums that have a 7th track. Some hints:
   ○ Extend the Record pattern to match tracks.
   ○ Match tracks that have a number 7.
   ○ Look at the RDF you discovered earlier to find the required concepts and properties (e.g. http://dbtune.org/jamendo/track/71263 ).
6. Verify your results: list and look at the albums that match within the graph pattern.
   ○ Hint: add “?album” after “?artistname” to the SELECT clause.
   ○ Are there fewer artists with 20 track records?
7. Do some artists have more than one album with 5 tracks? Does this cause them to be matched multiple times?
   ○ Try changing the SELECT clause to “SELECT DISTINCT ?artistname”.

2 SPARQL in detail

2.1 Why SPARQL?

SPARQL is the query language of the Semantic Web.
• SPARQL Protocol and RDF Query Language

It lets us:
• Pull values from structured and semi-structured data
• Explore data by querying unknown relationships
• Perform complex joins of disparate databases in a single, simple query
• Transform RDF data from one vocabulary to another

2.2 Structure of a SPARQL Query

A SPARQL query comprises, in order:
• Prefix declarations, for abbreviating URIs
• A result clause, identifying what information to return from the query
• Dataset definition, stating what RDF graph(s) are being queried
• The query pattern, specifying what to query for in the underlying dataset
• Query modifiers, slicing, ordering, and otherwise rearranging query results

    # prefix declarations
    PREFIX foo: <http://example.com/resources/>
    ...
    # result clause
    SELECT ...
    # dataset definition
    FROM ...
    # query pattern
    WHERE {
      ...
    }
    # query modifiers
    ORDER BY ...

2.3 Friend of a Friend (FOAF)

• FOAF (http://www.foaf-project.org/) is a standard RDF vocabulary for describing people and relationships.
• Tim Berners-Lee's FOAF information is available at http://www.w3.org/People/Berners-Lee/card
• For our first query, let's find all the names of people mentioned in Tim's FOAF file: find all subjects (?person) and objects (?name) linked with the foaf:name predicate, then return all the values of ?name.

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name
    WHERE {
      ?person foaf:name ?name .
    }

• SPARQL variables start with a ? and can match any node (resource or literal) in the RDF dataset.
• Triple patterns are just like triples, except that any of the parts of a triple can be replaced with a variable.
• The SELECT result clause returns a table of variables and values that satisfy the query.

2.3.1 Traversing the graph

Find me the homepage of anyone known by Tim Berners-Lee.

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX card: <http://www.w3.org/People/Berners-Lee/card#>
    SELECT *
    WHERE {
      card:i foaf:knows ?known .
      ?known foaf:homepage ?homepage .
    }

• We can use multiple triple patterns to retrieve multiple properties about a particular resource.
• Shortcut: SELECT * selects all variables mentioned in the query.
• By using ?known as the object of one triple pattern and the subject of another, we traverse multiple links in the graph.

2.3.2 Running a SPARQL query in Gruff

• Select View → Query View
• Copy the SPARQL into the query window
• Click Do Query
• The results will be displayed below
• (You can select items and add them to the graph view)

2.4 Exercise

Use the data you loaded into the Gruff triplestore in Monday's exercises.

1. Run the query from 2.3.1.
2. Write a query to find the names of buildings which are within the University Science Area, given that:
   • http://oxpoints.oucs.ox.ac.uk/id/59030245 is the node for the University Science Area
   • http://data.ordnancesurvey.co.uk/ontology/spatialrelations/within is a property describing one place being contained within another

2.5 DISTINCT/LIMIT/ORDER BY

Example: DBpedia
• DBpedia (http://dbpedia.org/) is an RDF version of information from Wikipedia.
• DBpedia contains data derived from Wikipedia's infoboxes, category hierarchy, article abstracts, and various external links.
• DBpedia contains over 100 million triples.

Find me 50 example concepts in the DBpedia dataset.

    SELECT DISTINCT ?concept
    WHERE {
      ?s a ?concept .
    }
    LIMIT 50

• LIMIT is a solution modifier that limits the number of rows returned from a query. SPARQL has two other solution modifiers:
  • ORDER BY, for sorting query solutions on the value of one or more variables
  • OFFSET, used in conjunction with LIMIT and ORDER BY to take a slice of a sorted solution set (e.g. for paging)
• The SPARQL keyword a is a shortcut for the common predicate rdf:type, giving the class of a resource.
• The DISTINCT modifier eliminates duplicate rows from the query results.

2.6 SPARQL filters

Find me all landlocked countries with a population greater than 15 million.

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX type: <http://dbpedia.org/class/yago/>
    PREFIX prop: <http://dbpedia.org/property/>
    SELECT ?country_name ?population
    WHERE {
      ?country a type:LandlockedCountries ;
               rdfs:label ?country_name ;
               prop:populationEstimate ?population .
      FILTER (?population > 15000000) .
    }
    ORDER BY DESC(?population)

• ORDER BY DESC(?population) sorts in descending order; for ascending order it would be ORDER BY ?population, or ORDER BY ASC(?population).
• FILTER constraints use boolean conditions to filter out unwanted query results.
• Shortcut: a semicolon (;) can be used to separate two triple patterns that share the same subject. (?country is the shared subject above.)
• rdfs:label is a common predicate for giving a human-friendly label to a resource.

2.6.1 SPARQL built-in filter functions

• Logical: !, &&, ||
• Math: +, -, *, /
• Comparison: =, !=, >, <, ...
• SPARQL tests: isURI, isBlank, isLiteral, bound
• SPARQL accessors: str, lang, datatype
• Other: sameTerm, langMatches, regex

2.7 OPTIONAL/UNION

Dataset: Jamendo
• Jamendo is a community collection of music, all freely licensed under Creative Commons licenses.
• DBTune.org hosts a queryable RDF version of information about Jamendo's music collection.
• It hosts data on thousands of artists, tens of thousands of albums, and nearly 100,000 tracks.

2.7.1 Finding artists' info - the wrong way

Find all Jamendo artists along with their image, home page, and the location they're near.

    PREFIX mo: <http://purl.org/ontology/mo/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?img ?hp ?loc
    WHERE {
      ?a a mo:MusicArtist ;
         foaf:name ?name ;
         foaf:img ?img ;
         foaf:homepage ?hp ;
         foaf:based_near ?loc .
    }

• Jamendo has information on about 3,500 artists.
• Trying the query, though, we only get 2,667 results. What's wrong?

2.7.2 Finding artists' info - the right way

Find all Jamendo artists along with their image, home page, and the location they're near, if any.

    PREFIX mo: <http://purl.org/ontology/mo/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?img ?hp ?loc
    WHERE {
      ?a a mo:MusicArtist ;
         foaf:name ?name .
      OPTIONAL { ?a foaf:img ?img }
      OPTIONAL { ?a foaf:homepage ?hp }
      OPTIONAL { ?a foaf:based_near ?loc }
    }

• Not every artist has an image, homepage, or location!
• OPTIONAL tries to match a graph pattern, but doesn't fail the whole query if the optional match fails.
• If an OPTIONAL pattern fails to match for a particular solution, any variables in that pattern remain unbound (no value) for that solution.

2.7.3 Querying alternatives

Part of the class hierarchy used by the CLAROS data is shown. What if we want to find all instances of Man-Made Things?
There are two alternatives: a Man-Made Object or a Man-Made Feature.

    PREFIX crm: <http://purl.org/NET/crm-owl#>
    SELECT DISTINCT ?thing
    WHERE {
      { ?thing a crm:E25_Man-Made_Feature . }
      UNION
      { ?thing a crm:E22_Man-Made_Object . }
    }

• The UNION keyword forms a disjunction of two graph patterns. Solutions to both sides of the UNION are included in the results.

2.8 SPARQL endpoints on the web

Many of the SPARQL endpoints to data on the web have forms which allow you to enter SPARQL directly into a webpage. Many of the datasets queried above have public endpoints:
• Oxpoints
  ◦ http://oxpoints.oucs.ox.ac.uk/sparql
• CLAROS
  ◦ http://data.clarosnet.org/sparql/
• DBpedia (more than one)
  ◦ http://dbpedia.org/sparql
  ◦ http://dbpedia.org/snorql
• DBTune/Jamendo
  ◦ http://dbtune.org/jamendo/store/user/query
  ◦ (you need to select SPARQL manually, as the form defaults to SeRQL)

Others are listed at:
• http://www.w3.org/wiki/SparqlEndpoints
  ◦ UK gov data: http://data.gov.uk/sparql

2.9 Exercise

1. Try some of the above queries at the relevant endpoint.
2. What query can you come up with against either the Gruff store, or one on the web?

3 Further SPARQL

3.1 DESCRIBE

• The DESCRIBE query result clause allows the server to return whatever RDF it wants that describes the given resource(s).
• Because the server is free to interpret DESCRIBE as it sees fit, DESCRIBE queries are not interoperable.

Example from data.ox: http://data.ox.ac.uk/sparql/

    PREFIX oxp: <http://ns.ox.ac.uk/namespace/oxpoints/2009/02/owl#>
    DESCRIBE ?x
    WHERE {
      ?x a oxp:Library .
    }
    LIMIT 10

3.2 CONSTRUCT – creating new triples

    PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    CONSTRUCT {
      ?X vCard:FN ?name .
      ?X vCard:URL ?url .
      ?X vCard:TITLE ?title .
    }
    FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf>
    WHERE {
      OPTIONAL { ?X foaf:name ?name . FILTER isLiteral(?name) . }
      OPTIONAL { ?X foaf:homepage ?url . FILTER isURI(?url) . }
      OPTIONAL { ?X foaf:title ?title . FILTER isLiteral(?title) . }
    }

• CONSTRUCT is an alternative SPARQL result clause to SELECT. Instead of returning a table of result values, CONSTRUCT returns an RDF graph.
• The result RDF graph is created by taking the results of the equivalent SELECT query and filling in the values of variables that occur in the CONSTRUCT template.
• Triples are not created in the result graph for template patterns that involve an unbound variable.

3.3 SPARQL 1.1 – more features

• SPARQL 1.0 became a standard in January 2008, and included:
  ◦ SPARQL 1.0 Query Language
  ◦ SPARQL 1.0 Protocol
  ◦ SPARQL Results XML Format
• SPARQL 1.1 became a standard in March 2013, and includes:
  ◦ SPARQL 1.1 Update - for inserting, deleting, modifying RDF data
  ◦ SPARQL 1.1 Graph Store HTTP Protocol
  ◦ SPARQL 1.1 Service Descriptions - describe capabilities of SPARQL endpoints
  ◦ SPARQL 1.1 Entailments - how to combine reasoning with SPARQL
  ◦ SPARQL 1.1 Basic Federated Query
  ◦ SPARQL Results CSV/TSV Formats

SPARQL 1.0 only dealt with querying data; updating the data was out of scope. SPARQL 1.1 Update adds a language for managing and updating RDF graphs:
• INSERT DATA { triples }
• DELETE DATA { triples }
• [ DELETE { template } ] [ INSERT { template } ] WHERE { pattern }
• LOAD uri [ INTO GRAPH uri ]
• CLEAR GRAPH uri
• CREATE GRAPH uri
• DROP GRAPH uri

The SPARQL 1.1 Graph Store HTTP Protocol defines how to use RESTful HTTP requests to affect an RDF graph store.

3.4 SPARQL 1.1 new query features

• Aggregate queries post-process query results by dividing the solutions into groups, and then performing summary calculations on those groups.
• As in SQL, the GROUP BY clause specifies the key variable(s) to use to partition the solutions into groups.
• SPARQL 1.1 defines these aggregate functions: COUNT, MIN, MAX, SUM, AVG, GROUP_CONCAT, SAMPLE
• SPARQL 1.1 also includes a HAVING clause to filter the results of the query after applying aggregates.
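As a minimal sketch of GROUP BY and HAVING working together, the class-listing pattern from section 2.5 can be extended to count the instances of each class and keep only the larger classes. The threshold of 1000 and the variable name ?instances are illustrative assumptions, not taken from the slides, and on a large endpoint such as DBpedia a query like this may be slow or hit a server-side limit.

```sparql
# Count how many resources belong to each class (?concept),
# then keep only the classes with more than 1000 instances.
SELECT ?concept (COUNT(?s) AS ?instances)
WHERE {
  ?s a ?concept .
}
GROUP BY ?concept                # one group per distinct class
HAVING (COUNT(?s) > 1000)        # illustrative threshold, applied per group
ORDER BY DESC(?instances)        # largest classes first
LIMIT 50
```

HAVING does for groups what FILTER does for individual solutions: it is evaluated after the aggregates have been computed, so it can test values such as COUNT(?s) that do not exist until the solutions have been grouped.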
Example: count the buildings belonging to each university division (data.ox).

    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX oxp: <http://ns.ox.ac.uk/namespace/oxpoints/2009/02/owl#>
    SELECT ?division (COUNT(?building) AS ?number_of_buildings)
    WHERE {
      ?division a oxp:Division .
      ?building a oxp:Building .
      ?building dcterms:isPartOf*/^oxp:occupies/dcterms:isPartOf* ?division .
    }
    GROUP BY ?division

3.5 Things to try:

1. Try the queries above at data.ox.
   • Can you show a name for the university divisions rather than a URI?
2. The query in section 2.6 listed landlocked countries.
   • Can you count those above and below 15000000 people instead?
   • What's the average population of world countries?

Part of the SPARQL demo comes from Cambridge Semantics' SPARQL by Example:
http://www.cambridgesemantics.com/2008/09/sparql-by-example/
Licensed under a Creative Commons Attribution-Share Alike 3.0 License