
Trove’s Application Programming Interface
Electronic Resources Australia
Annual Forum
Sydney
10 July 2012
Debbie Campbell
Director Collaborative Services
National Library of Australia
CC BY 3.0
http://creativecommons.org/licenses/by/3.0/au/
Surveying the possibilities
The API should provide an ability to download sets of records based on one or more of the following criteria:
in a particular format, such as images, maps, articles, people and organisations, or digitised newspaper articles
with a particular tag
from a particular contributor
with a particular type of content, for example Tasmanian content
created by a particular set of authors, such as researchers from Griffith University
all commentary collected by Trove
with an ability to receive updates according to a schedule varying from daily to quarterly, and then integrate the records into discovery services ranging from commercial platforms to in-house or open source solutions.
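These criteria can also be applied on the receiving side, to a set of records that has already been downloaded. The minimal Python sketch below shows one way to do that. It makes several assumptions. The `Record` fields (`format`, `tags`, `contributor`, `creators`, `modified`) are a hypothetical, simplified shape and not Trove's actual schema. `since` stands in for the date of the last scheduled update (daily to quarterly), so that each run passes on only records changed after it.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class Record:
    # Hypothetical, simplified record shape; NOT the Trove API's schema.
    id: str
    format: str
    tags: Set[str] = field(default_factory=set)
    contributor: str = ""
    creators: Set[str] = field(default_factory=set)
    modified: date = date.min


@dataclass
class Criteria:
    # Any criterion left as None is ignored, so one or more can be combined.
    format: Optional[str] = None
    tag: Optional[str] = None
    contributor: Optional[str] = None
    creators: Optional[Set[str]] = None  # match if any creator is in this set
    since: Optional[date] = None         # date of the last scheduled harvest


def matches(rec: Record, c: Criteria) -> bool:
    """True if the record satisfies every criterion that is set."""
    if c.format is not None and rec.format != c.format:
        return False
    if c.tag is not None and c.tag not in rec.tags:
        return False
    if c.contributor is not None and rec.contributor != c.contributor:
        return False
    if c.creators is not None and not (rec.creators & c.creators):
        return False
    if c.since is not None and rec.modified <= c.since:
        return False
    return True


def select(records: List[Record], c: Criteria) -> List[Record]:
    """Return the subset of records matching the criteria, in input order."""
    return [r for r in records if matches(r, c)]
```

For example, `select(records, Criteria(format="Map", since=date(2012, 7, 1)))` would return only maps modified after 1 July 2012. That is the incremental set a daily or weekly update would carry.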
Positioning Trove to
make Australian content more discoverable
allow Trove’s data to be used in ‘mash-ups’ and other services
make enhanced metadata (“collective intelligence”) available to
collaborators
promote the analysis of Australian collections in memory institutions
support academic research
Questions explored
Are there records or content the National Library does not have permission to redistribute?
The records in Trove are not all in the same format (data schema) – will this be a problem?
Will sending out works instead of individual item records be OK?
Where access conditions and copyright licences are made explicit, such as Creative Commons licences, how do we continue to emphasise this?
Will lots of simultaneous downloading require additional support from the Trove platform?
The content
Are there records or content the National Library does not have permission to redistribute?
MARC records from the National Bibliographic Database, especially those purchased specifically for the use of Libraries Australia members
book cover art
images of the Australian Women’s Weekly, and other newspaper content published after 1955
websites archived in PANDORA
The content
Records available
brief records from Libraries Australia: 17m+
OAIster, HathiTrust
newspapers – titles, articles...
people and organisation records, in their own API...
wiki.nla.gov.au/display/ARDCPIP/Party+Infrastructure+APIs
The data schema
The records in Trove are not all in the same format (data schema) – will this be a problem?
Records are provided to Trove as:
MARC/Resource Description & Access (RDA), from the National Bibliographic Database
Dublin Core, from university and cultural heritage repositories
EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons and Families), in XML, for the people and organisations zone of Trove
Records are available for lists too
In return, the API provides Qualified Dublin Core
both brief and longer records are provided, in JSON or XML
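Because every record comes back as Qualified Dublin Core, a consumer can handle all of them with one parser, whatever schema the contributor originally used. The Python sketch below reads the XML form with the standard library's ElementTree. It is a minimal illustration with some stated assumptions. The two namespace URIs are the standard DCMI ones for `dc:` and `dcterms:`. The enclosing `record` element and the exact layout of a Trove response are assumptions, not taken from the API documentation. The sketch therefore collects every `dc:`/`dcterms:` element at any depth rather than relying on a fixed structure.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Standard DCMI namespace URIs for simple and qualified Dublin Core.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
# Inverse map (URI -> prefix) for turning ElementTree's "{uri}local" tags
# back into readable "prefix:local" keys.
PREFIX_FOR_URI = {uri: prefix for prefix, uri in NS.items()}


def parse_qdc(xml_text: str) -> Dict[str, List[str]]:
    """Collect every dc:/dcterms: element into {"prefix:local": [values]}.

    Repeated elements (e.g. several dc:subject values) keep document order.
    Elements in other namespaces, and empty elements, are skipped.
    """
    root = ET.fromstring(xml_text)
    out: Dict[str, List[str]] = {}
    for el in root.iter():  # iter() walks the root and all descendants
        if not isinstance(el.tag, str) or not el.tag.startswith("{"):
            continue  # un-namespaced wrapper elements, comments, etc.
        uri, local = el.tag[1:].split("}", 1)
        text = (el.text or "").strip()
        if uri in PREFIX_FOR_URI and text:
            key = f"{PREFIX_FOR_URI[uri]}:{local}"
            out.setdefault(key, []).append(text)
    return out
```

Applied to a hypothetical record containing a `dc:title`, two `dc:subject` elements and a `dcterms:issued` date, this returns a dictionary with `dc:title`, `dc:subject` (both values, in order) and `dcterms:issued` keys. The JSON form would need a separate, equally small loader.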
The work
Will sending out works instead of individual item records be OK?
The access provisions
Where access conditions and copyright licences are made explicit, such as Creative Commons licences, how do we continue to emphasise them?
Default position:
Carry forward any conditions made explicit in the records, including the ‘best’ online status available
Exceptions are negotiated upfront, because the NLA does not monitor copyright, and data exceptions add load and reduce the efficiency of the service itself
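A downstream service re-exporting these records could follow the same default position. It would copy any explicit licence through unchanged and keep the most open online status among a work's items. The Python sketch below is hypothetical throughout. The `rights` and `online_status` keys and the three status labels are illustrative only and are not Trove's vocabulary. Treating "best" as "most open" is also an interpretation of the slide.

```python
from typing import Dict, Iterable

# Hypothetical status labels, ordered from most to least open.
# Illustrative only; not Trove's actual vocabulary.
STATUS_RANK = ["freely available", "available with restrictions", "not online"]


def best_status(statuses: Iterable[str]) -> str:
    """Return the most open status seen; unknown labels rank last."""

    def rank(status: str) -> int:
        if status in STATUS_RANK:
            return STATUS_RANK.index(status)
        return len(STATUS_RANK)

    seen = list(statuses)
    if not seen:
        return "not online"  # no item information: assume least open
    return min(seen, key=rank)


def export_record(record: Dict, item_statuses: Iterable[str]) -> Dict:
    """Copy a record for redistribution, carrying explicit conditions forward.

    The input record is not modified (shallow copy).
    """
    out = dict(record)
    rights = record.get("rights")
    out["rights"] = rights                     # licence carried forward verbatim
    out["rights_explicit"] = rights is not None
    out["online_status"] = best_status(item_statuses)
    return out
```

Records without an explicit licence are flagged rather than given a default licence. Any exception to the default position is negotiated upfront, not guessed at by the exporter.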
The downloading
Will lots of simultaneous downloading require additional support?
a limited number of queries: 100 per minute, 6,000 per hour
the limit was tested by half a dozen beta test sites
the quantity of responses will depend on the query
[Chart: Trove Work Count by Zone, in millions, Mar-10 to May-12]
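A harvesting client can stay within these limits by checking both windows before each request. The Python sketch below is a client-side sliding-window limiter. It reads "100/6K per minute/hour" as 100 queries per minute and 6,000 per hour, which is an interpretation of the slide. It is not the NLA's own enforcement mechanism, whose behaviour when a limit is exceeded is not described here. The clock and sleep functions are injectable so that the logic can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple


class SlidingWindowLimiter:
    """Enforce several (max_calls, window_seconds) limits at once."""

    def __init__(self, limits: Iterable[Tuple[int, float]],
                 clock: Callable[[], float] = time.monotonic):
        self.limits: List[Tuple[int, float]] = list(limits)
        self.clock = clock
        # One deque of call timestamps per limit, oldest first.
        self.calls: List[Deque[float]] = [deque() for _ in self.limits]

    def wait_time(self) -> float:
        """Seconds until another call is allowed (0.0 if allowed now)."""
        now = self.clock()
        wait = 0.0
        for (max_calls, window), q in zip(self.limits, self.calls):
            # Forget calls that have slid out of this window.
            while q and q[0] <= now - window:
                q.popleft()
            if len(q) >= max_calls:
                # Wait until the oldest call in this window expires.
                wait = max(wait, q[0] + window - now)
        return wait

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until every limit allows a call, then record the call."""
        while (w := self.wait_time()) > 0:
            sleep(w)
        now = self.clock()
        for q in self.calls:
            q.append(now)


# Limits as read from the slide: 100 per minute, 6,000 per hour.
trove_limiter = SlidingWindowLimiter([(100, 60), (6000, 3600)])
```

In a harvest loop, `trove_limiter.acquire()` would be called before each request. Note that 100 per minute sustained for an hour is exactly 6,000, so with these particular numbers the per-minute window is the one that usually forces a pause.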
Trove Terms of Use
Who is using the Trove API
68 intrepid individuals, some representing research institutions
recent sample declarations:
• integration with a research metadata repository...;
• saving results from Trove for thesis research and perhaps using them in a database for my research;
• testing for ANDS projects;
• ...I will be experimenting with using the api with our catalogue to improve our user experience. I am at the discovering "what is possible" stage, rather than having any fixed plans;
• Windows 8 App;
• Discoverability tool for non-English language content;
• Personal use for playing with data visualisation and mashups etc;
• Personal local history and genealogical research;
• To see how i can remix data and pull data from it to see if it has any uses for making my own apps or for the library I work for;
• Using it to develop a search [tool] for all Qld Govt Libraries;
• An internal transcription tool for the South Australian Genealogy and Heraldry Society Inc (a not for profit organisation)... interested in creating indexes of shipping entries.
‘... The possibilities are endless...’
Who is using the Trove API
Five days to harvest four million newspaper articles, and analyse them like this:
Tim Sherratt, discontents, http://discontents.com.au/
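A quick calculation shows what this harvest implies. Four million articles in five days is 800,000 articles a day, or roughly 9.3 articles every second. The slides do not say whether the 6,000-queries-per-hour limit applied to this harvest, so the second half of the sketch is an assumption. If that limit did apply, five days allow at most 720,000 queries. Each query would then have had to return at least about 5.6 articles on average, which means results must have been fetched several at a time rather than one request per article.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def harvest_rates(records: int, days: float, queries_per_hour: int):
    """Back-of-envelope rates for a bulk harvest.

    Returns (records per day, records per second,
             maximum queries allowed in the period,
             minimum average records each query must return).
    """
    per_day = records / days
    per_second = per_day / SECONDS_PER_DAY
    max_queries = queries_per_hour * 24 * days
    min_records_per_query = records / max_queries
    return per_day, per_second, max_queries, min_records_per_query


# Figures from the slide, plus the ASSUMED 6,000 queries/hour limit.
per_day, per_sec, max_q, min_per_q = harvest_rates(4_000_000, 5, 6000)
print(f"{per_day:,.0f}/day, {per_sec:.2f}/s, "
      f"{max_q:,.0f} queries max, {min_per_q:.2f} records/query min")
```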
Tim Sherratt: Mining for meanings
References
Introduction to the Trove API
http://trove.nla.gov.au/general/api
Trove API Terms of Use
http://trove.nla.gov.au/general/api-termsofuse
Trove Contact Us – for further assistance
http://trove.nla.gov.au/contactus