JISC DEVELOPMENT PROGRAMMES
Project Document Cover Sheet
FINAL REPORT
Project
Project Acronym: GRADE
Project ID:
Project Title: Scoping a Geospatial Repository for Academic Deposit and Extraction
Start Date: 1 June 2005
Lead Institution: EDINA
Project Director: David Medyckyj-Scott, EDINA
Project Manager & contact details: Anne Robertson, EDINA, The University of Edinburgh, Causewayside House, 160 Causewayside, Edinburgh EH9 1PR; tel: 0131 651 3874; email: [email protected]
Partner Institutions: University of Edinburgh, University of Southampton
Project Web URL: www.edina.ac.uk/projects/grade
Programme Name (and number): Digital Repositories Programme 2005-07
Programme Manager: Neil Jacobs
End Date: 30 April 2007

Document
Document Title: Final Report
Reporting Period: End of project
Author(s) & project role: Anne Robertson, Project Manager; James Reid, Project Advisor; David Medyckyj-Scott, Project Director
Date: November 2007
URL: http://edina.ac.uk/projects/grade/GRADEfinalreport.pdf
Filename: GRADEfinalreport.pdf
Access: Project and JISC internal / General dissemination

Document History
Version | Date | Comments
0.1 | 30/03/07 | Initial draft for Programme Manager
1.0 | 31/10/07 | Final draft for review
1.1 | 30/11/07 | Final version for submission to Programme Manager
1.2 | 31/01/08 | Final version with edits requested by Programme Manager
Table of Contents
Acknowledgements
Executive Summary
Background
Aims and Objectives
Methodology
Implementation
Outputs and Results
Outcomes
Conclusions
Implications
Recommendations
References
Appendices
Acknowledgements
The project was funded by JISC under its Digital Repositories Programme. The project wishes to
acknowledge the support of the staff of the EDINA National Data Centre as well as the valuable
contributions from the following consortium and associate partners:
Dr Charlotte Waelde, AHRC Research Centre for Studies in Intellectual Property and Technology Law
Mags McGingley, AHRC Research Centre for Studies in Intellectual Property and Technology Law
Jonathon Gu, AHRC Research Centre for Studies in Intellectual Property and Technology Law
Pauline Simpson, University of Southampton
Dr Mark Brown, University of Southampton
Dr Mike Smith, Kingston University
Dr Robin Smith, University of Sheffield
Dr Anthony Beck, Leeds University
Andrea Frank, Cardiff University
Bob Abrahart, University of Nottingham
James Batcheller, University of Edinburgh
Nick Groome, Ordnance Survey
Elsa Joao, Strathclyde University
Paul Adderley, Strathclyde University
Bryan Lawrence, British Atmospheric Data Centre
Owen MacDonald, University of Edinburgh
Steven Morris, North Carolina State University
Femke Reitsma, University of Edinburgh
Mike Sanders, Plymouth University
Graham Vowles, Ordnance Survey
Dr Pragya Agarwal, University College London
Alex Mordue
Executive Summary
The aim of GRADE has been to assist in developing policy and best-practice strategies for geospatial
data sharing and reuse by providing demonstrable evidence of how, why and under what
circumstances geospatial data are (and may be) managed via repositories. The broad areas for
investigation included:
- Digital Rights and IPR issues related to use and reuse of primary and derived data;
- the technical and cultural milieu in which a geospatial repository might operate in order to better
understand the costs and benefits of various approaches; and
- the technical and cultural issues around informal data sharing mechanisms.
A holistic approach was taken to understanding the issues around storing and accessing geospatial
data within a media-centric digital repository, with work divided into distinct areas. It was hoped that
this approach would afford an understanding of the breadth of issues that would arise in a repository
environment and allow investigation of approaches acceptable to the community prior to any future
creation and deployment of an infrastructure for geospatial data sharing.
Key findings of the project are:
• Specific user requirements exist for the effective management of geospatial data within
repositories.
• Clear evidence exists of grass roots support for a national geospatial data repository.
• Geospatial data sharing is commonplace. The most commonly used methods are email
attachment, CD/DVD, ftp.
• There appears to be a lack of departmental geospatial data management policies.
• More contemporary informal file sharing practices such as those based upon peer-to-peer
networks are not in common use.
• There is a perception that current licences for geospatial data are overly complex.
• There is a great deal of uncertainty among those creating geospatial data as part of research as to
what they can legally do with derived data products.
• A legal argument can be made that the sui generis database right is the appropriate law for
geospatial data and that derived data products can therefore legally be made available for reuse
throughout the academic community.
• UK institutional repositories do not currently manage geospatial data and are not yet set up to
do so; this is compounded by the fact that they have not yet been offered any geospatial data
to manage. Some are willing to accept this media type.
• The mature consensus-based approach to standards for geospatial data, particularly geospatial
metadata, ensures geospatial data repositories are well placed to interoperate with
institutional repositories within the JISC Information Environment and beyond. However,
more work needs to be done on standards for geospatial data preservation and the packaging
of geospatial data.
The GRADE project has been successful in demonstrating to members of the UK academic
geospatial community the value of a trusted repository for geospatial data sharing and reuse. The
project has highlighted that such a data repository is an essential component of the UK academic
spatial data infrastructure and its absence is unsatisfactory. The project has successfully
demonstrated that the law on copyright and licensing for geospatial data in the UK is neither as well
understood nor as clear-cut as it might be, and that there is scope for clarifying fundamental end user
rights. The project has also put forward a legal argument that copyright does not subsist in geospatial
data; rather, the European Database Directive is the appropriate law.
Background
Over recent years there has been a growing interest in improving the discovery and sharing of data
amongst researchers and end users. Data sharing is important for two reasons. Firstly, data sharing
has historically been considered a hallmark of good scientific practice. Openness in the scientific
process allows for the confirmation of research findings, especially through the replication of results.
Data sharing also makes it possible for scientists to build on the work of others. It is with this in mind
that scientific funding agencies often require grant recipients to share the data produced in their
studies. Secondly, new “big science” projects involve data that are collected and analysed by multiple
people, institutions, and research sites. Sharing data in these cases becomes more than just the
exchange of finalised data sets. E-Science initiatives in the United States and the European Union are
looking at ways to allow scientists to collaborate on the creation and use of very large research data
sets. Despite this importance, however, sharing data is not easy. Many researchers have discussed
the problems underlying this seemingly simple process (e.g. Louis et al. 2002). Problems include the
willingness to share, locating data, and mechanisms for sharing and accessing data.
One special subset of data of particular concern is digital geospatial data[1]. Geospatial data are
typically expensive to create (born digitally or not) and are thus valuable assets which can be best
exploited if the infrastructure for their discovery, sharing and reuse is put in place. However, the way
geospatial data are delivered to, and used by, end users is contingent upon a number of factors.
These include how and why the data were created or acquired; the agreements in place to co-operate,
share or exchange data between different institutions; conditions and procedures required to meet
legal and economic requirements; how and where the data are stored; and software and hardware
requirements. Geospatial data may be characterised by inherent complexity and profound variability
(format, size, temporality, quality, value, access restrictions), which from a user's perspective makes
data sharing logistically complicated and burdensome. EDINA’s endeavours in this area have focused
largely on redistributing (licensing permitting) data files produced by individuals or small teams and
have proceeded very much in an ad hoc and reactive fashion. The reasons for this are twofold:
i. there has to date been no rigorous appraisal of the viable alternatives for geospatial data sharing
within the JISC Information Environment (IE) and there is no default mechanism by which
such data can easily be deposited, discovered and shared;
ii. Intellectual Property Rights (IPR) and copyright issues are a serious impediment to acquiring,
preserving and sharing geospatial data. That is, there are currently concerns and confusion
(either phantom or legitimate) over the assertion of IPR and copyright, particularly where the data
includes third party data e.g. Ordnance Survey or National Census Offices.
At the time of project proposal, April 2005, the UK was somewhat lagging behind developments
emerging elsewhere such as those in the US, e.g. the Cornell University Geospatial Information
Repository (CUGIR) (Westbrooks 2003) or the National Digital Information Infrastructure and
Preservation Program (NDIIPP) which was specifically targeting the ‘Collection and Preservation of
At-Risk Digital Geospatial Data’ (Morris 2005). However, as well as the general need to improve the
sharing of geospatial data between researchers in the UK, various EU directives and regulations (e.g.
the Water Framework Directive, Global Monitoring for Environment and Security (GMES)) require that
the issues of exchange, sharing, access and use of spatial data be addressed. For example, INSPIRE
(http://www.ec-gis.org/inspire/) requires Member States to adopt measures for the sharing and reuse
of spatial data sets between Public Authorities including Universities. Given these external stimuli, the
scoping of a Geospatial Repository for Academic Deposit and Extraction (GRADE) seemed both
necessary and timely.
[1] ‘Geospatial’ is a term used to describe a class of data that has a geographic or spatial nature e.g. paper maps, electronic
maps, geo-referenced imagery, satellite data, data stored within a Geographic Information System (GIS).
Aims and Objectives
The aim of GRADE has been to assist in developing policy and best-practice strategies for geospatial
data sharing and reuse by providing demonstrable evidence of how, why and under what
circumstances geospatial data are (and may be) managed via repositories. The broad areas for
investigation included:
- Digital Rights and IPR issues related to use and reuse of primary and derived data;
- the technical and cultural milieu in which a geospatial repository might operate in order to better
understand the costs and benefits of various approaches; and
- the technical and cultural issues around informal data sharing mechanisms.
Additionally, the project has endeavoured to:
- establish and evaluate prototype demonstrator repositories based on formal models;
- establish the mutuality between media-centric (geospatial) repositories and Institutional
repositories; and
- establish how metadata standards deployed in various repositories may meet the requirements of
the geospatial community (that is, the HFE/GoGeo profile of ISO 19115) and what the minimum
quality thresholds for useable spatial metadata (use case driven) may be.
Specific objectives agreed at the start of the project included:
• Establish detailed repository use cases and user based evidence for the requirements and
functionality of a repository capable of managing licensed geospatial assets.
• Investigate and identify the technical and cultural issues surrounding the storage,
management and accessibility of geospatial information derived from licensed data within a
centralised digital repository.
• Synthesise the lessons into best practice and advice for the wider community particularly
those concerned with the establishment and operation of research data or media-centric
repositories.
• Investigate the extent of current informal data publication and sharing and the ‘grey economy’
of geospatial information sharing.
• Investigate the relationships and potential interfaces between informal and formal (including
Institutional) repositories.
• Pilot an informal geospatial data repository demonstrator with Associate Partners to inform
understanding of the cultural and technical issues involved.
• Articulate intended use cases of sharing derived geospatial data.
• Develop a clear understanding of digital rights pertaining to data created entirely by a user or
research team.
• Develop a clear understanding of digital rights issues for derived data respecting the licensing
conditions of the source data.
• Develop a conceptual and technical framework for resolving those described rights
management issues raised in relationship to repositories.
• Determine to what extent Institutional repositories currently manage geospatial assets.
• Determine to what extent Institutional repositories could or should manage geospatial assets.
• Investigate the arguments for and against Institutional versus media-centric repositories for
geospatial data assets.
• Investigate and assess the role of existing JISC sponsored terminology services with respect
to repositories.
• Investigate and assess interoperability between geospatial data repositories and other types
of services/repositories e.g. metadata harvesting into geo-spatial portals, linking e-prints with
the datasets referred to in the articles.
• Investigate the linking of geospatial repositories with e-Science infrastructures.
• Investigate and assess the potential of evolving industry driven geospatial interoperability
standards, specifically Open Geospatial Consortium/ISO 19100 series standards, as a means
of interoperating with repositories within and outside of academia.
At project outset, an objective had been to scope possible technical architectures for informal data
sharing and investigate how security issues may be addressed, synthesising findings into a preferred
policy statement on the use of Informal Repositories. This objective had to be curtailed somewhat[2],
with an alternative approach of a workshop-based investigation into various peer-to-peer based file
sharing products.
Methodology
The GRADE project consortium, led by EDINA with the Arts and Humanities Research Council
Research Centre for Studies into Intellectual Property and Technology Law and the National
Oceanography Centre, University of Southampton, demonstrated strengths in the key areas of
understanding geospatial data, digital rights and data centres. To complement this strong consortium,
project associate partners included select UK geospatial academics and representatives from relevant
Higher Education Academy Subject Centres[3]. In addition to the expert knowledge brought to the
project by consortium members and associate partners, small pieces of work were commissioned
from external specialists where necessary.
Overall Approach
A holistic approach to understanding the issues around storing and accessing geospatial data within
a media-centric digital repository[4] was adopted, with project work divided into the following work packages:

Work Package | Responsibility
1. Investigation into formal repositories for sharing | EDINA with input from associate partners
2. Scoping the role of informal repositories | EDINA with input from associate partners
3. Digital Rights Issues | Arts and Humanities Research Council Research Centre for Studies into Intellectual Property and Technology Law
4. Scoping the role of institutional repositories for geospatial data | National Oceanography Centre, University of Southampton
5. Interoperability | EDINA
Detailed Methodology
Within work package 1, the intention was not to establish a geospatial repository per se, but rather to
use the development of a demonstrator repository to explore the range of technical and cultural issues
involved and to gather evidence via user feedback. An advantage of this approach was that it acted
as an intelligence gathering activity whilst offering an immediate advantage to the UK Higher and
Further Education (HFE) geospatial community[5] by helping to identify the most salient operational
aspects of a media-centric repository as opposed to a ‘conventional’ general purpose repository. It
should also be noted that the repository open access movement was something of an unknown
quantity to most users and an element of user education was (of necessity) required.
In order to evaluate the scope of informal repositories for data sharing, work package 2 used a
questionnaire to gather initial evidence of existing trends in informal data sharing. A file sharing
experiment and subsequent workshop then provided the opportunity to make an assessment of
peer-to-peer facilities for data sharing.
A compendium of use cases provided exemplars of geospatial data derived in the course of research
projects. This compendium formed reference material within work package 3 from which the legal
team formed an impression of the digital rights issues faced by those creating geospatial data within
UK HFE. Site visits to the compendium author and to the organisation responsible for base mapping
of the UK supplemented the background material within the compendium. A review of existing
academic licences for access to geospatial content also contributed to the informed views of those
responsible for development of the licensing strategy within work package 3.
[2] The original intention had been to leverage work within the SPIRE project looking at LionShare as an informal repository.
[3] http://edina.ac.uk/projects/grade/team.html
[4] This term is used to denote a managed resource collection focusing on a particular content type – in this instance geospatial
resources.
[5] The term ‘geospatial community’ is used in the broadest sense to describe those working with geospatial data. In UK HFE
that includes teaching academics, researchers and students.
In work package 4, a survey distributed to institutional repository managers was the principal method
for scoping the role of institutional repositories for geospatial data.
Work package 5 considered interoperability issues and took the form of a technology watch
throughout the project's duration.
Implementation
Work Package 1
The focus of Work Package 1 was on the storage and management of, and access to, data derived from
licensed data within repositories. The original proposal[6] submitted to JISC outlined the concept of
building a demonstrator repository as a means of engaging potential users. Prior to building the
demonstrator repository (the demonstrator), a literature review[7] was undertaken to identify and
possibly leverage work already taking place in this area. The review confirmed that in terms of
geospatial data and repositories, very little activity was taking place globally. However, one of the
project partners had been investigating ingest procedures for their geospatial data archiving project[8]
based upon the DSpace[9] open source repository. It seemed appropriate to leverage any experience
this group had had with storing geospatial data within that particular repository software and so
DSpace was chosen as the basis for the first version of the demonstrator.
In terms of who could access the demonstrator, the initial concept described in the project proposal
was to repurpose geospatial data derived from licensed content already available via the Digimap[10]
and UKBORDERS[11] services. With access to the information within these services restricted by
subscription and/or registration, the vision was to have a controlled environment for engaging with an
array of users/creators of geospatial data. However, it was realised fairly early on in the development
of the demonstrator that the complexity of UKBORDERS licensing[12] made this too hard to implement
and that access would be granted to Digimap registered users only. To ensure legitimate data
sharing via the demonstrator, only registered Digimap users were allowed to register as
depositing/retrieving users of the demonstrator[13]. A manual check was made of every potential
registrant against the Digimap registration database.
The methodology chosen for development of the repository was one of short, iterative development
cycles based upon user feedback[14].
Demonstrator Version 1
The first release of the demonstrator offered out-of-the-box DSpace functionality with minor
modifications[15]. The demonstrator was populated with a selection of seed datasets[16] to enable invited
participants to interact with the demonstrator. Associate partners were invited to interact with the
demonstrator (search, retrieve, upload) and were contacted via email/telephone for their feedback.
Demonstrator Version 2
Feedback from the first version of the demonstrator led to a series of mostly cosmetic enhancements
being made to the demonstrator, including the front page layout and the addition of images of
industry-recognisable logos to assist repository browsing[17]. The second version of the demonstrator
was accompanied by an online questionnaire (Appendix A) which respondents were asked to
complete having interacted with the demonstrator. With this second version of the demonstrator, the
aim was to reach a wider audience. Associate partners were asked to publicise the demonstrator
throughout their departments/institutions. Associate partners were also provided with an expanded
template (Appendix B) to not only provide feedback on the demonstrator but also to provide an
overview of their current departmental data management practices.
[6] http://edina.ac.uk/projects/grade/GRADE_Proposal.doc
[7] http://edina.ac.uk/projects/grade/RepositoryReviewFinal.doc
[8] North Carolina Geospatial Data Archiving Project http://www.lib.ncsu.edu/ncgdap/
[9] www.dspace.org
[10] Digimap is a collection of EDINA services that deliver maps and map data of Great Britain to UK tertiary
education. Data is available to download for use in desktop GIS or as maps for printing, inclusion in reports etc.
[11] UKBORDERS, funded by ESRC, provides digitised boundary datasets of the UK in a variety of GIS formats for the UK HEFE
community to download and use.
[12] UKBORDERS users are licensed to have access to particular data series only, not the entire collection of UKBORDERS data.
[13] The repository was open to all to browse and review item metadata.
[14] A full description is found at http://edina.ac.uk/projects/grade/status1.html
[15] Athens Authentication added at the point of data deposit/download.
[16] Mostly sample environmental data.
Demonstrator Version 3
Feedback gathered from the second version of the demonstrator led to the final set of customisations
being made to the demonstrator, and it was on this customisation that the bulk of software engineering
effort was spent (a summary of this work can be found at Appendix C). The key feature users wanted
from a repository capable of managing geospatial data was support for location-based searching. To
meet this requirement, software engineering effort could either have focused on enhancing the
existing UK academic geospatial metadata discovery portal[18] with data deposit/download modules or
have continued customising DSpace. It was felt that work with DSpace would probably be of most
interest to the wider repository community. In terms of offering map-based searching within the
demonstrator, the decision was taken to use an open source mapping engine[19] as opposed to an
EDINA-based mapping service, again for wider programme relevance. Open-source geospatial data
translator libraries were used to (i) verify GIS data format and (ii) check file size during the customised
deposit process[20]. In its final form the demonstrator offered a repository supporting location-based
searching, automatic generation of geographic extent during data deposit, file validation and 'most
recently deposited' alerts on the front page.
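As an illustration only, the deposit-time checks and location-based search described above might be sketched as follows. This is not the demonstrator's code: the report does not name the translator libraries, the permitted formats or the size limit, so the extension list, the size cap and all function names below are assumptions, and a simple file-extension test stands in for genuine format verification.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical limits: the report does not state the permitted formats or size cap.
ALLOWED_EXTENSIONS = {".shp", ".gml", ".tif"}
MAX_FILE_BYTES = 50 * 1024 * 1024

# A geographic extent as (min_x, min_y, max_x, max_y).
BBox = Tuple[float, float, float, float]


@dataclass
class Item:
    title: str
    filename: str
    extent: BBox


def validate_file(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file is accepted."""
    problems = []
    dot = filename.rfind(".")
    extension = filename[dot:].lower() if dot != -1 else ""
    if extension not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds size limit")
    return problems


def extent_of(coords: Iterable[Tuple[float, float]]) -> BBox:
    """Derive the bounding box of a dataset from its coordinates."""
    points = list(coords)
    if not points:
        raise ValueError("dataset has no coordinates")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a: BBox, b: BBox) -> bool:
    """True when two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def deposit(repo: List[Item], title: str, filename: str, size_bytes: int,
            coords: Iterable[Tuple[float, float]]) -> Optional[Item]:
    """Validate the file, generate its extent automatically and store it."""
    if validate_file(filename, size_bytes):
        return None  # rejected at deposit time
    item = Item(title, filename, extent_of(coords))
    repo.append(item)
    return item


def search_by_location(repo: List[Item], query: BBox) -> List[Item]:
    """Return items whose extent overlaps the search box drawn by the user."""
    return [item for item in repo if intersects(item.extent, query)]
```

Storing a bounding box per item keeps the search to a cheap rectangle-overlap test, at the cost of matching datasets whose true footprint only partly fills their box.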
A new questionnaire (Appendix D) to accompany this third iteration of the demonstrator was made
available via the project web site. The response rate to this questionnaire was poor.
Work Package 2
Work package 2, assessing the use of informal repositories for geospatial data sharing, commenced
with the distribution of an anonymous questionnaire aimed at gaining insight into current data sharing
practices.
The GISRUK[21] conference series is the UK’s national GIS research conference, established in 1993.
In 2006 it was held in Nottingham, from 5th to 7th April. It seemed sensible to leverage the opportunity
of a large national gathering of GI practitioners to carry out such an investigation. At this event, the
GRADE project officer used breakout times to circulate amongst conference delegates to ask if they
would be prepared to answer a few questions related to data sharing. Copies of the questionnaire
(Appendix E) were also left in the conference computing lab for delegates to complete. The
anonymous questionnaire was also made available on the GRADE project web site. 101 responses
were received.
As well as providing an assessment of current informal data sharing practices, it was hoped that the
results of the survey would provide direction on the most appropriate way forward for the development
of an informal repository demonstrator. However, the survey showed that more progressive file sharing
practices are not commonplace amongst those working with geospatial data.
The GRADE team liaised with members of the JISC-funded SPIRE project[22] on the possibility of
leveraging their work with LionShare software. However, this partnership did not progress due to
difficulties encountered in setting up the LionShare software itself.
Other alternatives, including instant messaging and BitTorrent technologies, were considered as the
technical basis for an informal demonstrator. GeoChat is an extension to one of the most
commonplace GIS software packages[23]. Geo-chatting is described as offering the ability to exchange
geometries and georeferenced imagery with online contacts via P2P technology. However, difficulties
were encountered in setting up the GeoChat extension. The GRADE team then
considered some form of demonstrator leveraging geoTorrent.org[24]. GeoTorrent.org was set up in the
second half of 2005 as an initiative to facilitate the distribution of large geospatial datasets[25] with the
goal of providing ‘fast peer-to-peer sharing of geospatial data’. Initial contact was made with the
organisation responsible for running geoTorrent.org, but with their base in Australia, it seemed the
logistics required to facilitate the work were beyond the scope of the informal demonstrator.
Finally, as an alternative to an informal demonstrator per se, an expert in participatory GIS[26] was
commissioned to carry out a 2-day data-sharing workshop with a group of twelve participants. On
the first day participants used two peer-to-peer file sharing applications to attempt to share geospatial
data remotely from their workplace. On day two, participants came together to discuss their
experiences and make an assessment of the suitability of informal methods for geospatial data
sharing compared to the formal demonstrator repository.
[17] Images depicted software vendors' logos to help users quickly distinguish certain data format types.
[18] www.gogeo.ac.uk
[19] Google Maps
[20] For demonstrator purposes, users were restricted to depositing only certain geospatial data types and files of a maximum
data size.
[21] http://www.geo.ed.ac.uk/gisruk/gisruk.html
[22] http://spire.conted.ox.ac.uk/cgi-bin/trac.cgi
Work Package 3
The initial step within Work Package 3 (Digital Rights Issues) was to create a compendium of derived
data exemplars for the legal team. A lecturer in GeoSciences at Kingston University with an interest
in the legal intricacies of derived data was commissioned to gather this content. Each use case within
the compendium was presented in a standard way via a use case template (Appendix F). The use
case template was based upon one provided by the digital rights working group of the leading
international geo-standards organisation, the Open Geospatial Consortium (OGC). Part of the
sustainability strategy for the project was to ensure knowledge exchange with the OGC geoDRM
working group. Therefore it seemed appropriate to use their template to describe derived data use
cases.
On completion, the use case compendium was passed on to the legal team. The legal team
sought clarification on certain data processing techniques described in the compendium and
undertook a site visit to the compendium author. During this information-gathering phase, the legal
team also invested time reviewing current HEFCE licences[27] for geospatial data access, for example,
via EDINA’s Digimap[28] service. In December 2005, Dr Waelde visited Ordnance Survey to gain
insight into that particular organisation’s internal processes for data creation.
Having amassed sufficient background material, the legal team commenced their study into
developing a legal framework. In June 2006, initial findings were shared with the OGC geoDRM
working group for their discussion and feedback. In October 2006, final findings were shared with
GRADE project partners at the GRADE all-partner mid-project meeting in Edinburgh
(http://edina.ac.uk/projects/grade/meetings/301006.html) for discussion and feedback. The report
was finalised with project partners’ comments.
Work Package 4
Project work aimed at scoping the role of institutional repositories for geospatial data commenced with
the design of questions for a web-based survey. The questionnaire (Appendix G) was posted on the
GRADE project web site during early 2006. Project associate partners, on behalf of their institutions,
were invited to complete the survey initially.
Members of the wider repository network were also invited to complete the questionnaire. This was
achieved by email canvassing to individual repository managers (details sourced using the Registry of
Open Access Repositories [29]), as well as making several calls on [email protected] and
another to the SHERPA project list [30]. In October 2006, survey findings were presented to GRADE
project partners at the GRADE all-partner mid-project meeting in Edinburgh [31]. Project partners were
also asked to contribute to the SWOT analysis of institutional v. media-centric repositories at this
meeting. Work package 4 was completed in January 2007.
[23] ESRI’s (www.esri.com) ArcGIS is arguably the most popular desktop GIS used within academia.
[24] www.geotorrent.org
[25] Geospatial datasets, particularly image datasets, can be very large. A single compressed image can be many gigabytes in
size.
[26] community-based management of spatial information
[27] http://edina.ac.uk/digimap/terms.shtml
[28] Digimap is an EDINA service that delivers Ordnance Survey map data to UKHFE.
[29] http://roar.eprints.org/
Work Package 5
Work investigating how media-centric repositories might interoperate with external repositories and
how interaction with the JISC IE may best be achieved took the form of a technology watch for the
duration of the project. The work was initially commissioned from an external resource at University
College London. However, owing to extenuating circumstances, it was completed by EDINA
staff.
Outputs and Results
Work Package 1 – Formal media repositories for sharing
The aim of work package 1 was to generate user-based evidence of the requirements for a
repository capable of managing licensed geospatial data via the development of a demonstrator
repository.
Prior to considering the specific functional requirements of such a repository, it is worth reporting on
the status of current departmental guidelines/policies on data sharing reported by associate partners.
During the information-gathering phase of the second version of the demonstrator, associate partners
were asked to provide an insight into their current departmental data management policies (Appendix
H brings together each individual’s feedback). Similar issues were raised by each of the associate
partners and can be summarised as:
• a lack of policies/guidelines for archiving and ongoing access to research data;
• poor metadata practices and a lack of policy on metadata creation;
• loss of knowledge/data, particularly work completed by postgraduates;
• data commonly stored on researchers’ personal computers;
• data curation is seen as responsibility of the individual/group who carried out research;
• for an interested person to access data, they need to contact the data creator directly;
• sharing data is based around people networks;
• one of the key barriers to sharing relates to concerns over breaking licence conditions;
• lack of tie-in to institutional practices e.g. archives, asset management.
It is worth noting that these practices are not unique to those working with geospatial data. Indeed,
many of the issues raised are highlighted within the recently published JISC-funded ‘Dealing with
Data’ (Lyon 2007) report [32], which has as one of its recommendations the need to “develop a Data
Audit Framework to enable all Universities and colleges to carry out an audit of departmental data
collections, awareness, policies and practice”.
Given this lack of formal infrastructure for data sharing, it is perhaps not
surprising that an outcome from this work package has been the level of use generated by the
demonstrator repository [33]. At the end of November 2007, the demonstrator had over 160 data
deposits and over 170 registered users. An equivalent number of potential users have been refused
access to deposit or download data from the repository because they are not registered Digimap
users. These ‘rejected’ users are from a variety of sectors including non-Digimap registered UK
institutions, international academic institutions, UK government, UK private sector and individuals.
With EDINA still receiving weekly registration requests from interested users, options for sustaining
the repository are being considered.
In the meantime, however, the key aim of work package 1 was to identify those functions necessary
for a repository capable of managing geospatial data. The three cycles of development culminated in
a list of key functions (both geo-specific and more general) being identified by associate partners and
their invited reviewers [34].
[30] www.sherpa.ac.uk
[31] http://edina.ac.uk/projects/grade/ppt/Simpson_GRADEProject%20Meeting30Oct06.ppt
[32] http://www.jisc.ac.uk/whatwedo/programmes/programme_digital_repositories/project_dealing_with_data.aspx
[33] http://gradedemo.edina.ac.uk/dspace/index.jsp
Repository function: Search
Geo-specific user requirements:
• Location-based searching - by drawing a box on a map, by entering a place name, by clicking an
identifiable area on a map e.g. county boundary, by entering postcode.
• Location-based searching to be prominent on repository ‘home page’
• Location-based searching needs to offer high quality large scale data for locating oneself (maps
available within Google Maps, for example, are not sufficient)
• Geographic breakdown of collections within the repository e.g. UK, Europe, World collections and
thus refined searching within geographic collection (linked to this is the option to subscribe to RSS
feeds on these more fine-grained collections)
• Searching via geospatial data type
• While Dublin Core metadata is sufficient in the first instance for a user to discover a relevant data
set there needs to be a few supplementary geo-specific ‘quick visuals’ including industry-recognisable
logos for GIS data format and a small pictorial thumbnail indicating actual geographic extent of the
dataset.
General user requirements:
• Search by institution or research groups
• Improved date searching – especially date range
• Quality rating system
• Subject searching assisted by controlled keywords
Repository function: Deposit
Geo-specific user requirements:
• Since storing the geographic extent of a dataset is necessary to enable location-based searching, the
repository needs to be able to automatically extract the geographic extent of the dataset directly from
the dataset as part of the upload process.
• Ability to deposit bundled ‘project’ data – that is ‘project data’ that comprises several datasets as one
deposit item.
• A standards-compliant geospatial metadata record (ISO 19115 profile [35]) should be part of the
deposit item. Metadata of this quality is required to make a judgement on fitness for purpose and for
confident data reuse.
General user requirements:
• Fast and simple submit process
• Automatic ingest of multiple files
• Ability to deal with data versioning
• Guidance on licensing
Repository function: Download
Geo-specific user requirements:
• On download, data needs to be in a form ready for immediate use in GIS software and other
geospatial application software
General user requirements:
• Quick download
• Ability to inform user of uncompressed data set size prior to download
Repository function: Policy
Geo-specific user requirements:
• No restrictions on GIS data format
• No restrictions on geographic extent of data
• Rules for naming data deposits (item title to include place name, country, scale)
General user requirements:
• Single authentication system
• Rules for dates
• Platform independent [36]
• Sustainable and trusted [37] for long term and guaranteed access
Table 1: Geo-specific and general user requirements for a repository capable of managing
geospatial data.
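The deposit and search requirements in Table 1 (automatic extraction of geographic extent on upload, and searching by drawing a box on a map) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the deposited dataset is GeoJSON text; the report itself places no restriction on GIS data format, and a real repository would need format-specific readers. The function names `geojson_extent` and `extents_intersect` are hypothetical and are not part of the GRADE demonstrator.

```python
import json


def geojson_extent(geojson_text):
    """Return (min_x, min_y, max_x, max_y) for a GeoJSON document.

    Assumption: the deposit is GeoJSON; other GIS formats would need
    their own readers.
    """
    def positions(node):
        # Walk nested coordinate arrays down to [x, y, ...] positions.
        if isinstance(node, (list, tuple)):
            if node and isinstance(node[0], (int, float)):
                yield node[0], node[1]
            else:
                for child in node:
                    yield from positions(child)

    def geometries(obj):
        # Flatten feature collections and geometry collections.
        kind = obj.get("type")
        if kind == "FeatureCollection":
            for feature in obj.get("features", []):
                yield from geometries(feature)
        elif kind == "Feature":
            if obj.get("geometry"):
                yield from geometries(obj["geometry"])
        elif kind == "GeometryCollection":
            for geometry in obj.get("geometries", []):
                yield from geometries(geometry)
        else:
            yield obj

    xs, ys = [], []
    for geometry in geometries(json.loads(geojson_text)):
        for x, y in positions(geometry.get("coordinates", [])):
            xs.append(x)
            ys.append(y)
    if not xs:
        raise ValueError("no coordinates found in deposit")
    return (min(xs), min(ys), max(xs), max(ys))


def extents_intersect(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```

Under these assumptions, the extent would be computed once at deposit time and stored with the item; a box drawn by the user on a map then becomes a second extent that is compared against each stored one with `extents_intersect`.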
[34] Detailed in the report http://edina.ac.uk/projects/grade/FormalGISRepositoryFeedbackFinal.pdf
[35] ISO19115 (www.tc211.org/metadata) is the international geospatial metadata standard. The UK academic profile of
ISO19115 is promoted as the geospatial metadata standard to be used within UK HFE.
[36] For example, the current demonstrator expected data to be uploaded as a zip file – users from unix-based systems indicated
that restricted them from upload.
[37] Of interest, survey respondents indicated that since the demonstrator repository was run by EDINA (EDINA run a suite of
subscription geospatial data download/online mapping services for UK HEFE), a degree of trust was already established in the
repository even though it was only a demonstrator.
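Since the demonstrator expected deposits as zip files, the Table 1 requirement to inform the user of the uncompressed data set size prior to download could be met by reading the sizes recorded in the archive's directory, without unpacking anything. The following is a minimal sketch using Python's standard `zipfile` module; the function name `uncompressed_size` is hypothetical and does not come from the demonstrator.

```python
import io
import zipfile


def uncompressed_size(zip_source):
    """Return the total uncompressed size, in bytes, of a zip archive.

    zip_source may be a file path or a binary file-like object. Only the
    archive's directory is read; no member is decompressed.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return sum(info.file_size for info in archive.infolist())
```

The figure could be stored alongside the deposit at ingest and shown on the item page, so that it is available before any download begins.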
Work Package 2 – Scoping the role of informal repositories
In order to scope the role of informal repositories for geospatial data sharing, work package 2 initially
assessed current data sharing practices with information gathered from (i) the anonymous survey
distributed at GISRUK and on the GRADE project web site, (ii) the 2-day workshop on informal
data sharing and (iii) associate partners [38]. Results confirmed that data sharing is commonplace: 90%
of those responding to the anonymous survey could recall a time they had shared data and indeed
many qualified this by saying they shared data often. Data sharing is predominantly between
colleagues and amongst project partners.
The survey also asked respondents to identify key data sharing issues (including barriers to sharing
and what would make data sharing easier). Appendix I contains full results for these questions.
However, it is relevant to highlight that confusion over licence restrictions and the lack of a national
data repository to facilitate data sharing were key responses to both questions. These issues were
raised again by workshop participants as described below.
Informal repositories per se are not being used; rather, conventional techniques are the preferred
methods for sharing data, the most popular being email attachment and CD/DVD. More up-to-date
file sharing techniques are not commonly used. Only a few were described within the anonymous
questionnaire responses including making data available as standards-compliant web services and
sharing data via www.yousendit.com. Associate partner sites and workshop attendees reported using
shared network drives, FTP, USB memory stick, personal URLs, WebCT, unix links, external hard
drives and Blackboard for data sharing.
Since there was no evidence of contemporary file-sharing software being widely used, it was decided
the workshop should focus on a brief experiment using fairly new but readily available Internet-based
technologies as ‘informal repositories’, mainly utilising peer-to-peer features [39]. Following tests of
several products (with varied suitability) two were chosen for their contrasting approaches: one a
browser plug-in; the other a desktop client. AllPeers [40], a free plug-in for Mozilla’s Firefox web
browser, allows the user to invite friends to various groups so that data can be shared using a
peer-to-peer approach. Exaroom [41], a portal-based sharing environment, involves the user connecting
to a web page to configure settings and invite friends. It also involves installing a client that installs
“Ground Control”, their file-sharing software.
On day one of the experiment, workshop participants attempted to share data remotely using AllPeers
and Exaroom. For reasons described in the workshop report, the ‘chained’ sharing of data amongst a
group of twelve individuals was unsuccessful. However, the desired outcome of enabling participants
to contribute to the discussion of informal approaches to data sharing was successful.
Participants identified broad problems with the types of approaches to data sharing they had
experimented with, including the amount of time taken to set up the software (especially if there was a
situation with a real need to share data), complicated interfaces, the need to add friends, the ‘massive
searching’ that would be needed for sharing data in a real setting, the need to communicate and
negotiate sharing (that is, having to tell the other person the files are available), and privacy concerns.
More technical participants saw potential in such approaches, including solving the problem of large
files (for email), solving the lack of immediacy for CDROM, the portability of a browser-based
approach, enabling collaboration, more interactive/in your face sharing, offering potential use for
browsing other people’s data, and potential to reduce time spent tackling data requests from students. To
expand upon these problems/potentials, participants came up with a SWOT analysis to summarise
their views on these informal approaches to data sharing:
[38] Survey results are described in full detail at http://edina.ac.uk/projects/grade/InformalQanalysisReport.pdf
[39] The workshop is described in detail at http://edina.ac.uk/projects/grade/Grade_reportRSSv2.pdf
[40] www.AllPeers.com
[41] www.exaroom.com/beta
Strengths
• ease of installation
• ease of registration (through one login rather than multiple links)
• “ease of use”
• the user-interface of Exaroom being better than other examples
• “less data corruption” in the sharing process; that the tools could be “always on”
• that AllPeers was available on any machine
• Speed and immediate access to data
• Control over access to data for groups or individual friends
• Good community building tools
• Able to assist in negotiating the technical aspects of sharing data
• Potential to assist in setting up project groups
• Potential to offer an information resource
Weaknesses
• “Too many features not really needed”.
• Exaroom being “client based”
• Exaroom being both “…invasive and [requiring] the DotNet extra download”
• “Privacy” and “possible firewall issues”
• A need for administrator login for both installation and use
• Beta status
• Poor user interface
• Poor functionality (one participant needed to chat to many friends during the workshop, not just
one-to-one)
• Need to have pc on 24x7
• Need to be logged on
• Require both sharers to be pre-registered as friends
• Require both sharers to be logged in
• Navigating across many friends to actually ‘find’ data
• Limited organization of data
• Not interoperable
Opportunities
• Aiding collaboration
• “Quick negotiation” of access to data “A: ‘Have you got hospital locations?’ B: ‘Yeah, in my Exaroom
folder’”
• Developing “data sharing communities” to expand “the personal network of GI users”
• Projects where a small group needs to share datasets
• “Inter- and intra-institute collaboration”, noting the need to share data within and between
organisations
• “Working with non-academic partners on projects” and “non-academic access” to datasets
Threats
• the software was “untested”
• a dependence on “…on 3rd party software”
• that alternatives existed to Exaroom and AllPeers
• that “personal preferences” would impact on selection of alternatives
• that better or more embedded examples may exist “…with more GI-related functionality”
• “… other systems [were] not compatible”
• the limitations of having an infrastructure which “relies on individuals to run and maintain [it]”
• the administrative overhead for valuable data assets could be prohibitive in some local settings
• system security
• “invasive” nature of the systems and “surveillance?” (especially when it offered too much scrutiny of
colleagues’ work)
• the need to “trust” software providers who could, potentially, have access to the user’s data
• the loss of “… control over your data” once it is made accessible
• The “slow take-up” of technology and sharing practices, impacting on the culture of sharing in the
academic community.
• “Will the community really grow?”, limited number of users, limited resources, technologies difficulty
of use
• The lack of development of best practice, with the current culture and “academics!”
Table 2: SWOT analysis of peer-to-peer file sharing applications for informal data sharing
During the first day of the experiment, participants had also been asked to interact with the formal
demonstrator repository (developed within work package 1) as a comparable facility for data sharing.
During discussions on the second day of the workshop, participants developed the following SWOT
analysis on the formal demonstrator repository.
Strengths
• Quick and easy/straightforward to use
• “reliable” and “robust”
• “permanent” and “dedicated” resource in a “single location” (aiding “collaboration”)
• “make[s] data available” as part of an “always-on resource”, including, otherwise, “orphan datasets”
• “server based”
• using “single registration”
• the ease of adding and searching for datasets
• “reduced effort” and/or “removal of effort” in creating “shared” datasets
• GI centric / “specifically established to host UK HE [Higher Education] GI”
• Moderated content (i.e. datasets from inside or outside the UK) by a “known/trusted resource” (i.e.
administered by EDINA)
Weaknesses
• a lack of “links between submissions” and users not being able to “…sign up to topics/themes”
• “lacks communication/community tools” present in the informal methods
• Time consuming to use
• Restrictions on GIS data types
• Limiting repository content to certain types of ‘GI’
• The repository’s underlying software “… cannot read GI in native formats” as .zip files need to be
submitted
• General concerns of the management of “data quality issues”, “quality assurance” and “quality
control” of data and metadata
• Unknown ‘meaning’ of datasets, again, relating to the desire for formalised or community-mediated
“semantics”
• Problems of “repeated metadata creation”
• The lack of opportunity to “… use previously created metadata” possibly using the metadata already
created in ESRI’s ArcCatalog product
Opportunities
• the community’s “… most common data delivery/sharing method”
• an experimental base for additional applications such as the “implementation of a powerful GI search
engine”
• a chance to develop best practice in this arena
• data quality mediation through “Definitive datasets augmenting ‘official’ representations and
descriptions”
• making data accessible to increase quality through reuse (alongside the reporting of errors,
omissions and potential application areas)
• aid “collaborative research”/“collaboration”, with examples including “project based file share
hosting”
• a “Potential major resource for e-science”
• “Integration” [of the GRADE demonstrator repository] with other EDINA GI collections, including the
Digimap service
• “links with other metadata catalogues (Go-Geo!)”
• resources such as a “Teaching repository; Research repository”
• Once developed considering the resource as part of a national spatial data infrastructure [42]
Threats
• potential “underuse” of a resource
• possible concerns about unequal levels of contribution - data ‘free-loaders’
• the “need to be certified”
• the need to “… address community concerns of digital rights”
• “… licensing issues – perceived or real”
• “…data and software licensing” restrictions
• The sustainability of the resource linked to “funding”, “maintenance [costs]”, current “Project Shelf
Life”
• Metadata, Limited coverage, the need to adopt “Metadata Standards”, the need to utilise metadata
already created in software such as ArcCatalog
• Data volumes (“Does the repository have the right tools to cope with increased data volumes?”)
• That such demonstrator resources may not be seen as “proper” and not worth participating in if it is
not a mature and established resource
• that the resource could be “limited by access rights to academics only”
• the “Core purpose [of repositories] may not reflect research practice in terms of the [current] culture
of sharing GI”
• the wider “need [for] a change in attitude” to sharing GI
• ‘cultural inertia’ issues, including “personal preference” and what is seen as “appropriate” methods
to share GI
• Alternative technologies/resources that may exist in competition to the GRADE demonstrator
repository (“There are already other geo-repositories in the UK e.g. NERC”)
Table 3: SWOT analysis of role of geospatial data repository for formal data sharing.
A series of recommendations concluded this work package:
1. There is a need to address the GIS community’s concerns and possible misconceptions about
licensing restrictions against a need to share data. Certainly, the work carried out in WP3 of
GRADE relating to digital rights and licensed users should help in what will need to be an
educational and participatory process.
2. Currently, GIS-users appear to have a mixture of sufficient approaches to share data, in general.
Informal repositories could have some role in geospatial data-sharing for small group activities but
they appear to have limited utility to act as a distributed national resource. More work is needed
to explore and monitor the uptake and role of informal repositories in small group settings and
how they could contribute to a wider infrastructure.
3. The development of a national geospatial data repository was well supported by the study’s
participants and should be promoted. As such, there is a need to identify champions in local
settings to promote and encourage its use. Regional training sessions and promotional resources
should be developed to fit varying audiences that are likely to emerge in both research and
teaching.
4. There is a need to develop links between a national geospatial data repository and other EDINA
resources. In particular, consideration could be given to the core/framework datasets held in, for
example, Digimap and UKBORDERS and storage issues that may otherwise emerge through
duplicated geometry. Such an approach would also aid comparisons of the spatial extent of
topics, both in terms of the patchwork of geographical coverage of a given theme or the coverage
of themes at a given location.
5. Similarly, the establishment of the GRADE repository should be considered around the role of
Go-Geo!, where EDINA should envisage Go-Geo! as a geospatial ‘one-stop-shop’ for UK academia.
In particular, the service should help users to find, evaluate and re-use geospatial assets held
within a national facility. Go-Geo! should also be capable of searching and accessing geospatial
data in any repository capable of managing licensed geospatial data as well as exploring the
provision of the social/community tools seen to be needed to negotiate, discuss and, potentially,
instill ‘trust’ around GI, towards in silico research.
6. As such, there is a need to consider the metadata that this should involve, drawing on information
that users have already provided in existing software. In addition, there is a need to raise
awareness of formal metadata for both sharing and personal data-maintenance/-curation,
something requiring greater promotion at training and grassroots levels, beyond just the GIS
community.
7. The creation of an accessible permanent central geospatial data repository offers opportunities to
link to wider data sources. The relationships between a national central repository and formal
institutional repositories, alongside grey-sharing and the fostering of orphan datasets, should be
explored in terms of OGC compliant registries and the means to draw neglected data into a place
where it can be stored, re-used, validated and valued.
8. The creation of a successful national geospatial data repository offers opportunities to develop
Geographical Information Retrieval tools capable of searching and making sense of both a
plethora of data and, through middleware, within massive datasets. Such ideas also link to the
challenges of currently non-standard forms of GI being considered as part of the semantic web
and how (community-mediated) semantic and technical interoperability play a role in having users
readily access the data they need.
9. Consideration should be given to current research practice and the desire to develop a wider
infrastructure in the context of in silico research and national spatial data infrastructures,
particularly through the conditions that would need to be put in place to allow authorised access
by public sector colleagues to metadata and dataset ‘discussion’ and, if appropriate, allow
authenticated access to the actual datasets, especially given the potential roles of academia in
activities such as INSPIRE.
10. Such activity should be recognised as an opportunity to continue research and exploration
involving longitudinal and in situ qualitative approaches to better understand the demands of a
variety of GIS users, in relation to concerns about trust relating to their data and that of others,
and the development of appropriate theoretical models to understand the social and technical
context and role that (academic) geospatial data repositories play in Europe’s wider ‘e-society’.
[42] A spatial data infrastructure (SDI) is generally viewed as an umbrella of policies, standards, and procedures under which
organisations and technologies interact to foster more efficient use, management and production of geospatial data
Work Package 3 – Digital Rights Issues
A work package on legal issues was included within the project because it was generally believed the
law on copyright and licensing for geospatial data in the UK is not as well understood nor as clear-cut
as it might be and there exists scope for clarifying the fundamental end user rights.
Through the provision of eleven (geospatial) use-case scenarios [43] describing the main actors,
stakeholders, data sets and outputs, a basis for the investigation of copyright issues surrounding the
use and dissemination of derived data sets was given. In particular, the importance of the inheritance
of copyright licensing for derived data sets was established. The interaction of a variety of
stakeholders with varying implicit and explicit licensing conditions makes precise
copyright boundaries difficult to establish. The requirement to adhere to the most severe licensing
restriction poses significant problems to data repository establishment. The report was principally
based around the presentation of eleven use-case examples from a variety of geographic disciplines,
using national and international data sets from both third parties and collected by the individual
authors. The use-cases were intended to outline how a project uses a variety of input data sets to
perform a research task in order to produce an output data set. It is assumed that one of the primary
outcomes of such a project is the production of a new data set that may then be lodged with a digital
data repository. Each use case identified key stakeholders, goals of those stakeholders, data
processing techniques, described the end product (either the data set or some presentation of the
data) and summarised possible licensing restrictions for reuse of the data.
[43] http://edina.ac.uk/projects/grade/usecasecompendium.pdf
The use cases provided a sound basis for the legal team to develop an understanding of the
geospatial data licence landscape leading to the legal argument developed as the key outcome of
work package 3. The argument centres around the principle that there is no legal grounding for the
continued strict limitations on the (re)use of digital geospatial data within research and teaching
contexts. It is argued this is because the common assumption that geospatial data are protected by
UK copyright law may be founded on fundamental misconceptions of copyright and database law [44].
The same conclusion was reached quite separately by Janssen and Dumortier in their paper on the
protection of maps and spatial databases in Europe and the US (Janssen and Dumortier, 2006).
This conclusion has far-reaching consequences for geospatial dataset (re)use in the UK, and could
allow a national geospatial repository service to operate in a less constrained fashion than might
otherwise prevail. The JISC IPR Consultancy [45] has acknowledged the work in their paper on
licensing issues in derived data (Korn, Oppenheim and Duncan, 2007) and has recommended that
JISC commission an in-depth study of IPR and licensing of derived data building upon GRADE
work [46]. To date, negotiations with content providers have not been based on the conclusions of the
GRADE work on digital rights, though this is a complex area and we understand that discussions are
continuing.
However, if at some point, database law is indeed deemed the most appropriate legal interpretative
context then the following assumptions form the basis of a licensing framework for repositories
managing licensed geospatial data assets:
• A repository capable of managing licensed geospatial assets (a repository) will be used in the
HFE community for consultation, non-commercial research and teaching purposes.
• Geospatial data deposited in a repository will come from a variety of sources and is likely to
have passed through various stages of manipulation
• The researchers who deposit data in a repository will either have created the data
themselves, used data which are not subject to re-use restrictions and/or will be lawful users
of the geospatial databases from which extractions of geospatial data are made
• So long as a lawful user, those researchers are at liberty to extract an insubstantial amount
from the contents of the source database for any purpose (including for deposit in a
repository) and where the data are used for non-commercial research and illustration for
teaching they may extract a substantial part from the source database
• Given that only a limited number of researchers will be interested in any particular part of a
geospatial database, making available the extractions to other researchers for the purposes
of non-commercial research or illustration for teaching would seem not to infringe the
re-utilisation right even where those extractions are a substantial part of the source database
• Data deposited by researchers may amount to only insubstantial parts of the source
database, but if repeated deposits from the same source database are made, then these
may, in total, amount to a substantial part of the source database. The Database Directive
shields those who use them for the purposes of non-commercial research and illustration for
teaching.
• Where a researcher or teacher extracts a substantial part of the source database to use for
the permitted purposes then the source must be attributed. [This would be easier to manage
if there is an obligation to attribute source no matter the size of the deposit]
• Where extractions are to be made, a standard access management technology, such as
Shibboleth (via the UK Access Management Federation for Education and Research) or
Athens would help to ensure that substantial parts of the source databases are used only
within HFE and for the permitted purposes
• A repository facilitates the work of researchers and teachers but is not itself a lawful user of
any of the source databases
• By depositing the contents in a repository, the repository does not thereby ‘extract’ contents
from the original database within the meaning of the Directive
• By holding the geospatial data deposited by researchers, a repository thereby re-utilises the
data within the meaning of the Directive (i.e. it makes the data available to the public through
on-line transmission)
• Where only an insubstantial part of the source database is made available, this would not
infringe the re-utilisation right.
• Where a repository makes a substantial part of the source database available to the public,
whether the re-utilisation right is infringed will depend upon whether making it available to a
limited number of researchers and teachers is considered as making it available ‘to the
public’.
• Consultation of a database does not infringe the sui generis right. Therefore there would be
no difficulty in having a repository ‘open’ to be consulted by all.
[44] http://edina.ac.uk/projects/grade/gradeDigitalRightsIssues.pdf
[45] http://www.jisc.ac.uk/whatwedo/projects/ipr/iprconsultancy.aspx
[46] It is doubtful how much more a thorough investigation can be built upon the concise well-developed legal discussion put
forward in the GRADE report. The work has been widely distributed, with an article appearing in the 5th April 2007 edition of
Technology Guardian as part of The Guardian’s Free Our Data campaign
http://www.guardian.co.uk/technology/2007/apr/05/freeourdata.intellectualproperty
Work Package 4 – Scoping the role of institutional repositories for geospatial data
Work package 4 aimed to assess the role of institutional repositories for managing geospatial
data [47]. The response to the web-based survey was poor despite several calls for participation.
Although the survey was aimed at UK institutional repositories, of the 35 responses received,
2 were from Europe, 5 were from elsewhere, and 2 were from institutional repositories and data
centres.
Survey results revealed that there are currently no UK geospatial subject-specific/community
repositories in operation and that, although there are growing numbers of institutional repositories,
none of them currently manages any geospatial content (or would be capable of doing so). This
absence probably also reflects the evolving repository landscape. The report points to survey
replies in which many repository managers admit they are dealing first with publication output since,
with support from the Open Access Movement, they are likely to achieve some measure of success in
obtaining content. At the time of the report it was also suggested that the absence could result from
the fact that open source IR software does not have a ready-made metadata schema to accommodate
datasets. It was suggested that if the IR software vendors had developed a Dataset plug-in,
Institutional Repositories might already have been challenged to manage datasets. JISC are now
funding development of several application profiles to support repository searching of a variety of
media including geospatial data.
The survey also revealed that whilst there would be a willingness of (some) institutional repositories
(IRs) to accept geospatial data, little has been offered to date. Experience of geospatial data handling
within IRs is thus largely non-existent and given the specialisms involved (as detailed in the findings
from the formal demonstrator user survey), institutional repositories may not be ideally suited to
manage the data outputs of the geospatial research community. Indeed where designated data
centres exist it is unlikely that Institutional Repositories will be the archive of choice for datasets, but
there are many disciplines where there is no formal data archive available, or the data centre has a
strict scoping on size and subject of datasets they accept. The report suggests that in these cases,
rather than a dataset’s existence being hidden on a personal PC, IRs may have a role to play.
The question of whether IRs are the most suitable repository for geospatial data was debated within
the report. Table 4 provides a summary of that debate.
[47] http://edina.ac.uk/projects/grade/GRADE_Survey_Report.pdf
STRENGTHS of an IR dealing with geospatial data
• One repository – less administrative and technical overhead
• Linking text, datasets, images easier within one environment
• Showcase for all institutional research
• IR Software – Open Access – interoperability – visibility
• Software based on International Standards
• Metadata skills provided by Information community
• Formal Dataset citation
• Supports Citation analysis and metrics for research funding and personal promotion for data
  generators/managers

WEAKNESSES of an IR dealing with geospatial data
• Software not designed to cope with data
• No IR metadata schema for datasets yet
• IR staff without Data Processing skills
• IRs do not quality control content
• IRs not involved in production of information products
• Storage – Preservation (all media types)
• OA culture not yet extended to data although OECD, EU and some Research Councils etc.
  mandate deposit of data emanating from funding.

OPPORTUNITIES of an IR dealing with geospatial data
• Contribute to the design of an IR dataset metadata module
• To offer a data archive (where none exists)
• Treats ‘orphan’ datasets not accepted by Data Centres
• Enhancement of IR staff skills
• Showcase in one digital repository of all research output
• Ready host when dataset deposit is mandated
• Integration – joined up research
• Additional Funding opportunities from e-Research projects
• Input to the Data citation model
• Data and Information communities working together
• Collaboration between disciplines
• Dataset harvesting from IRs to Data Centres

THREATS of an IR dealing with geospatial data
• Turf war between IRs and Data Centres
• Will funding follow to IRs
• Will funding stream for data management reduce?
• Too large an undertaking for IRs
• Data lost in publication ‘bucket’
• ‘Thematic’ datasets distributed
• No migration/preservation policy
• Datasets fall ‘between stools’

Table 4: SWOT analysis of institutional repositories managing geospatial data.
The report concluded the repository landscape is rapidly evolving and that a pragmatic approach is
appropriate now. If a researcher has access to an appropriate data centre to deposit the dataset then
that should be the preferred route provided that the papers and publications resulting from the dataset
are linked. However, if a researcher does not have access to an appropriate data centre, is it not
better that the dataset is at least deposited in a trusted repository? Leaving the dataset on the
researcher’s pc, as is often the case now, will ensure it is ‘lost’ forever.
Work Package 5
Work package 5 focused upon interoperability aspects for geospatial data repositories. Key findings
of this work confirmed geospatial data repositories are in a strong position to interoperate with other
repositories/services within the JISC IE and beyond. Standards for exchanging geospatial data have
been in development since the early 1990s through the work of the International Organisation for
Standardisation’s Technical Committee No. 211 (ISO TC211) [48] and the Open Geospatial
Consortium (OGC) [49]. ISO TC211 standards form the foundation/building blocks for geospatial data
interoperability to occur (e.g. setting the metadata standard for describing geospatial data and
defining how to represent coordinate systems that geospatial data exist in). OGC specifications
implement these standards [50]. With this mature framework in place, repositories managing geospatial
data are well positioned to exploit opportunities for interoperating. For example, the GRADE project
repository demonstrator successfully demonstrated that a geospatial data repository can interoperate
with other repositories. This was achieved by implementing the DSpace OAI-PMH interface on the
demonstrator. The JISC-funded PerX project [51] then harvested metadata from the GRADE project
demonstrator and made it searchable via the PerX project search interface. This demonstrates that a
geospatial data repository can interoperate with other repositories across the JISC IE via OAI-PMH
and Dublin Core as a lowest common denominator metadata standard [52].

[48] http://www.isotc211.org/
[49] www.opengeospatial.org
[50] For example the OGC Web Mapping Service (WMS) specification delivers geospatial data as an image which can be viewed
within any browser or overlaid with other geospatial data within a geographic information system. The WMS specification
adheres to the ISO 19111 Coordinate Reference standard for defining the coordinate system so that the image is overlaid
correctly with other georeferenced data.
[51] http://www.icbl.hw.ac.uk/perx/
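The harvesting step described above can be illustrated with a minimal sketch. It is not the code used by GRADE or PerX. It assumes a hypothetical repository base URL and a hand-written sample response, and only shows how an OAI-PMH ListRecords request is formed and how Dublin Core titles might be read from the reply.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for the given endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml):
    """Return the text of every dc:title element in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [element.text for element in root.iter(f"{{{DC_NS}}}title")]


# Hand-written sample response (an assumption for illustration, not real
# output from the GRADE demonstrator).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical derived land cover dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # The base URL is hypothetical; a harvester would fetch this URL over HTTP.
    print(list_records_url("http://repo.example.org/oai/request"))
    print(extract_titles(SAMPLE_RESPONSE))
```

Because unqualified Dublin Core is the lowest common denominator, a harvester written this way sees only generic fields such as title; geospatial detail such as extent would need a richer metadata format.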
Despite the mature standards-making framework for geospatial data, there are particular areas where
work needs to focus to resolve key interoperability issues that a repository environment brings to the
fore for geospatial data, including:
• The lack of geospatial data standards for long term data preservation. Geography mark-up
language is an interoperable geospatial data standard for data transfer. However its verbose
nature and the fact that it cannot be used natively within a GIS application without being
converted means it is not ideal for preservation purposes.
• Content packaging standards – standards making organisations have not to date put effort
into developing standard approaches for packaging geospatial data. For example, geospatial
data can be somewhat meaningless without the appropriate visualisation/rendering
information. Or as reported within work package 1, feedback indicated a strong desire to
deposit project related data as one deposit item. Development of a content packaging
standard for geospatial data is essential to repository interoperability.
• Web services on deposited data – the geospatial community are fairly progressive in their
take up of accessing geospatial data as live data streams or web services. Again during work
package 1 reference was made to a weakness of the demonstrator repository that it was
unable to offer access to repository items as web services rather than as data downloads.
Resolving how to offer up web services of items within a repository would enhance
interoperability of the repository.
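To make the idea of serving repository items as web services concrete, the sketch below builds an OGC WMS 1.1.1 GetMap request URL. The endpoint, layer name and bounding box are hypothetical, and the demonstrator itself did not offer such a service. The query parameters are those defined by the WMS 1.1.1 specification.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width, height,
                   srs="EPSG:27700", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL.

    bbox is (minx, miny, maxx, maxy) in the units of the given SRS.
    EPSG:27700 (British National Grid) is used here only as an example default.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "SRS": srs,
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and layer name, for illustration only.
    print(wms_getmap_url("http://wms.example.org/service",
                         "deposited_item_42",
                         (300000, 600000, 350000, 650000), 512, 512))
```

A repository that exposed each deposited item as a layer in this way would let users view or overlay the data without first downloading it.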
Outcomes
Table 5 provides a summary view of project achievements against original project aims and
objectives.
Work Package Aims and objectives | Project Achievements

1.1 Aim: Establish detailed repository use cases and user based evidence for the requirements and
    functionality of a repository capable of managing licensed geospatial assets.
    Achievement: User-based evidence well documented based upon iterative interaction with project
    demonstrator.
1.2 Aim: To investigate and identify the technical and cultural issues surrounding the storage,
    management and accessibility of geospatial information derived from licensed data within a
    centralised digital repository.
    Achievement: Technical issues for geospatial data within digital repository well-investigated and
    described. Cultural issues for geospatial data within repositories came to the fore more in
    WP2 during the investigation of informal data sharing practices.
1.3 Aim: To synthesise the lessons into best practice and advice for those concerned with the
    establishment and operation of research data or media-centric repositories.
    Achievement: Compiling user-based evidence brought many generic issues for data sharing of
    interest to all parties dealing with research data.

2.1 Aim: To investigate the extent of current informal data publication and sharing and the ‘grey
    economy’ of geospatial information sharing.
    Achievement: Achieved a comprehensive investigation of current data sharing practices.
2.2 Aim: To investigate the relationships and potential interfaces between informal and formal
    (including Institutional) repositories.
    Achievement: No investigation of potential interfaces between informal and formal repositories.
2.3 Aim: To scope possible technical architectures for informal data sharing and investigate how
    security issues may be addressed (DRM issues will be addressed as part of WP3).
    Achievement: No scoping of possible architectures for informal data sharing.
2.4 Aim: To pilot an informal geospatial data repository demonstrator with Associate Partners to
    inform understanding of the cultural and technical issues involved.
    Achievement: No demonstrator developed, rather a workshop using a variety of existing
    peer-2-peer software.
2.5 Aim: To synthesise our findings into a preferred policy statement on the use of Informal
    Repositories.
    Achievement: Series of recommendations developed from workshop findings.
New Achievement: Gained insight into departmental data management practices.

3.1 Aim: To articulate intended use cases of sharing derived geospatial data.
    Achievement: Detailed compendium of derived geospatial data created.
3.2 Aim: To develop a clear understanding of digital rights pertaining to data created entirely by a
    user or research team.
3.3 Aim: To develop a clear understanding of digital rights issues for derived data respecting the
    licensing conditions of the source data.
    Achievement (3.2 and 3.3): Legal report presenting arguments for digital rights issues produced.
3.4 Aim: To develop a conceptual and technical framework for resolving those described rights
    management issues raised in relationship to repositories.
    Achievement: Licensing framework proposed.

4.1 Aim: To determine to what extent Institutional repositories currently manage geospatial assets.
    Achievement: Baseline audit of IRs carried out.
4.2 Aim: To determine to what extent Institutional repositories could or should manage geospatial
    assets.
4.3 Aim: To investigate the arguments for and against Institutional versus media-centric repositories
    for geospatial data assets.
    Achievement (4.2 and 4.3): Discussion paper investigates role of IRs.

5.1 Aim: To investigate and assess the role of existing JISC sponsored terminology services with
    respect to repositories.
    Achievement: Relationship of data repository to other key elements of academic spatial data
    infrastructure investigated.
5.2 Aim: To investigate and assess interoperability between geospatial data repositories and other
    types of services/repositories.
    Achievement: Interoperability between geospatial data repository and other types of repositories
    demonstrated within work package 1.
5.3 Aim: To investigate the linking of geospatial repositories with e-Science infrastructures.
    Achievement: Identification of a degree of community need for repository data to be made
    available as web service.
5.4 Aim: To investigate and assess the potential of evolving industry driven geospatial
    interoperability standards, specifically Open Geospatial Consortium/ISO 19100 series standards,
    as a means of interoperating with repositories within and outside of academia.
    Achievement: Identification of key relevant OGC/ISO 19100 standards for repository
    interoperability.

Table 5 – Summary view of project outcomes against original aims and objectives

[52] Mappings between UK academic profile of ISO19115 and Dublin Core and Data Documentation Initiative metadata
standards are available at www.gogeo.ac.uk/
Work package 1 successfully identified user requirements for a repository capable of managing
geospatial data. It also identified a series of more general repository user requirements relevant to
any repository – institutional or media-centric. Of greater importance perhaps was the opportunity the
demonstrator afforded the wider geospatial community to interact with an infrastructure to facilitate
formal sharing of derived data. Findings from work packages 1 and 2 demonstrated clear support for
the creation of a national geospatial data repository [53]. Creating a demonstrator was a positive
method for engaging the community and has resulted in community focus being shifted onto the need
for the provision of this key element of an academic spatial data infrastructure. Perhaps by focusing
on the possibilities offered by the demonstrator, associate partners more readily acknowledged the
lack of data management practices within their own departments.
Work package 2 provided concrete evidence that geospatial data sharing is commonplace with 90%
of survey respondents confirming they had shared data recently. The geospatial community will
benefit from the knowledge that concerns about the (perceived) complexities of current data licences
are commonplace throughout the community. Workshop participants have benefited from
involvement in the active research aimed at scoping the role of informal methods for data sharing.
This involvement not only offered them exposure to peer-to-peer approaches to file sharing but also
offered them the opportunity to consider the potential use of informal sharing approaches versus the
more formal sharing afforded by the demonstrator repository. Again this work reinforced workshop
participants’ views of the need for a national geospatial data repository linked closely to other geo
services within the JISC IE including Digimap and Go-Geo!
The investigation into licensing issues for geospatial data repositories within Work package 3 has
brought unprecedented focus on the legality of current licences for geospatial data reuse within UK
HEFE. While key data providers are unwilling to enter into such a debate, the academic geospatial
community still stand to benefit from this work if only by forcing the development of clear, plain English
guidelines as to what end users can in fact do with derived data under current licences. It should be
viewed favourably that the JISC IPR Consultancy have recommended that JISC fund further
investigation into the area of licensing issues for derived data.
Work package 4 has provided a benchmark assessment of levels of geospatial data within institutional
repositories. This is of value as it can be used to assist in determining growth patterns of institutional
repositories. The report identified that the lack of metadata schema for data within open source
repositories may also have hindered possible deposition rates. It is hoped this conclusion assisted
the JISC in deciding to fund the development of a geospatial application profile for repository
searching. The SWOT analysis of issues for storing geospatial data within institutional repositories
should be of value to repository managers in their strategic planning for future repository expansion.
Work package 5 provided an overview of interoperability issues for geospatial data repositories. This
work is valuable to the geospatial community because it demonstrates a commitment to aligning
geo-industry standards (which the geo community feel strong allegiance to) with those standards found
within the JISC IE. The identification of where geospatial standards are weak (content packaging
standards, preservation standards) demonstrates that there is useful work within the e-Library world
that the geospatial community can leverage and again shows commitment to investing in further
standards-setting work. Likewise the familiarity within the geospatial community with data being made
available as web services is valuable for the wider JISC community in their strategic plan to push
forward e-Science infrastructures. The demonstration that metadata for repository items within the
project demonstrator could be harvested and presented for searching by a general repository should
be of value to both the geospatial and the repository community as demonstrating the value of OAI
interfaces on repositories.

[53] Recommendation 3 from work package 2 concluded “The development of a national geospatial data repository was well
supported by the study’s participants and should be promoted.”
Conclusions
It is clear that the legal issues surrounding the use and reuse of geospatial data within the academic
community are peculiar and distinct from other data sharing communities where IPR and digital rights
issues are less pronounced (if relevant at all). Legal investigations suggest that copyright may not
subsist in geospatial data (rather, the applicable law is the EU Database Directive). If this legal
argument holds, derived data can be deposited legitimately in a repository for reuse by other members
of academia for non-commercial use so long as acknowledgement of source data providers is made.
Concerns relating to breaking licensing conditions are the major barriers to more formal data sharing.
There is a clear view from user-based evidence amassed from the results of a questionnaire survey,
from workshop participants and from direct discussions with associate partners, that there is a
perceived need for the establishment of a national data repository to support the sharing and reuse of
geospatial data. This data repository would fill a noticeable gap and provide good linkages to other
elements within the UK academic spatial data infrastructure.
The geospatial community was found to have particular user requirements for a repository capable
of managing digital geospatial assets, particularly location-based searching and a degree of automatic
generation of geo-related metadata during the deposit process. Iterative enhancements were made
to the formal demonstrator in order to elicit feedback on community need and aspirations, resulting in a
demonstrator repository with over 160 deposited datasets, over 170 registered users and an equal
number of users unable to register (due to licensing restrictions).
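The two requirements named above, location-based searching and automatic generation of geo-related metadata, can be sketched in a few lines. The code below is illustrative only and is not how the demonstrator was implemented. The dataset titles and coordinates are invented, and a bounding box in longitude/latitude stands in for whatever spatial extent a real repository would record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Spatial extent of a dataset as west/south/east/north coordinates."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other):
        """True when the two boxes overlap or touch."""
        return not (other.west > self.east or other.east < self.west
                    or other.south > self.north or other.north < self.south)


def bbox_from_points(points):
    """Derive extent metadata automatically from (x, y) coordinate pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return BoundingBox(min(xs), min(ys), max(xs), max(ys))


def search_by_location(items, query_box):
    """Return titles of deposited items whose extent meets the query box."""
    return [title for title, box in items if box.intersects(query_box)]


if __name__ == "__main__":
    # Hypothetical deposits with invented extents.
    deposits = [
        ("Edinburgh land use (hypothetical)", BoundingBox(-3.4, 55.8, -3.0, 56.0)),
        ("Southampton coastline (hypothetical)", BoundingBox(-1.6, 50.8, -1.2, 51.0)),
    ]
    query = bbox_from_points([(-3.3, 55.9), (-3.1, 56.0)])
    print(search_by_location(deposits, query))
```

Deriving the extent from the data at deposit time spares depositors from typing coordinates by hand, which is the kind of automatic metadata generation users asked for.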
Informal data sharing is commonplace amongst people networks. Sharing is predominantly via
trusted methods including email attachments and CD/DVD. More up-to-date peer-to-peer file-sharing
approaches are as yet unexploited. During the workshop into informal sharing, doubt was raised as to
their value, in their current beta form.
Institutional repositories do not currently manage geospatial data and are not set up to do so.
However, no IR has yet been offered geospatial data as a deposit item, and over time IRs may have a
role alongside media-centric repositories in managing scientific research data. The current
development of a geospatial application profile to support repository searching may go some way to
helping IRs accept geospatial data.
Geospatial standards (for metadata, and OGC-based for geoprocessing services) are key to
interoperability of a geospatial repository within the JISC IE. With this in mind, the well-established
relationship between the International Organisation for Standardisation and the Open Geospatial
Consortium means geospatial data repositories are well positioned to interoperate with other
repositories and facilities within the JISC IE. For improved interoperability, the development of
standards for geospatial data preservation and content packaging needs to be prioritised.
Implications
Legal Implications
The legal argument put forward in GRADE has consequences that reach beyond the Repositories
Programme. Indeed it is of significance to not only those creators/users of geospatial data within
academia but to any individual/organisation seeking to reuse/share geospatial data derived from
licensed geospatial data. It is appropriate that the JISC IPR Consultancy have identified the need to
pursue this work further and encourage wider debate.
In the meantime, researchers remain uncertain as to what is a legally acceptable use of derived data.
It is possible however to foresee the establishment of a geospatial data repository working within the
circumscribed space afforded by existing licence agreements. Under this scenario, work would have
to focus on resolving the issue of copyright inheritance of derived data, again resulting in guiding
principles to help users decide what they can/cannot do with their derived data. Without these
guidelines, researchers will increasingly have to deal with a conflict of interest, coming under
increased pressure to deposit their research outputs whilst remaining uncertain as to what they can
legally do with their derived data within the boundaries of the licence agreements they have entered
into with data providers.
Interoperability
The conclusions reached on interoperability were that the mature geospatial standards organisations
ensure repositories managing geospatial data are well placed to interoperate within the wider IE
repository landscape. However, it is clear there are two key areas that future work should focus upon:
• Geospatial data sets are often not one single file but rather a group of files that must exist
together to be valid. In addition, geospatial data often need to be kept with other files,
for example files that describe how to classify and display the data and files that describe
the projection system of the data. A third scenario, described in user feedback from work
package 1, was that researchers wish to be able to deposit a bundle of project-related data.
For this to work attention needs to be given to developing standards for packaging geospatial
data, associated files and metadata. The international standards making organisations have
not given attention so far to content packaging for geospatial data so this is a key area for
future work.
• Related to the first point is another key area - standards for geospatial data preservation.
Geospatial data exist in multiple formats. Currently there is no agreed format for data
preservation. Geography Mark-up Language is an XML-based open standard for geospatial
data exchange. A data exchange format is not ideal for preservation purposes, not least
because of the size of the resultant file but also the possible loss of detail [54]. There is a need
for work to focus on appropriate interoperable formats for geospatial data preservation.
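The packaging problem in the first point above can be illustrated with a small sketch. No content packaging standard for geospatial data existed at the time of writing, so the manifest layout here is entirely hypothetical. The example simply bundles the files that make up one dataset, plus descriptive metadata, into a single deposit item.

```python
import io
import json
import zipfile


def build_package(files, metadata):
    """Bundle dataset files and a manifest into one ZIP archive.

    files maps file names to their byte content. The manifest.json layout
    is an assumption made for this sketch, not an agreed standard.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
        manifest = {"metadata": metadata, "files": sorted(files)}
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def read_manifest(package_bytes):
    """Read the manifest back out of a package built by build_package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return json.loads(archive.read("manifest.json"))


if __name__ == "__main__":
    # An ESRI Shapefile is a typical multi-file dataset: geometry (.shp),
    # index (.shx), attributes (.dbf) and projection definition (.prj).
    # The byte contents below are placeholders, not valid shapefile data.
    package = build_package(
        {"roads.shp": b"...", "roads.shx": b"...",
         "roads.dbf": b"...", "roads.prj": b"..."},
        {"title": "Hypothetical derived road network"},
    )
    print(read_manifest(package))
```

A shared manifest convention of this kind is what would let one repository unpack a deposit created for another, which is why the lack of a standard limits interoperability.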
Technical Advances
At GRADE project outset, in mid 2005, a literature review was carried out to assess the global status
of repositories and geospatial data. At the time, there were very few instances of geospatial data
being stored within repositories. However, since project outset in June 2005 the data repository
landscape has changed. The challenge for any future work is how to leverage these advancements,
including:
• Open Geo-Archives Initiative looking at integrating earth science data centres into research
portals. PANGAEA [55] is a public data library for science aimed at archiving, publishing and
distributing georeferenced data with special emphasis on environmental, marine and
geological basic research.
• Advances in semantic linking of data and journal publications. For example the STD-DOI
Creating access to Scientific Data project [56]. This project uses DOI to link datasets to articles
and vice versa.
• The GEONGrid project [57]: a cyber-infrastructure facility to advance Earth science research and
education with a data repository function.
Informal Repositories for Data Sharing
The other key technical advance during the GRADE project has future implications for informal data
sharing. In early 2007 [58], Google announced that its main search engine was able to search and
parse KML files, the native file format for Google Earth. More importantly, however, the Google
search engine parses and understands the geographical data within the KML and returns relevant
results geographically. Searching for and locating relevant geospatial data purely via Google could
well overtake any other informal method for geospatial data sharing and indeed could impact upon
levels of geospatial data made available for depositing within repositories. New development work
could consider the impact of Google’s new geo-enhanced searching.

[54] At the First International Workshop on Database Preservation (PresDB07), Peter Buneman’s talk “Why Current Database
Technology Does not Support Preservation” noted that ‘although curated databases use database technology, the contents of
the database seldom includes all the data of interest’
[55] www.pangaea.de
[56] www.std-doi.de
[57] www.geongrid.org
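As an illustration of what parsing the geographical data within a KML file involves, the sketch below reads placemark names and point coordinates from a small hand-written document. It says nothing about how Google's own indexing works. The sample uses the OGC KML 2.2 namespace, which was standardised after the period described here, and the placemark is invented.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the OGC KML 2.2 namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hand-written sample with an invented placemark.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Hypothetical sample site</name>
      <Point><coordinates>-3.18,55.94,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def placemark_points(kml_text):
    """Return (name, longitude, latitude) for each point placemark."""
    root = ET.fromstring(kml_text)
    results = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext("kml:Point/kml:coordinates",
                                    namespaces=KML_NS)
        if coords is None:
            continue  # skip placemarks that are not simple points
        # KML stores coordinates as longitude,latitude[,altitude].
        lon, lat = (float(value) for value in coords.strip().split(",")[:2])
        results.append((name, lon, lat))
    return results


if __name__ == "__main__":
    print(placemark_points(SAMPLE_KML))
```

Once coordinates are extracted in this way, a search service can match them against a geographic query rather than against text alone.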
Recommendations
• There is clear support for a repository to facilitate geospatial data sharing and reuse
  throughout UK academia. The JISC should consider the role of a national geospatial data
  repository within the UK academic spatial data infrastructure (closely linked to UK academic
  geo discovery portal).
• As recommended by the JISC IPR Consultancy, JISC should commission an in-depth study
  of IPR and licensing of derived data building upon the legal work carried out within GRADE.
• There is a need to address the community’s concerns and possible misconceptions about
  licensing restrictions against a need to share data. Attempts need to be made to give the
  community clear direction on permissible use of derived data specifically when it comes to
  depositing data in a repository for others to reuse.
• Informal repositories could have some role in geospatial data-sharing for small group
  activities but they appear to have limited utility to act as a distributed national resource. More
  work is needed to explore and monitor the uptake and role of informal repositories in small
  group settings and how they could contribute to a wider infrastructure.
• If IRs are to accept geospatial data they need to consider how they will meet the specific user
  requirements of the geospatial community (including location-based searching and automatic
  metadata generation).
• The GIS community should leverage standards developed within the e-Library world for
  developing content packaging standards for geospatial data.
• It is recommended there should be UK HEFE representation on the international working
  group looking at standards for geospatial data preservation.

[58] http://www.gearthblog.com/blog/archives/2007/02/new_search_capabilit.html
References
Janssen, K. and Dumortier, J. (Winter 2006), The Protection of Maps and Spatial Databases in Europe
and the United States by Copyright and the Sui Generis Right, John Marshall Journal of Computer
and Information Law, (24 J. Marshall J. Computer & Info. L. 195)
Korn N., Oppenheim C. and Duncan C., May 2007, IPR and Licensing issues in Derived Data,
http://www.jisc.ac.uk/media/documents/projects/iprinderiveddatareport.pdf
Louis, K. S., Jones, L. M. and Campbell, E. G. Sharing in science. American Scientist 90, 4 (2002),
304-307.
Lyon, L (2007) “Dealing with Data: Roles, Rights, Responsibilities and Relationships” Consultancy
Report, v1.0, June 2007,
http://www.jisc.ac.uk/media/documents/programmes/digitalrepositories/dealing_with_data_reportfinal.pdf
Morris, S (2005) “National Digital Information Infrastructure and Preservation Program Project Work
Plan Collection and Preservation of At-Risk Digital Geospatial Data”
<http://www.lib.ncsu.edu/news/gis.php?p=329&more=1>
Westbrooks, E (2003) “Efficient Distribution and Synchronization of Heterogeneous Metadata
for Digital Library Management and Geospatial Information Repositories.”, Dublin Core 2003, Seattle,
Washington, September 28, 2003 <http://www.siderean.com/dc2003/204_Paper78.pdf>
Appendices
Appendix A - Demonstrator Questionnaire Version 1
Appendix B - Template for demonstrator feedback from associate partners
GRADE Pilot Site Report and Feedback sheet
Completed by ___________________________ Date ________________________
Institution & Department ______________________________________________
1. Registered users of GRADE repository at your institution:
User Name | User email address | Date Registered | Uploaded – YES/NO | Downloaded – YES/NO
2. Uploading/Downloading geospatial data from the GRADE repository:
Geospatial data title
Uploaded
Downloaded
Data
type/file
format
Data description
Date
1
2
3
4
5
1
2
Feedback
Person 1
Upload/depositor process: Ease, instructions clear? Problems? Metadata fields required – sufficient?
Zipping up of files clear?
Download process: Search and found data easily? Search improvements. Download problems?
Quality rating?
Attach any email/verbal correspondence on feedback.
Person 2 feedback, etc.
Person 3 feedback, etc.
Author
Could you also list a top 10 personal wish-list of geospatial datasets you would ideally like a national
GRADE repository to hold:
1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
e.g. postcode boundary data, contaminated land data, forest data etc
e.g. Particular companies i.e. Environment Agency data etc
e.g. Coverage? UK? World?
3. Complete User Survey on GRADE repository website – YES/NO
URL: http://gradedemo.edina.ac.uk/dspace/index.jsp
4. Informal Geospatial data-sharing current practices at your institution (at least 500 word report)
The suggestions/questions below are guides to assist with your report. Please feel free to supply as
much information as you wish.
Descriptions of geospatial data your institution/department holds (if part of Go-Geo! pilot study
and you have already completed an Audit state this).
How are files stored? – Where? What format? Data volumes? Is this listed/documented? Who
uses them? (i.e. teaching purposes, ongoing research project)
Is postgraduate/undergraduate research geospatial data stored? – is it searchable
/accessible?
What proportion of the geospatial datasets in your institution/department are:
(1) Funded by research council grants/deposited there?
(2) Derived from primary data sources (e.g. OS)?
(3) Would have no data-sharing restrictions that you are aware of?
Give examples of geospatial datasets that are re-used or derived datasets that you hold.
What methods are used to exchange geospatial data (internally/externally)? E.g. email,
department server, WebCT, URL or websites, FTP, P2P.
Do you have any formal sharing guidelines (i.e. what can be shared, what information must
be added i.e. metadata, contractual agreements/conditions, between particular restricted
groups- who are these)
With whom do you exchange/share GIS data? E.g.:
(1) colleagues in department
(2) other departments within institution – which
(3) other institutions/colleges – please specify
(4) non-academic bodies – please specify
(5) others, individuals – outside UK etc – where, how, what, why.
Have you ever encountered problems acquiring geospatial data (provide details of dataset
titles, issues, what happened, how was it solved if it was)
What sources does your institution use to get access to data? - E.g. Athens authenticated
services, internal passwords, where do you get raw GIS data from?
If you do not share any geospatial data at all please explain the reasons why not, and state
under what circumstances you would consider sharing GIS-data.
Have you completed the GRADE Informal geospatial data-sharing questionnaire? – YES/NO
URL: http://edina.ac.uk/projects/grade/questionnaire.html
5. Case study examples of a derived geospatial dataset that was shared at your institution (include at
least 2)
CASE STUDY 1:
Title of geospatial dataset:
Date created:
Dataset type & format: i.e. Vector ESRI Shapefile, etc.
Summary detailing what dataset is:
Actors
Primary: i.e. dataset creator/researcher
Secondary: i.e. Funding Body, or co-researcher etc.
Stakeholders & Role: e.g. organisations/companies license or rights involved, i.e. EDINA Digimap or OS or NASA, etc. Roles i.e. Creator, distributor, Education Institution, Grant body, publisher
Dataset components:
Layer 1
Name: i.e. Land-form PANORAMA
Owner: OS
Distributor: EDINA
Licensing: Copyright JISC till 2009
Type: Raster elevation
Area: derived
Layer 2
Etc. continue for as many dataset layers there are
Who shared with, why? Between who, who requested it, who did they approach, and how.
What was shared? Full raw data, map only, metadata, licence terms/conditions, other information, what file format?
How did you share? Method of exchange, how was it packaged?
How long did it take?
Issues/Problems: Barriers or difficulties or ones that had to be overcome.
CASE STUDY 2:
Title of geospatial dataset:
Date created:
Dataset type & format:
Summary detailing what dataset is:
Actors
Primary:
Secondary:
Stakeholders:
Dataset components:
Layer 1
Name:
Owner:
Distributor:
Licensing:
Type:
Area:
Layer 2
Who shared with, why?
How did you share?
How long did it take?
Issues/Problems:
This form should be well on the way to completion by the GRADE All-Partner Meeting on 30th October
2006, with the full report and all tasks finalised by the December 2006 deadline.
Appendix C – Project Demonstrator
The formal repository demonstrator consisted of one instance of DSpace. Initially set up with an
out-of-the-box configuration, the demonstrator repository was further customised to meet project
requirements, including: restricting item upload and download to registered users only, validating data
deposits on upload, automatically populating geo-related deposit metadata, and map-based searching.
The customisation work carried out to accomplish these functions is described below.
Initializing authentication barriers
Using the DSpace administration functions, settings were configured so that item download and item
upload were restricted to registered users but item metadata was visible to all.
File Validation during upload process
File validator code was built as a separate package that can run independently of DSpace. The
validator takes the filename of a zip file containing a dataset, extracts the dataset, runs the
appropriate geospatial info tool (ogrinfo for vector-based formats, gdalinfo for raster formats) and
parses the results produced by these tools to create an XML report about the dataset.
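The validation flow described above can be illustrated with a minimal Python sketch. This is not the project's code: the extension lists, the function names (`choose_tool`, `build_report`, `validate_zip`) and the layout of the XML report are assumptions made for illustration only. The sketch calls `ogrinfo` and `gdalinfo` as external commands and treats a non-zero exit status as a failed validation.

```python
import subprocess
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# Assumed extension lists; the report does not say which formats were accepted.
VECTOR_EXTENSIONS = {".shp", ".gml", ".tab"}
RASTER_EXTENSIONS = {".tif", ".tiff", ".asc"}


def choose_tool(filename):
    """Return 'ogrinfo' for vector files, 'gdalinfo' for raster files, else None."""
    suffix = Path(filename).suffix.lower()
    if suffix in VECTOR_EXTENSIONS:
        return "ogrinfo"
    if suffix in RASTER_EXTENSIONS:
        return "gdalinfo"
    return None


def build_report(results):
    """Build a (hypothetical) XML report from (filename, tool, output) tuples.

    An output of None means the tool failed or could not be run.
    """
    root = ET.Element("validation-report")
    for filename, tool, output in results:
        entry = ET.SubElement(root, "file", name=filename, tool=tool)
        entry.set("valid", "true" if output is not None else "false")
        if output:
            ET.SubElement(entry, "output").text = output
    return ET.tostring(root, encoding="unicode")


def validate_zip(zip_path, workdir):
    """Extract the zipped dataset and run the matching tool on each known file."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(workdir)
        names = archive.namelist()
    results = []
    for name in names:
        tool = choose_tool(name)
        if tool is None:
            continue  # sidecar files (.dbf, .prj, ...) are not inspected directly
        # -so -al asks ogrinfo for a summary of all layers; gdalinfo needs no flags.
        flags = ["-so", "-al"] if tool == "ogrinfo" else []
        try:
            done = subprocess.run([tool, *flags, str(Path(workdir) / name)],
                                  capture_output=True, text=True, check=False)
            output = done.stdout if done.returncode == 0 else None
        except FileNotFoundError:
            output = None  # GDAL/OGR command-line tools are not installed
        results.append((name, tool, output))
    return build_report(results)
```

Keeping the tool selection and report building separate from the subprocess call means both can be exercised without GDAL installed, which fits the report's description of a package that runs independently of DSpace.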
To slot this validation package neatly into DSpace, a DSpace modification called the Configurable
Submission System was used. It splits the DSpace submission process into individual Java
servlets, one for each step of the submission process (e.g. one for the upload, one for the metadata
entry etc.). A new submission step called validation was created, which took the uploaded file and fed
it to the validation package detailed above. The resulting XML report is used by the validation step
servlet to produce a response page for the user, letting them know whether the file is valid and, if it is,
creating the dataset boundary for them to tweak.
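As a rough illustration of how a validation step might turn such a report into a user-facing result, the Python sketch below reads a report and derives a single suggested boundary. The report layout (a `file` element with a `valid` attribute and an `extent` child carrying `north`, `east`, `south` and `west` attributes) and the function name `summarise_report` are hypothetical; the project's actual report format is not given in this document.

```python
import xml.etree.ElementTree as ET


def summarise_report(report_xml):
    """Return a dict with a validity flag, a message and a suggested bounding box."""
    root = ET.fromstring(report_xml)
    files = root.findall("file")
    if not files or any(f.get("valid") != "true" for f in files):
        return {"valid": False, "bbox": None,
                "message": "The uploaded dataset could not be validated."}
    extents = [f.find("extent") for f in files]
    extents = [e for e in extents if e is not None]
    if not extents:
        return {"valid": True, "bbox": None,
                "message": "The dataset is valid; please draw its boundary."}
    # The suggested boundary is the union of the per-file extents.
    bbox = {
        "north": max(float(e.get("north")) for e in extents),
        "east": max(float(e.get("east")) for e in extents),
        "south": min(float(e.get("south")) for e in extents),
        "west": min(float(e.get("west")) for e in extents),
    }
    return {"valid": True, "bbox": bbox,
            "message": "The dataset is valid; adjust the suggested boundary if needed."}


SAMPLE_REPORT = """
<validation-report>
  <file name="a.shp" valid="true"><extent north="56.0" east="-3.0" south="55.8" west="-3.4"/></file>
  <file name="b.tif" valid="true"><extent north="55.9" east="-2.9" south="55.7" west="-3.2"/></file>
</validation-report>
"""
summary = summarise_report(SAMPLE_REPORT)
```

The sketch ignores datasets that cross the 180 degree meridian, where a simple minimum and maximum of longitudes would not give the right box.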
Display of Geographic Extent of Items
Google Maps integration is done entirely client side using JavaScript. It mainly consisted of writing a
little JavaScript in the JSP pages (these create the actual HTML parts of DSpace, the parts that are
presented to the user) and making sure that the servlets (which do the work of talking to the database,
authentication, validation, etc.) passed the correct coordinates to the JSP pages so that the bounding
boxes were displayed correctly on the map.
Geographic Searching
DSpace uses a qualified version of the Dublin Core schema. Geographic searching was made
possible by extending the Dublin Core Metadata Element Set Coverage element with the DCMI Box[59]
encoding scheme, which allows the identification of a region of space using its geographic limits,
representing that information as a value string. The actual searching was done by adding a geoindex
table[60], with the coordinate information stored as floating point numbers, to the database, with an entry
for each item in the repository. The searching could have been done by searching through each of the
DCMI Location strings; however, this would have made the search relatively slow. The DCMI Location
was stored in order to have more complete metadata but isn't actually used in the demonstrator for
searching.
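To make the "value string" concrete, the following Python sketch encodes and parses a bounding box using the `northlimit`, `eastlimit`, `southlimit` and `westlimit` components of the DCMI Box encoding scheme. The function names are illustrative and the sample coordinates are invented; the sketch does not handle the escaping of special characters or the optional `units` and `projection` components.

```python
def encode_dcmi_box(north, east, south, west, name=None):
    """Encode a bounding box as a DCMI Box value string."""
    parts = []
    if name:
        parts.append(f"name={name}")
    parts += [
        f"northlimit={north}",
        f"eastlimit={east}",
        f"southlimit={south}",
        f"westlimit={west}",
    ]
    return "; ".join(parts)


def parse_dcmi_box(value):
    """Parse a DCMI Box value string into four floating point limits."""
    fields = {}
    for component in value.split(";"):
        if "=" not in component:
            continue
        key, _, raw = component.partition("=")
        fields[key.strip()] = raw.strip()
    return {key: float(fields[key])
            for key in ("northlimit", "eastlimit", "southlimit", "westlimit")}


encoded = encode_dcmi_box(55.99, -3.05, 55.89, -3.33, name="Sample area")
limits = parse_dcmi_box(encoded)
```

Parsing every stored string in this way at query time is the slower approach that the text says was avoided by keeping the coordinates as numbers in a separate table.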
Searching the geoindex is relatively straightforward: each row in the geoindex table stores a reference
to the DSpace item it represents and 4 coordinates (representing the N, E, S, W limits of its bounding
box). The coordinates from the Google Maps interface are fed to a Java servlet that does either an
"intersects" or a "within" search using SQL. The combined keyword and geospatial search first does
the regular keyword search; the geoindex rows for that result set are then searched to provide
the geospatial capability.
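The "intersects" and "within" tests on a table of bounding boxes can be sketched as follows. This is an illustration only: it uses Python with an in-memory SQLite database as a stand-in for the Java servlet and Postgres database of the demonstrator, and the table and column names (`geoindex`, `item_id`, `north`, `east`, `south`, `west`) and the sample rows are assumptions.

```python
import sqlite3

# A stored box intersects the query box if the two overlap on both axes.
INTERSECTS = "west <= :east AND east >= :west AND south <= :north AND north >= :south"
# A stored box is within the query box if it lies entirely inside it.
WITHIN = "west >= :west AND east <= :east AND south >= :south AND north <= :north"


def geo_search(conn, bbox, mode="intersects", keyword_hits=None):
    """Return item ids whose box matches bbox, optionally limited to keyword hits."""
    predicate = INTERSECTS if mode == "intersects" else WITHIN
    sql = f"SELECT item_id FROM geoindex WHERE {predicate}"
    params = dict(bbox)
    if keyword_hits is not None:
        if not keyword_hits:
            return []  # the keyword search found nothing, so nothing to filter
        names = [f"k{i}" for i in range(len(keyword_hits))]
        sql += " AND item_id IN (" + ", ".join(":" + n for n in names) + ")"
        params.update(zip(names, keyword_hits))
    sql += " ORDER BY item_id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geoindex (item_id INTEGER PRIMARY KEY, "
             "north REAL, east REAL, south REAL, west REAL)")
conn.executemany("INSERT INTO geoindex VALUES (?, ?, ?, ?, ?)", [
    (1, 56.0, -3.0, 55.8, -3.4),  # small box
    (2, 51.0, -1.3, 50.8, -1.5),  # small box elsewhere
    (3, 61.0, 2.0, 49.0, -8.0),   # large box covering both
])
query = {"north": 57.0, "east": -2.0, "south": 55.0, "west": -4.0}
```

With these rows, `geo_search(conn, query)` matches items 1 and 3, `geo_search(conn, query, mode="within")` matches only item 1, and passing `keyword_hits=[2, 3]` narrows the intersects result to item 3. Boxes that cross the 180 degree meridian are not handled.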
[59] This work is of value to the current JISC-funded Geospatial Data Application Profile to support a basic UK repository search project.
[60] The database used with DSpace was Postgres.
Appendix D – Second Demonstrator Questionnaire
Appendix E – Questionnaire on Informal Data Sharing
Appendix F – Use Case Template
All text in 10-point type refers to content areas. Within these areas, text in italics denotes variables that
need to be completed by the author. Normal text denotes fixed variables that should not be changed. The
use of bold denotes different selection options.
Authors
Author 1: Author name
Use case details
Title: Title of project/work
Date: Date
Application Area: Subject area of application
Summary: A researcher has received funding and wishes, or is required, to deposit output data from the project in a digital repository that can then be searched and accessed by other researchers.
Actors
Primary
Type: Researcher
Name: Name of primary actor
Goals: Goals for completing the work
Secondary
Type: End-user
Goals: Potential use of output data. Broad areas to include research, teaching, class, institution or personal.
Stakeholders
Ordnance Survey
Type: Creator or distributor or grant body
Goals: Sales or Licensing restrictions or Marketing or Advancement of research or Dissemination of data
Dataset Details
Dataset 1
Name: Dataset name
Owner: Dataset owner
Distributor: Dataset distributor
Licensing: © or Creative Commons or Public Domain; Annual or Perpetual
Processing: Quantitative (number of processes) or qualitative (number of processes)
Type: Raster type or vector
Area: Derived or original or presentation
Output Data
Type: Vector or raster
Format: File type
Descriptives
Context: Context for the generation of the dataset.
Processing: Processing performed
Key Points: Any key points raised concerning copyright/distribution issues.
References: List of references
Appendix G – Institutional Repository Questionnaire
GRADE will investigate and report on the technical and cultural issues around the reuse
of geospatial data within the JISC IE in the context of media-centric, informal and
institutional repositories.
A Work Package requirement is to carry out an audit of geospatial asset management
within institutional repositories. The survey below only takes five minutes to complete;
all completed responses will go into a prize draw for a £30 Amazon book voucher to be
drawn during the week commencing 13th February 2006.
For the purposes of this survey, geospatial data is defined as data explicitly containing
coordinate geometry, i.e. vector, raster, geo-referenced images, or text files containing x,y
coordinate values (e.g. electronic maps, geo-referenced imagery, satellite data, data
stored within a Geographic Information System (GIS)).
Survey Questions
1. If you have a repository, what software do you use?
2. Is your repository publicly available, if not who are your depositors and users?
3. Do you accept the deposit of geospatial datasets into your repository? If so how
much? If not do you plan to?
4. What special metadata fields do you offer in your repository to describe geospatial
data - how can users search geographically for geospatial data?
5. If you have geospatial data, do you...
Receive supporting documentation from the depositor?
Have guidelines on file format required?
QA the data at all?
Require a declaration of ownership / copyright from depositors?
Confirm if derived from other datasets?
Have agreements to deal with issues of liability from use of the data?
6. Generally, do you think that archiving and providing access to research data is
something institutions should do or specialist data centres?
7. What processes, if any, do you have in place to ensure that long term access to
research data will continue e.g. migration of data so that it remains readable?
8. Please list any other Institutional Repositories you know that are managing geospatial
datasets
Appendix H – Summary of Data Management Practices at Associate Partner Sites
Site 1
Data Policies
• There are no formal procedures for storing or archiving spatial data or outputs of research projects.
• There are no guidelines concerning data sharing, beyond any formal restrictions imposed on secondary data (e.g. OS licensing).
• There are no requirements for metadata or any formal contractual agreements.
Data Management
• The primary method of data storage is ad hoc, whereby academics store data on a mixture of work PCs, home PCs and removable devices (e.g. HDDs or USB). Some academics may also use network storage at the university.
• Neither PG nor UG research geospatial data is audited or stored in any manner, other than for UG dissertation students in GIS, where data deposition within Blackboard is a requirement.
• We are currently implementing metadata input as well and are keen to trial a repository for these students. Of more urgency is the need to manage PG data.
• We currently have a small amount of RC funded projects and therefore this data should have been deposited.
• Far more research (perhaps 70%) is derived from university and department funding, as well as indirect research utilising secondary data sources (e.g. OS, census).
• There is also corporate, European and Knowledge transfer funding, which accounts for a not insignificant amount of research.
• My impression is that very little research would have no data-sharing restrictions.
• Actual data usage must run well over 100Gb and possibly beyond 1Tb.
Data Sharing Methods
• Via email, Blackboard, FTP, CD/DVD, URL.
Data Sharing Patterns
• Data sharing occurs between colleagues within the department quite frequently, although it remains ad hoc.
• Very little data sharing occurs within the university.
• Moderate data sharing occurs with other institutions. This will primarily be research based and usually between project team members in institutions anywhere in the world.
• Data sharing will also occur with non-academic bodies, although this will be related to funded research projects and will usually involve the in-flow of data required for the project and out-flow in terms of final project outputs (e.g. Environment Agency, mining companies).
• Data sharing will occur with individuals who may contact department members “on spec”. Data sharing on this basis is much rarer but does happen when there are demonstrable gains to be made from data sharing.
Site 2
Data Policies
• There is no formal structure regarding the management of geospatial data.
• There is no formal framework regarding the sharing and exchange of geospatial data.
• Sharing guidance is informal and verbal regarding the IPR, especially for OS derived data.
• 2 departments interviewed state that their data sharing policy is effectively adhering to the University’s guidance over ethical responsibility, thereby only using data for which consent was obtained.
Data Management
• The university has an institutional repository (IR) for published academic papers. This is not used as a central depository for geospatial data. In general, there is the perception that an IR is unsuitable for depositing geospatial data because of its structure. Therefore, each department manages its own geospatial data.
• Data are commonly stored on a researcher’s PC hard drive, or portable hard drive, and access is protected through the university’s generic security system (username and password).
• Geospatial data used or developed in postgraduate research is not commonly stored, only the thesis or abstract.
• In terms of preventing data loss, each researcher is normally responsible for his/her data. Some researchers consider that depositing data with a data centre is a sufficient disaster recovery strategy.
• One department set up an informal library of geospatial data (on CD/DVD) to capture data acquired through research grants, derived/generated from research, obtained free of charge or supplied with software. However, confidential or restricted licence data remain with the principal researcher.
• Another department has acquired a digital data server (DDS) with a capacity of 900GB to manage geospatial data. It has 2 aims: (i) to provide secure storage for geospatial data; (ii) to improve access within the department, across the university and to external partners via FTP. Access to the DDS is restricted to approved users.
• Data volumes stored are in the range of 5GB to 60GB with at least 90% of data derived from primary data
Data Sharing Methods
• Internal data sharing is mainly with co-researchers at the university by DVD/CD or shared access hard drives, or USB memory sticks, and with research students by DVD/CD or e-mail.
• FTP to a central server.
• Researchers tend to use Microsoft Windows Explorer to search for their data.
Data Sharing Patterns
• The sharing of geospatial data is dependent on the research undertaken; however, data are commonly shared, internally and externally, through informal networks within and outside a department.
• External data sharing practices are with project owners or clients, with subject specific data centres, and with consortium members, e.g. when sharing with BODC, NERC policy guidelines are followed.
• Respondents acknowledge the benefits that a GIS user group could bring to data sharing.
Site 3
Data Policies
• There are little or no formal methods of sharing geospatial data.
• There are no formal sharing guidelines.
• Certain research endeavours may have their own specific data submission stipulations (e.g. depositing data with NERC as part of funding requirements), but nothing similar is in place at the Institute level.
Data Management
• Datasets held are predominantly used for teaching purposes and are mostly located in a central read-only repository. The repository is managed by the IT helpdesk, which allocates write permissions on a folder by folder basis. Users with an account on the local server may access the data by mapping a drive in Windows or navigating via the command line in UNIX (e.g. net use). There is not a well defined folder structure: depositors create a sub-folder file system within the net data location based on a variety of schemes, e.g. by year, by location, by relevant semester and practical session in which the data is due to be used. Undocumented legacy datasets of uncertain ownership and origin also exist in the Institute, most of which are stored on CD-ROM. Such resources need to be documented and catalogued; however, allocating responsibility and assigning priority is not straightforward. There is no main index or catalogue of the data held in the central repository: users are expected to find the required data themselves, or solicit guidance from the relevant data depositor, who is not immediately apparent to those new to the Institute. Students using data during practicals are informed of its location at the start of each session. No universal mechanism for searching the repository exists outside those facilities provided by proprietary software, e.g. ArcCatalog.
• Individual data holdings are mostly undocumented; metadata is sparse at best. When such metadata items exist, they are either minimally populated or are the default files generated automatically by the data’s host proprietary application (and are hence incomplete).
• The results of research by students are available but not readily accessible. Data may accompany academic submissions (e.g. on CD-ROM) but such a practice is not compulsory unless specifically requested by the relevant supervisor. Once students (or researchers) depart the Institute, individual accounts are backed up into an archive and removed from the ‘live’ collection of accounts. Specific files or folders may be retrieved by IT personnel for others wishing to conduct further analysis on archived findings; this however necessitates logging a call with IT help and may take up to two weeks to execute. There is however de facto support for recording time-variant datasets, as archiving of current Institute accounts is carried out on a daily basis (although requests are subject to similar retrieval delays unless in case of emergency).
Data Sharing Methods
• Commonly used data exchange methods include sending datasets by email (as zipped attachments, or uncompressed if not particularly large), via WebCT, by URL, or passed by hand / posted on CD-ROM or pen drives.
• Example of sharing between staff, researchers and students using a shared network drive.
Data Sharing Patterns
• There is little evidence of data passed outside the Institute apart from those datasets produced and deposited with funding bodies.
• Due to the ambiguity / legalese of licensing agreements, respondents of the current report expressed a reluctance to share data outside the School of GeoSciences. One respondent said that if they had to provide data, it would be predominantly images (rectified geo-tiffs) with standard human-readable metadata (destination-compliant format), compressed in a non-lossy format such as tar/gzip.
• There is uncertainty as to whether an Institute / funded researcher or group is entitled to certain licensed data.
• The Institute does occasionally generate its own data (aerial (e.g. blimp) photography, GPS survey, etc.) but these datasets by and large remain within the domain of the collecting researcher.
• There is little coordination between research groups; similarly there is little awareness of what data does exist within the building (the shared repository aside).
Site 4
Data Policies
• There are no given guidelines about how data need to be documented and in what format they have to be delivered.
Data Management
• The School does not have a central storage facility where all datasets can be uploaded or downloaded when necessary.
• Every researcher stores his/her own data locally on his/her desktop or on an external hard drive. Some shared sample datasets, mostly used for teaching purposes, are stored in shared locations on servers, where they can be accessed from departmental computer rooms. Unfortunately the data are mostly very badly documented. However, it is possible to request more information about the datasets from the person who has deposited the data on the shared drive (the data are usually in a directory that is labelled with the instructor’s name).
• It is not compulsory for Masters/PhD students to submit well-documented datasets as part of their dissertations. Therefore, if somebody else would like to continue with the research or reuse some of the generated data, they need to contact the author of the work directly or his/her supervisor for more information.
• The exchange of geospatial datasets is mostly done on an ad hoc basis depending on the volume of the data.
Data Sharing Methods
• Teaching material is centrally distributed via WebCT; however, bigger datasets are stored on a local server.
• The department used to have a Unix machine upon which such datasets were stored, with uploads and downloads being undertaken using FTP. This machine has now been decommissioned and replaced with a departmental Windows server that permits file transfers using drag and drop.
• Popular ways of sharing research related geospatial information are via email, or using external hard drives, or burning CD and DVD disks.
Data Sharing Patterns
• The main problem of acquiring geospatial data is closely connected with a general unwillingness of people to share and very poor documentation of the datasets that are provided for sharing. Therefore, acquiring and reusing geospatial data can often be problematic.
• At the School, researchers tend to collect and create their own datasets, which is thought in most cases to be easier and faster than looking for similar existing datasets that are held locally and which can be reused.
Appendix I – Survey results identifying key barriers to data sharing and key elements for improving
data sharing (extracted from http://edina.ac.uk/projects/grade/status/workpackage2.html)
Figure 2: Barriers to sharing geospatial data
[Bar chart. Axes: Issues; Responses (0 to 90).]
Figure 3: 'Wish list' of how to make geospatial data sharing easier
[Bar chart. Axes: 'Wish list' items; Scores (0 to 200). Items: Other; A full list of sources and informal methods to share geospatial data; Faster computer network speeds; More control over, or recognition of, your work; Geography forum similar to napster or myspace; University/departmental repository; Central find/locator portal; National geospatial repository; Less restrictive or no licence agreements.]