Project Document Cover Sheet

Project Acronym: Streamline
Version: 1.3
Contact: Janet Finlay
Date: 31 March 2009
Project Information
Project Acronym: Streamline
Project Title: Integrating Repository Function with Work Practice: Tools to Facilitate Personal E-Administration
Start Date: 1 March 2007
End Date: 31 March 2009
Lead Institution: Leeds Metropolitan University
Project Director: Professor Janet Finlay
Project Manager & contact details: John Gray, 45 Snowdon Road, Eccles, Manchester, M30 9AS.
    email [email protected]
    Home: 0161 789 3971
    Mob: 07931 674450
Partner Institutions: None
Project Web URL:
    Web site: http://streamline.leedsmet.ac.uk
    Project Blog: http://www.streamlineproject.org
Programme Name (and number): Users and Innovation: e-administration
Programme Manager: Lawrie Phipps
Document Name
Document Title: Final Report
Reporting Period: 1 March 2007-31 March 2009
Author(s) & project role: John Gray – Project Manager
Date: 30 March 2009
Filename: FinalReportStreamlinev1.1.doc
URL: (if document is posted on project web site)
Access: Project and JISC internal; General dissemination
Document History
Version   Date              Comments
1.0       8 January 2009    Draft project final report
1.1       30 January 2009   Revised to include latest results and respond to feedback on draft
1.2       30 March 2009     Revised to include feedback from team
1.3       31 March 2009     Final version for JISC
Users and Innovation Programme
Streamline
Integrating Repository Function with Work Practice:
Tools to Facilitate Personal E-administration
Final Report v1.3
March 2009
Janet Finlay (Project Director, Leeds Metropolitan University)
John Gray (Project Manager, Leeds Metropolitan University)
Dawn Wood (Project Officer, Leeds Metropolitan University)
Project Contact:
Professor Janet Finlay
Technology Enhanced Learning Team
Leeds Metropolitan University
Civic Quarter
Leeds
LS1 3HE
Table of Contents
Table of Figures
Acknowledgements
Executive Summary
1 Background
2 Aims and Objectives
3 Methodology
4 Implementation
4.1 Early preparatory work
4.2 Automatic metadata generation
4.3 Enhanced search tools
4.4 Resource management
5 Outputs and Results
5.1 Reports and documentation
5.2 Automatic metadata generation tool
5.2.1 Keyword extraction algorithm
5.2.2 Limitations
5.3 Enhanced search components
5.4 Use cases and e-Framework
6 Outcomes
6.1 Achievements against aims and objectives
6.2 Impact on stakeholders
6.3 Lessons learned
7 Project Management
8 Conclusions
9 Implications
10 Recommendations
11 Bibliography
Table of Figures
Figure 1: Streamline product architecture
Figure 2: Query extension
Figure 3: Iterative result reuse
Figure 4: Collaborative search via profile matching
Figure 5: Collaborative search via document matching
Figure 6: Preference panel (left) and keyword editor (right)
Figure 7: Upload panel (left) and View panel (right)
Figure 8: Collaborators Edit and Selection panels
Figure 9: Search component overview
Figure 10: Detail of the Query Extender
Figure 11: Enhanced search interface
Acknowledgements
The Streamline project was funded by JISC under the Users and Innovation programme, in its e-administration strand.
In addition to the core project team (authors), the project drew on a large team of staff at Leeds
Metropolitan University who each contributed to the project through software development, user
engagement and evaluation. These were Mark Dixon, Elizabeth Guest, Stuart Hirst, Sanela Lavareski,
Tony Renshaw, Meg Soosay, and Jill Taylor. In addition, we were able to draw on the expertise of two
former members of Leeds Metropolitan University staff: Rodney Brunt for the area of information
retrieval and classification, and John Heap for the e-Framework. Ben Ryan from Kainao Ltd contributed
to our early work on repositories and with the e-CAT tool.
We also acknowledge the contribution of the Leeds Met Repository/PERSoNA team: Wendy Luker,
Nick Sheppard and Mike Taylor. The close collaboration between the three Leeds Met repository
projects (An Institutional Repository for Leeds Metropolitan University; PERSoNA and Streamline)
proved to be invaluable.
Leeds Metropolitan University and Belfast Metropolitan College staff contributed greatly to the project
by taking part in focus groups and interviews that explored their workflows, and in evaluations of
successive iterations of the tools.
Finally the project team would like to acknowledge the support of Leeds Metropolitan University senior
staff, in particular Professor Sally Brown and Dr Barbara Colledge; the useful insights of our JISC
Critical Friend, Professor Peter Hartley; and the guidance of the programme manager, Lawrie Phipps.
Executive Summary
The project aimed to integrate the functions associated with the use of repositories into the day-to-day
practice of staff. Initially the intention was to do that through embedding tools within proprietary
software such as MS Word, building on existing software developed by our partner company Kainao
Ltd. Our early work therefore focused on the evaluation of these existing tools and analysis of how
they are used, as well as investigating potential repository solutions and initiating the development of
a simple test repository (as none was available in the institution). However, two key events changed
the direction of the project. Firstly, Leeds Metropolitan University (Leeds Met) initiated a process,
supported by the JISC repositories programme, to establish a repository for Leeds Met. This meant
that any work we did on the project needed to be compatible with that repository and we became
involved in the decision-making process for this. Secondly, our partner company Kainao Ltd went into
liquidation leaving us unable to continue to develop their tools. The project’s aims and objectives were
therefore reviewed and a revised project plan and work packages were approved by JISC.
Our initial work was not wasted, however. By capturing case studies and developing narrative
scenarios of user interaction, we had discovered that members of staff at Leeds Met were unfamiliar
with learning objects and metadata, and that those objects that were being developed tended to be
multimedia artefacts. Our original focus on embedding tools in Word therefore seemed limited. We
adapted our plans to develop generic desktop and web-based tools to support repository functions,
and to increase the focus on staff development activities and support to increase awareness and
encourage use. The tools developed include an automatic metadata generation tool that completes
as much of the metadata as possible, from documentation associated with a learning object, including
suggesting key words to the user; and resource discovery tools, which recommend additional
resources based on closeness of objects to the original search results. In addition, we contributed to a
variety of widgets, developed with the PERSoNA project, to demonstrate the use of social networking
tools to promote sharing of resources through the repository.
In the early phases of the project, we attempted to apply a UIDM-based rapid iteration approach to
development, using electronic paper prototyping tools to develop early tool designs. This was
successful in evaluating interface issues surrounding existing tools. However, once the project was
forced to develop a number of tools from scratch by the liquidation of its initial partner company
Kainao, it became clear that the short iterations adopted by this approach were unsuited to the
combination of problem investigation and software tool development. We therefore adapted our
development approach to allow a much longer software development cycle, which was informed by
regular user interaction, through staff development events, focus groups, interviews and
questionnaires.
In spite of the difficulties in the early part of the project, it has been successful in identifying and
starting to address some key issues relating to the use of repositories at Leeds Met and more widely.
Due to the very recent adoption of repository technology, Leeds Met staff have limited experience of
using repositories. Their knowledge of metadata, both practical and theoretical, is consequently very
limited. They are also, in general, suspicious of the concept of sharing learning objects. Many staff
have little engagement with Web 2.0 tools currently, and those who do operate in small local groups
with existing sharing mechanisms. This is in contrast to many staff at our partner college Belfast Met,
who have a longer history of using repositories. Here staff have an expectation that learning
resources will be shared. However, although they use packaging tools, they still do not generally
supply metadata, and management of sharing is handled by a specialised team of staff. Staff
development is a key element in explaining the difference in attitude: new staff induction and training
for existing staff includes an expectation of sharing resources, so that it is considered part of normal
work practice there. Our work has therefore been as much about raising awareness of the repository,
the concept of learning objects and metadata, and Web 2.0 tools for sharing information, as it has
been about integrating those tools into the repository.
The project has achieved its aims of creating an automatic metadata generation tool, resource
discovery tools and encouraging sharing. It has successfully raised staff awareness in the use of the
repository, and evaluated the tools internally and with partners. It also informed the University’s
decision on the choice of repository and its ongoing development and deployment.
1 Background
Repositories have the potential to enhance productivity and quality through effective management of
assets relating to learning and teaching, research, policy and decision-making. However, unless their
use is integrated into existing work practices there is a danger that they will increase rather than
decrease the overall workload of staff and ultimately therefore be under-utilised. The activities of
managing assets and finding resources, though pivotal to the successful deployment of repositories,
are viewed by staff as adding to their administrative overload. The Streamline project therefore aimed
to develop tools and processes that would integrate with existing staff work practice to support the
use of repositories.
The project built on our prior experience of learning object repositories and tools in the HLSI
(Higher Level Skills for Industry) project, funded by Yorkshire Forward; the EU-funded projects
eDILEMA (E-resources and Distance Learning Management, 90683-CP-1-2001-1-CZ-Minerva-M) and
REPLIKA (European Repository for Learning Innovation and Knowledge Acquisition); and the
current HEFCE-funded Centre of Excellence in Teaching and Learning, Assessment and Learning in
Practice Settings (CETL ALPS) repository project. Each of these has contributed to our understanding
of repository tools and their use. The JISC-funded CD-LOR project identifies four barriers to utilisation
of learning object repositories (Margaryan et al., 2006): socio-cultural, pedagogic,
organisational/management and technological. Solutions to these problems are complex but,
according to the findings of the CD-LOR project, they must take better account of user communities
and context and ensure that the repository is seen as part of the normal context of work rather than as
an isolated tool. This concurs with our own experience with HLSI, eDILEMA and REPLIKA where
reluctance to fully use the repository was due to the perceived (and actual) overheads associated with
its use. This was a key issue addressed by the project both through tool development and raising staff
awareness and interest.
2 Aims and Objectives
The project had the following aims and objectives:
Aim: to develop integrated tools to alleviate the additional administration associated with the use of
institutional repositories for assessment, learning and teaching, informed by understanding of existing
work practices.
Objectives:
1. Review previous work on integrated tools, in particular drawing together the findings of previous
projects (REPLIKA, HLSI, CETL ALPS).
2. Evaluate existing tools for integrating repository functions with existing work practice e.g. e-CAT
(REPLIKA’s integrated tool) with users from different disciplines and HE and FE sectors.
3. Examine existing work practices surrounding activities (such as developing course materials) that
might be expected to contribute to or draw on the contents of an institutional ALT repository.
4. Analyse the disciplinary and sector differences in work practice in the use of integrated tools and
repositories.
5. Develop scenarios, domain models and service usage models representing these activities.
6. Explore and evaluate a range of mechanisms for metadata creation, such as automatic metadata
generation, semantic processing, pattern matching, and metadata composition and develop
algorithms and processes for their use. Explore the role of professional indexers in metadata
creation.
7. Explore a range of search strategies, such as keyword and metadata search, browsing, thematic
searching, full text indexing, relationship measures (nearness, distance), and recommendation.
Evaluate against recognised information retrieval measures.
8. Explore the use of ePortfolios and social networking tools such as del.icio.us to facilitate personal
management and sharing of repository resources.
9. Iteratively develop and evaluate tools for repository e-administration, integrating with both
proprietary and open source software, to reduce personal administrative load.
10. Disseminate findings to evaluate tools more widely.
There were some changes to the emphasis of the aims and objectives when the project plan was
revised. In particular, we focused more on objectives 6 and 7 rather than 8, which was part of the
remit of the PERSoNA project. We also changed our expectations for some, in the light of changes to
work practices. For example, for objective 9, we looked at standalone and web services rather than
embedded tools. We revisit the aims and objectives in section 6.1 when we consider the outcomes
and achievements of the project.
3 Methodology
The project used an iterative and participatory methodology, engaging users at regular points in the
design and decision-making process. This was based on the principles of the User Innovation and
Development Model (UIDM) for user engagement (Fowler & Scott, 2007): a high level of user
interaction through workshop activities and regular feedback on emerging tools. These workshops
enabled us both to capture requirements, which were represented in use cases and specifications,
and to evaluate the tools as they developed. We engaged in at least five cycles of iteration, moving
through capturing requirements, to executable “paper” prototypes using the tools Gabbeh (Naghsh &
Dearden, 2004) and Denim (Lin et al., 2000), then to a series of software prototypes. Each iteration
was presented to users and feedback informed the next design stage. This process worked well for
the interface elements but proved less successful for the functional development due to the complexity
of the underlying algorithm.
Evaluation was done using a range of methods including questionnaires and interviews, eye tracking
and focus groups. We worked with different user groups, drawn from a wide constituency, making use
of staff development events and special evaluation events to recruit a broad range of potential users.
A key user group we worked with was the Podcasting Pilot project in order to explore the
requirements of multimedia learning objects. In addition we consulted users in the Faculties of Health,
Carnegie and Innovation North, as well as academics and learning technologists from Belfast
Metropolitan College. User engagement and evaluation questionnaires are available from
http://www.streamlineproject.org.
An important element that emerged as the project progressed was the criticality of working closely
with the other two repository-related JISC projects at Leeds Met, namely the Institutional Repository
project and PERSoNA, which began in 2007 and 2008 respectively. These projects had
complementary aims, with Streamline and PERSoNA converging markedly in the area of supporting
sharing resources and personal management, though Streamline was focused on learning objects
and PERSoNA on research outputs. It was decided that rather than duplicate effort between these
two projects we would combine forces on this aspect and work together to develop applications. This
element is reported in full in the final report for the PERSoNA project (Luker & Sheppard, 2009).
The decision of the University to acquire an Intralibrary repository was also significant as it meant
from that point we were able to work with a real repository and engage users with a new institutional
system. This has enabled us to widen our catchment beyond the originally planned user groups to the
wider institution. However, the delay in getting to this point meant that we were unable to make as
much progress with integration as we had intended. Although the Intralibrary repository was installed
and available for experimentation by a limited number of staff in early May 2008, there have been
several technical issues that have hampered its more widespread use. These include issues
concerning limits to the size of documents that can be deposited via the desktop and problems in
accepting separate XML files containing metadata for objects.
Interoperability was an important principle underlying all our development. Although the tools have
been developed to be specifically compatible with the institutional repository, Intralibrary, we adopted
Dublin Core/LOM (Learning Object Metadata) to support interoperability of the tools with other
repositories. Tools have been developed or translated into Java to allow easy integration and the
development of a Java API.
4 Implementation
Project activity can be considered in two parts. Initially we intended to extend tools already developed
by our partner company, working with a test repository established by them. The early work on the
project therefore focused on modelling user workflows, evaluating existing tools in preparation for
redevelopment and developing a test repository for our work. However, approximately eight months into
the project, the partner company folded and we had to re-evaluate our approach. It rapidly became
clear that we were not going to be able to continue work on redevelopment of the existing tools, so a
decision was made to develop new tools from scratch. At the same time, Leeds Metropolitan
University was undergoing an evaluation of potential repository systems with a view to acquiring an
institutional repository. This also influenced our decision-making, as we now had to ensure our work
was compatible with the chosen system, rather than working on a test repository. In addition, in
January 2008, another JISC project was funded at Leeds Metropolitan University (PERSoNA), which
was considering issues of sharing and management of resources relating to research outputs in the
repository. It was decided to work with PERSoNA in relation to this area (objective 8) rather than
continuing independently, and instead concentrate development effort on automatic metadata
generation and enhanced search. These changes were included in a full revision of the project plan,
which was approved by JISC in early 2008. We will therefore report on project activity in four sections:
early preparatory work, metadata generation, enhanced search, and personal resource management.
4.1 Early preparatory work
The early work of the project focused on three areas: modelling user workflow relating to learning
object development and repository use; evaluating the e-CAT tool, a packaging tool which is
embedded in MS Word and is used to create learning objects with metadata (produced by Kainao Ltd;
for a guide see Soosay, 2007); and developing a test repository for our project. While these activities
were superseded by later developments (see above), this was valuable preparatory work and
produced a number of reports, which may be of broader interest to the community. It also produced
the first of a series of use cases describing workflows associated with learning object creation and
repository use, which have been useful throughout the project and are available for other projects.
Working with staff at Kainao Ltd and their e-CAT product provided opportunities for producing
workflows and case studies of staff depositing learning objects into a repository in the early stages of
the project. Although Streamline is focused on learning objects, the institutional repository is designed
to hold other types of objects as well such as research outputs and multimedia materials. Given the
wide range of potential activities and objects that could be reviewed and modelled, it was important to
identify a specific user group and the podcasting user group was identified as an appropriate
candidate. A major podcasting pilot at Leeds Met was just starting which included users from across
the University and the Regional University Network, covering a large range of topics and styles of
podcast. The multimedia nature of these learning objects raised interesting issues for automatic
metadata generation, since the main content was not textual. In addition, we worked with academic
staff who were using e-CAT or similar tools to generate learning objects.
We held initial user group meetings with staff who were exploring podcasting throughout the
university. Academic staff and learning technologists produced scenarios illustrating both creation and
use of podcasts and other learning resources. We were particularly interested in the assets they used,
whether they used materials in repositories (formal or informal) at present, and whether they thought
about the issues of metadata, finding objects, and sharing objects at all. These scenarios resulted in a
collection of text, video and audio descriptions from which use cases were produced. It was
immediately clear that none of the staff had considered the inclusion of metadata and that even those
who were familiar with the concept and its importance had not considered its capture within their
project workflows.
In June 2007 the project undertook an evaluation of the e-CAT tool in order to determine how best to
support staff in identifying and applying metadata. The evaluation took the form of:
1. A detailed structured evaluation of e-CAT with a small number of individuals, using eye
tracking, observation and post-task interviews to explore issues.
2. An in-depth semi-structured interview with an active self-taught user and tutor of e-CAT.
3. A web-based distribution of e-CAT evaluated by online questionnaire.
This produced a set of recommendations for the enhancement of e-CAT, which ultimately informed
the design of the new automatic metadata generation tool.
As Leeds Met did not have an institutional repository at the start of the project, Kainao Ltd. set out to
create a simple repository using MySQL and Moodle, to provide staff with some experience and
understanding of the process of depositing objects into a repository. One characteristic of this
temporary repository was the need to ‘package’ any objects that were to be deposited i.e. not only
capture relevant metadata but also bind the object and its metadata into a compound item that could
then be deposited in the repository. This functionality had been provided by e-CAT but was a potential
problem when we moved to developing a new tool. This was resolved when the University acquired
the Intralibrary institutional repository, as this functionality is built into the system and we were
therefore able to concentrate on the metadata and search functions.
Using the information and feedback gathered up to March 2008 the team began work on the design
and development of an automated metadata generation tool and enhanced search tools. An overall
architectural view of the main workflow packages that were built as part of the Streamline project is
presented as a conceptual architecture model that identifies the key elements, as shown in Figure 1.
4.2 Automatic metadata generation
The automatic metadata generation prototype was developed to ease the process of depositing
learning objects within a repository. It has been designed as a desktop tool, to be used with Intrallect’s
Intralibrary repository. However, the prototype generates Learning Object Metadata (LOM) compliant
metadata using an XML binding. This can then be uploaded to any similar compliant repository.
It implements three concepts to automatically generate a subset of the LOM specification using an
XML binding. These were identified through the earlier user engagement activities and through
examining the LOM structure used by e-CAT. The concepts and their associated LOM fields are as
follows:
1. Preferences: write once, use many times. Use default settings for personal details and those
fields that can be persistent from one learning object to the next. These currently include
language of end user and content, version, status, current user as contributor to content and
metadata, learning resource type, context, difficulty and typical learning time.
2. Collections: write/extract once use by many. Use selection lists predefined within prototype,
extracted from existing organisational data and created by retaining data previously used by
individual users. These are saved separately as global and personal collections. These
currently include collections for content and metadata contributors, copyright licenses, and
technical specifications for the required operating system and/or browser plug-in
requirements.
3. Content extraction: utilise existing documents. By extracting content from a variety of textual
documents, the fields for keywords, title and description can be populated.
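
As an illustration of how the three concepts above could feed a single LOM record, the following is a minimal sketch and not the project's own code (the report states that the tools were developed in or translated into Java; Python is used here only for brevity). The function `build_lom`, the `PREFERENCES` dictionary and the sample values are hypothetical. The namespace is the one used by the IEEE LOM XML binding, but the output is not validated against the schema and covers only a handful of fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the IEEE LOM XML binding.
LOM_NS = "http://ltsc.ieee.org/xsd/LOM"

# Hypothetical stored preferences (concept 1): entered once, reused for every object.
PREFERENCES = {"language": "en", "status": "draft", "difficulty": "medium"}


def build_lom(preferences, title, description, keywords):
    """Return a LOM-style XML string from stored defaults plus extracted content."""
    ET.register_namespace("", LOM_NS)

    def sub(parent, tag, text=None):
        element = ET.SubElement(parent, "{%s}%s" % (LOM_NS, tag))
        if text is not None:
            element.text = text
        return element

    def langstring(parent, tag, text):
        # LOM wraps human-readable text in <string language="..."> elements.
        holder = sub(parent, tag)
        sub(holder, "string", text).set("language", preferences["language"])
        return holder

    def vocabulary(parent, tag, value):
        # Vocabulary fields carry a source/value pair.
        holder = sub(parent, tag)
        sub(holder, "source", "LOMv1.0")
        sub(holder, "value", value)
        return holder

    lom = ET.Element("{%s}lom" % LOM_NS)

    # Concept 3: title, description and keywords come from content extraction.
    general = sub(lom, "general")
    langstring(general, "title", title)
    sub(general, "language", preferences["language"])
    langstring(general, "description", description)
    for keyword in keywords:
        langstring(general, "keyword", keyword)

    # Concept 1: persistent fields come from the user's preferences.
    vocabulary(sub(lom, "lifeCycle"), "status", preferences["status"])
    vocabulary(sub(lom, "educational"), "difficulty", preferences["difficulty"])

    return ET.tostring(lom, encoding="unicode")


if __name__ == "__main__":
    print(build_lom(PREFERENCES, "Introduction to podcasting",
                    "A short guide for teaching staff.", ["podcast", "audio"]))
```

Collections (concept 2) would slot in the same way: a selection list supplies the value, for example a contributor or a copyright licence, and the builder writes it into the matching LOM element.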
Appropriate elements for metadata generation were identified by examining university documents for
potential keywords. The documents ranged from web pages about departments, through course and
module specifications, to textual learning resources. By organising them into four groups –
organisation level, department level, course level and module level – the flow of keywords was
modelled. This showed that the relevant keywords at the higher levels were fewer and more generally
representative than those at the level of learning materials. Alongside this, techniques used in
automatic summary generation, for both single documents and collections, were also reviewed. Initial
work focused on using information stored in Word 2003 format documents, as Leeds Met currently
uses MS XP and Office 2003 as its institutional standard. It had been hoped that this would be
extended to other document formats by the end of the project, but this was not possible due to time
constraints. However, the tool provides proof of concept for this approach.

Figure 1: Streamline product architecture

The aim of the extraction process is to generate a set of potential keywords from the documents
supplied by the user. Early tests on a corpus of student essays showed that topics are easy to
distinguish. The resulting set of potential keywords is then presented to the user, who can select
those they think most appropriate or add new words. The intention is to support the user
without removing their ability to amend or extend the set of keywords.
The final version of the automatic metadata generation tool was tested at a Repository Awareness
Day held in early November 2008. Participants included staff who had taken part in earlier evaluations
of e-CAT and the paper prototypes. Feedback from these staff was very positive: they were impressed
both by the application and by how it had incorporated the suggestions made when discussing the
difficulties with e-CAT. Others were impressed with the keyword generation process and interested in
the method of text extraction we had used. Overall the response was good. Participants were
generally inexperienced in metadata creation, although two were producing learning objects. Reports
on the evaluation are available at http://www.streamlineproject.org.
4.3 Enhanced search tools
In parallel with engaging with user groups on metadata generation, other project team members
investigated approaches to enhanced resource discovery. The intention was to examine possibilities
for serendipitous resource discovery: much as, having found a book through a library catalogue, you
often find useful texts in nearby shelf locations that were not flagged by the original search.
The following four approaches were investigated:
Method 1: Query extension. Take the user’s search terms and create alternative versions of them using
a thesaurus. Rather than running a single search, run one search for each version and present the
combined results as a visualisation. This is illustrated in Figure 2.
Figure 2: Query extension
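As an illustration of query extension, the following is a minimal Python sketch rather than the project's implementation. The thesaurus contents, the function names and the `search` callable are all hypothetical; the limit of four alternatives follows the figure given for the prototype later in this report.

```python
from typing import Callable, Dict, List

# Hypothetical thesaurus: each term maps to a ranked list of synonyms.
THESAURUS: Dict[str, List[str]] = {
    "assessment": ["evaluation", "appraisal", "examination", "grading", "marking"],
}


def extend_query(term: str, thesaurus: Dict[str, List[str]],
                 max_alternatives: int = 4) -> List[str]:
    """Return the normalised term followed by up to max_alternatives synonyms."""
    key = term.strip().lower()
    return [key] + thesaurus.get(key, [])[:max_alternatives]


def extended_search(term: str, search: Callable[[str], List[str]],
                    thesaurus: Dict[str, List[str]],
                    max_alternatives: int = 4) -> Dict[str, List[str]]:
    """Run one search per query variant and keep the results grouped by variant."""
    return {query: search(query)
            for query in extend_query(term, thesaurus, max_alternatives)}
```

Keeping the results grouped by variant, rather than merging them, lets an interface show the original and extended results separately. The sketch handles single-term queries only.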
Method 2: Iterative result reuse. Get the initial results from a search, extract keywords (or other
metadata) from the returned documents, and use these to find related documents. Extracted keywords,
and those found in subsequent passes, should be standardised using a thesaurus. Each set of search
results is used to submit new searches, and the same process is followed recursively, removing
documents that have already been returned. Each iteration will return documents less closely related
to the initial search. This is illustrated in Figure 3.
Figure 3: Iterative result reuse
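Iterative result reuse can be sketched as a breadth-first expansion, shown here in Python under stated assumptions. The `search` and `keywords_of` callables are hypothetical stand-ins for the repository query and the metadata lookup, and thesaurus standardisation of keywords is omitted for brevity.

```python
from typing import Callable, Dict, List, Set


def iterative_search(query: str,
                     search: Callable[[str], List[str]],
                     keywords_of: Callable[[str], List[str]],
                     max_depth: int = 2) -> Dict[str, int]:
    """Return {document id: pass in which it was first found}.

    Pass 0 is the user's own query; each later pass searches on keywords
    taken from the documents found in the previous pass.
    """
    found: Dict[str, int] = {}
    seen_queries: Set[str] = {query}
    frontier = [query]
    for depth in range(max_depth + 1):
        next_frontier: List[str] = []
        for q in frontier:
            for doc in search(q):
                if doc in found:
                    continue  # already returned by an earlier search
                found[doc] = depth
                for keyword in keywords_of(doc):
                    if keyword not in seen_queries:
                        seen_queries.add(keyword)
                        next_frontier.append(keyword)
        if not next_frontier:
            break
        frontier = next_frontier
    return found
```

Recording the pass number gives a simple measure of how far a document is from the initial search, which could be used to rank or visually separate the less closely related results.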
Method 3: Collaborative search via profile matching. Identify all searches performed by a user in a
particular session, i.e. build a profile of that user’s searches. Cross-reference this with the
searches performed by other users in their sessions, and identify different searches that commonly
occur within the same session. This should give a list of related resources that may be of interest to
users who search for the same type of things (even if the keywords, content or metadata of the
documents are completely unrelated). The more sessions in which a set of searches occurs together,
the more likely it is that they are related in some way. This is illustrated in Figure 4.
Figure 4: Collaborative search via profile matching
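One way to realise profile matching is to count how often pairs of searches occur in the same session. The Python sketch below assumes a hypothetical session log (a list of search-term lists) and a hypothetical `min_sessions` threshold; neither is specified by the project.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def co_occurrence(sessions: Sequence[Sequence[str]]) -> Counter:
    """Count, for every pair of distinct searches, the sessions containing both."""
    counts: Counter = Counter()
    for session in sessions:
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return counts


def related_searches(query: str, sessions: Sequence[Sequence[str]],
                     min_sessions: int = 2) -> List[Tuple[str, int]]:
    """Searches that shared at least min_sessions sessions with query, strongest first."""
    related = []
    for (first, second), n in co_occurrence(sessions).items():
        if n >= min_sessions and query in (first, second):
            related.append((second if first == query else first, n))
    return sorted(related, key=lambda item: (-item[1], item[0]))
```

The threshold reflects the observation that searches are more likely to be related the more sessions they share; a single shared session is treated as coincidence.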
Method 4: Collaborative search via document matching. Whenever a search is done, cache the
returned documents. When a new search is done, take each returned resource and find all caches that
also contain it, then return the contents of those caches as secondary results. Again this
could be done recursively. This is illustrated in Figure 5.
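Document matching can be sketched with an in-memory cache of past result sets. The class and method names in this Python example are hypothetical, and a real service would need persistent storage and a policy for expiring old caches.

```python
from typing import Iterable, List, Set


class ResultCache:
    """Remembers the set of documents returned by each past search."""

    def __init__(self) -> None:
        self._caches: List[Set[str]] = []

    def record(self, results: Iterable[str]) -> None:
        """Store the documents returned by one search."""
        self._caches.append(set(results))

    def secondary_results(self, results: Iterable[str]) -> List[str]:
        """Documents that shared a past result set with any current result."""
        primary = set(results)
        secondary: Set[str] = set()
        for cached in self._caches:
            if cached & primary:  # this past search returned one of our documents
                secondary |= cached - primary
        return sorted(secondary)
```

A single pass is shown; the recursive variant would feed the secondary results back into `secondary_results` until no new documents appear or a depth limit is reached.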
The development of the enhanced search prototype focused on two objectives:
1. To enable testing of extended search methods with LOM metadata
2. To enable service-based architecture to allow flexibility in use of the tool.
The first of the enhanced search methods, query extension, was selected for implementation in the
prototype enhanced search tool. As there were very few learning objects within the repository,
preliminary tests were conducted with the research content, which has significantly different
metadata. We have therefore not been able to extend the range of the metadata used to include more
interesting educational aspects.
Figure 5: Collaborative search via document matching
4.4 Resource management
The resource management and sharing aspects of the project were investigated early on through
evaluation of a number of potential tools to provide this functionality, within a test repository. The
proposed test repository was built using MySQL and Moodle, a widely used open source eLearning
course management system, which offers a wide range of features enhanced by plugins produced by
the open source community, including ePortfolio functionality. We also reviewed Plone, an open
source content management system. This system gives users or groups individual space as well as
allowing them access to publicly published materials. Finally we considered TiddlyWiki, which
provides a personal wiki in a single interactive web page, written in JavaScript, with all the
functionality needed to blog and link resources. Each section of the page can be tagged and searched.
Each of these options was reviewed to assess its value for supporting personalised management of
repository resources (see http://www.streamlineproject.org for these reviews).
However, the adoption of IntraLibrary made these tools redundant and, in the latter part of
the project, we have been investigating web widgets as flexible tools for providing a personal “view” of
the repository. These have been developed with the PERSoNA project and full details of this activity
are reported in Luker & Sheppard, 2009.
5 Outputs and Results
The project has produced a number of outputs that are available for use and further development by
the community. These fall into four main categories: reports and white papers, metadata generation
tool, enhanced search components, and use cases and other contributions to the e-Framework.
5.1 Reports and documentation
A collection of reports and documentation has been produced as part of the activity of the project.
These include: reviews of Plone and Moodle; a comparison of Course Genie and e-CAT; a
comparison of six repository systems; reviews of metadata creation and search algorithms; and
evaluation reports for e-CAT and the automatic metadata generation tool. In addition, the project tools
and processes have been documented, both for users and developers, and a Beginner’s Guide to
e-CAT has been produced. All of these documents, together with the more informal discussions of the
project team on the blog, are available from http://www.streamlineproject.org.
5.2 Automatic metadata generation tool
The automatic metadata generation tool is a stand-alone Java JAR file using flat file storage within a
specific document structure. Extracting the downloaded zip file creates this structure, and clicking
the JAR file runs the application. All user data is stored within a flat file, unencrypted.
A simple logon interface enables previous users to access and use their profile. New users can select
guest and are then invited to create a username and password and to enter or select their preferred
default settings (Figure 6). Usernames and passwords are case sensitive and each is checked for
uniqueness. The password can be changed at a later date; the username is permanent. The
preference panel enables defaults to be set for each new metadata set created. These can be edited
for individual metadata sets later (via Edit) or changed on the preference panel (via Preferences -
Your Details) for future metadata generation. Tool tips (shown in Figure 6) are available for all the
editable fields when the user rolls over the field’s label.
Figure 6: Preference panel (left) and keyword editor (right)
The interface to the application consists of a series of panels accessed by a top menu bar. Access to
some elements of the menu is restricted depending on the user’s point of metadata generation.
Pop-up windows are used for creating, editing and selecting one of four collections that can be used to
quickly populate the metadata.
After logging in or signing up, the user is presented with the Upload panel. Here they can add and
remove documents for the prototype to analyse for the extraction of keywords, title and description.
The type of each document must be identified via a drop-down list embedded in the table (Figure 7).
The prototype can currently process documents in MS Word 2003 format. The document types are as
follows:
• Unknown: default setting for any documents that do not fit the other types.
• Learning object: the learning content in textual form. Only one document should be given this label.
• Department: formal documentation developed at the department/faculty level.
• Course: formal documentation developed at the course level.
• Module: formal documentation developed at the module level.
• Script: learning object development documentation that is non-formal.
Figure 7: Upload panel (left) and View panel (right)
To process the documents the user selects the Generate button, which will, where possible, populate the
keywords, title and description fields. The user preferences are also processed at this point and
added to the metadata. Documents do not have to be used with the prototype: simply selecting
Generate enables the user to create metadata from scratch, in which case only fields from their
preferences will be included.
The View panel (Figure 7) presents the user with an overview of the metadata fields currently
populated. Text colouring is used to inform the user of the status of each field. Red text
indicates fields that are mandatory according to the LOM specification and are currently empty. Grey
text indicates fields that are not mandatory and are also empty. This is for information only: the
user may export the XML regardless of the field states.
If they are satisfied with the metadata at this point they can simply export the XML and close the
application. If they need to change the metadata, access to the edit panels is available once
generation has completed. The following edit panels are available: Collaborators panel (Figure 8),
Learning Object Details panel (including rights collection), Technical Details (system and plug-in
collections), Educational, Keywords & Classification (Figure 6). Figure 8 shows the Collaborators
panel with the Collection Selection table. This is generated from two files, one containing globally
available collaborator details and the other containing the user’s past collaborators. Technical and
Rights both use a similar selection device. Users can move freely around these panels and edit as
they choose. Data is not saved on any of the panels until Save is selected from the File menu.
This updates the internal application memory and rewrites the View panel to show the changes that
have been made. Closing the application before exporting will lose all data not saved in the user
preferences.
Figure 8: Collaborators Edit and Selection panels
5.2.1 Keyword extraction algorithm
The uploaded documents are analysed and a stop list is applied to remove all small and
common terms such as the, and, their. Stemming (reducing each word to its root) is not applied here as
it produces words that are generally unfamiliar and non-descriptive to the user. For example, computing
becomes compute, which is as relevant to the field of mathematics as it is to computer science. The
documents are then sorted into types corresponding to the labels given to them by the user on the
Upload panel. Currently all formal documents (Department, Course, Module) are grouped together, as
tailored stop lists have not yet been implemented. Any unlabelled documents, and any additional
documents labelled as learning object (only the first is retained as such), are
classified as unknown and processed separately.
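The filtering and grouping steps described above can be sketched in Python as follows. This is an illustration only: the prototype itself is a Java application, and the stop list, minimum word length, label strings and function names here are assumptions made for the example.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Assumed, abbreviated stop list; a real list would be far longer.
STOP_WORDS: Set[str] = {"the", "and", "their", "that", "this", "with", "for", "are"}
FORMAL_LABELS = {"department", "course", "module"}


def tokenize(text: str, stop_words: Set[str] = STOP_WORDS,
             min_length: int = 3) -> List[str]:
    """Lower-case the text and drop short and stop-listed words (no stemming)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) >= min_length and w not in stop_words]


def group_documents(labelled: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Sort (label, text) pairs into the groups that are analysed separately."""
    groups: Dict[str, List[str]] = {
        "learning object": [], "formal": [], "script": [], "unknown": []}
    for label, text in labelled:
        if label == "learning object" and not groups["learning object"]:
            groups["learning object"].append(text)  # only the first is kept
        elif label in FORMAL_LABELS:
            groups["formal"].append(text)  # formal types share one group
        elif label == "script":
            groups["script"].append(text)
        else:
            groups["unknown"].append(text)  # unlabelled or surplus learning objects
    return groups
```

Because no stemming is applied, a word such as computing reaches the keyword list unchanged, which keeps the suggested keywords recognisable to the user.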
Two IR techniques are then applied depending on the type and number of documents uploaded.
Simple Term Frequency (counting the number of occurrences) is used on the learning object (if
textual) and on any document type that contains only a single document. The second method, Term
Frequency-Inverse Document Frequency (TF-IDF), works across a set of documents. It identifies
terms that are frequent in a given document but infrequent across the rest of the document set. The
prototype uses a version of TF-IDF based on the natural logarithm, calculated as follows:
Inverse document frequency = ln(Number of documents / Number of documents in which the term occurs)
TF-IDF = Term frequency * Inverse document frequency
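The calculation can be illustrated with a short Python sketch that follows the two formulas directly, using the raw occurrence count as the term frequency. The function name and the representation of documents as token lists are assumptions for the example, not details of the prototype.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf(documents: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Score every term in every document of a set.

    documents is a sequence of token lists; the result holds one
    {term: score} dictionary per document, in the same order.
    """
    n_docs = len(documents)
    doc_freq: Counter = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in documents:
        term_freq = Counter(tokens)
        scores.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores
```

A term that occurs in every document scores zero, since ln(1) = 0. With a single document every term would therefore score zero, which is consistent with simple term frequency being used in that case.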
This analysis results in some combination of up to four sets of keywords: those from the learning
object; those from formal documents; those from scripts; and those from unknown documents. Twenty
keywords are then selected from a combination of the learning object, scripts and formal documents,
depending on what is available. Duplicates are removed and any shortfall is topped up from the
learning object. The unknown set is used only if no other sets are available.
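The selection step might be sketched as below. The report does not state how the twenty places are divided between the sets, so the equal share per available set used in this Python example is an assumption, as are the function and parameter names.

```python
from typing import List, Sequence


def select_keywords(learning_object: Sequence[str], scripts: Sequence[str],
                    formal: Sequence[str], unknown: Sequence[str],
                    limit: int = 20) -> List[str]:
    """Combine ranked keyword lists (best first) into at most limit keywords."""
    sources = [s for s in (learning_object, scripts, formal) if s]
    if not sources:
        sources = [unknown]  # the unknown set is a last resort
    share = -(-limit // len(sources))  # ceiling division: assumed equal split
    selected: List[str] = []
    for ranked in sources:
        for keyword in ranked[:share]:
            if keyword not in selected and len(selected) < limit:
                selected.append(keyword)
    # Top up any shortfall (for example after removing duplicates)
    # from the remaining learning object keywords.
    for keyword in learning_object:
        if len(selected) >= limit:
            break
        if keyword not in selected:
            selected.append(keyword)
    return selected
```

For example, with a limit of four, a learning object list of a, b, c, d and a scripts list of b, x, the result is a, b, x, c: the duplicate b is dropped and the shortfall is filled from the learning object.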
The description field is extracted from the learning object only, as the first two sentences of the first
paragraph, if these are identifiable. The title is assumed to be the text formatted with the Title style
in a Word document.
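Description extraction can be approximated on plain text as in the following Python sketch. It assumes paragraphs are separated by blank lines and that sentences end with a full stop, question mark or exclamation mark; the prototype works on Word documents, so this shows the idea rather than its code.

```python
import re


def extract_description(text: str, max_sentences: int = 2) -> str:
    """Return the first max_sentences sentences of the first non-empty paragraph."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return ""  # nothing identifiable: leave the field empty
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", paragraphs[0])
    return " ".join(sentences[:max_sentences])
```

A split this simple is misled by abbreviations such as "e.g." and by headings without terminal punctuation, which is one reason the extraction applies only where sentences are identifiable.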
5.2.2 Limitations
The current prototype can only extract content from MS Word 2003 documents, although the APIs
(Apache POI 3.1 Final) are available in the code for extending this to the rest of the MS Office 2003
applications. The XML output is tailored for use with IntraLibrary, which required the removal of
MIME type and file size metadata as these are generated automatically by the repository. It is assumed
that this is a common feature of most repositories, but repositories that do not generate these fields
may treat the XML as malformed.
5.3 Enhanced search components
The enhanced search application comprises four components (see Figure 9): a client-side Ajax
interface written using the Google Web Toolkit (GWT); the service initiator, which facilitates
communication between client and server; the query extender, which takes the user’s input and initiates
four alternative searches based on synonyms extracted from the thesaurus (detail shown in Figure 10);
and the parser, which extracts the relevant fields from IntraLibrary’s response for each of the queries.
Figure 9: Search component overview
The client-side search interface provides the seamless integration of an Ajax-style application. A
single search box and button enable users to enter their query. The results are displayed in a series of
embedded show/hide panels. A generated hyperlink menu at the top of the page enables users to
select the results for their original term or for any of the extended terms (currently set to four)
individually. Each set of results is displayed in a drop-down panel that gives the title of each
document. When clicked, this allows the user to see further details relating to the document, currently
the Authors, Date and Description available in the metadata. A link to the learning object is also
provided, but at this stage it is not attached to any authentication process. With the aid of the parser
and GWT interface this can easily be adapted to accommodate other metadata details. Figure 11 shows
the skeletal structure of the interface, which can be styled via CSS.
Figure 10: Detail of the Query Extender
Figure 11: Enhanced search interface
5.4 Use cases and e-Framework
Having analysed the data from the podcasting scenarios, we developed initial use case diagrams
capturing the processes described by the users. A high-level diagram was developed identifying the
overall process and decisions a user is required to consider. Some of the processes are specific to
the editing application iMovie (part of the editing tool set used in the Apple video editing suite), but can
be applied to other editing processes. From this, specific scenarios were modelled to show the
changes required, for the user, in the decision process:
• How a podcast can be used to deliver material in preparation for a tutorial
• Recording a lecture for student reference
• Creating persistent materials for instruction and advice, with student input
A second set of scenarios was developed by an e-CAT tutor. These were more closely focused on the
production of learning objects by lecturers and other staff, based on data collected through a series of
staff development workshops. A high-level diagram was produced showing the processes and
decisions involved in producing a learning object using the e-CAT system. We then used the scenarios
above to examine the differences in requirements between an experienced lecturer and a lecturer at
the start of their teaching career.
In contrast, we also modelled Belfast Met’s supported learning object creation activity, producing
another four use cases for their multimedia learning object production process. These use cases are
all available at http://www.streamlineproject.org.
In addition we have submitted an entry for the project into JISC’s Innovation Base.
6 Outcomes
6.1 Achievements against aims and objectives
The project has substantially met its main aim: to develop integrated tools to alleviate the additional
administration associated with the use of institutional repositories for assessment, learning and
teaching, informed by understanding of existing work practices. The tools produced are not embedded
in proprietary tools as originally planned but are available as standalone tools and web services to
allow integration with a range of repositories. The standards used have focused on interoperability.
With regard to specific objectives (each objective is followed by a summary of what was achieved):
1. Review previous work on integrated tools, in particular drawing together the findings of previous
projects (Replika, HLSI, CETL ALPS).
We carried out an early review of these projects but when the project changed direction we
needed to review a number of alternative development platforms, prior to the institution acquiring
a full repository. These reviews are available at http://www.streamlineproject.org.
2. Evaluate existing tools for integrating repository functions with existing work practice e.g. e-CAT
(REPLIKA’s integrated tool) with users from different disciplines and HE and FE sectors.
e-CAT was evaluated and this informed the design of our own tools. These have been evaluated
with staff from across the institution as well as from an FE college partner.
3. Examine existing work practices surrounding activities (such as developing course materials) that
might be expected to contribute to or draw on the contents of an institutional ALT repository.
Scenarios of practice have been derived from focus group and tutorial activities and use cases
created.
4. Analyse the disciplinary and sector differences in work practice in the use of integrated tools and
repositories.
Although we have evaluated tools with staff from different disciplines, their limited knowledge and
use of such tools has made it impossible to make any judgements about disciplinary differences in
use. The comparison across an HE and an FE institution shows clear differences in process, which
have been considered.
5. Develop scenarios, domain models and service usage models representing these activities.
We have developed use cases and specifications for tools and have contributed to the Innovation
Base.
6. Explore and evaluate a range of mechanisms for metadata creation, such as automatic metadata
generation, semantic processing, pattern matching, and metadata composition and develop
algorithms and processes for their use. Explore the role of professional indexers in metadata
creation.
We have documented our deliberations on a range of approaches to metadata creation and the
choices made in developing the automatic metadata generation tool. This included consideration
of professional indexers.
7. Explore a range of search strategies, such as keyword and metadata search, browsing, thematic
searching, full text indexing, relationship measures (nearness, distance), and recommendation.
Evaluate against recognised information retrieval measures.
We have documented our consideration of a range of search strategies and the choices made in
developing the final tool. The tool has been evaluated against standard search algorithms.
8. Explore the use of ePortfolios and social networking tools such as del.icio.us to facilitate personal
management and sharing of repository resources.
We considered ePortfolio plugins when we were reviewing Moodle and Plone but, when we
began working with the institutional repository, we did not have access to an institutional
ePortfolio so concentrated on social networking. With the PERSoNA project, we have explored
social networking tools to encourage sharing and personal management of resources within the
repository and we have documented a number of approaches considered.
9. Iteratively develop and evaluate tools for repository e-administration, integrating with both
proprietary and open source software, to reduce personal administrative load.
We have developed tools, through an iterative design and evaluation process. Following the
change in the project plan we decided to develop these as stand alone and web-based tools
rather than integrating within proprietary software.
10. Disseminate findings to evaluate tools more widely.
We have disseminated widely within the institution, at JISC events nationally and within partner
institutions. We are planning an end of project event to disseminate the outputs to the community.
The tools produced are being adopted as part of the repository element of the PC3 project funded
by JISC under the Curriculum Design programme.
Overall our observation is that the project’s original objectives were based on the assumption that we
would be adapting existing software, rather than developing tools from scratch. The change in the
project plan led us to focus on one tool for each aspect, rather than trialling several. However, we
have reviewed alternatives and documented our considerations.
6.2 Impact on stakeholders
The project provides benefits to a range of stakeholders in higher education. These are summarised
below.
University senior management: The project has provided tools to assist staff in the creation of a
university wide reusable resource. It has highlighted issues for staff development in both the creation
of reusable learning objects and also depositing and searching for objects in the repository.
Teaching staff and learning technologists: The project provides support to staff in both submitting
objects to the repository and finding objects in it. It also helps improve understanding of the process of
deriving keywords and metadata.
Students: As staff start to make greater use of the repository for sharing resources, students will
benefit from a wider range of learning resources being made available to them.
Technical staff – supporting repositories: The project has provided two prototype tools that can be
used to enhance repository use. These have been designed to be interoperable and are available to
be used, developed and extended by the community. The project has also modelled workflows of how
staff use and create learning objects, including multimedia objects.
Software developers and vendors – open and commercial: The project has provided
specifications for innovative tools and algorithms. Open source licensing makes these available for
adoption, adaptation and development.
e-Framework: The project has provided a statement of services needed to support the tools, together
with models of user behaviour when interacting with repositories and learning objects.
6.3 Lessons learned
In the course of the project, we learned a number of lessons that may be valuable to other projects,
particularly relating to processes and management. The User Innovation and Development Model
(Fowler & Scott, 2007) works well for developing the interactive elements of software but is more
constraining when developing the backend components of complex software algorithms. We were
over-ambitious in our plans, even before circumstances forced us to change direction. Projects should
recognise the complexities of creating robust, well-constructed web-based software tools when
allocating time and resources in the project budget. We have also learned that things can go wrong on
projects, so plans need to be flexible and able to change to accommodate changes in circumstance.
Risk assessment is an important element of project planning and, if done realistically and thoroughly,
will help when things do not go as planned.
7 Project Management
The project director, bought out on a fraction of contract, managed the project in the first year. This
was not ideal, as project management responsibilities were often competing with project development
activities for that time. From the second year, project management was devolved to an experienced
project manager on a contractual basis, which was much more successful.
The start of the project coincided with a freeze on new jobs in the Faculty, which led to a six-month
delay in recruiting a full-time project officer. This gap was covered by buying in time from existing
staff. However, this was not as effective as a full-time member of staff, due to competing demands on
their time. It would be useful in such circumstances to be able to negotiate a delayed start to the
project, particularly since the lead-in to projects is often so short.
The liquidation of the external partner company Kainao Ltd led to significant revision of the original
project plan. In particular, the decision to develop tools from scratch, rather than modifying the
existing e-CAT tool with support from key Kainao staff, meant that the original UIDM-based
10-week cycles were no longer feasible. This also led to the project officer having to spend
more time on software development than originally intended. At this point the workload placed on core
project staff meant that more staff were required to support the project, a fact reinforced by the early
retirement of another member of the team. As well as the project manager, an additional software
developer was bought in from existing staff to support the project officer. Revisions to the
project plan and budget were submitted and approved in March 2008 and have since proved
effective for the project.
The increasingly close partnership between the three repository projects at Leeds Metropolitan
University led to the projects holding joint monthly meetings, sharing results, tools and insights in a
rich and complementary fashion. This has resulted in some blurring of where the responsibility for
certain outcomes actually lies. For example, resource management in Streamline and the use of
social networking to increase the sharing of knowledge in PERSoNA have merged into one activity.
As well as face-to-face meetings we managed our activities using social networking tools. At the
beginning of the project we set up two blogs; one an external-facing blog to report progress and the
other an internal project one for ongoing development. As work progressed it became clear that this
was an artificial separation. It was increasingly difficult to decide what should go where and at what
point work had “progressed” far enough to be reported. The net result was that much of the ongoing
work was remaining behind the closed door of the internal blog. We therefore decided to merge the
two. All the internal blog posts (except minutes etc.) were added to the external-facing blog and all
subsequent posts were made to that blog. This has proved much more effective and demonstrates
the risks associated with selecting an overly complicated communication strategy. It also highlights
the fact that openness on the process of development can be beneficial to projects.
We have held a number of joint events to share outcomes, including sessions at the annual Staff
Development Festival and a Repository Awareness Day. One further event is planned in May, with a
particular emphasis on external dissemination.
8 Conclusions
The Streamline project met its aims but with reduced scope and on a different trajectory than that
originally intended. These changes in circumstance have at times been difficult for the project team,
but have been beneficial in highlighting important issues relating to risk management and flexibility in
planning.
As the project progressed it became clear that institutional context and culture were also a major
influence on the development of the project. Leeds Metropolitan University is very new to the use of
repositories (with an institutional repository being established for the first time during the project). Staff
therefore have no expectations and limited knowledge of tasks surrounding repository use. This led us
to spend more time than anticipated working with staff on raising awareness. The widespread use of
learning objects is still some way off at Leeds Met but the project has contributed to broadening the
debate and has provided tools to facilitate the process.
However, tools are only one element. There is also an ongoing issue with change management for
staff. An interesting discussion at one of our awareness events resulted in a request to provide a brief
summary of the benefits the repository would bring to an average overworked academic. For research
this is easy to articulate. Research assessment is in future going to be weighted towards citation.
People cite what is accessible and an open access repository makes work available for download. In
other words, sharing research outputs is self-rewarding for researchers.
Such benefits are harder to define for learning objects. In theory you share what you have and gain
access to a much wider pool of resources through the repository. Academics stop reinventing the
wheel, save time and resources and have better quality materials. But in practice this is rarely how it
works, particularly if a repository has not reached “critical mass” - and many repositories never reach
this point for any given discipline. There has to be content for people to see benefit - and we can only
have content if people contribute in some way altruistically. For many academics this presents a
problem. There is no recognised reward system for sharing teaching and learning, as there is for
research. Unlike research papers, authorship and contribution are often distributed and harder to
specify. Plagiarism is certainly harder to spot and possibly taken less seriously.
Promoting a culture of sharing learning resources requires that sharing is given appropriate recognition.
This might be at an institutional level, giving credit for the learning resources shared in promotion and
personal review. It might be at a community level, offering some kind of community recognition. Any of
this would need to be accompanied by peer review so that credit is given for quality not just quantity.
Such a scheme is beyond the scope of the project but is one aspect that needs to be considered in
future developments in this area.
9 Implications
The project has developed prototype tools and explored a range of algorithms. These tools can be used
and extended by the community. Future developments in the following areas would enhance these
tools:
Automatic metadata generation tool:
• The extraction tool needs to be extended to include a wider variety of file types.
• Specialised stop lists could be used for formal document level analysis.
• Sentence-based summary methods could be used to populate the description field,
regardless of the structure of the documents.
• The LOM subset used could be extended.
• Categorisation of learning objects could be automated using the thesaurus and IntraLibrary’s
cataloguing taxonomy.
• Personal collections for technical aspects and rights could be implemented.
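As an illustration of the stop list and sentence-based summary suggestions above, the following is a minimal Python sketch and not the project's implementation. It assumes a simple word-frequency scoring of sentences; the stop list, the function names (tokenize, summarise) and the scoring rule are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption); a specialised list for formal
# documents would be larger and tuned to the collection.
STOP_WORDS = {"a", "an", "and", "are", "as", "be", "by", "can", "for",
              "in", "is", "it", "of", "on", "the", "this", "to", "with"}


def tokenize(text):
    """Lower-case the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOP_WORDS]


def summarise(text, max_sentences=2):
    """Pick the highest-scoring sentences as a candidate description.

    A sentence's score is the average document-wide frequency of its
    non-stop words, so no document structure (headings etc.) is needed.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)


sample = ("Learning objects need metadata. "
          "Metadata describes learning objects for search. "
          "The weather was nice today.")
print(summarise(sample, max_sentences=1))
```

The text returned by summarise could be offered to the author as a suggested value for the description field, to be accepted or edited rather than stored automatically.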
Enhanced search tool:
• The search could be extended to include an advanced search that utilises individual metadata
fields.
• The other three augmented searches could be implemented and tested, which could then be
offered as a choice to users.
• Neighbourhood search implementations could be extended to include latent semantic
analysis and other algorithms.
• An interface could be developed to allow visualisation of search results.
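The advanced search suggested in the first point above could work along the following lines. This is a hedged Python sketch only: the records are invented, the field names (title, description, keyword) are assumed to mirror elements of the LOM subset, and advanced_search is a hypothetical helper rather than part of IntraLibrary or the Streamline tools.

```python
def advanced_search(records, **criteria):
    """Return records in which every named field contains its query term.

    Matching is a case-insensitive substring test; list-valued fields
    (such as keywords) are joined into one string before matching.
    """
    def field_text(record, field):
        value = record.get(field, "")
        if isinstance(value, (list, tuple)):
            return " ".join(value).lower()
        return str(value).lower()

    return [r for r in records
            if all(term.lower() in field_text(r, field)
                   for field, term in criteria.items())]


# Invented example records, for illustration only.
records = [
    {"title": "Introduction to Statistics",
     "description": "Slides on sampling",
     "keyword": ["statistics", "sampling"]},
    {"title": "Essay Writing",
     "description": "Guide to structuring essays, with statistics on common errors",
     "keyword": ["writing"]},
]

for hit in advanced_search(records, title="statistics", keyword="sampling"):
    print(hit["title"])
```

Restricting a term to one field lets a user distinguish, for example, resources catalogued under a keyword from resources that merely mention it in their description.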
General issues:
• Work with IntraLibrary to enhance the range of tools and APIs available so that content can
be seen and searched via the web.
• Applications for access to IntraLibrary from common Web 2.0 applications could be
developed along the lines of the Facebook application developed with PERSoNA.
• The repository and tools could be integrated with the virtual learning environment and other
institutional systems.
10 Recommendations
A number of specific recommendations arise from this project.
1. Institutions should consider specific staff development with a focus on metadata and searching if
sharing resources is to be facilitated.
2. Institutions and the wider community should consider developing reward mechanisms to
encourage sharing of quality resources.
11 Bibliography
Fowler, C., & Scott, J. (2007). The User Innovation and Development Model Guide. University of
Essex.
Lin, J., Newman, M. W., Hong, J. I., & Landay, J. A. (2000). Denim: finding a tighter fit between tools
and practice for web site design. Proceedings of CHI (pp. 510-517). ACM.
Luker, W., & Sheppard, N. (2009). PERSoNA: Personal Engagement with Repositories through Social
Network Applications. JISC Final Report, Leeds Metropolitan University.
Naghsh, A. M., & Dearden, A. (2004). GABBEH - A tool to support collaboration in electronic paper
prototyping: A Demonstration. ACM Conference on Computer Supported Cooperative Work. Chicago:
ACM.
Soosay, M. (2007). Beginner's Guide to e-cat: Creating a learning object. Leeds Metropolitan
University. Available from http://www.streamlineproject.org.