Project Acronym: Streamline
Version: 1.3
Contact: Janet Finlay
Date: 31 March 2009

Project Document Cover Sheet

Project Information
Project Acronym: Streamline
Project Title: Integrating Repository Function with Work Practice: Tools to Facilitate Personal E-Administration
Start Date: 1 March 2007
End Date: 31 March 2009
Lead Institution: Leeds Metropolitan University
Project Director: Professor Janet Finlay
Project Manager & contact details: John Gray, 45 Snowdon Road, Eccles, Manchester, M30 9AS. email [email protected] Home: 0161 789 3971 Mob: 07931 674450
Partner Institutions: None
Project Web URL: Web site: http://streamline.leedsmet.ac.uk Project Blog: http://www.streamlineproject.org
Programme Name (and number): Users and Innovation: e-administration
Programme Manager: Lawrie Phipps

Document Name
Document Title: Final Report
Reporting Period: 1 March 2007-31 March 2009
Author(s) & project role: John Gray, Project Manager
Date: 30 March 2009
Filename: FinalReportStreamlinev1.1.doc
URL if document is posted on project web site:
Access: Project and JISC internal / General dissemination

Document History
Version   Date               Comments
1.0       8 January 2009     Draft project final report
1.1       30 January 2009    Revised to include latest results and respond to feedback on draft
1.2       30 March 2009      Revised to include feedback from team
1.3       31 March 2009      Final version for JISC

Users and Innovation Programme
Streamline
Integrating Repository Function with Work Practice: Tools to Facilitate Personal E-administration
Final Report v1.3
March 2009

Janet Finlay (Project Director, Leeds Metropolitan University)
John Gray (Project Manager, Leeds Metropolitan University)
Dawn Wood (Project Officer, Leeds Metropolitan University)

Project Contact:
Professor Janet Finlay
Technology Enhanced Learning Team
Leeds Metropolitan University
Civic Quarter
Leeds LS1 3HE

Table of Contents
Table of Figures
Acknowledgements
Executive Summary
1 Background
2 Aims and Objectives
3 Methodology
4 Implementation
  4.1 Early preparatory work
  4.2 Automatic metadata generation
  4.3 Enhanced search tools
  4.4 Resource management
5 Outputs and Results
  5.1 Reports and documentation
  5.2 Automatic metadata generation tool
    5.2.1 Keyword extraction algorithm
    5.2.2 Limitations
  5.3 Enhanced search components
  5.4 Use cases and eFramework
6 Outcomes
  6.1 Achievements against aims and objectives
  6.2 Impact on stakeholders
  6.3 Lessons learned
7 Project Management
8 Conclusions
9 Implications
10 Recommendations
11 Bibliography

Table of Figures
Figure 1: Streamline product architecture
Figure 2: Query extension
Figure 3: Iterative result reuse
Figure 4: Collaborative search via profile matching
Figure 5: Collaborative search via document matching
Figure 6: Preference panel (left) and keyword editor (right)
Figure 7: Upload panel (left) and View panel (right)
Figure 8: Collaborators Edit and Selection panels
Figure 9: Search component overview
Figure 10: Detail of the Query Extender
Figure 11: Enhanced search interface

Acknowledgements

The Streamline project was funded by JISC under the Users and Innovation programme, in its e-administration strand.
In addition to the core project team (the authors), the project drew on a large team of staff at Leeds Metropolitan University, each of whom contributed through software development, user engagement and evaluation. These were Mark Dixon, Elizabeth Guest, Stuart Hirst, Sanela Lavareski, Tony Renshaw, Meg Soosay, and Jill Taylor. We were also able to draw on the expertise of two former members of Leeds Metropolitan University staff: Rodney Brunt for information retrieval and classification, and John Heap for the eFramework. Ben Ryan from Kainao Ltd contributed to our early work on repositories and with the e-CAT tool.

We also acknowledge the contribution of the Leeds Met Repository/PERSoNA team: Wendy Luker, Nick Sheppard and Mike Taylor. The close collaboration between the three Leeds Met repository projects (An Institutional Repository for Leeds Metropolitan University, PERSoNA and Streamline) proved invaluable.

Staff at Leeds Metropolitan University and Belfast Metropolitan College contributed greatly to the project by taking part in focus groups and interviews exploring their workflows, and in evaluations of successive iterations of the tools.

Finally, the project team would like to acknowledge the support of Leeds Metropolitan University senior staff, in particular Professor Sally Brown and Dr Barbara Colledge; the useful insights of our JISC Critical Friend, Professor Peter Hartley; and the guidance of the programme manager, Lawrie Phipps.

Executive Summary

The project aimed to integrate the functions associated with the use of repositories into the day-to-day practice of staff. Initially the intention was to do this by embedding tools within proprietary software such as MS Word, building on existing software developed by our partner company Kainao Ltd.
Our early work therefore focused on evaluating these existing tools and analysing how they were used, as well as investigating potential repository solutions and initiating the development of a simple test repository (as none was available in the institution).

However, two key events changed the direction of the project. Firstly, Leeds Metropolitan University (Leeds Met) initiated a process, supported by the JISC repositories programme, to establish a repository for Leeds Met. This meant that any work we did on the project needed to be compatible with that repository, and we became involved in the decision-making process for it. Secondly, our partner company Kainao Ltd went into liquidation, leaving us unable to continue to develop their tools. The project’s aims and objectives were therefore reviewed, and a revised project plan and work packages were approved by JISC.

Our initial work was not wasted, however. By capturing case studies and developing narrative scenarios of user interaction, we had discovered that members of staff at Leeds Met were unfamiliar with learning objects and metadata, and that those objects that were being developed tended to be multimedia artefacts. Our original focus on embedding tools in Word therefore seemed limited. We adapted our plans to develop generic desktop and web-based tools to support repository functions, and increased the focus on staff development activities and support to raise awareness and encourage use.

The tools developed include an automatic metadata generation tool, which completes as much of the metadata as possible from documentation associated with a learning object and suggests keywords to the user; and resource discovery tools, which recommend additional resources based on the closeness of objects to the original search results.
In addition, we contributed to a variety of widgets, developed with the PERSoNA project, to demonstrate the use of social networking tools to promote sharing of resources through the repository.

In the early phases of the project, we attempted to apply a rapid iteration approach to development based on the User Innovation and Development Model (UIDM), using electronic paper prototyping tools to develop early tool designs. This was successful in evaluating interface issues surrounding existing tools. However, once the liquidation of Kainao forced the project to develop a number of tools from scratch, it became clear that the short iterations of this approach were unsuited to the combination of problem investigation and software tool development. We therefore adapted our development approach to allow a much longer software development cycle, informed by regular user interaction through staff development events, focus groups, interviews and questionnaires.

In spite of the difficulties in the early part of the project, it has been successful in identifying and starting to address some key issues relating to the use of repositories at Leeds Met and more widely. Because repository technology was adopted only very recently, Leeds Met staff have limited experience of using repositories. Their knowledge of metadata, both practical and theoretical, is consequently very limited. They are also, in general, suspicious of the concept of sharing learning objects. Many staff currently have little engagement with Web 2.0 tools, and those who do use them operate in small local groups with existing sharing mechanisms.

This is in contrast to many staff at our partner college, Belfast Met, who have a longer history of using repositories. Here staff have an expectation that learning resources will be shared. However, although they use packaging tools, they still do not generally supply metadata, and management of sharing is handled by a specialised team of staff.
Staff development is a key element in explaining the difference in attitude: new staff induction and training for existing staff include an expectation of sharing resources, so that it is considered part of normal work practice there.

Our work has therefore been as much about raising awareness of the repository, the concept of learning objects and metadata, and Web 2.0 tools for sharing information, as it has been about integrating those tools into the repository.

The project has achieved its aims of creating an automatic metadata generation tool and resource discovery tools, and of encouraging sharing. It has successfully raised staff awareness of the repository, and evaluated the tools internally and with partners. It also informed the University’s decision on the choice of repository and its ongoing development and deployment.

1 Background

Repositories have the potential to enhance productivity and quality through effective management of assets relating to learning and teaching, research, policy and decision-making. However, unless their use is integrated into existing work practices there is a danger that they will increase rather than decrease the overall workload of staff and will ultimately be under-utilised. The activities of managing assets and finding resources, though pivotal to the successful deployment of repositories, are viewed by staff as adding to their administrative overload. The Streamline project therefore aimed to develop tools and processes that would integrate with existing staff work practice to support the use of repositories.
The project built on our prior experience of learning object repositories and tools in four projects: the HLSI (Higher Level Skills for Industry) project, funded by Yorkshire Forward; the EU funded projects eDILEMA: E-resources and Distance Learning Management (90683-CP-1-2001-1-CZ-Minerva-M) and REPLIKA (European Repository for Learning Innovation and Knowledge Acquisition); and the current HEFCE funded Centre of Excellence in Teaching and Learning Assessment and Learning in Practice Settings (CETL ALPS) repository project. Each has contributed to our understanding of repository tools and their use.

The JISC funded CD-LOR project identifies four barriers to utilisation of learning object repositories (Margaryan et al., 2006): socio-cultural, pedagogic, organisational/management and technological. Solutions to these problems are complex but, according to the findings of the CD-LOR project, they must take better account of user communities and context and ensure that the repository is seen as part of the normal context of work rather than as an isolated tool. This concurs with our own experience with HLSI, eDILEMA and REPLIKA, where reluctance to use the repository fully was due to the perceived (and actual) overheads associated with its use. This was a key issue addressed by the project, both through tool development and through raising staff awareness and interest.

2 Aims and Objectives

The project had the following aims and objectives:

Aim: to develop integrated tools to alleviate the additional administration associated with the use of institutional repositories for assessment, learning and teaching, informed by understanding of existing work practices.

Objectives:
1. Review previous work on integrated tools, in particular drawing together the findings of previous projects (REPLIKA, HLSI, CETL ALPS).
2. Evaluate existing tools for integrating repository functions with existing work practice, e.g. e-CAT (REPLIKA’s integrated tool), with users from different disciplines and from the HE and FE sectors.
3. Examine existing work practices surrounding activities (such as developing course materials) that might be expected to contribute to or draw on the contents of an institutional ALT repository.
4. Analyse the disciplinary and sector differences in work practice in the use of integrated tools and repositories.
5. Develop scenarios, domain models and service usage models representing these activities.
6. Explore and evaluate a range of mechanisms for metadata creation, such as automatic metadata generation, semantic processing, pattern matching, and metadata composition, and develop algorithms and processes for their use. Explore the role of professional indexers in metadata creation.
7. Explore a range of search strategies, such as keyword and metadata search, browsing, thematic searching, full text indexing, relationship measures (nearness, distance), and recommendation. Evaluate against recognised information retrieval measures.
8. Explore the use of ePortfolios and social networking tools such as del.icio.us to facilitate personal management and sharing of repository resources.
9. Iteratively develop and evaluate tools for repository e-administration, integrating with both proprietary and open source software, to reduce personal administrative load.
10. Disseminate findings to evaluate tools more widely.

There were some changes to the emphasis of the aims and objectives when the project plan was revised. In particular, we focused more on objectives 6 and 7 than on objective 8, which was part of the remit of the PERSoNA project. We also changed our expectations for some objectives in the light of changes to work practices. For example, for objective 9, we looked at standalone tools and web services rather than embedded tools.
We revisit the aims and objectives in section 6.1 when we consider the outcomes and achievements of the project.

3 Methodology

The project used an iterative and participatory methodology, engaging users at regular points in the design and decision-making process. This was based on the principles of the User Innovation and Development Model (UIDM) for user engagement (Fowler & Scott, 2007): a high level of user interaction through workshop activities and regular feedback on emerging tools. These workshops enabled us both to capture requirements, which were represented in use cases and specifications, and to evaluate the tools as they developed.

We engaged in at least five cycles of iteration, moving from capturing requirements, to executable “paper” prototypes using the tools Gabbeh (Naghsh & Dearden, 2004) and Denim (Lin et al., 2000), then to a series of software prototypes. Each iteration was presented to users, and their feedback informed the next design stage. This process worked well for the interface elements but proved less successful for the functional development, due to the complexity of the underlying algorithm.

Evaluation used a range of methods including questionnaires, interviews, eye tracking and focus groups. We worked with different user groups, drawn from a wide constituency, making use of staff development events and special evaluation events to recruit a broad range of potential users. A key user group was the Podcasting Pilot project, which we worked with in order to explore the requirements of multimedia learning objects. In addition we consulted users in the Faculties of Health, Carnegie and Innovation North, as well as academics and learning technologists from Belfast Metropolitan College. User engagement and evaluation questionnaires are available from http://www.streamlineproject.org.
An important element that emerged as the project progressed was the need to work closely with the other two repository-related JISC projects at Leeds Met, namely the Institutional Repository project and PERSoNA, which began in 2007 and 2008 respectively. These projects had complementary aims, with Streamline and PERSoNA converging markedly in the area of supporting resource sharing and personal management, though Streamline was focused on learning objects and PERSoNA on research outputs. It was decided that, rather than duplicate effort between these two projects, we would combine forces on this aspect and work together to develop applications. This element is reported in full in the final report for the PERSoNA project (Luker & Sheppard, 2009).

The decision of the University to acquire an Intralibrary repository was also significant, as it meant that from that point we were able to work with a real repository and engage users with a new institutional system. This has enabled us to widen our catchment beyond the originally planned user groups to the wider institution. However, the delay in getting to this point meant that we were unable to make as much progress with integration as we had intended. Although the Intralibrary repository was installed and available for experimentation by a limited number of staff in early May 2008, several technical issues have hampered its more widespread use. These include limits on the size of documents that can be deposited via the desktop and problems in accepting separate XML files containing metadata for objects.

Interoperability was an important principle underlying all our development. Although the tools have been developed to be specifically compatible with the institutional repository, Intralibrary, we adopted Dublin Core/LOM (Learning Object Metadata) to support interoperability of the tools with other repositories.
Tools have been developed in or translated into Java to allow easy integration and the development of a Java API.

4 Implementation

Project activity can be considered in two parts. Initially we intended to extend tools already developed by our partner company, working with a test repository established by them. The early work on the project therefore focused on modelling user workflows, evaluating existing tools in preparation for redevelopment, and developing a test repository for our work.

However, approximately eight months into the project, the partner company folded and we had to re-evaluate our approach. It rapidly became clear that we were not going to be able to continue redeveloping the existing tools, so a decision was made to develop new tools from scratch. At the same time, Leeds Metropolitan University was evaluating potential repository systems with a view to acquiring an institutional repository. This also influenced our decision-making, as we now had to ensure our work was compatible with the chosen system, rather than working on a test repository.

In addition, in January 2008, another JISC project was funded at Leeds Metropolitan University (PERSoNA), which was considering issues of sharing and management of resources relating to research outputs in the repository. It was decided to work with PERSoNA in this area (objective 8) rather than continuing independently, and instead to concentrate development effort on automatic metadata generation and enhanced search. These changes were included in a full revision of the project plan, which was approved by JISC in early 2008.

We will therefore report on project activity in four sections: early preparatory work, metadata generation, enhanced search, and personal resource management.
4.1 Early preparatory work

The early work of the project focused on three areas: modelling user workflow relating to learning object development and repository use; evaluating the e-CAT tool, a packaging tool which is embedded in MS Word and is used to create learning objects with metadata (produced by Kainao Ltd; for a guide see Soosay, 2007); and developing a test repository for our project. While these activities were superseded by later developments (see above), this was valuable preparatory work and produced a number of reports, which may be of broader interest to the community. It also produced the first of a series of use cases describing workflows associated with learning object creation and repository use, which have been useful throughout the project and are available for other projects.

Working with staff at Kainao Ltd and their e-CAT product provided opportunities, in the early stages of the project, to produce workflows and case studies of staff depositing learning objects into a repository. Although Streamline is focused on learning objects, the institutional repository is designed to hold other types of objects as well, such as research outputs and multimedia materials. Given the wide range of potential activities and objects that could be reviewed and modelled, it was important to identify a specific user group, and the podcasting user group was identified as an appropriate candidate. A major podcasting pilot at Leeds Met was just starting, which included users from across the University and the Regional University Network and covered a large range of topics and styles of podcast. The multimedia nature of these learning objects raised interesting issues for automatic metadata generation, since the main content was not textual. In addition, we worked with academic staff who were using e-CAT or similar tools to generate learning objects.

We held initial user group meetings with staff who were exploring podcasting throughout the university.
Academic staff and learning technologists produced scenarios illustrating both creation and use of podcasts and other learning resources. We were particularly interested in the assets they used, whether they currently used materials in repositories (formal or informal), and whether they thought about the issues of metadata, finding objects, and sharing objects at all. These scenarios resulted in a collection of text, video and audio descriptions from which use cases were produced. It was immediately clear that none of the staff had considered the inclusion of metadata, and that even those who were familiar with the concept and its importance had not considered its capture within their project workflows.

In June 2007 the project undertook an evaluation of the e-CAT tool in order to determine how best to support staff in identifying and applying metadata. The evaluation took the form of:
1. A detailed structured evaluation of e-CAT with a small number of individuals, using eye tracking, observation and post-task interviews to explore issues.
2. An in-depth semi-structured interview with an active self-taught user and tutor of e-CAT.
3. A web-based distribution of e-CAT evaluated by online questionnaire.

This produced a set of recommendations for the enhancement of e-CAT, which ultimately informed the design of the new automatic metadata generation tool.

As Leeds Met did not have an institutional repository at the start of the project, Kainao Ltd. set out to create a simple repository using MySQL and Moodle, to provide staff with some experience and understanding of the process of depositing objects into a repository. One characteristic of this temporary repository was the need to ‘package’ any objects that were to be deposited, i.e.
not only capture relevant metadata but also bind the object and its metadata into a compound item that could then be deposited in the repository. This functionality had been provided by e-CAT but was a potential problem when we moved to developing a new tool. This was resolved when the University acquired the Intralibrary institutional repository, as this functionality is built into the system and we were therefore able to concentrate on the metadata and search functions.

Using the information and feedback gathered up to March 2008, the team began work on the design and development of an automated metadata generation tool and enhanced search tools. Figure 1 presents a conceptual architecture model of the main workflow packages built as part of the Streamline project, identifying the key elements.

4.2 Automatic metadata generation

The automatic metadata generation prototype was developed to ease the process of depositing learning objects within a repository. It has been designed as a desktop tool, to be used with Intrallect’s Intralibrary repository. However, the prototype generates Learning Object Metadata (LOM) compliant metadata using an XML binding, which can then be uploaded to any similarly compliant repository. It implements three concepts to generate a subset of the LOM specification automatically. These were identified through the earlier user engagement activities and through examining the LOM structure used by e-CAT. The concepts and their associated LOM fields are as follows:
1. Preferences: write once, use many times. Use default settings for personal details and for those fields that can persist from one learning object to the next. These currently include language of end user and content, version, status, current user as contributor to content and metadata, learning resource type, context, difficulty and typical learning time.
2. Collections: write/extract once, use by many. Use selection lists that are predefined within the prototype, extracted from existing organisational data, or created by retaining data previously used by individual users. These are saved separately as global and personal collections. These currently include collections for content and metadata contributors, copyright licenses, and technical specifications for the required operating system and/or browser plug-in requirements.
3. Content extraction: utilise existing documents. By extracting content from a variety of textual documents, the fields for keywords, title and description can be populated.

Appropriate elements for metadata generation were identified by looking at university documents to identify potential keywords within these. The documents ranged from web pages about departments, through course and module specifications, to textual learning resources. By organising them into four groups (organisation level, department level, course level and module level), the flow of keywords was modelled. This showed that the relevant keywords at the higher levels were fewer and more generally representative than those at the level of learning materials. Alongside this, techniques used in automatic summary generation were also reviewed, covering both single documents and collections.

Figure 1: Streamline product architecture

Initial work focused on using information stored in Word 2003 format documents, as Leeds Met currently uses Windows XP and Office 2003 as its institutional standard. It had been hoped that this would be extended to other document formats by the end of the project, but this was not possible due to time constraints. However, the tool provides proof of concept for this approach.
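The way the three concepts combine into a single metadata record can be sketched as follows. This is a minimal Java illustration under stated assumptions, not the Streamline tool's code: the class and method names (LomDraftSketch, buildDraft, toXml) are hypothetical, and the flat element names stand in for the much richer nested structure of the real LOM XML binding.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only: names and the flat XML layout are assumptions. */
final class LomDraftSketch {

    /**
     * Layers the three sources. Later sources override earlier ones, so a
     * value extracted from a document replaces a stored default.
     */
    static Map<String, String> buildDraft(Map<String, String> preferences,
                                          Map<String, String> collectionChoices,
                                          Map<String, String> extracted) {
        Map<String, String> draft = new LinkedHashMap<>(preferences);
        draft.putAll(collectionChoices);
        draft.putAll(extracted);
        return draft;
    }

    /** Escapes the characters that are unsafe in XML element content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Serialises the draft; keys are assumed to be valid XML element names. */
    static String toXml(Map<String, String> draft, List<String> keywords) {
        StringBuilder xml = new StringBuilder("<lom>\n");
        for (Map.Entry<String, String> field : draft.entrySet()) {
            xml.append("  <").append(field.getKey()).append('>')
               .append(escape(field.getValue()))
               .append("</").append(field.getKey()).append(">\n");
        }
        for (String keyword : keywords) {
            xml.append("  <keyword>").append(escape(keyword)).append("</keyword>\n");
        }
        return xml.append("</lom>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> draft = buildDraft(
                Map.of("language", "en", "status", "draft"),   // preferences
                Map.of("rights", "CC BY"),                     // chosen from a collection
                Map.of("title", "Module Guide"));              // extracted from a document
        System.out.print(toXml(draft, List.of("assessment", "feedback")));
    }
}
```

In this sketch the keyword list is passed separately because, as described above, extracted keywords are suggestions that the user confirms or edits before the record is written.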
The aim of the extraction process is to generate a set of potential keywords from the documents supplied by the user. Early tests on a corpus of student essays showed that topics are easy to distinguish. The resulting potential set of metadata keywords is then presented to the user so they can select what they think is most appropriate or add new words. The intention is to support the user without removing their ability to amend or extend the set of key words. The final version of the automatic metadata generation tool was tested at a Repository Awareness Day held in early November 2008, including with staff who had taken part in earlier evaluations of eCAT and the paper prototypes. Feedback from these staff was very positive. They were impressed, both by the application and how it had incorporated the suggestions made when discussing the difficulties with e-CAT. Others were impressed with the keyword generation process and interested in the method of text extraction we had used. Overall the response was good. Participants were generally inexperienced in metadata creation, although two were producing learning objects. Reports on the evaluation are available at http://www.streamlineproject.org. 4.3 Enhanced search tools In parallel with engaging with user groups on metadata generation, other project team members were investigating approaches to enhanced resource discovery. Several approaches have been investigated; the intention was to examine possibilities for serendipitous resource discovery, much like when you find a book through a library catalogue, you often find useful texts in nearby shelf locations that were not flagged by the original search. The following four approaches were investigated: Method 1: Query extension. Submit multiple searches by getting search information then creating alternative versions using a thesaurus. Rather than doing a single search, do multiple searches using this information then produce results as a visualisation. 
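As an illustration of the query extension idea in Method 1, the following is a minimal sketch rather than the project's implementation (the prototype itself was built as a Java and GWT service). The thesaurus contents, the function names and the `search` callable are all hypothetical; the default limit of four alternatives follows the setting reported for the prototype in Section 5.3.

```python
from typing import Callable, Dict, List

# Hypothetical thesaurus mapping a term to alternative terms (illustrative data only).
THESAURUS: Dict[str, List[str]] = {
    "assessment": ["evaluation", "appraisal", "examination", "test", "marking"],
    "lecture": ["talk", "presentation"],
}


def extend_query(query: str,
                 thesaurus: Dict[str, List[str]],
                 max_alternatives: int = 4) -> List[str]:
    """Return the original query followed by up to max_alternatives variants.

    Each variant replaces one word of the query with one thesaurus synonym.
    """
    words = query.lower().split()
    original = " ".join(words)
    variants: List[str] = []
    for i, word in enumerate(words):
        for synonym in thesaurus.get(word, []):
            if len(variants) >= max_alternatives:
                return [original] + variants
            candidate = " ".join(words[:i] + [synonym] + words[i + 1:])
            if candidate != original and candidate not in variants:
                variants.append(candidate)
    return [original] + variants


def extended_search(query: str,
                    search: Callable[[str], List[str]],
                    thesaurus: Dict[str, List[str]],
                    max_alternatives: int = 4) -> Dict[str, List[str]]:
    """Run one search per query variant and group the results by variant.

    `search` stands in for whatever repository search call is available.
    """
    return {q: search(q) for q in extend_query(query, thesaurus, max_alternatives)}
```

Grouping the results by the query that produced them keeps the original search separate from the extended ones, so an interface can present each set on its own.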
Query extension is illustrated in Figure 2.

Figure 2: Query extension

Method 2: Iterative result reuse. Get the initial results from a search, extract keywords (or other metadata) from the returned documents, and then use these to find related documents. Extracted keywords and those found in the second pass should be standardised using a thesaurus. For each set of search results, submit a new search, then follow the same process recursively (removing elements already returned). Each iteration will return documents that are less closely related to the initial document. This is illustrated in Figure 3.

Figure 3: Iterative result reuse

Method 3: Collaborative search via profile matching. Identify all searches performed by a user in a particular session, i.e. build a profile of a particular user’s searches. Cross-reference this with searches performed by other users in their sessions. Identify different searches that are commonly used within the same sessions. This should give a list of related articles that may be of interest to users who search for the same type of things (even if the keywords, content or metadata of the documents are completely unrelated). The more sessions in which a set of searches occurs together, the more likely it is that they are related in some way. This is illustrated in Figure 4.

Figure 4: Collaborative search via profile matching

Method 4: Collaborative search via document matching. Whenever a search is done, cache the returned documents. When a new search is done, take each returned resource and find all caches that also contain that document, then return the contents of each cache as secondary results. Again this could be done recursively. This is illustrated in Figure 5.

The development of the enhanced search prototype focused around two objectives:
1. To enable testing of extended search methods with LOM metadata
2.
To enable a service-based architecture to allow flexibility in use of the tool.

The first of the enhanced search algorithms (Method 1, query extension) was selected for the implementation of the prototype enhanced search tool. As there were very few learning objects within the repository, preliminary tests were conducted with the research content, which has significantly different metadata. We have therefore not been able to extend the range of the metadata used to include more interesting educational aspects.

Figure 5: Collaborative search via document matching

4.4 Resource management

The resource management and sharing aspects of the project were investigated early on through evaluation of a number of potential tools to provide this functionality, within a test repository. The proposed test repository was built using MySQL and Moodle, a widely used open source eLearning course management system, which offers a wide range of features enhanced by plugins produced by the open source community, including ePortfolio functionality. We also reviewed Plone, an open source content management system. This system gives users or groups individual space as well as allowing them access to publicly published materials. Finally we considered TiddlyWiki, which provides a personal wiki via a single interactive web page, written in JavaScript, and provides all the functionality to blog and link resources. Each section of the page can be tagged and searched. Each of these options was reviewed to assess its value for supporting personalised management of repository resources (see http://www.streamlineproject.org for these reviews). However, the adoption of Intralibrary made these tools redundant and, in the latter part of the project, we have been investigating web widgets as flexible tools for providing a personal “view” of the repository.
These have been developed with the PERSoNA project and full details of this activity are reported in Luker & Sheppard, 2009.

5 Outputs and Results

The project has produced a number of outputs that are available for use and further development by the community. These fall into four main categories: reports and white papers, metadata generation tool, enhanced search components, and use cases and other contributions to the eFramework.

5.1 Reports and documentation

A collection of reports and documentation has been produced as part of the activity of the project. These include: reviews of Plone and Moodle; a comparison of Course Genie and e-CAT; a comparison of six repository systems; reviews of metadata creation and search algorithms; and evaluation reports for e-CAT and the automatic metadata generation tool. In addition, the project tools and processes have been documented, both for users and developers, and a Beginner’s Guide to e-CAT has been produced. All of these documents, together with the more informal discussions of the project team on the blog, are available from http://www.streamlineproject.org.

5.2 Automatic metadata generation tool

The automatic metadata generation tool is a stand-alone Java JAR file using flat file storage within a specific document structure. Extraction from the download zip file creates this structure and clicking the JAR will run the application. All user data is stored within a flat file, unencrypted. A simple logon interface enables previous users to access and use their profile. New users can select guest and are initially invited to create a username and password and to enter or select their preferred default settings (Figure 6). Usernames and passwords are case sensitive and each is checked for uniqueness. The password can be changed at a later date; the user name is permanent.
The preference panel enables defaults to be set for each new metadata set created. These can be edited for individual metadata sets later (via Edit) or changed on the preference panel (via Preferences > Your Details) for future metadata generation. Tool tips (shown in Figure 6) are available for all the editable fields when the user rolls over the field’s label.

Figure 6: Preference panel (left) and keyword editor (right)

The interface to the application consists of a series of panels accessed by a top menu bar. Access to some elements of the menu is restricted depending on how far the user has progressed through metadata generation. Popup windows are used for creating, editing and selecting one of four collections that can be used to quickly populate the metadata.

After logging in or signing up, the user is presented with the Upload panel. Here they can add and remove documents for the prototype to analyse for the extraction of keywords, title and description. Each document must be identified via a drop down list embedded in the table (Figure 7). The prototype can currently process the MS Word 2003 document format. The document types are as follows:
• Unknown: default setting for any documents that do not fit the other types.
• Learning object: the learning content in textual form. Only one document should be given this label.
• Department: formal documentation developed at the department/faculty level.
• Course: formal documentation developed at the course level.
• Module: formal documentation developed at the module level.
• Script: learning object development documentation that is non-formal.

Figure 7: Upload panel (left) and View panel (right)

To process the documents the user selects Generate, which will, where possible, populate the keywords, title and description fields. The user preferences are also processed at this point and added to the metadata.
Documents do not have to be used with the prototype: simply selecting Generate will enable the user to create metadata from scratch, in which case only fields from their preferences will be included.

The View panel (Figure 7) presents the user with an overview of the metadata fields currently populated. Text colouring has been used to inform the user of the status of each field. Red text indicates fields that are mandatory according to the LOM specification and are currently empty. Grey text indicates fields that are not mandatory and are also empty. This is used only as information to the user and they may export the XML regardless of the field states. If they are satisfied with the metadata at this point they can simply export the XML and close the application.

If they need to change the metadata, access to the edit panels is available once generation has completed. The following edit panels are available: Collaborators panel (Figure 8), Learning Object Details panel (including rights collection), Technical Details (system and plug-in collections), Educational, Keywords & Classification (Figure 6). Figure 8 shows the Collaborators panel with the Collection Selection table. This is generated from two files, one containing globally available collaborator details and the other containing the user’s past collaborators. Technical and Rights both use a similar selection device.

Users can move freely around these panels and edit as they choose. Data will not be saved on any of the panels until Save is selected from the File menu. This will update the internal application memory and re-write the View panel to show the changes that have been made. Closing the application before exporting will lose all data not saved in the user preferences.
Figure 8: Collaborators Edit and Selection panels

5.2.1 Keyword extraction algorithm

The uploaded documents are analysed and a stop list is applied to them to remove all small and common terms such as the, and, their. Stemming (identifying the word root) is not applied here as it produces words that are generally unfamiliar and non-descriptive to the user. For example computing becomes compute, which is as relevant to the field of mathematics as it is to computer science.

The documents are then sorted into the types corresponding to the labels given to them by the user on the Upload panel. Currently all formal documents (Department, Course, Module) are grouped together as tailored stop lists have not yet been implemented. Any unlabelled documents, or additional documents labelled as learning object (only the first document with that label is retained as such), are classified as unknown and processed separately.

Two IR techniques are then applied depending on the type and number of documents uploaded. Simple Term Frequency (counting the number of occurrences) is used on the learning object (if textual) and on any document type that contains only a single document. The second method, Term Frequency-Inverse Document Frequency (TF-IDF), works across a set of documents. It gives the highest weight to terms that are frequent in a given document but occur in few of the other documents in the set. The prototype uses a version of TF-IDF that uses the natural logarithm in its calculation, as follows:

Inverse Document Frequency = ln(Number of Documents / Number of Documents the Term Occurs in)
TF-IDF = Term Frequency * Inverse Document Frequency

This results in some combination of up to four sets of keywords: those from the learning object; those from formal documents; those from scripts; those from unknown documents.
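The two weighting schemes above can be sketched in a few lines. This is an illustrative sketch only and not the prototype's Java code: the stop list, the tokeniser and the function names are assumptions made for the example, while the weighting follows the natural-logarithm formula given above and, as in the prototype, no stemming is applied.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Illustrative stop list only; the prototype's actual stop list is not reproduced here.
STOP_WORDS = {"the", "and", "their", "a", "an", "of", "to", "in", "on", "is", "for"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text, split it into words and drop stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def term_frequency(text: str) -> Counter:
    """Simple Term Frequency: count the occurrences of each remaining term."""
    return Counter(tokenize(text))


def tf_idf(documents: List[str]) -> List[Dict[str, float]]:
    """TF-IDF per document: term frequency * ln(N / number of documents containing the term)."""
    counts = [term_frequency(doc) for doc in documents]
    n_docs = len(documents)
    doc_freq: Counter = Counter()
    for count in counts:
        doc_freq.update(count.keys())  # each document counts once per term
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in count.items()}
        for count in counts
    ]


def top_keywords(scores: Dict[str, float], n: int = 20) -> List[str]:
    """Return the n highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]


if __name__ == "__main__":
    formal_documents = [
        "Computing module on databases and computing.",
        "Module on networks.",
        "Module on programming and databases.",
    ]
    for scores in tf_idf(formal_documents):
        print(top_keywords(scores, n=3))
```

A term that appears in every document of the set (here "module") receives a weight of zero, which is why TF-IDF is only useful when a document type contains more than one document; a single document falls back to simple term frequency.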
Twenty keywords are then selected from a combination of learning object, scripts and formal documents, depending on what is available. Duplicates are removed and any shortfall is topped up from the learning object. The unknown set is only used if no other sets are available.

The description field is extracted from the learning object only, as the first two sentences of the first paragraph, if these are identifiable. The title is assumed to be the text formatted as Title in a generic Word document.

5.2.2 Limitations

The current prototype version can only extract content from MS Word 2003, although the APIs (Apache POI 3.1 Final) are available in the code for extending this to the rest of the MS Office 2003 applications. The XML output is tailored for use with IntraLibrary, which required the removal of MIME type and file size metadata as these are automatically processed by the repository. It is assumed that this is a common feature of most repositories, but repositories that do not process these fields automatically may report the XML as malformed.

5.3 Enhanced search components

The enhanced search application comprises four components (see Figure 9): a client-side Ajax interface written using the Google Web Toolkit; the service initiator that facilitates communication between client and server; the query extender that takes the user’s input and initiates four alternative searches based on synonyms extracted from the thesaurus (detail shown in Figure 10); and the parser that extracts the relevant fields from IntraLibrary’s response for each of the queries.

Figure 9: Search component overview

The client-side search interface provides the seamless integration of an Ajax-style application. A single search box and button enable users to enter their query. The results are displayed in a series of embedded show/hide panels. A generated hyperlink menu at the top of the page enables users to select the results of their original term or any of the extended terms (currently set to four) individually.
Each set of results is displayed in a drop down panel that gives the title of the document. When clicked this allows the user to see further details relating to the document. This currently includes Authors, Date and Description available in the metadata. A link to the learning object is also provided, but at this stage it is not attached to any authentication process. With the aid of the parser and GWT interface this can easily be adapted to accommodate other metadata details. Figure 11 shows the skeletal structure of the interface, which can be styled via CSS.

Figure 10: Detail of the Query Extender

Figure 11: Enhanced search interface

5.4 Use cases and e-Framework

Having analysed the data from the podcasting scenarios, we developed initial use case diagrams capturing the processes described by the users. A high-level diagram was developed identifying the overall process and decisions a user is required to consider. Some of the processes are specific to the editing application iMovie (part of the editing tool set used in the Apple video editing suite), but can be applied to other editing processes. From this, specific scenarios were modelled to show the changes required, for the user, in the decision process:
• How a podcast can be used to deliver material in preparation for a tutorial.
• Recording a lecture for student reference.
• Creating persistent materials for instruction and advice, with student input.

A second set of scenarios was developed by an e-CAT tutor. These were more closely focused on the production of learning objects by lecturers and other staff, based on data collected through a series of staff development workshops. A high-level diagram was produced showing the processes and decisions of producing a learning object using the e-CAT system.
We then used the scenarios above to examine the differences in requirements between an experienced lecturer and a lecturer at the start of their teaching profession. In contrast, we also modelled Belfast Met’s supported learning object creation activity, producing another four use cases for their multimedia learning object production process. These use cases are all available at http://www.streamlineproject.org. In addition we have submitted an entry for the project into JISC’s Innovation Base. 6 Outcomes 6.1 Achievements against aims and objectives The project has substantially met its main aim: to develop integrated tools to alleviate the additional administration associated with the use of institutional repositories for assessment, learning and teaching, informed by understanding of existing work practices. The tools produced are not embedded in proprietary tools as originally planned but are available as standalone tools and web services to allow integration with a range of repositories. The standards used have focused on interoperability. With regard to specific objectives (achievements highlighted in italics): 1. Review previous work on integrated tools, in particular drawing together the findings of previous projects (Replika, HLSI, CETL ALPS). We carried out an early review of these projects but when the project changed direction we needed to review a number of alternative development platforms, prior to the institution acquiring a full repository. These reviews are available at http://www.streamlineproject.org. 2. Evaluate existing tools for integrating repository functions with existing work practice e.g. e-CAT (REPLIKA’s integrated tool) with users from different disciplines and HE and FE sectors. eCAT was evaluated and this informed the design of our own tool. These have been evaluated with staff from across the institution as well as from an FE college partner. 3. 
Examine existing work practices surrounding activities (such as developing course materials) that might be expected to contribute to or draw on the contents of an institutional ALT repository. Scenarios of practice have been derived from focus group and tutorial activities and use cases created.
4. Analyse the disciplinary and sector differences in work practice in the use of integrated tools and repositories. Although we have evaluated tools with staff from different disciplines, their limited knowledge and use has made it impossible to make any judgements about disciplinary differences in use. The comparison across an HE and an FE institution shows clear differences in process, which have been considered.
5. Develop scenarios, domain models and service usage models representing these activities. We have developed use cases and specifications for tools and have contributed to the Innovation Base.
6. Explore and evaluate a range of mechanisms for metadata creation, such as automatic metadata generation, semantic processing, pattern matching, and metadata composition and develop algorithms and processes for their use. Explore the role of professional indexers in metadata creation. We have documented our deliberations on a range of approaches to metadata creation and the choices made in developing the automatic metadata generation tool. This included consideration of professional indexers.
7. Explore a range of search strategies, such as keyword and metadata search, browsing, thematic searching, full text indexing, relationship measures (nearness, distance), and recommendation. Evaluate against recognised information retrieval measures. We have documented our consideration of a range of search strategies and the choices made in developing the final tool. The tool has been evaluated against standard search algorithms.
8.
Explore the use of ePortfolios and social networking tools such as del.icio.us to facilitate personal management and sharing of repository resources. We considered ePortfolio plugins when we were reviewing Moodle and Plone but, when we began working with the institutional repository, we did not have access to an institutional ePortfolio so concentrated on social networking. With the PERSoNA project, we have explored social networking tools to encourage sharing and personal management of resources within the repository and we have documented a number of approaches considered. 9. Iteratively develop and evaluate tools for repository e-administration, integrating with both proprietary and open source software, to reduce personal administrative load. We have developed tools, through an iterative design and evaluation process. Following the change in the project plan we decided to develop these as stand alone and web-based tools rather than integrating within proprietary software. 10. Disseminate findings to evaluate tools more widely. We have disseminated widely within the institution, at JISC events nationally and within partner institutions. We are planning an end of project event to disseminate the outputs to the community. The tools produced are being adopted as part of the repository element of the PC3 project funded by JISC under the Curriculum Design programme. Overall our observation is that the project’s original objectives were based on the assumption that we would be adapting existing software, rather than developing tools from scratch. The change in the project plan led to focus on one tool for each aspect, rather than trialling of several. However, we have reviewed alternatives and documented our considerations. 6.2 Impact on stakeholders The project provides benefits to a range of stakeholders in higher education. These are summarised below. 
University senior management: The project has provided tools to assist staff in the creation of a university wide reusable resource. It has highlighted issues for staff development in both the creation of reusable learning objects and also depositing and searching for objects in the repository.

Teaching staff and learning technologists: The project provides support to staff in both submitting objects to the repository and finding objects in it. It also helps improve understanding of the process of deriving keywords and metadata.

Students: As staff start to make greater use of the repository for sharing resources, students will benefit from a wider range of learning resources being made available to them.

Technical staff – supporting repositories: The project has provided two prototype tools that can be used to enhance repository use. These have been designed to be interoperable and are available to be used, developed and extended by the community. The project has also modelled workflows of how staff use and create learning objects, including multimedia objects.

Software developers and vendors – open and commercial: The project has provided specifications for innovative tools and algorithms. Open source licensing makes these available for adoption, adaptation and development.

e-framework: The project has provided a statement of services needed to support the tools, together with models of user behaviour when interacting with repositories and learning objects.

6.3 Lessons learned

In the course of the project, we learned a number of lessons that may be valuable to other projects, particularly relating to processes and management. The User Innovation and Development Model (Fowler & Scott, 2007) works well for developing the interactive elements of software but is more constraining when developing the backend components of complex software algorithms.
We were over-ambitious in our plans, even before circumstances forced us to change direction. Projects should recognise the complexities of creating robust, well-constructed web-based software tools when allocating time and resources in the project budget. We have also learned that things can go wrong on projects, so plans need to be flexible and able to change to accommodate changes in circumstance. Risk assessment is an important element of project planning and, if done realistically and thoroughly, will help when things do not go as planned. 7 Project Management The project director, bought out on a fraction of contract, managed the project in the first year. This was not ideal, as project management responsibilities were often competing with project development activities for that time. From the second year, project management was devolved to an experienced project manager on a contractual basis, which was much more successful. The start of the project coincided with a freeze on new jobs in the Faculty, which led to a six-month delay in recruiting a full time project officer. Buying in additional existing staff time covered this time. However, this was not as effective as a full time member of staff, due to competing demands on their time. It would be useful in such circumstances to be able to negotiate a delayed start to the project, particularly since the lead in to projects is often so short. The liquidation of the external partner company Kainao Ltd, led to significant revision of the original project plan. In particular, the decision to focus on developing tools from scratch, rather than modifying the existing e-CAT tool, supported by key staff from Kainao, meant that the original UIDM based 10-week cycles were no longer feasible. This also led to the project officer having to spend more time on software development than originally intended. 
At this point the workload placed on core project staff meant that more staff were required to support the project, a fact reinforced by the early retirement of yet another member of the team. As well as the project manager, an additional software developer was bought in from existing staff, to support the project officer. Subsequent revisions to the project plan and budget were submitted and approved in March 2008 and these have proved to be effective for the project.

The increasingly close partnership between the three repository projects at Leeds Metropolitan University led to the projects holding joint monthly meetings, sharing results, tools and insights in a rich and complementary fashion. This has resulted in some blurring of the distinction over where the responsibility for certain outcomes actually lies. For example, the issue of resource management in Streamline and the use of social networking to increase the sharing of knowledge in PERSoNA have become merged into one activity.

As well as face-to-face meetings we managed our activities using social networking tools. At the beginning of the project we set up two blogs: an external-facing blog to report progress and an internal project blog for ongoing development. As work progressed it became clear that this was an artificial separation. It was increasingly difficult to decide what should go where and at what point work had “progressed” far enough to be reported. The net result was that much of the ongoing work was remaining behind the closed door of the internal blog. We therefore decided to merge the two. All the internal blog posts (except minutes etc) were added to the external-facing blog and all subsequent posts were made to that blog. This has proved much more effective and demonstrates the risks associated with selecting an overly complicated communication strategy.
It also highlights the fact that openness on the process of development can be beneficial to projects.

We have held a number of joint events to share outcomes, including sessions at the annual Staff Development Festival and a Repository Awareness Day. One further event is planned in May with a particular emphasis on external dissemination.

8 Conclusions

The Streamline project met its aims but with reduced scope and on a different trajectory from that originally intended. These changes in circumstance have at times been difficult for the project team, but have been beneficial in highlighting important issues relating to risk management and flexibility in planning.

As the project progressed it became clear that institutional context and culture were also a major influence on the development of the project. Leeds Metropolitan University is very new to the use of repositories (with an institutional repository being established for the first time during the project). Staff therefore have no expectations and limited knowledge of tasks surrounding repository use. This led us to spend more time than anticipated working with staff on raising awareness. The widespread use of learning objects is still some way off at Leeds Met but the project has contributed to broadening the debate and has provided tools to facilitate the process.

However, tools are only one element. There is also an ongoing issue with change management for staff. An interesting discussion at one of our awareness events resulted in a request to provide a brief summary of the benefits the repository would bring to an average overworked academic. For research this is easy to articulate. Research assessment is in future going to be weighted towards citation. People cite what is accessible and an open access repository makes work available for download. In other words, sharing research outputs is self-rewarding for researchers. Such benefits are harder to define for learning objects.
In theory you share what you have and gain access to a much wider pool of resources through the repository. Academics stop reinventing the wheel, save time and resources and have better quality materials. But in practice this is rarely how it works, particularly if a repository has not reached “critical mass” - and many repositories never reach this point for any given discipline. There has to be content for people to see benefit - and we can only have content if people contribute in some way altruistically.

For many academics this presents a problem. There is no recognised reward system for sharing teaching and learning, as there is for research. Unlike research papers, authorship and contribution are often distributed and harder to specify. Plagiarism is certainly harder to spot and possibly taken less seriously. To promote a culture of sharing learning resources requires that this is given appropriate recognition. This might be at an institutional level, giving credit for the learning resources shared in promotion and personal review. It might be at a community level, offering some kind of community recognition. Any of this would need to be accompanied by peer review so that credit is given for quality not just quantity. Such a scheme is beyond the scope of the project but is one aspect that needs to be considered in future developments in this area.

9 Implications

The project has developed prototype tools and explored a range of algorithms. These tools can be used and extended by the community. Future developments in the following areas would enhance these tools:

Automatic metadata generation tool:
• The extraction tool needs to be extended to include a wider variety of file types.
• Specialised stop lists could be used for formal document level analysis.
• Sentence-based summary methods could be used to populate the description field, regardless of the structure of the documents.
• The LOM subset used could be extended.
• Categorisation of learning objects could be automated using the thesaurus and IntraLibrary’s cataloguing taxonomy.
• Personal collections for technical aspects and rights could be implemented.

Enhanced search tool:
• The search could be extended to include an advanced search that utilises individual metadata fields.
• The other three augmented searches could be implemented and tested, and then offered as a choice to users.
• Neighbourhood search implementations could be extended to include latent semantic analysis and other algorithms.
• An interface could be developed to allow visualisation of search results.

General issues:
• Work with IntraLibrary to enhance the range of tools and APIs available so that content can be seen and searched via the web.
• Applications for access to IntraLibrary from common Web 2.0 applications could be developed along the lines of the Facebook application developed with PERSoNA.
• The repository and tools could be integrated with the virtual learning environment and other institutional systems.

10 Recommendations

A number of specific recommendations arise from this project.

1. Institutions should consider specific staff development with a focus on metadata and searching if sharing resources is to be facilitated.
2. Institutions and the wider community should consider developing reward mechanisms to encourage sharing of quality resources.

11 Bibliography

Fowler, C., & Scott, J. (2007). The User Innovation and Development Model Guide. University of Essex.
Lin, J., Newman, M. W., Hong, J. I., & Landay, J. A. (2000). Denim: finding a tighter fit between tools and practice for web site design. Proceedings of CHI (pp. 510-517). ACM.
Luker, W., & Sheppard, N. (2009). PERSoNA: Personal Engagement with Repositories through Social Network Applications. JISC Final Report, Leeds Metropolitan University.
Naghsh, A. M., & Dearden, A. (2004). GABBEH - A tool to support collaboration in electronic paper prototyping: A Demonstration. ACM Conference on Computer Supported Cooperative Work. Chicago: ACM.
Soosay, M. (2007). Beginner's Guide to e-cat: Creating a learning object. Leeds Metropolitan University. Available from http://www.streamlineproject.org.