University of Bradford JISC Grant Funding 03/09 Cover Sheet for Proposals

JISC Grant Funding 03/09
Cover Sheet for Proposals
(All sections must be completed)
Rapid Innovation Programme
Name of JISC Initiative: JISC Rapid Innovation Grants
Name of Lead Institution: University of Bradford
Name of Proposed Project: Concept Linkage in Knowledge Repositories
Name(s) of Project Partner(s): National Media Museum, Bradford (part of the National Museums of Science and Industry Group)
Full Contact Details for Primary Contact:
Name: Professor Peter Cowling
Position: Associate Dean (Research), Professor of Computer Science, School of Computing, Informatics & Media
Email: [email protected]
Tel: 01274 234005
Fax:
Address: School of Computing, Informatics & Media, University of Bradford, Bradford BD7 1DP
Length of Project: 6 months
Project Start Date: 1/6/09
Project End Date: 30/11/09
Total Funding Requested from JISC: £32,000
Funding Broken Down over Academic Years (Aug-July):
Aug08 – July09: £10,667
Aug09 – July10: £21,333
Total Institutional Contributions:
Outline Project Description
Knowledge repositories proliferate at an accelerating rate. While these offer excellent
support for specific information searches, there is limited support for unstructured
browsing or semi-structured information gathering, when a user does not know what
there is to know (but wants to find information connecting known concepts). Students
making the transition from School to University often feel swamped by information and
need to develop skills in information literacy. There is strong evidence that Wikipedia
is a very important source of information for University students (consider the JISC
SEEL project), especially in year one. Tools for understanding the structure of
information in these large repositories and for conducting semi-structured queries are
needed by University students and by the general public.
This project will build a tool for semi-structured searching of knowledge repositories
based on finding previously unknown concepts that lie between other concepts.
Consider a user who wanted to know about optimisation of crystal structures. A
search which looks for concepts which lie between and hence connect “optimisation”
and “crystal structure” may turn up previously unknown concepts such as “genetic
algorithms” or “space groups” – which would be very difficult to find via conventional
approaches to search (which assume that the user has a good understanding of what
terms to search for).
This project brings together the School of Computing, Informatics and Media and the
Teaching Quality Enhancement Group at the University of Bradford, together with the
National Media Museum (based in Bradford). Hence the project has a range of critical
friends to increase applicability, take-up and longevity of the developed tool.
The National Media Museum is in the process of assembling a gallery about the
history, evolution and social impact of the Internet, and the proposed project may yield
a tool which may form part of that gallery, in which case a very large number of users
will try the system via the National Media Museum’s web site and physical gallery
space. Working with the National Media Museum provides a unique opportunity to add
value to the project by raising public understanding and awareness of new ways to
understand and use information repositories, including the largest repository of them
all, the Internet. Please note that the future Internet gallery is not yet in the public
domain and therefore the National Media Museum would need to review and approve
any press releases or information concerning the gallery, or the relationship of this
project to the gallery.
List of priority areas, highlight each that applies:
Mashups of open data ***
Aggregating tags and feeds
Semantic web/ linked data ***
Data search ***
Visualising Data ***
Personalisation
Mobile Technologies
Lightweight Shared Infrastructure Service
User Interface Design ***
I have looked at the example FOI form at Appendix A and included an FOI form in
the attached bid (Tick Box): YES √ / NO
I have read the Funding Grant and associated Terms and Conditions of
Grant at Appendix B (Tick Box): YES √ / NO
B. FOI Withheld Information Form
We would like JISC to consider withholding the following sections or paragraphs from
disclosure, should the contents of this proposal be requested under the Freedom of
Information Act, or if we are successful in our bid for funding and our project proposal is
made available on JISC’s website.
We acknowledge that the FOI Withheld Information Form is of indicative value only and that
JISC may nevertheless be obliged to disclose this information in accordance with the
requirements of the Act. We acknowledge that the final decision on disclosure rests with
JISC.
Section / Paragraph No.: F
Relevant exemption from disclosure under FOI: Budget
Justification: Pricing is commercial in confidence
C. Appropriateness and Fit to Programme Objectives and Overall Value to the JISC Community
'Long before its competitors, Google's knack was to realise that people navigated their way through
Cyburbia not by information per se, but by examining the relationship between bits of information'. James Harkin, Cyburbia, 2009.
1. Consider a first year undergraduate student writing a first essay. The information resources
available are immense, and possibly intimidating. Many students will use Wikipedia as a first
point of reference. However, there is a problem with using Wikipedia, the Internet, or another
knowledge repository – the student will find it very difficult to find new concepts. For example,
suppose they wish to try to find a link between the second world war and the local area. A
search for “second world war” or for the local area using a search engine such as Google is
unlikely to yield useful information. The required information lies between these two concepts
(in a space of concepts which are unknown, and hence unsearchable, for the student).
2. This project aims to address the first priority area in the JISC 03/09 Rapid Innovation Grants
call – that of “Information seeking by learners, teachers and researchers”. We will build a
software tool which allows the user to search for concepts which lie in an unknown (and
hence unsearchable) space of concepts “between” other known concepts (the notion of
“betweenness” is explained in more detail below). There is the potential here to revolutionise
ideas of Internet and knowledge repository searching, but at a realistic level there is great
potential to provide a powerful tool for learning and research, and to provide greater
understanding of the structure of knowledge in the repository.
3. We consider the concepts in the knowledge repository as the nodes in a network, with the
connections between concept “nodes” giving the degree to which two concepts are related.
The “relatedness” value which we use can consider numbers of common words, or common
links, etc. Experimental analysis of appropriate “relatedness” values is an important goal of
this project. Once we have constructed the network we can investigate it using a range of
search and visualisation approaches. For example, we can use shortest path algorithms to
find concepts which lie between two concept nodes, or we can explore the degree and
distance by which concepts are related, to understand the structure of knowledge in an area.
The second part of the project will investigate algorithms and visualisation tools for
exploration of the knowledge repository network.
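As an illustration of the network idea described above, the following minimal Python sketch treats a handful of concepts as nodes, scores relatedness as the number of links two concepts have in common, and uses Dijkstra's shortest path algorithm to list the concepts lying between two endpoints. The toy link data, the function names, and the choice of 1/relatedness as the edge length are assumptions made purely for illustration; the project proposes to determine appropriate relatedness values experimentally.

```python
import heapq

# Toy data (hypothetical): each concept maps to the set of pages it links to.
LINKS = {
    "optimisation": {"algorithm", "search", "fitness", "mathematics"},
    "genetic algorithms": {"algorithm", "search", "fitness", "energy", "lattice"},
    "crystal structure": {"lattice", "energy", "symmetry", "atom"},
    "space groups": {"symmetry", "lattice", "mathematics"},
    "poetry": {"verse", "rhyme"},
}


def relatedness(a, b):
    """Number of links the two concepts have in common (0 = unrelated)."""
    return len(LINKS[a] & LINKS[b])


def build_network():
    """Connect every pair of concepts that shares at least one link."""
    network = {concept: {} for concept in LINKS}
    for a in LINKS:
        for b in LINKS:
            if a != b:
                common = relatedness(a, b)
                if common > 0:
                    # Assumption: strongly related concepts get short edges.
                    network[a][b] = 1.0 / common
    return network


def shortest_path(network, start, goal):
    """Dijkstra's algorithm; returns the list of concepts from start to goal, or None."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in network[node].items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + length, neighbour, path + [neighbour]))
    return None
```

With this toy data, "optimisation" and "crystal structure" share no links directly, and `shortest_path(build_network(), "optimisation", "crystal structure")` returns the path through "genetic algorithms", which is the kind of connecting concept the tool is intended to surface.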
4. The project is linked to the following priority areas: “Mashups of open data”, “Semantic web/
linked data”, “Data search”, “Visualising Data”, and “User Interface Design”, since it will
provide an easy-to-use interface for software tools for data search using the publicly
available Wikipedia data repository (although the techniques we will use could be applied to
another repository, such as a publication database or a subset of the Internet, in a follow on
project). In particular, the project will provide a novel way of visualising and exploring
information by considering the information as a network of connected concepts.
5. The project brings together the research, software and project management skills of the
School of Computing, Informatics and Media at the University of Bradford, the expertise in
new methods for teaching and learning of the Teaching Quality Enhancement Group at the
University, and the public engagement focus of the National Media Museum. Brief CVs are
given towards the end of this document. We have demonstrated the technical (software,
modelling, algorithm development, user interface development) skills for this project over
many related projects, which have attracted several million pounds of funding from research
council, EU and industry sources. Engagement with user groups which represent both
University students and a broader interested public should provide a software tool which is
effective in a broad range of teaching, learning, research and public understanding
applications, and likely to provide long term benefits to a wide range of users.
6. Sustainability of the project and longevity of the software will be assured by involving two
large user groups (representing University of Bradford students and a broader scientifically
curious public), and by releasing all source code using an open source arrangement to allow
broader dissemination to other Universities and groups.
D. Quality of Proposal and Robustness of Workplan
Outline of Project
7. This project will aim to provide software tools for visualisation and analysis of large knowledge
repositories by analysing concept linkage. We will use information from the online
encyclopaedia Wikipedia as it is easily accessible (and can be downloaded in its entirety onto
a PC for development/analysis), and contains a huge amount of connected data.
8. Visualising large networks is a difficult task. This project will produce tools which allow
visualisation of dynamically changing networks. The networks will change during a user
search, and due to changes in the user-selected measures of connectedness between
concepts.
9. We will conduct a short study of existing algorithms for comparing two online documents,
using textual and link analysis.
10. Finding a “path” between two knowledge items will reveal knowledge that was previously
unknown to the user. Shortest path algorithms, such as the A* algorithm, can allow the user
to tailor their search. We will implement a “user-directed” version of the A* shortest path
algorithm.
11. The project will develop tools for visualisation of the search process. A graph of
intermediate nodes which connect the start concept to the end concept will be presented in
such a way that the user can browse at an appropriate level of detail – from simply discerning
intermediate network structure, through to inspection of highly relevant connecting concepts,
to inspection of less immediately relevant concepts. In all cases, it should be possible to
inspect a node to determine the detailed contents, and then easily return to the network
search.
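To make the idea of a “user-directed” A* search in point 10 more concrete, the sketch below shows one possible reading: a standard A* search in which the user attaches an extra cost to concepts they wish to steer away from (for example, concepts they already know). How user direction will actually be implemented is not specified in this proposal, so the `penalty` mechanism, the function name `a_star`, and the toy graph and edge costs are all assumptions for illustration.

```python
import heapq
from itertools import count

# Toy concept network (hypothetical costs): graph[node] = {neighbour: cost}.
GRAPH = {
    "optimisation": {"genetic algorithms": 2.0, "space groups": 4.0},
    "genetic algorithms": {"optimisation": 2.0, "crystal structure": 3.5},
    "space groups": {"optimisation": 4.0, "crystal structure": 2.5},
    "crystal structure": {"genetic algorithms": 3.5, "space groups": 2.5},
}


def a_star(graph, start, goal, heuristic=lambda node: 0.0, penalty=None):
    """A* search returning (cost, path), or None if the goal is unreachable.

    heuristic(node) estimates the remaining cost to the goal and should never
    overestimate it; the default of 0 reduces A* to Dijkstra's algorithm.
    penalty maps a concept to an extra cost (>= 0) for entering it, which is
    how the user steers the search in this sketch (an assumption).
    """
    penalty = penalty or {}
    tie = count()  # tie-breaker so the heap never has to compare paths
    frontier = [(heuristic(start), next(tie), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # a cheaper route to this node was already expanded
        for neighbour, edge in graph[node].items():
            new_cost = cost + edge + penalty.get(neighbour, 0.0)
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic(neighbour), next(tie), new_cost,
                     neighbour, path + [neighbour]),
                )
    return None
```

With the toy costs above, an undirected search from "optimisation" to "crystal structure" goes via "genetic algorithms"; passing `penalty={"genetic algorithms": 2.0}` makes the route via "space groups" cheaper instead. Because penalties only ever add cost, a heuristic that does not overestimate on the unpenalised network still does not overestimate on the penalised one.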
Deliverables
12. This project will deliver the following:
a. Algorithms for quantifying similarity between knowledge items in Wikipedia, and a
user interface to allow the user to easily specify their similarity measurement
preferences. These algorithms may later be generalised across other knowledge
repositories and the Internet.
b. Software for visualisation of concept linkage. This tool will use the above algorithms
to provide a visual representation of how two knowledge items are linked, and to
visualise concepts lying close to a “path” of knowledge items linking them.
c. Evaluation of the tools. Members of the Teaching Quality Enhancement Group and
the National Media Museum will evaluate the software tools in the context of student
learning and research and in the context of public engagement, respectively.
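Deliverable (a) can be pictured with a small sketch in which the user's similarity preferences are expressed as weights on two measures: overlap of words and overlap of links. The field names `text` and `links`, the use of the Jaccard index, and the two-weight interface are assumptions for illustration, not the measures the project will necessarily adopt.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(item_a, item_b, word_weight=0.5, link_weight=0.5):
    """User-weighted mix of word overlap and link overlap, in the range [0, 1]."""
    if word_weight < 0 or link_weight < 0:
        raise ValueError("weights must not be negative")
    total = word_weight + link_weight
    if total == 0:
        raise ValueError("at least one weight must be positive")
    words = jaccard(item_a["text"].lower().split(), item_b["text"].lower().split())
    links = jaccard(item_a["links"], item_b["links"])
    return (word_weight * words + link_weight * links) / total


# Hypothetical knowledge items, for illustration only.
ITEM_A = {"text": "search for the best solution", "links": ["algorithm", "mathematics"]}
ITEM_B = {"text": "search for the lowest energy", "links": ["algorithm", "energy"]}
```

Calling `similarity(ITEM_A, ITEM_B, word_weight=1, link_weight=0)` scores the pair on shared words alone, while swapping the weights scores it on shared links alone; intermediate weights let the user blend the two.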
Workplan
13. Preparatory work will take place and will include the following: (1 month)
a. Project website setup.
b. Investigation of requirements in brainstorming/planning meetings of the whole project
group, and finalisation of a detailed project plan.
c. Identification and assessment of project risks.
14. Analyse/design/implement knowledge item similarity metrics: (1 month)
a. Analyse data in the Wikipedia database to understand the format which our
algorithms must use.
b. Concept similarity measures will be investigated and algorithms/software created to
empirically test these methods.
15. Analysis/design of search algorithms for concept linkage. (1 month)
a. Investigate/implement search algorithms using previously implemented similarity
measures.
b. Empirically test efficiency at finding “interesting” paths between concepts.
16. Visualisation/user interface software development (2 months)
a. Design/implement software for fast retrieval of partial/whole concept data from
Wikipedia at three scales (title only, title/brief description, full).
b. Create resizable visualisations of subnetworks of the overall concept network.
c. Design/implement software for visualising the search methods at different levels of
detail as a tool for exploring concept linkage.
17. Evaluation and testing (1 month)
Throughout the project, there will be continuing evaluation of designs and software
prototypes. At the end of the project there will be a more detailed user evaluation:
a. Testing/evaluation of the developed software as an educational tool, by
undergraduate/masters/research students from the University of Bradford.
b. Testing/evaluation of the developed software as a tool for public understanding of
knowledge repositories and the Internet, by National Media Museum staff.
18. Documentation/dissemination:
a. Minutes of Meetings will be recorded and project progress blogged. The final project
will be well documented with help files for easy dissemination.
Project Management Arrangements
19. The Project Roles are:
Stephen Remde – Software analysis/design/development. Day-to-day task
management.
Peter Cowling – Overall project management and high level design. Steering/Quality
assurance for researcher stakeholder group.
Peter Hartley/Will Stewart – Steering/quality assurance for University teaching and
learning stakeholder group
Joe Stocks-Brook/Tom Woolley – Steering/quality assurance for public
engagement/National Media Museum stakeholder group
20. The Project Steering Committee will consist of all the above people. The group will have two
half day meetings at the start of the project (to brainstorm and develop stakeholder
requirements and high level design ideas). The Steering group will also meet half way through
the task given above in point 16 (the earliest point when there is a fully functioning piece of
software) to assess progress, to identify needed modifications of the part-finished design and
to consider possible follow-up projects. A final meeting will occur at the end of the project to
finalise dissemination routes that will be used to publish the software.
21. If no steering group meeting is planned, a shorter virtual meeting of the steering group
(using Skype) will take place to keep all project members up to date with project progress.
22. The project will be managed by Peter Cowling and the work undertaken by Stephen
Remde. Documented weekly project meetings will take place to monitor progress and set
short term goals between these two, in addition to daily ad hoc meetings.
E. Engagement with the Community
23. Engagement with the Teaching Quality Enhancement Group at the University of Bradford
means that the student learning stakeholder group is well represented in this proposal.
Through the Teaching Quality Enhancement Group and directly with students from the School
of Computing, Informatics and Media, we will ensure that our software addresses the needs of
taught course students. In particular, this may point to the possibility of a follow-on project
which uses other data repositories for student learning.
24. Engagement with the National Media Museum, at a time when they are at the early stages
of assembling a gallery about the history, evolution and social impact of the Internet, provides
a golden opportunity to engage members of the public. Our software should enable members
of the public to access and understand data repositories and the Internet in new ways. In
particular, this may point to the possibility of a follow-on project which considers a larger part
of the Internet.
25. Peter Cowling and Stephen Remde are both active researchers. In addition to providing a
dissemination route for project deliverables via the scientific literature, they will work with
other researchers within the School of Computing, Informatics and Media to evaluate the
applicability of the developed tools to the research domain. In particular, this may point
towards the possibility of a follow-on project using research data and research publication
repositories (such as the Bradford University Repository Project funded by JISC).
26. Software will be released using an open source licence agreement to be used as a teaching
aid or research tool for a wide range of knowledge repository users.
27. Progress will be blogged and a project website will be maintained throughout the project.
28. We will work with other related JISC project groups, such as the Dynamic Learning Maps
project at Newcastle University, for which Peter Hartley is an external critical friend.
F. Budget
Directly Incurred Staff                                           August 08– July 09   August 09– July 10   TOTAL £
Software Developer (Stephen Remde), grade 7, 6 months full time   £5,633               £11,267              £16,900
Technical support – contribution 0.1 FTE                          £508                 £1,016               £1,524
Total Directly Incurred Staff (A)                                 £6,141               £12,283              £18,424

Non-Staff                                                         August 08– July 09   August 09– July 10   TOTAL £
PC + development software                                         £1,500               £                    £1,500
Total Directly Incurred Non-Staff (B)                             £1,500               £                    £1,500

Directly Incurred Total (C) (A+B=C)                               £7,641               £12,283              £19,924

Directly Allocated                                                August 08– July 09   August 09– July 10   TOTAL £
Project Manager/Systems Architect 0.2 FTE (Peter Cowling)         £2,753               £5,505               £8,258
Co-investigator 0.05 FTE (Peter Hartley)                          £701                 £1,403               £2,104
Int. critical friend 0.05 FTE (Will Stewart)                      £456                 £912                 £1,368
Estates                                                           £2,600               £5,200               £7,800
Media Museum staff (5 days)                                       £248                 £496                 £744
Directly Allocated Total (D)                                      £6,758               £13,516              £20,274

Indirect Costs (E)                                                £8,360               £16,719              £25,079

Total Project Cost (C+D+E)                                        £22,759              £42,518              £65,277
Amount Requested from JISC                                        £10,667              £21,333              £32,000
Institutional Contributions                                       £12,092              £21,185              £33,277

Percentage Contributions over the life of the project: JISC 49%, Partners 51%, Total 100%

No. FTEs used to calculate indirect and estates charges, and staff included:
No. FTEs: 0.65
Which Staff:
Stephen Remde (1.0 FTE x 6 months)
Peter Cowling (0.2 FTE x 6 months)
Peter Hartley (0.05 FTE x 6 months)
Will Stewart (0.05 FTE x 6 months)
G. Previous Experience of the Project Team
29. Professor Peter Cowling (Principal Investigator) is Associate Dean (Research) and
Professor of Computer Science in the School of Computing, Informatics and Media at the
University of Bradford. He is very active in computer science research and knowledge
transfer, having published over 60 articles in high quality scientific journals and conferences,
and edited 2 books, as well as securing substantial research funding. His research applies
Artificial Intelligence to the modelling and search themes of this proposal. He has
a passion for teaching, and has secured funding from Microsoft in recent years to develop
three environments for teaching computer science and programming skills. He has managed
several medium-sized (1-10 person-year) software projects, and was a software project
manager for AI systems BV, Belgium, prior to joining academia.
30. Stephen Remde (Software Designer/Developer) will submit his PhD thesis at the end of
April 2009. His PhD was undertaken in collaboration with an active industrial partner (Trimble
MRM Ltd.), which has provided him with excellent software development, project management
and team working skills. His PhD concerned systems for mobile workforce
scheduling, requiring substantial expertise in the search and other AI techniques in this
proposal, as well as user interface design methods. He has significant experience working as
a professional software engineer and web application designer (for Intrica Ltd.).
31. Prof Peter Hartley (co-investigator) has, since moving to Bradford in 2003, been
proactive in developing the university’s policies and practices to enhance student learning,
extending support for e-learning and e-assessment, and establishing new project and
e-learning initiatives. His national involvement includes the University’s two successful
partnership CETLs (LearnHigher and ALPS), Project Director/sponsor for 4 previous JISC
projects (ELP1, ELP2, IT4SEA and ASEL), and work for three JISC Advisory Boards. He led
the University’s Pathfinder project following Benchmarking and is External Evaluator for one
CETL focussed on e-learning (SOLSTICE). He is currently one of the Critical Friends on the
JISC Curriculum Delivery Programme. His work as National Teaching Fellow (NTF) has
included multimedia software to support communication via virtual learning - The Interviewer,
Gower 2004. Publications include research on assessment feedback and applications of ICT.
32. Will Stewart (internal critical friend) began working in 2002 with the NLN Materials Team
at Becta and was involved in promoting the integration of e-learning into the curriculum and
encouraging the use of the NLN Materials in teaching and learning in the FE and ACL
sectors. In his previous role as e-learning advisor with the JISC Regional Support Centre for
Yorkshire and Humber, he actively supported staff development in this field and worked
closely with other national services and initiatives, such as the HEA, LSDA Q Projects, DfES
Standards Unit and other JISC services such as JISC Infonet and the Plagiarism Advisory
Service. As University e-Learning Advisor, his main role is now to support
teaching staff in the use of technology to enhance learning and teaching.
33. Joe Stocks-Brook (external critical friend) is Gallery Development Manager at the
National Media Museum. He has extensive experience in driving and creating new visitor
experiences at the museum and is instrumental in implementing new ways of
using technology to deliver the museum’s public programme. While leading the Gallery
Development department, Joe has successfully managed the technical delivery of all the
museum’s current permanent and temporary exhibitions and is leading the development of the
interactive, technological and 3D elements of the proposed Internet Gallery.
34. Tom Woolley (external critical friend) is Curator of New Media at the National Media
Museum. He has five years’ experience as a web designer, with in-depth knowledge of HTML web
page structure, CSS layout and Adobe Flash. He also has over two years’ experience as a
museum curator, developing gallery content and working with designers to interpret
information accordingly.
Pro Vice-Chancellor
Professor of Electronic Imaging and Media Communications
Rae Earnshaw PhD FBCS FInstP FRSA CEng CITP
JISC
Northavon House
Coldharbour Lane
Bristol
BS16 1QD
20 April 2009
Dear Sir or Madam,
JISC Call – Rapid Innovation Programme 03/09
The University wishes to confirm its full support for this bid on Concept Linkage in Knowledge
Repositories.
This bid seeks to address the challenge of unstructured information gathering and the development of
tools and techniques to surf knowledge repositories based on finding concepts that lie between other
concepts. The project will provide new visualization and analysis of large networks of related text.
Initially this will be from the online encyclopaedia Wikipedia. The software will also be used in the
National Media Museum as a way of visualizing information and determining how it can be used
within the context of the Internet to intelligently find related articles.
As the University is currently seeking to develop its learner environment, particularly for active and
collaborative learning, this proposal is especially important and relevant.
Background
The University has a substantial E-Strategy programme (2004-2012) in support of the institution’s
Corporate Plan which is designed to give students greater flexibility of learning and working via online
information and learning environments. The University has adopted a blended learning strategy
whereby online information is used to support, facilitate, and enhance the student learning process
which is initiated in the lecture theatre, classroom, or laboratory.
Perceived Advantages of Emergent Technologies
A recent development in e-administration in the Student Support Services has enabled us to integrate
a number of service units that previously operated separately. Technology has enabled this
integration to be accomplished, and it has also facilitated the change of working practices so that the
services are now more accessible and more student-focussed. This is currently being extended into a
virtual Student Support Service so that the benefits are available to students 24 hours a day 7 days a
week from wherever they are located. We therefore see greater utilisation of Web 2.0 technologies
and mobile devices as the next logical step in terms of service delivery and support. This project will
also be able to build on insights gained from previous and ongoing projects which have looked at the
implications and applications of Web 2.0 technologies for the student experience, for example - the
University Pathfinder project funded by the Higher Education Academy (HEA).
Yours sincerely
Professor Rae Earnshaw
[email protected]