Ghent University – iMinds – Multimedia Lab
Master thesis subjects 2015 - 2016
Knowledge on Webscale
Assessing Open Source Code Trustworthiness through Version History
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Tom De Nies, Ruben Verborgh, Miel Vander Sande
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: [email protected]
Keywords: Trust, Version Control Systems, Git, Semantic Web
Location: home, Zuiderpoort
Problem definition:
An increasing amount of code is being shared on the Web, thanks to various open source initiatives
such as npm, SourceForge, Google Code, etc. However, with all this code being pushed to the Web,
and the possibility for anyone to contribute to any project, the quality of the resulting software is not
always optimal. Furthermore, manually judging whether or not to trust a piece of code is a time-consuming and inexact process, while programmers should be focusing on writing contributions
of their own. Therefore, there is a need for an automatic method to help programmers decide
whether or not they can trust a certain piece of code.
Goals:
To achieve this, the student will exploit the information contained within a Version Control System
(VCS), such as Git. More specifically, the ‘provenance’ (also referred to as ‘data lineage’) of the
code will be exposed using a tool such as Git2PROV. Then, the student will create a method to
automatically reason over this provenance, to give the end-user an assessment of the trustworthiness
of each version of the code. An extensive literature review will need to be conducted, to find out
which criteria influence trustworthiness, and how they can be inferred. Finally, the thesis should
result in a lightweight, user-friendly demonstrator that can easily be used by programmers in their
daily workflow.
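As an illustration of the kind of signals such a method could extract, the sketch below derives two naive indicators (number of commits and number of distinct authors) from a repository's history via the git command line. The scoring formula is purely hypothetical; the actual criteria are exactly what the literature review should establish.

```python
import subprocess
from collections import Counter

def commit_authors(path="."):
    """Return a Counter of commit author emails for the repository at `path`."""
    log = subprocess.run(
        ["git", "-C", path, "log", "--pretty=format:%ae"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return Counter(log)

def naive_trust_score(path="."):
    """Toy heuristic: more commits and more distinct authors -> higher score.

    The thesis would replace this with criteria derived from the literature
    review (e.g. review practices, author reputation, change churn).
    """
    authors = commit_authors(path)
    commits = sum(authors.values())
    return min(1.0, 0.1 * len(authors) + 0.01 * commits)

print(naive_trust_score())
```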
Towards a Trusted Web by Tracing the Origins of Composite Web Pages
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Tom De Nies and Ruben Verborgh
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: [email protected]
Keywords: Trust, Version Control Systems, Git, Semantic Web
Location: home, Zuiderpoort
Problem definition:
While openness is one of the core foundations of the Web, it has caused such an abundance of
heterogeneous content that it becomes unclear to humans whether they can trust the content they see on
web pages. Furthermore, web pages are often littered with tracking mechanisms, such as hidden
pixels, cookies, etc. The first step in deciding whether or not to trust a web page is finding out where
its contents come from, who made/edited them and whether these sources are to be trusted. This is
what’s known as the page’s provenance.
Goals:
In this master’s thesis, the student will investigate a method to trace the provenance of Web pages
and the parts of which they are composed. To achieve this, an extensive literature study will be
performed to identify existing approaches that might serve as a baseline for this purpose. The
student will then devise an improvement on these approaches, thereby exposing the full provenance
of a web page. This provenance can then be interpreted by a reasoner to suggest a trust
recommendation to the end user: a human being looking at the web page through a browser. By
working with standards from the W3C Open Web Platform, the solution devised in this thesis
potentially has a worldwide impact.
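A possible first step, sketched below under the simplifying assumption that provenance can be approximated by content origin, is to list the external hosts a page embeds content from; the thesis itself would go much further and model this as proper provenance.

```python
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

def content_origins(url):
    """List the distinct hosts a page pulls embedded content from."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    hosts = set()
    for tag, attr in (("img", "src"), ("script", "src"),
                      ("iframe", "src"), ("link", "href")):
        for node in soup.find_all(tag):
            target = node.get(attr)
            if target:
                host = urlparse(target).netloc
                if host:
                    hosts.add(host)
    return sorted(hosts)

print(content_origins("https://example.org/"))
```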
Bringing time-travel to data on the Web with efficient indexes
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Miel Vander Sande, Laurens De Vocht
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1 or 2
Number of theses: 1
Contact: Miel Vander Sande
Keywords: Data structures, Web APIs, Indexing, RDF, Semantic Web, Linked Data
Location: home, Zuiderpoort
Problem definition:
The Web of Data is an interconnected network of Linked Datasets residing on the Web. In
contrast to documents, concepts within the data are linked in a single global data space. These data
are modelled as a graph using the RDF framework, which describes data as triples (subject-predicate-object). Although reading infrastructure has evolved significantly, writing to this dataspace
is still an unsolved problem. How do you maintain such a dataspace where different users create,
read, update and delete statements? How do you take into account different views on the same
data? These problems pose new challenges for data storage, more specifically indexes, where
changes need to be remembered. Databases use many different indexing algorithms like B-trees,
B+ trees, skip lists, hash tables and storage structures like Log-structured storage to enable fast and
concurrent read/write access. How do they perform for RDF? Can they be exploited by version
control systems?
Goals:
In this thesis, you create a building block for the Read/Write Web of Data. You dive into the
literature about general and versioned indexing strategies in RDF databases. Based on this
knowledge, you propose a technique that supports (a) fast triple pattern based retrieval, (b)
acceptable insertion/removal of triples, and (c) change tracking in order to retrieve past views. Finally,
the approach is implemented and evaluated using a use case, in order to verify the features
mentioned above.
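A toy illustration of requirement (c): the sketch below keeps a change log next to a head index, so that any past version can be reconstructed and queried with a triple pattern. A real design would apply the B-tree or log-structured techniques from the literature instead of naively replaying the log.

```python
from collections import defaultdict

class VersionedTripleStore:
    """Keeps every add/delete change so that past versions stay queryable."""

    def __init__(self):
        self.log = []                 # (version, op, triple) change log
        self.spo = defaultdict(set)   # subject -> {(p, o)}: head index

    def add(self, s, p, o):
        self.log.append((len(self.log), "+", (s, p, o)))
        self.spo[s].add((p, o))

    def delete(self, s, p, o):
        self.log.append((len(self.log), "-", (s, p, o)))
        self.spo[s].discard((p, o))

    def match_at(self, version, s=None, p=None, o=None):
        """Replay the log to answer a triple pattern against a past version."""
        state = set()
        for v, op, triple in self.log:
            if v >= version:
                break
            state.add(triple) if op == "+" else state.discard(triple)
        return [t for t in state
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

store = VersionedTripleStore()
store.add(":alice", ":knows", ":bob")
store.delete(":alice", ":knows", ":bob")
print(store.match_at(1, s=":alice"))   # version 1 still contains the triple
```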
Monitoring Science Related Conversations on the Web in Real-Time
Promotors: Erik Mannens and Rik Van de Walle
Supervisors: Laurens De Vocht, Anastasia Dimou
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Laurens De Vocht
Keywords: Web 2.0, Science 2.0, Researchers, Web of Data, Collaboration Tools, Social Media
Location: home, Zuiderpoort
Problem definition:
Digital libraries and online journals (such as IEEE, ACM) all have search engines to help scholars
find interesting resources. However, these approaches are often ineffective, mostly because
scholars: (i) only look up resources based, at best, on their topics or keywords, not taking into
account the specific context and the scholar's profile; (ii) are restricted to resources from a single
origin. Of course, aggregators exist that index resources from multiple sources. The challenge is
therefore in matching research needs and contexts to opportunities from multiple, heterogeneous
sources. In other words, we should make the most of the wealth of resources for research by
relating and matching a scholar's profile with the available online resources, publications and other
scholars' profiles.
Goals:
Combine streams of Web Collaboration Tools (e.g. ResearchGate, Mendeley…) and Social Media
(e.g. Twitter, LinkedIn...) to track scientifically related communication and align it with the Web of
Data (such as COLINDA, DBLP, PubMed). This allows developing an efficient real-time monitor and
a useful environment for researchers. The monitor needs to allow interaction with the users, like a
dashboard. The user's personal research library and preferences could be matched with those of
others. This allows links to be made to social and research data beyond a single researcher's scope,
and can be a great source of inspiration (what is relevant to me?) and overview (what's hot right now
around me?). This should lead to more fine-grained details, enabling researchers to obtain a
sophisticated selection and linking of contributed resources based on previous assessments and
explored links.
Towards Intelligent Web Experiences: A Contextual User Journey
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Laurens De Vocht
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Laurens De Vocht
Keywords: Storytelling, Information Retrieval, Big Data, Linked Data, Pathfinding, User Interaction
Location: home, Zuiderpoort
Problem definition:
The number of resources available to users on the Web is rapidly expanding. Although
this Big Data may be structured and even linked, the underlying structure still looks like a maze to
most users. Therefore, there is an abundance of apps and web pages trying to hide this from the user,
and of course there are search engines helping the user travel to the right page at all times.
Rather than imposing a linear journey on users through the hierarchies that website navigation and
lists of links impose, a networking-based contextual experience is based on a user’s search and the
relationships that form around what a user searches for. In this way, users navigate the network in
an order of their choice—establishing their own personalized web experience. There is still some
order as resources in the network are related to each other through modeled relationships and
ontologies. But, these relationships can run in multiple directions rather than one direction. The
categories become less important and the focus is placed on the user and the content that is most
relevant to them.
You will investigate an approach to influence the generation of explanations of how resources are
related to each other in real-time in a personalized user context. Each explanation can be seen as a
useful/relevant re-combination of multiple associations between resources, for example:
- Trivia Finding / Personalized Story: DBpedia (structured version of Wikipedia)
- Research 2.0: Recreate Events (Conferences) based on data of Web Collaboration Tools, Digital Libraries, Linked Open Data
- Medicine: Drug discovery and genome data analysis.
Goals:
In this master’s thesis, the goal is to generate an experience which is both relevant to the user and
coherent. This includes an optimization of how users interact with data, without violating any
rules of the context in which it is applied (e.g. chosen topics, required resources). Rather than living
in the past, you will investigate methods that make it possible to look toward the future, providing
inspiration as you discover things you didn’t know before. Relationships in the data can suggest new
songs you may want to listen to or people you may want to meet. The user's journey on the Web
evolves from being enforced linearly (passing through search engines over and over) to a network
of data represented in a way they like.
User experience modeling in mobile applications using behavior stream visualization,
clustering and sentiment extraction
Promotors: Erik Mannens and Rik Van de Walle
Supervisors: Dieter De Witte, Azarakhsh Jalalvand, Paul Davies (UXProbe)
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: [email protected], [email protected]
Keywords: Big Data Visualizations, Clustering, Sentiment extraction, Machine Learning
Location: Home, Zuiderpoort, Boerentoren Antwerp
Problem definition:
Nowadays, being a successful mobile application developer is a challenging task. Apart from the
fierce competition, you need to convince users to install your product and to keep using it. In this
context, it is of critical importance to identify and resolve issues/nuisances as soon as possible.
Therefore developers are eager to track user behavior and user sentiment in their application and to
identify, analyze and even predict bad user experiences.
UXProbe (http://www.uxprobe.org/), a startup specialized in usability design, created a service
which developers can integrate in their mobile application. The service collects event streams of
user interactions, errors that occur etc. What makes their service unique is that these event streams
are enriched with user-generated sentiment feedback in the form of ratings, emoticons and
mini-surveys, resulting in a very rich dataset. Deriving and communicating insights from this data in a
scalable fashion, for increasing user bases and within limited time constraints, is a challenging task for
which UXProbe needs you!
Goals:
As a first step you will create a number of dynamic dashboard-like visualizations to allow the visual
exploration of the user behavior patterns and the emotions associated with them. This will enable
development teams to quickly spot unexpected interaction patterns and assess the effectiveness of
the solutions they provide to address these.
In a second phase the thesis will, depending on your interests, focus on:
- clustering the events into categories of ‘error motifs’;
- the design of a prioritization queue based on the results of user sentiment analysis;
- the design of a learning algorithm which can predict upfront which problems a user is likely to face in the near future.
Analyzing and exploring links in biomedical graphs for drug discovery
Promotors: Erik Mannens and Rik Van de Walle
Supervisors: Dieter De Witte, Laurens De Vocht & Filip Pattyn (Ontoforce)
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: [email protected], [email protected]
Keywords: Biomedical Data Mining, Graph Algorithms, Big Data
Location: home, Zuiderpoort
Problem definition:
Ontoforce is a startup that has created a solution to transparently query many clinical data sources
simultaneously. Their semantic technology - disQover - can query data from molecular databases,
scientific publications, clinical trials etc., allowing pharmaceutical companies to shorten their drug
discovery pipeline significantly. While the focus is currently on semantic technologies, there are
many questions which are more easily mapped onto pathfinding algorithms than to queries, a simple
example being: is there a path between lung cancer and smoking via genetic mutations? Neo4j is a
graph database optimized for graph traversal queries. Rik Van Bruggen, regional director of Neo
Technology in Belgium will be available as technical advisor on how to implement certain use cases.
Goals:
Design a number of specialized pathfinding algorithms in Neo4j and assess the
limitations/complementarity of both semantic technologies and graph databases for the use cases
presented. Investigate whether the algorithms can be translated to different contexts, for example in
automatic storytelling (www.everythingisconnected.be), which under the hood also relies on path
finding algorithms.
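As a flavor of how such a question maps onto a graph query, the sketch below expresses the lung cancer/smoking example in Cypher using the official Neo4j Python driver; the node labels, relationship type, path length bound, and credentials are all hypothetical.

```python
from neo4j import GraphDatabase

# Hypothetical schema:
# (:Disease)-[:ASSOCIATED_WITH]-(:Mutation)-[:ASSOCIATED_WITH]-(:Factor)
QUERY = """
MATCH p = (d:Disease {name: $disease})-[:ASSOCIATED_WITH*..4]-(f:Factor {name: $factor})
WHERE any(n IN nodes(p) WHERE n:Mutation)
RETURN p LIMIT 5
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # Find up to five paths linking the disease to the factor via a mutation.
    for record in session.run(QUERY, disease="lung cancer", factor="smoking"):
        print(record["p"])
driver.close()
```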
Turning Social Media Data to a Semantic Gold Mine
Promotors: Erik Mannens and Rik Van de Walle
Supervisors: Anastasia Dimou, Laurens De Vocht
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Anastasia Dimou
Keywords: Social Media, Linked Data mapping, Linked Data publishing
Location: home, Zuiderpoort
Problem definition:
Incorporating data from social media, e.g., Twitter, Facebook, LinkedIn, into the Linked Open Data
cloud has a lot of value for marketing analysis, business and research. In real-world situations, this
information cannot be efficiently searched as we cannot query it, e.g. we cannot search when a user
last mentioned a place.
Goals:
In this thesis, you research extract-map-load approaches where handling the mapping is driven by
the data in real time. You investigate how to construct an efficient execution plan that handles such
data streams and maps them to RDF. The focus is on mapping social media data to the RDF data
model so you can bridge information from different social media in a way that you can identify
interesting trends around people, topics or events while you filter out the noisy data.
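A minimal sketch of the mapping step, using rdflib and the SIOC vocabulary; the tweet structure and property choices are illustrative, and a real pipeline would generate such mappings from declarative mapping rules rather than hard-coding them.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

SIOC = Namespace("http://rdfs.org/sioc/ns#")
DCT = Namespace("http://purl.org/dc/terms/")

def tweet_to_rdf(tweet):
    """Map one tweet (given as a dict) to RDF triples."""
    g = Graph()
    post = URIRef(f"https://twitter.com/i/status/{tweet['id']}")
    g.add((post, RDF.type, SIOC.Post))
    g.add((post, SIOC.content, Literal(tweet["text"])))
    g.add((post, DCT.created, Literal(tweet["created_at"], datatype=XSD.dateTime)))
    g.add((post, SIOC.has_creator, URIRef(f"https://twitter.com/{tweet['user']}")))
    return g

g = tweet_to_rdf({"id": 1, "text": "At the Eiffel Tower!",
                  "created_at": "2015-10-01T12:00:00Z", "user": "alice"})
print(g.serialize(format="turtle"))
```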
Visualizing biological sequence motifs using high performance multidimensional scaling
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Dieter De Witte
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: [email protected]
Keywords: Bioinformatics, Big Data Visualizations, Clustering
Location: home, Zuiderpoort
Problem definition:
The DNA of an organism consists of approximately 1% coding sequences, i.e. genes, and 99%
noncoding sequences. Regulatory sequences are hidden in the vicinity of the genes and help
regulate transcription (the process of copying DNA into RNA). In a later step, RNA is then
translated into proteins. The accurate discovery of regulatory motifs in sequence data is very
difficult since these motifs are generally short and allow multiple wildcards.
An exhaustive discovery algorithm has already been developed which generates a database of
motifs which are overrepresented in sequences of closely related organisms. The amount of data is,
however, still too large to derive clear insights from.
Goals:
The motif database produced by the discovery algorithm contains conservation information about
millions of short motifs. When one motif is found to be significant, many highly similar motifs will
generally also be significant, which implies that they can be clustered, and the cluster as a whole
might have a biological meaning.
Multidimensional scaling is a technique which allows the visualization of high dimensional data by
mapping it onto a 2D or 3D Euclidean space while only requiring a well-chosen string distance
between the motifs.
The thesis student will investigate whether this algorithm can be used for large datasets of short
motifs, investigate the scalability and develop an end-to-end solution in which a biologist can
explore the data in an interactive way. As an extension the results from different motif algorithms
can be visually compared.
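A small sketch of the core idea, assuming Levenshtein distance as the (well-chosen) string distance: scikit-learn's MDS accepts a precomputed dissimilarity matrix and embeds the motifs in 2D. Scaling this to millions of motifs is precisely the open question of the thesis.

```python
import numpy as np
from sklearn.manifold import MDS

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two motifs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

motifs = ["TGACGTCA", "TGACGTAA", "GGGCGG", "GGGCGGG"]
dist = np.array([[levenshtein(a, b) for b in motifs] for a in motifs])

# Embed the motifs in 2D using only their pairwise distances.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
print(coords)
```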
Low latency querying of huge Linked Medical datasets using Big Data technologies
Promotors: Erik Mannens and Rik Van de Walle
Supervisors: Dieter De Witte, Laurens De Vocht
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: [email protected]
Keywords: Big Data Architectures, Linked Data, Biomedical Data Mining
Location: home, Zuiderpoort
Problem definition:
The Semantic Web is a continuously expanding collection of linked datasets. Transparently querying
all these datasets together is a technical challenge for which multiple solutions have been
developed in the past years: from smart LDF clients to federated querying. If data dumps are
available, another approach is possible: collect all dumps in a distributed file system (HDFS) and
query them using SQL-like tools developed for Big Data systems. Semantic data is generally
queried using the SPARQL language for which there is no real solution in the Big Data ecosystem
yet. Depending on the availability and the volume of the data this approach might be the preferred
or even the only solution.
Goals:
In a first step a literature study will be performed to get a clear overview of all attempts that have
been made to translate this problem into the Big Data space. In the next step a solution will be
developed which provides a SPARQL interface to a Big Data system of choice. Most recently
Apache Spark has risen to become the most popular Big Data technology, and it would be interesting to
see if its superior data processing performance translates to linked data querying. As a benchmark
a set of queries on multiple medical datasets generated by the disQover platform of Ontoforce will
be used to compare the Big Data approach to the already available solutions based on federated
querying.
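To make the idea concrete, the sketch below loads an N-Triples dump into Spark and answers a single SPARQL triple pattern as a SQL selection; multi-pattern SPARQL queries would become self-joins on shared variables. The file path and predicate URI are hypothetical, and the naive line parser ignores literals containing spaces.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparql-on-spark").getOrCreate()

# Assumes an N-Triples dump with one `subject predicate object .` per line.
# (A real parser must also handle literals that contain spaces.)
triples = (spark.read.text("hdfs:///data/dump.nt").rdd
           .map(lambda row: row.value.rstrip(" .").split(" ", 2))
           .toDF(["s", "p", "o"]))
triples.createOrReplaceTempView("triples")

# The SPARQL pattern { ?drug :interactsWith ?target } becomes a SQL selection.
result = spark.sql("""
  SELECT s AS drug, o AS target
  FROM triples
  WHERE p = '<http://example.org/interactsWith>'
""")
result.show()
```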
A journey planner for the galaxy
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Pieter Colpaert, Ruben Verborgh
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: [email protected]
Keywords: hypermedia, REST, Linked Data, route planning
Location: home, Zuiderpoort
Problem definition:
Today, it would be impossible to write an app which plans routes throughout the entire galaxy: all
the transit data of e.g., buses, space shuttles, trains, bikes or space elevators would have to be
collected on one machine. This machine would then have to calculate hard-to-cache route planning
advice for all the user agents in the galaxy.
Within our lab we are developing a different server-client mechanism: instead of opening route
planning APIs for wide use, we suggest publishing the raw arrivals and departures using a REST
API. This way, route planning user agents can follow links to get more data on the fly. We are
implementing this data publishing interface with all the transit data that is already available
worldwide, and we have written a proof-of-concept user agent which calculates the routes by
following links within a reasonable query time.
Goals:
The goal of this thesis is to research the possibilities to make these user agents (which may be both
servers or mobile apps) more intelligent in various ways: e.g., the process could be sped up if we
can pre-fetch data on the client side, or the result can be more suitable for end users if our user
agent is able to discover other datasets (e.g., when I’m in Caracas, I only want the routes with the
least criminality reported, or when I’m in a wheelchair, I only want wheelchair-accessible routes).
The user agent that is written needs to be generic: anywhere in the galaxy, it should be able to
automatically discover the right datasets published on the Web.
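For intuition, the sketch below shows the core of such a user agent: scanning a time-ordered list of connections once to compute the earliest arrival (the DAG shortest-path view of route planning). The stops and times are made up; a real agent would fetch the connections page by page over the REST API instead of holding them in a list.

```python
from math import inf

# A connection is (departure_stop, arrival_stop, departure_time, arrival_time),
# with times in minutes; the list is sorted by departure time, exactly as the
# server publishes it.
connections = [
    ("ghent", "brussels", 480, 515),
    ("brussels", "liege", 525, 565),
    ("brussels", "antwerp", 530, 570),
]

def earliest_arrival(connections, source, target, start_time):
    """Relax each connection once (shortest path in a DAG of connections)."""
    arrival = {source: start_time}
    for dep, arr, t_dep, t_arr in connections:
        if arrival.get(dep, inf) <= t_dep and t_arr < arrival.get(arr, inf):
            arrival[arr] = t_arr
    return arrival.get(target)

print(earliest_arrival(connections, "ghent", "liege", 470))  # -> 565
```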
Building a Big Data streaming architecture for real-time Twitter annotation and querying
Promotors: Erik Mannens and Rik Van de Walle
Supervisors: Dieter De Witte, Frederic Godin, Gerald Haesendonck
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: [email protected]
Keywords: stream querying, linked data, neural networks, semantic annotation
Location: home, Zuiderpoort, Technicum
Problem definition:
Twitter is an online social network service that allows users to send short messages, aka tweets.
Currently over 500 million tweets are being generated every day. Automatically extracting
information from these tweets is a challenging task since they are short, can contain multiple
languages, contain spelling errors, etc. Extracting information about tweets which are semantically
related, i.e., deal with a certain topic, is far from trivial if they do not contain the same terms.
Goals:
In this thesis the student will be involved in setting up a streaming architecture for enriching tweets
with semantic information. Neural networks will be trained and used to label the tweets and to spot
named entities. The enriched stream will then be converted into semantic RDF triples which can be
queried using a streaming variant of the SPARQL language, for example C-SPARQL. Spark is a Big
Data technology stack that contains tools for both stream processing and batch analysis and is the
recommended technology for tackling this kind of problem.
Distributed query answering on the open Web
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Ruben Verborgh, Miel Vander Sande
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Ruben Verborgh
Keywords: Linked Data, Web, Semantic Web, querying, distributed systems
Location: home, Zuiderpoort
Problem definition:
Mail [email protected] to discuss this subject.
What do your friends think of movies directed by Martin Scorsese?
What Nvidia graphics cards have few bug reports on Linux?
Is it cheaper to buy certain flights and hotels separately or together?
None of the above questions can be answered by a single data source, yet today’s Web technology
still focuses on single-source answer systems. This is problematic because a) it’s not scalable,
since that single source will need to process a lot of queries, and b) that source doesn’t have all the
data it needs to answer questions such as the above. The idea behind Linked Data Fragments
(http://linkeddatafragments.org/) is that clients, instead of servers, should answer queries. Sources
should offer fragments of data in such a way that clients can combine them to answer questions that
span multiple datasets.
A client that works with a single data source already exists
(https://github.com/LinkedDataFragments/Client). Your task in this master’s thesis is to extend this
client – or build a new one – so that it can query different data sources for a single query.
Goals:
- Developing a scalable method to answer queries using different data sources.
- Describing to a client which data sources are relevant for a given query.
- Evaluating your solution on aspects such as accuracy and performance.
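A minimal sketch of the fragment-fetching layer such a client needs: the same triple pattern is requested from several (hypothetical) fragment endpoints and the results are unioned client-side. Joins across patterns, paging, and deciding which sources are relevant are where the actual research lies.

```python
import requests

# Hypothetical fragment endpoints; a real client would discover its controls
# from the hypermedia responses instead of hard-coding URL templates.
SOURCES = [
    "http://data.example.org/fragments",
    "http://movies.example.org/fragments",
]

def fetch_pattern(source, s=None, p=None, o=None):
    """Ask one server for a single triple pattern fragment (first page only)."""
    params = {k: v for k, v in (("subject", s), ("predicate", p), ("object", o)) if v}
    response = requests.get(source, params=params,
                            headers={"Accept": "application/n-triples"}, timeout=10)
    return response.text.splitlines()

def federated_pattern(s=None, p=None, o=None):
    """Union the same pattern over all known sources; joins happen client-side."""
    results = []
    for source in SOURCES:
        results.extend(fetch_pattern(source, s, p, o))
    return results
```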
Querying multimedia data on the (social) Web
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Ruben Verborgh, Miel Vander Sande
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Ruben Verborgh
Keywords: Linked Data, Web, Semantic Web, querying, multimedia, images, video
Location: home, Zuiderpoort
Problem definition:
Mail [email protected] to discuss this subject.
How would you find YouTube movies about “New York” in which people mention the Twin
Towers?
How could you find images that depict two people shaking hands?
Even though there is a large amount of metadata available on the Web, finding images and video
can be quite difficult. The goal of this thesis is to build an intelligent client (for instance, as a browser
extension) that is able to find multimedia items on the Web. This saves users many search
operations on different datasets. For this, you will need to combine metadata from different sources.
A starting point for this is the Linked Data Fragments client (http://linkeddatafragments.org/), which
already allows querying the Web of Data. Your task is to blur the border between textual and
multimedia search, making it easier to find those media items users are looking for.
Goals:
- Developing a client to find multimedia data on the Web.
- Finding methods to query existing multimedia platforms such as YouTube and Instagram.
- Evaluating your solution on aspects such as recall, precision, and performance.
Real-time querying of transport data on the Web
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Ruben Verborgh, Pieter Colpaert
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Ruben Verborgh
Keywords: Linked Data, Web, Semantic Web, querying, transport, train
Location: home, Zuiderpoort
Problem definition:
Mail [email protected] to discuss this subject.
The Belgian rail website allows you to plan your journey, but only in very rigid ways. It does not take
into account your current location and plans. Suppose you need to be in a certain building in
Brussels for a meeting. That morning, you decide to take the train at 14:06. Unfortunately, that train
is severely delayed later on, but you won’t know that until you check the website again. In this thesis,
you develop a querying system over the Web that allows retrieving real-time results that are
continuously updated. Based on data from different sources, your system automatically picks those
fragments that are necessary for users to plan their journey. You can build on existing work for Web
querying, such as Linked Data Fragments (http://linkeddatafragments.org/).
Goals:
- Developing a real-time querying mechanism for transport data.
- Planning a route using different fragments of data from the Web.
- Evaluating your solution on aspects such as bandwidth and performance.
Predictive analytics for the Internet of Things
Promotors: Erik Mannens and Rik Van de Walle
Supervisors: Dieter De Witte, Sven Beauprez (IoTBE.org)
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 2
Number of theses: 2
Contact: [email protected]
Keywords: Internet of things, linked data publishing, stream reasoning, big data analytics
Location: home, Zuiderpoort
Problem definition:
In the upcoming years the number of devices for the Internet of Things will grow exponentially. IoT
will therefore become the largest source of streaming data. In order to derive insights from this data
it should be converted into a format which can be queried and allows easy semantic enrichment.
Linked open data is the ideal candidate to fulfill this task. To prepare for this innovation a prototype
environment is required which will reveal the challenges for the upcoming data explosion. As a data
source we will make use of Tessel Microcontrollers (www.tessel.io) which can be easily configured
using only JavaScript and which are extendible with a wide range of sensors: audio, video,
temperature, Bluetooth etc. The built-in Wi-Fi allows for a straightforward data transfer to a Big Data
infrastructure.
Goals:
In this thesis the student(s) will be involved in setting up an architecture for analyzing and enriching
sensor streams with semantic information. The annotated streams can be queried using a streaming
variant of the SPARQL language, used for (static) linked data.
In a first phase the student will build an experimental setup with Tessel devices as the data source.
The data generated by the devices will be automatically ingested in a Big Data streaming
architecture. For the analytics platform the student will explore the possibilities of the Apache Spark
stack which contains tools for both stream processing and batch analysis.
Since multiple students can work on this topic the focus of the thesis can be aligned with the
interests of the student: once the data streams are captured, he/she can focus on enrichment,
visualizations or on optimizing the performance of the data pipeline.
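As a flavor of the enrichment step, the sketch below turns a single (hypothetical) Tessel temperature reading into RDF triples with an illustrative vocabulary; a real pipeline would use standard sensor ontologies and push these triples into the streaming architecture.

```python
import time

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import XSD

# Hypothetical vocabulary; a real pipeline would use a standard sensor ontology.
EX = Namespace("http://example.org/iot#")

def reading_to_triples(device_id, value, unit="celsius"):
    """Annotate one sensor reading as a small RDF graph."""
    g = Graph()
    obs = URIRef(f"http://example.org/obs/{device_id}/{int(time.time() * 1000)}")
    g.add((obs, EX.observedBy, URIRef(f"http://example.org/device/{device_id}")))
    g.add((obs, EX.value, Literal(value, datatype=XSD.double)))
    g.add((obs, EX.unit, Literal(unit)))
    return g

print(reading_to_triples("tessel-01", 21.5).serialize(format="turtle"))
```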
Web-based framework for real-time data visualization and scene graph management
Promotor: Peter Lambert and Rik Van de Walle
Supervisors: Jelle Van Campen, Tom Pardidaens and Christophe Herreman (Barco)
Study Programme: Master in Computer Science Engineering, Master of Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Christophe Herreman (Barco)
Keywords: html5, web, real-time, scene graph, data visualization
Location: home, Zuiderpoort, Barco
Problem definition:
Barco develops systems and solutions for professional visualization applications in many different
markets and increasingly requires web-based front-ends for these applications. The applications
typically have stringent requirements towards image quality, latency and user experience and deal
with large amounts of live video and data channels. At this moment, front-end applications are
developed by integrators or specifically for a single customer.
Goals:
The goal of this project is to develop prototypes, which demonstrate different technologies to render
highly dynamic scenes such as geographical maps or 3D city models, and augment the visualization
with real-time video and meta-data. After analyzing the requirements from various divisions, a study
of the state-of-the-art should be performed including evaluating available technology (commercial or
open source libraries). The suitability of the libraries is to be explored with regards to performance,
latency, scalability and feature completeness. Then, a framework and software architecture is to be
designed and implemented to enhance these capabilities with the in-house capabilities for real-time
video and meta-data visualization, and also to enable handling different types of content in a single
browser window, possibly combining the capabilities of several libraries.
This thesis is jointly supervised with Barco (www.barco.be).
Designing an Ontology-Driven Dialog System for Natural Language Understanding
Promotors: Kris Demuynck and Erik Mannens
Supervisors: Gaëtan Martens ([email protected]) and Azarakhsh Jalalvand
Study Programme: Master Computer Science Engineering, Master Electrical Engineering, Master
Mathematical Informatics
Number of students: 2
Number of theses: 1
Contact: Gaëtan Martens ([email protected]), Azarakhsh Jalalvand
Keywords: Automatic Speech Recognition, Natural Language Understanding, Semantic Web,
Dialog System
Location: home, Zuiderpoort
Dialog Management (DM) and Natural Language Understanding (NLU) are fields of computer
science, artificial intelligence, and linguistics concerned with the interactions between computers
and human (natural) languages. Nuance Communications is the market leader in those
technologies and currently delivers the most significant advancements in speech recognition
technology.
Problem definition:
An Automatic Speech Recognition (ASR) + NLU system transforms an input speech signal into a
semantically enriched output which consists of intents and interpretations. A dialog system (DS) is
an engine responsible for DM, which allows having a conversation with a machine based on a
predefined set of concepts. These concepts and their relationships are modeled using Semantic
Web technologies by means of an ontology. This ontology defines the behavior of the DS relying on
a semantic reasoner. For example, if the user says “I want to make a phone call” (intent=“Phone”)
then the DS should ask for additional information such as: “Do you want to call somebody from your
contact list or do you want to dial a number?” On the other hand, if the user says “I want to call
Gaëtan Martens on his cell phone”, the system should not ask for additional information since the
required interpretations (i.e., contactName=“Gaëtan Martens” and phoneType=“cell phone”) are
already available.
Goals:
In this Master thesis the students will build a DS which relies on an OWL ontology to define its
behavior. This ontology has to be created based on a set of use cases provided by Nuance. The
ontology has to model the intents, the concepts (i.e., the corresponding NLU interpretations), and
the relationships between the concepts. The next step is then to build the DS around an existing
semantic reasoner. The final goal of this challenging thesis is to have a functional DS, built using
open source libraries, that is able to use the output from Nuance’s state-of-the-art ASR+NLU system
and is configured with the created ontology.
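A toy sketch of the dialog loop the ontology should drive: here the required interpretations per intent are hard-coded in a dictionary, whereas in the thesis they would be obtained from the OWL ontology through the semantic reasoner.

```python
# Toy stand-in for the ontology: each intent lists its required interpretations.
# In the thesis, these constraints come from the OWL ontology via a reasoner.
REQUIRED = {
    "Phone": {"contactName", "phoneType"},
}

def next_dialog_move(intent, interpretations):
    """Ask for whatever the ontology says is still missing, else act."""
    missing = REQUIRED.get(intent, set()) - interpretations.keys()
    if missing:
        return f"Please provide: {', '.join(sorted(missing))}"
    return f"Executing intent '{intent}' with {interpretations}"

print(next_dialog_move("Phone", {}))
print(next_dialog_move("Phone",
      {"contactName": "Gaëtan Martens", "phoneType": "cell phone"}))
```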
The students will have the opportunity to work on cutting-edge technology within Nuance’s
automotive R&D team in an international context.
Building a Natural Language Dialog System with Semantic Web technology
Promotors: Kris Demuynck and Erik Mannens
Supervisors: Gaëtan Martens ([email protected]) and Azarakhsh Jalalvand
Study Programme: Master Computer Science Engineering, Master Electrical Engineering, Master
Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Gaëtan Martens ([email protected]), Azarakhsh Jalalvand
Keywords: Automatic Speech Recognition, Natural Language Understanding, Semantic Web,
Dialog System
Dialog Management (DM) and Natural Language Understanding (NLU) are fields of computer
science, artificial intelligence, and linguistics concerned with the interactions between computers
and human (natural) languages. Nuance Communications is the market leader in those
technologies and currently delivers the most significant advancements in speech recognition
technology.
Problem definition:
An Automatic Speech Recognition (ASR) + NLU system transforms the input speech signal into a
semantically enriched textual output. This output consists of an n-best list of intents (e.g., “Call”) with
a set of possible interpretations (such as contactName=“Gaëtan Martens” and phoneType=“cell
phone”) and corresponding probabilities. The dialog system (DS) is the engine responsible for DM,
which allows having a conversation with a machine based on a predefined set of concepts. This set
of concepts (and their inter-relationships) is modeled using Semantic Web technologies by means of
an ontology. This ontology then defines the behavior of the DS. The DS is able, given its current
state, to recognize an intent or ask for more information when the input is ambiguous by applying
reasoning techniques.
Goals:
In this master thesis, the goal is to link an existing ASR + NLU system with a DS via ontologies. At
first, the supported intents and interpretations of the NLU system must be formally defined in the
ontology and missing intents (required by the DS) will have to be added. Further alignment tasks will
have to be investigated by testing and iterative enhancements such as improving the mechanism of
the DS to take into account the NLU probabilities. The final goal is to have a functional speech-enabled, ontology-based, end-to-end DS, which is still a challenge given the current state of the art.
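For intuition, a toy sketch of one such enhancement: choosing among the n-best NLU hypotheses by combining their probabilities with the required interpretations (hard-coded here, ontology-derived in the thesis), and falling back to a clarification question otherwise. The threshold is illustrative.

```python
def pick_intent(nbest, ontology_required, threshold=0.6):
    """Choose the best NLU hypothesis whose required slots are all satisfied.

    `nbest` is a list of (intent, interpretations, probability) tuples as
    produced by the ASR+NLU front-end; the threshold is illustrative.
    """
    for intent, interpretations, prob in sorted(nbest, key=lambda h: -h[2]):
        missing = ontology_required.get(intent, set()) - interpretations.keys()
        if prob >= threshold and not missing:
            return intent, interpretations
    return None, None  # fall back to a clarification question

nbest = [("Call", {"contactName": "Gaëtan Martens"}, 0.8),
         ("Play", {"contactName": "Gaëtan Martens"}, 0.1)]
print(pick_intent(nbest, {"Call": {"contactName"}}))
```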
The student will have the opportunity to work on cutting-edge technology within Nuance’s
automotive R&D team in an international context.
The Multi-sensory and Sentiment-aware eBook
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Hajar Ghaem Sigarchian, Tom De Nies, Frank Salliau
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Hajar Ghaem Sigarchian
Keywords: e-Text-Book, Sentiment Analysis, Internet Of Things, Semantic Web
Location: home, Zuiderpoort
Problem definition:
People expect more and more when reading e-Books, in terms of entertainment and immersion.
The recent advances in smart environments theoretically allow people to experience an eBook not
only using their eyes but also using their other senses, such as hearing, touch (e.g., thermoception),
etc. The content of such enhanced e-Books can be augmented and reacted upon by smart objects,
using the Internet of Things (IoT). However, many challenges remain to make this a reality, such as
the automatic annotation, exchange of data, and timing (e.g., should the room react any time a
person flips a page of the eBook?).
Goals:
The goal of this master’s thesis is to create digital books that act as objects in the IoT. The student needs to
perform a literature review, in order to have a better understanding of the concepts and the current state
of the art. In addition, he/she will propose an appropriate architecture and data representation
format. The research domains will include smart rooms, sensors, Internet of Things, digital
publishing and Semantic Web. The next step is implementing a prototype as a proof-of-concept.
Eventually, he/she needs to test the prototype to evaluate and qualify the relevance of the solution
to the end-user (the reader of the book).
Using Events to Connect Books Based on Their Actual Content
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Pieter Heyvaert, Ben De Meester, Frank Salliau
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Pieter Heyvaert
Keywords: books, events, Semantic Web, media, metadata
Location: home, Zuiderpoort
Problem definition:
Many books tell a story about a certain event or a series of events. Different stories (in multiple
books) might even contain the same events. An example of such a series of events can be the
Battle of Waterloo, which includes separate events for the preparation, the first French attack, the
capture of La Haye Sainte, the attack of the Imperial Guard, the capture of Plancenoit, and so on.
Currently, the metadata extracted from books mostly denotes topics, known locations, people, dates,
and so on. However, the connection between people, their location and the date is not considered,
and these elements are what an event consists of. When having this information available, one is
able to determine the similarity between different books regarding a specific event. This is not only
limited to books: when event information is also available for movies, TV series, news articles, and
so on, one is also able to determine how literature and movies interpret real events: what happens
when non-fiction (e.g., newspaper articles) is given a fictional treatment (e.g., in movies) is one of the
questions that might be answered here.
Goals:
You will need (1) to determine how events can be dissected from the several types of media, with
the focus on books, (2) to decide on the granularity of the events extracted from the media, (3) to
determine how to incorporate existing information, if any, about the events and media, and (4) to
determine/visualize the events found in the different sources. As part of the thesis you will need to
create a prototype which takes media (such as books) as input and outputs the events. Next, you’ll
need to visualize the connections between the different media based on the found events. Also, the
difference between events can be determined: two books might talk about the same big event;
however, the events making up the big event may be different, or may appear in a different order, and so
on.
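As a minimal illustration of comparing works through their events, the sketch below scores two (made-up) event sets for the Battle of Waterloo with Jaccard similarity; the thesis would of course work with richer event representations than plain labels.

```python
def event_similarity(events_a, events_b):
    """Jaccard overlap between the event sets extracted from two works."""
    a, b = set(events_a), set(events_b)
    return len(a & b) / len(a | b) if a | b else 0.0

waterloo_novel = {"preparation", "first_french_attack", "la_haye_sainte"}
waterloo_history = {"preparation", "first_french_attack", "plancenoit"}
print(event_similarity(waterloo_novel, waterloo_history))  # -> 0.5
```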
Deriving Semantics from Styling
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Ben De Meester
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Ben De Meester
Keywords: HTML5, Semantics, CSS
Location: home, Zuiderpoort
Problem definition:
HTML is the de facto publication format for the World Wide Web for humans. Using styling
properties (e.g., in a CSS stylesheet), it is possible to style the HTML document to make it pretty.
However, the HTML file itself is rarely pretty for machines. Markup such as ‘<p class=”title”>’ is a
terror, and contradicts the semantic structure that HTML5 can provide. One thing that will
always be consistent, however, is the relation between the visual look of an HTML page and its intended
semantics. E.g., ‘<p class=”title”>’ will look similar to ‘<h1>’ to a user, but the latter has a much better
semantic meaning than the former. And better semantics means better processing of the HTML, which results
in a better Web.
Goals:
The goal of this thesis is to improve existing HTML pages, their structure and semantic meaning, by
using, among others, its visual characteristics. To that end, the student needs to define which
criteria could be important: not only font-size, but also font-weight, position
on a page, the name of the CSS class, etc. Next, the student should provide a proof-of-concept that
can effectively recognize and improve bad HTML constructs. Many techniques may be used,
involving clustering, Natural Language Processing, and ad hoc processing instructions.
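A toy sketch of such a bad-construct fix, assuming inline styles for simplicity: paragraphs that look like headings (a large font or a ‘title’ class) are promoted to real <h1> elements with BeautifulSoup. Handling external stylesheets and the other criteria listed above is the actual work.

```python
from bs4 import BeautifulSoup

HTML = '<html><body><p class="title" style="font-size:32px">My Page</p></body></html>'

def promote_styled_titles(html, min_font_px=24):
    """Rewrite <p> elements that *look* like headings into real <h1> tags."""
    soup = BeautifulSoup(html, "html.parser")
    for p in soup.find_all("p"):
        style = p.get("style", "")
        looks_big = any(f"font-size:{s}px" in style.replace(" ", "")
                        for s in range(min_font_px, 100))
        if "title" in p.get("class", []) or looks_big:
            p.name = "h1"   # BeautifulSoup lets us rename the tag in place
    return str(soup)

print(promote_styled_titles(HTML))
```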
Trusting query results on the open Web
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Ruben Verborgh, Tom De Nies
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Ruben Verborgh
Keywords: Linked Data, provenance, trust, Semantic Web, querying, Web, HTTP, client/server
Location: home, Zuiderpoort
Problem definition:
Mail [email protected] to discuss this subject.
The Web way of answering a question is to find combined answers from different sources. But
how can we be sure that the final answer is based on sources we can trust? This is the question
you will answer in this thesis.
Because information is spread across different places, the old database paradigm of “query =>
answer” doesn’t really work anymore on the Web. Linked Data Fragments
(http://linkeddatafragments.org/) capture this idea: a client asks servers for different parts of
information and is able to combine it by itself. We built a Node.js application that can query the Web
in such a way (https://github.com/LinkedDataFragments/Client). What this client doesn’t tell you
(yet), is where the different parts of the answer come from; it doesn’t give you a guarantee that the
answer is correct/trustworthy. By combining Linked Data Fragments with provenance technology,
we can precisely assess the trustworthiness of data.
Goals:
- Developing a method to combine the trust in different knowledge sources.
- Describing the trust in an answer that encompasses different sources.
- Developing a client that queries the Web and gives an answer you can trust.
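A minimal sketch of one possible combination strategy, with hypothetical per-source trust scores: the answer inherits the trust of its weakest source. Whether this pessimistic rule, a product, or a weighted mean is appropriate is part of the research question.

```python
# Hypothetical trust scores per source (0 = untrusted, 1 = fully trusted).
SOURCE_TRUST = {"http://dbpedia.org": 0.9, "http://example.org/blog": 0.3}

def answer_trust(provenance):
    """Pessimistic combination: an answer is only as trustworthy as its
    weakest source. Products or weighted means are plausible alternatives."""
    return min(SOURCE_TRUST.get(src, 0.0) for src in provenance)

# An answer derived from triples fetched from both sources:
print(answer_trust(["http://dbpedia.org", "http://example.org/blog"]))  # -> 0.3
```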
Automatic Composition of Context-based Content in Digital Books
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Hajar Ghaem Sigarchian, Tom De Nies, Wesley De Neve, Frank Salliau
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Hajar Ghaem Sigarchain
Keywords: e-Text-Book, Widgets, Ubiquitous environment, Semantic Web, Profile Manager
Location: home, Zuiderpoort
Problem definition:
Books, even digital books, are a static medium with fixed contents. There is currently no means of
personalizing and tailoring books to the person who reads them. However, the technology to provide
dynamic contents inside a digital book already exists. Depending on contextual criteria such as the
reader’s interests, location, language, and culture, the content can adapt itself to fit the reader’s needs.
As a general use case, we can refer to non-fiction books such as tourist guide books that e.g.
automatically update their contents based on the reader’s geolocation.
Educational textbooks also provide an excellent use case to prove this principle, with the textbook
adapting itself to the student’s progress and changing interests. Publishers will also benefit from this
approach: it facilitates reuse of existing content; it also can lead to a significant reduction of both
creation and distribution costs and even lead to new business models.
Goals:
The goal of this master’s thesis is to automatically adapt digital books to their context using
external open content. The student needs to do a literature review, in order to have a better
understanding of the concepts and the current state of the art. In addition, he/she will propose an
appropriate architecture and data representation format. The research domains will include
mashups, widgets, cloud-based synchronization, versioning and the Semantic Web.
The next step is implementing a prototype as a proof-of-concept. Eventually, he/she needs to test
the prototype to evaluate and qualify the relevance of the automatically provided contents.
Mining anonymized user data for predictive analysis
Promotor: Sofie Van Hoecke and Peter Lambert
Supervisors: Glenn Van Wallendael, Benoit Marion ([email protected]), Dirk Van Gheel
([email protected])
Study Programme: Master Computer Science Engineering, Master Industrial Engineering:
Elektronica – ICT, Master Industrial Engineering: Informatica
Number of students: 1
Number of theses: 1
Contact: Sofie Van Hoecke
Keywords: Data mining, recommendations, crash prevention
Location: home, Zuiderpoort, TPVision Zwijnaarde
Problem definition:
To continuously improve the quality of Philips televisions, TP Vision incorporates anonymized user
logging on Android-enabled Smart TVs. However, as current TVs are complex systems, a large
amount of user data is logged. To improve the evaluation of user data, advanced data mining
techniques are required in many areas of our operations.
Efficient data mining can result in an optimized recommendation engine to recommend more
interesting TV channels or Android apps, improving the user’s lean back experience. Additionally,
watch later lists combining YouTube, Vimeo, and Spotify can be optimized by providing a prioritized
list.
Furthermore, from the user data the market impact of certain features (such as the availability of
broadcast tuners, pre-installed apps, DLNA connectivity, etc.) can be predicted. Identifying
relationships between user behavior and interaction with those features, taking into account general
user clusters and regional differences, can result in differentiated marketing strategies.
Finally, the performance of the TV can be improved by using machine learning techniques to analyze
the logged data and intervene in a timely manner to prevent a potentially hazardous crash from happening.
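As a flavor of the techniques involved, the sketch below clusters (made-up) per-user event counts with k-means; an error-heavy cluster would be the natural target for crash-prevention interventions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-user event counts: [app_launches, channel_switches, errors]
usage = np.array([
    [50, 200, 0],
    [45, 180, 1],
    [5, 10, 30],   # a user hitting many errors
    [4, 12, 25],
])

# Cluster users into behaviour groups; the error-heavy cluster is the one
# worth prioritizing for crash prevention.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(usage)
print(labels)
```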
Goals:
The goal of this master’s thesis is to apply data mining techniques on anonymous user data from
currently deployed Philips Android TVs within Europe.
Topics that improve the user experience, technical enhancements, or commercially interesting research
can all be considered. Depending on the student’s interest, the actual problem statement can be fine-tuned.
TP Vision is a dedicated TV player in the world of visual digital entertainment. TP Vision is part of
TPV Technology, the #1 monitor manufacturer in the world. At its Innovation Site Europe in Ghent, we
design televisions of the future, for Philips and other brands. NetTV, Ambilight, Android TV and
Cinema 21:9 have all been developed in this innovative and driven environment. Recognition of our
activities is visible in numerous awards such as prestigious EISA-awards.
This master’s thesis is within the Advanced Software Development department which drives the
innovation chain from conceptualization to feasibility and prototyping, as well as leads technology
and standardization roadmaps.
A Proof Checker for the Semantic Web
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Dörthe Arndt, Ruben Verborgh
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Dörthe Arndt
Keywords: N3Logic, proof, reasoning, semantic web
Location: home, Zuiderpoort
Problem definition:
The Semantic Web enables computers to understand and reason about public data. There exists a
huge number of reasoners which are able to draw conclusions based on common knowledge (e.g.
Pellet, CWM or EYE). Some of them also provide proofs to clarify how they came to a certain result.
But what is a proof? How can we be sure that a proof is correct? When do we trust it? Could they
be lying?
An independent checker is needed!
Goals:
With this thesis you will help to find a solution for this problem: You will get a better understanding of
N3Logic, the logic used by reasoners such as EYE or CWM
(http://www.w3.org/2000/10/swap/doc/cwm.html, http://eulersharp.sourceforge.net/).
You’ll learn what the current proof format of these reasoners looks like and how it could be improved.
This knowledge will enable you to implement an independent proof checker for N3-proofs which can
handle proofs written in the N3-proof ontology (http://www.w3.org/2000/10/swap/reason.n3).
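To fix intuitions about what “checking” means, the toy sketch below verifies a two-premise subclass inference in a proof represented as plain Python data; the thesis itself would parse real N3 proofs in the reason.n3 vocabulary and verify each inference step in the same spirit.

```python
# A toy proof: each step is either an axiom or a rule application whose
# conclusion must follow from already-established statements. Real N3 proofs
# (http://www.w3.org/2000/10/swap/reason.n3) have a similar shape with
# extraction and inference steps.
proof = [
    {"id": 1, "kind": "axiom", "statement": ("socrates", "a", "human")},
    {"id": 2, "kind": "axiom", "statement": ("human", "subclassOf", "mortal")},
    {"id": 3, "kind": "rule", "from": [1, 2], "statement": ("socrates", "a", "mortal")},
]

def check(proof):
    """Verify the subclass rule: (x a C1), (C1 subclassOf C2) => (x a C2)."""
    known = {}
    for step in proof:
        if step["kind"] == "rule":
            (x, _, c1), (c1b, _, c2) = (known[i] for i in step["from"])
            if c1 != c1b or step["statement"] != (x, "a", c2):
                return False
        known[step["id"]] = step["statement"]
    return True

print(check(proof))  # -> True
```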
On-site time registration and efficiency analysis in the operating theater
Promotor: Sofie Van Hoecke, Patrick Wouters
Supervisors: dr. Piet Wyffels (UZ Gent)
Study Programme: Master of Science in Industrial Engineering: Electronics-ICT - Campus
Kortrijk, Master of Science in Computer Science Engineering, Master of Science in Industrial
Engineering: Electronics-ICT - Campus Schoonmeersen, Master of Science in Industrial
Engineering: Informatics
Number of students: 1
Number of theses: 1
Contact: Sofie Van Hoecke
Keywords:
Location: home, Zuiderpoort, Technicum or UZ Gent (K12)
Problem definition:
Organizing and executing a surgical programme within the daily allotted working time proves to be
one of the most complex tasks in a hospital. A great deal of literature exists in which work models
and systems are proposed, but these are only of limited applicability, among other things because of
significant international differences in the structure of healthcare. A more recent and universal
problem that undermines the workability of common organizational models is the tension between
imposed budgetary restrictions and the rising demand for care.
The only adequate answer to this is a more efficient deployment of existing resources and staff,
and an optimization of the workflow (planning and execution) in this multidisciplinary and
high-tech environment. This is no simple task, given the presence of unpredictable factors
(emergencies, complications), the involvement and diverging interests of the various disciplines
(surgeons, anesthesiologists, nurses, technicians), and the coordination with the departments that
have to deliver and receive patients.
Goals:
Het doel van deze masterproef is het ontwerpen van een intelligent, flexibel en gebruiksvriendelijk
systeem (bv. onder de vorm van web-gebaseerde toepassing) dat een accurate tijdsregistratie op
de werkvloer mogelijk maakt met aandacht voor de verschillende fases en deelaspecten die aan
bod komen bij, en invloed hebben op het verloop van een chirurgische ingreep (opstart tijd, turnover
tijd, installatietijd...). Het ontwikkelen van een geschikt datamodel is hierbij heel belangrijk. De data
die aldus opgemeten wordt, moet eenvoudig en efficiënt kunnen bevraagd worden om
tijdsverliesposten in kaart te brengen en na root cause analysis systematische verbeterinitiatieven te
lanceren.
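A minimal sketch of the kind of data model meant here, with hypothetical phase names: one timestamp per phase of a procedure, from which durations such as set-up or turnover time can be derived and queried.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ProcedureRecord:
    """Hypothetical minimal model: one timestamp per phase of a procedure."""
    procedure_id: str
    phases: dict = field(default_factory=dict)  # phase name -> datetime

    def register(self, phase):
        self.phases[phase] = datetime.now()

    def duration(self, start_phase, end_phase):
        """Minutes elapsed between two registered phases."""
        return (self.phases[end_phase] - self.phases[start_phase]).total_seconds() / 60

record = ProcedureRecord("OR1-2015-001")
record.register("patient_in")
record.register("incision")
print(f"setup time: {record.duration('patient_in', 'incision'):.1f} min")
```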
During this master’s thesis, the student will gain knowledge about the operation of an operating
theater and coordinate with the different parties involved, in order to incorporate the various
viewpoints and perspectives into the design. There is currently a large and very broad interest in
(and market for) such applications.
This master’s thesis is carried out in collaboration with the Anesthesiology department of UZ Gent.
Real-time data enrichment and data mining of multilingual databases in a procurement
environment
Promotors: Erik Mannens & Rik Van de Walle
Supervisors: Ruben Verborgh & Dieter De Witte
Study Programme: Master Computer Science Engineering, Master Mathematical Informatics
Number of students: 1
Number of theses: 1
Contact: Manu Lindekens (GIS International)
Keywords: datamining, data-enrichment, data-cleaning
Location: GIS International – Stapelplein 70 – 9000 Gent
Problem definition:
In a B2B world, we as a procurement service provider (www.gisinternational.net) receive a lot of
information from our customers on items we need to buy from different sorts of vendors. This
information is in most cases very limited, in different languages, incomplete, and polluted.
Based on that limited information we, in a first phase, categorize these articles into different
commodities. This categorization will allow us to better determine which vendors to target. A second
step is to enrich the data to be more accurate when ordering the material. These are all steps that are
currently done manually and are very labor-intensive. An off-the-shelf software package that does all
this is currently not available.
Secondly, the data in our own ERP system needs to be cleaned and enriched in real time. We have
currently identified different parties to tackle this problem with intelligent software.
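As a flavor of the categorization step, the sketch below trains a text classifier on a handful of made-up multilingual item descriptions; character n-grams are used because they tolerate spelling variants across languages. A production system would need far more data and validation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: short, noisy, multilingual item descriptions.
descriptions = ["hex bolt M8 zinc", "zeskantbout M10", "ball bearing 6204",
                "kogellager 6205", "boulon hexagonal M6"]
commodities = ["fasteners", "fasteners", "bearings", "bearings", "fasteners"]

# Character n-grams cope better with multilingual spelling variants than words.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(descriptions, commodities)
print(model.predict(["lager 6304", "bout M12"]))
```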
Goals:
The goal of this thesis is to be the link between the different parties who will develop and
implement the software to solve this problem. Therefore a good understanding of the needs, the
processes and how the software will work is important. GIS International has the knowledge on the
parts, knows which information is needed and commercially where to get it from. The other parties
are top research and technology companies which are experts in their fields. The purpose is to build a
state-of-the-art add-on to the ERP software of GIS (MS Dynamics NAV) which will be the
differentiator in the market.
Do not hesitate to contact us if you are interested in this topic and have any questions!
An extensible web-based intermodal route planner for Belgium
Promotor: Erik Mannens and Rik Van de Walle
Supervisors: Pieter Colpaert, Ruben Verborgh
Study Programme: Master Industrial Engineering
Number of students: 1
Number of theses: 1
Contact: [email protected]
Keywords: hypermedia, REST, Linked Data, route planning
Location: home, Zuiderpoort
Problem definition:
Developers of route planning apps today have to be satisfied with web-services which do the hard
work for them: they offer a black-box algorithm with a finite set of functionalities (e.g.,
http://api.myapp/?from=...&to=...). When developers would like the algorithm to take into
account other modes of transport or different edge weights (e.g., depending on
wheelchair accessibility, criminality statistics or the probability of being late), they need to request
these features from the server admin.
At Multimedia Lab, we are researching an alternate server-client trade-off: instead of exposing route
planning algorithms over HTTP, we suggest publishing the raw arrivals and departures using a
REST API with paged fragments (http://linkeddatafragments.org). This way, route planning user
agents, which now execute the algorithm on their own, can follow hypermedia controls to get more
data on the fly.
When we publish an ordered list of arrivals/departures (connections), route planning can be done by
relaxing each connection once (shortest path in a DAG). We publish this graph as Linked Open
Data resources for route planning purposes.
Goals:
The goal of this master thesis is to research different strategies to merge graphs from different
transport modes on the scale of Belgium. A visualization (cfr.
http://kevanahlquist.com/osm_pathfinding/) should be created to understand the different strategies
for merging different transport modes.