Ghent University – iMinds – Multimedia Lab Master thesis subjects 2015 - 2016 Knowledge on Webscale Assessing Open Source Code Trustworthiness through Version History Promotor: Erik Mannens and Rik Van de Walle Supervisors: Tom De Nies, Ruben Verborgh, Miel Vander Sande Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: [email protected] Keywords: Trust, Version Control Systems, Git, Semantic Web Location: home, Zuiderpoort Problem definition: An increasing amount of code is being shared on the Web, thanks to various open source initiatives such as npm, SourceForge, Google Code, etc. However, with all this code being pushed to the Web, and the possibility for anyone to contribute to any project, the quality of the resulting software is not always optimal. Furthermore, manually judging whether or not to trust a piece of code is a time-consuming and inexact process, while programmers should be focusing on writing contributions of their own. Therefore, there is a need for an automatic method to help programmers decide whether or not they can trust a certain piece of code. Goals: To achieve this, the student will exploit the information contained within a Version Control System (VCS), such as Git. More specifically, the ‘provenance’ (also referred to as ‘data lineage’) of the code will be exposed using a tool such as Git2PROV. Then, the student will create a method to automatically reason over this provenance, to give the end-user an assessment of the trustworthiness of each version of the code. An extensive literature review will need to be conducted, to find out which criteria influence trustworthiness, and how they can be inferred. Finally, the thesis should result in a lightweight, user-friendly demonstrator that can easily be used by programmers in their daily workflow. 
Towards a Trusted Web by Tracing the Origins of Composite Web Pages Promotor: Erik Mannens and Rik Van de Walle Supervisors: Tom De Nies and Ruben Verborgh Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: [email protected] Keywords: Trust, Version Control Systems, Git, Semantic Web Location: home, Zuiderpoort Problem definition: While openness is one of the core foundations of the Web, it has caused such an abundance of heterogeneous content that it becomes unclear to humans whether they can trust the content they see on web pages. Furthermore, web pages are often littered with tracking mechanisms, such as hidden pixels, cookies, etc. The first step in deciding whether or not to trust a web page is finding out where its contents come from, who made/edited them and whether these sources are to be trusted. This is what’s known as the page’s provenance. Goals: In this master’s thesis, the student will investigate a method to trace the provenance of Web pages and the parts of which they are composed. To achieve this, an extensive literature study will be performed to identify existing approaches that might serve as a baseline for this purpose. The student will then devise an improvement on these approaches, thereby exposing the full provenance of a web page. This provenance can then be interpreted by a reasoner to suggest a trust recommendation to the end user: a human being looking at the web page through a browser. By working with standards from the W3C Open Web Platform, the solution devised in this thesis potentially has a worldwide impact. 
Bringing time-travel to data on the Web with efficient indexes Promotor: Erik Mannens and Rik Van de Walle Supervisors: Miel Vander Sande, Laurens De Vocht Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 or 2 Number of theses: 1 Contact: Miel Vander Sande Keywords: Data structures, Web APIs, Indexing, RDF, Semantic Web, Linked Data Location: home, Zuiderpoort Problem definition: The Web of Data is an interconnected network of Linked Datasets residing on the Web. In contrast to documents, concepts within the data are linked in a single global data space. These data are modelled as a graph using the RDF framework, which describes data as triples (subject-predicate-object). Although reading infrastructure has evolved significantly, writing to this dataspace is still an unsolved problem. How do you maintain such a dataspace where different users create, read, update and delete statements? How do you take into account different views on the same data? These problems pose new challenges for data storage, more specifically indexes, where changes need to be remembered. Databases use many different indexing structures, such as B-trees, B+ trees, skip lists and hash tables, and storage structures such as log-structured storage, to enable fast and concurrent read/write access. How do they perform for RDF? Can they be exploited by version control systems? Goals: In this thesis, you initiate a building block for the Read/Write Web of Data. You dive into the literature about general and versioned indexing strategies in RDF databases. Based on this knowledge, you propose a technique that supports (a) fast triple-pattern-based retrieval, (b) acceptable insertion/removal of triples and (c) tracking of changes in order to retrieve past views. Finally, the approach is implemented and evaluated using a use case, in order to verify the features mentioned above. 
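As a rough illustration of requirements (a) to (c), the following Python sketch records, for every triple, the versions at which it was added or removed, and answers triple patterns against the current or any past version. It is a minimal sketch only: the class name VersionedTripleStore and its methods are hypothetical, the example triples are invented, and the linear scan in match stands in for the real indexes (B+ trees, skip lists, ...) that the thesis would investigate.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class VersionedTripleStore:
    """Toy store (hypothetical): each triple remembers when it was toggled."""

    def __init__(self) -> None:
        self.version = 0
        # triple -> ascending versions; 1st entry = add, 2nd = remove, 3rd = add, ...
        self._toggles: Dict[Triple, List[int]] = {}

    def _present(self, triple: Triple, version: int) -> bool:
        # Present if toggled an odd number of times up to and including `version`.
        toggles = self._toggles.get(triple, [])
        return bisect_right(toggles, version) % 2 == 1

    def add(self, triple: Triple) -> int:
        self.version += 1
        if not self._present(triple, self.version):
            self._toggles.setdefault(triple, []).append(self.version)
        return self.version

    def remove(self, triple: Triple) -> int:
        self.version += 1
        if self._present(triple, self.version):
            self._toggles[triple].append(self.version)
        return self.version

    def match(self, pattern: Pattern, version: Optional[int] = None) -> List[Triple]:
        """Return triples matching the pattern (None = wildcard) as of `version`."""
        at = self.version if version is None else version
        return sorted(
            t for t in self._toggles
            if all(p is None or p == v for p, v in zip(pattern, t))
            and self._present(t, at)
        )


store = VersionedTripleStore()
v1 = store.add(("alice", "knows", "bob"))
v2 = store.add(("alice", "knows", "carol"))
v3 = store.remove(("alice", "knows", "bob"))
print(store.match(("alice", "knows", None)))              # current view
print(store.match(("alice", "knows", None), version=v2))  # past view
```

Because nothing is ever overwritten, every earlier view stays reachable; the cost is that pattern matching touches every triple ever stored, which is exactly the trade-off an efficient versioned index would have to improve on.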
Monitoring Science Related Conversations on the Web in Real-Time Promotors: Erik Mannens and Rik Van de Walle Supervisors: Laurens De Vocht, Anastasia Dimou Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Laurens De Vocht Keywords: Web 2.0, Science 2.0, Researchers, Web of Data, Collaboration Tools, Social Media Location: home, Zuiderpoort Problem definition: Digital libraries and online journals (such as IEEE, ACM) all have search engines to help scholars find interesting resources. However, these approaches are often ineffective, mostly because scholars: (i) only look up resources based, at best, on their topics or keywords, not taking into account the specific context and the scholar's profile; (ii) are restricted to resources from a single origin. Of course, aggregators exist that index resources from multiple sources. The challenge is therefore in matching research needs and contexts to opportunities from multiple, heterogeneous sources. In other words, we should make the most of the wealth of resources for research by relating and matching each scholar's profile with the resources, publications and other scholars' profiles that are available online. Goals: Combine streams of Web Collaboration Tools (e.g. Researchgate, Mendeley…) and Social Media (e.g. Twitter, LinkedIn...) to track scientifically related communication and align it with the Web of Data (such as COLINDA, DBLP, PubMed). This allows developing an efficient real-time monitor and a useful environment for researchers. The monitor needs to allow interaction with the users, like a dashboard. The user's personal research library and preferences could be matched with those of others. This allows links to be made to social and research data beyond a single researcher's scope and can be a great source of inspiration (what is relevant to me?) and overview (what's hot right now around me?). 
This should lead to more fine-grained details facilitating researchers to obtain a sophisticated selection and linking of contributed resources based on previous assessments and explored links. Towards Intelligent Web Experiences: A Contextual User Journey Promotor: Erik Mannens and Rik Van de Walle Supervisors: Laurens De Vocht Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Laurens De Vocht Keywords: Storytelling, Information Retrieval, Big Data, Linked Data, Pathfinding, User Interaction Location: home, Zuiderpoort Problem definition: The number of resources that are now available to users on the Web, is rapidly expanding. Although this Big Data may be structured and even linked, the underlying structure still looks like a maze to most users. Therefore, there is an abundance of apps and web pages trying to hide this for the user and of course there are search engines helping the user travel to the right page at all times. Rather than imposing a linear journey on users through the hierarchies that website navigation and lists of links impose, a networking-based contextual experience is based on a user’s search and the relationships that form around what a user searches for. In this way, users navigate the network in an order of their choice—establishing their own personalized web experience. There is still some order as resources in the network are related to each other through modeled relationships and ontologies. But, these relationships can run in multiple directions rather than one direction. The categories become less important and the focus is put around the user and the content that is most relevant to them. You will investigate an approach to influence the generation of explanations of how resources are related to each other in real-time in a personalized user context. 
Each explanation can be seen as a useful/relevant re-combination of multiple associations between resources, for example: Trivia Finding / Personalized Story: DBpedia (structured version of Wikipedia); Research 2.0: recreate events (conferences) based on data from Web Collaboration Tools, Digital Libraries and Linked Open Data; Medicine: drug discovery and genome data analysis. Goals: In this master’s thesis, the goal is to generate an experience which is both relevant to the user and coherent. This includes an optimization of how users interact with data, without violating any rules of the context in which it is applied (e.g. chosen topics, required resources). Rather than living in the past, you will investigate methods that make it possible to look toward the future, providing inspiration as you discover things you didn’t know before. Relationships in the data can suggest new songs you may want to listen to or people you may want to meet. The user's journey on the Web evolves from being enforced linearly (passing through search engines over and over) to a network of data represented in a way they like. User experience modeling in mobile applications using behavior stream visualization, clustering and sentiment extraction Promotors: Erik Mannens and Rik Van de Walle Supervisors: Dieter De Witte, Azarakhsh Jalalvand, Paul Davies (UXProbe) Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: [email protected], [email protected] Keywords: Big Data Visualizations, Clustering, Sentiment extraction, Machine Learning Location: Home, Zuiderpoort, Boerentoren Antwerp Problem definition: Nowadays, being a successful mobile application developer is a challenging task. Apart from the fierce competition, you need to convince users to install your product and to keep using it. In this context, it is of critical importance to identify and resolve issues/nuisances as soon as possible. 
Therefore developers are eager to track user behavior and user sentiment in their application and to identify, analyze and even predict bad user experiences. UXProbe (http://www.uxprobe.org/), a startup specialized in usability design, created a service which developers can integrate in their mobile application. The service collects event streams of user interactions, errors that occur etc. What makes their service unique is that these event streams are enriched with user generated sentiment feedback in the form of ratings, emoticons and minisurveys resulting in a very rich dataset. Deriving and communicating insights from this data in a scalable fashion for increasing user bases, within limited time constraints is a challenging task for which UXProbe needs you! Goals: As a first step you will create a number of dynamic dashboard-like visualizations to allow the visual exploration of the user behavior patterns and the emotions associated with them. This will enable development teams to quickly spot unexpected interaction patterns and assess the effectiveness of the solutions they provide to address these. In a second phase the thesis will, depending on your interests focus on: Clustering the events into categories of ‘error motifs’. The design of a prioritization queue based on the results of user sentiment analysis. The design of a learning algorithm which can predict upfront which problems a user is likely to face in the near future. 
Analyzing and exploring links in biomedical graphs for drug discovery Promotors: Erik Mannens and Rik Van de Walle Supervisors: Dieter De Witte, Laurens De Vocht & Filip Pattyn (Ontoforce) Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: [email protected], [email protected] Keywords: Biomedical Data Mining, Graph Algorithms, Big Data Location: home, Zuiderpoort Problem definition: Ontoforce is a startup that has created a solution to transparently query many clinical data sources simultaneously. Their semantic technology - disQover - can query data from molecular databases, scientific publications, clinical trials etc., allowing pharmaceutical companies to shorten their drug discovery pipeline significantly. While the focus is currently on semantic technologies, there are many questions which are more easily mapped onto pathfinding algorithms than onto queries, a simple example being: is there a path between lung cancer and smoking via genetic mutations? Neo4J is a graph database optimized for graph traversal queries. Rik Van Bruggen, regional director of Neo Technology in Belgium, will be available as technical advisor on how to implement certain use cases. Goals: Design a number of specialized path finding algorithms in Neo4J and assess the limitations/complementarity of both semantic technologies and graph databases for the use cases presented. Investigate whether the algorithms can be translated to a different context, for example automatic storytelling (www.everythingisconnected.be), which under the hood also relies on path finding algorithms. 
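The example question above ("is there a path between lung cancer and smoking via genetic mutations?") can be phrased as a constrained shortest-path search. The sketch below uses plain Python and a breadth-first search over a tiny hand-made graph, so it only illustrates the idea and says nothing about how it would be expressed in Neo4J or disQover. The edges, the node names "mutation A" and "mutation B", the KIND labels and the function shortest_walk_via are all invented for this illustration and carry no biomedical meaning.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy graph with undirected edges (illustrative only).
EDGES = [
    ("smoking", "mutation A"),
    ("smoking", "mutation B"),
    ("mutation A", "lung cancer"),
    ("mutation B", "other disease"),
]
# Hypothetical node types; nodes not listed here have no type.
KIND = {"mutation A": "mutation", "mutation B": "mutation"}


def build_adjacency(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_walk_via(adj: Dict[str, Set[str]], kind: Dict[str, str],
                      start: str, goal: str, via_kind: str) -> Optional[List[str]]:
    """Shortest walk from start to goal that visits a node of type via_kind.

    The search runs over (node, seen_via) states, so reaching the goal only
    counts once a node of the required type has been passed.
    """
    first = (start, kind.get(start) == via_kind)
    parent: Dict[Tuple[str, bool], Optional[Tuple[str, bool]]] = {first: None}
    queue = deque([first])
    while queue:
        state = queue.popleft()
        node, seen = state
        if node == goal and seen:
            walk: List[str] = []
            cursor: Optional[Tuple[str, bool]] = state
            while cursor is not None:
                walk.append(cursor[0])
                cursor = parent[cursor]
            return walk[::-1]
        for nxt in sorted(adj.get(node, ())):
            nxt_state = (nxt, seen or kind.get(nxt) == via_kind)
            if nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None


adj = build_adjacency(EDGES)
print(shortest_walk_via(adj, KIND, "smoking", "lung cancer", "mutation"))
```

Tracking the "seen a mutation" flag as part of the search state is what turns an ordinary breadth-first search into a "path via" query; note that the result is a walk, so a node may in principle be visited twice when the required intermediate node lies off the direct route.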
Turning Social Media Data to a Semantic Gold Mine Promotors: Erik Mannens and Rik Van de Walle Supervisors: Anastasia Dimou, Laurens De Vocht Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Anastasia Dimou Keywords: Social Media, Linked Data mapping, Linked Data publishing Location: home, Zuiderpoort Problem definition: Incorporating data from social media, e.g. Twitter, Facebook, LinkedIn, into the Linked Open Data cloud has a lot of value for marketing analysis, business and research. In real-world situations, this information cannot be efficiently searched as we cannot query it, e.g. we cannot search when a user last mentioned a place. Goals: In this thesis, you research extract-map-load approaches where the mapping is driven by the data in real time. You investigate how to build an efficient execution plan that handles such data streams and maps them to RDF. The focus is on mapping social media data to the RDF data model so you can bridge information from different social media in a way that lets you identify interesting trends around people, topics or events while you filter out the noisy data. Visualizing biological sequence motifs using high performance multidimensional scaling Promotor: Erik Mannens and Rik Van de Walle Supervisors: Dieter De Witte Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: [email protected] Keywords: Bioinformatics, Big Data Visualizations, Clustering Location: home, Zuiderpoort Problem definition: The DNA of an organism consists of approximately 1% coding sequences, i.e. genes, and 99% noncoding sequences. Regulatory sequences are hidden in the vicinity of the genes and help regulate transcription (the process of copying DNA into RNA). In a later step, the RNA is translated into proteins. 
The accurate discovery of regulatory motifs in sequence data is very difficult since these motifs are generally short and allow multiple wildcards. An exhaustive discovery algorithm has already been developed which generates a database of motifs which are overrepresented in sequences of closely related organisms. The amount of data is however still too large to derive clear insights from. Goals: The motif database generated by the discovery algorithm contains conservation information about millions of short motifs. When one motif is found to be significant, many highly similar motifs will generally also be significant, which implies that they can be clustered and that the cluster as a whole might have a biological meaning. Multidimensional scaling is a technique which allows the visualization of high dimensional data by mapping it onto a 2D or 3D Euclidean space while only requiring a well-chosen string distance between the motifs. The thesis student will investigate whether this algorithm can be used for large datasets of short motifs, investigate the scalability and develop an end-to-end solution in which a biologist can explore the data in an interactive way. As an extension the results from different motif algorithms can be visually compared. Low latency querying of huge Linked Medical datasets using Big Data technologies Promotors: Erik Mannens and Rik Van de Walle Supervisors: Dieter De Witte, Laurens De Vocht Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: [email protected] Keywords: Big Data Architectures, Linked Data, Biomedical Data Mining Location: home, Zuiderpoort Problem definition: The semantic web is a continuously expanding collection of linked datasets. Transparently querying all these datasets together is a technical challenge for which multiple solutions have been developed in the past years: from smart LDF clients to federated querying. 
If data dumps are available another approach is possible: collect all dumps in a distributed file system (HDFS) and query them using SQL-like tools developed for Big Data systems. Semantic data is generally queried using the SPARQL language for which there is no real solution in the Big Data ecosystem yet. Depending on the availability and the volume of the data this approach might be the preferred or even the only solution. Goals: In a first step a literature study will be performed to get a clear overview of all attempts that have been made to translate this problem into the Big Data space. In the next step a solution will be developed which provides a SPARQL interface to a Big Data system of choice. Most recently Apache Spark has risen to be the most popular big data technology and it would be interesting to see if its superior data processing performance translates to linked data querying. As a benchmark a set of queries on multiple medical datasets generated by the disQover platform of Ontoforce will be used to compare the Big Data approach to the already available solutions based on federated querying. A journey planner for the galaxy Promotor: Erik Mannens and Rik Van de Walle Supervisors: Pieter Colpaert, Ruben Verborgh Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: [email protected] Keywords: hypermedia, REST, Linked Data, route planning Location: home, Zuiderpoort Problem definition: Today, it would be impossible to write an app which plans routes throughout the entire galaxy: all the transit data of e.g., buses, space shuttles, trains, bikes or space elevators would have to be collected on one machine. This machine would then have to calculate hard to cache route planning advice for all the user agents in the galaxy. 
Within our lab we are developing a different server-client mechanism: instead of opening route planning APIs for wide use, we suggest publishing the raw arrivals and departures using a REST API. This way, route planning user agents can follow links to get more data on the fly. We are implementing this data publishing interface with all the transit data that is already available worldwide and we have written a proof of concept user agent which now calculates the routes by following links, within a reasonable query time. Goals: The goal of this thesis is to research the possibilities to make these user agents (which may be either servers or mobile apps) more intelligent in various ways: e.g., the process could be sped up if we can pre-fetch data on the client-side, or the result can be more suitable for end-users if our user agent is able to discover other datasets (e.g., when I’m in Caracas, I only want the routes with the least reported crime, or when I’m in a wheelchair, I only want wheelchair accessible routes). The user agent that is written needs to be generic: anywhere in the galaxy, it should be able to automatically discover the right datasets published on the Web. Building a Big Data streaming architecture for real-time Twitter annotation and querying Promotors: Erik Mannens and Rik Van de Walle Supervisors: Dieter De Witte, Frederic Godin, Gerald Haesendonck Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: [email protected] Keywords: stream querying, linked data, neural networks, semantic annotation Location: home, Zuiderpoort, Technicum Problem definition: Twitter is an online social network service that allows users to send short messages, aka tweets. Currently over 500 million tweets are being generated every day. 
Automatically extracting information from these tweets is a challenging task since they are short, can contain multiple languages, contain spelling errors etc. Extracting information about tweets which are semantically related, i.e. deal with a certain topic is far from trivial if they do not contain the same terms. Goals: In this thesis the student will be involved in setting up a streaming architecture for enriching tweets with semantic information. Neural networks will be trained and used to label the tweets and to spot named entities. The enriched stream will then be converted into semantic RDF triples which can be queried using a streaming variant of the SPARQL language, for example C-SPARQL. Spark is a Big Data technology stack that contains tools for both stream processing and batch analysis and is the recommended technology for tackling this kind of problem. Distributed query answering on the open Web Promotor: Erik Mannens and Rik Van de Walle Supervisors: Ruben Verborgh, Miel Vander Sande Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Ruben Verborgh Keywords: Linked Data, Web, Semantic Web, querying, distributed systems Location: home, Zuiderpoort Problem definition: Mail [email protected] to discuss this subject. What do your friends think of movies directed by Martin Scorsese? What Nvidia graphics cards have few bug reports on Linux? Is it cheaper to buy certain flights and hotels separately or together? None of the above questions can be answered by a single data source, yet today’s Web technology still focuses on single-source answer systems. This is problematic because a) it’s not scalable, since that single source will need to process a lot of queries, and b) that source doesn’t have all the data it needs to answer questions such as the above. 
The idea behind Linked Data Fragments (http://linkeddatafragments.org/) is that clients, instead of servers, should answer queries. Sources should offer fragments of data in such a way that clients can combine them to answer questions that span multiple datasets. A client that works with a single data source already exists (https://github.com/LinkedDataFragments/Client). Your task in this master’s thesis is to extend this client – or build a new one – so that it can query different data sources for a single query. Goals: Developing a scalable method to answer queries using different data sources. Describing to a client which data sources are relevant for a given query. Evaluating your solution on aspects such as accuracy and performance. Querying multimedia data on the (social) Web Promotor: Erik Mannens and Rik Van de Walle Supervisors: Ruben Verborgh, Miel Vander Sande Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Ruben Verborgh Keywords: Linked Data, Web, Semantic Web, querying, multimedia, images, video Location: home, Zuiderpoort Problem definition: Mail [email protected] to discuss this subject. How would you find YouTube movies about “New York” in which people mention the Twin Towers? How could you find images that depict two people shaking hands? Even though there is a large amount of metadata available on the Web, finding images and video can be quite difficult. The goal of this thesis is to build an intelligent client (for instance, as a browser extension) that is able to find multimedia items on the Web. This saves users many search operations on different datasets. For this, you will need to combine metadata from different sources. A starting point for this is the Linked Data Fragments client (http://linkeddatafragments.org/), which already allows querying the Web of Data. 
Your task is to blur the border between textual and multimedia search, making it easier to find the media items users are looking for. Goals: Developing a client to find multimedia data on the Web. Finding methods to query existing multimedia platforms such as YouTube and Instagram. Evaluating your solution on aspects such as recall, precision, and performance. Real-time querying of transport data on the Web Promotor: Erik Mannens and Rik Van de Walle Supervisors: Ruben Verborgh, Pieter Colpaert Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Ruben Verborgh Keywords: Linked Data, Web, Semantic Web, querying, transport, train Location: home, Zuiderpoort Problem definition: Mail [email protected] to discuss this subject. The Belgian rail website allows you to plan your journey, but only in very rigid ways. It does not take into account your current location and plans. Suppose you need to be in a certain building in Brussels for a meeting. That morning, you decide to take the train at 14:06. Unfortunately, that train is severely delayed later on, but you won’t know that until you check the website again. In this thesis, you develop a querying system over the Web that retrieves real-time results that are continuously updated. Based on data from different sources, your system automatically picks those fragments that are necessary for users to plan their journey. You can build on existing work for Web querying, such as Linked Data Fragments (http://linkeddatafragments.org/). Goals: Developing a real-time querying mechanism for transport data. Planning a route using different fragments of data from the Web. Evaluating your solution on aspects such as bandwidth and performance. 
Predictive analytics for the Internet of Things Promotors: Erik Mannens and Rik Van de Walle Supervisors: Dieter De Witte, Sven Beauprez (IoTBE.org) Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 2 Number of theses: 2 Contact: [email protected] Keywords: Internet of things, linked data publishing, stream reasoning, big data analytics Location: home, Zuiderpoort Problem definition: In the upcoming years the number of devices for the Internet of Things will grow exponentially. IoT will therefore become the largest source of streaming data. In order to derive insights from this data it should be converted into a format which can be queried and allows easy semantic enrichment. Linked open data is the ideal candidate to fulfill this task. To prepare for this innovation a prototype environment is required which will reveal the challenges for the upcoming data explosion. As a data source we will make use of Tessel Microcontrollers (www.tessel.io) which can be easily configured using only JavaScript and which are extendible with a wide range of sensors: audio, video, temperature, Bluetooth etc. The built-in Wi-Fi allows for a straightforward data transfer to a Big Data infrastructure. Goals: In this thesis the student(s) will be involved in setting up an architecture for analyzing and enriching sensor streams with semantic information. The annotated streams can be queried using a streaming variant of the SPARQL language, used for (static) linked data. In a first phase the student will build an experimental setup with Tessel devices as the data source. The data generated by the devices will be automatically ingested in a Big Data streaming architecture. For the analytics platform the student will explore the possibilities of the Apache Spark stack which contains tools for both stream processing and batch analysis. 
Since multiple students can work on this topic the focus of the thesis can be aligned with the interests of the student: if the data streams are captured he/she can focus on enrichment, visualizations or on optimizing the performance of the data pipeline. Web-based framework for real-time data visualization and scene graph management Promotor: Peter Lambert and Rik Van de Walle Supervisors: Jelle Van Campen, Tom Pardidaens and Christophe Herreman (Barco) Study Programme: Master in Computer Science Engineering, Master of Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Christophe Herreman (Barco) Keywords: html5, web, real-time, scene graph, data visualization Location: home, Zuiderpoort, Barco Problem definition: Barco develops systems and solutions for professional visualization applications in many different markets and increasingly requires web-based front-ends for these applications. The applications typically have stringent requirements towards image quality, latency and user experience and deal with large amounts of live video and data channels. At this moment, front-end applications are developed by integrators or specifically for a single customer. Goals: The goal of this project is to develop prototypes, which demonstrate different technologies to render highly dynamic scenes such as geographical maps or 3D city models, and augment the visualization with real-time video and meta-data. After analyzing the requirements from various divisions, a study of the state-of-the-art should be performed including evaluating available technology (commercial or open source libraries). The suitability of the libraries is to be explored with regards to performance, latency, scalability and feature completeness. 
Then, a framework and software architecture is to be designed and implemented to enhance these capabilities with the in-house capabilities for real-time video and meta-data visualization and also to enable handling different types of content in a single browser window, possibly combining the capabilities of several libraries. This thesis is jointly supervised with Barco (www.barco.be). Designing an Ontology-Driven Dialog System for Natural Language Understanding Promotors: Kris Demuynck and Erik Mannens Supervisors: Gaëtan Martens ([email protected]) and Azarakhsh Jalalvand Study Programme: Master Computer Science Engineering, Master Electrical Engineering, Master Mathematical Informatics Number of students: 2 Number of theses: 1 Contact: Gaëtan Martens ([email protected]), Azarakhsh Jalalvand Keywords: Automatic Speech Recognition, Natural Language Understanding, Semantic Web, Dialog System Location: home, Zuiderpoort Dialog Management (DM) and Natural Language Understanding (NLU) are fields of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. Nuance Communications is the market leader in those technologies and currently delivers the most significant advancements in speech recognition technology. Problem definition: An Automatic Speech Recognition (ASR) + NLU system transforms an input speech signal into a semantically enriched output which consists of intents and interpretations. A dialog system (DS) is an engine responsible for DM, which allows having a conversation with a machine based on a predefined set of concepts. These concepts and their relationships are modeled using Semantic Web technologies by means of an ontology. This ontology defines the behavior of the DS relying on a semantic reasoner. 
For example, if the user says “I want to make a phone call” (intent=“Phone”), then the DS should ask for additional information, such as: “Do you want to call somebody from your contact list or do you want to dial a number?” On the other hand, if the user says “I want to call Gaëtan Martens on his cell phone”, the system should not ask for additional information, since the required interpretations (i.e., contactName=“Gaëtan Martens” and phoneType=“cell phone”) are already available. Goals: In this master’s thesis, the students will build a DS which relies on an OWL ontology to define its behavior. This ontology has to be created based on a set of use cases provided by Nuance. The ontology has to model the intents, the concepts (i.e., the corresponding NLU interpretations), and the relationships between the concepts. The next step is then to build the DS around an existing semantic reasoner. The final goal of this challenging thesis is to have a functional DS, built using open source libraries, that is able to use the output from Nuance’s state-of-the-art ASR+NLU system and is configured with the created ontology. The students will have the opportunity to work on cutting-edge technology within Nuance’s automotive R&D team in an international context.
Building a Natural Language Dialog System with Semantic Web technology Promotors: Kris Demuynck and Erik Mannens Supervisors: Gaëtan Martens ([email protected]) and Azarakhsh Jalalvand Study Programme: Master Computer Science Engineering, Master Electrical Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Gaëtan Martens ([email protected]), Azarakhsh Jalalvand Keywords: Automatic Speech Recognition, Natural Language Understanding, Semantic Web, Dialog System Dialog Management (DM) and Natural Language Understanding (NLU) are fields of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages.
Nuance Communications is the market leader in those technologies and currently delivers the most significant advancements in speech recognition technology. Problem definition: An Automatic Speech Recognition (ASR) + NLU system transforms the input speech signal into a semantically enriched textual output. This output consists of an n-best list of intents (e.g., “Call”) with a set of possible interpretations (such as contactName=“Gaëtan Martens” and phoneType=“cell phone”) and corresponding probabilities. The dialog system (DS) is the engine responsible for DM, which makes it possible to have a conversation with a machine based on a predefined set of concepts. This set of concepts (and their inter-relationships) is modeled using Semantic Web technologies by means of an ontology. This ontology then defines the behavior of the DS. By applying reasoning techniques, the DS is able, given its current state, to recognize an intent or to ask for more information when the input is ambiguous. Goals: In this master’s thesis, the goal is to link an existing ASR + NLU system with a DS via ontologies. First, the supported intents and interpretations of the NLU system must be formally defined in the ontology, and missing intents (required by the DS) will have to be added. Further alignment tasks will have to be investigated through testing and iterative enhancements, such as improving the mechanism of the DS to take the NLU probabilities into account. The final goal is to have a functional, speech-enabled, ontology-based end-to-end DS, which is still a challenge given the current state of the art. The student will have the opportunity to work on cutting-edge technology within Nuance’s automotive R&D team in an international context.
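The behavior described in the two dialog system subjects above (recognize an intent from the n-best list, then ask only for the interpretations that are still missing) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `REQUIRED_SLOTS` table is a hypothetical stand-in for what the ontology and reasoner would provide, and the `Hypothesis` class, the `next_action` function and the 0.5 confidence threshold are not part of Nuance’s system or of any existing library.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the ontology: for each intent, the
# interpretations (slots) the dialog system needs before it can act.
REQUIRED_SLOTS = {
    "Call": ["contactName", "phoneType"],
}


@dataclass
class Hypothesis:
    """One entry of the n-best list produced by the ASR + NLU system."""
    intent: str
    probability: float
    interpretations: dict = field(default_factory=dict)


def next_action(n_best, threshold=0.5):
    """Decide what the dialog system should do next.

    Returns ("execute", intent) when all required interpretations are
    present, ("ask", slot) when one is missing, and ("clarify", None)
    when the input is empty, unknown, or too uncertain.
    """
    if not n_best:
        return ("clarify", None)
    # Take the most probable hypothesis from the n-best list.
    best = max(n_best, key=lambda h: h.probability)
    if best.probability < threshold or best.intent not in REQUIRED_SLOTS:
        return ("clarify", None)
    # Ask for the first required interpretation that is not yet known.
    for slot in REQUIRED_SLOTS[best.intent]:
        if slot not in best.interpretations:
            return ("ask", slot)
    return ("execute", best.intent)
```

With this sketch, a bare “Call” hypothesis leads to a question about `contactName`, while a hypothesis that already carries contactName=“Gaëtan Martens” and phoneType=“cell phone” is executed directly. A real DS would derive the required interpretations from the ontology through the semantic reasoner instead of a hard-coded table.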
The Multi-sensory and Sentiment-aware eBook Promotor: Erik Mannens and Rik Van de Walle Supervisors: Hajar Ghaem Sigarchian, Tom De Nies, Frank Salliau Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Hajar Ghaem Sigarchian Keywords: e-Text-Book, Sentiment Analysis, Internet Of Things, Semantic Web Location: home, Zuiderpoort Problem definition: People expect more and more when reading e-Books, in terms of entertainment and immersion. The recent advances in smart environments theoretically allow people to experience an eBook not only using their eyes but also using their other senses, such as hearing, touch (e.g., thermoception), etc. The content of such enhanced e-Books can be augmented and responded to by smart objects, using the Internet of Things (IoT). However, many challenges remain to make this a reality, such as automatic annotation, exchange of data, and timing (e.g., should the room react any time a person flips a page of the eBook?). Goals: The goal of this master’s thesis is to create digital books as objects in the IoT. The student needs to perform a literature review, in order to gain a better understanding of the concepts and the current state of the art. In addition, he/she will propose an appropriate architecture and data representation format. The research domains will include smart rooms, sensors, Internet of Things, digital publishing and Semantic Web. The next step is implementing a prototype as a proof of concept. Finally, he/she needs to test the prototype to evaluate the relevance of the solution to the end user (the reader of the book).
Using Events to Connect Books Based on Their Actual Content Promotor: Erik Mannens and Rik Van de Walle Supervisors: Pieter Heyvaert, Ben De Meester, Frank Salliau Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Pieter Heyvaert Keywords: books, events, Semantic Web, media, metadata Location: home, Zuiderpoort Problem definition: Many books tell a story about a certain event or a series of events. Different stories (in multiple books) might even contain the same events. An example of such a series of events is the Battle of Waterloo, which includes separate events for the preparation, the first French attack, the capture of La Haye Sainte, the attack of the Imperial Guard, the capture of Plancenoit, and so on. Currently, the metadata extracted from books mostly denotes topics, known locations, people, dates, and so on. However, the connection between people, their location and the date is not considered, and these elements are exactly what an event consists of. With this information available, one can determine the similarity between different books regarding a specific event. This is not limited to books: when event information is also available for movies, TV series, news articles, and so on, one can also determine how literature and movies interpret real events. What happens when non-fiction (e.g., newspaper articles) is given a fictional treatment (e.g., in movies) is one of the questions that might be answered here. Goals: You will need (1) to determine how events can be extracted from the different types of media, with the focus on books, (2) to decide on the granularity of the events extracted from the media, (3) to determine how to incorporate existing information, if any, about the events and media, and (4) to determine/visualize the events found in the different sources.
As part of the thesis, you will need to create a prototype which takes media (such as books) as input and outputs the events. Next, you’ll need to visualize the connection between the different media based on the events found. The difference between events can also be determined: two books might talk about the same big event, but the events making up the big event may be different, may be in a different order, and so on.
Deriving Semantics from Styling Promotor: Erik Mannens and Rik Van de Walle Supervisors: Ben De Meester Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Ben De Meester Keywords: HTML5, Semantics, CSS Location: home, Zuiderpoort Problem definition: HTML is the de facto publication format for the World Wide Web for humans. Using styling properties (e.g., in a CSS stylesheet), it is possible to style the HTML document to make it pretty. However, the HTML file itself is rarely pretty for machines. Markup such as ‘<p class="title">’ is a terror and contradicts the semantic structure that HTML5 can provide. One thing that will always be consistent, however, is the correspondence between the visual look of an HTML page and its intended semantics. E.g., ‘<p class="title">’ will look similar to ‘<h1>’ for a user, but the latter carries much better semantic meaning than the former. And better semantics means better processing of the HTML, which results in a better Web. Goals: The goal of this thesis is to improve existing HTML pages, their structure and semantic meaning, by using, among others, their visual characteristics. To that end, the student needs to define which criteria could be important: e.g., not only font-size could be important, but also font-weight, position on the page, name of the CSS class, etc. Next, the student should provide a proof of concept that can effectively recognize and improve bad HTML constructs.
Many techniques may be used, including clustering, Natural Language Processing, and ad hoc processing instructions.
Trusting query results on the open Web Promotor: Erik Mannens and Rik Van de Walle Supervisors: Ruben Verborgh, Tom De Nies Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Ruben Verborgh Keywords: Linked Data, provenance, trust, Semantic Web, querying, Web, HTTP, client/server Location: home, Zuiderpoort Problem definition: Mail [email protected] to discuss this subject. The Web way of answering a question is to combine answers from different sources. But how can we be sure that the final answer is based on sources we can trust? This is the question you will answer in this thesis. Because information is spread across different places, the old database paradigm of “query => answer” doesn’t really work anymore on the Web. Linked Data Fragments (http://linkeddatafragments.org/) capture this idea: a client asks servers for different parts of information and is able to combine them by itself. We built a Node.js application that can query the Web in such a way (https://github.com/LinkedDataFragments/Client). What this client doesn’t tell you (yet) is where the different parts of the answer come from; it doesn’t give you a guarantee that the answer is correct/trustworthy. By combining Linked Data Fragments with provenance technology, we can precisely assess the trustworthiness of data. Goals:
- Developing a method to combine the trust in different knowledge sources.
- Describing the trust in an answer that encompasses different sources.
- Developing a client that queries the Web and gives an answer you can trust.
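To make the first goal of the subject above more concrete, the following minimal sketch shows two naive ways of combining per-source trust into a trust score for one answer. It is only a baseline under stated assumptions, not the method this thesis is expected to deliver: trust is assumed to be a number in [0, 1] per source, the sources are assumed to be independent, and the function name `answer_trust` and the source names in the usage note are hypothetical.

```python
from math import prod


def answer_trust(source_trust, used_sources, strategy="product"):
    """Combine per-source trust scores (each in [0, 1]) into one score.

    source_trust: dict mapping a source name to its trust score.
    used_sources: the sources that contributed to the answer.
    strategy:     "product" treats sources as independent, so every
                  extra source can only lower the score;
                  "min" lets the weakest source determine the score.
    """
    scores = [source_trust[name] for name in used_sources]
    if not scores:
        raise ValueError("an answer must be based on at least one source")
    if strategy == "product":
        return prod(scores)
    if strategy == "min":
        return min(scores)
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, `answer_trust({"sourceA": 0.5, "sourceB": 0.5}, ["sourceA", "sourceB"])` gives 0.25 with the product strategy and 0.5 with `strategy="min"`. Which sources contributed to an answer is exactly the provenance information that the current client does not yet expose.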
Automatic Composition of Context-based Content in Digital Books Promotor: Erik Mannens and Rik Van de Walle Supervisors: Hajar Ghaem Sigarchian, Tom De Nies, Wesley De Neve, Frank Salliau Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Hajar Ghaem Sigarchian Keywords: e-Text-Book, Widgets, Ubiquitous environment, Semantic Web, Profile Manager Location: home, Zuiderpoort Problem definition: Books, even digital books, are a static medium with fixed contents. There is currently no means of personalizing and tailoring books to the person who reads them. However, the technology to provide dynamic content inside a digital book already exists. Depending on contextual criteria such as the reader’s interests, location, language and culture, the content can adapt itself to fit the reader’s needs. As a general use case, we can refer to non-fiction books such as tourist guide books that, e.g., automatically update their contents based on the reader’s geolocation. Educational textbooks also provide an excellent use case to prove this principle, with the textbook adapting itself to the student’s progress and changing interests. Publishers will also benefit from this approach: it facilitates reuse of existing content, and it can lead to a significant reduction of both creation and distribution costs and even to new business models. Goals: The goal of this master’s thesis is to automatically adapt digital books to their context using external open content. The student needs to do a literature review, in order to gain a better understanding of the concepts and the current state of the art. In addition, he/she will propose an appropriate architecture and data representation format. The research domains will include mashups, widgets, cloud-based synchronization, versioning and the Semantic Web. The next step is implementing a prototype as a proof of concept.
Finally, he/she needs to test the prototype to evaluate the relevance of the automatically provided contents.
Mining anonymized user data for predictive analysis Promotor: Sofie Van Hoecke and Peter Lambert Supervisors: Glenn Van Wallendael, Benoit Marion ([email protected]), Dirk Van Gheel ([email protected]) Study Programme: Master Computer Science Engineering, Master Industrial Engineering: Elektronica – ICT, Master Industrial Engineering: Informatica Number of students: 1 Number of theses: 1 Contact: Sofie Van Hoecke Keywords: Data mining, recommendations, crash prevention Location: home, Zuiderpoort, TPVision Zwijnaarde Problem definition: To continuously improve the quality of Philips televisions, TPVision incorporates anonymized user logging on Android-enabled Smart TVs. However, as current TVs are complex systems, a large amount of user data is logged. To improve the evaluation of user data, advanced data mining techniques are required in many areas of our operations. Efficient data mining can result in an optimized recommendation engine that recommends more interesting TV channels or Android apps, improving the user’s lean-back experience. Additionally, watch-later lists combining YouTube, Vimeo, and Spotify can be optimized by providing a prioritized list. Furthermore, the market impact of certain features (such as the availability of broadcast tuners, pre-installed apps, DLNA connectivity, etc.) can be predicted from the user data. Identifying relationships between user behavior and interaction with those features, taking into account general user clusters and regional differences, can result in differentiated marketing strategies. Finally, the performance of the TV can be improved by using machine learning techniques to analyze the logged data and to intervene in a timely manner, so that a potentially hazardous crash is prevented.
Goals: The goal of this master’s thesis is to apply data mining techniques to anonymous user data from currently deployed Philips Android TVs within Europe. Topics that improve the user experience, bring technical enhancements, or are commercially interesting can all be considered. Depending on the student’s interest, the actual problem statement can be fine-tuned. TP Vision is a dedicated TV player in the world of visual digital entertainment. TP Vision is part of TPV Technology, the #1 monitor manufacturer in the world. At its Innovation Site Europe in Ghent, we design the televisions of the future, for Philips and other brands. NetTV, Ambilight, Android TV and Cinema 21:9 have all been developed in this innovative and driven environment. Recognition of our activities is visible in numerous awards, such as the prestigious EISA awards. This master’s thesis is situated within the Advanced Software Development department, which drives the innovation chain from conceptualization to feasibility and prototyping, and leads technology and standardization roadmaps.
A Proof checker for the Semantic Web Promotor: Erik Mannens and Rik Van de Walle Supervisors: Dörthe Arndt, Ruben Verborgh Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Dörthe Arndt Keywords: N3Logic, proof, reasoning, semantic web Location: home, Zuiderpoort Problem definition: The Semantic Web enables computers to understand and reason about public data. There are many reasoners that are able to draw conclusions based on common knowledge (e.g., Pellet, CWM or EYE). Some of them also provide proofs to clarify how they came to a certain result. But what is a proof? How can we be sure that a proof is correct? When do we trust it? Could the reasoners be lying? An independent checker is needed!
Goals: With this thesis, you will help to find a solution for this problem. You will get a better understanding of N3-logic, the logic used by reasoners such as EYE or CWM (http://www.w3.org/2000/10/swap/doc/cwm.html, http://eulersharp.sourceforge.net/). You’ll learn what the current proof format of these reasoners looks like and how it could be improved. This knowledge will enable you to implement an independent proof checker for N3-proofs which can handle proofs written in the N3-proof ontology (http://www.w3.org/2000/10/swap/reason.n3).
On-site time registration and efficiency analysis in the operating theatre Promotor: Sofie Van Hoecke, Patrick Wouters Supervisors: dr. Piet Wyffels (UZ Gent) Study Programme: Master of Science in Industrial Engineering: Electronics-ICT - Campus Kortrijk, Master of Science in Computer Science Engineering, Master of Science in Industrial Engineering: Electronics-ICT - Campus Schoonmeersen, Master of Science in Industrial Engineering: Informatics Number of students: 1 Number of theses: 1 Contact: Sofie Van Hoecke Keywords: Location: home, Zuiderpoort, Technicum or UZ Gent (K12) Problem definition: Organizing and carrying out a surgical programme within the allotted daily working hours turns out to be one of the most complex tasks in a hospital. There is a substantial body of literature proposing working models and systems, but these are only applicable to a limited extent, among other reasons because of significant international differences in the structure of health care. A more recent and universal problem that undermines the workability of common organizational models is the tension between imposed budgetary constraints and the rising demand for care. The only adequate answer is a more efficient use of existing resources and staff, and an optimization of the workflow (planning and execution) in this multidisciplinary and high-tech environment.
This is no simple task, given the presence of unpredictable factors (emergencies, complications), the involvement and diverging interests of different disciplines (surgeons, anesthesiologists, nurses, technicians), and the coordination with the departments that have to deliver and receive patients. Goals: The goal of this master’s thesis is to design an intelligent, flexible and user-friendly system (e.g., in the form of a web-based application) that enables accurate time registration on the work floor, with attention to the different phases and sub-aspects that are part of, and influence, the course of a surgical procedure (start-up time, turnover time, installation time...). Developing a suitable data model is very important here. The data measured in this way must be easy and efficient to query, in order to map sources of time loss and, after root cause analysis, to launch systematic improvement initiatives. During this master’s thesis, the student will gain knowledge about how an operating theatre works and will consult with the different parties in order to incorporate their viewpoints and perspectives into the design. There is currently a large and very broad interest in (and market for) such applications. This master’s thesis is carried out in collaboration with the Department of Anesthesia of UZ Gent.
Real-time data-enrichment and data-mining of multilingual databases in procurement environment Promotors: Erik Mannens & Rik Van de Walle Supervisors: Ruben Verborgh & Dieter De Witte Study Programme: Master Computer Science Engineering, Master Mathematical Informatics Number of students: 1 Number of theses: 1 Contact: Manu Lindekens (GIS International) Keywords: datamining, data-enrichment, data-cleaning Location: GIS International – Stapelplein 70 – 9000 Gent Problem definition: In a B2B world, we, as a procurement service provider (www.gisinternational.net), receive a lot of information from our customers about items we need to buy from different kinds of vendors. This information is in most cases very limited, in different languages, incomplete and polluted. Based on that limited information, we first categorize these articles into different commodities. This categorization allows us to better define which vendors to target. A second step is to enrich the data so that the material can be ordered more accurately. All these steps are done manually today and are very labor-intensive. An off-the-shelf software package that does all this is currently not available. In addition, the data in our own ERP system needs to be cleaned and enriched in real time. We have currently identified different parties to tackle this problem with intelligent software. Goals: The goal of this thesis is to be the link between the different parties who will develop and implement the software to solve this problem. Therefore, a good understanding of the needs, the processes and how the software will work is important. GIS International has the knowledge of the parts, knows which information is needed and knows, commercially, where to get it. The other parties are top research and technology companies which are experts in their fields. The purpose is to build a state-of-the-art add-on to the ERP software of GIS (MS Dynamics NAV) which will be the differentiator in the market.
Do not hesitate to contact us if you are interested in this topic and have questions!
An extensible web-based intermodal route planner for Belgium Promotor: Erik Mannens and Rik Van de Walle Supervisors: Pieter Colpaert, Ruben Verborgh Study Programme: Industriële wetenschappen Number of students: 1 Number of theses: 1 Contact: [email protected] Keywords: hypermedia, REST, Linked Data, route planning Location: home, Zuiderpoort Problem definition: Developers of route planning apps today have to be satisfied with web services which do the hard work for them: they offer a black-box algorithm with a finite set of functionalities (e.g., http://api.myapp/?from=...&to=...). When a developer would like the algorithm to take into account other modes of transport or different edge weights (e.g., depending on wheelchair accessibility, criminality statistics or the probability of being too late), he/she needs to request these features from the server admin. At MultiMedia Lab, we are researching an alternative server-client trade-off: instead of exposing route planning algorithms over HTTP, we suggest publishing the raw arrivals and departures using a REST API with paged fragments (http://linkeddatafragments.org). This way, route planning user agents, which now execute the algorithm on their own, can follow hypermedia controls to get more data on the fly. When we publish an ordered list of arrivals/departures (connections), route planning can be done by relaxing each connection once (shortest path in a DAG). We publish this graph as Linked Open Data resources for route planning purposes. Goals: The goal of this master’s thesis is to research different strategies to merge graphs from different transport modes on the scale of Belgium. A visualization (cf. http://kevanahlquist.com/osm_pathfinding/) should be created to understand the different strategies for merging different transport modes.
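The idea of “relaxing each connection once” over an ordered list of connections can be sketched as follows. This is a minimal illustration under stated assumptions, not the lab’s implementation: the `Connection` fields, the `earliest_arrival` function and the stop names in the usage note are hypothetical, times are plain minutes since midnight, every connection is assumed to arrive strictly after it departs, and transfer times and walking between stops are ignored.

```python
from typing import NamedTuple, Optional


class Connection(NamedTuple):
    """One vehicle hop between two consecutive stops (hypothetical model)."""
    departure_stop: str
    departure_time: int  # minutes since midnight
    arrival_stop: str
    arrival_time: int    # minutes since midnight, after departure_time


def earliest_arrival(connections, origin, start_time, destination) -> Optional[int]:
    """Scan the connections in order of departure time, relaxing each once.

    Returns the earliest arrival time at `destination`, or None when it
    cannot be reached from `origin` when leaving at `start_time`.
    """
    earliest = {origin: start_time}
    for c in sorted(connections, key=lambda conn: conn.departure_time):
        reached_at = earliest.get(c.departure_stop)
        # A connection is usable only if we are at its departure stop in time.
        if reached_at is None or reached_at > c.departure_time:
            continue
        # Relax: keep the connection if it improves the arrival time.
        if c.arrival_time < earliest.get(c.arrival_stop, float("inf")):
            earliest[c.arrival_stop] = c.arrival_time
    return earliest.get(destination)
```

As a usage example with made-up stops: given the connections A→B (08:00 to 08:30), B→C (08:40 to 09:00) and A→C (08:10 to 09:30), a traveller at A at 07:55 reaches C at 09:00 via B, whereas a traveller arriving at A at 08:05 has missed the first connection and reaches C at 09:30. Because the list is sorted by departure time, a client could run the same loop while fetching the paged fragments one page at a time.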