April 10th, 2015
Universidade de Lisboa

Book of Abstracts

Editors
Catarina Martins, Universidade de Lisboa
Daniela Oliveira, Universidade de Lisboa
Joana Barros, Universidade de Lisboa

Technical Data
Title: Book of Abstracts
Editors: Catarina Martins, Daniela Oliveira and Joana Barros
Year: 2015
Published in electronic format at: bod2015.ciencias.ulisboa.pt

Table of Contents

Preface
Lecture
  Metagenomics: closing the gap
Presentations
  A role for the carboxyl-terminal domain of RNA Pol II in pre-mRNA splicing
  Genome-wide mapping of RNA Polymerase II CTD modifications with single-nucleotide resolution
  Do Parasites Count the Time? RNA-sequencing the Circadian Transcriptome of T. brucei
  Detecting footprints of HIV-1 derived sncRNAs in small-RNA-seq data
  Recognition and Normalization of Biomedical Entities Based on Ontologies
Posters
  Compound Matching of Biomedical Ontologies
  Data Mining Analysis of Lung Cancer Electronic Health Records
  Detection of alternative miRNA Processing using the IsomIR_window
  Genetics of Familial Paget’s Disease of Bone
  Integrated in silico analysis of the neuronal transcriptome of SMA disease models: untangling the gene regulatory networks underlying motor neuron degeneration at the cell- and systems-level
  Mining cardiac side-effects for known drugs
  Modelling drug response for closely related CNS GPCR proteins
  Resist: An Intelligent System to Predict Antibiotic Resistance
  VetBDI – Development of an Integrated Database in Veterinary Medicine
  Visualizing the incoherencies in Bioportal

Preface

This Book of Abstracts refers to the 4th edition of the Bioinformatics Open Days, which took place at the Faculty of Sciences of the University of Lisbon on the 10th of April 2015. Bioinformatics Open Days is a student-led initiative first held at the Universidade do Minho, Braga, in 2012. It aims to promote the exchange of knowledge between students, teachers and researchers from the Bioinformatics and Computational Biology fields. This symposium’s 4th edition is a joint collaboration with the Bioinformatics and Computational Biology Master students of Universidade de Lisboa. The accepted submissions aimed to promote the bioinformatics field in our country through a short oral communication or a poster. Each article was reviewed by one or two members of our scientific committee. We had 16 submissions (6 oral communications and 10 posters) and a lecture by Ana Teresa Freitas from Instituto Superior Técnico.
The abstract of this lecture is also included in the book of abstracts. We would like to thank all who collaborated in organizing this event, especially Professor Francisco Couto and Professor Cátia Pesquita for their guidance and support, our reviewers for their effort and careful analysis of the submitted abstracts, and the Faculty of Sciences and its Student Association for providing and aiding in several logistic matters. We would also like to thank all of our guests for agreeing to be a part of this project and, finally, we appreciate your interest in this initiative and hope you liked the program and the subjects discussed.

Lisbon, April 2015
Catarina Martins, Daniela Oliveira and BOD organizing committee

Lecture

Metagenomics: closing the gap
Ana Teresa Freitas

Microorganisms constitute the vast majority of life forms on Earth. The diversity of microorganisms reveals an astonishing set of functions that are necessary for all other life forms to exist. So it seems that microbes run the world. Most microbiological research is focused on organisms that are cultured in the laboratory, limiting our understanding of the true magnitude of diversity within microbial communities. For example, the idea that amniotic fluid is sterile has been a fundamental tenet in obstetrics since the early 1900s. Nowadays, it is known that healthy amniotic fluid is not as sterile as previously thought. Findings like this one are only possible due to metagenomics studies, which offer a window on an enormous unknown world of microorganisms. Due to the presence of microbes in all walks of human life, there is a constant interaction between microbes and humans. Understanding human-associated communities, known as the human microbiome, is one of the major frontiers of research in human health. At this stage, metagenomics is an instrumental tool to close the genotype-phenotype gap in human disease.
Recent advances in DNA sequencing technologies have led to a huge cost reduction, making it possible and financially viable to sequence environmental samples. However, metagenomics still presents many challenges that come mainly from data analysis. DNA extracted from environmental samples is a mixture of genomes, which brings complexity to the assembly of the reads in order to obtain the complete genes, operons and especially individual genomes that make up the sample. In addition, the enormous amount of information collectively generated by metagenomics approaches poses new challenges for processing and data analysis. The petabyte and even exabyte scale of this 'big data' generation requires the introduction of advanced parallel computing and other high-performance computing (HPC) approaches, such as cloud infrastructures, strategies that effectively enable the exploitation of the data. In recent years, several computational tools have been developed to support the analysis of metagenomes in order to quantify community structure and diversity, assemble novel genomes, identify new taxa and genes, and determine which metabolic pathways are encoded in the community. In this talk we will go through the Human Microbiome Project, discussing the role of metagenomics in clinical diagnosis and presenting its computational challenges.

Presentations

A role for the carboxyl-terminal domain of RNA Pol II in pre-mRNA splicing
Joana Tavares, Noélia Custódio and M. Carmo Fonseca

In mammals, protein-encoding transcripts are formed by short exons separated by much longer introns, and it is still not fully understood how these small exons are correctly juxtaposed for splicing. According to a recent model, the carboxyl-terminal domain (CTD) of RNA polymerase II (RNA Pol II) serves as a platform that tethers the upstream exon until it is ligated to the downstream exon in the nascent pre-mRNA. To test this model, we generated cell lines with mutant versions of the CTD.
The transcriptome (polyA+ RNA) of cells containing either wild-type CTD, point mutations or deletions of the CTD was analyzed by RNA-seq. Genes that were either differentially expressed or differentially spliced were identified using two bioinformatics tools, MISO and MATS. The results show that distinct types of CTD mutations have specific effects on splicing profiles. We conclude that the composition of the CTD is critical for determining splicing decisions.

Genome-wide mapping of RNA Polymerase II CTD modifications with single-nucleotide resolution
Tomás Gomes, Ana Rita Grosso and Maria Carmo-Fonseca

RNA Polymerase II (Pol II) is responsible for the production of most RNA molecules by transcribing the information encoded in the DNA. The RNA is not readily used by the cell, being first subject to several processing events, such as 5' capping, splicing or 3' processing. There has been increasing evidence that these mechanisms also act during transcription, not just after the production of the full molecule. The Pol II C-terminal domain (CTD) coordinates interactions between the machinery of these processes and the polymerase itself by being subject to various modifications on its conserved repeats. However, the relationship between different modifications and co-transcriptional processing events is not fully understood. To approach this, the Proudfoot lab developed a new protocol for Native Elongating Transcript sequencing (NET-seq) that targets different human Pol II CTD isoforms, allowing them to be mapped in the genome with single-nucleotide resolution (Nojima, Gomes et al., 2015). Due to the novelty and specificities of this protocol, a workflow for analyzing these datasets was not well established. In this work, such a pipeline was defined, including critical steps to deal with partial adapter contamination and to acquire the Pol II single-nucleotide resolution mapping.
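The handling of partial adapter contamination mentioned above can be illustrated with a minimal sketch. It is not the pipeline used by the authors: it assumes exact matching only, and the function name, the `min_overlap` threshold and the adapter sequence are hypothetical placeholders chosen for illustration.

```python
# Hypothetical sketch: trim a full or partial 3' adapter from a read.
# The adapter sequence and threshold are illustrative placeholders,
# not values taken from the protocol described in the abstract.

def trim_partial_adapter(read, adapter, min_overlap=3):
    """Return the read with a full or partial 3' adapter removed."""
    # Case 1: the complete adapter occurs inside the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with only the first k bases of the adapter.
    # Try the longest possible overlap first.
    max_len = min(len(read), len(adapter) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter evidence: leave the read untouched.
    return read


ADAPTER = "TGGAATTCTCGG"  # placeholder sequence
trimmed = trim_partial_adapter("ACGTACGTTGGAA", ADAPTER)
```

Real trimming tools typically also tolerate mismatches and use base qualities; exact matching is used here only to keep the idea visible.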
Repetitive sequences, in particular from snRNA and snoRNA, were highly represented in the sequenced data, and had to be taken into consideration in further downstream analysis regarding average gene profiles for Pol II CTD isoforms. These profiles showed the previously described pausing of Serine 2-phosphorylated Pol II after the 3' end, as well as unphosphorylated Pol II at the TSS. Surprisingly, Serine 5-phosphorylated Pol II was not seen to accumulate at the TSS as expected. An approach based on the NET-seq data was also developed to determine which exons were spliced during transcription. Results show that co-transcriptional splicing is correlated with Serine 5 phosphorylation in the CTD repeats, an isoform that was not previously associated with the elongation phase of transcription.

Do Parasites Count the Time? RNA-sequencing the Circadian Transcriptome of T. brucei
Filipa Rijo Ferreira, Daniel Pinto Neves, Joseph S. Takahashi and Luisa M. Figueiredo

Circadian rhythms are cyclic biological processes with a period of approximately 24 hours that allow organisms to anticipate regular changes in their environment, such as the changes in light and temperature from day to night. These rhythms have been observed across all kingdoms of life, including mammals, bacteria, fungi and plants. Moreover, the divergent molecular mechanisms underlying these processes suggest their independent evolution in different kingdoms. African Trypanosomiasis is a major neglected tropical disease that affects both humans and animals, threatening over 60 million people in some regions of sub-Saharan Africa, with an incidence of 7,000 new cases per year. It is caused by Trypanosoma brucei, a unicellular protozoan parasite whose life cycle is entirely extracellular both in the mammalian host and in the insect vector. Upon the parasite’s invasion of the central nervous system, the disease causes an array of neurological disorders, including the disruption of the host’s sleep cycle.
In this work we used established protocols and algorithms in the circadian rhythm field to determine if the parasite itself has an endogenous, entrainable circadian rhythm. We used RNA-seq to obtain temporal transcriptome data of two life-cycle stages of the parasite in culture, using two different entrainment stimuli to synchronize their circadian clock. We found that, in free-running conditions, in the absence of external stimuli, approximately 12% of T. brucei genes are expressed in a time-dependent cyclic manner with a period of approximately 24 hours. Additionally, these genes cluster into two distinct phases of maximal expression. Functional annotation of these genes indicates their involvement in a wide range of cellular processes. Overall, these results strongly suggest the existence of a molecular clock regulating gene expression in T. brucei.

Detecting footprints of HIV-1 derived sncRNAs in small-RNA-seq data
Andreia Amaral, Paula Matoso, Rui Soares, Russel Foxall, Ana Sousa and Margarida Gama-Carvalho

It has become widely accepted that RNA viruses do not encode miRNAs, to avoid unproductive cleavage of their genomes or mRNAs [1]. However, a recent study demonstrated that a retrovirus, the bovine leukemia virus (BLV) [2], encodes a conserved cluster of miRNAs which are transcribed by pol III, escaping mRNA cleavage because only subgenomic pol III transcripts are processed into miRNAs. Similarly, we hypothesized that HIV-1 could potentially encode non-canonical miRNA-like molecules that cannot be predicted using miRNA prediction algorithms or have not yet been detected due to the lack of the appropriate experimental context. Therefore, in this study we sought to investigate whether HIV-1 derived small non-coding RNAs (sncRNAs) could be observed in stimulated CD4+ T-cells infected with HIV-1 in conditions mimicking physiological infection, using high-throughput sequencing.
As in previous studies, reads with high homology to the HIV-1 genome corresponded to 0.11% of the total dataset. However, this was obtained in physiological conditions of infection with ~10% of cells infected, meaning that the proportion of HIV-1 encoded sncRNAs in infected cells would be 1.1%, which is the highest proportion ever reported. We further hypothesized that by modeling the distribution of reads along the HIV-1 genome we could identify regions with an accumulation of reads higher than that expected if they were simply derived from RNA breakdown products. Using this genome-wide modeling approach we identified 6 putative HIV-1 encoded sncRNAs that are within the size range of effector miRNA molecules and display potential RNA hairpin structures. Furthermore, in silico targeting analysis revealed that these HIV-1 encoded sncRNAs may potentially target mRNAs involved in apoptosis, mRNA transport and mRNA export from the nucleus. These results lead us to speculate that the virus could be using these endogenous miRNA-like molecules to manipulate T cell differentiation and the export of RNAs from the nucleus, in particular to export the unspliced full-length viral RNA. The therapeutic implications of such a mechanism are intriguing, because targeting these viral miRNAs could constitute an antiviral therapy.

Recognition and Normalization of Biomedical Entities Based on Ontologies
André Leal, Bruno Martins and Francisco Couto

Clinical notes in textual form occur frequently in Electronic Health Records (EHRs). They are mainly used to describe treatment plans, symptoms, diagnostics, etc. Clinical notes are recorded in narrative language without any structured form and, since each medical professional uses different types of terminologies according to context and to their specialization, these notes are very challenging because of their complexity, heterogeneity and contextual sensitivity.
Forcing medical professionals to enter the information in a predefined structure simplifies the interpretation. However, the imposition of such a rigid structure not only increases the time needed to record data, but also introduces barriers to recording unusual cases. One possible solution consists in the application of text-mining techniques to the clinical texts, in order to support the recognition and normalization of medical concepts. Together, these techniques can result in correct and efficient information gathering by information systems. We developed an automated system for recognizing medical concepts (i.e., mentions of disorders) in clinical notes, which then also normalizes them with a UMLS concept unique identifier (CUI). This system was developed with the intention of overcoming some challenges presented by this task, such as the recognition of non-continuous entities and the normalization of ambiguous entities. For the recognition we use the novel SBIEON encoding, which contains a tag to specify words inside recognized entities that are not part of them. We also explore non-annotated clinical notes to generate a lower-dimensional representation of the word vocabulary, and therefore reduce data sparsity. Conditional Random Field (CRF) models were generated based on the mentioned features, among others, such as domain-specific lexicon, token shape, etc. For normalization we use a rule-based approach to normalize the recognized entities and we also take into consideration the information content of each entity for disambiguation. This system was used to participate in SemEval 2015 Task 14, achieving second place in the competition. For future work, we intend to explore semantic similarity between disorder mentions within individual clinical notes, to improve normalization results. This approach is based on the assumption that entities inside individual clinical notes should be related to each other.

Posters

Compound Matching of Biomedical Ontologies
Daniela Oliveira and Cátia Pesquita

Ontologies model the knowledge in a given domain using concepts, properties, and relations and are particularly successful in the life sciences. There are several biomedical ontologies that cover the same field or related fields and, to guarantee their interoperability, it is necessary to establish meaningful relationships between them. Ontology Matching techniques were developed to address this problem, since they take ontologies as input and determine a set of correspondences between semantically related entities of those ontologies, creating an alignment, which enables the knowledge and data expressed in the matched ontologies to interoperate. Compound matching algorithms can find matches between class or property expressions involving more than two ontologies, and thus can improve the integration of ontologies covering related domains. We define a compound mapping as the correspondence between a class of one source ontology and two classes of two different ontologies, which together are equivalent to the source. We are developing novel algorithms to establish compound mappings integrated into the AgreementMakerLight (AML) ontology matching system. In a preliminary strategy, we use a two-step approach based on lexical similarity that first aligns the source ontology with the first target and then matches the unmatched words of the source ontology labels to the second target ontology. We use a modified Jaccard index to calculate the confidence of the match, by comparing each word on every class label of both ontologies. Finally, the algorithm has a selection step, which selects the match with the highest similarity. To evaluate our strategy we used a set of seven reference alignments automatically created by inferring compound mappings from logical definitions in OBO ontologies.
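The two-step lexical strategy and the selection step described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard word-level Jaccard index (the abstract does not specify the modification applied in AML), and all function names and labels are hypothetical.

```python
# Illustrative sketch only: standard (unmodified) word-level Jaccard index.
# The labels and helper names are hypothetical, not taken from AML.

def word_set(label):
    """Lower-cased set of words in a class label."""
    return set(label.lower().split())


def jaccard(label_a, label_b):
    """Shared words divided by all distinct words of the two labels."""
    a, b = word_set(label_a), word_set(label_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(source_label, target_labels):
    """Selection step: keep the target label with the highest similarity."""
    return max(target_labels, key=lambda t: jaccard(source_label, t))


def unmatched_words(source_label, matched_label):
    """Words of the source label not covered by the first-step match."""
    return word_set(source_label) - word_set(matched_label)


source = "aortic valve stenosis"
first_target = ["aortic valve", "mitral valve"]    # hypothetical ontology 1
second_target = ["stenosis", "inflammation"]       # hypothetical ontology 2

step1 = best_match(source, first_target)
remaining = " ".join(sorted(unmatched_words(source, step1)))
step2 = best_match(remaining, second_target)
```

In this toy case the source class is covered by one class from each target ontology, which is the shape of a compound mapping as defined in the abstract.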
Preliminary results using this evaluation approach present a low F-measure; however, a manual inspection of the top mappings has revealed the incompleteness of the reference alignments. Future work will involve the manual evaluation of a portion of the generated mappings, to improve the coverage of the reference alignments, and the investigation of other algorithms suited to compound matching.

Data Mining Analysis of Lung Cancer Electronic Health Records
Ana Silva, Cátia Pesquita, Lisete Sousa, Alexandra Mayer and Ana Miranda

Lung cancer has one of the highest incidence and mortality rates in both genders worldwide. In Portugal, both rates are currently showing a growing trend. To improve our understanding of Portuguese lung cancer patients and their characteristics, we are mining the data collected by ROR-Sul – Registo Oncológico Regional do Sul. This organization brings together all public health institutions in Lisboa e Vale do Tejo, Alentejo, Algarve and Região Autónoma da Madeira. Its main task is to collect and process all records and to publish the results periodically. Our selected data set covers 950 cases of lung cancer which occurred during the first half of 2013. We selected a set of demographic and cancer characteristics variables from lung cancer patients. In a first step we performed data cleaning in R. We also created new variables, e.g., the age at diagnosis group variable, derived from the birth date and diagnosis date variables. Then we conducted a spatial analysis based on demographic variables, using Local Indicators of Spatial Association (LISA) based on Moran’s I, as available in an R package. This analysis allowed the identification of regions where the incidence rate differs from that of their neighbors. We found that the regions of Beja, Aljustrel, Serpa, Ourique, Mértola and Castro Verde have a higher incidence than the neighboring regions.
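As a rough illustration of the LISA statistic mentioned above, the following sketch computes the local Moran's I for each region using row-standardised neighbour weights. It is not the R package used in the study; the function name, the toy rates and the neighbour structure are assumptions, and the permutation-based significance testing that normally accompanies LISA is omitted.

```python
# Minimal sketch of local Moran's I with row-standardised weights.
# Toy data and neighbour lists are hypothetical; significance testing
# (usually done by permutation) is deliberately left out.

def local_morans_i(values, neighbours):
    """values: list of rates; neighbours: dict {index: [neighbour indices]}."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    m2 = sum(d * d for d in z) / n          # variance of the rates
    if m2 == 0:
        raise ValueError("all values are identical; the statistic is undefined")
    result = []
    for i in range(n):
        nbrs = neighbours.get(i, [])
        # Spatial lag: average deviation of the neighbouring regions.
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] * lag / m2)
    return result


# Four regions in a row: 0 - 1 - 2 - 3
rates = [10.0, 10.0, 2.0, 2.0]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
local_i = local_morans_i(rates, chain)
```

Positive values indicate a region whose rate resembles that of its neighbours, while negative values flag regions that differ from their surroundings.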
With this information we can plan the actions for prevention and early detection of lung cancer more efficiently and with lower costs. This work is still ongoing, and in the following steps we will explore the full breadth of available variables and other algorithms.

Detection of alternative miRNA Processing using the IsomIR_window
José Gil Lopes, Laura Do Souto, Paula Matoso, Rui Soares, Russel Foxall, Ana Sousa, Margarida Gama-Carvalho and Andreia J. Amaral

MicroRNAs (miRNAs) are small non-coding RNAs involved in post-transcriptional regulation of gene expression. IsomiRs have been described as miRNA variants that differ from the canonical miRNAs but derive from the same pre-miRNA, the precursor molecule. The most abundant classes of isomiRs are classified into three types: 5’, 3’ and internal isomiRs, which are the result of differential processing of the pre-miRNA by Dicer or of RNA editing. We have developed a PERL pipeline, the IsomIR_Window, which allows the complexity of miRNA biogenesis in Next Generation Sequencing (NGS) data to be assessed. We show the analysis of the profile of small noncoding RNAs in CD4+ T cells by NGS using the IsomiR_window. The study included two datasets. The first dataset comprised two experimental conditions, naive and activated CD4+ T cells, with no biological replicates. Each library was derived from an RNA pool generated from cells collected from nine healthy donors. The second dataset included smallRNA-seq libraries of activated CD4+ T cells obtained from healthy donors in three different experimental conditions: non-infected (N=3), HIV-1 infected (N=2) and HIV-2 infected (N=3). Each library was also derived from an RNA pool, this time from three individuals. Using as input the sequences of small noncoding RNAs (sncRNAs) and their frequencies in the data, the IsomIR_window makes an automated search of each sequence in a database of pre-miRNAs and coordinates of canonical miRNAs.
The IsomiR_window retrieves the ID of the pre-miRNA and classifies the isomiR. Results from both the first and second datasets show that isomiRs were two times more frequent than canonical miRNAs. Furthermore, the most abundant isomiRs displayed expression equal to or greater than that of the corresponding canonical miRNAs. Finally, differentially expressed isomiRs, with significant fold changes and numbers of reads, were found between the naive and stimulated conditions, as well as when comparing healthy with infected cells. These results show that isomiRs play an important role in T cells. Finally, although activation and infection both led to differential isomiR expression, the effect of T-cell activation on differential miRNA processing seems to be stronger.

Genetics of Familial Paget’s Disease of Bone
Patrícia Santos, Inês Sousa, Vânia Francisco, Joana Xavier, José Patto, Filipe Barcelos and Sofia Oliveira

Paget's disease of bone (PDB) is a systemic disease characterized by increased bone resorption and formation, causing gradual destruction of parts of the skeleton and subsequent reconstruction of a more fragile bone. PDB has an overall incidence of 2% in the population over 55 years. PDB is a complex disease with multiple genes implicated in its pathogenesis, but in its monogenic form, only one gene (SQSTM1) has been linked to PDB. To identify novel genes causing familial PDB, we performed whole exome sequencing (WES) in six individuals from a Portuguese multiplex family composed of five PDB cases, two unaffected individuals and one individual with unclear diagnosis. Given the uncertain diagnosis for one family member, we conducted two analyses: model 1, in which this individual is considered affected, and model 2, in which he is considered unaffected. DNA was captured using the SureSelect Target Enrichment System kit and sequenced using a HiSeq 2000 (Illumina’s Solexa).
We identified three variants (c.C4786T (KIAA1875), c.C53T (NLRC3) and c.T566C (SRL)) in model 1 and one variant (c.G180A (SERINC2)) in model 2 that were present in all affected and absent from the unaffected individuals in the next-generation sequencing (NGS) data. Validation of these mutations by Sanger sequencing in all family members revealed that all model 1 mutations were present in all individuals, while the model 2 mutation was present in all family members except the individual with unclear diagnosis. None of these variants were present in a second Portuguese PDB multiplex family. In conclusion, our findings support the notion that bioinformatics analysis of NGS data is a process requiring optimization. We found four novel variants which may cause PDB in this family with an autosomal dominant pattern of inheritance and incomplete penetrance. Further studies in other PDB families are warranted to determine the pathogenic potential of these genes/variants.

Integrated in silico analysis of the neuronal transcriptome of SMA disease models: untangling the gene regulatory networks underlying motor neuron degeneration at the cell- and systems-level
Hugo A F Santos, Andreia Amaral, Takakazu Yokokura, David Van Vactor and Margarida Gama-Carvalho

Spinal Muscular Atrophy (SMA), a lethal inherited neurodegenerative disorder, is characterized by low levels of the Survival of Motor Neuron (Smn) protein, which is essential for the assembly of spliceosomal small nuclear ribonucleoproteins (snRNPs). Strikingly, low levels of this ubiquitous protein mainly affect motor neurons (MNs), disrupting neuromuscular junctions (NMJs) and leading to MN degeneration. Despite robust knowledge of SMA’s genetics, the exact molecular mechanisms underlying the disease’s phenotype remain largely elusive, preventing the development of rational therapeutics.
One possibility is that low levels of Smn have a higher impact on the expression and splicing of genes critical for MN function and survival, or that these cells are intrinsically more sensitive to global changes in RNA processing. Alternatively, Smn may be involved in MN-specific functions. Possibly both hypotheses are applicable. To address the relevance of Smn-dependent changes in neuronal gene expression, we performed RNA-seq to obtain an unbiased profile of the central nervous system transcriptome of a Drosophila melanogaster SMA disease model. Upon SMN down-regulation we observe changes in exon usage in a particular subset of genes crucial for neuronal development, viability and NMJ function. This suggests that SMN-dependent changes in the splicing machinery do not have widespread effects, but instead affect specific genes, possibly due to the existence of certain features in their sequence or structure. Interestingly, a large proportion of the identified genes with altered splicing are known genetic modifiers of the NMJ phenotype in SMA fly models, thereby supporting the biological relevance of our data. By further assessing the significance of the associated cellular functions and pathways in which the identified genes are involved, we aim to generate and test hypotheses regarding their potential contribution to the establishment of the SMA phenotype.

Mining cardiac side-effects for known drugs
Joana Barros and André Falcão

Despite population growth, the drug research and development (R&D) process has remained roughly unaltered since 1960 and is not well suited to today’s requirements. On average, the probability of a compound passing all the R&D stages and achieving commercialization is estimated at only 16%. In silico methods are commonly used as an alternative research method in the drug development process to help reduce its cost and duration. One important and unintentional drug target is the hERG protein.
This ion channel is responsible for mediating the rapidly activating component of the delayed rectifying potassium current in the heart (IKr), making it an important component of normal heart function. Its inhibition can lead to fatal cardiac arrhythmias, making it an important study case. It is estimated that between 40% and 70% of all new drug-like molecules affect hERG. This research aims to develop a prediction tool to identify, in the early development stages, potential drug candidates that inhibit the hERG channel. Using Quantitative Structure-Activity Relationship (QSAR) methods we built computational models to find a significant relation between molecular properties and compound bioactivity. These models were built using molecular descriptors from several public chemical packages (e.g., RDKit, CDK and E-Dragon) as well as molecular fingerprints. Since it was expected that not all molecular descriptors were necessary to build the prediction model, we experimented with various methods of variable reduction. Several methodologies were used for variable selection, namely Principal Components Analysis, Elastic Nets, Random Forests and Linear Regression. Of those, the latter method provided the best and most reliable results, which were then used in a stepwise process in a Support Vector Machine model. This approach was applied to each set of variables individually and to different data set combinations. The preliminary results obtained from simple cross-validation show that models using the RDKit descriptor set coupled with fingerprints produced the best results, reaching over 61% of explained variance. To further improve the prediction model we plan to implement different molecular similarity descriptors, obtained using NAMS, and also to develop a free online prediction tool for public use.
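The cross-validated "explained variance" figure reported above can be made concrete with a small sketch. Two assumptions are made loudly here: explained variance is taken to be the coefficient of determination computed on held-out predictions, and a one-descriptor least-squares line stands in for the Support Vector Machine model, purely to keep the example self-contained. All names and data are hypothetical.

```python
# Sketch under assumptions: "explained variance" is computed as
# 1 - SS_res / SS_tot on cross-validated predictions, and a simple
# one-descriptor linear fit replaces the SVM used in the abstract.

def explained_variance(observed, predicted):
    """Fraction of the variance in `observed` accounted for by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for a single descriptor (stand-in model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cross_validated_predictions(xs, ys, k=3):
    """Predict every compound from a model trained without its fold."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        for i in test_idx:
            preds[i] = slope * xs[i] + intercept
    return preds


descriptor = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical descriptor
activity = [2.0 * x + 1.0 for x in descriptor]   # hypothetical bioactivity
q2 = explained_variance(activity,
                        cross_validated_predictions(descriptor, activity))
```

Because every prediction comes from a model that never saw the compound in question, the resulting figure is a more conservative estimate than the fit on the training data.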

Modelling drug response for closely related CNS GPCR proteins
Vânia Ferreira and André Falcão

Drug development is a complex and expensive process and one of the hottest fields of modern science. The complexity of the field and the high cost of screening new compounds led to the development of in silico models for the identification of new pharmacologically active compounds. Nonetheless, a new drug that has been predicted to be active is often prone to secondary effects, as most molecules can bind to more than one target. The conservation of a molecule's binding affinity between different receptors can be a starting point to understand which molecular structural properties have a more relevant role in receptor binding. With this information, it is possible to narrow the range of candidate molecules for drug discovery, making the first steps of selecting new potential drugs for testing a faster and less expensive process. In this work we analyzed different bioactivity patterns for several molecules that are known to bind to different CNS G-protein-coupled receptor (GPCR) proteins, namely different serotonin and dopamine receptors in Homo sapiens and Rattus norvegicus. We used a Neighbor-Joining phylogenetic tree and a sequence identity matrix to first identify the evolutionary relation between all the receptors and then to select the most structurally conserved pairs among them. Subsequently, the bioactivity values (Ki) for the binding molecules of these receptors were collected from ChEMBL. In particular, we wanted to verify how sequence similarity impacts drug response for closely related proteins, namely dopamine and serotonin receptors. In total we tested 19 different receptors, compared pairwise, and their binding affinities for the same molecules. Contrary to what was expected, we found no significant relationship between sequence similarity and binding affinities.
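The pairwise comparison of binding affinities for the same molecules can be sketched as below. The abstract does not state which statistic was used, so the Pearson correlation on log-transformed Ki values (pKi, assuming Ki in nM) is an assumption, and the receptor data and function names are invented for illustration.

```python
import math

# Hypothetical sketch: correlate affinities of two receptors over the
# molecules they share. The pKi transform (Ki assumed to be in nM) and
# the use of Pearson correlation are assumptions, not the authors' method.

def shared_pki(ki_a, ki_b):
    """Return paired pKi lists for molecules measured on both receptors."""
    shared = sorted(set(ki_a) & set(ki_b))
    pki_a = [9.0 - math.log10(ki_a[m]) for m in shared]
    pki_b = [9.0 - math.log10(ki_b[m]) for m in shared]
    return pki_a, pki_b


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented Ki values (nM) for two receptors
receptor_a = {"mol1": 1.0, "mol2": 10.0, "mol3": 100.0, "mol4": 5.0}
receptor_b = {"mol1": 10.0, "mol2": 100.0, "mol3": 1000.0, "mol5": 3.0}

pki_a, pki_b = shared_pki(receptor_a, receptor_b)
r = pearson(pki_a, pki_b)
```

Repeating this for every receptor pair yields a matrix of affinity correlations that can then be set against the sequence identity matrix.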
Nonetheless, several highly significant binding relationships between different receptors emerged. These patterns are apparently related neither to sequence similarity nor to the primary binding target of those receptors. Notably, significant relationships were identified between dopamine and serotonin receptors (e.g. 5HT2c and D1). Moreover, we searched for a bioactivity pattern between the same receptors from Homo sapiens and Rattus norvegicus, and a strong relation became evident between the bioactivity levels of their molecules.

Resist: An Intelligent System to Predict Antibiotic Resistance

João Nascimento and Cátia Pesquita

Recent advances in technology and computational power, together with the expanding use of electronic health records, have opened new avenues of research that explore the information in these records to improve healthcare, namely in diagnosis and therapeutic prescriptions. One increasingly relevant public health concern is antibiotic resistance. This phenomenon happens when some sub-populations of a microorganism survive exposure to antibiotics, becoming more difficult to control. The World Health Organization has already stated that unless the growing trend of antibiotic resistance is reduced, we are heading towards a post-antibiotic era, in which the death rate from common infections will rise due to the expected failure of standard medical treatments. This project's goal is to investigate whether it is possible to develop supervised learning models able to classify patients according to their antibiotic resistance risk, using the information that is usually collected at the clinical and laboratory level and stored in electronic health records in Portuguese hospitals. We are interested in investigating the potential of variables such as time of the year, geographical location and demographics, as well as suspected infection location.
After pre-processing the data using data cleaning, standardization and transformation techniques, we are now devising and applying machine-learning-based strategies to train a model for antibiotic resistance prediction at the patient level. The ability to successfully predict antibiotic resistance risk can have a significant impact worldwide, because it can help clinicians select appropriate antibiotics. This can help reduce antibiotic resistance levels, improve patient treatment, and ultimately decrease health care costs.

VetBDI – Development of an Integrated Database in Veterinary Medicine

Ricardo Faustino, Daniel Simões, Renata Neves, Daniel Teixeira, Fredy Pinheiro, Micael Faustino and Liliana Marques

The importance of bioinformatics databases for clinical decision making has been steadily increasing for decades. In order to develop new biomarker and statistical correlation studies, we are working to improve an integrated database in veterinary medicine using biochemistry and medical imaging data. However, imaging analyses depend directly on the expertise level of veterinarians, sonographers or other imaging specialists. This is a very important aspect because the results, in some cases, are very subjective or unspecified. To solve this problem we are using the statistical correlation between biochemistry and medical imaging information (Pierroti M.P.- S.R.L – X-Ray Unit). Clinical chemistry data are decisive for evaluating altered organ function in animals. Blood samples were taken and analysed for electrolytes, substrates, metabolites and enzymes. All data were obtained using Mindray BC-2800Vet medical devices, which are employed in most veterinary clinics. We used samples from two different species, cats and dogs. The biochemical parameters used are: GLU-PS, TP-PS, ALB-PS, GPT-PS, GOT-PS, AMYL-PS, BUN-PS, CRE-PS, Lymph#, Mon#, Gran#, Lymph%, Mon%, Gran%, Eos%, and histograms for WBC, RBC and PLT.
We also performed multiple quantifications of discrete imaging findings using a Java-based image-processing program. An important component of the discovery, characterization, validation and application of biomarkers is the extraction of information and meaning from images through image processing and subsequent analysis. Associations between these changes and disease state can be analysed using classifiers such as support vector machines (SVM). In conclusion, we are creating an integrated database in veterinary medicine to develop new biomarkers. Data obtained in this process will be cross-referenced with biochemistry and imaging data, which will produce accurate, reproducible and feasible information over time.

Visualizing the incoherencies in Bioportal

Catarina Martins, Catia Pesquita, Ernesto Jimenez-Ruiz and Emanuel Santos

BioPortal is a web portal that provides access to a large number of biomedical ontologies, in OBO or OWL format, and to the mappings between them. The mappings are generated automatically or added manually by experts. However, those mappings are sometimes incompatible with each other and can lead to conflicts in the alignments, due to erroneous mappings or even incompatibilities between the ontologies. Therefore, it is important not only to find the conflicts between the ontologies but also to find an intuitive way of identifying them. To address this problem, the repair algorithms of AgreementMakerLight (AML) and LogMap, two ontology matching tools, were applied to 19 pairs of ontologies from BioPortal, revealing that 11 of the 19 pairs had logical errors involving, on average, 22% of the mappings. A visualization tool to identify the incoherencies between the ontologies would be very helpful to the scientific community, allowing a more intuitive way of analyzing possibly conflicting mappings. We present the preliminary version of a web tool that supports the visualization of conflicting mappings and their context.
The backend of the tool includes a database with the necessary ontology data and mappings, as well as the conflict set data precomputed using AML and LogMap. The frontend allows users to select a mapping from a list and then access the information about the associated conflicts in two different formats: as a graph that shows the conflicting mappings and the ontology axioms behind the conflict, or as a table that lists the different sets of conflicting mappings. Future work will include extending the tool to let users manually solve conflicts and export the repaired alignments. We will also link our tool directly to BioPortal, to support access to all the ontologies and mappings it contains.
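The lookup behind the table view can be pictured with the minimal sketch below, which indexes precomputed conflict sets by mapping so that selecting a mapping returns every set it takes part in. The tool's actual database schema is not described in the abstract. The data layout, the function names and the entity identifiers are hypothetical placeholders, not real BioPortal terms.

```python
from collections import defaultdict

# A mapping links an entity in one ontology to an entity in another.
# Each conflict set is a group of mappings that cannot all hold together.
conflict_sets = [
    frozenset({("A:1", "B:7"), ("A:2", "B:9")}),
    frozenset({("A:1", "B:7"), ("A:3", "B:4"), ("A:5", "B:4")}),
    frozenset({("A:8", "B:2"), ("A:9", "B:2")}),
]

def index_conflicts(sets):
    """Build a lookup from each mapping to the conflict sets containing it."""
    index = defaultdict(list)
    for cs in sets:
        for mapping in cs:
            index[mapping].append(cs)
    return index

def conflicts_for(index, mapping):
    """One row per conflict set, listing the other mappings involved."""
    return [sorted(cs - {mapping}) for cs in index.get(mapping, [])]

index = index_conflicts(conflict_sets)
for row in conflicts_for(index, ("A:1", "B:7")):
    print(row)
```

Precomputing the index keeps the selection step a simple dictionary lookup, in line with the abstract's choice to store conflict sets computed ahead of time.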