De novo comparison of huge metagenomics experiments coming from NGS technologies: application on the Tara Oceans project
Nicolas MAILLET, October 7th 2014
Work carried out under the supervision of Dominique LAVENIER & Pierre PETERLONGO
Credit: Sauveur D.

1. Introduction
Comparative metagenomics: what do we want? How can two samples be compared?
Possible approaches: mapping sequences on current knowledge, de novo assembly, or de novo comparative metagenomics.
De novo comparative metagenomics: quantify the similarity between two metagenomic datasets at the read level, i.e. find similar sequences between the two datasets, with a highly efficient approach that scales to huge metagenomic datasets (100 to 500 million reads).

2. Methodology
But how should "similarity" be defined?
A rough but efficient notion of "similar sequences": given integers k and t, two sequences s1 and s2 are said to be similar if and only if they share at least t non-overlapping k-mers.
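The similarity definition above can be made concrete with a small sketch. The function names are hypothetical and this is not the Compareads implementation; it assumes that "non-overlapping" is enforced on the s1 side by scanning s1 from left to right and jumping k positions after every shared k-mer. Because all k-mers have the same length, this greedy scan finds the largest possible number of non-overlapping shared k-mers in s1.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers (substrings of length k) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_non_overlapping_kmers(s1: str, s2: str, k: int) -> int:
    """Count k-mers of s1 that also occur in s2, taken greedily from left
    to right so that no two counted k-mers overlap in s1 (assumption)."""
    index = kmer_set(s2, k)
    count = 0
    i = 0
    while i + k <= len(s1):
        if s1[i:i + k] in index:
            count += 1
            i += k  # skip past this k-mer so the next one cannot overlap it
        else:
            i += 1
    return count


def are_similar(s1: str, s2: str, k: int, t: int) -> bool:
    """s1 and s2 are similar iff they share at least t non-overlapping k-mers."""
    return shared_non_overlapping_kmers(s1, s2, k) >= t


print(shared_non_overlapping_kmers("ACGTACGTAAAA", "CCACGTACGTCC", k=4))  # 2
print(are_similar("ACGTACGTAAAA", "CCACGTACGTCC", k=4, t=2))  # True
# The three overlapping occurrences of AAAA in AAAAAA count only once.
print(shared_non_overlapping_kmers("AAAAAA", "AAAA", k=4))  # 1
```

The threshold t trades sensitivity for specificity: with t = 1 a single shared k-mer is enough, while larger values require longer shared regions. An in-memory Python set is only meant for illustration and would not scale to datasets of hundreds of millions of reads.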
Based on this definition of similarity (shared non-overlapping k-mers), the Compareads software was developed:
Nicolas Maillet, Claire Lemaitre, Rayan Chikhi, Dominique Lavenier and Pierre Peterlongo, "Compareads: comparing huge metagenomic experiments", 2012.

3. Some results
Tara Oceans expedition, in collaboration with Thomas Vannier & Olivier Jaillon (Genoscope).
Size fractions: 0.8 to 5 μm and 180 to 2000 μm.

4. New methodology
Compareads enhanced: COMMET
Direct intersection of multiple samples.
Logical operations to combine comparison results.
2x faster on small datasets, and even faster on huge datasets.
Storage footprint divided by a factor of 100 using bit vectors.
Example subsets obtained by combining the comparisons of three datasets A, B and C:
Blue subset: reads from A not similar to any read from B.
Green subset: reads from A similar to at least one read from B and to at least one read from C.
Orange subset: reads from B similar to at least one read from A or to at least one read from C.
Red subset: reads from C similar to at least one read from A, but not similar to any read from B.
Nicolas Maillet, Guillaume Collet, Thomas Vannier, Dominique Lavenier and Pierre Peterlongo, "COMMET: comparing and combining multiple metagenomic datasets", in press.

5. Some results
COMMET new outputs: dendrogram and heatmaps representing the similarity between samples.
Metasoil study (T. O. Delmont et al., "Structure, fluctuation and magnitude of a natural grassland soil metagenome", 2012): original study compared with the COMMET output, which was obtained 3.5x faster.

Thank you for your attention!
http://github.com/pierrepeterlongo/commet
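The COMMET subsets (blue, green, orange and red) come from combining comparison results with logical operations on bit vectors. The sketch below illustrates that idea under a hypothetical layout that is not COMMET's actual file format: each comparison of a query dataset against a reference dataset gives one bit per query read, set when that read is similar to at least one reference read. The function names and the comparison results are made up. Python integers serve as bit vectors here.

```python
# Hypothetical layout: bit i of a vector is set iff read i of the query
# dataset is similar to at least one read of the reference dataset.

def pack(flags: list[bool]) -> int:
    """Pack one boolean per read into a Python int used as a bit vector."""
    vector = 0
    for i, flag in enumerate(flags):
        if flag:
            vector |= 1 << i
    return vector


def complement(vector: int, n_reads: int) -> int:
    """Logical NOT, restricted to the n_reads bits that correspond to reads."""
    return ~vector & ((1 << n_reads) - 1)


def selected_reads(vector: int, n_reads: int) -> list[int]:
    """Indices of the reads whose bit is set."""
    return [i for i in range(n_reads) if (vector >> i) & 1]


# Made-up comparison results: A has 5 reads, B and C have 4 reads each.
a_vs_b = pack([True, False, True, False, False])
a_vs_c = pack([True, True, False, False, True])
b_vs_a = pack([False, True, False, False])
b_vs_c = pack([False, False, False, True])
c_vs_a = pack([True, True, False, True])
c_vs_b = pack([False, True, False, False])

blue = complement(a_vs_b, 5)          # reads of A not similar to any read of B
green = a_vs_b & a_vs_c               # reads of A similar to B and to C
orange = b_vs_a | b_vs_c              # reads of B similar to A or to C
red = c_vs_a & complement(c_vs_b, 4)  # reads of C similar to A but not to B

print(selected_reads(blue, 5))    # [1, 3, 4]
print(selected_reads(green, 5))   # [0]
print(selected_reads(orange, 4))  # [1, 3]
print(selected_reads(red, 4))     # [0, 3]
```

Storing one bit per read, instead of the read sequences themselves, keeps each comparison result compact. Any new subset can then be computed from existing results with AND, OR and NOT alone, with no new sequence comparison. A tool working on hundreds of millions of reads would use packed fixed-size machine words rather than Python integers.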