Precision medicine, disease trajectories and big data
“Big data” for health

Søren Brunak
Center for Biological Sequence Analysis, Technical University of Denmark
[email protected]
&
Novo Nordisk Foundation Center for Protein Research, University of Copenhagen
www.cpr.ku.dk

Precision medicine
[Slide graphic: a full page of raw DNA sequence (A, C, G, T)]

From molecular data to diseases ... ...
from diseases to molecular data
biomedbridges.eu

The disease-ome versus a single disease
Correlations between diseases across the full disease spectrum
Disease trajectory versus “destination”
Adverse-event trajectories and their genetic relationships

The Danish National Patient Registry (Landspatientregistret; 6.2 M Danes)
ICD-10 diagnoses as a function of age
[Figure labels: In, Out, IR; Females, Males]

COPD trajectory network, data from 6.2 M Danes
Jensen et al., Nature Comm., 2014

Diabetes trajectory network
Jensen et al., Nature Comm., 2014

Registries versus complete patient records versus questionnaires
1 zettabyte = 1 trillion gigabytes (10^21 bytes)

A more complete patient profile: ICD-10 codes from “text mining”
[Figure labels: F20, Negation, F200, Family-related]
Jensen et al., Nature Rev. Genet. 2012

Structured ICD-10 codes versus text analysis
[Venn diagram: Assigned Codes and Mined Codes; counts shown: 4947, 3825, 32626]

Patient stratification across an entire hospital
[Figure labels: Patient 1, Patient 2, Patient 3]
Roque et al. PLoS Comp. Biol. 2011, Jensen et al., Nature Rev. Genet. 2012

Adverse drug reactions: we have only a few in the databases
Text mining of drug names, “events”, diagnoses, …

Text mining of adverse drug reactions in clinical record text (7,500 drugs and 21,000 adverse drug reactions)
Eriksson et al. Drug Safety 2014

Relationship between dose and adverse drug reactions
ADRs and doses are normalized to multiples of the minimum prescribed dose of each drug. The plot shows data for 21 days of steady dosage. The sample average slope is 0.1105 (95% CI, 0.03085 to 0.1901), the p-value for a non-zero slope is 0.0074, and all individual drug slopes are positive except for haloperidol.
Eriksson et al. Drug Safety 2014

At the data level, we are all increasingly becoming intensive-care patients
Aggregation of time scales

Sepsis survival across prior disease history
P_alive/P_dead 30 days after A41

“Big data” on very large patient cohorts (110 M patients, registry data)
Blair et al., Cell 2013

∆ big data
• Health data:
– Redefine phenotypes
– Handle noise better
– Handle lifelong data capture
– “Live data” versus data dumps versus registries
• Include what is not in the patient record in new ways
– Nutrition, income, education, …
– Biasing factors, ethnic factors,