Lecture 4: Precision Medicine, Disease Trajectories and Big Data

Precision medicine, disease trajectories and big data
"Big data" for health
Søren Brunak
Center for Biological Sequence Analysis
Technical University of Denmark
[email protected]
&
Novo Nordisk Foundation Center for Protein Research
University of Copenhagen
www.cpr.ku.dk
Precision medicine
[Slide background: a block of genomic DNA sequence (A, C, G, T), omitted here]
From molecular data to diseases ...
... from diseases to molecular data
biomedbridges.eu
The "disease-ome" versus a single disease
Correlations between diseases across the full disease spectrum
Disease trajectories versus "destination"
Adverse drug reaction trajectories and their genetic relationships
The Danish National Patient Registry (6.2 million Danes)
ICD-10 diagnoses as a function of age
[Figure labels: In, Out, IR; Females, Males]
COPD trajectory network
Data from 6.2 million Danes
Jensen et al., Nature Comm., 2014
Diabetes trajectory network
Jensen et al., Nature Comm., 2014
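As a rough illustration of the trajectory idea on these slides, the sketch below counts, across patient histories, how often one diagnosis is first recorded before another, and keeps the directed pairs seen in at least a minimum number of patients. It is a minimal sketch under stated assumptions, not the method of Jensen et al.: the function name `directed_pair_counts`, the toy histories and the threshold `min_patients` are all hypothetical, and a real analysis would also need a statistical test of each temporal association.

```python
from collections import Counter
from itertools import combinations

def directed_pair_counts(histories, min_patients=2):
    """Count patients in which diagnosis a is first recorded before diagnosis b."""
    counts = Counter()
    for events in histories.values():
        # First date on which each code was recorded for this patient.
        first_seen = {}
        for date, code in sorted(events):
            first_seen.setdefault(code, date)
        # Compare every unordered pair of codes once and record its direction.
        for a, b in combinations(sorted(first_seen), 2):
            if first_seen[a] < first_seen[b]:
                counts[(a, b)] += 1
            elif first_seen[b] < first_seen[a]:
                counts[(b, a)] += 1
    # Keep only directed pairs observed in enough patients.
    return {pair: n for pair, n in counts.items() if n >= min_patients}

# Hypothetical toy histories: (ISO date, ICD-10 code) per patient.
toy_histories = {
    "p1": [("2001-03-01", "E11"), ("2004-06-10", "I10"), ("2009-01-05", "N18")],
    "p2": [("2002-02-02", "E11"), ("2003-09-09", "I10")],
    "p3": [("2005-05-05", "I10"), ("2006-06-06", "E11")],
}
pairs = directed_pair_counts(toy_histories)
```

Chaining the surviving directed pairs end to end is one simple way to assemble longer trajectories and, from those, a network.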
Registries versus complete patient records versus questionnaires
1 zettabyte = 10^21 bytes = 1 trillion (10^12) gigabytes
A more complete patient profile: ICD-10 codes from "text mining"
[Figure labels: F20, F200; Negation; Family-related]
Jensen et al., Nature Rev. Genet. 2012
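To make the idea of mining ICD-10 codes from free text concrete, here is a minimal sketch that maps disease terms to codes and flags mentions that are negated or that refer to a family member, mirroring the "Negation" and "Family-related" labels on the slide. Everything in it is an assumption for illustration: the tiny English lexicon, the cue lists and the function name `mine_codes` are hypothetical and are not the dictionaries or rules used in the cited work, which operated on Danish record text.

```python
import re

# Hypothetical toy lexicon and cue lists, invented for this example.
TERM_TO_ICD10 = {"schizophrenia": "F20", "paranoid schizophrenia": "F200"}
NEGATION_CUES = ("no ", "not ", "denies ", "without ")
FAMILY_CUES = ("mother", "father", "sister", "brother", "family history")

def mine_codes(note):
    """Return (code, status) pairs; status is 'patient', 'negated' or 'family'."""
    results = []
    for sentence in re.split(r"[.!?]\s*", note.lower()):
        # Try longer terms first so the most specific code wins.
        for term in sorted(TERM_TO_ICD10, key=len, reverse=True):
            if term in sentence:
                # Only cues that appear before the term are considered.
                before = sentence[: sentence.index(term)]
                if any(cue in before for cue in FAMILY_CUES):
                    status = "family"
                elif any(cue in before for cue in NEGATION_CUES):
                    status = "negated"
                else:
                    status = "patient"
                results.append((TERM_TO_ICD10[term], status))
                break  # at most one code per sentence in this toy version
    return results

note = ("Patient has paranoid schizophrenia. Mother had schizophrenia. "
        "No signs of schizophrenia today.")
mined = mine_codes(note)
```

Plain substring cues like these are crude (for instance, "no " would also match inside other words), so this only shows the shape of the problem: a mined code is useful for a patient profile only once negated and family-related mentions have been separated out.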
Structured ICD-10 codes versus text analysis
[Venn diagram: Assigned Codes versus Mined Codes; region counts 4947, 3825 and 32626]
Patient stratification across an entire hospital
[Figure labels: Patient 1, Patient 2, Patient 3]
Roque et al. PLoS Comp. Biol. 2011,
Jensen et al., Nature Rev. Genet. 2012
Adverse drug reactions:
We have only a few in the databases
Text mining of drug names, "events", diagnoses, …
Text mining of adverse drug reactions in patient record text
(7,500 drugs and 21,000 adverse drug reactions)
Eriksson et al. Drug Safety 2014
Relationship between dose and adverse drug reactions
ADRs and doses are normalized to multiples of the minimum prescribed dose of each drug.
The plot shows data for 21 days of steady dosage. The sample average slope is 0.1105 (95% CI, 0.03085-0.1901), and the p-value for a non-zero slope is 0.0074.
All individual drug slopes are positive except for haloperidol.
Eriksson et al. Drug Safety 2014
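The caption above describes two steps: expressing each drug's doses as multiples of its minimum prescribed dose, and fitting a slope of ADRs against dose. The sketch below shows one way those steps could look for a single drug, using ordinary least squares. It is an illustration under stated assumptions, not the analysis of Eriksson et al.: the helper names and the toy numbers are invented, and only the doses are normalized here, whereas the caption states that the ADRs were normalized as well.

```python
def normalize_doses(doses):
    """Express doses as multiples of the minimum prescribed dose."""
    smallest = min(doses)
    return [d / smallest for d in doses]

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical toy numbers for one drug (doses in mg, ADR counts), invented for illustration.
doses_mg = [5, 10, 15, 20]
adr_counts = [1, 2, 2, 3]

relative_doses = normalize_doses(doses_mg)
slope = least_squares_slope(relative_doses, adr_counts)
```

A positive slope in this setting means that more ADRs are recorded at higher multiples of the minimum dose; repeating the fit per drug and averaging the slopes would give a summary figure of the kind quoted in the caption.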
We are increasingly becoming intensive-care patients at the data level
Aggregation of time scales
Sepsis survival across patients' prior history
P_alive/P_dead
30 days after A41
"Big data" on very large patient cohorts
(110 million patients, registry data)
Blair et al., Cell 2013
∆ big data
• Health data:
– Redefine phenotypes
– Handle noise better
– Handle lifelong data capture
– "Live data" versus data dumps versus registries
• Include what is not in the patient record, in new ways
– Nutrition, income, education, …
– Confounding factors, ethnic factors,