GENOMIC STUDIES OF THE NORMAL AND MALIGNANT NEURAL CREST
by
Olena Morozova
B.Sc. (Hons), University of Toronto, 2006
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES
(Bioinformatics)
THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
June 2012
© Olena Morozova, 2012

Abstract

Neuroblastoma (NBL) is an enigmatic pediatric tumor of the sympathetic nervous system that is lethal in most children who are diagnosed with metastatic disease when older than 18 months. NBL is thought to originate from a differentiation arrest of the neural crest, a vertebrate-specific cell lineage with one of the most diverse developmental potentials. Genomic studies of NBL have contributed to the development of new diagnostic and prognostic markers. In addition, somatic and germline mutations in the ALK oncogene have been identified and are being targeted clinically. Based on this prior work, two hypotheses were developed and addressed in this thesis: (1) characterization of NBL with higher-resolution genomic technologies will lead to the identification of novel loci that contribute to the disease, and (2) analysis of the transcriptome of normal neural crest cells will help identify loci of relevance to NBL. To address these hypotheses, I analyzed several datasets generated from microarray, RNA sequencing and DNA sequencing experiments. Two key results emerged from this analysis: the putative role of the BRCA1/BARD1 pathway in the development of NBL, and the heterogeneity of the genetic landscape of primary NBL tumors. Potential translational avenues for these results include the exploration of AURKB and MAPK inhibitors as therapeutic agents for NBL.

Preface

Portions of Chapter 1 have been published: O. Morozova, M. Hirst, M.A. Marra. Applications of new sequencing technologies for transcriptome analysis. Annu. Rev. Genomics Hum. Genet. 10:135-51, 2009. Copyright by Annual Reviews; O. Morozova and M.A. Marra. Applications of next-generation sequencing technologies in functional genomics. Genomics 92(5), 2008. Copyright by Elsevier; O. Morozova and M.A. Marra. From cytogenetics to next-generation sequencing technologies: advances in the detection of genome rearrangements in tumors. Biochem. Cell Biol. 86(2):81-91, 2008. Copyright by Canadian Science Publishing. I wrote most of the text for these review manuscripts with guidance and input from my supervisor, M.A. Marra, and the co-author, M. Hirst.

Portions of Chapter 2 have been published in three manuscripts: H. Jinno, O. Morozova, K.L. Jones, J.A. Biernaskie, M. Paris, R. Hosokawa, M.A. Rudnicki, Y. Chai, F. Rossi, M.A. Marra, F.D. Miller. Convergent genesis of an adult neural crest-like dermal stem cell from distinct developmental origins. Stem Cells 28(11):2027-40, 2010. Copyright by AlphaMed Press; M.D. O’Connor, E. Wederell, G. Robertson, A. Delaney, O. Morozova, S.S. Poon, D. Yap, J. Fee, Y. Zhao, H. McDonald, T. Zeng, M. Hirst, M.A. Marra, S.A. Aparicio, C.J. Eaves. Retinoblastoma-binding proteins 4 and 9 are important for human pluripotent stem cell maintenance. Exp. Hematol. 39(8):866-79.e1; 2011. Copyright by Elsevier; O. Morozova, V. Morozov, B.G. Hoffman, C.D. Helgason, M.A. Marra. A seriation approach for visualization-driven discovery of co-expression patterns in Serial Analysis of Gene Expression (SAGE) data. PLoS ONE 3(9):e3205, 2008. The author contributions are provided below on a per-manuscript basis.

Sections 2.2.1, 2.2.2, 2.4.1, 2.4.2, 2.4.3, 2.4.4; Figures 2.1, 2.2, 2.3, 2.4; and Table 2.1 are based on the manuscript: H. Jinno, O. Morozova, K.L. Jones, J.A. Biernaskie, M. Paris, R. Hosokawa, M.A. Rudnicki, Y. Chai, F. Rossi, M.A. Marra, F.D. Miller. Convergent genesis of an adult neural crest-like dermal stem cell from distinct developmental origins. Stem Cells 28(11):2027-40, 2010. Copyright by AlphaMed Press.
H.J., K.L.J., J.A.B., M.P., and F.D.M. were involved in the conception and design of the study. H.J., K.L.J. and J.A.B. collected and analyzed the experimental data, including the RT-PCR experiments described in Section 2.2.2. R.H., M.A.R., Y.C. and F.R. provided study material. F.D.M. provided financial support and supervised the study. M.A.M. participated in data interpretation and provided supervisory support, including manuscript approval, for the computational part of the study (microarray data analysis). I performed all the microarray data analysis, made the figures, interpreted the results, and wrote the sections of the manuscript reproduced in this thesis, except as specified below. The RT-PCR panels in Figures 2.3A and B were made by members of F.D.M.’s laboratory. The description of the RT-PCR method in Section 2.4.4 was written by members of F.D.M.’s laboratory. All animal use was approved by the Animal Care Committee for the Hospital for Sick Children in accordance with the Canadian Council on Animal Care policies.

Sections 2.2.5, 2.2.5.1, 2.2.5.2, 2.4.6, 2.4.8; Figures 2.6 and 2.7B; and Table 2.3 are based on the manuscript: M.D. O’Connor, E. Wederell, G. Robertson, A. Delaney, O. Morozova, S.S. Poon, D. Yap, J. Fee, Y. Zhao, H. McDonald, T. Zeng, M. Hirst, M.A. Marra, S.A. Aparicio, C.J. Eaves. Retinoblastoma-binding proteins 4 and 9 are important for human pluripotent stem cell maintenance. Exp. Hematol. 39(8):866-79.e1; 2011. Copyright by Elsevier. The author contributions for the sections of the manuscript described in the thesis are provided below. Members of the Eaves laboratory and their colleagues made the SAGE libraries described in Table 2.3 and defined the 319 candidate pluripotency genes listed in Appendix B. G.R. performed the PASTAA motif enrichment analysis discussed in Section 2.2.5.2, participated in writing Sections 2.2.5.2 and 2.4.8, and made Figure 2.7B. M.A.M.
provided supervisory support for the seriation component of the study. I designed and performed the seriation analysis of ESC SAGE libraries, made the figures and tables, and wrote the sections of the manuscript reproduced in this thesis, except as defined above.

Sections 2.2.5.1 and 2.4.7 are based on the manuscript: O. Morozova, V. Morozov, B.G. Hoffman, C.D. Helgason, M.A. Marra. A seriation approach for visualization-driven discovery of co-expression patterns in Serial Analysis of Gene Expression (SAGE) data. PLoS ONE 3(9):e3205, 2008. O.M. and M.A.M. conceived and designed the study, and co-wrote the manuscript with input from all co-authors. V.M. developed and implemented the seriation algorithm in Matlab. B.G.H. and C.D.H. constructed pancreatic SAGE libraries and provided guidance for biological interpretation of the results (the description and analysis of pancreatic SAGE libraries are not included in this thesis). M.A.M. supervised the study. I adapted the seriation algorithm to the analysis of SAGE data, performed the analysis, interpreted the results, and wrote the manuscript with input from all co-authors, including all portions of the manuscript reproduced in this thesis.

A version of Chapter 3 has been published: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin. Cancer Res. 16(18):4572-82, 2010. Copyright by the American Association for Cancer Research. M.V. performed the protein experiments, including mass spectrometry (Section 3.2.3) and Western blotting (Section 3.2.5), and participated in writing the relevant sections of the manuscript (Sections 3.2.3, 3.4.4, 3.4.6). P.T., T.K., and M.F.M.
provided technical and supervisory assistance with the mass spectrometry facility, and approved the manuscript. N.G. performed the drug inhibitor studies, made panel C of Figure 3.3, and participated in writing Section 3.4.5. L.M.H. isolated and cultured NBL TIC lines, provided materials for sequencing, and participated in writing Section 3.4.1 (describing the culturing of NBL TICs and SKPs). K.M.B. performed shRNA experiments (Section 3.2.5), generated data for panel B of Figure 3.3, and participated in writing the relevant sections of the manuscript (Sections 3.2.5 and 3.4.7). J.M. provided supervisory support to K.M.B. and approved the manuscript. A.M., T.C., R.D.M., N.T., R.V., and S.J. provided bioinformatic assistance with processing RNA-Seq data and approved the manuscript. M.H., R.M., and Y.Z. provided technical assistance with library construction and RNA sequencing of NBL TIC and SKP libraries and approved the manuscript. K.M.S. provided technical assistance to the Toronto group. F.M. provided SKP lines for the study. D.R.K. provided project leadership and financial support to the Toronto component of the study, and approved the manuscript. M.A.M. provided supervisory and financial support, participated in the study design, and approved the manuscript. I participated in the study design, conceived and performed all the computational analyses detailed in Sections 3.2.1, 3.2.2, 3.2.4, 3.2.6, and 3.2.7, interpreted the data, made the figures and tables (except as described above), and wrote the manuscript with input from all co-authors.

A version of Chapter 4 is in revision: T.J. Pugh*, O. Morozova*, E.F. Attiyeh, S. Asgharzadeh, J.S. Wei, D. Auclair, K. Cibulskis, M.S. Lawrence, A.H. Ramos, E. Shefler, A. Sivachenko, C. Sougnez, I. Birol, R.D. Corbett, K.L. Mungall, Y. Zhao, R.A. Moore, N. Thiessen, A. Lo, R. Chiu, S.D. Jackman, A. Ally, B. Kamoh, A. Tam, J. Qian, M. Krzywinski, M. Hirst, S.J. Diskin, Y.P. Mosse, K.A. Cole, M. Diamond, R. Sposto, L. Ji, T.
Badgett, W.B. London, Y. Moyer, J.M. Gastier-Foster, M.A. Smith, J.M. Guidry Auvil, D.S. Gerhard, M.D. Hogarty, S.B. Gabriel, S.J.M. Jones, G. Getz, R.C. Seeger, J. Khan, M.A. Marra, M. Meyerson, J.M. Maris. The genomic landscape of high-risk neuroblastomas reveals a wide spectrum of somatic mutation. In revision. *Authors contributed equally. J.M.M., J.K., R.C.S., D.S.G. and M.A.S. conceived and led the project. M.A.M. and M.M. conceived and supervised all aspects of the sequencing work at the Genome Sciences Centre and Broad Institute, respectively. E.F.A., S.A., J.S.W., S.J.D., Y.P.M., K.A.C., L.J., T.B., Y.M., J.G.-F. and M.H. selected and characterized samples, provided disease-specific expertise in data analysis, and edited the manuscript. R.S. and W.L. provided statistical support. D.A., E.S., C.S., M.D., and J.M.G.A. provided overall project management and quality control support. K.C., M.S.L., A.H.R., and A.S. supported analysis of exome sequencing data. I.B., K.L.M., R.C., S.J., and J.Q. performed de novo assembly of Illumina sequencing data. Y.Z. led the library construction effort for the Illumina libraries. A.T. and Y.Z. planned the sequencing verification, and A.A. and B.K. performed the experiments. R.D.C. performed copy number analysis of genome sequencing data. M.K. performed verification of candidate rearrangements. N.T. ran the gene- and exon-level quantification pipeline on the RNA-Seq data. A.L. helped interpret data provided by Complete Genomics, Inc. R.A.M. and M.H. led the sequencing effort for the Illumina genome and transcriptome libraries. S.B.G. led the sequencing effort for the exome sequencing libraries. G.G. and S.J.M.J. supervised the bioinformatics groups at the Broad Institute and Genome Sciences Centre, respectively. T.J.P. performed the mutation analysis of the exome sequencing data and the MutSig analysis.
I performed the mutation analysis of genome and transcriptome sequencing data, and conducted integrative analysis of these data by combining mutation analysis, copy number analysis and de novo assembly results. Together with T.J.P., I combined the exome, genome, and transcriptome data from the different sequencing platforms, and interpreted the results. In concert with T.J.P., D.J.G., M.A.M., M.M. and J.M.M., I co-wrote the manuscript with input from all co-authors. I made all the figures and tables in Chapter 4, except Figure 4.1, Figure 4.2, Table 4.1 and Table 4.2, which were modified from Trevor Pugh’s work.

Table of Contents
Abstract
Preface
Table of Contents
List of Tables
List of Abbreviations
Acknowledgements
Dedication
Chapter 1: Evolving methods of genomic analysis and their application to the study of neuroblastoma
1.1 Introduction
1.2 Cancer as a genetic disease
1.3 Cancer as a multigenic disease
1.4 Origin of genetic mutations in cancers
1.4.1 Familial cancers and cancer syndromes
1.4.2 Genetic causes of sporadic cancers
1.5 Cancer stem cell hypothesis
1.6 Genetic lesions in cancers and methods for their detection
1.6.1 Pre-genomic methods for studying genetic aberrations in cancers
1.6.2 Array-based methods for the detection of genetic lesions in cancer genomes
1.6.3 Sequencing approaches for the detection of genetic lesions in cancers
1.6.3.1 Advances in DNA sequencing technologies
1.6.3.2 Sanger-based sequencing methods for the detection of genetic lesions
1.6.3.3 Cancer sequencing studies using the Sanger technology
1.6.3.4 Cancer genome and exome sequencing using new sequencing technologies
1.7 Cancer transcriptomes as proxies for the genomic diversity of tumors
1.7.1 Transcriptome analysis of cancers using microarrays
1.7.2 Sequence census approaches to transcriptome analysis
1.7.2.1 Whole transcriptome sequencing of cancers
1.8 Integrative genomics of cancers
1.9 Childhood neuroblastoma
1.9.1 Classification, treatment and prognosis
1.9.2 Neuroblastoma genetics and genomics
1.9.2.1 Copy number aberrations
1.9.2.2 Gene expression profiling of neuroblastoma
1.9.2.3 Genetically engineered mouse models of neuroblastoma
1.10 Thesis roadmap and chapter summaries
Chapter 2: Transcriptome analysis of normal neural crest stem cells
2.1 Introduction
2.2 Results
2.2.1 SKPs of distinct developmental origin are highly similar at the transcriptional level and differ from bone marrow mesenchymal stem cells (MSCs)
2.2.2 SKPs of distinct developmental origin maintain a lineage history at the gene expression level
2.2.3 Identification of genes significantly enriched and depleted in neural crest stem cell-like cells
2.2.4 Pathway analysis of SKP-enriched and SKP-depleted transcripts
2.2.5 SKPs share expression profile similarities with ES cells
2.2.5.1 Identification of genes associated with the maintenance of the undifferentiated state in human ES cells
2.2.5.2 Validation of pluripotency markers using computational methods
2.2.5.3 Pluripotency genes whose transcripts are enriched or depleted in normal neural crest stem cell-like cells compared to mesenchymal stem cells
2.3 Discussion
2.4 Materials and methods
2.4.1 Microarray analysis of rat SKP lines
2.4.2 Unsupervised analysis to assess global transcriptome similarity
2.4.3 Differential expression analysis using microarrays
2.4.4 Reverse Transcription Polymerase Chain Reaction (RT-PCR) to confirm results from SKP microarray analysis
2.4.5 Pathway analysis of transcripts enriched and depleted in SKPs compared to MSCs
2.4.6 Seriation analysis of LongSAGE libraries from the Cancer Genome Anatomy Project
2.4.7 Seriation using the progressive construction of contigs heuristic
2.4.8 Computational validation of transcripts in Supercontig 1 as pluripotency markers
Chapter 3: Transcriptome analysis of neuroblastoma tumor-initiating cells for therapeutic target prediction
3.1 Introduction
3.2 Results
3.2.1 Identification of genes preferentially enriched or depleted in NBL TICs compared to a compendium of cancer tissues and SKPs
3.2.2 Elevated mRNA levels of BRCA1 signaling pathway members are associated with the NBL TIC phenotype
3.2.3 MudPIT analysis confirms the abundance of DNA repair proteins in the proteome of a NBL TIC line
3.2.4 Known drug targets among NBL TIC-enriched transcripts
3.2.5 Targeting BRCA1 signaling: inhibition of AURKB is selectively cytotoxic to NBL TICs
3.2.6 Exon-level expression analysis of BARD1 reveals a potential mechanism for the sensitivity of NBL TICs to AURKB inhibition
3.2.7 Relevance to primary neuroblastoma
3.3 Discussion
3.4 Materials and methods
3.4.1 RNA sequencing and data analysis
3.4.2 Microarray experiments and data analysis
3.4.3 Identification of NBL TIC-enriched and depleted genes and the functional enrichment analysis
3.4.4 Gel-free two-dimensional liquid chromatography coupled to shotgun tandem mass spectrometry
3.4.5 AlamarBlue assay
3.4.6 Western blotting
3.4.7 Small hairpin RNA (shRNA) knockdowns
3.4.8 Exon-level analysis of RNA sequencing data
3.4.9 AURKB expression analysis
Chapter 4: Whole genome characterization of primary neuroblastoma tumors reveals a wide spectrum of somatic alteration
4.1 Introduction
4.2 Results
4.2.1 Exome sequencing
4.2.2 Whole genome and transcriptome sequencing
4.2.3 Overall mutation frequencies
4.2.4 Verification of candidate somatic mutations using orthogonal approaches
4.2.5 Genes and pathways with significant frequency of mutation
4.2.6 Genome rearrangements and structural variants
4.2.7 Mutations in other known cancer genes and regions
4.3 Discussion
4.4 Materials and methods
4.4.1 Sample selection and preparation
4.4.2 Illumina library construction and sequencing
4.4.3 Detection of candidate somatic mutations in genome sequencing data
4.4.4 Gene coverage in transcriptome sequencing data
4.4.5 Copy number analysis using genome sequencing data
4.4.6 Rearrangement detection
4.4.7 Exome sequencing and data analysis
4.4.8 Integrated analysis of somatic variation from exome and genome data sets
Chapter 5: Conclusions and future directions
5.1 Transcriptome analysis of normal neural crest cells identifies key pathways enriched and depleted in this population compared to other related cell types
5.2 Plasticity of the neural crest stem cell phenotype and NBL heterogeneity
5.3 Transcriptome analysis of NBL tumor-initiating cells implicates AURKB as a novel drug target for NBL
5.4 Whole genome, transcriptome and exome sequencing of primary NBL tumors reveals a broad spectrum of somatic mutations
5.5 Future directions in NBL genomics
Bibliography
Appendices
Appendix A Transcripts enriched and depleted in SKPs as discussed in Chapter 2
Appendix B Candidate pluripotency genes used for seriation analysis in Chapter 2
Appendix C Transcripts enriched and depleted in NBL TICs
Appendix D Original data for the 99 NBL cases described in Chapter 4
Appendix E Variant calls detected in the 99 tumor/normal pairs
Appendix F Chromatin remodeling and MAPK pathway gene lists used in Chapter 4

List of Tables
Table 1.1 Specifications of the common next-generation sequencing platforms as compared to the most common Sanger sequencer (Life Technologies’ ABI3730XL)
Table 2.1 Genes with significant evidence of differential expression between (A) fSKPs and dSKPs, and (B) dSKPs and vSKPs as shown in Figure 2.3B
Table 2.2 Pathways enriched among the transcripts increased or decreased in abundance in SKPs compared to MSCs
Table 2.3 LongSAGE libraries used for the seriation analysis described in Section 2.2.4
Table 2.4 Pluripotency genes with transcript abundance increased or decreased in SKPs compared to MSCs
Table 3.1 Human NBL TIC and SKP lines used for gene expression analysis
Table 3.2 List of RNA sequencing libraries and their sequencing statistics
Table 3.3 Proteins detected in the whole and crude membrane cell extract of NBL TIC line NB88R and their corresponding RNA-Seq expression level
Table 3.4 Known drug targets among NBL TIC-enriched genes
Table 4.1 Non-silent mutations in genes of interest along with their validation status
Table 4.2 Genes with significant frequency of somatic mutation
Table 4.3 Notable structural variants detected and confirmed in NBL genomes and transcriptomes
Table 4.4 Parameters used to select high confidence candidate somatic mutations reported by CGI
Table 4.5 Primer sequences used for genomic validation of structural variants and gene fusions detected by the BCCA pipeline
Table 4.6 Primer sequences used for tumor RNA validation of structural variants and gene fusions detected by the BCCA pipeline
Table A.1 Transcripts enriched and depleted in SKPs as discussed in Chapter 2
Table B.1 Candidate pluripotency genes used for seriation analysis in Chapter 2
Table C.1 Transcripts enriched and depleted in NBL TICs
Table D.1 Original data for the 99 NBL cases described in Chapter 4
Table F.1 Chromatin remodeling and MAPK pathway gene lists used in Chapter 4

List of Figures
Figure 1.1 Advances in sequencing chemistry implemented in the earliest next-generation sequencers
Figure 1.2 Transcript model coverage by various sequencing-based methods for transcriptome analysis
Figure 2.1 Global expression patterns are similar across SKPs of distinct developmental origins
Figure 2.2 Facial and dorsal trunk SKP lineages show similar degrees of divergence from MSCs
Figure 2.3 SKPs of distinct developmental origin express neural crest specification genes despite maintaining a lineage history at the gene expression level
Figure 2.4 Transcripts preferentially enriched or depleted in SKPs compared to MSCs
Figure 2.5 Pathway analysis of transcripts enriched and depleted in SKPs compared to MSCs
Figure 2.6 Seriation analysis to identify developmentally restricted transcripts expressed in undifferentiated ES cells
Figure 2.7 Computational validation of genes identified by seriation as pluripotency markers
Figure 3.1 Transcripts enriched and depleted in NBL TICs compared with SKPs and other tumor tissues
Figure 3.2 Pathway analysis of NBL TIC-enriched transcripts
Figure 3.3 NBL TICs are sensitive to Aurora B kinase inhibition
Figure 3.5 NBL cells preferentially express the oncogenic BARD1beta isoform that is involved in the stabilization of AURKB
Figure 4.1 Overview of the multi-centre next-generation sequencing initiatives and data analyses
Figure 4.2 Somatic mutation frequencies in 99 NBL tumor/normal pairs with samples ordered by type of genes with somatic alteration
Figure 4.3 Integrated analysis of 99 neuroblastoma cases reveals a diversity of somatic aberration

List of Abbreviations
BP Base Pair
CGI Complete Genomics, Inc.
COG Children's Oncology Group
CNV Copy Number Variant
ESC Embryonic Stem Cell
ES Embryonic Stem
ESP End-Sequence Profiling
GB Giga Base
GWAS Genome-Wide Association Study
GWA Genome-Wide Association
ICGC International Cancer Genome Consortium
KB Kilo Base
MB Mega Base
NBL Neuroblastoma
RNA-Seq RNA Sequencing
RPKM Reads Per Kilobase of Gene Model per Million Mapped Reads
SAGE Serial Analysis of Gene Expression
SEER Surveillance Epidemiology and End Results
SI Splice Index
SKP Skin-Derived Precursor Cell
SNP Single Nucleotide Polymorphism
SNV Single Nucleotide Variant
TARGET Therapeutically Applicable Research to Generate Effective Treatments
TIC Tumor Initiating Cell
TCGA The Cancer Genome Atlas
MSC Mesenchymal Stem Cell
MPSS Massively Parallel Signature Sequencing
NCI National Cancer Institute

Acknowledgements
Over the course of my PhD I have been honored to learn from many talented scientists, clinicians, professionals, and members of the general public. To these individuals, only some of whom could be personally mentioned here due to space constraints, I am indebted for the success in my endeavors and my continued enthusiasm in scientific research.

First and foremost, I could never overstate my gratitude to my PhD supervisor, Dr. Marco Marra, who has become a role model of excellence in science, leadership and personal integrity. He has supported me throughout my PhD scientifically, financially and emotionally, and provided me with numerous invaluable learning opportunities both in science and in life. I simply could not have wished for a better supervisor. I would like to express my deepest gratitude to my thesis supervisory committee, Drs. Angela Brooks-Wilson, Paul Pavlidis, and Samuel Aparicio, who have challenged me with insightful questions and discussions that had a great impact on my scientific growth. I am also grateful to the members of my examining committee, Drs.
Phil Hieter, Poul Sorensen, Lynn Raymond, and Annie Huang for their detailed reading of my thesis and thoughtful comments and questions that have greatly enhanced the final document.

I have been fortunate to participate in a number of collaborative projects that taught me the benefits and challenges of team science, and allowed me to interact with many exceptional individuals and world-class scientists. I am honored to have been involved in the National Cancer Institute Neuroblastoma TARGET initiative, and would like to thank Drs. John Maris, Daniela Gerhard and Malcolm Smith for this opportunity. I am also thankful to have worked with Drs. David Kaplan, Freda Miller, Jason Moffat, Gregory Cairncross, Neal Boerkoel, Connie Eaves, Sheila Singh and members of their laboratories. I would like to specifically acknowledge Loen Hansford, Milijana Vojvodic, Kristen Smith, Kim Blakely and Nathalie Grinstein for providing experimental support for my work. On the note of collaborations, I cannot fail to thank Dr. Stephen Yip for introducing me to neuropathology, and for helping me on this journey in more ways than could be listed here.

I am privileged to have been part of the Marra lab, and would like to thank its current and former members for technical assistance, insightful discussions, and emotional support. I wish to specifically thank Noushin Farnoud, Andy Mungall, Malachi Griffith, Trevor Pugh, Ryan Morin, Tesa Severson, Rodrigo Goya, Maria Mendez-Lago, Sorana Morrissy, Jill Mwenifumbo, and Suganthi Chittaranjan for their expertise and team spirit that have contributed immensely to this thesis. I also would like to acknowledge the gifted summer students Alexandra Maslova and Yulia Merkulova who have been a great help in my research. My sincerest gratitude goes to Lulu Crisostomo for her invaluable assistance with administrative tasks and much more.
I am thankful to have been surrounded by many talented staff and scientists at the BC Cancer Agency's Genome Sciences Center (GSC), especially Richard Corbett, Yaron Butterfield, Karen Mungall, Mikhail Bilenky, Hye Jung (Elizabeth) Chun, Greg Taylor, Roland Santos, Alireza Hadj Khodabakhshi, Gordon Robertson, Nina Thiessen, and Rob Chrisp. These individuals have been a source of both scientific and emotional support over the course of my PhD. My work would not have been possible without the skilled assistance of the members of the GSC library construction, sequencing, and bioinformatics teams. I would also like to thank Robyn Roscoe, Karen Novik, Diane Miller, Dominik Stoll and Cecelia Suragh for their help with funding applications and project management support.

I have enjoyed being part of the Canadian Institutes of Health Research / Michael Smith Foundation for Health Research Strategic Training Program in Bioinformatics, and would like to thank both organizations for my stipend during the rotations. I would also like to extend my gratitude to Dr. Steven Jones and Sharon Ruschkowski for fostering a great training environment, and supporting me through my rotations and thesis work. In addition to the bioinformatics program stipend, I have been honored to receive salary and travel funds from the Natural Sciences and Engineering Research Council, Michael Smith Foundation for Health Research, Genome Canada, American Association for Cancer Research Women in Cancer Research Council, University of British Columbia, Roman M. Babicki Fellowship in Medical Research, and the John Bosdet Memorial Fund. I also cannot fail to acknowledge the Jordan Hopkins Foundation for Cancer Research, the James Fund for Neuroblastoma Research, the British Columbia Childhood Cancer Parents' Association, and the Will to Survive Campaign for their passionate support of pediatric cancer research, including my thesis project.
Finally, I wish to extend my thanks to fellow graduate students Anya Gangaeva, Meeta Mistry, Shabnam Tavassolli, Katayoon Kasaian, Warren Cheung, Leon French, Kieran O'Neill, Anthony Fejes, and Yvonne Li, as well as my family and friends for being a great source of encouragement, motivation, and fun throughout these years.

Dedication
To Anna, Ava, Emily, Ethan, Brendan, Connor, Jake, James, Jordan, Kaiya, Nate, Maya, Reese, Ryan, Taras, Shivank as well as countless others who have journeyed through the world of neuroblastoma, and to Megan McNeil, who fought hard for a day when no child would die from cancer.

Chapter 1: Evolving methods of genomic analysis and their application to the study of neuroblastoma¹

1.1 Introduction
While it has long been realized that cancers are genetic diseases, it is only with the recent advent of high-resolution genomic technologies that the exact nature of the genetic changes associated with most cancers is being elucidated. This Chapter reviews the evolution of genomic approaches that have been developed for cancer analysis, with an emphasis on the genomic technologies (microarrays and next-generation sequencing) used for the research described in Chapters 2, 3 and 4 of this thesis. A specific focus of the dissertation is the genomic analysis of pediatric neuroblastoma, a cancer of the developing sympathetic nervous system that most commonly affects children under the age of 5. Section 1.9 provides a brief overview of the clinical and biological features of neuroblastoma, as well as the advances in neuroblastoma genetics and genomics. Finally, Section 1.10 introduces the specific hypotheses and experimental goals addressed in each of the research chapters of this thesis.

1.2 Cancer as a genetic disease
The presence and causative role of genetic defects in cancer cells were first suggested by David von Hansemann and Theodor Boveri in the 1890s-1900s [1].
Boveri accepted von Hansemann's original idea that abnormal chromatin content was central to cancer cells, and refined it in his subsequent experimental work on sea urchin embryos. Using the sea urchin model system, Boveri observed that abnormal numbers of chromosomes led to improper embryonic development and, in some cases, to uncontrolled cell growth. Boveri further hypothesized that genetic aberrations came in two flavors: those stimulatory and those inhibitory to cell growth [2,3]. Growth-stimulatory chromosomes would be accumulated by cancer cells, while inhibitory ones would be excluded. Boveri's prescient concepts of stimulatory and inhibitory genetic material were much later manifested in the notions of oncogenes and tumor suppressors, collectively known as cancer genes [4–6]. Oncogenes and tumor suppressors are genes whose products function in cell growth pathways or are involved in the control of the cell cycle. Mutated oncogenes typically act in a dominant fashion, while mutations in tumor suppressors are typically recessive [7]. The first cellular oncogene, c-src, was discovered through its homology with the src gene of the virus that Peyton Rous had shown to cause sarcomas in chickens [8,6].

¹ Portions of this Chapter have been published, and the author contributions are provided in the Preface as per the University of British Columbia PhD thesis guidelines: O. Morozova, M. Hirst, M.A. Marra. Applications of new sequencing technologies for transcriptome analysis. Annu Rev Genomics Hum Genet 10:135-51, 2009. Copyright by Annual Reviews; O. Morozova and M.A. Marra. Applications of next-generation sequencing technologies in functional genomics. Genomics 92(5), 2008. Copyright by Elsevier; O. Morozova and M.A. Marra. From cytogenetics to next-generation sequencing technologies: advances in the detection of genome rearrangements in tumors. Biochem. Cell Biol. 86(2):81-91, 2008. Copyright by Canadian Science Publishing.
The normal cellular homologues of viral oncogenes are commonly referred to as prototype oncogenes (proto-oncogenes) to highlight the fact that they need to be activated by a genetic event (a gain-of-function mutation) to become oncogenes, whereas their viral counterparts encode constitutively active pro-survival proteins. Another class of genes that contributes to cancer, and that is sometimes included among the "cancer genes", comprises genes involved in DNA repair. Defects in these genes increase the rate at which DNA damage accumulates and promote genomic instability, which in turn enhances the likelihood of a genetic alteration affecting a proto-oncogene or tumor suppressor (e.g. mutations in mismatch repair genes are responsible for hereditary nonpolyposis colorectal cancer [9]).

1.3 Cancer as a multigenic disease
Mathematical modeling studies that used epidemiological data on the age distribution of common cancers led investigators, such as Carl Nordling, to propose that several (originally as many as seven) genetic hits may be required for tumorigenesis [10]. Alfred Knudson applied the idea of multistep tumorigenesis to the study of retinoblastoma, a pediatric cancer that can occur in both sporadic and familial forms. Knudson used statistical modeling to show that the incidence patterns of sporadic and familial retinoblastoma were consistent with the disease being caused by two hits (later termed Knudson's two-hit hypothesis). The two-hit hypothesis proposed that in familial cases the first genetic hit was inherited and the second acquired somatically, while in sporadic cases both hits were somatic [11]. This model explained why familial but not sporadic cases often presented with multiple tumors or tumors in both eyes.
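Knudson's argument can be illustrated with a toy Poisson model. The sketch below is a simplification written for illustration, not Knudson's published analysis: it assumes a hypothetical number of target retinoblasts per eye (`N_CELLS`), a hypothetical per-cell probability that a given allele is inactivated (`MU`), and independent hits. In familial cases one allele is already mutant, so a cell needs one more hit; in sporadic cases the same cell needs two somatic hits.

```python
import math

# Hypothetical parameters chosen only for illustration.
N_CELLS = 1e6  # target retinoblasts per eye
MU = 1e-6      # probability that a given allele is inactivated in a given cell


def expected_tumors(hits_needed: int) -> float:
    """Expected number of transformed cells per eye (mean of a Poisson count)."""
    return N_CELLS * MU ** hits_needed


def prob_any_tumor(expected: float) -> float:
    """Probability of at least one tumor per eye, P(X >= 1) for X ~ Poisson(expected)."""
    return 1.0 - math.exp(-expected)


# Familial: one mutant allele is inherited, so one somatic hit suffices.
familial = expected_tumors(hits_needed=1)
# Sporadic: both alleles must be hit somatically in the same cell.
sporadic = expected_tumors(hits_needed=2)

p_fam = prob_any_tumor(familial)
p_spo = prob_any_tumor(sporadic)
print(f"familial: {familial:.3g} tumors/eye, P(>=1) = {p_fam:.2f}, P(both eyes) = {p_fam**2:.2f}")
print(f"sporadic: {sporadic:.3g} tumors/eye, P(>=1) = {p_spo:.2g}, P(both eyes) = {p_spo**2:.2g}")
```

With these placeholder values a carrier has roughly a 63% chance of a tumor in each eye and about a 40% chance of bilateral disease, whereas bilateral sporadic disease is vanishingly rare. The squared dependence on `MU` in sporadic cases is the core of the two-hit argument.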
Nordling's original paper proposed that only hits that confer survival advantages on cancer cells would count towards the seven required for tumorigenesis, thereby alluding to the ideas of cancer driver mutations and clonal evolution. Peter Nowell later formalized these ideas into a theoretical framework of stepwise acquisition and Darwinian selection of genetic changes that underlies our current view of tumorigenesis [12]. Nowell also suggested that early genetic mutations in cancer cells may contribute to genomic instability, and thereby to the additional genetic alterations observed in later-stage tumors. However, given the limited biological knowledge available at the time, he was unable to pinpoint the exact nature of the genetic changes required for tumorigenesis.

In a seminal paper published in 1990, Eric Fearon and Bert Vogelstein combined previous theoretical work with advances in the identification of oncogenes and tumor suppressors to propose a specific molecular model of colorectal tumorigenesis [13]. According to this model, aggressive colorectal carcinomas developed from benign adenomas by sequential acquisition of changes that included activation of oncogenes and inactivation or loss of tumor suppressors. The model also incorporated epigenetic changes, such as DNA hypomethylation, which was originally reported to occur in tumors by Feinberg and Vogelstein [14], but was shown to have a causal role in cancer only several years later [15]. It is now accepted that abnormalities in cancer genes, accumulated and selected for in a stepwise process, contribute to a genetic landscape that underlies the biological hallmarks of tumors: self-sufficiency in growth signals, insensitivity to growth-inhibitory signals, evasion of programmed cell death (apoptosis), unlimited replicative potential, sustained angiogenesis, and tissue invasion [16].
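The age-incidence reasoning behind multistage estimates such as Nordling's can be written as a short derivation. This is a simplified sketch rather than Nordling's own formulation: it assumes that each of k required hits occurs in a given cell independently, at the same small constant rate μ per year, so that for μt ≪ 1 the probability P_k(t) that the cell carries all k hits by age t, and the corresponding age-specific incidence I(t), are approximately:

$$
P_k(t) = \left(1 - e^{-\mu t}\right)^{k} \approx (\mu t)^{k},
\qquad
I(t) \approx \frac{dP_k}{dt} \approx k\,\mu^{k}\,t^{k-1},
\qquad
\log I(t) \approx (k-1)\log t + \log\!\left(k\,\mu^{k}\right)
$$

Under these assumptions incidence rises as the (k - 1)th power of age, so the slope of log incidence against log age estimates k - 1. A slope of about 6 corresponds to seven hits, the number originally proposed. Multiplying by the number of cells at risk shifts the intercept but leaves the slope unchanged.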
1.4 Origin of genetic mutations in cancers

1.4.1 Familial cancers and cancer syndromes
While most cancers are acquired sporadically, a fraction of malignancies, such as familial breast cancer, cluster in families and are associated with inherited mutations in cancer genes [17]. In addition, several cancer syndromes have been characterized and linked with an overall increased risk of cancer; for instance, Li-Fraumeni syndrome and von Hippel-Lindau disease are associated with increased risk of certain types of solid tumors [18]. Familial cancers and cancer syndromes have been instrumental in inferring the identities of a fraction of cancer genes that play a role in both sporadic and familial forms of the same malignancy. For example, the tumor suppressors RB and VHL are altered in both sporadic and familial forms of retinoblastoma and renal cell carcinoma, respectively [19,20]. However, in most cases, such as ductal and lobular breast cancer, alterations in different genes underlie the sporadic and familial forms of the same disease [21].

1.4.2 Genetic causes of sporadic cancers
Most well-characterized familial cancer syndromes follow a dominant mode of inheritance and are associated with a small number of rare alleles that have a large effect on the phenotype [18]. The completion of the first two human genome sequences and the International HapMap Project led to the realization of the abundance of human genetic variation that may contribute to an individual's risk of common diseases, including cancer [22–24]. This realization prompted investigations into recessive genetic components that may influence susceptibility to sporadic cancers.
Because the lifetime risk of developing invasive cancer is high (45% for men and 38% for women in the US according to the Surveillance Epidemiology and End Results database [25]), studies of cancer families are confounded by a high likelihood of chance associations and by difficulties in distinguishing hereditary from environmental causes [17]. To address these concerns and help delineate the potential hereditary component of common cancers, a large-scale study was conducted to compare the co-occurrence of common cancers in monozygotic and dizygotic twins [26]. The study examined 44,788 sets of twins from Sweden, Finland and Denmark, and found only minor contributions of a hereditary component to susceptibility for most types of cancer, suggesting that the most significant causes of common sporadic cancers were environmental. Environmental agents that have been associated with cancer include tobacco smoke, UV light, radiation, hormones, viruses, and various chemical substances. In fact, it is currently thought that the environmental causes of human cancers are underappreciated [27]. Given these observations, sporadic cancers are likely caused by a combination of inherited predisposition alleles and acquired (somatic) mutations that result in uncontrolled proliferation and tumor growth. Inherited or acquired defects in DNA repair, replication or segregation can aggravate the neoplastic phenotype and contribute to further cancer progression. The acquired alterations may arise through exposure to environmental agents or through other factors, some of which may be currently unknown.

1.5 Cancer stem cell hypothesis
Stem cells are defined as cells within a multicellular organism that are able to self-renew and, through cell division, generate the specialized cell types that compose each tissue within the body.
For instance, embryonic stem cells are able to generate all cell types within the developing embryo, while adult (somatic) stem cells are able to regenerate cell types within a particular tissue [28]. Modern use of the term "cancer stem cell" was pioneered by work in leukemia showing that the cells of origin of leukemias, regardless of their heterogeneity, consistently exhibited properties of normal hematopoietic stem cells [29]. This work resolved the long-standing debate on which target cell was susceptible to leukemic transformation, and implicated the hematopoietic stem cell in this role. Since this landmark study, similar observations have been made in brain and breast malignancies [30,31]. These two reports, together with the original leukemia work, suggested that a small fraction of cells (0.1-0.0001%) within each tumor maintains stem cell properties and is responsible for self-renewal and regeneration of the tumor hierarchy by producing the differentiated cells that form the bulk of the tumor. However, the idea that tumor-regenerating cancer stem cells are rare was questioned by work in melanoma, which reported that an average of 27% of unsorted melanoma cells from patients were capable of forming tumors in mice in single-cell transplant experiments [32]. The melanoma study contended that the common use of NOD/SCID mice, such as in the original leukemia work [29], may underestimate the frequency of tumor-forming cells because these mice retain remnants of immunity and are therefore less permissive to the growth of transplanted tumor cells. In contrast, the melanoma study used NOD/SCID interleukin-2 receptor gamma chain null (Il2rg (-/-)) mice, which are more immunocompromised than NOD/SCID mice and thus better suited for estimating the true tumorigenic capacity of cancer cells.
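Frequencies such as 0.1-0.0001% or 27% are estimates of the probability that an individual transplanted cell initiates a tumor. A standard way to derive such estimates from transplant outcomes is limiting dilution analysis. The sketch below assumes the simplest single-hit model, in which each transplanted cell independently forms a tumor with probability f, so that a recipient receiving d cells stays tumor-free with probability (1 - f)^d. The function name and recipient counts are hypothetical, and published analyses usually pool several doses and fit f by maximum likelihood rather than using one dose as here.

```python
def tic_frequency(cells_per_recipient: int, n_recipients: int, n_tumor_free: int) -> float:
    """Estimate the tumor-initiating cell frequency f from a single transplant dose.

    Single-hit model: P(no tumor | d cells) = (1 - f) ** d, so
    f = 1 - (fraction of tumor-free recipients) ** (1 / d).
    """
    if cells_per_recipient < 1 or n_recipients < 1:
        raise ValueError("dose and number of recipients must be positive")
    if not 0 < n_tumor_free <= n_recipients:
        raise ValueError("f is not estimable when every recipient forms a tumor")
    fraction_tumor_free = n_tumor_free / n_recipients
    return 1.0 - fraction_tumor_free ** (1.0 / cells_per_recipient)


# Hypothetical single-cell transplants: 27 of 100 recipients develop tumors.
print(f"{tic_frequency(1, 100, 73):.2%}")
# Hypothetical bulk transplants: 10,000 cells each, 30 of 100 recipients stay tumor-free.
print(f"{tic_frequency(10_000, 100, 30):.4%}")
```

At a dose of one cell the estimate reduces to the fraction of recipients that form tumors, which is why single-cell transplants give a direct readout; at higher doses the same formula converts a tumor-free fraction into a per-cell frequency.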
Another study challenged the presumed origin of cancer stem cells from resident normal stem cells within the tissue, showing that breast cancer stem cells may arise from tumor cells via an epithelial-to-mesenchymal transition induced by immune signaling [33]. Given these observations, the current version of the cancer stem cell model is dynamic, and incorporates the possibility of variable frequencies of stem cells in different cancer types, as well as the potential for inter-conversion between cancer stem cell and non-stem cell compartments within the tumor [34].

An important result from studies in the cancer stem cell field is the finding that cancer stem cells may be resistant to therapies and may be associated with tumor recurrence [35]. As such, cancer stem cells provide important targets for novel therapy development, particularly for recurrent and refractory disease. Therefore, studying the genetic changes found in these cells may shed light on potential therapeutics specific to the cancer stem cell compartment.

1.6 Genetic lesions in cancers and methods for their detection
Somatic and germline aberrations implicated in tumorigenesis can affect a single base (point mutations) or multiple bases (translocations, inversions, small insertions/deletions (indels), and copy number variants (CNVs)). Throughout this thesis, events are defined as duplications or deletions if they are <1 kb in length and as CNVs if they are >1 kb in length. In addition, in the context of tumor suppressors, loss of heterozygosity (LOH) is defined as the loss of one allele, most often the functional one, either through the loss of a copy of a chromosome or via a copy-number-neutral mechanism. Due to their size and ease of detection, genome rearrangements involving whole chromosomes or their parts were the earliest genetic events shown to be associated with cancer [3].
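The size convention used throughout this thesis can be expressed as a small classification rule. In this hypothetical helper, an event is described only by its direction ("gain" or "loss") and its length in base pairs. The text does not say how an event of exactly 1 kb is treated, so assigning it to the CNV class here is an assumption.

```python
CNV_MIN_LENGTH_BP = 1_000  # thesis convention: CNVs are >1 kb; exactly 1 kb is assumed to be a CNV here


def classify_copy_event(direction: str, length_bp: int) -> str:
    """Name a gain or loss of DNA using the thesis size convention (hypothetical helper)."""
    if direction not in ("gain", "loss"):
        raise ValueError("direction must be 'gain' or 'loss'")
    if length_bp < 1:
        raise ValueError("length must be at least 1 bp")
    if length_bp < CNV_MIN_LENGTH_BP:
        return "duplication" if direction == "gain" else "deletion"
    return f"CNV ({direction})"


for event in [("loss", 250), ("gain", 800), ("gain", 50_000), ("loss", 2_000_000)]:
    print(event, "->", classify_copy_event(*event))
```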
Chromosomal translocations can result either in chimeric protein products or in aberrant gene expression due to the apposition of coding sequences to the regulatory regions of other genes; either outcome can involve cancer genes [7]. Copy number gains (amplifications) have been linked to increased expression of oncogenes, such as MYCN in neuroblastoma, while regions of copy number loss may harbor tumor suppressor loci [36]. In addition, the coding or regulatory sequences of cancer genes can be disrupted by point mutations and small indels, affecting the amino acid sequence or gene expression, respectively. These smaller events evaded detection by early low-resolution approaches, and the extent of their contribution to tumorigenesis has been realized only in the last decade.

1.6.1 Pre-genomic methods for studying genetic aberrations in cancers
The earliest methods for detecting chromosomal and genomic aberrations in cancers involved microscopic examination of chromosomes and chromosome banding patterns [37]. Application of these approaches led to the discovery of the Philadelphia chromosome, which results from an exchange of DNA between chromosomes 9 and 22 in chronic myelogenous leukemia (CML) [38,39]. PCR-based methods have been used to detect known genome rearrangements, particularly alterations in gene copy number. These methods produce results promptly, require little starting material, and are well suited to the locus-specific identification of known rearrangements a few kb in size. Several techniques allow detection of genomic lesions larger than those detectable by traditional PCR (5 – 6 kb) [40]. For instance, Long PCR uses a mixture of two polymerases, a proofreading and a non-proofreading one, increasing the achievable product size to 35 kb [40].
The product length of non-proofreading polymerases is limited by their low efficiency of extension at mismatched bases, while the product length of proofreading polymerases can be limited by their 3′-5′ exonuclease activity; therefore, combining the two types of polymerases increases the product size beyond that achievable by either enzyme alone. This method is useful for identifying specific large aberrations, including intragenic deletions, insertions and duplications [41].

An important milestone in molecular cytogenetics was the development of in situ hybridization. This procedure is based on the hybridization of a labeled probe, containing genomic DNA of interest, to a complementary target; probe copy number is assessed by microscopic visualization. Since the first report of the method in 1969 [42], in situ hybridization methods have undergone extensive advancement with regard to both the target and the probe [43]. The most commonly used conventional in situ hybridization protocol in cancer research is dual-color fluorescence in situ hybridization (FISH). This method involves labeling the centromere and the DNA region of interest with different colors and estimating probe copy number from the ratio of the locus-specific (noncentromeric) signal to the centromeric signal. Dual-color FISH is used for the detection of chromosomal gains or losses (aneuploidy); intrachromosomal insertions, deletions, inversions and amplifications; and chromosomal translocations in both solid and hematopoietic cancers [44]. An extension of conventional FISH is the development of multi-fluorochrome techniques such as multiplex FISH (M-FISH) [45], spectral karyotyping (SKY) [46] and combined binary ratio labeling (COBRA) [47], which allow the simultaneous visualization of all chromosomes in 24 colors.
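The reason for normalizing the locus-specific signal to the centromeric signal can be shown in a few lines. In this sketch, each scored nucleus is represented as a hypothetical pair (locus-specific signals, centromeric signals). A whole-chromosome gain raises both counts and leaves the ratio at 1, whereas a focal gain or deletion changes the ratio. The function names and scoring results are illustrative assumptions, and no clinical cut-offs are implied.

```python
from statistics import fmean


def locus_to_centromere_ratio(nuclei: list[tuple[int, int]]) -> float:
    """Pooled ratio of locus-specific to centromeric FISH signals over all scored nuclei."""
    locus_total = sum(locus for locus, _ in nuclei)
    centromere_total = sum(centromere for _, centromere in nuclei)
    if centromere_total == 0:
        raise ValueError("no centromeric signals scored")
    return locus_total / centromere_total


def mean_locus_copies(nuclei: list[tuple[int, int]]) -> float:
    """Average number of locus-specific signals per nucleus."""
    return fmean([locus for locus, _ in nuclei])


# Hypothetical scoring results, 50 nuclei each.
examples = {
    "disomy": [(2, 2)] * 50,
    "trisomy (whole chromosome)": [(3, 3)] * 50,
    "focal gain": [(6, 2)] * 50,
    "focal deletion": [(1, 2)] * 50,
}
for label, nuclei in examples.items():
    print(f"{label}: mean locus copies = {mean_locus_copies(nuclei):.1f}, "
          f"ratio = {locus_to_centromere_ratio(nuclei):.2f}")
```

Counting locus signals alone would report three copies for the trisomy and would not distinguish it from a focal gain of one extra copy; the ratio separates the two cases.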
Improvements in resolution have been achieved through the use of different hybridization targets, including metaphase chromosomes (~5 Mb resolution), interphase nuclei (50 kb – 2 Mb resolution), and extended chromatin or DNA fibers (5 – 500 kb resolution) [43]. Mapped genomic clones such as bacterial artificial chromosomes (BACs), P1-derived artificial chromosomes (PACs), and yeast artificial chromosomes (YACs) have also been used as FISH probes to map genome rearrangements onto the human genome sequence at higher resolution than is achievable with chromosome FISH [48–50].

1.6.2 Array-based methods for the detection of genetic lesions in cancer genomes
Comparative genomic hybridization (CGH) is a molecular cytogenetic method for detecting relative differences in copy number between two genomes. In its original form, DNA from reference and test samples was labeled with different colors and hybridized to metaphase chromosomes. The ratios of test to reference fluorescence intensities were quantified using digital image analysis, and were used to identify genomic losses or gains in the test sample (e.g. a tumor sample) with respect to the reference sample [51]. Conventional CGH is labor intensive and provides relatively low resolution, 5 to 10 Mb for deletions and 2 Mb for amplifications [52]; moreover, it is unsuitable for detecting balanced rearrangements (e.g. balanced translocations and inversions) as well as whole-genome copy number changes (ploidy) [53]. However, CGH can be used as a discovery tool because it requires no prior knowledge of chromosomal imbalances. To overcome the low resolution of CGH, array CGH (aCGH) was developed. In aCGH, the differentially labeled test and reference DNA is hybridized to a glass slide containing arrayed DNA probes rather than to metaphase chromosomes [54].
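The ratio-based logic shared by CGH and aCGH can be sketched as follows. The function below takes hypothetical per-probe fluorescence intensities for the test and reference samples, converts them to log2 ratios, centers the ratios on their median (a common simplification that assumes most probes are copy-neutral), and applies illustrative ±0.3 cut-offs that are assumptions, not published thresholds. For a pure diploid sample a one-copy loss gives log2(1/2) = -1 and a one-copy gain log2(3/2) ≈ 0.58; admixed normal cells compress these values, which is one reason practical cut-offs are smaller.

```python
import math
from statistics import median

GAIN_CUTOFF = 0.3   # illustrative assumption, not a published threshold
LOSS_CUTOFF = -0.3


def call_copy_number(test: list[float], reference: list[float]) -> list[str]:
    """Call each probe as gain, loss or neutral from test/reference intensities."""
    if len(test) != len(reference) or not test:
        raise ValueError("need one test and one reference intensity per probe")
    log_ratios = [math.log2(t / r) for t, r in zip(test, reference)]
    centre = median(log_ratios)  # assumes most probes are copy-neutral
    calls = []
    for value in log_ratios:
        value -= centre
        if value >= GAIN_CUTOFF:
            calls.append("gain")
        elif value <= LOSS_CUTOFF:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Hypothetical intensities for five probes.
print(call_copy_number([200, 210, 100, 400, 205], [200, 200, 200, 200, 200]))
```

Real pipelines segment neighboring probes before calling, rather than calling each probe independently as this sketch does.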
With the recent development of arrays of mapped clones spanning whole chromosomes [55,56] and the whole human genome [57], large-scale aCGH experiments have become feasible. For instance, 79 kb resolution has been achieved using a genome-wide array of BACs [58]; 75 and 110 kb resolutions have been reported with chromosomal arrays containing a mix of BACs/PACs and fosmids/cosmids, and BACs only, respectively [55,56]. Arrays of mapped genomic clones are robust, with a high signal-to-noise ratio, and have been applied to the detection of copy number changes in tumors on a genome-wide and chromosome-wide scale [59,52]. In contrast, oligonucleotide arrays can provide higher resolution (generally 5 to 50 kb) but have been reported to suffer from lower sensitivity, failing to reliably detect low-copy-number changes because of a poorer signal-to-noise ratio [60]. Oligonucleotide aCGH can potentially provide resolution even finer than 5 kb, as overlapping oligonucleotide probes can be synthesized with as little as a single-base offset [53]. Despite the popularity of aCGH methods, their main technological limitation is that they can detect only those genome rearrangements that involve a change in copy number relative to a reference sample.

Single Nucleotide Polymorphism (SNP) arrays, originally designed for genotyping, are oligonucleotide arrays that detect the two different alleles of biallelic SNPs [61]. Probe signal intensities can be used to determine SNP genotypes and to detect copy number changes [62]. In contrast to array CGH, in which samples are differentially labeled and co-hybridized, only one labeled sample is hybridized to a SNP array at a time; CNVs are detected by comparison with one or several reference samples analyzed in separate hybridizations.
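Per-SNP probe intensities are commonly summarized as a log R ratio (total signal relative to a reference, on a log2 scale) and a B allele frequency (BAF, the fraction of signal from one allele: about 0, 0.5 or 1 for AA, AB and BB genotypes in a normal diploid sample). The sketch below classifies a pre-defined segment from these two values. The heterozygosity band and all cut-offs are hypothetical, and real tumor samples containing normal cells show shifted rather than absent heterozygous BAF bands. Combining the two signals is what lets a segment with normal copy number but no heterozygous SNPs be recognized as copy-number-neutral LOH.

```python
from statistics import fmean

HET_BAND = (0.2, 0.8)     # hypothetical: BAF values treated as heterozygous
MIN_HET_FRACTION = 0.05   # hypothetical: fewer heterozygous SNPs than this means LOH
GAIN_LOG_R = 0.2          # hypothetical log R ratio cut-offs
LOSS_LOG_R = -0.2


def classify_segment(log_r: list[float], baf: list[float]) -> str:
    """Classify one segment of SNPs from log R ratios and B allele frequencies."""
    if not log_r or len(log_r) != len(baf):
        raise ValueError("need one log R ratio and one BAF per SNP")
    mean_log_r = fmean(log_r)
    het_fraction = sum(HET_BAND[0] < b < HET_BAND[1] for b in baf) / len(baf)
    loh = het_fraction < MIN_HET_FRACTION
    if mean_log_r >= GAIN_LOG_R:
        return "gain with LOH" if loh else "gain"
    if mean_log_r <= LOSS_LOG_R:
        return "loss with LOH" if loh else "loss"
    return "copy-neutral LOH" if loh else "normal"


# Hypothetical six-SNP segments.
flat = [0.01, -0.02, 0.03, 0.0, -0.01, 0.02]
print(classify_segment(flat, [0.0, 0.52, 1.0, 0.48, 0.01, 0.99]))      # heterozygous SNPs present
print(classify_segment(flat, [0.0, 1.0, 0.99, 0.02, 0.0, 1.0]))        # no heterozygous SNPs
print(classify_segment([-0.5] * 6, [0.0, 1.0, 0.99, 0.02, 0.0, 1.0]))  # reduced copy number
```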
Currently, SNP arrays capable of genotyping more than 1 million SNPs are available from companies such as Illumina and Affymetrix, providing resolution that matches or exceeds that of most state-of-the-art aCGH platforms. An important advantage of SNP arrays is their ability, unique among the genomic methods discussed thus far, to detect copy-number-neutral losses of heterozygosity [63]. Further, SNP arrays have been used to detect allele-specific copy number variants [64]. A disadvantage of the technology is the requirement for a PCR amplification step to increase the signal-to-noise ratio; as a result, amplification biases may be introduced, giving rise to spurious CNVs [53]. Moreover, CNV predictions obtained using SNP arrays vary depending on the reference set and computational approach used [65]. Even so, SNP arrays have been widely applied to the analysis of the genomes of various tumors, including neuroblastoma [66], and in The Cancer Genome Atlas, discussed in Section 1.6.3.3.

1.6.3 Sequencing approaches for the detection of genetic lesions in cancers

1.6.3.1 Advances in DNA sequencing technologies
With the completion of the reference human genome projects [22,24], the need for resequencing studies, in which individual genomes and genomic segments are examined for changes linked to the phenotype of interest, became apparent. This need drove technological developments that resulted in a panel of conceptually new sequencing methods, collectively referred to as "next-generation", "new generation" or "second generation" sequencers, that are more cost-effective than Sanger sequencing. A standard DNA sequencing workflow has traditionally included three key steps: sample preparation, sequencing, and data analysis. The new sequencing technologies improve upon the Sanger protocol through advances in the first two steps of the workflow, albeit often at the cost of higher error rates and shorter read lengths that can challenge data analysis.
Several high-throughput new-generation sequencing technologies are currently commercially available, including 454/FLX (Roche), Illumina, SOLiD (Life Technologies), Pacific Biosciences, and Ion Torrent (Life Technologies). As of July 2011, the Helicos Heliscope instrument, used in several published next-generation sequencing studies, is no longer available for purchase. In the research described in this thesis, the Illumina technology is used in Chapter 3 to analyze the transcriptomes of neuroblastoma tumor-initiating cells as well as their normal counterparts. In Chapter 4, the same technology is used to analyze the genome, exome and transcriptome sequences of primary neuroblastoma tumors. The new technologies produce an abundance of short reads at a higher throughput than is achievable with state-of-the-art Sanger sequencers; their specifications are summarized in Table 1.1. An additional company not mentioned in Table 1.1, Complete Genomics, Inc. (CGI), provides whole human genome sequencing and analysis as a service [67]. Genome sequences generated by CGI from primary neuroblastoma tumors and matched peripheral blood are discussed in Chapter 4. The advances in sample preparation and in sequencing chemistry and detection are reviewed below for the most common next-generation sequencing technologies: 454/Roche, Illumina, and SOLiD. To provide an example of a true single-molecule technology, the Helicos Heliscope is also discussed.

1.6.3.1.1 Advances in Sample Preparation
In the original Sanger sequencing protocol, a DNA sample is first sheared into fragments and subcloned into vectors, followed by amplification in bacterial or yeast hosts. The amplified DNA is then isolated and sequenced with the Sanger chain termination method [68]. Cloning-based amplification allows for the sequencing of contiguous large fragments, and does not require prior information about the genome sequence (termed "de novo sequencing").
However, it is prone to host-related biases and is lengthy and labor intensive, restricting large-scale Sanger sequencing to designated genome sequencing centers. Cloning-based amplification followed by Sanger sequencing was used to determine the first human genome sequences [24,22]. Notably, when a reference genome sequence of an organism is available and the regions to be sequenced are small, templates can be prepared for sequencing by PCR amplification instead of cloning [69]. A major advantage of the second-generation sequencing platforms is the elimination of the in vivo cloning step and its replacement with PCR-based amplification. Both the 454/Roche [70] and Applied Biosystems SOLiD technologies circumvent the cloning requirement by means of emulsion PCR [71], which uses emulsion droplets to isolate single DNA templates in separate microreactors where amplification is carried out. This template amplification approach is also used in Ion Torrent instruments [72]. The Illumina platform [73,74] uses bridge amplification, a solid-phase amplification approach in which DNA molecules are attached to a solid surface and amplified in situ, generating clusters of identical DNA molecules. Both of these amplification approaches generate a collection of clonal copies of each template, which are fed into the subsequent steps of the sequencing pipelines. The first single-molecule method to be commercialized, developed by Stephen Quake's laboratory and commercialized by Helicos Biosciences, eliminated the amplification step by directly sequencing single DNA molecules bound to a surface [75]. Another commercially available single-molecule sequencing method (Pacific Biosciences) employs real-time detection of single fluorescently labeled nucleotides as they are incorporated by a polymerase [76]. Such single-molecule sequencing approaches are referred to as third-generation technologies.
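Emulsion PCR yields clonal templates only if most droplets receive at most one DNA molecule. One common way to reason about this is to assume that template loading follows a Poisson distribution. The sketch below rests on that modeling assumption, not on any vendor specification, and its function name and loading values are hypothetical. It shows how diluting the templates trades droplet occupancy for clonality.

```python
import math


def droplet_occupancy(mean_templates_per_droplet):
    """Poisson model of template loading into emulsion PCR droplets (assumption).

    Returns (empty, single, multiple, clonal_fraction): the fractions of droplets
    holding 0, exactly 1, or more than 1 template, plus the fraction of occupied
    droplets that are clonal (exactly one template). Requires a mean > 0.
    """
    lam = mean_templates_per_droplet
    empty = math.exp(-lam)
    single = lam * math.exp(-lam)
    multiple = 1.0 - empty - single
    clonal_fraction = single / (1.0 - empty)
    return empty, single, multiple, clonal_fraction


for lam in (0.1, 0.5, 1.0):
    empty, single, multiple, clonal = droplet_occupancy(lam)
    print(f"mean={lam}: empty={empty:.2f} single={single:.2f} "
          f"multiple={multiple:.3f} clonal among occupied={clonal:.2f}")
```

At low mean loading, most droplets are empty, but nearly all occupied droplets are clonal. Raising the loading increases yield at the cost of more mixed (polyclonal) templates.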
Third-generation sequencers have the potential to reduce sequencing costs below those of second-generation instruments, although their scalability remains unproven.
1.6.3.1.2 Advances in Sequencing Chemistry and Detection
The paradigm of the original Sanger method is the DNA polymerase-dependent synthesis of a complementary strand in the presence of four non-reversible synthesis terminators, the 2´,3´-dideoxynucleotides (ddNTPs) corresponding to the four natural 2´-deoxynucleotides (dNTPs). The terminators are incorporated into the growing DNA strand at random in place of the corresponding dNTP, thereby producing a collection of DNA fragments of varying lengths that are then separated by polyacrylamide gel electrophoresis [68]. Originally, detection relied on radioactive labeling, and four different reactions (one per ddNTP) were required per template molecule. Subsequently, radioactive labeling was replaced with fluorescently labeled terminators, which allowed the four sequencing reactions to be carried out simultaneously, with the different ddNTPs distinguishable by their emission spectra [77]. Another variation of automated Sanger sequencing is dye-labeled primer sequencing, in which fluorescent dyes are attached to the 5′ end of primers [78]. Compared with the dye-labeled terminators described above, a key disadvantage that hindered further development of this method is its need for four dye-labeled primers and four separate extension reactions per template, which must be pooled prior to loading. Other improvements of Sanger sequencing included the replacement of slab gel electrophoresis with capillaries, the advent of capillary arrays that allowed sample multiplexing, and the deployment of production-scale sequencing workflows. As a result of these developments, the Sanger method achieved the read length, accuracy, and throughput needed for de novo sequencing of whole genomes.
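The following idealized simulation makes the chain-termination logic concrete. It is a sketch under simplifying assumptions: the function names are hypothetical, and it ignores the primer, the labeling chemistry, and incomplete termination. It generates the fragment lengths produced in each of the four ddNTP reactions, then reads the sequence by ordering the fragments by size, as in a four-lane gel or a single four-color capillary run.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_reactions(template):
    """Idealized chain-termination reactions for a template written 5'->3'.

    The new strand is synthesized 5'->3', antiparallel and complementary to the
    template. Each of the four reactions (one per ddNTP) yields fragments that
    end wherever its ddNTP was incorporated; we assume every possible
    termination point is represented and ignore the primer length.
    """
    new_strand = "".join(COMPLEMENT[b] for b in reversed(template))
    reactions = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        reactions[base].append(length)
    return reactions


def read_gel(reactions):
    """Sort all fragments by length, as electrophoresis does, and read the
    terminating base of each; this reconstructs the new strand 5'->3'."""
    fragments = sorted(
        (length, base) for base, lengths in reactions.items() for length in lengths
    )
    return "".join(base for _, base in fragments)


template = "AACGTTGCA"
reactions = sanger_reactions(template)
print(reactions)
print(read_gel(reactions))  # the reverse complement of the template
```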
To date, Sanger sequencing has been responsible for the generation of the reference genome sequences of many species, including that of human [22,24]. Pyrosequencing was the first alternative to Sanger sequencing to achieve commercialization, as part of the Roche/454 instrument [70]. Pyrosequencing uses chemiluminescence-based detection of the pyrophosphate released when the DNA polymerase incorporates a nucleotide (Figure 1.1A). The four nucleotides are added to the sequencing reaction one at a time, so that only one type of nucleotide is available to the DNA polymerase at a given step. Incorporation of a complementary nucleotide releases light, allowing the nucleotide identity at each position in a sequencing read to be inferred. The amount of light produced is proportional to the number of incorporated nucleotides, which in principle permits the detection of homopolymers. In practice, however, sequencing of homopolymer stretches using the Roche/454 technology is error-prone [79]. In the 454 FLX instrument, about 1.6 million pyrosequencing reactions occur in parallel, each in a separate well of a picotiter plate, resulting in a much higher sequencing throughput than that achieved with the 96-capillary array of a modern Sanger sequencer. Like 454/Roche, the Illumina Genome Analyzer uses sequencing-by-synthesis, albeit with a different detection chemistry [74]. The Illumina sequencing reaction uses four fluorescently labeled nucleotide analogs that serve as reversible sequencing terminators, together with highly modified DNA polymerases capable of incorporating these analogs into the growing oligonucleotide chain (Figure 1.1B). At each step, the correct nucleotide analog is incorporated into the growing chain and its identity is revealed by the color of its fluorescent label. Importantly, the 3´-OH group of the nucleotide is blocked to prevent further extension of the nascent DNA chain.
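The pyrosequencing readout described above can be modeled as a flowgram. In this model, each nucleotide flow produces a signal proportional to the length of the homopolymer run it extends. The sketch below is idealized and its function names are hypothetical. It assumes the flow order TACG and a noise-free signal, then shows how a modest signal error at a homopolymer produces the over-call that makes these stretches error-prone.

```python
def flowgram(read, flow_order="TACG"):
    """Idealized pyrosequencing flowgram.

    Nucleotides are flowed one at a time in a repeating order; each flow
    extends through an entire homopolymer run, so the signal (light) is
    proportional to the number of identical bases incorporated.
    """
    if set(read) - set(flow_order):
        raise ValueError("read contains bases absent from the flow order")
    signals, pos = [], 0
    while pos < len(read):
        for nucleotide in flow_order:
            run = 0
            while pos < len(read) and read[pos] == nucleotide:
                run += 1
                pos += 1
            signals.append((nucleotide, run))
    return signals


def call_bases(signals):
    """Base calling by rounding each (possibly noisy) signal to an integer
    homopolymer length; errors arise when noise crosses a rounding boundary."""
    return "".join(nucleotide * int(round(value)) for nucleotide, value in signals)


flows = flowgram("TTAGGGC")
print(flows)
# A noisy signal of 3.6 for a true GGG run is miscalled as a 4-base homopolymer.
noisy = [(n, 3.6 if (n == "G" and v == 3) else v) for n, v in flows]
print(call_bases(noisy))
```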
After the imaging step, the label is cleaved off and the blockage is reversed, allowing synthesis to proceed. The sequencing reactions occur in a massively parallel fashion on a flow cell, a glass surface that contains hundreds of millions of clusters of clonally identical DNA molecules. The true single-molecule sequencing approach commercialized by Helicos Biosciences in the HeliScope instrument also used a sequencing-by-synthesis procedure, in which virtual terminators (nucleotide analogs that reduce the processivity of DNA polymerase) are used [80]. The reduced processivity of the DNA polymerase allows homopolymer stretches to be identified more accurately. In the Helicos system, single-molecule DNA templates are captured on the flow cell surface. Cy3 labels attached at both ends of each DNA molecule reveal the location of each template bound to the immobilized primers on the surface of the flow cell. Cy5-labeled nucleotides are then added to the reaction one at a time, and incorporated nucleotides are detected by imaging after each addition (Figure 1.1C). In contrast to the polymerase-based approaches discussed above, the SOLiD (Supported Oligonucleotide Ligation and Detection) system uses a sequencing-by-ligation approach, in which the sequence is inferred indirectly via successive rounds of hybridization and ligation. This approach was first published by the Church laboratory as the "polony sequencing technique" [81]. The SOLiD system encodes the 16 possible dinucleotides with four fluorescent dyes, so that each dye labels four different dinucleotides (Figure 1.1D). The identity of each base is determined from the fluorescent readouts of two successive ligation reactions. An advantage of this two-base encoding scheme is that each position is effectively probed twice, which in principle allows a sequencing error to be distinguished from a true sequence polymorphism.
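The two-base encoding described above can be sketched in a few lines. In this sketch the bases are coded A=0, C=1, G=2, T=3, and the color of each overlapping dinucleotide is the XOR of the two codes. This reproduces the usual SOLiD color chart, in which, for example, color 0 covers AA, CC, GG and TT. The function names are hypothetical. The example shows why the scheme helps separate errors from polymorphisms: a true single-base change alters two adjacent colors, whereas a single color-calling error, if decoded naively, corrupts every downstream base. This is why color-space reads are typically compared with a reference in color space rather than being decoded first.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_colors(seq):
    """Two-base (color-space) encoding: one color per overlapping base pair.

    With A=0, C=1, G=2, T=3, the color of a dinucleotide equals the bitwise
    XOR of the two base codes, so each of the four colors covers four of the
    16 dinucleotides (e.g. color 0 = AA, CC, GG, TT).
    """
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def from_colors(first_base, colors):
    """Decode colors into bases, given the known first (primer-adjacent) base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


reference = "ATGGC"
variant = "ATCGC"  # true single-base substitution (G->C at position 3)
ref_colors, var_colors = to_colors(reference), to_colors(variant)
# A true SNP changes two adjacent colors ...
print([i for i, (r, v) in enumerate(zip(ref_colors, var_colors)) if r != v])
# ... whereas a single color-calling error corrupts every downstream base call.
corrupted = ref_colors.copy()
corrupted[1] = 2
print(from_colors("A", corrupted))
```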
1.6.3.2 Sanger-based sequencing methods for the detection of genetic lesions
Sanger sequencing was the only available sequencing technology for more than 20 years, and routine whole genome sequencing with it was not feasible. Sanger-based methods for rearrangement detection that do not require whole genome sequencing were therefore developed. Digital karyotyping (DK) is a method for genome-wide analysis of copy number changes and other genome rearrangements [82]. The method can be regarded as a "genomic version" of the serial analysis of gene expression (SAGE) technique [83] described in Section 1.7.2. In DK, genomic DNA is digested with a mapping restriction enzyme, originally SacI (with a 6 bp recognition sequence). This is followed by the ligation of biotinylated linkers and a second digestion using a fragmenting restriction enzyme with a 4 bp recognition sequence. The biotinylated sequences are isolated by binding to streptavidin, and the DNA tags are released using a tagging enzyme with a 6 bp recognition sequence. The isolated sequence tags are concatenated, cloned, sequenced, and aligned to a reference genome assembly; the density of tags along the genome provides a copy number estimate at each locus. The combination of mapping and fragmenting enzymes determines the size of detectable rearrangements, and the genome-wide distribution of mapping enzyme recognition sites defines the genomic regions represented in a DK analysis. In the case of SacI, recognition sites are abundant and expected to occur on average every 4 kb (4^6 = 4,096 bp in random sequence); however, some regions of the human genome (<5%) have lower densities of SacI sites and are therefore analyzed at a lower resolution [82]. To date, DK has been successfully applied to the analysis of a variety of cancers, including those of the colon and brain, and has been used to identify putative oncogenes and tumor suppressors in these tumors [84,85].
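The step from tag counts to copy number in DK can be illustrated with a simplified calculation. Observed tags in each genomic window are compared with the number of "virtual" tags expected from the enzyme sites in the reference. The ratio is then scaled so that a typical window corresponds to two copies. The published DK analysis used sliding windows along the genome. The median-based normalization below is an assumption made for illustration, and the function name is hypothetical.

```python
from statistics import median


def dk_copy_number(observed, expected, ploidy=2):
    """Estimate copy number per genomic window from digital karyotyping tags.

    observed: experimental tag counts per window.
    expected: virtual tags per window, i.e. tags predicted from the positions
              of mapping- and fragmenting-enzyme sites in the reference.
    Each window's observed/expected ratio is scaled by the median ratio, on the
    simplifying assumption that most windows are at the baseline ploidy.
    Windows without expected tags cannot be assessed and are returned as None.
    """
    ratios = [obs / exp if exp > 0 else None for obs, exp in zip(observed, expected)]
    baseline = median(r for r in ratios if r is not None)
    return [None if r is None else ploidy * r / baseline for r in ratios]


# Average spacing of a 6 bp site in random sequence: 4**6 bp (about 4 kb).
print(4 ** 6)
print(dk_copy_number([12, 10, 11, 40, 0, 5], [12, 10, 11, 20, 10, 0]))
```

In this toy input, the fourth window shows a doubling (4 copies) and the fifth a homozygous loss (0 copies). The last window has no mapping-enzyme sites and cannot be assessed, mirroring the site-poor regions noted above.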
The original version of DK has a theoretical resolution of 4 kb, which is higher than that of generally available array-based methods. A partial limitation of DK, imposed by the use of restriction enzymes, is uneven coverage of the genome; this may be addressed by using different combinations of mapping and fragmenting enzymes. Clone-based methods have been developed to detect both balanced and unbalanced genome rearrangements in cancers. An end sequence profiling (ESP) approach has been developed and successfully applied to the genome-wide analysis of rearrangements in the MCF7 breast cancer cell line [86,87]. In ESP, a BAC library is constructed from the tumor genome of interest, both ends of the BAC clones are sequenced, and the paired-end sequences are mapped back to a reference genome assembly. Structural genomic variants are discovered by identifying clones whose paired-end sequences map to the reference genome at distances or in orientations indicating that the clone was derived from rearranged DNA. The ESP approach is potentially applicable to the detection of all types of genome rearrangements, which can be inferred from different types of "ESP signatures" [86]. While powerful, paired-end sequencing of clones has several limitations. First, the approach depends on the construction of clone libraries, which can be slow and costly and requires high molecular weight DNA. Second, the resolution of paired-end sequencing methods is determined by the clone properties and the redundancy of genome coverage. Third, because only the clone ends are sampled, large numbers of clones are necessary to achieve high-resolution, genome-wide coverage of rearrangements. To address this limitation, a BAC clone fingerprint profiling (FPP) approach for high-resolution detection of genome rearrangements was developed [88].
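The ESP signatures described above can be illustrated with a simple classifier for a pair of clone-end alignments. The type and function names are hypothetical, and the insert-size bounds are placeholders for a BAC library; in practice they would come from the library's observed insert-size distribution. The sketch also ignores repeats and ambiguous mappings.

```python
from dataclasses import dataclass


@dataclass
class EndHit:
    """Alignment of one clone-end sequence to the reference (hypothetical type)."""
    chrom: str
    pos: int      # aligned coordinate on the reference
    strand: str   # "+" or "-"


def esp_signature(end1, end2, min_insert=100_000, max_insert=250_000):
    """Classify a clone by how its two end sequences map to the reference.

    Concordant ends map to the same chromosome, facing each other ("+" then
    "-"), at a distance within the library's insert-size range. The bounds here
    are illustrative placeholders for a BAC library.
    """
    if end1.chrom != end2.chrom:
        return "translocation"
    left, right = sorted((end1, end2), key=lambda hit: hit.pos)
    if left.strand == right.strand:
        return "inversion"
    if (left.strand, right.strand) == ("-", "+"):
        return "everted (e.g. tandem duplication)"
    span = right.pos - left.pos
    if span > max_insert:
        return "deletion"   # ends lie farther apart in the reference than in the clone
    if span < min_insert:
        return "insertion"  # the clone carries sequence absent from the reference
    return "concordant"


print(esp_signature(EndHit("chr1", 1_000_000, "+"), EndHit("chr1", 1_900_000, "-")))
```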
The FPP method involves digesting BAC clones prepared from tumor DNA with five restriction enzymes, HindIII, EcoRI, BglII, NcoII, and PvuII, to generate clone fingerprints. These fingerprints are then aligned against in silico digests of the reference genome sequence using the FPP alignment algorithm. The restriction enzymes were selected to achieve frequent cutting and complementary restriction site locations (restriction-site-poor regions for one enzyme corresponding to restriction-site-rich regions for another). The FPP alignment algorithm consists of four steps that are detailed in [88]. Briefly, aligning each BAC fingerprint to the reference genome sequence involves: (1) a global search of the reference genome sequence to identify BAC-sized or smaller genomic regions that yield digest patterns similar to that of the query clone; (2) a local search that further delineates the correspondence between the fingerprint of the query clone and that of the in silico digested genomic region(s) identified in step 1; (3) an edge detection algorithm that precisely identifies the extent of the alignment; and (4) a final partitioning step that selects an optimal solution, whereby a minimal set of alignments maximally accounts for all clone fragments on the genome. Differences between the experimental and in silico digestion patterns indicate genomic differences, including rearrangements in the clone relative to the reference genome. For instance, an alignment in which the clone maps to one genomic region but has internal gaps in its fragment alignments indicates a localized rearrangement confined to the clone. In contrast, an alignment in which the clone fingerprint is partitioned over several genomic regions suggests a translocation, inversion, or large deletion.
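The in silico digests that FPP fingerprints are compared against can be sketched as follows. The recognition sites and cut positions for HindIII, EcoRI, BglII and PvuII are standard. The fifth enzyme is omitted here, and the greedy size matching with a hypothetical 5% tolerance is only a toy stand-in for scoring a fingerprint against a region. It is not the four-step FPP alignment algorithm described above, and all names are hypothetical.

```python
import re

# Recognition site and top-strand cut offset for four of the five FPP enzymes
# (all four sites are palindromic, so scanning one strand finds every site).
ENZYMES = {
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BglII": ("AGATCT", 1),    # A^GATCT
    "PvuII": ("CAGCTG", 3),    # CAG^CTG (blunt)
}


def digest(sequence, enzyme):
    """Fragment lengths from a complete in silico digest of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    # Lookahead so that overlapping occurrences are also found.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def fraction_explained(observed, predicted, tolerance=0.05):
    """Fraction of observed fragment sizes matched (one-to-one) by predicted
    sizes within a relative sizing tolerance; the tolerance is hypothetical."""
    remaining = sorted(predicted)
    matched = 0
    for size in sorted(observed):
        for i, candidate in enumerate(remaining):
            if abs(size - candidate) <= tolerance * candidate:
                matched += 1
                del remaining[i]
                break
    return matched / len(observed)


region = "AAAA" + "AAGCTT" + "CCCC" + "AAGCTT" + "GG"
predicted = digest(region, "HindIII")
print(predicted)
print(fraction_explained([5, 10, 7], predicted))   # fingerprint fits the region
print(fraction_explained([5, 10, 30], predicted))  # one unexplained fragment
```

An unexplained fragment, as in the last line, is the kind of discrepancy that would prompt a closer look for a rearrangement (or a restriction fragment length polymorphism) within the clone.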
The FPP approach provides several important advantages over ESP and other genome-wide methods for rearrangement detection. First, the method samples the entire clone insert and not just the clone ends, as in ESP. Therefore, rearrangement breakpoints within the clone will be localized more precisely with FPP than with ESP, given the same number of clones sampled [88]. Second, FPP is relatively tolerant of repeats compared with ESP and oligonucleotide microarrays, since only 7% of human repeats are found in contiguous regions of 3.9 kb (the average size of a sizeable HindIII restriction fragment) [88]. This is an important advantage, considering that a significant portion of the human genome is composed of repeat sequences. Third, both balanced and unbalanced rearrangements are potentially detectable. As in ESP, clones harboring rearrangements can be directly selected for functional analyses and sequencing. Drawbacks of the FPP approach include the cost and speed of library production (similar to ESP), the cost of clone characterization (lower than for ESP), and the requirement for a large amount of starting DNA (though less than for ESP). Consequently, although the FPP approach is potentially very powerful, its reliance on clones currently limits its widespread application. In addition, as is the case with other methods that rely on restriction enzyme digestion, FPP may erroneously interpret restriction fragment length polymorphisms as genome rearrangements. This limitation may be partially addressed in the future as more complete catalogues of normal genomic variation are compiled.
1.6.3.3 Cancer sequencing studies using the Sanger technology
As discussed in Sections 1.2 through 1.4, it has become increasingly clear that sporadic cancers are associated with multiple acquired genetic lesions that contribute to various aspects of oncogenesis.
To address the spectrum of these lesions more comprehensively than is possible with the hybridization or clone-based sequencing methods discussed in previous sections, several sequencing initiatives have been launched worldwide. The most notable of these are the Cancer Genome Project (CGP) in the United Kingdom and The Cancer Genome Atlas (TCGA) in the United States [89,90]. A branch of TCGA with a pediatric focus, the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative, was also set up to apply similar approaches to the analysis of pediatric tumors (http://target.cancer.gov/). Initially, the large-scale sequencing projects relied on Sanger-based re-sequencing of the coding sequences of a gene set of interest or of all genes in the genome. However, with the advent of the new sequencing technologies discussed in Section 1.6.3.1, these projects are switching to whole genome, exome and transcriptome analysis using new sequencing platforms (Section 1.6.3.4). An analysis of 99 neuroblastoma cases studied as part of the TARGET initiative using new sequencing technologies is discussed in Chapter 4. The CGP conducted a systematic re-sequencing of the PCR-amplified coding exons of 518 protein kinase genes in 210 human cancers of 13 histological types. This effort identified 1,007 somatic mutations, of which 921 were single base substitutions, 78 were indels, and 8 were complex rearrangements; two-thirds of these mutations had not previously been characterized [69]. The first TCGA report of a comprehensive analysis of glioblastoma, which incorporated re-sequencing data from a panel of over 600 genes in 143 cases, revealed three signaling pathways that may be disrupted in glioblastoma [91]. However, since the sequencing effort involved only a subset of genes, recurrent mutations in IDH1, a gene not previously implicated in cancer, were missed by this approach but detected by a more comprehensive sequencing study [92].
Similar studies, in which pre-selected gene sets were re-sequenced in panels of tumors, were also conducted in pediatric acute lymphoblastic leukemia, lung cancer, and soft tissue sarcomas [93–95]. In all cases, these studies identified novel loci and pathways associated with the diseases. Re-sequencing of the coding regions of RefSeq and Consensus Coding Sequence (CCDS) genes in 11 breast and 11 colorectal cancers [96,97] identified somatic mutations in 1,718 genes (9.4% of the genes analyzed). More recently, similar approaches were applied to ovarian cancer and to the pediatric solid tumor medulloblastoma [98,99]. The medulloblastoma study analyzed 22 tumors and found an average of 11 somatic gene alterations per tumor, 5- to 10-fold fewer than in the adult solid tumors analyzed by the related approaches described above. Nonetheless, the study found mutations in MLL2 and MLL3, previously unknown in this malignancy. These studies suggest that large-scale sequencing efforts are successful at identifying known and novel genetic aberrations in human cancers, and that our catalogues of genetic variants that contribute to oncogenesis are incomplete for both pediatric and adult tumors. In fact, prior to large-scale sequencing studies, approximately 1% of human genes had been shown to be mutated in cancers using other techniques [7]. In contrast, recent data from the Catalogue of Somatic Mutations in Cancer (COSMIC) database at the Sanger Institute suggest that up to 26% of all genes may harbor somatic mutations in cancers, and novel cancer genes with proven causal roles in oncogenesis are defined each year [100]. Notable examples of novel cancer genes discovered by sequencing studies include IDH1 in gliomas and leukemias [101,92], EZH2 in lymphomas and myeloid disorders [102], and FOXL2 in ovarian cancers [103].
The increasing number of genes with reported mutations in cancers points to the heterogeneity of somatic mutations in certain cancer types, particularly solid tumors. For instance, the recent TCGA report on sequencing of the coding regions of 316 ovarian tumors revealed that TP53 was the only highly prevalent, recurrently mutated gene, and that other genes were mutated in small subsets of tumors [98]. The large-scale sequencing studies therefore indicate that unbiased analyses of both adult and pediatric cancers using higher-resolution approaches may identify novel loci relevant to these diseases.
1.6.3.4 Cancer genome and exome sequencing using new sequencing technologies
With the advent of the next-generation sequencing technologies described in Section 1.6.3.1, whole genome, exome, and transcriptome sequencing studies became more feasible and routine than was previously possible with Sanger sequencing. In addition to reducing the cost of large-scale sequencing, the introduction of next-generation sequencers increased the sensitivity of mutation detection. An early study using the 454/Roche sequencing technology demonstrated the potential of next-generation sequencers to detect rare variants present in specific subpopulations of cells, which elude cost-effective detection by capillary sequencing approaches [104]. This ability to detect genetic heterogeneity stems from the use of sequencing templates that are clonally derived from single molecules; in this manner, a variant present in a few cells can be detected if sufficient sequencing depth is applied. This feature is particularly important in cancer research in light of the hierarchy of different cell types within a tumor discussed in Section 1.5.
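The dependence of rare-variant detection on sequencing depth can be made concrete with a simple binomial model. Each read is assumed to carry the variant independently with a probability equal to the variant allele fraction (VAF), and that fraction is diluted both by normal cells and by the subclonal fraction. The model ignores sequencing errors and capture or amplification biases. The function names and the minimum of 5 supporting reads are hypothetical choices for illustration.

```python
from math import comb


def expected_vaf(purity, clone_fraction=1.0, mutant_copies=1, total_copies=2):
    """Expected variant allele fraction of a somatic mutation, assuming tumor and
    normal cells both carry `total_copies` of the locus."""
    return purity * clone_fraction * mutant_copies / total_copies


def detection_probability(depth, vaf, min_alt_reads):
    """P(at least `min_alt_reads` variant-supporting reads) under a binomial
    model in which each read independently carries the variant with
    probability `vaf` (sequencing errors and sampling biases are ignored)."""
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k) for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Heterozygous mutation present in 20% of tumor cells, in a sample of 60% purity.
vaf = expected_vaf(purity=0.6, clone_fraction=0.2)
for depth in (30, 100, 200):
    print(depth, round(detection_probability(depth, vaf, min_alt_reads=5), 3))
```

Under these assumptions, the subclonal mutation (VAF of about 6%) is rarely detected at 30-fold depth but almost always detected at 200-fold depth. This is the quantitative intuition behind the sensitivity gain described above.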
Given this hierarchy, as well as the variable levels of stromal contamination invariably present in clinical samples, Sanger sequencing studies of cancers likely sampled only the most common genotypes present in a tumor, and may have missed mutations in samples containing a high proportion of normal cells [104]. In contrast, new sequencing technologies are potentially more sensitive and capable of detecting the genetic make-up of rare cell populations. A study that used the Illumina technology to sequence the whole genome of an acute myeloid leukemia (AML) sample was the first report of a cancer genome sequenced with a new sequencing technology [105]. This study identified known and novel somatic mutations that might contribute to leukemogenesis, suggesting that next-generation sequencing provides a comprehensive way of analyzing cancer genomes. Since this initial report, the genomes of additional hematopoietic (acute myeloid leukemia, chronic lymphocytic leukemia, multiple myeloma, B-cell lymphoma) and solid (lung, breast, tongue, prostate, and skin) tumors have been published [101,106–114]. These studies have identified genetic lesions not previously implicated in the particular malignancy or in oncogenesis per se. Some of this information was shown to be immediately clinically actionable. For example, the genome sequence of a tongue adenocarcinoma was used to suggest a potential therapeutic option for the patient [109]. Similarly, the identification of BRAF mutations in a fraction of multiple myeloma patients suggests a role for BRAF inhibitors in the management of the disease [108]. With the rapidly increasing number of cancer sequencing studies, largely facilitated by the introduction of new sequencing technologies, an international group of experts established the International Cancer Genome Consortium to coordinate the ongoing cancer sequencing efforts in different countries [115].
The projects within the consortium encompass the sequencing of over 50 different cancer types and over 25,000 individual cancer genomes. In addition to whole genome sequencing of tumors, many efforts involve the sequencing of the coding regions of the genome, or exome. One rationale for conducting exome rather than whole genome sequencing is the current cost-efficiency of the former approach. It can also be argued that somatic variation within the coding sequence is currently more readily interpretable and clinically actionable than the intergenic variation that whole genome projects capture along with coding variation. While sequencing experiments are becoming increasingly affordable, whole genome sequencing is still costly when performed to the depth required to comprehensively identify variants in all genes (an average haploid coverage of 30X [116], later raised to at least 50X [117]). Therefore, major reductions in sequencing and analysis costs need to occur before exome sequencing is rendered obsolete. Several target enrichment methods have been developed to select the coding regions for sequencing. These methods fall into two main categories: PCR-based enrichment of targets, and hybridization-based capture of targets conducted in solution, on an array, or as a combination of the two (hybrid capture). Each of these methods has its own advantages and disadvantages [118]. To date, several cancers have been analyzed using next-generation exome sequencing, including rare tumors such as pheochromocytoma, as well as hepatocellular carcinoma, hairy cell leukemia, renal cell carcinoma, and acute monocytic leukemia [119–123]. Exome sequencing has been useful for detecting point mutations and indels in the coding sequence, while whole genome methods, in addition to detecting these events, have also detected gene fusions and structural rearrangements.
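The coverage figures quoted above translate into read counts through the Lander-Waterman relation C = N·L/G, where N is the number of reads, L the read length and G the target size. If read start positions are assumed to be uniformly random, per-base depth is approximately Poisson-distributed. The sketch below assumes 100 bp reads, a haploid genome of about 3 Gb, and a hypothetical 50 Mb capture target. Real data show uneven coverage (for example from GC bias, mappability and capture efficiency), so the Poisson model underestimates the fraction of poorly covered bases.

```python
import math


def reads_needed(mean_coverage, target_size_bp, read_length_bp):
    """Reads required for a mean coverage C = N * L / G (Lander-Waterman)."""
    return math.ceil(mean_coverage * target_size_bp / read_length_bp)


def fraction_below_depth(mean_coverage, min_depth):
    """Expected fraction of positions with fewer than `min_depth` reads,
    assuming reads start uniformly at random (Poisson-distributed depth)."""
    return sum(
        math.exp(-mean_coverage) * mean_coverage**k / math.factorial(k)
        for k in range(min_depth)
    )


GENOME_BP = 3_000_000_000  # approximate haploid human genome size
EXOME_BP = 50_000_000      # assumed size of a capture target, for illustration
for label, size, cov in (("genome", GENOME_BP, 30), ("genome", GENOME_BP, 50),
                         ("exome", EXOME_BP, 100)):
    print(label, cov, reads_needed(cov, size, read_length_bp=100))
print(fraction_below_depth(30, 10), fraction_below_depth(50, 10))
```

The comparison illustrates the cost argument made above: under these assumptions, a 100X exome needs about 18 times fewer reads than a 30X genome (ignoring off-target reads).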
1.7 Cancer transcriptomes as proxies for the genomic diversity of tumors
Historically, cancers have been classified based on their pathological features. However, it became evident that patients with an identical histopathological diagnosis differed dramatically in their disease course and response to therapy. These phenotypic differences can now be attributed to the genomic heterogeneity that has emerged from recent genome-level analyses of individual tumors, as described in Section 1.6.3 (and reviewed in [116]). Even prior to high-resolution genome sequencing studies, however, some of this heterogeneity could be assessed by studying the gene expression profiles of seemingly identical tumors. Two conceptually different approaches to high-throughput gene expression profiling, one based on hybridization and the other on sequencing, have emerged over the past two decades and allow gene expression levels to be interrogated on a genome-wide scale.
1.7.1 Transcriptome analysis of cancers using microarrays
One group of methods for global transcriptome analysis is based on microarrays. In these methods, cDNA is hybridized to arrays of complementary oligonucleotide probes corresponding to genes of interest, and the abundance of a particular mRNA species is estimated from its hybridization intensity to the relevant probe [124]. Microarray analysis is used in this thesis to study the gene expression profiles of normal and malignant neural crest-like cells in Chapters 2 and 3. In Chapter 2, microarray expression data derived from several lineages of normal SKin-derived Precursor cells (SKPs) are used to characterize the neural crest-like phenotype of these cells and to support their use as normal counterparts of neuroblastoma cells. In Chapter 3, microarray analysis is used to confirm the results of RNA sequencing experiments (Section 1.7.2).
Several microarray platforms are currently available or in development; however, all of them rely on the principle of probe-target hybridization, in which the signal intensity provides a measure of the amount of a particular nucleic acid in a sample. In addition to the concentration of the nucleic acid, the signal intensity also depends on probe-target binding affinity; on some platforms, binding specificity is controlled for by including mismatch probes [125]. A seminal study applied microarrays to the examination of the expression profiles of acute myeloid and acute lymphoblastic leukemias. It showed that these clinically distinct leukemias could be distinguished based on their gene expression information alone [126]. In addition to demonstrating a correlation between disease phenotype and global gene expression profile, this study introduced two conceptually different applications of microarray analysis that have since been used widely in cancer transcriptomics research: class prediction (assigning new tumors to known classes) and class discovery (discovering novel clinically relevant subtypes). This work also prompted a multitude of expression profiling initiatives, which have to date been performed in many types of malignancy [127]. These studies aimed to classify tumors previously indistinguishable by conventional approaches into clinically relevant subtypes (class discovery), as well as to identify expression markers that could be used to prospectively classify tumors into known disease subtypes (class prediction). Another common direction of microarray data analysis is class comparison [128]. In class comparison studies, genes with evidence of differential expression among disease types, cell populations or experimental conditions of interest are identified and used to gain novel biological or clinical insight into the classes being compared.
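A class comparison can be sketched as a per-gene two-sample test followed by ranking. The sketch below uses Welch's t statistic on log-scale expression values. This is one simple choice among many and not necessarily the method used in Chapters 2 and 3. Practical analyses usually add moderated statistics (for example, as implemented in the limma package) and multiple-testing correction (for example, Benjamini-Hochberg). The gene names, values and function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic comparing one gene's expression between two classes
    (each class needs at least two samples and nonzero combined variance)."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_genes(expression, labels, class_a, class_b):
    """Rank genes by |t|, largest first.

    expression: {gene: [log-scale values, one per sample]}
    labels: class label for each sample, in the same order as the values
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


expression = {
    "GENE1": [8.1, 8.3, 7.9, 5.0, 5.2, 4.8],  # hypothetical values
    "GENE2": [6.0, 6.1, 5.9, 6.0, 6.2, 5.8],
}
labels = ["X", "X", "X", "Y", "Y", "Y"]
print(rank_genes(expression, labels, "X", "Y"))
```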
The analyses of microarray data described in Chapters 2 and 3 of this thesis are class comparisons, in which we sought transcripts significantly increased or decreased in abundance between different populations of cells. Early influential works in the cancer microarray field include class discovery studies in diffuse large B-cell lymphomas [129] and breast cancers [130]. These studies identified previously indistinguishable, clinically and biologically relevant subtypes arising from different cells of origin. Expression-based molecular classifiers developed as a result of such studies (notably, the MammaPrint assay in breast cancers [131]) are being used in clinics and have been shown to outperform conventional methods of clinical assessment [131]. Further developments in the microarray field enabled other cancer transcriptomics applications, such as the detection of noncoding RNAs [132], single nucleotide polymorphisms (SNPs) (described in Section 1.6.2), and alternative splicing events [133]. Despite their power to measure the expression of thousands of genes simultaneously, microarray methods do not readily address several key aspects of the transcriptome, notably the detection of novel transcripts and the study of the coding sequence of detected transcripts. Moreover, microarrays are indirect methods in which transcript abundance is inferred from hybridization intensity rather than measured explicitly. These properties may interfere with experimental reproducibility, particularly when experiments are performed by different laboratories [125].
1.7.2 Sequence census approaches to transcriptome analysis
A conceptually different group of methods uses sequencing of cDNA fragments derived from mRNA, followed by counting the number of times a particular fragment has been observed (Figure 1.2). This group of methods originally included Serial Analysis of Gene Expression (SAGE) [83] and Massively Parallel Signature Sequencing (MPSS) [134].
In SAGE, restriction enzymes are used to obtain short sequence fragments (tags), usually derived from the 3' end of an mRNA; the tags are concatenated and sequenced to determine the expression profiles of their corresponding mRNAs [83]. Modifications of this protocol extended the tag length from the original 14 bp to 17 bp in LongSAGE and 26 bp in the SuperSAGE protocol [135,136]. In Chapter 2 of this thesis, SAGE analysis is used to define a list of candidate pluripotency genes preferentially expressed in undifferentiated human embryonic stem cells. The MPSS method also generates short signatures of each mRNA species; however, the in vivo propagation in bacteria used in SAGE is replaced with in vitro cloning on microbeads [134,137,83]. In addition, MPSS uses a ligation-based sequencing method instead of the Sanger sequencing used in SAGE [134,83]. SAGE and MPSS are often termed "clone-and-count" or "sequence census" techniques, as they provide a digital overview of the gene expression profile of a cell [138]. Advantages of such digital readouts include statistical robustness and less stringent standardization and replication requirements than those for microarrays [139,134]. Until recently, the use of SAGE and MPSS had been hindered by the cost of sequencing and by the biases introduced by the necessary cloning step. Although traditional SAGE outperforms microarrays at detecting highly abundant transcripts, it is not very efficient at detecting rare mRNA populations [140]. New sequencing technologies have increased the cost-effectiveness of the method, which originally relied on the Sanger sequencing protocol, and have eliminated the requirement for the in vivo step [83]. Several next-generation sequencing-based SAGE methods have been reported. One method, termed DeepSAGE, uses the 454 sequencing technology to generate 300,000 tags with less effort than a traditional LongSAGE experiment generating 50,000 tags [141].
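The sequence census idea, and why library size matters for rare transcripts, can be sketched in silico. A virtual SAGE tag is taken as the bases immediately 3' of the 3'-most NlaIII site (CATG, the SAGE anchoring enzyme), and tag counts form the digital expression profile. The sampling calculation then shows how the chance of observing a rare transcript grows with the number of tags sequenced, comparing the 50,000 and 300,000 tags mentioned above with a larger hypothetical library of 10 million tags. The abundance of 1 in 300,000 tags is a hypothetical value. The sketch ignores sequencing errors and ambiguous tag-to-gene mappings, and all function names are hypothetical.

```python
from collections import Counter

ANCHOR = "CATG"  # NlaIII site, the anchoring enzyme used in SAGE and LongSAGE


def virtual_tag(transcript, tag_length=17):
    """Virtual SAGE tag: the bases immediately 3' of the 3'-most anchor site
    (17 bp mimics LongSAGE, 10 bp the original SAGE; the anchor is excluded).
    Returns None if no site leaves room for a full-length tag."""
    site = transcript.rfind(ANCHOR)
    if site == -1:
        return None
    tag = transcript[site + len(ANCHOR): site + len(ANCHOR) + tag_length]
    return tag if len(tag) == tag_length else None


def tag_census(transcripts, tag_length=17):
    """Count tags: the digital expression profile of a sequence census."""
    return Counter(
        tag for tag in (virtual_tag(t, tag_length) for t in transcripts) if tag
    )


def prob_observed(relative_abundance, library_size):
    """Chance of sampling at least one tag of a transcript that makes up
    `relative_abundance` of the tag pool (independent draws, an idealization)."""
    return 1.0 - (1.0 - relative_abundance) ** library_size


print(tag_census(["GGCATGACGTCATGTTAA", "CATGTTAA", "AAAA"], tag_length=4))
# A hypothetical rare transcript contributing 1 in 300,000 tags:
for size in (50_000, 300_000, 10_000_000):
    print(size, round(prob_observed(1 / 300_000, size), 3))
```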
Another SAGE-like method, Tag-Seq, relies on Illumina technology to generate 10 million tags per run, which represents an increase of two orders of magnitude over the throughput of traditional LongSAGE [142]. Both methods have been shown to increase the representation of low-abundance transcripts that evade detection by Sanger-based SAGE methods [142,141], thereby providing a more complete view of the transcriptome. In addition to improving the original sequencing-based methods for gene expression analysis, new sequencing technologies have enabled the development of new sequence census methods, such as Rapid Analysis of 5'-Transcript Ends (5'-RATE), used for surveying 5' end fragments [143]. The LongSAGE protocol was originally used by the Cancer Genome Anatomy Project (CGAP), a consortium formed to construct a public database of gene expression information across multiple cancer, pre-cancer, and normal tissues [144]. This initiative aims to provide a comprehensive resource that can be mined for transcripts enriched in particular tissue types. SAGE was chosen over microarrays because of its digital expression readout and the relative ease with which data from multiple laboratories could be combined for analysis [144]. With the advent of Illumina-based Tag-Seq, several recent CGAP libraries have been constructed using this method, which was shown to outperform the originally used LongSAGE protocol and microarrays in terms of dynamic range and transcript representation, including the representation of sense-antisense transcript pairs [142].

1.7.2.1 Whole transcriptome sequencing of cancers

Full-length cDNA sequencing [145] and the generation of expressed sequence tags (ESTs), single sequencing reads derived from one end of a cDNA clone [146], have been used to characterize cellular mRNA profiles, including those of cancer cells.
However, primarily because of the cost of sequencing, these Sanger sequencing-based methods were even less effective than traditional SAGE at representing rare cellular transcripts [147]. With the development of new sequencing technologies, EST sequencing emerged as a sequence census method for studying mRNA profiles on a genome-wide scale. With the elimination of the cloning step and the common use of random priming, next-generation EST sequencing reads can now cover the whole length of transcripts [148]. Deep EST sequencing of transcriptomes using next-generation technology is also referred to as whole transcriptome shotgun sequencing (WTSS) [149] or RNA sequencing (RNA-Seq) [150,151]. In one version of this approach, polyA-selected or ribosomal RNA-depleted RNA is reverse transcribed into cDNA, which is then fragmented and sequenced using a next-generation technology to generate reads intended to cover the full length of each transcript [149]. Figure 1.2 compares transcript coverage by each of the sequencing-based methods described thus far. The ability to cover the whole length of transcripts with RNA-Seq reads enables many applications previously unachievable with tag sequencing and hybridization approaches [152]. Like hybridization-based approaches, RNA-Seq can address differential gene- and exon-level expression, but with lower background, over a larger dynamic range, and with the opportunity to re-analyze the data against different sets of annotations [153]. In addition, RNA-Seq data can be used to study the structure of splice isoforms [154] and to identify chimeric transcripts [155] that may result from genomic rearrangements [156] and/or trans-splicing [155].
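One common way to turn RNA-Seq read counts into gene- or exon-level expression values that can be compared within and between libraries is RPKM (reads per kilobase of feature per million mapped reads), which corrects for both feature length and sequencing depth. The sketch below is a generic illustration of that formula, not necessarily the normalization used elsewhere in this thesis; the function names, exon names and counts are hypothetical.

```python
from typing import Dict


def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads.

    RPKM = count / (length / 1e3) / (total / 1e6) = count * 1e9 / (length * total)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


def rpkm_table(counts: Dict[str, int], lengths: Dict[str, int],
               total_mapped_reads: int) -> Dict[str, float]:
    """Apply RPKM to every feature (gene or exon) that has both a count and a length."""
    return {f: rpkm(counts[f], lengths[f], total_mapped_reads)
            for f in counts if f in lengths}


if __name__ == "__main__":
    # Hypothetical exon-level counts from a library of 10 million mapped reads.
    counts = {"exon1": 500, "exon2": 500, "exon3": 50}
    lengths = {"exon1": 2_000, "exon2": 250, "exon3": 250}
    for exon, value in rpkm_table(counts, lengths, 10_000_000).items():
        print(exon, round(value, 1))
```

In this toy example, exon1 and exon2 receive the same number of reads, but the shorter exon2 has an eight-fold higher RPKM (200 versus 25), reflecting a higher density of reads per base.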
Moreover, read sequence information allows for the detection of mutations [103] and RNA edits [157], as well as quantification of the expression level of each allele [158]; these applications are not readily available with tag-sequencing or array technologies. To date, the transcriptomes of several cancer cell lines and primary tumors, including cervical, colon, prostate, and hematopoietic cancers, have been characterized by RNA-Seq protocols using 454, Illumina, or SOLiD technologies [152]. RNA-Seq enabled the recent discovery of key recurrent mutations in FOXL2 and ARID1A in ovarian cancers [103,159] and of EZH2 mutations in B-cell lymphoma [102]. Similar approaches have also been applied to the discovery of mutations in other cancers, including acute myeloid leukemia [160] and malignant pleural mesothelioma [161]. In addition, this approach has led to the discovery of novel expressed gene fusions affecting the RAF kinase pathway in solid malignancies [162]. RNA-Seq-based analysis of alternative splicing has been applied to the identification of splice isoforms associated with drug resistance in colorectal cancer [154]. These studies suggest that RNA-Seq is a versatile approach that not only enables the examination of gene expression profiles but also simultaneously allows the detection of coding mutations and gene rearrangements, at least where these events do not abrogate gene expression. RNA-Seq is used in Chapters 3 and 4 of this thesis to characterize the expression profiles of neuroblastoma tumor-initiating cells (Chapter 3) and primary tumors (Chapter 4). In addition to gene-level expression profiling, RNA-Seq is used for exon-level expression analysis (Chapter 3) and the detection of point mutations and fusion transcripts (Chapter 4).
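The allele-level applications of RNA-Seq mentioned above rest on counting, at a given genomic position, how many reads support each allele. The sketch below assumes a simplified input, a plain string of base calls observed at one position (not the samtools pileup format), computes the variant allele fraction, and applies an exact two-sided binomial test against an expected 50:50 ratio, as one might for a heterozygous site. Function names and data are hypothetical. In tumor RNA, a skewed ratio is consistent with allele-specific expression but can also reflect copy number changes or admixture of normal cells, so the test alone does not establish the cause.

```python
import math
from collections import Counter
from typing import Tuple


def allele_counts(base_calls: str, ref: str, alt: str) -> Tuple[int, int]:
    """Count reads supporting the reference and alternative alleles at one position."""
    calls = Counter(base_calls.upper())
    return calls[ref.upper()], calls[alt.upper()]


def variant_allele_fraction(ref_count: int, alt_count: int) -> float:
    """Fraction of informative reads carrying the alternative allele."""
    depth = ref_count + alt_count
    return alt_count / depth if depth else 0.0


def binom_test_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum P(X = i) over outcomes no more likely than k."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small relative tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


if __name__ == "__main__":
    calls = "A" * 18 + "G" * 2  # hypothetical: 18 reference (A) and 2 alternative (G) reads
    ref_n, alt_n = allele_counts(calls, ref="A", alt="G")
    print("VAF:", variant_allele_fraction(ref_n, alt_n))
    print("P (50:50 expected):", binom_test_two_sided(alt_n, ref_n + alt_n))
```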
1.8 Integrative genomics of cancers

With increasing amounts of genome sequence, copy number, expression, and epigenetic data generated for different cancer types, efforts have focused on integrating these data sets to produce multidimensional views of cancers. Such efforts are important priorities of large-scale cancer genomics initiatives, notably the TCGA [89]. To address the demands of the research community, several software platforms have been developed for the visualization and analysis of multiple types of genomic data, including the Integrative Genomics Viewer (IGV) [163], the Cancer Genome Workbench [164], and the UCSC Cancer Genomics Browser [165], among others. Integrative genomic studies of cancers have followed several general directions: identifying genes [166,92] and pathways [98] affected by multiple types of aberrations within the same cancer; combining multiple data types to define and characterize disease subtypes [167,168]; and conducting systems biology analyses to reconstruct cellular regulatory networks [169]. The first TCGA study to demonstrate the power of integrating multiple datasets to provide a systems-level view of a cancer combined DNA copy number, gene expression, sequence, and DNA methylation information from a cohort of 206 cases of glioblastoma multiforme (GBM) [91]. This study identified three signaling pathways (RTK/RAS/PI-3K, RB, and p53 signaling), each altered in over 75% of GBM patients. Although few individual genes were altered at high frequency in GBM, the combined datasets revealed highly recurrent changes at the level of signaling pathways, demonstrating the power of integrative analysis to identify prevalent alterations in pathways and functional networks.
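The pathway-level argument in the GBM example can be illustrated with a toy calculation: a pathway is counted as altered in a sample if any of its member genes is altered by any data type (for example, mutation, copy number change, or aberrant expression). In the hypothetical cohort below, no single gene is altered in more than 30% of samples, yet the pathway as a whole is altered in 80%. Gene names, sample identifiers and the function name are invented for illustration, and the sketch assumes the per-gene alteration calls have already been made and merged across data types.

```python
from typing import Dict, Iterable, Set


def altered_fraction(alterations: Dict[str, Set[str]], genes: Iterable[str],
                     n_samples: int) -> float:
    """Fraction of samples carrying an alteration in at least one of `genes`."""
    hit: Set[str] = set()
    for gene in genes:
        hit |= alterations.get(gene, set())  # union of altered samples across genes
    return len(hit) / n_samples


if __name__ == "__main__":
    n_samples = 10
    # Hypothetical calls, already merged across data types: gene -> altered sample IDs.
    alterations = {
        "GENE_A": {"S1", "S2", "S3"},
        "GENE_B": {"S4", "S5"},
        "GENE_C": {"S6", "S7", "S8"},
    }
    for gene in alterations:
        print(gene, altered_fraction(alterations, [gene], n_samples))
    print("pathway", altered_fraction(alterations, alterations.keys(), n_samples))
```

Taking the union of samples, rather than summing per-gene frequencies, avoids double-counting samples that carry alterations in more than one pathway member.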
Other examples of discoveries from integrative analyses of cancers include the characterization of three subtypes of GBM (proneural, mesenchymal and classical) associated with different gene expression and mutation signatures that affect clinical outcome [168]; the discovery of defects in homologous recombination in a large fraction (approximately 50%) of the ovarian cancers studied by the TCGA [98]; and the realization that multiple types of sequence, expression and epigenetic defects observed in acute lymphoblastic leukemia affect the WNT and MAPK pathways, implicating these pathways as potential therapeutic targets for the disease [166].

1.9 Childhood neuroblastoma

As discussed in Sections 1.3 and 1.4, most adult cancers arise through the progressive accumulation of genetic aberrations, likely over many years or decades. In contrast, fewer genetic changes occurring within a short developmental time window may be sufficient for the tumorigenesis of childhood cancers [99,170,171]. Therefore, characterizing the developmental origin of childhood cancers is essential to understanding the biology of these malignancies. Neuroblastoma (NBL) is a childhood cancer of the developing sympathetic nervous system [172]. Tumors of the sympathetic nervous system account for 7.8% of all cancers among children younger than 15 years of age, and of these, 97% are NBLs [25]. The ganglia of the sympathetic nervous system are derived from the sympathoadrenal lineage of the embryonic neural crest [173]. The neural crest and its multiple lineages are discussed in more detail in Section 2.1. According to the Surveillance, Epidemiology, and End Results (SEER) database, which tracks cancer epidemiology data in the United States, NBL is the most common cancer diagnosed in the first year of life in that country [25]. There are approximately 60 new NBL cases each year in Canada (Canadian Cancer Society).
The most common site for primary NBL tumors is the adrenal medulla; however, tumors can arise anywhere along the sympathetic branch of the autonomic nervous system (the branch that mediates the fight-or-flight response) [174]. The exact cell of origin of NBL is unknown and likely differs among disease subgroups, such that aggressive tumors may derive from morphologically undifferentiated cells, whereas benign tumors may derive from more differentiated cell types [175]. A subset of NBLs is thought to originate from PHOX2B-positive neuronal progenitors [176]. As discussed in Section 1.9.2, inherited mutations in PHOX2B are associated with a fraction of familial NBLs.

1.9.1 Classification, treatment and prognosis

NBL cases are diverse with regard to histopathology, molecular features, and clinical outcome. At presentation, the disease can be limited to a single organ, locally or regionally invasive, or widely disseminated; more than 50% of cases are metastatic at presentation [177,174]. The most common metastatic sites are lymph nodes, bone marrow, bone, and liver [174]. Intriguingly, NBL is both disproportionately lethal despite very aggressive multimodal therapy and, in a subset of cases, associated with an exceptionally high rate of spontaneous and complete regression [178,174,179]. Among other factors, prognosis strongly depends on age at diagnosis, with infants typically having a more favorable prognosis than older children. Historically, an age cutoff of 12 months was used for pre-treatment risk assessment; however, a recent retrospective study of the outcomes of 3,666 patients, adjusted for MYCN status and stage, reported a continuous prognostic effect of age [180]. Statistical analysis performed in this study showed that a cutoff of 460 days (approximately 15 months) maximized the outcome difference between younger and older patients.
To facilitate comparisons between clinical trials and studies conducted in different countries, the International Neuroblastoma Staging System (INSS) was developed in 1988 by an international panel of experts and revised in 1993 [181]. Since then, the INSS has been the most commonly used staging system in Europe and North America [179]. The INSS is a surgically based system that assigns patients to stages 1, 2A, 2B, 3, 4 and 4S based on the degree of surgical excision, lymph node involvement, presence of distant metastases, and age (younger or older than 12 months). A significant limitation of this system is its dependence on surgical resection: patients with localized disease who do not undergo surgery cannot be properly staged. To address this limitation, a pre-treatment staging system, termed the INRG staging system, was developed by the International Neuroblastoma Risk Group (INRG) task force [182]. Under the INRG staging system, tumors are classified at diagnosis into one of four stages: L1 (localized disease without image-defined risk factors), L2 (localized disease with image-defined risk factors), M (metastatic disease), and MS (metastatic special disease). In addition, after examining 8,800 NBL cases from North America, Europe, Japan, and Australia, the INRG task force defined 16 clinically distinct pre-treatment risk groups based on seven risk factors: age, INRG stage (L1, L2, M or MS), histological category, grade of differentiation, MYCN amplification status, 11q LOH status, and ploidy [183]. Based on these groups, the INRG recommends classifying tumors into four pre-treatment risk categories with statistically different 5-year event-free survival (EFS): very low-risk (5-year EFS > 85%), low-risk (5-year EFS 75-85%), intermediate-risk (5-year EFS 50-75%), and high-risk (5-year EFS < 50%).
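The final binning step described above, in which pre-treatment groups are assigned to risk categories according to their 5-year EFS, amounts to a simple threshold rule. The sketch below encodes the ranges quoted in the text. Because the published ranges share their endpoints (75-85% and 50-75%), the handling of a value that falls exactly on a boundary is an assumption made here for illustration. In practice, individual patients are assigned to a group from the seven risk factors rather than from a measured EFS; the function name is hypothetical.

```python
def inrg_risk_category(five_year_efs_percent: float) -> str:
    """Map a pre-treatment group's 5-year EFS (%) to an INRG risk category.

    Thresholds follow the ranges quoted in the text; boundary handling
    (85% -> low, 75% -> low, 50% -> intermediate) is an assumption.
    """
    if not 0 <= five_year_efs_percent <= 100:
        raise ValueError("EFS must be a percentage between 0 and 100")
    if five_year_efs_percent > 85:
        return "very low"
    if five_year_efs_percent >= 75:
        return "low"
    if five_year_efs_percent >= 50:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    for efs in (92, 80, 60, 35):  # hypothetical group EFS values
        print(efs, inrg_risk_category(efs))
```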
Low- and very low-risk patients are often observed without any intervention or cured with surgery alone [184]. A special subset of low-risk patients with metastatic disease, denoted INRG stage MS (INSS stage 4S), includes patients younger than 18 months with metastatic disease limited to bone marrow, liver or skin, favorable histology, and no MYCN amplification. These patients are often given supportive care and observed, as they tend to achieve complete disease regression without any treatment [184]. Intermediate-risk patients are treated with surgery and moderate-intensity chemotherapy, while high-risk patients undergo one of the most aggressive anti-cancer protocols used for any pediatric or adult cancer [184,174]. The front-line protocol for high-risk NBL includes surgery, high-intensity chemotherapy with stem cell rescue, radiation, and biological therapy with retinoids [179]. Despite this aggressive treatment, only 30-40% of patients achieve long-term survival, and no regimen has been proven to be curative for relapsed disease [174]. A recent phase 3 clinical trial showed that adding ch14.18, a monoclonal antibody against the tumor-specific antigen GD2, to standard isotretinoin therapy in first remission improves survival of high-risk NBL patients by 20%, supporting the implementation of this immunotherapy protocol as part of standard treatment for high-risk NBL [185]. Even so, high-risk NBL remains a significant challenge for pediatric oncologists, and new therapies are needed to improve survival and reduce treatment-related morbidity in these patients. Chapter 3 of this thesis focuses on the analysis of NBL tumor-initiating cells isolated from the bone marrow of relapsed high-risk NBL patients. As described in Section 1.5, cancer stem cells and tumor-initiating cells are presumed to be associated with tumor recurrence and drug resistance [35].
Therefore, characterizing the transcriptomes of NBL tumor-initiating cells may help identify drug targets for relapsed and refractory disease. Chapter 4 describes an analysis of the genomes, exomes, and transcriptomes of primary high-risk NBL tumors, with the goal of identifying genetic targets that could inform the development of novel therapies for high-risk NBL.

1.9.2 Neuroblastoma genetics and genomics

A small subset of NBL cases (<5%) is familial and displays an autosomal dominant mode of inheritance [179]. Early studies showed that NBL incidence and family history follow Knudson's two-hit hypothesis, and estimated that up to 22% of cases may carry a germline mutation [186]. Recent studies have shown that activating mutations in the anaplastic lymphoma kinase gene (ALK) account for most cases of familial neuroblastoma [187,188]. Additionally, a small number of NBL cases that occur in conjunction with congenital central hypoventilation syndrome or Hirschsprung's disease are associated with germline mutations in PHOX2B [189,190]. PHOX2B encodes a homeodomain transcription factor essential for the development of autonomic derivatives of the neural crest [191]. Whereas PHOX2B mutations are exclusively germline, the ALK locus can be mutated or amplified in 5-15% of sporadic NBL [192–194,188]. Mutated ALK protein is typically overexpressed and shows constitutive kinase activity, and knockdown of mutant alleles reduces the proliferation of NBL cell lines [193]. In addition, recent evidence suggests that wild-type ALK may be oncogenic when overexpressed; therefore, inhibiting wild-type or mutant ALK with small molecule inhibitors may provide therapeutic avenues for NBL patients with or without ALK mutations [195].
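The association strengths reported by the genome-wide association study described in the next paragraph are odds ratios with 95% confidence intervals. For a single 2 × 2 table of counts, the odds ratio and an approximate confidence interval (Woolf's log method) can be computed as sketched below. This is a textbook illustration only: GWAS estimates are typically obtained from regression models that may include covariates, the counts may be allele or genotype counts depending on the test, and the numbers here are hypothetical rather than taken from the study. The function name is also hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf (log-scale) confidence interval for a 2 x 2 table.

    a: cases carrying the risk allele     b: cases without it
    c: controls carrying the risk allele  d: controls without it
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    or_, lo, hi = odds_ratio_ci(a=600, b=400, c=500, d=500)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An interval that excludes 1 indicates an association at roughly the 5% level for that single table; genome-wide studies additionally require much smaller P values to account for the number of SNPs tested.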
To understand the contribution of common variants to the development of sporadic NBL, a genome-wide association study is currently under way under the auspices of the Children's Oncology Group [196,174]. The study aims to genotype 5,000 NBL cases of European ancestry and 10,000 matched controls using the Illumina HumanHap550 BeadChip platform. To date, the study has reported significant associations with the high-risk NBL phenotype for SNPs within FLJ22536 at 6p22 (odds ratio = 1.37; 95% confidence interval 1.27 to 1.49; P = 9.33E-15), BARD1 at 2q35 (odds ratio = 1.68; 95% confidence interval 1.49 to 1.90; P = 8.65E-18), and LMO1 at 11p15 (odds ratio = 1.34; 95% confidence interval 1.25 to 1.44; P = 5.20E-16) [197,196,198,66], while SNPs within DUSP12 at 1q23 (odds ratio = 1.46; 95% confidence interval 1.28 to 1.65; P = 8.13E-9), DDX4 at 5q11 (odds ratio = 1.31; 95% confidence interval 1.14 to 1.49; P = 8.00E-5), IL31RA at 5q11 (odds ratio = 1.24; 95% confidence interval 1.08 to 1.42; P = 2.24E-3), and HSD17B12 at 11p11 (odds ratio = 1.47; 95% confidence interval 1.30 to 1.66; P = 5.04E-10) were associated with low-risk disease [199]. In addition, common gains at 1q21 (NBPF23) have been found to be significantly associated with NBL (odds ratio = 2.49; 95% confidence interval 2.02 to 3.05; P = 2.97E-17), regardless of the disease phenotype [200].

1.9.2.1 Copy number aberrations

Tumor-specific amplification of the MYCN oncogene, found in approximately 20% of primary tumors, was the first copy number alteration to be characterized in NBL [201]. This aberration was immediately recognized to be associated with inferior prognosis [202,203] and has remained a key molecular factor in pre-treatment risk assessment ever since (Section 1.9.1). As discussed in Section 1.9.2, ALK is amplified in a subset of NBL tumors.
Examination of a panel of 50 NBLs using interphase FISH found that copy number alterations involving ALK occurred in 60% of tumors and were not correlated with copy number status at the MYCN, 1p36, 11q or 17q loci [204]. In addition to MYCN and ALK amplifications, copy number alterations of several larger genomic regions are associated with clinical behavior or other phenotypic characteristics of the disease. For instance, losses of 11q have been reported to occur in NBLs without MYCN amplification and have been associated with poor prognosis in this subgroup [205,206]. In contrast, losses of 1p36 are enriched in MYCN-amplified tumors; however, it has been suggested that these losses may adversely affect survival independently of MYCN. To address the prognostic significance of these two aberrations independently of other factors, these loci were specifically examined in a panel of 915 tumors; the study revealed that unbalanced loss of 11q and loss of heterozygosity at 1p36 were each independently associated with poor prognosis in NBL. Owing to its prevalence in non-MYCN-amplified cases, 11q status is currently used as one of the criteria for assigning a pre-treatment risk group under the INRG system [183]. Another common copy number alteration in NBL is gain of the distal long arm of chromosome 17 (17q gain) [207]. This alteration usually occurs in tumors with poor prognosis; however, its independent prognostic significance is unknown [196]. Several reports of translocations between chromosomes 11 and 17 suggest a potential mechanism for the concomitant occurrence of 17q gain and 11q loss [208,209]. Other, less frequent chromosomal alterations of unknown independent prognostic value have also been reported in NBL [196]. In addition to the specific copy number events discussed above, the overall genomic profiles of 493 NBL tumor samples were examined using array CGH [187].
The study found that the structure of the tumor genome varied across tumors: some tumors harbored exclusively whole-chromosome gains and losses (numerical alterations), while others harbored gains and losses of parts of chromosomes (segmental alterations). Moreover, these genomic patterns were indicative of prognosis: tumors with only numerical chromosomal alterations were associated with excellent prognosis, whereas tumors harboring any type of segmental chromosomal alteration were associated with high-risk disease or relapse.

1.9.2.2 Gene expression profiling of neuroblastoma

Expression of several individual markers, including the TRK neurotrophin receptors, has been associated with prognosis in NBL; in particular, expression of NTRK1 (TRK-A) and NTRK3 (TRK-C) is associated with favorable prognosis [210,211], whereas expression of NTRK2 (TRK-B) is associated with poor prognosis [212]. The first microarray study conducted in NBL confirmed the association of gene expression patterns with disease course and derived a panel of 19 genes that could be used to classify tumors into prognostic groups [213]. A number of microarray expression profiling studies followed, several of them focusing specifically on improving prognostic stratification for intermediate-risk cases, which are difficult to assign to a treatment plan using known prognostic markers [214,215]. Another large-scale study developed a 144-gene signature with which the investigators were able to retrospectively improve the risk stratification used in NBL clinical trials; the signature was initially validated in a set of 174 patients [216] and later in 440 patients [217].
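Gene expression signatures such as those described above are typically turned into a per-patient score before patients are assigned to prognostic groups. The scoring rules of the cited studies are not described here, so the sketch below shows one generic approach, which is an assumption rather than the published method: standardize each gene across the cohort, then average the z-scores of genes associated with poor outcome and subtract the average for genes associated with favorable outcome. Gene names, function names and expression values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def zscores(values: Sequence[float]) -> List[float]:
    """Standardize one gene's expression across samples (population SD)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def signature_scores(expr: Dict[str, Sequence[float]], poor: Sequence[str],
                     favorable: Sequence[str]) -> List[float]:
    """Per-sample score: mean z of poor-outcome genes minus mean z of favorable-outcome genes.

    expr maps gene -> expression values, one per sample, in the same sample order.
    Higher scores indicate a profile more like poor-outcome tumors.
    """
    z = {gene: zscores(values) for gene, values in expr.items()}
    n_samples = len(next(iter(expr.values())))
    scores = []
    for i in range(n_samples):
        up = mean(z[g][i] for g in poor) if poor else 0.0
        down = mean(z[g][i] for g in favorable) if favorable else 0.0
        scores.append(up - down)
    return scores


if __name__ == "__main__":
    expr = {  # hypothetical log-scale expression for three tumors
        "POOR_GENE_1": [1.0, 5.0, 9.0],
        "POOR_GENE_2": [2.0, 4.0, 6.0],
        "FAVORABLE_GENE_1": [9.0, 5.0, 1.0],
    }
    print(signature_scores(expr, ["POOR_GENE_1", "POOR_GENE_2"], ["FAVORABLE_GENE_1"]))
```

A cutoff on such a score for assigning patients to risk groups would still have to be chosen on training data and validated on independent cohorts, as in the validation studies cited in the text.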
A recent meta-analysis of several previously published microarray datasets and single-gene studies developed a robust 59-gene signature that was then validated in a set of 579 primary tumors spanning all risk groups, the largest NBL patient cohort examined to date by gene expression analysis. After adjustment for other known clinical markers of prognosis, such as MYCN status, age and disease stage, the signature was found to be independently predictive of overall and event-free survival [218]. Although gene expression-based risk stratification has been shown in large sample sets to improve upon the prognostic factors currently used in the clinic, it has yet to be implemented in the clinical management of NBL.

1.9.2.3 Genetically engineered mouse models of neuroblastoma

The creation of genetically engineered mouse models (GEMMs) carrying exogenous DNA of interest has contributed to our understanding of the functions of cancer genes [219]. A key role of GEMMs in cancer research has been to establish which aberrations in cancer genes can induce or contribute to tumorigenesis when expressed in mice, providing functional evidence that these aberrations act as driver events in human cancer [220,219]. As discussed in Section 1.9.2.1, amplification of the MYCN oncogene is the best-characterized genetic event in NBL; it occurs in approximately 20% of tumors and is associated with poor prognosis. To understand the role of MYCN in NBL, a GEMM was constructed in which MYCN is overexpressed in the sympathoadrenal lineage of the neural crest under the control of the tyrosine hydroxylase (TH) promoter [221]. TH-MYCN mice hemizygous for the MYCN transgene develop NBL tumors with 70% penetrance by 1 year of age, while homozygous mice develop tumors with 100% penetrance by 4 months of age [222,220]. The TH-MYCN mouse is a model of MYCN-amplified NBL and is currently the only well-characterized GEMM available for NBL [220].
NBL in TH-MYCN mice recapitulates many of the biological and clinical characteristics of human MYCN-amplified NBL, such as genomic abnormalities (including MYCN amplification), disease pathology, and gene expression patterns [220,222]. However, the model differs from the human disease in its low frequency of bone marrow metastases and its predominantly paraspinal (as opposed to adrenal in humans) primary tumors [223,220].

1.10 Thesis roadmap and chapter summaries

Recent advances in cancer genomics have contributed to our understanding of cancers as diseases associated with multiple aberrations that can affect genes at the level of sequence, copy number, mRNA expression, or epigenetics. Applications of cancer genomic methods to the analysis of high-risk neuroblastoma (NBL) have led to the discovery of recurrent copy number aberrations, gene expression signatures, and predisposition markers predictive of the disease phenotype. These studies revealed the molecular heterogeneity of high-risk NBL, suggesting that higher-resolution approaches may identify novel markers linked to the pathogenesis of this disease. The main hypothesis underlying the research described in this thesis is that single-nucleotide resolution analysis of high-risk NBL genomes and transcriptomes will lead to the discovery of new loci that contribute to the disease. I also hypothesized that a better understanding of the gene expression profiles of the putative normal cell of origin of NBL would help interpret high-throughput sequencing data from NBL cells by placing these data in the context of expression in normal neural crest cells. Therefore, the objectives of my research are to characterize the genomes and transcriptomes of high-risk primary NBL tumors, NBL tumor-initiating cells, and normal neural crest cells using new sequencing technologies, with the goal of identifying novel loci that may be implicated in the disease.
Since NBL originates from the developing neural crest, the goal of the research in Chapter 2 is to identify and characterize the expression of key genes and pathways that distinguish normal neural crest stem cells from other stem cell lineages. Key findings of the work described in Chapter 2 include the plasticity of the neural crest stem cell phenotype, to which cells not derived from the neural crest can converge, and decreased expression of DNA double-strand break repair genes compared with another somatic stem cell lineage with broad developmental potential, the mesenchymal stem cells. Chapter 3 focuses on NBL tumor-initiating cells (TICs), a highly tumorigenic population of metastasis-derived NBL cells. The rationale for studying these cells is the aggressive behavior of high-risk NBL and its high propensity for relapse, potentially linked to the persistence of TICs that are resistant to conventional therapies. The goal of the research described in Chapter 3 is to use RNA sequencing data from NBL TICs to identify TIC-enriched transcripts and use them to predict therapeutics that could specifically target these cells. The key finding of this work is the identification and validation of AURKB as a novel drug target for NBL TICs. Having studied the transcriptomes of normal neural crest cells (Chapter 2) and NBL TICs (Chapter 3), I addressed in Chapter 4 whether whole-genome and transcriptome analysis of primary NBL tumors may identify additional genetic markers that could inform novel therapies of relevance to primary NBL tumors at diagnosis. This large-scale sequencing work revealed that NBL tumors harbored relatively low frequencies of somatic point mutations in coding sequences. Despite this observation, several gene groups, including those involved in MAPK signaling and chromatin remodeling, emerged from this analysis as targets of somatic mutations in 15% and 11% of patients, respectively.
These mutational signatures may suggest therapeutic avenues that could be explored in patient subgroups carrying these mutations.

Figure 1.1 Advances in sequencing chemistry implemented in the earliest next-generation sequencers
In each diagram, DNA templates are depicted as black bars, sequencing primers as aquamarine bars, and DNA polymerases as light blue circles. (A) The pyrosequencing approach implemented in 454/Roche sequencing technology detects incorporated nucleotides (here an A nucleotide is shown) by chemiluminescence (yellow shape) resulting from PPi release. (B) The Illumina method uses sequencing-by-synthesis in the presence of fluorescently labeled nucleotide analogues (green, red, blue and yellow circles) that serve as reversible reaction terminators. Sequencing is performed on millions of templates simultaneously, and an imaging step follows each incorporation step to determine the identity of the added nucleotides (bottom). (C) The single-molecule sequencing-by-synthesis approach detects template extension using Cy3 and Cy5 labels attached to the sequencing primer (aquamarine) and the incoming nucleotides (fuchsia), respectively. (D) The SOLiD method sequences templates by sequential ligation of labeled degenerate probes. Two-base encoding implemented in the SOLiD instrument allows each nucleotide position to be probed twice. For instance, in the sequence shown, the T base is effectively read twice: by red (A to T) and by green (T to G). The matrix on the left shows that each of the four colors encodes four of the sixteen possible dinucleotides. Reprinted with permission of Annual Reviews.

Figure 1.2 Transcript model coverage by various sequencing-based methods for transcriptome analysis
The exons in a gene model are represented by orange, blue and green bars, while the introns are in grey. Following transcription and splicing, a transcript carrying exons 1, 2, and 3 is produced.
The coverage of this transcript by various methods is depicted in the black box: Sanger-based expressed sequence tags (ESTs) are generated from the 3' or 5' end of transcripts, whereas SAGE tags represent short sequences at their 3' ends; randomly primed short reads generated by next-generation sequencers detect bases throughout the length of the transcript. Modified with permission of Annual Reviews.

Table 1.1 Specifications of the common next-generation sequencing platforms compared with the most common Sanger sequencer (Life Technologies' ABI3730XL)
The run statistics in this table are from [224]. The average read length is for high-quality reads of more than 200 bases (the mode is higher). *Polymerase chain reaction (PCR) can be used to amplify templates for Sanger sequencing when specific regions of the genome are to be sequenced; the use of PCR for template amplification and candidate gene sequencing by the Sanger method is discussed in Section 1.6.3.3.
Instrument | Company | Sequencing chemistry | Template amplification | Average read length | Run time | Megabases per run
ABI3730XL | Life Technologies | Sequencing by synthesis with irreversible terminators (Sanger) | In vivo cloning* | 650 bp | 2 hrs | 0.06
Illumina GAIIx | Illumina | Sequencing by synthesis with reversible terminators | Bridge PCR | Paired 150 bp | 14 days | 96,000
Illumina HiSeq2000 | Illumina | Sequencing by synthesis with reversible terminators | Bridge PCR | Paired 100 bp | 8 days | 200,000
454/FLX Titanium | Roche | Pyrosequencing on solid support | Emulsion PCR | 400 bp | 10 hrs | 500
SOLiD-4 | Life Technologies | Sequencing by ligation | Emulsion PCR | Paired 50 bp (forward) and 35 bp (reverse) | 12 days | 71,400
PacBio RS | Pacific Biosciences | Sequencing by synthesis using SMRT (single molecule real time) technology | None | 860-1,100 bp | 0.5-2 hrs | 5-10
Heliscope | Helicos BioSciences | Sequencing by synthesis with virtual terminators | None | 35 bp | N/A | 28,000
Ion Torrent (316 chip) | Life Technologies | Sequencing by synthesis with semiconductor detection | Emulsion PCR | >100 bp | 2 hrs | >100

Chapter 2: Transcriptome analysis of normal neural crest stem cells²

2.1 Introduction

During early human development, a zygote (fertilized egg) undergoes cell divisions to form a blastula that implants into the uterus to continue embryogenesis. Following implantation, the process of gastrulation results in the formation of the asymmetric embryo consisting of three germ layers (ectoderm, mesoderm and endoderm) that go on to develop all major organs and tissues in the body. The ectoderm-derived neural crest is a transiently multipotent cell population unique to vertebrates [225]. Neural crest cells migrate out of their origin at the apex of the neural tube, the embryo's precursor to the central nervous system, and form aggregates throughout the embryo that later develop into ganglia of the peripheral nervous system.
A fraction of neural crest cells infiltrates other organs, such as the skin, gut and adrenal glands, to generate melanocytes, enteric neurons, and hormone-secreting chromaffin cells, respectively [226]. Neural crest cells also contribute to craniofacial cartilage and bone, as well as to cardiac and smooth muscle tissues. The development of neural crest cell lineages has been compared to hematopoiesis [226], in which blood cell types derive from a hematopoietic stem cell via differentiation into a series of committed progenitors. In this model, the original stem cell is multipotent, while the differentiation potential of each committed progenitor is restricted to the cell types that make up its particular lineage. Accordingly, progenitor cells committed to specific neural crest lineages have been proposed, including those committed to the enteric, parasympathetic, sympathoadrenal, sensory, glial, and melanogenic lineages [226]. The sympathoadrenal progenitor, a common progenitor of sympathetic neurons and chromaffin cells originating from the trunk region of the neural tube, has been identified and characterized using imaging studies in rats [227]. Intriguingly, cell types derived from the sympathoadrenal progenitor, but not from other neural crest progenitors, are the ones susceptible to transformation into NBL [216,217]. The differentiation and specification of sympathoadrenal progenitors are not completely understood, but they involve the transcription factors ASCL1, PHOX2A, PHOX2B, and HAND2 [175]. NBL originates from the developing neural crest, and more specifically from a particular neural crest lineage. Understanding the biology and differentiation of normal neural crest stem cells may therefore help shed light on the molecular events associated with NBL formation. Moreover, germline mutations in PHOX2B are associated with a fraction of familial NBL cases, which implicates genes involved in neural crest differentiation in NBL formation [190,189].

² Portions of this Chapter have been published, and the co-author contributions are detailed in the Preface as per the University of British Columbia PhD thesis guidelines: O. Morozova, V. Morozov, B.G. Hoffman, C.D. Helgason, M.A. Marra. A seriation approach for visualization-driven discovery of co-expression patterns in Serial Analysis of Gene Expression (SAGE) data. PLoS ONE 3(9):e3205, 2008; H. Jinno, O. Morozova, K.L. Jones, J.A. Biernaskie, M. Paris, R. Hosokawa, M.A. Rudnicki, Y. Chai, F. Rossi, M.A. Marra, F.D. Miller. Convergent genesis of an adult neural crest-like dermal stem cell from distinct developmental origins. Stem Cells 28(11):2027-40, 2010. Copyright by AlphaMed Press; M.D. O’Connor, E. Wederell, G. Robertson, A. Delaney, O. Morozova, S.S. Poon, D. Yap, J. Fee, Y. Zhao, H. McDonald, T. Zeng, M. Hirst, M.A. Marra, S.A. Aparicio, C.J. Eaves. Retinoblastoma-binding proteins 4 and 9 are important for human pluripotent stem cell maintenance. Exp. Hematol. 39(8):866-79.e1; 2011. Copyright by Elsevier.

Work to date has shown that adult (somatic) stem cells persist in many tissues, most notably the central nervous and hematopoietic systems [230–232]. Similarly, multipotent adult stem cells have been isolated from the dermis of rodent and human skin and termed Skin-derived Precursors (SKPs) [221,222]. These cells maintain a differentiation potential reminiscent of the neural crest stem cell. They can generate peripheral neurons, glia, Schwann cells (a glial subtype thought to be generated only from the neural crest), and smooth muscle cells [235]. In a publication that is beyond the scope of this thesis, we also demonstrated that SKP progenitors reside in the hair follicle niche and exhibit the properties expected of a dermal stem cell, contributing to dermal maintenance, wound healing, and hair follicle morphogenesis [236].
SKPs can be isolated from dermis throughout the body, although only the facial dermis originates embryonically from the neural crest [237–239]. It is therefore unclear whether SKPs isolated from different areas of the dermis are of neural crest origin per se or converge towards the neural crest stem cell phenotype. If SKPs are derived from the neural crest, this would imply that neural crest progenitors invade the mesoderm-derived dorsal and ventral dermis [237,239] during embryogenesis, and that these precursors associate with hair follicles and generate SKPs. Alternatively, if ventral and dorsal SKPs are not derived from the neural crest, this would suggest a second developmental pathway that generates neural crest-like cells from a non-neural crest origin. Distinguishing between these two possibilities would have important implications for developmental biology, because the origin of somatic stem cells such as SKPs is not well understood [240]. For NBL development, the second possibility would mean that NBL could derive from a non-neural crest origin by lineage convergence. For NBL laboratory research, the origin of SKPs by lineage convergence would also mean that SKPs derived from any area of the body can be used to model the normal counterparts of NBL cells. A precedent for lineage convergence comes from in vitro studies in which normal fibroblasts were reprogrammed toward an ES cell-like phenotype [241,242]. In addition, as discussed in Section 1.5, breast cancer stem cells can arise in vivo from tumor cells via epithelial-to-mesenchymal transition [33]. These considerations led us to hypothesize that neural crest stem cell-like cells may arise from cell lineages other than the neural crest by adopting a neural crest stem cell-like phenotype.
The overall objective of this Chapter is to characterize the expression profiles of SKP lines used as models of neural crest progenitors [234], which represent the presumed normal counterparts of neuroblastoma cells [231,232]. To fulfill this objective, I addressed the three specific aims outlined below. First, I characterized the transcriptomes of SKPs isolated from facial, ventral and dorsal skin, which lineage tracing work had shown to derive from different developmental origins. The three SKP lineages were similar at the expression level, but each maintained the expression of a small set of genes indicative of its embryonic origin. Second, I used the three SKP populations to identify genes enriched and depleted in all SKPs compared to mesenchymal stem cells (MSCs), a mesodermal multipotent somatic stem cell lineage. These transcripts represent markers that are common to neural crest progenitors regardless of their origin and that distinguish them from MSCs. Third, the experiments under the second aim showed that neural crest progenitors, but not mesenchymal progenitors, expressed markers of pluripotency. I therefore compared the transcripts enriched in normal neural crest progenitors to transcripts enriched in normal embryonic stem cells (ESCs), to further delineate similarities and differences between SKPs and ESCs. The results of these analyses provide insights into the distinguishing characteristics of the transcriptome of normal neural crest progenitors, the cell type thought to undergo transformation to form NBL.
2.2 Results

2.2.1 SKPs of distinct developmental origin are highly similar at the transcriptional level and differ from bone marrow mesenchymal stem cells (MSCs)

Our collaborators conducted lineage tracing experiments to determine whether SKPs and their endogenous dermal precursors originate from the neural crest or, like the dermis itself, from multiple developmental origins. Briefly, they used two mouse Cre lines: Wnt1-cre, which targets cells derived from the neural crest, and Myf5-cre, which targets cells of somitic (mesodermal) origin. By crossing these Cre lines to reporter mice, they showed that the endogenous follicle-associated dermal precursors in the face derive from the neural crest, whereas those in the dorsal trunk derive from the somites (mesoderm), as do the SKPs they generate. The ventral trunk SKPs were found to derive from lateral plate mesoderm. Despite these different developmental origins, facial and trunk SKPs are functionally similar, even in their ability to differentiate into Schwann cells, a cell type thought to be generated only from the neural crest [245]. To comprehensively define the similarities and differences among these developmentally distinct populations of SKPs, I compared global expression profiles of dorsal trunk SKPs, ventral trunk SKPs, facial SKPs, and MSCs, all generated from adult rats; the MSCs served as a control for the mesoderm-derived ventral and dorsal SKPs. The rat model was chosen to allow a direct comparison with the adult rat MSCs that were available to our collaborators. RNA samples purified from the SKPs and MSCs were analyzed on the Affymetrix GeneChip Rat Gene 1.0 ST Array.
After the normalization and filtering described in the Methods, genes with variable expression profiles across facial SKPs, dorsal SKPs, ventral SKPs and MSCs were identified using the multiple group comparison implemented in the LIMMA Bioconductor package [246]. LIMMA was chosen because several studies noted its consistently favorable performance compared to other common methods of microarray data analysis, including Welch's t-test, ANOVA, SAM, and RVM [247,248]. In total, 7,012 of 18,879 genes showed evidence of differential expression across the four groups (Benjamini-Hochberg-corrected q < 0.05). Spearman rank correlations were computed between each sample pair based on the expression profiles of these genes. They showed that dorsal, ventral, and facial SKPs were virtually identical, with an average Spearman rank correlation of 0.94 (SD = 0.031). In contrast, the average Spearman rank correlation between SKPs and MSCs was 0.82 (SD = 0.025) (Figure 2.1A). This difference was statistically significant (two-tailed Student’s t-test, P < 0.0001). Unsupervised hierarchical consensus clustering of the samples on the same variable gene set (correlation distance, average linkage, 100 bootstrap iterations) confirmed these conclusions. The three SKP populations grouped together, and all three were distinct from the MSC samples (Figure 2.1B).

2.2.2 SKPs of distinct developmental origin maintain a lineage history at the gene expression level

To delineate the extent of differences among the transcriptomes of the neural crest-derived facial SKPs, the mesoderm-derived ventral and dorsal SKPs, and the MSCs, I performed a three-way differential expression analysis using linear models implemented in the LIMMA Bioconductor package [237,234].
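The sample-pair comparison reported in Section 2.2.1 compares the mean Spearman correlation within SKPs against the mean between SKPs and MSCs. It can be illustrated with a minimal, standard-library sketch. The sample names, group labels and expression values below are hypothetical and are not the thesis data. Spearman's rho is computed as the Pearson correlation of average ranks. Sample pairs are then split into SKP-SKP and SKP-MSC pairs before averaging.

```python
from itertools import combinations
from statistics import mean, stdev


def rank(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(rank(x), rank(y))


def summarize_pairs(profiles, groups):
    """Return (mean, SD) of rho for SKP-SKP pairs and for SKP-MSC pairs."""
    within, between = [], []
    for a, b in combinations(profiles, 2):
        is_msc_a, is_msc_b = groups[a] == "MSC", groups[b] == "MSC"
        if is_msc_a and is_msc_b:
            continue  # MSC-MSC pairs belong to neither comparison
        rho = spearman(profiles[a], profiles[b])
        (between if is_msc_a or is_msc_b else within).append(rho)
    return (mean(within), stdev(within)), (mean(between), stdev(between))


# Hypothetical log2 expression values for six genes (illustration only).
profiles = {
    "facial": [8.1, 5.2, 9.0, 3.1, 6.5, 7.7],
    "dorsal": [8.0, 5.5, 9.2, 3.0, 6.1, 7.9],
    "ventral": [7.8, 5.1, 9.1, 3.3, 6.4, 7.5],
    "msc_1": [4.0, 8.8, 6.0, 7.5, 3.2, 7.0],
    "msc_2": [4.2, 8.5, 6.3, 7.1, 3.5, 6.8],
}
groups = {"facial": "SKP", "dorsal": "SKP", "ventral": "SKP",
          "msc_1": "MSC", "msc_2": "MSC"}
within_stats, between_stats = summarize_pairs(profiles, groups)
```

The thesis computed these correlations on the 7,012 variable genes and then applied a two-tailed Student's t-test. That test and the bootstrap consensus clustering are not reproduced in this sketch.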
The Venn diagrams show the numbers of significantly differentially expressed genes (Benjamini-Hochberg-corrected q < 0.05) shared among the comparisons (Figure 2.2A). In total, 2,603 genes showed evidence of differential expression in at least one of the three pairwise comparisons; their expression levels are plotted as a heatmap (Figure 2.2B). Of these genes, only 106 differed significantly between dorsal and facial SKPs, whereas 2,233 differed between MSCs and dorsal SKPs and 2,525 between MSCs and facial SKPs. These data are compatible with the interpretation that precursor cells of at least two, and potentially three, different developmental origins converge onto a highly similar phenotype. I therefore directly compared the expression of genes associated with neural crest fate specification [250], focusing on Slug, Snail, Twist, Sox9, Sox10, Foxd3, and Ap2a1. Heatmaps of the microarray data showed that these genes were expressed at similar levels in all three adult rat SKP samples, as were p75NTR and RhoB, which are also associated with neural crest precursors (Figure 2.3A, left panel) [239,240]. Our collaborators performed reverse transcription polymerase chain reaction (RT-PCR) analyses of neonatal mouse skin. These confirmed that the same mRNAs were also expressed at similar levels in neonatal murine dorsal and facial SKPs (Figure 2.3B, left panel). These analyses indicate that mesenchymal precursors of different developmental origins converge to a very similar adult precursor cell phenotype. Nevertheless, pairwise differential expression comparisons using linear models, facial versus dorsal trunk SKPs and dorsal trunk versus ventral trunk SKPs, identified a subset of significantly differentially expressed genes (Benjamini-Hochberg-corrected q < 0.05; Table 2.1).
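The Benjamini-Hochberg correction and the Venn-diagram overlap counts described above can be sketched as follows. This is a minimal illustration that assumes hypothetical raw p-values per gene for three named comparisons; the gene names, comparison labels and values are invented. In the actual analysis the p-values came from LIMMA's moderated statistics, which are not reproduced here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for pos in range(m - 1, -1, -1):
        i = order[pos]
        running_min = min(running_min, pvalues[i] * m / (pos + 1))
        q[i] = running_min
    return q


def significant(pvalues_by_gene, alpha=0.05):
    """Genes whose BH-adjusted q-value falls below alpha."""
    genes = list(pvalues_by_gene)
    qvalues = benjamini_hochberg([pvalues_by_gene[g] for g in genes])
    return {g for g, q in zip(genes, qvalues) if q < alpha}


def venn_regions(sig_sets):
    """Count genes per Venn region, keyed by the comparisons in which a gene is significant."""
    counts = {}
    for gene in set().union(*sig_sets.values()):
        key = frozenset(name for name, genes in sig_sets.items() if gene in genes)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical raw p-values (illustration only).
raw = {
    "facial_vs_dorsal": {"g1": 0.001, "g2": 0.30, "g3": 0.50, "g4": 0.70},
    "msc_vs_dorsal": {"g1": 0.002, "g2": 0.001, "g3": 0.004, "g4": 0.60},
    "msc_vs_facial": {"g1": 0.40, "g2": 0.003, "g3": 0.002, "g4": 0.90},
}
sig_sets = {name: significant(p) for name, p in raw.items()}
regions = venn_regions(sig_sets)
any_comparison = sum(regions.values())  # genes significant in at least one comparison
```

The sum over all Venn regions corresponds to the "differential expression in at least one comparison" total quoted above.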
Of the 35 most differentially expressed genes in the facial versus dorsal comparison, 10 were higher in dorsal SKPs and 25 in facial SKPs. Of the 35 most differentially expressed genes in the dorsal versus ventral comparison, four were higher in dorsal SKPs and 31 in ventral SKPs (Table 2.1). Many of these genes play important roles during embryogenesis. In particular, dorsal trunk SKPs expressed high levels of the Zic1 transcription factor relative to both facial and ventral trunk SKPs. The Hox transcription factors Hoxa5, Hoxc4, Hoxc6, and Hoxc9 were also high in dorsal trunk SKPs relative to facial SKPs (Figure 2.3B; Table 2.1). In contrast, facial SKPs expressed high relative levels of Pax3 and Msx1, both transcription factors associated with cranial neural crest cells [253]. They also expressed high levels of Mab-21-like 1 and Mab-21-like 2, mammalian homologues of the C. elegans cell fate gene mab-21, which is expressed during embryogenesis [254] (Figure 2.3A, right panel; Table 2.1). Our collaborators confirmed the relative enrichment of these mRNAs by RT-PCR analysis of neonatal mouse dorsal versus facial SKPs (Figure 2.3B, right panel). Thus, although these different dermal precursor populations are highly similar, they maintain a history of their distinct developmental origins.

2.2.3 Identification of genes significantly enriched and depleted in neural crest stem cell-like cells

We reported in Sections 2.2.1 and 2.2.2 that the expression analysis of SKPs, complementing laboratory studies by our collaborators, was compatible with SKPs deriving from different developmental origins and converging onto a common neural crest stem cell-like phenotype. We next set out to identify transcripts that are enriched and depleted in SKPs from all three developmental origins compared to MSCs.
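The aim stated above is to find transcripts enriched or depleted in SKPs of all three origins relative to MSCs. This amounts to intersecting the three pairwise results while requiring a consistent direction of change. The minimal sketch below assumes each comparison is summarized as a hypothetical mapping from gene to a pair of values: the log fold change of SKPs over MSCs, and the BH-adjusted q-value. The gene names and numbers are invented for illustration.

```python
def consistent_genes(results, alpha=0.05):
    """Split genes significant in every comparison into enriched and depleted sets.

    results maps each comparison name to {gene: (log_fc, q)}, where log_fc is
    SKP minus MSC. A gene counts only if q < alpha in all comparisons and its
    fold change has the same sign in all of them.
    """
    tables = list(results.values())
    shared = set(tables[0]).intersection(*tables[1:])
    enriched, depleted = set(), set()
    for gene in shared:
        stats = [table[gene] for table in tables]
        if not all(q < alpha for _, q in stats):
            continue
        if all(fc > 0 for fc, _ in stats):
            enriched.add(gene)
        elif all(fc < 0 for fc, _ in stats):
            depleted.add(gene)
    return enriched, depleted


# Hypothetical LIMMA-style summaries (illustration only).
results = {
    "facial_vs_msc": {"geneA": (2.0, 0.001), "geneB": (-1.5, 0.01), "geneC": (1.0, 0.20)},
    "dorsal_vs_msc": {"geneA": (1.2, 0.010), "geneB": (-0.8, 0.03), "geneC": (1.1, 0.01)},
    "ventral_vs_msc": {"geneA": (0.9, 0.040), "geneB": (-1.1, 0.002), "geneC": (0.7, 0.01)},
}
enriched, depleted = consistent_genes(results)
```

In this toy example, geneC is excluded because it is not significant in the facial comparison. A gene that is significant everywhere but changes sign would be excluded for the same reason.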
These transcripts represent markers enriched and depleted in normal neural crest stem cell-like cells compared to another type of multipotent somatic stem cell, the bone marrow-derived MSCs. The MSCs provide a suitable comparator for SKPs for two reasons. First, they are one of the few somatic stem cell lineages whose developmental potential is as broad as that of the neural crest [230,255]. Second, MSCs derive from the mesoderm [256], which, as discussed in Sections 2.1 and 2.2.1, is also the lineage of origin of ventral and dorsal SKPs. To identify transcripts enriched in each of facial SKPs, ventral trunk SKPs and dorsal trunk SKPs compared to MSCs, I performed pairwise differential expression analysis using linear models implemented in the LIMMA Bioconductor package [246,249]. In these pairwise comparisons, 3,406 genes were significantly differentially expressed between ventral trunk SKPs and MSCs, 2,793 between facial SKPs and MSCs, and 2,424 between dorsal SKPs and MSCs (Benjamini-Hochberg-corrected q < 0.05). The results of the pairwise comparisons were then combined to identify genes that were significantly enriched or depleted in SKPs compared to MSCs in all three comparisons. In total, 654 genes were enriched and 752 were depleted in all three SKP lineages compared to MSCs. These genes are listed in Appendix A, and their expression is plotted as a heatmap in Figure 2.4.

2.2.4 Pathway analysis of SKP-enriched and SKP-depleted transcripts

To characterize the functions of transcripts differentially abundant in SKPs compared to MSCs, I conducted a pathway enrichment analysis using the Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com). Using the rat-derived gene lists, 618 of 654 and 681 of 752 genes could be annotated by the Ingenuity Knowledgebase.
When the rat-derived gene lists were converted to human orthologs, however, 624 of 654 enriched genes and 696 of 752 depleted genes could be annotated in the Ingenuity database. Notably, the functional enrichment results for the rat-derived gene lists were almost identical to those for the human orthologs. The human ortholog results are reported here because that analysis included more annotated genes. After Benjamini-Hochberg correction, 14 canonical pathways were significantly enriched among the genes upregulated in SKPs (Figure 2.5A, Table 2.2A), and 13 canonical pathways were enriched among the genes downregulated in SKPs (Figure 2.5C, Table 2.2B). The majority (9 of 14) of the pathways upregulated in SKPs compared to MSCs involved WNT/Beta-Catenin, bone morphogenetic protein (BMP) or transforming growth factor beta (TGFB) signaling (Table 2.2A). These include the canonical pathways “Role of Osteoblasts, Osteoclasts and Chondrocytes in Rheumatoid Arthritis”, “Axonal Guidance Signaling”, “Colorectal Cancer Metastasis Signaling”, “Human Embryonic Stem Cell Pluripotency”, “WNT/Beta-Catenin Signaling”, “Role of Macrophages, Fibroblasts and Endothelial Cells in Rheumatoid Arthritis”, “Molecular Mechanisms of Cancer”, and “Leukocyte Extravasation Signaling”. Canonical WNT/Beta-Catenin signaling on its own induces differentiation of the neural crest along the sensory neural lineage. BMP signaling was shown to antagonize this activity, so that the presence of both BMP and WNT maintains the neural crest stem cell phenotype and multipotency [257]. The finding that WNT/Beta-Catenin and BMP signaling account for the majority of pathways upregulated in SKPs is therefore consistent with the neural crest stem cell-like phenotype of these cells. It also distinguishes them from other, non-neural crest multipotent stem cells.
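Ingenuity's scoring is proprietary. It is commonly described as a right-tailed Fisher's exact test, but that is an assumption here, not something confirmed by this chapter. The sketch below therefore only illustrates the generic idea behind a canonical-pathway over-representation test. It uses a right-tailed hypergeometric test (equivalent to a one-sided Fisher's exact test) to ask whether a gene list contains more members of a pathway than expected by chance, given a background of annotated genes. The background, pathway memberships and query list are hypothetical. The resulting p-values would then be Benjamini-Hochberg-corrected, as in the analysis above.

```python
from math import comb


def hypergeom_right_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are pathway members."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def pathway_pvalues(query, pathways, background):
    """Right-tailed enrichment p-value per pathway, restricted to the annotated background."""
    background = set(background)
    query = set(query) & background
    N, n = len(background), len(query)
    pvalues = {}
    for name, members in pathways.items():
        members = set(members) & background
        k = len(query & members)  # pathway members observed in the query list
        pvalues[name] = hypergeom_right_tail(k, n, len(members), N)
    return pvalues


# Hypothetical background of 10 annotated genes and two pathways (illustration only).
background = [f"g{i}" for i in range(10)]
pathways = {"pathway_A": ["g0", "g1", "g2"], "pathway_B": ["g3", "g4", "g5", "g6"]}
query = ["g0", "g1", "g2"]
pvals = pathway_pvalues(query, pathways, background)
```

Restricting both the query and the pathway members to the annotated background mirrors the fact that only annotated genes (for example, 624 of 654) can contribute to the test.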
This finding also validates the computational approach used to identify the pathways associated with transcripts enriched and depleted in SKPs compared to MSCs. Figure 2.5B shows an adaptation of the Ingenuity canonical pathway “Human Embryonic Stem Cell Pluripotency”, which outlines WNT/Beta-Catenin and BMP signaling along with other key stemness molecules; genes upregulated in SKPs are highlighted in red. In contrast, the majority (9 of 13) of the pathways downregulated in SKPs compared to MSCs were involved in cell cycle control or DNA repair (Table 2.2B). These include the canonical pathways “Hereditary Breast Cancer Signaling”, “Role of BRCA1 in DNA Damage Response”, “ATM Signaling”, “DNA Double-Strand Break Repair by Homologous Recombination”, “Mitotic Roles of Polo-Like Kinase”, “Cell Cycle Control of Chromosomal Replication”, “Cell Cycle: G2/M DNA Damage Checkpoint Regulation”, “Role of CHK proteins in Cell Cycle Checkpoint Control”, and “Molecular Mechanisms of Cancer”. Intriguingly, the breast cancer 1, early onset (BRCA1) gene was involved in 7 of these 9 pathways (Table 2.2B), suggesting a central role in the regulatory network downregulated in SKPs compared to MSCs. An adaptation of the Ingenuity canonical pathway “Role of BRCA1 in DNA Damage Response” is depicted in Figure 2.5D, with the genes downregulated in SKPs highlighted in green.

2.2.5 SKPs share expression profile similarities with ES cells

2.2.5.1 Identification of genes associated with the maintenance of the undifferentiated state in human ES cells

In Sections 2.2.3 and 2.2.4 I reported on similarities and differences in gene expression between two lineages of multipotent somatic stem cells with broad developmental potential, SKPs and MSCs [258,230,255].
This analysis revealed that the Ingenuity pathway annotation “Human Embryonic Stem Cell Pluripotency” was significantly enriched among transcripts expressed at a higher level in all three SKP lineages compared to MSCs (Table 2.2A). This suggests that SKPs have more features of a pluripotent cell than MSCs do. The result is consistent with the observation that neural crest stem cells arguably have the broadest developmental potential among somatic stem cell types. They generate diverse cell types, including smooth muscle cells, dermis, tendons, connective tissues, sensory neurons and dorsal root ganglia of the peripheral nervous system, Schwann cells, pigment cells, and neuroendocrine cells of the adrenal medulla. Since the expression of pluripotency markers was a distinguishing feature of SKPs compared to MSCs, we aimed to further define the extent to which SKPs resemble human ES cells. To this end, I set out to identify a comprehensive list of genes that may be associated with the maintenance of pluripotency in human ESCs. The Connie Eaves laboratory selected 319 candidate pluripotency genes based on their potential role in stem cell biology and on a review of the literature. These genes are involved in transcription, chromatin maintenance, and membrane receptor signaling and are listed in Appendix B. To determine which of the 319 candidate genes might be most tightly linked to the maintenance of pluripotency, I analyzed their representation in 17 LongSAGE libraries (Table 2.3). The libraries were derived from undifferentiated and differentiated human ES cells, as well as from several adult tissues. I used the SAGE Genie tool for gene-to-tag mapping, so that each gene was represented by one LongSAGE tag [259].
Following the mapping, tag counts were normalized to a depth of 100,000 tags, and a heuristic approach termed seriation was used to identify groups of genes with similar expression levels across libraries. Seriation was chosen because it was shown to perform favorably compared to PoissonC, a clustering algorithm commonly used for SAGE data analysis, when relatively small numbers of genes (under 5,000) are studied [260]. Our implementation of seriation relies on the visualization of contigs to identify co-expressed genes. The approach is therefore suitable for targeted analyses, such as studying the expression of a selection of genes, but not for unsupervised genome-wide analysis. Seriation is a statistical method for simultaneously ordering the rows and columns of a symmetric distance matrix to reveal an underlying one-dimensional structure [261]. Seriation analysis assumes that the data contain a biologically meaningful order (or distinct sub-orders). The inherent order may represent any sequential structure among the data (e.g. dependence on time or another variable). In the application described here, I hypothesized that the sequential structure in the expression data was the developmental restriction of candidate pluripotency gene expression to the undifferentiated ES cell libraries. The seriation analysis identified three categories of genes (Figure 2.6A, Appendix B); in the original publication we termed these higher-order structures “Supercontigs” [260]. Supercontig 1 contained 114 genes whose expression was restricted to undifferentiated ES cells. This group contained tags for POU5F1, NANOG, SOX2, FOXD3 and other genes whose expression is known to decrease upon human ES cell differentiation.
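The normalization step and the general idea of seriation can be illustrated together. The sketch below scales raw tag counts to tags per 100,000. It then orders genes with a simple greedy nearest-neighbour heuristic, so that genes with similar profiles end up adjacent. This is an assumed, generic stand-in and not the visualization-driven seriation implementation described in [260]. The distance function (Euclidean distance on log2-transformed normalized counts), the greedy ordering, and all gene names and counts are illustrative assumptions.

```python
from math import log2, sqrt


def normalize(tag_counts, library_sizes, depth=100_000):
    """Scale each gene's raw tag counts to tags per `depth` tags in each library."""
    return {gene: [c * depth / size for c, size in zip(counts, library_sizes)]
            for gene, counts in tag_counts.items()}


def profile_distance(x, y):
    """Euclidean distance between log2(count + 1) profiles (an illustrative choice)."""
    return sqrt(sum((log2(a + 1) - log2(b + 1)) ** 2 for a, b in zip(x, y)))


def greedy_seriation(dist):
    """Order items so that similar ones are adjacent, using a greedy nearest-neighbour chain.

    Starts from an endpoint of the most dissimilar pair and repeatedly appends
    the closest remaining item (ties broken by index).
    """
    n = len(dist)
    start = max(range(n), key=lambda i: max(dist[i]))
    order, remaining = [start], set(range(n)) - {start}
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda j: (dist[last][j], j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical counts in three libraries: undifferentiated ES, differentiated ES, adult tissue.
library_sizes = [200_000, 150_000, 250_000]
tag_counts = {
    "geneW": [60, 2, 1],
    "geneX": [2, 45, 70],
    "geneY": [50, 3, 0],
    "geneZ": [1, 40, 65],
}
norm = normalize(tag_counts, library_sizes)
genes = list(norm)
dist = [[profile_distance(norm[a], norm[b]) for b in genes] for a in genes]
seriated = [genes[i] for i in greedy_seriation(dist)]
```

In this toy example, the two genes high in the undifferentiated ES library end up adjacent, and so do the two genes high in differentiated cells. This loosely mirrors how contiguous blocks such as the Supercontigs emerge from an ordered matrix.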
The average expression levels of these 114 genes in the 17 LongSAGE libraries are plotted in Figure 2.6B. A second group (Supercontig 2) consisted of 145 genes whose transcript abundance was increased in differentiated cells. The third group (Supercontig 3) contained the remaining 60 genes, whose expression patterns did not fit either of the two categories.

2.2.5.2 Validation of pluripotency markers using computational methods

Since the genes in Supercontig 1 were preferentially expressed in undifferentiated human ES cells, I hypothesized that they would be involved in pathways associated with human ES cell pluripotency. To test this hypothesis, I used Ingenuity software (Ingenuity Systems, www.ingenuity.com) to identify canonical pathways significantly enriched among the 114 genes in Supercontig 1 (Appendix B). After correction for multiple testing, four Ingenuity canonical pathways remained significantly enriched among these transcripts (Figure 2.7A): “Role of Oct4 in Mammalian Embryonic Stem Cell Pluripotency”, “Role of NANOG in Mammalian Embryonic Stem Cell Pluripotency”, “Human Embryonic Stem Cell Pluripotency”, and “Actin Cytoskeleton Signaling”. As expected, these pathways appeared to be associated with the maintenance of pluripotency. Because the expression of the 114 Supercontig 1 transcripts was restricted to undifferentiated ES cells, we further hypothesized that their promoters, but not those of genes in the other Supercontigs, contained binding sites for the core transcription factors OCT/POU, SOX2, and NANOG, which are known to be required for the maintenance of pluripotency in cell culture [262]. To address this possibility, we used PASTAA software to analyze the promoters of the genes in all three Supercontigs [263].
PASTAA interrogates groups of co-expressed genes and ranks the likelihood that they are regulated by a given transcription factor, based on the presence of known transcription factor binding sites. Two separate PASTAA analyses were performed on each of the three Supercontigs. One interrogated a region extending 10 kb upstream of the transcription start site (distal analysis); the other interrogated a region 400 bp on each side of the transcription start site (proximal analysis). PASTAA analysis of Supercontig 1 (the 114 undifferentiated human ES cell-specific genes) showed SOX (P = 0.0018) and FOX (P = 0.041) position weight matrices (PWMs) to be highly ranked in the distal analysis. NANOG (P = 0.013) and OCT/POU (P = 0.024) PWMs were highly ranked in the proximal analysis (Figure 2.7C and B, respectively). Notably, the PASTAA data predicted binding of the core pluripotency transcription factors NANOG, OCT/POU and SOX to their own and each other’s promoters, as expected [262]. Several PWMs scored higher than NANOG in the proximal analysis (hollow circles to the left of the NANOG PWM in Figure 2.7B). These included PWMs representing binding sites for nuclear factor Y (NFY), a regulator of the stem cell gene CD34 [264]; transcription factor AP2, which is expressed during early development [265] and required for the development of the sympathetic lineage [266]; TBX5, a regulator of limb development; and the neuronal regulator ELK1 [267]. In the distal analysis, PWMs for cAMP response element-binding (CREB) family members involved in neurogenesis [268] also scored highly (hollow circles between the ATF and FOX PWMs in Figure 2.7C), in addition to the PWMs for the core pluripotency factors. PASTAA analysis of Supercontig 2 also predicted a FOX motif in the distal promoter region of these genes (P = 0.0185) and a NANOG motif in the proximal promoter region (P = 0.00786).
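PASTAA's own scoring is not reproduced here; to our understanding it ranks factors using predicted binding affinities rather than single best hits. As a hedged illustration of what a position weight matrix (PWM) is and how it scores a promoter sequence, the sketch below does two things. It converts hypothetical motif counts into log2-odds scores against a uniform background, using a pseudocount of 0.5 (an assumed setting). It then finds the best-scoring window on one strand. The motif, counts and sequence are invented. A real scan would also consider the reverse complement and use curated matrix collections.

```python
from math import log2

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_hit(sequence, pwm):
    """Return (start position, score) of the highest-scoring window on the given strand.

    The sequence must be uppercase and contain only A, C, G and T.
    """
    width = len(pwm)
    scores = [(i, sum(pwm[j][sequence[i + j]] for j in range(width)))
              for i in range(len(sequence) - width + 1)]
    return max(scores, key=lambda hit: hit[1])


# Hypothetical 4-position motif built from ten aligned sites reading "ACGT".
counts = [{"A": 10}, {"C": 10}, {"G": 10}, {"T": 10}]
pwm = log_odds_pwm(counts)
position, score = best_hit("TTTTACGTTT", pwm)
```

A positive score means the window resembles the motif more than random background sequence would. This per-site score is the raw ingredient that promoter-set methods such as PASTAA aggregate in more sophisticated ways.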
Analysis of Supercontig 3 predicted FOX motifs in both the distal (P = 0.0404) and proximal (P = 0.0327) promoter regions of these genes, but no NANOG, OCT/POU, or SOX motifs in either region. In conclusion, only the upstream regions of genes in Supercontig 1, and not those in the other Supercontigs, contained binding sites for all of the core pluripotency transcription factors NANOG, OCT/POU and SOX, which are required for the maintenance of the undifferentiated state in cell culture. This finding is consistent with the restricted expression of these genes in undifferentiated ES cells and provides additional evidence for their candidate role in the regulation of pluripotency.

2.2.5.3 Pluripotency genes whose transcripts are enriched or depleted in normal neural crest stem cell-like cells compared to mesenchymal stem cells

In Section 2.2.5.1 I used seriation of LongSAGE libraries to identify 114 candidate pluripotency genes whose expression was restricted to undifferentiated human ES cells. The computational analyses in Section 2.2.5.2 supported their candidate role in pluripotency. To assess whether these pluripotency markers were among the transcripts enriched or depleted in SKPs with respect to MSCs, I compared the genes in Supercontig 1 to those identified in Section 2.2.3 and listed in Appendix A. This analysis revealed 5 pluripotency markers among the genes enriched in SKPs compared to MSCs (CTNNB1, ETV4, MAD2L2, PITX2, SOX2). It also revealed 13 pluripotency markers among the genes depleted in SKPs compared to MSCs (ADAM23, AURKB, CENPK, FAM46B, FAM64A, HMGB2, IGF2BP3, KPNA2, MTHFD1, MYBL2, TBX4, TPM1, ZFP57) (Table 2.4). Two known pluripotency genes were among the transcripts enriched in SKPs compared to MSCs: CTNNB1, a member of the WNT signaling pathway, and SOX2, one of the master regulators of pluripotency [262].
In addition, AURKB was depleted in SKPs compared to MSCs. AURKB is a kinase known to interact with BRCA1-associated RING domain protein 1 (BARD1), a member of the double-strand break repair pathway [269]. Intriguingly, we show in Chapter 3 that AURKB is expressed in NBL tumor-initiating cells and is a drug target for high-risk NBL. These observations suggest that SKPs share expression profile similarities with ESCs and have a broader developmental potential than MSCs. However, SKPs differ from ESCs in the identity of the pluripotency markers they express, which highlights the uniqueness of the neural crest stem cell phenotype.

2.3 Discussion

NBL originates from the sympathoadrenal lineage, which is thought to derive from sequential differentiation of the neural crest stem cell [226,173,229]. Sympathoadrenal precursors go on to develop into the neuroendocrine cells of the adrenal medulla, the most common primary site of NBL [174]. A correlation has been noted between the differentiation state of NBL cells and the clinical aggressiveness of the disease. Cells of the most aggressive, high-risk subtype resemble the most primitive neural crest precursors, while cells of low-risk subtypes resemble various stages of neural crest differentiation [175]. Several questions may therefore shed light on the molecular events that contribute to the development of NBL: how the sympathoadrenal lineage arises from early neural crest precursors, what the origin and phenotype of the neural crest stem cell are, and how this cell transforms into its malignant counterpart. In Sections 2.2.1 and 2.2.2 I built on the lineage tracing experiments conducted by our collaborators to help characterize the developmental origin of different populations of neural crest stem cell-like cells (SKPs) that possess many properties of the somatic neural crest stem cell.
The lineage tracing work found that skin-derived neural crest stem cell-like cells originate from different developmental lineages depending on the part of the body from which they are derived. However, my gene expression analysis showed that these cell populations have remarkably similar gene expression profiles. Only a small subset of genes was significantly differentially expressed between facial and dorsal SKPs and between dorsal and ventral SKPs; the 35 most differentially expressed genes in each comparison are listed in Table 2.1. In contrast, thousands of genes were differentially expressed between each SKP lineage and another multipotent somatic stem cell lineage, the MSCs. The MSCs were chosen for this comparison to represent the mesoderm, the embryonic origin of dorsal and ventral SKPs. This result sets a precedent for the origin of a somatic stem cell from a different tissue type. In this case, mesoderm-derived SKPs from the dorsal and ventral trunk appear to converge onto a neural crest stem cell phenotype similar to that of neural crest-derived facial SKPs. This observation may be significant for the genesis of NBL, which is thought to derive from the neural crest. Since neural crest-like cells can derive from a non-neural crest lineage, neural crest-like cells of non-neural crest origin might also give rise to NBL.

Conceptual support for lineage convergence in nature, specifically as applied to the neural crest, comes from several tissues, including the gut and respiratory epithelium. These tissues contain neuroendocrine cells (typical neural crest derivatives) of non-neural crest origin [270], supporting the existence of a second developmental pathway that converges onto a neural crest-like phenotype. Further support for lineage convergence comes from the observation that pluripotent stem cells can be produced from germ cells [271] and even from somatic fibroblasts by in vitro reprogramming [242].
Intriguingly, just as reported here for SKPs, pluripotent cells obtained from different developmental origins (ES cells, reprogrammed germ cells and reprogrammed somatic cells) have similar expression profiles and functional characteristics but retain epigenetic and expression marks indicative of their primary origin [271]. Having shown that neural crest stem cell-like SKPs derive from different developmental origins, I went on to identify genes and pathways that distinguish the common neural crest stem cell-like phenotype of the three SKP lineages from the MSCs. The MSCs are a multipotent somatic stem cell lineage derived from the mesoderm, which is also the origin of the dorsal and ventral SKP lineages. Mesenchymal stem cells represent one of the few somatic stem cell lineages that are similar to the neural crest in terms of developmental potential. MSCs can reportedly differentiate into a wide array of tissue types, including osteoblasts (bone), adipocytes (fat) and chondrocytes (cartilage), as well as myocytes (muscle cells) and neurons [230,255]. Differential expression analysis comparing the transcriptomes of SKPs and MSCs showed that most pathways associated with transcripts increased in abundance in SKPs involved three core neural crest stem cell signaling pathways: WNT/Beta-Catenin, bone morphogenetic protein (BMP) and transforming growth factor beta (TGFB) signaling [272]. The expression of both BMP and WNT/Beta-Catenin pathway members was consistent with the neural crest stem cell phenotype of SKPs, as the coordinated activity of these two pathways is required to maintain the undifferentiated state of neural crest stem cells [257]. In contrast, the majority of pathways associated with transcripts whose abundance was decreased in SKPs involved DNA double-strand break repair and cell cycle control.
In particular, BRCA1 participated in many of the DNA repair pathways found to be significantly enriched among the SKP-depleted (MSC-enriched) transcripts. This suggests that BRCA1 plays a central role in the functional network of molecules relatively increased in abundance in MSCs compared to SKPs. The observation is consistent with recent findings that MSCs resist irradiation in part by activating double-strand break repair through homologous recombination and non-homologous end joining (NHEJ), both governed by BRCA1 [273]. As discussed in Section 2.2.5.3, AURKB belongs to Supercontig 1, which contains genes whose expression pattern was restricted to undifferentiated ES cells. AURKB is a kinase linked to the BRCA1 signaling pathway through its interaction with BARD1 [269]. As reported in Section 2.2.5.3, the mRNA expression level of this pluripotency-associated kinase was decreased in SKPs compared to MSCs. This suggests that the increased expression of BRCA1 DNA repair pathway members in MSCs relative to SKPs resembles that observed in ES cells. In conclusion, I addressed the overall goal of this Chapter: to characterize the expression profiles of SKP lines used as models of neural crest progenitors and as normal counterparts of neuroblastoma cells [243,233]. In particular, I found that signaling pathways specifically implicated in neural crest stem cells, such as WNT/Beta-Catenin, BMP and TGFB signaling, were preferentially expressed in the neural crest stem cell-like cells compared to a mesodermal multipotent cell lineage. A novel finding from our work is the relative decrease in gene-level expression of members of the BRCA1-dependent DNA double-strand break repair pathways in SKPs compared to MSCs and ESCs. The molecular mechanism underlying this observation, as well as its functional significance, remains to be addressed by future work.
Of note, members of the same pathway were expressed at higher levels in NBL tumor-initiating cells than in SKPs (Chapter 3).

2.4 Materials and methods

2.4.1 Microarray analysis of rat SKP lines

RNA was prepared from twice-passaged adult rat dorsal, facial, and ventral SKPs and MSCs using Trizol (Invitrogen), as per the manufacturer's instructions, followed by the RNeasy Mini Kit (Qiagen, Venlo, Netherlands, http://www.qiagen.com). Three independent isolates each of dorsal trunk SKPs, ventral trunk SKPs and facial SKPs, and four independent isolates of MSCs, were used for the microarray study. The independent isolates were obtained from different animals. The RNA samples were analyzed on Affymetrix GeneChip Rat Gene 1.0 ST Arrays (Affymetrix, Santa Clara, CA, http://www.affymetrix.com). The data were checked for batch effects, background corrected and quantile-normalized according to the standard Robust Multichip Average (RMA) procedure using the Affymetrix Expression Console software. The gene expression data were annotated using R. norvegicus genome build rn4. Subsequent statistical analysis was conducted using R version 2.8.1. Microarray data were deposited in the NIH GEO repository (accession number GSE23954).

2.4.2 Unsupervised analysis to assess global transcriptome similarity

Genes with variable expression patterns across the four groups (facial SKPs, dorsal SKPs, ventral SKPs, and MSCs) were identified using the multiple group comparison implemented in the Linear Models for Microarray Data (LIMMA) Bioconductor package version 2.16.5. This package was chosen because it was reported to perform favorably compared to other common approaches for microarray data analysis, including SAM, Welch's t-test, ANOVA and Wilcoxon's test [247]. The LIMMA method first builds a linear model for each gene using the number of parameters (experimental groups) defined by the user.
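The following minimal Python/NumPy sketch illustrates this first step. It fits a group-means linear model to every gene at once by ordinary least squares. It is not the LIMMA implementation (LIMMA's lmFit is an R function). The function name fit_group_means and the toy data are hypothetical. The sketch also assumes one normalized log2 expression value per gene and array, with no array weights and no missing values.

```python
import numpy as np


def fit_group_means(expr, groups):
    """Fit a group-means linear model to every gene by least squares.

    expr:   genes x arrays matrix of normalized log2 expression values
    groups: one group label per array (column of expr)
    Returns the group labels, per-gene coefficients (one per group),
    per-gene residual variances and the residual degrees of freedom.
    """
    expr = np.asarray(expr, dtype=float)
    levels = sorted(set(groups))
    # design matrix: one indicator column per experimental group
    design = np.array([[1.0 if g == lev else 0.0 for lev in levels] for g in groups])
    # solve design @ beta = expression for all genes in a single call
    beta, _, _, _ = np.linalg.lstsq(design, expr.T, rcond=None)
    resid = expr.T - design @ beta
    df = design.shape[0] - design.shape[1]    # arrays minus parameters
    s_sq = (resid ** 2).sum(axis=0) / df       # residual variance per gene
    return levels, beta.T, s_sq, df


# Hypothetical toy data: 2 genes, 3 arrays in group "a" and 3 in group "b"
expr = np.array([[1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
                 [5.0, 5.5, 6.0, 5.0, 5.5, 6.0]])
levels, coef, s_sq, df = fit_group_means(expr, ["a", "a", "a", "b", "b", "b"])
log_fc_b_vs_a = coef[:, 1] - coef[:, 0]   # contrast: group b minus group a
```

With a group-means design, the coefficients are simply the group means, so a contrast such as facial SKPs versus MSCs is the difference between two coefficients. For the four-group design used here (13 arrays, four groups), the residual degrees of freedom would be 13 - 4 = 9.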
A moderated t-statistic is then used to test the hypothesis that each parameter coefficient for each gene is equal to zero [249]. This statistic uses a weighted average of each gene's variance and a common variance estimated from all genes. The estimated coefficients represent the fold changes between the experimental groups. For the multiple group comparison described in Section 2.2.1 (facial SKPs, dorsal SKPs, ventral SKPs, and MSCs), the null hypothesis was that all parameter coefficients were equal to zero, in other words, that there were no differences among the experimental groups [249]. This hypothesis was tested with the moderated F-statistic computed by the eBayes function, with Benjamini-Hochberg (BH) multiple testing correction. The parameter coefficients were defined by the contrasts between each pair of groups: facial SKPs versus dorsal SKPs, facial SKPs versus ventral SKPs, facial SKPs versus MSCs, dorsal SKPs versus ventral SKPs, dorsal SKPs versus MSCs, and ventral SKPs versus MSCs. Genes with BH-corrected q < 0.05 were considered statistically significant. A total of 7,012 out of 18,879 genes showed evidence of differential expression among the four groups (Benjamini-Hochberg false discovery rate-corrected q < 0.05), and these genes were used in the correlation analysis and unsupervised consensus hierarchical clustering. Unsupervised hierarchical cluster analysis was conducted using the R package Pvclust version 1.2-1 with 100 bootstrap iterations. The Spearman rank correlation matrix was computed using the standard R cor function and plotted as an image using the custom function myImagePlot (available at www.phaget4.org/R/myImagePlot.R).

2.4.3 Differential expression analysis using microarrays

The preprocessed data were analyzed using the LIMMA Bioconductor package to identify genes that show significant evidence of differential expression in each pairwise or multiple group comparison described in the text [249].
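The minimal sketch below illustrates the moderated t-statistic described in Section 2.4.2. It follows the spirit of LIMMA's empirical Bayes step (the eBayes function) and is not a reimplementation of it. It assumes that the prior degrees of freedom d0 and prior variance s0_sq are already known. LIMMA estimates these from the distribution of variances across all genes, a step the sketch omits. The function name and example values are hypothetical.

```python
import numpy as np


def moderated_t(estimate, s_sq, d, unscaled_se, d0, s0_sq):
    """Moderated t-statistics for one contrast across many genes.

    estimate:    per-gene contrast estimates (log2 fold changes)
    s_sq:        per-gene residual variances from the linear model fit
    d:           residual degrees of freedom of that fit
    unscaled_se: standard error of the contrast when the residual SD is 1
    d0, s0_sq:   prior degrees of freedom and prior variance (assumed known)
    """
    estimate = np.asarray(estimate, dtype=float)
    s_sq = np.asarray(s_sq, dtype=float)
    # shrink each gene's variance towards the prior: a df-weighted average
    s_post_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    t = estimate / (np.sqrt(s_post_sq) * unscaled_se)
    return t, d + d0      # the moderated t has d + d0 degrees of freedom


# Two groups of three arrays: the contrast's unscaled SE is sqrt(1/3 + 1/3)
se = np.sqrt(1 / 3 + 1 / 3)
t_ordinary, _ = moderated_t([1.0, 1.0], [0.01, 1.0], d=4, unscaled_se=se, d0=0, s0_sq=1.0)
t_moderated, df = moderated_t([1.0, 1.0], [0.01, 1.0], d=4, unscaled_se=se, d0=4, s0_sq=1.0)
```

With d0 = 0 the statistic reduces to the ordinary t-statistic. With d0 > 0, genes with very small sample variances (like the first toy gene) are pulled towards the prior variance. This shrinkage stabilizes inference when each group has only three or four arrays.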
The LIMMA method first builds a linear model (lmFit function) for each gene using the number of parameters (experimental groups) defined by the user. A moderated t-statistic, computed using a weighted average of the variance of each gene and the variance of all genes, is then used to test the hypothesis that each parameter coefficient for each gene is equal to zero [249]. The estimated coefficients represent the fold changes between the experimental groups. Genes whose coefficients differ significantly from zero by the moderated t-test are considered differentially expressed. A BH-corrected q < 0.05 was used as the threshold for statistical significance. Some comparisons involved more than two groups, such as that described in Section 2.2.2 (ventral SKPs, dorsal SKPs, and MSCs). For these, the null hypothesis was that all parameter coefficients were equal to zero, in other words, that there were no differences among the experimental groups [249]. This hypothesis was tested with the moderated F-statistic computed by the eBayes function, with BH multiple testing correction (an analysis similar to ANOVA). Genes with BH-corrected q < 0.05 were considered statistically significant unless stated otherwise.

2.4.4 Reverse Transcription Polymerase Chain Reaction (RT-PCR) to confirm results from SKP microarray analysis

RNA was prepared from twice-passaged neonatal mouse dorsal and facial SKPs using Trizol (Invitrogen). RNA from sorted, uncultured mouse skin cells was prepared using the Cells-to-cDNA II kit (Ambion/Applied Biosystems, Austin, TX, http://www.ambion.com), as per the manufacturer's instructions, followed by the RNeasy Mini Kit (Qiagen). For all analyses, controls were performed without reverse transcriptase. PCR reactions were performed as follows: 94°C, 2 minutes; 25–35 cycles of 94°C, 15 seconds; gene-specific annealing temperature for 30 seconds; and 72°C for 30 seconds.
Primers used in this study were as follows:
Ap2a1: 5′-TCCCTGTCCAAGTCCAACAGCAAT-3′ and 5′-AAATTCGGTTTCGCACACGTACCC-3′
Eya1: 5′-CTAACCAGCCCGCATAGCCG-3′ and 5′-TAGTTTGTGAGGAAGGGGTAGG-3′
Foxd3: 5′-TCTTACATCGCGCTCATCAC-3′ and 5′-TCTTGACGAAGCAGTCGTTG-3′
Gapdh: 5′-CGTAGACAAAATGTGAAGGTCGG-3′ and 5′-AAGCAGTTGGTGGTGCAGGATG-3′
Hoxa5: 5′-TAGTTCCGTGAGCGAACAATTC-3′ and 5′-GCTGAGATCCATGCCATTGTAG-3′
Hoxc4: 5′-AACCCATAGTCTACCCTTGGATGA-3′ and 5′-CGGTTGTAATGAAACTCTTTCTCTAATTC-3′
Hoxc6: 5′-ACGTCGCCCTCAATTCCA-3′ and 5′-CTGAGCTACGGCTGCTCCAT-3′
Hoxc9: 5′-TGTAGCGATTTTCCGTCCTGTAG-3′ and 5′-CCGTAAGGGTGATAGACCACAGA-3′
Mab21l1: 5′-CCCCAACATGATCGCGGCCCAGGCC-3′ and 5′-CCTCCTTCAGGACGTCGGAGACCAC-3′
Mab21l2: 5′-CCCCAACATGATCGCCGCTCAGGCC-3′ and 5′-CGGGGCTCTTGCACCTCCACTTCC-3′
Msx1: 5′-CGGGCGCCTCACTCTACAGT-3′ and 5′-TCCCGCTGCTCTGCTCAAA-3′
p75NTR: 5′-GTGTTCTCCTGCCAGGACAA-3′ and 5′-GCAGCTGTTCCACCTCTTGA-3′
Pax3: 5′-TGCCCTCAGTGAGTTCTATCAGC-3′ and 5′-GCTAAACCAGACCTGCACTCGGGC-3′
Rhob: 5′-AAGACGTGCCTGCTGATCGTG-3′ and 5′-CTTGCAGCAGTTGATGCAGCC-3′
Slug: 5′-CGTCGGCAGCTCCACTCCACTCTC-3′ and 5′-TCTTCAGGGCACCCAGGCTCACAT-3′
Snail1: 5′-CGGCGCCGTCGTCCTTCT-3′ and 5′-GGCCTGGCACTGGTATCTCTTCAC-3′
Sox9: 5′-CCGCCCATCACCCGCTCGCAATAC-3′ and 5′-GCCCCTCCTCGCTGATACTGGTG-3′
Sox10: 5′-CAAGGGGCCCGTGTGCTA-3′ and 5′-GCCCGTGCCATGCTAACTCT-3′
Twist1: 5′-CTTTCCGCCCACCCACTTCCTCTT-3′ and 5′-GTCCACGGGCCTGTCTCGCTTTCT-3′
Zic1: 5′-GCGGCCGAAAGCCAACT-3′ and 5′-TGCCAAAAGCAATGGACAGC-3′

2.4.5 Pathway analysis of transcripts enriched and depleted in SKPs compared to MSCs

Ingenuity software (Ingenuity Systems, www.ingenuity.com) was used for pathway enrichment analysis according to the instructions available on the software website. The rat genes were mapped to the corresponding human orthologs using the Ingenuity software. The human orthologs were then annotated using the Ingenuity Knowledgebase and subjected to a pathway enrichment analysis.
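The enrichment test described in the next paragraph applies Fisher's exact test and then a Benjamini-Hochberg adjustment. It can be sketched in plain Python as shown below. This is an illustrative re-derivation, not the Ingenuity implementation. Several details are assumptions made for the example: the choice of reference gene universe, the one-sided (over-representation) form of the test, and the function names enrichment_p and bh_adjust.

```python
from math import comb


def enrichment_p(overlap, list_size, pathway_size, universe_size):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    Probability of seeing at least `overlap` pathway genes in a gene list of
    `list_size` genes drawn from `universe_size` genes, `pathway_size` of
    which belong to the pathway.
    """
    upper = min(list_size, pathway_size)
    hits = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / comb(universe_size, list_size)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (the q-values used in this Chapter)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: a 3-gene list drawn from 10 genes, all 3 in a 4-gene pathway
p = enrichment_p(overlap=3, list_size=3, pathway_size=4, universe_size=10)
q = bh_adjust([0.01, 0.04, 0.03, 0.20])
```

In this toy case p = 4/120, about 0.033: only 4 of the 120 possible 3-gene lists consist entirely of pathway genes.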
This analysis uses Fisher's exact test to assess the null hypothesis that the number of observed genes in a particular pathway does not differ from the number expected by chance. The expected number depends on the size of the observed gene list and the number of genes in the pathway. The P-values can be adjusted for multiple testing using the Benjamini-Hochberg (BH) procedure. In this Chapter, BH-corrected q-values below 0.05 were considered statistically significant unless noted otherwise in the text. Such values were taken as sufficient to reject the null hypothesis of a chance association between a particular pathway and the observed gene list.

2.4.6 Seriation analysis of LongSAGE libraries from the Cancer Genome Anatomy Project

LongSAGE gene expression libraries were prepared as described previously [274]. The libraries used in this study are available via the Gene Expression Omnibus database as part of the record GSE14 of the Cancer Genome Anatomy Project resource [259,144]. The tissue origins of the libraries are summarized in Table 2.3. LongSAGE tags were mapped to genes using the Hs_long.best_tag file available through SAGE Genie at ftp://ftp1.nci.nih.gov/pub/SAGE/HUMAN, as previously described [259]. The tag counts in each library were normalized to a library depth of 100,000 tags. The resulting dataset was subjected to seriation analysis using the progressive construction of contigs heuristic, implemented in custom MATLAB scripts. The algorithm described in Section 2.4.7 was run three times on the same dataset to ensure that the seriation result was robust.

2.4.7 Seriation using the progressive construction of contigs heuristic

Seriation seeks the best ordering of a set of objects based on their similarity according to a chosen criterion. Because this problem is NP-hard, we developed a novel heuristic specifically for the SAGE data analysis task.
The 'progressive construction of contigs' heuristic attempts to put the most similar objects side by side without breaking already established chains of closely related elements, which we term 'contigs'. Here we use pairwise correlations between expression vectors (normalized tag counts for a particular tag across all libraries) as the criterion for defining similarities between tags; in principle, however, other similarity criteria can be used. The pairwise correlation between tag expression vectors x and y is calculated using the standard (Pearson) correlation coefficient, $R(x,y) = C(x,y)/\sqrt{C(x,x)\,C(y,y)}$, where $C(x,y) = E[(x - \bar{x})(y - \bar{y})]$, $\bar{x}$ and $\bar{y}$ are the means of expression vectors x and y, and E denotes the mathematical expectation. The correlation values are then arrayed into a symmetric matrix, which is subjected to the following progressive seriation procedure. In the first step, the tag pair with the highest correlation value is found and marked as the beginning of the first contig. At each subsequent step, the tag pair with the next highest correlation value is identified. If one member of the tag pair already belongs to a contig, the columns of the matrix are reorganized to place the other member at the nearest edge of that contig. Because the matrix is symmetric, the rows are reordered accordingly. Importantly, previously reordered elements are kept intact in this process. In some cases, the current similarity maximum is used to start a new contig instead. This happens when the pair cannot be added to a contig without moving previously placed objects, or when neither member of the pair belongs to an existing contig. The seriation process continues until all elements have been processed.
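The progressive construction of contigs can be sketched in Python as shown below. The sketch is written from the description above and is not the original MATLAB implementation. Several details are assumptions:
- Pairs whose members are both already placed are skipped, unless each member sits at an edge of a different contig. In that case the two contigs are joined end to end, which is one reading of how contigs come to be merged.
- The final ordering is the concatenation of the remaining contigs.
- The input has at least two tags and no constant rows.
The function names are hypothetical.

```python
import numpy as np


def _at_edge(contig, tag):
    return contig[0] == tag or contig[-1] == tag


def seriate_contigs(data):
    """Return an ordering of the rows (tags) of a tags x libraries matrix.

    Assumes at least two tags and no constant rows (correlation undefined).
    """
    corr = np.corrcoef(np.asarray(data, dtype=float))   # Pearson R between tags
    n = corr.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    # visit tag pairs from the highest to the lowest correlation
    pairs = sorted(zip(corr[iu, ju], iu, ju), key=lambda p: -p[0])
    contigs, where = [], {}          # where: tag -> contig (list) holding it

    for _, i, j in pairs:
        i, j = int(i), int(j)
        ci, cj = where.get(i), where.get(j)
        if ci is None and cj is None:
            # neither tag placed yet: the pair starts a new contig
            new = [i, j]
            contigs.append(new)
            where[i] = where[j] = new
        elif ci is None or cj is None:
            # one tag placed: add the other at the nearest edge of its contig,
            # leaving the previously placed tags in their current order
            placed, free = (i, j) if ci is not None else (j, i)
            c = where[placed]
            pos = c.index(placed)
            if pos < len(c) - 1 - pos:
                c.insert(0, free)
            else:
                c.append(free)
            where[free] = c
        elif ci is not cj and _at_edge(ci, i) and _at_edge(cj, j):
            # assumption: join two contigs whose edge tags form this pair,
            # orienting them so that i and j become neighbours
            left = ci if ci[-1] == i else ci[::-1]
            right = cj if cj[0] == j else cj[::-1]
            merged = left + right
            contigs = [c for c in contigs if c is not ci and c is not cj]
            contigs.append(merged)
            for t in merged:
                where[t] = merged
        # otherwise both tags are already fixed in place: skip the pair
    return [t for c in contigs for t in c]
```

Reordering both the rows and the columns of the correlation matrix by the returned order (for example with corr[np.ix_(order, order)]) gives the matrix that would then be color-coded for visual inspection.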
The result is a set of contigs of similar correlation values, which can be displayed along the diagonal of the correlation matrix to represent internal topologies in the data. Theoretically, in the case of a Robinson data structure, whereby the data are from a unimodal distribution, the contigs are merged into one and the result is the optimal single seriation solution [275,261]. A procedurally similar alternative is hierarchical clustering, such as the method developed in [276] and implemented in [277]. The key algorithmic difference between the two is how the vectors are treated after the highest pairwise correlation value has been identified at each step. In clustering, the vectors are averaged together into a new vector using a linkage rule (for instance, average linkage clustering), and this new vector is represented by a node in the hierarchical clusterogram. In contrast, in seriation no new vector or node is formed, and the rows and columns of the correlation matrix are merely reordered to reflect underlying patterns in the data as described above. Therefore, seriation requires no linkage rule in addition to the distance metric used to define similarities. In our implementation of the seriation algorithm, ordered structures (contigs) are revealed by color-coding the reordered correlation matrix according to the magnitude of the correlation value. In this manner, visual inspection of the matrix allows ordered contigs to be selected for further inspection. Higher-order structures (supercontigs) can also emerge from this analysis, indicating more complex patterns in the dataset. Because of the visualization component, the algorithm can analyze up to 4,000 genes at a time (tested on a 1.7 GHz Pentium 4 IBM Z60t laptop) and is suitable for the analysis of pre-selected sets of genes.
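The Robinson property mentioned above can be checked directly on a reordered similarity matrix. The sketch below assumes the usual definition for similarity (rather than distance) matrices: moving away from the diagonal along any row, values never increase. It is offered only as a diagnostic for a seriation result, not as part of the procedure described in this section, and the function name is hypothetical.

```python
import numpy as np


def is_robinsonian(sim, tol=1e-12):
    """True if a symmetric similarity matrix is Robinsonian in its current order:
    along every row, values never increase moving away from the diagonal."""
    s = np.asarray(sim, dtype=float)
    for i in range(s.shape[0]):
        right = s[i, i:]        # diagonal to the right-hand end of row i
        left = s[i, : i + 1]    # left-hand end of row i up to the diagonal
        if np.any(np.diff(right) > tol):    # must be non-increasing
            return False
        if np.any(np.diff(left) < -tol):    # must be non-decreasing
            return False
    return True


ordered = [[1.0, 0.8, 0.3],
           [0.8, 1.0, 0.6],
           [0.3, 0.6, 1.0]]
# the same similarities with the last two objects swapped
shuffled = [[1.0, 0.3, 0.8],
            [0.3, 1.0, 0.6],
            [0.8, 0.6, 1.0]]
```

Applied to a seriated correlation matrix, a True result corresponds to the Robinson structure under which, as described above, the contigs merge into one.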
Importantly, the algorithm produces a robust solution for each seriation run (in other words, an equivalent solution is produced upon repeated seriation of the same data set).

2.4.8 Computational validation of transcripts in Supercontig 1 as pluripotency markers

Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to identify canonical pathways associated with genes in Supercontig 1 as described in Section 2.4.5. To identify transcription factor binding sites in groups of genes, we used the PASTAA Web server as recommended by the authors [263]. The PASTAA algorithm ranks genes by estimating the overall affinity of a position weight matrix (PWM) for sequence regions defined relative to the transcription start site of each gene in a list. Two separate PASTAA analyses were performed on each of the three Supercontigs to identify candidate distal- and proximal-acting transcription factors. The distal analysis involved interrogating a region extending 10 kb upstream from the transcription start site, while the distal analysis interrogated a region 6400 bp on each side of the transcription start site.

Figure 2.1 Global expression patterns are similar across SKPs of distinct developmental origins
Transcriptome-wide expression profiles from dorsal, facial, and ventral SKPs and mesenchymal stem cells (dSKPs, fSKPs, vSKPs, and MSCs, respectively) were processed as described in Methods. (A) Spearman rank correlations were computed from the genes differentially expressed among the different types of SKPs and MSCs. They are color-coded as shown in the color legend: yellow represents high and blue represents low correlation. The color-coded Spearman correlation matrix shows that the expression profiles of SKPs are similar to one another regardless of their origin. By contrast, the expression profiles of MSCs form a separate square (bottom right).
The correlation matrix is symmetric, such that the ordering of samples is the same along the x- and y-axes. (B) Unsupervised clustering used correlation distance and average linkage over 100 bootstrap iterations. It confirmed the similarity across the three SKP lineages and their distinction from MSCs. The significance of the hierarchical clustering result was assessed using AU (approximately unbiased, in red font) and BP (bootstrap probability, in green font) values. These values were computed with 100 resampling iterations in the R package Pvclust. Modified with permissions of AlphaMed Press.

Figure 2.2 Facial and dorsal trunk SKP lineages show similar degrees of divergence from MSCs
(A) The Venn diagrams show the numbers of significant genes (Benjamini-Hochberg-corrected q < 0.05) in each pairwise differential expression comparison among facial SKPs, dorsal trunk SKPs, and MSCs. Each pairwise comparison is denoted by a colored circle: facial SKPs vs. dorsal SKPs (yellow); MSCs vs. facial SKPs (pink); MSCs vs. dorsal SKPs (green). Numbers of genes significant in each comparison are quoted in the circles. For instance, the comparison at the bottom (pink and green Venn diagram) shows 2,525 (1,069 + 1,456) genes differentially expressed between MSCs and facial SKPs. It also shows 2,233 (777 + 1,456) genes differentially expressed between MSCs and dorsal SKPs. Of these, 1,456 genes are differentially expressed in both comparisons. This analysis reveals that facial SKPs and dorsal SKPs are more similar to each other than either is to MSCs. In addition, the extent of divergence between each SKP lineage and MSCs is similar, with 2,525 and 2,233 genes differentially expressed between MSCs and facial and dorsal SKPs, respectively. (B)
A three-way comparison was conducted across the three groups (facial SKPs, dorsal trunk SKPs and MSCs) using the LIMMA Bioconductor package to identify genes that show evidence of differential expression. Expression profiles of the 2,603 genes identified as differentially expressed among the groups (Benjamini-Hochberg-corrected q < 0.05) are plotted as a heatmap. The rows are centered and scaled by subtracting the row mean from every value and then dividing the resulting values by the row standard deviation (row Z-score). Modified with permissions of AlphaMed Press.

Figure 2.3 SKPs of distinct developmental origin express neural crest specification genes despite maintaining a lineage history at the gene expression level
(A, left panel) Rat microarray expression levels of genes involved in neural crest specification and associated with neural crest precursors: Snail1, Slug, Twist, Sox9, Sox10, Foxd3, Ap2a1, p75NTR and RhoB [252,250,251]. Green indicates the lowest relative levels of expression and red the highest, as defined by the color key. Note that these genes are expressed similarly in facial and dorsal trunk SKPs. This holds despite the distinct developmental origins of these two SKP lineages, from the neural crest and the mesoderm, respectively. (A, right panel) Rat microarray expression levels of 10 of the 35 transcription factors that were identified as being among the most differentially expressed genes between dorsal and facial SKPs in the analysis in Table 2.1. Green indicates the lowest relative levels of expression and red the highest, as defined by the color key. Note the differential expression between dorsal and facial SKPs. (B) RT-PCR validation of the microarray results above, conducted in the mouse model. For the RT-PCR experiment in the left panel, total RNA was isolated from neonatal mouse dorsal trunk and facial secondary SKP spheres. Total RNA from E8.5 murine embryos was used as a positive control for primer performance.
For the RT-PCR experiment in the right panel, total RNA was purified from uncultured EGFP-positive cells from neonatal Sox2-EGFP mouse dorsal trunk and facial skin secondary SKP spheres. Total RNA from E8.5 mouse embryos was used as a positive control for primer performance. Modified with permissions of AlphaMed Press.

Figure 2.4 Transcripts preferentially enriched or depleted in SKPs compared to MSCs
Pairwise comparisons were conducted between each of dSKPs, fSKPs and vSKPs and MSCs using the LIMMA Bioconductor package to identify significantly differentially expressed genes (Benjamini-Hochberg-corrected q < 0.05). The results from these comparisons were combined to identify genes commonly enriched or depleted in SKPs. The heatmap shows the expression profiles of the 654 genes enriched and the 752 genes depleted in SKPs compared to MSCs. The rows are centered and scaled by subtracting the row mean from every value and then dividing the resulting values by the row standard deviation (row Z-score). The genes are listed in Appendix A.

Figure 2.5 Pathway analysis of transcripts enriched and depleted in SKPs compared to MSCs
Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to identify pathway annotations significantly enriched (Benjamini-Hochberg-corrected q < 0.05) among transcripts increased (A) or decreased (C) in abundance in SKPs compared to MSCs. The list of transcripts used for the analysis is provided in Appendix A. In (A) and (C), the negative logarithms of the P-values are plotted along the x-axis and the pathways along the y-axis. (B)
The Ingenuity canonical pathway named "Human Embryonic Stem Cell Pluripotency" is significantly enriched among the transcripts upregulated in SKPs, reflecting the broad developmental potential of the neural crest stem cell [258]. Pathway members upregulated in SKPs are in red and protein complexes are in bold. Kinases are denoted with triangles and cytokines with squares. (D) The Ingenuity canonical pathway named "Role of BRCA1 in DNA Damage Response" is significantly enriched among transcripts downregulated in SKPs compared to MSCs; the pathway members downregulated in SKPs are in green, and the protein complexes are in bold.

Figure 2.6 Seriation analysis to identify developmentally restricted transcripts expressed in undifferentiated ES cells
(A) Seriation analysis of the 319 candidate pluripotency genes (Appendix B) revealed three Supercontigs of co-expressed genes, containing 114, 145 and 60 genes, respectively. Supercontigs are bounded by red boxes and are numbered. Upon inspection, Supercontig 1, composed of 114 genes, contained transcripts increased in abundance in undifferentiated ES cells. (B) Average LongSAGE-based expression level of the 114 Supercontig 1 genes across 17 LongSAGE libraries. Reprinted with permissions of Elsevier.

Figure 2.7 Computational validation of genes identified by seriation as pluripotency markers
(A) Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to identify canonical pathways significantly enriched among transcripts in Supercontig 1 (Benjamini-Hochberg-corrected q < 0.05). The list of transcripts used for the analysis is provided in Appendix B. (B, C) Proximal and distal promoter analyses of the genes in Supercontig 1 reveal binding sites for the core pluripotency transcription factors SOX, NANOG and OCT/POU. These factors are required for the propagation of undifferentiated ES cells in culture [262].
The hollow blue circles indicate individual PWMs used for the analyses (444 and 487 PWMs were used for the proximal and distal analyses, respectively). The affinity score of each PWM computed by the PASTAA algorithm is plotted against its p-value, such that the highest-scoring PWMs are in the top left [263]. Several PWMs scored higher than NANOG in the proximal analysis (hollow circles to the left of the NANOG PWM). These included PWMs representing binding sites for NFY, AP2, TBX5, and ELK1, all associated with stemness and early development [267,264,278,265]. Panels 2.7B and 2.7C are reprinted with permissions of Elsevier.

Table 2.1 Genes with significant evidence of differential expression between (A) fSKPs and dSKPs, and (B) dSKPs and vSKPs as shown in Figure 2.3B
(A) Negative log fold change (LogFC) indicates genes whose mRNA levels are decreased in fSKPs versus dSKPs, while positive LogFC indicates genes whose mRNA levels are increased in fSKPs versus dSKPs. The genes are sorted by their LogFC. (B) Negative LogFC indicates genes whose mRNA levels are decreased in dSKPs versus vSKPs, while positive LogFC indicates genes whose mRNA levels are increased in dSKPs versus vSKPs. The genes are sorted by their LogFC. Modified with permissions of AlphaMed Press.
(A)
Gene | LogFC | Benjamini-Hochberg-corrected q
Eltd1 | -4.44344 | 0.0024664
Zic1 | -4.21255 | 0.0049442
Hoxc6 | -4.02777 | 0.0025499
Hoxc9 | -3.35571 | 0.0206326
Hoxa5 | -3.20921 | 0.0024664
Cdh7 | -2.44479 | 0.0229012
Tfpi | -2.4351 | 0.0020694
Anxa8 | -2.13009 | 0.0157295
Hoxc4 | -1.67475 | 0.0206326
Avpr1a | -1.44901 | 0.0206326
Cox4j2 | 1.552854 | 0.0206326
Fzd6 | 1.728556 | 0.0206326
Herc3 | 1.738304 | 0.0298194
Glt25d2 | 1.854643 | 0.0170832
Cnksr3 | 1.886715 | 0.0150527
Gpr85 | 2.016079 | 0.0207053
Nrp1 | 2.084411 | 0.0110914
Sytl3 | 2.099451 | 0.0259196
Il16 | 2.201115 | 0.0452825
Cd200 | 2.377227 | 0.0381321
Eya1 | 2.543472 | 0.0485376
Lphn3 | 2.547429 | 0.0117803
Pu3f4 | 2.820537 | 0.0051457
Cxcl14 | 3.096734 | 0.0024715
Mab21l2 | 3.277685 | 0.0071718
Tnfsf11 | 3.328933 | 0.0110914
Msln | 3.405196 | 0.0102423
Ptprn | 3.521102 | 0.0206326
Reln | 3.724786 | 0.0016128
RGD13052 | 3.900023 | 0.0169522
Thbs4 | 4.172965 | 0.0017979
Cdh6 | 4.205613 | 0.0016128
Cntn6 | 4.224511 | 0.0495426
Pax3 | 4.409286 | 0.0049414
Mab21l1 | 5.134758 | 0.0016128

(B)
Gene | LogFC | Benjamini-Hochberg-corrected q
Mab21l1 | -5.30391 | 0.00026
Frzb | -5.23657 | 0.001611
RGD1310827 | -3.86079 | 0.00954
Ccr2 | -3.78686 | 0.003454
LOC681994 | -3.57244 | 0.008613
Car2 | -3.52152 | 0.001611
Tmem26 | -3.40143 | 0.013088
Nes | -3.22634 | 0.011844
Cbln4 | -3.21517 | 0.008461
Upk1b | -3.14743 | 0.007513
LRRTM1 | -3.12189 | 0.01291
Cmklr1 | -3.09583 | 0.013115
Cldn10 | -3.05589 | 0.004253
Mark1 | -2.9736 | 0.002901
Il16 | -2.80564 | 0.008005
Plxdc1 | -2.70215 | 0.003454
Slc1a1 | -2.66076 | 0.001611
RGD1307749 | -2.60502 | 0.001611
Nrp1 | -2.48394 | 0.001611
RGD1305869 | -2.46487 | 0.001444
Loxl2 | -2.40684 | 0.013111
Map2 | -2.32846 | 0.010447
Slc4a11 | -2.31295 | 0.008613
Tbx5 | -2.26593 | 0.008242
Scn4b | -2.23519 | 0.004551
Acy3 | -2.1814 | 0.006916
Itga2 | -2.17679 | 0.008005
RGD1563891 | -2.16154 | 0.01076
Col11a1 | -2.16135 | 0.013115
Tead2 | -2.13101 | 0.008439
LOC499465 | -2.1293 | 0.002551
Avpr1a | 2.1247 | 0.008461
Cdh7 | 2.587961 | 0.008613
Emb | 3.149873 | 0.011844
Zic1 | 4.191047 | 0.003422

Table 2.2 Pathways enriched among the transcripts increased or decreased in abundance in SKPs compared to MSCs
Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to identify pathway annotations significantly enriched among transcripts differentially expressed between SKPs and MSCs.

(A)
Ingenuity Canonical Pathway | -log(BH q) | Ratio | Molecules
Role of Osteoblasts, Osteoclasts and Chondrocytes in Rheumatoid Arthritis | 3.74 | 1.15E-01 | TCF4, ADAM17, SFRP2, MMP3, BMP2, MMP14, PIK3R1, WNT16, MMP13, NFKBIA, NFAT5, WIF1, RUNX2, WNT7B, DKK2, TNFRSF1B, CTNNB1, PPP3CA, MMP1 (includes EG:300339), ADAMTS4, LRP5, GSN, IL7, BMP7, WNT5A
Axonal Guidance Signaling | 3.74 | 9.16E-02 | FYN, ADAM17, PIK3R1, BMP2, MMP13, WNT16, EPHA4, NCK1, PLXNA2, ROBO1, PRKCZ, SEMA6C, SEMA4C, NTN1, LIMK1, EFNB2, NFAT5, EFNB1, WNT7B, PLCB1, GNA13, ROBO2, RASA1, PPP3CA, GNG12, SEMA5A, MMP10, SEMA3A, WIPF1, NTRK2, SEMA6D, PRKCD, BMP7, SEMA7A, WNT5A
Colorectal Cancer Metastasis Signaling | 2.54 | 9.87E-02 | LRP5, TCF4, ADCY2, TGFBR1, MMP3, PTGER3, MMP14, PIK3R1, WNT16, MMP10, MMP13, MMP2, TGFBR2, WNT7B, MMP11, MMP12, CTNNB1, ADCY7, GNG12, RALGDS, MMP9, MMP1 (includes EG:300339), WNT5A
Human Embryonic Stem Cell Pluripotency | 1.69 | 1.08E-01 | TCF4, S1PR2, TGFBR1, PIK3R1, FGFR1, BMP2, WNT16, TGFBR2, SOX2, NTRK2, WNT7B, BMP7, CTNNB1, WNT5A
Wnt/Beta-catenin Signaling | 1.52 | 9.94E-02 | LRP5, AXIN2, TCF4, SFRP2, TGFBR1, APPL2, WNT16, KREMEN1, TGFBR2, SOX2, CDH2, WIF1, WNT7B, DKK2, CTNNB1, WNT5A
Role of Macrophages, Fibroblasts and Endothelial Cells in Rheumatoid Arthritis | 1.52 | 7.96E-02 | LRP5, TCF4, SFRP2, MMP3, PIK3R1, WNT16, MMP13, IL7, PRKCZ, IL16, NFKBIA, NFAT5, WIF1, PRKCD, WNT7B, PLCB1, DKK2, TNFRSF1B, CTNNB1, PPP3CA, MMP1 (includes EG:300339), WNT5A, ADAMTS4
RAR Activation | 1.52 | 9.47E-02 | ADCY2, RDH10, BMP2, PIK3R1, RBP1, PRKCZ, CRABP1, PTEN, ADH7, PNRC1, PRKCD, IGFBP3, RXRA, ZBTB16, ADCY7, MMP1 (includes EG:300339)
Molecular Mechanisms of Cancer | 1.52 | 7.42E-02 | FYN, TCF4, TGFBR1, BMP2, PIK3R1, PSEN2, PSENEN, CDKN2B, PRKCZ, TGFBR2, NFKBIA, PLCB1, GNA13, CTNNB1, RASA1, RALGDS, ADCY2, LRP5, RASGRF2, PRKCD, BMP7, ARHGEF9, ADCY7, BCL2L11, WNT5A
Bladder Cancer Signaling | 1.35 | 1.19E-01 | DAPK1, MMP3, MMP14, MMP10, MMP13, MMP2, MMP11, MMP12, MMP9, MMP1 (includes EG:300339)
Leukocyte Extravasation Signaling | 1.32 | 9.04E-02 | TIMP3, MMP3, MMP14, PIK3R1, MMP10, MMP13, MMP2, PRKCZ, WIPF1, PRKCD, VAV3, MMP11, MMP12, CTNNB1, MMP9, MMP1 (includes EG:300339)
Airway Pathology in Chronic Obstructive Pulmonary Disease | 1.30 | 4.29E-01 | MMP2, MMP9, MMP1 (includes EG:300339)
LXR/RXR Activation | 1.30 | 1.14E-01 | APOE, SCD, LY96, NR1H3, ABCG1, TNFRSF1B, RXRA, MMP9, ABCA1
Hepatic Fibrosis / Hepatic Stellate Cell Activation | 1.30 | 9.56E-02 | IGFBP4, TGFBR1, FGFR1, MMP13, IGFBP5, MMP2, TGFBR2, LY96, IGFBP3, EDNRA, TNFRSF1B, MMP9, MMP1 (includes EG:300339)
PTEN Signaling | 1.30 | 1.01E-01 | TGFBR2, GHR, NTRK2, TGFBR1, PIK3R1, FGFR1, FOXO3, IGF2R, BCL2L11, PRKCZ, PTEN

(B)
Ingenuity Canonical Pathway | -log(BH q) | Ratio | Molecules
Hereditary Breast Cancer Signaling | 5.85 | 1.86E-01 | POLR2F, CDC25C, GADD45B, GADD45G, BARD1, RPA1, RAD50, CHEK1, CCNB1, RAD51, HDAC6, FANCB, FANCM, RRAS2, FANCD2, RFC4, H2AFX, MRAS, BRCA2, BRCA1, RFC3
Role of BRCA1 in DNA Damage Response | 5.38 | 2.55E-01 | BARD1, PLK1, RPA1, RAD50, CHEK1, RAD51, FANCB, FANCM, FANCD2, RFC4, BRCA2, BRIP1, BRCA1, RFC3
ATM Signaling | 4.85 | 2.50E-01 | CDC25C, GADD45B, GADD45G, CCNB2, MAPK12, RAD50, CHEK1, CCNB1, RAD51, SMC2, FANCD2, H2AFX, BRCA1
DNA Double-Strand Break Repair by Homologous Recombination | 4.37 | 5.00E-01 | RAD51, LIG1, GEN1, BRCA2, RPA1, BRCA1, RAD50
Mitotic Roles of Polo-Like Kinase | 3.01 | 1.93E-01 | KIF23, CDC25C, PLK4, ESPL1, CDC20, PRC1, CCNB2, PKMYT1, PLK1, KIF11, CCNB1
Cell Cycle Control of Chromosomal Replication | 2.45 | 2.59E-01 | MCM6, CDC45, MCM2, CDC6, RPA1, DBF4, ORC1
Cell Cycle: G2/M DNA Damage Checkpoint Regulation | 1.93 | 1.86E-01 | KAT2B, CDC25C, CCNB2, PKMYT1, PLK1, BRCA1, CHEK1, CCNB1
Germ Cell-Sertoli Cell Junction Signaling | 1.93 | 1.10E-01 | TUBB2C, LAMC3, MAP3K5, MAPK12, TUBB, TUBA1B, PAK1, RRAS2, PAK3, SORBS1, MRAS, TGFB3, TUBA1C, ACTG2, ACTN4, ACTN1
Role of CHK Proteins in Cell Cycle Checkpoint Control | 1.93 | 2.06E-01 | CDC25C, RFC4, RPA1, BRCA1, RAD50, CHEK1, RFC3
Molecular Mechanisms of Cancer | 1.93 | 8.01E-02 | BMP4, LRP6, CDKN2C, MAP3K5, FAS, CHEK1, PAK1, FANCD2, MRAS, BRCA1, CDC25C, CCNE2, SMAD9, HAT1, SMAD6, AURKA, MAPK12, FOS, CCNE1, PRKCI, RRAS2, FZD4, PAK3, FZD6, TGFB3, PLCB3, CAMK2G
Breast Cancer Regulation by Stathmin1 | 1.76 | 9.68E-02 | PPP1R14C, CCNE2, TUBB2C, PPP1R3C, PPP1R14A, GNG3, ITPR1, TUBB, TUBA1B, ROCK2, PAK1, CCNE1, RRAS2, PRKCI, MRAS, PLCB3, TUBA1C, CAMK2G
Aryl Hydrocarbon Receptor Signaling | 1.59 | 1.02E-01 | GSTM1, CCNE2, IL6, ALDH9A1, FAS, CYP1B1, CHEK1, CCNA2, FOS, CCNE1, NFIA, TGFB3, AHR, HSPB1
RhoA Signaling | 1.56 | 1.17E-01 | ROCK2, ARHGAP5, MYL9, PFN1, MYL6, CFL2, CIT, IGF1R, ANLN, ACTG2, DLC1, LPAR3

Table 2.3 LongSAGE libraries used for the seriation analysis described in Section 2.2.4
The LongSAGE libraries from undifferentiated ES cells, differentiated ES cells and adult tissues were used in seriation analysis to select genes whose expression was restricted to undifferentiated ES cells.
Library ID | Group | Description
Shes2 | Undifferentiated human ES cells | H9 human ES cells
Shes9 | Undifferentiated human ES cells | HSF6 human ES cells
She10 | Undifferentiated human ES cells | HES3 human ES cells
She11 | Undifferentiated human ES cells | HES4 human ES cells
She13 | Undifferentiated human ES cells | H7 human ES cells
She14 | Undifferentiated human ES cells | H14 human ES cells
She15 | Undifferentiated human ES cells | H13 human ES cells
She16 | Undifferentiated human ES cells | H1 human ES cells
She17 | Undifferentiated human ES cells | H1 human ES cells
She19 | Undifferentiated human ES cells | BG01 human ES cells
Shs11 | Differentiated human ES cells | H1 human ES cell-derived erythromegakaryocytic progenitors
Shs12 | Differentiated human ES cells | H1 human ES cell-derived enriched primitive hematopoietic multipotent progenitors
Shs13 | Differentiated human ES cells | H1 human ES cell-derived enriched primitive hematopoietic myeloid progenitors
Cg643 | Adult tissue | Normal adult bulk pancreas
Cg647 | Adult tissue | Mammary gland, antibody purified
Cg648 | Adult tissue | Normal substantia nigra
Cg655 | Adult tissue | Normal liver vascular epithelium

Table 2.4 Pluripotency genes with transcript abundance increased or decreased in SKPs compared to MSCs. The list of 114 pluripotency genes identified by seriation analysis described in Section 2.2.5 was overlapped with the list of transcripts significantly differentially expressed between SKPs and MSCs (Section 2.2.3).
Gene symbol | Description | Molecular function | Ingenuity Canonical Pathway enriched in ES cells | Increased or decreased in SKPs
CTNNB1 | Catenin (cadherin-associated protein), beta 1, 88kDa | Transcriptional regulator, key member of the WNT signaling pathway | Role of Oct4 in Mammalian Embryonic Stem Cell Pluripotency; Role of NANOG in Mammalian Embryonic Stem Cell Pluripotency | Increased
ETV4 | Ets variant 4 | Transcriptional regulator | - | Increased
MAD2L2 | MAD2 mitotic arrest deficient-like 2 (yeast) | Component of the mitotic spindle assembly checkpoint | - | Increased
PITX2 | Paired-like homeodomain 2 | Transcriptional regulator | - | Increased
SOX2 | SRY (sex determining region Y)-box 2 | Transcriptional regulator | Role of Oct4 in Mammalian Embryonic Stem Cell Pluripotency; Role of NANOG in Mammalian Embryonic Stem Cell Pluripotency; Human Embryonic Stem Cell Pluripotency | Increased
ADAM23 | ADAM metallopeptidase domain 23 | Metalloprotease | - | Decreased
AURKB | Aurora kinase B | Protein serine/threonine kinase | - | Decreased
CENPK | Centromere protein K | Protein binding | - | Decreased
FAM46B | Family with sequence similarity 46, member B | Unknown | - | Decreased
FAM64A | Family with sequence similarity 64, member A | Unknown | - | Decreased
HMGB2 | High mobility group box 2 | Transcriptional regulator | - | Decreased
IGF2BP3 | Insulin-like growth factor 2 mRNA binding protein 3 | Translational regulator | - | Decreased
KPNA2 | Karyopherin alpha 2 (RAG cohort 1, importin alpha 1) | Protein transporter | - | Decreased
MTHFD1 | Methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1, methenyltetrahydrofolate cyclohydrolase, formyltetrahydrofolate synthetase | Dehydrogenase activity | - | Decreased
MYBL2 | v-Myb myeloblastosis viral oncogene homolog (avian)-like 2 | Transcriptional regulator | - | Decreased
TBX4 | T-box 4 | Transcriptional regulator | - | Decreased
TPM1 (includes EG:22003) | Tropomyosin 1 (alpha) | Structural component of cytoskeleton | - | Decreased
ZFP57 | Zinc finger protein 57 homolog (mouse) | Transcriptional regulator | - | Decreased

Chapter 3: Transcriptome analysis of neuroblastoma tumor-initiating cells for therapeutic target prediction³
3.1 Introduction
Cancer stem cells and tumor-initiating cells (TICs) have been described in a variety of hematopoietic and solid malignancies, including those of the breast, brain, pancreas, liver, skin, and colon [35]. Primary TIC lines have also been isolated from NBL tumors and metastases [279]. NBL TICs share several properties with cancer stem cells, including the ability to self-renew and to differentiate into the cell types observed in the bulk tumor, the expression of stem cell markers, and enhanced tumorigenic potential [279]. Although several NBL TIC lines have been reported to be contaminated with Epstein-Barr virus-transformed lymphocytes [280], these lines recapitulate metastatic NBL in animals, including upon serial transplantation, which supports their usefulness as models of NBL [279]. A recent study of chronic myeloid leukemia stem cells provided proof of principle that targeting a cancer stem cell-enriched gene could lead to the eradication of such cells and a potential cure of the disease [281]. NBL TICs, which are non-immortalized cell lines with high tumorigenic potential in immunosuppressed mice, could therefore provide a model for the development of improved therapies for recurrent and metastatic NBL.
In this Chapter I describe an RNA-Seq analysis of a panel of human NBL TIC and SKP lines (Table 3.1). The overall objective of this Chapter is to assess whether transcripts preferentially abundant in NBL TICs could reveal candidate new drug targets against NBL. To fulfill this objective, I address three specific aims. First, I use RNA-Seq data to identify transcripts whose expression is increased in NBL TICs compared to other tissue types.
Second, I conduct a functional analysis to identify candidate drug targets among these transcripts, with a specific focus on one drug target of interest, Aurora kinase B (AURKB). Third, I conduct an exon-level analysis to propose a potential mechanism for the increased expression of AURKB in NBL TICs.
I used SKPs as the normal reference for these analyses because, as discussed in detail in Chapter 2, SKPs are multipotent precursors isolated from human foreskin that are able to self-renew and differentiate into various neural crest derivatives, including peripheral neurons and neural crest lineage-specific Schwann cells [234]. As NBL is a tumor of neural crest precursors, SKPs provide a normal reference transcriptome for identifying candidate gene expression changes associated with the TIC phenotype. To increase the specificity of the identified gene expression changes to NBL TICs, I also compared the expression profiles of NBL TICs to those of a compendium of cancer tissues, including primary tumor samples from breast, skin, brain, B-cells, ovary, cervix, and lung, as well as breast cancer cell lines.

³ A version of this chapter has been published, and the co-author contributions are detailed in the Preface as per the University of British Columbia PhD thesis guidelines: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin. Cancer Res. 16(18):4572-82, 2010. Copyright by the American Association for Cancer Research.
3.2 Results
3.2.1 Identification of genes preferentially enriched or depleted in NBL TICs compared to a compendium of cancer tissues and SKPs
Having analyzed the expression profiles of normal neural crest stem cell-like cells in Chapter 2, I set out to characterize the expression profiles of their malignant counterparts, NBL tumor-initiating cells (TICs). These cells have been identified and characterized in NBL primary tumors and metastases, and have been associated with tumor relapse [279]. We sequenced the transcriptomes of 10 NBL TIC lines isolated from tumors and metastases of six high-risk NBL patients (Table 3.1) using Illumina RNA sequencing (RNA-Seq) [148,149]. The NBL TIC lines used in this study included lines isolated from patients during disease relapse and one line isolated from a patient in remission. This line, NB67, was isolated from the bone marrow of a patient in clinical remission who subsequently relapsed; because we had previously shown it to be tumorigenic, we included it in the analysis [279]. To generate reference normal expression profiles, we sequenced the transcriptomes of three foreskin-derived SKP lines from three children without cancer [234]. As described in Chapter 2, these skin-derived progenitor cells, regardless of their embryonic origin, possess the properties of neural crest stem cells and may therefore serve as reasonable normal counterparts to NBL TICs.
To identify transcripts significantly enriched in NBL TIC lines compared to normal neural crest cells, I used the LIMMA Bioconductor package [249] as described in Methods. The LIMMA analysis revealed 817 genes significantly increased and 1,913 genes significantly decreased in abundance in NBL TICs versus SKPs. I considered it likely that this list of differentially expressed genes would contain both candidate NBL TIC markers and transcripts generally associated with a proliferative phenotype.
Targeting gene products that are expressed nonspecifically in proliferating cell types could increase toxicity, particularly in children, whose organ systems are still growing and developing. To avoid such genes, and to select gene expression differences specific to NBL TICs, I compared our NBL TIC RNA sequences to RNA sequencing data from 30 cancer samples available at the Genome Sciences Centre. These samples were derived from seven tissue types (ovary, B-cells, lung, blood, brain, skin, and cervix; Table 3.2) and served as an additional reference set for identifying transcripts specifically enriched in NBL TICs. The LIMMA package [249] was used to compare gene expression levels between NBL TICs and other tissues as described in Methods. This comparison revealed 2,258 genes increased and 2,397 genes decreased in expression in NBL TICs compared to other tissues. These gene lists were then intersected with the lists of genes significantly differentially expressed between NBL TICs and SKPs, retaining only genes that were significant in the same direction in both comparisons. In total, 449 transcripts were significantly increased in expression in NBL TICs compared both to SKPs and to other tissues, and 1,059 genes were significantly decreased in expression in both comparisons.
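The same-direction filter described above can be sketched in a few lines of Python. This is a minimal illustration that assumes each LIMMA comparison has already been reduced to sets of significantly increased and decreased gene identifiers; the function name `consistent_direction` and the placeholder genes G1 to G7 are hypothetical, and the thesis does not specify how the intersection was implemented.

```python
def consistent_direction(up_a, down_a, up_b, down_b):
    """Keep genes that change in the same direction in two comparisons.

    up_a, down_a: genes significantly increased/decreased in comparison A
                  (e.g. NBL TICs vs SKPs)
    up_b, down_b: genes significantly increased/decreased in comparison B
                  (e.g. NBL TICs vs other tissues)
    """
    enriched = set(up_a) & set(up_b)      # increased in both comparisons
    depleted = set(down_a) & set(down_b)  # decreased in both comparisons
    return enriched, depleted


# Placeholder gene identifiers (hypothetical, not thesis data).
# G3 is up in A but down in B, so it is excluded from both outputs.
enriched, depleted = consistent_direction(
    up_a={"G1", "G2", "G3"}, down_a={"G5", "G6"},
    up_b={"G1", "G2", "G7"}, down_b={"G3", "G5"},
)
print(sorted(enriched), sorted(depleted))  # ['G1', 'G2'] ['G5']
```

Genes with discordant directions (up in one comparison, down in the other) drop out automatically, because they never appear in both "up" sets or both "down" sets.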
To confirm the differential expression of the 449 candidate NBL TIC-enriched transcripts (enriched in NBL TICs compared to both SKPs and other tissues) and the 1,059 candidate NBL TIC-depleted transcripts (depleted in NBL TICs compared to both SKPs and other tissues) identified by RNA sequencing, I analyzed Affymetrix Human Exon 1.0 ST Array data from eight NBL TIC lines from five patients and five SKP lines from four cancer-free children (Table 2.1). This platform provides independent confirmation of gene expression at the exon level [282,283]. Analysis of the exon array data, conducted as described in Methods, confirmed significant differential expression between NBL TICs and SKPs for 321 (71%) of the NBL TIC-enriched and 819 (77%) of the NBL TIC-depleted transcripts (Figure 3.1; Appendix C). These genes represented robust sets of NBL TIC-enriched and NBL TIC-depleted transcripts, which I analyzed further to identify the pathways disrupted in NBL TICs.
3.2.2 Elevated mRNA levels of BRCA1 signaling pathway members are associated with the NBL TIC phenotype
To assess the functional significance of transcripts differentially abundant in NBL TICs, I conducted a pathway enrichment analysis using Ingenuity software (Ingenuity Systems, www.ingenuity.com) as described in Methods. The analysis revealed several signaling pathways significantly associated with the NBL TIC-enriched transcripts (Fisher's exact test P < 0.05).
The pathways are listed below with the number of NBL TIC-enriched transcripts involved in each: "Role of BRCA1 in DNA Damage Response" (13 genes), "Purine Metabolism" (20 genes), "Mitotic Roles of Polo-Like Kinases" (8 genes), "Pyrimidine Metabolism" (13 genes), "Role of CHK proteins in Cell Cycle Checkpoint Control" (6 genes), "One Carbon Pool by Folate" (6 genes), "Cell Cycle: G2/M DNA Damage Checkpoint Regulation" (5 genes), "ATM Signaling" (4 genes), "Cleavage and Polyadenylation of Pre-mRNA" (2 genes), and "Alanine and Aspartate Metabolism" (3 genes) (Figure 3.2A). In contrast, the following pathways were significantly enriched among NBL TIC-depleted transcripts (Fisher's exact test P < 0.05): "Axonal Guidance Signaling" (43 genes), "Hepatic Fibrosis / Hepatic Stellate Cell Activation" (21 genes), "Coagulation System" (10 genes), "Colorectal Cancer Metastasis Signaling" (25 genes), "CXCR4 Signaling" (18 genes), "Germ Cell-Sertoli Cell Junction Signaling" (18 genes), "Factors Promoting Cardiogenesis in Vertebrates" (12 genes), "TGF-beta Signaling" (12 genes), "ILK Signaling" (19 genes), and "Complement System" (7 genes) (Figure 3.2B). Thirteen of the 321 genes significantly upregulated in NBL TICs (Appendix C) were known members of the BRCA1 DNA damage response pathway (Figure 3.2C). This was the pathway most significantly associated with the NBL TIC-enriched transcripts: 13 of its 53 members were among the 321 NBL TIC-enriched transcripts. In addition, eight and eleven genes were associated with the polo-like kinase and cell cycle checkpoint control pathways, respectively, both of which act directly downstream of BRCA1 signaling (Figure 3.2C).
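The Fisher's exact P-values quoted above test whether a pathway is over-represented in a gene list. A right-tailed Fisher's exact test on a 2×2 table is equivalent to the upper tail of the hypergeometric distribution, which the sketch below computes with the Python standard library. The function name `enrichment_p` and the background of 20,000 genes are assumptions for illustration; the background and implementation actually used by Ingenuity are not described here. The pathway "ratio" in the Ingenuity tables, as Ingenuity defines it, is the number of list genes in a pathway divided by the total number of pathway members, e.g. 13/53 for the BRCA1 pathway.

```python
from math import comb


def enrichment_p(k, n_list, k_pathway, n_background):
    """Right-tailed Fisher's exact (hypergeometric) P-value for overlap >= k.

    k            pathway members found in the gene list
    n_list       size of the gene list
    k_pathway    pathway members in the background
    n_background genes in the background universe
    """
    denom = comb(n_background, n_list)
    numer = sum(
        comb(k_pathway, i) * comb(n_background - k_pathway, n_list - i)
        for i in range(k, min(k_pathway, n_list) + 1)
    )
    return numer / denom  # exact big-integer ratio, rounded once to float


# 13 of the 53 "Role of BRCA1 in DNA Damage Response" members among the
# 321 NBL TIC-enriched genes; the 20,000-gene background is hypothetical.
ratio = 13 / 53
p = enrichment_p(13, 321, 53, 20_000)
print(f"ratio = {ratio:.3f}, P = {p:.2e}")
```

Because Python integers are arbitrary-precision, the binomial coefficients are summed exactly and only the final division is rounded, which avoids underflow for very small P-values.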
3.2.3 MudPIT analysis confirms the abundance of DNA repair proteins in the proteome of a NBL TIC line
To assess the contribution of the NBL TIC-enriched transcripts to the NBL TIC proteome, we conducted a Multidimensional Protein Identification Technology (MudPIT) analysis of the whole-cell lysate and a membrane-enriched fraction of NBL TIC line NB88R2, which was generated from a bone marrow metastasis of a high-risk patient. In MudPIT, the protein sample is digested into peptides with trypsin, and the peptides are separated by two liquid chromatography steps. As the peptides elute from the second column, their mass/charge ratio fingerprints are recorded and used to reveal the identity of each protein [284]. The MudPIT approach can identify thousands of concurrently expressed proteins for global or subcellular fraction-specific proteomic profile analyses [285]. MudPIT analysis of the whole-cell lysate from line NB88R2 identified 819 proteins, each supported by at least two peptides, and a similar analysis identified 1,530 proteins in the membrane-enriched fraction of the same line. Of the 321 TIC-enriched genes, all of which were expressed in line NB88R2, 75 had peptides detected by MudPIT in the whole-cell lysate, the membrane-enriched lysate, or both (Table 3.3). Forty-five of the detected proteins were encoded by genes in the 75% to 100% expression percentile in line NB88R2, whereas only two were encoded by genes in the 0% to 24% expression percentile, indicating a correlation between transcript abundance and MudPIT detection in this cell line. In addition, the median expression level of genes whose protein products were detected by MudPIT was 206, compared with 73.3 for genes whose protein products were not detected.
To investigate the significance of this difference, we compared the square-root-transformed average NB88R2 expression levels of the 75 NBL TIC-enriched genes whose protein products were detected by MudPIT with those of the 246 NBL TIC-enriched genes whose protein products were not detected, using a one-tailed two-sample equal-variance t-test. Based on P = 7.365E-24, we rejected the null hypothesis that the two expression means were equal, providing evidence that higher gene expression is associated with the ability to detect the protein product by MudPIT. According to the Ingenuity Knowledgebase annotation, 21% (16 of 75) of the detected proteins were associated with the DNA replication, recombination, and repair functional category (Ingenuity Systems, www.ingenuity.com), including PARP1, PCNA, UBE2N, FEN1, HMGB2, and RFC, which forms a major complex interacting with BRCA1 [286]. This result suggests that DNA repair proteins are expressed in the proteome of at least one NBL TIC line, further supporting the results of the gene expression analysis.
3.2.4 Known drug targets among NBL TIC-enriched transcripts
Because the most direct pharmacologic intervention is inhibition of a target protein [287], I focused further functional analyses on genes upregulated, rather than downregulated, in NBL TICs relative to SKPs and other tissues. Drug repositioning, in which existing drugs are used for new indications, is a powerful approach to therapy development because it greatly reduces the cost and time required to bring a new therapeutic option to the clinic [288]. I therefore aimed to use NBL TIC-enriched genes to identify targets of existing therapeutics, on the premise that such drugs could be effective against recurrent NBL.
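For the MudPIT comparison in Section 3.2.3, the one-tailed two-sample equal-variance (Student's) t-test on square-root-transformed expression can be sketched as follows. The expression values are hypothetical and only the test statistic is computed with the standard library; the one-tailed P-value is the upper tail of the t distribution with n1 + n2 - 2 degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=True, alternative="greater")` returns the same statistic together with that P-value (the `alternative` argument requires SciPy 1.6 or later).

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled (equal) variance.

    Returns (t, degrees_of_freedom). A positive t means mean(a) > mean(b);
    for a one-tailed test of mean(a) > mean(b), the P-value is the upper
    tail of the t distribution with the returned degrees of freedom.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # statistics.variance is the sample variance (denominator n - 1).
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical expression values (not thesis data), square-root transformed
# before testing as in Section 3.2.3.
detected = [math.sqrt(x) for x in (206.0, 350.0, 180.0, 512.0)]
undetected = [math.sqrt(x) for x in (73.3, 40.0, 95.0, 12.0, 60.0)]
t, df = pooled_t(detected, undetected)
print(f"t = {t:.2f} on {df} df")
```

The square-root transform is a common variance-stabilizing step for count-like expression values, which makes the equal-variance assumption of this test more reasonable.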
I used the Ingenuity Knowledgebase (Ingenuity Systems, www.ingenuity.com) to map the 321 NBL TIC-enriched transcripts, as well as their interacting partners, to known drugs. This analysis revealed thirty known drug targets among the NBL TIC-enriched genes and their interacting partners as defined by the Ingenuity Knowledgebase (bold type in Table 3.3 indicates the NBL TIC-enriched genes). Seventeen of the thirty predicted drug targets have been explored preclinically or clinically for the treatment of NBL (Table 3.4). The corresponding drugs included general chemotherapeutics, such as etoposide, becatecarin, doxorubicin, flavopiridol, and vincristine, all of which are currently approved or in trials for NBL, as well as targeted agents such as BCL2 inhibitors, which have been evaluated for the treatment of NBL [289]. Several agents predicted by my analysis, such as HDAC inhibitors and PARP inhibitors, have already shown promise in the management of chemotherapy-resistant NBL [290,291], suggesting that this approach can identify drug targets relevant to the disease. In addition to known NBL drug targets, my analysis predicted genes and gene products targeted by existing drugs that, at the time this study was published, had not been implicated clinically as therapeutic targets for high-risk NBL. These molecules included AURKB, PLK1, ADORA2A, CXCL10, SLC1A4, COL14A1, TNFRSF10B, ITGA2b, and IL6. Based on the biological and clinical considerations discussed in Section 3.2.5, we selected AURKB for further evaluation as a potential drug target against metastatic NBL.
3.2.5 Targeting BRCA1 signaling: inhibition of AURKB is selectively cytotoxic to NBL TICs
The Aurora kinase family includes three serine/threonine kinases involved in the control of the cell cycle. Inhibitors of Aurora A and B kinases have shown promise as anticancer agents for the treatment of solid tumors and leukemias [292].
Although an Aurora A kinase inhibitor is in an ongoing phase I/II clinical trial for NBL (NCT00739427), Aurora B kinase inhibitors have not been investigated in relation to NBL. A recent report suggested a direct link between Aurora B kinase and BARD1, a key component of the BRCA1 signaling pathway that is also associated with susceptibility to NBL [197,269]. According to this report, full-length BARD1, expressed by normal cells, interacts with BRCA1 and mediates AURKB degradation, whereas a shorter BARD1beta isoform that lacks the BRCA1 interaction domain, expressed by some cancer cells, scaffolds AURKB with BRCA2, thereby stimulating cellular proliferation (Figure 3.2C) [269]. In this study, AURKB was highly expressed in NBL TICs, with an average expression level of 44.35 Reads Per Kilobase of gene model per Million mapped reads (RPKM) (range 9.83-67.66 RPKM). In contrast, AURKB transcripts were not detectable above background in SKPs or other normal samples (Section 3.2.7). The BARD1/AURKB relationship, the aberrant expression of the BRCA1/BARD1 pathway and AURKB in NBL TICs observed in this study, and the clinical feasibility of Aurora kinase inhibitors together provided a rationale for exploring the antiproliferative potential of Aurora B kinase inhibitors in NBL TICs.
To assess whether the elevated AURKB mRNA levels in NBL TICs, compared to SKPs and a panel of other cancers, corresponded to increased AURKB protein levels, we performed Western blot analysis of whole-cell lysates from three NBL TIC lines (NB12, NB88R2, and NB122R) and two SKP lines (FS274 and FS227). AURKB protein was detected in NBL TICs but not in SKPs, supporting the gene expression result (Figure 3.3A). To gain further insight into the role of AURKB in controlling NBL TIC proliferation, we performed shRNA knockdown experiments in NBL TIC lines NB12 and NB88R2.
NBL TICs stably infected with lentiviruses encoding either of two separate shRNAs to AURKB showed a 77% to 80% growth reduction compared with NBL TICs infected with lentiviruses carrying control shRNAs to green fluorescent protein or β-galactosidase (Figure 3.3B). The reduced proliferation following AURKB knockdown supports the premise that AURKB signaling is important for the viability of NBL TICs. To assess whether pharmacologic inhibition of AURKB would affect NBL TIC proliferation in the same way as the knockdowns, we used AZD1152, a selective AURKB inhibitor currently undergoing phase I/II testing in patients with acute myelogenous leukemia (NCT00497991). NBL TIC lines NB12 and NB88R2, as well as the SKP line FS283, were treated with a range of AZD1152 concentrations, and cell growth was assessed 96 hours later using alamarBlue reduction [293] as a read-out of cellular metabolic activity. As shown in Figure 3.3C, AZD1152 reduced the proliferation of NBL TICs, with low micromolar EC50 values (1.5-4.6 μmol/L). In contrast, SKPs were less sensitive to AZD1152, with a higher EC50 value (12.4 μmol/L). The greater reduction in proliferation of NBL TICs than of SKPs following both genetic (shRNA) and pharmacological (AZD1152) inhibition of AURKB is consistent with the hypothesis that AURKB is a potential drug target for metastatic NBL.
3.2.6 Exon-level expression analysis of BARD1 reveals a potential mechanism for the sensitivity of NBL TICs to AURKB inhibition
The full-length BARD1 isoform, which interacts with BRCA1, was reported to mediate AURKB degradation, whereas the shorter BARD1beta isoform, which lacks the BRCA1 interaction domain, was reported to stabilize AURKB through interactions with BRCA2 [269].
Since NBL TICs expressed AURKB at both the mRNA (Figure 3.1B) and protein (Figure 3.3A) levels, and were sensitive to AURKB inhibition (Figure 3.3B and C), we hypothesized that NBL TICs expressed the BARD1beta isoform involved in scaffolding AURKB and BRCA2. On inspecting the NBL TIC and SKP RNA-Seq data, I found that SKPs expressed BARD1 only at approximately the expression threshold (RPKM ~ 1) [150]. I therefore sought an alternative source of reference normal RNA-Seq data to study exon usage at the BARD1 locus in normal and NBL TIC cells.
To test the hypothesis that NBL TICs preferentially express the BARD1beta isoform, while normal cells express the full-length BARD1 isoform, I used RNA-Seq data from the NBL TIC libraries (Table 3.1) and from a panel of 16 normal tissues from the Illumina BodyMap 2.0 project, available through the Gene Expression Omnibus (GSE30611). Exon-level expression at the BARD1 locus was quantified in these samples as described in Methods, using the RPKM expression measure [150]. The usage of each BARD1 exon was expressed as a splice index (SI), the exon RPKM as a percentage of the overall gene RPKM: SI = (exon RPKM / gene RPKM) × 100. The average SI of exon 2 across the 10 NBL TIC RNA-Seq libraries was 2.17% (SD = 0.96%), significantly lower than the average of 11.5% (SD = 4.01%) across the 16 normal tissues, as assessed by a moderated t-test with the Benjamini-Hochberg multiple testing correction implemented in the LIMMA Bioconductor package [249] (BH-corrected q < 0.05). Similarly, the average SI of exon 3 was 8.75% (SD = 2.60%) in NBL TICs, significantly different from the average of 31% (SD = 7.00%) in the normal tissues (BH-corrected q < 0.05) (Figure 3.5A).
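The RPKM measure [150] and the splice index defined above amount to two short formulas: RPKM = 10^9 × reads / (feature length in bp × total mapped reads), and SI = (exon RPKM / gene RPKM) × 100. The sketch below assumes read counts have already been assigned to each exon and to the gene model; the function names, read counts, lengths and library size are hypothetical.

```python
def rpkm(reads, length_bp, total_mapped_reads):
    """Reads Per Kilobase of feature model per Million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped_reads)


def splice_index(exon_rpkm, gene_rpkm):
    """Exon usage: exon RPKM as a percentage of the gene-level RPKM."""
    return 100.0 * exon_rpkm / gene_rpkm


# Hypothetical library with 20 million mapped reads (illustrative numbers).
total = 20_000_000
gene = rpkm(reads=500, length_bp=2_500, total_mapped_reads=total)  # 10.0
exon = rpkm(reads=3, length_bp=150, total_mapped_reads=total)      # 1.0
print(splice_index(exon, gene))  # 10.0
```

Because both RPKM values share the same library-size term, the SI is independent of sequencing depth and reflects only the exon's coverage relative to the gene as a whole.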
The moderated t-test was selected for this analysis because it does not assume that exons are independent of each other; non-independence of exons from the same gene is likely to be biologically relevant [249]. To compute the t-statistic, the empirical Bayes moderated t-test uses information from all exons in a gene, computing a weighted average of the variance at each exon and the variance across all exons at the locus [249]. The moderated t-test has been used previously for differential expression analysis of RNA-Seq data [294] and for differential splicing analysis using exon arrays [295]. On manual inspection in the IGV browser [163], exon 1 was found to have a high GC content (70% or more on average), which may account for the low coverage of this region in all samples. Because of this low coverage, the SI of exon 1 could not be reliably assessed in this study. According to UniProtKB records, the BRCA1 interaction region of BARD1 comprises residues 26-119, encoded by a portion of exon 1, exon 2, and exon 3 (Figure 3.5C) [296]. The lower expression of BARD1 exons 2 and 3 in NBL TICs is therefore consistent with expression of the shorter BARD1beta isoform, which has been reported to stabilize AURKB in cancer cells [269].
We also used the trans-ABySS de novo assembly pipeline [297], discussed in Section 4.4.6, to reconstruct the structure of BARD1 transcripts expressed by NBL TICs. This pipeline assembled short RNA sequencing reads into contigs, aligned the contigs to the hg18 reference genome, and then compared the alignments to the annotated transcript models from Ensembl 54 [298]. Because exon 1 of BARD1 was not covered by sequencing reads, we were unable to assemble contigs spanning the full length of BARD1 transcripts. However, we detected contigs lacking exons 1, 2, and 3, providing additional evidence that NBL TICs express the BARD1beta isoform.
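The variance moderation behind the LIMMA moderated t-test [249] can be sketched for a single feature in a two-group comparison. In limma's empirical Bayes formulation, each feature's residual variance s² (with d degrees of freedom) is replaced by a weighted average with a prior variance s0² (with d0 prior degrees of freedom), s̃² = (d0·s0² + d·s²)/(d0 + d), and the t-statistic uses s̃² with d0 + d degrees of freedom. limma estimates s0² and d0 from all features supplied to the fit; in this sketch they are simply assumed, and the function name `moderated_t` is hypothetical. The example plugs in the exon 2 means and standard deviations reported above.

```python
import math


def moderated_t(mean_diff, s2, d, s0_2, d0, n1, n2):
    """Moderated t for one feature in a two-group comparison.

    mean_diff : difference in group means (e.g. splice index, TICs - normals)
    s2, d     : the feature's pooled residual variance and its degrees of freedom
    s0_2, d0  : prior variance and prior degrees of freedom (assumed here;
                limma estimates them from all features in the fit)
    n1, n2    : group sizes
    """
    # Posterior variance: weighted average of the prior and the feature variance.
    s2_post = (d0 * s0_2 + d * s2) / (d0 + d)
    t = mean_diff / math.sqrt(s2_post * (1 / n1 + 1 / n2))
    return t, d0 + d


# Exon 2 SI: 2.17% (SD 0.96, n = 10 TIC libraries) vs 11.5% (SD 4.01, n = 16 normals).
n1, n2 = 10, 16
pooled = ((n1 - 1) * 0.96**2 + (n2 - 1) * 4.01**2) / (n1 + n2 - 2)
# The prior values s0_2=5.0 and d0=4 are hypothetical.
t, df = moderated_t(2.17 - 11.5, pooled, n1 + n2 - 2, s0_2=5.0, d0=4, n1=n1, n2=n2)
print(f"moderated t = {t:.2f} on {df} df")
```

With d0 = 0 the function reduces to the ordinary pooled-variance t-test; larger d0 pulls noisy per-feature variances towards the prior, which stabilizes inference when sample sizes are small.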
3.2.7 Relevance to primary neuroblastoma
In Sections 3.2.1-3.2.5 I found that the mRNA levels of members of the BRCA1/BARD1 signaling pathway were significantly higher in NBL TICs, which were derived predominantly from metastases, than in normal neural crest-like cells (SKPs) and other cancers. Moreover, both transcript and protein levels of AURKB, a member of the BRCA1/BARD1 pathway, were enriched in NBL TICs compared to SKPs (Sections 3.2.2 and 3.2.5). We also showed that genetic and pharmacological inhibition of AURKB was cytotoxic to NBL TICs, and less so to SKPs. In Section 3.2.6 I linked the preferential expression of AURKB by NBL TICs to the expression of the oncogenic BARD1beta isoform, which was reported to stabilize AURKB in cancer cells (Figure 3.5C) [269]. Since the NBL TICs used in this analysis were derived predominantly from bone marrow metastases of relapsed NBL patients [279], I asked whether the BRCA1/BARD1 pathway, the oncogenic BARD1beta isoform, and AURKB were also expressed by primary NBL cells. To address this question, I used RNA-Seq data from 10 primary NBL tumors, described in Chapter 4 and Appendix D.
To investigate whether the mRNA expression of BRCA1/BARD1 pathway members was enriched in primary NBL tumors relative to normal cells, I compared the expression profiles of 10 primary NBL tumors (Appendix D) with those of 16 normal tissues from the Illumina BodyMap 2.0 project (Section 3.2.6). I used RPKM as the measure of gene expression [150] and applied the methods in the LIMMA package [249] to identify genes significantly enriched in expression in NBL cells, as described in Section 3.4.3. This analysis revealed 1,828 genes with evidence of increased mRNA abundance in NBL cells compared to normal tissues (Benjamini-Hochberg-corrected q < 0.05).
Ingenuity Pathway Analysis software (Ingenuity Systems, www.ingenuity.com) was then used to identify significantly enriched annotations within this gene list, as described in Section 3.4.3. The pathway "Role of BRCA1 in DNA Damage Response" was the most significantly enriched annotation among the 1,828 genes (Fisher's exact test P < 0.05): 15 of the 53 members of this pathway (FANCG, FANCA, FANCD2, RAD51, BRCA1, BACH1, AURKB, BLM, RFC, MSH2, SWI/SNF, OCT1, TP53, PLK1, E2F) were more abundant at the mRNA level in NBL cells than in normal tissues (Benjamini-Hochberg-corrected q < 0.05).
Having established that the BRCA1/BARD1 signaling pathway annotation was significantly enriched among transcripts with increased expression in NBL tumors compared to normal cells, we used the RPKM measure to directly compare AURKB expression levels and BARD1 exon usage among NBL TICs, primary NBL tumors, and the Illumina BodyMap 2.0 normal tissues. We used the Illumina BodyMap 2.0 data rather than SKP data for the primary tumor versus normal analyses because, as mentioned in Section 3.2.6, BARD1 exon usage could not be reliably assessed in SKPs owing to the marginal expression of this gene in the SKP RNA-Seq libraries. The average SI of exon 2 in NBL TICs and NBL primary tumors was 2.17% (SD = 0.96%) and 3.57% (SD = 1.89%), respectively, both significantly lower than the average of 11.5% (SD = 4.01%) across the 16 normal tissues, as assessed by a moderated t-test with the Benjamini-Hochberg multiple testing correction implemented in the LIMMA Bioconductor package [249] (BH-corrected q < 0.05). Similarly, the average SI of exon 3 was 8.75% (SD = 2.60%) and 7.55% (SD = 2.04%) in NBL TICs and primary tumors, respectively, both significantly different from the average of 31% in the normal tissues (BH-corrected q < 0.05) (Figure 3.5A).
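The Benjamini-Hochberg (BH) correction used throughout this section converts a list of P-values into q-values that control the false discovery rate: each P-value is multiplied by m/rank (m tests, ranked in ascending order of P), and the results are made monotone from the largest P-value downwards. The sketch below is a plain-Python version of that step-up procedure, which should agree with R's `p.adjust(p, method = "BH")`; the function name and input P-values are hypothetical. Assuming the -log(BH q) columns in the Ingenuity pathway tables above use base-10 logarithms, a value of 1.30 corresponds to q ≈ 0.05.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P down, enforcing monotonicity of the q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(v, 3) for v in qvals])  # [0.02, 0.04, 0.04, 0.02]
print(round(-math.log10(0.05), 2))   # 1.3
```

Capping the running minimum at 1.0 also ensures that no adjusted value exceeds 1, matching the usual convention.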
The average gene-level RPKM values for AURKB were computed for the Illumina BodyMap 2.0 normal tissues (16 samples), NBL primary tumors (10 samples), and NBL TICs (10 samples). AURKB expression was not detectable above background (RPKM ~ 1) in any of the 16 normal libraries. In contrast, the average AURKB expression in NBL primary tumors and NBL TICs was 21.6 RPKM (range 2.55 to 36.95 RPKM) and 44.35 RPKM (range 9.83 to 67.66 RPKM), respectively (Figure 3.5B). These results are consistent with the interpretation that the BARD1beta isoform is present in both NBL TICs and primary NBL tumors, and that both primary tumors and NBL TICs may be sensitive to AURKB inhibition.
3.3 Discussion
The rationale for the work in Chapter 3 was that targeting cancer stem cell-specific proteins could be cytotoxic to cancer stem cells while sparing their normal stem cell counterparts, and could therefore lead to discoveries with potential clinical application. This idea has previously been validated in chronic myeloid leukemia. In that work, a leukemia stem cell-specific gene, Alox5, was identified, and its inhibition led to the eradication of chronic myeloid leukemia in a mouse model [281]. I therefore aimed to apply the same concept to the NBL TIC model, using SKPs as normal reference stem cells. To identify transcripts whose expression was enriched in NBL TICs, I used RNA-Seq expression data from NBL TICs, SKPs, and a compendium of cancer tissues. The compendium of cancer tissues included RNA-Seq data from cancerous lymph nodes. As a result, B-cell-specific transcripts found in NBL TICs by us (not shown) and others [280] would not be identified as NBL TIC-enriched. These transcripts possibly reflect contamination with Epstein-Barr virus-transformed lymphocytes. The gene-level expression analysis of RNA-Seq data from ten NBL TIC samples revealed 321 transcripts with increased expression in NBL TICs compared to SKPs and a panel of cancer tissues.
Twenty-one of these transcripts were members of the BRCA1 signaling pathway or its downstream components. This amounted to a statistically significant enrichment of this pathway annotation among transcripts with increased expression in NBL TICs (Fisher's Exact P < 0.05). A key component of the BRCA1 pathway, BRCA1-associated RING domain protein 1 (BARD1), was identified as a predisposition locus for high-risk NBL by a single nucleotide polymorphism (SNP)-based genome-wide association study [197]. This study of more than 500 high-risk NBL patients is also described in Section 1.9.2. In it, six SNPs located within BARD1 introns 1, 3, and 4 met genome-wide significance for association with the disease (odds ratio for the most significant SNP = 1.68; 95% confidence interval 1.49 to 1.90; P = 8.65 × 10^-18). Evidence from breast tumors suggests that BARD1 regulates the tumor suppressor function of BRCA1 and can act as a tumor suppressor itself [299,300]. In particular, the BARD1/BRCA1 heterodimer is important for tumor suppressor activity. Loss of BARD1, of BRCA1, or of their interaction is tumorigenic and results in similar basal-like phenotypes in breast cancers [300]. Preliminary investigations of the BARD1 NBL risk alleles identified in the genome-wide association study [197] suggest that these alleles result in overexpression of the oncogenic BARD1beta isoform (Figure 3.5C) [196]. The BARD1beta isoform lacks exons 2 and 3, which encode the RING-finger domain involved in the interaction with BRCA1 [269]. Aberrant BARD1 splicing has previously been reported in other cancers, including ovarian [301], colon [302] and non-small cell lung cancers [303], although these reports did not involve the isoform seen in NBL.
In this study, we also observed that NBL TICs and primary tumors, but not normal tissues, expressed the oncogenic BARD1beta isoform (Section 3.2.7). This isoform does not interact with BRCA1 but is instead involved in scaffolding BRCA2 and AURKB (Figure 3.5C) [269]. To identify existing therapeutics that could be applied to the treatment of recurrent NBL, I used Ingenuity Pathway Analysis software (Ingenuity Systems, www.ingenuity.com) to analyze the functional significance of the identified genes and match them against a database of available drugs. In total, thirty targets with an available inhibitor were identified, nine of which had not previously been implicated in NBL treatment. Aurora kinase B (AURKB), one of these nine novel drug targets, was selected for further validation for two reasons. The first was its link to the BRCA1 signaling pathway through reported interactions with the shorter (beta) isoform of the NBL predisposition locus BARD1 [269]. The second was the known role of its family member, AURKA, in NBL [304]. Both AURKA and AURKB are essential for proper chromosome alignment and separation during mitosis. Inhibition of either protein results in gross defects in chromosome segregation: aneuploidy in the case of selective AURKA inhibition, and polyploidy in the case of selective AURKB inhibition. Either defect leads to cell death [305]. The selective AURKB inhibitor AZD1152 was cytotoxic to the NBL TICs used in this study at lower concentrations than to normal pediatric neural crest-like precursor cells (SKPs) (EC50 of 1.5 to 4.6 μmol/L versus 12.4 μmol/L; Figure 3.3C). AURKA inhibitors are currently in clinical trials for NBL (NCT00739427). To our knowledge, however, this study provides the first report of AURKB inhibitors as potential therapeutics for NBL. Because AURKB inhibitors are already in clinical trials, this finding could potentially be translated rapidly into therapy for NBL.
The selective activity of AZD1152 in NBL TICs compared to SKPs provides a foundation for further exploring AURKB as a drug target for pediatric NBL. This selectivity is likely due to the higher abundance of AURKB protein in NBL TICs than in SKPs. Independent support for the potential significance of AURKB in NBL comes from a preliminary report by the KidsCancerKinome initiative. That initiative studied a panel of pediatric tumors and cell lines and found that both AURKA and AURKB were expressed at high levels in tumors with poor prognosis, including high-risk NBL [306]. The same group is currently investigating the therapeutic potential of inhibiting AURKA and AURKB in NBL through functional studies, including shRNA knockdowns and in vivo inhibitor studies (Ellen Westerhout, personal communication). Confirmation of our finding by an independent group of investigators studying primary NBL tumors lends credibility to our bioinformatic approach, in which we used normal SKPs and a compendium of cancer tissues to select NBL TIC-enriched markers. Two further reports support the results of my bioinformatic analysis. They used NBL TICs [307] and primary NBL tumors and cell lines [308], respectively, to provide experimental evidence of the therapeutic potential of PLK1 inhibition in high-risk NBL. As shown in Figure 3.2C, PLK1 signaling is downstream of the BRCA1/BARD1 pathway, and my analysis also identified PLK1 as a potential therapeutic target in NBL TICs (Table 3.4). In conclusion, the work described in this chapter provides the first high-resolution, system-level analysis of NBL TICs. It also provides a proof of principle that next-generation sequencing of primary human NBL TICs can reveal therapeutically relevant candidates for NBL. Specifically, we showed that inhibiting an NBL TIC-enriched transcript implicated in a relevant pathway is selectively cytotoxic to these cells compared to their normal stem cell counterparts (SKPs).
The selective cytotoxicity against cancer stem cell-like NBL TICs is particularly important for high-risk NBL. Current therapies for the disease can effectively reduce tumor burden, but they do not produce a durable cure in the majority of patients [174]. Since cancer stem cells are thought to be associated with disease relapse [243], specifically targeting NBL TICs may help achieve stable long-term remission in high-risk NBL patients. AURKB inhibition appeared selective for NBL TICs over normal pediatric stem cells (SKPs), which suggests that this treatment may be less toxic to children with NBL.
3.4 Materials and methods
3.4.1 RNA sequencing and data analysis
NBL TICs and SKPs were cultured as previously described [279,234]. Briefly, the cells were cultured in DMEM-F12 medium, 3:1 (Invitrogen), containing 2% B27 supplement (Gibco), 40 ng/mL basic fibroblast growth factor 2, and 20 ng/mL epidermal growth factor (both from Collaborative Research; proliferation media). Cultures were grown in 75 cm² flasks in a tissue-culture incubator at 37°C and 5% CO2. Growth conditions were normalized such that NBL TICs were cultured for 7 days and SKPs for 14 days after plating. Cells were then harvested in exponential growth phase and RNA was isolated for transcriptome analysis. Details of the NBL TIC and SKP samples used in this analysis are provided in Table 3.1. RNA sequencing libraries from NBL TICs and SKPs were constructed from DNase I-treated mRNA as previously described [149,102]. The libraries were sequenced on an Illumina Genome Analyzer. The read length and amount of aligned sequence data generated for each library are provided in Table 3.2. The reads were aligned to the human reference genome build hg18 (National Center for Biotechnology Information Build 36) and a database of known exon junctions [149] using MAQ software version 0.7.1 in paired-end mode [309]. Duplicate reads were retained for the expression analysis.
The number of bases sequenced per number of exonic bases mapped was used as a measure of the expression level of each gene [114]. RNA-Seq libraries from other tumor types were sequenced and processed according to the same production protocol [102,149]. The read length and amount of aligned sequence data generated for each of these libraries are also provided in Table 3.2. The reads were aligned to the human reference genome build hg18 (National Center for Biotechnology Information Build 36.1) and a database of known exon junctions [149] using MAQ software version 0.7.1 in paired-end mode, and duplicate read pairs were removed [309]. As for the NBL TIC and SKP libraries, the number of bases sequenced per number of exonic bases mapped was used as a measure of gene expression [114]. Genes with a cumulative expression value of less than 10 (summed across all samples) were excluded from the analysis. The expression values were square-root transformed and passed to the lmFit function of the Linear Models for Microarray Data (LIMMA) Bioconductor package. This function fits a linear model to each gene to estimate fold changes between the compared groups [249]. LIMMA was selected for this analysis because it had previously been used successfully for the analysis of RNA-Seq data [294]. The NBL TICs versus SKPs and NBL TICs versus other cancers comparisons were conducted in the same way, with a single contrast defined in each analysis to create a pairwise comparison [249]. For both pairwise comparisons, significance of differential expression was assessed with the moderated t-statistic computed by the eBayes function, combined with Benjamini-Hochberg (BH) multiple testing correction. Genes with BH-corrected q < 0.05 were considered statistically significant.
3.4.2 Microarray experiments and data analysis
Cells were collected and lysed in Trizol, and RNA was purified using the RNeasy mini kit (Qiagen).
RNA samples (Table 2.1) were analyzed on Affymetrix GeneChip Human Exon 1.0 ST Arrays. The data were checked for batch effects, background corrected, and normalized according to the Robust Multichip Average procedure using the Affymetrix Expression Console software. Gene-level expression summaries were computed based on all core probes. Differential gene expression was assessed using the lmFit function of the LIMMA Bioconductor package [249], as described in Section 2.4.3.
3.4.3 Identification of NBL TIC-enriched and depleted genes and the functional enrichment analysis
The lists of significantly differentially expressed genes from each RNA sequencing analysis (NBL TICs versus SKPs and NBL TICs versus the tissue pool) were intersected. This identified genes significantly enriched or depleted in NBL TICs with respect to both SKPs and the panel of cancer tissues. The lists of NBL TIC-enriched and NBL TIC-depleted transcripts were then compared with the lists of differentially expressed genes from the microarray analysis described in Section 3.4.2 (NBL TICs versus SKPs). This yielded robust sets of genes with increased or decreased expression in NBL TICs compared to SKPs and other cancers, confirmed by both RNA sequencing and microarrays (Appendix C). Ingenuity Pathway Analysis software (Ingenuity Systems, www.ingenuity.com) was then used to select canonical pathways significantly enriched among these microarray-confirmed sets of NBL TIC-enriched and NBL TIC-depleted transcripts (Fisher's Exact P < 0.05). The pathway enrichment analysis implemented in Ingenuity uses Fisher's Exact test to assess the null hypothesis that the number of observed genes in a particular pathway arose by chance (Ingenuity Systems, www.ingenuity.com). The null hypothesis is rejected at Fisher's Exact P < 0.05.
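The list intersection and the pathway test described in this section can be sketched in Python as follows. This is a generic illustration, not Ingenuity's implementation. `robust_sets` assumes each analysis has already been reduced to sets of "up" and "down" gene identifiers that are comparable across the RNA-Seq and microarray platforms. `fisher_enrichment_p` computes a one-sided Fisher's exact (hypergeometric upper-tail) P-value. The background size N is an assumption the user must supply, and both function names are hypothetical.

```python
from math import comb
from typing import Dict, Set, Tuple


def robust_sets(*analyses: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Genes called 'up' (resp. 'down') in every analysis supplied."""
    if not analyses:
        raise ValueError("need at least one analysis")
    up = set.intersection(*(a["up"] for a in analyses))
    down = set.intersection(*(a["down"] for a in analyses))
    return up, down


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact P-value for pathway over-representation.

    k: pathway genes found in the gene list
    n: size of the gene list
    K: number of pathway genes in the background
    N: size of the background gene set (an analysis choice)
    """
    upper = min(n, K)
    # Probability of observing k or more pathway genes by chance.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Hypothetical gene sets from three analyses (not data from this study).
    rnaseq_vs_skps = {"up": {"GENE_A", "GENE_B", "GENE_C"}, "down": {"GENE_X"}}
    rnaseq_vs_cancers = {"up": {"GENE_A", "GENE_B"}, "down": {"GENE_X", "GENE_Y"}}
    array_vs_skps = {"up": {"GENE_A", "GENE_B", "GENE_D"}, "down": {"GENE_X"}}
    print(robust_sets(rnaseq_vs_skps, rnaseq_vs_cancers, array_vs_skps))
    print(fisher_enrichment_p(k=2, n=3, K=4, N=10))  # -> 0.3333333333333333
```

Exact integer binomial coefficients from `math.comb` avoid the underflow that factorial-based formulas hit at genome-scale background sizes.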
3.4.4 Gel-free two-dimensional liquid chromatography coupled to shotgun tandem mass spectrometry
A crude membrane fraction was prepared as follows. NB88R2 cells were swollen in hypotonic buffer (20 mmol/L Tris, pH 7.4; 10 mmol/L KCl; 5 mmol/L sodium vanadate; 1 mmol/L phenylmethylsulfonylfluoride) and lysed by Dounce homogenization. The cleared cell lysate was centrifuged for 15 minutes at 6,000 × g to collect the crude membrane fraction. The protein fraction was resuspended in urea buffer (8 mol/L urea, 2 mmol/L HEPES, 2.5 mmol/L sodium pyrophosphate, 1 mmol/L β-glycerophosphate, and 1 mmol/L vanadate; Cell Signaling Technology). It was then reduced with 4.5 mmol/L dithiothreitol (DTT) and alkylated with 10 mmol/L iodoacetamide. A whole-cell fraction was prepared as follows. NB88R2 cells were lysed in urea lysis buffer (8 mol/L urea, 2 mmol/L HEPES, 2.5 mmol/L sodium pyrophosphate, 1 mmol/L β-glycerophosphate, and 1 mmol/L vanadate) and sonicated (3 bursts of 4 W for 10 s). The cell lysate was cleared by centrifugation (20,000 × g for 15 min at 4°C), reduced with 4.5 mmol/L DTT and alkylated with 10 mmol/L iodoacetamide. Proteins were digested with trypsin and purified using C18 reverse-phase resin prior to mass spectrometry. The gel-free two-dimensional liquid chromatography coupled to shotgun tandem mass spectrometry (MudPIT) analysis was run for 8 cycles as described [284], with the following modifications. Approximately 60 μg (membrane fraction) or 40 μg (whole-cell fraction) of digested protein was analyzed on a linear ion-trap LTQ-Orbitrap mass spectrometer (ThermoFisher). Samples were loaded using a Proxeon HPLC system (Thermo Fisher Scientific) and subjected to MudPIT analysis. All data were analyzed using the Sequest (ThermoFinnigan; version SRF v. 5) and X!
Tandem (http://www.thegpm.org/; version 2007.01.01.2 for the membrane fraction or version TORNADO 2009.04.01.3 for the whole-cell fraction) search algorithms against the Human International Protein Index database. Version 3.41 (72,155 entries) was used for the membrane fraction and version 3.66 (86,845 entries) for the whole-cell fraction. Sequest and X! Tandem searches used a fragment ion mass tolerance of 0.50 Da (membrane fraction) or 0.40 Da (whole-cell fraction) and a parent ion tolerance of 2.0 ppm or 5.0 ppm, respectively. The fragment ion mass tolerance defines the error range within which an observed fragment ion peak is considered to match a theoretical one. The parent ion tolerance defines the allowed mass error between a measured precursor ion and a candidate peptide in the database. The iodoacetamide derivative of cysteine was specified as a fixed modification in Sequest and X! Tandem, and the oxidation of methionine was specified as a variable modification. Proteins were accepted if two conditions were met. First, at least two peptides per protein had to be identified with a Peptide Prophet probability [310] of 95% or greater (membrane-enriched fraction) or 90% or greater (whole-cell lysate). Second, the overall protein identification probability from the Protein Prophet algorithm [311] had to exceed 95% or 90%, respectively.
3.4.5 AlamarBlue assay
NB12, NB88R2, and FS283 spheres were dissociated into single cells and seeded in triplicate at 3,000 cells per well in 50 μL medium containing 30% SKP-conditioned medium, in non-tissue culture-treated 96-well plates (Corning Life Sciences). AZD1152 (Selleck Chemicals LLC) was dissolved in dimethyl sulfoxide (DMSO) to a stock concentration of 50 mmol/L, from which threefold (1:3) serial dilutions were prepared. Intermediate dilutions of the compound were made in medium and immediately added to the cells in a volume of 50 μL.
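The dilution scheme can be made concrete with a short Python sketch. Because cells are seeded in 50 μL and the diluted drug is added in 50 μL, the sketch assumes the concentration is halved in the well. This section does not state the intermediate dilution factor into medium, so `INTERMEDIATE_FACTOR` below is a hypothetical placeholder, as are the function names.

```python
from typing import List


def serial_dilution(start_conc: float, fold: float, n_points: int) -> List[float]:
    """Concentrations of an n-point serial dilution, dividing by `fold` at each step."""
    if fold <= 1 or n_points < 1:
        raise ValueError("fold must be > 1 and n_points >= 1")
    return [start_conc / fold ** i for i in range(n_points)]


def final_well_conc(added_conc: float, added_ul: float, well_ul: float) -> float:
    """Concentration after adding `added_ul` of drug solution to `well_ul` of cells."""
    return added_conc * added_ul / (added_ul + well_ul)


if __name__ == "__main__":
    STOCK_UMOL_PER_L = 50_000.0    # 50 mmol/L AZD1152 stock in DMSO
    INTERMEDIATE_FACTOR = 1_000.0  # hypothetical dilution into medium
    for c in serial_dilution(STOCK_UMOL_PER_L, 3.0, 8):
        in_well = final_well_conc(c / INTERMEDIATE_FACTOR, 50.0, 50.0)
        print(f"{c:10.1f} umol/L in DMSO -> {in_well:8.3f} umol/L in well")
```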
Cells treated with 0.05% DMSO in the absence of the drug were used as a control for optimal cellular proliferation, whereas wells containing medium only were used to determine the background fluorescence. AlamarBlue (10 μL) was added to each well after 72 hours, followed by incubation for an additional 24 hours. Fluorescence intensity was measured using a PHERAstar SpectraMax Plus384 microplate reader (BMG Labtech) with an excitation filter of 535 nm and an emission filter of 590 nm. Percentage reduction of alamarBlue was calculated as ((mean fluorescence of treated wells - background fluorescence) / (mean fluorescence of DMSO-treated wells - background fluorescence)) × 100. Half maximal effective concentration (EC50) curves were generated using GraphPad Prism 5 software (GraphPad Software, Inc.).
3.4.6 Western blotting
Cells were harvested, washed with cold HBSS, and lysed with NP40 lysis buffer containing 10 mmol/L Tris (pH 8.0), 150 mmol/L NaCl, 10% glycerol, 1% Nonidet P-40, 1 mmol/L phenylmethylsulfonylfluoride, 1 mmol/L orthovanadate, and a protease inhibitor cocktail tablet (Complete Mini, EDTA-free, Roche). Cells were lysed for 10 to 20 minutes on ice and centrifuged for 10 minutes at 12,000 rpm at 4°C. Protein amounts were determined by BCA assay (Pierce), and 40 μg of protein was loaded per lane. Western blots were probed overnight at 4°C in 5% w/v nonfat dry milk in TBS/0.1% Tween-20. The primary antibodies were rabbit polyclonal anti-Aurora B antibody (Abcam; ab2254) and mouse monoclonal anti-glyceraldehyde-3-phosphate dehydrogenase antibody (Santa Cruz; sc-47724). Blots were developed using ECL or ECL-plus reagent (GE Healthcare Life Sciences).
3.4.7 Small hairpin RNA (shRNA) knockdowns
Cell lines were either mock-treated or stably infected with lentiviruses encoding the shRNAs of interest at a multiplicity of infection of 1.0. Seventy-two hours post infection, the virus was removed, and cells were seeded in triplicate at a density of 10,000 cells per well in 24-well plates.
The remaining cells were used for RNA isolation to determine knockdown efficiency by quantitative reverse transcription PCR (qRT-PCR). Viable cell numbers were determined on days 1, 3, 5, and 7 after plating by removing cells from the wells and counting them with a hemocytometer. The experiments were conducted in triplicate.
3.4.8 Exon-level analysis of RNA sequencing data
The BARD1 splicing analysis using RNA-Seq data was conducted as follows. RNA-Seq data from three sources were processed as described in Section 3.4.1: NBL TIC libraries (Table 3.1), NBL primary tumors (Appendix D), and a panel of 16 normal tissues from the Illumina BodyMap 2.0 project, available through the Gene Expression Omnibus (GSE30611). The exon coverage analysis was based on Ensembl gene annotations (homo_sapiens_core_54_36p) [298]. These annotations were converted into one model per gene by collapsing all transcripts of a given gene into a single gene model. The exonic bases of each collapsed model were the union of the exonic bases of all known transcripts of the gene. The analysis used SAMtools version 0.1.13 pileup [312] to obtain per-base coverage depths. Reads with mapping quality < 10 and reads flagged as poor quality by the Illumina chastity filter were excluded. The final analysis report included coverage information for each individual exon and intron in the collapsed gene models, as well as the cumulative coverage across all exons in each model. These coverage statistics were computed using the RPKM method [150]. An RPKM of 1 was used as the threshold for considering an exon expressed above background [150]. The RPKM for each exon was calculated as (number of reads mapped to the exon × 10^9) / (NORM_TOTAL × length of the exon), where NORM_TOTAL is the total number of reads mapped to exons, excluding those belonging to the mitochondrial genome.
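The exon-level RPKM formula above and the whole-gene formula that follows can be sketched in Python as shown below. The sketch rests on two assumptions. First, read counts have already been assigned to exons of the collapsed gene models, with each read counted once. Second, mitochondrial reads are identified by a chromosome name of `chrM` or `MT`. The function names are hypothetical.

```python
from typing import Dict

MITO_NAMES = ("chrM", "MT")  # assumed names for the mitochondrial genome


def norm_total(exon_reads_by_chrom: Dict[str, int]) -> int:
    """NORM_TOTAL: exon-mapped reads, excluding mitochondrial reads."""
    return sum(n for chrom, n in exon_reads_by_chrom.items() if chrom not in MITO_NAMES)


def rpkm(reads: int, length_bp: int, total: int) -> float:
    """(reads x 1e9) / (NORM_TOTAL x length in bp)."""
    if length_bp <= 0 or total <= 0:
        raise ValueError("length and NORM_TOTAL must be positive")
    return reads * 1e9 / (total * length_bp)


def gene_rpkm(exon_reads: Dict[str, int], exon_lengths: Dict[str, int], total: int) -> float:
    """Gene-level RPKM: reads over all exons divided by the summed exon length."""
    return rpkm(sum(exon_reads.values()), sum(exon_lengths.values()), total)


if __name__ == "__main__":
    # Hypothetical counts for a two-exon collapsed gene model.
    total = norm_total({"chr1": 6_000_000, "chr2": 4_000_000, "chrM": 2_000_000})
    reads = {"exon1": 50, "exon2": 10}
    lengths = {"exon1": 1000, "exon2": 500}
    print(rpkm(reads["exon1"], lengths["exon1"], total), gene_rpkm(reads, lengths, total))
```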
The RPKM for the whole gene was calculated as (number of reads mapped to all exons in the gene × 10^9) / (NORM_TOTAL × sum of the lengths of all exons in the gene), with NORM_TOTAL defined as above. The splice indices for BARD1 (ENSG00000138376) exons were computed as (exon RPKM / gene RPKM) × 100. The significance of the observed differences in splice indices between sample pairs was assessed using the R Linear Models for Microarray Data (LIMMA) package, adapted for splicing analysis as previously described [295]. The Benjamini-Hochberg correction for multiple testing was used, and corrected q-values of less than 0.05 were considered statistically significant.
3.4.9 AURKB expression analysis
The AURKB expression analysis in Section 3.2.7 used the gene-level RPKM measure, computed as described above for BARD1 (Section 3.4.8). The gene annotation was based on Ensembl 54 (homo_sapiens_core_54_36p) [298]. An RPKM of 1 was used as the threshold for considering a gene expressed above background [150].
Figure 3.1 Transcripts enriched and depleted in NBL TICs compared with SKPs and other tumor tissues
Differentially expressed genes were identified using RNA sequencing data from NBL TICs, SKPs and a panel of cancer tissues. An equivalent differential expression analysis was conducted using exon array data from NBL TICs and SKPs. (A).
Venn diagrams summarize the overlap of the results from the three differential expression analyses (NBL TICs versus SKPs using microarrays, NBL TICs versus SKPs using RNA-Seq, and NBL TICs versus other cancers using RNA-Seq) for upregulated (left panel) and downregulated (right panel) genes. (B). RNA sequencing expression profiles of 321 NBL TIC-enriched (red column) and 819 NBL TIC-depleted transcripts (blue column) in NBL TICs, SKPs, and other cancer libraries are plotted as a heatmap, with transcripts as rows and samples as columns. Each row is centered and scaled by subtracting the row mean from every value and dividing the result by the row standard deviation (row Z-score). The NBL TIC libraries are labeled with the "TIC" prefix, and the tissue identities of the remaining libraries are given in Table 3.2. The 321 NBL TIC-enriched genes and 819 NBL TIC-depleted genes were confirmed as significantly differentially expressed in all three comparisons described in (A). The robustness of the heatmap was confirmed using the bootstrapping algorithm implemented in the pvclust R package [313]. Based on the expression of the 321 NBL TIC-enriched transcripts, NBL TICs were separated from the other tissues 98 out of 100 times. Adapted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
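The row Z-score scaling described for panel B can be sketched in Python. The caption does not state whether the heatmap used the sample standard deviation (n - 1 denominator, as R's `scale()` does) or the population standard deviation, so the n - 1 choice here is an assumption. Rows with zero variance are mapped to zeros, which is also an assumption. The function name `row_zscores` is hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def row_zscores(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Center each row on its mean and divide by its sample SD (n - 1)."""
    scaled = []
    for row in matrix:
        mu = mean(row)
        sd = stdev(row)  # requires at least two values per row
        if sd == 0:
            scaled.append([0.0] * len(row))  # constant row: no variation to show
        else:
            scaled.append([(x - mu) / sd for x in row])
    return scaled


if __name__ == "__main__":
    # Hypothetical expression values: two transcripts (rows) x three libraries.
    print(row_zscores([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]))
```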
Figure 3.2 Pathway analysis of NBL TIC-enriched transcripts
The Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to reveal canonical pathways significantly enriched among genes upregulated (A) or downregulated (B) in NBL TICs (Fisher's Exact P < 0.05). The ratios of observed versus total numbers of genes in each pathway are plotted with the orange line, whereas the lengths of the blue bars are the significance scores for each pathway. The significance threshold (Fisher's Exact P < 0.05) is marked by the vertical orange line. (C). The pathway named "Role of BRCA1 in DNA damage response" was the most significantly upregulated pathway in NBL TICs compared with SKPs and other tissues. Pathway members with increased expression in NBL TICs are highlighted in red, and protein complexes are indicated by a bold circle. The recently reported protein-protein interaction between AURKB, BRCA2 and the short (beta) isoform of BARD1 is denoted by a dotted line. Adapted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
Figure 3.3 NBL TICs are sensitive to Aurora B kinase inhibition
(A). Western blot analysis confirmed the presence of AURKB protein in NBL TICs but not in SKPs. Blots were probed with the rabbit polyclonal anti-Aurora B antibody (Abcam; ab2254) and the mouse monoclonal anti-GAPDH antibody (Santa Cruz; sc-47724). The AURKB band at 37 kDa is detectable in NBL TIC lines NB12, NB88R2 and NB122R, as in the positive control (HeLa cells).
The AURKB band is undetectable in SKP lines FS274 and FS227. (B). Reduced proliferation of NBL TICs upon shRNA knockdown of AURKB. The left panel shows growth curves of NBL TIC lines NB88R2 (top) and NB12 (bottom) infected with shRNA against AURKB or with controls. In the right panel, quantitative reverse transcription PCR was used to determine the efficiency of AURKB knockdown (76-86%). All experiments were done in triplicate. (C). The AlamarBlue assay showed that AZD1152 inhibited NBL TICs with EC50 values of 1.5 to 4.6 μmol/L, compared with an EC50 of 12.4 μmol/L in SKPs. All experiments were done in triplicate. Reprinted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
Figure 3.5 NBL cells preferentially express the oncogenic BARD1beta isoform that is involved in the stabilization of AURKB
RNA-Seq data from three sources were analyzed for exon-level and gene-level expression as described in Methods: 10 NBL TIC libraries (Table 3.1), 10 NBL primary tumors (Appendix D), and a panel of 16 normal tissues from the Illumina BodyMap 2.0 project, available through the Gene Expression Omnibus (GSE30611). (A). Exon usage at the BARD1 locus is quantified using splice indices (SI). The SI for each BARD1 exon is computed as (exon RPKM / gene RPKM) × 100. The average SI value is calculated for each of the three groups: Illumina BodyMap 2.0 normal tissues (16 samples), NBL primary tumors (10 samples), and NBL TICs (10 samples).
The SI values for each of the BARD1 exons (x-axis) are plotted along the y-axis. The SI values of exons 2 and 3 (marked by stars) are significantly lower in the NBL primary tumors and NBL TICs than in the normal tissues (Benjamini-Hochberg-corrected q < 0.05). This finding is consistent with the expression of the BARD1beta isoform by primary NBL cells and NBL TICs. (B). The gene-level expression of AURKB in each sample was quantified using the RPKM measure as described in Methods. The average gene-level RPKM value for AURKB is computed for each group: Illumina BodyMap 2.0 normal tissues (16 samples), NBL primary tumors (10 samples), and NBL TICs (10 samples). AURKB expression is not detectable above background (RPKM ~ 1) in any of the 16 normal libraries. In contrast, the average AURKB expression in NBL primary tumors and NBL TICs is 21.6 RPKM (range 2.55 to 36.95 RPKM) and 44.35 RPKM (range 9.83 to 67.66 RPKM), respectively. (C). A cartoon representation of the hg18 Ensembl 54 BARD1 gene model [298]. The exons are depicted by squares, while introns are shown as lines. The protein domains are depicted with squares of different colors, as described in the legend, and are marked on the exons that encode them. The full-length BARD1 transcript includes all coding exons. It encodes a protein with three ANK repeats, two BRCT domains, and a RING-finger domain [296]. The BRCA1 interaction region includes the RING-finger domain and comprises residues 26-119, encoded by a portion of exon 1, exon 2 and exon 3. The BARD1beta transcript lacks exons 2 and 3. It encodes a protein product without the RING-finger domain that stabilizes AURKB through its scaffolding with BRCA2 [269].
Table 3.1 Human NBL TIC and SKP lines used for gene expression analysis
Human NBL TIC and SKP lines were analyzed by RNA-Seq (column 6) and/or microarray (column 5) to identify transcripts significantly enriched in NBL TICs (Section 3.2.1).
The International Neuroblastoma Staging System (INSS) stage is listed in column 2, the MYCN oncogene amplification status of NBL samples in column 3, and the tissue origin in column 4. All NBL TIC lines are derived from high-risk NBL patients, while SKP lines are derived from cancer-free children. Superscripts designate samples from the same patient. Reprinted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
Sample INSS Stage MYCN Description Human Exon Array NB121 4 Single copy Yes NB671 4 Single copy NB12-21 4 Single copy NB88L12 4 Single copy NB88R22 4 Single copy NB122R3 4 Single copy NB122L3 4 Single copy Bone marrow metastasis, relapse Bone marrow metastasis, remission Bone marrow metastasis, relapse Bone marrow metastasis, relapse Bone marrow metastasis, relapse Bone marrow metastasis, relapse Bone marrow metastasis, relapse RNA Sequencing (Library ID) Yes (HS0502) Yes (HS0499) Yes Yes (HS1041) Yes Yes (HS0382) Yes Yes (HS0627) Yes Yes (HS1040) Yes Yes (HS1151) Sample INSS Stage MYCN Description Human Exon Array NB100 4 Amplified Yes NB1284 4 Amplified NB1534 4 Amplified NB121 4 Amplified FS210 Normal Single copy FS248 Normal Single copy FS253 Normal Single copy FS225 Normal Single copy FS227-P1 Normal Single copy FS227-P2 Normal Single copy FS229 Normal Single copy FS230 Normal Single copy Brain metastasis, relapse Bone marrow metastasis, diagnosis Primary tumor, postchemotherapy Bone marrow metastasis, diagnosis Neural crest stem cell-like SKPs Neural crest stem cell-like SKPs Neural crest stem
cell-like SKPs Neural crest stem cell-like SKPs Neural crest stem cell-like SKPs Neural crest stem cell-like SKPs Neural crest stem cell-like SKPs Neural crest stem cell-like SKPs Yes RNA Sequencing (Library ID) Yes (HS1149) Yes (HS1241) Yes (HS1593) Yes (HS1042) Yes (HS1043) Yes (HS1150) Yes Yes Yes Yes Yes

Table 3.2 List of RNA sequencing libraries and their sequencing statistics. Messenger RNA from NBL TICs, SKPs, and a compendium of cancer tissues was sequenced on an Illumina Genome Analyzer. The reads were aligned to the human reference genome build hg18 (National Center for Biotechnology Information Build 36) and to a database of exon junctions [149] using MAQ software version 0.7.1 in paired-end mode [309]. Duplicate reads were retained for this analysis. The median read length for each library is provided in column 3, and the total amount of aligned sequence in column 4. Adapted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
Library ID | Tissue source | Median read length (bp) | Aligned sequence (bp)
HS0382 | Neuroblastoma TICs | 42 | 6,739,934,150
HS0627 | Neuroblastoma TICs | 36 | 3,985,563,940
HS0499 | Neuroblastoma TICs | 42 | 2,476,237,136
HS0502 | Neuroblastoma TICs | 42 | 2,215,675,756
HS1040 | Neuroblastoma TICs | 50 | 3,729,063,600
HS1041 | Neuroblastoma TICs | 50 | 4,313,933,000
HS1149 | Neuroblastoma TICs | 75 | 5,822,325,000
HS1151 | Neuroblastoma TICs | 50 | 3,374,707,500
HS1241 | Neuroblastoma TICs | 50 | 3,214,430,500
HS1593 | Neuroblastoma TICs | 50 | 5,252,354,800
HS1042 | SKPs | 50 | 4,209,705,900
HS1043 | SKPs | 50 | 4,453,217,800
HS1151 | SKPs | 50 | 3,374,707,500
HS0299 | Breast cancer cell line | 36 | 1,119,888,180
HS0327 | Ovarian tumor | 42 | 1,415,520,768
HS0419 | Breast cancer cell line | 36 | 1,177,457,184
HS0445 | Breast cancer cell line | 36 | 1,828,856,952
HS0462 | Ovarian tumor | 36 | 1,915,337,292
HS0463 | Ovarian tumor | 39 | 856,303,704
HS0464 | Ovarian tumor | 42 | 868,958,028
HS0465 | Ovarian tumor | 42 | 1,058,404,872
HS0466 | Ovarian tumor | 42 | 1,295,742,888
HS0467 | Ovarian cancer cell line | 39 | 775,075,260
HS0468 | Ovarian tumor | 36 | 2,698,323,072
HS0469 | Ovarian tumor | 42 | 1,374,384,648
HS0470 | Ovarian tumor | 36 | 2,285,498,720
HS0471 | Ovarian tumor | 42 | 1,393,590,120
HS0511 | Breast tumor | 36 | 6,236,709,588
HS0644 | Lymphoma | 36 | 5,816,275,584
HS0652 | Lymphoma | 36 | 2,332,176,480
HS0663 | Lung tumor | 42 | 283,413,396
HS0701 | Ovarian tumor | 46 | 1,074,023,336
HS0702 | Ovarian tumor | 50 | 1,830,544,072
HS0703 | Ovarian tumor | 36 | 1,422,457,368
HS0706 | Lung tumor | 36 | 1,459,970,088
HS0708 | Ovarian tumor | 36 | 1,760,040,532
HS0709 | Oligodendroglioma cell line | 36 | 2,741,182,488
HS0724 | Blood from a cancer patient | 42 | 2,119,757,136
HS0727 | Lung tumor | 42 | 2,421,817,104
HS0728 | Lung tumor | 42 | 4,112,016,636
HS1085 | Oligodendroglioma tumor | 50 | 5,024,879,600
HS1086 | Oligodendroglioma tumor | 50 | 6,907,718,400
HS1400 | Metastatic adenocarcinoma tumor | 50 | 12,806,918,200

Table 3.3 Proteins detected in the whole and
crude membrane cell extract of NBL TIC line NB88R and their corresponding RNA-Seq expression levels. Protein detection was performed as described in Methods. Briefly, a protein was considered detected when at least two of its peptides were identified with a Peptide Prophet probability [310] of 95% or greater (membrane-enriched fraction) or 90% or greater (whole cell lysate), and its overall Protein Prophet identification probability [311] exceeded 95% or 90%, respectively. In other words, the 95% cutoff for the membrane-enriched fraction (labeled "95% CI" in the table) corresponds to a 95% or greater likelihood that each protein was identified correctly. Similarly, the lowered 90% cutoff used for the whole cell lysate (labeled "90% CI") corresponds to a 90% or greater likelihood that each protein was identified correctly. The threshold was lowered for the whole cell lysate analysis because of the lower sensitivity of this assay for protein identification [314]. Adapted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
NBL TIC-enriched gene Average expression level in NBL TICs HNRNPU SFXN1 KPNB1 NUP210 NUP214 SLC7A6 XPO5 NUP107 SLC1A4 NUP88 HNRNPM HNRNPD FUBP1 HMGB2 TAF15 SFRS2 GTF2I HTT EPB41 998 140 652 605 231 232 212 107 279 98 364 379 271 306 278 505 129 531 229 PSME3 USP10 LMNB1 SFRS1 TMPO HNRNPH1 CPSF6 HNRNPR SFPQ IMMT SSRP1 NUP93 PCNA CYFIP2 CEP72 NOLC1 LARP1 STRBP ANKRD44 CLN6 287 118 292 526 304 628 189 257 588 151 369 121 229 535 17 285 750 134 76 76 Protein product type Transporter Transporter Transporter Transporter Transporter Transporter Transporter Transporter Transporter Transporter Transmembrane receptor Transcription regulator Transcription regulator Transcription regulator Transcription regulator Transcription regulator Transcription regulator Transcription regulator Plasma membrane protein Peptidase Peptidase Other Other Other Other Other Other Other Other Other Other Other Other Other Other Other Other Other Other Membrane-Enriched Fraction (95% CI) Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Whole Cell Lysate (90% CI) Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes NBL TIC-enriched gene WDR77 MKI67 SUPT16H RFC5 PARP1 NNT MCM7 FH ATIC MTHFD1 PAICS MCM2 MCM6 TRAP1 GART MCM4 GOT2 KARS MCM3 RRM2 MARS UBE2N LBR TOP2A MRPL37 SCLY DARS2 DHTKD1 POLR1A RFC3 FEN1 MCCC1 TARS2 GPHN RRM1 SUPV3L1 Average expression level in NBL TICs 117 1253 268 57 368 212 374 95 227 188 357 323 188 242 205 390 202 203 346 425 266 119 299 586 123 70 79 133 195 99 133 71 71 77 218 59 Protein product type Other Other Nuclear protein Nuclear protein Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Enzyme Membrane-Enriched Fraction (95% CI) Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
Yes Yes Yes Yes Yes Yes Whole Cell Lysate (90% CI) Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes

Table 3.4 Known drug targets among NBL TIC-enriched genes. Transcripts enriched in NBL TICs are in bold. The drug-target associations were obtained from the Ingenuity Knowledgebase (Ingenuity Systems, www.ingenuity.com). Drugs previously or currently used in NBL (based on literature review, the Ingenuity Knowledgebase, or ClinicalTrials.gov; http://www.clinicaltrials.gov/ as of February 2010) are underlined. Adapted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
Gene symbol | Drug
ADORA2A | Caffeine-containing drugs, adenosine, istradefylline, dyphylline, binodenoson, regadenoson, aminophylline, clofarabine, theophylline
AURKB | AZD-1152
PLK1 | BI2536
PDE7A | Dyphylline, nitroglycerin, aminophylline, anagrelide, milrinone, dipyridamole, tolbutamide, theophylline, pentoxifylline
TYMS | Flucytosine, plevitrexed, nolatrexed, capecitabine, floxuridine, LY231514, 5-fluorouracil, trifluridine
PRIM1 | Fludarabine phosphate
POLE3 | Gemcitabine
RRM1 | Fludarabine phosphate, gemcitabine, clofarabine
RRM2 | Triapine, hydroxyurea, fludarabine phosphate, gemcitabine
PARP1 | INO-1001
GART | LY231514
POLE | Nelarabine, gemcitabine, clofarabine, trifluridine
TOP2A | Novobiocin, CPI-0004Na, pixantrone, elsamitrucin, AQ4N, BN 80927, tafluposide, norfloxacin, tirapazamine, TAS-103, gatifloxacin, valrubicin, gemifloxacin, nemorubicin, nalidixic acid, epirubicin, daunorubicin, etoposide, doxorubicin, moxifloxacin, becatecarin, mitoxantrone, dexrazoxane
BCL2 | Oblimersen, (-)-gossypol, obatoclax, G3139
SLC1A4 | Riluzole
ODC1 | Tazarotene, eflornithine
IL6 | Tocilizumab
ERBB2 | Trastuzumab, BMS-599626, ARRY-334543, XL647, CP-724714, HKI-272, lapatinib, erlotinib
HDAC1 | Tributyrin, PXD101, pyroxamide, MGCD0103, FR901228, vorinostat
ITGA2B | Abciximab, TP-9201, eptifibatide, tirofiban
TNF | Adalimumab, infliximab, CDP870, golimumab, thalidomide, etanercept
NR3C1 | Corticosteroid-containing drugs (beclomethasone dipropionate)
CXCL10 | MDX-1100
TERT | GRN163L
TUBA1C | Colchicine/probenecid, XRP9881, E7389, AL-108, EC145, NPI2358, milataxel, TTI-237, vinflunine, podophyllotoxin, colchicines, epothilone B, TPI 287, docetaxel, vinorelbine, vincristine, vinblastine, paclitaxel, ixabepilone
CDC2 | Flavopiridol
COL14A1 | Collagenase
TNFRSF10B | CS-1008
FYN | Dasatinib
ALAD | δ-Aminolevulinic acid

Chapter 4: Whole genome characterization of primary neuroblastoma tumors reveals a wide spectrum of somatic alteration⁴

4.1 Introduction

In Chapters 2 and 3 of this thesis, I reported on the
analysis of the expression profiles of normal and malignant neural crest stem cell-like cells, respectively. These analyses revealed a number of genes and pathways that are aberrantly expressed in metastasis-derived NBL TICs, such as those involved in DNA double-strand break repair. They also implicated Aurora kinase B as a novel drug target against NBL TICs (Chapter 3). Exon-level analysis of the RNA-Seq data described in Chapter 3 suggested a possible mechanism for the sensitivity of NBL TICs, but not normal cells, to AURKB inhibition. Subsequent work by others has confirmed Aurora kinase B as a drug target in primary NBL tumors [306]. The overall objective of Chapter 4 is to conduct a high-resolution characterization of a panel of primary NBL tumors using next-generation sequencing approaches. The goal is to identify additional drug targets that are relevant to primary tumors at diagnosis. In particular, this chapter addresses two specific aims. First, I address whether primary NBL tumors harbor recurrently mutated genes. Second, I investigate whether the genetic aberrations found in primary tumors recurrently target the same signaling pathways. To accomplish these aims, we developed a strategy that combines several next-generation sequencing approaches (Figure 4.1). We used it to comprehensively characterize 99 primary NBL tumor DNA samples, together with matched peripheral blood DNA samples that served as normal reference material. We used Illumina whole exome sequencing to characterize 81 tumor/normal pairs, Illumina whole genome and transcriptome sequencing to characterize 10 tumor/normal pairs, and Complete Genomics, Inc. (CGI) whole genome sequencing to characterize another 10 tumor/normal pairs. Among these samples, one case was studied by both whole exome and whole genome sequencing using Illumina, and another case was studied by whole genome sequencing using both CGI and Illumina. The Illumina and CGI sequencing technologies are discussed in Section 1.6.3.1.

This study reports on the application of second-generation sequencing to the characterization of high-risk NBL. The 99 NBL cases included in this analysis were collected and characterized as part of the Therapeutically Applicable Research to Generate Effective Targets (TARGET) initiative (http://target.cancer.gov). The TARGET initiative is a pediatric branch of The Cancer Genome Atlas and was designed to identify molecular targets for pediatric cancer drug development. All sequence data have been deposited in dbGaP (http://www.ncbi.nlm.nih.gov/gap), and the six-letter codes used in this database to identify individuals are used throughout the text. The clinical details of the cases are listed in Appendix D.

⁴ A version of this chapter is in revision, and the co-author contributions are detailed in the Preface as per the University of British Columbia PhD thesis guidelines: T.J. Pugh*, O. Morozova*, E.F. Attiyeh, S. Asgharzadeh, J.S. Wei, D. Auclair, K. Cibulskis, M.S. Lawrence, A.H. Ramos, E. Shefler, A. Sivachenko, C. Sougnez, I. Birol, R.D. Corbett, K.L. Mungall, Y. Zhao, R.A. Moore, N. Thiessen, A. Lo, R. Chiu, S.D. Jackman, A. Ally, B. Kamoh, A. Tam, J. Qian, M. Krzywinski, M. Hirst, S.J. Diskin, Y.P. Mosse, K.A. Cole, M. Diamond, R. Sposto, L. Ji, T. Badgett, W.B. London, Y. Moyer, J.M. Gastier-Foster, M.A. Smith, J.M. Guidry Auvil, D.S. Gerhard, M.D. Hogarty, S.B. Gabriel, S.J.M. Jones, G. Getz, R.C. Seeger, J. Khan, M.A. Marra, M. Meyerson, J.M. Maris. The genomic landscape of high-risk neuroblastomas reveals a wide spectrum of somatic mutation. In revision. *Authors contributed equally.

4.2 Results

4.2.1 Exome sequencing

Exome sequencing was used to survey the frequency of coding sequence mutations in 81 high-risk NBL tumor/normal pairs (Figure 4.1). One of these pairs was also included in the set of 19 whole genome sequences described below.
DNA was extracted and amplified, and ~33 Mb of genomic sequence was captured by in-solution hybridization [315] and sequenced on the Illumina platform [316]. The target regions consisted of 193,094 exons from 18,863 genes annotated by the Consensus Coding Sequence [317] and RefSeq [318] databases as coding for protein or microRNA (accessed November 2010). On average, 10.7 Gb of unique sequence data were generated for each sample. Of these data, 58% aligned to the target exome using the Burrows-Wheeler Aligner [319] (84% if bases within 250 bp of each target were included), resulting in a median coverage of 191X for each on-target base. On average, 90% of targeted bases were suitable for mutation detection using the muTect algorithm [108,320,91], defined as > 14 reads in the tumor and > 8 reads in the normal. In total, 14% of exons had fewer than 90% of their bases assessable for mutation in at least 73 of the 81 exome pairs (90%). This was apparently due to systematic capture or sequencing problems related to GC content.

4.2.2 Whole genome and transcriptome sequencing

Genome sequencing of tumor and matched normal DNA was used to explore the spectrum of somatic sequence and structural aberration in 19 high-risk neuroblastoma cases. To survey the fraction of rearrangements that are expressed in the transcriptome, we also generated over 10 Gb of RNA-Seq data in 10 of the 19 cases. To account for potential biases imposed by the new sequencing platforms, we used two different genome sequencing approaches: Illumina [316] (10 cases) and CGI [321] (10 cases). One case, PASLGS, was sequenced using both methods. Ten tumor/normal genome pairs were sequenced to an average haploid coverage of 29.7X using Illumina technology. Another ten pairs were sequenced to an average haploid coverage of 59.9X using CGI technology.
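The callability rule from Section 4.2.1 (> 14 tumor reads and > 8 normal reads per position) is simple enough to illustrate in code. The minimal Python sketch below computes the fraction of targeted positions that meet it. It assumes that per-base read depths for the tumor and the normal have already been computed over the same target positions. The function names `is_callable` and `callable_fraction` are hypothetical, and the sketch reproduces only this depth rule, not muTect's own filters or statistical model.

```python
from typing import Sequence


def is_callable(tumor_depth: int, normal_depth: int,
                tumor_min_exclusive: int = 14,
                normal_min_exclusive: int = 8) -> bool:
    """True if a position has strictly more reads than both depth cut-offs.

    Hypothetical helper: the defaults mirror the thresholds quoted in the
    text (> 14 tumor reads, > 8 normal reads).
    """
    return tumor_depth > tumor_min_exclusive and normal_depth > normal_min_exclusive


def callable_fraction(tumor_depths: Sequence[int],
                      normal_depths: Sequence[int]) -> float:
    """Fraction of targeted positions that satisfy the depth rule in both samples."""
    if len(tumor_depths) != len(normal_depths):
        raise ValueError("tumor and normal depths must cover the same positions")
    if not tumor_depths:
        return 0.0
    n_callable = sum(is_callable(t, n) for t, n in zip(tumor_depths, normal_depths))
    return n_callable / len(tumor_depths)


# Toy depths at five targeted positions (not thesis data).
tumor = [30, 15, 14, 200, 9]
normal = [20, 9, 40, 8, 50]
print(callable_fraction(tumor, normal))  # 0.4
```

Only the first two positions pass. The third has exactly 14 tumor reads and the fourth exactly 8 normal reads, which shows why the inequalities must be strict.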
The coverage in the Illumina-derived data set permitted single nucleotide variant (SNV) detection at over 86% of positions in the reference genome (hg19) and over 74% of the coding sequence, as defined by the exome experiment. Similarly, the CGI coverage allowed SNV analysis of 86% of the reference genome and 94% of the coding sequence. The difference in assessable coding sequence between the two platforms may be attributable to the different sequencing depths used in each approach (average 29.7X for Illumina and 59.9X for CGI). In PASLGS, the case sequenced on both platforms, the somatic non-silent mutation rates per Mb of coding sequence were 0.58 for the CGI data and 0.66 for the Illumina data. The two methods detected 9 somatic exomic mutations in common. Because both exome and genome data were generated for PANYGR using Illumina technology, we were also able to compare the variants detected by the two approaches in a single sample. In this case, the coding-region mutation rates were 0.59 non-silent mutations per Mb in the exome and 0.65 in the genome, and the two methods detected 27 somatic exomic mutations in common. One additional variant called from the exome data was not detected by the genome analysis because of low tumor coverage at this position (4X), although the mutant allele was supported by one read. The two methods were therefore concordant for all somatic exomic mutations at positions with adequate coverage in both data sets. On this basis, we combined and directly compared the mutation calls from the exome and genome data sets in subsequent analyses.

4.2.3 Overall mutation frequencies

Across the coding regions of 99 tumor/normal pairs (80 exome, 18 genome, 1 exome and genome), we detected 2,500 candidate somatic mutations in 2,105 genes (Appendix E).
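The per-Mb rates reported for PASLGS and PANYGR divide the number of non-silent mutations by the amount of coding sequence with enough coverage for mutation calling. The platform comparison then counts the calls shared between the two data sets. The short Python sketch below illustrates both calculations. It assumes that variants are represented as hypothetical `(chromosome, position, ref, alt)` tuples and that the callable base count is already known. It is not part of muTect, MutSig, or the CGI pipeline, and the numbers in the example are toy values.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant representation: (chromosome, position, ref, alt).
Variant = Tuple[str, int, str, str]


def nonsilent_rate_per_mb(n_nonsilent: int, callable_bases: int) -> float:
    """Non-silent mutations per Mb of coding sequence that was callable."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_nonsilent / (callable_bases / 1_000_000)


def shared_calls(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> Set[Variant]:
    """Somatic calls made identically in two data sets (e.g. exome vs genome)."""
    return set(calls_a) & set(calls_b)


# Toy values, not taken from the thesis data.
print(nonsilent_rate_per_mb(16, 32_000_000))  # 0.5
exome_calls = [("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "A"), ("chr3", 3000, "A", "C")]
genome_calls = [("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "A")]
print(len(shared_calls(exome_calls, genome_calls)))  # 2
```

Dividing by callable bases rather than by the nominal target size keeps rates comparable between platforms that cover different fractions of the coding sequence.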
A median of 20 candidate exomic mutations was found per tumor (range 3-236). Of these, a median of 16 were predicted to cause an amino acid or splicing change (non-silent mutations; range 3-171) (Figure 4.2A). This corresponded to a median non-silent mutation frequency of 0.56 mutations per Mb, after correction for the number of bases with sufficient data for mutation detection (Figure 4.2A). This is one of the lowest median mutation frequencies reported to date [116,322]. It is consistent with the similar rate of 0.4 non-silent mutations/Mb recently reported for medulloblastoma [99], another pediatric solid tumor. Synonymous mutations were relatively few compared with non-silent changes, suggesting a low rate of putative passenger events. We did not observe a correlation between mutation frequency and age at diagnosis, MYCN amplification status, or other prognostically relevant clinical or genomic variables (q > 0.015). Transitions are substitutions that change a purine to a purine or a pyrimidine to a pyrimidine; transversions change a purine to a pyrimidine or vice versa. The relative rates of these two classes in NBL differed greatly from those found in cancers with known environmental contributions. For instance, over 90% of mutations in melanoma are C>T/G>A transitions associated with ultraviolet light exposure [112], while smoking-associated C>A/G>T transversions make up 46% of mutations in small cell lung cancer [110]. By comparison, C>T/G>A transitions and C>A/G>T transversions comprised 29% and 36% of all mutations in NBL, respectively. These rates are consistent with current hypotheses of a limited environmental contribution to NBL development [174]. While the mutation rate was low across most tumors, 2 tumors had markedly increased non-silent mutation rates (Figure 4.2). Their rates exceeded the third quartile plus 3.5 times the interquartile range (Q3 + 3.5 × IQR).
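Two calculations in this paragraph can be made concrete. One is grouping substitutions into strand-collapsed classes (e.g. C>T/G>A) and into transitions versus transversions. The other is the Q3 + 3.5 × IQR outlier rule used to flag hypermutated tumors. The Python sketch below is minimal and illustrative only. The function names are hypothetical, and the quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption, because the thesis does not state how the quartiles were computed.

```python
import statistics
from collections import Counter
from typing import List, Sequence

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a single-base substitution onto the pyrimidine strand (G>A becomes C>T)."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in PURINES:
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def is_transition(ref: str, alt: str) -> bool:
    """Transitions keep the base type: purine to purine, or pyrimidine to pyrimidine."""
    return (ref.upper() in PURINES) == (alt.upper() in PURINES)


def hypermutation_threshold(rates: Sequence[float], k: float = 3.5) -> float:
    """Q3 + k * IQR of per-tumor rates (quartile method is an assumption)."""
    q1, _, q3 = statistics.quantiles(rates, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def hypermutated(rates: Sequence[float], k: float = 3.5) -> List[int]:
    """Indices of tumors whose rate exceeds the outlier threshold."""
    cutoff = hypermutation_threshold(rates, k)
    return [i for i, rate in enumerate(rates) if rate > cutoff]


# Toy substitutions and per-tumor rates (mutations/Mb), not the thesis data.
subs = [("C", "T"), ("G", "A"), ("C", "A"), ("G", "T"), ("A", "G")]
print(Counter(substitution_class(r, a) for r, a in subs))
print(sum(is_transition(r, a) for r, a in subs))  # 3
print(hypermutated([0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 6.0]))  # [7]
```

Collapsing onto the pyrimidine strand is why the text reports pairs such as C>T/G>A: the two members of each pair are the same event read from opposite strands.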
This outlier (hypermutation) threshold was chosen to identify extreme outliers. It is more stringent than the threshold used in the TCGA study that reported hypermutated glioblastoma multiforme (GBM) tumors [91]. Both hypermutated tumors contained alterations that may explain the accumulation of somatic mutations. Specifically, PAPPKJ harbored a deletion of one copy and a nonsense mutation in the other copy of the DNA mismatch repair gene MLH1, likely resulting in a complete loss of this protein. Similarly, PALJPX contained a heterozygous nonsense mutation in the DNA nucleotide excision repair (NER) gene damaged DNA-binding protein 1 (DDB1). DDB1 is involved in maintaining genome integrity and preventing the accumulation of DNA lesions in replicating cells [323]. Loss of the fission yeast DDB1 ortholog ddb1 results in a hypermutator phenotype in yeast [324]. In addition, knockdown of the Drosophila DDB1 ortholog D-DDB1 in wing imaginal discs produces a genome instability phenotype in somatic cells [325]. These observations suggest that the hypermutator phenotype in the NBL case PALJPX may be explained by the heterozygous nonsense mutation in DDB1. However, the yeast experiments were conducted in haploid organisms. There is therefore no direct evidence that the haploinsufficiency for DDB1 observed in this NBL tumor is sufficient to drive hypermutation. No nonsense mutations in DDB1 had been reported in the COSMIC database as of February 2012 [326]. Hypermutation may represent a new subtype of NBL, and future studies will determine whether unique clinical features are associated with this aberration.

4.2.4 Verification of candidate somatic mutations using orthogonal approaches

A total of 438 candidate somatic SNVs identified by whole genome sequencing were selected for verification by Sanger and/or Illumina sequencing, performed at CGI.
Three hundred seventy-nine variants were confirmed by at least one of these approaches, corresponding to a 90% verification rate once failed assays were accounted for. In addition, 224 candidate somatic mutations were selected for verification by Sequenom genotyping [327], conducted at the Broad Institute. The Sequenom method distinguishes allele-specific primer extension products by mass spectrometry and was originally developed for germline genotyping [328]. One hundred sixteen of the 224 sites (52%) were confirmed by this approach. Two main factors may account for the low verification rate of the Sequenom experiment compared with the sequencing-based verification conducted at CGI. First, because the Sequenom assay was developed for germline analysis, it performs poorly at detecting mutations present at low allelic frequencies [116], as may be the case in heterogeneous cancer samples. Second, the whole genome amplification reactions used in the exome sequencing experiment may have introduced artifacts. Such artifacts would produce a higher fraction of false positives in the exome data than in the genome sequencing data, which did not involve whole genome amplification. All somatic mutations explicitly described in the text have been verified, and the methods used to confirm each mutation are listed in Table 4.1.

4.2.5 Genes and pathways with significant frequency of mutation

We identified 8 genes mutated at a significant frequency (q < 0.2) in the 99 tumors using the MutSig algorithm [329] (Table 4.2). Of these, only five remained significant (q < 0.2) when the hypermutated samples were excluded from the analysis (Table 4.2). MutSig tests the null hypothesis that all the observed mutations in each gene are a consequence of random background mutation.
Genes for which this hypothesis is rejected, based on the Benjamini-Hochberg (BH) false discovery rate-corrected q-value, are considered significantly mutated [329], meaning that mutations in these genes likely contribute to the malignant phenotype. Because of the very low mutation rate in NBL, we chose a BH q-value threshold of 0.2. At this threshold, up to 20% of the genes called significant (about 1 in 5 MutSig hits) are expected to be false positives. Mutations in ALK and PTPN11 were previously known in NBL [188,194,193,192,330], whereas the remaining 6 candidate genes are newly reported in this disease. We used the available RNA-Seq data from 10 cases, together with published RNA-Seq data from neural crest-like cells [244] also discussed in Chapter 3. These data showed that PGLYRP3, GABRA6 and IGSF11 were not expressed in either normal neural crest-like cells or NBL cells. A key goal of this study and of the TARGET consortium is to identify genetic alterations that could potentially be targeted by drugs. We therefore focused our analysis on the four genes that were significantly mutated in non-hypermutated samples and expressed in NBL: ALK, PTPN11, LILRB1 and NRAS. ALK was previously reported to be mutated in up to 10% of NBL cases [192,194,188,193]. This is consistent with our unbiased screen, which identified 9 cases with a somatic ALK mutation, all restricted to the kinase domain. A tenth case, PANYGR, harbored a germline variant that also lay in the ALK kinase domain and was heterozygous in the tumor and normal samples from this patient. This variant is the clinically associated dbSNP variant rs113994092, described as a pathogenic allele. Mutations in the ALK oncogene were mutually exclusive with mutations in the other significantly mutated genes and were independent of MYCN amplification status (3 MYCN-amplified, 7 non-amplified).
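The BH correction behind the q < 0.2 threshold can be shown in a few lines. The Python sketch below is minimal and shows only the step-up adjustment applied to a list of per-gene p-values. MutSig's own background mutation model, which produces those p-values, is not reproduced. The gene names and p-values are hypothetical.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical per-gene p-values, not MutSig output from this study.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
pvals = [0.001, 0.02, 0.09, 0.5]
qvals = bh_qvalues(pvals)
significant = [g for g, q in zip(genes, qvals) if q < 0.2]
print(significant)  # ['GENE_A', 'GENE_B', 'GENE_C']
```

A looser q-value threshold such as 0.2 admits more candidate genes at the cost of a higher expected proportion of false discoveries among them.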
Germline and somatic PTPN11 mutations have been reported in NBL [330]. Both somatic mutations observed in this study were located at residues frequently mutated in juvenile myelomonocytic leukemia [331] and in individuals with Noonan syndrome [332]. No pathogenic germline PTPN11 variants were found in this study. Another MutSig hit, LILRB1, encodes an inhibitory cell-surface immunoglobulin-like receptor. LILRB1 has been reported to limit activation of the mTOR pathway through the activation of SHP2 [333], the protein encoded by PTPN11. Loss of SHP2 regulation by LILRB1 could therefore, in theory, have an oncogenic effect similar to that of activating PTPN11 mutations, because both could potentially lead to constitutive activation of RAS/MAPK signaling. In the case PALTEG, LILRB1 contained a splice site mutation predicted to disrupt the splice consensus sequence. Two additional mutations fell within the immunoglobulin domain: a nonsense alteration in PALZZV and a missense change in PANUKV. PTPN11 and LILRB1 are upstream regulators of the RAS/MAPK pathway. In addition to these two genes, five Cancer Gene Census genes in the MAPK signaling pathway (KEGG hsa04010) appeared to harbor somatic mutations in six tumors. These mutations affected the receptor tyrosine kinases EGFR, NTRK1, and PDGFRB, the transcription factor NFKB2, and the downstream target NRAS (2 tumors). NRAS was also identified by MutSig as significantly mutated in our study. A recent report suggested that PTPN13 may also function in the MAPK pathway [334]. Translocations or mutations of PTPN12 and PTPN13 were identified in 3 samples, one of which was hypermutated. Overall, MAPK pathway oncogenes were mutated in 15% of the high-risk NBL cases studied (Figure 4.2B).
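Pathway-level frequencies such as the 15% figure count a case once if it has at least one alteration in any gene of the pathway, however many pathway genes it hits. The Python sketch below illustrates this aggregation. The gene set is an illustrative subset of the genes named in this section, not the exact set used for Figure 4.2B, and the case data are toy values. The function name is hypothetical.

```python
from typing import Dict, Set


def fraction_of_cases_hit(case_genes: Dict[str, Set[str]], pathway: Set[str]) -> float:
    """Fraction of cases with at least one altered gene in the pathway gene set."""
    if not case_genes:
        return 0.0
    n_hit = sum(1 for genes in case_genes.values() if genes & pathway)
    return n_hit / len(case_genes)


# Illustrative subset of the MAPK-related genes named in the text (an assumption).
MAPK_GENES = {"PTPN11", "LILRB1", "EGFR", "NFKB2", "NTRK1", "PDGFRB", "NRAS"}

# Toy cases: case4 hits two pathway genes but is counted once.
cases = {
    "case1": {"NRAS", "TTN"},
    "case2": {"ALK"},
    "case3": set(),
    "case4": {"PTPN11", "NRAS"},
}
print(fraction_of_cases_hit(cases, MAPK_GENES))  # 0.5
```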
Chromatin remodeling genes appeared to be frequently disrupted in NBL: 17 chromatin remodeling genes harbored coding somatic mutations. These comprised nonsense mutations in CREBBP, CHD8, KDM6A, and MLL4, and missense mutations in EP300, ARID1A (2 tumors), ARID1B, ASH1L, CHD6, HDAC4, KDM5A, MLL3, MLL5, NUP98, PAX5, PRDM2, and PRDM4. These genes encode characterized histone acetyltransferases (CREBBP and EP300), DNA helicases (CHD6, CHD8, ARID1A), histone demethylases (KDM5A and KDM6A), histone methyltransferases (ASH1L, MLL3, MLL4, MLL5, PRDM2), a histone deacetylase (HDAC4), and other proteins involved in chromatin remodeling (ARID1B, NUP98, PAX5, PRDM4). More than half (9/17) of the chromatin remodeling genes mutated in NBL were annotated as positive regulators of transcription in the Database for Annotation, Visualization, and Integrated Discovery (DAVID) [335]. Five chromatin remodeling genes were mutated in tumors with ALK mutations. Intriguingly, loss-of-function mutations in chromatin modifiers have been reported in lymphoma [102,106], bladder cancer [336], and other tumors [337]. Overall, a potential defect in chromatin remodeling was identified in 11% of the high-risk NBL cases studied (Figure 4.2B).

4.2.6 Genome rearrangements and structural variants

We used the trans-ABySS de novo assembly pipeline for Illumina sequencing data [297] to search for expressed rearrangements affecting genes, and confirmed each of these events by local re-assembly of the genomic reads [338]. In parallel, the CGI structural variation pipeline was used to detect candidate structural variants in the CGI genomes [110]. In total, the two approaches identified 83 distinct events affecting 97 genes in the 19 neuroblastoma cases, including 22 expressed events found in the RNA-Seq data. A median of 4 structural variants (range 0-14) was detected per tumor genome.
The genomic architectures of the 19 cases with available genome sequencing data are plotted in Figure 4.3 using Circos [339], and notable structural variants are summarized in Table 4.3. We found 4 distinct somatic translocations between chromosomal arms 11q and 17q, which are commonly affected by numerical alterations in NBL [196]. These four somatic events occurred in three cases, PARGUX (2 events), PASCKI and PANNMS, and involved different genes and breakpoints in each case. Notably, one of the t(11;17) translocations in PARGUX is predicted to disrupt the function of IKZF3. IKZF3 is an Ikaros family DNA-binding protein involved in chromatin remodeling and previously implicated in chronic lymphocytic leukemia [340]. Another chromatin remodeling gene, ARID1B, was targeted in PASLGS by a somatic ~30 kb deletion that removed exon 2 and appeared to cause loss of function. Members of the MAPK pathway were also targets of somatic structural change: a MAPK10/PRDM5 fusion and a PRELID2/MAPK9 fusion. Both resulted from intrachromosomal deletions, but their effects on the reading frame are unknown because of multiple transcript annotations for these genes. Other cancer genes affected by somatic structural variants included the following:
- ABL2, which was fused out of frame with ACBD6 and harbored a somatic missense mutation in this study;
- STAG1, a p53 pathway member recently implicated as a target of translocations in several cancers [341,342];
- the cadherins CDH13 and CDH18;
- NOP2 and AUTS2, both known translocation targets in acute lymphoblastic leukemia [343].

Two loci were recurrently affected by structural variants, each in two cases. One was the transcriptional repressor ZFHX3 (ATBF1), which has been shown to function as a tumor suppressor in several cancers [344]. The other was CDKAL1 (CDK5 regulatory subunit associated protein 1-like 1). Neither of these genes has been implicated in NBL, and their potential role in this disease warrants further investigation.
The NBAS (neuroblastoma amplified sequence) locus, located 0.4 Mb from MYCN, was the most common target of rearrangements in our cohort of MYCN-amplified cases. It harbored 11 distinct rearrangements in three cases (PASDZJ, PARSHT, and PARIRD). The NBAS-rearranged cases showed increased copy number at the NBAS locus and a more than 2-fold increase in NBAS mRNA compared with cases with wild-type NBAS. This is consistent with the previous observation that a fraction of MYCN-amplified cases involve co-amplification of NBAS.

4.2.7 Mutations in other known cancer genes and regions

Beyond ALK, PTPN11, and NRAS, no cancer genes listed in the Cancer Gene Census [7] had mutation frequencies that reached statistical significance (q < 0.2). In addition to the mutations in the gene sets noted above, 14 genes listed in the Cancer Gene Census were mutated across 12 samples (Table 4.1; Figure 4.2B). Mutations in 2 of these genes, ATM and PIK3CA, matched mutations listed in COSMIC [100]. Two MYC family members, MYC and MYCN, were mutated in two NBL tumors lacking MYCN amplification. The SIFT algorithm [345] predicted both variants to be deleterious (score < 0.01). In addition, the MYCN mutation was previously reported in glioblastoma [346], suggesting that it may confer a selective advantage on malignant cells. We also detected and validated a fusion of MYCN with GULP1 that retained the reading frame and may be activating. Several mechanisms beyond amplification therefore appear to promote MYC signaling in NBL. Several of the genes that harbored protein sequence-altering mutations in two or more non-hypermutated cases mapped to known chromosomal regions that are frequently altered somatically and are of clinical significance [197]. These regions are described in detail in Section 1.9.2.1 and include losses of chromosomal arms 1p and 11q and gains of chromosomal arm 17q.
Of the 99 cases analyzed in this study, 42 harbored a loss of 1p, 49 a loss of 11q, and 63 a gain of 17q (Appendix D). Only one gene mapping to the 1p31-1p36 common deletion region, ARID1A, harbored somatic non-synonymous mutations in two non-hypermutated cases. ARID1A has been implicated as a tumor suppressor in several adult and pediatric solid tumors by both genomic and functional evidence [347,348]. Although the two cases with non-silent ARID1A mutations, PALXHW and PALNLU, each had two copies of 1p, both ARID1A mutations (Table 4.1) were homozygous missense changes predicted by the SIFT algorithm [349] to be damaging to the protein (score < 0.05). The genes CATSPER1, AHNAK, PITPNM1, and SORL1, which map to the 11q region commonly deleted in NBL, were each mutated in two non-hypermutated cases. All six cases with mutations in at least one of these genes (PALHVD, PAINLH, PAPBZI, PAKFUY, PALFPI, and PALSAE; Appendix D) harbored a loss of chromosomal arm 11q. Consistent with the loss of one copy of 11q, all candidate mutations in CATSPER1, AHNAK, PITPNM1, and SORL1 detected in this study were homozygous. While CATSPER1 and PITPNM1 have not previously been reported to play a role in NBL or in cancer generally, loss of AHNAK appears to be associated with the radiosensitivity of NBL cells [350], and SORL1 may be involved in the proliferation of NBL cell lines [351]. In addition, SCN4A, CRHR1, ABCA5, and IGF2BP1 map to 17q, which is commonly gained in high-risk NBL; none of these loci has previously been implicated in NBL. Finally, several genes on 11q (FAM86C1, RNF121, ATG2A, and SHANK2) and 17q (IKZF3, TRIM37, and BCAS3) appeared to be affected by the t(11;17) translocations, although none of these genes was recurrently affected.
All three cases with translocations between chromosomal arms 17q and 11q (PASCKI, PARGUX, and PANNMS) harbored concurrent losses of 11q and gains of 17q, suggesting that unbalanced t(11;17) translocations may account for the gain of 17q and loss of 11q in a fraction of NBL tumors. As described in Section 1.9.2, a genome-wide association study (GWAS) of NBL has implicated common germline variants in FLJ22536, BARD1, and LMO1 in susceptibility to sporadic high-risk NBL [197,66,198,196]. The current study did not detect somatic non-silent variants or recurrent somatic non-coding variants at any of these loci in our cohort of 99 NBL cases. However, all 10 cases with available matched tumor and normal genome and tumor transcriptome sequencing data (Appendix D) carried novel germline single nucleotide variants in BARD1 that were not reported in the 1000 Genomes Project data [352] or in any samples from non-cancerous tissues sequenced at the Genome Sciences Centre [353]. These variants may be related to the aberrant splicing observed at the BARD1 locus (Section 3.2.7).
4.3 Discussion
This survey of somatic mutations in primary NBL tumors found this cancer to have one of the lowest mutation frequencies among solid tumors examined to date, similar to that of another pediatric cancer, medulloblastoma [99]. The mutations identified in our study were distributed across a large number of genes: 88% of genes with non-silent mutations were mutated in only 1 of the 99 tumors studied. In addition, non-silent mutations were about four times as frequent as silent mutations (1,735 non-silent versus 420 synonymous mutations in non-hypermutated samples), suggesting selective pressure for coding changes. This contrasts with most adult cancers, in which passenger mutations are much more frequent than driver mutations [354].
The low passenger mutation rate observed in NBL presumably reflects a smaller environmental contribution than in adult malignancies, which is consistent with NBL typically arising at a very young age; most cases are diagnosed before 5 years of age [174]. The genome sequence analysis described in this Chapter identified candidate mechanisms in 51 of 99 neuroblastomas (ALK mutations, MAPK pathway oncogene mutations, mutations in chromatin remodeling genes, mutations in MYC family genes, and mutations in Cancer Gene Census genes, as highlighted in Figure 4.2). While sequencing more NBL cases will increase the power to discover additional recurrent somatic events, the relative paucity of focal mutations found here challenges the general concept that druggable targets and pathways can be defined in each patient by sequencing approaches alone, at least in the somatic mutation space. Nonetheless, our data address the overall objective of this Chapter and identify common vulnerabilities of primary NBL tumors that may be exploited therapeutically. For instance, subsets of NBL patients may be sensitive to inhibition of ALK (9% of patients with high-risk NBL) or MAPK signaling (15% of patients with high-risk NBL), and strategies that target these pathways can be prioritized immediately for clinical development because of the known activating role of the mutations in these pathways. In contrast, chromatin remodeling abnormalities, found in 11% of patients with high-risk NBL, need further investigation before they can be targeted clinically. In Chapter 3, our expression analysis of NBL TICs showed that the double-stranded break DNA repair pathway, involving the BRCA1/BARD1 complex, was expressed at a higher level in NBL TICs than in normal neural crest-like cells and a panel of other cancers.
Aberrations in DNA repair may appear counter-intuitive given the low mutation rate in NBL discussed in this Chapter. However, the observation is consistent with the results of sequencing studies in breast cancers, many of which also possess a defect in double-stranded break DNA repair [355]. In particular, albeit with some exceptions, breast cancers do not typically harbor an increased frequency of somatic point mutations compared with other adult tumors, such as lung cancers or melanoma [354]. Instead, breast cancer genomes often harbor a high frequency of large chromosomal aberrations [356], which may be associated either with a deficiency of homologous recombination (e.g. loss of function of BRCA1 in familial breast cancer) or with hyperactivity of the BRCA1 signaling pathway through gain-of-function mutations in BRCA1, seen in other types of breast tumors [355]. Similarly, despite the low rate of somatic point mutations, NBL tumors display a high prevalence of large chromosomal alterations (chromosomal alterations affecting genes are shown in Figure 4.3), suggesting that aberrations in double-stranded break DNA repair may play a role in this disease. Increased expression of the BRCA1 pathway in NBL TICs further suggests that hyperactivity of this pathway may be a factor in NBL, as in some cases of breast cancer [357]. Interestingly, we did not observe any somatic non-silent mutations in BRCA1 pathway members, including BARD1, which was shown to be aberrantly spliced in NBL cells in Chapter 3. We did, however, observe previously unreported intronic germline variants in BARD1 in all 10 NBL cases with available genome and transcriptome sequencing data. Because preliminary reports suggest that the GWAS risk alleles of BARD1, all of which occur in introns [197], may be associated with aberrant splicing of this gene [196], we hypothesize that the novel germline variants observed at this locus in the current study may also affect BARD1 splicing.
Further functional work is needed to test these possibilities. Because most of our data set comprised exome sequencing data, we cannot exclude the possibility that non-coding (regulatory) mutations explain the increased expression of BRCA1 signaling pathway members discussed in Chapter 3.
4.4 Materials and methods
4.4.1 Sample selection and preparation
The study focused on high-risk NBL, and we attempted to reduce heterogeneity by restricting eligibility to subjects between 1.5 and 5.5 years of age at diagnosis (median 2.94 years) with stage 4 (high-risk metastatic) disease (Appendix D). There was a preponderance of male subjects (62%). All specimens were obtained at original diagnosis, after informed consent, at Children's Oncology Group (COG) member institutions. Thirty-four of the 99 tumors studied harbored amplification of the MYCN oncogene, and 40 had a diploid DNA index (values of 1 in Appendix D). These two assays are routinely performed on all NBL samples in the COG NBL reference laboratory by fluorescence in situ hybridization and flow cytometry, respectively. Flash-frozen tumor samples were analyzed for percent tumor content by histopathology prior to nucleic acid extraction, and samples with <75% tumor content were excluded from this study. Tumor RNA and DNA were derived from fresh-frozen primary NBL tissue, and normal DNA from matched peripheral blood. All sequence data have been deposited in dbGaP (http://www.ncbi.nlm.nih.gov/gap), and the six-letter codes used to identify individuals in this database are referenced throughout the text.
4.4.2 Illumina library construction and sequencing
Genome and transcriptome libraries of the ten BCCA cases were constructed from 2-4 µg of DNA and 3-10 µg of DNase I-treated total RNA, respectively, following previously described protocols [102,106]. Sequencing was carried out on the Genome Analyzer IIx (GAIIx) (Illumina, Hayward, CA, USA) according to the manufacturer's instructions.
Paired-end reads generated from genome and transcriptome sequencing were aligned to the hg19 (GRCh37) human reference genome assembly using BWA version 0.5.7 [319]. RNA-Seq reads were processed as previously described in Section 3.4.1 and [244,149].
4.4.3 Detection of candidate somatic mutations in genome sequencing data
SNV detection in Illumina tumor genome and transcriptome data was performed using SNVMix2, retaining SNVs for which the combined probability of a heterozygous or homozygous SNV was greater than 0.99 [358]. Reads flagged as poor quality by the Illumina chastity filter, duplicate reads, and reads aligned with a mapping quality < 40 were excluded from SNV calling. The somatic status of SNV calls was determined using read evidence from a SAMtools version 0.1.13 pileup [312] constructed at the variant positions in the matched normal genome. Positions with normal genome coverage of at least 5 unique reads supporting the reference allele were considered somatic. The candidate somatic SNV calls were inspected using the Integrative Genomics Viewer [163], and only calls confirmed by visual inspection were used in the analysis. Ten of these events, listed in Table 4.1, were validated using ultra-deep re-sequencing with read indexing as previously described [102]. Pindel [359] was used as recommended by its authors to identify candidate short insertions from the tumor and normal genome BAM files. The mean read-pair insert size (calculated, together with its standard deviation, for all samples) was ~400 bp, and this value was used in each Pindel run. The Pindel short insertion output was filtered to select events that mapped to annotated genes (Ensembl 59 [360]). Candidate somatic short insertion events that recurred in at least two cases were manually reviewed in the Integrative Genomics Viewer (Broad Institute).
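The SNV filtering rules described above for the Illumina data can be summarized in a short sketch. It is an illustration under stated assumptions, not the pipeline actually used: the `Read` and `SnvCall` records and the `normal_ref_counts` mapping are hypothetical stand-ins for BAM records, SNVMix2 output, and a matched-normal pileup, and only the thresholds (mapping quality 40, combined probability 0.99, 5 normal reference reads) come from the text.

```python
from dataclasses import dataclass

# Thresholds as stated in Section 4.4.3.
MIN_MAPQ = 40              # reads with mapping quality < 40 were excluded
MIN_SNV_PROB = 0.99        # P(heterozygous) + P(homozygous) must exceed this
MIN_NORMAL_REF_READS = 5   # unique normal reads supporting the reference allele


@dataclass
class Read:
    """Hypothetical minimal read record."""
    mapq: int
    passed_chastity: bool
    is_duplicate: bool


@dataclass
class SnvCall:
    """Hypothetical minimal SNV call record."""
    chrom: str
    pos: int
    p_het: float  # posterior probability of a heterozygous SNV
    p_hom: float  # posterior probability of a homozygous SNV


def usable_read(read: Read) -> bool:
    """Keep chastity-passing, non-duplicate reads with mapping quality >= 40."""
    return read.passed_chastity and not read.is_duplicate and read.mapq >= MIN_MAPQ


def passes_snvmix(call: SnvCall) -> bool:
    """The combined SNV probability must be greater than 0.99."""
    return call.p_het + call.p_hom > MIN_SNV_PROB


def candidate_somatic_snvs(calls, normal_ref_counts):
    """Return calls that pass the probability filter and the normal-coverage rule.

    normal_ref_counts maps (chrom, pos) to the number of unique matched-normal
    reads carrying the reference allele (assumed to be tallied from reads that
    pass usable_read). Positions missing from the mapping count as uncovered.
    """
    return [
        call for call in calls
        if passes_snvmix(call)
        and normal_ref_counts.get((call.chrom, call.pos), 0) >= MIN_NORMAL_REF_READS
    ]
```

For example, with toy values, a call at ("chr1", 1000) with probabilities 0.90 and 0.095 and 12 normal reference reads is kept, while a call with combined probability 0.80, or one with no normal coverage, is dropped. Visual review and orthogonal validation, as described above, would still follow.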
In addition, the SAMtools version 0.1.13 pileup and varFilter functionality [312] was used to identify indels from the tumor and normal genome alignment BAM files. To detect candidate somatic indels, further filtering was done separately on normal and tumor libraries. In the normal libraries, any event with a total coverage of less than 8 was discarded. In the tumor libraries, only indels with an indel allele fraction (indel-supporting reads / total reads) of at least 16% were considered. After filtering, any indel present in one or more normal libraries was flagged as germline. None of the candidate somatic coding indels from the Pindel or SAMtools analyses was confirmed by manual inspection in the Integrative Genomics Viewer [163], and they are therefore excluded from the text. For CGI data, the provided MAF files were used to extract somatic mutations using the filtering criteria provided in Table 4.4.
4.4.4 Gene coverage in transcriptome sequencing data
The alignments of RNA-Seq data were used to estimate gene expression levels. Gene coverage analysis was based on Ensembl gene annotations (homo_sapiens_core_59_37d) [360]. These annotations were converted into one model per gene by collapsing all transcripts of a given gene into a single gene model, such that the exonic bases in the collapsed model were the union of the exonic bases of all known transcripts of the gene. The analysis used a SAMtools version 0.1.13 pileup [312] to obtain per-base coverage depths, excluding reads with mapping quality < 10 and reads flagged as poor quality by the Illumina chastity filter. The reads per kilobase of exon model per million mapped reads (RPKM) metric was used to estimate gene expression level [150].
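The collapsed gene model described above, and the RPKM value computed from it in this section, can be sketched as follows. This is a minimal illustration, assuming exons are given as 1-based, inclusive (start, end) pairs; the function names are hypothetical and the code is not the pipeline used in this study.

```python
def collapse_exons(transcripts):
    """Collapse all transcripts of one gene into a single gene model.

    transcripts: list of transcripts, each a list of (start, end) exon
    coordinates (assumed 1-based, inclusive). Returns sorted, non-overlapping
    intervals that cover the union of exonic bases across all transcripts.
    """
    exons = sorted(exon for transcript in transcripts for exon in transcript)
    merged = []
    for start, end in exons:
        if merged and start <= merged[-1][1] + 1:  # overlaps or abuts the previous interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def exonic_length(intervals):
    """Total number of exonic bases in a collapsed gene model."""
    return sum(end - start + 1 for start, end in intervals)


def rpkm(reads_in_gene_exons, gene_exonic_length, norm_total):
    """RPKM = reads x 1e9 / (NORM_TOTAL x summed exon length of the gene).

    norm_total: total reads mapped to exons, excluding mitochondrial reads.
    """
    return reads_in_gene_exons * 1e9 / (norm_total * gene_exonic_length)
```

For a toy gene with transcripts [(100, 200), (300, 400)] and [(150, 250), (400, 450)], the collapsed model is [(100, 250), (300, 450)], covering 302 bases; each exonic base is counted once no matter how many transcripts share it, which keeps the RPKM denominator independent of annotation redundancy.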
RPKM was calculated as RPKM = (number of reads mapped to all exons of the gene × 10^9) / (NORM_TOTAL × sum of the lengths of all exons in the gene), where NORM_TOTAL is the total number of reads mapped to exons, excluding those mapped to the mitochondrial genome.
4.4.5 Copy number analysis using genome sequencing data
Copy number analysis was conducted using a previously described hidden Markov model (HMM) [114,109]. Briefly, 50 million reads with mapping quality > 10 were randomly selected from the final merged BAM files for the tumor and matched normal genomes. The normal reads were split into bins of 200 adjacent alignments, and the corresponding bins in the tumor genome were used to calculate the ratio of tumor to normal reads in each bin. These values were normalized by subtracting the median of the tumor/normal ratios across the whole genome. This yielded a measurement of relative read density between each tumor and its matched normal in bins of variable length along the genome, where bin width was inversely proportional to the local density of mapped reads in the normal genome. GC bias correction was applied, and an HMM was used to classify and segment the tumor genome into continuous regions of somatic copy number loss (HMM state 1), neutrality (state 2), slight gain (state 3), gain (state 4), or high gain (state 5). For CGI data, the cnvTumorSegmentsRelative.tsv files were used to obtain somatic CNV calls. These calls were converted to the five HMM states described above using the following rules on calledLevel: state 1 if calledLevel ≤ 0.79; state 2 if 0.79 < calledLevel ≤ 1.25; state 3 if 1.25 < calledLevel ≤ 1.75; state 4 if 1.75 < calledLevel ≤ 2.5; and state 5 if calledLevel > 2.5.
4.4.6 Rearrangement detection
De novo transcriptome assembly by ABySS [338] was performed on the ten RNA-Seq datasets to identify candidate transcript rearrangements.
The assembled contigs were run through the trans-ABySS pipeline [297], which aligned a merged contig set to the hg19 (GRCh37) human reference genome assembly and compared the alignments to annotated transcript models, allowing identification of known and novel transcript structures. The transcript rearrangement component of the pipeline identified all contigs with two separate, discrete genomic BLAT alignments. The top 5 scoring alignments were inspected manually, and read evidence support was used to filter out likely false positive events. Smaller-scale rearrangements were identified from contigs with single, gapped BLAT alignments, again using supporting read evidence to filter out false positive events. Targeted genomic assembly of the candidate rearranged regions was performed to validate the events in the genomic data. In addition, 9 events were validated by PCR and Sanger sequencing of tumor DNA and RNA using the following procedure. Primer pairs were selected around each event breakpoint, with a 10 bp margin on either side, using Primer3 [361] with the following parameters: primer size 22-26 bp, GC content 40-46%, Tm 54-66 °C, and a GC clamp. Product sizes of 500-600 bp, 400-700 bp, and 300-800 bp were favored, in that order of preference. For each amplicon, up to 100 primer pairs were initially identified. This set was filtered for pairs that hybridized to a unique location on the hg19 human genome assembly using BLAT (minimum identity 100, tile size 10, step size 2). Each primer was independently ranked using the Primer3 objective function. The primer sequences used for the genome and transcriptome validations are provided in Table 4.5 and Table 4.6, respectively.
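The primer constraints and uniqueness filter above can be expressed as a simple post-filter over candidate pairs. This is a hedged sketch, not a Primer3 interface: the `Primer` record, the `genome_hits` field (standing in for the number of 100%-identity BLAT hits), the single per-pair `penalty` (a simplified stand-in for the Primer3 objective function, which the text applies to each primer independently), and the 3'-terminal G/C rule for the GC clamp are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Constraints as stated in Section 4.4.6; the GC-clamp rule below is an assumption.
SIZE_RANGE = (22, 26)     # primer length, bp
GC_RANGE = (40.0, 46.0)   # percent GC
TM_RANGE = (54.0, 66.0)   # melting temperature, degrees C


@dataclass
class Primer:
    """Hypothetical primer record."""
    seq: str
    tm: float          # Tm as reported by the primer design tool
    genome_hits: int   # number of 100%-identity BLAT hits on hg19


def gc_percent(seq: str) -> float:
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def has_gc_clamp(seq: str) -> bool:
    # Simplified, hypothetical definition: a G or C at the 3' end.
    return seq[-1].upper() in "GC"


def primer_ok(p: Primer) -> bool:
    return (SIZE_RANGE[0] <= len(p.seq) <= SIZE_RANGE[1]
            and GC_RANGE[0] <= gc_percent(p.seq) <= GC_RANGE[1]
            and TM_RANGE[0] <= p.tm <= TM_RANGE[1]
            and has_gc_clamp(p.seq)
            and p.genome_hits == 1)  # hybridizes to a unique genomic location


def select_pairs(pairs, max_pairs=100):
    """Filter up to max_pairs (left, right, penalty) candidates and rank them.

    Lower penalty is better; both primers of a pair must pass primer_ok.
    """
    kept = [pair for pair in pairs[:max_pairs] if primer_ok(pair[0]) and primer_ok(pair[1])]
    return sorted(kept, key=lambda pair: pair[2])
```

A 24-mer such as AAGGCCTTAAGGCCTTAATTAACG (10 G/C bases, about 41.7% GC, ending in G) passes these checks at a Tm of 60 °C with a single genomic hit, whereas the same sequence with three BLAT hits is rejected.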
For the RNA validation, first-strand cDNA was synthesized from 500 ng of DNase I-treated total RNA from tumor following the Agilent AccuScript High Fidelity 1st Strand cDNA Synthesis protocol (catalog #200820); 1 µL of 5-fold diluted template (first-strand cDNA) was used to set up the PCR, with 98 °C for 30 seconds, followed by 32 cycles of 98 °C for 10 seconds, 59 °C for 30 seconds, and 72 °C for 10 seconds, and then 72 °C for 5 minutes. The PCR product was run on an 8% PAGE gel for 35 minutes at 200 V and stained with SYBR Green for 1 minute for visualization. For the DNA validation, 1 ng of genomic DNA was used as template for PCR with 98 °C for 30 seconds, followed by 28 cycles of 98 °C for 30 seconds, 63 °C for 30 seconds, and 72 °C for 60 seconds, and then 72 °C for 5 minutes. The PCR product was run on a 1% agarose gel for 90 minutes at 100 V and stained with SYBR Green for 45 minutes for visualization. The target PCR products from matching tumor and normal DNA were excised, cloned into the pCR4-TOPO vector (Invitrogen), and sequenced using M13 forward and M13 reverse primers on an ABI 3730xl capillary sequencer. The CGI structural variation pipeline was used to identify rearrangements present in the CGI data [110]. Candidate somatic events were confirmed by PCR and electrophoresis, either alone or followed by Sanger sequencing.
4.4.7 Exome sequencing and data analysis
The generation, sequencing, and analysis of 81 pairs of exome libraries at the Broad Institute were performed using a detailed, previously described protocol [108]. A summary of deviations from this protocol is provided here. Because of the small quantities of DNA available, all DNA samples were amplified using Phi29-based multiple strand displacement whole genome amplification (Repli-g service, QIAGEN).
Exonic regions were captured by in-solution hybridization using RNA baits similar to those described previously [108], supplemented with probes capturing additional genes listed in RefSeq [318] beyond the original Consensus Coding Sequence (CCDS) set [317]. In total, ~33 Mb of genomic sequence was captured, comprising 193,094 exons from 18,863 genes annotated by the CCDS [317] and RefSeq [318] databases as coding for protein or microRNA (accessed November 2010). Sequencing of 76 bp paired-end reads was performed using Illumina Genome Analyzer IIx (GAIIx) and HiSeq 2000 instruments. Reads were aligned to the hg19 (GRCh37) build of the human reference genome sequence using BWA [319]. To confirm sample identity, copy number profiles derived from sequence data were compared with those previously derived from microarray data for each case, downloaded from dbGaP. Candidate somatic base substitutions were detected using muTect (previously referred to as muTector [108]). Candidate somatic insertions and deletions were detected as previously described [108].
4.4.8 Integrated analysis of somatic variation from exome and genome data sets
Somatic mutations detected in the genome, exome, and transcriptome data sets were annotated using Oncotator version 0.4. Genes mutated at a statistically significant frequency were identified using the MutSig algorithm [329]. Briefly, background mutation rates were estimated from all data for each of the 7 mutation categories: C or G in CpG; C in TpC or G in GpA; A; remaining C; remaining G; insertion/deletion/duplication. These rates were assumed to be constant across all patients and across all genes in the genome. The overall background mutation rate was taken to be the sum of the seven random variables describing the individual mutation categories.
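The per-gene comparison outlined in this section (category-specific background rates, a likelihood ratio test against the background, and Benjamini-Hochberg correction) can be illustrated with a simplified stand-in. This is not the MutSig implementation: it assumes a Poisson model for per-gene mutation counts, uses hypothetical function names, and treats the category labels as arbitrary dictionary keys. The chi-square survival function with one degree of freedom is computed exactly as erfc(sqrt(G/2)), so only the Python standard library is needed.

```python
import math


def expected_count(covered_bases, background_rates):
    """Expected mutations in one gene: sum over categories of covered bases x rate.

    Both arguments are dicts keyed by mutation category; rates are assumed
    constant across patients and genes, as stated in the text.
    """
    return sum(covered_bases.get(cat, 0) * rate for cat, rate in background_rates.items())


def poisson_lrt_pvalue(observed, expected):
    """Likelihood ratio test of a gene-specific Poisson rate vs. the background rate.

    G = 2 * [n * ln(n / mu) - (n - mu)], with G = 2 * mu when n = 0;
    G is compared with a chi-square distribution with 1 degree of freedom,
    whose survival function equals erfc(sqrt(G / 2)).
    """
    if observed == 0:
        g = 2.0 * expected
    else:
        g = 2.0 * (observed * math.log(observed / expected) - (observed - expected))
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg q-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

In use, one would compute a p-value per gene from its observed count and expected_count, pass all p-values to benjamini_hochberg, and report genes with q < 0.2, the threshold used in this Chapter. Note that this two-sided test also flags genes mutated less often than expected, whereas a screen for driver genes would normally consider only excess mutation.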
The observed mutation data for each gene across all patients, corrected for gene length, were then compared with the background mutation rate, and a likelihood ratio test was applied to select genes whose observed mutation rate differed significantly from the estimated background rate. The Benjamini-Hochberg false discovery rate correction for multiple testing was applied to calculate the q-value for each gene quoted in the text. Q-values of less than 0.2 were considered statistically significant, corresponding to an expected false discovery rate of 20% among the reported genes. The relationship between mutation frequency and age at diagnosis was tested using Spearman's rank correlation test. The R version 2.11.1 implementation of the Kolmogorov-Smirnov test (ks.test) was used to test differences in mutation frequency distributions between the following groups: 1) MYCN amplified vs. unamplified, 2) 17q gain vs. wild type, 3) 1p loss vs. wild type, 4) 11q loss vs. wild type, and 5) hyperdiploid vs. diploid. Correction for multiple testing was performed using the R Bioconductor package qvalue [246]. The significantly mutated genes prompted an investigation of related genes, specifically those involved in chromatin remodeling and MAPK signaling; these gene lists are provided in Appendix F. In a search for informative mutations in hypermutated samples, we examined mutations in genes from a published [362] and updated list of DNA repair genes available through the authors' website: http://sciencepark.mdanderson.org/labs/wood/dna_repair_genes.html.
Figure 4.1 Overview of the multi-centre next-generation sequencing initiatives and data analyses
Figure 4.2 Somatic mutation frequencies in 99 NBL tumor/normal pairs, with samples ordered by the type of genes with somatic alterations. (A).
Individual somatic mutation rates in the 99 NBL tumors, arranged by the mutation categories discussed in the text (color-coded): hypermutated, ALK mutated, chromatin remodeling gene mutated, MAPK pathway oncogene mutated, Cancer Gene Census gene mutated, and unknown. Within each category, samples are ordered by their somatic non-silent mutation rate corrected for callable exonic sequence (Mb). The data panels are described below in bold. Data type – sequencing technology used: blue = in-solution exome capture followed by Illumina sequencing, orange = Illumina whole genome sequencing, yellow = Complete Genomics, Inc. (CGI) whole genome sequencing. Hatched blocks identify cases for which data were generated using two technologies. Callable exonic Mb – megabases of coding sequence with sufficient data for mutation detection. Count of candidate somatic mutations – stacked bar plot of silent (i.e. synonymous) and non-silent mutations in each tumor. The boxplot to the right depicts the distribution of non-silent mutation frequencies across all 99 tumors. Whiskers depict the lower and upper limits used to detect outliers, equal to the first quartile minus 3.5 times the interquartile range and the third quartile plus 3.5 times the interquartile range, respectively (i.e. Q1 - 3.5*IQR and Q3 + 3.5*IQR). Outlier mutation frequencies are shown as circles. dbGaP 6-letter identifiers – TARGET sample identifiers (Appendix D). (B). Distribution of specific mutations in each mutational category of interest (color-coded): hypermutated, ALK mutated, chromatin remodeling gene mutated, MAPK pathway oncogene mutated, Cancer Gene Census gene mutated, and unknown. Genes found to be mutated at a significant frequency by MutSig analysis are listed in bold. Mutations in MYC family members (MYCN and MYC) are also highlighted. Genes listed in the unknown category are MutSig hits that do not belong to any of the other categories described in the legend (PGLYRP3, GABRA6, SUCLG2, IGSF11).
The data panels are described below in bold. Heatmap of non-silent mutations and structural rearrangements – colored blocks identify alterations in genes with statistically significant mutation frequencies or implicated as part of a mechanism disrupted in NBL: DNA repair (red), ALK signaling (orange), chromatin remodeling (green), MAPK signaling (blue), and MYC family members (light blue). Alteration types are color-coded: missense mutation (black), nonsense/frameshift/splice site mutation (red), and structural rearrangement (orange). MYCN amplification – black rectangles identify samples with MYCN amplification. A grey square identifies a sample for which MYCN amplification status could not be measured for technical reasons.
[Figure 4.2, panels A and B. Legend keys: sample classification categories (hypermutated, ALK mutated, chromatin remodeling gene mutated, MAPK pathway oncogene mutated, MYC family mutated (panel B only), Cancer Gene Census gene mutated, unclassified); heatmap alteration categories (missense mutation; nonsense, frameshift or splice site mutation; structural variant or gene fusion)]
Figure 4.3 Integrated analysis of 99 neuroblastoma cases reveals a diversity of somatic aberrations. Each case analyzed by whole genome sequencing is represented as a CIRCOS plot [339]. The reference human chromosomes are arranged end-to-end in the outermost ring. Genes harboring non-silent mutations are depicted outside the chromosomes as circles, color-coded as described in the legend. The ring inside the chromosomes shows somatic gains and losses of copy number. Finally, the innermost ring depicts structural aberrations as black lines inside the circle; aberrations predicted to result in a gene fusion are highlighted with orange lines. The cases are ordered according to the categories in Figure 4.2. (A).
Four cases with MAPK pathway aberrations. The MAPK pathway aberrations detected in the NBL genomes include mutations in NRAS and NF1 and gene fusions involving MAPK9 and MAPK10. (B). Four cases with chromatin remodeling aberrations. The chromatin remodeling aberrations detected in the NBL genomes include mutations in MLL5, CHD8, and CREBBP, a deletion in ARID1B, and a gene fusion involving IKZF3. (C). Three cases with aberrations in known cancer genes, including ABL2, ATM, and FANCD2. (D). Two cases with somatic mutations in ALK. (E). Six unclassified cases with no aberrations in the categories described above. Two unclassified cases (PARIRD and PARSHT) contain rearrangements of the MYCN amplification region.
[Figure 4.3 panels: A. Cases with somatic alterations in MAPK pathway oncogenes; B. Cases with somatic alterations in chromatin remodeling genes; C. Cases with somatic alterations in Cancer Gene Census genes; D. Cases with somatic alterations in the ALK oncogene; E. Unclassified cases]
Table 4.1 Non-silent mutations in genes of interest along with their validation status. The genes of interest listed in the table include genes mutated at a significant frequency (as detected by the MutSig analysis), genes implicated in chromatin remodeling or MAPK signaling (Appendix F), and genes listed in the Cancer Gene Census [7]. The genes are listed in the order shown in Figure 4.2. MutSig genes are highlighted in bold. Chromatin = X identifies chromatin remodeling gene hits; Cancer = X identifies genes listed in the Cancer Gene Census; MAPK = X identifies genes that encode a member of the MAPK pathway (KEGG hsa04010); COSMIC = number of samples recorded in the Catalogue of Somatic Mutations in Cancer that overlap the particular NBL mutation. Confirmation by orthogonal method = method(s) used to confirm the variant: Sanger sequencing, custom hybrid capture and Illumina sequencing, RNA sequencing, or genotyping using a Sequenom assay [327]; * denotes hypermutated samples.
Category Protein change Genome change (hg19) Case identifiers COSMIC overlaps Hypermutated MLH1 (DNA repair) DDB1 p.Y157* p.C725* chr3:37050322C>A chr11:61079358G>T PAPPKJ* PALJPX* 0 0 ALK ALK ALK ALK p.I1170N p.I1171N p.F1174L chr2:29445216A>T chr2:29445213A>T chr2:29443695G>T 2 2 58 ALK p.F1174L chr2:29443695G>C PANRHJ PAKXDZ PANZVU, PALAKE PAREGK ALK ALK p.F1245I p.I1250T chr2:29436860A>T chr2:29432739A>G 7 0 ALK p.R1275Q chr2:29432664C>T ALK p.R1275L chr2:29432664C>A PAINLN PANYGR germline PALNLU, PANBCI PASAZJ ARID1A ARID1A ARID1B ASH1L CHD6 CREBBP p.G1139V p.G1942D p.R1487M p.K324R p.P2383T p.S1365* chr1:27099000G>T chr1:27106214G>A chr6:157522242G>T chr1:155451690T>C chr20:40040888G>T chr16:3790439G>T PALNLU PALXHW PAMMWD PAPPKJ* PAKFUY PASCLP 0 0 0 0 0 0 EP300 HDAC4 IKZF4 KDM5A KDM6A p.R915C p.P917L p.G151V p.A1028V p.Q1354* chr22:41546128C>T chr2:239990289G>A chr12:56420730G>T chr12:420184G>A chrX:44966680C>T PANIPC PALJPX* PALFPI PANBCI PALZRG 0 0 0 0 0 Chromatin remodeling Gene 58 42 42 Orthogonal method used to confirm variant Sequenom Sequenom Sanger Sanger, Sequenom Sanger, Capture, Sequenom Sanger, Capture, Sequenom Sanger, Sequenom Sanger, Sequenom Sanger, Capture, Sequenom Sanger, Capture, Sequenom Sequenom Sequenom Capture, RNA-Seq, Sequenom Sequenom Category MAPK pathway Gene Protein change Genome change (hg19) Case identifiers COSMIC overlaps MLL3 MLL4 MLL5 p.A4748T p.C1432* p.P1759Q chr7:151841899C>T chr19:36218517C>A chr7:104753479C>A PALWIP PANZVU PANYGR 0 0 0 NUP98 PAX5 PRDM2 PRDM4 EGFR LILRB1 LILRB1 chr11:3735118T>A chr9:36966647C>T chr1:14108929C>A chr12:108128195T>C chr7:55240678C>A chr19:55143049G>T chr19:55147061G>T PANPVI PAIXNV PAMMWD PAPPKJ* PALUDH PALZZV PALTEG 0 0 0 0 0 0 0 LILRB1 NF1 NF1 NFKB2 p.E836V p.D227N p.P1547T p.E733G p.P641H p.E57* p.R550_splice p.S209T p.Ile1679Val p.E2501* p.H894fs PANUKV PASDZJ PALJPX* PAPPKJ* 0 0 1 0 NRAS NRAS p.G13R p.Q61K chr19:55143652T>A chr17:29653037A>G chr17:29679318G>T
chr10:104162112_10416211 2delC chr1:115258745C>G chr1:115256530G>T PANBSP PAPTMM 331 1551 NTRK1 PDGFRB PTPN11 PTPN11 PTPN12 PTPN13 MYC p.V263L p.A927S p.A72T p.E76A p.H460Y p.V811L p.T73I chr1:156841484G>T chr5:149499049C>A chr12:112888198G>A chr12:112888211A>C chr7:77256374C>T chr4:87662913G>T chr8:128750681C>T PAIXNC PAITCI PAPBZI PALHVD PAPPKJ PAILNU PAKZRF 0 0 68 128 0 0 0 Orthogonal method used to confirm variant Sequenom Capture, RNA-Seq, Sequenom Sequenom Sequenom Sequenom Sequenom Sanger, Capture, Sequenom 168 Category Cancer Gene Census Gene Protein change Genome change (hg19) Case identifiers COSMIC overlaps MYCN p.P44L chr2:16082317C>T PASLGS 1 PGLYRP3 PGLYRP3 PGLYRP3 GABRA6 GABRA6 SUCLG2 SUCLG2 IGSF11 IGSF11 ABL2 ATIC ATM ATRX BCR p.R175M p.F237C p.H338N p.R84H p.A322T p.D109H p.R160Q p.S270T p.P366Q p.P996A p.P148A p.A2274V p.S2017P p.S317fs PALZRG PALHVD PALZSL PAPPKJ PAMUTD PAICGF PAPPKJ PAITCI PALJPX PANYGR PALAKM PANRRW PALXHW PAHYWC 0 0 0 0 0 0 0 0 0 0 0 1 0 0 CARD11 CD79A p.R1011K p.P128fs CIITA CLTC COL1A1 DDX5 FANCD2 FANCE KTN1 MEN1 p.E372D p.A76T p.G1163R p.I82V p.K871N p.R200C p.Q618L p.R521fs chr1:153276338C>A chr1:153274903A>C chr1:153270446G>T chr5:161115980G>A chr5:161119084G>A chr3:67579512C>G chr3:67570997C>T chr3:118623540C>G chr3:118621566G>T chr1:179077416G>C chr2:216190772C>G chr11:108196798C>T chrX:76849227A>G chr22:23524096_23524097in sC chr7:2951918C>T chr19:42383609_42383610in sC chr16:11000462G>C chr17:57721820G>A chr17:48264420C>G chr17:62500403T>C chr3:10114944A>C chr6:35423873C>T chr14:56106660A>T chr11:64572092_64572093in Orthogonal method used to confirm variant Sanger, Capture, RNA-Seq, Sequenom Sequenom Sequenom Sequenom Sequenom Sequenom Sequenom Sequenom RNA-Seq, Sequenom Capture, RNA-Seq PAHYWC 0 PAMMWD 0 PAMVAG PAPPKJ* PALHVD PAPPKJ* PANN PAMBAC PARGUX PANUKV 0 0 0 0 0 0 0 2 RNA-Seq Sanger, Capture 169 Category Gene Protein change Genome change (hg19) MET MLLT3 MLLT3 MSI2 NACA p.R359Q p.167_168SS >S p.Q326* p.R269W 
p.P996fs NKX2-1 NOTCH1 NOTCH2 p.S44Y p.C2189Y p.P6fs PDE4DIP PIK3CA PLAG1 ROS1 TAF15 TMPRSS2 p.Q1197K p.K111N p.P458Q p.L2013V p.R406I p.A423fs TRIP11 TSC1 USP6 p.A1552T p.T356I p.67_68IR>MW sG chr7:116340214G>A chr9:20414341_20414343delCTA chr9:20413868G>A chr17:55752347C>T chr12:57112326_57112326delG chr14:36988522G>T chr9:139391625C>T chr1:120612003_120612004delGG chr1:144881607G>T chr3:178916946G>T chr8:57078932G>T chr6:117638404G>C chr17:34171520G>T chr21:42842589_42842590insC chr14:92466356C>T chr9:135786463G>A chr17:5036210_5036211TC>GT WT1 p.R495P chr11:32410674C>G Case identifiers COSMIC overlaps Orthogonal method used to confirm variant PASAZJ PALXMM 0 0 Capture PAPPKJ* PAKFUY PALAKE 0 0 0 Sequenom PAMZMG PAPPKJ* PALWVJ 0 0 0 PAKXDZ PAIPGU PAREGK PAHYWC PALUDH PALAKE 0 20 0 0 0 0 PALNLU PALFPI PASLGS possible germline PANYBL 0 0 0 Sequenom Sequenom Sanger, Capture Sanger, Capture 0
Table 4.2 Genes with significant frequency of somatic mutation. Somatic mutations in exomic regions from 99 NBL cases were analyzed using the MutSig algorithm [329], as described in Section 4.4.8, with and without the two hypermutated (HM) samples. The MutSig algorithm tests the null hypothesis that all the observed mutations in each gene are a consequence of random background mutation processes. Genes for which this hypothesis is rejected at a Benjamini-Hochberg false discovery rate-corrected q-value below 0.2 (q < 0.2) are considered significantly mutated and are listed in the table.
Gene | Description | Patients | Unique sites | q-value (no HM) | q-value (with HM) | Expressed in 10 neuroblastoma transcriptomes
ALK | Anaplastic lymphoma receptor tyrosine kinase | 9 | 6 | 7.7x10^-7 | 2.6x10^-6 | Yes
PGLYRP3 | Peptidoglycan recognition protein 3 | 3 | 3 | 0.045 | 0.065 | No
LILRB1 | Leukocyte immunoglobulin-like receptor, subfamily B, member 1 | 3 | 3 | 0.071 | 0.085 | Yes
PTPN11 | Protein tyrosine phosphatase, non-receptor type 11 | 2 | 2 | 0.13 | 0.17 | Yes
NRAS | Neuroblastoma RAS viral (v-ras) oncogene homolog | 2 | 2 | 0.17 | 0.17 | Yes
GABRA6 | Gamma-aminobutyric acid (GABA) A receptor, alpha 6 | 2 | 2 | 1.00 | 0.17 | No
SUCLG2 | Succinate-CoA ligase, GDP-forming, beta subunit | 2 | 2 | 1.00 | 0.17 | Yes
IGSF11 | Immunoglobulin superfamily, member 11 | 2 | 2 | 1.00 | 0.18 | No
Table 4.3 Notable structural variants detected and confirmed in NBL genomes and transcriptomes. * These fusions likely have a complex architecture and may involve additional neighboring genes. The following designations are used in the table: SV = structural variant; CE = capillary electrophoresis; MAPK = identifies genes that encode a known or putative member of the MAPK pathway (KEGG hsa04010); Cancer = identifies genes listed in the Cancer Gene Census [7]; t(11;17) = identifies genes affected by a translocation between chromosomal arms 17q and 11q; Chromatin remodeling = identifies genes that function in chromatin remodeling; Recurrent genes = identifies genes recurrently affected by structural variants in this study; Mitelman database = identifies genes known to be involved in cancer-specific genome rearrangements as recorded in the Mitelman database of chromosome aberrations and gene fusions in cancer [343]; Other = denotes other notable genes described in the text; Confirmed, evidence in blood = somatic events detectable by PCR in the patient's blood, likely derived from circulating tumor DNA.
Gene(s) MYCN; GULP1 ABL2; ACBD6 ARID1B FAM86C1; IKZF3 MAPK10; PRDM5 PRELID2; MAPK9 AUTS2 RRM1714; NOP2 PTPN13 Event Type Fusion Sample PARIRD Breakpoint chr2:16083041 Breakpoint chr2:189393508 Validation status Confirmed somatic Comment Cancer Fusion PARRBU chr1:179198375 chr1:180382607 Probable somatic Cancer SV Fusion PASLGS PARGUX chr6:157138276 chr11:71508561 chr6:157168409 chr17:37960058 Fusion PASCKI chr4:87260892 chr4:121730886 Chromatin remodeling Chromatin remodeling; t(11;17) MAPK Fusion PAPSKM chr5:145142894 chr5:179682098 Confirmed somatic Probable somatic by CE Probable somatic by CE Confirmed somatic SV Fusion PARIRD PARDUJ chr7:70188135 chr12:6669132 chr7:70200616 chr12:6679022 Confirmed somatic Confirmed somatic Mitelman database Mitelman database SV PASCKI chr4:87732011 chr4:104699882 MAPK FAM134B; CDH18 CDH13 LSAMP; STAG1 NBAS; BAZ2B NBAS; CCNT2* NBAS NBAS Fusion PAPSKM chr5:16539884 chr5:19720574 Probable somatic by CE Confirmed somatic SV Fusion PAPTLD PANRRW chr16:82673944 chr3:116668399 chr16:82684731 chr3:136402660 Confirmed somatic Confirmed somatic Other Other Fusion PARIRD chr2:15578371 chr2:160221641 Confirmed somatic Recurrent gene Fusion PARIRD chr2:15591679 chr2:135682003 Confirmed somatic Recurrent gene SV SV PARIRD PARIRD chr2:15648856 chr2:15659845 chr2:15650260 chr2:15660266 Recurrent gene Recurrent gene NBAS SV PARIRD chr2:15699066 chr17:53790753 Confirmed somatic Probable somatic by CE Probable somatic by CE MAPK Other Recurrent gene
Gene(s) CDKAL1 Event Type SV Sample PAPTMM Breakpoint chr6:20769198 Breakpoint chr6:20806792 CDKAL1 APBB1766; ZFHX3 NBAS; AK001558* NBAS SV Fusion PASLGS PARDUJ chr6:20806846 chr16:73036896 chr6:20899275 chr16:73064047 Fusion PARSHT chr2:15629062 chr2:12660527 SV PASDZJ chr2:15794685 chr2:17080300 NBAS; FAM49A* NBAS Fusion PASDZJ chr2:15667544 chr2:17302098 SV PASDZJ chr2:16794222 chr2:17046968 NBAS NBAS SV SV PASDZJ PARSHT chr2:16975790 chr2:12660729 chr2:17208524 chr2:15626595 ZFHX3
Duplication PANRRW chr16:73064821 chr16:73352657 RNF121; TRIM37 ATG2A; BCAS3 SHANK2 Fusion PANNMS chr11:71692501 chr17:57072537 Fusion PARGUX chr11:64674966 chr17:58891730 SV PASCKI chr11:70784776 chr17:34136040 Validation status Probable somatic by CE Confirmed somatic Confirmed somatic Comment Recurrent gene Confirmed, evidence in blood Confirmed, evidence in blood Confirmed, evidence in blood Confirmed, evidence in blood Confirmed somatic Confirmed, evidence in blood Putative unknown origin Confirmed somatic Recurrent gene Probable somatic by CE Confirmed somatic t(11;17) Recurrent gene Recurrent gene Recurrent gene Recurrent gene Recurrent gene Recurrent gene Recurrent gene Recurrent gene t(11;17) t(11;17)
Table 4.4 Parameters used to select high-confidence candidate somatic mutations reported by CGI. The MAF files provided by Complete Genomics, Inc. (CGI) were filtered based on the parameters described in the table.
Selection criterion | Operator | Value
Variant_Classification | Equal | (Nonsense, Misstart, Nonstop, Frame_Shift, In_Frame, Missense, Splice_Site)
Variant_Type | Equal | (Snp, Ins, Del, Sub)
Mutation_Status | Equal | (Somatic, LOH)
Tumor_VarScore_Rank | >= | 0.025
Match_Norm_RefScore_Rank | >= | 0.025
Table 4.5 Primer sequences used for genomic validation of structural variants and gene fusions detected by the BCCA pipeline. Sample Gene(s) PANNMS PARSHT RNF121; TRIM37 LSAMP; STAG1 NBAS PARSHT PANRRW Genomic breakpoint Genomic breakpoint chr11:71692501 chr17:57072537 chr3:116668399 chr3:136402660 chr2:12660527 chr2:15629062 NBAS chr2:12660729 chr2:15626595 PASDZJ NBAS chr2:15794685 chr2:17080300 PASDZJ NBAS chr2:15667544 chr2:17302098 PASDZJ NBAS chr2:16794222 chr2:17046968 PASDZJ NBAS chr2:16975790 chr2:17208524 PASLGS RERE chr1:5081632 chr1:8421299 PARRBU MPRIP chr17:16952510 chr17:2459132 Primer 1 Primer 2 GATATTTCGTTTGGATAGCACTGG TCTGCAGAGAGAAAGACTACCTTG ATAATTGTTGCTAGTGGAGGAAGG ATAATTGTTGCTAGTGGAGGAAGG GTCAAATTTATCAGCCTTTGGC GACGATCTATCCTGGCACTGAC
GGAACTTCTTGATATGGTCTGACTC ATAGGAATCACAACAGGAAAGGAG GACACTCATGAGCATAGAAAAAGG CCGAGTTTAAGCGATTCTTGTG GAAGTGCAGTAGCACGATTTTGG TACTGAGTTTTCCTATCCACAAGC ACAAATACCCTGAGAGTCTGGAAG ACAAATACCCTGAGAGTCTGGAAG GTTTAAGGCCCTGATAGAAGAGG ATTCATGTTGCAAGAGCAGAAG TTCCCAGTTCTTTCTTATAGAGGTG CTACAGCACGGGCTTCTAAAAC AGGACAATGAGAGTGACTCGGAC GGTATATGCCAAGAAGAATTGAGG
Table 4.6 Primer sequences used for tumor RNA validation of structural variants and gene fusions detected by the BCCA pipeline. Sample PANNMS PANRRW PARRBU PARSHT PASDZJ PASDZJ PASDZJ PASLGS PASLGS Gene(s) RNF121; TRIM37 LSAMP; STAG1 MPRIP NBAS NBAS NBAS NBAS RERE RERE Primer1 ATCTCTCTCCAGAAGAGCAATGG Primer2 AGGTGCAGTGTCAGTTTCAAATC GAATAACACACCGGAGACTTTTG GTTAAAATCCACGCTGCGAACAG GGATGACACTTTGAGAACTCCTG ACTGGAACAAATTCTCAGTGTGTC GTGAGAAGTGGTGTCACTCACGC ATCAACACAGCTATTTACCACCC ACAGCTATTTACCACCCTGGTC ACAGTGAAGAAGTCGGCCAAGAAG GTACCTCCAGCAATGACAGTAAAG GTGTCAGAACTGCTTCAAGCCC GAATCCATCTTTCTCTCATGTAGC ATTCATGTTGCAAGAGCAGAAG GTTCAGAGAATCTCCCAAAATCAC GTCCATCATAGAGCTGAAAATGTG AGAGACACCAAACAGGCTTTGAG CTCATTTTGTCTTCAATGTGGG
Chapter 5: Conclusions and future directions
Evolving methods of genomic analysis, reviewed in Chapter 1, have contributed to the characterization of cancer genomes and transcriptomes at ever-increasing resolution. The advent of second-generation sequencing technologies has enabled cancer studies that achieve single-nucleotide views of both genomes and transcriptomes. Applications of array-based methods and candidate-gene Sanger re-sequencing to human neuroblastoma (NBL) have revealed novel loci associated with the disease, most notably the anaplastic lymphoma kinase gene ALK, which is subject to somatic mutation and amplification in 5-15% of patients with sporadic NBL [192,193,188,194]. Based on these studies, we hypothesized that interrogating NBL genomes and transcriptomes with second-generation technologies may lead to novel insights into the disease.
We also hypothesized that a better understanding of the gene expression profile of the putative cell of origin of NBL would help identify loci with clinical relevance to the disease and help interpret high-throughput sequencing data from NBL cells. To address these hypotheses, we developed three research objectives that formed the basis of Chapters 2, 3, and 4; the specific goals of each are described in the subsections below.
5.1 Transcriptome analysis of normal neural crest cells identifies key pathways enriched and depleted in this population compared to other related cell types
Since NBL is thought to originate from a differentiation arrest along the sympathoadrenal lineage of the neural crest, understanding the neural crest stem cell and its development into this lineage may provide insight into the pathogenesis of NBL. Therefore, the overall objective of Chapter 2 was to identify and characterize the genes and pathways whose expression distinguishes neural crest stem cells from other stem cell lineages with similarly broad developmental potential. Skin-derived precursor cells (SKPs), previously validated as models of normal neural crest stem cells [234], were used for this analysis. Mesenchymal stem cells (MSCs) were chosen for comparison because they are one of the few somatic stem cell lineages whose developmental potential approaches that of the neural crest [230,255,363]. To address the research objective of Chapter 2, I first characterized the transcriptomes of SKPs isolated from ventral, dorsal and facial skin, regions thought to derive from different developmental origins, including the neural crest itself and somite mesoderm. This analysis revealed plasticity of the neural crest stem cell phenotype, suggesting that cells resembling normal neural crest stem cells may arise from non-neural crest lineages.
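The resemblance of SKP populations from different body regions at the transcriptome level can be quantified in several ways; one simple option is the pairwise Pearson correlation of log-transformed expression profiles. The Python sketch below illustrates only that idea. It is not claimed to be the analysis performed in Chapter 2, and the sample names, the pseudocount and all expression values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def log_profile(values, pseudocount=1.0):
    """log2-transform expression values; the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def pairwise_correlations(profiles):
    """Return {(name_a, name_b): r} for every pair of named profiles."""
    logged = {name: log_profile(v) for name, v in profiles.items()}
    return {(a, b): pearson(logged[a], logged[b])
            for a, b in combinations(sorted(logged), 2)}


# Hypothetical expression values for the same five genes in four samples.
profiles = {
    "facial_SKP":  [120, 15, 300, 8, 60],
    "dorsal_SKP":  [110, 18, 280, 10, 55],
    "ventral_SKP": [130, 12, 310, 7, 65],
    "MSC":         [20, 200, 40, 150, 5],
}

for pair, r in pairwise_correlations(profiles).items():
    print(pair, round(r, 3))
```

With these made-up values the three SKP profiles correlate strongly with one another and negatively with the MSC profile. With real data, genome-wide profiles and a clustering step would typically be used rather than five genes.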
Based on this result, I used the three SKP populations to identify transcripts enriched and depleted in SKPs compared to a related multipotent somatic stem cell lineage, the MSCs. This analysis revealed a relative increase in the mRNA abundance of transcripts involved in the WNT/beta-catenin, BMP and TGFB pathways, and a relative depletion of transcripts involved in double-stranded break DNA repair, in SKPs compared to MSCs. While the importance of active WNT/beta-catenin, BMP and TGFB signaling in neural crest cells is well established [257,364,272], the relative reduction in the expression of genes involved in double-stranded break DNA repair is a novel finding. A recent study in mice identified eleven DNA repair genes that are highly expressed during very early embryonic development but barely detectable in the adrenal medulla, an organ derived from the sympathoadrenal lineage of the neural crest and the most common primary site of NBL [365]. These results are consistent with my finding of decreased mRNA abundance of DNA repair genes in SKPs compared to MSCs. In addition, the SKP and MSC comparison revealed preferential expression of pluripotency markers in SKPs, which prompted me to further investigate similarities and differences between the expression profiles of SKPs and embryonic stem (ES) cells. This analysis revealed 5 pluripotency markers (CTNNB1, ETV4, MAD2L2, PITX2, SOX2) among the genes enriched in SKPs compared to MSCs, and 13 pluripotency markers (ADAM23, AURKB, CENPK, FAM46B, FAM64A, HMGB2, IGF2BP3, KPNA2, MTHFD1, MYBL2, TBX4, TPM1, ZFP57) among the genes depleted in SKPs compared to MSCs, highlighting the unique phenotype of SKPs. Future studies based on the findings in Chapter 2 may further investigate the functional mechanisms and consequences of the reduced expression of DNA repair genes observed in normal neural crest cells.
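Statements that transcripts of a given pathway are enriched or depleted among differentially expressed genes are usually supported by an over-representation test. As an illustration only, the sketch below implements one-sided hypergeometric tests using the Python standard library. The pathway-analysis tool and statistic actually used in Chapter 2 are not restated here, and the gene counts in the example are hypothetical.

```python
from math import comb


def _tail(indices, N, K, n):
    """Sum of hypergeometric probabilities P(X = i) over the given i values.

    N: genes measured; K: genes annotated to the pathway;
    n: differentially expressed genes. Exact integer sums are divided once
    at the end to keep rounding error small.
    """
    return sum(comb(K, i) * comb(N - K, n - i) for i in indices) / comb(N, n)


def enrichment_p(k, N, K, n):
    """One-sided p-value for over-representation: P(X >= k)."""
    return _tail(range(k, min(K, n) + 1), N, K, n)


def depletion_p(k, N, K, n):
    """One-sided p-value for under-representation: P(X <= k)."""
    # math.comb returns 0 for impossible draws, so starting at 0 is safe.
    return _tail(range(0, k + 1), N, K, n)


# Hypothetical: 10,000 genes measured, 150 annotated to a pathway,
# 400 genes enriched in SKPs versus MSCs, 18 of which are pathway members.
N, K, n, k = 10_000, 150, 400, 18
expected = n * K / N  # pathway hits expected by chance alone
print(f"expected {expected:.1f}, observed {k}, "
      f"p = {enrichment_p(k, N, K, n):.2e}")
```

The same function pair covers both directions reported above: enrichment_p for pathways over-represented among SKP-enriched transcripts, and depletion_p for pathways such as DNA repair that may be under-represented. With many pathways tested, the p-values would also need a multiple-testing correction.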
The work discussed in Chapter 2 used rat rather than human neural crest cells as a model, and it will be important to validate these findings in human SKPs as well as in other models of neural crest cells (for example, human epidermal neural crest stem cells [366]). Rat cells were originally chosen because we wished to investigate the similarities and differences among SKPs isolated from different parts of the body, and because rat-derived MSCs were available for comparison. Since we showed the convergence of facial, dorsal trunk and ventral trunk SKPs to a neural crest stem cell phenotype, SKPs from any part of the body can be used to model normal neural crest stem cells, assuming that this finding holds in other vertebrate species. We took advantage of this result in Chapter 3, where human foreskin-derived SKPs were used as the reference normal tissue for the analysis of NBL tumor-initiating cells.
5.2 Plasticity of the neural crest stem cell phenotype and NBL heterogeneity
The results of the analysis described in Chapter 2 are consistent with the hypothesis that neural crest stem cell-like cells could derive from non-neural crest lineages. In particular, we showed that mesoderm-derived ventral and dorsal SKPs were similar to neural crest-derived facial SKPs in both gene expression and differentiation potential. This finding may relate to the heterogeneity of NBL, which is a spectrum of diseases with diverse genetic aberrations, pathological features, and clinical courses. Dozens of clinical and biological markers of potential clinical significance have been proposed for NBL [367]. Seven of these markers, including the differentiation grade of the tumor (neuroblastoma, ganglioneuroblastoma or ganglioneuroma), are currently used clinically for pre-treatment risk stratification of newly diagnosed NBL patients [183].
The differentiation grade of NBL cells may reflect the developmental stage at transformation and correlates with the disease course: low-risk tumors typically have a more differentiated morphology than high-risk tumors [175]. Nevertheless, NBL recurrence occurs in some low- or intermediate-risk patients with differentiated morphology and low-stage disease, suggesting that tumors of the same differentiation grade and stage may be heterogeneous at the molecular level. Poor outcome in patients with differentiated tumors and low-stage disease was found to be associated with high expression of MYC and low expression of genes involved in sympathetic neuronal differentiation [368]. The heterogeneity of NBL cells with respect to their apparent developmental program is also reflected in the variable sensitivity of NBL cell lines to differentiation agents. For instance, retinoids can induce marked neuronal differentiation and cell cycle arrest in some NBL cell lines but have no effect on other lines derived from patients with similar disease characteristics [369]. Notably, retinoic acid regulates the differentiation of many tissues, but the nature of the growth and differentiation response to retinoic acid depends on the cell type [370]. The diversity of NBL cells with respect to differentiation grade, expression of developmental markers, and sensitivity to retinoids may reflect different origins of the neural crest progenitor cells that undergo transformation into NBL. As reported in Chapter 2, neural crest stem cell-like cells may arise from both the neural crest and the mesoderm. This observation suggests that NBL may, in principle, derive from mesodermal cells that have converged to a neural crest precursor phenotype.
Because cells of different developmental origins retain a record of their developmental history in their gene expression profiles despite having similar phenotypes (Section 2.2.2), different developmental origins of NBL may account for the observed gene expression differences among NBL cells of presumably similar differentiation grade [368]. The potential for mesodermal cells to give rise to NBL, and the possible impact of such an origin on NBL heterogeneity, remain to be addressed by future studies.
5.3 Transcriptome analysis of NBL tumor-initiating cells implicates AURKB as a novel drug target for NBL
Having characterized the transcriptomes of normal neural crest cells in Chapter 2, I set out to characterize a presumed malignant counterpart of these cells: NBL tumor-initiating cells (TICs) derived from bone marrow metastases of high-risk NBL patients. These cells have been shown to give rise to NBL when injected into mice, including upon serial transplantation, suggesting that they are a suitable model for the disease. In addition, the isolation of NBL TICs from patients in remission who later relapsed suggested that these cells could serve as markers of minimal residual disease in otherwise asymptomatic patients [279]. The overall objectives of Chapter 3 were to identify transcripts preferentially abundant in NBL TICs compared to the normal SKPs characterized in Chapter 2, and to assess whether these transcripts could suggest new therapeutic targets in NBL. To address these objectives, I used RNA-Seq data from NBL TICs, SKPs, and other cancers to identify transcripts whose expression was increased in NBL TICs compared to the other tissue types. I then conducted pathway analysis to identify functional associations among these transcripts that could be targeted by inhibitors.
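The selection of transcripts with increased expression in NBL TICs relative to the other tissue types can be sketched as a conservative fold-change filter: each gene's lowest value across TIC samples is compared with its highest value across reference samples. The four-fold threshold, the minimum expression of 1, the pseudocount, the sample labels and the gene values below are all hypothetical choices for illustration; the criteria applied in Chapter 3 may differ.

```python
def tic_enriched(expr, tic_samples, reference_samples,
                 min_fold=4.0, min_expr=1.0, pseudocount=0.1):
    """Return {gene: fold} for genes enriched in every TIC sample.

    expr maps gene -> {sample: expression value}. A gene passes if its
    lowest TIC value is at least min_expr and exceeds its highest
    reference value by at least min_fold (all thresholds hypothetical).
    Results are sorted by fold change, largest first.
    """
    hits = {}
    for gene, values in expr.items():
        tic_min = min(values[s] for s in tic_samples)
        ref_max = max(values[s] for s in reference_samples)
        if tic_min < min_expr:
            continue  # too weakly expressed in at least one TIC sample
        fold = (tic_min + pseudocount) / (ref_max + pseudocount)
        if fold >= min_fold:
            hits[gene] = fold
    return dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical expression values (arbitrary normalized units).
expr = {
    "GENE_A": {"TIC1": 50.0, "TIC2": 40.0, "SKP1": 2.0, "LYMPH1": 5.0},
    "GENE_B": {"TIC1": 30.0, "TIC2": 35.0, "SKP1": 28.0, "LYMPH1": 3.0},
    "GENE_C": {"TIC1": 0.5, "TIC2": 0.2, "SKP1": 0.0, "LYMPH1": 0.0},
}

result = tic_enriched(expr, ["TIC1", "TIC2"], ["SKP1", "LYMPH1"])
print(result)  # only GENE_A passes both thresholds
```

Comparing the TIC minimum with the reference maximum means a transcript must exceed every reference sample, not just the reference average. Including lymphocyte-related samples among the references (the Chapter 3 compendium contained diseased B-cell samples) is what allows a filter of this kind to discount transcripts contributed by contaminating lymphocytes.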
The pathway analysis revealed increased expression of BRCA1 signaling pathway members in NBL TICs compared to SKPs and other tissue types, suggesting that the double-stranded break DNA repair pathway might be activated in NBL TICs. This potential tumor-specific activation led to the hypothesis that AURKB, a kinase that is aberrantly expressed in NBL TICs and linked to this pathway through its interaction with BRCA1-associated RING domain protein 1 (BARD1), could serve as a drug target against these cells. This hypothesis was tested through AURKB knockdowns and treatment with an AURKB-specific pharmacological inhibitor; both approaches killed NBL TICs but not normal SKPs. An independent group of investigators later confirmed AURKB as a target in primary NBL tumors, further validating our result [306]. Thus, members of the BRCA1 signaling pathway, whose expression was lower in normal neural crest stem cell-like cells than in MSCs (Chapter 2), appeared to be more highly expressed in NBL TICs. Inhibition of a kinase involved in this pathway appeared to be cytotoxic to these cells, suggesting the importance of this pathway for NBL pathogenesis. Additional support for the role of BRCA1 signaling in NBL pathogenesis comes from a genome-wide association (GWA) study that found SNPs in the BARD1 locus to be associated with the development of sporadic high-risk NBL [197]. These SNPs have been suggested to influence the splicing of BARD1 such that exons 2 and 3 are excluded, resulting in the loss of the functional domain involved in the interaction of BARD1 with BRCA1 [196]. The exon-level RNA-Seq analysis reported in Chapter 3 supported the hypothesis that NBL cells preferentially express the short BARD1beta isoform.
The report that the short BARD1beta isoform stabilizes AURKB [269] provides a potential mechanism for the preferential sensitivity of NBL cells, but not normal neural crest cells, to AURKB inhibition. Future work arising from this finding will include functional studies of the molecular effects of AURKB inhibition on the expression of BRCA1 pathway members. In this thesis, we speculated that AURKB inhibition acts by downregulating the expression of BRCA1 pathway members, such that gross chromosomal abnormalities accumulate without being repaired, resulting in cell death. However, direct experimental evidence is required to support or refute this speculation. Examining the expression of BARD1 and its splicing status following AURKB inhibition would also be of interest, as this experiment would reveal whether the killing of NBL cells by AURKB inhibition is associated with downregulation of the BARD1beta isoform.
A limitation of the work described in Chapter 3 is the use of NBL TICs that are reportedly contaminated with EBV-transformed lymphocytes [280]. I believe that the effects of this contamination were partially accounted for by the experimental design, which used an expression compendium including lymphocyte-related tissues (diseased B-cells) as a reference for identifying NBL TIC-enriched transcripts. The independent validation in primary tumors, by other investigators, of drug targets predicted by my analysis (Chapter 3) provides additional support for the usefulness of NBL TICs as models of NBL despite the contamination. However, confirmatory studies in non-contaminated NBL stem cells would be useful to assess the generality of our findings.
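The exon-level evidence for the short BARD1beta isoform, discussed earlier in this section, can be illustrated by asking whether reads covering the skipped exons (exons 2 and 3) are under-represented relative to the other exons. A minimal sketch of such a comparison follows. It computes a length-normalized "inclusion ratio" and is an illustrative assumption rather than the exact statistic used in Chapter 3. The exon lengths and read counts are hypothetical, and only four exons are modeled for brevity.

```python
def exon_rpk(counts, lengths):
    """Reads per kilobase for each exon, normalizing for exon length."""
    return {exon: counts[exon] / (lengths[exon] / 1000.0) for exon in counts}


def inclusion_ratio(counts, lengths, test_exons, control_exons):
    """Mean RPK of the test exons divided by mean RPK of the control exons.

    Values well below 1 suggest the test exons are skipped in a
    substantial fraction of transcripts (assuming roughly uniform
    coverage along the transcript).
    """
    rpk = exon_rpk(counts, lengths)
    test = sum(rpk[e] for e in test_exons) / len(test_exons)
    control = sum(rpk[e] for e in control_exons) / len(control_exons)
    return test / control


# Hypothetical exon lengths (bp) and read counts for a four-exon model.
lengths = {1: 200, 2: 100, 3: 150, 4: 250}
tumor = {1: 400, 2: 50, 3: 75, 4: 500}
normal = {1: 400, 2: 200, 3: 300, 4: 500}

for label, counts in (("tumor", tumor), ("normal", normal)):
    print(label, round(inclusion_ratio(counts, lengths, [2, 3], [1, 4]), 2))
```

A ratio near 1 indicates that exons 2 and 3 are covered at the same depth as the flanking exons, whereas a ratio of 0.25 would be consistent with roughly three quarters of transcripts skipping them, under the uniform-coverage assumption.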
5.4 Whole-genome, transcriptome and exome sequencing of primary NBL tumors reveals a broad spectrum of somatic mutations
The analyses described in Chapters 2 and 3 focused on the transcriptomes of normal and malignant neural crest cells, and implicated the BRCA1 DNA repair pathway as aberrantly enriched at the mRNA level in metastasis-derived NBL TICs compared to normal neural crest-like cells. Although AURKB was subsequently validated as a drug target in primary tumors [306], the overall experimental design of Chapter 3 focused on identifying metastasis-enriched transcripts and potential targets. Therefore, the objective of Chapter 4 was to conduct a high-resolution characterization of a panel of 99 primary NBL tumors to identify recurrently altered genes and pathways of relevance to primary tumors at diagnosis. We also investigated whether the genetic aberrations found in primary tumors targeted pathways similar to those found to be aberrantly expressed in metastasis-derived NBL TICs (Chapter 3). We sequenced 99 primary tumors and matched peripheral blood using a combination of whole-genome and exome sequencing on Illumina and CGI platforms. We also sequenced the transcriptomes of 10 of these 99 primary tumors. Analysis of these data revealed that NBL tumors contained a median of 0.56 non-silent mutations per megabase of coding DNA, one of the lowest rates reported in cancer to date. ALK showed the highest somatic mutation frequency and was mutated in 9% of cases, and one additional case (PANYGR) harbored an oncogenic germline mutation in the ALK kinase domain. Three additional genes (LILRB1, PTPN11 and NRAS) were significantly recurrently mutated in non-hypermutated cases, albeit in fewer than 5% of cases. A loss-of-function translocation of IKZF3, together with alterations in related genes, implicated disruption of chromatin remodeling mechanisms in 11% of cases.
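The cohort-level mutation rate quoted above is a median of per-tumor rates, each being the number of non-silent somatic mutations divided by the coding territory examined, in megabases. The sketch below shows that arithmetic together with the percentage of cases mutated in a gene. The 30 Mb territory and the five per-tumor counts are hypothetical; the ALK figure uses the 9 of 99 cases reported for this cohort.

```python
from statistics import median


def mutations_per_mb(n_nonsilent, territory_bp):
    """Non-silent somatic mutations per megabase of examined coding DNA."""
    return n_nonsilent / (territory_bp / 1_000_000)


def percent_of_cases(n_mutated, n_cases):
    """Percentage of cases carrying at least one mutation in a gene."""
    return 100.0 * n_mutated / n_cases


TERRITORY_BP = 30_000_000       # hypothetical callable coding territory per tumor
counts = [12, 17, 20, 9, 25]    # hypothetical non-silent mutation counts
rates = [mutations_per_mb(c, TERRITORY_BP) for c in counts]

print(round(median(rates), 3))            # 0.567 (median tumor: 17 mutations / 30 Mb)
print(round(percent_of_cases(9, 99), 1))  # 9.1 (ALK: 9 of 99 cases)
```

In practice the denominator is usually the territory with adequate sequencing coverage in both tumor and matched normal DNA, rather than the nominal size of the exome, so per-tumor territories differ and should be computed individually.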
Mutations in PTPN11, in its regulator LILRB1, and in other MAPK signaling components including NRAS implicated hyperactivation of the RAS/MAPK pathway in 15% of cases. Mutations in MYC and MYCN were seen in two tumors without MYCN amplification, suggesting that MYCN could be activated in NBL through a variety of mechanisms. A hypermutator phenotype, associated with loss-of-function mutations in DNA repair genes, was found in 2% of cases. In addition, we identified over 80 somatic structural variants, including the aforementioned IKZF3 rearrangement. Therefore, the work described in Chapter 4 highlighted the molecular heterogeneity of high-risk NBL, identified commonly disrupted pathways, and demonstrated a relative paucity of somatically acquired mutations, suggesting that epigenetic events may contribute to tumor behavior. In addition to cataloging the genetic aberrations found in primary tumors, I also compared the genes harboring somatic mutations in primary tumors with those whose expression was increased in NBL TICs relative to SKPs. While I did not observe somatic mutations in BARD1 that could directly explain the preferential expression of the short BARD1beta isoform described in Chapter 3, I did observe several novel germline variants in BARD1 introns that could be associated with this phenotype. Future studies can address this possibility by examining a larger cohort of tumors with matched expression data and DNA sequence data from tumor and normal samples.
5.5 Future directions in NBL genomics
Although the work in Chapter 4 identified a potential disease mechanism in over 50% of all cases (Figure 4.2B), much discovery remains to be done to uncover additional molecular aberrations that may contribute to NBL development.
From a translational point of view, a remaining challenge is that the most common genomic aberrations in primary NBL are large chromosomal rearrangements affecting hundreds of genes, whereas, other than MYCN and ALK, focal disruptions of individual genes appear to be rare (Figure 4.3). In addition, a significant part of the disease phenotype may be related to germline genetic variation and subsequent stochastic and/or epigenetic alterations in tumor cells. Future efforts in the field may involve integrating data from genome-wide association studies [196] with sequencing data such as those described in Chapter 4, as well as generating new data sets that query epigenetic and expression changes. A precedent for epigenetic abnormalities playing a causative role in the pathogenesis of a pediatric cancer has been established by a recent study in retinoblastoma, which used genome-wide sequencing and epigenetic analysis of retinoblastoma tumors to reveal few somatic mutations but a number of cancer pathways, including the pathway involving the proto-oncogene SYK, deregulated at the epigenetic level [170]. Since NBL can be regarded as a malignancy resulting from a differentiation arrest of the neural crest [371], epigenetic abnormalities may play a significant part in determining its ultimate clinical phenotype. Whether this is so remains to be addressed through comprehensive surveys of the epigenome.
Bibliography
1. Nature Milestones in Cancer [http://www.nature.com/milestones/milecancer/masthead/index.html]. Accessed 2 March 2011.
2. Boveri T: Über mehrpolige Mitosen als Mittel zur Analyse des Zellkerns. Verh. D. Phys. Med. Ges. 1902, 35:67–90.
3. Boveri T: Zur Frage der Entstehung maligner Tumoren. Jena: Verlag von Gustav Fischer; 1914.
4. Finlay CA, Hinds PW, Levine AJ: The p53 proto-oncogene can act as a suppressor of transformation. Cell 1989, 57:1083–1093.
5. Huang HJ, Yee JK, Shew JY, Chen PL, Bookstein R, Friedmann T, Lee EY, Lee WH: Suppression of the neoplastic phenotype by replacement of the RB gene in human cancer cells. Science 1988, 242:1563–1566.
6. Stehelin D, Varmus HE, Bishop JM, Vogt PK: DNA related to the transforming gene(s) of avian sarcoma viruses is present in normal avian DNA. Nature 1976, 260:170–173.
7. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton MR: A census of human cancer genes. Nat. Rev. Cancer 2004, 4:177–183. doi:10.1038/nrc1299.
8. Rous P: A sarcoma of the fowl transmissible by an agent separable from the tumor cells. J. Exp. Med 1911, 13:397–411.
9. Leach FS, Nicolaides NC, Papadopoulos N, Liu B, Jen J, Parsons R, Peltomäki P, Sistonen P, Aaltonen LA, Nyström-Lahti M: Mutations of a mutS homolog in hereditary nonpolyposis colorectal cancer. Cell 1993, 75:1215–1225.
10. Nordling CO: A new theory on cancer-inducing mechanism. Br. J. Cancer 1953, 7:68–72.
11. Knudson AG: Mutation and cancer: statistical study of retinoblastoma. Proc. Natl. Acad. Sci. U.S.A 1971, 68:820–823.
12. Nowell PC: The clonal evolution of tumor cell populations. Science 1976, 194:23–28.
13. Fearon ER, Vogelstein B: A genetic model for colorectal tumorigenesis. Cell 1990, 61:759–767.
14. Feinberg AP, Vogelstein B: Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature 1983, 301:89–92.
15. Laird PW, Jackson-Grusby L, Fazeli A, Dickinson SL, Jung WE, Li E, Weinberg RA, Jaenisch R: Suppression of intestinal neoplasia by DNA hypomethylation. Cell 1995, 81:197–205.
16. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell 2000, 100:57–70.
17. Li FP: Familial cancer syndromes and clusters. Curr Probl Cancer 1990, 14:73–114.
18. Sijmons R: Identifying patients with familial cancer syndromes. In Cancer Syndromes. National Center for Biotechnology Information (US); 2009.
19. Knudson AG: Hereditary cancers disclose a class of cancer genes. Cancer 1989, 63:1888–1891.
20. Latif F, Tory K, Gnarra J, Yao M, Duh FM, Orcutt ML, Stackhouse T, Kuzmin I, Modi W, Geil L: Identification of the von Hippel-Lindau disease tumor suppressor gene. Science 1993, 260:1317–1320.
21. Kenemans P, Verstraeten RA, Verheijen RHM: Oncogenic pathways in hereditary and sporadic breast cancer. Maturitas 2004, 49:34–43. doi:10.1016/j.maturitas.2004.06.005.
22. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, et al.: Initial sequencing and analysis of the human genome. Nature 2001, 409:860–921. doi:10.1038/35057062.
23. The International HapMap Project. Nature 2003, 426:789–796. doi:10.1038/nature02168.
24. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, et al.: The sequence of the human genome. Science 2001, 291:1304–1351. doi:10.1126/science.1058040.
25. National Cancer Institute: Surveillance, Epidemiology and End Results (SEER) Database. 2010. Available: http://seer.cancer.gov/statistics/. Accessed 28 July 2011.
26. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, Pukkala E, Skytthe A, Hemminki K: Environmental and heritable factors in the causation of cancer: analyses of cohorts of twins from Sweden, Denmark, and Finland. N. Engl. J. Med 2000, 343:78–85. doi:10.1056/NEJM200007133430201.
27. Christiani DC: Combating Environmental Causes of Cancer. New England Journal of Medicine 2011, 364:791–793.
28. Stent GS: The role of cell lineage in development. Philos. Trans. R. Soc. Lond., B, Biol. Sci 1985, 312:3–19.
29. Bonnet D, Dick JE: Human acute myeloid leukemia is organized as a hierarchy that originates from a primitive hematopoietic cell. Nat Med 1997, 3:730–737. doi:10.1038/nm0797-730.
30. Al-Hajj M, Wicha MS, Benito-Hernandez A, Morrison SJ, Clarke MF: Prospective identification of tumorigenic breast cancer cells. Proc. Natl. Acad. Sci. U.S.A 2003, 100:3983–3988. doi:10.1073/pnas.0530291100.
31. Singh SK, Clarke ID, Terasaki M, Bonn VE, Hawkins C, Squire J, Dirks PB: Identification of a Cancer Stem Cell in Human Brain Tumors. Cancer Research 2003, 63:5821–5828.
32. Quintana E, Shackleton M, Sabel MS, Fullen DR, Johnson TM, Morrison SJ: Efficient tumour formation by single human melanoma cells. Nature 2008, 456:593–598. doi:10.1038/nature07567.
33. Santisteban M, Reiman JM, Asiedu MK, Behrens MD, Nassar A, Kalli KR, Haluska P, Ingle JN, Hartmann LC, Manjili MH, Radisky DC, Ferrone S, Knutson KL: Immune-induced epithelial to mesenchymal transition in vivo generates breast cancer stem cells. Cancer Res 2009, 69:2887–2895. doi:10.1158/0008-5472.CAN-08-3343.
34. Gupta PB, Chaffer CL, Weinberg RA: Cancer stem cells: mirage or reality? Nat Med 2009, 15:1010–1012. doi:10.1038/nm0909-1010.
35. Zhou B-BS, Zhang H, Damelin M, Geles KG, Grindley JC, Dirks PB: Tumour-initiating cells: challenges and opportunities for anticancer drug discovery. Nat Rev Drug Discov 2009, 8:806–823. doi:10.1038/nrd2137.
36. Beheshti B, Braude I, Marrano P, Thorner P, Zielenska M, Squire JA: Chromosomal localization of DNA amplifications in neuroblastoma tumors using cDNA microarray comparative genomic hybridization. Neoplasia 2003, 5:53–62.
37. Caspersson T, Lindsten J, Lomakka G, Moller A, Zech L: The use of fluorescence techniques for the recognition of mammalian chromosomes and chromosome regions. Int Rev Exp Pathol 1972, 11:1–72.
38. Rowley JD: Letter: A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature 1973, 243:290–293.
39. Nowell PC, Hungerford DA: Chromosome studies on normal and leukemic human leukocytes. J. Natl. Cancer Inst. 1960, 25:85–109.
40. Barnes WM: PCR amplification of up to 35-kb DNA with high fidelity and high yield from lambda bacteriophage templates. Proc. Natl. Acad. Sci. U.S.A. 1994, 91:2216–2220.
41. Vasickova P, Machackova E, Lukesova M, Damborsky J, Horky O, Pavlu H, Kuklova J, Kosinova V, Navratilova M, Foretova L: High occurrence of BRCA1 intragenic rearrangements in hereditary breast and ovarian cancer syndrome in the Czech Republic. BMC Med. Genet. 2007, 8:32. doi:10.1186/1471-2350-8-32.
42. Buongiorno-Nardelli M, Amaldi F: Autoradiographic detection of molecular hybrids between RNA and DNA in tissue sections. Nature 1970, 225:946–948.
43. Speicher MR, Carter NP: The new cytogenetics: blurring the boundaries with molecular biology. Nat. Rev. Genet. 2005, 6:782–792. doi:10.1038/nrg1692.
44. Patel AS, Hawkins AL, Griffin CA: Cytogenetics and cancer. Curr Opin Oncol 2000, 12:62–67.
45. Speicher MR, Gwyn Ballard S, Ward DC: Karyotyping human chromosomes by combinatorial multi-fluor FISH. Nat. Genet. 1996, 12:368–375. doi:10.1038/ng0496-368.
46. Schröck E, du Manoir S, Veldman T, Schoell B, Wienberg J, Ferguson-Smith MA, Ning Y, Ledbetter DH, Bar-Am I, Soenksen D, Garini Y, Ried T: Multicolor spectral karyotyping of human chromosomes. Science 1996, 273:494–497.
47. Tanke HJ, Wiegant J, van Gijlswijk RP, Bezrookove V, Pattenier H, Heetebrij RJ, Talman EG, Raap AK, Vrolijk J: New strategy for multi-colour fluorescence in situ hybridisation: COBRA: COmbined Binary RAtio labelling. Eur. J. Hum. Genet. 1999, 7:2–11. doi:10.1038/sj.ejhg.5200265.
48. Fujiwara H, Emi M, Nagai H, Ohgaki K, Imoto I, Akimoto M, Ogawa O, Habuchi T: Definition of a 1-Mb homozygous deletion at 9q32-q33 in a human bladder-cancer cell line. J. Hum. Genet. 2001, 46:372–377. doi:10.1007/s100380170056.
49. Henderson L-J, Okamoto I, Lestou VS, Ludkovski O, Robichaud M, Chhanabhai M, Gascoyne RD, Klasa RJ, Connors JM, Marra MA, Horsman DE, Lam WL: Delineation of a minimal region of deletion at 6q16.3 in follicular lymphoma and construction of a bacterial artificial chromosome contig spanning a 6-megabase region of 6q16-q21. Genes Chromosomes Cancer 2004, 40:60–65. doi:10.1002/gcc.20013.
50. Huang H, Qian C, Jenkins RB, Smith DI: FISH mapping of YAC clones at human chromosomal band 7q31.2: identification of YACs spanning FRA7G within the common region of LOH in breast and prostate cancer. Genes Chromosomes Cancer 1998, 21:152–159.
51. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D: Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 1992, 258:818–821.
52. Mantripragada KK, Buckley PG, Diaz de Ståhl T, Dumanski JP: Genomic microarrays in the spotlight. Trends Genet. 2004, 20:87–94.
53. Carter NP: Methods and strategies for analyzing copy number variation using DNA microarrays. Nat. Genet. 2007, 39:S16–21. doi:10.1038/ng2028.
54. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C, Zhai Y, Dairkee SH, Ljung BM, Gray JW, Albertson DG: High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat. Genet. 1998, 20:207–211. doi:10.1038/2524.
55. Buckley PG, Mantripragada KK, Benetkiewicz M, Tapia-Páez I, Diaz De Ståhl T, Rosenquist M, Ali H, Jarbo C, De Bustos C, Hirvelä C, Sinder Wilén B, Fransson I, Thyr C, Johnsson B-I, Bruder CEG, Menzel U, Hergersberg M, Mandahl N, Blennow E, Wedell A, Beare DM, Collins JE, Dunham I, Albertson D, Pinkel D, Bastian BC, Faruqi AF, Lasken RS, Ichimura K, Collins VP, et al.: A full-coverage, high-resolution human chromosome 22 genomic microarray for clinical and research applications. Hum. Mol. Genet. 2002, 11:3221–3229.
56.
Buckley PG, Mantripragada KK, Piotrowski A, Diaz de Ståhl T, Dumanski JP: Copynumber polymorphisms: mining the tip of an iceberg. Trends Genet. 2005, 21:315– 31710.1016/j.tig.2005.04.007. 57. Krzywinski M, Bosdet I, Smailus D, Chiu R, Mathewson C, Wye N, Barber S, BrownJohn M, Chan S, Chand S, Cloutier A, Girn N, Lee D, Masson A, Mayo M, Olson T, Pandoh P, Prabhu A-L, Schoenmakers E, Tsai M, Albertson D, Lam W, Choy C-O, Osoegawa K, Zhao S, de Jong PJ, Schein J, Jones S, Marra MA: A set of BAC clones spanning the human genome. Nucleic Acids Res. 2004, 32:3651–366010.1093/nar/gkh700. 58. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, Snijders A, Albertson DG, Pinkel D, Marra MA, Ling V, MacAulay C, Lam WL: A tiling resolution DNA microarray with complete coverage of the human genome. Nat. Genet. 2004, 36:299–30310.1038/ng1307. 59. Inazawa J, Inoue J, Imoto I: Comparative genomic hybridization (CGH)-arrays pave the way for identification of novel cancer-related genes. Cancer Sci. 2004, 95:559–563. 60. De Lellis L, Curia MC, Aceto GM, Toracchio S, Colucci G, Russo A, Mariani-Costantini R, Cama A: Analysis of extended genomic rearrangements in oncological research. Ann. Oncol. 2007, 18 Suppl 6:vi173–17810.1093/annonc/mdm251. 61. Gunderson KL, Steemers FJ, Lee G, Mendoza LG, Chee MS: A genome-wide scalable SNP genotyping assay using microarray technology. Nat. Genet. 2005, 37:549– 55410.1038/ng1547. 62. Bignell GR, Huang J, Greshock J, Watt S, Butler A, West S, Grigorova M, Jones KW, Wei W, Stratton MR, Futreal PA, Weber B, Shapero MH, Wooster R: High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res. 2004, 14:287–29510.1101/gr.2012304. 63. Heinrichs S, Look AT: Identification of structural aberrations in cancer by SNP array analysis. Genome Biol. 2007, 8:21910.1186/gb-2007-8-7-219. 190 64. 
LaFramboise T, Weir BA, Zhao X, Beroukhim R, Li C, Harrington D, Sellers WR, Meyerson M: Allele-specific amplification in cancer revealed by SNP array analysis. PLoS Comput. Biol. 2005, 1:e6510.1371/journal.pcbi.0010065. 65. Baross A, Delaney AD, Li HI, Nayar T, Flibotte S, Qian H, Chan SY, Asano J, Ally A, Cao M, Birch P, Brown-John M, Fernandes N, Go A, Kennedy G, Langlois S, Eydoux P, Friedman JM, Marra MA: Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data. BMC Bioinformatics 2007, 8:36810.1186/1471-2105-8-368. 66. Wang K, Diskin SJ, Zhang H, Attiyeh EF, Winter C, Hou C, Schnepp RW, Diamond M, Bosse K, Mayes PA, Glessner J, Kim C, Frackelton E, Garris M, Wang Q, Glaberson W, Chiavacci R, Nguyen L, Jagannathan J, Saeki N, Sasaki H, Grant SFA, Iolascon A, Mosse YP, Cole KA, Li H, Devoto M, McGrady PW, London WB, Capasso M, et al.: Integrative genomics identifies LMO1 as a neuroblastoma oncogene. Nature 2011, 469:216– 22010.1038/nature09609. 67. Reid C: Company Profile: Complete Genomics Inc. Future Oncology 2011, 7:219– 22110.2217/fon.10.173. 68. Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. U.S.A 1977, 74:5463–5467. 69. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C, Edkins S, O‘Meara S, Vastrik I, Schmidt EE, Avis T, Barthorpe S, Bhamra G, Buck G, Choudhury B, Clements J, Cole J, Dicks E, Forbes S, Gray K, Halliday K, Harrison R, Hills K, Hinton J, Jenkinson A, Jones D, et al.: Patterns of somatic mutation in human cancer genomes. Nature 2007, 446:153– 15810.1038/nature05610. 70. 
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen Y-J, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Goodwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer MLI, Jarvie TP, Jirage KB, Kim J-B, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, et al.: Genome Sequencing in Open Microfabricated High Density Picoliter Reactors. Nature 2005, 437:376–38010.1038/nature03959. 71. Tawfik DS, Griffiths AD: Man-made cell-like compartments for molecular evolution. Nat. Biotechnol 1998, 16:652–65610.1038/nbt0798-652. 72. Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey M, Leamon JH, Johnson K, Milgrew MJ, Edwards M, Hoon J, Simons JF, Marran D, Myers JW, Davidson JF, Branting A, Nobile JR, Puc BP, Light D, Clark TA, Huber M, Branciforte JT, Stoner IB, Cawley SE, Lyons M, Fu Y, Homer N, Sedova M, Miao X, Reed B, et al.: An integrated semiconductor device enabling non-optical genome sequencing. Nature 2011, 475:348– 35210.1038/nature10242. 191 73. Bennett ST, Barnes C, Cox A, Davies L, Brown C: Toward the $1000 human genome. Pharmacogenomics 2005, 6:373–38210.1517/14622416.6.4.373. 74. Bentley DR: Whole-genome re-sequencing. Curr. Opin. Genet. Dev 2006, 16:545– 55210.1016/j.gde.2006.10.009. 75. Braslavsky I, Hebert B, Kartalov E, Quake SR: Sequence information can be obtained from single DNA molecules. Proc Natl Acad Sci U S A 2003, 100:3960– 396410.1073/pnas.0230489100. 76. Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B, Bibillo A, Bjornson K, Chaudhuri B, Christians F, Cicero R, Clark S, Dalal R, Dewinter A, Dixon J, Foquet M, Gaertner A, Hardenbol P, Heiner C, Hester K, Holden D, Kearns G, Kong X, Kuse R, Lacroix Y, Lin S, et al.: Real-time DNA sequencing from single polymerase molecules. Science 2009, 323:133–13810.1126/science.1162986. 77. 
Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SB, Hood LE: Fluorescence detection in automated DNA sequence analysis. Nature 1986, 321:674–67910.1038/321674a0. 78. Rosenblum BB, Lee LG, Spurgeon SL, Khan SH, Menchen SM, Heiner CR, Chen SM: New dye-labeled terminators for improved DNA sequencing patterns. Nucleic Acids Research 1997, 25:4500 –450410.1093/nar/25.22.4500. 79. Dames S, Durtschi J, Geiersbach K, Stephens J, Voelkerding KV: Comparison of the Illumina Genome Analyzer and Roche 454 GS FLX for Resequencing of Hypertrophic Cardiomyopathy-Associated Genes. J Biomol Tech 2010, 21:73–80. 80. Harris TD, Buzby PR, Babcock H, Beer E, Bowers J, Braslavsky I, Causey M, Colonell J, Dimeo J, Efcavitch JW, Giladi E, Gill J, Healy J, Jarosz M, Lapen D, Moulton K, Quake SR, Steinmann K, Thayer E, Tyurina A, Ward R, Weiss H, Xie Z: Single-molecule DNA sequencing of a viral genome. Science 2008, 320:106–10910.1126/science.1150427. 81. Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM: Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome. Science 2005, 309:1728 –173210.1126/science.1117389. 82. Wang T-L, Maierhofer C, Speicher MR, Lengauer C, Vogelstein B, Kinzler KW, Velculescu VE: Digital karyotyping. Proc. Natl. Acad. Sci. U.S.A 2002, 99:16156– 1616110.1073/pnas.202610899. 83. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW: Serial analysis of gene expression. Science 1995, 270:484–487. 84. Parrett TJ, Yan H: Digital karyotyping technology: exploring the cancer genome. Expert Rev. Mol. Diagn 2005, 5:917–92510.1586/14737159.5.6.917. 192 85. Salani R, Chang C-L, Cope L, Wang T-L: Digital karyotyping: an update of its applications in cancer. Mol Diagn Ther 2006, 10:231–237. 86. 
Volik S, Raphael BJ, Huang G, Stratton MR, Bignel G, Murnane J, Brebner JH, Bajsarowicz K, Paris PL, Tao Q, Kowbel D, Lapuk A, Shagin DA, Shagina IA, Gray JW, Cheng J-F, de Jong PJ, Pevzner P, Collins C: Decoding the fine-scale structure of a breast cancer genome and transcriptome. Genome Res 2006, 16:394–40410.1101/gr.4247306. 87. Volik S, Zhao S, Chin K, Brebner JH, Herndon DR, Tao Q, Kowbel D, Huang G, Lapuk A, Kuo W-L, Magrane G, De Jong P, Gray JW, Collins C: End-sequence profiling: sequence-based analysis of aberrant genomes. Proc. Natl. Acad. Sci. U.S.A 2003, 100:7696–770110.1073/pnas.1232418100. 88. Krzywinski M, Bosdet I, Mathewson C, Wye N, Brebner J, Chiu R, Corbett R, Field M, Lee D, Pugh T, Volik S, Siddiqui A, Jones S, Schein J, Collins C, Marra M: A BAC clone fingerprinting approach to the detection of human genome rearrangements. Genome Biol 2007, 8:R22410.1186/gb-2007-8-10-r224. 89. Collins FS, Barker AD: Mapping the cancer genome. Pinpointing the genes involved in cancer will help chart a new course across the complex landscape of human malignancies. Sci. Am 2007, 296:50–57. 90. Dickson D: Wellcome funds cancer database. Nature 1999, 401:72910.1038/44413. 91. The Cancer Genome Atlas Research Network: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008, 455:1061–106810.1038/nature07385. 92. Parsons DW, Jones S, Zhang X, Lin JC-H, Leary RJ, Angenendt P, Mankoo P, Carter H, Siu I-M, Gallia GL, Olivi A, McLendon R, Rasheed BA, Keir S, Nikolskaya T, Nikolsky Y, Busam DA, Tekleab H, Diaz LA, Hartigan J, Smith DR, Strausberg RL, Marie SKN, Shinjo SMO, Yan H, Riggins GJ, Bigner DD, Karchin R, Papadopoulos N, Parmigiani G, et al.: An integrated genomic analysis of human glioblastoma multiforme. Science 2008, 321:1807–181210.1126/science.1164382. 93. 
Barretina J, Taylor BS, Banerji S, Ramos AH, Lagos-Quintana M, Decarolis PL, Shah K, Socci ND, Weir BA, Ho A, Chiang DY, Reva B, Mermel CH, Getz G, Antipin Y, Beroukhim R, Major JE, Hatton C, Nicoletti R, Hanna M, Sharpe T, Fennell TJ, Cibulskis K, Onofrio RC, Saito T, Shukla N, Lau C, Nelander S, Silver SJ, Sougnez C, et al.: Subtype-specific genomic alterations define new targets for soft-tissue sarcoma therapy. Nat. Genet 2010, 42:715–72110.1038/ng.619. 94. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, Greulich H, Muzny DM, Morgan MB, Fulton L, Fulton RS, Zhang Q, Wendl MC, Lawrence MS, Larson DE, Chen K, Dooling DJ, Sabo A, Hawes AC, Shen H, Jhangiani SN, Lewis LR, Hall O, Zhu Y, Mathew T, Ren Y, Yao J, Scherer SE, Clerc K, et al.: Somatic mutations affect key pathways in lung adenocarcinoma. Nature 2008, 455:1069– 107510.1038/nature07423. 193 95. Zhang J, Mullighan CG, Harvey RC, Wu G, Chen X, Edmonson M, Buetow KH, Carroll WL, Chen I-M, Devidas M, Gerhard DS, Loh ML, Reaman GH, Relling MV, Camitta BM, Bowman WP, Smith MA, Willman CL, Downing JR, Hunger SP: Key pathways are frequently mutated in high risk childhood acute lymphoblastic leukemia: a report from the Children’s Oncology Group. Blood 2011, 10.1182/blood-2011-03-341412Available: http://www.ncbi.nlm.nih.gov/pubmed/21680795.Accessed 27 June 2011. 96. Sjöblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, Leary RJ, Ptak J, Silliman N, Szabo S, Buckhaults P, Farrell C, Meeh P, Markowitz SD, Willis J, Dawson D, Willson JKV, Gazdar AF, Hartigan J, Wu L, Liu C, Parmigiani G, Park BH, Bachman KE, Papadopoulos N, Vogelstein B, Kinzler KW, Velculescu VE: The consensus coding sequences of human breast and colorectal cancers. Science 2006, 314:268– 27410.1126/science.1133427. 97. 
Wood LD, Parsons DW, Jones S, Lin J, Sjöblom T, Leary RJ, Shen D, Boca SM, Barber T, Ptak J, Silliman N, Szabo S, Dezso Z, Ustyanksky V, Nikolskaya T, Nikolsky Y, Karchin R, Wilson PA, Kaminker JS, Zhang Z, Croshaw R, Willis J, Dawson D, Shipitsin M, Willson JKV, Sukumar S, Polyak K, Park BH, Pethiyagoda CL, Pant PVK, et al.: The genomic landscapes of human breast and colorectal cancers. Science 2007, 318:1108– 111310.1126/science.1145720. 98. The Cancer Genome Atlas Research Network: Integrated genomic analyses of ovarian carcinoma. Nature 2011, 474:609–61510.1038/nature10166. 99. Parsons DW, Li M, Zhang X, Jones S, Leary RJ, Lin JC-H, Boca SM, Carter H, Samayoa J, Bettegowda C, Gallia GL, Jallo GI, Binder ZA, Nikolsky Y, Hartigan J, Smith DR, Gerhard DS, Fults DW, VandenBerg S, Berger MS, Marie SKN, Shinjo SMO, Clara C, Phillips PC, Minturn JE, Biegel JA, Judkins AR, Resnick AC, Storm PB, Curran T, et al.: The genetic landscape of the childhood cancer medulloblastoma. Science 2011, 331:435– 43910.1126/science.1198056. 100. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, Teague JW, Campbell PJ, Stratton MR, Futreal PA: COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res 2011, 39:D945–95010.1093/nar/gkq929. 101. Mardis ER, Ding L, Dooling DJ, Larson DE, McLellan MD, Chen K, Koboldt DC, Fulton RS, Delehaunty KD, McGrath SD, Fulton LA, Locke DP, Magrini VJ, Abbott RM, Vickery TL, Reed JS, Robinson JS, Wylie T, Smith SM, Carmichael L, Eldred JM, Harris CC, Walker J, Peck JB, Du F, Dukes AF, Sanderson GE, Brummett AM, Clark E, McMichael JF, et al.: Recurring mutations found by sequencing an acute myeloid leukemia genome. N. Engl. J. Med 2009, 361:1058–106610.1056/NEJMoa0903840. 102. 
Morin RD, Johnson NA, Severson TM, Mungall AJ, An J, Goya R, Paul JE, Boyle M, Woolcock BW, Kuchenbauer F, Yap D, Humphries RK, Griffith OL, Shah S, Zhu H, Kimbara M, Shashkin P, Charlot JF, Tcherpakov M, Corbett R, Tam A, Varhol R, Smailus D, Moksa M, Zhao Y, Delaney A, Qian H, Birol I, Schein J, Moore R, et al.: Somatic 194 mutation of EZH2 (Y641) in Follicular and Diffuse Large B-cell Lymphomas of Germinal Center Origin. Nat Genet 2010, 42:181–18510.1038/ng.518. 103. Shah SP, Köbel M, Senz J, Morin RD, Clarke BA, Wiegand KC, Leung G, Zayed A, Mehl E, Kalloger SE, Sun M, Giuliany R, Yorida E, Jones S, Varhol R, Swenerton KD, Miller D, Clement PB, Crane C, Madore J, Provencher D, Leung P, DeFazio A, Khattra J, Turashvili G, Zhao Y, Zeng T, Glover JNM, Vanderhyden B, Zhao C, et al.: Mutation of FOXL2 in granulosa-cell tumors of the ovary. N. Engl. J. Med 2009, 360:2719– 272910.1056/NEJMoa0902542. 104. Thomas RK, Nickerson E, Simons JF, Janne PA, Tengs T, Yuza Y, Garraway LA, LaFramboise T, Lee JC, Shah K, O‘Neill K, Sasaki H, Lindeman N, Wong K-K, Borras AM, Gutmann EJ, Dragnev KH, DeBiasi R, Chen T-H, Glatt KA, Greulich H, Desany B, Lubeski CK, Brockman W, Alvarez P, Hutchison SK, Leamon JH, Ronan MT, Turenchalk GS, Egholm M, et al.: Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequencing. Nat Med 2006, 12:852– 85510.1038/nm1437. 105. Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, Dooling D, DunfordShore BH, McGrath S, Hickenbotham M, Cook L, Abbott R, Larson DE, Koboldt DC, Pohl C, Smith S, Hawkins A, Abbott S, Locke D, Hillier LW, Miner T, Fulton L, Magrini V, Wylie T, Glasscock J, Conyers J, Sander N, Shi X, Osborne JR, Minx P, et al.: DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 2008, 456:66–7210.1038/nature07485. 106. 
Morin RD, Mendez-Lago M, Mungall AJ, Goya R, Mungall KL, Corbett RD, Johnson NA, Severson TM, Chiu R, Field M, Jackman S, Krzywinski M, Scott DW, Trinh DL, Tamura-Wells J, Li S, Firme MR, Rogic S, Griffith M, Chan S, Yakovenko O, Meyer IM, Zhao EY, Smailus D, Moksa M, Chittaranjan S, Rimsza L, Brooks-Wilson A, Spinelli JJ, Ben-Neriah S, et al.: Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature 2011, 476:298–30310.1038/nature10351. 107. Berger MF, Lawrence MS, Demichelis F, Drier Y, Cibulskis K, Sivachenko AY, Sboner A, Esgueva R, Pflueger D, Sougnez C, Onofrio R, Carter SL, Park K, Habegger L, Ambrogio L, Fennell T, Parkin M, Saksena G, Voet D, Ramos AH, Pugh TJ, Wilkinson J, Fisher S, Winckler W, Mahan S, Ardlie K, Baldwin J, Simons JW, Kitabayashi N, MacDonald TY, et al.: The genomic complexity of primary human prostate cancer. Nature 2011, 470:214– 22010.1038/nature09744. 108. Chapman MA, Lawrence MS, Keats JJ, Cibulskis K, Sougnez C, Schinzel AC, Harview CL, Brunet J-P, Ahmann GJ, Adli M, Anderson KC, Ardlie KG, Auclair D, Baker A, Bergsagel PL, Bernstein BE, Drier Y, Fonseca R, Gabriel SB, Hofmeister CC, Jagannath S, Jakubowiak AJ, Krishnan A, Levy J, Liefeld T, Lonial S, Mahan S, Mfuko B, Monti S, Perkins LM, et al.: Initial genome sequencing and analysis of multiple myeloma. Nature 2011, 471:467–47210.1038/nature09837. 195 109. Jones SJ, Laskin J, Li YY, Griffith OL, An J, Bilenky M, Butterfield YS, Cezard T, Chuah E, Corbett R, Fejes AP, Griffith M, Yee J, Martin M, Mayo M, Melnyk N, Morin RD, Pugh TJ, Severson T, Shah SP, Sutcliffe M, Tam A, Terry J, Thiessen N, Thomson T, Varhol R, Zeng T, Zhao Y, Moore RA, Huntsman DG, et al.: Evolution of an adenocarcinoma in response to selection by targeted kinase inhibitors. Genome Biol 2010, 11:R8210.1186/gb-2010-11-8-r82. 110. 
Lee W, Jiang Z, Liu J, Haverty PM, Guan Y, Stinson J, Yue P, Zhang Y, Pant KP, Bhatt D, Ha C, Johnson S, Kennemer MI, Mohan S, Nazarenko I, Watanabe C, Sparks AB, Shames DS, Gentleman R, de Sauvage FJ, Stern H, Pandita A, Ballinger DG, Drmanac R, Modrusan Z, Seshagiri S, Zhang Z: The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 2010, 465:473–47710.1038/nature09004. 111. Pleasance ED, Stephens PJ, O‘Meara S, McBride DJ, Meynert A, Jones D, Lin M-L, Beare D, Lau KW, Greenman C, Varela I, Nik-Zainal S, Davies HR, Ordoñez GR, Mudie LJ, Latimer C, Edkins S, Stebbings L, Chen L, Jia M, Leroy C, Marshall J, Menzies A, Butler A, Teague JW, Mangion J, Sun YA, McLaughlin SF, Peckham HE, Tsung EF, et al.: A smallcell lung cancer genome with complex signatures of tobacco exposure. Nature 2010, 463:184–19010.1038/nature08629. 112. Pleasance ED, Cheetham RK, Stephens PJ, McBride DJ, Humphray SJ, Greenman CD, Varela I, Lin M-L, Ordóñez GR, Bignell GR, Ye K, Alipaz J, Bauer MJ, Beare D, Butler A, Carter RJ, Chen L, Cox AJ, Edkins S, Kokko-Gonzales PI, Gormley NA, Grocock RJ, Haudenschild CD, Hims MM, James T, Jia M, Kingsbury Z, Leroy C, Marshall J, Menzies A, et al.: A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 2010, 463:191–19610.1038/nature08658. 113. Puente XS, Pinyol M, Quesada V, Conde L, Ordóñez GR, Villamor N, Escaramis G, Jares P, Beà S, González-Díaz M, Bassaganyas L, Baumann T, Juan M, López-Guerra M, Colomer D, Tubío JMC, López C, Navarro A, Tornador C, Aymerich M, Rozman M, Hernández JM, Puente DA, Freije JMP, Velasco G, Gutiérrez-Fernández A, Costa D, Carrió A, Guijarro S, Enjuanes A, et al.: Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature 2011, 475:101– 10510.1038/nature10113. 114. 
Shah SP, Morin RD, Khattra J, Prentice L, Pugh T, Burleigh A, Delaney A, Gelmon K, Guliany R, Senz J, Steidl C, Holt RA, Jones S, Sun M, Leung G, Moore R, Severson T, Taylor GA, Teschendorff AE, Tse K, Turashvili G, Varhol R, Warren RL, Watson P, Zhao Y, Caldas C, Huntsman D, Hirst M, Marra MA, Aparicio S: Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 2009, 461:809– 81310.1038/nature08489. 115. Hudson TJ, Anderson W, Artez A, Barker AD, Bell C, Bernabé RR, Bhan MK, Calvo F, Eerola I, Gerhard DS, Guttmacher A, Guyer M, Hemsley FM, Jennings JL, Kerr D, Klatt P, Kolar P, Kusada J, Lane DP, Laplace F, Youyong L, Nettekoven G, Ozenberger B, Peterson J, Rao TS, Remacle J, Schafer AJ, Shibata T, Stratton MR, Vockley JG, et al.: International network of cancer genome projects. Nature 2010, 464:993–99810.1038/nature08987. 196 116. Meyerson M, Gabriel S, Getz G: Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet 2010, 11:685–69610.1038/nrg2841. 117. Ajay SS, Parker SCJ, Abaan HO, Fajardo KVF, Margulies EH: Accurate and comprehensive sequencing of personal genomes. Genome Res. 2011, 21:1498– 150510.1101/gr.123638.111. 118. Mamanova L, Coffey AJ, Scott CE, Kozarewa I, Turner EH, Kumar A, Howard E, Shendure J, Turner DJ: Target-enrichment strategies for next-generation sequencing. Nat. Methods 2010, 7:111–11810.1038/nmeth.1419. 119. Comino-Méndez I, Gracia-Aznárez FJ, Schiavi F, Landa I, Leandro-García LJ, Letón R, Honrado E, Ramos-Medina R, Caronia D, Pita G, Gómez-Graña A, de Cubas AA, IngladaPérez L, Maliszewska A, Taschin E, Bobisse S, Pica G, Loli P, Hernández-Lavado R, Díaz JA, Gómez-Morales M, González-Neira A, Roncador G, Rodríguez-Antona C, Benítez J, Mannelli M, Opocher G, Robledo M, Cascón A: Exome sequencing identifies MAX mutations as a cause of hereditary pheochromocytoma. 
Nat Genet 2011, 10.1038/ng.861Available: http://www.ncbi.nlm.nih.gov/pubmed/21685915.Accessed 27 June 2011. 120. Tiacci E, Trifonov V, Schiavoni G, Holmes A, Kern W, Martelli MP, Pucciarini A, Bigerna B, Pacini R, Wells VA, Sportoletti P, Pettirossi V, Mannucci R, Elliott O, Liso A, Ambrosetti A, Pulsoni A, Forconi F, Trentin L, Semenzato G, Inghirami G, Capponi M, Di Raimondo F, Patti C, Arcaini L, Musto P, Pileri S, Haferlach C, Schnittger S, Pizzolo G, et al.: BRAF mutations in hairy-cell leukemia. N. Engl. J. Med 2011, 364:2305– 231510.1056/NEJMoa1014209. 121. Totoki Y, Tatsuno K, Yamamoto S, Arai Y, Hosoda F, Ishikawa S, Tsutsumi S, Sonoda K, Totsuka H, Shirakihara T, Sakamoto H, Wang L, Ojima H, Shimada K, Kosuge T, Okusaka T, Kato K, Kusuda J, Yoshida T, Aburatani H, Shibata T: High-resolution characterization of a hepatocellular carcinoma genome. Nat. Genet 2011, 43:464– 46910.1038/ng.804. 122. Varela I, Tarpey P, Raine K, Huang D, Ong CK, Stephens P, Davies H, Jones D, Lin ML, Teague J, Bignell G, Butler A, Cho J, Dalgliesh GL, Galappaththige D, Greenman C, Hardy C, Jia M, Latimer C, Lau KW, Marshall J, McLaren S, Menzies A, Mudie L, Stebbings L, Largaespada DA, Wessels LFA, Richard S, Kahnoski RJ, Anema J, et al.: Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma. Nature 2011, 469:539–54210.1038/nature09639. 123. Yan X-J, Xu J, Gu Z-H, Pan C-M, Lu G, Shen Y, Shi J-Y, Zhu Y-M, Tang L, Zhang XW, Liang W-X, Mi J-Q, Song H-D, Li K-Q, Chen Z, Chen S-J: Exome sequencing identifies somatic mutations of DNA methyltransferase gene DNMT3A in acute monocytic leukemia. Nat. Genet 2011, 43:309–31510.1038/ng.788. 124. Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995, 270:467–470. 197 125. Pozhitkov AE, Tautz D, Noble PA: Oligonucleotide microarrays: widely applied— poorly understood. 
Briefings in Functional Genomics & Proteomics 2007, 6:141 – 14810.1093/bfgp/elm014. 126. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286:531–537. 127. Balmain A: Cancer genetics: from Boveri and Mendel to microarrays. Nat. Rev. Cancer 2001, 1:77–8210.1038/35094086. 128. Perez-Diez A, Morgun A, Shulzhenko N: Microarrays for cancer diagnosis and classification. Adv. Exp. Med. Biol. 2007, 593:74–8510.1007/978-0-387-39978-2_8. 129. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, et al.: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000, 403:503–51110.1038/35000501. 130. Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS, Thorsen T, Quist H, Matese JC, Brown PO, Botstein D, Lønning PE, Børresen-Dale A-L: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proceedings of the National Academy of Sciences 2001, 98:10869 –1087410.1073/pnas.191367098. 131. Kunz G: Use of a genomic test (MammaPrintTM) in daily clinical practice to assist in risk stratification of young breast cancer patients. Arch. Gynecol. Obstet 2011, 283:597–60210.1007/s00404-010-1454-9. 132. White NMA, Bao TT, Grigull J, Youssef YM, Girgis A, Diamandis M, Fatoohi E, Metias M, Honey RJ, Stewart R, Pace KT, Bjarnason GA, Yousef GM: miRNA profiling for clear cell renal cell carcinoma: biomarker discovery and identification of potential controls and consequences of miRNA dysregulation. J. 
Urol. 2011, 186:1077– 108310.1016/j.juro.2011.04.110. 133. Griffith M, Tang MJ, Griffith OL, Morin RD, Chan SY, Asano JK, Zeng T, Flibotte S, Ally A, Baross A, Hirst M, Jones SJM, Morin GB, Tai IT, Marra MA: ALEXA: a microarray design platform for alternative expression analysis. Nat. Methods 2008, 5:11810.1038/nmeth0208-118. 134. Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, Roth R, George D, Eletr S, Albrecht G, Vermaas E, Williams SR, Moon K, Burcham T, Pallas M, DuBridge RB, Kirchner J, Fearon K, Mao J, Corcoran K: Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol. 2000, 18:630–63410.1038/76469. 198 135. Matsumura H, Reich S, Ito A, Saitoh H, Kamoun S, Winter P, Kahl G, Reuter M, Kruger DH, Terauchi R: Gene expression analysis of plant host-pathogen interactions by SuperSAGE. Proc. Natl. Acad. Sci. U.S.A 2003, 100:15718– 1572310.1073/pnas.2536670100. 136. Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE: Using the transcriptome to annotate the genome. Nat. Biotechnol 2002, 20:508–51210.1038/nbt0502-508. 137. Brenner S, Williams SR, Vermaas EH, Storck T, Moon K, McCollum C, Mao JI, Luo S, Kirchner JJ, Eletr S, DuBridge RB, Burcham T, Albrecht G: In vitro cloning of complex mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs. Proc. Natl. Acad. Sci. U.S.A. 2000, 97:1665–1670. 138. Morozova O, Marra MA: Applications of next-generation sequencing technologies in functional genomics. Genomics 2008, 92:255–26410.1016/j.ygeno.2008.07.001. 139. Audic S, Claverie JM: The significance of digital gene expression profiles. Genome Res. 1997, 7:986–995. 140. Wang SM: Understanding SAGE data. Trends Genet. 2007, 23:42– 5010.1016/j.tig.2006.11.001. 141. Nielsen KL, Høgh AL, Emmersen J: DeepSAGE--digital transcriptomics with high sensitivity, simple experimental protocol and multiplexing of samples. 
Nucleic Acids Res 2006, 34:e13310.1093/nar/gkl714. 142. Morrissy AS, Morin RD, Delaney A, Zeng T, McDonald H, Jones S, Zhao Y, Hirst M, Marra MA: Next-generation tag sequencing for cancer gene expression profiling. Genome Res 2009, 19:1825–183510.1101/gr.094482.109. 143. Gowda M, Li H, Alessi J, Chen F, Pratt R, Wang G-L: Robust analysis of 5’transcript ends (5’-RATE): a novel technique for transcriptome analysis and genome annotation. Nucleic Acids Res 2006, 34:e12610.1093/nar/gkl522. 144. Lal A, Lash AE, Altschul SF, Velculescu V, Zhang L, McLendon RE, Marra MA, Prange C, Morin PJ, Polyak K, Papadopoulos N, Vogelstein B, Kinzler KW, Strausberg RL, Riggins GJ: A Public Database for Gene Expression in Human Cancers. Cancer Research 1999, 59:5403 –5407. 145. Tsai C-C, Chung Y-D, Lee H-J, Chang W-H, Suzuku Y, Sugano S, Lin J-Y: Largescale sequencing analysis of the full-length cDNA library of human hepatocellular carcinoma. J. Biomed. Sci 2003, 10:636–64310.1159/000073529. 146. Hillier LD, Lennon G, Becker M, Bonaldo MF, Chiapelli B, Chissoe S, Dietrich N, DuBuque T, Favello A, Gish W, Hawkins M, Hultman M, Kucaba T, Lacy M, Le M, Le N, Mardis E, Moore B, Morris M, Parsons J, Prange C, Rifkin L, Rohlfing T, Schellenberg K, 199 Marra M: Generation and analysis of 280,000 human expressed sequence tags. Genome Research 1996, 6:807 –82810.1101/gr.6.9.807. 147. Sun M, Zhou G, Lee S, Chen J, Shi RZ, Wang SM: SAGE is far more sensitive than EST for detecting low-abundance transcripts. BMC Genomics 2004, 5:110.1186/14712164-5-1. 148. Morozova O, Hirst M, Marra MA: Applications of new sequencing technologies for transcriptome analysis. Annu Rev Genomics Hum Genet 2009, 10:135– 15110.1146/annurev-genom-082908-145957. 149. Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T, McDonald H, Varhol R, Jones S, Marra M: Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. BioTechniques 2008, 45:81– 9410.2144/000112900. 
150. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 2008, 5:621–628. doi:10.1038/nmeth.1226.
151. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 2008, 320:1344–1349. doi:10.1126/science.1158441.
152. Costa V, Angelini C, De Feis I, Ciccodicola A: Uncovering the complexity of transcriptomes with RNA-Seq. J. Biomed. Biotechnol 2010, 2010:853916. doi:10.1155/2010/853916.
153. Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 2009, 10:57–63. doi:10.1038/nrg2484.
154. Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, Morin RD, Corbett R, Tang MJ, Hou Y-C, Pugh TJ, Robertson G, Chittaranjan S, Ally A, Asano JK, Chan SY, Li HI, McDonald H, Teague K, Zhao Y, Zeng T, Delaney A, Hirst M, Morin GB, Jones SJM, Tai IT, Marra MA: Alternative expression analysis by RNA sequencing. Nat Meth 2010, 7:843–847. doi:10.1038/nmeth.1503.
155. Berger MF, Levin JZ, Vijayendran K, Sivachenko A, Adiconis X, Maguire J, Johnson LA, Robinson J, Verhaak RG, Sougnez C, Onofrio RC, Ziaugra L, Cibulskis K, Laine E, Barretina J, Winckler W, Fisher DE, Getz G, Meyerson M, Jaffe DB, Gabriel SB, Lander ES, Dummer R, Gnirke A, Nusbaum C, Garraway LA: Integrative analysis of the melanoma transcriptome. Genome Res. 2010, 20:413–427. doi:10.1101/gr.103697.109.
156. Pflueger D, Terry S, Sboner A, Habegger L, Esgueva R, Lin P-C, Svensson MA, Kitabayashi N, Moss BJ, MacDonald TY, Cao X, Barrette T, Tewari AK, Chee MS, Chinnaiyan AM, Rickman DS, Demichelis F, Gerstein MB, Rubin MA: Discovery of non-ETS gene fusions in human prostate cancer using next-generation RNA sequencing. Genome Res. 2011, 21:56–67. doi:10.1101/gr.110684.110.
157. Rosenberg BR, Hamilton CE, Mwangi MM, Dewell S, Papavasiliou FN: Transcriptome-wide sequencing reveals numerous APOBEC1 mRNA-editing targets in transcript 3’ UTRs. Nat. Struct. Mol. Biol. 2011, 18:230–236. doi:10.1038/nsmb.1975.
158. Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N, Bhardwaj N, Rubin M, Snyder M, Gerstein M: AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol. Syst. Biol. 2011, 7:522. doi:10.1038/msb.2011.54.
159. Wiegand KC, Shah SP, Al-Agha OM, Zhao Y, Tse K, Zeng T, Senz J, McConechy MK, Anglesio MS, Kalloger SE, Yang W, Heravi-Moussavi A, Giuliany R, Chow C, Fee J, Zayed A, Prentice L, Melnyk N, Turashvili G, Delaney AD, Madore J, Yip S, McPherson AW, Ha G, Bell L, Fereday S, Tam A, Galletta L, Tonin PN, Provencher D, et al.: ARID1A mutations in endometriosis-associated ovarian carcinomas. N. Engl. J. Med 2010, 363:1532–1543. doi:10.1056/NEJMoa1008433.
160. Greif PA, Eck SH, Konstandin NP, Benet-Pagès A, Ksienzyk B, Dufour A, Vetter AT, Popp HD, Lorenz-Depiereux B, Meitinger T, Bohlander SK, Strom TM: Identification of recurring tumor-specific somatic mutations in acute myeloid leukemia by transcriptome sequencing. Leukemia 2011, 25:821–827. doi:10.1038/leu.2011.19.
161. Sugarbaker DJ, Richards WG, Gordon GJ, Dong L, De Rienzo A, Maulik G, Glickman JN, Chirieac LR, Hartman M-L, Taillon BE, Du L, Bouffard P, Kingsmore SF, Miller NA, Farmer AD, Jensen RV, Gullans SR, Bueno R: Transcriptome sequencing of malignant pleural mesothelioma tumors. Proc. Natl. Acad. Sci. U.S.A 2008, 105:3521–3526. doi:10.1073/pnas.0712399105.
162. Palanisamy N, Ateeq B, Kalyana-Sundaram S, Pflueger D, Ramnarayanan K, Shankar S, Han B, Cao Q, Cao X, Suleman K, Kumar-Sinha C, Dhanasekaran SM, Chen Y, Esgueva R, Banerjee S, LaFargue CJ, Siddiqui J, Demichelis F, Moeller P, Bismar TA, Kuefer R, Fullen DR, Johnson TM, Greenson JK, Giordano TJ, Tan P, Tomlins SA, Varambally S, Rubin MA, Maher CA, et al.: Rearrangements of the RAF kinase pathway in prostate cancer, gastric cancer and melanoma. Nat. Med 2010, 16:793–798. doi:10.1038/nm.2166.
163. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP: Integrative genomics viewer. Nat Biotech 2011, 29:24–26. doi:10.1038/nbt.1754.
164. Zhang J, Finney R, Edmonson M, Schaefer C, Rowe W, Yan C, Clifford R, Greenblum S, Wu G, Zhang H, Liu H, Nguyen C, Hu Y, Madhavan S, Ding L, Wheeler DA, Gerhard DS, Buetow KH: The Cancer Genome Workbench: Identifying and Visualizing Complex Genetic Alterations in Tumors. NCI Nature Pathway Interaction Database 2010. doi:10.1038/pid.2010.1. Available: http://pid.nci.nih.gov/PID/2010/100309/full/pid.2010.1.shtml. Accessed 11 January 2012.
165. Sanborn JZ, Benz SC, Craft B, Szeto C, Kober KM, Meyer L, Vaske CJ, Goldman M, Smith KE, Kuhn RM, Karolchik D, Kent WJ, Stuart JM, Haussler D, Zhu J: The UCSC cancer genomics browser: update 2011. Nucleic Acids Research 2010, 39:D951–D959. doi:10.1093/nar/gkq1113.
166. Hogan LE, Meyer JA, Yang J, Wang J, Wong N, Yang W, Condos G, Hunger SP, Raetz E, Saffery R, Relling MV, Bhojwani D, Morrison DJ, Carroll WL: Integrated genomic analysis of relapsed childhood acute lymphoblastic leukemia reveals therapeutic strategies. Blood 2011, 118:5218–5226. doi:10.1182/blood-2011-04-345595.
167. Cho Y-J, Tsherniak A, Tamayo P, Santagata S, Ligon A, Greulich H, Beroukhim R, Amani V, Goumnerova L, Eberhart CG, Lau CC, Olson JM, Gilbertson RJ, Gajjar A, Delattre O, Kool M, Ligon K, Meyerson M, Mesirov JP, Pomeroy SL: Integrative genomic analysis of medulloblastoma identifies a molecular subgroup that drives poor clinical outcome. J. Clin. Oncol. 2011, 29:1424–1430. doi:10.1200/JCO.2010.28.5148.
168. Verhaak RGW, Hoadley KA, Purdom E, Wang V, Qi Y, Wilkerson MD, Miller CR, Ding L, Golub T, Mesirov JP, Alexe G, Lawrence M, O’Kelly M, Tamayo P, Weir BA, Gabriel S, Winckler W, Gupta S, Jakkula L, Feiler HS, Hodgson JG, James CD, Sarkaria JN, Brennan C, Kahn A, Spellman PT, Wilson RK, Speed TP, Gray JW, Meyerson M, et al.: Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 2010, 17:98–110. doi:10.1016/j.ccr.2009.12.020.
169. Floratos A, Smith K, Ji Z, Watkinson J, Califano A: geWorkbench: an open source platform for integrative genomics. Bioinformatics 2010, 26:1779–1780. doi:10.1093/bioinformatics/btq282.
170. Zhang J, Benavente CA, McEvoy J, Flores-Otero J, Ding L, Chen X, Ulyanov A, Wu G, Wilson M, Wang J, Brennan R, Rusch M, Manning AL, Ma J, Easton J, Shurtleff S, Mullighan C, Pounds S, Mukatira S, Gupta P, Neale G, Zhao D, Lu C, Fulton RS, Fulton LL, Hong X, Dooling DJ, Ochoa K, Naeve C, Dyson NJ, et al.: A novel retinoblastoma therapy from genomic and epigenetic analyses. Nature 2012, advance online publication. doi:10.1038/nature10733. Available: http://dx.doi.org/10.1038/nature10733. Accessed 22 January 2012.
171. Scotting PJ, Walker DA, Perilongo G: Childhood solid tumours: a developmental disorder. Nat Rev Cancer 2005, 5:481–488. doi:10.1038/nrc1633.
172. Goodman, Gurney, Smith, Olshan: Sympathetic nervous system tumors. Available: http://seer.cancer.gov/publications/childhood/. Accessed 5 March 2011.
173. Huber K: The sympathoadrenal cell lineage: Specification, diversification, and new perspectives. Developmental Biology 2006, 298:335–343. doi:10.1016/j.ydbio.2006.07.010.
174. Maris JM: Recent advances in neuroblastoma. N. Engl. J. Med 2010, 362:2202–2211. doi:10.1056/NEJMra0804577.
175. Mohlin SA, Wigerup C, Påhlman S: Neuroblastoma aggressiveness in relation to sympathetic neuronal differentiation stage. Seminars in Cancer Biology 2011, 21:276–282. doi:10.1016/j.semcancer.2011.09.002.
176. Alam G, Cui H, Shi H, Yang L, Ding J, Mao L, Maltese WA, Ding H-F: MYCN promotes the expansion of Phox2B-positive neuronal progenitors to drive neuroblastoma development. Am. J. Pathol. 2009, 175:856–866. doi:10.2353/ajpath.2009.090019.
177. Joyner BD: Neuroblastoma: eMedicine Urology. 2010. Available: http://emedicine.medscape.com/article/439263-overview. Accessed 5 March 2011.
178. Brodeur GM: Neuroblastoma: biological insights into a clinical enigma. Nat. Rev. Cancer 2003, 3:203–216. doi:10.1038/nrc1014.
179. Mueller S, Matthay KK: Neuroblastoma: biology and staging. Curr Oncol Rep 2009, 11:431–438.
180. London WB, Castleberry RP, Matthay KK, Look AT, Seeger RC, Shimada H, Thorner P, Brodeur G, Maris JM, Reynolds CP, Cohn SL: Evidence for an age cutoff greater than 365 days for neuroblastoma risk group stratification in the Children’s Oncology Group. J. Clin. Oncol 2005, 23:6459–6465. doi:10.1200/JCO.2005.05.571.
181. Brodeur GM, Pritchard J, Berthold F, Carlsen NL, Castel V, Castelberry RP, De Bernardi B, Evans AE, Favrot M, Hedborg F: Revisions of the international criteria for neuroblastoma diagnosis, staging, and response to treatment. J. Clin. Oncol 1993, 11:1466–1477.
182. Monclair T, Brodeur GM, Ambros PF, Brisse HJ, Cecchetto G, Holmes K, Kaneko M, London WB, Matthay KK, Nuchtern JG, von Schweinitz D, Simon T, Cohn SL, Pearson ADJ: The International Neuroblastoma Risk Group (INRG) Staging System: An INRG Task Force Report. Journal of Clinical Oncology 2009, 27:298–303. doi:10.1200/JCO.2008.16.6876.
183. Cohn SL, Pearson ADJ, London WB, Monclair T, Ambros PF, Brodeur GM, Faldum A, Hero B, Iehara T, Machin D, Mosseri V, Simon T, Garaventa A, Castel V, Matthay KK: The International Neuroblastoma Risk Group (INRG) Classification System: An INRG Task Force Report. Journal of Clinical Oncology 2009, 27:289–297. doi:10.1200/JCO.2008.16.6785.
184. Øra I, Eggert A: Progress in treatment and risk stratification of neuroblastoma: Impact on future clinical and basic research. Seminars in Cancer Biology 2011, 21:217–228. doi:10.1016/j.semcancer.2011.07.002.
185. Yu AL, Gilman AL, Ozkaynak MF, London WB, Kreissman SG, Chen HX, Smith M, Anderson B, Villablanca JG, Matthay KK, Shimada H, Grupp SA, Seeger R, Reynolds CP, Buxton A, Reisfeld RA, Gillies SD, Cohn SL, Maris JM, Sondel PM: Anti-GD2 antibody with GM-CSF, interleukin-2, and isotretinoin for neuroblastoma. N. Engl. J. Med 2010, 363:1324–1334. doi:10.1056/NEJMoa0911123.
186. Knudson AG, Strong LC: Mutation and cancer: neuroblastoma and pheochromocytoma. Am J Hum Genet 1972, 24:514–532.
187. Janoueix-Lerosey I, Schleiermacher G, Michels E, Mosseri V, Ribeiro A, Lequin D, Vermeulen J, Couturier J, Peuchmaur M, Valent A, Plantaz D, Rubie H, Valteau-Couanet D, Thomas C, Combaret V, Rousseau R, Eggert A, Michon J, Speleman F, Delattre O: Overall genomic pattern is a predictor of outcome in neuroblastoma. J. Clin. Oncol 2009, 27:1026–1033. doi:10.1200/JCO.2008.16.0630.
188. Mossé YP, Laudenslager M, Longo L, Cole KA, Wood A, Attiyeh EF, Laquaglia MJ, Sennett R, Lynch JE, Perri P, Laureys G, Speleman F, Kim C, Hou C, Hakonarson H, Torkamani A, Schork NJ, Brodeur GM, Tonini GP, Rappaport E, Devoto M, Maris JM: Identification of ALK as a major familial neuroblastoma predisposition gene. Nature 2008, 455:930–935. doi:10.1038/nature07261.
189. Mosse YP, Laudenslager M, Khazi D, Carlisle AJ, Winter CL, Rappaport E, Maris JM: Germline PHOX2B mutation in hereditary neuroblastoma. Am. J. Hum. Genet 2004, 75:727–730. doi:10.1086/424530.
190. Trochet D, Bourdeaut F, Janoueix-Lerosey I, Deville A, de Pontual L, Schleiermacher G, Coze C, Philip N, Frébourg T, Munnich A, Lyonnet S, Delattre O, Amiel J: Germline mutations of the paired-like homeobox 2B (PHOX2B) gene in neuroblastoma. Am. J. Hum. Genet 2004, 74:761–764. doi:10.1086/383253.
191. Pattyn A, Morin X, Cremer H, Goridis C, Brunet J-F: The homeobox gene Phox2b is essential for the development of autonomic neural crest derivatives. Nature 1999, 399:366–370. doi:10.1038/20700.
192. Chen Y, Takita J, Choi YL, Kato M, Ohira M, Sanada M, Wang L, Soda M, Kikuchi A, Igarashi T, Nakagawara A, Hayashi Y, Mano H, Ogawa S: Oncogenic mutations of ALK kinase in neuroblastoma. Nature 2008, 455:971–974. doi:10.1038/nature07399.
193. George RE, Sanda T, Hanna M, Fröhling S, Luther W, Zhang J, Ahn Y, Zhou W, London WB, McGrady P, Xue L, Zozulya S, Gregor VE, Webb TR, Gray NS, Gilliland DG, Diller L, Greulich H, Morris SW, Meyerson M, Look AT: Activating mutations in ALK provide a therapeutic target in neuroblastoma. Nature 2008, 455:975–978. doi:10.1038/nature07397.
194. Janoueix-Lerosey I, Lequin D, Brugières L, Ribeiro A, de Pontual L, Combaret V, Raynal V, Puisieux A, Schleiermacher G, Pierron G, Valteau-Couanet D, Frebourg T, Michon J, Lyonnet S, Amiel J, Delattre O: Somatic and germline activating mutations of the ALK kinase receptor in neuroblastoma. Nature 2008, 455:967–970. doi:10.1038/nature07398.
195. Passoni L, Longo L, Collini P, Coluccia AML, Bozzi F, Podda M, Gregorio A, Gambini C, Garaventa A, Pistoia V, Del Grosso F, Tonini GP, Cheng M, Gambacorti-Passerini C, Anichini A, Fossati-Bellani F, Di Nicola M, Luksch R: Mutation-independent anaplastic lymphoma kinase overexpression in poor prognosis neuroblastoma patients. Cancer Res 2009, 69:7338–7346. doi:10.1158/0008-5472.CAN-08-4419.
196. Deyell RJ, Attiyeh EF: Advances in the understanding of constitutional and somatic genomic alterations in neuroblastoma. Cancer Genetics 2011, 204:113–121. doi:10.1016/j.cancergen.2011.03.001.
197. Capasso M, Devoto M, Hou C, Asgharzadeh S, Glessner JT, Attiyeh EF, Mosse YP, Kim C, Diskin SJ, Cole KA, Bosse K, Diamond M, Laudenslager M, Winter C, Bradfield JP, Scott RH, Jagannathan J, Garris M, McConville C, London WB, Seeger RC, Grant SFA, Li H, Rahman N, Rappaport E, Hakonarson H, Maris JM: Common variations in BARD1 influence susceptibility to high-risk neuroblastoma. Nat. Genet 2009, 41:718–723. doi:10.1038/ng.374.
198. Maris JM, Mosse YP, Bradfield JP, Hou C, Monni S, Scott RH, Asgharzadeh S, Attiyeh EF, Diskin SJ, Laudenslager M, Winter C, Cole KA, Glessner JT, Kim C, Frackelton EC, Casalunovo T, Eckert AW, Capasso M, Rappaport EF, McConville C, London WB, Seeger RC, Rahman N, Devoto M, Grant SFA, Li H, Hakonarson H: Chromosome 6p22 locus associated with clinically aggressive neuroblastoma. N. Engl. J. Med 2008, 358:2585–2593. doi:10.1056/NEJMoa0708698.
199. Nguyen LB, Diskin SJ, Capasso M, Wang K, Diamond MA, Glessner J, Kim C, Attiyeh EF, Mosse YP, Cole K, Iolascon A, Devoto M, Hakonarson H, Li HK, Maris JM: Phenotype restricted genome-wide association study using a gene-centric approach identifies three low-risk neuroblastoma susceptibility Loci. PLoS Genet. 2011, 7:e1002026. doi:10.1371/journal.pgen.1002026.
200. Diskin SJ, Hou C, Glessner JT, Attiyeh EF, Laudenslager M, Bosse K, Cole K, Mossé YP, Wood A, Lynch JE, Pecor K, Diamond M, Winter C, Wang K, Kim C, Geiger EA, McGrady PW, Blakemore AIF, London WB, Shaikh TH, Bradfield J, Grant SFA, Li H, Devoto M, Rappaport ER, Hakonarson H, Maris JM: Copy number variation at 1q21.1 associated with neuroblastoma. Nature 2009, 459:987–991. doi:10.1038/nature08035.
201. Schwab M, Alitalo K, Klempnauer K-H, Varmus HE, Bishop JM, Gilbert F, Brodeur G, Goldstein M, Trent J: Amplified DNA with limited homology to myc cellular oncogene is shared by human neuroblastoma cell lines and a neuroblastoma tumour. Nature 1983, 305:245–248. doi:10.1038/305245a0.
202. Brodeur G, Seeger R, Schwab M, Varmus H, Bishop J: Amplification of N-myc in untreated human neuroblastomas correlates with advanced disease stage. Science 1984, 224:1121–1124. doi:10.1126/science.6719137.
203. Seeger RC, Brodeur GM, Sather H, Dalton A, Siegel SE, Wong KY, Hammond D: Association of multiple copies of the N-myc oncogene with rapid progression of neuroblastomas. N. Engl. J. Med 1985, 313:1111–1116. doi:10.1056/NEJM198510313131802.
204. Subramaniam MM, Piqueras M, Navarro S, Noguera R: Aberrant copy numbers of ALK gene is a frequent genetic alteration in neuroblastomas. Hum. Pathol 2009, 40:1638–1642. doi:10.1016/j.humpath.2009.05.002.
205. Attiyeh EF, London WB, Mossé YP, Wang Q, Winter C, Khazi D, McGrady PW, Seeger RC, Look AT, Shimada H, Brodeur GM, Cohn SL, Matthay KK, Maris JM: Chromosome 1p and 11q deletions and outcome in neuroblastoma. N. Engl. J. Med 2005, 353:2243–2253. doi:10.1056/NEJMoa052399.
206. Guo C, White PS, Weiss MJ, Hogarty MD, Thompson PM, Stram DO, Gerbing R, Matthay KK, Seeger RC, Brodeur GM, Maris JM: Allelic deletion at 11q23 is common in MYCN single copy neuroblastomas. Oncogene 1999, 18:4948–4957. doi:10.1038/sj.onc.1202887.
207. Abel F, Ejeskär K, Kogner P, Martinsson T: Gain of chromosome arm 17q is associated with unfavourable prognosis in neuroblastoma, but does not involve mutations in the somatostatin receptor 2 (SSTR2) gene at 17q24. Br. J. Cancer 1999, 81:1402–1409. doi:10.1038/sj.bjc.6692231.
208. Stallings RL, Carty P, McArdle L, Mullarkey M, McDermott M, Breatnach F, O’Meara A: Molecular cytogenetic analysis of recurrent unbalanced t(11;17) in neuroblastoma. Cancer Genet. Cytogenet 2004, 154:44–51. doi:10.1016/j.cancergencyto.2004.04.003.
209. Stark B, Jeison M, Glaser-Gabay L, Bar-Am I, Mardoukh J, Ash S, Atias D, Stein J, Zaizov R, Yaniv I: der(11)t(11;17): a distinct cytogenetic pathway of advanced stage neuroblastoma (NBL) - detected by spectral karyotyping (SKY). Cancer Lett 2003, 197:75–79.
210. Nakagawara A, Arima-Nakagawara M, Scavarda NJ, Azar CG, Cantor AB, Brodeur GM: Association between high levels of expression of the TRK gene and favorable outcome in human neuroblastoma. N. Engl. J. Med 1993, 328:847–854. doi:10.1056/NEJM199303253281205.
211. Rydén M, Sehgal R, Dominici C, Schilling FH, Ibáñez CF, Kogner P: Expression of mRNA for the neurotrophin receptor trkC in neuroblastomas with favourable tumour stage and good prognosis. Br J Cancer 1996, 74:773–779.
212. Nakagawara A, Azar CG, Scavarda NJ, Brodeur GM: Expression and function of TRK-B and BDNF in human neuroblastomas. Mol. Cell. Biol 1994, 14:759–767.
213. Wei JS, Greer BT, Westermann F, Steinberg SM, Son C-G, Chen Q-R, Whiteford CC, Bilke S, Krasnoselsky AL, Cenacchi N, Catchpoole D, Berthold F, Schwab M, Khan J: Prediction of clinical outcome using gene expression profiling and artificial neural networks for patients with neuroblastoma. Cancer Res 2004, 64:6883–6891. doi:10.1158/0008-5472.CAN-04-0695.
214. Asgharzadeh S, Pique-Regi R, Sposto R, Wang H, Yang Y, Shimada H, Matthay K, Buckley J, Ortega A, Seeger RC: Prognostic significance of gene expression profiles of metastatic neuroblastomas lacking MYCN gene amplification. J. Natl. Cancer Inst. 2006, 98:1193–1203. doi:10.1093/jnci/djj330.
215. Ohira M, Oba S, Nakamura Y, Isogai E, Kaneko S, Nakagawa A, Hirata T, Kubo H, Goto T, Yamada S, Yoshida Y, Fuchioka M, Ishii S, Nakagawara A: Expression profiling using a tumor-specific cDNA microarray predicts the prognosis of intermediate risk neuroblastomas. Cancer Cell 2005, 7:337–350. doi:10.1016/j.ccr.2005.03.019.
216. Oberthuer A, Berthold F, Warnat P, Hero B, Kahlert Y, Spitz R, Ernestus K, König R, Haas S, Eils R, Schwab M, Brors B, Westermann F, Fischer M: Customized oligonucleotide microarray gene expression-based classification of neuroblastoma patients outperforms current clinical risk stratification. J. Clin. Oncol. 2006, 24:5070–5078. doi:10.1200/JCO.2006.06.1879.
217. Oberthuer A, Hero B, Berthold F, Juraeva D, Faldum A, Kahlert Y, Asgharzadeh S, Seeger R, Scaruffi P, Tonini GP, Janoueix-Lerosey I, Delattre O, Schleiermacher G, Vandesompele J, Vermeulen J, Speleman F, Noguera R, Piqueras M, Bénard J, Valent A, Avigad S, Yaniv I, Weber A, Christiansen H, Grundy RG, Schardt K, Schwab M, Eils R, Warnat P, Kaderali L, et al.: Prognostic impact of gene expression-based classification for neuroblastoma. J. Clin. Oncol. 2010, 28:3506–3515. doi:10.1200/JCO.2009.27.3367.
218. Vermeulen J, De Preter K, Naranjo A, Vercruysse L, Roy NV, Hellemans J, Swerts K, Bravo S, Scaruffi P, Tonini GP, Noguera R, Piqueras M, Janoueix-Lerosey I, Delattre O, Combaret V, Fischer M, Oberthuer A, Ambros PF, Beiske K, Bénard J, Marques B, Michon J, Schleiermacher G, Bernardi BD, Rubie H, Cañete A, Castel V, Kohler J, Pötschger U, Ladenstein R, et al.: Outcome Prediction of Children with Neuroblastoma using a Multigene Expression Signature, a Retrospective SIOPEN/COG/GPOH Study. Lancet Oncol 2009, 10:663–671. doi:10.1016/S1470-2045(09)70154-8.
219. Politi K, Pao W: How genetically engineered mouse tumor models provide insights into human cancers. J. Clin. Oncol. 2011, 29:2273–2281. doi:10.1200/JCO.2010.30.8304.
220. Chesler L, Weiss WA: Genetically engineered murine models – Contribution to our understanding of the genetics, molecular pathology and therapeutic targeting of neuroblastoma. Seminars in Cancer Biology 2011, 21:245–255. doi:10.1016/j.semcancer.2011.09.011.
221. Weiss WA, Aldape K, Mohapatra G, Feuerstein BG, Bishop JM: Targeted expression of MYCN causes neuroblastoma in transgenic mice. EMBO J 1997, 16:2985–2995. doi:10.1093/emboj/16.11.2985.
222. Rounbehler RJ, Li W, Hall MA, Yang C, Fallahi M, Cleveland JL: Targeting Ornithine Decarboxylase Impairs Development of MYCN-Amplified Neuroblastoma. Cancer Res 2009, 69:547–553. doi:10.1158/0008-5472.CAN-08-2968.
223. Teitz T, Stanke JJ, Federico S, Bradley CL, Brennan R, Zhang J, Johnson MD, Sedlacik J, Inoue M, Zhang ZM, Frase S, Rehg JE, Hillenbrand CM, Finkelstein D, Calabrese C, Dyer MA, Lahti JM: Preclinical Models for Neuroblastoma: Establishing a Baseline for Treatment. PLoS ONE 2011, 6:e19133. doi:10.1371/journal.pone.0019133.
224. Glenn TC: Field guide to next-generation DNA sequencers. Mol Ecol Resour 2011. doi:10.1111/j.1755-0998.2011.03024.x. Available: http://www.ncbi.nlm.nih.gov/pubmed/21592312. Accessed 19 July 2011.
225. Huang X, Saint-Jeannet J-P: Induction of the neural crest and the opportunities of life on the edge. Developmental Biology 2004, 275:1–11. doi:10.1016/j.ydbio.2004.07.033.
226. Anderson DJ: The neural crest cell lineage problem: Neuropoiesis? Neuron 1989, 3:1–12. doi:10.1016/0896-6273(89)90110-4.
227. Anderson DJ, Carnahan JF, Michelsohn A, Patterson PH: Antibody markers identify a common progenitor to sympathetic neurons and chromaffin cells in vivo and reveal the timing of commitment to neuronal differentiation in the sympathoadrenal lineage. J. Neurosci 1991, 11:3507–3519.
228. Nakagawara A, Ohira M: Comprehensive genomics linking between neural development and cancer: neuroblastoma as a model. Cancer Letters 2004, 204:213–224. doi:10.1016/S0304-3835(03)00457-9.
229. Jiang M, Stanke J, Lahti JM: The connections between neural crest development and neuroblastoma. Curr. Top. Dev. Biol 2011, 94:77–127. doi:10.1016/B978-0-12-380916-2.00004-8.
230. Prockop DJ: Marrow Stromal Cells as Stem Cells for Nonhematopoietic Tissues. Science 1997, 276:71–74. doi:10.1126/science.276.5309.71.
231. Gage FH: Mammalian Neural Stem Cells. Science 2000, 287:1433–1438. doi:10.1126/science.287.5457.1433.
232. Reynolds B, Weiss S: Generation of neurons and astrocytes from isolated cells of the adult mammalian central nervous system. Science 1992, 255:1707–1710. doi:10.1126/science.1553558.
233. Toma JG, Akhavan M, Fernandes KJL, Barnabe-Heider F, Sadikot A, Kaplan DR, Miller FD: Isolation of multipotent adult stem cells from the dermis of mammalian skin. Nat Cell Biol 2001, 3:778–784. doi:10.1038/ncb0901-778.
234. Toma JG, McKenzie IA, Bagli D, Miller FD: Isolation and characterization of multipotent skin-derived precursors from human skin. Stem Cells 2005, 23:727–737. doi:10.1634/stemcells.2004-0134.
235. Fernandes KJL, McKenzie IA, Mill P, Smith KM, Akhavan M, Barnabe-Heider F, Biernaskie J, Junek A, Kobayashi NR, Toma JG, Kaplan DR, Labosky PA, Rafuse V, Hui CC, Miller FD: A dermal niche for multipotent adult skin-derived precursor cells. Nat Cell Biol 2004, 6:1082–1093. doi:10.1038/ncb1181.
236. Biernaskie J, Paris M, Morozova O, Fagan BM, Marra M, Pevny L, Miller FD: SKPs derive from hair follicle precursors and exhibit properties of adult dermal stem cells. Cell Stem Cell 2009, 5:610–623. doi:10.1016/j.stem.2009.10.019.
237. Christ B, Ordahl CP: Early stages of chick somite development. Anat. Embryol. 1995, 191:381–396.
238. Couly G, Grapin-Botton A, Coltey P, Ruhin B, Le Douarin NM: Determination of the identity of the derivatives of the cephalic neural crest: incompatibility between Hox gene expression and lower jaw development. Development 1998, 125:3445–3459.
239. Mauger A: [The role of somitic mesoderm in the development of dorsal plumage in chick embryos. II. Regionalization of the plumage-forming mesoderm]. J Embryol Exp Morphol 1972, 28:343–366.
240. Lanza RP: Handbook of stem cells. Academic Press; 2004.
241. Okita K, Ichisaka T, Yamanaka S: Generation of germline-competent induced pluripotent stem cells. Nature 2007, 448:313–317. doi:10.1038/nature05934.
242. Wernig M, Meissner A, Foreman R, Brambrink T, Ku M, Hochedlinger K, Bernstein BE, Jaenisch R: In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 2007, 448:318–324. doi:10.1038/nature05944.
243. Smith KM, Datti A, Fujitani M, Grinshtein N, Zhang L, Morozova O, Blakely KM, Rotenberg SA, Hansford LM, Miller FD, Yeger H, Irwin MS, Moffat J, Marra MA, Baruchel S, Wrana JL, Kaplan DR: Selective targeting of neuroblastoma tumour-initiating cells by compounds identified in stem cell-based small molecule screens. EMBO Mol Med 2010, 2:371–384. doi:10.1002/emmm.201000093.
244. Morozova O, Vojvodic M, Grinshtein N, Hansford LM, Blakely KM, Maslova A, Hirst M, Cezard T, Morin RD, Moore R, Smith KM, Miller F, Taylor P, Thiessen N, Varhol R, Zhao Y, Jones S, Moffat J, Kislinger T, Moran MF, Kaplan DR, Marra MA: System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin. Cancer Res 2010, 16:4572–4582. doi:10.1158/1078-0432.CCR-10-0627.
245. Jessen KR, Mirsky R: The origin and development of glial cells in peripheral nerves. Nat. Rev. Neurosci 2005, 6:671–682. doi:10.1038/nrn1746.
246. Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S (Eds.): Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer-Verlag; 2005. Available: http://www.springerlink.com/content/978-0-387-25146-2#section=519945&page=1. Accessed 6 June 2011.
247. Jeanmougin M, de Reynies A, Marisa L, Paccard C, Nuel G, Guedj M: Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies. PLoS ONE 2010, 5:e12336. doi:10.1371/journal.pone.0012336.
248. Jeffery IB, Higgins DG, Culhane AC: Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinformatics 2006, 7:359. doi:10.1186/1471-2105-7-359.
249. Smyth GK: Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004, 3:Article3. doi:10.2202/1544-6115.1027.
250. Sauka-Spengler T, Meulemans D, Jones M, Bronner-Fraser M: Ancient evolutionary origin of the neural crest gene regulatory network. Dev. Cell 2007, 13:405–420. doi:10.1016/j.devcel.2007.08.005.
251. Stemple DL, Anderson DJ: Isolation of a stem cell for neurons and glia from the mammalian neural crest. Cell 1992, 71:973–985.
252. Liu JP, Jessell TM: A role for rhoB in the delamination of neural crest cells from the dorsal neural tube. Development 1998, 125:5055–5067.
253. Kurauchi T, Izutsu Y, Maéno M: Involvement of Neptune in induction of the hatching gland and neural crest in the Xenopus embryo. Differentiation, 79:251–259. doi:10.1016/j.diff.2010.01.003.
254. Wong Y-M, Chow KL: Expression of zebrafish mab21 genes marks the differentiating eye, midbrain and neural tube. Mech. Dev 2002, 113:149–152.
255. Schraufstatter IU, Discipio RG, Khaldoyanidi S: Mesenchymal stem cells and their microenvironment. Front. Biosci. 2011, 17:2271–2288.
256. Vodyanik MA, Yu J, Zhang X, Tian S, Stewart R, Thomson JA, Slukvin II: A Mesoderm-Derived Precursor for Mesenchymal Stem and Endothelial Cells. Cell Stem Cell 2010, 7:718–729. doi:10.1016/j.stem.2010.11.011.
257. Kléber M, Lee H-Y, Wurdak H, Buchstaller J, Riccomagno MM, Ittner LM, Suter U, Epstein DJ, Sommer L: Neural crest stem cell maintenance by combinatorial Wnt and BMP signaling. J. Cell Biol. 2005, 169:309–320. doi:10.1083/jcb.200411095.
258. Le Douarin NM, Kalcheim C: The neural crest. Cambridge University Press; 1999.
259. Boon K, Osório EC, Greenhut SF, Schaefer CF, Shoemaker J, Polyak K, Morin PJ, Buetow KH, Strausberg RL, de Souza SJ, Riggins GJ: An anatomy of normal and malignant gene expression. Proceedings of the National Academy of Sciences 2002, 99:11287–11292. doi:10.1073/pnas.152324199.
260. Morozova O, Morozov V, Hoffman BG, Helgason CD, Marra MA: A seriation approach for visualization-driven discovery of co-expression patterns in Serial Analysis of Gene Expression (SAGE) data. PLoS ONE 2008, 3:e3205. doi:10.1371/journal.pone.0003205.
261. Robinson WS: A method for chronologically ordering archaeological deposits. American Antiquity 1951, 16:293–301.
262. Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG, Gifford DK, Melton DA, Jaenisch R, Young RA: Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 2005, 122:947–956. doi:10.1016/j.cell.2005.08.020.
263. Roider HG, Manke T, O’Keeffe S, Vingron M, Haas SA: PASTAA: identifying transcription factors associated with sets of co-regulated genes. Bioinformatics 2009, 25:435–442. doi:10.1093/bioinformatics/btn627.
264. Radomska HS, Satterthwaite AB, Taranenko N, Narravula S, Krause DS, Tenen DG: A nuclear factor Y (NFY) site positively regulates the human CD34 stem cell gene. Blood 1999, 94:3772–3780.
265. Winger Q, Huang J, Auman HJ, Lewandoski M, Williams T: Analysis of transcription factor AP-2 expression and function during mouse preimplantation development. Biol. Reprod. 2006, 75:324–333. doi:10.1095/biolreprod.106.052407.
266. Schmidt M, Huber L, Majdazari A, Schütz G, Williams T, Rohrer H: The transcription factors AP-2β and AP-2α are required for survival of sympathetic progenitors and differentiated sympathetic neurons. Dev. Biol. 2011, 355:89–100. doi:10.1016/j.ydbio.2011.04.011.
267. Cesari F, Brecht S, Vintersten K, Vuong LG, Hofmann M, Klingel K, Schnorr J-J, Arsenian S, Schild H, Herdegen T, Wiebel FF, Nordheim A: Mice deficient for the ets transcription factor elk-1 show normal immune responses and mildly impaired neuronal gene activation. Mol. Cell. Biol. 2004, 24:294–305.
268. Dworkin S, Mantamadiotis T: Targeting CREB signalling in neurogenesis. Expert Opin. Ther. Targets 2010, 14:869–879. doi:10.1517/14728222.2010.501332.
269. Ryser S, Dizin E, Jefford CE, Delaval B, Gagos S, Christodoulidou A, Krause K-H, Birnbaum D, Irminger-Finger I: Distinct Roles of BARD1 Isoforms in Mitosis: Full-Length BARD1 Mediates Aurora B Degradation, Cancer-Associated BARD1β Scaffolds Aurora B and BRCA2. Cancer Research 2009, 69:1125–1134. doi:10.1158/0008-5472.CAN-08-2134.
270. Modlin IM, Champaneria MC, Bornschein J, Kidd M: Evolution of the diffuse neuroendocrine system--clear cells and cloudy origins. Neuroendocrinology 2006, 84:69–82. doi:10.1159/000096997.
271. Kuijk EW, Chuva de Sousa Lopes SM, Geijsen N, Macklon N, Roelen BAJ: The different shades of mammalian pluripotent stem cells. Hum. Reprod. Update 2011, 17:254–271. doi:10.1093/humupd/dmq035.
272. Wurdak H, Ittner LM, Lang KS, Leveen P, Suter U, Fischer JA, Karlsson S, Born W, Sommer L: Inactivation of TGFbeta signaling in neural crest stem cells leads to multiple defects reminiscent of DiGeorge syndrome. Genes Dev 2005, 19:530–535. doi:10.1101/gad.317405.
273. Chen M-F, Lin C-T, Chen W-C, Yang C-T, Chen C-C, Liao S-K, Liu JM, Lu C-H, Lee K-D: The sensitivity of human mesenchymal stem cells to ionizing radiation. Int. J. Radiat. Oncol. Biol. Phys. 2006, 66:244–253. doi:10.1016/j.ijrobp.2006.03.062.
274. Khattra J, Delaney AD, Zhao Y, Siddiqui A, Asano J, McDonald H, Pandoh P, Dhalla N, Prabhu A, Ma K, Lee S, Ally A, Tam A, Sa D, Rogers S, Charest D, Stott J, Zuyderduyn S, Varhol R, Eaves C, Jones S, Holt R, Hirst M, Hoodless PA, Marra MA: Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines. Genome Res 2007, 17:108–116. doi:10.1101/gr.5488207.
275. Caraux G, Pinloche S: PermutMatrix: a graphical environment to arrange gene expression profiles in optimal linear order. Bioinformatics 2005, 21:1280–1281. doi:10.1093/bioinformatics/bti141.
276. Sokal RR, Michener CD: A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin 1958, 28:1409–1438.
277. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 1998, 95:14863–14868.
278. Rodriguez-Esteban C, Tsukui T, Yonei S, Magallon J, Tamura K, Izpisua Belmonte JC: The T-box genes Tbx4 and Tbx5 regulate limb outgrowth and identity. Nature 1999, 398:814–818. doi:10.1038/19769.
279. Hansford LM, McKee AE, Zhang L, George RE, Gerstle JT, Thorner PS, Smith KM, Look AT, Yeger H, Miller FD, Irwin MS, Thiele CJ, Kaplan DR: Neuroblastoma cells isolated from bone marrow metastases contain a naturally enriched tumor-initiating cell. Cancer Res 2007, 67:11234–11243. doi:10.1158/0008-5472.CAN-07-0718.
280. Pahlman S, Johnsson S, Pietras A: Patient-derived EBV-immortalized B-lymphocytes are a dominant contaminant of in vitro cultured human neuroblastoma tumor-initiating cells isolated from bone marrow. 2011. Available: http://www.abstractsonline.com/Plan/ViewAbstract.aspx?sKey=d2eb516b-a2a1-4fac-9e8d-3b032d6bc731&cKey=29fa6ed6-fc53-4101-8a24-608b68ef3e2f&mKey=%7B507D311A-B6EC-436A-BD67-6D14ED39622C%7D. Accessed 20 May 2011.
281. Chen Y, Li D, Li S: The Alox5 gene is a novel therapeutic target in cancer stem cells of chronic myeloid leukemia. Cell Cycle 2009, 8:3488–3492. doi:10.4161/cc.8.21.9852.
282. Bitton D, Okoniewski M, Connolly Y, Miller C: Exon level integration of proteomics and microarray data. BMC Bioinformatics 2008, 9:118. doi:10.1186/1471-2105-9-118.
283. Okoniewski MJ, Miller CJ: Comprehensive Analysis of Affymetrix Exon Arrays Using BioConductor. PLoS Comput Biol 2008, 4:e6. doi:10.1371/journal.pcbi.0040006.
284. Taylor P, Nielsen PA, Trelle MB, Hørning OB, Andersen MB, Vorm O, Moran MF, Kislinger T: Automated 2D Peptide Separation on a 1D Nano-LC-MS System. J. Proteome Res. 2009, 8:1610–1616. doi:10.1021/pr800986c.
285. Chen EI, Hewel J, Felding-Habermann B, Yates JR: Large Scale Protein Profiling by Combination of Protein Fractionation and Multidimensional Protein Identification Technology (MudPIT). Molecular & Cellular Proteomics 2006, 5:53–56. doi:10.1074/mcp.T500013-MCP200.
286. Skibbens RV: Cell biology of cancer: BRCA1 and sister chromatid pairing reactions? Cell Cycle 2008, 7:449–452. doi:10.4161/cc.7.4.5435.
287. Billingsley ML: Druggable targets and targeted drugs: enhancing the development of new therapeutics. Pharmacology 2008, 82:239–244. doi:10.1159/000157624.
288. Tobinick EL: The value of drug repositioning in the current pharmaceutical market. Drug News Perspect 2009, 22:53. doi:10.1358/dnp.2009.22.1.1303818.
289. Goldsmith KC, Hogarty MD: Targeting programmed cell death pathways with experimental therapeutics: opportunities in high-risk neuroblastoma. Cancer Letters 2005, 228:133–141. doi:10.1016/j.canlet.2005.01.048.
290. Daniel RA, Rozanska AL, Thomas HD, Mulligan EA, Drew Y, Castelbuono DJ, Hostomsky Z, Plummer ER, Boddy AV, Tweddle DA, Curtin NJ, Clifford SC: Inhibition of poly(ADP-ribose) polymerase-1 enhances temozolomide and topotecan activity against childhood neuroblastoma. Clin. Cancer Res 2009, 15:1241–1249. doi:10.1158/1078-0432.CCR-08-1095.
291. Witt O, Deubzer HE, Lodrini M, Milde T, Oehme I: Targeting histone deacetylases in neuroblastoma. Curr. Pharm. Des 2009, 15:436–447.
292. Gautschi O, Heighway J, Mack PC, Purnell PR, Lara PN, Gandara DR: Aurora kinases as anticancer drug targets. Clin. Cancer Res 2008, 14:1639–1648. doi:10.1158/1078-0432.CCR-07-2179.
293. Alley MC, Scudiero DA, Monks A, Hursey ML, Czerwinski MJ, Fine DL, Abbott BJ, Mayo JG, Shoemaker RH, Boyd MR: Feasibility of Drug Screening with Panels of Human Tumor Cell Lines Using a Microculture Tetrazolium Assay. Cancer Research 1988, 48:589–601.
294. Cloonan N, Forrest ARR, Kolle G, Gardiner BBA, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G, Robertson AJ, Perkins AC, Bruce SJ, Lee CC, Ranade SS, Peckham HE, Manning JM, McKernan KJ, Grimmond SM: Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods 2008, 5:613–619. doi:10.1038/nmeth.1223.
295. Shah SH, Pallas JA: Identifying differential exon splicing using linear models and correlation coefficients. BMC Bioinformatics 2009, 10:26. doi:10.1186/1471-2105-10-26.
296. The UniProt Consortium: Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Research 2010, 39:D214–D219. doi:10.1093/nar/gkq1020.
297. Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, Mungall K, Lee S, Okada HM, Qian JQ, Griffith M, Raymond A, Thiessen N, Cezard T, Butterfield YS, Newsome R, Chan SK, She R, Varhol R, Kamoh B, Prabhu A-L, Tam A, Zhao Y, Moore RA, Hirst M, Marra MA, Jones SJM, Hoodless PA, Birol I: De novo assembly and analysis of RNA-seq data. Nat. Methods 2010, 7:909–912. doi:10.1038/nmeth.1517.
298. Hubbard TJP, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L, Coates G, Fairley S, Fitzgerald S, Fernandez-Banet J, Gordon L, Graf S, Haider S, Hammond M, Holland R, Howe K, Jenkinson A, Johnson N, Kahari A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Kulesha E, Lawson D, Longden I, et al.: Ensembl 2009. Nucleic Acids Research 2009, 37:D690–D697. doi:10.1093/nar/gkn828.
299. Irminger-Finger I, Jefford CE: Is there more to BARD1 than BRCA1? Nat. Rev. Cancer 2006, 6:382–391. doi:10.1038/nrc1878.
300. Shakya R, Szabolcs M, McCarthy E, Ospina E, Basso K, Nandula S, Murty V, Baer R, Ludwig T: The basal-like mammary carcinomas induced by Brca1 or Bard1 inactivation implicate the BRCA1/BARD1 heterodimer in tumor suppression. Proc. Natl. Acad. Sci. U.S.A. 2008, 105:7040–7045. doi:10.1073/pnas.0711032105.
301. Li L, Ryser S, Dizin E, Pils D, Krainer M, Jefford CE, Bertoni F, Zeillinger R, Irminger-Finger I: Oncogenic BARD1 isoforms expressed in gynecological cancers. Cancer Res. 2007, 67:11876–11885. doi:10.1158/0008-5472.CAN-07-2370.
302. Sporn JC, Hothorn T, Jung B: BARD1 expression predicts outcome in colon cancer. Clin. Cancer Res. 2011, 17:5451–5462. doi:10.1158/1078-0432.CCR-11-0263.
303. Zhang Y-Q, Bianco A, Malkinson AM, Leoni VP, Frau G, De Rosa N, André P-A, Versace R, Boulvain M, Laurent GJ, Atzori L, Irminger-Finger I: BARD1: An independent predictor of survival in non-small cell lung cancer. International Journal of Cancer 2011. doi:10.1002/ijc.26346. Available: http://www.ncbi.nlm.nih.gov/pubmed/21815143. Accessed 24 January 2012.
304. Shang X, Burlingame SM, Okcu MF, Ge N, Russell HV, Egler RA, David RD, Vasudevan SA, Yang J, Nuchtern JG: Aurora A is a negative prognostic factor and a new therapeutic target in human neuroblastoma. Mol. Cancer Ther 2009, 8:2461–2469. doi:10.1158/1535-7163.MCT-08-0857.
305. Lens SMA, Voest EE, Medema RH: Shared and separate functions of polo-like kinases and aurora kinases in cancer.
Nat Rev Cancer 2010, 10:825–84110.1038/nrc2964. 306. Westerhout E, Kool M, Molenaar J, Stroeken, den Boer M, Segers S, Clifford S, Delattre O, Benetkiewicz M, Lanvers C, Pieters R, Pietsch T, Holst M, Renshaw J, Shipley J, Serra M, Scotlandi K, Geoerger B, Vassal G, Degrand O, Verschuur A, Versteeg R, Caron H: OR1 The KidsCancerKinome: Validation of Aurora kinases as potential drug targets in neuroblastoma and other pediatric tumors. 2010, Available: http://www.anr2010.com/anr2010_data/documents/ANR%202010%20for%20web.pdf.Acces sed 11 July 2011. 307. Grinshtein N, Datti A, Fujitani M, Uehling D, Prakesch M, Isaac M, Irwin MS, Wrana JL, Al-Awar R, Kaplan DR: Small molecule kinase inhibitor screen identifies polo-like kinase 1 as a target for neuroblastoma tumor-initiating cells. Cancer Res 2011, 71:1385– 139510.1158/0008-5472.CAN-10-2484. 308. Ackermann S, Goeser F, Schulte JH, Schramm A, Ehemann V, Hero B, Eggert A, Berthold F, Fischer M: Polo-like kinase 1 is a therapeutic target in high-risk neuroblastoma. Clin. Cancer Res 2011, 17:731–74110.1158/1078-0432.CCR-10-1129. 309. Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 2008, 18:1851–185810.1101/gr.078212.108. 310. Keller A, Nesvizhskii AI, Kolker E, Aebersold R: Empirical Statistical Model To Estimate the Accuracy of Peptide Identifications Made by MS/MS and Database Search. Analytical Chemistry 2002, 74:5383–539210.1021/ac025747h. 311. Nesvizhskii AI, Keller A, Kolker E, Aebersold R: A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem 2003, 75:4646–4658. 312. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25:2078–207910.1093/bioinformatics/btp352. 313. Suzuki R, Shimodaira H: Pvclust: an R package for assessing the uncertainty in hierarchical clustering. 
Bioinformatics 2006, 22:1540 –154210.1093/bioinformatics/btl117. 314. Kislinger T, Gramolini AO, MacLennan DH, Emili A: Multidimensional protein identification technology (MudPIT): technical overview of a profiling method optimized for the comprehensive proteomic investigation of normal and diseased heart tissue. J. Am. Soc. Mass Spectrom. 2005, 16:1207–122010.1016/j.jasms.2005.02.015. 315. Gnirke A, Melnikov A, Maguire J, Rogov P, LeProust EM, Brockman W, Fennell T, Giannoukos G, Fisher S, Russ C, Gabriel S, Jaffe DB, Lander ES, Nusbaum C: Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat. Biotechnol. 2009, 27:182–18910.1038/nbt.1523. 215 316. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, et al.: Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008, 456:53–5910.1038/nature07517. 317. Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner M-M, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, et al.: The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 2009, 19:1316–132310.1101/gr.080531.108. 318. Pruitt KD, Tatusova T, Klimke W, Maglott DR: NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res. 2009, 37:D32–3610.1093/nar/gkn721. 319. Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. 
Bioinformatics 2009, 25:1754–176010.1093/bioinformatics/btp324. 320. Stransky N, Egloff AM, Tward AD, Kostic AD, Cibulskis K, Sivachenko A, Kryukov GV, Lawrence MS, Sougnez C, McKenna A, Shefler E, Ramos AH, Stojanov P, Carter SL, Voet D, Cortés ML, Auclair D, Berger MF, Saksena G, Guiducci C, Onofrio RC, Parkin M, Romkes M, Weissfeld JL, Seethala RR, Wang L, Rangel-Escareño C, Fernandez-Lopez JC, Hidalgo-Miranda A, Melendez-Zajgla J, et al.: The mutational landscape of head and neck squamous cell carcinoma. Science 2011, 333:1157–116010.1126/science.1208130. 321. Drmanac R, Sparks AB, Callow MJ, Halpern AL, Burns NL, Kermani BG, Carnevali P, Nazarenko I, Nilsen GB, Yeung G, Dahl F, Fernandez A, Staker B, Pant KP, Baccash J, Borcherding AP, Brownley A, Cedeno R, Chen L, Chernikoff D, Cheung A, Chirita R, Curson B, Ebert JC, Hacker CR, Hartlage R, Hauser B, Huang S, Jiang Y, Karpinchyk V, et al.: Human Genome Sequencing Using Unchained Base Reads on Self-Assembling DNA Nanoarrays. Science 2010, 327:78 –8110.1126/science.1181498. 322. Loeb LA: Human cancers express mutator phenotypes: origin, consequences and targeting. Nat Rev Cancer 2011, 11:450–45710.1038/nrc3063. 323. Lovejoy CA, Lock K, Yenamandra A, Cortez D: DDB1 maintains genome integrity through regulation of Cdt1. Mol. Cell. Biol. 2006, 26:7977–799010.1128/MCB.00819-06. 324. Holmberg C, Fleck O, Hansen HA, Liu C, Slaaby R, Carr AM, Nielsen O: Ddb1 controls genome stability and meiosis in fission yeast. Genes & Development 2005, 19:853 –86210.1101/gad.329905. 325. Shimanouchi K, Takata K, Yamaguchi M, Murakami S, Ishikawa G, Takeuchi R, Kanai Y, Ruike T, Nakamura R, Abe Y, Sakaguchi K: Drosophila Damaged DNA Binding Protein 1 Contributes to Genome Stability in Somatic Cells. J Biochem 2006, 139:51– 5810.1093/jb/mvj006. 216 326. 
Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, Flanagan A, Teague J, Futreal PA, Stratton MR, Wooster R: The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br. J. Cancer 2004, 91:355–35810.1038/sj.bjc.6601894. 327. Bradić M, Costa J, Chelo IM: Genotyping with Sequenom. Methods Mol. Biol. 2011, 772:193–21010.1007/978-1-61779-228-1_11. 328. Thomas RK, Baker AC, Debiasi RM, Winckler W, Laframboise T, Lin WM, Wang M, Feng W, Zander T, MacConaill L, Macconnaill LE, Lee JC, Nicoletti R, Hatton C, Goyette M, Girard L, Majmudar K, Ziaugra L, Wong K-K, Gabriel S, Beroukhim R, Peyton M, Barretina J, Dutt A, Emery C, Greulich H, Shah K, Sasaki H, Gazdar A, Minna J, et al.: High-throughput oncogene mutation profiling in human cancer. Nat. Genet. 2007, 39:347–35110.1038/ng1975. 329. Getz G, Höfling H, Mesirov JP, Golub TR, Meyerson M, Tibshirani R, Lander ES: Comment on ―The consensus coding sequences of human breast and colorectal cancers.‖Science 2007, 317:150010.1126/science.1138764. 330. Bentires-Alj M, Paez JG, David FS, Keilhack H, Halmos B, Naoki K, Maris JM, Richardson A, Bardelli A, Sugarbaker DJ, Richards WG, Du J, Girard L, Minna JD, Loh ML, Fisher DE, Velculescu VE, Vogelstein B, Meyerson M, Sellers WR, Neel BG: Activating mutations of the noonan syndrome-associated SHP2/PTPN11 gene in human solid tumors and adult acute myelogenous leukemia. Cancer Res. 2004, 64:8816– 882010.1158/0008-5472.CAN-04-1923. 331. Tartaglia M, Niemeyer CM, Fragale A, Song X, Buechner J, Jung A, Hahlen K, Hasle H, Licht JD, Gelb BD: Somatic mutations in PTPN11 in juvenile myelomonocytic leukemia, myelodysplastic syndromes and acute myeloid leukemia. Nat Genet 2003, 34:148–15010.1038/ng1156. 332. Tartaglia M, Mehler EL, Goldberg R, Zampino G, Brunner HG, Kremer H, van der Burgt I, Crosby AH, Ion A, Jeffery S, Kalidas K, Patton MA, Kucherlapati RS, Gelb BD: Mutations in PTPN11, encoding the protein tyrosine phosphatase SHP-2, cause Noonan syndrome. 
Nat Genet 2001, 29:465–46810.1038/ng772. 333. Ketroussi F, Giuliani M, Bahri R, Azzarone B, Charpentier B, Durrbach A: Lymphocyte Cell-Cycle Inhibition by HLA-G Is Mediated by Phosphatase SHP-2 and Acts on the mTOR Pathway. PLoS ONE 2011, 6:e2277610.1371/journal.pone.0022776. 334. Hoover AC, Strand GL, Nowicki PN, Anderson ME, Vermeer PD, Klingelhutz AJ, Bossler AD, Pottala JV, Hendriks WJAJ, Lee JH: Impaired PTPN13 phosphatase activity in spontaneous or HPV-induced squamous cell carcinomas potentiates oncogene signaling via the MAP kinase pathway. Oncogene 2009, 28:3960– 397010.1038/onc.2009.251. 335. Huang DW, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols 2008, 4:44– 5710.1038/nprot.2008.211. 217 336. Gui Y, Guo G, Huang Y, Hu X, Tang A, Gao S, Wu R, Chen C, Li X, Zhou L, He M, Li Z, Sun X, Jia W, Chen J, Yang S, Zhou F, Zhao X, Wan S, Ye R, Liang C, Liu Z, Huang P, Liu C, Jiang H, Wang Y, Zheng H, Sun L, Liu X, Jiang Z, et al.: Frequent mutations of chromatin remodeling genes in transitional cell carcinoma of the bladder. Nat. Genet. 2011, 43:875–87810.1038/ng.907. 337. Wilson BG, Roberts CWM: SWI/SNF nucleosome remodellers and cancer. Nat Rev Cancer 2011, 11:481–49210.1038/nrc3068. 338. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I: ABySS: a parallel assembler for short read sequence data. Genome Res 2009, 19:1117– 112310.1101/gr.089532.108. 339. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA: Circos: an information aesthetic for comparative genomics. Genome Res 2009, 19:1639–164510.1101/gr.092759.109. 340. Nückel H, Frey UH, Sellmann L, Collins CH, Dührsen U, Siffert W: The IKZF3 (Aiolos) transcription factor is highly upregulated and inversely correlated with clinical progression in chronic lymphocytic leukaemia. Br. J. Haematol. 2009, 144:268– 27010.1111/j.1365-2141.2008.07442.x. 341. 
Foster RE, Abdulrahman M, Morris MR, Prigmore E, Gribble S, Ng B, Gentle D, Ready S, Weston PMT, Wiesener MS, Kishida T, Yao M, Davison V, Barbero JL, Chu C, Carter NP, Latif F, Maher ER: Characterization of a 3;6 translocation associated with renal cell carcinoma. Genes Chromosomes Cancer 2007, 46:311–31710.1002/gcc.20403. 342. Hirokawa YS, Takagi A, Uchida K, Kozuka Y, Yoneda M, Watanabe M, Shiraishi T: High level expression of STAG1/PMEPA1 in an androgen-independent prostate cancer PC3 subclone. Cell. Mol. Biol. Lett 2007, 10.2478/s11658-007-0009-yAvailable: http://www.ncbi.nlm.nih.gov/pubmed/17318295.Accessed 14 September 2011. 343. Mitelman F, Johansson B, Mertens F: Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer. 2011. http://cgap.nci.nih.gov/Chromosomes/Mitelman.Accessed 2 November 2011. 344. Sun X, Frierson HF, Chen C, Li C, Ran Q, Otto KB, Cantarel BL, Cantarel BM, Vessella RL, Gao AC, Petros J, Miura Y, Simons JW, Dong J-T: Frequent somatic mutations of the transcription factor ATBF1 in human prostate cancer. Nat. Genet. 2005, 37:407–41210.1038/ng1528. 345. Kumar P, Henikoff S, Ng PC: Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 2009, 4:1073– 108110.1038/nprot.2009.86. 346. Comprehensive genomic characterization defines human glioblastoma genes and core pathways: Nature 2008, 455:1061–106810.1038/nature07385. 218 347. Huang J, Zhao Y, Li Y, Fletcher JA, Xiao S: Genomic and functional evidence for an ARID1A tumor suppressor role. Genes, Chromosomes and Cancer 2007, 46:745– 75010.1002/gcc.20459. 348. Jones S, Li M, Parsons DW, Zhang X, Wesseling J, Kristel P, Schmidt MK, Markowitz S, Yan H, Bigner D, Hruban RH, Eshleman JR, Iacobuzio‐Donahue CA, Goggins M, Maitra A, Malek SN, Powell S, Vogelstein B, Kinzler KW, Velculescu VE, Papadopoulos N: Somatic mutations in the chromatin remodeling gene ARID1A occur in several tumor types. 
Human Mutation 2012, 33:100–10310.1002/humu.21633. 349. Kumar P, Henikoff S, Ng PC: Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 2009, 4:1073– 108110.1038/nprot.2009.86. 350. Stiff T, Shtivelman E, Jeggo P, Kysela B: AHNAK interacts with the DNA ligase IVXRCC4 complex and stimulates DNA ligase IV-mediated double-stranded ligation. DNA Repair (Amst.) 2004, 3:245–25610.1016/j.dnarep.2003.11.001. 351. Hirayama S, Bujo H, Yamazaki H, Kanaki T, Takahashi K, Kobayashi J, Schneider WJ, Saito Y: Differential expression of LR11 during proliferation and differentiation of cultured neuroblastoma cells. Biochem. Biophys. Res. Commun. 2000, 275:365– 37310.1006/bbrc.2000.3312. 352. Consortium T1000 GP: A map of human genome variation from population-scale sequencing. Nature 2010, 467:1061–107310.1038/nature09534. 353. Fejes AP, Khodabakhshi AH, Birol I, Jones SJM: Human variation database: an open-source database template for genomic discovery. Bioinformatics 2011, 27:1155– 115610.1093/bioinformatics/btr100. 354. Stratton MR: Exploring the Genomes of Cancer Cells: Progress and Promise. Science 2011, 331:1553 –155810.1126/science.1204040. 355. Davis JD, Lin S-Y: DNA damage and breast cancer. World J Clin Oncol 2011, 2:329– 33810.5306/wjco.v2.i9.329. 356. Stephens PJ, McBride DJ, Lin M-L, Varela I, Pleasance ED, Simpson JT, Stebbings LA, Leroy C, Edkins S, Mudie LJ, Greenman CD, Jia M, Latimer C, Teague JW, Lau KW, Burton J, Quail MA, Swerdlow H, Churcher C, Natrajan R, Sieuwerts AM, Martens JWM, Silver DP, Langerød A, Russnes HEG, Foekens JA, Reis-Filho JS, van ‘t Veer L, Richardson AL, Børresen-Dale A-L, et al.: Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 2009, 462:1005–101010.1038/nature08645. 357. 
Dever SM, Golding SE, Rosenberg E, Adams BR, Idowu MO, Quillin JM, Valerie N, Xu B, Povirk LF, Valerie K: Mutations in the BRCT binding site of BRCA1 result in hyper-recombination. Aging (Albany NY) 2011, 3:515–532. 219 358. Goya R, Sun MGF, Morin RD, Leung G, Ha G, Wiegand KC, Senz J, Crisan A, Marra MA, Hirst M, Huntsman D, Murphy KP, Aparicio S, Shah SP: SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors. Bioinformatics 2010, 26:730–73610.1093/bioinformatics/btq040. 359. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z: Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 2009, 25:2865–287110.1093/bioinformatics/btp394. 360. Ensembl‘s 10th year: Available: http://nar.oxfordjournals.org/content/38/suppl_1/D557.Accessed 29 February 2012. 361. Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol 2000, 132:365–386. 362. Wood RD, Mitchell M, Lindahl T: Human DNA repair genes, 2005. Mutat. Res. 2005, 577:275–28310.1016/j.mrfmmm.2005.03.007. 363. Uccelli A, Laroni A, Freedman MS: Mesenchymal stem cells for the treatment of multiple sclerosis and other neurological diseases. Lancet Neurol 2011, 10:649– 65610.1016/S1474-4422(11)70121-1. 364. Shah N: Alternative Neural Crest Cell Fates Are Instructively Promoted by TGF? Superfamily Members. Cell 1996, 85:331–34310.1016/S0092-8674(00)81112-5. 365. Albino D, Brizzolara A, Moretti S, Falugi C, Mirisola V, Scaruffi P, Di Candia M, Truini M, Coco S, Bonassi S, Tonini GP: Gene expression profiling identifies eleven DNA repair genes down-regulated during mouse neural crest cell migration. Int. J. Dev. Biol. 2011, 55:65–7210.1387/ijdb.092970da. 366. 
Clewes O, Narytnyk A, Gillinder KR, Loughney AD, Murdoch AP, Sieber-Blum M: Human Epidermal Neural Crest Stem Cells (hEPI-NCSC)—Characterization and Directed Differentiation into Osteocytes and Melanocytes. Stem Cell Rev and Rep 2011, 10.1007/s12015-011-9255-5Available: http://www.springerlink.com/content/e060gl54173u7t73/fulltext.html#CR1.Accessed 3 June 2011. 367. Maris JM: The biologic basis for neuroblastoma heterogeneity and risk stratification. Curr. Opin. Pediatr. 2005, 17:7–13. 368. Fredlund E, Ringnér M, Maris JM, Påhlman S: High Myc Pathway Activity and Low Stage of Neuronal Differentiation Associate with Poor Outcome in Neuroblastoma. PNAS 2008, 105:14094–1409910.1073/pnas.0804455105. 369. Thiele CJ: Neuroblastoma (Ed). In Neuroblastoma cell lines. Masers, J Human Cell Culture Lancaster, UK: Kluwer Academic Publishers; Vol. 1 1998:21–53. 220 370. Niederreither K, Doll|[eacute]| P: Retinoic acid in development: towards an integrated view. Nature Reviews Genetics 2008, 9:541–55310.1038/nrg2340. 371. Grimmer MR, Weiss WA: Childhood tumors of the nervous system as disorders of normal development. Curr. Opin. Pediatr 2006, 18:634– 63810.1097/MOP.0b013e32801080fe. 221 Appendices Appendix A Transcripts enriched and depleted in SKPs as discussed in Chapter 2 Table A.1 Transcripts enriched and depleted in SKPs as discussed in Chapter 2 Transcripts enriched (LogFC>0.5 in SKPs vs. MSCs) or depleted (LogFC>0.5 in MSCs vs. SKPs) in SKPs compared to MSCs have been identified as described in Chapter 2. The average log fold enrichment (LogFC Enrichment) is calculated based on three pairwise gene expression comparisons: vSKPs vs. MSCs, dSKPs vs. MSCs and fSKPs vs. MSCs. The transcripts are sorted based on the magnitude of the log fold enrichment. 
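The averaging and thresholding described in the Table A.1 caption can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the analysis code used in Chapter 2. It assumes that each of the three comparisons (vSKPs, dSKPs and fSKPs vs. MSCs) yields one log fold change per transcript, that the three values are combined by an arithmetic mean, and that the 0.5 cut-off is applied to that mean. The function name `classify`, the input layout and the toy genes GeneA to GeneD are hypothetical.

```python
from statistics import mean

# Cut-off quoted in the Table A.1 caption. ASSUMPTION: it is applied to the
# averaged LogFC, not to each pairwise comparison separately.
THRESHOLD = 0.5
COMPARISONS = ("vSKPs", "dSKPs", "fSKPs")  # each compared against MSCs


def classify(pairwise_logfc, threshold=THRESHOLD):
    """Return (gene, LogFC enrichment, population enriched in) rows.

    pairwise_logfc maps a gene symbol to {comparison: logFC}, where a
    positive logFC means higher expression in the SKP population than
    in MSCs (hypothetical input layout).
    """
    skp_rows, msc_rows = [], []
    for gene, comps in pairwise_logfc.items():
        avg = mean(comps[c] for c in COMPARISONS)
        if avg > threshold:
            skp_rows.append((gene, avg, "SKPs"))
        elif -avg > threshold:
            # Depleted in SKPs, i.e. enriched in MSCs; report the magnitude.
            msc_rows.append((gene, -avg, "MSCs"))
        # Transcripts with |avg| <= threshold are not reported.
    # Within each population, sort by decreasing magnitude of enrichment.
    skp_rows.sort(key=lambda row: row[1], reverse=True)
    msc_rows.sort(key=lambda row: row[1], reverse=True)
    return skp_rows + msc_rows


# Toy, made-up values for illustration only (not data from this thesis).
toy = {
    "GeneA": {"vSKPs": 2.0, "dSKPs": 1.0, "fSKPs": 3.0},     # mean 2.0
    "GeneB": {"vSKPs": 0.25, "dSKPs": 0.5, "fSKPs": 0.25},   # mean ~0.33
    "GeneC": {"vSKPs": -1.5, "dSKPs": -0.5, "fSKPs": -1.0},  # mean -1.0
    "GeneD": {"vSKPs": 0.5, "dSKPs": 1.0, "fSKPs": 1.5},     # mean 1.0
}

for row in classify(toy):
    print(row)
# ('GeneA', 2.0, 'SKPs')
# ('GeneD', 1.0, 'SKPs')
# ('GeneC', 1.0, 'MSCs')
```

In this sketch, depleted transcripts are reported by the magnitude of their negative average. That matches the way the MSC rows in Table A.1 carry positive LogFC values.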
Gene symbol	LogFC enrichment	Population enriched in
Mmp13	8.439845	SKPs
Dcn	7.549103	SKPs
Mmp10	7.476266	SKPs
Porf1	6.404842	SKPs
Dio2	6.399098	SKPs
Lama2	6.137718	SKPs
RGD1563970	5.824436	SKPs
Car2	5.750984	SKPs
Mmp3	5.609179	SKPs
Sema3a	5.581426	SKPs
Efemp1	5.538448	SKPs
Il13ra2	5.363509	SKPs
RGD1563628	5.278261	SKPs
Bmp7	5.128188	SKPs
Dusp4	5.114638	SKPs
Pcp4	5.110036	SKPs
Serpina3n	5.094111	SKPs
Enpp2	4.939557	SKPs
Alox15	4.90602	SKPs
Crabp1	4.872659	SKPs
Mmp12	4.864477	SKPs
Megf10	4.864363	SKPs
Cyp26b1	4.799108	SKPs
Ptprz1	4.797765	SKPs
Shc4	4.74527	SKPs
Apoe	4.714561	SKPs
Slc43a3	4.709655	SKPs
Tcfap2c	4.686553	SKPs
Ass1	4.636152	SKPs
Adcyap1r1	4.629722	SKPs
Lrrc15	4.60588	SKPs
Prl6a1	4.592803	SKPs
Ace	4.580035	SKPs
Ctsc	4.550322	SKPs
Cntn1	4.534987	SKPs
Gfra2	4.531173	SKPs
Pi15	4.526552	SKPs
Timp3	4.513819	SKPs
Sparcl1	4.490847	SKPs
Atp8a1	4.470007	SKPs
Fam105a	4.443334	SKPs
Igfbp5	4.441704	SKPs
Sema6d	4.433809	SKPs
Mamdc2	4.398762	SKPs
Cfh	4.374376	SKPs
Tacr3	4.320622	SKPs
Nov	4.292506	SKPs
Rgs2	4.270706	SKPs
Igfbp3	4.270205	SKPs
Sema7a	4.210779	SKPs
Arhgdib	4.174716	SKPs
Ccr1	4.150435	SKPs
Ampd3	4.087024	SKPs
Lilrb4	4.015927	SKPs
Syngr1	3.948953	SKPs
Sv2b	3.892332	SKPs
Dapk1	3.845124	SKPs
Clu	3.832665	SKPs
Ar	3.820615	SKPs
Hapln1	3.808152	SKPs
Robo2	3.805609	SKPs
Entpd1	3.784394	SKPs
LOC691317	3.782885	SKPs
Mmp11	3.774979	SKPs
Ifitm1	3.772668	SKPs
LOC682861	3.751067	SKPs
Igfbp4	3.733792	SKPs
C1s	3.72869	SKPs
RGD1310788	3.721911	SKPs
Ppl	3.693138	SKPs
C1ql3	3.688747	SKPs
Ddit4l	3.666462	SKPs
Abcg1	3.644367	SKPs
Mfap2	3.633226	SKPs
Sfrp2	3.605529	SKPs
Robo1	3.564526	SKPs
Adamts8	3.517057	SKPs
Depdc2	3.500137	SKPs
Bdkrb2	3.483871	SKPs
Angpt1	3.475515	SKPs
Clec14a	3.424995	SKPs
Ms4a6a	3.414956	SKPs
Dkk2	3.41011	SKPs
LOC691995	3.389074	SKPs
Spon1	3.387731	SKPs
Galnac4s-6st	3.354692	SKPs
Mmp9	3.353969	SKPs
Epm2aip1	3.352892	SKPs
Sntg1	3.338857	SKPs
Pla2g7	3.300792	SKPs
Sema5a	3.235042	SKPs
Cttnbp2	3.218479	SKPs
Usp18	3.216911	SKPs
Mal	3.211138	SKPs
RGD1309969	3.195953	SKPs
Rasd1	3.171879	SKPs
Bmp2	3.168457	SKPs
Bace2	3.16736	SKPs
Tgm2	3.149175	SKPs
Mafb	3.142627	SKPs
Egflam	3.128537	SKPs
Ntn1	3.109019	SKPs
C1qtnf1	3.045016	SKPs
Gas7	3.042994	SKPs
RGD1563838	3.042454	SKPs
Twist2	3.027152	SKPs
Smox	3.025184	SKPs
Gabre	3.024032	SKPs
Il17rd	3.00017	SKPs
Cyp2j4	2.993084	SKPs
Gpnmb	2.988691	SKPs
Cxcl16	2.982404	SKPs
Pcdh7	2.957364	SKPs
Serpini1	2.953977	SKPs
Prdm1	2.952008	SKPs
Slpi	2.91591	SKPs
Mgst1	2.913738	SKPs
Kcnab1	2.896829	SKPs
Tspan9	2.869862	SKPs
Enc1	2.85147	SKPs
Ednra	2.846815	SKPs
Chrna1	2.829633	SKPs
Col7a1	2.810433	SKPs
Btbd3	2.794844	SKPs
Dusp6	2.793823	SKPs
Scn7a	2.788322	SKPs
Icoslg	2.785946	SKPs
Plekhg4	2.782189	SKPs
Fbxo32	2.763752	SKPs
Alx4	2.752864	SKPs
Rps6ka2	2.752572	SKPs
F2rl2	2.746206	SKPs
Wnt7b	2.743404	SKPs
Ubash3b	2.737652	SKPs
Rerg	2.731972	SKPs
Kcnf1	2.729508	SKPs
Pde7b	2.720348	SKPs
Slc22a23	2.707091	SKPs
Cd24	2.706476	SKPs
Vav3	2.70184	SKPs
Gpx3	2.700941	SKPs
Wnt16	2.697651	SKPs
Abca1	2.696829	SKPs
Tspan11	2.692371	SKPs
Sh3kbp1	2.690237	SKPs
Xylt1	2.679182	SKPs
Slco2b1	2.67564	SKPs
MGC112715	2.665346	SKPs
St8sia4	2.64635	SKPs
Fam171b	2.619459	SKPs
Prr16	2.614479	SKPs
Plxdc1	2.61116	SKPs
Sulf2	2.609031	SKPs
Upp1	2.602906	SKPs
Fap	2.586286	SKPs
Kcna3	2.57713	SKPs
Tcfap2a	2.566328	SKPs
Arl11	2.563157	SKPs
Il16	2.562949	SKPs
Wnt5a	2.542061	SKPs
Pion	2.532506	SKPs
Col23a1	2.516179	SKPs
Calcrl	2.514765	SKPs
Ly75	2.498175	SKPs
Cyp2j3	2.475447	SKPs
Fam43a	2.457866	SKPs
Fgfr1	2.444475	SKPs
Il7	2.442199	SKPs
Adh7	2.420907	SKPs
Bhlhb3	2.413693	SKPs
Sorcs2	2.376019	SKPs
Rspo4	2.375095	SKPs
Gem	2.375041	SKPs
Rdh10	2.374578	SKPs
Zfp423	2.370252	SKPs
Cygb	2.364885	SKPs
Gbp2	2.363691	SKPs
Serpine2	2.356378	SKPs
Zbtb16	2.352951	SKPs
Mmp1a	2.331148	SKPs
Sdc1	2.318789	SKPs
LOC500046	2.309033	SKPs
Limk1	2.303573	SKPs
Boc	2.287568	SKPs
Olfm1	2.281832	SKPs
Peli2	2.271998	SKPs
Rapgef5	2.248381	SKPs
Mitf	2.247989	SKPs
Efnb1	2.244997	SKPs
Olr155	2.24431	SKPs
RGD1563349	2.23565	SKPs
Rftn2	2.233297	SKPs
Adcy2	2.232741	SKPs
Dchs1	2.231393	SKPs
Gper	2.22473	SKPs
Fkbp5	2.216139	SKPs
Cask	2.214251	SKPs
Dll1	2.206801	SKPs
Gadd45a	2.206485	SKPs
Pcdh9	2.20426	SKPs
RGD1307051	2.203518	SKPs
RGD1309676	2.2009	SKPs
Etv1	2.194529	SKPs
Crisp1	2.183366	SKPs
RGD1310753	2.181243	SKPs
Ucn2	2.174403	SKPs
Ptprk	2.172379	SKPs
Scd	2.170952	SKPs
Evi2a	2.164939	SKPs
Fndc3a	2.164736	SKPs
Cd97	2.162718	SKPs
Tbx2	2.159147	SKPs
Asam	2.153283	SKPs
Cd93	2.148491	SKPs
Dgkb	2.141238	SKPs
Dnm1	2.130209	SKPs
Slc16a2	2.128206	SKPs
Net1	2.126936	SKPs
Elovl4	2.125843	SKPs
Pitx2	2.112302	SKPs
Lrrc17	2.111186	SKPs
Pcsk2	2.082576	SKPs
Pdpn	2.08118	SKPs
Cpm	2.075904	SKPs
Adam33	2.073154	SKPs
Scara5	2.071966	SKPs
Sgk1	2.071694	SKPs
Tnfrsf1b	2.067268	SKPs
Cdh2	2.064023	SKPs
Glrx1	2.061341	SKPs
RGD1560686	2.059465	SKPs
Lpcat2	2.052348	SKPs
Ptger3	2.047112	SKPs
Axin2	2.033148	SKPs
Sepp1	2.029722	SKPs
Shroom2	2.024783	SKPs
Dennd3	2.008681	SKPs
Cdkn2b	1.995121	SKPs
Dusp5	1.992932	SKPs
Ptpre	1.979358	SKPs
RGD1563185	1.972694	SKPs
Gpc3	1.972504	SKPs
Ptgdrl	1.962685	SKPs
Lpin1	1.961242	SKPs
Rhbdf2	1.949743	SKPs
LOC501482	1.946375	SKPs
RGD1307119	1.937082	SKPs
Adcy7	1.924664	SKPs
Pcsk5	1.924428	SKPs
Nfkbia	1.91993	SKPs
Kcnh1	1.918282	SKPs
RT1-CE5	1.9116	SKPs
Tcf4	1.907333	SKPs
Cyp7b1	1.902961	SKPs
Psmb8	1.898685	SKPs
Rnf149	1.896076	SKPs
Adamts4	1.885483	SKPs
Slc39a14	1.878078	SKPs
Cyb561	1.873581	SKPs
Amica1	1.870064	SKPs
Npas2	1.858108	SKPs
Gsta4	1.855098	SKPs
Irx1	1.849742	SKPs
LOC365723	1.842754	SKPs
LOC689545	1.841621	SKPs
Wif1	1.840557	SKPs
Phf15	1.839713	SKPs
Rab38	1.81861	SKPs
Agpat9	1.815353	SKPs
Card6	1.799736	SKPs
RGD1564996	1.788221	SKPs
Arhgap22	1.785155	SKPs
Slc1a1	1.784007	SKPs
Mmp2	1.78395	SKPs
Nrg1	1.775892	SKPs
Tcp11l2	1.735227	SKPs
Cyp2d4v1	1.735093	SKPs
Thrb	1.733174	SKPs
Angptl2	1.72938	SKPs
Rbp1	1.718883	SKPs
Fyn	1.718785	SKPs
Twist1	1.718642	SKPs
Lrp5	1.710311	SKPs
Ghr	1.706529	SKPs
Sox2	1.702773	SKPs
Syt17	1.697783	SKPs
Prex1	1.697384	SKPs
Ebf4	1.69161	SKPs
Etv4	1.688637	SKPs
Plcb1	1.679783	SKPs
Klhl24	1.676954	SKPs
Npc2	1.675324	SKPs
Ramp	1.667865	SKPs
Hspa12b	1.662875	SKPs
Tex9	1.659326	SKPs
Rasgrf2	1.654338	SKPs
Rpia	1.653981	SKPs
Adamts7	1.650379	SKPs
B3galnt1	1.644111	SKPs
Lmo2	1.626439	SKPs
Ap1s2	1.618013	SKPs
Slc19a2	1.61485	SKPs
Csad	1.606494	SKPs
P2ry1	1.606197	SKPs
Arsb	1.599389	SKPs
Cacna1a	1.59918	SKPs
Adfp	1.595259	SKPs
RGD1310552	1.593861	SKPs
P2ry2	1.590305	SKPs
Il2rg	1.58555	SKPs
Pfkfb4	1.575458	SKPs
LOC305633	1.57441	SKPs
Zc3h12c	1.574396	SKPs
Epb4.1	1.571719	SKPs
Kremen1	1.564563	SKPs
Sh3gl3	1.564535	SKPs
Irx2	1.561118	SKPs
Pcdh18	1.558573	SKPs
RGD1307749	1.556957	SKPs
Ankrd6	1.545125	SKPs
Cyb5a	1.539911	SKPs
Lrig1	1.537755	SKPs
Gng8	1.535937	SKPs
Spred1	1.533063	SKPs
Pik3r1	1.522542	SKPs
Trpm3	1.520454	SKPs
Ctsh	1.519906	SKPs
Ppp1r3b	1.512322	SKPs
St3gal4	1.509367	SKPs
Ddit4	1.507832	SKPs
Ptpn3	1.507165	SKPs
Ctsd	1.499332	SKPs
Lgals3	1.498291	SKPs
Fut8	1.497645	SKPs
Spry4	1.493729	SKPs
RGD1307569	1.491449	SKPs
Capg	1.486736	SKPs
Gsta3	1.48666	SKPs
Aplp1	1.473645	SKPs
Satb2	1.472972	SKPs
Rab3il1	1.467931	SKPs
Gpr153_predicted	1.460756	SKPs
Tmem119	1.457966	SKPs
Frrs1	1.456784	SKPs
Abca2	1.453259	SKPs
Mex3b	1.448965	SKPs
Pde1b	1.442909	SKPs
Irf1	1.441505	SKPs
Btg1	1.44123	SKPs
Mr1	1.437384	SKPs
Dcbld2	1.437351	SKPs
LOC691418	1.434372	SKPs
Nr1h3	1.433924	SKPs
Slc4a7	1.432864	SKPs
Ttyh3	1.432388	SKPs
RGD1560778	1.430171	SKPs
Rai2	1.429157	SKPs
Ugt1a5	1.421798	SKPs
Prrg3	1.417992	SKPs
Egr3	1.417143	SKPs
Ptgir	1.406829	SKPs
Rasa1	1.400974	SKPs
Atp6v1b2	1.384625	SKPs
Cables1	1.376785	SKPs
LOC685707	1.363671	SKPs
Nudt4	1.347848	SKPs
Foxo3	1.346535	SKPs
Zmiz1	1.346355	SKPs
Flrt3	1.343682	SKPs
RGD1564942	1.339691	SKPs
Grasp	1.335824	SKPs
Tgfbr2	1.335203	SKPs
Cd9	1.332448	SKPs
Dyrk3	1.332066	SKPs
Ankh	1.331324	SKPs
Nme3	1.325655	SKPs
Col16a1	1.324764	SKPs
Obfc2a	1.324321	SKPs
RGD1310862	1.323114	SKPs
Slc7a7	1.322833	SKPs
Efnb2	1.321452	SKPs
Ctsl1	1.321234	SKPs
Cd82	1.315652	SKPs
Slc25a37	1.314125	SKPs
Rnf182	1.312826	SKPs
Uhrf2	1.31142	SKPs
Tifa	1.309473	SKPs
Adam17	1.308222	SKPs
Cbfa2t3	1.304733	SKPs
Igsf3	1.30396	SKPs
Syt13	1.303321	SKPs
LOC690769	1.303044	SKPs
Tpp1	1.28837	SKPs
LOC688916	1.284926	SKPs
Bcl2l11	1.277599	SKPs
Scd1	1.274121	SKPs
B2m	1.26819	SKPs
Oprd1	1.267969	SKPs
Tm7sf2	1.267528	SKPs
Nfe2l2	1.265797	SKPs
Myo1b	1.265205	SKPs
Cdh11	1.264608	SKPs
Kcnab2	1.259399	SKPs
Syt6	1.25806	SKPs
Gchfr	1.255077	SKPs
Scpep1	1.250481	SKPs
Abhd14b	1.24897	SKPs
S1pr2	1.247579	SKPs
Spock1	1.240663	SKPs
Slc39a6	1.23809	SKPs
Dlgap2	1.233464	SKPs
Prkcz	1.232432	SKPs
Sh2b2	1.2306	SKPs
Tcp11l1	1.225166	SKPs
Shc2	1.221249	SKPs
Mettl7a	1.22108	SKPs
Sema4c	1.220116	SKPs
Tgfbr1	1.219942	SKPs
Nid67	1.219078	SKPs
Ralgds	1.208394	SKPs
Ier5	1.203804	SKPs
Wwtr1	1.202635	SKPs
Tmem200b	1.201752	SKPs
Gstt2	1.198435	SKPs
Ntrk2	1.193961	SKPs
Ctsf	1.185615	SKPs
Cntnap1	1.18223	SKPs
Zfp36l1	1.179654	SKPs
Plxna2	1.17861	SKPs
Nkd1	1.173436	SKPs
Mylip	1.17217	SKPs
S100a16	1.170254	SKPs
Ly96	1.166122	SKPs
Ccdc92	1.165316	SKPs
Mocos	1.164343	SKPs
Grn	1.162677	SKPs
Chchd10	1.161562	SKPs
Neu2	1.158107	SKPs
RGD1359529	1.153555	SKPs
Slc12a7	1.151441	SKPs
RGD1566021	1.150212	SKPs
RGD1566132	1.14984	SKPs
RGD1562618	1.14834	SKPs
Lphn1	1.147889	SKPs
Cachd1	1.146924	SKPs
Atp10d	1.136646	SKPs
Runx3	1.135472	SKPs
Dennd2c	1.135139	SKPs
Tusc3	1.134717	SKPs
Cugbp2	1.133646	SKPs
Dnajc12	1.132937	SKPs
Pnrc1	1.120236	SKPs
Clk4	1.117579	SKPs
Gabarapl1	1.103554	SKPs
Igf2r	1.099944	SKPs
Bach1	1.094652	SKPs
Pcmtd2	1.094107	SKPs
Sntb2	1.09236	SKPs
Zfand5	1.090404	SKPs
Uba7	1.090038	SKPs
Lamp2	1.089041	SKPs
Pten	1.086014	SKPs
Itm2c	1.082152	SKPs
Psmd10	1.081984	SKPs
Lgals3bp	1.081565	SKPs
Dpy19l1	1.075216	SKPs
Mmp14	1.072628	SKPs
RGD1306437	1.069995	SKPs
LOC501110	1.065487	SKPs
RGD1306613	1.059337	SKPs
Znf503	1.054815	SKPs
Plag1	1.045373	SKPs
Tmem140	1.042719	SKPs
Rasa3	1.039741	SKPs
Cdkn2aipnl	1.03926	SKPs
Ppargc1b	1.036635	SKPs
RGD1309621	1.036602	SKPs
Prkcd	1.034258	SKPs
Epha4	1.032744	SKPs
Gcnt1	1.027978	SKPs
Tram1l1	1.026665	SKPs
RGD1304884	1.017999	SKPs
Mad2l2	1.015835	SKPs
Nradd	1.015643	SKPs
Gpd1	1.008748	SKPs
Ankrd44	1.007042	SKPs
Pink1	1.005692	SKPs
Hsd17b11	1.004974	SKPs
Nrg2	0.999896	SKPs
RGD1306058	0.995803	SKPs
Map3k6	0.990044	SKPs
Homer2	0.983582	SKPs
Tbc1d2b	0.983029	SKPs
RGD1308093	0.982798	SKPs
Camk2n1	0.982001	SKPs
Gsn	0.978805	SKPs
Psap	0.978628	SKPs
Plekha5	0.978532	SKPs
Emp1	0.97834	SKPs
Isoc1	0.978253	SKPs
Arrdc3	0.975421	SKPs
RGD1306271	0.972209	SKPs
Sipa1l2	0.972091	SKPs
Manba	0.971204	SKPs
Rtkn	0.971138	SKPs
Maml3	0.970924	SKPs
Runx1t1	0.96641	SKPs
Grina	0.965713	SKPs
Psen2	0.965253	SKPs
Ptp4a1	0.961293	SKPs
MGC94600	0.960727	SKPs
Atp6v1a	0.960519	SKPs
Cst3	0.955815	SKPs
Ppp3ca	0.955555	SKPs
Utx	0.953783	SKPs
Nucb1	0.951491	SKPs
LOC683801	0.945135	SKPs
Man2b1	0.940716	SKPs
Tacc2	0.938214	SKPs
Kif26a	0.937563	SKPs
Tbc1d16	0.934257	SKPs
Tapbpl	0.932016	SKPs
Pepd	0.9291	SKPs
Sema6c	0.928331	SKPs
Pnpla7	0.927908	SKPs
Slc25a2	0.927831	SKPs
H2-M3	0.926751	SKPs
Zfp347	0.924779	SKPs
Nqo1	0.923352	SKPs
Insig1	0.922818	SKPs
Tap2	0.922308	SKPs
Gng12	0.92217	SKPs
Ltbp3	0.920772	SKPs
Ripk5	0.919637	SKPs
Hdac11	0.919197	SKPs
Lrrc8a	0.917183	SKPs
Gna13	0.91352	SKPs
Ctsb	0.912132	SKPs
RGD1304694	0.906121	SKPs
Nbr1	0.904426	SKPs
Slc24a6	0.90129	SKPs
Ccdc50	0.899954	SKPs
Fhit	0.899764	SKPs
Mfsd7	0.898991	SKPs
Laptm4b	0.895226	SKPs
Nfat5	0.892256	SKPs
Tiparp	0.883362	SKPs
Appl2	0.877522	SKPs
Fip1l1	0.872146	SKPs
LOC100192313	0.868357	SKPs
Pld3	0.864994	SKPs
Nck1	0.861146	SKPs
Fbxo44	0.859717	SKPs
Arhgef9	0.859525	SKPs
Chka	0.858264	SKPs
Tmem59_RGD1310313	0.857462	SKPs
Wipf1	0.848145	SKPs
Rxra	0.846768	SKPs
Atp6ap2	0.845348	SKPs
Itm2b	0.844967	SKPs
Parp9	0.839773	SKPs
RGD1566386	0.838603	SKPs
Sh2b3	0.838432	SKPs
Rora	0.834799	SKPs
Atp6ap1	0.834476	SKPs
Dtx3l	0.825572	SKPs
Runx2	0.824737	SKPs
Fundc2	0.824482	SKPs
Ogt	0.818993	SKPs
RGD1311605	0.818612	SKPs
Wdfy2	0.81507	SKPs
Ier5l	0.814783	SKPs
Per1	0.814229	SKPs
Nudt9	0.813495	SKPs
Alg2	0.812933	SKPs
Gramd3	0.808468	SKPs
Psenen	0.807557	SKPs
Hbp1	0.805151	SKPs
RGD1309926	0.804397	SKPs
Rnf167	0.802496	SKPs
Ctnnb1	0.800444	SKPs
Man2a2	0.799784	SKPs
Fuca1	0.79051	SKPs
Cd1d1	0.78585	SKPs
Chst11	0.784267	SKPs
Wdr91	0.781068	SKPs
RGD1306284	0.779611	SKPs
Slk	0.779439	SKPs
Aadacl1	0.776881	SKPs
Zfoc1	0.776444	SKPs
Kdsr	0.776261	SKPs
Slc23a2	0.775929	SKPs
Ptprm	0.774578	SKPs
Crebl2	0.771172	SKPs
Gns	0.769812	SKPs
Ntng2	0.767413	SKPs
Ccl4	0.767396	SKPs
Plekho1	0.764344	SKPs
Rufy1	0.758117	SKPs
Ube2b	0.755211	SKPs
Hace1	0.753332	SKPs
Wdr68	0.743617	SKPs
Chic2	0.738018	SKPs
Tpbg	0.733065	SKPs
Dnm3	0.731917	SKPs
Ubl3	0.731577	SKPs
Slc29a3	0.731236	SKPs
Ccbl1	0.728912	SKPs
RGD1307986	0.727596	SKPs
Hs1bp3	0.725371	SKPs
Stk39	0.722163	SKPs
Scamp1	0.711949	SKPs
Btbd1	0.711071	SKPs
LOC300225	0.708721	SKPs
Hipk1	0.707363	SKPs
Atp6v0d1	0.701349	SKPs
Ercc5	0.700507	SKPs
Vamp5	0.700015	SKPs
Orai2	0.699471	SKPs
Jmjd1c	0.697773	SKPs
Dlk2	0.694736	SKPs
Calcoco1	0.691674	SKPs
Ihpk1	0.67988	SKPs
Tmem179b	0.67951	SKPs
Ggps1	0.678025	SKPs
LOC685925	0.670847	SKPs
Igbp1	0.669457	SKPs
Slc27a1	0.6684	SKPs
Wdr45	0.667068	SKPs
Gramd1a	0.66656	SKPs
Ihpk2	0.663588	SKPs
Acsf2	0.662109	SKPs
Sh3bp4	0.647964	SKPs
Reep3	0.64462	SKPs
Zfp110	0.641738	SKPs
Zbtb43	0.632255	SKPs
Fam113a	0.621516	SKPs
Tbc1d17	0.613462	SKPs
Lztfl1	0.609969	SKPs
Lig4	0.609094	SKPs
Ss18	0.603474	SKPs
RGD1307682	0.597754	SKPs
Cnnm3	0.597087	SKPs
Mafg	0.596552	SKPs
Narf	0.592008	SKPs
Tmem188	0.589472	SKPs
Laptm4a	0.584644	SKPs
RGD1560612	0.580607	SKPs
Ldb1	0.574763	SKPs
Ccng1	0.565446	SKPs
Phactr2	0.562541	SKPs
Fam160a2	0.560316	SKPs
Srebf2	0.559231	SKPs
RGD1560108	0.551124	SKPs
Fhl1	5.787124	MSCs
Akap12	5.615653	MSCs
Cyp1b1	5.522993	MSCs
Itga11	5.428756	MSCs
Smoc1	5.375434	MSCs
Acan	5.22488	MSCs
Cnn1	5.089795	MSCs
Mfap5	5.061273	MSCs
Tgfb3	4.94043	MSCs
Ogn	4.819063	MSCs
Grb14	4.716693	MSCs
Actg2	4.535768	MSCs
Cxcr7	4.504343	MSCs
Aoc3	4.478056	MSCs
Arsi	4.408675	MSCs
Cryab	4.163519	MSCs
Nexn	4.14033	MSCs
Wisp2	4.09953	MSCs
Pak1	4.065077	MSCs
Casq2	4.005767	MSCs
Hs6st2	3.943769	MSCs
Ccdc80	3.892134	MSCs
Lmod1	3.867123	MSCs
Slc38a4	3.806375	MSCs
Amotl2	3.798151	MSCs
Prss23	3.793714	MSCs
Hbegf	3.717367	MSCs
Myocd	3.698403	MSCs
LOC290595	3.68784	MSCs
Stxbp6	3.666674	MSCs
Lox	3.662211	MSCs
Clec2dl1	3.593328	MSCs
Meox2	3.487526	MSCs
Myl9	3.460855	MSCs
Diaph3	3.460381	MSCs
Ptgr1	3.404171	MSCs
Slc24a3	3.400344	MSCs
Sytl2	3.399421	MSCs
Dysf	3.375439	MSCs
Anln	3.375205	MSCs
Slc2a3	3.369597	MSCs
Lims2	3.367772	MSCs
Prkg1	3.213902	MSCs
Hoxa10	3.213595	MSCs
Myh11	3.209889	MSCs
Pak3	3.200443	MSCs
Bmper	3.186379	MSCs
Cdh3	3.175107	MSCs
Fabp3	3.161417	MSCs
Anxa8	3.158857	MSCs
Cenpf	3.140417	MSCs
Ddah1	3.123098	MSCs
Abhd10	3.117166	MSCs
Jag1	3.111915	MSCs
Adamtsl3	3.088061	MSCs
Jub	3.075224	MSCs
Itpr1	3.010265	MSCs
Kif4	2.943342	MSCs
Tpm1	2.918062	MSCs
Prc1	2.91533	MSCs
Scrn1	2.908155	MSCs
Cdkn3	2.899309	MSCs
Zfpm2	2.884909	MSCs
Ctgf	2.884599	MSCs
Tnfsf18	2.882138	MSCs
Wbscr17	2.872408	MSCs
Slc2a12	2.849829	MSCs
Ect2	2.834975	MSCs
Palmd	2.760869	MSCs
Ppp1r14a	2.758544	MSCs
Kbtbd10	2.753339	MSCs
Egfl6	2.750576	MSCs
Lgr4  2.744906  MSCs
Iqgap3  2.730602  MSCs
Sema3e  2.7087  MSCs
Glb1l2  2.694916  MSCs
Nalcn  2.69204  MSCs
Cep55  2.69109  MSCs
Ccnb1  2.671173  MSCs
RGD1309930  2.666364  MSCs
Slc8a1  2.650329  MSCs
RGD1309360  2.643812  MSCs
Lpp  2.637284  MSCs
Gas2l3_LOC687775  2.632208  MSCs
Gstm2  2.62979  MSCs
Ebpl  2.628666  MSCs
Gprc5a  2.625732  MSCs
Bcar3  2.616395  MSCs
Ahr  2.603595  MSCs
Kif23  2.598668  MSCs
Nek2  2.593458  MSCs
Mybl1  2.574959  MSCs
RGD1562646  2.571932  MSCs
Foxp1  2.566615  MSCs
Mrvi1  2.556722  MSCs
Cadm4  2.555991  MSCs
Fzd4  2.553415  MSCs
Kif2c  2.539284  MSCs
Sgms2  2.536525  MSCs
Cdc20  2.523547  MSCs
Samd9l  2.510591  MSCs
Nuf2  2.5042  MSCs
Pragmin  2.501032  MSCs
Crim1  2.498068  MSCs
Kif20a  2.481075  MSCs
Klf2  2.479546  MSCs
Bard1  2.474458  MSCs
Pdlim5  2.471379  MSCs
Tpx2  2.468738  MSCs
Sntb1  2.460512  MSCs
Tpm2  2.456111  MSCs
Aard  2.447477  MSCs
Sgol2  2.4445  MSCs
Psat1  2.440751  MSCs
Ngf  2.423738  MSCs
Pdlim1  2.416225  MSCs
Car9  2.397039  MSCs
Cenpe  2.386557  MSCs
RGD1311642  2.385081  MSCs
Vldlr  2.384781  MSCs
Pftk1  2.375764  MSCs
Ebf2  2.375753  MSCs
Epas1  2.374871  MSCs
Adam23  2.374386  MSCs
LOC497860  2.368422  MSCs
Plod2  2.366713  MSCs
S1pr3  2.362976  MSCs
Sox30  2.353627  MSCs
Racgap1  2.34999  MSCs
Kif20b  2.348522  MSCs
Spc25  2.341741  MSCs
Dner  2.330488  MSCs
Cenpi  2.326654  MSCs
Flvcr2  2.319188  MSCs
RGD1307201  2.315084  MSCs
Cyr61  2.2983  MSCs
Dlgap5  2.294088  MSCs
Clec2d  2.289774  MSCs
Mastl  2.288684  MSCs
P4ha3  2.283988  MSCs
Plk1  2.280576  MSCs
C1qtnf5  2.275317  MSCs
Bub1b  2.264176  MSCs
Mybl2  2.249627  MSCs
RGD1310335  2.24446  MSCs
Casc5  2.244297  MSCs
Aspm  2.241747  MSCs
Adamts5  2.237903  MSCs
Rad51  2.237872  MSCs
Sync  2.233378  MSCs
Spag5  2.233216  MSCs
Chmp4c  2.220502  MSCs
RGD1306565  2.219701  MSCs
Ttk  2.218258  MSCs
Camk2g  2.21016  MSCs
Espl1  2.20576  MSCs
Hmmr  2.192208  MSCs
Hnt  2.186537  MSCs
Ccnb2  2.185957  MSCs
Nr4a1  2.183144  MSCs
Bub1  2.174155  MSCs
Slitrk5  2.157658  MSCs
Nr3c2  2.152545  MSCs
Lonrf2  2.147856  MSCs
Col11a1  2.145523  MSCs
Ccna2  2.128854  MSCs
St5  2.124276  MSCs
Fam64a  2.117837  MSCs
Fancd2  2.111739  MSCs
Cdc2  2.105662  MSCs
Shroom1  2.10519  MSCs
Tacc3  2.096018  MSCs
Cnnm2  2.09465  MSCs
Eno3  2.093444  MSCs
Lamc3  2.093437  MSCs
Il6  2.091523  MSCs
Fam83d  2.084619  MSCs
Kif11  2.08098  MSCs
RGD1310376  2.075754  MSCs
Lmnb1  2.070811  MSCs
Grb10  2.066945  MSCs
Smoc2  2.0652  MSCs
Podnl1  2.059558  MSCs
Uhrf1  2.054711  MSCs
Pgm5  2.054271  MSCs
Spon2  2.050796  MSCs
Fam164a  2.043834  MSCs
Plk4  2.039775  MSCs
Vcl  2.037924  MSCs
Niban  2.023685  MSCs
Klhl30  2.021423  MSCs
Rcan2  2.019507  MSCs
Lhfp  2.010277  MSCs
Cdca2  2.010068  MSCs
Fhl2  2.007997  MSCs
Dtl  2.002985  MSCs
Syde2  2.001031  MSCs
Zfhx3  1.997763  MSCs
Fat3  1.996478  MSCs
Nfia  1.994831  MSCs
Syne2  1.994614  MSCs
Slc7a5  1.983641  MSCs
Gen1  1.982272  MSCs
Sulf1  1.982193  MSCs
Glipr2  1.981042  MSCs
Mustn1  1.964918  MSCs
Smarca1  1.962925  MSCs
Hrasls  1.961982  MSCs
Hjurp  1.950612  MSCs
Mmp23  1.947182  MSCs
Kntc1  1.945821  MSCs
Kif22  1.944116  MSCs
Amot  1.936063  MSCs
Spc24  1.932001  MSCs
LOC691979  1.929052  MSCs
Cenpa  1.922477  MSCs
Flnc  1.912271  MSCs
Klhl13  1.910179  MSCs
Cdca3  1.90934  MSCs
Depdc1  1.908642  MSCs
Plekhk1  1.90516  MSCs
Cenpk  1.904267  MSCs
Serpine1  1.900852  MSCs
LOC683179  1.894496  MSCs
Ncam1  1.889711  MSCs
Cenpt  1.888805  MSCs
Slfn3  1.887402  MSCs
RGD1559690  1.883979  MSCs
Bicd1  1.882376  MSCs
Gadd45g  1.879958  MSCs
Phf17  1.877361  MSCs
Mlf1ip  1.876278  MSCs
Pcbd1  1.872876  MSCs
Ppp1r3c  1.868575  MSCs
Sh3md4  1.858281  MSCs
LOC684771  1.849719  MSCs
RGD1559896  1.842205  MSCs
Six1  1.840969  MSCs
Gpr176  1.837392  MSCs
Basp1  1.83541  MSCs
Prr11  1.833908  MSCs
Cited2  1.831179  MSCs
Lama4  1.829976  MSCs
Ptprf  1.821942  MSCs
RGD735112  1.815782  MSCs
Magi3  1.814402  MSCs
RGD1308541  1.810995  MSCs
Brip1  1.808806  MSCs
RGD1561090  1.803992  MSCs
Pdk4  1.788501  MSCs
Odf3l1  1.788221  MSCs
Stil  1.785303  MSCs
Cdc6  1.784175  MSCs
Ckap2l  1.77599  MSCs
Kalrn  1.774908  MSCs
Tcea3  1.774461  MSCs
Aurka  1.772638  MSCs
Kif18b  1.769443  MSCs
Npas4  1.765861  MSCs
Amph  1.761949  MSCs
Ccne2  1.761286  MSCs
Gramd1c  1.7597  MSCs
Phgdh  1.742096  MSCs
Chst10  1.731918  MSCs
Fbxo5  1.725971  MSCs
Melk  1.724418  MSCs
Wwc2  1.724391  MSCs
Ssx2ip  1.7198  MSCs
Ccdc37  1.710348  MSCs
Cep72  1.7062  MSCs
Slc4a4  1.701822  MSCs
RGD1305412  1.70005  MSCs
Myo1c  1.693038  MSCs
Rapsn  1.690128  MSCs
Traf4af1  1.686846  MSCs
Lpar3  1.686723  MSCs
Hist2h3c2_Hist1h3f_LOC679950_LOC684762_LOC684841  1.683404  MSCs
RGD1562846  1.68111  MSCs
Arhgap11a  1.678273  MSCs
Ccnf  1.676746  MSCs
Rrad  1.672345  MSCs
Ercc6l  1.667879  MSCs
Mcm10  1.666948  MSCs
Scn2a1  1.659186  MSCs
Fads3  1.657539  MSCs
LOC684611  1.656247  MSCs
Gja5  1.65283  MSCs
Tgfb1i1  1.645035  MSCs
Cdca8  1.644789  MSCs
Pde4b  1.642621  MSCs
E2f7  1.641516  MSCs
Casp12  1.641111  MSCs
P2rx5  1.639832  MSCs
Slc29a2  1.63483  MSCs
Nt5dc3  1.63426  MSCs
Asf1b  1.633098  MSCs
Fos  1.632931  MSCs
Kprp  1.631475  MSCs
Dnajb4  1.623556  MSCs
Bmp4  1.619565  MSCs
Fzd6  1.617197  MSCs
Plscr2  1.615788  MSCs
Pitx1  1.614561  MSCs
LOC305691  1.61005  MSCs
RGD1304693  1.609554  MSCs
Rin3  1.602719  MSCs
Pck2  1.601887  MSCs
Mansc1  1.601569  MSCs
Slc1a4  1.599831  MSCs
Apold1  1.59164  MSCs
Smc4  1.591152  MSCs
Smc2  1.587481  MSCs
Spta1  1.585526  MSCs
Zwilch  1.585128  MSCs
Sgca  1.585054  MSCs
Myadm  1.581937  MSCs
Adamts6  1.580761  MSCs
Rasl12  1.570126  MSCs
Fam26e  1.570074  MSCs
Sass6  1.569233  MSCs
Cav3  1.567815  MSCs
Depdc1b  1.565429  MSCs
Pdia5  1.563651  MSCs
Foxm1  1.562253  MSCs
Trip13  1.558548  MSCs
Fat1  1.552651  MSCs
Rgs16  1.551015  MSCs
Geft  1.548072  MSCs
Tmem30b  1.546123  MSCs
Dusp8  1.545449  MSCs
Mcm6  1.544847  MSCs
Sorbs1  1.542864  MSCs
Col5a1  1.542143  MSCs
LOC682888  1.539402  MSCs
Zfp568  1.535262  MSCs
Dbf4  1.533319  MSCs
RGD1305450  1.533239  MSCs
Csrp1  1.532804  MSCs
Calml4  1.528867  MSCs
Fosb  1.524234  MSCs
RGD1306507  1.524096  MSCs
Pole  1.518326  MSCs
Flna  1.51607  MSCs
Zdhhc15  1.514909  MSCs
Hspc159  1.510997  MSCs
Tmem195  1.509481  MSCs
Crip2  1.50931  MSCs
Gins1  1.507104  MSCs
Cenpm  1.505979  MSCs
Klhdc8a  1.490918  MSCs
Setbp1  1.487603  MSCs
Kif15  1.484794  MSCs
Afap1  1.483962  MSCs
Aurkb  1.482251  MSCs
Mum1l1  1.47854  MSCs
Ccdc99  1.465444  MSCs
Foxs1  1.463887  MSCs
Pkp2  1.462688  MSCs
Lrig3  1.462068  MSCs
Zfp367  1.460579  MSCs
Rras2  1.455064  MSCs
Gtse1  1.452223  MSCs
Brca1  1.448116  MSCs
Cit  1.447295  MSCs
Scx  1.445946  MSCs
Clspn  1.444163  MSCs
RGD1310784  1.443659  MSCs
Dusp14  1.438739  MSCs
Slc35f2  1.437791  MSCs
Ccdc18  1.428912  MSCs
Ncapd2  1.424076  MSCs
Pkmyt1  1.423767  MSCs
Kifc1  1.422214  MSCs
Rfc4  1.418915  MSCs
LOC362464  1.41099  MSCs
Smad6  1.409944  MSCs
LOC682649  1.409515  MSCs
Slc7a1  1.407705  MSCs
Samd4a  1.401517  MSCs
Gadd45b  1.400404  MSCs
Pdlim7  1.396858  MSCs
Hist1h2bb_LOC684647  1.391249  MSCs
Hoxa1  1.388269  MSCs
RGD1566107  1.38725  MSCs
Chrdl2  1.386097  MSCs
Tead3  1.384032  MSCs
LOC684534  1.383994  MSCs
LOC689399  1.383116  MSCs
RGD1309051  1.379638  MSCs
Mthfd2  1.373176  MSCs
Igf2bp3  1.371842  MSCs
Hmgb2  1.367814  MSCs
Cdc45l  1.364643  MSCs
Rhobtb3  1.362241  MSCs
Bok  1.356159  MSCs
Tpd52  1.345157  MSCs
Igf2bp1  1.337675  MSCs
Dlc1  1.334957  MSCs
Lrba  1.331905  MSCs
Dscc1  1.325862  MSCs
Cdc25c  1.325676  MSCs
Btn2a2  1.323856  MSCs
Fas  1.321422  MSCs
Troap  1.312044  MSCs
Arf2  1.310841  MSCs
Mnd1  1.301351  MSCs
Farp2  1.301342  MSCs
LOC689296  1.298606  MSCs
Emd  1.296894  MSCs
Kif18a  1.293936  MSCs
Rnf150  1.293415  MSCs
Luzp5  1.291584  MSCs
RGD1563296  1.288481  MSCs
Col8a2  1.288463  MSCs
Actn1  1.282504  MSCs
Ankrd15  1.282415  MSCs
Rad54l  1.281679  MSCs
Ptpn14  1.280334  MSCs
Zfp469  1.280251  MSCs
Orc1l  1.279677  MSCs
Chst3  1.278381  MSCs
Myh9  1.277894  MSCs
Nacad  1.276872  MSCs
Eme1  1.276691  MSCs
Itgb1bp2  1.276208  MSCs
Dse  1.272807  MSCs
Csgalnact1  1.272569  MSCs
Trim59  1.266625  MSCs
LOC500700  1.264398  MSCs
Dsn1  1.262052  MSCs
Tmem144  1.261668  MSCs
Garnl4  1.260447  MSCs
Smad9  1.254173  MSCs
Rab9b  1.25407  MSCs
Dzip1l  1.252809  MSCs
RGD1308101  1.2528  MSCs
Myl6  1.250609  MSCs
Tmod2  1.249875  MSCs
Lrrcc1  1.246468  MSCs
Sgol1  1.246005  MSCs
RGD1309522  1.244637  MSCs
Ezh2  1.239213  MSCs
Mgat1  1.239113  MSCs
LOC500118  1.237289  MSCs
Gclm  1.236967  MSCs
Hip1  1.235322  MSCs
Ehbp1  1.233421  MSCs
Glra1  1.232095  MSCs
Hist1h2ail  1.23196  MSCs
Usp13  1.231576  MSCs
Nuak2  1.225236  MSCs
Csgalnact2  1.22522  MSCs
Cmtm4  1.224424  MSCs
Phex  1.222562  MSCs
Gmnn  1.22044  MSCs
Klhl31  1.219011  MSCs
Ptpn21  1.210203  MSCs
LOC680565  1.209748  MSCs
RGD1565493  1.209541  MSCs
RGD1305288  1.208245  MSCs
Cdkn2c  1.205427  MSCs
Ctps  1.199407  MSCs
Tead1  1.197407  MSCs
Sema3b  1.196813  MSCs
LOC679958  1.195581  MSCs
Fam81a  1.193139  MSCs
Fez2  1.193071  MSCs
Hebp2  1.191734  MSCs
Topbp1  1.188391  MSCs
Hist2h2bb  1.187189  MSCs
Ncapd3  1.176062  MSCs
Zfp57  1.172868  MSCs
Fermt2  1.168342  MSCs
Ddx11  1.165571  MSCs
Sdpr  1.165222  MSCs
Ankrd50  1.158041  MSCs
Hist1h2bn  1.154474  MSCs
Rbpms2  1.151599  MSCs
Cdca7  1.150413  MSCs
Raph1  1.150341  MSCs
Kpna2  1.147317  MSCs
Rnf19b  1.144871  MSCs
Pycs  1.143195  MSCs
H2afx  1.143114  MSCs
Spa17  1.139523  MSCs
Gng3  1.136039  MSCs
Kif13a  1.134418  MSCs
Sdc3  1.134226  MSCs
Dpysl3  1.130437  MSCs
Chd3  1.130081  MSCs
Pmaip1  1.128317  MSCs
Snta1  1.126853  MSCs
RGD1561444  1.126628  MSCs
Lima1  1.125223  MSCs
Il15  1.121316  MSCs
Ccdc34  1.119761  MSCs
Tuba1b  1.1123  MSCs
RGD1565514  1.111078  MSCs
RGD1311723  1.108701  MSCs
Chek1  1.108128  MSCs
Traip  1.104958  MSCs
Pif1  1.104593  MSCs
Myo10  1.10342  MSCs
Phlpp  1.102351  MSCs
Plcb3  1.100418  MSCs
Cltb  1.096971  MSCs
Slc9a3r2  1.096582  MSCs
Ccne1  1.093652  MSCs
Actn4  1.093246  MSCs
Ggt7  1.092181  MSCs
Mad2l1  1.08754  MSCs
Kif24  1.087532  MSCs
Pmf1  1.086  MSCs
Flnb  1.08499  MSCs
Phka2  1.083581  MSCs
Lhx2  1.08313  MSCs
RGD1559864  1.082552  MSCs
Znf569  1.079618  MSCs
Arhgap5  1.078053  MSCs
Hspg2  1.075367  MSCs
Lmo1  1.074913  MSCs
Ppm1e  1.073445  MSCs
Nup37  1.070194  MSCs
Rnd1  1.069068  MSCs
LOC366669  1.068581  MSCs
Tuba1c  1.067777  MSCs
Rad18  1.066146  MSCs
Specc1  1.063826  MSCs
Epb4.1l4a  1.063244  MSCs
Arid5b  1.060975  MSCs
Lmln  1.058453  MSCs
Ppp1r14c  1.057457  MSCs
Gm672  1.046797  MSCs
Dlx5  1.042727  MSCs
Tmpo  1.029393  MSCs
Dtnb  1.028785  MSCs
S100a11  1.026543  MSCs
Rab23  1.024861  MSCs
Brca2  1.018536  MSCs
RGD1305834  1.015468  MSCs
Lig1  1.012114  MSCs
RGD1566017  1.008647  MSCs
Lats2  1.007614  MSCs
Cxadr  1.002953  MSCs
Fchsd1  1.001617  MSCs
Smarcd3  1.000893  MSCs
Rpa1  1.000753  MSCs
Ccdc19  0.999318  MSCs
Bag2  0.999226  MSCs
RGD1306959  0.998331  MSCs
Klk14  0.99808  MSCs
RGD1310453  0.997023  MSCs
Mnat1  0.993812  MSCs
LOC688667_LOC688856  0.991433  MSCs
Chtf18  0.987898  MSCs
Rfc3  0.98663  MSCs
Ccdc126  0.986204  MSCs
Fancb  0.985542  MSCs
Ccdc6  0.983361  MSCs
Neil3  0.983124  MSCs
Hist1h2bl  0.982372  MSCs
Cenpn  0.979684  MSCs
RGD1564851  0.973588  MSCs
Hist2h3c2  0.972597  MSCs
Hist2h2ac  0.972071  MSCs
Cfl2  0.970339  MSCs
Cald1  0.967778  MSCs
Chac1  0.966312  MSCs
Slc43a1  0.966018  MSCs
Sacs  0.965841  MSCs
Man2a1  0.964464  MSCs
Etfb  0.961831  MSCs
Rmi1  0.96168  MSCs
Tsc22d2  0.959404  MSCs
Hivep1  0.958742  MSCs
Recql4  0.954728  MSCs
LOC502894  0.953326  MSCs
Tbx4  0.949726  MSCs
Ube2c  0.948554  MSCs
Fam118a  0.947657  MSCs
Atad5  0.947584  MSCs
Otub2  0.945935  MSCs
Rttn  0.938489  MSCs
Mcm2  0.936726  MSCs
Polr2f  0.933054  MSCs
Hspb1  0.929518  MSCs
Psph  0.929357  MSCs
Lrrfip1  0.927076  MSCs
Pold2  0.926056  MSCs
Kctd9  0.924984  MSCs
Egfl7  0.923466  MSCs
RGD1565800  0.922333  MSCs
Grin2d  0.919418  MSCs
Ica1  0.916431  MSCs
Pqlc3  0.908666  MSCs
Fmo4  0.905354  MSCs
RGD1563581  0.904496  MSCs
Ebf3  0.90002  MSCs
Btbd14a  0.899153  MSCs
Pid1  0.896085  MSCs
Vars  0.895589  MSCs
Deadc1  0.8951  MSCs
Cdh15  0.89501  MSCs
Cdc14b  0.892984  MSCs
Shmt1  0.891732  MSCs
Hdac7a  0.889758  MSCs
LOC680477  0.889563  MSCs
RGD1307897  0.889537  MSCs
Kifc3  0.889111  MSCs
Smyd2  0.887529  MSCs
Wsb2  0.887298  MSCs
Vwa1  0.885098  MSCs
RGD1309104  0.882353  MSCs
Ndufa12  0.882044  MSCs
Ankrd52  0.88161  MSCs
Tnfrsf10b  0.880219  MSCs
Znf618  0.877231  MSCs
Nup155  0.877008  MSCs
Rap1gds1  0.875686  MSCs
Hspa2  0.875312  MSCs
Spsb4  0.872341  MSCs
Lmnb2  0.872232  MSCs
Murc  0.863715  MSCs
Dus4l  0.860021  MSCs
Slfn4  0.856277  MSCs
Gpx8  0.855493  MSCs
Kcnk12  0.844082  MSCs
Lmo4  0.842343  MSCs
Rad51c  0.833717  MSCs
Prim2  0.833545  MSCs
Aldh9a1  0.833079  MSCs
Donson  0.831719  MSCs
LOC680531  0.828979  MSCs
RGD1561381  0.826888  MSCs
Snrpa  0.824862  MSCs
Osr1  0.823524  MSCs
Cenpl  0.821522  MSCs
Numbl  0.82063  MSCs
Prepl  0.820366  MSCs
RGD1310263  0.817502  MSCs
Clic4  0.817495  MSCs
Ddx1  0.817175  MSCs
Zfp382  0.814465  MSCs
Wdr51a  0.812226  MSCs
Srf  0.804712  MSCs
Pbx3  0.801977  MSCs
Kpnb1  0.800526  MSCs
Zswim6  0.799177  MSCs
Ifit1lb  0.796812  MSCs
Zfyve16  0.796099  MSCs
LOC687694  0.792873  MSCs
Ddx59  0.790222  MSCs
Ikzf2  0.787695  MSCs
Rock2  0.786616  MSCs
RGD1307392  0.785065  MSCs
Cep76  0.784812  MSCs
Gnrh1  0.783452  MSCs
Lce1l  0.781424  MSCs
Ckap5  0.780541  MSCs
Mkl1  0.778233  MSCs
Ppp1r13b  0.777774  MSCs
Rcbtb2  0.776925  MSCs
Fam98a  0.775675  MSCs
Tnk2  0.77436  MSCs
Nde1  0.772563  MSCs
Bcap29  0.771976  MSCs
Slc2a10  0.769331  MSCs
RGD1311357  0.766269  MSCs
LOC498265  0.764429  MSCs
Wdr1  0.764231  MSCs
Tubb2c  0.764179  MSCs
Fignl1  0.763991  MSCs
Lrp6  0.763018  MSCs
Xrcc5  0.759323  MSCs
Pus7  0.757055  MSCs
LOC303566  0.752332  MSCs
Tcof1  0.751344  MSCs
Arl6ip1  0.751149  MSCs
Arhgap21  0.74874  MSCs
Nup107  0.745331  MSCs
Cyb5r1  0.744133  MSCs
LOC499418  0.743873  MSCs
Uqcrq  0.737866  MSCs
Vps26b  0.737671  MSCs
Xkr5  0.735592  MSCs
Nme1  0.734099  MSCs
Ptdss1  0.729706  MSCs
Cdh24  0.728564  MSCs
Dock6  0.726929  MSCs
Dzip1  0.726798  MSCs
Pold1  0.723757  MSCs
Nup133  0.720452  MSCs
Fam124a  0.719128  MSCs
Dst  0.717113  MSCs
Igf1r  0.712456  MSCs
Pcyox1  0.711349  MSCs
Cend1  0.708776  MSCs
Gpr75  0.708735  MSCs
Mthfd1  0.697434  MSCs
Pcaf  0.697153  MSCs
Hat1  0.696301  MSCs
Hist1h2aa  0.694028  MSCs
Pfn1  0.690669  MSCs
Mras  0.689433  MSCs
RGD1307648  0.688893  MSCs
Znf483  0.683073  MSCs
Rad50  0.68213  MSCs
Eif2b2  0.681619  MSCs
Cdk5rap2  0.67669  MSCs
Sh3bp5l  0.675682  MSCs
Galnt12  0.67034  MSCs
Nasp  0.668743  MSCs
Prkci  0.668136  MSCs
Serpinb2  0.663595  MSCs
Jtv1  0.662664  MSCs
Epb4.1l5  0.662612  MSCs
Mapk12  0.661378  MSCs
Oxsr1  0.660418  MSCs
Nup85  0.656737  MSCs
Ankrd45  0.653621  MSCs
Fam46b  0.653448  MSCs
Rangap1  0.650477  MSCs
Sec61g  0.647782  MSCs
Pfkfb1  0.646639  MSCs
Paqr6  0.646637  MSCs
Tbrg1  0.645657  MSCs
Tubb5  0.642566  MSCs
Pls3  0.640865  MSCs
Plekhg3  0.639197  MSCs
Rap2ip  0.625015  MSCs
Cpeb3  0.624811  MSCs
RGD1306227  0.624198  MSCs
Ccdc85b  0.619519  MSCs
Kcnh2  0.619411  MSCs
Mcm8  0.615113  MSCs
Kif2a  0.61001  MSCs
Mxd3  0.60728  MSCs
Nlp  0.606752  MSCs
Spna2  0.604583  MSCs
Hcca2  0.603778  MSCs
RGD1562044  0.602541  MSCs
Apbb1  0.5993  MSCs
Hdgf  0.592069  MSCs
Uba6  0.582823  MSCs
Tnfsf15  0.582577  MSCs
Dars2  0.580561  MSCs
Epm2a  0.578651  MSCs
Lrrfip2  0.577533  MSCs
Hsf2bp  0.573738  MSCs
Efemp2  0.558699  MSCs
LOC500054  0.540898  MSCs
Hdac6  0.526934  MSCs

Appendix B Candidate pluripotency genes used for seriation analysis in Chapter 2

Table B.1 Candidate pluripotency genes used for seriation analysis in Chapter 2
A list of 319 candidate pluripotency genes selected by Dr. Connie Eaves' laboratory and used to identify markers whose expression is increased in undifferentiated ES cells (Supercontig 1). Within each supercontig, the genes are sorted alphabetically.
Gene symbol  Seriation result  Comment
ACTC1  Supercontig 1  Enriched in undifferentiated ES cells
ACVR2B  Supercontig 1  Enriched in undifferentiated ES cells
ADAM23  Supercontig 1  Enriched in undifferentiated ES cells
ARID3B  Supercontig 1  Enriched in undifferentiated ES cells
AURKB  Supercontig 1  Enriched in undifferentiated ES cells
C15orf15  Supercontig 1  Enriched in undifferentiated ES cells
CA14  Supercontig 1  Enriched in undifferentiated ES cells
CD24L4  Supercontig 1  Enriched in undifferentiated ES cells
CDX2  Supercontig 1  Enriched in undifferentiated ES cells
CENPK  Supercontig 1  Enriched in undifferentiated ES cells
CER1  Supercontig 1  Enriched in undifferentiated ES cells
CGB  Supercontig 1  Enriched in undifferentiated ES cells
COASY  Supercontig 1  Enriched in undifferentiated ES cells
CRABP2  Supercontig 1  Enriched in undifferentiated ES cells
CTH  Supercontig 1  Enriched in undifferentiated ES cells
CTNNB1  Supercontig 1  Enriched in undifferentiated ES cells
CYP26A1  Supercontig 1  Enriched in undifferentiated ES cells
DAZL  Supercontig 1  Enriched in undifferentiated ES cells
DIAPH2  Supercontig 1  Enriched in undifferentiated ES cells
DKC1  Supercontig 1  Enriched in undifferentiated ES cells
DPPA2  Supercontig 1  Enriched in undifferentiated ES cells
DPPA5  Supercontig 1  Enriched in undifferentiated ES cells
DSG2  Supercontig 1  Enriched in undifferentiated ES cells
EED  Supercontig 1  Enriched in undifferentiated ES cells
EEF1A1  Supercontig 1  Enriched in undifferentiated ES cells
EIF4A1  Supercontig 1  Enriched in undifferentiated ES cells
EPCAM  Supercontig 1  Enriched in undifferentiated ES cells
ETV4  Supercontig 1  Enriched in undifferentiated ES cells
FABP5  Supercontig 1  Enriched in undifferentiated ES cells
FAM46B  Supercontig 1  Enriched in undifferentiated ES cells
FAM64A  Supercontig 1  Enriched in undifferentiated ES cells
FBXO15  Supercontig 1  Enriched in undifferentiated ES cells
FGF4  Supercontig 1  Enriched in undifferentiated ES cells
FGF5  Supercontig 1  Enriched in undifferentiated ES cells
FGF8  Supercontig 1  Enriched in undifferentiated ES cells
FLJ10884  Supercontig 1  Enriched in undifferentiated ES cells
FN1  Supercontig 1  Enriched in undifferentiated ES cells
FOXD3  Supercontig 1  Enriched in undifferentiated ES cells
FOXH1  Supercontig 1  Enriched in undifferentiated ES cells
GAL  Supercontig 1  Enriched in undifferentiated ES cells
GBX2  Supercontig 1  Enriched in undifferentiated ES cells
GDF3  Supercontig 1  Enriched in undifferentiated ES cells
GPC4  Supercontig 1  Enriched in undifferentiated ES cells
GRB7  Supercontig 1  Enriched in undifferentiated ES cells
HMGB2  Supercontig 1  Enriched in undifferentiated ES cells
HNRPA1L3  Supercontig 1  Enriched in undifferentiated ES cells
HOMER1  Supercontig 1  Enriched in undifferentiated ES cells
IFITM2  Supercontig 1  Enriched in undifferentiated ES cells
IGF2BP3  Supercontig 1  Enriched in undifferentiated ES cells
INS  Supercontig 1  Enriched in undifferentiated ES cells
ISL1  Supercontig 1  Enriched in undifferentiated ES cells
KANK4  Supercontig 1  Enriched in undifferentiated ES cells
KPNA2  Supercontig 1  Enriched in undifferentiated ES cells
LAMB1  Supercontig 1  Enriched in undifferentiated ES cells
LECT1  Supercontig 1  Enriched in undifferentiated ES cells
LEFTY1  Supercontig 1  Enriched in undifferentiated ES cells
LEFTY2  Supercontig 1  Enriched in undifferentiated ES cells
LIN28  Supercontig 1  Enriched in undifferentiated ES cells
LRRN1  Supercontig 1  Enriched in undifferentiated ES cells
MAD2L2  Supercontig 1  Enriched in undifferentiated ES cells
MMP1  Supercontig 1  Enriched in undifferentiated ES cells
MTHFD1  Supercontig 1  Enriched in undifferentiated ES cells
MYBL2  Supercontig 1  Enriched in undifferentiated ES cells
NANOGP8  Supercontig 1  Enriched in undifferentiated ES cells
NES  Supercontig 1  Enriched in undifferentiated ES cells
NLRP2  Supercontig 1  Enriched in undifferentiated ES cells
NMU  Supercontig 1  Enriched in undifferentiated ES cells
NOL11  Supercontig 1  Enriched in undifferentiated ES cells
NPM1  Supercontig 1  Enriched in undifferentiated ES cells
NPPA  Supercontig 1  Enriched in undifferentiated ES cells
NR0B1  Supercontig 1  Enriched in undifferentiated ES cells
NUDT5  Supercontig 1  Enriched in undifferentiated ES cells
OTX2  Supercontig 1  Enriched in undifferentiated ES cells
PITX2  Supercontig 1  Enriched in undifferentiated ES cells
POU5F1  Supercontig 1  Enriched in undifferentiated ES cells
PTTG1  Supercontig 1  Enriched in undifferentiated ES cells
RAB3B  Supercontig 1  Enriched in undifferentiated ES cells
REXO2  Supercontig 1  Enriched in undifferentiated ES cells
ROR1  Supercontig 1  Enriched in undifferentiated ES cells
RP6-213H19.1  Supercontig 1  Enriched in undifferentiated ES cells
RPL17  Supercontig 1  Enriched in undifferentiated ES cells
RPL23  Supercontig 1  Enriched in undifferentiated ES cells
RPL6P27  Supercontig 1  Enriched in undifferentiated ES cells
SALL2  Supercontig 1  Enriched in undifferentiated ES cells
SALL4  Supercontig 1  Enriched in undifferentiated ES cells
SCGB3A2  Supercontig 1  Enriched in undifferentiated ES cells
SET  Supercontig 1  Enriched in undifferentiated ES cells
SILV  Supercontig 1  Enriched in undifferentiated ES cells
SLC16A1  Supercontig 1  Enriched in undifferentiated ES cells
SLC39A10  Supercontig 1  Enriched in undifferentiated ES cells
SMS  Supercontig 1  Enriched in undifferentiated ES cells
SNRPF  Supercontig 1  Enriched in undifferentiated ES cells
SNRPN  Supercontig 1  Enriched in undifferentiated ES cells
SOX2  Supercontig 1  Enriched in undifferentiated ES cells
SST  Supercontig 1  Enriched in undifferentiated ES cells
SYCP3  Supercontig 1  Enriched in undifferentiated ES cells
TBX4  Supercontig 1  Enriched in undifferentiated ES cells
TCL1A  Supercontig 1  Enriched in undifferentiated ES cells
TDGF1  Supercontig 1  Enriched in undifferentiated ES cells
TERF1  Supercontig 1  Enriched in undifferentiated ES cells
TEX19  Supercontig 1  Enriched in undifferentiated ES cells
TFCP2L1  Supercontig 1  Enriched in undifferentiated ES cells
TPM1  Supercontig 1  Enriched in undifferentiated ES cells
TRAP1  Supercontig 1  Enriched in undifferentiated ES cells
TUBB  Supercontig 1  Enriched in undifferentiated ES cells
USO1  Supercontig 1  Enriched in undifferentiated ES cells
UTF1  Supercontig 1  Enriched in undifferentiated ES cells
VASH2  Supercontig 1  Enriched in undifferentiated ES cells
VAT1L  Supercontig 1  Enriched in undifferentiated ES cells
WDSOF1  Supercontig 1  Enriched in undifferentiated ES cells
ZFP42  Supercontig 1  Enriched in undifferentiated ES cells
ZFP57  Supercontig 1  Enriched in undifferentiated ES cells
ZIC3  Supercontig 1  Enriched in undifferentiated ES cells
ZSCAN10  Supercontig 1  Enriched in undifferentiated ES cells
AC002480.6  Supercontig 2  Enriched in differentiated ES cells
ACTB  Supercontig 2  Enriched in differentiated ES cells
AK3  Supercontig 2  Enriched in differentiated ES cells
AMMECR1  Supercontig 2  Enriched in differentiated ES cells
ANKRD10  Supercontig 2  Enriched in differentiated ES cells
ARL5B  Supercontig 2  Enriched in differentiated ES cells
ASH2L  Supercontig 2  Enriched in differentiated ES cells
AURKA  Supercontig 2  Enriched in differentiated ES cells
B2M  Supercontig 2  Enriched in differentiated ES cells
BIRC5  Supercontig 2  Enriched in differentiated ES cells
C12orf48  Supercontig 2  Enriched in differentiated ES cells
CACHD1  Supercontig 2  Enriched in differentiated ES cells
CALU  Supercontig 2  Enriched in differentiated ES cells
CCNB1IP1  Supercontig 2  Enriched in differentiated ES cells
CCNC  Supercontig 2  Enriched in differentiated ES cells
CCT8  Supercontig 2  Enriched in differentiated ES cells
CDC2  Supercontig 2  Enriched in differentiated ES cells
CDH5  Supercontig 2  Enriched in differentiated ES cells
CDK2  Supercontig 2  Enriched in differentiated ES cells
CDT1  Supercontig 2  Enriched in differentiated ES cells
CENPF  Supercontig 2  Enriched in differentiated ES cells
CHORDC1  Supercontig 2  Enriched in differentiated ES cells
CHRNA7  Supercontig 2  Enriched in differentiated ES cells
COBL  Supercontig 2  Enriched in differentiated ES cells
COL2A1  Supercontig 2  Enriched in differentiated ES cells
COL5A2  Supercontig 2  Enriched in differentiated ES cells
COMMD3  Supercontig 2  Enriched in differentiated ES cells
CRABP1  Supercontig 2  Enriched in differentiated ES cells
CXorf15  Supercontig 2  Enriched in differentiated ES cells
DDX21  Supercontig 2  Enriched in differentiated ES cells
DES  Supercontig 2  Enriched in differentiated ES cells
DPPA4  Supercontig 2  Enriched in differentiated ES cells
EDNRB  Supercontig 2  Enriched in differentiated ES cells
EIF4EBP1  Supercontig 2  Enriched in differentiated ES cells
ELOVL6  Supercontig 2  Enriched in differentiated ES cells
EPRS  Supercontig 2  Enriched in differentiated ES cells
ERBB2  Supercontig 2  Enriched in differentiated ES cells
ESRRB  Supercontig 2  Enriched in differentiated ES cells
ETV5  Supercontig 2  Enriched in differentiated ES cells
FAM83D  Supercontig 2  Enriched in differentiated ES cells
FGF2  Supercontig 2  Enriched in differentiated ES cells
FLT1  Supercontig 2  Enriched in differentiated ES cells
GAPDH  Supercontig 2  Enriched in differentiated ES cells
GCG  Supercontig 2  Enriched in differentiated ES cells
GCM1  Supercontig 2  Enriched in differentiated ES cells
GFAP  Supercontig 2  Enriched in differentiated ES cells
GGTLA1  Supercontig 2  Enriched in differentiated ES cells
GLIS2  Supercontig 2  Enriched in differentiated ES cells
GNL3  Supercontig 2  Enriched in differentiated ES cells
HBB  Supercontig 2  Enriched in differentiated ES cells
HBZ  Supercontig 2  Enriched in differentiated ES cells
HCK  Supercontig 2  Enriched in differentiated ES cells
HDAC2  Supercontig 2  Enriched in differentiated ES cells
HMGA1  Supercontig 2  Enriched in differentiated ES cells
HMGB3  Supercontig 2  Enriched in differentiated ES cells
HSPA4  Supercontig 2  Enriched in differentiated ES cells
HSPD1  Supercontig 2  Enriched in differentiated ES cells
IAPP  Supercontig 2  Enriched in differentiated ES cells
IDH1  Supercontig 2  Enriched in differentiated ES cells
IFITM1  Supercontig 2  Enriched in differentiated ES cells
IGF1R  Supercontig 2  Enriched in differentiated ES cells
IGF2  Supercontig 2  Enriched in differentiated ES cells
IGF2BP2  Supercontig 2  Enriched in differentiated ES cells
IMPDH2  Supercontig 2  Enriched in differentiated ES cells
JMJD2C  Supercontig 2  Enriched in differentiated ES cells
KIF4A  Supercontig 2  Enriched in differentiated ES cells
KIT  Supercontig 2  Enriched in differentiated ES cells
KLF2  Supercontig 2  Enriched in differentiated ES cells
KLF4  Supercontig 2  Enriched in differentiated ES cells
KLF5  Supercontig 2  Enriched in differentiated ES cells
LAMA1  Supercontig 2  Enriched in differentiated ES cells
LAMC1  Supercontig 2  Enriched in differentiated ES cells
LAPTM4B  Supercontig 2  Enriched in differentiated ES cells
LDHB  Supercontig 2  Enriched in differentiated ES cells
LIFR  Supercontig 2  Enriched in differentiated ES cells
LMAN1  Supercontig 2  Enriched in differentiated ES cells
LMNB2  Supercontig 2  Enriched in differentiated ES cells
MANBA  Supercontig 2  Enriched in differentiated ES cells
MGST1  Supercontig 2  Enriched in differentiated ES cells
MKRNP6  Supercontig 2  Enriched in differentiated ES cells
MTHFD2  Supercontig 2  Enriched in differentiated ES cells
MTMR7  Supercontig 2  Enriched in differentiated ES cells
MYC  Supercontig 2  Enriched in differentiated ES cells
NCAPG2  Supercontig 2  Enriched in differentiated ES cells
NFYC  Supercontig 2  Enriched in differentiated ES cells
NODAL  Supercontig 2  Enriched in differentiated ES cells
NR5A2  Supercontig 2  Enriched in differentiated ES cells
NR6A1  Supercontig 2  Enriched in differentiated ES cells
NUMB  Supercontig 2  Enriched in differentiated ES cells
NUSAP1  Supercontig 2  Enriched in differentiated ES cells
OLA1  Supercontig 2  Enriched in differentiated ES cells
PAX6  Supercontig 2  Enriched in differentiated ES cells
PECAM1  Supercontig 2  Enriched in differentiated ES cells
PHC1  Supercontig 2  Enriched in differentiated ES cells
PHF17  Supercontig 2  Enriched in differentiated ES cells
PODXL  Supercontig 2  Enriched in differentiated ES cells
POU4F2  Supercontig 2  Enriched in differentiated ES cells
PPAT  Supercontig 2  Enriched in differentiated ES cells
PSMA2  Supercontig 2  Enriched in differentiated ES cells
PSMA3  Supercontig 2  Enriched in differentiated ES cells
PTEN  Supercontig 2  Enriched in differentiated ES cells
RBBP9  Supercontig 2  Enriched in differentiated ES cells
RCC2  Supercontig 2  Enriched in differentiated ES cells
REST  Supercontig 2  Enriched in differentiated ES cells
RPL10A  Supercontig 2  Enriched in differentiated ES cells
RPL24  Supercontig 2  Enriched in differentiated ES cells
RPL4  Supercontig 2  Enriched in differentiated ES cells
RPL7  Supercontig 2  Enriched in differentiated ES cells
RPLP0P6  Supercontig 2  Enriched in differentiated ES cells
RPS24  Supercontig 2  Enriched in differentiated ES cells
RRP12  Supercontig 2  Enriched in differentiated ES cells
SALL1  Supercontig 2  Enriched in differentiated ES cells
SDC4  Supercontig 2  Enriched in differentiated ES cells
SEMA3A  Supercontig 2  Enriched in differentiated ES cells
SEPHS1  Supercontig 2  Enriched in differentiated ES cells
SERPINH1  Supercontig 2  Enriched in differentiated ES cells
SFRP2  Supercontig 2  Enriched in differentiated ES cells
SFRS7  Supercontig 2  Enriched in differentiated ES cells
SMAD2  Supercontig 2  Enriched in differentiated ES cells
SMAD3  Supercontig 2  Enriched in differentiated ES cells
SMC4  Supercontig 2  Enriched in differentiated ES cells
SOCS2  Supercontig 2  Enriched in differentiated ES cells
SPP1  Supercontig 2  Enriched in differentiated ES cells
SSB  Supercontig 2  Enriched in differentiated ES cells
SYP  Supercontig 2  Enriched in differentiated ES cells
TAT  Supercontig 2  Enriched in differentiated ES cells
TBC1D23  Supercontig 2  Enriched in differentiated ES cells
TBX3  Supercontig 2  Enriched in differentiated ES cells
TCF15  Supercontig 2  Enriched in differentiated ES cells
TCF3  Supercontig 2  Enriched in differentiated ES cells
TH  Supercontig 2  Enriched in differentiated ES cells
TLE4  Supercontig 2  Enriched in differentiated ES cells
TNNT1  Supercontig 2  Enriched in differentiated ES cells
TPX2  Supercontig 2  Enriched in differentiated ES cells
TUBB4  Supercontig 2  Enriched in differentiated ES cells
UGP2  Supercontig 2  Enriched in differentiated ES cells
WDR77  Supercontig 2  Enriched in differentiated ES cells
WNT3  Supercontig 2  Enriched in differentiated ES cells
WT1  Supercontig 2  Enriched in differentiated ES cells
XIST  Supercontig 2  Enriched in differentiated ES cells
XPO1  Supercontig 2  Enriched in differentiated ES cells
ZFPM2  Supercontig 2  Enriched in differentiated ES cells
ZNF117  Supercontig 2  Enriched in differentiated ES cells
ZNF43  Supercontig 2  Enriched in differentiated ES cells
ZNF90  Supercontig 2  Enriched in differentiated ES cells
ALDH18A1  Supercontig 3  Other
BXDC2  Supercontig 3  Other
C13orf7  Supercontig 3  Other
C20orf129  Supercontig 3  Other
CCNB1  Supercontig 3  Other
CD9  Supercontig 3  Other
CLDN6  Supercontig 3  Other
CLDN7  Supercontig 3  Other
COL1A1  Supercontig 3  Other
DDX4  Supercontig 3  Other
DLG7  Supercontig 3  Other
DLGAP5  Supercontig 3  Other
DNMT3B  Supercontig 3  Other
ERVWE1  Supercontig 3  Other
FAM29A  Supercontig 3  Other
FZD7  Supercontig 3  Other
GABARAPL1  Supercontig 3  Other
GABRB3  Supercontig 3  Other
GALNT7  Supercontig 3  Other
GJA1  Supercontig 3  Other
GSH1  Supercontig 3  Other
HNRNPAB  Supercontig 3  Other
IGF2BP1  Supercontig 3  Other
IL6ST  Supercontig 3  Other
IPO7  Supercontig 3  Other
JMJD1A  Supercontig 3  Other
KRT1  Supercontig 3  Other
KRT18P19  Supercontig 3  Other
MCM2  Supercontig 3  Other
MDK  Supercontig 3  Other
MTF2  Supercontig 3  Other
MYF5  Supercontig 3  Other
NASP  Supercontig 3  Other
NBR2  Supercontig 3  Other
NME1-NME2  Supercontig 3  Other
NOG  Supercontig 3  Other
NUP205  Supercontig 3  Other
PAMR1  Supercontig 3  Other
PAX4  Supercontig 3  Other
PCNA  Supercontig 3  Other
PDX1  Supercontig 3  Other
PHIP  Supercontig 3  Other
POLR3G  Supercontig 3  Other
PSIP1  Supercontig 3  Other
PTF1A  Supercontig 3  Other
PTPRZ1  Supercontig 3  Other
RAD51AP1  Supercontig 3  Other
RAF1  Supercontig 3  Other
SCNN1A  Supercontig 3  Other
SEMA6A  Supercontig 3  Other
SKP2  Supercontig 3  Other
TCEA1  Supercontig 3  Other
TERT  Supercontig 3  Other
TK1  Supercontig 3  Other
UBA2  Supercontig 3  Other
UBE2T  Supercontig 3  Other
ZMYM2  Supercontig 3  Other
ZNF257  Supercontig 3  Other
ZNF263  Supercontig 3  Other
ZNF296  Supercontig 3  Other

Appendix C Transcripts enriched and depleted in NBL TICs

Table C.1 Transcripts enriched and depleted in NBL TICs

Transcripts enriched (LogFC > 0) or depleted (LogFC < 0) in NBL TICs compared to SKPs and other cancers were identified as described in Chapter 3. The table lists the RNA-Seq-based log fold change values (LogFC) for NBL TICs vs. SKPs and for NBL TICs vs. other cancers. Enriched transcripts are listed first, followed by depleted transcripts; within each group, transcripts are ordered by the magnitude of the log fold change in the NBL TICs vs. SKPs comparison.

Gene symbol  LogFC NBL TICs vs. SKPs  LogFC NBL TICs vs. cancers
MKI67  26.96693646  20.24348157
SYNE2  24.89213386  9.826071774
NUP210  23.07971503  11.262793
SLC38A1  21.94938243  9.734695824
ODC1  18.84307121  13.50929761
TOP2A  16.30455777  9.608052033
RRM2  15.57444611  11.82179336
UCP2  15.57083886  5.3169606
CENPF  15.37177492  8.159367496
MYBL2  15.19227895  9.661516845
ATP8A1  14.8453599  10.23366053
PPP1R16B  14.30008339  10.15757581
LMNB1  13.41180995  8.246556372
HMGB2  12.86230979  7.356895948
KIAA0922  12.60446044  11.94840748
CCDC88C  12.49395783  6.094528175
MCM2  12.35126443  8.644136302
CYFIP2  11.91765271  13.49135946
BUB1  11.88377025  6.827490744
MCM4  11.70087367  8.786203302
HNRNPU  11.65596287  7.663725376
SLC1A4  11.59411326  7.631615754
NUSAP1  11.28618426  6.42912675
WHSC1  11.23229062  7.525906497
NCAPD2  11.2119662  7.998102303
MCM3  11.20572045  6.916758222
MCM7  10.90358042  7.294148992
TPX2  10.88247709  5.193337769
GLCCI1  10.61611644  8.33243708
TMPO  9.895594821  6.576471432
FOXM1  9.872573723  6.259132006
SFRS2  9.852469772  5.338115715
PARP1  9.806251555  5.963989379
FANCI  9.739436411  6.190187583
PLK1  9.715453192  5.98535316
POLE  9.644724181  7.005488763
BCL2  9.431311828  6.027401695
LBR  9.202775362  5.473504232
SLAIN1  9.109385266  4.29268503
HTT  9.030088243  8.781018918
SPAG5  8.737359666  5.036636031
KIF11  8.677708674  4.924155792
REC8  8.641534886  5.177187826
SFRS1  8.629057042  5.233936209
GTSE1  8.586073335  5.376473968
EPB41  8.582966105  5.082246866
KNTC1  8.58091033  5.259046823
PCNA  8.578027568  6.019505716
C13orf23  8.522257983  6.505111731
HELLS  8.413872205  5.294195148
KPNB1  8.349121905  6.883673178
TYMS  8.31849503  7.1039114
CCNA2  8.275261702  5.608805409
STRBP  8.265040666  4.128132161
FANCA  8.250961879  4.763474473
TNIK  8.250470164  6.347733225
SFPQ  8.21920524  7.393266297
SSH2  8.123243985  4.700551118
HNRNPD  8.118885354  6.12481756
BRCA1  8.080316984  4.694020474
BUB1B  8.022160021  4.766682184
TRAP1  8.020173086  4.726922977
HNRNPH1  7.910774749  6.031343787
POLQ  7.904287215  4.522189398
CCNF  7.83925954  5.109190013
DTL  7.827754961  4.950282716
TIMELESS  7.814059837  4.270228836
NCAPH  7.764024636  4.896804715
ESPL1  7.715435082  5.328877224
HNRNPM  7.704609653  5.00087792
FANCD2  7.676870984  3.472258776
LARP1  7.652278445  8.553497042
EZH2  7.600979598  4.249810161
KIF2C  7.476052997  4.112622992
PAICS  7.374446298  6.454305355
NCAPD3  7.36873723  4.220636378
SSRP1  7.343881609  5.213869182
ZWINT  7.27778899  3.742347744
CXXC4  7.267073319  4.16133507
TTF2  7.236692033  4.843751718
HMMR  7.233360734  4.540960488
H2AFZ  7.225104774  5.098107304
RRM1  7.194620552  4.913623607
ARHGAP11A  7.136910455  3.889090542
KIAA0226  7.132822101  3.950143242
C15orf42  7.131260335  5.094445131
NCAPG2  7.128260642  3.913703109
CDCA5  7.063811558  4.306104755
WDR62  6.987701084  4.191356296
FUBP1  6.929092106  5.184804987
ASF1B  6.921080204  4.046731342
FEN1  6.851343898  5.216085613
FCHSD2  6.848150542  3.684554185
INCENP  6.833092793  4.923179017
CDCA8  6.831699999  3.587798407
BLM  6.730885681  3.525631711
MCM6  6.722370964  4.874330191
NIPA1  6.702036174  3.742189044
SH2D3C  6.699618285  4.390826168
KCNQ5  6.680977014  5.889312755
BIRC5  6.679754362  3.515619857
CHAF1A  6.665332109  3.93000489
COCH  6.633708753  4.998343929
KIF15  6.596667348  3.711546305
E2F8  6.571113522  4.14937197
RFC3  6.55482491  3.814897155
NOLC1  6.552213307  5.163050984
TOPBP1  6.549238094  3.391938776
R3HDM1  6.49768625  5.017031599
KIAA0101  6.385035305  3.936724279
DLGAP5  6.325357025  3.69758135
STIL  6.309741949  2.937222442
FCHO1  6.300447737  3.948016621
CCNB2  6.275291841  3.364012223
E2F2  6.247975595  3.554071586
PPM1G  6.23586497  3.72597527
MTHFD1  6.226972732  4.768734908
PRR11  6.19794566  4.568309711
TROAP  6.15467655  3.4151157
TAF15  6.128098956  4.719746686
HJURP  6.125644276  2.833539817
FTSJD2  6.125379201  4.413928313
RFX5  6.103610472  3.891829312
AURKB  6.016255948  3.551091541
DHTKD1  6.015728164  2.518022525
MTMR4  5.984615575  5.538871904
ATIC  5.954323285  5.265440496
RFWD3  5.89016915  4.080894344
MCM10  5.827584837  3.015809736
SLC7A6  5.783429301  5.653614402
CDC6  5.782590016  3.128265846
PLK4  5.778220979  3.090520595
EXO1  5.775830175  3.022462056
CLSPN  5.738688325  3.145796671
INTS7  5.706283007  3.181182621
CDC45L  5.689564308  3.468064778
PDCD11  5.681473873  5.31686487
XRCC2  5.669930973  3.595010252
MARS  5.664132882  4.845495405
PSME3  5.63737278  3.774099378
SLCO5A1  5.594555538  4.17915762
POLA2  5.552885592  4.279033683
ARID3B  5.491653346  4.217263113
MSH2  5.482425558  1.794540467
GOT2  5.475664796  3.79530811
PDE7A  5.382296032  2.765055754
ORC1L  5.367773468  3.895330687
ORC6L  5.367417719  3.467293101
SUPT16H  5.367406587  4.208585658
ANKRD44  5.34289189  4.401552665
FSD1L  5.339145486  4.078777794
C13orf3  5.298071971  2.721428925
GINS2  5.279494061  3.346340865
DEPDC1B  5.246522316  2.524779376
UBAP2  5.24211682  3.456639147
HNRNPR  5.202415118  3.814980856
CDCA2  5.195174062  2.889317414
MCM8  5.179990587  2.750677467
LRRC61  5.01565871  2.333247222
GTPBP1  5.003982945  5.071726797
CPSF6  4.981033566  4.590501629
CDC25A  4.976662591  2.897095077
WDR76  4.886753231  3.344157827
GINS1  4.884026824  2.154240018
GRK6  4.857117722  3.042371941
POLR1B  4.833254474  2.792918357
CEP192  4.813957861  2.077827663
SART3  4.79851291  3.294444378
NNT  4.78307819  4.503689785
KARS  4.730879121  3.214759655
SAFB  4.715110885  3.33951508
UBR7  4.680847249  2.541130104
CKAP2L  4.657502861  2.606384436
SMG7  4.643623934  3.364262941
XPO5  4.631385079  3.880937709
WBP11  4.609014895  4.1022292
FIGNL1  4.606627418  2.08154431
POP1  4.591867213  3.59779264
POU2F1  4.563974482  2.181336015
MAD2L1  4.561555188  2.508935283
LARP5  4.558911544  2.65342629
RNF34  4.544089795  2.934788489
BRI3BP  4.532722879  2.871632222
KIF18A  4.522376826  2.206534579
WDHD1  4.479557562  2.520464807
SFRS14  4.465183576  4.195033652
GSG2  4.460019337  2.711871862
MLH1  4.425945334  3.479686613
RAD51AP1  4.414012376  2.04157747
BRIP1  4.405527838  2.329165924
MARS2  4.398022854  2.932305616
SFXN1  4.397965629  3.007931772
POLR1A  4.382775726  5.193290309
RAD54L  4.370926787  2.608848739
SCLY  4.355601848  3.445955091
NUP214  4.353240599  2.936660367
CDC7  4.295084175  2.081913712
POLE3  4.291177896  2.103924077
TNPO3  4.280898854  3.598140761
PRIM1  4.274990302  2.025835536
FAM60A  4.256602585  2.075988358
GART  4.243787156  3.429799675
VPRBP  4.215273963  3.15503957
VRK1  4.188937853  2.163563272
GPHN  4.161746127  3.38821531
DCLRE1A  4.146642533  1.79077843
ARHGAP19  4.132293427  2.061365319
PAXIP1  4.124460519  3.004467999
WDR77  4.123325715  3.025945145
SMARCD1  4.120511779  3.615830263
NUP107  4.045449327  1.753345509
C18orf24  4.04326302  2.018041286
NUP93  4.027404583  2.945193827
RFC5  4.010794374  1.354433287
SPC24  3.992537717  2.214071882
NUP88  3.943067477  2.388249501
IMMT  3.914790876  2.508472502
CLN6  3.900640665  2.27579406
AKIRIN2  3.900030725  2.836039109
CCDC21  3.879431224  1.437645446
FBXO5  3.828820351  2.238630699
HIRA  3.82496504  2.994567893
SKP2  3.815969671  2.416286639
HIRIP3  3.80741046  1.628796146
TARS2  3.802558603  1.3672385
CDC23  3.802042917  1.860206111
CHAF1B  3.783016334  2.054530361
MRPL37  3.780801493  2.40942888
ZNF142  3.770070642  3.048480677
UBE2N  3.750756431  2.344418087
CASP2  3.750512131  2.340320183
FH  3.744416849  2.043887261
SUV39H2  3.74039011  2.195390412
WDR33  3.738294696  1.758481797
GSTCD  3.737602949  2.162448303
RQCD1  3.734033552  3.535254452
TOP3A  3.725890353  2.635567545
ADORA2A  3.717888226  4.442241391
KIF24  3.697520999  1.409909211
AOF2  3.687197714  2.599503608
GINS3  3.662782269  1.739907398
POLR2D  3.628248484  2.931046889
FLVCR1  3.618104604  1.996451151
GINS4  3.615801209  2.356390981
CNOT10  3.613905301  1.685010236
SMCR7L  3.608625305  3.212237855
PLAGL2  3.524856809  2.109283482
DARS2  3.51768568  1.487205349
TTLL4  3.476111783  2.441340128
MCCC1  3.471570139  1.534167246
DBF4  3.45884113  1.777369717
SFRS15  3.424266851  3.379665735
IL7  3.38843542  2.267578323
COX15  3.378255239  1.412871034
CAMK1D  3.377882359  3.177801349
ZBED4  3.336441744  2.829901878
USP10  3.268405624  2.576268409
KIAA0406  3.234756667  2.035658762
LIG3  3.229174992  1.64472414
GTF2I  3.220667531  3.450350357
INTS9  3.198541778  2.434484233
IQCB1  3.178188401  1.643674056
SPC25  3.106304875  1.380774585
CIRH1A  3.077488856  3.202130099
UIMC1  3.073808766  1.733997127
SLC25A44  3.036954121  1.439039613
TUBGCP4  3.036740346  2.074667408
DBF4B  3.033985363  1.140479257
NEIL3  3.02537989  1.304399255
OBFC2B  3.021603876  2.190940372
MRPL39  3.019986937  1.567427458
NEU3  3.017587736  1.545475561
TDP1  3.003823574  1.52193641
EPS15L1  3.003625319  1.814080644
OIP5  2.96368369  1.170646238
TRAIP  2.937710558  1.031190933
YY1AP1  2.898722858  1.609558842
CENPH  2.881530426  1.902673357
MND1  2.876177385  1.564408195
CCDC150  2.795805465  0.886209742
CCDC138  2.765815084  1.134923442
C2orf44  2.751755693  0.906136788
TUBGCP3  2.721374409  1.77509212
C17orf53  2.68997555  1.582560662
ZNF367  2.689225324  1.497127019
PRPF4  2.633247904  1.538737866
FBXO22  2.618684553  2.250848582
TMEM38A  2.616570587  1.368175333
SAAL1  2.593548151  1.286650015
EIF4ENIF1  2.577655456  1.407605112
EXOSC2  2.571935733  1.532168826
PSPC1  2.569692198  2.784445975
FANCC  2.566279224  1.664071832
SUPV3L1  2.492549812  1.339015985
CHAC2  2.485408105  1.024778737
PMS2  2.448133776  0.929402054
CEP72  2.439410684  0.772423918
OXNAD1  2.410329055  0.76354805
RCL1  2.404256929  1.542624735
DEPDC5  2.39761354  1.248607917
TAF5  2.362775978  1.223091289
C1orf83  2.301767803  1.151520786
GPR63  2.280708562  1.207851655
CENPP  2.248725925  1.248345294
ZNF346  2.244204786  1.546711313
ZSCAN29  2.207977222  1.343883678
TIPIN  2.138518069  0.891238401
C1orf135  2.103025921  0.759216212
PDSS1  2.095539926  0.896088276
TADA2L  2.074052935  0.954661779
MAPKAPK5  2.012235543  1.039750798
RDM1  1.988151172  0.648744845
FANCF  1.977133522  1.191082872
DIS3L2  1.933699088  1.069976799
ADAT1  1.92676802  2.008459187
RFT1  1.86627526  1.147624113
EXOSC3  1.829802731  1.068017527
LCMT2  1.73709081  1.105566639
HIST4H4  1.585250613  0.860151795
AGBL3  1.420685766  0.432104796
DMC1  1.168679573  1.050398384
COL6A3  -78.04876454  -16.85990928
COL1A2  -74.28382256  -30.63347454
COL6A1  -70.81728836  -14.82521282
COL3A1  -69.32155992  -26.02203479
COL6A2  -66.47244796  -14.23599076
LRP1  -63.70014555  -15.92959162
STC1  -47.54280562  -6.145779114
MMP14  -41.93994416  -11.71656345
THBS1  -41.71825847  -19.39565855
A2M  -41.67647726  -15.45064146
TIMP2  -41.32682416  -16.66538096
COL12A1  -39.56528437  -10.31866608
TIMP3  -36.85823738  -9.266906593
LTBP1  -36.46381379  -6.571303551
IGFBP4  -34.37614225  -8.623311271
DCN  -33.47103761  -11.02791379
MRC2  -31.63508222  -7.381992482
THBS2  -30.30736203  -8.919600586
COL4A2  -30.04177913  -18.2569881
SPARC  -30.03075174  -23.25678739
CTSB  -29.78773998  -22.91869527
LTBP2  -29.43404477  -7.758417707
COL4A1  -28.91887381  -18.14799415
COL5A1  -28.27314464  -11.44344762
LAMA4  -27.82331693  -8.506528196
IL1R1  -26.61378524  -5.95894671
EMP1  -25.54752865  -7.958234731
APLP2  -25.46329869  -16.10061772
LIF  -25.38426464  -4.897867691
EGR1  -25.35845682  -9.915105808
IGFBP3  -25.11693493  -16.68251927
CTSK  -25.05455531  -4.47115708
GPNMB  -24.83904743  -11.49861057
C1S  -24.5961446  -11.02230665
COL14A1  -24.59352871  -4.22817457
GPR177  -24.34878596  -8.438125423
PAM  -24.32406687  -8.359315884
CDH11  -24.1193849  -9.565715541
ITGA5  -23.64093796  -9.260110484
GJA1  -23.60614413  -11.61979875
RND3  -23.55506063  -6.0046257
SERPINE1  -23.16056691  -8.630229627
APP  -22.98030826  -12.75370324
LUM  -22.77965526  -10.17257656
SNED1  -22.67452714  -4.868355364
TNS1  -22.45675488  -9.714016832
KIRREL  -22.43946218  -9.94648015
PCDH18  -22.02019443  -4.189050091
NOTCH2  -21.89379407  -16.32654975
COL5A2  -21.77502404  -11.52680265
NRP1  -21.38894984  -7.750109981
TGFBI  -21.24751868  -13.07147811
WWTR1  -20.75644679  -7.836355958
NR4A2  -20.6156634  -2.9510618
GEM  -20.60197572  -2.620395877
OSMR  -20.56101279  -9.201801163
FAM20A  -20.39985668  -4.593564199
DUSP1  -20.16242224  -9.657610928
NBL1  -20.07469535  -6.937737177
FAM129B  -19.98783785  -10.39608277
LAMB2  -19.95308915  -14.69860634
CYBRD1  -19.93848535  -6.370081816
OLFML2A  -19.90998587  -2.904819273
SH3PXD2B  -19.83901217  -6.39907524
CALD1  -19.74193838  -12.19211819
ATP2B4  -19.61647869  -11.0788483
PTRF  -19.58427542  -10.97920262
FKBP10  -19.58059473  -8.954395944
HTRA1  -19.57924127  -14.13191448
C8orf4  -19.50378531  -6.202443039
RASSF8  -19.48824453  -3.883287891
SEMA5A  -19.47901493  -5.99655432
FOS  -19.14497757  -12.3390116
GNG12  -19.08650837  -10.97189749
BDKRB2  -18.97342222  -2.678507743
YAP1  -18.96674298  -12.59020866
FLRT2  -18.79091846  -7.184943351
CTSL1  -18.59753461  -10.46089502
NOTCH3  -18.54489451  -10.36533256
CA12  -18.49584348  -6.515770648
COL18A1  -18.42731903  -11.71050325
HMCN1  -18.36868778  -7.946450498
KCTD12  -18.19363982  -9.273857259
EHD2  -17.93919932  -8.616043747
TEAD1  -17.92768588  -8.770750578
ITGB5  -17.64107271  -12.49497074
SERPING1  -17.55234831  -11.10516147
NFIX  -17.43834292  -8.956013952
SOD2  -17.35300278  -6.836897057
GPR124  -17.25492758  -4.222101983
CALU  -17.1889367  -5.999802865
VCL  -16.83625316  -7.367180565
APOD  -16.79051527  -5.790401164
C3  -16.71424309  -16.10242887
ECM1  -16.66393171  -3.823785596
RAI14  -16.5905115  -8.281575704
ANTXR1  -16.57881118  -11.91383723
FOSL2  -16.48463792  -12.57451693
SVIL  -16.46876063  -6.175904336
FAP  -16.28886625  -3.725852679
PLAU  -15.86203841  -5.943753873
PLXND1  -15.66738386  -8.662532266
OLFML3  -15.66277569  -4.28071462
PTGES  -15.63034811  -2.648610768
TSHZ3  -15.57785844  -2.969009019
SERPINH1  -15.56813836  -8.359743614
CAPN2  -15.38654712  -13.6095608
PHLDB1  -15.36518663  -8.63821974
GSN  -15.34059977  -9.672098668
LMNA  -15.31965109  -5.824609806
AXL  -15.22175969  -8.904915926
EDNRA  -15.20618586  -3.608933886
ASPH  -15.18095934  -8.113789638
DOCK1  -15.16017507  -10.10388288
WNT5A  -15.15001855  -5.553020355
SERPINF1  -15.13573756  -5.056045743
ADAMTS1  -15.05286479  -8.104117196
CDCP1  -15.02753489  -7.152940405
CD63  -14.96413661  -10.03787356
TPBG  -14.839185  -3.471507963
EGFR  -14.55248836  -8.941034339
IL1RN  -14.48309103  -3.207110942
EPAS1  -14.48042686  -8.640772663
CLDN11  -14.47746878  -3.682166313
SNAI2  -14.46265737  -4.41616214
NDRG1  -14.44607486  -14.75347253
ZCCHC24  -14.4431622  -5.727545191
PPFIBP1  -14.40674149  -6.024646385
LPHN2  -14.33308517  -7.825536328
FGF7  -14.27535374  -1.788270549
COLEC12  -14.24886118  -3.203575325
JUN  -14.23874206  -6.766796119
PLAT  -14.19710956  -4.525958612
COL15A1  -14.16794557  -7.121650502
DLC1  -14.13984025  -6.448991963
MYADM  -13.73542092  -4.854969622
GLUL  -13.72780957  -15.97624911
RHOB  -13.69059127  -9.772591693
PRICKLE2  -13.65239421  -6.475097061
BNC2  -13.61192273  -5.054286963
AKAP12  -13.59256823  -7.122020042
ANXA5  -13.58163078  -6.852037154
PID1  -13.48603173  -3.866039118
DAB2  -13.46091826  -9.876934964
FBLN2  -13.39394781  -4.151322139
LTBP3  -13.3631227  -6.858943143
RUNX1  -13.32070135  -5.355153529
SPON2  -13.31926274  -3.342430912
DUSP6  -13.15615784  -9.455747147
PPAP2B  -13.12560671  -4.165517189
CKAP4  -13.07249461  -5.835801251
ENG  -13.06560596  -5.664063169
SLC7A8  -13.02527221  -5.849910393
ANGPTL2  -12.99594985  -4.498940098
SCARA3  -12.98026158  -6.361633292
TGM2  -12.94356396  -9.954817516
CRISPLD2  -12.90363585  -7.239695428
LAMA2  -12.88112437  -4.168749481
AQP1  -12.86038045  -6.11426318
RAPH1  -12.83514349  -6.382058242
ARRDC3  -12.72082597  -6.837594341
CNIH3  -12.68270008  -1.611950504
S100A16  -12.63290844  -9.279431721
LAMC3  -12.62765305  -3.714979413
SQSTM1  -12.625313  -5.781480456
ERRFI1  -12.60055212  -7.840590918
TENC1  -12.59809289  -7.652323199
CRTAP  -12.58839591  -4.687266734
C13orf33  -12.48512237  -1.916803209
ABCA8  -12.47150186  -4.533097268
TPM2  -12.44094513  -7.805033448
PARVA  -12.43777936  -6.969828096
ITGAV  -12.38530746  -8.139326983
CD59  -12.30291445  -5.92380581
ITGA2  -12.25986726  -7.203555443
APOE  -12.25754133  -11.17096323
CYP1B1  -12.24828786  -6.947273878
ADAM12  -12.15207836  -3.932139302
PLTP  -12.1307572  -8.020028122
COL27A1  -12.06210615  -8.623114701
PRNP  -12.0059327  -7.722098809
CYR61  -11.99693644  -12.72590386
THY1  -11.98744733  -7.603081229
CD81  -11.93552128  -5.561894401
ADAMTSL4  -11.87497622  -4.217613275
SGCD  -11.87200633  -2.880542507
MYO1D  -11.80020356  -6.4294112
TGFBR2  -11.79727963  -6.994783647
MYL9  -11.79645654  -9.5779603
VGLL3  -11.79643524  -5.495291356
SGK1  -11.7941642  -10.6613682
TGFBR3  -11.79155848  -5.454172978
LOXL1  -11.59938592  -3.237194054
ANO1  -11.59710602  -4.166717255
GLIS3  -11.55381543  -4.365957691
CDKN2B  -11.51121895  -4.17916099
ABCA1  -11.43556142  -6.649833717
PTPRM  -11.36916806  -5.162405371
SDC2  -11.25293874  -9.504626621
LTBR  -11.15390232  -7.721778445
FKBP9  -11.09377343  -5.699242159
IGFBP7  -11.07068506  -16.10366513
CYP26B1  -11.06377339  -2.628451065
DKK3  -11.00099782  -5.653720777
RIN2  -10.98113061  -8.299592625
NFIL3  -10.89407591  -3.649267639
LAPTM4A  -10.88216777  -9.47215333
BMP1  -10.8786157  -4.766378222
SOD3  -10.81387608  -3.877136096
S1PR3  -10.7987874  -4.286666048
ZAK  -10.72636121  -5.392825677
RHBDF1  -10.71524753  -4.907186665
PTGFRN  -10.60453625  -6.133509099
FAM114A1  -10.59929069  -4.903248794
CFH  -10.56578364  -7.485800187
LDB2  -10.55190267  -3.293476084
CXCL12  -10.51769782  -5.381089935
SASH1  -10.51428694  -6.504370574
OLFML2B  -10.50710993  -4.6165065
CCDC80  -10.46805278  -9.757658449
PLEKHH2  -10.45808007  -8.975549088
CCL2  -10.44772256  -2.938458965
ST5  -10.35820527  -7.230739093
ANXA2  -10.35450711  -8.005340644
RCN3  -10.25311568  -3.828035687
CERCAM  -10.2297807  -5.433521276
FBLIM1  -10.22314531  -6.428469349
AOX1  -10.22031805  -3.454629633
TNS3  -10.20737724  -8.434795054
TNFRSF21  -10.14190423  -4.438171176
TEAD3  -10.1227539  -5.395441645
MYO10  -10.12081238  -12.21513298
TSC22D1  -10.10898402  -7.582652909
S100A11  -10.10188181  -8.730604537
BCL6  -10.09658936  -7.822117134
C20orf108  -10.09016468  -6.969000036
ISLR  -10.08844424  -4.259210081
HS3ST3B1  -10.08841956  -2.320917712
HDAC7  -10.03171112  -6.470323992
CTTN  -10.02167162  -6.051999367
SIPA1L2  -10.01795158  -8.801795287
EFEMP2  -10.00911656  -4.619686694
SLIT2  -9.938604134  -2.985233377
CEBPB  -9.898592102  -4.505294919
LPP  -9.878244273  -6.655387588
CTGF  -9.819439609  -12.64589232
GLI3  -9.819055635  -5.886470103
ANXA1  -9.782642138  -9.528694416
SPRED1  -9.76651532  -2.727582675
ZNF521  -9.763690352  -2.730362369
GNS  -9.729843848  -6.339669288
LOXL4  -9.708168833  -1.736244632
ACVR1  -9.658399585  -5.484235318
ACVRL1  -9.635867517  -2.713385941
MARVELD1  -9.635339224  -4.377384876
MRGPRF  -9.631304729  -1.981925739
PHC2  -9.609922427  -4.071979284
CTSD  -9.606444904  -13.31775028
THBD  -9.598752449  -4.821603233
CD9  -9.581243802  -11.39487796
PRSS23  -9.570128052  -12.42252705
PPL  -9.561352289  -6.426420138
PMP22  -9.547583021  -6.147791751
SLC2A10  -9.516643045  -5.66408614
CXCL2  -9.507327972  -2.843435827
SDCBP  -9.500524879  -7.215643551
ELN  -9.491548517  -3.556549743
PERP  -9.391331291  -9.895308091
FZD1  -9.383943068  -4.742826265
CD68  -9.349115776  -6.274991454
SYNJ2  -9.29317645  -3.42565216
PLAUR  -9.289470894  -5.622362986
GPC1  -9.274839211  -5.734448346
IGFBP6  -9.235275592  -2.554213124
GPX8  -9.215793404  -5.12605569
CAMK2N1  -9.187242129  -5.306105918
TSPAN9  -9.174666495  -4.103350464
PRRX2  -9.126840885  -2.114365079
PLSCR4  -9.124381087  -4.684690133
RRBP1  -9.108183756  -7.177983965
QSOX1  -9.107436296  -10.3064023
FMNL2  -9.065666318  -6.936887663
NPAS2  -9.041839001  -7.411285038
GAS1  -9.026347802  -4.247690497
SDC3  -9.013725476  -7.144361436
FBLN5  -9.001980836  -4.161165708
CPXM2  -8.992582666  -3.834618173
ITGA3  -8.980123329  -9.451206222
P4HA3  -8.954169365  -2.082308203
EPHB4  -8.936195555  -6.952342539
NNMT  -8.925792746  -7.743672477
LEPREL2  -8.916546541  -3.315024385
IGF2BP2  -8.908767903  -6.95203324
FZD7  -8.897677253  -3.468373513
RAB34  -8.89608857  -5.710764579
TFPI  -8.84179743  -7.175196491
RAB11FIP5  -8.840025129  -5.451994883
KLF9  -8.839468524  -4.918849303
BEST1  -8.827654094  -3.919819046
PLA2R1  -8.770241517  -3.338931021
ENPP1  -8.752570065  -3.668683867
P4HA2  -8.734472794  -4.809403071
COL8A1  -8.726130522  -3.532970244
FOXF2  -8.701247735  -2.117684582
C1orf198  -8.699706442  -7.491121806
PRKCDBP  -8.69071793  -2.74946525
FMOD  -8.681518412  -6.154060851
EXT1  -8.668801156  -5.665957739
RASL12  -8.665005333  -3.460498611
CTHRC1  -8.647135441  -4.506655088
SYDE1  -8.646761171  -4.365059329
GNG11  -8.644075412  -3.773216942
MGP  -8.624858953  -5.562503466
PDPN  -8.510596161  -4.357340742
HNT  -8.506693559  -2.438433309
TSKU  -8.504561028  -5.725797058
CTDSPL  -8.500641577  -6.550946516
ABCC3  -8.496660426  -6.888844974
COL21A1  -8.476494712  -2.126368743
SOCS3  -8.471398809  -6.777988424
RASD1  -8.453479432  -2.810048051
MYLK  -8.44827779  -6.529454031
SPOCD1  -8.386992704  -2.257010912
HTRA3  -8.371155822  -3.014984376
NDN  -8.366652244  -2.742375004
TFAP2A  -8.322210864  -5.920147189
FAM46A  -8.308391213  -6.861647777
ETS2  -8.291891093  -4.355855537
PDE4D  -8.233225415  -3.346617658
KREMEN1  -8.152496436  -4.244799555
SDC4  -8.13050297  -6.735199708
MAF  -8.085015773  -6.759776865
FCGRT  -8.073802006  -6.589981787
TGFB3  -8.060166458  -2.208196382
CAPN5  -8.051766787  -4.212594332
CXCL14  -8.046046533  -5.825759452
RBPMS  -8.016187551  -5.237182416
KLF4  -7.984642039  -5.567505716
OAT  -7.984368681  -5.668808035
MXRA8  -7.971489533  -2.522300331
UACA  -7.963584603  -9.56377129
PROS1  -7.938933499  -4.183393923
ADM  -7.927246362  -4.584555704
USP53  -7.910664276  -5.720550057
SDC1  -7.905863484  -6.210648382
OLFML1  -7.888464408  -2.14161084
LIFR  -7.86819662  -7.824708089
LGALS3  -7.865227668  -6.554969793
WIPI1  -7.85738754  -2.363668589
IL1B  -7.826892855  -1.62473
NXN  -7.817095474  -5.347718906
GRB10  -7.796194839  -6.771341544
BNIP3L  -7.751175484  -6.11879348
KDELR3  -7.750582284  -4.896616797
TEAD2  -7.733819222  -5.53817432
VEGFC  -7.726343312  -2.760093577
S100A10  -7.706383395  -9.426143903
EMP2  -7.681330298  -5.064766873
CD151  -7.680970127  -5.309222834
SECTM1  -7.676116872  -3.525289925
SFXN3  -7.658782612  -4.005013653
CXCL1  -7.593245321  -1.585578081
OSBPL5  -7.589333896  -4.278467372
ADAMTS4  -7.583022887  -3.258689058
ARHGEF10  -7.57342629  -5.671118169
BMP2  -7.570872832  -2.593800257
SATB2  -7.5341501  -3.909401256
HAS2  -7.525316662  -2.484401716
SEPT10  -7.498414731  -4.471631734
C4orf18  -7.48128806  -9.189586226
CDC42EP1  -7.477939717  -7.082085177
TGFB1I1  -7.47142632  -3.173952085
ABLIM3  -7.452537899  -3.107283056
TNFRSF1A  -7.439621713  -4.958182382
GAS6  -7.402249328  -7.10299006
KIAA1217  -7.399828827  -10.89688074
GFPT2  -7.388986754  -3.593618384
SPHK1  -7.379576269  -2.864533216
EFEMP1  -7.366984718  -7.114644571
MICAL2  -7.32384118  -5.48083052
ARSJ  -7.311815753  -3.407951701
EPHX1  -7.279526661  -8.725567839
PAPSS2  -7.246967525  -5.411536807
CTNNA1  -7.205848641  -5.306411666
CLEC11A  -7.18377535  -1.819791819
PLXDC1  -7.137582179  -3.989787157
TMBIM1  -7.12751138  -4.402032703
RHOJ  -7.110654409  -2.781958958
LRRN4CL  -7.108072581  -1.252273333
PBX1  -7.100835591  -3.564040295
PRKG1  -7.087578611  -2.412224689
VAMP3  -7.023920839  -5.574199643
MSRB3  -7.016889633  -5.129640282
C1QTNF1  -6.955396955  -5.521046776
PODN  -6.895086391  -3.289103217
PPP2R3A  -6.877105692  -4.079980097
INHBA  -6.845944344  -3.72848226
SEMA3C  -6.822473957  -8.224077435
BACE2  -6.800375378  -7.583558835
SELENBP1  -6.795310263  -5.730780685
SLC12A4  -6.772912594  -2.971504846
WASF3  -6.739269145  -6.028087784
MMP19  -6.732602067  -3.001987285
BOC  -6.726496252  -3.747440184
HEXB  -6.703179153  -5.216143131
VDR  -6.702047932  -4.060181813
COL8A2  -6.700493251  -2.862313155
TNFRSF12A  -6.691840145  -5.151227194
LEPR  -6.631501555  -3.272815512
TCF7L1  -6.605692804  -4.34587665
ADAMTS2  -6.602837416  -3.150757871
CST3  -6.601605906  -8.156517547
MICALL2  -6.599242848  -4.648431844
ARHGAP28  -6.59233419  -3.198146613
ARHGAP22  -6.56438503  -2.096098486
MET  -6.554864818  -8.971494241
SERPINB6  -6.510573534  -7.342858329
SPRY1  -6.48080002  -4.82771692
PDGFC  -6.475847796  -9.507483635
GLDN  -6.465955374  -2.93320048
LAMP1  -6.454410133  -3.683878078
GNB4  -6.431438675  -5.720721892
PCYOX1  -6.429425406  -3.83655458
STON1-GTF2A1L  -6.40807117  -4.447070424
BAG3  -6.357222603  -4.75649919
SLC41A1  -6.347525157  -5.787237437
PRELP  -6.304871424  -4.466188185
TMEM98  -6.278499121  -4.549171697
LRIG3  -6.274767847  -4.391888718
LRRC32  -6.271102492  -3.92078561
FABP3  -6.27000498  -1.680905462
ADAMTS14  -6.252392959  -2.950600514
FZD8  -6.238275342  -3.667324801
SORBS3  -6.199764438  -4.983742988
PLOD3  -6.191712523  -2.740599898
ADAMTS5  -6.176111936  -4.861178552
PROCR  -6.154886679  -2.507465922
SIX4  -6.152206492  -4.052431783
FOXC1  -6.145639303  -3.735285252
RCN1  -6.135201968  -3.188453242
CTSA  -6.131197523  -6.613474808
PVRL2  -6.130258386  -8.693192786
GSTM5  -6.128755384  -1.725050053
TCF7L2  -6.117949068  -4.32435385
DIXDC1  -6.087907482  -3.058016053
TBC1D8  -6.077189106  -3.44033092
KIAA0284  -6.062288324  -4.99177228
F2RL1  -6.026680764  -5.483468262
ELTD1  -5.992308396  -3.537887829
HRH1  -5.984605139  -5.171036736
DUSP3  -5.97886987  -4.994806874
PLXDC2  -5.978129737  -4.292542777
YES1  -5.969485046  -3.159561299
LRP11  -5.969400803  -6.227606164
SPSB1  -5.96178341  -4.281276609
THRB  -5.956137533  -5.544209459
SMOX  -5.955836165  -2.350371197
RUNX2  -5.951199409  -5.825787079
HOXA10  -5.946566364  -2.682600664
FZD4  -5.929448199  -6.498081664
COL13A1  -5.916915567  -2.52997628
MAFB  -5.909735755  -5.610078081
NOV  -5.887565304  -3.360918343
DOCK6  -5.880463474  -3.247385072
CTSF  -5.840415945  -4.842621904
C1RL  -5.814845584  -3.124944199
MMP11  -5.814552698  -4.532706056
ACO1  -5.807221424  -2.68174222
MAP3K6  -5.802085804  -4.050536119
KITLG  -5.798588989  -4.272432622
TM4SF1  -5.791335369  -12.53707774
RIN1  -5.79041071  -2.104647464
HSD11B1  -5.76932916  -2.22178531
TAX1BP3  -5.76452567  -3.209701844
PAQR5  -5.761581864  -4.214632811
C5orf4  -5.754953241  -3.793389815
PLEKHA4  -5.720222629  -3.258903839
SRGAP1  -5.70642442  -3.853435882
HSPA12A  -5.66454931  -4.817842799
CAV2  -5.660559349  -7.57964101
ACOX2  -5.628977628  -2.736537429
BHMT2  -5.625579946  -1.345439586
GLI2  -5.603244683  -2.473602022
RAB3IL1  -5.565584766  -3.132443833
SEMA3F  -5.557072592  -4.270430452
B3GNT5  -5.5404847  -4.112549456
CXCL3  -5.508133504  -1.585181963
REXO2  -5.487375387  -4.875971125
NPC2  -5.467632755  -7.19861593
RAB23  -5.463987254  -2.125287919
F3  -5.424974893  -7.132590103
PHACTR2  -5.401149075  -3.918105107
FHL2  -5.376727646  -8.031326308
VGLL4  -5.363118769  -4.120134414
PRSS35  -5.343275362  -1.92637967
KIFC3  -5.310813373  -4.354083028
PDGFRL  -5.3055922  -2.385839009
ID1  -5.28631763  -5.298422921
C6orf145  -5.279585823  -5.20611973
HEBP1  -5.272833822  -4.4772269
GLIPR2  -5.26032381  -4.116335794
FAM176B  -5.247141428  -1.866258823
GLT8D2  -5.24163493  -2.046208978
MTMR11  -5.222017527  -4.246619305
NAV3  -5.217429215  -4.100427073
AASS  -5.213321679  -4.189915377
PITX1  -5.210536739  -2.421279115
MAFF  -5.182515306  -2.544924447
CAMKK1  -5.168032725  -1.907668336
ADCY4  -5.16494943  -2.429088209
RRAS  -5.149514903  -3.14941192
VWA5A  -5.129713685  -3.862181613
FAM180A  -5.08850125  -1.171564467
SELM  -5.021347133  -3.355638782
TFAP2C  -4.99638407  -3.433801542
CYFIP1  -4.96855948  -4.929299348
HOMER3  -4.952919758  -2.81510466
PPAP2C  -4.950033988  -5.907537966
FCHO2  -4.920911614  -6.490230276
PVRL3  -4.906461106  -2.641037996
PPM2C  -4.901737602  -3.609136592
CYYR1  -4.884924548  -2.600459654
PPIC  -4.884664398  -4.676650833
ANKMY2  -4.881466636  -4.557392134
LAYN  -4.822383468  -4.05499754
TRPM4  -4.819527968  -6.0084593
ALDH1A1  -4.81528634  -6.074483835
S100A13  -4.778693848  -4.221550791
SLC39A13  -4.776980813  -1.719045894
FNDC4  -4.773185554  -2.562406532
KCNE4  -4.762751855  -2.735488259
SLC17A5  -4.744694533  -2.305056467
PELI2  -4.698487941  -3.244166815
ZDHHC1  -4.651571589  -2.035733645
PCSK5  -4.632623254  -4.607916997
PDIA5  -4.611012344  -2.144754054
NPTN  -4.606204812  -1.695356223
C10orf26  -4.575940207  -2.067815648
SERTAD4  -4.561560526  -2.137812621
NFIA  -4.535872275  -3.876790266
C7orf58  -4.495019125  -3.209354968
PRDM16  -4.438731006  -2.319554634
CCDC8  -4.436150348  -2.748129084
SMPD1  -4.419194676  -1.581937356
RBMS1  -4.404841258  -1.754251543
HSPB7  -4.389496884  -1.468639817
RAB32  -4.370171421  -3.918508973
C15orf52  -4.36349757  -3.942262012
ESM1  -4.3497793  -2.778770497
PDK4  -4.338228338  -5.718004565
CHRD  -4.313464557  -3.664091931
MLPH  -4.309663139  -6.717474014
SEMA3B  -4.308485621  -5.190046543
SLC27A1  -4.294278503  -3.325017944
HSPB8  -4.292079135  -3.936750911
AKR1C3  -4.290460758  -2.947677596
PLK2  -4.275368945  -7.055120854
MITF  -4.250183532  -5.643236941
AKR1C1  -4.249415053  -1.466560683
SULF1  -4.246097656  -8.139079379
FIBIN  -4.226883212  -3.386613889
C2CD2  -4.211710288  -3.143417078
C14orf37  -4.198515451  -3.106271068
CCDC149  -4.192084046  -3.482738848
TMEM43  -4.177853825  -2.643167232
UBASH3B  -4.170145266  -3.774068436
TSPAN14  -4.169282644  -4.758198937
ANKS6  -4.154123871  -3.724535282
PHYHD1  -4.153721779  -4.216307558
MRVI1  -4.149700688  -4.684852909
IDUA  -4.132797153  -2.207640717
EVI1  -4.114268194  -7.754222039
NRBP2  -4.104305668  -4.610167775
AK1  -4.093822904  -2.551949602
GDNF  -4.088665901  -2.190981364
RYK  -4.0815333  -2.137519639
SEPP1  -4.077407388  -7.929501879
TAGLN  -4.053155762  -9.48424099
FUCA2  -4.047931189  -3.013556519
CDC42EP2  -4.047369669  -2.155297031
TRIM47  -4.047167126  -3.781359049
ACSS3  -4.037661389  -3.67342106
CPZ  -4.029550012  -3.160129756
IER5L  -4.023042592  -3.445505645
PLCD1  -4.020832181  -2.655017488
CYB5R2  -3.994072667  -2.61434352
UBTD1  -3.983241777  -2.036866077
RCAN2  -3.976382029  -3.912958043
PHLDA2  -3.944089149  -2.264846309
GAS2L1  -3.898895447  -4.209798964
METRNL  -3.888523659  -2.112678173
SRXN1  -3.876143746  -2.712735161
PDE7B  -3.870863541  -3.285205001
GALNTL2  -3.870132779  -2.358110052
MYO15B  -3.855189874  -4.553683104
PIAS3  -3.847304715  -3.168990346
EVC2  -3.844502451  -2.323086749
ZBTB47  -3.828299939  -3.88949216
SNX21  -3.805668094  -2.640802004
IL17RC  -3.80222719  -3.442234801
RARRES2  -3.797634718  -4.067158304
FKBP14  -3.796892982  -1.515218504
C1orf85  -3.780465924  -1.991446795
RSPO3  -3.780429343  -1.798839888
C13orf15  -3.74967828  -3.376011963
FOXQ1  -3.729193949  -3.584925777
OSBPL10  -3.727365972  -5.191370072
C10orf116  -3.711889152  -1.718061087
CDC42EP5  -3.711218833  -1.476855159
HSD3B7  -3.706665643  -2.996050087
ERBB2  -3.667947762  -8.636475401
TCEA3  -3.659653581  -6.179313848
EMILIN3  -3.627410134  -2.908769075
TRPV4  -3.623725773  -2.76201237
SGMS2  -3.617949434  -3.390928611
FOXF1  -3.614159895  -1.394243848
PACSIN3  -3.605181529  -2.457731712
GHR  -3.60293857  -3.001962857
ALS2CL  -3.56682601  -5.309426893
AFAP1L2  -3.553497092  -4.223559245
FAM26E  -3.52398329  -2.439725767
PINK1  -3.520122308  -3.450526169
ANGPTL4  -3.517212882  -2.732106717
PGM5  -3.516890932  -2.277163906
EPHA2  -3.512566689  -6.263577289
DNALI1  -3.485274373  -3.511265537
GRASP  -3.466521496  -1.96491781
SMAD6  -3.462255328  -3.146344274
SCUBE2  -3.452494709  -2.905828206
PARD3B  -3.445829724  -5.486728445
AVPI1  -3.441618788  -2.794113379
FEZ2  -3.429675235  -3.482671271
CXCL16  -3.415275159  -7.459035941
LMOD1  -3.411606416  -3.941051214
SAV1  -3.375077144  -2.026245067
KDR  -3.369497175  -4.780565859
PPP1R3B  -3.355477032  -5.036247713
TMEM54  -3.350610866  -3.631761557
CCL8  -3.327662266  -1.401894626
SHISA4  -3.322864911  -2.735933785
C1orf190  -3.286832834  -1.342338054
DZIP1L  -3.266274887  -1.985723457
GGCX  -3.240521603  -2.274992029
OSR1  -3.23142412  -1.751664357
PRKD1  -3.230927516  -3.911381651
HSPA12B  -3.223459801  -1.337081532
ZNF385D  -3.219959033  -1.058379034
SEC16B  -3.218653186  -4.786135301
MALL  -3.217061507  -4.733712568
SPARCL1  -3.212048835  -11.8199309
SLC24A3  -3.209905918  -3.324118735
RARRES1  -3.209621122  -3.208968809
KCNC4  -3.202464584  -2.794996903
ADAMTSL5  -3.201806381  -1.518305887
PTGR1  -3.197121585  -4.223760449
LAMB3  -3.196494672  -7.276319302
GPRC5A  -3.175196272  -8.961440421
Gene symbol  LogFC NBL TICs vs. SKPs  LogFC NBL TICs vs.
cancers HNMT -3.173719874 -5.102794548 GPR116 -3.153585055 -5.578342665 FAM62B -3.14485815 -3.371049628 CARD10 -3.139150425 -4.2608198 PPP1R3C -3.134776085 -1.642447904 DYSF -3.126052071 -5.976131499 CSAD -3.106272454 -1.983823996 SLC22A3 -3.096206567 -3.156503602 SSH3 -3.095159486 -4.005379201 SLC47A1 -3.078488911 -3.052334369 CTF1 -3.072710682 -1.659437405 SPATA6 -3.044058221 -2.984994957 SLC9A3R2 -3.043470191 -4.693516207 AFAP1L1 -2.982184698 -3.26031397 PDE1A -2.98023333 -2.52832547 GALNS -2.967579841 -1.74710031 ST3GAL4 -2.956946574 -3.912489902 FAM89A -2.942052464 -1.969187999 RHOD -2.937517399 -2.748935586 ALDH3B1 -2.92890583 -2.33133059 SUMF1 -2.911329405 -3.324310927 MN1 -2.902812492 -3.520119653 SLC40A1 -2.898867417 -9.139870098 MOCOS -2.892582181 -2.429495323 GALC -2.886076193 -3.529778526 ATP10A -2.869999284 -3.663155094 CITED4 -2.853194127 -3.074437845 SSPN -2.842143308 -2.958511343 BMPER -2.831041783 -2.586797912 SMPDL3A -2.818892488 -3.311611897 DENND2C -2.798282656 -2.595737715 SYTL2 -2.784038915 -5.14524253 SNAI1 -2.767112852 -1.133956399 TNFAIP8L3 -2.744514934 -1.884748834 FBXO32 -2.741882206 -4.257330649 PLD2 -2.687559604 -2.669529515 MIPOL1 -2.678454205 -3.461690355 DDIT4L -2.636492314 -3.273675802 ITGA8 -2.634776665 -1.700761282 NR1H3 -2.633790054 -2.512860462 HIBADH -2.596026311 -2.481823028 MDFI -2.58654607 -2.858403902 PECI -2.584292175 -2.404978976 286 Gene symbol LogFC NBL TICs vs. SKPs LogFC NBL TICs vs. 
cancers COPZ2 -2.571744547 -1.424691709 TMEM26 -2.568361625 -2.118568154 C1orf113 -2.557503412 -2.393016384 SLC13A3 -2.547295644 -2.090526983 IMPA2 -2.529106094 -3.653636386 TMEM204 -2.526005165 -1.699061085 CCDC102B -2.490260853 -1.312432148 IRX3 -2.479158668 -2.940255575 SLC38A6 -2.453751247 -1.631223655 C1QTNF5 -2.435954514 -3.390450745 HHIPL1 -2.413947089 -1.400613509 C21orf63 -2.389293888 -2.570457714 APOL4 -2.346710733 -2.24217477 GPRC5B -2.343715663 -8.206296234 ANKRD35 -2.329550354 -1.746755157 ERG -2.327770509 -3.461522943 MAN1C1 -2.32220856 -1.545203118 FBXO2 -2.320454204 -1.755862404 SCN1B -2.301943868 -1.625929209 RAPGEF3 -2.289205649 -4.672807712 C17orf58 -2.284840938 -1.471960485 FAM62C -2.249543604 -1.405822585 STBD1 -2.223316556 -3.025177167 BDH2 -2.211305835 -1.529715315 HFE -2.206308486 -1.912565834 RGL3 -2.196214103 -5.239700395 CRK -2.195299658 -1.494098184 CALB2 -2.168028442 -1.960137425 CTNS -2.145769771 -1.421872661 NQO2 -2.142834843 -2.643948306 HSPB2 -2.124338329 -1.111288135 MEOX2 -2.103017552 -1.300509283 KCNE3 -2.102654663 -4.873856979 COBLL1 -2.094888349 -6.796616668 LCAT -2.092353952 -1.589959228 HSD17B14 -2.089041477 -1.527940498 BTC -2.051202476 -2.358846201 FSTL3 -2.040016988 -2.338291504 ATP8B4 -2.028350816 -2.27849812 C17orf79 -2.02794028 -1.478014819 ELOVL3 -2.026537942 -0.735474867 MMRN2 -2.005964606 -3.040995679 PODNL1 -1.980694963 -0.993287832 287 Gene symbol LogFC NBL TICs vs. SKPs LogFC NBL TICs vs. 
cancers C11orf70 -1.96656768 -2.14691841 COX7A1 -1.966152073 -0.870369896 CBR3 -1.963880965 -2.594560417 ZFYVE28 -1.956748776 -2.288409663 ATOH8 -1.94111599 -3.036016508 GATA6 -1.928981721 -4.954263955 TRPC4 -1.858765342 -2.051374366 SAT2 -1.819457401 -1.880599548 IL1RL2 -1.790481085 -1.346146898 FAM70B -1.78681553 -1.514850877 MFSD7 -1.77713905 -1.987598316 BFSP1 -1.76186939 -1.713445729 FBLN7 -1.739362393 -2.224490617 GRTP1 -1.731558781 -2.058697287 NAGS -1.727035502 -1.216408054 LRRC8E -1.726783136 -2.087601159 RERG -1.715712021 -2.423153868 CYB5R1 -1.714982751 -2.904796525 PHYH -1.709306062 -3.141843075 ANG -1.699739817 -2.001075823 TDRD6 -1.692123533 -2.911215907 TRPC6 -1.67925083 -2.476680632 NGEF -1.63185099 -2.671933637 MUC1 -1.592239123 -11.75078683 C7orf10 -1.576745585 -1.412980884 CLDN4 -1.57635278 -8.928835 GNAL -1.575655113 -2.287137561 GALNT12 -1.529861771 -3.356288507 AUH -1.51256067 -1.175701761 WFDC1 -1.505443832 -2.01179907 C2orf55 -1.494938285 -3.13294996 GLB1L -1.480696154 -1.30790568 DAPK2 -1.475908689 -2.39774179 IL34 -1.457330427 -1.465733257 CILP2 -1.418459217 -1.825934344 C1QTNF2 -1.399929082 -1.095600012 IL17RE -1.342615928 -1.870260637 LMLN -1.336624565 -1.681177035 MKX -1.312696003 -2.882888198 RAB20 -1.288259179 -3.401389641 SEC14L4 -1.256031489 -1.62132683 RILP -1.249363443 -1.624661241 HOXA7 -1.234138863 -1.715475959 288 Gene symbol LogFC NBL TICs vs. SKPs LogFC NBL TICs vs. 
cancers DMRTA1 -1.212603424 -1.850178151 C6orf97 -1.211879885 -3.529950128 NPHP1 -1.209234419 -2.497931825 EMCN -1.195739018 -1.871341211 ARMC4 -1.172615219 -2.00966696 FANK1 -1.142279119 -2.645228138 NTF3 -1.122823527 -1.039375172 BST1 -1.114702878 -2.748435494 LRRC6 -1.092334252 -2.811812875 CYP39A1 -1.081336695 -2.073697336 CCDC48 -1.063435177 -1.68095822 OVGP1 -1.059869511 -1.404768543 PLEK2 -1.021794976 -2.411484974 WBSCR27 -0.937879305 -1.493620355 GGTA1 -0.927626335 -2.093590991 C3AR1 -0.900210057 -3.100673913 SLC2A9 -0.888734851 -1.168363147 ABCG2 -0.881713128 -2.878861541 SLC25A21 -0.868662295 -1.299284359 CFB -0.84444826 -3.206153468 KCNK15 -0.842602826 -1.611983671 KBTBD10 -0.835221952 -0.934105664 EPHA1 -0.810484856 -2.683318814 C18orf34 -0.763570571 -1.137732737 PLA2G7 -0.705743421 -3.058844942 ABHD7 -0.671317596 -1.556041026 CFHR3 -0.604716036 -1.123689251 RASSF9 -0.587285591 -1.608385317 SH3RF2 -0.585414125 -2.644100244 CLDN7 -0.50975886 -6.66713801 C21orf9 -0.506504987 -0.5697938 DKKL1 -0.495623989 -0.780226737 KLRD1 -0.380751344 -0.951847703 289 Appendix D Original data for the 99 NBL cases described in Chapter 4 Table D.1 Original data for the 99 NBL cases described in Chapter 4 The data below includes TARGET identifiers (column 1), sequencing technologies used to characterize each sample (columns 2 through 4) and the patients’ clinical characteristics (columns 5 through 8). TARGET_ID TARGET-30PALCBW TARGET-30PASGPY TARGET-30PAKIPY TARGET-30PANZPV TARGET-30PANBMJ TARGET-30PAIFCS TARGET-30PAPBJT TARGET-30PAPSMC TARGET-30PAKYZS TARGET-30PAPEFE TARGET-30PARSHT TARGET-30PALTYB TARGET-30PASFRV TARGET-30PAMVRA Genome, transcriptome, Illumina sequencing DNA Index 1 X 1 1 1 1 1.16 1.9 1 Age (days) MYCN 1 = amp 2 = not amp 3 = not done 4 = unsatisfactory 2 679 2 Shimada 0 = Unknown, 1 = Neuroblastoma 2 = Ganglioneuroblastoma, intermixed 3 = Ganglioneuroma, maturing OR well diff. 
4 = Ganglioneuroblastoma, nodular 1 2 1972 2 2 X 1 1116 2 2 X 2 1757 2 2 X 1 993 2 2 X 1 1540 2 0 X 1 570 2 4 X 1 1337 1 2 X 1 659 2 3 X 1 990 1 2 1 752 1 2 2 1536 2 1 2 1057 2 2 2 1554 2 2 X 1.771 1 X Gender 1 = male 2 = female 9 = unknown 1.449 1.09 X Exome, Illumina sequencing X 2.02 1.17 X Genome, Complete Genomics sequencing TARGET_ID TARGET-30PANKFE TARGET-30PAKHCF TARGET-30PALJUV TARGET-30PAMCXF TARGET-30PALSAE TARGET-30PARDUJ TARGET-30PARRBU TARGET-30PAMUTD TARGET-30PAPTLD TARGET-30PAPHPE TARGET-30PAMMXF TARGET-30PAPKWN TARGET-30PALPGG TARGET-30PAKGKH TARGET-30PAIXRK TARGET-30PALETP TARGET-30PAIXIF TARGET-30PAMYCE TARGET-30PAICGF Genome, transcriptome, Illumina sequencing DNA Index 1 1.19 1 1.17 2 Exome, Illumina sequencing Gender 1 = male 2 = female 9 = unknown Age (days) MYCN 1 = amp 2 = not amp 3 = not done 4 = unsatisfactory X 2 1280 2 Shimada 0 = Unknown, 1 = Neuroblastoma 2 = Ganglioneuroblastoma, intermixed 3 = Ganglioneuroma, maturing OR well diff. 4 = Ganglioneuroblastoma, nodular 2 X 2 839 2 1 X 2 1301 2 2 X 1 765 1 2 X 1 782 2 2 2 1052 2 2 1 1448 1 2 1 825 1 2 1 1244 2 2 X 1 806 2 2 X 1 1661 1 2 X 2 710 1 2 X 2 700 1 0 X 1 882 2 2 X 1 564 1 2 X 2 1466 2 2 X 1 1315 2 2 X 2 1059 2 2 X 2 1278 2 0 X 0 X Genome, Complete Genomics sequencing 1.24 1.09 X X 1.66 1.13 1.07 1.6 1 1 1 1 1 1 1.05 TARGET_ID TARGET-30PAIMDT TARGET-30PALBFW TARGET-30PAITEG TARGET-30PAPNEP TARGET-30PAKHHB TARGET-30PAMDAL TARGET-30PAIVHE TARGET-30PAKZRE TARGET-30PAPBGH TARGET-30PAMZGT TARGET-30PAISSH TARGET-30PANRRW TARGET-30PANNMS TARGET-30PAMBAC TARGET-30PAMZMG TARGET-30PAHYWC TARGET-30PARIRD TARGET-30PAMVAG TARGET-30PALXMM Genome, transcriptome, Illumina sequencing DNA Index 1.91 1 1 1 1 0 1.06 1 1 1 1.45 X X Exome, Illumina sequencing Genome, Complete Genomics sequencing Gender 1 = male 2 = female 9 = unknown Age (days) MYCN 1 = amp 2 = not amp 3 = not done 4 = unsatisfactory X 2 1408 2 Shimada 0 = Unknown, 1 = Neuroblastoma 2 = Ganglioneuroblastoma, intermixed 3 =
Ganglioneuroma, maturing OR well diff. 4 = Ganglioneuroblastoma, nodular 0 X 1 804 2 2 X 1 1034 2 0 X 1 720 1 2 X 1 687 1 2 X 2 1222 1 2 X 1 1123 2 2 X 2 1083 1 2 X 2 1074 2 2 X 1 602 1 4 X 2 656 1 0 1 1730 2 2 2 1080 2 2 X 1 945 1 2 X 2 614 1 2 X 1 704 1 0 1 733 1 2 X 1 663 2 2 X 1 1475 2 2 1 2 1.13 1 1 X 2.08 1 1.28 TARGET_ID TARGET-30PANYBL TARGET-30PALWVJ TARGET-30PALAKM TARGET-30PAIPGU TARGET-30PASDZJ TARGET-30PAPSKM TARGET-30PANUKV TARGET-30PASCKI TARGET-30PAPTMM TARGET-30PAITCI TARGET-30PAIXNC TARGET-30PALZZV TARGET-30PAKZRF TARGET-30PALHVD TARGET-30PALUDH TARGET-30PAILNU TARGET-30PANBSP TARGET-30PALTEG TARGET-30PAKJRE Genome, transcriptome, Illumina sequencing DNA Index 1.85 1.88 1.06 1.04 X Exome, Illumina sequencing Genome, Complete Genomics sequencing Gender 1 = male 2 = female 9 = unknown Age (days) MYCN 1 = amp 2 = not amp 3 = not done 4 = unsatisfactory X 1 1216 2 Shimada 0 = Unknown, 1 = Neuroblastoma 2 = Ganglioneuroblastoma, intermixed 3 = Ganglioneuroma, maturing OR well diff. 4 = Ganglioneuroblastoma, nodular 0 X 1 1544 2 2 X 2 1758 2 2 X 2 898 2 0 2 902 1 2 1 1330 2 2 1 1486 1 2 X 2 911 2 2 X 2 559 1 2 X 1 728 2 1 X 1 723 2 2 X 2 699 1 2 X 1 1285 4 2 X 2 1634 2 2 X 2 1105 2 2 X 1 1683 2 0 X 2 1064 2 2 X 1 1367 2 2 X 1 1100 2 2 1 X 1.4 1 X 1 1 1.92 1.13 1.79 1 1.17 1.89 1.08 1 1.96 1.84 TARGET_ID TARGET-30PAPBZI TARGET-30PAIXNV TARGET-30PALEVG TARGET-30PANPVI TARGET-30PASLGS TARGET-30PALZRG TARGET-30PALXHW TARGET-30PALWIP TARGET-30PALIIN TARGET-30PALZSL TARGET-30PALFPI TARGET-30PASCLP TARGET-30PARGUX TARGET-30PAKFUY TARGET-30PANIPC TARGET-30PAMMWD TARGET-30PANRHJ TARGET-30PAREGK TARGET-30PANBCI Genome, transcriptome, Illumina sequencing DNA Index 1.99 1.58 1.3 1.11 X Gender 1 = male 2 = female 9 = unknown Age (days) MYCN 1 = amp 2 = not amp 3 = not done 4 = unsatisfactory X 1 1710 2 Shimada 0 = Unknown, 1 = Neuroblastoma 2 = Ganglioneuroblastoma, intermixed 3 = Ganglioneuroma, maturing OR well diff.
4 = Ganglioneuroblastoma, nodular 2 X 2 644 2 2 X 1 666 1 2 X 1 1289 2 2 1 1.23 1.2 1.68 1 Genome, Complete Genomics sequencing 1 1218 2 2 X 2 577 1 2 X 2 1131 2 2 X 1 1232 2 2 X 1 1279 2 2 X 1 1564 1 2 X 1 1152 2 4 1 644 1 2 2 1704 2 2 X 1 1194 2 2 X 1 921 1 4 X 1 812 1 2 X 2 1117 2 2 1 568 2 2 1 753 1 2 X 1.122 1.16 X Exome, Illumina sequencing 1.908 X 1 1.73 1.68 1 1.51 X 1.46 1 X TARGET_ID Genome, transcriptome, Illumina sequencing TARGET-30PANYGR TARGET-30PAINLN TARGET-30PAKXDZ TARGET-30PANZVU TARGET-30PASAZJ TARGET-30PALNLU TARGET-30PALAKE TARGET-30PALJPX TARGET-30PAPPKJ X DNA Index 1 2.41 1 1.2 Exome, Illumina sequencing Gender 1 = male 2 = female 9 = unknown Age (days) MYCN 1 = amp 2 = not amp 3 = not done 4 = unsatisfactory X 1 993 2 Shimada 0 = Unknown, 1 = Neuroblastoma 2 = Ganglioneuroblastoma, intermixed 3 = Ganglioneuroma, maturing OR well diff. 4 = Ganglioneuroblastoma, nodular 2 X 1 1404 2 0 X 2 828 1 2 X 1 1713 2 2 1 1764 2 2 X 1 1439 2 2 X 1 860 1 3 X 1 860 1 2 X 1 1731 2 2 X 1.107 1.49 1 1.94 1.16 Genome, Complete Genomics sequencing

Appendix E Variant calls detected in the 99 tumor/normal pairs
The variant calls for the 99 tumor/normal pairs described in Chapter 4, in Mutation Annotation Format (MAF) files, have been submitted to the database of Genotypes and Phenotypes (dbGaP) and are available under study accession number phs000218.v3.p1.

Appendix F Chromatin remodeling and MAPK pathway gene lists used in Chapter 4
Table F.1 Chromatin remodeling and MAPK pathway gene lists used in Chapter 4
The list of chromatin remodeling genes was compiled from previously published work [336,106]. The MAPK genes are MAPK pathway members (KEGG hsa04010) that are also found in the Cancer Gene Census [7]. The genes are sorted alphabetically within each category.
Gene symbol Function
ACTL6A Chromatin remodeling
ARID1A Chromatin remodeling
ARID1B Chromatin remodeling
ASH1L Chromatin remodeling
BPTF Chromatin remodeling
BRCA1 Chromatin remodeling
BRCA2 Chromatin remodeling
BRD2 Chromatin remodeling
BRD4 Chromatin remodeling
BRD7 Chromatin remodeling
CHD3 Chromatin remodeling
CHD4 Chromatin remodeling
CHD6 Chromatin remodeling
CREBBP Chromatin remodeling
DOT1L Chromatin remodeling
EP300 Chromatin remodeling
EXT1 Chromatin remodeling
EZH1 Chromatin remodeling
EZH2 Chromatin remodeling
GATAD2B Chromatin remodeling
H1F0 Chromatin remodeling
H1FX Chromatin remodeling
H2AFY Chromatin remodeling
H3F3A Chromatin remodeling
HDAC4 Chromatin remodeling
HDAC6 Chromatin remodeling
HDAC7 Chromatin remodeling
HDAC9 Chromatin remodeling
HIST1H2AG Chromatin remodeling
HIST1H2AL Chromatin remodeling
HIST1H2BI Chromatin remodeling
HIST1H2BL Chromatin remodeling
HIST1H3A Chromatin remodeling
HIST1H3F Chromatin remodeling
HIST1H4I Chromatin remodeling
HIST2H2BF Chromatin remodeling
HIST3H2BB Chromatin remodeling
HTATIP2 Chromatin remodeling
IKZF3 Chromatin remodeling
IKZF4 Chromatin remodeling
INF2 Chromatin remodeling
ING3 Chromatin remodeling
JAK2 Chromatin remodeling
KAT2A Chromatin remodeling
KAT2B Chromatin remodeling
KDM2A Chromatin remodeling
KDM3B Chromatin remodeling
KDM4A Chromatin remodeling
KDM4D Chromatin remodeling
KDM5A Chromatin remodeling
KDM5B Chromatin remodeling
KDM6A Chromatin remodeling
MEF2B Chromatin remodeling
MLL1 Chromatin remodeling
MLL2 Chromatin remodeling
MLL3 Chromatin remodeling
MLL4 Chromatin remodeling
MLL5 Chromatin remodeling
MYST3 Chromatin remodeling
MYST4 Chromatin remodeling
NCOA1 Chromatin remodeling
NCOA3 Chromatin remodeling
NCOA5 Chromatin remodeling
NCOR1 Chromatin remodeling
NCOR2 Chromatin remodeling
NSD1 Chromatin remodeling
NUP98 Chromatin remodeling
PADI4 Chromatin remodeling
PAX5 Chromatin remodeling
PPP1CA Chromatin remodeling
PRDM1 Chromatin remodeling
PRDM14 Chromatin remodeling
PRDM15 Chromatin remodeling
PRDM16 Chromatin remodeling
PRDM2 Chromatin remodeling
PRDM4 Chromatin remodeling
PRDM5 Chromatin remodeling
PYGO1 Chromatin remodeling
PYGO2 Chromatin remodeling
SETD1A Chromatin remodeling
SETD2 Chromatin remodeling
SETD5 Chromatin remodeling
SETD8 Chromatin remodeling
SMARCA2 Chromatin remodeling
SMARCA4 Chromatin remodeling
SMARCC1 Chromatin remodeling
SMYD1 Chromatin remodeling
SUV420H1 Chromatin remodeling
UTX Chromatin remodeling
AKT1 MAPK pathway oncogene
AKT2 MAPK pathway oncogene
BRAF MAPK pathway oncogene
DAXX MAPK pathway oncogene
DDIT3 MAPK pathway oncogene
EGFR MAPK pathway oncogene
ELK4 MAPK pathway oncogene
FGFR1 MAPK pathway oncogene
FGFR2 MAPK pathway oncogene
FGFR3 MAPK pathway oncogene
HRAS MAPK pathway oncogene
JUN MAPK pathway oncogene
KRAS MAPK pathway oncogene
LILRB1 MAPK pathway oncogene
MAP2K4 MAPK pathway oncogene
MAPK10 MAPK pathway oncogene
MAPK9 MAPK pathway oncogene
MYC MAPK pathway oncogene
NF1 MAPK pathway oncogene
NFKB2 MAPK pathway oncogene
NRAS MAPK pathway oncogene
NTRK1 MAPK pathway oncogene
PDGFB MAPK pathway oncogene
PDGFRA MAPK pathway oncogene
PDGFRB MAPK pathway oncogene
PTPN11 MAPK pathway oncogene
PTPN13 MAPK pathway oncogene
RAF1 MAPK pathway oncogene
TP53 MAPK pathway oncogene
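The MAPK category of Table F.1 is defined as the intersection of two gene sets: members of KEGG pathway hsa04010 and genes listed in the Cancer Gene Census. The Python sketch below illustrates only that filtering step. It assumes the two inputs are already available as collections of HGNC gene symbols; the sets shown are small, hypothetical excerpts typed in for illustration, not the actual KEGG or Census downloads, and the function name `mapk_cancer_genes` is likewise illustrative.

```python
# Hypothetical, abbreviated inputs for illustration only; the real lists
# would come from KEGG pathway hsa04010 and the Cancer Gene Census.
kegg_mapk = {"BRAF", "KRAS", "NRAS", "MAPK1", "MAP2K1", "MAPK9", "MAPK10",
             "NF1", "TP53", "RAF1"}
cancer_gene_census = {"BRAF", "KRAS", "NRAS", "MAPK9", "MAPK10", "NF1",
                      "TP53", "RAF1", "ALK", "MYCN"}


def mapk_cancer_genes(pathway_genes, census_genes):
    """Return pathway members that also appear in the census, sorted alphabetically."""
    # Set intersection keeps only symbols present in both inputs.
    shared = set(pathway_genes) & set(census_genes)
    # sorted() compares strings character by character (lexicographic order).
    return sorted(shared)


selected = mapk_cancer_genes(kegg_mapk, cancer_gene_census)
print(selected)  # prints ['BRAF', 'KRAS', 'MAPK10', 'MAPK9', 'NF1', 'NRAS', 'RAF1', 'TP53']
```

Because `sorted` orders strings lexicographically rather than numerically, MAPK10 precedes MAPK9 in the output. Table F.1 uses the same ordering.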