GENOMIC STUDIES OF THE NORMAL AND MALIGNANT NEURAL CREST by Olena Morozova

GENOMIC STUDIES OF THE NORMAL AND MALIGNANT NEURAL CREST
by
Olena Morozova
B.Sc. (Hons), University of Toronto, 2006
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES
(Bioinformatics)
THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
June 2012
© Olena Morozova, 2012
Abstract
Neuroblastoma (NBL) is an enigmatic pediatric tumor of the sympathetic nervous
system that is lethal in most children diagnosed over 18 months of age with metastatic
disease. NBL is thought to originate from a differentiation arrest of the neural crest, a
vertebrate-specific cell lineage with one of the most diverse developmental potentials.
Genomic studies of NBL have contributed to the development of new diagnostic and
prognostic markers. In addition, somatic and germline mutations in the ALK oncogene have
been identified and are being targeted clinically. Based on this prior work, two hypotheses
were developed and addressed in this thesis: (1) characterization of NBL with higher
resolution genomic technologies will lead to the identification of novel loci that contribute to
the disease and (2) analysis of the transcriptome of normal neural crest cells will help
identify loci of relevance to NBL. To address these hypotheses, I used several datasets
generated from microarrays as well as RNA and DNA sequencing experiments. Two key
results emerged from this analysis: the putative role of the BRCA1/BARD1
pathway in the development of NBL, and the heterogeneity of the genetic landscape of
primary NBL tumors. Potential translational avenues for the results reported in this thesis
include the exploration of AURKB and MAPK inhibitors as treatment agents for NBL.
Preface
Portions of Chapter 1 have been published: O. Morozova, M. Hirst, M.A. Marra.
Applications of new sequencing technologies for transcriptome analysis. Annu. Rev.
Genomics Hum. Genet. 10:135-51, 2009. Copyright by Annual Reviews; O. Morozova and
M.A. Marra. Applications of next-generation sequencing technologies in functional
genomics. Genomics 92(5), 2008. Copyright by Elsevier; O. Morozova and M.A. Marra.
From cytogenetics to next-generation sequencing technologies: advances in the detection of
genome rearrangements in tumors. Biochem. Cell Biol. 86(2):81-91, 2008. Copyright by
Canadian Science Publishing. I have written most of the text for these review manuscripts
with guidance and input from my supervisor, M.A. Marra, and the co-author, M. Hirst.
Portions of Chapter 2 have been published in three manuscripts: H. Jinno, O.
Morozova, K.L. Jones, J.A. Biernaskie, M. Paris, R. Hosokawa, M.A. Rudnicki, Y. Chai, F.
Rossi, M.A. Marra, F.D. Miller. Convergent genesis of an adult neural crest-like dermal stem
cell from distinct developmental origins. Stem Cells 28(11):2027-40, 2010. Copyright by
AlphaMed Press; M.D. O’Connor, E. Wederell, G. Robertson, A. Delaney, O. Morozova,
S.S. Poon, D. Yap, J. Fee, Y. Zhao, H. McDonald, T. Zeng, M. Hirst, M.A. Marra, S.A.
Aparicio, C.J. Eaves. Retinoblastoma-binding proteins 4 and 9 are important for human
pluripotent stem cell maintenance. Exp. Hematol. 39(8):866-79.e1, 2011. Copyright by
Elsevier; O. Morozova, V. Morozov, B.G. Hoffman, C.D. Helgason, M.A. Marra. A seriation
approach for visualization-driven discovery of co-expression patterns in Serial Analysis of
Gene Expression (SAGE) data. PLoS ONE 3(9):e3205, 2008. The author contributions for
each manuscript are provided below on a per-manuscript basis.
Sections 2.2.1, 2.2.2, 2.4.1, 2.4.2, 2.4.3, 2.4.4; Figures 2.1, 2.2, 2.3, 2.4, and Table 2.1
are based on the manuscript: H. Jinno, O. Morozova, K.L. Jones, J.A. Biernaskie, M. Paris,
R. Hosokawa, M.A. Rudnicki, Y. Chai, F. Rossi, M.A. Marra, F.D. Miller. Convergent
genesis of an adult neural crest-like dermal stem cell from distinct developmental origins.
Stem Cells 28(11):2027-40, 2010. Copyright by AlphaMed Press. H.J., K.L.J., J.A.B., M.P.,
and F.D.M. were involved in the conception and design of the study. H.J., K.L.J. and J.A.B.
performed the collection and analysis of experimental data, including the RT-PCR
experiments described in Section 2.2.2. R.H., M.A.R., Y.C. and F.R. provided study material.
F.D.M. provided financial support and supervised the study. M.A.M. participated in data
interpretation, and provided supervisory support, including manuscript approval, for the
computational part of the study (microarray data analysis). I performed all the microarray
data analysis, made the figures, interpreted the results, and wrote the sections of the
manuscript reproduced in this thesis, except as specified below. The RT-PCR panels in
Figures 2.3A and B were made by members of F.D.M.’s laboratory. The description of the
RT-PCR method in Section 2.4.4 was written by members of F.D.M.’s laboratory. All animal
use was approved by the Animal Care Committee for the Hospital for Sick Children in
accordance with the Canadian Council of Animal Care policies.
Sections 2.2.5, 2.2.5.1, 2.2.5.2, 2.4.6, 2.4.8; Figures 2.6 and 2.7B; and Table 2.3 are
based on the manuscript: M.D. O’Connor, E. Wederell, G. Robertson, A. Delaney, O.
Morozova, S.S. Poon, D. Yap, J. Fee, Y. Zhao, H. McDonald, T. Zeng, M. Hirst, M.A.
Marra, S.A. Aparicio, C.J. Eaves. Retinoblastoma-binding proteins 4 and 9 are important for
human pluripotent stem cell maintenance. Exp. Hematol. 39(8):866-79.e1, 2011. Copyright
by Elsevier. The author contributions for the sections of the manuscript described in the
thesis are provided below. Members of the Eaves laboratory and their colleagues made the
SAGE libraries described in Table 2.3, and defined the 319 candidate pluripotency genes
listed in Appendix B. G.R. performed the PASTAA motif enrichment analysis discussed in
Section 2.2.5.2, participated in writing Sections 2.2.5.2 and 2.4.8 and made Figure 2.7B.
M.A.M. provided supervisory support for the seriation component of the study. I designed
and performed the seriation analysis of ESC SAGE libraries, made figures and tables and
wrote the sections of the manuscript reproduced in this thesis, except as defined above.
Sections 2.2.5.1 and 2.4.7 are based on the manuscript: O. Morozova, V. Morozov,
B.G. Hoffman, C.D. Helgason, M.A. Marra. A seriation approach for visualization-driven
discovery of co-expression patterns in Serial Analysis of Gene Expression (SAGE) data.
PLoS ONE 3(9):e3205, 2008. O.M. and M.A.M. conceived and designed the study, and
co-wrote the manuscript with input from all co-authors. V.M. developed and implemented the
seriation algorithm in Matlab. B.G.H. and C.D.H. constructed pancreatic SAGE libraries and
provided guidance for biological interpretation of the results (the description and analysis of
pancreatic SAGE libraries is not included in this thesis). M.A.M. supervised the study. I
adapted the seriation algorithm to the analysis of SAGE data, performed the analysis,
interpreted the results and wrote the manuscript with input from all co-authors, including all
portions of the manuscript reproduced in this thesis.
A version of Chapter 3 has been published: O. Morozova, M. Vojvodic, N.
Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R.
Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J.
Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of
neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for
neuroblastoma. Clin. Cancer Res. 16(18):4572-82, 2010. Copyright by the American
Association for Cancer Research. M.V. performed the protein experiments, including mass
spectrometry (Section 3.2.3) and Western Blot (Section 3.2.5), and participated in writing the
relevant sections of the manuscript (Sections 3.2.3, 3.4.4, 3.4.6). P.T., T.K., and M.F.M.
provided technical and supervisory assistance with the mass spectrometry facility, and
approved the manuscript. N.G. performed the drug inhibitor studies, made panel C of Figure
3.3 and participated in writing Section 3.4.5. L.M.H. isolated and cultured NBL TIC lines,
provided materials for sequencing, and participated in writing Section 3.4.1 (describing the
culturing of NBL TICs and SKPs). K.M.B. performed shRNA experiments (Section 3.2.5),
generated data for panel B of Figure 3.3, and participated in writing the relevant sections of
the manuscript (Sections 3.2.5 and 3.4.7). J.M. provided supervisory support to K.M.B. and
approved the manuscript. A.M., T.C., R.D.M., N.T., R.V., and S.J. provided bioinformatic
assistance with processing RNA-Seq data and approved the manuscript. M.H., R.M., and
Y.Z. provided technical assistance with library construction and RNA sequencing of NBL
TIC and SKP libraries and approved the manuscript. K.M.S. provided technical assistance to
the Toronto group. F.M. provided SKP lines for the study. D.R.K. provided project
leadership and financial support to the Toronto component of the study, and approved the
manuscript. M.A.M. provided supervisory and financial support, participated in the study
design, and approved the manuscript. I participated in the study design, conceived and
performed all the computational analyses (detailed in Sections 3.2.1, 3.2.2, 3.2.4, 3.2.6, and
3.2.7), interpreted the data, made the figures and tables (except as described above), and
wrote the manuscript with input from all co-authors.
A version of Chapter 4 is in revision: T.J. Pugh*, O. Morozova*, E.F. Attiyeh, S.
Asgharzadeh, J.S. Wei, D. Auclair, K. Cibulskis, M.S. Lawrence, A.H. Ramos, E. Shefler, A.
Sivachenko, C. Sougnez, I. Birol, R.D. Corbett, K.L. Mungall, Y. Zhao, R.A. Moore, N.
Thiessen, A. Lo, R. Chiu, S.D. Jackman, A. Ally, B. Kamoh, A. Tam, J. Qian, M.
Krzywinski, M. Hirst, S.J. Diskin, Y.P. Mosse, K.A. Cole, M. Diamond, R. Sposto, L. Ji, T.
Badgett, W.B. London, Y. Moyer, J.M. Gastier-Foster, M.A. Smith, J.M. Guidry Auvil, D.S.
Gerhard, M.D. Hogarty, S.B. Gabriel, S.J.M. Jones, G. Getz, R.C. Seeger, J. Khan, M.A.
Marra, M. Meyerson, J.M. Maris. The genomic landscape of high-risk neuroblastomas
reveals a wide spectrum of somatic mutation. In Revision. *Authors contributed equally.
J.M.M., J.K., R.C.S., D.S.G. and M.A.S. conceived and led the project. M.A.M. and M.M.
conceived and supervised all aspects of the sequencing work at the Genome Sciences Centre
and Broad Institute, respectively. E.F.A., S.A., J.S.W., S.J.D., Y.P.M., K.A.C., L.J., T.B.,
Y.M., J.G-F and M.H. selected and characterized samples, provided disease-specific
expertise in data analysis, and edited the manuscript. R.S. and W.L. provided statistical
support. D.A., E.S., C.S., M.D., and J.M.G.A. provided overall project management and
quality control support. K.C., M.S.L., A.H.R., and A.S. supported analysis of exome
sequencing data. I.B., K.L.M., R.C., S.J., and J.Q. performed de novo assembly of Illumina
sequencing data. Y.Z. led the library construction effort for the Illumina libraries. A.T. and
Y.Z. planned the sequencing verification, and A.A. and B.K. performed the experiments.
R.D.C. performed copy number analysis of genome sequencing data. M.K. performed
verification of candidate rearrangements. N.T. ran the gene- and exon-level quantification
pipeline on the RNA-Seq data. A.L. helped interpret data provided by Complete Genomics,
Inc. R.A.M. and M.H. led the sequencing effort for the Illumina genome and transcriptome
libraries. S.B.G. led the sequencing effort for the exome sequencing libraries. G.G. and
S.J.M.J. supervised the bioinformatics group at the Broad Institute and Genome Sciences
Centre, respectively. T.J.P. performed the mutation analysis of the exome sequencing data
and the MutSig analysis. I performed the mutation analysis of genome and transcriptome
sequencing data, and conducted integrative analysis of these data by combining mutation
analysis, copy number analysis and de novo assembly results. Together with T.J.P., I
combined the exome, genome, and transcriptome data from the different sequencing
platforms, and interpreted the results. In concert with T.J.P., D.J.G., M.A.M., M.M. and
J.M.M., I co-wrote the manuscript with input from all co-authors. I made all the figures and
tables in Chapter 4, except Figure 4.1, Figure 4.2, Table 4.1 and Table 4.2, which were
modified from Trevor Pugh’s work.
Table of Contents
Abstract
Preface
Table of Contents
List of Tables
List of Abbreviations
Acknowledgements
Dedication
Chapter 1: Evolving methods of genomic analysis and their application to the study of neuroblastoma
1.1 Introduction
1.2 Cancer as a genetic disease
1.3 Cancer as a multigenic disease
1.4 Origin of genetic mutations in cancers
1.4.1 Familial cancers and cancer syndromes
1.4.2 Genetic causes of sporadic cancers
1.5 Cancer stem cell hypothesis
1.6 Genetic lesions in cancers and methods for their detection
1.6.1 Pre-genomic methods for studying genetic aberrations in cancers
1.6.2 Array-based methods for the detection of genetic lesions in cancer genomes
1.6.3 Sequencing approaches for the detection of genetic lesions in cancers
1.6.3.1 Advances in DNA sequencing technologies
1.6.3.2 Sanger-based sequencing methods for the detection of genetic lesions
1.6.3.3 Cancer sequencing studies using the Sanger technology
1.6.3.4 Cancer genome and exome sequencing using new sequencing technologies
1.7 Cancer transcriptomes as proxies for the genomic diversity of tumors
1.7.1 Transcriptome analysis of cancers using microarrays
1.7.2 Sequence census approaches to transcriptome analysis
1.7.2.1 Whole transcriptome sequencing of cancers
1.8 Integrative genomics of cancers
1.9 Childhood neuroblastoma
1.9.1 Classification, treatment and prognosis
1.9.2 Neuroblastoma genetics and genomics
1.9.2.1 Copy number aberrations
1.9.2.2 Gene expression profiling of neuroblastoma
1.9.2.3 Genetically engineered mouse models of neuroblastoma
1.10 Thesis roadmap and chapter summaries
Chapter 2: Transcriptome analysis of normal neural crest stem cells
2.1 Introduction
2.2 Results
2.2.1 SKPs of distinct developmental origin are highly similar at the transcriptional level and differ from bone marrow mesenchymal stem cells (MSCs)
2.2.2 SKPs of distinct developmental origin maintain a lineage history at the gene expression level
2.2.3 Identification of genes significantly enriched and depleted in neural crest stem cell-like cells
2.2.4 Pathway analysis of SKP-enriched and SKP-depleted transcripts
2.2.5 SKPs share expression profile similarities with ES cells
2.2.5.1 Identification of genes associated with the maintenance of the undifferentiated state in human ES cells
2.2.5.2 Validation of pluripotency markers using computational methods
2.2.5.3 Pluripotency genes whose transcripts are enriched or depleted in normal neural crest stem cell-like cells compared to mesenchymal stem cells
2.3 Discussion
2.4 Materials and methods
2.4.1 Microarray analysis of rat SKP lines
2.4.2 Unsupervised analysis to assess global transcriptome similarity
2.4.3 Differential expression analysis using microarrays
2.4.4 Reverse Transcription Polymerase Chain Reaction (RT-PCR) to confirm results from SKP microarray analysis
2.4.5 Pathway analysis of transcripts enriched and depleted in SKPs compared to MSCs
2.4.6 Seriation analysis of LongSAGE libraries from the Cancer Genome Anatomy Project
2.4.7 Seriation using the progressive construction of contigs heuristic
2.4.8 Computational validation of transcripts in Supercontig 1 as pluripotency markers
Chapter 3: Transcriptome analysis of neuroblastoma tumor-initiating cells for therapeutic target prediction
3.1 Introduction
3.2 Results
3.2.1 Identification of genes preferentially enriched or depleted in NBL TICs compared to a compendium of cancer tissues and SKPs
3.2.2 Elevated mRNA levels of BRCA1 signaling pathway members are associated with the NBL TIC phenotype
3.2.3 MudPIT analysis confirms the abundance of DNA repair proteins in the proteome of a NBL TIC line
3.2.4 Known drug targets among NBL TIC-enriched transcripts
3.2.5 Targeting BRCA1 signaling: inhibition of AURKB is selectively cytotoxic to NBL TICs
3.2.6 Exon-level expression analysis of BARD1 reveals a potential mechanism for the sensitivity of NBL TICs to AURKB inhibition
3.2.7 Relevance to primary neuroblastoma
3.3 Discussion
3.4 Materials and methods
3.4.1 RNA sequencing and data analysis
3.4.2 Microarray experiments and data analysis
3.4.3 Identification of NBL TIC-enriched and depleted genes and the functional enrichment analysis
3.4.4 Gel-free two-dimensional liquid chromatography coupled to shotgun tandem mass spectrometry
3.4.5 AlamarBlue assay
3.4.6 Western blotting
3.4.7 Small hairpin RNA (shRNA) knockdowns
3.4.8 Exon-level analysis of RNA sequencing data
3.4.9 AURKB expression analysis
Chapter 4: Whole genome characterization of primary neuroblastoma tumors reveals a wide spectrum of somatic alteration
4.1 Introduction
4.2 Results
4.2.1 Exome sequencing
4.2.2 Whole genome and transcriptome sequencing
4.2.3 Overall mutation frequencies
4.2.4 Verification of candidate somatic mutations using orthogonal approaches
4.2.5 Genes and pathways with significant frequency of mutation
4.2.6 Genome rearrangements and structural variants
4.2.7 Mutations in other known cancer genes and regions
4.3 Discussion
4.4 Materials and methods
4.4.1 Sample selection and preparation
4.4.2 Illumina library construction and sequencing
4.4.3 Detection of candidate somatic mutations in genome sequencing data
4.4.4 Gene coverage in transcriptome sequencing data
4.4.5 Copy number analysis using genome sequencing data
4.4.6 Rearrangement detection
4.4.7 Exome sequencing and data analysis
4.4.8 Integrated analysis of somatic variation from exome and genome data sets
Chapter 5: Conclusions and future directions
5.1 Transcriptome analysis of normal neural crest cells identifies key pathways, enriched and depleted in this population compared to other related cell types
5.2 Plasticity of the neural crest stem cell phenotype and NBL heterogeneity
5.3 Transcriptome analysis of NBL tumor-initiating cells implicates AURKB as a novel drug target for NBL
5.4 Whole genome, transcriptome and exome sequencing of primary NBL tumors reveals a broad spectrum of somatic mutations
5.5 Future directions in NBL genomics
Bibliography
Appendices
Appendix A Transcripts enriched and depleted in SKPs as discussed in Chapter 2
Appendix B Candidate pluripotency genes used for seriation analysis in Chapter 2
Appendix C Transcripts enriched and depleted in NBL TICs
Appendix D Original data for the 99 NBL cases described in Chapter 4
Appendix E Variant calls detected in the 99 tumor/normal pairs
Appendix F Chromatin remodeling and MAPK pathway gene lists used in Chapter 4
List of Tables
Table 1.1 Specifications of the common next-generation sequencing platforms as compared to the most common Sanger sequencer (Life Technologies’ ABI3730XL)
Table 2.1 Genes with significant evidence of differential expression between (A) fSKPs and dSKPs, and (B) dSKPs and vSKPs as shown in Figure 2.3B
Table 2.2 Pathways enriched among the transcripts increased or decreased in abundance in SKPs compared to MSCs
Table 2.3 LongSAGE libraries used for the seriation analysis described in Section 2.2.4
Table 2.4 Pluripotency genes with transcript abundance increased or decreased in SKPs compared to MSCs
Table 3.1 Human NBL TIC and SKP lines used for gene expression analysis
Table 3.2 List of RNA sequencing libraries and their sequencing statistics
Table 3.3 Proteins detected in the whole and crude membrane cell extract of NBL TIC line NB88R and their corresponding RNA-Seq expression level
Table 3.4 Known drug targets among NBL TIC-enriched genes
Table 4.1 Non-silent mutations in genes of interest along with their validation status
Table 4.2 Genes with significant frequency of somatic mutation
Table 4.3 Notable structural variants detected and confirmed in NBL genomes and transcriptomes
Table 4.4 Parameters used to select high confidence candidate somatic mutations reported by CGI
Table 4.5 Primer sequences used for genomic validation of structural variants and gene fusions detected by the BCCA pipeline
Table 4.6 Primer sequences used for tumor RNA validation of structural variants and gene fusions detected by the BCCA pipeline
Table A.1 Transcripts enriched and depleted in SKPs as discussed in Chapter 2
Table B.1 Candidate pluripotency genes used for seriation analysis in Chapter 2
Table C.1 Transcripts enriched and depleted in NBL TICs
Table D.1 Original data for the 99 NBL cases described in Chapter 4
Table F.1 Chromatin remodeling and MAPK pathway gene lists used in Chapter 4
List of Figures
Figure 1.1 Advances in sequencing chemistry implemented in the earliest next-generation sequencers
Figure 1.2 Transcript model coverage by various sequencing-based methods for transcriptome analysis
Figure 2.1 Global expression patterns are similar across SKPs of distinct development origins
Figure 2.2 Facial and dorsal trunk SKP lineages show similar degrees of divergence from MSCs
Figure 2.3 SKPs of distinct developmental origin express neural crest specification genes despite maintaining a lineage history at the gene expression level
Figure 2.4 Transcripts preferentially enriched or depleted in SKPs compared to MSCs
Figure 2.5 Pathway analysis of transcripts enriched and depleted in SKPs compared to MSCs
Figure 2.6 Seriation analysis to identify developmentally restricted transcripts expressed in undifferentiated ES cells
Figure 2.7 Computational validation of genes identified by seriation as pluripotency markers
Figure 3.1 Transcripts enriched and depleted in NBL TICs compared with SKPs and other tumor tissues
Figure 3.2 Pathway analysis of NBL TIC-enriched transcripts
Figure 3.3 NBL TICs are sensitive to Aurora B kinase inhibition
Figure 3.5 NBL cells preferentially express the oncogenic BARD1beta isoform that is involved in the stabilization of AURKB
Figure 4.1 Overview of the multi-centre next-generation sequencing initiatives and data analyses
Figure 4.2 Somatic mutation frequencies in 99 NBL tumor/normal pairs with samples ordered by type of genes with somatic alteration
Figure 4.3 Integrated analysis of 99 neuroblastoma cases reveals a diversity of somatic aberration
List of Abbreviations
BP        Base Pair
CGI       Complete Genomics, Inc.
COG       Children's Oncology Group
CNV       Copy Number Variant
ESC       Embryonic Stem Cell
ES        Embryonic Stem
ESP       End-Sequence Profiling
GB        Giga Base
GWAS      Genome-Wide Association Study
GWA       Genome-Wide Association
ICGC      International Cancer Genome Consortium
KB        Kilo Base
MB        Mega Base
NBL       Neuroblastoma
RNA-Seq   RNA Sequencing
RPKM      Reads Per Kilobase of Gene Model per Million Mapped Reads
SAGE      Serial Analysis of Gene Expression
SEER      Surveillance Epidemiology and End Results
SI        Splice Index
SKP       Skin-Derived Precursor Cell
SNP       Single Nucleotide Polymorphism
SNV       Single Nucleotide Variant
TARGET    Therapeutically Applicable Research to Generate Effective Treatments
TIC       Tumor Initiating Cell
TCGA      The Cancer Genome Atlas
MSC       Mesenchymal Stem Cell
MPSS      Massively Parallel Signature Sequencing
NCI       National Cancer Institute
Acknowledgements
Over the course of my PhD I have been honored to learn from many talented
scientists, clinicians, professionals, and members of the general public. To these individuals,
only some of whom could be personally mentioned here due to space constraints, I am
indebted for the success in my endeavors and my continued enthusiasm in scientific research.
First and foremost, I could never overstate my gratitude to my PhD supervisor, Dr. Marco
Marra, who has become a role model of excellence in science, leadership and personal
integrity. He has supported me throughout my PhD scientifically, financially and
emotionally, and provided me with numerous invaluable learning opportunities both in
science and in life. I simply could not have wished for a better supervisor. I would like to
express my deepest gratitude to my thesis supervisory committee, Drs. Angela Brooks-Wilson,
Paul Pavlidis, and Samuel Aparicio, who have challenged me with insightful
questions and discussions that had a great impact on my scientific growth. I am also grateful
to the members of my examining committee, Drs. Phil Hieter, Poul Sorensen, Lynn
Raymond, and Annie Huang for their detailed reading of my thesis and thoughtful comments
and questions that have greatly enhanced the final document.
I have been fortunate to participate in a number of collaborative projects that taught
me the benefits and challenges of team science, and allowed me to interact with many
exceptional individuals and world-class scientists. I am honored to have been involved in the
National Cancer Institute Neuroblastoma TARGET initiative, and would like to thank Drs.
John Maris, Daniela Gerhard and Malcolm Smith for this opportunity. I am also thankful to
have worked with Drs. David Kaplan, Freda Miller, Jason Moffat, Gregory Cairncross, Neal
Boerkoel, Connie Eaves, Sheila Singh and members of their laboratories. I would like to
specifically acknowledge Loen Hansford, Milijana Vojvodic, Kristen Smith, Kim Blakely
and Nathalie Grinstein for providing experimental support for my work. On the note of
collaborations, I cannot fail to thank Dr. Stephen Yip for introducing me to neuropathology,
and for helping me on this journey in more ways than could be listed here.
I am privileged to have been part of the Marra lab, and would like to thank its current
and former members for technical assistance, insightful discussions, and emotional support. I
wish to specifically thank Noushin Farnoud, Andy Mungall, Malachi Griffith, Trevor Pugh,
Ryan Morin, Tesa Severson, Rodrigo Goya, Maria Mendez-Lago, Sorana Morrissy, Jill
Mwenifumbo, and Suganthi Chittaranjan for their expertise and team spirit that have
contributed immensely to this thesis. I also would like to acknowledge the gifted summer
students Alexandra Maslova and Yulia Merkulova who have been a great help in my
research. My sincerest gratitude goes to Lulu Crisostomo for her invaluable assistance with
administrative tasks and much more.
I am thankful to have been surrounded by many talented staff and scientists at the BC
Cancer Agency's Genome Sciences Center (GSC), especially, Richard Corbett, Yaron
Butterfield, Karen Mungall, Mikhail Bilenky, Hye Jung (Elizabeth) Chun, Greg Taylor,
Roland Santos, Alireza Hadj Khodabakhshi, Gordon Robertson, Nina Thiessen, and Rob
Chrisp. These individuals have been a source of both scientific and emotional support over
the course of my PhD. My work would not have been possible without the skilled assistance
from the members of the GSC library construction, sequencing, and bioinformatics teams. I
would also like to thank Robyn Roscoe, Karen Novik, Diane Miller, Dominik Stoll and
Cecelia Suragh, for their help with funding applications and project management support.
I have enjoyed being part of the Canadian Institutes of Health Research / Michael
Smith Foundation for Health Research Strategic Training Program in Bioinformatics, and
would like to thank the two foundations for my stipend during the rotations. I would also like
to extend my gratitude to Dr. Steven Jones and Sharon Ruschkowski for fostering a great
training environment, and supporting me through my rotations and thesis work.
In addition to the bioinformatics program stipend, I have been honored to receive salary and
travel funds from the Natural Sciences and Engineering Research Council, Michael Smith
Foundation for Health Research, Genome Canada, American Association for Cancer
Research Women in Cancer Research Council, University of British Columbia, Roman M.
Babicki Fellowship in Medical Research, and the John Bosdet Memorial Fund. I also cannot
fail to acknowledge the Jordan Hopkins Foundation for Cancer Research, the James Fund for
Neuroblastoma Research, the British Columbia Childhood Cancer Parents' Association, and
the Will to Survive Campaign for their passionate support of pediatric cancer research,
including my thesis project.
Finally, I wish to extend my thanks to fellow graduate students Anya Gangaeva,
Meeta Mistry, Shabnam Tavassolli, Katayoon Kasaian, Warren Cheung, Leon French,
Kieran O'Neill, Anthony Fejes, and Yvonne Li, as well as my family and friends for being a
great source of encouragement, motivation, and fun throughout these years.
Dedication
To Anna, Ava, Emily, Ethan, Brendan, Connor, Jake, James, Jordan, Kaiya, Nate, Maya,
Reese, Ryan, Taras, Shivank as well as countless others who have journeyed through the
world of neuroblastoma, and to Megan McNeil, who fought hard for a day when no child
would die from cancer.
Chapter 1: Evolving methods of genomic analysis and their application to
the study of neuroblastoma1
1.1 Introduction
While it has long been recognized that cancers are genetic diseases, it is only with the
recent advent of high resolution genomic technologies that the exact nature of the genetic
changes associated with most cancers is being elucidated. This Chapter reviews the
evolution of genomic approaches that have been developed for cancer analysis, with an
emphasis on the genomic technologies, microarrays and next-generation sequencing, used for
the research described in Chapters 2, 3 and 4 of this thesis. A specific focus of the
dissertation is on the genomic analysis of pediatric neuroblastoma, a cancer of the developing
sympathetic nervous system that most commonly affects children under the age of 5. Section
1.9 provides a brief overview of the clinical and biological features of neuroblastoma, as well
as the advances in neuroblastoma genetics and genomics. Finally, Section 1.10 introduces the
specific hypotheses and experimental goals addressed in each of the research chapters of this
thesis.
1.2 Cancer as a genetic disease
The presence and causative role of genetic defects in cancer cells was first suggested
by David von Hansemann and Theodor Boveri in the 1890s-1900s [1]. Boveri accepted von
Hansemann's original idea that abnormal chromatin content was central to cancer cells, and
refined it in his subsequent experimental work on sea urchin embryos. Using the sea urchin
model system, Boveri observed that abnormal numbers of chromosomes led to improper
embryonic development, and, in some cases, to uncontrolled cell growth. Boveri further
hypothesized that genetic aberrations came in two flavors, those stimulatory and those
inhibitory to cell growth [2,3]. The growth stimulatory chromosomes would be accumulated
1 Portions of this Chapter have been published, and the author contributions are provided in the Preface as per
the University of British Columbia PhD thesis guidelines: O. Morozova, M. Hirst, M.A. Marra. Applications of
new sequencing technologies for transcriptome analysis. Annu Rev Genomics Hum Genet 10:135-51, 2009.
Copyright by Annual Reviews; O. Morozova and M.A. Marra. Applications of next-generation sequencing
technologies in functional genomics. Genomics 92(5), 2008. Copyright by Elsevier; O. Morozova and M.A.
Marra. From cytogenetics to next-generation sequencing technologies: advances in the detection of genome
rearrangements in tumors. Biochem. Cell Biol. 86(2):81-91, 2008. Copyright by Canadian Science Publishing.
by cancer cells, while the inhibitory ones would be excluded. Boveri's prescient concepts of
stimulatory and inhibitory genetic material were much later manifested in the notions of
oncogenes and tumor suppressors, collectively known as cancer genes [4–6]. Oncogenes and
tumor suppressors are genes whose products function in cell growth pathways or are
involved in the control of the cell cycle. Mutated oncogenes typically function in a dominant
fashion, while mutations in tumor suppressors are recessive [7]. The first cellular oncogene,
c-src, was discovered by homology with a viral sequence previously shown by Peyton Rous to
cause sarcomas in hens [8,6]. The normal cellular homologues of viral oncogenes are
commonly referred to as prototype oncogenes (proto-oncogenes) to highlight the fact that
they need to be activated by a genetic event (a gain-of-function mutation) to become
oncogenes, whereas the viral counterparts encode constitutively active pro-survival proteins.
Another class of genes that contributes to cancer, and is sometimes included under the term
"cancer genes", comprises genes involved in DNA repair. Defects in these genes contribute to
the increased rate of accumulation of DNA damage as well as genomic instability that in turn
enhances the likelihood of producing a genetic alteration affecting a proto-oncogene or tumor
suppressor (e.g. mutations in mismatch repair genes are responsible for hereditary
nonpolyposis colorectal cancer [9]).
1.3 Cancer as a multigenic disease
Mathematical modeling studies that used epidemiological data on the age distribution
of common cancers have led investigators, such as Carl Nordling, to propose that several
(originally as many as seven) genetic hits may be required for tumorigenesis [10]. Alfred
Knudson applied the idea of multistep tumorigenesis to the study of retinoblastoma, a
pediatric cancer that can occur in both sporadic and familial forms. Knudson used statistical
modeling to suggest that the distribution of sporadic and familial retinoblastoma tumors was
consistent with the disease being caused by two hits (later termed Knudson's two-hit
hypothesis). The two-hit hypothesis suggested that in familial cases the first genetic hit was
inherited and the second acquired somatically, while in sporadic cases both hits were somatic
[11]. This model explained why familial but not sporadic cases often presented with multiple
tumors or tumors in both eyes.
Nordling's original paper suggested that only hits conferring a survival
advantage on cancer cells would count towards the proposed seven required for
tumorigenesis, thereby alluding to the ideas of cancer driver mutations and clonal evolution.
Peter Nowell later formalized these ideas into a theoretical framework of stepwise acquisition
and Darwinian selection of genetic changes that underlies our current view of tumorigenesis
[12]. Nowell also suggested that early genetic mutations that occur in cancer cells may
contribute to genomic instability and even more genetic alterations observed in later-stage
tumors. However, due to limited biological knowledge available at the time, he was unable to
pinpoint the exact nature of the genetic changes required for tumorigenesis. In a seminal
paper published in 1990, Eric Fearon and Bert Vogelstein combined previous theoretical
work with advances in the identification of oncogenes and tumor suppressors to propose a
specific molecular model of colorectal tumorigenesis [13]. According to this model,
aggressive colorectal carcinomas developed from benign adenomas by sequential acquisition
of changes that included activation of oncogenes and inactivation or loss of tumor
suppressors. The model also incorporated epigenetic changes, such as DNA
hypomethylation, which was originally reported to occur in tumors by Feinberg and
Vogelstein [14], but shown to have a causal role in cancer only several years later [15]. It is
now accepted that abnormalities in cancer genes, accumulated and selected for in a step-wise
process, contribute to a genetic landscape that underlies the biological hallmarks of tumors:
self-sufficiency in growth signals, insensitivity to growth-inhibitory signals, evasion of
programmed cell death (apoptosis), unlimited replicative potential, sustained angiogenesis,
and tissue invasion [16].
1.4 Origin of genetic mutations in cancers
1.4.1 Familial cancers and cancer syndromes
While most cancers are acquired sporadically, a fraction of malignancies, such as
familial breast cancer, cluster in families and are associated with inherited mutations in
cancer genes [17]. In addition, several cancer syndromes have been characterized and linked
with an overall increased risk of cancers; for instance, Li-Fraumeni Syndrome and
von Hippel-Lindau disease are associated with an increased risk of certain types of solid tumors [18].
Familial cancers and cancer syndromes have been instrumental in inferring the identities of a
fraction of cancer genes that play a role in both sporadic and familial forms of the same
malignancy. For example, tumor suppressors RB and VHL are altered in both sporadic and
familial forms of retinoblastoma and renal cell carcinoma, respectively [19,20]. However, in
most cases, such as ductal and lobular breast cancer, alterations in different genes underlie
sporadic and familial forms of the same disease [21].
1.4.2 Genetic causes of sporadic cancers
Most well-characterized familial cancer syndromes follow a dominant mode of
inheritance and are associated with a small number of rare alleles that confer a significant
effect on the phenotype [18]. The completion of the first two human genome sequences and
the International HapMap Initiative led to the realization of the abundance of human genetic
variation that may contribute to an individual's risk of common diseases, including cancer
[22–24]. This realization brought about investigations into recessive genetic components that
may influence susceptibilities to sporadic cancers. Due to the high lifetime risk of
developing a sporadic cancer at an invasive site (45% for men and 38% for women in the
US according to the Surveillance Epidemiology and End Results database [25]), studies of
cancer families are confounded by a high likelihood of chance associations and difficulties in
discerning hereditary and environmental causes [17]. To address these concerns and help
delineate the potential hereditary component of common cancers, a large-scale study
designed to compare co-occurrence of common cancers in monozygotic and dizygotic twins
was conducted [26]. The study examined 44,788 sets of twins from Sweden, Finland and
Denmark, and found minor contributions of a hereditary component to susceptibility for most
types of cancer, suggesting that the most significant causes of common sporadic cancers were
environmental. Environmental agents that have been associated with cancer include tobacco
smoke, UV light, radiation, hormones, viruses, and various chemical substances. In fact, it is
currently thought that the environmental causes of human cancers are underappreciated [27].
Given these observations, sporadic cancers are likely caused by a combination of inherited
predisposition alleles and acquired (somatic) mutations that result in uncontrolled
proliferation and tumor growth. Inherited or acquired defects in DNA repair, replication or
segregation can aggravate the neoplastic phenotype and contribute to further cancer
progression. The acquired alterations may arise through the exposure to environmental
agents, or due to other factors, some of which may be currently unknown.
1.5 Cancer stem cell hypothesis
Stem cells are defined as special cells within a multicellular organism that are able to
self-renew and, through cell division, generate specialized cell types that compose each tissue
within the body. For instance, embryonic stem cells are able to generate all cell types within
the developing embryo while adult (somatic) stem cells are able to regenerate cell types
within a particular tissue [28]. Modern use of the term "cancer stem cell" was pioneered
by the work in leukemia that showed that the cell of origin of leukemias, regardless of their
heterogeneity, consistently exhibited properties of the normal hematopoietic stem cell [29].
This work resolved the long-standing debate on the target cell that was susceptible to leukemic
transformation, and implicated the hematopoietic stem cell in this role. Since this landmark
study, similar observations have been made in brain and breast malignancies [30,31]. Both of
these reports together with the original leukemia work suggested that a small fraction of cells
(0.1-0.0001%) within each tumor maintains stem cell properties and is responsible for
self-renewal and regeneration of the tumor hierarchy by producing differentiated cells that form
the bulk of the tumor. However, the idea of rarity of tumor-regenerating cancer stem cells
was questioned by the work in melanoma, which reported that an average of 27% of unsorted
melanoma cells from patients were capable of forming tumors in mice in single-cell
transplant experiments [32]. The contention of the melanoma study was that the common use
of NOD/SCID mice, such as that reported in the original leukemia work [29], may
underestimate the frequency of tumor-forming cells as these mice have remnants of
immunity and are less susceptible to developing cancer. In contrast, the melanoma study used
NOD/SCID interleukin-2 receptor gamma chain null (Il2rg (-/-)) mice that are more
immunocompromised than the NOD/SCID mice and are thus better suited to estimating the
true tumorigenic capacity of cancer cells.
Another study challenged the presumed origin of cancer stem cells from resident
normal stem cells within the tissue and showed that breast cancer stem cells may arise from
tumor cells via an epithelial-to-mesenchymal transition induced by immune signaling [33].
Given these observations, the state-of-the-art version of the cancer stem cell model is
dynamic, and incorporates the possibility of variable frequencies of stem cells in different
cancer types, as well as the potential for inter-conversion of cancer stem cell and non-stem
cell compartments within the tumor [34].
An important result from studies in the cancer stem cell field is the finding that cancer
stem cells may be resistant to therapies and may be associated with tumor recurrence [35]. As
such, cancer stem cells provide important targets for novel therapy development, particularly
for recurrent and refractory disease. Therefore, studying genetic changes found in these cells
may shed light on potential therapeutics that are specific to the cancer stem cell
compartment.
1.6 Genetic lesions in cancers and methods for their detection
Somatic and germline aberrations implicated in tumorigenesis can affect a single base
(point mutations) as well as multiple bases (translocations, inversions, small
insertions/deletions (indels), and copy number variants (CNVs)). Throughout this thesis,
events are defined as duplications or deletions if they are <1 kb in length and as CNVs if they
are >1 kb in length. In addition, losses of heterozygosity (LOH) are defined in the context of
tumor suppressors when one allele, most often the functional one, is lost either through the
loss of a copy of a chromosome or via a copy-number-neutral mechanism.
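The size convention stated above can be written down as a small classifier. This is only a minimal sketch: the `CopyNumberEvent` type, its field names and the example coordinates are hypothetical, and because the text leaves an event of exactly 1 kb undefined, the sketch assumes such an event is treated as a CNV.

```python
from dataclasses import dataclass

KB = 1_000  # base pairs per kilobase


@dataclass(frozen=True)
class CopyNumberEvent:
    """Hypothetical record for one copy number change (not from the thesis)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    gain: bool   # True for extra copies, False for lost copies

    @property
    def length(self) -> int:
        return self.end - self.start


def classify_event(event: CopyNumberEvent, threshold: int = KB) -> str:
    """Apply the <1 kb / >1 kb convention; exactly 1 kb is assumed to be a CNV."""
    if event.length < threshold:
        return "duplication" if event.gain else "deletion"
    return "CNV gain" if event.gain else "CNV loss"


# Illustrative coordinates only
small_gain = CopyNumberEvent("chr1", 1_000, 1_500, gain=True)
large_loss = CopyNumberEvent("chr1", 2_000_000, 2_090_000, gain=False)
print(classify_event(small_gain))
print(classify_event(large_loss))
```

Keeping the threshold as a parameter makes it easy to see how calls would shift under a different size cut-off.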
Due to their size and ease of detection, genome rearrangements involving whole
chromosomes or their parts were the earliest genetic events shown to be associated with
cancer [3]. Chromosomal translocations can result in either chimeric protein products or
aberrant gene expression due to the apposition of coding sequences to regulatory regions of
other genes, either of which can be associated with cancer genes [7]. Copy number gains
(amplifications) have been shown to be linked to increased expression of oncogenes, such as
MYCN in neuroblastoma, while regions of copy number losses may harbor tumor suppressor
loci [36]. In addition, coding or regulatory sequence of cancer genes can be disrupted by
point mutations and small indels affecting the amino acid sequence or gene expression,
respectively. These smaller events evaded detection by early low resolution approaches, and
the extent of their contribution to tumorigenesis has been realized only in the past decade.
1.6.1 Pre-genomic methods for studying genetic aberrations in cancers
The earliest methods for detecting chromosomal and genomic aberrations in cancers
involved microscopic examinations of chromosomes and chromosome banding patterns [37].
Application of these approaches led to the discovery of the Philadelphia chromosome, which
results from an exchange of DNA between chromosomes 9 and 22 in chronic myelogenous
leukemia (CML) [38,39]. PCR-based methods have been used to detect known genome
rearrangements, particularly alterations in gene copy number. These methods produce results
promptly, require little starting material, and are excellent for locus-specific identification of
known rearrangements of a few kb in size. Several techniques allow detection of genomic
lesions larger than those detectable by traditional PCR (5 – 6 kb) [40]. For instance, Long
PCR uses a mixture of two polymerases, a proofreading and a non-proofreading one, thus
increasing the product size to 35 kb [40]. The product length of non-proofreading
polymerases is limited by the low efficiency of extension at mismatched bases, while the
product length of proofreading polymerases can be limited by their 3′-exonuclease activity;
therefore, combining the two types of polymerases increases the product size beyond that
achievable by either enzyme alone. This method is useful for identifying specific large aberrations, including
intragenic deletions, insertions and duplications [41].
An important milestone in molecular cytogenetics was the development of in situ
hybridization. This procedure is based on the principle of the hybridization of a labeled
probe, containing genomic DNA of interest, to a complementary target; probe copy number
is assessed by means of microscopic visualization. Since the first report of the method in
1969 [42], in situ hybridization methods have undergone extensive advancement with regard
to both the target and the probe [43]. The most commonly used conventional in situ
hybridization protocol in cancer research is dual-color fluorescence in situ hybridization
(FISH). This method involves labeling centromeres and the DNA region of interest with
different colors and estimating probe copy number from the ratio of the centromeric and
non-centromeric signal. Dual-color FISH is used for the detection of chromosomal gains or losses
(aneuploidy); intrachromosomal insertions, deletions, inversions, amplifications; and
chromosomal translocations in both solid and hematopoietic cancers [44]. An extension of
conventional FISH methods is the development of multi-fluorochrome techniques such as
multiplex FISH (M-FISH) [45], spectral karyotyping (SKY) [46] and combined binary ratio
labeling (COBRA) [47] which allow the simultaneous visualization of all chromosomes in 24
colors. Improvements in target resolution have been achieved through the use of different
probe substrates, including metaphase chromosomes (~5 Mb resolution), interphase nuclei
(50 kb – 2 Mb resolution), and extended chromatin or DNA fibers (5 – 500 kb resolution)
[43]. Mapped genomic clones such as bacterial artificial chromosomes (BACs), P1-derived
artificial chromosomes (PACs), and yeast artificial chromosomes (YACs) have also been
used as FISH probes to achieve a higher resolution mapping of genome rearrangements to the
human genome sequence than that achievable by chromosome FISH [48–50].
1.6.2 Array-based methods for the detection of genetic lesions in cancer genomes
Comparative genomic hybridization (CGH) is a molecular cytogenetic method for
detecting relative differences in copy number between two genomes. In its original form,
DNAs from reference and test samples were labeled with different colors and hybridized to
metaphase chromosomes. The ratios of test to reference fluorescence intensities were
quantified using digital image analysis, and were used to identify genomic losses or gains in
the test sample (e.g. a tumor sample) with respect to the reference sample [51]. Conventional
CGH is labor intensive and provides a relatively low resolution of 5 to 10 Mb for deletions and 2
Mb for amplifications [52]; moreover, it is unsuitable for the detection of balanced
rearrangements (e.g. balanced translocations and inversions) as well as whole genome copy
number changes (ploidy) [53]. However, CGH can be used as a discovery tool as it requires
no prior knowledge of chromosomal imbalances.
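The ratio-based logic of CGH can be illustrated with a short sketch. The code below is not the published image-analysis procedure: the log2 transform, the median centring and the 0.3 threshold are all assumptions made for illustration, and the intensity values are invented. Under the assumed median centring, a uniform change across every locus cancels out, which is one way to see why whole-genome ploidy changes are hard to detect from ratios alone.

```python
import math
import statistics


def centred_log2_ratios(test, reference):
    """Return median-centred log2(test/reference) ratios, one per locus."""
    if len(test) != len(reference):
        raise ValueError("test and reference must cover the same loci")
    ratios = [math.log2(t / r) for t, r in zip(test, reference)]
    centre = statistics.median(ratios)
    return [ratio - centre for ratio in ratios]


def call_copy_number(test, reference, threshold=0.3):
    """Label each locus 'gain', 'loss' or 'neutral'; the threshold is an assumption."""
    calls = []
    for ratio in centred_log2_ratios(test, reference):
        if ratio >= threshold:
            calls.append("gain")
        elif ratio <= -threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Invented fluorescence intensities at five loci
reference = [100.0, 100.0, 100.0, 100.0, 100.0]
tumor = [100.0, 210.0, 95.0, 48.0, 105.0]
doubled = [200.0, 200.0, 200.0, 200.0, 200.0]  # every locus doubled

print(call_copy_number(tumor, reference))
print(call_copy_number(doubled, reference))
```

With these inputs the second and fourth loci are called as a gain and a loss, while the uniformly doubled sample yields only neutral calls.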
To overcome the low resolution limitation of CGH, array CGH (aCGH) was
developed. In aCGH, the differentially labeled test and reference DNA is hybridized to a
glass slide containing arrayed DNA probes rather than metaphase chromosomes [54]. With
the recent development of arrays of mapped clones spanning whole chromosomes [55,56]
and the whole human genome [57], large-scale aCGH experiments are feasible. For instance,
79 kb resolution has been achieved using a genome-wide array of BACs [58]; 75 and 110 kb
resolutions have been reported with chromosomal arrays containing a mix of BACs/PACs
and fosmids/cosmids, and BACs only, respectively [55,56]. Arrays of mapped genomic
clones are robust with a high signal to noise ratio, and have been applied to the detection of
copy number changes in tumors on a genome-wide and chromosome-wide scale [59,52]. In
contrast, oligonucleotide arrays can provide a higher resolution (generally 5 to 50 kb) but
have been reported to suffer from lower sensitivity, resulting in failure to reliably detect
low-copy number changes due to a poorer signal to noise ratio [60]. Oligonucleotide array CGH
can potentially provide even higher resolution than 5 kb as overlapping oligonucleotides can be
synthesized with as little as a single base offset [53]. Despite the popularity of aCGH
methods, the main technological limitation of these methods is their restricted applicability to
the detection of genome rearrangements that involve a change in copy numbers relative to a
reference sample.
Single Nucleotide Polymorphism (SNP) arrays, originally designed for genotyping,
are oligonucleotide arrays that detect the two different alleles of biallelic SNPs [61]. Probe
signal intensities can be used to determine SNP genotypes and to detect copy number
changes [62]. In contrast to array CGH, in which samples are differentially labeled and
co-hybridized, only one labeled sample is hybridized to the SNP array at a time; CNVs are
detected by comparison with one or several reference samples analyzed in separate
hybridizations. Currently, SNP arrays capable of genotyping more than 1 million SNPs are
available from companies such as Illumina and Affymetrix, providing a resolution that
matches or exceeds that of most state-of-the-art aCGH platforms. An important advantage of
SNP arrays is the ability, unique among genomic methods discussed thus far, to detect copy
number neutral losses of heterozygosity [63]. Further, SNP arrays have been used to detect
allele-specific copy number variants [64]. A disadvantage of the technology is the
requirement of a PCR amplification step to increase the signal to noise ratio; as a result,
amplification biases may be introduced giving rise to spurious CNVs [53]. Moreover, CNV
predictions achieved using SNP arrays vary depending on the reference set and
computational approach used [65]. Even so, SNP arrays have been widely applied to the
analysis of genomes of various tumors including neuroblastoma [66] and in The Cancer
Genome Atlas discussed in Section 1.6.3.3.
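The way genotype and copy number information combine to reveal copy number neutral loss of heterozygosity can be sketched as follows. This is an illustrative simplification, not the algorithm of any array vendor or of the studies cited above: it assumes that genotypes have already been called as allele strings such as "AB" and that an integer copy number estimate is available for the tumor at each SNP. The function name and output labels are hypothetical.

```python
def classify_loh(normal_genotype: str, tumor_genotype: str, tumor_copy_number: int) -> str:
    """Classify one SNP from a matched normal/tumor pair (simplified sketch)."""
    # Only SNPs that are heterozygous in the normal sample are informative.
    if len(set(normal_genotype)) < 2:
        return "uninformative"
    # If the tumor still carries both alleles, heterozygosity is retained.
    if len(set(tumor_genotype)) > 1:
        return "heterozygosity retained"
    # One allele has been lost; the copy number decides the LOH type.
    if tumor_copy_number == 2:
        return "copy-neutral LOH"
    if tumor_copy_number < 2:
        return "LOH with copy loss"
    return "LOH with copy gain"


# Invented genotype calls for four SNPs: (normal, tumor, tumor copy number)
snps = [
    ("AB", "AA", 2),
    ("AB", "B", 1),
    ("AA", "AA", 2),
    ("AB", "AB", 2),
]
for normal, tumor, copies in snps:
    print(normal, tumor, copies, classify_loh(normal, tumor, copies))
```

The first SNP is the case that a copy number profile alone would miss: two copies are present, but both carry the same allele.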
1.6.3 Sequencing approaches for the detection of genetic lesions in cancers
1.6.3.1 Advances in DNA sequencing technologies
With the completion of the reference human genome projects [22,24], the need for resequencing studies in which individual genomes and genomic segments are examined for the
presence of changes linked to the phenotype of interest became apparent. This observation
drove technological developments that resulted in the advent of a panel of conceptually new
sequencing methods collectively referred to as "next-generation", "new generation" or
"second generation" sequencers that are more cost-effective than Sanger sequencing. A
standard DNA sequencing workflow has traditionally included three key steps: sample
preparation, sequencing, and data analysis. The new sequencing technologies improve upon
the Sanger protocol through advances in the first two steps of the workflow, albeit often at the cost of
higher error rates and shorter read lengths that can challenge data analysis.
Several high throughput new-generation sequencing technologies are currently
commercially available, including 454/FLX (Roche), Illumina, SOLiD (Life Technologies),
Pacific Biosciences, and Ion Torrent (Life Technologies). As of July 2011, the Helicos Heliscope
instrument used in several published next-generation sequencing studies is no longer
available for purchase. In the research described in this thesis, the Illumina technology is
used in Chapter 3 to analyze the transcriptomes of neuroblastoma tumor-initiating cells as
well as their normal counterparts. In Chapter 4, the same technology is used to analyze the
genome, exome and transcriptome sequences of primary neuroblastoma tumors.
The new technologies produce an abundance of short reads at a higher throughput
than is achievable with the state-of-the-art Sanger sequencer, and their specifications are
summarized in Table 1.1. An additional company not mentioned in Table 1.1, Complete
Genomics, Inc. (CGI), provides whole human genome sequencing and analysis as a service
[67]. Genome sequences generated by CGI from primary neuroblastoma tumors and matched
peripheral blood are discussed in Chapter 4. The advances in sample preparation and
sequencing chemistry and detection are reviewed below for the most common next-generation sequencing technologies: 454/Roche, Illumina, and SOLiD. To provide an
example of the true single molecule technology, the Helicos Heliscope is also discussed.
1.6.3.1.1 Advances in Sample Preparation
In the original Sanger sequencing protocol, a DNA sample is first sheared into
fragments, and then subcloned into vectors, followed by the amplification in bacterial or
yeast hosts. The amplified DNA is then isolated and sequenced with the Sanger chain
termination method [68]. Cloning-based amplification allows for the sequencing of
contiguous large fragments, and does not require prior information about the genome
sequence (termed "de novo sequencing"). However, it is prone to host-related biases, and is
lengthy and labor intensive, restricting large-scale Sanger sequencing to designated genome
sequencing centers. Cloning-based amplification followed by Sanger sequencing was used
for the determination of the first human genome sequences [24,22]. Notably, when a
reference genome sequence of an organism is available and when regions to be sequenced are
small, templates can be prepared for sequencing by PCR amplification instead of cloning
[69].
A major advantage of the second-generation sequencing platforms is the elimination
of the in vivo cloning step and its replacement with PCR-based amplification. Both
454/Roche [70] and Applied Biosystems SOLiD technologies circumvented the cloning
requirement by taking advantage of emulsion PCR [71], which uses emulsion droplets to
isolate single DNA templates in separate micro reactors where amplification is carried out.
This template amplification is also used in Ion Torrent instruments [72]. The Illumina
platform [73,74] uses bridge amplification, a solid phase amplification approach in which
DNA molecules are attached to a solid surface and amplified in situ, generating clusters of
identical DNA molecules. Both of these amplification approaches result in the generation of
a collection of clonal copies of the template, which are fed into subsequent steps of the
sequencing pipelines. The first commercial single-molecule method, developed
by Stephen Quake's laboratory and commercialized by Helicos Biosciences, eliminated the
amplification step by directly sequencing single DNA molecules bound to a surface [75].
Another commercially available single-molecule sequencing method (Pacific Biosciences)
employs real-time detection of single fluorescently-labeled nucleotides as they are
incorporated by a polymerase [76]. Such single-molecule sequencing approaches are referred
to as third-generation technologies. Third-generation sequencers have the potential to reduce
the sequencing costs of the second-generation instruments, although their scalability remains
unproven.
1.6.3.1.2 Advances in Sequencing Chemistry and Detection
The paradigm of the original Sanger method is the DNA polymerase-dependent
synthesis of a complementary strand in the presence of four labeled nonreversible synthesis
terminators, 2´,3´-dideoxynucleotides (ddNTPs) corresponding to the four natural 2´-deoxynucleotides (dNTPs). The four non-reversible terminators are incorporated into the
growing DNA strand at random in place of the corresponding dNTP, thereby producing a
collection of DNA fragments of varying lengths that are then separated by polyacrylamide
gel electrophoresis [68]. Originally, radioactive ddNTPs were used and four different
reactions were required per template molecule. Subsequently, the radioactive ddNTPs were
replaced with fluorescently labeled terminators that allowed the four sequencing reactions to
be carried out simultaneously with different ddNTPs distinguishable by emission spectra
[77]. Another variation of automated Sanger sequencing is the dye-labeled primer sequencing
in which fluorescent dyes are attached to the 5′ end of primers [78]. A key disadvantage that
hindered further development of this method as compared to the dye-labeled terminators
described above was the need for four separate extension reactions, which had to be pooled
prior to loading, and four dye-labeled primers for each template.
Other improvements of Sanger sequencing included the replacement of slab gel
electrophoresis with capillaries, the advent of capillary arrays that allowed sample
multiplexing, and the deployment of production-scale sequencing workflows. As a result of
these developments, the Sanger method achieved the read length, accuracy, and throughput
compatible with de novo sequencing of whole genomes. To date, Sanger sequencing has been
responsible for the generation of reference genome sequences of many species including that
of human [22,24].
The pyrosequencing approach was the first alternative to Sanger sequencing to
achieve commercialization as part of the Roche/454 instrument [70]. Pyrosequencing uses
chemiluminescence-based detection of each released pyrophosphate that occurs upon the
incorporation of a nucleotide by the DNA polymerase (Figure 1.1A). The four nucleotides
are added to the sequencing reaction one at a time, such that only one type of nucleotide is
available to the DNA polymerase at a given step. The addition of the correct nucleotide is
accompanied by the release of light allowing for the inference of the nucleotide identity at
each position in a sequencing read. The amount of light produced is proportional to the
number of incorporated nucleotides, potentially permitting the detection of homopolymers. In
practice, however, sequencing of homopolymer stretches using the Roche/454 technology is
error-prone [79]. In the 454 FLX instrument, about 1.6 million pyrosequencing reactions
occur in parallel, each in a separate well of a picotiter plate contributing to a much higher
sequencing throughput than that achieved in a 96-well capillary array of a modern Sanger
sequencer. Similarly to 454/Roche, the Illumina Genome Analyzer also uses sequencing-by-synthesis, albeit with a different detection chemistry [74]. The Illumina sequencing reaction
utilizes four fluorescently labeled nucleotide analogs that serve as reversible sequencing
terminators, and highly modified DNA polymerases that are capable of incorporating these
analogs into the growing oligonucleotide chain (Figure 1.1B). At each step the correct
nucleotide analog is incorporated into the growing chain and its identity is revealed by the
color of its fluorescent label. Importantly, the 3´-OH group of the nucleotide is blocked to
prevent further extension of the nascent DNA chain. After the imaging step, the label is
washed off and the blockage is reversed, thereby allowing the synthesis to proceed. The
sequencing reactions occur in a massively parallel fashion on a flow cell, which is a glass
surface that contains hundreds of millions of clusters of clonally identical DNA molecules.
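To make the pyrosequencing readout described above concrete, the following minimal Python sketch converts a series of normalized light intensities into a base sequence. It is an illustration only and not the base-calling method of the Roche/454 software: the cyclic flow order (TACG), the normalization in which a signal of 1.0 corresponds to a single incorporation, and the function name decode_flowgram are all assumptions made for this example.

```python
def decode_flowgram(intensities, flow_order="TACG"):
    """Translate normalized light intensities into a base sequence.

    Assumption: nucleotides are flowed cyclically in `flow_order`, and a
    signal of about 1.0 corresponds to one incorporated nucleotide.
    """
    bases = []
    for i, signal in enumerate(intensities):
        nucleotide = flow_order[i % len(flow_order)]
        # Round to the nearest whole number of incorporations.
        run_length = int(signal + 0.5)
        bases.append(nucleotide * run_length)
    return "".join(bases)


# Hypothetical flowgram covering two cycles of T, A, C, G flows.
read = decode_flowgram([1.02, 0.05, 2.1, 0.0, 0.0, 0.97, 0.1, 3.4])
print(read)
```

In this toy flowgram the final signal of 3.4 lies close to the midpoint between three and four incorporations, which illustrates why homopolymer lengths are difficult to call reliably from light intensity alone.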
The true single-molecule sequencing approach commercialized by Helicos
Biosciences in the HeliScope instrument also used a sequencing-by-synthesis procedure in
which virtual terminators (nucleotide analogs that reduce the processivity of DNA
polymerase) are used [80]. The reduced DNA polymerase processivity allows for the
accurate identification of homopolymer stretches. In the Helicos system, single-molecule
DNA templates are captured on the flow cell surface. The Cy3-labels attached at both ends of
each DNA molecule are used to reveal the location of each template bound to immobilized
primers on the surface of the flow cell. The Cy5-labeled nucleotides are added to the reaction
one at a time, and the incorporated nucleotides are detected (Figure 1.1C).
In contrast to the polymerase-based approaches discussed above, the SOLiD
(Supported Oligonucleotide Ligation and Detection System) system uses a sequencing-by-ligation approach in which the sequence is inferred indirectly via successive rounds of
hybridization and ligation events. This approach was first published by the Church laboratory
as the "polony sequencing technique" [81]. The SOLiD system uses 16 dinucleotides, each
carrying a fluorescent label. Four fluorescent dyes are used in the system such that one dye
labels four different dinucleotides (Figure 1.1D). The identity of each base is determined
from the fluorescent readout of two successive ligation reactions. An advantage of the two-base encoding scheme is that each position is effectively probed twice, in principle allowing
for the distinction of sequencing error from a true sequence polymorphism.
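The two-base encoding can be sketched in a few lines of Python. The numeric color labels (0 to 3) and the bitwise XOR representation used here are a common way of writing down a four-color dinucleotide code and are assumptions of this example rather than details given above; the function names are hypothetical.

```python
# Assumed 2-bit labels for the bases; the color of a dinucleotide is the
# XOR of the labels of its two bases, so each color covers four dinucleotides.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """Return the color (0-3) of every overlapping dinucleotide in seq."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


reference = "ATGGCA"
variant = "ATGTCA"  # hypothetical single-base substitution (G to T)
ref_colors = encode_colors(reference)
var_colors = encode_colors(variant)
changed = [i for i, (r, v) in enumerate(zip(ref_colors, var_colors)) if r != v]
print(ref_colors, var_colors, changed)
```

Under this scheme a true single-base substitution changes two adjacent colors (positions 2 and 3 in the example), whereas an isolated color change is not consistent with a single substitution and is therefore more likely to reflect a measurement error.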
1.6.3.2 Sanger-based sequencing methods for the detection of genetic lesions
Since Sanger sequencing had been the only available sequencing technology for more
than 20 years, routine whole genome sequencing was not feasible in that time frame, and
Sanger-based methods for rearrangement detection, not requiring whole genome sequencing,
had been developed. Digital karyotyping (DK) is a method for genome-wide analysis of copy
number changes and other genome rearrangements [82]. The method can be regarded as a
"genomic version" of the serial analysis of gene expression (SAGE) technique [83] described
in Section 1.7.2.
In DK, genomic DNA is digested with a mapping restriction enzyme, originally SacI
(with a 6 bp recognition sequence) followed by the ligation of biotinylated linkers and a
second digestion using a fragmenting restriction enzyme with a 4 bp recognition sequence.
The biotinylated sequences are isolated by binding to streptavidin and the DNA tags are
released using a tagging enzyme with a 6 bp recognition sequence. The isolated sequence
tags are concatenated, cloned, sequenced, and aligned to a reference genome assembly,
providing a copy number estimate at the particular locus. The combination of the mapping
and fragmenting enzymes used determines the size of detectable rearrangements, and the
genome-wide occurrence of mapping enzyme recognition sites defines genomic areas
represented in DK analysis. In the case of SacI, recognition sites are abundant and expected
to occur every 4 kb; however, some areas of the human genome (<5%) have lower densities
of SacI sites and thus would be analyzed at a lower resolution [82]. To date, DK has been
successfully applied to the analysis of a variety of cancers, including those of colon and
brain, and has been used to identify putative oncogenes and tumor suppressors in these
tumors [84,85]. The original version of DK has a theoretical resolution of 4 kb, which is
higher than that of the generally available array-based methods. A partial limitation of DK imposed
by the use of restriction enzymes is the uneven coverage of the genome, which may be
addressed by using different combinations of mapping and fragmenting enzymes.
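The spacing figures quoted above follow from simple arithmetic, sketched below in Python. The calculation assumes a random sequence with equal base frequencies, which real genomes only approximate, and the copy number function is a simplified tag-density ratio written for illustration (assuming a diploid baseline), not the published DK analysis procedure. Both function names are hypothetical.

```python
def expected_site_spacing(recognition_length):
    """Expected distance (bp) between occurrences of a recognition site.

    Assumption: random sequence with equal frequencies of A, C, G and T.
    """
    return 4 ** recognition_length


def estimate_copy_number(observed_tags, expected_tags, ploidy=2):
    """Scale the observed/expected tag ratio in a window by the ploidy."""
    return ploidy * observed_tags / expected_tags


print(expected_site_spacing(6))      # 6 bp mapping enzyme site
print(expected_site_spacing(4))      # 4 bp fragmenting enzyme site
print(estimate_copy_number(30, 10))  # hypothetical window with excess tags
```

A 6 bp site is thus expected roughly every 4 kb, in line with the theoretical resolution stated above, while a 4 bp site is expected every 256 bp.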
Clone-based methods have been developed to detect both balanced and unbalanced
genome rearrangements in cancers. An end sequence profiling approach (ESP) has been
developed and successfully applied to the genome-wide analysis of rearrangements of the
MCF7 breast cancer cell line [86,87]. In ESP, a BAC library is constructed for the tumor
genome of interest, both ends of BAC clones are sequenced, and the paired-end sequences
are mapped back to a reference genome assembly. Structural genomic variants are discovered
by identifying clones whose paired-end sequences map to the reference genome in
orientations that indicate the clone was derived from rearranged DNA. The ESP approach is
potentially applicable to the detection of all types of genome rearrangements, which could be
inferred from different types of "ESP signatures" [86]. While powerful, paired-end
sequencing of clones has several limitations. First, the approach is dependent on the
construction of clone libraries, which can be slow and costly, requiring high molecular
weight DNA. Second, the resolution of paired-end sequencing methods is determined by the
clone properties and the redundancy of genome coverage. Also, since the sampling occurs
only from the ends, large numbers of clones would be necessary to achieve genome-wide
high resolution coverage of rearrangements. To address this limitation a BAC clone
fingerprint profiling (FPP) approach for high resolution detection of genome rearrangements
was developed [88]. The FPP method includes the digestion of genomic BAC clones
prepared from tumor DNA with five restriction enzymes, HindIII, EcoRI, BglII, NcoI, and
PvuII to generate clone fingerprints that are then aligned against the in silico digests of the
reference genome sequence using the FPP alignment algorithm. The restriction enzymes
were selected to achieve frequent cutting and restriction site location complementation
(restriction-site-poor areas of one enzyme corresponding to restriction-site-rich areas of
another enzyme). The FPP alignment algorithm consists of four steps that are detailed in
[88]. Briefly, the steps for aligning each BAC fingerprint to the reference genome sequence
include the following: (1) a global search of the reference genome sequence to identify BAC-sized
or smaller genomic regions that yield digest patterns similar to that of the query clone;
(2) a local search that further delineates the local correspondence between the fingerprint of
the query clone and that of the in silico digested genomic region(s) identified in step 1; (3) an
edge detection algorithm that precisely identifies the extent of the alignment; and (4) the final
partitioning step that selects an optimal solution, whereby a minimal set of alignments
maximally accounts for all clone fragments on the genome. Differences between the
experimental and in silico digestion patterns are indicative of genomic differences, including
genome rearrangements in the clone versus the reference genome. For instance, an alignment
in which the clone maps to one genomic region, but in which there are internal gaps in
fragment alignments, indicates the presence of a localized rearrangement confined to the
clone; on the other hand, an alignment in which the clone fingerprint is partitioned over
several regions in the genome suggests the presence of a translocation, inversion, or a large
deletion.
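The idea of comparing an experimental fingerprint with an in silico digest can be illustrated with a minimal Python sketch. This is not the FPP alignment algorithm of [88]; it only shows how fragment sizes are derived from a sequence and how unexplained fragments might be flagged. The sketch treats the sequence as linear, handles only two of the five enzymes, and uses hypothetical function names and a hypothetical size tolerance parameter.

```python
import re

# Recognition site and cut offset within the site (HindIII: A^AGCTT,
# EcoRI: G^AATTC). Only two enzymes are included to keep the sketch short.
ENZYMES = {"HindIII": ("AAGCTT", 1), "EcoRI": ("GAATTC", 1)}


def in_silico_digest(sequence, enzyme):
    """Return the sorted fragment lengths of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return sorted(end - start for start, end in zip(bounds, bounds[1:]))


def unmatched_fragments(observed, expected, tolerance=0):
    """List observed fragment sizes with no counterpart in the in silico digest."""
    remaining = list(expected)
    unmatched = []
    for size in observed:
        match = next((e for e in remaining if abs(e - size) <= tolerance), None)
        if match is None:
            unmatched.append(size)
        else:
            remaining.remove(match)
    return unmatched


# Toy reference region with two HindIII sites.
region = "CCCC" + "AAGCTT" + "GGGGGGGG" + "AAGCTT" + "TT"
expected = in_silico_digest(region, "HindIII")
# Hypothetical clone fingerprint in which one fragment size differs.
print(expected, unmatched_fragments([5, 7, 9], expected))
```

Fragments that remain unmatched, such as the 9 bp fragment in this toy example, are the kind of discrepancy that would prompt closer inspection for a rearrangement or a restriction site polymorphism.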
The FPP approach provides several important advantages over ESP and other
genome-wide methods for rearrangement detection. First, the method samples the entire
clone insert and not just the clone ends, as in ESP. Therefore, rearrangement coordinates
mapping within the clone will be more precisely localized with FPP than ESP, given the
same number of clones sampled [88]. Second, FPP is relatively tolerant of repeats compared
with ESP and oligonucleotide microarrays, since only 7% of human repeats are found in
contiguous regions of 3.9 kb (the average sizeable HindIII restriction fragment) [88]. This is
an important advantage, considering that a significant portion of the human genome is
composed of repeat sequences. Third, both balanced and unbalanced rearrangements are
potentially detectable. As in ESP, clones harboring rearrangements can be directly selected
for functional analyses and sequencing. Some of the drawbacks of the FPP approach include
the cost and speed of library production (similar to ESP), the cost of clone characterization
(cheaper than in ESP), and the requirement of a large amount of starting DNA material (less
than in ESP). Consequently, although the FPP approach is potentially very powerful, the
reliance on clones currently limits its widespread application. In addition, as is the case
with other methods that rely on restriction enzyme digestion, FPP may erroneously interpret
restriction fragment length polymorphisms as genome rearrangements. This limitation may
be partially addressed in the future as more complete catalogues of normal genomic variation
are compiled.
1.6.3.3 Cancer sequencing studies using the Sanger technology
As discussed in Sections 1.2 through 1.4, it has become increasingly clear that
sporadic cancers are associated with multiple acquired genetic lesions that contribute to
various aspects of oncogenesis. To address the spectrum of these lesions more
comprehensively than possible with hybridization or clone-based sequencing discussed in
previous sections, several sequencing initiatives have been launched worldwide. The most
notable of these are the Cancer Genome Project (CGP) in the United Kingdom and The
Cancer Genome Atlas (TCGA) in the United States [89,90]. A branch of the TCGA with a
pediatric focus, the Therapeutically Applicable Research to Generate Effective Treatments
(TARGET) initiative was also set up to apply similar approaches to the analysis of pediatric
tumors (http://target.cancer.gov/). Initially the large-scale sequencing projects relied on the
Sanger-based re-sequencing of the coding sequence of a gene set of interest or all genes in
the genome; however, with the advent of new sequencing technologies discussed in Section
1.6.3.1, these projects are switching to whole genome, exome and transcriptome analysis
using new sequencing platforms (Section 1.6.3.4). An analysis of 99 neuroblastoma cases
studied as part of the TARGET initiative using new sequencing technologies is discussed in
Chapter 4.
The systematic re-sequencing of the PCR-amplified coding exons of 518 protein
kinase genes in 210 human cancers of 13 different types of histology conducted by the CGP
initiative identified 1,007 somatic mutations, of which 921 were single base substitutions, 78
were indels, and 8 were complex rearrangements; 2/3 of these mutations had previously been
uncharacterized [69]. The first TCGA report of a comprehensive analysis of glioblastoma
tumors that incorporated re-sequencing data from a panel of over 600 genes in 143 cases
revealed three signaling pathways that may be disrupted in glioblastoma [91]. However,
since the sequencing effort involved only a subset of genes, recurrent mutations in IDH1, a
gene previously not implicated in cancer, were missed by this approach but detected by a
more comprehensive sequencing study [92]. Similar studies where pre-selected gene sets
were re-sequenced in panels of tumors were also conducted in pediatric acute lymphoblastic
leukemia, lung cancers, and soft tissue sarcomas [93–95]. In all cases, these studies identified novel
loci and pathways associated with the diseases.
The re-sequencing of the coding regions of RefSeq and Consensus Coding Sequence
(CCDS) genes was conducted in 11 breast and 11 colorectal cancers [96,97] and identified
somatic mutations in 1718 genes (9.4% of the genes analyzed). More recently, similar
approaches were also applied to ovarian cancer and the pediatric solid tumor medulloblastoma
[98,99]. The medulloblastoma study involved the analysis of 22 tumors, and found an
average of 11 somatic gene alterations per tumor, which was fewer by a factor of 5 to 10
compared to the adult solid tumors analyzed by related approaches, as described in this
section above. Nonetheless, the study found mutations in MLL2 and MLL3, previously
unknown in this malignancy.
These studies suggest that large-scale sequencing efforts are successful at identifying
known and novel genetic aberrations in human cancers, and that our catalogs of genetic
variants that contribute to oncogenesis are incomplete for both pediatric and adult tumors. In
fact, prior to large-scale sequencing studies, approximately 1% of human genes had been
shown to be mutated in cancers using other techniques [7]. In contrast, recent data from the
Catalogue of Somatic Mutations in Cancer database at the Sanger Institute suggest that up to
26% of all genes may harbor somatic mutations in cancers, and novel cancer genes, with
proven causal roles in oncogenesis, are defined each year [100]. Some of the notable
examples of novel cancer genes discovered by sequencing studies include IDH1 in gliomas
and leukemias [101,92], EZH2 in lymphomas and myeloid disorders [102], and FOXL2 in
ovarian cancers [103]. The increasing number of genes with reported mutations in cancers
points to the heterogeneity of somatic mutations found in certain cancer types, particularly
solid tumors. For instance, the recent report from the sequencing of the coding regions of 316
ovarian tumors by the TCGA revealed that TP53 was the only highly prevalent recurrently
mutated gene, and that other genes were mutated in small subsets of tumors [98]. Therefore,
the large-scale sequencing studies indicate that unbiased analyses of both adult and pediatric
cancers using higher resolution approaches may identify novel loci relevant to these diseases.
1.6.3.4 Cancer genome and exome sequencing using new sequencing technologies
With the advent of next-generation sequencing technologies described in Section
1.6.3.1, whole genome, exome, and transcriptome sequencing studies became more feasible
and routine than previously possible with Sanger sequencing. In addition to reducing the cost
of large-scale sequencing, the introduction of next-generation sequencers increased the
sensitivity of mutation detection. An early study using the 454/Roche sequencing technology
demonstrated the potential of next-generation sequencers to detect rare variants present in
specific subpopulations of cells that elude cost-effective detection by capillary sequencing
approaches [104]. The ability to detect genetic heterogeneity is due to the use of sequencing
templates that have been clonally derived from a single molecule; in this manner, a variant
present in a few cells can be detected if sufficient sequencing depth is applied. This feature is
particularly important in cancer research in light of the hierarchy of different cell types
within a tumor discussed in Section 1.5. Given this hierarchy as well as variable levels of
stromal contamination invariably present in clinical samples, Sanger sequencing studies of
cancers likely sampled only the most common genotypes present in a tumor, and may have
missed mutations in samples containing a high frequency of normal cells [104]. In contrast,
new sequencing technologies are potentially more sensitive and capable of detecting the
genetic make-up of rare populations.
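The dependence of rare variant detection on sequencing depth can be illustrated with a simple binomial model, sketched below in Python. The model assumes that reads sample alleles independently and without sequencing error, and the requirement of at least three supporting reads is an arbitrary threshold chosen for this example; neither assumption is taken from the studies cited above, and the function name is hypothetical.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_reads=3):
    """Probability of observing at least `min_reads` variant reads.

    Assumption: each of `depth` reads independently carries the variant
    with probability `allele_fraction` (no sequencing errors).
    """
    p_too_few = sum(
        comb(depth, k)
        * allele_fraction ** k
        * (1 - allele_fraction) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_too_few


# Hypothetical variant present in 1% of the sequenced molecules.
for depth in (30, 300, 3000):
    print(depth, round(detection_probability(depth, 0.01), 3))
```

Under these assumptions the chance of observing a variant present in 1% of molecules rises steadily with depth, which is the sense in which a rare subpopulation becomes detectable only when sufficient sequencing depth is applied.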
The study that used Illumina technology to sequence the whole genome of an acute
myeloid leukemia (AML) sample became the first report of a cancer genome sequenced with
a new sequencing technology [105]. This study identified known and novel somatic
mutations that might contribute to leukemogenesis, suggesting that next-generation
sequencing provides a comprehensive way for analyzing cancer genomes. Since this initial
report, the genomes of additional hematopoietic (acute myeloid leukemia, chronic
lymphocytic leukemia, multiple myeloma, B-cell lymphoma) and solid (lung, breast,
tongue, prostate, and skin) tumors have been published [101,106–114]. These studies have
led to the identification of genetic lesions previously not implicated in the particular
malignancy or oncogenesis per se. Some of this information was shown to be immediately
clinically actionable, such as in the case of a tongue adenocarcinoma, whose genome
sequence was used to suggest a potential therapeutic option for the patient [109]. Similarly,
the identification of BRAF mutations in a fraction of multiple myeloma patients suggests a
role for BRAF inhibitors in the management of the disease [108].
With the rapidly increasing number of cancer sequencing studies, largely facilitated
by the introduction of new sequencing technologies, an international group of experts
established the International Cancer Genome Consortium with the purpose of coordinating
the ongoing cancer sequencing efforts in different countries [115]. The projects within the
consortium encompass the sequencing of over 50 different cancer types, and over 25,000
individual cancer genomes.
In addition to whole genome sequencing of tumors, many efforts involve the
sequencing of the coding regions of the genome, or exome. A rationale for conducting exome
rather than whole genome sequencing is the current cost-efficiency of the former approach. It
can be also argued that somatic variation within the coding sequence is currently more
readily interpretable and clinically actionable than intergenic variation captured by whole
genome projects along with the coding variation. While sequencing experiments are
becoming increasingly more affordable, whole genome sequencing is still costly when
performed to the depth required to comprehensively identify variants in all genes (average
30X haploid coverage [116] that was later upgraded to at least 50X haploid coverage [117]).
Therefore, major reductions in sequencing and analysis costs need to occur before exome
sequencing can be rendered obsolete.
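The cost argument can be made concrete with the relationship coverage = (number of reads × read length) / target size. The Python sketch below uses round illustrative values that are assumptions rather than figures from this thesis: a 3 Gb haploid genome, a 30 Mb exome target, and 100 bp reads. It also ignores capture inefficiency and unevenness of coverage, and the function name is hypothetical.

```python
def reads_required(target_size_bp, coverage, read_length_bp):
    """Number of reads needed to reach a given average coverage."""
    total_bases = target_size_bp * coverage
    # Integer ceiling division, so that partial reads are rounded up.
    return -(-total_bases // read_length_bp)


GENOME_BP = 3_000_000_000  # assumed approximate haploid genome size
EXOME_BP = 30_000_000      # assumed approximate exome target size
READ_LENGTH = 100          # assumed read length in bp

genome_reads = reads_required(GENOME_BP, 30, READ_LENGTH)
exome_reads = reads_required(EXOME_BP, 30, READ_LENGTH)
print(genome_reads, exome_reads, genome_reads // exome_reads)
```

With these assumed values, reaching the same average coverage across the whole genome requires about one hundred times more reads than covering the exome alone.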
Several methods of target enrichment have been developed to select the coding
regions for sequencing. These methods fall into two main categories: PCR-based
enrichment of targets, and hybridization-based enrichment of targets conducted in solution,
on an array, or as a combination of these two approaches (hybrid capture); each of these
methods has its own advantages and disadvantages [118]. To date several cancers have
been analyzed using next-generation exome sequencing, including rare tumors, such as
pheochromocytoma, hepatocellular carcinoma, hairy cell leukemia, renal cell carcinoma, and
acute monocytic leukemia [119–123]. Exome sequencing has been useful for detecting point
mutations and indels in the coding sequence, while whole genome methods, in addition to
detecting these events, have also detected gene fusions and structural rearrangements.
1.7 Cancer transcriptomes as proxies for the genomic diversity of tumors
Historically, cancers have been classified based on their pathological features.
However, it became evident that patients with an identical histopathological diagnosis
differed dramatically in terms of their disease course and response to therapy. These
phenotypic differences can now be attributed to the genomic heterogeneity that has emerged
from recent genome-level analyses of individual tumors as described in Section 1.6.3 (also,
as reviewed in [116]). However, even prior to high resolution genome sequencing studies,
some of this heterogeneity could be assessed from studying gene expression profiles of
seemingly identical tumors. Two conceptually different approaches to high throughput gene
expression profiling, using hybridization and sequencing, have emerged in the last decades
and allow for the interrogation of gene expression levels on a genome-wide scale.
1.7.1 Transcriptome analysis of cancers using microarrays
One group of methods for global transcriptome analysis is based on microarrays, in
which cDNA is hybridized to arrays of complementary oligonucleotide probes corresponding
to genes of interest, and the abundance of a particular mRNA species is estimated from its
hybridization intensity to the relevant probe [124]. Microarray analysis is used in this thesis
to study gene expression profiles of normal and malignant neural crest-like cells in Chapters
2 and 3. In Chapter 2, microarray expression data derived from several lineages of normal
SKin-derived Precursor cells (SKPs) are used to characterize the neural crest-like phenotype
of these cells and support their use as normal counterparts of neuroblastoma cells. In Chapter
3, microarray analysis is used to confirm the results from RNA sequencing experiments
(Section 1.7.2).
Several microarray platforms are currently available or in development; however, all
of them rely on the principle of probe-target hybridization, in which the signal intensity
provides a measure of the amount of particular nucleic acid in a sample. In addition to
measuring the concentration of nucleic acid in a sample, the signal intensity also depends on
probe-target binding affinity, the specificity of which is controlled for in a microarray
experiment by introducing mismatch probes [125]. A seminal study applied microarrays to
the examination of expression profiles of acute myeloid and acute lymphoblastic leukemias
and showed that these clinically-distinct leukemias could be distinguished prospectively in an
unsupervised manner based on their gene expression information alone [126]. In addition to
the finding of correlation between the disease phenotype and global gene expression profile,
this study introduced two conceptually different applications of microarray analysis: class
prediction (assigning new tumors into known classes) and class discovery (discovering novel
clinically relevant subtypes) that have since been used widely in cancer transcriptomics
research. This work also brought about a multitude of expression profiling initiatives that to
date have been performed in many types of malignancy [127]. These studies aimed to
classify tumors previously indistinguishable with conventional approaches into clinically relevant subtypes (class discovery) as well as to identify expression markers that could be
used to prospectively classify tumors into known disease subtypes (class prediction). Another
common direction of microarray data analysis is class comparison [128]. In class comparison
studies, genes with evidence of differential expression among disease types, cell populations
or experiments of interest are identified and used to gain novel biological or clinical insight
into the different classes being compared. The analyses of microarray data, described in
Chapters 2 and 3 of this thesis, are class comparisons, in which we sought transcripts
significantly increased or decreased in abundance in different populations of cells.
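A class comparison for a single transcript can be sketched as follows in Python. This is a generic illustration and not the procedure used in Chapters 2 and 3: it computes a log2 fold change and Welch's t statistic from hypothetical intensity values, and it omits the p-value calculation and the multiple testing correction that a genome-wide analysis would require. The function name and the data are assumptions made for this example.

```python
from math import log2, sqrt
from statistics import mean, variance


def compare_classes(group_a, group_b):
    """Return (log2 fold change, Welch's t statistic) for group_b vs group_a."""
    fold_change = log2(mean(group_b) / mean(group_a))
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    t_statistic = (mean(group_b) - mean(group_a)) / standard_error
    return fold_change, t_statistic


# Hypothetical intensities of one probe in two cell populations.
population_1 = [10, 12, 11]
population_2 = [40, 44, 48]
fold_change, t_statistic = compare_classes(population_1, population_2)
print(fold_change, t_statistic)
```

A transcript would be reported as increased in abundance in the second population only if both the fold change and the test statistic passed thresholds chosen with the number of tested transcripts in mind.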
Early influential works in the cancer microarray field include class discovery studies
that identified previously indistinguishable clinically and biologically relevant subtypes that
derived from different cells of origin in diffuse large B-cell lymphomas [129] and breast
cancers [130]. Expression-based molecular classifiers developed as a result of such studies
(notably, the MammaPrint assay in breast cancers [131]) are being used in clinics and have
been shown to outperform conventional methods of clinical assessment [131].
Further developments in the microarray field enabled other cancer transcriptomics
applications, such as the detection of noncoding RNAs [132], single nucleotide
polymorphisms (SNPs) (described in Section 1.6.2), and alternative splicing events [133].
Despite their power to measure the expression of thousands of genes simultaneously,
microarray methods do not readily address several key aspects, notably the ability to detect
novel transcripts and the ability to study the coding sequence of detected transcripts.
Moreover, microarrays are indirect methods in which transcript abundance is inferred from
hybridization intensity rather than measured explicitly. These properties may interfere with
experimental reproducibility, particularly when performed by different laboratories [125].
1.7.2 Sequence census approaches to transcriptome analysis
A conceptually different group of methods uses sequencing of cDNA fragments
derived from mRNA, followed by counting the number of times a particular fragment has
been observed (Figure 1.2). This group of methods originally included the Serial Analysis of
Gene Expression (SAGE) method [83], and Massively Parallel Signature Sequencing
(MPSS) [134]. In SAGE, restriction enzymes are used to obtain short sequence fragments
(tags) usually derived from the 3' end of an mRNA; the tags are concatenated and sequenced
to determine the expression profiles of their corresponding mRNAs [83]. Modifications of
this protocol extended the tag length from the original 14 bp to 17 bp in LongSAGE and 26
bp in the SuperSAGE protocol [135,136]. In Chapter 2 of this thesis, SAGE analysis is used
to define a list of candidate pluripotency genes, preferentially expressed in undifferentiated
human embryonic stem cells.
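The counting step that underlies sequence census methods can be illustrated with a minimal Python sketch. The tag sequences below are hypothetical placeholders rather than real SAGE tags, and the function name and the tags-per-million scaling are assumptions made for illustration; they are not taken from the protocols cited above.

```python
from collections import Counter


def tag_counts_to_tpm(tags):
    """Count identical tags and scale each count to tags per million.

    `tags` is a list of tag sequences, one entry per sequenced tag.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: count * 1_000_000 / total for tag, count in counts.items()}


# Hypothetical tag library: three observations of one tag, one of another.
library = ["AAACCCGGGT", "AAACCCGGGT", "TTTGGGCCCA", "AAACCCGGGT"]
expression = tag_counts_to_tpm(library)
```

Because each tag observation contributes one count, the readout is digital: libraries of different depth can be compared after scaling to a common total.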
The MPSS method also generates small fragment signatures of each mRNA species;
however, the in vivo propagation in bacteria used in SAGE is replaced with in vitro cloning
on microbeads [134,137,83]. In addition, MPSS uses a ligation-based sequencing method
instead of Sanger sequencing used in SAGE [134,83]. SAGE and MPSS are often termed
"clone-and-count" or "sequence census" techniques as they provide a digital overview of
gene expression profiles in a cell [138]. Advantages of such digital readouts include
statistical robustness, and less stringent standardization and replication requirements than
those for microarrays [139,134]. Disadvantages that until recently hindered the use of SAGE
and MPSS included the cost of sequencing and the biases introduced by the
necessary cloning step.
Despite its superior performance compared to microarrays at detecting highly abundant
transcripts, traditional SAGE is not very efficient at detecting rare mRNA
populations [140]. New sequencing technologies have increased the cost-effectiveness of the
method that originally relied on the Sanger sequencing protocol and eliminated the
requirement for the in vivo step [83]. Several next-generation sequencing-based SAGE
methods have been reported. One method termed DeepSAGE uses the 454 sequencing
technology to generate 300,000 tags with less effort than a traditional LongSAGE experiment
generating 50,000 tags [141]. Another SAGE-like method, Tag-Seq, relies on the Illumina
technology to generate 10 million tags per run, a two-orders-of-magnitude increase
over the throughput of traditional
LongSAGE [142]. Both of these methods have been shown to increase the representation of
low abundance transcripts that evade detection by Sanger-based SAGE methods [142,141],
thereby providing a more complete view of the transcriptome. In addition to improving the
original sequencing-based methods for gene expression analysis, new sequencing
technologies have enabled the development of new sequence census methods, such as Rapid
Analysis of 5'-Transcript Ends (5'-RATE) used for surveying 5' end fragments [143].
Originally the LongSAGE protocol was used in the Cancer Genome Anatomy Project
(CGAP) consortium that was formed to construct a public database of gene expression
information across multiple cancer, pre-cancer, and normal tissues [144]. This initiative aims
to provide a comprehensive resource that could be mined for the identification of transcripts
enriched in a particular tissue type. The SAGE protocol was chosen over microarrays due to
its digital expression readout and the relative ease with which data from multiple laboratories
could be combined for analysis [144]. Since the advent of Illumina-based Tag-Seq,
several recent CGAP libraries have been constructed using this method, which was shown to
outperform the originally used LongSAGE protocol and microarrays in terms of dynamic
range and transcript representation, including the representation of sense-antisense transcript
pairs [142].
1.7.2.1 Whole transcriptome sequencing of cancers
Full length cDNA sequencing [145] and the generation of expressed sequence tags
(ESTs) or single sequencing reads derived from one end of a cDNA clone [146] have been
used to characterize cellular mRNA profiles, including those of cancer cells. However,
primarily due to the cost of sequencing, these Sanger sequencing-based methods had been
even less effective than traditional SAGE at representing rare cellular
transcripts [147]. With the development of new sequencing
technologies, EST sequencing gained potential as one of the sequence census methods for
studying mRNA profiles on a genome-wide scale. With the elimination of the cloning step
and common use of random priming, next-generation EST sequencing tags can now cover
the whole length of transcripts [148]. Deep EST sequencing of transcriptomes using
next-generation technology is also referred to as whole transcriptome shotgun sequencing
(WTSS) [149] or RNA sequencing (RNA-Seq) [150,151]. In a version of this approach,
polyA-selected or ribosomal RNA-depleted RNA is reverse transcribed into cDNA, which is
then fragmented and sequenced using a next-generation technology to generate reads
intended to cover the full length of a transcript [149].
Comparative transcript coverage with each of the sequencing-based methods
described thus far is provided in Figure 1.2. The ability to cover the whole length of transcripts
with RNA-sequencing reads enables many applications previously unachievable with tag
sequencing and hybridization approaches [152]. Like hybridization-based approaches,
RNA-Seq is able to address differential gene- and exon-level expression but with lower
background, over a larger dynamic range, and with opportunities for repeat analyses based on
different sets of annotations [153]. In addition, RNA sequencing data can be used to study
the structure of splice isoforms [154], and identify chimeric transcripts [155] that may result
from genomic rearrangements [156] and/or trans-splicing [155]. Moreover, read sequence
information allows for the detection of mutations [103] and RNA edits [157], as well as
quantification of the expression level of each alternative allele [158] – applications not
readily available with tag-sequencing or array technologies.
To date the transcriptomes of several cancer cell lines and primary tumors, including
those from cervical, colon, prostate, and hematopoietic cancer types have been characterized
by RNA-Seq protocols using 454, Illumina or SOLiD technologies [152].
RNA-Seq was the approach that enabled the recent discovery of key recurrent
mutations in FOXL2 and ARID1A in ovarian cancers [103,159], and EZH2 mutations in
B-cell lymphoma [102]. Similar approaches have also been applied to the discovery of
mutations in other cancers, including acute myeloid leukemia [160] and malignant pleural
mesothelioma [161]. In addition, this approach has led to the discovery of novel expressed
gene fusions affecting the RAF kinase pathway in solid malignancies [162]. The alternative
splicing application of RNA-Seq has been applied to the identification of splice isoforms
associated with drug resistance in colorectal cancer [154]. These studies suggest that
RNA-Seq is a versatile approach that not only enables the examination of gene expression profiles,
but also simultaneously allows the detection of coding mutations and gene rearrangements, at
least where these events do not abrogate gene expression. RNA sequencing is used in
Chapters 3 and 4 of this thesis to characterize the expression profiles of neuroblastoma
tumor-initiating cells (Chapter 3) and primary tumors (Chapter 4). In addition to gene-level
expression profiling, RNA-Seq is used for exon level expression analysis (Chapter 3), and
the detection of point mutations and fusion transcripts (Chapter 4).
1.8 Integrative genomics of cancers
With increasing amounts of genome sequence, copy number, expression, and
epigenetic data, generated for different cancer types, efforts have focused on integrating these
data sets to produce multidimensional views of cancers. Such efforts are important priorities
of large-scale cancer genomics initiatives, notably the TCGA [89]. To address the demands
of the research community, several software platforms have been developed for the
visualization and analysis of multiple types of genomic data, including the Integrative
Genomics Viewer (IGV) [163], the Cancer Genomics Workbench [164], the UCSC Cancer
Genomics Browser [165] and others.
Integrative genomic studies of cancers have followed several general directions:
identifying genes [166,92] and pathways [98] affected by multiple types of aberrations within
the same cancer; combining multiple data types to define and characterize disease subtypes
[167,168]; and conducting systems biology analyses to reconstruct cellular regulatory
networks [169]. The first TCGA study that demonstrated the power of integrating multiple
datasets to provide a system-level view of a cancer combined DNA copy number, gene
expression, sequence and DNA methylation information from a cohort of 206 cases of
glioblastoma multiforme (GBM) [91]. This study defined three signaling pathways,
RTK/RAS/PI-3K, RB, and p53 signaling, each altered in over 75% of GBM patients. Even
though GBM did not have frequent recurrent changes at the level of single genes, multiple
datasets revealed highly-recurrent changes at the level of signaling pathways, demonstrating
the power of integrative analysis to identify recurrent and prevalent alterations at the level of
pathways and functional networks. Other example discoveries from integrative data analyses
of cancers include the characterization of three subtypes of GBM (proneural, mesenchymal
and classical) associated with different gene expression and mutation signatures impacting
the clinical outcome [168]; the discovery of defects in homologous recombination in a large
fraction (approximately 50%) of ovarian cancers studied by the TCGA [98]; and the
realization that multiple types of sequence, expression and epigenetic defects, observed in
acute lymphoblastic leukemia, affect the WNT and MAPK pathways, implicating these
pathways as potential therapeutic targets for the disease [166].
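The distinction between gene-level and pathway-level recurrence drawn above can be made concrete with a small Python sketch. The sample identifiers, gene names, and pathway membership below are entirely hypothetical and are not drawn from the TCGA studies cited; the sketch only shows how alterations that are individually infrequent can add up to a frequently altered pathway.

```python
def alteration_frequency(samples, genes):
    """Return the fraction of samples with at least one altered gene in `genes`.

    `samples` maps a sample identifier to the set of genes altered in that
    sample (by mutation, copy number change, or any other aberration type).
    """
    gene_set = set(genes)
    hits = sum(1 for altered in samples.values() if altered & gene_set)
    return hits / len(samples)


# Hypothetical cohort: each gene is altered in only one of five samples.
samples = {
    "S1": {"GENE_A"},
    "S2": {"GENE_B"},
    "S3": {"GENE_C"},
    "S4": {"GENE_D"},
    "S5": set(),
}
# Hypothetical pathway containing all four genes.
pathway = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

single_gene_frequency = alteration_frequency(samples, ["GENE_A"])
pathway_frequency = alteration_frequency(samples, pathway)
```

In this toy cohort each gene is altered in 20% of samples, whereas the pathway as a whole is altered in 80%, mirroring the pattern described for GBM.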
1.9 Childhood neuroblastoma
As discussed in Sections 1.3 and 1.4, most adult cancers arise through
progressive accumulation of genetic aberrations likely occurring over many years or decades.
In contrast, fewer genetic changes occurring in a short developmental time window may be
sufficient for the tumorigenesis of childhood cancers [99,170,171]. Therefore, characterizing
the developmental origin of childhood cancers is essential to understanding the biology of
these malignancies.
Neuroblastoma (NBL) is a childhood cancer of the developing sympathetic nervous
system [172]. Tumors of the sympathetic nervous system account for 7.8% of all cancers
among children younger than 15 years of age and of these, 97% are NBLs [25]. The ganglia
of the sympathetic nervous system are derived from the sympathoadrenal lineage of the
embryonic neural crest [173]. The neural crest and its multiple lineages are discussed in more
detail in Section 2.1. According to the Surveillance, Epidemiology, and End Results database,
which tracks cancer epidemiology data in the United States, NBL is the most common cancer
diagnosed in the first year of life [25]. There are approximately 60 new
NBL cases each year in Canada (Canadian Cancer Society).
The most common site for primary NBL tumors is the adrenal medulla; however,
tumors can arise anywhere along the sympathetic branch of the autonomic nervous system
(the branch that mediates the fight-or-flight response) [174]. The exact cell of origin of NBL
is unknown and likely differs for different disease subgroups, such that aggressive tumors
derive from morphologically undifferentiated cells while benign tumors derive from more
differentiated cell types [175]. It is thought that a subset of NBLs originates from
PHOX2B-positive neuronal progenitors [176]. As discussed in Section 1.9.2, inherited mutations in
PHOX2B are associated with a fraction of familial NBLs.
1.9.1 Classification, treatment and prognosis
NBL cases are diverse with regard to histopathology, molecular features, and
clinical outcomes. At presentation the disease can be limited to a single organ, locally or
regionally invasive, or widely disseminated; more than 50% of cases are metastatic at
presentation [177,174]. The most common metastatic sites are lymph nodes, bone marrow,
bone, and liver [174]. Intriguingly, NBL is both disproportionately lethal despite very
aggressive multimodal therapy and associated with the highest rate of spontaneous and
complete regression in a subset of cases [178,174,179]. Among other factors, disease
prognosis strongly depends on the age at diagnosis, with infants typically having a more
favorable prognosis than older children. Historically, a 12-month age cutoff was used for
pre-treatment risk assessment; however, a recent retrospective study that examined the
outcomes of 3,666 patients, correcting for MYCN status and stage, reported a continuous
prognostic impact of age [180]. Statistical analysis performed in this study showed that a
460-day (approximately 15 months) cutoff maximized the outcome difference for younger and older
patients.
To facilitate comparisons between clinical trials and studies conducted in different
countries, the International Neuroblastoma Staging System (INSS) was developed in 1988 by
an international panel of experts and revised in 1993 [181]. Since then, the INSS has been the
most commonly used staging system in Europe and North America [179]. The INSS is a
surgically-based system that differentiates patients into stages 1, 2A, 2B, 3, 4 and 4S based
on the degree of surgical excision, lymph node involvement, presence of distant metastases
and age (younger or older than 12 months). A significant limitation of this system is its
dependence on surgical resection, whereby patients with localized disease who do not
undergo surgery cannot be properly staged.
To address this limitation, a pre-treatment staging system was developed by the
International Neuroblastoma Risk Group (INRG) task force and termed the INRG staging
system [182]. According to the INRG staging system, tumors are to be classified at diagnosis
into one of the four stages: L1 (localized disease without image-defined risk factors), L2
(localized disease with image-defined risk factors), M (metastatic disease), and MS
(metastatic special disease). In addition, after examining 8,800 NBL cases from North
America, Europe, Japan, and Australia, the INRG task force also characterized 16 clinically
distinct pre-treatment risk groups that are defined by 7 risk factors: age, INRG stage (L1, L2,
M or MS), histological category, differentiation grade, MYCN oncogene amplification status,
11q LOH status, and ploidy [183]. Based on these factors, the INRG recommends classifying
tumors into four pre-treatment risk categories with statistically different 5-year event-free
survival (EFS): very low-risk (5-year EFS > 85%), low-risk (5-year EFS 75-85%),
intermediate-risk (5-year EFS 50-75%), and high-risk (5-year EFS < 50%). Low- and very
low-risk patients are often observed without any interventions or cured with surgery alone
[184]. A special subset of low-risk patients with metastatic disease, denoted as INRG stage
MS (INSS stage 4S) includes patients younger than 18 months with metastatic disease
limited to bone marrow, liver or skin, favorable histology and no MYCN amplification. This
subset of patients is often given supportive care and observed as these patients tend to
achieve complete disease regression without any treatment [184]. Intermediate-risk patients
are treated with surgery and moderate intensity chemotherapy, while high-risk patients
undergo one of the most aggressive anti-cancer protocols available for both pediatric and
adult cancer [184,174].
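The four pre-treatment risk categories listed above are bounded by 5-year EFS thresholds, which can be written out as a short Python sketch. This is only an illustration of the thresholds quoted in the text: in practice patients are assigned to a category from the combination of the seven risk factors, not from an EFS value. The function name is hypothetical, and the handling of values that fall exactly on a boundary (75% and 50%) is an assumption, since the ranges as quoted overlap at those points.

```python
def inrg_risk_category(efs_percent):
    """Map a 5-year event-free survival percentage to an INRG risk category.

    Thresholds follow the ranges quoted in the text. Assumption: a value
    exactly on a boundary is placed in the more favorable category.
    """
    if not 0 <= efs_percent <= 100:
        raise ValueError("EFS must be a percentage between 0 and 100")
    if efs_percent > 85:
        return "very low-risk"
    if efs_percent >= 75:
        return "low-risk"
    if efs_percent >= 50:
        return "intermediate-risk"
    return "high-risk"


categories = [inrg_risk_category(value) for value in (90, 80, 60, 40)]
```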
The front-line protocol for high-risk NBL includes surgery, high intensity
chemotherapy with stem cell rescue, radiation, and biological therapy with retinoids [179].
Despite this aggressive treatment, only 30-40% of patients achieve long-term survival, and
there is no regimen proven to be curative for relapsed disease [174]. A recent phase 3 clinical
trial showed that adding ch14.18 monoclonal antibody against tumor-specific antigen GD2 to
standard isotretinoin therapy for first remission improves survival for high-risk NBL patients
by 20%, suggesting implementation of the immunotherapy protocol as part of the standard
treatment for high-risk NBL [185]. Even so, high-risk NBL remains a significant challenge
for pediatric oncologists, and new therapies are needed to improve the survival and reduce
treatment-related morbidities for these patients.
Chapter 3 of this thesis focuses on the analysis of NBL tumor-initiating cells, isolated
from the bone marrow of relapsed high-risk NBL patients. As described in Section 1.5,
cancer stem cells and tumor-initiating cells are presumed to be associated with tumor
recurrences and drug resistance [35]. Therefore, the characterization of the transcriptomes of
NBL tumor-initiating cells may help identify drug targets for relapsed and refractory disease.
Chapter 4 describes an analysis of genomes, exomes, and transcriptomes of primary high-risk
NBL tumors with the goal of identifying genetic targets that could influence the development
of novel therapies for high-risk NBL.
1.9.2 Neuroblastoma genetics and genomics
A small subset of NBL cases (<5%) are familial and display an autosomal dominant
mode of inheritance [179]. Early studies showed that NBL incidence and family
history follow Knudson's two-hit hypothesis, and it was estimated that up to 22% of cases
may have a germline mutation [186]. Recent studies have implicated activating mutations in
the anaplastic lymphoma kinase gene (ALK) as the cause of most cases of familial neuroblastoma
[187,188]. Additionally, a small number of NBL cases that occur in conjunction with
congenital central hypoventilation syndrome or Hirschsprung's disease are associated with
germline mutations in PHOX2B [189,190]. The locus encodes a homeodomain transcription
factor essential for the development of autonomic derivatives of the neural crest [191]. While
PHOX2B harbors mutations that are exclusively germline, the ALK locus can be mutated or
amplified in 5-15% of sporadic NBL [192–194,188]. Mutated ALK protein is typically
overexpressed and shows constitutive kinase activity, and knockdowns of mutant alleles
reduce proliferation of NBL cell lines [193]. In addition, recent evidence suggests that wild
type ALK alleles may be oncogenic if they are associated with ALK overexpression;
therefore, inhibition of wild type or mutant protein with small molecule inhibitors may
provide therapeutic avenues for NBL patients with or without ALK mutations [195].
To understand the contribution of common variants to the development of sporadic
NBL, a genome-wide association study is currently under way under the patronage of the
Children's Oncology Group [196,174]. The study aims to genotype 5,000 European ancestry
NBL cases and 10,000 matched controls using the Illumina HumanHap550 BeadChip
platform. To date, the study has reported significant association with the high-risk NBL
phenotypes of SNPs within FLJ22536 at 6p22 (odds ratio = 1.37; 95% confidence interval
1.27 to 1.49; P = 9.33E-15), BARD1 at 2q35 (odds ratio = 1.68; 95% confidence interval 1.49
to 1.90; P = 8.65E-18), and LMO1 (odds ratio = 1.34; 95% confidence interval 1.25 to 1.44;
P = 5.20E-16) at 11p15 [197,196,198,66]; while SNPs within DUSP12 at 1q23 (odds ratio =
1.46; 95% confidence interval 1.28 to 1.65; P = 8.13E-9), DDX4 at 5q11 (odds ratio = 1.31;
95% confidence interval 1.14 to 1.49; P = 8.00E-5), IL31RA at 5q11 (odds ratio = 1.24; 95%
confidence interval 1.08 to 1.42; P = 2.24E-3), and HSD17B12 at 11p11 (odds ratio = 1.47;
95% confidence interval 1.30 to 1.66; P = 5.04E-10) were associated with low-risk disease
[199]. In addition, common gains of 1q21 (NBPF23) have been found to be significantly
associated with NBL (odds ratio = 2.49; 95% confidence interval 2.02 to 3.05;
P = 2.97E-17), regardless of the disease phenotype [200].
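For readers less familiar with the statistics quoted above, the following Python sketch shows how an odds ratio and its 95% confidence interval can be obtained from a 2x2 table of counts using the standard log odds ratio (Woolf) approximation. The counts are hypothetical and the function name is an assumption; the sketch is not the analysis pipeline of the cited association studies, which may have used different tests and adjustments.

```python
import math


def odds_ratio_ci(case_exposed, case_unexposed,
                  control_exposed, control_unexposed, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    The interval uses the normal approximation on the log odds ratio;
    z = 1.96 corresponds to a 95% confidence interval.
    """
    odds_ratio = (case_exposed * control_unexposed) / (
        case_unexposed * control_exposed
    )
    standard_error = math.sqrt(
        1 / case_exposed + 1 / case_unexposed
        + 1 / control_exposed + 1 / control_unexposed
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * standard_error)
    upper = math.exp(log_or + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical counts of risk-allele carriers and non-carriers.
estimate, lower, upper = odds_ratio_ci(300, 700, 400, 1600)
```

An interval that excludes 1, as in this hypothetical example, indicates an association at the corresponding significance level.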
1.9.2.1 Copy number aberrations
Tumor-specific amplification of the MYCN oncogene, found in approximately 20% of
primary tumors, was the first copy number alteration to be characterized in NBL [201]. This
copy number aberration was immediately recognized to be linked with inferior disease
prognosis [202,203], and has remained a key molecular factor in pre-treatment risk
assessment ever since (Section 1.9.1). As discussed in Section 1.9.2, ALK is amplified in a
subset of NBL tumors. Examination of a panel of 50 NBLs using interphase FISH found that
copy number alterations involving ALK occurred in 60% of tumors and were not correlated
with copy number status at MYCN, 1p36, 11q or 17q loci [204].
In addition to MYCN and ALK amplifications, copy number alterations at several
larger genomic regions are associated with clinical behavior or other phenotypic
characteristics of the disease. For instance, losses of 11q have been reported to occur in
NBLs without MYCN amplification, and have been associated with a poor disease prognosis
in this subgroup [205,206]. In contrast, losses of 1p36 have been shown to be enriched in
MYCN-amplified tumors; however, it has been suggested that these losses may have an adverse
effect on survival, independently of MYCN. To address the prognostic significance of these
two aberrations independently from other factors, these loci were specifically examined in a
panel of 915 tumors; the study revealed that unbalanced loss of 11q and loss of
heterozygosity at 1p36 were independently associated with poor prognosis in NBL. Due to its
prevalence in non-MYCN-amplified cases, 11q status is currently used as one of the criteria
for assigning a pre-treatment risk group according to the INRG system [183]. Another
common copy number alteration found in NBL is gain of the distal arm of chromosome 17
(17q gain) [207]. This alteration usually occurs in tumors with poor prognosis; however, its
independent prognostic significance is unknown [196]. Several reports of translocations
between chromosomes 11 and 17 provide a potential pathway for the concomitant occurrence
of 17q gain and 11q loss aberrations [208,209]. Other less frequent chromosomal alterations
with unknown independent prognostic value have also been reported in NBL [196].
In addition to specific copy number events discussed above, overall genome
structures of 493 NBL tumor samples were examined using array CGH [187]. The study
found that the structure of the tumor genome was variable across tumors, such that some
tumors harbored exclusively whole chromosome gains and losses (numerical alterations),
while others harbored gains and losses of parts of chromosomes (segmental alterations).
Moreover, these genomic patterns were indicative of disease prognosis, such that tumors with
numerical chromosomal alterations were associated with excellent prognosis, while tumors
harboring any types of segmental chromosomal alterations were associated with high-risk
disease or relapse.
1.9.2.2 Gene expression profiling of neuroblastoma
Expression of several individual markers, including TRK neurotrophin receptors, has
been associated with prognosis in NBL; in particular, expression of NTRK1 (TRK-A) and
NTRK3 (TRK-C) is associated with favorable prognosis [210,211], while the expression of
NTRK2 (TRK-B) is associated with poor prognosis [212]. The first microarray study
conducted in NBL confirmed the association of gene expression patterns with disease course
and derived a panel of 19 genes that could be used to classify tumors into prognostic groups
[213]. A number of microarray expression profiling studies followed, several of them
specifically focusing on improving the prognostic stratification for intermediate-risk cases
that are difficult to assign to a treatment plan according to the known prognostic markers
[214,215]. Another large-scale study developed a 144-gene signature, based on which the
investigators were able to improve retrospectively the risk stratification used in NBL clinical
trials; the gene expression signature was originally validated in a set of 174 patients [216],
and later in 440 patients [217].
A recent meta-analysis study examined several previously published microarray
datasets and single-gene studies to develop a robust 59-gene signature that was then validated
in a set of 579 primary tumors spanning all risk groups, the largest patient cohort examined
to date by gene expression analysis. After adjusting for other known clinical markers of
prognosis, such as MYCN status, age and disease stage, the prognostic signature was found to
be independently predictive of the overall and event-free survival [218]. While risk
stratification based on gene expression has been shown to improve the performance of the
prognostic factors currently used in the clinic, based on large sample sets, it has yet to be
implemented in the clinical management of NBL.
1.9.2.3 Genetically engineered mouse models of neuroblastoma
The creation of genetically engineered mouse models (GEMMs) carrying exogenous
DNA of interest has contributed to our understanding of the functions of cancer genes [219].
A key role of GEMMs in cancer research has been in characterizing which aberrations in
cancer genes can induce or contribute to tumorigenesis when expressed in mice, providing
functional evidence for these aberrations acting as driver events in human cancer formation
[220,219].
As discussed in Section 1.9.2.1, amplification of the MYCN oncogene is the best
characterized genetic event that occurs in 20% of NBL tumors and is associated with poor
disease prognosis. To understand the role of MYCN in NBL, a GEMM in which MYCN is
overexpressed in the sympathoadrenal lineage of the neural crest using the tyrosine
hydroxylase (TH) promoter was constructed [221]. The TH-MYCN mice hemizygous for the
MYCN transgene develop NBL tumors with 70% penetrance by 1 year of age, while
homozygous mice develop tumors with 100% penetrance by 4 months of age [222,220]. The
TH-MYCN mouse model provides a model of MYCN-amplified NBL, and is currently the
only well-characterized GEMM available for NBL [220]. Murine NBL recapitulates many of
the biological and clinical characteristics of human MYCN-amplified NBL, such as genomic
abnormalities (including MYCN amplification), disease pathology and gene expression
patterns [220,222]. However, the model differs from the human disease by the low frequency
of bone marrow metastases, and the predominantly paraspinal (as opposed to adrenal in
humans) primary tumors [223,220].
1.10 Thesis roadmap and chapter summaries
Recent advances in cancer genomics have contributed to our understanding of cancers
as diseases associated with multiple aberrations that can affect genes at the level of sequence,
copy number, mRNA expression, or epigenetics. Applications of cancer genomic methods to
the analysis of high-risk neuroblastoma (NBL) led to discoveries of recurrent copy number
aberrations, gene expression signatures, and predisposition markers predictive of the disease
phenotype. These studies revealed molecular heterogeneity of high-risk NBL, suggesting that
application of higher resolution approaches may identify novel markers linked to the
pathogenesis of this disease. The main hypothesis underlying the research described in this
thesis is that single-nucleotide resolution analysis of high-risk NBL genomes and
transcriptomes will lead to the discovery of new loci that contribute to the disease. I also
hypothesized that better understanding of gene expression profiles of the putative normal cell
of origin of NBL will help interpret high throughput sequencing data from NBL cells by
placing it in the context of expression analysis of the normal neural crest cells. Therefore, the
objectives of my research are to characterize the genomes and transcriptomes of high-risk
NBL primary tumors, NBL tumor initiating cells, and normal neural crest cells using new
sequencing technologies with a goal of identifying novel loci that may be implicated in the
disease.
Since NBL originates from the developing neural crest, the goal of the research in
Chapter 2 is to identify and characterize the expression of key genes and pathways that
distinguish normal neural crest stem cells from other stem cell lineages. Key findings of the
work described in Chapter 2 include the plasticity of the neural crest stem cell phenotype, in
which non-neural-crest derived cells can converge to this phenotype; and the finding of a
decreased expression of double-stranded DNA repair genes as compared to another somatic
stem cell lineage with a broad developmental potential, the mesenchymal stem cells.
The rationale for studying NBL tumor initiating cells (TICs), a highly tumorigenic
population of metastases-derived NBL cells, in Chapter 3 is the aggressive behavior of
high-risk NBL and its high propensity for relapse, potentially linked to the persistence of TICs that
are resistant to conventional therapies. The goal of the research described in Chapter 3 is to
use RNA sequencing data from NBL TICs to identify NBL TIC-enriched transcripts and use
them to predict therapeutics that could specifically target these cells. The key finding of this
work is the identification and validation of AURKB as a novel drug target for NBL TICs.
Having studied the transcriptomes of normal neural crest cells (Chapter 2) and NBL TICs
(Chapter 3), I addressed in Chapter 4 whether whole genome and transcriptome analysis of
primary NBL tumors may identify additional genetic markers that could inform novel
therapies of relevance to primary NBL tumors at diagnosis. This large-scale sequencing work
revealed that NBL tumors harbored relatively low frequencies of somatic point mutations in
coding sequences. Despite this observation, several gene groups, including those involved in
the MAPK signaling pathway and chromatin remodeling, emerged from this analysis as
being the targets of somatic mutations in 15% and 11% of patients, respectively. These
mutational signatures may suggest potential therapeutic avenues that could be explored in
patient subgroups with these mutations.
Figure 1.1 Advances in sequencing chemistry implemented in the earliest next-generation sequencers
In each diagram, DNA templates are depicted as black bars, sequencing primers are shown as aquamarine bars, and DNA polymerases
are represented as light blue circles. (A). The pyrosequencing approach implemented in 454/Roche sequencing technology detects
incorporated nucleotides (here an A nucleotide is shown) by chemiluminescence (yellow shape) resulting from PPi release. (B). The
Illumina method utilizes sequencing-by-synthesis in the presence of fluorescently labeled nucleotide analogues (green, red, blue and
yellow circles) that serve as reversible reaction terminators. The sequencing is performed on millions of templates simultaneously,
and an imaging step follows each incorporation step to determine the identity of added nucleotides (bottom). (C) The single-molecule
sequencing-by-synthesis approach detects template extension using Cy3 and Cy5 labels attached to the sequencing primer
(aquamarine) and the incoming nucleotides (fuchsia), respectively. (D). The SOLiD method sequences templates by sequential
ligation of labeled degenerate probes. Two-base encoding implemented in the SOLiD instrument allows for probing each nucleotide
position twice. For instance, the nucleotide sequence demonstrates that the T base is effectively read twice by red (A to T) and green
(T to G). The matrix on the left shows that each of the four colors encodes two separate nucleotide pairs. Reprinted with permissions
of Annual Reviews.
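The two-base encoding described in panel D can be sketched in Python as follows. The caption specifies only that A to T is read as red and T to G as green; the remaining color assignments and the XOR formulation used here follow the commonly described SOLiD color scheme and should be treated as assumptions, as should the function names.

```python
BASES = "ACGT"
# Assumed color scheme: each color index is the XOR of the two base indices.
COLOR_NAMES = ["blue", "green", "yellow", "red"]


def encode(sequence):
    """Convert a nucleotide sequence into a list of color indices.

    Each color describes a pair of adjacent bases, so every internal base
    contributes to two consecutive colors.
    """
    indices = [BASES.index(base) for base in sequence]
    return [left ^ right for left, right in zip(indices, indices[1:])]


def decode(first_base, colors):
    """Recover the nucleotide sequence from a known first base and colors."""
    current = BASES.index(first_base)
    sequence = [first_base]
    for color in colors:
        current ^= color
        sequence.append(BASES[current])
    return "".join(sequence)


colors = encode("ATG")
color_names = [COLOR_NAMES[color] for color in colors]
recovered = decode("A", colors)
```

In this example the T base participates in both the red (A to T) and green (T to G) calls, which is the sense in which each position is probed twice.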
Figure 1.2 Transcript model coverage by various sequencing-based methods for transcriptome analysis
The exons in a gene model are represented by orange, blue and green bars, while the introns are in grey. Following transcription and
splicing, a transcript carrying exons 1, 2, and 3 is produced. The coverage of this transcript by various methods is depicted in the black
box: Sanger-based expressed sequence tags (ESTs) are generated from the 3' or 5' end of transcripts, whereas SAGE tags represent
short sequences at their 3' ends; randomly primed short reads generated by next-generation sequencers detect bases throughout the
length of the transcript. Modified with permissions of Annual Reviews.
Table 1.1 Specifications of the common next-generation sequencing platforms as compared to the most common Sanger
sequencer (Life Technologies’ ABI3730XL)
The run statistics in this table are from [224]. The average read length is for high quality reads of more than 200 bases (the mode is
higher). *Polymerase Chain Reaction (PCR) can be used for the amplification of templates for Sanger sequencing, when it is desired
to sequence specific regions of the genome; the use of PCR for template amplification and candidate gene sequencing by the Sanger
method is discussed in Section 1.6.3.3.
Instrument | Average read length | Run time | Megabases per run | Sequencing chemistry | Template amplification | Company
ABI3730XL | 650 bp | 2 hrs | 0.06 | Sequencing by synthesis with irreversible terminators (Sanger) | In vivo cloning* | Life Technologies
Illumina GAIIx | Paired 150 bp | 14 days | 96,000 | Sequencing by synthesis with reversible terminators | Bridge PCR | Illumina
Illumina HiSeq2000 | Paired 100 bp | 8 days | 200,000 | Sequencing by synthesis with reversible terminators | Bridge PCR | Illumina
454/FLX Titanium | 400 bp | 10 hrs | 500 | Pyrosequencing on solid support | Emulsion PCR | Roche
SOLiD-4 | Paired 50 bp (forward) and 35 bp (reverse) | 12 days | 71,400 | Sequencing by ligation | Emulsion PCR | Life Technologies
PacBio RS | 860-1,100 bp | 0.5-2 hrs | 5-10 | Sequencing by synthesis using SMRT (single molecule real time) technology | None | Pacific Biosciences
Heliscope | 35 bp | N/A | 28,000 | Sequencing by synthesis with virtual terminators | None | Helicos BioSciences
Ion Torrent (316 chip) | >100 bp | 2 hrs | >100 | Sequencing by synthesis with semiconductor detection | Emulsion PCR | Life Technologies
Chapter 2: Transcriptome analysis of normal neural crest stem cells 2
2.1 Introduction
During early human development, a zygote (fertilized egg) undergoes cell divisions to
form a blastula that implants into the uterus to continue embryogenesis. Following
implantation, the process of gastrulation results in the formation of the asymmetric embryo
consisting of three germ layers – ectoderm, mesoderm and endoderm – that go on to develop
all major organs and tissues in the body. The ectoderm-derived neural crest is a transiently
multipotent cell population unique to vertebrates [225]. Neural crest cells migrate out of their
origin at the apex of the neural tube, the embryo's precursor to the central nervous system,
and form aggregates throughout the embryo that later develop into ganglia of the peripheral
nervous system. A fraction of neural crest cells infiltrates other organs, such as skin, gut and
adrenal glands to generate melanocytes, enteric neurons, and hormone-secreting chromaffin
cells, respectively [226]. Neural crest cells also contribute to craniofacial cartilage and bone,
as well as cardiac and smooth muscle tissues.
The development of neural crest cell lineages has been compared to the process of
haematopoiesis [226], in which blood cell types derive from a hematopoietic stem cell via
differentiation into a series of committed progenitors. In this model, the original stem cell is
multipotent, while the differentiation potential of the committed progenitors is restricted to
the cell types that make up the particular lineage. Accordingly, the existence of progenitor
cells committed to specific neural crest lineages has been proposed, including those
committed to the enteric, parasympathetic, sympathoadrenal, sensory, glial, and melanogenic
lineages [226]. The sympathoadrenal progenitor, a common progenitor to sympathetic neurons
and chromaffin cells originating from the trunk region of the neural tube, has been identified
and characterized using imaging studies in rats [227]. Intriguingly, cell types derived from
the sympathoadrenal progenitor, but not other neural crest progenitors, are those that are
susceptible to transformation into NBL [216,217]. The differentiation and specification of the
sympathoadrenal lineage progenitors, although not completely understood, involve the
transcription factors ASCL1, PHOX2A, PHOX2B, and HAND2 [175].
2 Portions of this Chapter have been published, and the co-author contributions are detailed in the Preface as per
the University of British Columbia PhD thesis guidelines: O. Morozova, V. Morozov, B.G. Hoffman, C.D.
Helgason, M.A. Marra. A seriation approach for visualization-driven discovery of co-expression patterns in
Serial Analysis of Gene Expression (SAGE) data. PLoS ONE 3(9):e3205, 2008; H. Jinno, O. Morozova, K.L.
Jones, J.A. Biernaskie, M. Paris, R. Hosokawa, M.A. Rudnicki, Y. Chai, F. Rossi, M.A. Marra, F.D. Miller.
Convergent genesis of an adult neural crest-like dermal stem cell from distinct developmental origins. Stem
Cells 28(11):2027-40, 2010. Copyright by AlphaMed Press; M.D. O'Connor, E. Wederell, G. Robertson, A.
Delaney, O. Morozova, S.S. Poon, D. Yap, J. Fee, Y. Zhao, H. McDonald, T. Zeng, M. Hirst, M.A. Marra, S.A.
Aparicio, C.J. Eaves. Retinoblastoma-binding proteins 4 and 9 are important for human pluripotent stem cell
maintenance. Exp. Hematol. 39(8):866-79.e1; 2011. Copyright by Elsevier.
Since NBL originates from the developing neural crest, and moreover, from a specific
neural crest lineage, understanding the biology and differentiation of normal neural crest
stem cells may help shed light onto molecular events associated with NBL formation.
In addition, germline mutations in PHOX2B are associated with a fraction of familial NBL
cases, implicating genes involved in neural crest differentiation in NBL formation [190,189].
Work to date has shown the persistence of adult or somatic stem cells in many tissues, most
notably the central nervous and hematopoietic systems [230–232]. Similarly, multipotent adult
stem cells have been isolated from the dermis of rodent and human skin and termed SKin-derived
Precursors (SKPs) [221,222]. These cells have been shown to maintain a
differentiation potential reminiscent of the neural crest stem cell, and are able to generate
peripheral neurons, glia, Schwann cells (a cell type thought to be made only from the neural
crest), and smooth muscle [235]. We also demonstrated, in a publication that is beyond the
scope of this thesis, that SKP progenitors reside in the hair follicle niche and exhibit
properties expected of a dermal stem cell, contributing to dermal maintenance, wound healing,
and hair follicle morphogenesis [236].
Skin-derived Precursors can be derived from the dermis throughout the body, only
the facial component of which originates from the neural crest embryonically [237–239].
Therefore, it is unclear whether SKPs isolated from different areas of the dermis are of neural
crest origin per se or converge towards the neural crest stem cell phenotype. If SKPs are
derived from the neural crest, this would imply that neural crest progenitors invade the
mesoderm-derived dorsal and ventral dermis [237,239] during embryogenesis, and that it is
these precursors that associate with hair follicles and generate SKPs. Alternatively, if ventral
and dorsal SKPs are not derived from the neural crest, this would indicate a possibility of a
second developmental pathway to generate neural crest-like cells from a non-neural-crest
origin. Distinguishing between these two possibilities would have important implications for
developmental biology as the origin of somatic stem cells, such as SKPs, is not well
understood [240]. In terms of NBL development, the second option would indicate that
NBL may potentially derive from a non-neural crest origin by lineage convergence. In terms
of NBL laboratory research, the origin of SKPs by lineage convergence would also indicate
that SKPs derived from any area of the body can be used to model normal counterparts of
NBL cells in the laboratory. Precedent for the lineage convergence phenomenon has been
established by in vitro studies in which normal fibroblasts could be reprogrammed toward an
ES-cell-like phenotype [241,242]. In addition, as discussed in Section 1.5, breast cancer stem
cells can arise in vivo from tumor cells via epithelial-to-mesenchymal transition [33].
These considerations led us to hypothesize that neural crest stem cell-like cells may arise
from cell lineages other than the neural crest by adopting a neural crest stem cell-like
phenotype.
The overall objective of this Chapter is to characterize the expression profiles of SKP
lines used as models of neural crest progenitors [234] and representing the presumed normal
counterparts of neuroblastoma cells [231,232]. To fulfill this objective, I addressed the
specific aims outlined below. First, I characterized the transcriptomes of SKPs isolated from
facial, ventral and dorsal skin regions of the body that were shown by lineage tracing work to
be derived from different developmental origins. In this experiment I showed that the three
SKP lineages were similar at the expression level, but maintained the expression of a small
set of genes indicative of their embryonic origin. Second, I used the three SKP populations to
identify genes enriched and depleted in all SKPs compared to a mesodermal multipotent
somatic stem cell lineage, mesenchymal stem cells. These transcripts represent markers that
are common to neural crest progenitors regardless of their origin and distinguish the neural
crest progenitors from mesenchymal stem cells. Third, as it was noted during the experiments
described under Specific Aim 2 that neural crest, but not mesenchymal progenitors,
expressed markers of pluripotency, I compared the normal neural crest progenitor-enriched
transcripts to transcripts enriched in normal embryonic stem cells. This was done to further
delineate similarities and differences between SKPs and embryonic stem cells (ESCs). The
results from these analyses provided insights into distinguishing characteristics of the
transcriptome of normal neural crest progenitors, the cell type that is thought to undergo
transformation to form NBL.
2.2 Results
2.2.1 SKPs of distinct developmental origin are highly similar at the transcriptional
level and differ from bone marrow mesenchymal stem cells (MSCs)
To address whether SKPs and their endogenous dermal precursors originate from the
neural crest or whether, like the dermis itself, they originate from multiple developmental
origins, our collaborators conducted lineage tracing experiments. Briefly, they used two
different mouse Cre lines that allowed them to perform lineage tracing: Wnt1-cre, which
targets cells deriving from the neural crest, and Myf5-cre, which targets cells of a somite
(mesodermal) origin. By crossing these Cre lines to reporter mice, they showed that the
endogenous follicle-associated dermal precursors in the face derive from the neural crest, and
those in the dorsal trunk derive from the somites (mesoderm), as do the SKPs they generate.
The ventral trunk SKPs were found to derive from lateral plate mesoderm. Despite these
different developmental origins, facial and trunk SKPs are functionally similar, even with
regard to their ability to differentiate into Schwann cells, a cell type only thought to be
generated from the neural crest [245].
To comprehensively define the similarities and differences among these
developmentally distinct populations of SKPs, I compared global expression profiles derived
from dorsal trunk SKPs, ventral trunk SKPs, facial SKPs, and bone marrow mesenchymal stem
cells (MSCs), the latter used as a control for the mesoderm-derived ventral and dorsal SKPs.
All samples were generated from adult rats. The rat model was chosen to provide a direct
comparison with the adult rat MSCs that were available to our collaborators. RNA samples
purified from the SKPs and MSCs were analyzed on the Affymetrix GeneChip Rat Gene 1.0
ST Array. After the normalization and filtering described in the Methods, genes with variable
expression profiles across facial SKPs, dorsal SKPs, ventral SKPs and MSCs were identified
using the multiple group comparison implemented in the LIMMA Bioconductor package
[246]. The LIMMA method was chosen for this analysis because several studies noted its
consistently favorable performance compared to other common methods of microarray data
analysis, including Welch's t-test, ANOVA, SAM, and RVM [247,248]. In total, 7,012 out
of 18,879 genes showed evidence of differential expression across the four groups
(Benjamini-Hochberg-corrected q < 0.05). Spearman rank correlations, computed between each
sample pair based on the expression profiles of these genes, demonstrated that dorsal, ventral, and
facial SKPs were highly similar, with an average Spearman rank correlation of 0.94
(SD = 0.031). In contrast, the average Spearman rank correlation between SKPs and MSCs was
0.82 (SD = 0.025) (Figure 2.1A). This difference was statistically significant by a two-tailed
Student's t-test (P < 0.0001). Unsupervised hierarchical consensus clustering of the samples,
based on the variable gene set described above and using a standard
hierarchical clustering algorithm (correlation distance, average linkage clustering, 100
bootstrap iterations), confirmed these conclusions. The clustering result demonstrated that the
three SKP populations grouped together, and that all three were distinct from the MSC
samples (Figure 2.1B).
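The sample-to-sample comparison described above can be illustrated with a minimal sketch. The expression values and sample names below are invented for illustration and are not data from this thesis; the helper functions `ranks`, `pearson` and `spearman` are hypothetical names, and the sketch uses only the Python standard library rather than the software used for the actual analysis.

```python
from itertools import combinations
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean rank of the tied block
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical log-scale expression values for six genes in four samples.
profiles = {
    "facial_SKP":  [8.1, 5.2, 9.7, 3.3, 6.0, 7.4],
    "dorsal_SKP":  [7.9, 6.4, 9.9, 3.0, 6.3, 7.0],
    "ventral_SKP": [8.4, 5.0, 9.5, 3.6, 6.1, 7.7],
    "MSC":         [4.2, 8.8, 6.1, 7.5, 3.9, 9.0],
}

skps = [name for name in profiles if name.endswith("SKP")]
# Mean correlation among SKP samples versus between SKP samples and the MSC sample.
within = mean(spearman(profiles[a], profiles[b]) for a, b in combinations(skps, 2))
between = mean(spearman(profiles[s], profiles["MSC"]) for s in skps)
print(f"within SKPs: {within:.2f}, SKPs vs MSC: {between:.2f}")
```

In this toy example the SKP samples correlate more strongly with each other than with the MSC sample, which is the pattern summarized in Figure 2.1A for the real data.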
2.2.2 SKPs of distinct developmental origin maintain a lineage history at the gene
expression level
To delineate the extent of differences among the transcriptomes of the neural crest-derived
facial SKPs, the mesoderm-derived ventral and dorsal SKPs and the MSCs, I
performed three-way differential expression analysis using linear models implemented in the
LIMMA Bioconductor package [237,234]. The Venn diagrams show the numbers of
significantly differentially expressed genes (Benjamini-Hochberg-corrected q < 0.05) that are
in common among the comparisons (Figure 2.2A). In total, 2,603 genes
showed evidence of differential expression in at least one of the three pairwise comparisons; the
expression levels of these genes are plotted as a heatmap (Figure 2.2B). Of these genes, only
106 were significantly different between dorsal and facial SKPs, while 2,233 and 2,525
differed between MSCs versus dorsal SKPs and MSCs versus facial SKPs, respectively.
These data are compatible with the interpretation that precursor cells of at least two, and
potentially three, different developmental origins converge onto a highly similar phenotype.
I therefore directly compared the expression of genes associated with neural crest fate
specification [250], focusing on Slug, Snail, Twist, Sox9, Sox10, Foxd3, and Ap2a1.
Heatmaps of the microarray data showed that these genes were expressed at similar levels in
all three of the adult rat SKP samples, as were p75NTR and RhoB, which are also associated
with neural crest precursors (Figure 2.3A, left panel) [239,240]. Reverse transcription
polymerase chain reaction (RT-PCR) analyses of neonatal mouse skin, conducted by our
collaborators, confirmed that these mRNAs were also expressed at similar levels in neonatal
murine dorsal versus facial SKPs (Figure 2.3B, left panel).
Although these analyses indicate that mesenchymal precursors of different
developmental origins converge to a very similar adult precursor cell phenotype, a pairwise
differential expression comparison of facial versus dorsal trunk SKPs and dorsal trunk versus
ventral trunk SKPs using linear models demonstrated that a subset of genes was
significantly differentially expressed (Benjamini-Hochberg-corrected q < 0.05; Table 2.1). Of
the 35 most differentially expressed genes in the facial versus dorsal comparison, 10 were
higher in dorsal SKPs and 25 in facial SKPs. Of the 35 most differentially expressed genes in
the dorsal versus ventral comparison, four were higher in dorsal SKPs and 31 in ventral SKPs
(Table 2.1). Many of these genes play an important role during embryogenesis. In particular,
dorsal trunk SKPs expressed high levels of the Zic1 transcription factor relative to both facial
and ventral trunk SKPs, and the Hox transcription factors Hoxa5, Hoxc4, Hoxc6, and Hoxc9
were high in dorsal trunk SKPs relative to facial SKPs (Figure 2.3B; Table 2.1). In contrast,
facial SKPs expressed high relative levels of Pax3 and Msx1, both of which are transcription
factors associated with cranial neural crest cells [253], and of Mab-21-like 1 and Mab-21-like 2,
mammalian homologues of the C. elegans mab-21 cell fate gene that is expressed during
embryogenesis [254] (Figure 2.3A, right panel; Table 2.1). The relative enrichment of these
different mRNAs was confirmed by our collaborators using RT-PCR analysis of neonatal
mouse dorsal versus facial SKPs (Figure 2.3B, right panel). Thus, although these different
dermal precursor populations are highly similar, they maintain a history of their distinct
developmental origins.
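The Benjamini-Hochberg correction applied throughout this Chapter converts a list of P values into q values that control the false discovery rate. The sketch below shows the standard step-up calculation on invented P values; the function name `benjamini_hochberg` is illustrative, and this is not the LIMMA implementation used for the analyses reported here.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input P values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that q-values stay
    # monotone: each q is the minimum of p * m / rank over equal or larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values

# Hypothetical P values for five genes.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value < 0.05]
print(q, significant)
```

Note that the third and fourth genes have raw P values below 0.05 but q values just above it, so only the first two genes pass the q < 0.05 threshold in this example.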
2.2.3 Identification of genes significantly enriched and depleted in neural crest stem
cell-like cells
We reported in Sections 2.2.1 and 2.2.2 that the expression analysis of SKPs,
complementing laboratory studies by our collaborators, was compatible with SKPs being
derived from different developmental origins and converging onto the common neural crest
stem cell-like phenotype. We next set out to identify transcripts that are enriched and
depleted in SKPs from all three developmental origins compared to MSCs. These transcripts
represent markers enriched and depleted in normal neural crest stem cell-like cells compared
to another type of multipotent somatic stem cell, the bone marrow-derived MSCs. The MSCs
provide a suitable comparator for SKPs for two reasons. First, they represent one of the few
somatic stem cell lineages with a similarly broad developmental potential to the neural crest
[230,255]. Second, MSCs derive from the mesoderm [256], which, as discussed in Sections
2.1 and 2.2.1, is also the lineage of origin of ventral and dorsal SKPs.
To identify transcripts enriched in each of facial SKPs, ventral trunk SKPs and dorsal
trunk SKPs compared to MSCs, I performed pairwise differential expression analysis using linear
models implemented in the LIMMA Bioconductor package [246,249]. Based on the pairwise
gene expression comparisons, 3,406 genes were significantly differentially expressed
between ventral trunk SKPs and MSCs; 2,793 genes were significantly differentially
expressed between facial SKPs and MSCs; and 2,424 genes were significantly differentially
expressed between dorsal SKPs and MSCs (Benjamini-Hochberg-corrected q < 0.05). Next,
results from the pairwise comparisons were combined to identify genes that were
significantly enriched or depleted in SKPs compared to MSCs in all three comparisons. In
total, 654 genes were found to be enriched in all three SKP lineages compared to MSCs,
while 752 were found to be depleted in all three SKP lineages compared to MSCs. These
genes are listed in Appendix A and their expression is plotted as a heatmap in Figure 2.4.
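The step of combining the three pairwise comparisons can be sketched as a set intersection that also requires a consistent direction of change. The gene identifiers, log fold changes and q values below are invented, and the function `consistent_sets` is a hypothetical helper rather than code used in this thesis.

```python
def consistent_sets(comparisons, q_cutoff=0.05):
    """Genes significantly up (or down) in SKPs versus MSCs in every comparison.

    comparisons maps a comparison name to {gene: (log2_fold_change, q_value)},
    with positive fold changes meaning higher expression in SKPs.
    """
    enriched, depleted = None, None
    for results in comparisons.values():
        up = {g for g, (lfc, q) in results.items() if q < q_cutoff and lfc > 0}
        down = {g for g, (lfc, q) in results.items() if q < q_cutoff and lfc < 0}
        enriched = up if enriched is None else enriched & up
        depleted = down if depleted is None else depleted & down
    return enriched or set(), depleted or set()

# Hypothetical results for four genes in the three SKP-versus-MSC comparisons.
comparisons = {
    "facial_vs_MSC":  {"gene1": (2.1, 0.001), "gene2": (-1.8, 0.004),
                       "gene3": (1.2, 0.030), "gene4": (0.9, 0.200)},
    "dorsal_vs_MSC":  {"gene1": (1.7, 0.002), "gene2": (-2.2, 0.001),
                       "gene3": (-0.4, 0.600), "gene4": (1.1, 0.010)},
    "ventral_vs_MSC": {"gene1": (2.4, 0.001), "gene2": (-1.5, 0.020),
                       "gene3": (1.0, 0.040), "gene4": (1.3, 0.008)},
}

enriched, depleted = consistent_sets(comparisons)
print(sorted(enriched), sorted(depleted))
```

Here gene3 and gene4 are excluded because each fails the significance threshold in one comparison, which mirrors the requirement that a gene be enriched or depleted in all three SKP lineages.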
2.2.4 Pathway analysis of SKP-enriched and SKP-depleted transcripts
To characterize the functions of transcripts differentially abundant in SKPs compared
to MSCs, I conducted a pathway enrichment analysis using the Ingenuity Pathway Analysis
tool (Ingenuity Systems, www.ingenuity.com). Using the rat-derived gene lists, 618 out of 654
enriched genes and 681 out of 752 depleted genes could be annotated by the Ingenuity
Knowledgebase. When the rat-derived gene lists were converted to human orthologs, 624 out of
654 and 696 out of 752 genes could be annotated, for genes enriched and depleted in SKPs
compared to MSCs, respectively. Notably, the functional enrichment results obtained with the
rat-derived gene lists were almost identical to those obtained for the human orthologs. The
results from the human ortholog pathway analysis are quoted here because that analysis had
a higher number of annotated genes. After applying Benjamini-Hochberg correction, 14
canonical pathways were significantly enriched among the genes upregulated in SKPs
(Figure 2.5A, Table 2.2A), while 13 canonical pathways were enriched among the genes
downregulated in SKPs (Figure 2.5C, Table 2.2B).
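Pathway enrichment of a gene list is commonly assessed with an over-representation test. The exact procedure used by the Ingenuity software is not described in this section, so the sketch below is a generic one-sided hypergeometric test with invented counts, offered only to illustrate the kind of calculation that precedes the multiple testing correction; the function name `enrichment_p` and all numbers are assumptions.

```python
from math import comb

def enrichment_p(total, in_pathway, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `total` annotated genes, of which `in_pathway` belong to the pathway."""
    upper = min(in_pathway, selected)
    favourable = sum(
        comb(in_pathway, k) * comb(total - in_pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(total, selected)

# Hypothetical example: 20 annotated genes, 5 in the pathway,
# 6 genes in the list of interest, 4 of which fall in the pathway.
p_value = enrichment_p(total=20, in_pathway=5, selected=6, overlap=4)
print(f"{p_value:.4f}")
```

P values obtained this way for every pathway would then be adjusted for multiple testing, for example with the Benjamini-Hochberg procedure, before pathways are declared significantly enriched.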
The majority (9 out of 14) of the pathways upregulated in SKPs compared to MSCs
involved WNT/Beta-Catenin, bone morphogenetic protein (BMP) or transforming growth
factor beta (TGFB) signaling (Table 2.2A). These include canonical pathways named "Role of
Osteoblasts, Osteoclasts and Chondrocytes in Rheumatoid Arthritis", "Axonal Guidance
Signaling", "Colorectal Cancer Metastasis Signaling", "Human Embryonic Stem Cell
Pluripotency", "WNT/Beta-Catenin Signaling", "Role of Macrophages, Fibroblasts and
Endothelial Cells in Rheumatoid Arthritis", "Molecular Mechanisms of Cancer", and
"Leukocyte Extravasation Signaling". While the canonical WNT/Beta-Catenin signaling on
its own induces differentiation of the neural crest along the sensory neural lineage, the BMP
signaling pathway was shown to antagonize this activity, such that the presence of both BMP
and WNT maintains the neural crest stem cell phenotype and multipotency [257]. Therefore, the
finding that WNT/Beta-Catenin and BMP signaling account for the majority of pathways
upregulated in SKPs is consistent with the neural crest stem cell-like phenotype of these
cells, and distinguishes them from other non-neural crest multipotent stem cells. In addition,
this finding provides a validation for the computational approach used to identify the
pathways associated with transcripts enriched and depleted in SKPs compared to MSCs. An
adaptation of the Ingenuity canonical pathway named "Human Embryonic Stem Cell
Pluripotency", outlining WNT/Beta-Catenin and BMP signaling along with other key
stemness molecules, is provided in Figure 2.5B with the genes upregulated in SKPs
highlighted in red.
In contrast, the majority (9 out of 13) of the pathways downregulated in SKPs
compared to MSCs were involved in cell cycle control or DNA repair (Table 2.2B). These
include canonical pathways named "Hereditary Breast Cancer Signaling", "Role of BRCA1
in DNA Damage Response", "ATM Signaling", "DNA Double-Strand Break Repair by
Homologous Recombination", "Mitotic Roles of Polo-Like Kinase", "Cell Cycle Control of
Chromosomal Replication", "Cell Cycle: G2/M DNA Damage Checkpoint Regulation",
"Role of CHK proteins in Cell Cycle Checkpoint Control", and "Molecular Mechanisms of
Cancer". Intriguingly, the breast cancer 1, early onset (BRCA1) gene was involved in 7 out of 9
of these pathways (Table 2.2B), suggesting its central role in the regulatory network
downregulated in SKPs compared to MSCs. An adaptation of the Ingenuity canonical
pathway named "Role of BRCA1 in DNA Damage Response" is depicted in Figure 2.5D
with the genes downregulated in SKPs highlighted in green.
2.2.5 SKPs share expression profile similarities with ES cells
2.2.5.1 Identification of genes associated with the maintenance of the undifferentiated
state in human ES cells
In Sections 2.2.3 and 2.2.4 I reported on similarities and differences in gene
expression between two lineages of multipotent somatic stem cells with a broad
developmental potential, SKPs and MSCs [258,230,255]. This analysis revealed that the
Ingenuity pathway annotation "Human Embryonic Stem Cell Pluripotency" was significantly
enriched among transcripts expressed at a higher level in all three SKP lineages compared to
MSCs (Table 2.2A) suggesting that SKPs had more features of a pluripotent cell than MSCs.
This result is consistent with the observation that neural crest stem cells arguably have the
broadest developmental potential among the somatic stem cell types, as they are able to
generate diverse cell types, including smooth muscle cells, dermis, tendons, connective
tissues, sensory neurons and root ganglia of the peripheral nervous system, Schwann cells,
pigment cells, and neuroendocrine cells of the adrenal medulla. Since the expression of
pluripotency markers was a distinguishing feature of SKPs compared to MSCs, we aimed to
further define the extent to which SKPs resembled human ES cells.
To investigate the similarities and differences in gene expression between SKPs and
ES cells, I set out to identify a comprehensive list of genes that may be associated with the
maintenance of pluripotency in human ESCs. To accomplish this, 319 candidate pluripotency
genes were selected by the Connie Eaves laboratory based on their potential role in stem cell
biology, and literature reviews. These genes are involved in transcription, chromatin
maintenance, and membrane receptor signaling and are listed in Appendix B.
To determine which of the 319 candidate genes might be most tightly linked to the
maintenance of pluripotency, I undertook an analysis of their representation in 17
LongSAGE libraries derived from undifferentiated and differentiated human ES cells as well
as several adult tissues (Table 2.3). I used the SAGE Genie tool for the gene-to-tag mapping
such that each gene was represented by one LongSAGE tag [259]. Following the mapping,
the tag counts were normalized to the depth of 100,000, and a heuristic approach termed
seriation was used to identify groups of genes with similar expression levels in different
libraries. The seriation approach was chosen for this analysis because it was shown to perform
favorably compared to PoissonC, a clustering algorithm commonly used for SAGE data
analysis, when relatively small numbers of genes (under 5,000) are studied [260]. Since our
implementation of seriation is based on the visualization of contigs to identify co-expressed
genes, the approach is suitable for targeted analyses, such as when the expression of a
selection of genes is considered. The method is not suitable for unsupervised genome-wide
analysis.
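The normalization of tag counts to a common depth of 100,000 tags can be sketched as a simple rescaling of each library. The tag names and counts below are invented, and the function `normalize_library` is a hypothetical helper, not the code used to process the LongSAGE libraries.

```python
def normalize_library(tag_counts, depth=100_000):
    """Rescale raw tag counts so that the library total equals `depth`."""
    total = sum(tag_counts.values())
    return {tag: count * depth / total for tag, count in tag_counts.items()}

# Hypothetical library of 250,000 sequenced tags.
raw = {"TAG_A": 50, "TAG_B": 2_450, "TAG_C": 247_500}
normalized = normalize_library(raw)
print(normalized["TAG_A"])  # tags per 100,000 for TAG_A
```

Rescaling in this way makes tag counts comparable between libraries sequenced to different depths, which is a prerequisite for grouping genes by expression level across libraries.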
Seriation is a statistical method for simultaneously ordering rows and columns of a
symmetrical distance matrix for the purposes of revealing an underlying one-dimensional
structure [261]. An assumption in seriation analysis is that there is an order (or distinct sub
orders) in the data that are biologically meaningful. The inherent orders may represent any
sequential structure among the data (e.g. their dependence on time or another variable). In the
application described here, I hypothesized that the sequential structure present in the
expression data was the developmental restriction of the expression of the candidate
pluripotency genes to the undifferentiated ES cell libraries. The seriation analysis identified
three categories of genes (Figure 2.6A, Appendix B). In the original publication, we termed
these higher order structures or categories "Supercontigs" [260]. Upon further inspection,
Supercontig 1 contained 114 genes for which the expression was restricted to undifferentiated
ES cells. This group contained tags for POU5F1, NANOG, SOX2, FOXD3 and other genes
whose expression is known to decrease upon human ES cell differentiation. The average
expression levels of these 114 genes in the 17 LongSAGE libraries are plotted in Figure
2.6B. A second group (Supercontig 2) consisted of 145 genes whose transcript abundance
was increased in differentiated cells. The third subset (Supercontig 3) contained the
remaining 60 genes whose expression patterns did not fit within either of the two categories.
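The idea of recovering a one-dimensional order from a symmetric distance matrix can be illustrated with a small sketch. The greedy nearest-neighbour chaining below is only one of many possible seriation heuristics and is not claimed to be the algorithm used in [260]; the hidden positions, the distance matrix and the function name `greedy_seriation` are all invented for illustration.

```python
def greedy_seriation(dist):
    """Order items by greedy nearest-neighbour chaining.

    dist is a symmetric matrix (list of lists) of pairwise distances. The chain
    starts at the item with the largest total distance to all others, which is
    an end point when the items lie along a line, and then repeatedly appends
    the closest item not yet placed.
    """
    n = len(dist)
    start = max(range(n), key=lambda i: sum(dist[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        last = order[-1]
        nearest = min(remaining, key=lambda j: (dist[last][j], j))
        order.append(nearest)
        remaining.remove(nearest)
    return order

# Hypothetical items with a hidden one-dimensional structure, given in scrambled order.
positions = [3.0, 0.0, 4.5, 1.0, 2.2]
dist = [[abs(a - b) for b in positions] for a in positions]

order = greedy_seriation(dist)
reordered = [[dist[i][j] for j in order] for i in order]  # same permutation on rows and columns
print(order, [positions[i] for i in order])
```

Applying the same permutation to rows and columns places small distances near the diagonal of the reordered matrix, which is the kind of structure that seriation aims to reveal for visual inspection.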
2.2.5.2 Validation of pluripotency markers using computational methods
Since the genes in Supercontig 1 were preferentially expressed in the undifferentiated
human ES cells, I hypothesized that they would be involved in pathways associated with
human ES cell pluripotency. To test this hypothesis, I used Ingenuity software (Ingenuity
Systems, www.ingenuity.com) to identify canonical pathways significantly enriched among
the 114 genes in Supercontig 1 (Appendix B). After correcting for multiple testing, four
Ingenuity canonical pathways, "Role of Oct4 in Mammalian Embryonic Stem Cell
Pluripotency", "Role of NANOG in Mammalian Embryonic Stem Cell Pluripotency",
"Human Embryonic Stem Cell Pluripotency", and "Actin Cytoskeleton Signaling", remained
significantly enriched among the 114 transcripts in Supercontig 1 (Figure 2.7A). As
expected, all of these pathways appeared to be associated with the maintenance of
pluripotency.
Given the restriction of the expression of the 114 transcripts in Supercontig 1 to the
undifferentiated ES cells, we further hypothesized that the promoters of these genes, but not
genes in other Supercontigs, contained binding sites for core transcription factors OCT/POU,
SOX2, and NANOG that are known to be required for the maintenance of pluripotency in
cell culture [262]. To address this possibility, we used PASTAA software to analyze the
promoters of the genes contained in all three Supercontigs [263]. PASTAA interrogates
groups of co-expressed genes and ranks their likelihood of being regulated by a transcription
factor, as evident from the presence of known transcription factor binding sites. Two separate
PASTAA analyses were performed on each of the three Supercontigs. One analysis
interrogated a region extending 10 kb upstream from the transcription start site (distal
analysis); the other interrogated a region of ±400 bp around the transcription
start site (proximal analysis). PASTAA analysis of Supercontig 1 (the 114 undifferentiated
human ES cell-specific genes) showed SOX (P = 0.0018) and FOX (P = 0.041) position weight
matrices (PWMs) to be highly ranked in the distal analysis, and NANOG (P = 0.013) and
OCT/POU (P = 0.024) PWMs to be highly ranked in the proximal analysis (Figure 2.7C and B, respectively).
Notably, the PASTAA data predicted binding of the core pluripotency transcription
factors NANOG, OCT/POU and SOX to their own and each other's promoters, as expected
[262]. Several PWMs scored higher than NANOG in the proximal analysis (hollow circles
left of the NANOG PWM in Figure 2.7B). These included PWMs representing binding sites
for nuclear factor Y (NFY), a regulator of stem cell gene CD34 [264]; transcription factor
AP2, expressed during early development [265] and required for the development of the
sympathetic lineage [266]; TBX5, a regulator of limb development; and a neuronal regulator
ELK1 [267]. In the distal analysis, PWMs for cAMP response element-binding (CREB)
family members involved in neurogenesis [268] scored highly (hollow circles between ATF
and FOX PWMs in Figure 2.7C), in addition to the PWMs for the core pluripotency factors.
PASTAA analysis of Supercontig 2 also predicted a FOX motif to be present in the
distal promoter region of these genes (P = 0.0185) and a NANOG motif in the proximal
promoter region (P = 0.00786). Analysis of Supercontig 3 predicted FOX motifs in both the
distal (P = 0.0404) and proximal (P = 0.0327) promoter regions of these genes, but no NANOG,
OCT/POU, or SOX motifs in either the distal or proximal promoter regions. In conclusion,
only the upstream regions of genes in Supercontig 1, and not those of the other Supercontigs, contained
binding sites for all of the core pluripotency transcription factors NANOG, OCT/POU and SOX
that are required for the maintenance of the undifferentiated state in cell culture. This finding
is consistent with the restricted expression of these genes in the undifferentiated ES cells, and
provides additional evidence for their candidate role in the regulation of pluripotency.
2.2.5.3 Pluripotency genes whose transcripts are enriched or depleted in normal neural
crest stem cell-like cells compared to mesenchymal stem cells
In Section 2.2.5.1 I used seriation of LongSAGE libraries to identify candidate
pluripotency genes whose expression was restricted to undifferentiated human ES cells. I
identified 114 genes preferentially expressed in undifferentiated human ES cells, and their
candidate role in pluripotency was supported by computational analyses in Section 2.2.5.2.
To assess whether these pluripotency markers were among the transcripts enriched or
depleted in SKPs with respect to MSCs, I compared the genes in Supercontig 1 to those
identified in Section 2.2.3 and listed in Appendix A.
This analysis revealed 5 pluripotency markers (CTNNB1, ETV4, MAD2L2, PITX2,
SOX2) among the genes enriched in SKPs compared to MSCs, and 13 pluripotency markers
(ADAM23, AURKB, CENPK, FAM46B, FAM64A, HMGB2, IGF2BP3, KPNA2, MTHFD1,
MYBL2, TBX4, TPM1, ZFP57) among the genes depleted in SKPs compared to MSCs (Table
2.4). Two known pluripotency genes, CTNNB1, a member of the WNT signaling pathway,
and SOX2, one of the master regulators of pluripotency [262], were among the transcripts
enriched in SKPs compared to MSCs. In addition, AURKB, a kinase known to interact with
the BRCA1-associated RING domain protein 1 (BARD1), a member of the double-strand
break repair pathway [269], was found to be depleted in SKPs compared to MSCs.
Intriguingly, we show in Chapter 3 that AURKB is expressed in NBL tumor-initiating cells
and is a drug target for high-risk NBL. These observations suggest that although SKPs share
expression profile similarities with ESCs, and have a broader developmental potential than
MSCs, they are different from ESCs in the identity of pluripotency markers they express,
highlighting the uniqueness of the neural crest stem phenotype.
2.3 Discussion
NBL originates from the sympathoadrenal lineage that is thought to derive from
sequential differentiation of the neural crest stem cell [226,173,229]. Sympathoadrenal
precursors go on to develop into the neuroendocrine cells of the adrenal medulla, the most common
primary site of NBL [174]. A correlation between the differentiation state of NBL cells and
the clinical aggressiveness of the disease has been noted, such that cells of the most
aggressive high-risk subtype resemble the most primitive neural crest precursors, while cells of
low-risk subtypes resemble various stages of neural crest differentiation [175]. Therefore,
understanding the genesis of the sympathoadrenal lineage from early neural crest precursors,
the origin and the phenotype of the neural crest stem cell and its transformation to the
malignant counterpart, may shed light onto molecular events that contribute to the
development of NBL.
In Sections 2.2.1 and 2.2.2 I built on the lineage tracing experiments conducted by
our collaborators to help characterize the developmental origin of different populations of
neural crest stem cell-like cells (SKPs) that possess many properties of the somatic neural
crest stem cell. The lineage tracing work found that skin-derived neural crest stem cell-like
cells originated from different developmental lineages depending on the part of the body they
are derived from. However, my gene expression analysis showed that these cell populations
possess remarkably similar gene expression profiles. In fact, only 35 genes were significantly
differentially expressed in each of the facial versus dorsal and dorsal versus ventral SKP comparisons, in
contrast to thousands of genes differentially expressed between each SKP lineage and
another multipotent somatic stem cell lineage, the mesenchymal stem cells (MSCs). The
MSCs were chosen for this comparison to represent mesoderm, the embryonic origin of
dorsal and ventral SKPs. This result sets a precedent for the origin of a somatic stem cell
from a different tissue type, as in this case mesoderm-derived SKPs from the dorsal and
ventral trunk appear to converge to a neural crest stem cell phenotype that is similar to that of
neural crest-derived facial SKPs. This observation may be significant for the genesis of NBL
that is thought to derive from the neural crest. Our results suggest that, since neural crest-like
cells can derive from a non-neural crest lineage, neural crest-like lineages may also give
rise to NBL.
Conceptual support for the idea of lineage convergence occurring in nature,
specifically as applied to the neural crest, comes from the fact that several tissues, including
the gut and respiratory epithelium, are known to contain neuroendocrine cells (typical neural
crest derivatives) of non-neural crest origin [270], supporting the existence of a second
developmental pathway that converges onto a neural crest-like phenotype. Additional
support for lineage convergence is based on the observation that pluripotent stem cells can be
produced from germ cells [271] and even from somatic fibroblasts by in vitro reprogramming
[242]. Intriguingly, just as reported here with SKPs, pluripotent cells obtained from different
developmental origins (ES cells, reprogrammed germ cells and reprogrammed somatic cells)
have similar expression profiles and functional characteristics but retain epigenetic and
expression marks indicative of their primary origin [271].
Having shown that neural crest stem cell-like SKPs derive from different
developmental origins, I went on to identify genes and pathways that distinguish the common
neural crest stem cell-like phenotype of the three SKP lineages from a multipotent somatic
stem lineage derived from the mesoderm (the origin of dorsal and ventral SKP lineages), the
MSCs. Mesenchymal stem cells represent one of the few somatic stem cell lineages that is
similar to the neural crest in terms of their developmental potential. The MSCs can
reportedly differentiate into a wide array of tissue types, including osteoblasts (bone),
adipocytes (fat), chondrocytes (cartilage) as well as myocytes (muscle cells) and neurons
[230,255]. Differential expression analysis comparing the transcriptomes of SKPs and MSCs
revealed that the majority of pathways associated with transcripts increased in abundance in
SKPs compared to MSCs involved one of three core neural crest stem cell signaling pathways:
WNT/Beta-Catenin, bone morphogenetic protein (BMP), or transforming growth factor beta
(TGFB) signaling [272]. The expression of both the BMP and WNT/Beta-Catenin pathway
members was consistent with the neural crest stem cell phenotype of SKPs, as the
coordinated activity of these two pathways is required for the maintenance of
undifferentiated state in neural crest stem cells [257]. In contrast, the majority of pathways
associated with transcripts whose abundance was decreased in SKPs involved double-strand
break DNA repair and cell cycle control. In particular, the BRCA1 molecule participated in
many of the DNA repair pathways found to be significantly enriched among the SKP-depleted
(MSC-enriched) transcripts, suggesting its central role in the functional network of
molecules relatively increased in abundance in MSCs compared to SKPs. This observation is
consistent with the recent findings that the MSC cell lineage was resistant to irradiation
through, among other pathways, the activation of double-strand break repair by homologous
recombination and nonhomologous end joining (NHEJ) governed by BRCA1 [273]. As
discussed in Section 2.2.5.3, AURKB, a kinase linked to the BRCA1 signaling pathway
through the interaction with BARD1 [269], was found to be part of Supercontig 1, which contains
genes whose expression pattern was restricted to undifferentiated ES cells. As reported in
Section 2.2.5.3, the mRNA expression level of this pluripotency-associated kinase was found
to be decreased in SKPs compared to MSCs, suggesting that the increased expression of
BRCA1 DNA repair pathway in MSCs compared to SKPs is similar to that observed in ES
cells.
In conclusion, I addressed the overall goal of this Chapter of characterizing the
expression profiles of SKP lines used as models of neural crest progenitors and normal
counterparts of neuroblastoma cells [243,233]. In particular, I found that known signaling
pathways specifically implicated in neural crest stem cells, such as WNT/Beta-Catenin, BMP
and TGFB signaling were preferentially expressed in the neural crest stem cell-like cells
compared to a mesodermal multipotent cell lineage. A novel finding from our work is the
relative decrease in gene-level expression of members of the double-strand break DNA repair
pathways that involve BRCA1 in SKPs compared to MSCs and ESCs. The molecular
mechanism underlying this observation, as well as its functional significance, remains to be
addressed by future work. Of note is the fact that members of the same pathway were found
to be expressed at a higher level in NBL tumor-initiating cells as compared to SKPs (Chapter
3).
2.4 Materials and methods
2.4.1 Microarray analysis of rat SKP lines
RNA was prepared from twice-passaged adult rat dorsal, facial, and ventral SKPs and
MSCs using Trizol (Invitrogen), as per the manufacturer's instructions, followed by the
RNeasy Mini Kit (Qiagen, Venlo, Netherlands, http://www.qiagen.com). Three independent
isolates for each of dorsal trunk SKPs, ventral trunk SKPs, and facial SKPs, and four
independent isolates of MSCs were used for the microarray study. The independent isolates
were obtained from different animals. The RNA samples were analyzed on Affymetrix GeneChip
Rat Gene 1.0 ST Arrays (Affymetrix, Santa Clara, CA, http://www.affymetrix.com).
The data were checked for batch effects, background corrected and quantile-normalized
according to the standard Robust Multichip Average (RMA) procedure using the Affymetrix
Expression Console software. The gene expression data were annotated using R. norvegicus
genome build rn4. Subsequent statistical analysis was conducted using R-2.8.1. Microarray
data were deposited in the NIH GEO repository (accession number GSE23954).
2.4.2 Unsupervised analysis to assess global transcriptome similarity
Genes with variable expression pattern across four groups (facial SKPs, dorsal SKPs,
ventral SKPs, and MSCs) were identified using the multiple group comparison implemented
in the Linear Models for Microarray Data (LIMMA) Bioconductor package version 2.16.5.
This package was chosen for the analysis as it was reported to perform favorably compared
to other common approaches for microarray data analysis, including SAM, Welch's t-test,
ANOVA and Wilcoxon's test [247]. The LIMMA method first builds a linear model for each
gene using the number of parameters (experimental groups) defined by the user. A moderated
t-statistic (computed based on the weighted average between the variance of each gene and
the variance for all genes) is then used to test the hypothesis of each parameter coefficient for
each gene being equal to zero [249]. The estimated coefficients are used to represent the fold
changes between the experimental groups. For multiple group comparisons described in
Section 2.2.1 (facial SKPs, dorsal SKPs, ventral SKPs, and MSCs) the F-statistic with
Benjamini-Hochberg (BH) multiple testing correction implemented in the eBayes function
was used to test the null hypothesis that all parameter coefficients were equal to zero, or in
other words, that there were no differences among the experimental groups [249]. The parameter
coefficients were defined by the contrasts between each pair of groups: facial SKPs versus
dorsal SKPs, facial SKPs versus ventral SKPs, facial SKPs versus MSCs, dorsal SKPs versus
ventral SKPs, dorsal SKPs versus MSCs, and ventral SKPs versus MSCs. Those genes with
BH-corrected q < 0.05 were considered statistically significant. A total of 7,012 out of
18,879 genes showed evidence of differential expression among the four groups
(Benjamini-Hochberg false discovery rate-corrected q < 0.05), and these genes were used in the
correlation analysis and unsupervised consensus hierarchical clustering.
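
The Benjamini-Hochberg adjustment referred to above can be illustrated with a short Python sketch. This is a minimal illustration of the general BH procedure only, not the LIMMA code used for the analysis; the function name and the example p-values are hypothetical and are not data from this study.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values for five genes (illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
example_q = benjamini_hochberg(example_p)
# Genes passing the q < 0.05 threshold used in this Chapter.
significant = [q < 0.05 for q in example_q]
```

Note that the third and fourth hypothetical genes have raw p-values below 0.05 but would not pass the q < 0.05 threshold after adjustment.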
Unsupervised hierarchical cluster analysis was conducted using the R
package Pvclust version 1.2-1 with 100 bootstrap iterations. The Spearman Rank correlation
matrix was computed using the standard R cor function, and plotted as an image using the
custom function myImagePlot (available at www.phaget4.org/R/myImagePlot.R).
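
For readers unfamiliar with the Spearman Rank correlation matrix, the following Python sketch shows the computation on toy data. It is an assumption-laden illustration, not the R code used here: it assigns average ranks to ties and then applies the Pearson formula to the ranks, and all function names and example values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman_matrix(samples):
    """Symmetric matrix of Spearman correlations between expression profiles."""
    ranked = [average_ranks(profile) for profile in samples]
    return [[pearson(a, b) for b in ranked] for a in ranked]


# Three hypothetical samples measured on the same four genes.
toy_samples = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 45.0],
    [4.0, 3.0, 2.0, 1.0],
]
toy_matrix = spearman_matrix(toy_samples)
```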
2.4.3 Differential expression analysis using microarrays
The preprocessed data were analyzed using the Linear Models for Microarray Data
(LIMMA) Bioconductor package to identify genes that show significant evidence of
differential expression in each pairwise or multiple group comparison described in the text
[249]. The LIMMA method first builds a linear model (lmFit function) for each gene using
the number of parameters (experimental groups) defined by the user. A moderated t-statistic
computed based on the weighted average between the variance of each gene and the variance
of all genes is then used to test the hypothesis of each parameter coefficient for each gene
being equal to zero [249]. The estimated coefficients are used to represent the fold changes
between the experimental groups, and those genes with coefficients significantly different
from zero based on the moderated T-test are considered differentially expressed. The
Benjamini-Hochberg (BH)-corrected q < 0.05 was used as the threshold for statistical
significance.
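
The moderated t-statistic described above can be written compactly as follows. This is a sketch in the notation commonly used for the empirical Bayes linear model; the symbols are introduced here for illustration and do not appear elsewhere in this thesis.

```latex
\tilde{s}_g^{2} = \frac{d_0 s_0^{2} + d_g s_g^{2}}{d_0 + d_g},
\qquad
\tilde{t}_{gj} = \frac{\hat{\beta}_{gj}}{\tilde{s}_g \sqrt{v_{gj}}}
```

Here $s_g^{2}$ is the residual variance of gene $g$ with $d_g$ degrees of freedom, $s_0^{2}$ and $d_0$ are the prior variance and prior degrees of freedom estimated from all genes, $\hat{\beta}_{gj}$ is the estimated coefficient $j$ for gene $g$, and $v_{gj}$ is the corresponding unscaled variance factor from the design. Under the null hypothesis, $\tilde{t}_{gj}$ is compared against a t-distribution with $d_0 + d_g$ degrees of freedom.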
For multiple group comparisons, such as described in Section 2.2.2 (ventral SKPs,
dorsal SKPs, and MSCs) the F-statistic with BH multiple testing correction implemented in
the eBayes function was used to test the null hypothesis that all parameter coefficients were
equal to zero (an analysis similar to ANOVA), or in other words, that there were no differences
among the experimental groups [249]. Those genes with BH-corrected q < 0.05 were
considered statistically significant, unless listed otherwise.
2.4.4 Reverse Transcription Polymerase Chain Reaction (RT-PCR) to confirm results
from SKP microarray analysis
RNA was prepared from twice-passaged neonatal mouse dorsal and facial SKPs using
Trizol (Invitrogen) and from sorted, uncultured mouse skin cells using Cells-to-cDNA II kit
(Ambion/Applied Biosystems, Austin, TX, http://www.ambion.com) as per the
manufacturer's instructions, followed by the RNeasy Mini Kit (Qiagen). For all analyses,
controls were performed without reverse transcriptase. PCR reactions were performed as
follows: 94°C, 2 minutes; 25–35 cycles of 94°C, 15 seconds; gene-specific annealing
temperature for 30 seconds; and 72°C for 30 seconds. Primers used in this study were as
follows:
Ap2a1, 5′-TCCCTGTCCAAGTCCAACAGCAAT-3′ and 5′-AAATTCGGTTTCGCACACGTACCC-3′;
Eya1, 5′-CTAACCAGCCCGCATAGCCG-3′ and 5′-TAGTTTGTGAGGAAGGGGTAGG-3′;
Foxd3, 5′-TCTTACATCGCGCTCATCAC-3′ and 5′-TCTTGACGAAGCAGTCGTTG-3′;
Gapdh, 5′-CGTAGACAAAATGTGAAGGTCGG-3′ and 5′-AAGCAGTTGGTGGTGCAGGATG-3′;
Hoxa5, 5′-TAGTTCCGTGAGCGAACAATTC-3′ and 5′-GCTGAGATCCATGCCATTGTAG-3′;
Hoxc4, 5′-AACCCATAGTCTACCCTTGGATGA-3′ and 5′-CGGTTGTAATGAAACTCTTTCTCTAATTC-3′;
Hoxc6, 5′-ACGTCGCCCTCAATTCCA-3′ and 5′-CTGAGCTACGGCTGCTCCAT-3′;
Hoxc9, 5′-TGTAGCGATTTTCCGTCCTGTAG-3′ and 5′-CCGTAAGGGTGATAGACCACAGA-3′;
Mab21l1, 5′-CCCCAACATGATCGCGGCCCAGGCC-3′ and 5′-CCTCCTTCAGGACGTCGGAGACCAC-3′;
Mab21l2, 5′-CCCCAACATGATCGCCGCTCAGGCC-3′ and 5′-CGGGGCTCTTGCACCTCCACTTCC-3′;
Msx1, 5′-CGGGCGCCTCACTCTACAGT-3′ and 5′-TCCCGCTGCTCTGCTCAAA-3′;
p75NTR, 5′-GTGTTCTCCTGCCAGGACAA-3′ and 5′-GCAGCTGTTCCACCTCTTGA-3′;
Pax3, 5′-TGCCCTCAGTGAGTTCTATCAGC-3′ and 5′-GCTAAACCAGACCTGCACTCGGGC-3′;
Rhob, 5′-AAGACGTGCCTGCTGATCGTG-3′ and 5′-CTTGCAGCAGTTGATGCAGCC-3′;
Slug, 5′-CGTCGGCAGCTCCACTCCACTCTC-3′ and 5′-TCTTCAGGGCACCCAGGCTCACAT-3′;
Snail1, 5′-CGGCGCCGTCGTCCTTCT-3′ and 5′-GGCCTGGCACTGGTATCTCTTCAC-3′;
Sox9, 5′-CCGCCCATCACCCGCTCGCAATAC-3′ and 5′-GCCCCTCCTCGCTGATACTGGTG-3′;
Sox10, 5′-CAAGGGGCCCGTGTGCTA-3′ and 5′-GCCCGTGCCATGCTAACTCT-3′;
Twist1, 5′-CTTTCCGCCCACCCACTTCCTCTT-3′ and 5′-GTCCACGGGCCTGTCTCGCTTTCT-3′; and
Zic1, 5′-GCGGCCGAAAGCCAACT-3′ and 5′-TGCCAAAAGCAATGGACAGC-3′.
2.4.5 Pathway analysis of transcripts enriched and depleted in SKPs compared to
MSCs
Ingenuity software (Ingenuity Systems, www.ingenuity.com) was used for pathway
enrichment analysis according to the instructions available on the software website. The rat
genes were mapped to the corresponding human orthologs using the Ingenuity software. The
human orthologs were then annotated using the Ingenuity Knowledgebase and subjected to a
pathway enrichment analysis. This analysis uses Fisher's exact test to assess the null
hypothesis that the number of observed genes in a particular pathway is not different from
the number expected by chance, based on the size of the observed gene list, and the number
of genes in the pathway. The P-values can be adjusted for multiple testing using the
Benjamini-Hochberg (BH) procedure. In this Chapter, BH-corrected q-values of less than
0.05 (unless noted otherwise in the text) were considered statistically significant and
sufficient to reject the null hypothesis of a chance association of a particular pathway with
the observed gene list.
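
The enrichment test described above can be sketched in Python as a one-sided hypergeometric tail probability. This is an illustration under stated assumptions, not the Ingenuity implementation: it assumes a one-sided test for over-representation, and the function name, parameter names, and example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_list, n_overlap):
    """P(overlap >= n_overlap) when n_list genes are drawn at random.

    n_background: genes in the reference set
    n_pathway:    background genes annotated to the pathway
    n_list:       genes in the observed list
    n_overlap:    observed-list genes that fall in the pathway
    """
    total = comb(n_background, n_list)
    largest_overlap = min(n_pathway, n_list)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_list - k)
        for k in range(n_overlap, largest_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 3 of 4 listed genes fall in a 5-gene pathway,
# against a background of 20 genes.
example_p_value = enrichment_p_value(20, 5, 4, 3)
```

In practice one such P-value would be computed per pathway and the resulting set adjusted for multiple testing, as described above.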
2.4.6 Seriation analysis of LongSAGE libraries from the Cancer Genome Anatomy
Project
LongSAGE gene expression libraries were prepared as described previously [274].
The libraries used in this study are available via the Gene Expression Omnibus database as
part of the record GSE14 of the Cancer Genome Anatomy Project resource [259,144]. The
tissue origins of the libraries are summarized in Table 2.3. LongSAGE tags were mapped to
genes using the Hs_long.best_tag file available through the SAGE Genie at
ftp://ftp1.nci.nih.gov/pub/SAGE/HUMAN and as previously described [259]. The tag counts
in each library were normalized to a depth of 100,000. The resulting dataset was subjected
to seriation analysis using the progressive construction of contigs heuristic implemented
using custom MATLAB scripts. The algorithm described in Section 2.4.7 was run three times
on the same dataset to ensure that the seriation result was robust.
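
The depth normalization mentioned above amounts to rescaling each library so that its counts sum to 100,000. A minimal Python sketch, with a hypothetical function name and toy tag counts, is:

```python
def normalize_library(tag_counts, depth=100_000):
    """Rescale raw tag counts so that the library total equals `depth`."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {tag: count * depth / total for tag, count in tag_counts.items()}


# Hypothetical library with 200 sequenced tags in total.
toy_library = {"TAG_A": 30, "TAG_B": 50, "TAG_C": 120}
toy_normalized = normalize_library(toy_library)
```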
2.4.7 Seriation using the progressive construction of contigs heuristic
Seriation seeks the best enumeration order among objects based on their similarity
according to a chosen criterion. Since the problem is NP-hard, we developed a novel
heuristic specifically for the SAGE data analysis task. The 'progressive construction of
contigs' heuristic attempts to put the most similar objects side by side without breaking
already established chains of closely related elements we term 'contigs'. Here we use
pairwise correlations between expression vectors (normalized tag counts for a particular tag
across all libraries) as the criterion for defining similarities between tags; however, in
principle, other similarity criteria can be used for this task. The pairwise correlations between
tag expression vectors x and y are calculated using the standard correlation coefficient
function, R(x,y) = C(x,y)/sqrt(C(x,x)*C(y,y)), where C(x,y) = E[(x – x̅)*(y – y̅)], x̅ and
y̅ are the means of expression vectors x and y, and E is the mathematical expectation. The
correlation values are subsequently arrayed into a symmetric matrix, which is subjected to
the following progressive seriation procedure. In the first step, the tag pair with the highest
correlation value is found and marked as the beginning of the first contig. At each subsequent
step the tag pair with the next highest correlation value is identified. If one of the members of
the tag pair is involved in a previously formed contig, the columns of the matrix are
reorganized to place the other member at the nearest edge of the same contig; since the
matrix is symmetrical, the rows are reordered accordingly. Importantly, previously reordered
elements are kept intact in this process. If it is impossible to add the similarity maximum of
the current step to a contig given the restriction on the previously-moved objects or if the tag
pair with the correlation maximum does not involve any of the members of the formed
contigs, the current similarity maximum is used to start a new contig. The seriation process
continues until all elements have been processed. The result is the production of contigs of
similar correlation values that can be displayed along the diagonal of the correlation matrix
representing internal topologies in the data. Theoretically, in the case of a Robinson data
structure, whereby the data are from a unimodal distribution, the contigs are merged into one
and the obtained result is the optimal single seriation solution [275,261]. A key
algorithmic difference between the seriation algorithm described above and a procedurally
similar hierarchical clustering algorithm (such as the hierarchical clustering method
developed in [276] and implemented in [277]) is the treatment of vectors after the highest
pairwise correlation value has been identified at each step. In clustering, the vectors are
averaged together into a new vector using a linkage rule (for instance, average linkage
clustering) and this new vector is represented by a node in the hierarchical clusterogram. In
contrast, in the case of seriation, no new vector or node is formed, and the rows and columns
of the correlation matrix are merely reordered to reflect underlying patterns in the data as
described above. Therefore, no linkage rule is required in seriation in addition to the distance
metric used to define similarities. In our implementation of the seriation algorithm, ordered
structures (contigs) are revealed by color-coding the reordered correlation matrix according
to the magnitude of the correlation value. In this manner, visual inspection of the matrix
allows for the selection of ordered contigs for further inspection. Higher order structures
(supercontigs) can also emerge from this analysis, indicating more complex patterns in the
dataset. Due to the visualization component, the algorithm is able to analyze up to 4000
genes at a time (tested on 1.7 IBM PC Pentium 4, Z60t laptop) and is suitable for the analysis
of pre-selected sets of genes. Importantly, the algorithm produces a robust solution for each
seriation run (in other words, an equivalent solution is produced upon repeated seriation of the
same data set).
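
The ordering step of the heuristic described above can be sketched in Python as follows. This is a simplified reading of the procedure, not the custom MATLAB implementation used in this work, and it rests on several assumptions that are mine: pairs in which both members are already placed are skipped, ties between the two edges of a contig are resolved toward the left edge, and the visualization and supercontig selection steps are omitted. All function and variable names are hypothetical.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def seriate(vectors):
    """Return (order, contigs) for a list of expression vectors."""
    n = len(vectors)
    # Visit all pairs from the highest to the lowest correlation.
    pairs = sorted(
        combinations(range(n), 2),
        key=lambda pair: pearson(vectors[pair[0]], vectors[pair[1]]),
        reverse=True,
    )
    contigs = []  # each contig is an ordered chain of vector indices
    home = {}     # vector index -> the contig that holds it
    for i, j in pairs:
        if i in home and j in home:
            # Assumption: elements that are already placed are never moved.
            continue
        if i not in home and j not in home:
            # Neither member is placed yet: the pair starts a new contig.
            contig = [i, j]
            contigs.append(contig)
            home[i] = home[j] = contig
            continue
        placed, newcomer = (i, j) if i in home else (j, i)
        contig = home[placed]
        position = contig.index(placed)
        # Attach the newcomer at the contig edge nearest to its partner
        # (assumption: ties go to the left edge).
        if position <= len(contig) - 1 - position:
            contig.insert(0, newcomer)
        else:
            contig.append(newcomer)
        home[newcomer] = contig
    order = [index for contig in contigs for index in contig]
    return order, contigs


# Five hypothetical tag expression vectors across four libraries.
example_vectors = [
    [1, 2, 3, 4],
    [1, 2, 3, 5],
    [4, 3, 2, 1],
    [6, 3, 2, 1],
    [1, 2, 4, 4],
]
example_order, example_contigs = seriate(example_vectors)
```

The returned order can be used to permute both the rows and the columns of the correlation matrix, so that each contig appears as a block along the diagonal.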
2.4.8 Computational validation of transcripts in Supercontig 1 as pluripotency
markers
Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used
to identify canonical pathways associated with genes in Supercontig 1 as described in Section
2.4.5. To identify transcription factor binding sites in groups of genes, we used the PASTAA
Web server as recommended by the authors [263]. The PASTAA algorithm ranks genes by
estimating the overall affinity of a position weight matrix (PWM) for sequence regions that
are defined relative to the transcriptional start site of each gene in a list. Two separate
PASTAA analyses were performed on each of the three Supercontigs to identify candidate
distal- and proximal-acting transcription factors. The distal analysis involved interrogating a
region extending 10 kb upstream from the transcription start site, while the proximal analysis
interrogated a region 6400 bp on each side of the transcription start site.
Figure 2.1 Global expression patterns are similar across SKPs of distinct developmental
origins
Transcriptome-wide expression profiles from dorsal, facial, and ventral SKPs and
mesenchymal stem cells (dSKPs, fSKPs, vSKPs, and MSCs, respectively) were processed as
described in Methods. (A). Spearman Rank correlations computed based on genes
differentially expressed among the different types of SKPs and MSCs were color coded as
shown in the color legend such that yellow represents high and blue represents low
correlation, respectively. The color-coded Spearman correlation matrix reveals the relative
similarity among the expression profiles of SKPs, regardless of their origin, which is in
contrast to the expression profiles of MSCs that formed a separate square (bottom right). The
correlation matrix is symmetrical such that the ordering of samples is the same along the x- and
y-axes. (B). Unsupervised clustering conducted using correlation distance, and average
linkage clustering over 100 bootstrap iterations confirmed the finding of similarity across the
three SKP lineages, and their distinction from MSCs. The significance of the hierarchical
clustering result was assessed using AU (approximately unbiased, in red font) and BP
(bootstrap probability, in green font) re-sampling based on 100 iterations implemented in the
R package Pvclust. Modified with permissions of AlphaMed Press.
Figure 2.2 Facial and dorsal trunk SKP lineages show similar degrees of divergence
from MSCs
(A). Numbers of genes significant in each of the pairwise differential expression comparisons
among the facial SKPs, dorsal trunk SKPs, and MSCs (Benjamini-Hochberg-corrected
q < 0.05) are plotted as Venn diagrams. Each pairwise comparison is denoted by a colored
circle: facial SKPs vs. dorsal SKPs (yellow); MSCs vs. facial SKPs (pink); MSCs vs. dorsal
SKPs (green). Numbers of genes significant in each comparison are quoted in the circles. For
instance, the comparison in the bottom (pink and green Venn) reveals that there are 2,525
(1,069 + 1,456) and 2,233 (777 + 1,456) genes differentially expressed between MSCs and
facial SKPs, and MSCs and dorsal SKPs, respectively; 1,456 of these genes are differentially
expressed in both of the comparisons. This analysis reveals that facial SKPs and dorsal SKPs
are more similar to each other than either of them is to MSCs. In addition, the extent of
divergence between each SKP lineage and MSCs is similar with 2,525 and 2,233 genes being
differentially expressed between the MSCs and facial and dorsal SKPs, respectively. (B).
Three-way comparison was conducted across the three groups (facial SKPs, dorsal trunk
SKPs and MSCs) to identify genes that show evidence of differential expression using the
LIMMA Bioconductor package. Expression profiles of 2,603 genes, identified as
differentially expressed among the groups (Benjamini-Hochberg-corrected q < 0.05) are
plotted as a heatmap. The rows are centered and scaled by subtracting the mean of the row
from every value and then dividing the resulting values by the standard deviation of the row
(row Z-Score). Modified with permissions of AlphaMed Press.
Figure 2.3 SKPs of distinct developmental origin express neural crest specification
genes despite maintaining a lineage history at the gene expression level
(A, left panel). Rat microarray expression levels of genes involved in neural crest
specification and associated with neural crest precursors: Snail1, Slug, Twist, Sox9, Sox10,
Foxd3, Ap2a1, p75NTR and RhoB [252,250,251]. Green indicates the lowest relative levels
of expression and red the highest, as defined by the color key. Note that these genes are
expressed similarly in the facial and dorsal trunk SKPs, despite the distinct developmental
origins of these two SKP lineages from the neural crest and mesoderm, respectively. (A, right
panel). Rat microarray expression levels of 10 out of the 35 transcription factors that were
identified as being among the most differentially expressed genes between dorsal and facial
SKPs in the analysis in Table 2.1. Green indicates the lowest relative levels of expression and
red the highest, as defined by the color key. Note the differential expression between dorsal
and facial SKPs. (B). RT-PCR validation of the microarray results above conducted in the
mouse model. For the RT-PCR experiment in the left panel, total RNA was isolated from
neonatal mouse dorsal trunk and facial secondary SKP spheres. Total RNA from E8.5 murine
embryos was used as a positive control for primer performance. For the RT-PCR experiment
in the right panel, the total RNA was purified from uncultured EGFP-positive cells from
neonatal Sox2-EGFP mouse dorsal trunk and facial skin secondary SKP spheres. Total RNA
from E8.5 mouse embryos was used as a positive control for primer performance. Modified
with permissions of AlphaMed Press.
Figure 2.4 Transcripts preferentially enriched or depleted in SKPs compared to MSCs
Pairwise comparisons were conducted between each of dSKPs, fSKPs and vSKPs and MSCs using the LIMMA Bioconductor package
to identify significantly differentially expressed genes (Benjamini-Hochberg-corrected q < 0.05). The results from these comparisons
were combined to identify genes commonly enriched or depleted in SKPs; the expression profiles of 654 and 752 genes enriched or
depleted in SKPs compared to MSCs, respectively, are plotted as a heatmap. The rows are centered and scaled by subtracting the mean
of the row from every value and then dividing the resulting values by the standard deviation of the row (row Z-Score). The genes are
listed in Appendix A.
Figure 2.5 Pathway analysis of transcripts enriched and depleted in SKPs compared to MSCs
Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to reveal pathway annotations significantly
enriched among transcripts increased (A) or decreased (C) in abundance in SKPs compared to MSCs (Benjamini-Hochberg-corrected
q < 0.05). The list of transcripts used for the analysis is provided in Appendix A. In (A) and (C) the negative logs of P values are
plotted along the x-axis while the pathways are plotted along the y-axis. (B). The Ingenuity canonical pathway named "Human
Embryonic Stem Cell Pluripotency" is significantly enriched among the transcripts upregulated in SKPs reflecting the broad
development potential of the neural crest stem cell [258]; the pathway members upregulated in SKPs are in red; the protein complexes
are bolded; the kinases are denoted with triangles, while cytokines are denoted with squares. (D). The Ingenuity canonical pathway
named "Role of BRCA1 in DNA Damage Response" is significantly enriched among transcripts downregulated in SKPs compared to
MSCs; the pathway members downregulated in SKPs are in green, and the protein complexes are bolded.
Figure 2.6 Seriation analysis to identify developmentally restricted transcripts
expressed in undifferentiated ES cells
(A). Seriation analysis of the 319 candidate pluripotency genes (Appendix B) revealed three
Supercontigs of co-expressed genes, containing 114, 145 and 60 genes, respectively.
Supercontigs are bounded by red boxes and are numbered. Upon inspection, Supercontig1,
composed of 114 genes, contained transcripts increased in abundance in the undifferentiated
ES cells. (B). Average LongSAGE-based expression level for the 114 genes in Supercontig 1
across 17 LongSAGE libraries. Reprinted with permissions of Elsevier.
Figure 2.7 Computational validation of genes identified by seriation as pluripotency
markers
(A). Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to
reveal canonical pathways significantly enriched among transcripts in Supercontig 1
(Benjamini-Hochberg-corrected q < 0.05). The list of transcripts used for the analysis is
provided in Appendix B. (B, C). Proximal and distal promoter analyses of the genes in
Supercontig 1 reveal the presence of binding sites for the core pluripotency transcription
factors, SOX, NANOG, and OCT/POU that are required for the propagation of
undifferentiated ES cells in culture [262]. The hollow blue circles indicate individual PWMs
used for the analyses (444 and 487 PWMs were used for the proximal and distal analyses,
respectively). The affinity scores of each PWM computed by PASTAA algorithm are plotted
against the p-values such that the highest scoring PWMs are in the top left [263]. Several
PWMs scored higher than NANOG in the proximal analysis (hollow circles left of the
NANOG PWM). These included PWMs representing binding sites for NFY, AP2, TBX5, and
ELK1, all associated with stemness and early development [267,264,278,265]. Panels 2.7B
and 2.7C are reprinted with permissions of Elsevier.
Table 2.1 Genes with significant evidence of differential expression between (A) fSKPs
and dSKPs, and (B) dSKPs and vSKPs as shown in Figure 2.3B
(A) Negative log fold change (LogFC) indicates genes whose mRNA levels are decreased in
fSKPs versus dSKPs, while positive LogFC indicates genes whose mRNA levels are
increased in fSKPs versus dSKPs. The genes are sorted by their LogFC. (B) Negative LogFC
indicates genes whose mRNA levels are decreased in dSKPs versus vSKPs, while positive
LogFC indicates genes whose mRNA levels are increased in dSKPs versus vSKPs. The
genes are sorted by their LogFC. Modified with permission from AlphaMed Press.
A
Gene        LogFC      Benjamini-Hochberg-corrected q
Eltd1       -4.44344   0.0024664
Zic1        -4.21255   0.0049442
Hoxc6       -4.02777   0.0025499
Hoxc9       -3.35571   0.0206326
Hoxa5       -3.20921   0.0024664
Cdh7        -2.44479   0.0229012
Tfpi        -2.4351    0.0020694
Anxa8       -2.13009   0.0157295
Hoxc4       -1.67475   0.0206326
Avpr1a      -1.44901   0.0206326
Cox4j2      1.552854   0.0206326
Fzd6        1.728556   0.0206326
Herc3       1.738304   0.0298194
Glt25d2     1.854643   0.0170832
Cnksr3      1.886715   0.0150527
Gpr85       2.016079   0.0207053
Nrp1        2.084411   0.0110914
Sytl3       2.099451   0.0259196
Il16        2.201115   0.0452825
Cd200       2.377227   0.0381321
Eya1        2.543472   0.0485376
Lphn3       2.547429   0.0117803
Pu3f4       2.820537   0.0051457
Cxcl14      3.096734   0.0024715
Mab21l2     3.277685   0.0071718
Tnfsf11     3.328933   0.0110914
Msln        3.405196   0.0102423
Ptprn       3.521102   0.0206326
Reln        3.724786   0.0016128
RGD13052    3.900023   0.0169522
Thbs4       4.172965   0.0017979
Cdh6        4.205613   0.0016128
Cntn6       4.224511   0.0495426
Pax3        4.409286   0.0049414
Mab21l1     5.134758   0.0016128
B
Gene        LogFC      Benjamini-Hochberg-corrected q
Mab21l1     -5.30391   0.00026
Frzb        -5.23657   0.001611
RGD1310827  -3.86079   0.00954
Ccr2        -3.78686   0.003454
LOC681994   -3.57244   0.008613
Car2        -3.52152   0.001611
Tmem26      -3.40143   0.013088
Nes         -3.22634   0.011844
Cbln4       -3.21517   0.008461
Upk1b       -3.14743   0.007513
LRRTM1      -3.12189   0.01291
Cmklr1      -3.09583   0.013115
Cldn10      -3.05589   0.004253
Mark1       -2.9736    0.002901
Il16        -2.80564   0.008005
Plxdc1      -2.70215   0.003454
Slc1a1      -2.66076   0.001611
RGD1307749  -2.60502   0.001611
Nrp1        -2.48394   0.001611
RGD1305869  -2.46487   0.001444
Loxl2       -2.40684   0.013111
Map2        -2.32846   0.010447
Slc4a11     -2.31295   0.008613
Tbx5        -2.26593   0.008242
Scn4b       -2.23519   0.004551
Acy3        -2.1814    0.006916
Itga2       -2.17679   0.008005
RGD1563891  -2.16154   0.01076
Col11a1     -2.16135   0.013115
Tead2       -2.13101   0.008439
LOC499465   -2.1293    0.002551
Avpr1a      2.1247     0.008461
Cdh7        2.587961   0.008613
Emb         3.149873   0.011844
Zic1        4.191047   0.003422
Table 2.2 Pathways enriched among the transcripts increased or decreased in abundance in SKPs compared to MSCs
The Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to identify pathway annotations significantly
enriched among transcripts differentially expressed between SKPs and MSCs.
A
Pathway: Role of Osteoblasts, Osteoclasts and Chondrocytes in Rheumatoid Arthritis
    -log(BH q): 3.74    Ratio: 1.15E-01
    Molecules: TCF4, ADAM17, SFRP2, MMP3, BMP2, MMP14, PIK3R1, WNT16, MMP13, NFKBIA, NFAT5,
        WIF1, RUNX2, WNT7B, DKK2, TNFRSF1B, CTNNB1, PPP3CA, MMP1 (includes EG:300339),
        ADAMTS4, LRP5, GSN, IL7, BMP7, WNT5A
Pathway: Axonal Guidance Signaling
    -log(BH q): 3.74    Ratio: 9.16E-02
    Molecules: FYN, ADAM17, PIK3R1, BMP2, MMP13, WNT16, EPHA4, NCK1, PLXNA2, ROBO1, PRKCZ,
        SEMA6C, SEMA4C, NTN1, LIMK1, EFNB2, NFAT5, EFNB1, WNT7B, PLCB1, GNA13, ROBO2,
        RASA1, PPP3CA, GNG12, SEMA5A, MMP10, SEMA3A, WIPF1, NTRK2, SEMA6D, PRKCD, BMP7,
        SEMA7A, WNT5A
Pathway: Colorectal Cancer Metastasis Signaling
    -log(BH q): 2.54    Ratio: 9.87E-02
    Molecules: LRP5, TCF4, ADCY2, TGFBR1, MMP3, PTGER3, MMP14, PIK3R1, WNT16, MMP10, MMP13,
        MMP2, TGFBR2, WNT7B, MMP11, MMP12, CTNNB1, ADCY7, GNG12, RALGDS, MMP9,
        MMP1 (includes EG:300339), WNT5A
Pathway: Human Embryonic Stem Cell Pluripotency
    -log(BH q): 1.69    Ratio: 1.08E-01
    Molecules: TCF4, S1PR2, TGFBR1, PIK3R1, FGFR1, BMP2, WNT16, TGFBR2, SOX2, NTRK2, WNT7B,
        BMP7, CTNNB1, WNT5A
Pathway: Wnt/Beta-catenin Signaling
    -log(BH q): 1.52    Ratio: 9.94E-02
    Molecules: LRP5, AXIN2, TCF4, SFRP2, TGFBR1, APPL2, WNT16, KREMEN1, TGFBR2, SOX2, CDH2,
        WIF1, WNT7B, DKK2, CTNNB1, WNT5A
Pathway: Role of Macrophages, Fibroblasts and Endothelial Cells in Rheumatoid Arthritis
    -log(BH q): 1.52    Ratio: 7.96E-02
    Molecules: LRP5, TCF4, SFRP2, MMP3, PIK3R1, WNT16, MMP13, IL7, PRKCZ, IL16, NFKBIA,
        NFAT5, WIF1, PRKCD, WNT7B, PLCB1, DKK2, TNFRSF1B, CTNNB1, PPP3CA,
        MMP1 (includes EG:300339), WNT5A, ADAMTS4
Pathway: RAR Activation
    -log(BH q): 1.52    Ratio: 9.47E-02
    Molecules: ADCY2, RDH10, BMP2, PIK3R1, RBP1, PRKCZ, CRABP1, PTEN, ADH7, PNRC1, PRKCD,
        IGFBP3, RXRA, ZBTB16, ADCY7, MMP1 (includes EG:300339)
Pathway: Molecular Mechanisms of Cancer
    -log(BH q): 1.52    Ratio: 7.42E-02
    Molecules: FYN, TCF4, TGFBR1, BMP2, PIK3R1, PSEN2, PSENEN, CDKN2B, PRKCZ, TGFBR2,
        NFKBIA, PLCB1, GNA13, CTNNB1, RASA1, RALGDS, ADCY2, LRP5, RASGRF2, PRKCD, BMP7,
        ARHGEF9, ADCY7, BCL2L11, WNT5A
Pathway: Bladder Cancer Signaling
    -log(BH q): 1.35    Ratio: 1.19E-01
    Molecules: DAPK1, MMP3, MMP14, MMP10, MMP13, MMP2, MMP11, MMP12, MMP9,
        MMP1 (includes EG:300339)
Pathway: Leukocyte Extravasation Signaling
    -log(BH q): 1.32    Ratio: 9.04E-02
    Molecules: TIMP3, MMP3, MMP14, PIK3R1, MMP10, MMP13, MMP2, PRKCZ, WIPF1, PRKCD, VAV3,
        MMP11, MMP12, CTNNB1, MMP9, MMP1 (includes EG:300339)
Pathway: Airway Pathology in Chronic Obstructive Pulmonary Disease
    -log(BH q): 1.30    Ratio: 4.29E-01
    Molecules: MMP2, MMP9, MMP1 (includes EG:300339)
Pathway: LXR/RXR Activation
    -log(BH q): 1.30    Ratio: 1.14E-01
    Molecules: APOE, SCD, LY96, NR1H3, ABCG1, TNFRSF1B, RXRA, MMP9, ABCA1
Pathway: Hepatic Fibrosis / Hepatic Stellate Cell Activation
    -log(BH q): 1.30    Ratio: 9.56E-02
    Molecules: IGFBP4, TGFBR1, FGFR1, MMP13, IGFBP5, MMP2, TGFBR2, LY96, IGFBP3, EDNRA,
        TNFRSF1B, MMP9, MMP1 (includes EG:300339)
Pathway: PTEN Signaling
    -log(BH q): 1.30    Ratio: 1.01E-01
    Molecules: TGFBR2, GHR, NTRK2, TGFBR1, PIK3R1, FGFR1, FOXO3, IGF2R, BCL2L11, PRKCZ, PTEN
B
Pathway: Hereditary Breast Cancer Signaling
    -log(BH q): 5.85    Ratio: 1.86E-01
    Molecules: POLR2F, CDC25C, GADD45B, GADD45G, BARD1, RPA1, RAD50, CHEK1, CCNB1, RAD51,
        HDAC6, FANCB, FANCM, RRAS2, FANCD2, RFC4, H2AFX, MRAS, BRCA2, BRCA1, RFC3
Pathway: Role of BRCA1 in DNA Damage Response
    -log(BH q): 5.38    Ratio: 2.55E-01
    Molecules: BARD1, PLK1, RPA1, RAD50, CHEK1, RAD51, FANCB, FANCM, FANCD2, RFC4, BRCA2,
        BRIP1, BRCA1, RFC3
Pathway: ATM Signaling
    -log(BH q): 4.85    Ratio: 2.50E-01
    Molecules: CDC25C, GADD45B, GADD45G, CCNB2, MAPK12, RAD50, CHEK1, CCNB1, RAD51, SMC2,
        FANCD2, H2AFX, BRCA1
Pathway: DNA Double-Strand Break Repair by Homologous Recombination
    -log(BH q): 4.37    Ratio: 5.00E-01
    Molecules: RAD51, LIG1, GEN1, BRCA2, RPA1, BRCA1, RAD50
Pathway: Mitotic Roles of Polo-Like Kinase
    -log(BH q): 3.01    Ratio: 1.93E-01
    Molecules: KIF23, CDC25C, PLK4, ESPL1, CDC20, PRC1, CCNB2, PKMYT1, PLK1, KIF11, CCNB1
Pathway: Cell Cycle Control of Chromosomal Replication
    -log(BH q): 2.45    Ratio: 2.59E-01
    Molecules: MCM6, CDC45, MCM2, CDC6, RPA1, DBF4, ORC1
Pathway: Cell Cycle: G2/M DNA Damage Checkpoint Regulation
    -log(BH q): 1.93    Ratio: 1.86E-01
    Molecules: KAT2B, CDC25C, CCNB2, PKMYT1, PLK1, BRCA1, CHEK1, CCNB1
Pathway: Germ Cell-Sertoli Cell Junction Signaling
    -log(BH q): 1.93    Ratio: 1.10E-01
    Molecules: TUBB2C, LAMC3, MAP3K5, MAPK12, TUBB, TUBA1B, PAK1, RRAS2, PAK3, SORBS1, MRAS,
        TGFB3, TUBA1C, ACTG2, ACTN4, ACTN1
Pathway: Role of CHK Proteins in Cell Cycle Checkpoint Control
    -log(BH q): 1.93    Ratio: 2.06E-01
    Molecules: CDC25C, RFC4, RPA1, BRCA1, RAD50, CHEK1, RFC3
Pathway: Molecular Mechanisms of Cancer
    -log(BH q): 1.93    Ratio: 8.01E-02
    Molecules: BMP4, LRP6, CDKN2C, MAP3K5, FAS, CHEK1, PAK1, FANCD2, MRAS, BRCA1, CDC25C,
        CCNE2, SMAD9, HAT1, SMAD6, AURKA, MAPK12, FOS, CCNE1, PRKCI, RRAS2, FZD4, PAK3,
        FZD6, TGFB3, PLCB3, CAMK2G
Pathway: Breast Cancer Regulation by Stathmin1
    -log(BH q): 1.76    Ratio: 9.68E-02
    Molecules: PPP1R14C, CCNE2, TUBB2C, PPP1R3C, PPP1R14A, GNG3, ITPR1, TUBB, TUBA1B, ROCK2,
        PAK1, CCNE1, RRAS2, PRKCI, MRAS, PLCB3, TUBA1C, CAMK2G
Pathway: Aryl Hydrocarbon Receptor Signaling
    -log(BH q): 1.59    Ratio: 1.02E-01
    Molecules: GSTM1, CCNE2, IL6, ALDH9A1, FAS, CYP1B1, CHEK1, CCNA2, FOS, CCNE1, NFIA,
        TGFB3, AHR, HSPB1
Pathway: RhoA Signaling
    -log(BH q): 1.56    Ratio: 1.17E-01
    Molecules: ROCK2, ARHGAP5, MYL9, PFN1, MYL6, CFL2, CIT, IGF1R, ANLN, ACTG2, DLC1, LPAR3
Table 2.3 LongSAGE libraries used for the seriation analysis described in Section 2.2.4
The LongSAGE libraries from undifferentiated ES cells, differentiated ES cells and adult
tissues were used in seriation analysis to select genes whose expression was restricted to
undifferentiated ES cells.
Library ID  Group                            Description
Shes2       Undifferentiated human ES cells  H9 human ES cells
Shes9       Undifferentiated human ES cells  HSF6 human ES cells
She10       Undifferentiated human ES cells  HES3 human ES cells
She11       Undifferentiated human ES cells  HES4 human ES cells
She13       Undifferentiated human ES cells  H7 human ES cells
She14       Undifferentiated human ES cells  H14 human ES cells
She15       Undifferentiated human ES cells  H13 human ES cells
She16       Undifferentiated human ES cells  H1 human ES cells
She17       Undifferentiated human ES cells  H1 human ES cells
She19       Undifferentiated human ES cells  BG01 human ES cells
Shs11       Differentiated human ES cells    H1 human ES cell-derived erythromegakaryocytic progenitors
Shs12       Differentiated human ES cells    H1 human ES cell-derived enriched primitive hematopoietic multipotent progenitors
Shs13       Differentiated human ES cells    H1 human ES cell-derived enriched primitive hematopoietic myeloid progenitors
Cg643       Adult tissue                     Normal adult bulk pancreas
Cg647       Adult tissue                     Mammary gland, antibody purified
Cg648       Adult tissue                     Normal substantia nigra
Cg655       Adult tissue                     Normal liver vascular epithelium
Table 2.4 Pluripotency genes with transcript abundance increased or decreased in SKPs compared to MSCs
The list of 114 pluripotency genes identified by seriation analysis described in Section 2.2.5 was overlapped with the list of transcripts
significantly differentially expressed between SKPs and MSCs (Section 2.2.3).
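Gene symbol: CTNNB1
    Description: Catenin (cadherin-associated protein), beta 1, 88kDa
    Molecular function: Transcriptional regulator, key member of the WNT signaling pathway
    Ingenuity Canonical Pathway enriched in ES cells: Role of Oct4 in Mammalian Embryonic Stem Cell Pluripotency; Role of NANOG in Mammalian Embryonic Stem Cell Pluripotency
    Increased or decreased in SKPs: Increased
Gene symbol: ETV4
    Description: Ets variant 4
    Molecular function: Transcriptional regulator
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Increased
Gene symbol: MAD2L2
    Description: MAD2 mitotic arrest deficient-like 2 (yeast)
    Molecular function: Component of the mitotic spindle assembly checkpoint
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Increased
Gene symbol: PITX2
    Description: Paired-like homeodomain 2
    Molecular function: Transcriptional regulator
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Increased
Gene symbol: SOX2
    Description: SRY (sex determining region Y)-box 2
    Molecular function: Transcriptional regulator
    Ingenuity Canonical Pathway enriched in ES cells: Role of Oct4 in Mammalian Embryonic Stem Cell Pluripotency; Role of NANOG in Mammalian Embryonic Stem Cell Pluripotency; Human Embryonic Stem Cell Pluripotency
    Increased or decreased in SKPs: Increased
Gene symbol: ADAM23
    Description: ADAM metallopeptidase domain 23
    Molecular function: Metalloprotease
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Decreased
Gene symbol: AURKB
    Description: Aurora kinase B
    Molecular function: Protein serine/threonine kinase
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Decreased
Gene symbol: CENPK
    Description: Centromere protein K
    Molecular function: Protein binding
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Decreased
Gene symbol: FAM46B
    Description: Family with sequence similarity 46, member B
    Molecular function: Unknown
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Decreased
Gene symbol: FAM64A
    Description: Family with sequence similarity 64, member A
    Molecular function: Unknown
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Decreased
Gene symbol: HMGB2
    Description: High mobility group box 2
    Molecular function: Transcriptional regulator
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Decreased
Gene symbol: IGF2BP3
    Description: Insulin-like growth factor 2 mRNA binding protein 3
    Molecular function: Translational regulator
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Decreased
Gene symbol: KPNA2
    Description: Karyopherin alpha 2 (RAG cohort 1, importin alpha 1)
    Molecular function: Protein transporter
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Decreased
Gene symbol: MTHFD1
    Description: Methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1, methenyltetrahydrofolate cyclohydrolase, formyltetrahydrofolate synthetase
    Molecular function: Dehydrogenase activity
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Decreased
Gene symbol: MYBL2
    Description: v-Myb myeloblastosis viral oncogene homolog (avian)-like 2
    Molecular function: Transcriptional regulator
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Decreased
Gene symbol: TBX4
    Description: T-box 4
    Molecular function: Transcriptional regulator
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Decreased
Gene symbol: TPM1 (includes EG:22003)
    Description: Tropomyosin 1 (alpha)
    Molecular function: Structural component of cytoskeleton
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Decreased
Gene symbol: ZFP57
    Description: Zinc finger protein 57 homolog (mouse)
    Molecular function: Transcriptional regulator
    Ingenuity Canonical Pathway enriched in ES cells: -
    Increased or decreased in SKPs: Decreased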
Chapter 3: Transcriptome analysis of neuroblastoma tumor-initiating cells
for therapeutic target prediction3
3.1 Introduction
Cancer stem cells and tumor-initiating cells (TICs) have been described in a variety
of hematopoietic and solid malignancies, including those of the breast, brain, pancreas, liver,
skin, and colon [35]. Primary TIC lines have also been isolated from NBL tumors and
metastases [279]. NBL TICs and cancer stem cells share several properties, including the
ability to self-renew and differentiate into cell types observed in the bulk tumor, express stem
cell markers, and exhibit enhanced tumorigenic potential [279]. While it has been reported
that several NBL TIC lines may be contaminated with Epstein-Barr-transformed
lymphocytes [280], these lines have been shown to recapitulate metastatic NBL in animals,
including upon serial transplantation, supporting their usefulness as models for NBL [279].
A recent study using chronic myeloid leukemia stem cells provided proof of
principle that targeting a cancer stem cell–enriched gene could lead to the eradication of such
cells and a potential disease cure [281]. Therefore, NBL TICs, which are non-immortalized
cell lines with high tumorigenic potential in immunosuppressed mice, can provide a model
for the development of improved therapies for recurrent and metastatic NBL. In this Chapter
I describe an RNA-Seq analysis applied to a panel of human NBL TIC and SKP lines (Table
3.1). The overall objective of this Chapter is to assess whether transcripts preferentially
abundant in NBL TICs could reveal candidate new drug targets against NBL. To fulfill this
objective, I address three specific aims described below. First, I use RNA-Seq data to
identify transcripts for which the expression is increased in NBL TICs compared to other
tissue types. Second, I conduct functional analysis to identify candidate drug targets among
these transcripts, with a specific focus on one drug target of interest, Aurora kinase B
(AURKB). Third, I conduct an exon-level analysis to provide a potential mechanism to
explain the increased expression of AURKB in NBL TICs.
3 A version of this chapter has been published, and the co-author contributions are detailed in the Preface as per
the University of British Columbia PhD thesis guidelines: O. Morozova, M. Vojvodic, N. Grinshtein, L.M.
Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P.
Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A.
Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target
for neuroblastoma. Clin. Cancer Res. 16(18):4572-82, 2010. Copyright by the American Association for Cancer
Research.
I used SKPs as a normal reference sample for these analyses since, as discussed in
detail in Chapter 2, SKPs are multipotent precursors isolated from human foreskin that are
able to self-renew and differentiate into various neural crest derivatives, including peripheral
neurons and neural crest lineage-specific Schwann cells [234]. As NBL is a tumor of neural
crest precursors, SKPs provide a normal reference transcriptome for the identification of
candidate gene expression changes associated with the TIC phenotype. To increase the
specificity of the identified gene expression changes to NBL TICs, I also compared the
expression profiles of NBL TICs to those of a compendium of cancer tissues, including
primary tumor samples from breast, skin, brain, B-cells, ovary, cervix, and lung, as well as
breast cancer cell lines.
3.2 Results
3.2.1 Identification of genes preferentially enriched or depleted in NBL TICs
compared to a compendium of cancer tissues and SKPs
Having analyzed the expression profiles of normal neural crest stem cell-like cells in
Chapter 2, I set out to characterize the expression profiles of their malignant counterparts,
NBL tumor-initiating cells (TICs). These cells have been identified and characterized in NBL
primary tumors and metastases, and have been shown to be associated with tumor relapse
[279]. We sequenced transcriptomes from 10 NBL TIC lines isolated from tumors and
metastases of six high-risk NBL patients (Table 3.1) using Illumina RNA sequencing
(RNA-Seq) [148,149]. The NBL TIC lines used in this study included those isolated from patients
during disease relapse and one from a patient in remission. Because we previously showed
that line NB67, isolated from the bone marrow of a patient in clinical remission who
subsequently relapsed, was tumorigenic, we included this NBL TIC line in the analysis [279].
To generate reference normal expression profiles, we sequenced the transcriptomes of three
foreskin-derived SKP lines from three children without cancer [234]. As described in some
detail in Chapter 2, these skin-derived progenitor cells, regardless of their embryonic origin,
possess the properties of neural crest stem cells, and therefore may serve as reasonable
normal counterparts to NBL TICs.
To identify transcripts significantly enriched in NBL TIC lines compared to normal
neural crest cells I used the LIMMA Bioconductor package [249] as described in Methods.
The LIMMA analysis revealed 817 genes significantly increased and 1,913 genes significantly
decreased in abundance in NBL TICs versus SKPs. I considered it likely that, within the list
of differentially expressed genes, there would be candidate NBL TIC markers and also
transcripts generally associated with a proliferative phenotype. Targeting gene products that
are nonspecifically expressed in proliferating cell types would potentially result in increased
toxicity, particularly in children whose organ systems are undergoing growth and
development. To avoid identifying such genes, and to select gene expression differences
specific to NBL TICs, I compared our NBL TIC RNA sequences to RNA sequencing data
from 30 cancer samples available at the Genome Sciences Centre. These samples were
derived from seven tissue types, including ovary, B-cells, lung, blood, brain, skin, and cervix
(Table 3.2) and were included as an additional reference set for the identification of
transcripts enriched specifically in NBL TICs.
The LIMMA package [249] was used to compare gene expression levels between
NBL TICs and other tissues as described in Methods. This comparison revealed that 2,258
genes were increased in NBL TICs compared to other tissues, while 2,397 genes were
decreased in expression in NBL TICs compared to other tissues. These gene lists were then
compared to the lists of genes identified as significantly differentially expressed between
NBL TICs and SKPs to select genes that were significant in the same direction in both of
these comparisons. This overlap revealed that 449 transcripts were significantly increased
in expression in NBL TICs as compared to SKPs and as compared to other tissues. Similarly,
1,059 genes were decreased in expression in NBL TICs in both comparisons, NBL TICs
versus SKPs and NBL TICs versus other tissues.
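The selection step described above amounts to intersecting direction-specific gene lists. A minimal sketch follows, assuming each comparison has already been reduced to a mapping from gene symbol to log fold change for the significant genes only. The function name, the input format, and the toy gene symbols and values are illustrative assumptions and are not taken from the thesis data.

```python
def concordant_genes(tic_vs_skp, tic_vs_other):
    """Return (increased, decreased) gene sets significant in the same direction in both comparisons.

    Each argument maps gene symbol -> log fold change for genes already
    called significant in that comparison (hypothetical input format).
    """
    shared = set(tic_vs_skp) & set(tic_vs_other)
    increased = {g for g in shared if tic_vs_skp[g] > 0 and tic_vs_other[g] > 0}
    decreased = {g for g in shared if tic_vs_skp[g] < 0 and tic_vs_other[g] < 0}
    return increased, decreased


# Toy input with made-up symbols and values, for illustration only.
tic_vs_skp = {"GENE_A": 2.1, "GENE_B": -1.4, "GENE_C": 3.0, "GENE_D": -2.2}
tic_vs_other = {"GENE_A": 1.7, "GENE_B": 0.9, "GENE_D": -1.1, "GENE_E": 2.5}

up, down = concordant_genes(tic_vs_skp, tic_vs_other)
print(sorted(up), sorted(down))
```

Genes that are significant in both comparisons but in opposite directions (GENE_B in the toy input) are excluded from both sets, which mirrors the requirement that a gene be significant in the same direction in both comparisons.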
To confirm the differential expression of the 449 candidate NBL TIC-enriched transcripts
(transcripts enriched in NBL TICs as compared to both SKPs and other tissues) and
the 1,059 candidate NBL TIC-depleted transcripts (transcripts depleted in NBL TICs as
compared to both SKPs and other tissues) identified using RNA sequencing, I analyzed eight
NBL TIC lines from five patients and five SKP lines from four cancer-free children (Table
2.1) using Affymetrix Human Exon 1.0 ST Array data. This platform provides independent
confirmation of gene expression at the level of exons [282,283]. Analysis of exon array data,
conducted as described in Methods, confirmed the differential expression of 321 (71%) NBL
TIC-enriched and 819 (77%) NBL TIC-depleted transcripts, which were identified as
significantly differentially expressed between NBL TICs and SKPs using microarrays
(Figure 3.1; Appendix C). These genes represented robust sets of NBL TIC-enriched and
depleted transcripts that I analyzed further to identify the pathways disrupted in NBL TICs.
3.2.2 Elevated mRNA levels of BRCA1 signaling pathway members are associated
with the NBL TIC phenotype
To assess the functional significance of transcripts differentially abundant in NBL
TICs, I conducted a pathway enrichment analysis using Ingenuity software (Ingenuity
Systems, www.ingenuity.com) as described in Methods. The analysis revealed several
signaling pathways significantly associated with the NBL TIC-enriched transcripts (Fisher's
Exact P < 0.05). The pathways are listed below along with the number of NBL TIC-enriched
transcripts involved in each pathway: "Role of BRCA1 in DNA Damage Response" (13
genes), "Purine Metabolism" (20 genes), "Mitotic Roles of Polo-Like Kinases" (8 genes),
"Pyrimidine Metabolism" (13 genes), "Role of CHK proteins in Cell Cycle Checkpoint
Control" (6 genes), "One Carbon Pool by Folate" (6 genes), "Cell Cycle: G2/M DNA
Damage Checkpoint Regulation" (5 genes), "ATM Signaling" (4 genes), "Cleavage and
Polyadenylation of Pre-mRNA" (2 genes), "Alanine and Aspartate Metabolism" (3 genes)
(Figure 3.2A). In contrast, the following pathways were significantly enriched among NBL
TIC-depleted transcripts (Fisher's Exact P < 0.05): "Axonal Guidance Signaling" (43 genes),
"Hepatic Fibrosis / Hepatic Stellate Cell Activation" (21 genes), "Coagulation System" (10
genes), "Colorectal Cancer Metastasis Signaling" (25 genes), "CXCR4 Signaling" (18
genes), "Germ Cell-Sertoli Cell Junction Signaling" (18 genes), "Factors Promoting
Cardiogenesis in Vertebrates" (12 genes), "TGF-beta Signaling" (12 genes), "ILK Signaling"
(19 genes), and "Complement System" (7 genes) (Figure 3.2B).
Of the 321 genes significantly upregulated in NBL TICs (Appendix C), thirteen were
known members of the BRCA1 DNA damage response pathway (Figure 3.2C). This pathway
was identified as the most significantly associated with the NBL TIC-enriched transcripts,
such that 13 of the 53 pathway members were among the 321 NBL TIC-enriched transcripts.
In addition, eight and eleven genes were associated with polo-like kinase and cell cycle
checkpoint control pathways, respectively, both of which are direct downstream targets of
BRCA1 signaling (Figure 3.2C).
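For orientation, the kind of over-representation test reported above can be sketched as a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The counts 13, 53, and 321 come from the text. The background size of 20,000 genes is an assumption made only for this sketch, since the reference set used by the Ingenuity software is not stated in this section, so the printed value should not be read as the P value reported by the software.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided Fisher's exact (hypergeometric upper-tail) P value.

    k: pathway members in the gene list, n: gene list size,
    K: pathway size, N: background size.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# 13 of the 53 BRCA1 pathway members among 321 enriched transcripts (from the text).
# The background size of 20,000 genes is an assumption, not a value from the thesis.
p = enrichment_p(k=13, n=321, K=53, N=20000)
print(f"P = {p:.3g}")
```

Under this assumed background, fewer than one pathway member would be expected among 321 genes by chance (321 x 53 / 20,000 is about 0.85), which is why observing 13 yields a very small P value.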
3.2.3 MudPIT analysis confirms the abundance of DNA repair proteins in the
proteome of an NBL TIC line
To assess the contribution of the NBL TIC-enriched transcripts to the NBL TIC
proteome, we conducted a Multidimensional Protein Identification Technology (MudPIT)
analysis of whole-cell lysate and a membrane-enriched fraction of NBL TIC line NB88R2
generated from a bone marrow metastasis of a high-risk patient. The MudPIT technique
involves digesting the protein sample into peptides with trypsin and separating the peptides
with two liquid column chromatography steps. As the peptides elute from the second column
they are assigned unique mass/charge ratio fingerprints that can be used to reveal the identity
of each protein [284].
The MudPIT approach can effectively identify thousands of concurrently expressed
proteins for global or subcellular fraction–specific proteomic profile analyses [285]. The
MudPIT analysis of the whole-cell lysate isolated from line NB88R2 revealed 819
proteins, each identified by at least two peptides. A similar analysis
identified 1,530 proteins in the membrane-enriched fraction isolated from the same line. Of
the 321 TIC-enriched genes, all of which were expressed in line NB88R2, peptides for 75
were detected by MudPIT in either whole-cell or membrane-enriched lysate of line NB88R2
or both (Table 3.3). Forty-five of the detected proteins were encoded by genes that were
expressed in the 75% to 100% expression percentile in the NB88R2 line, whereas only two
protein products were detected for genes expressed in the 0% to 24% expression percentile,
indicating a correlation between transcript abundance and protein detection by MudPIT in this cell line.
In addition, the median expression level of the genes for which the protein products were
detected by MudPIT was 206, while the median expression level of the genes for which the
protein products were not detected was 73.3. To investigate the significance of this
difference, we compared the NB88R2 square-root-transformed average expression levels of
the 75 NBL TIC-enriched genes for which the protein products were detected by MudPIT to
the average expression levels of the 246 NBL TIC-enriched genes for which the protein
products were not detected by MudPIT using a one-tailed, two-sample, equal-variance T-test.
Based on P = 7.365E-24, we rejected the null hypothesis that the two expression means
were the same, providing evidence for a correlation between the higher expression level of a
gene and the ability to detect its protein product by MudPIT.
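The test statistic for the comparison described above can be sketched as follows: expression values are square-root transformed, and a pooled-variance (equal variance) two-sample t statistic is computed. The input values are made up for illustration and the function name is hypothetical. The one-tailed P value is the upper-tail probability of a t distribution with n1 + n2 - 2 degrees of freedom (75 + 246 - 2 = 319 for the group sizes given in the text); it is not computed here because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(detected, not_detected):
    """Equal-variance two-sample t statistic on square-root-transformed values."""
    a = [sqrt(x) for x in detected]
    b = [sqrt(x) for x in not_detected]
    na, nb = len(a), len(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Made-up expression levels, for illustration only (not the thesis data).
detected = [210.0, 180.0, 400.0, 95.0, 260.0]
not_detected = [70.0, 40.0, 120.0, 15.0, 88.0, 60.0]
t, df = pooled_t(detected, not_detected)
print(f"t = {t:.2f}, df = {df}")
```

A positive t here corresponds to higher transformed expression in the detected group, which is the direction tested by the one-tailed alternative.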
According to the Ingenuity Knowledgebase annotation, 21% (16 of 75) of the
detected proteins were associated with the DNA replication, recombination, and repair
functional category (Ingenuity Systems, www.ingenuity.com), including PARP1, PCNA,
UBE2N, FEN1, HMGB2, and RFC, which forms a major complex interacting with BRCA1
[286]. This result suggests that DNA repair proteins are expressed in the proteome of at least
one NBL TIC line, providing further support to the results of the gene expression analysis.
3.2.4 Known drug targets among NBL TIC-enriched transcripts
Because the most direct pharmacologic intervention is inhibition of a target protein
[287], I focused further functional analyses on genes upregulated rather than downregulated
in NBL TICs with respect to SKPs and other tissues. Drug repositioning, in which existing
drugs are used for novel indications, is a powerful approach to novel therapy development
because it greatly reduces the cost and time required to clinically develop a new therapeutic
option [288]. I therefore aimed to use NBL TIC-enriched genes to identify targets of existing
therapeutics, on the premise that such drugs could be effective against recurrent
NBL. I applied the Ingenuity Knowledgebase (Ingenuity Systems, www.ingenuity.com) tool
to map the 321 NBL TIC-enriched transcripts, as well as their interacting partners, to known
drugs. This analysis revealed thirty known drug targets among the NBL TIC-enriched genes
and their interacting partners defined by the Ingenuity Knowledgebase (bold type in Table
3.3 indicates the NBL TIC-enriched genes). Seventeen of the thirty predicted drug
targets have been explored preclinically or clinically for the treatment of NBL (Table 3.4).
These drugs included both general chemotherapeutics, such as etoposide, becatecarin,
doxorubicin, flavopiridol, and vincristine, all of which are currently approved or in trials for
NBL, as well as targeted agents such as BCL2 inhibitors, evaluated for the treatment of NBL
[289]. Several agents predicted by my analysis, such as HDAC inhibitors and PARP
inhibitors, have already shown promise in the management of chemotherapy-resistant NBL
[290,291], suggesting that our approach can identify drug targets relevant to the disease.
In addition to known NBL drug targets, my analysis predicted genes and gene
products targeted by existing drugs that at the time of the publication of this study had not
been implicated clinically as therapeutic targets for high-risk NBL. These molecules included
AURKB, PLK1, ADORA2A, CXCL10, SLC1A4, COL14A1, TNFRSF10B, ITGA2B, and
IL6. Based on biological and clinical considerations discussed in Section 3.2.5, we selected
AURKB for further evaluation as a potential drug target against metastatic NBL.
3.2.5 Targeting BRCA1 signaling: inhibition of AURKB is selectively cytotoxic to
NBL TICs
The Aurora kinase family includes three serine/threonine kinases involved in the
control of the cell cycle. Inhibitors of Aurora A and B kinases have shown promise as
anticancer agents for the treatment of solid tumors and leukemias [292]. Although an Aurora
A kinase inhibitor is in an ongoing phase I/II clinical trial for NBL (NCT00739427), Aurora
B kinase inhibitors have not been investigated in relation to NBL. A recent report suggested a
direct link between Aurora B kinase and BARD1, a key component of the BRCA1 signaling
pathway that is also associated with susceptibility to NBL [197,269]. This report suggested
that full-length BARD1, expressed by normal cells, interacts with BRCA1 and mediates
AURKB degradation, while a shorter BARD1beta isoform lacking the BRCA1 interaction
domain, expressed by some cancer cells, scaffolds AURKB with BRCA2, stimulating cellular
proliferation (Figure 3.2C) [269].
In this study, AURKB was highly expressed in NBL TICs, with an average expression
level of 44.35 Reads Per Kilobase of gene model per Million mapped reads (RPKM; range
9.83-67.66 RPKM). In contrast, AURKB transcripts were not detectable above background
in SKPs or other normal samples (Section 3.2.7). The BARD1/AURKB relationship, together
with the aberrant expression of the BRCA1/BARD1 pathway and AURKB in NBL TICs
observed in this study, as well as the clinical feasibility of Aurora kinase inhibitors, provided
a rationale for exploring the antiproliferative potential of Aurora B kinase inhibitors in NBL
TICs. To assess whether elevated mRNA levels at the AURKB locus in NBL TICs compared
to SKPs and a panel of other cancers corresponded to increased levels of AURKB protein,
we performed Western blot analysis using whole-cell lysates from three NBL TIC lines
(NB12, NB88R2, and NB122R) and two SKP lines (FS274 and FS227). This analysis
revealed the presence of the AURKB protein in NBL TICs but detected no protein in SKPs,
supporting the gene expression result (Figure 3.3A). To gain further insight into the role of
AURKB in controlling NBL TIC proliferation, we performed shRNA knockdown
experiments in NBL TIC lines NB12 and NB88R2. NBL TICs stably infected with
lentiviruses encoding two separate shRNAs to AURKB showed 77% to 80% growth
reduction compared with NBL TICs infected with lentiviruses carrying mock shRNAs to
green fluorescent protein or β-galactosidase (Figure 3.3B). The observed reduction in
proliferation following AURKB knockdowns supports the premise that AURKB signaling is
important for the viability of NBL TICs.
To assess whether pharmacologic inhibition of AURKB would have the same effect
on NBL TIC proliferation as the AURKB knockdowns described above, we used AZD1152, a
selective AURKB inhibitor that is currently undergoing phase I/II testing in patients with
acute myelogenous leukemia (NCT00497991). NBL TIC lines (NB12 and NB88R2), as well
as the FS283 SKP line, were treated with a range of AZD1152 concentrations, and cell
growth was assessed 96 hours later using alamarBlue reduction [293] as a read-out of cellular
metabolic activity. As shown in Figure 3.3C, proliferation of NBL TICs was reduced following
treatment with AZD1152, with low-micromolar EC50 values (1.5-4.6 μmol/L). In
contrast, SKPs were less sensitive to AZD1152, exhibiting a higher EC50 value (12.4
μmol/L). The enhanced reduction of proliferation of NBL TICs compared to SKPs following
genetic (shRNA) and pharmacological (AZD1152) inhibition of AURKB is consistent with
the hypothesis that AURKB is a potential drug target for metastatic NBL.
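As a rough illustration of how an EC50 can be read from a dose-response series, the sketch below interpolates on a log-concentration scale between the two tested concentrations that bracket the half-maximal response. This is a simplification chosen for the example: the curve-fitting method actually used for Figure 3.3C is not described in this section, and a sigmoidal model fit would normally be preferred. The function name and all data values are hypothetical.

```python
from math import log10


def ec50_by_interpolation(concs, responses):
    """Estimate EC50 by log-linear interpolation at the half-maximal response.

    concs: increasing concentrations (umol/L); responses: viability as a
    fraction of the untreated control, assumed to decrease with dose.
    """
    half = (max(responses) + min(responses)) / 2
    points = list(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo >= half >= r_hi and r_lo != r_hi:
            # Fraction of the way from c_lo to c_hi (on a log scale) at which
            # the response crosses the half-maximal level.
            frac = (r_lo - half) / (r_lo - r_hi)
            return 10 ** (log10(c_lo) + frac * (log10(c_hi) - log10(c_lo)))
    raise ValueError("response never crosses the half-maximal level")


# Made-up dose-response values, for illustration only (not the thesis data).
concs = [0.1, 1.0, 10.0, 100.0]
viability = [1.0, 0.8, 0.2, 0.0]
ec50 = ec50_by_interpolation(concs, viability)
print(f"EC50 ~ {ec50:.2f} umol/L")
```

Comparing such estimates between cell types, as done above for NBL TICs and SKPs, is only meaningful when both are measured with the same assay, time point, and concentration range.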
3.2.6 Exon-level expression analysis of BARD1 reveals a potential mechanism for the
sensitivity of NBL TICs to AURKB inhibition
The full-length BARD1 isoform interacting with BRCA1 was reported to mediate
AURKB degradation, while the shorter BARD1beta isoform lacking the BRCA1 interaction
domain was reported to be involved in the stabilization of AURKB via interactions with
BRCA2 [269]. Since NBL TICs expressed AURKB both at the level of mRNA (Figure 3.1B)
and protein (Figure 3.3A), and were sensitive to AURKB inhibition (Figure 3.3B and C), we
hypothesized that NBL TICs expressed the BARD1beta isoform that is involved in the
scaffolding of AURKB and BRCA2.
Upon inspection of the NBL TIC and SKP RNA-Seq data, I found that SKPs
expressed BARD1 only at the expression threshold of approximately 1 RPKM (Reads Per Kilobase
of exon model per Million mapped reads) [150]. Therefore, I sought an alternative source of
reference normal RNA-Seq data to study the exon usage at the BARD1 locus in normal and
NBL TIC cells. To address the hypothesis that NBL TICs preferentially express the
BARD1beta isoform, while normal cells express the full-length BARD1 isoform, I used the
RNA-Seq data from NBL TIC libraries (Table 3.1), and a panel of 16 normal tissues from the
Illumina BodyMap 2.0 project available through the Gene Expression Omnibus (GSE30611).
The exon-level expression at the BARD1 locus was quantified in these samples as described
in the Methods using the RPKM expression measure [150]. The exon usage of each BARD1
exon was defined as the splice index (SI), calculated as the percent of the RPKM level of
each exon from the overall RPKM level of the gene, (exon RPKM/gene RPKM)*100.
The average SI of exon 2 in NBL TICs, computed across the 10 NBL TIC RNA-Seq
libraries, was 2.17% (SD = 0.96%), which was significantly less than the average value of
11.5% (SD = 4.01%), computed across the 16 normal tissues, as assessed by a moderated T-test with the Benjamini-Hochberg multiple testing correction implemented in the LIMMA
Bioconductor package [249] (BH-corrected q < 0.05). Similarly, the average SI of exon 3
was 8.75% (SD = 2.60%) in NBL TICs, which was found to be significantly different from
the average value of 31% (SD = 7.00%) in the normal tissues (BH-corrected q < 0.05)
(Figure 3.5A). The moderated T-test was selected for this analysis because it does not assume
that exons are independent of each other, a property that is likely to be biologically relevant
[249]. Instead, to compute the T-statistic, the empirical Bayes moderated T-test method uses
information from all exons in a gene, by computing a weighted average between the variance
at each exon and the variance across all exons at the locus [249]. The moderated T-test has
been used previously for studying differential expression in RNA-Seq data [294] and for
differential splicing analysis using exon arrays [295]. Upon manual inspection in the IGV
browser [163], exon 1 was found to have a high GC content (average 70% or more), which
may account for the low coverage of this region in all samples. Due to the low coverage of
exon 1, its SI could not be reliably assessed in this study.
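As an illustration of the variance moderation described above, the following Python sketch computes a moderated variance as a weighted average of the exon-level variance and a prior (pooled) variance, and then forms the corresponding T-statistic. This is a minimal sketch and not the LIMMA implementation: LIMMA estimates the prior variance and the prior degrees of freedom from the data, whereas here `prior_var` and `prior_df` are hypothetical inputs supplied by the caller, and all function names are assumptions made for illustration.

```python
import math


def moderated_variance(sample_var, df, prior_var, prior_df):
    """Weighted average of the per-exon variance and a prior variance.

    The weights are the residual degrees of freedom (df) and the prior
    degrees of freedom (prior_df); both prior values are assumed inputs.
    """
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)


def moderated_t(mean_a, mean_b, n_a, n_b, pooled_var, prior_var, prior_df):
    """Two-group T-statistic computed with the moderated variance."""
    df = n_a + n_b - 2  # residual degrees of freedom for two groups
    s2 = moderated_variance(pooled_var, df, prior_var, prior_df)
    standard_error = math.sqrt(s2 * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error
```

Setting `prior_df` to zero recovers the ordinary pooled-variance T-statistic, which makes the effect of the moderation easy to inspect on a single exon.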
Based on the UniProtKB records, the BRCA1 interaction region of BARD1 is
comprised of residues 26-119, encoded by a portion of exon 1, exon 2 and exon 3 (Figure
3.5C) [296]. Therefore, the finding of the lower expression of exons 2 and 3 of BARD1 in
NBL TIC cells is consistent with the expression of the shorter BARD1beta isoform that has
been reported to be involved in the stabilization of AURKB in cancer cells [269].
We also used the trans-ABySS de novo assembly pipeline [297] discussed in Section
4.4.6 to reconstruct the structure of BARD1 transcripts expressed by NBL TICs. This pipeline
assembled short RNA sequencing reads into contigs, aligned the contigs to the reference
hg18 genome, and then compared the alignments to the annotated transcript models from
Ensembl 54 [298]. Since exon 1 of BARD1 was not covered by sequencing reads, we were
unable to assemble contigs that spanned the full length of BARD1 transcripts. However, we
detected contigs that were missing exons 1, 2, and 3, providing additional evidence for the
expression of the BARD1beta isoform by NBL TICs.
3.2.7 Relevance to primary neuroblastoma
In Sections 3.2.1-3.2.5 I found that the mRNA levels of members of the
BRCA1/BARD1 signaling pathway were significantly higher in predominantly metastases-derived NBL TICs than in normal neural crest-like cells (SKPs), and other cancers.
Moreover, both transcript and protein levels of AURKB, a member of the BRCA1/BARD1
pathway, were found to be enriched in expression in NBL TICs compared to SKPs (Sections
3.2.2 and 3.2.5). We also showed that genetic and pharmacological inhibition of AURKB
was cytotoxic to NBL TICs, and less so to SKPs. In Section 3.2.6 I linked the observation of
the preferential expression of AURKB by NBL TICs to the expression of the oncogenic
BARD1beta isoform that was reported to stabilize AURKB in cancer cells (Figure 3.5C)
[269]. Since NBL TICs used in this analysis are predominantly derived from bone marrow
metastases of relapsed NBL patients [279], I asked whether the BRCA1/BARD1 pathway,
the oncogenic BARD1beta isoform and AURKB were also expressed by primary NBL cells.
To address this question, I used the RNA-Seq data from 10 primary NBL tumors, described
in Chapter 4 and Appendix D.
To investigate whether the mRNA expression of BRCA1/BARD1 pathway members
was enriched in primary NBL tumors with respect to normal cells, I compared the expression
profiles of 10 primary NBL tumors (Appendix D) and 16 normal tissues from the Illumina
BodyMap 2.0 project (Section 3.2.6). I used the Reads Per Kilobase of gene model per
Million mapped reads (RPKM) as a measure of gene expression [150], and applied the
methods in the LIMMA package [249] to identify genes significantly enriched in expression
in NBL cells, as described in Section 3.4.3. This analysis revealed 1,828 genes with evidence
of increased mRNA abundance in NBL cells compared to normal tissues (Benjamini-Hochberg-corrected q < 0.05). Ingenuity Pathway Analysis software (Ingenuity Systems,
www.ingenuity.com) was then used to identify significantly enriched annotations within this
gene list, as described in Section 3.4.3. The pathway enrichment analysis revealed that the
pathway entitled “Role of BRCA1 in DNA Damage Response” was the most significantly
enriched annotation among the 1,828 genes (Fisher's Exact P < 0.05), such that 15 out of 53
members of this pathway (FANCG, FANCA, FANCD2, RAD51, BRCA1, BACH1,
AURKB, BLM, RFC, MSH2, SWI/SNF, OCT1, TP53, PLK1, E2F) were more abundant in
NBL cells compared to normals at the level of mRNA (Benjamini-Hochberg-corrected q <
0.05).
Having established that the BRCA1/BARD1 signaling pathway annotation was
significantly enriched among transcripts increased in expression in NBL tumors compared to
normal cells, we used the RPKM measure to directly compare the AURKB expression levels,
and BARD1 exon usage in NBL TICs, primary NBL and Illumina BodyMap 2.0. We used
the Illumina BodyMap 2.0 data rather than SKP data for the primary tumor versus normal
analyses, since, as mentioned in Section 3.2.6, BARD1 exon usage could not be reliably
assessed in SKPs due to the marginal expression of this gene in the SKPs RNA-Seq libraries.
The average SI of exon 2 in NBL TICs and NBL primary tumors was 2.17% (SD =
0.96%) and 3.57% (SD = 1.89%), respectively, both of which were significantly less than the
average value of 11.5% (SD = 4.01%), computed across the 16 normal tissues, as assessed by
a moderated T-test with the Benjamini-Hochberg multiple testing correction implemented
in the LIMMA Bioconductor package [249] (BH-corrected q < 0.05). Similarly, the average
SI of exon 3 was 8.75% (SD = 2.60%) and 7.55% (SD = 2.04%) in NBL TICs and primary
tumors, respectively, both of which were found to be significantly different from the average
value of 31% in the normal tissues (BH-corrected q < 0.05) (Figure 3.5A). The average gene-level RPKM values for AURKB were computed for Illumina BodyMap 2.0 normal tissues
(16 samples), NBL primary tumors (10 samples), and NBL TICs (10 samples). While
AURKB expression was not detectable above background (RPKM ~ 1) in any of the 16
normal libraries, the average AURKB expression in NBL primary tumors and NBL TICs was
21.6 RPKM (range 2.55-36.95 RPKM) and 44.35 RPKM (range 9.83-67.66 RPKM),
respectively (Figure 3.5B). These results are consistent with the interpretation that the
BARD1beta isoform is present in both NBL TICs and primary NBL tumors, and that both
primary tumors and NBL TICs may be sensitive to AURKB inhibition.
3.3 Discussion
The rationale for the work in Chapter 3 was the idea that targeting cancer stem cell-specific proteins could be cytotoxic to cancer stem cells, while sparing their normal stem cell
counterparts, and lead to discoveries with potential clinical application. This idea has been
previously validated in a chronic myeloid leukemia model, where a leukemia stem cell-specific gene, Alox5, was identified, and its inhibition led to the eradication of chronic
myeloid leukemia in a mouse model [281]. Therefore, I aimed to apply the same concept to
the NBL TIC model, using SKPs as normal reference stem cells. To identify transcripts for
which the expression was enriched in NBL TICs, I used RNA-Seq expression data from NBL
TICs, SKPs, and a compendium of cancer tissues. It is important to note that since the
compendium of cancer tissues included RNA-Seq data from cancerous lymph nodes, B-cell-specific transcripts found in NBL TICs by us (not shown) and others [280], possibly as a
result of contamination with Epstein-Barr-transformed lymphocytes, would not be identified
as NBL TIC-enriched.
The gene-level expression analysis of RNA-Seq data from ten NBL TIC samples
revealed 321 transcripts increased in expression in NBL TICs compared to SKPs and a panel
of cancer tissues. Twenty-one of these transcripts were members of the BRCA1 signaling
pathway or its downstream components, which amounted to a statistically significant
enrichment of this pathway annotation among transcripts increased in expression in NBL
TICs (Fisher's Exact P < 0.05). A key component of the BRCA1 pathway, BRCA1-associated RING domain protein 1 (BARD1), was shown to act as a predisposition locus for
high-risk NBL by a single nucleotide polymorphism (SNP)–based genome-wide association
study [197]. In this study of more than 500 high-risk NBL patients, also described in Section
1.9.2, six intronic SNPs at the BARD1 locus, contained within BARD1 introns 1, 3, and 4,
met genome-wide significance for association with the disease (odds ratio for the most
significant SNP = 1.68; 95% confidence interval 1.49 to 1.90; P = 8.65E-18). Evidence in
breast tumors suggests that BARD1 is a regulator of the tumor-suppressor function of
BRCA1, and can act as a tumor suppressor itself [299,300]. In particular, the
BARD1/BRCA1 heterodimer is important for the tumor suppressor activity, such that losses
of BARD1, BRCA1 or their interaction are tumorigenic and result in similar basal-like
phenotypes in breast cancers [300]. Preliminary investigations of the effects of the BARD1
NBL risk alleles identified in the genome-wide association study [197] suggest that these
alleles result in the overexpression of the oncogenic BARD1beta isoform (Figure 3.5C)
[196]. The BARD1beta isoform lacks exons 2 and 3 that encode the RING-finger domain
involved in the interaction with BRCA1 [269]. Aberrant BARD1 splicing, although not the
isoform seen in NBL, has been previously reported in other cancers, including ovarian [301],
colon [302] and non-small cell lung cancers [303]. In this study, we also observed that NBL
TICs and primary tumors, but not normal tissues, expressed the oncogenic BARD1beta
isoform (Section 3.2.7) that does not interact with BRCA1, but instead is involved in
scaffolding BRCA2 and AURKB (Figure 3.5C) [269].
To identify existing therapeutics that could be applied to the treatment of recurrent
NBL, I used Ingenuity Pathway Analysis software (Ingenuity Systems, www.ingenuity.com)
to analyze the functional significance of the identified genes and match them against a
database of available drugs. In total, thirty targets with an available inhibitor were identified,
nine of which have never been implicated in NBL treatment.
Aurora kinase B (AURKB), one of the nine novel drug targets, was selected for
further validation based on two factors: its link to the BRCA1 signaling pathway through
reported interactions with the shorter (beta) isoform of the NBL predisposition locus BARD1
[269], and the known role of its family member, AURKA, in NBL [304]. Both AURKA and
AURKB are essential for proper chromosome alignment and separation during mitosis. The
inhibition of either protein results in gross defects in chromosome segregation: aneuploidy in
the case of selective AURKA inhibition, and polyploidy in the case of selective AURKB
inhibition, either leading to cell death [305]. Treatments with a selective AURKB inhibitor,
AZD1152, were cytotoxic to NBL TICs used in the study but not to normal pediatric neural
crest-like precursor cells. Although AURKA inhibitors are currently in clinical trials for NBL
(NCT00739427), to our knowledge this study provides the first report of AURKB inhibitors
as potential therapeutics for NBL. Because AURKB inhibitors are already in clinical trials,
there is potential for rapid translation of the finding in NBL to therapy against the disease.
The selective activity of AZD1152 in NBL TICs compared to SKPs, which is likely due to
the differential AURKB protein abundance in NBL TICs compared with SKPs, provides a
foundation for further exploring AURKB as a drug target for pediatric NBL.
An independent validation of the potential significance of AURKB in NBL is the
preliminary report from the KidsCancerKinome initiative that studied a panel of pediatric
tumors and cell lines and found that both AURKA and AURKB were expressed at a high
level in tumors with poor prognosis, including high-risk NBL [306]. The therapeutic
potential of inhibiting AURKA and AURKB in NBL is currently being investigated by the
group through functional studies, including shRNA knockdowns and in vivo inhibitor studies
(Ellen Westerhout, personal communication). The confirmation of our finding by an
independent group of investigators studying primary NBL tumors lends credibility to our
bioinformatic approach, in which we used normal SKPs and a compendium of cancer tissues
to select NBL TIC-enriched markers. Further validation of the results from my bioinformatic
analysis is provided by two reports that used NBL TICs [307] and primary NBL tumors and
cell lines [308] to provide experimental evidence of the therapeutic potential of PLK1
inhibition in high-risk NBL. As shown in Figure 3.2C, PLK1 signaling is downstream of
the BRCA1/BARD1 pathway, and the PLK1 molecule was also suggested by my analysis as one
of the potential therapeutic targets against NBL TICs (Table 3.4).
In conclusion, the work described in this Chapter provides the first high-resolution
system-level analysis of NBL TICs and a proof of principle that next-generation sequencing
of primary human NBL TICs can reveal therapeutically relevant candidates for NBL.
Specifically, we showed that inhibiting an NBL TIC-enriched transcript implicated in a
relevant pathway is selectively cytotoxic to these cells compared to their normal stem cell
counterparts (SKPs). The selective cytotoxicity against cancer stem cell-like NBL TICs is
particularly important for high-risk NBL, as current therapies used in the management of the
disease can effectively reduce tumor burden, but do not produce a durable cure in the
majority of patients [174]. Since cancer stem cells are thought to be associated with disease
relapse [243], the specific targeting of NBL TICs may help achieve stable long-term
remission for high-risk NBL patients. The apparent selectivity of AURKB inhibition, as
compared to normal pediatric stem cells (SKPs), may imply that this treatment would
potentially be less toxic to children with NBL.
3.4 Materials and methods
3.4.1 RNA sequencing and data analysis
NBL TICs and SKPs were cultured as previously described [279,234]. Briefly, the
cells were cultured in DMEM-F12 medium, 3:1 (Invitrogen), containing 2% B27 supplement
(Gibco), 40 ng/mL basic fibroblast growth factor 2, and 20 ng/mL epidermal growth factor
(both from Collaborative Research; proliferation media) in 75 cm2 flasks in a 37°C and 5%
CO2 tissue-culture incubator. The cell growth conditions were normalized such that NBL
TICs were cultured for 7 days and SKPs for 14 days post plating prior to harvesting in
exponential growth phase and RNA isolation for transcriptome analysis.
Details of the NBL TIC and SKP samples used in this analysis are provided in Table
3.1. RNA sequencing libraries from NBL TICs and SKPs were constructed from DNase I
treated mRNA as previously described [149,102]. The libraries were sequenced on an
Illumina Genome Analyzer. The read length and amount of aligned sequence data generated
for each library is provided in Table 3.2. The reads were aligned to the human reference
genome build hg18 (National Center for Biotechnology Information Build 36) and a database
of known exon junctions [149] using MAQ software version 0.7.1 in paired-end mode [309].
Duplicate reads were retained for the expression analysis. The number of bases sequenced
per number of exonic bases mapped was used as a measure of gene expression level for each
gene [114]. The sequencing and processing of RNA-Seq libraries from other tumor types was
conducted according to the same production protocol [102,149]. The read length and amount
of aligned sequence data generated for each library is provided in Table 3.2. The reads were
aligned to the human reference genome build hg18 (National Center for Biotechnology
Information Build 36.1) and a database of known exon junctions [149] using MAQ software
version 0.7.1 in paired-end mode, and the duplicate read pairs were removed [309]. The
number of bases sequenced per number of exonic bases mapped was used as a measure of
gene expression level for each gene [114]. The genes with the cumulative expression value of
less than 10 (computed across all samples) were filtered out from the analysis.
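The low-expression filter above, together with the square-root transform applied before model fitting, can be sketched in Python as follows. This is an illustrative sketch only: the dictionary layout (gene name mapped to a list of per-sample expression values) and the function name `filter_and_transform` are assumptions, not the code used in the analysis.

```python
import math


def filter_and_transform(expression, min_total=10.0):
    """Drop genes whose summed expression across all samples is below
    min_total, then square-root transform the remaining values.

    expression: dict mapping a gene name to a list of per-sample values
    (a hypothetical in-memory layout chosen for this example).
    """
    kept = {}
    for gene, values in expression.items():
        if sum(values) >= min_total:  # genes with a total < 10 are removed
            kept[gene] = [math.sqrt(v) for v in values]
    return kept
```

For example, a gene with values 4, 9, and 16 across three samples is retained and transformed to 2, 3, and 4, whereas a gene with values 1, 2, and 3 is removed.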
The expression values were square-root transformed and used in the lmFit function of
the Linear Models for Microarray Data (LIMMA) Bioconductor package to estimate fold
changes between the compared groups by fitting linear models to each gene [249]. The
LIMMA method was selected for this analysis as it was previously successfully used for the
analysis of RNA-Seq data [294]. The NBL TICs versus SKPs and NBL TICs versus other
cancers comparisons were conducted similarly, such that single contrasts were defined in
each analysis creating pairwise comparisons [249]. For both pairwise comparisons (NBL
TICs versus SKPs, NBL TICs versus other cancers), the moderated T-statistic with
Benjamini-Hochberg (BH) multiple testing correction implemented in the eBayes function
was used to assess the significance of differential expression. Those genes with BH–
corrected q < 0.05 were considered statistically significant.
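The Benjamini-Hochberg step used to obtain the corrected q-values can be illustrated with a short Python sketch. In the analysis this correction was performed within LIMMA; the standalone function below is an assumption-level reimplementation of the standard step-up procedure, written only to make the calculation explicit.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values
```

Genes would then be called significant when the returned q-value is below 0.05, matching the threshold stated above.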
3.4.2 Microarray experiments and data analysis
Cells were collected and lysed in Trizol, and RNA was purified using the RNeasy
mini kit (Qiagen). RNA samples (Table 2.1) were analyzed on Affymetrix GeneChip Human
Exon 1.0 ST Arrays. The data were checked for batch effects, background corrected, and
normalized according to the Robust Multichip Average procedure using the Affymetrix
Expression Console software. Gene-level expression summaries were computed based on all
core probes. Differential gene expression was assessed using the lmFit function of the Linear
Models for Microarray Data (LIMMA) Bioconductor package [249] as described previously
in Section 2.4.3.
3.4.3 Identification of NBL TIC-enriched and depleted genes and the functional
enrichment analysis
Lists of significantly differentially expressed genes from each analysis (NBL TICs
versus SKPs and NBL TICs versus tissue pool, as measured by RNA sequencing) were
overlapped to identify genes that are significantly enriched and depleted in NBL TICs with
respect to both SKPs and a panel of cancer tissues. The lists of NBL TIC-enriched and NBL
TIC-depleted transcripts were then compared to the lists of differentially expressed genes
from the microarray analysis described in Section 3.4.2 (NBL TICs versus SKPs) to derive
robust sets of genes increased and decreased in expression in NBL TICs compared to SKPs
and other cancers, and confirmed by both RNA sequencing and microarrays (Appendix C).
Ingenuity Pathway Analysis software (Ingenuity Systems, www.ingenuity.com) was then
used on these sets to select canonical pathways significantly enriched among the microarray-confirmed sets of NBL TIC-enriched and NBL TIC-depleted transcripts (Fisher's Exact P <
0.05). The pathway enrichment analysis implemented in Ingenuity uses a Fisher's Exact test
to assess the null hypothesis that the number of observed genes in a particular pathway was
produced by chance (Ingenuity Systems, www.ingenuity.com). The null hypothesis is
rejected at Fisher's Exact P < 0.05.
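To make the enrichment test concrete, the following Python sketch computes a one-sided (right-tailed) Fisher's Exact P value for the overlap between a gene list and a pathway. It is a sketch under stated assumptions: the internal details of the Ingenuity implementation are not described here, so the one-sided form, the function name, and the need to supply a background (`universe`) size are all assumptions made for illustration.

```python
from math import comb


def enrichment_p_value(universe, pathway_size, list_size, overlap):
    """Right-tailed Fisher's Exact (hypergeometric) P value.

    Probability of observing at least `overlap` pathway members in a
    gene list of `list_size` genes drawn from `universe` genes, of which
    `pathway_size` belong to the pathway.
    """
    upper = min(pathway_size, list_size)
    total = comb(universe, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

As a usage example, `enrichment_p_value(20000, 53, 1828, 15)` would evaluate an overlap of 15 of 53 pathway members in a list of 1,828 genes; the background of 20,000 genes is a hypothetical value chosen for illustration and is not taken from the analysis.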
3.4.4 Gel-free two-dimensional liquid chromatography coupled to shotgun tandem
mass spectrometry
A crude membrane fraction was prepared as follows. NB88R2 cells were swollen in
hypotonic buffer (20 mmol/L Tris, pH 7.4; 10 mmol/L KCl; 5 mmol/L sodium vanadate;
1 mmol/L phenylmethylsulfonylfluoride) and lysed by dounce homogenization. The cleared
cell lysate was centrifuged for 15 minutes at 6,000 × g to collect the crude membrane
fraction. The protein fraction was resuspended in urea buffer (8 mol/L urea, 2 mmol/L
HEPES, 2.5 mmol/L sodium pyrophosphate, 1 mmol/L β-glycerophosphate, and 1 mmol/L
vanadate; Cell Signaling Technology) and was reduced and alkylated with 4.5 mmol/L
dithiothreitol (DTT) and 10 mmol/L iodoacetamide, respectively. The whole-cell fraction
was prepared as follows. NB88R2 cells were lysed in urea lysis buffer (8 mol/L urea, 2
mmol/L HEPES, 2.5 mmol/L sodium pyrophosphate, 1 mmol/L β-glycerophosphate, and 1
mmol/L vanadate) and sonicated (3 bursts of 4 W for 10 s). The cell lysate was cleared by
centrifugation (20,000 × g for 15 min at 4°C) and was reduced and alkylated with 4.5
mmol/L DTT and 10 mmol/L iodoacetamide, respectively. Proteins were digested with
trypsin and purified using C18 reverse phase resin prior to mass spectrometry. The gel-free
two-dimensional liquid chromatography coupled to shotgun tandem mass spectrometry
(MudPIT) analysis was done for 8 cycles as described [284] with the following
modifications: approximately 60 μg (membrane fraction) or 40 μg (whole-cell fraction) of
digested protein was analyzed on a linear ion-trap LTQ-Orbitrap mass spectrometer
(ThermoFisher). Samples were loaded using a Proxeon HPLC system (Thermo Fisher
Scientific) and subjected to MudPIT analysis. All data were analyzed using Sequest
(ThermoFinnigan; version SRF v. 5) and X! Tandem (http://www.thegpm.org/; version
2007.01.01.2 for membrane fraction or version TORNADO 2009.04.01.3 for whole-cell
fraction) search algorithms using the Human International Protein Index database (version
3.41 with 72,155 entries or version 3.66 with 86,845 entries for membrane and whole-cell
fractions, respectively). Sequest and X! Tandem were searched with a fragment ion mass
tolerance of 0.50 or 0.40 Da for membrane and whole-cell fraction, respectively, and a parent
ion tolerance of 2.0 or 5.0 ppm for membrane or whole-cell fraction, respectively. The
fragment ion mass tolerance defines an error range for considering two ion peaks as identical,
while the parent ion tolerance defines the error range for peptide identification in the
database. The iodoacetamide derivative of cysteine was specified as a fixed modification in
Sequest and X! Tandem. The oxidation of methionine was specified as a variable
modification. Proteins were accepted based on the following criteria: at least two peptides
per protein were identified with a probability threshold of 95% or greater (membrane-enriched
fraction) or 90% or greater (whole cell lysate), as derived by the Peptide Prophet algorithm
[310], and an overall protein identity of >95.0% (membrane-enriched fraction) or >90%
(whole cell lysate) was obtained using the Protein Prophet algorithm [311].
3.4.5 AlamarBlue assay
NB12, NB88R2, and FS283 spheres were dissociated into single cells and seeded in
triplicates at 3,000 cells per well in 50 μL medium containing 30% SKPs conditioned media
in non–tissue culture–treated 96-well plates (Corning Life Sciences). AZD1152 (Selleck
Chemicals LLC) was dissolved in dimethyl sulfoxide (DMSO) to a stock concentration of 50
mmol/L, from which 1:3 fold sequential dilutions were prepared. Intermediate dilutions of
the compound were made in medium and immediately added to the cells in a volume of 50
μL. Cells treated with 0.05% DMSO in the absence of the drug were used as a control for
optimal cellular proliferation, whereas wells containing media only were used to determine
the background fluorescence; alamarBlue (10 μL) was added to each well after 72 hours,
followed by incubation for an additional 24 hours. Fluorescence intensity was measured
using PHERAstar SpectraMax Plus384 microplate reader (BMG Labtech) with an excitation
filter of 535 nm and an emission filter of 590 nm. Percentage reduction of alamarBlue was
calculated as ((mean fluorescence of treated wells - background fluorescence)/(mean
fluorescence of DMSO-treated wells - background fluorescence)) * 100. Half maximal
effective concentration (EC50) curves were generated using GraphPad Prism 5 software
(GraphPad Software, Inc.).
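The percentage reduction formula given above can be written as a short Python function. This is a minimal sketch: the function and argument names are assumptions, and the background is taken as the mean fluorescence of the media-only wells, as described in the text.

```python
def percent_reduction(treated, dmso_control, media_only):
    """Percentage reduction of alamarBlue relative to DMSO-treated wells.

    Each argument is a list of fluorescence readings from replicate wells.
    """
    def mean(values):
        return sum(values) / len(values)

    background = mean(media_only)  # background fluorescence
    return (mean(treated) - background) / (mean(dmso_control) - background) * 100
```

For instance, treated wells averaging 600 fluorescence units, DMSO control wells averaging 1,100 units, and a background of 100 units correspond to a value of 50%.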
3.4.6 Western blotting
Cells were harvested, washed with cold HBSS, and lysed with NP40 lysis buffer
containing 10 mmol/L Tris (pH 8.0), 150 mmol/L NaCl, 10% glycerol, 1% Nonidet P-40, 1
mmol/L phenylmethylsulfonylfluoride, 1 mmol/L orthovanadate, and proteinase inhibitor
cocktail tablet (Complete Mini, EDTA-free, Roche). Cells were lysed for 10 to 20 minutes on
ice and centrifuged for 10 minutes at 12,000 rpm at 4°C. Protein amounts were determined
by BCA Assay (Pierce), and 40 μg of protein was loaded per lane. Western blots were probed
with rabbit polyclonal anti-Aurora B antibody (Abcam; ab2254) and mouse monoclonal anti-glyceraldehyde-3-phosphate dehydrogenase antibody (Santa Cruz; sc-47724) in 5% w/v
nonfat dry milk in TBS/0.1% Tween-20 overnight at 4°C. Blots were developed using ECL
or ECL-plus reagent (GE Healthcare Life Sciences).
3.4.7 Small hairpin RNA (shRNA) knockdowns
Cell lines were stably infected with either a mock treatment or lentivirus encoding the
shRNAs of interest at a multiplicity of infection of 1.0. Seventy-two hours post infection, the
virus was removed, and cells were seeded in triplicate at a density of 10,000 per well in 24-well plates. The remaining cells were used for RNA isolation to determine the efficiency of
knockdown by quantitative reverse transcriptase PCR (qRT-PCR). Viable cell numbers were
determined on days 1, 3, 5, and 7 post plating by removing cells from wells and counting via
hemocytometer. The experiments were conducted in triplicates.
3.4.8 Exon-level analysis of RNA sequencing data
The BARD1 splicing analysis using RNA-Seq data was conducted as described
below. The RNA-Seq data from NBL TIC libraries (Table 3.1), NBL primary tumors
(Appendix D) and a panel of 16 normal tissues from the Illumina BodyMap 2.0 project
available through the Gene Expression Omnibus (GSE30611) was processed as described in
Section 3.4.1. The exon coverage analysis was based on Ensembl gene annotations
(homo_sapiens_core_54_36p) [298]. These annotations were converted into one model per
gene by taking all transcripts of a given gene and collapsing them into a single gene model
such that exonic bases in a collapsed gene model were the union of exonic bases that
belonged to all known transcripts of the gene. The analysis used SAMtools version 0.1.13
pileup [312] to get the per-base coverage depths, and excluded reads with mapping quality <
10 and reads flagged as poor quality according to the Illumina chastity filter. The final
analysis report included coverage information for each individual exon and intron in the
collapsed gene models, as well as for the cumulative coverage across all the exons in each
model. These coverage statistics were computed using the RPKM method [150]. The RPKM
of 1 was used as a threshold to consider an exon expressed above background [150]. RPKM
for each exon was calculated using the formula: (number of reads mapped to an exon x
1.00E9)/(NORM_TOTAL x length of the exon), where NORM_TOTAL = the total number
of reads that are mapped to exons excluding those belonging to the mitochondrial genome.
RPKM for the whole gene was calculated using the formula: (number of reads mapped to all
exons in a gene x 1.00E9)/(NORM_TOTAL x sum of the lengths of all exons in the gene),
where NORM_TOTAL = the total number of reads that are mapped to exons excluding those
belonging to the mitochondrial genome.
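The collapsed gene model and the RPKM formulas above can be sketched in Python as follows. This is an illustrative sketch, not the pipeline used in this work: the function names are hypothetical, exon coordinates are assumed to be half-open (start, end) intervals so that exon length equals end minus start, and `norm_total` stands for NORM_TOTAL as defined above.

```python
def collapse_exons(transcripts):
    """Merge the exons of all transcripts of a gene into a single model.

    transcripts: list of transcripts, each a list of (start, end) exon
    intervals (half-open coordinates are an assumption of this sketch).
    Returns the union of exonic bases as sorted, non-overlapping intervals.
    """
    intervals = sorted(exon for transcript in transcripts for exon in transcript)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent exon: extend the current interval.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def rpkm(read_count, length, norm_total):
    """(reads mapped x 1.00E9) / (NORM_TOTAL x length)."""
    return read_count * 1e9 / (norm_total * length)


def gene_rpkm(exon_read_counts, collapsed_exons, norm_total):
    """Gene-level RPKM from the summed reads and summed exon lengths."""
    total_length = sum(end - start for start, end in collapsed_exons)
    return rpkm(sum(exon_read_counts), total_length, norm_total)
```

For example, an exon of 1,000 bases with 100 mapped reads in a library with a NORM_TOTAL of 1,000,000 reads has an RPKM of 100.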
The splice indices for BARD1 (ENSG00000138376) exons were computed as
(exon RPKM / gene RPKM) *100. The significance of the observed differences in splice
indices between sample pairs was assessed using the R Linear Models for Microarray Data
(LIMMA) package adapted for splicing analysis as previously described [295]. The
Benjamini-Hochberg correction for multiple testing was used, and the corrected q-values of
less than 0.05 were considered statistically significant.
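The splice index calculation can be expressed as a one-line Python function. The sketch below follows the formula given above; the function name and the explicit check for a gene-level RPKM of zero are assumptions added for illustration.

```python
def splice_index(exon_rpkm, gene_rpkm):
    """Splice index (SI): exon RPKM as a percentage of the gene RPKM."""
    if gene_rpkm <= 0:
        # The SI is undefined when the gene has no measurable expression.
        raise ValueError("gene RPKM must be greater than zero")
    return (exon_rpkm / gene_rpkm) * 100
```

An exon with an RPKM of 2 in a gene with an overall RPKM of 10 therefore has an SI of 20%.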
3.4.9 AURKB expression analysis
The AURKB expression analysis in Section 3.2.7 was conducted using the RPKM
expression measure as described above for BARD1. The gene-level RPKM was computed
as: (number of reads mapped to all exons in a gene x 1.00E9)/(NORM_TOTAL x sum of the
lengths of all exons in the gene), where NORM_TOTAL = the total number of reads that are
mapped to exons excluding those belonging to the mitochondrial genome. The gene
annotation was based on Ensembl 54 (homo_sapiens_core_54_36p) [298]. The RPKM of 1
was used as a threshold to consider a gene expressed above background [150].
Figure 3.1 Transcripts enriched and depleted in NBL TICs compared with SKPs and
other tumor tissues
Differentially expressed genes were identified using RNA sequencing data from NBL TICs,
SKPs and a panel of cancer tissues. An equivalent differential expression analysis was
conducted using exon array data from NBL TICs and SKPs. (A). Venn diagrams summarize
the overlap of the results from the three differential expression analyses (NBL TICs versus
SKPs using microarray, NBL TICs versus SKPs using RNA-Seq, and NBL TICs versus other
cancers using RNA-Seq) for upregulated (left panel) and downregulated (right panel) genes.
(B). RNA sequencing expression profiles of 321 NBL TIC-enriched (red column) and 819
TIC-depleted transcripts (blue column) in NBL TICs, SKPs, and other cancer libraries are
plotted as a heatmap with transcripts as rows and samples as columns. The rows are centered and
scaled by subtracting the mean of the row from every value and then dividing the resulting
values by the standard deviation of the row (row Z-Score). The NBL TIC libraries are labeled
with the “TIC” prefix, and the tissue identities of the remaining libraries are explained in
Table 3.2. The 321 NBL TIC-enriched genes and 819 NBL TIC-depleted genes were
confirmed as significantly differentially expressed in all three comparisons as described in
(A). The robustness of the heatmap was confirmed using the bootstrapping algorithm
implemented in the Pvclust Bioconductor package [313], such that NBL TICs could be
separated from the other tissues based on the expression of the 321 NBL TIC-enriched
transcripts 98/100 times. Adapted by permission from the American Association for Cancer
Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A.
Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N.
Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan,
M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates
AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep
15;16(18):4572-82.
[Figure 3.1, panels A and B]
Figure 3.2 Pathway analysis of NBL TIC-enriched transcripts
Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to reveal canonical pathways significantly
enriched among genes upregulated (A) or downregulated (B) in NBL TICs (Fisher's Exact P < 0.05). The ratios of observed versus
total numbers of genes in each pathway are plotted with the orange line, whereas the lengths of the blue bars are the significance
scores for each pathway; significance threshold (Fisher's Exact P < 0.05) is marked by the vertical orange line. (C). The pathway
named “Role of BRCA1 in DNA damage response” was most significantly upregulated in NBL TICs compared with SKPs and other
tissues; pathway members for which the expression is increased in NBL TICs are highlighted in red, and the protein complexes are
indicated using a bold circle. The recently reported protein-protein interaction between AURKB, BRCA2 and the short (beta) isoform
of BARD1 is denoted with a dotted line. Adapted by permission from the American Association for Cancer Research: O. Morozova,
M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F.
Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. Systemlevel analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res.
2010 Sep 15;16(18):4572-82.
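
For orientation, the kind of Fisher's exact enrichment test referred to in this legend can be sketched as a hypergeometric tail probability. This is a minimal illustration assuming a one-sided test; it is not the Ingenuity implementation, and the function name, the background size and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, list_size, pathway_size, background_size):
    """One-sided Fisher's exact (hypergeometric) P-value.

    Probability of observing at least `hits` pathway members in a gene
    list of `list_size` genes drawn from `background_size` genes, of
    which `pathway_size` belong to the pathway.
    """
    total = comb(background_size, list_size)
    upper = min(list_size, pathway_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers for illustration only (not taken from the thesis data).
p = enrichment_p_value(hits=8, list_size=321, pathway_size=50,
                       background_size=20000)
ratio = 8 / 50  # observed versus total number of genes in the pathway
```

In this sketch `ratio` corresponds to the quantity plotted with the orange line, and a transformation of `p` would correspond to the significance score plotted as a bar.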
Figure 3.3 NBL TICs are sensitive to Aurora B kinase inhibition
(A). Western blot analysis confirmed the presence of AURKB protein in NBL TICs but not
in SKPs. Blots were probed with the rabbit polyclonal anti-Aurora B antibody (Abcam;
ab2254) and the mouse monoclonal anti-GAPDH antibody (Santa Cruz; sc-47724). The
AURKB band at 37 kDa is detectable in NBL TIC lines NB12, NB88R2 and NB122R,
similarly to the positive control (HeLa cells). The AURKB band is undetectable in SKP lines
FS274 and FS227. (B). Reduction of the proliferation of NBL TICs upon shRNA knockdown
of AURKB. Growth curves of NBL TIC lines NB88R2 (top) and NB12 (bottom) infected
with shRNA against AURKB or controls (left panel); quantitative reverse transcriptase PCR
was used to determine the effectiveness of AURKB knockdown (76-86%) (right panel). All
experiments were performed in triplicate. (C). The AlamarBlue assay revealed that AURKB inhibition
with AZD1152 was effective in NBL TICs at an EC50 of 1.5 to 4.6 μmol/L, whereas AURKB
inhibition was effective in SKPs at an EC50 of 12.4 μmol/L. All experiments were performed in triplicate.
Reprinted by permission from the American Association for Cancer Research: O. Morozova,
M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard,
R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S.
Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis
of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for
neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
Figure 3.5 NBL cells preferentially express the oncogenic BARD1beta isoform that is
involved in the stabilization of AURKB
The RNA-Seq data from 10 NBL TIC libraries (Table 3.1), 10 NBL primary tumors
(Appendix D) and a panel of 16 normal tissues from the Illumina BodyMap 2.0 project
available through the Gene Expression Omnibus (GSE30611) were analyzed for exon-level
and gene-level expression as described in Methods. (A). Exon usage at the BARD1 locus is
quantified using splice indices (SI). The SI for each BARD1 exon is computed as (exon
RPKM/gene RPKM)*100, and the average SI value is calculated across each of the three
groups: Illumina BodyMap 2.0 normal tissues (16 samples), NBL primary tumors (10
samples), and NBL TICs (10 samples). The SI values for each of the BARD1 exons (x-axis)
are plotted along the y-axis. The SI values of exons 2 and 3 (marked by stars) are
significantly lower in the NBL primary tumors and NBL TICs, as compared to the normal
tissues (Benjamini-Hochberg-corrected q < 0.05). This finding is consistent with the
expression of the BARD1beta isoform by primary NBL cells and NBL TICs. (B). The gene-level
expression of AURKB in each sample was quantified using the RPKM measure as
described in Methods. The average gene-level RPKM value for AURKB is computed for
each group: Illumina BodyMap 2.0 normal tissues (16 samples), NBL primary tumors (10
samples), and NBL TICs (10 samples). While AURKB expression is not detectable above
background (RPKM ~ 1) in any of the 16 normal libraries, the average AURKB expression in
NBL primary tumors and NBL TICs is 21.6 RPKM (range 2.55-36.95 RPKM) and 44.35
RPKM (range 9.83-67.66 RPKM), respectively. (C). A cartoon representation of the hg18
Ensembl 54 BARD1 gene model [298]. The exons are depicted by squares, while introns are
shown as lines. The protein domains are depicted with squares of different colors, as
described in the legend, and are marked on the exons that encode these domains. The full-length
BARD1 transcript includes all coding exons, and contains three ANK repeats, two
BRCT domains, and a RING-finger domain [296]. The BRCA1 interaction region includes
the RING-finger domain and comprises residues 26-119, encoded by a portion of exon 1,
exon 2 and exon 3. The BARD1beta transcript lacks exons 2 and 3 and encodes a protein
product without the RING-finger domain that stabilizes AURKB through its scaffolding with
BRCA2 [269].
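
The splice index defined in panel (A) can be written out as a short sketch. The formula is taken from the legend; the function names and the RPKM values in the example are hypothetical and are given for illustration only.

```python
def splice_index(exon_rpkm, gene_rpkm):
    """SI = (exon RPKM / gene RPKM) * 100, as defined in the legend."""
    if gene_rpkm <= 0:
        raise ValueError("gene RPKM must be positive")
    return exon_rpkm / gene_rpkm * 100


def mean_splice_index(samples):
    """Average SI of one exon across (exon RPKM, gene RPKM) pairs."""
    values = [splice_index(exon, gene) for exon, gene in samples]
    return sum(values) / len(values)


# Hypothetical (exon RPKM, gene RPKM) pairs for one exon in two samples.
group_average = mean_splice_index([(5.0, 10.0), (2.0, 8.0)])
```

An exon that is skipped in a given group, as proposed here for BARD1 exons 2 and 3, would show a lower average SI in that group than the constitutively included exons of the same gene.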
Table 3.1 Human NBL TIC and SKP lines used for gene expression analysis
Human NBL TIC and SKP lines were analyzed by RNA-Seq (column 6) and/or microarray
(column 5) to identify transcripts significantly enriched in NBL TICs (Section 3.2.1). The
International Neuroblastoma Staging System (INSS) stage is listed in column 2, the MYCN
oncogene amplification status of NBL samples is listed in column 3, and the tissue origin is
listed in column 4. All NBL TIC lines are derived from high-risk NBL patients, while SKP
lines are derived from cancer-free children. Superscripts designate samples from the same
patient. Reprinted by permission from the American Association for Cancer Research: O.
Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst,
T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol,
Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level
analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug
target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
Sample | INSS Stage | MYCN | Description | Human Exon Array | RNA-Sequencing (Library ID)
NB12¹ | 4 | Single copy | Bone marrow metastasis, relapse | Yes | Yes (HS0502)
NB67¹ | 4 | Single copy | Bone marrow metastasis, remission |  | Yes (HS0499)
NB12-2¹ | 4 | Single copy | Bone marrow metastasis, relapse | Yes | Yes (HS1041)
NB88L1² | 4 | Single copy | Bone marrow metastasis, relapse | Yes | Yes (HS0382)
NB88R2² | 4 | Single copy | Bone marrow metastasis, relapse | Yes | Yes (HS0627)
NB122R³ | 4 | Single copy | Bone marrow metastasis, relapse | Yes | Yes (HS1040)
NB122L³ | 4 | Single copy | Bone marrow metastasis, relapse | Yes | Yes (HS1151)
Sample
INSS
Stage
MYCN
Description
Human
Exon Array
NB100
4
Amplified
Yes
NB1284
4
Amplified
NB1534
4
Amplified
NB121
4
Amplified
FS210
Normal
Single copy
FS248
Normal
Single copy
FS253
Normal
Single copy
FS225
Normal
Single copy
FS227-P1
Normal
Single copy
FS227-P2
Normal
Single copy
FS229
Normal
Single copy
FS230
Normal
Single copy
Brain
metastasis,
relapse
Bone marrow
metastasis,
diagnosis
Primary tumor,
postchemotherapy
Bone marrow
metastasis,
diagnosis
Neural crest
stem cell-like
SKPs
Neural crest
stem cell-like
SKPs
Neural crest
stem cell-like
SKPs
Neural crest
stem cell-like
SKPs
Neural crest
stem cell-like
SKPs
Neural crest
stem cell-like
SKPs
Neural crest
stem cell-like
SKPs
Neural crest
stem cell-like
SKPs
Yes
RNASequencing
(Library ID)
Yes
(HS1149)
Yes
(HS1241)
Yes
(HS1593)
Yes
(HS1042)
Yes
(HS1043)
Yes
(HS1150)
Yes
Yes
Yes
Yes
Yes
Table 3.2 List of RNA sequencing libraries and their sequencing statistics
Messenger RNA from NBL TICs, SKPs, and a compendium of cancer tissues was
sequenced on an Illumina Genome Analyzer. The reads were aligned to the human reference
genome build hg18 (National Center for Biotechnology Information Build 36) and a database
of exon junctions [149] using MAQ software version 0.7.1 in paired-end mode [309]. The
duplicate reads were retained for this analysis. The median read length for each library is
provided in column 3, and the total amount of aligned sequence is provided in column 4.
Adapted by permission from the American Association for Cancer Research: O. Morozova,
M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard,
R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S.
Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis
of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for
neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
Library ID | Tissue source | Median read length, bp | Aligned sequence, bp
HS0382 | Neuroblastoma TICs | 42 | 6,739,934,150
HS0627 | Neuroblastoma TICs | 36 | 3,985,563,940
HS0499 | Neuroblastoma TICs | 42 | 2,476,237,136
HS0502 | Neuroblastoma TICs | 42 | 2,215,675,756
HS1040 | Neuroblastoma TICs | 50 | 3,729,063,600
HS1041 | Neuroblastoma TICs | 50 | 4,313,933,000
HS1149 | Neuroblastoma TICs | 75 | 5,822,325,000
HS1151 | Neuroblastoma TICs | 50 | 3,374,707,500
HS1241 | Neuroblastoma TICs | 50 | 3,214,430,500
HS1593 | Neuroblastoma TICs | 50 | 5,252,354,800
HS1042 | SKPs | 50 | 4,209,705,900
HS1043 | SKPs | 50 | 4,453,217,800
HS1151 | SKPs | 50 | 3,374,707,500
HS0299 | Breast cancer cell line | 36 | 1,119,888,180
HS0327 | Ovarian tumor | 42 | 1,415,520,768
HS0419 | Breast cancer cell line | 36 | 1,177,457,184
HS0445 | Breast cancer cell line | 36 | 1,828,856,952
HS0462 | Ovarian tumor | 36 | 1,915,337,292
HS0463 | Ovarian tumor | 39 | 856,303,704
HS0464 | Ovarian tumor | 42 | 868,958,028
HS0465 | Ovarian tumor | 42 | 1,058,404,872
HS0466 | Ovarian tumor | 42 | 1,295,742,888
HS0467 | Ovarian cancer cell line | 39 | 775,075,260
HS0468 | Ovarian tumor | 36 | 2,698,323,072
HS0469 | Ovarian tumor | 42 | 1,374,384,648
HS0470 | Ovarian tumor | 36 | 2,285,498,720
HS0471 | Ovarian tumor | 42 | 1,393,590,120
HS0511 | Breast tumor | 36 | 6,236,709,588
HS0644 | Lymphoma | 36 | 5,816,275,584
HS0652 | Lymphoma | 36 | 2,332,176,480
HS0663 | Lung tumor | 42 | 283,413,396
HS0701 | Ovarian tumor | 46 | 1,074,023,336
HS0702 | Ovarian tumor | 50 | 1,830,544,072
HS0703 | Ovarian tumor | 36 | 1,422,457,368
HS0706 | Lung tumor | 36 | 1,459,970,088
HS0708 | Ovarian tumor | 36 | 1,760,040,532
HS0709 | Oligodendroglioma cell line | 36 | 2,741,182,488
HS0724 | Blood from a cancer patient | 42 | 2,119,757,136
HS0727 | Lung tumor | 42 | 2,421,817,104
HS0728 | Lung tumor | 42 | 4,112,016,636
HS1085 | Oligodendroglioma tumor | 50 | 5,024,879,600
HS1086 | Oligodendroglioma tumor | 50 | 6,907,718,400
HS1400 | Metastatic adenocarcinoma tumor | 50 | 12,806,918,200
Table 3.3 Proteins detected in the whole and crude membrane cell extract of NBL TIC
line NB88R and their corresponding RNA-Seq expression level
Protein detection was done as described in Methods. Briefly, at least two peptides per
protein were identified with a probability threshold of 95% or greater (membrane-enriched
fraction) or 90% or greater (whole cell lysate) as derived by the Peptide Prophet algorithm [310],
and an overall protein identification probability of >95.0% or >90%, respectively, as derived by
the Protein Prophet algorithm [311], was required. In other words, the 95% CI cutoff for the
membrane-enriched fraction represents a 95% or greater likelihood of each protein being identified
correctly. Similarly, the lowered threshold of 90% CI used for the whole cell lysate
represents a 90% or greater likelihood of each protein being identified correctly. The threshold
was lowered for whole cell lysate analysis due to the lower sensitivity of this assay for
protein identification [314]. Adapted by permission from the American Association for
Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely,
A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N.
Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan,
M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates
AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep
15;16(18):4572-82.
NBL TIC-enriched gene | Average expression level in NBL TICs | Protein product type
HNRNPU | 998 | Transporter
SFXN1 | 140 | Transporter
KPNB1 | 652 | Transporter
NUP210 | 605 | Transporter
NUP214 | 231 | Transporter
SLC7A6 | 232 | Transporter
XPO5 | 212 | Transporter
NUP107 | 107 | Transporter
SLC1A4 | 279 | Transporter
NUP88 | 98 | Transporter
HNRNPM | 364 | Transmembrane receptor
HNRNPD | 379 | Transcription regulator
FUBP1 | 271 | Transcription regulator
HMGB2 | 306 | Transcription regulator
TAF15 | 278 | Transcription regulator
SFRS2 | 505 | Transcription regulator
GTF2I | 129 | Transcription regulator
HTT | 531 | Transcription regulator
EPB41 | 229 | Plasma membrane protein
PSME3 | 287 | Peptidase
USP10 | 118 | Peptidase
LMNB1 | 292 | Other
SFRS1 | 526 | Other
TMPO | 304 | Other
HNRNPH1 | 628 | Other
CPSF6 | 189 | Other
HNRNPR | 257 | Other
SFPQ | 588 | Other
IMMT | 151 | Other
SSRP1 | 369 | Other
NUP93 | 121 | Other
PCNA | 229 | Other
CYFIP2 | 535 | Other
CEP72 | 17 | Other
NOLC1 | 285 | Other
LARP1 | 750 | Other
STRBP | 134 | Other
ANKRD44 | 76 | Other
CLN6 | 76 | Other
Membrane-Enriched Fraction (95% CI)
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Whole Cell Lysate (90% CI)
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
NBL TIC-enriched gene | Average expression level in NBL TICs | Protein product type
WDR77 | 117 | Other
MKI67 | 1253 | Other
SUPT16H | 268 | Nuclear protein
RFC5 | 57 | Nuclear protein
PARP1 | 368 | Enzyme
NNT | 212 | Enzyme
MCM7 | 374 | Enzyme
FH | 95 | Enzyme
ATIC | 227 | Enzyme
MTHFD1 | 188 | Enzyme
PAICS | 357 | Enzyme
MCM2 | 323 | Enzyme
MCM6 | 188 | Enzyme
TRAP1 | 242 | Enzyme
GART | 205 | Enzyme
MCM4 | 390 | Enzyme
GOT2 | 202 | Enzyme
KARS | 203 | Enzyme
MCM3 | 346 | Enzyme
RRM2 | 425 | Enzyme
MARS | 266 | Enzyme
UBE2N | 119 | Enzyme
LBR | 299 | Enzyme
TOP2A | 586 | Enzyme
MRPL37 | 123 | Enzyme
SCLY | 70 | Enzyme
DARS2 | 79 | Enzyme
DHTKD1 | 133 | Enzyme
POLR1A | 195 | Enzyme
RFC3 | 99 | Enzyme
FEN1 | 133 | Enzyme
MCCC1 | 71 | Enzyme
TARS2 | 71 | Enzyme
GPHN | 77 | Enzyme
RRM1 | 218 | Enzyme
SUPV3L1 | 59 | Enzyme
Membrane-Enriched Fraction (95% CI)
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Whole Cell Lysate (90% CI)
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Table 3.4 Known drug targets among NBL TIC-enriched genes
Transcripts enriched in NBL TICs are in bold. The drug-target associations were obtained
from the Ingenuity Knowledgebase (Ingenuity Systems, www.ingenuity.com). The drugs
previously or currently used in NBL (based on literature review, Ingenuity Knowledgebase,
or ClinicalTrials.gov; http://www.clinicaltrials.gov/ as of February, 2010) are underlined.
Adapted by permission from the American Association for Cancer Research: O. Morozova,
M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard,
R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S.
Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis
of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for
neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
Gene symbol | Drug
ADORA2A | Caffeine-containing drugs, adenosine, istradefylline, dyphylline, binodenoson, regadenoson, aminophylline, clofarabine, theophylline
AURKB | AZD-1152
PLK1 | BI2536
PDE7A | Dyphylline, nitroglycerin, aminophylline, anagrelide, milrinone, dipyridamole, tolbutamide, theophylline, pentoxifylline
TYMS | Flucytosine, plevitrexed, nolatrexed, capecitabine, floxuridine, LY231514, 5-fluorouracil, trifluridine
PRIM1 | Fludarabine phosphate
POLE3 | Gemcitabine
RRM1 | Fludarabine phosphate, gemcitabine, clofarabine
RRM2 | Triapine, hydroxyurea, fludarabine phosphate, gemcitabine
PARP1 | INO-1001
GART | LY231514
POLE | Nelarabine, gemcitabine, clofarabine, trifluridine
TOP2A | Novobiocin, CPI-0004Na, pixantrone, elsamitrucin, AQ4N, BN 80927, tafluposide, norfloxacin, tirapazamine, TAS-103, gatifloxacin, valrubicin, gemifloxacin, nemorubicin, nalidixic acid, epirubicin, daunorubicin, etoposide, doxorubicin, moxifloxacin, becatecarin, mitoxantrone, dexrazoxane
BCL2 | Oblimersen, (-)-gossypol, obatoclax, G3139
SLC1A4 | Riluzole
ODC1 | Tazarotene, eflornithine
IL6 | Tocilizumab
ERBB2 | Trastuzumab, BMS-599626, ARRY-334543, XL647, CP-724714, HKI-272, lapatinib, erlotinib
HDAC1 | Tributyrin, PXD101, pyroxamide, MGCD0103, FR901228, vorinostat
ITGA2B | Abciximab, TP-9201, eptifibatide, tirofiban
TNF | Adalimumab, infliximab, CDP870, golimumab, thalidomide, etanercept
NR3C1 | Corticosteroid-containing drugs (beclomethasone dipropionate)
CXCL10 | MDX-1100
TERT | GRN163L
TUBA1C | Colchicine/probenecid, XRP9881, E7389, AL-108, EC145, NPI-2358, milataxel, TTI-237, vinflunine, podophyllotoxin, colchicines, epothilone B, TPI 287, docetaxel, vinorelbine, vincristine, vinblastine, paclitaxel, ixabepilone
CDC2 | Flavopiridol
COL14A1 | Collagenase
TNFRSF10B | CS-1008
FYN | Dasatinib
ALAD | δ-Aminolevulinic acid
Chapter 4: Whole genome characterization of primary neuroblastoma
tumors reveals a wide spectrum of somatic alteration⁴
4.1 Introduction
In Chapters 2 and 3 of this thesis, I reported on the analysis of the expression profiles
of normal and malignant neural crest stem cell-like cells, respectively. These analyses
revealed a number of genes and pathways, such as those involved in DNA double-stranded
break repair, to be aberrantly expressed in metastases-derived NBL TICs, and implicated
Aurora kinase B as a novel drug target against NBL TICs (Chapter 3). Exon-level analysis of
RNA-Seq data described in Chapter 3 provided a potential mechanistic avenue to account for
the sensitivity of NBL TICs, but not normal cells, to AURKB inhibition. Subsequent work by
others has confirmed Aurora kinase B to be a drug target against NBL in primary tumors
[306].
The overall objective of Chapter 4 is to conduct a high resolution characterization of a
panel of primary NBL tumors using next-generation sequencing approaches with a goal of
identifying additional drug targets for the disease that are relevant to primary tumors at
diagnosis. In particular, in this Chapter I address two specific aims listed below. First, I
address whether primary NBL tumors harbor recurrently mutated genes. Second, I
investigate whether the genetic aberrations found in primary tumors recurrently target the
same signaling pathways. To accomplish these aims, we developed a strategy that uses a
combination of next-generation sequencing approaches to comprehensively characterize 99
primary NBL tumor DNA samples and matched peripheral blood DNA samples used as
normal reference material (Figure 4.1). We used Illumina whole exome sequencing to
4
A version of the Chapter is in revision, and the co-author contributions are detailed in the Preface as per the
University of British Columbia PhD thesis guidelines T.J. Pugh*, O. Morozova*, E.F. Attiyeh, S. Asgharzadeh,
J.S. Wei, D. Auclair, K. Cibulskis, M.S. Lawrence, A.H. Ramos, E. Shefler, A. Sivachenko, C. Sougnez, I.
Birol, R.D. Corbett, K.L. Mungall, Y. Zhao, R.A. Moore, N. Thiessen, A. Lo, R. Chiu, S.D. Jackman, A. Ally,
B. Kamoh, A. Tam, J. Qian, M. Krzywinski, M. Hirst, S.J. Diskin, Y.P. Mosse, K.A. Cole, M. Diamond, R.
Sposto, L. Ji, T. Badgett, W.B. London, Y. Moyer, J.M. Gastier-Foster, M.A. Smith, J.M. Guidry Auvil, D.S.
Gerhard, M.D. Hogarty, S.B. Gabriel, S.J.M. Jones, G. Getz, R.C. Seeger, J. Khan, M.A. Marra, M. Meyerson,
J.M. Maris. The genomic landscape of high-risk neuroblastomas reveals a wide spectrum of somatic mutation.
In Revision. *Authors contributed equally.
137
characterize 81 tumor/normal pairs; Illumina whole genome and transcriptome sequencing to
characterize 10 tumor/normal pairs; and Complete Genomics, Inc. (CGI) whole genome
sequencing to characterize another 10 tumor/normal pairs. Among these samples, we
included one case that was studied by both whole exome and whole genome sequencing
using Illumina, and another case that was studied by whole genome sequencing using both
CGI and Illumina. The Illumina and CGI sequencing technologies are discussed in Section
1.6.3.1.
This study reports on the application of second-generation sequencing to the
characterization of high-risk NBL. The 99 NBL cases included in this analysis were collected
and characterized as part of the Therapeutically Applicable Research to Generate Effective
Targets (TARGET) initiative (http://target.cancer.gov). The TARGET initiative is a pediatric
branch of The Cancer Genome Atlas, and was designed to identify molecular targets for
pediatric cancer drug development. All sequence data have been deposited in dbGAP
(http://www.ncbi.nlm.nih.gov/gap) and six-letter codes are used to identify individuals in this
database throughout the text. The clinical details of the cases are listed in Appendix D.
4.2 Results
4.2.1 Exome sequencing
Exome sequencing was used to survey the frequency of coding sequence mutations in
81 high-risk NBL tumor/normal pairs, one of which was also included in the set of 19 whole
genome sequences described below (Figure 4.1). DNA was extracted and amplified, and ~33 Mb
of genomic sequence was captured by in-solution hybridization [315], followed by Illumina
sequencing [316]. The target regions consisted of 193,094 exons from 18,863 genes
annotated by the Consensus Coding Sequences [317] and RefSeq [318] databases as coding
for protein or micro-RNA (accessed November 2010). On average, 10.7 Gb of unique
sequence data were generated for each sample, of which 58% were aligned to the target
exome using Burrows-Wheeler Aligner [319] (84% if bases within 250 bp of each target
were included), resulting in median coverage of 191X of each on-target base. On average,
90% of targeted bases were suitable for mutation detection (> 14 reads in the tumor and > 8
reads in the normal) using the muTect algorithm [108,320,91]. A total of 14% of exons had
fewer than 90% of bases assessable for mutation in at least 73 of 81 exome pairs (90%),
apparently due to systematic capture or sequencing problems related to GC-content.
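
The notion of a base being "suitable for mutation detection" can be illustrated with a small sketch. The read-depth thresholds (> 14 reads in the tumor, > 8 in the normal) are those stated above; the function name and the per-position depths are hypothetical, and this is not the muTect implementation.

```python
def callable_fraction(tumor_depths, normal_depths, min_tumor=14, min_normal=8):
    """Fraction of target positions with sufficient depth in both samples.

    A position counts as assessable when it is covered by more than
    `min_tumor` reads in the tumor and more than `min_normal` reads in
    the matched normal, following the thresholds quoted in the text.
    """
    if len(tumor_depths) != len(normal_depths):
        raise ValueError("depth lists must cover the same positions")
    if not tumor_depths:
        raise ValueError("no positions supplied")
    callable_count = sum(
        1
        for tumor, normal in zip(tumor_depths, normal_depths)
        if tumor > min_tumor and normal > min_normal
    )
    return callable_count / len(tumor_depths)


# Hypothetical read depths at four targeted positions.
tumor = [20, 14, 30, 5]
normal = [10, 10, 8, 9]
fraction = callable_fraction(tumor, normal)
```

In this toy example only the first position passes both thresholds, so a quarter of the targeted bases would be counted as assessable.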
4.2.2 Whole genome and transcriptome sequencing
Genome sequencing of tumor and matched normal DNA was used to explore the
spectrum of somatic sequence and structural aberration present in 19 high-risk neuroblastoma
cases. To survey the fraction of rearrangements that are expressed in the transcriptome, we
also generated over 10 Gb of RNA-Seq data in 10 out of 19 cases. To account for potential
biases imposed by the new sequencing platforms, we used two different genome sequencing
approaches, Illumina [316] (10 cases) and CGI [321] (10 cases); one case, PASLGS, was
sequenced using both methods.
Ten tumor and normal genomes were sequenced to an average 29.7X haploid
coverage using Illumina technology, while another set of ten tumor and normal genomes
were sequenced to an average 59.9X haploid coverage using CGI technology. The coverage
in the Illumina-derived data set permitted single nucleotide variant (SNV) detection at over
86% of positions in the reference genome (hg19), and over 74% of the coding sequence, as
defined by the exome experiment. Similarly, the coverage in the CGI-derived data set allowed for
the SNV analysis of 86% of the reference genome, and 94% of the coding sequence. The
difference in the fraction of coding sequence accessible to SNV analysis on the two platforms could be
attributed to the different depths of sequencing used in each approach (average 29.7X and
average 59.9X for the Illumina and CGI platforms, respectively). The analysis of the
common case, PASLGS, revealed that the somatic non-silent mutation rates per Mb of
coding sequence computed separately for CGI and Illumina data were 0.58 and 0.66,
respectively, and the two methods detected 9 somatic exomic mutations in common.
As both exome and genome data were generated for PANYGR using Illumina
technology, we had the opportunity to compare variants detected using both approaches on a
single sample. The mutation rates in the coding region derived from these data sets were 0.59
and 0.65 non-silent mutations per Mb, in the exome and genome respectively, and the two
methods detected 27 somatic exomic mutations in common. One additional variant called
from the exome data was not detected by the genome analysis due to low coverage in the
tumor genome at this position (4X), although the mutant allele was supported by one read.
As the two methods were concordant for all somatic exomic mutations at adequately covered
positions, we combined and directly compared mutation calls from exome and genome data sets for subsequent analyses.
4.2.3 Overall mutation frequencies
Across the coding regions of 99 tumor/normal pairs (80 exome, 18 genome, 1 exome
and genome), we detected 2,500 candidate somatic mutations in 2,105 genes (Appendix E).
A median of 20 candidate exomic mutations was found per tumor (range 3-236), of which 16
were predicted to cause an amino acid or splicing change (non-silent mutations) (range 3-171)
(Figure 4.2A). This corresponded to a median non-silent mutation frequency of 0.56
mutations per Mb, after correction for the number of bases with sufficient data for mutation
detection (Figure 4.2A). This is one of the lowest median mutation frequencies reported to
date [116,322], and is similar to the 0.4 non-silent
mutations/Mb recently reported for medulloblastoma [99], another pediatric solid tumor.
Synonymous mutations were relatively few compared to non-silent changes, suggesting a
low rate of putative passenger events. We did not observe a correlation between mutation
frequency and age of diagnosis, MYCN amplification status, or other prognostically relevant
clinical or genomic variables (q > 0.015).
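
The per-megabase normalization mentioned above amounts to dividing a mutation count by the number of bases with sufficient data, expressed in Mb. A minimal sketch follows; the function name and the example numbers are hypothetical and do not correspond to any specific case in this study.

```python
def mutations_per_mb(n_mutations, callable_bases):
    """Mutation frequency per megabase of sufficiently covered sequence."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_mutations / (callable_bases / 1_000_000)


# Hypothetical tumor: 16 non-silent mutations over 32 Mb of callable sequence.
rate = mutations_per_mb(16, 32_000_000)
```

Normalizing by callable rather than targeted bases keeps poorly covered samples from appearing to have artificially low mutation frequencies.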
The rates of transitions (substitutions that change purines to purines or pyrimidines to
pyrimidines) and transversions (substitutions that change purines to pyrimidines or vice
versa) in NBL differed greatly from those found in cancers with known environmental
contributions. For instance, over 90% of mutations in melanoma are C>T/G>A transitions
associated with ultraviolet light exposure [112] while smoking-associated C>A/G>T
transversions make up 46% of mutations in small cell lung cancer [110]. By comparison, the
C>T/G>A transitions and C>A/G>T transversions comprised 29% and 36% of all mutations
in NBL, respectively. These rates were consistent with current hypotheses of limited
environmental contribution to NBL development [174].
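
The classification of substitutions used in this paragraph can be made concrete with a short sketch. The purine/pyrimidine definitions follow the text; the function names are hypothetical, and collapsing complementary changes (for example reporting G>A as C>T) is shown as one common convention rather than as the procedure used in this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    same_ring_type = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_ring_type else "transversion"


def strand_collapsed(ref, alt):
    """Report the change relative to a pyrimidine reference base."""
    ref, alt = ref.upper(), alt.upper()
    if ref in PURINES:
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"
```

Under this convention the C>T/G>A class is counted once as `C>T` (a transition) and the C>A/G>T class once as `C>A` (a transversion).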
While the mutation rate was low across most tumors, 2 tumors had markedly
increased non-silent mutation rates (greater than the third quartile plus 3.5 times the
interquartile range, i.e. Q3 + 3.5*IQR) (Figure 4.2). This threshold for hypermutated
samples was selected to identify only extreme outliers, and is more stringent than that used in the TCGA
study that reported on hypermutated glioblastoma multiforme (GBM) tumors [91]. Both
hypermutated tumors contained alterations that may explain accumulation of somatic
mutations. Specifically, PAPPKJ harbored a deletion of one copy and a nonsense mutation of
the other copy of the DNA mismatch-repair gene MLH1, likely resulting in a complete loss
of this protein. Similarly, PALJPX contained a heterozygous nonsense mutation in the DNA
nucleotide excision repair (NER) gene damaged DNA-binding protein 1 (DDB1), which is
involved in maintaining genome integrity and preventing the accumulation of DNA lesions in
replicating cells [323]. The loss of the fission yeast DDB1 ortholog ddb1 results in a
hypermutator phenotype in yeast [324]. In addition, the knockdown of the Drosophila DDB1
ortholog D-DDB1 in wing imaginal discs produces a genome instability phenotype in
somatic cells [325]. These observations suggest that the hypermutator phenotype in the NBL
PALJPX case may be explained by the heterozygous nonsense mutation in DDB1. However,
as the yeast experiments were conducted in haploid organisms, there is no direct evidence
that the haploinsufficiency for DDB1 observed in the NBL tumor is sufficient to drive
hypermutation. No nonsense mutations in DDB1 have been reported in the COSMIC
database as of February 2012 [326]. The finding of hypermutation in NBL may represent a
new subtype of this disease, and future studies will be needed to determine whether unique clinical
features are associated with this aberration.
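
The outlier rule used to define hypermutated samples (Q3 + 3.5*IQR) can be sketched as follows. The quartile convention is an assumption, since the text does not state which one was used; the function name and the mutation rates in the example are hypothetical.

```python
from statistics import quantiles


def hypermutated(rates, k=3.5):
    """Flag samples whose rate exceeds Q3 + k * IQR.

    Assumption: quartiles are computed with the 'inclusive' method of
    statistics.quantiles; other conventions shift the threshold slightly.
    """
    q1, _, q3 = quantiles(rates, n=4, method="inclusive")
    threshold = q3 + k * (q3 - q1)
    return threshold, [rate for rate in rates if rate > threshold]


# Hypothetical non-silent mutation rates (mutations per Mb) for six tumors.
example_rates = [0.2, 0.4, 0.5, 0.6, 0.8, 6.0]
threshold, flagged = hypermutated(example_rates)
```

With a multiplier of 3.5 only the sample with a rate of 6.0 is flagged in this toy data set, which illustrates how conservative the rule is compared with the conventional 1.5*IQR boxplot criterion.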
4.2.4 Verification of candidate somatic mutations using orthogonal approaches
A total of 438 candidate somatic SNVs identified by whole genome sequencing were
selected for verification by Sanger and/or Illumina sequencing, performed at CGI. Three
hundred and seventy nine variants were confirmed using at least one of these approaches,
corresponding to a 90% verification rate when failed assays were accounted for. In addition,
224 candidate somatic mutations were selected for verification with Sequenom genotyping
[327], conducted at the Broad Institute. The Sequenom method is based on distinguishing
allele-specific primer extension products by mass spectrometry, and was originally
developed for germline genotyping [328]. One hundred and sixteen of the 224 sites (52%)
were confirmed by this approach. Two main reasons may account for the lower verification
rate of the Sequenom experiment as compared to the sequencing verification experiment
conducted at CGI. First, since the Sequenom assay was originally developed for germline
analysis, it is poor at detecting mutations present at low allelic frequencies [116], as may be
the case in heterogeneous cancer samples. Second, whole genome amplification reactions,
which were used in the exome sequencing experiment, may have introduced artifacts,
resulting in a higher fraction of false positives in these data compared to genome sequencing
data, which did not involve whole genome amplification. All somatic mutations that are
explicitly described in the text have been verified, and the methods used to confirm each
mutation are listed in Table 4.1.
4.2.5 Genes and pathways with significant frequency of mutation
We identified 8 genes mutated at a significant frequency (q < 0.2) in the 99 tumors
using the MutSig algorithm [329] (Table 4.2). Of these, only five genes remained significant
(q < 0.2) when the hypermutated samples were excluded from the analysis (Table 4.2). The
MutSig algorithm tests the null hypothesis that all the observed mutations in each gene are a
consequence of random background mutation. Genes for which this hypothesis is rejected
based on the Benjamini-Hochberg false discovery rate-corrected q-value are considered
significantly mutated [329], such that mutations in these genes likely contribute to the
malignant phenotype. Due to the very low mutation rate in NBL, we chose the BH false
discovery rate-based q-value threshold of 0.2, which implies that we allowed for 20% of false
positives in our data (1 in 5 MutSig hits). While mutations in ALK and PTPN11 were
previously known in NBL [188,194,193,192,330], the remaining 6 candidate genes are newly
reported in this disease. Using the available RNA-Seq data from 10 cases as well as
published RNA-Seq data from neural-crest-like cells [244], also discussed in Chapter 3, we
determined that PGLYRP3, GABRA6 and IGSF11 were not expressed in either normal neural
crest-like cells or NBL cells. Since a key goal of this study and the TARGET consortium is
to identify genetic alterations that could potentially be targeted by drugs, we focused our
analysis on four genes significantly mutated in non-hypermutated samples and expressed in
NBL: ALK, PTPN11, LILRB1 and NRAS.
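
The false discovery rate correction referred to above can be illustrated with a generic Benjamini-Hochberg sketch. It shows only the conversion of per-gene P-values into q-values; it does not reproduce the background mutation model of MutSig, and the function name and the P-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that each q-value
    # is never larger than the q-value of the next-ranked test.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical per-gene P-values.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [value < 0.2 for value in q]
```

With a threshold of q < 0.2, as used in this chapter, roughly one in five of the genes called significant is expected to be a false positive.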
ALK was previously reported to be mutated in up to 10% of NBL cases
[192,194,188,193] consistent with our unbiased screen here showing 9 cases with a somatic
ALK mutation, all restricted to the kinase domain. A tenth case, PANYGR, harbored a
germline variant also in the ALK kinase domain (clinically-associated dbSNP rs113994092,
described as a pathogenic allele), heterozygous in tumor and normal samples from this
patient. Mutations in the ALK oncogene were mutually exclusive with mutations in the other genes mutated at a
statistically significant frequency, and were independent of the MYCN amplification status (3
MYCN-amplified, 7 non-amplified).
Germline and somatic PTPN11 mutations have been reported in NBL [330] and both
somatic mutations observed in this study were located at residues frequently mutated in
juvenile myelomonocytic leukemia [331] and individuals with Noonan syndrome [332]. No
pathogenic germline PTPN11 variants were found as part of the study.
Another MutSig hit, LILRB1 is an inhibitory cell-surface immunoglobulin-like
receptor that has been reported to limit the activation of the mTOR pathway through the
activation of SHP2 [333], the protein encoded by PTPN11. Therefore, loss of SHP2
regulation by LILRB1 could in theory have a similar oncogenic effect as activating mutations
of PTPN11, as such mutations could potentially lead to the constitutive activation of the
RAS/MAPK signaling. The case PALTEG contained a splice site mutation that is predicted
to disrupt the splice consensus sequence. Two additional mutations fell within the
immunoglobulin domain: a nonsense alteration in PALZZV and a missense change in
PANUKV. In addition to PTPN11 and LILRB1, which are upstream regulators of the
RAS/MAPK pathway, five Cancer Gene Census genes in the MAPK signaling pathway
(KEGG hsa04010) appeared to harbor somatic mutations in six tumors. These mutations
were in receptor tyrosine kinases EGFR, NFKB2, NTRK1, PDGFRB, and the downstream
target NRAS (2 tumors), which was also identified by MutSig as significantly mutated in our
study. A recent report suggested that PTPN13 may also function in the MAPK pathway [334]
and translocations or mutations of PTPN12 and PTPN13 were identified in 3 samples, albeit
one of which was hypermutated. Overall, MAPK pathway oncogenes were mutated in 15%
of high-risk NBL cases studied (Figure 4.2B).
Chromatin remodeling genes appeared to be frequently disrupted in NBL as 18
histone-modifying genes harbored coding somatic mutations: nonsense mutations in
CREBBP, CHD8, KDM6A, MLL4; and missense mutations in EP300, ARID1A (2 tumors),
ARID1B, ASH1L, CHD6, HDAC4, KDM5A, MLL3, MLL5, NUP98, PAX5, PRDM2, and
PRDM4. These genes encode characterized histone acetyltransferases (CREBBP and EP300),
DNA helicases (CHD6, CHD8, ARID1A), histone demethylases (KDM5A and KDM6A),
histone methyltransferases (ASH1L, MLL3, MLL4, MLL5, PRDM2), histone deacetylase
(HDAC4), and other proteins involved in chromatin remodeling (ARID1B, NUP98, PAX5,
PRDM4). More than half (9/17) of the chromatin remodeling genes mutated in NBL were
annotated in the Database for Annotation, Visualization, and Integrated Discovery (DAVID)
[335] as positive regulators of transcription. Five chromatin remodeling genes were mutated
in tumors with ALK mutations. Intriguingly, loss of function mutations in chromatin
modifiers have been reported in lymphoma [102,106], bladder cancer [336], and other tumors
[337]. Overall, a potential defect in chromatin remodeling was identified in 11% of high-risk
NBL cases studied (Figure 4.2B).
4.2.6 Genome rearrangements and structural variants
We used the trans-ABySS de novo assembly pipeline for Illumina sequencing data
[297] to search for expressed rearrangements affecting genes; each of these events was
confirmed by local re-assembly of the genomic reads [338]. In parallel, the CGI structural
variation pipeline was used to detect candidate structural variants in the CGI genomes [110].
In total, 83 distinct events affecting 97 genes were identified using the two approaches in the
19 neuroblastoma cases, including 22 expressed events found in the RNA-Seq data; and a
median of 4 structural variants (0 to 14 events range) was detected per tumor genome. The
genomic architectures of 19 cases with available genome sequencing data are plotted in
Figure 4.3 using CIRCOS [339]. The notable structural variants are summarized in Table 4.3.
We found 4 distinct somatic translocations between chromosomal arms 11q and 17q
that are commonly affected by numerical alterations in NBL [196]. These four somatic
events occurred in three cases, PARGUX (2 events), PASCKI and PANNMS, and involved
different genes and breakpoints in each case. Notably, one of the t(11;17) translocations in
PARGUX is predicted to disrupt the function of IKZF3, an Ikaros DNA-binding protein 3
involved in chromatin remodeling, and previously implicated in chronic lymphocytic
leukemia [340]. Another chromatin remodeling gene, ARID1B, was targeted by a somatic
~30 kb deletion that removed exon 2 in PASLGS, and appeared to be loss-of-function.
Members of the MAPK pathway were also the target of somatic structural change, a
MAPK10/PRDM5 fusion and a PRELID2/MAPK9 fusion, both resulting from
intrachromosomal deletions but with unknown frame effects due to multiple transcript
annotations for these genes. Other cancer genes affected by somatic structural variants
included ABL2 which was fused out of frame with ACBD6 and harbored a somatic missense
mutation in this study; STAG1, a p53 pathway member recently implicated as a target of
translocations in several cancers [341,342]; cadherins CDH13 and CDH18; and NOP2 and
AUTS2, both known translocation targets in acute lymphoblastic leukemia [343].
Two loci were affected recurrently by structural variants in two cases: the
transcriptional repressor ZFHX3 (ATBF1) that has been shown to function as a tumor
suppressor in several cancers [344], and CDKAL1, which encodes CDK5 regulatory subunit
associated protein 1-like 1. Neither of these genes has been implicated in NBL, and their potential role in this
disease warrants further investigation. The NBAS (neuroblastoma amplified sequence) locus
located 0.4 Mb from MYCN appeared to be most commonly affected by rearrangements in
our cohort of MYCN-amplified cases, harboring 11 distinct rearrangements in three cases
PASDZJ, PARSHT, and PARIRD. The NBAS-rearranged cases were associated with an
increased copy number at the NBAS locus, and with more than 2-fold increase of the NBAS
mRNA compared to the wild type NBAS cases, consistent with the previous observation that
a fraction of MYCN-amplified cases involves co-amplification of NBAS.
4.2.7 Mutations in other known cancer genes and regions
Beyond ALK, PTPN11, and NRAS, no cancer genes listed in the Cancer Gene Census
[7] had mutation frequencies that rose to the level of statistical significance (q < 0.2). In
addition to the mutations in gene sets noted above, 14 genes listed in the Cancer Gene
Census were mutated across 12 samples (Table 4.1; Figure 4.2B). Mutations in 2 of these
genes, ATM and PIK3CA, matched a mutation listed in COSMIC [100]. Two MYC family
members, MYC and MYCN, were mutated in two NBL tumors lacking MYCN amplification.
The SIFT algorithm [345] predicted both variants to be deleterious (score < 0.01) and the
MYCN mutation was previously reported in glioblastoma [346], suggesting that it may confer
selective advantage to malignant cells. We also detected and validated a fusion of MYCN
with GULP1 that retained the reading frame and may be activating. Therefore, it appears that
several mechanisms exist to promote MYC signaling in NBL beyond amplification.
Among the genes that harbored protein sequence-altering mutations in two or more
non-hypermutated cases, there were several genes that mapped to known chromosomal
regions frequently altered somatically and of clinical significance [197]. Known
chromosomal regions of clinical significance in NBL are described in detail in Section
1.9.2.1 and include losses of chromosomal arms 1p and 11q, and gains of chromosomal arm
17q. Out of the 99 cases analyzed in this study, 42 cases harbored a loss of 1p, 49 cases
harbored a loss of 11q, and 63 cases harbored a gain of 17q (Appendix D).
A single gene mapping to the 1p31-1p36 common deletion region, ARID1A, harbored
somatic non-synonymous mutations in two non-hypermutated cases. The ARID1A locus has
been implicated as a tumor suppressor in several adult and pediatric solid tumors by both
genomic and functional evidence [347,348]. Even though the two cases with non-silent
ARID1A mutations, PALXHW and PALNLU, each had two copies of 1p, both ARID1A
mutations (Table 4.1) were missense homozygous changes predicted to be damaging to the
protein (score < 0.05) by the SIFT algorithm [349].
The genes CATSPER1, AHNAK, PITPNM1, and SORL1, mapping to the 11q region
commonly deleted in NBL, were each mutated in two non-hypermutated cases. All six cases
with mutations in at least one of these genes harbored a loss of the chromosomal arm 11q.
These cases included PALHVD, PAINLH, PAPBZI, PAKFUY, PALFPI, and PALSAE
(Appendix D). Consistent with losing one copy of the 11q arm, all candidate mutations in
CATSPER1, AHNAK, PITPNM1, and SORL1 detected in this study were homozygous. While
CATSPER1 and PITPNM1 have not been reported previously to play a role in NBL or
cancer in general, the loss of AHNAK is seemingly associated with the radiosensitivity of
NBL cells [350], and SORL1 may be involved in the proliferation of NBL cell lines [351].
In addition, SCN4A, CRHR1, ABCA5, and IGF2BP1 mapped to 17q, commonly
gained in high-risk NBL. None of these loci have been previously implicated in NBL.
Finally, several genes on 11q (FAM86C1, RNF121, ATG2A and SHANK2) and 17q (IKZF3,
TRIM37 and BCAS3) appeared to be affected by the t(11;17) translocations. None of these
genes were affected by the translocations recurrently. All three cases, PASCKI, PARGUX
and PANNMS, with translocations between chromosomal arms 17q and 11q, harbored
concurrent losses of 11q and gains of 17q, suggesting that the unbalanced translocations
t(11;17) may account for gains of 17q and losses of 11q in a fraction of NBL tumors.
As described in Section 1.9.2, a GWAS has been conducted in NBL and
implicated common germline variants in FLJ22536, BARD1 and LMO1 as being associated with
the susceptibility to sporadic high-risk NBL [197,66,198,196]. The current study did not
detect somatic non-silent variants or somatic recurrent non-coding variants in any of these
loci in our cohort of 99 NBL cases. However, we did observe that all 10 cases with available
matched tumor and normal genome and tumor transcriptome sequencing data (Appendix D)
had novel intronic germline variants (single nucleotide substitutions) in BARD1 that were not reported in the 1000
Genomes Project data [352] or in any samples from non-cancerous tissues sequenced at the
Genome Sciences Centre [353]. These variants may be related to the aberrant splicing
observed at the BARD1 locus (Section 3.2.7).
4.3 Discussion
The survey of somatic mutations in primary NBL tumors described here has found this
cancer to have one of the lowest mutation frequencies among solid tumors examined to
date, similar to that of another pediatric cancer, medulloblastoma [99]. The mutations
identified in our study were distributed across a large number of genes as 88% of genes with
non-silent mutations were only mutated in 1 of the 99 tumors studied. In addition, non-silent
mutations were seen four times more often than silent mutations (1,735 non-silent versus 420
synonymous mutations in non-hypermutated samples), suggesting a selective pressure for
coding changes. This is unlike most adult cancers, where passenger mutations are much more
frequent than driver mutations [354]. Presumably, the low passenger mutation rate observed
in NBL reflects less environmental influence in this cancer compared to adult malignancies.
This is consistent with NBL typically arising at a very young age, with most cases diagnosed
before 5 years of age [174].
The genome sequence analysis described in this Chapter has been able to identify
candidate mechanisms involved in 51 of 99 neuroblastomas (ALK mutations, MAPK
pathway oncogene mutations, mutations in chromatin remodeling genes, mutations in MYC
family genes, mutations in Cancer Gene Census genes, as highlighted in Figure 4.2). While
sequencing of more NBL cases will provide increased power to discover additional recurrent
somatic events, the relative paucity of focal mutations discovered here challenges the general
concept that druggable targets and pathways can be defined in each patient by sequencing
approaches alone, at least in the somatic mutation space. Nonetheless, our data address the
overall objective of this Chapter and identify common vulnerabilities of primary NBL tumors
that may be exploited therapeutically. For instance, a subset of NBL patients may be
sensitive to the inhibition of ALK (9% of patients with high-risk NBL) and MAPK signaling
(15% of patients with high-risk NBL); and strategies that target these pathways can be
immediately prioritized for clinical development due to the known activating role of the
mutations in these pathways. In contrast, chromatin remodeling abnormalities, found in 11%
of patients with high-risk NBL, need to be further investigated before they can be targeted
clinically.
In Chapter 3, we conducted an expression analysis of NBL TICs and found that the
double-stranded break DNA repair pathway, involving the BRCA1/BARD1 complex was
expressed at a higher level in NBL TICs compared to normal neural crest-like cells and a
panel of other cancers. Aberrations in DNA repair may appear counter-intuitive, given the
low mutation rate in NBL, discussed in this Chapter. However, the observation is consistent
with the results of sequencing studies in breast cancers, many of which also possess a defect
in double-stranded break DNA repair [355]. In particular, albeit with some exceptions,
breast cancers do not typically harbor an increased frequency of somatic point mutations as
compared to other adult tumors, such as lung cancers or melanoma [354]. Instead, breast
cancer genomes often harbor a high frequency of large chromosomal aberrations [356] that
may be associated with either a deficiency of homologous recombination (e.g. loss of function
of BRCA1 in familial breast cancer) or the hyperactivity of the BRCA1 signaling pathway
through gain of function mutations in BRCA1, seen in other types of breast tumors [355].
Similarly, despite the low rate of somatic point mutations, NBL tumors display a high
prevalence of large chromosomal alterations (chromosomal alterations affecting genes are
shown in Figure 4.3) suggesting that aberrations in double-stranded break DNA repair may
play a role in this disease. Increased expression of the BRCA1 pathway in NBL TICs further
suggests that hyperactivity of this pathway may be a factor in NBL, similarly to what is seen
in some cases of breast cancer [357].
Interestingly, we did not observe any somatic non-silent mutations in the BRCA1
pathway members, including BARD1, shown to be aberrantly spliced in NBL cells in
Chapter 3. We did, however, observe previously unreported intronic germline variants in
BARD1 in all 10 NBL cases with available genome and transcriptome sequencing data.
Since preliminary reports suggest that GWAS risk-alleles of BARD1, all occurring in introns
[197], may be associated with aberrant splicing of this gene [196], we hypothesized that the
novel germline variants observed at this locus by the current study may also play a role in
BARD1 splicing. Further functional work is needed to confirm these possibilities. Since the
majority of our data set comprised exome sequencing data, we do not exclude the possibility
that non-coding (regulatory) mutations may explain the observed increased expression of the
BRCA1 signaling pathway members discussed in Chapter 3.
4.4 Materials and methods
4.4.1 Sample selection and preparation
The study focused on high-risk NBL, and we attempted to reduce heterogeneity by
restricting eligibility to subjects between 1.5 and 5.5 years of age at diagnosis (median 2.94
years) with stage 4 (high-risk metastatic) disease (Appendix D). There was a preponderance
of male subjects (62%). All specimens were obtained at original diagnosis after informed
consent at Children's Oncology Group (COG) member institutions. Thirty-four of the 99
tumors studied harbored amplification of the MYCN oncogene and 40 had a diploid DNA
index (values of 1 in Appendix D). These two assays are routinely performed on all NBL
samples in the COG NBL reference laboratory by fluorescence in situ hybridization and flow
cytometry, respectively. Flash frozen tumor samples were analyzed for percent tumor content
by histopathology prior to nucleic acid extraction, and samples with <75% tumor content
were not included in this study. Tumor RNA and DNA were derived from fresh frozen
primary NBL tissue and matched normal peripheral blood. All sequence data have been
uploaded to dbGAP (http://www.ncbi.nlm.nih.gov/gap) and six-letter codes used to identify
individuals in this database are referenced throughout the text.
4.4.2 Illumina library construction and sequencing
Genome and transcriptome libraries of the ten BCCA cases were constructed from
input amounts of 2-4µg DNA and 3-10µg DNaseI-treated total RNA, respectively, following
the previously described protocols [102,106]. The sequencing was carried out using Genome
Analyzer IIx (GAIIx) (Illumina, Hayward, CA, USA) as per the manufacturer's instructions.
Paired end reads generated from genome and transcriptome sequencing were aligned to the
hg19 (GRCh37) reference human genome assembly using BWA version 0.5.7 [319]. RNA-Seq reads were processed as previously described in Section 3.4.1 and [244,149].
4.4.3 Detection of candidate somatic mutations in genome sequencing data
SNV detection in Illumina tumor genome and transcriptome data was performed
using SNVMix2 with filtering to include SNVs such that the combined probability of either
heterozygous or homozygous SNV was greater than 0.99 [358]. Reads flagged as poor
quality according to the Illumina chastity filter, duplicate reads, and reads aligned with a
mapping quality < 40 were excluded from SNV calling. The somatic status of SNV calls was
determined using read evidence from the SAMtools version 0.1.13 pileup [312] constructed
at the variant positions in the matched normal genome. Positions with normal genome
coverage of at least 5 unique reads supporting the reference allele were considered somatic.
The candidate somatic SNV calls were inspected using the Integrative Genome Browser
[163], and only those calls confirmed by visual inspection were used in the analysis. Ten of
these events, listed in Table 4.1, were validated using ultra-deep re-sequencing with read
indexing as previously described [102].
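The two numeric filters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `SnvCall` record and its field names are hypothetical, and the sketch applies only the thresholds given in the text (combined probability > 0.99; at least 5 unique reference-supporting reads in the normal). How variant-supporting reads in the normal sample were handled is not specified in the text and is therefore not modeled.

```python
from dataclasses import dataclass

MIN_SNV_PROBABILITY = 0.99   # combined P(heterozygous) + P(homozygous), from the text
MIN_NORMAL_REF_READS = 5     # unique reference-supporting reads in the normal, from the text


@dataclass
class SnvCall:
    """Hypothetical record for one tumor SNV call and its matched-normal pileup."""
    p_het: float            # probability that the site is a heterozygous variant
    p_hom: float            # probability that the site is a homozygous variant
    normal_ref_reads: int   # unique normal reads supporting the reference allele


def passes_snv_filter(call: SnvCall) -> bool:
    """Keep calls whose combined variant probability exceeds the threshold."""
    return call.p_het + call.p_hom > MIN_SNV_PROBABILITY


def is_candidate_somatic(call: SnvCall) -> bool:
    """A call is a candidate somatic SNV if it passes the probability filter
    and the matched normal has enough reference-supporting coverage."""
    return passes_snv_filter(call) and call.normal_ref_reads >= MIN_NORMAL_REF_READS
```

Calls that pass this sketch would still require the manual inspection step described above.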
The Pindel software was used as suggested by the authors to identify candidate short
insertions from the tumor and normal genomic bam files [359]. The mean and standard
deviation of read pair insert sizes were calculated for all samples to be ~400 bp, and this
value was used in each Pindel run. The Pindel short insertion output was filtered to select
events that mapped to annotated genes (Ensembl 59 [360]). Candidate somatic short insertion
events that recurred in at least two cases were manually reviewed in the Integrated Genome
Browser (Broad Institute). In addition, SAMtools version 0.1.13 pileup and varFilter
functionality [312] was used to identify indels from the tumor and normal genomic
alignment bam files. To detect candidate somatic indels, further filtering was done separately
on normal and tumor libraries. In the normals, any event with a total coverage of less than 8
was discarded. In the tumor libraries, only the indels with (#indel reads/#total reads) >= 16%
were considered. After the filtering, any indel present in one or more normal libraries was
flagged as germline.
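The indel filtering rules above can be summarized in a short sketch. The data layout is an assumption made for illustration: each indel is identified by a hypothetical key such as (chromosome, position, allele), tumor events map that key to (indel reads, total reads), and each normal library maps it to total coverage. Only the thresholds (coverage of at least 8 in normals, indel read fraction of at least 16% in tumors) come from the text.

```python
MIN_NORMAL_COVERAGE = 8          # normal events below this coverage are discarded
MIN_TUMOR_INDEL_FRACTION = 0.16  # minimum (#indel reads / #total reads) in the tumor


def keep_normal_event(total_reads):
    """Normal-library filter: discard events with total coverage below 8."""
    return total_reads >= MIN_NORMAL_COVERAGE


def keep_tumor_event(indel_reads, total_reads):
    """Tumor-library filter: keep indels supported by at least 16% of reads."""
    if total_reads <= 0:
        return False
    return indel_reads / total_reads >= MIN_TUMOR_INDEL_FRACTION


def classify_tumor_indels(tumor_indels, normal_libraries):
    """Label each retained tumor indel as 'germline' or 'candidate_somatic'.

    tumor_indels: dict mapping indel key -> (indel_reads, total_reads)
    normal_libraries: iterable of dicts mapping indel key -> total coverage
    """
    # Collect indels that survive filtering in at least one normal library.
    germline_keys = set()
    for library in normal_libraries:
        for key, total_reads in library.items():
            if keep_normal_event(total_reads):
                germline_keys.add(key)

    labels = {}
    for key, (indel_reads, total_reads) in tumor_indels.items():
        if keep_tumor_event(indel_reads, total_reads):
            labels[key] = "germline" if key in germline_keys else "candidate_somatic"
    return labels
```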
None of the candidate somatic coding indels from the Pindel or SAMtools analysis
was confirmed by manual inspection in the Integrated Genome Browser [163], and hence
they are excluded from the text. For CGI data, the provided MAF files were used to extract
somatic mutations using the filtering criteria provided in Table 4.4.
4.4.4 Gene coverage in transcriptome sequencing data
The alignments of RNA-Seq data were used to estimate gene expression levels. Gene
coverage analysis was based on Ensembl gene annotations (homo_sapiens_core_59_37d)
[360]. These annotations were converted into one model per gene by taking all transcripts of
a given gene and collapsing them into a single gene model such that exonic bases in a
collapsed gene model were the union of exonic bases that belonged to all known transcripts
of the gene.
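Collapsing transcripts into a single gene model amounts to merging overlapping exon intervals. The sketch below illustrates this under the assumption that exons are given as 0-based, half-open (start, end) pairs on a single chromosome and strand; the function names are hypothetical and are not taken from the analysis code.

```python
def collapse_transcripts(transcripts):
    """Return the union of exonic bases over all transcripts of one gene.

    transcripts: list of transcripts, each a list of (start, end) exon
    intervals (assumed 0-based, half-open, same chromosome).
    The result is a sorted list of non-overlapping intervals.
    """
    exons = sorted(exon for transcript in transcripts for exon in transcript)
    merged = []
    for start, end in exons:
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent exon: extend the current merged interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def exonic_length(gene_model):
    """Total number of exonic bases in a collapsed gene model."""
    return sum(end - start for start, end in gene_model)
```

For example, two transcripts with exons [(100, 200), (300, 400)] and [(150, 250), (300, 350), (500, 600)] collapse to [(100, 250), (300, 400), (500, 600)], a model of 350 exonic bases.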
The analysis used SAMtools version 0.1.13 pileup [312] to get the per-base coverage
depths, and excluded reads with mapping quality < 10 and reads flagged as poor quality
according to the Illumina chastity filter. The reads per kilobase of exon model per million
mapped reads (RPKM) metric was used to estimate gene expression level [150]. RPKM was
calculated using the following formula: RPKM = (number of reads mapped to all exons in a gene ×
10^9) / (NORM_TOTAL × sum of the lengths of all exons in the gene), where
NORM_TOTAL = the total number of reads that are mapped to exons excluding those
belonging to the mitochondrial genome.
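The RPKM formula above translates directly into code. In this minimal sketch the function and argument names are illustrative; `norm_total` stands for NORM_TOTAL as defined in the text (reads mapped to exons, excluding the mitochondrial genome), and the example values are invented.

```python
def rpkm(exon_reads, exon_length_bp, norm_total):
    """Reads per kilobase of exon model per million mapped reads.

    exon_reads: number of reads mapped to all exons of the gene
    exon_length_bp: sum of the lengths of all exons in the gene (bases)
    norm_total: total reads mapped to exons, excluding mitochondrial reads
    """
    if exon_length_bp <= 0 or norm_total <= 0:
        raise ValueError("exon length and NORM_TOTAL must be positive")
    return exon_reads * 1.0e9 / (norm_total * exon_length_bp)


# Hypothetical example: 500 reads on a 2,000 bp gene model with
# 25 million normalizing reads gives an RPKM of 10.
example_value = rpkm(500, 2000, 25_000_000)
```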
4.4.5 Copy number analysis using genome sequencing data
Copy number analysis was conducted using an HMM that was previously described
[114,109]. Briefly, for copy number analysis, 50 million reads (mapping Q >10) were
randomly selected from the final merged bam files for the tumor and matched normal
genomes. The normal reads were split into bins of 200 adjacent alignments, and the
corresponding bins in the tumor genome were used to calculate the ratio of tumor/normal
reads in each bin. These values were normalized by subtracting the median of the tumor-normal ratios across the whole genome. This resulted in a measurement of the relative read
density from the tumors and matched normals in bins of variable length along the genome,
where bin width was inversely proportional to the number of mapped reads in the normal
genome. GC bias correction was applied, and an HMM was used to classify and segment the
tumor genome into continuous regions of somatic copy number loss (HMM state 1),
neutrality (HMM state 2), slight gain (HMM state 3), gain (HMM state 4) or high gain
(HMM state 5).
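The binning and median normalization steps can be illustrated with the sketch below. It is a simplification under stated assumptions: reads are represented only by their start positions on one chromosome, bins are treated as half-open intervals bounded by the first normal read of each consecutive block, a trailing incomplete block is dropped, and GC correction and the HMM segmentation are not shown. The function names are hypothetical.

```python
from bisect import bisect_left
from statistics import median


def tumor_normal_bin_ratios(normal_positions, tumor_positions, reads_per_bin=200):
    """Define bins from consecutive blocks of normal reads and compute the
    tumor/normal read ratio in each bin.

    Returns a list of (bin_start, bin_end, ratio) with half-open bins.
    """
    normal_sorted = sorted(normal_positions)
    tumor_sorted = sorted(tumor_positions)
    n_bins = len(normal_sorted) // reads_per_bin  # incomplete last block is dropped
    bins = []
    for k in range(n_bins):
        start = normal_sorted[k * reads_per_bin]
        if k + 1 < n_bins:
            end = normal_sorted[(k + 1) * reads_per_bin]
        else:
            end = normal_sorted[n_bins * reads_per_bin - 1] + 1
        # Count tumor reads starting inside [start, end).
        tumor_count = bisect_left(tumor_sorted, end) - bisect_left(tumor_sorted, start)
        bins.append((start, end, tumor_count / reads_per_bin))
    return bins


def median_normalize(ratios):
    """Subtract the genome-wide median ratio from each bin ratio."""
    centre = median(ratios)
    return [ratio - centre for ratio in ratios]
```

Because bin boundaries follow the normal read density, regions with fewer normal reads produce wider bins, as described in the text.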
For CGI data, cnvTumorSegmentsRelative.tsv files were used to obtain somatic CNV
calls. These calls were then converted to the five HMM states described above using the
following rules: if calledLevel<=0.79 then 1; if 0.79<calledLevel<=1.25 then 2; if
1.25<calledLevel<=1.75 then 3; if 1.75<calledLevel<=2.5 then 4; if calledLevel>2.5 then 5.
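The conversion rules for the CGI calls can be written as a small lookup function. The thresholds below are those given in the text; the function name and the state labels dictionary are introduced here for illustration only.

```python
# Labels for the five HMM states described in the text.
HMM_STATE_LABELS = {1: "loss", 2: "neutral", 3: "slight gain", 4: "gain", 5: "high gain"}


def called_level_to_state(called_level):
    """Map a CGI relative coverage level (calledLevel) to an HMM state 1-5."""
    if called_level <= 0.79:
        return 1
    if called_level <= 1.25:
        return 2
    if called_level <= 1.75:
        return 3
    if called_level <= 2.5:
        return 4
    return 5
```

For instance, a calledLevel of 1.5 maps to state 3 (slight gain), and a calledLevel of 0.79 maps to state 1 (loss) because the lower boundary is inclusive.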
4.4.6 Rearrangement detection
De novo transcriptome assembly by ABySS [338] was performed on the ten RNA-Seq datasets to identify candidate transcript rearrangements. The assembled contigs were run
through the trans-ABySS pipeline [297] which aligned a merged contig set to the hg19
(GRCh37) human reference genome assembly and compared the alignments to annotated
transcript models, allowing identification of known and novel transcript structures. The
transcript rearrangement component of the pipeline identified all contigs that had two
separate discrete genomic BLAT alignments. The top 5 scoring alignments were inspected
manually and the read evidence support was used to filter out likely false positive events.
Smaller scale rearrangements were identified from contigs with single, gapped BLAT
alignments with supporting read evidence again used to filter out false positive events.
Targeted genomic assembly of the candidate rearranged regions was performed to validate
the events in the genomic data. In addition, 9 events were validated with PCR and Sanger
sequencing in the tumor DNA and RNA using the following procedure. Primer pairs were
selected around the event breakpoint with a 10 bp margin on either side using Primer3 [361]
with the following parameters: 22-26 bp size, 40-46 GC and 54-66 TM restrictions, and using
GC clamp. Primers were selected favoring product sizes 500-600 bp, 400-700 bp, and 300-800 bp, respectively. For each amplicon, up to 100 primer pairs were initially identified. This
set was filtered for pairs that hybridized to a unique location using BLAT (min identity 100,
tile size 10, step size 2) on hg19 human genome assembly. Each primer was independently
ranked using the Primer3 objective function. The primer sequences used for the genome and
transcriptome validations are provided in Table 4.5 and Table 4.6, respectively.
For the RNA validation, first strand cDNA was synthesized using 500ng of DNaseI-treated total RNA from tumor by following the Agilent AccuScript High Fidelity 1st Strand
cDNA Synthesis protocol (catalog #200820); 1µL of 5-fold diluted template (1st strand
cDNA) was used for setting up the PCR with 98°C for 30 seconds, followed by 32 cycles of
98°C for 10 seconds, 59°C for 30 seconds, 72°C for 10 seconds, and then 72°C for 5 minutes.
The PCR product was run on an 8% PAGE gel for 35 minutes at 200V, and stained with
SYBR Green for 1 minute for visualization.
For the DNA validation, 1ng genomic DNA was used as a template for PCR with
98°C for 30 seconds, followed by 28 cycles of 98°C for 30 seconds, 63°C for 30 seconds,
72°C for 60 seconds, and then 72°C for 5 minutes. The PCR product was run on a 1%
agarose gel for 90 minutes at 100V, and stained using SYBR Green for 45 minutes for
visualization. The target PCR products from matching tumor and normal DNA were excised,
cloned into vector pCR4-TOPO (Invitrogen) and sequenced using M13 forward and M13
reverse primers on the ABI3730xl capillary sequencer.
The CGI structural variation pipeline was used to identify rearrangements present in
the CGI data [110]. Candidate somatic events were confirmed by PCR and electrophoresis
alone or followed by Sanger sequencing.
4.4.7 Exome sequencing and data analysis
The generation, sequencing, and analysis of 81 pairs of exome libraries at the Broad
Institute was performed using a detailed, previously described protocol [108]. A summary of
deviations from this protocol is provided here. Due to the small quantities of DNA available,
all DNA samples were amplified using Phi29-based multiple-strand displacement whole
genome amplification (Repli-g service, QIAgen). Exonic regions were captured by in-solution hybridization using RNA baits similar to those described [108], supplemented
with probes capturing additional genes listed in RefSeq [318] beyond the
original Consensus Coding Sequence (CCDS) set [317]. In total, ~33 Mb of genomic
sequence was captured, consisting of 193,094 exons from 18,863 genes annotated by the
CCDS [317] and RefSeq [318] databases as coding for protein or micro-RNA (accessed
November 2010). Sequencing of 76 bp paired-end reads was performed using Illumina
Genome Analyzer IIx (GAIIx) and HiSeq 2000 instruments. Reads were aligned to the hg19
(GRCh37) build of the human reference genome sequence using BWA [319]. To confirm
sample identity, copy number profiles derived from sequence data were compared with those
previously derived from microarray data from each case, downloaded from dbGAP.
Candidate somatic base substitutions were detected using muTect (previously
referred to as muTector [108]). Candidate somatic insertions and deletions were detected as
previously described [108].
4.4.8 Integrated analysis of somatic variation from exome and genome data sets
Somatic mutations detected in genome, exome, and transcriptome data sets were
annotated using Oncotator version 0.4. Genes mutated at a statistically significant
frequency were identified using the MutSig algorithm [329]. Briefly, background mutation
rates were estimated from all data for each of the 7 mutation categories: C or G in CpG; C in
TpC or G in GpA; A; remaining C; remaining G; insertion/deletion/duplication. These rates
were assumed to be constant across all patients and across all genes in the genome. The
overall background mutation rate was considered to be the sum of the seven random
variables, describing each mutation category. The observed mutation data for each gene
across all patients, corrected for gene length, was then compared to the background mutation
rate, and a likelihood ratio test was applied to select those genes whose observed mutation
rate was significantly different from the estimated background mutation rate. The Benjamini-Hochberg false discovery rate correction for multiple testing was applied to calculate the q-value for each gene, quoted in the text. Genes with q-values of less than 0.2 were considered
statistically significant, which corresponds to an expected false discovery rate of 20%.
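The multiple testing step can be illustrated with a generic Benjamini-Hochberg calculation. This sketch is not the MutSig implementation; it only shows how q-values follow from a list of per-gene p-values, and the p-values used in the example are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical per-gene p-values; genes with q < 0.2 would be reported.
example_p = [0.01, 0.04, 0.03, 0.5]
example_q = benjamini_hochberg(example_p)
example_significant = [q < 0.2 for q in example_q]
```

With a threshold of q < 0.2, roughly one in five of the reported genes is expected to be a false discovery.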
The relationship between mutation frequency and age of diagnosis was tested using
the Spearman rank test. The R version 2.11.1 implementation of the Kolmogorov-Smirnov
test (ks.test) was used to test differences in mutation frequency distributions of the following:
1) MYCN amplified vs. unamplified, 2) 17q gain vs. wildtype, 3) 1p loss vs. wildtype, 4) 11q
loss vs. wildtype, and 5) hyperdiploid vs. diploid. Correction for multiple testing was
performed using the R Bioconductor package qvalue [246]. Significantly mutated genes led
to an investigation of related genes, specifically those involved in chromatin remodeling and
MAPK signaling. These lists of genes are provided in Appendix F. In a search for
informative mutations in hypermutated samples, we examined mutations in genes from a
published [362] and updated list of DNA repair genes available through the authors' website:
http://sciencepark.mdanderson.org/labs/wood/dna_repair_genes.html.
Figure 4.1 Overview of the multi-centre next-generation sequencing initiatives and data
analyses
Figure 4.2 Somatic mutation frequencies in 99 NBL tumor/normal pairs with samples
ordered by type of genes with somatic alteration
(A). Individual somatic mutation rates in the 99 NBL tumors arranged by mutation categories
discussed in the text (color-coded): hypermutated, ALK mutated, chromatin remodeling gene
mutated, MAPK pathway oncogene mutated, Cancer Gene Census gene mutated, and
unknown. Within each category the samples are ordered by their somatic non-silent mutation
rate corrected for callable exonic sequence (Mb). The data panels are described below in
bold. Data type – sequencing technology used, blue = in-solution exome capture followed by
Illumina sequencing, orange = Illumina whole genome sequencing, yellow = Complete
Genomics, Inc. (CGI) whole genome sequencing. Hatched blocks identify cases for which
data were generated using two technologies. Callable exonic Mb – megabases of coding
sequence with sufficient data for mutation detection. Count of candidate somatic mutation
– stacked bar plot of silent (i.e. synonymous) and non-silent mutations in each tumor.
Boxplot to the right depicts distribution of non-silent mutation frequencies across all 99
tumors. Whiskers depict upper and lower ranges used to detect outliers, equal to first or third
quartile minus or plus 3.5 times the interquartile range, for the first and third quartiles,
respectively (i.e. Q1 - 3.5*IQR or Q3 + 3.5*IQR). Outlier mutation frequencies are shown as
circles. dbGAP 6-letter identifiers – TARGET sample identifiers (Appendix D). (B).
Distribution of specific mutations in each mutational category of interest (color-coded):
hypermutated, ALK mutated, chromatin remodeling gene mutated, MAPK pathway oncogene
mutated, Cancer Gene Census gene mutated, and unknown. Genes, found to be mutated at a
significant frequency by MutSig analysis, are listed in bold. Mutations in MYC family
members (MYCN and MYC) are also highlighted. Genes that are listed in the unknown
category are MutSig hits that do not belong to any of the other categories described by the
legend (PGLYRP3, GABRA6, SUCLG2, IGSF11). The data panels are described below in
bold. Heatmap of non-silent mutations and structural rearrangements – colored blocks
identify alterations in genes with statistically significant mutation frequencies or implicated
as part of a mechanism disrupted in NBL; DNA repair (red), ALK signaling (orange),
chromatin remodeling (green), MAPK signaling (blue), MYC family member (light blue).
Alteration types are color-coded: missense mutation (black), nonsense/frameshift/splice site
mutation (red), and structural rearrangement (orange). MYCN amplification – black
rectangles identify samples with MYCN amplification. A grey square identifies a
sample for which a measurement of MYCN amplification could not be made for technical
reasons.
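The outlier rule in the caption (Q1 - 3.5*IQR and Q3 + 3.5*IQR) can be illustrated with a minimal sketch. The function names and the example mutation rates below are hypothetical and are not thesis data; the quartile interpolation method is also an assumption, since the caption does not specify one.

```python
from statistics import quantiles


def outlier_fences(values, k=3.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: quartiles use inclusive linear interpolation.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=3.5):
    """Return the values that fall outside the fences."""
    lower, upper = outlier_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical non-silent mutation rates (mutations per Mb), not thesis data.
rates = [0.3, 0.4, 0.5, 0.5, 0.6, 0.6, 0.7, 0.8, 9.5]
print(find_outliers(rates))
```

With a multiplier of 3.5 rather than the conventional 1.5, only extreme values, such as those expected from hypermutated tumors, are flagged.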
[Figure 4.2, panels A and B (images not reproduced). Legend, categories for sample classification: hypermutated; ALK mutated; chromatin remodeling gene mutated; MAPK pathway oncogene mutated; MYC family mutated (panel B only); Cancer Gene Census gene mutated; unclassified. Legend, gene alteration categories in heatmap (panel B): missense mutation; nonsense, frameshift or splice site mutation; structural variant or gene fusion.]
Figure 4.3 Integrated analysis of 99 neuroblastoma cases reveals a diversity of somatic
aberrations
Each case analyzed by whole genome sequencing is represented as a CIRCOS plot [339].
The reference human chromosomes are arranged end-to-end in the outer-most ring. Genes
harboring non-silent mutations are depicted outside of the chromosomes with circles color-coded as described in the legend. The ring inside the chromosomes shows somatic gains and
losses of copy number. Finally, the inner-most ring depicts structural aberrations with black
lines inside the circle; aberrations predicted to result in a gene fusion are highlighted with
orange lines. The cases are ordered according to the categories in Figure 4.2. (A). Four cases
with MAPK pathway aberrations. The MAPK pathway aberrations detected in the NBL
genomes include mutations in NRAS and NF1 and gene fusions involving MAPK9 and
MAPK10. (B). Four cases with chromatin remodeling aberrations. The chromatin remodeling
aberrations detected in the NBL genomes include mutations in MLL5, CHD8, CREBBP,
deletion in ARID1B, and a gene fusion involving IKZF3. (C). Three cases with aberrations in
known cancer genes, including ABL2, ATM, and FANCD2. (D). Two cases with somatic
mutations in ALK. (E). Six unclassified cases with no aberrations in the categories described
thus far. Two unclassified cases (PARIRD and PARSHT) contain rearrangements of the
MYCN amplification region.
[Figure 4.3, panels A-E (CIRCOS plots, images not reproduced):
A. Cases with somatic alterations in MAPK pathway oncogenes
B. Cases with somatic alterations in chromatin remodeling genes
C. Cases with somatic alterations in Cancer Gene Census genes
D. Cases with somatic alterations in the ALK oncogene
E. Unclassified cases]
Table 4.1 Non-silent mutations in genes of interest along with their validation status
The genes of interest listed in the table include genes mutated at a significant frequency (as detected by the MutSig analysis), genes
implicated in chromatin remodeling or MAPK signaling (Appendix F), and genes listed in the Cancer Gene Census [7]. The genes are
listed in the order shown in Figure 4.2. MutSig genes are highlighted in bold. Chromatin = X identifies chromatin remodeling
genes; Cancer = X identifies genes listed in the Cancer Gene Census; MAPK = X identifies genes that encode a member of the MAPK
pathway (KEGG hsa04010); COSMIC = number of samples recorded in the Catalogue of Somatic Mutations in Cancer that overlap
the particular NBL mutation. Confirmation by orthogonal method = method(s) used to confirm the variant: Sanger sequencing,
custom hybrid capture and Illumina sequencing, RNA sequencing, or genotyping using a Sequenom assay [327]; * denotes hypermutated
samples.
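The "Genome change (hg19)" entries for single-nucleotide substitutions follow a compact pattern such as chr3:37050322C>A. As a minimal sketch, assuming that format, such strings can be split into chromosome, position, reference allele and alternate allele. The pattern and the function name are illustrative assumptions; insertions and deletions are deliberately not handled.

```python
import re

# Assumed format for simple substitutions, e.g. "chr3:37050322C>A".
SNV_PATTERN = re.compile(
    r"^chr(?P<chrom>[0-9]{1,2}|X|Y):(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_snv(change):
    """Return (chromosome, position, ref, alt), or None if not a simple SNV."""
    match = SNV_PATTERN.match(change)
    if match is None:
        return None
    return (match["chrom"], int(match["pos"]), match["ref"], match["alt"])


print(parse_snv("chr3:37050322C>A"))
print(parse_snv("chr9:20414341_20414343delCTA"))
```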
Category
Protein
change
Genome change (hg19)
Case
identifiers
COSMIC
overlaps
Hypermutated MLH1
(DNA repair) DDB1
p.Y157*
p.C725*
chr3:37050322C>A
chr11:61079358G>T
PAPPKJ*
PALJPX*
0
0
ALK
ALK
ALK
ALK
p.I1170N
p.I1171N
p.F1174L
chr2:29445216A>T
chr2:29445213A>T
chr2:29443695G>T
2
2
58
ALK
p.F1174L
chr2:29443695G>C
PANRHJ
PAKXDZ
PANZVU,
PALAKE
PAREGK
ALK
ALK
p.F1245I
p.I1250T
chr2:29436860A>T
chr2:29432739A>G
7
0
ALK
p.R1275Q
chr2:29432664C>T
ALK
p.R1275L
chr2:29432664C>A
PAINLN
PANYGR
germline
PALNLU,
PANBCI
PASAZJ
ARID1A
ARID1A
ARID1B
ASH1L
CHD6
CREBBP
p.G1139V
p.G1942D
p.R1487M
p.K324R
p.P2383T
p.S1365*
chr1:27099000G>T
chr1:27106214G>A
chr6:157522242G>T
chr1:155451690T>C
chr20:40040888G>T
chr16:3790439G>T
PALNLU
PALXHW
PAMMWD
PAPPKJ*
PAKFUY
PASCLP
0
0
0
0
0
0
EP300
HDAC4
IKZF4
KDM5A
KDM6A
p.R915C
p.P917L
p.G151V
p.A1028V
p.Q1354*
chr22:41546128C>T
chr2:239990289G>A
chr12:56420730G>T
chr12:420184G>A
chrX:44966680C>T
PANIPC
PALJPX*
PALFPI
PANBCI
PALZRG
0
0
0
0
0
Chromatin
remodeling
Gene
58
42
42
Orthogonal method
used to confirm
variant
Sequenom
Sequenom
Sanger
Sanger, Sequenom
Sanger, Capture,
Sequenom
Sanger, Capture,
Sequenom
Sanger, Sequenom
Sanger, Sequenom
Sanger, Capture,
Sequenom
Sanger, Capture,
Sequenom
Sequenom
Sequenom
Capture, RNA-Seq,
Sequenom
Sequenom
Category
MAPK
pathway
Gene
Protein
change
Genome change (hg19)
Case
identifiers
COSMIC
overlaps
MLL3
MLL4
MLL5
p.A4748T
p.C1432*
p.P1759Q
chr7:151841899C>T
chr19:36218517C>A
chr7:104753479C>A
PALWIP
PANZVU
PANYGR
0
0
0
NUP98
PAX5
PRDM2
PRDM4
EGFR
LILRB1
LILRB1
chr11:3735118T>A
chr9:36966647C>T
chr1:14108929C>A
chr12:108128195T>C
chr7:55240678C>A
chr19:55143049G>T
chr19:55147061G>T
PANPVI
PAIXNV
PAMMWD
PAPPKJ*
PALUDH
PALZZV
PALTEG
0
0
0
0
0
0
0
LILRB1
NF1
NF1
NFKB2
p.E836V
p.D227N
p.P1547T
p.E733G
p.P641H
p.E57*
p.R550_splic
e
p.S209T
p.Ile1679Val
p.E2501*
p.H894fs
PANUKV
PASDZJ
PALJPX*
PAPPKJ*
0
0
1
0
NRAS
NRAS
p.G13R
p.Q61K
chr19:55143652T>A
chr17:29653037A>G
chr17:29679318G>T
chr10:104162112_10416211
2delC
chr1:115258745C>G
chr1:115256530G>T
PANBSP
PAPTMM
331
1551
NTRK1
PDGFRB
PTPN11
PTPN11
PTPN12
PTPN13
MYC
p.V263L
p.A927S
p.A72T
p.E76A
p.H460Y
p.V811L
p.T73I
chr1:156841484G>T
chr5:149499049C>A
chr12:112888198G>A
chr12:112888211A>C
chr7:77256374C>T
chr4:87662913G>T
chr8:128750681C>T
PAIXNC
PAITCI
PAPBZI
PALHVD
PAPPKJ
PAILNU
PAKZRF
0
0
68
128
0
0
0
Orthogonal method
used to confirm
variant
Sequenom
Capture, RNA-Seq,
Sequenom
Sequenom
Sequenom
Sequenom
Sequenom
Sanger, Capture,
Sequenom
Category
Cancer
Gene
Census
Gene
Protein
change
Genome change (hg19)
Case
identifiers
COSMIC
overlaps
MYCN
p.P44L
chr2:16082317C>T
PASLGS
1
PGLYRP3
PGLYRP3
PGLYRP3
GABRA6
GABRA6
SUCLG2
SUCLG2
IGSF11
IGSF11
ABL2
ATIC
ATM
ATRX
BCR
p.R175M
p.F237C
p.H338N
p.R84H
p.A322T
p.D109H
p.R160Q
p.S270T
p.P366Q
p.P996A
p.P148A
p.A2274V
p.S2017P
p.S317fs
PALZRG
PALHVD
PALZSL
PAPPKJ
PAMUTD
PAICGF
PAPPKJ
PAITCI
PALJPX
PANYGR
PALAKM
PANRRW
PALXHW
PAHYWC
0
0
0
0
0
0
0
0
0
0
0
1
0
0
CARD11
CD79A
p.R1011K
p.P128fs
CIITA
CLTC
COL1A1
DDX5
FANCD2
FANCE
KTN1
MEN1
p.E372D
p.A76T
p.G1163R
p.I82V
p.K871N
p.R200C
p.Q618L
p.R521fs
chr1:153276338C>A
chr1:153274903A>C
chr1:153270446G>T
chr5:161115980G>A
chr5:161119084G>A
chr3:67579512C>G
chr3:67570997C>T
chr3:118623540C>G
chr3:118621566G>T
chr1:179077416G>C
chr2:216190772C>G
chr11:108196798C>T
chrX:76849227A>G
chr22:23524096_23524097in
sC
chr7:2951918C>T
chr19:42383609_42383610in
sC
chr16:11000462G>C
chr17:57721820G>A
chr17:48264420C>G
chr17:62500403T>C
chr3:10114944A>C
chr6:35423873C>T
chr14:56106660A>T
chr11:64572092_64572093in
Orthogonal method
used to confirm
variant
Sanger, Capture,
RNA-Seq, Sequenom
Sequenom
Sequenom
Sequenom
Sequenom
Sequenom
Sequenom
Sequenom
RNA-Seq, Sequenom
Capture, RNA-Seq
PAHYWC 0
PAMMWD 0
PAMVAG
PAPPKJ*
PALHVD
PAPPKJ*
PANN
PAMBAC
PARGUX
PANUKV
0
0
0
0
0
0
0
2
RNA-Seq
Sanger, Capture
Category
Gene
Protein
change
Genome change (hg19)
MET
MLLT3
MLLT3
MSI2
NACA
p.R359Q
p.167_168SS
>S
p.Q326*
p.R269W
p.P996fs
NKX2-1
NOTCH1
NOTCH2
p.S44Y
p.C2189Y
p.P6fs
PDE4DIP
PIK3CA
PLAG1
ROS1
TAF15
TMPRSS2
p.Q1197K
p.K111N
p.P458Q
p.L2013V
p.R406I
p.A423fs
TRIP11
TSC1
USP6
p.A1552T
p.T356I
p.67_68IR>
MW
sG
chr7:116340214G>A
chr9:20414341_20414343del
CTA
chr9:20413868G>A
chr17:55752347C>T
chr12:57112326_57112326d
elG
chr14:36988522G>T
chr9:139391625C>T
chr1:120612003_120612004
delGG
chr1:144881607G>T
chr3:178916946G>T
chr8:57078932G>T
chr6:117638404G>C
chr17:34171520G>T
chr21:42842589_42842590in
sC
chr14:92466356C>T
chr9:135786463G>A
chr17:5036210_5036211TC
>GT
WT1
p.R495P
chr11:32410674C>G
Case
identifiers
COSMIC
overlaps
Orthogonal method
used to confirm
variant
PASAZJ
PALXMM
0
0
Capture
PAPPKJ*
PAKFUY
PALAKE
0
0
0
Sequenom
PAMZMG
PAPPKJ*
PALWVJ
0
0
0
PAKXDZ
PAIPGU
PAREGK
PAHYWC
PALUDH
PALAKE
0
20
0
0
0
0
PALNLU
PALFPI
PASLGS
possible
germline
PANYBL
0
0
0
Sequenom
Sequenom
Sanger, Capture
Sanger, Capture
0
Table 4.2 Genes with significant frequency of somatic mutation
Somatic mutations in exomic regions from 99 NBL cases were analyzed using the MutSig algorithm [329] as described in Section
4.4.8 with and without two hypermutated (HM) samples. The MutSig algorithm tests the null hypothesis that all the observed
mutations in each gene are a consequence of random background mutation processes. Genes for which this hypothesis is rejected
based on the Benjamini-Hochberg false discovery rate-corrected q-value (q < 0.2) are considered significantly mutated, and are listed
in the table.
Gene      Description                                                    Patients  Unique sites  q-value (no HM)  q-value (with HM)  Expressed in 10 neuroblastoma transcriptomes
ALK       Anaplastic lymphoma receptor tyrosine kinase                   9         6             7.7 x 10^-7      2.6 x 10^-6        Yes
PGLYRP3   Peptidoglycan recognition protein 3                            3         3             0.045            0.065              No
LILRB1    Leukocyte immunoglobulin-like receptor, subfamily B, member 1  3         3             0.071            0.085              Yes
PTPN11    Protein tyrosine phosphatase, non-receptor type 11             2         2             0.13             0.17               Yes
NRAS      Neuroblastoma RAS viral (v-ras) oncogene homolog               2         2             0.17             0.17               Yes
GABRA6    Gamma-aminobutyric acid (GABA) A receptor, alpha 6             2         2             1.00             0.17               No
SUCLG2    Succinate-CoA ligase, GDP-forming, beta subunit                2         2             1.00             0.17               Yes
IGSF11    Immunoglobulin superfamily, member 11                          2         2             1.00             0.18               No
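The Benjamini-Hochberg correction named in the caption of Table 4.2 can be sketched as follows. This is a generic illustration of the procedure and not the MutSig implementation; the gene names and p-values are hypothetical, and the calculation of gene-level p-values from the background mutation model is not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the p-value rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def significant_genes(gene_p_values, threshold=0.2):
    """Return {gene: q-value} for genes with q below the threshold."""
    genes = list(gene_p_values)
    q_values = benjamini_hochberg([gene_p_values[g] for g in genes])
    return {g: q for g, q in zip(genes, q_values) if q < threshold}


# Hypothetical gene-level p-values, not thesis data.
example = {"GENE_A": 0.0001, "GENE_B": 0.04, "GENE_C": 0.5, "GENE_D": 0.9}
print(significant_genes(example))
```

Because the adjustment depends on the full set of tested genes and on the mutations included, removing the hypermutated samples can change the q-value of a gene, as seen in the two q-value columns of the table.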
Table 4.3 Notable structural variants detected and confirmed in NBL genomes and transcriptomes
*These fusions likely have complex architecture and may involve additional neighboring genes. The following designations are used
in the Table: SV = structural variant; CE = capillary electrophoresis; MAPK = identifies genes that encode a known or putative
member of the MAPK Pathway (KEGG hsa04010); Cancer = identifies genes listed in the Cancer Gene Census [7]; t(11;17) =
identifies genes affected by a translocation between chromosomal arms 17q and 11q; Chromatin remodeling = identifies genes that
function in chromatin remodeling; Recurrent genes = identifies genes recurrently affected by structural variants in this study;
Mitelman database = identifies genes known to be involved in cancer-specific genome rearrangements as recorded in the Mitelman
database of chromosome aberrations and gene fusions in cancer [343]; Other = denotes other notable genes described in the text;
Confirmed, evidence in blood = somatic events detectable by PCR in the patient's blood, likely derived from circulating tumor DNA.
Gene(s)
MYCN;
GULP1
ABL2;
ACBD6
ARID1B
FAM86C1;
IKZF3
MAPK10;
PRDM5
PRELID2;
MAPK9
AUTS2
RRM1714;
NOP2
PTPN13
Event Type
Fusion
Sample
PARIRD
Breakpoint
chr2:16083041
Breakpoint
chr2:189393508
Validation status
Confirmed somatic
Comment
Cancer
Fusion
PARRBU
chr1:179198375
chr1:180382607
Probable somatic
Cancer
SV
Fusion
PASLGS
PARGUX
chr6:157138276
chr11:71508561
chr6:157168409
chr17:37960058
Fusion
PASCKI
chr4:87260892
chr4:121730886
Chromatin remodeling
Chromatin remodeling;
t(11;17)
MAPK
Fusion
PAPSKM
chr5:145142894
chr5:179682098
Confirmed somatic
Probable somatic
by CE
Probable somatic
by CE
Confirmed somatic
SV
Fusion
PARIRD
PARDUJ
chr7:70188135
chr12:6669132
chr7:70200616
chr12:6679022
Confirmed somatic
Confirmed somatic
Mitelman database
Mitelman database
SV
PASCKI
chr4:87732011
chr4:104699882
MAPK
FAM134B;
CDH18
CDH13
LSAMP;
STAG1
NBAS;
BAZ2B
NBAS;
CCNT2*
NBAS
NBAS
Fusion
PAPSKM
chr5:16539884
chr5:19720574
Probable somatic
by CE
Confirmed somatic
SV
Fusion
PAPTLD
PANRRW
chr16:82673944
chr3:116668399
chr16:82684731
chr3:136402660
Confirmed somatic
Confirmed somatic
Other
Other
Fusion
PARIRD
chr2:15578371
chr2:160221641
Confirmed somatic
Recurrent gene
Fusion
PARIRD
chr2:15591679
chr2:135682003
Confirmed somatic
Recurrent gene
SV
SV
PARIRD
PARIRD
chr2:15648856
chr2:15659845
chr2:15650260
chr2:15660266
Recurrent gene
Recurrent gene
NBAS
SV
PARIRD
chr2:15699066
chr17:53790753
Confirmed somatic
Probable somatic
by CE
Probable somatic
by CE
MAPK
Other
Recurrent gene
Gene(s)
CDKAL1
Event Type
SV
Sample
PAPTMM
Breakpoint
chr6:20769198
Breakpoint
chr6:20806792
CDKAL1
APBB1766;
ZFHX3
NBAS;
AK001558*
NBAS
SV
Fusion
PASLGS
PARDUJ
chr6:20806846
chr16:73036896
chr6:20899275
chr16:73064047
Fusion
PARSHT
chr2:15629062
chr2:12660527
SV
PASDZJ
chr2:15794685
chr2:17080300
NBAS;
FAM49A*
NBAS
Fusion
PASDZJ
chr2:15667544
chr2:17302098
SV
PASDZJ
chr2:16794222
chr2:17046968
NBAS
NBAS
SV
SV
PASDZJ
PARSHT
chr2:16975790
chr2:12660729
chr2:17208524
chr2:15626595
ZFHX3
Duplication
PANRRW
chr16:73064821
chr16:73352657
RNF121;
TRIM37
ATG2A;
BCAS3
SHANK2
Fusion
PANNMS
chr11:71692501
chr17:57072537
Fusion
PARGUX
chr11:64674966
chr17:58891730
SV
PASCKI
chr11:70784776
chr17:34136040
Validation status
Probable somatic
by CE
Confirmed somatic
Confirmed somatic
Comment
Recurrent gene
Confirmed,
evidence in blood
Confirmed,
evidence in blood
Confirmed,
evidence in blood
Confirmed,
evidence in blood
Confirmed somatic
Confirmed,
evidence in blood
Putative unknown
origin
Confirmed somatic
Recurrent gene
Probable somatic
by CE
Confirmed somatic
t(11;17)
Recurrent gene
Recurrent gene
Recurrent gene
Recurrent gene
Recurrent gene
Recurrent gene
Recurrent gene
Recurrent gene
t(11;17)
t(11;17)
Table 4.4 Parameters used to select high confidence candidate somatic mutations
reported by CGI
The MAF files provided by Complete Genomics, Inc (CGI) were filtered based on the
parameters described in the table.
Selection Criterion         Operator   Value
Variant_Classification      Equal      (Nonsense, Misstart, Nonstop, Frame_Shift, In_Frame, Missense, Splice_Site)
Variant_Type                Equal      (Snp, Ins, Del, Sub)
Mutation_Status             Equal      (Somatic, LOH)
Tumor_VarScore_Rank         >=         0.025
Match_Norm_RefScore_Rank    >=         0.025
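The selection criteria in Table 4.4 can be expressed as a simple row filter. The sketch below assumes a tab-delimited MAF file with a header row containing the five column names from the table, and assumes that "Equal" means an exact match against one of the listed values. The Hugo_Symbol column, the "Silent" class and the example rows are hypothetical.

```python
import csv
import io

# Allowed values for the "Equal" criteria in Table 4.4.
ALLOWED = {
    "Variant_Classification": {
        "Nonsense", "Misstart", "Nonstop", "Frame_Shift",
        "In_Frame", "Missense", "Splice_Site",
    },
    "Variant_Type": {"Snp", "Ins", "Del", "Sub"},
    "Mutation_Status": {"Somatic", "LOH"},
}
# Minimum values for the ">=" criteria in Table 4.4.
MIN_RANK = {"Tumor_VarScore_Rank": 0.025, "Match_Norm_RefScore_Rank": 0.025}


def passes_filters(row):
    """Return True if a MAF row satisfies every criterion."""
    for column, allowed in ALLOWED.items():
        if row[column] not in allowed:
            return False
    for column, minimum in MIN_RANK.items():
        if float(row[column]) < minimum:
            return False
    return True


def filter_maf(handle):
    """Read a tab-delimited MAF from a file-like object; keep passing rows."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if passes_filters(row)]


# Hypothetical MAF content, not thesis data.
EXAMPLE_MAF = (
    "Hugo_Symbol\tVariant_Classification\tVariant_Type\tMutation_Status\t"
    "Tumor_VarScore_Rank\tMatch_Norm_RefScore_Rank\n"
    "GENE_A\tMissense\tSnp\tSomatic\t0.80\t0.90\n"
    "GENE_B\tSilent\tSnp\tSomatic\t0.80\t0.90\n"
    "GENE_C\tNonsense\tSnp\tSomatic\t0.01\t0.90\n"
)
kept = filter_maf(io.StringIO(EXAMPLE_MAF))
print([row["Hugo_Symbol"] for row in kept])
```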
Table 4.5 Primer sequences used for genomic validation of structural variants and gene fusions detected by BCCA pipeline
Sample
Gene(s)
PANNMS
PARSHT
RNF121;
TRIM37
LSAMP;
STAG1
NBAS
PARSHT
PANRRW
Genomic
Genomic
breakpoint
breakpoint
chr11:71692501 chr17:57072537
chr3:116668399 chr3:136402660
chr2:12660527
chr2:15629062
NBAS
chr2:12660729
chr2:15626595
PASDZJ
NBAS
chr2:15794685
chr2:17080300
PASDZJ
NBAS
chr2:15667544
chr2:17302098
PASDZJ
NBAS
chr2:16794222
chr2:17046968
PASDZJ
NBAS
chr2:16975790
chr2:17208524
PASLGS
RERE
chr1:5081632
chr1:8421299
PARRBU
MPRIP
chr17:16952510 chr17:2459132
Primer 1
Primer 2
GATATTTCGTTTGGATAGCA
CTGG
TCTGCAGAGAGAAAGACTAC
CTTG
ATAATTGTTGCTAGTGGAGG
AAGG
ATAATTGTTGCTAGTGGAGG
AAGG
GTCAAATTTATCAGCCTTTG
GC
GACGATCTATCCTGGCACTG
AC
GGAACTTCTTGATATGGTCT
GACTC
ATAGGAATCACAACAGGAA
AGGAG
GACACTCATGAGCATAGAAA
AAGG
CCGAGTTTAAGCGATTCTTG
TG
GAAGTGCAGTAGCACGATTT
TGG
TACTGAGTTTTCCTATCCACA
AGC
ACAAATACCCTGAGAGTCTG
GAAG
ACAAATACCCTGAGAGTCTG
GAAG
GTTTAAGGCCCTGATAGAAG
AGG
ATTCATGTTGCAAGAGCAGA
AG
TTCCCAGTTCTTTCTTATAGA
GGTG
CTACAGCACGGGCTTCTAAA
AC
AGGACAATGAGAGTGACTCG
GAC
GGTATATGCCAAGAAGAATT
GAGG
Table 4.6 Primer sequences used for tumor RNA validation of structural variants and gene fusions detected by the BCCA
pipeline
Sample
PANNMS
PANRRW
PARRBU
PARSHT
PASDZJ
PASDZJ
PASDZJ
PASLGS
PASLGS
Gene(s)
RNF121;
TRIM37
LSAMP;
STAG1
MPRIP
NBAS
NBAS
NBAS
NBAS
RERE
RERE
Primer1
ATCTCTCTCCAGAAGAGCAATGG
Primer2
AGGTGCAGTGTCAGTTTCAAATC
GAATAACACACCGGAGACTTTTG
GTTAAAATCCACGCTGCGAACAG
GGATGACACTTTGAGAACTCCTG
ACTGGAACAAATTCTCAGTGTGTC
GTGAGAAGTGGTGTCACTCACGC
ATCAACACAGCTATTTACCACCC
ACAGCTATTTACCACCCTGGTC
ACAGTGAAGAAGTCGGCCAAGAAG
GTACCTCCAGCAATGACAGTAAAG
GTGTCAGAACTGCTTCAAGCCC
GAATCCATCTTTCTCTCATGTAGC
ATTCATGTTGCAAGAGCAGAAG
GTTCAGAGAATCTCCCAAAATCAC
GTCCATCATAGAGCTGAAAATGTG
AGAGACACCAAACAGGCTTTGAG
CTCATTTTGTCTTCAATGTGGG
Chapter 5: Conclusions and future directions
Evolving methods of genomic analysis reviewed in Chapter 1 have contributed to the
characterization of cancer genomes and transcriptomes at ever-increasing resolution. The
advent of second-generation sequencing technologies has enabled studies of cancers that
achieve single-nucleotide views of both genomes and transcriptomes. Applications of array-based methods and candidate gene Sanger-based re-sequencing to the analysis of human
neuroblastoma (NBL) have revealed novel loci associated with the disease, most notably the
anaplastic lymphoma kinase gene (ALK), which is subject to somatic mutation and amplification
in 5-15% of patients with sporadic NBL [192,193,188,194]. Based on these
studies we hypothesized that interrogations of NBL genomes and transcriptomes using
second-generation technologies may lead to novel insights into the disease. We also
hypothesized that a better understanding of the gene expression profile of the putative cell of
origin of NBL would help identify loci with clinical relevance to the disease and interpret
high-throughput sequencing data from NBL cells. To address these hypotheses we developed
three research objectives that formed the basis of Chapters 2, 3, and 4, each fulfilling specific
goals described in the subsections below.
5.1 Transcriptome analysis of normal neural crest cells identifies key pathways,
enriched and depleted in this population compared to other related cell types
Since NBL is thought to originate from a differentiation arrest along the
sympathoadrenal lineage of the neural crest, understanding the neural crest stem cell and its
development into this lineage may provide insight into the pathogenesis of NBL. Therefore,
the overall objective of Chapter 2 was to identify and characterize the expression of genes
and pathways that distinguish neural crest stem cells from other stem cell lineages with
similarly broad developmental potential. Skin-derived precursor cells (SKPs), previously
validated as models for normal neural crest stem cells [234], were used for this analysis.
Mesenchymal stem cells (MSCs) were chosen for
comparison as they represent one of the few somatic stem cell lineages that approach the
developmental potential of the neural crest [230,255,363].
To address the research objective of Chapter 2, I first characterized the transcriptomes
of SKPs isolated from ventral, dorsal and facial skin regions of the body that are thought to
derive from different developmental origins, including neural crest itself and somite
mesoderm. This analysis revealed plasticity of the neural crest stem cell phenotype
suggesting that cells resembling normal neural crest stem cells may arise from non-neural
crest lineages.
Based on this result, I used the three SKP populations to identify transcripts enriched
and depleted in SKPs compared to a related multipotent somatic stem cell lineage, the MSCs.
This analysis revealed the relative increase of mRNA abundance of transcripts involved in
the WNT/Beta-catenin, BMP and TGFB pathways, and relative depletion of transcripts
involved in double-stranded break DNA repair in SKPs compared to MSCs. While the
importance of active WNT/Beta-catenin, BMP and TGFB signaling in neural crest cells is
well-established [257,364,272], the relative reduction of the expression level of genes
involved in double-stranded break DNA repair is a novel finding. A recent study in mice has
identified eleven DNA repair genes, highly expressed during very early embryonic
development and barely detectable in the adrenal medulla, an organ derived from the
sympathoadrenal lineage of the neural crest and the most common primary site of NBL
[365]. This study is consistent with my finding of the decreased mRNA abundance of DNA
repair genes in SKPs compared to MSCs. In addition, the SKP and MSC comparison
revealed the preferential expression of pluripotency markers in SKPs, which prompted me to
further investigate similarities and differences between the expression profiles of SKPs and
ES cells. This analysis revealed 5 pluripotency markers (CTNNB1, ETV4, MAD2L2, PITX2,
SOX2) among the genes enriched in SKPs compared to MSCs, and 13 pluripotency markers
(ADAM23, AURKB, CENPK, FAM46B, FAM64A, HMGB2, IGF2BP3, KPNA2, MTHFD1,
MYBL2, TBX4, TPM1, ZFP57) among the genes depleted in SKPs compared to MSCs,
highlighting the unique phenotype of SKPs.
Future studies based on the findings in Chapter 2 may investigate further the
functional mechanism and consequences of the observed reduction of expression of DNA
repair genes in normal neural crest cells. The work discussed in Chapter 2 involved rat rather
than human neural crest cells as a model for the analysis, and it will be important to validate
these findings in human SKPs as well as other models of neural crest cells (for example, the
human epidermal neural crest stem cells [366]). The original choice of the rat cells was
driven by our desire to investigate the similarities and differences among SKPs isolated from
different parts of the body, and the availability of rat-derived MSCs for comparisons. Since
we showed the convergence of facial, dorsal trunk and ventral trunk SKPs to a neural crest
stem cell phenotype, and assuming that this finding holds true in other vertebrate species,
SKPs from any part of the body can be used to model normal neural crest stem cells. We
took advantage of this result in Chapter 3, where human foreskin-derived SKPs were used as
a reference normal tissue for the analysis of NBL tumor-initiating cells.
5.2 Plasticity of the neural crest stem cell phenotype and NBL heterogeneity
The results of the analysis described in Chapter 2 are consistent with the hypothesis
that neural crest stem cell-like cells could derive from non-neural crest lineages. In particular,
we showed that mesoderm-derived ventral and dorsal SKPs were similar to neural crest-derived facial SKPs at the level of gene expression and differentiation potential. This finding
may relate to the heterogeneity of NBL, which is a spectrum of diseases with diverse genetic
aberrations, pathological features, and clinical courses.
Dozens of clinical and biological markers of potential clinical significance have been
proposed for NBL [367]. Seven of these markers, including the differentiation grade of the
tumor (neuroblastoma, ganglioneuroblastoma or ganglioneuroma), are currently used
clinically for pre-treatment risk stratification of new NBL patients [183]. The differentiation
grade of NBL cells may reflect the developmental stage at transformation, and correlates
with the disease course, such that low-risk tumors typically have a more differentiated
morphology than high-risk tumors [175]. NBL recurrence may still occur in some low- or
intermediate-risk patients with differentiated morphology and low-stage disease, suggesting
that tumors of the same differentiation grade and stage may be heterogeneous at a molecular
level. Poor outcome in patients with differentiated tumors and low-stage disease was found to
be associated with high expression of MYC and low expression of genes involved in
sympathetic neuronal differentiation [368]. The heterogeneity of NBL cells with respect to
their apparent developmental program is also reflected in the variable sensitivity of NBL cell
lines to differentiation agents. For instance, retinoids can induce marked neuronal
differentiation and cell cycle arrest in some NBL cell lines but fail to have any effect on other
NBL lines, derived from patients with similar disease characteristics [369]. Notably, retinoic
acid is involved in regulating the differentiation of many tissues; however, the nature of the
growth and differentiation response to retinoic acid depends on the cell type [370].
The diversity of NBL cells with respect to differentiation grade, expression of
developmental markers, and sensitivity to retinoids may reflect different origins of the neural
crest progenitor cells that undergo transformation into NBL. As reported in Chapter 2, neural
crest stem cell-like cells may arise from both the neural crest and the mesoderm. This
observation suggests that NBL may in principle derive from mesodermal cells that have
converged to a neural crest precursor phenotype. Since cells of different developmental
origins, despite having similar phenotypes, maintain a developmental history at the gene
expression level (Section 2.2.2), different developmental origins of NBL may account for the
observed gene expression differences among NBL cells of presumably similar differentiation
grades [368]. The potential for mesodermal cells to give rise to NBL, as well as the putative
impact of this on NBL heterogeneity, remains to be addressed by future studies.
5.3 Transcriptome analysis of NBL tumor-initiating cells implicates AURKB as a novel
drug target for NBL
Having characterized the transcriptomes of normal neural crest cells in Chapter 2, I
set out to characterize a presumed malignant counterpart of these cells, the NBL tumor-initiating cells (TICs) derived from bone marrow metastases of high-risk NBL patients.
These cells have been shown to give rise to NBL when injected in mice, and upon serial
transplantation, suggesting that they are a suitable model for the disease. In addition, the
isolation of NBL TICs from patients in remission who later relapsed suggested that these
cells could be used as markers for minimal residual disease in otherwise asymptomatic
patients [279].
The overall objectives of Chapter 3 were to identify transcripts preferentially
abundant in NBL TICs compared to normal SKPs, characterized in Chapter 2, and to assess
if these transcripts could be used to suggest new targets against NBL. To address these
objectives I used RNA-Seq data from NBL TICs, SKPs, and other cancers to identify
transcripts whose expression was increased in NBL TICs compared to other tissue types. I
then conducted pathway analysis to identify functional associations among these transcripts
that could be targeted by inhibitors. The pathway analysis revealed the increased expression
of the BRCA1 signaling pathway members in NBL TICs compared to SKPs and other tissue
types, suggesting that the double-stranded break DNA repair pathway might be activated in
NBL TICs. The finding of the potential tumor-specific activation of this pathway led to the
hypothesis that AURKB, a kinase linked to this pathway through its interaction with
BRCA1-associated RING domain protein 1 (BARD1) and aberrantly expressed in NBL TICs,
could serve as a drug target against these cells. This hypothesis was tested through AURKB
knockdowns and treatments with an AURKB-specific pharmacological inhibitor, and both
experiments led to the specific killing of NBL TICs but not normal SKPs. An independent
group of investigators later confirmed AURKB to be a target in primary NBL tumors, further
validating our result [306].
Thus, the expression of members of the BRCA1 signaling pathway, found to be lower
in normal neural crest stem cell-like cells compared to MSCs in Chapter 2, appeared to be
increased in NBL TICs. Inhibition of a kinase involved in this pathway appeared to be
cytotoxic to these cells, suggesting the importance of this pathway for NBL pathogenesis.
Additional support for the role of BRCA1 signaling in the pathogenesis of NBL
comes from a genome-wide association study (GWAS) that found SNPs in the BARD1 locus to be associated with the
development of sporadic high-risk NBL [197]. The SNPs identified by the GWAS
have been suggested to influence the splicing of BARD1 such that exons 2 and 3 are
excluded, resulting in the loss of the functional domain involved in the BARD1 interaction
with BRCA1 [196]. The exon-level RNA-Seq analysis reported in Chapter 3 supported the
hypothesis of the preferential expression of the short BARD1beta isoform by NBL cells. The
reported novel function of the short BARD1beta isoform in stabilizing AURKB
[269] provides a potential mechanism for the preferential sensitivity of NBL cells, but not
normal neural crest cells, to AURKB inhibition.
Future work resulting from this finding will include functional studies that would
investigate the molecular effects of the inhibition of AURKB on the expression of the
BRCA1 pathway members. In this thesis, we speculated that inhibition of AURKB acts
through downregulating the expression of BRCA1 pathway members, such that gross
chromosomal abnormalities accumulate and are not repaired, resulting in cell death.
However, direct experimental evidence is required to support or refute this speculation.
Examining the expression of BARD1 and its splicing status following AURKB inhibition
would also be of interest. This experiment would reveal whether the killing of NBL cells by
inhibiting AURKB is associated with the downregulation of the expression of the
BARD1beta isoform.
A limitation of the work described in Chapter 3 is the use of NBL TICs that are
reportedly contaminated with EBV-transformed lymphocytes [280]. I believe that effects of
this contamination on the results were partially accounted for by the experimental design that
used an expression compendium with lymphocyte-related tissues (diseased B-cells) as
reference for identifying NBL TIC-enriched transcripts. The independent validation of drug
targets predicted by my analysis in primary tumors by other investigators (Chapter 3)
provided additional validation for the usefulness of NBL TICs as models of NBL, despite the
contamination. However, confirmatory studies in non-contaminated NBL stem cells would
be useful to assess the generality of our findings.
5.4 Whole genome, transcriptome and exome sequencing of primary NBL tumors
reveals a broad spectrum of somatic mutations
The analysis described in Chapters 2 and 3 focused on the transcriptomes of normal
and malignant neural crest cells, and implicated the BRCA1 DNA repair pathway as
aberrantly enriched at the mRNA level in metastases-derived NBL TICs compared to the
normal neural crest-like cells. While the finding of AURKB as a novel drug target was
validated in primary tumors [306], the overall experimental design in Chapter 3 focused on
identifying metastases-enriched transcripts and potential targets.
Therefore, the objective of Chapter 4 was to conduct a high resolution
characterization of a panel of 99 primary NBL tumors to identify recurrently altered genes
and pathways of relevance to primary tumors at diagnosis. We also investigated whether the
genetic aberrations found in primary tumors targeted similar pathways to those that have
been identified to be aberrantly expressed in metastases-derived NBL TICs (Chapter 3).
We sequenced 99 primary tumors and matched peripheral blood using a combination
of whole genome and exome sequencing performed using Illumina and CGI technologies.
We also sequenced the transcriptomes from 10 primary tumors included in the set of 99
cases. Analysis of these data revealed that NBL tumors contained a median of 0.56 non-silent
mutations per megabase of coding DNA, one of the lowest rates reported in cancer to date.
The ALK gene showed the highest somatic mutation rate and was found to be mutated in 9% of
cases, with another case (PANYGR) harboring an oncogenic germline mutation in the kinase
domain of ALK. Three additional genes (LILRB1, PTPN11 and NRAS) showed significantly
recurrent mutations in non-hypermutated cases, albeit in less than 5% of cases. A loss-of-function
translocation of IKZF3 together with alterations found in related genes implicated
disruption of chromatin remodeling mechanisms in 11% of cases. Mutations in PTPN11, its
regulator LILRB1, and other MAPK signaling components including NRAS implicated
hyperactivation of the RAS/MAPK pathway in 15% of cases. Mutations in MYC and MYCN
were seen in two tumors without MYCN amplification, suggesting that MYCN could be
activated in NBL through a variety of mechanisms. A hypermutator phenotype was found in
2% of the cases with loss of function mutations in DNA repair genes. In addition, we
identified over 80 somatic structural variants including the aforementioned IKZF3
rearrangement. Therefore, the work described in Chapter 4 highlighted the molecular
heterogeneity of high-risk NBL, identified commonly disrupted pathways, and demonstrated
a relative paucity of somatically acquired mutations, thus implicating epigenetic events as
potentially contributing to the tumor behavior.
In addition to cataloging the genetic aberrations found in primary tumors, I also
compared the genes harboring somatic mutations in primary tumors to those found to be
increased in expression in NBL TICs compared to SKPs. While I did not observe somatic
mutations in BARD1 that could directly explain the preferential expression of the short
BARD1beta isoform described in Chapter 3, I did observe several novel germline variants
occurring in BARD1 introns that could be associated with this phenotype. Future studies can
address this possibility by examining a larger cohort of tumors with matched expression and
DNA sequence data from tumor and normal DNA.
5.5 Future directions in NBL genomics
While the work in Chapter 4 identified a potential disease
mechanism in over 50% of all cases (Figure 4.2B), considerable discovery
is still needed to uncover additional molecular aberrations that may contribute to
NBL development. It remains a challenge from the translational point of view that the most
common genomic aberrations in primary NBL are large chromosomal rearrangements
affecting hundreds of genes, and other than MYCN and ALK, focal disruption of individual
genes appears to be rare (as seen in Figure 4.3). In addition, it is possible that a significant part
of the disease phenotype may be related to germline genetic variation and subsequent
stochastic and/or epigenetic alterations in tumor cells. Future efforts in the field may involve
integration of data from the genome-wide association efforts [196] with the sequencing data,
such as those described in Chapter 4, as well as generation of new data sets querying
epigenetic and expression changes. Precedent for epigenetic abnormalities playing a
causative role in the pathogenesis of a pediatric cancer has been established by a recent study
in retinoblastoma. This study employed genome-wide sequencing and epigenetic analysis of
retinoblastoma tumors to reveal few somatic mutations but a number of cancer pathways,
including the pathway involving the proto-oncogene SYK, being deregulated at an epigenetic
level [170]. Since NBL can be regarded as a malignancy resulting from a differentiation
arrest of the neural crest [371], epigenetic abnormalities may play a significant part in
determining the ultimate clinical phenotype. Whether this is so remains to be addressed
through comprehensive surveys of the epigenome.
Bibliography
1. Nature Milestones in Cancer
[http://www.nature.com/milestones/milecancer/masthead/index.html]. Accessed 2 March
2011.
2. Boveri T: Über mehrpolige Mitosen als Mittel zur Analyse des Zellkerns. Verh. D. Phys.
Med. Ges. 1902, 35:67–90.
3. Boveri T: Zur Frage der Entstehung maligner Tumoren. Jena: Verlag von Gustav Fischer;
1914.
4. Finlay CA, Hinds PW, Levine AJ: The p53 proto-oncogene can act as a suppressor of
transformation. Cell 1989, 57:1083–1093.
5. Huang HJ, Yee JK, Shew JY, Chen PL, Bookstein R, Friedmann T, Lee EY, Lee WH:
Suppression of the neoplastic phenotype by replacement of the RB gene in human
cancer cells. Science 1988, 242:1563–1566.
6. Stehelin D, Varmus HE, Bishop JM, Vogt PK: DNA related to the transforming gene(s)
of avian sarcoma viruses is present in normal avian DNA. Nature 1976, 260:170–173.
7. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton
MR: A census of human cancer genes. Nat. Rev. Cancer 2004, 4:177–183. doi:10.1038/nrc1299.
8. Rous P: A sarcoma of the fowl transmissible by an agent separable from the tumor
cells. J. Exp. Med 1911, 13:397–411.
9. Leach FS, Nicolaides NC, Papadopoulos N, Liu B, Jen J, Parsons R, Peltomäki P, Sistonen
P, Aaltonen LA, Nyström-Lahti M: Mutations of a mutS homolog in hereditary
nonpolyposis colorectal cancer. Cell 1993, 75:1215–1225.
10. Nordling CO: A new theory on cancer-inducing mechanism. Br. J. Cancer 1953, 7:68–
72.
11. Knudson AG: Mutation and cancer: statistical study of retinoblastoma. Proc. Natl.
Acad. Sci. U.S.A 1971, 68:820–823.
12. Nowell PC: The clonal evolution of tumor cell populations. Science 1976, 194:23–28.
13. Fearon ER, Vogelstein B: A genetic model for colorectal tumorigenesis. Cell 1990,
61:759–767.
14. Feinberg AP, Vogelstein B: Hypomethylation distinguishes genes of some human
cancers from their normal counterparts. Nature 1983, 301:89–92.
15. Laird PW, Jackson-Grusby L, Fazeli A, Dickinson SL, Jung WE, Li E, Weinberg RA,
Jaenisch R: Suppression of intestinal neoplasia by DNA hypomethylation. Cell 1995,
81:197–205.
16. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell 2000, 100:57–70.
17. Li FP: Familial cancer syndromes and clusters. Curr Probl Cancer 1990, 14:73–114.
18. Sijmons R: Identifying patients with familial cancer syndromes. In Cancer Syndromes.
National Center for Biotechnology Information (US); 2009.
19. Knudson AG: Hereditary cancers disclose a class of cancer genes. Cancer 1989,
63:1888–1891.
20. Latif F, Tory K, Gnarra J, Yao M, Duh FM, Orcutt ML, Stackhouse T, Kuzmin I, Modi
W, Geil L: Identification of the von Hippel-Lindau disease tumor suppressor gene.
Science 1993, 260:1317–1320.
21. Kenemans P, Verstraeten RA, Verheijen RHM: Oncogenic pathways in hereditary and
sporadic breast cancer. Maturitas 2004, 49:34–43. doi:10.1016/j.maturitas.2004.06.005.
22. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K,
Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L,
Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris
W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, et al.: Initial
sequencing and analysis of the human genome. Nature 2001, 409:860–921.
doi:10.1038/35057062.
23. The International HapMap Project. Nature 2003, 426:789–796. doi:10.1038/nature02168.
24. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M,
Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR,
Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J,
Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, et al.:
The sequence of the human genome. Science 2001, 291:1304–1351.
doi:10.1126/science.1058040.
25. National Cancer Institute: Surveillance, Epidemiology and End Results (SEER)
Database. 2010, Available: http://seer.cancer.gov/statistics/. Accessed 28 July 2011.
26. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, Pukkala E,
Skytthe A, Hemminki K: Environmental and heritable factors in the causation of cancer-analyses of cohorts of twins from Sweden, Denmark, and Finland. N. Engl. J. Med 2000,
343:78–85. doi:10.1056/NEJM200007133430201.
27. Christiani DC: Combating Environmental Causes of Cancer. New England Journal of
Medicine 2011, 364:791–793.
28. Stent GS: The role of cell lineage in development. Philos. Trans. R. Soc. Lond., B, Biol.
Sci 1985, 312:3–19.
29. Bonnet D, Dick JE: Human acute myeloid leukemia is organized as a hierarchy that
originates from a primitive hematopoietic cell. Nat Med 1997, 3:730–737.
doi:10.1038/nm0797-730.
30. Al-Hajj M, Wicha MS, Benito-Hernandez A, Morrison SJ, Clarke MF: Prospective
identification of tumorigenic breast cancer cells. Proc. Natl. Acad. Sci. U.S.A 2003,
100:3983–3988. doi:10.1073/pnas.0530291100.
31. Singh SK, Clarke ID, Terasaki M, Bonn VE, Hawkins C, Squire J, Dirks PB:
Identification of a Cancer Stem Cell in Human Brain Tumors. Cancer Research 2003,
63:5821–5828.
32. Quintana E, Shackleton M, Sabel MS, Fullen DR, Johnson TM, Morrison SJ: Efficient
tumour formation by single human melanoma cells. Nature 2008, 456:593–598.
doi:10.1038/nature07567.
33. Santisteban M, Reiman JM, Asiedu MK, Behrens MD, Nassar A, Kalli KR, Haluska P,
Ingle JN, Hartmann LC, Manjili MH, Radisky DC, Ferrone S, Knutson KL: Immune-induced epithelial to mesenchymal transition in vivo generates breast cancer stem cells.
Cancer Res 2009, 69:2887–2895. doi:10.1158/0008-5472.CAN-08-3343.
34. Gupta PB, Chaffer CL, Weinberg RA: Cancer stem cells: mirage or reality? Nat Med
2009, 15:1010–1012. doi:10.1038/nm0909-1010.
35. Zhou B-BS, Zhang H, Damelin M, Geles KG, Grindley JC, Dirks PB: Tumour-initiating cells: challenges and opportunities for anticancer drug discovery. Nat Rev
Drug Discov 2009, 8:806–823. doi:10.1038/nrd2137.
36. Beheshti B, Braude I, Marrano P, Thorner P, Zielenska M, Squire JA: Chromosomal
localization of DNA amplifications in neuroblastoma tumors using cDNA microarray
comparative genomic hybridization. Neoplasia 2003, 5:53–62.
37. Caspersson T, Lindsten J, Lomakka G, Moller A, Zech L: The use of fluorescence
techniques for the recognition of mammalian chromosomes and chromosome regions.
Int Rev Exp Pathol 1972, 11:1–72.
38. Rowley JD: Letter: A new consistent chromosomal abnormality in chronic
myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining.
Nature 1973, 243:290–293.
39. Nowell PC, Hungerford DA: Chromosome studies on normal and leukemic human
leukocytes. J. Natl. Cancer Inst. 1960, 25:85–109.
40. Barnes WM: PCR amplification of up to 35-kb DNA with high fidelity and high yield
from lambda bacteriophage templates. Proc. Natl. Acad. Sci. U.S.A. 1994, 91:2216–2220.
41. Vasickova P, Machackova E, Lukesova M, Damborsky J, Horky O, Pavlu H, Kuklova J,
Kosinova V, Navratilova M, Foretova L: High occurrence of BRCA1 intragenic
rearrangements in hereditary breast and ovarian cancer syndrome in the Czech
Republic. BMC Med. Genet. 2007, 8:32. doi:10.1186/1471-2350-8-32.
42. Buongiorno-Nardelli M, Amaldi F: Autoradiographic detection of molecular hybrids
between RNA and DNA in tissue sections. Nature 1970, 225:946–948.
43. Speicher MR, Carter NP: The new cytogenetics: blurring the boundaries with
molecular biology. Nat. Rev. Genet. 2005, 6:782–792. doi:10.1038/nrg1692.
44. Patel AS, Hawkins AL, Griffin CA: Cytogenetics and cancer. Curr Opin Oncol 2000,
12:62–67.
45. Speicher MR, Gwyn Ballard S, Ward DC: Karyotyping human chromosomes by
combinatorial multi-fluor FISH. Nat. Genet. 1996, 12:368–375. doi:10.1038/ng0496-368.
46. Schröck E, du Manoir S, Veldman T, Schoell B, Wienberg J, Ferguson-Smith MA, Ning
Y, Ledbetter DH, Bar-Am I, Soenksen D, Garini Y, Ried T: Multicolor spectral
karyotyping of human chromosomes. Science 1996, 273:494–497.
47. Tanke HJ, Wiegant J, van Gijlswijk RP, Bezrookove V, Pattenier H, Heetebrij RJ,
Talman EG, Raap AK, Vrolijk J: New strategy for multi-colour fluorescence in situ
hybridisation: COBRA: COmbined Binary RAtio labelling. Eur. J. Hum. Genet. 1999,
7:2–11. doi:10.1038/sj.ejhg.5200265.
48. Fujiwara H, Emi M, Nagai H, Ohgaki K, Imoto I, Akimoto M, Ogawa O, Habuchi T:
Definition of a 1-Mb homozygous deletion at 9q32-q33 in a human bladder-cancer cell
line. J. Hum. Genet. 2001, 46:372–377. doi:10.1007/s100380170056.
49. Henderson L-J, Okamoto I, Lestou VS, Ludkovski O, Robichaud M, Chhanabhai M,
Gascoyne RD, Klasa RJ, Connors JM, Marra MA, Horsman DE, Lam WL: Delineation of a
minimal region of deletion at 6q16.3 in follicular lymphoma and construction of a
bacterial artificial chromosome contig spanning a 6-megabase region of 6q16-q21.
Genes Chromosomes Cancer 2004, 40:60–65. doi:10.1002/gcc.20013.
50. Huang H, Qian C, Jenkins RB, Smith DI: Fish mapping of YAC clones at human
chromosomal band 7q31.2: identification of YACS spanning FRA7G within the
common region of LOH in breast and prostate cancer. Genes Chromosomes Cancer
1998, 21:152–159.
51. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D:
Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors.
Science 1992, 258:818–821.
52. Mantripragada KK, Buckley PG, Diaz de Ståhl T, Dumanski JP: Genomic microarrays
in the spotlight. Trends Genet. 2004, 20:87–94.
53. Carter NP: Methods and strategies for analyzing copy number variation using DNA
microarrays. Nat. Genet. 2007, 39:S16–21. doi:10.1038/ng2028.
54. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C,
Zhai Y, Dairkee SH, Ljung BM, Gray JW, Albertson DG: High resolution analysis of DNA
copy number variation using comparative genomic hybridization to microarrays. Nat.
Genet. 1998, 20:207–211. doi:10.1038/2524.
55. Buckley PG, Mantripragada KK, Benetkiewicz M, Tapia-Páez I, Diaz De Ståhl T,
Rosenquist M, Ali H, Jarbo C, De Bustos C, Hirvelä C, Sinder Wilén B, Fransson I, Thyr C,
Johnsson B-I, Bruder CEG, Menzel U, Hergersberg M, Mandahl N, Blennow E, Wedell A,
Beare DM, Collins JE, Dunham I, Albertson D, Pinkel D, Bastian BC, Faruqi AF, Lasken
RS, Ichimura K, Collins VP, et al.: A full-coverage, high-resolution human chromosome
22 genomic microarray for clinical and research applications. Hum. Mol. Genet. 2002,
11:3221–3229.
56. Buckley PG, Mantripragada KK, Piotrowski A, Diaz de Ståhl T, Dumanski JP: Copy-number polymorphisms: mining the tip of an iceberg. Trends Genet. 2005, 21:315–317.
doi:10.1016/j.tig.2005.04.007.
57. Krzywinski M, Bosdet I, Smailus D, Chiu R, Mathewson C, Wye N, Barber S, Brown-John M, Chan S, Chand S, Cloutier A, Girn N, Lee D, Masson A, Mayo M, Olson T, Pandoh
P, Prabhu A-L, Schoenmakers E, Tsai M, Albertson D, Lam W, Choy C-O, Osoegawa K,
Zhao S, de Jong PJ, Schein J, Jones S, Marra MA: A set of BAC clones spanning the
human genome. Nucleic Acids Res. 2004, 32:3651–3660. doi:10.1093/nar/gkh700.
58. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, Snijders A,
Albertson DG, Pinkel D, Marra MA, Ling V, MacAulay C, Lam WL: A tiling resolution
DNA microarray with complete coverage of the human genome. Nat. Genet. 2004,
36:299–303. doi:10.1038/ng1307.
59. Inazawa J, Inoue J, Imoto I: Comparative genomic hybridization (CGH)-arrays pave
the way for identification of novel cancer-related genes. Cancer Sci. 2004, 95:559–563.
60. De Lellis L, Curia MC, Aceto GM, Toracchio S, Colucci G, Russo A, Mariani-Costantini
R, Cama A: Analysis of extended genomic rearrangements in oncological research. Ann.
Oncol. 2007, 18 Suppl 6:vi173–178. doi:10.1093/annonc/mdm251.
61. Gunderson KL, Steemers FJ, Lee G, Mendoza LG, Chee MS: A genome-wide scalable
SNP genotyping assay using microarray technology. Nat. Genet. 2005, 37:549–554.
doi:10.1038/ng1547.
62. Bignell GR, Huang J, Greshock J, Watt S, Butler A, West S, Grigorova M, Jones KW,
Wei W, Stratton MR, Futreal PA, Weber B, Shapero MH, Wooster R: High-resolution
analysis of DNA copy number using oligonucleotide microarrays. Genome Res. 2004,
14:287–295. doi:10.1101/gr.2012304.
63. Heinrichs S, Look AT: Identification of structural aberrations in cancer by SNP
array analysis. Genome Biol. 2007, 8:219. doi:10.1186/gb-2007-8-7-219.
64. LaFramboise T, Weir BA, Zhao X, Beroukhim R, Li C, Harrington D, Sellers WR,
Meyerson M: Allele-specific amplification in cancer revealed by SNP array analysis.
PLoS Comput. Biol. 2005, 1:e65. doi:10.1371/journal.pcbi.0010065.
65. Baross A, Delaney AD, Li HI, Nayar T, Flibotte S, Qian H, Chan SY, Asano J, Ally A,
Cao M, Birch P, Brown-John M, Fernandes N, Go A, Kennedy G, Langlois S, Eydoux P,
Friedman JM, Marra MA: Assessment of algorithms for high throughput detection of
genomic copy number variation in oligonucleotide microarray data. BMC Bioinformatics
2007, 8:368. doi:10.1186/1471-2105-8-368.
66. Wang K, Diskin SJ, Zhang H, Attiyeh EF, Winter C, Hou C, Schnepp RW, Diamond M,
Bosse K, Mayes PA, Glessner J, Kim C, Frackelton E, Garris M, Wang Q, Glaberson W,
Chiavacci R, Nguyen L, Jagannathan J, Saeki N, Sasaki H, Grant SFA, Iolascon A, Mosse
YP, Cole KA, Li H, Devoto M, McGrady PW, London WB, Capasso M, et al.: Integrative
genomics identifies LMO1 as a neuroblastoma oncogene. Nature 2011, 469:216–220.
doi:10.1038/nature09609.
67. Reid C: Company Profile: Complete Genomics Inc. Future Oncology 2011, 7:219–221.
doi:10.2217/fon.10.173.
68. Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors.
Proc. Natl. Acad. Sci. U.S.A 1977, 74:5463–5467.
69. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H,
Teague J, Butler A, Stevens C, Edkins S, O’Meara S, Vastrik I, Schmidt EE, Avis T,
Barthorpe S, Bhamra G, Buck G, Choudhury B, Clements J, Cole J, Dicks E, Forbes S, Gray
K, Halliday K, Harrison R, Hills K, Hinton J, Jenkinson A, Jones D, et al.: Patterns of
somatic mutation in human cancer genomes. Nature 2007, 446:153–158.
doi:10.1038/nature05610.
70. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J,
Braverman MS, Chen Y-J, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Goodwin BC,
He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer MLI, Jarvie TP, Jirage KB, Kim
J-B, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, et al.: Genome
Sequencing in Open Microfabricated High Density Picoliter Reactors. Nature 2005,
437:376–380. doi:10.1038/nature03959.
71. Tawfik DS, Griffiths AD: Man-made cell-like compartments for molecular evolution.
Nat. Biotechnol 1998, 16:652–656. doi:10.1038/nbt0798-652.
72. Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey M, Leamon JH,
Johnson K, Milgrew MJ, Edwards M, Hoon J, Simons JF, Marran D, Myers JW, Davidson
JF, Branting A, Nobile JR, Puc BP, Light D, Clark TA, Huber M, Branciforte JT, Stoner IB,
Cawley SE, Lyons M, Fu Y, Homer N, Sedova M, Miao X, Reed B, et al.: An integrated
semiconductor device enabling non-optical genome sequencing. Nature 2011, 475:348–352.
doi:10.1038/nature10242.
73. Bennett ST, Barnes C, Cox A, Davies L, Brown C: Toward the $1000 human genome.
Pharmacogenomics 2005, 6:373–382. doi:10.1517/14622416.6.4.373.
74. Bentley DR: Whole-genome re-sequencing. Curr. Opin. Genet. Dev 2006, 16:545–552.
doi:10.1016/j.gde.2006.10.009.
75. Braslavsky I, Hebert B, Kartalov E, Quake SR: Sequence information can be obtained
from single DNA molecules. Proc Natl Acad Sci U S A 2003, 100:3960–3964.
doi:10.1073/pnas.0230489100.
76. Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman
B, Bibillo A, Bjornson K, Chaudhuri B, Christians F, Cicero R, Clark S, Dalal R, Dewinter
A, Dixon J, Foquet M, Gaertner A, Hardenbol P, Heiner C, Hester K, Holden D, Kearns G,
Kong X, Kuse R, Lacroix Y, Lin S, et al.: Real-time DNA sequencing from single
polymerase molecules. Science 2009, 323:133–138. doi:10.1126/science.1162986.
77. Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SB,
Hood LE: Fluorescence detection in automated DNA sequence analysis. Nature 1986,
321:674–679. doi:10.1038/321674a0.
78. Rosenblum BB, Lee LG, Spurgeon SL, Khan SH, Menchen SM, Heiner CR, Chen SM:
New dye-labeled terminators for improved DNA sequencing patterns. Nucleic Acids
Research 1997, 25:4500–4504. doi:10.1093/nar/25.22.4500.
79. Dames S, Durtschi J, Geiersbach K, Stephens J, Voelkerding KV: Comparison of the
Illumina Genome Analyzer and Roche 454 GS FLX for Resequencing of Hypertrophic
Cardiomyopathy-Associated Genes. J Biomol Tech 2010, 21:73–80.
80. Harris TD, Buzby PR, Babcock H, Beer E, Bowers J, Braslavsky I, Causey M, Colonell
J, Dimeo J, Efcavitch JW, Giladi E, Gill J, Healy J, Jarosz M, Lapen D, Moulton K, Quake
SR, Steinmann K, Thayer E, Tyurina A, Ward R, Weiss H, Xie Z: Single-molecule DNA
sequencing of a viral genome. Science 2008, 320:106–109. doi:10.1126/science.1150427.
81. Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang
MD, Zhang K, Mitra RD, Church GM: Accurate Multiplex Polony Sequencing of an
Evolved Bacterial Genome. Science 2005, 309:1728–1732. doi:10.1126/science.1117389.
82. Wang T-L, Maierhofer C, Speicher MR, Lengauer C, Vogelstein B, Kinzler KW,
Velculescu VE: Digital karyotyping. Proc. Natl. Acad. Sci. U.S.A 2002, 99:16156–16161.
doi:10.1073/pnas.202610899.
83. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW: Serial analysis of gene
expression. Science 1995, 270:484–487.
84. Parrett TJ, Yan H: Digital karyotyping technology: exploring the cancer genome.
Expert Rev. Mol. Diagn 2005, 5:917–925. doi:10.1586/14737159.5.6.917.
85. Salani R, Chang C-L, Cope L, Wang T-L: Digital karyotyping: an update of its
applications in cancer. Mol Diagn Ther 2006, 10:231–237.
86. Volik S, Raphael BJ, Huang G, Stratton MR, Bignel G, Murnane J, Brebner JH,
Bajsarowicz K, Paris PL, Tao Q, Kowbel D, Lapuk A, Shagin DA, Shagina IA, Gray JW,
Cheng J-F, de Jong PJ, Pevzner P, Collins C: Decoding the fine-scale structure of a breast
cancer genome and transcriptome. Genome Res 2006, 16:394–404. doi:10.1101/gr.4247306.
87. Volik S, Zhao S, Chin K, Brebner JH, Herndon DR, Tao Q, Kowbel D, Huang G, Lapuk
A, Kuo W-L, Magrane G, De Jong P, Gray JW, Collins C: End-sequence profiling:
sequence-based analysis of aberrant genomes. Proc. Natl. Acad. Sci. U.S.A 2003,
100:7696–7701. doi:10.1073/pnas.1232418100.
88. Krzywinski M, Bosdet I, Mathewson C, Wye N, Brebner J, Chiu R, Corbett R, Field M,
Lee D, Pugh T, Volik S, Siddiqui A, Jones S, Schein J, Collins C, Marra M: A BAC clone
fingerprinting approach to the detection of human genome rearrangements. Genome
Biol 2007, 8:R224. doi:10.1186/gb-2007-8-10-r224.
89. Collins FS, Barker AD: Mapping the cancer genome. Pinpointing the genes involved
in cancer will help chart a new course across the complex landscape of human
malignancies. Sci. Am 2007, 296:50–57.
90. Dickson D: Wellcome funds cancer database. Nature 1999, 401:729. doi:10.1038/44413.
91. The Cancer Genome Atlas Research Network: Comprehensive genomic
characterization defines human glioblastoma genes and core pathways. Nature 2008,
455:1061–1068. doi:10.1038/nature07385.
92. Parsons DW, Jones S, Zhang X, Lin JC-H, Leary RJ, Angenendt P, Mankoo P, Carter H,
Siu I-M, Gallia GL, Olivi A, McLendon R, Rasheed BA, Keir S, Nikolskaya T, Nikolsky Y,
Busam DA, Tekleab H, Diaz LA, Hartigan J, Smith DR, Strausberg RL, Marie SKN, Shinjo
SMO, Yan H, Riggins GJ, Bigner DD, Karchin R, Papadopoulos N, Parmigiani G, et al.: An
integrated genomic analysis of human glioblastoma multiforme. Science 2008,
321:1807–1812. doi:10.1126/science.1164382.
93. Barretina J, Taylor BS, Banerji S, Ramos AH, Lagos-Quintana M, Decarolis PL, Shah K,
Socci ND, Weir BA, Ho A, Chiang DY, Reva B, Mermel CH, Getz G, Antipin Y, Beroukhim
R, Major JE, Hatton C, Nicoletti R, Hanna M, Sharpe T, Fennell TJ, Cibulskis K, Onofrio
RC, Saito T, Shukla N, Lau C, Nelander S, Silver SJ, Sougnez C, et al.: Subtype-specific
genomic alterations define new targets for soft-tissue sarcoma therapy. Nat. Genet 2010,
42:715–721. doi:10.1038/ng.619.
94. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C,
Greulich H, Muzny DM, Morgan MB, Fulton L, Fulton RS, Zhang Q, Wendl MC, Lawrence
MS, Larson DE, Chen K, Dooling DJ, Sabo A, Hawes AC, Shen H, Jhangiani SN, Lewis LR,
Hall O, Zhu Y, Mathew T, Ren Y, Yao J, Scherer SE, Clerc K, et al.: Somatic mutations
affect key pathways in lung adenocarcinoma. Nature 2008, 455:1069–1075.
doi:10.1038/nature07423.
95. Zhang J, Mullighan CG, Harvey RC, Wu G, Chen X, Edmonson M, Buetow KH, Carroll
WL, Chen I-M, Devidas M, Gerhard DS, Loh ML, Reaman GH, Relling MV, Camitta BM,
Bowman WP, Smith MA, Willman CL, Downing JR, Hunger SP: Key pathways are
frequently mutated in high risk childhood acute lymphoblastic leukemia: a report from
the Children’s Oncology Group. Blood 2011, doi:10.1182/blood-2011-03-341412. Available:
http://www.ncbi.nlm.nih.gov/pubmed/21680795. Accessed 27 June 2011.
96. Sjöblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, Leary RJ,
Ptak J, Silliman N, Szabo S, Buckhaults P, Farrell C, Meeh P, Markowitz SD, Willis J,
Dawson D, Willson JKV, Gazdar AF, Hartigan J, Wu L, Liu C, Parmigiani G, Park BH,
Bachman KE, Papadopoulos N, Vogelstein B, Kinzler KW, Velculescu VE: The consensus
coding sequences of human breast and colorectal cancers. Science 2006, 314:268–274.
doi:10.1126/science.1133427.
97. Wood LD, Parsons DW, Jones S, Lin J, Sjöblom T, Leary RJ, Shen D, Boca SM, Barber
T, Ptak J, Silliman N, Szabo S, Dezso Z, Ustyanksky V, Nikolskaya T, Nikolsky Y, Karchin
R, Wilson PA, Kaminker JS, Zhang Z, Croshaw R, Willis J, Dawson D, Shipitsin M, Willson
JKV, Sukumar S, Polyak K, Park BH, Pethiyagoda CL, Pant PVK, et al.: The genomic
landscapes of human breast and colorectal cancers. Science 2007, 318:1108–1113.
doi:10.1126/science.1145720.
98. The Cancer Genome Atlas Research Network: Integrated genomic analyses of ovarian
carcinoma. Nature 2011, 474:609–615. doi:10.1038/nature10166.
99. Parsons DW, Li M, Zhang X, Jones S, Leary RJ, Lin JC-H, Boca SM, Carter H, Samayoa
J, Bettegowda C, Gallia GL, Jallo GI, Binder ZA, Nikolsky Y, Hartigan J, Smith DR,
Gerhard DS, Fults DW, VandenBerg S, Berger MS, Marie SKN, Shinjo SMO, Clara C,
Phillips PC, Minturn JE, Biegel JA, Judkins AR, Resnick AC, Storm PB, Curran T, et al.:
The genetic landscape of the childhood cancer medulloblastoma. Science 2011, 331:435–439.
doi:10.1126/science.1198056.
100. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung
K, Menzies A, Teague JW, Campbell PJ, Stratton MR, Futreal PA: COSMIC: mining
complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic
Acids Res 2011, 39:D945–950. doi:10.1093/nar/gkq929.
101. Mardis ER, Ding L, Dooling DJ, Larson DE, McLellan MD, Chen K, Koboldt DC,
Fulton RS, Delehaunty KD, McGrath SD, Fulton LA, Locke DP, Magrini VJ, Abbott RM,
Vickery TL, Reed JS, Robinson JS, Wylie T, Smith SM, Carmichael L, Eldred JM, Harris
CC, Walker J, Peck JB, Du F, Dukes AF, Sanderson GE, Brummett AM, Clark E,
McMichael JF, et al.: Recurring mutations found by sequencing an acute myeloid
leukemia genome. N. Engl. J. Med 2009, 361:1058–1066. doi:10.1056/NEJMoa0903840.
102. Morin RD, Johnson NA, Severson TM, Mungall AJ, An J, Goya R, Paul JE, Boyle M,
Woolcock BW, Kuchenbauer F, Yap D, Humphries RK, Griffith OL, Shah S, Zhu H,
Kimbara M, Shashkin P, Charlot JF, Tcherpakov M, Corbett R, Tam A, Varhol R, Smailus
D, Moksa M, Zhao Y, Delaney A, Qian H, Birol I, Schein J, Moore R, et al.: Somatic
mutation of EZH2 (Y641) in Follicular and Diffuse Large B-cell Lymphomas of
Germinal Center Origin. Nat Genet 2010, 42:181–185. doi:10.1038/ng.518.
103. Shah SP, Köbel M, Senz J, Morin RD, Clarke BA, Wiegand KC, Leung G, Zayed A,
Mehl E, Kalloger SE, Sun M, Giuliany R, Yorida E, Jones S, Varhol R, Swenerton KD,
Miller D, Clement PB, Crane C, Madore J, Provencher D, Leung P, DeFazio A, Khattra J,
Turashvili G, Zhao Y, Zeng T, Glover JNM, Vanderhyden B, Zhao C, et al.: Mutation of
FOXL2 in granulosa-cell tumors of the ovary. N. Engl. J. Med 2009, 360:2719–2729.
doi:10.1056/NEJMoa0902542.
104. Thomas RK, Nickerson E, Simons JF, Janne PA, Tengs T, Yuza Y, Garraway LA,
LaFramboise T, Lee JC, Shah K, O’Neill K, Sasaki H, Lindeman N, Wong K-K, Borras AM,
Gutmann EJ, Dragnev KH, DeBiasi R, Chen T-H, Glatt KA, Greulich H, Desany B, Lubeski
CK, Brockman W, Alvarez P, Hutchison SK, Leamon JH, Ronan MT, Turenchalk GS,
Egholm M, et al.: Sensitive mutation detection in heterogeneous cancer specimens by
massively parallel picoliter reactor sequencing. Nat Med 2006, 12:852–855.
doi:10.1038/nm1437.
105. Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, Dooling D, Dunford-Shore BH, McGrath S, Hickenbotham M, Cook L, Abbott R, Larson DE, Koboldt DC, Pohl
C, Smith S, Hawkins A, Abbott S, Locke D, Hillier LW, Miner T, Fulton L, Magrini V,
Wylie T, Glasscock J, Conyers J, Sander N, Shi X, Osborne JR, Minx P, et al.: DNA
sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 2008,
456:66–72. doi:10.1038/nature07485.
106. Morin RD, Mendez-Lago M, Mungall AJ, Goya R, Mungall KL, Corbett RD, Johnson
NA, Severson TM, Chiu R, Field M, Jackman S, Krzywinski M, Scott DW, Trinh DL,
Tamura-Wells J, Li S, Firme MR, Rogic S, Griffith M, Chan S, Yakovenko O, Meyer IM,
Zhao EY, Smailus D, Moksa M, Chittaranjan S, Rimsza L, Brooks-Wilson A, Spinelli JJ,
Ben-Neriah S, et al.: Frequent mutation of histone-modifying genes in non-Hodgkin
lymphoma. Nature 2011, 476:298–303. doi:10.1038/nature10351.
107. Berger MF, Lawrence MS, Demichelis F, Drier Y, Cibulskis K, Sivachenko AY, Sboner
A, Esgueva R, Pflueger D, Sougnez C, Onofrio R, Carter SL, Park K, Habegger L, Ambrogio
L, Fennell T, Parkin M, Saksena G, Voet D, Ramos AH, Pugh TJ, Wilkinson J, Fisher S,
Winckler W, Mahan S, Ardlie K, Baldwin J, Simons JW, Kitabayashi N, MacDonald TY, et
al.: The genomic complexity of primary human prostate cancer. Nature 2011, 470:214–220.
doi:10.1038/nature09744.
108. Chapman MA, Lawrence MS, Keats JJ, Cibulskis K, Sougnez C, Schinzel AC, Harview
CL, Brunet J-P, Ahmann GJ, Adli M, Anderson KC, Ardlie KG, Auclair D, Baker A,
Bergsagel PL, Bernstein BE, Drier Y, Fonseca R, Gabriel SB, Hofmeister CC, Jagannath S,
Jakubowiak AJ, Krishnan A, Levy J, Liefeld T, Lonial S, Mahan S, Mfuko B, Monti S,
Perkins LM, et al.: Initial genome sequencing and analysis of multiple myeloma. Nature
2011, 471:467–472. doi:10.1038/nature09837.
109. Jones SJ, Laskin J, Li YY, Griffith OL, An J, Bilenky M, Butterfield YS, Cezard T,
Chuah E, Corbett R, Fejes AP, Griffith M, Yee J, Martin M, Mayo M, Melnyk N, Morin RD,
Pugh TJ, Severson T, Shah SP, Sutcliffe M, Tam A, Terry J, Thiessen N, Thomson T, Varhol
R, Zeng T, Zhao Y, Moore RA, Huntsman DG, et al.: Evolution of an adenocarcinoma in
response to selection by targeted kinase inhibitors. Genome Biol 2010,
11:R82. doi:10.1186/gb-2010-11-8-r82.
110. Lee W, Jiang Z, Liu J, Haverty PM, Guan Y, Stinson J, Yue P, Zhang Y, Pant KP, Bhatt
D, Ha C, Johnson S, Kennemer MI, Mohan S, Nazarenko I, Watanabe C, Sparks AB, Shames
DS, Gentleman R, de Sauvage FJ, Stern H, Pandita A, Ballinger DG, Drmanac R, Modrusan
Z, Seshagiri S, Zhang Z: The mutation spectrum revealed by paired genome sequences
from a lung cancer patient. Nature 2010, 465:473–477. doi:10.1038/nature09004.
111. Pleasance ED, Stephens PJ, O’Meara S, McBride DJ, Meynert A, Jones D, Lin M-L,
Beare D, Lau KW, Greenman C, Varela I, Nik-Zainal S, Davies HR, Ordoñez GR, Mudie LJ,
Latimer C, Edkins S, Stebbings L, Chen L, Jia M, Leroy C, Marshall J, Menzies A, Butler A,
Teague JW, Mangion J, Sun YA, McLaughlin SF, Peckham HE, Tsung EF, et al.: A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 2010,
463:184–190. doi:10.1038/nature08629.
112. Pleasance ED, Cheetham RK, Stephens PJ, McBride DJ, Humphray SJ, Greenman CD,
Varela I, Lin M-L, Ordóñez GR, Bignell GR, Ye K, Alipaz J, Bauer MJ, Beare D, Butler A,
Carter RJ, Chen L, Cox AJ, Edkins S, Kokko-Gonzales PI, Gormley NA, Grocock RJ,
Haudenschild CD, Hims MM, James T, Jia M, Kingsbury Z, Leroy C, Marshall J, Menzies
A, et al.: A comprehensive catalogue of somatic mutations from a human cancer
genome. Nature 2010, 463:191–196. doi:10.1038/nature08658.
113. Puente XS, Pinyol M, Quesada V, Conde L, Ordóñez GR, Villamor N, Escaramis G,
Jares P, Beà S, González-Díaz M, Bassaganyas L, Baumann T, Juan M, López-Guerra M,
Colomer D, Tubío JMC, López C, Navarro A, Tornador C, Aymerich M, Rozman M,
Hernández JM, Puente DA, Freije JMP, Velasco G, Gutiérrez-Fernández A, Costa D, Carrió
A, Guijarro S, Enjuanes A, et al.: Whole-genome sequencing identifies recurrent
mutations in chronic lymphocytic leukaemia. Nature 2011, 475:101–105.
doi:10.1038/nature10113.
114. Shah SP, Morin RD, Khattra J, Prentice L, Pugh T, Burleigh A, Delaney A, Gelmon K,
Guliany R, Senz J, Steidl C, Holt RA, Jones S, Sun M, Leung G, Moore R, Severson T,
Taylor GA, Teschendorff AE, Tse K, Turashvili G, Varhol R, Warren RL, Watson P, Zhao
Y, Caldas C, Huntsman D, Hirst M, Marra MA, Aparicio S: Mutational evolution in a
lobular breast tumour profiled at single nucleotide resolution. Nature 2009, 461:809–813.
doi:10.1038/nature08489.
115. Hudson TJ, Anderson W, Artez A, Barker AD, Bell C, Bernabé RR, Bhan MK, Calvo F,
Eerola I, Gerhard DS, Guttmacher A, Guyer M, Hemsley FM, Jennings JL, Kerr D, Klatt P,
Kolar P, Kusada J, Lane DP, Laplace F, Youyong L, Nettekoven G, Ozenberger B, Peterson
J, Rao TS, Remacle J, Schafer AJ, Shibata T, Stratton MR, Vockley JG, et al.: International
network of cancer genome projects. Nature 2010, 464:993–998. doi:10.1038/nature08987.
116. Meyerson M, Gabriel S, Getz G: Advances in understanding cancer genomes
through second-generation sequencing. Nat Rev Genet 2010, 11:685–696. doi:10.1038/nrg2841.
117. Ajay SS, Parker SCJ, Abaan HO, Fajardo KVF, Margulies EH: Accurate and
comprehensive sequencing of personal genomes. Genome Res. 2011, 21:1498–1505. doi:10.1101/gr.123638.111.
118. Mamanova L, Coffey AJ, Scott CE, Kozarewa I, Turner EH, Kumar A, Howard E,
Shendure J, Turner DJ: Target-enrichment strategies for next-generation sequencing.
Nat. Methods 2010, 7:111–118. doi:10.1038/nmeth.1419.
119. Comino-Méndez I, Gracia-Aznárez FJ, Schiavi F, Landa I, Leandro-García LJ, Letón R,
Honrado E, Ramos-Medina R, Caronia D, Pita G, Gómez-Graña A, de Cubas AA, Inglada-Pérez L, Maliszewska A, Taschin E, Bobisse S, Pica G, Loli P, Hernández-Lavado R, Díaz
JA, Gómez-Morales M, González-Neira A, Roncador G, Rodríguez-Antona C, Benítez J,
Mannelli M, Opocher G, Robledo M, Cascón A: Exome sequencing identifies MAX
mutations as a cause of hereditary pheochromocytoma. Nat Genet 2011,
doi:10.1038/ng.861. Available: http://www.ncbi.nlm.nih.gov/pubmed/21685915. Accessed 27 June 2011.
120. Tiacci E, Trifonov V, Schiavoni G, Holmes A, Kern W, Martelli MP, Pucciarini A,
Bigerna B, Pacini R, Wells VA, Sportoletti P, Pettirossi V, Mannucci R, Elliott O, Liso A,
Ambrosetti A, Pulsoni A, Forconi F, Trentin L, Semenzato G, Inghirami G, Capponi M, Di
Raimondo F, Patti C, Arcaini L, Musto P, Pileri S, Haferlach C, Schnittger S, Pizzolo G, et
al.: BRAF mutations in hairy-cell leukemia. N. Engl. J. Med 2011, 364:2305–2315. doi:10.1056/NEJMoa1014209.
121. Totoki Y, Tatsuno K, Yamamoto S, Arai Y, Hosoda F, Ishikawa S, Tsutsumi S, Sonoda
K, Totsuka H, Shirakihara T, Sakamoto H, Wang L, Ojima H, Shimada K, Kosuge T,
Okusaka T, Kato K, Kusuda J, Yoshida T, Aburatani H, Shibata T: High-resolution
characterization of a hepatocellular carcinoma genome. Nat. Genet 2011, 43:464–469. doi:10.1038/ng.804.
122. Varela I, Tarpey P, Raine K, Huang D, Ong CK, Stephens P, Davies H, Jones D, Lin M-L, Teague J, Bignell G, Butler A, Cho J, Dalgliesh GL, Galappaththige D, Greenman C,
Hardy C, Jia M, Latimer C, Lau KW, Marshall J, McLaren S, Menzies A, Mudie L,
Stebbings L, Largaespada DA, Wessels LFA, Richard S, Kahnoski RJ, Anema J, et al.:
Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1
in renal carcinoma. Nature 2011, 469:539–542. doi:10.1038/nature09639.
123. Yan X-J, Xu J, Gu Z-H, Pan C-M, Lu G, Shen Y, Shi J-Y, Zhu Y-M, Tang L, Zhang X-W, Liang W-X, Mi J-Q, Song H-D, Li K-Q, Chen Z, Chen S-J: Exome sequencing
identifies somatic mutations of DNA methyltransferase gene DNMT3A in acute
monocytic leukemia. Nat. Genet 2011, 43:309–315. doi:10.1038/ng.788.
124. Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene
expression patterns with a complementary DNA microarray. Science 1995, 270:467–470.
125. Pozhitkov AE, Tautz D, Noble PA: Oligonucleotide microarrays: widely applied—
poorly understood. Briefings in Functional Genomics & Proteomics 2007, 6:141–148. doi:10.1093/bfgp/elm014.
126. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh
ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of
cancer: class discovery and class prediction by gene expression monitoring. Science
1999, 286:531–537.
127. Balmain A: Cancer genetics: from Boveri and Mendel to microarrays. Nat. Rev.
Cancer 2001, 1:77–82. doi:10.1038/35094086.
128. Perez-Diez A, Morgun A, Shulzhenko N: Microarrays for cancer diagnosis and
classification. Adv. Exp. Med. Biol. 2007, 593:74–85. doi:10.1007/978-0-387-39978-2_8.
129. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet
H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB,
Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke
R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, et al.: Distinct types of
diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000,
403:503–511. doi:10.1038/35000501.
130. Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB,
van de Rijn M, Jeffrey SS, Thorsen T, Quist H, Matese JC, Brown PO, Botstein D, Lønning
PE, Børresen-Dale A-L: Gene expression patterns of breast carcinomas distinguish
tumor subclasses with clinical implications. Proceedings of the National Academy of
Sciences 2001, 98:10869–10874. doi:10.1073/pnas.191367098.
131. Kunz G: Use of a genomic test (MammaPrint™) in daily clinical practice to assist
in risk stratification of young breast cancer patients. Arch. Gynecol. Obstet 2011,
283:597–602. doi:10.1007/s00404-010-1454-9.
132. White NMA, Bao TT, Grigull J, Youssef YM, Girgis A, Diamandis M, Fatoohi E,
Metias M, Honey RJ, Stewart R, Pace KT, Bjarnason GA, Yousef GM: miRNA profiling
for clear cell renal cell carcinoma: biomarker discovery and identification of potential
controls and consequences of miRNA dysregulation. J. Urol. 2011, 186:1077–1083. doi:10.1016/j.juro.2011.04.110.
133. Griffith M, Tang MJ, Griffith OL, Morin RD, Chan SY, Asano JK, Zeng T, Flibotte S,
Ally A, Baross A, Hirst M, Jones SJM, Morin GB, Tai IT, Marra MA: ALEXA: a
microarray design platform for alternative expression analysis. Nat. Methods 2008,
5:118. doi:10.1038/nmeth0208-118.
134. Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy
S, Foy M, Ewan M, Roth R, George D, Eletr S, Albrecht G, Vermaas E, Williams SR, Moon
K, Burcham T, Pallas M, DuBridge RB, Kirchner J, Fearon K, Mao J, Corcoran K: Gene
expression analysis by massively parallel signature sequencing (MPSS) on microbead
arrays. Nat. Biotechnol. 2000, 18:630–634. doi:10.1038/76469.
135. Matsumura H, Reich S, Ito A, Saitoh H, Kamoun S, Winter P, Kahl G, Reuter M,
Kruger DH, Terauchi R: Gene expression analysis of plant host-pathogen interactions by
SuperSAGE. Proc. Natl. Acad. Sci. U.S.A 2003, 100:15718–15723. doi:10.1073/pnas.2536670100.
136. Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW,
Velculescu VE: Using the transcriptome to annotate the genome. Nat. Biotechnol 2002,
20:508–512. doi:10.1038/nbt0502-508.
137. Brenner S, Williams SR, Vermaas EH, Storck T, Moon K, McCollum C, Mao JI, Luo S,
Kirchner JJ, Eletr S, DuBridge RB, Burcham T, Albrecht G: In vitro cloning of complex
mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs.
Proc. Natl. Acad. Sci. U.S.A. 2000, 97:1665–1670.
138. Morozova O, Marra MA: Applications of next-generation sequencing technologies in
functional genomics. Genomics 2008, 92:255–264. doi:10.1016/j.ygeno.2008.07.001.
139. Audic S, Claverie JM: The significance of digital gene expression profiles. Genome
Res. 1997, 7:986–995.
140. Wang SM: Understanding SAGE data. Trends Genet. 2007, 23:42–50. doi:10.1016/j.tig.2006.11.001.
141. Nielsen KL, Høgh AL, Emmersen J: DeepSAGE--digital transcriptomics with high
sensitivity, simple experimental protocol and multiplexing of samples. Nucleic Acids Res
2006, 34:e133. doi:10.1093/nar/gkl714.
142. Morrissy AS, Morin RD, Delaney A, Zeng T, McDonald H, Jones S, Zhao Y, Hirst M,
Marra MA: Next-generation tag sequencing for cancer gene expression profiling.
Genome Res 2009, 19:1825–1835. doi:10.1101/gr.094482.109.
143. Gowda M, Li H, Alessi J, Chen F, Pratt R, Wang G-L: Robust analysis of 5’-transcript ends (5’-RATE): a novel technique for transcriptome analysis and genome
annotation. Nucleic Acids Res 2006, 34:e126. doi:10.1093/nar/gkl522.
144. Lal A, Lash AE, Altschul SF, Velculescu V, Zhang L, McLendon RE, Marra MA,
Prange C, Morin PJ, Polyak K, Papadopoulos N, Vogelstein B, Kinzler KW, Strausberg RL,
Riggins GJ: A Public Database for Gene Expression in Human Cancers. Cancer
Research 1999, 59:5403–5407.
145. Tsai C-C, Chung Y-D, Lee H-J, Chang W-H, Suzuku Y, Sugano S, Lin J-Y: Large-scale sequencing analysis of the full-length cDNA library of human hepatocellular
carcinoma. J. Biomed. Sci 2003, 10:636–643. doi:10.1159/000073529.
146. Hillier LD, Lennon G, Becker M, Bonaldo MF, Chiapelli B, Chissoe S, Dietrich N,
DuBuque T, Favello A, Gish W, Hawkins M, Hultman M, Kucaba T, Lacy M, Le M, Le N,
Mardis E, Moore B, Morris M, Parsons J, Prange C, Rifkin L, Rohlfing T, Schellenberg K,
Marra M: Generation and analysis of 280,000 human expressed sequence tags. Genome
Research 1996, 6:807–828. doi:10.1101/gr.6.9.807.
147. Sun M, Zhou G, Lee S, Chen J, Shi RZ, Wang SM: SAGE is far more sensitive than
EST for detecting low-abundance transcripts. BMC Genomics 2004, 5:1. doi:10.1186/1471-2164-5-1.
148. Morozova O, Hirst M, Marra MA: Applications of new sequencing technologies for
transcriptome analysis. Annu Rev Genomics Hum Genet 2009, 10:135–151. doi:10.1146/annurev-genom-082908-145957.
149. Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T, McDonald H, Varhol
R, Jones S, Marra M: Profiling the HeLa S3 transcriptome using randomly primed
cDNA and massively parallel short-read sequencing. BioTechniques 2008, 45:81–94. doi:10.2144/000112900.
150. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and
quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 2008, 5:621–628. doi:10.1038/nmeth.1226.
151. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The
transcriptional landscape of the yeast genome defined by RNA sequencing. Science
2008, 320:1344–1349. doi:10.1126/science.1158441.
152. Costa V, Angelini C, De Feis I, Ciccodicola A: Uncovering the complexity of
transcriptomes with RNA-Seq. J. Biomed. Biotechnol 2010,
2010:853916. doi:10.1155/2010/853916.
153. Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics.
Nat. Rev. Genet. 2009, 10:57–63. doi:10.1038/nrg2484.
154. Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, Morin RD, Corbett R,
Tang MJ, Hou Y-C, Pugh TJ, Robertson G, Chittaranjan S, Ally A, Asano JK, Chan SY, Li
HI, McDonald H, Teague K, Zhao Y, Zeng T, Delaney A, Hirst M, Morin GB, Jones SJM,
Tai IT, Marra MA: Alternative expression analysis by RNA sequencing. Nat Meth 2010,
7:843–847. doi:10.1038/nmeth.1503.
155. Berger MF, Levin JZ, Vijayendran K, Sivachenko A, Adiconis X, Maguire J, Johnson
LA, Robinson J, Verhaak RG, Sougnez C, Onofrio RC, Ziaugra L, Cibulskis K, Laine E,
Barretina J, Winckler W, Fisher DE, Getz G, Meyerson M, Jaffe DB, Gabriel SB, Lander ES,
Dummer R, Gnirke A, Nusbaum C, Garraway LA: Integrative analysis of the melanoma
transcriptome. Genome Res. 2010, 20:413–427. doi:10.1101/gr.103697.109.
156. Pflueger D, Terry S, Sboner A, Habegger L, Esgueva R, Lin P-C, Svensson MA,
Kitabayashi N, Moss BJ, MacDonald TY, Cao X, Barrette T, Tewari AK, Chee MS,
Chinnaiyan AM, Rickman DS, Demichelis F, Gerstein MB, Rubin MA: Discovery of non-ETS gene fusions in human prostate cancer using next-generation RNA sequencing.
Genome Res. 2011, 21:56–67. doi:10.1101/gr.110684.110.
157. Rosenberg BR, Hamilton CE, Mwangi MM, Dewell S, Papavasiliou FN:
Transcriptome-wide sequencing reveals numerous APOBEC1 mRNA-editing targets in
transcript 3’ UTRs. Nat. Struct. Mol. Biol. 2011, 18:230–236. doi:10.1038/nsmb.1975.
158. Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R,
Kong Y, Kitabayashi N, Bhardwaj N, Rubin M, Snyder M, Gerstein M: AlleleSeq: analysis
of allele-specific expression and binding in a network framework. Mol. Syst. Biol. 2011,
7:522. doi:10.1038/msb.2011.54.
159. Wiegand KC, Shah SP, Al-Agha OM, Zhao Y, Tse K, Zeng T, Senz J, McConechy MK,
Anglesio MS, Kalloger SE, Yang W, Heravi-Moussavi A, Giuliany R, Chow C, Fee J, Zayed
A, Prentice L, Melnyk N, Turashvili G, Delaney AD, Madore J, Yip S, McPherson AW, Ha
G, Bell L, Fereday S, Tam A, Galletta L, Tonin PN, Provencher D, et al.: ARID1A
mutations in endometriosis-associated ovarian carcinomas. N. Engl. J. Med 2010,
363:1532–1543. doi:10.1056/NEJMoa1008433.
160. Greif PA, Eck SH, Konstandin NP, Benet-Pagès A, Ksienzyk B, Dufour A, Vetter AT,
Popp HD, Lorenz-Depiereux B, Meitinger T, Bohlander SK, Strom TM: Identification of
recurring tumor-specific somatic mutations in acute myeloid leukemia by
transcriptome sequencing. Leukemia 2011, 25:821–827. doi:10.1038/leu.2011.19.
161. Sugarbaker DJ, Richards WG, Gordon GJ, Dong L, De Rienzo A, Maulik G, Glickman
JN, Chirieac LR, Hartman M-L, Taillon BE, Du L, Bouffard P, Kingsmore SF, Miller NA,
Farmer AD, Jensen RV, Gullans SR, Bueno R: Transcriptome sequencing of malignant
pleural mesothelioma tumors. Proc. Natl. Acad. Sci. U.S.A 2008, 105:3521–3526. doi:10.1073/pnas.0712399105.
162. Palanisamy N, Ateeq B, Kalyana-Sundaram S, Pflueger D, Ramnarayanan K, Shankar
S, Han B, Cao Q, Cao X, Suleman K, Kumar-Sinha C, Dhanasekaran SM, Chen Y, Esgueva
R, Banerjee S, LaFargue CJ, Siddiqui J, Demichelis F, Moeller P, Bismar TA, Kuefer R,
Fullen DR, Johnson TM, Greenson JK, Giordano TJ, Tan P, Tomlins SA, Varambally S,
Rubin MA, Maher CA, et al.: Rearrangements of the RAF kinase pathway in prostate
cancer, gastric cancer and melanoma. Nat. Med 2010, 16:793–798. doi:10.1038/nm.2166.
163. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov
JP: Integrative genomics viewer. Nat Biotech 2011, 29:24–26. doi:10.1038/nbt.1754.
164. Zhang J, Finney R, Edmonson M, Schaefer C, Rowe W, Yan C, Clifford R, Greenblum
S, Wu G, Zhang H, Liu H, Nguyen C, Hu Y, Madhavan S, Ding L, Wheeler DA, Gerhard
DS, Buetow KH: The Cancer Genome Workbench: Identifying and Visualizing
Complex Genetic Alterations in Tumors. NCI Nature Pathway Interaction Database 2010,
doi:10.1038/pid.2010.1. Available:
http://pid.nci.nih.gov/PID/2010/100309/full/pid.2010.1.shtml. Accessed 11 January 2012.
165. Sanborn JZ, Benz SC, Craft B, Szeto C, Kober KM, Meyer L, Vaske CJ, Goldman M,
Smith KE, Kuhn RM, Karolchik D, Kent WJ, Stuart JM, Haussler D, Zhu J: The UCSC
cancer genomics browser: update 2011. Nucleic Acids Research 2010, 39:D951–D959. doi:10.1093/nar/gkq1113.
166. Hogan LE, Meyer JA, Yang J, Wang J, Wong N, Yang W, Condos G, Hunger SP, Raetz
E, Saffery R, Relling MV, Bhojwani D, Morrison DJ, Carroll WL: Integrated genomic
analysis of relapsed childhood acute lymphoblastic leukemia reveals therapeutic
strategies. Blood 2011, 118:5218–5226. doi:10.1182/blood-2011-04-345595.
167. Cho Y-J, Tsherniak A, Tamayo P, Santagata S, Ligon A, Greulich H, Berhoukim R,
Amani V, Goumnerova L, Eberhart CG, Lau CC, Olson JM, Gilbertson RJ, Gajjar A,
Delattre O, Kool M, Ligon K, Meyerson M, Mesirov JP, Pomeroy SL: Integrative genomic
analysis of medulloblastoma identifies a molecular subgroup that drives poor clinical
outcome. J. Clin. Oncol. 2011, 29:1424–1430. doi:10.1200/JCO.2010.28.5148.
168. Verhaak RGW, Hoadley KA, Purdom E, Wang V, Qi Y, Wilkerson MD, Miller CR,
Ding L, Golub T, Mesirov JP, Alexe G, Lawrence M, O’Kelly M, Tamayo P, Weir BA,
Gabriel S, Winckler W, Gupta S, Jakkula L, Feiler HS, Hodgson JG, James CD, Sarkaria JN,
Brennan C, Kahn A, Spellman PT, Wilson RK, Speed TP, Gray JW, Meyerson M, et al.:
Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma
characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 2010,
17:98–110. doi:10.1016/j.ccr.2009.12.020.
169. Floratos A, Smith K, Ji Z, Watkinson J, Califano A: geWorkbench: an open source
platform for integrative genomics. Bioinformatics 2010, 26:1779–1780. doi:10.1093/bioinformatics/btq282.
170. Zhang J, Benavente CA, McEvoy J, Flores-Otero J, Ding L, Chen X, Ulyanov A, Wu G,
Wilson M, Wang J, Brennan R, Rusch M, Manning AL, Ma J, Easton J, Shurtleff S,
Mullighan C, Pounds S, Mukatira S, Gupta P, Neale G, Zhao D, Lu C, Fulton RS, Fulton LL,
Hong X, Dooling DJ, Ochoa K, Naeve C, Dyson NJ, et al.: A novel retinoblastoma therapy
from genomic and epigenetic analyses. Nature 2012, advance online
publication. doi:10.1038/nature10733. Available: http://dx.doi.org/10.1038/nature10733. Accessed
22 January 2012.
171. Scotting PJ, Walker DA, Perilongo G: Childhood solid tumours: a developmental
disorder. Nat Rev Cancer 2005, 5:481–488. doi:10.1038/nrc1633.
172. Goodman, Gurney, Smith, Olshan: Sympathetic nervous system tumors. Available:
http://seer.cancer.gov/publications/childhood/. Accessed 5 March 2011.
173. Huber K: The sympathoadrenal cell lineage: Specification, diversification, and new
perspectives. Developmental Biology 2006, 298:335–343. doi:16/j.ydbio.2006.07.010.
174. Maris JM: Recent advances in neuroblastoma. N. Engl. J. Med 2010, 362:2202–2211. doi:10.1056/NEJMra0804577.
175. Mohlin SA, Wigerup C, Påhlman S: Neuroblastoma aggressiveness in relation to
sympathetic neuronal differentiation stage. Seminars in Cancer Biology 2011, 21:276–282. doi:10.1016/j.semcancer.2011.09.002.
176. Alam G, Cui H, Shi H, Yang L, Ding J, Mao L, Maltese WA, Ding H-F: MYCN
promotes the expansion of Phox2B-positive neuronal progenitors to drive
neuroblastoma development. Am. J. Pathol. 2009, 175:856–866. doi:10.2353/ajpath.2009.090019.
177. Joyner BD: Neuroblastoma: eMedicine Urology. 2010, Available:
http://emedicine.medscape.com/article/439263-overview. Accessed 5 March 2011.
178. Brodeur GM: Neuroblastoma: biological insights into a clinical enigma. Nat. Rev.
Cancer 2003, 3:203–216. doi:10.1038/nrc1014.
179. Mueller S, Matthay KK: Neuroblastoma: biology and staging. Curr Oncol Rep 2009,
11:431–438.
180. London WB, Castleberry RP, Matthay KK, Look AT, Seeger RC, Shimada H, Thorner
P, Brodeur G, Maris JM, Reynolds CP, Cohn SL: Evidence for an age cutoff greater than
365 days for neuroblastoma risk group stratification in the Children’s Oncology Group.
J. Clin. Oncol 2005, 23:6459–6465. doi:10.1200/JCO.2005.05.571.
181. Brodeur GM, Pritchard J, Berthold F, Carlsen NL, Castel V, Castelberry RP, De
Bernardi B, Evans AE, Favrot M, Hedborg F: Revisions of the international criteria for
neuroblastoma diagnosis, staging, and response to treatment. J. Clin. Oncol 1993,
11:1466–1477.
182. Monclair T, Brodeur GM, Ambros PF, Brisse HJ, Cecchetto G, Holmes K, Kaneko M,
London WB, Matthay KK, Nuchtern JG, von Schweinitz D, Simon T, Cohn SL, Pearson
ADJ: The International Neuroblastoma Risk Group (INRG) Staging System: An INRG
Task Force Report. Journal of Clinical Oncology 2009, 27:298–303. doi:10.1200/JCO.2008.16.6876.
183. Cohn SL, Pearson ADJ, London WB, Monclair T, Ambros PF, Brodeur GM, Faldum A,
Hero B, Iehara T, Machin D, Mosseri V, Simon T, Garaventa A, Castel V, Matthay KK: The
International Neuroblastoma Risk Group (INRG) Classification System: An INRG
Task Force Report. Journal of Clinical Oncology 2009, 27:289–297. doi:10.1200/JCO.2008.16.6785.
184. Øra I, Eggert A: Progress in treatment and risk stratification of neuroblastoma:
Impact on future clinical and basic research. Seminars in Cancer Biology 2011, 21:217–228. doi:10.1016/j.semcancer.2011.07.002.
185. Yu AL, Gilman AL, Ozkaynak MF, London WB, Kreissman SG, Chen HX, Smith M,
Anderson B, Villablanca JG, Matthay KK, Shimada H, Grupp SA, Seeger R, Reynolds CP,
Buxton A, Reisfeld RA, Gillies SD, Cohn SL, Maris JM, Sondel PM: Anti-GD2 antibody
with GM-CSF, interleukin-2, and isotretinoin for neuroblastoma. N. Engl. J. Med 2010,
363:1324–1334. doi:10.1056/NEJMoa0911123.
186. Knudson AG, Strong LC: Mutation and cancer: neuroblastoma and
pheochromocytoma. Am J Hum Genet 1972, 24:514–532.
187. Janoueix-Lerosey I, Schleiermacher G, Michels E, Mosseri V, Ribeiro A, Lequin D,
Vermeulen J, Couturier J, Peuchmaur M, Valent A, Plantaz D, Rubie H, Valteau-Couanet D,
Thomas C, Combaret V, Rousseau R, Eggert A, Michon J, Speleman F, Delattre O: Overall
genomic pattern is a predictor of outcome in neuroblastoma. J. Clin. Oncol 2009,
27:1026–1033. doi:10.1200/JCO.2008.16.0630.
188. Mossé YP, Laudenslager M, Longo L, Cole KA, Wood A, Attiyeh EF, Laquaglia MJ,
Sennett R, Lynch JE, Perri P, Laureys G, Speleman F, Kim C, Hou C, Hakonarson H,
Torkamani A, Schork NJ, Brodeur GM, Tonini GP, Rappaport E, Devoto M, Maris JM:
Identification of ALK as a major familial neuroblastoma predisposition gene. Nature
2008, 455:930–935. doi:10.1038/nature07261.
189. Mosse YP, Laudenslager M, Khazi D, Carlisle AJ, Winter CL, Rappaport E, Maris JM:
Germline PHOX2B mutation in hereditary neuroblastoma. Am. J. Hum. Genet 2004,
75:727–730. doi:10.1086/424530.
190. Trochet D, Bourdeaut F, Janoueix-Lerosey I, Deville A, de Pontual L, Schleiermacher
G, Coze C, Philip N, Frébourg T, Munnich A, Lyonnet S, Delattre O, Amiel J: Germline
mutations of the paired-like homeobox 2B (PHOX2B) gene in neuroblastoma. Am. J.
Hum. Genet 2004, 74:761–764. doi:10.1086/383253.
191. Pattyn A, Morin X, Cremer H, Goridis C, Brunet J-F: The homeobox gene Phox2b is
essential for the development of autonomic neural crest derivatives. Nature 1999,
399:366–370. doi:10.1038/20700.
192. Chen Y, Takita J, Choi YL, Kato M, Ohira M, Sanada M, Wang L, Soda M, Kikuchi A,
Igarashi T, Nakagawara A, Hayashi Y, Mano H, Ogawa S: Oncogenic mutations of ALK
kinase in neuroblastoma. Nature 2008, 455:971–974. doi:10.1038/nature07399.
193. George RE, Sanda T, Hanna M, Fröhling S, Luther W, Zhang J, Ahn Y, Zhou W,
London WB, McGrady P, Xue L, Zozulya S, Gregor VE, Webb TR, Gray NS, Gilliland DG,
Diller L, Greulich H, Morris SW, Meyerson M, Look AT: Activating mutations in ALK
provide a therapeutic target in neuroblastoma. Nature 2008, 455:975–978. doi:10.1038/nature07397.
194. Janoueix-Lerosey I, Lequin D, Brugières L, Ribeiro A, de Pontual L, Combaret V,
Raynal V, Puisieux A, Schleiermacher G, Pierron G, Valteau-Couanet D, Frebourg T,
Michon J, Lyonnet S, Amiel J, Delattre O: Somatic and germline activating mutations of
the ALK kinase receptor in neuroblastoma. Nature 2008, 455:967–970. doi:10.1038/nature07398.
195. Passoni L, Longo L, Collini P, Coluccia AML, Bozzi F, Podda M, Gregorio A, Gambini
C, Garaventa A, Pistoia V, Del Grosso F, Tonini GP, Cheng M, Gambacorti-Passerini C,
Anichini A, Fossati-Bellani F, Di Nicola M, Luksch R: Mutation-independent anaplastic
lymphoma kinase overexpression in poor prognosis neuroblastoma patients. Cancer Res
2009, 69:7338–7346. doi:10.1158/0008-5472.CAN-08-4419.
196. Deyell RJ, Attiyeh EF: Advances in the understanding of constitutional and somatic
genomic alterations in neuroblastoma. Cancer Genetics 2011, 204:113–121. doi:16/j.cancergen.2011.03.001.
197. Capasso M, Devoto M, Hou C, Asgharzadeh S, Glessner JT, Attiyeh EF, Mosse YP,
Kim C, Diskin SJ, Cole KA, Bosse K, Diamond M, Laudenslager M, Winter C, Bradfield JP,
Scott RH, Jagannathan J, Garris M, McConville C, London WB, Seeger RC, Grant SFA, Li
H, Rahman N, Rappaport E, Hakonarson H, Maris JM: Common variations in BARD1
influence susceptibility to high-risk neuroblastoma. Nat. Genet 2009, 41:718–723. doi:10.1038/ng.374.
198. Maris JM, Mosse YP, Bradfield JP, Hou C, Monni S, Scott RH, Asgharzadeh S, Attiyeh
EF, Diskin SJ, Laudenslager M, Winter C, Cole KA, Glessner JT, Kim C, Frackelton EC,
Casalunovo T, Eckert AW, Capasso M, Rappaport EF, McConville C, London WB, Seeger
RC, Rahman N, Devoto M, Grant SFA, Li H, Hakonarson H: Chromosome 6p22 locus
associated with clinically aggressive neuroblastoma. N. Engl. J. Med 2008, 358:2585–2593. doi:10.1056/NEJMoa0708698.
199. Nguyen LB, Diskin SJ, Capasso M, Wang K, Diamond MA, Glessner J, Kim C, Attiyeh
EF, Mosse YP, Cole K, Iolascon A, Devoto M, Hakonarson H, Li HK, Maris JM: Phenotype
restricted genome-wide association study using a gene-centric approach identifies three
low-risk neuroblastoma susceptibility Loci. PLoS Genet. 2011,
7:e1002026. doi:10.1371/journal.pgen.1002026.
200. Diskin SJ, Hou C, Glessner JT, Attiyeh EF, Laudenslager M, Bosse K, Cole K, Mossé
YP, Wood A, Lynch JE, Pecor K, Diamond M, Winter C, Wang K, Kim C, Geiger EA,
McGrady PW, Blakemore AIF, London WB, Shaikh TH, Bradfield J, Grant SFA, Li H,
Devoto M, Rappaport ER, Hakonarson H, Maris JM: Copy number variation at 1q21.1
associated with neuroblastoma. Nature 2009, 459:987–991. doi:10.1038/nature08035.
201. Schwab M, Alitalo K, Klempnauer K-H, Varmus HE, Bishop JM, Gilbert F, Brodeur G,
Goldstein M, Trent J: Amplified DNA with limited homology to myc cellular oncogene is
shared by human neuroblastoma cell lines and a neuroblastoma tumour. Nature 1983,
305:245–248. doi:10.1038/305245a0.
202. Brodeur G, Seeger R, Schwab M, Varmus H, Bishop J: Amplification of N-myc in
untreated human neuroblastomas correlates with advanced disease stage. Science 1984,
224:1121–1124. doi:10.1126/science.6719137.
203. Seeger RC, Brodeur GM, Sather H, Dalton A, Siegel SE, Wong KY, Hammond D:
Association of multiple copies of the N-myc oncogene with rapid progression of
neuroblastomas. N. Engl. J. Med 1985, 313:1111–1116. doi:10.1056/NEJM198510313131802.
204. Subramaniam MM, Piqueras M, Navarro S, Noguera R: Aberrant copy numbers of
ALK gene is a frequent genetic alteration in neuroblastomas. Hum. Pathol 2009,
40:1638–1642. doi:10.1016/j.humpath.2009.05.002.
205. Attiyeh EF, London WB, Mossé YP, Wang Q, Winter C, Khazi D, McGrady PW,
Seeger RC, Look AT, Shimada H, Brodeur GM, Cohn SL, Matthay KK, Maris JM:
Chromosome 1p and 11q deletions and outcome in neuroblastoma. N. Engl. J. Med 2005,
353:2243–2253. doi:10.1056/NEJMoa052399.
206. Guo C, White PS, Weiss MJ, Hogarty MD, Thompson PM, Stram DO, Gerbing R,
Matthay KK, Seeger RC, Brodeur GM, Maris JM: Allelic deletion at 11q23 is common in
MYCN single copy neuroblastomas. Oncogene 1999, 18:4948–4957. doi:10.1038/sj.onc.1202887.
207. Abel F, Ejeskär K, Kogner P, Martinsson T: Gain of chromosome arm 17q is
associated with unfavourable prognosis in neuroblastoma, but does not involve
mutations in the somatostatin receptor 2 (SSTR2) gene at 17q24. Br. J. Cancer 1999,
81:1402–1409. doi:10.1038/sj.bjc.6692231.
208. Stallings RL, Carty P, McArdle L, Mullarkey M, McDermott M, Breatnach F, O’Meara
A: Molecular cytogenetic analysis of recurrent unbalanced t(11;17) in neuroblastoma.
Cancer Genet. Cytogenet 2004, 154:44–51. doi:10.1016/j.cancergencyto.2004.04.003.
209. Stark B, Jeison M, Glaser-Gabay L, Bar-Am I, Mardoukh J, Ash S, Atias D, Stein J,
Zaizov R, Yaniv I: der(11)t(11;17): a distinct cytogenetic pathway of advanced stage
neuroblastoma (NBL) - detected by spectral karyotyping (SKY). Cancer Lett 2003,
197:75–79.
210. Nakagawara A, Arima-Nakagawara M, Scavarda NJ, Azar CG, Cantor AB, Brodeur
GM: Association between high levels of expression of the TRK gene and favorable
outcome in human neuroblastoma. N. Engl. J. Med 1993, 328:847–854. doi:10.1056/NEJM199303253281205.
211. Rydén M, Sehgal R, Dominici C, Schilling FH, Ibáñez CF, Kogner P: Expression of
mRNA for the neurotrophin receptor trkC in neuroblastomas with favourable tumour
stage and good prognosis. Br J Cancer 1996, 74:773–779.
212. Nakagawara A, Azar CG, Scavarda NJ, Brodeur GM: Expression and function of
TRK-B and BDNF in human neuroblastomas. Mol. Cell. Biol 1994, 14:759–767.
213. Wei JS, Greer BT, Westermann F, Steinberg SM, Son C-G, Chen Q-R, Whiteford CC,
Bilke S, Krasnoselsky AL, Cenacchi N, Catchpoole D, Berthold F, Schwab M, Khan J:
Prediction of clinical outcome using gene expression profiling and artificial neural
networks for patients with neuroblastoma. Cancer Res 2004, 64:6883–6891. doi:10.1158/0008-5472.CAN-04-0695.
214. Asgharzadeh S, Pique-Regi R, Sposto R, Wang H, Yang Y, Shimada H, Matthay K,
Buckley J, Ortega A, Seeger RC: Prognostic significance of gene expression profiles of
metastatic neuroblastomas lacking MYCN gene amplification. J. Natl. Cancer Inst. 2006,
98:1193–1203. doi:10.1093/jnci/djj330.
215. Ohira M, Oba S, Nakamura Y, Isogai E, Kaneko S, Nakagawa A, Hirata T, Kubo H,
Goto T, Yamada S, Yoshida Y, Fuchioka M, Ishii S, Nakagawara A: Expression profiling
using a tumor-specific cDNA microarray predicts the prognosis of intermediate risk
neuroblastomas. Cancer Cell 2005, 7:337–350. doi:10.1016/j.ccr.2005.03.019.
216. Oberthuer A, Berthold F, Warnat P, Hero B, Kahlert Y, Spitz R, Ernestus K, König R,
Haas S, Eils R, Schwab M, Brors B, Westermann F, Fischer M: Customized oligonucleotide
microarray gene expression-based classification of neuroblastoma patients outperforms
current clinical risk stratification. J. Clin. Oncol. 2006, 24:5070–5078. doi:10.1200/JCO.2006.06.1879.
217. Oberthuer A, Hero B, Berthold F, Juraeva D, Faldum A, Kahlert Y, Asgharzadeh S,
Seeger R, Scaruffi P, Tonini GP, Janoueix-Lerosey I, Delattre O, Schleiermacher G,
Vandesompele J, Vermeulen J, Speleman F, Noguera R, Piqueras M, Bénard J, Valent A,
Avigad S, Yaniv I, Weber A, Christiansen H, Grundy RG, Schardt K, Schwab M, Eils R,
Warnat P, Kaderali L, et al.: Prognostic impact of gene expression-based classification for
neuroblastoma. J. Clin. Oncol. 2010, 28:3506–3515. doi:10.1200/JCO.2009.27.3367.
218. Vermeulen J, De Preter K, Naranjo A, Vercruysse L, Roy NV, Hellemans J, Swerts K,
Bravo S, Scaruffi P, Tonini GP, Noguera R, Piqueras M, Janoueix-Lerosey I, Delattre O,
Combaret V, Fischer M, Oberthuer A, Ambros PF, Beiske K, Bénard J, Marques B, Michon
J, Schleiermacher G, Bernardi BD, Rubie H, Cañete A, Castel V, Kohler J, Pötschger U,
Ladenstein R, et al.: Outcome Prediction of Children with Neuroblastoma using a
Multigene Expression Signature, a Retrospective SIOPEN/COG/GPOH Study. Lancet
Oncol 2009, 10:663–671. doi:10.1016/S1470-2045(09)70154-8.
219. Politi K, Pao W: How genetically engineered mouse tumor models provide insights
into human cancers. J. Clin. Oncol. 2011, 29:2273–2281. doi:10.1200/JCO.2010.30.8304.
220. Chesler L, Weiss WA: Genetically engineered murine models – Contribution to our
understanding of the genetics, molecular pathology and therapeutic targeting of
neuroblastoma. Seminars in Cancer Biology 2011, 21:245–255. doi:10.1016/j.semcancer.2011.09.011.
221. Weiss WA, Aldape K, Mohapatra G, Feuerstein BG, Bishop JM: Targeted expression
of MYCN causes neuroblastoma in transgenic mice. EMBO J 1997, 16:2985–2995. doi:10.1093/emboj/16.11.2985.
222. Rounbehler RJ, Li W, Hall MA, Yang C, Fallahi M, Cleveland JL: Targeting
Ornithine Decarboxylase Impairs Development of MYCN-Amplified Neuroblastoma.
Cancer Res 2009, 69:547–553. doi:10.1158/0008-5472.CAN-08-2968.
223. Teitz T, Stanke JJ, Federico S, Bradley CL, Brennan R, Zhang J, Johnson MD, Sedlacik
J, Inoue M, Zhang ZM, Frase S, Rehg JE, Hillenbrand CM, Finkelstein D, Calabrese C, Dyer
MA, Lahti JM: Preclinical Models for Neuroblastoma: Establishing a Baseline for
Treatment. PLoS ONE 2011, 6:e19133. doi:10.1371/journal.pone.0019133.
224. Glenn TC: Field guide to next-generation DNA sequencers. Mol Ecol Resour 2011,
doi:10.1111/j.1755-0998.2011.03024.x. Available:
http://www.ncbi.nlm.nih.gov/pubmed/21592312. Accessed 19 July 2011.
225. Huang X, Saint-Jeannet J-P: Induction of the neural crest and the opportunities of
life on the edge. Developmental Biology 2004, 275:1–11. doi:16/j.ydbio.2004.07.033.
226. Anderson DJ: The neural crest cell lineage problem: Neuropoiesis? Neuron 1989,
3:1–12. doi:16/0896-6273(89)90110-4.
227. Anderson DJ, Carnahan JF, Michelsohn A, Patterson PH: Antibody markers identify a
common progenitor to sympathetic neurons and chromaffin cells in vivo and reveal the
timing of commitment to neuronal differentiation in the sympathoadrenal lineage. J.
Neurosci 1991, 11:3507–3519.
228. Nakagawara A, Ohira M: Comprehensive genomics linking between neural
development and cancer: neuroblastoma as a model. Cancer Letters 2004, 204:213–224. doi:16/S0304-3835(03)00457-9.
229. Jiang M, Stanke J, Lahti JM: The connections between neural crest development and
neuroblastoma. Curr. Top. Dev. Biol 2011, 94:77–127. doi:10.1016/B978-0-12-380916-2.00004-8.
230. Prockop DJ: Marrow Stromal Cells as Stem Cells for Nonhematopoietic Tissues.
Science 1997, 276:71–74. doi:10.1126/science.276.5309.71.
231. Gage FH: Mammalian Neural Stem Cells. Science 2000, 287:1433–1438. doi:10.1126/science.287.5457.1433.
232. Reynolds B, Weiss S: Generation of neurons and astrocytes from isolated cells of
the adult mammalian central nervous system. Science 1992, 255:1707–1710. doi:10.1126/science.1553558.
233. Toma JG, Akhavan M, Fernandes KJL, Barnabe-Heider F, Sadikot A, Kaplan DR,
Miller FD: Isolation of multipotent adult stem cells from the dermis of mammalian skin.
Nat Cell Biol 2001, 3:778–784. doi:10.1038/ncb0901-778.
234. Toma JG, McKenzie IA, Bagli D, Miller FD: Isolation and characterization of
multipotent skin-derived precursors from human skin. Stem Cells 2005, 23:727–737.
doi:10.1634/stemcells.2004-0134.
235. Fernandes KJL, McKenzie IA, Mill P, Smith KM, Akhavan M, Barnabe-Heider F,
Biernaskie J, Junek A, Kobayashi NR, Toma JG, Kaplan DR, Labosky PA, Rafuse V, Hui CC, Miller FD: A dermal niche for multipotent adult skin-derived precursor cells. Nat
Cell Biol 2004, 6:1082–1093. doi:10.1038/ncb1181.
236. Biernaskie J, Paris M, Morozova O, Fagan BM, Marra M, Pevny L, Miller FD: SKPs
derive from hair follicle precursors and exhibit properties of adult dermal stem cells.
Cell Stem Cell 2009, 5:610–623. doi:10.1016/j.stem.2009.10.019.
237. Christ B, Ordahl CP: Early stages of chick somite development. Anat. Embryol. 1995,
191:381–396.
238. Couly G, Grapin-Botton A, Coltey P, Ruhin B, Le Douarin NM: Determination of the
identity of the derivatives of the cephalic neural crest: incompatibility between Hox
gene expression and lower jaw development. Development 1998, 125:3445–3459.
239. Mauger A: [The role of somitic mesoderm in the development of dorsal plumage in
chick embryos. II. Regionalization of the plumage-forming mesoderm]. J Embryol Exp
Morphol 1972, 28:343–366.
240. Lanza RP: Handbook of stem cells. Academic Press; 2004.
241. Okita K, Ichisaka T, Yamanaka S: Generation of germline-competent induced
pluripotent stem cells. Nature 2007, 448:313–317. doi:10.1038/nature05934.
242. Wernig M, Meissner A, Foreman R, Brambrink T, Ku M, Hochedlinger K, Bernstein
BE, Jaenisch R: In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like
state. Nature 2007, 448:318–324. doi:10.1038/nature05944.
243. Smith KM, Datti A, Fujitani M, Grinshtein N, Zhang L, Morozova O, Blakely KM,
Rotenberg SA, Hansford LM, Miller FD, Yeger H, Irwin MS, Moffat J, Marra MA, Baruchel
S, Wrana JL, Kaplan DR: Selective targeting of neuroblastoma tumour-initiating cells by
compounds identified in stem cell-based small molecule screens. EMBO Mol Med 2010,
2:371–384. doi:10.1002/emmm.201000093.
244. Morozova O, Vojvodic M, Grinshtein N, Hansford LM, Blakely KM, Maslova A, Hirst
M, Cezard T, Morin RD, Moore R, Smith KM, Miller F, Taylor P, Thiessen N, Varhol R,
Zhao Y, Jones S, Moffat J, Kislinger T, Moran MF, Kaplan DR, Marra MA: System-level
analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug
target for neuroblastoma. Clin. Cancer Res 2010, 16:4572–4582. doi:10.1158/1078-0432.CCR-10-0627.
245. Jessen KR, Mirsky R: The origin and development of glial cells in peripheral nerves.
Nat. Rev. Neurosci 2005, 6:671–682. doi:10.1038/nrn1746.
246. Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S, eds.: Bioinformatics and
Computational Biology Solutions Using R and Bioconductor. New York: Springer-Verlag;
2005. Available: http://www.springerlink.com/content/978-0-387-251462#section=519945&page=1. Accessed 6 June 2011.
247. Jeanmougin M, de Reynies A, Marisa L, Paccard C, Nuel G, Guedj M: Should We
Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A
Comparison of Variance Modeling Strategies. PLoS ONE 2010,
5:e12336. doi:10.1371/journal.pone.0012336.
248. Jeffery IB, Higgins DG, Culhane AC: Comparison and evaluation of methods for
generating differentially expressed gene lists from microarray data. BMC Bioinformatics
2006, 7:359. doi:10.1186/1471-2105-7-359.
249. Smyth GK: Linear models and empirical bayes methods for assessing differential
expression in microarray experiments. Stat Appl Genet Mol Biol 2004,
3:Article3. doi:10.2202/1544-6115.1027.
250. Sauka-Spengler T, Meulemans D, Jones M, Bronner-Fraser M: Ancient evolutionary
origin of the neural crest gene regulatory network. Dev. Cell 2007, 13:405–420.
doi:10.1016/j.devcel.2007.08.005.
251. Stemple DL, Anderson DJ: Isolation of a stem cell for neurons and glia from the
mammalian neural crest. Cell 1992, 71:973–985.
252. Liu JP, Jessell TM: A role for rhoB in the delamination of neural crest cells from
the dorsal neural tube. Development 1998, 125:5055–5067.
253. Kurauchi T, Izutsu Y, Maéno M: Involvement of Neptune in induction of the
hatching gland and neural crest in the Xenopus embryo. Differentiation, 79:251–259.
doi:10.1016/j.diff.2010.01.003.
254. Wong Y-M, Chow KL: Expression of zebrafish mab21 genes marks the
differentiating eye, midbrain and neural tube. Mech. Dev 2002, 113:149–152.
255. Schraufstatter IU, Discipio RG, Khaldoyanidi S: Mesenchymal stem cells and their
microenvironment. Front. Biosci. 2011, 17:2271–2288.
256. Vodyanik MA, Yu J, Zhang X, Tian S, Stewart R, Thomson JA, Slukvin II: A
Mesoderm-Derived Precursor for Mesenchymal Stem and Endothelial Cells. Cell Stem
Cell 2010, 7:718–729. doi:10.1016/j.stem.2010.11.011.
257. Kléber M, Lee H-Y, Wurdak H, Buchstaller J, Riccomagno MM, Ittner LM, Suter U,
Epstein DJ, Sommer L: Neural crest stem cell maintenance by combinatorial Wnt and
BMP signaling. J. Cell Biol. 2005, 169:309–320. doi:10.1083/jcb.200411095.
258. Le Douarin NM, Kalcheim C: The neural crest. Cambridge University Press; 1999.
259. Boon K, Osório EC, Greenhut SF, Schaefer CF, Shoemaker J, Polyak K, Morin PJ,
Buetow KH, Strausberg RL, de Souza SJ, Riggins GJ: An anatomy of normal and
malignant gene expression. Proceedings of the National Academy of Sciences 2002,
99:11287–11292. doi:10.1073/pnas.152324199.
260. Morozova O, Morozov V, Hoffman BG, Helgason CD, Marra MA: A seriation
approach for visualization-driven discovery of co-expression patterns in Serial Analysis
of Gene Expression (SAGE) data. PLoS ONE 2008, 3:e3205. doi:10.1371/journal.pone.0003205.
261. Robinson WS: A method for chronologically ordering archaeological deposits.
American Antiquity 1951, 16:293–301.
262. Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG,
Kumar RM, Murray HL, Jenner RG, Gifford DK, Melton DA, Jaenisch R, Young RA: Core
transcriptional regulatory circuitry in human embryonic stem cells. Cell 2005, 122:947–956.
doi:10.1016/j.cell.2005.08.020.
263. Roider HG, Manke T, O'Keeffe S, Vingron M, Haas SA: PASTAA: identifying
transcription factors associated with sets of co-regulated genes. Bioinformatics 2009,
25:435–442. doi:10.1093/bioinformatics/btn627.
264. Radomska HS, Satterthwaite AB, Taranenko N, Narravula S, Krause DS, Tenen DG: A
nuclear factor Y (NFY) site positively regulates the human CD34 stem cell gene. Blood
1999, 94:3772–3780.
265. Winger Q, Huang J, Auman HJ, Lewandoski M, Williams T: Analysis of transcription
factor AP-2 expression and function during mouse preimplantation development. Biol.
Reprod. 2006, 75:324–333. doi:10.1095/biolreprod.106.052407.
266. Schmidt M, Huber L, Majdazari A, Schütz G, Williams T, Rohrer H: The transcription
factors AP-2β and AP-2α are required for survival of sympathetic progenitors and
differentiated sympathetic neurons. Dev. Biol. 2011, 355:89–100.
doi:10.1016/j.ydbio.2011.04.011.
267. Cesari F, Brecht S, Vintersten K, Vuong LG, Hofmann M, Klingel K, Schnorr J-J,
Arsenian S, Schild H, Herdegen T, Wiebel FF, Nordheim A: Mice deficient for the ets
transcription factor elk-1 show normal immune responses and mildly impaired
neuronal gene activation. Mol. Cell. Biol. 2004, 24:294–305.
268. Dworkin S, Mantamadiotis T: Targeting CREB signalling in neurogenesis. Expert
Opin. Ther. Targets 2010, 14:869–879. doi:10.1517/14728222.2010.501332.
269. Ryser S, Dizin E, Jefford CE, Delaval B, Gagos S, Christodoulidou A, Krause K-H,
Birnbaum D, Irminger-Finger I: Distinct Roles of BARD1 Isoforms in Mitosis: Full-Length BARD1 Mediates Aurora B Degradation, Cancer-Associated BARD1β Scaffolds
Aurora B and BRCA2. Cancer Research 2009, 69:1125–1134. doi:10.1158/0008-5472.CAN-08-2134.
270. Modlin IM, Champaneria MC, Bornschein J, Kidd M: Evolution of the diffuse
neuroendocrine system--clear cells and cloudy origins. Neuroendocrinology 2006, 84:69–82.
doi:10.1159/000096997.
271. Kuijk EW, Chuva de Sousa Lopes SM, Geijsen N, Macklon N, Roelen BAJ: The
different shades of mammalian pluripotent stem cells. Hum. Reprod. Update 2011,
17:254–271. doi:10.1093/humupd/dmq035.
272. Wurdak H, Ittner LM, Lang KS, Leveen P, Suter U, Fischer JA, Karlsson S, Born W,
Sommer L: Inactivation of TGFbeta signaling in neural crest stem cells leads to multiple
defects reminiscent of DiGeorge syndrome. Genes Dev 2005, 19:530–535.
doi:10.1101/gad.317405.
273. Chen M-F, Lin C-T, Chen W-C, Yang C-T, Chen C-C, Liao S-K, Liu JM, Lu C-H, Lee
K-D: The sensitivity of human mesenchymal stem cells to ionizing radiation. Int. J.
Radiat. Oncol. Biol. Phys. 2006, 66:244–253. doi:10.1016/j.ijrobp.2006.03.062.
274. Khattra J, Delaney AD, Zhao Y, Siddiqui A, Asano J, McDonald H, Pandoh P, Dhalla
N, Prabhu A, Ma K, Lee S, Ally A, Tam A, Sa D, Rogers S, Charest D, Stott J, Zuyderduyn
S, Varhol R, Eaves C, Jones S, Holt R, Hirst M, Hoodless PA, Marra MA: Large-scale
production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell
lines. Genome Res 2007, 17:108–116. doi:10.1101/gr.5488207.
275. Caraux G, Pinloche S: PermutMatrix: a graphical environment to arrange gene
expression profiles in optimal linear order. Bioinformatics 2005, 21:1280–1281.
doi:10.1093/bioinformatics/bti141.
276. Sokal RR, Michener CD: A statistical method for evaluating systematic
relationships. University of Kansas Science Bulletin 1958, 28:1409–1438.
277. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of
genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 1998, 95:14863–14868.
278. Rodriguez-Esteban C, Tsukui T, Yonei S, Magallon J, Tamura K, Izpisua Belmonte JC:
The T-box genes Tbx4 and Tbx5 regulate limb outgrowth and identity. Nature 1999,
398:814–818. doi:10.1038/19769.
279. Hansford LM, McKee AE, Zhang L, George RE, Gerstle JT, Thorner PS, Smith KM,
Look AT, Yeger H, Miller FD, Irwin MS, Thiele CJ, Kaplan DR: Neuroblastoma cells
isolated from bone marrow metastases contain a naturally enriched tumor-initiating
cell. Cancer Res 2007, 67:11234–11243. doi:10.1158/0008-5472.CAN-07-0718.
280. Pahlman, Sven, Johnsson, Sofie, Pietras, Alexander: Patient-derived EBV-immortalized B-lymphocytes are a dominant contaminant of in vitro cultured human
neuroblastoma tumor-initiating cells isolated from bone marrow. 2011, Available:
http://www.abstractsonline.com/Plan/ViewAbstract.aspx?sKey=d2eb516b-a2a1-4fac-9e8d3b032d6bc731&cKey=29fa6ed6-fc53-4101-8a24-608b68ef3e2f&mKey=%7B507D311AB6EC-436A-BD67-6D14ED39622C%7D. Accessed 20 May 2011.
281. Chen Y, Li D, Li S: The Alox5 gene is a novel therapeutic target in cancer stem cells
of chronic myeloid leukemia. Cell Cycle 2009, 8:3488–3492. doi:10.4161/cc.8.21.9852.
282. Bitton D, Okoniewski M, Connolly Y, Miller C: Exon level integration of proteomics
and microarray data. BMC Bioinformatics 2008, 9:118. doi:10.1186/1471-2105-9-118.
283. Okoniewski MJ, Miller CJ: Comprehensive Analysis of Affymetrix Exon Arrays
Using BioConductor. PLoS Comput Biol 2008, 4:e6. doi:10.1371/journal.pcbi.0040006.
284. Taylor P, Nielsen PA, Trelle MB, Hørning OB, Andersen MB, Vorm O, Moran MF,
Kislinger T: Automated 2D Peptide Separation on a 1D Nano-LC-MS System. J.
Proteome Res. 2009, 8:1610–1616. doi:10.1021/pr800986c.
285. Chen EI, Hewel J, Felding-Habermann B, Yates JR: Large Scale Protein Profiling by
Combination of Protein Fractionation and Multidimensional Protein Identification
Technology (MudPIT). Molecular & Cellular Proteomics 2006, 5:53–56.
doi:10.1074/mcp.T500013-MCP200.
286. Skibbens RV: Cell biology of cancer: BRCA1 and sister chromatid pairing
reactions? Cell Cycle 2008, 7:449–452. doi:10.4161/cc.7.4.5435.
287. Billingsley ML: Druggable targets and targeted drugs: enhancing the development
of new therapeutics. Pharmacology 2008, 82:239–244. doi:10.1159/000157624.
288. Tobinick EL: The value of drug repositioning in the current pharmaceutical
market. Drug News Perspect 2009, 22:53. doi:10.1358/dnp.2009.22.1.1303818.
289. Goldsmith KC, Hogarty MD: Targeting programmed cell death pathways with
experimental therapeutics: opportunities in high-risk neuroblastoma. Cancer Letters
2005, 228:133–141. doi:10.1016/j.canlet.2005.01.048.
290. Daniel RA, Rozanska AL, Thomas HD, Mulligan EA, Drew Y, Castelbuono DJ,
Hostomsky Z, Plummer ER, Boddy AV, Tweddle DA, Curtin NJ, Clifford SC: Inhibition of
poly(ADP-ribose) polymerase-1 enhances temozolomide and topotecan activity against
childhood neuroblastoma. Clin. Cancer Res 2009, 15:1241–1249. doi:10.1158/1078-0432.CCR-08-1095.
291. Witt O, Deubzer HE, Lodrini M, Milde T, Oehme I: Targeting histone deacetylases in
neuroblastoma. Curr. Pharm. Des 2009, 15:436–447.
292. Gautschi O, Heighway J, Mack PC, Purnell PR, Lara PN, Gandara DR: Aurora kinases
as anticancer drug targets. Clin. Cancer Res 2008, 14:1639–1648. doi:10.1158/1078-0432.CCR-07-2179.
293. Alley MC, Scudiero DA, Monks A, Hursey ML, Czerwinski MJ, Fine DL, Abbott BJ,
Mayo JG, Shoemaker RH, Boyd MR: Feasibility of Drug Screening with Panels of
Human Tumor Cell Lines Using a Microculture Tetrazolium Assay. Cancer Research
1988, 48:589–601.
294. Cloonan N, Forrest ARR, Kolle G, Gardiner BBA, Faulkner GJ, Brown MK, Taylor DF,
Steptoe AL, Wani S, Bethel G, Robertson AJ, Perkins AC, Bruce SJ, Lee CC, Ranade SS,
Peckham HE, Manning JM, McKernan KJ, Grimmond SM: Stem cell transcriptome
profiling via massive-scale mRNA sequencing. Nat. Methods 2008, 5:613–619.
doi:10.1038/nmeth.1223.
295. Shah SH, Pallas JA: Identifying differential exon splicing using linear models and
correlation coefficients. BMC Bioinformatics 2009, 10:26. doi:10.1186/1471-2105-10-26.
296. The UniProt Consortium: Ongoing and future developments at the Universal Protein
Resource. Nucleic Acids Research 2010, 39:D214–D219. doi:10.1093/nar/gkq1020.
297. Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, Mungall K, Lee S,
Okada HM, Qian JQ, Griffith M, Raymond A, Thiessen N, Cezard T, Butterfield YS,
Newsome R, Chan SK, She R, Varhol R, Kamoh B, Prabhu A-L, Tam A, Zhao Y, Moore
RA, Hirst M, Marra MA, Jones SJM, Hoodless PA, Birol I: De novo assembly and analysis
of RNA-seq data. Nat. Methods 2010, 7:909–912. doi:10.1038/nmeth.1517.
298. Hubbard TJP, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y,
Clapham P, Clarke L, Coates G, Fairley S, Fitzgerald S, Fernandez-Banet J, Gordon L, Graf
S, Haider S, Hammond M, Holland R, Howe K, Jenkinson A, Johnson N, Kahari A, Keefe D,
Keenan S, Kinsella R, Kokocinski F, Kulesha E, Lawson D, Longden I, et al.: Ensembl
2009. Nucleic Acids Research 2009, 37:D690–D697. doi:10.1093/nar/gkn828.
299. Irminger-Finger I, Jefford CE: Is there more to BARD1 than BRCA1? Nat. Rev.
Cancer 2006, 6:382–391. doi:10.1038/nrc1878.
300. Shakya R, Szabolcs M, McCarthy E, Ospina E, Basso K, Nandula S, Murty V, Baer R,
Ludwig T: The basal-like mammary carcinomas induced by Brca1 or Bard1
inactivation implicate the BRCA1/BARD1 heterodimer in tumor suppression. Proc.
Natl. Acad. Sci. U.S.A. 2008, 105:7040–7045. doi:10.1073/pnas.0711032105.
301. Li L, Ryser S, Dizin E, Pils D, Krainer M, Jefford CE, Bertoni F, Zeillinger R, Irminger-Finger I: Oncogenic BARD1 isoforms expressed in gynecological cancers. Cancer Res.
2007, 67:11876–11885. doi:10.1158/0008-5472.CAN-07-2370.
302. Sporn JC, Hothorn T, Jung B: BARD1 expression predicts outcome in colon cancer.
Clin. Cancer Res. 2011, 17:5451–5462. doi:10.1158/1078-0432.CCR-11-0263.
303. Zhang Y-Q, Bianco A, Malkinson AM, Leoni VP, Frau G, De Rosa N, André P-A,
Versace R, Boulvain M, Laurent GJ, Atzori L, Irminger-Finger I: BARD1: An independent
predictor of survival in non-small cell lung cancer. International Journal of Cancer.
Journal International Du Cancer 2011, doi:10.1002/ijc.26346. Available:
http://www.ncbi.nlm.nih.gov/pubmed/21815143. Accessed 24 January 2012.
304. Shang X, Burlingame SM, Okcu MF, Ge N, Russell HV, Egler RA, David RD,
Vasudevan SA, Yang J, Nuchtern JG: Aurora A is a negative prognostic factor and a new
therapeutic target in human neuroblastoma. Mol. Cancer Ther 2009, 8:2461–2469.
doi:10.1158/1535-7163.MCT-08-0857.
305. Lens SMA, Voest EE, Medema RH: Shared and separate functions of polo-like
kinases and aurora kinases in cancer. Nat Rev Cancer 2010, 10:825–841. doi:10.1038/nrc2964.
306. Westerhout E, Kool M, Molenaar J, Stroeken, den Boer M, Segers S, Clifford S,
Delattre O, Benetkiewicz M, Lanvers C, Pieters R, Pietsch T, Holst M, Renshaw J, Shipley J,
Serra M, Scotlandi K, Geoerger B, Vassal G, Degrand O, Verschuur A, Versteeg R, Caron H:
OR1 The KidsCancerKinome: Validation of Aurora kinases as potential drug targets in
neuroblastoma and other pediatric tumors. 2010, Available:
http://www.anr2010.com/anr2010_data/documents/ANR%202010%20for%20web.pdf. Accessed 11 July 2011.
307. Grinshtein N, Datti A, Fujitani M, Uehling D, Prakesch M, Isaac M, Irwin MS, Wrana
JL, Al-Awar R, Kaplan DR: Small molecule kinase inhibitor screen identifies polo-like
kinase 1 as a target for neuroblastoma tumor-initiating cells. Cancer Res 2011, 71:1385–1395.
doi:10.1158/0008-5472.CAN-10-2484.
308. Ackermann S, Goeser F, Schulte JH, Schramm A, Ehemann V, Hero B, Eggert A,
Berthold F, Fischer M: Polo-like kinase 1 is a therapeutic target in high-risk
neuroblastoma. Clin. Cancer Res 2011, 17:731–741. doi:10.1158/1078-0432.CCR-10-1129.
309. Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants
using mapping quality scores. Genome Res 2008, 18:1851–1858. doi:10.1101/gr.078212.108.
310. Keller A, Nesvizhskii AI, Kolker E, Aebersold R: Empirical Statistical Model To
Estimate the Accuracy of Peptide Identifications Made by MS/MS and Database
Search. Analytical Chemistry 2002, 74:5383–5392. doi:10.1021/ac025747h.
311. Nesvizhskii AI, Keller A, Kolker E, Aebersold R: A statistical model for identifying
proteins by tandem mass spectrometry. Anal. Chem 2003, 75:4646–4658.
312. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G,
Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009,
25:2078–2079. doi:10.1093/bioinformatics/btp352.
313. Suzuki R, Shimodaira H: Pvclust: an R package for assessing the uncertainty in
hierarchical clustering. Bioinformatics 2006, 22:1540–1542. doi:10.1093/bioinformatics/btl117.
314. Kislinger T, Gramolini AO, MacLennan DH, Emili A: Multidimensional protein
identification technology (MudPIT): technical overview of a profiling method optimized
for the comprehensive proteomic investigation of normal and diseased heart tissue. J.
Am. Soc. Mass Spectrom. 2005, 16:1207–1220. doi:10.1016/j.jasms.2005.02.015.
315. Gnirke A, Melnikov A, Maguire J, Rogov P, LeProust EM, Brockman W, Fennell T,
Giannoukos G, Fisher S, Russ C, Gabriel S, Jaffe DB, Lander ES, Nusbaum C: Solution
hybrid selection with ultra-long oligonucleotides for massively parallel targeted
sequencing. Nat. Biotechnol. 2009, 27:182–189. doi:10.1038/nbt.1523.
316. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall
KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R,
Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS,
Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt
MR, et al.: Accurate whole human genome sequencing using reversible terminator
chemistry. Nature 2008, 456:53–59. doi:10.1038/nature07517.
317. Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell
CM, Loveland JE, Ruef BJ, Hart E, Suner M-M, Landrum MJ, Aken B, Ayling S, Baertsch
R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster
M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, et al.: The consensus
coding sequence (CCDS) project: Identifying a common protein-coding gene set for the
human and mouse genomes. Genome Res. 2009, 19:1316–1323. doi:10.1101/gr.080531.108.
318. Pruitt KD, Tatusova T, Klimke W, Maglott DR: NCBI Reference Sequences: current
status, policy and new initiatives. Nucleic Acids Res. 2009, 37:D32–36. doi:10.1093/nar/gkn721.
319. Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler
transform. Bioinformatics 2009, 25:1754–1760. doi:10.1093/bioinformatics/btp324.
320. Stransky N, Egloff AM, Tward AD, Kostic AD, Cibulskis K, Sivachenko A, Kryukov
GV, Lawrence MS, Sougnez C, McKenna A, Shefler E, Ramos AH, Stojanov P, Carter SL,
Voet D, Cortés ML, Auclair D, Berger MF, Saksena G, Guiducci C, Onofrio RC, Parkin M,
Romkes M, Weissfeld JL, Seethala RR, Wang L, Rangel-Escareño C, Fernandez-Lopez JC,
Hidalgo-Miranda A, Melendez-Zajgla J, et al.: The mutational landscape of head and neck
squamous cell carcinoma. Science 2011, 333:1157–1160. doi:10.1126/science.1208130.
321. Drmanac R, Sparks AB, Callow MJ, Halpern AL, Burns NL, Kermani BG, Carnevali P,
Nazarenko I, Nilsen GB, Yeung G, Dahl F, Fernandez A, Staker B, Pant KP, Baccash J,
Borcherding AP, Brownley A, Cedeno R, Chen L, Chernikoff D, Cheung A, Chirita R,
Curson B, Ebert JC, Hacker CR, Hartlage R, Hauser B, Huang S, Jiang Y, Karpinchyk V, et
al.: Human Genome Sequencing Using Unchained Base Reads on Self-Assembling DNA
Nanoarrays. Science 2010, 327:78–81. doi:10.1126/science.1181498.
322. Loeb LA: Human cancers express mutator phenotypes: origin, consequences and
targeting. Nat Rev Cancer 2011, 11:450–457. doi:10.1038/nrc3063.
323. Lovejoy CA, Lock K, Yenamandra A, Cortez D: DDB1 maintains genome integrity
through regulation of Cdt1. Mol. Cell. Biol. 2006, 26:7977–7990. doi:10.1128/MCB.00819-06.
324. Holmberg C, Fleck O, Hansen HA, Liu C, Slaaby R, Carr AM, Nielsen O: Ddb1
controls genome stability and meiosis in fission yeast. Genes & Development 2005, 19:853–862.
doi:10.1101/gad.329905.
325. Shimanouchi K, Takata K, Yamaguchi M, Murakami S, Ishikawa G, Takeuchi R, Kanai
Y, Ruike T, Nakamura R, Abe Y, Sakaguchi K: Drosophila Damaged DNA Binding
Protein 1 Contributes to Genome Stability in Somatic Cells. J Biochem 2006, 139:51–58.
doi:10.1093/jb/mvj006.
326. Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, Flanagan A, Teague
J, Futreal PA, Stratton MR, Wooster R: The COSMIC (Catalogue of Somatic Mutations in
Cancer) database and website. Br. J. Cancer 2004, 91:355–358. doi:10.1038/sj.bjc.6601894.
327. Bradić M, Costa J, Chelo IM: Genotyping with Sequenom. Methods Mol. Biol. 2011,
772:193–210. doi:10.1007/978-1-61779-228-1_11.
328. Thomas RK, Baker AC, Debiasi RM, Winckler W, Laframboise T, Lin WM, Wang M,
Feng W, Zander T, MacConaill L, Macconnaill LE, Lee JC, Nicoletti R, Hatton C, Goyette
M, Girard L, Majmudar K, Ziaugra L, Wong K-K, Gabriel S, Beroukhim R, Peyton M,
Barretina J, Dutt A, Emery C, Greulich H, Shah K, Sasaki H, Gazdar A, Minna J, et al.:
High-throughput oncogene mutation profiling in human cancer. Nat. Genet. 2007,
39:347–351. doi:10.1038/ng1975.
329. Getz G, Höfling H, Mesirov JP, Golub TR, Meyerson M, Tibshirani R, Lander ES:
Comment on "The consensus coding sequences of human breast and colorectal
cancers." Science 2007, 317:1500. doi:10.1126/science.1138764.
330. Bentires-Alj M, Paez JG, David FS, Keilhack H, Halmos B, Naoki K, Maris JM,
Richardson A, Bardelli A, Sugarbaker DJ, Richards WG, Du J, Girard L, Minna JD, Loh
ML, Fisher DE, Velculescu VE, Vogelstein B, Meyerson M, Sellers WR, Neel BG:
Activating mutations of the noonan syndrome-associated SHP2/PTPN11 gene in human
solid tumors and adult acute myelogenous leukemia. Cancer Res. 2004, 64:8816–8820.
doi:10.1158/0008-5472.CAN-04-1923.
331. Tartaglia M, Niemeyer CM, Fragale A, Song X, Buechner J, Jung A, Hahlen K, Hasle
H, Licht JD, Gelb BD: Somatic mutations in PTPN11 in juvenile myelomonocytic
leukemia, myelodysplastic syndromes and acute myeloid leukemia. Nat Genet 2003,
34:148–150. doi:10.1038/ng1156.
332. Tartaglia M, Mehler EL, Goldberg R, Zampino G, Brunner HG, Kremer H, van der
Burgt I, Crosby AH, Ion A, Jeffery S, Kalidas K, Patton MA, Kucherlapati RS, Gelb BD:
Mutations in PTPN11, encoding the protein tyrosine phosphatase SHP-2, cause Noonan
syndrome. Nat Genet 2001, 29:465–468. doi:10.1038/ng772.
333. Ketroussi F, Giuliani M, Bahri R, Azzarone B, Charpentier B, Durrbach A:
Lymphocyte Cell-Cycle Inhibition by HLA-G Is Mediated by Phosphatase SHP-2 and
Acts on the mTOR Pathway. PLoS ONE 2011, 6:e22776. doi:10.1371/journal.pone.0022776.
334. Hoover AC, Strand GL, Nowicki PN, Anderson ME, Vermeer PD, Klingelhutz AJ,
Bossler AD, Pottala JV, Hendriks WJAJ, Lee JH: Impaired PTPN13 phosphatase activity
in spontaneous or HPV-induced squamous cell carcinomas potentiates oncogene
signaling via the MAP kinase pathway. Oncogene 2009, 28:3960–3970.
doi:10.1038/onc.2009.251.
335. Huang DW, Sherman BT, Lempicki RA: Systematic and integrative analysis of large
gene lists using DAVID bioinformatics resources. Nature Protocols 2008, 4:44–57.
doi:10.1038/nprot.2008.211.
336. Gui Y, Guo G, Huang Y, Hu X, Tang A, Gao S, Wu R, Chen C, Li X, Zhou L, He M, Li
Z, Sun X, Jia W, Chen J, Yang S, Zhou F, Zhao X, Wan S, Ye R, Liang C, Liu Z, Huang P,
Liu C, Jiang H, Wang Y, Zheng H, Sun L, Liu X, Jiang Z, et al.: Frequent mutations of
chromatin remodeling genes in transitional cell carcinoma of the bladder. Nat. Genet.
2011, 43:875–878. doi:10.1038/ng.907.
337. Wilson BG, Roberts CWM: SWI/SNF nucleosome remodellers and cancer. Nat Rev
Cancer 2011, 11:481–492. doi:10.1038/nrc3068.
338. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I: ABySS: a parallel
assembler for short read sequence data. Genome Res 2009, 19:1117–1123.
doi:10.1101/gr.089532.108.
339. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra
MA: Circos: an information aesthetic for comparative genomics. Genome Res 2009,
19:1639–1645. doi:10.1101/gr.092759.109.
340. Nückel H, Frey UH, Sellmann L, Collins CH, Dührsen U, Siffert W: The IKZF3
(Aiolos) transcription factor is highly upregulated and inversely correlated with clinical
progression in chronic lymphocytic leukaemia. Br. J. Haematol. 2009, 144:268–270.
doi:10.1111/j.1365-2141.2008.07442.x.
341. Foster RE, Abdulrahman M, Morris MR, Prigmore E, Gribble S, Ng B, Gentle D, Ready
S, Weston PMT, Wiesener MS, Kishida T, Yao M, Davison V, Barbero JL, Chu C, Carter
NP, Latif F, Maher ER: Characterization of a 3;6 translocation associated with renal cell
carcinoma. Genes Chromosomes Cancer 2007, 46:311–317. doi:10.1002/gcc.20403.
342. Hirokawa YS, Takagi A, Uchida K, Kozuka Y, Yoneda M, Watanabe M, Shiraishi T:
High level expression of STAG1/PMEPA1 in an androgen-independent prostate cancer
PC3 subclone. Cell. Mol. Biol. Lett 2007, doi:10.2478/s11658-007-0009-y. Available:
http://www.ncbi.nlm.nih.gov/pubmed/17318295. Accessed 14 September 2011.
343. Mitelman F, Johansson B, Mertens F: Mitelman Database of Chromosome
Aberrations and Gene Fusions in Cancer. 2011.
http://cgap.nci.nih.gov/Chromosomes/Mitelman. Accessed 2 November 2011.
344. Sun X, Frierson HF, Chen C, Li C, Ran Q, Otto KB, Cantarel BL, Cantarel BM,
Vessella RL, Gao AC, Petros J, Miura Y, Simons JW, Dong J-T: Frequent somatic
mutations of the transcription factor ATBF1 in human prostate cancer. Nat. Genet.
2005, 37:407–412. doi:10.1038/ng1528.
345. Kumar P, Henikoff S, Ng PC: Predicting the effects of coding non-synonymous
variants on protein function using the SIFT algorithm. Nat Protoc 2009, 4:1073–1081.
doi:10.1038/nprot.2009.86.
346. Comprehensive genomic characterization defines human glioblastoma genes and core
pathways. Nature 2008, 455:1061–1068. doi:10.1038/nature07385.
347. Huang J, Zhao Y, Li Y, Fletcher JA, Xiao S: Genomic and functional evidence for an
ARID1A tumor suppressor role. Genes, Chromosomes and Cancer 2007, 46:745–750.
doi:10.1002/gcc.20459.
348. Jones S, Li M, Parsons DW, Zhang X, Wesseling J, Kristel P, Schmidt MK, Markowitz
S, Yan H, Bigner D, Hruban RH, Eshleman JR, Iacobuzio‐Donahue CA, Goggins M, Maitra
A, Malek SN, Powell S, Vogelstein B, Kinzler KW, Velculescu VE, Papadopoulos N:
Somatic mutations in the chromatin remodeling gene ARID1A occur in several tumor
types. Human Mutation 2012, 33:100–103. doi:10.1002/humu.21633.
349. Kumar P, Henikoff S, Ng PC: Predicting the effects of coding non-synonymous
variants on protein function using the SIFT algorithm. Nat Protoc 2009, 4:1073–1081.
doi:10.1038/nprot.2009.86.
350. Stiff T, Shtivelman E, Jeggo P, Kysela B: AHNAK interacts with the DNA ligase IV-XRCC4 complex and stimulates DNA ligase IV-mediated double-stranded ligation.
DNA Repair (Amst.) 2004, 3:245–256. doi:10.1016/j.dnarep.2003.11.001.
351. Hirayama S, Bujo H, Yamazaki H, Kanaki T, Takahashi K, Kobayashi J, Schneider WJ,
Saito Y: Differential expression of LR11 during proliferation and differentiation of
cultured neuroblastoma cells. Biochem. Biophys. Res. Commun. 2000, 275:365–373.
doi:10.1006/bbrc.2000.3312.
352. The 1000 Genomes Project Consortium: A map of human genome variation from population-scale
sequencing. Nature 2010, 467:1061–1073. doi:10.1038/nature09534.
353. Fejes AP, Khodabakhshi AH, Birol I, Jones SJM: Human variation database: an
open-source database template for genomic discovery. Bioinformatics 2011, 27:1155–1156.
doi:10.1093/bioinformatics/btr100.
354. Stratton MR: Exploring the Genomes of Cancer Cells: Progress and Promise.
Science 2011, 331:1553–1558. doi:10.1126/science.1204040.
355. Davis JD, Lin S-Y: DNA damage and breast cancer. World J Clin Oncol 2011, 2:329–338.
doi:10.5306/wjco.v2.i9.329.
356. Stephens PJ, McBride DJ, Lin M-L, Varela I, Pleasance ED, Simpson JT, Stebbings LA,
Leroy C, Edkins S, Mudie LJ, Greenman CD, Jia M, Latimer C, Teague JW, Lau KW,
Burton J, Quail MA, Swerdlow H, Churcher C, Natrajan R, Sieuwerts AM, Martens JWM,
Silver DP, Langerød A, Russnes HEG, Foekens JA, Reis-Filho JS, van 't Veer L,
Richardson AL, Børresen-Dale A-L, et al.: Complex landscapes of somatic rearrangement
in human breast cancer genomes. Nature 2009, 462:1005–1010. doi:10.1038/nature08645.
357. Dever SM, Golding SE, Rosenberg E, Adams BR, Idowu MO, Quillin JM, Valerie N,
Xu B, Povirk LF, Valerie K: Mutations in the BRCT binding site of BRCA1 result in
hyper-recombination. Aging (Albany NY) 2011, 3:515–532.
358. Goya R, Sun MGF, Morin RD, Leung G, Ha G, Wiegand KC, Senz J, Crisan A, Marra
MA, Hirst M, Huntsman D, Murphy KP, Aparicio S, Shah SP: SNVMix: predicting single
nucleotide variants from next-generation sequencing of tumors. Bioinformatics 2010,
26:730–736. doi:10.1093/bioinformatics/btq040.
359. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z: Pindel: a pattern growth approach
to detect break points of large deletions and medium sized insertions from paired-end
short reads. Bioinformatics 2009, 25:2865–2871. doi:10.1093/bioinformatics/btp394.
360. Ensembl's 10th year. Available:
http://nar.oxfordjournals.org/content/38/suppl_1/D557. Accessed 29 February 2012.
361. Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist
programmers. Methods Mol. Biol 2000, 132:365–386.
362. Wood RD, Mitchell M, Lindahl T: Human DNA repair genes, 2005. Mutat. Res. 2005,
577:275–283. doi:10.1016/j.mrfmmm.2005.03.007.
363. Uccelli A, Laroni A, Freedman MS: Mesenchymal stem cells for the treatment of
multiple sclerosis and other neurological diseases. Lancet Neurol 2011, 10:649–656.
doi:10.1016/S1474-4422(11)70121-1.
364. Shah N: Alternative Neural Crest Cell Fates Are Instructively Promoted by TGFβ
Superfamily Members. Cell 1996, 85:331–343. doi:10.1016/S0092-8674(00)81112-5.
365. Albino D, Brizzolara A, Moretti S, Falugi C, Mirisola V, Scaruffi P, Di Candia M,
Truini M, Coco S, Bonassi S, Tonini GP: Gene expression profiling identifies eleven DNA
repair genes down-regulated during mouse neural crest cell migration. Int. J. Dev. Biol.
2011, 55:65–72. doi:10.1387/ijdb.092970da.
366. Clewes O, Narytnyk A, Gillinder KR, Loughney AD, Murdoch AP, Sieber-Blum M:
Human Epidermal Neural Crest Stem Cells (hEPI-NCSC)—Characterization and
Directed Differentiation into Osteocytes and Melanocytes. Stem Cell Rev and Rep 2011,
doi:10.1007/s12015-011-9255-5. Available:
http://www.springerlink.com/content/e060gl54173u7t73/fulltext.html#CR1. Accessed 3 June 2011.
367. Maris JM: The biologic basis for neuroblastoma heterogeneity and risk
stratification. Curr. Opin. Pediatr. 2005, 17:7–13.
368. Fredlund E, Ringnér M, Maris JM, Påhlman S: High Myc Pathway Activity and Low
Stage of Neuronal Differentiation Associate with Poor Outcome in Neuroblastoma.
PNAS 2008, 105:14094–14099. doi:10.1073/pnas.0804455105.
369. Thiele CJ: Neuroblastoma cell lines. In Human Cell Culture. Vol. 1. Edited by Masters J.
Lancaster, UK: Kluwer Academic Publishers; 1998:21–53.
370. Niederreither K, Dollé P: Retinoic acid in development: towards an
integrated view. Nature Reviews Genetics 2008, 9:541–553. doi:10.1038/nrg2340.
371. Grimmer MR, Weiss WA: Childhood tumors of the nervous system as disorders of
normal development. Curr. Opin. Pediatr 2006, 18:634–638.
doi:10.1097/MOP.0b013e32801080fe.
Appendices
Appendix A Transcripts enriched and depleted in SKPs as discussed in Chapter 2
Table A.1 Transcripts enriched and depleted in SKPs as discussed in Chapter 2
Transcripts enriched (LogFC > 0.5 in SKPs vs. MSCs) or depleted (LogFC > 0.5 in MSCs vs.
SKPs) in SKPs compared to MSCs were identified as described in Chapter 2. The
average log fold enrichment (LogFC enrichment) was calculated from three pairwise gene
expression comparisons: vSKPs vs. MSCs, dSKPs vs. MSCs and fSKPs vs. MSCs. Within each
population, the transcripts are sorted by decreasing magnitude of the log fold enrichment.
Gene symbol
LogFC enrichment
Population enriched in
Mmp13
8.439845
SKPs
Dcn
7.549103
SKPs
Mmp10
7.476266
SKPs
Porf1
6.404842
SKPs
Dio2
6.399098
SKPs
Lama2
6.137718
SKPs
RGD1563970
5.824436
SKPs
Car2
5.750984
SKPs
Mmp3
5.609179
SKPs
Sema3a
5.581426
SKPs
Efemp1
5.538448
SKPs
Il13ra2
5.363509
SKPs
RGD1563628
5.278261
SKPs
Bmp7
5.128188
SKPs
Dusp4
5.114638
SKPs
Pcp4
5.110036
SKPs
Serpina3n
5.094111
SKPs
Enpp2
4.939557
SKPs
Alox15
4.90602
SKPs
Crabp1
4.872659
SKPs
Mmp12
4.864477
SKPs
Megf10
4.864363
SKPs
Cyp26b1
4.799108
SKPs
Ptprz1
4.797765
SKPs
Shc4
4.74527
SKPs
Apoe
4.714561
SKPs
Slc43a3
4.709655
SKPs
Tcfap2c
4.686553
SKPs
Ass1
4.636152
SKPs
Adcyap1r1
4.629722
SKPs
Lrrc15
4.60588
SKPs
Prl6a1
4.592803
SKPs
Ace
4.580035
SKPs
Ctsc
4.550322
SKPs
Cntn1
4.534987
SKPs
Gfra2
4.531173
SKPs
Pi15
4.526552
SKPs
Timp3
4.513819
SKPs
Sparcl1
4.490847
SKPs
Atp8a1
4.470007
SKPs
Fam105a
4.443334
SKPs
Igfbp5
4.441704
SKPs
Sema6d
4.433809
SKPs
Mamdc2
4.398762
SKPs
Cfh
4.374376
SKPs
Tacr3
4.320622
SKPs
Nov
4.292506
SKPs
Rgs2
4.270706
SKPs
Igfbp3
4.270205
SKPs
Sema7a
4.210779
SKPs
Arhgdib
4.174716
SKPs
Ccr1
4.150435
SKPs
Ampd3
4.087024
SKPs
Lilrb4
4.015927
SKPs
Syngr1
3.948953
SKPs
Sv2b
3.892332
SKPs
Dapk1
3.845124
SKPs
Clu
3.832665
SKPs
Ar
3.820615
SKPs
Hapln1
3.808152
SKPs
Robo2
3.805609
SKPs
Entpd1
3.784394
SKPs
LOC691317
3.782885
SKPs
Mmp11
3.774979
SKPs
Ifitm1
3.772668
SKPs
LOC682861
3.751067
SKPs
Igfbp4
3.733792
SKPs
C1s
3.72869
SKPs
RGD1310788
3.721911
SKPs
Ppl
3.693138
SKPs
C1ql3
3.688747
SKPs
Ddit4l
3.666462
SKPs
Abcg1
3.644367
SKPs
Mfap2
3.633226
SKPs
Sfrp2
3.605529
SKPs
Robo1
3.564526
SKPs
Adamts8
3.517057
SKPs
Depdc2
3.500137
SKPs
Bdkrb2
3.483871
SKPs
Angpt1
3.475515
SKPs
Clec14a
3.424995
SKPs
Ms4a6a
3.414956
SKPs
Dkk2
3.41011
SKPs
LOC691995
3.389074
SKPs
Spon1
3.387731
SKPs
Galnac4s-6st
3.354692
SKPs
Mmp9
3.353969
SKPs
Epm2aip1
3.352892
SKPs
Sntg1
3.338857
SKPs
Pla2g7
3.300792
SKPs
Sema5a
3.235042
SKPs
Cttnbp2
3.218479
SKPs
Usp18
3.216911
SKPs
Mal
3.211138
SKPs
RGD1309969
3.195953
SKPs
Rasd1
3.171879
SKPs
Bmp2
3.168457
SKPs
Bace2
3.16736
SKPs
Tgm2
3.149175
SKPs
Mafb
3.142627
SKPs
Egflam
3.128537
SKPs
Ntn1
3.109019
SKPs
C1qtnf1
3.045016
SKPs
Gas7
3.042994
SKPs
RGD1563838
3.042454
SKPs
Twist2
3.027152
SKPs
Smox
3.025184
SKPs
Gabre
3.024032
SKPs
Il17rd
3.00017
SKPs
Cyp2j4
2.993084
SKPs
Gpnmb
2.988691
SKPs
Cxcl16
2.982404
SKPs
Pcdh7
2.957364
SKPs
Serpini1
2.953977
SKPs
Prdm1
2.952008
SKPs
Slpi
2.91591
SKPs
Mgst1
2.913738
SKPs
Kcnab1
2.896829
SKPs
Tspan9
2.869862
SKPs
Enc1
2.85147
SKPs
Ednra
2.846815
SKPs
Chrna1
2.829633
SKPs
Col7a1
2.810433
SKPs
Btbd3
2.794844
SKPs
Dusp6
2.793823
SKPs
Scn7a
2.788322
SKPs
Icoslg
2.785946
SKPs
Plekhg4
2.782189
SKPs
Fbxo32
2.763752
SKPs
Alx4
2.752864
SKPs
Rps6ka2
2.752572
SKPs
F2rl2
2.746206
SKPs
Wnt7b
2.743404
SKPs
Ubash3b
2.737652
SKPs
Rerg
2.731972
SKPs
Kcnf1
2.729508
SKPs
Pde7b
2.720348
SKPs
Slc22a23
2.707091
SKPs
Cd24
2.706476
SKPs
Vav3
2.70184
SKPs
Gpx3
2.700941
SKPs
Wnt16
2.697651
SKPs
Abca1
2.696829
SKPs
Tspan11
2.692371
SKPs
Sh3kbp1
2.690237
SKPs
Xylt1
2.679182
SKPs
Slco2b1
2.67564
SKPs
MGC112715
2.665346
SKPs
St8sia4
2.64635
SKPs
Fam171b
2.619459
SKPs
Prr16
2.614479
SKPs
Plxdc1
2.61116
SKPs
Sulf2
2.609031
SKPs
Upp1
2.602906
SKPs
Fap
2.586286
SKPs
Kcna3
2.57713
SKPs
Tcfap2a
2.566328
SKPs
Arl11
2.563157
SKPs
Il16
2.562949
SKPs
Wnt5a
2.542061
SKPs
Pion
2.532506
SKPs
Col23a1
2.516179
SKPs
Calcrl
2.514765
SKPs
Ly75
2.498175
SKPs
Cyp2j3
2.475447
SKPs
Fam43a
2.457866
SKPs
Fgfr1
2.444475
SKPs
Il7
2.442199
SKPs
Adh7
2.420907
SKPs
Bhlhb3
2.413693
SKPs
Sorcs2
2.376019
SKPs
Rspo4
2.375095
SKPs
Gem
2.375041
SKPs
Rdh10
2.374578
SKPs
Zfp423
2.370252
SKPs
Cygb
2.364885
SKPs
Gbp2
2.363691
SKPs
Serpine2
2.356378
SKPs
Zbtb16
2.352951
SKPs
Mmp1a
2.331148
SKPs
Sdc1
2.318789
SKPs
LOC500046
2.309033
SKPs
Limk1
2.303573
SKPs
Boc
2.287568
SKPs
Olfm1
2.281832
SKPs
Peli2
2.271998
SKPs
Rapgef5
2.248381
SKPs
Mitf
2.247989
SKPs
Efnb1
2.244997
SKPs
Olr155
2.24431
SKPs
RGD1563349
2.23565
SKPs
Rftn2
2.233297
SKPs
Adcy2
2.232741
SKPs
Dchs1
2.231393
SKPs
Gper
2.22473
SKPs
Fkbp5
2.216139
SKPs
Cask
2.214251
SKPs
Dll1
2.206801
SKPs
Gadd45a
2.206485
SKPs
Pcdh9
2.20426
SKPs
RGD1307051
2.203518
SKPs
RGD1309676
2.2009
SKPs
Etv1
2.194529
SKPs
Crisp1
2.183366
SKPs
RGD1310753
2.181243
SKPs
Ucn2
2.174403
SKPs
Ptprk
2.172379
SKPs
Scd
2.170952
SKPs
Evi2a
2.164939
SKPs
Fndc3a
2.164736
SKPs
Cd97
2.162718
SKPs
Tbx2
2.159147
SKPs
Asam
2.153283
SKPs
Cd93
2.148491
SKPs
Dgkb
2.141238
SKPs
Dnm1
2.130209
SKPs
Slc16a2
2.128206
SKPs
Net1
2.126936
SKPs
Elovl4
2.125843
SKPs
Pitx2
2.112302
SKPs
Lrrc17
2.111186
SKPs
Pcsk2
2.082576
SKPs
Pdpn
2.08118
SKPs
Cpm
2.075904
SKPs
Adam33
2.073154
SKPs
Scara5
2.071966
SKPs
Sgk1
2.071694
SKPs
Tnfrsf1b
2.067268
SKPs
Cdh2
2.064023
SKPs
Glrx1
2.061341
SKPs
RGD1560686
2.059465
SKPs
Lpcat2
2.052348
SKPs
Ptger3
2.047112
SKPs
Axin2
2.033148
SKPs
Sepp1
2.029722
SKPs
Shroom2
2.024783
SKPs
Dennd3
2.008681
SKPs
Cdkn2b
1.995121
SKPs
Dusp5
1.992932
SKPs
Ptpre
1.979358
SKPs
RGD1563185
1.972694
SKPs
Gpc3
1.972504
SKPs
Ptgdrl
1.962685
SKPs
Lpin1
1.961242
SKPs
Rhbdf2
1.949743
SKPs
LOC501482
1.946375
SKPs
RGD1307119
1.937082
SKPs
Adcy7
1.924664
SKPs
Pcsk5
1.924428
SKPs
Nfkbia
1.91993
SKPs
Kcnh1
1.918282
SKPs
RT1-CE5
1.9116
SKPs
Tcf4
1.907333
SKPs
Cyp7b1
1.902961
SKPs
Psmb8
1.898685
SKPs
Rnf149
1.896076
SKPs
Adamts4
1.885483
SKPs
Slc39a14
1.878078
SKPs
Cyb561
1.873581
SKPs
Amica1
1.870064
SKPs
Npas2
1.858108
SKPs
Gsta4
1.855098
SKPs
Irx1
1.849742
SKPs
LOC365723
1.842754
SKPs
LOC689545
1.841621
SKPs
Wif1
1.840557
SKPs
Phf15
1.839713
SKPs
Rab38
1.81861
SKPs
Agpat9
1.815353
SKPs
Card6
1.799736
SKPs
RGD1564996
1.788221
SKPs
Arhgap22
1.785155
SKPs
Slc1a1
1.784007
SKPs
Mmp2
1.78395
SKPs
Nrg1
1.775892
SKPs
Tcp11l2
1.735227
SKPs
Cyp2d4v1
1.735093
SKPs
Thrb
1.733174
SKPs
Angptl2
1.72938
SKPs
Rbp1
1.718883
SKPs
Fyn
1.718785
SKPs
Twist1
1.718642
SKPs
Lrp5
1.710311
SKPs
Ghr
1.706529
SKPs
Sox2
1.702773
SKPs
Syt17
1.697783
SKPs
Prex1
1.697384
SKPs
Ebf4
1.69161
SKPs
Etv4
1.688637
SKPs
Plcb1
1.679783
SKPs
Klhl24
1.676954
SKPs
Npc2
1.675324
SKPs
Ramp
1.667865
SKPs
Hspa12b
1.662875
SKPs
Tex9
1.659326
SKPs
Rasgrf2
1.654338
SKPs
Rpia
1.653981
SKPs
Adamts7
1.650379
SKPs
B3galnt1
1.644111
SKPs
Lmo2
1.626439
SKPs
Ap1s2
1.618013
SKPs
Slc19a2
1.61485
SKPs
Csad
1.606494
SKPs
P2ry1
1.606197
SKPs
Arsb
1.599389
SKPs
Cacna1a
1.59918
SKPs
Adfp
1.595259
SKPs
RGD1310552
1.593861
SKPs
P2ry2
1.590305
SKPs
Il2rg
1.58555
SKPs
Pfkfb4
1.575458
SKPs
LOC305633
1.57441
SKPs
Zc3h12c
1.574396
SKPs
Epb4.1
1.571719
SKPs
Kremen1
1.564563
SKPs
Sh3gl3
1.564535
SKPs
Irx2
1.561118
SKPs
Pcdh18
1.558573
SKPs
RGD1307749
1.556957
SKPs
Ankrd6
1.545125
SKPs
Cyb5a
1.539911
SKPs
Lrig1
1.537755
SKPs
Gng8
1.535937
SKPs
Spred1
1.533063
SKPs
Pik3r1
1.522542
SKPs
Trpm3
1.520454
SKPs
Ctsh
1.519906
SKPs
Ppp1r3b
1.512322
SKPs
St3gal4
1.509367
SKPs
Ddit4
1.507832
SKPs
Ptpn3
1.507165
SKPs
Ctsd
1.499332
SKPs
Lgals3
1.498291
SKPs
Fut8
1.497645
SKPs
Spry4
1.493729
SKPs
RGD1307569
1.491449
SKPs
Capg
1.486736
SKPs
Gsta3
1.48666
SKPs
Aplp1
1.473645
SKPs
Satb2
1.472972
SKPs
Rab3il1
1.467931
SKPs
Gpr153_predicted
1.460756
SKPs
Tmem119
1.457966
SKPs
Frrs1
1.456784
SKPs
Abca2
1.453259
SKPs
Mex3b
1.448965
SKPs
Pde1b
1.442909
SKPs
Irf1
1.441505
SKPs
Btg1
1.44123
SKPs
Mr1
1.437384
SKPs
Dcbld2
1.437351
SKPs
LOC691418
1.434372
SKPs
Nr1h3
1.433924
SKPs
Slc4a7
1.432864
SKPs
Ttyh3
1.432388
SKPs
RGD1560778
1.430171
SKPs
Rai2
1.429157
SKPs
Ugt1a5
1.421798
SKPs
Prrg3
1.417992
SKPs
Egr3
1.417143
SKPs
Ptgir
1.406829
SKPs
Rasa1
1.400974
SKPs
Atp6v1b2
1.384625
SKPs
Cables1
1.376785
SKPs
LOC685707
1.363671
SKPs
Nudt4
1.347848
SKPs
Foxo3
1.346535
SKPs
Zmiz1
1.346355
SKPs
Flrt3
1.343682
SKPs
RGD1564942
1.339691
SKPs
Grasp
1.335824
SKPs
Tgfbr2
1.335203
SKPs
Cd9
1.332448
SKPs
Dyrk3
1.332066
SKPs
Ankh
1.331324
SKPs
Nme3
1.325655
SKPs
Col16a1
1.324764
SKPs
Obfc2a
1.324321
SKPs
RGD1310862
1.323114
SKPs
Slc7a7
1.322833
SKPs
Efnb2
1.321452
SKPs
Ctsl1
1.321234
SKPs
Cd82
1.315652
SKPs
Slc25a37
1.314125
SKPs
Rnf182
1.312826
SKPs
Uhrf2
1.31142
SKPs
Tifa
1.309473
SKPs
Adam17
1.308222
SKPs
Cbfa2t3
1.304733
SKPs
Igsf3
1.30396
SKPs
Syt13
1.303321
SKPs
LOC690769
1.303044
SKPs
Tpp1
1.28837
SKPs
LOC688916
1.284926
SKPs
Bcl2l11
1.277599
SKPs
Scd1
1.274121
SKPs
B2m
1.26819
SKPs
Oprd1
1.267969
SKPs
Tm7sf2
1.267528
SKPs
Nfe2l2
1.265797
SKPs
Myo1b
1.265205
SKPs
Cdh11
1.264608
SKPs
Kcnab2
1.259399
SKPs
Syt6
1.25806
SKPs
Gchfr
1.255077
SKPs
Scpep1
1.250481
SKPs
Abhd14b
1.24897
SKPs
S1pr2
1.247579
SKPs
Spock1
1.240663
SKPs
Slc39a6
1.23809
SKPs
Dlgap2
1.233464
SKPs
Prkcz
1.232432
SKPs
Sh2b2
1.2306
SKPs
Tcp11l1
1.225166
SKPs
Shc2
1.221249
SKPs
Mettl7a
1.22108
SKPs
Sema4c
1.220116
SKPs
Tgfbr1
1.219942
SKPs
Nid67
1.219078
SKPs
Ralgds
1.208394
SKPs
Ier5
1.203804
SKPs
Wwtr1
1.202635
SKPs
Tmem200b
1.201752
SKPs
Gstt2
1.198435
SKPs
Ntrk2
1.193961
SKPs
Ctsf
1.185615
SKPs
Cntnap1
1.18223
SKPs
Zfp36l1
1.179654
SKPs
Plxna2
1.17861
SKPs
Nkd1
1.173436
SKPs
Mylip
1.17217
SKPs
S100a16
1.170254
SKPs
Ly96
1.166122
SKPs
Ccdc92
1.165316
SKPs
Mocos
1.164343
SKPs
Grn
1.162677
SKPs
Chchd10
1.161562
SKPs
Neu2
1.158107
SKPs
RGD1359529
1.153555
SKPs
Slc12a7
1.151441
SKPs
RGD1566021
1.150212
SKPs
RGD1566132
1.14984
SKPs
RGD1562618
1.14834
SKPs
Lphn1
1.147889
SKPs
Cachd1
1.146924
SKPs
Atp10d
1.136646
SKPs
Runx3
1.135472
SKPs
Dennd2c
1.135139
SKPs
Tusc3
1.134717
SKPs
Cugbp2
1.133646
SKPs
Dnajc12
1.132937
SKPs
Pnrc1
1.120236
SKPs
Clk4
1.117579
SKPs
Gabarapl1
1.103554
SKPs
Igf2r
1.099944
SKPs
Bach1
1.094652
SKPs
Pcmtd2
1.094107
SKPs
Sntb2
1.09236
SKPs
Zfand5
1.090404
SKPs
Uba7
1.090038
SKPs
Lamp2
1.089041
SKPs
Pten
1.086014
SKPs
Itm2c
1.082152
SKPs
Psmd10
1.081984
SKPs
Lgals3bp
1.081565
SKPs
Dpy19l1
1.075216
SKPs
Mmp14
1.072628
SKPs
RGD1306437
1.069995
SKPs
LOC501110
1.065487
SKPs
RGD1306613
1.059337
SKPs
Znf503
1.054815
SKPs
Plag1
1.045373
SKPs
Tmem140
1.042719
SKPs
Rasa3
1.039741
SKPs
Cdkn2aipnl
1.03926
SKPs
Ppargc1b
1.036635
SKPs
RGD1309621
1.036602
SKPs
Prkcd
1.034258
SKPs
Epha4
1.032744
SKPs
Gcnt1
1.027978
SKPs
Tram1l1
1.026665
SKPs
RGD1304884
1.017999
SKPs
Mad2l2
1.015835
SKPs
Nradd
1.015643
SKPs
Gpd1
1.008748
SKPs
Ankrd44
1.007042
SKPs
Pink1
1.005692
SKPs
Hsd17b11
1.004974
SKPs
Nrg2
0.999896
SKPs
RGD1306058
0.995803
SKPs
Map3k6
0.990044
SKPs
Homer2
0.983582
SKPs
Tbc1d2b
0.983029
SKPs
RGD1308093
0.982798
SKPs
Camk2n1
0.982001
SKPs
Gsn
0.978805
SKPs
Psap
0.978628
SKPs
Plekha5
0.978532
SKPs
Emp1
0.97834
SKPs
Isoc1
0.978253
SKPs
Arrdc3
0.975421
SKPs
RGD1306271
0.972209
SKPs
Sipa1l2
0.972091
SKPs
Manba
0.971204
SKPs
Rtkn
0.971138
SKPs
Maml3
0.970924
SKPs
Runx1t1
0.96641
SKPs
Grina
0.965713
SKPs
Psen2
0.965253
SKPs
Ptp4a1
0.961293
SKPs
MGC94600
0.960727
SKPs
Atp6v1a
0.960519
SKPs
Cst3
0.955815
SKPs
Ppp3ca
0.955555
SKPs
Utx
0.953783
SKPs
Nucb1
0.951491
SKPs
LOC683801
0.945135
SKPs
Man2b1
0.940716
SKPs
Tacc2
0.938214
SKPs
Kif26a
0.937563
SKPs
Tbc1d16
0.934257
SKPs
Tapbpl
0.932016
SKPs
Pepd
0.9291
SKPs
Sema6c
0.928331
SKPs
Pnpla7
0.927908
SKPs
Slc25a2
0.927831
SKPs
H2-M3
0.926751
SKPs
Zfp347
0.924779
SKPs
Nqo1
0.923352
SKPs
Insig1
0.922818
SKPs
Tap2
0.922308
SKPs
Gng12
0.92217
SKPs
Ltbp3
0.920772
SKPs
Ripk5
0.919637
SKPs
Hdac11
0.919197
SKPs
Lrrc8a
0.917183
SKPs
Gna13
0.91352
SKPs
Ctsb
0.912132
SKPs
RGD1304694
0.906121
SKPs
Nbr1
0.904426
SKPs
Slc24a6
0.90129
SKPs
Ccdc50
0.899954
SKPs
Fhit
0.899764
SKPs
Mfsd7
0.898991
SKPs
Laptm4b
0.895226
SKPs
Nfat5
0.892256
SKPs
Tiparp
0.883362
SKPs
Appl2
0.877522
SKPs
Fip1l1
0.872146
SKPs
LOC100192313
0.868357
SKPs
Pld3
0.864994
SKPs
Nck1
0.861146
SKPs
Fbxo44
0.859717
SKPs
Arhgef9
0.859525
SKPs
Chka
0.858264
SKPs
Tmem59_RGD1310313
0.857462
SKPs
Wipf1
0.848145
SKPs
Rxra
0.846768
SKPs
Atp6ap2
0.845348
SKPs
Itm2b
0.844967
SKPs
Parp9
0.839773
SKPs
RGD1566386
0.838603
SKPs
Sh2b3
0.838432
SKPs
Rora
0.834799
SKPs
Atp6ap1
0.834476
SKPs
Dtx3l
0.825572
SKPs
Runx2
0.824737
SKPs
Fundc2
0.824482
SKPs
Ogt
0.818993
SKPs
RGD1311605
0.818612
SKPs
Wdfy2
0.81507
SKPs
Ier5l
0.814783
SKPs
Per1
0.814229
SKPs
Nudt9
0.813495
SKPs
Alg2
0.812933
SKPs
Gramd3
0.808468
SKPs
Psenen
0.807557
SKPs
Hbp1
0.805151
SKPs
RGD1309926
0.804397
SKPs
Rnf167
0.802496
SKPs
Ctnnb1
0.800444
SKPs
Man2a2
0.799784
SKPs
Fuca1
0.79051
SKPs
Cd1d1
0.78585
SKPs
Chst11
0.784267
SKPs
Wdr91
0.781068
SKPs
RGD1306284
0.779611
SKPs
Slk
0.779439
SKPs
Aadacl1
0.776881
SKPs
Zfoc1
0.776444
SKPs
Kdsr
0.776261
SKPs
Slc23a2
0.775929
SKPs
Ptprm
0.774578
SKPs
Crebl2
0.771172
SKPs
Gns
0.769812
SKPs
Ntng2
0.767413
SKPs
Ccl4
0.767396
SKPs
Plekho1
0.764344
SKPs
Rufy1
0.758117
SKPs
Ube2b
0.755211
SKPs
Hace1
0.753332
SKPs
Wdr68
0.743617
SKPs
Chic2
0.738018
SKPs
Tpbg
0.733065
SKPs
Dnm3
0.731917
SKPs
Ubl3
0.731577
SKPs
Slc29a3
0.731236
SKPs
Ccbl1
0.728912
SKPs
RGD1307986
0.727596
SKPs
Hs1bp3
0.725371
SKPs
Stk39
0.722163
SKPs
Scamp1
0.711949
SKPs
Btbd1
0.711071
SKPs
LOC300225
0.708721
SKPs
Hipk1
0.707363
SKPs
Atp6v0d1
0.701349
SKPs
Ercc5
0.700507
SKPs
Vamp5
0.700015
SKPs
Orai2
0.699471
SKPs
Jmjd1c
0.697773
SKPs
Dlk2
0.694736
SKPs
Calcoco1
0.691674
SKPs
Ihpk1
0.67988
SKPs
Tmem179b
0.67951
SKPs
Ggps1
0.678025
SKPs
LOC685925
0.670847
SKPs
Igbp1
0.669457
SKPs
Slc27a1
0.6684
SKPs
Wdr45
0.667068
SKPs
Gramd1a
0.66656
SKPs
Ihpk2
0.663588
SKPs
Acsf2
0.662109
SKPs
Sh3bp4
0.647964
SKPs
Reep3
0.64462
SKPs
Zfp110
0.641738
SKPs
Zbtb43
0.632255
SKPs
Fam113a
0.621516
SKPs
Tbc1d17
0.613462
SKPs
Lztfl1
0.609969
SKPs
Lig4
0.609094
SKPs
Ss18
0.603474
SKPs
RGD1307682
0.597754
SKPs
Cnnm3
0.597087
SKPs
Mafg
0.596552
SKPs
Narf
0.592008
SKPs
Tmem188
0.589472
SKPs
Laptm4a
0.584644
SKPs
RGD1560612
0.580607
SKPs
Ldb1
0.574763
SKPs
Ccng1
0.565446
SKPs
Phactr2
0.562541
SKPs
Fam160a2
0.560316
SKPs
Srebf2
0.559231
SKPs
RGD1560108
0.551124
SKPs
Fhl1
5.787124
MSCs
Akap12
5.615653
MSCs
Cyp1b1
5.522993
MSCs
Itga11
5.428756
MSCs
Smoc1
5.375434
MSCs
Acan
5.22488
MSCs
Cnn1
5.089795
MSCs
Mfap5
5.061273
MSCs
Tgfb3
4.94043
MSCs
Ogn
4.819063
MSCs
Grb14
4.716693
MSCs
Actg2
4.535768
MSCs
Cxcr7
4.504343
MSCs
Aoc3
4.478056
MSCs
Arsi
4.408675
MSCs
Cryab
4.163519
MSCs
Nexn
4.14033
MSCs
Wisp2
4.09953
MSCs
Pak1
4.065077
MSCs
Casq2
4.005767
MSCs
Hs6st2
3.943769
MSCs
Ccdc80
3.892134
MSCs
Lmod1
3.867123
MSCs
Slc38a4
3.806375
MSCs
Amotl2
3.798151
MSCs
Prss23
3.793714
MSCs
Hbegf
3.717367
MSCs
Myocd
3.698403
MSCs
LOC290595
3.68784
MSCs
Stxbp6
3.666674
MSCs
Lox
3.662211
MSCs
Clec2dl1
3.593328
MSCs
Meox2
3.487526
MSCs
Myl9
3.460855
MSCs
Diaph3
3.460381
MSCs
Ptgr1
3.404171
MSCs
Slc24a3
3.400344
MSCs
Sytl2
3.399421
MSCs
Dysf
3.375439
MSCs
Anln
3.375205
MSCs
Slc2a3
3.369597
MSCs
Lims2
3.367772
MSCs
Prkg1
3.213902
MSCs
Hoxa10
3.213595
MSCs
Myh11
3.209889
MSCs
Pak3
3.200443
MSCs
Bmper
3.186379
MSCs
Cdh3
3.175107
MSCs
Fabp3
3.161417
MSCs
Anxa8
3.158857
MSCs
Cenpf
3.140417
MSCs
Ddah1
3.123098
MSCs
Abhd10
3.117166
MSCs
Jag1
3.111915
MSCs
Adamtsl3
3.088061
MSCs
Jub
3.075224
MSCs
Itpr1
3.010265
MSCs
Kif4
2.943342
MSCs
Tpm1
2.918062
MSCs
Prc1
2.91533
MSCs
Scrn1
2.908155
MSCs
Cdkn3
2.899309
MSCs
Zfpm2
2.884909
MSCs
Ctgf
2.884599
MSCs
Tnfsf18
2.882138
MSCs
Wbscr17
2.872408
MSCs
Slc2a12
2.849829
MSCs
Ect2
2.834975
MSCs
Palmd
2.760869
MSCs
Ppp1r14a
2.758544
MSCs
Kbtbd10
2.753339
MSCs
Egfl6
2.750576
MSCs
Lgr4
2.744906
MSCs
Iqgap3
2.730602
MSCs
Sema3e
2.7087
MSCs
Glb1l2
2.694916
MSCs
Nalcn
2.69204
MSCs
Cep55
2.69109
MSCs
Ccnb1
2.671173
MSCs
RGD1309930
2.666364
MSCs
Slc8a1
2.650329
MSCs
RGD1309360
2.643812
MSCs
Lpp
2.637284
MSCs
Gas2l3_LOC687775
2.632208
MSCs
Gstm2
2.62979
MSCs
Ebpl
2.628666
MSCs
Gprc5a
2.625732
MSCs
Bcar3
2.616395
MSCs
Ahr
2.603595
MSCs
Kif23
2.598668
MSCs
Nek2
2.593458
MSCs
Mybl1
2.574959
MSCs
RGD1562646
2.571932
MSCs
Foxp1
2.566615
MSCs
Mrvi1
2.556722
MSCs
Cadm4
2.555991
MSCs
Fzd4
2.553415
MSCs
Kif2c
2.539284
MSCs
Sgms2
2.536525
MSCs
Cdc20
2.523547
MSCs
Samd9l
2.510591
MSCs
Nuf2
2.5042
MSCs
Pragmin
2.501032
MSCs
Crim1
2.498068
MSCs
Kif20a
2.481075
MSCs
Klf2
2.479546
MSCs
Bard1
2.474458
MSCs
Pdlim5
2.471379
MSCs
Tpx2
2.468738
MSCs
Sntb1
2.460512
MSCs
Tpm2
2.456111
MSCs
Aard
2.447477
MSCs
Sgol2
2.4445
MSCs
Psat1
2.440751
MSCs
Ngf
2.423738
MSCs
Pdlim1
2.416225
MSCs
Car9
2.397039
MSCs
Cenpe
2.386557
MSCs
RGD1311642
2.385081
MSCs
Vldlr
2.384781
MSCs
Pftk1
2.375764
MSCs
Ebf2
2.375753
MSCs
Epas1
2.374871
MSCs
Adam23
2.374386
MSCs
LOC497860
2.368422
MSCs
Plod2
2.366713
MSCs
S1pr3
2.362976
MSCs
Sox30
2.353627
MSCs
Racgap1
2.34999
MSCs
Kif20b
2.348522
MSCs
Spc25
2.341741
MSCs
Dner
2.330488
MSCs
Cenpi
2.326654
MSCs
Flvcr2
2.319188
MSCs
RGD1307201
2.315084
MSCs
Cyr61
2.2983
MSCs
Dlgap5
2.294088
MSCs
Clec2d
2.289774
MSCs
Mastl
2.288684
MSCs
P4ha3
2.283988
MSCs
Plk1
2.280576
MSCs
C1qtnf5
2.275317
MSCs
Bub1b
2.264176
MSCs
Mybl2
2.249627
MSCs
RGD1310335
2.24446
MSCs
Casc5
2.244297
MSCs
Aspm
2.241747
MSCs
Adamts5
2.237903
MSCs
Rad51
2.237872
MSCs
Sync
2.233378
MSCs
Spag5
2.233216
MSCs
Chmp4c
2.220502
MSCs
RGD1306565
2.219701
MSCs
Ttk
2.218258
MSCs
Camk2g
2.21016
MSCs
Espl1
2.20576
MSCs
Hmmr
2.192208
MSCs
Hnt
2.186537
MSCs
Ccnb2
2.185957
MSCs
Nr4a1
2.183144
MSCs
Bub1
2.174155
MSCs
Slitrk5
2.157658
MSCs
Nr3c2
2.152545
MSCs
Lonrf2
2.147856
MSCs
Col11a1
2.145523
MSCs
Ccna2
2.128854
MSCs
St5
2.124276
MSCs
Fam64a
2.117837
MSCs
Fancd2
2.111739
MSCs
Cdc2
2.105662
MSCs
Shroom1
2.10519
MSCs
Tacc3
2.096018
MSCs
Cnnm2
2.09465
MSCs
Eno3
2.093444
MSCs
Lamc3
2.093437
MSCs
Il6
2.091523
MSCs
Fam83d
2.084619
MSCs
Kif11
2.08098
MSCs
RGD1310376
2.075754
MSCs
Lmnb1
2.070811
MSCs
Grb10
2.066945
MSCs
Smoc2
2.0652
MSCs
Podnl1
2.059558
MSCs
Uhrf1
2.054711
MSCs
Pgm5
2.054271
MSCs
Spon2
2.050796
MSCs
Fam164a
2.043834
MSCs
Plk4
2.039775
MSCs
Vcl
2.037924
MSCs
Niban
2.023685
MSCs
Klhl30
2.021423
MSCs
Rcan2
2.019507
MSCs
Lhfp
2.010277
MSCs
Cdca2
2.010068
MSCs
Fhl2
2.007997
MSCs
Dtl
2.002985
MSCs
Syde2
2.001031
MSCs
Zfhx3
1.997763
MSCs
Fat3
1.996478
MSCs
Nfia
1.994831
MSCs
Syne2
1.994614
MSCs
Slc7a5
1.983641
MSCs
Gen1
1.982272
MSCs
Sulf1
1.982193
MSCs
Glipr2
1.981042
MSCs
Mustn1
1.964918
MSCs
Smarca1
1.962925
MSCs
Hrasls
1.961982
MSCs
Hjurp
1.950612
MSCs
Mmp23
1.947182
MSCs
Kntc1
1.945821
MSCs
Kif22
1.944116
MSCs
Amot
1.936063
MSCs
Spc24
1.932001
MSCs
LOC691979
1.929052
MSCs
Cenpa
1.922477
MSCs
Flnc
1.912271
MSCs
Klhl13
1.910179
MSCs
Cdca3
1.90934
MSCs
Depdc1
1.908642
MSCs
Plekhk1
1.90516
MSCs
Cenpk
1.904267
MSCs
Serpine1
1.900852
MSCs
LOC683179
1.894496
MSCs
Ncam1
1.889711
MSCs
Cenpt
1.888805
MSCs
Slfn3
1.887402
MSCs
RGD1559690
1.883979
MSCs
Bicd1
1.882376
MSCs
Gadd45g
1.879958
MSCs
Phf17
1.877361
MSCs
Mlf1ip
1.876278
MSCs
Pcbd1
1.872876
MSCs
Ppp1r3c
1.868575
MSCs
Sh3md4
1.858281
MSCs
LOC684771
1.849719
MSCs
RGD1559896
1.842205
MSCs
Six1
1.840969
MSCs
Gpr176
1.837392
MSCs
Basp1
1.83541
MSCs
Prr11
1.833908
MSCs
Cited2
1.831179
MSCs
Lama4
1.829976
MSCs
Ptprf
1.821942
MSCs
RGD735112
1.815782
MSCs
Magi3
1.814402
MSCs
RGD1308541
1.810995
MSCs
Brip1
1.808806
MSCs
RGD1561090
1.803992
MSCs
Pdk4
1.788501
MSCs
Odf3l1
1.788221
MSCs
Stil
1.785303
MSCs
Cdc6
1.784175
MSCs
Ckap2l
1.77599
MSCs
Kalrn
1.774908
MSCs
Tcea3
1.774461
MSCs
Aurka
1.772638
MSCs
Kif18b
1.769443
MSCs
Npas4
1.765861
MSCs
Amph
1.761949
MSCs
Ccne2
1.761286
MSCs
Gramd1c
1.7597
MSCs
Phgdh
1.742096
MSCs
Chst10
1.731918
MSCs
Fbxo5
1.725971
MSCs
Melk
1.724418
MSCs
Wwc2
1.724391
MSCs
Ssx2ip
1.7198
MSCs
Ccdc37
1.710348
MSCs
Cep72
1.7062
MSCs
Slc4a4
1.701822
MSCs
RGD1305412
1.70005
MSCs
Myo1c
1.693038
MSCs
Rapsn
1.690128
MSCs
Traf4af1
1.686846
MSCs
Lpar3
1.686723
MSCs
Hist2h3c2_Hist1h3f_LOC679950_LOC684762_LOC684841
1.683404
MSCs
RGD1562846
1.68111
MSCs
Arhgap11a
1.678273
MSCs
Ccnf
1.676746
MSCs
Rrad
1.672345
MSCs
Ercc6l
1.667879
MSCs
Mcm10
1.666948
MSCs
Scn2a1
1.659186
MSCs
Fads3
1.657539
MSCs
LOC684611
1.656247
MSCs
Gja5
1.65283
MSCs
Tgfb1i1
1.645035
MSCs
Cdca8
1.644789
MSCs
Pde4b
1.642621
MSCs
E2f7
1.641516
MSCs
Casp12
1.641111
MSCs
P2rx5
1.639832
MSCs
Slc29a2
1.63483
MSCs
Nt5dc3
1.63426
MSCs
Asf1b
1.633098
MSCs
Fos
1.632931
MSCs
Kprp
1.631475
MSCs
Dnajb4
1.623556
MSCs
Bmp4
1.619565
MSCs
Fzd6
1.617197
MSCs
Plscr2
1.615788
MSCs
Pitx1
1.614561
MSCs
LOC305691
1.61005
MSCs
RGD1304693
1.609554
MSCs
Rin3
1.602719
MSCs
Pck2
1.601887
MSCs
Mansc1
1.601569
MSCs
Slc1a4
1.599831
MSCs
Apold1
1.59164
MSCs
Smc4
1.591152
MSCs
Smc2
1.587481
MSCs
Spta1
1.585526
MSCs
Zwilch
1.585128
MSCs
Sgca
1.585054
MSCs
Myadm
1.581937
MSCs
Adamts6
1.580761
MSCs
Rasl12
1.570126
MSCs
Fam26e
1.570074
MSCs
Sass6
1.569233
MSCs
Cav3
1.567815
MSCs
Depdc1b
1.565429
MSCs
Pdia5
1.563651
MSCs
Foxm1
1.562253
MSCs
Trip13
1.558548
MSCs
Fat1
1.552651
MSCs
Rgs16
1.551015
MSCs
Geft
1.548072
MSCs
Tmem30b
1.546123
MSCs
Dusp8
1.545449
MSCs
Mcm6
1.544847
MSCs
Sorbs1
1.542864
MSCs
Col5a1
1.542143
MSCs
LOC682888
1.539402
MSCs
Zfp568
1.535262
MSCs
Dbf4
1.533319
MSCs
RGD1305450
1.533239
MSCs
Csrp1
1.532804
MSCs
Calml4
1.528867
MSCs
Fosb
1.524234
MSCs
RGD1306507
1.524096
MSCs
Pole
1.518326
MSCs
Flna
1.51607
MSCs
Zdhhc15
1.514909
MSCs
Hspc159
1.510997
MSCs
Tmem195
1.509481
MSCs
Crip2
1.50931
MSCs
Gins1
1.507104
MSCs
Cenpm
1.505979
MSCs
Klhdc8a
1.490918
MSCs
Setbp1
1.487603
MSCs
Kif15
1.484794
MSCs
Afap1
1.483962
MSCs
Aurkb
1.482251
MSCs
Mum1l1
1.47854
MSCs
Ccdc99
1.465444
MSCs
Foxs1
1.463887
MSCs
Pkp2
1.462688
MSCs
Lrig3
1.462068
MSCs
Zfp367
1.460579
MSCs
Rras2
1.455064
MSCs
Gtse1
1.452223
MSCs
Brca1
1.448116
MSCs
Cit
1.447295
MSCs
Scx
1.445946
MSCs
Clspn
1.444163
MSCs
RGD1310784
1.443659
MSCs
Dusp14
1.438739
MSCs
Slc35f2
1.437791
MSCs
Ccdc18
1.428912
MSCs
Ncapd2
1.424076
MSCs
Pkmyt1
1.423767
MSCs
Kifc1
1.422214
MSCs
Rfc4
1.418915
MSCs
LOC362464
1.41099
MSCs
Smad6
1.409944
MSCs
LOC682649
1.409515
MSCs
Slc7a1
1.407705
MSCs
Samd4a
1.401517
MSCs
Gadd45b
1.400404
MSCs
Pdlim7
1.396858
MSCs
Hist1h2bb_LOC684647
1.391249
MSCs
Hoxa1
1.388269
MSCs
RGD1566107
1.38725
MSCs
Chrdl2
1.386097
MSCs
Tead3
1.384032
MSCs
LOC684534
1.383994
MSCs
LOC689399
1.383116
MSCs
RGD1309051
1.379638
MSCs
Mthfd2
1.373176
MSCs
Igf2bp3
1.371842
MSCs
Hmgb2
1.367814
MSCs
Cdc45l
1.364643
MSCs
Rhobtb3
1.362241
MSCs
Bok
1.356159
MSCs
Tpd52
1.345157
MSCs
Igf2bp1
1.337675
MSCs
Dlc1
1.334957
MSCs
Lrba
1.331905
MSCs
Dscc1
1.325862
MSCs
Cdc25c
1.325676
MSCs
Btn2a2
1.323856
MSCs
Fas
1.321422
MSCs
Troap
1.312044
MSCs
Arf2
1.310841
MSCs
Mnd1
1.301351
MSCs
Farp2
1.301342
MSCs
LOC689296
1.298606
MSCs
Emd
1.296894
MSCs
Kif18a
1.293936
MSCs
Rnf150
1.293415
MSCs
Luzp5
1.291584
MSCs
RGD1563296
1.288481
MSCs
Col8a2
1.288463
MSCs
Actn1
1.282504
MSCs
Ankrd15
1.282415
MSCs
Rad54l
1.281679
MSCs
Ptpn14
1.280334
MSCs
Zfp469
1.280251
MSCs
Orc1l
1.279677
MSCs
Chst3
1.278381
MSCs
Myh9
1.277894
MSCs
Nacad
1.276872
MSCs
Eme1
1.276691
MSCs
Itgb1bp2
1.276208
MSCs
Dse
1.272807
MSCs
Csgalnact1
1.272569
MSCs
Trim59
1.266625
MSCs
LOC500700
1.264398
MSCs
Dsn1
1.262052
MSCs
Tmem144
1.261668
MSCs
Garnl4
1.260447
MSCs
Smad9
1.254173
MSCs
Rab9b
1.25407
MSCs
Dzip1l
1.252809
MSCs
RGD1308101
1.2528
MSCs
Myl6
1.250609
MSCs
Tmod2
1.249875
MSCs
Lrrcc1
1.246468
MSCs
Sgol1
1.246005
MSCs
RGD1309522
1.244637
MSCs
Ezh2
1.239213
MSCs
Mgat1
1.239113
MSCs
LOC500118
1.237289
MSCs
Gclm
1.236967
MSCs
Hip1
1.235322
MSCs
Ehbp1
1.233421
MSCs
Glra1
1.232095
MSCs
Hist1h2ail
1.23196
MSCs
Usp13
1.231576
MSCs
Nuak2
1.225236
MSCs
Csgalnact2
1.22522
MSCs
Cmtm4
1.224424
MSCs
Phex
1.222562
MSCs
Gmnn
1.22044
MSCs
Klhl31
1.219011
MSCs
Ptpn21
1.210203
MSCs
LOC680565
1.209748
MSCs
RGD1565493
1.209541
MSCs
RGD1305288
1.208245
MSCs
Cdkn2c
1.205427
MSCs
Ctps
1.199407
MSCs
Tead1
1.197407
MSCs
Sema3b
1.196813
MSCs
LOC679958
1.195581
MSCs
Fam81a
1.193139
MSCs
Fez2
1.193071
MSCs
Hebp2
1.191734
MSCs
Topbp1
1.188391
MSCs
Hist2h2bb
1.187189
MSCs
Ncapd3
1.176062
MSCs
Zfp57
1.172868
MSCs
Fermt2
1.168342
MSCs
Ddx11
1.165571
MSCs
Sdpr
1.165222
MSCs
Ankrd50
1.158041
MSCs
Hist1h2bn
1.154474
MSCs
Rbpms2
1.151599
MSCs
Cdca7
1.150413
MSCs
Raph1
1.150341
MSCs
Kpna2
1.147317
MSCs
Rnf19b
1.144871
MSCs
Pycs
1.143195
MSCs
H2afx
1.143114
MSCs
Spa17
1.139523
MSCs
Gng3
1.136039
MSCs
Kif13a
1.134418
MSCs
Sdc3
1.134226
MSCs
Dpysl3
1.130437
MSCs
Chd3
1.130081
MSCs
Pmaip1
1.128317
MSCs
Snta1
1.126853
MSCs
RGD1561444
1.126628
MSCs
Lima1
1.125223
MSCs
Il15
1.121316
MSCs
Ccdc34
1.119761
MSCs
Tuba1b
1.1123
MSCs
RGD1565514
1.111078
MSCs
RGD1311723
1.108701
MSCs
Chek1
1.108128
MSCs
Traip
1.104958
MSCs
Pif1
1.104593
MSCs
Myo10
1.10342
MSCs
Phlpp
1.102351
MSCs
Plcb3
1.100418
MSCs
Cltb
1.096971
MSCs
Slc9a3r2
1.096582
MSCs
Ccne1
1.093652
MSCs
Actn4
1.093246
MSCs
Ggt7
1.092181
MSCs
Mad2l1
1.08754
MSCs
Kif24
1.087532
MSCs
Pmf1
1.086
MSCs
Flnb
1.08499
MSCs
Phka2
1.083581
MSCs
Lhx2
1.08313
MSCs
RGD1559864
1.082552
MSCs
Znf569
1.079618
MSCs
Arhgap5
1.078053
MSCs
Hspg2
1.075367
MSCs
Lmo1
1.074913
MSCs
Ppm1e
1.073445
MSCs
Nup37
1.070194
MSCs
Rnd1
1.069068
MSCs
LOC366669
1.068581
MSCs
Tuba1c
1.067777
MSCs
Rad18
1.066146
MSCs
Specc1
1.063826
MSCs
Epb4.1l4a
1.063244
MSCs
Arid5b
1.060975
MSCs
Lmln
1.058453
MSCs
Ppp1r14c
1.057457
MSCs
Gm672
1.046797
MSCs
Dlx5
1.042727
MSCs
Tmpo
1.029393
MSCs
Dtnb
1.028785
MSCs
S100a11
1.026543
MSCs
Rab23
1.024861
MSCs
Brca2
1.018536
MSCs
RGD1305834
1.015468
MSCs
Lig1
1.012114
MSCs
RGD1566017
1.008647
MSCs
Lats2
1.007614
MSCs
Cxadr
1.002953
MSCs
Fchsd1
1.001617
MSCs
Smarcd3
1.000893
MSCs
Rpa1
1.000753
MSCs
Ccdc19
0.999318
MSCs
Bag2
0.999226
MSCs
RGD1306959
0.998331
MSCs
Klk14
0.99808
MSCs
RGD1310453
0.997023
MSCs
Mnat1
0.993812
MSCs
LOC688667_LOC688856
0.991433
MSCs
Chtf18
0.987898
MSCs
Rfc3
0.98663
MSCs
Ccdc126
0.986204
MSCs
Fancb
0.985542
MSCs
Ccdc6
0.983361
MSCs
Neil3
0.983124
MSCs
Hist1h2bl
0.982372
MSCs
Cenpn
0.979684
MSCs
RGD1564851
0.973588
MSCs
Hist2h3c2
0.972597
MSCs
Hist2h2ac
0.972071
MSCs
Cfl2
0.970339
MSCs
Cald1
0.967778
MSCs
Chac1
0.966312
MSCs
Slc43a1
0.966018
MSCs
Sacs
0.965841
MSCs
Man2a1
0.964464
MSCs
Etfb
0.961831
MSCs
Rmi1
0.96168
MSCs
Tsc22d2
0.959404
MSCs
Hivep1
0.958742
MSCs
Recql4
0.954728
MSCs
LOC502894
0.953326
MSCs
Tbx4
0.949726
MSCs
Ube2c
0.948554
MSCs
Fam118a
0.947657
MSCs
Atad5
0.947584
MSCs
Otub2
0.945935
MSCs
Rttn
0.938489
MSCs
Mcm2
0.936726
MSCs
Polr2f
0.933054
MSCs
Hspb1
0.929518
MSCs
Psph
0.929357
MSCs
Lrrfip1
0.927076
MSCs
Pold2
0.926056
MSCs
Kctd9
0.924984
MSCs
Egfl7
0.923466
MSCs
RGD1565800
0.922333
MSCs
Grin2d
0.919418
MSCs
Ica1
0.916431
MSCs
Pqlc3
0.908666
MSCs
Fmo4
0.905354
MSCs
RGD1563581
0.904496
MSCs
Ebf3
0.90002
MSCs
Btbd14a
0.899153
MSCs
Pid1
0.896085
MSCs
Vars
0.895589
MSCs
Deadc1
0.8951
MSCs
Cdh15
0.89501
MSCs
Cdc14b
0.892984
MSCs
Shmt1
0.891732
MSCs
Hdac7a
0.889758
MSCs
LOC680477
0.889563
MSCs
RGD1307897
0.889537
MSCs
Kifc3
0.889111
MSCs
Smyd2
0.887529
MSCs
Wsb2
0.887298
MSCs
Vwa1
0.885098
MSCs
RGD1309104
0.882353
MSCs
Ndufa12
0.882044
MSCs
Ankrd52
0.88161
MSCs
Tnfrsf10b
0.880219
MSCs
Znf618
0.877231
MSCs
Nup155
0.877008
MSCs
Rap1gds1
0.875686
MSCs
Hspa2
0.875312
MSCs
Spsb4
0.872341
MSCs
Lmnb2
0.872232
MSCs
Murc
0.863715
MSCs
Dus4l
0.860021
MSCs
Slfn4
0.856277
MSCs
Gpx8
0.855493
MSCs
Kcnk12
0.844082
MSCs
Lmo4
0.842343
MSCs
Rad51c
0.833717
MSCs
Prim2
0.833545
MSCs
Aldh9a1
0.833079
MSCs
Donson
0.831719
MSCs
LOC680531
0.828979
MSCs
RGD1561381
0.826888
MSCs
Snrpa
0.824862
MSCs
Osr1
0.823524
MSCs
Cenpl
0.821522
MSCs
Numbl
0.82063
MSCs
Prepl
0.820366
MSCs
RGD1310263
0.817502
MSCs
Clic4
0.817495
MSCs
Ddx1
0.817175
MSCs
Zfp382
0.814465
MSCs
Wdr51a
0.812226
MSCs
Srf
0.804712
MSCs
Pbx3
0.801977
MSCs
Kpnb1
0.800526
MSCs
Zswim6
0.799177
MSCs
Ifit1lb
0.796812
MSCs
Zfyve16
0.796099
MSCs
LOC687694
0.792873
MSCs
Ddx59
0.790222
MSCs
Ikzf2
0.787695
MSCs
Rock2
0.786616
MSCs
RGD1307392
0.785065
MSCs
Cep76
0.784812
MSCs
Gnrh1
0.783452
MSCs
Lce1l
0.781424
MSCs
Ckap5
0.780541
MSCs
Mkl1
0.778233
MSCs
Ppp1r13b
0.777774
MSCs
Rcbtb2
0.776925
MSCs
Fam98a
0.775675
MSCs
Tnk2
0.77436
MSCs
Nde1
0.772563
MSCs
Bcap29
0.771976
MSCs
Slc2a10
0.769331
MSCs
RGD1311357
0.766269
MSCs
LOC498265
0.764429
MSCs
Wdr1
0.764231
MSCs
Tubb2c
0.764179
MSCs
Fignl1
0.763991
MSCs
Lrp6
0.763018
MSCs
Xrcc5
0.759323
MSCs
Pus7
0.757055
MSCs
LOC303566
0.752332
MSCs
Tcof1
0.751344
MSCs
Arl6ip1
0.751149
MSCs
Arhgap21
0.74874
MSCs
Nup107
0.745331
MSCs
Cyb5r1
0.744133
MSCs
LOC499418
0.743873
MSCs
Uqcrq
0.737866
MSCs
Vps26b
0.737671
MSCs
Xkr5
0.735592
MSCs
Nme1
0.734099
MSCs
Ptdss1
0.729706
MSCs
Cdh24
0.728564
MSCs
Dock6
0.726929
MSCs
Dzip1
0.726798
MSCs
Pold1
0.723757
MSCs
Nup133
0.720452
MSCs
Fam124a
0.719128
MSCs
Dst
0.717113
MSCs
Igf1r
0.712456
MSCs
Pcyox1
0.711349
MSCs
Cend1
0.708776
MSCs
Gpr75
0.708735
MSCs
Mthfd1
0.697434
MSCs
Pcaf
0.697153
MSCs
Hat1
0.696301
MSCs
Hist1h2aa
0.694028
MSCs
Pfn1
0.690669
MSCs
Mras
0.689433
MSCs
RGD1307648
0.688893
MSCs
Znf483
0.683073
MSCs
Rad50
0.68213
MSCs
Eif2b2
0.681619
MSCs
Cdk5rap2
0.67669
MSCs
Sh3bp5l
0.675682
MSCs
Galnt12
0.67034
MSCs
Nasp
0.668743
MSCs
Prkci
0.668136
MSCs
Serpinb2
0.663595
MSCs
Jtv1
0.662664
MSCs
Epb4.1l5
0.662612
MSCs
Mapk12
0.661378
MSCs
Oxsr1
0.660418
MSCs
Nup85
0.656737
MSCs
Ankrd45
0.653621
MSCs
Fam46b
0.653448
MSCs
Rangap1
0.650477
MSCs
Sec61g
0.647782
MSCs
Pfkfb1
0.646639
MSCs
Paqr6
0.646637
MSCs
Tbrg1
0.645657
MSCs
Tubb5
0.642566
MSCs
Pls3
0.640865
MSCs
Plekhg3
0.639197
MSCs
Rap2ip
0.625015
MSCs
Cpeb3
0.624811
MSCs
RGD1306227
0.624198
MSCs
Ccdc85b
0.619519
MSCs
Kcnh2
0.619411
MSCs
Mcm8
0.615113
MSCs
Kif2a
0.61001
MSCs
Mxd3
0.60728
MSCs
Nlp
0.606752
MSCs
Spna2
0.604583
MSCs
Hcca2
0.603778
MSCs
RGD1562044
0.602541
MSCs
Apbb1
0.5993
MSCs
Hdgf
0.592069
MSCs
Uba6
0.582823
MSCs
Tnfsf15
0.582577
MSCs
Dars2
0.580561
MSCs
Epm2a
0.578651
MSCs
Lrrfip2
0.577533
MSCs
Hsf2bp
0.573738
MSCs
Efemp2
0.558699
MSCs
LOC500054
0.540898
MSCs
Hdac6
0.526934
MSCs
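The averaging and sorting described in the caption of Table A.1 can be sketched as follows. This is a minimal illustration and not the analysis code used in Chapter 2: the gene names and values are hypothetical, the function name `summarize` is invented for the example, and it assumes that the 0.5 cutoff is applied to the average of the three pairwise comparisons, which the caption does not state explicitly.

```python
from statistics import mean

THRESHOLD = 0.5  # LogFC cutoff given in the Table A.1 caption


def summarize(log_fc):
    """Build (gene, LogFC enrichment, population) rows.

    log_fc maps a gene symbol to its three log fold changes, each expressed
    as SKPs relative to MSCs: (vSKPs vs. MSCs, dSKPs vs. MSCs, fSKPs vs. MSCs).
    Assumption: the cutoff is applied to the average of the three values.
    """
    rows = []
    for gene, values in log_fc.items():
        avg = mean(values)
        if avg > THRESHOLD:
            rows.append((gene, avg, "SKPs"))
        elif avg < -THRESHOLD:
            # Depleted in SKPs: report the magnitude as enrichment in MSCs.
            rows.append((gene, -avg, "MSCs"))
    # SKP-enriched rows first, then MSC-enriched rows,
    # each group ordered by decreasing magnitude.
    rows.sort(key=lambda row: (row[2] != "SKPs", -row[1]))
    return rows


# Hypothetical input values, for illustration only.
example = {
    "GeneA": (2.0, 3.0, 4.0),
    "GeneB": (-1.0, -2.0, -3.0),
    "GeneC": (0.1, -0.2, 0.3),
    "GeneD": (1.0, 1.0, 1.0),
}

for gene, enrichment, population in summarize(example):
    print(gene, round(enrichment, 6), population)
```

In this sketch GeneC is dropped because its average falls inside the cutoff, and the remaining rows follow the same layout as Table A.1, with SKP-enriched transcripts listed before MSC-enriched ones.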
Appendix B Candidate pluripotency genes used for seriation analysis in Chapter 2
Table B.1 Candidate pluripotency genes used for seriation analysis in Chapter 2
A list of 319 candidate pluripotency genes selected by Dr. Connie Eaves' laboratory and used
to identify markers whose expression is increased in undifferentiated ES cells
(Supercontig 1). The genes are sorted alphabetically.
Gene symbol
Seriation result
Comment
ACTC1
Supercontig 1
Enriched in undifferentiated ES cells
ACVR2B
Supercontig 1
Enriched in undifferentiated ES cells
ADAM23
Supercontig 1
Enriched in undifferentiated ES cells
ARID3B
Supercontig 1
Enriched in undifferentiated ES cells
AURKB
Supercontig 1
Enriched in undifferentiated ES cells
C15orf15
Supercontig 1
Enriched in undifferentiated ES cells
CA14
Supercontig 1
Enriched in undifferentiated ES cells
CD24L4
Supercontig 1
Enriched in undifferentiated ES cells
CDX2
Supercontig 1
Enriched in undifferentiated ES cells
CENPK
Supercontig 1
Enriched in undifferentiated ES cells
CER1
Supercontig 1
Enriched in undifferentiated ES cells
CGB
Supercontig 1
Enriched in undifferentiated ES cells
COASY
Supercontig 1
Enriched in undifferentiated ES cells
CRABP2
Supercontig 1
Enriched in undifferentiated ES cells
CTH
Supercontig 1
Enriched in undifferentiated ES cells
CTNNB1
Supercontig 1
Enriched in undifferentiated ES cells
CYP26A1
Supercontig 1
Enriched in undifferentiated ES cells
DAZL
Supercontig 1
Enriched in undifferentiated ES cells
DIAPH2
Supercontig 1
Enriched in undifferentiated ES cells
DKC1
Supercontig 1
Enriched in undifferentiated ES cells
DPPA2
Supercontig 1
Enriched in undifferentiated ES cells
DPPA5
Supercontig 1
Enriched in undifferentiated ES cells
DSG2
Supercontig 1
Enriched in undifferentiated ES cells
EED
Supercontig 1
Enriched in undifferentiated ES cells
EEF1A1
Supercontig 1
Enriched in undifferentiated ES cells
EIF4A1
Supercontig 1
Enriched in undifferentiated ES cells
EPCAM
Supercontig 1
Enriched in undifferentiated ES cells
ETV4
Supercontig 1
Enriched in undifferentiated ES cells
FABP5
Supercontig 1
Enriched in undifferentiated ES cells
FAM46B
Supercontig 1
Enriched in undifferentiated ES cells
FAM64A
Supercontig 1
Enriched in undifferentiated ES cells
FBXO15
Supercontig 1
Enriched in undifferentiated ES cells
FGF4
Supercontig 1
Enriched in undifferentiated ES cells
FGF5
Supercontig 1
Enriched in undifferentiated ES cells
FGF8
Supercontig 1
Enriched in undifferentiated ES cells
FLJ10884
Supercontig 1
Enriched in undifferentiated ES cells
FN1
Supercontig 1
Enriched in undifferentiated ES cells
FOXD3
Supercontig 1
Enriched in undifferentiated ES cells
FOXH1
Supercontig 1
Enriched in undifferentiated ES cells
GAL
Supercontig 1
Enriched in undifferentiated ES cells
GBX2
Supercontig 1
Enriched in undifferentiated ES cells
GDF3
Supercontig 1
Enriched in undifferentiated ES cells
GPC4
Supercontig 1
Enriched in undifferentiated ES cells
GRB7
Supercontig 1
Enriched in undifferentiated ES cells
HMGB2
Supercontig 1
Enriched in undifferentiated ES cells
HNRPA1L3
Supercontig 1
Enriched in undifferentiated ES cells
HOMER1
Supercontig 1
Enriched in undifferentiated ES cells
IFITM2
Supercontig 1
Enriched in undifferentiated ES cells
IGF2BP3
Supercontig 1
Enriched in undifferentiated ES cells
INS
Supercontig 1
Enriched in undifferentiated ES cells
ISL1
Supercontig 1
Enriched in undifferentiated ES cells
KANK4
Supercontig 1
Enriched in undifferentiated ES cells
KPNA2
Supercontig 1
Enriched in undifferentiated ES cells
LAMB1
Supercontig 1
Enriched in undifferentiated ES cells
LECT1
Supercontig 1
Enriched in undifferentiated ES cells
LEFTY1
Supercontig 1
Enriched in undifferentiated ES cells
LEFTY2
Supercontig 1
Enriched in undifferentiated ES cells
LIN28
Supercontig 1
Enriched in undifferentiated ES cells
LRRN1
Supercontig 1
Enriched in undifferentiated ES cells
MAD2L2
Supercontig 1
Enriched in undifferentiated ES cells
MMP1
Supercontig 1
Enriched in undifferentiated ES cells
MTHFD1
Supercontig 1
Enriched in undifferentiated ES cells
MYBL2
Supercontig 1
Enriched in undifferentiated ES cells
NANOGP8
Supercontig 1
Enriched in undifferentiated ES cells
NES
Supercontig 1
Enriched in undifferentiated ES cells
NLRP2
Supercontig 1
Enriched in undifferentiated ES cells
NMU
Supercontig 1
Enriched in undifferentiated ES cells
NOL11
Supercontig 1
Enriched in undifferentiated ES cells
NPM1
Supercontig 1
Enriched in undifferentiated ES cells
NPPA
Supercontig 1
Enriched in undifferentiated ES cells
NR0B1
Supercontig 1
Enriched in undifferentiated ES cells
NUDT5
Supercontig 1
Enriched in undifferentiated ES cells
OTX2
Supercontig 1
Enriched in undifferentiated ES cells
PITX2
Supercontig 1
Enriched in undifferentiated ES cells
POU5F1
Supercontig 1
Enriched in undifferentiated ES cells
PTTG1
Supercontig 1
Enriched in undifferentiated ES cells
RAB3B
Supercontig 1
Enriched in undifferentiated ES cells
REXO2
Supercontig 1
Enriched in undifferentiated ES cells
ROR1
Supercontig 1
Enriched in undifferentiated ES cells
RP6-213H19.1
Supercontig 1
Enriched in undifferentiated ES cells
RPL17
Supercontig 1
Enriched in undifferentiated ES cells
RPL23
Supercontig 1
Enriched in undifferentiated ES cells
RPL6P27
Supercontig 1
Enriched in undifferentiated ES cells
SALL2
Supercontig 1
Enriched in undifferentiated ES cells
SALL4
Supercontig 1
Enriched in undifferentiated ES cells
SCGB3A2
Supercontig 1
Enriched in undifferentiated ES cells
SET
Supercontig 1
Enriched in undifferentiated ES cells
SILV
Supercontig 1
Enriched in undifferentiated ES cells
SLC16A1
Supercontig 1
Enriched in undifferentiated ES cells
SLC39A10
Supercontig 1
Enriched in undifferentiated ES cells
SMS
Supercontig 1
Enriched in undifferentiated ES cells
SNRPF
Supercontig 1
Enriched in undifferentiated ES cells
SNRPN
Supercontig 1
Enriched in undifferentiated ES cells
SOX2
Supercontig 1
Enriched in undifferentiated ES cells
SST
Supercontig 1
Enriched in undifferentiated ES cells
SYCP3
Supercontig 1
Enriched in undifferentiated ES cells
TBX4
Supercontig 1
Enriched in undifferentiated ES cells
TCL1A
Supercontig 1
Enriched in undifferentiated ES cells
TDGF1
Supercontig 1
Enriched in undifferentiated ES cells
TERF1
Supercontig 1
Enriched in undifferentiated ES cells
TEX19
Supercontig 1
Enriched in undifferentiated ES cells
TFCP2L1
Supercontig 1
Enriched in undifferentiated ES cells
TPM1
Supercontig 1
Enriched in undifferentiated ES cells
TRAP1
Supercontig 1
Enriched in undifferentiated ES cells
TUBB
Supercontig 1
Enriched in undifferentiated ES cells
USO1
Supercontig 1
Enriched in undifferentiated ES cells
UTF1
Supercontig 1
Enriched in undifferentiated ES cells
VASH2
Supercontig 1
Enriched in undifferentiated ES cells
VAT1L
Supercontig 1
Enriched in undifferentiated ES cells
WDSOF1
Supercontig 1
Enriched in undifferentiated ES cells
ZFP42
Supercontig 1
Enriched in undifferentiated ES cells
ZFP57
Supercontig 1
Enriched in undifferentiated ES cells
ZIC3
Supercontig 1
Enriched in undifferentiated ES cells
ZSCAN10
Supercontig 1
Enriched in undifferentiated ES cells
AC002480.6
Supercontig 2
Enriched in differentiated ES cells
ACTB
Supercontig 2
Enriched in differentiated ES cells
AK3
Supercontig 2
Enriched in differentiated ES cells
AMMECR1
Supercontig 2
Enriched in differentiated ES cells
ANKRD10
Supercontig 2
Enriched in differentiated ES cells
ARL5B
Supercontig 2
Enriched in differentiated ES cells
ASH2L
Supercontig 2
Enriched in differentiated ES cells
AURKA
Supercontig 2
Enriched in differentiated ES cells
B2M
Supercontig 2
Enriched in differentiated ES cells
BIRC5
Supercontig 2
Enriched in differentiated ES cells
C12orf48
Supercontig 2
Enriched in differentiated ES cells
CACHD1
Supercontig 2
Enriched in differentiated ES cells
CALU
Supercontig 2
Enriched in differentiated ES cells
CCNB1IP1
Supercontig 2
Enriched in differentiated ES cells
CCNC
Supercontig 2
Enriched in differentiated ES cells
CCT8
Supercontig 2
Enriched in differentiated ES cells
CDC2
Supercontig 2
Enriched in differentiated ES cells
CDH5
Supercontig 2
Enriched in differentiated ES cells
CDK2
Supercontig 2
Enriched in differentiated ES cells
CDT1
Supercontig 2
Enriched in differentiated ES cells
CENPF
Supercontig 2
Enriched in differentiated ES cells
CHORDC1
Supercontig 2
Enriched in differentiated ES cells
CHRNA7
Supercontig 2
Enriched in differentiated ES cells
COBL
Supercontig 2
Enriched in differentiated ES cells
COL2A1
Supercontig 2
Enriched in differentiated ES cells
COL5A2
Supercontig 2
Enriched in differentiated ES cells
COMMD3
Supercontig 2
Enriched in differentiated ES cells
CRABP1
Supercontig 2
Enriched in differentiated ES cells
CXorf15
Supercontig 2
Enriched in differentiated ES cells
DDX21
Supercontig 2
Enriched in differentiated ES cells
DES
Supercontig 2
Enriched in differentiated ES cells
DPPA4
Supercontig 2
Enriched in differentiated ES cells
EDNRB
Supercontig 2
Enriched in differentiated ES cells
EIF4EBP1
Supercontig 2
Enriched in differentiated ES cells
ELOVL6
Supercontig 2
Enriched in differentiated ES cells
EPRS
Supercontig 2
Enriched in differentiated ES cells
ERBB2
Supercontig 2
Enriched in differentiated ES cells
ESRRB
Supercontig 2
Enriched in differentiated ES cells
ETV5
Supercontig 2
Enriched in differentiated ES cells
FAM83D
Supercontig 2
Enriched in differentiated ES cells
FGF2
Supercontig 2
Enriched in differentiated ES cells
FLT1
Supercontig 2
Enriched in differentiated ES cells
GAPDH
Supercontig 2
Enriched in differentiated ES cells
GCG
Supercontig 2
Enriched in differentiated ES cells
GCM1
Supercontig 2
Enriched in differentiated ES cells
GFAP
Supercontig 2
Enriched in differentiated ES cells
GGTLA1
Supercontig 2
Enriched in differentiated ES cells
GLIS2
Supercontig 2
Enriched in differentiated ES cells
GNL3
Supercontig 2
Enriched in differentiated ES cells
HBB
Supercontig 2
Enriched in differentiated ES cells
HBZ
Supercontig 2
Enriched in differentiated ES cells
HCK
Supercontig 2
Enriched in differentiated ES cells
HDAC2
Supercontig 2
Enriched in differentiated ES cells
HMGA1
Supercontig 2
Enriched in differentiated ES cells
HMGB3
Supercontig 2
Enriched in differentiated ES cells
HSPA4
Supercontig 2
Enriched in differentiated ES cells
HSPD1
Supercontig 2
Enriched in differentiated ES cells
IAPP
Supercontig 2
Enriched in differentiated ES cells
IDH1
Supercontig 2
Enriched in differentiated ES cells
IFITM1
Supercontig 2
Enriched in differentiated ES cells
IGF1R
Supercontig 2
Enriched in differentiated ES cells
IGF2
Supercontig 2
Enriched in differentiated ES cells
IGF2BP2
Supercontig 2
Enriched in differentiated ES cells
IMPDH2
Supercontig 2
Enriched in differentiated ES cells
JMJD2C
Supercontig 2
Enriched in differentiated ES cells
KIF4A
Supercontig 2
Enriched in differentiated ES cells
KIT
Supercontig 2
Enriched in differentiated ES cells
KLF2
Supercontig 2
Enriched in differentiated ES cells
KLF4
Supercontig 2
Enriched in differentiated ES cells
KLF5
Supercontig 2
Enriched in differentiated ES cells
LAMA1
Supercontig 2
Enriched in differentiated ES cells
LAMC1
Supercontig 2
Enriched in differentiated ES cells
LAPTM4B
Supercontig 2
Enriched in differentiated ES cells
LDHB
Supercontig 2
Enriched in differentiated ES cells
LIFR
Supercontig 2
Enriched in differentiated ES cells
LMAN1
Supercontig 2
Enriched in differentiated ES cells
LMNB2
Supercontig 2
Enriched in differentiated ES cells
MANBA
Supercontig 2
Enriched in differentiated ES cells
MGST1
Supercontig 2
Enriched in differentiated ES cells
MKRNP6
Supercontig 2
Enriched in differentiated ES cells
MTHFD2
Supercontig 2
Enriched in differentiated ES cells
MTMR7
Supercontig 2
Enriched in differentiated ES cells
MYC
Supercontig 2
Enriched in differentiated ES cells
NCAPG2
Supercontig 2
Enriched in differentiated ES cells
NFYC
Supercontig 2
Enriched in differentiated ES cells
NODAL
Supercontig 2
Enriched in differentiated ES cells
NR5A2
Supercontig 2
Enriched in differentiated ES cells
NR6A1
Supercontig 2
Enriched in differentiated ES cells
NUMB
Supercontig 2
Enriched in differentiated ES cells
NUSAP1
Supercontig 2
Enriched in differentiated ES cells
OLA1
Supercontig 2
Enriched in differentiated ES cells
PAX6
Supercontig 2
Enriched in differentiated ES cells
PECAM1
Supercontig 2
Enriched in differentiated ES cells
PHC1
Supercontig 2
Enriched in differentiated ES cells
PHF17
Supercontig 2
Enriched in differentiated ES cells
PODXL
Supercontig 2
Enriched in differentiated ES cells
POU4F2
Supercontig 2
Enriched in differentiated ES cells
PPAT
Supercontig 2
Enriched in differentiated ES cells
PSMA2
Supercontig 2
Enriched in differentiated ES cells
PSMA3
Supercontig 2
Enriched in differentiated ES cells
PTEN
Supercontig 2
Enriched in differentiated ES cells
RBBP9
Supercontig 2
Enriched in differentiated ES cells
RCC2
Supercontig 2
Enriched in differentiated ES cells
REST
Supercontig 2
Enriched in differentiated ES cells
RPL10A
Supercontig 2
Enriched in differentiated ES cells
RPL24
Supercontig 2
Enriched in differentiated ES cells
RPL4
Supercontig 2
Enriched in differentiated ES cells
RPL7
Supercontig 2
Enriched in differentiated ES cells
RPLP0P6
Supercontig 2
Enriched in differentiated ES cells
RPS24
Supercontig 2
Enriched in differentiated ES cells
RRP12
Supercontig 2
Enriched in differentiated ES cells
SALL1
Supercontig 2
Enriched in differentiated ES cells
SDC4
Supercontig 2
Enriched in differentiated ES cells
SEMA3A
Supercontig 2
Enriched in differentiated ES cells
SEPHS1
Supercontig 2
Enriched in differentiated ES cells
SERPINH1
Supercontig 2
Enriched in differentiated ES cells
SFRP2
Supercontig 2
Enriched in differentiated ES cells
SFRS7
Supercontig 2
Enriched in differentiated ES cells
SMAD2
Supercontig 2
Enriched in differentiated ES cells
SMAD3
Supercontig 2
Enriched in differentiated ES cells
SMC4
Supercontig 2
Enriched in differentiated ES cells
SOCS2
Supercontig 2
Enriched in differentiated ES cells
SPP1
Supercontig 2
Enriched in differentiated ES cells
SSB
Supercontig 2
Enriched in differentiated ES cells
SYP
Supercontig 2
Enriched in differentiated ES cells
TAT
Supercontig 2
Enriched in differentiated ES cells
TBC1D23
Supercontig 2
Enriched in differentiated ES cells
TBX3
Supercontig 2
Enriched in differentiated ES cells
TCF15
Supercontig 2
Enriched in differentiated ES cells
TCF3
Supercontig 2
Enriched in differentiated ES cells
TH
Supercontig 2
Enriched in differentiated ES cells
TLE4
Supercontig 2
Enriched in differentiated ES cells
TNNT1
Supercontig 2
Enriched in differentiated ES cells
TPX2
Supercontig 2
Enriched in differentiated ES cells
TUBB4
Supercontig 2
Enriched in differentiated ES cells
UGP2
Supercontig 2
Enriched in differentiated ES cells
WDR77
Supercontig 2
Enriched in differentiated ES cells
WNT3
Supercontig 2
Enriched in differentiated ES cells
WT1
Supercontig 2
Enriched in differentiated ES cells
XIST
Supercontig 2
Enriched in differentiated ES cells
XPO1
Supercontig 2
Enriched in differentiated ES cells
ZFPM2
Supercontig 2
Enriched in differentiated ES cells
ZNF117
Supercontig 2
Enriched in differentiated ES cells
ZNF43
Supercontig 2
Enriched in differentiated ES cells
ZNF90
Supercontig 2
Enriched in differentiated ES cells
ALDH18A1
Supercontig 3
Other
BXDC2
Supercontig 3
Other
C13orf7
Supercontig 3
Other
C20orf129
Supercontig 3
Other
CCNB1
Supercontig 3
Other
CD9
Supercontig 3
Other
CLDN6
Supercontig 3
Other
CLDN7
Supercontig 3
Other
COL1A1
Supercontig 3
Other
DDX4
Supercontig 3
Other
DLG7
Supercontig 3
Other
DLGAP5
Supercontig 3
Other
DNMT3B
Supercontig 3
Other
ERVWE1
Supercontig 3
Other
FAM29A
Supercontig 3
Other
FZD7
Supercontig 3
Other
GABARAPL1
Supercontig 3
Other
GABRB3
Supercontig 3
Other
GALNT7
Supercontig 3
Other
GJA1
Supercontig 3
Other
GSH1
Supercontig 3
Other
HNRNPAB
Supercontig 3
Other
IGF2BP1
Supercontig 3
Other
IL6ST
Supercontig 3
Other
IPO7
Supercontig 3
Other
JMJD1A
Supercontig 3
Other
KRT1
Supercontig 3
Other
KRT18P19
Supercontig 3
Other
MCM2
Supercontig 3
Other
MDK
Supercontig 3
Other
MTF2
Supercontig 3
Other
MYF5
Supercontig 3
Other
NASP
Supercontig 3
Other
NBR2
Supercontig 3
Other
NME1-NME2
Supercontig 3
Other
NOG
Supercontig 3
Other
NUP205
Supercontig 3
Other
PAMR1
Supercontig 3
Other
PAX4
Supercontig 3
Other
PCNA
Supercontig 3
Other
PDX1
Supercontig 3
Other
PHIP
Supercontig 3
Other
POLR3G
Supercontig 3
Other
PSIP1
Supercontig 3
Other
PTF1A
Supercontig 3
Other
PTPRZ1
Supercontig 3
Other
RAD51AP1
Supercontig 3
Other
RAF1
Supercontig 3
Other
SCNN1A
Supercontig 3
Other
SEMA6A
Supercontig 3
Other
SKP2
Supercontig 3
Other
TCEA1
Supercontig 3
Other
TERT
Supercontig 3
Other
TK1
Supercontig 3
Other
UBA2
Supercontig 3
Other
UBE2T
Supercontig 3
Other
ZMYM2
Supercontig 3
Other
ZNF257
Supercontig 3
Other
ZNF263
Supercontig 3
Other
ZNF296
Supercontig 3
Other
Appendix C Transcripts enriched and depleted in NBL TICs
Table C.1 Transcripts enriched and depleted in NBL TICs
Transcripts enriched (LogFC > 0) or depleted (LogFC < 0) in abundance in NBL TICs
compared to SKPs and other cancers were identified as described in Chapter 3. The table lists
the RNA-Seq-based log fold change (LogFC) values for NBL TICs compared to SKPs and for
NBL TICs compared to other cancers. Enriched transcripts are listed first, followed by
depleted transcripts; within each group, the transcripts are ordered by decreasing magnitude
of the log fold change in the NBL TICs vs. SKPs comparison.
Gene symbol
LogFC NBL TICs vs. SKPs
LogFC NBL TICs vs. cancers
MKI67
26.96693646
20.24348157
SYNE2
24.89213386
9.826071774
NUP210
23.07971503
11.262793
SLC38A1
21.94938243
9.734695824
ODC1
18.84307121
13.50929761
TOP2A
16.30455777
9.608052033
RRM2
15.57444611
11.82179336
UCP2
15.57083886
5.3169606
CENPF
15.37177492
8.159367496
MYBL2
15.19227895
9.661516845
ATP8A1
14.8453599
10.23366053
PPP1R16B
14.30008339
10.15757581
LMNB1
13.41180995
8.246556372
HMGB2
12.86230979
7.356895948
KIAA0922
12.60446044
11.94840748
CCDC88C
12.49395783
6.094528175
MCM2
12.35126443
8.644136302
CYFIP2
11.91765271
13.49135946
BUB1
11.88377025
6.827490744
MCM4
11.70087367
8.786203302
HNRNPU
11.65596287
7.663725376
SLC1A4
11.59411326
7.631615754
NUSAP1
11.28618426
6.42912675
WHSC1
11.23229062
7.525906497
NCAPD2
11.2119662
7.998102303
MCM3
11.20572045
6.916758222
MCM7
10.90358042
7.294148992
TPX2
10.88247709
5.193337769
GLCCI1
10.61611644
8.33243708
TMPO
9.895594821
6.576471432
FOXM1
9.872573723
6.259132006
SFRS2
9.852469772
5.338115715
PARP1
9.806251555
5.963989379
FANCI
9.739436411
6.190187583
PLK1
9.715453192
5.98535316
POLE
9.644724181
7.005488763
BCL2
9.431311828
6.027401695
LBR
9.202775362
5.473504232
SLAIN1
9.109385266
4.29268503
HTT
9.030088243
8.781018918
SPAG5
8.737359666
5.036636031
KIF11
8.677708674
4.924155792
REC8
8.641534886
5.177187826
SFRS1
8.629057042
5.233936209
GTSE1
8.586073335
5.376473968
EPB41
8.582966105
5.082246866
KNTC1
8.58091033
5.259046823
PCNA
8.578027568
6.019505716
C13orf23
8.522257983
6.505111731
HELLS
8.413872205
5.294195148
KPNB1
8.349121905
6.883673178
TYMS
8.31849503
7.1039114
CCNA2
8.275261702
5.608805409
STRBP
8.265040666
4.128132161
FANCA
8.250961879
4.763474473
TNIK
8.250470164
6.347733225
SFPQ
8.21920524
7.393266297
SSH2
8.123243985
4.700551118
HNRNPD
8.118885354
6.12481756
BRCA1
8.080316984
4.694020474
BUB1B
8.022160021
4.766682184
TRAP1
8.020173086
4.726922977
HNRNPH1
7.910774749
6.031343787
POLQ
7.904287215
4.522189398
CCNF
7.83925954
5.109190013
DTL
7.827754961
4.950282716
TIMELESS
7.814059837
4.270228836
NCAPH
7.764024636
4.896804715
ESPL1
7.715435082
5.328877224
HNRNPM
7.704609653
5.00087792
FANCD2
7.676870984
3.472258776
LARP1
7.652278445
8.553497042
EZH2
7.600979598
4.249810161
KIF2C
7.476052997
4.112622992
PAICS
7.374446298
6.454305355
NCAPD3
7.36873723
4.220636378
SSRP1
7.343881609
5.213869182
ZWINT
7.27778899
3.742347744
CXXC4
7.267073319
4.16133507
TTF2
7.236692033
4.843751718
HMMR
7.233360734
4.540960488
H2AFZ
7.225104774
5.098107304
RRM1
7.194620552
4.913623607
ARHGAP11A
7.136910455
3.889090542
KIAA0226
7.132822101
3.950143242
C15orf42
7.131260335
5.094445131
NCAPG2
7.128260642
3.913703109
CDCA5
7.063811558
4.306104755
WDR62
6.987701084
4.191356296
FUBP1
6.929092106
5.184804987
ASF1B
6.921080204
4.046731342
FEN1
6.851343898
5.216085613
FCHSD2
6.848150542
3.684554185
INCENP
6.833092793
4.923179017
CDCA8
6.831699999
3.587798407
BLM
6.730885681
3.525631711
MCM6
6.722370964
4.874330191
NIPA1
6.702036174
3.742189044
SH2D3C
6.699618285
4.390826168
KCNQ5
6.680977014
5.889312755
BIRC5
6.679754362
3.515619857
CHAF1A
6.665332109
3.93000489
COCH
6.633708753
4.998343929
KIF15
6.596667348
3.711546305
E2F8
6.571113522
4.14937197
RFC3
6.55482491
3.814897155
NOLC1
6.552213307
5.163050984
TOPBP1
6.549238094
3.391938776
R3HDM1
6.49768625
5.017031599
KIAA0101
6.385035305
3.936724279
DLGAP5
6.325357025
3.69758135
STIL
6.309741949
2.937222442
FCHO1
6.300447737
3.948016621
CCNB2
6.275291841
3.364012223
E2F2
6.247975595
3.554071586
PPM1G
6.23586497
3.72597527
MTHFD1
6.226972732
4.768734908
PRR11
6.19794566
4.568309711
TROAP
6.15467655
3.4151157
TAF15
6.128098956
4.719746686
HJURP
6.125644276
2.833539817
FTSJD2
6.125379201
4.413928313
RFX5
6.103610472
3.891829312
AURKB
6.016255948
3.551091541
DHTKD1
6.015728164
2.518022525
MTMR4
5.984615575
5.538871904
ATIC
5.954323285
5.265440496
RFWD3
5.89016915
4.080894344
MCM10
5.827584837
3.015809736
SLC7A6
5.783429301
5.653614402
CDC6
5.782590016
3.128265846
PLK4
5.778220979
3.090520595
EXO1
5.775830175
3.022462056
CLSPN
5.738688325
3.145796671
INTS7
5.706283007
3.181182621
CDC45L
5.689564308
3.468064778
PDCD11
5.681473873
5.31686487
XRCC2
5.669930973
3.595010252
MARS
5.664132882
4.845495405
PSME3
5.63737278
3.774099378
SLCO5A1
5.594555538
4.17915762
POLA2
5.552885592
4.279033683
ARID3B
5.491653346
4.217263113
MSH2
5.482425558
1.794540467
GOT2
5.475664796
3.79530811
PDE7A
5.382296032
2.765055754
ORC1L
5.367773468
3.895330687
ORC6L
5.367417719
3.467293101
SUPT16H
5.367406587
4.208585658
ANKRD44
5.34289189
4.401552665
FSD1L
5.339145486
4.078777794
C13orf3
5.298071971
2.721428925
GINS2
5.279494061
3.346340865
DEPDC1B
5.246522316
2.524779376
UBAP2
5.24211682
3.456639147
HNRNPR
5.202415118
3.814980856
CDCA2
5.195174062
2.889317414
MCM8
5.179990587
2.750677467
LRRC61
5.01565871
2.333247222
GTPBP1
5.003982945
5.071726797
CPSF6
4.981033566
4.590501629
CDC25A
4.976662591
2.897095077
WDR76
4.886753231
3.344157827
GINS1
4.884026824
2.154240018
GRK6
4.857117722
3.042371941
POLR1B
4.833254474
2.792918357
CEP192
4.813957861
2.077827663
SART3
4.79851291
3.294444378
NNT
4.78307819
4.503689785
KARS
4.730879121
3.214759655
SAFB
4.715110885
3.33951508
UBR7
4.680847249
2.541130104
CKAP2L
4.657502861
2.606384436
SMG7
4.643623934
3.364262941
XPO5
4.631385079
3.880937709
WBP11
4.609014895
4.1022292
FIGNL1
4.606627418
2.08154431
POP1
4.591867213
3.59779264
POU2F1
4.563974482
2.181336015
MAD2L1
4.561555188
2.508935283
LARP5
4.558911544
2.65342629
RNF34
4.544089795
2.934788489
BRI3BP
4.532722879
2.871632222
KIF18A
4.522376826
2.206534579
WDHD1
4.479557562
2.520464807
SFRS14
4.465183576
4.195033652
GSG2
4.460019337
2.711871862
MLH1
4.425945334
3.479686613
RAD51AP1
4.414012376
2.04157747
BRIP1
4.405527838
2.329165924
MARS2
4.398022854
2.932305616
SFXN1
4.397965629
3.007931772
POLR1A
4.382775726
5.193290309
RAD54L
4.370926787
2.608848739
SCLY
4.355601848
3.445955091
NUP214
4.353240599
2.936660367
CDC7
4.295084175
2.081913712
POLE3
4.291177896
2.103924077
TNPO3
4.280898854
3.598140761
PRIM1
4.274990302
2.025835536
FAM60A
4.256602585
2.075988358
GART
4.243787156
3.429799675
VPRBP
4.215273963
3.15503957
VRK1
4.188937853
2.163563272
GPHN
4.161746127
3.38821531
DCLRE1A
4.146642533
1.79077843
ARHGAP19
4.132293427
2.061365319
PAXIP1
4.124460519
3.004467999
WDR77
4.123325715
3.025945145
SMARCD1
4.120511779
3.615830263
NUP107
4.045449327
1.753345509
C18orf24
4.04326302
2.018041286
NUP93
4.027404583
2.945193827
RFC5
4.010794374
1.354433287
SPC24
3.992537717
2.214071882
NUP88
3.943067477
2.388249501
IMMT
3.914790876
2.508472502
CLN6
3.900640665
2.27579406
AKIRIN2
3.900030725
2.836039109
CCDC21
3.879431224
1.437645446
FBXO5
3.828820351
2.238630699
HIRA
3.82496504
2.994567893
SKP2
3.815969671
2.416286639
HIRIP3
3.80741046
1.628796146
TARS2
3.802558603
1.3672385
CDC23
3.802042917
1.860206111
CHAF1B
3.783016334
2.054530361
MRPL37
3.780801493
2.40942888
ZNF142
3.770070642
3.048480677
UBE2N
3.750756431
2.344418087
CASP2
3.750512131
2.340320183
FH
3.744416849
2.043887261
SUV39H2
3.74039011
2.195390412
WDR33
3.738294696
1.758481797
GSTCD
3.737602949
2.162448303
RQCD1
3.734033552
3.535254452
TOP3A
3.725890353
2.635567545
ADORA2A
3.717888226
4.442241391
KIF24
3.697520999
1.409909211
AOF2
3.687197714
2.599503608
GINS3
3.662782269
1.739907398
POLR2D
3.628248484
2.931046889
FLVCR1
3.618104604
1.996451151
GINS4
3.615801209
2.356390981
CNOT10
3.613905301
1.685010236
SMCR7L
3.608625305
3.212237855
PLAGL2
3.524856809
2.109283482
DARS2
3.51768568
1.487205349
TTLL4
3.476111783
2.441340128
MCCC1
3.471570139
1.534167246
DBF4
3.45884113
1.777369717
SFRS15
3.424266851
3.379665735
IL7
3.38843542
2.267578323
COX15
3.378255239
1.412871034
CAMK1D
3.377882359
3.177801349
ZBED4
3.336441744
2.829901878
USP10
3.268405624
2.576268409
KIAA0406
3.234756667
2.035658762
LIG3
3.229174992
1.64472414
GTF2I
3.220667531
3.450350357
INTS9
3.198541778
2.434484233
IQCB1
3.178188401
1.643674056
SPC25
3.106304875
1.380774585
CIRH1A
3.077488856
3.202130099
UIMC1
3.073808766
1.733997127
SLC25A44
3.036954121
1.439039613
TUBGCP4
3.036740346
2.074667408
DBF4B
3.033985363
1.140479257
NEIL3
3.02537989
1.304399255
OBFC2B
3.021603876
2.190940372
MRPL39
3.019986937
1.567427458
NEU3
3.017587736
1.545475561
TDP1
3.003823574
1.52193641
EPS15L1
3.003625319
1.814080644
OIP5
2.96368369
1.170646238
TRAIP
2.937710558
1.031190933
YY1AP1
2.898722858
1.609558842
CENPH
2.881530426
1.902673357
MND1
2.876177385
1.564408195
CCDC150
2.795805465
0.886209742
CCDC138
2.765815084
1.134923442
C2orf44
2.751755693
0.906136788
TUBGCP3
2.721374409
1.77509212
C17orf53
2.68997555
1.582560662
ZNF367
2.689225324
1.497127019
PRPF4
2.633247904
1.538737866
FBXO22
2.618684553
2.250848582
TMEM38A
2.616570587
1.368175333
SAAL1
2.593548151
1.286650015
EIF4ENIF1
2.577655456
1.407605112
EXOSC2
2.571935733
1.532168826
PSPC1
2.569692198
2.784445975
FANCC
2.566279224
1.664071832
SUPV3L1
2.492549812
1.339015985
CHAC2
2.485408105
1.024778737
PMS2
2.448133776
0.929402054
CEP72
2.439410684
0.772423918
OXNAD1
2.410329055
0.76354805
RCL1
2.404256929
1.542624735
DEPDC5
2.39761354
1.248607917
TAF5
2.362775978
1.223091289
C1orf83
2.301767803
1.151520786
GPR63
2.280708562
1.207851655
CENPP
2.248725925
1.248345294
ZNF346
2.244204786
1.546711313
ZSCAN29
2.207977222
1.343883678
TIPIN
2.138518069
0.891238401
C1orf135
2.103025921
0.759216212
PDSS1
2.095539926
0.896088276
TADA2L
2.074052935
0.954661779
MAPKAPK5
2.012235543
1.039750798
RDM1
1.988151172
0.648744845
FANCF
1.977133522
1.191082872
DIS3L2
1.933699088
1.069976799
ADAT1
1.92676802
2.008459187
RFT1
1.86627526
1.147624113
EXOSC3
1.829802731
1.068017527
LCMT2
1.73709081
1.105566639
HIST4H4
1.585250613
0.860151795
AGBL3
1.420685766
0.432104796
DMC1
1.168679573
1.050398384
COL6A3
-78.04876454
-16.85990928
COL1A2
-74.28382256
-30.63347454
COL6A1
-70.81728836
-14.82521282
COL3A1
-69.32155992
-26.02203479
COL6A2
-66.47244796
-14.23599076
LRP1
-63.70014555
-15.92959162
STC1
-47.54280562
-6.145779114
MMP14
-41.93994416
-11.71656345
THBS1
-41.71825847
-19.39565855
A2M
-41.67647726
-15.45064146
TIMP2
-41.32682416
-16.66538096
COL12A1
-39.56528437
-10.31866608
TIMP3
-36.85823738
-9.266906593
LTBP1
-36.46381379
-6.571303551
IGFBP4
-34.37614225
-8.623311271
DCN
-33.47103761
-11.02791379
MRC2
-31.63508222
-7.381992482
THBS2
-30.30736203
-8.919600586
COL4A2
-30.04177913
-18.2569881
SPARC
-30.03075174
-23.25678739
CTSB
-29.78773998
-22.91869527
LTBP2
-29.43404477
-7.758417707
COL4A1
-28.91887381
-18.14799415
COL5A1
-28.27314464
-11.44344762
LAMA4
-27.82331693
-8.506528196
IL1R1
-26.61378524
-5.95894671
EMP1
-25.54752865
-7.958234731
APLP2
-25.46329869
-16.10061772
LIF
-25.38426464
-4.897867691
EGR1
-25.35845682
-9.915105808
IGFBP3
-25.11693493
-16.68251927
CTSK
-25.05455531
-4.47115708
GPNMB
-24.83904743
-11.49861057
C1S
-24.5961446
-11.02230665
COL14A1
-24.59352871
-4.22817457
GPR177
-24.34878596
-8.438125423
PAM
-24.32406687
-8.359315884
CDH11
-24.1193849
-9.565715541
ITGA5
-23.64093796
-9.260110484
GJA1
-23.60614413
-11.61979875
RND3
-23.55506063
-6.0046257
SERPINE1
-23.16056691
-8.630229627
APP
-22.98030826
-12.75370324
LUM
-22.77965526
-10.17257656
SNED1
-22.67452714
-4.868355364
TNS1
-22.45675488
-9.714016832
KIRREL
-22.43946218
-9.94648015
PCDH18
-22.02019443
-4.189050091
NOTCH2
-21.89379407
-16.32654975
COL5A2
-21.77502404
-11.52680265
NRP1
-21.38894984
-7.750109981
TGFBI
-21.24751868
-13.07147811
WWTR1
-20.75644679
-7.836355958
NR4A2
-20.6156634
-2.9510618
GEM
-20.60197572
-2.620395877
OSMR
-20.56101279
-9.201801163
FAM20A
-20.39985668
-4.593564199
DUSP1
-20.16242224
-9.657610928
NBL1
-20.07469535
-6.937737177
FAM129B
-19.98783785
-10.39608277
LAMB2
-19.95308915
-14.69860634
CYBRD1
-19.93848535
-6.370081816
OLFML2A
-19.90998587
-2.904819273
SH3PXD2B
-19.83901217
-6.39907524
CALD1
-19.74193838
-12.19211819
ATP2B4
-19.61647869
-11.0788483
PTRF
-19.58427542
-10.97920262
FKBP10
-19.58059473
-8.954395944
HTRA1
-19.57924127
-14.13191448
C8orf4
-19.50378531
-6.202443039
RASSF8
-19.48824453
-3.883287891
SEMA5A
-19.47901493
-5.99655432
FOS
-19.14497757
-12.3390116
GNG12
-19.08650837
-10.97189749
BDKRB2
-18.97342222
-2.678507743
YAP1
-18.96674298
-12.59020866
FLRT2
-18.79091846
-7.184943351
CTSL1
-18.59753461
-10.46089502
NOTCH3
-18.54489451
-10.36533256
CA12
-18.49584348
-6.515770648
COL18A1
-18.42731903
-11.71050325
HMCN1
-18.36868778
-7.946450498
KCTD12
-18.19363982
-9.273857259
EHD2
-17.93919932
-8.616043747
TEAD1
-17.92768588
-8.770750578
ITGB5
-17.64107271
-12.49497074
SERPING1
-17.55234831
-11.10516147
NFIX
-17.43834292
-8.956013952
SOD2
-17.35300278
-6.836897057
GPR124
-17.25492758
-4.222101983
CALU
-17.1889367
-5.999802865
VCL
-16.83625316
-7.367180565
APOD
-16.79051527
-5.790401164
C3
-16.71424309
-16.10242887
ECM1
-16.66393171
-3.823785596
RAI14
-16.5905115
-8.281575704
ANTXR1
-16.57881118
-11.91383723
FOSL2
-16.48463792
-12.57451693
SVIL	-16.46876063	-6.175904336
FAP	-16.28886625	-3.725852679
PLAU	-15.86203841	-5.943753873
PLXND1	-15.66738386	-8.662532266
OLFML3	-15.66277569	-4.28071462
PTGES	-15.63034811	-2.648610768
TSHZ3	-15.57785844	-2.969009019
SERPINH1	-15.56813836	-8.359743614
CAPN2	-15.38654712	-13.6095608
PHLDB1	-15.36518663	-8.63821974
GSN	-15.34059977	-9.672098668
LMNA	-15.31965109	-5.824609806
AXL	-15.22175969	-8.904915926
EDNRA	-15.20618586	-3.608933886
ASPH	-15.18095934	-8.113789638
DOCK1	-15.16017507	-10.10388288
WNT5A	-15.15001855	-5.553020355
SERPINF1	-15.13573756	-5.056045743
ADAMTS1	-15.05286479	-8.104117196
CDCP1	-15.02753489	-7.152940405
CD63	-14.96413661	-10.03787356
TPBG	-14.839185	-3.471507963
EGFR	-14.55248836	-8.941034339
IL1RN	-14.48309103	-3.207110942
EPAS1	-14.48042686	-8.640772663
CLDN11	-14.47746878	-3.682166313
SNAI2	-14.46265737	-4.41616214
NDRG1	-14.44607486	-14.75347253
ZCCHC24	-14.4431622	-5.727545191
PPFIBP1	-14.40674149	-6.024646385
LPHN2	-14.33308517	-7.825536328
FGF7	-14.27535374	-1.788270549
COLEC12	-14.24886118	-3.203575325
JUN	-14.23874206	-6.766796119
PLAT	-14.19710956	-4.525958612
COL15A1	-14.16794557	-7.121650502
DLC1	-14.13984025	-6.448991963
MYADM	-13.73542092	-4.854969622
GLUL	-13.72780957	-15.97624911
RHOB	-13.69059127	-9.772591693
PRICKLE2	-13.65239421	-6.475097061
BNC2	-13.61192273	-5.054286963
AKAP12	-13.59256823	-7.122020042
ANXA5	-13.58163078	-6.852037154
PID1	-13.48603173	-3.866039118
DAB2	-13.46091826	-9.876934964
FBLN2	-13.39394781	-4.151322139
LTBP3	-13.3631227	-6.858943143
RUNX1	-13.32070135	-5.355153529
SPON2	-13.31926274	-3.342430912
DUSP6	-13.15615784	-9.455747147
PPAP2B	-13.12560671	-4.165517189
CKAP4	-13.07249461	-5.835801251
ENG	-13.06560596	-5.664063169
SLC7A8	-13.02527221	-5.849910393
ANGPTL2	-12.99594985	-4.498940098
SCARA3	-12.98026158	-6.361633292
TGM2	-12.94356396	-9.954817516
CRISPLD2	-12.90363585	-7.239695428
LAMA2	-12.88112437	-4.168749481
AQP1	-12.86038045	-6.11426318
RAPH1	-12.83514349	-6.382058242
ARRDC3	-12.72082597	-6.837594341
CNIH3	-12.68270008	-1.611950504
S100A16	-12.63290844	-9.279431721
LAMC3	-12.62765305	-3.714979413
SQSTM1	-12.625313	-5.781480456
ERRFI1	-12.60055212	-7.840590918
TENC1	-12.59809289	-7.652323199
CRTAP	-12.58839591	-4.687266734
C13orf33	-12.48512237	-1.916803209
ABCA8	-12.47150186	-4.533097268
TPM2	-12.44094513	-7.805033448
PARVA	-12.43777936	-6.969828096
ITGAV	-12.38530746	-8.139326983
CD59	-12.30291445	-5.92380581
ITGA2	-12.25986726	-7.203555443
APOE	-12.25754133	-11.17096323
CYP1B1	-12.24828786	-6.947273878
ADAM12	-12.15207836	-3.932139302
PLTP	-12.1307572	-8.020028122
COL27A1	-12.06210615	-8.623114701
PRNP	-12.0059327	-7.722098809
CYR61	-11.99693644	-12.72590386
THY1	-11.98744733	-7.603081229
CD81	-11.93552128	-5.561894401
ADAMTSL4	-11.87497622	-4.217613275
SGCD	-11.87200633	-2.880542507
MYO1D	-11.80020356	-6.4294112
TGFBR2	-11.79727963	-6.994783647
MYL9	-11.79645654	-9.5779603
VGLL3	-11.79643524	-5.495291356
SGK1	-11.7941642	-10.6613682
TGFBR3	-11.79155848	-5.454172978
LOXL1	-11.59938592	-3.237194054
ANO1	-11.59710602	-4.166717255
GLIS3	-11.55381543	-4.365957691
CDKN2B	-11.51121895	-4.17916099
ABCA1	-11.43556142	-6.649833717
PTPRM	-11.36916806	-5.162405371
SDC2	-11.25293874	-9.504626621
LTBR	-11.15390232	-7.721778445
FKBP9	-11.09377343	-5.699242159
IGFBP7	-11.07068506	-16.10366513
CYP26B1	-11.06377339	-2.628451065
DKK3	-11.00099782	-5.653720777
RIN2	-10.98113061	-8.299592625
NFIL3	-10.89407591	-3.649267639
LAPTM4A	-10.88216777	-9.47215333
BMP1	-10.8786157	-4.766378222
SOD3	-10.81387608	-3.877136096
S1PR3	-10.7987874	-4.286666048
ZAK	-10.72636121	-5.392825677
RHBDF1	-10.71524753	-4.907186665
PTGFRN	-10.60453625	-6.133509099
FAM114A1	-10.59929069	-4.903248794
CFH	-10.56578364	-7.485800187
LDB2	-10.55190267	-3.293476084
CXCL12	-10.51769782	-5.381089935
SASH1	-10.51428694	-6.504370574
OLFML2B	-10.50710993	-4.6165065
CCDC80	-10.46805278	-9.757658449
PLEKHH2	-10.45808007	-8.975549088
CCL2	-10.44772256	-2.938458965
ST5	-10.35820527	-7.230739093
ANXA2	-10.35450711	-8.005340644
RCN3	-10.25311568	-3.828035687
CERCAM	-10.2297807	-5.433521276
FBLIM1	-10.22314531	-6.428469349
AOX1	-10.22031805	-3.454629633
TNS3	-10.20737724	-8.434795054
TNFRSF21	-10.14190423	-4.438171176
TEAD3	-10.1227539	-5.395441645
MYO10	-10.12081238	-12.21513298
TSC22D1	-10.10898402	-7.582652909
S100A11	-10.10188181	-8.730604537
BCL6	-10.09658936	-7.822117134
C20orf108	-10.09016468	-6.969000036
ISLR	-10.08844424	-4.259210081
HS3ST3B1	-10.08841956	-2.320917712
HDAC7	-10.03171112	-6.470323992
CTTN	-10.02167162	-6.051999367
SIPA1L2	-10.01795158	-8.801795287
EFEMP2	-10.00911656	-4.619686694
SLIT2	-9.938604134	-2.985233377
CEBPB	-9.898592102	-4.505294919
LPP	-9.878244273	-6.655387588
CTGF	-9.819439609	-12.64589232
GLI3	-9.819055635	-5.886470103
ANXA1	-9.782642138	-9.528694416
SPRED1	-9.76651532	-2.727582675
ZNF521	-9.763690352	-2.730362369
GNS	-9.729843848	-6.339669288
LOXL4	-9.708168833	-1.736244632
ACVR1	-9.658399585	-5.484235318
ACVRL1	-9.635867517	-2.713385941
MARVELD1	-9.635339224	-4.377384876
MRGPRF	-9.631304729	-1.981925739
PHC2	-9.609922427	-4.071979284
CTSD	-9.606444904	-13.31775028
THBD	-9.598752449	-4.821603233
CD9	-9.581243802	-11.39487796
PRSS23	-9.570128052	-12.42252705
PPL	-9.561352289	-6.426420138
PMP22	-9.547583021	-6.147791751
SLC2A10	-9.516643045	-5.66408614
CXCL2	-9.507327972	-2.843435827
SDCBP	-9.500524879	-7.215643551
ELN	-9.491548517	-3.556549743
PERP	-9.391331291	-9.895308091
FZD1	-9.383943068	-4.742826265
CD68	-9.349115776	-6.274991454
SYNJ2	-9.29317645	-3.42565216
PLAUR	-9.289470894	-5.622362986
GPC1	-9.274839211	-5.734448346
IGFBP6	-9.235275592	-2.554213124
GPX8	-9.215793404	-5.12605569
CAMK2N1	-9.187242129	-5.306105918
TSPAN9	-9.174666495	-4.103350464
PRRX2	-9.126840885	-2.114365079
PLSCR4	-9.124381087	-4.684690133
RRBP1	-9.108183756	-7.177983965
QSOX1	-9.107436296	-10.3064023
FMNL2	-9.065666318	-6.936887663
NPAS2	-9.041839001	-7.411285038
GAS1	-9.026347802	-4.247690497
SDC3	-9.013725476	-7.144361436
FBLN5	-9.001980836	-4.161165708
CPXM2	-8.992582666	-3.834618173
ITGA3	-8.980123329	-9.451206222
P4HA3	-8.954169365	-2.082308203
EPHB4	-8.936195555	-6.952342539
NNMT	-8.925792746	-7.743672477
LEPREL2	-8.916546541	-3.315024385
IGF2BP2	-8.908767903	-6.95203324
FZD7	-8.897677253	-3.468373513
RAB34	-8.89608857	-5.710764579
TFPI	-8.84179743	-7.175196491
RAB11FIP5	-8.840025129	-5.451994883
KLF9	-8.839468524	-4.918849303
BEST1	-8.827654094	-3.919819046
PLA2R1	-8.770241517	-3.338931021
ENPP1	-8.752570065	-3.668683867
P4HA2	-8.734472794	-4.809403071
COL8A1	-8.726130522	-3.532970244
FOXF2	-8.701247735	-2.117684582
C1orf198	-8.699706442	-7.491121806
PRKCDBP	-8.69071793	-2.74946525
FMOD	-8.681518412	-6.154060851
EXT1	-8.668801156	-5.665957739
RASL12	-8.665005333	-3.460498611
CTHRC1	-8.647135441	-4.506655088
SYDE1	-8.646761171	-4.365059329
GNG11	-8.644075412	-3.773216942
MGP	-8.624858953	-5.562503466
PDPN	-8.510596161	-4.357340742
HNT	-8.506693559	-2.438433309
TSKU	-8.504561028	-5.725797058
CTDSPL	-8.500641577	-6.550946516
ABCC3	-8.496660426	-6.888844974
COL21A1	-8.476494712	-2.126368743
SOCS3	-8.471398809	-6.777988424
RASD1	-8.453479432	-2.810048051
MYLK	-8.44827779	-6.529454031
SPOCD1	-8.386992704	-2.257010912
HTRA3	-8.371155822	-3.014984376
NDN	-8.366652244	-2.742375004
TFAP2A	-8.322210864	-5.920147189
FAM46A	-8.308391213	-6.861647777
ETS2	-8.291891093	-4.355855537
PDE4D	-8.233225415	-3.346617658
KREMEN1	-8.152496436	-4.244799555
SDC4	-8.13050297	-6.735199708
MAF	-8.085015773	-6.759776865
FCGRT	-8.073802006	-6.589981787
TGFB3	-8.060166458	-2.208196382
CAPN5	-8.051766787	-4.212594332
CXCL14	-8.046046533	-5.825759452
RBPMS	-8.016187551	-5.237182416
KLF4	-7.984642039	-5.567505716
OAT	-7.984368681	-5.668808035
MXRA8	-7.971489533	-2.522300331
UACA	-7.963584603	-9.56377129
PROS1	-7.938933499	-4.183393923
ADM	-7.927246362	-4.584555704
USP53	-7.910664276	-5.720550057
SDC1	-7.905863484	-6.210648382
OLFML1	-7.888464408	-2.14161084
LIFR	-7.86819662	-7.824708089
LGALS3	-7.865227668	-6.554969793
WIPI1	-7.85738754	-2.363668589
IL1B	-7.826892855	-1.62473
NXN	-7.817095474	-5.347718906
GRB10	-7.796194839	-6.771341544
BNIP3L	-7.751175484	-6.11879348
KDELR3	-7.750582284	-4.896616797
TEAD2	-7.733819222	-5.53817432
VEGFC	-7.726343312	-2.760093577
S100A10	-7.706383395	-9.426143903
EMP2	-7.681330298	-5.064766873
CD151	-7.680970127	-5.309222834
SECTM1	-7.676116872	-3.525289925
SFXN3	-7.658782612	-4.005013653
CXCL1	-7.593245321	-1.585578081
OSBPL5	-7.589333896	-4.278467372
ADAMTS4	-7.583022887	-3.258689058
ARHGEF10	-7.57342629	-5.671118169
BMP2	-7.570872832	-2.593800257
SATB2	-7.5341501	-3.909401256
HAS2	-7.525316662	-2.484401716
SEPT10	-7.498414731	-4.471631734
C4orf18	-7.48128806	-9.189586226
CDC42EP1	-7.477939717	-7.082085177
TGFB1I1	-7.47142632	-3.173952085
ABLIM3	-7.452537899	-3.107283056
TNFRSF1A	-7.439621713	-4.958182382
GAS6	-7.402249328	-7.10299006
KIAA1217	-7.399828827	-10.89688074
GFPT2	-7.388986754	-3.593618384
SPHK1	-7.379576269	-2.864533216
EFEMP1	-7.366984718	-7.114644571
MICAL2	-7.32384118	-5.48083052
ARSJ	-7.311815753	-3.407951701
EPHX1	-7.279526661	-8.725567839
PAPSS2	-7.246967525	-5.411536807
CTNNA1	-7.205848641	-5.306411666
CLEC11A	-7.18377535	-1.819791819
PLXDC1	-7.137582179	-3.989787157
TMBIM1	-7.12751138	-4.402032703
RHOJ	-7.110654409	-2.781958958
LRRN4CL	-7.108072581	-1.252273333
PBX1	-7.100835591	-3.564040295
PRKG1	-7.087578611	-2.412224689
VAMP3	-7.023920839	-5.574199643
MSRB3	-7.016889633	-5.129640282
C1QTNF1	-6.955396955	-5.521046776
PODN	-6.895086391	-3.289103217
PPP2R3A	-6.877105692	-4.079980097
INHBA	-6.845944344	-3.72848226
SEMA3C	-6.822473957	-8.224077435
BACE2	-6.800375378	-7.583558835
SELENBP1	-6.795310263	-5.730780685
SLC12A4	-6.772912594	-2.971504846
WASF3	-6.739269145	-6.028087784
MMP19	-6.732602067	-3.001987285
BOC	-6.726496252	-3.747440184
HEXB	-6.703179153	-5.216143131
VDR	-6.702047932	-4.060181813
COL8A2	-6.700493251	-2.862313155
TNFRSF12A	-6.691840145	-5.151227194
LEPR	-6.631501555	-3.272815512
TCF7L1	-6.605692804	-4.34587665
ADAMTS2	-6.602837416	-3.150757871
CST3	-6.601605906	-8.156517547
MICALL2	-6.599242848	-4.648431844
ARHGAP28	-6.59233419	-3.198146613
ARHGAP22	-6.56438503	-2.096098486
MET	-6.554864818	-8.971494241
SERPINB6	-6.510573534	-7.342858329
SPRY1	-6.48080002	-4.82771692
PDGFC	-6.475847796	-9.507483635
GLDN	-6.465955374	-2.93320048
LAMP1	-6.454410133	-3.683878078
GNB4	-6.431438675	-5.720721892
PCYOX1	-6.429425406	-3.83655458
STON1-GTF2A1L	-6.40807117	-4.447070424
BAG3	-6.357222603	-4.75649919
SLC41A1	-6.347525157	-5.787237437
PRELP	-6.304871424	-4.466188185
TMEM98	-6.278499121	-4.549171697
LRIG3	-6.274767847	-4.391888718
LRRC32	-6.271102492	-3.92078561
FABP3	-6.27000498	-1.680905462
ADAMTS14	-6.252392959	-2.950600514
FZD8	-6.238275342	-3.667324801
SORBS3	-6.199764438	-4.983742988
PLOD3	-6.191712523	-2.740599898
ADAMTS5	-6.176111936	-4.861178552
PROCR	-6.154886679	-2.507465922
SIX4	-6.152206492	-4.052431783
FOXC1	-6.145639303	-3.735285252
RCN1	-6.135201968	-3.188453242
CTSA	-6.131197523	-6.613474808
PVRL2	-6.130258386	-8.693192786
GSTM5	-6.128755384	-1.725050053
TCF7L2	-6.117949068	-4.32435385
DIXDC1	-6.087907482	-3.058016053
TBC1D8	-6.077189106	-3.44033092
KIAA0284	-6.062288324	-4.99177228
F2RL1	-6.026680764	-5.483468262
ELTD1	-5.992308396	-3.537887829
HRH1	-5.984605139	-5.171036736
DUSP3	-5.97886987	-4.994806874
PLXDC2	-5.978129737	-4.292542777
YES1	-5.969485046	-3.159561299
LRP11	-5.969400803	-6.227606164
SPSB1	-5.96178341	-4.281276609
THRB	-5.956137533	-5.544209459
SMOX	-5.955836165	-2.350371197
RUNX2	-5.951199409	-5.825787079
HOXA10	-5.946566364	-2.682600664
FZD4	-5.929448199	-6.498081664
COL13A1	-5.916915567	-2.52997628
MAFB	-5.909735755	-5.610078081
NOV	-5.887565304	-3.360918343
DOCK6	-5.880463474	-3.247385072
CTSF	-5.840415945	-4.842621904
C1RL	-5.814845584	-3.124944199
MMP11	-5.814552698	-4.532706056
ACO1	-5.807221424	-2.68174222
MAP3K6	-5.802085804	-4.050536119
KITLG	-5.798588989	-4.272432622
TM4SF1	-5.791335369	-12.53707774
RIN1	-5.79041071	-2.104647464
HSD11B1	-5.76932916	-2.22178531
TAX1BP3	-5.76452567	-3.209701844
PAQR5	-5.761581864	-4.214632811
C5orf4	-5.754953241	-3.793389815
PLEKHA4	-5.720222629	-3.258903839
SRGAP1	-5.70642442	-3.853435882
HSPA12A	-5.66454931	-4.817842799
CAV2	-5.660559349	-7.57964101
ACOX2	-5.628977628	-2.736537429
BHMT2	-5.625579946	-1.345439586
GLI2	-5.603244683	-2.473602022
RAB3IL1	-5.565584766	-3.132443833
SEMA3F	-5.557072592	-4.270430452
B3GNT5	-5.5404847	-4.112549456
CXCL3	-5.508133504	-1.585181963
REXO2	-5.487375387	-4.875971125
NPC2	-5.467632755	-7.19861593
RAB23	-5.463987254	-2.125287919
F3	-5.424974893	-7.132590103
PHACTR2	-5.401149075	-3.918105107
FHL2	-5.376727646	-8.031326308
VGLL4	-5.363118769	-4.120134414
PRSS35	-5.343275362	-1.92637967
KIFC3	-5.310813373	-4.354083028
PDGFRL	-5.3055922	-2.385839009
ID1	-5.28631763	-5.298422921
C6orf145	-5.279585823	-5.20611973
HEBP1	-5.272833822	-4.4772269
GLIPR2	-5.26032381	-4.116335794
FAM176B	-5.247141428	-1.866258823
GLT8D2	-5.24163493	-2.046208978
MTMR11	-5.222017527	-4.246619305
NAV3	-5.217429215	-4.100427073
AASS	-5.213321679	-4.189915377
PITX1	-5.210536739	-2.421279115
MAFF	-5.182515306	-2.544924447
CAMKK1	-5.168032725	-1.907668336
ADCY4	-5.16494943	-2.429088209
RRAS	-5.149514903	-3.14941192
VWA5A	-5.129713685	-3.862181613
FAM180A	-5.08850125	-1.171564467
SELM	-5.021347133	-3.355638782
TFAP2C	-4.99638407	-3.433801542
CYFIP1	-4.96855948	-4.929299348
HOMER3	-4.952919758	-2.81510466
PPAP2C	-4.950033988	-5.907537966
FCHO2	-4.920911614	-6.490230276
PVRL3	-4.906461106	-2.641037996
PPM2C	-4.901737602	-3.609136592
CYYR1	-4.884924548	-2.600459654
PPIC	-4.884664398	-4.676650833
ANKMY2	-4.881466636	-4.557392134
LAYN	-4.822383468	-4.05499754
TRPM4	-4.819527968	-6.0084593
ALDH1A1	-4.81528634	-6.074483835
S100A13	-4.778693848	-4.221550791
SLC39A13	-4.776980813	-1.719045894
FNDC4	-4.773185554	-2.562406532
KCNE4	-4.762751855	-2.735488259
SLC17A5	-4.744694533	-2.305056467
PELI2	-4.698487941	-3.244166815
ZDHHC1	-4.651571589	-2.035733645
PCSK5	-4.632623254	-4.607916997
PDIA5	-4.611012344	-2.144754054
NPTN	-4.606204812	-1.695356223
C10orf26	-4.575940207	-2.067815648
SERTAD4	-4.561560526	-2.137812621
NFIA	-4.535872275	-3.876790266
C7orf58	-4.495019125	-3.209354968
PRDM16	-4.438731006	-2.319554634
CCDC8	-4.436150348	-2.748129084
SMPD1	-4.419194676	-1.581937356
RBMS1	-4.404841258	-1.754251543
HSPB7	-4.389496884	-1.468639817
RAB32	-4.370171421	-3.918508973
C15orf52	-4.36349757	-3.942262012
ESM1	-4.3497793	-2.778770497
PDK4	-4.338228338	-5.718004565
CHRD	-4.313464557	-3.664091931
MLPH	-4.309663139	-6.717474014
SEMA3B	-4.308485621	-5.190046543
SLC27A1	-4.294278503	-3.325017944
HSPB8	-4.292079135	-3.936750911
AKR1C3	-4.290460758	-2.947677596
PLK2	-4.275368945	-7.055120854
MITF	-4.250183532	-5.643236941
AKR1C1	-4.249415053	-1.466560683
SULF1	-4.246097656	-8.139079379
FIBIN	-4.226883212	-3.386613889
C2CD2	-4.211710288	-3.143417078
C14orf37	-4.198515451	-3.106271068
CCDC149	-4.192084046	-3.482738848
TMEM43	-4.177853825	-2.643167232
UBASH3B	-4.170145266	-3.774068436
TSPAN14	-4.169282644	-4.758198937
ANKS6	-4.154123871	-3.724535282
PHYHD1	-4.153721779	-4.216307558
MRVI1	-4.149700688	-4.684852909
IDUA	-4.132797153	-2.207640717
EVI1	-4.114268194	-7.754222039
NRBP2	-4.104305668	-4.610167775
AK1	-4.093822904	-2.551949602
GDNF	-4.088665901	-2.190981364
RYK	-4.0815333	-2.137519639
SEPP1	-4.077407388	-7.929501879
TAGLN	-4.053155762	-9.48424099
FUCA2	-4.047931189	-3.013556519
CDC42EP2	-4.047369669	-2.155297031
TRIM47	-4.047167126	-3.781359049
ACSS3	-4.037661389	-3.67342106
CPZ	-4.029550012	-3.160129756
IER5L	-4.023042592	-3.445505645
PLCD1	-4.020832181	-2.655017488
CYB5R2	-3.994072667	-2.61434352
UBTD1	-3.983241777	-2.036866077
RCAN2	-3.976382029	-3.912958043
PHLDA2	-3.944089149	-2.264846309
GAS2L1	-3.898895447	-4.209798964
METRNL	-3.888523659	-2.112678173
SRXN1	-3.876143746	-2.712735161
PDE7B	-3.870863541	-3.285205001
GALNTL2	-3.870132779	-2.358110052
MYO15B	-3.855189874	-4.553683104
PIAS3	-3.847304715	-3.168990346
EVC2	-3.844502451	-2.323086749
ZBTB47	-3.828299939	-3.88949216
SNX21	-3.805668094	-2.640802004
IL17RC	-3.80222719	-3.442234801
RARRES2	-3.797634718	-4.067158304
FKBP14	-3.796892982	-1.515218504
C1orf85	-3.780465924	-1.991446795
RSPO3	-3.780429343	-1.798839888
C13orf15	-3.74967828	-3.376011963
FOXQ1	-3.729193949	-3.584925777
OSBPL10	-3.727365972	-5.191370072
C10orf116	-3.711889152	-1.718061087
CDC42EP5	-3.711218833	-1.476855159
HSD3B7	-3.706665643	-2.996050087
ERBB2	-3.667947762	-8.636475401
TCEA3	-3.659653581	-6.179313848
EMILIN3	-3.627410134	-2.908769075
TRPV4	-3.623725773	-2.76201237
SGMS2	-3.617949434	-3.390928611
FOXF1	-3.614159895	-1.394243848
PACSIN3	-3.605181529	-2.457731712
GHR	-3.60293857	-3.001962857
ALS2CL	-3.56682601	-5.309426893
AFAP1L2	-3.553497092	-4.223559245
FAM26E	-3.52398329	-2.439725767
PINK1	-3.520122308	-3.450526169
ANGPTL4	-3.517212882	-2.732106717
PGM5	-3.516890932	-2.277163906
EPHA2	-3.512566689	-6.263577289
DNALI1	-3.485274373	-3.511265537
GRASP	-3.466521496	-1.96491781
SMAD6	-3.462255328	-3.146344274
SCUBE2	-3.452494709	-2.905828206
PARD3B	-3.445829724	-5.486728445
AVPI1	-3.441618788	-2.794113379
FEZ2	-3.429675235	-3.482671271
CXCL16	-3.415275159	-7.459035941
LMOD1	-3.411606416	-3.941051214
SAV1	-3.375077144	-2.026245067
KDR	-3.369497175	-4.780565859
PPP1R3B	-3.355477032	-5.036247713
TMEM54	-3.350610866	-3.631761557
CCL8	-3.327662266	-1.401894626
SHISA4	-3.322864911	-2.735933785
C1orf190	-3.286832834	-1.342338054
DZIP1L	-3.266274887	-1.985723457
GGCX	-3.240521603	-2.274992029
OSR1	-3.23142412	-1.751664357
PRKD1	-3.230927516	-3.911381651
HSPA12B	-3.223459801	-1.337081532
ZNF385D	-3.219959033	-1.058379034
SEC16B	-3.218653186	-4.786135301
MALL	-3.217061507	-4.733712568
SPARCL1	-3.212048835	-11.8199309
SLC24A3	-3.209905918	-3.324118735
RARRES1	-3.209621122	-3.208968809
KCNC4	-3.202464584	-2.794996903
ADAMTSL5	-3.201806381	-1.518305887
PTGR1	-3.197121585	-4.223760449
LAMB3	-3.196494672	-7.276319302
GPRC5A	-3.175196272	-8.961440421
HNMT	-3.173719874	-5.102794548
GPR116	-3.153585055	-5.578342665
FAM62B	-3.14485815	-3.371049628
CARD10	-3.139150425	-4.2608198
PPP1R3C	-3.134776085	-1.642447904
DYSF	-3.126052071	-5.976131499
CSAD	-3.106272454	-1.983823996
SLC22A3	-3.096206567	-3.156503602
SSH3	-3.095159486	-4.005379201
SLC47A1	-3.078488911	-3.052334369
CTF1	-3.072710682	-1.659437405
SPATA6	-3.044058221	-2.984994957
SLC9A3R2	-3.043470191	-4.693516207
AFAP1L1	-2.982184698	-3.26031397
PDE1A	-2.98023333	-2.52832547
GALNS	-2.967579841	-1.74710031
ST3GAL4	-2.956946574	-3.912489902
FAM89A	-2.942052464	-1.969187999
RHOD	-2.937517399	-2.748935586
ALDH3B1	-2.92890583	-2.33133059
SUMF1	-2.911329405	-3.324310927
MN1	-2.902812492	-3.520119653
SLC40A1	-2.898867417	-9.139870098
MOCOS	-2.892582181	-2.429495323
GALC	-2.886076193	-3.529778526
ATP10A	-2.869999284	-3.663155094
CITED4	-2.853194127	-3.074437845
SSPN	-2.842143308	-2.958511343
BMPER	-2.831041783	-2.586797912
SMPDL3A	-2.818892488	-3.311611897
DENND2C	-2.798282656	-2.595737715
SYTL2	-2.784038915	-5.14524253
SNAI1	-2.767112852	-1.133956399
TNFAIP8L3	-2.744514934	-1.884748834
FBXO32	-2.741882206	-4.257330649
PLD2	-2.687559604	-2.669529515
MIPOL1	-2.678454205	-3.461690355
DDIT4L	-2.636492314	-3.273675802
ITGA8	-2.634776665	-1.700761282
NR1H3	-2.633790054	-2.512860462
HIBADH	-2.596026311	-2.481823028
MDFI	-2.58654607	-2.858403902
PECI	-2.584292175	-2.404978976
COPZ2	-2.571744547	-1.424691709
TMEM26	-2.568361625	-2.118568154
C1orf113	-2.557503412	-2.393016384
SLC13A3	-2.547295644	-2.090526983
IMPA2	-2.529106094	-3.653636386
TMEM204	-2.526005165	-1.699061085
CCDC102B	-2.490260853	-1.312432148
IRX3	-2.479158668	-2.940255575
SLC38A6	-2.453751247	-1.631223655
C1QTNF5	-2.435954514	-3.390450745
HHIPL1	-2.413947089	-1.400613509
C21orf63	-2.389293888	-2.570457714
APOL4	-2.346710733	-2.24217477
GPRC5B	-2.343715663	-8.206296234
ANKRD35	-2.329550354	-1.746755157
ERG	-2.327770509	-3.461522943
MAN1C1	-2.32220856	-1.545203118
FBXO2	-2.320454204	-1.755862404
SCN1B	-2.301943868	-1.625929209
RAPGEF3	-2.289205649	-4.672807712
C17orf58	-2.284840938	-1.471960485
FAM62C	-2.249543604	-1.405822585
STBD1	-2.223316556	-3.025177167
BDH2	-2.211305835	-1.529715315
HFE	-2.206308486	-1.912565834
RGL3	-2.196214103	-5.239700395
CRK	-2.195299658	-1.494098184
CALB2	-2.168028442	-1.960137425
CTNS	-2.145769771	-1.421872661
NQO2	-2.142834843	-2.643948306
HSPB2	-2.124338329	-1.111288135
MEOX2	-2.103017552	-1.300509283
KCNE3	-2.102654663	-4.873856979
COBLL1	-2.094888349	-6.796616668
LCAT	-2.092353952	-1.589959228
HSD17B14	-2.089041477	-1.527940498
BTC	-2.051202476	-2.358846201
FSTL3	-2.040016988	-2.338291504
ATP8B4	-2.028350816	-2.27849812
C17orf79	-2.02794028	-1.478014819
ELOVL3	-2.026537942	-0.735474867
MMRN2	-2.005964606	-3.040995679
PODNL1	-1.980694963	-0.993287832
C11orf70	-1.96656768	-2.14691841
COX7A1	-1.966152073	-0.870369896
CBR3	-1.963880965	-2.594560417
ZFYVE28	-1.956748776	-2.288409663
ATOH8	-1.94111599	-3.036016508
GATA6	-1.928981721	-4.954263955
TRPC4	-1.858765342	-2.051374366
SAT2	-1.819457401	-1.880599548
IL1RL2	-1.790481085	-1.346146898
FAM70B	-1.78681553	-1.514850877
MFSD7	-1.77713905	-1.987598316
BFSP1	-1.76186939	-1.713445729
FBLN7	-1.739362393	-2.224490617
GRTP1	-1.731558781	-2.058697287
NAGS	-1.727035502	-1.216408054
LRRC8E	-1.726783136	-2.087601159
RERG	-1.715712021	-2.423153868
CYB5R1	-1.714982751	-2.904796525
PHYH	-1.709306062	-3.141843075
ANG	-1.699739817	-2.001075823
TDRD6	-1.692123533	-2.911215907
TRPC6	-1.67925083	-2.476680632
NGEF	-1.63185099	-2.671933637
MUC1	-1.592239123	-11.75078683
C7orf10	-1.576745585	-1.412980884
CLDN4	-1.57635278	-8.928835
GNAL	-1.575655113	-2.287137561
GALNT12	-1.529861771	-3.356288507
AUH	-1.51256067	-1.175701761
WFDC1	-1.505443832	-2.01179907
C2orf55	-1.494938285	-3.13294996
GLB1L	-1.480696154	-1.30790568
DAPK2	-1.475908689	-2.39774179
IL34	-1.457330427	-1.465733257
CILP2	-1.418459217	-1.825934344
C1QTNF2	-1.399929082	-1.095600012
IL17RE	-1.342615928	-1.870260637
LMLN	-1.336624565	-1.681177035
MKX	-1.312696003	-2.882888198
RAB20	-1.288259179	-3.401389641
SEC14L4	-1.256031489	-1.62132683
RILP	-1.249363443	-1.624661241
HOXA7	-1.234138863	-1.715475959
DMRTA1	-1.212603424	-1.850178151
C6orf97	-1.211879885	-3.529950128
NPHP1	-1.209234419	-2.497931825
EMCN	-1.195739018	-1.871341211
ARMC4	-1.172615219	-2.00966696
FANK1	-1.142279119	-2.645228138
NTF3	-1.122823527	-1.039375172
BST1	-1.114702878	-2.748435494
LRRC6	-1.092334252	-2.811812875
CYP39A1	-1.081336695	-2.073697336
CCDC48	-1.063435177	-1.68095822
OVGP1	-1.059869511	-1.404768543
PLEK2	-1.021794976	-2.411484974
WBSCR27	-0.937879305	-1.493620355
GGTA1	-0.927626335	-2.093590991
C3AR1	-0.900210057	-3.100673913
SLC2A9	-0.888734851	-1.168363147
ABCG2	-0.881713128	-2.878861541
SLC25A21	-0.868662295	-1.299284359
CFB	-0.84444826	-3.206153468
KCNK15	-0.842602826	-1.611983671
KBTBD10	-0.835221952	-0.934105664
EPHA1	-0.810484856	-2.683318814
C18orf34	-0.763570571	-1.137732737
PLA2G7	-0.705743421	-3.058844942
ABHD7	-0.671317596	-1.556041026
CFHR3	-0.604716036	-1.123689251
RASSF9	-0.587285591	-1.608385317
SH3RF2	-0.585414125	-2.644100244
CLDN7	-0.50975886	-6.66713801
C21orf9	-0.506504987	-0.5697938
DKKL1	-0.495623989	-0.780226737
KLRD1	-0.380751344	-0.951847703
Appendix D Original data for the 99 NBL cases described in Chapter 4
Table D.1 Original data for the 99 NBL cases described in Chapter 4
The data below includes TARGET identifiers (column 1), sequencing technologies used to characterize each sample (columns 2
through 4) and the patients’ clinical characteristics (columns 5 through 8).
TARGET_ID
TARGET-30PALCBW
TARGET-30PASGPY
TARGET-30PAKIPY
TARGET-30PANZPV
TARGET-30PANBMJ
TARGET-30PAIFCS
TARGET-30PAPBJT
TARGET-30PAPSMC
TARGET-30PAKYZS
TARGET-30PAPEFE
TARGET-30PARSHT
TARGET-30PALTYB
TARGET-30PASFRV
TARGET-30PAMVRA
Genome, transcriptome, Illumina sequencing
DNA Index
1
X
1
1
1
1
1.16
1.9
1
Age (days)
MYCN (1 = amp, 2 = not amp, 3 = not done, 4 = unsatisfactory)
2
679
2
Shimada (0 = Unknown; 1 = Neuroblastoma; 2 = Ganglioneuroblastoma, intermixed; 3 = Ganglioneuroma, maturing OR well diff.; 4 = Ganglioneuroblastoma, nodular)
1
2
1972
2
2
X
1
1116
2
2
X
2
1757
2
2
X
1
993
2
2
X
1
1540
2
0
X
1
570
2
4
X
1
1337
1
2
X
1
659
2
3
X
1
990
1
2
1
752
1
2
2
1536
2
1
2
1057
2
2
2
1554
2
2
X
1.771
1
X
Gender (1 = male, 2 = female, 9 = unknown)
1.449
1.09
X
Exome, Illumina sequencing
X
2.02
1.17
X
Genome, Complete Genomics sequencing
TARGET_ID
TARGET-30PANKFE
TARGET-30PAKHCF
TARGET-30PALJUV
TARGET-30PAMCXF
TARGET-30PALSAE
TARGET-30PARDUJ
TARGET-30PARRBU
TARGET-30PAMUTD
TARGET-30PAPTLD
TARGET-30PAPHPE
TARGET-30PAMMXF
TARGET-30PAPKWN
TARGET-30PALPGG
TARGET-30PAKGKH
TARGET-30PAIXRK
TARGET-30PALETP
TARGET-30PAIXIF
TARGET-30PAMYCE
TARGET-30PAICGF
Genome, transcriptome, Illumina sequencing
DNA Index
1
1.19
1
1.17
2
Exome, Illumina sequencing
Gender (1 = male, 2 = female, 9 = unknown)
Age (days)
MYCN (1 = amp, 2 = not amp, 3 = not done, 4 = unsatisfactory)
X
2
1280
2
Shimada (0 = Unknown; 1 = Neuroblastoma; 2 = Ganglioneuroblastoma, intermixed; 3 = Ganglioneuroma, maturing OR well diff.; 4 = Ganglioneuroblastoma, nodular)
2
X
2
839
2
1
X
2
1301
2
2
X
1
765
1
2
X
1
782
2
2
2
1052
2
2
1
1448
1
2
1
825
1
2
1
1244
2
2
X
1
806
2
2
X
1
1661
1
2
X
2
710
1
2
X
2
700
1
0
X
1
882
2
2
X
1
564
1
2
X
2
1466
2
2
X
1
1315
2
2
X
2
1059
2
2
X
2
1278
2
0
X
0
X
Genome, Complete Genomics sequencing
1.24
1.09
X
X
1.66
1.13
1.07
1.6
1
1
1
1
1
1
1.05
TARGET_ID
TARGET-30PAIMDT
TARGET-30PALBFW
TARGET-30PAITEG
TARGET-30PAPNEP
TARGET-30PAKHHB
TARGET-30PAMDAL
TARGET-30PAIVHE
TARGET-30PAKZRE
TARGET-30PAPBGH
TARGET-30PAMZGT
TARGET-30PAISSH
TARGET-30PANRRW
TARGET-30PANNMS
TARGET-30PAMBAC
TARGET-30PAMZMG
TARGET-30PAHYWC
TARGET-30PARIRD
TARGET-30PAMVAG
TARGET-30PALXMM
Genome, transcriptome, Illumina sequencing
DNA Index
1.91
1
1
1
1
0
1.06
1
1
1
1.45
X
X
Exome, Illumina sequencing
Genome, Complete Genomics sequencing
Gender (1 = male, 2 = female, 9 = unknown)
Age (days)
MYCN (1 = amp, 2 = not amp, 3 = not done, 4 = unsatisfactory)
X
2
1408
2
Shimada (0 = Unknown; 1 = Neuroblastoma; 2 = Ganglioneuroblastoma, intermixed; 3 = Ganglioneuroma, maturing OR well diff.; 4 = Ganglioneuroblastoma, nodular)
0
X
1
804
2
2
X
1
1034
2
0
X
1
720
1
2
X
1
687
1
2
X
2
1222
1
2
X
1
1123
2
2
X
2
1083
1
2
X
2
1074
2
2
X
1
602
1
4
X
2
656
1
0
1
1730
2
2
2
1080
2
2
X
1
945
1
2
X
2
614
1
2
X
1
704
1
0
1
733
1
2
X
1
663
2
2
X
1
1475
2
2
1
2
1.13
1
1
X
2.08
1
1.28
TARGET_ID
TARGET-30PANYBL
TARGET-30PALWVJ
TARGET-30PALAKM
TARGET-30PAIPGU
TARGET-30PASDZJ
TARGET-30PAPSKM
TARGET-30PANUKV
TARGET-30PASCKI
TARGET-30PAPTMM
TARGET-30PAITCI
TARGET-30PAIXNC
TARGET-30PALZZV
TARGET-30PAKZRF
TARGET-30PALHVD
TARGET-30PALUDH
TARGET-30PAILNU
TARGET-30PANBSP
TARGET-30PALTEG
TARGET-30PAKJRE
Genome, transcriptome, Illumina sequencing
DNA Index
1.85
1.88
1.06
1.04
X
Exome, Illumina sequencing
Genome, Complete Genomics sequencing
Gender (1 = male, 2 = female, 9 = unknown)
Age (days)
MYCN (1 = amp, 2 = not amp, 3 = not done, 4 = unsatisfactory)
X
1
1216
2
Shimada (0 = Unknown; 1 = Neuroblastoma; 2 = Ganglioneuroblastoma, intermixed; 3 = Ganglioneuroma, maturing OR well diff.; 4 = Ganglioneuroblastoma, nodular)
0
X
1
1544
2
2
X
2
1758
2
2
X
2
898
2
0
2
902
1
2
1
1330
2
2
1
1486
1
2
X
2
911
2
2
X
2
559
1
2
X
1
728
2
1
X
1
723
2
2
X
2
699
1
2
X
1
1285
4
2
X
2
1634
2
2
X
2
1105
2
2
X
1
1683
2
0
X
2
1064
2
2
X
1
1367
2
2
X
1
1100
2
2
1
X
1.4
1
X
1
1
1.92
1.13
1.79
1
1.17
1.89
1.08
1
1.96
1.84
TARGET_ID
TARGET-30PAPBZI
TARGET-30PAIXNV
TARGET-30PALEVG
TARGET-30PANPVI
TARGET-30PASLGS
TARGET-30PALZRG
TARGET-30PALXHW
TARGET-30PALWIP
TARGET-30PALIIN
TARGET-30PALZSL
TARGET-30PALFPI
TARGET-30PASCLP
TARGET-30PARGUX
TARGET-30PAKFUY
TARGET-30PANIPC
TARGET-30PAMMWD
TARGET-30PANRHJ
TARGET-30PAREGK
TARGET-30PANBCI
Genome, transcriptome, Illumina sequencing
DNA Index
1.99
1.58
1.3
1.11
X
Gender (1 = male, 2 = female, 9 = unknown)
Age (days)
MYCN (1 = amp, 2 = not amp, 3 = not done, 4 = unsatisfactory)
X
1
1710
2
Shimada (0 = Unknown; 1 = Neuroblastoma; 2 = Ganglioneuroblastoma, intermixed; 3 = Ganglioneuroma, maturing OR well diff.; 4 = Ganglioneuroblastoma, nodular)
2
X
2
644
2
2
X
1
666
1
2
X
1
1289
2
2
1
1.23
1.2
1.68
1
Genome, Complete Genomics sequencing
1
1218
2
2
X
2
577
1
2
X
2
1131
2
2
X
1
1232
2
2
X
1
1279
2
2
X
1
1564
1
2
X
1
1152
2
4
1
644
1
2
2
1704
2
2
X
1
1194
2
2
X
1
921
1
4
X
1
812
1
2
X
2
1117
2
2
1
568
2
2
1
753
1
2
X
1.122
1.16
X
Exome, Illumina sequencing
1.908
X
1
1.73
1.68
1
1.51
X
1.46
1
X
TARGET_ID
Genome, transcriptome, Illumina sequencing
TARGET-30PANYGR
TARGET-30PAINLN
TARGET-30PAKXDZ
TARGET-30PANZVU
TARGET-30PASAZJ
TARGET-30PALNLU
TARGET-30PALAKE
TARGET-30PALJPX
TARGET-30PAPPKJ
X
DNA Index
1
2.41
1
1.2
Exome, Illumina sequencing
Gender (1 = male, 2 = female, 9 = unknown)
Age (days)
MYCN (1 = amp, 2 = not amp, 3 = not done, 4 = unsatisfactory)
X
1
993
2
Shimada (0 = Unknown; 1 = Neuroblastoma; 2 = Ganglioneuroblastoma, intermixed; 3 = Ganglioneuroma, maturing OR well diff.; 4 = Ganglioneuroblastoma, nodular)
2
X
1
1404
2
0
X
2
828
1
2
X
1
1713
2
2
1
1764
2
2
X
1
1439
2
2
X
1
860
1
3
X
1
860
1
2
X
1
1731
2
2
X
1.107
1.49
1
1.94
1.16
Genome, Complete Genomics sequencing
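The numeric codes used in the Gender, MYCN and Shimada columns of Table D.1 can be translated back into labels with simple lookup tables. The following is a minimal sketch in Python, not part of the original analysis. The names `ClinicalRecord` and `decode` are hypothetical, "amp" is expanded to "amplified" as an assumption, and the example values are illustrative rather than taken from a specific patient.

```python
from dataclasses import dataclass

# Lookup tables transcribed from the column legends of Table D.1.
GENDER = {1: "male", 2: "female", 9: "unknown"}
# Assumption: "amp" in the legend is read as "amplified".
MYCN = {1: "amplified", 2: "not amplified", 3: "not done", 4: "unsatisfactory"}
SHIMADA = {
    0: "Unknown",
    1: "Neuroblastoma",
    2: "Ganglioneuroblastoma, intermixed",
    3: "Ganglioneuroma, maturing OR well diff.",
    4: "Ganglioneuroblastoma, nodular",
}


@dataclass
class ClinicalRecord:
    """Decoded clinical fields for one case (hypothetical container)."""
    gender: str
    age_days: int
    mycn: str
    shimada: str


def decode(gender_code: int, age_days: int, mycn_code: int, shimada_code: int) -> ClinicalRecord:
    """Translate the numeric codes of one table row into readable labels."""
    return ClinicalRecord(
        gender=GENDER[gender_code],
        age_days=age_days,
        mycn=MYCN[mycn_code],
        shimada=SHIMADA[shimada_code],
    )


# Illustrative values only, not a specific row of the table.
example = decode(gender_code=1, age_days=700, mycn_code=1, shimada_code=2)
print(example)
```

An unknown code raises a `KeyError`, which is probably preferable to silently mislabeling a case.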
Appendix E Variant calls detected in the 99 tumor/normal pairs
The variant calls (MAF files) for the 99 tumor/normal pairs described in Chapter 4 have been
submitted to the database of Genotypes and Phenotypes (dbGaP) and are available under the
study accession number phs000218.v3.p1.
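As an illustration of how variant calls in MAF form might be summarized once obtained, the sketch below counts non-silent variants per gene. It assumes a tab-delimited file with the standard MAF columns `Hugo_Symbol`, `Variant_Classification` and `Tumor_Sample_Barcode`. The embedded excerpt, its gene and sample names, and the function name `count_nonsilent` are hypothetical placeholders, not data from the study.

```python
import csv
import io
from collections import Counter

# Hypothetical three-variant excerpt; real MAF files carry many more columns.
SAMPLE_MAF = (
    "#comment lines such as a version header start with '#'\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "GENE_A\tMissense_Mutation\tSAMPLE_1\n"
    "GENE_A\tSilent\tSAMPLE_2\n"
    "GENE_B\tNonsense_Mutation\tSAMPLE_1\n"
)


def count_nonsilent(handle) -> Counter:
    """Count variants per gene, skipping comment lines and silent calls."""
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    counts = Counter()
    for row in reader:
        if row["Variant_Classification"] != "Silent":
            counts[row["Hugo_Symbol"]] += 1
    return counts


counts = count_nonsilent(io.StringIO(SAMPLE_MAF))
print(sorted(counts.items()))
```

Passing an open file handle instead of the `io.StringIO` wrapper would apply the same counting to a downloaded file.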
Appendix F Chromatin remodeling and MAPK pathway gene lists used in Chapter 4
Table F.1 Chromatin remodeling and MAPK pathway gene lists used in Chapter 4
The list of chromatin remodeling genes was compiled from previously published work
[336,106]. The MAPK genes are MAPK pathway members (KEGG hsa04010) that are also
found in the Cancer Gene Census [7]. The genes are sorted alphabetically within each
category.
Gene symbol	Function
ACTL6A	Chromatin remodeling
ARID1A	Chromatin remodeling
ARID1B	Chromatin remodeling
ASH1L	Chromatin remodeling
BPTF	Chromatin remodeling
BRCA1	Chromatin remodeling
BRCA2	Chromatin remodeling
BRD2	Chromatin remodeling
BRD4	Chromatin remodeling
BRD7	Chromatin remodeling
CHD3	Chromatin remodeling
CHD4	Chromatin remodeling
CHD6	Chromatin remodeling
CREBBP	Chromatin remodeling
DOT1L	Chromatin remodeling
EP300	Chromatin remodeling
EXT1	Chromatin remodeling
EZH1	Chromatin remodeling
EZH2	Chromatin remodeling
GATAD2B	Chromatin remodeling
H1F0	Chromatin remodeling
H1FX	Chromatin remodeling
H2AFY	Chromatin remodeling
H3F3A	Chromatin remodeling
HDAC4	Chromatin remodeling
HDAC6	Chromatin remodeling
HDAC7	Chromatin remodeling
HDAC9	Chromatin remodeling
HIST1H2AG	Chromatin remodeling
HIST1H2AL	Chromatin remodeling
HIST1H2BI	Chromatin remodeling
HIST1H2BL	Chromatin remodeling
HIST1H3A	Chromatin remodeling
HIST1H3F	Chromatin remodeling
HIST1H4I	Chromatin remodeling
HIST2H2BF	Chromatin remodeling
HIST3H2BB	Chromatin remodeling
HTATIP2	Chromatin remodeling
IKZF3	Chromatin remodeling
IKZF4	Chromatin remodeling
INF2	Chromatin remodeling
ING3	Chromatin remodeling
JAK2	Chromatin remodeling
KAT2A	Chromatin remodeling
KAT2B	Chromatin remodeling
KDM2A	Chromatin remodeling
KDM3B	Chromatin remodeling
KDM4A	Chromatin remodeling
KDM4D	Chromatin remodeling
KDM5A	Chromatin remodeling
KDM5B	Chromatin remodeling
KDM6A	Chromatin remodeling
MEF2B	Chromatin remodeling
MLL1	Chromatin remodeling
MLL2	Chromatin remodeling
MLL3	Chromatin remodeling
MLL4	Chromatin remodeling
MLL5	Chromatin remodeling
MYST3	Chromatin remodeling
MYST4	Chromatin remodeling
NCOA1	Chromatin remodeling
NCOA3	Chromatin remodeling
NCOA5	Chromatin remodeling
NCOR1	Chromatin remodeling
NCOR2	Chromatin remodeling
NSD1	Chromatin remodeling
NUP98	Chromatin remodeling
PADI4	Chromatin remodeling
PAX5	Chromatin remodeling
PPP1CA	Chromatin remodeling
PRDM1	Chromatin remodeling
PRDM14	Chromatin remodeling
PRDM15	Chromatin remodeling
PRDM16	Chromatin remodeling
PRDM2	Chromatin remodeling
PRDM4	Chromatin remodeling
PRDM5	Chromatin remodeling
PYGO1	Chromatin remodeling
PYGO2	Chromatin remodeling
SETD1A	Chromatin remodeling
SETD2	Chromatin remodeling
SETD5	Chromatin remodeling
SETD8	Chromatin remodeling
SMARCA2	Chromatin remodeling
SMARCA4	Chromatin remodeling
SMARCC1	Chromatin remodeling
SMYD1	Chromatin remodeling
SUV420H1	Chromatin remodeling
UTX	Chromatin remodeling
AKT1	MAPK pathway oncogene
AKT2	MAPK pathway oncogene
BRAF	MAPK pathway oncogene
DAXX	MAPK pathway oncogene
DDIT3	MAPK pathway oncogene
EGFR	MAPK pathway oncogene
ELK4	MAPK pathway oncogene
FGFR1	MAPK pathway oncogene
FGFR2	MAPK pathway oncogene
FGFR3	MAPK pathway oncogene
HRAS	MAPK pathway oncogene
JUN	MAPK pathway oncogene
KRAS	MAPK pathway oncogene
LILRB1	MAPK pathway oncogene
MAP2K4	MAPK pathway oncogene
MAPK10	MAPK pathway oncogene
MAPK9	MAPK pathway oncogene
MYC	MAPK pathway oncogene
NF1	MAPK pathway oncogene
NFKB2	MAPK pathway oncogene
NRAS	MAPK pathway oncogene
NTRK1	MAPK pathway oncogene
PDGFB	MAPK pathway oncogene
PDGFRA	MAPK pathway oncogene
PDGFRB	MAPK pathway oncogene
PTPN11	MAPK pathway oncogene
PTPN13	MAPK pathway oncogene
RAF1	MAPK pathway oncogene
TP53	MAPK pathway oncogene
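The MAPK category of Table F.1 is described as the set of KEGG hsa04010 pathway members that also appear in the Cancer Gene Census, sorted alphabetically. A minimal sketch of that intersection in Python is given below. The function name `shared_genes` is hypothetical, and the two input lists are small illustrative stand-ins (including the placeholder symbols `GENE_X` and `GENE_Y`), not the full pathway or census contents.

```python
def shared_genes(pathway_members, census_genes):
    """Return gene symbols present in both collections, sorted alphabetically."""
    return sorted(set(pathway_members) & set(census_genes))


# Illustrative stand-ins: three symbols taken from Table F.1 plus one
# placeholder per list that is absent from the other list.
kegg_mapk_subset = ["KRAS", "BRAF", "MAPK9", "GENE_X"]
census_subset = ["MAPK9", "GENE_Y", "BRAF", "KRAS"]

mapk_list = shared_genes(kegg_mapk_subset, census_subset)
print(mapk_list)
```

Using sets makes the result independent of the order and of any duplicate symbols in the inputs, and the final `sorted` call reproduces the alphabetical ordering used in the table.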