Clinical proteomics in breast cancer

ISBN/EAN: 978-90-393-4994-6
© 2009 Marie-Christine Gast, Den Haag
Cover design: Initium - grafisch en interactief, Utrecht, The Netherlands
Printed by: Gildeprint Drukkerijen BV, Enschede, The Netherlands
Clinical proteomics in breast cancer
(with a summary in Dutch)
THESIS
submitted for the degree of Doctor
at Utrecht University,
on the authority of the Rector Magnificus, Prof. Dr J.C. Stoof,
in accordance with the decision of the Doctorate Board,
to be defended in public
on Thursday 12 February 2009 at 12.45 p.m.
by
Maria Christine Willemine Gast
born on 23 February 1978
in Woerden
Supervisors:
Prof. dr J.H. Beijnen
Prof. dr J.H.M. Schellens
The laboratory research described in this thesis was performed at the Department of
Pharmacy & Pharmacology, Slotervaart Hospital / The Netherlands Cancer Institute,
Amsterdam, The Netherlands
The research in this thesis was financially supported by the Dutch Cancer Society
(project NKI 2005-3421).
Publication of this thesis was financially supported by:
Dutch Cancer Society, Amsterdam, The Netherlands
The Netherlands Laboratory for Anticancer Drug Formulation, Amsterdam, The
Netherlands
Genzyme Nederland, Naarden, The Netherlands
Contents
Preface
Chapter 1  Introduction
1.1  Clinical proteomics in breast cancer: a review
Chapter 2  Technical aspects
2.1  Comparing the old and new generation SELDI-TOF MS: Consequence for serum protein profiling
2.2  Serum protein profiling using SELDI-TOF MS: Influence of sample storage duration
Chapter 3  Protein profiling of serum
3.1  Serum protein profiling for diagnosis of breast cancer using SELDI-TOF MS
3.2  SELDI-TOF MS serum protein profiles in breast cancer: Assessment of robustness and validity
3.3  Haptoglobin phenotype is not a predictor of recurrence free survival in high-risk primary breast cancer patients
3.4  Post-operative proteomic profiles may predict recurrence free survival in high-risk primary breast cancer
Chapter 4  Protein profiling of tissue
4.1  Detection of breast cancer by SELDI-TOF MS tissue and serum protein profiling
Conclusions and perspectives
Summary
Samenvatting (summary in Dutch)
Acknowledgements
Curriculum Vitae
List of publications
Preface
Breast cancer imposes a significant healthcare burden on women worldwide. For
example, in the USA, breast cancer currently is estimated to be the most commonly
diagnosed neoplasm in women, accounting for more than a quarter of all new female
cancer cases (1). In addition, preceded only by lung cancer, breast cancer is at present
the second leading cause of cancer deaths (1). Despite the substantial progress made in
cancer therapy, the five-year survival rate of breast cancer still is inversely proportional
to its stage at the time of diagnosis (2). Hence, short of prevention, detection of breast
cancer at an early, still curable, stage would offer the best route to decrease its mortality
rates. However, since many patients present with advanced disease, the currently
applied diagnostic screening tools (e.g., mammography) obviously do not suffice for
adequate breast cancer diagnosis. In addition, despite the survival benefit achieved by
locoregional treatment and adjuvant systemic therapy, many breast cancer patients will
eventually develop metastatic relapse and die (3), while a small percentage of patients
would have survived without these treatment modalities. Evidently, the currently
applied prognostic and predictive markers (e.g., age, hormone receptor status) lack
adequate performance as well. Hence, better markers for early diagnosis, accurate
prognosis and treatment prediction, applied either individually or in conjunction with
existing modalities, are warranted to improve breast cancer care.
Although (breast) cancer is, for a large part, a genetic disease, it is currently understood
that gene analysis by itself does not provide a complete picture of the actual state of an
individual. Instead, the functional “end-units” of the genome, the proteome, will offer a
more dynamic and accurate reflection of a biological status. The clinical relevance of
these proteins as cancer biomarkers is augmented by their ease of access in blood, being
a readily accessible biological matrix that allows for repeated collection. In fact, several
blood proteins are already in use as breast cancer markers (e.g., Cancer Antigen 15.3 and
27.29) (4). Their lack of adequate performance, however, precludes their use as singular
breast cancer markers. Conversely, a panel of protein markers is expected to better
reflect breast cancer complexity, yielding improved sensitivities and specificities. The
search for this biomarker panel has been boosted by recent developments in mass
spectrometry, resulting in, among others, the surface-enhanced laser desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF MS) technology (5). Enabling the simultaneous
detection of a large part of the (blood) proteome in a high-throughput fashion, this
technology holds promise as a screening tool for discovery of cancer biomarkers. A
landmark paper in this respect was written by Petricoin et al. (6), providing the first
report on SELDI-TOF MS serum protein profiling for identification of (ovarian) cancer
patients.
The objectives of this thesis were the evaluation and application of SELDI-TOF MS
protein profiling for detection of serum and tissue protein profiles that could yield new
biomarkers for diagnosis and prognosis of breast cancer. First, we provide an overview
of protein profiling studies performed in breast cancer, and evaluate the potential of
proteins identified by this research for clinical use as breast cancer biomarkers (Chapter
1). Subsequently, both technical (Chapter 2.1) and pre-analytical (Chapter 2.2) aspects
related to protein profiling research were investigated. The SELDI-TOF MS technology
was then used for protein profiling of serum, searching for novel markers that can be
applied in diagnosis (Chapter 3.1) or prognosis (Chapter 3.3 and 3.4) of breast cancer,
and determining the reproducibility of diagnostic serum protein profiles (Chapter 3.2).
Finally, to augment insight into the pathophysiological mechanisms associated with, or
underlying, breast cancer, the SELDI-TOF MS technology was applied in protein
profiling of breast tissue (Chapter 4).
References
(1) Jemal A, Siegel R, Ward E, Hao Y, Xu J, Murray T et al. Cancer statistics, 2008. CA Cancer J Clin 2008; 58(2):71-96.
(2) Ries L, Melbert D, Krapcho M, Stinchcomb D, Howlader N, Horner M et al. SEER Cancer Statistics Review, 1975-2005, National Cancer Institute. Bethesda, MD, http://seer.cancer.gov/csr/1975_2005/, based on November 2007 SEER data submission, posted on the SEER website. 2008.
(3) Polychemotherapy for early breast cancer: an overview of the randomised trials. Early Breast Cancer Trialists' Collaborative Group. Lancet 1998; 352(9132):930-942.
(4) Harris L, Fritsche H, Mennel R, Norton L, Ravdin P, Taube S et al. American Society of Clinical Oncology 2007 update of recommendations for the use of tumor markers in breast cancer. J Clin Oncol 2007; 25(33):5287-5312.
(5) Hutchens TW, Yip TT. New desorption strategies for the mass spectrometric analysis of macromolecules. Rapid Commun Mass Spectrom 1993; 7:576-580.
(6) Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 2002; 359(9306):572-577.
Chapter 1
Introduction
Chapter 1.1
Clinical proteomics in breast cancer: a review
Marie-Christine W. Gast
Jan H.M. Schellens
Jos H. Beijnen
Breast Cancer Res Treatm 2008; in press
Abstract
Breast cancer imposes a significant healthcare burden on women worldwide. Early
detection is of paramount importance in reducing mortality, yet the diagnosis of breast
cancer is hampered by the lack of an adequate detection method. In addition, better
breast cancer prognostication may improve selection of patients eligible for adjuvant
therapy. Hence, new markers for early diagnosis, accurate prognosis and prediction of
response to treatment are warranted to improve breast cancer care. Since proteomics
can bridge the gap between the genetic alterations underlying cancer and cellular
physiology, much is expected from proteome analyses for the detection of better protein
biomarkers. Recent technical advances in mass spectrometry, such as matrix-assisted
laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF MS) and its
variant surface-enhanced laser desorption/ionisation (SELDI-) TOF MS, have enabled
high-throughput proteome analysis. In the current review, we give a comprehensive
overview of the results of expression proteomics (i.e., protein profiling) research
performed in breast cancer using these two platforms. Many protein peaks have been
reported to bear significant diagnostic, prognostic or predictive value; however, only a
few candidate markers have been structurally identified so far. In addition, although of
pivotal importance in preventing overfitting of data and systematic bias by pre-analytical parameters, validation of biomarker candidates by other, quantitative,
methods and/or in new populations is very limited. Moreover, none of the identified
candidate biomarkers has been investigated for its utility as a breast cancer marker in
large, prospective, clinical settings. As such, the candidate biomarkers discussed in this
overview have not been validated sufficiently to be used for clinical patient care.
Nonetheless, regarding the promising results up to now, MALDI- and SELDI-TOF MS
protein profiling studies could eventually fulfil the great promise that protein
biomarkers have for improving cancer patient outcome, provided that these studies are
performed with adequate statistical power and analytical rigour.
Introduction
Breast cancer imposes a significant healthcare burden on women worldwide. For
example, in the USA, breast cancer is estimated to be the most commonly diagnosed
neoplasm in women in 2008, as it will account for 26% of all new female cancer cases
(1). In addition, preceded only by lung cancer, breast cancer is expected to be the
second leading cause of USA cancer deaths in 2008 (1). The five-year survival rates of
breast cancer decrease from 98% for localised disease to 26% for late stage disease (2).
Hence, short of prevention, detection of breast cancer at an early, still curable stage
would offer the best route to decrease its mortality rates. However, since only 63% of
breast cancers are still confined to the breast at the time of diagnosis (1), the currently
applied diagnostic screening tools (e.g., mammography) obviously do not suffice for
adequate breast cancer diagnosis. In addition, despite the survival benefit achieved by
locoregional treatment and adjuvant systemic therapy, 30-50% of breast cancer patients
will eventually develop metastatic relapse and die (3), while a small percentage of
patients would have survived without these treatment modalities. Evidently, the
currently applied prognostic and predictive markers (e.g., age, hormone receptor status)
lack adequate performance as well. Hence, better markers for early diagnosis, accurate
prognosis and prediction of response to treatment are warranted to improve breast
cancer care.
We now comprehend that cancer arises from successive genetic changes, by which a
number of cellular processes, including growth control, senescence, apoptosis,
angiogenesis, and metastasis, are altered (4;5). Consequently, researchers initially
searched for markers by employing genomic and transcriptomic approaches, providing
new biomarkers (e.g., (6-9)) and expanding our insight into the genetic basis of cancer.
It is, however, currently understood that gene analysis by itself provides an incomplete
picture. Due to alternative splicing of both mRNA and proteins, combined with more
than 100 unique post-translational modifications, one gene can give rise to multiple
protein species (10). Hence, compared to the genome, the proteome can provide a more
dynamic and accurate reflection of both the intrinsic genetic programme of the cell and
the impact of its immediate environment (11). Since proteome analysis can provide the
link between gene sequence and cellular physiology (12), proteomics is expected to
complement gene analyses for evaluating disease development, prognosis, and response
to treatment (13).
Until recently, the search for novel protein biomarkers has been dominated by two-dimensional gel electrophoresis (14), a major disadvantage of which is its lack of real
high-throughput capability. However, recent advances in analytical technologies, such
as protein microarrays and mass spectrometry (MS), have enabled large-scale proteomic
analyses (15). Due to their relative simplicity of sample preparation, high analytical
sensitivity and speed of data acquisition, two MS-based technologies in particular, i.e.,
matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) MS (16) and its
variant surface-enhanced laser desorption/ionisation (SELDI-) TOF MS (17;18) have
been widely deployed for cancer biomarker discovery (19). In both laser
desorption/ionisation (LDI) platforms, biological samples (e.g., serum, tissue lysate) are
co-crystallised with an energy absorbing matrix on a sample probe surface. Subsequent
irradiation with brief laser pulses sublimates and ionises the proteins out of their
crystalline matrix, after which an electric field migrates the charged proteins to the
time-of-flight mass analyser. Herein, proteins are separated based on their mass, as the
time to detector impact (TOF) increases with the square root of the protein mass per charge (m/z ∝ t²). The two
LDI platforms differ in their sample probe surfaces. In MALDI, the probe surface
merely presents the sample to the mass spectrometer, warranting off-line sample
fractionation and clean-up to produce usable MS signals. In contrast, the probe surfaces
utilised by SELDI are comprised of various chromatographic surfaces, enabling their
active role in sample fractionation (Figure 1).
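The relation between flight time and m/z can be illustrated with a minimal sketch of an idealised linear TOF analyser. The acceleration voltage (20 kV) and flight-tube length (1 m) are assumptions chosen for illustration only, not instrument settings from the studies discussed here; the ions mirror the labels a, b and c of Figure 1 (a large singly charged ion, the same ion doubly charged, and a smaller singly charged ion).

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge (C)
DALTON = 1.66053906660e-27   # one dalton (kg)


def flight_time(mass_da, charge=1, voltage=20_000.0, tube_length=1.0):
    """Flight time (s) of an ion in an idealised linear TOF analyser.

    voltage and tube_length are assumed, illustrative values.
    """
    mass_kg = mass_da * DALTON
    # Energy gained in the electric field: z * e * U = 1/2 * m * v^2
    velocity = math.sqrt(2 * charge * E_CHARGE * voltage / mass_kg)
    return tube_length / velocity


def mz_from_time(t, voltage=20_000.0, tube_length=1.0):
    """Invert the relation: m/z (Da per charge) = constant * t^2."""
    constant = 2 * E_CHARGE * voltage / (tube_length ** 2 * DALTON)
    return constant * t ** 2


# a: large, singly charged; b: same mass, doubly charged; c: smaller, singly charged
for label, mass, z in [("a", 8939, 1), ("b", 8939, 2), ("c", 4292, 1)]:
    t = flight_time(mass, z)
    print(f"ion {label}: m/z = {mass / z:7.1f}, flight time = {t * 1e6:6.2f} microseconds")
```

In this idealised model, doubling the charge or halving the mass shortens the flight time by a factor of √2, which is why the m/z axis of a spectrum is obtained from the square of the measured time.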
In the current overview, we focus on the expression proteomics (i.e., protein profiling)
studies performed using the two LDI platforms in the search for novel breast cancer
biomarkers. We will discuss the studies performed thus far for discovery of diagnostic,
prognostic and predictive biomarkers, and evaluate the potential of the discriminating
proteins identified in this research for clinical use as breast cancer biomarkers.
Diagnostic protein profiling studies
Short of prevention, detection at an early stage remains the best route to decrease breast
cancer related mortality. Hence, the majority of MALDI/SELDI protein profiling studies
performed in breast cancer has searched for novel diagnostic markers (Table 1). All
diagnostic protein profiling studies were performed in vivo, investigating various
biological matrices, including serum, plasma and tissue, but also nipple aspirate fluid,
ductal lavage fluid and saliva.
Protein profiling of tissue
As tissue proteins will reflect the earliest changes caused by the successive genetic
mutations that lead to breast cancer, it has been hypothesised that the concentration of
potential biomarkers is highest in the tumour and its immediate microenvironment
(19). Although tissue provides an invaluable sample source, tissue sampling through
biopsies is highly invasive, thereby limiting the number of diagnostic tissue protein
profiling studies performed thus far.
Analysis of tumour tissue lysates by SELDI-TOF MS revealed several peaks that were
significantly associated with lymph node status (20) or cancer subtype (i.e., lobular and
ductal carcinoma) (21). However, the search for tumour originating proteins can be
complicated by the high cellular heterogeneity of whole tumour tissue specimens. This
can be reduced by laser capture microdissection (LCM), enabling selective capture of a
specific subset of cells (22). Following microdissection, captured cells can be mounted
directly on a MALDI target, thereby preserving their spatial conformation for imaging
MS (23;24). Using LCM, Umar et al. (24) detected 9 differentially expressed tryptic
peptides (not structurally identified) following analysis of stromal and tumour cells
collected from five tissue specimens. In addition, Sanders et al. (23) identified ubiquitin
and S100-A8 to be decreased in tumour (n = 122) compared to normal tissue (n = 167),
whereas S100-A6 was found increased. Their split-sample approach allowed a successful
within-study validation of the three potential markers (23). As both ubiquitin and S100-A6 were also found to decrease in lysates of human breast cancer cell lines following
chemotherapy induced apoptosis (25), these proteins may provide insight into the
pathogenesis of breast cancer upon further investigation.
Despite the clear potential of (tumour) tissue to yield cancer-specific diagnostic
biomarkers, their routine clinical application is seriously hampered by the intricacies
associated with tissue sampling. Although this can be avoided by assessment of tumour-derived markers in more easily accessible biological matrices such as serum, this type of
validation has not been performed in breast cancer yet.
Protein profiling of serum and plasma
Since whole blood is considered to provide a dynamic reflection of physiological and
pathological status, human plasma and serum represent the most extensively studied
biological matrices in the quest for (breast) cancer biomarkers (26). Constantly
perfusing and percolating the human body, the blood compartment holds a protein-rich information archive (27). Besides the expected circulatory proteins, this archive
also contains specific tumour-secreted proteins, normal tissue- and plasma-proteins
digested by tumour-secreted proteases, and proteins produced by local and distant
responses to the tumour (11;28;29). Moreover, whole blood is an easy to sample, readily
accessible matrix that allows repeated collection, thereby augmenting the clinical
relevance of candidate blood-borne biomarkers (28;30).
Several MALDI-TOF MS and SELDI-TOF MS peaks (not structurally identified) have
been reported to differentiate between serum or plasma of breast cancer patients,
patients with benign breast disease and/or healthy controls (31-36). Since a small
percentage (7 to 10%) of breast cancers is attributable to hereditary syndromes (e.g.,
BRCA-1, -2 mutations), Becker et al. (37) investigated whether the BRCA-1 mutation
was reflected by the serum proteome. Multiple SELDI-TOF MS peaks were significantly
different in expression between breast cancer patients with and without the BRCA-1
mutation (37). However, as none of these peaks were structurally identified, their
association with the BRCA-1 gene remains unclear. Moreover, none of the peaks reported
by these studies have been validated by analysis of an independent sample set. Yet
validation is of utmost importance to ascertain reproducibility and prevent systematic
bias and overfitting of data. This is highlighted by a study of our group (38), in which
the potential markers for breast cancer and lymph node status, reported by Vlahou et al.
(39) and Laronga et al. (34), respectively, could not be confirmed following analysis of
an independent sample set. In contrast, Belluco et al. (40) report excellent performance
of their seven-peak classifier (not structurally identified) following validation by an
independent sample set analysed 14 months after their initial discovery study.
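Why validation in an independent sample set guards against overfitting can be illustrated with a small simulation. The sketch below uses purely simulated noise, not data from any study cited here; the sample sizes, number of peaks and random seed are arbitrary assumptions. It selects the single peak and threshold that best separate two groups in a small discovery set, and then applies that rule to a larger independent set.

```python
import random


def accuracy(samples, labels, peak, threshold, direction):
    """Fraction of samples classified correctly by a one-peak threshold rule."""
    correct = 0
    for sample, label in zip(samples, labels):
        predicted_cancer = direction * (sample[peak] - threshold) >= 0
        correct += predicted_cancer == label
    return correct / len(samples)


def best_single_peak_rule(samples, labels):
    """Exhaustively search all peaks, thresholds and directions."""
    best = (0.0, None, None, None)
    for peak in range(len(samples[0])):
        for threshold in {s[peak] for s in samples}:
            for direction in (1, -1):
                acc = accuracy(samples, labels, peak, threshold, direction)
                if acc > best[0]:
                    best = (acc, peak, threshold, direction)
    return best


def simulate_noise(n_per_group, n_peaks, rng):
    """Peak intensities that carry no information about the group label."""
    samples = [[rng.gauss(0.0, 1.0) for _ in range(n_peaks)]
               for _ in range(2 * n_per_group)]
    labels = [True] * n_per_group + [False] * n_per_group  # True = cancer
    return samples, labels


rng = random.Random(2009)                            # arbitrary seed
train_x, train_y = simulate_noise(20, 500, rng)      # small discovery set
valid_x, valid_y = simulate_noise(300, 500, rng)     # independent set

train_acc, peak, threshold, direction = best_single_peak_rule(train_x, train_y)
valid_acc = accuracy(valid_x, valid_y, peak, threshold, direction)
print(f"discovery accuracy:  {train_acc:.2f}")
print(f"validation accuracy: {valid_acc:.2f}")
```

Because the rule is picked from hundreds of candidate peaks, it appears to discriminate well in the discovery set even though the data contain no signal, whereas its accuracy in the independent set stays close to chance.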
Figure 1
Schematic representation of the MALDI- and SELDI-TOF MS principle (adapted from (15)).
A) Protein profiling by MALDI-TOF MS: 1. samples (μl volume) are fractionated off-line using for instance
magnetic beads coated with a chromatographic surface (e.g., hydrophilic, hydrophobic, cationic, anionic, or
immobilised metal affinity capture moiety), 2. addition of energy absorbing matrix (e.g., α-cyano-4-hydroxycinnamic acid) to (fractionated) samples, 3. application of mixed specimen to inert target plate for laser
irradiation in C.
B) Protein profiling by SELDI-TOF MS: 1. application of sample (μl volume) from, for example, cancer and
control patients to an 8-spot array with a chromatographic surface (e.g., hydrophilic, hydrophobic, cationic,
anionic, or immobilised metal affinity capture moiety) in appropriate binding buffer, 2. on-chip sample clean-up using various wash-buffers, 3. application of energy absorbing matrix (e.g., sinapinic acid) for desorption/ionisation
of proteins by laser irradiation in a laser desorption/ionisation time-of-flight analyser (C).
C) Schematic representation of laser desorption/ionisation (LDI) time-of-flight (TOF) analyser: the MALDI
target plate or SELDI array is inserted in the MALDI or SELDI instrument. Subsequent laser irradiation
desorbs and ionises bound proteins, after which an electric field migrates the charged proteins to the TOF
analyser. Herein, proteins are separated based on their mass, as the time to detector impact (TOF)
increases with the protein mass per charge (m/z = constant × t²). Thus, small proteins (c) fly faster than large
ones (a), and multiple charged ones (b) faster than single-charged ones (a).
D) Representative example of SELDI-TOF mass spectra of sera from female healthy controls (HC) and breast
cancer patients (BC). On the x-axis the protein m/z is displayed, and the y-axis depicts its abundance.
Expression differences are visible between breast cancer and control sera at m/z 3980 and m/z 4292 (first
arrow, ITIH4 fragments), and m/z 8939 (second arrow, C3adesArg).
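[Figure 1: panel A (MALDI-TOF MS workflow, steps 1-3, with off-line sample fractionation), panel B (SELDI-TOF MS workflow, steps 1-3), panel C (laser, target, mirror, TOF mass analyser and detector, with ions a, b and c) and panel D (SELDI-TOF mass spectra of HC and BC sera; m/z axis 4000-8000, abundance axis 0-60).]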
Table 1
Protein profiling studies performed in breast cancer by MALDI- and SELDI-TOF MS.

Diagnostic studies
Matrix | Platform | Pretreatment | Condition | Validation platform | ID | Ref.
Tissue | MALDI | LCM, IMS | | n.p. | No | (24)
Tissue | MALDI | LCM, IMS | | MALDI | Yes | (23)
Tissue | SELDI | LCM lysate | IMAC Cu | n.p. | No | (20)
Tissue | SELDI | Lysis | IMAC, WCX, SAX | n.p. | No | (21)
Serum | MALDI | C8 fractionation | | n.p. | No | (32)
Serum | MALDI | C18 fractionation | | n.p. | Yes | (41)
Serum | MALDI | WCX fractionation | | n.p. | No | (36)
Serum | MALDI | IMAC fractionation | | n.p. | No | (31)
Serum | MALDI | WCX fractionation | | n.p. | No | (42)
Serum | SELDI | | H4 | n.p. | No | (33)
Serum | SELDI | | IMAC Cu, SAX | n.p. | No | (39)
Serum | SELDI | | IMAC Cu, SAX | n.p. | No | (34)
Serum | SELDI | | IMAC Cu, SAX | SELDI | No | (38)
Serum | SELDI | | IMAC Cu | n.p. | No | (37)
Serum | SELDI | | IMAC Ni | SELDI | No | (40)
Serum | SELDI | | IMAC Ni | n.p. | No | (43)
Serum | SELDI | | IMAC Ni | SELDI | Yes | (44)
Serum | SELDI | | IMAC Ni | SELDI | No | (45)
Serum | SELDI | | IMAC Ni | SELDI | No | (46)
Serum | SELDI | | Immunoassay | n.p. | Yes | (47)
Serum | SELDI | | Immunoassay | n.p. | Yes | (48)
Plasma | SELDI | | IMAC Cu | SELDI | Yes | (49)
Plasma | SELDI | | IMAC Cu | n.p. | No | (35)
NAF | SELDI | | H4 | n.p. | No | (50)
NAF | SELDI | | NP20, H4, SAX | n.p. | Yes | (51)
NAF | SELDI | | NP20, H4, SAX | n.p. | No | (52)
NAF | SELDI | | IMAC Cu, WCX | n.p. | No | (53)
NAF | SELDI | | IMAC Cu, WCX | n.p. | No | (54)
NAF | SELDI | | IMAC Cu | SELDI | Yes | (55)
NAF | SELDI | | NP20 | n.p. | No | (56)

Values listed per column without row assignment:
Pretreatment (serum, plasma and NAF SELDI rows): Albumin depletion; SAX fractionation; -
Training samples (n), BC/BD: 5; 62; 65; 20; 78; 21; 46; 48; 76; 49; 45; 16; 15; 155; 103; 20; 19; 61; 29; 12; 20; 25; 23; 23; 5; 38; 46; 42; 25; 83; -
Training samples (n), HC: 3a; 84; 29; 33; 28; 77; 33; 47; 15; 15; 155; 41; 41; 40; 61; 15; 15; 13; 23b; 23b,5; 5; 63
Validation samples (n), BC/BD: 0; 0; 37; 13; 0; 0; 0; 60; 47; 93; 49; 48; 28; 9
Validation samples (n), HC: 7b; 0; 46; 27; 48; 48; 83
Table 1
Protein profiling studies performed in breast cancer by MALDI- and SELDI-TOF MS (continued)

Diagnostic studies (continued)
Matrix | Platform | Condition | Validation platform | ID | Ref.
NAF | SELDI | IMAC Cu, WCX | n.p. | No | (57)
DLF | SELDI | SAX | n.p. | No | (58)
Saliva | SELDI | WCX | n.p. | No | (59)

Prognostic studies
Matrix | Platform | Condition | Validation platform | ID | Ref.
Cell line | SELDI | IMAC Cu, WCX | IHC on TMA | Yes | (60)
Tissue | SELDI | IMAC Cu | n.p. | Yes | (61)
Tissue | SELDI | IMAC Cu | n.p. | Yes | (62)
Serum | SELDI | IMAC Cu, SAX | n.p. | Yes | (63)
Serum | SELDI | SAX | 1D GE | Yes | (64)
CSF | MALDI | - | n.p. | Yes | (65;66)

Predictive studies
Matrix | Platform | Condition | Validation platform | ID | Ref.
Cell line | SELDI | H4 | n.p. | No | (67)
Cell line | SELDI | IMAC Cu | n.p. | Yes | (68)
Cell line | SELDI | WCX | n.p. | Yes | (25)
Serum | SELDI | WCX | n.p. | Yes | (69)
Plasma | SELDI | IMAC Cu | n.p. | No | (35)

Values listed per column without row assignment:
Pretreatment: Lysis; (Medium); Lysis; Tryptic digestion; Lysis; Lysis; Lysis; SAX fractionation; -
Training samples (n), BC/BD: 3; 2; 3; 6; 24; 27; 105; 60; 81; 63; 87; 21; 16; 3; -; -; -; -
Training samples (n), HC: 21b,44; 16b; 3; -
Validation samples (n): 0; 0; 371 (BD); 547a (BC); 0; 0 (HC)

Abbreviations: 1D GE: one-dimensional gel electrophoresis, BC: breast cancer, BD: benign breast disease, CSF: cerebrospinal fluid, DLF: ductal lavage fluid, H4: reversed phase array, HC: healthy control, ID: structural identification of candidate biomarkers, IHC: immunohistochemistry, IMAC: immobilised metal affinity capture (fractionation or array), IMS: imaging mass spectrometry, LCM: laser capture microdissection, NAF: nipple aspirate fluid, n.p.: not performed, NP20: normal phase array, SAX: strong anion exchange (fractionation or array), TMA: tissue microarray, WCX: weak cation exchange (fractionation or array). a: tissue sample obtained from tissue adjacent to tumourous tissue, b: NAF / DLF sample obtained from non-cancerous contralateral breast.
Li et al. (43) observed three serum peaks to distinguish patients from controls: one (4.3
kDa) decreased and two (8.1 kDa and 8.9 kDa) increased in patients. These peaks were
structurally identified as a fragment of inter-alpha-trypsin inhibitor heavy chain H4
(ITIH4, 4.3 kDa), C3a des-arginine (C3adesArg, 8.9 kDa) and a C-terminal truncated form
thereof (C3adesArg∆8, 8.1 kDa) (44). Subsequent analysis of an independent sample set
could only confirm the increased 8.1 kDa and 8.9 kDa C3a fragments (44). However, the
8.1 kDa C3adesArg∆8 was found to lack significance in a second (45) and third validation
study (46). The latter study also reported a decreased 8.9 kDa C3adesArg expression in
breast cancer (46), whereas in all previous studies, this fragment was found increased
(43-45). Beyond these four studies, C3adesArg has been found associated with survival, as
its expression decreased in metastatic relapse (63). In addition, the 4.3 kDa ITIH4
fragment was one of the several ITIH4 fragments found increased in breast cancer by
Song et al. (48). Similar ITIH4 fragments, observed by Villanueva et al. (41) and Fung et
al. (47), were found either increased in cancer (41), or devoid of discriminative power
(47). Given the inconsistent regulation observed across multiple studies, the
definitive value of the different ITIH4 fragments, C3adesArg, and C3adesArg∆8 in the
diagnosis of breast cancer cannot be determined yet.
In addition to the various ITIH4 fragments, several fragments of fibrinopeptide A,
fibrinogen alpha, C3f, C4a, apolipoprotein A-IV, bradykinin, factor XIII, and
transthyretin were found to provide accurate class discrimination (41). Generated by
exoprotease activities superimposed on the ex vivo coagulation and complement-degradation pathways, these fragments are proposed to bear cancer-type specificity. It
has, however, been argued that this 'peptidome signature' merely reflects the
hypercoagulable state of the blood of cancer patients (70) and not necessarily a cancer-specific signature (19). Although the peptidome signature has not been validated yet,
two fragments thereof (i.e., the ITIH4 fragment discussed above, and a fibrinogen
fragment) have been encountered in other studies as well (43;47;48). The fibrinogen
fragment, though increased in the breast cancer serum peptidome, was found decreased
in breast cancer plasma and reverted to normal values after surgical extirpation of the
tumour (49). The difference between study results most likely originates from the
biological matrix investigated, as plasma differs from serum by inhibition of the
coagulation cascade, by which fibrinogen fragments are generated.
Also of interest are the results of the ‘Classification competition on clinical mass
spectrometry proteomic diagnosis data’ (42). For this competition, sera of breast cancer
patients (n = 76) and healthy controls (n = 77) were analysed by MALDI-TOF MS. Data
were subsequently analysed by ten competition participants for construction of
diagnostic classifiers (71-80). Surprisingly, though the various bioinformatic methods
applied resulted in highly divergent classification models, reported performances
(ranging from 83% to 89%) were very similar. However, as these results are based on a
single dataset, validation by analysis of an independent study population most likely
will reveal differences between the various bioinformatic methods and their resulting
classification models.
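That classifiers with near-identical overall performance can still behave differently is easy to show with the standard performance measures. In the sketch below, the group sizes (76 patients, 77 controls) follow the competition dataset, but the confusion-matrix counts of the two models are hypothetical and were chosen only so that both models classify the same number of samples correctly.

```python
def performance(tp, fn, tn, fp):
    """Sensitivity, specificity and overall accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # fraction of patients detected
        "specificity": tn / (tn + fp),   # fraction of controls correctly cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for two models on 76 patients and 77 controls
model_a = performance(tp=70, fn=6, tn=61, fp=16)
model_b = performance(tp=60, fn=16, tn=71, fp=6)

for name, result in (("A", model_a), ("B", model_b)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Both hypothetical models classify 131 of 153 samples correctly, yet model A favours sensitivity and model B specificity, a difference that a single accuracy figure hides and that an independent study population could expose.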
Serum and plasma protein profiling studies by MALDI-TOF or SELDI-TOF MS have
yielded numerous protein peaks with a significantly different expression between breast
cancer and healthy control. However, although elucidation of protein identities is
essential for insights into the molecular mechanisms involved in breast cancer, thus far,
only a small percentage of reported peaks has been structurally identified. Moreover,
since most studies did not investigate other cancer types or patients with benign breast
disease, the specificity of reported markers for breast cancer still has to be addressed.
Furthermore, although of pivotal importance, only few potential markers have been
validated by analysis of independent sample sets. As these studies generally yielded
contradictory results, further research is needed to determine the potential of identified
markers in breast cancer diagnosis.
Protein profiling of nipple aspirate fluid and ductal lavage fluid
Most breast cancers (70-80%) are thought to arise from the epithelial cells lining the
mammary ducts (13). The breast epithelium exfoliates cells as a renewal of tissue and
secretes fluid into the ductal-lobular system of the breast. As this fluid exits each
breast through six to nine orifices at the nipple, it can be collected by either of two non-invasive methods: aspiration or ductal lavage. Both nipple aspirate fluid (NAF) and
ductal lavage fluid (DLF) are traditionally used for cytological assessment (56;58), but
their vicinity to the breast epithelium renders them valuable matrices for diagnostic
protein profiling studies as well.
Although several discriminating protein peaks were detected when comparing equal
volumes of NAF or DLF from breast cancer patients and healthy controls by SELDI-TOF MS (50;55;56;58), large variations in the spectra between different samples within
one diagnostic group were observed (50;55;58). As these variations likely originate from the wide
protein content range of NAF (1-90 mg/ml (56)), further studies have normalised the
protein content prior to analysis. Nonetheless, despite normalisation, Sauter et al.
(51;52) could not confirm the initially observed diagnostic potential of three SELDI-TOF MS peaks (identified as haemoglobin beta chain isoforms) in a second, larger, study
population. In contrast, despite the very limited sample size, the differential expression
of human neutrophil peptides 1-3 observed in NAF (n = 10) was confirmed by analyses
of pooled DLF samples from cancerous (n = 9) and unrelated healthy (n = 7) breasts (55).
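The effect of normalising protein content can be made concrete with a short calculation. The sketch below assumes a hypothetical target load of 10 µg protein per spot and a hypothetical fixed aliquot of 5 µl; neither value is taken from the cited studies. Only the 1-90 mg/ml concentration range comes from the text.

```python
def load_volume_ul(protein_mg_per_ml, target_protein_ug):
    """Sample volume (microlitres) that contains the target protein amount.

    Uses the identity 1 mg/ml = 1 microgram per microlitre.
    """
    if protein_mg_per_ml <= 0:
        raise ValueError("protein concentration must be positive")
    return target_protein_ug / protein_mg_per_ml


TARGET_UG = 10.0        # hypothetical protein load per spot
FIXED_VOLUME_UL = 5.0   # hypothetical equal-volume aliquot

for conc in (1.0, 10.0, 90.0):  # mg/ml, spanning the reported NAF range
    equal_volume_load = conc * FIXED_VOLUME_UL
    normalised_volume = load_volume_ul(conc, TARGET_UG)
    print(f"{conc:5.1f} mg/ml: equal volume loads {equal_volume_load:6.1f} ug, "
          f"normalised load needs {normalised_volume:6.3f} ul")
```

With equal volumes, the protein load differs up to 90-fold between samples across this range, whereas a fixed protein load removes that source of variation at the cost of measuring the protein content of every sample beforehand.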
As the breasts are a paired organ system, NAF samples from both the cancerous and
non-cancerous breast of patients with unilateral breast cancer have been compared as
well. Surprisingly, although different between patients, protein expression patterns
were highly similar in both breasts of each patient (53;54;57). Comparison of either the
cancerous or the contralateral breast to unrelated healthy controls, however, yielded
several significantly different peaks (54;57).
Despite limited sample sizes and lack of validation studies, NAF protein profiling did
distinguish between women with and without breast cancer. However, as identification
of the cancer-bearing breast was not possible, protein profiling of NAF may have more
value in breast cancer risk assessment and disease monitoring than as a diagnostic tool
(57). Evidently, further research is needed to assess the value of the intraductal
approach in breast cancer diagnosis.
Protein profiling of saliva
The use of saliva in diagnosis of systemic diseases such as breast cancer has been
demonstrated by the detection of increased levels of solubilised c-erbB-2 and CA15.3 in
breast cancer patients compared to healthy controls (33). Investigating saliva for
diagnostic purposes has several key advantages, including its noninvasive collection, the
possibility of repeated sampling, and the ease of sample handling and processing.
Nonetheless, thus far, only one feasibility study has been performed in saliva. Using
SELDI-TOF MS, five high molecular weight peaks were found to be overexpressed in
breast cancer (n = 3) compared to control (n = 3) (59). Although these peaks were
neither structurally identified nor validated in larger sample sets, this study does show
the potential of using saliva for diagnostic purposes.
Prognostic protein profiling studies
Compared to diagnostic studies, protein profiling studies aimed at discovering novel
markers to improve breast cancer prognostication are rather limited (Table 1).
Investigating post-operative sera of 83 high-risk breast cancer patients by SELDI-TOF
MS, Goncalves et al. (63) constructed a 40-protein signature that correctly predicted
outcome in 83% of patients. Major components of this signature included haptoglobin
alpha-1, complement component C3a, transferrin, and apolipoprotein A-I and C-I
(Table 2). These results should be interpreted cautiously, as the number of proteins used
for classification is rather high in comparison with the limited study population,
indicating probable over-fitting of the data. Moreover, results have not been validated
in independent sample sets. The importance of validation is emphasised by a study
performed by our group (64). Using SELDI-TOF MS, we discovered a strong association
between haptoglobin phenotype and recurrence free survival in sera of 63 high-risk
primary breast cancer patients. However, as results were not confirmed following
validation by haptoglobin phenotyping of a six-fold larger sample set (n = 371), this
observation most likely resulted from a type I error (i.e., false positive) (64).
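The influence of sample size on a reported classification rate can be made concrete with a confidence interval. The sketch below is a minimal illustration and not part of the cited studies: it computes a Wilson score interval for a proportion, and the count of 69 correctly classified patients is an assumption obtained by rounding 83% of 83 patients.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return the (lower, upper) Wilson score interval for a proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical count: 83% of 83 patients, rounded to 69 correct predictions.
lower, upper = wilson_interval(69, 83)
print(f"accuracy {69 / 83:.2f}, 95% interval {lower:.2f} to {upper:.2f}")
```

Under these assumptions the interval spans roughly 0.74 to 0.90, which indicates how loosely a single small study constrains the true classification rate, even before over-fitting is taken into account.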
In a third SELDI-TOF MS study, performed in breast cancer tissue (n = 60), high levels
of ubiquitin and/or low levels of ferritin light chain were found to be associated with a good
prognosis (62). Although results have not been confirmed by analysis of independent
sample sets, ubiquitin has also been found differentially expressed in breast cancer by
three other studies investigating tissue specimens (23) and cell lines (25;60).
Lastly, cerebrospinal fluid (CSF) has also been explored for prognostic markers (65;66).
CSF is specific for the central nervous system (81), contains less total protein than
serum and provides a low fluid-volume-to-organ ratio, thereby augmenting biomarker
discovery (30). As collection of CSF by invasive lumbar puncture is not applicable to
healthy controls, this matrix has thus far only been investigated for prognostic purposes.
In search for markers indicative of leptomeningeal metastases (LM), whole CSF samples
of 106 breast cancer patients were digested with trypsin (65). Following MALDI-TOF
MS analysis of the resulting peptides, a classifier with 77% accuracy in determining LM
status was constructed (65). The discriminative tryptic peptides were derived from several
proteins (66). Three of these proteins (i.e., apolipoprotein A-I, haptoglobin and
transferrin) have also been found to be associated with clinical outcome in serum (63).
Currently, breast cancer prognosis is assessed by, among others, TNM classification, assigning
breast tumours to different stages based on depth of tumour invasion and presence of
metastases. However, considering the heterogeneity in outcome of patients diagnosed
with equivalent TNM stage, this classification system is suboptimal in tumour
characterisation. Instead, tumour staging on the molecular level could be more accurate.
Indeed, microarray-based gene expression profiling studies have identified five major
molecular breast cancer subtypes (i.e., luminal A and B, ERBB2-overexpressing, basal-like, and normal-like), showing distinct clinical courses and responses to therapeutic
agents (82;83). Hence, in search for prognostic markers, two studies have investigated
the correlation between SELDI-TOF MS protein profiles of tumour tissue lysates (n =
105) (61) and breast cancer cell lines (n = 27) (60) with the previously reported breast
cancer subtypes. Although discrepancies between cells grown in vivo and in vitro exist
due to adaptation to cell culture conditions, breast cancer cell lines have been shown to
accurately reflect the genomic, transcriptional, and biological heterogeneity found in
primary tumours (84). As such, they appear to be a good surrogate matrix for tumour
tissues, enabling proteome comparisons without introducing interfering factors. Indeed,
in both studies, patient subgroups identified by hierarchical clustering of SELDI-TOF
MS protein profiles were analogous to the molecular breast cancer subtypes (60;61). Of
the several differentially expressed protein peaks detected, heat shock protein (HSP) 27
and annexin V were identified as over-expressed in the luminal A type tumour tissue
lysates (61), while S100-A9 and a C-terminal truncated form of ubiquitin were found
differentially expressed between the luminal-like and basal-like cell lines (60). Of note,
subsequent immunohistochemical analysis of S100-A9 in tumour specimens of 547 early
breast cancer patients confirmed its association with basal subtypes, as well as its value
as an indicator of poor prognosis (60). The in vivo prognostic potential of HSP 27 and
annexin V should be assessed by validation in clinical samples.
Similar to the diagnostic studies, the prognostic studies published thus far generally
investigated only a limited number of samples. Combined with the large number of
features generated by the resulting protein profiles, datasets are frequently subjected to
multiple testing. Hence, candidate biomarkers are prone to being false positives, rendering
validation of pivotal importance to assess their true clinical performance. Nonetheless,
thus far, only two validation studies have been performed. All studies have, however,
structurally identified (part of) the candidate prognostic markers. The markers
identified across serum and CSF (e.g., apolipoprotein A-I, haptoglobin and transferrin)
were highly abundant, non-specific, host-response generated proteins. In addition,
many of the proteins identified in tissue and cell lines (e.g., annexin V, S100-A9) are in
fact normal cellular proteins. However, as their precise role in breast cancer remains to
be elucidated, further research is needed to determine their value for breast cancer
prognostication.
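The scale of the multiple testing problem described above can be sketched numerically. The example below is illustrative only: it assumes independent tests at a fixed significance level, which real peak intensities do not satisfy exactly, and the figure of 200 peaks is a hypothetical value rather than one taken from the cited studies.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


def family_wise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


n_peaks = 200  # hypothetical number of peaks compared between two groups
print(expected_false_positives(n_peaks))
print(family_wise_error_rate(n_peaks))
print(bonferroni_threshold(n_peaks))
```

With 200 peaks and no true differences, about ten peaks would be expected to pass an uncorrected 0.05 threshold by chance alone, which is why independent validation carries so much weight.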
Predictive protein profiling studies
Although accurate prediction of chemosensitivity in cancer therapy would enable
individualised therapy, thus avoiding toxic side effects and eliminating the use of
ineffective agents, protein profiling studies searching for markers for response
prediction and treatment monitoring of breast cancer are scarce. Several SELDI-TOF
MS peaks (not structurally identified) were found indicative of treatment regimen for
chemosensitive and -resistant breast cancer cell lines following exposure to doxorubicin
or paclitaxel (67). In addition, Dowling et al. (68) found an increase of a 7.6 kDa bovine
transferrin fragment in serum-free conditioned medium of paclitaxel resistant human
breast cancer cell lines, corresponding to the increased expression of the transferrin
receptor they observed in whole cell lysates. Although these results were not translated
to a human in vivo setting, other studies have indeed reported an association between
increased serum and CSF transferrin levels and poor clinical outcome (63;65;66).
Similarly, while ubiquitin and S100-A6 were found to decrease in lysates of human
breast cancer cell lines following chemotherapy induced apoptosis (25), an aberrant
expression of both proteins has also been reported in breast cancer tissue (23;62).
Nonetheless, considering the very limited number of samples investigated in the various
studies, screening of larger cohorts and validation of the preclinical data in clinical
samples are warranted before these potential markers can be used to improve therapeutic
accuracy in clinical practice.
In vivo studies have been performed as well (35;69). In serum, both high molecular
weight kininogen and apolipoprotein A-II were found significantly decreased in
expression following docetaxel-induced shock (69). Likewise, in plasma, a SELDI-TOF
MS peak at m/z 2790 (not structurally identified) was found to significantly increase
following (neo)adjuvant paclitaxel infusion (35). As it remains to be elucidated whether
the identified proteins are treatment-responsive, originate from micrometastatic carcinoma,
or merely result from a general host-response to cytotoxic therapy, the definitive value
of the identified proteins as predictive markers cannot be established yet.
Discussion and conclusion
Thus far, the majority of LDI protein profiling studies performed in breast cancer have
searched for novel diagnostic markers, while the search for new prognostic and
predictive biomarkers is limited to only a few studies. The studies discussed in the current
overview together have reported hundreds of mass-to-charge values, the intensities of
which were found to carry significant diagnostic, prognostic or predictive value.
However, although structural identification is indispensable for providing insight into the pathophysiological
mechanisms associated with, or underlying, breast cancer, and for the development of absolute
quantitative assays, only very few of these mass-to-charge ratios have been structurally
identified so far. Moreover, the candidate markers that have been identified consist of
normal cellular proteins and highly abundant blood proteins involved in coagulation and
the acute phase response. Since their biology cannot be linked directly to tumour
biochemistry, one of the ultimate aims of (LDI) protein profiling studies, i.e., increasing
knowledge of the molecular mechanisms involved in cancer by identification of
discriminative (full-length) proteins generated exclusively by cancer cells, has not been
fulfilled yet.
Moreover, many of the identified candidate breast cancer markers have been found to
bear diagnostic potential for other cancer types as well (e.g., C3adesArg in colorectal
cancer (85), apolipoprotein A-I in ovarian cancer (86)), indicating a general lack of
tumour-specificity. However, as cancer cells are deranged host cells, and most cancers
of epithelial origin share similar molecular features (81), it may prove difficult to find a
true cancer-specific protein that is expressed exclusively by one type of malignant cells.
On the other hand, as such proteins are expected to be among the least abundant
proteins, they could well be below the detection limit of the current (LDI) methods.
Hence, these specific tumour-secreted proteins might actually exist, but could simply
have eluded detection thus far.
Nonetheless, identification of specific tumour-secreted proteins is not a prerequisite for
improving breast cancer care, as better breast cancer diagnosis, prognosis, and
prediction can also be accomplished by surrogate biomarkers of disease. A class of
proteins currently recognised for their surrogate biomarker potential comprises the (proteolytic
fragments of) highly abundant circulatory proteins. These fragments are hypothesised to
be generated by cancer type-specific exoprotease activity, superimposed on the ex vivo
coagulation and complement degradation proteolytic pathways. In addition, these
fragments can also result from the proteases specifically expressed by malignant cells
within the tumour microenvironment for tumour invasion and metastasis (87;88), as
they proteolytically process the acute phase proteins that are generated by the host
response to the tumour. Since these modified host response proteins generally are
present at substantially higher circulatory concentrations than the enzymes that process
them upon their exposure to the tumour microenvironment, they can be detected in
blood by current (LDI) methods for diagnostic purposes (47). Although in breast cancer,
this concept has been investigated for, among others, serum ITIH4, the various studies have
reported contradictory results, a finding not entirely unforeseen considering the
biological matrix commonly investigated (i.e., serum). Since serum is generated by
coagulation, its proteome is susceptible to the proteases involved in this cascade, as well as to
those involved in the complement cascade, activated upon clotting. Various pre-analytical parameters, such as sampling device, clotting temperature, and storage time,
can thus all exert a distinct influence on the serum proteome. Hence, the concept of
cancer type-specific (host response) protein fragments generated by tumour-secreted
proteases still awaits confirmation by validation studies that adhere to rigorous sample
handling protocols.
The need for such validation studies is, however, not limited to the reported host
response protein fragments. Regardless of their identity, the majority of markers have
been reported by single breast cancer studies, in which only limited numbers of samples
were investigated, thereby compromising the generalisability of results. Moreover, as
the number of generated features (i.e., protein MS peaks) usually by far exceeds the
number of samples investigated, proteomic (LDI) datasets are frequently subjected to
multiple testing. As such, many candidate biomarkers are prone to being false positives.
Hence, to guard against overfitting of data, as well as systematic bias by the above-mentioned
pre-analytical parameters, validation of biomarker candidates by other, quantitative,
methods and/or in new study populations is of pivotal importance. Yet, thus far, such
validation studies have been performed for only a few of the candidate biomarkers
detected in LDI studies (i.e., serum C3adesArg, C3adesArg∆8, ITIH4 fragments, haptoglobin
alpha-1, plasma m/z 2660 fibrinogen, and tissue S100-A9). As these studies generally
yielded contradictory results (except for the m/z 2660 fibrinogen fragment and S100-A9), further research is needed to determine the true value of these markers in breast
cancer management.
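One common way to limit false positives when many peaks are tested is to control the false discovery rate. The sketch below implements the Benjamini-Hochberg step-up procedure; it is named here purely as an illustration and is not a method reported by the studies reviewed, and the p-values are invented for the example.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up procedure: find the largest rank k (1-based, p-values sorted
    ascending) with p_(k) <= k / m * fdr, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            largest_k = rank
    rejected = [False] * m
    for index in order[:largest_k]:
        rejected[index] = True
    return rejected


# Invented p-values for six hypothetical peaks.
peaks = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(peaks))
```

In this invented example an uncorrected 0.05 threshold would flag four peaks, whereas the procedure retains only the two with the smallest p-values. Such a correction addresses chance findings within one dataset; it does not replace validation in an independent study population.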
The few validation studies performed thus far are all retrospective in nature. In fact,
none of the identified candidate markers has been investigated for its utility as a breast
cancer biomarker in a larger, prospective, clinical setting. As such, none of the
candidate biomarkers discussed in this overview has been validated sufficiently to be
used for clinical patient care. Yet, the move from the discovery phase to the pre-clinical
and subsequent clinical validation phase is mandatory, as the sole purpose of a
biomarker lies in its application. Nonetheless, surveying the results of MALDI- and
SELDI-TOF MS protein profiling studies up to now, the two platforms hold promise as
high-throughput screening tools for discovery of novel breast cancer markers. Provided
that these studies are performed with adequate statistical power and analytical rigour,
they could eventually fulfil the great promise that protein biomarkers have for
improving cancer patient outcome.
1627 1704
2602
Factor XIII
Ferritin light chain
942 - 1865
C4a (fragments)
19809
8900
8926
8919
8941
8936
C3a des-R anaphylatoxin
(C3adesArg)
C3f (fragments)
8100
8116
8129
8129
C3a C-terminal fragment
(C3adesArg∆8)
6647
904,
1061
t.p.
2508
9285
Bradykinin (fragments)
Apolipoprotein E
Apolipoprotein C-I
Apolipoprotein A-IV
Apolipoprotein A-II
28284
Apolipoprotein A-I
t.p.
33327
t.p.
α-1-antichymotrypsin
Annexin V
Platform (m/z)
MALDI
SELDI
Tissue
Serum
Serum
Serum
Serum
Serum
Serum
Serum
Serum
Serum
Serum
Serum
Serum
Serum
CSF
Serum
Serum
Serum
Serum
CSF
Tissue
CSF
Matrix
+
+
+
-
+
+
+
-
+
+
n.s.
n.s.
+
-
-
+
-
+
+
+
Relapse
Cancer
Cancer
Cancer
Cancer
Cancer
Cancer
Cancer
Relapse
Cancer
Cancer
Cancer
Cancer
Cancer
Metastasis
Relapse
Cancer
Shock
Relapse
Metastasis
Luminal
subtype
Metastasis
Expression
+/in
+/-
Acute phase protein, iron homeostasis
Blood coagulation
Complement activation
Complement activation
Complement activation
Complement activation
Inflammation mediator
Lipid metabolism
Acute phase protein, lipid metabolism
Lipid metabolism
Lipid metabolism
Lipid metabolism
Tumour proliferation / metastasis?,
anticoagulant protein
Acute phase protein,
serine protease inhibitor
Function
Candidate biomarkers in breast cancer identified by MALDI- and SELDI-TOF MS protein profiling studies.
Biomarker identity
Table 2
(62)
(41)
(41)
(41)
(43)
(44)
(45)
(46)
(63)
(43)
(44)
(45)
(46)
(41)
(65;66)
(63)
(41)
(69)
(63)
(65;66)
(61)
(65;66)
Ref.
905 - 1537
t.p.
Fibrinopeptide A (fragments)
Haptoglobin (alpha 1)
S100-A6 (isoforms)
Prostaglandin D2 synthase
Kininogen HMW
10094
t.p.
10900
7790
4300
4300
4286
4276
2271 3272
2271 4293
ITIH4
ITIH4
3375 3490
Human Neutrophil Peptide
1-3
998 -2358
15940
Haemoglobin beta chain
(isoforms)
Haemopexin
27152
9192
9192
Heat shock protein 27
t.p.
2379, 2659
Fibrinogen alpha (fragments)
2661
Platform (m/z)
MALDI
SELDI
Cell line
Tissue
CSF
Serum
-
-
-
+
+
Serum
Serum
+
n.s.
+
+
+
+
+
+
-
+/-*
+
-
Apoptosis
Cancer
Metastasis
Shock
Cancer
Cancer
Cancer
Cancer
Cancer
Cancer
Cancer
Cancer
Metastasis
Cancer
Luminal
subtype
Metastasis
Relapse
Relapse
Cancer
Cancer
Cancer
Expression
+/in
+/-
Serum
Serum
Serum
Serum
Serum
NAF
CSF
NAF
Tissue
CSF
Serum
Serum
Serum
Serum
Plasma
Matrix
(65;66)
(25)
(23)
Ca2+-binding protein, growth factor
(69)
(41)
(48)
(43)
(44)
(45)
(46)
(47)
(55)
(65;66)
(51)
(61)
(65;66)
(63)
(64)
(41)
(41)
(49)
Ref.
Catalyses prostaglandin conversion
Blood coagulation, bradykinin release
Acute phase reactant?
Antibiotic, fungicide and antiviral
Haeme binding and transport,
acute phase protein
Oxygen transport
Stress resistance, actin organisation
Acute phase protein,
haemoglobin binding
Blood coagulation
Blood coagulation
Function
Candidate biomarkers in breast cancer identified by MALDI- and SELDI-TOF MS protein profiling studies (continued).
Biomarker identity
Table 2
Ubiquitin
Transthyretin (fragment)
Transferrin (bovine)
8568
2451
t.p.
Cell line
Tissue
Cell line
Tissue
8507
8560
Serum
CSF
Medium
Serum
CSF
Cell line
Tissue
Matrix
8445
7600
81763
Transferrin (human)
t.p.
13300
10842
S100-A8
S100-A9
Platform (m/z)
MALDI
SELDI
+
+
+
-
+
+
+
-
+
Basal-like
subtype
Metastasis
Apoptosis
Cancer
Cancer
Metastasis
Resistance
Relapse
Metastasis
Basal-like
subtype
Cancer
Expression
+/in
+/-
(60)
Ca2+-binding protein, inflammation
(dimer with S100-A8)
Protein modifier
Thyroid hormone-binding protein,
acute phase reactant
Iron binding & transport, cell proliferation
(62)
(25)
(23)
(60)
(41)
(65;66)
(68)
(63)
(65;66)
(23)
Ca2+-binding protein, inflammation
(dimer with S100-A9)
Acute phase reactant, iron binding &
transport, cell proliferation
Ref.
Function
Candidate biomarkers in breast cancer identified by MALDI- and SELDI-TOF MS protein profiling studies (continued).
Biomarker identity
Table 2
References
(1) Jemal A, Siegel R, Ward E, Hao Y, Xu J, Murray T et al. Cancer statistics, 2008. CA Cancer J Clin 2008;
58(2):71-96.
(2) Ries LA, Melbert D, Krapcho M, Mariotto A, Miller BA, Feuer EJ et al. SEER Cancer Statistics Review,
1975-2004. http://seer.cancer.gov/csr/1975_2004/ . 2008.
(3) Polychemotherapy for early breast cancer: an overview of the randomised trials. Early Breast Cancer
Trialists' Collaborative Group. Lancet 1998; 352(9132):930-942.
(4) Mommers EC, van Diest PJ, Leonhart AM, Meijer CJ, Baak JP. Balance of cell proliferation and apoptosis
in breast carcinogenesis. Breast Cancer Res Treat 1999; 58(2):163-169.
(5) Reis-Filho JS, Lakhani SR. The diagnosis and management of pre-invasive breast disease: genetic
alterations in pre-invasive lesions. Breast Cancer Res 2003; 5(6):313-319.
(6) 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M et al. Gene expression profiling predicts
clinical outcome of breast cancer. Nature 2002; 415(6871):530-536.
(7) Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F et al. Gene-expression profiles to predict
distant metastasis of lymph-node-negative primary breast cancer. Lancet 2005; 365(9460):671-679.
(8) Miki Y, Swensen J, Shattuck-Eidens D, Futreal PA, Harshman K, Tavtigian S et al. A strong candidate for
the breast and ovarian cancer susceptibility gene BRCA1. Science 1994; 266(5182):66-71.
(9) Vogelstein B, Kinzler KW. Has the breast cancer gene been found? Cell 1994; 79(1):1-3.
(10) Banks RE, Dunn MJ, Hochstrasser DF, Sanchez JC, Blackstock W, Pappin DJ et al. Proteomics: new
perspectives, new biomedical opportunities. Lancet 2000; 356(9243):1749-1756.
(11) Aebersold R, Anderson L, Caprioli R, Druker B, Hartwell L, Smith R. Perspective: a program to improve
protein biomarker discovery for cancer. J Proteome Res 2005; 4(4):1104-1109.
(12) Dove A. Proteomics: translating genomics into products? Nat Biotechnol 1999; 17(3):233-236.
(13) Clarke W, Zhang Z, Chan DW. The application of clinical proteomics to cancer and other diseases. Clin
Chem Lab Med 2003; 41(12):1562-1570.
(14) Hondermarck H, Vercoutter-Edouart AS, Revillion F, Lemoine J, Yazidi-Belkoura I, Nurcombe V et al.
Proteomics of breast cancer for marker discovery and signal pathway profiling. Proteomics 2001;
1(10):1216-1232.
(15) Engwegen JY, Gast MC, Schellens JH, Beijnen JH. Clinical proteomics: searching for better tumour
markers with SELDI-TOF mass spectrometry. Trends Pharmacol Sci 2006; 27(5):251-259.
(16) Hortin GL. The MALDI-TOF mass spectrometric view of the plasma proteome and peptidome. Clin
Chem 2006; 52(7):1223-1237.
(17) Hutchens TW, Yip TT. New desorption strategies for the mass spectrometric analysis of macromolecules.
Rapid Commun Mass Spectrom 1993; 7:576-580.
(18) Merchant M, Weinberger SR. Recent advancements in surface-enhanced laser desorption/ionization-time of flight-mass spectrometry. Electrophoresis 2000; 21(6):1164-1177.
(19) Simpson RJ, Bernhard OK, Greening DW, Moritz RL. Proteomics-driven cancer biomarker discovery:
looking to the future. Curr Opin Chem Biol 2008; 12(1):72-77.
(20) Nakagawa T, Huang SK, Martinez SR, Tran AN, Elashoff D, Ye X et al. Proteomic profiling of primary
breast cancer predicts axillary lymph node metastasis. Cancer Res 2006; 66(24):11825-11830.
(21) Traub F, Feist H, Kreipe HH, Pich A. SELDI-MS-based expression profiling of ductal invasive and
lobular invasive human breast carcinomas. Pathol Res Pract 2005; 201(12):763-770.
(22) Cowherd SM, Espina VA, Petricoin EF, III, Liotta LA. Proteomic analysis of human breast cancer tissue
with laser-capture microdissection and reverse-phase protein microarrays. Clin Breast Cancer 2004;
5(5):385-392.
(23) Sanders ME, Dias EC, Xu BJ, Mobley JA, Billheimer D, Roder H et al. Differentiating proteomic
biomarkers in breast cancer by laser capture microdissection and MALDI MS. J Proteome Res 2008;
7(4):1500-1507.
(24) Umar A, Dalebout JC, Timmermans AM, Foekens JA, Luider TM. Method optimisation for peptide
profiling of microdissected breast carcinoma tissue by matrix-assisted laser desorption/ionisation-time of
flight and matrix-assisted laser desorption/ionisation-time of flight/time of flight-mass spectrometry.
Proteomics 2005; 5(10):2680-2688.
(25) Leong S, Christopherson RI, Baxter RC. Profiling of apoptotic changes in human breast cancer cells using
SELDI-TOF mass spectrometry. Cell Physiol Biochem 2007; 20(5):579-590.
(26) Hanash SM, Pitteri SJ, Faca VM. Mining the plasma proteome for cancer biomarkers. Nature 2008;
452(7187):571-579.
(27) Petricoin EF, Zoon KC, Kohn EC, Barrett JC, Liotta LA. Clinical proteomics: translating benchside
promise into bedside reality. Nat Rev Drug Discov 2002; 1(9):683-695.
(28) Conrads TP, Zhou M, Petricoin EF, III, Liotta L, Veenstra TD. Cancer diagnosis using proteomic
patterns. Expert Rev Mol Diagn 2003; 3(4):411-420.
(29) Grizzle WE, Semmes OJ, Bigbee WL, Zhu L, Malik G, Oelschlager DK et al. The need for review and
understanding of SELDI/MALDI mass spectroscopy data prior to analysis. Cancer Informatics 2005; 1:86-97.
(30) Good DM, Thongboonkerd V, Novak J, Bascands JL, Schanstra JP, Coon JJ et al. Body fluid proteomics
for biomarker discovery: lessons from the past hold the key to success in the future. J Proteome Res
2007; 6(12):4549-4555.
(31) Callesen AK, Vach W, Jorgensen PE, Cold S, Tan Q, Depont CR et al. Combined experimental and
statistical strategy for mass spectrometry based serum protein profiling for diagnosis of breast cancer: a
case-control study. J Proteome Res 2008; 7(4):1419-1426.
(32) de Noo ME, Deelder A, van der WM, Ozalp A, Mertens B, Tollenaar R. MALDI-TOF serum protein
profiling for the detection of breast cancer. Onkologie 2006; 29(11):501-506.
(33) Hu Y, Zhang S, Yu J, Liu J, Zheng S. SELDI-TOF-MS: the proteomics and bioinformatics approaches in
the diagnosis of breast cancer. Breast 2005; 14(4):250-255.
(34) Laronga C, Becker S, Watson P, Gregory B, Cazares L, Lynch H et al. SELDI-TOF serum profiling for
prognostic and diagnostic classification of breast cancers. Dis Markers 2003; 19(4-5):229-238.
(35) Pusztai L, Gregory BW, Baggerly KA, Peng B, Koomen J, Kuerer HM et al. Pharmacoproteomic analysis
of prechemotherapy and postchemotherapy plasma samples from patients receiving neoadjuvant or
adjuvant chemotherapy for breast carcinoma. Cancer 2004; 100(9):1814-1822.
(36) Shin S, Cazares L, Schneider H, Mitchell S, Laronga C, Semmes OJ et al. Serum biomarkers to
differentiate benign and malignant mammographic lesions. J Am Coll Surg 2007; 204(5):1065-1071.
(37) Becker S, Cazares LH, Watson P, Lynch H, Semmes OJ, Drake RR et al. Surfaced-enhanced laser
desorption/ionization time-of-flight (SELDI-TOF) differentiation of serum protein profiles of BRCA-1
and sporadic breast cancer. Ann Surg Oncol 2004; 11(10):907-914.
(38) Gast MC, Bonfrer JM, van Dulken EJ, de Kock L, Rutgers EJ, Schellens JH et al. SELDI-TOF MS serum
protein profiles in breast cancer: assessment of robustness and validity. Cancer Biomark 2006; 2(6):235-248.
(39) Vlahou A, Laronga C, Wilson L, Gregory B, Fournier K, McGaughey D et al. A novel approach toward
development of a rapid blood test for breast cancer. Clin Breast Cancer 2003; 4(3):203-209.
(40) Belluco C, Petricoin EF, Mammano E, Facchiano F, Ross-Rucker S, Nitti D et al. Serum Proteomic
Analysis Identifies a Highly Sensitive and Specific Discriminatory Pattern in Stage 1 Breast Cancer. Ann
Surg Oncol 2007; 14(9):2470-2476.
(41) Villanueva J, Shaffer DR, Philip J, Chaparro CA, Erdjument-Bromage H, Olshen AB et al. Differential
exoprotease activities confer tumor-specific serum peptidome patterns. J Clin Invest 2006; 116(1):271-284.
(42) van der Werff MP, Mertens B, de Noo ME, Bladergroen MR, Dalebout HC, Tollenaar RA et al. Case-control breast cancer study of MALDI-TOF proteomic mass spectrometry data on serum samples. Stat
Appl Genet Mol Biol 2008; 7:Article 2.
(43) Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW. Proteomics and bioinformatics approaches for
identification of serum biomarkers to detect breast cancer. Clin Chem 2002; 48(8):1296-1304.
(44) Li J, Orlandi R, White CN, Rosenzweig J, Zhao J, Seregni E et al. Independent Validation of Candidate
Breast Cancer Serum Biomarkers Identified by Mass Spectrometry. Clin Chem 2005; 51(12):2229-2235.
(45) Mathelin C, Cromer A, Wendling C, Tomasetto C, Rio MC. Serum biomarkers for detection of breast
cancers: a prospective study. Breast Cancer Res Treat 2005;1-8.
(46) van Winden AWJ, Gast MCW, Beijnen JH, Rutgers EJ, Grobbee DE, Peeters PHM et al. Validation of
previously identified serum biomarkers for breast cancer with SELDI-TOF MS: a case control study.
BMC Medical Genomics 2008; Accepted for publication.
(47) Fung ET, Yip TT, Lomas L, Wang Z, Yip C, Meng XY et al. Classification of cancer types by measuring
variants of host response proteins using SELDI serum assays. Int J Cancer 2005; 115(5):783-789.
(48) Song J, Patel M, Rosenzweig CN, Chan-Li Y, Sokoll LJ, Fung ET et al. Quantification of fragments of
human serum inter-alpha-trypsin inhibitor heavy chain 4 by a surface-enhanced laser
desorption/ionization-based immunoassay. Clin Chem 2006; 52(6):1045-1053.
(49) Shi Q, Harris LN, Lu X, Li X, Hwang J, Gentleman R et al. Declining plasma fibrinogen alpha fragment
identifies HER2-positive breast cancer patients and reverts to normal levels after surgery. J Proteome Res
2006; 5(11):2947-2955.
(50) Paweletz CP, Trock B, Pennanen M, Tsangaris T, Magnant C, Liotta LA et al. Proteomic patterns of
nipple aspirate fluids obtained by SELDI-TOF: potential for new biomarkers to aid in the diagnosis of
breast cancer. Dis Markers 2001; 17(4):301-307.
(51) Sauter ER, Zhu W, Fan XJ, Wassell RP, Chervoneva I, Du Bois GC. Proteomic analysis of nipple aspirate
fluid to detect biologic markers of breast cancer. Br J Cancer 2002; 86(9):1440-1443.
(52) Sauter ER, Shan S, Hewett JE, Speckman P, Du Bois GC. Proteomic analysis of nipple aspirate fluid using
SELDI-TOF-MS. Int J Cancer 2005; 114(5):791-796.
(53) Kuerer HM, Coombes KR, Chen JN, Xiao L, Clarke C, Fritsche H et al. Association between ductal fluid
proteomic expression profiles and the presence of lymph node metastases in women with breast cancer.
Surgery 2004; 136(5):1061-1069.
(54) Pawlik TM, Fritsche H, Coombes KR, Xiao L, Krishnamurthy S, Hunt KK et al. Significant differences in
nipple aspirate fluid protein expression between healthy women and those with breast cancer
demonstrated by time-of-flight mass spectrometry. Breast Cancer Res Treat 2005; 89(2):149-157.
(55) Li J, Zhao J, Yu X, Lange J, Kuerer H, Krishnamurthy S et al. Identification of biomarkers for breast
cancer in nipple aspiration and ductal lavage fluid. Clin Cancer Res 2005; 11(23):8312-8320.
(56) He J, Gornbein J, Shen D, Lu M, Rovai LE, Shau H et al. Detection of breast cancer biomarkers in nipple
aspirate fluid by SELDI-TOF and their identification by combined liquid chromatography-tandem mass
spectrometry. Int J Oncol 2007; 30(1):145-154.
(57) Noble JL, Dua RS, Coulton GR, Isacke CM, Gui GP. A comparative proteinomic analysis of nipple
aspiration fluid from healthy women and women with breast cancer. Eur J Cancer 2007; 43(16):2315-2320.
(58) Mendrinos S, Nolen JD, Styblo T, Carlson G, Pohl J, Lewis M et al. Cytologic findings and protein
expression profiles associated with ductal carcinoma of the breast in ductal lavage specimens using
surface-enhanced laser desorption and ionization-time of flight mass spectrometry. Cancer 2005;
105(3):178-183.
(59) Streckfus CF, Bigler LR, Zwick M. The use of surface-enhanced laser desorption/ionization time-of-flight mass spectrometry to detect putative breast cancer markers in saliva: a feasibility study. J Oral
Pathol Med 2006; 35(5):292-300.
(60) Goncalves A, Charafe-Jauffret E, Bertucci F, Audebert S, Toiron Y, Esterni B et al. Protein profiling of
human breast tumor cells identifies novel biomarkers associated with molecular subtypes. Mol Cell
Proteomics 2008; 7(8):1420-1433.
(61) Brozkova K, Budinska E, Bouchal P, Hernychova L, Knoflickova D, Valik D et al. Surface-enhanced laser
desorption/ionization time-of-flight proteomic profiling of breast carcinomas identifies
clinicopathologically relevant groups of patients similar to previously defined clusters from cDNA
expression. Breast Cancer Res 2008; 10(3):R48.
(62) Ricolleau G, Charbonnel C, Lode L, Loussouarn D, Joalland MP, Bogumil R et al. Surface-enhanced laser
desorption/ionization time of flight mass spectrometry protein profiling identifies ubiquitin and ferritin
light chain as prognostic biomarkers in node-negative breast cancer tumors. Proteomics 2006; 6(6):1963-1975.
(63) Goncalves A, Esterni B, Bertucci F, Sauvan R, Chabannon C, Cubizolles M et al. Postoperative serum
proteomic profiles may predict metastatic relapse in high-risk primary breast cancer patients receiving
adjuvant chemotherapy. Oncogene 2006; 25(7):981-989.
(64) Gast MC, van Tinteren H, Bontenbal M, van Hoesel QC, Nooij MA, Rodenhuis S et al. Haptoglobin
phenotype is not a predictor of recurrence free survival in high-risk primary breast cancer patients. BMC
Cancer 2008; 8:389.
(65) Dekker LJ, Boogerd W, Stockhammer G, Dalebout JC, Siccama I, Zheng P et al. MALDI-TOF mass
spectrometry analysis of cerebrospinal fluid tryptic peptide profiles to diagnose leptomeningeal
metastases in patients with breast cancer. Mol Cell Proteomics 2005; 4(9):1341-1349.
(66) Rompp A, Dekker L, Taban I, Jenster G, Boogerd W, Bonfrer H et al. Identification of leptomeningeal
metastasis-related proteins in cerebrospinal fluid of patients with breast cancer by a combination of
MALDI-TOF, MALDI-FTICR and nanoLC-FTICR MS. Proteomics 2007; 7(3):474-481.
(67) Mian S, Ball G, Hornbuckle J, Holding F, Carmichael J, Ellis I et al. A prototype methodology combining
surface-enhanced laser desorption/ionization protein chip technology and artificial neural network
algorithms to predict the chemoresponsiveness of breast cancer cell lines exposed to Paclitaxel and
Doxorubicin under in vitro conditions. Proteomics 2003; 3(9):1725-1737.
(68) Dowling P, Maurya P, Meleady P, Glynn SA, Dowd AJ, Henry M et al. Purification and identification of
a 7.6-kDa protein in media conditioned by superinvasive cancer cells. Anticancer Res 2007;
27(3A):1309-1317.
(69) Heike Y, Hosokawa M, Osumi S, Fujii D, Aogi K, Takigawa N et al. Identification of serum proteins
related to adverse effects induced by docetaxel infusion from protein expression profiles of serum using
SELDI ProteinChip system. Anticancer Res 2005; 25(2B):1197-1203.
(70) Goldsmith GH. Hemostatic changes in patients with malignancy. Int J Hematol 2001; 73(2):151-156.
(71) Barrett JH, Cairns DA. Application of the random forest classification method to peaks detected from
mass spectrometric proteomic profiles of cancer patients and controls. Stat Appl Genet Mol Biol 2008;
7:Article 4.
(72) Heidema AG, Nagelkerke N. Developing a discrimination rule between breast cancer patients and
controls using proteomics mass spectrometric data: a three-step approach. Stat Appl Genet Mol Biol
2008; 7:Article 5.
(73) Fearn T. Principal component discriminant analysis. Stat Appl Genet Mol Biol 2008; 7:Article 6.
(74) Datta S. Classification of breast cancer versus normal samples from mass spectrometry profiles using
linear discriminant analysis of important features selected by random forest. Stat Appl Genet Mol Biol
2008; 7:Article 7.
(75) Hoefsloot HC, Smit S, Smilde AK. A classification model for the Leiden proteomics competition. Stat
Appl Genet Mol Biol 2008; 7:Article 8.
(76) Strimenopoulou F, Brown PJ. Empirical Bayes logistic regression. Stat Appl Genet Mol Biol 2008;
7:Article 9.
(77) Goeman JJ. Autocorrelated logistic ridge regression for prediction based on proteomics spectra. Stat Appl
Genet Mol Biol 2008; 7:Article 10.
(78) Pham TV, van de Wiel MA, Jimenez CR. Support vector machine approach to separate control and
breast cancer serum samples. Stat Appl Genet Mol Biol 2008; 7:Article 11.
(79) Valkenborg D, Van Sanden S, Lin D, Kasim A, Zhu Q, Haldermans P et al. A cross-validation study to
select a classification procedure for clinical diagnosis based on proteomic mass spectrometry. Stat Appl
Genet Mol Biol 2008; 7:Article 12.
(80) Gammerman A, Nouretdinov I, Burford B, Chervonenkis A, Vovk V, Luo Z. Clinical mass spectrometry
proteomic diagnosis by conformal predictors. Stat Appl Genet Mol Biol 2008; 7(2):Article 13.
(81) Hu S, Loo JA, Wong DT. Human body fluid proteome analysis. Proteomics 2006; 6(23):6326-6353.
(82) Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H et al. Gene expression patterns of breast
carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 2001;
98(19):10869-10874.
(83) Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA et al. Molecular portraits of human breast
tumours. Nature 2000; 406(6797):747-752.
(84) Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T et al. A collection of breast cancer cell lines for
the study of functionally distinct cancer subtypes. Cancer Cell 2006; 10(6):515-527.
(85) Habermann JK, Roblick UJ, Luke BT, Prieto DA, Finlay WJ, Podust VN et al. Increased serum levels of
complement C3a anaphylatoxin indicate the presence of colorectal tumors. Gastroenterology 2006;
131(4):1020-1029.
(86) Zhang Z, Bast RC, Jr., Yu Y, Li J, Sokoll LJ, Rai AJ et al. Three biomarkers identified from serum
proteomic analysis for the detection of early stage ovarian cancer. Cancer Res 2004; 64(16):5882-5890.
(87) Bank U, Kruger S, Langner J, Roessner A. Review: peptidases and peptidase inhibitors in the
pathogenesis of diseases. Disturbances in the ubiquitin-mediated proteolytic system.
Protease-antiprotease imbalance in inflammatory reactions. Role of cathepsins in tumour progression. Adv Exp
Med Biol 2000; 477:349-378.
(88) Blasi F. Proteolysis, cell adhesion, chemotaxis, and invasiveness are regulated by the u-PA-u-PAR-PAI-1
system. Thromb Haemost 1999; 82(2):298-304.
Chapter 2
Technical aspects
Chapter 2.1
Comparing the old and new generation SELDI-TOF MS:
consequence for serum protein profiling
Marie-Christine W. Gast
Judith Y.M.N. Engwegen
Jan H.M. Schellens
Jos H. Beijnen
BMC Med Genomics 2008;1:4
Abstract
Although the PBS-IIc SELDI-TOF MS apparatus has been extensively used in the search
for better biomarkers, issues have been raised concerning the semi-quantitative nature
of the technique and its reproducibility. To overcome these limitations, a new SELDI-TOF
MS instrument has been introduced: the PCS 4000 series. Changes in this
apparatus compared to the older one include an increased dynamic range of the
detector, an adjusted configuration of the detector sensitivity, a raster scan that ensures
more complete desorption coverage and an improved detector attenuation mechanism.
In the current study, we evaluated the performance of the old PBS-IIc and new PCS
4000 series generation SELDI-TOF MS apparatus. To this end, two different sample sets
were profiled after which the same ProteinChip arrays were analysed successively by
both instruments. Generated spectra were analysed by the associated software packages.
The performance of both instruments was evaluated by assessment of the number of
peaks detected in the two sample sets, the biomarker potential and reproducibility of
generated peak clusters, and the number of peaks detected following serum
fractionation. We could not confirm the claimed improved performance of the new PCS
4000 instrument, as assessed by the number of peaks detected, the biomarker potential
and the reproducibility. However, the PCS 4000 instrument did prove to be of superior
performance in peak detection following profiling of serum fractions. As serum
fractionation facilitates detection of low abundant proteins through reduction of the
dynamic range of serum proteins, it is now increasingly applied in the search for new
potential biomarkers. Hence, although the new PCS 4000 instrument did not differ
from the old PBS-IIc apparatus in the analysis of crude serum, its superior performance
after serum fractionation does hold promise for improved biomarker detection and
identification.
Comparison of SELDI-TOF MS apparatus
Introduction
The development of mass spectrometry (MS) for the analysis of complex protein
mixtures has greatly enhanced the possibility of large-scale protein profiling studies.
Protein profiling studies are generally performed using a top-down approach, starting
with a mixture of intact proteins and peptides. After sample pre-fractionation, e.g., by
two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), proteins are identified
by peptide mass fingerprinting following tryptic digestion and/or by tandem MS. Mass
spectrometry for protein profiling is particularly important for the low-molecular-weight
fraction of the proteome, since the use of immunological assays is limited due to
a lack of antibodies for these peptides. Until recently, truly high-throughput
technologies for mass spectrometric protein profiling have been lacking. Two recent
applications of matrix-assisted laser desorption/ionisation time-of-flight mass
spectrometry (MALDI-TOF MS) combine sample pre-fractionation with MS, facilitating
the analysis of many samples at a time. A magnetic bead-based assay using beads
with different chromatographic affinities is available from Bruker Daltonics (1).
Alternatively, surface-enhanced laser desorption/ionisation time-of-flight mass
spectrometry (SELDI-TOF MS; Bio-Rad Laboratories, Hercules, CA) can be used to
profile biological matrices on arrays with different surface chemistries. A ProteinChip
Interface is available for hybrid quadrupole time-of-flight mass spectrometers (PCI-QqTOF;
e.g., QSTAR, Applied Biosystems / MDS SCIEX, Foster City, CA, USA),
permitting QqTOF analysis of ProteinChip arrays. Although QqTOF platforms have
both tandem MS capability and superior mass accuracy, they suffer from decreased
sensitivity and a limited data acquisition range (up to 4 kDa), compared to the SELDI-TOF
MS platform (up to 200 kDa).
The SELDI-TOF MS technology has been extensively used for the assessment of tissue,
serum and plasma to find diagnostic, prognostic or therapy-predictive biomarkers for
diseases, especially cancer (2-8). However, issues have been raised concerning the
semi-quantitative nature of the technique and its reproducibility (9-11). The first generation
SELDI-TOF MS instruments (PBS-II and PBS-IIc) generate spectra with a fixed
maximum signal, which is set to 100. Protein abundances exceeding this maximum
saturate the detector and are cut off to 100, neglecting the excess abundance and leading
to underestimated peak intensities from both the saturated peak and its following peak,
as the detector remains saturated for some time (12). Furthermore, settings for laser
intensity and detector sensitivity are not easily optimised to generate unsaturated
spectra for all the samples to be measured. To overcome these limitations, a new SELDI-TOF
MS instrument has been introduced: the PCS 4000 series. Changes in this
apparatus compared to the older ones are: 1) the increased dynamic range of the
detector, so that saturation is less likely to occur, 2) the special configuration for
sensitivity in the high mass range for better detection of proteins > 100 kDa, 3) a so-called
Synchronised Optical Laser Extraction, which scans each spot in a raster to
ensure complete desorption coverage, 4) a detector attenuation mechanism, enabling
signal reduction up to a specified mass and preventing saturation by matrix molecules.
Furthermore, instead of using arbitrary units, peak intensities are scaled in µA,
corresponding to the real electric current generated by the impact of ions onto the
detector. Laser intensity settings are in nJ (13).
These improvements should lead to better reproducibility of peak intensities and
detection of more peaks. Yet, the ultimate gain would be that this leads to more and
better biomarker candidates. We chose to assess these claims by serum protein profiling
of two different cohorts of cancer patients and matched controls on both the PBS-IIc
and the PCS 4000 SELDI-TOF MS. The data generated on each platform were analysed
by the associated software packages. Furthermore, the PBS-IIc generated data were
analysed by the software package associated with the PCS 4000 apparatus, to assess the
influence of the different software packages. The numbers of detected and significantly
different peaks on both instruments were compared, as was the potential of each data
set to yield a reliable classification of patients and controls. Furthermore, the
reproducibilities of the instruments were compared. Lastly, we also profiled serum
fractions and assessed the difference in number of peaks detected between the PBS-IIc
and PCS 4000 instruments.
Materials and Methods
Chemicals
All chemicals used were obtained from Sigma, St. Louis, MO, USA, unless stated
otherwise.
Patient samples
The performances of both apparatus were assessed with two distinct sample sets. A first
set of 45 sera from colorectal cancer (CRC) patients and 43 matched controls (CON) was
prospectively collected between July 2003 and October 2005 (referred to as the CRC
set). The second set consisted of 45 sera from breast cancer (BC) patients and 46
matched normal women (CON), collected between January 2003 and July 2005
(referred to as the BC set). Both sets were obtained at the Netherlands Cancer Institute,
in Amsterdam, The Netherlands. Sample collection was performed with individuals'
informed consent after approval by the institutional review boards.
Serum fractionation
Serum samples from three normal women were fractionated in duplicate on QHyperD
beads with a strong anion exchange moiety (Bio-Rad Labs), according to the manufacturer’s
protocol. Sample fractionation was performed with a Biomek 3000 Laboratory
Automation Workstation (Beckman Coulter Inc.). First, sera were denatured with 9 M
urea / 2% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS). After
binding of denatured serum to the beads, the flow-through was collected and bound
proteins were subsequently eluted with buffers with pH from 9 to 3. Remaining
proteins were finally eluted with an organic buffer.
Protein profiling
For profiling of whole serum each sample was analysed according to previously
developed protocols (4). CRC samples and their matched controls were first denatured
with 9 M urea / 2% CHAPS / 1% dithiothreitol. Then, each sample was applied in
triplicate on CM10 arrays (weak cation exchange chromatography) with 20 mM sodium
phosphate pH 5 / 0.1% Triton X-100 as a binding buffer and 20 mM sodium phosphate
pH 5 as a wash buffer. BC samples and their matched controls were denatured in 9 M
urea / 2% CHAPS, after which each sample was applied in duplicate on IMAC30 arrays
(immobilised metal affinity capture chromatography). Prior to sample application,
IMAC30 arrays were charged twice with 50 µL 100 mM nickel sulphate (Braun,
Emmenbrücke, Germany), followed by three rinses with deionised water. Phosphate
Buffered Saline (PBS; 0.01 M) pH 7.4 / 0.5 M sodium chloride / 0.1% Triton X-100 was
applied as a binding buffer and PBS pH 7.4 / 0.5 M sodium chloride as a wash buffer.
For both sample sets, a 50% sinapinic acid (SPA; Bio-Rad Labs) solution in 50%
acetonitrile (ACN) / 0.5% trifluoroacetic acid (TFA) was used as energy absorbing
matrix.
Profiling of fractionated serum was performed on both CM10 chips and IMAC30 arrays.
Binding and wash buffers were 100 mM sodium acetate pH 4 and 50 mM HEPES buffer
for CM10. For IMAC30, 100 mM copper sulphate was used as charging solution, 100
mM sodium acetate as neutralizing buffer and 100 mM sodium phosphate pH 7 / 0.5 M
sodium chloride as binding and wash buffer (all: Bio-Rad Labs). A solution of 50% SPA
in 50% ACN / 0.5% TFA was used as matrix.
During all profiling experiments arrays were assembled in 96-well format bioprocessors
(Bio-Rad Labs), which were placed on a platform shaker at 350 rpm. Arrays were
equilibrated twice with 200 µL of binding buffer, incubated with denatured sample or
QHyperD serum fraction for 30 min and, after binding, washed twice with binding
buffer, followed by two washes with wash buffer. Lastly, arrays were rinsed with
deionised water. After air-drying, 1 µL of matrix was applied twice to each array
spot.
SELDI-TOF MS analysis of all data sets was performed with both the PBS-IIc and the
PCS 4000 ProteinChip Reader (Bio-Rad Labs). Data acquisition and processing were
optimised for each sample set separately. Each spot was read twice, once with the PCS
4000 and once with the PBS-IIc instrument. Measurement settings for each apparatus
and sample set are summarised in Table 1. M/z values were calibrated externally with
All-in-One peptide standard (Bio-Rad Labs).
Statistics and bioinformatics
To account for possible differences in data processing by the different software
packages, data from the PBS-IIc were analysed with the ProteinChip Software, version
3.1 (Bio-Rad Labs) as well as with Ciphergen Express™ version 3.0.6 (Bio-Rad Labs).
PCS 4000 data were only processed with the latter package. The PBS-IIc-generated
spectra analysed by the ProteinChip Software and Ciphergen Express™ respectively
will further be referred to as “data set 1a” and “data set 1b”. The PCS 4000-generated
spectra, analysed by Ciphergen Express™ will be referred to as “data set 2”.
Table 1
Settings for protein profiling and data processing.

                                 CRC sample set                    BC sample set
                                 PBS-IIc          PCS 4000         PBS-IIc          PCS 4000
SELDI analysis parameters
  Samples                        CRC: n = 45, CON: n = 43          BC: n = 45, CON: n = 46
  Array type                     CM10                              IMAC30 Ni
  Binding conditions             20 mM NaAc pH 5                   PBS pH 7.4 + 0.5 M NaCl
  Replicates                     3                                 2
SELDI acquisition parameters
  m/z range                      0-200 kDa        0-200 kDa        0-200 kDa        0-200 kDa
  Laser intensity                155              3500 nJ          155              3500 nJ
  Detector sensitivity           6                n.a.             5                n.a.
  Deflector/detector attenuation 2000 Da          2000 Da          1000 Da          1000 Da
  Laser shots kept               65               530              105              530
Not-assessable spectra           CRC: 2 / 135, CON: 4 / 129        BC: 4 / 90, CON: 0 / 92
Cluster settings:
  First pass                     S/N 5            Valley depth 5   S/N 5            Valley depth 5
  Second pass                    S/N 2            Valley depth 2   S/N 2            Valley depth 2
  Cluster mass window            0.3%             0.3%             0.3%             0.3%
  Present in                     45%              45%              30%              30%

Abbreviations: BC: breast cancer, CM10: weak cation exchange ProteinChip array, CON: control, CRC:
colorectal cancer, IMAC30: Immobilised metal affinity capture ProteinChip array, n.a.: not applicable, PBS:
phosphate buffered saline, S/N: Signal to noise ratio.
Spectra from the CRC and BC sets were analysed separately. Acquired spectra from each
set were compiled and analysed as a whole. In both the ProteinChip and Ciphergen
Express™ software, spectra were baseline subtracted with the following settings: smooth
before fitting baseline: 25 points, fitting width: 10 times expected peak width. Filtering
was “on” using an average width of 0.2 times the expected peak width. The noise was
calculated from 2000 or 1000 to 200,000 Da for the CRC and BC set respectively.
Spectra were normalised to the total ion current in the same m/z range. For peak
clustering with the ProteinChip Software, the Biomarker Wizard (BMW; Bio-Rad Labs)
application was used. For clustering with the Ciphergen Express™ software (Bio-Rad
Labs), identical clustering conditions were defined (see Table 1). In each set, peaks were
auto-detected starting from 2000 Da.
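The normalisation to total ion current described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy spectra are hypothetical, and the normalisation factor is assumed to be the mean total ion current of the set divided by the total ion current of the spectrum within the chosen m/z range. The actual algorithms of the ProteinChip and Ciphergen Express™ software may differ.

```python
def tic_normalisation_factors(spectra, mz_min, mz_max):
    """Return one normalisation factor per spectrum.

    spectra: list of spectra, each a list of (m/z, intensity) pairs.
    Assumption: factor = mean TIC of the set / TIC of the spectrum,
    with the TIC summed over mz_min <= m/z <= mz_max only.
    """
    tics = [sum(i for mz, i in s if mz_min <= mz <= mz_max) for s in spectra]
    mean_tic = sum(tics) / len(tics)
    return [mean_tic / tic for tic in tics]


def normalise(spectrum, factor):
    """Scale all intensities of one spectrum by its normalisation factor."""
    return [(mz, intensity * factor) for mz, intensity in spectrum]


# Hypothetical toy spectra; the first point lies below the 2000 Da limit
# that was used for the CRC set and is therefore ignored in the TIC.
toy_spectra = [
    [(1500.0, 10.0), (2500.0, 2.0), (3000.0, 2.0)],
    [(2500.0, 4.0), (3000.0, 8.0)],
]
toy_factors = tic_normalisation_factors(toy_spectra, 2000.0, 200000.0)
toy_normalised = [normalise(s, f) for s, f in zip(toy_spectra, toy_factors)]
```

After scaling, every spectrum has the same summed intensity within the selected m/z range, so that peak intensities can be compared between spectra.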
For the CRC and BC set, peak intensities from the triplicate (CRC) and duplicate (BC)
analyses were averaged and mean peak intensities between groups compared by the
non-parametric Mann-Whitney U (MWU) test (p < 0.01 considered statistically
significant). For the CRC set, the median coefficient of variation (CV) in each sample set
was calculated from the CVs of the triplicate analyses for all clustered peaks in each data
set as well as for all common peaks present in each of the three data sets.
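The median CV can be illustrated with a minimal Python sketch. The data layout (a dictionary of peaks, each holding the replicate intensities per sample), the function names and the use of the sample standard deviation (n - 1 denominator) are assumptions made for this example only.

```python
from statistics import mean, median, stdev


def cv_percent(replicates):
    """Coefficient of variation (%) of the replicate intensities of one peak."""
    return 100.0 * stdev(replicates) / mean(replicates)


def median_cv(intensities):
    """Median CV over all peaks and samples.

    intensities: {peak m/z: {sample id: [replicate intensities]}}
    (hypothetical layout chosen for this sketch).
    """
    cvs = [
        cv_percent(replicates)
        for per_sample in intensities.values()
        for replicates in per_sample.values()
    ]
    return median(cvs)


# Hypothetical triplicate intensities for two peaks in two samples.
toy_intensities = {
    4446: {"sample_1": [9.0, 10.0, 11.0], "sample_2": [18.0, 20.0, 22.0]},
    32308: {"sample_1": [5.0, 10.0, 15.0], "sample_2": [0.9, 1.0, 1.1]},
}
toy_median_cv = median_cv(toy_intensities)
```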
For the BC set, Spearman’s rank correlation coefficient was calculated on peak
intensities in each duplicate analysis for all three data sets. The majority of peaks (>
50%) detected spectrum wide in the three data sets are of relatively low average
intensity (< 5), increasing the chance of finding potential biomarkers in the low
intensity range. Hence, the reproducibility of peaks in the low intensity range is of
special interest. However, correlation analyses are influenced by outliers (i.e., the few
high intensity peaks detected), even when using non-parametric statistics. We therefore
chose to assess the reproducibility in subsets of peaks, starting with inclusion of the
10%, 20%, 30%, etc. of peaks with lowest intensity, and ending with inclusion of all
peaks detected. Spearman’s rank correlation coefficient and corresponding p-values
were subsequently plotted per subset of peaks.
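The subset approach can be sketched as follows for the duplicate analysis of a single sample. The sketch is written in plain Python; the function names, the ordering of peaks by their mean duplicate intensity and the minimum subset size of three peaks are assumptions for illustration and are not taken from the software that was used.

```python
def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient; NaN if one variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cumulative_spearman(rep1, rep2, percents=(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)):
    """Spearman's rho for the lowest 10%, 20%, ... 100% of peaks.

    rep1, rep2: intensities of the same peaks in duplicate 1 and 2.
    Peaks are ordered by their mean duplicate intensity (an assumption).
    """
    mean_intensity = [(a + b) / 2.0 for a, b in zip(rep1, rep2)]
    order = sorted(range(len(rep1)), key=lambda k: mean_intensity[k])
    result = {}
    for p in percents:
        n = max(3, round(len(order) * p / 100))
        subset = order[:n]
        result[p] = spearman([rep1[k] for k in subset], [rep2[k] for k in subset])
    return result
```

Because each subset contains all peaks of the previous one, the curve shows from which intensity level onwards the duplicate measurements start to agree.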
Classification performance of the data sets obtained with both apparatus was assessed by
building classification trees with the Biomarker Patterns Software (BPS; Bio-Rad Labs).
Trees were generated with the gini method and the minimal cost tree was chosen in
both the CRC and BC sample set. A ten-fold cross validation was used to estimate the
sensitivity and specificity for each tree.
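The principle of a single tree node can be illustrated with a minimal Python sketch that selects the intensity threshold of one peak by minimising the weighted Gini impurity and then reports sensitivity and specificity. It is not the Biomarker Patterns Software algorithm: the function names and the toy data are hypothetical, low intensities are arbitrarily assumed to indicate patients, and the ten-fold cross validation is omitted, so the performance shown is a resubstitution estimate.

```python
def gini(labels):
    """Gini impurity of a list of class labels (1 = patient, 0 = control)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(intensities, labels):
    """Return (threshold, weighted Gini impurity) of the best single split."""
    best_thr, best_score = None, float("inf")
    values = sorted(set(intensities))
    for low, high in zip(values, values[1:]):
        thr = (low + high) / 2.0  # candidate threshold between two intensities
        left = [c for x, c in zip(intensities, labels) if x <= thr]
        right = [c for x, c in zip(intensities, labels) if x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score


def sensitivity_specificity(intensities, labels, thr):
    """Classify intensity <= thr as patient, > thr as control (assumption)."""
    n_patients = sum(labels)
    n_controls = len(labels) - n_patients
    tp = sum(1 for x, c in zip(intensities, labels) if c == 1 and x <= thr)
    tn = sum(1 for x, c in zip(intensities, labels) if c == 0 and x > thr)
    return tp / n_patients, tn / n_controls


# Hypothetical intensities of one peak cluster in 5 patients and 4 controls.
toy_intensities = [0.2, 0.3, 0.4, 0.5, 0.9, 1.1, 1.5, 1.8, 2.0]
toy_labels = [1, 1, 1, 1, 0, 1, 0, 0, 0]
toy_thr, toy_score = best_split(toy_intensities, toy_labels)
toy_sens, toy_spec = sensitivity_specificity(toy_intensities, toy_labels, toy_thr)
```

A full classification tree repeats this search over all peak clusters and recursively within each resulting node.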
For the CM10 and IMAC30 serum fractionation sets, baseline correction and noise
calculation were performed as described for the CRC and BC set. For each duplicate
fraction, peaks were auto-detected by the ProteinChip software or Ciphergen Express™
by the settings described in Table 1. The number of peaks in each fraction was assessed,
as well as the number of unique peaks across all fractions.
Results
Protein profiling CRC set
Six spectra did not contain a protein profile and were thus not assessable (Table 1). For
data set 1a, the normalisation factors of the assessable spectra, as estimated by the
apparatus-associated software, ranged from 0.67 to 3.39 (log -0.18 to 0.53), and for data
set 1b from 0.62 to 2.4 (log -0.21 to 0.38). For data set 2, values ranged from 0.33 to 4.8
(log -0.49 to 0.68). Aberrant normalisation factors (> 2 SD from the mean of the log
normalisation factor) were found for 13 and 11 spectra from data set 1a and 1b, respectively,
and for 14 spectra from data set 2. Since these spectra were mostly not from the same
samples for the two apparatus, none were excluded, to ensure an equal comparison of
both machines.
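The criterion for an aberrant normalisation factor can be written as a short Python sketch. The function name and the example factors are hypothetical. The base-10 logarithm and the sample standard deviation are assumptions; the base-10 logarithm approximately reproduces the log ranges reported above (for example, log10 of 3.39 is 0.53).

```python
from math import log10
from statistics import mean, stdev


def aberrant_spectra(factors, n_sd=2.0):
    """Indices of spectra whose log10 normalisation factor deviates more
    than n_sd standard deviations from the mean log10 factor."""
    logs = [log10(f) for f in factors]
    centre, spread = mean(logs), stdev(logs)
    return [k for k, value in enumerate(logs) if abs(value - centre) > n_sd * spread]


# Hypothetical normalisation factors; only the last one is clearly aberrant.
toy_factors = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 100.0]
toy_flagged = aberrant_spectra(toy_factors)
```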
Comparing CRC vs. CON, 32 clusters were generated for data set 1a (Table 2). In
contrast, despite similar settings for processing and the same spectra, only 27 clusters
were generated for data set 1b. With the PCS 4000 (data set 2) 48 clusters could be
detected. Although the number of detected peaks was highest for data set 2, data set 1a
yielded a similar number of significantly different peaks. Detailed peak cluster
information can be found in Table 3. Overall, the significantly different peaks in all data
sets were largely similar.
Table 2
Peak clustering results for the CRC and BC sample set.

                                   CRC set                BC set
Number of peaks detected in:       all      p < 0.01      all      p < 0.01
Data set 1a *                      32       19            81       47
  Also detected in data set 1b     31       13            30       22
  Also detected in data set 2      31       15            43       28
Data set 1b †                      27       14            31       22
  Also detected in data set 2      26       11            29       21
Data set 2 ‡                       48       20            59       45

Abbreviations: BC: breast cancer, CRC: colorectal cancer. * Data set 1a: PBS-IIc generated data, analysed by
the ProteinChip software, † data set 1b: PBS-IIc generated data, analysed by Ciphergen Express, ‡ data set 2:
PCS 4000 generated data, analysed by Ciphergen Express.
Classification trees were built with all clustered peaks in each data set and with the
subset of the 25 clusters that were detected in all three data sets (Table 4). The best tree
was generated with data set 2, with m/z 4446 as single classifier and sensitivity and
specificity ≥ 80%. Since this peak was not detected in data set 1a and 1b, other peaks
were used as classifiers in these sets, respectively m/z 15930 and 32308. The
classification trees constructed on the subset of 25 common clusters in data sets 1a and
1b applied the same cluster (m/z 32308). The best classifier of data set 2 made use of
apparently the same cluster (m/z 32394), and had a better performance as single
classifier in this set than in set 1a, but a similar performance as in set 1b.
Table 3
Peak cluster information for CRC and BC sample set.

CRC sample set
Data set 1a: 2746*, 3979*, 4160, 4179*, 4290*, 4481, 4605, 5723*, 5913*, 6443, 6459*, 6641,
  6655*, 6846, 7778, 7982*, 8079*, 8968*, 9186, 9307*, 13779*, 14077*, 15121, 15930*, 16105*,
  23426*, 28098*, 32308*, 51034, 56408, 67003, 79100
Data set 1b: 2745*, 4162, 4181*, 4474, 5724*, 6443, 6460*, 6640, 6654*, 6860*, 7779, 7982*,
  8962*, 9187, 9307*, 13778*, 14077*, 15121, 15919*, 16105*, 23426, 28093, 32308*, 51006,
  56337, 67003, 79040
Data set 2: 2746*, 3163, 3406*, 3978*, 4159, 4182*, 4287*, 4303, 4446*, 4480, 4607, 4961,
  5719*, 5915*, 6442, 6458*, 6643, 6659*, 6687, 6865*, 7778, 7990*, 8074*, 8159, 8889, 8962*,
  9210, 9315, 9360, 9409, 9593, 10072*, 12889, 13796, 14053, 15168, 15982*, 16139*, 16334*,
  18617*, 23486, 28216, 32394*, 39727, 51116, 56685, 67239, 80075

BC sample set
Data set 1a: 2027, 2146*, 2154*, 2235*, 2277*, 2647, 2675*, 2731*, 2747, 2760, 2775, 2794,
  2888*, 2960*, 2968*, 3091, 3107, 3151*, 3168*, 3282*, 3296*, 3431, 3451, 3689*, 3781*, 3824,
  3891*, 3898, 3916*, 3965*, 3980*, 3995*, 4078, 4137, 4155, 4204*, 4218*, 4292*, 4308*, 4334*,
  4449*, 4464*, 4484*, 4497*, 4513*, 4653, 4669, 4691, 4798, 5076, 5090, 5274*, 5348*, 5363*,
  5554*, 5815, 5916*, 5932*, 6100*, 6122*, 6142*, 6667, 6848, 6965*, 6990*, 7482, 7778, 7939*,
  7985, 8155, 8948*, 9161*, 9302, 13925*, 33475*, 43108, 60776, 66702*, 79724, 109494, 133447
Data set 1b: 3165*, 3281*, 3963*, 3980*, 3997*, 4218*, 4292*, 4308*, 4447*, 4463*, 4484*, 4652,
  5348*, 5364*, 5917*, 5932*, 6097*, 6122*, 6972*, 7778, 8155, 8955*, 9303, 13925*, 33583*,
  43015, 60804, 66711*, 79393, 133435, 149723
Data set 2: 2028, 2146*, 2760*, 2961*, 3164*, 3281*, 3683*, 3891*, 3962*, 3979*, 3994*, 4138,
  4218*, 4289*, 4304*, 4444*, 4458*, 4482*, 4650, 5078, 5348*, 5360*, 5549*, 5810, 5915*,
  5929*, 6676, 6966*, 7775, 7982, 8150, 8946*, 9151*, 9299, 9526, 11096*, 11747*, 13919*,
  14124*, 22284*, 28221, 30502*, 33490*, 40059*, 43098*, 50704*, 60889*, 67142*, 80323*,
  89750*, 91037, 93689*, 104178*, 110344*, 136611*, 149579*, 177011

Abbreviations: BC: breast cancer, CRC: colorectal cancer. *MWU test; p < 0.01.
Protein profiling BC set
Following array reading with both the PCS 4000 and PBS-IIc apparatus, two spectra did
not contain a protein profile. Along with their duplicate reading, these spectra were
excluded from further analyses (Table 1). The normalisation factors for data set 1a were
0.52 to 2.15 (log -0.29 to 0.33). Using Ciphergen Express™ software, normalisation
factors of all spectra ranged from 0.51 to 2.25 (log -0.29 to 0.35) for the PBS-IIc
generated spectra and from 0.44 to 2.69 (log -0.36 to 0.43) for the PCS 4000 generated
spectra. In total, 9 and 8 spectra from data set 1a and 1b respectively and 10 spectra
from data set 2 had an aberrant normalisation factor (>2 SD from mean of log
normalisation factor). As the majority of these spectra were from different samples for
the 3 data sets, none were excluded, to ensure equal comparison of both apparatus. In
data set 1a and 1b respectively, a total of 81 and 31 clusters were detected. The
ProteinChip software detected 51 clusters that were not detected by Ciphergen
Express™ in the same data set. Except for one cluster (> 100 kDa), these unique clusters
were all < 10 kDa in mass and < 4 in intensity. In data set 2, a total of 59 peak
clusters was detected. Fifteen of these clusters (all > 9 kDa) were not detected in either
data set 1a or 1b. Tables 2 and 3, respectively, provide an overview of peak clustering
results and detailed peak cluster information.
Table 4
Characteristics of the classification trees constructed on the CRC sample set.

Tree characteristics                                                 Tree performance*
Data set   Clusters (#)    Node 1                Node 2              Sens (%)   Spec (%)
1a         All (32)        m/z 15930 ≤ 35.576    m/z 51034 ≤ 1.372   68.8       62.8
1a         Common (25)     m/z 15930 ≤ 35.576    m/z 51034 ≤ 1.372   68.8       62.8
1b         All (27)        m/z 32308 ≤ 0.676     -                   75.6       73.8
1b         Common (25)     m/z 32308 ≤ 0.676     -                   75.6       73.8
2          All (48)        m/z 4446 ≤ 1.136      -                   82.2       90.5
2          Common (25)     m/z 32394 ≤ 0.149     -                   73.3       81.0

Abbreviations: sens: sensitivity, spec: specificity. *Tree performance as determined by 10-fold cross validation.
Classification trees were generated on all peaks detected in data set 1a, 1b or 2, and on
the subset of peaks detected across all three data sets (Table 5). All optimum decision
trees constructed on data set 1a and 1b, using either all peaks detected or only the
common peaks, applied m/z 3964 as single classifier, with data set 1b yielding the best
performance of ~ 80%. The trees constructed on data set 2 made use of different
clusters, either considering all peaks detected (m/z 9151 and m/z 5360) or the common
peaks detected (m/z 3979 and m/z 4218). However, the tree constructed on data set 1b
generally had the best performance.
Table 5
Characteristics of the classification trees constructed on the BC sample set.

Tree characteristics                                                 Tree performance*
Data set   Clusters (#)    Node 1                Node 2              Sens (%)   Spec (%)
1a         All (81)        m/z 3964 ≤ 4.010      -                   74.4       73.9
1a         Common (28)     m/z 3964 ≤ 4.010      -                   74.4       73.9
1b         All (31)        m/z 3964 ≤ 3.855      -                   83.7       78.3
1b         Common (28)     m/z 3964 ≤ 3.855      -                   83.7       78.3
2          All (59)        m/z 9151 ≤ 1.614      m/z 5360 ≤ 17.86    72.1       78.3
2          Common (28)     m/z 3979 ≤ 32.163     m/z 4218 ≤ 4.648    62.8       71.7

Abbreviations: sens: sensitivity, spec: specificity. *Tree performance as determined by 10-fold cross validation.
Reproducibility CRC set
For each data set on each apparatus the inter-chip reproducibility was assessed by
calculating the median CV across all samples from replicate peak intensities of all
clustered peaks and of the subset of 25 common peaks in the three data sets. The median
CV of all peaks and all common peaks was lowest for data set 2 and highest for data set
1a (Table 6). Considering all peaks, the CV was significantly different for the data sets
(Kruskal-Wallis test; p = 0.012), but not when considering only the peaks common to
each data set (Kruskal-Wallis test; p = 0.3).
Table 6
Reproducibility of the CRC data sets.

                     Median CV of
Peak clusters        Data set 1a    Data set 1b    Data set 2
All peaks            28.30%         22.61%         20.62%
Common peaks         25.48%         23.06%         21.71%
All peaks (CON)      28.52%         21.76%         18.50%
All peaks (CRC)      27.68%         23.38%         23.33%
Reproducibility BC set
For the BC sample set, Spearman’s rank correlation coefficient was calculated on peak
intensities in each duplicate analysis for successive subsets of peaks including the 10%,
20%, 30% to 100% of peaks with lowest intensity, for all three data sets. As depicted in
Figure 1, a correlation coefficient > 0.8 was only reached after inclusion of 80% of
lowest peaks in data set 1a, while in the other two data sets, this coefficient was already
reached at inclusion of < 20% of lowest peaks. Similar results were obtained when
considering the significance of correlation (Figure 1). However, when considering only
the common peaks detected across all three data sets, results obtained were highly
similar for the three data sets (Figure 2).
Serum fractionation
The numbers of clusters detected on CM10 and IMAC arrays for each sample in each
acquired fraction are summarised in Table 7. Some of the clusters occur in
several fractions. Ignoring these overlapping clusters, on average twice as many peaks
were detected in the PCS 4000 generated spectra compared to the PBS-IIc generated
spectra (analysed either by the ProteinChip software or Ciphergen Express™). This is
also illustrated in Figure 3.
Discussion
Although the PBS-IIc SELDI-TOF MS apparatus has been extensively used in the search
for better biomarkers, issues have been raised concerning the semi-quantitative nature
of the technique and its reproducibility. To overcome these limitations, a new SELDI-TOF
MS instrument has been introduced: the PCS 4000 series. In the current study, we
compared the performances of the old PBS-IIc and new PCS 4000 series generation
SELDI-TOF MS apparatus, by analysis of two sample sets.
Peak detection
For the CRC sample set, most peaks were detected with the new PCS 4000 series using
the Ciphergen Express™ software, indicating a better sensitivity and less detector
saturation of this apparatus. The latter allows for the application of increased laser
intensities, after which proteins will desorb more comprehensively, resulting in
detection of more peaks. However, for the BC sample set, most peaks were detected
with the PBS-IIc instrument using the ProteinChip software, indicating the opposite.
Interestingly, in both sample sets, fewer peaks were detected by Ciphergen Express™
than by the ProteinChip software in the spectra generated with the PBS-IIc, despite the
fact that both software packages use the same algorithm with similar settings to
generate peak clusters. Apparently, the spectrum processing algorithms underlying the
visible settings are different for both software packages.
Figure 1
Plots of Spearman’s rank correlation coefficient and p-values for all peaks detected in the BC data
sets.
Depicted are the mean (red) and median (black) values of all peaks detected in the three data sets of the BC
sample set. PBS: data set 1a (PBS-IIc generated data, analysed by ProteinChip software), PCS: data set 2 (PCS
4000 generated data, analysed by Ciphergen Express™), PBS/PCS: data set 1b (PBS-IIc generated data,
analysed by Ciphergen Express™).
In the BC set, all peaks detected in the PBS-IIc generated spectra by the ProteinChip
software, but missed by Ciphergen Express™ were < 4 in intensity. As peaks are
detected by means of their signal-to-noise ratio, detection of these low intensity peaks
becomes critical when either the noise increases or the signal decreases due to
overestimation of the baseline. Conceivably, the algorithm for noise and/or baseline
estimation has been changed between the two software packages. Due to the detector
attenuation of the PCS 4000 instrument, matrix blanking has improved compared to the
PBS-IIc. Hence, less chemical noise is expected when measuring with the PCS 4000
instrument, to which the algorithm applied in noise calculation might have been
adapted. As such, for spectra generated with the PBS-IIc (in which relatively more
chemical noise is present), the Ciphergen Express™ software will estimate the noise too
high or the signal too low, the latter being the consequence of the baseline being
estimated too high. Either way results in fewer detected peaks.
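The reasoning above can be made concrete with a small numerical sketch. The intensity, baseline and noise values and the S/N threshold of 5 are hypothetical and chosen only for illustration; they are not the settings or estimates of either software package.

```python
def signal_to_noise(raw_intensity, baseline, noise):
    """S/N of a peak after subtraction of the estimated baseline."""
    return (raw_intensity - baseline) / noise


def is_detected(raw_intensity, baseline, noise, threshold=5.0):
    """A peak is called when its S/N reaches the (hypothetical) threshold."""
    return signal_to_noise(raw_intensity, baseline, noise) >= threshold


peak = 4.5  # hypothetical raw intensity of a low-intensity peak

# Reference estimates: S/N is about 6.7, so the peak is detected.
print(is_detected(peak, baseline=0.5, noise=0.6))  # True
# Noise estimated higher: S/N drops to 4.0 and the peak is missed.
print(is_detected(peak, baseline=0.5, noise=1.0))  # False
# Baseline estimated higher: the signal shrinks and the peak is missed as well.
print(is_detected(peak, baseline=1.8, noise=0.6))  # False
```

Peaks of high intensity are largely unaffected by such shifts, which is consistent with only low-intensity peaks being lost.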
Table 7
Peak clustering results for the serum fractions profiled on CM10 and IMAC30 arrays.

Serum sample 1
              CM10 (data set*)      IMAC30 (data set*)
Fraction      1a     1b     2       1a     1b     2
FT+pH9        43     24     60      23     22     52
pH7           16      6     45       9      9     30
pH5           42     19     51      15     15     32
pH4           22     22     46      24     22     53
pH3           20     20     46      16     14     42
Organic       22     17     58      31     30     40
Total        165    108    306     118    112    249
Unique       103     78    167      85     82    158

Serum sample 2
              CM10 (data set*)      IMAC30 (data set*)
Fraction      1a     1b     2       1a     1b     2
FT+pH9        59     23     57      28     22     43
pH7           28      9     46       9      7     24
pH5           37     16     49      24     16     29
pH4           19     17     50      34     24     54
pH3           10      8     48      16     13     36
Organic       17     12     61      22     15     36
Total        170     85    311     133     97    222
Unique       106     61    162      82     72    135

Serum sample 3
              CM10 (data set*)      IMAC30 (data set*)
Fraction      1a     1b     2       1a     1b     2
FT+pH9        29     19     47      24     15     47
pH7            9      5     51       7      7     26
pH5           21     15     56      17     11     28
pH4           23     19     54      23     21     46
pH3           19     16     58      18     13     38
Organic       24     20     53      19     20     42
Total        125     94    319     108     87    227
Unique        82     71    163      67     53    128

* Data set 1a: PBS-IIc generated data, analysed by the ProteinChip software; data set 1b: PBS-IIc generated
data, analysed by Ciphergen Express; data set 2: PCS 4000 generated data, analysed by Ciphergen Express.
The difference between peaks detected by either software package in the PBS-IIc
generated spectra was more pronounced in the BC set than in the CRC set. These two
data sets differed in their deflector / detector attenuation settings (CRC: 2000 Da, BC:
1000 Da), but in both sets, the noise was calculated between 2 and 200 kDa. However,
as matrix peaks are generally observed up to 2000 Da, their contribution to the noise
will most likely increase with decreasing deflector settings. Hence, the difference in
deflector settings could have caused higher noise estimation in the BC set compared to
the CRC set. Combined with the probable noise overestimation by Ciphergen Express™
in PBS-IIc generated spectra, and the fact that relative to the CRC data sets, the BC data
sets contained more low intensity peaks (30 and 70%, respectively), which were mainly
present in the <10 kDa range, this might explain the more pronounced difference in
number of peaks detected in the PBS-IIc generated BC data set by both software
packages.
The difference in deflector / detector attenuation settings might also explain why,
contrary to the CRC set, in the BC set more peaks were detected by the ProteinChip
software in the PBS-IIc spectra than by Ciphergen Express™ in the PCS 4000 spectra.
Compared to the ProteinChip software, the noise calculation algorithm in Ciphergen
Express™ apparently is more sensitive to the noise in the low molecular weight range.
Due to the difference in detector attenuation settings, this low molecular weight range
will contain a higher signal in the BC spectra than in the CRC spectra. Consequently,
the noise is estimated higher and fewer peaks are detected. This hypothesis is supported
by the observation that all peaks detected in the PBS-IIc spectra, but not in the PCS
4000 spectra were < 3 in intensity.
One of the alleged improvements of the PCS 4000 compared to its PBS-IIc predecessor
is its special configuration for sensitivity in the high mass range that allows detection of
proteins above 100 kDa. Indeed, in the BC set, four > 100 kDa peaks were detected
exclusively in the PCS 4000 generated spectra, compared to two peaks in the PBS-IIc
generated spectra. Moreover, all peaks that were detected exclusively in the PCS 4000
spectra by Ciphergen Express™ were above 10 kDa. However, no proteins > 100 kDa were
detected in any of the CRC data sets, which argues against a better sensitivity of the
PCS 4000 series for proteins in the higher mass range. Most peaks detected only in data set 2
were in the 2-10 kDa range. The differences in detection of high molecular weight
peaks could, however, be caused by the different array types used for the analyses of
both sample sets.
Classification
As the ultimate gain of the improved performance of the PCS 4000 instrument would
be detection of more and better biomarker candidates, we also assessed the classification
potential of the data sets generated by both machines. For the CRC set, the improved
performance of the new instrument was indeed reflected in the classifiers constructed,
as the best classification was obtained with the data set generated by the PCS 4000
instrument, using the total number of peaks detected. When using the subset of peaks
detected in all three data sets, the performance of the classifiers built on data sets 1b and
2 was similar. For the BC data set, results were less clear-cut. While for data sets 1a
and 1b only one classifier was applied in the different optimum decision trees
constructed, best performance was achieved in data set 1b. Apparently, the different
spectrum processing algorithms underlying both software packages also contribute to
the alleged improved performance of the PCS 4000 instrument. However, application of
both the PCS 4000 and Ciphergen Express™ yielded no better classifiers. Hence, for the
BC set, the superior performance of the PCS 4000 instrument in providing better
biomarker candidates could not be confirmed. It cannot, however, be excluded that
our data sets do not contain any real biomarkers.
Reproducibility
For the CRC set, the reproducibility of peak intensities was largely similar across data
sets, although a non-significant trend could be seen to a lower CV for data set 2
compared to 1a and 1b. Thus, the raster-wise spot scanning and the reduced detector
saturation of the PCS 4000 series do not seem to result in significantly better
reproducibility. The fact that significant differences in CV were seen when all peaks
were considered indicates that the surplus of peaks detected in data set 2 consists of
more robust peaks than the ones also detected in the other data sets, causing the median
CV to drop. Reproducibility of the PCS 4000 instrument as measured by the CV has
been stated to be < 20% using an external standard (13). It is not known to us in which
m/z range this reproducibility was obtained and whether this was with manual or
robotic sample handling. However, our observed median CV is well in concordance
with this value, especially taking the manual sample handling into account.
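For reference, a median CV of the kind compared here can be computed as sketched below. The sketch assumes that the CV of each peak cluster is the sample standard deviation of its replicate intensities divided by their mean, and that the median is then taken over all clusters; the function names and the example values are hypothetical.

```python
from statistics import mean, median, stdev


def cv_percent(replicates):
    """Coefficient of variation (%) of one peak cluster across replicate spectra."""
    return 100.0 * stdev(replicates) / mean(replicates)


def median_cv(peak_replicates):
    """Median CV over all peak clusters.

    peak_replicates: {peak_id: [intensity in each replicate spectrum]}
    """
    return median(cv_percent(values) for values in peak_replicates.values())


# Hypothetical replicate intensities for three peak clusters.
example = {
    "peak_a": [10.0, 10.0, 10.0],  # CV 0%
    "peak_b": [8.0, 10.0, 12.0],   # CV 20%
    "peak_c": [5.0, 10.0, 15.0],   # CV 50%
}
print(median_cv(example))
```

Because the median is used, a surplus of either very robust or very variable peaks in one data set shifts the summary value, which is the effect discussed above for data set 2.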
Figure 2
Plots of Spearman’s rank correlation coefficient and p-values for common peaks detected in the BC
data sets.
Depicted are the mean (red) and median (black) values of common peaks detected across all three data sets of
the BC sample set. PBS: data set 1a (PBS-IIc generated data, analysed by ProteinChip software), PCS: data set
2 (PCS 4000 generated data, analysed by Ciphergen Express™), PBS/PCS: data set 1b (PBS-IIc generated data,
analysed by Ciphergen Express™).
Reproducibility in the BC data sets was assessed by calculation of Spearman’s rank
correlation coefficient on duplicate intensities of the 10 to 100% peaks with lowest
intensity. When all peaks detected were included in this calculation, usage of the PCS
4000 and Ciphergen Express™ software package led to a better performance, as
statistically significantly (p < 0.05) good correlations (R > 0.8) were already achieved
upon inclusion of only 20% of lowest peaks, compared to the 80% of lowest peaks
necessary to achieve comparable results in the PBS-IIc generated data set. However,
when correcting for the excess of low intensity peaks detected in data set 1a relative to
data set 2 by considering only the peaks detected across all three data sets, results
obtained were highly similar for the three data sets. Thus, the improved features of the
PCS 4000 instrument relative to the PBS-IIc apparatus do not lead to an improved
reproducibility, as already observed in the CRC data sets.
Serum fractionation
Analysis of the PBS-IIc generated spectra by Ciphergen Express™ generally yielded the
lowest number of peaks detected. Hence, the performance of the PCS 4000 in serum
fractionation is indeed superior compared to the PBS-IIc instrument, reflecting the
improved spot coverage and increased detector sensitivity. These observations are
highly similar to the results obtained following peak detection in the three CRC data
sets.
Figure 3
Spectra of serum fractions analysed on CM10 arrays and measured on the PBS-IIc and PCS 4000
instrument. A: flow through/pH 9 fractions, B: pH 7 fractions.
Although deflector / detector attenuation settings were different for the fractionation
spectra on IMAC and CM10 chips, peak clustering results were highly similar for the
two array types used, contrary to the results obtained in the CRC and BC sample sets.
This could be due to the fact that these spectra have a higher noise level than spectra
from crude serum (data not shown), limiting the influence of the different noise
estimation between both software packages. Moreover, the number of peaks < 10 kDa is
similar in the fractionation spectra from the IMAC and CM10 chips, contrary to the
spectra from the CRC and BC set, which could also cause less influence of the noise
estimation on peak detection.
Conclusion
In conclusion, regarding the number of peaks detected, the biomarker potential and the
reproducibility of the two sample sets investigated by both the old (PBS-IIc) and new
(PCS 4000) generation SELDI-TOF MS apparatus, we could not confirm the alleged
improved performance of the PCS 4000 instrument over the PBS-IIc apparatus.
However, the PCS 4000 instrument did prove to be of superior performance in peak
detection following profiling of serum fractions. Until now, the majority of studies in
which SELDI-TOF MS was applied in crude serum protein profiling for biomarker
discovery generally reported high abundant, non-disease-specific proteins as potential
biomarkers. However, the large dynamic range of crude serum hampers detection of the
allegedly high-informative low abundant serum proteins. As serum fractionation
facilitates detection of low abundant proteins through reduction of this dynamic range,
it is increasingly applied in the search for new potential biomarkers. Hence, although
the new PCS 4000 instrument did not differ from the old PBS-IIc apparatus in the
analysis of crude serum, its superior performance in the analysis of fractionated serum
samples does hold promise for improved biomarker detection and identification.
Acknowledgement
The authors gratefully acknowledge Ciphergen Biosystems for use of the PCS 4000
SELDI-TOF MS, and the Department of Clinical Chemistry, University Hospital
Maastricht for use of the Ciphergen Express™ software package. Wouter Meuleman is
greatly acknowledged for help with data analysis.
References
(1) http://www.bruker.nl/daltonics/home_daltonics.html. 2007.
(2) Adam BL, Qu Y, Davis JW, Ward MD, Clements MA, Cazares LH et al. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res 2002; 62(13):3609-3614.
(3) Caspersen MB, Sorensen NM, Schrohl AS, Iversen P, Nielsen HJ, Brunner N. Investigation of tissue inhibitor of metalloproteinases 1 in plasma from colorectal cancer patients and blood donors by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry. Int J Biol Markers 2007; 22(2):89-94.
(4) Engwegen JY, Helgason HH, Cats A, Harris N, Bonfrer JM, Schellens JH et al. Identification of serum proteins discriminating colorectal cancer patients and healthy controls using surface-enhanced laser desorption ionisation-time of flight mass spectrometry. World J Gastroenterol 2006; 12(10):1536-1544.
(5) Li J, Zhao J, Yu X, Lange J, Kuerer H, Krishnamurthy S et al. Identification of biomarkers for breast cancer in nipple aspiration and ductal lavage fluid. Clin Cancer Res 2005; 11(23):8312-8320.
(6) Schultz IJ, De Kok JB, Witjes JA, Babjuk M, Willems JL, Wester K et al. Simultaneous proteomic and genomic analysis of primary Ta urothelial cell carcinomas for the prediction of tumor recurrence. Anticancer Res 2007; 27(2):1051-1058.
(7) Zhang Z, Bast RC, Jr., Yu Y, Li J, Sokoll LJ, Rai AJ et al. Three biomarkers identified from serum proteomic analysis for the detection of early stage ovarian cancer. Cancer Res 2004; 64(16):5882-5890.
(8) Freed GL, Cazares LH, Fichandler CE, Fuller TW, Sawyer CA, Stack BC, Jr. et al. Differential Capture of Serum Proteins for Expression Profiling and Biomarker Discovery in Pre- and Posttreatment Head and Neck Cancer Samples. Laryngoscope 2008; 118(1):61-68.
(9) Diamandis EP. Point: Proteomic patterns in biological fluids: do they represent the future of cancer diagnostics? Clin Chem 2003; 49(8):1272-1275.
(10) Baggerly KA, Morris JS, Coombes KR. Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments. Bioinformatics 2004; 20(5):777-785.
(11) Ransohoff DF. Lessons from controversy: ovarian cancer screening and serum proteomics. J Natl Cancer Inst 2005; 97(4):315-319.
(12) Malyarenko DI, Cooke WE, Adam BL, Malik G, Chen H, Tracy ER et al. Enhancement of sensitivity and resolution of surface-enhanced laser desorption/ionization time-of-flight mass spectrometric records for serum peptides using time-series analysis techniques. Clin Chem 2005; 51(1):65-74.
(13) ProteinChip System, Series 4000, Product Note (www.bio-rad.com). 2005.
Chapter 2.2
Serum protein profiling using SELDI-TOF MS: influence of sample storage duration
Marie-Christine W. Gast
Carla H. van Gils
Lodewijk F.A. Wessels
Nathan Harris
Johannes M.G. Bonfrer
Emiel J. Th. Rutgers
Jan H.M. Schellens
Jos H. Beijnen
Submitted for publication
Abstract
In the last two decades, great efforts have been made in the search for better serum
protein biomarkers that can be applied in screening, diagnosis and prognosis of cancer.
One of the technologies used extensively for protein profiling is surface-enhanced laser
desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF MS). However,
issues have been raised concerning the robustness and validity of alleged serum markers
discovered by SELDI-TOF MS. Pre-analytical variables, such as sample collection,
processing and storage temperature, have been shown to exert a profound effect on
protein profiles, irrespective of true biological variation. However, little is known about
the possible effects of the pre-analytical variable ‘sample storage duration’ on the serum
proteome. We therefore aimed to investigate the effects of extended storage duration on
the serum protein profile.
To this end, archival sera of 140 breast cancer patients, stored at -30°C for 1 to 11 years,
were profiled by SELDI-TOF MS using Immobilized Metal Affinity Capture arrays, a
condition applied in the majority of biomarker discovery studies performed thus far in
breast cancer. Spectrum-wide, 76 peak clusters were detected, 14 of which were found
significantly associated to sample storage duration, following five distinct patterns.
These clusters were structurally identified as C3a des-arginine anaphylatoxin and
multiple fragments of albumin and fibrinogen. These proteins have, however, also been
described previously as potential cancer markers, rendering them specific to both
disease and sample handling issues. Hence, to prevent experimental variation from being
interpreted erroneously as disease-associated variation, assessment of potential
confounding by pre-analytical parameters (such as storage time) is a prerequisite in
biomarker discovery and validation studies. Moreover, given the different (non-linear)
patterns by which peak intensities were found to be associated with storage duration,
merely linear corrections for sample storage duration will not necessarily suffice.
Introduction
In the last two decades, much effort has been devoted to the search for improved
markers that can be applied in screening, diagnosis and prognosis of cancer. With
cancer being a genetic disease, genetic markers were initially pursued by the
investigation of the cancer genome. It is, however, currently understood that gene
analysis by itself provides an incomplete picture. Due to alternative splicing of both
mRNA and proteins, combined with more than 100 unique post-translational
modifications, one gene can give rise to multiple protein species (1). Hence, compared
to the genome, the proteome can provide a more dynamic and accurate reflection of
both the intrinsic genetic programme of the cell and the impact of its immediate
environment (2). Since proteome analysis can provide the link between gene sequence
and cellular physiology (3), proteomics is expected to complement gene analyses for the
detection of novel cancer markers (4).
In search of those markers, several different methods based on mass spectrometry (MS)
have been applied to interrogate the proteome. One of the technologies used
extensively for protein profiling is surface-enhanced laser desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF MS) (5). This technology combines retentive
chromatography with laser desorption/ionisation MS instrumentation, enabling high-throughput mass profiling of highly complex biological samples such as serum. During
subsequent analyses, spectral patterns are compared across sample groups to find
discerning masses or differential changes in peak intensities.
The majority of SELDI studies reported thus far have investigated serum, though the
technology is equally effective in analysing tissue lysates, for instance (6). Serum,
however, is an easy to sample, readily accessible protein-rich body fluid, perfusing all
tissues of the body and thus, theoretically, providing a good reflection of the human
proteome (7). In addition, existing serum banks could readily provide serum from a
large number of patients, enabling studies to be carried out in a timely fashion, as
sample collection otherwise would have taken years (8). Indeed, many reports have
described the successful application of SELDI-TOF MS in the discovery of potential
serum markers for different types of cancer, such as ovarian (9), colorectal (10), and
thyroid carcinoma (11).
However, issues have been raised concerning the robustness and validity of alleged
serum markers discovered by SELDI-TOF MS. A potential drawback of analysing high-dimensional proteomic (SELDI-TOF MS) data for disease-associated biomarkers is the
propensity to discover patterns resulting from pre-analytical artefacts in a given sample
set, rather than from the pathology of interest (12). Indeed, several lines of evidence
indicate that pre-analytical variables, such as sample collection, processing and storage
temperature, can exert a profound effect on protein profiles, regardless of true biological
variation (13-15).
However, little is yet known about the possible effects of the pre-analytical variable
‘sample storage duration’ on the serum proteome. Although this parameter has been
investigated in two studies, only very few sera (n ≤ 12), stored for relatively short
periods of time (1-3 months) were profiled (16;17). Clinical studies generally exceed
these storage durations, since study sera either originate from sample banks (18-20), or
are collected prospectively over a period of years (21). Therefore, we previously set out
to study the effects of longer storage duration periods (0 to 16 months), with a larger
sample size (n = 150) (22). Nonetheless, even this extended storage interval is generally
surpassed by clinical proteomics studies, as, for instance, McLerran et al. (23) have
reported a collection interval of more than 20 years. Their prostate cancer study
provides a clear example of the potentially detrimental effects of this long-term storage
duration on clinical proteomics studies. Following analysis of prospectively collected
sera, their initial results (obtained in archival sera) could not be confirmed. It was not
until they subjected their initial study to extensive post-study data analysis, that they
discovered their study to be biased by, amongst others, sample storage duration, as the
cases had a much longer sample storage duration compared to the controls (23).
Although the study of McLerran is not unique in analysing sera that originate from a
serum bank, the influence of storage duration on reported serum protein profiles is
seldom investigated. In the current study, we therefore aimed to investigate the
effects of extended storage duration (1 to 11 years) on the serum protein profile. To this
end, archival sera of 140 breast cancer patients were profiled by SELDI-TOF MS with
Immobilized Metal Affinity Capture (IMAC30) arrays, as these settings are employed in
the majority of serum biomarker discovery studies performed in breast cancer
(18;19;21;24-28). Peak clusters found significantly associated with sample storage
duration were structurally identified.
Materials and methods
Study population
Archival sera of 140 breast cancer patients, collected between January 1993 and
December 2002, were analysed in our laboratory using standardised analytical
procedures. All sera were collected prior to any therapy, with individuals’ informed
consent, after approval by the Institutional review boards. All sera originated from the
Netherlands Cancer Institute serum bank, where they had been collected and stored for
9 to 128 months at -30°C according to standard procedures.
Chemicals
All used chemicals were obtained from Sigma, St. Louis, MO, USA, unless stated
otherwise.
Serum protein profiling
Serum protein profiling was performed using the ProteinChip SELDI Reader (Bio-Rad
Labs, Hercules, CA, USA) with IMAC30 arrays. Sera were analysed in three batches (n =
39, 47, and 54 in Batch 1, 2, and 3, respectively), on three consecutive days. Throughout
the assay, arrays were assembled in a 96-well bioprocessor, which was shaken on a
platform shaker at 350 rpm. Arrays were charged twice with 50 μl 100 mM nickel
sulphate (Merck, Darmstadt, Germany) for 15 min, followed by three rinses with
deionised water (Braun, Emmenbrücke, Germany) and two equilibrations with 200 µl
Phosphate Buffered Saline (PBS; 0.01 M) pH 7.4 / 0.5 M sodium chloride / 0.1%
Triton X-100 (binding buffer; sodium chloride from Merck) for 5 min. Sera were thawed on
ice and denatured by 1:10 dilution in 9 M urea / 2% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonic acid (CHAPS). Pretreated samples were diluted
1:10 in binding buffer and randomly applied in singlicate to the arrays. After a 30 min
incubation, the arrays were washed twice with binding buffer and twice with PBS pH
7.4 / 0.5 M sodium chloride for 5 min. Following a quick rinse with deionised water,
arrays were air-dried. A saturated solution of sinapinic acid (Bio-Rad Labs) in 50%
acetonitrile (ACN; Biosolve, Valkenswaard, The Netherlands) / 0.5% trifluoroacetic acid
(TFA; Merck) was applied twice (0.6 μl) to the arrays as the matrix. Following air-drying, the arrays were analysed using the ProteinChip SELDI (PBS-IIc) Reader. Data
were collected between 0 and 100 kDa, averaging 65 laser shots with intensity 200,
detector sensitivity 9, and a focus lag time of 636 ns (m/z 7000). For mass accuracy, the
instrument was calibrated on the day of measurements with All-in-One peptide
standard (Bio-Rad Labs).
Statistics and bioinformatics
Spectra of the three batches were processed separately by the ProteinChip Software
v3.1 (Bio-Rad Labs). Spectra were baseline subtracted, after which they were
normalised to the total ion current. Spectra with normalisation factors > 2 or < 0.5 were
excluded from further analysis.
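The exclusion rule can be sketched as follows. The exact definition of the normalisation factor in the ProteinChip Software is not given here; the sketch assumes that it is the batch mean total ion current divided by the total ion current of the spectrum, and the function names are hypothetical.

```python
from statistics import mean


def normalisation_factors(spectra):
    """Factor scaling each spectrum's total ion current (TIC) to the batch mean.

    spectra: {spectrum_id: [baseline-subtracted intensities]}
    Assumption: factor = mean TIC of the batch / TIC of the spectrum.
    """
    tics = {name: sum(values) for name, values in spectra.items()}
    target = mean(tics.values())
    return {name: target / tic for name, tic in tics.items()}


def normalise_and_filter(spectra, low=0.5, high=2.0):
    """Normalise to TIC and drop spectra with factors outside [low, high]."""
    factors = normalisation_factors(spectra)
    kept = {}
    for name, values in spectra.items():
        factor = factors[name]
        if low <= factor <= high:
            kept[name] = [factor * v for v in values]
    return kept, factors


# Hypothetical batch: the third spectrum has a very low TIC and is excluded.
batch = {"s1": [4.0, 6.0], "s2": [5.0, 7.0], "s3": [1.0, 1.0]}
kept, factors = normalise_and_filter(batch)
print(sorted(kept), round(factors["s3"], 2))
```

A factor far from 1 indicates a spectrum whose overall signal deviates strongly from the rest of the batch, which is why such spectra are discarded rather than rescaled.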
Following spectrum pre-processing, the Biomarker Wizard (BMW) software package
was applied for peak detection. Peaks were auto-detected when occurring in at least
30% of spectra and when having a signal-to-noise ratio (S/N) of at least 7. Peak clusters
were completed with peaks with a S/N of at least 5 in a cluster mass window of 0.4%,
and peak information was subsequently exported as spreadsheet files. The batches were
analysed on three consecutive days, a parameter known to influence spectral data
(29;30). As such, merging the raw peak intensity data of the three sets would lead to spurious
results. To correct for this, the intensities of peaks occurring across all three sample sets were
log transformed to obtain normal distributions. Next, the log transformed peak
intensities were converted to standard Z-values per sample set, by subtracting the mean
and dividing by the standard deviation. The log-Z transformed data of the three sets
were subsequently merged in one file.
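The per-batch log-Z transformation described above can be sketched as follows. The natural logarithm and the sample standard deviation are assumptions, as the text specifies neither; the function names and peak identifiers are hypothetical.

```python
import math
from statistics import mean, stdev


def log_z_transform(batch):
    """Log-transform and Z-scale each peak cluster within one batch.

    batch: {peak_id: [positive intensity per sample]}
    """
    out = {}
    for peak_id, intensities in batch.items():
        logs = [math.log(v) for v in intensities]
        mu, sd = mean(logs), stdev(logs)
        out[peak_id] = [(x - mu) / sd for x in logs]
    return out


def merge_batches(batches):
    """Transform each batch separately, then concatenate peaks common to all."""
    transformed = [log_z_transform(b) for b in batches]
    common = set.intersection(*(set(t) for t in transformed))
    return {p: [z for t in transformed for z in t[p]] for p in sorted(common)}


# Hypothetical batches: the same peak measured at a tenfold different scale.
batch_1 = {"m5911": [10.0, 20.0, 40.0], "m8939": [3.0, 6.0, 12.0]}
batch_2 = {"m5911": [100.0, 200.0, 400.0], "m1000": [1.0, 2.0, 3.0]}
merged = merge_batches([batch_1, batch_2])
print(merged)
```

In this example the tenfold offset between the batches disappears after the transformation, so the merged values reflect only the relative position of each sample within its own batch.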
To investigate (non-)linear effects of sample storage time on peak expression, samples
were split according to four categories of sample storage duration (≤ 32 months, n = 16;
32-64 months, n = 59; 64-96 months, n = 48; > 96 months, n = 17). Mean peak intensity
differences between the four categories were subsequently investigated by means of
ANOVA analyses. Resulting p-values were corrected for multiple testing using the
Bonferroni correction, by multiplying p-values with the number of peak clusters
detected and tested (n = 76). Next, we investigated whether the relationship between
peak intensity and sample storage time was influenced by patients’ age and / or stage of
disease. To this end, samples were split according to tertiles of patients’ age (≤ 49.4,
49.5-61.8, and > 61.8 years), or according to stage of disease (Stage 2A and 2B). Mean peak
intensity differences between the categories were subsequently investigated by
ANOVA and t-test statistics, respectively. For storage time-associated clusters found
significantly related to patients’ age and / or stage of disease, the relationship between
peak intensity and storage duration was investigated in subgroups of age (i.e., < and >
median age) and stage (i.e., Stage 2A, and 2B). Statistical analyses were performed by
using SPSS statistical software, version 13.0 (SPSS Inc., Chicago, IL, USA).
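The categorisation, the one-way ANOVA and the Bonferroni correction can be sketched as follows. The analyses in this study were performed in SPSS; the code below is only an illustration. The handling of the category boundaries (upper bounds inclusive) and the capping of corrected p-values at 1 are assumptions, and the sketch computes the F statistic only, not the corresponding p-value.

```python
def storage_category(months):
    """Assign a sample to one of the four storage-duration categories.

    Assumption: upper boundaries (32, 64, 96 months) are inclusive.
    """
    if months <= 32:
        return "<=32"
    if months <= 64:
        return "32-64"
    if months <= 96:
        return "64-96"
    return ">96"


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of log-Z intensities."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def bonferroni(p_values, n_tests=76):
    """Multiply raw p-values by the number of peak clusters tested (cap at 1)."""
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p-values for three peak clusters.
corrected = bonferroni([0.0001, 0.001, 0.5])
print([p < 0.05 for p in corrected])
```

With 76 clusters tested, a raw p-value must be below roughly 0.00066 to remain significant at the 0.05 level after correction.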
Protein purification and identification
A serum sample (500 μl) containing the proteins we found significantly associated with
sample storage duration was denatured in 9 M urea / 2% CHAPS / 50 mM Tris-HCl pH
9. The sample was subsequently fractionated on Q Ceramic HyperD beads with a strong
anion exchange moiety (Biosepra Inc., Marlborough, MA, USA). After binding of
denatured serum to the beads, the flow through was collected and bound proteins were
subsequently eluted with buffers with pH from 9 to 3. The fractions containing the
proteins of interest were further purified by size fractionation, using Microcon 50 kDa
MW spin concentrators (YM50, Millipore, Billerica, MA, USA) with increasing
concentrations of ACN / TFA 0.1%. The filtrates containing the proteins of interest
were subsequently de-salted by application on reversed phase RP18 beads (Varian Inc.,
Palo Alto, CA, USA), followed by elution with increasing concentrations of ACN / TFA
0.1%. The purification process was monitored by profiling each fraction on IMAC30 Ni
arrays and NP20 arrays (a non-selective, silica chromatographic surface). Eluates
containing the proteins of interest were dried and redissolved in loading buffer for SDS-PAGE. Gel electrophoresis was performed on Novex NuPage gels (18% Tris-Glycine gel;
Invitrogen, San Diego, CA, USA). Following staining with colloidal Coomassie staining
(Simply Blue; Invitrogen), protein bands of interest were excised and collected. The
proteins within the excised bands were eluted by washing twice with 30% ACN / 100
mM ammonium bicarbonate, followed by dehydration in 100% ACN. Next, gel bands
were heated at 50°C for 5 min and eluted with 45% formic acid / 30% ACN / 10%
isopropanol under sonication for 30 min. After leaving the eluates overnight at room
temperature, they were profiled on NP20 arrays. Eluates were subsequently dried,
resuspended in 20 ng/µl trypsin (Promega, Madison, WI, USA) in 10% ACN / 25 mM
ammonium bicarbonate, followed by incubation at room temperature for 4 h. The
tryptic digests were profiled on NP20 chips, using 1 μl 20% alpha-cyano-4-hydroxycinnamic
acid (Bio-Rad Labs) in 50% ACN / 0.5% TFA as matrix. Peptides in the
digests were investigated with the NCBI database using the ProFound search engine at
http://prowl.rockefeller.edu/prowl-cgi/profound.exe with the following search
parameters: standard cleavage rules for trypsin, 1 missed cleavage allowed.
Confirmation of protein identity was provided by sequencing tryptic digest peptides by
quadrupole-TOF (Q-TOF) MS (Applied Biosystems / MSD Sciex, Foster City, CA, USA)
fitted with a ProteinChip Interface (PCI-1000). Fragment ion spectra resulting from Q-TOF analyses were taken to search the SwissProt 44.2 database (Homo sapiens: 11072
sequences) using the MASCOT search engine at www.matrixscience.com (Matrix
Science Ltd., London, UK), with the following search parameters: monoisotopic
precursor mass tolerance: 40 ppm, fragment mass tolerance: 0.2 Da, variable
modifications: methionine oxidation, and trypsin cleavage site. Protein identity was
furthermore confirmed by immunoassay on ProteinA beads, using the appropriate
antibodies. Beads were loaded with antibody in PBS, washed twice with PBS, incubated
for 30 min with whole serum, and washed 5 times with PBS and once with deionised
water. Bound proteins were subsequently eluted using 0.1 M acetic acid, and eluates
were profiled on NP20 arrays. The extent of non-specific binding was tested using a
murine IgG antibody (Bio-Rad Labs). For all identification experiments, a serum sample
lacking the protein of interest was run concurrently as a negative control.
Table 1
Patient and sample characteristics of the study population.

Parameter                                      Patients
N                                              140
Age (years), median [IQR]                      52.6 [45.7-66.0]
Stage*
  2A                                           48
  2B                                           78
  3A                                           10
  Unknown                                      4
Diagnosis*
  IDC                                          104
  ILC                                          23
  IDC & ILC                                    4
  Other                                        9
Sample storage time (months), median [IQR]     61.3 [49.1-76.4]
Sample collection interval                     Jan-'93 - Dec-'02
Abbreviations: IQR: interquartile range. * Pathologically determined stage and diagnosis; IDC: invasive ductal
carcinoma; ILC: invasive lobular carcinoma; other: mucinous, tubular, mixed, unknown.
Results
Study population
Patient and sample characteristics are summarized in Table 1. The majority of breast
cancer patients (> 80%) was diagnosed with Stage 2A and Stage 2B invasive ductal
carcinoma. Median patient age was 52.6 years.
Serum protein profiling
Representative SELDI-TOF MS spectra are presented in Figure 1. Following serum
protein profiling and spectrum pre-processing by the ProteinChip Software v3.1,
spectra of two breast cancer patients (Batch 3) were discarded due to aberrant
normalisation factors. Spectrum-wide, the Biomarker Wizard detected a total of 76 peak
clusters across the spectra of all three batches.
Figure 1
Representative example of protein profiles obtained from two breast cancer patient sera, stored for
10 (BC 1) and 120 (BC 2) months.
[Two spectra (BC 1, BC 2); x-axis: mass-to-charge ratio, 5000 to 9000; y-axis: relative intensity; labelled peaks: 5342.4+H, 5911.0+H, 8939.1+H.]
Influence of sample storage duration on the serum protein profile
Mean log-Z peak intensity differences between the four discrete sample storage time
categories were investigated by ANOVA analyses. In total, 14 of the 76 peak clusters
detected spectrum-wide were found to differ significantly in mean log-Z intensity
between the four categories of sample storage duration (Table 2). None of the 14 peak
clusters were found related to patients’ age (ANOVA; p > 0.05) or stage of disease (T-test; p > 0.05).
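The comparison of mean intensities between storage categories can be sketched in Python. This is a minimal illustration under stated assumptions, not the software used in this study: the function names are hypothetical, and the Bonferroni step assumes that each raw p-value is multiplied by the number of peak clusters tested (76) and capped at 1.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups.

    Each group is a list of (log-Z transformed) peak intensities for one
    storage time category.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value (assumed form: p * n_tests, capped at 1)."""
    return min(1.0, p_value * n_tests)
```

Converting the F statistic into a p-value requires the F distribution; a statistics package (for example `scipy.stats.f_oneway`) can provide both values in one call.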
Overall, five different patterns by which peak intensities were associated with sample
storage duration were observed. In Figure 2, these patterns (A-E) are depicted for five
representative peak clusters. For the peak clusters m/z 2773, 2789, 3089, and 3104, a
positive association with storage time was observed up to a storage duration of
approximately 5 years, after which peak intensities gradually decreased with sample
storage time (Pattern A). Peak intensities of m/z 4215, 5908, 5929, 6114, and 11091
were observed to continuously decrease over sample storage time (Pattern B), while the
intensity of m/z 4441 was found to increase over time (Pattern C). The fourth pattern
(Pattern D), observed in peak clusters m/z 4471, and 8939, consists of an initial increase
in peak intensity, after which the intensities remain stable over a prolonged period of
sample storage. Finally, for both clusters m/z 5341 and 5357, peak intensities were
stable up to approximately 8 years of storage, after which peak intensities decreased
rapidly (Pattern E).
Table 2
Peak cluster information of the 14 peaks found significantly associated with sample storage
duration.

Protein      Peak (m/z)   ANOVA p-value*   Regulation† (pattern)   (Alleged) identity‡
Albumin      2773         0.033            A                       m/z 2756 albumin‡, Ox
             2789         0.001            A                       m/z 2756 albumin‡, (Ox)2
             3089         0.016            A                       m/z 3089 albumin‡
             3104         0.005            A                       m/z 3089 albumin‡, Ox
Fibrinogen   4215         < 0.001          B                       Unknown fibrinogen fragment
             5341         0.003            E                       m/z 5341 fibrinogen‡
             5357         0.046            E                       m/z 5341 fibrinogen‡, Ox
             5908         < 0.001          B                       m/z 5908 fibrinogen‡
             5929         < 0.001          B                       m/z 5908 fibrinogen‡, Ox
             6114         0.013            B                       m/z 5908 fibrinogen‡ SPA adduct
             11091        0.003            B                       Unknown fibrinogen fragment
C3adesArg    4471         0.018            D                       m/z 8939 C3adesArg‡ double charge
             8937         0.002            D                       m/z 8939 C3adesArg‡
Unknown      4441         0.028            C                       Unknown
* Bonferroni corrected p-values from ANOVA test of mean intensity differences between the discrete time
intervals, † 5 patterns by which peak intensity was associated to sample storage duration: A: initial increase,
followed by a gradual decrease, B: continuous decrease or C: continuous increase, D: initial increase, after
which intensities remain stable, and E: stable up to ~8 years of storage time, followed by a decrease. Peptides
marked with ‡ were structurally identified.
Peptide purification and (tentative) identification
The m/z 2756 and 3089 peptides were present in the pH 4 eluate after QhyperD
fractionation of whole serum. Following concentration of this fraction on YM50 spin
concentrators the two peptides were detected in the 50% ACN eluate. Since the
peptides were lost during subsequent purification processes, their amino acid (aa)
sequence was determined by direct tandem MS on a PCI-interfaced Q-TOF. The two
peptides were identified as N-terminal fragments of albumin (Figure 3). The theoretical
mass of the m/z 2756 peptide (24 aa) and the m/z 3089 peptide (27 aa) is 2754.10 Da and
3085.51 Da, respectively, and the pI of both fragments is 6.04.
Figure 2
Representative examples of the five Patterns (A-E) by which peak intensities were found associated
to sample storage duration (y-axis: Z-log transformed peak intensity, x-axis: sample storage
duration in months).
[Five panels: A: m/z 3089; B: m/z 5908; C: m/z 4441; D: m/z 8939; E: m/z 5341. x-axis categories: 0 - 32, 32 - 64, 64 - 96, and 96 - 128 months.]
Figure 4 depicts the correlation matrix presenting the (absolute) Pearson’s correlation
coefficients calculated between the peak intensities of the 14 peaks found significantly
associated to storage time. The three peak clusters m/z 2756, 2773, and 2789 were found
highly correlated (Figure 4). As the mass deviation between m/z 2773, 2789 and 2756 is
approximately 16 and 32 Da, these peptides most likely represent oxidised forms of the
m/z 2756 albumin fragment. Similarly, m/z 3104 was found highly correlated to m/z
3089 (Figure 4). Given the mass difference of approximately 16 Da, m/z 3104 is
likely to represent the oxidised form of the m/z 3089 albumin fragment. These
hypothesised identities are endorsed by the observation that all five peak clusters show
a similar correlation with sample storage time (i.e., initial increase, followed by a
gradual decrease in peak intensity: Pattern A).
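The mass-shift reasoning above can be written as a short check. This is a sketch under stated assumptions: the function name is hypothetical, the shift of 16 Da per added oxygen is a nominal value, and the tolerance of 2 Da is an illustrative allowance for the mass accuracy of the profiling spectra rather than a value taken from this study.

```python
OXIDATION_SHIFT_DA = 16.0  # nominal mass added per oxygen atom


def oxidation_state(parent_mz, candidate_mz, tolerance_da=2.0, max_oxygens=2):
    """Return the number of added oxygens that explains the mass difference.

    Returns None when the difference does not match 1..max_oxygens
    oxidation steps within the (assumed) tolerance.
    """
    diff = candidate_mz - parent_mz
    for n in range(1, max_oxygens + 1):
        if abs(diff - n * OXIDATION_SHIFT_DA) <= tolerance_da:
            return n
    return None


# Peak clusters discussed in the text: (parent, candidate oxidised form).
pairs = [(2756, 2773), (2756, 2789), (3089, 3104)]
results = {pair: oxidation_state(*pair) for pair in pairs}
```

With these inputs, m/z 2773 and m/z 3104 match a single oxidation and m/z 2789 matches a double oxidation. A mass match alone does not prove identity, which is why the correlation in peak intensity is used as supporting evidence.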
Following QhyperD fractionation, the m/z 5908 peptide was detected in the flow
through. After concentration of this fraction on YM50 spin concentrators, the peptide
was found in the flow through. De-salting by use of RP18 beads resulted in elution of
the peptide in the 50% ACN / 0.1% TFA eluate. This fraction was again concentrated on
YM3 spin concentrators. Profiling of the retentate revealed only peptides < 5908 Da. In
prior identification attempts, the m/z 5908 peptide was shown to degrade with
increasing manipulation. The detected peptides are therefore likely to originate from
breakdown of the m/z 5908 peptide. Direct sequencing of three peptides by tandem MS
on a Q-TOF confirmed the peptides to originate from a fibrinogen alpha-E fragment
(FGA576-630), 54 aa in length, with theoretical mass 5904.22 Da and pI 8.07 (Figure 5).
Figure 3
Structural identification of the m/z 2756 and m/z 3089 peak clusters.
Direct sequencing of the m/z 2756 and m/z 3089 peak clusters. MS spectrum of the YM50 50% ACN eluate.
All peptides were sequenced with tandem MS using Q-TOF. Results from the MASCOT search for protein
identification include start and end positions of the peptide sequence starting from the amino acid terminal of
the whole protein, the observed m/z (Mr (obs)), transformed to its experimental mass (Mr(expt)), the
calculated mass (Mr(calc)) from the matched peptide sequence, as well as their mass difference (Delta), and
the peptide sequence (in grey: the amino acid sequence determined by Q-TOF MS).
[Q-TOF MS spectrum; x-axis: m/z (amu), 2500 to 3900; y-axis: relative intensity (%).]
MASCOT search results: m/z 2756 and m/z 3089 N-terminal truncated albumin fragments
Start-End   Mr (obs)   Mr (expt)   Mr (calc)   Delta   Sequence
25-48       2753.61    2752.60     2752.43      0.17   R.DAHKSEVAHRFKDLGEENFKALVL.I
25-51       3083.83    3082.82     3083.62     -0.80   R.DAHKSEVAHRFKDLGEENFKALVLIAF.A
Amino acid sequence of albumin fragments (start: 25 - end: 57, 82% sequence coverage):
DAHKSE VAHRFKDLGE ENFKALVLIA FAQYLQQ
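The relation between the columns of the MASCOT results can be illustrated with a few lines of Python. This is a sketch, assuming singly protonated ions, so that the experimental mass is the observed m/z minus the mass of one proton; the function names are hypothetical.

```python
PROTON_MASS_DA = 1.00728  # mass of a proton in Da


def experimental_mass(observed_mz, charge=1):
    """Neutral mass (Mr(expt)) from an observed m/z, assuming protonation."""
    return observed_mz * charge - charge * PROTON_MASS_DA


def mass_delta(observed_mz, calculated_mass, charge=1):
    """Difference between experimental and calculated mass (Delta)."""
    return experimental_mass(observed_mz, charge) - calculated_mass


# Values taken from the two rows of the MASCOT results above.
delta_2756 = mass_delta(2753.61, 2752.43)
delta_3089 = mass_delta(3083.83, 3083.62)
```

Rounded to two decimals, these calls reproduce the tabulated Delta values of 0.17 and -0.80 Da.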
The m/z 4215, m/z 5341, m/z 5357, m/z 5908, m/z 5929, m/z 6114 and m/z 11091 peak
clusters were found highly correlated to each other (Figure 4). Moreover, as the mass of
the m/z 5341 peptide corresponds to the theoretical mass of the 49 aa fibrinogen alpha-E
fragment FGA576-625, the correlated peptides most likely represent (oxidised) fibrinogen
fragments. The m/z 6114 cluster represents the SPA adduct of m/z 5908 FGA576-630.
Indeed, except for m/z 5341 and m/z 5357, the (hypothesized) fibrinogen fragments all
show a similar correlation with sample storage time (i.e., gradual decrease in peak
intensity over sample storage time: Pattern B).
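A matrix of absolute Pearson correlation coefficients, as used to group the peak clusters, can be computed as sketched below. The function names and the toy intensities are hypothetical and serve only to show the calculation; they are not data from this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def abs_correlation_matrix(intensities):
    """Absolute Pearson correlations between all pairs of peaks.

    `intensities` maps a peak label to its intensities, listed in the
    same sample order for every peak.
    """
    labels = list(intensities)
    return {
        a: {b: abs(pearson(intensities[a], intensities[b])) for b in labels}
        for a in labels
    }


# Hypothetical intensities of three peaks in five samples (illustration only).
toy = {
    "peak_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "peak_b": [2.1, 3.9, 6.2, 8.1, 9.8],
    "peak_c": [5.0, 4.0, 3.0, 2.0, 1.0],
}
matrix = abs_correlation_matrix(toy)
```

Taking absolute values treats strongly negative and strongly positive correlations alike, so `peak_a` and `peak_c` score 1 even though they move in opposite directions. A peak with constant intensity would cause a division by zero and needs separate handling.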
Figure 4
Peak intensity correlation matrix for the 14 peaks found associated with sample storage duration
(for clarity, Pearson's correlation coefficients were converted into absolute values).
[Heat map; both axes list the 14 peak clusters: m/z 2773, 2789, 3089, 3104, 4215, 5341, 5357, 5908, 5929, 6114, 11091, 4471, 8937, and 4441; colour scale: absolute correlation coefficient, 0.1 to 1.]
Following QhyperD fractionation, the m/z 8939 peptide was eluted in the flow through.
This fraction was concentrated on YM50 spin concentrators, and the peptide was found
in the 30% ACN eluate. De-salting of the eluate on RP18 beads resulted in elution of
the peptide in the 50% ACN / 0.1% TFA eluate, which was subsequently subjected to
SDS-PAGE analysis. After staining, a clear band in the 8.9 kDa region was visible,
which was excised. Elution of the proteins within the excised band was followed by
tryptic digestion of the eluate. Profiling of the gel-eluate confirmed the presence of the
peptide, and peptide mapping of the tryptic digest identified it as complement
component 3 precursor (estimated Z-score 1.57, 4% sequence coverage). Amino acid
sequencing of 6 peptides in the tryptic digest by tandem MS on a Q-TOF identified the
marker as C3a des-arginine anaphylatoxin (C3adesArg, 61% sequence coverage), a 76
amino acid protein with theoretical mass 8939.46 Da and pI 9.54 (Figure 6). This
identity was confirmed by an immunoassay, for which a C3a polyclonal antibody
(Abcam Ltd, Cambridge, UK) was used. Profiling of the eluates revealed the presence of
a peak at m/z 8940. Non-specific binding as determined by binding to murine IgG
antibody was very low.
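Sequence coverage of the kind reported above can be recomputed from the sequenced peptides. The sketch below uses the C3adesArg sequence and the six peptide sequences listed in Figure 6; the function name is hypothetical, and locating each peptide by its first exact match in the protein is a simplifying assumption.

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in `protein` covered by at least one peptide."""
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)  # first exact match only
        if start == -1:
            raise ValueError(f"peptide {peptide} not found in protein")
        for i in range(start, start + len(peptide)):
            covered[i] = True
    return sum(covered) / len(protein)


# C3adesArg sequence (76 aa) and sequenced tryptic peptides from Figure 6.
C3A_DESARG = (
    "SVQLTEKRMDKVGKYPKELRKCCEDGMRENPMRFSCQRRTRFISLGEACKK"
    "VFLDCCNYITELRRQHARASHLGLA"
)
PEPTIDES = [
    "SVQLTEKR",
    "FISLGEACKK",
    "ELRKCCEDGMR",
    "KCCEDGMRENPMR",
    "VFLDCCNYITELR",
    "KVFLDCCNYITELR",
]
coverage = sequence_coverage(C3A_DESARG, PEPTIDES)
```

Overlapping peptides are counted once, giving 47 of 76 residues (about 62%), close to the 61% reported.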
The peak intensities of m/z 4471 and m/z 8939 were found highly correlated (Figure 4).
Regarding its mass, the m/z 4471 peak most likely represents the doubly charged form
of the m/z 8939 peak. The two peak clusters show similar correlations to sample storage
duration (i.e., initial increase, after which peak intensities remain stable over time:
Pattern D), endorsing the hypothesised identity of m/z 4471.
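The charge-state argument can be checked with a short calculation. This is a sketch assuming ionisation by protonation only; the function name is hypothetical, and the neutral mass is the theoretical mass of C3adesArg given above.

```python
PROTON_MASS_DA = 1.00728  # mass of a proton in Da


def expected_mz(neutral_mass, charge):
    """Expected m/z of a protein carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON_MASS_DA) / charge


single = expected_mz(8939.46, 1)  # singly charged ion, about 8940.5
double = expected_mz(8939.46, 2)  # doubly charged ion, about 4470.7
```

The doubly charged value lies within 1 Da of the observed m/z 4471 cluster, consistent with the hypothesised identity.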
Discussion
In the current study, archival sera of 140 breast cancer patients, stored at -30°C for 1 to
11 years, were analysed by SELDI-TOF MS. Of the 76 peak clusters detected spectrum-wide, peak intensities of 14 peak clusters were found to be significantly associated with
sample storage duration by five different patterns (A - E). These peak clusters were
structurally identified as C3adesArg and multiple fragments of albumin and fibrinogen.
Their susceptibility to sample handling issues has been discussed in previous studies
(8;22;31). A number of these proteins have, however, also been reported as potential
cancer markers (10;11;21). Although they are not tumour-derived, it is currently
hypothesized that these (cancer specific) serum proteins are generated from a pool of
high-abundant founder proteins by tumour specific protease activities (11;32;33).
Hence, these cleavage products were found to be specific for both disease and pre-analytical sample handling parameters. Evidently, assessment of potential confounding
by pre-analytical parameters (such as storage time) is of vital importance, to prevent
experimental variation from being interpreted erroneously as disease-associated variation.
Moreover, regarding the different (non-linear) patterns by which peak intensities were
found associated to storage duration, merely linear corrections for sample storage
duration will not necessarily suffice.
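Why a linear correction may fall short can be illustrated with a small numerical sketch. All values below are hypothetical: the category means merely mimic a rise followed by a decline (as in Pattern A), and the use of category midpoints as storage times is an assumption made for illustration.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    return slope, mean_y - slope * mean_x


# Midpoints (months) of the categories 0-32, 32-64, 64-96, and 96-128.
storage_months = [16, 48, 80, 112]
# Hypothetical mean intensities: initial increase, then gradual decrease.
pattern_a_like = [-1.0, 1.0, 0.5, -0.5]

slope, intercept = linear_fit(storage_months, pattern_a_like)
residuals = [
    y - (slope * x + intercept) for x, y in zip(storage_months, pattern_a_like)
]
```

The fitted slope is close to zero, so subtracting the linear trend leaves residuals almost as large as the original differences between categories. A correction that follows the observed shape, for example centring intensities per storage category, would be needed in such a case.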
The albumin clusters
Four of the 14 significant peak clusters were observed to initially increase in peak
intensity up to approximately five years of storage time, after which peak intensities
decreased (Pattern A). Two clusters (m/z 3089 and m/z 3104) were structurally
identified as albumin fragments, while the other two clusters (m/z 2773 and m/z 2789)
most likely correspond to oxidised forms of the structurally identified m/z 2756 albumin
fragment. We hypothesise that the observed pattern is the result of continuous in vitro
proteolytic degradation of albumin and its fragments.
Figure 5
Structural identification of the m/z 5341 and m/z 5908 peak clusters.
Direct sequencing of the m/z 5341 and m/z 5908 peak clusters. MS spectrum of the YM3 retentate. Insert:
SELDI-TOF MS spectrum of the 50% ACN / 0.1% TFA RP18 eluate (upper spectrum) and YM3 retentate
(lower spectrum). All peptides were sequenced with tandem MS using Q-TOF for confirmation. Results from
the MASCOT search for protein identification include start and end positions of the peptide sequence starting
from the amino acid terminal of the whole protein, the observed m/z (Mr (obs)), transformed to its
experimental mass (Mr(expt)), the calculated mass (Mr(calc)) from the matched peptide sequence, as well as
their mass difference (Delta), and the peptide sequence (in grey: the amino acid sequence determined by Q-TOF MS).
[MS spectrum of the YM3 retentate with SELDI-TOF MS insert; x-axis: mass-to-charge ratio; y-axis: relative intensity.]
MASCOT search results: m/z 5908 C-terminal fibrinogen fragment
Start-End   Mr (obs)   Mr (expt)   Mr (calc)   Delta   Sequence
576-598     2553.00    2551.99     2552.09     -0.10   K.SSSYSKQFTSSTSYNRGDSTFES.K
576-600     2768.20    2767.19     2767.22     -0.03   K.SSSYSKQFTSSTSYNRGDSTFESKS.Y
576-601     2931.20    2930.19     2930.28     -0.09   K.SSSYSKQFTSSTSYNRGDSTFESKSY.K
Amino acid sequence of m/z 5341 fibrinogen fragment (start: 576 - end: 625, 53% sequence coverage):
SSSYSKQFTS STSYNRGDST FESKSYKMAD EAGSEADHEG THSTKRGHA
Amino acid sequence of m/z 5908 fibrinogen fragment (start: 576 - end: 630, 48% sequence coverage):
SSSYSKQFTS STSYNRGDST FESKSYKMAD EAGSEADHEG THSTKRGHAK SRPV
Albumin is the most abundant serum protein (30-50 mg/ml), comprising about one-half
of the blood serum proteins (34). As such, detection of its proteolytic fragments as
markers for sample storage duration is not unexpected. The susceptibility of albumin to
proteolytic degradation during prolonged storage at -30°C has been described previously
by our group (22). We observed the N-terminal albumin25-57 fragment to be positively
correlated to storage duration (1.4 years) at -30°C. Similarly, peaks corresponding to our
m/z 2773 and m/z 3104 peak clusters were also found to increase in peak intensity with
increasing storage time.
The fibrinogen clusters
Seven of the 14 peak clusters found significantly associated with sample storage
duration were identified as (probable) fibrinogen fragments. The m/z 5908 FGA576-630, its
oxidised form at m/z 5929, its SPA adduct at m/z 6114, and the two alleged fibrinogen
fragments at m/z 4215 and m/z 11091 all continuously decrease in peak intensity with
increasing sample storage time (Pattern B). The m/z 5341 FGA576-625 and its oxidised
form at m/z 5357, however, decrease in peak intensity only after approximately eight
years of storage (Pattern E). These time-dependent changes in peak intensities represent
the characteristics of a sequential reaction, in which the fibrinogen fragments are
proteolytically degraded into subsequent smaller fragments.
Fibrinogen acts as the main factor in the formation of a blood clot by polymerisation to
a fibrin network and by enabling platelets to aggregate (35). Similar to albumin, it is one
of the most abundant blood proteins (2-4 mg/ml), and as such, detection of proteolytic
fibrinogen fragments by SELDI-TOF MS as indicators for sample storage duration is not
surprising. In addition, various fibrinogen fragments have been found correlated to
coagulation time. While the m/z 5908 FGA576-630 continuously decreased in peak
intensity with coagulation time, the intensities of all other peaks initially increased with
coagulation time, after which they either remained stable (m/z 4215, m/z 11091), or
decreased (m/z 5341, m/z 6114) (8;31;36).
The different fibrinogen fragments have been described in relation to various types of
cancer. Villanueva et al. (11) reported a decreased serum m/z 5902 FGA576-630 peak
intensity in thyroid cancer compared to normal. In contrast, a 5.9 kDa peak (not
structurally identified) has been found increased in cancer vs. control in colorectal
(37;38), pancreatic (39), gastric (40), and lung cancer (41;42), and in hypopharyngeal
squamous cell carcinoma (HSCC) (43). The two latter studies also reported increased
serum m/z 5339, m/z 5927 and m/z 6114 peak intensities in cases compared to controls
(42;43). Although the sera of the groups compared in above-mentioned studies allegedly
were collected in the same time interval, precise information on storage duration
generally is not provided. Proteolytic degradation is, however, known to decelerate
with decreasing temperature. Since all sera investigated in these studies were stored at -80°C, the influence of storage duration on peak expression may be limited compared to our
(-30°C) study.
Figure 6
Structural identification of the m/z 8939 peak cluster.
Peptide mapping of the m/z 8939 peak cluster. MS spectrum of the m/z 8939 tryptic digest in the gel eluate.
All peptides were sequenced with tandem MS using Q-TOF for confirmation. Results from the MASCOT
search for protein identification include start and end positions of the peptide sequence starting from the
amino acid terminal of the whole protein, the observed m/z (Mr (obs)), transformed to its experimental mass
(Mr(expt)), the calculated mass (Mr(calc)) from the matched peptide sequence, as well as their mass difference
(Delta), the number of missed cleavage sites for trypsin (Miss), and the peptide sequence (in grey: the amino
acid sequence determined by Q-TOF MS).
[Q-TOF MS spectrum of the tryptic digest; x-axis: m/z (amu), 900 to 2900; y-axis: relative intensity (%).]
MASCOT peptide mapping results: m/z 8939 C3adesArg anaphylatoxin
Start-End   Mr (obs)   Mr (expt)   Mr (calc)   Delta   Miss   Sequence
672-679      960.52     959.51      959.54     -0.03   1      R.SVQLTEKR.M
713-722     1095.55    1094.54     1094.58     -0.04   1      R.FISLGEACKK.V
699-709     1339.59    1338.58     1338.58     -0.00   2      K.ELRKCCEDGMR.E
692-704     1568.64    1567.63     1567.64     -0.00   2      R.KCCEDGMRENPMR.F
723-735     1588.74    1587.73     1587.74     -0.01   0      K.VFLDCCNYITELR.R
722-735     1716.84    1715.83     1715.84     -0.00   1      K.KVFLDCCNYITELR.R
Amino acid sequence of m/z 8939 C3adesArg (start: 672 - end: 747, 61% sequence coverage):
SVQLTEKRMDKVGKYPKELRKCCEDGMRENPMRFSCQRRTRFISLGEACKKVFLDCCNYITELRRQHARA
SHLGLA
The C3adesArg clusters
We observed two peak clusters at m/z 8939 and m/z 4471, identified as C3adesArg and its
doubly charged form, to initially increase in peak intensity, after which intensities
remained constant during the residual time interval studied (Pattern D). The acute
phase reactant C3 is the most abundant (1.2 mg/ml) complement protein in serum (44).
This protein supports all three pathways of complement activation (the
classical, alternative, and lectin pathways) (45;46). Produced mainly in the liver and
adipocytes, C3 (185 kDa) is cleaved by C3-convertases into C3b (176
kDa) and C3a (9 kDa) (47). The anaphylatoxin C3a is only short lived in serum, as
carboxypeptidases cleave the C-terminal arginine residue, creating the more stable but
biologically inactive C3adesArg (8.9 kDa) (46-48). Presumably, the conversion of C3a to
C3adesArg becomes complete during the first months of storage, explaining the observed
increase in m/z 8939 peak intensity during the first months of storage. As m/z 8939
C3adesArg peak intensities remain stable following the initial increase, the protein appears
not susceptible to proteolytic degradation, a finding that is corroborated by the reported
stability of C3a(desArg) to extremes of heat and pH (49).
Complement can also be activated in vitro, as activation of the coagulation system is
followed by activation of platelets, eliciting complement activation (50). Although not
structurally identified, Banks et al. (8) indeed reported the intensity of an IMAC-Cu
m/z 8939 and m/z 4477 peak to significantly increase with prolonged coagulation times.
The 8.9 kDa C3adesArg peak has been described as an alleged biomarker in a number of
studies investigating different cancer types using serum SELDI-TOF MS analysis (21;51-54). This protein peak has been found increased in different cancer types, such as breast
(18;21;24;55), colorectal (51;54), hepatocellular cancer (52;56;57), and chronic lymphoid
malignancies (53). In contrast, however, the studies of Hu et al. (58) and Han et al. (42)
have reported an 8.9 kDa peak (not structurally identified) to be decreased in breast and
lung cancer sera. As information regarding sample collection intervals is rarely provided
by reported studies, the extent to which sample storage duration might have biased
reported results cannot be assessed.
Of particular interest though are two studies published by Li et al. (18;21). Their first
study reports an 8.9 kDa peak that was increased in breast cancer compared to healthy
controls. This peak had a very high diagnostic performance (18). The cancer sera were,
however, collected during a (non-specified) longer time interval than the control sera,
as mentioned in their validation study, but the association between sample storage
duration and peak expression was not investigated. The increase of the 8.9 kDa peak
(identified as C3adesArg) in breast cancer was confirmed by analysis of a second,
independent sample set, all sera of which were collected within the same 2-year
window. Compared to their discovery study, however, the diagnostic performance of
the 8.9 kDa peak was much lower, indicating probable bias by storage duration in their
discovery study (21). Nonetheless, results of their initial study were indeed
reproducible, as proven by the validation study. Their first study sample set was stored
at -80°C, a temperature at which the formation of C3adesArg might be limited compared
to -30°C. Their validation sample set was, however, stored at -30°C, but as all these sera
were procured in the same time interval, the samples of this set are unlikely to be biased
by storage duration. Although evidently, results can be reproducible when sample
groups differ in storage time, investigators are not absolved from assessment of potential
bias by storage parameters.
We have not yet structurally identified the last peak cluster at m/z 4441. This peak was
found positively associated to sample storage duration. Most likely, this m/z 4441 cluster
represents a high-abundant serum protein fragment, formed by continuous non-specific
proteolytic activity during storage at -30°C.
Conclusion
In conclusion, we have identified SELDI-TOF MS peak intensities of C3adesArg and
various albumin and fibrinogen fragments to be significantly associated to storage
duration in sera of 140 breast cancer patients. Reported proteins, however, have also
been described as potential cancer markers in previous reports, rendering them specific
to both disease and sample handling issues. Hence, assessment of potential confounding
pre-analytical parameters (such as storage time) should be an integral component of
biomarker discovery and validation studies, to prevent experimental variation from being
interpreted erroneously as disease-associated variation. Moreover, regarding the
different (non-linear) patterns by which peak intensities were found associated to
storage duration, merely linear corrections for sample storage duration will not
necessarily suffice.
Acknowledgement
This study was supported by a grant of the Dutch Cancer Society (project NKI 2005-3421).
References
(1) Banks RE, Dunn MJ, Hochstrasser DF, Sanchez JC, Blackstock W, Pappin DJ et al. Proteomics: new
perspectives, new biomedical opportunities. Lancet 2000; 356(9243):1749-1756.
(2) Aebersold R, Anderson L, Caprioli R, Druker B, Hartwell L, Smith R. Perspective: a program to improve
protein biomarker discovery for cancer. J Proteome Res 2005; 4(4):1104-1109.
(3) Dove A. Proteomics: translating genomics into products? Nat Biotechnol 1999; 17(3):233-236.
(4) Clarke W, Zhang Z, Chan DW. The application of clinical proteomics to cancer and other diseases. Clin
Chem Lab Med 2003; 41(12):1562-1570.
(5) Hutchens TW, Yip TT. New desorption strategies for the mass spectrometric analysis of macromolecules.
Rapid Commun Mass Spectrom 1993; 7:576-580.
(6) Issaq HJ, Veenstra TD, Conrads TP, Felschow D. The SELDI-TOF MS approach to proteomics: protein
profiling and biomarker identification. Biochem Biophys Res Commun 2002; 292(3):587-592.
(7) Anderson NL, Anderson NG. The human plasma proteome: history, character, and diagnostic prospects.
Mol Cell Proteomics 2002; 1(11):845-867.
(8) Banks RE, Stanley AJ, Cairns DA, Barrett JH, Clarke P, Thompson D et al. Influences of blood sample
processing on low-molecular-weight proteome identified by surface-enhanced laser
desorption/ionization mass spectrometry. Clin Chem 2005; 51(9):1637-1649.
(9) Zhang Z, Bast RC, Jr., Yu Y, Li J, Sokoll LJ, Rai AJ et al. Three biomarkers identified from serum
proteomic analysis for the detection of early stage ovarian cancer. Cancer Res 2004; 64(16):5882-5890.
(10) Engwegen JY, Helgason HH, Cats A, Harris N, Bonfrer JM, Schellens JH et al. Identification of serum
proteins discriminating colorectal cancer patients and healthy controls using surface-enhanced laser
desorption ionisation-time of flight mass spectrometry. World J Gastroenterol 2006; 12(10):1536-1544.
(11) Villanueva J, Martorella AJ, Lawlor K, Philip J, Fleisher M, Robbins RJ et al. Serum peptidome patterns
that distinguish metastatic thyroid carcinoma from cancer-free controls are unbiased by gender and age.
Mol Cell Proteomics 2006; 5(10):1840-1852.
(12) Ransohoff DF. Lessons from controversy: ovarian cancer screening and serum proteomics. J Natl Cancer
Inst 2005; 97(4):315-319.
(13) Villanueva J, Philip J, Chaparro CA, Li Y, Toledo-Crow R, DeNoyer L et al. Correcting common errors in
identifying cancer-specific serum peptide signatures. J Proteome Res 2005; 4(4):1060-1072.
(14) Findeisen P, Sismanidis D, Riedl M, Costina V, Neumaier M. Preanalytical impact of sample handling on
proteome profiling experiments with matrix-assisted laser desorption/ionization time-of-flight mass
spectrometry. Clin Chem 2005; 51(12):2409-2411.
(15) Albrethsen J, Bogebo R, Olsen J, Raskov H, Gammeltoft S. Preanalytical and analytical variation of
surface-enhanced laser desorption-ionization time-of-flight mass spectrometry of human serum. Clin
Chem Lab Med 2006; 44(10):1243-1252.
(16) Hsieh SY, Chen RK, Pan YH, Lee HL. Systematical evaluation of the effects of sample collection
procedures on low-molecular-weight serum/plasma proteome profiling. Proteomics 2006; 6(10):3189-3198.
(17) Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD et al. HUPO Plasma Proteome
Project specimen collection and handling: towards the standardization of parameters for plasma
proteome samples. Proteomics 2005; 5(13):3262-3277.
(18) Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW. Proteomics and bioinformatics approaches for
identification of serum biomarkers to detect breast cancer. Clin Chem 2002; 48(8):1296-1304.
(19) Belluco C, Petricoin EF, Mammano E, Facchiano F, Ross-Rucker S, Nitti D et al. Serum Proteomic
Analysis Identifies a Highly Sensitive and Specific Discriminatory Pattern in Stage 1 Breast Cancer. Ann
Surg Oncol 2007; 14(9):2470-2476.
(20) Adam BL, Qu Y, Davis JW, Ward MD, Clements MA, Cazares LH et al. Serum protein fingerprinting
coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate
hyperplasia and healthy men. Cancer Res 2002; 62(13):3609-3614.
(21) Li J, Orlandi R, White CN, Rosenzweig J, Zhao J, Seregni E et al. Independent Validation of Candidate
Breast Cancer Serum Biomarkers Identified by Mass Spectrometry. Clin Chem 2005; 51(12):2229-2235.
(22) Engwegen JY, Alberts M, Knol JC, Jimenez CR, Depla ACTM, Tuynman H et al. Influence of variations
in sample handling on SELDI-TOF MS serum protein profiles for colorectal cancer. Proteomics Clin
Appl 2008; 2(6):936-945.
(23) McLerran D, Grizzle WE, Feng Z, Bigbee WL, Banez LL, Cazares LH et al. Analytical Validation of
Serum Proteomic Profiling for Diagnosis of Prostate Cancer; Sources of Sample Bias. Clin Chem 2007;
54(1):44-52.
(24) Mathelin C, Cromer A, Wendling C, Tomasetto C, Rio MC. Serum biomarkers for detection of breast
cancers: a prospective study. Breast Cancer Res Treat 2005;1-8.
(25) Becker S, Cazares LH, Watson P, Lynch H, Semmes OJ, Drake RR et al. Surfaced-enhanced laser
desorption/ionization time-of-flight (SELDI-TOF) differentiation of serum protein profiles of BRCA-1
and sporadic breast cancer. Ann Surg Oncol 2004; 11(10):907-914.
(26) Laronga C, Becker S, Watson P, Gregory B, Cazares L, Lynch H et al. SELDI-TOF serum profiling for
prognostic and diagnostic classification of breast cancers. Dis Markers 2003; 19(4-5):229-238.
(27) Vlahou A, Laronga C, Wilson L, Gregory B, Fournier K, McGaughey D et al. A novel approach toward
development of a rapid blood test for breast cancer. Clin Breast Cancer 2003; 4(3):203-209.
(28) Goncalves A, Esterni B, Bertucci F, Sauvan R, Chabannon C, Cubizolles M et al. Postoperative serum
proteomic profiles may predict metastatic relapse in high-risk primary breast cancer patients receiving
adjuvant chemotherapy. Oncogene 2006; 25(7):981-989.
(29) Karsan A, Eigl BJ, Flibotte S, Gelmon K, Switzer P, Hassell P et al. Analytical and preanalytical biases in
serum proteomic pattern analysis for breast cancer diagnosis. Clin Chem 2005; 51(8):1525-1528.
(30) Hu J, Coombes KR, Morris JS, Baggerly KA. The importance of experimental design in proteomic mass
spectrometry experiments: some cautionary tales. Brief Funct Genomic Proteomic 2005; 3(4):322-331.
(31) Baumann S, Ceglarek U, Fiedler GM, Lembcke J, Leichtle A, Thiery J. Standardized approach to
proteome profiling of human serum based on magnetic bead separation and matrix-assisted laser
desorption/ionization time-of-flight mass spectrometry. Clin Chem 2005; 51(6):973-980.
(32) Fung ET, Yip TT, Lomas L, Wang Z, Yip C, Meng XY et al. Classification of cancer types by measuring
variants of host response proteins using SELDI serum assays. Int J Cancer 2005; 115(5):783-789.
(33) Villanueva J, Shaffer DR, Philip J, Chaparro CA, Erdjument-Bromage H, Olshen AB et al. Differential
exoprotease activities confer tumor-specific serum peptidome patterns. J Clin Invest 2006; 116(1):271-284.
(34) Quinlan GJ, Martin GS, Evans TW. Albumin: biochemical properties and therapeutic potential.
Hepatology 2005; 41(6):1211-1219.
(35) Mosesson MW. Fibrinogen and fibrin structure and functions. J Thromb Haemost 2005; 3(8):1894-1904.
(36) Timms JF, Arslan-Low E, Gentry-Maharaj A, Luo Z, T'Jampens D, Podust VN et al. Preanalytic influence
of sample handling on SELDI-TOF serum protein profiles. Clin Chem 2007; 53(4):645-656.
(37) Chen YD, Zheng S, Yu JK, Hu X. Artificial neural networks analysis of surface-enhanced laser
desorption/ionization mass spectra of serum protein pattern distinguishes colorectal cancer from healthy
population. Clin Cancer Res 2004; 10(24):8380-8385.
(38) Yu JK, Chen YD, Zheng S. An integrated approach to the detection of colorectal cancer utilizing
proteomics and bioinformatics. World J Gastroenterol 2004; 10(21):3127-3131.
(39) Koopmann J, Zhang Z, White N, Rosenzweig J, Fedarko N, Jagannath S et al. Serum diagnosis of
pancreatic adenocarcinoma using surface-enhanced laser desorption and ionization mass spectrometry.
Clin Cancer Res 2004; 10(3):860-868.
(40) Liang Y, Fang M, Li J, Liu CB, Rudd JA, Kung HF et al. Serum proteomic patterns for gastric lesions as
revealed by SELDI mass spectrometry. Exp Mol Pathol 2006; 81(2):176-180.
(41) Xiao X, Liu D, Tang Y, Guo F, Xia L, Liu J et al. Development of proteomic patterns for detecting lung
cancer. Dis Markers 2003; 19(1):33-39.
(42) Han KQ, Huang G, Gao CF, Wang XL, Ma B, Sun LQ et al. Identification of lung cancer patients by
serum protein profiling using surface-enhanced laser desorption/ionization time-of-flight mass
spectrometry. Am J Clin Oncol 2008; 31(2):133-139.
(43) Zhou L, Cheng L, Tao L, Jia X, Lu Y, Liao P. Detection of hypopharyngeal squamous cell carcinoma
using serum proteomics. Acta Otolaryngol 2006; 126(8):853-860.
(44) Hugli TE. Human anaphylatoxin (C3a) from the third component of complement. Primary structure. J
Biol Chem 1975; 250(21):8293-8301.
(45) Bohana-Kashtan O, Ziporen L, Donin N, Kraus S, Fishelson Z. Cell signals transduced by complement.
Mol Immunol 2004; 41(6-7):583-597.
(46) Sahu A, Sunyer JO, Moore WT, Sarrias MR, Soulika AM, Lambris JD. Structure, functions, and evolution
of the third complement component and viral molecular mimicry. Immunol Res 1998; 17(1-2):109-121.
(47) de Bruijn MH, Fey GH. Human complement component C3: cDNA coding sequence and derived
primary structure. Proc Natl Acad Sci U S A 1985; 82(3):708-712.
(48) Nettesheim DG, Edalji RP, Mollison KW, Greer J, Zuiderweg ER. Secondary structure of complement
component C3a anaphylatoxin in solution as determined by NMR spectroscopy: differences between
crystal and solution conformations. Proc Natl Acad Sci U S A 1988; 85(14):5036-5040.
(49) Hugli TE, Morgan WT, Muller-Eberhard HJ. Circular dichroism of C3a anaphylatoxin. Effects of pH,
heat, guanidinium chloride, and mercaptoethanol on conformation and function. J Biol Chem 1975;
250(4):1479-1483.
(50) Hamad OA, Ekdahl K, Lambris JD, Nilsson B. Complement activation triggered by thrombin receptor-activated platelets. Mol Immunol 2007; 44:180.
(51) Habermann JK, Roblick UJ, Luke BT, Prieto DA, Finlay WJ, Podust VN et al. Increased serum levels of
complement C3a anaphylatoxin indicate the presence of colorectal tumors. Gastroenterology 2006;
131(4):1020-1029.
(52) Lee IN, Chen CH, Sheu JC, Lee HS, Huang GT, Chen DS et al. Identification of complement C3a as a
candidate biomarker in human chronic hepatitis C and HCV-related hepatocellular carcinoma using a
proteomics approach. Proteomics 2006; 6(9):2865-2873.
(53) Miguet L, Bogumil R, Decloquement P, Herbrecht R, Potier N, Mauvieux L et al. Discovery and
identification of potential biomarkers in a prospective study of chronic lymphoid malignancies using
SELDI-TOF-MS. J Proteome Res 2006; 5(9):2258-2269.
(54) Ward DG, Suggett N, Cheng Y, Wei W, Johnson H, Billingham LJ et al. Identification of serum
biomarkers for colon cancer by proteomic analysis. Br J Cancer 2006; 94(12):1898-1905.
(55) Pusztai L, Gregory BW, Baggerly KA, Peng B, Koomen J, Kuerer HM et al. Pharmacoproteomic analysis
of prechemotherapy and postchemotherapy plasma samples from patients receiving neoadjuvant or
adjuvant chemotherapy for breast carcinoma. Cancer 2004; 100(9):1814-1822.
(56) Poon TC, Yip TT, Chan AT, Yip C, Yip V, Mok TS et al. Comprehensive proteomic profiling identifies
serum proteomic signatures for detection of hepatocellular carcinoma and its subtypes. Clin Chem 2003;
49(5):752-760.
(57) Schwegler EE, Cazares L, Steel LF, Adam BL, Johnson DA, Semmes OJ et al. SELDI-TOF MS profiling of
serum for detection of the progression of chronic hepatitis C to hepatocellular carcinoma. Hepatology
2005; 41(3):634-642.
(58) Hu Y, Zhang S, Yu J, Liu J, Zheng S. SELDI-TOF-MS: the proteomics and bioinformatics approaches in
the diagnosis of breast cancer. Breast 2005; 14(4):250-255.
Chapter 3
Protein profiling of serum
Chapter 3.1
Serum protein profiling for diagnosis of breast cancer using SELDI-TOF MS
Marie-Christine W. Gast
Carla H. van Gils
Lodewijk F.A. Wessels
Nathan Harris
Johannes M.G. Bonfrer
Emiel J. Th. Rutgers
Jan H.M. Schellens
Jos H. Beijnen
Submitted for publication
Abstract
Early detection is of paramount importance in reducing breast cancer related mortality,
yet the diagnosis of breast cancer is hampered by a lack of adequate detection methods.
In the search for novel markers for breast cancer, the surface-enhanced laser
desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF MS) technology
has been applied with varying success in the investigation of the serum proteome. Both
robustness and validity of alleged markers can, however, be jeopardised by demographic
and pre-analytical variables, which are known to exert profound effects on protein
profiles obtained by SELDI-TOF MS. Although validation as well as structural
identification of putative markers can aid in determining their true performance, thus
far, this has been performed in only a limited number of studies. We therefore aimed to
identify and validate novel serum protein profiles specific for breast cancer and assess
the influence of clinical (i.e., subjects’ age) and pre-analytical (i.e., sample storage
duration) variables on the constructed classifiers.
To this end, sera of breast cancer patients (n = 152) and healthy controls (n = 129) were
analysed using the SELDI-TOF MS technology. Cases and controls were randomly and
evenly divided into a training and test set. In the training set, 14 peak clusters were
found to differ significantly in peak expression between cases and controls. The
intensities of none of these peak clusters were influenced by subjects’ age and sample
storage duration. Ten peak clusters were also found significantly discriminative in the
test set. One peak cluster was structurally identified as C3a des-arginine anaphylatoxin,
while 12 other peak clusters were tentatively identified as inter-alpha-trypsin inhibitor
heavy chain 4 fragments and a fibrinogen fragment, respectively. Subsequent logistic
regression analyses on the training set yielded a classification model with a moderate
performance on the test set, comparable to that reported in previously performed
validation studies. As this moderate performance most likely originates from the highly
heterogeneous nature of breast cancer, selection of breast cancer subgroups for
comparison with healthy controls is expected to improve results of future diagnostic
SELDI-TOF MS studies.
Diagnostic serum protein profiles for breast cancer
Introduction
The American Cancer Society has estimated that breast cancer will be the most
commonly diagnosed cancer among women in the USA in 2008, as it is expected to
account for 26% of all new cancer cases among women (1). Following lung cancer,
breast cancer currently is the second leading cause of cancer deaths in women (1). As
the 5-year survival rate decreases from 98% for localised disease to 26% for distant stage
disease (2), early detection is of paramount importance in reducing breast cancer related
mortality. The diagnosis of breast cancer is, however, hampered by a lack of adequate
detection methods, resulting in detection of only 63% of breast cancers at an early stage
(1). Although mammography is currently the most widely applied imaging test,
its predictive value is lower in women with dense breast tissue and smaller lesions.
Moreover, no molecular markers have hitherto been recommended for the (early) detection of breast
cancer. Currently used serum tumour markers in breast cancer, e.g., Cancer
Antigen 15.3, lack adequate sensitivity and specificity to be applicable in early
detection, and are therefore approved by the FDA only for monitoring therapy of
advanced breast cancer or recurrence (3).
The application of a single biomarker in the detection of breast cancer may, however,
not be feasible, as a single marker is unlikely to cover the high heterogeneity of breast
cancer. Instead, a panel of markers is expected to better reflect breast cancer
complexity, yielding an improved sensitivity and specificity. With cancer being, for a
large part, a genetic disease, researchers initially searched for biomarkers by employing
genomic and transcriptomic approaches. Although this has greatly expanded our insight
into the genetic basis of cancer, it is currently understood that the functional “end-units” of the genome, the proteins, cannot be predicted by genetic and transcriptomic
data alone. Due to, amongst others, post-transcriptional mRNA modifications (e.g.,
alternative splicing) and post-translational protein modifications, one gene can encode
multiple proteins, reflecting both the intrinsic genetic programme of the cell and the
impact of its immediate environment (4). As such, the proteome provides a more
realistic and detailed view of the biological status, offering a richer source of potential
biomarkers.
One of the techniques currently applied in proteomics research of breast cancer is
surface-enhanced laser desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF MS). Until now, eleven studies have been published in which the SELDI-TOF MS
platform was applied with varying success in the identification and validation of serum
markers for diagnosis (5-12), prognosis (13), or monitoring of therapy efficacy (14) or toxicity (15) in breast cancer. However, issues have been raised concerning the
robustness and validity of alleged markers discovered by SELDI-TOF MS. A potential
drawback of analysing high-dimensional proteomic (SELDI-TOF MS) data for disease
associated biomarkers is the propensity to discover patterns among variables that are the
result of pre-analytical artefacts in a given sample set, rather than of the pathology of
interest (16). Indeed, several lines of evidence indicate that pre-analytical variables, e.g.,
sample collection, processing and storage, can exert profound effects on protein profiles,
regardless of true biological variation. In addition, clinical characteristics, such as
patients’ age, could also introduce bias (17). Despite these concerns, only a few studies
investigating the serum proteome for discovery of breast cancer specific biomarkers
have examined the possible influence of pre-analytical and patient-related variables on the
expression of potential biomarkers.
The raised issues on the validity and robustness of alleged biomarkers can, however,
also be addressed by validation and structural identification (16). Nonetheless, thus far,
in breast cancer, only two panels of biomarkers discovered by SELDI-TOF MS have
been validated by analysis of independent sample sets, resulting in partial (10;11) or no
validation (18). Moreover, only a few biomarkers discovered by SELDI-TOF MS breast
cancer research have been structurally identified.
In the current study, we aimed to discover and validate novel serum protein profiles
specific for breast cancer. To this end, archival sera of breast cancer patients and
healthy controls were analysed using SELDI-TOF MS. Spectral data were merged in one
file, after which they were randomly and evenly split into a training and test set. In the
training set, we detected 14 discriminating peak clusters, one cluster of which was
structurally identified. Furthermore, the relationship between the intensity of the
classifier peak clusters and breast cancer status was adjusted for demographic and pre-analytical variables (i.e., subjects’ age and sample storage duration). Finally, the samples
in the test set were applied for validation purposes.
Materials and methods
Study population
Archival sera of 152 breast cancer patients (BC) and 129 female healthy controls (HC),
collected between January 2003 and July 2005, were analysed on different occasions in
our laboratory using standardised analytical procedures. All sera were collected prior to
any therapy, with individuals’ informed consent after approval by the institutional
review boards. All sera originate from the Netherlands Cancer Institute serum bank,
where they had been collected and stored for 3 to 50 months at -30°C according to
standard procedures.
Chemicals
All used chemicals were obtained from Sigma, St. Louis, MO, USA, unless stated
otherwise.
SELDI-TOF MS protein profiling
Serum protein profiling was performed using the ProteinChip SELDI (PBSIIc) Reader
(Bio-Rad Labs, Hercules, CA, USA). Various chip chemistries, binding and washing procedures, and sample pretreatments were initially evaluated to determine which
affinity chemistry and sample pretreatment procedure provided the best serum profiles
in terms of number and resolution of proteins. Immobilized Metal Affinity Capture
(IMAC30) arrays were selected for further analysis. Samples were analysed in three
batches (Batch 1: BC: n = 40, HC: n = 40; Batch 2: BC: n = 43, HC: n = 46; Batch 3: BC: n
= 69, HC: n = 43). The samples in Batch 1 were analysed as single measurements, while the samples
in Batch 2 and 3 were analysed in duplicate. Throughout the assay, arrays were
assembled in a 96-well bioprocessor, which was shaken on a platform shaker at 300
rpm.
Arrays were charged twice with 50 μl 100 mM nickel sulphate (Merck, Darmstadt,
Germany) for 15 min, followed by three rinses with deionised water (Braun,
Emmenbrücke, Germany) and two equilibrations with 200 μl Phosphate Buffered Saline
(PBS; 0.01 M) pH 7.4 / 0.5 M sodium chloride / 0.1% TritonX-100 (binding buffer;
sodium chloride from Merck) for 5 min. Unfractionated serum samples were thawed on
ice and denatured by 1:10 dilution in 9 M urea / 2% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonic acid (CHAPS). Pretreated samples were diluted
1:10 in binding buffer and randomly applied to the arrays. After a 30 min incubation,
the arrays were washed twice with binding buffer and twice with PBS pH 7.4 / 0.5 M
sodium chloride for 5 min. Following a quick rinse with deionised water, arrays were
air-dried. A 50% sinapinic acid (SPA; Bio-Rad Labs) solution in 50% acetonitrile (ACN;
Biosolve, Valkenswaard, The Netherlands) / 0.5% trifluoroacetic acid (TFA; Merck) was
applied twice (1.0 μl) to the arrays as the matrix. Following air-drying, the arrays were
analysed using the ProteinChip SELDI (PBS IIc) Reader. As the three batches were
analysed on different occasions (with PBS IIc Reader maintenance in between), data
acquisition was optimised for each sample set separately (data not shown), to obtain
similar spectra. For mass accuracy, the instrument was calibrated on each day of
measurements with All-in-One peptide standard (Bio-Rad Labs).
Statistics and bioinformatics
Spectra were processed per batch by the ProteinChip Software v3.1 (Bio-Rad Labs).
Spectra were baseline subtracted, followed by normalisation to the total ion current.
Spectra with normalisation factors > 2 or < 0.5 were excluded from further analysis. The
Biomarker Wizard (BMW) software package was applied for peak detection. BMW
settings were optimised for each batch separately (data not shown), to ascertain correct
detection of real peaks (instead of peaks that merely represent noise). Peak information
was subsequently exported as spreadsheet files, and peak intensities from the duplicate
analyses in Batch 2 and 3 were averaged. The three batches were analysed on three
separate occasions, a parameter known to influence spectral data (19;20). As such,
merging peak intensity data of the three batches could lead to spurious results. To
prevent this, the intensities of peaks occurring across all three batches were first log
transformed to obtain normal distributions. Next, the log transformed peak intensities
were converted to standard Z-values per batch, by subtracting the mean and dividing
by the standard deviation. The log-Z transformed data of the three batches were
merged in one file. After this, cases and controls were randomly divided over a training
(BC: n = 76, HC: n = 65) and test (BC: n = 76, HC: n = 64) set. In the training set, the
parametric T-test was applied for the comparison of the mean log-Z transformed peak
intensities between cases and controls. Resulting p-values were corrected for multiple
testing by the Bonferroni method, by multiplying p-values with the number of peak
clusters detected and tested.
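The batch-wise standardisation and the multiple-testing correction described above can be sketched as follows. This is a minimal illustration in Python and not the software used in the study (the statistical analyses were performed in SPSS). The function names, the use of the natural logarithm, the sample (n - 1) standard deviation, and the capping of corrected p-values at 1 are assumptions made for the example.

```python
import math
import statistics


def log_z_per_batch(intensities_by_batch):
    """Log transform the intensities of one peak cluster and convert them
    to Z-values within each batch (subtract the batch mean, divide by the
    batch standard deviation)."""
    standardised = {}
    for batch, intensities in intensities_by_batch.items():
        logs = [math.log(value) for value in intensities]
        mean = statistics.mean(logs)
        sd = statistics.stdev(logs)  # assumption: sample SD (n - 1)
        standardised[batch] = [(x - mean) / sd for x in logs]
    return standardised


def bonferroni(p_values, n_tests=None):
    """Multiply each p-value by the number of tests performed.
    Capping at 1.0 is a common convention and an assumption here."""
    n = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * n) for p in p_values]


# Hypothetical intensities of one peak cluster in two batches that differ
# only by a constant scale factor; after the transformation they coincide.
z_values = log_z_per_batch({
    "batch1": [1.0, 10.0, 100.0],
    "batch2": [2.0, 20.0, 200.0],
})
print(z_values)

# Hypothetical raw p-values corrected for 57 tested peak clusters.
print(bonferroni([0.0001, 0.02], n_tests=57))
```

Because the Z-transformation removes any constant scale factor, the base of the logarithm does not affect the resulting values; only the shape of the within-batch distribution matters.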
To estimate the influence of subjects’ age and storage duration on the relationship
between the 14 discriminating peak clusters and breast cancer status, logistic regression
analyses were performed on the training set. First, we calculated a crude odds ratio per
peak cluster, using a univariate model (i.e., by inclusion of only one peak cluster as
continuous variable). Next, multivariate odds ratios adjusted for subjects’ age
(categorized according to tertiles: ≤ 51.3 years, 51.3-61.4 years, > 61.4 years), and storage
duration (categorized according to tertiles: ≤ 14.5 months, 14.5-31.7 months, > 31.7
months) were calculated. Age and storage duration were considered confounders if the adjusted
odds ratio differed by more than 10% from the crude odds ratio.
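The confounding criterion can be illustrated with a short sketch. It assumes that the change is expressed relative to the crude odds ratio on the untransformed scale, which the text does not state explicitly; the function names are hypothetical. The input values are the rounded crude and adjusted training-set odds ratios reported in Table 2.

```python
def relative_change(crude_or, adjusted_or):
    """Relative difference between adjusted and crude odds ratio
    (assumption: expressed as a fraction of the crude odds ratio)."""
    return abs(adjusted_or - crude_or) / crude_or


def is_confounded(crude_or, adjusted_or, threshold=0.10):
    """True if adjustment changes the odds ratio by more than the threshold."""
    return relative_change(crude_or, adjusted_or) > threshold


# m/z: (crude OR, adjusted OR), rounded values from Table 2 (training set)
TABLE2_TRAINING_OR = {
    2733: (0.45, 0.50), 3166: (0.40, 0.41), 3282: (0.41, 0.44),
    3299: (0.42, 0.41), 3691: (0.37, 0.37), 3782: (0.44, 0.48),
    3965: (0.45, 0.52), 3980: (0.47, 0.49), 3997: (0.44, 0.44),
    4219: (1.86, 2.40), 4292: (0.39, 0.42), 4309: (0.32, 0.32),
    8940: (0.37, 0.35), 11745: (2.21, 2.22),
}

flagged = sorted(
    mz for mz, (crude, adjusted) in TABLE2_TRAINING_OR.items()
    if is_confounded(crude, adjusted)
)
print(flagged)  # [2733, 3965, 4219]
```

Under this assumption, the rounded table values flag the same three peak clusters that are reported in the Results.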
To investigate the relationship between a combination of the log-Z transformed peak
intensities and the presence of breast cancer, crude odds ratios for each of the peak
intensities (as continuous variables) were estimated in a logistic regression model with
the inclusion of all peak clusters detected based on forward entry (p < 0.05). Again, to
investigate whether the relationship between peak intensities and the presence of breast
cancer could be explained by subjects’ age and / or sample storage duration, the odds
ratios were adjusted for these parameters.
The classification performance of the logistic regression model was evaluated by
estimation of the area under the Receiver Operating Characteristic (ROC) curve (AUC)
and accompanying 95% confidence interval. The model was subsequently applied to the
test set for validation purposes. All statistical analyses were performed using SPSS
statistical software, version 13.0 (SPSS Inc., Chicago, IL, USA).
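For readers unfamiliar with these performance measures, the sketch below shows one way to compute the ROC AUC, sensitivity and specificity from predicted probabilities. It is an illustration only and not the SPSS procedure used in the study. The AUC is computed through its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half). The function names, the toy scores, and the classification cut-off of 0.5 are assumptions.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve as the fraction of case-control pairs in
    which the case has the higher score (ties count as 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, cutoff=0.5):
    """Sensitivity and specificity when scores >= cutoff are called 'case'.
    The cutoff of 0.5 is an assumption, not a value given in the text."""
    true_positives = sum(score >= cutoff for score in case_scores)
    true_negatives = sum(score < cutoff for score in control_scores)
    return (true_positives / len(case_scores),
            true_negatives / len(control_scores))


# Hypothetical predicted probabilities from a logistic regression model
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]

print(roc_auc(cases, controls))                  # 8 of 9 pairs ranked correctly
print(sensitivity_specificity(cases, controls))  # 2 of 3 cases, 2 of 3 controls
```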
Peptide purification and identification
Structural identification of potential biomarkers was performed previously. Briefly,
potential markers were purified from serum using anion-exchange chromatographic,
size exclusion, and gel-electrophoresis techniques, followed by trypsin digestion. The
peptide map of the digest, acquired on the ProteinChip SELDI (PBS IIc) Reader, was
investigated with the NCBI database using the ProFound search engine at
http://prowl.rockefeller.edu/prowl-cgi/profound.exe. Confirmation of protein identity
was provided by sequencing tryptic digest peptides by quadrupole-TOF (Q-TOF) MS
(Applied Biosystems / MSD Sciex, Foster City, CA, USA) fitted with a ProteinChip
Interface (PCI-1000). Fragment ion spectra were taken to search the SwissProt 44.2
database (Homo sapiens: 11072 sequences) using the MASCOT search engine at
www.matrixscience.com (Matrix Science Ltd., London, UK). Protein identity was
further confirmed by immunoassay on ProteinA beads.
Results
Study population
Patient and sample characteristics are summarized in Table 1. The healthy controls
were significantly younger than the breast cancer patients at time of sample
procurement (Mann-Whitney U test (MWU); p < 0.001). The majority of breast cancer
patients had invasive ductal carcinoma (76%) and was diagnosed with Stage 2 (63%)
disease. The median sample storage duration was slightly longer for breast cancer sera
(median: 26.0 months) than for the healthy control sera (median: 20.1 months) (MWU;
p = 0.018).
Table 1
Patient and sample characteristics of the study population.
Parameter                                     Breast cancer        Healthy control
N                                             152                  129
Age (years), median [IQR]                     61.1 [50.3-67.0]     52.0 [42.0-57.7]
Stage*                                                             n.a.
  0                                           7
  1                                           30
  2A / 2B                                     68 / 28
  3A / 3C                                     13 / 6
Diagnosis*                                                         n.a.
  DCIS                                        6
  IDC                                         116
  ILC                                         16
  IDC & ILC                                   5
  Other                                       9
Sample storage time (months), median [IQR]    26.0 [14.1-36.7]     20.1 [12.6-31.9]
Sample collection interval                    Apr-’03 - Jul-’05    Jan-’03 - Jul-’05
Abbreviations: DCIS: ductal carcinoma in situ, IDC: invasive ductal carcinoma, ILC: invasive lobular
carcinoma, IQR: interquartile range, n.a.: not applicable. * Pathologically determined stage and diagnosis;
other: mucinous, tubular, mixed, unknown.
SELDI-TOF MS protein profiling
Representative SELDI-TOF MS spectra are presented in Figure 1. Following spectrum
pre-processing and normalisation, 73 (BC: n = 36; HC: n = 37), 89 (BC: n = 43; HC: n =
46), and 111 samples (BC: n = 68; HC: n = 43) were left for analysis in Batch 1, 2, and 3,
respectively. The Biomarker Wizard detected 57 peak clusters across all three batches.
In the training set, 14 peak clusters were found significantly different in expression
between breast cancer and control (T-test; Bonferroni corrected p < 0.05, Table 2).
Except for the m/z 4219 and m/z 11745 peak clusters, intensities were found decreased
in breast cancer compared to control (Table 2: logistic regression, odds ratio < 1).
Following correction for subjects’ age and sample storage duration, the adjusted odds
ratios of three peak clusters (m/z 2733, 3965, and 4219) differed by more than 10% from
the crude odds ratios. All three peaks remained, however, significantly related to breast
cancer status. Ten of these 14 peak clusters were found significantly different in peak
expression between breast cancer and control in the test set as well.
Figure 1
Representative example of protein profiles obtained from a healthy control (HC) and a breast
cancer patient (BC).
[Figure: SELDI-TOF MS spectra in two panels (HC and BC); x-axis: mass-to-charge ratio; y-axis: relative intensity; labelled peaks: 3281.3+H, 3979.8+H, 4289.7+H, 8939.6+H.]
Next, multivariate logistic regression analyses were performed on the training set.
Following forward entry inclusion of all peak clusters detected spectrum-wide, four
peak clusters (m/z 4219, 4309, 5350, and 29183) were incorporated in the model,
resulting in a ROC AUC of 0.813 (95% CI: 0.742-0.884) (Table 3). Two peak clusters
(m/z 4219 and 4309) were already found significantly different in peak expression
between breast cancer and healthy control. Of the four peak clusters included in this
model, only m/z 4219 had an adjusted odds ratio that differed more than 10% from the
crude odds ratio. Similar to the univariate analyses, however, after adjustment this peak
cluster was even more strongly related to breast cancer status. The multivariate model
classified the samples in the training set with a sensitivity and specificity of 74.3% and
71.9%, respectively. Model performance was lower following validation on the test set
(ROC AUC: 0.713 (95% CI: 0.626-0.800); sensitivity: 72.6%, specificity: 61.3%).
Table 2
Characteristics of the 14 clusters that differ significantly in expression between breast cancer and
healthy control in the training set.
           T-test                      Logistic regression analyses
Cluster    Training set   Test set     Training set †      Training set ‡        Test set †
(m/z)      p-value*       p-value*     OR (95% CI)         (adjusted)            OR (95% CI)
                                                           OR (95% CI)
2733       0.011          0.047        0.45 (0.28-0.70)    0.50 (0.31-0.81)      0.52 (0.34-0.79)
3166       < 0.001        0.013        0.40 (0.26-0.62)    0.41 (0.25-0.67)      0.50 (0.34-0.74)
3282       < 0.001        0.005        0.41 (0.26-0.64)    0.44 (0.28-0.70)      0.49 (0.34-0.72)
3299       < 0.001        < 0.001      0.42 (0.27-0.64)    0.41 (0.26-0.66)      0.40 (0.26-0.61)
3691       < 0.001        n.s.         0.37 (0.23-0.59)    0.37 (0.22-0.63)      0.58 (0.40-0.85)
3782       0.004          0.004        0.44 (0.28-0.68)    0.48 (0.30-0.75)      0.49 (0.33-0.71)
3965       0.005          0.004        0.45 (0.29-0.69)    0.52 (0.33-0.81)      0.49 (0.33-0.71)
3980       0.007          n.s.         0.47 (0.31-0.71)    0.49 (0.31-0.77)      0.65 (0.45-0.94)
3997       0.003          n.s.         0.44 (0.29-0.67)    0.44 (0.27-0.70)      0.62 (0.44-0.87)
4219       0.046          n.s.         1.86 (1.27-2.72)    2.40 (1.51-3.80)      1.80 (1.23-2.64)
4292       0.002          0.028        0.39 (0.24-0.65)    0.42 (0.25-0.71)      0.50 (0.33-0.76)
4309       < 0.001        0.007        0.32 (0.20-0.54)    0.32 (0.18-0.56)      0.48 (0.32-0.72)
8940       < 0.001        0.004        0.37 (0.24-0.57)    0.35 (0.22-0.58)      0.48 (0.33-0.71)
11745      0.003          0.008        2.21 (1.46-3.33)    2.22 (1.39-3.56)      2.18 (1.43-3.34)
Abbreviations: 95% CI: 95% Confidence Interval, n.s.: not significant, OR: odds ratio. † Crude logistic
regression analyses, by inclusion of one peak cluster (continuous), ‡ adjusted logistic regression analyses
(training set only), by inclusion of one peak cluster (continuous), subjects’ age (categorical), and sample
storage duration (categorical), * Bonferroni corrected p-values.
Peptide purification and identification
One of the 14 peak clusters found significantly different between breast cancer and
control was m/z 8940, which we previously identified as complement component 3
precursor by peptide mapping (ProFound; estimated Z-score 1.57, 4% sequence
coverage). Amino acid sequencing of 6 peptides in the tryptic digest by tandem MS on a
Q-TOF identified the marker as C3a des-arginine anaphylatoxin (C3adesArg, 61%
sequence coverage), a 76 amino acid protein with theoretical mass 8939.46 Da and pI
9.54. This identity was confirmed by an immunoassay, for which ProteinA beads were
loaded with a C3a polyclonal antibody (Abcam Ltd, Cambridge, UK).
Figure 2 depicts the correlation matrix presenting the (absolute) Pearson’s correlation
coefficients calculated between the peak intensities of the 14 peaks found significantly
different in expression between breast cancer and healthy control. To preclude bias by
group, all Pearson’s correlation analyses were performed in the healthy controls of the
total study population. As 11 peak clusters were found highly correlated to each other
(Pearson’s R > 0.63, Figure 2), we hypothesize these clusters to represent multiple
fragments of one founder protein. Using data from previous publications, we suggest
this founder protein to be inter-alpha-trypsin inhibitor heavy chain 4 (ITIH4). Eight of
the alleged ITIH4 peak clusters had an observed mass corresponding to the theoretical
mass of the different ITIH4 fragments described in the literature (Table 4). The peak
clusters at m/z 4219 and 11745 were not correlated to any of the significantly different
peak clusters. The m/z 4219 and m/z 5350 peak clusters, selected in the multivariable
logistic regression analysis, were previously identified as (putative) fibrinogen fragments
by our group.
Table 3
Multivariate logistic regression analyses in the training set, by forward entry inclusion of all peak
clusters detected, before and after adjustment for subjects’ age and sample storage duration.
               Multivariate model                  Multivariate model, adjusted
Variable       OR     (95% CI)       p-value       OR     (95% CI)       p-value
m/z 4219       1.94   (1.24-3.04)    0.004         2.78   (1.59-4.86)    < 0.001
m/z 4309       0.26   (0.14-0.48)    < 0.001       0.26   (0.13-0.52)    < 0.001
m/z 5350       0.62   (0.39-0.97)    0.035         0.60   (0.36-1.01)    0.054
m/z 28183      0.53   (0.33-0.83)    0.006         0.49   (0.29-0.85)    0.011

Performance    ROC AUC (95% CI)        Sensitivity   Specificity
Training set   0.813 (0.742-0.884)     74.3%         71.9%
Test set       0.713 (0.626-0.800)     72.6%         61.3%
Abbreviations: AUC: area under the Receiver Operating Characteristic (ROC) curve, 95% CI: 95% Confidence
Interval, OR: odds ratio, ROC: Receiver Operating Characteristics curve.
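The relation between an odds ratio, its confidence interval and its p-value in Table 3 can be illustrated with a short sketch. It assumes Wald-type statistics, i.e., a confidence interval that is symmetric around the regression coefficient on the log-odds scale; the function name is hypothetical. Since the table values are rounded, the recovered p-values are approximate.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_or(odds_ratio, ci_low, ci_high):
    """Recover the logistic regression coefficient, its standard error and
    the two-sided Wald p-value from an odds ratio and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return beta, se, p_value


# m/z 4219 in the unadjusted multivariate model: OR 1.94 (1.24-3.04)
beta, se, p_value = wald_from_or(1.94, 1.24, 3.04)
print(round(beta, 3), round(se, 3), round(p_value, 3))
```

An odds ratio of 1.94 thus corresponds to an increase of about 0.66 in the log-odds of breast cancer per unit increase in the log-Z transformed peak intensity, and the recovered p-value is close to the tabulated value of 0.004.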
Discussion
In the current study, sera of breast cancer patients (n = 152) and healthy controls (n =
129) were analysed using the SELDI-TOF MS technology. Spectra were divided into a
training and test set, and 14 peak clusters were found to differ significantly in peak
expression between breast cancer and healthy control in the training set. Ten of these
14 peak clusters could also be validated in the test set. We previously identified one
peak cluster as C3adesArg, while 12 other peak clusters were tentatively identified as
ITIH4 fragments and a fibrinogen fragment, respectively. A classification model was
subsequently generated by multivariate logistic regression analysis on the training set.
Its performance on the test set was similar to that reported by previously performed
independent validation studies (10;11;18;21). Hence, our split-sample approach yielded
reliable estimates of performance. Nonetheless, the diagnostic performances reported
thus far are only moderate. The identification of a general diagnostic biomarker is,
however, seriously challenged by the molecular characteristics of breast cancer, which
are highly heterogeneous (22-24). As such, selection of breast cancer subgroups for
comparison with healthy controls is expected to improve results of future diagnostic
SELDI-TOF MS studies.
Table 4
Structural identities of eight alleged ITIH4 peak clusters that significantly differ in expression
between breast cancer and healthy control in the training set.
                      Structural identity of putative ITIH4 fragment
Mr (obs)   Mr (calc)
(m/z)      (Da)       Start-End   Amino acid sequence                                 Ref.
2733       2725.06    662-688     R.PGVLSSRQLGLPGPPDVPDHAAYHPF.R                      (25-27)
3166       3157.58    617-644     R.NVHSGSTFFKYYLQGAKIPKPEASFSPR.R                    (25;26;28)
3282       3273.72    658-688     R.MNFRPGVLSSRQLGLPGPPDVPDHAAYHPF.R                  (25-27)
3299       3289.72    658-688     R.MNFRPGVLSSRQLGLPGPPDVPDHAAYHPF.R *
3965       3957.46    654-690     A.AGSRMNFRPGVLSSRQLGLPGPPDVPDHAAYHPFRR.L            (27;28)
3980       3973.46    654-690     A.AGSRMNFRPGVLSSRQLGLPGPPDVPDHAAYHPFRR.L *          (27;28)
4292       4284.83    650-690     R.QAGAAGSRMNFRPGVLSSRQLGLPGPPDVPDHAAYHPFRR.L        (28)
4309       4300.83    650-690     R.QAGAAGSRMNFRPGVLSSRQLGLPGPPDVPDHAAYHPFRR.L *      (28)
Abbreviations: ITIH4: inter-alpha-trypsin inhibitor heavy chain 4, Mr (obs): observed mass-to-charge ratio,
Mr (calc): calculated mass from the matched peptide sequence. * Met-Ox fragment
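The calculated masses in Table 4 can be reproduced from the listed sequences, as sketched below. The sketch assumes that Mr (calc) is the average (not monoisotopic) mass of the neutral peptide and uses standard average residue masses; the function names are hypothetical. Flanking residues (the letters before and after the dots) are not part of the fragment and are stripped before summation.

```python
# Standard average residue masses (Da) of the 20 common amino acids
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153   # added once per peptide for the free termini
OXYGEN = 15.9994  # mass shift per oxidised methionine (Met-Ox)


def strip_flanks(annotated):
    """Turn 'R.PEPTIDE.R' into 'PEPTIDE'; plain sequences pass unchanged."""
    parts = annotated.split(".")
    return parts[1] if len(parts) == 3 else annotated


def average_mass(annotated, oxidised_met=0):
    """Average mass (Da) of the neutral peptide, optionally with a number
    of oxidised methionine residues."""
    sequence = strip_flanks(annotated)
    residues = sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence)
    return residues + WATER + oxidised_met * OXYGEN


print(round(average_mass("R.PGVLSSRQLGLPGPPDVPDHAAYHPF.R"), 2))
print(round(average_mass("R.MNFRPGVLSSRQLGLPGPPDVPDHAAYHPF.R"), 2))
print(round(average_mass("R.MNFRPGVLSSRQLGLPGPPDVPDHAAYHPF.R", oxidised_met=1), 2))
```

The mass difference of approximately 16 Da between the paired peak clusters (e.g., m/z 3282 and 3299) corresponds to one additional oxygen atom, in line with the Met-Ox annotation in the table.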
Complement C3adesArg
We discovered the expression of the serum m/z 8940 C3adesArg peak to be significantly
decreased in breast cancer compared to controls in both the training and test set.
Complement C3 is the most abundant (1.2 mg/ml) complement protein in serum (29),
supporting the activation of all three pathways of complement activation (the classic,
alternative, and lectin pathway) (30;31). Produced mainly in the liver and adipocytes,
C3a is formed by cleavage of C3 (185 kDa) by C3-convertases into C3b (176 kDa) and
C3a (8.9 kDa) (32). The anaphylatoxin C3a is only short lived in serum as
carboxypeptidases cleave the C-terminal arginine residue, creating the more stable, but
biologically inactive C3adesArg (8.9 kDa) (31-33).
As C3 is a positive acute phase reactant (34), elevated serum levels of C3 (and hence,
C3a(desArg)) in cancer compared to control are anticipated. Indeed, elevated serum C3
levels have been described in various cancer types, including neuroblastoma (35), lung
cancer (36), and cancer of the digestive tract (37). Likewise, increased serum C3adesArg
levels, determined by SELDI-TOF MS, have been reported in breast- (9;10),
hepatocellular- (38), and colorectal cancer (39;40), and chronic lymphoid malignancies
(41). We, on the other hand, observed decreased C3adesArg levels in breast cancer in the
current study population, as well as in a subset hereof, which we analysed for validation
of the 8.9 kDa marker reported by Li et al. (21). Other studies have described decreased
8.9 kDa peak intensities in breast- (7;42), and lung cancer (43). Moreover, Li et al.
observed decreased SELDI-TOF MS C3adesArg peak intensities in sera of metastatic breast
cancer patients (10). Their finding is corroborated by the decreased serum C3 levels
reported in patients with metastatic breast-, gastric-, and colorectal cancer (37) and
brain tumours (35). Hence, complement activation seems an early event during
tumourigenesis. This, however, cannot explain the results of the current study, as we
included only sera of patients with locally invasive breast cancer.
Another possible explanation for the observed inconsistencies in 8.9 kDa C3adesArg
regulation is in vitro complement activation, caused by coagulation-induced
platelet activation (44). Banks et al. (45) indeed reported the intensity of an IMAC3 m/z
8939 peak (not structurally identified though) to significantly increase with prolonged
coagulation times. Coagulation time is, however, an unlikely confounder, as studies
generally apply standardised collection protocols for both cancer and control samples.
C3adesArg levels can also be affected by sample storage time. In a previous study, we
found the m/z 8939 C3adesArg peak intensity positively correlated to sample storage time
during the first three years of storage, after which intensities remained stable. Although
in the current study, the breast cancer sera were stored for a slightly longer period than
the control sera, storage time of both sample groups was less than three years.
Moreover, as the m/z 8939 peak performance was not influenced by adjustment for
sample storage duration, this parameter is unlikely to have confounded results of the
current study.
ITIH4 fragments
Of the 14 peaks we found significantly different in expression between breast cancer
and healthy controls, 11 were identified as putative ITIH4 fragments. The peak
intensities of all putative ITIH4 fragments were decreased in breast cancer compared to
control. ITIH4, a 120 kDa plasma glycoprotein expressed mainly in the liver, acts as a
positive acute phase reactant and is extensively proteolytically processed (27). Plasma
kallikrein readily cleaves ITIH4 into an N-terminal 85 kDa and C-terminal 35 kDa
fragment, after which the 85 kDa fragment is further cleaved into an N-terminal 57 kDa
and a putative 28 kDa fragment. The latter fragment has not been detected in its
entirety hitherto, as it is rapidly cleaved into subsequent smaller fragments (27).
Changes in the abundance of different fragments have been found associated with
various types of cancer (e.g., prostate, breast, ovarian, colorectal and pancreatic cancer)
(25-27), indicating cancer-type specific proteolytic processing of ITIH4. Three of the 11
putative ITIH4 fragments (i.e., m/z 2733, m/z 3282, and m/z 4292) have been reported
as potential markers for breast cancer (27). Unlike our results, however, this study
found increased peak intensities of the three fragments in cancer compared to control
(27).
Figure 2
Peak intensity correlation matrix for the 14 peaks found significantly different in expression
between breast cancer and healthy control in the training set (for clarity, Pearson's correlation
coefficients were converted into absolute values).
[Figure: correlation matrix; both axes list the peaks m/z 2733, 3166, 3282, 3299, 3691, 3782,
3965, 3980, 3997, 4292, 4309, 8940, 4219 and 11745; scale ticks run from 0.1 to 1.]
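The kind of matrix shown in Figure 2 can be sketched in a few lines of Python. The example below computes pairwise Pearson correlation coefficients between peak intensity vectors and converts them into absolute values, as described in the figure legend. The intensity values and the function names are hypothetical and serve only as illustration; the software actually used to produce Figure 2 is not specified here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long intensity lists.

    Raises ZeroDivisionError if either list is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def abs_correlation_matrix(intensities):
    """Map each pair of peak labels to the absolute Pearson coefficient."""
    labels = list(intensities)
    return {a: {b: abs(pearson(intensities[a], intensities[b]))
                for b in labels}
            for a in labels}


# Hypothetical peak intensities in four spectra (not data from this study).
example = {
    "m/z 2733": [1.0, 2.0, 3.0, 4.0],
    "m/z 3282": [2.1, 3.9, 6.2, 8.0],
    "m/z 8940": [9.0, 7.0, 4.0, 2.0],
}
matrix = abs_correlation_matrix(example)
for label, row in matrix.items():
    print(label, {other: round(r, 2) for other, r in row.items()})
```

Taking absolute values means that strongly inversely correlated peaks appear as prominently as positively correlated ones, which is convenient when the aim is to spot peaks that may derive from the same parent protein.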
The m/z 4292 ITIH4 fragment has also been described by Li et al. (9;10). They initially
observed a 4.3 kDa ITIH4 fragment to be downregulated in breast cancer (9), but found
this peak upregulated upon validation (10). In their original discovery study, the cancer
sera were collected during a (non-specified) longer time interval than the control sera,
whereas in the validation study, sera of both cases and controls were collected within a
two-year time interval. Combined with the postulated instability of the ITIH4 fragment
(causing further truncation during prolonged storage), this could indeed explain their
discrepant results. Nonetheless, following analysis of prospectively collected sera,
Mathelin et al. (11) also observed a decreased expression of the m/z 4292 ITIH4 peak
intensity in breast cancer. This decrease was also observed following analysis of a subset
of the current study population for validation of the markers reported by Li et al. (21).
However, the decrease of m/z 4292 observed in the breast cancer cases of the current
study could not be explained by the difference in storage duration between the cancer
and control sera, as correction for this parameter by logistic regression analyses did not
affect the performance of the m/z 4292 peak. In addition, evidence for the alleged 4.3
kDa ITIH4 fragment instability is only limited. Peak intensities of this fragment were
found to be both increased (28) and decreased (25;27) by different (pre-)analytical parameters,
though the fragmentation pattern was not altered (25;27). Perhaps the discrepant results
are caused by differences between the patient populations
investigated in the various studies.
Other markers
Of the 14 peak clusters found significantly different in expression between breast
cancer and healthy controls, neither m/z 4219 nor m/z 11745 was correlated to any
of the other peak clusters. While we previously identified m/z 4219 as a putative
fibrinogen fragment, the identity of the m/z 11745 peak is as yet unknown. The m/z 5350 peak
cluster, included in the logistic regression model, was identified earlier as a fibrinogen
fragment as well (i.e., fibrinogen alpha-E fragment FGA576-625). The multivariate
classification model furthermore designated the m/z 28183 peak cluster as a potential
marker, in combination with m/z 4219, 4309, and 5350. Although the peak intensities of
m/z 5350 and m/z 28183 did not differ significantly between
breast cancer and healthy control, combination with other markers evidently improved
their diagnostic performance. The m/z 5350 peak cluster, though not structurally
identified in those studies, has been reported earlier as significantly increased in sera of patients with
lung cancer (43) and hypopharyngeal squamous cell carcinoma (46). Based on the
observed mass, we hypothesise the m/z 28183 peak cluster to represent apolipoprotein
A-I. This protein was previously identified by our group as a potential marker for
colorectal cancer by serum SELDI-TOF MS analyses (47). Synthesised both in the liver
and small intestine, apolipoprotein A-I constitutes the major component of high-density
lipoproteins (48). It is a negative acute phase reactant (49), explaining the decreased
expression we observed in cancer vs. healthy control (Table 2, crude odds ratio < 1). Its
decreased expression in cancer is confirmed by other studies investigating breast- (48),
ovarian- (50), colorectal- (47), and hepatocellular cancer (51).
Conclusion
In conclusion, using SELDI-TOF MS, we discovered and validated 10 peak clusters that
significantly differ in expression between sera of breast cancer patients and healthy
controls. These peak clusters were structurally identified as the highly abundant C3adesArg
anaphylatoxin, and putative ITIH4 and fibrinogen fragments. Logistic regression
analyses in the training set yielded a classification model with a performance
comparable to those reported in previously performed independent validation studies.
As these moderate performances most likely originate from the highly heterogeneous
nature of breast cancer, selection of breast cancer subgroups for comparison with
healthy controls is expected to improve results of future diagnostic SELDI-TOF MS
studies.
Acknowledgement
This study was supported by a grant of the Dutch Cancer Society (project NKI 2005-3421).
References
(1) Jemal A, Siegel R, Ward E, Hao Y, Xu J, Murray T et al. Cancer statistics, 2008. CA Cancer J Clin 2008;
58(2):71-96.
(2) Ries L, Melbert D, Krapcho M, Stinchcomb D, Howlader N, Horner M et al. SEER Cancer Statistics
Review, 1975-2005, National Cancer Institute. Bethesda, MD, http://seer.cancer.gov/csr/1975_2005/,
based on November 2007 SEER data submission, posted on the SEER website. 2008.
(3) Stieber P, Molina R, Chan DW, Fritsche HA, Beyrau R, Bonfrer JM et al. Clinical evaluation of the
Elecsys CA 15-3 test in breast cancer patients. Clin Lab 2003; 49(1-2):15-24.
(4) Banks RE, Dunn MJ, Hochstrasser DF, Sanchez JC, Blackstock W, Pappin DJ et al. Proteomics: new
perspectives, new biomedical opportunities. Lancet 2000; 356(9243):1749-1756.
(5) Becker S, Cazares LH, Watson P, Lynch H, Semmes OJ, Drake RR et al. Surfaced-enhanced laser
desorption/ionization time-of-flight (SELDI-TOF) differentiation of serum protein profiles of BRCA-1
and sporadic breast cancer. Ann Surg Oncol 2004; 11(10):907-914.
(6) Belluco C, Petricoin EF, Mammano E, Facchiano F, Ross-Rucker S, Nitti D et al. Serum Proteomic
Analysis Identifies a Highly Sensitive and Specific Discriminatory Pattern in Stage 1 Breast Cancer. Ann
Surg Oncol 2007; 14(9):2470-2476.
(7) Hu Y, Zhang S, Yu J, Liu J, Zheng S. SELDI-TOF-MS: the proteomics and bioinformatics approaches in
the diagnosis of breast cancer. Breast 2005; 14(4):250-255.
(8) Laronga C, Becker S, Watson P, Gregory B, Cazares L, Lynch H et al. SELDI-TOF serum profiling for
prognostic and diagnostic classification of breast cancers. Dis Markers 2003; 19(4-5):229-238.
(9) Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW. Proteomics and bioinformatics approaches for
identification of serum biomarkers to detect breast cancer. Clin Chem 2002; 48(8):1296-1304.
(10) Li J, Orlandi R, White CN, Rosenzweig J, Zhao J, Seregni E et al. Independent Validation of Candidate
Breast Cancer Serum Biomarkers Identified by Mass Spectrometry. Clin Chem 2005; 51(12):2229-2235.
(11) Mathelin C, Cromer A, Wendling C, Tomasetto C, Rio MC. Serum biomarkers for detection of breast
cancers: a prospective study. Breast Cancer Res Treat 2005;1-8.
(12) Vlahou A, Laronga C, Wilson L, Gregory B, Fournier K, McGaughey D et al. A novel approach toward
development of a rapid blood test for breast cancer. Clin Breast Cancer 2003; 4(3):203-209.
(13) Goncalves A, Esterni B, Bertucci F, Sauvan R, Chabannon C, Cubizolles M et al. Postoperative serum
proteomic profiles may predict metastatic relapse in high-risk primary breast cancer patients receiving
adjuvant chemotherapy. Oncogene 2006; 25(7):981-989.
(14) Pusztai L, Gregory BW, Baggerly KA, Peng B, Koomen J, Kuerer HM et al. Pharmacoproteomic analysis
of prechemotherapy and postchemotherapy plasma samples from patients receiving neoadjuvant or
adjuvant chemotherapy for breast carcinoma. Cancer 2004; 100(9):1814-1822.
(15) Heike Y, Hosokawa M, Osumi S, Fujii D, Aogi K, Takigawa N et al. Identification of serum proteins
related to adverse effects induced by docetaxel infusion from protein expression profiles of serum using
SELDI ProteinChip system. Anticancer Res 2005; 25(2B):1197-1203.
(16) Ransohoff DF. Lessons from controversy: ovarian cancer screening and serum proteomics. J Natl Cancer
Inst 2005; 97(4):315-319.
(17) Villanueva J, Philip J, Chaparro CA, Li Y, Toledo-Crow R, DeNoyer L et al. Correcting common errors in
identifying cancer-specific serum peptide signatures. J Proteome Res 2005; 4(4):1060-1072.
(18) Gast MC, Bonfrer JM, van Dulken EJ, de Kock L, Rutgers EJ, Schellens JH et al. SELDI-TOF MS serum
protein profiles in breast cancer: assessment of robustness and validity. Cancer Biomark 2006; 2(6):235-248.
(19) Hu J, Coombes KR, Morris JS, Baggerly KA. The importance of experimental design in proteomic mass
spectrometry experiments: some cautionary tales. Brief Funct Genomic Proteomic 2005; 3(4):322-331.
(20) Karsan A, Eigl BJ, Flibotte S, Gelmon K, Switzer P, Hassell P et al. Analytical and preanalytical biases in
serum proteomic pattern analysis for breast cancer diagnosis. Clin Chem 2005; 51(8):1525-1528.
(21) van Winden AWJ, Gast MCW, Beijnen JH, Rutgers EJ, Grobbee DE, Peeters PHM et al. Validation of
previously identified serum biomarkers for breast cancer with SELDI-TOF MS: a case control study.
BMC Medical Genomics 2008; Accepted for publication.
(22) Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H et al. Gene expression patterns of breast
carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 2001;
98(19):10869-10874.
(23) Perou CM, Sorlie T, Eisen MB, van de RM, Jeffrey SS, Rees CA et al. Molecular portraits of human breast
tumours. Nature 2000; 406(6797):747-752.
(24) Bertucci F, Birnbaum D, Goncalves A. Proteomics of Breast Cancer: Principles and Potential Clinical
Applications. Mol Cell Proteomics 2006; 5(10):1772-1786.
(25) Fung ET, Yip TT, Lomas L, Wang Z, Yip C, Meng XY et al. Classification of cancer types by measuring
variants of host response proteins using SELDI serum assays. Int J Cancer 2005; 115(5):783-789.
(26) Villanueva J, Shaffer DR, Philip J, Chaparro CA, Erdjument-Bromage H, Olshen AB et al. Differential
exoprotease activities confer tumor-specific serum peptidome patterns. J Clin Invest 2006; 116(1):271-284.
(27) Song J, Patel M, Rosenzweig CN, Chan-Li Y, Sokoll LJ, Fung ET et al. Quantification of fragments of
human serum inter-alpha-trypsin inhibitor heavy chain 4 by a surface-enhanced laser
desorption/ionization-based immunoassay. Clin Chem 2006; 52(6):1045-1053.
(28) Timms JF, Arslan-Low E, Gentry-Maharaj A, Luo Z, T'Jampens D, Podust VN et al. Preanalytic influence
of sample handling on SELDI-TOF serum protein profiles. Clin Chem 2007; 53(4):645-656.
(29) Hugli TE. Human anaphylatoxin (C3a) from the third component of complement. Primary structure. J
Biol Chem 1975; 250(21):8293-8301.
(30) Bohana-Kashtan O, Ziporen L, Donin N, Kraus S, Fishelson Z. Cell signals transduced by complement.
Mol Immunol 2004; 41(6-7):583-597.
(31) Sahu A, Sunyer JO, Moore WT, Sarrias MR, Soulika AM, Lambris JD. Structure, functions, and evolution
of the third complement component and viral molecular mimicry. Immunol Res 1998; 17(1-2):109-121.
(32) de Bruijn MH, Fey GH. Human complement component C3: cDNA coding sequence and derived
primary structure. Proc Natl Acad Sci U S A 1985; 82(3):708-712.
(33) Nettesheim DG, Edalji RP, Mollison KW, Greer J, Zuiderweg ER. Secondary structure of complement
component C3a anaphylatoxin in solution as determined by NMR spectroscopy: differences between
crystal and solution conformations. Proc Natl Acad Sci U S A 1988; 85(14):5036-5040.
(34) Gabay C, Kushner I. Acute-phase proteins and other systemic responses to inflammation. N Engl J Med
1999; 340(6):448-454.
(35) Carli M, Bucolo C, Pannunzio MT, Ongaro G, Businaro R, Revoltella R. Fluctuation of serum
complement levels in children with neuroblastoma. Cancer 1979; 43(6):2399-2404.
(36) Gminski J, Mykala-Ciesla J, Machalski M, Drozdz M, Najda J. Immunoglobulins and complement
components levels in patients with lung cancer. Rom J Intern Med 1992; 30(1):39-44.
(37) Maness PF, Orengo A. Serum complement levels in patients with digestive tract carcinomas and other
neoplastic diseases. Oncology 1977; 34(2):87-89.
(38) Lee IN, Chen CH, Sheu JC, Lee HS, Huang GT, Chen DS et al. Identification of complement C3a as a
candidate biomarker in human chronic hepatitis C and HCV-related hepatocellular carcinoma using a
proteomics approach. Proteomics 2006; 6(9):2865-2873.
(39) Habermann JK, Roblick UJ, Luke BT, Prieto DA, Finlay WJ, Podust VN et al. Increased serum levels of
complement C3a anaphylatoxin indicate the presence of colorectal tumors. Gastroenterology 2006;
131(4):1020-1029.
(40) Ward DG, Suggett N, Cheng Y, Wei W, Johnson H, Billingham LJ et al. Identification of serum
biomarkers for colon cancer by proteomic analysis. Br J Cancer 2006; 94(12):1898-1905.
(41) Miguet L, Bogumil R, Decloquement P, Herbrecht R, Potier N, Mauvieux L et al. Discovery and
identification of potential biomarkers in a prospective study of chronic lymphoid malignancies using
SELDI-TOF-MS. J Proteome Res 2006; 5(9):2258-2269.
(42) Shin S, Cazares L, Schneider H, Mitchell S, Laronga C, Semmes OJ et al. Serum biomarkers to
differentiate benign and malignant mammographic lesions. J Am Coll Surg 2007; 204(5):1065-1071.
(43) Han KQ, Huang G, Gao CF, Wang XL, Ma B, Sun LQ et al. Identification of lung cancer patients by
serum protein profiling using surface-enhanced laser desorption/ionization time-of-flight mass
spectrometry. Am J Clin Oncol 2008; 31(2):133-139.
(44) Hamad OA, Ekdahl K, Lambris JD, Nilsson B. Complement activation triggered by thrombin
receptor-activated platelets. Mol Immunol 2007; 44:180.
(45) Banks RE, Stanley AJ, Cairns DA, Barrett JH, Clarke P, Thompson D et al. Influences of blood sample
processing on low-molecular-weight proteome identified by surface-enhanced laser
desorption/ionization mass spectrometry. Clin Chem 2005; 51(9):1637-1649.
(46) Zhou L, Cheng L, Tao L, Jia X, Lu Y, Liao P. Detection of hypopharyngeal squamous cell carcinoma
using serum proteomics. Acta Otolaryngol 2006; 126(8):853-860.
(47) Engwegen JY, Helgason HH, Cats A, Harris N, Bonfrer JM, Schellens JH et al. Identification of serum
proteins discriminating colorectal cancer patients and healthy controls using surface-enhanced laser
desorption ionisation-time of flight mass spectrometry. World J Gastroenterol 2006; 12(10):1536-1544.
(48) Chang SJ, Hou MF, Tsai SM, Wu SH, Hou LA, Ma H et al. The association between lipid profiles and
breast cancer among Taiwanese women. Clin Chem Lab Med 2007; 45(9):1219-1223.
(49) Van Lenten BJ, Reddy ST, Navab M, Fogelman AM. Understanding changes in high density lipoproteins
during the acute phase response. Arterioscler Thromb Vasc Biol 2006; 26(8):1687-1688.
(50) Zhang Z, Bast RC, Jr., Yu Y, Li J, Sokoll LJ, Rai AJ et al. Three biomarkers identified from serum
proteomic analysis for the detection of early stage ovarian cancer. Cancer Res 2004; 64(16):5882-5890.
(51) Steel LF, Shumpert D, Trotter M, Seeholzer SH, Evans AA, London WT et al. A strategy for the
comparative analysis of serum proteomes for the discovery of biomarkers for hepatocellular carcinoma.
Proteomics 2003; 3(5):601-609.
Chapter 3.2
SELDI-TOF MS serum protein profiles in breast cancer:
assessment of robustness and validity
Marie-Christine W. Gast
Johannes M.G. Bonfrer
Eric J. van Dulken
Lieve de Kock
Emiel J. Th. Rutgers
Jan H.M. Schellens
Jos H. Beijnen
Cancer Biomarkers 2006;2(6):235-48
Abstract
There is an urgent need for new serum markers that can be applied in the early
detection of breast cancer. Following detection of new, potential biomarkers, such as
those reported by Vlahou et al. (Clin Breast Cancer 2003;4:203-9) and Laronga et al.
(Dis Markers 2003;19:229-38), assessment of both their robustness and validity is
essential to confirm their clinical applicability. We therefore aimed to determine
robustness and validity of biomarkers reported by the authors mentioned, by analysis of
an independent sample set (breast cancer: n = 47, normal women: n = 48) in our
laboratory, according to the methods described by both authors. Although all markers
for the differentiation between breast cancer patients and normal women, discovered in
the study of Vlahou et al., were recovered in our validation data set, none had sufficient
performance to be applied as a classifier. The markers discovered by Laronga et al. in
the differentiation between lymph node positive and -negative breast cancer patients
were in part recovered from our validation data set, but were also not applicable as a
classifier. In conclusion, although (part of) the proteins discovered and designated as
markers by either author could be detected, their validity as biomarkers could not be
confirmed by the current study. This finding stresses that, when reporting on a
potential biomarker, confirmation of both robustness and validity is essential in
obtaining its true clinical applicability.
Introduction
Following the introduction of the proteomic surface-enhanced laser
desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF MS) platform,
many efforts have been made in the search for new biomarkers applicable in, among others, cancer
diagnosis. This has resulted in several publications, reporting the analysis of SELDI-TOF
MS protein profiles by sophisticated bioinformatics techniques, yielding potential
biomarkers with a high sensitivity and specificity in the detection of different types of
cancer, e.g., prostate, ovarian, lung and breast cancer (1-5). Despite these promising
results, this technology faces some limitations. Of particular concern has been the
reproducibility of the SELDI-based approach. This is illustrated by the comment that
for the same cancer type, different biomarkers have been identified by different
research groups (6). Identical protein profiles are, however, very unlikely to be obtained
when studies of the same cancer type use different study populations, different
methods to collect the data (e.g., different ProteinChip Array types, sample handling
and assay conditions), and different bioinformatics methods to analyse the data (7). Still,
standardisation of all these variables does not guarantee detection of reproducible
protein profiles. Discriminative protein profiles could well be explained by chance, due
to overfitting of the data. This can easily occur when large numbers of possible
predictors are fitted by a multivariable model, as is often the case in proteomic data
analysis (8).
To confirm the robustness of biomarker protein profiles, and to preclude the possibility
that they arose by chance, identification of these profiles across different sample sets
and different laboratories is imperative (8). Furthermore, structural identification of the
proteins or peptides that form part of discriminative protein profiles is essential in
overcoming problems with robustness. Results should be validated by analysis of an
independent, but similar sample set, that is handled like the sample set in which the
biomarkers were discovered. Although validation is essential in obtaining the true
clinical applicability of the biomarker, very little attention has been paid to this issue
thus far. To date, only biomarkers for ovarian and breast cancer have been validated by
the analysis of samples from different institutes (3;9). A second, more elaborate
validation study for prostate cancer is currently still ongoing (10;11).
In breast cancer, eight studies in which the SELDI-TOF MS platform was applied in the
identification of serum markers for diagnosis, prognosis, or monitoring of therapy
efficacy or toxicity, have been published to date (5;12-18). Thus far, only one panel
of biomarkers discovered by SELDI-TOF MS in breast cancer has been validated, by
both intra- (9) and inter-laboratory (19) analysis of two independent sample sets. The
two validation studies, however, reported only partial replication of prior results. Thus,
detection of reproducible serum protein profiles in different sample sets and at different
laboratories has not been achieved yet. We therefore aimed to assess robustness and
validity of two high-performance breast cancer protein profiles published earlier by
Vlahou et al. (17) and by Laronga et al. (15), through analysis of our own sample set in
our laboratory, using the procedures as described by both authors.
Methods
Patients
Serum samples of 47 female patients with pathologically confirmed primary breast
cancer, who had not received any prior treatment at time of sample withdrawal, were
included in cohort A of this study. The control samples included in this cohort were
randomly collected from 48 female normal volunteers. All sera in cohort A were stored
at -20°C. Furthermore, serum samples of 19 female patients with pathologically
confirmed primary breast cancer who had not received any prior treatment at time of
sample withdrawal were included in cohort B of this study. All sera in this cohort were
stored at -80°C, and serum aliquots of 7 patients were simultaneously stored at -20°C.
All sera included in this study were obtained after participants had signed an institutional
review board-approved informed consent form. Withdrawal, processing and storage of all serum samples
were performed under strictly defined conditions at the Department of Clinical
Chemistry of the Institute. Study participant and sample characteristics of cohort A and
B are, along with study data of Vlahou et al. (17) and Laronga et al. (15), provided in
Table 1 and Table 2.
Serum protein profiling
Serum protein profiling was performed manually on immobilized metal affinity capture
(IMAC30) and strong anion exchange (Q10) arrays, as described by Adam et al. (1), and
Vlahou et al. (17), respectively. Both array types have the same chromatographic surface
as the array types used in the studies of Vlahou et al. (17) and Laronga et al. (15) (IMAC
and SAX arrays, respectively). Samples were randomised for processing. In brief, sample
pretreatment consisted of mixing 20 μl serum with 30 μl of 8 M urea / 1% CHAPS in
PBS pH 7.4 for 10 min at 4°C, followed by the addition of 100 μl 1 M urea / 0.125%
CHAPS. Next, 600 μl of binding buffer (PBS pH 7.4 for the IMAC30 assay, and 20 mM
HEPES / 0.1% TritonX-100 for the Q10 assay) was added, and samples were placed on
ice. Arrays were assembled into a bioprocessor, a device that holds up to twelve 8-spot
arrays and allows for the addition of larger volumes, which was shaken on a platform
shaker at 250 rpm throughout the assay. IMAC30 arrays were activated by the
application of 20 μl 100 mM CuSO4 for 5 min, followed by 10 rinses with double
distilled water. Next, 20 μl 100 mM sodium acetate was added for 5 min, again followed
by 10 rinses with double distilled water. Both IMAC30 and Q10 arrays were
subsequently equilibrated twice for 5 min with 200 μl of their respective binding buffer.
Next, pretreated serum samples (50 μl) were randomly applied to the arrays. After a 30
min incubation, arrays were washed three times for 5 min with binding buffer and
three times with double distilled water. Following air drying of the arrays, sinapinic
acid was applied twice to each spot. Since neither author clearly specifies the composition
of the energy absorbing molecule (EAM) solution or the volume added per spot, we applied 1 μl of a 50% solution of
sinapinic acid in 50% ACN / 0.5% TFA to each spot twice, according to the manufacturer's
instructions. Following air-drying, the array was inserted in the PBS IIc ProteinChip
Reader (Ciphergen Biosystems Inc., Fremont, CA, USA). Since both studies were
performed with this instrument's predecessor, reported acquisition parameters could
only partially be applied. Both laser intensity and detector sensitivity were optimised
for the detection of the reported biomarkers. Time-Of-Flight mass spectra were
generated for IMAC30 and Q10 arrays by averaging 192 laser shots with intensity 150
and 157, respectively (arbitrary units), and detector sensitivity 6 and 5, respectively
(arbitrary units). Spectra were generated with different focus lag times. For IMAC30
arrays, a lag time of 900 ns was maintained, as specified by Adam et al. (1). For Q10
arrays, focus lag time was not specified; following optimisation, we maintained a lag
time of 528 ns. For mass accuracy, the instrument was calibrated on the day of
measurements using a peptide molecular mass standard (Ciphergen Inc.), containing
[Arg8] vasopressin (1084.3 Da), somatostatin (1637.9 Da), dynorphin (2147.5 Da),
ACTH (2933.5 Da), insulin β-chain (bovine) (3495.9 Da), insulin (human recombinant)
(5807.7 Da) and hirudin (7033.6 Da).
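The sample pretreatment described above implies a fixed serum dilution, which can be worked out from the stated volumes. The Python sketch below does this arithmetic. It assumes that volumes are additive and that the binding buffer itself contributes no urea or CHAPS; the variable names are introduced here for illustration only.

```python
# Volumes (µl) and reagent concentrations taken from the pretreatment protocol.
SERUM_UL = 20.0
STEPS = [
    # (volume in µl, urea in M, CHAPS in %)
    (30.0, 8.0, 1.0),      # 8 M urea / 1% CHAPS in PBS
    (100.0, 1.0, 0.125),   # 1 M urea / 0.125% CHAPS
    (600.0, 0.0, 0.0),     # binding buffer (assumed free of urea and CHAPS)
]
APPLIED_UL = 50.0          # pretreated sample applied per spot

total_ul = SERUM_UL + sum(volume for volume, _, _ in STEPS)
dilution_factor = total_ul / SERUM_UL
final_urea_molar = sum(volume * urea for volume, urea, _ in STEPS) / total_ul
final_chaps_percent = sum(volume * chaps for volume, _, chaps in STEPS) / total_ul
serum_per_spot_ul = APPLIED_UL / dilution_factor

print(f"total volume: {total_ul:.0f} µl")
print(f"serum dilution: {dilution_factor:.1f}-fold")
print(f"final urea: {final_urea_molar:.3f} M")
print(f"final CHAPS: {final_chaps_percent:.4f} %")
print(f"serum per spot: {serum_per_spot_ul:.2f} µl")
```

Under these assumptions the serum is diluted 37.5-fold in a total volume of 750 μl, so that each 50 μl application corresponds to roughly 1.3 μl of undiluted serum.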
Statistics and Bioinformatics
All spectra were processed by the ProteinChip Software v3.1 (Ciphergen Inc.). Spectra
were compiled in one file, baseline subtracted and normalised for total ion current from
1000 Da (Laronga et al. (15)) or 1500 Da (Vlahou et al. (17)) to the spectrum’s end.
Spectra with normalisation factors > 2 or < 0.5 were excluded from further analysis.
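The normalisation and exclusion step can be illustrated with a short Python sketch. It assumes that the normalisation factor of a spectrum is the mean total ion current (TIC) of all spectra divided by the TIC of that spectrum, with the TIC summed from a lower mass limit to the end of the spectrum. This definition, the function names and the toy spectra are assumptions made for illustration; the internals of the ProteinChip Software are not described in this chapter.

```python
def total_ion_current(spectrum, min_mz):
    """Sum the intensities of all (m/z, intensity) points at or above min_mz."""
    return sum(intensity for mz, intensity in spectrum if mz >= min_mz)


def normalise_by_tic(spectra, min_mz=1500.0, low=0.5, high=2.0):
    """Scale each spectrum to the mean TIC and exclude outlying spectra.

    spectra maps a spectrum name to a list of (m/z, intensity) tuples.
    Returns (kept, excluded): kept maps names to normalised spectra,
    excluded maps names to the normalisation factor that triggered exclusion.
    Spectra with a TIC of zero are not handled in this sketch.
    """
    tics = {name: total_ion_current(s, min_mz) for name, s in spectra.items()}
    mean_tic = sum(tics.values()) / len(tics)
    kept, excluded = {}, {}
    for name, spectrum in spectra.items():
        factor = mean_tic / tics[name]
        if factor > high or factor < low:
            excluded[name] = factor
        else:
            kept[name] = [(mz, intensity * factor) for mz, intensity in spectrum]
    return kept, excluded


# Hypothetical toy spectra (not data from this study).
toy = {
    "a": [(1000.0, 50.0), (2000.0, 10.0), (3000.0, 10.0)],
    "b": [(2000.0, 12.0), (3000.0, 12.0)],
    "c": [(2000.0, 2.0), (3000.0, 2.0)],
}
kept, excluded = normalise_by_tic(toy, min_mz=1500.0)
print(sorted(kept), excluded)
```

In this toy example spectrum "c" has a much lower TIC than the others, receives a normalisation factor above 2 and is excluded, whereas "a" and "b" are rescaled to the same TIC. The sketch does not recompute the mean TIC after exclusion, which is a simplification.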
Next, the Biomarker Wizard (BMW), an application within the ProteinChip Software,
was applied for peak detection and clustering. First, peaks detected by Vlahou et al. (17)
and Laronga et al. (15) were manually searched for and labelled at their centroid,
irrespective of signal-to-noise (S/N) ratio. Second, to investigate whether the spectra
contained other discriminative protein peaks, automatic peak detection was performed
along the entire spectrum. Software settings applied herein were as specified by Vlahou
et al. (17) and Laronga et al. (15). For Vlahou et al. (17), peaks with a S/N ratio > 3,
occurring in at least 10% of all spectra were clustered initially, in both IMAC30 and
Q10 spectra. Clusters were completed by peaks with S/N > 1.5 in a cluster mass window
of 0.3%. For Laronga et al. (15), peaks occurring in at least 5% of all spectra, with S/N >
3, were clustered. Peak clusters were completed by peaks with S/N > 2 in a mass
window of 0.2% and 0.3% for IMAC30 and Q10 data, respectively. Following peak
detection and clustering, average peak intensities for both groups (i.e., breast cancer vs.
normal, and lymph node positive vs. -negative) were calculated. Next, peak expression
differences between spectra of both groups were calculated by the Biomarker Wizard,
using the non-parametric Mann-Whitney-U test. P values < 0.05 were considered
statistically significant. Peak information was subsequently exported as spreadsheet
files, and data of both array types were merged in one file. Files were analysed for
pattern recognition and sample classification by the Biomarker Patterns Software v.5.0.1
(BPS; Ciphergen Inc.).
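The group comparison of peak intensities described above can be sketched as follows. The example is a generic, textbook implementation of the two-sided Mann-Whitney U test using the normal approximation with tie and continuity corrections, written with the Python standard library only. It is not the Biomarker Wizard implementation, whose details are not given here, and the intensity values are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U, p), where U is the smaller of the two U statistics.
    Tie and continuity corrections are applied; the approximation is
    intended for moderately large groups.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0                         # all values identical
    z = max((abs(u1 - mean_u) - 0.5) / sqrt(var_u), 0.0)
    return u, erfc(z / sqrt(2.0))             # two-sided p value


# Hypothetical peak intensities for one peak cluster (not study data).
cancer = [3.1, 2.8, 3.5, 2.2, 2.9, 3.0, 2.5, 2.7]
normal = [4.0, 4.4, 3.9, 5.1, 4.8, 3.6, 4.2, 4.6]
u_stat, p_value = mann_whitney_u(cancer, normal)
print(u_stat, p_value, p_value < 0.05)
```

A rank-based test of this kind makes no assumption about the distribution of peak intensities, which is presumably why a non-parametric test was preferred for these data.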
To investigate the effect of storage time on the expression of the reported markers, peak
intensities of spectra in cohort A were plotted against their respective sample storage
times. Plots were visually inspected for random distribution of peak intensity across
sample age. All statistical analyses, other than those executed by the BMW or BPS, were
performed by using SPSS statistical software, v.11.0.1 (SPSS Inc., Chicago, IL, USA). For
all analyses, a 2-tailed P value < 0.05 was considered significant.
Results
Patients
As is depicted by the data presented in Table 1, only minor differences existed in
participant and sample characteristics of both studies. Breast cancer patients in cohort
A, however, differed in stage of disease. While the study participants of Vlahou et al.
(17) had Stage I to IV disease or DCIS, cohort A represented only Stage II and III
disease. The median age of breast cancer patients and normal women differed slightly
between both sample sets, but the range of participants’ age was similar. Moreover, all
sera were withdrawn prior to therapy. Sera of both breast cancer patients and normal
women in cohort A were sampled in the same time period. The samples in cohort A
were stored at a different temperature (-20°C) than the samples analysed by Vlahou et
al. (17) (-80°C).
Table 1
Characteristics of study participants (breast cancer vs. normal women) and study samples, in
comparison with those reported by Vlahou et al. (17).
| Characteristic | Normal, Cohort A | Normal, Vlahou et al. | Cancer, Cohort A | Cancer, Cohort B | Cancer, Vlahou et al. |
|---|---|---|---|---|---|
| N | 48 | 47 | 47 | 19 | 45 |
| Median age [range] (years) | 51.0 [21-71] | 46.5 [21-78] | 61.4 [34-86] | 60.5 [27-80] | 59.3 [31-91] |
| Stage* (categories: DCIS, 1, 2A, 2B, 3A, 4), n (%) | n.a. | n.a. | 25 (53%); 15 (32%); 7 (15%) | 5 (26%); 11 (58%); 3 (16%) | 8 (18%); 14 (31%); 14 (31%); 6 (13%); 3 (7%) |
| Sampling period | Apr-’03 to Jan-’05 | Dec-’01 to May-’02 | Aug-’03 to Dec-’04 | Apr-’05 to Jan-’06 | Dec-’01 to May-’02 |
| Storage temp | -20°C | -80°C | -20°C | -20°C / -80°C | -80°C |

Abbreviations: DCIS: ductal carcinoma in situ, n.a.: not applicable, temp: temperature. * Pathologically
determined stage.
The sample set of Laronga et al. (15) contained considerably more participants than
cohort A of the current sample set (Table 2). Although the median age
of lymph node positives and -negatives differed between both sample sets, participants’
age was not significantly different between lymph node positives and -negatives in
cohort A (independent samples T-test; p = 0.06). Both lymph node positive and
-negative breast cancer patients were sampled in the same time period. The samples in
cohort A were stored at a different temperature (-20°C) than the samples analysed by
Laronga et al. (15) (-80°C).
Table 2
Characteristics of study participants (lymph node positive vs. lymph node negative) and study
samples, in comparison with those reported by Laronga et al. (15).
| Characteristic | LN neg, Cohort A | LN neg, Cohort B | LN neg, Laronga | LN pos, Cohort A | LN pos, Cohort B | LN pos, Laronga |
|---|---|---|---|---|---|---|
| N | 11 | 10 | 71 | 36 | 9 | 27 |
| Mean age [range] (yrs) | 68.4 [43-86] | 57.6 [40-71] | 56 | 60.0 [34-86] | 62.8 [27-80] | 58 |
| Diagnosis*: IDC | 11 | 9 | 70 | 30 | 8 | 26 |
| Diagnosis*: ILC | - | 1 | 1 | 6 | 1 | 1 |
| Sampling period | Oct-’03 to Dec-’04 | Apr-’05 to Jan-’06 | Dec-’01 to May-’02 | Aug-’03 to Dec-’04 | May-’05 to Sept-’05 | Dec-’01 to May-’02 |
| Storage temp | -20°C | -20°C / -80°C | -80°C | -20°C | -20°C / -80°C | -80°C |

Abbreviations: IDC: invasive ductal carcinoma, ILC: invasive lobular carcinoma, LN neg: lymph node
negative, LN pos: lymph node positive, temp: temperature. * Pathologically determined diagnosis.
Serum protein profiling
Differentiation of breast cancer and normal women; Q10 data
Following normalisation of the Q10 data, the spectra of two breast cancer patients and
three normal women were excluded due to a normalisation factor > 2, leaving 45
spectra of breast cancer patients and 45 spectra of normal women for further analysis.
In search for the peaks identified by Vlahou et al. (17) (2.95 kDa, 3.68 kDa, and 4.27
kDa), we manually identified three peak clusters at m/z 2935, m/z 3689 and m/z 4295
on this array type. The mean expression of these peaks was, however, not significantly
different between breast cancer patients and normal women (Mann-Whitney U test; p >
0.5). Moreover, whereas the 3.68 kDa marker was downregulated in breast cancer in
the sample set of Vlahou et al. (17), this marker was found upregulated in breast cancer
in the current sample set, and vice versa for the 4.27 kDa marker (Table 3). With the
data of these three peak clusters solely, the Biomarker Patterns Software constructed a
decision tree with a cross validated sensitivity and specificity of 73.3% and 24.4%,
respectively (Figure 1A).
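The exclusion rule applied at the start of this analysis (spectra with a normalisation factor > 2 are dropped, spectra with a factor between 0.5 and 2 are kept) can be sketched as below. The definition of the factor as the mean total ion current divided by the total ion current of the individual spectrum is an assumption made for this illustration, as are the function names and the example values.

```python
def normalisation_factors(total_ion_currents):
    """Map spectrum id -> factor, assuming factor = mean TIC / spectrum TIC."""
    mean_tic = sum(total_ion_currents.values()) / len(total_ion_currents)
    return {sid: mean_tic / tic for sid, tic in total_ion_currents.items()}

def split_by_factor(factors, low=0.5, high=2.0):
    """Separate spectra whose factor lies within [low, high] from the rest."""
    kept = {sid: f for sid, f in factors.items() if low <= f <= high}
    excluded = {sid: f for sid, f in factors.items() if sid not in kept}
    return kept, excluded

# Hypothetical total ion currents for three spectra.
example_tic = {"spectrum_1": 100.0, "spectrum_2": 120.0, "spectrum_3": 30.0}
kept, excluded = split_by_factor(normalisation_factors(example_tic))
```

In this toy example only the third spectrum, whose signal is far weaker than the average, would be excluded.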
Over the entire length of spectra, the Biomarker Wizard detected a total of 145 peak
clusters. For 10 peak clusters, mean peak intensities were significantly different between
breast cancer patients and normal women (data not shown). Using all peak cluster data,
the Biomarker Patterns Software constructed an optimal decision tree consisting of
three nodes (m/z 6307, m/z 5073 and m/z 8238), with a cross validated sensitivity and
specificity of 51.1% and 57.8%, respectively (Table 4).
Table 3
Summary of the markers identified by Vlahou et al. (17) and their presence and performance in the
validation sample set of the current study.
| Vlahou et al.: Array | Peak (kDa) | Intensity cut off | ↑ or ↓ in BC | Current study: Array | Peak (m/z) | Ave BC peak intensity | Ave N peak intensity | ↑ or ↓ in BC |
|---|---|---|---|---|---|---|---|---|
| SAX | 2.95 | ≤ 0.841 | Down | Q10 | 2935 | 2.309 | 2.221 | Down |
| SAX | 3.68 | ≤ 0.217 | Down | Q10 | 3689 | 2.950 | 2.826 | Up |
| SAX | 4.27 | ≤ 2.699 | Up | Q10 | 4295 | 1.744 | 1.771 | Down |
| IMAC | 2.95 | ≤ 0.751 | Down | IMAC30 | 2961 | 1.416 | 1.301 | Up |
| IMAC | 3.94 | ≤ 1.192 | Up | IMAC30 | 3964 | 25.787 | 29.548 | Down |
| IMAC | 3.97 | ≤ 8.901 | Down | IMAC30 | 3979 | 16.161 | 17.343 | Down |
| SAX | 4.03 | ≤ 0.658 | Down | Q10 | 4022 | 2.912 | 2.782 | Up |

Abbreviations: BC: breast cancer, N: normal.
Differentiation of breast cancer and normal women; Q10 & IMAC30 data
Upon normalisation, all IMAC30 spectra had a normalisation factor between 0.5 and 2,
and thus, none were excluded from further analysis. We manually identified all
discriminative peaks discovered by Vlahou et al. (17) (2.95 kDa, 3.94 kDa, and 3.97 kDa)
at m/z 2961, m/z 3964, and m/z 3979. Peak intensity data were subsequently generated
by the Biomarker Wizard. In our dataset, none of the three peaks was significantly
different in mean peak expression between breast cancer patients and normal women
(Mann-Whitney U-test; p > 0.09). Moreover, except for the 3.97 kDa peak, the
regulation of protein expression in breast cancer patients in the current dataset was
opposite to that observed in the dataset of Vlahou et al. (17) (Table 3).
The discriminative 4.03 kDa peak discovered on the SAX array type by Vlahou et al.
(17) was manually detected at m/z 4022 in the current Q10 dataset. Intensity data for
this peak were generated by the Biomarker Wizard. In our dataset, mean m/z 4022 peak
intensity did not differ significantly between spectra of breast cancer patients and
normal women (Mann-Whitney U test; p > 0.5). Moreover, in the current dataset, the
peak at m/z 4022 was upregulated in breast cancer, opposite to the downregulation
observed in the dataset of Vlahou et al. (17) (Table 3).
Peak intensity data of the IMAC30 clusters (m/z 2961, m/z 3964, and m/z 3979) and the
Q10 cluster (m/z 4022) were merged in one file, and subsequently analysed by the
Biomarker Patterns Software. The optimum decision tree consisted of 3 nodes (Figure
1B), with a cross validated sensitivity and specificity of 51.1% and 44.4%, respectively.
Using the software settings applied by Vlahou et al. (17), the Biomarker Wizard
detected a total number of 82 peaks in the entire length of IMAC30 spectra, 8 of which
significantly differed in peak expression between spectra of breast cancer patients and
normal women (data not shown). Following combination of peak intensity data from
both the Q10 and IMAC30 array type in one file, the Biomarker Patterns Software
constructed an optimal decision tree consisting of 6 decision nodes, each containing a
different peak. This tree had a sensitivity and specificity of 48.9% and 57.8%,
respectively, as determined by cross validation (Table 4).
Figure 1
Decision trees for the classification of breast cancer (BC) and normal women (N).
A. based on Q10 data solely, and B. based on both Q10 data and IMAC30 data. The decision trees were
constructed on the three (A) and four (B) discriminative protein peaks identified by Vlahou et al. (17), as
manually detected in cohort A of the current dataset. Spectra that follow the decision rules depicted in each
node will proceed to the left descendant node and vice versa.
1A
Node: m/z 2935 ≤ 0.607 (BC: n = 45; N: n = 45)
    Terminal node: Cancer (BC: n = 4; N: n = 0)
    Node: m/z 3689 ≤ 1.857 (BC: n = 41; N: n = 45)
        Terminal node: Normal (BC: n = 1; N: n = 10)
        Terminal node: Cancer (BC: n = 40; N: n = 35)
1B
Node: m/z 3964 ≤ 29.589 (BC: n = 45; N: n = 45)
    Node: m/z 3964 ≤ 7.595 (BC: n = 25; N: n = 15)
        Terminal node: Normal (BC: n = 2; N: n = 5)
        Terminal node: Cancer (BC: n = 23; N: n = 10)
    Node: m/z 4022 ≤ 3.447 (BC: n = 20; N: n = 30)
        Terminal node: Normal (BC: n = 14; N: n = 28)
        Terminal node: Cancer (BC: n = 6; N: n = 2)
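The terminal-node counts shown in Figure 1 are sufficient to recompute the sensitivity and specificity obtained on the learning dataset. The sketch below does so under the assumption that every spectrum ending in a "Cancer" node is classified as breast cancer and every spectrum ending in a "Normal" node as normal; the function and variable names are illustrative only.

```python
def sens_spec(terminal_nodes):
    """Return (sensitivity %, specificity %) from terminal-node counts.

    terminal_nodes: list of (predicted_label, n_breast_cancer, n_normal).
    """
    tp = sum(bc for label, bc, _ in terminal_nodes if label == "Cancer")
    fn = sum(bc for label, bc, _ in terminal_nodes if label == "Normal")
    tn = sum(n for label, _, n in terminal_nodes if label == "Normal")
    fp = sum(n for label, _, n in terminal_nodes if label == "Cancer")
    return 100 * tp / (tp + fn), 100 * tn / (tn + fp)

# Terminal-node counts as given in Figure 1.
tree_1a = [("Cancer", 4, 0), ("Normal", 1, 10), ("Cancer", 40, 35)]
tree_1b = [("Normal", 2, 5), ("Cancer", 23, 10),
           ("Normal", 14, 28), ("Cancer", 6, 2)]

sens_1a, spec_1a = sens_spec(tree_1a)  # about 97.8 and 22.2
sens_1b, spec_1b = sens_spec(tree_1b)  # about 64.4 and 73.3
```

These figures describe the learning dataset only; the cross validated values quoted in the text are lower and cannot be derived from the node counts.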
Differentiation of lymph node positive and -negative breast cancer patients; Q10 &
IMAC30 data
Following normalisation, all IMAC30 spectra had a normalisation factor between 0.5
and 2. Thus, none were excluded from further analysis. A total of 114 clusters were
detected on the IMAC30 array type, 7 of which differed significantly in peak expression
between lymph node positive and negative breast cancer patients (data not shown). All
three discriminating peaks discovered by Laronga et al. (15) on the IMAC surface (1437,
1349, and 1003 Da) were recovered in the current IMAC30 dataset. The mean intensity of
none of these peaks differed significantly between lymph node positive and
-negative breast cancer patients in the current dataset (Mann-Whitney U test; p > 0.1)
(Table 5).
Following normalisation of the Q10 data, 3 spectra (2 from lymph node positive and 1
from lymph node negative breast cancer patients) had a normalisation factor > 2. These
spectra were excluded from further analysis. In total, 18 peak clusters were recovered
from the current dataset, with one peak cluster (m/z 43,462) showing a significant
difference in mean expression between lymph node positive and -negative breast cancer
patients (data not shown). None of the discriminating peaks discovered by Laronga et al.
(15) were, however, detected in the current dataset.
Table 4
Comparison of decision trees constructed in the study of Vlahou et al. (17), Laronga et al. (15), and
in the current study.
| Fig | Study | Distinction | Array | Inclusion of | Classifiers | Cut off | Sens (learning / cross validation) | Spec (learning / cross validation) |
|---|---|---|---|---|---|---|---|---|
| - | Vlahou | BC vs. N | SAX | All BMW detected peaks | 4.27 kDa; 3.68 kDa; 2.95 kDa | ≤ 2.699; ≤ 0.217; ≤ 0.841 | 82.2 / 80.0 | 85.1 / 78.7 |
| 1A | Current | BC vs. N | Q10 | Manually detected peaks | m/z 2935; m/z 3689 | ≤ 0.607; ≤ 1.857 | 97.8 / 73.3 | 22.2 / 24.4 |
| - | Current | BC vs. N | Q10 | All BMW detected peaks | m/z 6307; m/z 5073; m/z 8238 | ≤ 1.866; ≤ 2.544; ≤ 1.493 | 71.1 / 51.1 | 88.9 / 57.8 |
| - | Vlahou | BC vs. N | SAX & IMAC | All BMW detected peaks | 3.94 kDa; 3.97 kDa; 4.03 kDa; 2.95 kDa | ≤ 1.192; ≤ 8.901; ≤ 0.658; ≤ 0.751 | 90.0 / 90.0 | 96.7 / 93.3 |
| 1B | Current | BC vs. N | Q10 & IMAC30 | Manually detected peaks | m/z 3964; m/z 4022; m/z 3964 | ≤ 29.59; ≤ 3.447; ≤ 7.595 | 64.4 / 51.1 | 73.3 / 44.4 |
| - | Current | BC vs. N | Q10 & IMAC30 | All BMW detected peaks | m/z 6307; m/z 5073; m/z 23425; m/z 8238; m/z 53987; m/z 18743 | ≤ 1.866; ≤ 2.544; ≤ 0.207; ≤ 1.493; ≤ 0.064; ≤ 0.321 | 95.6 / 48.9 | 88.9 / 57.8 |
| - | Laronga | LN pos vs. LN neg | SAX & IMAC | All BMW detected peaks | 74144 Da; 59065 Da; 40277 Da; 1437 Da; 1349 Da; 1003 Da | ≤ 0.014; ≤ 0.010; ≤ 1.444; ≤ 0.446 / 0.537; ≤ 1.367 / 2.014; ≤ 16.90 | 100 / 81.0 | 87.3 / 77.0 |
| - | Current | LN pos vs. LN neg | Q10 & IMAC30 | All BMW detected peaks | m/z 1276; m/z 96409 | ≤ 1.249; ≤ 0.018 | 70.6 / 50.0 | 100 / 30.0 |

Abbreviations: BC: breast cancer, BMW: Biomarker Wizard software, LN neg: lymph node negative, LN pos:
lymph node positive, N: normal, Sens: sensitivity (%), Spec: specificity (%), both given as the value obtained
with the learning dataset followed by the value obtained by cross validation.
Peak intensity data of the IMAC30 and Q10 array types were merged in one file and
submitted for analysis by the Biomarker Patterns Software. None of the trees
constructed had a satisfactory performance, since the cross validated sensitivity and
specificity of the optimum tree did not exceed 50% (Table 4).
Table 5
Summary of the markers identified by Laronga et al. (15) and their presence and performance in
the validation sample set of the present study.
| Laronga et al.: Array | Peak (Da) | Intensity cut off | ↑ or ↓ in LN+ | Current study: Array | Peak (m/z) | Ave LN- peak intensity | Ave LN+ peak intensity | ↑ or ↓ in LN+ |
|---|---|---|---|---|---|---|---|---|
| SAX | 74144 | ≤ 0.014 | Up | Q10 | n.d. | - | - | - |
| SAX | 59065 | ≤ 0.010 | Down | Q10 | n.d. | - | - | - |
| SAX | 40277 | ≤ 1.444 | Up | Q10 | n.d. | - | - | - |
| IMAC | 1437 | ≤ 0.446; ≤ 0.537 | Down; Down | IMAC30 | 1455 | 1.356 | 1.135 | Down |
| IMAC | 1349 | ≤ 1.367; ≤ 2.014 | Down; Up | IMAC30 | 1352 | 0.717 | 0.421 | Down |
| IMAC | 1003 | ≤ 16.90 | Down | IMAC30 | 1004 | 14.354 | 12.499 | Down |

Abbreviations: LN-: lymph node negative, LN+: lymph node positive, n.d.: not detected.
Influence of sample storage temperature and -time on biomarker expression
All discriminating peaks reported by Vlahou et al. (17) on both the IMAC and SAX
surface were recovered from the spectra in our -80°C / -20°C dataset. Of markers
discovered by Laronga et al. (15), only those reported on the IMAC surface were
recovered from our -80°C and -20°C spectra.
Peak intensities of recovered markers were not significantly different between samples
stored at -80°C and -20°C (Mann-Whitney U test; p > 0.05), except for two peaks
discovered by Vlahou et al. (17) and recovered on the Q10 surface (Mann-Whitney U
test; m/z 4022, p = 0.026; m/z 4295, p = 0.040). However, 95% confidence intervals (CI)
of mean intensities of these two peaks overlapped between -20°C and -80°C spectra
(mean (95% CI); m/z 4022 (-20°C): 3.95 (3.37-4.53); m/z 4022 (-80°C): 4.64 (4.29-5.00);
m/z 4295 (-20°C): 2.45 (2.26-2.65); m/z 4295 (-80°C): 2.80 (2.62-2.97)). The peak intensity
distributions of m/z 4022 and m/z 4295 in -80°C and -20°C spectra are presented in
Figure 2, along with the intensities of three other representative peaks (m/z 3689 (Q10;
Vlahou et al. (17)), m/z 2961 (IMAC30; Vlahou et al. (17)), and m/z 1455 (IMAC30;
Laronga et al. (15))).
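The overlap of the 95% confidence intervals quoted above can be verified with a simple interval comparison. The following sketch uses the reported intervals; the helper name is illustrative. Note that overlap of confidence intervals is a descriptive check and is not equivalent to a formal significance test.

```python
def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

# 95% CIs of mean peak intensity as reported in the text.
reported_ci = {
    "m/z 4022": {"-20C": (3.37, 4.53), "-80C": (4.29, 5.00)},
    "m/z 4295": {"-20C": (2.26, 2.65), "-80C": (2.62, 2.97)},
}

overlap = {
    peak: intervals_overlap(ci["-20C"], ci["-80C"])
    for peak, ci in reported_ci.items()
}
```

For both peaks the lower bound of the -80°C interval lies below the upper bound of the -20°C interval, so the intervals overlap.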
Of the peaks reported by Laronga et al. (15), and recovered in our -80°C dataset, none
differed significantly in mean peak intensity between lymph node positive and
-negative breast cancer sera (Mann-Whitney U test; p > 0.05).
The intensities of all recovered peaks in the current dataset were plotted against their
respective sample storage times. Visual inspection of these plots revealed a random
distribution of peak intensities across sample age. Figure 3 presents three plots,
representative for all peaks recovered in cohort A.
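The assessment above relied on visual inspection of the plots. As a hypothetical complement, which was not part of this study, a rank correlation between sample age and peak intensity could quantify any monotonic trend. The sketch below computes Spearman's rho with midranks; the function names and the example data are assumptions made for illustration.

```python
import math

def midranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation (Pearson correlation of the midranks).

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample ages (days) and peak intensities.
sample_age = [120, 250, 400, 610, 780]
intensity = [2.1, 3.4, 1.9, 2.8, 2.5]
rho = spearman_rho(sample_age, intensity)
```

A value of rho close to zero would be consistent with the random distribution described above, whereas values near 1 or -1 would indicate a systematic drift with storage time.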
Figure 2
Peak intensity distributions of the peaks at m/z 3689, m/z 4022, m/z 4295 (Q10), m/z 2961
(IMAC30) (all four reported by Vlahou et al. (17)), and m/z 1455 (IMAC30; reported by Laronga et
al. (15)) (LN-: lymph node negative, LN+: lymph node positive) in cohort B of the current dataset.
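[Figure 2 plot: peak intensity (y-axis, 0 to 7) at storage temperatures of -20°C and -80°C for the peaks at
m/z 3689, m/z 4022, m/z 4295, m/z 2961, and m/z 1455; the m/z 1455 data are shown by lymph node status
(LN-, LN+).]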
Discussion
Differentiation of breast cancer and normal
Breast cancer was the most commonly diagnosed cancer among women in the USA in
2004. (20) While early detection of breast cancer can lead to improved clinical
outcomes (21;22), 34% of breast cancer patients in the USA are diagnosed in a late stage
(20). Currently used serum tumour markers, such as CA15.3, lack adequate sensitivity
(23%) and specificity (69%) to be applicable in cancer detection (23), and are therefore
recommended only for use as markers for monitoring therapy or recurrence (24). Even
with mammography, being the most widely applied imaging test today, approximately
20% of breast cancers will remain undetected (25). Therefore, new, robust and valid
serum biomarkers that can be applied in the (early) detection of breast cancer are
urgently needed.
In search for these biomarkers, Vlahou et al. (17) reported the application of a number
of peaks, detected by SELDI-TOF MS, in the differentiation between sera obtained from
breast cancer patients and normal women. Although all reported classifiers were
recovered from the current dataset, the identified peak mass-to-charge ratios differed
slightly between the datasets of Vlahou and Laronga and the current dataset. Mass shifts
between spectra can originate in the low resolution of the SELDI-(linear)TOF MS
(26), in its mass accuracy of 0.1%, or in the different calibrations that were applied to
both datasets (10;27). Finally, since none of the peaks discovered were structurally
identified, it cannot be excluded that corresponding peak masses in both studies
represent different proteins / peptides, or result from post-translational
modifications or slight differences in sample storage and processing, thus explaining the
observed mass shift.
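The size of the mass shifts discussed above can be expressed relative to the reported peak mass. The sketch below uses the peak values listed in Tables 3 and 5, and assumes singly charged ions, so that m/z can be compared directly with mass in Da. The masses reported by Vlahou et al. are rounded to 0.01 kDa, so the resulting percentages are approximate. The function name is illustrative.

```python
def relative_shift_percent(reported_da, observed_mz):
    """Absolute mass difference as a percentage of the reported mass."""
    return 100 * abs(observed_mz - reported_da) / reported_da

# (reported mass in Da, recovered m/z in the current study)
peak_pairs = {
    "Vlahou 2.95 kDa (SAX) vs. Q10": (2950, 2935),
    "Vlahou 3.68 kDa (SAX) vs. Q10": (3680, 3689),
    "Vlahou 4.27 kDa (SAX) vs. Q10": (4270, 4295),
    "Laronga 1437 Da (IMAC) vs. IMAC30": (1437, 1455),
    "Laronga 1349 Da (IMAC) vs. IMAC30": (1349, 1352),
    "Laronga 1003 Da (IMAC) vs. IMAC30": (1003, 1004),
}

shifts = {
    name: relative_shift_percent(reported, observed)
    for name, (reported, observed) in peak_pairs.items()
}
```

Several of these shifts (for example about 1.25% for the 1437 Da peak) exceed the stated mass accuracy of 0.1%, which is in line with the caution expressed above that corresponding peaks need not represent the same protein or peptide.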
Figure 3 Sample age vs. intensity of the peaks at A. m/z 3689 (Q10), B. m/z 2961 (IMAC30) (both reported
by Vlahou et al.(17)), and C. m/z 1455 (IMAC30; reported by Laronga et al. (15)) in cohort A of the
current dataset.
[Figure 3 plots: peak intensity (y-axis) against sample age in days (x-axis, 0 to 800) for m/z 3689
(groups: BC, N), m/z 2961 (groups: BC, N), and m/z 1455 (groups: LN-, LN+).]
The promising recovery of classifiers in our own dataset did, however, not develop
further into a satisfactory discrimination between breast cancer and normal by these
classifiers. Although SELDI-TOF MS peak intensity data are known to be affected by
the different sample collection methods and instrument settings applied in these studies
(28;29), peak expression in both datasets was not only different, but even reversed for a
number of classifiers (i.e., up- vs. downregulated in breast cancer patients). Moreover,
mean peak intensities of the classifiers recovered in our dataset were not significantly
different between both groups. An exact parallel between the peak expression
differences in both studies is, however, difficult to draw. Peak expression differences in
the total study population (as determined in the current study) do not necessarily
correspond to peak expression differences observed in a subgroup of the total
population (as deduced from the decision trees reported by Vlahou et al. (17)). Still,
the peak expression differences of the first markers applied in the reported decision
trees (4.27 kDa and 3.94 kDa), to which this caveat does not apply, were reversed between
both datasets. As a result, only decision trees with suboptimal performance could be
constructed, and the validity of the classifiers could not be ascertained with our own dataset.
Differentiation of lymph node positive and -negative
The study performed by Laronga et al. (15) had multiple aims, one of which was the
identification of serum biomarkers specific for lymph node involvement in breast
cancer patients. Currently, lymph node status is determined by sentinel lymph node
biopsy. Although false negative rates of 33% have been reported, sentinel lymph node
detection has an overall accuracy of ≥ 96% (30). The procedure has, however, a highly
operator dependent character, and in the event of multicentric tumours and lymph
nodes with high tumour burden, accuracy can be limited (31). Thus, serum biomarkers
for lymph node status, provided their performance is satisfactory, can reduce morbidity
associated with biopsy and aid in determining whether dissection of axillary lymph
nodes is required.
In search for these markers, Laronga et al. (15) reported an eight-node decision tree,
constructed out of six features (three of each ProteinChip array type applied), to
differentiate between lymph node positive and -negative breast cancer patients with a
sensitivity and specificity of 81% and 77%, respectively, as determined by cross
validation. With the exception of the 1003 Da peak, intensity cut off values of all peaks
used in the discrimination between lymph node positive and -negative breast cancer
patients are quite low. As demonstrated by Semmes et al. (10), interlaboratory
agreement on peak m/z values is more difficult to achieve when peak intensities
decrease. This finding is also reflected by our own observation, since none of the low
intensity peaks discovered on the SAX surface were recovered in our dataset. The 1003
Da classifier, reported on the IMAC surface by Laronga et al. (15), had the highest
intensity cut off value and was indeed recovered in our own dataset. This peak,
however, is located in the very low m/z region of the spectrum, where the matrix noise
contribution to the baseline signal is largest (27). Hence, this peak most likely does not
represent a functional peptide, but might instead be an adduct or artefact of the
energy-absorbing molecules or other chemical contaminants. As such, the biological validity of
the peak is open to question (1). Structural identification clearly is a prerequisite to
unequivocally determine whether an alleged biomarker is biologically valid (32).
Data analysis
Both Vlahou et al. (17) and Laronga et al. (15) applied the same bioinformatics software
as used in the current study, i.e., the Biomarker Patterns software. This software
package, applied in pattern recognition and sample classification, constructs decision
trees by means of forward selection. A drawback of this method is that each successive
split (‘node’) is less well-founded statistically, since sample size concomitantly decreases
with an increasing number of decision nodes (33). Thus, with each successive decision
rule, the tree becomes more strongly fitted to the training dataset, thereby reducing the
likelihood of generalisation to unseen (test) data. This overfitting of data is more likely
to occur when a large number of possible classifiers is applied in the construction of a
multivariable model, such as a decision tree, as is often the case in proteomic studies (8).
Overfitting of a model can unequivocally be detected by cross validation. While the
error rate in the training set tends to decrease with an increasing number of classifiers,
the error rate in the test set (as determined by cross validation) will increase. Thus, the
performance of decision trees that do not suffer from overfitting will be similar during
training and cross validation. Cross validation of the decision tree reported by Laronga
et al. (15) yielded a sensitivity and specificity of 81% and 77%, respectively, while
during training, a sensitivity and specificity of 100% and 87.3%, respectively, were
achieved, indicating probable overfitting. Classifiers applied in overfitted trees are
seldom robust, since they often represent peculiarities of the data set used for tree
construction (33), providing a possible explanation for our inability to recover part of
the classifiers detected by Laronga et al. (15).
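The comparison between training and cross validated performance described above can be made explicit by computing the drop in sensitivity and specificity for each tree in Table 4. The sketch below uses values from that table; the function name and the use of a simple difference as an indicator of overfitting are illustrative choices, not a method taken from the cited studies.

```python
def performance_drop(training, cross_validated):
    """Return (sensitivity drop, specificity drop) in percentage points."""
    return tuple(round(t - c, 1) for t, c in zip(training, cross_validated))

# (sensitivity %, specificity %) as listed in Table 4.
trees = {
    "Laronga, SAX & IMAC": {"training": (100.0, 87.3), "cv": (81.0, 77.0)},
    "Vlahou, SAX": {"training": (82.2, 85.1), "cv": (80.0, 78.7)},
    "Current, Q10 (Figure 1A)": {"training": (97.8, 22.2), "cv": (73.3, 24.4)},
}

drops = {
    name: performance_drop(values["training"], values["cv"])
    for name, values in trees.items()
}
```

For the tree of Laronga et al. this gives a drop of 19.0 percentage points in sensitivity and 10.3 in specificity, the gap that is interpreted above as probable overfitting.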
Furthermore, for data analysis procedures, Laronga et al. (15) refer to the publication of
Vlahou et al. (17), in which spectra were normalised for total ion current in the 1.5-200
kDa mass range. The biomarkers reported by Laronga et al. (15) on the IMAC surface
are, however, all < 1.5 kDa in mass. Since the application of a normalisation factor is
only valid in the mass range employed during computation, intensities of the < 1.5 kDa
peaks applied by Laronga et al. (15) suffer from faulty normalisation. This may well
provide an explanation for our inability to validate the markers we recovered from the
IMAC30 surface.
Sample storage temperature and -time
Recovery of the SAX markers detected by Laronga et al. (15) in our validation sample
set should be achieved when markers are robust, even though both sample handling
and assay procedures presumably were not completely identical between studies.
However, of the markers that were proven to be robust, validity could not be
ascertained by our sample set. Our inability to recover and validate (part of) the
reported markers could result from the difference in storage temperature between the
sample sets analysed by Vlahou et al. (17), Laronga et al. (15) (both at -80°C), and the
sample set analysed in the current study (-20°C). To investigate dependence of peak
expression on storage temperature, we analysed an additional serum sample set from
primary breast cancer patients, with identical samples stored at both -80°C and -20°C.
Regarding the markers reported by Laronga et al. (15), only markers detected on the
IMAC30 surface were recovered from our -80°C / -20°C dataset. Peak intensities of
recovered markers were not significantly different between -80°C and -20°C spectra.
None of the markers reported on the IMAC surface and recovered in our -80°C sample
set was able to differentiate between lymph node positive and -negative breast cancer
patients. Thus, expression of these three markers does not seem to be influenced by
sample storage temperature (-20°C vs. -80°C).
Regarding the markers reported by Vlahou et al. (17), all were recovered from the -80°C
/ -20°C sample set. No significant difference in peak intensities of recovered markers
was observed between -80°C and -20°C spectra, except for m/z 4022 and m/z 4295.
However, as the 95% CI of the mean peak intensities overlapped between the -80°C and
-20°C spectra, expression of these peaks is most likely to be independent of storage
temperature. Yet, since our -80°C sample set consisted solely of breast cancer sera,
decisive conclusions with respect to influence of storage temperature on marker
expression in sera of normal women, and thus on marker performance, could not be
drawn.
Although the structural identity of reported markers has not been elucidated, we
hypothesize these markers to represent fragments of host response proteins, originating
from protease (or protease inhibitor) activity, since these fragments frequently have
been identified as potential biomarkers in the proteomic biomarker studies published to
date (34-36). Assuming this hypothesis is correct, then protease (inhibitor) activity in
breast cancer sera is most likely not affected by storage temperature, since peak
intensities were similar in both -80°C and -20°C spectra under the tested conditions. We
therefore hypothesize that protease (inhibitor) activity in normal control sera is not
influenced by sample storage temperature either and that, as a consequence, sample storage
temperature most likely does not influence biomarker performance in this study.
Another parameter that can influence the expression of reported peaks is storage time.
When plotting peak intensity vs. sample storage time, a random distribution of peak
intensity across sample age was observed for all peaks. Thus, the expression of reported
biomarkers, recovered in our dataset, is most likely not influenced by storage time.
Evidently, further research is warranted to draw definitive conclusions.
Differentiation by other biomarkers
Satisfactory differentiation between sera of either breast cancer patients and normal
women or lymph node positive and -negative breast cancer patients could not be
achieved by any other (pattern of) features in the protein profiles generated on either
IMAC30 or Q10 surfaces. The current approach of using unfractionated sera clearly
lacks sensitivity to differentiate between these populations. Different pre-fractionation
techniques may, however, provide a more in-depth view of the low-abundant proteome in particular (37;38), thereby offering a possible means of differentiation.
Moreover, since breast cancer is a highly heterogeneous disease, study patients may fall
into a wide variety of different subgroups. Then, sample size will likely become too
small to detect differences between subclasses of cancer or between cancer and normal,
and detection of specific biomarkers may be hampered. Increasing sample size can
provide a solution.
Conclusion
In this study, both robustness and validity of the breast cancer biomarkers detected by
Vlahou et al. (17) and Laronga et al. (15) were assessed. Following analysis of a different
set of breast cancer and normal control sera in a different laboratory by meticulously
using the reported assays, all biomarkers reported by Vlahou et al. (17) were recovered
in our dataset. However, none of these biomarkers, applied either alone or in
combination with each other, could satisfactorily differentiate between breast cancer
sera and normal sera. Thus, although the robustness of these biomarkers in our dataset was
proven, their validity could not be confirmed. The biomarkers Laronga et al. (15) reported to be
specific for lymph node involvement in breast cancer patients were partially recovered
from our dataset. Since no satisfactory differentiation between lymph node positive and
-negative breast cancer sera could be achieved using the recovered markers, validity of
these biomarkers could not be ascertained by analysis of our small sample set.
In conclusion, this study demonstrates that, although results reported by both Vlahou et
al. (17) and Laronga et al. (15) were promising, the validity of the serum biomarkers
discovered therein could not be ascertained by analysis of our independent sample set.
Although structural identification of a potential biomarker is no absolute prerequisite in
validation, it can not only help to resolve problems with robustness, but also establish the
marker's biological validity. Our findings stress that, when reporting on potential biomarkers,
confirmation of both robustness and validity is pivotal in obtaining their clinical
applicability, and structural identification of a potential biomarker, prior to its
validation, is strongly recommended.
References
(1) Adam BL, Qu Y, Davis JW, Ward MD, Clements MA, Cazares LH et al. Serum protein fingerprinting
coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate
hyperplasia and healthy men. Cancer Res 2002; 62(13):3609-3614.
(2) Woong-Shick A, Sung-Pil P, Su-Mi B, Joon-Mo L, Sung-Eun N, Gye-Hyun N et al. Identification of
hemoglobin-alpha and -beta subunits as potential serum biomarkers for the diagnosis and prognosis of
ovarian cancer. Cancer Sci 2005; 96(3):197-201.
(3) Zhang Z, Bast RC, Jr., Yu Y, Li J, Sokoll LJ, Rai AJ et al. Three biomarkers identified from serum
proteomic analysis for the detection of early stage ovarian cancer. Cancer Res 2004; 64(16):5882-5890.
(4) Xiao X, Liu D, Tang Y, Guo F, Xia L, Liu J et al. Development of proteomic patterns for detecting lung
cancer. Dis Markers 2003; 19(1):33-39.
(5) Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW. Proteomics and bioinformatics approaches for
identification of serum biomarkers to detect breast cancer. Clin Chem 2002; 48(8):1296-1304.
(6) Diamandis EP. Point: Proteomic patterns in biological fluids: do they represent the future of cancer
diagnostics? Clin Chem 2003; 49(8):1272-1275.
(7) Grizzle WE, Meleth S. Clarification in the point/counterpoint discussion related to surface-enhanced
laser desorption/ionization time-of-flight mass spectrometric identification of patients with
adenocarcinomas of the prostate. Clin Chem 2004; 50(8):1475-1476.
(8) Ransohoff DF. Lessons from controversy: ovarian cancer screening and serum proteomics. J Natl Cancer
Inst 2005; 97(4):315-319.
(9) Li J, Orlandi R, White CN, Rosenzweig J, Zhao J, Seregni E et al. Independent Validation of Candidate
Breast Cancer Serum Biomarkers Identified by Mass Spectrometry. Clin Chem 2005; 51(12):2229-2235.
(10) Semmes OJ, Feng Z, Adam BL, Banez LL, Bigbee WL, Campos D et al. Evaluation of serum protein
profiling by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for the
detection of prostate cancer: I. Assessment of platform reproducibility. Clin Chem 2005; 51(1):102-112.
(11) Grizzle WE, Adam BL, Bigbee WL, Conrads TP, Carroll C, Feng Z et al. Serum protein expression
profiling for cancer detection: validation of a SELDI-based approach for prostate cancer. Dis Markers
2003; 19(4-5):185-195.
(12) Becker S, Cazares LH, Watson P, Lynch H, Semmes OJ, Drake RR et al. Surfaced-enhanced laser
desorption/ionization time-of-flight (SELDI-TOF) differentiation of serum protein profiles of BRCA-1
and sporadic breast cancer. Ann Surg Oncol 2004; 11(10):907-914.
(13) Heike Y, Hosokawa M, Osumi S, Fujii D, Aogi K, Takigawa N et al. Identification of serum proteins
related to adverse effects induced by docetaxel infusion from protein expression profiles of serum using
SELDI ProteinChip system. Anticancer Res 2005; 25(2B):1197-1203.
(14) Hu Y, Zhang S, Yu J, Liu J, Zheng S. SELDI-TOF-MS: the proteomics and bioinformatics approaches in
the diagnosis of breast cancer. Breast 2005; 14(4):250-255.
(15) Laronga C, Becker S, Watson P, Gregory B, Cazares L, Lynch H et al. SELDI-TOF serum profiling for
prognostic and diagnostic classification of breast cancers. Dis Markers 2003; 19(4-5):229-238.
(16) Pusztai L, Gregory BW, Baggerly KA, Peng B, Koomen J, Kuerer HM et al. Pharmacoproteomic analysis
of prechemotherapy and postchemotherapy plasma samples from patients receiving neoadjuvant or
adjuvant chemotherapy for breast carcinoma. Cancer 2004; 100(9):1814-1822.
(17) Vlahou A, Laronga C, Wilson L, Gregory B, Fournier K, McGaughey D et al. A novel approach toward
development of a rapid blood test for breast cancer. Clin Breast Cancer 2003; 4(3):203-209.
(18) Goncalves A, Esterni B, Bertucci F, Sauvan R, Chabannon C, Cubizolles M et al. Postoperative serum
proteomic profiles may predict metastatic relapse in high-risk primary breast cancer patients receiving
adjuvant chemotherapy. Oncogene 2006; 25(7):981-989.
(19) Mathelin C, Cromer A, Wendling C, Tomasetto C, Rio MC. Serum biomarkers for detection of breast
cancers: a prospective study. Breast Cancer Res Treat 2005;1-8.
(20) Jemal A, Tiwari RC, Murray T, Ghafoor A, Samuels A, Ward E et al. Cancer statistics, 2004. CA Cancer J
Clin 2004; 54(1):8-29.
(21) Etzioni R, Urban N, Ramsey S, McIntosh M, Schwartz S, Reid B et al. The case for early detection. Nat
Rev Cancer 2003; 3(4):243-252.
(22) Smith RA, Cokkinides V, Eyre HJ. American Cancer Society guidelines for the early detection of cancer,
2004. CA Cancer J Clin 2004; 54(1):41-52.
(23) Stieber P, Molina R, Chan DW, Fritsche HA, Beyrau R, Bonfrer JM et al. Clinical evaluation of the
Elecsys CA 15-3 test in breast cancer patients. Clin Lab 2003; 49(1-2):15-24.
(24) Chan DW, Beveridge RA, Muss H, Fritsche HA, Hortobagyi G, Theriault R et al. Use of Truquant BR
radioimmunoassay for early detection of breast cancer recurrence in patients with stage II and stage III
disease. J Clin Oncol 1997; 15(6):2322-2328.
(25) Olsen O, Gotzsche PC. Cochrane review on screening for breast cancer with mammography. Lancet
2001; 358(9290):1340-1342.
(26) Petricoin EF, Liotta LA. SELDI-TOF-based serum proteomic pattern diagnostics for early detection of
cancer. Curr Opin Biotechnol 2004; 15(1):24-30.
(27) Baggerly KA, Morris JS, Coombes KR. Reproducibility of SELDI-TOF protein patterns in serum:
comparing datasets from different experiments. Bioinformatics 2004; 20(5):777-785.
(28) Rai AJ, Stemmer PM, Zhang Z, Adam BL, Morgan WT, Caffrey RE et al. Analysis of Human Proteome
Organization Plasma Proteome Project (HUPO PPP) reference specimens using surface enhanced laser
desorption/ionization-time of flight (SELDI-TOF) mass spectrometry: multi-institution correlation of
spectra and identification of biomarkers. Proteomics 2005; 5(13):3467-3474.
(29) Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD et al. HUPO Plasma Proteome
Project specimen collection and handling: towards the standardization of parameters for plasma
proteome samples. Proteomics 2005; 5(13):3262-3277.
(30) Scoggins CR, Chagpar AB, Martin RC, McMasters KM. Should sentinel lymph-node biopsy be used
routinely for staging melanoma and breast cancers? Nat Clin Pract Oncol 2005; 2(9):448-455.
(31) Tafra L, Lannin DR, Swanson MS, Van Eyk JJ, Verbanac KM, Chua AN et al. Multicenter trial of sentinel
node biopsy for breast cancer using both technetium sulfur colloid and isosulfan blue dye. Ann Surg
2001; 233(1):51-59.
(32) Gillette MA, Mani DR, Carr SA. Place of pattern in proteomic biomarker discovery. J Proteome Res
2005; 4(4):1143-1154.
(33) Wiemer JC, Prokudin A. Bioinformatics in proteomics: application, terminology, and pitfalls. Pathol Res
Pract 2004; 200(2):173-178.
(34) Fung ET, Yip TT, Lomas L, Wang Z, Yip C, Meng XY et al. Classification of cancer types by measuring
variants of host response proteins using SELDI serum assays. Int J Cancer 2005; 115(5):783-789.
(35) Li J, White N, Zhang Z, Rosenzweig J, Mangold LA, Partin AW et al. Detection of prostate cancer using
serum proteomics pattern in a histologically confirmed population. J Urol 2004; 171(5):1782-1787.
(36) Tolson J, Bogumil R, Brunst E, Beck H, Elsner R, Humeny A et al. Serum protein profiling by SELDI
mass spectrometry: detection of multiple variants of serum amyloid alpha in renal cancer patients. Lab
Invest 2004; 84(7):845-856.
(37) Righetti PG, Castagna A, Antonioli P, Boschetti E. Prefractionation techniques in proteome analysis: the
mining tools of the third millennium. Electrophoresis 2005; 26(2):297-319.
(38) Thulasiraman V, Lin S, Gheorghiu L, Lathrop J, Lomas L, Hammond D et al. Reduction of the
concentration difference of proteins in biological liquids using a library of combinatorial ligands.
Electrophoresis 2005; 26(18):3561-3571.
Chapter 3.3
Haptoglobin phenotype is not a predictor of recurrence free survival in high-risk primary breast cancer patients
Marie-Christine W. Gast
Harm van Tinteren
Marijke Bontenbal
René Q.G.C.M. van Hoesel
Marianne A. Nooij
Sjoerd Rodenhuis
Paul N. Span
Vivianne C.G. Tjan-Heijnen
Elisabeth G.E. de Vries
Nathan Harris
Jos W.R. Twisk
Jan H.M. Schellens
Jos H. Beijnen
BMC Cancer 2008; 8:389
Abstract
Better breast cancer prognostication may improve selection of patients for adjuvant
therapy. We conducted a retrospective follow-up study in which we investigated sera of
high-risk primary breast cancer patients, to search for proteins predictive of recurrence
free survival. Two sample sets of high-risk primary breast cancer patients participating
in a randomised national trial investigating the effectiveness of high-dose
chemotherapy were analysed. Sera in set I (n = 63) were analysed by surface-enhanced
laser desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF MS) for
biomarker discovery. Initial results were validated by analysis of sample set II (n = 371),
using one-dimensional gel-electrophoresis.
In sample set I, the expression of a peak at mass-to-charge ratio 9198 (relative intensity
≤ 20 or > 20), identified as haptoglobin (Hp) alpha-1 chain, was strongly associated with
recurrence free survival (global Log-rank test; p = 0.0014). Haptoglobin is present in
three distinct phenotypes (Hp 1-1, Hp 2-1, and Hp 2-2), of which only individuals with
phenotype Hp 1-1 or Hp 2-1 express the haptoglobin alpha-1 chain. As the expression
of the haptoglobin alpha-1 chain, determined by SELDI-TOF MS, corresponds to the
phenotype, initial results were validated by haptoglobin phenotyping of the
independent sample set II by native one-dimensional gel-electrophoresis. With the Hp
1-1 phenotype as the reference category, the univariate hazard ratio for recurrence was
0.87 (95% CI: 0.56-1.34, p = 0.5221) and 1.03 (95% CI: 0.65-1.64, p = 0.8966) for the Hp
2-1 and Hp 2-2 phenotypes, respectively, in sample set II. In contrast to our initial
results, the haptoglobin phenotype was not identified as a predictor of recurrence free
survival in high-risk primary breast cancer in our validation set. Our initial observation
in the discovery set was probably the result of a type I error (i.e., false positive). This
study illustrates the importance of validation in establishing the true clinical applicability
of a potential biomarker.
Prognostic serum protein profiles for breast cancer
Introduction
After lung cancer, breast cancer is currently the second leading cause of cancer
deaths in women (1). A substantial survival benefit is achieved by treatment with
adjuvant systemic therapy. The main prognostic factors in breast cancer include clinical
(age) and pathological parameters (tumour size, lymph node status, and grade of
malignancy), whereas the hormone-receptor and Her2/neu-receptor status are (also)
predictive factors (2). However, 30-50% of breast cancer patients will eventually
develop metastatic relapse and die, despite locoregional treatment and adjuvant
systemic chemotherapy (3), while there is a small percentage that would have survived
without adjuvant chemotherapy and hormonal therapy. Clearly, improved breast cancer
prognostication is urgently needed to more accurately predict clinical outcome in
individual patients and as such reduce both over- and undertreatment of the disease.
High-throughput genomic and transcriptomic approaches have recently been shown
to generate signatures that predict clinical outcome better than conventional prognostic
criteria. For example, investigators from our institutes have published gene expression
profiles in tumour tissue that outperformed all clinical variables in predicting disease
outcome (distant metastases) (4-7). Similarly, a RT-PCR based multigene assay was
recently shown to accurately predict both the probability of recurrence and the
magnitude of chemotherapy benefit in node-negative, oestrogen-receptor positive
breast cancer (8).
An alternative and complementary approach is to perform protein expression analysis.
As the proteome reflects gene expression as well as protein stability and posttranslational modifications, protein data could, in principle, be used for the same
purpose. One of the techniques currently applied in proteomics research of breast
cancer is surface-enhanced laser desorption/ionisation time-of-flight mass spectrometry
(SELDI-TOF MS). Until now, only two studies have been published in which this
platform was applied in the identification of serum markers for prognosis of breast
cancer (9;10). Comparing the tumour cytosolic extract of node-negative sporadic breast
tumours with or without a recurrence, Ricolleau et al. (10) identified a high level of
ubiquitin and / or a low level of ferritin light chain to be associated with a good
prognosis in breast cancer (n = 60). Goncalves et al. (9) constructed a multiprotein
model, consisting of 40 proteins, that correctly predicted relapse in 67 of the 81 patients
whose fractionated sera were investigated. These promising results need to be
interpreted cautiously, as both studies investigated only a limited number of patients,
and their results have not yet been validated by analysis of independent study
populations.
Hence, the aim of the current study is to investigate sera of high-risk primary breast
cancer patients to search for proteins predictive of recurrence free survival, and to
validate our results by analysis of an independent study population.
Materials and Methods
Study population
From 1993 to 1999, high-risk primary breast cancer patients who had undergone
modified radical mastectomy or breast conserving surgery with complete axillary
clearance participated in a randomised multicentre phase III trial. This study
investigated the benefit of high-dose adjuvant chemotherapy in patients with ≥ 4
axillary lymph node metastases. The design of the study has been described elsewhere
(11). Major eligibility criteria were histologically confirmed stage 2A, 2B or 3A breast
cancer with at least 4 tumour-positive axillary lymph nodes but no evidence of distant
metastases, age under 56 years, and no previous other malignancies.
In sample set I, sera of 63 study patients who were treated in the Netherlands Cancer
Institute were included. Sera were procured after surgery (7-51 days), but prior to
adjuvant chemotherapy (0-45 days). All sera were obtained and stored under strictly
defined conditions at the Institutional Serum Bank. In sample set II, serum / plasma
samples (procured at any time point in therapy) of 371 study patients treated in the
Netherlands Cancer Institute (sera; n = 15, plasma; n = 38), the Erasmus Medical Center
- Daniel den Hoed Cancer Center (sera; n = 114), the Radboud University Medical
Center Nijmegen (sera; n = 87), the University Medical Center Groningen (sera; n = 69),
and the University Medical Center Leiden (sera; n = 48) were included. All samples
were obtained with medical-ethics approval and all patients gave informed consent.
Chemicals
All chemicals used were obtained from Sigma, St. Louis, MO, USA, unless stated
otherwise.
Biomarker discovery
Protein profiling was performed using the ProteinChip SELDI Reader (Bio-Rad
Laboratories, Hercules, CA, USA). Several chromatographic array surfaces with suitable
binding conditions were screened for discriminative mass-to-charge ratios (m/z)
between unfractionated sera of breast cancer patients of set I either experiencing a
recurrence at a relatively short follow-up (Recurrence Free Survival (RFS) < 16 months,
n = 4), or experiencing no recurrence after a long follow-up (> 75 months, n = 4).
Optimal discrimination between both groups was obtained by Q10 arrays (strong anion
exchange chromatography) with 100 mM Tris-HCl pH 8 / 0.1% TritonX-100 as a
binding buffer. This assay was subsequently applied in the analysis of all sera in sample
set I (n = 63).
In brief, samples were thawed on ice and denatured by 1:10 dilution in 9 M urea / 2% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS) / 1% dithiothreitol (DTT). Arrays were assembled in a 96-well bioprocessor (Bio-Rad Labs), which was
placed on a platform shaker at 350 rpm at all steps of the protocol. Arrays were
equilibrated twice with 200 µl of binding buffer for 5 min. Pretreated serum samples
were diluted 1:10 in binding buffer and were randomly applied to the arrays. After a 30
min incubation, the arrays were washed twice with binding buffer and twice with 100
mM Tris-HCl pH 8 for 5 min. Following a quick rinse with deionised water (Braun,
Emmenbrücke, Germany), arrays were air-dried. A 50% solution of sinapinic acid (Bio-Rad Labs) in 50% acetonitrile (ACN) / 0.5% trifluoroacetic acid (TFA) was applied twice
(1.0 µl) to the array as matrix. Following air-drying, the arrays were analysed using the
ProteinChip SELDI (PBS IIc) Reader (Bio-Rad Labs). For mass accuracy, the instrument
was calibrated on the day of measurements with All-In-One peptide standard (Bio-Rad
Labs). Data were collected between 0 and 200 kDa, averaging 65 laser shots with
intensity 158, detector sensitivity 5, and a focus lag time of 746 ns. Spectra were
baseline subtracted and normalised to the total ion current from 1.5 to 200 kDa. The
Biomarker Wizard software package (version 3.1, Bio-Rad Labs) was applied for peak
detection. Peaks were auto-detected when occurring in at least 25% of spectra and
when having a signal-to-noise ratio of at least 5. Peak clusters were completed with
peaks with a signal-to-noise ratio of at least 2 in a cluster mass window of 0.3%.
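The spectrum processing rules above can be sketched in a few lines. This is a minimal illustration only, not the Biomarker Wizard implementation: the function names are hypothetical, the normalisation simply rescales each spectrum to a common summed intensity over 1.5 to 200 kDa, and the 0.3% cluster mass window is assumed to mean ± 0.3% around the cluster centre.

```python
from typing import List, Sequence


def normalise_to_tic(mz: Sequence[float], intensity: Sequence[float],
                     lo: float = 1500.0, hi: float = 200000.0,
                     target: float = 1.0) -> List[float]:
    """Rescale a baseline-subtracted spectrum so that the summed intensity
    between lo and hi (in Da) equals target."""
    tic = sum(i for m, i in zip(mz, intensity) if lo <= m <= hi)
    if tic <= 0:
        raise ValueError("total ion current must be positive")
    return [i * target / tic for i in intensity]


def is_auto_detected(snr_per_spectrum: Sequence[float],
                     min_snr: float = 5.0, min_fraction: float = 0.25) -> bool:
    """First pass: the peak must reach min_snr in at least min_fraction
    of all spectra."""
    if not snr_per_spectrum:
        return False
    hits = sum(1 for s in snr_per_spectrum if s >= min_snr)
    return hits >= min_fraction * len(snr_per_spectrum)


def joins_cluster(mz: float, centre: float, snr: float,
                  min_snr: float = 2.0, window: float = 0.003) -> bool:
    """Second pass: a weaker peak completes a cluster if its signal-to-noise
    ratio is at least min_snr and it lies within the (assumed +/-) mass window."""
    return snr >= min_snr and abs(mz - centre) <= window * centre
```

With these assumptions, a peak observed at m/z 9210 with a signal-to-noise ratio of 3 would be added to a cluster centred at m/z 9198, because the window spans roughly 27.6 mass units on either side.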
Biomarker characterisation
A 500 µl serum sample containing the marker of interest (i.e., m/z 9198) was
denatured in 9 M urea / 2% CHAPS / 1% DTT in 50 mM Tris-HCl pH 9. The sample
was subsequently fractionated on Q Ceramic HyperD beads with a strong anion
exchange moiety (Biosepra Inc., Marlborough, MA, USA). After binding of denatured
sample to the beads, the flow through was collected and bound proteins were
subsequently eluted with buffers of pH 9-3. The fraction containing the marker was
further purified by size fractionation, using Microcon 50 kDa MW spin concentrators
(YM50, Millipore, Billerica, MA, USA) with increasing concentrations of ACN / 0.1%
TFA. The filtrate containing the m/z 9198 marker was subsequently de-salted by
application on reversed phase RP18 beads (Varian Inc., Palo Alto, CA, USA), followed
by elution with increasing concentrations of ACN containing 0.1% TFA. The
purification process was monitored by profiling each fraction on Q10 arrays and NP20
arrays (a non-selective, silica chromatographic surface). Eluates containing the m/z 9198
marker were dried and redissolved in loading buffer for SDS-PAGE, which was
performed on Novex NuPage gels (18% Tris-Glycine gel; Invitrogen, San Diego, CA,
USA). Following Coomassie staining (Simply Blue; Invitrogen), protein bands of interest
were excised and collected. The proteins within the excised bands were eluted by
washing twice with 30% ACN / 100 mM ammonium bicarbonate, followed by
dehydration in 100% ACN. Gel bands were subsequently heated at 50°C for 5 min and
eluted with 45% formic acid / 30% ACN / 10% isopropanol under sonication for 30
min. The eluates were left overnight at room temperature and then profiled on
NP20 arrays. Eluates were subsequently dried, resuspended in 20 ng/µl trypsin
(Promega, Madison, WI, USA) in 10% ACN / 25 mM ammonium bicarbonate, followed
by incubation at room temperature for 4 h for protein digestion. For in-gel protein
digestion, gel bands were first washed with 40% methanol / 10% acetic acid twice,
followed by a 30% ACN / 100 mM ammonium bicarbonate wash. Gel bands were dried
by SpeedVac and digested for 12 h by trypsin (20 ng/µl 100 mM ammonium
bicarbonate). All tryptic digests were profiled on NP20 chips, using 1 µl 20% alpha-cyano-4-hydroxycinnamic acid solution in 50% ACN / 0.5% TFA as matrix. Peptides
in the digests were investigated with the NCBI database using the ProFound search
engine at http://prowl.rockefeller.edu/prowl-cgi/profound.exe with the following
search parameters: standard cleavage rules for trypsin, 1 missed cleavage allowed.
Confirmation of protein identity was provided by sequencing tryptic digest peptides by
quadrupole-TOF (Q-TOF) MS (Applied Biosystems / MDS Sciex, Foster City, CA, USA)
fitted with a ProteinChip Interface. Fragment ion spectra resulting from Q-TOF
analyses were used to search the SwissProt 44.2 database (Homo sapiens: 11072
sequences) using the MASCOT search engine at www.matrixscience.com (Matrix
Science Ltd., London, UK), with the following search parameters: monoisotopic
precursor mass tolerance: 40 ppm, fragment mass tolerance: 0.2 Da, variable
modifications: methionine oxidation, and trypsin cleavage site. Throughout the
identification experiments, a serum sample lacking the m/z 9198 marker was run
concurrently as a negative control.
Haptoglobin phenotyping assay
The haptoglobin (Hp) phenotype of all samples in set I and II was assessed by native
one-dimensional gel electrophoresis, followed by peroxidase staining. One µl of serum
or plasma sample was mixed with 19 µl of a 1:100 dilution of haemolysate in phosphate
buffered saline. Following incubation for 5 min at room temperature, 10 µl of 3x native
sample buffer (30 ml glycerol / 18.8 ml 1 M Tris-HCl pH 6.8 / 1.5 ml 1% (w/v)
bromophenol blue, made to 100 ml with water) was added and mixed. Samples were
then loaded onto a 3-8% gradient Tris-Acetate NuPAGE precast gel (Invitrogen,
Karlsruhe, Germany). Samples were run at a constant 150 V, gradient 18-7 mA for 3 h,
using a running buffer of 25 mM Tris / 250 mM glycine, adjusted to pH 8.6. After
staining with 1% (w/v) rhodamine 1%, the gel was incubated for 10 min in a 1:1 water-diluted leucomalachite green peroxidase-development buffer (0.2 g leucomalachite
green / 0.02 g EDTA in 25 ml 40% (v/v) acetic acid with 0.06% (v/v) H2O2). The
phenotype of each sample was subsequently determined by its specific migration
pattern, which appears as black bands in the gel (Figure 1A) (12).
Statistical analysis
Survival curves were analysed according to the Kaplan-Meier method from the date of
randomisation to the time of first recurrence or death, or the date of last follow-up. The
curves were compared by log-rank statistics. To investigate the relation of haptoglobin
phenotype and other variables with recurrence-free survival time, a Cox proportional
hazards model was used. Relations were expressed in terms of hazard ratios with 95%
confidence intervals. Possible confounding clinical variables that either have known
prognostic or predictive value (i.e., treatment (high dose vs. conventional dose
chemotherapy), age (≥ 40 yrs vs. < 40 yrs), number of positive lymph nodes (0 - 9 vs. ≥
10), tumour size (< 5 cm vs. ≥ 5 cm), Her2/neu status (negative vs. positive, of note,
patients did not yet receive adjuvant trastuzumab), receptor status (oestrogen and / or
progesterone receptor (ER/PR) positive vs. negative), and Bloom-Richardson grade
(grade I vs. grade II vs. grade III)), or variables that were related to the exposure
haptoglobin phenotype (i.e., surgery (breast conserving vs. mastectomy)) were
incorporated into the model.
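For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be sketched as follows. This is an illustrative re-implementation under simple assumptions, not the SPSS or SAS procedure used in the study: the function name is hypothetical, an event is a first recurrence or death, and patients without an event are censored at the date of last follow-up.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    times[i]  : months from randomisation to event or last follow-up
    events[i] : 1 for recurrence or death, 0 for censoring
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Patients censored at t still count as at risk at t.
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve
```

Each step multiplies the running survival probability by the fraction of at-risk patients who remain event free at that time; censored patients only reduce the number at risk for later time points.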
Figure 1
Haptoglobin phenotype assessment using a native PAGE system.
A. Specific migration pattern of Hp 1-1, Hp 2-1, and Hp 2-2, in a 3-8% gradient Tris-Acetate gel.
B. Composition of the three haptoglobin phenotypes Hp 1-1, Hp 2-1, and Hp 2-2 (adapted from (13)).
The distributions of patient characteristics over the two sample sets were compared
using either the Chi-square test or the Fisher’s exact test for categorical variables and
the Mann-Whitney U test for continuous variables. All statistical analyses were
performed using SPSS statistical software, version 13.0 (SPSS Inc., Chicago, IL, USA)
and SAS statistical software, version 9.1.3 (SAS Institute Inc., Cary, NC, USA). Statistical
tests were two sided at the 5% level of significance.
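As a worked illustration of the Chi-square comparison of a categorical characteristic between the two sample sets, the sketch below applies a Pearson test to a 2 x 2 table, using the surgery counts from Table 1 as example input. It is a hedged sketch: the function name is hypothetical, no continuity correction is applied, and whether the statistical packages used in the study applied one is not stated.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) and two-sided p-value
    for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Rows: sample set I, sample set II; columns: mastectomy, breast conserving.
chi2, p = chi_square_2x2(56, 7, 291, 80)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

For these counts the statistic is about 3.67, which does not reach the 5% level of significance for one degree of freedom.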
Results
Study population
At time of analysis, in sample set I (n = 63), 28 patients had a recurrence or had died and
35 patients were censored at a median follow-up of 6.6 years. In sample set II (n = 371),
149 patients had a recurrence or had died and 222 patients were censored at a median
follow-up of 8.0 years. Characteristics of both sample sets are provided in Table 1. All
patient characteristics were similarly distributed between sample set I and sample set II,
as determined by the Chi-square test or the Mann-Whitney U test.
Biomarker discovery
Following evaluation of several chromatographic array surfaces with suitable binding
conditions, the Q10 array with 100 mM Tris-HCl pH 8 / 0.1% TritonX-100 as a binding
buffer gave optimal results in our screening population (n = 8). Using this SELDI-TOF
MS assay, the spectra of sera from patients experiencing no recurrence (n = 4) could
clearly be distinguished from the spectra of sera from patients who recurred at a relatively
short follow-up (n = 4) by overexpression of a peak at m/z 9198. The clear dichotomous
distribution in the relative m/z 9198 peak intensity was subsequently confirmed in the
acquired mass spectra of all 63 sera in sample set I (peak intensity > 20: n = 40, ≤ 20: n =
23). Representative SELDI-TOF MS spectra are presented in Figure 2. The Kaplan-Meier
curve (Figure 3) shows a significant difference in the probability of remaining
recurrence free (Log-rank test, p = 0.0014) between high-risk primary breast cancer
patients exhibiting a peak at m/z 9198 with a relative intensity > 20 and those with a
relative intensity ≤ 20. The
univariate hazard ratio was 3.22 (95% CI: 1.51-6.85, p = 0.0024).
Biomarker characterisation
Following anion exchange fractionation, the m/z 9198 marker was eluted in the pH 5
eluate. This fraction was concentrated on YM50 spin concentrators, and the marker was
found in the water wash. De-salting of the water wash on RP18 beads resulted in
elution of the marker in the 60% ACN / 0.1% TFA eluate, which was subsequently
subjected to SDS-PAGE analysis. After staining, a clear band in the 9 kDa region was
visible, which was excised. Elution of the proteins within the excised bands was
followed by tryptic digestion of the eluate. Profiling of the gel-eluate confirmed the
presence of the marker. Peptide mapping of the tryptic digest identified the marker as
haptoglobin alpha-1 chain (estimated Z-score 1.49, 48% sequence coverage), which is
an 83 amino acid peptide with a theoretical mass of 9192.21 Da and a pI of 5.23. This
identity [SwissProt: P00738] was confirmed by amino acid sequencing of 4 peptides in
the tryptic digest by tandem MS on a Q-TOF (76% coverage, Figure 4 A).
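The theoretical mass quoted above can be reproduced from the amino acid sequence given in Figure 4. The sketch below is illustrative: it assumes the unmodified, fully reduced chain (free cysteines, no disulfide bridges), uses commonly tabulated average residue masses, and the function name is hypothetical.

```python
# Average residue masses in Da (commonly tabulated values).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water per chain for the free N- and C-termini

# Haptoglobin alpha-1 chain as listed in Figure 4 (83 residues).
HP_ALPHA1 = ("VDSGNDVTDIADDGCPKPPEIAHGYVEHSVRYQCKNYYKLRTEGDGVYTLNNEKQWINK"
             "AVGDKLPECEAVCGKPKNPANPVQ")


def average_mass(sequence: str) -> float:
    """Average mass (Da) of an unmodified linear peptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER


print(len(HP_ALPHA1), round(average_mass(HP_ALPHA1), 2))
```

Under these assumptions the calculated average mass agrees with the stated theoretical mass of 9192.21 Da to within rounding; the observed SELDI-TOF MS peak at m/z 9198 lies about 6 mass units above it.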
Table 1
Patient and tumour characteristics of sample set I and II.

                                          Sample set I        Sample set II
                                          N      (%)          N      (%)
Patient characteristics
  N                                       63                  371
  Age
    Mean [range]                          45.8 [33-55]        43.9 [26-55]
    < 40 years                            10     (16%)        94     (25%)
    ≥ 40 years                            53     (84%)        277    (75%)
  Menopausal status
    Pre                                   49     (78%)        317    (85%)
    Post                                  11     (17%)        40     (11%)
    Unknown                               3      (5%)         14     (4%)
  Surgery
    Mastectomy                            56     (89%)        291    (78%)
    Breast conserving                     7      (11%)        80     (22%)
  Treatment
    Conventional dose                     27     (43%)        158    (43%)
    High dose                             36     (57%)        213    (57%)
Tumour characteristics
  Number of positive lymph nodes
    4-9                                   40     (63%)        241    (65%)
    ≥ 10                                  23     (37%)        130    (35%)
  Tumour size
    T1 (< 2 cm)                           9      (14%)        90     (24%)
    T2 (2-5 cm)                           41     (65%)        225    (61%)
    T3 (≥ 5 cm)                           13     (21%)        56     (15%)
  Her2/neu status
    Negative                              42     (67%)        274    (74%)
    Positive                              16     (25%)        81     (22%)
    Unknown                               5      (8%)         16     (4%)
  Oestrogen / Progesterone receptor status
    ER and PR negative                    10     (16%)        101    (27%)
    ER and/or PR positive                 50     (79%)        250    (67%)
    Unknown                               3      (5%)         20     (5%)
  Bloom-Richardson grade
    Grade I                               13     (21%)        62     (17%)
    Grade II                              26     (41%)        112    (30%)
    Grade III                             20     (32%)        170    (46%)
    Unknown                               4      (4%)         27     (7%)

Haptoglobin occurs in vivo as polymers of an alpha and beta chain complex, interlinked
via disulfide bridges. There are two major alpha chains: alpha-1 (83 amino acids, 9.2
kDa) and alpha-2 (142 amino acids, 16 kDa), of which the alpha-2 chain is the product
of unequal crossing over between two alpha-1 alleles (14). Due to this genetic variation,
haptoglobin occurs in three major (pheno)types: Hp 1-1, Hp 2-1 and Hp 2-2, occurring
in 16%, 48%, and 36%, respectively, of the northwestern European population (13). The
Hp 1-1 phenotype consists of an [alpha-1 - beta] dimer (86 kDa), whereas Hp 2-1
consists of two [alpha-1 - beta] units flanking a variable length [alpha-2 - beta] polymer
(86-300 kDa). Hp 2-2, the largest species, consists of multiple repeats of an [alpha-2 - beta] unit (170-900 kDa) (Figure 1B) (15;16). Expression of the Hp alpha-1 chain, as
determined by SELDI-TOF MS, will correspond to the actual haptoglobin phenotype,
since the haptoglobin alpha-1 chain is only expressed by individuals with the Hp 1-1 or
Hp 2-1 phenotype (12). Indeed, following haptoglobin phenotype assessment by native
one-dimensional gel-electrophoresis, all patients with m/z 9198 ≤ 20 carried the Hp 2-2
phenotype (n = 23), while patients with m/z 9198 > 20 were shown to have either the
Hp 1-1 (n = 14) or Hp 2-1 (n = 26) phenotype (Figure 5).
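The correspondence between the SELDI-TOF MS read-out and the phenotype can be expressed as a simple consistency check. This is an illustrative sketch with hypothetical function names; the cut-off of 20 is the relative intensity threshold used in sample set I, and the check cannot separate Hp 1-1 from Hp 2-1, since both express the alpha-1 chain.

```python
ALPHA1_POSITIVE = {"Hp 1-1", "Hp 2-1"}  # phenotypes expressing the alpha-1 chain


def expresses_alpha1(relative_intensity: float, cutoff: float = 20.0) -> bool:
    """True if the m/z 9198 peak exceeds the cut-off used in sample set I."""
    return relative_intensity > cutoff


def is_concordant(relative_intensity: float, gel_phenotype: str) -> bool:
    """Compare the peak-based call with the phenotype read from the native gel."""
    return expresses_alpha1(relative_intensity) == (gel_phenotype in ALPHA1_POSITIVE)
```

In sample set I this check would hold for every patient: 23 sera at or below the cut-off were all Hp 2-2, and the 40 sera above it were Hp 1-1 (n = 14) or Hp 2-1 (n = 26).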
Figure 2
Representative example of serum protein profiles (sample set I) obtained with the optimised
SELDI-TOF MS assay, showing the clear dichotomous expression of the m/z 9198 peak (dotted
box).
[Spectra of eight patients (Pt 1 to Pt 8); x-axis: mass-to-charge ratio (8000-11000); y-axis: relative intensity (0-75).]
Following Kaplan-Meier analysis by haptoglobin phenotype in sample set I (n = 63), the
Hp 1-1, 2-1 and 2-2 phenotypes were shown to be associated with a good, intermediate
and poor prognosis, respectively (global Log-rank test, p = 0.0029) (Figure 6). With Hp
1-1 phenotype as the reference category, the univariate hazard ratio was 3.08 (95% CI:
0.67-14.10, p = 0.1464) for Hp 2-1, and 7.37 (95% CI: 1.69-32.23, p = 0.0079) for Hp 2-2
phenotype. In the multivariate Cox regression analysis, haptoglobin phenotype was
independently associated with recurrence free survival (Hp 2-2; p = 0.0098), while for
receptor status (ER/PR negative; p = 0.0962) and treatment arm (conventional dose; p =
0.0509) a borderline significant association was observed (Table 2).
Figure 3
Recurrence free survival in sample set I (n = 63) according to m/z 9198 peak intensity > 20 or ≤ 20,
as determined by SELDI-TOF MS.
[Kaplan-Meier plot; x-axis: months from randomisation (0-72); y-axis: RFS probability (0.0-1.0); curves for m/z 9198 > 20 and m/z 9198 ≤ 20; p = 0.0014 (Log-rank test, two-sided).]
Number of patients at risk:
Months            0     12    24    36    48    60    72
m/z 9198 > 20     40    38    34    32    29    29    24
m/z 9198 ≤ 20     23    22    16    11    8     7     6
Biomarker validation
The distribution of the haptoglobin phenotype of patients in the validation sample set II
(n = 371) was subsequently assessed for validation purposes. As the haptoglobin
phenotype (i.e., genotype) is not influenced by treatment, samples in set II were
collected at any time point in therapy. All patient characteristics were similarly
distributed between sample set I and sample set II. The Hp 1-1, 2-1, and 2-2 phenotype
was determined in 70, 189, and 112 patients, respectively, yielding an Hp 1 allele frequency
of 0.44, which is in concordance with previously reported frequencies (13).
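The allele frequency follows directly from the phenotype counts, since each Hp 1-1 patient carries two Hp 1 alleles and each Hp 2-1 patient carries one. The sketch below shows the arithmetic and, as an additional illustration that is not part of the reported analysis, the phenotype counts expected under Hardy-Weinberg proportions. Function names are hypothetical.

```python
from typing import Tuple


def hp1_allele_frequency(n11: int, n21: int, n22: int) -> float:
    """Frequency of the Hp 1 allele from phenotype (genotype) counts."""
    total_alleles = 2 * (n11 + n21 + n22)
    return (2 * n11 + n21) / total_alleles


def hardy_weinberg_expected(n11: int, n21: int, n22: int) -> Tuple[float, float, float]:
    """Expected Hp 1-1, Hp 2-1 and Hp 2-2 counts under Hardy-Weinberg proportions."""
    n = n11 + n21 + n22
    p = hp1_allele_frequency(n11, n21, n22)
    q = 1.0 - p
    return p * p * n, 2 * p * q * n, q * q * n


p_hp1 = hp1_allele_frequency(70, 189, 112)  # (2 * 70 + 189) / 742
print(round(p_hp1, 2))
```

For sample set II this gives 329 / 742, or 0.44, and the expected counts (roughly 73, 183 and 115) are close to the observed 70, 189 and 112.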
The Kaplan-Meier curve, however, did not show a significant difference in the
probability of recurrence free survival (global Log-rank test, p = 0.6158) between the
high-risk primary breast cancer patients in sample set II having the Hp 1-1, 2-1 or 2-2
phenotype (Figure 7). With the Hp 1-1 phenotype as the reference category, the
univariate hazard ratio was 0.87 (95% CI: 0.56-1.34, p = 0.5221) and 1.03 (95% CI: 0.65-1.64, p = 0.8966) for the Hp 2-1 and Hp 2-2 phenotypes, respectively. This finding was
not affected by tumour size, which was found to be the only independently associated
variable for recurrence free survival (p = 0.0374). Her2/neu status (p = 0.0589) and
treatment arm (p = 0.0809) were only borderline significantly associated with
recurrence free survival in set II (Table 2).
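The relation between a hazard ratio, its 95% confidence interval and the p-value can be made explicit with a Wald approximation on the log scale. This is a hedged sketch with a hypothetical function name, not the Cox model fit itself: it back-calculates the standard error from the published interval, so the resulting p-values are only approximately equal to the reported ones because the inputs are rounded to two decimals.

```python
import math
from typing import Tuple


def wald_from_ci(hr: float, lower: float, upper: float,
                 z_crit: float = 1.959964) -> Tuple[float, float, float]:
    """Return (standard error of log HR, z statistic, two-sided p-value)
    recovered from a hazard ratio and its 95% confidence interval."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Univariate results for sample set II, Hp 1-1 as reference.
for label, hr, lo, hi in [("Hp 2-1", 0.87, 0.56, 1.34), ("Hp 2-2", 1.03, 0.65, 1.64)]:
    se, z, p = wald_from_ci(hr, lo, hi)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, p = {p:.2f}")
```

Both intervals include 1, so neither phenotype differs significantly from the reference category, in line with the reported p-values of 0.5221 and 0.8966.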
Table 2
Multivariable proportional-hazards analyses for the risk of recurrence for patients in sample set I
and II.

                               Sample set I (n = 63)                Sample set II (n = 371)
Variable                       HR      (95% CI)         p-value     HR      (95% CI)       p-value
Haptoglobin phenotype
  Hp 1-1                       1                                    1
  Hp 2-1                       4.21    (0.44-39.83)     0.2105      0.94    (0.58-1.52)    0.8059
  Hp 2-2                       17.76   (2.00-157.44)    0.0098      1.26    (0.76-2.08)    0.3653
Surgery
  Breast conserving            1                                    1
  Mastectomy                   1.26    (0.24-6.67)      0.7843      0.91    (0.59-1.39)    0.6514
Treatment arm
  High dose                    1                                    1
  Conventional dose            0.33    (0.11-1.00)      0.0509      1.36    (0.96-1.92)    0.0809
Age
  ≥ 40 yrs                     1                                    1
  < 40 yrs                     1.26    (0.32-4.93)      0.7413      0.97    (0.65-1.46)    0.8890
No. of positive lymph nodes
  4-9                          1                                    1
  ≥ 10                         0.72    (0.23-2.29)      0.5785      1.00    (0.69-1.45)    0.9964
Tumour size
  < 5 cm                       1                                    1
  ≥ 5 cm                       1.35    (0.50-3.65)      0.5497      1.63    (1.03-2.60)    0.0374
Her2/neu status
  Negative                     1                                    1
  Positive                     1.47    (0.53-4.08)      0.4641      1.47    (0.99-2.20)    0.0589
Receptor status
  ER/PR positive               1                                    1
  ER/PR negative               3.39    (0.80-14.27)     0.0962      1.11    (0.73-1.70)    0.6138
Bloom-Richardson grade
  Grade I                      1                                    1
  Grade II                     0.92    (0.26-3.23)      0.9016      0.87    (0.50-1.50)    0.6187
  Grade III                    1.28    (0.37-4.46)      0.6946      1.34    (0.79-2.26)    0.2796

Abbreviations: 95% CI: 95% confidence interval, HR: hazard ratio.
Discussion
The introduction of high-throughput analytical platforms, such as the genomic/
transcriptomic microarray technology, or the proteomic SELDI-TOF MS technology,
has enabled the advent of discovery-based research. Large quantities of data can now be
analysed without underlying hypotheses, to search for patterns that discriminate
between patients with different diagnosis, prognosis or response to treatment.
Assessment of validity, however, is pivotal in this discovery-based ‘-omics’ research, as
the meaning of such patterns from a biological perspective often is unknown.
Figure 4a Structural identification of the m/z 9198 peak cluster.
Peptide mapping of the m/z 9198 marker. MS spectrum of the m/z 9198 tryptic digest in the gel eluate. All
peptides were sequenced with tandem MS using Q-TOF for confirmation. Results from the MASCOT search
for protein identification include start and end positions of the peptide sequence starting from the amino
terminal of the whole protein, the observed m/z (Mr(obs)), transformed to its experimental mass (Mr(expt)),
the calculated mass (Mr(calc)) from the matched peptide sequence, as well as their mass difference (Delta), the
number of missed cleavage sites for trypsin (Miss), and the peptide sequence (in grey: the amino acid sequence
determined by Q-TOF MS).
[MS spectrum: relative intensity (%) vs. m/z (amu), m/z range 1000-4400]
MASCOT peptide mapping results: m/z 9198 haptoglobin alpha-1 chain
Start-End  Mr(obs)  Mr(expt)  Mr(calc)  Delta  Miss  Sequence
35-49      1590.70  1589.69   1589.79   -0.10  0     K.PPEIAHGYVEHSVR.Y
58-72      1708.80  1707.79   1707.84   -0.05  1     K.LRTEGDGVYTLNNEK.Q
78-94      1743.75  1742.74   1742.87   -0.13  0     K.AVGDKLPECEAVCGKPK.N
19-49      3292.61  3291.60   3291.51    0.09  0     A.VDSGNDVTDIADDGCPKPPEIAHGYVEHSVR.Y
Amino acid sequence of m/z 9198 haptoglobin alpha-1 chain (start: 18 - end: 101, 76% sequence coverage):
VDSGNDVTDIADDGCPKPPEIAHGYVEHSVRYQCKNYYKLRTEGDGVYTLNNEKQWINKAVGDKLPECEAVCGKPKNPANPVQ
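The relation between the mass columns of the MASCOT results can be checked with a few lines of Python. This is a minimal sketch that assumes singly protonated peptide ions ([M+H]+), so that Mr(expt) equals the observed m/z minus the mass of one proton. The function names and the proton mass constant are illustrative and are not part of the original analysis.

```python
PROTON_MASS = 1.00728  # Da; assumed charge carrier for singly charged [M+H]+ ions


def experimental_mass(mz_observed: float, charge: int = 1) -> float:
    """Convert an observed m/z into the neutral experimental mass Mr(expt)."""
    return round((mz_observed - PROTON_MASS) * charge, 2)


def mass_delta(mr_expt: float, mr_calc: float) -> float:
    """Delta as listed in the results: experimental minus calculated mass."""
    return round(mr_expt - mr_calc, 2)


# Two rows taken from the peptide mapping results: (Mr(obs), Mr(calc))
rows = {"35-49": (1590.70, 1589.79), "19-49": (3292.61, 3291.51)}
for span, (mr_obs, mr_calc) in rows.items():
    mr_expt = experimental_mass(mr_obs)
    print(span, mr_expt, mass_delta(mr_expt, mr_calc))
```

Under this assumption, both rows reproduce the Mr(expt) and Delta values listed for the corresponding peptides.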
Our initial findings were, however, endorsed by the various biological functions of
haptoglobin and their phenotype-dependency. The main physiological function of
haptoglobin is binding of free haemoglobin. The haptoglobin-haemoglobin complex is
too large to be filtered at the kidney glomerulus and is therefore retained. Both iron loss
and free-radical mediated damage, caused by the haem-iron mediated generation of free
hydroxyl radicals (by means of the Fenton reaction: H₂O₂ + Fe²⁺ → Fe³⁺ + OH⁻ + ·OH)
are thus prevented (13). Haptoglobin has also been identified as a strong angiogenic
agent, activating endothelial cell growth and differentiation. This function was shown
to be phenotype dependent, as the Hp 2-2 phenotype has been found to be more
angiogenic than the other phenotypes (17). The poor prognosis of the Hp 2-2 breast
cancer patients in our discovery set (n = 63) could be mediated by this haptoglobin
function, since angiogenesis is well known to be involved in tumour growth, proliferation,
and metastasis (18).
Figure 4b Structural identification of the m/z 9198 peak cluster.
Matched amino acid sequence of the m/z 9198 marker (in grey: amino acid sequence sequenced by Q-TOF
MS), haptoglobin alpha-1 chain and the corresponding N-terminus of haptoglobin-related-protein (in grey:
amino acid substitutions between haptoglobin and haptoglobin-related-protein) (19).
Comparison of amino acid sequences
m/z 9198 marker
VDSGNDVTDIADDGCPKPPEIAHGYVEHSVRYQCKNYYKLRTEGDGVYTLNNEKQWINKAVGDKLPECEAVCGKPKNPANPVQ
Haptoglobin alpha-1 chain (Mw 9192.21, pI 5.23)
VDSGNDVTDIADDGCPKPPEIAHGYVEHSVRYQCKNYYKLRTEGDGVYTLNNEKQWINKAVGDKLPECEAVCGKPKNPANPVQ
Haptoglobin-related-protein (Mw 9492.64, pI 6.76)
LYSGNDVTDISDDRFPKPPEIANGYVEHLFRYQCKNYYRLRTEGDGVYTLNDKKQWINKAVGDKLPECEAVCGKPKNPANPVQ
Both haptoglobin and its phenotype have been described previously in relation to
various diseases (including cancer), a finding which is not surprising in view of its
biology. Both the intact protein and its subunits have been found overexpressed in
serum of patients with various solid tumours, for example ovarian and small-cell lung
cancer (20-22). The 9.2 kDa haptoglobin alpha-1 chain has been specifically detected by
Tolson et al. (12) in sera of renal cell carcinoma patients and healthy controls.
Following haptoglobin phenotyping, all patients having the Hp 2-2 phenotype indeed
proved to be missing the 9.2 kDa haptoglobin alpha-1 peak in their serum protein
profile. Due to its phenotypic distribution, this protein could, however, not be
considered as a diagnostic marker. The influence of haptoglobin phenotype on
recurrence free survival was not investigated in that study (12). Using the SELDI-TOF MS
platform, the 9.2 kDa alpha-1 chain was also detected by Goncalves et al. (9). Unlike our
own observations, they found the 9.2 kDa peak to be overexpressed in sera of high-risk
primary breast cancer patients (n = 81) experiencing a relapse versus long-term disease
free survivors (9). The absolute intensities of the m/z 9192 peak in SELDI-TOF MS
spectra of Goncalves et al. (9) ranged between 0 and 2, and a clear dichotomous peak
intensity distribution was not observed. These discrepancies from our initial findings
most likely originate in the serum pre-fractionation that was performed in their study.
During protein purification, we repeatedly found the 9.2 kDa peak to be predominantly
present in the pH 5 fraction. Goncalves et al. (9), however, subjected only the pH 9 /
flow through, pH 4, and organic solvent fractions to SELDI-TOF MS analysis, resulting
in a suboptimal assay for haptoglobin alpha-1 detection.
Figure 5 Peak intensity of the m/z 9198 marker (as determined by SELDI-TOF MS) vs. haptoglobin
phenotype (as assessed by 1D gel-electrophoresis) of samples in set I.
[Plot: m/z 9198 peak intensity (y-axis, 0-80) by haptoglobin phenotype (x-axis: Hp 1-1, Hp 2-1, Hp 2-2)]
Another association between protein expression and recurrence free survival in breast
cancer has previously been reported by Kuhajda et al. (23;24). They described a
decreased tumour tissue expression of haptoglobin-related-protein, quantitated
immunohistochemically, to be associated with a prolonged recurrence free survival in
70 breast cancer patients (23). Their findings differ from our observations by the
biological matrix analysed (tumour tissue vs. serum), by the exposure used for
prediction of recurrence free survival (protein expression vs. phenotype), and by the
identity of the protein used for prognostication (haptoglobin-related-protein vs.
haptoglobin). Although coded for by two different genes, both proteins have more than
90% amino acid sequence homology. There are, however, distinct differences between
the alpha-1 chain of haptoglobin and haptoglobin-related-protein, due to 8 amino acid
substitutions (19). Amino acid sequencing of the m/z 9198 marker, taking 7 out of 8
amino acid substitutions into account, enabled us to unequivocally identify our
candidate biomarker as haptoglobin alpha-1 chain (Figure 4b).
The haptoglobin phenotype has not been identified as a predictor of recurrence free
survival in breast cancer thus far. The phenotype has nevertheless been associated with
clinical outcome in other pathologies, including mortality (25), nephropathy (26), and
cardiovascular disease (27) in diabetic patients, mortality in HIV infection (28),
and mortality in tuberculosis (29). In these studies, the Hp 2-2 phenotype was
invariably associated with worse clinical outcome. In contrast, Depypere et al. (30)
found the Hp 1-1 phenotype to be associated with more severe
hypertension and proteinuria in patients with preeclampsia.
Despite the potential biological justifications, our promising initial result of the
haptoglobin phenotype being a predictor of recurrence free survival in a limited
number (n = 63) of high-risk primary breast cancer patients was not confirmed
following validation by analysis of a six-fold larger sample set (n = 371). It is unlikely
that our findings result from differences in patient characteristics between our
discovery and validation sample set, since all (known) characteristics were similarly
distributed between both sample sets.
Figure 6 Recurrence free survival in sample set I (n = 63) by haptoglobin phenotype.
[Survival curves: RFS probability (0.0-1.0) vs. months from randomisation (0-72) for Hp 1-1, Hp 2-1 and Hp 2-2; p = 0.0029 (logrank test, two-sided)]
Number of patients at risk:
Months   0    12   24   36   48   60   72
Hp 1-1   14   14   13   13   12   12   11
Hp 2-1   26   24   21   19   17   17   13
Hp 2-2   23   22   16   11   8    7    6
The two major threats to validity of discovery-based proteomics research come from
chance and bias (31). Sources of bias include differences in sample collection and
storage, or in analysis (32). The haptoglobin phenotype, however, is not influenced by
specimen collection and storage. Moreover, the native 1D gel electrophoresis method for
assessment of haptoglobin phenotype is robust and reproducible (33). Study results are
therefore unlikely to have been influenced by bias, but rather result from chance.
Due to small sample sizes and the artifice of discovery strategies, many biomarker
candidates are prone to be false positives, i.e., type I errors (erroneous rejection of
the null hypothesis). The chance of candidate biomarkers being type I errors is increased
by the fact that most proteomic datasets are subject to both the ‘curse of dimensionality’
(large number of features) and the ‘curse of dataset sparsity’ (limited number of samples)
(34). As such, datasets are frequently subjected to multiple testing in the search for
candidate biomarkers. Yet, even a significance level for type I errors of 0.01 is no
guarantee that false positive findings are excluded, even following correction for
multiple testing. Problems caused by chance are best avoided by analysis of an
independent validation dataset, in which false positive markers will be ruled out, as
they are unique to the discovery sample set (32;35).
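The effect of multiple testing on chance findings can be made concrete with a small calculation. The sketch below assumes m independent tests of true null hypotheses, each performed at significance level alpha. This is a simplification, since peak intensities within a spectrum are usually correlated, and the function names and example numbers are illustrative only.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of type I errors when every null hypothesis is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests: int, alpha: float) -> float:
    """Probability of at least one type I error among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical example: 100 peak clusters, each tested at alpha = 0.01
n_peaks, alpha = 100, 0.01
print(expected_false_positives(n_peaks, alpha))
print(round(familywise_error_rate(n_peaks, alpha), 3))
print(bonferroni_alpha(n_peaks, alpha))
```

In this hypothetical setting one false positive peak is expected on average, and the probability of at least one false positive exceeds 60%, which illustrates why an independent validation set is needed.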
The above-mentioned hurdles in proteomics research apply equally to all other ‘-omics’
research (e.g., genomics and metabolomics), as in general, these research approaches
suffer from a limited number of samples in comparison with the large number of
generated features.
Figure 7 Recurrence free survival in sample set II (n = 371) by haptoglobin phenotype.
[Survival curves: RFS probability (0.0-1.0) vs. months from randomisation (0-72) for Hp 1-1, Hp 2-1 and Hp 2-2; p = 0.6158 (logrank test, two-sided)]
Number of patients at risk:
Months   0    12   24   36   48   60   72
Hp 1-1   70   63   57   51   48   43   33
Hp 2-1   189  176  160  143  138  124  107
Hp 2-2   112  103  90   80   73   66   58
Conclusion
In conclusion, although we initially found the haptoglobin phenotype to be a predictor
of recurrence free survival in a limited number of high-risk primary breast cancer
patients, this was not confirmed following validation by analysis of a similar, but sixfold larger sample set. Clearly, validation of initial results is of pivotal importance in
determining the clinical significance of a candidate biomarker. In spite of this, few, if
any, related clinical diagnostic tests have yet been validated for clinical use, although
the number of papers reporting on candidate protein biomarkers is large and still
expanding. This lack of validation can result in chance results and erroneous
conclusions, leading to disappointment when results cannot be reproduced (36).
Distillation of true positives from the total pool of candidate biomarkers is the single
greatest challenge in biomarker development, and should therefore be the emphasis in
data-driven proteomics research.
Acknowledgement
This study was supported by a grant of the Dutch Cancer Society (project NKI 2005-3421).
References
(1) Jemal A, Siegel R, Ward E, Hao Y, Xu J, Murray T et al. Cancer statistics, 2008. CA Cancer J Clin 2008;
58(2):71-96.
(2) Goldhirsch A, Wood WC, Gelber RD, Coates AS, Thurlimann B, Senn HJ. Meeting highlights: updated
international expert consensus on the primary therapy of early breast cancer. J Clin Oncol 2003;
21(17):3357-3365.
(3) Polychemotherapy for early breast cancer: an overview of the randomised trials. Early Breast Cancer
Trialists' Collaborative Group. Lancet 1998; 352(9132):930-942.
(4) 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M et al. Gene expression profiling predicts
clinical outcome of breast cancer. Nature 2002; 415(6871):530-536.
(5) Foekens JA, Atkins D, Zhang Y, Sweep FC, Harbeck N, Paradiso A et al. Multicenter validation of a gene
expression-based prognostic signature in lymph node-negative primary breast cancer. J Clin Oncol 2006;
24(11):1665-1671.
(6) van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW et al. A gene-expression signature
as a predictor of survival in breast cancer. N Engl J Med 2002; 347(25):1999-2009.
(7) Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F et al. Gene-expression profiles to predict
distant metastasis of lymph-node-negative primary breast cancer. Lancet 2005; 365(9460):671-679.
(8) Paik S, Tang G, Shak S, Kim C, Baker J, Kim W et al. Gene expression and benefit of chemotherapy in
women with node-negative, estrogen receptor-positive breast cancer. J Clin Oncol 2006; 24(23):3726-3734.
(9) Goncalves A, Esterni B, Bertucci F, Sauvan R, Chabannon C, Cubizolles M et al. Postoperative serum
proteomic profiles may predict metastatic relapse in high-risk primary breast cancer patients receiving
adjuvant chemotherapy. Oncogene 2006; 25(7):981-989.
(10) Ricolleau G, Charbonnel C, Lode L, Loussouarn D, Joalland MP, Bogumil R et al. Surface-enhanced laser
desorption/ionization time of flight mass spectrometry protein profiling identifies ubiquitin and ferritin
light chain as prognostic biomarkers in node-negative breast cancer tumors. Proteomics 2006; 6(6):1963-1975.
(11) Rodenhuis S, Bontenbal M, Beex LV, Wagstaff J, Richel DJ, Nooij MA et al. High-dose chemotherapy
with hematopoietic stem-cell rescue for high-risk breast cancer. N Engl J Med 2003; 349(1):7-16.
(12) Tolson J, Bogumil R, Brunst E, Beck H, Elsner R, Humeny A et al. Serum protein profiling by SELDI
mass spectrometry: detection of multiple variants of serum amyloid alpha in renal cancer patients. Lab
Invest 2004; 84(7):845-856.
(13) Langlois MR, Delanghe JR. Biological and clinical significance of haptoglobin polymorphism in humans.
Clin Chem 1996; 42(10):1589-1600.
(14) Bowman BH, Kurosky A. Haptoglobin: the evolutionary product of duplication, unequal crossing over,
and point mutation. Adv Hum Genet 1982; 12:189-4.
(15) Dobryszycka W. Biological functions of haptoglobin--new pieces to an old puzzle. Eur J Clin Chem Clin
Biochem 1997; 35(9):647-654.
(16) Wassell J. Haptoglobin: function and polymorphism. Clin Lab 2000; 46(11-12):547-552.
(17) Cid MC, Grant DS, Hoffman GS, Auerbach R, Fauci AS, Kleinman HK. Identification of haptoglobin as
an angiogenic factor in sera from patients with systemic vasculitis. J Clin Invest 1993; 91(3):977-985.
(18) de Castro JG, Puglisi F, de Azambuja E, El Saghir NS, Awada A. Angiogenesis and cancer: A cross-talk
between basic science and clinical trials (the "do ut des" paradigm). Crit Rev Oncol Hematol 2006;
59(1):40-50.
(19) Maeda N. Nucleotide sequence of the haptoglobin and haptoglobin-related gene pair. The haptoglobin-related gene contains a retrovirus-like element. J Biol Chem 1985; 260(11):6698-6709.
(20) Bharti A, Ma PC, Maulik G, Singh R, Khan E, Skarin AT et al. Haptoglobin alpha-subunit and
hepatocyte growth factor can potentially serve as serum tumor biomarkers in small cell lung cancer.
Anticancer Res 2004; 24(2C):1031-1038.
(21) Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD et al. HUPO Plasma Proteome
Project specimen collection and handling: towards the standardization of parameters for plasma
proteome samples. Proteomics 2005; 5(13):3262-3277.
(22) Ye B, Cramer DW, Skates SJ, Gygi SP, Pratomo V, Fu L et al. Haptoglobin-alpha subunit as potential
serum biomarker in ovarian cancer: identification and characterization using proteomic profiling and
mass spectrometry. Clin Cancer Res 2003; 9(8):2904-2911.
(23) Kuhajda FP, Piantadosi S, Pasternack GR. Haptoglobin-related protein (Hpr) epitopes in breast cancer as
a predictor of recurrence of the disease. N Engl J Med 1989; 321(10):636-641.
(24) Kuhajda FP, Katumuluwa AI, Pasternack GR. Expression of haptoglobin-related protein and its potential
role as a tumor antigen. Proc Natl Acad Sci U S A 1989; 86(4):1188-1192.
(25) Suleiman M, Aronson D, Asleh R, Kapeliovich MR, Roguin A, Meisel SR et al. Haptoglobin
polymorphism predicts 30-day mortality and heart failure in patients with diabetes and acute myocardial
infarction. Diabetes 2005; 54(9):2802-2806.
(26) Nakhoul FM, Zoabi R, Kanter Y, Zoabi M, Skorecki K, Hochberg I et al. Haptoglobin phenotype and
diabetic nephropathy. Diabetologia 2001; 44(5):602-604.
(27) Levy AP, Hochberg I, Jablonski K, Resnick HE, Lee ET, Best L et al. Haptoglobin phenotype is an
independent risk factor for cardiovascular disease in individuals with diabetes: The Strong Heart Study. J
Am Coll Cardiol 2002; 40(11):1984-1990.
(28) Delanghe JR, Langlois MR, Boelaert JR, Van Acker J, Van Wanzeele F, van der GG et al. Haptoglobin
polymorphism, iron metabolism and mortality in HIV infection. AIDS 1998; 12(9):1027-1032.
(29) Kasvosve I, Gomo ZA, Mvundura E, Moyo VM, Saungweme T, Khumalo H et al. Haptoglobin
polymorphism and mortality in patients with tuberculosis. Int J Tuberc Lung Dis 2000; 4(8):771-775.
(30) Depypere HT, Langlois MR, Delanghe JR, Temmerman M, Dhont M. Haptoglobin polymorphism in
patients with preeclampsia. Clin Chem Lab Med 2006; 44(8):924-928.
(31) Ransohoff DF. Bias as a threat to the validity of cancer molecular-marker research. Nat Rev Cancer 2005;
5(2):142-149.
(32) Ransohoff DF. Lessons from controversy: ovarian cancer screening and serum proteomics. J Natl Cancer
Inst 2005; 97(4):315-319.
(33) Linke RP. Typing and subtyping of haptoglobin from native serum using disc gel electrophoresis in
alkaline buffer: application to routine screening. Anal Biochem 1984; 141(1):55-61.
(34) Somorjai RL, Dolenko B, Baumgartner R. Class prediction and discovery using gene microarray and
proteomics mass spectroscopy data: curses, caveats, cautions. Bioinformatics 2003; 19(12):1484-1491.
(35) Belluco C, Petricoin EF, Mammano E, Facchiano F, Ross-Rucker S, Nitti D et al. Serum Proteomic
Analysis Identifies a Highly Sensitive and Specific Discriminatory Pattern in Stage 1 Breast Cancer. Ann
Surg Oncol 2007; 14(9):2470-2476.
(36) Ransohoff DF. Discovery-based research and fishing. Gastroenterology 2003; 125(2):290.
Chapter 3.4
Post-operative serum proteomic profiles may predict recurrence free survival in high-risk primary breast cancer
Marie-Christine W. Gast
Marc Zapatka
Jan H.M. Schellens
Jos H. Beijnen
Interim analysis
Abstract
Better breast cancer prognostication may improve selection of patients for adjuvant
therapy. We conducted a retrospective follow-up study in which we investigated sera of
high-risk primary breast cancer patients, to search for proteins predictive of recurrence
free survival. Sera of 82 breast cancer patients procured after surgery, but prior to the
administration of adjuvant therapy, were fractionated using anion-exchange
chromatography, to facilitate detection of the low abundant serum proteome. Selected
fractions were subsequently analysed by surface-enhanced laser desorption/ionisation
time-of-flight mass spectrometry (SELDI-TOF MS), and resulting protein profiles were
searched for prognostic markers by appropriate bioinformatics tools. Four peak clusters
(i.e., m/z 3073, m/z 3274, m/z 4405, and m/z 7973) were found to bear significant
prognostic value (p ≤ 0.01). The m/z 3274 candidate marker was structurally identified
as inter-alpha-trypsin inhibitor heavy chain 4 fragment 658-688 in serum. Except for the
m/z 7973 peak cluster, these peaks remained independently associated with recurrence
free survival upon multivariate Cox regression analysis, including clinical parameters of
known prognostic value in this study population. Hence, investigation of the post-operative serum proteome by, e.g., anion-exchange fractionation followed by SELDI-TOF MS analysis is promising for the detection of novel prognostic factors. However,
given the rather limited study population, validation of these results by analysis of
independent study populations is warranted to assess the true clinical applicability of
discovered prognostic markers. In addition, structural identification of the other
markers will aid in elucidation of their role in breast cancer prognosis, as well as enable
development of absolute quantitative assays.
Introduction
Breast cancer is at present the most commonly diagnosed neoplasm among women in
the USA (1). In addition, despite the substantial progress made in cancer therapy, breast
cancer is the second leading cause of female cancer deaths, following lung cancer (1).
The main prognostic factors currently used to determine eligibility for administration of
adjuvant therapy include both clinical and pathological parameters, e.g., patient’s age at
diagnosis, tumour size, lymph node status, grade of malignancy, and hormone-receptor
and Her2/neu receptor status (the latter two being predictive factors as well) (2).
However, despite appropriate locoregional treatment and adjuvant therapy, 30-50% of
breast cancer patients will develop metastatic relapse and die (3), while there is a small
percentage of patients that would have survived without adjuvant chemo- and
hormonal therapy. Evidently, currently applied prognostic markers do not suffice for
precise risk-group determination in breast cancer. This failure most likely originates in
the high molecular heterogeneity of breast cancer pathogenesis and progression, which
the currently used prognostic parameters clearly cannot fully address. Improved
prognostic markers that might help to reduce both over- and undertreatment of the
disease are thus urgently needed.
In search for these markers, investigators from our institutes have published gene
expression profiles in tumour tissue that outperformed all prognostic parameters in
predicting disease outcome (i.e., distant metastases) (4-7). Nonetheless, it is currently
understood that the functional “end-unit” of the genome, i.e., the proteome, might have
greater ability in reflecting the molecular complexity of (breast) cancer. Covering post-translational and post-transcriptional modifications, the proteome reflects both the
intrinsic genetic programme of the cell and the impact of its immediate environment,
providing a highly dynamic and accurate view of a biological status (8), and hence, a
rich and complementary source of potential biomarkers.
One of the proteomic technologies used extensively in the search for novel markers is
surface-enhanced laser desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF MS) (9). Combining retention chromatography with laser desorption/ionisation
MS instrumentation, this platform has enabled high-throughput mass profiling of
highly complex biological samples, such as tissue lysates and serum. Thus far, only two
studies have reported the use of SELDI-TOF MS for discovery of prognostic breast
cancer markers (10;11). Ricolleau et al. (11) investigated tumour cytosolic extracts of 60
breast cancer patients, and identified ubiquitin and ferritin light chain to be associated
with prognosis. Goncalves et al. (10), on the other hand, investigated serum, a more
easily accessible biological matrix that provides a good reflection of the human
proteome as it perfuses all tissues of the body. Following SELDI-TOF MS analysis of
fractionated sera, they constructed a multiprotein model consisting of 40 proteins,
correctly predicting relapse in 67 of 81 patients (10). Our research group has previously
performed a prognostic SELDI-TOF MS study in serum as well (12). Although we
initially discovered the haptoglobin phenotype to be a strong, independent, prognostic
parameter in high-risk primary breast cancer (n = 63), this result most likely was a false
positive, as it was not confirmed following analysis of our validation sample set (n =
371) (12). In contrast to the study of Goncalves et al. (10), we investigated raw,
unfractionated sera in our previous study. While only 22 proteins comprise more than
99% of the human serum proteome, the low abundant proteins make up the
remaining < 1% (13). This large dynamic range of proteins in crude serum hampers
detection of the presumably highly informative low abundant serum proteins. Serum
fractionation, however, is likely to facilitate detection of the low abundant proteins
through reduction of this dynamic range (14).
We therefore aimed to specifically explore the low abundant serum proteome for the
presence of markers that can be applied in the prognostication of breast cancer. To this
end, sera of 82 breast cancer patients procured after surgery, but prior to the
administration of adjuvant therapy, were fractionated using anion-exchange
chromatography. Selected fractions were subsequently analysed by SELDI-TOF MS, and
resulting protein profiles were searched for prognostic markers by appropriate
bioinformatics tools.
Materials and Methods
Study population
From 1993 to 1999, high-risk primary breast cancer patients who had undergone
modified radical mastectomy or breast conserving surgery with complete axillary
clearance participated in a randomised multicentre phase III trial. This study
investigated the benefit of high-dose adjuvant chemotherapy in patients with ≥ 4
axillary lymph node metastases. The design of the study has been described elsewhere
(15). Major eligibility criteria were histologically confirmed stage 2A, 2B or 3A breast
cancer with at least 4 tumour-positive axillary lymph nodes, but no evidence of distant
metastases, age under 56 years, and no previous other malignancies.
In the current study, sera of 82 study patients who were treated in the Erasmus Medical
Center - Daniel den Hoed Cancer Center (Erasmus: n = 24), or in the Radboud
University Medical Center Nijmegen (Radboud: n = 58) were included. Sera were
procured after surgery (13-55 days), but prior to the administration of adjuvant
chemotherapy (0-41 days), and all sera were stored at -80°C. All serum samples were
obtained with medical-ethics approval and all patients gave informed consent.
Chemicals
All used chemicals were obtained from Sigma, St. Louis, MO, USA, unless stated
otherwise.
Serum fractionation
Sera were fractionated manually using a strong anion exchange Q ceramic resin (Bio-Rad Labs, Hercules, CA, USA), according to the manufacturer’s protocol. Briefly, sera (20
µl) were denatured in 9 M urea / 2% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), after which they were randomly allocated in duplicate to
two 96-well ProteinChip Q filtration plates, prefilled with Q ceramic HyperD F resin
(Bio-Rad Labs). In addition, one serum sample was randomly assigned to 12 different
wells of each fractionation plate for quality control purposes. Following incubation (30
min), the flow through was collected using a vacuum manifold (Millipore, Billerica,
MA, USA). Bound proteins were subsequently eluted with a stepwise pH gradient using
wash buffers ranging from pH 9 to pH 3, followed by an organic buffer for elution of
remaining proteins. As a result, 6 serum fractions (F) were obtained, i.e., F1 (flow-through plus pH 9), F2 (pH 7), F3 (pH 5), F4 (pH 4), F5 (pH 3), and F6 (organic buffer).
Prior to protein profiling, fractions were stored overnight at +4°C.
SELDI-TOF MS protein profiling
Protein profiling of serum fractions was performed using the ProteinChip SELDI (PCS
4000) Reader (Bio-Rad Labs). Various array chemistries and fractions were initially
evaluated to determine which combination provided the best protein profiles in terms
of number and resolution of proteins. Following assay optimisation, we selected
Immobilized Metal Affinity Capture (IMAC30) arrays for analysis of F3 and F4, and
weak cation exchange (CM10) arrays for analysis of F5 and F6. Throughout the manual
assay, arrays were assembled in a 96-well bioprocessor, which was shaken on a
MicroMix 5 platform shaker (DPC Cirrus Inc., Los Angeles, CA, USA) at setting 20/7.
IMAC30 arrays were charged with 50 µl 100 mM copper sulphate (Merck, Darmstadt,
Germany) for 10 min, followed by neutralisation (5 min) with 200 µl 100 mM sodium
acetate buffer pH 4. Next, both IMAC30 and CM10 arrays were equilibrated twice for 5
min with 200 µl of their respective binding buffers (IMAC30: 0.01 M phosphate
buffered saline pH 7.4 / 0.5 M sodium chloride (Merck), CM10: 20 mM sodium acetate
pH 4). Arrays were subsequently loaded with 85 µl binding buffer and 15 µl of the
fractionated sample. After incubation (30 min), arrays were washed three times with
200 µl binding buffer, and following a quick rinse with MilliQ water (Millipore), arrays
were air-dried. A 50% sinapinic acid (Bio-Rad Labs) solution in 50% acetonitrile (Labscan Ltd., Dublin, Ireland) / 0.5% trifluoroacetic acid (Merck) was applied twice (1.0 µl)
to the arrays as the matrix. Following air-drying, the arrays were analysed using the
ProteinChip SELDI (PCS 4000) Reader. Data were collected between 0 and 300 kDa,
averaging 530 laser shots with 3500 nJ intensity, at focus mass 7.5 kDa and matrix
attenuation 1000 Da. For mass accuracy, the instrument was calibrated on the day of
measurements with All-in-One protein standard (Bio-Rad Labs).
Statistics and bioinformatics
Mass spectrometry data were processed using the tbimass R-package (www.r-project.org,
publication in preparation). After pre-processing (resampling, baseline correction,
normalization, alignment correction), peaks were recognised using PROcess
(www.bioconductor.org) on the mean spectra of each experimental group (fraction/
ProteinChip array type). For discovery of peak clusters with significant prognostic
value, a subpopulation (n = 68) containing patients diagnosed with a recurrence within
36 months of follow-up (n = 32) and patients experiencing no recurrence after a follow-up of at least 48 months (n = 36) was extracted from the total study population.
Investigating this subpopulation using Cox proportional hazards analysis, the peak
clusters associated with recurrence were identified within all peaks of the combined
data of all fractions/ProteinChip array types. For selection, a stepwise method was
applied (i.e., stepBIC), an algorithm sequentially searching through all possible Cox
proportional hazard models for the one that minimises the Bayesian Information
Criterion (BIC). Recurrence free survival was calculated from the date of randomisation
to the time of first recurrence or death, or the date of last follow-up.
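The stepwise selection described above can be sketched as follows. This is a minimal illustration of greedy forward selection on the BIC, not the stepBIC implementation used in the study. The `log_likelihood` callable (which in practice would fit a Cox proportional hazards model on the chosen peak clusters and return its maximised partial log-likelihood), the choice of the number of observations as n in the BIC, the function names and the toy numbers are all assumptions.

```python
import math
from typing import Callable, FrozenSet, Iterable, Tuple

LogLik = Callable[[FrozenSet[str]], float]


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


def forward_select_bic(candidates: Iterable[str], log_likelihood: LogLik,
                       n_obs: int) -> Tuple[FrozenSet[str], float]:
    """Greedily add the peak that lowers the BIC most; stop when none does."""
    selected: FrozenSet[str] = frozenset()
    best = bic(log_likelihood(selected), 0, n_obs)
    remaining = set(candidates)
    while remaining:
        scored = [
            (bic(log_likelihood(selected | {p}), len(selected) + 1, n_obs), p)
            for p in sorted(remaining)
        ]
        score, peak = min(scored)
        if score >= best:
            break  # no remaining peak improves the criterion
        selected, best = selected | {peak}, score
        remaining.discard(peak)
    return selected, best


# Toy example: a made-up additive log-likelihood with hypothetical gains per peak.
gains = {"mz3073": 6.0, "mz3274": 5.0, "mz9999": 0.5}


def toy_loglik(peaks: FrozenSet[str]) -> float:
    return -100.0 + sum(gains[p] for p in peaks)


chosen, score = forward_select_bic(gains, toy_loglik, n_obs=68)
print(sorted(chosen), round(score, 2))
```

In the toy example the third, weakly informative peak is rejected because its gain in log-likelihood does not outweigh the ln(n) penalty for the extra parameter.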
Clinical parameters were selected on known impact on recurrence in the total study
population, to prevent overfitting of the data to the model. To this end, a Cox
proportional hazards analysis was performed including the known clinical parameters
presented in Table 1, based on forward entry (p < 0.05). In addition, the clinical
parameter ‘treatment’ was selected to correct for the different treatment arms of the
original clinical trial. A Cox proportional hazards model was subsequently built on the
total study population, by inclusion of the relevant clinical parameters only. To
investigate whether the relationship between peak intensities and recurrence free
survival could be explained by any of the relevant clinical parameters, the hazard ratios
were adjusted for these clinical parameters by construction of a Cox proportional
hazards model on the total study population, incorporating the selected peak clusters
and the relevant clinical parameters.
Since our study population originated from two different hospitals that allegedly used
different sample collection protocols, our results could have been influenced by various
pre-analytical factors. The influence of the different collection protocols on the SELDI-TOF MS protein profiles was investigated by multidimensional scaling of the SELDI-TOF MS spectra. In this way, the degree of similarity or dissimilarity between the
samples collected at the two different hospitals is expressed graphically: points
representing similar samples tend to cluster together, while points representing dissimilar samples
tend to be far apart. The influence of collection center on the protein profile was
furthermore investigated by Cox proportional hazards analysis for each peak cluster
separately, incorporating one peak cluster, relevant clinical parameters and collection
center.
The reproducibility of the assay was assessed by analysis of one quality control serum
sample, fractionated 24 times by random assignment to 12 different wells of each of the
2 fractionation plates. Within the quality control spectra, all peaks with a
signal-to-noise ratio (S/N) ≥ 2 were detected, after which the coefficient of variation was
calculated on the corresponding peak intensities. All statistical tests were two-sided,
and p < 0.05 was considered statistically significant.
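The reproducibility measure described above can be illustrated with a short sketch. It assumes the coefficient of variation is the sample standard deviation divided by the mean, expressed as a percentage; the intensities and peak names are hypothetical and serve only to show the calculation.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(intensities):
    """CV (%) = sample standard deviation / mean * 100."""
    return 100.0 * stdev(intensities) / mean(intensities)


# Hypothetical intensities of two peaks across replicate quality control spectra.
qc_peaks = {
    "peak_a": [10.0, 12.0, 11.0, 9.0],
    "peak_b": [40.0, 50.0, 45.0, 55.0],
}

# One CV per peak, then a summary over all peaks of a fraction/array type.
cvs = {name: coefficient_of_variation(values) for name, values in qc_peaks.items()}
median_cv = median(cvs.values())
```

In the study itself, each peak would contribute 24 replicate intensities per fraction/ProteinChip array type rather than the four used here.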
Peptide identification
For identification purposes, peptides of interest were extracted from serum (fractions) by
reversed-phase C18 magnetic beads (Dynabeads RPC18, Invitrogen, Breda, The
Netherlands) using a Kingfisher 96 pipetting robot (Thermo Fisher Scientific, Waltham,
MA, USA), according to the optimized protocol described in (16). Briefly, sera were
diluted in TFA 0.1%, after which the peptide content was bound to the beads. The
beads were subsequently washed with 0.1% TFA, and eluted with 50% ACN. Eluate (1
µl) was mixed with α-cyano-4-hydroxy-cinnamic acid matrix (2 µl), after which the
mixture was spotted (0.7 µl) on a MALDI target plate. Analyses were performed on a
4800 MALDI-TOF/TOF mass spectrometer (Applied Biosystems, Foster City, CA,
USA). Fragment ion spectra were used to search the NCBI 20081128 database (Homo
sapiens: 216937 sequences) using the MASCOT search engine at
http://www.matrixscience.com (Matrix Science Ltd., London, UK), with the following
search parameters: monoisotopic precursor mass tolerance: 18 ppm, fragment mass
tolerance: 1 Da, variable modifications: methionine oxidation, and no specified protease
cleavage site.
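To give a feeling for what a precursor tolerance of 18 ppm means in absolute terms, the sketch below converts a relative tolerance into a mass window. The function names are hypothetical, and the example uses the precursor at m/z 3271.69 reported in the Results; it does not reproduce the search engine's internal logic.

```python
def ppm_window(mz: float, tolerance_ppm: float):
    """Return the (low, high) m/z window for a relative tolerance in ppm."""
    delta = mz * tolerance_ppm / 1e6
    return mz - delta, mz + delta


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed m/z relative to a theoretical m/z, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Window around the precursor at m/z 3271.69 for an 18 ppm tolerance.
low, high = ppm_window(3271.69, 18.0)
half_width = (high - low) / 2  # roughly 0.059 Da
```

At this mass, 18 ppm corresponds to roughly 0.06 Da on either side of the precursor, which is far narrower than the 1 Da fragment tolerance.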
Results
Study population
At time of analysis, 45 patients (Erasmus: 19 pts, Radboud: 26 pts) had a recurrence or
had died and 37 patients (Erasmus: 5 pts, Radboud: 32 pts) were censored at a median
follow-up of 6.5 years (Erasmus: 7.8 years, Radboud: 6.3 years). Patient characteristics
are provided in Table 1. All patient characteristics were similarly distributed between
the samples obtained from the Erasmus Medical Center and the Radboud University
Medical Center, as determined by the Chi-square test or the Mann-Whitney U test.
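As an illustration of the comparison between centres, the sketch below applies Pearson's Chi-square test to the surgery counts listed in Table 1 (mastectomy versus breast conserving, by hospital). It assumes a test without continuity correction; the software and settings actually used by the authors are not stated here, so the result is indicative only.

```python
import math


def chi_square_2x2(table):
    """Pearson's Chi-square test for a 2x2 table, without continuity correction.

    Returns (statistic, two-sided p-value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of the Chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: Erasmus, Radboud. Columns: mastectomy, breast conserving (Table 1).
surgery_by_center = [[15, 9], [40, 18]]
chi2, p_value = chi_square_2x2(surgery_by_center)
```

Under these assumptions the p-value is well above 0.05, in line with the statement that surgery type was similarly distributed between the two hospitals.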
SELDI-TOF MS protein profiling
After selection of the peak clusters with stepBIC, a Cox proportional hazards model on
combined peak clusters identified in all measured fractions/ProteinChip arrays was
built. Out of the 400 peak clusters tested for inclusion into the model, four peak
clusters (m/z 3073 (F4/IMAC30), m/z 3273 (F4/IMAC30), m/z 4404 (F6/CM10), m/z
7973 (F5/CM10)) were selected due to their significant association with recurrence free
survival in the subpopulation (Table 2).
Table 1
Patient and tumour characteristics of the study population.

                                    Erasmus (n = 24)   Radboud (n = 58)   Total (n = 82)
                                    n (%)              n (%)              n (%)
Patient characteristics
  Age, mean [range]                 43.5 [26-54]       43.2 [28-54]       43.3 [26-54]
    < 40 years                      6 (25%)            15 (26%)           21 (26%)
    ≥ 40 years                      18 (75%)           43 (74%)           61 (74%)
  Menopausal status
    Premenopausal                   21 (88%)           52 (90%)           73 (89%)
    Postmenopausal                  3 (12%)            5 (9%)             8 (10%)
    Unknown                         0 (0%)             1 (1%)             1 (1%)
  Surgery
    Mastectomy                      15 (63%)           40 (69%)           55 (67%)
    Breast conserving               9 (37%)            18 (31%)           27 (33%)
  Treatment
    Conventional dose               13 (54%)           30 (52%)           43 (52%)
    High dose                       11 (46%)           28 (48%)           39 (48%)
Tumour characteristics
  Number of positive lymph nodes
    4-9                             15 (63%)           38 (66%)           53 (65%)
    ≥ 10                            9 (37%)            20 (34%)           29 (35%)
  Tumour size
    T1 (< 2 cm)                     7 (29%)            17 (29%)           24 (29%)
    T2 (2-5 cm)                     13 (54%)           34 (59%)           47 (57%)
    T3 (≥ 5 cm)                     4 (17%)            7 (12%)            11 (14%)
  Her2/neu status
    Negative                        13 (54%)           37 (64%)           50 (61%)
    Positive                        10 (42%)           19 (33%)           29 (35%)
    Unknown                         1 (4%)             2 (3%)             3 (4%)
  Oestrogen receptor status
    ER negative                     9 (38%)            23 (40%)           32 (39%)
    ER positive                     14 (58%)           35 (60%)           49 (60%)
    Unknown                         1 (4%)             0 (0%)             1 (1%)
  Progesterone receptor status
    PR negative                     10 (42%)           24 (41%)           34 (42%)
    PR positive                     13 (54%)           34 (59%)           47 (57%)
    Unknown                         1 (4%)             0 (0%)             1 (1%)
  Bloom-Richardson grade
    Grade I                         0 (0%)             7 (12%)            7 (9%)
    Grade II                        6 (25%)            18 (31%)           24 (29%)
    Grade III                       17 (71%)           32 (55%)           49 (60%)
    Unknown                         1 (4%)             1 (2%)             2 (2%)
Cox proportional hazards analysis on all known clinical parameters based on forward
entry revealed the parameters ‘age’, ‘number of positive lymph nodes’, and
‘progesterone receptor status’ to be significantly associated with recurrence free survival
in the total population. In addition, the parameter ‘treatment’ was included to correct
for the different treatment arms of the original clinical trial. In the Cox proportional
hazards model constructed on the total population by inclusion of the relevant clinical
parameters, age (≥ 40 years, p = 0.021), number of positive lymph nodes (≥ 10, p =
0.012), and progesterone receptor status (positive, p = 0.003) were found significantly
associated with recurrence free survival (Table 2).
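The relation between a reported hazard ratio, its 95% confidence interval and the p-value can be made explicit with a short sketch. It assumes the intervals in Table 2 are Wald intervals on the log-hazard scale, which is the usual convention but is not stated in the text; the function name is hypothetical.

```python
import math


def wald_from_hazard_ratio(hr: float, ci_low: float, ci_high: float, z_crit: float = 1.96):
    """Recover the Cox coefficient, its standard error, and the Wald test.

    Assumes a symmetric interval on the log scale: exp(beta +/- z_crit * se).
    Returns (beta, se, z, two-sided p-value).
    """
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, standard normal
    return beta, se, z, p_value


# Age >= 40 years in the clinical model: HR 0.44 (95% CI 0.22-0.88).
beta, se, z, p_value = wald_from_hazard_ratio(0.44, 0.22, 0.88)
```

For this example the recomputed p-value is about 0.02, close to the reported 0.021; the small difference is expected because the published hazard ratio and interval are rounded.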
Table 2
Multivariate proportional hazards analyses for the risk of recurrence on selected peak clusters,
before (model 1, subpopulation) and after (model 3, total study population) adjustment for relevant
clinical parameters, and on relevant clinical parameters solely (model 2, total study population).

Model 1 - peak clusters
Parameter          HR       (95% CI)        p-value
Peak cluster
  m/z 3073         3.17     (2.03-4.96)     < 0.001
  m/z 3273         10.18    (1.99-52.01)    0.005
  m/z 4405         0.02     (0.01-0.25)     0.003
  m/z 7973         0.05     (0.01-0.48)     0.010

Model 2 - clinical parameters
Parameter          HR       (95% CI)        p-value
Treatment
  CONV             1
  HD               1.59     (0.86-2.95)     0.140
Age
  < 40 yrs         1
  ≥ 40 yrs         0.44     (0.22-0.88)     0.021
No. of LN+
  ≥ 10             1
  4-9              0.44     (0.23-0.84)     0.012
PR status
  PR(-)            1
  PR(+)            0.40     (0.22-0.73)     0.003

Model 3 - combined
Parameter          HR       (95% CI)        p-value
Peak cluster
  m/z 3073         2.48     (1.78-3.48)     < 0.001
  m/z 3273         11.71    (2.05-66.90)    0.006
  m/z 4405         0.01     (0.01-0.17)     0.001
  m/z 7973         0.24     (0.03-1.78)     0.160
Treatment
  CONV             1
  HD               2.48     (1.24-4.98)     0.011
Age
  < 40 yrs         1
  ≥ 40 yrs         0.35     (0.16-0.74)     0.006
No. of LN+
  ≥ 10             1
  4-9              0.37     (0.19-0.72)     0.003
PR status
  PR(-)            1
  PR(+)            0.28     (0.14-0.55)     < 0.001

Abbreviations: BR grade: Bloom-Richardson grade, CONV: conventional dose arm, HD: high dose arm, LN+:
number of positive lymph nodes, PR: progesterone receptor status positive (+) and negative (-).
To investigate whether the prognostic value of the four peak clusters was independent
of confounding effects due to differences in the clinical parameters, a combined Cox
proportional hazards model including the four selected peaks and all relevant clinical
parameters was built on the total study population. Three of the four selected peak
clusters (i.e., m/z 3073, m/z 3273, and m/z 4404) remained significantly associated with
recurrence free survival in combination with the clinical variables (Table 2).
Furthermore, using multidimensional scaling, we investigated the influence of the
different collection protocols (allegedly used by the two hospitals) on the SELDI-TOF
MS serum protein profiles. As depicted in Figure 1 for F4/IMAC30, spectra of the sera
collected in the Erasmus Medical Center and the Radboud University Medical Center
are randomly distributed, indicating no structural differences in the SELDI-TOF MS
serum protein profiles of both hospitals. In addition, following Cox proportional
hazards analysis including one peak cluster, relevant clinical parameters and collection
center, all peak clusters except m/z 7973 remained (borderline) significant (i.e., m/z
3073: HR = 3.44, p = 0.046, m/z 3274: HR = 2.39, p = 0.051, m/z 4405: HR = 0.107, p <
0.001, and m/z 7973: HR = 0.35, p = 0.160).
Figure 1
MDS plot of Fraction 4/IMAC30 data (i.e., duplicate spectra) on Center of withdrawal (R: Erasmus,
N: Radboud, O: quality control sample).
Lastly, the reproducibility of the assay was investigated by calculation of the coefficient
of variation of all peak clusters with S/N > 2 detected in the quality control spectra (n =
24 per fraction/ProteinChip array type) (Figure 2). The median coefficient of variation
of the peak intensities following fractionation and SELDI-TOF MS analysis ranged from
13.4% to 24.2% for the different fractions/ProteinChip arrays investigated, with an
overall average CV of 20.2%.
Peptide identification
The MALDI serum / serum fraction peptide profiles obtained using C18 magnetic beads
were searched for the presence of prognostic SELDI peaks based on mass matching. Due
to the different chemistries used for peptide capture for SELDI-TOF MS (IMAC30 Cu)
and MALDI-TOF MS (C18), and to the mass limitations for direct fragmentation, we
were able to elucidate the identity of only one of the four candidate prognostic peak
clusters in the spectra of whole serum. The SELDI-TOF MS peak cluster at m/z 3274 was
detected by MALDI-TOF/TOF MS as MH+ ions at m/z 3271.69 (default calibration), and
identified by MALDI-TOF/TOF MS/MS (Figure 3) in conjunction with database
searching as a fragment of inter-alpha-trypsin inhibitor heavy chain 4 (ITIH4 658-688),
with a MASCOT score of 69 (expect: 0.0025).
Figure 2
Coefficient of variation (y-axis) of the peak clusters identified in the quality control sample (x-axis)
fractionated on fractionation plate 1 (black) and plate 2 (grey). Panels: CM10 F5, CM10 F6,
IMAC30 F3, and IMAC30 F4.
Discussion
In the current study, we investigated sera of 82 breast cancer patients procured after
surgery, but prior to the administration of adjuvant therapy, in search of novel
prognostic biomarkers. To facilitate detection of the low-abundant serum proteome,
sera were fractionated using anion-exchange chromatography, after which selected
fractions were analysed by SELDI-TOF MS. Resulting protein profiles were searched for
prognostic markers by appropriate bioinformatics tools. Considering solely the peak
clusters detected in the SELDI-TOF MS protein profiles, four peak clusters (i.e., m/z
3073, m/z 3274, m/z 4405, and m/z 7973) were found to bear significant prognostic
value. The m/z 3274 candidate marker was structurally identified as ITIH4 658-688 in
serum. Moreover, except for the m/z 7973 peak cluster, these peaks remained
independently associated with recurrence free survival upon multivariate Cox regression
analysis, including clinical parameters of known prognostic value in this study
population. Hence, investigation of the post-operative serum proteome by e.g.,
anion-exchange fractionation followed by SELDI-TOF MS analysis, is promising for the
detection of novel prognostic factors. However, given the rather limited study
population, validation of our results by analysis of similar, prospectively collected,
independent study populations is warranted to assess the true clinical applicability of
identified prognostic markers. In addition, structural identification of the other markers
will aid in elucidation of their role in breast cancer prognosis, as well as enable
development of absolute quantitative assays (e.g., (17)).
Figure 3
Annotated MALDI-TOF/TOF MS/MS spectrum of m/z 3271.69.
[Spectrum not reproduced: % Intensity (y-axis) versus Mass (m/z) (x-axis, 9.0 to 3456.0), with
annotated b-, c-, y- and z-type fragment ions.]
Metastases are thought to arise from clinically undetectable residual or micrometastatic
disease, activated by, among others, stroma-generated growth factors, early impediment of
immune surveillance and enhancement of angiogenesis (18-21). These early post-surgical host
response processes are potentially affected by surgical extirpation of the tumour, as this
disrupts the intricate interactions between malignant cells and physiological
tumour-control mechanisms (22;23). Hence, the early post-operative serum proteome can bear
prognostic information, since it reflects the host response processes that can play a key
role in metastatic progression. The candidate prognostic markers detected in the current
study therefore most likely correlate with this post-operative host response. In addition,
since all study participants were treated with adjuvant chemotherapy, these
differentially expressed proteins may also relate to the tumour phenotype and its
chemosensitivity. Nonetheless, the four candidate markers could also arise directly from
residual or micrometastatic disease. Considering the nadir in tumour burden following
surgery, however, serum concentrations of tumour-secreted proteins most likely are
well below the detection limit of the SELDI-TOF MS platform, even following serum
fractionation. Lastly, the four candidate prognostic markers can also result from
tumour-secreted proteases that process host-response proteins upon their exposure to
the tumour microenvironment (24;25). Since these modified host response proteins
generally are present at substantially higher circulatory concentrations than the
enzymes that process them upon their exposure to the tumour microenvironment, they
can be detected in blood by SELDI-TOF MS.
This latter hypothesis is in fact endorsed by the structural identity of the candidate m/z
3274 marker, i.e., the inter-alpha-trypsin inhibitor heavy chain 4 fragment ITIH4 658-688,
identified in serum. We previously found serum levels of this fragment decreased in
breast cancer compared to control (Gast et al., submitted). Other studies have detected
this fragment in serum as well, reporting either a lack of discriminative value (24;25), or
an increase in breast cancer compared to control (26). Most likely, these contradictory
findings originate from the heterogeneity of the different study populations
investigated, or from the postulated instability of ITIH4 fragments (24;26;27). In
addition, changes in the abundance of the m/z 3274 ITIH4 fragment have been found
to be associated with various types of cancer (e.g., prostate, breast, ovarian, colorectal, and
pancreatic cancer) (24-26). This evident lack of specificity does not hamper its use as
prognostic marker, however. The various serum ITIH4 fragments are currently
hypothesised to result from tumour-secreted proteases that process host response
proteins upon their exposure to the tumour microenvironment (24;26;28). Hence, in the
current study, the m/z 3274 marker could well originate from proteolytic activity
associated with residual (micrometastatic) disease. According to this hypothesis, this
candidate prognostic marker might possibly be applied in other malignancies as well, as
the protease activity has been shown to be cancer-type specific (24-26). Hence, future
validation studies should also include other types of malignancies.
Structural identification is imperative to investigate the origin and function of the other
three candidate biomarkers. In addition, given the rather limited study
population, results must be validated by analysis of an independent, similar sample set.
Such validation sets may prove difficult to obtain, however, given the extended
follow-up window needed to reliably investigate breast cancer prognosis. Fortunately,
cross-validation can already offer some indication of the generalisability of the
classification model. For instance, the performance of the multiprotein index
constructed by Goncalves et al. (10) declined from 83% during training to 72% after
cross-validation. This drop in performance indicates probable overfitting of the data,
which most likely is caused by the high number of proteins used for classification (n =
40) compared to the limited study population (n = 81). Conversely, we included only 4
protein peaks (and the 4 clinical parameters that best predicted recurrence free survival)
in our model to purposely preclude overfitting of the data to the model. Hence, despite
the lack of an independent validation set, we hypothesise this model to be generalisable
to similar, new study populations.
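The idea behind cross-validation mentioned above can be sketched as follows. This is a generic k-fold partition, not a procedure reported for the present model; the function name, the choice of 10 folds and the fixed seed are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are shuffled once, then dealt into k folds of near-equal size.
    Each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 82 patients split into 10 folds.
splits = list(k_fold_indices(82, 10))
```

A model would be fitted on each training set and evaluated on the matching test set; a large gap between training and cross-validated performance, as in the 83% versus 72% example above, points to overfitting.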
Because serum is generated by coagulation, its proteome is exposed to the proteases
involved in this cascade, as well as to those involved in the complement cascade,
activated upon clotting. Various pre-analytical parameters, such as sampling device,
clotting temperature, and storage time, can thus all exert a distinct influence on the
serum proteome. Since our study populations originated from two different hospitals
that allegedly used different sample collection protocols, our results could have been
influenced by the various pre-analytical factors. However, as depicted in Figure 1, we
did not observe such an influence on the protein profiles, indicating that the
investigated serum proteome most likely is rather robust to (small) differences in
collection protocols. Moreover, despite the different characteristics of the two study
groups, all peak clusters except m/z 7973 remained (borderline) significant after
inclusion of the collection center in the Cox proportional hazards model. The three
peaks are therefore of additional prognostic value, even if the different collection
centers are taken into account. The reliability of our results is furthermore endorsed by
the reproducibility of the assay (average CV: 20.2%), which is in good agreement with
previous reports (10;29).
Conclusion
In conclusion, using serum anion-exchange fractionation in combination with
SELDI-TOF MS analysis, we discovered 4 peak clusters, one of which was identified as serum
ITIH4 658-688, with significant prognostic value in a study population of 82 high-risk
primary breast cancer patients. Three peak clusters (including ITIH4 658-688) remained
significantly associated with recurrence free survival following inclusion of clinical
parameters. These results are promising, as the prognostic profile identified in the
current study could eventually improve therapeutic accuracy. However, the rather
limited study population requires extension and re-assessment in other, similar, study
populations, to confirm the performance of identified prognostic peak clusters. In
addition, the precise roles of the candidate markers in breast cancer remain to be
elucidated, as this could help in the identification of new molecular targets. Hence,
structural identification of the candidate markers is warranted, which would also enable
development of absolute quantitative assays. Lastly, profiling of other serum fractions on
different types of ProteinChip arrays might reveal more protein peaks with prognostic
value.
Acknowledgement
This study was supported by a grant of the Dutch Cancer Society (project NKI 2005-3421).
We gratefully acknowledge Annemieke van Winden for help with serum
fractionation.
References
(1) Jemal A, Siegel R, Ward E, Hao Y, Xu J, Murray T et al. Cancer statistics, 2008. CA Cancer J Clin 2008;
58(2):71-96.
(2) Goldhirsch A, Wood WC, Gelber RD, Coates AS, Thurlimann B, Senn HJ. Meeting highlights: updated
international expert consensus on the primary therapy of early breast cancer. J Clin Oncol 2003;
21(17):3357-3365.
(3) Polychemotherapy for early breast cancer: an overview of the randomised trials. Early Breast Cancer
Trialists' Collaborative Group. Lancet 1998; 352(9132):930-942.
(4) 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M et al. Gene expression profiling predicts
clinical outcome of breast cancer. Nature 2002; 415(6871):530-536.
(5) Foekens JA, Atkins D, Zhang Y, Sweep FC, Harbeck N, Paradiso A et al. Multicenter validation of a gene
expression-based prognostic signature in lymph node-negative primary breast cancer. J Clin Oncol 2006;
24(11):1665-1671.
(6) van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW et al. A gene-expression signature
as a predictor of survival in breast cancer. N Engl J Med 2002; 347(25):1999-2009.
(7) Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F et al. Gene-expression profiles to predict
distant metastasis of lymph-node-negative primary breast cancer. Lancet 2005; 365(9460):671-679.
(8) Banks RE, Dunn MJ, Hochstrasser DF, Sanchez JC, Blackstock W, Pappin DJ et al. Proteomics: new
perspectives, new biomedical opportunities. Lancet 2000; 356(9243):1749-1756.
(9) Hutchens TW, Yip TT. New desorption strategies for the mass spectrometric analysis of macromolecules.
Rapid Commun Mass Spectrom 1993; 7:576-580.
(10) Goncalves A, Esterni B, Bertucci F, Sauvan R, Chabannon C, Cubizolles M et al. Postoperative serum
proteomic profiles may predict metastatic relapse in high-risk primary breast cancer patients receiving
adjuvant chemotherapy. Oncogene 2006; 25(7):981-989.
(11) Ricolleau G, Charbonnel C, Lode L, Loussouarn D, Joalland MP, Bogumil R et al. Surface-enhanced laser
desorption/ionization time of flight mass spectrometry protein profiling identifies ubiquitin and ferritin
light chain as prognostic biomarkers in node-negative breast cancer tumors. Proteomics 2006;
6(6):1963-1975.
(12) Gast MC, van Tinteren H, Bontenbal M, van Hoesel QC, Nooij MA, Rodenhuis S et al. Haptoglobin
phenotype is not a predictor of recurrence free survival in high-risk primary breast cancer patients. BMC
Cancer 2009; 8:389.
(13) Anderson NL, Anderson NG. The human plasma proteome: history, character, and diagnostic prospects.
Mol Cell Proteomics 2002; 1(11):845-867.
(14) Hoffman SA, Joo WA, Echan LA, Speicher DW. Higher dimensional (Hi-D) separation strategies
dramatically improve the potential for cancer biomarker detection in serum and plasma. J Chromatogr B
Analyt Technol Biomed Life Sci 2007; 849(1-2):43-52.
(15) Rodenhuis S, Bontenbal M, Beex LV, Wagstaff J, Richel DJ, Nooij MA et al. High-dose chemotherapy
with hematopoietic stem-cell rescue for high-risk breast cancer. N Engl J Med 2003; 349(1):7-16.
(16) Jimenez CR, El Filali Z, Knol JC, Hoekman K, Kruyt FAE, Giaccone G et al. Automated serum peptide
profiling using novel magnetic C18 beads off-line coupled to MALDI-TOF-MS. Proteomics Clin Appl
2007; 1(6):598-604.
(17) Broek van den I, Sparidans RW, Schellens JH, Beijnen JH. Liquid chromatography/tandem mass
spectrometric method for the quantification of eight proteolytic fragments of ITIH4 with biomarker
potential in human plasma and serum. Rapid Commun Mass Spectrom 2008; 22(18):2915-2928.
(18) Demicheli R, Retsky MW, Swartzendruber DE, Bonadonna G. Proposal for a new model of breast cancer
metastatic development. Ann Oncol 1997; 8(11):1075-1080.
(19) Demicheli R, Retsky MW, Hrushesky WJ, Baum M, Gukas ID. The effects of surgery on tumor growth: a
century of investigations. Ann Oncol 2008; 19(11):1821-1828.
(20) Heimann R, Hellman S. Individual characterisation of the metastatic capacity of human breast
carcinoma. Eur J Cancer 2000; 36(13 Spec No):1631-1639.
(21) Pupa SM, Menard S, Forti S, Tagliabue E. New insights into the role of extracellular matrix during tumor
onset and progression. J Cell Physiol 2002; 192(3):259-267.
(22) Fisher B, Gunduz N, Coyle J, Rudock C, Saffer E. Presence of a growth-stimulating factor in serum
following primary tumor removal in mice. Cancer Res 1989; 49(8):1996-2001.
(23) Tagliabue E, Agresti R, Carcangiu ML, Ghirelli C, Morelli D, Campiglio M et al. Role of HER2 in
wound-induced breast carcinoma proliferation. Lancet 2003; 362(9383):527-533.
(24) Fung ET, Yip TT, Lomas L, Wang Z, Yip C, Meng XY et al. Classification of cancer types by measuring
variants of host response proteins using SELDI serum assays. Int J Cancer 2005; 115(5):783-789.
(25) Villanueva J, Shaffer DR, Philip J, Chaparro CA, Erdjument-Bromage H, Olshen AB et al. Differential
exoprotease activities confer tumor-specific serum peptidome patterns. J Clin Invest 2006;
116(1):271-284.
(26) Song J, Patel M, Rosenzweig CN, Chan-Li Y, Sokoll LJ, Fung ET et al. Quantification of fragments of
human serum inter-alpha-trypsin inhibitor heavy chain 4 by a surface-enhanced laser
desorption/ionization-based immunoassay. Clin Chem 2006; 52(6):1045-1053.
(27) Timms JF, Arslan-Low E, Gentry-Maharaj A, Luo Z, T'Jampens D, Podust VN et al. Preanalytic influence
of sample handling on SELDI-TOF serum protein profiles. Clin Chem 2007; 53(4):645-656.
(28) Villanueva J, Philip J, Chaparro CA, Li Y, Toledo-Crow R, DeNoyer L et al. Correcting common errors in
identifying cancer-specific serum peptide signatures. J Proteome Res 2005; 4(4):1060-1072.
(29) Albrethsen J. Reproducibility in protein profiling by MALDI-TOF mass spectrometry. Clin Chem 2007;
53(5):852-858.
Chapter 4
Protein profiling of tissue
Chapter 4.1
Detection of breast cancer by SELDI-TOF MS tissue and serum protein profiling
Marie-Christine W. Gast
Eric J. van Dulken
Thea K.G. van Loenen
Florine Kingma-Vegter
Johan Westerga
Claudie C. Flohil
Jaco C. Knol
Connie R. Jimenez
Carla H. van Gils
Lodewijk F.A. Wessels
Jan H.M. Schellens
Jos H. Beijnen
Submitted for publication
Abstract
Breast cancer is estimated to be the second leading cause of female cancer deaths in the
USA in 2008. Despite the advances made in cancer therapy, early detection remains the
best route to decrease overall (breast) cancer mortality. However, current modalities
(e.g., mammography) lack adequate performance to be applicable in early detection, and
new, improved, markers are urgently needed. In the past decade, novel markers were
extensively searched for in the proteome, using, among others, the surface-enhanced laser
desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF MS) technology.
The majority of SELDI-TOF MS studies have thus far investigated samples originating
from biorepositories, which are likely to suffer from variable adherence to collection
protocols, thereby hampering biomarker discovery. We therefore investigated breast
cancer (n = 75) and control (n = 26) serum and tissue samples, collected prospectively by
rigorous adherence to a strictly defined protocol, to discover novel breast cancer
associated markers. Sera were collected pre- and post-operatively, and both serum and
tissue samples were analysed by SELDI-TOF MS using two different array-types
(IMAC30 Ni and Q10 pH 8).
Following serum analyses, one Q10 peak cluster (m/z 3939) was found significantly
increased in breast cancer compared to control, while two IMAC30 peak clusters (m/z
4292 and m/z 4301) were significantly decreased in expression following surgery of the
breast cancer patients. Conversely, proteome analyses at the tumour level yielded 27
peak clusters, discriminative between breast cancer and control tissue. In addition,
several peak clusters gradually in- or decreased in intensity from healthy to benign to
cancer, or with increasing cancer stage, apparently visualising disease progression.
Constructed classification trees had a 10-fold cross validated performance of 67% to
87%. Two tissue peak clusters were identified as N-terminal albumin fragments. These
fragments are likely to have been generated by (breast) cancer specific proteolytic
activity in the tumour microenvironment. As such, they can potentially provide insight
into the pathophysiological mechanisms associated with, or underlying, breast cancer,
and aid in improving breast cancer diagnosis.
Diagnostic tissue and serum protein profiles for breast cancer
Introduction
Accounting for 26% of all new cancer cases, breast cancer is estimated to be the most
commonly diagnosed neoplasm among women in the USA in 2008 (1). After lung
cancer, it is projected to be the second leading cause of cancer deaths in the USA in 2008 (1).
Despite the substantial progress made in cancer therapy, the best route to decrease
overall mortality from (breast) cancer is through early detection, as cancer survival is
inversely proportional to disease stage at presentation (2). Unfortunately, due to a lack
of adequate detection methods, only 63% of breast cancers are confined to the breast at
the time of diagnosis (1). Although mammography currently is the most widely applied
imaging test, it has only limited predictive value in women with dense breast tissue and
small lesions. In addition, established serum tumour markers (e.g., Cancer Antigen 15.3)
lack adequate performance to be applicable in early detection, and are thus applied only
in monitoring therapy of advanced breast cancer or recurrence (3). Evidently, new
biomarkers for reliable detection of breast cancer, either individually or in conjunction
with existing modalities, are urgently needed.
With cancer being a genetic disease, these markers were initially searched for by the
investigation of the cancer genome and transcriptome. It is expected, however, that the
proteome will have complementary use in the detection of novel cancer markers (4).
Covering post-transcriptional as well as post-translational modifications, the proteome
provides a more dynamic and accurate reflection of a biological status, as it mirrors both
the intrinsic genetic programme of the cell and the impact of its immediate
environment (5). In search for novel cancer markers, the proteome has been
investigated by various techniques, including surface-enhanced laser
desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF MS) (6). As this
platform enables analysis of highly complex biological matrices (e.g., serum, tissue
lysates) in a high-throughput fashion, it has been used extensively for discovery of
novel serum markers for e.g., breast (7;8), colorectal (9;10), and renal cell carcinoma
(11;12).
The majority of SELDI-TOF MS studies published thus far in breast cancer have
investigated sera originating from biorepositories (7;8;13-15), allowing timely study
progression, since prospective sample collection usually takes years. Biorepositories
may, however, vary in their adherence to consistent sample processing protocols over
time, potentially affecting SELDI-TOF MS serum protein profiles (16). Although this
renders biorepository samples representative of routinely collected “real world”
samples, it may also seriously hamper biomarker detection, as breast cancer is a highly
complex disease, thereby complicating biomarker discovery in its own right. Hence, in
the current study, we aimed to discover novel high-performance SELDI-TOF MS
markers for breast cancer detection through comparison of breast cancer and control
sera, collected prospectively by rigorous adherence to a strictly defined protocol.
Moreover, we also investigated serum proteome transitions occurring after surgery,
since monitoring protein expression dynamics in response to surgery may help to better
understand breast cancer pathogenesis. Lastly, collection of tissue specimens from each
participant during surgery enabled us to perform biomarker discovery directly at the
tumour level. Identification of cancer associated proteins at the tumour level might
provide us with more insight into the pathophysiological mechanisms associated with, or
underlying, breast cancer.
Materials and methods
Study population
Women over 18 years of age presenting with an indication for (reductive) breast surgery
at the Department of (Plastic) Surgery of the Slotervaart Hospital (Amsterdam) were
asked to participate in this study. Of the 111 women asked to participate, 84
were diagnosed with breast cancer (BC). The control group (CON) consisted of 14
women diagnosed with benign breast disease (BBD), and 13 women with healthy breast
tissue (healthy control; HC). Ten participants were found ineligible according to the
inclusion criteria: 8 participants (BC: n = 7, BBD: n = 1) had prior malignancies, and in 2
participants (BC), sample collection failed. Serum was collected prior to surgery and at
least 2 weeks after surgery (but prior to the eventual administration of adjuvant
therapy). For 6 participants, no pre-surgery serum sample was collected (BC: n = 3, BBD:
n = 1, HC: n = 2). Post-surgery samples were not available for 10 participants (BC: n = 8,
BBD: n = 2), either because collection failed or because collection took place following
initiation of adjuvant therapy. Tissue sample collection was not successful in 14
participants (BC: n = 12, HC: n = 2).
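The participant accounting above can be tallied explicitly. The sketch below assumes that the missing serum and tissue samples all concern eligible participants, which the text implies but does not state; the variable names are chosen for illustration only.

```python
# Women asked to participate, by diagnosis group.
asked = {"BC": 84, "BBD": 14, "HC": 13}

# Ineligible: prior malignancies (BC 7, BBD 1) and failed sample collection (BC 2).
excluded = {"BC": 7 + 2, "BBD": 1, "HC": 0}

eligible = {group: asked[group] - excluded[group] for group in asked}
controls = eligible["BBD"] + eligible["HC"]

# Missing samples per sample type (assumed to concern eligible participants).
missing = {
    "pre_surgery_serum": {"BC": 3, "BBD": 1, "HC": 2},
    "post_surgery_serum": {"BC": 8, "BBD": 2, "HC": 0},
    "tissue": {"BC": 12, "BBD": 0, "HC": 2},
}

available = {
    sample: {group: eligible[group] - counts[group] for group in eligible}
    for sample, counts in missing.items()
}
totals = {sample: sum(groups.values()) for sample, groups in available.items()}
```

Under this assumption, 75 breast cancer patients and 26 controls remain, matching the numbers given in the Abstract, with 95 pre-surgery sera, 91 post-surgery sera and 87 tissue samples available.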
Serum and tissue collection was done following a strict procedure. According to the
manufacturer's protocol, blood samples were collected in 9.5 ml BD Vacutainer tubes
(Becton Dickinson, Breda, The Netherlands) and allowed to clot for exactly 30 min at
room temperature, after which they were centrifuged at 1500 g for 15 min at room
temperature. Sera were then immediately aliquoted and stored at -70°C. Tissue was
collected dry and tissue sections were snap frozen in liquid nitrogen immediately after
collection at the Department of Pathology, and stored in liquid nitrogen until analysis.
All serum and tissue samples were collected between April 2005 and December 2007,
after approval by the local medical ethics committee, with individuals’ written
informed consent.
Chemicals
All chemicals used were obtained from Sigma, St. Louis, MO, USA, unless stated
otherwise.
Diagnostic tissue and serum protein profiles for breast cancer
Preparation of tissue lysates
Snap frozen (whole) tissue sections were disintegrated in deep frozen state by
pulverisation with a Mikro-dismembrator II (Sartorius AG, Göttingen, Germany). First,
tissues were cut into smaller blocks, placed into a pre-cooled shaking flask with a
stainless steel ball and then pulverised in three rounds of shaking (55 sec) and cooling in
liquid nitrogen (3 min). About 10 mg of the resulting frozen tissue powder was then
added to 100 μl of each of two denaturation buffers (buffer 1: 9 M urea and 2% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonic acid (CHAPS), and buffer 2: 9
M urea, 2% CHAPS, and 1% dithiothreitol (DTT)). Lysates were stored at -70°C until
analysis, for which they were thawed on ice and centrifuged at 15000 rpm for 5 min.
The protein concentration of each supernatant was subsequently determined using the
2D-Quant Kit (GE Healthcare, Diegem, Belgium), following the manufacturer’s
instructions.
Serum and tissue protein profiling
Serum protein profiling was performed using the ProteinChip SELDI (PBS IIc) Reader
(Bio-Rad Labs, Hercules, CA, USA). Various chip chemistries, binding and washing procedures, and sample pretreatments were initially evaluated to determine which
procedure provided the best serum profiles in terms of number and resolution of
proteins. Both strong anion exchange (Q10) and Immobilized Metal Affinity Capture
(IMAC30) arrays were selected for further analysis. Throughout the assay, arrays were
assembled in a 96-well bioprocessor, which was shaken on a platform shaker at 250
rpm. Sample processing was manual, and all serum samples were randomly attributed to
one of 3 measurement series (per array type) before analysis. Each series was measured
in duplicate on one day and samples were allocated randomly to the arrays. Tissue
lysates were analysed in duplicate in one separate series, using the same procedures as
for serum. For each sample, the amount of lysate applied to the arrays was adjusted to
its protein concentration.
For the IMAC30 assay, arrays were charged twice with 50 µl 100 mM nickel sulphate
(Merck, Darmstadt, Germany) for 15 min, followed by three rinses with deionised
water (Braun, Emmenbrücke, Germany) and two equilibrations with 200 µl Phosphate
Buffered Saline (PBS; 0.01 M) pH 7.4 / 0.5 M sodium chloride / 0.1% TritonX-100
(binding buffer; sodium chloride from Merck) for 5 min. For the Q10 assay, arrays were
equilibrated twice with 200 µl 20 mM Tris-HCl buffer pH 8.0 / 0.1% TritonX-100
(binding buffer). Unfractionated serum samples were thawed on ice and denatured
twice; once for the IMAC30 assay (by 1:10 dilution in 9 M urea / 2% CHAPS), and once
for the Q10 assay (by 1:10 dilution in 9 M urea / 2% CHAPS / 1% DTT). Pretreated
samples were diluted 1:10 in binding buffer and randomly applied to the arrays. After a
30 min incubation, the arrays were washed twice with binding buffer and twice with
PBS pH 7.4 / 0.5 M sodium chloride (IMAC30) or 20 mM Tris-HCl buffer pH 8 (Q10)
for 5 min. Following a quick rinse with deionised water, arrays were air-dried. A 50%
solution of sinapinic acid (Bio-Rad Labs) in 50% acetonitrile (ACN; Biosolve,
Valkenswaard, The Netherlands) / 0.5% trifluoroacetic acid (TFA; Merck) was applied
twice (1.0 µl) to the arrays as the matrix. Following air-drying, the array was inserted in
a ProteinChip SELDI (PBS IIc) Reader. Using the ProteinChip Software v3.1 (Bio-Rad
Labs), data were collected between 0 and 100 kDa, averaging 80 laser shots with
intensity 147 (IMAC30) / 140 (Q10), detector sensitivity 5, and focus lag time of 746 ns.
Settings for tissue analysis were optimised independently, resulting in an average of 80
laser shots per spectrum at intensity 147 (IMAC30) / 144 (Q10), detector sensitivity 5,
and focus lag time of 746 ns. For mass accuracy, the instrument was calibrated on each
day of measurements with All-in-One peptide standard (Bio-Rad Labs).
Statistics and bioinformatics
Spectra of serum samples were processed per measurement series by the ProteinChip
Software v3.1 (Bio-Rad Labs). Following baseline subtraction, spectra were normalised
to the total ion current. Spectra with normalisation factors < 0.5 or > 2 were excluded
from further analysis. Next, the spectra of the three measurement series were merged in
one experiment file, and the Biomarker Wizard (BMW) software package was applied
for peak detection. Peaks were auto-detected when occurring in at least 15% of spectra
and when having a signal-to-noise (S/N) ≥ 4. Peak clusters were completed with peaks
with a S/N ≥ 1.5 in a cluster mass window of 0.4%. Peak information was subsequently
exported as spreadsheet files, and peak intensities from the duplicate analyses were
averaged. The sera were analysed on three consecutive days, and day of analysis is known to
influence spectral data (17;18). Merging the raw peak intensity data of the three
measurement series could therefore lead to spurious results. To prevent this, peak intensities were first log
transformed to obtain normal distributions. Per measurement series, the log
transformed peak intensities were converted to standard Z-values by subtracting the
mean and dividing by the standard deviation. The log-Z transformed data of the three
series were subsequently merged in one file. The T-test was then applied in the
comparison of the mean log-Z peak intensities between BC and CON (i.e., BBD and HC)
per time-point (i.e., pre- and post-surgery). Mean log-Z peak intensities of the different
time points (pre- vs. post-surgery) were compared by the paired T-test in each group
(BC and CON) separately, to preclude bias by group.
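The log-Z transformation described above can be sketched as follows. This is a minimal illustration, not the software used in this study: the function name and the example intensities are hypothetical, the natural logarithm is used (the resulting Z-values do not depend on the base), the sample standard deviation is assumed, and all intensities are assumed to be positive.

```python
import math
from statistics import mean, stdev

def log_z_per_series(intensities_by_series):
    """Log-transform peak intensities, then convert them to Z-values per series."""
    result = {}
    for series, values in intensities_by_series.items():
        logged = [math.log(v) for v in values]
        mu, sd = mean(logged), stdev(logged)  # sample SD: an assumption
        result[series] = [(x - mu) / sd for x in logged]
    return result

# Hypothetical intensities of one peak cluster on three measurement days.
# Day 2 reads twice as high as day 1 for otherwise identical samples.
example = {
    "day1": [10.0, 12.0, 15.0, 20.0],
    "day2": [20.0, 24.0, 30.0, 40.0],
    "day3": [5.0, 6.0, 7.5, 10.0],
}
z = log_z_per_series(example)
for series, values in z.items():
    print(series, [round(v, 3) for v in values])
```

Because a multiplicative day effect becomes an additive constant after log transformation, it is removed by subtracting the per-series mean, so the three hypothetical series yield the same Z-values and can be merged.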
All tissue samples were analysed within one measurement series per array type. Spectra
were pre-processed as described above, after which the BMW software package was
applied for peak detection. Peaks were auto-detected when occurring in at least 10% of
spectra and when having a S/N ≥ 3. Peak clusters were completed with peaks with S/N ≥
1 in a cluster mass window of 0.45%. Peak information was subsequently exported as
spreadsheet files, and peak intensities from the duplicate analyses were averaged.
Median peak intensities between BC and CON (i.e., BBD and HC) were compared using
the non-parametric Mann-Whitney U test (MWU). All p-values were corrected for
multiple testing by the Bonferroni method, by multiplying p-values with the number of
peak clusters detected and tested.
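As an illustration of this correction, the sketch below multiplies each p-value by the number of tests. The function name and the raw p-values are hypothetical, and capping the corrected value at 1 is a common convention that we assume here; it is not stated in the procedure above.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]

# Hypothetical raw p-values for three peak clusters.
raw = [0.001, 0.02, 0.5]
corrected = bonferroni(raw)
print(corrected)

# With 51 peak clusters tested, a raw p-value must fall below 0.05 / 51
# to remain significant at the 0.05 level after correction.
print(0.05 / 51)
```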
The classification performance of both serum and tissue protein profiles was assessed by
building classification trees with the Biomarker Patterns Software v5.0.1 (BPS; Bio-Rad Labs), inputting all peaks detected by the BMW. A ten-fold cross-validation was
performed to estimate the sensitivity and specificity of each tree.
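The principle of a cross-validated one-node classification tree can be sketched in a few lines. The code below is a simplified stand-in, not the BPS algorithm: it selects the cut-off on a single peak that maximises training accuracy (rather than a CART splitting criterion), and all names and intensities are hypothetical.

```python
import random

def fit_stump(x, y):
    """Pick the intensity cut-off and direction that classify most samples correctly."""
    values = sorted(set(x))
    cuts = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not cuts:
        raise ValueError("need at least two distinct intensities")
    best_correct, best_stump = -1, None
    for cut in cuts:
        for label_above in (0, 1):  # class predicted when intensity > cut
            correct = sum(
                (label_above if xi > cut else 1 - label_above) == yi
                for xi, yi in zip(x, y)
            )
            if correct > best_correct:
                best_correct, best_stump = correct, (cut, label_above)
    return best_stump

def predict(stump, xi):
    cut, label_above = stump
    return label_above if xi > cut else 1 - label_above

def cross_validate(x, y, k=10, seed=0):
    """Return k-fold cross-validated (sensitivity, specificity); label 1 = cancer."""
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    tp = tn = fp = fn = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        stump = fit_stump([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            predicted = predict(stump, x[i])
            if y[i] == 1:
                tp += predicted == 1
                fn += predicted == 0
            else:
                tn += predicted == 0
                fp += predicted == 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical, perfectly separable intensities: 20 controls and 30 cancers.
x = [1.0 + 0.1 * i for i in range(20)] + [4.0 + 0.1 * i for i in range(30)]
y = [0] * 20 + [1] * 30
sensitivity, specificity = cross_validate(x, y)
print(sensitivity, specificity)
```

Each sample is classified by a tree built without it, so the pooled sensitivity and specificity estimate performance on unseen samples rather than the fit to the training data.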
The breast cancer samples were collected over a longer time interval than the control
samples (Table 1), which, despite storage at -70°C (serum) and -196°C (tissue),
can potentially introduce bias (19-21). Therefore, all analyses were also performed in
subsets of the total study population, containing only breast cancer samples that were
matched to the control samples for sample storage duration.
Table 1
Patient and sample characteristics of diagnostic groups evaluable for serum protein profiling.

                                Breast cancer      Breast cancer      Benign             Healthy
                                (total)            (subgroup†)
N                               75                 26                 13                 13
Age (years), median [IQR]       57.4 [48.4-69.9]   55.1 [45.2-67.1]   40.8 [30.6-48.9]   43.4 [43.0-48.7]
Stage ‡                                                               n.a.               n.a.
  0                             3                  2
  1                             34                 9
  2A / 2B                       21 / 10            6 / 5
  3A / 3C                       3 / 3              2 / 2
  Unknown                       1                  0
Benign diagnosis                n.a.               n.a.                                  n.a.
  Mastopathy                                                          7
  Periductitis                                                        1
  Fibroadenoma                                                        5
Sample storage duration
(months), median [IQR]
  Pre-surgery sera              17.0 [10.7-23.2]   8.4 [5.1-12.3]     8.0 [3.1-9.6]      9.1 [5.5-12.0]
  Post-surgery sera             14.8 [8.8-20.4]    7.4 [4.1-10.9]     5.2 [2.1-8.2]      7.5 [4.5-10.6]

Abbreviations: IQR: interquartile range, n.a.: not applicable. † Subgroup: subgroup of breast cancer patients
matched to the control patients for sample storage duration. ‡ Stage: pathologically determined stage.
Finally, we also investigated whether the relationship between peak intensity and
breast cancer / surgery status was influenced by patients’ age or stage of disease. To this
end, breast cancer samples were split according to tertiles of patients’ age, or to stage of
disease (Stage 1 and 2). Peak intensity differences between the categories were
subsequently investigated by ANOVA and T-test statistics (pre- and post-surgery
serum), and the Kruskal-Wallis and Mann-Whitney U (MWU) tests (tissue). For breast
cancer/surgery status-associated peak clusters found significantly related to patients’ age
and/or stage of disease, the relationship between peak intensity and breast
cancer/surgery status was investigated in subgroups of age (i.e., < and > median age) and
stage (i.e., Stage 1 and 2).
Peptide identification
For identification purposes, peptides of interest were extracted from tissue lysates by
reversed-phase C18 magnetic beads (Dynabeads RPC18, Invitrogen, Breda, The
Netherlands) using a Kingfisher 96 pipetting robot (Thermo Fisher Scientific, Waltham,
MA, USA), according to the optimized protocol described in (22). Briefly, tissue lysates
were diluted in 0.1% TFA, after which the peptide content was bound to the beads. The
beads were subsequently washed with 0.1% TFA, and peptides were eluted with 50% ACN. Eluate
(1.5 µl) was mixed with α-cyano-4-hydroxy-cinnamic acid matrix (1.5 µl), after which
the mixture was spotted (0.7 µl) on a MALDI target plate. Analyses were performed on
a 4800 MALDI-TOF/TOF mass spectrometer (Applied Biosystems, Foster City, CA,
USA). Fragment ion spectra resulting from TOF/TOF analyses were searched manually
for b- and y-ion (related) peaks.
Results
Study population
The characteristics of all assessable study participants are described in Table 1 (serum)
and 2 (tissue). At the time of inclusion, the breast cancer patients were significantly
older than the controls in both the total and matched study population (MWU; p <
0.001). The majority of breast cancer patients was diagnosed with Stage 1 (45%) or Stage
2 (41%) disease. The cancer samples were collected during a longer time interval than
the control samples, resulting in significantly different sample storage times. Storage
durations were not significantly different between both groups in the matched study
population.
Serum protein profiling
Representative serum and tissue SELDI-TOF MS spectra are presented in Figure 1 and 2.
Following serum IMAC30 analyses, the Biomarker Wizard detected 51 peak clusters.
None of the detected peak clusters were found significantly different in expression
between breast cancer and control in either the total study population or the matched
subgroup. Comparing pre- and post-surgery samples, significant differences were
observed in the breast cancer group, as intensities of the m/z 4285 peak cluster and its
oxidised form at m/z 4301 significantly decreased following surgery (MWU; Bonferroni
corrected p = 0.042, and 0.001, respectively, Figure 3).
The Biomarker Wizard detected 54 peak clusters following Q10 serum analyses. In the
pre-surgery samples, intensities of the m/z 3939 peak cluster were found significantly
increased in breast cancer compared to control in the total study and matched
population (T-test; Bonferroni corrected p = 0.003, and 0.022, respectively). Post-surgery m/z 3939 peak intensities were similar between cancer and control (Figure 3),
as were all other peak clusters. None of the peak clusters were found significantly
different in intensity between the pre- and post-surgery samples in either one of the
diagnostic groups.
Table 2
Patient and sample characteristics of diagnostic groups evaluable for tissue protein profiling.

                                Breast cancer      Breast cancer      Benign             Healthy
                                (total)            (subgroup†)
N                               63                 22                 13                 11
Age (years), median [IQR]       59.1 [48.7-70.6]   56.2 [45.6-76.3]   41.2 [30.9-49.3]   36.6 [43.1-48.3]
Stage ‡                                                               n.a.               n.a.
  0                             2                  2
  1                             28                 8
  2A / 2B                       17 / 10            5 / 3
  3A / 3C                       3 / 3              2 / 2
Benign diagnosis                n.a.               n.a.                                  n.a.
  Mastopathy                                                          7
  Periductitis                                                        1
  Fibroadenoma                                                        5
Sample storage duration
(months), median [IQR]          17.5 [10.1-25.4]   10.3 [7.4-13.8]    9.3 [4.8-11.1]     8.8 [6.4-13.4]

Abbreviations: IQR: interquartile range, n.a.: not applicable. † Subgroup: subgroup of breast cancer patients
matched to the control patients for sample storage duration. ‡ Stage: pathologically determined stage.
None of the three significant peak clusters (i.e., m/z 4285, 4301, and 3939) were found
related to patients’ age (ANOVA; p > 0.05) or stage of disease (T-test; p > 0.05) in
either the pre- or post-surgery samples. In addition, no suitable classification trees were
obtained in either the IMAC30 or Q10 serum data (10-fold cross validated performance
< 66%, data not shown).
Tissue protein profiling
Of the 114 peak clusters detected by the Biomarker Wizard in the IMAC30 spectra, 20
were found significantly different in intensity between breast cancer and control. None
of these 20 peak clusters were found related to patients’ age (Kruskal-Wallis test; p >
0.05), while 10 peak clusters were found significantly discriminative in the matched
sample set as well (Table 3). Some of these peaks showed a gradual increase (e.g., m/z
6833) or decrease (e.g., m/z 4505) in peak intensity from HC to BBD to BC (Figure 4).
We also observed some clusters (e.g., m/z 9518) to gradually change in intensity with
increasing cancer stage (Figure 4), though none of the peak clusters were found related
to stage of disease (MWU test; p > 0.05).
Figure 1
Representative example of serum IMAC30 (pre- vs. post-surgery) and Q10 (pre-surgery BC vs. HC)
protein profiles.
[Graphic omitted. Panels: BC pre-surgery (IMAC30), BC post-surgery (IMAC30), HC pre-surgery (Q10), BC pre-surgery (Q10); x-axis: m/z (4000-8000); labelled peaks: 4285.4+H and 4301.8+H (IMAC30), 3938.5+H (Q10).]
In the Q10 spectra, 113 peak clusters were detected by the Biomarker Wizard. Of the 27
peak clusters found significantly different in intensity between breast cancer and
control in the total sample set, 17 were also significantly different in the matched
sample set (Table 4). None of the significant peak clusters were found related to
patients’ age (Kruskal-Wallis test; p > 0.05). Similar to the IMAC30 data, some peaks
showed a gradual increase or decrease (e.g., m/z 7286) in peak intensity going from HC
to BBD to BC (Figure 5). Again, some clusters were found to gradually increase (e.g.,
m/z 9745) or decrease (e.g., m/z 2612) in intensity with increasing cancer stage (Figure
5). However, intensities of none of the significant peak clusters were found related to
stage of disease (MWU test; p > 0.05).
Optimal discrimination between breast cancer and control was achieved by one-node
classification trees, applying either the m/z 5430 (IMAC30) or the m/z 19899 (Q10) peak
cluster in both the total and the matched study population. Ten-fold cross-validated
performance of classification trees ranged from 67% to 87% (data not shown).
Peptide identification
The MALDI tissue peptide profiles obtained using C18 magnetic beads were searched
for the presence of discriminative SELDI peaks based on mass matching. Due to the
different chemistries used for peptide capture for SELDI-TOF MS (Q10 and IMAC30
Ni) and MALDI-TOF MS (C18), and to the mass limitations for direct fragmentation, we
were able to elucidate the identity of only two Q10 tissue lysate peak clusters found
significantly different in expression between cancer and control. The SELDI-TOF MS
peak clusters at m/z 3090 and m/z 4169 were detected by MALDI-TOF/TOF MS as MH+
ions at m/z 3084.80 and m/z 4163.04 (default calibration), respectively. Using one a-ion
peak, eleven b-ion peaks, one c-ion peak, and three y-ion peaks detected by MALDI-TOF/TOF MS/MS (Figure 6), m/z 3084.80 was identified manually as the albumin(25-51)
fragment DAHKSEVAHRFKDLGEENFKALVLIAF (theoretical monoisotopic mass
3083.62 Da, pI 6.04). Six of the b-ion peaks were also identified in the MALDI-TOF/TOF MS/MS spectrum of m/z 4163.04, explaining the major peaks in the spectrum.
The m/z 4169/4163.04 peptide corresponds to the N-terminal albumin(25-60) fragment
(sequence DAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPF, theoretical
monoisotopic mass 4162.11 Da, and pI 6.04). On the SELDI platform, the peak
intensities of the discriminative m/z 3549, 3563, and 3711 peak clusters were found
highly correlated to the peak intensities of the N-terminal m/z 3090 and 4169 albumin
fragments (Spearman’s R ≥ 0.75), suggesting structural homology between these peak
clusters, or cleavage by the same protease acting on a different substrate.
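The theoretical masses quoted above can be reproduced from the two sequences. The sketch below is an illustration only (the identification itself was done manually); it uses standard monoisotopic residue masses, and the function and constant names are ours.

```python
# Standard monoisotopic residue masses (Da) for the amino acids that occur
# in the two albumin fragments.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "H": 137.05891, "F": 147.06841, "R": 156.10111,
    "Y": 163.06333,
}
WATER = 18.01056   # added once per peptide (free N- and C-terminus)
PROTON = 1.00728   # charge carrier for singly protonated ions

def monoisotopic_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def b_ion(sequence, n):
    """m/z of the singly charged b-ion containing the first n residues."""
    return sum(RESIDUE_MASS[aa] for aa in sequence[:n]) + PROTON

ALB_25_51 = "DAHKSEVAHRFKDLGEENFKALVLIAF"
ALB_25_60 = "DAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPF"

print(round(monoisotopic_mass(ALB_25_51), 2))           # neutral mass, 25-51
print(round(monoisotopic_mass(ALB_25_60), 2))           # neutral mass, 25-60
print(round(monoisotopic_mass(ALB_25_51) + PROTON, 2))  # MH+ of 25-51
print([round(b_ion(ALB_25_51, n), 2) for n in (3, 4, 5, 6)])
```

The computed neutral masses are 3083.62 Da and 4162.11 Da, in agreement with the values given above. Because both fragments share the same N-terminus, every b-ion of the shorter fragment has the same m/z in the longer one, which is consistent with the six b-ion peaks found in both MS/MS spectra.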
Figure 2
Representative example of tissue IMAC30 and Q10 protein profiles (BC vs. HC).
[Graphic omitted. Panels: HC (IMAC30), BC (IMAC30), HC (Q10), BC (Q10); x-axis: m/z (4000-8000); labelled peaks: 4504.6+H (IMAC30), 2610.5+H (Q10).]
Discussion
In the current study, we analysed pre- and post-surgery serum and tissue samples of
breast cancer patients (n = 75) and controls (n = 26) using the SELDI-TOF MS
technology. All samples were collected prospectively by rigorous adherence to a strictly
defined protocol. Following serum analyses, one Q10 peak cluster (m/z 3939) was found
significantly increased in breast cancer compared to control, while IMAC30 analyses
revealed two peak clusters (m/z 4285 and m/z 4301) that significantly decreased in
expression following surgery of the breast cancer patients. In contrast, tissue analyses
yielded 10 IMAC30 and 17 Q10 peak clusters with a significantly different expression
between cancer and control. A number of peak clusters gradually increased or decreased
in intensity with increasing cancer stage, or from healthy to benign to cancer (Figure 4
and 5). Ten-fold cross validation performances of the various classification trees ranged
from 67% to 87%.
Figure 3
Intensities of the significantly different peak clusters detected in the IMAC30 (m/z 4285 and m/z
4301) and Q10 (m/z 3939) serum analysis (y-axis: log-Z transformed peak intensity, x-axis: group
CON (control), and BC (breast cancer)).
[Graphic omitted. Panels: m/z 4285 (p = 0.042), m/z 4301 (p = 0.001), m/z 3939 (p = 0.003).]
Serum analyses
Although one peak cluster (m/z 3939) differed significantly in expression between
groups (BC vs. CON), the serum protein profiles obtained did not allow a
satisfactory classification of samples. The detection of discriminating peak clusters
might have been hampered by our limited study population, as well as by the
composition of the diagnostic groups investigated. Since our control group contains
benign breast disease patients, and our cancer group contains predominantly early stage
disease, the diagnostic groups are less divergent, which is likely to have impeded group
distinction. Nonetheless, comparison of breast cancer to solely benign breast disease or
healthy control did not yield significantly different peak clusters (data not shown), a
finding most likely caused by the very limited sample sizes of the respective control
groups (BBD: n = 13, HC: n = 11).
Since monitoring protein expression dynamics in response to surgery may help to better
understand breast cancer pathogenesis, we also investigated serum proteome transitions
occurring after surgery. To this end, protein profiles of sera procured prior to and
following surgery were compared. Using paired statistics, we aimed to detect intra-individual differences, which might otherwise be masked by inter-individual variation
(23). Both m/z 4285 and m/z 4301 peak clusters were observed to significantly decrease
following surgery. Since this effect was detected solely in the breast cancer group (n =
75), we hypothesise both peak clusters to be cancer-associated.
Table 3
Significantly different peak clusters detected in the IMAC30 tissue protein profiles.

Peak cluster   Total study population      Matched study population
(m/z)          p (MWU)     Peak ratio†     p (MWU)     Peak ratio†
3347           0.001       0.30            n.s.        0.41
3545           0.008       1.86            n.s.        1.46
3600           0.001       18.62           n.s.        17.38
3897           0.028       4.24            0.021       5.21
4207           0.020       3.47            0.002       4.75
4505           0.001       0.30            0.006       0.26
4876           < 0.001     5.96            n.s.        5.49
5225           0.003       4.81            n.s.        4.97
5278           < 0.001     0.23            0.027       0.26
5431           < 0.001     0.31            0.027       0.36
6833           0.002       4.87            0.027       5.64
7950           0.009       0.46            n.s.        0.47
9518           < 0.001     2.49            0.009       2.91
10938          < 0.001     7.00            0.014       4.80
15296          0.002       0.44            n.s.        0.48
15895          0.006       0.50            n.s.        0.53
16076          < 0.001     0.47            0.049       0.49
31199          < 0.022     0.45            n.s.        0.48
31818          0.026       0.42            n.s.        0.47
66656          0.004       0.24            0.039       0.24

Abbreviations: MWU: Mann-Whitney U test BC vs. CON, Bonferroni corrected p-value, n.s.: not significant.
† Peak ratio: average peak intensity in breast cancer spectra divided by the average intensity in control
spectra.
The m/z 4285 and m/z 4301 peak clusters have both been reported previously as
diagnostic markers for breast cancer (8;15;24-26). They have been identified as a putative ITIH4
fragment (15;25) and its oxidised form, and previous studies have described either a
decreased (8;24;26) or increased (15;25) serum m/z 4285 ITIH4 expression in breast
cancer. Most likely, these contradictory findings originate from the postulated
instability of ITIH4 fragments (25;27;28), as well as the heterogeneity of the different
study populations investigated. Despite these considerations, ITIH4 fragments are
currently understood to have cancer-type specific marker utility, as they are
hypothesised to be generated by proteases contributed by cancer cells (25;27;29).
Tissue analysis
In tissue, we detected several peak clusters to differ significantly in expression between
cancer and control. Of the 47 discriminative peak clusters observed in the total study
population, 27 were also found significantly different in the matched subgroup. This
discrepancy between the total and the matched study population is most likely due to
the limited sample size, as effect estimates (i.e., peak ratios) are similar between both
populations (see Table 3 and 4). Regarding the 67 to 87% performance of constructed
classification trees, the tissue protein profiles had improved diagnostic utility compared
to the serum protein profiles.
Figure 4
Intensities of the significantly different peak clusters detected in the IMAC30 tissue analysis (y-axis: peak intensity, x-axis: group HC (healthy control), BBD (benign breast disease), and BC 1-3
(breast cancer stage 1-3)).
[Graphic omitted. Panels: m/z 6833, m/z 4505, m/z 9518.]
Figure 5
Intensities of the significantly different peak clusters detected in the Q10 tissue analysis (y-axis:
peak intensity, x-axis: group HC (healthy control), BBD (benign breast disease), and BC 1-3 (breast
cancer stage 1-3)).
[Graphic omitted. Panels: m/z 7286, m/z 2612, m/z 9745.]
By analogy to the linear model of the development of colon cancer (i.e., the adenoma-carcinoma sequence), breast lesions are believed to progress in a linear fashion through
the sequential stages of normal epithelium, to usual hyperplasia (without atypia), to
atypical hyperplasia, to carcinoma in situ, and, ultimately, invasive breast cancer
(30;31). The progression to malignant breast disease is associated with accumulation of
an increasing number of genetic mutations (32), as well as changes in the expression of
cell cycle-related and apoptosis-related proteins (33). This continuum of breast
alterations appears to be visualised by the peak clusters that gradually increase or
decrease in intensity from healthy to benign to cancer, and with increasing cancer
stage.
Table 4
Significantly different peak clusters detected in the Q10 tissue protein profiles.

Peak cluster   Total study population      Matched study population
(m/z)          p (MWU)     Peak ratio†     p (MWU)     Peak ratio†
1871           0.048       0.62            n.s.        0.62
2021           0.011       2.93            n.s.        2.31
2074           0.005       0.38            n.s.        0.44
2089           0.005       0.36            n.s.        0.38
2504           < 0.001     0.39            0.006       0.37
2612           < 0.001     0.15            0.020       0.24
2959           0.018       2.68            n.s.        2.64
2976           0.021       2.19            n.s.        1.84
3091           < 0.001     0.38            0.022       0.35
3201           0.002       2.38            n.s.        2.30
3298           < 0.001     0.32            0.018       0.30
3326           < 0.001     0.36            0.022       0.34
3549           < 0.001     0.33            0.006       0.26
3563           < 0.001     0.28            0.006       0.25
3711           < 0.001     0.34            0.015       0.32
3987           0.424       0.41            n.s.        0.45
4169           0.006       0.33            0.038       0.27
4857           < 0.001     0.36            0.029       0.36
7286           < 0.001     0.29            0.042       0.47
7339           0.049       0.54            n.s.        0.55
9745           < 0.001     3.87            0.002       4.20
9958           0.002       2.14            0.024       2.26
12173          < 0.001     8.53            0.003       4.57
12636          < 0.001     2.58            0.010       2.89
16804          0.012       1.80            n.s.        1.93
19899          < 0.001     4.17            < 0.001     4.57
35988          < 0.001     1.81            0.029       2.04

Abbreviations: MWU: Mann-Whitney U test BC vs. CON, Bonferroni corrected p-value, n.s.: not significant.
† Peak ratio: average peak intensity in breast cancer spectra divided by the average intensity in control
spectra.
Of the discriminative peak clusters detected in the Q10 tissue lysate analyses, m/z 3090
and m/z 4169 were identified as differentially truncated N-terminal albumin fragments.
Three other discriminative peaks (m/z 3549, 3563, and 3711) were found highly
correlated to these peaks, indicating structural similarity or cleavage by the same
protease. Their diagnostic value could, however, possibly be compromised by the
continuous aspecific proteolytic (albumin) degradation known to occur during
prolonged storage at -30°C. For example, we previously found the expression of the m/z
3090 albumin(25-51) fragment to be significantly associated with storage duration of breast
cancer sera at -30°C. Peak intensities were found to increase up to approximately 5
years of storage, after which they gradually decreased. A similar association to sample
storage duration was observed in another study by our group, in which we investigated
sera of colorectal cancer patients stored for 1.4 years at -30°C (19). In the current study,
however, the discriminative N-terminal albumin fragments were detected in tissue
lysate, rather than in serum. All tissue specimens were snap frozen immediately after
surgical excision, after which they were stored in liquid nitrogen (-196°C), further
limiting in vitro proteolytic activity compared to -30°C. Indeed, the albumin fragments
were found significantly discriminative in both the total and matched study population,
indicating a lack of association to sample storage duration. This finding was confirmed
by the observation that the albumin fragments were discriminative to the same extent
in both the total study population and the sub-population matched for sample storage
duration. Hence, the albumin fragments observed in the current study most likely are
(breast) cancer specific.
Figure 6
Annotated MALDI-TOF/TOF MS/MS spectrum of m/z 3084.80.
[Graphic omitted. y-axis: % intensity, x-axis: mass (m/z); annotated ions: a11, b3, b4, b5, b6, b7, b8, b10, b12, b13, b16, b17, c12, y19, y21, y26, and MH+ (3084.6667).]
Although the N-terminal albumin fragments have not been described in breast cancer
hitherto, it is currently understood that proteolytic fragments of high-abundant serum
proteins can bear cancer(type)-specificity (27;34). Tumourigenesis is associated with
changes in the balance between proteases and protease inhibitors that are secreted by
the tumour in its microenvironment (35-37). These enzymes cannot yet be applied in
cancer detection, as they generally do not reach detectable levels in the circulation (27).
However, since they enzymatically process high-abundant host-response proteins,
specific proteolytic fragments thereof can serve as surrogate biomarkers for the presence
and action of underlying proteases. Indeed, many cancer-specific proteolytic fragments
of high-abundant host-response proteins (e.g., ITIH4, fibrinogen, apolipoproteins, and
complement components) have been detected in serum (25;27;34;38), similar to the
discriminative serum ITIH4 fragment detected in the current study. As the breast
tumour microenvironment is known to exhibit various changes in the amount and
activity of proteolytic enzymes (35;39), the differential albumin fragments discovered in
the current study could well result from such specific proteolytic activity. Protease
levels have been found proportional to tumour size, and hence, tumour stage (40). In
the current study, we investigated predominantly early stage disease, suggesting limited
protease activity in the tumour microenvironment, which might explain the lack of
detection of these albumin fragments in serum. Yet, these fragments might well be
detected in blood by other, more sensitive and specific analytical methods. However,
upon translation into a diagnostic serum assay, care should be taken to prevent bias by
pre-analytical parameters. Nevertheless, provided that these albumin fragments are
validated in independent study populations, they can potentially offer further insight
into the pathophysiological mechanisms associated with, or underlying, breast cancer.
Similarly, structural identification of the other discriminative peak clusters observed in
tissue is warranted to assess their potential role in breast cancer. Lastly, these
discriminative proteins can potentially aid in improving accurate diagnosis of breast
cancer.
Conclusion
In conclusion, though we detected some discriminative peak clusters following serum
analyses, constructed classification models had moderate performances. Likely
hampered by the highly complex nature of breast cancer (41), the current approach
appears not sensitive enough to reliably detect the cancer-specific markers that are
allegedly present in the low-abundant serum proteome (42;43). Analysis at the tumour
level, however, yielded several peak clusters with a significantly different expression
between breast cancer and control. Two discriminative peak clusters were identified as
N-terminal albumin fragments. Presumably generated by (breast) cancer-specific
proteolytic activity in the tumour microenvironment, these albumin fragments can
potentially offer further insight into the pathophysiological mechanisms associated
with, or underlying, breast cancer, and improve accurate breast cancer diagnosis,
provided that these fragments are validated in independent study populations.
Acknowledgement
We gratefully acknowledge Marian van der Linde (mammacare nurse), the Department
of Clinical Chemistry, and the Department of Anaesthesiology for help with collection,
preparation and storage of serum and tissue samples.
References
(1) Jemal A, Siegel R, Ward E, Hao Y, Xu J, Murray T et al. Cancer statistics, 2008. CA Cancer J Clin 2008;
58(2):71-96.
(2) Ries LA, Melbert D, Krapcho M, Mariotto A, Miller BA, Feuer EJ et al. SEER Cancer Statistics Review,
1975-2004. http://seer.cancer.gov/csr/1975_2004/ . 2008.
(3) Stieber P, Molina R, Chan DW, Fritsche HA, Beyrau R, Bonfrer JM et al. Clinical evaluation of the
Elecsys CA 15-3 test in breast cancer patients. Clin Lab 2003; 49(1-2):15-24.
(4) Engwegen JY, Gast MC, Schellens JH, Beijnen JH. Clinical proteomics: searching for better tumour
markers with SELDI-TOF mass spectrometry. Trends Pharmacol Sci 2006; 27(5):251-259.
(5) Banks RE, Dunn MJ, Hochstrasser DF, Sanchez JC, Blackstock W, Pappin DJ et al. Proteomics: new
perspectives, new biomedical opportunities. Lancet 2000; 356(9243):1749-1756.
(6) Hutchens TW, Yip TT. New desorption strategies for the mass spectrometric analysis of macromolecules.
Rapid Commun Mass Spectrom 1993; 7:576-580.
(7) Belluco C, Petricoin EF, Mammano E, Facchiano F, Ross-Rucker S, Nitti D et al. Serum Proteomic
Analysis Identifies a Highly Sensitive and Specific Discriminatory Pattern in Stage 1 Breast Cancer. Ann
Surg Oncol 2007; 14(9):2470-2476.
(8) Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW. Proteomics and bioinformatics approaches for
identification of serum biomarkers to detect breast cancer. Clin Chem 2002; 48(8):1296-1304.
(9) Engwegen JY, Helgason HH, Cats A, Harris N, Bonfrer JM, Schellens JH et al. Identification of serum
proteins discriminating colorectal cancer patients and healthy controls using surface-enhanced laser
desorption ionisation-time of flight mass spectrometry. World J Gastroenterol 2006; 12(10):1536-1544.
(10) Habermann JK, Roblick UJ, Luke BT, Prieto DA, Finlay WJ, Podust VN et al. Increased serum levels of
complement C3a anaphylatoxin indicate the presence of colorectal tumors. Gastroenterology 2006;
131(4):1020-1029.
(11) Engwegen JY, Mehra N, Haanen JB, Bonfrer JM, Schellens JH, Voest EE et al. Validation of SELDI-TOF
MS serum protein profiles for renal cell carcinoma in new populations. Lab Invest 2007; 87(2):161-172.
(12) Tolson J, Bogumil R, Brunst E, Beck H, Elsner R, Humeny A et al. Serum protein profiling by SELDI
mass spectrometry: detection of multiple variants of serum amyloid alpha in renal cancer patients. Lab
Invest 2004; 84(7):845-856.
(13) Becker S, Cazares LH, Watson P, Lynch H, Semmes OJ, Drake RR et al. Surfaced-enhanced laser
desorption/ionization time-of-flight (SELDI-TOF) differentiation of serum protein profiles of BRCA-1
and sporadic breast cancer. Ann Surg Oncol 2004; 11(10):907-914.
(14) Hu Y, Zhang S, Yu J, Liu J, Zheng S. SELDI-TOF-MS: the proteomics and bioinformatics approaches in
the diagnosis of breast cancer. Breast 2005; 14(4):250-255.
(15) Li J, Orlandi R, White CN, Rosenzweig J, Zhao J, Seregni E et al. Independent Validation of Candidate
Breast Cancer Serum Biomarkers Identified by Mass Spectrometry. Clin Chem 2005; 51(12):2229-2235.
(16) Banks RE. Preanalytical influences in clinical proteomic studies: raising awareness of fundamental issues
in sample banking. Clin Chem 2008; 54(1):6-7.
(17) Karsan A, Eigl BJ, Flibotte S, Gelmon K, Switzer P, Hassell P et al. Analytical and preanalytical biases in
serum proteomic pattern analysis for breast cancer diagnosis. Clin Chem 2005; 51(8):1525-1528.
(18) Hu J, Coombes KR, Morris JS, Baggerly KA. The importance of experimental design in proteomic mass
spectrometry experiments: some cautionary tales. Brief Funct Genomic Proteomic 2005; 3(4):322-331.
(19) Engwegen JY, Alberts M, Knol JC, Jimenez CR, Depla ACTM, Tuynman H et al. Influence of variations
in sample handling on SELDI-TOF MS serum protein profiles for colorectal cancer. Proteomics Clin
Appl 2008; 2(6):936-945.
(20) Hsieh SY, Chen RK, Pan YH, Lee HL. Systematical evaluation of the effects of sample collection
procedures on low-molecular-weight serum/plasma proteome profiling. Proteomics 2006; 6(10):3189-3198.
(21) Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD et al. HUPO Plasma Proteome
Project specimen collection and handling: towards the standardization of parameters for plasma
proteome samples. Proteomics 2005; 5(13):3262-3277.
(22) Jimenez CR, El Filali Z, Knol JC, Hoekman K, Kruyt FAE, Giaccone G et al. Automated serum peptide
profiling using novel magnetic C18 beads off-line coupled to MALDI-TOF-MS. Proteomics Clin Appl
2007; 1(6):598-604.
(23) Kasthuri RS, Verneris MR, Ibrahim HN, Jilma B, Nelsestuen GL. Studying multiple protein profiles over
time to assess biomarker validity. Expert Rev Proteomics 2006; 3(4):455-464.
(24) Mathelin C, Cromer A, Wendling C, Tomasetto C, Rio MC. Serum biomarkers for detection of breast
cancers: a prospective study. Breast Cancer Res Treat 2005;1-8.
(25) Song J, Patel M, Rosenzweig CN, Chan-Li Y, Sokoll LJ, Fung ET et al. Quantification of fragments of
human serum inter-alpha-trypsin inhibitor heavy chain 4 by a surface-enhanced laser
desorption/ionization-based immunoassay. Clin Chem 2006; 52(6):1045-1053.
(26) van Winden AWJ, Gast MCW, Beijnen JH, Rutgers EJ, Grobbee DE, Peeters PHM et al. Validation of
previously identified serum biomarkers for breast cancer with SELDI-TOF MS: a case control study.
BMC Medical Genomics 2008; Accepted for publication.
(27) Fung ET, Yip TT, Lomas L, Wang Z, Yip C, Meng XY et al. Classification of cancer types by measuring
variants of host response proteins using SELDI serum assays. Int J Cancer 2005; 115(5):783-789.
(28) Timms JF, Arslan-Low E, Gentry-Maharaj A, Luo Z, T'Jampens D, Podust VN et al. Preanalytic influence
of sample handling on SELDI-TOF serum protein profiles. Clin Chem 2007; 53(4):645-656.
(29) Villanueva J, Philip J, Chaparro CA, Li Y, Toledo-Crow R, DeNoyer L et al. Correcting common errors in
identifying cancer-specific serum peptide signatures. J Proteome Res 2005; 4(4):1060-1072.
(30) Arpino G, Laucirica R, Elledge RM. Premalignant and in situ breast disease: biology and clinical
implications. Ann Intern Med 2005; 143(6):446-457.
(31) Santen RJ, Mansel R. Benign breast disorders. N Engl J Med 2005; 353(3):275-285.
(32) Reis-Filho JS, Lakhani SR. The diagnosis and management of pre-invasive breast disease: genetic
alterations in pre-invasive lesions. Breast Cancer Res 2003; 5(6):313-319.
(33) Mommers EC, van Diest PJ, Leonhart AM, Meijer CJ, Baak JP. Balance of cell proliferation and apoptosis
in breast carcinogenesis. Breast Cancer Res Treat 1999; 58(2):163-169.
(34) Villanueva J, Shaffer DR, Philip J, Chaparro CA, Erdjument-Bromage H, Olshen AB et al. Differential
exoprotease activities confer tumor-specific serum peptidome patterns. J Clin Invest 2006; 116(1):271-284.
(35) Garbett EA, Reed MW, Stephenson TJ, Brown NJ. Proteolysis in human breast cancer. Mol Pathol 2000;
53(2):99-106.
(36) Blasi F. Proteolysis, cell adhesion, chemotaxis, and invasiveness are regulated by the u-PA-u-PAR-PAI-1
system. Thromb Haemost 1999; 82(2):298-304.
(37) Bank U, Kruger S, Langner J, Roessner A. Review: peptidases and peptidase inhibitors in the
pathogenesis of diseases. Disturbances in the ubiquitin-mediated proteolytic system. Protease-antiprotease
imbalance in inflammatory reactions. Role of cathepsins in tumour progression. Adv Exp
Med Biol 2000; 477:349-378.
(38) Villanueva J, Martorella AJ, Lawlor K, Philip J, Fleisher M, Robbins RJ et al. Serum peptidome patterns
that distinguish metastatic thyroid carcinoma from cancer-free controls are unbiased by gender and age.
Mol Cell Proteomics 2006; 5(10):1840-1852.
(39) Martinez JM, Prieto I, Ramirez MJ, Cueva C, Alba F, Ramirez M. Aminopeptidase activities in breast
cancer tissue. Clin Chem 1999; 45(10):1797-1802.
(40) Ho CH, Yuan CC, Liu SM. Diagnostic and prognostic values of plasma levels of fibrinolytic markers in
ovarian cancer. Gynecol Oncol 1999; 75(3):397-400.
(41) Bertucci F, Birnbaum D, Goncalves A. Proteomics of Breast Cancer: Principles and Potential Clinical
Applications. Mol Cell Proteomics 2006; 5(10):1772-1786.
(42) Hoffman SA, Joo WA, Echan LA, Speicher DW. Higher dimensional (Hi-D) separation strategies
dramatically improve the potential for cancer biomarker detection in serum and plasma. J Chromatogr B
Analyt Technol Biomed Life Sci 2007; 849(1-2):43-52.
(43) Anderson NL, Anderson NG. The human plasma proteome: history, character, and diagnostic prospects.
Mol Cell Proteomics 2002; 1(11):845-867.
Conclusions and Perspectives
Recent advances in mass spectrometry, such as surface-enhanced laser
desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF MS) technology,
have enabled the simultaneous detection of a large part of the proteome in a high-throughput fashion. The relative simplicity of sample preparation, high analytical
sensitivity, and speed of data acquisition render SELDI-TOF MS a promising
technology for detection of novel biomarkers in complex biological matrices, such as
serum and tissue. In the present thesis, technical aspects of the SELDI-TOF MS
technology (e.g., technical improvements, sample handling issues) are described.
Subsequently, the use of SELDI-TOF MS for protein profiling of serum and breast tissue
in search of novel diagnostic and prognostic biomarkers for breast cancer is described.
In a retrospective study performed in serum, we discovered and structurally identified
several discriminating proteins. The diagnostic performance of the constructed
classification model was only moderate though. In addition, upon analysis of an
independent sample set, we could not confirm the diagnostic performance of previously
published candidate biomarkers in serum. Furthermore, we investigated the potential of
SELDI-TOF MS serum protein profiles in improving breast cancer prognostication.
Although we could not detect prognostic protein profiles in crude serum, analysis of
anion-exchange fractionated serum revealed three protein peaks, the intensities of which
were independently associated with recurrence free survival. In addition, investigation of
prospectively collected breast tissues revealed several highly discriminative proteins,
two of which were structurally identified as N-terminal albumin fragments. Most
probably generated by tumour-specific proteolytic activity, these fragments might
provide further insight into the pathophysiological mechanisms associated with, or
underlying, breast cancer. These results highlight the potential of tissue SELDI-TOF MS
analysis in clinical proteomics research. Yet, its successful application requires further
consideration of key issues such as sample handling procedures, reproducibility and
external validation of results.
Breast cancer biomarker discovery by SELDI-TOF MS
Detection of breast cancer at an early stage, when it is still curable by current treatment
modalities, could be greatly facilitated by the application of blood-borne biomarkers. In
search for such markers, we performed a retrospective study, investigating sera from
breast cancer patients and healthy controls (Chapter 3.1). To obtain reliable estimates of
classification model performance, we applied a split-sample approach, dividing the
samples into a training and a test set. Several discriminative serum proteins were
discovered and structurally identified as acute phase reactants (among others, C3a des-arginine
anaphylatoxin and inter-alpha-trypsin inhibitor heavy chain 4 (ITIH4) fragments).
However, the constructed classification model had a moderate performance,
comparable to those reported by previously performed SELDI-TOF MS validation
studies. The apparent intricacy associated with biomarker detection in crude serum was
confirmed by a second study, in which we aimed to assess the reproducibility of
previously reported breast cancer biomarkers (Chapter 3.2). Following analysis of an
independent sample set by meticulous use of reported analytical assays, only part of the
previously reported candidate biomarkers was recovered, while none of the previously
published expression differences could be confirmed.
Better breast cancer prognostication may improve selection of patients who would
benefit from adjuvant therapy. Hence, in search for novel prognostic serum markers,
sera of high-risk primary breast cancer patients (Chapter 3.3) were investigated and a
strong association between haptoglobin phenotype and recurrence free survival was
discovered. This result could, however, not be confirmed following validation by
analysis of a similar, but six-fold larger, sample set, rendering our initial observation
most likely a false positive. These results emphasise the importance of validation in a
sufficiently sized population, as even the most thorough statistical methodology cannot
preclude chance results and, hence, erroneous conclusions.
Subsequently, to specifically explore the allegedly highly informative low-abundant
serum proteome, we fractionated a subset of the previous validation set (n = 82) and
analysed part of the resulting fractions by SELDI-TOF MS (Chapter 3.4). Four protein
peaks, one of which was identified as a serum ITIH4 fragment, were found to have
significant prognostic value. Three peak clusters (including the ITIH4 fragment)
remained significantly associated with recurrence free survival following inclusion of
clinical parameters of known prognostic value. Hence, investigation of the postoperative, anion-exchange fractionated, serum proteome by e.g., SELDI-TOF MS, is
promising for the detection of novel prognostic factors. Provided that the (other three)
discovered candidate markers are structurally identified and validated by independent
study populations, they can eventually be applied to improve breast cancer
prognostication.
Lastly, we investigated tissue specimens collected prospectively from breast cancer
patients, benign breast disease patients and healthy controls (Chapter 4.1). Contrary to
the crude serum analyses, proteome analyses at the tumour level yielded multiple
SELDI-TOF MS peak clusters with discriminative value between malignant and
healthy/benign tissue. Two proteins were structurally identified as N-terminal albumin
fragments. Most probably originating from tumour-specific protease activity, following
their validation these proteins might provide further insight into the pathophysiological
mechanisms associated with, or underlying, breast cancer. Similarly, structural
identification of the other discriminative peak clusters observed in tissue is warranted
to assess their potential role in breast cancer. Lastly, upon validation, these
discriminative proteins can potentially aid in a more accurate diagnosis of breast cancer.
The apparent difficulty in the detection of general, high-performance biomarkers in
crude serum for diagnosis or prognosis of breast cancer by SELDI-TOF MS probably is
inherent to the highly complex nature of breast cancer, as well as to its high
heterogeneity. Selection of breast cancer subgroups is therefore likely to facilitate future
biomarker detection in crude serum. Furthermore, the detection of valid biomarkers is
hampered by the type of research applied. Most proteomic datasets are subject to both
the ‘curse of dimensionality’ (large number of protein peaks) and the ‘curse of dataset
sparsity’ (limited number of samples). As such, datasets are frequently subjected to
multiple testing. Consequently, classification models are easily over-trained (i.e.,
overfitted), resulting in many candidate biomarkers that are prone to be false positives. Cross-validation can offer some indication of the generalisability of the classification model,
provided that potential bias is not hard-wired into the data. Hence, a more reliable
estimate of general performance is obtained by analysis of an independent, but similar,
sample set.
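The multiple testing problem described above can be illustrated with a short calculation. The sketch below is an illustration only and is not part of the analyses in this thesis: the number of peaks (200) and the significance level (0.05) are hypothetical, the tests are assumed to be independent, and the Bonferroni correction is shown as one common remedy rather than as the method applied here.

```python
def expected_false_positives(n_peaks: int, alpha: float) -> float:
    """Expected number of peaks passing the threshold by chance
    when no peak truly differs between the groups."""
    return n_peaks * alpha


def prob_at_least_one_false_positive(n_peaks: int, alpha: float) -> float:
    """Probability of at least one chance finding (assumes independent tests)."""
    return 1.0 - (1.0 - alpha) ** n_peaks


def bonferroni_threshold(n_peaks: int, alpha: float) -> float:
    """Per-peak significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_peaks


if __name__ == "__main__":
    n_peaks = 200   # hypothetical number of peak clusters in a profile
    alpha = 0.05    # hypothetical per-test significance level
    print("expected chance findings:", expected_false_positives(n_peaks, alpha))
    print("P(at least one chance finding):",
          round(prob_at_least_one_false_positive(n_peaks, alpha), 5))
    print("Bonferroni per-peak threshold:", bonferroni_threshold(n_peaks, alpha))
```

Under these assumptions, about ten of the 200 peaks would be expected to appear discriminative by chance alone, which is why confirmation in an independent sample set remains necessary.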
Detection of candidate biomarkers might also have been hampered by the biological
matrix commonly investigated (i.e., crude serum). While only 22 proteins comprise
more than 99% of the human serum proteome, the low abundant proteins make up
the remaining < 1%. This large dynamic range of proteins in crude serum (~10 orders of
magnitude) hampers detection of the allegedly high-informative low abundant serum
proteins, since the currently applied proteomic technologies are limited to a dynamic
range of 2-4 orders of magnitude. Serum fractionation, however, is likely to facilitate
detection of the low abundant proteins through reduction of this dynamic range, as
demonstrated in Chapter 3.4.
In addition, as serum is generated by coagulation, its proteome is susceptible to the
proteases involved in this cascade, as well as to those involved in the complement
cascade, activated upon clotting. Various pre-analytical parameters, such as sampling
device, clotting temperature, and storage time, can thus all exert a distinct influence on
the serum proteome, as illustrated by Chapter 2.2. Investigating breast cancer sera
stored for 1 to 11 years at -30°C, we identified several proteins with a significant (non-)linear
association with storage duration. These proteins have, however, also been
described previously as potential cancer markers, rendering them specific to both
disease and sample handling issues. Hence, to prevent experimental variation from
being interpreted erroneously as disease associated variation, assessment of potential
(non-)linear confounding effects by pre-analytical parameters is a prerequisite in
biomarker discovery studies. Moreover, to ascertain reproducibility and prevent
systematic bias and overfitting of data, proteomic studies aiming at the discovery of
biomarkers preferably should include two distinct and complementary steps: a
discovery phase and a validation phase.
The above-mentioned hurdles in serum biomarker detection apply to tissue analyses as
well, since the complexity and heterogeneity of breast cancer or the type of research is
not affected by the biological matrix investigated. Likewise, the tissue proteome is
susceptible to proteolytic activity induced by tumour resection. Nonetheless, analysis of
tissue specimens, collected following a strict protocol to prevent bias by pre-analytical
parameters, yielded many more discriminative proteins compared to serum analyses
(Chapter 4). As the concentration of potential biomarkers will be highest in the tumour
and its immediate microenvironment, the discrepancy in results between tissue and
serum is most likely caused by dilution (or degradation) of tissue proteins upon entering
the circulatory system. Though assessment of these tumour-derived markers in blood
would facilitate their clinical application, the discriminative tissue proteins were not
detected in serum collected from the same patients by a rigorous sample collection
protocol, using the same SELDI-TOF MS assay that was applied in tissue analysis. Yet,
these fragments might well be detected in blood by other, more sensitive and specific
analytical methods. However, upon translation into a diagnostic serum assay, care
should be taken to prevent bias by pre-analytical parameters, as one albumin fragment
was previously found to be associated with sample storage duration of serum at -30°C
(Chapter 2.2). Nevertheless, provided that these albumin fragments are validated in
independent study populations, they can potentially offer further insight into the
pathophysiological mechanisms associated with, or underlying, breast cancer, as well as
aid in a more accurate diagnosis of breast cancer. Similarly, structural identification and
validation of the other discriminative peak clusters observed in tissue is warranted to
assess their potential role in breast cancer.
Perspectives
The ultimate aim of proteomic protein profiling studies has been the identification of
novel, specific oncoproteins, to augment knowledge of the molecular mechanisms
underlying breast carcinogenesis and to improve breast cancer care. Still, the candidate
biomarkers identified thus far consist of normal cellular proteins and highly
abundant proteins normally present in the blood compartment, which were identified
in various cancer types. As such, specific oncoproteins have not yet been detected using
the SELDI-TOF MS approach. The term ‘oncoprotein’ can, however, be defined in
various ways. For instance, the currently approved oncoproteins (e.g., Cancer Antigen
15.3 and 27.29) are proteins for which the genetic blueprint is present in every cell.
Thus, in theory, the expression of these oncoproteins is not limited to malignant cells.
The resulting lack of specificity is reflected by the moderate performance of these
markers. True oncoproteins, on the other hand, are highly tumour-specific, since their
genetic blueprint is confined to malignant cells. These oncoproteins are the result of the
specific genetic mutations that underlie malignant transformation, leading to aberrant
amino acid sequences and/or post-translational modifications. However, as cancer cells
are deranged host cells, and most cancers of epithelial origin share similar molecular
features, it may be hard to find true cancer-specific proteins, expressed exclusively by
one type of malignant cells.
Alternatively, since these true oncoproteins are expected to be among the least
abundant proteins, they could simply have eluded detection thus far, due to the limited
dynamic range of the current protein profiling technologies. The detection of low
abundant proteins can be facilitated by reducing the dynamic range of their biological
matrix prior to analysis. Although this can be achieved by either (immuno)depletion of
the most abundant proteins, or enrichment of the low abundant proteins by, for
instance, anion-exchange fractionation, these approaches all suffer from inherent
limitations (e.g., co-depletion of low abundant proteins, loss of reproducibility). As
such, they have not yet resulted in identification of true oncoproteins. Alternatively,
less complex biological matrices, such as (media from) cell lines could be investigated.
Although this simultaneously reduces the biological heterogeneity that is characteristic
of human specimens, cells grown in vivo and in vitro are not identical due to adaptation
to cell culture conditions. Reduction of heterogeneity can also be accomplished by
analysis of mouse models, though mouse and human specimens are likely to be of
similar complexity.
Furthermore, the identification of true oncoproteins can be hampered by the very same
genetic blueprint that defines them as true oncoproteins. Detection in protein profiling
studies requires these oncoproteins to sufficiently differ from native proteins in mass
and/or physicochemical properties. This will be easily achieved for oncoproteins that
are the result of genetic mutations leading to aberrant post-translational modifications.
However, the oncoproteins that are defined by aberrant amino acid sequences might
prove more difficult to detect, as changes in amino acids do not necessarily alter the isoelectric point or mass of proteins. For instance, apolipoprotein A-I (28 kDa, described in
Chapter 3.1) has a mass error of 1.4-15 Da upon detection by high-throughput protein
profiling platforms (of 30 to 500 ppm mass accuracy). Hence, its D127N variant, in
which aspartic acid (115.09 Da) is substituted by asparagine (114.10 Da), may not be
detected. Likewise, substitution of multiple amino acids might not cause a net mass
difference. Detection of oncoproteins can also be precluded when multiple genetic
mutations result in loss of protein expression.
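The arithmetic behind the apolipoprotein A-I example can be sketched as follows. The sketch is illustrative only: it uses the residue masses and the mass error range quoted above, takes 28,000 Da as the nominal mass of apolipoprotein A-I, and all function names are hypothetical.

```python
# Residue masses (Da) as quoted in the text above.
ASP_RESIDUE_DA = 115.09
ASN_RESIDUE_DA = 114.10


def ppm_to_da(mass_da: float, accuracy_ppm: float) -> float:
    """Convert a relative mass accuracy (ppm) into an absolute error (Da)."""
    return mass_da * accuracy_ppm / 1e6


def substitution_shift_da(old_residue_da: float, new_residue_da: float) -> float:
    """Absolute mass shift caused by a single amino acid substitution."""
    return abs(new_residue_da - old_residue_da)


def is_resolvable(shift_da: float, mass_error_da: float) -> bool:
    """A variant can only be told apart from the native protein
    if its mass shift exceeds the mass error of the measurement."""
    return shift_da > mass_error_da


if __name__ == "__main__":
    protein_mass_da = 28_000.0  # nominal mass of apolipoprotein A-I (assumption: 28 kDa exactly)
    shift = substitution_shift_da(ASP_RESIDUE_DA, ASN_RESIDUE_DA)
    for error_da in (1.4, 15.0):  # mass error range quoted in the text
        print(f"shift {shift:.2f} Da, error {error_da} Da, "
              f"resolvable: {is_resolvable(shift, error_da)}")
    print(f"500 ppm at 28 kDa corresponds to {ppm_to_da(protein_mass_da, 500):.1f} Da")
```

The D127N shift of roughly 0.99 Da is smaller than even the lower bound of the quoted mass error (1.4 Da), so the variant would not be resolved from the native protein on such a platform.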
Identification of true oncoproteins is, however, not a prerequisite for improving breast
cancer care, as better breast cancer diagnosis and prognosis can also be accomplished by
surrogate biomarkers of disease. A class of proteins currently recognised for their
surrogate biomarker potential are the (proteolytic fragments of) high-abundant
circulatory proteins. These fragments are hypothesised to either be generated by cancer
type-specific exoprotease activity, superimposed on the ex vivo coagulation and
complement degradation proteolytic pathways, or result from tumour-secreted
proteases that process host-response proteins upon their exposure to the tumour
microenvironment. Although this concept has been investigated in breast cancer for,
among others, serum ITIH4, the various studies have reported contradictory results, most likely
caused by the susceptibility of the serum proteome to various pre-analytical parameters.
Hence, the concept of cancer type-specific (host response) protein fragments generated
by tumour-specific proteases still awaits confirmation by validation studies that apply
rigorous sample handling protocols. The latter requirement could prove to be a
bottleneck in the clinical use of these markers, as a strict adherence to collection
protocols might not be feasible in the long run.
Another bottleneck in the clinical application of protein biomarkers lies in the poor
reproducibility of the semi-quantitative laser desorption/ionisation (LDI) technologies,
such as SELDI-TOF MS. The reproducibility can be improved by standardisation of pre-analytical parameters, automation of the experimental work-flow, the use of replicate
measurements, and a frequent performance check of the SELDI-TOF MS instrument.
However, as the matrix co-crystallisation and desorption/ionisation steps central to the
LDI technology are highly complex processes involving different thermodynamic and
physicochemical phenomena, which are currently not well understood, the LDI process
in itself is essentially uncontrollable. The use of internal standards (a commonly used
method to correct for analytical variability), is hampered by competition for array
surface binding and ionisation, which play prominent roles in the complex mixtures
generally investigated (e.g., serum, tissue lysates). Hence, the variability between
measurements over time cannot be completely eliminated by analytical quality control
procedures, and efforts to develop statistical and/or bioinformatic methods to correct
for this variation should be made. As these reproducibility issues hamper the future use
of classification algorithms based on relative (SELDI-TOF MS) peak intensities,
quantitative measurements of candidate biomarkers are currently still preferable for
clinical use. However, quantitative methods such as ELISA will not always suffice. An
antibody based assay cannot distinguish the parent protein from its cleaved fragments
(the latter of which could possess the greatest diagnostic potential) since the antibody
recognises its cognate epitope in both the parent and fragment proteins, thereby
precluding correct quantitation. Hence, high-throughput, multiplex immuno mass
spectrometry technologies that can discriminate between the different antibody-bound
proteins should be developed.
In conclusion, mass spectrometry based profiling techniques should be considered as a
means of screening the proteome to identify protein patterns indicative of cancer, and
biomarker candidates should be validated in new study populations using other,
quantitative, methods. Following their validation, candidate biomarkers must be
investigated for their utility as breast cancer biomarkers in larger, prospective, clinical
settings, also including different disease types (e.g., benign breast diseases) to ascertain
specificity. This move from the discovery phase to the pre-clinical and subsequent
clinical validation phase is mandatory, as the sole purpose of a biomarker lies in its
application. Considering the results of all SELDI-TOF MS protein profiling studies in
breast cancer up to now, this platform holds promise as a high-throughput screening
tool for discovery of novel breast cancer markers. Provided that these studies are
performed with adequate statistical power and analytical rigour, they could eventually
fulfil the great promise that protein biomarkers have for improving cancer patient
outcome.
Summary
Breast cancer imposes a significant healthcare burden on women worldwide, as it is
estimated to be the most commonly diagnosed neoplasm in women. In addition,
preceded only by lung cancer, breast cancer is at present the second leading cause of
cancer deaths. Despite the substantial progress made in cancer therapy, the five-year
survival rate of breast cancer still is inversely proportional to its stage at the time of
diagnosis. Hence, short of prevention, detection of breast cancer at an early, still
curable, stage would offer the best route to decrease its mortality rates. However, since
many patients present with advanced disease, the current diagnostic screening tools
(e.g., mammography) obviously do not suffice for adequate breast cancer diagnosis. In
addition, despite the survival benefit achieved by locoregional treatment and adjuvant
systemic therapy, many breast cancer patients will eventually develop metastatic
relapse and die, while a small percentage of patients would have survived without these
treatment modalities. Evidently, the currently applied prognostic and predictive
markers (e.g., age, hormone receptor status) lack adequate performance as well. Hence,
better markers for early diagnosis, accurate prognosis and treatment prediction, applied
either individually or in conjunction with existing modalities, are warranted to improve
breast cancer care. As proteins reflect the actual state of an organism and are readily
measurable in biological matrices such as blood and tissue, they hold promise as
potential biomarkers for cancer.
Recent advances in analytical technologies, such as protein microarrays and mass
spectrometry (MS), have enabled large-scale proteomic analyses. Two MS based
technologies in particular, i.e., matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) MS and its variant surface-enhanced laser desorption/ionisation
(SELDI-) TOF MS, have been widely deployed for cancer biomarker discovery, due to
the relative simplicity of sample preparation, high analytical sensitivity and speed of
data acquisition. In Chapter 1.1, a comprehensive overview of the protein profiling
studies performed in breast cancer by these two LDI platforms is provided. Many
biomarker candidates have been detected. However, structural identification,
validation, and investigation in large, prospective clinical trials are obligatory prior to
their eventual application in clinical patient care. Nonetheless, the two platforms hold
promise as high-throughput screening tools for discovery of breast cancer biomarkers,
provided that studies are performed with adequate statistical power and analytical
rigour.
In Chapter 2, technical and pre-analytical aspects related to protein profiling research are
investigated. In Chapter 2.1, the performance of the first and second generation SELDI-TOF MS apparatus is compared. No differences between the instruments were observed
in the number of peaks detected in whole serum, the biomarker potential of the
detected peaks, and the reproducibility of the analyses. However, the second generation
SELDI-TOF MS had a superior performance in the analysis of anion-exchange
fractionated serum, since up to twice as many peaks were detected compared to the first
generation apparatus.
It is increasingly recognised that pre-analytical variables, such as sample collection,
processing, and storage temperature, can exert profound effects on the serum proteome.
However, although the majority of clinical studies investigate samples originating from
sample banks, only little is known about the possible effects of the pre-analytical
variable ‘sample storage duration’. We therefore investigated the effects of extended
storage duration (1 to 11 years) on the SELDI-TOF MS serum protein profile (Chapter
2.2). Several protein peaks, structurally identified as C3a des-Arginine anaphylatoxin
and multiple fragments of albumin and fibrinogen, were found to be significantly associated
with sample storage duration, following five (non-)linear patterns. Of note, these proteins
have also been described as potential cancer markers, rendering them specific to both
disease and sample handling issues. Hence, to prevent experimental variation from
being interpreted erroneously as disease associated variation, assessment of potential
(non-)linear confounding by pre-analytical parameters should be an integral component
of biomarker discovery and validation studies.
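As an illustration of the kind of assessment recommended here, the sketch below checks
whether a peak intensity is associated with storage duration in a linear or in a monotonic
non-linear way, by comparing Pearson and Spearman correlation coefficients. It is a minimal
sketch under stated assumptions, not the analysis performed in Chapter 2.2: the function
names are hypothetical and the intensity values are synthetic.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic example (assumption): storage duration of 1 to 11 years and a
# peak intensity that rises quickly and then levels off.
years = list(range(1, 12))
intensity = [10 * (1 - 0.5 ** t) for t in years]

print("Pearson :", round(pearson(years, intensity), 3))
print("Spearman:", round(spearman(years, intensity), 3))
```

A Spearman coefficient that is clearly larger in magnitude than the Pearson coefficient
suggests a monotonic but non-linear relation with storage duration. Non-monotonic patterns
would not be picked up this way and would need a different check, for example a regression
model with a quadratic term.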
We applied the SELDI-TOF MS technology in the search for novel candidate
biomarkers that can be used in diagnosis (Chapters 3.1 and 3.2) or prognosis (Chapters 3.3
and 3.4) in breast cancer. In Chapter 3.1, the identification of serum proteins by which
breast cancer patients could be discerned from healthy controls is described. Ten peaks,
structurally identified as C3a des-Arginine anaphylatoxin, (tentative) inter-alpha-trypsin
inhibitor heavy chain 4 (ITIH4) fragments, and a (tentative) fibrinogen fragment,
were found significantly discriminative in both the training and the test set. None of
these peaks was influenced by clinical (subjects’ age) or pre-analytical (sample
storage duration) parameters. Nonetheless, the constructed classification model
performed only moderately, most likely owing to the highly heterogeneous
nature of breast cancer. Hence, selection of breast cancer subgroups for comparison
with healthy controls is expected to improve results of future diagnostic SELDI-TOF
MS studies.
In Chapter 3.2, we describe the validation of diagnostic SELDI-TOF MS serum protein
profiles for breast cancer discovered by other research groups, by investigation of an
independent study population in our laboratory. Although (part of) the reported
markers were recovered from our study population, none had sufficient performance to
be applied as a marker, exemplifying analytical (i.e., reproducibility of the SELDI-TOF
MS assay) and statistical (e.g., data overfitting) problems associated with this type of
research. Confirmation of validity is therefore essential in establishing the true clinical
applicability of candidate biomarkers.
Besides diagnostic serum protein profiles, profiles for prognosis may also aid breast
cancer management. Better breast cancer prognostication may improve selection of
patients who would benefit from adjuvant therapy, thereby reducing both over- and
undertreatment of the disease. In Chapter 3.3, a retrospective follow-up study is described
in which sera of high-risk primary breast cancer patients were investigated in search of
proteins predictive of recurrence free survival. Although we initially found the
haptoglobin phenotype to be a strong predictor of recurrence free survival in our
discovery study population (n = 63), this was not confirmed following analysis of a
similar, but six-fold larger, validation sample set (n = 371), making our initial
observation most likely a false positive. These results emphasise the importance of
validation in a sufficiently sized population, as even the most thorough statistical
methodology cannot preclude chance results and, hence, erroneous conclusions.
Subsequently, to specifically explore the allegedly highly informative low-abundance
serum proteome, we fractionated a subset of the previous validation set by anion-exchange
chromatography, and analysed part of the resulting fractions by SELDI-TOF
MS (Chapter 3.4). Four protein peaks, one of which was identified as a serum ITIH4
fragment, were found to contain significant prognostic value. Three peak clusters
(including the ITIH4 fragment) remained significantly associated with recurrence free
survival following inclusion of clinical parameters of known prognostic value. Provided
that the (other three) discovered candidate biomarkers are structurally identified and
validated in independent study populations, they can eventually be applied in
improving breast cancer prognostication.
In Chapter 4, the analysis of both serum and breast tissue, collected prospectively from
breast cancer patients, benign breast disease patients, and female healthy controls, is
described. Sera were collected pre- and post-operatively, to assess the applicability of
serum protein profiles for follow-up after surgery. Contrary to the analysis of crude
serum, proteome analyses at the tumour level yielded multiple SELDI-TOF MS peak
clusters with discriminative value between malignant and healthy / benign tissue. Two
discriminative proteins were structurally identified as N-terminal albumin fragments.
These fragments most probably originate from tumour-specific protease activity and, once
validated, might provide further insight into the pathophysiological
mechanisms associated with, or underlying, breast cancer. Similarly, structural
identification of the other discriminative tissue peak clusters is warranted to assess their
potential role in breast cancer. Lastly, upon validation, these proteins can potentially aid
in a more accurate diagnosis of breast cancer.
In conclusion, the development of mass spectrometric protein
profiling approaches such as SELDI-TOF MS has enabled the simultaneous detection of
part of the proteome in clinical samples in a high-throughput fashion. Considering the
results of all SELDI-TOF MS protein profiling studies in breast cancer to date, this
platform holds promise as a high-throughput screening tool for discovery of novel
breast cancer markers. Yet, its successful application requires further consideration of
key issues such as sample handling procedures, enhancement of the dynamic range,
reproducibility, and external validation of results. Provided that future studies are
performed with adequate statistical power and analytical rigour, they could eventually
fulfil the great promise that protein biomarkers have for improving breast cancer care.
Samenvatting (Dutch summary)
Breast cancer is estimated to be the most frequently diagnosed form of cancer in women
and thus constitutes a significant health burden worldwide. In addition, breast cancer
currently has the highest cancer-related mortality after lung cancer. Despite the
considerable progress made in therapy, five-year survival of breast cancer is still
inversely related to stage at the time of diagnosis. Apart from prevention, detection of
breast cancer at an early, curable stage therefore offers the best chance of reducing
mortality. However, current detection methods (such as mammography) prove suboptimal
for early detection, as many patients are diagnosed only at a late stage. Moreover,
despite locoregional treatment and adjuvant systemic therapy, many patients will
eventually die following recurrence of the disease, whereas a small number of patients
would have survived long-term even without therapy. Clearly, the current prognostic and
predictive markers (for example age and hormone receptor status) also perform
inadequately. There is therefore a need for improved markers for early diagnosis,
accurate prognosis, and prediction of treatment efficacy that, alone or in combination
with existing methods, can improve breast cancer care. Because proteins (in both amount
and type) closely reflect the state of an organism, and because proteins are easily
measured in biological samples such as blood and tissue, they can be excellent
biomarkers for breast cancer.
Recent developments in analytical technologies, such as protein microarrays and mass
spectrometry (MS), have made large-scale protein analysis possible. Given their
relatively simple sample preparation, high analytical sensitivity, and speed of analysis,
two MS techniques in particular, namely “matrix-assisted laser desorption/ionisation
time-of-flight” (MALDI-TOF) MS and its variant “surface-enhanced laser
desorption/ionisation” (SELDI-)TOF MS, have been applied extensively in the search for
new cancer biomarkers. In Chapter 1.1, a comprehensive overview is given of the
proteomics studies performed to date in breast cancer with these two techniques. A great
many potential biomarkers have been detected; however, structural identification,
validation, and investigation in large, prospective clinical studies are required before
these markers can be applied in the clinic. Nevertheless, both techniques are promising
as rapid screening methods for finding new breast cancer markers, provided that these
studies are performed with sufficient statistical power and analytical rigour.
Chapter 2 describes the investigation of technical and pre-analytical aspects of
proteomics studies. In Chapter 2.1, the performance of the first and second generation
SELDI-TOF MS instruments is compared. No differences were seen in the number of peaks
detected in serum, the biomarker potential of the peaks, or the reproducibility of the
analysis. However, when fractionated serum was analysed, up to twice as many peaks were
detected with the new generation SELDI-TOF MS as with the old generation instrument.
It is increasingly recognised that pre-analytical variables, such as the way in which
samples are drawn and collected, as well as the temperature at which samples are stored,
can greatly influence the protein profile of biological samples. The majority of clinical
proteomics studies are retrospective, so that samples have often been stored for
prolonged periods. Nevertheless, very little is known about the possible influence of
storage duration on the SELDI-TOF MS protein profile. In Chapter 2.2, we therefore
investigated the effect of long-term storage (1 to 11 years) on the SELDI-TOF MS serum
protein profile. The expression of several proteins proved to be significantly associated
with storage duration, following five different (non-)linear patterns. These proteins,
identified as C3a des-Arginine anaphylatoxin and multiple fragments of albumin and
fibrinogen, have however also been described in the literature as potential cancer
markers, which makes them specific to both cancer and pre-analytical variables. To rule
out that experimental variation is erroneously interpreted as variation related to the
disease (cancer), investigation of potential (non-)linear effects of pre-analytical
variables should be a standard component of proteomics studies in which biomarkers are
sought or validated.
The SELDI-TOF MS technology was subsequently used for the detection of new, potential
biomarkers that can be applied in the diagnosis (Chapters 3.1 and 3.2) or prognosis
(Chapters 3.3 and 3.4) of breast cancer. Chapter 3.1 describes the identification of
serum proteins by which breast cancer patients can be distinguished from healthy female
controls. The intensities of ten peaks, identified as C3a des-Arginine anaphylatoxin and
(tentative) fragments of inter-alpha-trypsin inhibitor heavy chain 4 (ITIH4) and
fibrinogen, were significantly discriminative between breast cancer and control in both
the training and the test set. The intensity of none of the peaks was influenced by
clinical (age) or pre-analytical (storage duration) parameters. Nevertheless, the
constructed classification model had only modest sensitivity and specificity, most likely
caused by the great heterogeneity of breast cancer. Selection of breast cancer subgroups
for comparison with healthy volunteers will in all likelihood improve the results of
future SELDI-TOF MS studies.
Chapter 3.2 describes the validation of diagnostic SELDI-TOF MS serum protein profiles
for breast cancer discovered by other research groups, through analysis of an
independent study population in our laboratory. Although the reported biomarkers were
(partly) detected in our study population, none of the candidate markers was sufficiently
sensitive and specific to be applied as a marker. These results illustrate the analytical
(reproducibility of the SELDI-TOF MS analysis) and statistical (data ‘overfitting’)
problems inherent to this type of research. Confirmation of validity is therefore
essential for establishing the possible clinical applicability of candidate biomarkers.
Besides diagnostic protein profiles, prognostic protein profiles may also be of value in
breast cancer care. Improved prognostic evaluation can help in a more accurate selection
of patients who benefit from therapy, so that both over- and undertreatment can be
reduced. Chapter 3.3 describes a retrospective follow-up study in which sera of high-risk
patients with primary breast cancer were investigated for proteins predictive of
disease-free survival. Initially, the haptoglobin phenotype proved to be very strongly
associated with disease-free survival in the training set (n = 63). However, this result
was most likely a false positive, as the association found was not confirmed following
analysis of a comparable, but six-fold larger, validation set (n = 371). These results
underline the importance of validation in a sufficiently large test set, as even the most
rigorous statistical methodology cannot exclude the chance of false positive results, and
thereby of erroneous conclusions.
Subsequently, part of the sera (n = 82) from the validation set described above was
fractionated by anion-exchange chromatography, in order to investigate the presumably
highly informative low-abundance serum proteins for the presence of prognostic markers
(Chapter 3.4). Following analysis of selected serum fractions by SELDI-TOF MS, four
protein peaks (one of which was identified as a serum ITIH4 fragment) proved to contain
significant prognostic information. Three of these proteins (including the ITIH4
fragment) remained significantly associated with disease-free survival after inclusion of
clinical parameters of known prognostic value. These proteins may in time be used to
improve breast cancer care, provided that they are further identified and validated in
independent study populations.
Chapter 4.1 describes the analysis of both serum and breast tissue, collected
prospectively from breast cancer patients, patients with benign breast disease, and
healthy female volunteers. The sera were collected both pre- and post-operatively to
investigate to what extent the serum protein profiles are applicable in follow-up after
surgery. In contrast to the analysis of the (unfractionated) sera, analysis of proteins at
the tumour level yielded multiple SELDI-TOF MS protein peaks with discriminative value
between malignant and healthy/benign tissue. Two of these discriminative proteins were
identified as albumin fragments. Provided they are validated, these proteins, which were
probably generated by tumour-specific protease activity, may provide more insight into
the pathophysiological mechanisms that underlie or accompany the development of breast
cancer. Likewise, identification of the other discriminative proteins is required to
establish their potential role in breast cancer. After validation, these proteins may
help in making a more accurate diagnosis of breast cancer.
In conclusion, the development of rapid mass spectrometric proteomics techniques, such as
SELDI-TOF MS, has made possible the simultaneous detection of part of the protein
composition of clinical samples. When the results of the SELDI-TOF MS studies performed
to date in breast cancer are considered, it can be concluded that this technology is a
promising, rapid screening method that is well suited to the search for new breast cancer
markers. However, successful application of the technology requires attention to
essential matters such as procedures for sample collection, processing, and storage,
improvement of the dynamic range, reproducibility, and external validation of results.
Provided that these proteomics studies are performed with sufficient statistical power
and analytical rigour, they can in time fulfil the great promise that protein biomarkers
hold for improving breast cancer care.
Acknowledgements (Dankwoord)
Curriculum vitae
List of publications
Acknowledgements (Dankwoord)
This thesis came about with the help of many. I would like to thank a number of people in
particular.
A first word of thanks goes to all the women who were willing to donate blood and tissue
samples for research into breast cancer. Without their cooperation, the research
described in this thesis could not have been carried out.
I would also like to thank both my supervisors (promotores), prof. dr Jos Beijnen and
prof. dr Jan Schellens, for the opportunity they gave me to carry out the research
described in this thesis, and for their guidance along the way. Dear Jos, I have great
respect for your inexhaustible ideas, motivation, and enthusiasm for science, as well as
for the way you so pragmatically combine pharmacy, science, and (hospital) management. I
want to thank you for your trust and the freedom you gave me, your ever-positive view of
the research results (particularly at moments when I briefly lacked one), and the fact
that your door was literally and figuratively always open to me. Dear Jan, many thanks
for your valuable clinical input and your critical eye on my manuscripts; I have learned
a great deal from you.
Carla van Gils and Lodewijk Wessels played an important role in the realisation of this
thesis. Dear Carla and Lodewijk, although your comments sometimes drove me to despair, I
have always been grateful for them; the articles are much better for it. Many thanks for
your constructive guidance; I have learned a great deal from you.
Marc Zapatka, dear Marc, I have appreciated our numerous informative and pleasant
discussions. Thank you for your help with the ‘appropriate bioinformatics tools’
(described in Chapter 3.4).
Special thanks also to Nathan Harris. Dear Nathan, I have much enjoyed the highly
informative days I have spent under your supervision at the Ciphergen Lab in
Guildford, and I am very thankful for your help with the protein identifications.
Hans Bonfrer, Tiny Korse, Dorothé Linders, and Olaf van Tellingen of the Clinical
Chemistry Laboratory of the AvL played an important role in the sample collection. Many
thanks for opening up the serum bank, and for the use of the mikrodismembrator. Olaf,
thank you for your valuable contributions during the Monday morning meeting.
I am also very grateful to the surgeons Eric van Dulken and Lieve de Kock, and the
plastic surgeons Florine Kingma-Vegter and Thea van Loenen, for their indispensable
contribution to the success of the clinical study (described in Chapter 4.1). Thank you
for the instructive and pleasant collaboration. Marian van der Linde, mammacare nurse at
the Slotervaart Hospital, should also not go unmentioned in this respect. Dear Marian,
without your input the clinical study would not have run so smoothly; many thanks for
this.
I would like to thank the Department of Clinical Chemistry of the Academisch Ziekenhuis
Maastricht, in particular Etiënne Michielsen and Judith Bons, for organising the annual
‘SELDI users day’ and for the use of several software programs. I have fond memories of
the days with you in Maastricht, especially of the high computer density in your office
(simultaneous data analysis is simply faster!). I would also like to thank Leo Kruijt of
the Animal Science Group in Lelystad for the warm welcome and the pleasant working
environment; power cuts, SELDI ‘jams’, and crashed hard disks could not dampen the
productivity or the pleasure of the trip to Lelystad!
Helgi Helgason, our ‘proteomics’ paths crossed regularly; thank you for the very pleasant
collaboration. I wish you every success with the continuation of your research in
Iceland, and would love to come and visit you sometime.
I would like to thank Wouter Meuleman for his help with the data analysis. Wouter, how
many ‘das Experiment’s did we end up carrying out? I have always particularly appreciated
your sense of humour (‘aim: world domination’). Good luck with your further research.
I would like to thank the analysts of the pharmacy laboratory for showing me the ropes in
this tightly run department, and for the enjoyable days in the lab.
I want to thank my fellow PhD students (OIOs) from the Slotervaart and the AvL for the
great time and the positive working atmosphere, first in the ‘Onderwereld’ (Underworld),
later in the ‘Zonnetempel’ (Sun Temple). The many SLZ lunches and dinners, coffee breaks,
Friday afternoon drinks in ‘die Rooie’, putting out shoes for Sinterklaas, and OIO
weekends and outings are fond memories. Natalie and Judith, as Jos remarked earlier,
together we were the ‘three musketeers’ of protein research at the Slotervaart Hospital,
with Annemieke in the role of D’Artagnan ;). Natalie, as the most senior of us you always
knew what to do when we got stuck. Judith, as ‘the proteomics ladies’ we shared joys and
sorrows for years; I look back on it with pleasure. Thank you for the many discussions
with synergistic results. Annemieke, many thanks for the many (statistics) discussions,
but certainly also for the help with the last experiments in Lelystad (fràctionation ;)).
Good luck with completing your research.
I would also especially like to thank Ly (the oracle), Rob (the shaker just didn’t make
it!) and Joost (I love …) for the many hilarious moments, for cheering along in major
times and lending a listening ear in minor times. Good luck with the final stretch!
Jolanda, thank you for quietly and reliably taking over the sample collection when I was
away. Elke, my former room-mate, you know better than anyone how much I value your
positivity, honesty, calm, and trust. I am very glad that you are willing to be my
paranymph.
Dear Sylvia and Paul (PotPaPaul), I count myself lucky to have you as potential
parents-in-law! Although it remains somewhat surreal that I never have to explain
anything about (my) research to you. Thank you for the warm welcome into your family, the
enjoyable skiing holidays, the interest in my research, the (moral) support, and the
critical eye on my manuscript. Dear Gerben and Wendy, thank you for your interest in
everything I do. Gerben, good luck with your research; it is in good hands with you.
Wendy, what is life without art? Thank you for your ‘alpha’ (arts) input in my ‘beta’
(science) life ;).
Dear Mum and Dad, many thanks for your trust, the (moral) support, and the safe haven I
could always fall back on unconditionally; these have been important contributions to
this thesis. Dear Bonma, nothing escapes your sharp eye and mind. Thank you for your
interest in my research and your wise advice. Elseline and Ruud and the children, how
lovely it is to be with you! Thank you for the relaxation you offer me. Gerrie-Cor and
Niels, thank you for your interest in my research, and of course also for the enjoyable
evenings in the pub ;). Gerrie-Cor, it is especially nice that you too have embarked on a
PhD. You are a feast of recognition; I am very glad that you are willing to be my
paranymph.
Dear Jasper, with you in my life the sun shines every day. You are my solid, yet
flexible, home base. Thank you for your love, calm trust, positivity, and humour.
I love you!
Marie-Christine
Den Haag, December 2008
Curriculum vitae
Marie-Christine Gast was born on 23 February 1978 in Woerden. In 1996 she obtained her
atheneum (pre-university) diploma at the Minkema Scholengemeenschap in Woerden, after
which she began studying pharmacy at Utrecht University. The Master’s programme
(doctoraal) was completed with a research internship at the Netherlands Forensic
Institute in Rijswijk. In 2002 she obtained her pharmacist’s degree. In that same year
she started as a project pharmacist in the Pharmacy of the Slotervaart Hospital, where
she set up and carried out the pharmaceutical supervision of the GGD Amsterdam. While
continuing this supervision, she subsequently started, at the same workplace, the
research described in this thesis, under the supervision of promotores prof. dr J.H.
Beijnen and prof. dr J.H.M. Schellens.
List of publications related to this thesis
Gast MCW, Bonfrer JMG, Rutgers EJTh, Schellens JHM, Beijnen JH. Proteomics in
breast cancer. EORTC PAMM 2004;25:Abstract 4.03
Bouwman K, Gast MCW, Bonfrer JMG, Schellens JHM, Beijnen JH. Proteomics in de
oncologie: eiwitten analyseren om kanker op te sporen. Pharm Weekbl
2004;139(25):879-84
Gast MCW, Bonfrer JMG, Rutgers EJTh, Schellens JHM, Beijnen JH. New
discriminatory protein profiles in breast cancer patients. Proceedings of ASCO
2004;23:Abstract 574
Gast MCW, Bonfrer JMG, Rutgers EJTh, Schellens JHM, Beijnen JH. Proteomics in
patients with breast cancer: unique profile discriminates patients from healthy controls.
Br J Clin Pharmacol 2005;59(1):130 (Abstract)
Engwegen JYMN, Gast MCW, Schellens JHM, Beijnen JH. Clinical proteomics:
searching for better tumour markers with SELDI-TOF MS mass spectrometry. Trends
Pharmacol Sci 2006;27(5):251-9
Gast MCW, Bonfrer JM, van Dulken EJ, de Kock L, Rutgers EJTh, Schellens JHM,
Beijnen JH. SELDI-TOF MS serum protein profiles in breast cancer: Assessment of
robustness and validity. Cancer Biomarkers 2006;2(6):235-48
Gast MCW, Engwegen JYMN, Helgason HH, Schellens JHM, Beijnen JH. “Clinical
proteomics” in de oncologie. NtvO 2007;4(4):140-52
Gast MCW, Engwegen JYMN, Schellens JHM, Beijnen JH. Comparing the old and new
generation SELDI-TOF MS: implications for serum protein profiling. BMC Med
Genomics 2008;1:4
Meuleman W, Engwegen JYMN, Gast MCW, Beijnen JH, Reinders MJ, Wessels LFA.
Comparison of normalisation methods for surface-enhanced laser desorption and
ionisation (SELDI) time-of-flight (TOF) mass spectrometry data. BMC Bioinformatics
2008;9:88
Meuleman W, Engwegen JYMN, Gast MCW, Wessels LFA, Reinders MJ. Analysis of
mass spectrometry data using sub-spectra. BMC Bioinformatics 2008; in press.
Gast MCW, van Tinteren H, Bontenbal M, van Hoesel QGCM, Nooij MA, Rodenhuis S,
Span PN, Tjan-Heijnen VCG, de Vries EGE, Harris N, Twisk JWR, Schellens JHM,
Beijnen JH. Haptoglobin phenotype is not a predictor of recurrence free survival in
high-risk primary breast cancer patients. BMC Cancer 2008;8:389
van Winden AWJ, Gast MCW, Beijnen JH, Rutgers EJTh, Grobbee DE, Peeters PHM,
van Gils CH. Validation of previously identified serum biomarkers for breast cancer
with SELDI-TOF MS: a case control study. BMC Med Genomics 2009; accepted for
publication.
Gast MCW, Schellens JHM, Beijnen JH. Clinical proteomics in breast cancer: a review.
Breast Cancer Res Treat 2008; in press.
Gast MCW, van Gils CH, Wessels LFA, Harris N, Bonfrer JMG, Rutgers EJTh, Schellens
JHM, Beijnen JH. Serum protein profiling using SELDI-TOF MS: Influence of sample
storage duration. Submitted for publication.
Gast MCW, van Gils CH, Wessels LFA, Harris N, Bonfrer JMG, Rutgers EJTh, Schellens
JHM, Beijnen JH. Serum protein profiling for diagnosis of breast cancer using SELDI-TOF
MS. Submitted for publication.
Gast MCW, van Dulken EJ, van Loenen TKG, Kingma-Vegter F, Westerga J, Flohil CC,
Knol J, Jimenez CR, van Gils C, Wessels LFA, Schellens JHM, Beijnen JH. Detection of
breast cancer by SELDI-TOF MS tissue and serum protein profiling. Submitted for
publication.