A How-To Guide: Evaluating Surrogate Endpoints in Oncology Trials
Ming Poi, PharmD, PhD
Specialty Practice Pharmacist,
Phase 1 Clinical Treatment Unit and Investigational Drug Service
The Arthur G. James Cancer Hospital and Richard J. Solove Research Institute
at The Ohio State University
2013 Annual HOPA Conference in Los Angeles, California
March 21, 2013

Faculty Disclosures
• Nothing to disclose pertinent to this presentation

Heimberg J, et al. N Engl J Med. 1999;102:365-78

Objectives
• Identify appropriate statistical methods for analysis of common oncology trial designs
• Review available statistical methods for reporting surrogate endpoints in oncology clinical trials
• Describe statistical issues in the evaluation of prognostic and predictive biomarkers in oncology

Common Endpoints/Outcomes
• Overall Survival (OS)
• Progression Free Survival (PFS)
• Disease Free Survival (DFS)
• Time to Progression (TTP)
• Time to Treatment Failure (TTF)
• Disease Specific Survival (DSS)
• Complete Response (CR)
• Durable Complete Response (DCR)
• Partial Response (PR)
• Objective Response Rate (ORR) = (CR + PR)
• Stable Disease (SD)
• Progressive Disease (PD)

Overall Survival (OS)
• Viewed as the gold standard (“cure”)
• “Failure” = Death
• Required for randomized phase 3 trials, however…
- May result in inability to complete, high dropout rate, loss to follow-up, etc. (long study)
- Requires a large sample size for a definitive (“statistically significant”) result
- Takes longer to complete and to reach the required number of events

Pitfalls of OS
• Difficult to determine OS in “chronic disease” (non-curative) for evaluation of efficacy
• Potential influence of a variety of possible 2nd-, 3rd-, and 4th-line therapies on OS
• Particularly for agents hypothesized to produce their clinical impact via cytostatic rather than cytotoxic effects

Surrogate Endpoint - Definition
• A marker (a laboratory measurement or physical sign) that is used in clinical trials as an indirect or substitute measurement that represents a clinically meaningful outcome, such as survival or symptom improvement
• “…intended to substitute for a clinical endpoint”
(Temple R.J. JAMA 1999; 282(8):790-5;
Lesko and Atkinson. Ann Rev Pharmacol Toxicol 2001; 41:347-366)

Surrogate Endpoint
• “intended to substitute for a clinical endpoint”
- Projects effects on a clinical benefit endpoint
- Can be used to reasonably predict clinical benefit
• The pathophysiology of the disease and the pathway (mechanism) must be reasonably well understood
- Is the biomarker in the causal pathway of the disease process?
(Woodcock, CDER-FDA, 2011)
Example:
Blood pressure ↔ stroke prevention and CVD
• Requires validation from many randomized trials

Surrogate Endpoint – Regulatory Definitions
Source | Definition
57 FR 13234–13242 (1992) | A surrogate end point, or “marker,” is a laboratory measurement or physical sign that is used in therapeutic trials as a substitute for a clinically meaningful endpoint that is a direct measure of how a patient feels, functions or survives and is expected to predict the effect of the therapy.
FDAMA 1997 USC Section 504(b)(1) | …a “surrogate” endpoint that is reasonably likely to predict clinical benefit.
Title 21 – Food and Drugs, 21 C.F.R. 314 Section 314.510 | …a “surrogate” endpoint that is reasonably likely, based on epidemiologic, therapeutic, pathophysiologic, or other evidence, to predict clinical benefit.
Guidance for Industry: Evidence-based review system for the scientific evaluation of health claims | Surrogate endpoints are risk biomarkers that have been shown to be valid predictors of disease risk and therefore may be used in place of clinical measurements of the onset of the disease in a clinical trial.
FR = Federal Register; FDAMA = Food and Drug Administration Modernization Act; CFR = Code of Federal Regulations

Prentice’s Criteria for Surrogate Endpoint Validation
• Correlate - statistically correlated to the clinical endpoint
• Capture - the surrogate endpoint should account for all of an intervention’s effects
(Prentice, R. L. Statistics in Medicine 1989; 8: 431–440)

Ideal Surrogate
Guide to Clinical Trials:
• “The ideal surrogate endpoint is a disease marker that reflects what is happening with the underlying disease. The relationship between the marker and the true endpoint is important to establish…. the validity of data based on how the marker is affected by a medicine/treatment can be translated into a valid statement about the disease and true endpoint”
(Spilker, B. (1991). Guide to clinical trials. New York: Raven Press)
[Figure: diagram relating Time, Intervention, Disease, Surrogate Endpoint, and True Clinical Outcome]
(Fleming TR and DeMets DL. Ann Intern Med. 1996; 125(7):605-13)

Advantages of Surrogate Endpoints
• Expedite the approval of new drugs/indications that treat serious/life-threatening illnesses and are expected to provide a meaningful therapeutic benefit over existing therapy
• Smaller and shorter trials = ↓ $$, ↓ time

Potential Surrogate Endpoints in Oncology
• Response rate
• Molecular endpoint assessment
• Functional Imaging
• Tumor marker
• Biomarker

Surrogate Endpoint – Accelerated Approval
• From 1992 to 2008, the FDA approved 90 applications for drugs based on surrogate endpoints through its accelerated approval process
• 79 of the 90 were for drugs to treat cancer, HIV/AIDS, and inhalational anthrax
• Approval is given on the condition that postmarketing trials be performed to verify clinical benefit (aka phase IV confirmatory trials)
• If confirmatory trial(s) fail, the drug or indication could be removed from the market

Examples of Cancer Drugs Approved by the FDA under Accelerated Approval Process from 1992 – 2008 Using Surrogate Endpoint(s)
Drug Name | Approval Date | Approval Indication | Surrogate Endpoint(s) Used
Bicalutamide | Oct. 4, 1995 | Combination therapy for the treatment of advanced prostate cancer | Time to treatment failure
Docetaxel | May 14, 1996 | Treatment of locally advanced or metastatic breast cancer in specific patients | Response rate
Irinotecan | June 14, 1996 | Treatment of metastatic carcinoma of the colon or rectum in certain circumstances | Response rate
Capecitabine | April 30, 1998 | Treatment of a specific type of metastatic breast cancer in certain patients | Response rate
Temozolomide | Aug. 11, 1999 | Treatment of refractory anaplastic astrocytoma in specific adult patients | Progression free survival at 6 months and objective response
Imatinib | May 10, 2001 | Treatment of chronic myeloid leukemia in certain circumstances | Hematologic/cytogenetic response
Letrozole | Oct. 29, 2004 | Extended adjuvant treatment of early breast cancer in specific postmenopausal women | Disease free survival
Nilotinib | Oct. 29, 2007 | Treatment of chronic phase and accelerated phase Philadelphia chromosome positive chronic myelogenous leukemia in specific adult patients | Major cytogenetic response and hematologic response
Bevacizumab | Feb. 22, 2008 | Treatment of breast cancer in specific patients | Progression-free survival

Examples of Cancer Drugs Approved by the FDA under Traditional Approval Process from 1992 – 2008 Using Surrogate Endpoint(s)
Drug Name | Approval Date | Approval Indication | Surrogate Endpoint(s) Used
Exemestane | Oct. 21, 1999 | Treatment of advanced breast cancer in postmenopausal women | Objective response rate (partial and complete)
Sorafenib | Dec. 20, 2005 | Treatment of advanced renal cell carcinoma | Progression free survival
Sunitinib | Jan. 26, 2006 | Treatment of gastrointestinal stromal tumor and advanced renal cell carcinoma | Time to progression
Lapatinib | Mar. 13, 2007 | Treatment of advanced metastatic breast cancer | Time to progression
Ixabepilone | Oct. 16, 2007 | Treatment of metastatic or locally advanced breast cancer | Progression free survival
Bendamustine | Mar. 20, 2008 | Treatment of chronic lymphocytic leukemia | Objective response and progression-free survival
(Source: GAO-09-866 New Drug Approval: FDA Needs to Enhance Its Oversight of Drugs Approved on the Basis of Surrogate Endpoints (2009).
Accessed from: www.gao.gov. 3 December, 2012)

Which of the following is an acceptable surrogate endpoint for a phase III trial to investigate if Drug XYZ improves the outcomes in colon cancer patients in an adjuvant setting?
a) Complete response
b) Disease free survival
c) Stable disease
d) Time to progression

Which of the following is the least preferred regulatory endpoint for drug approval in oncology?
a) Time to treatment failure (TTF)
b) Disease free survival (DFS)
c) Progression free survival (PFS)
d) Time to progression (TTP)

Statistical Tests – Which One?
Key questions to ask:
1. What type of data (nominal, ordinal or continuous)?
2. Are the samples we are comparing independent or related (e.g., cross-over study)?
3. How many samples/groups are we comparing (2 or more)?
4. Are the data normally distributed?
Others: Equal variance? Confounders?

Common Statistical Tests
Nominal (“categorical” data)
- 2 samples (independent): Pearson Chi-squared (χ2) Test; Fisher's Exact Test
- 2 related samples: McNemar Test
- > 2 independent samples: Chi-squared (χ2) for k independent samples
Ordinal (ranked data)
- 2 samples (independent): Mann-Whitney U Test (non-parametric); Wilcoxon Rank-Sum Test (non-parametric); Kolmogorov-Smirnov Test (non-parametric)
- 2 related samples: Sign Test; Wilcoxon Signed-Rank Test
- > 2 independent samples: Kruskal-Wallis
Continuous (interval data)
- 1 sample: Z-Test (population SD known OR n is large (> 30)); t-Test (population SD unknown AND n is small (< 30))
- 2 samples (independent): Student t-Test (parametric); Welch Test (like the Student t-Test, but for unequal variances); Mann-Whitney U Test (non-parametric)
- 2 related samples: Paired t-Test
- > 2 independent samples: One-Way Analysis of Variance (ANOVA); Analysis of Variance (ANOVA)
n = sample size; SD = standard deviation

Types of Error
• α (alpha) = Type I
- “False positive”, “an innocent person goes to jail”
- Occurs when there really is no difference but random sampling error caused the data to show statistical significance
- H0 was rejected but should not have been
• β (beta) = Type II [*power = (1-β)]
- “False negative”, “set the guilty free”
- Occurs when there really is a difference but random sampling error caused the data to fail to show statistical significance
- Failed to reject H0 but should have rejected it

Type I and II Errors: Analogy
Justice System
- Guilty verdict when the defendant is innocent = Type I
- Not guilty verdict when the defendant is guilty = Type II
Statistical Testing
- Reject H0 when H0 is true = Type I
- Fail to reject H0 when H0 is false = Type II

If the significance level, α, is decreased (e.g., from 0.05 to 0.01), then the chance of a Type II error will be
a) Increased
b) Decreased
c) Unchanged

So What…
[Figure: sampling distributions under “H0 is true” and “Ha is true”, with the Type I error, Type II error, and Power (1 - Type II) regions marked]
• Set α = 0.05, N = N1, difference between means = D
• Decrease N, same difference between means: N = N2 (where N2 < N1)
Effects = increase in Type II error = decrease in Power
• Reverse in effects: 2-fold increase in the difference between means, N = N2
Effects = decrease in Type II error = increase in Power
(Applet on http://www.intuitor.com/statistics/T1T2Errors.html)

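The effects described above (stricter α, smaller N, larger difference) can be checked numerically. The sketch below is only an illustration, assuming a two-sided, two-sample z-test with a known common SD; the function name and all numeric inputs (difference of 5, SD of 10, 64 or 32 per group) are hypothetical and do not come from the slides.

```python
from statistics import NormalDist

def power_two_sample_z(diff, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a difference in means.

    Assumes equal group sizes and a known common standard deviation (sigma).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # critical value for the chosen alpha
    se = sigma * (2 / n_per_group) ** 0.5      # standard error of the difference
    shift = abs(diff) / se                     # true difference in SE units
    # Probability the test statistic lands beyond either critical value under Ha
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

# Hypothetical inputs, for illustration only
base = power_two_sample_z(diff=5, sigma=10, n_per_group=64)
stricter_alpha = power_two_sample_z(5, 10, 64, alpha=0.01)   # alpha 0.05 -> 0.01
smaller_n = power_two_sample_z(5, 10, 32)                    # N2 < N1
bigger_diff = power_two_sample_z(10, 10, 32)                 # 2-fold difference, N = N2

for label, value in [("base", base), ("alpha=0.01", stricter_alpha),
                     ("smaller N", smaller_n), ("2x difference", bigger_diff)]:
    print(f"{label:>14}: power = {value:.3f}, Type II error = {1 - value:.3f}")
```

Under these assumptions, lowering α or N lowers power (raises Type II error), while doubling the difference raises power again.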
Sample Size Calculation
Comparing 2 means of 2 independent samples
Sample size “n” for testing two means:
$$ n = \frac{2\,(Z_{\alpha} + Z_{\beta})^{2}\,\sigma^{2}}{\delta^{2}} $$
α = type I error;
β = type II error;
δ = critical difference;
σ² = variance
** The population SD, σ, is needed in the sample size formula. This typically unknown value may be estimated (i) from historical data, or (ii) from a previous study (e.g., phase II)

Sample Size Calculation
Comparing 2 proportions of 2 independent samples
• For 1-sided test, [formula not preserved in the source]
• For 2-sided test, [formula not preserved in the source]
α = type I error;
β = type II error;
p0 = hypothesized population proportion or rate;
p1 = alternative hypothesis.

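As a rough illustration of the two-means formula n = 2(Zα + Zβ)²σ²/δ², the sketch below computes the per-group sample size. It assumes the usual convention of using Z at α/2 for a 2-sided test and Z at α for a 1-sided test; the function name and the inputs (σ = 10, δ = 5) are hypothetical and not taken from the slides.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_two_means(sigma, delta, alpha=0.05, power=0.80, two_sided=True):
    """Per-group n from n = 2 * (Z_alpha + Z_beta)^2 * sigma^2 / delta^2."""
    z = NormalDist()
    # Assumption: Z_alpha is taken at alpha/2 for a two-sided test
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)                  # power = 1 - beta
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)                             # round up to whole subjects

# Hypothetical inputs: SD of 10 (e.g., from a phase II study), critical difference of 5
n_two_sided = n_per_group_two_means(sigma=10, delta=5)
n_one_sided = n_per_group_two_means(sigma=10, delta=5, two_sided=False)
print(n_two_sided, n_one_sided)
```

The 1-sided design needs fewer subjects per group than the 2-sided design for the same α, power, σ, and δ.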
Sample Size Calculation
A randomized trial is proposed to assess the effectiveness of HOPAtinib compared to Standard-of-Care (SOC) for the treatment of patients with locally advanced or metastatic non-small cell lung cancer (NSCLC) that is HOPA-positive. A previous study showed that the proportion of subjects cured by HOPAtinib is 50%, and a clinically important difference of 15% as compared to SOC is targeted.
α = 5%, Power = 80%, 2-sided test. What is the n needed?

Sample Size Calculation
p1 = proportion of subjects cured by HOPAtinib = 0.50,
p2 = proportion of subjects cured by SOC = 0.35,
p1-p2 = clinically significant difference = 0.15
Zα/2 = 1.96 (refer to z table in any stats text)
Zβ = 0.84 (refer to z table in any stats text)
Based on the above formula, the sample size required per group is 167. Hence the total sample size required is 334 (~340).

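The HOPAtinib numbers can be reproduced with a short calculation. The exact formula used on the slide is not shown in the text, so this sketch assumes the common unpooled form n = (Zα/2 + Zβ)² [p1(1-p1) + p2(1-p2)] / (p1-p2)², which gives the stated 167 per group; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a 2-sided comparison of two independent proportions.

    Assumed formula (unpooled variance):
    n = (Z_{alpha/2} + Z_beta)^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)         # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)                  # about 0.84 for 80% power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)

n_group = n_per_group_two_proportions(p1=0.50, p2=0.35)
n_total = 2 * n_group
print(n_group, n_total)  # 167 334
```

Other sample size formulas for two proportions (for example, pooled-variance or continuity-corrected versions) give slightly different answers, so the formula should always be stated alongside the result.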
Non-inferiority (NI) Trials
Rationale for NI Trial:
• To determine whether a new treatment is no worse than a reference treatment by more than a pre-defined margin
• NI ≠ the 2 treatments/drugs are equivalent
• NI ≠ the new drug is not inferior to standard therapy
• Take into consideration: the new treatment may not be better than the standard but may have other advantages, e.g., cost, toxicity profile, invasiveness, etc.

Non-inferiority (NI) Trials
E = experimental therapy
S = standard therapy
∆ = the margin (difference between E and S)
µ = mean
H0 for NI in layman's terms:
• E is NOT non-inferior to S (a double negative statement)
Ha: E IS non-inferior to S
• Therefore, if p < 0.05, we reject the null hypothesis and conclude E is non-inferior to S by the pre-defined margin
• For p > 0.05, we fail to reject the null hypothesis, i.e., non-inferiority of E to S has not been demonstrated

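A minimal sketch of how the NI decision rule can be computed, using the symbols E, S, ∆, and µ from the slide. The hypotheses are not written out in the source, so this assumes the common form H0: µE - µS ≤ -∆ versus Ha: µE - µS > -∆ (higher values are better) and a one-sided z-test with known SDs. The function name and all numbers are hypothetical.

```python
from statistics import NormalDist

def non_inferiority_p_value(mean_e, mean_s, sd_e, sd_s, n_e, n_s, margin):
    """One-sided p-value for H0: mu_E - mu_S <= -margin (E is NOT non-inferior).

    Assumes higher outcome values are better and the SDs are known (z-test).
    """
    se = (sd_e ** 2 / n_e + sd_s ** 2 / n_s) ** 0.5
    z_stat = (mean_e - mean_s + margin) / se   # distance of the estimate above -margin
    return 1 - NormalDist().cdf(z_stat)

# Hypothetical trial: E scores 0.5 lower than S on average, margin of 2
p_wide_margin = non_inferiority_p_value(9.5, 10.0, 4.0, 4.0, 200, 200, margin=2.0)
# Same data, but a margin equal to the observed deficit
p_tight_margin = non_inferiority_p_value(9.5, 10.0, 4.0, 4.0, 200, 200, margin=0.5)

for label, p in [("margin = 2.0", p_wide_margin), ("margin = 0.5", p_tight_margin)]:
    verdict = "reject H0: E is non-inferior" if p < 0.05 else "fail to reject H0"
    print(f"{label}: p = {p:.4f} -> {verdict}")
```

The same data support non-inferiority under one margin and not under another, which is why the choice of margin needs to be justified before the trial.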
NI Checklists
• In NI trials, ITT analysis will often increase the risk of type I error (falsely rejecting H0). Look for a “per-protocol” analysis, or both analyses
• Is the defined margin reasonable?
• Are the 2 arms set up “fairly” (equipotent dosing, stopping rules, etc.)?

Biomarker
• Defined as a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or biological responses to a therapeutic intervention
(FDA. Drug Development Tools Qualification Program. Accessed from: http://www.fda.gov/Drugs/DevelopmentApprovalProcess/DrugDevelopmentToolsQualificationProgram/ucm284395.htm. 30 November, 2012)
Examples:
Cholesterol, serum creatinine, blood sugar, tumor size from magnetic resonance imaging (MRI) or computed tomography (CT)

Biomarker
• May be a physiologic, pathologic, or anatomic characteristic/measurement associated with some aspect of normal or abnormal biologic function/process
• Change in biomarkers post treatment may predict/identify safety issues due to a drug
• Change in biomarkers post treatment may reveal a pharmacological activity expected to predict an eventual benefit from treatment

Prognostic Biomarkers
• A prognostic factor is any patient or disease characteristic that has a significant impact on a clinical endpoint.
Example: performance status (PS)
• Usually expressed in terms of the relative hazard of failure
• Used to decide between treatment and no treatment following surgery
• Used to decide how aggressive treatment should be

Predictive Biomarkers
• Associated with response (benefit) or lack of response (benefit) to a particular therapy relative to other available therapy
• In statistics ~ “interaction effect”
• Used to select one treatment vs. another
• Example: HER2/neu overexpression and response to trastuzumab

Types of Validation for Prognostic and Predictive Biomarkers
• Analytical validation
- Accuracy compared to gold-standard assay
- Robust and reproducible
• Clinical validation
- Able to predict independent data
• Clinical/medical utility
- Result in patient benefit
- Actionable
- Improving treatment decisions

Need for Prognostic & Predictive Biomarkers
• Available treatments are:
- Not effective
- Unable to predict which patients are likely to benefit
• Control medical costs
• Improve the success rate of clinical drug development

(JNCI 2009; 101:736-75)

Cheang, et al. (cont’)
Background:
• New technology (gene expression profiling) has enabled new molecular classification of breast cancer subtypes
• The ER(+) subtypes, Luminal A and Luminal B, have different characteristics and prognosis
• The Luminal B subtype confers an increased risk of early relapse compared with Luminal A

Cheang, et al., 2009 – Methods
• 357 patient tumors with invasive breast carcinoma were subtyped by gene expression profile
- ER and PR status, HER2 status, and the Ki67 index (% of Ki67(+) cancer nuclei) were determined by IHC
- Pre-specified Ki67 cut point to distinguish luminal B from luminal A tumors
- The prognostic value of the IHC assignment for breast cancer (BrCa) recurrence-free and disease-specific survival was investigated with an independent tissue microarray of 4046 breast cancers by use of Kaplan-Meier curves and multivariable Cox regression

Cheang, et al., 2009 – Results
Subtype | Relapsed | No Relapse | Total
Luminal A | 151 | 474 | 625
Luminal B | 86 | 177 | 263
Luminal/HER2+ | 20 | 35 | 55
Total | 257 | 686 | 943

Common Point Estimates
• Point estimates commonly seen (and misunderstood) in clinical oncology
- Odds ratio
- Risk difference
- Relative risk (aka risk ratio)
- Hazard ratio
Which one should be used? What is the difference?

What is the Relative Risk (RR) for relapse between Luminal B vs. A?
RR = (86/263) / (151/625) = 1.35

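To make the difference between these point estimates concrete, the sketch below computes the relative risk, odds ratio, and risk difference for relapse in Luminal B vs. Luminal A from the counts in the results table (86 of 263 vs. 151 of 625). The function name is hypothetical; the hazard ratio is omitted because it requires time-to-event data, which the table does not contain.

```python
def point_estimates(events_exposed, total_exposed, events_reference, total_reference):
    """Relative risk, odds ratio, and risk difference from two groups' event counts."""
    risk_exposed = events_exposed / total_exposed
    risk_reference = events_reference / total_reference
    odds_exposed = events_exposed / (total_exposed - events_exposed)
    odds_reference = events_reference / (total_reference - events_reference)
    return {
        "relative_risk": risk_exposed / risk_reference,
        "odds_ratio": odds_exposed / odds_reference,
        "risk_difference": risk_exposed - risk_reference,
    }

# Luminal B (86 relapses of 263) vs. Luminal A (151 relapses of 625)
estimates = point_estimates(86, 263, 151, 625)
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

With these counts the odds ratio is noticeably larger than the relative risk, because relapse is not a rare event in either group.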
• Relative risk (RR) is meaningless for case-control studies! Use odds ratios (OR) instead
• RR cannot generally be calculated in a case-control study because the entire population has not been studied, so incidences are unknown. When the incidence is low, however, the OR approximates the RR:
RR = [a/(a+b)] / [c/(c+d)]
RR ≅ (a/b) / (c/d) ≅ OR
a, c = “event”

Hazard Ratio
~ “RR-averaged-over-time”
• Assumes the relative risk (the ratio of hazards) does not depend on time; it is constant over time
• Example: HR = 0.7 comparing patients in the temsirolimus group vs. interferon-α
=> Patients in the temsirolimus group are at 0.7 times the risk of death of those in the interferon-α arm, at any given point in time

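The approximation RR ≅ OR at low incidence can be seen with a small numerical check. This is an illustrative sketch only: it assumes a and c are event counts and b and d are the matching non-event counts in the two groups (the slide defines only a and c), and all counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)], with a, c = events and b, d = non-events (assumed)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d)."""
    return (a / b) / (c / d)

# Hypothetical rare outcome: 1% vs. 0.5% incidence
rare = (10, 990, 5, 995)
# Hypothetical common outcome: 40% vs. 20% incidence
common = (400, 600, 200, 800)

for label, counts in [("rare", rare), ("common", common)]:
    print(f"{label}: RR = {relative_risk(*counts):.3f}, OR = {odds_ratio(*counts):.3f}")
```

Both scenarios have an RR of 2, but the OR stays close to 2 only when the outcome is rare.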
Understanding Diagnostic Test
Predicted Value | Actual Positive | Actual Negative | Total Number
Positive | TP | FP | PP
Negative | FN | TN | PN
Total number | AP | AN | N
Sensitivity = True Positive Rate = TP/AP
Specificity = True Negative Rate = TN/AN
Accuracy = (TP + TN) / N
Positive Predictive Value (PPV) = Precision = TP / PP
Negative Predictive Value (NPV) = TN / PN
F1 score (aka F-score) = a measure of a test's accuracy. It considers both the sensitivity and precision (“the true positive factor”) = 2 x TP / (PP + AP)

Comparing ROC curves
• Comparing tests
[Figure: ROC curves labeled Best, Better, Good, and Worthless]
Shows the tradeoff between sensitivity and specificity (↑ sensitivity = ↓ specificity)

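The definitions above translate directly into code. The sketch below is illustrative only; the function name and the counts (TP = 90, FP = 30, FN = 10, TN = 170) are hypothetical and not from the presentation.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Summary measures of a diagnostic test from a 2x2 table of counts."""
    ap, an = tp + fn, fp + tn          # actual positives / actual negatives
    pp, pn = tp + fp, fn + tn          # predicted positives / predicted negatives
    n = ap + an
    return {
        "sensitivity": tp / ap,        # true positive rate
        "specificity": tn / an,        # true negative rate
        "accuracy": (tp + tn) / n,
        "ppv": tp / pp,                # precision
        "npv": tn / pn,
        "f1": 2 * tp / (pp + ap),
    }

# Hypothetical counts for illustration
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=170)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the test itself, whereas PPV and NPV also depend on how common the disease is in the tested population.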
Example: T4 cut-off
[Goldstein and Mushlin (J Gen Intern Med 1987;2:20-24.)]
Test Results
Determine Sensitivity and Specificity
…..
ROC plot

Prognostic vs. Predictive Biomarkers
Phase III Study Design
• Treatment-by-biomarker interactions
• Enrichment design
• Completely randomized design
• Randomized block design
• Biomarker-strategy design
• Prospective vs. retrospective

Targeted (Enrichment) Design
• Predictive marker study design: enrichment design
• Only marker(+) patients are randomized and/or treated
• Example from NSCLC: EGFR mutation as a predictive biomarker
(Mok et al. N Engl J Med; 361: 947-57)

Targeted (Enrichment) Design
[Figure: Develop Predictor of Response to New Drug; Patient Predicted Responsive: New Drug or Control; Patient Predicted Non-Responsive: Off Study]

Biomarker Stratified Design
• Do not use the biomarker to restrict eligibility
• Purpose: To evaluate the new treatment overall and for the pre-defined population
[Figure: Predictive biomarker for New Tx; Predicted Responders: New Tx or Control; Predicted Non-responders: New Tx or Control]

Examples of successful stories…
Drug | Disease | Target | Response Rate in Phase Ib/II studies (%) | Reference
Crizotinib | NSCLC | EML4-ALK | 57 | Kwak, et al., 2010
Imatinib | CML | BCR-ABL | 95 | Kantarjian, et al., 2002
Imatinib | GI stromal | KIT | 54 | Demetri, et al., 2002

For which of the following designs should a relative risk not be computed, because the prevalence of the disease is artificially constrained?
a) Randomized, double-blinded prospective study
b) Case-control study
c) Prospective cohort study
d) Cross-sectional study
e) Poll the audience…

Take Home Messages
• The only way to definitively determine treatment effectiveness is an RCT that has
- Intent-to-treat procedures and analysis
- Very little loss of follow-up data
- No other threats (randomization, blinding)
- Non-adherence is bad, but loss to follow-up is much worse
- Loss before randomization is OK, loss after randomization is not

Surrogate is a biomarker that
a) is intended to substitute for a clinical endpoint
b) can be used to reasonably predict clinical benefit
c) is in the causal pathway of disease process
d) requires validation from many randomized trials
e) All of the above

Thank you!