HOW TO RUN LOGISTIC REGRESSION WITH IBM SPSS-20 VERSION

http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Fidh_lreg_val.htm
Logistic regression is useful for situations in which you want to be able to predict the
presence or absence of a characteristic or outcome based on values of a set of predictor
variables. It is similar to a linear regression model but is suited to models where the
dependent variable is dichotomous. Logistic regression coefficients can be used to estimate
odds ratios for each of the independent variables in the model. Logistic regression is
applicable to a broader range of research situations than discriminant analysis.
Statistics.
For each analysis: total cases, selected cases, valid cases. For each categorical variable:
parameter coding.
For each step: variable(s) entered or removed, iteration history, –2 log-likelihood,
goodness of fit, Hosmer-Lemeshow goodness-of-fit statistic, model chi-square,
improvement chi-square, classification table, correlations between variables, observed
groups and predicted probabilities chart, residual chi-square.
For each variable in the equation: coefficient (B), standard error of B, Wald statistic,
estimated odds ratio (exp(B)), confidence interval for exp(B), log-likelihood if term
removed from model. For each variable not in the equation: score statistic. For each
case: observed group, predicted probability, predicted group, residual, standardized
residual.
Methods. You can estimate models using block entry of variables or any of the following
stepwise methods:
forward conditional,
forward LR,
forward Wald,
backward conditional,
backward LR, or
backward Wald.
Binary logistic regression is most useful when you want to model the event probability for
a categorical response variable with two outcomes. For example:
• A catalog company wants to increase the proportion of mailings that result in
sales.
• A doctor wants to accurately diagnose a possibly cancerous tumor.
• A loan officer wants to know whether the next customer is likely to default.
Another example: What lifestyle characteristics are risk factors for coronary heart disease
(CHD)? Given a sample of patients measured on smoking status, diet, exercise, alcohol use,
and CHD status, you could build a model using the four lifestyle variables to predict the
presence or absence of CHD in a sample of patients. The model can then be used to derive
estimates of the odds ratios for each factor to tell you, for example, how much more likely
smokers are to develop CHD than nonsmokers.
Using the Binary Logistic Regression procedure, the catalog company can send mailings to
the people who are most likely to respond, the doctor can determine whether the tumor is
more likely to be benign or malignant, and the loan officer can assess the risk of extending
credit to a particular customer.
The Logistic Regression Model
In the logistic regression model, the relationship between Z and the probability of the event
of interest is described by this link function.
\pi_i = \frac{e^{z_i}}{1 + e^{z_i}} = \frac{1}{1 + e^{-z_i}}, \quad \text{or} \quad z_i = \log\left(\frac{\pi_i}{1 - \pi_i}\right)

where
\pi_i is the probability that the ith case experiences the event of interest
z_i is the value of the unobserved continuous variable for the ith case
The model also assumes that Z is linearly related to the predictors.
z_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \ldots + b_p x_{ip}

where
x_{ij} is the jth predictor for the ith case
b_j is the jth coefficient
p is the number of predictors
If Z were observable, you would simply fit a linear regression to Z and be done. However,
since Z is unobserved, you must relate the predictors to the probability of interest by
substituting for Z.
\pi_i = \frac{1}{1 + e^{-(b_0 + b_1 x_{i1} + \ldots + b_p x_{ip})}}
The regression coefficients are estimated through an iterative maximum likelihood method.
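As a rough illustration of the link function, the following COMPUTE sketch shows how a fitted model's coefficients could be turned into a predicted probability by hand. The coefficient values (-1.5 and 0.08) are hypothetical, not taken from this example; in practice you would substitute values from your own Variables in the Equation table, or simply let the procedure save predicted probabilities via the Save dialog.

* Hypothetical coefficients for illustration only.
COMPUTE z = -1.5 + 0.08*age.
COMPUTE prob = 1/(1 + EXP(-z)).
EXECUTE.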
Set Rule
Cases defined by the selection rule are included in model estimation. For example, if you
select a variable, choose the relation equals, and specify a value of 5, then only the cases for
which the selected variable has a value equal to 5 are included in estimating the model.
Statistics and classification results are generated for both selected and unselected cases.
This provides a mechanism for classifying new cases based on previously existing data, or
for partitioning your data into training and testing subsets, to perform validation on the
model generated.
This feature requires the Regression option.
From the menus choose:
Analyze > Regression > Binary Logistic…
In the Logistic Regression dialog box, click Select.
Choose a selection variable and then click Rule.
Define a selection rule for selecting a subset of cases for analysis. The name of the
selected variable appears on the left.
Select a relation. The available relations are equals, not equal to, less than, less than or
equal to, greater than, and greater than or equal to.
Enter a value.
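In command syntax the same rule is expressed with the SELECT subcommand. A minimal sketch, assuming a hypothetical dependent variable y, predictors x1 and x2, and a selection variable grpvar tested against the value 5:

LOGISTIC REGRESSION VAR=y
  /SELECT grpvar EQ 5
  /METHOD=ENTER x1 x2.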
Variable Selection Methods
Method selection allows you to specify how independent variables are entered into the
analysis. Using different methods, you can construct a variety of regression models from
the same set of variables.
• Enter. A procedure for variable selection in which all variables in a block are
entered in a single step.
• Forward Selection (Conditional). Stepwise selection method with entry testing
based on the significance of the score statistic, and removal testing based on the
probability of a likelihood-ratio statistic based on conditional parameter
estimates.
• Forward Selection (Likelihood Ratio). Stepwise selection method with entry
testing based on the significance of the score statistic, and removal testing based
on the probability of a likelihood-ratio statistic based on the maximum partial
likelihood estimates.
• Forward Selection (Wald). Stepwise selection method with entry testing based
on the significance of the score statistic, and removal testing based on the
probability of the Wald statistic.
• Backward Elimination (Conditional). Backward stepwise selection. Removal
testing is based on the probability of the likelihood-ratio statistic based on
conditional parameter estimates.
• Backward Elimination (Likelihood Ratio). Backward stepwise selection.
Removal testing is based on the probability of the likelihood-ratio statistic based
on the maximum partial likelihood estimates.
• Backward Elimination (Wald). Backward stepwise selection. Removal testing is
based on the probability of the Wald statistic.
The significance values in your output are based on fitting a single model. Therefore, the
significance values are generally invalid when a stepwise method is used.
All independent variables selected are added to a single regression model. However, you
can specify different entry methods for different subsets of variables. For example, you can
enter one block of variables into the regression model using stepwise selection and a
second block using forward selection. To add a second block of variables to the regression
model, click Next.
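In pasted syntax, each block becomes an additional METHOD subcommand on the same LOGISTIC REGRESSION command. A sketch using predictors from the bankloan.sav example later in this document; the particular split into blocks is illustrative only:

LOGISTIC REGRESSION VAR=default
  /METHOD=ENTER age employ
  /METHOD=FSTEP(LR) income debtinc.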
Define Categorical Variables
You can specify details of how the Logistic Regression procedure will handle categorical
variables:
Covariates. Contains a list of all of the covariates specified in the main dialog box,
either by themselves or as part of an interaction, in any layer. If some of these are
string variables or are categorical, you can use them only as categorical covariates.
Categorical Covariates. Lists variables identified as categorical. Each variable
includes a notation in parentheses indicating the contrast coding to be used. String
variables (denoted by the symbol < following their names) are already present in
the Categorical Covariates list. Select any other categorical covariates from the
Covariates list and move them into the Categorical Covariates list.
Change Contrast. Allows you to change the contrast method. Available contrast
methods are:
• Indicator. Contrasts indicate the presence or absence of category membership.
The reference category is represented in the contrast matrix as a row of zeros.
• Simple. Each category of the predictor variable (except the reference category)
is compared to the reference category.
• Difference. Each category of the predictor variable except the first category is
compared to the average effect of previous categories. Also known as reverse
Helmert contrasts.
• Helmert. Each category of the predictor variable except the last category is
compared to the average effect of subsequent categories.
• Repeated. Each category of the predictor variable except the first category is
compared to the category that precedes it.
• Polynomial. Orthogonal polynomial contrasts. Categories are assumed to be
equally spaced. Polynomial contrasts are available for numeric variables only.
• Deviation. Each category of the predictor variable except the reference
category is compared to the overall effect.
If you select Deviation, Simple, or Indicator, select either First or Last as the reference
category. Note that the method is not actually changed until you click Change.
String covariates must be categorical covariates. To remove a string variable from the
Categorical Covariates list, you must remove all terms containing the variable from the
Covariates list in the main dialog box.
This feature requires the Regression option.
From the menus choose:
Analyze > Regression > Binary Logistic…
In the Logistic Regression dialog box, select at least one variable in the Covariates
list and then click Categorical.
In the Categorical Covariates list, select the covariate(s) whose contrast method
you wish to change. You can change multiple covariates simultaneously.
Select a contrast method from the Method drop-down list.
Click Change.
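The pasted equivalent is the CONTRAST subcommand. A sketch, assuming ed (level of education) is treated as categorical with indicator coding and the first category as reference; the reference category is given as a sequence number in parentheses, so check your own pasted syntax if in doubt:

LOGISTIC REGRESSION VAR=default
  /METHOD=ENTER age ed employ
  /CONTRAST (ed)=Indicator(1).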
Save New Variables
You can save results of the logistic regression as new variables in the active dataset:
Predicted Values. Saves values predicted by the model. Available options are Probabilities
and Group membership.
• Probabilities. For each case, saves the predicted probability of occurrence of the
event. A table in the output displays name and contents of any new variables. The
"event" is the category of the dependent variable with the higher value; for
example, if the dependent variable takes values 0 and 1, the predicted probability
of category 1 is saved.
• Predicted Group Membership. The group with the largest posterior probability,
based on discriminant scores. The group the model predicts the case belongs to.
Influence. Saves values from statistics that measure the influence of cases on predicted
values. Available options are Cook's, Leverage values, and DfBeta(s).
• Cook's. The logistic regression analog of Cook's influence statistic. A measure of
how much the residuals of all cases would change if a particular case were
excluded from the calculation of the regression coefficients.
• Leverage Value. The relative influence of each observation on the model's fit.
• Df Beta(s). The difference in beta value is the change in the regression coefficient
that results from the exclusion of a particular case. A value is computed for each
term in the model, including the constant.
Residuals. Saves residuals. Available options are Unstandardized, Logit, Studentized,
Standardized, and Deviance.
• Unstandardized Residuals. The difference between an observed value and the
value predicted by the model.
• Logit Residual. The residual for the case if it is predicted on the logit scale. The
logit residual is the residual divided by the product of the predicted probability and
1 minus the predicted probability.
• Studentized Residual. The change in the model deviance if a case is excluded.
• Standardized Residuals. The residual divided by an estimate of its standard
deviation. Standardized residuals, which are also known as Pearson residuals,
have a mean of 0 and a standard deviation of 1.
• Deviance. Residuals based on the model deviance.
Export model information to XML file. Parameter estimates and (optionally) their
covariances are exported to the specified file in XML (PMML) format. You can use this
model file to apply the model information to other data files for scoring purposes. See the
topic Scoring Wizard for more information.
This feature requires the Regression option.
From the menus choose:
Analyze > Regression > Binary Logistic…
In the Logistic Regression dialog box, click Save.
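In syntax these choices map to keywords on the SAVE subcommand. A sketch requesting predicted probabilities, predicted group membership, Cook's statistic, leverage, DfBetas, and the residual types listed above; keyword spellings follow the LOGISTIC REGRESSION syntax reference, so verify against your own pasted syntax:

LOGISTIC REGRESSION VAR=default
  /METHOD=ENTER age employ income
  /SAVE PRED PGROUP COOK LEVER DFBETA RESID LRESID SRESID ZRESID DEV.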
Options
You can specify options for your logistic regression analysis:
Statistics and Plots. Allows you to request statistics and plots. Available options are
Classification plots, Hosmer-Lemeshow goodness-of-fit, Casewise listing of residuals,
Correlations of estimates, Iteration history, and CI for exp(B). Select one of the alternatives
in the Display group to display statistics and plots either At each step or, only for the final
model, At last step.
• Hosmer-Lemeshow goodness-of-fit statistic. This goodness-of-fit statistic is
more robust than the traditional goodness-of-fit statistic used in logistic
regression, particularly for models with continuous covariates and studies with
small sample sizes. It is based on grouping cases into deciles of risk and comparing
the observed probability with the expected probability within each decile.
Probability for Stepwise. Allows you to control the criteria by which variables are entered
into and removed from the equation. You can specify criteria for Entry or Removal of
variables.
• Probability for Stepwise. A variable is entered into the model if the probability of
its score statistic is less than the Entry value and is removed if the probability is
greater than the Removal value. To override the default settings, enter positive
values for Entry and Removal. Entry must be less than Removal.
Classification cutoff. Allows you to determine the cut point for classifying cases. Cases
with predicted values that exceed the classification cutoff are classified as positive, while
those with predicted values smaller than the cutoff are classified as negative. To change the
default, enter a value between 0.01 and 0.99.
Maximum Iterations. Allows you to change the maximum number of times that the model
iterates before terminating.
Include constant in model. Allows you to indicate whether the model should include a
constant term. If disabled, the constant term will equal 0.
This feature requires the Regression option.
From the menus choose:
Analyze > Regression > Binary Logistic…
In the Logistic Regression dialog box, click Options.
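The same options appear in pasted syntax on the PRINT, CLASSPLOT, and CRITERIA subcommands. A sketch in which the CRITERIA values are the defaults, CI(95) requests 95% confidence intervals for Exp(B), and ITER(1) prints the iteration history at every iteration:

LOGISTIC REGRESSION VAR=default
  /METHOD=ENTER age employ income
  /CLASSPLOT
  /PRINT=GOODFIT CORR ITER(1) CI(95)
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).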
Additional Features
The command syntax language also allows you to:
• Identify casewise output by the values or variable labels of a variable.
• Control the spacing of iteration reports. Rather than printing parameter estimates
after every iteration, you can request parameter estimates after every nth
iteration.
• Change the criteria for terminating iteration and checking for redundancy.
• Specify a variable list for casewise listings.
• Conserve memory by holding the data for each split file group in an external
scratch file during processing.
Example: Using Binary Logistic Regression to Assess Credit Risk
If you are a loan officer at a bank, then you want to be able to identify characteristics that
are indicative of people who are likely to default on loans, and use those characteristics to
identify good and bad credit risks.
Suppose information on 850 past and prospective customers is contained in bankloan.sav,
located in the SPSS 20 samples folder: C:\Program Files\IBM\SPSS\Statistics\20\Samples\English.
The first 700 cases are customers who were previously given loans. Use a
random sample of these 700 customers to create a logistic regression model, setting the
remaining customers aside to validate the analysis. Then use the model to classify the 150
prospective customers as good or bad credit risks.
Setting the random seed allows you to replicate the random selection of cases in this
analysis.
► To set the random seed, from the menus choose:
Transform > Random Number Generators...
Preparing the Data for Analysis
► Select Set Starting Point.
► Select Fixed Value and type 9191972 as the value.
► Click OK.
► To create the selection variable for validation, from the menus choose:
Transform > Compute Variable...
► Type validate in the Target Variable text box.
► Type rv.bernoulli(0.7) in the Numeric Expression text box.
This sets the values of validate to be randomly generated Bernoulli variates with
probability parameter 0.7.
You only intend to use validate with cases that could be used to create the model; that is,
previous customers. However, there are 150 cases corresponding to potential customers in
the data file.
► To perform the computation only for previous customers, click If.
► Select Include if case satisfies condition.
► Type MISSING(default) = 0 as the conditional expression.
This ensures that validate is only computed for cases with non-missing values for default;
that is, for customers who previously received loans.
► Click Continue.
► Click OK in the Compute Variable dialog box.
Approximately 70 percent of the customers previously given loans will have a validate value of 1.
These customers will be used to create the model. The remaining customers who were previously
given loans will be used to validate the model results.
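The same preparation can be done in syntax. A sketch of roughly equivalent commands; note that the Random Number Generators dialog may paste slightly different SET syntax depending on which generator is active:

* Fix the seed so the random split is reproducible.
SET SEED=9191972.
* Draw validate only for previous customers (default is not missing).
DO IF MISSING(default) = 0.
  COMPUTE validate = RV.BERNOULLI(0.7).
END IF.
EXECUTE.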
Running the Analysis
► To create the logistic regression model, from the menus choose:
Analyze > Regression > Binary Logistic...
► Select Previously defaulted as the dependent variable.
► Select Age in years through Other debt in thousands as covariates.
► Select Forward: LR as the method.
► Select validate as the selection variable and click Rule.
► Type 1 as the value for the selection variable.
► Click Continue.
► Click Categorical in the Logistic Regression dialog box.
► Select Level of education as a categorical covariate.
► Click Continue.
► Click Save in the Logistic Regression dialog box.
► Select Probabilities in the Predicted Values group, Cook's in the Influence group, and
Studentized in the Residuals group.
► Click Continue.
► Click Options in the Logistic Regression dialog box.
► Select Classification plots and Hosmer-Lemeshow goodness-of-fit.
► Click Continue.
► Click OK in the Logistic Regression dialog box.
These selections generate the following command syntax:
LOGISTIC REGRESSION VAR=default
/SELECT validate EQ 1
/METHOD=FSTEP(LR) age ed employ address income debtinc creddebt othdebt
/CONTRAST (ed)=Indicator
/SAVE PRED COOK SRESID
/CLASSPLOT
/PRINT=GOODFIT
/CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .
• The procedure fits a logistic regression model for default.
• The SELECT subcommand specifies that only cases with a value of 1 on the variable
validate will be used to build the model. Other cases will be scored and can be used to
validate the final model.
• The variables specified on the METHOD subcommand will be entered into the model
using forward stepwise regression based on the significance of the likelihood ratio
statistic.
• The CONTRAST subcommand specifies that ed is a categorical variable and should be
processed using indicator coding.
• The SAVE subcommand requests that the casewise predicted values, Cook's influence
statistics, and studentized residuals be saved as new variables to the dataset.
• The CLASSPLOT subcommand requests a classification plot of the observed and
predicted values.
• The PRINT subcommand requests the Hosmer-Lemeshow goodness-of-fit statistic.
• All other options are set to their default values.
Model Diagnostics
After building a model, you need to determine whether it reasonably approximates the
behavior of your data.
Tests of Model Fit. The Binary Logistic Regression procedure reports the Hosmer-Lemeshow goodness-of-fit statistic.
Residual Plots. Using variables specified in the Save dialog box, you can construct various
diagnostic plots. Two helpful plots are the change in deviance versus predicted
probabilities and Cook's distances versus predicted probabilities.
Tests of Model Fit
Goodness-of-fit statistics help you to determine whether the model adequately describes the data.
The Hosmer-Lemeshow statistic indicates a poor fit if the significance value is less than 0.05. Here,
the model adequately fits the data.
This statistic is the most reliable test of model fit for IBM® SPSS® Statistics binary logistic
regression, because it aggregates the observations into groups of "similar" cases. The statistic is
then computed based upon these groups.
Change in Deviance Versus Predicted Probabilities
The change in deviance is not an option in the Save dialog box, but it can be estimated by
squaring the studentized residuals.
► To create the change in deviance, from the menus choose:
Transform > Compute Variable...
► Type chgdev as the Target Variable.
► Type sre_1**2 as the Numeric Expression.
► Click OK.
The squared studentized residuals have been saved to chgdev.
► To produce the residual plot, from the menus choose:
Graphs > Chart Builder...
► Select the Scatter/Dot gallery and choose Simple Scatter.
► Select chgdev as the variable to plot on the Y axis.
► Select Predicted probability as the variable to plot on the X axis.
► Click OK.
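A syntax sketch of the same computation and plot, using the legacy GRAPH command rather than the more verbose GGRAPH syntax that Chart Builder pastes; pre_1 is assumed here to be the name SPSS gave the saved predicted probabilities, just as sre_1 holds the studentized residuals:

COMPUTE chgdev = sre_1**2.
EXECUTE.
GRAPH
  /SCATTERPLOT(BIVAR)=pre_1 WITH chgdev.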
The change in deviance plot helps you to identify cases that are poorly fit by the model.
Larger changes in deviance indicate poorer fits. There are two distinct patterns in the plot:
a curve that extends from the lower left to the upper right, and a curve that extends from
the upper left to the lower right.
• The curve that extends from the lower left to the upper right corresponds to cases in
which the dependent variable has a value of 0. Thus, non-defaulters who have large
model-predicted probabilities of default are poorly fit by the model.
• The curve that extends from the upper left to the lower right corresponds to cases in
which the dependent variable has a value of 1.
Thus, defaulters who have small model-predicted probabilities of default are poorly fit by
the model. By identifying the cases that are poorly fit by the model, you can focus on how
those customers are different, and hopefully discover another predictor that will improve
the model.
► Recall the Chart Builder.
► Select Analog of Cook's influence statistics as the variable to plot on the Y axis.
► Select Predicted probability as the variable to plot on the X axis.
► Click OK.
The shape of the Cook's distances plot generally follows that of the previous figure, with
some minor exceptions. These exceptions are high-leverage points, and can be influential to
the analysis. You should note which cases correspond to these points for further
investigation.
Choosing the Right Model
There are usually several models that pass the diagnostic checks, so you need tools to
choose between them.
• Automated Variable Selection. When constructing a model, you generally want to
only include predictors that contribute significantly to the model. The Binary
Logistic Regression procedure offers several methods for stepwise selection of the
"best" predictors to include in the model.
• Pseudo R-Squared Statistics. The r-squared statistic, which measures the variability
in the dependent variable that is explained by a linear regression model, cannot be
computed for logistic regression models. The pseudo r-squared statistics are
designed to have similar properties to the true r-squared statistic.
• Classification and Validation. Crosstabulating observed response categories with
predicted categories helps you to determine how well the model identifies
defaulters.
Variable Selection
Forward stepwise methods start with a model that doesn't include any of the predictors.
At each step, the predictor with the largest score statistic whose significance value is less than a
specified value (by default 0.05) is added to the model.
The variables left out of the analysis at the last step all have significance values larger than
0.05, so no more are added.
The variables chosen by the forward stepwise method should all have significant changes
in -2 log-likelihood.
The change in -2 log-likelihood is generally more reliable than the Wald statistic. If the two
disagree as to whether a predictor is useful to the model, trust the change in -2 log-likelihood. See the regression coefficients table for the Wald statistic.
As a further check, you can build a model using backward stepwise methods. Backward
methods start with a model that includes all of the predictors. At each step, the predictor
that contributes the least is removed from the model, until all of the predictors in the model
are significant. If the two methods choose the same variables, you can be fairly confident
that it's a good model.
R-Squared Statistics
In the linear regression model, the coefficient of determination, R2, summarizes the
proportion of variance in the dependent variable associated with the predictor
(independent) variables, with larger R2 values indicating that more of the variation is
explained by the model, to a maximum of 1. For regression models with a categorical
dependent variable, it is not possible to compute a single R2 statistic that has all of the
characteristics of R2 in the linear regression model, so these approximations are computed
instead. The following methods are used to estimate the coefficient of determination.
• Cox and Snell's R2 (Cox and Snell, 1989) is based on the log likelihood for the model
compared to the log likelihood for a baseline model. However, with categorical
outcomes, it has a theoretical maximum value of less than 1, even for a "perfect"
model.
• Nagelkerke's R2 (Nagelkerke, 1991) is an adjusted version of the Cox and Snell R-square
that adjusts the scale of the statistic to cover the full range from 0 to 1.
• McFadden's R2 (McFadden, 1974) is another version, based on the log-likelihood
kernels for the intercept-only model and the full estimated model.
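For reference, these pseudo R2 statistics have standard closed forms; they are written here (not reproduced from the SPSS documentation) in terms of the likelihood of the intercept-only model L(0), the likelihood of the fitted model L(B), and the number of cases n:

R^2_{CS} = 1 - \left(\frac{L(0)}{L(B)}\right)^{2/n}, \qquad R^2_{N} = \frac{R^2_{CS}}{1 - L(0)^{2/n}}, \qquad R^2_{McF} = 1 - \frac{\ln L(B)}{\ln L(0)}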
What constitutes a “good” R2 value varies between different areas of application. While these
statistics can be suggestive on their own, they are most useful when comparing competing models
for the same data. The model with the largest R2 statistic is “best” according to this measure.
Classification
The classification table shows the practical results of using the logistic regression model.
For each case, the predicted response is Yes if that case's model-predicted probability is
greater than the cutoff value specified in the dialogs (in this case, the default of 0.5).
• Cells on the diagonal are correct predictions.
• Cells off the diagonal are incorrect predictions.
Of the cases used to create the model, 57 of the 124 people who previously defaulted are classified
correctly. 352 of the 375 non-defaulters are classified correctly. Overall, 82.0% of the cases are
classified correctly.
From step to step, the improvement in classification indicates how well your model performs. A
better model should correctly identify a higher percentage of the cases.
Classifications based upon the cases used to create the model tend to be too "optimistic" in the
sense that their classification rate is inflated.
Subset validation is obtained by classifying past customers who were not used to create the model.
These results are shown in the Unselected Cases section of the table. 80.6 percent of these cases
were correctly classified by the model. This suggests that, overall, your model is in fact correct
about four out of five times. Even the subset validation is limited, however, because it is based on
only one cutoff value.
Logistic Regression Coefficients
The parameter estimates table summarizes the effect of each predictor. The ratio of the
coefficient to its standard error, squared, equals the Wald statistic. If the significance level
of the Wald statistic is small (less than 0.05) then the parameter is useful to the model. The
predictors and coefficient values shown in the last step are used by the procedure to make
predictions. The meaning of a logistic regression coefficient is not as straightforward as that
of a linear regression coefficient. While B is convenient for testing the usefulness of
predictors, Exp(B) is easier to interpret.
Exp(B) represents the ratio-change in the odds of the event of interest for a one-unit
change in the predictor. For example, Exp(B) for employ is equal to 0.781, which means
that the odds of default for a person who has been employed at their current job for two
years are 0.781 times the odds of default for a person who has been employed at their
current job for 1 year, all other things being equal.
What this difference means in terms of probability depends upon the original probability of default
for the person employed for 1 year.
\text{Odds(default)} = \frac{P(\text{default})}{P(\text{no default})} = \frac{P(\text{default})}{1 - P(\text{default})}
In the case of a customer whose probability of default is 0.5, the odds she will default are related to
the probability by this equation. Thus, her corresponding odds of default are 1.
The odds of default for a person with 1 more year on the job are 1*0.781 = 0.781, so the
corresponding probability of default reduces to 0.439.
In the case of a customer whose probability of default is 0.9, his odds of default are 9.
The odds of default for a person with 1 more year on the job are 9*0.781 = 7.029, so the
corresponding probability of default reduces to 0.875.
The change in probability of default is greater for the person who originally had a 0.5 probability of
default. In general, the change will be greater when the original probability is near 0.5, and smaller
when it is close to 0 or 1.
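The arithmetic behind these two worked cases follows directly from converting the new odds back to a probability with p = odds / (1 + odds):

1 \times 0.781 = 0.781, \qquad p = \frac{0.781}{1 + 0.781} \approx 0.439

9 \times 0.781 = 7.029, \qquad p = \frac{7.029}{1 + 7.029} \approx 0.875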
Summary
Using the Logistic Regression Procedure, you have constructed a model for predicting the
probability a given customer will default on their loan.
A critical issue for loan officers is the cost of Type I and Type II errors. That is, what is the
cost of classifying a defaulter as a non-defaulter (Type I)? What is the cost of classifying a
non-defaulter as a defaulter (Type II)? If bad debt is the primary concern, then you want to
lower your Type I error and maximize your sensitivity. If growing your customer base is
the priority, then you want to lower your Type II error and maximize your specificity.
Usually both are major concerns, so you have to choose a decision rule for classifying
customers that gives the best mix of sensitivity and specificity.
Related Procedures
The Binary Logistic Regression procedure is a useful tool for predicting the value of a
categorical response variable with two possible outcomes.
• If there are more than two possible outcomes and they do not have an inherent
ordering, use the Multinomial Logistic Regression procedure. Multinomial logistic
regression can also be used as an alternative when there are only two outcomes.
• If there are more than two possible outcomes and they are ordered, use the Ordinal
Regression procedure.
• If there are more than two possible outcomes and your predictors can be considered
scale, the Discriminant Analysis procedure can be useful. Discriminant analysis can
also be used when there are only two outcomes.
• If the dependent variable and predictors are scale, use the Linear Regression
procedure.
• If the dependent variable is scale and some or all predictors are categorical, use the
GLM Univariate procedure.
• In order to use a tool that is more flexible than the classification table, save the
model-predicted probabilities and use the ROC Curve procedure. The ROC Curve
provides an index of accuracy by demonstrating your model's ability to discriminate
between two groups. You can think of it as a visualization of classification tables for
all possible cutoffs.
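A hedged syntax sketch for that follow-up analysis, assuming the saved predicted probability is named pre_1 and that 1 is the value of Previously defaulted that marks a defaulter:

ROC pre_1 BY default (1)
  /PLOT=CURVE(REFERENCE)
  /PRINT=COORDINATES.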
Recommended Readings
See the following texts for more information on logistic regression:
Hosmer, D. W., and S. Lemeshow. 2000. Applied Logistic Regression, 2nd ed. New York: John
Wiley and Sons.
Kleinbaum, D. G. 1994. Logistic Regression: A Self-Learning Text. New York: Springer-Verlag.
Jennings, D. E. 1986. Outliers and residual distributions in logistic regression. Journal of the
American Statistical Association, 81, 987-990.
Norusis, M. 2004. SPSS 13.0 Statistical Procedures Companion. Upper Saddle River, N.J.:
Prentice Hall.