How to Use…
Guide to using software for the practicals of the Essex Summer School 2013
course 1E ‘Introduction to multilevel models with applications’
Paul Lambert, July 2013
Version 2.2 of 10/JUN/2013
Contents
How to... Undertake the practical sessions for module 1E
Principles of the practical sessions
Organisation of the practical sessions
..The tremendous importance of workflows..
How to use… EditPad Lite
How to use… Stata
How to use… SPSS
How to use… MLwiN
How to use MLwiN (i): MLwiN the easy way
How to use MLwiN (ii): MLwiN for serious people
How to use MLwiN (iii): Using ‘runmlwin’
Some miscellaneous points on working with MLwiN
How to use… R
References cited
Software-specific recommendations
1E Coursepack: Appendix 2
How to... Undertake the practical sessions for module 1E
The next few pages describe topics and materials from the 1E lab sessions. The format of the labs is
that you are supplied with a set of example ‘command files’ in the relevant packages, plus a
corresponding (electronic) exercise sheet. During the scheduled session you should open and review
the example command files (.do, .sps, .mac and .R files), then finish the session by answering the
exercise sheet (which draws upon things illustrated in the example command files).
The example command files contain annotation notes which try to explain what the
commands are doing. Some of the example files are very long, but once you become familiar with
them you will find that you can often move through their sections quite quickly. Ordinarily, you
should proceed through the example command files by running a few lines at a time and observing
the outputs generated. As you digest the examples, it will often help to add your own notes,
adjustments or examples to the command file text. You’ll ordinarily need to have the analytical
software open and the relevant tool for working with a syntax file (e.g. ‘syntax window’ or ‘do file
editor’). In addition it will typically also be helpful to have open some applications to remind you of
where the data is stored, and perhaps a plain text editor allowing you to conveniently open up
several other syntax files for purposes of comparison.

When working with Stata a typical view of your desktop might be something like:
Description: The first two interfaces you can see in this screenshot are respectively the Stata do file
editor (where I write commands and send them to be processed, such as by highlighting the relevant
lines and pressing ctrl-d); and the main Stata window (here Version 10) which includes the results
page. Note that the syntax file open is a modified (personalised) version of the supplied illustrative
syntax file – the name has been changed so that I can save the original file plus my own version of it
after edits (e.g. with my own comments). Behind the scenes I’ve also got open an ‘Editpadlite’ session
which I’m using to conveniently open up and compare some other syntax files that I don’t particularly
want in my do file editor itself; I’ve also got a file manager software open showing the data I’m
drawing upon (in what Stata will call ‘path1a’); and I’ve got some Internet Explorer (IE) sessions open
(I’m looking up the online BHPS documentation, and the Stata listserv, where information on Stata
commands is available).

When working with SPSS, you may have a desktop session something like:
Description: The three SPSS interfaces shown in this screenshot are firstly the syntax editor (where I
write commands and send them to be processed, such as by highlighting the relevant lines and pressing
ctrl-r); then behind this are the ‘data editor’ and the ‘output’ window in SPSS, where I can see the
data and results from the commands. Again, the syntax file open is a modified (personalised) version
of the supplied illustrative syntax file – the name has been changed so that I can save the original file
plus my own version of it after edits (e.g. with my own comments). Whilst working in SPSS I’ve also got
open an ‘Editpadlite’ session which I’m using to keep open and compare some other syntax files that
I’m checking against; I’ve also got a file manager software open (Windows Explorer) to show the
location of the data files I’m using (i.e. the contents of ‘path1a’); and I’ve got some IE sessions open
(I’m looking at the BHPS documentation, but it’s late in the day and my self-discipline is slipping a
little, so I’ve got my email open as well, which is never going to help with my multilevel modelling..!).
Principles of the practical sessions
The programme of practicals included within course 1E is intended first to help you understand
the topics covered in the teaching sessions, and second to furnish you with important skills in
applying multilevel analysis techniques to social science data.
For the practical sessions we provide this handout which briefly describes the contents of the
sessions and the Essex lab setup, and then gives more extended material describing the various
software packages used in the labs. In addition, within the labs you are given copies of extended
software specific command files which contain the details of the exercises in the form of software
code and some degree of descriptive notes. These sample command files will hopefully provide you
with some useful worked examples which will help you for quite some time into the future.
Supplementing these resources, you are also given lab ‘exercise sheets’, and some ‘homework’
exercises, which should serve to test the skills covered in the practical sessions.
A lot of material is presented across the programme of practicals, particularly in the details of
supplementary exercises or activities. It is easy with software-based materials to skip through parts
of the sessions briefly without properly digesting them. Nevertheless you will get the best out of the
module if you work through the practical exercises in full (even if that involves catching up with
some aspects of them later on). You should also be prepared that covering the materials in full will
often include following up on other information sources, such as software help guides and files.
To some extent, part of learning to work effectively with complex data and analysis methods (which
is what, arguably, defines multilevel modelling of social science processes) is about teaching yourself
to be a better software programmer in relevant software packages (such as Stata, SPSS and MLwiN).
Many social scientists have not previously been trained in programming. Good programming
involves using software in a manner which generates a syntactical record of the procedures; keeping
your software commands and files well organised; and learning specific programming rules,
fragments of code, and code options, as are relevant to the particular software packages (see also
the sections immediately below). Much of the work of Course 1E, indeed, concerns the development
of your skills for using software for handling and analysing social science datasets. To pre-warn you,
for most of us, the development of programming skills in this domain is a long and arduous process.
We would also highlight three themes running through the practical sessions which summarise the
approach we are adopting and the principles of good practice in using software for handling and
analysing social science datasets which we follow:
 Documentation for replication
 The integration of data management and data analysis
 The use of ‘real’ data
Documentation for replication
The first principle refers to the importance of working in such a way as to keep a clear linear record
of the steps involved in your analysis. This serves as a means to support the long-term preservation
of your work. A useful guiding principle is that, after a significant break in activities, you yourself – or
another researcher at some future point – could retrace and replicate your analysis generating
exactly the same results. There are several excellent recent texts which explain the importance of
the principle of replicability to a model of systematic scientific enquiry with social science research
data (e.g. Dale, 2006; Freese, 2007; Kohler and Kreuter, 2009; Altman and Franklin, 2010). Long
(2009) gives a very useful extended guide to organising your research materials for this purpose.
In social statistics we work overwhelmingly with software packages which process datasets and
generate results. In this domain, documentation for replicability is effectively achieved by (a) the
careful and consistent organisation and storage of data files and other related materials, and (b) the
systematic use (and storage) of ‘syntactical’ programming to process tasks.
‘Syntactical programming’ (also often described as ‘textual command programming’) involves
finding a way to write out software commands in a textual format which may be easily stored,
reopened, re-run and/or edited and re-run. Examples include using SPSS ‘syntax files’, and Stata ‘do
files’, but almost all software programmes support some sort of equivalent syntactical command
representation. The use of syntactical programming is not universal in the analysis of social science
research data - but it should be. A great attraction of programmes such as SPSS and Stata is that
their syntactical command language is largely ‘human readable’ (that is, you can often guess what
the syntax commands mean even before you look them up, as they often use English language
terms). We’ll say more on applying syntactical programming in the ‘How to use…’ sections below.
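As a small illustration of this ‘human readable’ quality, the sketch below shows a few Stata commands that can be read almost as English. The data file and variable names here are invented for illustration; they are not part of the course materials:

```stata
* A sketch illustrating the 'human readable' quality of Stata syntax.
* (The data file and variable names below are invented for illustration.)
use "d:\data\example_survey.dta", clear   // open a data file
summarize age income                      // descriptive statistics for two variables
tabulate region                          // a frequency table for one variable
regress income age                       // a simple linear regression
```

Even without knowing Stata, most readers can guess roughly what each line does, which is a large part of what makes syntactical records easy to store, share and replicate.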
The integration of data management and data analysis.
The second principle involves recognising the scope for modification and enhancement of
quantitative data in the social sciences. We all know that data is ‘socially constructed’, but the
impact for analysts, which we sometimes forget, is that many measures in a survey dataset are ‘up
for negotiation’ (such as in choices we have regarding category recodes, missing data treatments,
derived summary variables, and choices over the ‘functional form’). The way that such decisions are
actually negotiated in any particular study comprises an element of research which is often labelled
‘data management’.
It is our experience that most applied projects require more input to data management than to data
analysis aspects of the work. As such, finding software approaches (or combinations of software
approaches) which support the seamless movement back and forth between data management and
data analysis is an important consideration. In this regard, Stata stands out at the current time as the
superior package for social survey research as it combines extended functionality for both
components of the research process (this is the main reason that Stata is used more frequently than
other packages in this course). Other packages or combinations of packages can also be used
effectively (for instance a common way of working with multilevel models for those who don’t have
access to Stata is to combine SPSS for data management and exploratory analysis, and MLwiN for
more complex multilevel data analysis functions).
(Aside, and a shameless plug: If you are interested in the methodological topic of data management more
generally, we have a related project on this very theme, where amongst other things we provide online
data-oriented resources and training materials, and run workshop sessions, oriented to data management
issues in social survey research: see the ‘DAMES’ project at www.dames.org.uk).
The use of ‘real’ data.
The last principle may sound rather trite but is an important part of gaining applied skills in social
science data analysis. Statistics courses and texts ordinarily start with simplified illustrative data such
as simulated data and data files which are very small in terms of the number of cases and variables
involved. This is for very good reasons – trying anything else can lead to access permission problems,
software processing time delays, and onerous requirements for further data management. In this
course we do often use simplified datasets with these characteristics; however, as much as possible
we also include materials which start from the original source data (e.g. the full version of the data
such as you could get from downloading via the UK Data Archive or a similar data provider).
There are quite a few reasons why using full size data and the original measures is a good thing to
aspire to. Firstly, in the domain of multilevel modelling techniques, estimation and calculation
problems, and the actual magnitude of hierarchical effects, can often be different in ‘real’ datasets
compared to those from simplified examples (in ‘real’ applications, the former tend to be much
greater, and the latter much smaller..!). Secondly, using full size data and original variables very
often exposes problems, errors or required adjustments with software routines which may not
otherwise be apparent (for instance, errors caused by missing data conventions, or upper limits on
the numbers of categories, cases or variables, are common problems here). Third, our experience is
that by using ‘real’ data during practical sessions, as course participants you are much more likely to
go away from the course in a position to be readily able to apply the analytical methods covered to
your own data and analysis requirements.
Organisation of the practical sessions
Materials referred to in the 1E practical sessions will include:
 data files (copies of survey data used in the practicals);
 sample command files (pre-prepared materials which include programming commands in
the language of the relevant software)
 summary exercise sheets (questions which test you on things covered by each lab)
 supplementary ‘macros’ or ‘sub-files’ (further pre-prepared materials featuring
programming commands in relevant languages, usually invoked as a sub-routine within the
main sample command files)
 teaching materials (i.e. the Coursepack notes and lecture files).
At each session we will give you example command files for the data analysis packages, which take
you through the process of carrying out the relevant data analysis operations. The main activities of
the lab sessions involve opening the relevant files and working your way through them. During the
labs, we do of course expect that you will ask for help or clarification on parts of the command files
which are not clear.
During the workshop you will be able to store data files at a network location within the University
of Essex systems. You should allocate a folder for this module, and use subfolders within it to store
the materials for this module.
A typical setup will look something like:
This screenshot is from my own laptop so it won’t be identical to your machine, but you should have something
broadly similar. Illustrated, I’ve made a folder called ‘d:\essex10’ within which I’ve made a number of
subfolders for different component materials linked to the course. For example, in the ‘syntax’ subfolder I’ve
got copies of the four example command files linked to Practical session 1; although not shown, other things
from the module are in the other subfolders, such as a number of other relevant command files in the
‘sub_files’ folder, and some relevant data files in the ‘data’ folder.
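If you prefer to set up a folder structure like this syntactically, a sketch along the following lines could be run from within Stata. The drive letter and folder names here are illustrative; substitute your own filestore location:

```stata
* Sketch: setting up a module folder with subfolders from within Stata.
* (The drive letter and folder names are illustrative.)
cd "M:\"
mkdir essex1e
cd essex1e
mkdir data        // copies of the survey data files
mkdir syntax      // example command files, plus your personalised copies
mkdir sub_files   // supplementary macros and sub-files
mkdir logs        // output and log files
```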
An important point to make is that the various command files will need to draw upon other files (e.g.
data files) in order to run. To do this, they need to be able to reference the location of the required
files. In most applications, we do this by defining ‘macros’ which point to specific ‘paths’ on your
computer (see also software sections below). For the labs to work successfully, it will be necessary to
ensure that the command file you are trying to run is pointing to the right paths at the right time. In
general, this only requires one specification to be made at the start of the session, for instance to do
this in SPSS and in Stata we define ‘macros’ for the relevant paths (also see pages 20 and 28 of the
‘How to use’ guide on this issue). Sometimes however it can be necessary to edit the full path
reference of a particular file in order to be able to access it.
For example, in the text below, we show some Stata and SPSS commands which in both cases define
a macro (called ‘path3a’) which gives the directory location of the data file ‘aindresp.dta’ or
‘aindresp.sav’. Subsequent commands which call upon it will then go directly to that path:
Stata example
global path3a "d:\data\bhps\"
use pid asex using $path3a\aindresp.dta, clear
tab asex
SPSS example
define !path3a () "d:\data\bhps\" !enddefine.
get file=!path3a+"aindresp.sav".
fre var=asex.
Example command files
Most of the command files are annotated with descriptive comments which try to explain, at
relevant points, what they are doing in terms of the data and analysis.
In general, to use the example command files, you should:
 Save them into your own filestore space
 Open them with the relevant part of the software application (e.g. the ‘syntax editor’ in SPSS
or the ‘do file editor’ in Stata; see also the ‘How to use..’ subsections)
 Save the files with a new name to indicate that this will be your personalised copy of the file
(i.e. so that you can edit the files without risk of losing the original copies)
 Make any necessary edits to the path locations given in the files (usually this involves editing
the ‘path’ statements near the top of the relevant files)
 Work through the files, implementing the commands and interpreting the outputs, adding
your own comments to the files to help you understand the operations you’re performing
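Putting those steps together, the top of a personalised copy of an example command file might look something like the sketch below. The file name, path and comments are illustrative, not the supplied materials themselves:

```stata
* practical1_mynotes.do -- a personalised copy of a supplied example file.
* (File and path names are illustrative; the original file is saved untouched.)

* Edited path statement, pointing at my own filestore:
global path1a "M:\essex1e\data\"

* First commands from the supplied example, with a note of my own added:
use pid asex using $path1a\aindresp.dta, clear
tab asex    // my note: asex is the sex variable in the BHPS wave 1 file
```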
Data Files
We’ll use a great many data files over the course of the practical programme. We’re planning to
supply them via the network drive, though it may sometimes be easier to transfer them by other
means.
Many of these data are freely available online, such as the example data files used in the textbooks
by Hox (2010), Treiman (2009) and Rabe-Hesketh and Skrondal (2012).
Some of the data files are versions of complex survey datasets from the UK and from comparative
international studies. We can use these for teaching purposes, but you are not allowed to take them
away with you from the class. You can in general access these data for free, however to do so you
must register with the relevant data provider (e.g. the UK Data Archive) and agree to any associated
conditions of access (e.g. required citations and acknowledgements).
Macros/non-interactive command files
As well as the example ‘interactive’ command files that you’ll open, work through, and probably edit
slightly, at certain points we will also supply you with command files in the various software
languages which essentially serve as ‘macros’ to perform specific tasks in data management or
analysis. These ordinarily do not need to be further edited or adjusted, and it is not particularly
necessary to examine their contents, but you will need to take some time to place the files in
the right location so that they can be invoked successfully at a suitable time.
Where to put your files..
In general it will be effective to store your files (example command files, data files, and macro files)
in the allocated network filestore space during the course of the summer school (‘M:\ drive’). This
filestore should be available to you from any machine on the Essex system.
It might not be effective to store some of the larger survey data files on the network location,
however, due to their size. In some instances, it might be more efficient to store the files
on the hard drive of the machine you are using (e.g. the C:\ drive). If you do this, however,
be aware that the contents of local hard drives may not be preserved when you log off.
The various exercises and command files ought all to be transferable to other machines (e.g. your
laptop or your work machine at your home institution), though obviously you will need to install the
relevant software if you don’t already have it. At the end of the module, you will probably want to
take a copy of all your command files and related notes, plus the freely available example datasets
and the macro files, away with you on a memory stick or by some other means.
Analysis of your own data
In general we would very much encourage you to bring your own data to the practical sessions. You’ll
get a lot out of the module if you are able to repeat the demonstration analyses on your own data
and research problems. It will ordinarily be most effective to undertake the class examples first, then
subsequently try to re-do the analysis, in part or in whole, on your own data. If we can, we’ll help
you in working with your own data within lab sessions, though when the class is busy we need to
prioritise people who are working through the class examples.
..The tremendous importance of workflows..
If you haven’t come across the idea of workflows before, then this might simply be a point that, at
this stage, we ask you to take in good faith! There are very good expositions of the idea of workflows
in the social science data analysis process in, amongst others, Long (2009), Treiman (2009), and
Kohler and Kreuter (2008).
A workflow in itself is a representation of a series of tasks which contribute to a project or activity. It
can be a useful exercise to conceptualise a research project as a workflow (with components such as
data collection, data processing, data analysis, report writing). The really important
contribution, however, is to organise the data and command files associated with a project in a
consistent style that recognises their respective places in the workflow structure.
What does that involve? The issue is that we want to construct a replicable trail of our data-oriented
research, which allows us to go all the way from opening the initial data file, to producing the
publication quality graph or statistical results which are our end products. We need the replicable
trail in order that it will be possible to adjust our analysis on the basis of minor changes at any
possible stage of the process (or to be able to transfer a record of our work on to others). However
because the trail is long and complex (and not entirely linear), we can only do this, realistically, if we
break down our activities into multiple, separate components.
There are different ways to organise files for these purposes, but a popular and highly effective
approach is to design a ‘master’ syntax command file and a series of ‘sub-files’ which it draws upon.
In this model, the sub-files cover different parts of the research analysis. Personally, my preference
is to construct both the master and sub-files in the principal software package being used, though
Long (2009) notes that creating a documentation master file in a different software (e.g. MS Excel) is
an effective way to record a wider range of activities which span across different software.
Here’s an example of a series of tasks being called upon via a Stata format ‘master’ file:
This screenshot shows the Stata master file (‘bhps_educ_master.do’), plus a couple of other Stata do files,
which I’ve opened in the do file editor; plus, in addition, a number of other component files are open within the
EditPad editor, including the sub-file which is being called directly by the master file (‘pre_analysis1.do’). The
Stata output file is not visible, but is open behind the scenes.
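As the screenshot cannot be reproduced here, the sketch below suggests what a master file of this kind might contain. Only ‘pre_analysis1.do’ is named in the illustration above; the other file and path names are invented for illustration:

```stata
* bhps_educ_master.do -- sketch of a 'master' do file for a small project.
* (Only pre_analysis1.do is named in the illustration above; the other
*  file and path names are invented for illustration.)

global path1a "M:\essex1e\data\"       // location of the data files
global subs "M:\essex1e\sub_files\"    // location of the sub-files

* Step 1: data management (recodes, derived variables, file linkage)
do "$subs\pre_analysis1.do"

* Step 2: descriptive and exploratory analysis
do "$subs\descriptives1.do"

* Step 3: multilevel model estimation
do "$subs\models1.do"
```

Because each step is delegated to a sub-file, any single stage can be adjusted and the whole trail re-run from the master file alone.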
Here’s an example of a project documentation file that might be constructed in Excel:
Note that the other tabs in the Excel file can be used to show things like author details, context of the analysis,
and last update time. The file also notes some (though not all) dependencies within the workflow – for instance
step 9 requires step 4 to have been taken (the macro reads in a plain text data file that was generated in Stata
by the do file ‘pre_analysis1.do’).
In summary, we can’t stress strongly enough the value of organising your data files around a
workflow conceptualisation, such as through master and sub-files. Read the opening chapters of
Long (2009), or the other references mentioned above, for more on this theme. We’d encourage you
to look at the workshop materials from the ‘DAMES’ research Node, at www.dames.org.uk, for more
on this topic.
Software alternatives
Many different software packages can be used effectively for applied research using complex and clustered
social survey data. Various packages support the estimation of a wide range of multilevel models covering
random effects models and other specifications. Packages are not uniform, however, in their support for
multilevel models. There are variations in their coverage of different modelling options; their requirements
and approaches to operationalising data for modelling; the statistical algorithms the packages use; the relative
estimation times required; the default settings for estimation options and other data options; and the means
of displaying and storing the results produced. The Hox (2010) text comments systematically on software
options for different modelling requirements, and other comparisons can be found in Twisk (2006) and West
et al. (2007), amongst others. A very thorough range of material reviewing alternative software options for
multilevel modelling is available at the Centre for Multilevel Modelling website at:
http://www.bristol.ac.uk/cmm/learning/mmsoftware/
In this course we focus upon four packages which bring slightly different contributions to multilevel modelling.
 SPSS is used because it is a widely available general purpose package for data management and data analysis, which also supports the estimation of many random effects multilevel models for linear outcomes. Field’s (2013) introductory text on SPSS includes a chapter on multilevel modelling in SPSS, whilst Heck et al. (2010, 2012) give longer treatments of the topic.

 Stata is used because it is a popular general purpose package for data management and data analysis which also includes a substantial range of multilevel modelling options. Stata is attractive to applied researchers for many reasons, including its good facilities for storing and summarising estimation results; its support of a wide range of advanced analytical methods which complement a multilevel analysis (e.g. clustering estimators used in Economics); and its wide range of data management functions suited to complex data. Rabe-Hesketh and Skrondal’s (2012) text provides a very full intermediate-to-advanced level guide to multilevel modelling in Stata.

 MLwiN is used because it is a widely available, purpose-built package for applied multilevel modelling which covers almost all random effects modelling permutations, and is also designed to be broadly accessible to non-specialist users. A further attraction of MLwiN is that its purpose-built functionality means that it supports a wider range of estimation options, and achieves substantially more efficient estimation, than many other packages. The MLwiN manual by Rasbash et al. (2012) is a very strong, and accessible, guide to this software.

 R is used occasionally in this module because it is a popular freeware package that supports most forms of multilevel model estimation (via extension libraries). It is a difficult language for social scientists to work effectively with, however, because it brings with it very high ‘overheads’ in its programming requirements, especially for large and complex data. Gelman and Hill (2007) give an excellent advanced introduction to multilevel modelling which is illustrated with examples in R.
We should stress that many more packages can be used effectively for multilevel modelling analysis. Some of
those that are often used for multilevel modelling in social science research, but that we have not mentioned
hitherto, are HLM (Raudenbush et al., 2004), M-Plus (Muthén and Muthén, 2010), and BUGS (e.g. Lunn et al.,
2000). Some cross-over between packages can also be found, such as in linkage routines known as ‘plug-ins’
which allow a user to invoke one software package from the interface of another – examples include routines
for running HLM (hlm, see Reardon, 2006) and MLwiN (runmlwin, see Leckie and Charlton, 2011) from Stata,
and tools to run an estimation engine known as ‘Sabre’ in Stata and in R (see Crouchley et al. 2009).
Finally, an exciting software development in the area being led in the UK is the construction of a generic
interface for specifying and estimating complex statistical models of ‘arbitrary complexity’. These cover most
forms of multilevel models, as well as many other statistical modelling devices. This project is called ‘e-Stat’
and has generated a software resource called ‘Stat-JR’ which became open for general release in a beta
version in March 2012 (see http://www.bristol.ac.uk/cmm/research/estat/, and Browne et al. 2012).
How to use… EditPad Lite
The first package we’ll introduce is not a statistical analysis package. EditPad Lite is a freely distributed plain
text editor, and we go to the trouble of introducing it for two reasons:
 Using a plain text editor to open supplementary files and subfiles is a very good way to perform complex operations and programming without getting into too tangled a mess.

 In our experience, plenty of people haven’t previously used plain text editors, and are initially confused by the software until they’ve got used to using it.
1) Access/installation
To access the package, go to http://www.editpadlite.com/ and follow the links to download the
software:
2) More on installation
It is best to click ‘save file as’ and run the setup.exe file from a location on your computer. This
will extract four component files which allow you to run the software, regardless of where they
are stored. You could for instance install the files on a section of your memory stick and run the
programme from there, but I would recommend saving them to your network filestore folder
and running the software from that location. E.g.:
You can now run the software by double clicking the ‘EditPadLite.exe’ file. To save yourself
some time, you might want to copy that .exe file, then ‘paste as shortcut’ onto your desktop
(and maybe also add a shortcut key to the shortcut icon), so that you can open up the software
rapidly (though such desktop settings might not be preserved across login sessions).
3) Opening files. When you open the software, you are given a blank template from where you can then ask to open one or more files (you can in fact open several files in one call, which can be convenient). Note how the multiple files are organised under multiple tabs.
4) Editing files. You can use EditPad Lite to edit files (for instance sub-files and macros which are called upon in other applications). There are numerous handy file-editing functionalities in EditPad Lite which you won’t get in a simpler text editor such as Notepad. For instance you can run various types of ‘search’ and ‘replace’ commands, across one or multiple files. The latter is often helpful, for instance if you have built up a suite of related files and you want to change a path specification in all of them.
(The red tabs indicate that the contents of a file have been changed and have not yet been saved.)
Closing comments on EditPad Lite
Using a plain text editor is very helpful if your work involves programming in any software language. It can
help you to deal with complex combinations of files, and it provides a ‘safe’ way of editing things without
inadvertently adding in formatting or other characters (a problem that will arise if you try to edit command
files in a word processor software such as Microsoft Word).
We’ve nominated EditPad Lite for this purpose, though you might find a different editor more helpful or
preferable. There are many alternatives, some of which are free to access for some or all users, and others of
which require a (typically small) purchase price. Note that if you wish to use EditPad for commercial purposes,
the licence requires that you should purchase the ‘professional’ version of the package, called ‘EditPadPro’.
How to use… Stata
Stata was first developed in 1984 and was originally used mainly in academic research in economics. From approximately the mid-1990s its functionalities for social survey data analysis began to filter through to other social science disciplines, and in the last decade it has displaced SPSS as the most popular intermediate-to-advanced statistical analysis package in many academic disciplines which use social survey data, such as sociology, educational research, population health and geography.
Stata is popular for many good reasons (see also Treiman 2009: 80-6). The features of Stata that lead me personally to favour this package above others are:
•	It supports explicit documentation of complex processes through a concise and ‘human readable’ syntax language
•	It supports a wide range of data management functions, including many routines useful for complex survey data which are not readily performed in other packages (e.g. ‘egen’, ‘xtdes’)
•	It supports a very full range of statistical modelling options, including several advanced model specifications which are not widely available elsewhere
•	It has excellent graphics capabilities, supporting the specification and export of publication-quality graphs in a syntactical, replicable manner
•	It features very convenient tools for storing the results from multiple models or analyses and compiling them in summary tables or files (e.g. ‘est store’, ‘statsby’)
•	It can read online data files and run command files and macros from online locations
•	It supports extended add-on programming capabilities, and benefits from a large, constructive community of user-contributed extensions (see http://www.stata.com/links/utilities-and-services/)
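To give a flavour of the data-management point, here is a hedged sketch of the kind of thing ‘egen’ and ‘xtdes’ can do; the variable names (pid, wave, income) are hypothetical panel-data names, not from the module datasets:

```stata
* attach each person's mean income to every one of their rows
egen mean_inc = mean(income), by(pid)
* declare the panel structure, then describe participation patterns across waves
xtset pid wave
xtdes
```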
In pragmatic terms, most users of Stata are reasonably confident programmers, and getting started with the
package does need a little effort in learning about data manipulation and data analysis. This is one reason why
Stata is not yet widely taught in introductory social science courses - though, in the UK for example, it is
increasingly used in intermediate and advanced level teaching (e.g. MSc programmes or Undergraduate social
science programmes with extended statistical components).
A common problem with working with Stata is that many institutions do not have site-level access to the software, and accordingly many individual researchers don’t have access to the package. Stata is generally sold as an ‘n-user’ package, which means that an institution buys a specified number of copies at any one time. Recently, however, access to Stata for academic researchers has probably been made easier by the Stata ‘GradPlan’, which allows students and faculty to purchase personal copies of the package at a fairly low price – see http://www.stata.com/order/new/edu/gradplans/. Stata also comes in several different forms with different upper limits on the scale of data it may handle. ‘Small Stata’ is not normally adequate for working with advanced survey datasets. ‘Intercooled’ Stata (I/C) usually has more than enough capacity to support social survey research analysis (when working with large-scale resources you may occasionally hit upper limits, such as on the number of variables or cases, but it is usually possible to find an easy work-around, such as dropping unnecessary variables). Stata SE and MP offer greater capacity regarding the size of datasets and faster processing power, but they are more expensive to purchase. To my knowledge, most academic researchers use Intercooled Stata.
Many online resources on Stata are available; in particular we highlight:
•	UCLA’s ATS pages: http://www.ats.ucla.edu/stat/stata/ (features a wide range of materials, including videos of using Stata and routines across the range of the package)
•	The CMM’s LEMMA online course: http://www.bristol.ac.uk/cmm/learning/course.html (includes detailed descriptions of running basic regression models and of specifying random effects multilevel models in Stata)
•	In the first lab session we point you to an illustrative do file which serves as an introduction to Stata, available from www.longitudinal.stir.ac.uk
The steps below give you some relevant instructions on working with Stata for the purposes of the module.
(The images below are for Stata version 10, whereas the Essex lab is likely to have access to the most recent versions, 12 or 13. For our purposes, there are few differences between these versions, though you might notice that the default ‘Results’ window colour scheme was changed after version 10 – it is possible under ‘general preferences’ to change it back to the ‘classic’ view if you prefer!)
When you launch the package, you see the basic Stata window; the image here shows version I/C 10.1 on opening the programme.
You can customise its appearance (e.g. right-click on the results window) – the image on the right reflects how I’ve set up the windows on my machine, and will be slightly different to what you see by default on first launching the package in the lab at Essex.
The very first thing you should do at the start of every session is to ask explicitly to open the ‘do file’ editor, with ‘ctrl+8’ or ‘ctrl+9’ or via the GUI. Note below that we can have several do files open at once.
Not shown below, but from Stata 11 onwards it is possible to permit various formatting options in the do file editor (e.g. colour coding). It is also possible to set up Stata to run directly from a plain text editor if you wish (search online for how to do this).
Once you’ve opened a ‘do file’ you can begin running commands by highlighting the segments of the relevant command lines and clicking ‘ctrl+R’.
(Note that I’ve changed the windows display to a ‘white’ background to make it easier to read the outputs.)
The results are shown in the results window. Error messages are by default shown in red text, and lead to the termination of the sequence of commands at that point (unlike in SPSS, which carries on, disregarding the error).
Important: Defining macros for paths. This particular image shows an important component of the start of every session in the module lab exercises. The lines beginning with ‘global’ are ways of defining permanent ‘macros’ for the session. The macros serve to define locations on my machine where my files (e.g. data files) are actually stored. Doing this means that in later commands (e.g. the image below) I can call on files via their macro folder locations rather than their full path – this aids transferability of work between machines/locations.
In the above, the macro which I have called ‘path3b’ means that when Stata reads a line such as ‘use $path3b\ghs95.dta, clear’, what it reads ‘behind the scenes’ is the same command with ‘$path3b’ replaced by the full folder path that the macro stands for.
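As a minimal sketch of this pattern (the folder shown is hypothetical; substitute your own location):

```stata
* define a 'global' macro once at the start of the session
global path3b "C:\essex1E\data"
* later commands can then call files via the macro rather than the full path
use $path3b\ghs95.dta, clear
```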
You can also submit commands line by line through the command interface (e.g. if you don’t want to log them in the do file). Note how the ‘review’ window shows lines that were entered through the command window, but it just shows some programming code for commands run through the do file editor.
Note some of the features of the Stata syntax language:
•	You need to ‘clear’ the dataspace to read in a new file, e.g.
	use $path3b\ghs95.dta, clear
•	You can’t create a new variable with the same name as an existing one – if it’s there already you need to delete it first, e.g.
	drop fem
	gen fem=(sex==2)
•	The ‘capture’ command suppresses potential error messages, so is a useful way to make commands generic:
	capture drop fem
	gen fem=(sex==2)
•	Using ‘by:’ or ‘if..’ within a command can usefully restrict the specifications:
	tab nstays if sex==1
	bysort sex: tab nstays if age >= 20 & age <= 30
•	The ‘numlabel’ command is a useful way to show both numeric values and categorical labels, compared to the default (labels only), e.g.:
	tab ecstaa
	numlabel _all, add
	tab ecstaa
•	There’s no requirement for full stops at the end of lines, but a carriage return serves as the default delimiter, and so we usually use ‘///’ to extend a command over more than one line:
	bysort sex: tab nstays ///
		if age >= 20 & age <= 60 & ecstaa==1
Extension routines are often written by users and made available to the wider community. To exploit them, you need to run either ‘net install’ or ‘ssc install’. Example of finding and installing the ‘tabplot’ extension routine:
(The exact code needed may depend on which machine you are working on – you may have to define a folder for the installation that you have permission to write to.)
We often run subfiles, or define macros or programmes, by calling upon other do files with the ‘do’ command.
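For example, a master do file might call its sub-files like this (the file names are hypothetical):

```stata
* run data preparation, then analysis, from a single master do file
do $path3b\prep_ghs95.do
do $path3b\models_ghs95.do
```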
‘est store’ is a great way to collate and review multiple model results.
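A minimal sketch of the ‘est store’ workflow (the models themselves are purely illustrative, using variables from the examples above):

```stata
* fit two models, store each set of estimates, then review them side by side
regress nstays fem
est store m1
regress nstays fem age
est store m2
est table m1 m2, stats(N r2) b(%9.3f) se
```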
Extension: you can write a ‘profile.do’ file to load some starting specifications into your session.
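A sketch of what such a start-up file might contain, assuming Stata’s convention of running a file called ‘profile.do’ from its start-up folder (the settings are illustrative):

```stata
* profile.do - executed automatically each time Stata launches
set more off
global path3b "C:\essex1E\data"
```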
For lots more on Stata, see the links and references given above, or the DAMES Node workshops at dames.org.uk.
How to use… SPSS
Here we will refer to SPSS as the software which, in 2008, was officially rebranded as ‘PASW Statistics’, and then renamed again in 2009 as ‘IBM SPSS Statistics’. SPSS has been a market leader as a software tool for social survey data analysis and data management since its first versions in the late 1960s. Some people believe that position is coming to an end, since recent revisions to SPSS seem to make it less suitable for academic research users, whilst other packages have emerged, such as Stata, that seem to many to be more robust and more wide-ranging in their capabilities. Nevertheless, most European educational establishments have site-level access to SPSS, and continue to teach it to students as a first introductory package for survey data analysis; additionally, a great many social science researchers continue to work with the package – so the software itself remains a significant part of social survey data analysis.
SPSS has a very good range of useful features for data management and exploratory analysis of survey data. It also has functionality that supports many variants of random effects multilevel models, which has increased in coverage substantially in recent versions (e.g. Heck et al. 2010; 2012). Perhaps relatively few advanced users of multilevel models work with SPSS, principally because other packages have a wider range of model specification options.¹ Indeed a common practice amongst multilevel modellers in the social sciences is to use SPSS for data preparation, in combination with a specialist multilevel modelling package such as MLwiN or HLM for the actual statistical analysis. In this module we’ll use SPSS in this way (alongside MLwiN), though we will also illustrate its own multilevel modelling capacity.
In working with SPSS for the preparation or analysis of complex survey data, it is critical to use ‘syntax’ command programming in order to preserve a record of the tasks you undertake. This is often a bit of a burden to adapt to if it is new to you: probably most people are first taught to use the package using non-syntax methods (i.e. using the drop-down or ‘GUI’ interface). For topics such as those covered in this module, however, it is essential to work with syntax commands, which we’ll give you examples of in the relevant lab session files. You can also find out the relevant syntax by setting up commands via the GUI menus then clicking ‘paste’, or by using the SPSS help facilities.
The steps below highlight ways of using SPSS effectively for the purpose of this module (these images use
version 17). Online introductions to SPSS are available from many other sources, and a list of online tutorials is
maintained at:
http://www.spsstools.net/spss.htm
¹ We should note that SPSS can also be used as the means of specifying a much wider range of multilevel modelling and other statistical models through its ‘plug-in’ facility supporting R and Python scripts (see e.g. Levesque and SPSS Inc, 2010, for more on this topic).
Your first steps should always be to open the programme first, then immediately open a new or existing syntax file.
I advise against double-clicking on SPSS format data files as a way to open the package and data simultaneously. This might seem a trivial issue, but actually it makes quite a difference to the replicability and consistency of your analysis.
You can run commands via syntax by highlighting all or part of the line or lines you want to run, then clicking ‘run -> selection’, or just ‘ctrl+r’. You can write out the commands yourself; or find their details in the manual; or use the ‘paste’ button to retrieve them from the windows link.
The above gives you a manual with documentation on all syntax commands. Alternatively, by clicking the ‘syntax help’ icon whilst the cursor is pointing within a particular command, you can go straight to the specific syntax details of the command you’re looking at (see below).
These images depict using the ‘paste’ option to find relevant syntax (for the crosstabs command). Using ‘paste’, you’ll often get a bit more syntax detail than you bargained for, since all the default settings are explicitly shown.
Most often, you’ll start with an existing syntax file and work from it, editing and amending that file. A well organised syntax file should start with metadata and path definitions.
By default, the syntax editor has colour coding and other command recognition. I personally prefer to turn off annotation settings on the syntax editor (under ‘tools’ and ‘view’), but others prefer to keep them on.
*Path definitions are important! In the above, at the top of the file we’ve used the ‘define..’ lines to specify macros which SPSS will now recognise as the relevant text. For example, when SPSS reads in the above ‘get file’ line, it is actually reading the full file path that the macro stands for.
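A minimal sketch of this ‘define..’ pattern (the folder is hypothetical; note that SPSS macro names begin with ‘!’):

```spss
* Define a path macro once at the top of the syntax file.
DEFINE !path3b () 'C:\essex1E\data\' !ENDDEFINE.

* Later commands concatenate the macro with a file name.
GET FILE = !path3b + 'ghs95.sav'.
```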


A few syntax tips:
•	New commands begin on a new line.
•	Commands can spread over several lines, but must always end with a full stop. When a command is spread over several lines, it is conventional, though not essential, to indent the 2nd, 3rd etc. lines by one or more spaces.
•	Any line beginning with a ‘*’ symbol is ignored by the SPSS processor and is regarded as a ‘comment’ (you write comments purely for your own benefit).
•	Do your best not to use the drop-down menus! Practice pays off where using syntax is concerned – the more you run in syntax, the more comfortable you’ll be with it.
•	It’s good to organise your syntax files into master files and sub-files. The master file can call the sub-files with the ‘include’ command.
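The tips above can be seen together in a short fragment (the file name is hypothetical; the variable is from the module datasets):

```spss
* A comment line: open a data file, then run a frequency table.
GET FILE = 'C:\essex1E\data\ghs95.sav'.
FREQUENCIES
  VARIABLES = sex
  /BARCHART.
```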
Some comments on output from SPSS:
•	You can manually edit the content of the output windows (e.g. text, fonts, colours).
•	You can paste output directly into another application, either as an image (via ‘paste special’) or as an editable graph or table (in some applications).
•	Usually, you’re unlikely to want to save a whole output file (it’s big and isn’t transferable), though you can print output files to pdf.
•	Many users don’t save SPSS output itself, but merely save their syntax and transcribe important results from the output as and when they need them.
How to use… MLwiN
MLwiN was first released in 1995 as an easy-to-use Windows-style version of the package ‘MLN’. That package supported the preparation of data and the estimation of multilevel models via a DOS-style command interface; MLwiN has since retained the MLN estimation functions, and developed a great many additional routines.
How to use MLwiN (i): MLwiN the easy way
The standard means of using MLwiN is to open the package then open data files in its format, and input
commands through opening the various windows and choosing options. People most often use the Names and
Equations windows here – the latter is especially impressive, allowing you to specify models then run them
and review their results in an intuitive interface.
The MLwiN user’s guide gives a very accessible description of the windows command options, as does the LEMMA online course covering MLwiN examples (http://www.cmm.bristol.ac.uk/research/Lemma/). One important addition, however, is that it is very useful to have the ‘command interface’ and ‘output’ windows open in the background. These will give you valuable information on what is going on behind the scenes when you operate the package.
Here are some comments and description on using MLwiN in the standard way:
When you first open the MLwiN interface…
…the first thing to do is click ‘data manipulation’ and ‘command interface’, then click ‘output’, then tick ‘include output..’.
{Note: these materials are oriented to MLwiN v2.20, but they should be compatible with other versions of the software.}
The normal next step is to open up an existing MLwiN format dataset (‘worksheet’)..
..or else to read in data in plain text format, which requires you to specify how many columns there are in the data.
When data is entered, most people now open the ‘names’ and ‘equations’ windows as their next steps. From the equations window, you can click on terms or add terms to develop your model as you wish…!!
Important: as you read in a worksheet file, it might already contain some saved model settings which are visible via the equations window.
This is the popularity analysis dataset as used in Hox 2010. The terms in the equations window show the model that had been ‘left’ in MLwiN at the last save – the terms show the outcome variable (popular); the explanatory variables (cextravl; csex; ctexp; cextrav*ctexp); the random effects design; and the model coefficient estimates for that model.
MLwiN supports all sorts of other useful windows and functions; we’ll explore some but not all over the module.
How to use MLwiN (ii): MLwiN for serious people
Good practice with statistical analysis involves keeping a replicable log of your work. MLwiN has some very nice features, but it is not well designed to support replication of analysis activities. This is because it is programmed on the presumption that most specifications are made through windows interface options (point-and-click specifications). Most people working with MLwiN specify models, and produce other outputs, via the windows links. This can work well up to a point, but it does make documentation difficult. To see the ‘behind the scenes’ commands that MLwiN is processing when responding to point-and-click instructions, ensure the output window has the box ‘include output from system generated commands’ ticked, from which point you can see results in the background.
Replication in MLwiN can be achieved – by building up a command file which will run all the processes you are working on – but this is quite hard work. (An alternative is to set up a log of your analysis and run your analysis through point-and-click methods – some people do this, but it does generate a very long list of commands that isn’t easily navigated.) Building up a command file is hard because the command language (known as ‘MLN’) is not widely reported. If possible, it is usually better to build up your analysis using a template from another file (e.g. the command files we’ll supply in this module). Also, some features of the programme use rather long-winded command language sequences which are burdensome to write out manually (e.g. in moving through multiple iterations when estimating models).
Even so, my advice is to persevere with building up a command file in MLwiN covering the model specification you’re working on. To do this, I recommend writing your file in a separate plain text editor, and invoking it with the ‘obey’ command in MLwiN. This is cumbersome, but tractable. I find that a reasonable way to approach this is to explore and refine models via the windows options, but make sure that when I settle on an approach, I write it into my command file, and then run the command file again to check that this has worked.
The following screens show this being done:
The above screens depict how I would typically have MLwiN configured for my own work. I would have the
package open, but simultaneously I would also be editing a command file in another editor, which I would
periodically run in full in MLwiN.
Looking at the MLwiN session, at various points there are other windows that I’ll need to open, but the four
windows shown are probably the most important. At the command interface I can manually input text
commands. I sometimes do this, and at other times I use it to run a whole subfile, such as with:
obey "d:\26jun\new\prep\prep_mlwin1.mac"
This is used to run the whole of the command file (that I’m simultaneously editing in EditPad Lite) through
MLwiN (which includes a ‘logo’ command which sends the log of the results to a second plain text location).
Next, in the output window, I can see the results of commands I send. This includes results emerging from my macro file and commands sent in through the command interface, and it also includes results emerging from windows-based command submissions. For example, in this image, I’ve recently input the commands ‘fixe’ and ‘rand’ from the command interface, with the effect of generating a list of the ‘fixed’ and ‘random’ parameter estimates from the last model run.
The other windows shown in MLwiN here are the names window (which shows a list of the variables and
scalars currently in memory), and the equations window. The equations window is a very attractive feature of
MLwiN through which, firstly, you can see the current model represented statistically, along with its parameter
estimates, and secondly, you can manually edit to adjust the terms of the model.
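To give a flavour of what such a command file can look like, here is a heavily hedged sketch of an MLN-style macro for a two-level variance components model of the popularity data (the worksheet path is hypothetical, and the exact commands should be checked against the MLwiN Command Manual and your own setup):

```
NOTE   two-level variance components sketch for the popularity data
RETR   c:\essex1E\data\popular.ws
RESP   'popular'
IDEN   2 'class'
IDEN   1 'pupil'
EXPL   1 'cons'
SETV   2 'cons'
SETV   1 'cons'
STAR
FIXE
RAND
```

A file along these lines can be kept open in a plain text editor and run in full with the ‘obey’ command, as described above.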
How to use MLwiN (iii): Using ‘runmlwin’
A third mechanism exists for running MLwiN as a plug-in from Stata (see Leckie and Charlton, 2011). For Stata users this is particularly attractive because it offers the convenience of working with your data in Stata, combined with the ability to run complex multilevel models using the more efficient (and wider-ranging) estimation routines of MLwiN (compared to the relatively slow routines in Stata). (A similar plug-in has also been written recently for the R package.)
‘runmlwin’ allows you to save most model results within your Stata session, in a similar way to that which you would otherwise have achieved from running a model in Stata (i.e. you can subsequently use commands such as ‘est table’ on the basis of model results which were calculated in MLwiN).
Using runmlwin involves working in Stata, and requires the most recent available Stata and MLwiN versions. As preliminaries, you must set a macro, specifically called ‘MLwiN_path’, which identifies where MLwiN is installed on your machine. You must then install runmlwin using the typical Stata module installation commands.
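The two preliminaries might look like this (the installation path is hypothetical and will differ on your machine):

```stata
* point Stata at the MLwiN executable, then install runmlwin from SSC
global MLwiN_path "C:\Program Files\MLwiN v2.26\mlwin.exe"
ssc install runmlwin
```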
Running a model with runmlwin uses a fairly intuitive syntax. The output includes a brief glimpse of MLwiN, which is opened, used, and, if you used the ‘nopause’ option, is promptly closed again..!
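A hedged sketch of a runmlwin call, following the syntax of Leckie and Charlton (2011), for a two-level random intercept model of the popularity data (variable names approximate those in the worksheet shown earlier):

```stata
* two-level random intercept model, estimated in MLwiN without pausing
runmlwin popular cons csex, ///
    level2(class: cons) level1(pupil: cons) nopause
```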
Note above that by using the ‘viewmacro’ option we also get to see the underlying MLwiN macro used for this analysis.
A great attraction is the storage of model results from MLwiN as items in Stata.
In my opinion, runmlwin is a very exciting new contribution to software for multilevel modelling, and will prove a very valuable tool for anyone with access to both packages!
Some miscellaneous points on working with MLwiN
•	Sorting data. Some MLwiN models will only work effectively if the data has been sorted in ascending order of the respective level identifiers. In some circumstances, moreover, it may be necessary to ensure that the identifiers begin at 1 and increase contiguously. Generally, it is easiest to arrange identifiers in this way in another package such as Stata or SPSS before reading the data into MLwiN.
•	Macro editor. It is possible to edit a macro within the MLwiN session using the link ‘file -> {open/new} macro’. The image below shows an example of this. However, the macro can only be executed in its entirety (there is no comparable function to the SPSS and Stata syntax editors to allow you to run only a selected subset of lines), and on balance you are likely to find this less convenient than executing a macro which is open in a separate plain text editor.
•	Saving worksheets. An important feature of MLwiN is that when a worksheet is saved in MLwiN, as well as saving the underlying data, the software automatically saves the current model settings as part of the worksheet. Opening and reviewing the Equations window is usually adequate to reveal any existing model specifications. Also, model specifications can always be cleared from memory with the ‘clear’ command. To follow ‘best practice’ regarding replication standards, it is preferable to ‘clear’ datasets of the model specifications first, then re-specify the model from first principles via your working macro. In the exploratory analysis phase, however, people frequently do exploit the save functionality to save their current model settings.
•	Reading data from other formats. MLwiN is relatively poorly designed for reading data from other software formats. In general, only plain text data can be easily read in, and you will need to add any metadata (e.g. variable names; variable descriptions) manually yourself. The effective way to do this is to write a data input macro for the relevant files, which makes the metadata specifications you want and then saves out to MLwiN data format (*.ws), thus allowing you to start an analytical file from that point. There is an illustration of this process in Practical 2b.
Useful user guides on MLwiN are:
Rasbash, J., Steele, F., Browne, W. J., & Goldstein, H. (2012). A User's Guide to MLwiN,
v2.26. Bristol: Centre for Multilevel Modelling, University of Bristol. (pdf available:
http://www.bristol.ac.uk/cmm/software/mlwin/download/2-26/manual-print.pdf)
Rasbash, J., Browne, W. J., & Goldstein, H. (2003). MLwiN 2.0 Command Manual. London:
Centre for Multilevel Modelling, Institute for Education, University of London. (pdf
available: http://www.cmm.bristol.ac.uk/MLwiN/download/commman20.pdf)
(The former covers windows operation of the package, and the latter covers the syntax command
language based upon the earlier MLN package).
How to use… R
R is free software and is a popular tool amongst statisticians and a small community of social science researchers with advanced programming skills. It is an ‘object oriented’ programming language which supports a vast range of statistical analysis routines, and many data management tasks, through its ‘base’ or extension commands. Being ‘object oriented’ is important, and means the package appears to behave in a rather different way to the other packages described above. The other packages essentially have one principal quantitative dataset in memory at any one time; they also store metadata on that dataset, and typically some other statistical results in the form of scalars and matrices, and commands are automatically applied to the variables of the principal dataset. In R, however, different quantitative datasets (‘data frames’), matrices, vectors, scalars and metadata are all stored as different ‘objects’, potentially alongside each other. R therefore works by first defining objects, and then performing operations on one or many objects, however defined.
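A tiny sketch of this object-based style (all names are hypothetical):

```r
# define some objects, then operate on them
heights <- c(170, 165, 180)                        # a short numeric vector
people  <- data.frame(id = 1:3, height = heights)  # a data frame
ls()                                               # list objects currently in memory
mean(people$height)                                # operate on one column of one object
```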
An important concept in R is the ‘extension library’, which is how ‘shortcut’ programmes to undertake many routines are supplied. In fact, you will rarely use R without exploiting extension libraries. The idea here is that R has a ‘base’ set of commands and support, and that many user-contributed programmes have been written in that base language. Those extensions typically provide shortcut routes to useful analyses. A few extension libraries in R are specifically designed to support random effects multilevel model estimation – e.g. the lme4 and nlme packages (Bates, 2005; Pinheiro & Bates, 2000).
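As an illustration of the kind of call involved, a hedged sketch using the lme4 package (the data frame and variable names follow the popularity example, but are hypothetical here):

```r
# random intercept model for pupils nested in classes, via lme4
library(lme4)
m1 <- lmer(popular ~ cextrav + csex + (1 | class), data = popdata)
summary(m1)
```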
Some researchers are very enthusiastic about R, the common reasons being that it is free and that it often supports exciting statistical models or functions which aren’t available in some other packages. However, my perspective is that R isn’t an efficient package for a social survey researcher interested in applied research: the programming demands to exploit it are very high, and, because it isn’t widely used in applied research, it hasn’t yet developed robust and helpful routines, working interfaces, or documentation standards to address popular social science data-oriented requirements.
R is installed as free software, and since it is frequently updated it is wise to regularly revisit the distribution site and re-download.
When you open R, you will see something like this ‘R console’.
(On my machine I use a ‘.Rprofile’ settings file, so my starting display is marginally different to the default.)
The first few lines show me defining a new object (a short vector) and listing the objects in current memory.
R’s basic help functions point to webpages. The general help pages mostly have generic information, and are not in general provided with worked examples. Many R users get their help from other online sources, e.g. http://www.statmethods.net/
In general, with R, the first thing you should do is ask to open a new or existing script and work from that.
Scripts in R work in a similar way to a syntax file in Stata or SPSS – highlight a line or lines, and press
‘Ctrl+R’ to run them.
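A script is simply a plain-text file of commands; for instance, the lines below (using invented data) could be highlighted and sent to the console with Ctrl+R:

```r
# A small example script: highlight these lines and press Ctrl+R
# (in the Windows R GUI) to run them

# Build a tiny data frame of made-up scores
dat <- data.frame(id = 1:5, score = c(10, 12, 9, 15, 11))

# Summarise the score variable
summary(dat$score)
mean(dat$score)   # 11.4
```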
After running commands, output is sent either to the main console or to a separate graphics window.
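For instance, in the sketch below the `summary()` results are printed in the console, whereas `hist()` opens a graphics window (or writes to a file when R is run non-interactively):

```r
# Draw 100 random values from a standard normal distribution
vals <- rnorm(100)

# Numerical output: printed in the main console
summary(vals)

# Graphical output: shown in a separate graphics window
hist(vals, main = "100 random normal draws")
```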
References cited
Altman, M., & Franklin, C. H. (2010). Managing Social Science Research Data. London: Chapman
and Hall.
Bates, D. M. (2005). Fitting linear models in R using the lme4 package. R News, 5(1), 27-30.
Browne, W. J., Cameron, B., Charlton, C. M. J., Michaelides, D. T., Parker, R. M. A., Szmaragd, C., et
al. (2012). A Beginner's Guide to Stat-JR (Beta release). Bristol: Centre for Multilevel
Modelling, University of Bristol.
Crouchley, R., Stott, D., & Pritchard, J. (2009) Multivariate generalised linear mixed models via
sabreStata (Sabre in Stata). Lancaster: Centre for e-Science, Lancaster University.
Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research
Methodology, 9(2), 143-158.
Field, A. (2013). Discovering Statistics using SPSS, Fourth Edition. London: Sage.
Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology?
Sociological Methods and Research, 36(2), 153-171.
Fox, J., & Weisberg, S. (2011). An R Companion to Applied Regression, 2nd Ed. London: Sage.
Gelman, A., & Hill, J. (2007). Data Analysis using Regression and Multilevel/Hierarchical Models.
Cambridge: Cambridge University Press.
Heck, R. H., Thomas, S. L., & Tabata, L. N. (2010). Multilevel and Longitudinal Modeling with IBM
SPSS. Hove: Psychology Press.
Heck, R. H., Thomas, S. L., & Tabata, L. N. (2012). Multilevel Modeling of Categorical Outcomes
Using IBM SPSS. London: Routledge.
Hox, J. (2010). Multilevel Analysis, 2nd Edition. London: Routledge.
Kohler, H. P., & Kreuter, F. (2009). Data Analysis Using Stata, 2nd Edition. College Station, Texas:
Stata Press.
Kulas, J. T. (2008). SPSS Essentials: Managing and Analyzing Social Sciences Data. New York:
Jossey Bass.
Leckie, G., & Charlton, C. (2011). runmlwin: Running MLwiN from within Stata. Bristol: University of
Bristol, Centre for Multilevel Modelling, http://www.bristol.ac.uk/cmm/software/runmlwin/
[accessed 1.6.2011].
Levesque, R., & SPSS Inc. (2010). Programming and Data Management for IBM SPSS Statistics 18:
A Guide for PASW Statistics and SAS users. Chicago: SPSS Inc.
Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
Muthén, L. K., & Muthén, B. O. (1998-2010). Mplus User's Guide, Sixth Edition. Los Angeles,
California: Muthén and Muthén, and http://www.statmodel.com/
Pinheiro, J. C., & Bates, D. M. (2000). Mixed-Effects Models in S and S-Plus. New York: Springer-Verlag.
Rabe-Hesketh, S., & Skrondal, A. (2008). Multilevel and Longitudinal Modelling Using Stata, Second
Edition. College Station, Tx: Stata Press.
Rafferty, A., & Watham, J. (2008). Working with survey files: using hierarchical data, matching files
and pooling data. Manchester: Economic and Social Data Service, and
http://www.esds.ac.uk/government/resources/analysis/.
Rasbash, J., Browne, W. J., Healy, M., Cameron, B., & Charlton, C. (2011). MLwiN, Version 2.24.
Bristol: Centre for Multilevel Modelling, University of Bristol.
Rasbash, J., Steele, F., Browne, W. J., & Goldstein, H. (2012). A User's Guide to MLwiN, v2.26.
Bristol: Centre for Multilevel Modelling, University of Bristol.
Raudenbush, S. W., Bryk, A. S., Cheong, Y. F., & Congdon, R. (2004). HLM 6: Hierarchical Linear
and Nonlinear Modelling. Chicago: Scientific Software International.
Reardon, S. (2006). HLM: Stata Module to invoke and run HLM v6 software from within Stata. Boston:
EconPapers, Statistical Software Components
(http://econpapers.repec.org/software/bocbocode/s448001.htm).
Spector, P. (2008). Data Manipulation with R (Use R). Amsterdam: Springer.
Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York:
Jossey Bass.
Twisk, J. W. R. (2006). Applied Multilevel Analysis: A Practical Guide. Cambridge: Cambridge
University Press.
West, B. T., Welch, K. B., & Galecki, A. T. (2007). Linear Mixed Models. Boca Raton, Fl: Chapman
and Hall.
Software-specific recommendations:
Stata
Kohler, H. P., & Kreuter, F. (2009). Data Analysis Using Stata, 2nd Ed. College Station, Tx: Stata
Press.
Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and Longitudinal Modelling Using Stata, Third
Edition. College Station, Tx: Stata Press.
Web: http://www.longitudinal.stir.ac.uk/Stata_support.html ; http://www.ats.ucla.edu/stat/stata/
SPSS
Field, A. (2013). Discovering Statistics using SPSS, Fourth Edition. London: Sage.
Kulas, J. T. (2008). SPSS Essentials: Managing and Analyzing Social Sciences Data. New York:
Jossey Bass.
Levesque, R., & SPSS Inc. (2010). Programming and Data Management for IBM SPSS Statistics 18:
A Guide for PASW Statistics and SAS users. Chicago: SPSS Inc.
Web: http://www.spsstools.net/spss.htm
MLwiN
Rasbash, J., Steele, F., Browne, W. J., & Goldstein, H. (2009). A User's Guide to MLwiN, v2.10.
Bristol: Centre for Multilevel Modelling, University of Bristol.
Web: http://www.bristol.ac.uk/cmm/software/mlwin/
R
Field, A., Miles, J., & Field, Z. (2013). Discovering Statistics using R. London: Sage.
Fox, J., & Weisberg, S. (2011). An R Companion to Applied Regression, 2nd Ed. London: Sage.
Gelman, A., & Hill, J. (2007). Data Analysis using Regression and Multilevel/Hierarchical Models.
Cambridge: Cambridge University Press.
Spector, P. (2008). Data Manipulation with R (Use R). Amsterdam: Springer.
(There is also a useful guide to using R, within SPSS, in the body of Levesque & SPSS Inc 2010, cited above).
Web: http://www.ats.ucla.edu/stat/r/