Use of R Software to Obtain Model Fits to Data
Note that R is the statistics software used in Math 141. It is widely available on campus. Some mock
data files are also available on the 394 web site.
Preliminaries
I recommend maintaining your raw data in Excel or some other spreadsheet software. It is easy to
do simple arithmetic operations on the data in those files. You can then export the data as
tab-delimited text files (under the Save As menu), which can be imported into R. Also, I have not
included any instructions on how to produce high quality plots from R, though it can be done.
Instead, I recommend creating those plots in Excel (or some other appropriate software), unless you
are already proficient in R.
In creating any Excel spreadsheets, I recommend creating labeled columns for substrate
concentration and rate (“s” and “v”).
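As a rough sketch of the export and import step, the lines below write a small tab-delimited file and read it back. The numbers are made up for illustration, and the file name “MockData.txt” is hypothetical; it is not one of the course files.

```r
# Hypothetical mock data; only the column names "s" and "v" come from the handout.
mock <- data.frame(s = c(1, 2, 5, 10, 20),
                   v = c(0.010, 0.017, 0.027, 0.034, 0.039))

# Write a tab-delimited text file, similar to what a spreadsheet export produces.
path <- file.path(tempdir(), "MockData.txt")
write.table(mock, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back; header = TRUE treats the first row as column names.
mydata <- read.table(path, header = TRUE)
print(mydata)
```

If the column names printed by R are not “s” and “v”, check the first row of the exported file before going further.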
Fitting Data with a Linear Least Squares Fit Regression
•
Open R. An “R Console” window will open. From here on out, I will provide the
commands that you should enter following the “>” prompt in Arial font. A comment for a given
command will follow the “#” sign.
•
Identify the directory in which you are working. For example, if your files are in a folder
called “LDH” on your Desktop, type setwd("/Users/arthur/Desktop/LDH"). Use straight quotes in R
commands; curly quotes pasted from a word processor cause an error.
•
Load the text file containing your data. For example, using the sample file “SimpleData.txt”,
type mydata = read.table("SimpleData.txt", header=TRUE). The header argument tells R that the
first row of the file holds the column names. “TRUE” must be capitalized. You can check that the
data were properly imported by typing data.frame(mydata), which will show you what you got.
•
Specify your variables, using the header names from the text file.
s <- mydata$s
v <- mydata$v
•
You can also create new variables appropriate for the linear plots. In the following, the “$”
picks out a named column of the data set, so mydata$s is the column of s values. Assigning to a
new name, such as mydata$is, adds a new column.
mydata$is <- 1/mydata$s
mydata$iv <- 1/mydata$v
•
R allows you to create plots for your data, as a quick check that all is good. Type
plot(mydata$is, mydata$iv). Note that the x variable precedes the y variable.
•
Now for the linear regression. Type lmfit <- lm(formula = iv ~ is, data = mydata). Note that
the y variable precedes the x variable. This command won’t show anything. To see a summary of
the fit, type summary(lmfit). A table of coefficients will appear, along with a lot of other output.
The “Estimate” of the (Intercept) is the y-intercept, along with a standard error. The slope is
provided by the “Estimate” of the x-variable (is in this case) and it has a standard error estimate as
well. These are the results you want. The quality of the fit is given by the residual standard error.
•
Just to check that you didn’t make any silly mistakes, you can plot the line in R over the plot
you’ve made. Type abline(lmfit). Make sure it looks reasonable.
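The steps in this section can be sketched end to end as follows. The data here are synthetic: they are generated from assumed values Vmax = 0.03 and Km = 10 with a small made-up wobble, so they are not real measurements. The sketch also assumes the double-reciprocal form of the Michaelis-Menten equation, 1/v = (Km/Vmax)(1/s) + 1/Vmax, so that the intercept is 1/Vmax and the slope is Km/Vmax.

```r
# Synthetic data (assumed Vmax = 0.03, Km = 10) with a small deterministic wobble.
s <- c(2, 5, 10, 20, 50)
v <- 0.03 * s / (10 + s) * c(1.02, 0.98, 1.01, 0.99, 1.00)
mydata <- data.frame(s = s, v = v)

# Reciprocal variables for the linear plot.
mydata$is <- 1 / mydata$s
mydata$iv <- 1 / mydata$v

# Linear least squares fit: y variable first, then x variable.
lmfit <- lm(iv ~ is, data = mydata)

# Coefficient table: estimates and standard errors for intercept and slope.
tab <- coef(summary(lmfit))
print(tab[, c("Estimate", "Std. Error")])

# Convert intercept and slope back to kinetic parameters.
vmax <- 1 / tab["(Intercept)", "Estimate"]
km <- tab["is", "Estimate"] * vmax
print(c(vmax = vmax, km = km))

# Residual standard error of the fit.
print(summary(lmfit)$sigma)
```

With your own data, replace the first three lines with the read.table step. Note that the standard errors in the table apply to the intercept and slope, not directly to Vmax and Km.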
Fitting Data with a Non-Linear Least Squares Fit Regression
•
Open R. An “R Console” window will open. From here on out, I will provide the
commands that you should enter following the “>” prompt in Arial font. A comment for a given
command will follow the “#” sign.
•
Identify the directory in which you are working. For example, if your files are in a folder
called “FA” on your Desktop, type setwd("/Users/arthur/Desktop/FA").
•
Load the text file containing your data. For example, using the sample file “fadata.txt”, type
mydata = read.table("fadata.txt", header=TRUE). The header argument tells R that the first row of
the file holds the column names. “TRUE” must be capitalized. You can check that the data were
properly imported by typing data.frame(mydata), which will show you what you got.
•
Specify your variables, using the header names from the text file.
s <- mydata$s
v <- mydata$v
•
The fit is achieved by the following:
nlsfit <- nls(v~(vm*s/(km+s)),data=mydata, start=list(km=10, vm=.03))
A couple of comments. Note that the binding equation is embedded in the above line, with new
parameters identified as vm (meaning Vmax) and km (Michaelis constant). The fit is iterative, so R
needs a starting guess for each parameter, supplied in the “start” list. I guessed 10 for km and
.03 for vm. You’ll need to guess values based on your data.
•
The statistics are available by typing summary(nlsfit). Note that the Estimate is the value
that R obtains from the fit and Std. Error is the standard error for that value. Think about
significant figures before transferring this information somewhere. The residual standard error
provides an overall measure of the quality of the fit. It can be readily compared if you are looking at
models that have an equal number of parameters (true for our cases).
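A minimal sketch of the non-linear fit is given below. Again the data are synthetic, generated from assumed values Vmax = 0.03 and Km = 10 with a small made-up wobble (nls is not meant for data that fit the model perfectly). The starting guesses are deliberately a little off to mimic guessing from a plot.

```r
# Synthetic data (assumed Vmax = 0.03, Km = 10) with a small deterministic wobble.
s <- c(2, 5, 10, 20, 50)
v <- 0.03 * s / (10 + s) * c(1.02, 0.98, 1.01, 0.99, 1.00)
mydata <- data.frame(s = s, v = v)

# Non-linear least squares fit of the Michaelis-Menten equation.
nlsfit <- nls(v ~ vm * s / (km + s), data = mydata,
              start = list(km = 8, vm = 0.04))

# Estimates and standard errors for km and vm.
tab <- coef(summary(nlsfit))
print(tab[, c("Estimate", "Std. Error")])

# Residual standard error: overall measure of the quality of the fit.
rse <- summary(nlsfit)$sigma
print(rse)
```

If nls stops with an error about convergence or a singular gradient, the usual first thing to try is a different set of starting values.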
Fitting Inhibition Data
The process of using R with multiple independent variables (substrate and inhibitor concentration)
is very similar to performing the non-linear least squares fit described above. You will need to fit the
data to an appropriate algebraic model and you will need to estimate starting values for each parameter
(Vmax, Km and Ki). You now have good initial guesses for the first two, and you should try a few
different initial guesses for Ki to be sure you get consistent estimates with any given model.
Take a look at the Inhibited files I’ve created to see how I’ve organized the data. When you load it
into R it’s the identical procedure as above, but you’ll need to identify a third variable by typing
something like:
i <- mydata$i
And when it comes time to test the model, here’s what you might type for competitive inhibition.
nlsfit <- nls(v~Vm*s/(s+(Km*(1+(i/Ki)))),data=mydata, start=list(Km=15, Vm=9,Ki=4))
To see the statistics, type summary(nlsfit).
Note the residual standard error. This is a measure of the deviation of the fit from the observed
data. The smaller the number, the better the fit (important in your report).
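The competitive inhibition fit, including the check of several starting guesses for Ki, might be sketched as follows. The data are synthetic, generated from assumed values Vm = 9, Km = 15 and Ki = 4 with a small made-up wobble, and the helper function fit_with_start is a hypothetical name introduced only for this illustration.

```r
# Synthetic data: every combination of substrate (s) and inhibitor (i) levels.
mydata <- expand.grid(s = c(5, 10, 20, 50, 100), i = c(0, 2, 5, 10))
wobble <- rep(c(1.02, 0.98, 1.01, 0.99, 1.00), times = 4)
mydata$v <- 9 * mydata$s / (mydata$s + 15 * (1 + mydata$i / 4)) * wobble

# Hypothetical helper: fit the competitive model from a given Ki starting guess.
fit_with_start <- function(ki_start) {
  nls(v ~ Vm * s / (s + Km * (1 + i / Ki)), data = mydata,
      start = list(Km = 15, Vm = 9, Ki = ki_start))
}

# Try a few different initial guesses for Ki.
fits <- lapply(c(2, 4, 8), fit_with_start)

# The Ki estimates should agree if the fit is well behaved.
ki_estimates <- sapply(fits, function(f) coef(f)[["Ki"]])
print(ki_estimates)

# Residual standard error for each fit (smaller means a closer fit).
rse <- sapply(fits, function(f) summary(f)$sigma)
print(rse)
```

To test a different inhibition model, only the formula inside nls needs to change; the residual standard errors of the competing models can then be compared directly when they have the same number of parameters.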