Introduction
This document contains transcripts for the podcasts available for this course. These transcripts
were prepared by staff at the Northeastern State University Center for Teaching and Learning to
meet requirements of the Americans with Disabilities Act.
Thanks to Devon Isaacs.
VGenetic16
In this podcast I’m going to try to give you some hints about how to work genetic linkage problems. These may be things that you’ve worked out on your own, or they may be things that you haven’t worked out on your own, in which case I hope this will be useful.
Let’s start out by reviewing what’s involved in doing linkage analysis. In linkage analysis you start by crossing two pure-breeding parental lines, in this case parent one (AB/AB) and parent two (ab/ab). In order to work out how this cross comes out, we need to know the genotypes of the gametes that will be produced. For parent one the gametes will be Big A/Big B. For parent two the gametes will be little a/little b. These are the parental type gametes, and we’re showing parental type gametes in red in the rest of this presentation. Once you know the parental type gametes you can figure out what the progeny of the cross will be like. The progeny get one chromosome from parent one, that’s the AB chromosome, and they get the other chromosome from parent two, that’s the ab chromosome.
To do linkage analysis, of course, we have to do a test cross. A test cross is simply a cross between a double heterozygote, in this case AB/ab, and a doubly homozygous recessive (ab/ab). There will be four different types of progeny from the test cross. Two of them will be parental type; those are the ones shown in red (AB/ab and ab/ab). Those get chromosomes from the double heterozygote without any crossing over. The other two types of progeny in the test cross are shown in green. In these, recombination (crossing over) has occurred between the a-locus and the b-locus, and this has generated recombinant type gametes, giving progeny that are Ab/ab in one case and aB/ab in the other case. The parental type test cross progeny will be more abundant…there will be more of them. And the recombinant type test cross progeny will be less abundant; there will be fewer of them.
So let’s look now at some different types of questions that you might be asked. What’s basically going to happen in these questions is you’re going to be given a limited amount of information and then you’re going to need to work out the rest of the information that I just showed you. I’m going to show you how to do that, and show you the connections between the different types of information.
The simplest type of question you could be asked is probably one where you’re given the genotypes of parent one and parent two, the pure-breeding parents. In this case we’re doing a cross in repulsion: one parent is Bc/Bc and the other parent is bC/bC. If you know that, then you can work out the gametes that they will produce (Bc and bC); those are the parental type gametes. And if you know the gametes that can be produced, you know what the progeny of this original cross will be. You know what the genotype of the double heterozygote will be: it will be Bc/bC. We take that double heterozygote and we do a test cross with the doubly homozygous recessive (bc/bc). That will give us four types of progeny. Two of those progeny will be parental type, shown in red, and two of them will be recombinant type. And of course you know that the parental type will be more frequent and the recombinant type will be less frequent.
Another way that you could be asked this question would be that you could be given the parental type gametes. You could be told that the parental type gamete for parent one is CD, and the parental type gamete for parent two is cd. If you know that, you can figure out the genotypes of the pure-breeding parents. Because they’re pure-breeding, both chromosomes have to be the same. So parent one has to be CD/CD and parent two has to be cd/cd. You can also, of course, figure out the genotype of the progeny, the double heterozygote that you’re going to use in the test cross. And there’s the test cross strain (CD/cd).
And from that you can figure out what the four progeny classes will be. Parentals at the top in red, recombinants at the bottom in green… and you’ll know which are more frequent and which are less frequent.
The third version of this type of question would be one where you’re given the genotype of the testcross double heterozygote. In this case it’s De/dE. If you know that, then you know that one parental type gamete has to be De and the other has to be dE. Given that, you can work out the genotypes of the pure-breeding parents. You can also figure out what the genotypes of the four testcross progeny will be. And as we’ve seen before, you know that the parental types will be more frequent and the recombinant types will be less frequent.
One last way that you can be given a question like this is you could be told the genotype of one of the testcross progeny. You could be told that it is one of the more frequent progeny classes. The key here is that if you know it’s the more frequent progeny class, you know that it’s parental type, shown in red (EF/ef). You could also, of course, be given a genotype and be told it is less frequent, in which case it would be recombinant type. Either way, if you know whether your testcross progeny type is more frequent (parental type) or less frequent (recombinant type), you can then work out the other three testcross progeny types. Given that information you can work out the genotype of the double heterozygote that was used in the testcross. From that, of course, you can figure out what the parental type gametes are. And from that you could, if necessary, figure out what the genotypes of the pure-breeding parents are.
So, all of this information is connected together. It’s all related, and that means if you’re given just a little bit of information you can work out the rest of it.
There is of course another aspect to linkage type questions, and that is actually working out map distances. I’m going to look at that in two different ways.
The first one is that you could be given information about a testcross and you could be given the numbers of each progeny class. In this case I’m saying that there are 120 which got the Fg parental type gamete, 120 that got the fG parental type gamete, and 80 for each of the two recombinant types (FG and fg). From that you can calculate the map distance between the f-locus and the g-locus. The map distance, of course, is given by the percentage of recombinants. You calculate the percentage of recombinants by taking the number of recombinant type progeny, multiplying it by 100 to make it a percent, and dividing by the total number of progeny. In this case the recombinants are 80 for the FG and 80 for the fg. The total progeny, of course, is 120 + 120 + 80 + 80, the sum of all the progeny that there are. If you do the calculation, you find that you get 40% recombinants… or put another way, the f-locus and the g-locus, those two loci, are 40 map units apart.
Finally, there is one other twist on that. You could be given the map distance (20 map units in this case) and asked to figure out how many of the different testcross progeny types there were. If there are 20 map units, then that means that the recombinant types (shown in green at the bottom), Gh and gH, are 20% of the total. That means that the parental types must be 80% of the total. It further follows that if the recombinant types together are 20% of the total progeny, then each individual recombinant type must be 10% of the progeny. So 10% of the progeny will be produced from Gh gametes, and 10% will be produced from gH gametes. Similarly, the 80% parentals must have 40% from the GH parental type gametes and 40% from the gh parental type gametes contributing to the testcross progeny. So that shows you another angle on linkage type questions.
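Both directions of the map-distance calculation can be checked with a short Python sketch. The function names `map_distance` and `class_percentages` are illustrative only, and the second function assumes the simple model used in this podcast, in which the two parental classes are equally frequent and the two recombinant classes are equally frequent.

```python
def map_distance(parental_counts, recombinant_counts):
    """Percentage of recombinant progeny, read as map units."""
    recombinants = sum(recombinant_counts)
    total = sum(parental_counts) + recombinants
    return 100 * recombinants / total


def class_percentages(map_units):
    """Expected percentage of each parental class and each recombinant class.

    Assumes the two classes of each kind are equally frequent.
    """
    each_recombinant = map_units / 2
    each_parental = (100 - map_units) / 2
    return each_parental, each_recombinant


# Progeny counts from the F/G example: 120 + 120 parentals, 80 + 80 recombinants.
print(map_distance([120, 120], [80, 80]))

# The G/H example: 20 map units between the loci.
print(class_percentages(20))
```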
You can get practice doing various types of linkage problems using the interactive problems that are available on the website. You’re looking for the practice questions for Lecture 16, Questions 1-10. There’s a series which gives you progressively less and less information, and then there are a few extra ones thrown in at the end for additional practice. I hope you find that useful, and thank you for your time.
VGenetic 22
In this genetics podcast I’m going to try to show you how to work a particular type of question involving Hardy-Weinberg equilibrium; specifically, a question where you’re asked to find out if a population is in Hardy-Weinberg equilibrium or not. The example we’re going to look at concerns the Scarlet Tiger Moth. In the Scarlet Tiger Moth there’s a variable amount of white spotting on the forewing. This is controlled by a single locus which has two alleles, and these two alleles show incomplete dominance. This shows the phenotype of one homozygote, the one that has a lot of white spotting. This picture is a simulation of what the individual would look like if it had the…uh, if it was homozygous for the allele that has the least amount of white spotting. And this is the intermediate, the heterozygote, which has an intermediate amount of white spotting as you’d expect from incomplete dominance.
The population was studied, and the number of individuals of each phenotype was determined. For the homozygotes which had a lot of white spotting, there were 2,781. For the heterozygotes with the intermediate amount of white spotting, there were 337. And for the homozygotes which had very little white spotting, uh, there were just eleven. So the question is, is this population in Hardy-Weinberg equilibrium as far as the alleles of this forewing spotting locus are concerned?
So let’s look at how we figure that out. Here in table form is the data that we have so far. We know that there are 2,781 of the AA genotype, 337 of the Aa genotype, and 11 of the aa genotype. We need to know the total number of individuals in the population, which we get simply by adding those three genotypes together. So 2,781 + 337 + 11… that gives us the total number of individuals: 3,129.
We’re now going to use that to figure out the allele frequencies. Let’s start off by looking at the Big A allele frequency. The individuals who are homozygous Big A/Big A each have two Big A alleles, so the number of Big A alleles from those individuals is 2 x 2,781. In addition, the heterozygotes each have one Big A allele and there are 337 of those, so we need to add that into our total number of Big A alleles [(2 x 2,781) + 337]. To get the frequency we need to divide by the total number of alleles in the population. The size of the population is 3,129, and each of those individuals has two alleles, so the total number of alleles of all types in the population is 2 x 3,129. If we divide the total number of Big A alleles in the population by the total number of alleles in the population, we get the allele frequency for Big A, which works out at 0.943. That’s our first allele frequency.
[(2 x 2,781) + 337] / (2 x 3,129) = 0.943
We can now calculate the allele frequency for little a. The homozygous individuals, little a/little a, each have two little a alleles and there are eleven of those, so (2 x 11) gives us the number of little a alleles that those individuals contribute. And then of course the heterozygotes, Big A/little a, have one little a allele…there are 337 of those and so we add those in. That’s the number of little a alleles in the population.
To get the allele frequency we need to divide by the total number of alleles in the population, and that’s given by the total number of individuals in the population times two. So that’s our calculation for the allele frequency. We divide the total number of little a alleles in the population by the total number of alleles in the population, and that comes to 0.057 for the allele frequency for little a.
[(2 x 11) + 337] / (2 x 3,129) = 0.057
In terms of Hardy-Weinberg equilibrium, those two frequencies correspond to “p” and “q” respectively. So p is 0.943 and q is 0.057. We’re going to use those allele frequencies to calculate the expected genotype frequencies now. If the population was in Hardy-Weinberg equilibrium, then the expected genotype frequency for AA would be p², and p² is 0.943 times itself, which comes out as 0.889. So we just put that in there. The genotype frequency of aa is q², so we take our value of q, which is 0.057, and multiply that by itself…and that comes out to 0.003 if you round to three decimal places. The frequency of heterozygotes is given by 2pq, and that’s (2 x 0.943 x 0.057), which works out at 0.108. That’s our genotype frequency for the heterozygotes.
Those are frequencies, and what we need is the expected number of individuals in the population. We get that by multiplying the genotype frequencies by the total number of individuals in the population, which is 3,129. To get the expected number of AA individuals, we take the genotype frequency for AA (which is 0.889) times the total number of individuals in the population…and that comes out to 2,782. Similarly, to get the expected number of Aa heterozygotes, we take the genotype frequency of the heterozygotes times the total number of individuals in the population…that gives us 338 for our expected number of heterozygotes.
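The allele-frequency and expected-count arithmetic can be reproduced with a short Python sketch. The function names are illustrative, and the rounding to three decimal places at each step is an assumption made here to mirror the figures quoted in this podcast; carrying unrounded values through would give slightly different expected counts. The same multiplication gives the expected count for all three genotypes, including aa.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q): frequencies of the A and a alleles from genotype counts."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # every individual carries two alleles
    p = (2 * n_AA + n_Aa) / total_alleles
    q = (2 * n_aa + n_Aa) / total_alleles
    return p, q


def expected_counts(p, q, n_individuals, digits=3):
    """Hardy-Weinberg expected counts, rounding frequencies to `digits` places."""
    frequencies = {
        "AA": round(p * p, digits),
        "Aa": round(2 * p * q, digits),
        "aa": round(q * q, digits),
    }
    return {g: round(f * n_individuals) for g, f in frequencies.items()}


observed = {"AA": 2781, "Aa": 337, "aa": 11}
p, q = allele_frequencies(observed["AA"], observed["Aa"], observed["aa"])
p, q = round(p, 3), round(q, 3)  # rounded as in the worked example
print(p, q)
print(expected_counts(p, q, sum(observed.values())))
```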
And for the aa individuals it’s the same calculation: genotype frequency times the total number of individuals in the population gives us our expected number of individuals…in this case, nine.
We’re now ready to do our statistical comparison. We’ve got our observed numbers in the left-hand column and the expected numbers in the right-hand column. Just looking at these you can probably guess what the result’s going to be, but let’s just go through the calculation. The statistical test we use is the chi-squared test. In the chi-squared test, what we do is calculate the observed minus the expected, squared, divided by the expected number of individuals, for each of the three genotypes concerned. So we have one term for AA, one term for Aa, and one term for aa. All we do now is plug in the numbers. So for the AA term we plug in those numbers there like that. For the Aa heterozygote we plug those numbers into that term there, like so. And similarly for aa homozygotes, we plug those numbers into that term there. If you do that calculation, you will find that it comes out to 0.448.
(2,781 - 2,782)² / 2,782 + (337 - 338)² / 338 + (11 - 9)² / 9 = 0.448
Now we need to know what that number means…what does that chi-squared value tell us? The critical value for a chi-squared test that has two degrees of freedom like this is 5.991. If χ² is greater than 5.991 you reject the hypothesis that the population is in Hardy-Weinberg equilibrium because there’s a less than 5% chance of getting that result by chance. And if χ² is less than 5.991 then that means there’s a greater than 5% chance of getting that result by chance, and by convention you accept the hypothesis. In our case the chi-squared value was 0.448, and since that is less than the critical value, we accept the hypothesis that the population is in Hardy-Weinberg equilibrium…and that’s the answer to our question.
There are two interactive problems available that will allow you to practice what you’ve learned in this podcast and I encourage you to do those:
• Lecture 22, Question 4
• Lecture 22, Question 5
As always, if you have any questions about anything please come and talk to me.
VGenetic26
This podcast is about nutritional mutants and the use of selective media. It was put together
because these are topics that seem to cause some difficulty. What we’re dealing with here primarily is
the nutritional requirements of bacteria and fungi. Many bacteria and fungi are what are called
prototrophs. That is, given a source of carbon and some salts they can make all of the amino
acids, nucleotides, lipids and so on that they need to survive. Cells which are prototrophs can
grow on what is called minimal medium. Minimal medium is simply medium which has a carbon
source and salts and nothing else.
If we look at the ability of prototrophs to grow on different media, if we put prototrophs on
minimal media…there are three cells that have been put on minimal media…
…then those cells will happily use the carbon source and salts to make all of the material they
need, and will therefore be able to grow and produce colonies which we’ll be able to see on the
plate. If we were to put the prototrophs on complete medium, that is medium which has a carbon
source, salts, and also all of the amino acids and nucleotides and other things that a bacterium
might possibly need, then the prototroph will of course be able to grow again because it can
make everything it needs or take everything it needs out of the medium. And so again, we will
see colonies.
Often when we are doing genetics with bacteria and fungi, we are dealing with what are called
auxotrophs. Auxotrophs are strains which cannot make a particular amino acid or nucleotide.
And the reason why they cannot make a particular amino acid or nucleotide is because a
mutation has occurred in the gene for an enzyme in a biochemical pathway. And because of that
mutation they no longer have an enzyme in the pathway to make a particular amino acid.
For example… if we look at the biochemical pathway that leads to the formation of methionine.
Methionine is an amino acid which is required if the cell is going to be able to make proteins. So,
cells have to be able to get methionine in order to survive. The pathway has a number of steps
with various intermediates, and if you had a mutation in the gene for the enzyme that catalyzes the first step,
from homoserine to succinylhomoserine, then… you wouldn’t be able to make methionine.
Similarly, if you were to have a mutation in the gene for the second enzyme going from
succinylhomoserine to cystathionine…you wouldn’t be able to make methionine.
And the same is also true for mutations in the gene for the third enzyme and mutations in the
gene for the fourth enzyme.
If you have a mutation in the gene for any of those enzymes the cell will not be able to make
methionine. And if it cannot make methionine it won’t be able to grow.
So, auxotrophs cannot make a specific amino acid or nucleotide (such as methionine). Because
of that they won’t be able to grow on minimal medium because minimal medium doesn’t have
any methionine in it. If we want the auxotrophs to be able to grow we will need to put
methionine into that minimal medium. And what we say is that the medium must be
supplemented with, in this case, methionine.
Let’s look now at the growth of a methionine auxotroph. That is a cell which cannot make
methionine and therefore needs to have methionine in the medium. If we put methionine
auxotrophs onto minimal medium those cells cannot make their own methionine, and there’s no
methionine in the medium, and so those cells won’t be able to make proteins…they won’t
survive, and we won’t see any colonies on the minimal medium. If we were to put the cells on
complete medium instead… so there you can see three cells in complete medium.
Complete medium has all of the amino acids a bacterium could possibly want including
methionine. And so even though the cells cannot make their own methionine they can take out
methionine from the surrounding medium, and that will let them make the proteins they need and
be able to grow.
Lastly, let’s look at the situation where we have minimal medium which has been supplemented
with methionine. It’s got methionine added in. If we put the methionine auxotroph cells onto this
medium, then what will happen is that the cells can take out methionine from the medium. And
consequently they can make the proteins they need and they will be able to grow and we will see
colonies.
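The growth rule described so far can be written as a single test: a strain grows if the medium supplies everything the strain cannot make for itself. The Python sketch below is an illustration only; the name `can_grow` and the `COMPLETE` marker are made up here, and complete medium is modelled simply as "supplies everything".

```python
COMPLETE = "complete"  # stand-in for a medium that contains every supplement


def can_grow(cannot_make, medium):
    """True if the medium supplies every nutrient the strain cannot make.

    cannot_make: set of nutrients the strain is auxotrophic for
                 (empty for a prototroph).
    medium: set of supplements added to minimal medium, or COMPLETE.
    """
    if medium == COMPLETE:
        return True
    return set(cannot_make) <= set(medium)


strains = {"prototroph": set(), "methionine auxotroph": {"methionine"}}
media = {
    "minimal": set(),
    "complete": COMPLETE,
    "minimal + methionine": {"methionine"},
}

for strain_name, cannot_make in strains.items():
    for medium_name, medium in media.items():
        result = "grows" if can_grow(cannot_make, medium) else "no growth"
        print(f"{strain_name} on {medium_name}: {result}")
```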
Next we’ll take a look at how genes are named when they affect nutritional requirements. The
usual way to do this is to name the gene after the amino acid which is made in the affected
pathway. So for example, if we were talking about genes for methionine synthesis we would call
these “met” genes. Now we saw in the biochemical pathway that there are several enzymes in
the methionine pathway, one for each step in the pathway, and we need to distinguish those
different enzymes and we do that by adding a letter to the name. So, the metA gene is a gene
which codes for one of the enzymes required for methionine synthesis. And here you see the
pathway with the genes for the appropriate enzymes for each step. There’s a metA gene for the
first step, a metB gene for the second step, metC, metE…each one codes for a different enzyme.
But all of those enzymes are required for methionine synthesis. And in case you’re wondering, I
don’t know what happened to metD.
We also need to be able to distinguish between the wild-type functional version of the gene and
the mutant non-functional version of the gene. And we do this by using superscripts. If the allele
is functional, that is, it makes an enzyme which works and will allow the cell to make methionine,
we use a superscript plus (+).
For example…
If the allele is not functional, if it’s mutant so that the cell cannot make methionine, then we put a superscript minus (-).
So for example…
We’re going to look next at what happens when you grow a mixed culture. That is, cells which
are a mixture of prototrophs and auxotrophs on different media. And we’ll start off by looking at
what happens when we put a mixture of met+ and met – cells on minimal medium. So here we’ve
put six cells onto minimal medium. And the white ones represent metA+ cells; that is prototrophs
that can make their own methionine. And the red ones represent metA – cells; which cannot make
methionine. They are methionine auxotrophs. What will happen here is that the metA+ cells will
happily make their own methionine and will grow and form colonies. Whereas the metA – cells
will be unable to make methionine, and so will be unable to grow and will not appear as
colonies.
If we put the same cells onto minimal medium which has methionine added to it…what will
happen is that the metA+ cells…well they can grow regardless of whether there’s methionine
present or not because they can make their own. But this time the metA – cells will also be able to
grow because they’ll be able to take out the methionine from the plate. And the result is that all
six cells will grow.
However, the metA – cells are not actually a different color from the regular cells. So we won’t
actually be able to distinguish them. We will just know that we get more colonies on the plate.
We can use the behavior of prototrophs and auxotrophs on different media to figure out how
many cells there are of a particular genotype on a plate. And we’re going to look at a number
of examples. In the first example we’re going to look at a situation where we have a sample
which has 100 cells which are hisD+, that is, they have functional enzymes for
histidine biosynthesis, and 50 cells which are hisD-, that is, they are histidine
auxotrophs, they cannot grow unless histidine is present in the medium. And what we want to do
is we want to predict how many colonies we would get on minimal medium and how many
colonies we would get on minimal medium which has been supplemented with histidine. If you
want to try and work this out on your own, you can have a stab at it now, just pause the podcast.
I’m going to go on and walk you through the process.
So, we’ll look first at the situation on minimal medium. And on minimal medium the hisD+ cells
will be able to grow because they can make their own histidine. However, the hisD- cells they
can’t make their own histidine and there’s no histidine in the medium so they won’t be able to
grow. So the only cells which will grow here are the hisD+ cells…there are 100 hisD+ cells in the
sample so we would expect to see 100 colonies.
If we look at the other medium, the medium which is supplemented with histidine…in this case
both types of cell can grow. The hisD+ cells can grow because they can grow whether or not
histidine is present and they can make their own histidine. And the hisD- cells, those can grow
because they can take out histidine from the surrounding medium. So that means we will see 150
colonies: 100 of those colonies are the hisD+ cells and 50 are the hisD- cells. So that’s what we
will see on the two types of medium.
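The histidine example amounts to adding up the cells whose requirements the plate satisfies. A minimal Python sketch, using an illustrative `colony_count` helper and assuming every cell that can grow gives exactly one colony:

```python
def colony_count(sample, supplements):
    """Count colonies on minimal medium plus the given supplements.

    sample: list of (nutrients the cells cannot make, number of cells).
    Assumes each cell that can grow forms exactly one colony.
    """
    return sum(cells for needs, cells in sample if set(needs) <= set(supplements))


# 100 hisD+ cells (no requirement) and 50 hisD- cells (need histidine).
sample = [((), 100), (("histidine",), 50)]

print(colony_count(sample, ()))              # minimal medium
print(colony_count(sample, ("histidine",)))  # minimal medium plus histidine
```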
This next example is the same idea…it’s kind of approaching it from a different direction. This time we
know that a sample gives 80 colonies on minimal medium and 360 colonies on minimal medium
supplemented with arginine. And apparently we have two genotypes present argF+, those are
able to make their own arginine, and argF-, those are cells which are arginine auxotrophs and
they won’t be able to grow unless arginine is present in the medium. So, if you want to try this
on your own you can pause the podcast at this point. I’m going to go on and walk you through
the example. We’ll start off by looking at the minimal medium. There are 80 colonies on
minimal medium. Now which type of cells can grow on minimal medium? The argF+ cells can
grow on minimal medium because they can make their own arginine. However, the argF- cells
cannot grow on minimal medium because they can’t make arginine from a carbon source and
some salts. So we see 80 colonies on minimal medium; those colonies must be argF+ cells so we
know that there are 80 argF+ cells present in our sample.
Now let’s go on and look at the minimal medium plus arginine. And in minimal medium plus
arginine both types of cell can grow. The argF+ cells can grow because they can grow on
anything. And the argF- cells can grow because they can take up arginine from the medium. We
know that there are 360 colonies produced on this type of medium. So how many cells are argF-?
Well, of those 360 colonies we know that 80 were argF+ cells from earlier…and so the number
of argF- cells must be 360 minus 80, which is 280 cells. Now, obviously this is assuming no
statistical fluctuation, uh…we’re just keeping things nice and straightforward at this point. So…
from this information we can determine that there are 80 argF+ cells present and 280 argF- cells
present.
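Working backwards from colony counts is a single subtraction, sketched here in Python. The function name `split_single_gene` is illustrative, and, as in the example, the sketch assumes no statistical fluctuation between plates.

```python
def split_single_gene(colonies_minimal, colonies_supplemented):
    """Return (prototroph cells, auxotroph cells) from two plate counts.

    Only prototrophs grow on minimal medium; both types grow once the
    required nutrient is added, so the difference is the auxotrophs.
    """
    if colonies_supplemented < colonies_minimal:
        raise ValueError("supplemented plate cannot have fewer colonies than minimal")
    prototrophs = colonies_minimal
    auxotrophs = colonies_supplemented - colonies_minimal
    return prototrophs, auxotrophs


arg_plus, arg_minus = split_single_gene(80, 360)
print(arg_plus, arg_minus)
```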
So far we’ve looked only at examples which involve one gene…the argF gene in the previous
example. We’re going to look at a more complex example where we have two genes involved;
the leuA gene and the proA gene. And we’re going to have four genotypes:
• leuA+ proA+
• leuA+ proA-
• leuA- proA+
• leuA- proA-
In the example we’ve got a number of each of the four types and we’re going to try to work out
how many cells will grow on minimal media, how many cells will grow on minimal plus proline,
how many cells will grow on minimal plus leucine, and how many will grow on complete
medium. And again, if you want to try and do this on your own you can pause the podcast at this
point. I’m going to go ahead and work it out for you.
We’ll start off by looking at the leuA+ proA+ cells. Those cells can grow on any of the four
media. They can make their own leucine. They can make their own proline. It doesn’t matter
whether it’s present in the medium or not. Those will grow on all types of medium. So that
means we can put 15 colonies down for each of those types of medium for the leuA+ proA+ cells.
If we work our way down to the next one leuA+ proA- those need a source of proline. They have
to be able to get proline from the medium because they cannot make it themselves; they are
proline auxotrophs. So those won’t be able to grow on minimal medium. They will be able to
grow on minimal medium supplemented with proline. They won’t be able to grow on minimal
medium supplemented with leucine because there’s no proline there and they need proline in the
medium. They will be able to grow on complete medium because it’s got proline in it; it’s
complete. So that means that those 37 cells will grow on minimal plus proline and on complete
medium.
If we look at the next one down leuA- proA+ those will be able to grow on any medium which has
leucine in it. And that’s minimal plus leucine and complete. And so those 62 colonies will appear
on those two media like that.
Lastly we have the cells which are leuA- proA- . These can’t make their own leucine. These can’t
make their own proline. They have to be able to get leucine and proline from the medium.
There’s only one medium which has both leucine and proline. So there’s only one medium that
those cells will be able to grow on and that’s the complete medium. So those 42 cells won’t be
able to grow on minimal. They won’t be able to grow on minimal plus proline. They won’t be
able to grow on minimal plus leucine. They will be able to grow on complete medium. So we put
those in that last row there.
And…if we add them up, we find that those are the numbers of colonies that we would expect on
each of the four types of medium.
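The column totals for this example can be reproduced with a short Python sketch using the cell numbers quoted above (15, 37, 62 and 42). The names are illustrative, and complete medium is modelled here as minimal medium plus both leucine and proline, which is enough for these four genotypes.

```python
# Each genotype: (nutrients the cells cannot make, number of cells in the sample).
cells = {
    "leuA+ proA+": (set(), 15),
    "leuA+ proA-": ({"proline"}, 37),
    "leuA- proA+": ({"leucine"}, 62),
    "leuA- proA-": ({"leucine", "proline"}, 42),
}

# Each medium: the supplements it provides (an assumption of this sketch:
# complete medium only needs to supply leucine and proline here).
media = {
    "minimal": set(),
    "minimal + proline": {"proline"},
    "minimal + leucine": {"leucine"},
    "complete": {"leucine", "proline"},
}


def expected_colonies(cells, media):
    """Total colonies per medium: sum the cells whose needs the medium meets."""
    return {
        medium: sum(n for needs, n in cells.values() if needs <= supplied)
        for medium, supplied in media.items()
    }


for medium, total in expected_colonies(cells, media).items():
    print(medium, total)
```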
The last example I want to look at is like the previous one, but backwards. This time
the question tells you how many colonies we get on minimal, how many colonies on
minimal plus proline, how many colonies on minimal plus leucine, and how many colonies we
get on complete medium. And we are asked how many leuA+ proA+ cells there were. How many
leuA+ proA-? leuA- proA+? leuA- proA-? That’s what we’re trying to figure out. If you’d like to
try this out on your own you can pause the podcast at this point. I’m going to go ahead and show
you how to work it.
We’ll start off by looking at the minimal medium. The minimal medium has 10 colonies. The
only genotype which can grow on minimal medium is the genotype which can make its own
leucine and proline, which is leuA+ proA+. So that means that those 10 colonies represent
leuA+ proA+ cells. So there are 10 leuA+ proA+ cells present.
Now we’ll look at the minimal medium plus proline. Minimal medium plus proline…that will
allow leuA+ proA+ cells to grow because they can grow on anything, but it will also allow leuA+
proA- cells to grow. Because while those cells can’t make their own proline, they can take out
proline from the medium. So there are 25 colonies produced that are those two genotypes. The
question is how many cells are leuA+ proA-? Well, we take the 25 colonies…we subtract the 10 cells
which we know are leuA+ proA+ from earlier and that means that there are 15 cells which are
leuA+ proA-.
Next one down are the cells which grow on minimal plus leucine. These will be the leuA+ proA+
which can grow on anything, but this time we’ll also get the leuA- proA+ cells growing. These can
grow because they can take out leucine from the medium. There are 50 colonies of this type and we
know that 10 of those 50 are leuA+ proA+ from earlier, which means the number of cells
which are leuA- proA+ is 50 minus 10…that’s 40.
Lastly, we need to look at the cells which grow on complete medium. And all genotypes can
grow on complete medium because complete medium has both leucine and proline. There are
150 colonies of this type but we need to know how many are leuA- proA-. And the way we do that is
we simply subtract all of the other cells from the 150. So there are 10 cells which are
leuA+ proA+, 15 which are leuA+ proA-, 40 which are leuA- proA+…that’s a total of 65. So the
number of cells which are leuA- proA- is 150, the number we see on the plate…minus 65, the
ones which we know are other genotypes. That leaves 85 cells which are leuA- proA-. And that’s
how you do that type of question.
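The chain of subtractions in this last example can be written out as a small Python function. The name `genotype_counts` is illustrative, and the sketch assumes exact counts with no plating variation.

```python
def genotype_counts(minimal, plus_proline, plus_leucine, complete):
    """Recover the four genotype counts from the four plate counts."""
    leu_plus_pro_plus = minimal                    # only full prototrophs grow on minimal
    leu_plus_pro_minus = plus_proline - minimal    # extra colonies once proline is added
    leu_minus_pro_plus = plus_leucine - minimal    # extra colonies once leucine is added
    known = leu_plus_pro_plus + leu_plus_pro_minus + leu_minus_pro_plus
    leu_minus_pro_minus = complete - known         # whatever is left needs both
    return {
        "leuA+ proA+": leu_plus_pro_plus,
        "leuA+ proA-": leu_plus_pro_minus,
        "leuA- proA+": leu_minus_pro_plus,
        "leuA- proA-": leu_minus_pro_minus,
    }


print(genotype_counts(minimal=10, plus_proline=25, plus_leucine=50, complete=150))
```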
If you would like more practice doing questions of this type, the interactive problems 1 through 5
are of this type. The questions get progressively more difficult from 1 through 5. And there are
two questions out of those five which randomly generate numbers so you should be able to get
lots of practice. As always, if you have any questions about this or if you run into any problems
please come and get in touch with me.
VGEN Lab 24
This podcast is an example of what you might do for your Bioinformatics presentation. The sequence I am using is sequence 42 from the ones available. I took this sequence and I used it to carry out a BLAST search of the human genome at the NCBI website. I found that the sequence matched the PAX6 gene. PAX6 is short for “paired box 6”. This gene is located on the short arm of chromosome 11 at the position shown. And if you look at its location in more detail, it’s located here in between the ELP4 gene, which is transcribed to the right, and the RCN1 gene, which is also transcribed to the right.
The gene gives rise to two primary transcripts. The first primary transcript starts here… and ends here… [No pointer to indicate location]. The other primary transcript starts here… and ends at the same position [No pointer to indicate location]. And so it gives rise to one long primary transcript and one short primary transcript. The short primary transcript can be spliced in two ways. One is spliced as shown in this line here and the other one is spliced as shown here [No pointer to indicate location]. The difference is this little ‘X’ on here, which is present in the bottom messenger RNA and absent in the middle mRNA. As a result of this you get two different proteins. One protein is produced from the top transcript, and it is exactly the same as the one from the middle transcript. That gives you a protein which has 422 amino acids in it. The third mRNA down here gives rise to a protein which has an extra 14 amino acids located right here, and so it gives a protein of 436 amino acids.
The genetic condition associated with the PAX6 gene is Aniridia. Aniridia is quite a variable condition. The name suggests that the iris of the eye is missing, and in fact there are variable defects in the iris which are observed.
These can range from complete loss of the iris through to very subtle irregularities in the pupil of the eye which can only be detected by careful examination. In addition, individuals with Aniridia have nystagmus (which is jerky movements of the eye). They are prone to cataracts, glaucoma, and abnormalities in the fovea. All of these conditions will tend to reduce the quality of vision, and this can be quite extreme, all the way out to blindness.
In terms of the relationship between PAX6 and the condition of Aniridia, about 2/3 of cases of Aniridia are inherited and about 1/3 arise sporadically from spontaneous mutations. The mutations can be chromosomal abnormalities: deletions and translocations of 11p13, the region where PAX6 is located, have been described. You can also get point mutations, short insertions and deletions, frame shifts, and missense mutations in PAX6. All of those have been reported.
One example from a paper involved a family where the male parent was heterozygous for a mutation which changed the serine codon for amino acid 353 into a nonsense codon. And the female parent was heterozygous for a mutation which changed an arginine codon into a stop codon. The phenotypes associated with these two mutations were different. The male parent had early onset cataracts. The female parent had Aniridia. This came to attention because they had a child who inherited the S353X mutation from the male parent and the R103X mutation from the female parent. And in this case the phenotype was much more severe…Anophthalmia, the eyes were missing. And in fact the child also had microcephaly, an unusually small head.
The way that PAX6 is thought to work…it’s known to encode a transcription factor (a protein that regulates transcription), and this transcription factor is expressed in the developing nervous system in the eye.
And in at least one case, a recent paper reported that if you reduce expression of PAX6 in mice then this leads to abnormal development of the eye, in a way that increases intraocular pressure during development. And this is thought to give rise to the abnormalities in the iris of the eye.
The PAX6 gene encodes two DNA binding domains. One domain here belongs to the PAX family of DNA binding domains. And the other domain, shown here in blue, belongs to the Homeodomain family of DNA binding domains. A paper looking at the two splice variants that you get found that in the short protein (the 422 amino acid protein) it’s the PAX domain that binds the DNA, and the Homeodomain is masked, hidden, and not involved in DNA binding. In contrast, in the long form of the protein (the one that has the extra exon and is 436 amino acids long), the extra 14 amino acids render the PAX domain non-functional, and in that case it’s the Homeodomain that binds the DNA. The significance of this is that it is alternative splicing which determines which genes this transcription factor will bind to. The short form binds to one set of genes and regulates them, and the long form binds to another set of genes and regulates them.
Tips for Tests 1
When a politician is asked a difficult question, often they’ll avoid it and answer a different question. However, you're not a politician. You’re a student, so you don't have that option. That brings us to this tip for test taking: “Answer the question that you're being asked.”
Here's an example of a question…
The promoter regions of genes in the bacterium E. coli contain two sequences that are important for correct initiation of transcription. Where are these sequences located?
This is a typical answer to this question: (TTGACA) or (TATAAT). Unfortunately this answer is wrong! Why is the answer wrong? The answer is wrong because those are sequences, and while they are indeed the sequences present in promoter regions in E.
coli, that's not what the question asked. The question doesn't ask about sequences. The question asks about locations. Where are the sequences located? Not, “What are the sequences?” So let's have a look at the correct answer. Where are the sequences located? They are located at positions -35 and -10. Those are locations. Those are the right answer.
Tips for Tests 2
At the end of the Star Wars film series the Emperor says to Luke Skywalker, "Only now at the end do you understand." It's possible that here he was giving Luke an important hint about test-taking. Always read to the end of the question, otherwise you may not understand the question. That brings us to this tip for test-taking: “Read the whole question.”
Here’s an example of a question where this applies.
Which of the following consists of two strands of amino acids? DNA, Protein, RNA, or none of these?
When I’ve asked this question in the past the most common answer has been DNA. But this answer is definitely not correct. Let’s look at why this answer is incorrect. This answer is incorrect because if you read to the end of the question you see it wants to know what consists of two strands of amino acids. And DNA, while it does have two strands, is made up of nucleotides. So what’s the correct answer? Well, DNA and RNA are both made up of strands of nucleotides, so they are not the answer. And proteins, while they are made up of chains of amino acids, aren’t double stranded. So the correct answer is none of these.
Tips for Tests 3
When testifying in court one has to swear to tell the truth, the whole truth, and nothing but the truth. This last part is a good idea for taking tests as well. The tip for this presentation is “Don’t volunteer extra information.”
Here’s an example question where this applies.
Give the full names of two hydrophobic amino acids.
The type of answer that sometimes is given is this here…three amino acids. This answer is wrong! Now let’s look at why this answer is wrong.
The reason why this answer is wrong is very simple. The third amino acid there, serine, is hydrophilic. And the question asks for two hydrophobic amino acids. Alanine and valine are hydrophobic but serine is hydrophilic, therefore the answer is wrong. The correct answer for this one would be very simple. Stick to giving just two amino acids, alanine and valine. And if you’re not sure which amino acids are hydrophobic, take your best guess with two of them. It’s safer than giving three and running the risk of putting in one that’s not hydrophobic. So that’s the right answer, and that’s that tip.
Tips for Tests 4
When answering test questions [sound of explosion], sometimes it’s a good idea to take a shot in the dark. Or to put it another way, this tip is “When you don’t know the answer, GUESS.”
Here’s a question where this might apply.
Which event in translation requires the factor EF-G? And then you’re given four possible options…
A) Addition of amino-acyl tRNAs to the P site
B) Formation of peptide bonds
C) Movement of peptidyl-tRNAs to the P site
D) Recognition of the Shine-Dalgarno sequence
Sometimes when presented with a question like this a student will leave it blank, or put a question mark next to it, or write “Don’t know”. This is a terrible answer. So let’s look at why this answer is so useless. The reason why this answer is so useless is because that’s not an answer to the question. There is no way that can be correct. Let’s look at a better answer. A better answer is, if you have absolutely no idea what the answer is…take a shot in the dark. Circle something. At least you have a chance of it being the correct answer. Now, in the interests of realism I’ll point out that’s not actually the correct answer (not D), but at least there was a chance this way. So, when in doubt…GUESS! Don’t leave it blank.
Tips for Tests 5
There are a handful of letters that can help you score better on tests and improve your grade. Now they're not actually Scrabble letters, rather they're SI units... nanometers, millimeters, nanomoles, micromoles, units, things like that.
Here's an example question.
A transmembrane α-helix consists of twenty amino acid residues. How long is it?
This is a simple calculation involving the pitch of an alpha helix and the number of amino acids per turn. And the sort of answer I sometimes see is three. That's not an adequate answer. Why is this answer inadequate? The reason why this answer isn't adequate is because the units aren't given. So I don't know whether that's three meters, three feet, or possibly even three cubits. It doesn't say. So what's the complete answer? The complete answer is three nanometers. Give the units. It will get you the point.
Tips for Tests 6
According to William Shakespeare in Romeo and Juliet, "A rose by any other name would smell as sweet." Unfortunately this is not the case for answers to test questions.
In test questions, using the right word is very important. "Kinda similar" isn't going to count.
Here's an example of a question where this applies.
Give the full name of the aromatic base that pairs with adenine in DNA. Give the full name of the aromatic base that pairs with guanine in DNA.
Now these answers are what I sometimes see...thiamine, cysteine. Those answers are wrong. It's not thiamine and it's certainly not cysteine. That answer actually drives me crazy. So I marked it wrong three times here. Why are these answers wrong? Well, thiamine is not an aromatic base. It's a vitamin. And cysteine is definitely not an aromatic base, it's an amino acid. If you look at the correct answer we'll see where the mistake may have arisen. The aromatic base that pairs with adenine is not thiamine... it's thymine. And the aromatic base that pairs with guanine is not cysteine, it's cytosine. Yeah, they both begin with "cy" but they are very different things. Thymine and cytosine are the correct answers.
Tips for Tests 7
So a man walks into a bar and he says, "I found this electron lying around outside, does it belong to anybody in here? Did anybody lose an electron?" And the atom says, "Oh, that's my electron!" And the man says, "Are you sure it’s yours?" And the atom says, "I'm positive!" This reminds us that it's important to keep our oxidation states straight.
Here's the question where that applies.
Which coenzyme accepts electrons from succinate in the citric acid cycle?
Somebody's given the answer FADH2. Unfortunately this answer is not correct. Why is this answer wrong? The reason why this answer is wrong is that FADH2 is the reduced form of the coenzyme flavin adenine dinucleotide. It already has electrons on it so it cannot accept any more. What's the correct answer? The correct answer is to put in FAD, the oxidized form of flavin adenine dinucleotide. It can accept electrons. That's the correct answer.
Tips for Tests 8
[Music playing…Rolling Stones, “You can’t always get what you want…”]
Notice the Rolling Stones song, “You Can’t Always Get What You Want”. It’s possible that this song was written about Biology professors, because sometimes you ask a question and you can’t seem to get the answer that you want. That brings us to this tip, which is… If the question is about “_____________”, the answer should be about “_______________” also.
Here’s an example of a question where this might apply.
How does the secondary structure of fibrous proteins differ from the secondary structure of globular proteins?
When I’ve asked this question in the past sometimes I’ve gotten answers that look like this…
Fibrous proteins are insoluble and perform structural roles. Globular proteins are soluble and are not structural.
These answers are not acceptable. So why is this answer wrong? The reason why this answer is wrong is because the question is about secondary structure. Now secondary structure means things like α-helix, β-pleated sheet, and β-bends. These answers, while they are true statements about fibrous proteins, don’t refer to secondary structures. An answer to do with secondary structures would be expected to include the terms α-helix, β-pleated sheet, and/or β-bends. So here’s a correct answer to this question…
Fibrous proteins have just one type of secondary structure (e.g. all α-helix or all β-pleated sheets). Globular proteins have a mixture of secondary structures (α-helix, β-pleated sheet, and β-bends).
Tips for Tests 9
In this tip I’m going to urge you to “know the count”…but not Dracula. Know how many answers are needed for a question. Let’s look at some examples. First example…
Which one of the following enzymes uses TPP (thiamine pyrophosphate) as a coenzyme?
• Glycogen phosphorylase
• Pyruvate dehydrogenase
• Pyruvate kinase
• Succinate dehydrogenase
For this there is only one answer. The reason why I know there’s only one answer is because it says which one of the following enzymes uses TPP. Contrast that with this second question…
Which of the following enzymes uses NAD+ as a coenzyme?
For this there could be one, two, three, or four correct answers. And the reason why I know that is because the word “one” is missing…which of the following enzymes. An additional twist on this would be if the question was phrased…
Which, if any, of the following enzymes uses NAD+ as a coenzyme?
If the question were phrased that way, you could have zero to four correct answers. Let’s look at another situation…
Which enzyme is regulated by reversible covalent modification?
• β-Galactosidase
• Glycogen phosphorylase
• Hexokinase
• Trypsin
There’s only one answer to this, and the reason why I know this is because it says which enzyme…singular…is regulated by reversible covalent modification. Contrast that with this question…
Which enzyme or enzymes is/are regulated allosterically?
• β-Galactosidase
• Glycogen phosphorylase
• Hexokinase
• Trypsin
In this case there are one to four correct answers because the question says enzyme or enzymes, indicating that it could be one or more. An additional twist on this question would be if it said…
Which enzyme or enzymes, if any, is/are regulated allosterically?
A little clumsy construction there, but were it phrased that way there would be zero to four correct answers possible. The last example is one that sometimes causes the most confusion…
During catalysis by chymotrypsin, which of the following residues in the active site makes a nucleophilic attack?
• Aspartate or Cysteine or Histidine or Serine or Tyrosine
In this case there is only one answer possible, and the reason that you know that is because the options are separated by the word “or”. That means aspartate or cysteine or histidine or serine or tyrosine: not more than one of those, just one. Contrast that with this other question…
During catalysis by chymotrypsin, which of the following residues in the active site carry out acid-base catalysis?
• Aspartate
• Cysteine
• Histidine
• Serine
• Tyrosine
And this time we’ve got the options listed without “or” in between, which means it can be more than one of those. And that means that you’re going to have one to five possible answers to be circled there. And the usual twist, we modify it to say…
…Which, if any, of the following residues in the active site carry out acid-base catalysis?
And that would mean that zero to five answers could be correct. So by reading the question, hopefully you can tell how many answers you need to circle, and that can make your life easier.
Tips for Tests 10
Sometimes on tests it is very important to know where to “draw the line”. This is particularly important when you’re drawing chemical structures. For example, suppose you had a question which asked you to draw the amino acid alanine. Sometimes I get answers that look like this… While that might get partial credit, it’s not going to get complete credit. So why is this answer wrong? The reason why this answer is wrong is because of the bond drawn there… You have a bond going from the alpha carbon to one of the hydrogens in the methyl group. If we drew out that rather peculiar structure, we’d have a three-valent carbon and a two-valent hydrogen as shown. So what do you need to do? Here’s the correct answer. This time, it’s the carbon of the methyl group which is connected to the alpha carbon. That’s the correct structure. That gets the points.
Tips for Tests 11
I say “tomahto”…and you say a glossy red pulpy edible fruit. The point here is that names are not descriptions and vice versa, but we’ll focus on the former.
Here’s the question…
Briefly describe the role in metabolism of the molecule pictured.
An answer I might get to that would be flavin adenine dinucleotide. That answer is definitely not going to get the point. What’s wrong with this answer? What’s wrong with this answer is that that’s the name of the molecule shown. It is indeed flavin adenine dinucleotide… that’s the name.
But the question asks you to describe the role. So the correct answer is a description of the role in metabolism. Something like the answer shown there…
Tips for Tests 12
When people want to point out how different two things are, sometimes they say they’re “as different as chalk and cheese”. In this tip we’re going to point out how important it is not to confuse things that are very different, even though they start with the same first two letters.
Here’s a question where this is relevant…
What type of sequence is often involved in termination of transcription in prokaryotic cells?
A typical answer I might get to that would be UAA, UAG, UGA. But there are actually two things that are wrong with that. So what are they? Well, the first thing is that those are sequences, whereas I actually asked for a type of sequence. So let’s give you a chance to correct that. Maybe you would put in instead…stop codon. That’s still wrong. And the reason why it’s wrong is that stop codons are used in translation and this question is about transcription. Transcription and translation are very different processes. And it’s important not to get them confused even though they begin with the same group of letters. So, the correct answer… What type of sequence is often involved in termination of transcription in prokaryotic cells? …would be a palindrome.