VLSCI
How to add missing residues to a model
Tutorial

Introduction

Often a published pdb file will be missing some residues, for example in a flexible loop region. To increase the accuracy of a model, any missing residues should be added.

Comparative modeling is structure prediction based on amino acid sequence similarity. The tertiary structure of a protein determines its function, so through evolution the structure is more conserved than the amino acid sequence. Consequently, if the similarity between the sequence of a known structure and another amino acid sequence is sufficiently high (more than 40%), it should be possible to get a fairly good idea of the protein backbone structure.

This tutorial demonstrates how to add missing residues by comparative modeling with the program Schrodinger, using the amino acid sequence of human serum albumin as the starting point. The albumin protein is well studied and its structure has been determined. In cases involving a protein whose structure hasn’t been solved, ab initio modeling should be used.

For more information on Schrodinger see
    http://www.schrodinger.com/products/14/12/
For more information on structure determination see
    http://www.rcsb.org/pdb/101/static101.do?p=education_discussion/Looking-at-Structures/methods.html

Last updated 27 August 2012

1. Find the amino acid sequence.

Begin by finding the correct amino acid sequence of the human albumin protein. Go to the NCBI website http://www.ncbi.nlm.nih.gov/ and type human serum albumin into the search box. A list of databases appears. Click on ‘Protein: sequence database’. Open the first entry and check that it is the human albumin protein:

    albumin [Homo sapiens]
    609 aa protein
    Accession: AAA98797.1 GI: 178344

2. Perform a BLAST search

We are interested in the sequence. Open the FASTA tab at the top of the page.
Save this sequence in a text file (referred to in this document as complete_sequence.txt).

Next, search the Protein Data Bank to see whether it contains any similar sequences that can be used as structural templates for complete_sequence.txt. In another window, open the NCBI BlastP website

    http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins

Copy the FASTA sequence into BlastP. Under ‘Choose Search Set’, select ‘Protein Data Bank proteins (pdb)’ as the database. This ensures that the server searches for similar sequences among the pdb entries. Click ‘BLAST’ to submit the request.

Look at the results. The entry 1AO6_A seems to match the query 100%, yet the pdb file does not include all of the residues in the query: some residues are missing in the 3D model.

3. Inspect the pdb file

Open the 1AO6 entry in the Protein Data Bank and display the pdb file by clicking on ‘Display Files’ on the right side of the page. Scroll down to the ‘REMARK 465 MISSING RESIDUES’ section. Indeed, seven amino acids are not included in the pdb file.

Because of its high sequence similarity to complete_sequence.txt, 1AO6.pdb seems like a very good template. Download the 1AO6.pdb file.

Notice that the first 24 residues of the query are not included in 1AO6.pdb either, even though only seven residues are listed as missing in the pdb file. Go back to the GenBank human albumin entry and scroll down to the ‘FEATURES’ section. The ‘mat_peptide’ feature points to the mature protein, which starts at residue 25. There is therefore no reason to add the first 24 amino acids, as we are only interested in the mature protein.

4. Save only one chain

Open 1AO6.pdb in VMD:

    vmd 1AO6.pdb

The albumin protein is a single peptide chain, yet this pdb entry contains two chains. As only one chain is needed, save chain A: highlight the structure file in the VMD Main window, then select File > Save Coordinates, and type chain A as the selected atoms.
Save the file as 1AO6_A.pdb. Delete 1AO6.pdb, load 1AO6_A.pdb into VMD and confirm the sequence of the model by opening Extensions > Analysis > Sequence Viewer.

The model starts with

    5 S
    6 E
    7 V

and ends with

    580 Q
    581 A
    582 A

Thus we need to add the following residues to our model:

    1 D
    2 A
    3 H
    4 K

    583 L
    584 G
    585 L

Open complete_sequence.txt and modify the sequence so that it corresponds to the scheme above, i.e. it starts with DAHK (residues 1 to 4) and ends with LGL (residues 583 to 585).

5. Obtain VLSCI license to use Schrodinger

Now that we have the amino acid sequence and an appropriate template structure, we can commence the homology modeling. Various online servers and programs allow you to do comparative modeling. Here, we use the program Schrodinger to model the missing residues from complete_sequence.txt, using 1AO6_A.pdb as the template.

Before you can use the software, you must agree to the Schrodinger license. Go to the VLSCI website http://www.vlsci.org.au/page/home and log into your VLSCI account under ‘VLSCI ACCOUNT MANAGEMENT’ in the upper right corner. Click on ‘Restricted Software’ and then ‘Add software’. A list of software will appear on the screen. Select Schrodinger and agree to the license. You should now be permitted to use Schrodinger.

6. Log onto a VLSCI cluster and load Maestro

Upload 1AO6_A.pdb and complete_sequence.txt to your Merri (or Bruce) account using

    sftp [email protected]

followed by

    put 1AO6_A.pdb
    put complete_sequence.txt

Exit and log onto Merri by typing

    ssh [email protected] -Y

The -Y option enables X11 forwarding, so that Schrodinger runs on Merri yet can be viewed on your local screen.

Schrodinger must be loaded before you can use it. Load Schrodinger using the command

    module load schrodinger

From the Schrodinger package, open Maestro by typing

    maestro

7. Create a new project

In Maestro, begin by creating a project folder for this project. Click on New under the Project menu, and give your project a name.
Now you can save all the created files in your Maestro project directory.

8. Perform homology modeling

Begin the modeling job by opening Applications > Prime > Structure Prediction. A new window pops up. Click on the ‘From File’ button and select complete_sequence.txt. Click ‘Next’.

Now click ‘Import’ and select 1AO6_A.pdb. Click on ‘Imported Homolog’ and a colorful alignment should appear. Check that all the missing residues are present in the query sequence of the alignment.

Select ‘Comparative Modeling’. This method predicts a 3D structure based on a template with a similar amino acid sequence. Then click on ‘Run SSP’ (secondary structure prediction). The green clock-like icon under the alignment indicates that the job is running.

When the job is done, click ‘Next’ and choose ‘Build’ to construct the model. This step will take several minutes to complete.

9. Save the new model

When the build has finished, add the model to the project table. In the project table window, select Table > Export > Structures. Save the new model as a pdb file.

Check this new model in VMD to ensure that the missing residues have been added.
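
The final check in step 9 can also be scripted. The following is a minimal Python sketch and is not part of the original workflow. It reads the one-letter sequence of one chain from the CA ATOM records of a pdb file and reports which parts of a reference sequence precede and follow it. The function names chain_sequence and flanking_residues and the file name new_model.pdb are hypothetical. The sketch assumes standard fixed-column ATOM records and the 20 standard amino acids, and the chain ID of the exported model may differ from A.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequence(pdb_text, chain_id="A"):
    """Return the one-letter sequence of one chain, read from its CA atoms."""
    sequence = []
    seen = set()
    for line in pdb_text.splitlines():
        # Fixed PDB columns: atom name 13-16, residue name 18-20,
        # chain ID 22, residue number plus insertion code 23-27.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        residue_key = line[22:27]
        if residue_key in seen:  # skip repeated records of the same residue
            continue
        seen.add(residue_key)
        # Residues outside the standard 20 are reported as X.
        sequence.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(sequence)


def flanking_residues(model_seq, full_seq):
    """Return (before, after): the parts of full_seq that precede and follow
    model_seq, or None if model_seq does not occur in full_seq."""
    start = full_seq.find(model_seq)
    if start < 0:
        return None
    return full_seq[:start], full_seq[start + len(model_seq):]


# Toy illustration with a shortened sequence (not the real albumin sequence).
print(flanking_residues("SEV", "DAHKSEVLGL"))

# Example usage with files (the file names are placeholders):
# with open("new_model.pdb") as handle:
#     model = chain_sequence(handle.read(), chain_id="A")
# with open("complete_sequence.txt") as handle:
#     full = "".join(l.strip() for l in handle if not l.startswith(">"))
# print(flanking_residues(model, full))
```

Applied to 1AO6_A.pdb and the edited complete_sequence.txt, this should report DAHK and LGL as the missing flanks, in line with step 4. For the completed model both parts should be empty.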