Codon optimization of Col H gene encoding Clostridium histolyticum collagenase to express in Escherichia coli

Hamzeh Alipour, Abbasali Raz, Navid Dinparast Djadid, Abbas Rami, Seyed Mohammad Amin Mahdian

A given amino acid sequence can be encoded by a very large number of different nucleic acid sequences. These sequences, however, are not equally useful: the choice of sequence can significantly affect the expression of the encoded protein. Given the importance of the protein-coding sequence and the promising industrial and medicinal applications of Clostridium histolyticum collagenase, this study examined codon optimization of the Col H gene in order to enhance collagenase expression in Escherichia coli (E. coli). The coding region of the mature Col H gene was optimized according to the codon usage of E. coli using Gene Designer software (DNA 2.0). The results showed that the relative frequency of codon usage in the Col H gene was adapted to the most preferred triplets in E. coli, so that its bias towards E. coli codon usage was enhanced after codon optimization. Accordingly, a higher level of collagenase expression is expected, most likely as a result of substituting rare codons with optimal codons. As has been reported elsewhere, the findings from this study suggest that codon optimization provides a theoretical improvement in Col H gene expression in E. coli. Experimental research is nevertheless needed to confirm the improvement.

PeerJ PrePrints | http://dx.doi.org/10.7287/peerj.preprints.754v1 | CC-BY 4.0 Open Access | rec: 21 Dec 2014, publ: 21 Dec 2014

Alipour Hamzeh1,2, Raz Abbasali1, Dinparast Djadid Navid1, Rami Abbas1 and Mahdian Seyed Mohammad Amin1,*

1 Malaria and Vector Research Group [MVRG], Biotechnology Research Center [BRC], Pasteur Institute of Iran, Tehran, Iran
2 Department of Medical Entomology and Vector Control, Research Centre for Health Sciences, School of Health, Shiraz University of Medical Sciences, Shiraz, Iran
* To whom correspondence should be addressed: Seyed Mohammad Amin Mahdian, Malaria and Vector Research Group (MVRG), Biotechnology Research Center (BRC), Pasteur Institute of Iran, Tehran, Iran. Tel: 02636100955 Fax: 02636100955 E-mail: [email protected]

Introduction

Collagens, which are produced by a large number of cell types, are an integral component of animal tissues such as skin, tendons and cartilage, as well as the organic constituent of bones, teeth and cornea. Collagen accounts for about 25 to 33% of the total protein in mammalian organs. It is present in the connective tissues of nearly all organs as insoluble fibers, is embedded in the mucopolysaccharides and proteins of the extracellular matrix, and plays a vital role in tissue strength (Harrington 1996). A change in its production or degradation has been shown to result in a variety of diseases. In such cases, proteolytic enzymes offer a clinically useful route to treating collagen-centered disorders. Enzymes used as drugs have two advantages that distinguish them from other types of drugs. First, enzymes frequently bind and act on their substrates with high affinity. Second, enzymes are catalytic and transform target molecules into specific products. Thus, enzymes can be more specific and potent drugs than small molecules and can accomplish therapeutic biochemistry in the body. These features have given rise to the development of many enzymatic drugs for a wide variety of disorders (Vellard 2003).

The tight triple-helical structure of collagen makes it resistant to most proteases; collagenases, however, can specifically degrade collagen.
Bacterial collagenases, in addition, have been shown to display broader substrate specificity than vertebrate collagenases. Collagenases derived from Clostridium histolyticum, namely Col G and Col H, are a case in point, since they readily digest collagens regardless of size and type (Ohbayashi, Yamagata et al. 2012; Preet Kaur and Azmi 2013).

Recently, Clostridium collagenases have attracted considerable interest as a non-invasive therapeutic option. These collagenases have been examined for the treatment of Dupuytren's disease (Badalamente, Hurst et al. 2002), Peyronie's disease (Hellstrom and Bivalacqua 2000), herniated lumbar disk (Sussman, Bromley et al. 1981), retained placenta (Fecteau, Haffner et al. 1998), adhesive capsulitis (Badalamente and Wang 2012), wound healing (Yavuzer, Latifoğlu et al. 1997), the debridement of burns (Klasen 2000) and in the preparation of pancreatic islet cells for transplantation (Vrabelova, Adin et al. 2014).

At present, culturing C. histolyticum and then purifying the enzyme from all of the bacterial proteins produced is a widely used method of producing collagenases for clinical applications (Bertuzzi, Cuttitta et al. 2014). Nevertheless, isolating and extracting the enzyme from natural sources faces technical problems and raises production costs because of low expression levels and the intracellular localization of the enzyme. Commercial success requires lower production costs, which in turn depend on the expression level of the enzyme and on purification costs. As such, there has been growing interest in producing enzymes by recombinant methods (Katrolia, Yan et al. 2011). Prokaryotic expression systems, especially E. coli, are among the most common systems for the industrial production of proteins with therapeutic or commercial applications. E. coli has advantages including growth on inexpensive media, rapid biomass accumulation, convenient genetic manipulation, and simple scale-up (Baneyx and Mujacic 2004). Nonetheless, the production of heterologous protein in this organism may be decreased by codon bias, in which the gene of interest contains codons that are rarely used in E. coli (Burgess-Brown, Sharma et al. 2008). It has been shown that the presence of rare codons decreases translation speed and induces translational errors (Plotkin and Kudla 2010; Elena, Ravasi et al. 2014). The rarest codons in E. coli are listed in Table 1. In addition to rare codons, GC content can affect expression levels. GC-rich mRNAs can form strong secondary structures and, especially in bacteria, such a structure near the ribosome-binding site obstructs translation initiation. GC-poor mRNAs, on the other hand, cannot fold strongly and frequently carry sequence elements that limit expression. For instance, low GC content has been seen to restrict the expression of Plasmodium falciparum genes in E. coli. Such mRNAs seem to be targets for RNase E, which cleaves AU-rich sequences (Plotkin and Kudla 2010). Codon optimization is a common strategy for improving translation efficiency and accuracy (Gingold and Pilpel 2011). It consists of altering rare codons in the target gene so as to adapt them to the codon usage of the specific expression host. A large number of studies have reported an increase in expression levels following codon optimization (Gustafsson, Govindarajan et al. 2004; Zhou, Schnake et al. 2004; Lee, Koh et al. 2010).
Here, we explored codon optimization of the Col H gene for expression in E. coli.

Material and methods:

Codon optimization of Col H

The protein sequence of Col H was taken from the UniProt database, and the 40 amino acids of the putative signal peptide (MKRKCLSKRLMLAITMATIFTVNSTLPIYAAVDKNNATAA) were removed in order to generate the mature enzyme. The coding region of the mature Col H gene was then optimized according to the codon usage of E. coli using Gene Designer software (DNA 2.0). This software uses proprietary algorithms to replace rare codons and to eliminate problematic mRNA structures and repetitive sequences.

Gene Sequence Analyses

Sequence analysis of the native and optimized Col H genes was performed using the online tools Rare Codon Analysis Tool and Sequence Analysis, which are available at www.genscript.com and www.bioinformatics.org, respectively.

Results and Discussion

Codon optimization of Col H

The native gene contained tandem rare codons such as AGA AGG, which have been shown to greatly affect heterologous expression in E. coli. These effects include ribosome pausing and co-translational cleavage of mRNA, ribosomal frame shifting, and amino acid misincorporation (Burgess-Brown, Sharma et al. 2008; Plotkin and Kudla 2010). Codon optimization replaced such codons and adapted the frequency of codon usage to the most preferred triplets in E. coli, so that the gene's bias towards E. coli codon usage was enhanced. Moreover, alignment of the native (GenBank accession number D29981) and codon-optimized genes indicated that codon optimization did not alter the amino acid sequence and that 666 out of 982 codons (67.82%) were substituted (Figure 1).
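The codon-by-codon comparison behind a figure such as "666 out of 982 codons (67.82%)" can be sketched as follows. This is a minimal illustration rather than the procedure used by the alignment software: the function names are hypothetical, and the two sequences are short toy strings, not the Col H sequences.

```python
# Standard genetic code, built from the canonical T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split a coding sequence into a list of three-letter codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence with the standard genetic code."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(seq))


def substitution_summary(native, optimized):
    """Return (codons changed, total codons, percent changed).

    Raises ValueError if the two sequences do not encode the same protein,
    since codon optimization must leave the amino acid sequence unchanged.
    """
    native_codons = split_codons(native)
    optimized_codons = split_codons(optimized)
    if translate(native) != translate(optimized):
        raise ValueError("optimized sequence encodes a different protein")
    changed = sum(1 for a, b in zip(native_codons, optimized_codons) if a != b)
    total = len(native_codons)
    return changed, total, 100.0 * changed / total


# Toy example (hypothetical sequences): both encode Arg-Arg-Leu-Gly-Ile.
toy_native = "AGAAGGTTAGGTATA"
toy_optimized = "CGTCGCCTGGGTATC"
print(substitution_summary(toy_native, toy_optimized))
```

In the toy pair, four of the five codons differ (GGT is already an optimal glycine codon and is left alone). Applying the same arithmetic to the reported counts, 100 × 666 / 982 rounds to 67.82.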
Percentage of non-optimal and optimal codons before and after codon optimization

The native and optimized codon sequences of the Col H enzyme were compared using the Sequence Analysis software. The gene encoding the collagenase contains 2946 bp, encoding a protein of 982 amino acids. Before codon optimization, the rare glycine codons GGA and GGG occurred 38 and 6 times, respectively, whereas the optimal glycine codons GGC and GGT occurred 3 and 26 times, respectively. After codon optimization, the rare codons fell to 0 and the optimal codons GGC and GGT increased to 37 and 36, respectively. Likewise, the rare arginine codons AGA and AGG fell from 23 and 5 to zero after optimization, while the optimal arginine codons CGT and CGC increased from 2 and 1 to 20 and 11, respectively. The rare leucine codons TTA and CTA decreased from 42 and 4 to zero, and the optimal codon CTG rose to 65. The rare serine codons AGT and TCA fell to zero, while the optimal codons TCT, TCC, and AGC increased. The rare threonine codon ACA declined from 23 to 0, and the optimal codon ACC increased from 4 to 40. For isoleucine, the rare codon ATA fell from 41 to 0, and the optimal codons ATC and ATT rose to 35 and 23, respectively. For proline, the rare codon CCC decreased from 1 to 0 and the optimal codon CCG increased from 1 to 33. For the other amino acids, codon usage was likewise shifted towards the more frequent codons by codon optimization.
As an example, the less frequent asparagine codon AAT decreased from 57 to 10, whereas the more frequent codon AAC increased from 12 to 59 (Table 2).

Codon Adaptation Index

The Codon Adaptation Index (CAI) assesses the extent of bias in favor of codons that are used in highly expressed genes (Moriyama 2003). Protein expression levels and CAI are known to be correlated. A CAI above 0.8 is regarded as favorable for expression in the expression host of interest. As shown in Figure 2, the CAI of the original gene was 0.62; codon optimization increased the CAI of the coding sequence to 0.84, which is in the range considered ideal for expression in E. coli.

Frequency of optimal codon

The simplest method for measuring species-specific codon usage bias is the frequency of optimal codons (Fop):

$$F_{op} = \frac{X_{op}}{X_{op} + X_{non}}$$

where Xop and Xnon are the numbers of optimal and non-optimal codons in a gene, respectively. Stop codons and the codons for tryptophan and methionine are excluded from the calculation. Optimal codons for E. coli were originally determined on the basis of tRNA availability and the nature of the codon-anticodon interaction. These codons are thought to be translationally optimal and are used more frequently in highly expressed genes than in lowly expressed genes (Moriyama 2003). Non-optimal codon content is known to limit the expression of heterologous proteins because the cognate tRNAs are in limited supply in the expression host (Angov 2011). As can be seen from Figure 3, 44% of Col H codons had a value of 100 before codon optimization, rising to 64% after optimization. Additionally, the percentage of rare codons (codons with values lower than 30) was 12% before optimization and decreased to 0% afterwards.
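The Fop definition above can be sketched in a few lines. This is an illustration under stated assumptions: the function name is hypothetical, the set of optimal codons is limited to the ones this paper names as optimal for E. coli (it is not a complete optimal-codon table), and the sequences are toy strings chosen so that every scored codon belongs to an amino acid covered by that set.

```python
# Codons named as optimal for E. coli in the text above (illustrative subset,
# not a complete optimal-codon table).
OPTIMAL_CODONS = {
    "GGC", "GGT",         # glycine
    "CGT", "CGC",         # arginine
    "CTG",                # leucine
    "TCT", "TCC", "AGC",  # serine
    "ACC",                # threonine
    "ATC", "ATT",         # isoleucine
    "CCG",                # proline
    "AAC",                # asparagine
}

# Codons excluded from Fop: methionine, tryptophan and the three stop codons.
EXCLUDED_CODONS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def frequency_of_optimal_codons(seq, optimal=OPTIMAL_CODONS):
    """Compute Fop = Xop / (Xop + Xnon) for a coding sequence."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    scored = [c for c in codons if c not in EXCLUDED_CODONS]
    if not scored:
        raise ValueError("no codons left to score after exclusions")
    x_op = sum(1 for c in scored if c in optimal)
    x_non = len(scored) - x_op
    return x_op / (x_op + x_non)


# Toy sequences (hypothetical): Arg-Arg-Leu-Gly-Ile-Met-Trp in both cases.
toy_native = "AGAAGGTTAGGTATAATGTGG"
toy_optimized = "CGTCGCCTGGGTATCATGTGG"
print(frequency_of_optimal_codons(toy_native))
print(frequency_of_optimal_codons(toy_optimized))
```

In the toy native sequence only GGT counts as optimal among the five scored codons, whereas all five scored codons are optimal in the toy optimized sequence; ATG and TGG are skipped in both. Any codon that is neither excluded nor in the supplied set is counted as non-optimal, so a real calculation would need a full optimal-codon table for the host.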
GC content adjustment

The original gene encoding Col H had a low GC content, which can limit expression. Codon optimization increased the GC content from 30.99% to 47.46% (Figure 4).

Conclusion

Two types of measure are available for quantifying translation efficiency. The first type evaluates the codon bias of genes, of which CAI is a good example. The second is based on the availability of tRNA for each codon along the gene, of which Fop is an example. An advantage of Fop over CAI is that it reduces the need to identify a set of highly expressed genes as a reference; it only requires that all tRNA genes in the genome be identified and classified according to their anticodons (Gingold and Pilpel 2011). The stand-alone Gene Designer software used here to optimize codons considers both methods. Furthermore, the GC content of the Col H gene was balanced using the software. As such, better results are expected from using this software.

As has been reported elsewhere (e.g. Hu, Shi et al. 1996; Johansson, Bolton-Grob et al. 1999; Zhou, Schnake et al. 2004; Burgess-Brown, Sharma et al. 2008), the findings from this study suggest that codon optimization provides a theoretical improvement in Col H gene expression in E. coli. Experimental research is nevertheless needed to confirm the improvement, because although codon bias greatly affects gene expression, it is not the only contributing factor. The selection of expression vectors and transcriptional promoters is also important (Gustafsson, Govindarajan et al. 2004).
In addition, a sequence motif in the vicinity of the initiation AUG (Gingold and Pilpel 2011) and mRNA stability at the 5′ terminus (Maertens, Spriestersbach et al. 2010) have been shown to play a role in gene expression.

Acknowledgments

We would like to thank Dr. Ramin Rahmany for reviewing and editing the manuscript.

Competing interests

The authors declare that they have no competing interests.

References

Angov, E. (2011). "Codon usage: Nature's roadmap to expression and folding of proteins." Biotechnology journal 6(6): 650-659.

Badalamente, M. A., L. C. Hurst, et al. (2002). "Collagen as a clinical target: nonoperative treatment of Dupuytren's disease." The Journal of hand surgery 27(5): 788-798.

Badalamente, M. A. and E. Wang (2012). Methods for treating adhesive capsulitis, Google Patents.

Baneyx, F. and M. Mujacic (2004). "Recombinant protein folding and misfolding in Escherichia coli." Nature biotechnology 22(11): 1399-1408.

Bertuzzi, F., A. Cuttitta, et al. (2014). Clostridium histolyticum recombinant collagenases and method for the manufacture thereof, Google Patents.

Burgess-Brown, N. A., S. Sharma, et al. (2008). "Codon optimization can improve expression of human genes in Escherichia coli: A multi-gene study." Protein expression and purification 59(1): 94-102.

Elena, C., P. Ravasi, et al. (2014). "Expression of codon optimized genes in microbial systems: current industrial applications and perspectives." Frontiers in microbiology 5.

Fecteau, K., J. Haffner, et al. (1998). "The potential of collagenase as a new therapy for separation of human retained placenta: hydrolytic potency on human, equine and bovine placentae." Placenta 19(5): 379-383.

Gingold, H. and Y. Pilpel (2011). "Determinants of translation efficiency and accuracy." Molecular systems biology 7(1).

Gustafsson, C., S. Govindarajan, et al. (2004). "Codon bias and heterologous protein expression." Trends in biotechnology 22(7): 346-353.

Harrington, D. J. (1996). "Bacterial collagenases and collagen-degrading enzymes and their potential role in human disease." Infection and immunity 64(6): 1885.

Hellstrom, W. J. and T. J. Bivalacqua (2000). "Peyronie's disease: etiology, medical, and surgical therapy." Journal of andrology 21(3): 347-354.

Hu, X., Q. Shi, et al. (1996). "Specific Replacement of Consecutive AGG Codons Results in High-Level Expression of Human Cardiac Troponin T in Escherichia coli." Protein expression and purification 7(3): 289-293.

Johansson, A.-S., R. Bolton-Grob, et al. (1999). "Use of Silent Mutations in cDNA Encoding Human Glutathione Transferase M2-2 for Optimized Expression in Escherichia coli." Protein expression and purification 17(1): 105-112.

Katrolia, P., Q. Yan, et al. (2011). "Molecular cloning and high-level expression of a β-galactosidase gene from Paecilomyces aerugineus in Pichia pastoris." Journal of Molecular Catalysis B: Enzymatic 69(3): 112-119.

Klasen, H. (2000). "A review on the nonoperative removal of necrotic tissue from burn wounds." Burns 26(3): 207-222.

Lee, S. G., H. Y. Koh, et al. (2010). "Expression of recombinant endochitinase from the Antarctic bacterium, Sanguibacter antarcticus KOPRI 21702 in Pichia pastoris by codon optimization." Protein expression and purification 71(1): 108-114.

Maertens, B., A. Spriestersbach, et al. (2010). "Gene optimization mechanisms: A multi-gene study reveals a high success rate of full-length human proteins expressed in Escherichia coli." Protein Science 19(7): 1312-1326.

Moriyama, E. N. (2003). "Codon usage." eLS.

Ohbayashi, N., N. Yamagata, et al. (2012). "Enhancement of the structural stability of full-length clostridial collagenase by calcium ions." Applied and environmental microbiology 78(16): 5839-5844.

Plotkin, J. B. and G. Kudla (2010). "Synonymous but not the same: the causes and consequences of codon bias." Nature Reviews Genetics 12(1): 32-42.

Preet Kaur, S. and W. Azmi (2013). "The Association of Collagenase with Human Diseases and its Therapeutic Potential in Overcoming them." Current Biotechnology 2(1): 10-16.

Sussman, B. J., J. W. Bromley, et al. (1981). "Injection of collagenase in the treatment of herniated lumbar disk: initial clinical report." JAMA 245(7): 730-732.

Vellard, M. (2003). "The enzyme as drug: application of enzymes as pharmaceuticals." Current opinion in biotechnology 14(4): 444-450.

Vrabelova, D., C. Adin, et al. (2014). "Evaluation of a high-yield technique for pancreatic islet isolation from deceased canine donors." Domestic animal endocrinology 47: 119-126.

Yavuzer, R., O. Latifoğlu, et al. (1997). "Enhanced wound healing using collagenase in guinea pig." Gazi Medical Journal 8(3).

Zhou, Z., P. Schnake, et al. (2004). "Enhanced expression of a recombinant malaria candidate vaccine in Escherichia coli by codon optimization." Protein expression and purification 34(1): 87-94.

Table 1. Codons rarely used in E. coli

Amino acid | Rare codon(s)
Arginine | AGG, AGA, CGG, CGA
Glycine | GGA, GGG
Isoleucine | ATA
Leucine | CTA, TTA
Proline | CCC
Serine | TCG, TCA, AGT
Threonine | ACA

Table 2. Results for the codon analysis of Col H gene before and after optimization

Amino acid | Codon | Before: number | Before: /1000 | Before: fraction | After: number | After: /1000 | After: fraction
Ala | GCG | 1 | 1.02 | 0.02 | 14 | 14.26 | 0.33
Ala | GCA | 22 | 22.40 | 0.52 | 13 | 13.24 | 0.31
Ala | GCT | 17 | 17.31 | 0.40 | 8 | 8.15 | 0.19
Ala | GCC | 2 | 2.04 | 0.05 | 7 | 7.13 | 0.17
Arg | AGG | 5 | 5.09 | 0.16 | 0 | 0.00 | 0.00
Arg | AGA | 23 | 23.42 | 0.74 | 0 | 0.00 | 0.00
Arg | CGG | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00
Arg | CGA | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00
Arg | CGT | 2 | 2.04 | 0.06 | 20 | 20.37 | 0.65
Arg | CGC | 1 | 1.02 | 0.03 | 11 | 11.20 | 0.35
Asn | AAT | 57 | 58.04 | 0.83 | 10 | 10.18 | 0.14
Asn | AAC | 12 | 12.22 | 0.17 | 59 | 60.08 | 0.86
Asp | GAT | 59 | 60.08 | 0.83 | 35 | 35.64 | 0.49
Asp | GAC | 12 | 12.22 | 0.17 | 36 | 36.66 | 0.51
Cys | TGT | 3 | 3.05 | 1.00 | 1 | 1.02 | 0.33
Cys | TGC | 0 | 0.00 | 0.00 | 2 | 2.04 | 0.67
Gln | CAG | 3 | 3.05 | 0.13 | 16 | 16.29 | 0.70
Gln | CAA | 20 | 20.37 | 0.87 | 7 | 7.13 | 0.30
Glu | GAG | 10 | 10.18 | 0.14 | 19 | 19.35 | 0.27
Glu | GAA | 60 | 61.10 | 0.86 | 51 | 51.93 | 0.73
Gly | GGG | 6 | 6.11 | 0.08 | 0 | 0.00 | 0.00
Gly | GGA | 38 | 38.70 | 0.52 | 0 | 0.00 | 0.00
Gly | GGT | 26 | 26.48 | 0.36 | 36 | 36.66 | 0.49
Gly | GGC | 3 | 3.05 | 0.04 | 37 | 37.68 | 0.51
His | CAT | 16 | 16.29 | 0.94 | 5 | 5.09 | 0.29
His | CAC | 1 | 1.02 | 0.06 | 12 | 12.22 | 0.71
Ile | ATA | 41 | 41.75 | 0.71 | 0 | 0.00 | 0.00
Ile | ATT | 11 | 11.20 | 0.19 | 23 | 23.42 | 0.40
Ile | ATC | 6 | 6.11 | 0.10 | 35 | 35.64 | 0.60
Leu | TTG | 4 | 4.07 | 0.06 | 0 | 0.00 | 0.00
Leu | TTA | 42 | 42.77 | 0.65 | 0 | 0.00 | 0.00
Leu | CTG | 0 | 0.00 | 0.00 | 65 | 66.19 | 1.00
Leu | CTA | 4 | 4.07 | 0.06 | 0 | 0.00 | 0.00
Leu | CTT | 14 | 14.26 | 0.22 | 0 | 0.00 | 0.00
Leu | CTC | 1 | 1.02 | 0.02 | 0 | 0.00 | 0.00
Lys | AAG | 32 | 32.59 | 0.35 | 25 | 25.46 | 0.27
Lys | AAA | 59 | 60.08 | 0.65 | 66 | 67.21 | 0.73
Met | ATG | 19 | 19.35 | 1.00 | 19 | 19.35 | 1.00
Phe | TTT | 24 | 24.44 | 0.65 | 12 | 12.22 | 0.32
Phe | TTC | 13 | 13.24 | 0.35 | 25 | 25.46 | 0.68
Pro | CCG | 1 | 1.02 | 0.03 | 33 | 33.60 | 0.82
Pro | CCA | 23 | 23.42 | 0.57 | 6 | 6.11 | 0.15
Pro | CCT | 15 | 15.27 | 0.38 | 1 | 1.02 | 0.03
Pro | CCC | 1 | 1.02 | 0.03 | 0 | 0.00 | 0.00
Ser | AGT | 25 | 25.46 | 0.36 | 0 | 0.00 | 0.00
Ser | AGC | 5 | 5.09 | 0.07 | 17 | 17.31 | 0.25
Ser | TCG | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00
Ser | TCA | 16 | 16.29 | 0.23 | 0 | 0.00 | 0.00
Ser | TCT | 21 | 21.38 | 0.30 | 34 | 34.62 | 0.49
Ser | TCC | 2 | 2.04 | 0.03 | 18 | 18.33 | 0.26
Thr | ACG | 1 | 1.02 | 0.02 | 6 | 6.11 | 0.10
Thr | ACA | 23 | 23.42 | 0.37 | 0 | 0.00 | 0.00
Thr | ACT | 34 | 34.62 | 0.55 | 16 | 16.29 | 0.26
Thr | ACC | 4 | 4.07 | 0.06 | 40 | 40.73 | 0.65
Trp | TGG | 9 | 9.16 | 1.00 | 9 | 9.16 | 1.00
Tyr | TAT | 64 | 65.17 | 0.83 | 23 | 23.42 | 0.30
Tyr | TAC | 13 | 13.24 | 0.17 | 54 | 54.99 | 0.70
Val | GTG | 2 | 2.04 | 0.04 | 19 | 19.35 | 0.34
Val | GTA | 31 | 31.57 | 0.55 | 11 | 11.20 | 0.20
Val | GTT | 23 | 23.42 | 0.41 | 19 | 19.35 | 0.34
Val | GTC | 0 | 0.00 | 0.00 | 7 | 7.13 | 0.13

Figure 1. The sequence alignments of original and codon-optimized Col H. 666 out of 982 codons (67.82%) were substituted.

Figure 2. The distribution of codon usage frequency along the coding region of Col H. (A) and (B) depict the coding region of Col H before and after codon optimization, respectively.

Figure 3. The percentage distribution of codons on the basis of their quality. A value of 100 is assigned to the codon with the highest usage frequency for a given amino acid in the expression organism of interest. Values below 30 are assigned to codons that hinder expression efficiency and accuracy. (A) and (B) illustrate the percentage of Col H codons before and after codon optimization, respectively.

Figure 4. GC content adjustment. The optimum range for GC content is between 30% and 70%. Any peaks outside this range will adversely affect transcriptional and translational efficiency.
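The 30% to 70% GC range described for Figure 4 can be checked along a sequence with a sliding window, as in the sketch below. The window length of 60 nt, the step size, and the function names are assumptions made for illustration; the paper does not state which window the analysis tool uses.

```python
def gc_percent(seq):
    """Return the GC content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=60, step=1):
    """Return (start index, GC percent) for each full window along seq."""
    return [
        (start, gc_percent(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def out_of_range(seq, window=60, step=1, low=30.0, high=70.0):
    """Return the windows whose GC content falls outside [low, high]."""
    return [
        (start, gc)
        for start, gc in gc_windows(seq, window, step)
        if gc < low or gc > high
    ]


# Toy sequences (hypothetical): one balanced, one with AT-rich and GC-rich halves.
balanced = "ATGC" * 10
skewed = "AT" * 10 + "GC" * 10
print(out_of_range(balanced, window=20))
print(out_of_range(skewed, window=20, step=20))
```

The balanced toy sequence yields no flagged windows, while the skewed one is flagged in both halves (0% and 100% GC). Whole-gene values such as the 30.99% and 47.46% reported above correspond to calling gc_percent on the full coding sequence.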