Codon optimization of Col H gene encoding Clostridium
histolyticum collagenase to express in Escherichia coli
Hamzeh Alipour, Abbasali Raz, Navid Dinparast Djadid, Abbas Rami, Seyed Mohammad Amin Mahdian
A given amino acid sequence can be encoded by a huge number of different nucleic acid sequences. These sequences, however, are not equally useful: the choice of sequence can significantly affect the expression of the encoded protein. Given the importance of the protein-coding sequence and the promising industrial and medicinal applications of Clostridium histolyticum collagenase, this study examined codon optimization of the Col H gene with the aim of enhancing collagenase expression in Escherichia coli (E. coli). The coding region of the mature Col H gene was optimized according to the codon usage of E. coli using Gene Designer software (DNA 2.0). The results revealed that the relative frequency of codon usage in the Col H gene was adapted to the most preferred triplets in E. coli, so that its bias toward E. coli codon usage was enhanced after codon optimization. Substituting rare codons with optimal codons is therefore likely to result in a higher level of collagenase expression. In line with previous reports, the findings of this study suggest that codon optimization provides a theoretical improvement in Col H gene expression in E. coli. Nevertheless, experimental research is needed to confirm the improvement.
PeerJ PrePrints | http://dx.doi.org/10.7287/peerj.preprints.754v1 | CC-BY 4.0 Open Access | rec: 21 Dec 2014, publ: 21 Dec 2014
Alipour Hamzeh1, 2, Raz Abbasali1, Dinparast Djadid Navid1, Rami Abbas1 and Mahdian Seyed Mohammad Amin1,*
1 Malaria and Vector Research Group [MVRG], Biotechnology Research Center [BRC], Pasteur Institute of Iran, Tehran, Iran
2 Department of Medical Entomology and Vector Control, Research Centre for Health Sciences, School of Health, Shiraz University of Medical Sciences, Shiraz, Iran
* To whom correspondence should be addressed: Seyed Mohammad Amin Mahdian, Malaria and Vector Research Group (MVRG), Biotechnology Research Center (BRC), Pasteur Institute of Iran, Tehran, Iran. Tel: 02636100955 Fax: 02636100955 E-mail: [email protected]
Introduction
Collagens, which are produced by a large number of cell types, are an integral component of animal tissues such as skin, tendons and cartilage, as well as the organic constituent of bones, teeth and cornea. Collagen accounts for about 25 to 33% of the total protein in mammalian organs. It exists as insoluble fibers in the connective tissues of nearly all organs, is embedded in the mucopolysaccharides and proteins of the extracellular matrix, and plays a vital role in tissue strength (Harrington 1996). A change in its production or degradation has been shown to result in a variety of diseases. In such cases, proteolytic enzymes provide a useful clinical approach to the treatment of collagen-centered disorders. The use of enzymes as drugs has two advantages that differentiate them from other types of drugs. First, enzymes frequently bind and act on their substrates with high affinity. Second, enzymes are catalytic and transform target molecules into specific products. Thus, enzymes can be more specific and potent drugs than small molecules and can accomplish therapeutic biochemistry in the body. These features have given rise to the development of many enzymatic drugs for a large variety of disorders (Vellard 2003).
The tight triple-helical structure of collagen makes it resistant to most proteases; collagenases, however, can specifically degrade it. Bacterial collagenases, in addition, have been shown to display broader substrate specificity than vertebrate collagenases. The collagenases derived from Clostridium histolyticum, namely Col G and Col H, are a case in point, since they readily digest collagens regardless of their size and type (Ohbayashi, Yamagata et al. 2012; Preet Kaur and Azmi 2013).
Recently, the use of Clostridium collagenases as a non-invasive therapeutic procedure has attracted a great deal of research interest. These collagenases have been examined for the treatment of Dupuytren's disease (Badalamente, Hurst et al. 2002), Peyronie's disease (Hellstrom and Bivalacqua 2000), herniated lumbar disk (Sussman, Bromley et al. 1981), retained placenta (Fecteau, Haffner et al. 1998) and adhesive capsulitis (Badalamente and Wang 2012), as well as for wound healing (Yavuzer, Latifoğlu et al. 1997), the debridement of burns (Klasen 2000) and the preparation of pancreatic islet cells for transplantation (Vrabelova, Adin et al. 2014).
Currently, culturing C. histolyticum and subsequently purifying the enzyme from all of the bacterial proteins produced is a widely used method of producing collagenases for clinical applications (Bertuzzi, Cuttitta et al. 2014). Nevertheless, isolating and extracting the enzyme from natural sources faces technical problems and increases production costs because of low expression levels and the intracellular localization of the enzyme. For commercial success, the cost of production must be reduced, which in turn depends on the expression level of the enzyme and the purification costs. As such, there has been growing interest in producing enzymes by recombinant methods (Katrolia, Yan et al. 2011). Prokaryotic expression systems, especially E. coli, are among the most common systems for the industrial production of proteins with therapeutic or commercial applications. E. coli has advantages including growth on inexpensive media, rapid biomass accumulation, convenient genetic manipulation and simple scale-up (Baneyx and Mujacic 2004). Nonetheless, the production of a heterologous protein in this organism may be decreased by codon bias, in which the gene of interest contains codons that are rarely used in E. coli (Burgess-Brown, Sharma et al. 2008). The presence of rare codons has been shown to decrease translation speed and induce translational errors (Plotkin and Kudla 2010; Elena, Ravasi et al. 2014). The rarest codons in E. coli are listed in Table 1. In addition to rare codons, GC content can affect expression levels. GC-rich mRNAs can form strong secondary structures and, especially in bacteria, such a structure near the ribosome-binding site obstructs translation initiation. On the other hand, GC-poor mRNAs cannot fold strongly and frequently carry sequence elements that limit expression. For instance, low GC content has been seen to restrict the expression of Plasmodium falciparum genes in E. coli. Such mRNAs seem to be targets for RNase E, which cleaves AU-rich sequences (Plotkin and Kudla 2010). Codon optimization is a common strategy for improving translation efficiency and accuracy (Gingold and Pilpel 2011). It consists of altering rare codons in the target gene so as to adapt them to the codon usage of a specific expression host. A large number of studies have reported an increase in expression levels following codon optimization (Gustafsson, Govindarajan et al. 2004; Zhou, Schnake et al. 2004; Lee, Koh et al. 2010). Here, we explored codon optimization of the Col H gene for expression in E. coli.
Materials and methods
Codon optimization of Col H
The protein sequence of Col H was taken from the UniProt database, and the 40 amino acids of the putative signal peptide (MKRKCLSKRLMLAITMATIFTVNSTLPIYAAVDKNNATAA) were removed in order to generate the mature enzyme. The coding region of the mature Col H gene was then optimized according to the codon usage of E. coli using Gene Designer software (DNA 2.0). This software uses proprietary algorithms to replace rare codons and to eliminate problematic mRNA structures and repetitive sequences.
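The proprietary algorithms of Gene Designer are not described here. Purely as an illustration, the simplest form of codon optimization, replacing each rare codon with one preferred synonymous codon, might be sketched in Python as follows. The mapping is an assumption assembled from the rare codons in Table 1 and the optimal codons named in the Results, the function name `replace_rare_codons` is hypothetical, and the sketch ignores mRNA structure and repetitive sequences.

```python
# Rare E. coli codons (Table 1) mapped to one preferred synonymous codon each.
# Picking a single replacement per amino acid is an illustrative assumption,
# not the strategy used by Gene Designer.
RARE_TO_PREFERRED = {
    "AGG": "CGT", "AGA": "CGT", "CGG": "CGT", "CGA": "CGT",  # arginine
    "GGA": "GGC", "GGG": "GGC",                              # glycine
    "ATA": "ATC",                                            # isoleucine
    "CTA": "CTG", "TTA": "CTG",                              # leucine
    "CCC": "CCG",                                            # proline
    "TCG": "TCT", "TCA": "TCT", "AGT": "TCT",                # serine
    "ACA": "ACC",                                            # threonine
}


def replace_rare_codons(cds: str) -> str:
    """Return the coding sequence with every rare codon swapped for its
    preferred synonym; all other codons are left untouched."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(RARE_TO_PREFERRED.get(codon, codon) for codon in codons)


if __name__ == "__main__":
    # Toy sequence (Met-Arg-Gly-Leu-Thr), not taken from Col H.
    print(replace_rare_codons("ATGAGAGGATTAACA"))  # ATGCGTGGCCTGACC
```

Because every replacement is synonymous, the encoded protein is unchanged; only the codon choice differs.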
Gene Sequence Analyses
The native and optimized Col H genes were analyzed using two online tools: the Rare Codon Analysis Tool, available at www.genscript.com, and Sequence Analysis, available at www.bioinformatics.org.
Results and Discussion
Codon optimization of Col H
The native gene contained tandem rare codons such as AGA AGG, which have been shown to greatly affect heterologous expression in E. coli. Their effects include ribosome pausing and co-translational cleavage of mRNA, ribosomal frameshifting and amino acid misincorporation (Burgess-Brown, Sharma et al. 2008; Plotkin and Kudla 2010). Codon optimization substituted such codons and adapted the frequency of codon usage to the most preferred triplets in E. coli, so that the gene's bias toward E. coli codon usage was enhanced. Moreover, alignment of the native (GenBank accession number D29981) and codon-optimized gene sequences indicated that codon optimization did not alter the amino acid sequence and that 666 out of 982 codons (67.82%) were substituted (Figure 1).
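The two checks reported above, that the encoded protein is unchanged and how many codons differ, can be illustrated with a short Python sketch. This is not the alignment procedure used in the study; the function names and the toy sequences are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(cds: str) -> list:
    """Split a coding sequence into consecutive triplets."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(GENETIC_CODE[codon] for codon in split_codons(cds))


def compare_cds(native: str, optimized: str):
    """Return (substituted codons, total codons, same protein?)."""
    native_codons = split_codons(native)
    optimized_codons = split_codons(optimized)
    if len(native_codons) != len(optimized_codons):
        raise ValueError("sequences must contain the same number of codons")
    changed = sum(a != b for a, b in zip(native_codons, optimized_codons))
    same_protein = translate(native) == translate(optimized)
    return changed, len(native_codons), same_protein


if __name__ == "__main__":
    # Toy example: 4 of 5 codons differ, protein unchanged.
    print(compare_cds("ATGAGAGGATTAACA", "ATGCGTGGCCTGACC"))
    # The percentage reported for Col H follows from the two counts.
    print(round(100 * 666 / 982, 2))  # 67.82
```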
Percentage of non-optimal and optimal codons before and after codon optimization
The native and optimized coding sequences of the Col H enzyme were compared using Sequence Analysis software. The gene encoding the collagenase contains 2946 bp, encoding a protein of 982 amino acids. Before codon optimization, the rare glycine codons GGA and GGG occurred 38 and 6 times, respectively, whereas the optimal glycine codons GGC and GGT occurred 3 and 26 times. After codon optimization, the rare codons fell to zero and GGC and GGT increased to 37 and 36, respectively. Likewise, the rare arginine codons AGA and AGG, which occurred 23 and 5 times, fell to zero after optimization, while the optimal arginine codons CGT and CGC increased from 2 and 1 to 20 and 11, respectively. The rare leucine codons TTA and CTA decreased from 42 and 4 to zero, while the optimal codon CTG rose to 65. The rare serine codons AGT and TCA fell to zero, and the numbers of the optimal codons TCT, TCC and AGC increased. The rare threonine codon ACA declined from 23 to 0 and the optimal codon ACC increased from 4 to 40. For isoleucine, the rare codon ATA fell from 41 to 0 and the optimal codons ATC and ATT rose to 35 and 23, respectively. For proline, the rare codon CCC decreased from 1 to 0 and the optimal codon CCG increased from 1 to 33. Codons for the other amino acids were likewise shifted toward the more frequent codons. For example, the less frequent asparagine codon AAT fell from 57 to 10, while the more frequent codon AAC increased from 12 to 59 (Table 2).
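The three quantities reported per codon in Table 2 (number, occurrences per 1000 codons, and fraction among synonymous codons) can be computed as in the following Python sketch. It only illustrates the arithmetic and is not the Sequence Analysis tool itself; the function name `codon_usage` and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_usage(cds: str) -> dict:
    """Map each codon present in the sequence to a tuple
    (number, per 1000 codons, fraction among synonymous codons)."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    counts = Counter(codons)
    total = len(codons)
    # Total number of codons observed for each amino acid.
    per_amino_acid = Counter()
    for codon, number in counts.items():
        per_amino_acid[GENETIC_CODE[codon]] += number
    return {
        codon: (
            number,
            round(1000 * number / total, 2),
            round(number / per_amino_acid[GENETIC_CODE[codon]], 2),
        )
        for codon, number in counts.items()
    }


if __name__ == "__main__":
    # Toy sequence: Gly-Gly-Gly-Asn.
    for codon, row in sorted(codon_usage("GGAGGAGGTAAT").items()):
        print(codon, row)
    # Arithmetic behind the AGA row of Table 2 before optimization:
    # 23 of 982 codons, and 23 of the 31 arginine codons.
    print(round(1000 * 23 / 982, 2), round(23 / 31, 2))  # 23.42 0.74
```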
Codon Adaptation Index
The Codon Adaptation Index (CAI) assesses the extent of bias in favor of the codons used in highly expressed genes (Moriyama 2003). Protein expression levels and CAI are known to be correlated. A CAI above 0.8 is regarded as favorable for expression in the host of interest. As shown in Figure 2, the CAI of the original gene was 0.62, whereas codon optimization increased the CAI of the coding sequence to 0.84, which is above this threshold for expression in E. coli.
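For readers unfamiliar with the index, the CAI is conventionally defined as the geometric mean of the relative adaptiveness of the codons in a gene, where the relative adaptiveness of a codon is its frequency in a reference set of highly expressed genes divided by the frequency of the most used synonymous codon. The Python sketch below shows that calculation under stated assumptions: the weights are invented for illustration and are not E. coli values, the function name `cai` is hypothetical, and the online tool used in this study may differ in detail.

```python
import math

# Invented relative-adaptiveness weights, for illustration only.
# Real weights come from a reference set of highly expressed host genes.
HYPOTHETICAL_WEIGHTS = {
    "CTG": 1.0, "TTA": 0.25,   # leucine
    "GGC": 1.0, "GGA": 0.25,   # glycine
}


def cai(cds: str, weights: dict) -> float:
    """Geometric mean of the weights of the codons in the sequence.
    Codons without a weight (e.g. ATG, TGG, stop codons) are skipped.
    All weights must be greater than zero."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    logs = [math.log(weights[codon]) for codon in codons if codon in weights]
    if not logs:
        raise ValueError("no codon with a known weight in the sequence")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    print(cai("ATGCTGGGC", HYPOTHETICAL_WEIGHTS))  # only preferred codons
    print(cai("ATGTTAGGA", HYPOTHETICAL_WEIGHTS))  # only rare codons
```

A sequence built only from the most used codons scores 1, and every rare codon pulls the geometric mean down.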
Frequency of optimal codon
The simplest method for measuring species-specific codon usage bias is the frequency of optimal codons (Fop):
$$F_{op} = \frac{X_{op}}{X_{op} + X_{non}}$$
where $X_{op}$ and $X_{non}$ are the numbers of optimal and non-optimal codons in a gene, respectively. Stop codons and the codons for tryptophan and methionine are excluded from the calculation. Optimal codons for E. coli were originally determined on the basis of tRNA availability and the nature of the codon-anticodon interaction. These codons are thought to be translationally optimal and are used more frequently in highly expressed genes than in lowly expressed genes (Moriyama 2003). Non-optimal codon content is known to limit the expression of heterologous proteins because the cognate tRNAs are in limited supply in the expression host (Angov 2011). As can be seen from Figure 3, 44% of Col H codons had a value of 100 before codon optimization, and this share increased to 64% after optimization. Additionally, the percentage of rare codons (codons with values lower than 30) was 12% before optimization and decreased to 0% afterwards.
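As a minimal sketch of the Fop calculation, the Python function below counts optimal codons as a share of all counted codons, leaving out the stop codons and the codons for methionine and tryptophan. The set of optimal codons is an assumption limited to the codons described as optimal in this study; codons of amino acids not covered by that set are counted as non-optimal, which is a simplification, and the function name `fop` is hypothetical.

```python
# Codons excluded from Fop: Met, Trp and the three stop codons.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}

# Assumed optimal set: only the codons called optimal in the text.
# A complete analysis would define optimal codons for every amino acid.
OPTIMAL = {
    "GGC", "GGT",          # glycine
    "CGT", "CGC",          # arginine
    "CTG",                 # leucine
    "TCT", "TCC", "AGC",   # serine
    "ACC",                 # threonine
    "ATC", "ATT",          # isoleucine
    "CCG",                 # proline
}


def fop(cds: str, optimal=OPTIMAL) -> float:
    """Frequency of optimal codons: X_op / (X_op + X_non)."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    counted = [codon for codon in codons if codon not in EXCLUDED]
    if not counted:
        raise ValueError("no codons left after exclusions")
    x_op = sum(codon in optimal for codon in counted)
    return x_op / len(counted)


if __name__ == "__main__":
    # ATG and TGG are ignored; CTG, TTA, CTG are counted.
    print(fop("ATGCTGTTACTGTGG"))
```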
GC content adjustment
The original gene encoding Col H had a low GC content (30.99%), which may limit expression. Codon optimization increased the GC content to 47.46% (Figure 4).
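GC content is simply the percentage of G and C bases in a sequence, and a profile along the gene can be obtained by computing it window by window. The Python sketch below illustrates both; the function names and the default window size of 60 bases are assumptions, not the settings of the tool used here.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_windows(seq: str, window: int = 60, step: int = 60) -> list:
    """GC percentage of each full window along the sequence.
    The window and step sizes are illustrative assumptions."""
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    print(gc_content("ATGC"))                          # 50.0
    print(gc_windows("GGCCAATT", window=4, step=4))    # [100.0, 0.0]
```

Windows falling outside a chosen range, such as the 30% to 70% range given in the legend of Figure 4, can then be flagged for inspection.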
Conclusion
Two types of measure are available for quantifying translation efficiency. The first evaluates the codon bias of genes, of which the CAI is a good example. The second is based on the availability of tRNA for each codon along the gene, of which the Fop is an example. An advantage of the Fop over the CAI is that it removes the need to identify a set of highly expressed genes as a reference; it requires only that all tRNA genes in the genome be identified and classified according to their anticodons (Gingold and Pilpel 2011). The stand-alone Gene Designer software used here to optimize codons considers both measures. Furthermore, the GC content of the Col H gene was balanced by the software. Better results are therefore expected from the use of this software.
In line with previous reports (e.g. Hu, Shi et al. 1996; Johansson, Bolton-Grob et al. 1999; Zhou, Schnake et al. 2004; Burgess-Brown, Sharma et al. 2008), the findings of this study suggest that codon optimization provides a theoretical improvement in Col H gene expression in E. coli. Nevertheless, experimental research is needed to confirm the improvement, because although codon bias greatly affects gene expression, it is not the only contributing factor. The selection of expression vectors and transcriptional promoters is also important (Gustafsson, Govindarajan et al. 2004). In addition, sequence motifs in the vicinity of the initiation AUG (Gingold and Pilpel 2011) and mRNA stability at the 5' terminus (Maertens, Spriestersbach et al. 2010) have been shown to play a role in gene expression.
Acknowledgments
We would like to thank Dr. Ramin Rahmany for reviewing and editing the manuscript.
Competing interests
The authors declare that they have no competing interests.
References
Angov, E. (2011). "Codon usage: Nature's roadmap to expression and folding of proteins." Biotechnology journal 6(6): 650-659.
Badalamente, M. A., L. C. Hurst, et al. (2002). "Collagen as a clinical target: nonoperative treatment of Dupuytren's disease." The Journal of hand surgery 27(5): 788-798.
Badalamente, M. A. and E. Wang (2012). Methods for treating adhesive capsulitis, Google Patents.
Baneyx, F. and M. Mujacic (2004). "Recombinant protein folding and misfolding in Escherichia coli." Nature biotechnology 22(11): 1399-1408.
Bertuzzi, F., A. Cuttitta, et al. (2014). Clostridium histolyticum recombinant collagenases and method for the manufacture thereof, Google Patents.
Burgess-Brown, N. A., S. Sharma, et al. (2008). "Codon optimization can improve expression of human genes in Escherichia coli: A multi-gene study." Protein expression and purification 59(1): 94-102.
Elena, C., P. Ravasi, et al. (2014). "Expression of codon optimized genes in microbial systems: current industrial applications and perspectives." Frontiers in microbiology 5.
Fecteau, K., J. Haffner, et al. (1998). "The potential of collagenase as a new therapy for separation of human retained placenta: hydrolytic potency on human, equine and bovine placentae." Placenta 19(5): 379-383.
Gingold, H. and Y. Pilpel (2011). "Determinants of translation efficiency and accuracy." Molecular systems biology 7(1).
Gustafsson, C., S. Govindarajan, et al. (2004). "Codon bias and heterologous protein expression." Trends in biotechnology 22(7): 346-353.
Harrington, D. J. (1996). "Bacterial collagenases and collagen-degrading enzymes and their potential role in human disease." Infection and immunity 64(6): 1885.
Hellstrom, W. J. and T. J. Bivalacqua (2000). "Peyronie's disease: etiology, medical, and surgical therapy." Journal of andrology 21(3): 347-354.
Hu, X., Q. Shi, et al. (1996). "Specific Replacement of Consecutive AGG Codons Results in High-Level Expression of Human Cardiac Troponin T in Escherichia coli." Protein expression and purification 7(3): 289-293.
Johansson, A.-S., R. Bolton-Grob, et al. (1999). "Use of Silent Mutations in cDNA Encoding Human Glutathione Transferase M2-2 for Optimized Expression in Escherichia coli." Protein expression and purification 17(1): 105-112.
Katrolia, P., Q. Yan, et al. (2011). "Molecular cloning and high-level expression of a β-galactosidase gene from Paecilomyces aerugineus in Pichia pastoris." Journal of Molecular Catalysis B: Enzymatic 69(3): 112-119.
Klasen, H. (2000). "A review on the nonoperative removal of necrotic tissue from burn wounds." Burns 26(3): 207-222.
Lee, S. G., H. Y. Koh, et al. (2010). "Expression of recombinant endochitinase from the Antarctic bacterium, Sanguibacter antarcticus KOPRI 21702 in Pichia pastoris by codon optimization." Protein expression and purification 71(1): 108-114.
Maertens, B., A. Spriestersbach, et al. (2010). "Gene optimization mechanisms: A multi-gene study reveals a high success rate of full-length human proteins expressed in Escherichia coli." Protein Science 19(7): 1312-1326.
Moriyama, E. N. (2003). "Codon usage." eLS.
Ohbayashi, N., N. Yamagata, et al. (2012). "Enhancement of the structural stability of full-length clostridial collagenase by calcium ions." Applied and environmental microbiology 78(16): 5839-5844.
Plotkin, J. B. and G. Kudla (2010). "Synonymous but not the same: the causes and consequences of codon bias." Nature Reviews Genetics 12(1): 32-42.
Preet Kaur, S. and W. Azmi (2013). "The Association of Collagenase with Human Diseases and its Therapeutic Potential in Overcoming them." Current Biotechnology 2(1): 10-16.
Sussman, B. J., J. W. Bromley, et al. (1981). "Injection of collagenase in the treatment of herniated lumbar disk: initial clinical report." JAMA 245(7): 730-732.
Vellard, M. (2003). "The enzyme as drug: application of enzymes as pharmaceuticals." Current opinion in biotechnology 14(4): 444-450.
Vrabelova, D., C. Adin, et al. (2014). "Evaluation of a high-yield technique for pancreatic islet isolation from deceased canine donors." Domestic animal endocrinology 47: 119-126.
Yavuzer, R., O. Latifoğlu, et al. (1997). "Enhanced wound healing using collagenase in guinea pig." Gazi Medical Journal 8(3).
Zhou, Z., P. Schnake, et al. (2004). "Enhanced expression of a recombinant malaria candidate vaccine in Escherichia coli by codon optimization." Protein expression and purification 34(1): 87-94.
Table 1. Codons rarely used in E. coli
Amino acid | Rare codon(s)
Arginine | AGG, AGA, CGG, CGA
Glycine | GGA, GGG
Isoleucine | ATA
Leucine | CTA, TTA
Proline | CCC
Serine | TCG, TCA, AGT
Threonine | ACA
Table 2. Results for the codon analysis of Col H gene before and after optimization
Amino acid | Codon | Before: Number | Before: /1000 | Before: Fraction | After: Number | After: /1000 | After: Fraction
Ala | GCG | 1.00 | 1.02 | 0.02 | 14.00 | 14.26 | 0.33
Ala | GCA | 22.00 | 22.40 | 0.52 | 13.00 | 13.24 | 0.31
Ala | GCT | 17.00 | 17.31 | 0.40 | 8.00 | 8.15 | 0.19
Ala | GCC | 2.00 | 2.04 | 0.05 | 7.00 | 7.13 | 0.17
Arg | AGG | 5.00 | 5.09 | 0.16 | 0.00 | 0.00 | 0.00
Arg | AGA | 23.00 | 23.42 | 0.74 | 0.00 | 0.00 | 0.00
Arg | CGG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Arg | CGA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Arg | CGT | 2.00 | 2.04 | 0.06 | 20.00 | 20.37 | 0.65
Arg | CGC | 1.00 | 1.02 | 0.03 | 11.00 | 11.20 | 0.35
Asn | AAT | 57.00 | 58.04 | 0.83 | 10.00 | 10.18 | 0.14
Asn | AAC | 12.00 | 12.22 | 0.17 | 59.00 | 60.08 | 0.86
Asp | GAT | 59.00 | 60.08 | 0.83 | 35.00 | 35.64 | 0.49
Asp | GAC | 12.00 | 12.22 | 0.17 | 36.00 | 36.66 | 0.51
Cys | TGT | 3.00 | 3.05 | 1.00 | 1.00 | 1.02 | 0.33
Cys | TGC | 0.00 | 0.00 | 0.00 | 2.00 | 2.04 | 0.67
Gln | CAG | 3.00 | 3.05 | 0.13 | 16.00 | 16.29 | 0.70
Gln | CAA | 20.00 | 20.37 | 0.87 | 7.00 | 7.13 | 0.30
Glu | GAG | 10.00 | 10.18 | 0.14 | 19.00 | 19.35 | 0.27
Glu | GAA | 60.00 | 61.10 | 0.86 | 51.00 | 51.93 | 0.73
Gly | GGG | 6.00 | 6.11 | 0.08 | 0.00 | 0.00 | 0.00
Gly | GGA | 38.00 | 38.70 | 0.52 | 0.00 | 0.00 | 0.00
Gly | GGT | 26.00 | 26.48 | 0.36 | 36.00 | 36.66 | 0.49
Gly | GGC | 3.00 | 3.05 | 0.04 | 37.00 | 37.68 | 0.51
His | CAT | 16.00 | 16.29 | 0.94 | 5.00 | 5.09 | 0.29
His | CAC | 1.00 | 1.02 | 0.06 | 12.00 | 12.22 | 0.71
Ile | ATA | 41.00 | 41.75 | 0.71 | 0.00 | 0.00 | 0.00
Ile | ATT | 11.00 | 11.20 | 0.19 | 23.00 | 23.42 | 0.40
Ile | ATC | 6.00 | 6.11 | 0.10 | 35.00 | 35.64 | 0.60
Leu | TTG | 4.00 | 4.07 | 0.06 | 0.00 | 0.00 | 0.00
Leu | TTA | 42.00 | 42.77 | 0.65 | 0.00 | 0.00 | 0.00
Leu | CTG | 0.00 | 0.00 | 0.00 | 65.00 | 66.19 | 1.00
Leu | CTA | 4.00 | 4.07 | 0.06 | 0.00 | 0.00 | 0.00
Leu | CTT | 14.00 | 14.26 | 0.22 | 0.00 | 0.00 | 0.00
Leu | CTC | 1.00 | 1.02 | 0.02 | 0.00 | 0.00 | 0.00
Lys | AAG | 32.00 | 32.59 | 0.35 | 25.00 | 25.46 | 0.27
Lys | AAA | 59.00 | 60.08 | 0.65 | 66.00 | 67.21 | 0.73
Met | ATG | 19.00 | 19.35 | 1.00 | 19.00 | 19.35 | 1.00
Phe | TTT | 24.00 | 24.44 | 0.65 | 12.00 | 12.22 | 0.32
Phe | TTC | 13.00 | 13.24 | 0.35 | 25.00 | 25.46 | 0.68
Pro | CCG | 1.00 | 1.02 | 0.03 | 33.00 | 33.60 | 0.82
Pro | CCA | 23.00 | 23.42 | 0.57 | 6.00 | 6.11 | 0.15
Pro | CCT | 15.00 | 15.27 | 0.38 | 1.00 | 1.02 | 0.03
Pro | CCC | 1.00 | 1.02 | 0.03 | 0.00 | 0.00 | 0.00
Ser | AGT | 25.00 | 25.46 | 0.36 | 0.00 | 0.00 | 0.00
Ser | AGC | 5.00 | 5.09 | 0.07 | 17.00 | 17.31 | 0.25
Ser | TCG | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Ser | TCA | 16.00 | 16.29 | 0.23 | 0.00 | 0.00 | 0.00
Ser | TCT | 21.00 | 21.38 | 0.30 | 34.00 | 34.62 | 0.49
Ser | TCC | 2.00 | 2.04 | 0.03 | 18.00 | 18.33 | 0.26
Thr | ACG | 1.00 | 1.02 | 0.02 | 6.00 | 6.11 | 0.10
Thr | ACA | 23.00 | 23.42 | 0.37 | 0.00 | 0.00 | 0.00
Thr | ACT | 34.00 | 34.62 | 0.55 | 16.00 | 16.29 | 0.26
Thr | ACC | 4.00 | 4.07 | 0.06 | 40.00 | 40.73 | 0.65
Trp | TGG | 9.00 | 9.16 | 1.00 | 9.00 | 9.16 | 1.00
Tyr | TAT | 64.00 | 65.17 | 0.83 | 23.00 | 23.42 | 0.30
Tyr | TAC | 13.00 | 13.24 | 0.17 | 54.00 | 54.99 | 0.70
Val | GTG | 2.00 | 2.04 | 0.04 | 19.00 | 19.35 | 0.34
Val | GTA | 31.00 | 31.57 | 0.55 | 11.00 | 11.20 | 0.20
Val | GTT | 23.00 | 23.42 | 0.41 | 19.00 | 19.35 | 0.34
Val | GTC | 0.00 | 0.00 | 0.00 | 7.00 | 7.13 | 0.13
Figure 1. The sequence alignments of original and codon-optimized Col H. 666 out of 982 codons (67.82%) were substituted.
Figure 2. The distribution of codon usage frequency along the coding region of Col H. (A) and (B) depict the coding region of Col H before and after codon optimization, respectively.
Figure 3. The percentage distribution of codons on the basis of their quality. The value of 100 is assigned to the codon with the highest usage frequency for a specific amino acid in the expression organism of interest. Values of < 30 are assigned to codons which hinder expression efficiency and accuracy. (A) and (B) illustrate the percentage of Col H codons before and after codon optimization, respectively.
Figure 4. GC content adjustment. The optimal range of GC content is between 30% and 70%. Any peaks outside this range will adversely influence transcriptional and translational efficiency.