May 01, 2015
By Siavash Bashiri, David Vikström, Nurzian Ismail
Volume 28, Issue 5
Production of proteins for manufacturing therapeutics and pharmaceuticals is a complicated process, and its optimization can be time-consuming. Many problems are associated with protein expression, including toxicity, misfolding and degradation, aggregation in inclusion bodies, low yields, and difficulties in purification. One of the most commonly used protein expression systems uses Escherichia coli as a protein factory. E. coli has many advantages, including rapid growth, ease of scale-up, and low costs, but it poses challenges as well. Here, the authors discuss some parameters that can influence protein yield and quality during protein expression in E. coli.
Bacterial strain selection
The choice of a bacterial strain for protein expression is closely tied to the properties of the target protein and the choice of expression vector. T7-based expression systems rely on transcription from the strong T7 promoter. The BL21(DE3) strain is one of the most commonly used strains in both industry and academia. It carries phage T7 gene 1, which encodes T7 RNA polymerase, in its chromosome under the control of the lacUV5 promoter. Protein expression can be induced by adding isopropyl β-D-1-thiogalactopyranoside (IPTG).
Expression in the T7-based system tends to be “leaky,” which is problematic for the expression of toxic proteins. This can be resolved by the presence of the pLysS or pLysE plasmid, each of which expresses T7 lysozyme, an inhibitor of T7 RNA polymerase (1). The BL21-AI strain might be a better alternative for tighter regulation, because in this strain the expression of T7 RNA polymerase from the chromosome is under the control of the arabinose promoter, which has lower basal expression than the lacUV5 promoter.
In some cases, better yields of toxic proteins may be obtained in the C41(DE3) and C43(DE3) strains. These strains were derived from BL21(DE3) and were specially selected because they could tolerate the expression of toxic proteins (2). Interestingly, the mutations that were key to this tolerance converted the strong lacUV5 promoter to one that resembled the weaker wild-type lac promoter (3).
Controlling the intensity of expression is a major problem with T7-based systems. One way to do this is to lower the amount of inducer added to the culture. However, T7-based systems show “all-or-nothing” induction characteristics, so a low inducer concentration leads to a mixed population rather than to uniformly reduced expression. This population typically consists of a few cells expressing large amounts of low-quality target protein and a majority of cells expressing nothing (4).
Codon usage differs between prokaryotes and eukaryotes, so it may be difficult to obtain reasonable yields of nonbacterial proteins expressed in E. coli. The Rosetta strains, which carry a plasmid supplying tRNAs for six to seven codons that are rare in E. coli, may help to overcome this problem. Strains such as SHuffle (New England Biolabs) and Origami (Novagen) are specially engineered to provide an oxidizing environment in the cytoplasm. This allows disulfide bonds to form in the cytoplasm itself and circumvents the need to target these proteins to the periplasm. The Lemo21(DE3) (Xbrane Bioscience) and Tuner (Novagen) strains make it possible to tune the expression of target proteins, which may favor higher yields of better-quality protein. Lemo21(DE3) carries a plasmid harboring the gene for T7 lysozyme, an inhibitor of T7 RNA polymerase, under the control of the titratable rhamnose promoter. By regulating the expression of T7 lysozyme, the level of expression of the target protein from the T7 promoter may be tuned (3).
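To gauge whether rare codons are likely to be a problem for a given target, the coding sequence can be scanned before a strain is chosen. The following Python sketch is an illustration only. The codon set is an assumption based on the codons commonly listed for Rosetta-type strains and should be checked against the supplier's documentation; the function names and the example sequence are hypothetical.

```python
# Codons assumed to be covered by the supplementary tRNAs of Rosetta-type
# strains. This set is an assumption for illustration; confirm it against
# the documentation for the specific strain.
RARE_CODONS = {"AGG", "AGA", "AUA", "CUA", "CCC", "GGA", "CGG"}


def find_rare_codons(cds):
    """Return (codon_number, codon) for each rare codon in an in-frame CDS."""
    seq = cds.upper().replace("T", "U")  # accept DNA or RNA input
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    hits = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if codon in RARE_CODONS:
            hits.append((i // 3 + 1, codon))  # 1-based codon position
    return hits


def rare_codon_fraction(cds):
    """Fraction of codons in the CDS that belong to RARE_CODONS."""
    n_codons = len(cds) // 3
    return len(find_rare_codons(cds)) / n_codons if n_codons else 0.0


example = "ATGAGGCTAAAACCCGGATAA"  # hypothetical 7-codon sequence
print(find_rare_codons(example))
print(round(rare_codon_fraction(example), 2))
```

A high fraction of rare codons, or several rare codons in a row, would be one reason to try a tRNA-supplemented strain. The scan says nothing about other causes of low yield.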
Choice of promoter/vector system
The pET vector series contains either the T7 promoter or the T7/lacO promoter, which has the lac operator sequence inserted between the T7 promoter and the translation initiation site to reduce basal expression. Selection for pET-harboring cells may be performed with either ampicillin or kanamycin, depending on the pET vector chosen. The choice of a specific promoter system depends on the promoter strength desired, the “leakiness” of the promoter system (which is undesirable for highly toxic proteins), and compatibility with the bacterial strain selected. The T7 promoter system requires a strain that expresses T7 RNA polymerase, but promoters such as lac, lacUV5, T5, tac, trc, rhaBAD, and araBAD may be used with any E. coli strain.
The lacUV5 promoter, a derivative of the lac promoter, contains two mutations in the -10 region and an additional mutation at -66 within the catabolite gene activator protein (CAP) binding site. These mutations result in an increase in promoter strength and reduced catabolite repression of the lacUV5 promoter (5).
The tac promoter combines the -10 region of the lacUV5 promoter and the -35 region of the trp promoter and is at least five-fold more efficient than the lacUV5 promoter (6).
The trc promoter has similar promoter strength to the tac promoter and varies in sequence only by one base pair (7).
The lac, lacUV5, tac, and trc promoters all include the binding site for the LacI repressor. To achieve efficient repression, the plasmid must therefore also carry the lacI or lacIq gene, especially in the case of high-copy plasmids. The lacIq gene contains a mutation in its promoter that increases lacI expression tenfold (8).
The pBAD series of plasmids allows protein expression from the araBAD promoter. Repression of the araBAD promoter is more efficient than that of lac-derived promoters, which reduces unwanted basal expression. Protein expression from the araBAD promoter may be modulated to a limited extent by varying the inducer concentration, but this promoter also suffers from the “all-or-nothing” induction characteristics described previously for the T7-based systems (9).
If greater control is desired, vectors containing the rhaBAD promoter, together with the two regulatory genes rhaS and rhaR, allow protein expression to be tuned more efficiently (4).
The Rhamex vectors (Xbrane Bioscience), which include the regulatory genes rhaR and rhaS, allow expression from the rhaBAD promoter and are available in a range of copy numbers, providing an additional level of control over protein yields. Depending on the protein target, the ability to tune the level of expression of the target mRNA may give certain advantages, such as increased protein accumulation, increased protein solubility, and increased cell fitness.
The addition of affinity tags may aid the detection and purification of target proteins. Common affinity tags include poly-His, FLAG, c-Myc, poly-Arg, and StrepII tags. The tag can be placed at the N- or C-terminus of the target protein, but, in the case of membrane proteins, its position should not affect the localization or topology of the protein. For example, for proteins synthesized with targeting signal sequences, the tag should be located at the C-terminus to avoid mis-targeting and to ensure capture of the mature protein, especially if the signal sequence is cleaved.
Another advantage of placing the tag at the C-terminus is that it allows detection of the fully synthesized protein: because the tag is translated last, the protein can only be detected using the tag if the entire polypeptide has been synthesized. Depending on the purpose of protein expression, these tags may be left in the final product or cleaved off with specific proteases during the purification process.
Proteases such as TEV protease, enterokinase, thrombin, and factor Xa recognize distinct amino acid sequences, which may be included after the tag for N-terminally tagged proteins or before the tag for C-terminally tagged proteins. The choice of protease depends on the specificity of its recognition site, the amino acids that are left on the mature protein after cleavage, the ease of protease removal during purification, and the cost. Note that tagging a target protein may interfere with its correct folding, the assembly of complexes, its activity, and even expression yields, so it is best to leave a protein untagged if possible.
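The residues left on the target after cleavage depend on both the protease and the position of the tag. The Python sketch below illustrates this for a tag fused through a protease site. The recognition sequences are the commonly cited ones and are treated here as assumptions to be confirmed with the protease supplier; the function name and the target sequence are hypothetical.

```python
# Commonly cited recognition sequences, split at the cleavage position as
# (residues before the cut, residues after the cut). These are assumptions
# for illustration; confirm them before designing a real construct.
PROTEASE_SITES = {
    "TEV": ("ENLYFQ", "G"),
    "enterokinase": ("DDDDK", ""),
    "thrombin": ("LVPR", "GS"),
    "factor Xa": ("IEGR", ""),
}


def cleaved_product(target, tag, protease, tag_position):
    """Return (full construct, target-containing fragment after cleavage)."""
    before, after = PROTEASE_SITES[protease]
    if tag_position == "N":
        # tag, then site, then target: residues after the cut stay on the target
        construct = tag + before + after + target
        product = after + target
    elif tag_position == "C":
        # target, then site, then tag: residues before the cut stay on the target
        construct = target + before + after + tag
        product = target + before
    else:
        raise ValueError("tag_position must be 'N' or 'C'")
    return construct, product


target = "MKTAYIAK"  # hypothetical target sequence
tag = "HHHHHH"       # hexahistidine tag
for protease in ("TEV", "enterokinase"):
    for position in ("N", "C"):
        construct, product = cleaved_product(target, tag, protease, position)
        print(protease, position, construct, "->", product)
```

Under these assumptions, an N-terminal tag removed with enterokinase leaves no extra residues on the target, whereas a C-terminal tag removed with any of these proteases leaves most or all of the recognition sequence attached.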
Recombinant proteins expressed in E. coli often end up in inclusion bodies. This is not necessarily a bad thing, because inclusion bodies are easily isolated, and it is sometimes possible to refold a protein isolated in this way to sufficiently high quality. The refolding step adds a stage to the purification process, however, and requires additional time for optimization, which increases the cost of production. In addition, only a fraction of the isolated protein will refold to give active protein, resulting in loss of yield.
The addition of such fusion partners as maltose binding protein (MBP), glutathione-S-transferase (GST), ubiquitin, SUMO, or thioredoxin (Trx), which are present in plasmids supplied by various companies, may aid solubility. It may be necessary, however, to screen several fusion partners, because these proteins may not enhance solubility for some targets, or they may affect solubility to different levels.
Some of these fusion partners also aid purification. For example, MBP will bind to amylose-agarose, while GST will bind to glutathione-agarose for purification by affinity chromatography. Similar to affinity tags, these fusion partners must be cleaved off during purification. Unfortunately, in some cases, the target protein may not remain soluble after cleavage of the fusion partner.
Expression conditions can have two different effects: First, they can increase protein yield per cell, and second, they can increase cell densities per volume of culture. The ideal scenario would be obtaining a condition in which both protein yield per cell and cell densities per volume are high, but this cannot always be achieved. It may be necessary to optimize expression conditions to obtain high cell densities to compensate for a low protein yield per cell. There are many parameters that affect cell growth and recombinant protein expression, such as choice of medium, carbon source, temperature, pH, aeration, inducer concentrations, and length of induction.
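The trade-off between yield per cell and cell density can be made concrete with a small calculation: the volumetric yield is the product of the two. The Python sketch below uses hypothetical numbers and a hypothetical function name purely to illustrate the arithmetic.

```python
def volumetric_yield(mg_protein_per_g_cells, g_cells_per_litre):
    """Protein yield per litre of culture (mg/L)."""
    return mg_protein_per_g_cells * g_cells_per_litre


# Hypothetical conditions, not measured data.
condition_a = volumetric_yield(mg_protein_per_g_cells=20, g_cells_per_litre=2)
condition_b = volumetric_yield(mg_protein_per_g_cells=8, g_cells_per_litre=10)

print("A:", condition_a, "mg/L")  # high yield per cell, low cell density
print("B:", condition_b, "mg/L")  # low yield per cell, high cell density
```

In this illustration, condition B gives twice the volumetric yield of condition A even though each cell produces less protein, which is the compensation described above.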
For batch cultivation, LB (Luria-Bertani) medium is commonly used. Although it is a rich medium, it does not support growth to very high cell densities, mainly because it is low in carbon source and divalent cations. Media such as 2xYT, Terrific Broth (TB), and Super Broth (SB) are better than LB for obtaining high cell densities. However, nutrients eventually become limiting in batch cultivations. Much higher cell densities may be obtained in fed-batch cultivations.
Another type of medium, auto-induction medium, eliminates the need to add an inducer in lactose-inducible systems. It relies on a mixture of glycerol, D-glucose, and α-lactose in the medium. D-glucose is the preferred carbon source, and while it is present it represses expression from lac-based promoters. Once D-glucose is depleted, the cells take up lactose, which then induces expression of the target protein from the lac promoter. The amount of D-glucose in the medium therefore determines the timing of induction. The use of an auto-induction medium simplifies cultivation procedures and, in some cases, improves protein yields.
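How the glucose concentration sets the timing of induction can be sketched with a very simple growth model. The Python example below is a rough illustration, not a validated model: it assumes exponential growth at a constant rate, that glucose is the only substrate consumed before induction, and a fixed biomass yield on glucose. All parameter values and the function name are hypothetical.

```python
import math


def glucose_depletion_time(glucose_g_per_l, inoculum_g_per_l, mu_per_h, yield_g_per_g):
    """Estimated hours until glucose is exhausted.

    Assumes biomass X(t) = X0 * exp(mu * t) and glucose consumed equal to
    (X(t) - X0) / Y, so depletion occurs at t = ln(1 + Y * S0 / X0) / mu.
    """
    if min(glucose_g_per_l, inoculum_g_per_l, mu_per_h, yield_g_per_g) <= 0:
        raise ValueError("all parameters must be positive")
    ratio = yield_g_per_g * glucose_g_per_l / inoculum_g_per_l
    return math.log(1.0 + ratio) / mu_per_h


# Hypothetical values: 0.01 g/L inoculum, growth rate 1.0 per hour,
# 0.5 g of cells formed per g of glucose.
for glucose in (0.5, 1.0, 2.0):
    t = glucose_depletion_time(glucose, 0.01, 1.0, 0.5)
    print(f"{glucose} g/L glucose: induction after about {t:.1f} h")
```

In this simplified model, doubling the glucose concentration delays induction by less than one doubling time, because the culture is growing exponentially while it consumes the glucose.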
Cultures are typically grown at 30–37 °C, but the temperature may be optimized for the target being expressed. In some cases, a greater fraction of soluble protein is achieved when cultures are grown at lower temperatures, although the trade-off is that the cultures grow more slowly. Subjecting cultures to high temperature for a short time initiates the expression of heat-shock chaperones, which may be beneficial in promoting higher yields of properly folded proteins. Inducer concentration and induction period are additional parameters that can be optimized according to the target protein, especially if a titratable promoter system is used. The optimal induction period can vary from 4 to 24 hours. Another parameter, aeration, is affected by the choice of vessel or flask and the volume of culture used. Ideally, the volume of culture should not exceed 10% of the total volume of the flask. Increasing the shaking speed and using baffled flasks also promote better aeration.
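The 10% fill guideline translates directly into a flask count when a shake-flask experiment is planned. The following Python sketch shows the arithmetic; the function names and example volumes are hypothetical, and the 10% default simply restates the guideline above.

```python
import math


def max_culture_volume_ml(flask_volume_ml, max_fill_percent=10):
    """Largest culture volume (mL) that respects the fill guideline."""
    return flask_volume_ml * max_fill_percent / 100


def flasks_needed(total_culture_ml, flask_volume_ml, max_fill_percent=10):
    """Number of flasks required to hold the total culture volume."""
    per_flask = max_culture_volume_ml(flask_volume_ml, max_fill_percent)
    return math.ceil(total_culture_ml / per_flask)


print(max_culture_volume_ml(2000))  # culture volume for a 2 L flask
print(flasks_needed(1000, 2000))    # 2 L flasks needed for 1 L of culture
```

For example, under the 10% guideline a 2 L flask holds at most 200 mL of culture, so 1 L of culture requires five such flasks.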
Choosing the best expression system depends largely on the target protein and the scale of manufacturing. There are many factors that can be optimized. This article merely touches on a number of them, such as the choice of E. coli strain; the promoter system; the need for tags such as signal sequences or solubility enhancers; and, importantly, the culture and induction conditions. It is important to consider the entire process, from vector and insert design to bioreactor conditions, because each part of the process can have a major impact on the final result.
1. F.W. Studier, J. Mol. Biol. 219, pp. 37-44 (1991).
2. B. Miroux and J.E. Walker, J. Mol. Biol. 260, pp. 289-298 (1996).
3. S. Wagner et al., Proc. Natl. Acad. Sci. 105, pp. 14371-14376 (2008).
4. M.J. Giacalone et al., BioTechniques 40, pp. 355-364 (2006).
5. B.J. Hirschel et al., J. Bacteriol. 143, pp. 1534-1537 (1980).
6. E. Amann et al., Gene 25, pp. 167-178 (1983).
7. M.E. Mulligan et al., J. Biol. Chem. 260, pp. 3529-3538 (1985).
8. M.P. Calos, Nature 274, pp. 762-765 (1978).
9. D.A. Siegele and J.C. Hu, Proc. Natl. Acad. Sci. 94, pp. 8168-8172 (1997).
About the Authors
Siavash Bashiri is CEO; David Vikström is CTO; and Nurzian Ismail is senior scientist and project leader; all at Xbrane Bioscience.