January2016/Plasmid design
The goal of this design is to maximize our chance of success by re-creating the published experiment from Kim2005, which successfully expressed a partial human kappa-casein, and then trying a variation of that experiment in which we express full-length bovine kappa-casein.
Plasmid backbone
We're using the DNA 2.0 shuttle vector backbone pD1204, which has the following features:
- E. coli high copy number origin of replication
- E. coli ampicillin resistance marker
- S. cerevisiae Ura3 gene
- S. cerevisiae GAL1 galactose inducible promoter immediately before SapI site
- SapI site for integration of gene of interest
- S. cerevisiae strong terminator immediately after SapI site
Here is what we want to end up with:
on-vector GAL1 promoter | (alpha factor secretion tag) + kappa-casein gene + his tag | on-vector terminator
(the middle segment, everything between the on-vector promoter and the on-vector terminator, is what we need to synthesize)
We put the his-tag at the C-terminal end since then we can use his-purification even if cleavage of the secretion tag fails. We can even try to purify any protein expressed by E. coli due to leaky expression.
In this design we have no way of cleaving the his tag. That will have to come later.
Synthesized sequence parts
The following sub-sections list the parts that we need to synthesize (as a single piece of DNA) and insert into the plasmid backbone.
We're designing two variants of the synthesized sequence:
- One based on hCMP
- One based on full-length bovine kappa-casein (CSN3*B)
We know that hCMP has already been done by Kim2005 and we are replicating their experiment as best we can. If that fails then we know we're doing something wrong. If the hCMP version succeeds but the full-length fails then we're doing things right but something is inherently problematic with the full-length protein (maybe it is toxic to S. cerevisiae?).
SapI recognition sequences
We'll be using the DNA 2.0 Electra system to insert our synthesized sequence into a DNA 2.0 Electra plasmid backbone. From page 4 of Electra PDF.
# [] denotes recognition sequences
# | denotes where the cut will happen

# 5' end - the cut will happen before the ATG
[GCTCTTC]T|ATG -----
[CGAGAAG]A TAC|

# 3' end
      |GGT A[GAAGAGC]
-----     |T[CTTCTCG]

# based on the above, what we actually add:
# 5'
GCTCTTCT
# _and_ the first three nucleotides
# of our sequence to be inserted _must_ be ATG
# 3'
GGTAGAAGAGC
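The flanking rule above can be written down as a small helper. This is only a sketch: the function names are ours, and the 6 nt padding on each end (TACACG and GTACCA) is taken from the assembled designs further down this page.

```python
SAPI_SITE = "GCTCTTC"        # SapI recognition sequence
FLANK_5 = SAPI_SITE + "T"    # one spacer base; the cut then falls just before ATG
FLANK_3 = "GGTAGAAGAGC"      # GGT, one spacer base, reverse-complemented site


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def add_electra_flanks(insert, pad_5="TACACG", pad_3="GTACCA"):
    """Wrap an insert in the 5' and 3' SapI sequences plus end padding."""
    insert = "".join(insert.split()).upper()
    if not insert.startswith("ATG"):
        raise ValueError("the insert must start with ATG")
    return pad_5 + FLANK_5 + insert + FLANK_3 + pad_3


# The 3' flank ends in the reverse complement of the recognition sequence.
assert FLANK_3.endswith(reverse_complement(SAPI_SITE))

# Toy insert (not one of our real sequences): Met, Lys, stop.
print(add_electra_flanks("ATGAAATAG"))
```

Raising an error when the insert does not start with ATG mirrors the requirement in the notes above; it does not check anything else about the insert.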
Alpha-factor secretion signal
There are several variations to choose from. There's the full-length from DNA 2.0:
# FAKS - Alpha-factor full
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKREAEA
The first 19 residues (MRFPSIFTAVLFAASSALA, the signal peptide) are cleaved off during translocation and secretion, whereas the rest is cleaved by the Kex and Ste proteases. What we don't know is whether those proteases are present in the secretion pathway, or outside of the cell, in large enough quantities to ensure cleavage of the remaining alpha-factor even during high protein expression.
Since a Kex2 cut site ("KR") is present in the native sequence (and Kex2 is present in yeast), we should at the very least remove everything after the KR:
# FAKS - Alpha-factor with everything after Kex2 site removed
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKR

# Same but the actual nucleotide sequence
ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACA
CTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACTTAGATTTAGAAGGGGA
TTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATT
GCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA
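A quick way to confirm that the nucleotide listing really encodes the amino acid listing is to translate it and compare position by position. The sketch below assumes the standard genetic code and the two listings exactly as written above; the helper names are ours. Run on these listings it reports two differences (positions 42 and 83), so one of the two listings probably deserves a second look before ordering.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna):
    """Translate DNA in frame from the first base; '*' marks a stop codon."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def mismatches(a, b):
    """Return (1-based position, residue in a, residue in b) where a and b differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


ALPHA_DNA = """
ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACA
CTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACTTAGATTTAGAAGGGGA
TTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATT
GCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA
"""
ALPHA_PROTEIN = """
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKR
"""

translated = translate(ALPHA_DNA)
listed = "".join(ALPHA_PROTEIN.split())
print(len(translated), len(listed))
for position, from_dna, from_listing in mismatches(translated, listed):
    print(f"position {position}: DNA encodes {from_dna}, listing says {from_listing}")
```

Both differences are single-residue substitutions (L instead of S, D instead of E), and the Kex2 site KR at the end is present in both.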
Here's the minimal version that does not rely on Kex/Ste:
# FAKS - Alpha-factor T (minimal)
MRFPSIFTAVLFAASSALA
DNA 2.0 provides 11 different secretion signals, which were all tested against each other in Gil2015; the KP (Killer Protein) secretion sequence was found to be the most effective for their protein:
# KP - Killer Protein secretion sequence
MTKPTQVLVRSVSILFFITLLHLVVA
We are not sure if these results will be similar for other proteins, and we do not know if this secretion signal is automatically cleaved in S. cerevisiae (if you're reading this you could try to find out :-).
We decided to go with the full-length version with everything after the Kex2 site removed.
Partial kappa casein / CMP (caseinomacropeptide)
Here is the human kappa casein coding sequence used by Kim2005:
# coding sequence only (no signal sequence)
# https://www.ncbi.nlm.nih.gov/nuccore/M73628
# chymosin cut site: the FI in ...LHPSFIAIPP...
EVQNQKQPACHENDERPFYQKTAPYVPMYYVPNSYPYYGTNLYQRRPAIAINNPYVPRTYYANPA
VVRPHAQIPQRQYLPNSHPPTVVRRPNLHPSFIAIPPKKIQDKIIIPTINTIATVEPTP
TPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA
In human kappa-casein, chymosin cuts at FI, with the resulting cut being between F and I. After chymosin processing, residues 98 to 162 will be left:
# hCMP - length: 65
# https://www.ncbi.nlm.nih.gov/nuccore/M73628
IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA
The Kim2005 paper specifies that their hCMP consists of 64 amino acids, residues 106 to 169 of kappa-casein. I can find no way of matching this to the sequence they reference. The cut site MUST be between F and I, so the oddest part of this mismatch is that they give the length of hCMP as 64 aa when it is actually 65. Another paper, Liu2008, agrees with us that hCMP is 65 aa.
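The residue numbers for hCMP can be double-checked with a few lines of Python. This is a minimal sketch, assuming the M73628 coding sequence exactly as listed above and a cut between the F and the I of the only FI in the sequence; the helper name is ours.

```python
HUMAN_KAPPA = "".join("""
EVQNQKQPACHENDERPFYQKTAPYVPMYYVPNSYPYYGTNLYQRRPAIAINNPYVPRTYYANPA
VVRPHAQIPQRQYLPNSHPPTVVRRPNLHPSFIAIPPKKIQDKIIIPTINTIATVEPTP
TPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA
""".split())


def split_inside_motif(protein, motif):
    """Cut between the two residues of a two-letter motif that occurs exactly once."""
    if len(motif) != 2 or protein.count(motif) != 1:
        raise ValueError("expected a two-letter motif that occurs exactly once")
    cut = protein.index(motif) + 1  # number of residues left of the cut
    return protein[:cut], protein[cut:]


n_terminal_part, hcmp = split_inside_motif(HUMAN_KAPPA, "FI")
first_residue = len(n_terminal_part) + 1  # 1-based position of the I
last_residue = len(HUMAN_KAPPA)
print(f"hCMP = residues {first_residue} to {last_residue}, {len(hcmp)} aa")
```

With the sequence as listed, this gives residues 98 to 162 and a length of 65 aa.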
We could try to express the bovine version of CMP, but since we're trying to replicate the Kim2005 protocol that is probably not a good idea. Anyway, here it is:
# bCMP derived from CSN3*B by taking everything after the cut site
# (the chymosin cut site is FM, between the F and M, in cows)
# http://www.ncbi.nlm.nih.gov/protein/AAQ87923.1
MAIPPKKNQDKTEIPTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV
Full-length bovine kappa-casein (CSN3*B)
QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVA
LINNQFLPYPYYAKPAAVRSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEI
PTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV
(Caseinomacropeptide (CMP) is the C-terminal part, starting at MAIPPKK.)
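The same kind of check works for the bovine sequence. The sketch below assumes the CSN3*B sequence and the bCMP exactly as listed above, cuts between the F and the M of the only FM in the sequence, and confirms that the tail is identical to the listed bCMP; the variable names are ours.

```python
FULL_BOVINE = "".join("""
QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVA
LINNQFLPYPYYAKPAAVRSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEI
PTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV
""".split())
LISTED_BCMP = "MAIPPKKNQDKTEIPTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV"

CUT_MOTIF = "FM"  # chymosin cuts between the F and the M in cows
if FULL_BOVINE.count(CUT_MOTIF) != 1:
    raise ValueError("expected exactly one FM in the sequence")

cut = FULL_BOVINE.index(CUT_MOTIF) + 1   # number of residues left of the cut
bcmp = FULL_BOVINE[cut:]
bcmp_first = cut + 1                      # 1-based position of the M
bcmp_last = len(FULL_BOVINE)

print(f"bCMP = residues {bcmp_first} to {bcmp_last}, {len(bcmp)} aa")
print("matches listed bCMP:", bcmp == LISTED_BCMP)
```

With these listings the bovine CMP comes out as residues 106 to 169, 64 aa. Those are the same numbers Kim2005 quotes for hCMP, which might mean the paper used bovine numbering; that is a guess on our part, not something we have confirmed.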
His tag
A his tag is added to simplify purification:
# His tag - Six histidines
HHHHHH
Stop codon
We need a translational terminator, aka a stop codon. The AA symbol for a stop codon is: *
There is some evidence that, in S. cerevisiae, a G nucleotide immediately after the stop codon makes the stop codon more effective. The 3' SapI sequence (GGTAGAAGAGC) directly follows our stop codon and already starts with a G, so we didn't need to do anything extra.
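To check the base after the stop codon in a finished design, one can scan in frame for the first stop and look at the next nucleotide. This is a sketch with a helper name of our own; the two example strings are the 3' ends of the two designs on this page, each starting in frame at the first His codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def base_after_first_stop(dna):
    """Scan codons in frame from position 0; return the base right after the first stop.

    Returns None if the stop codon is the very end of the sequence.
    """
    dna = "".join(dna.split()).upper()
    for i in range(0, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return dna[i + 3] if i + 3 < len(dna) else None
    raise ValueError("no in-frame stop codon found")


# His tag + stop codon + 3' SapI sequence + padding, copied from the two designs.
FULL_LENGTH_TAIL = "CATCACCATCATCATCATTAGGGTAGAAGAGCGTACCA"
HCMP_TAIL = "CACCATCATCACCATCATTAAGGTAGAAGAGCGTACCA"

for name, tail in (("full-length", FULL_LENGTH_TAIL), ("hCMP", HCMP_TAIL)):
    print(name, base_after_first_stop(tail))
```

Both tails give G, because the stop codon is followed directly by the GGT of the 3' SapI sequence.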
Putting it together
These variants use the nearly full-length alpha-factor secretion signal (cut off after the Kex2 site):
Full-length bovine variant
Sequence for codon-optimization (with comments):
# Extra random nucleotides since enzymes don't like cutting right at the end
TACACG

# SapI recognition sequence
GCTCTTCT

# FAKS - Nearly full-length alpha-factor (cut off after Kex2 site)
ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACA
CTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACTTAGATTTAGAAGGGGA
TTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATT
GCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA

# full-length bovine kappa (CSN3*B)
QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVALINNQFLPYPYYAKPAAV
RSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEIPTINTIASGEPTSTP
TIEAVESTVATLEASPEVIESPPEINTVQVTSTAV

# His tag
HHHHHH

# Stop codon
*

# SapI recognition sequence
GGTAGAAGAGC

# Extra random nucleotides since enzymes don't like cutting right at the end
GTACCA
We ran the parts specified as amino acids through the IDT codon optimizer for gBlocks and added the non-coding regions back:
TACACGGCTCTTCTATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAG
CTGCTCCAGTCAACACTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACT
TAGATTTAGAAGGGGATTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGT
TTATAAATACTACTATTGCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGACAGGAAC
AGAATCAGGAGCAGCCAATAAGATGTGAAAAGGACGAGAGATTTTTTTCTGACAAAATTGCGAAGTATA
TACCTATACAATACGTTTTGTCACGTTACCCTAGCTACGGCTTGAATTACTATCAGCAGAAACCCGTTG
CACTTATAAATAACCAATTCCTACCCTATCCATATTATGCAAAGCCTGCGGCTGTACGTTCACCTGCTC
AGATCCTGCAATGGCAGGTTCTTAGCAACACTGTTCCCGCAAAGTCCTGTCAAGCTCAGCCTACCACAA
TGGCGCGTCACCCCCACCCCCACCTGAGCTTTATGGCAATACCCCCTAAGAAAAACCAAGATAAGACAG
AAATCCCGACCATAAATACTATAGCGTCTGGCGAGCCCACCAGCACCCCGACTATTGAAGCTGTTGAGA
GTACGGTAGCGACACTAGAGGCCAGCCCTGAAGTCATCGAGTCTCCACCTGAGATAAACACCGTACAGG
TTACGAGTACGGCGGTGCATCACCATCATCATCATTAGGGTAGAAGAGCGTACCA
(initial random 6-mer, alpha factor, His tag, and SapI site in bold)
We ensured that no other SapI sites were present after codon optimization.
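The check for extra SapI sites has to look at both strands, since a site on the reverse strand shows up as GAAGAGC in the sequence as written. Below is a minimal sketch of such a check; the function names are ours, it assumes 6 nt of padding on each end as in the designs on this page, and the example sequences are toy inserts rather than our real ones.

```python
SAPI_SITE = "GCTCTTC"


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sapi_sites(dna):
    """Return sorted (0-based position, strand) pairs for every SapI recognition site."""
    dna = "".join(dna.split()).upper()
    hits = []
    for strand, site in (("+", SAPI_SITE), ("-", reverse_complement(SAPI_SITE))):
        start = dna.find(site)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(site, start + 1)
    return sorted(hits)


def has_only_flanking_sites(dna, pad=6):
    """True if the only sites are the forward one after the 5' padding
    and the reverse one just before the 3' padding."""
    dna = "".join(dna.split()).upper()
    expected = [(pad, "+"), (len(dna) - pad - len(SAPI_SITE), "-")]
    return find_sapi_sites(dna) == expected


# Toy constructs: padding + 5' SapI + insert + 3' SapI + padding.
CLEAN = "TACACG" + "GCTCTTCT" + "ATGAAATAG" + "GGTAGAAGAGC" + "GTACCA"
WITH_INTERNAL_SITE = "TACACG" + "GCTCTTCT" + "ATGGCTCTTCTAG" + "GGTAGAAGAGC" + "GTACCA"

print(find_sapi_sites(CLEAN), has_only_flanking_sites(CLEAN))
print(find_sapi_sites(WITH_INTERNAL_SITE), has_only_flanking_sites(WITH_INTERNAL_SITE))
```

To use it on a real design, paste the full codon-optimized sequence in place of the toy construct; whitespace and line breaks are stripped before searching.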
hCMP variant
Sequence for codon-optimization (with comments):
# Extra random nucleotides since enzymes don't like cutting right at the end
TACACG

# SapI recognition sequence
GCTCTTCT

# FAKS - Nearly full-length alpha-factor (cut off after Kex2 site)
ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACA
CTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACTTAGATTTAGAAGGGGA
TTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATT
GCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA

# 65-AA hCMP sequence
IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA

# His tag
HHHHHH

# Stop codon
*

# SapI recognition sequence
GGTAGAAGAGC

# Extra random nucleotides since enzymes don't like cutting right at the end
GTACCA
Codon optimized DNA sequence (using the IDT codon optimizer for gBlocks) with SapI sites added:
TACACGGCTCTTCTATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAG
CTGCTCCAGTCAACACTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACT
TAGATTTAGAAGGGGATTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGT
TTATAAATACTACTATTGCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGAATTGCGA
TACCGCCAAAGAAGATTCAGGATAAAATAATAATACCTACTATAAACACAATAGCTACTGTTGAACCAA
CACCTACCCCCGCTACGGAGCCCACTGTGGATTCTGTCGTTACTCCAGAGGCTTTCAGCGAGTCAATAA
TTACCTCAACACCGGAAACCACTACGGTTGCCGTGACCCCTCCCACGGCTCACCATCATCACCATCATT
AAGGTAGAAGAGCGTACCA
(initial random 6-mer, alpha factor, His tag, and SapI site in bold)
We ensured that no other SapI sites were present.
Ordering
We are ordering these as IDT gBlocks since that is our cheapest available option.