Difference between revisions of "January2016/Plasmid design"
(Fixed the full length sequence! I had accidentally left out part of kappa-casein!)
Revision as of 02:00, 6 January 2016
The goal of this design is to maximize our chance of success by re-creating a published, successful experiment from Kim2005, which expressed a partial human kappa-casein, and by trying a variation of their experiment in which we express full-length bovine kappa-casein.
Plasmid backbone
We're using the DNA 2.0 shuttle vector backbone pD1204, which has the following features:
- E. coli high copy number origin of replication
- E. coli ampicillin resistance marker
- S. cerevisiae Ura3 gene
- S. cerevisiae GAL1 galactose inducible promoter immediately before SapI site
- SapI site for integration of gene of interest
- S. cerevisiae strong terminator immediately after SapI site
Here is what we want to end up with:
on-vector GAL1 promoter | (alpha factor secretion tag) + kappa-casein gene + his tag | on-vector terminator
(the middle part, between the two | separators, is what we need to synthesize)
We put the His tag at the C-terminal end so that we can still use His-tag purification even if cleavage of the secretion tag fails. We could even try to purify any protein that E. coli expresses due to leaky expression.
In this design we have no way of cleaving the his tag. That will have to come later.
Synthesized sequence parts
The following sub-sections list the parts that we need to synthesize (as a single piece of DNA) and insert into the plasmid backbone.
We're designing two variants of the synthesized sequence:
- One based on hCMP
- One based on full-length bovine kappa-casein (CSN3*B)
We know that hCMP has already been expressed by Kim2005, and we are replicating their experiment as best we can. If that fails, then we know we're doing something wrong. If the hCMP version succeeds but the full-length version fails, then we're doing things right but something is inherently problematic with the full-length protein (maybe it is toxic to S. cerevisiae?).
SapI recognition sequences
We'll be using the DNA 2.0 Electra system to insert our synthesized sequence into a DNA 2.0 Electra plasmid backbone. The SapI cut-site details below are from page 4 of the Electra PDF:
# [] denotes recognition sequences
# | denotes where the cut will happen

# 5' end - the cut will happen before the ATG
[GCTCTTC]T|ATG -----
[CGAGAAG]A TAC|

# 3' end
     |GGT A[GAAGAGC]
-----    |T[CTTCTCG]

# based on the above, what we actually add:
# 5' GCTCTTCT
# _and_ the first three nucleotides
# of our sequence to be inserted _must_ be ATG
# 3' GGTAGAAGAGC
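The flanking rule above fits in a few lines of code. This is a minimal sketch of our own, not something taken from the Electra PDF: `add_sapi_flanks` is a hypothetical helper that only encodes the two flanking sequences and the requirement that the insert starts with ATG.

```python
# Hypothetical helper (not a DNA 2.0 tool) encoding the SapI flanks described above.
SAPI_5_FLANK = "GCTCTTCT"      # recognition site GCTCTTC + one spacer base
SAPI_3_FLANK = "GGTAGAAGAGC"   # GGT overhang + spacer A + reverse-strand site GAAGAGC


def add_sapi_flanks(cds: str) -> str:
    """Wrap a coding sequence in the SapI flanks used for Electra cloning."""
    cds = cds.upper()
    if not cds.startswith("ATG"):
        # The 5' overhang SapI leaves is ATG, so the insert must start with it.
        raise ValueError("insert must start with ATG")
    return SAPI_5_FLANK + cds + SAPI_3_FLANK


if __name__ == "__main__":
    # Toy insert: start codon, two histidines, stop codon.
    print(add_sapi_flanks("ATGCATCACTAA"))
```

For the toy insert this prints `GCTCTTCTATGCATCACTAAGGTAGAAGAGC`.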
Alpha-factor secretion signal
There are several variants to choose from. There's the full-length version from DNA 2.0:
# FAKS - Alpha-factor full
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKREAEA
The first 19 residues (MRFPSIFTAVLFAASSALA) are cleaved off during translocation and secretion, whereas the rest is cleaved by the Kex and Ste proteases. But are Kex and Ste proteases present in the secretion pathway, or outside of the cell, in large enough quantities to ensure cleavage of the remaining alpha-factor even during high protein expression?
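As a rough illustration of the processing described above, the full-length DNA 2.0 alpha-factor can be split into its pre-region (the 19 residues of the minimal version), the pro-region up to the KR dipeptide, and the trailing EAEA spacer. This sketch assumes the usual picture in which Kex2 cuts right after KR and Ste13 trims the EAEA spacer; it says nothing about how efficient either protease is, which is exactly the open question. `split_alpha_factor` is our own hypothetical helper.

```python
FAKS_FULL = (
    "MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN"
    "TTIASIAAKEEGVSLEKREAEA"
)
PRE_LENGTH = 19  # length of the minimal alpha-factor (the signal peptide)


def split_alpha_factor(leader: str) -> dict:
    """Split the alpha-factor leader into pre, pro and spacer segments."""
    kex2_end = leader.index("KR") + 2  # assumption: Kex2 cuts right after Lys-Arg
    return {
        "pre": leader[:PRE_LENGTH],          # removed during translocation
        "pro": leader[PRE_LENGTH:kex2_end],  # removed by Kex2
        "spacer": leader[kex2_end:],         # EAEA, assumed trimmed by Ste13
    }


if __name__ == "__main__":
    for name, segment in split_alpha_factor(FAKS_FULL).items():
        print(f"{name:>6}: {len(segment):2d} aa  {segment}")
```

For the 89-residue leader this reports a 19-residue pre-region, a 66-residue pro-region ending in KR, and the 4-residue EAEA spacer.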
Here's the minimal version that does not rely on Kex/Ste:
# FAKS - Alpha-factor T (minimal)
MRFPSIFTAVLFAASSALA
Patrik is working on finding the exact sequence used in the Kim2005 paper, which is only slightly shorter than the full alpha factor.
DNA 2.0 provides 11 different secretion signals, which were all tested against each other in Gil2015; the KP (Killer Protein) secretion sequence was found to be the most effective for their protein:
# KP - Killer Protein secretion sequence
MTKPTQVLVRSVSILFFITLLHLVVA
We are not sure if these results will be similar for other proteins, and we do not know if this secretion signal is automatically cleaved in S. cerevisiae (if you're reading this you could try to find out :-).
Partial kappa casein / CMP (caseinomacropeptide)
Here is the human kappa-casein coding sequence used by Kim2005:
# coding sequence only (no signal sequence)
# https://www.ncbi.nlm.nih.gov/nuccore/M73628
# chymosin cut site: between the F and I of ...HPSFIAIPP...
EVQNQKQPACHENDERPFYQKTAPYVPMYYVPNSYPYYGTNLYQRRPAIAINNPYVPRTYYANPA
VVRPHAQIPQRQYLPNSHPPTVVRRPNLHPSFIAIPPKKIQDKIIIPTINTIATVEPTP
TPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA
Human chymosin cuts at FI, with the cut being between F and I. After chymosin processing, residues 98 to 162 will be left:
# hCMP - length: 65
# https://www.ncbi.nlm.nih.gov/nuccore/M73628
IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA
Now, the Kim2005 paper specifies that their hCMP consists of 64 amino acids, from residue 106 to 169 of kappa-casein. I can find no way of matching this up with the sequence they reference. The cut site MUST be between F and I, so the weirdest part of this mismatch is that they give the length of hCMP as 64 aa when it is actually 65. Here is another paper, Liu2008, that agrees with us that hCMP is 65 aa.
We could try to express the bovine version of CMP, but since we're trying to replicate the Kim2005 protocol that is probably not a good idea. Anyway, here it is:
# bCMP derived from CSN3*B by taking everything after the cut site
# (the chymosin cut site is FM, between the F and M, in cows)
# http://www.ncbi.nlm.nih.gov/protein/AAQ87923.1
MAIPPKKNQDKTEIPTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV
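The cut-site arithmetic is easy to get off by one, so here is a small sketch that finds the chymosin site in each mature sequence on this page and reports the 1-based residue range of what is left. It assumes the site occurs exactly once (true for FI in the human sequence and FM in the bovine sequence as given here) and that numbering starts at the first residue of the mature protein; `cmp_after_cut` is our own hypothetical helper. On the bovine sequence it gives residues 106 to 169 (64 aa), which happens to match the numbers Kim2005 quote for hCMP. That is only an observation; we have not confirmed it explains the mismatch.

```python
HUMAN_KAPPA = (
    "EVQNQKQPACHENDERPFYQKTAPYVPMYYVPNSYPYYGTNLYQRRPAIAINNPYVPRTYYANPA"
    "VVRPHAQIPQRQYLPNSHPPTVVRRPNLHPSFIAIPPKKIQDKIIIPTINTIATVEPTP"
    "TPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA"
)
BOVINE_KAPPA = (
    "QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVA"
    "LINNQFLPYPYYAKPAAVRSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEI"
    "PTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV"
)


def cmp_after_cut(mature: str, site: str) -> tuple[str, int, int]:
    """Return (CMP, first residue, last residue) after cutting between site[0] and site[1].

    Residue numbers are 1-based positions in the mature protein.
    """
    if mature.count(site) != 1:
        raise ValueError(f"expected exactly one {site} site")
    start = mature.index(site) + 1  # 0-based index of the residue after the cut
    return mature[start:], start + 1, len(mature)


if __name__ == "__main__":
    for label, seq, site in [("hCMP", HUMAN_KAPPA, "FI"), ("bCMP", BOVINE_KAPPA, "FM")]:
        cmp_seq, first, last = cmp_after_cut(seq, site)
        print(f"{label}: residues {first}-{last}, {len(cmp_seq)} aa")
```

This prints `hCMP: residues 98-162, 65 aa` and `bCMP: residues 106-169, 64 aa`, and the returned peptides are identical to the hCMP and bCMP sequences listed above.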
Full-length bovine kappa-casein (CSN3*B)
QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVA
LINNQFLPYPYYAKPAAVRSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEI
PTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV
(The caseinomacropeptide (CMP) is the part after the chymosin cut between F and M, starting at MAIPPKK.)
His tag
A his tag is added to simplify purification:
# His tag - Six histidines
HHHHHH
Stop codon
We need a translational terminator, aka a stop codon. The AA symbol for a stop codon is: *
There is some evidence that, in S. cerevisiae, a G nucleotide immediately after the stop codon increases the effectiveness of the stop codon. Our 3' SapI sequence (GGTAGAAGAGC) already starts with a G, so we didn't need to do anything extra.
Putting it together
Both variants below use the full-length alpha-factor secretion signal:
Full-length bovine variant
Sequence for codon-optimization (with comments):
# FAKS - Full-length alpha-factor
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKREAEA
# full-length bovine kappa (CSN3*B)
QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVALINNQFLPYPYYAKPAAV
RSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEIPTINTIASGEPTSTP
TIEAVESTVATLEASPEVIESPPEINTVQVTSTAV
# His tag
HHHHHH
# Stop codon
*
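The commented sequence above is just the parts concatenated in order. As a sanity check before ordering, here is a sketch that assembles each protein and works out the expected gBlock length: 3 nt per residue, 3 more for the stop codon, plus the 8-nt 5' and 11-nt 3' SapI flanks. The constant and function names are our own labels; none of this comes from the IDT tool.

```python
ALPHA_FACTOR = (
    "MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN"
    "TTIASIAAKEEGVSLEKREAEA"
)
BOVINE_KAPPA = (
    "QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVALINNQFLPYPYYAKPAAV"
    "RSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEIPTINTIASGEPTSTP"
    "TIEAVESTVATLEASPEVIESPPEINTVQVTSTAV"
)
HCMP = "IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA"
HIS_TAG = "HHHHHH"
FLANK_5, FLANK_3 = "GCTCTTCT", "GGTAGAAGAGC"


def expected_gblock_length(protein: str) -> int:
    """Length of the ordered DNA: one codon per residue, a stop codon, and both flanks."""
    return len(FLANK_5) + 3 * (len(protein) + 1) + len(FLANK_3)


if __name__ == "__main__":
    for label, payload in [("full-length bovine", BOVINE_KAPPA), ("hCMP", HCMP)]:
        protein = ALPHA_FACTOR + payload + HIS_TAG
        print(f"{label}: {len(protein)} aa, {expected_gblock_length(protein)} nt with flanks")
```

This gives 264 aa and 814 nt for the full-length bovine variant, and 160 aa and 502 nt for the hCMP variant, which are handy numbers to compare against the pasted DNA before ordering.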
Codon optimized DNA sequence (using the IDT codon optimizer for gBlocks) with SapI sites added:
GCTCTTCTATGCGTTTCCCCAGTATATTCACGGCGGTCTTATTCGCTGCCTCTAGTGCGCTTGCTGCGCC
TGTAAACACCACTACGGAAGACGAAACCGCACAGATTCCTGCGGAGGCAGTGATTGGGTACAGCGACCTT
GAAGGAGATTTCGATGTTGCGGTGCTGCCTTTCTCCAACTCTACCAACAATGGGCTATTATTCATAAACA
CCACGATTGCATCTATAGCTGCTAAGGAGGAAGGAGTGAGCCTAGAAAAACGTGAGGCGGAGGCACAAGA
ACAAAATCAAGAGCAGCCTATACGTTGCGAAAAAGACGAACGTTTCTTTAGCGATAAAATAGCGAAGTAC
ATCCCCATACAATATGTCCTTTCTAGGTATCCTTCTTACGGTTTAAATTATTATCAACAGAAACCCGTAG
CTCTAATTAATAATCAGTTCCTGCCATACCCTTACTATGCAAAACCGGCAGCGGTCCGTAGTCCGGCCCA
GATTCTACAATGGCAAGTATTGTCCAATACTGTCCCGGCAAAAAGTTGTCAAGCACAACCTACAACCATG
GCCCGTCACCCGCACCCGCACCTGTCATTCATGGCTATACCCCCCAAGAAGAACCAGGATAAAACTGAAA
TACCAACAATTAACACAATCGCTTCCGGAGAACCTACATCCACCCCTACCATAGAAGCTGTGGAATCTAC
TGTGGCCACATTAGAGGCATCACCAGAGGTCATTGAGTCACCGCCAGAGATAAACACCGTGCAAGTCACT
TCCACTGCAGTCCACCACCACCATCATCATTAAGGTAGAAGAGC
(SapI sites: the leading GCTCTTCT and the trailing GGTAGAAGAGC)
We ensured that no other SapI sites were present.
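The check for stray SapI sites is easy to automate. A minimal sketch, assuming plain A/C/G/T input and using helper names of our own: because the SapI recognition sequence GCTCTTC is not palindromic, we have to search both for it and for its reverse complement GAAGAGC, which marks a site on the other strand.

```python
SAPI_SITE = "GCTCTTC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sapi_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based position, strand) for every SapI recognition site.

    '+' means GCTCTTC on the given strand; '-' means GAAGAGC, i.e. a site
    on the complementary strand. Overlapping matches are included.
    """
    seq = seq.upper()
    hits = []
    for pattern, strand in ((SAPI_SITE, "+"), (reverse_complement(SAPI_SITE), "-")):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    print(find_sapi_sites("GCTCTTCTATGTAAGGTAGAAGAGC"))
```

A correctly designed gBlock should give exactly two hits: a "+" site at position 0 (the 5' flank) and a "-" site 7 bases from the end (the 3' flank). Any other hit is an extra site. For the toy sequence above this prints `[(0, '+'), (18, '-')]`.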
hCMP variant
Sequence for codon-optimization (with comments):
# FAKS - Full-length alpha-factor
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKREAEA
# 65-AA hCMP sequence
IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA
# His tag
HHHHHH
# Stop codon
*
Codon optimized DNA sequence (using the IDT codon optimizer for gBlocks) with SapI sites added:
GCTCTTCTATGCGTTTTCCCAGCATCTTTACGGCAGTGCTGTTCGCTGCCTCATCTGCGTTAGCAGCTCCG
GTAAACACGACGACGGAAGACGAGACCGCCCAAATCCCAGCGGAGGCAGTGATTGGATATTCCGATTTAGA
AGGCGACTTCGATGTGGCGGTTCTACCTTTTTCAAACTCAACGAATAACGGCCTGCTTTTTATAAATACGA
CCATAGCCTCTATTGCGGCGAAAGAAGAGGGCGTCTCTTTGGAGAAAAGGGAAGCGGAAGCTATAGCTATT
CCACCAAAGAAGATACAAGATAAGATAATCATACCCACTATCAATACCATAGCCACGGTCGAGCCTACGCC
TACTCCCGCTACCGAGCCCACCGTTGATTCAGTTGTGACTCCTGAGGCGTTTAGTGAGTCCATCATCACGA
GTACACCTGAGACAACCACCGTCGCCGTCACTCCCCCGACTGCCCATCACCATCATCATCACTAGGGTAGA
AGAGC
(SapI sites: the leading GCTCTTCT and the trailing GGTAGAAGAGC)
We ensured that no other SapI sites were present.
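Mistakes are easy to make when pasting sequences around, so it may be worth translating each ordered sequence back and comparing it with the protein it is supposed to encode. This sketch uses the standard genetic code and assumes the flanks are exactly the ones we added; `translate` and `check_insert` are our own hypothetical helpers, not part of any IDT tool. It also returns the base right after the stop codon, i.e. the G discussed in the Stop codon section.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated with the first base
# varying slowest, in T, C, A, G order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

FLANK_5 = "GCTCTTCT"     # 5' SapI flank we add
FLANK_3 = "GGTAGAAGAGC"  # 3' SapI flank we add


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; stop codons become '*'."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def check_insert(gblock: str, protein: str) -> str:
    """Check that a flanked gBlock encodes `protein` followed by a single stop codon.

    Returns the base immediately after the stop codon (the first base of
    FLANK_3 when the check passes). Raises ValueError on any mismatch.
    """
    gblock = gblock.upper()
    if not (gblock.startswith(FLANK_5) and gblock.endswith(FLANK_3)):
        raise ValueError("SapI flanks missing or different")
    cds = gblock[len(FLANK_5):len(gblock) - len(FLANK_3)]
    if translate(cds) != protein + "*":
        raise ValueError("insert does not encode the intended protein")
    return gblock[len(FLANK_5) + len(cds)]


if __name__ == "__main__":
    # Toy example: ATG CAT CAC TAA -> M H H *
    print(check_insert(FLANK_5 + "ATGCATCACTAA" + FLANK_3, "MHH"))
```

With the sequences from this page pasted in, `check_insert` should return "G" if everything lines up and raise `ValueError` otherwise. The toy example prints `G`.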
Ordering
We are ordering these as IDT gBlocks since that is our cheapest available option.