Difference between revisions of "January2016/Plasmid design"

From Real Vegan Cheese
(Fixed the full length sequence! I had accidentally left out part of kappa-casein!)
Line 146: Line 146:
   
   
  # full-length bovine kappa (CSN3*B)
  # full-length bovine kappa (CSN3*B)
  LINNQFLPYPYYAKPAAVRSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEI
  QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVALINNQFLPYPYYAKPAAV
  PTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV
  RSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEIPTINTIASGEPTSTP
TIEAVESTVATLEASPEVIESPPEINTVQVTSTAV
   
   
  # His tag
  # His tag
Line 157: Line 158:
Codon optimized DNA sequence (using the [https://www.idtdna.com/CodonOpt IDT codon optimizer for gBlocks]) with SapI sites added:
Codon optimized DNA sequence (using the [https://www.idtdna.com/CodonOpt IDT codon optimizer for gBlocks]) with SapI sites added:


  '''GCTCTTCT'''ATGCGTTTCCCTTCCATCTTTACCGCTGTCCTGTTCGCGGCGAGCAGTGCCTTAGCCGCTCC
  '''GCTCTTCT'''ATGCGTTTCCCCAGTATATTCACGGCGGTCTTATTCGCTGCCTCTAGTGCGCTTGCTGCGCC
  CGTGAATACAACTACGGAAGATGAGACCGCCCAGATTCCAGCTGAAGCAGTTATTGGGTACTCTGATTTG
  TGTAAACACCACTACGGAAGACGAAACCGCACAGATTCCTGCGGAGGCAGTGATTGGGTACAGCGACCTT
  GAGGGTGACTTCGACGTTGCGGTATTGCCCTTTAGCAATTCCACTAACAACGGGCTGCTATTCATTAACA
  GAAGGAGATTTCGATGTTGCGGTGCTGCCTTTCTCCAACTCTACCAACAATGGGCTATTATTCATAAACA
  CCACCATCGCAAGTATTGCGGCAAAGGAGGAAGGCGTGTCCTTAGAGAAGCGTGAGGCCGAGGCGTTAAT
  CCACGATTGCATCTATAGCTGCTAAGGAGGAAGGAGTGAGCCTAGAAAAACGTGAGGCGGAGGCACAAGA
  AAACAACCAATTTCTGCCATACCCATATTATGCGAAACCGGCCGCTGTAAGATCTCCCGCTCAAATCCTA
  ACAAAATCAAGAGCAGCCTATACGTTGCGAAAAAGACGAACGTTTCTTTAGCGATAAAATAGCGAAGTAC
  CAATGGCAAGTTCTTAGCAACACAGTGCCTGCAAAGAGTTGCCAGGCCCAACCAACTACAATGGCCAGAC
  ATCCCCATACAATATGTCCTTTCTAGGTATCCTTCTTACGGTTTAAATTATTATCAACAGAAACCCGTAG
  ACCCACACCCTCATCTATCTTTTATGGCGATACCACCGAAGAAAAACCAAGACAAGACTGAGATACCGAC
  CTCTAATTAATAATCAGTTCCTGCCATACCCTTACTATGCAAAACCGGCAGCGGTCCGTAGTCCGGCCCA
  AATCAACACAATTGCGTCAGGAGAGCCCACATCCACGCCTACCATAGAGGCCGTTGAGAGCACAGTAGCG
  GATTCTACAATGGCAAGTATTGTCCAATACTGTCCCGGCAAAAAGTTGTCAAGCACAACCTACAACCATG
  ACGTTAGAAGCGAGCCCCGAAGTTATTGAATCTCCCCCTGAAATCAATACTGTCCAGGTAACCTCTACAG
  GCCCGTCACCCGCACCCGCACCTGTCATTCATGGCTATACCCCCCAAGAAGAACCAGGATAAAACTGAAA
  CAGTACACCATCATCACCACCACTGA'''GGTAGAAGAGC'''
  TACCAACAATTAACACAATCGCTTCCGGAGAACCTACATCCACCCCTACCATAGAAGCTGTGGAATCTAC
TGTGGCCACATTAGAGGCATCACCAGAGGTCATTGAGTCACCGCCAGAGATAAACACCGTGCAAGTCACT
TCCACTGCAGTCCACCACCACCATCATCATTAA'''GGTAGAAGAGC'''
 


(SapI sites in '''bold''')
(SapI sites in '''bold''')

Revision as of 02:00, 6 January 2016

The goal of this design is to maximize our chance of success by re-creating a published, successful experiment (Kim2005) that expressed a partial human kappa-casein, and then trying a variation of that experiment in which we express full-length bovine kappa-casein.

Plasmid backbone

We're using the DNA 2.0 shuttle vector backbone pD1204, which has the following features:

  • E. coli high copy number origin of replication
  • E. coli ampicillin resistance marker
  • S. cerevisiae Ura3 gene
  • S. cerevisiae GAL1 galactose inducible promoter immediately before SapI site
  • SapI site for integration of gene of interest
  • S. cerevisiae strong terminator immediately after SapI site

Here is what we want to end up with:

on-vector GAL1 promoter | (alpha factor secretion tag) + kappa-casein gene + his tag | on-vector terminator

(the middle segment, between the two | separators, is what we need to synthesize; it was shown in bold)

We put the his-tag at the C-terminal end since then we can use his-purification even if cleavage of the secretion tag fails. We can even try to purify any protein expressed by E. coli due to leaky expression.

In this design we have no way of cleaving the his tag. That will have to come later.

Synthesized sequence parts

The following sub-sections list the parts that we need to synthesize (as a single piece of DNA) and insert into the plasmid backbone.

We're designing two variants of the synthesized sequence:

  • One based on hCMP
  • One based on full-length bovine kappa-casein (CSN3*B)

We know that hCMP has already been done by Kim2005 and we are replicating their experiment as best we can. If that fails then we know we're doing something wrong. If the hCMP version succeeds but the full-length fails then we're doing things right but something is inherently problematic with the full-length protein (maybe it is toxic to S. cerevisiae?).

SapI recognition sequences

We'll be using the DNA 2.0 Electra system to insert our synthesized sequence into a DNA 2.0 Electra plasmid backbone. The recognition sequences and cut positions are as follows:

# [] denotes recognition sequences
# | denotes where the cut will happen

# 5' end - the cut will happen before the ATG 
[GCTCTTC]T|ATG
          -----
[CGAGAAG]A TAC|

# 3' end 
|GGT A[GAAGAGC]
-----
    |T[CTTCTCG]

# based on the above, what we actually add:
# 5'
GCTCTTCT
# _and_ the first three nucleotides 
# of our sequence to be inserted _must_ be ATG

# 3'
GGTAGAAGAGC

From page 4 of Electra PDF.
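
The flanking rule above can be sketched in Python. This is only a minimal sketch, not something taken from the Electra documentation: the helper name `add_sapi_flanks` is ours, and the two flank strings are copied from the listing above.

```python
# Flanks copied from the listing above.
SAPI_5PRIME = "GCTCTTCT"      # SapI recognition site plus one spacer base
SAPI_3PRIME = "GGTAGAAGAGC"   # GGT overhang, one spacer base, reverse-oriented SapI site


def add_sapi_flanks(coding_dna: str) -> str:
    """Wrap a coding sequence in the two SapI flanks (hypothetical helper).

    The insert must start with ATG, because ATG doubles as the 5' overhang.
    """
    coding_dna = coding_dna.upper()
    if not coding_dna.startswith("ATG"):
        raise ValueError("the sequence to be inserted must start with ATG")
    return SAPI_5PRIME + coding_dna + SAPI_3PRIME
```

For example, `add_sapi_flanks("ATGAAATAA")` gives `GCTCTTCTATGAAATAAGGTAGAAGAGC`, and a sequence that does not start with ATG is rejected.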

Alpha-factor secretion signal

There are several variations to choose from. There's the full-length from DNA 2.0:

# FAKS - Alpha-factor full
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKREAEA

The sequence in bold (the first 19 residues, MRFPSIFTAVLFAASSALA) is cleaved off during translocation and secretion, whereas the rest is cleaved by Kex or Ste proteases. What we don't know is whether Kex and Ste proteases are present in the secretion pathway, or outside of the cell, in large enough quantities to ensure cleavage of the remaining alpha-factor even during high protein expression.

Here's the minimal version that does not rely on Kex/Ste:

# FAKS - Alpha-factor T (minimal)
MRFPSIFTAVLFAASSALA

Patrik is working on finding the exact sequence used in the Kim2005 paper, which is only slightly shorter than the full alpha factor.

DNA 2.0 provides 11 different secretion signals, which were all tested against each other in Gil2015. The KP (Killer Protein) secretion sequence was found to be most effective for their protein:

# KP - Killer Protein secretion sequence
MTKPTQVLVRSVSILFFITLLHLVVA

We are not sure if these results will be similar for other proteins, and we do not know if this secretion signal is automatically cleaved in S. cerevisiae (if you're reading this you could try to find out :-).

Partial kappa casein / CMP (caseinomacropeptide)

Here is the human kappa casein coding sequence used by Kim2005:

# coding sequence only (no signal sequence)
# https://www.ncbi.nlm.nih.gov/nuccore/M73628
# chymosin cut site in bold
EVQNQKQPACHENDERPFYQKTAPYVPMYYVPNSYPYYGTNLYQRRPAIAINNPYVPRTYYANPA
VVRPHAQIPQRQYLPNSHPPTVVRRPNLHPSFIAIPPKKIQDKIIIPTINTIATVEPTP
TPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA

Human chymosin cuts at FI, with the resulting cut being between F and I. After chymosin processing, residues 98 to 162 will be left:

# hCMP - length: 65
# https://www.ncbi.nlm.nih.gov/nuccore/M73628
IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA

Now the Kim2005 paper specifies that their hCMP consists of 64 amino acids from 106 to 169 of kappa casein. I can find no way of matching this up to the sequence they reference. The cut site MUST be between F and I so the weirdest part of this mismatch is the fact that they specify the length of hCMP as being 64 aa but it is actually 65. Here is another paper Liu2008 that agrees with us that hCMP is 65 aa.
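
The numbering we use here can be re-derived with a few lines of Python. This is a minimal sketch under the assumption that the cut site occurs exactly once in the sequence; the helper name `fragment_after_cut` is ours, and the sequence is copied from the listing above.

```python
# Human kappa-casein coding sequence as listed above (no signal sequence).
HUMAN_KAPPA = (
    "EVQNQKQPACHENDERPFYQKTAPYVPMYYVPNSYPYYGTNLYQRRPAIAINNPYVPRTYYANPA"
    "VVRPHAQIPQRQYLPNSHPPTVVRRPNLHPSFIAIPPKKIQDKIIIPTINTIATVEPTP"
    "TPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA"
)


def fragment_after_cut(protein: str, site: str):
    """Return (start, end, fragment) for everything after a two-residue cut site.

    The cut is placed between the two residues of `site`.
    start and end are 1-based and inclusive.
    """
    if protein.count(site) != 1:
        raise ValueError("cut site must occur exactly once")
    first = protein.index(site) + 1   # 0-based index of the first residue after the cut
    return first + 1, len(protein), protein[first:]


start, end, hcmp = fragment_after_cut(HUMAN_KAPPA, "FI")
print(start, end, len(hcmp))  # 98 162 65
```

Cutting between F and I in the sequence as listed gives residues 98 to 162, which is 65 residues.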

We could try to express the bovine version of CMP, but since we're trying to replicate the Kim2005 protocol that is probably not a good idea. Anyway, here it is:

# bCMP derived from CSN3*B by taking everything after the cut site
# (the chymosin cut site is FM, between the F and M, in cows)
# http://www.ncbi.nlm.nih.gov/protein/AAQ87923.1
MAIPPKKNQDKTEIPTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV

Full-length bovine kappa-casein (CSN3*B)

QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVA
LINNQFLPYPYYAKPAAVRSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEI
PTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV

(Caseinomacropeptide (CMP) in italic).
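
As a consistency check between this section and the bCMP listed earlier on this page, here is a small Python sketch that confirms the bCMP is the tail of the full-length sequence and reports where it starts. The variable names are ours; both sequences are copied from this page.

```python
# Full-length bovine kappa-casein (CSN3*B), as listed above.
BOVINE_KAPPA = (
    "QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVA"
    "LINNQFLPYPYYAKPAAVRSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEI"
    "PTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV"
)
# bCMP as listed in the partial kappa casein section.
BCMP = "MAIPPKKNQDKTEIPTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV"

# The CMP should be exactly the C-terminal end of the full-length protein.
assert BOVINE_KAPPA.endswith(BCMP)

# 1-based position of the first CMP residue.
cmp_start = len(BOVINE_KAPPA) - len(BCMP) + 1

# The two residues spanning the cut should be F and M.
assert BOVINE_KAPPA[cmp_start - 2:cmp_start] == "FM"

print(len(BOVINE_KAPPA), cmp_start, len(BCMP))  # 169 106 64
```

For the bovine sequence this gives a 64-residue CMP spanning residues 106 to 169. Those happen to be the same figures that Kim2005 is quoted as giving for hCMP, which might be worth checking against the paper.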

His tag

A his tag is added to simplify purification:

# His tag - Six histidines
HHHHHH

Stop codon

We need a translational terminator, aka a stop codon. The AA symbol for a stop codon is: *

There is some evidence that having a G nucleotide immediately after the stop codon in S. cerevisiae increases the effectiveness of the stop codon. We already have a G so we didn't need to do anything extra.
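
The "we already have a G" observation can be checked mechanically. The sketch below assumes the stop codon sits immediately before the 3' SapI flank (GGTAGAAGAGC), so the base after the stop codon is the first G of that flank; the function name is ours.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
SAPI_3PRIME = "GGTAGAAGAGC"  # 3' flank from the SapI section


def stop_followed_by_g(construct: str) -> bool:
    """True if the codon right before the 3' flank is a stop codon
    and the base right after it is a G (hypothetical helper)."""
    construct = construct.upper()
    if not construct.endswith(SAPI_3PRIME):
        raise ValueError("construct does not end with the 3' SapI flank")
    stop_end = len(construct) - len(SAPI_3PRIME)   # index of the first flank base
    stop_codon = construct[stop_end - 3:stop_end]
    return stop_codon in STOP_CODONS and construct[stop_end] == "G"
```

Because the flank itself starts with G, this holds for any construct built this way as long as the last codon before the flank really is a stop codon.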

Putting it together

These variants are using the full-length alpha-factor secretion signal:

Full-length bovine variant

Sequence for codon-optimization (with comments):

# FAKS - Full-length alpha-factor
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKREAEA

# full-length bovine kappa (CSN3*B)
QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVALINNQFLPYPYYAKPAAV
RSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEIPTINTIASGEPTSTP
TIEAVESTVATLEASPEVIESPPEINTVQVTSTAV

# His tag
HHHHHH

# Stop codon
*
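
Before pasting this into a codon optimizer, it can help to strip the comment lines and work out how long the final DNA should be. The following is a minimal Python sketch; the helper name `protein_from_listing` is ours, and the flank lengths (8 and 11 nt) come from the SapI section.

```python
LISTING = """\
# FAKS - Full-length alpha-factor
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKREAEA

# full-length bovine kappa (CSN3*B)
QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVALINNQFLPYPYYAKPAAV
RSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEIPTINTIASGEPTSTP
TIEAVESTVATLEASPEVIESPPEINTVQVTSTAV

# His tag
HHHHHH

# Stop codon
*
"""


def protein_from_listing(listing: str) -> str:
    """Join all non-empty, non-comment lines into one sequence string."""
    return "".join(
        line.strip()
        for line in listing.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


protein = protein_from_listing(LISTING)
# One codon (3 nt) per symbol, including the stop codon "*",
# plus the 8 nt 5' flank and the 11 nt 3' flank.
expected_nt = 8 + 3 * len(protein) + 11
print(len(protein), expected_nt)  # 265 814
```

The expected length can then be compared against the length of the codon-optimized DNA sequence as a quick sanity check.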

Codon optimized DNA sequence (using the IDT codon optimizer for gBlocks) with SapI sites added:

GCTCTTCTATGCGTTTCCCCAGTATATTCACGGCGGTCTTATTCGCTGCCTCTAGTGCGCTTGCTGCGCC
TGTAAACACCACTACGGAAGACGAAACCGCACAGATTCCTGCGGAGGCAGTGATTGGGTACAGCGACCTT
GAAGGAGATTTCGATGTTGCGGTGCTGCCTTTCTCCAACTCTACCAACAATGGGCTATTATTCATAAACA
CCACGATTGCATCTATAGCTGCTAAGGAGGAAGGAGTGAGCCTAGAAAAACGTGAGGCGGAGGCACAAGA
ACAAAATCAAGAGCAGCCTATACGTTGCGAAAAAGACGAACGTTTCTTTAGCGATAAAATAGCGAAGTAC
ATCCCCATACAATATGTCCTTTCTAGGTATCCTTCTTACGGTTTAAATTATTATCAACAGAAACCCGTAG
CTCTAATTAATAATCAGTTCCTGCCATACCCTTACTATGCAAAACCGGCAGCGGTCCGTAGTCCGGCCCA
GATTCTACAATGGCAAGTATTGTCCAATACTGTCCCGGCAAAAAGTTGTCAAGCACAACCTACAACCATG
GCCCGTCACCCGCACCCGCACCTGTCATTCATGGCTATACCCCCCAAGAAGAACCAGGATAAAACTGAAA
TACCAACAATTAACACAATCGCTTCCGGAGAACCTACATCCACCCCTACCATAGAAGCTGTGGAATCTAC
TGTGGCCACATTAGAGGCATCACCAGAGGTCATTGAGTCACCGCCAGAGATAAACACCGTGCAAGTCACT
TCCACTGCAGTCCACCACCACCATCATCATTAAGGTAGAAGAGC


(SapI sites in bold: the leading GCTCTTCT and the trailing GGTAGAAGAGC)

We ensured that no other SapI sites were present.
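
A check like this is easy to script. The sketch below is one possible way to do it, not necessarily how we did it: it looks for the SapI recognition sequence GCTCTTC and its reverse complement GAAGAGC, and accepts a construct only if the two flanking sites are the only hits. The function names are ours.

```python
SAPI_SITE = "GCTCTTC"
SAPI_SITE_RC = "GAAGAGC"   # reverse complement of the site above


def find_sapi_sites(dna: str) -> list:
    """Return the sorted 0-based start positions of every SapI site, in either orientation."""
    dna = dna.upper()
    hits = []
    for site in (SAPI_SITE, SAPI_SITE_RC):
        pos = dna.find(site)
        while pos != -1:
            hits.append(pos)
            pos = dna.find(site, pos + 1)
    return sorted(hits)


def only_flanking_sites(construct: str) -> bool:
    """True if the only sites are one at the very start and one at the very end."""
    return find_sapi_sites(construct) == [0, len(construct) - len(SAPI_SITE)]
```

For instance, `only_flanking_sites("GCTCTTCT" + "ATGAAATAA" + "GGTAGAAGAGC")` is true, while a construct with the 5' flank accidentally pasted twice is rejected.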

hCMP variant

Sequence for codon-optimization (with comments):

# FAKS - Full-length alpha-factor
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKREAEA

# 65-AA hCMP sequence
IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA

# His tag
HHHHHH

# Stop codon
*

Codon optimized DNA sequence (using the IDT codon optimizer for gBlocks) with SapI sites added:

GCTCTTCTATGCGTTTTCCCAGCATCTTTACGGCAGTGCTGTTCGCTGCCTCATCTGCGTTAGCAGCTCCG
GTAAACACGACGACGGAAGACGAGACCGCCCAAATCCCAGCGGAGGCAGTGATTGGATATTCCGATTTAGA
AGGCGACTTCGATGTGGCGGTTCTACCTTTTTCAAACTCAACGAATAACGGCCTGCTTTTTATAAATACGA
CCATAGCCTCTATTGCGGCGAAAGAAGAGGGCGTCTCTTTGGAGAAAAGGGAAGCGGAAGCTATAGCTATT
CCACCAAAGAAGATACAAGATAAGATAATCATACCCACTATCAATACCATAGCCACGGTCGAGCCTACGCC
TACTCCCGCTACCGAGCCCACCGTTGATTCAGTTGTGACTCCTGAGGCGTTTAGTGAGTCCATCATCACGA
GTACACCTGAGACAACCACCGTCGCCGTCACTCCCCCGACTGCCCATCACCATCATCATCACTAGGGTAGA
AGAGC

(SapI sites in bold: the leading GCTCTTCT and the trailing GGTAGAAGAGC)

We ensured that no other SapI sites were present.
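
To double-check that a codon-optimized sequence still encodes the intended protein, the DNA can be translated back and compared with the annotated protein listing. This is a minimal sketch using the standard genetic code; the function names are ours, and the flanks are the ones from the SapI section.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

SAPI_5PRIME = "GCTCTTCT"
SAPI_3PRIME = "GGTAGAAGAGC"


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; stop codons become '*'."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def translate_insert(construct: str) -> str:
    """Strip the two SapI flanks, then translate what is left."""
    construct = "".join(construct.split()).upper()   # drop line breaks and spaces
    if not (construct.startswith(SAPI_5PRIME) and construct.endswith(SAPI_3PRIME)):
        raise ValueError("construct is missing a SapI flank")
    return translate(construct[len(SAPI_5PRIME):-len(SAPI_3PRIME)])
```

For example, `translate("ATGCGTTTTCCCAGCATCTTTACG")` gives `MRFPSIFT`, the start of the alpha-factor signal. The result of `translate_insert` on a full construct can be compared directly with the comment-free protein sequence, stop codon included.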

Ordering

We are ordering these as IDT gBlocks since that is our cheapest available option.