January2016/Plasmid design

From Real Vegan Cheese
Revision as of 02:47, 19 January 2016 by Juul (talk | contribs) (combined with last edit fixes sequences to match final ordered sequences)

The goal of this design is to give ourselves as high a chance of success as possible. We do this by re-creating a published, successful experiment from Kim2005 that expressed a partial human kappa-casein, and by trying a variation of that experiment in which we express full-length bovine kappa-casein.

Plasmid backbone

We're using the DNA 2.0 shuttle vector backbone pD1204, which has the following features:

  • E. coli high copy number origin of replication
  • E. coli ampicillin resistance marker
  • S. cerevisiae Ura3 gene
  • S. cerevisiae GAL1 galactose inducible promoter immediately before SapI site
  • SapI site for integration of gene of interest
  • S. cerevisiae strong terminator immediately after SapI site

Here is what we want to end up with:

on-vector GAL1 promoter | (alpha factor secretion tag) + kappa-casein gene + his tag | on-vector terminator

(the middle part, between the on-vector promoter and the on-vector terminator, is what we need to synthesize; it was in bold in the original page)

We put the his-tag at the C-terminal end since then we can use his-purification even if cleavage of the secretion tag fails. We can even try to purify any protein expressed by E. coli due to leaky expression.

In this design we have no way of cleaving the his tag. That will have to come later.

Synthesized sequence parts

The following sub-sections list the parts that we need to synthesize (as a single piece of DNA) and insert into the plasmid backbone.

We're designing two variants of the synthesized sequence:

  • One based on hCMP
  • One based on full-length bovine kappa-casein (CSN3*B)

We know that hCMP has already been done by Kim2005 and we are replicating their experiment as best we can. If that fails then we know we're doing something wrong. If the hCMP version succeeds but the full-length fails then we're doing things right but something is inherently problematic with the full-length protein (maybe it is toxic to S. cerevisiae?).

SapI recognition sequences

We'll be using the DNA 2.0 Electra system to insert our synthesized sequence into a DNA 2.0 Electra plasmid backbone. The following is based on page 4 of the Electra PDF:

# [] denotes recognition sequences
# | denotes where the cut will happen

# 5' end - the cut will happen before the ATG 
[GCTCTTC]T|ATG
          -----
[CGAGAAG]A TAC|

# 3' end 
|GGT A[GAAGAGC]
-----
    |T[CTTCTCG]

# based on the above, what we actually add:
# 5'
GCTCTTCT
# _and_ the first three nucleotides 
# of our sequence to be inserted _must_ be ATG

# 3'
GGTAGAAGAGC
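
To make the cut positions above concrete, here is a minimal Python sketch that reads the two 3-nucleotide overhangs off a construct flanked by the SapI sequences. The function name `sapi_overhangs` and the toy insert are made up for illustration, and the sketch assumes exactly one site at each end. The padding nucleotides are the ones used further down this page.

```python
SAPI_SITE = "GCTCTTC"     # recognition sequence as read on the top strand (5' end)
SAPI_SITE_RC = "GAAGAGC"  # the same site when it sits on the bottom strand (3' end)


def sapi_overhangs(seq: str) -> dict:
    """Return the 3-nt overhangs (top-strand letters) left by the flanking SapI sites.

    SapI cuts 1 nt past its site on the strand that reads GCTCTTC and
    4 nt past it on the opposite strand, which leaves a 3-nt overhang.
    """
    fwd = seq.find(SAPI_SITE)
    rev = seq.rfind(SAPI_SITE_RC)
    if fwd == -1 or rev == -1:
        raise ValueError("expected one SapI site at each end")
    five_prime = seq[fwd + 8 : fwd + 11]  # skip the 7-nt site plus 1 spacer nt
    three_prime = seq[rev - 4 : rev - 1]  # 3 nt, then 1 spacer nt, then the site
    return {"5'": five_prime, "3'": three_prime}


# Toy insert (hypothetical): start codon, two histidine codons, stop codon.
toy_cds = "ATG" + "CATCAC" + "TAA"
construct = "TACACG" + "GCTCTTCT" + toy_cds + "GGTAGAAGAGC" + "GTACCA"
overhangs = sapi_overhangs(construct)
print(overhangs)
```

With the sequences from this page the 5' overhang is the ATG of our insert and the 3' overhang is GGT, which is why the insert must start with ATG.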


Alpha-factor secretion signal

There are several variations to choose from. There's the full-length version from DNA 2.0:

# FAKS - Alpha-factor full
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKREAEA

The leading part of the sequence (bold in the original page; presumably the 19 residues that make up the minimal version below) is cleaved off during translocation and secretion, whereas the rest is cleaved by the Kex or Ste proteases. The open question is whether the Kex and Ste proteases are present in the secretion pathway, or outside of the cell, in large enough quantities to ensure cleavage of the remaining alpha-factor even during high protein expression.

Since a Kex2 cut site ("KR") is present in the native sequence (and Kex2 is present in yeast), we should at the very least remove the KR and everything after that:

# FAKS - Alpha-factor with Kex2 site and everything after Kex2 site removed
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLE
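
The truncation can be reproduced with a few lines of Python. This is only a sketch: the helper name `truncate_before` is ours, and it assumes the first "KR" in the sequence is the Kex2 site (in this sequence there is only one).

```python
ALPHA_FULL = (
    "MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN"
    "TTIASIAAKEEGVSLEKREAEA"
)


def truncate_before(protein: str, motif: str = "KR") -> str:
    """Drop the first occurrence of `motif` and everything after it."""
    cut = protein.find(motif)
    if cut == -1:
        raise ValueError(f"motif {motif!r} not found")
    return protein[:cut]


alpha_no_kex2 = truncate_before(ALPHA_FULL)
print(alpha_no_kex2)
print(len(ALPHA_FULL), "->", len(alpha_no_kex2))
```

The result is the 83-residue sequence shown above, ending in ...EEGVSLE.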

Here's the minimal version that does not rely on Kex/Ste:

# FAKS - Alpha-factor T (minimal)
MRFPSIFTAVLFAASSALA


DNA 2.0 provides 11 different secretion signals, which were all tested against each other in Gil2015. The KP (Killer Protein) secretion sequence was found to be the most effective for their protein:

# KP - Killer Protein secretion sequence
MTKPTQVLVRSVSILFFITLLHLVVA

We are not sure if these results will be similar for other proteins, and we do not know if this secretion signal is automatically cleaved in S. cerevisiae (if you're reading this you could try to find out :-).

We decided to go with the full-length alpha-factor signal with the Kex2 site and everything after it removed.

Partial kappa casein / CMP (caseinomacropeptide)

Here is the human kappa-casein coding sequence used by Kim2005:

# coding sequence only (no signal sequence)
# https://www.ncbi.nlm.nih.gov/nuccore/M73628
# chymosin cut site (F|I, within ...NLHPSF|IAIPPKK...) was bold in the original
EVQNQKQPACHENDERPFYQKTAPYVPMYYVPNSYPYYGTNLYQRRPAIAINNPYVPRTYYANPA
VVRPHAQIPQRQYLPNSHPPTVVRRPNLHPSFIAIPPKKIQDKIIIPTINTIATVEPTP
TPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA

In human kappa-casein, chymosin cuts at FI, with the cut falling between the F and the I. After chymosin processing, residues 98 to 162 will be left:

# hCMP - length: 65
# https://www.ncbi.nlm.nih.gov/nuccore/M73628
IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA

Now the Kim2005 paper specifies that their hCMP consists of 64 amino acids, from 106 to 169 of kappa-casein. I can find no way of matching this up to the sequence they reference. The cut site MUST be between F and I, so the weirdest part of this mismatch is that they give the length of hCMP as 64 aa when it is actually 65. Another paper, Liu2008, agrees with us that hCMP is 65 aa.
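
The residue numbers we quote can be re-derived from the sequence itself. A small Python sketch, assuming the cut falls between the two residues of the motif and that the motif occurs exactly once (the helper name `c_terminal_fragment` is made up here):

```python
HUMAN_KAPPA = (
    "EVQNQKQPACHENDERPFYQKTAPYVPMYYVPNSYPYYGTNLYQRRPAIAINNPYVPRTYYANPA"
    "VVRPHAQIPQRQYLPNSHPPTVVRRPNLHPSFIAIPPKKIQDKIIIPTINTIATVEPTP"
    "TPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA"
)


def c_terminal_fragment(protein: str, motif: str):
    """Cut between the two residues of a 2-letter motif.

    Returns (start, end, fragment) with 1-based, inclusive positions.
    """
    if protein.count(motif) != 1:
        raise ValueError("motif must occur exactly once")
    start = protein.index(motif) + 2  # 1-based position of the residue after the cut
    return start, len(protein), protein[start - 1 :]


start, end, hcmp = c_terminal_fragment(HUMAN_KAPPA, "FI")
print(start, end, len(hcmp))
print(hcmp)
```

This gives residues 98 to 162 and a length of 65, matching the hCMP sequence above.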

We could try to express the bovine version of CMP, but since we're trying to replicate the Kim2005 protocol that is probably not a good idea. Anyway, here it is:

# bCMP derived from CSN3*B by taking everything after the cut site
# (the chymosin cut site is FM, between the F and M, in cows)
# http://www.ncbi.nlm.nih.gov/protein/AAQ87923.1
MAIPPKKNQDKTEIPTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV

Full-length bovine kappa-casein (CSN3*B)

QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVA
LINNQFLPYPYYAKPAAVRSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEI
PTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV

(The caseinomacropeptide (CMP) is the C-terminal part starting at MAIPPKK; it was in italics in the original page.)
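
As a cross-check, the bCMP sequence listed earlier should be the tail end of this full-length sequence. A minimal Python sketch (variable names are ours) that confirms this and reports where bCMP sits:

```python
BOVINE_KAPPA = (
    "QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVA"
    "LINNQFLPYPYYAKPAAVRSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEI"
    "PTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV"
)
BCMP = "MAIPPKKNQDKTEIPTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV"

# bCMP must be the C-terminal end of the full-length protein.
is_suffix = BOVINE_KAPPA.endswith(BCMP)

# 1-based position of the first bCMP residue within the full-length protein.
first = len(BOVINE_KAPPA) - len(BCMP) + 1

# The two residues flanking the chymosin cut (last before, first after).
cut_pair = BOVINE_KAPPA[first - 2 : first]

print(is_suffix, first, len(BOVINE_KAPPA), len(BCMP), cut_pair)
```

For the bovine protein this gives residues 106 to 169 and a length of 64, with FM flanking the cut. Those happen to be the same numbers Kim2005 quotes for hCMP; whether that explains the mismatch discussed above is not something we have established.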

His tag

A his tag is added to simplify purification:

# His tag - Six histidines
HHHHHH

Stop codon

We need a translational terminator, aka a stop codon. The AA symbol for a stop codon is: *

There is some evidence that having a G nucleotide immediately after the stop codon in S. cerevisiae increases the effectiveness of the stop codon. We already have a G, because the 3' SapI sequence that follows the stop codon starts with GGT, so we didn't need to do anything extra.
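
A quick way to check this on a finished sequence is to look at the three nucleotides before the 3' SapI sequence and the one nucleotide after them. The sketch below is illustrative only (the function name `stop_context` is ours); the two tails are the last 38 nucleotides of the ordered sequences further down this page.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
SAPI_3PRIME = "GGTAGAAGAGC"  # 3' SapI sequence from this page


def stop_context(dna: str) -> tuple:
    """Return (stop codon, next nucleotide) for a construct ending in the 3' SapI sequence."""
    flank = dna.rfind(SAPI_3PRIME)
    if flank < 3:
        raise ValueError("3' SapI sequence not found")
    stop = dna[flank - 3 : flank]
    if stop not in STOP_CODONS:
        raise ValueError(f"{stop} is not a stop codon")
    return stop, dna[flank]


# His tag codons, stop codon, SapI sequence, padding.
bovine_tail = "CACCATCACCACCATCATTAAGGTAGAAGAGCGTACCA"
hcmp_tail = "CATCACCACCATCACCACTAGGGTAGAAGAGCGTACCA"

print(stop_context(bovine_tail))
print(stop_context(hcmp_tail))
```

The codon optimizer picked different stop codons for the two variants (TAA and TAG), but in both cases the next nucleotide is the G that starts the SapI sequence.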

Putting it together

These variants use the nearly full-length alpha-factor secretion signal (cut off before the Kex2 site):

Full-length bovine variant

Sequence for codon-optimization (with comments):

# Extra random nucleotides since enzymes don't like cutting right at the end
TACACG

# SapI recognition sequence
GCTCTTCT

# FAKS - Nearly full-length alpha-factor (cut off before Kex2 site)
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLE

# full-length bovine kappa (CSN3*B)
QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVALINNQFLPYPYYAKPAAV
RSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEIPTINTIASGEPTSTP
TIEAVESTVATLEASPEVIESPPEINTVQVTSTAV

# His tag
HHHHHH

# Stop codon
*

# SapI recognition sequence
GGTAGAAGAGC

# Extra random nucleotides since enzymes don't like cutting right at the end
GTACCA
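
Before ordering, it is worth knowing how long the final DNA should be, since codon optimization changes the letters but not the length. A rough Python sketch of the arithmetic (the helper name `expected_length` is ours): three nucleotides per amino acid, three for the stop codon, plus the padding and SapI sequences.

```python
PAD_5, SAPI_5 = "TACACG", "GCTCTTCT"
SAPI_3, PAD_3 = "GGTAGAAGAGC", "GTACCA"
HIS_TAG = "HHHHHH"

ALPHA = (
    "MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN"
    "TTIASIAAKEEGVSLE"
)
BOVINE_KAPPA = (
    "QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVALINNQFLPYPYYAKPAAV"
    "RSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEIPTINTIASGEPTSTP"
    "TIEAVESTVATLEASPEVIESPPEINTVQVTSTAV"
)
HCMP = "IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA"


def expected_length(*protein_parts: str) -> int:
    """Nucleotides in the ordered fragment: coding region plus both flanks."""
    residues = sum(len(part) for part in protein_parts)
    coding = 3 * (residues + 1)  # +1 codon for the stop
    flanks = len(PAD_5 + SAPI_5 + SAPI_3 + PAD_3)
    return coding + flanks


bovine_nt = expected_length(ALPHA, BOVINE_KAPPA, HIS_TAG)
hcmp_nt = expected_length(ALPHA, HCMP, HIS_TAG)
print(bovine_nt, hcmp_nt)
```

This works out to 808 nucleotides for the full-length bovine variant and 496 for the hCMP variant, which is what the final sequences on this page add up to.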


We ran the parts specified as amino acids through the IDT codon optimizer for gBlocks and added the non-coding regions back:

TACACGGCTCTTCTATGCGTTTCCCCTCCATTTTCACGGCAGTCCTATTTGCAGCCTCTAGTGCCCT
TGCTGCACCTGTCAACACCACGACTGAGGACGAAACCGCTCAAATCCCCGCGGAGGCTGTAATAGGC
TACAGCGACCTAGAGGGAGACTTTGATGTTGCAGTTTTGCCGTTTAGCAACTCTACCAATAACGGGC
TACTTTTTATTAATACGACGATAGCCTCTATTGCAGCAAAGGAGGAAGGCGTATCCCTAGAACAAGA
ACAGAATCAGGAGCAACCGATCAGATGTGAAAAAGATGAGAGATTCTTCAGTGATAAGATAGCAAAA
TACATACCCATTCAATATGTTCTGTCAAGGTATCCTTCCTATGGCTTAAACTACTACCAGCAGAAAC
CAGTTGCGTTGATTAACAACCAATTCCTTCCGTACCCATATTACGCGAAACCCGCGGCAGTGCGTTC
ACCTGCGCAAATACTACAGTGGCAGGTCTTATCCAATACAGTCCCTGCAAAATCATGTCAGGCACAG
CCGACGACTATGGCGAGGCATCCACATCCCCACCTGTCATTTATGGCTATTCCCCCTAAGAAAAATC
AGGATAAGACCGAGATTCCCACTATAAACACGATCGCTTCCGGAGAACCCACCAGCACACCGACTAT
CGAAGCGGTTGAGTCAACTGTAGCAACCTTGGAGGCAAGCCCGGAAGTAATCGAGTCACCGCCCGAA
ATAAATACTGTGCAAGTTACGAGCACTGCAGTACACCATCACCACCATCATTAAGGTAGAAGAGCGT
ACCA

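(the non-coding regions, bold in the original page, are the first 14 nucleotides, TACACGGCTCTTCT, and the last 17, GGTAGAAGAGCGTACCA)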

We ensured that no other SapI sites were present after codon optimization.
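
This check is easy to repeat. The Python sketch below searches the sequence above for the SapI recognition sequence on both strands; `find_all` is a helper written for this example, not part of any library. We expect exactly one hit per strand, at the two ends.

```python
ORDERED_BOVINE = (
    "TACACGGCTCTTCTATGCGTTTCCCCTCCATTTTCACGGCAGTCCTATTTGCAGCCTCTAGTGCCCT"
    "TGCTGCACCTGTCAACACCACGACTGAGGACGAAACCGCTCAAATCCCCGCGGAGGCTGTAATAGGC"
    "TACAGCGACCTAGAGGGAGACTTTGATGTTGCAGTTTTGCCGTTTAGCAACTCTACCAATAACGGGC"
    "TACTTTTTATTAATACGACGATAGCCTCTATTGCAGCAAAGGAGGAAGGCGTATCCCTAGAACAAGA"
    "ACAGAATCAGGAGCAACCGATCAGATGTGAAAAAGATGAGAGATTCTTCAGTGATAAGATAGCAAAA"
    "TACATACCCATTCAATATGTTCTGTCAAGGTATCCTTCCTATGGCTTAAACTACTACCAGCAGAAAC"
    "CAGTTGCGTTGATTAACAACCAATTCCTTCCGTACCCATATTACGCGAAACCCGCGGCAGTGCGTTC"
    "ACCTGCGCAAATACTACAGTGGCAGGTCTTATCCAATACAGTCCCTGCAAAATCATGTCAGGCACAG"
    "CCGACGACTATGGCGAGGCATCCACATCCCCACCTGTCATTTATGGCTATTCCCCCTAAGAAAAATC"
    "AGGATAAGACCGAGATTCCCACTATAAACACGATCGCTTCCGGAGAACCCACCAGCACACCGACTAT"
    "CGAAGCGGTTGAGTCAACTGTAGCAACCTTGGAGGCAAGCCCGGAAGTAATCGAGTCACCGCCCGAA"
    "ATAAATACTGTGCAAGTTACGAGCACTGCAGTACACCATCACCACCATCATTAAGGTAGAAGAGCGT"
    "ACCA"
)


def find_all(seq: str, site: str) -> list:
    """Return every 0-based start index of `site` in `seq`, including overlaps."""
    hits = []
    i = seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


forward = find_all(ORDERED_BOVINE, "GCTCTTC")  # site read on the top strand
reverse = find_all(ORDERED_BOVINE, "GAAGAGC")  # site sitting on the bottom strand
print(forward, reverse)
```

Searching for the reverse complement as well matters, because a SapI site on the bottom strand would cut the insert just the same.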

hCMP variant

Sequence for codon-optimization (with comments):

# Extra random nucleotides since enzymes don't like cutting right at the end
TACACG

# SapI recognition sequence
GCTCTTCT

# FAKS - Nearly full-length alpha-factor (cut off before Kex2 site)
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLE

# 65-AA hCMP sequence
IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA

# His tag
HHHHHH

# Stop codon
*

# SapI recognition sequence
GGTAGAAGAGC

# Extra random nucleotides since enzymes don't like cutting right at the end
GTACCA

Codon optimized DNA sequence (using the IDT codon optimizer for gBlocks) with SapI sites added:

TACACGGCTCTTCTATGAGATTTCCCTCTATTTTCACTGCCGTCTTGTTTGCAGCGAGCAGCGCACTTGCT
GCACCCGTAAACACGACCACGGAAGACGAGACCGCTCAGATACCGGCAGAGGCCGTTATTGGATACTCCGA
CCTTGAAGGGGATTTCGATGTCGCGGTGCTTCCATTCTCTAACTCCACTAATAACGGGCTTTTGTTTATAA
ACACCACCATTGCTTCAATAGCCGCAAAAGAGGAAGGGGTAAGTCTTGAAATTGCGATTCCGCCTAAGAAA
ATTCAAGATAAAATCATTATCCCGACCATAAATACCATCGCTACTGTAGAACCAACGCCGACACCCGCCAC
AGAGCCGACAGTAGATAGTGTAGTAACCCCTGAGGCTTTTTCCGAGTCAATTATCACATCCACCCCGGAGA
CTACGACAGTGGCGGTAACTCCCCCCACCGCCCATCACCACCATCACCACTAGGGTAGAAGAGCGTACCA

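(the non-coding regions, bold in the original page, are the first 14 nucleotides, TACACGGCTCTTCT, and the last 17, GGTAGAAGAGCGTACCA)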

We ensured that no other SapI sites were present.
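
Another useful sanity check is to translate the codon-optimized DNA back into protein and compare it with what we asked for. The sketch below does this for the hCMP variant using the standard genetic code. It is our own minimal translator, not the IDT tool, and it assumes the coding region is everything between the 14-nucleotide 5' flank and the 17-nucleotide 3' flank.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

ORDERED_HCMP = (
    "TACACGGCTCTTCTATGAGATTTCCCTCTATTTTCACTGCCGTCTTGTTTGCAGCGAGCAGCGCACTTGCT"
    "GCACCCGTAAACACGACCACGGAAGACGAGACCGCTCAGATACCGGCAGAGGCCGTTATTGGATACTCCGA"
    "CCTTGAAGGGGATTTCGATGTCGCGGTGCTTCCATTCTCTAACTCCACTAATAACGGGCTTTTGTTTATAA"
    "ACACCACCATTGCTTCAATAGCCGCAAAAGAGGAAGGGGTAAGTCTTGAAATTGCGATTCCGCCTAAGAAA"
    "ATTCAAGATAAAATCATTATCCCGACCATAAATACCATCGCTACTGTAGAACCAACGCCGACACCCGCCAC"
    "AGAGCCGACAGTAGATAGTGTAGTAACCCCTGAGGCTTTTTCCGAGTCAATTATCACATCCACCCCGGAGA"
    "CTACGACAGTGGCGGTAACTCCCCCCACCGCCCATCACCACCATCACCACTAGGGTAGAAGAGCGTACCA"
)

ALPHA = (
    "MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN"
    "TTIASIAAKEEGVSLE"
)
HCMP = "IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA"
EXPECTED = ALPHA + HCMP + "HHHHHH" + "*"


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, with '*' for stop codons."""
    return "".join(CODON_TABLE[cds[i : i + 3]] for i in range(0, len(cds) - 2, 3))


cds = ORDERED_HCMP[14:-17]  # strip padding + SapI sequences at both ends
protein = translate(cds)
print(protein == EXPECTED)
```

If the comparison comes out false, something was lost or shifted between the amino acid design and the DNA that is about to be ordered.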

Ordering

We are ordering these as IDT gBlocks since that is our cheapest available option.