Difference between revisions of "January2016/Plasmid design"

From Real Vegan Cheese
(Created page with "The goal of this design is to give ourselves as high a chance of success as possible by re-creating a published successful experiment from [https://www.ncbi.nlm.nih.gov/pubmed...")
 
 
(14 intermediate revisions by 2 users not shown)
Line 3: Line 3:
= Plasmid backbone =
= Plasmid backbone =


We're using the DNA 2.0 shuttle vector backbone https://www.dna20.com/eCommerce/catalog/datasheet/189 pD1204]. Which has the following features:
We're using the DNA 2.0 shuttle vector backbone [https://www.dna20.com/eCommerce/catalog/datasheet/189 pD1204]. Which has the following features:


* E. coli high copy number origin of replication
* E. coli high copy number origin of replication
Line 35: Line 35:
== SapI recognition sequences ==
== SapI recognition sequences ==


We'll be using the DNA 2.0 Electra system to insert our synthesized sequence into a DNA 2.0 Electra plasmid backbone. The
We'll be using the DNA 2.0 Electra system to insert our synthesized sequence into a DNA 2.0 Electra plasmid backbone.
 
From page 4 of [https://www.dna20.com/eCommerce/catalog/datasheet/365 Electra PDF].
From page 4 of [https://www.dna20.com/eCommerce/catalog/datasheet/365 Electra PDF].


Line 72: Line 71:


The sequence in bold is cleaved off during translocation and secretion, whereas the rest is cleaved by Kex or Ste proteases, but are Kex and Ste proteases present in the secretion pathway or outside of the cell in large enough quantities to ensure cleavage of the remaining alpha-factor even during high protein expression?
The sequence in bold is cleaved off during translocation and secretion, whereas the rest is cleaved by Kex or Ste proteases, but are Kex and Ste proteases present in the secretion pathway or outside of the cell in large enough quantities to ensure cleavage of the remaining alpha-factor even during high protein expression?
Since a Kex2 cut site ("KR") is present in the native sequence (and Kex2 is present in yeast), we should at the very least remove the KR and everything after that:
# FAKS - Alpha-factor with everything after Kex2 site removed
'''MRFPSIFTAVLFAASSALA'''APVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKR
# Same but the actual nucleotide sequence
ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACA
CTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACTTAGATTTAGAAGGGGA
TTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATT
GCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA


Here's the minimal version that does not rely on Kex/Ste:
Here's the minimal version that does not rely on Kex/Ste:
Line 78: Line 89:
  MRFPSIFTAVLFAASSALA
  MRFPSIFTAVLFAASSALA


Patrik is working on finding the exact sequence used in the Kim2005 paper, which is only slightly shorter than the full alpha factor.


DNA 2.0 provides 11 different secretion signals which were all tested against each-other in [https://www.ncbi.nlm.nih.gov/pubmed/25907834 Gil2015] and the KP (Killer Protein) secretion sequence was found to be most effective for their protein:
DNA 2.0 provides 11 different secretion signals which were all tested against each-other in [https://www.ncbi.nlm.nih.gov/pubmed/25907834 Gil2015] and the KP (Killer Protein) secretion sequence was found to be most effective for their protein:
Line 86: Line 96:


We are not sure if these results will be similar for other proteins, and we do not know if this secretion signal is automatically cleaved in S. cerevisiae (if you're reading this you could try to find out :-).
We are not sure if these results will be similar for other proteins, and we do not know if this secretion signal is automatically cleaved in S. cerevisiae (if you're reading this you could try to find out :-).
We decided to go with the full lengths with the Kex2 and post-Kex2 sequence removed.


== Partial kappa casein / CMP (caseinomacropeptide) ==
== Partial kappa casein / CMP (caseinomacropeptide) ==
Line 115: Line 127:
== Full-length bovine kappa-casein (CSN3*B) ==
== Full-length bovine kappa-casein (CSN3*B) ==


  # QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVA
  QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVA
  LINNQFLPYPYYAKPAAVRSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSF''MAIPPKKNQDKTEI''
  LINNQFLPYPYYAKPAAVRSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSF''MAIPPKKNQDKTEI''
  ''PTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV''
  ''PTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV''
Line 142: Line 154:
Sequence for codon-optimization (with comments):
Sequence for codon-optimization (with comments):


  # FAKS - Full-length alpha-factor
# Extra random nucleotides since enzymes don't like cutting right at the end
  MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TACACG
  TTIASIAAKEEGVSLEKREAEA
# SapI recognition sequence
GCTCTTCT
  # FAKS - Nearly full-length alpha-factor (cut off after Kex2 site)
ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACA
CTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACTTAGATTTAGAAGGGGA
  TTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATT
  GCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA
   
   
  # full-length bovine kappa (CSN3*B)
  # full-length bovine kappa (CSN3*B)
  LINNQFLPYPYYAKPAAVRSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEI
  QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVALINNQFLPYPYYAKPAAV
  PTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV
  RSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEIPTINTIASGEPTSTP
TIEAVESTVATLEASPEVIESPPEINTVQVTSTAV
   
   
  # His tag
  # His tag
Line 155: Line 176:
  # Stop codon
  # Stop codon
  *
  *
# SapI recognition sequence
GGTAGAAGAGC
# Extra random nucleotides since enzymes don't like cutting right at the end
GTACCA


Codon optimized DNA sequence (using the [https://www.idtdna.com/CodonOpt IDT codon optimizer for gBlocks]) with SapI sites added:
We ran the parts specified as amino acids through the [https://www.idtdna.com/CodonOpt IDT codon optimizer for gBlocks] and added the non-coding regions back:


  '''GCTCTTCT'''ATGCGTTTCCCTTCCATCTTTACCGCTGTCCTGTTCGCGGCGAGCAGTGCCTTAGCCGCTCC
  '''TACACG'''GCTCTTCT'''ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAG'''
  CGTGAATACAACTACGGAAGATGAGACCGCCCAGATTCCAGCTGAAGCAGTTATTGGGTACTCTGATTTG
  '''CTGCTCCAGTCAACACTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACT'''
  GAGGGTGACTTCGACGTTGCGGTATTGCCCTTTAGCAATTCCACTAACAACGGGCTGCTATTCATTAACA
  '''TAGATTTAGAAGGGGATTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGT'''
  CCACCATCGCAAGTATTGCGGCAAAGGAGGAAGGCGTGTCCTTAGAGAAGCGTGAGGCCGAGGCGTTAAT
  '''TTATAAATACTACTATTGCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA'''CAGGAAC
  AAACAACCAATTTCTGCCATACCCATATTATGCGAAACCGGCCGCTGTAAGATCTCCCGCTCAAATCCTA
AGAATCAGGAGCAGCCAATAAGATGTGAAAAGGACGAGAGATTTTTTTCTGACAAAATTGCGAAGTATA
  CAATGGCAAGTTCTTAGCAACACAGTGCCTGCAAAGAGTTGCCAGGCCCAACCAACTACAATGGCCAGAC
TACCTATACAATACGTTTTGTCACGTTACCCTAGCTACGGCTTGAATTACTATCAGCAGAAACCCGTTG
  ACCCACACCCTCATCTATCTTTTATGGCGATACCACCGAAGAAAAACCAAGACAAGACTGAGATACCGAC
  CACTTATAAATAACCAATTCCTACCCTATCCATATTATGCAAAGCCTGCGGCTGTACGTTCACCTGCTC
  AATCAACACAATTGCGTCAGGAGAGCCCACATCCACGCCTACCATAGAGGCCGTTGAGAGCACAGTAGCG
  AGATCCTGCAATGGCAGGTTCTTAGCAACACTGTTCCCGCAAAGTCCTGTCAAGCTCAGCCTACCACAA
  ACGTTAGAAGCGAGCCCCGAAGTTATTGAATCTCCCCCTGAAATCAATACTGTCCAGGTAACCTCTACAG
  TGGCGCGTCACCCCCACCCCCACCTGAGCTTTATGGCAATACCCCCTAAGAAAAACCAAGATAAGACAG
  CAGTACACCATCATCACCACCACTGA'''GGTAGAAGAGC'''
  AAATCCCGACCATAAATACTATAGCGTCTGGCGAGCCCACCAGCACCCCGACTATTGAAGCTGTTGAGA
  GTACGGTAGCGACACTAGAGGCCAGCCCTGAAGTCATCGAGTCTCCACCTGAGATAAACACCGTACAGG
  TTACGAGTACGGCGGTG'''CATCACCATCATCATCAT'''TAG'''GGTAGAAGAGC'''GTACCA'''


(SapI sites in '''bold''')
(initial random 6-mer, alpha factor, His tag, and SapI site in '''bold''')


We ensured that no other SapI sites were present.
We ensured that no other SapI sites were present after codon optimization.


== hCMP variant ==
== hCMP variant ==
Line 177: Line 206:
Sequence for codon-optimization (with comments):
Sequence for codon-optimization (with comments):


  # FAKS - Full-length alpha-factor
# Extra random nucleotides since enzymes don't like cutting right at the end
  MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TACACG
  TTIASIAAKEEGVSLEKREAEA
# SapI recognition sequence
GCTCTTCT
  # FAKS - Nearly full-length alpha-factor (cut off after Kex2 site)
ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACA
CTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACTTAGATTTAGAAGGGGA
  TTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATT
  GCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA
   
   
  # full-length bovine kappa (CSN3*B)
  # 65-AA hCMP sequence
  IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA
  IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA
   
   
Line 189: Line 226:
  # Stop codon
  # Stop codon
  *
  *
# SapI recognition sequence
GGTAGAAGAGC
# Extra random nucleotides since enzymes don't like cutting right at the end
GTACCA


Codon optimized DNA sequence (using the [https://www.idtdna.com/CodonOpt IDT codon optimizer for gBlocks]) with SapI sites added:
Codon optimized DNA sequence (using the [https://www.idtdna.com/CodonOpt IDT codon optimizer for gBlocks]) with SapI sites added:


  '''GCTCTTCT'''GCTCTTCTATGCGTTTTCCCAGCATCTTTACGGCAGTGCTGTTCGCTGCCTCATCTGCGTTAGCAGCTCCG
  '''TACACG'''GCTCTTCT'''ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAG'''
  GTAAACACGACGACGGAAGACGAGACCGCCCAAATCCCAGCGGAGGCAGTGATTGGATATTCCGATTTAGA
  '''CTGCTCCAGTCAACACTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACT'''
  AGGCGACTTCGATGTGGCGGTTCTACCTTTTTCAAACTCAACGAATAACGGCCTGCTTTTTATAAATACGA
  '''TAGATTTAGAAGGGGATTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGT'''
  CCATAGCCTCTATTGCGGCGAAAGAAGAGGGCGTCTCTTTGGAGAAAAGGGAAGCGGAAGCTATAGCTATT
  '''TTATAAATACTACTATTGCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA'''ATTGCGA
  CCACCAAAGAAGATACAAGATAAGATAATCATACCCACTATCAATACCATAGCCACGGTCGAGCCTACGCC
  TACCGCCAAAGAAGATTCAGGATAAAATAATAATACCTACTATAAACACAATAGCTACTGTTGAACCAA
  TACTCCCGCTACCGAGCCCACCGTTGATTCAGTTGTGACTCCTGAGGCGTTTAGTGAGTCCATCATCACGA
  CACCTACCCCCGCTACGGAGCCCACTGTGGATTCTGTCGTTACTCCAGAGGCTTTCAGCGAGTCAATAA
  GTACACCTGAGACAACCACCGTCGCCGTCACTCCCCCGACTGCCCATCACCATCATCATCACTAG'''GGTAGA'''
  TTACCTCAACACCGGAAACCACTACGGTTGCCGTGACCCCTCCCACGGCT'''CACCATCATCACCATCAT'''T
  '''AGAGC'''
  AA'''GGTAGAAGAGC'''GTACCA'''


(SapI sites in '''bold''')
(initial random 6-mer, alpha factor, His tag, and SapI site in '''bold''')


We ensured that no other SapI sites were present.
We ensured that no other SapI sites were present.

Latest revision as of 04:45, 19 January 2016

The goal of this design is to give ourselves as high a chance of success as possible by re-creating a published successful experiment from Kim2005 that expressed a partial human kappa-casein and trying a variation of their experiment where we express full-length bovine kappa casein.

Plasmid backbone

We're using the DNA 2.0 shuttle vector backbone pD1204, which has the following features:

  • E. coli high copy number origin of replication
  • E. coli ampicillin resistance marker
  • S. cerevisiae Ura3 gene
  • S. cerevisiae GAL1 galactose inducible promoter immediately before SapI site
  • SapI site for integration of gene of interest
  • S. cerevisiae strong terminator immediately after SapI site

Here is what we want to end up with:

on-vector GAL1 promoter | (alpha factor secretion tag) + kappa-casein gene + his tag | on-vector terminator

(the parts in bold are what we need to synthesize)

We put the his-tag at the C-terminal end since then we can use his-purification even if cleavage of the secretion tag fails. We can even try to purify any protein expressed by E. coli due to leaky expression.

In this design we have no way of cleaving the his tag. That will have to come later.

Synthesized sequence parts

The following sub-sections list the parts that we need to synthesize (as a single piece of DNA) and insert into the plasmid backbone.

We're designing two variants of the synthesized sequence:

  • One based on hCMP
  • One based on full-length bovine kappa-casein (CSN3*B)

We know that hCMP has already been done by Kim2005 and we are replicating their experiment as best we can. If that fails then we know we're doing something wrong. If the hCMP version succeeds but the full-length fails then we're doing things right but something is inherently problematic with the full-length protein (maybe it is toxic to S. cerevisiae?).

SapI recognition sequences

We'll be using the DNA 2.0 Electra system to insert our synthesized sequence into a DNA 2.0 Electra plasmid backbone. The recognition sequences and cut positions are:

# [] denotes recognition sequences
# | denotes where the cut will happen

# 5' end - the cut will happen before the ATG 
[GCTCTTC]T|ATG
          -----
[CGAGAAG]A TAC|

# 3' end 
|GGT A[GAAGAGC]
-----
    |T[CTTCTCG]

# based on the above, what we actually add:
# 5'
GCTCTTCT
# _and_ the first three nucleotides 
# of our sequence to be inserted _must_ be ATG

# 3'
GGTAGAAGAGC

From page 4 of Electra PDF.
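
The cut geometry in the diagram above can be sketched in a few lines of Python. This is only an illustration under the assumption that the diagram is read correctly (top-strand cut 1 nt after the 5' site, top-strand cut 4 nt before the 3' site); the function name `sapi_excise` is hypothetical and not part of any DNA 2.0 tool.

```python
# Minimal sketch of where SapI cuts a flanked fragment, following the
# diagram above. `sapi_excise` is a hypothetical helper name.

SAPI_FWD = "GCTCTTC"  # recognition sequence as read on the top strand (5' end)
SAPI_REV = "GAAGAGC"  # reverse complement of the site, as it appears at the 3' end


def sapi_excise(fragment: str):
    """Return (insert, left_overhang, right_overhang) for the top strand.

    The overhangs are the three top-strand bases spanned by each
    staggered cut: ATG on the left and GGT on the right in this design.
    """
    seq = fragment.upper()
    fwd = seq.index(SAPI_FWD)        # 5' site
    rev = seq.rindex(SAPI_REV)       # 3' site
    left_cut = fwd + len(SAPI_FWD) + 1   # [GCTCTTC]T|ATG
    right_cut = rev - 4                  # |GGT A[GAAGAGC]
    insert = seq[left_cut:right_cut]
    return insert, seq[left_cut:left_cut + 3], seq[right_cut:right_cut + 3]


if __name__ == "__main__":
    # Toy fragment: 6-mer pad, 5' site, a tiny ORF, 3' site, 6-mer pad.
    toy = "TACACG" + "GCTCTTCT" + "ATGAAATAG" + "GGTAGAAGAGC" + "GTACCA"
    print(sapi_excise(toy))
```

With the toy fragment, the excised top strand starts at ATG and ends at the stop codon, which is why the first three nucleotides of the insert must be ATG.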

Alpha-factor secretion signal

There are several variations to choose from. There's the full-length version from DNA 2.0:

# FAKS - Alpha-factor full
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKREAEA

The sequence in bold (the first 19 residues, MRFPSIFTAVLFAASSALA) is cleaved off during translocation and secretion, whereas the rest is cleaved by Kex or Ste proteases. But are Kex and Ste proteases present in the secretion pathway or outside of the cell in large enough quantities to ensure cleavage of the remaining alpha-factor even during high protein expression?

Since a Kex2 cut site ("KR") is present in the native sequence (and Kex2 is present in yeast), we should at the very least remove everything after the KR:

# FAKS - Alpha-factor with everything after Kex2 site removed
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN
TTIASIAAKEEGVSLEKR

# Same but the actual nucleotide sequence
ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACA
CTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACTTAGATTTAGAAGGGGA
TTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATT
GCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA
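
One way to check that the nucleotide sequence really encodes the amino acid sequence above is to translate it and compare. The sketch below uses the standard genetic code; the names `translate` and `differences` are hypothetical helpers, and the sequences are copied from this page, so any transcription error here would carry over.

```python
# Translate the FAKS nucleotide sequence and compare it with the listed
# amino acid sequence. Helper names are hypothetical.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, ignoring whitespace and trailing partial codons."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def differences(expected: str, observed: str):
    """List (1-based position, expected residue, observed residue) mismatches."""
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(expected, observed)) if a != b]


FAKS_PROTEIN = (
    "MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFIN"
    "TTIASIAAKEEGVSLEKR"
)
FAKS_DNA = """
ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACA
CTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACTTAGATTTAGAAGGGGA
TTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATT
GCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA
"""

if __name__ == "__main__":
    print(differences(FAKS_PROTEIN, translate(FAKS_DNA)))
```

If the sequences are transcribed correctly, this reports two mismatches, (42, 'S', 'L') and (83, 'E', 'D'): the nucleotide sequence encodes L at position 42 and D at position 83 where the amino acid listing has S and E. That may be worth double-checking against the DNA 2.0 sequence before ordering.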

Here's the minimal version that does not rely on Kex/Ste:

# FAKS - Alpha-factor T (minimal)
MRFPSIFTAVLFAASSALA


DNA 2.0 provides 11 different secretion signals which were all tested against each-other in Gil2015 and the KP (Killer Protein) secretion sequence was found to be most effective for their protein:

# KP - Killer Protein secretion sequence
MTKPTQVLVRSVSILFFITLLHLVVA

We are not sure if these results will be similar for other proteins, and we do not know if this secretion signal is automatically cleaved in S. cerevisiae (if you're reading this you could try to find out :-).

We decided to go with the full-length alpha-factor with everything after the Kex2 site removed.

Partial kappa casein / CMP (caseinomacropeptide)

Here is the human kappa casein coding sequence used by Kim2005:

# coding sequence only (no signal sequence)
# https://www.ncbi.nlm.nih.gov/nuccore/M73628
# chymosin cut site in bold
EVQNQKQPACHENDERPFYQKTAPYVPMYYVPNSYPYYGTNLYQRRPAIAINNPYVPRTYYANPA
VVRPHAQIPQRQYLPNSHPPTVVRRPNLHPSFIAIPPKKIQDKIIIPTINTIATVEPTP
TPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA

In human kappa casein, chymosin cuts at FI, with the cut falling between F and I. After chymosin processing, residues 98 to 162 will be left:

# hCMP - length: 65
# https://www.ncbi.nlm.nih.gov/nuccore/M73628
IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA

The Kim2005 paper specifies that their hCMP consists of 64 amino acids, residues 106 to 169 of kappa casein. I can find no way of matching this to the sequence they reference. The cut site MUST be between F and I, so the oddest part of the mismatch is that they give the length of hCMP as 64 aa when it is actually 65. Another paper, Liu2008, agrees with us that hCMP is 65 aa.
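
The residue numbering above can be reproduced with a short script. This is a sketch that assumes the mature human kappa casein sequence listed earlier in this section is numbered from 1 at its first residue; `cleave_after` is a hypothetical helper name.

```python
# Reproduce the hCMP residue range by cutting the mature human kappa
# casein sequence between F and I. `cleave_after` is a hypothetical name.

HUMAN_KAPPA_CASEIN = (
    "EVQNQKQPACHENDERPFYQKTAPYVPMYYVPNSYPYYGTNLYQRRPAIAINNPYVPRTYYANPA"
    "VVRPHAQIPQRQYLPNSHPPTVVRRPNLHPSFIAIPPKKIQDKIIIPTINTIATVEPTP"
    "TPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA"
)


def cleave_after(protein: str, site: str, cut_offset: int = 1):
    """Cut `protein` inside the first occurrence of `site`.

    `cut_offset` is how many residues of `site` stay on the N-terminal
    fragment (1 means the cut falls between the first and second residue).
    Returns (n_fragment, c_fragment, 1-based number of the first residue
    of the C-terminal fragment).
    """
    cut = protein.index(site) + cut_offset
    return protein[:cut], protein[cut:], cut + 1


if __name__ == "__main__":
    para, hcmp, first = cleave_after(HUMAN_KAPPA_CASEIN, "FI")
    # Expect: first residue 98, last residue 162, length 65
    print(first, len(HUMAN_KAPPA_CASEIN), len(hcmp))
    print(hcmp)
```

Run on the sequence above, this gives a C-terminal fragment spanning residues 98 to 162, which is 65 residues long and matches the hCMP sequence listed here.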

We could try to express the bovine version of CMP, but since we're trying to replicate the Kim2005 protocol that is probably not a good idea. Anyway, here it is:

# bCMP derived from CSN3*B by taking everything after the cut site
# (the chymosin cut site is FM, between the F and M, in cows)
# http://www.ncbi.nlm.nih.gov/protein/AAQ87923.1
MAIPPKKNQDKTEIPTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV

Full-length bovine kappa-casein (CSN3*B)

QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVA
LINNQFLPYPYYAKPAAVRSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEI
PTINTIASGEPTSTPTIEAVESTVATLEASPEVIESPPEINTVQVTSTAV

(Caseinomacropeptide (CMP) in italic).

His tag

A his tag is added to simplify purification:

# His tag - Six histidines
HHHHHH

Stop codon

We need a translational terminator, aka a stop codon. The AA symbol for a stop codon is: *

There is some evidence that having a G nucleotide immediately after the stop codon in S. cerevisiae increases the effectiveness of the stop codon. The 3' SapI sequence (GGTAGAAGAGC) that directly follows the stop codon already begins with a G, so we didn't need to do anything extra.
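
A quick way to confirm which nucleotide follows the stop codon in a finished construct is to walk the reading frame from the ATG. The sketch below is only illustrative; `base_after_stop` is a hypothetical helper name and the toy sequence is made up from the flanks used on this page.

```python
# Find the nucleotide immediately after the first in-frame stop codon.
# `base_after_stop` is a hypothetical helper name.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def base_after_stop(seq: str, start: int) -> str:
    """Walk codons from `start` and return the base after the first stop codon.

    Returns an empty string if the stop codon is the very end of `seq`.
    """
    seq = seq.upper()
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[i + 3:i + 4]
    raise ValueError("no in-frame stop codon found")


if __name__ == "__main__":
    # 5' SapI sequence, a tiny ORF (ATG AAA CAT TAA), then the 3' SapI sequence.
    toy = "GCTCTTCT" + "ATGAAACATTAA" + "GGTAGAAGAGC"
    print(base_after_stop(toy, toy.index("ATG")))
```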

Putting it together

These variants use the nearly full-length alpha-factor secretion signal (cut off after the Kex2 site):

Full-length bovine variant

Sequence for codon-optimization (with comments):

# Extra random nucleotides since enzymes don't like cutting right at the end
TACACG

# SapI recognition sequence
GCTCTTCT

# FAKS - Nearly full-length alpha-factor (cut off after Kex2 site)
ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACA
CTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACTTAGATTTAGAAGGGGA
TTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATT
GCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA

# full-length bovine kappa (CSN3*B)
QEQNQEQPIRCEKDERFFSDKIAKYIPIQYVLSRYPSYGLNYYQQKPVALINNQFLPYPYYAKPAAV
RSPAQILQWQVLSNTVPAKSCQAQPTTMARHPHPHLSFMAIPPKKNQDKTEIPTINTIASGEPTSTP
TIEAVESTVATLEASPEVIESPPEINTVQVTSTAV

# His tag
HHHHHH

# Stop codon
*

# SapI recognition sequence
GGTAGAAGAGC

# Extra random nucleotides since enzymes don't like cutting right at the end
GTACCA
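
The layout above (pad, 5' SapI sequence, coding DNA ending in a stop codon, 3' SapI sequence, pad) can be sketched as a small assembly function with a few sanity checks. The flank sequences are the ones listed on this page; the function name `assemble` and the checks themselves are illustrative assumptions, not part of the IDT or DNA 2.0 tooling.

```python
# Assemble the synthesized fragment from an already codon-optimized
# coding sequence. `assemble` is a hypothetical helper name.

FIVE_PRIME_PAD = "TACACG"    # extra nucleotides before the 5' SapI sequence
SAPI_5 = "GCTCTTCT"          # 5' SapI recognition sequence plus spacer T
SAPI_3 = "GGTAGAAGAGC"       # 3' SapI sequence (starts with the G after the stop)
THREE_PRIME_PAD = "GTACCA"   # extra nucleotides after the 3' SapI sequence
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble(coding_dna: str) -> str:
    """Wrap a coding sequence (ATG ... stop) in the SapI flanks and pads."""
    cds = "".join(coding_dna.split()).upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence must start with ATG")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence must end with a stop codon")
    return FIVE_PRIME_PAD + SAPI_5 + cds + SAPI_3 + THREE_PRIME_PAD


if __name__ == "__main__":
    # Toy coding sequence: Met-His-His-stop
    print(assemble("ATGCATCATTAA"))
```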

We ran the parts specified as amino acids through the IDT codon optimizer for gBlocks and added the non-coding regions back:

TACACGGCTCTTCTATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAG
CTGCTCCAGTCAACACTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACT
TAGATTTAGAAGGGGATTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGT
TTATAAATACTACTATTGCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGACAGGAAC
AGAATCAGGAGCAGCCAATAAGATGTGAAAAGGACGAGAGATTTTTTTCTGACAAAATTGCGAAGTATA
TACCTATACAATACGTTTTGTCACGTTACCCTAGCTACGGCTTGAATTACTATCAGCAGAAACCCGTTG
CACTTATAAATAACCAATTCCTACCCTATCCATATTATGCAAAGCCTGCGGCTGTACGTTCACCTGCTC
AGATCCTGCAATGGCAGGTTCTTAGCAACACTGTTCCCGCAAAGTCCTGTCAAGCTCAGCCTACCACAA
TGGCGCGTCACCCCCACCCCCACCTGAGCTTTATGGCAATACCCCCTAAGAAAAACCAAGATAAGACAG
AAATCCCGACCATAAATACTATAGCGTCTGGCGAGCCCACCAGCACCCCGACTATTGAAGCTGTTGAGA
GTACGGTAGCGACACTAGAGGCCAGCCCTGAAGTCATCGAGTCTCCACCTGAGATAAACACCGTACAGG
TTACGAGTACGGCGGTGCATCACCATCATCATCATTAGGGTAGAAGAGCGTACCA

(initial random 6-mer, alpha factor, His tag, and SapI site in bold)

We ensured that no other SapI sites were present after codon optimization.
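
A check like this can be repeated with a short script that scans both strands for the SapI recognition sequence. This is a minimal sketch; `sapi_sites` and `find_all` are hypothetical helper names, and the toy input below is not one of the real constructs.

```python
# Scan a sequence for SapI recognition sites on both strands.
# Helper names are hypothetical.

SAPI_FWD = "GCTCTTC"  # site on the top strand
SAPI_REV = "GAAGAGC"  # site on the bottom strand, read on the top strand


def find_all(seq: str, motif: str):
    """Return all 0-based start positions of `motif`, including overlaps."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def sapi_sites(seq: str):
    """Return sorted (position, strand) pairs for every SapI site in `seq`."""
    seq = "".join(seq.split()).upper()
    sites = [(i, "+") for i in find_all(seq, SAPI_FWD)]
    sites += [(i, "-") for i in find_all(seq, SAPI_REV)]
    return sorted(sites)


if __name__ == "__main__":
    toy = "TACACG" + "GCTCTTCT" + "ATGAAATAG" + "GGTAGAAGAGC" + "GTACCA"
    # A correctly flanked construct should report exactly one "+" site near
    # the 5' end and one "-" site near the 3' end.
    print(sapi_sites(toy))
```

Pasting a full construct into `sapi_sites` and confirming that it returns exactly two entries is one way to re-run the check after any change to the codon-optimized region.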

hCMP variant

Sequence for codon-optimization (with comments):

# Extra random nucleotides since enzymes don't like cutting right at the end
TACACG

# SapI recognition sequence
GCTCTTCT

# FAKS - Nearly full-length alpha-factor (cut off after Kex2 site)
ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACA
CTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACTTAGATTTAGAAGGGGA
TTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATT
GCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA

# 65-AA hCMP sequence
IAIPPKKIQDKIIIPTINTIATVEPTPTPATEPTVDSVVTPEAFSESIITSTPETTTVAVTPPTA

# His tag
HHHHHH

# Stop codon
*

# SapI recognition sequence
GGTAGAAGAGC

# Extra random nucleotides since enzymes don't like cutting right at the end
GTACCA

Codon optimized DNA sequence (using the IDT codon optimizer for gBlocks) with SapI sites added:

TACACGGCTCTTCTATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAG
CTGCTCCAGTCAACACTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACT
TAGATTTAGAAGGGGATTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGT
TTATAAATACTACTATTGCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGAATTGCGA
TACCGCCAAAGAAGATTCAGGATAAAATAATAATACCTACTATAAACACAATAGCTACTGTTGAACCAA
CACCTACCCCCGCTACGGAGCCCACTGTGGATTCTGTCGTTACTCCAGAGGCTTTCAGCGAGTCAATAA
TTACCTCAACACCGGAAACCACTACGGTTGCCGTGACCCCTCCCACGGCTCACCATCATCACCATCATT
AAGGTAGAAGAGCGTACCA

(initial random 6-mer, alpha factor, His tag, and SapI site in bold)

We ensured that no other SapI sites were present.

Ordering

We are ordering these as IDT gBlocks since that is our cheapest available option.