GBlock design

From Real Vegan Cheese
Revision as of 06:55, 9 January 2016 by Patrik (talk | contribs) (→‎pYEGa)

Codon Optimization, Restriction Analysis and Flanking Sequences for Casein and FAM20C Kinase Genes


In this section, we use published amino acid sequences for our proteins of interest to design their DNA sequences in a form useful for downstream applications.

A detailed description of the human and bovine casein and FAM20C kinase proteins with which we work can be found in Molecular biology.

The design process includes codon optimization for expression in baker's yeast (Saccharomyces cerevisiae), restriction analysis to ensure our DNA sequences aren't cleaved by the restriction enzymes used in this project or by other common restriction enzymes, incorporation of the FAKS yeast alpha-factor secretion signal to direct our protein products out of the yeast cells, addition of restriction sites for Electra cloning, addition of BioBrick flanking sequences to allow future assembly standardization, and disruption of Kex protease amino acid recognition sites, as needed.

The final DNA sequences are submitted to Integrated DNA Technologies (IDT) for synthesis as "gBlocks Gene Fragments".

Background behind each gBlock design stage

Codon optimization

Addition of FAKS yeast alpha-factor secretion signal sequence

Restriction analysis

Addition of flanking sequences: SapI restriction sites for Electra cloning & BioBrick sequences

Disruption of Kex protease amino acid recognition sites

Process of gBlock DNA design

Codon optimization

We use the IDT Codon Optimization Tool

1) Select ‘gBlocks Gene Fragments’ for Product Type & ‘Amino Acids’ for Sequence Type

2) Select Codon organism: 3 options for Saccharomyces cerevisiae. We use 'Saccharomyces cerevisiae.'

3) Nbr of results: 5

4) Input the amino acid sequence from Molecular biology. Do not include the native N-terminal signal peptide sequence; it will be replaced by the yeast alpha-factor signal peptide later.

Due to the stochastic nature of the process, different results will come up every time you run the optimization tool. DO NOT CLOSE THE RESULTS PAGE - you can use “Select Sequence” to fix any issues that come up during restriction analysis.
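Because every run returns different DNA sequences, it can be worth confirming that the result you picked (or later edited by hand) still encodes the intended protein. The sketch below is not part of the IDT tool; it translates a DNA sequence with the standard genetic code and compares it to the amino acid sequence you entered. The function names `translate` and `encodes` are made up for this illustration.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases in T, C, A, G order).
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acids ('*' = stop)."""
    dna = dna.upper().replace(" ", "").replace("\n", "")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encodes(dna, protein):
    """True if the DNA translates to the given protein (a trailing stop codon is ignored)."""
    return translate(dna).rstrip("*") == protein.upper().rstrip("*")
```

For example, `encodes("ATGGCTAAAAGATAA", "MAKR")` is true, while changing any codon to one for a different amino acid makes it false.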

Addition of FAKS yeast alpha-factor secretion signal sequence


Amino acid sequence:


Codon-optimized DNA sequence from DNA 2.0 (without the leading M):


We use a version of the vector that has no secretion signal, so paste the FAKS DNA sequence in at the 5' end of the target gene DNA sequence. Leave off the initial ATG start codon of FAKS.

Restriction analysis

Take the top result from the IDT codon optimization, and check for the presence of the restriction sites below. If you find any, you can try the next result from the codon optimization until you find a sequence that has none. Alternatively, you can disrupt any sites you find by changing a codon here and there (see Manual optimization below).

restriction sites to check for:

EcoRI site:   GAATTC
XbaI site:    TCTAGA
SpeI site:    ACTAGT
PstI site:    CTGCAG
NotI site:    GCGGCCGC
SapI site:    GCTCTTC and GAAGAGC
BglII site:    AGATCT
BamHI site:    GGATCC

Use NEBcutter to search for cutters in this list (Addgene does not have all of them):

Select “Only defined oligonucleotide sequences”, then click “define oligos” and fill in the sequences above. (As long as you hit “back” to fill in a new DNA sequence to scan, you will only need to fill in the list of restriction sites once.)

To make sure NEBcutter is detecting these restriction sites, you can input this test sequence containing all of them: GAATTCTCTAGAACTAGTCTGCAGGCGGCCGCGCTCTTCGAAGAGCAGATCTGGATCC
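As a cross-check on NEBcutter, the same scan can be sketched in a few lines of Python. This is only a plain string search for the recognition sequences listed above, not a replacement for NEBcutter; the names `SITES` and `find_sites` are made up for this illustration. All sites in the list except SapI are palindromic, so searching one strand is enough; SapI is covered by listing both orientations, as above.

```python
# Recognition sequences from the list above. SapI is not palindromic,
# so both orientations are searched.
SITES = {
    "EcoRI": ["GAATTC"],
    "XbaI": ["TCTAGA"],
    "SpeI": ["ACTAGT"],
    "PstI": ["CTGCAG"],
    "NotI": ["GCGGCCGC"],
    "SapI": ["GCTCTTC", "GAAGAGC"],
    "BglII": ["AGATCT"],
    "BamHI": ["GGATCC"],
}


def find_sites(seq):
    """Return {enzyme: [1-based start positions]} for every site found in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, patterns in SITES.items():
        positions = []
        for pattern in patterns:
            start = seq.find(pattern)
            while start != -1:
                positions.append(start + 1)
                # Step by one base so overlapping occurrences are not missed.
                start = seq.find(pattern, start + 1)
        if positions:
            hits[enzyme] = sorted(positions)
    return hits


TEST_SEQUENCE = "GAATTCTCTAGAACTAGTCTGCAGGCGGCCGCGCTCTTCGAAGAGCAGATCTGGATCC"
```

Running `find_sites(TEST_SEQUENCE)` should report every enzyme in the list, and a sequence that is ready for ordering should return an empty dictionary.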

Manual optimization, as required

The automatic codon optimization can fail in at least two ways:

1) The IDT codon optimization may give a warning such as: “Additional Services Required This sequence contains a window of 20 bases starting at base 480 with a GC content of 5.0 %. Solution: Redesign this region to have a GC content greater than 15%.” NOTE: regardless of number of problematic areas, the warning comes up only for the first. After fixing, check sequence again for others.

2) Or you may not be able to find any codon optimized results that avoid all the restriction sites listed above.

You can solve both problems by changing a few codons by hand. Click the “Select Sequence” button below the IDT codon optimization results you want to improve. The next page will show you all possible codons for each amino acid in your sequence. Locate the problematic areas, and change one or more codons to fix the problem, selecting a codon that is close in frequency to the one the automated optimization had picked (hover your mouse over the codon to see the frequency). Avoid codons listed in gray, which are used less than 10% of the time in your organism.
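Since the IDT warning only reports the first problematic region, it can help to list all low-GC windows yourself before going back to the tool. The sketch below slides a 20-base window along the sequence and reports every window whose GC content is not greater than 15%, matching the wording of the warning quoted above. The function name `low_gc_windows` and the 1-based position convention are assumptions; IDT's own check may differ in detail.

```python
def low_gc_windows(seq, window=20, min_gc_percent=15):
    """Return [(1-based start, GC percent)] for every window with GC <= min_gc_percent."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        # Integer comparison avoids floating-point edge cases at the threshold.
        if gc * 100 <= min_gc_percent * window:
            hits.append((start + 1, 100.0 * gc / window))
    return hits
```

Adjacent hits usually belong to the same AT-rich stretch, so changing one or two codons in that stretch to a GC-richer synonym of similar frequency often clears several windows at once.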

Addition of flanking sequences: SapI restriction sites for Electra cloning & BioBrick sequences

We need to put the right restriction sites and padding around the codon optimized coding sequences. Here are the sequences we put around our genes of interest:


To break that down a bit further:

TACACG : random 6-mer (restriction endonucleases don't like cutting close to the end)


GCTCTTC T ATG : SapI site for Electra cloning system

<FAKS-gene of interest> : codon optimized DNA sequence

GGT A GAAGAGC : SapI site for Electra cloning system


GTACCA: random 6-mer (picked the same as in Craig's original hKcasein)
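The breakdown above can be written out as a simple concatenation. This is a minimal sketch that joins the pieces exactly in the order listed; the function name `add_flanks` and the constant names are made up for this illustration, and the insert is assumed to be the FAKS-gene coding sequence without its own leading ATG (the ATG is supplied by the 5' flank).

```python
FIVE_PRIME_PAD = "TACACG"    # random 6-mer so SapI is not cutting at the very end
SAPI_FORWARD = "GCTCTTC"     # SapI site, 5' side
SAPI_REVERSE = "GAAGAGC"     # SapI site, 3' side (reverse orientation)
THREE_PRIME_PAD = "GTACCA"   # random 6-mer


def add_flanks(insert):
    """Wrap a codon optimized FAKS-gene sequence (no leading ATG) in the flanks listed above."""
    insert = insert.upper()
    for site in (SAPI_FORWARD, SAPI_REVERSE):
        if site in insert:
            raise ValueError("insert contains an internal SapI site: " + site)
    return (
        FIVE_PRIME_PAD
        + SAPI_FORWARD + "T" + "ATG"
        + insert
        + "GGT" + "A" + SAPI_REVERSE
        + THREE_PRIME_PAD
    )
```

For a toy insert, `add_flanks("GCTGCT")` gives `TACACGGCTCTTCTATGGCTGCTGGTAGAAGAGCGTACCA`. The internal SapI check is there because an extra site inside the insert would cut the gene during cloning.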

Disruption of Kex protease amino acid recognition sites

The Kex2 protease amino acid recognition sites (Lysine/Arginine and Arginine/Arginine) occur in the native sequence of three of our study proteins: bovine alpha casein S2, human kappa casein and FAM20C. We will retain the Kex2 site that is needed to cleave the FAKS signal peptide from our secreted proteins. See protease problems for details.

We will disrupt these sites by substituting amino acids that are as biochemically similar as possible: all Lysine/Arginine and Arginine/Arginine pairs in the casein and FAM20C sequences above are replaced by Lysine/Lysine. This gives a Kex + variant (native sites intact) and a Kex - variant (sites disrupted) of each protein. gBlock design then proceeds as above.
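The substitution can be sketched on the amino acid sequence before codon optimization. The function name `disrupt_kex_sites` is made up for this illustration, and repeating the replacement until no KR or RR pair is left (so that a run such as RRR is fully converted) is an assumption, not something specified above. Apply it to the mature protein only, so that the Kex2 site of the FAKS signal peptide is left alone.

```python
def disrupt_kex_sites(protein):
    """Replace every Lys-Arg (KR) and Arg-Arg (RR) pair with Lys-Lys (KK)."""
    protein = protein.upper()
    # Repeat because replacing RR inside a longer run (e.g. RRR -> KKR)
    # can create a new KR pair. Each pass removes at least one R, so this ends.
    while "KR" in protein or "RR" in protein:
        protein = protein.replace("KR", "KK").replace("RR", "KK")
    return protein
```

For example, `disrupt_kex_sites("MAKRL")` and `disrupt_kex_sites("MARRL")` both return `MAKKL`, while `MARKL` is returned unchanged because Arg-Lys is not one of the listed sites.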

Final DNA sequences and annotated gene maps for all project gBlocks




P.FAKS.Beta(A2).S (bov)



P.FAKS.humAlphaS1.S, version 1

P.FAKS.humAlphaS1.S, version 2






Design bovine Kappa casein constructs based on Kim et al, 2005

We are having trouble getting kappa casein expressed. So let's try to mimic the approach taken in the Kim et al 2005 paper on expression of partial human kappa (except using the bovine sequence, and including a His tag).

  • pYEGa: Inducible, GAL 10 promoter, URA 3 marker, Amp resistance [17]
  • pYEGa-hCMP: Inducible, GAL 10 promoter, URA 3 marker, Amp resistance
  • S. cerevisiae 2805: MATa pep4::HIS3 pro1-d can1 GAL2 his3 d ura3–52

"General DNA manipulations were performed following standard techniques [10]. The hCMP gene was chemically synthesized based on the amino acid sequence [6] and Genbank database (Accession No. M73628). Table 2 shows the primer sequences for the PCR amplification of the hCMP gene with the restriction sites included for further gene manipulations. The gene for the a-factor secretion signal sequence was placed before the hCMP structural gene so that the signal peptide was fused to the 5' end of the hCMP and that the recombinant hCMP was secreted in the culture medium. In addition, the Kozak sequence was inserted into the position preceding the ATG initiation codon to facilitate an efficient translation in yeast [11]. For the construction of the inducible plasmid with pYEGa for S. cerevisiae, the hCMP gene was amplified using the primers hCMPECOaF (forward) and hCMPSALR (reverse), and digested with EcoR I and Sal I [...]. The resulting expression vectors were designated pYEGa-hCMP (inducible) and pYIGPhCMP (constitutive), respectively."

Reconstruction of Kim 2005 pYEGa-hCMP

Backbone plasmid

"Plasmid YEp352 which is a multicopy number yeast-E.coli shuttle vector containing pBR322 sequence, yeast URA3 and yeast 2u replication origin (10) was used for the backbone of the general yeast expression and secretion vectors. To construct a promoter-lacZ fusion plasmid, pBM258 (GAL1-GAL10 promoter) (14), [...] were used as promoter sources [...]. YEp70aT (11) was used as PCR template to obtain mating factor pre-pro leader sequence." [Sohn et al, 1991]


Construction of the pYEGa plasmid is described in [Sohn et al, 1991]. We can reconstruct exactly which sequence they used for the alpha factor pre-pro leader sequence by checking how their PCR primer sequences match the MFalpha1 gene sequence:

 Primer1: CAATGAATTCGATTAAAAGAATGAGATTTC (EcoRI restriction site: GAATTC)
 Primer2: TTTATCTAGAGATACCCCTTCTTCTTTAGC (XbaI restriction site: TCTAGA)
 MFalpha1: (overlap with primers in bold)
 >gi|330443753:193648-194145 Saccharomyces cerevisiae S288c chromosome XVI, complete sequence
 Resulting PCR product:

This PCR product was then digested at the built-in EcoRI and XbaI sites and spliced into the yeast expression vector, with the GAL10 promoter up front, and at the end a short adaptor sequence to supply the Kex2 site of the alpha factor, immediately followed by the foreign gene to be expressed (hirudin, in [Sohn et al, 1991]).
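The primer matching described above can also be done programmatically. The sketch below is a simplified model, not the procedure used in [Sohn et al, 1991]: it finds the longest 3' portion of each primer that matches the template exactly (the 5' tails carrying the restriction sites do not need to match) and returns the predicted PCR product. The function names and the minimum annealing length of 12 bases are assumptions made for this illustration.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def anneal(template, primer, min_len=12):
    """Return (start, length) of the longest 3' primer suffix found in template, or None."""
    for length in range(len(primer), min_len - 1, -1):
        pos = template.find(primer[-length:])
        if pos != -1:
            return pos, length
    return None


def pcr_product(template, forward, reverse, min_len=12):
    """Predict the PCR product, including any non-annealing 5' primer tails."""
    template, forward, reverse = template.upper(), forward.upper(), reverse.upper()
    fwd_hit = anneal(template, forward, min_len)
    rev_hit = anneal(revcomp(template), reverse, min_len)
    if fwd_hit is None or rev_hit is None:
        raise ValueError("a primer does not anneal to the template")
    fwd_end = fwd_hit[0] + fwd_hit[1]
    # Convert the reverse primer's position back to top-strand coordinates.
    rev_start = len(template) - (rev_hit[0] + rev_hit[1])
    if fwd_end > rev_start:
        raise ValueError("primer annealing regions overlap or are in the wrong order")
    return forward + template[fwd_end:rev_start] + revcomp(reverse)
```

To use it here, pass the MFalpha1 region as `template` and Primer1 and Primer2 as `forward` and `reverse`; the EcoRI and XbaI sites from the primer tails then appear near the two ends of the returned product.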

The resulting sequence between the GAL10 promoter and the foreign gene is as follows:



It is unclear from the Kim paper exactly how the hCMP gene was added into the pYEGa vector. The methods section says that the hCMP gene was amplified with primers that add an EcoRI site at the 5' end and a SalI site at the 3' end, and then digested with those restriction enzymes. However, the pYEGa-1 vector described in [Sohn et al, 1991] has no EcoRI/SalI cloning site where such a fragment could be inserted. Rather, there is an EcoRI site in front of the alpha factor leader sequence, and a SalI site after the foreign gene (hirudin), so digesting with those restriction enzymes would remove both the hirudin gene and the alpha factor.