gBlock design
Codon Optimization, Restriction Analysis and Flanking Sequences for Casein and FAM20C Kinase Genes
Overview
In this section, we use published amino acid sequences for our proteins of interest to design their DNA sequences in a form useful for downstream applications.
A detailed description of the human and bovine casein and FAM20C kinase proteins with which we work can be found in Molecular biology.
The design process includes codon optimization for expression in baker's yeast (Saccharomyces cerevisiae); restriction analysis to ensure that our DNA sequences are not cleaved by the restriction enzymes used in this project or by other common restriction enzymes; incorporation of the FAKS yeast alpha-factor secretion signal to direct our protein products out of the yeast cells; addition of SapI restriction sites for Electra cloning; addition of BioBrick flanking sequences to allow future assembly standardization; and disruption of Kex protease amino acid recognition sites, as needed.
The final DNA sequences are submitted to Integrated DNA Technologies (IDT) for synthesis as "gBlocks Gene Fragments".
Background behind each gBlock design stage
Codon optimization
Addition of FAKS yeast alpha-factor secretion signal sequence
Restriction analysis
Addition of flanking sequences: SapI restriction sites for Electra cloning & BioBrick sequences
Disruption of Kex protease amino acid recognition sites
Process of gBlock DNA design
Codon optimization
We use the IDT Codon Optimization Tool.
1) Select ‘gBlocks Gene Fragments’ for Product Type & ‘Amino Acids’ for Sequence Type
2) Select the codon organism. There are 3 options for Saccharomyces cerevisiae; we use 'Saccharomyces cerevisiae'.
3) Nbr of results: 5
4) Input the amino acid sequence from Molecular biology. Do not input the native N-terminal signal peptide sequence; it will be replaced by the yeast alpha-factor signal peptide later.
Because the optimization is stochastic, the tool returns different results every time you run it. DO NOT CLOSE THE RESULTS PAGE: you can use “Select Sequence” to fix any issue that comes up during restriction analysis.
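Because each run returns different codon choices, it can be reassuring to confirm that the result you pick still encodes the intended protein. The following is a minimal Python sketch, not part of the IDT tool: it builds the standard genetic code table and translates a DNA sequence so that it can be compared with the amino acid sequence you entered. The function names `translate` and `encodes` are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence (frame 1) into one-letter amino acids."""
    dna = "".join(dna.upper().split())  # drop whitespace from pasted sequences
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encodes(dna, protein):
    """Return True if the DNA translates exactly to the given protein."""
    return translate(dna) == protein.upper()


# Example: the first four codons of the FAKS DNA sequence used on this page.
print(translate("AGATTCCCATCT"))  # RFPS
```

Running `encodes(optimized_dna, protein)` on each candidate before moving on to restriction analysis catches copy-and-paste mistakes early.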
Addition of FAKS yeast alpha-factor secretion signal sequence
FULL FAKS SEQUENCE
Amino acid sequence:
MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFINTTIASIAAKEEGVSLEKREAEA
Codon-optimized DNA sequence from DNA 2.0 (without the leading M):
AGATTCCCATCTATTTTCACCGCTGTCTTGTTCGCTGCCTCCTCTGCATTGGCTGCCCCTGTTAACACTACCACTGAAGACGAGACTGCTCAAATTCCAGCTGAAGCAGTTATCGGTTACTCTGACCTTGAGGGTGATTTCGACGTCGCTGTTTTGCCTTTCTCTAACTCCACTAACAACGGTTTGTTGTTCATTAACACCACTATCGCTTCCATTGCTGCTAAGGAAGAGGGTGTCTCTCTCGAGAAAAGAGAGGCCGAAGCT
We use a version of the vector that has no secretion signal, so paste the FAKS DNA sequence in at the 5' end of the target gene DNA sequence. Leave out the initial ATG start codon of FAKS: the ATG is supplied by the 5' SapI flanking sequence added later.
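The joining step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `add_secretion_signal` is hypothetical, and the check that strips a leading ATG assumes the FAKS sequence passed in either starts with the start codon or has already had it removed (as in the DNA 2.0 sequence above, which starts with AGA).

```python
def add_secretion_signal(faks_dna, gene_dna):
    """Fuse the FAKS signal sequence (without its ATG) to the 5' end of a gene."""
    faks = "".join(faks_dna.upper().split())
    gene = "".join(gene_dna.upper().split())
    # Assumption: a leading ATG on the FAKS input is the start codon to drop.
    if faks.startswith("ATG"):
        faks = faks[3:]
    # Both parts must be whole codons so the gene stays in frame with FAKS.
    if len(faks) % 3 or len(gene) % 3:
        raise ValueError("FAKS and gene sequences must each be a multiple of 3")
    return faks + gene


# Toy example with short placeholder sequences (not real genes).
print(add_secretion_signal("ATGAGATTC", "GCTGCT"))  # AGATTCGCTGCT
```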
Restriction analysis
Take the top result from the IDT codon optimization and check it for the restriction sites listed below. If you find any, either try the next codon optimization result until you find a sequence that has none, or disrupt the sites you find by changing a codon here and there (see Manual optimization below).
Restriction sites to check for:
EcoRI site: GAATTC
XbaI site: TCTAGA
SpeI site: ACTAGT
PstI site: CTGCAG
NotI site: GCGGCCGC
SapI site: GCTCTTC and GAAGAGC
BglII site: AGATCT
BamHI site: GGATCC
Use NEBcutter to search for cutters in this list (Addgene does not have all of them):
Select “Only defined oligonucleotide sequences”, then click “define oligos” and fill in the sequences above. (As long as you hit “back” to fill in a new DNA sequence to scan, you will only need to fill in the list of restriction sites once.)
To make sure NEBcutter is detecting these restriction sites, you can input this test sequence containing all of them: GAATTCTCTAGAACTAGTCTGCAGGCGGCCGCGCTCTTCGAAGAGCAGATCTGGATCC
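As a cross-check on NEBcutter, the same scan can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for NEBcutter: it searches for each recognition sequence and its reverse complement by plain string matching, which is why SapI (the only non-palindromic site in the list) is found in both orientations. The names `SITES` and `find_sites` are illustrative.

```python
# Recognition sequences from the list above.
SITES = {
    "EcoRI": "GAATTC", "XbaI": "TCTAGA", "SpeI": "ACTAGT", "PstI": "CTGCAG",
    "NotI": "GCGGCCGC", "SapI": "GCTCTTC", "BglII": "AGATCT", "BamHI": "GGATCC",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(dna, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site found."""
    dna = "".join(dna.upper().split())
    hits = {}
    for enzyme, site in sites.items():
        # A set avoids double counting palindromic sites.
        patterns = {site, reverse_complement(site)}
        positions = sorted(
            i for p in patterns
            for i in range(len(dna) - len(p) + 1)
            if dna.startswith(p, i)
        )
        if positions:
            hits[enzyme] = positions
    return hits


# The test sequence from this page should trigger every enzyme.
TEST_SEQ = "GAATTCTCTAGAACTAGTCTGCAGGCGGCCGCGCTCTTCGAAGAGCAGATCTGGATCC"
for enzyme, positions in find_sites(TEST_SEQ).items():
    print(enzyme, positions)
```

An empty dictionary from `find_sites` means none of the listed sites is present in the sequence.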
Manual optimization, as required
The automatic codon optimization can fail in at least two ways:
1) The IDT codon optimization may give a warning such as: “Additional Services Required This sequence contains a window of 20 bases starting at base 480 with a GC content of 5.0 %. Solution: Redesign this region to have a GC content greater than 15%.” NOTE: regardless of the number of problematic areas, the warning appears only for the first one. After fixing it, check the sequence again for others.
2) Or you may not be able to find any codon optimized results that avoid all the restriction sites listed above.
You can solve both problems by changing a few codons by hand. Click the “Select Sequence” button below the IDT codon optimization results you want to improve. The next page will show you all possible codons for each amino acid in your sequence. Locate the problematic areas, and change one or more codons to fix the problem, selecting a codon that is close in frequency to the one the automated optimization had picked (hover your mouse over the codon to see the frequency). Avoid codons listed in gray, which are used less than 10% of the time in your organism.
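Since the GC content warning is shown for only one region at a time, it can help to list every low-GC window up front. The sketch below uses the 20-base window and the 15% threshold quoted in the example warning; these values are taken from that message only, and the full set of rules IDT applies is not known here. The function name `low_gc_windows` is illustrative.

```python
def low_gc_windows(dna, window=20, min_gc_percent=15.0):
    """List (1-based start, GC %) for every window at or below the threshold."""
    dna = "".join(dna.upper().split())
    flagged = []
    for start in range(len(dna) - window + 1):
        chunk = dna[start:start + window]
        gc_percent = 100 * (chunk.count("G") + chunk.count("C")) / window
        # The warning asks for GC content greater than 15%, so 15% is flagged too.
        if gc_percent <= min_gc_percent:
            flagged.append((start + 1, gc_percent))
    return flagged


# Toy sequence: an AT-rich stretch followed by a GC-rich stretch.
toy = "A" * 19 + "G" + "GC" * 10
for start, gc in low_gc_windows(toy):
    print(f"window starting at base {start}: GC {gc:.1f} %")
```

Overlapping windows that are flagged together usually point to a single AT-rich region, so changing one or two codons in that region often clears several entries at once.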
Addition of flanking sequences: SapI restriction sites for Electra cloning & BioBrick sequences
We need to put the right restriction sites and other flanking elements around the codon-optimized coding sequences. Here are the sequences we put around our genes of interest:
TACACGGAATTCGCGGCCGCTTCTAGAGGCTCTTCTATG <FAKS-gene of interest> GGTAGAAGAGCTACTAGTAGCGGCCGCTGCAGGTACCA
To break that down a bit further:
TACACG : random 6-mer (restriction endonucleases cut poorly close to the end of a DNA fragment)
GAATTC GCGGCCGC T TCTAGA G : BioBrick prefix
GCTCTTC T ATG : SapI site for Electra cloning system
<FAKS-gene of interest> : codon optimized DNA sequence
GGT A GAAGAGC : SapI site for Electra cloning system
T ACTAGT A GCGGCCG CTGCAG : BioBrick suffix
GTACCA: random 6-mer (picked the same as in Craig's original hKcasein)
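The breakdown above can be written out as a small Python sketch that assembles the flanks from their parts and wraps them around a coding sequence. The constant and function names are illustrative; the coding sequence passed in is assumed to be the FAKS-gene fusion without its ATG, because the 5' flank already ends in ATG.

```python
# 5' flank: random 6-mer + BioBrick prefix + SapI site ending in ATG.
FIVE_PRIME = "TACACG" + "GAATTCGCGGCCGCTTCTAGAG" + "GCTCTTCTATG"
# 3' flank: SapI site + BioBrick suffix + random 6-mer.
THREE_PRIME = "GGTAGAAGAGC" + "TACTAGTAGCGGCCGCTGCAG" + "GTACCA"


def build_gblock(coding_dna):
    """Wrap a codon-optimized coding sequence in the project flanking sequences."""
    coding = "".join(coding_dna.upper().split())
    if len(coding) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return FIVE_PRIME + coding + THREE_PRIME


# Toy example using the first two FAKS codons as a stand-in coding sequence.
print(build_gblock("AGATTC"))
```

Note that the flanks deliberately contain EcoRI, XbaI, SpeI, PstI, NotI and SapI sites, so restriction analysis should be run on the coding sequence alone, not on the assembled gBlock.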
Disruption of Kex protease amino acid recognition sites
The Kex2 protease amino acid recognition sites (Lysine/Arginine and Arginine/Arginine) occur in the native sequence of three of our study proteins: bovine alpha casein S2, human kappa casein and FAM20C. We retain the Kex2 site within FAKS, because cleavage there removes the FAKS signal peptide from our secreted proteins. See protease problems for details.
We disrupt the sites within the proteins themselves by substituting amino acids that are as biochemically similar as possible: all Lysine/Arginine and Arginine/Arginine pairs in the casein and FAM20C sequences named above are replaced by Lysine/Lysine. Constructs that keep the native sites are labeled Kex+, and constructs with the sites disrupted are labeled Kex-. gBlock design then proceeds as above.
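The substitution rule can be sketched in Python as below. This is a minimal illustration, applied to the mature protein sequence only (before FAKS is added, so that the FAKS Kex2 site is kept). The handling of overlapping sites such as KRR, where the replacement is repeated until no site remains, is an assumption of this sketch and is not stated above. Function names are illustrative.

```python
import re

KEX2_SITE = re.compile(r"[KR]R")  # Lys-Arg or Arg-Arg


def find_kex_sites(protein):
    """Return 1-based positions of every KR or RR pair, including overlaps."""
    return [m.start() + 1 for m in re.finditer(r"(?=[KR]R)", protein.upper())]


def disrupt_kex_sites(protein):
    """Replace every KR and RR pair with KK (the Kex- variant)."""
    protein = protein.upper()
    # Repeat because one replacement can expose a new site, e.g. KRR -> KKR.
    while KEX2_SITE.search(protein):
        protein = KEX2_SITE.sub("KK", protein)
    return protein


# Toy peptide, not one of the project proteins.
print(find_kex_sites("AKRAARRA"))     # [2, 6]
print(disrupt_kex_sites("AKRAARRA"))  # AKKAAKKA
```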
Final DNA sequences and annotated gene maps for all project gBlocks
P.FAKS.bovAlphaS1.S
P.FAKS.bovAlphaS2(Kex+).S
P.FAKS.bovAlphaS2(Kex-).S
P.FAKS.Beta(A2).S (bov)
P.FAKS.bovBeta(B).S
P.FAKS.bovKappa.S
P.FAKS.humAlphaS1.S, version 1
P.FAKS.humAlphaS1.S, version 2
P.FAKS.humBeta.S
P.FAKS.humKappa(Kex+).S
P.FAKS.humKappa(Kex-).S
Sap.FAKS.hFam20C(Kex+).Sap
Sap.FAKS.hFam20C(Kex-).Sap
Design bovine Kappa casein constructs based on Kim et al, 2005
We are having trouble getting kappa casein expressed. So let's try to mimic the approach taken in the Kim et al 2005 paper on expression of partial human kappa (except using the bovine sequence, and including a His tag).
- pYEGa: Inducible, GAL 10 promoter, URA 3 marker, Amp resistance [17]
- pYEGa-hCMP: Inducible, GAL 10 promoter, URA 3 marker, Amp resistance
- S. cerevisiae 2805: MATa pep4::HIS3 pro1-d can1 GAL2 his3 d ura3–52
"General DNA manipulations were performed following standard techniques [10]. The hCMP gene was chemically synthesized based on the amino acid sequence [6] and Genbank database (Accession No. M73628). Table 2 shows the primer sequences for the PCR amplification of the hCMP gene with the restriction sites included for further gene manipulations. The gene for the a-factor secretion signal sequence was placed before the hCMP structural gene so that the signal peptide was fused to the 5' end of the hCMP and that the recombinant hCMP was secreted in the culture medium. In addition, the Kozak sequence was inserted into the position preceding the ATG initiation codon to facilitate an efficient translation in yeast [11]. For the construction of the inducible plasmid with pYEGa for S. cerevisiae, the hCMP gene was amplified using the primers hCMPECOaF (forward) and hCMPSALR (reverse), and digested with EcoR I and Sal I [...]. The resulting expression vectors were designated pYEGa-hCMP (inducible) and pYIGPhCMP (constitutive), respectively."
Reconstruction of Kim 2005 pYEGa-hCMP
Backbone plasmid
"Plasmid YEp352 which is a multicopy number yeast-E.coli shuttle vector containing pBR322 sequence, yeast URA3 and yeast 2u replication origin (10) was used for the backbone of the general yeast expression and secretion vectors. To construct a promoter-lacZ fusion plasmid, pBM258 (GAL1-GAL10 promoter) (14), [...] were used as promoter sources [...]. YEp70aT (11) was used as PCR template to obtain mating factor pre-pro leader sequence." [Sohn et al, 1991]
pYEGa
Construction of the pYEGa plasmid is described in [Sohn et al, 1991]. We can reconstruct exactly which sequence they used for the alpha factor pre-pro leader sequence by checking how their PCR primer sequences match the MFalpha1 gene sequence:
Primer1: CAATGAATTCGATTAAAAGAATGAGATTTC (contains the EcoRI restriction site, GAATTC)
Primer2: TTTATCTAGAGATACCCCTTCTTCTTTAGC (contains the XbaI restriction site, TCTAGA)
MFalpha1: the 3' end of Primer1 (ATGAGATTTC) overlaps the first 10 bases of the gene. The reverse complement of Primer2 (GCTAAAGAAGAAGGGGTATCTCTAGATAAA) overlaps the gene starting at GCTAAAGAAGAAGGGGTATCT, apart from two mismatches that create the XbaI site.
>gi|330443753:193648-194145 Saccharomyces cerevisiae S288c chromosome XVI, complete sequence
ATGAGATTTCCTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACA
CTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACTTAGATTTAGAAGGGGA
TTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATT
GCCAGCATTGCTGCTAAAGAAGAAGGGGTATCTTTGGATAAAAGAGAGGCTGAAGCTTGGCATTGGTTGC
AACTAAAACCTGGCCAACCAATGTACAAGAGAGAAGCCGAAGCTGAAGCTTGGCATTGGCTGCAACTAAA
GCCTGGCCAACCAATGTACAAAAGAGAAGCCGACGCTGAAGCTTGGCATTGGCTGCAACTAAAGCCTGGC
CAACCAATGTACAAAAGAGAAGCCGACGCTGAAGCTTGGCATTGGTTGCAGTTAAAACCCGGCCAACCAA
TGTACTAA
Resulting PCR product (the first line is Primer1, the last line is the reverse complement of Primer2):
CAATGAATTCGATTAAAAGAATGAGATTTC
CTTCAATTTTTACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACA
CTACAACAGAAGATGAAACGGCACAAATTCCGGCTGAAGCTGTCATCGGTTACTTAGATTTAGAAGGGGA
TTTCGATGTTGCTGTTTTGCCATTTTCCAACAGCACAAATAACGGGTTATTGTTTATAAATACTACTATT
GCCAGCATTGCT
GCTAAAGAAGAAGGGGTATCTCTAGATAAA
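The reconstruction of the PCR product from the two primers and the template can be sketched in Python as follows. This is a simplified model under stated assumptions: each primer is taken to anneal through an exact match of a fixed number of bases at its 3' end (the `anneal` parameter, set to 10 here for illustration), and the function name `pcr_product` is hypothetical. The template in the example is a shortened stand-in, not the full MFalpha1 sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def pcr_product(template, fwd_primer, rev_primer, anneal=10):
    """Predict the product: forward primer + template between sites + rev-comp reverse primer."""
    template = "".join(template.upper().split())
    rev_rc = reverse_complement(rev_primer)
    # The 3' end of the forward primer must match the template exactly.
    start = template.find(fwd_primer[-anneal:])
    if start < 0:
        raise ValueError("forward primer 3' end not found in template")
    # The 3' end of the reverse primer is the 5' end of its reverse complement.
    end = template.find(rev_rc[:anneal], start + anneal)
    if end < 0:
        raise ValueError("reverse primer 3' end not found in template")
    return fwd_primer + template[start + anneal:end] + rev_rc


PRIMER1 = "CAATGAATTCGATTAAAAGAATGAGATTTC"
PRIMER2 = "TTTATCTAGAGATACCCCTTCTTCTTTAGC"
# Shortened stand-in template: gene start + 8 bases + the Primer2 binding region.
toy_template = "ATGAGATTTC" + "CTTCAATT" + "GCTAAAGAAGAAGGGGTATCTTTGGATAAAAGA"
product = pcr_product(toy_template, PRIMER1, PRIMER2)
print(product)
```

Passing the full MFalpha1 sequence as the template should reproduce the PCR product listed above, with the EcoRI and XbaI sites carried in on the primer tails.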