GBlock design

From Real Vegan Cheese
Revision as of 03:55, 30 September 2014 by Rachel Linzer

Codon Optimization, Restriction Analysis and Flanking Sequences for Casein and FAM20C Kinase Genes

Overview

In this section, we use published amino acid sequences for our proteins of interest to design their DNA sequences in a form useful for downstream applications.

A detailed description of the human and bovine casein and FAM20C kinase proteins with which we work can be found in Molecular biology.

The design process includes codon optimization for expression in baker's yeast (Saccharomyces cerevisiae); restriction analysis to ensure our DNA sequences aren't cleaved by the restriction enzymes used in this project or by other common restriction enzymes; incorporation of the FAKS yeast alpha-factor secretion signal to direct our protein products out of yeast cells; addition of SapI restriction sites for Electra cloning; addition of BioBrick flanking sequences to allow future assembly standardization; and, where needed, disruption of Kex protease amino acid recognition sites.

The final DNA sequences are submitted to Integrated DNA Technologies (IDT) for synthesis as "gBlocks Gene Fragments".

Background behind each gBlock design stage

Codon optimization

Addition of FAKS yeast alpha-factor secretion signal sequence

Restriction analysis

Addition of flanking sequences: SapI restriction sites for Electra cloning & BioBrick sequences

Disruption of Kex protease amino acid recognition sites

Process of gBlock DNA design

Codon optimization

We use the IDT Codon Optimization Tool.

1) Select ‘gBlocks Gene Fragments’ for Product Type & ‘Amino Acids’ for Sequence Type

2) Select Codon organism: the tool lists 3 options for Saccharomyces cerevisiae. We use the one labeled 'Saccharomyces cerevisiae'.

3) Nbr of results: 5

4) Input the amino acid sequence from Molecular biology. Do not include the native N-terminal signal peptide sequence; it will be replaced by the yeast alpha-factor signal peptide later.

Due to the stochastic nature of the process, the tool returns different results every time you run it. DO NOT CLOSE THE RESULTS PAGE: you can use “Select Sequence” to fix any issues that come up during restriction analysis.

Addition of FAKS yeast alpha-factor secretion signal sequence

FULL FAKS SEQUENCE

Amino acid sequence:

MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFINTTIASIAAKEEGVSLEKREAEA

Codon-optimized DNA sequence from DNA 2.0 (shown without the initial ATG start codon):

 AGATTCCCATCTATTTTCACCGCTGTCTTGTTCGCTGCCTCCTCTGCATTGGCTGCCCCTGTTAACAC
 TACCACTGAAGACGAGACTGCTCAAATTCCAGCTGAAGCAGTTATCGGTTACTCTGACCTTGAGGGTGA
 TTTCGACGTCGCTGTTTTGCCTTTCTCTAACTCCACTAACAACGGTTTGTTGTTCATTAACACCACTA
 TCGCTTCCATTGCTGCTAAGGAAGAGGGTGTCTCTCTCGAGAAAAGAGAGGCCGAAGCT

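We use a secretion signal-empty version of the vector, so paste the FAKS DNA sequence in at the 5' end of the target gene DNA sequence. Leave out the initial ATG start codon of FAKS; the 5' flanking sequence described under "Addition of flanking sequences" already ends in ATG.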
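As a sanity check, the FAKS DNA sequence can be translated and compared with the amino acid sequence above. The following is a minimal sketch rather than part of the original workflow: the helper names `translate` and `add_faks` are hypothetical, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# FAKS amino acid sequence, including the initial methionine.
FAKS_AA = (
    "MRFPSIFTAVLFAASSALAAPVNTTTEDETAQIPAEAVIGYSDLEGDFDVAVLPFSNSTNNGLLFINTT"
    "IASIAAKEEGVSLEKREAEA"
)

# Codon-optimized FAKS DNA as listed on this page (no ATG start codon).
FAKS_DNA = (
    "AGATTCCCATCTATTTTCACCGCTGTCTTGTTCGCTGCCTCCTCTGCATTGGCTGCCCCTGTTAACAC"
    "TACCACTGAAGACGAGACTGCTCAAATTCCAGCTGAAGCAGTTATCGGTTACTCTGACCTTGAGGGTGA"
    "TTTCGACGTCGCTGTTTTGCCTTTCTCTAACTCCACTAACAACGGTTTGTTGTTCATTAACACCACTA"
    "TCGCTTCCATTGCTGCTAAGGAAGAGGGTGTCTCTCTCGAGAAAAGAGAGGCCGAAGCT"
)


def translate(dna):
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def add_faks(gene_dna):
    """Prepend the ATG-less FAKS DNA to a codon-optimized target gene (hypothetical helper)."""
    return FAKS_DNA + gene_dna.upper()


# The DNA should encode FAKS minus its initial methionine.
assert translate(FAKS_DNA) == FAKS_AA[1:]
```

Running `translate(add_faks(gene_dna))` on a finished design is a quick way to confirm that the target gene is still in frame with the signal peptide.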

Restriction analysis

Take the top result from the IDT codon optimization and check it for the restriction sites below. If you find any, try the next result from the codon optimization until you find a sequence that has none, or disrupt the sites you find by changing a codon here and there (see Manual optimization below).

Restriction sites to check for:

EcoRI site:   GAATTC
XbaI site:    TCTAGA
SpeI site:    ACTAGT
PstI site:    CTGCAG
NotI site:    GCGGCCGC
SapI site:    GCTCTTC and GAAGAGC (the SapI site is not palindromic, so check both orientations)
BglII site:   AGATCT
BamHI site:   GGATCC

Use NEBcutter to search for cutters in this list (Addgene does not have all of them):

Select “Only defined oligonucleotide sequences”, then click “define oligos” and fill in the sequences above. (As long as you hit “back” to fill in a new DNA sequence to scan, you will only need to fill in the list of restriction sites once.)

To make sure NEBcutter is detecting these restriction sites, you can input this test sequence containing all of them: GAATTCTCTAGAACTAGTCTGCAGGCGGCCGCGCTCTTCGAAGAGCAGATCTGGATCC
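The same check can be scripted as a plain substring search, which is handy when screening several codon optimization results in a row. This is only a sketch meant to complement NEBcutter, not replace it; `SITES` and `find_sites` are hypothetical names, and positions are reported 1-based.

```python
# Recognition sequences from the list above. SapI is not palindromic,
# so both orientations are searched.
SITES = {
    "EcoRI": ["GAATTC"],
    "XbaI": ["TCTAGA"],
    "SpeI": ["ACTAGT"],
    "PstI": ["CTGCAG"],
    "NotI": ["GCGGCCGC"],
    "SapI": ["GCTCTTC", "GAAGAGC"],
    "BglII": ["AGATCT"],
    "BamHI": ["GGATCC"],
}

# Test sequence from this page, containing every site once.
TEST_SEQUENCE = "GAATTCTCTAGAACTAGTCTGCAGGCGGCCGCGCTCTTCGAAGAGCAGATCTGGATCC"


def find_sites(dna):
    """Return {enzyme: [1-based start positions]} for every site found in dna."""
    dna = dna.upper()
    hits = {}
    for enzyme, patterns in SITES.items():
        positions = []
        for pattern in patterns:
            start = dna.find(pattern)
            while start != -1:
                positions.append(start + 1)
                # Step forward by one base so overlapping sites are also found.
                start = dna.find(pattern, start + 1)
        if positions:
            hits[enzyme] = sorted(positions)
    return hits


if __name__ == "__main__":
    for enzyme, positions in find_sites(TEST_SEQUENCE).items():
        print(enzyme, positions)
```

An empty dictionary means the sequence contains none of the listed sites.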

Manual optimization, as required

The automatic codon optimization can fail in at least two ways:

1) The IDT codon optimization may give a warning such as: “Additional Services Required This sequence contains a window of 20 bases starting at base 480 with a GC content of 5.0 %. Solution: Redesign this region to have a GC content greater than 15%.” NOTE: however many problematic regions there are, the warning appears only for the first one. After fixing it, check the sequence again for others.

2) Or you may not be able to find any codon optimized results that avoid all the restriction sites listed above.

You can solve both problems by changing a few codons by hand. Click the “Select Sequence” button below the IDT codon optimization results you want to improve. The next page will show you all possible codons for each amino acid in your sequence. Locate the problematic areas, and change one or more codons to fix the problem, selecting a codon that is close in frequency to the one the automated optimization had picked (hover your mouse over the codon to see the frequency). Avoid codons listed in gray, which are used less than 10% of the time in your organism.
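Because the IDT warning reports only the first low-GC region, it can save time to list all of them before editing codons. The sketch below assumes the window size (20 bases) and the threshold (15%) quoted in the warning above; IDT's actual acceptance rules may differ, and `low_gc_windows` is a hypothetical helper name.

```python
def low_gc_windows(dna, window=20, min_gc_percent=15):
    """List (1-based start, GC percent) for every window at or below the threshold.

    The warning asks for a GC content greater than 15%, so a window at
    exactly 15% is also reported (an assumption about how IDT applies it).
    """
    dna = dna.upper()
    hits = []
    for start in range(len(dna) - window + 1):
        chunk = dna[start:start + window]
        gc_count = chunk.count("G") + chunk.count("C")
        # Integer comparison avoids floating-point edge cases at the threshold.
        if gc_count * 100 <= min_gc_percent * window:
            hits.append((start + 1, 100 * gc_count / window))
    return hits


if __name__ == "__main__":
    # Hypothetical AT-rich stretch, for illustration only.
    print(low_gc_windows("A" * 19 + "G"))
```

Overlapping windows are reported separately, so one AT-rich stretch usually shows up as a run of consecutive start positions.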

Addition of flanking sequences: SapI restriction sites for Electra cloning & BioBrick sequences

We need to put the right restriction sites and flanking sequences around the codon-optimized coding sequences. Here are the sequences we put around our genes of interest:

TACACGGAATTCGCGGCCGCTTCTAGAGGCTCTTCTATG <FAKS-gene of interest> GGTAGAAGAGCTACTAGTAGCGGCCGCTGCAGGTACCA


To break that down a bit further:

TACACG : random 6-mer (many restriction endonucleases cut poorly when their site is close to the end of a DNA fragment)

GAATTC GCGGCCGC T TCTAGA G : BioBrick prefix

GCTCTTC T ATG : SapI site for Electra cloning system

<FAKS-gene of interest> : codon optimized DNA sequence

GGT A GAAGAGC : SapI site for Electra cloning system

T ACTAGT A GCGGCCG CTGCAG : BioBrick suffix (the NotI site GCGGCCGC shares its final C with the PstI site CTGCAG)

GTACCA : random 6-mer (the same one used in Craig's original hKcasein)
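The breakdown above can be checked against the full flanking sequences by concatenating the parts. This is a minimal sketch; `build_gblock` is a hypothetical helper, and the insert passed to it is assumed to be the ATG-less FAKS sequence followed by the codon-optimized gene.

```python
# Parts of the 5' flank, in order: random 6-mer, BioBrick prefix, SapI site + ATG.
PREFIX_PARTS = ["TACACG", "GAATTC", "GCGGCCGC", "T", "TCTAGA", "G", "GCTCTTC", "T", "ATG"]

# Parts of the 3' flank, in order: SapI site, BioBrick suffix, random 6-mer.
SUFFIX_PARTS = ["GGT", "A", "GAAGAGC", "T", "ACTAGT", "A", "GCGGCCG", "CTGCAG", "GTACCA"]

# Full flanking sequences as given on this page.
FIVE_PRIME_FLANK = "TACACGGAATTCGCGGCCGCTTCTAGAGGCTCTTCTATG"
THREE_PRIME_FLANK = "GGTAGAAGAGCTACTAGTAGCGGCCGCTGCAGGTACCA"

# The annotated parts should reassemble into the full flanks.
assert "".join(PREFIX_PARTS) == FIVE_PRIME_FLANK
assert "".join(SUFFIX_PARTS) == THREE_PRIME_FLANK


def build_gblock(insert_dna):
    """Wrap an insert (FAKS without ATG + gene of interest) in both flanks."""
    return FIVE_PRIME_FLANK + insert_dna.upper() + THREE_PRIME_FLANK


if __name__ == "__main__":
    # Hypothetical 6-base insert, for illustration only.
    print(build_gblock("AGATTC"))
```

The restriction scan should be run on the insert alone, since the flanks contain the listed sites by design.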

Disruption of Kex protease amino acid recognition sites

Kex2 protease recognition sites (Lysine/Arginine and Arginine/Arginine pairs) occur in the native sequences of three of our study proteins: bovine alpha casein S2, human kappa casein and FAM20C. We retain the Kex2 site within FAKS (the Lysine/Arginine pair in ...SLEKREAEA), where cleavage removes the signal peptide from our secreted proteins. See protease problems for details.

We disrupt these sites by changing them to amino acids that are as biochemically similar as possible: in the Kex- variants, all Lysine/Arginine and Arginine/Arginine pairs in the above caseins and FAM20C are replaced by Lysine/Lysine, while the Kex+ variants keep the native pairs. gBlock design then proceeds as above.
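The substitution can be sketched as a regular-expression replacement on the amino acid sequence, applied to the mature protein before FAKS is added so that the Kex2 site in the signal peptide is left intact. The function name `disrupt_kex2_sites` and the example peptide are hypothetical. The loop matters for runs such as RRR, where a single pass would leave a new Lysine/Arginine pair behind.

```python
import re

# A Kex2 recognition pair: Lys-Arg (KR) or Arg-Arg (RR).
KEX2_SITE = re.compile(r"[KR]R")


def disrupt_kex2_sites(protein):
    """Replace every KR and RR pair with KK, repeating until none remain."""
    protein = protein.upper()
    # One pass over "RRR" gives "KKR", which still contains KR, so keep going.
    while KEX2_SITE.search(protein):
        protein = KEX2_SITE.sub("KK", protein)
    return protein


if __name__ == "__main__":
    # Hypothetical peptide, for illustration only.
    print(disrupt_kex2_sites("MAKRLLRRRQ"))
```

The protein length is unchanged, so the Kex- sequence can go through the same codon optimization steps as the Kex+ one.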

Final DNA sequences and annotated gene maps for all project gBlocks

P.FAKS.bovAlphaS1.S

P.FAKS.bovAlphaS2(Kex+).S

P.FAKS.bovAlphaS2(Kex-).S

P.FAKS.Beta(A2).S (bov)

P.FAKS.bovBeta(B).S

P.FAKS.bovKappa.S

P.FAKS.humAlphaS1.S, version 1

P.FAKS.humAlphaS1.S, version 2

P.FAKS.humBeta.S

P.FAKS.humKappa(Kex+).S

P.FAKS.humKappa(Kex-).S

Sap.FAKS.hFam20C(Kex+).Sap

Sap.Faks.hFam20C(Kex-).Sap