RaGene: Unlocking the Potential of AI-Powered Gene Optimization

By

Proteinea Team

November 27, 2022

RaGene Overview

RaGene is an AI-driven gene optimization platform specializing in codon optimization, with significant promise to address challenges and deliver innovations in protein expression, mRNA therapeutics, cell therapy, and gene therapy.

Diverging from conventional rule-based methods that hinge on predetermined criteria like codon usage frequency, secondary structure, and mRNA stability, RaGene introduces a next-gen sequence optimization approach. The AI-powered platform addresses existing limitations by embracing biological complexity, extracting and predicting host and protein-specific features, and achieving broad generalization across species and protein types.

RaGene excels in optimizing multiple parameters, including expression yield, solubility, and stability, providing a unique solution for enhancing therapeutic protein expression without compromising quality. As a licensable technology, RaGene holds the potential to streamline cost-effective market entry for new therapeutics, enhancing accessibility to critical and high-impact drugs.

Codon Optimization, Why?

As established in molecular biology, it all starts with DNA, the genetic code of life. A gene encodes information that is transcribed into mRNA and then translated into the corresponding protein (DNA → mRNA → Protein). However, a given protein can be encoded by many different DNA sequences, and not all of them are equally efficient: some coding sequences express a protein better and faster than others. This is a particular problem when the protein of interest is expressed in a heterologous host rather than its native one, where some codons are not easily read by the expression host, leading to low production yields. This is exactly where codon optimization comes in: designing the best possible DNA coding sequence for the protein of interest in the chosen host, with the aim of maximizing yield.
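To make the DNA → mRNA → Protein flow concrete, here is a minimal Python sketch showing that two different coding sequences can yield the same protein. The function names and the two short example sequences are our own illustrations, not part of RaGene; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Two hypothetical coding sequences that differ only in synonymous codons.
variant_a = "ATGCCTAAATAA"
variant_b = "ATGCCGAAGTAA"
print(translate(transcribe(variant_a)))  # MPK
print(translate(transcribe(variant_b)))  # MPK
```

Both variants encode the same three-residue peptide, which is why the choice among them can be made purely on how well the host expresses each one.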

Each protein consists of a chain of amino acids, called a polypeptide chain. Each amino acid (AA) in the chain is encoded by three successive DNA nucleotides known as a “codon”. A single AA can be encoded by several codons, known as “synonymous codons”. Some synonymous codons, however, are favored by a host organism (e.g. bacteria, yeast, insects, or human cells) over others. For example, the amino acid proline (P) is encoded by four synonymous codons: CCG, CCC, CCT, and CCA. While CCC is the codon most preferred by human cells, the bacterium E. coli favors CCG. Accordingly, codon optimization algorithms select the most frequently used codons for the protein sequence based on the expression host of choice.
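The proline example can be sketched in a few lines of Python. This is a deliberately naive, rule-based illustration of host-preferred codon selection, not RaGene's method: it only handles proline, the host labels and function names are hypothetical, and a real tool would use a full codon usage table for every amino acid.

```python
# Synonymous codons for proline, as listed in the text.
PROLINE_CODONS = {"CCG", "CCC", "CCT", "CCA"}

# Host preferences taken from the example in the text. The host keys are
# illustrative labels; a real table would cover all amino acids.
PREFERRED_PROLINE = {"human": "CCC", "e_coli": "CCG"}


def split_codons(cds: str) -> list[str]:
    """Split a coding sequence into successive three-nucleotide codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def prefer_proline_codon(cds: str, host: str) -> str:
    """Swap every proline codon for the one the given host favors."""
    preferred = PREFERRED_PROLINE[host]
    return "".join(
        preferred if codon in PROLINE_CODONS else codon
        for codon in split_codons(cds.upper())
    )


# ATG (Met), CCT (Pro), CCA (Pro), TAA (stop)
example = "ATGCCTCCATAA"
print(prefer_proline_codon(example, "human"))   # ATGCCCCCCTAA
print(prefer_proline_codon(example, "e_coli"))  # ATGCCGCCGTAA
```

The encoded amino acid sequence is unchanged in both outputs; only the synonymous codon choice differs by host.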

In addition to the codon usage explained above, other critical factors play a decisive role in protein expression. These include mRNA secondary structures, GC content, and mRNA destabilizing motifs, among others. Therefore, modern codon optimization tools take all of these factors into consideration.
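Of these factors, GC content is the simplest to compute. The short Python sketch below (function name and example sequences are our own, for illustration only) shows how two synonymous coding sequences for the same peptide can differ in GC content, which is one reason codon choice cannot be made on usage frequency alone.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C nucleotides in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


# Two hypothetical synonymous sequences: Met-Pro-Pro-stop in both cases.
with_cct_cca = "ATGCCTCCATAA"
with_ccg_ccg = "ATGCCGCCGTAA"
print(round(gc_content(with_cct_cca), 3))  # 0.417
print(round(gc_content(with_ccg_ccg), 3))  # 0.583
```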

“There is major room for innovation beyond rational codon optimization”

Although the attempt to produce high protein yields by changing codon assignments has resulted in the widespread usage of DNA codon optimization, several studies have revealed that synonymous codon alterations made via the rational approach might have adverse implications. This is because the native DNA code includes several layers of information that overlay the amino acid sequence, and this “natural” complexity can be disturbed by rational codon optimization methods. For example, ribosomal translation of an mRNA should accelerate and decelerate at certain regions to ensure correct protein folding. When such natural information is violated, consequences such as altered protein conformation and stability, as well as changed sites of post-translational modification, have been observed. These changes ultimately affect protein function.

Moreover, certain possible concerns are linked with the use of rational codon optimization for the production of recombinant therapeutic proteins, such as the formation of anti-drug antibodies, which can impair therapeutic effectiveness and trigger allergic responses. Therefore, there is a critical need for alternative and safer codon optimization approaches that overcome the drawbacks of rational methods. Proteinea is here to fulfill this need with its cutting-edge Artificial Intelligence (AI) technology.

RaGene's Existing Features

  1. Compatibility with any host: By incorporating multi-omic host-level data, RaGene achieves recombinant protein-host system harmony. Across commonly used and customized cell lines, RaGene generates host-specialized codon sequences that account for evolutionary relationships between hosts and protein classes
  2. Protein-agnostic optimization: RaGene's optimization capabilities extend to naturally occurring and synthetic/engineered protein types. Our models are specialized per protein class, tailoring sequence optimization based on a profound understanding of both protein-specific and evolutionary characteristics
  3. Superior & reliable yield increase: RaGene has established performance standards that reach remarkable heights: up to a 9-fold increase in expression. But our commitment doesn't end there. We prioritize reliability, ensuring every run of our pipeline maintains protein integrity while maximizing expression potential
  4. Scaled high throughput delivery: RaGene is underpinned by a robust computing infrastructure, ensuring both data security and scalability, to effortlessly accommodate a high volume of optimization orders while offering a seamless usability experience

Case studies and Performance Benchmarks

RaGene is heavily benchmarked against existing commercial alternatives and has been experimentally tested by several independent entities. Below, we showcase only a few of the benchmarks performed by third-party commercial partners. The results span different protein types, antibody classes, expression hosts and strains, as well as the different testing pipelines, plasmid maps, media, and conditions adopted by each entity, supporting the generalization abilities of RaGene across commercial users with variable demands.

Tested on 3 complex monoclonal antibodies in HEK293 that had already been optimized using the CRO’s adopted commercial codon optimization tool, RaGene’s optimized variants increased the yield to between 2.2 and 9 times that of the reference sequences. RaGene also increased the monomer % by 3 to 13% for some of the variants. Given that the antibodies were produced in large volumes, any yield increase translates directly into significant cost reduction.

RaGene’s generalization capabilities extend across the expression spectrum, succeeding in optimizing inherently challenging antibodies with limited-to-no yield (post-optimization). This success profile encompasses a range of bi-specific, tri-specific, and multi-specific antibodies, delivering fold enhancements wherever possible.

Figure: Fold change for fifteen different monoclonal antibodies in a single experiment, shown on a log scale. Antibodies were expressed in a CHO cell line, and the data (mg/L) were obtained using Gator.

Product Use Cases

  1. DNA Synthesis: Within the domain of DNA synthesis, RaGene offers a pre-synthesis CDS optimization option that accommodates the time and scalability requirements often associated with providing DNA synthesis as a service or operating in-house. Being host- and protein-agnostic, RaGene is compatible with the wide range of third-party customers or early research projects with high usage diversity and synthesis constraints
  2. Protein Expression: Whether for third-party customers or expression of proprietary proteins, reliable yield enhancement enables comprehensive extended testing of the target proteins and directly reduces testing, manufacturing, and production costs, with no additional workflow overhead or amino acid sequence modifications
  3. RNA Therapeutics: Addressing the issues of expression, degradation, and solubility of mRNA sequences, which vary with molecule size and delivery method, RaGene enables the practical use of linear mRNA sequences while aligning with the increased adoption of saRNA and circRNA, in turn offering compatibility with innovations across RNA therapeutics, vaccines, etc
  4. Plasmid Engineering: Through proprietary plasmid conditional and incremental optimization (i.e., optimizing selected plasmid regions/sequences given the context learned from the full plasmid, even if only the selected region/sequence is to be optimized), RaGene enables global construct optimization through in-silico modeling of the full plasmid mutagenesis space
  5. Gene and Cell Therapy: Through our cell line/host system-specific optimization axis with generalizable performance for engineered strains, RaGene supports multiple entry points for gene and cell therapies. Starting from viral vector yield enhancement, empty capsid reduction, and cassette optimization all the way to trans-gene yield optimization, RaGene offers 360° compatibility with gene therapy needs.


Contact us at bd@proteinea.com for more information on commercial partnerships
