Researchers from the University of Wisconsin–Madison Unveil ‘SAMPLE’: An Artificial Intelligence Platform for Fully Autonomous Protein Engineering

Protein engineering has wide-ranging applications in chemistry, energy, and medicine, but it faces several difficult challenges. Existing methods for engineering proteins with improved or novel functions are slow, labor-intensive, and inefficient, which limits how far the field’s potential can be exploited in science and medicine.

Protein engineering is a discovery-driven process: hypotheses are generated, experiments are designed and performed, and the data are interpreted to refine the understanding of biological systems. The process is iterative but inefficient, often taking years to complete. Robot scientists and self-driving laboratories have already been applied in areas such as gene identification, chemical synthesis methodologies, and the discovery of new materials. These autonomous systems can learn from diverse data sources, make decisions under uncertainty, and generate reproducible data, which makes them promising for protein engineering and synthetic biology.

A team of researchers at the University of Wisconsin–Madison has introduced the Self-driving Autonomous Machines for Protein Landscape Exploration (SAMPLE) platform, an approach to autonomous protein engineering. SAMPLE pairs an intelligent agent with a fully automated robotic system. The agent designs new proteins and learns protein sequence-function relationships, while the robotic system conducts the experiments and feeds the results back to the agent.
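The design, test, and learn cycle described above can be sketched in a few lines of Python. This is only an illustration of the closed-loop idea, not the SAMPLE code: `RandomAgent`, `mock_robot_assay`, and `run_loop` are hypothetical names, the candidate "sequences" are toy strings, and the assay returns made-up scores in place of real measurements.

```python
import random


class RandomAgent:
    """Hypothetical stand-in for the intelligent agent (assumption, not SAMPLE's agent)."""

    def __init__(self, candidates, seed=0):
        self.untested = list(candidates)
        self.observations = {}  # design -> measured value
        self.rng = random.Random(seed)

    def propose(self):
        # A real agent would consult its learned sequence-function model here;
        # this placeholder simply picks an untested design at random.
        return self.rng.choice(self.untested)

    def update(self, design, measurement):
        # Record the feedback so that later proposals can build on it.
        self.untested.remove(design)
        self.observations[design] = measurement


def mock_robot_assay(design):
    """Hypothetical stand-in for the robotic system; returns a made-up score."""
    return float(sum(ord(c) for c in design) % 17)


def run_loop(agent, assay, rounds):
    """Alternate between design (agent) and experiment (robot) for a fixed budget."""
    for _ in range(rounds):
        if not agent.untested:
            break
        design = agent.propose()
        measurement = assay(design)
        agent.update(design, measurement)
    return agent.observations


candidates = ["AAB", "ABA", "BAA", "BBA"]  # toy placeholders, not real proteins
results = run_loop(RandomAgent(candidates), mock_robot_assay, rounds=3)
print(results)
```

The point of the structure is that the agent never touches the experiment directly: it only proposes designs and receives measurements, so the mock assay could in principle be swapped for a real instrument interface without changing the loop.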

To evaluate the approach, the researchers ran 10,000 simulated protein engineering trials using cytochrome P450 data. They compared several Bayesian optimization (BO) methods, including UCB positive, Expected UCB, standard UCB (upper confidence bound), and random selection, for choosing which protein sequences to test. The methods were judged by the thermostability of the proteins they found. The study also investigated batch testing, noting a minor advantage for smaller batches. Central to SAMPLE is a Gaussian process (GP) model trained on sequence-function data, which guides the agent’s design decisions. For robustness and reliability, the platform uses multiple layers of exception handling and data quality control to deal with failed experimental steps.
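To make the GP-plus-UCB idea concrete, here is a minimal NumPy sketch of standard UCB selection over a small set of candidate sequences. It is written under several assumptions that do not come from the paper: a one-hot encoding, an RBF kernel with fixed hyperparameters, a two-letter toy alphabet, and invented thermostability values. It shows only the standard UCB rule, not the UCB positive or Expected UCB variants.

```python
import numpy as np


def one_hot(seq, alphabet):
    """Encode a sequence as a flat one-hot vector (assumed encoding)."""
    vec = np.zeros(len(seq) * len(alphabet))
    for i, ch in enumerate(seq):
        vec[i * len(alphabet) + alphabet.index(ch)] = 1.0
    return vec


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior(X_train, y_train, X_test, noise=1e-2):
    """GP regression posterior mean and std, with targets standardized internally."""
    y_mean = y_train.mean()
    y_scale = y_train.std() or 1.0
    y = (y_train - y_mean) / y_scale
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_s)
    var = 1.0 - (v**2).sum(0)  # the RBF prior variance is 1
    mean = y_mean + y_scale * (K_s.T @ alpha)
    std = y_scale * np.sqrt(np.maximum(var, 0.0))
    return mean, std


def select_ucb(mean, std, beta=2.0):
    """Standard UCB: pick the candidate maximizing mean + beta * std."""
    return int(np.argmax(mean + beta * std))


alphabet = "AB"  # toy alphabet, not real amino acids
tested = {"AAA": 50.0, "AAB": 55.0, "ABB": 60.0}  # invented values in °C
untested = ["BBB", "BAA", "ABA", "BAB", "BBA"]

X_train = np.array([one_hot(s, alphabet) for s in tested])
y_train = np.array(list(tested.values()))
X_test = np.array([one_hot(s, alphabet) for s in untested])

mean, std = gp_posterior(X_train, y_train, X_test)
next_seq = untested[select_ucb(mean, std)]
print("next sequence to test:", next_seq)
```

The `beta` parameter controls the trade-off between exploiting sequences the model already predicts to be stable and exploring sequences it is uncertain about. In a loop, the chosen sequence would be measured, added to the training data, and the model refit before the next selection.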

The SAMPLE agents identified glycoside hydrolase enzymes that were significantly more stable than the initial sequences, with at least a 12°C increase in thermal tolerance. The agents explored less than 2% of the full combinatorial landscape before converging on the most stable designs. The top sequences identified by each agent were unique but fell in the same region of the fitness landscape, suggesting that the agents had reached the global fitness peak. Human characterization of these machine-designed proteins confirmed their enhanced thermostability and showed that they retained catalytic activity.

In conclusion, the SAMPLE platform represents a significant advance in protein engineering and demonstrates the potential of self-driving laboratories to automate and accelerate scientific discovery. Its full autonomy, which integrates learning, decision-making, and experimentation, is a major step beyond previous semi-autonomous systems. The work underscores how intelligent computational design, automated experimentation, and careful data management can work together in protein engineering.


Check out the Paper. All credit for this research goes to the researchers of this project.

Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.
