
The key to understanding proteins—such as those that govern cancer, COVID-19, and other diseases—is quite simple: Identify their chemical structure and find which other proteins can bind to them. But there’s a catch.
“The search space for proteins is enormous,” said Brian Coventry, a research scientist with the Institute for Protein Design at the University of Washington and the Howard Hughes Medical Institute.
A typical protein studied by Coventry’s lab is composed of 65 amino acids, and with 20 different amino acid choices at each position, there are 20 to the power of 65 possible combinations, a number bigger than the estimated number of atoms in the universe.
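The arithmetic behind that comparison can be checked in a few lines. This is a minimal sketch: with 20 choices at each of 65 positions, the count is 20 multiplied by itself 65 times. The figure of 10^80 atoms is a commonly quoted order-of-magnitude estimate assumed here for illustration; it does not come from the study.

```python
# Size of the sequence space for a protein of a given length.
NUM_AMINO_ACIDS = 20         # amino acid choices per position (from the article)
PROTEIN_LENGTH = 65          # typical protein length cited in the article
ATOMS_IN_UNIVERSE = 10**80   # ASSUMPTION: rough, commonly quoted estimate


def sequence_space_size(length: int, alphabet: int = NUM_AMINO_ACIDS) -> int:
    """Return the number of distinct sequences of the given length."""
    return alphabet ** length


total = sequence_space_size(PROTEIN_LENGTH)
print(f"{total:.2e} possible sequences")
print("exceeds atom estimate:", total > ATOMS_IN_UNIVERSE)
```

Python integers have arbitrary precision, so the count is computed exactly rather than approximated.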
Coventry is one of the co-authors of a study published in May 2023 in the journal Nature Communications.
In this study, the research team used deep learning methods to augment existing energy-based physical models in de novo (from scratch) computational protein design. Laboratory tests showed a 10-fold increase in the success rate of designed proteins binding to their target proteins.
“We demonstrated that the incorporation of deep learning methods improves the pipeline by evaluating the quality of the interfaces where hydrogen bonds form or from hydrophobic interactions,” explained Nathaniel Bennett, a post-doctoral scholar at the Institute for Protein Design, University of Washington, who also participated in the study. “This approach is more efficient than individually enumerating all the energies involved.”
Deep learning utilizes computer algorithms to analyze patterns in data and draw meaningful conclusions. In this study, deep learning techniques were employed to learn iterative transformations of protein sequence representations and structures, leading to highly accurate models.
The research team developed a deep learning-augmented de novo protein binder design protocol using software tools such as AlphaFold 2, developed by DeepMind, and RoseTTAFold, developed by the Institute for Protein Design.
The computational design of proteins was well-suited for parallelization on Frontera, a high-performance computing resource. Each design trajectory could be processed independently, maximizing the efficiency of the computational process.
“We distributed the problem, which involved 2 to 6 million designs, among Frontera’s massive computing resources. Each CPU was assigned to handle one design trajectory, allowing us to complete a large number of design trajectories in a reasonable amount of time,” added Bennett.
The authors employed the RifDock docking program to generate millions of protein “docks” or potential interactions between protein structures. These docks were divided into smaller chunks and distributed among Frontera’s compute nodes for parallel processing.
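The chunk-and-distribute pattern described above can be sketched on a single machine. This is an illustration only, not the team's pipeline: `run_design_trajectory` is a hypothetical placeholder that returns a made-up score, the dock IDs are plain integers, and a local process pool stands in for Frontera's compute nodes.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Iterator, List, Sequence


def split_into_chunks(items: Sequence[int], chunk_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def run_design_trajectory(dock_id: int) -> float:
    """HYPOTHETICAL stand-in for one independent design trajectory."""
    # Placeholder arithmetic; a real trajectory would design and score a binder.
    return (dock_id * 37 % 101) / 100.0


def process_chunk(chunk: Sequence[int]) -> List[float]:
    """Run every trajectory in one chunk; chunks do not depend on each other."""
    return [run_design_trajectory(dock_id) for dock_id in chunk]


def run_all(dock_ids: Sequence[int], chunk_size: int, max_workers: int) -> List[float]:
    """Distribute chunks across worker processes and flatten the results."""
    chunks = list(split_into_chunks(dock_ids, chunk_size))
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves chunk order; consume it before the pool shuts down.
        per_chunk = list(pool.map(process_chunk, chunks))
    return [score for scores in per_chunk for score in scores]


if __name__ == "__main__":
    scores = run_all(list(range(1000)), chunk_size=100, max_workers=4)
    print(len(scores), "trajectories scored")
```

Because no trajectory needs results from another, the work scales by adding workers; on a cluster, each chunk would typically be submitted as its own job rather than handed to a local pool.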
Further optimization was achieved by using the ProteinMPNN software tool developed by the Institute for Protein Design, which made generating protein sequences with neural networks more than 200 times faster than previous tools.
The modeling data used in the study consisted of yeast surface display binding data, publicly available and collected by the Institute for Protein Design. This data included thousands of different DNA strands encoding designed proteins, which were expressed by yeast cells. The cells were then sorted based on their ability to bind, and the researchers used tools from the human genome sequencing project to analyze the DNA sequences.
Although the study showed promising results with a 10-fold increase in the success rate of designed structures binding to their target proteins, Coventry emphasized that there is still much work to be done.
“We’ve made significant progress, but there is still room for improvement. The future of our research is to continue increasing the success rate and tackle even more challenging targets, such as viruses and cancer T-cell receptors,” said Coventry.
To achieve better computationally designed proteins, the researchers are focused on optimizing software tools and expanding the scope of their sampling efforts.
“The bigger the computer, the better the proteins we can design. Our goal is to develop the tools that will create the cancer-fighting drugs of tomorrow. Many of the proteins we design have the potential to become life-saving drugs. We are constantly improving our process to make those drugs even better,” concluded Coventry.
More information:
Nathaniel R. Bennett et al, Improving de novo protein binder design with deep learning, Nature Communications (2023). DOI: 10.1038/s41467-023-38328-5
Citation:
Deep learning for new protein design (2023, August 3)
retrieved 3 August 2023
from https://phys.org/news/2023-08-deep-protein.html
