Document Type
- Article (3)
Language
- English (3)
Has Fulltext
- yes (3)
Is part of the Bibliography
- no (3)
Keywords
- Biocatalysis (3)
- Catalytic Activity (1)
- FAIR data (1)
- Machine Learning (1)
- Metadata standard (1)
- Ontology (1)
- Semantics (1)
- Stereoselectivity (1)
- Transaminases (1)
- catalytic activity (1)
- machine learning (1)
- protein engineering (1)
- selectivity (1)
Publisher
- Wiley (3)
Abstract
Enzyme activity data for biocatalytic applications are often not annotated with standardized conditions and terms, which makes it extremely hard to retrieve, compare, and reuse enzymatic data. With advances in artificial intelligence (AI) and machine learning (ML), machine-readable annotations that make data automatically usable will play a crucial role in the success of these methods. It is becoming increasingly easy to retrieve complex data sets and extract relevant information; however, the lack of standardized data readability is a current limitation. In this contribution, we outline an iterative approach to develop standardized terms and create semantic relations (ontologies), with the goal of improving the discoverability, accessibility, interoperability, and reuse of digital resources in the field of biocatalysis.
Amine transaminases (ATAs) are powerful biocatalysts for the stereoselective synthesis of chiral amines. Machine learning provides a promising approach for protein engineering, but activity prediction models for ATAs remain elusive due to the difficulty of obtaining high-quality training data. Thus, we first created variants of the ATA from Ruegeria sp. (3FCR) with improved catalytic activity (up to 2000-fold) as well as reversed stereoselectivity by structure-dependent rational design, and collected a high-quality dataset in the process. Subsequently, we designed a modified one-hot code to describe steric and electronic effects of substrates and residues within ATAs. Finally, we built a gradient boosting regression tree predictor for catalytic activity and stereoselectivity, and applied it to the data-driven design of optimized variants, which showed improved activity (up to 3-fold compared to the best variants previously identified). We also demonstrated that the model can predict the catalytic activity of ATA variants of another origin after retraining with a small set of additional data.
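The encoding and predictor described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' modified one-hot code or their model settings, so everything below is an assumption: the residue encoding appends two coarse descriptors (a size bin and a formal side-chain charge) to a standard 20-letter one-hot vector, the size bins are illustrative rather than measured values, and the booster uses depth-1 regression trees (stumps) with squared-error loss. All function names are hypothetical. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` would normally be used instead.

```python
from typing import List, Optional, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed descriptors: formal side-chain charge near neutral pH and a
# coarse, illustrative size bin (0 = small, 0.5 = medium, 1 = large).
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}
SIZE = {"G": 0.0, "A": 0.0, "S": 0.0, "F": 1.0, "W": 1.0, "Y": 1.0}

def encode_residue(aa: str) -> List[float]:
    """One-hot over 20 amino acids plus [size bin, charge]."""
    if aa not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid: {aa!r}")
    one_hot = [1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS]
    return one_hot + [SIZE.get(aa, 0.5), CHARGE.get(aa, 0.0)]

def encode_variant(residues: str) -> List[float]:
    """Concatenate residue encodings for the engineered positions."""
    features: List[float] = []
    for aa in residues:
        features.extend(encode_residue(aa))
    return features

# A stump is (feature index, threshold, left value, right value).
Stump = Tuple[int, float, float, float]
Model = Tuple[float, List[Stump], float]

def fit_stump(X: Sequence[Sequence[float]],
              residuals: Sequence[float]) -> Optional[Stump]:
    """Find the single split that minimizes squared error on the residuals."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if err < best_err:
                best, best_err = (j, thr, lm, rm), err
    return best

def predict(model: Model, x: Sequence[float]) -> float:
    base, stumps, lr = model
    return base + sum(lr * (lm if x[j] <= thr else rm)
                      for j, thr, lm, rm in stumps)

def fit_boosting(X: Sequence[Sequence[float]], y: Sequence[float],
                 n_rounds: int = 100, learning_rate: float = 0.1) -> Model:
    """Gradient boosting for squared loss: each stump fits current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature varies, nothing left to split on
            break
        stumps.append(stump)
        j, thr, lm, rm = stump
        preds = [p + learning_rate * (lm if row[j] <= thr else rm)
                 for row, p in zip(X, preds)]
    return base, stumps, learning_rate

if __name__ == "__main__":
    # Toy data only: made-up activities for two-position variants,
    # not experimental values.
    variants = ["AG", "FG", "AW", "FW"]
    activities = [1.0, 2.0, 3.0, 4.0]
    X = [encode_variant(v) for v in variants]
    model = fit_boosting(X, activities, n_rounds=300)
    for v, x in zip(variants, X):
        print(v, round(predict(model, x), 3))
```

Appending physicochemical descriptors to the one-hot vector lets tree splits group residues by property rather than by identity alone, which is one plausible reading of how such an encoding could capture steric and electronic effects.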
Protein engineering is essential for altering the substrate scope, catalytic activity, and selectivity of enzymes for applications in biocatalysis. However, traditional approaches, such as directed evolution and rational design, face the challenge of experimentally screening a large protein mutation space. Machine learning methods allow the approximation of protein fitness landscapes and the identification of catalytic patterns using limited experimental data, thus providing a new avenue to guide protein engineering campaigns. In this concept article, we review machine learning models that have been developed to assess enzyme-substrate-catalysis performance relationships, with the aim of improving enzymes through data-driven protein engineering. Furthermore, we discuss prospects for the future development of this field to provide additional strategies and tools for achieving desired activities and selectivities.