Keyword: Biocatalysis (4 articles; language: English; full text available; publisher: Wiley)
Protein engineering is essential for altering the substrate scope, catalytic activity and selectivity of enzymes for applications in biocatalysis. However, traditional approaches, such as directed evolution and rational design, are limited by the need to experimentally screen a large protein mutation space. Machine learning methods allow the approximation of protein fitness landscapes and the identification of catalytic patterns using limited experimental data, thus providing a new avenue to guide protein engineering campaigns. In this concept article, we review machine learning models that have been developed to assess the relationships between enzyme, substrate and catalytic performance, with the aim of improving enzymes through data-driven protein engineering. Furthermore, we discuss prospects for the future development of this field to provide additional strategies and tools for achieving desired activities and selectivities.
Enzyme activity data for biocatalytic applications are currently often not annotated with standardized conditions and terms. This makes it extremely hard to retrieve, compare, and reuse enzymatic data. With advances in the fields of artificial intelligence (AI) and machine learning (ML), making data automatically usable through machine‐readable annotations will play a crucial role in the success of these methods. It is becoming increasingly easy to retrieve complex data sets and extract relevant information; however, the lack of standardized data readability is a current limitation. In this contribution, we outline an iterative approach to develop standardized terms and create semantic relations (ontologies) in order to improve the discoverability, accessibility, interoperability, and reuse of digital resources in the field of biocatalysis.
Amine transaminases (ATAs) are powerful biocatalysts for the stereoselective synthesis of chiral amines. Machine learning provides a promising approach for protein engineering, but activity prediction models for ATAs remain elusive due to the difficulty of obtaining high-quality training data. Thus, we first created variants of the ATA from Ruegeria sp. (3FCR) with improved catalytic activity (up to 2000-fold) as well as reversed stereoselectivity by a structure-dependent rational design and collected a high-quality dataset in this process. Subsequently, we designed a modified one-hot code to describe steric and electronic effects of substrates and residues within ATAs. Finally, we built a gradient boosting regression tree predictor for catalytic activity and stereoselectivity, and applied this for the data-driven design of optimized variants which then showed improved activity (up to 3-fold compared to the best variants previously identified). We also demonstrated that the model can predict the catalytic activity for ATA variants of another origin by retraining with a small set of additional data.
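The idea of extending a one-hot code with physicochemical information can be illustrated with a minimal sketch. It is not the encoding used in the article, whose details are not given in the abstract. As an assumption for illustration only, each residue is encoded as a 20-dimensional one-hot vector followed by two stand-in descriptors: the Kyte-Doolittle hydropathy value and the formal side-chain charge at neutral pH. The function names `encode_residue` and `encode_variant` are hypothetical.

```python
# Illustrative sketch only: a one-hot code extended with two residue
# descriptors. The descriptors are stand-ins chosen for this example,
# not the steric and electronic terms used in the article.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index per residue.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Formal side-chain charge at neutral pH (histidine treated as neutral).
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}


def encode_residue(aa):
    """Return 20 one-hot entries plus hydropathy and charge (22 values)."""
    if aa not in HYDROPATHY:
        raise ValueError(f"unknown amino acid: {aa!r}")
    one_hot = [1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS]
    return one_hot + [HYDROPATHY[aa], CHARGE.get(aa, 0.0)]


def encode_variant(residues):
    """Concatenate the codes of the residues at the mutated positions."""
    features = []
    for aa in residues:
        features.extend(encode_residue(aa))
    return features


# Hypothetical variant described by the residues at two positions.
vector = encode_variant("FK")
print(len(vector))
```

Feature vectors of this kind, paired with measured activities, could then be passed to a gradient boosting regressor such as scikit-learn's `GradientBoostingRegressor`. The abstract does not state which implementation the authors used.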
Long-chain aliphatic amines such as (S,Z)-heptadec-9-en-7-amine and 9-aminoheptadecane were synthesized from ricinoleic acid and oleic acid, respectively, by whole-cell cascade reactions using the combination of an alcohol dehydrogenase (ADH) from Micrococcus luteus, an engineered amine transaminase from Vibrio fluvialis (Vf-ATA), and a photoactivated decarboxylase from Chlorella variabilis NC64A (Cv-FAP) in a one-pot process. In addition, long-chain aliphatic esters such as 10-(heptanoyloxy)dec-8-ene and octylnonanoate were prepared from ricinoleic acid and oleic acid, respectively, by using the combination of the ADH, a Baeyer–Villiger monooxygenase variant from Pseudomonas putida KT2440, and the Cv-FAP. The target compounds were produced at rates of up to 37 U g⁻¹ dry cells with conversions up to 90 %. Therefore, this study contributes to the preparation of industrially relevant long-chain aliphatic chiral amines and esters from renewable fatty acid resources.