3D Convolutional Neural Networks for Identifying Protein Interfaces

Bibliographic details
Main author: Pascoal, Cláudio (author)
Format: masterThesis
Language: eng
Published in: 2021
Subjects:
Full text: http://hdl.handle.net/10362/123467
Country: Portugal
OAI: oai:run.unl.pt:10362/123467
Description
Abstract: Protein interaction is a fundamental part of nearly all biochemical processes, and proteins have evolved specific surface regions for molecular recognition and interaction. These regions differ from the remaining surface in amino acid composition, geometry and chemical properties. Detecting protein interfaces can lead to a better understanding of protein interactions, which benefits fields such as drug design and metabolic engineering. Most existing interface predictors use structured data, that is, clearly defined data types usually obtained from data sets. However, proteins are very complex molecules, and no single property can distinguish the interface from the rest of the protein surface for all types of proteins. Deep learning therefore arises as an adequate approach, able to capture features from unstructured data such as images, text, sensor data and volumes. Here, the aim was to identify interface regions in known protein spatial structures, together with their biochemical properties, by exploring new applications of 3D convolutional neural networks. To this end, several state-of-the-art convolutional neural network architectures were explored in order to find one that suits this problem and also performs well. Other state-of-the-art machine learning predictors were also considered in order to identify the best biochemical properties to add as new channels. Finally, the interface predictions are compared with the ground truth, obtained by calculating the distances between atoms of the different chains of the protein complexes.
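
The ground-truth step mentioned at the end of the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `interface_atoms`, the 5.0 Å cutoff, and the toy coordinates are all assumptions made for illustration, since the abstract does not state which distance threshold or data representation was used. The idea shown is only that an atom is labelled as interface when it lies within some cutoff distance of any atom in the other chain.

```python
from math import dist


def interface_atoms(chain_a, chain_b, cutoff=5.0):
    """Return sorted indices of atoms in each chain that lie within
    `cutoff` of at least one atom of the other chain.

    chain_a, chain_b: lists of (x, y, z) coordinates.
    cutoff: distance threshold; 5.0 is an assumed value, not one
            taken from the thesis.
    """
    idx_a, idx_b = set(), set()
    for i, a in enumerate(chain_a):
        for j, b in enumerate(chain_b):
            # Euclidean distance between one atom of each chain
            if dist(a, b) <= cutoff:
                idx_a.add(i)
                idx_b.add(j)
    return sorted(idx_a), sorted(idx_b)


# Hypothetical toy coordinates: two atoms per chain
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain_b = [(3.0, 0.0, 0.0), (30.0, 0.0, 0.0)]

labels_a, labels_b = interface_atoms(chain_a, chain_b)
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a sketch; for real complexes a spatial index such as a k-d tree would likely be preferable.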