Machine Learning based to Predict B-Cell Epitope Region Utilizing Protein Features

Fatema Nafa and Ryan Kanoff, Salem State University, USA

Abstract

Given the ongoing COVID-19 pandemic, vaccine research and production are more important than ever. Antibodies recognize epitopes, the immunogenic regions of an antigen, in a highly specific manner to trigger an immune response. Predicting the location of these regions is extremely difficult, yet they have substantial implications for complex humoral immunogenicity pathways. This paper presents a machine learning model for epitope prediction. We build several models to test how accurately B-cell epitopes can be predicted from protein features alone. The goal is a quantitative comparison of the accuracy of three machine learning models: XGBoost, CatBoost, and LightGBM. XGBoost and LightGBM achieved similar accuracy, while CatBoost achieved the highest accuracy, 82%. Although this accuracy is not high enough to be considered reliable, it warrants further research on the subject.
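As a rough illustration of the kind of accuracy comparison described above, the following minimal sketch computes accuracy as the fraction of correct predictions and ranks several classifiers by it. It is not the authors' pipeline: the names `accuracy`, `rank_by_accuracy`, `model_a`, `model_b`, and `model_c` are hypothetical, and the labels and predictions are made-up toy values, not results from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rank_by_accuracy(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Score every model on the same held-out labels, best first."""
    scores = {name: accuracy(y_true, pred) for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy held-out labels (1 = epitope, 0 = non-epitope); illustrative only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical predictions standing in for three trained classifiers.
toy_predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 1],
    "model_b": [1, 0, 1, 1, 0, 1, 1, 0],
    "model_c": [0, 0, 1, 0, 0, 1, 1, 1],
}

ranking = rank_by_accuracy(y_true, toy_predictions)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In practice, the predictions would come from fitted classifiers evaluated on the same held-out split, so that the accuracy figures are directly comparable.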

Keywords

machine learning models, data exploratory techniques, B-cell epitope prediction.

CS&IT Conference Proceedings
