Machine Learning Classification of Hemoglobin Beta Gene Mutations

Authors

Anja Radomirović, University Union, Serbia

Abstract

Mutations in the HBB gene cause severe hemoglobinopathies such as sickle cell disease and beta-thalassemia. Accurate classification of HBB variants is crucial for diagnosis but remains challenging. I present a bioinformatics pipeline that integrates HGVS parsing, Ensembl annotation, SpliceAI, and BioPython to analyze 1,809 ClinVar variants. Seven models were trained with SMOTE oversampling. XGBoost achieved an F1-score of 0.9495 and perfect recall, although its ROC-AUC of 0.4489, below the 0.5 chance level, indicates that the model does not reliably rank pathogenic variants above the others. The results highlight the challenges of applying machine learning to single-gene variant classification and the importance of data quality in genomic medicine.
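
A high F1-score with perfect recall can coexist with a ROC-AUC below 0.5 when one class dominates and the model labels nearly everything as that class. The following is a minimal sketch on synthetic toy data, not the paper's dataset or pipeline; the labels, scores, 0.5 threshold, and function names are all assumptions chosen for illustration.

```python
# Toy illustration (synthetic data, NOT the ClinVar set used in the paper).
# 18 positives and 2 negatives; every score exceeds the 0.5 threshold,
# so every variant is predicted positive.

def f1_and_recall(y_true, y_pred):
    """Return (F1, recall) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return f1, recall

def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outscores a
    random negative (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores.
y_true = [1] * 18 + [0] * 2
scores = [0.6] * 12 + [0.9] * 6 + [0.8] * 2  # negatives outrank most positives

y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1, recall = f1_and_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)

print(f"F1={f1:.4f} recall={recall:.4f} ROC-AUC={auc:.4f}")
```

In this toy case the threshold-based metrics look strong because no positive is missed and false positives are few in absolute terms, while the ranking-based ROC-AUC exposes that the scores carry no useful ordering between the two classes.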

Keywords

HBB gene, variant pathogenicity, machine learning, protein encoding, XGBoost, hemoglobinopathies

Full Text  Volume 16, Number 1