Severity classification and identification of biomarkers in COVID-19: exome analysis of patients using support vector machines (SVMs) with a linear kernel
Name: ALEXIA STEFANI SIQUEIRA ZETUM
Publication date: 24/02/2025
Examining board:
Name | Role |
---|---|
DEBORA DUMMER MEIRA | Co-advisor |
ELIZEU FAGUNDES DE CARVALHO | External examiner |
FLAVIA DE PAULA | Internal examiner |
IURI DRUMOND LOURO | Chair |
Summary: Introduction: SARS-CoV-2 infection presents a wide spectrum of clinical manifestations, and genetic variation may influence the host's response to the virus. Machine Learning (ML) has shown promise in identifying genetic biomarkers and individuals who may develop severe forms of the disease. Objective: To develop an ML model using exome data to predict clinical outcomes in COVID-19 patients and identify genes potentially associated with disease severity. Methodology: The study used data from 239 COVID-19 patients classified as "Non-severe" or "Severe". DNA sequencing was performed, and ancestry analysis was conducted. A Support Vector Machine (SVM) model with a linear kernel was developed to predict COVID-19 severity, using Recursive Feature Elimination (RFE) to select the most influential variants. Metrics such as Area Under the Receiver Operating Characteristic Curve (AUC-ROC), accuracy, F1 score, sensitivity, and specificity were used to evaluate the model. Subsequently, logistic regression (LR) analysis was performed with the variants selected by SVM-RFE and confounding variables. Results and Discussion: The SVM model with a linear kernel achieved an AUC-ROC of 0.81, accuracy of 83%, and an F1 score of 0.78, indicating a good capacity to discriminate between "Severe" and "Non-severe" cases of COVID-19. Fifteen variants were selected by the model, of which seven were significantly associated with disease severity in the LR analysis. Risk variants include WSCD1 (rs2302837 "A/A" or "A/G," 95% CI: 1.32–7.24, OR: 3.09, P < 0.01), PTPRS (rs1143700 "A/A" or "A/G," 95% CI: 1.54–7.07, OR: 3.30, P < 0.01), ARVCF (rs2073744 "A/A" or "A/G," 95% CI: 1.31–6.30, OR: 2.88, P < 0.01), and LVRN (rs10078759 "G/G" or "G/C," 95% CI: 1.07–4.31, OR: 2.08, P = 0.04). Conversely, protective variants include ALDH4A1 (rs6426813 "G/G" or "G/A," 95% CI: 0.23–0.93, OR: 0.48, P = 0.02), ARHGAP22 (rs10776601 "C/C" or "C/T," 95% CI: 0.09–0.56, OR: 0.23, P < 0.01), and C3 (rs423490 "A/A" or "A/G," 95% CI: 0.14–0.70, OR: 0.32, P < 0.01). The results demonstrated that the SVM with a linear kernel is effective in predicting COVID-19 severity using exome data. The protein-protein interaction (PPI) network analysis identified biological pathways associated with the immune system, inflammatory response, and blood coagulation.
Genes such as C3, PTPRS, and LVRN stood out in functions related to immune response regulation and inflammation modulation, suggesting that these pathways are directly linked to adverse COVID-19 outcomes. The network also revealed the interconnection between cellular signaling processes and stress response mechanisms, which may explain the variability in clinical responses observed among patients. Conclusion: The SVM with a linear kernel proved effective in predicting COVID-19 severity in our data. This study highlights the importance of integrative approaches to better understand the disease. Identifying genetic biomarkers can aid in the treatment and management of future pandemics.
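The feature-selection step named in the Methodology (a linear-kernel SVM combined with Recursive Feature Elimination) can be illustrated with a minimal, dependency-free sketch. The summary does not describe the study's actual implementation, so everything here is an assumption made for illustration: the function names `train_linear_svm` and `svm_rfe`, the hyperparameters, the 0/1/2 genotype coding, and the toy data are hypothetical, and a simple subgradient-descent trainer stands in for whatever SVM solver was really used.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit (w, b) of a linear SVM by subgradient descent on the
    L2-regularized hinge loss. X: list of rows, y: labels in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 shrinkage of the weights is applied on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: push the boundary away from this sample.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def svm_rfe(X, y, n_keep):
    """Recursive Feature Elimination: refit the linear SVM and drop the
    feature with the smallest squared weight until n_keep remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[weakest]
    return remaining


# Toy genotype matrix (hypothetical, not study data): rows are patients,
# columns are variants coded as 0/1/2 allele counts. Only column 0 tracks
# the label; columns 1 and 2 are balanced across both classes.
X_toy = [[2, 1, 0], [2, 0, 1], [2, 1, 1], [2, 0, 0],
         [0, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y_toy = [1, 1, 1, 1, -1, -1, -1, -1]  # +1 = "Severe", -1 = "Non-severe"

selected = svm_rfe(X_toy, y_toy, n_keep=1)
print("Indices of retained variants:", selected)
```

In the study the elimination stopped at fifteen variants; the toy call keeps a single column only so that the behavior is easy to inspect. Because the ranking depends on weight magnitudes, features should share a comparable scale, which a uniform genotype coding provides.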
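The odds ratios, confidence intervals, and P values reported for the logistic regression can be related to one another with a short calculation. The sketch below assumes the intervals are Wald intervals on the log-odds scale, which the summary does not state; the function name `wald_summary` is hypothetical, and because the published figures are rounded, the recovered quantities are only approximate.

```python
import math


def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-odds standard error, z statistic, and two-sided
    P value from a reported OR and 95% CI, assuming a Wald interval."""
    # A Wald CI is symmetric around ln(OR), so its width gives the SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided P value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Values copied from the summary: (OR, CI lower bound, CI upper bound).
reported = {
    "WSCD1 rs2302837": (3.09, 1.32, 7.24),
    "LVRN rs10078759": (2.08, 1.07, 4.31),
    "C3 rs423490": (0.32, 0.14, 0.70),
}

for name, (or_, lo, hi) in reported.items():
    se, z, p = wald_summary(or_, lo, hi)
    direction = "risk" if or_ > 1 else "protective"
    print(f"{name}: SE={se:.3f}, z={z:.2f}, P={p:.4f} ({direction})")
```

An odds ratio above 1 with an interval that excludes 1 marks a risk genotype, while an odds ratio below 1 with an interval that excludes 1 marks a protective one, which is how the summary separates the two groups of variants.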