Distilling knowledge from high quality biobank data towards the discovery of risk factors for patients with cardiovascular diseases and depression
Vasileios C. Pezoulas; Georg Ehret; Jos Bosch; Dimitris Fotiadis; Antonis Sakellarios
Abstract
Cardiovascular disease (CVD) is the leading cause of death worldwide. Patients with CVD may also suffer from mental disorders such as depression, which is a common comorbid condition. However, the risk factors for depression in CVD patients have not been extensively investigated in the literature. In this work, we utilized a hybrid and explainable AI-empowered workflow to identify underlying factors for CVD and depression. To this end, we acquired a subset of the UK Biobank (UKB) comprising 157,302 patients with a depression assessment and CVD. In the first step, 701 features were selected from the UKB under clinical guidance, including demographics, blood tests, mental examinations, and clinical assessments. An automated biobank data curation pipeline was then applied to transform the UKB subset into a high-quality dataset by removing outliers and genes with increased variability. A hybrid version of the XGBoost classifier was used to classify patients with CVD and depression, with a scalable loss function employed to mitigate overfitting. Our results demonstrate that we can diagnose patients with comorbid CVD and depression with an accuracy of 0.80 and a sensitivity of 0.82, with mood swings, BMI, and age identified as biomarkers, among others. To our knowledge, this is the first case study aiming to distil knowledge from the UKB to identify cost-effective risk factors for patients with CVD and depression.