Predictive healthcare modelling via kernel-based machine learning
PhD thesis
Nwegbu, N. 2023. Predictive healthcare modelling via kernel-based machine learning. PhD thesis. Middlesex University, Computer Science.
Type | PhD thesis |
---|---|
Title | Predictive healthcare modelling via kernel-based machine learning |
Authors | Nwegbu, N. |
Abstract | Predictive modelling of clinical data is fraught with challenges arising from the manner in which events are recorded. Firstly, aggregated electronic health records (EHR) contain complementary information from multiple sources and are characterised as heterogeneous due to the disparate innate properties of their constituents. Secondly, patients typically fall ill at irregular intervals and experience dissimilar intervention trajectories. This results in irregularly sampled and uneven-length heterogeneous data, which poses a problem for standard multivariate tools. The alternative of feature extraction into equal-length vectors via methods like bag-of-words (BoW) potentially discards useful information. This research proposes an approach based on a kernel framework, in which data is maintained in its native form: discrete sequences of symbols. Bespoke kernel functions derived from variants of edit distance between pairs of sequences may then be utilised in conjunction with support vector machines (SVM) to classify the data. Through multi-kernel learning (MKL), the framework provides a principled way of addressing the problem of modelling heterogeneous EHR entities; thus, we can algebraically combine multiple base kernels derived from real-valued and categorical entities into a single model. It also provides a means to combine weakly discriminative standalone kernels in order to achieve superior results. The proposed method was evaluated on a prediction task: identifying susceptible patients likely to develop type 2 diabetes following an earlier episode of elevated blood pressure of 130/80 mmHg. Kernels combined via multi-kernel learning achieved an F1-score of 0.96, outperforming SVM (0.63), logistic regression (0.63), long short-term memory (0.61) and multi-layer perceptron (0.54) classifiers applied to a BoW representation of the data.
An F1-score of 0.91 was achieved by combining symbolic kernels with kernels derived from 11 real-valued test measurements. The findings also showed that a higher F1-score of 0.93 was achieved with a similar heterogeneous combination of kernels derived from symbolic EHR and from a single test measure, ‘Serum bilirubin level’ (Read code 44E..00). In addition, as external validation of the proposed framework, MKL achieved an F1-score of 0.97 on an external dataset. The proposed approach is consequently able to overcome the limitations associated with feature-based classification in the context of clinical data. |
Sustainable Development Goals | 9 Industry, innovation and infrastructure; 3 Good health and well-being |
Middlesex University Theme | Health & Wellbeing |
Department name | Computer Science |
Institution name | Middlesex University |
Publisher | Middlesex University Research Repository |
Publication dates | |
Online | 13 Mar 2024 |
Publication process dates | |
Accepted | 16 Mar 2023 |
Deposited | 13 Mar 2024 |
Output status | Published |
Accepted author manuscript | File access level: Open |
Language | English |
https://repository.mdx.ac.uk/item/10xq8v