Predictive healthcare modelling via kernel-based machine learning

PhD thesis


Nwegbu, N. 2023. Predictive healthcare modelling via kernel-based machine learning. PhD thesis, Middlesex University, Computer Science.
Type: PhD thesis
Title: Predictive healthcare modelling via kernel-based machine learning
Authors: Nwegbu, N.
Abstract

Predictive modelling of clinical data is fraught with challenges arising from the manner in which events are recorded. Firstly, the aggregated electronic health records (EHR) contain complementary information from multiple sources and are characterised as heterogeneous due to the disparate innate properties of their constituents. Secondly, patients typically fall ill at irregular intervals and experience dissimilar intervention trajectories. This results in irregularly sampled and uneven-length heterogeneous data, which poses a problem for standard multivariate tools. The alternative of feature extraction into equal-length vectors via methods like bag-of-words (BoW) potentially discards useful information.
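To illustrate how a BoW representation can discard information, the following minimal sketch counts symbols in two event sequences that differ only in order. The symbols "A", "B", "C" and the helper `bag_of_words` are hypothetical placeholders, not codes or functions from the thesis.

```python
from collections import Counter


def bag_of_words(sequence, vocabulary):
    """Map a variable-length symbol sequence to a fixed-length count vector."""
    counts = Counter(sequence)
    return [counts[symbol] for symbol in vocabulary]


# Hypothetical event symbols; real EHR data would use clinical codes.
vocabulary = ["A", "B", "C"]
patient_1 = ["A", "B", "C", "A"]
patient_2 = ["A", "A", "C", "B"]  # same events, different order

vec_1 = bag_of_words(patient_1, vocabulary)
vec_2 = bag_of_words(patient_2, vocabulary)

# The two sequences differ, yet their BoW vectors are identical:
# the ordering of events is lost in the fixed-length representation.
print(vec_1, vec_2, vec_1 == vec_2)
```

Both sequences map to the same count vector, so a classifier trained on BoW features cannot distinguish them.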

This research proposes an approach based on a kernel framework in which data is maintained in its native form: discrete sequences of symbols. Bespoke kernel functions derived from variants of edit distance between pairs of sequences may then be used in conjunction with support vector machines (SVM) to classify the data. Through multi-kernel learning (MKL), the framework provides a principled way of modelling heterogeneous EHR entities: multiple base kernels derived from real-valued and categorical entities can be combined algebraically into a single model. It also provides a means of combining weakly discriminative standalone kernels to achieve superior results.
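The idea of an edit-distance kernel combined with a real-valued kernel can be sketched as follows. This is an illustration under stated assumptions, not the thesis's actual kernels: it uses plain Levenshtein distance, an exponential transform `exp(-gamma * d)`, an RBF kernel for a single real-valued measurement, and fixed combination weights, whereas MKL would learn the weights from data. All names, symbols and values are hypothetical. Note also that a kernel built this way from edit distance is not guaranteed to be positive semi-definite.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance between two sequences of symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_kernel(a, b, gamma=0.5):
    """Similarity from edit distance (assumed form, may not be PSD)."""
    return math.exp(-gamma * edit_distance(a, b))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian kernel on a single real-valued measurement."""
    return math.exp(-gamma * (x - y) ** 2)


def gram_matrix(items, kernel):
    """Pairwise kernel values for a list of items."""
    return [[kernel(s, t) for t in items] for s in items]


def combine_kernels(grams, weights):
    """Weighted sum of Gram matrices; weights are fixed here, not learned."""
    n = len(grams[0])
    return [[sum(w * g[i][j] for g, w in zip(grams, weights))
             for j in range(n)] for i in range(n)]


# Hypothetical data: symbolic event sequences and one measurement per patient.
sequences = [["A", "B", "C"], ["A", "C", "B"], ["D", "D"]]
measurements = [1.2, 1.1, 3.4]

K_sym = gram_matrix(sequences, edit_kernel)
K_real = gram_matrix(measurements, rbf_kernel)
K = combine_kernels([K_sym, K_real], [0.5, 0.5])
```

A combined Gram matrix such as `K` could then be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.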

The proposed method was evaluated on a prediction task: identifying patients likely to develop type 2 diabetes following an earlier episode of elevated blood pressure of 130/80 mmHg. Kernels combined via MKL achieved an F1-score of 0.96, outperforming SVM (0.63), Logistic Regression (0.63), Long Short-Term Memory (0.61) and Multi-Layer Perceptron (0.54) classifiers applied to a BoW representation of the data. An F1-score of 0.91 was achieved by combining symbolic kernels with kernels derived from 11 real-valued test measurements. A higher F1-score of 0.93 was achieved with a similar heterogeneous combination of kernels derived from symbolic EHR and from a single test measure, ‘Serum bilirubin level’ (Read code 44E..00). In addition, as external validation of the proposed framework, MKL achieved an F1-score of 0.97 on an external dataset.
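For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for a binary task; the abstract does not state which averaging scheme was used, so the positive-class (binary) form is an assumption, and the labels are made up for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = develops the condition, 0 = does not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

score = f1_score(y_true, y_pred)
print(round(score, 2))
```

Here there are three true positives, one false positive and one false negative, so precision and recall are both 0.75.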

The proposed approach is consequently able to overcome the limitations associated with feature-based classification in the context of clinical data.

Sustainable Development Goals: 9 Industry, innovation and infrastructure; 3 Good health and well-being
Middlesex University Theme: Health & Wellbeing
Department name: Computer Science
Institution name: Middlesex University
Publisher: Middlesex University Research Repository
Publication dates
  Online: 13 Mar 2024
Publication process dates
  Accepted: 16 Mar 2023
  Deposited: 13 Mar 2024
Output status: Published
Accepted author manuscript
File access level: Open
Language: English
Permalink: https://repository.mdx.ac.uk/item/10xq8v

Download files


Accepted author manuscript
NONwegbu thesis.pdf
File access level: Open
