Evaluating machine-learning models for predicting hospital transfers in administrative data: a study of admissions for myocardial infarction

Research output: Contribution to journal › Abstract/Meeting Abstract › peer-review


Introduction
Hospital administrative data are a valuable source for measuring myocardial infarction (MI) rates. However, admission counts are inflated when a patient is transferred between hospitals during a single episode of care, because each transfer generates a separate admission record, and the variables that denote transfers may not be reliable. To obtain an accurate number of events, hospital transfers need to be correctly identified.

Objectives and Approach
We assessed multivariable logistic regression and various machine-learning models for predicting transfers in hospital administrative data. Using Western Australian linked hospital data, we identified records from 2000-2016 with a principal discharge diagnosis of MI. The reference standard was a 24-hour look-back, which identifies a transfer using only the admission and separation dates of the current and previous records for the same patient. Multivariable logistic regression and decision trees with various boosting algorithms were used to predict whether a single record was a transfer, using variables recorded in the admission (e.g. age, sex, type of hospital, admitted from, emergency/elective admission). The performance of each model was assessed using metrics including area under the curve (AUC).
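The 24-hour look-back described above could be sketched roughly as follows. This is an illustrative sketch only, not the study's code: the field names (`record_id`, `patient_id`, `admitted`, `separated`), the function name `flag_transfers`, the use of timestamps rather than calendar dates, and the treatment of overlapping stays as transfers are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def flag_transfers(records, window=timedelta(hours=24)):
    """Map each record id to True if the record was admitted within
    `window` of the same patient's most recent separation.

    Field names are hypothetical; linked data is assumed, so records
    for one patient share a `patient_id`.
    """
    flags = {}
    last_separation = {}  # patient_id -> latest separation seen so far
    ordered = sorted(records, key=lambda r: (r["patient_id"], r["admitted"]))
    for rec in ordered:
        prev = last_separation.get(rec["patient_id"])
        # Assumption: an admission that overlaps the previous stay
        # (negative gap) is also counted as a transfer.
        flags[rec["record_id"]] = prev is not None and rec["admitted"] - prev <= window
        if prev is None or rec["separated"] > prev:
            last_separation[rec["patient_id"]] = rec["separated"]
    return flags

# Made-up records: patient 1 is moved to a second hospital five hours
# after separation, then readmitted two months later.
records = [
    {"record_id": "a", "patient_id": 1,
     "admitted": datetime(2010, 1, 1, 8), "separated": datetime(2010, 1, 3, 10)},
    {"record_id": "b", "patient_id": 1,
     "admitted": datetime(2010, 1, 3, 15), "separated": datetime(2010, 1, 8, 9)},
    {"record_id": "c", "patient_id": 1,
     "admitted": datetime(2010, 3, 1, 9), "separated": datetime(2010, 3, 4, 9)},
    {"record_id": "d", "patient_id": 2,
     "admitted": datetime(2010, 1, 3, 12), "separated": datetime(2010, 1, 5, 9)},
]

flags = flag_transfers(records)
# Records not flagged as transfers each start a new episode of care.
episodes = sum(not is_transfer for is_transfer in flags.values())
```

In this toy data only record "b" is flagged, so four admission records collapse to three episodes. The look-back depends on `patient_id`, which is why it cannot be applied to unlinked data.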

Results
Records in the training, validation and testing samples had similar characteristics: mean age was 68.9 years, 66% were male and 58% were admitted to tertiary hospitals. The Gradient Boosting Decision Tree model (AUC=0.887; 95% CI: 0.886-0.887) outperformed multivariable logistic regression (AUC=0.875; 95% CI: 0.869-0.881) and random forest models (AUC=0.859; 95% CI: 0.853-0.865).
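For readers unfamiliar with the metric, the AUC is the probability that a randomly chosen transfer record receives a higher predicted score than a randomly chosen non-transfer record. The sketch below computes it directly from that definition and attaches a percentile bootstrap interval. The abstract does not state how its confidence intervals were obtained, so the bootstrap is an assumption, as are the function names and the toy labels and scores.

```python
import random

def auc(labels, scores):
    """Rank-based (Mann-Whitney) AUC for 0/1 labels; ties count as 0.5.

    Quadratic in the number of records, which is fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data: 1 = transfer, 0 = not a transfer; scores are model outputs.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

point_estimate = auc(labels, scores)  # 12 of 16 positive/negative pairs ranked correctly = 0.75
lower, upper = bootstrap_ci(labels, scores)
```

With only eight records the interval is very wide. Intervals as narrow as those reported above require samples on the scale of a multi-year state-wide dataset.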

Conclusion / Implications
Multivariable logistic regression and machine-learning models can identify transfers from the variables already present in a single record. They can therefore be used in unlinked hospital administrative data, where records belonging to the same patient cannot be identified.
Original language: English
Journal: International Journal of Population Data Science
Issue number: 5
Publication status: Published - 9 Dec 2020
Event: International Population Data Linkage Network 2020 Conference: Data Linkage: Information to Impact - Virtual conference, Virtual
Duration: 1 Nov 2020 to 13 Nov 2020

