Automatic data extraction from 24 hour blood pressure measurement reports of a large multicenter clinical trial

Janis M. Nolde, Ajmal Mian, Luca Schlaich, Justine Chan, Leslie Marisol Lugo-Gavidia, Nicola Barrie, Vishal Gopal, Graham S. Hillis, Clara K. Chow, Markus P. Schlaich

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


Background and objectives: Ambulatory blood pressure monitoring (ABPM) is usually reported as descriptive values such as circadian averages and standard deviations. Making use of the original, individual blood pressure measurements may be advantageous, particularly for research purposes, as this increases the flexibility of the analytical process, enables alternative statistical analyses and may provide novel insights. Here we describe the development of a new multistep, hierarchical data extraction algorithm to collect raw data from .pdf reports and text files as part of a large multicenter clinical study.

Methods: Original reports were saved in a nested file system, from which they were automatically extracted, read and saved into databases with custom-made programs written in Python 3. Data were further processed and cleaned, and relevant descriptive statistics such as averages and standard deviations were calculated according to a variety of definitions of day- and night-time. Additionally, data control mechanisms for manual review of the data and programmatic auto-detection of extraction errors were implemented as part of the project.

Results: The developed algorithm extracted 97% of the data automatically; the missing data consisted mostly of reports that were saved incorrectly or not formatted in the specified way. Manual checks comparing samples of the extracted data to original reports indicated a high level of accuracy; no errors attributable to flaws in the extraction software were detected in the extracted dataset.

Conclusions: The developed multistep, hierarchical data extraction algorithm facilitated collection from different file formats and, paired with database cleaning and data processing steps, led to an effective and accurate assembly of raw ABPM data for further, adjustable analyses. Manual work was minimized while data quality was ensured with standardized, reproducible procedures.
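As an illustration of the processing step described in the Methods, the following is a minimal Python 3 sketch of how day- and night-time averages and standard deviations might be computed from raw readings under an adjustable day-time definition, with a simple programmatic plausibility check. It is not the authors' implementation: the record layout, the function names `summarize` and `flag_implausible`, the default 06:00 to 22:00 day window and the plausibility ranges are all assumptions made for this example.

```python
from datetime import datetime, time
from statistics import mean, stdev

# Assumed record layout (hypothetical): (timestamp, systolic mmHg, diastolic mmHg).


def summarize(readings, day_start=time(6, 0), day_end=time(22, 0)):
    """Mean and SD of systolic/diastolic pressure for day and night.

    A reading counts as day-time if day_start <= clock time < day_end;
    the window is a parameter so alternative definitions can be compared.
    """
    periods = {"day": [], "night": []}
    for ts, sbp, dbp in readings:
        key = "day" if day_start <= ts.time() < day_end else "night"
        periods[key].append((sbp, dbp))

    summary = {}
    for key, values in periods.items():
        if len(values) < 2:
            # A sample standard deviation needs at least two readings.
            summary[key] = None
            continue
        sbps = [v[0] for v in values]
        dbps = [v[1] for v in values]
        summary[key] = {
            "n": len(values),
            "sbp_mean": mean(sbps),
            "sbp_sd": stdev(sbps),
            "dbp_mean": mean(dbps),
            "dbp_sd": stdev(dbps),
        }
    return summary


def flag_implausible(readings, sbp_range=(60, 280), dbp_range=(30, 200)):
    """Return readings that look like extraction errors.

    The ranges are illustrative assumptions, not values from the study.
    """
    flagged = []
    for ts, sbp, dbp in readings:
        in_range = (sbp_range[0] <= sbp <= sbp_range[1]
                    and dbp_range[0] <= dbp <= dbp_range[1])
        if not in_range or sbp <= dbp:
            flagged.append((ts, sbp, dbp))
    return flagged
```

Calling `summarize` again with a different `day_start` and `day_end` recomputes the statistics from the same raw readings, which is the kind of flexibility the abstract attributes to working with individual measurements rather than pre-aggregated report values.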

Original language: English
Article number: 106588
Journal: Computer Methods and Programs in Biomedicine
Publication status: Published - Feb 2022


