Differential Privacy on Large Language Models for Privacy Preserving Clinical Coding

Research output: Chapter in Book/Conference paper › Conference paper › peer-review

Abstract

Recent advancements in Large Language Models (LLMs) have significantly enhanced performance across various Natural Language Processing (NLP) tasks. In certain fields, particularly healthcare, the risk of data leakage is a critical concern in research data management when employing LLMs. To ensure data privacy, recent studies have adopted approaches such as de-identification, which masks out personally identifiable information. However, these anonymisation techniques remain vulnerable to various attacks, including linkage attacks, attribute inference attacks, and membership inference attacks. Differential privacy is a robust anonymisation technique that addresses data leakage by constraining the influence of individual data samples during model training. Nonetheless, the trade-off between utility and privacy protection remains challenging. Moreover, while differential privacy has been extensively studied for tabular and image data, its application in NLP, especially with clinical data, is limited. In this paper, we explore the integration of differential privacy into the fine-tuning of LLMs on clinical data, covering a range of model sizes and privacy standards within a healthcare context. We use these LLMs to generate synthetic medical notes and assess the privacy and utility of our differentially private training approach by deploying these synthetic notes in a downstream clinical coding task. Our findings demonstrate that synthetic data from differentially private LLMs achieve classification accuracy comparable or superior to that of non-differentially-private LLMs.
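The mechanism the abstract refers to, constraining each sample's influence during training, is typically realised with DP-SGD: clip every per-example gradient to a fixed norm, then add calibrated Gaussian noise before the update. The sketch below is a minimal illustration on a least-squares model, not the paper's implementation; the function name `dp_sgd_step` and all hyperparameters are assumptions for the example.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD step (illustrative sketch, not the paper's code):
    clip each per-example gradient, then add Gaussian noise to the sum."""
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = []
    for xi, yi in zip(X, y):
        g = 2.0 * (w @ xi - yi) * xi  # per-example least-squares gradient
        scale = min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
        clipped.append(g * scale)     # bound each sample's influence
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    g_priv = (np.sum(clipped, axis=0) + noise) / len(X)  # noisy average gradient
    return w - lr * g_priv

# Toy usage: recover a linear map from synthetic data under DP-SGD.
rng = np.random.default_rng(42)
X = rng.normal(size=(32, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true
w = np.zeros(4)
for _ in range(200):
    w = dp_sgd_step(w, X, y, rng=rng)
```

The clipping bound is what makes the privacy accounting possible: with each sample's contribution capped at `clip_norm`, the Gaussian noise scale can be tied to a target privacy budget, which is the utility/privacy trade-off the abstract describes.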

Original language: English
Title of host publication: Proceedings of the International Joint Conference on Neural Networks
Publisher: IEEE, Institute of Electrical and Electronics Engineers
ISBN (Print): 9798331510428
DOIs
Publication status: Published - 2025
Event: 2025 International Joint Conference on Neural Networks, IJCNN 2025 - Rome, Italy
Duration: 30 Jun 2025 - 5 Jul 2025

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks
ISSN (Print): 2161-4393

Conference

Conference: 2025 International Joint Conference on Neural Networks, IJCNN 2025
Country/Territory: Italy
City: Rome
Period: 30/06/25 - 5/07/25
