TY - CPAPER
T1 - Differential Privacy on Large Language Models for Privacy Preserving Clinical Coding
AU - Marshall, Ben
AU - Li, Sirui
AU - Meka, Shiv Akarsh
AU - Liu, Wei
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Recent advances in Large Language Models (LLMs) have significantly enhanced performance across a wide range of Natural Language Processing (NLP) tasks. In certain fields, particularly healthcare, the risk of data leakage in research data management is a critical concern when employing LLMs. To protect data privacy, recent studies have adopted approaches such as de-identification, which masks out personally identifiable information. However, such anonymisation techniques remain vulnerable to attacks including linkage attacks, attribute inference attacks, and membership inference attacks. Differential privacy is a more robust anonymisation technique that addresses data leakage by constraining the influence of any individual data sample during model training. Nonetheless, managing the trade-off between utility and privacy protection remains challenging. Moreover, while differential privacy has been studied extensively for tabular and image data, its application in NLP, especially to clinical data, remains limited. In this paper, we explore the integration of differential privacy into the fine-tuning of LLMs on clinical data, covering a range of model sizes and privacy standards within a healthcare context. We utilise these LLMs to generate synthetic medical notes and assess the privacy and utility of our differentially private training approach by deploying the synthetic notes in a downstream clinical coding task. Our findings demonstrate that synthetic data generated by differentially private LLMs achieve classification accuracy comparable or superior to that of data from LLMs trained without differential privacy.
KW - Clinical Data
KW - Data Privacy
KW - Differential Privacy
KW - Large Language Models
KW - Privacy Preserving
UR - https://www.scopus.com/pages/publications/105029083856
U2 - 10.1109/IJCNN64981.2025.11229049
DO - 10.1109/IJCNN64981.2025.11229049
M3 - Conference paper
AN - SCOPUS:105029083856
SN - 9798331510428
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - Proceedings of the International Joint Conference on Neural Networks
PB - Institute of Electrical and Electronics Engineers (IEEE)
T2 - 2025 International Joint Conference on Neural Networks, IJCNN 2025
Y2 - 30 June 2025 through 5 July 2025
ER -