Motivation: Progress notes are narrative summaries of a patient's status during the course of treatment or care. Because of time and efficiency pressures, clinicians continue to prefer unstructured text to form-based data entry when composing progress notes. The ability to extract meaningful data from the unstructured text of these notes is invaluable for retrospective analysis and decision support. Automatic extraction of data from unstructured notes, however, has been largely hindered by the difficulty of handling abbreviations, misspellings, punctuation errors, and other types of noise.

Objective: We present a robust system for cleaning noisy progress notes in real time, with a focus on abbreviations and misspellings.

Methods: The system uses statistical semantic analysis based on Web data, together with the occasional participation of clinicians, to automatically replace abbreviations with their intended senses and misspellings with the correct words.

Results: An accuracy of up to 88.73% was achieved using statistical semantic analysis of Web data alone. With the caching mechanism enabled, the response time of the system is 1.5-2 s per word, which is comparable to the average typing speed of clinicians.

Conclusions: The overall accuracy and response time of the system will improve over time, especially once the confidence mechanism is activated through clinicians' interactions with the system. The system will be implemented in a clinical information system to drive interactive decision support and analysis functions, leading to improved patient care and outcomes.

© 2011 Elsevier B.V. All rights reserved.