Discussion – Text Mining the EHR
This special topic looks at the use of text as an additional source of data for medical information. Electronic health records (EHRs) contain a wide variety of data, such as vitals, medications, and procedure codes, in a structured format that makes it easy to retrieve the needed information. However, structured data alone may not be sufficient to adequately care for a patient because the data lacks specificity or may be missing entirely. Therefore, researchers have explored text mining as an alternative route for uncovering information buried in text.
Text is inherently messy, especially in the EHR. Unlike with structured data, automatically obtaining information from text is not a straightforward task. The meaning of a word or phrase often depends on its context (e.g., “cold” can refer to “being cold,” “having a cold,” or, as the abbreviation COLD, “chronic obstructive lung disease”). In addition, computer algorithms can have difficulty processing text that is grammatically incorrect, contains spelling errors, or uses ambiguous acronyms and abbreviations.
There are numerous techniques for processing text. The main reading is an example of statistical text mining (STM), a technique which uses the counts of words to describe a document instead of trying to infer meaning directly from the text. The reading discusses a project where STM was used to determine if evidence of a fall can be found in clinical progress notes.
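To make the idea of describing a document by its word counts concrete, here is a minimal sketch of a bag-of-words representation in Python. It is an illustration only, not the pipeline used in the reading: the note snippets are made up, and the helper names `tokenize` and `document_term_matrix` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, made-up note snippets for illustration only (not real patient data).
notes = [
    "Patient fell at home and hurt left hip.",
    "No fall reported. Patient reports cough and cold.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def document_term_matrix(docs):
    """Return (vocabulary, rows); each row holds one document's word counts."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows

vocabulary, rows = document_term_matrix(notes)

# Print one line per word: the word, then its count in each note.
for term, column in zip(vocabulary, zip(*rows)):
    print(f"{term:10s} {column}")
```

Each note becomes a row of counts over a shared vocabulary. Note that “fell” and “fall” end up as separate columns and that “No fall reported” still counts the word “fall”: the representation records which words occur, not what they mean. A statistical model would typically be trained on such counts, together with labeled examples, to learn which word patterns are associated with a fall.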
The main reading for this special topic is the McCart et al. (JAMIA 2012) article “Finding Falls in Ambulatory Care Clinical Documents using Statistical Text Mining” (McCart_et_al_12_jamia_falls.pdf).
Further Readings (Optional)
This additional reading gives another example of text mining, here used to support the development of an ontology (Luther_et_al_11_stm_ontology.pdf). Ontologies are widely used to support text-based analysis such as natural language processing (NLP), information extraction (IE), and information retrieval (IR).
One primary post (minimum 300 words) i