The Apache™ clinical Text Analysis and Knowledge Extraction System (cTAKES™) focuses on extracting knowledge from clinical text through Natural Language Processing (NLP) techniques.
cTAKES is engineered in a modular fashion and employs leading-edge rule-based and machine learning methods.
cTAKES has standard features for biomedical text processing software, including the ability to extract concepts such as symptoms, procedures, diagnoses, medications and anatomy with attributes and standard codes.
More powerful components can perform tasks as complex as identifying temporal events, dates and times – resulting in placement of events in a patient timeline.
Components are trained on gold standards from the biomedical as well as the general domain. This affords usability across different types of clinical narrative (e.g. radiology reports, clinical notes, discharge summaries) in various institution formats as well as other types of health-related narrative (e.g. twitter feeds), using multiple data standards (e.g. Health Level 7 (HL7), Clinical Document Architecture (CDA), Fast Healthcare Interoperability Resources (FHIR), SNOMED-CT, RxNORM).
cTAKES is the NLP platform for many initiatives across the world covering a variety of research purposes and large datasets. Contributors include professionals at medical and commercial institutions, NLP and Machine Learning researchers, Medical Doctors, and students of many disciplines and levels. We encourage people from all backgrounds to get involved!