Abstract
With the increasingly widespread adoption of electronic health records (EHRs) worldwide, there is a need to extend the use of this technology beyond patient care and into clinical research. Doctors’ summaries of patient visits are valuable for this purpose because they provide a detailed record of a patient’s care, ranging from the types of treatment administered to the prescriptions issued. However, their narrative style is a major barrier to clinical research: the unstructured content is difficult for machines to read. In this study, I created a computational model that classifies EHRs into diagnostic categories. It uses a feature extraction method that recognizes relevant variable-length phrases, and a modified decision tree that classifies doctors’ notes. The model determined patients’ smoking status with an accuracy of 91% and identified patients with obesity-related complications with an accuracy of 82%. I also trained the model to analyze specific words and phrases that recur throughout doctors’ notes, such as “tobacco,” “cigarette,” and “pregnant.” In addition, I created a procedural ontology mapping tool that could potentially be used to find previously unknown links between symptoms. Through these computational predictions, I formulated a generalized method for the early clinical diagnosis of diseases and ailments, and for improving understanding of the symptoms associated with them.
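The abstract does not specify how phrases are extracted or how the decision tree is modified, so the following is only a minimal sketch under stated assumptions. “Variable-length phrases” are taken to be word n-grams of length one to three. A plain ID3-style decision tree that splits on phrase presence by information gain stands in for the modified decision tree. The function names (`phrases`, `build_tree`, `predict`) and the toy notes and labels are hypothetical illustrations, not the study’s code or data.

```python
from collections import Counter
from math import log2


def phrases(text, max_n=3):
    """Return the set of word n-grams (n = 1..max_n) in a note (assumed feature definition)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())


def best_phrase(rows, labels, candidates):
    """Pick the phrase whose presence/absence split gives the highest information gain."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for p in sorted(candidates):  # sorted so ties break deterministically
        yes = [l for r, l in zip(rows, labels) if p in r]
        no = [l for r, l in zip(rows, labels) if p not in r]
        if not yes or not no:
            continue  # phrase does not split this node
        gain = base - (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
        if gain > best_gain:
            best, best_gain = p, gain
    return best


def build_tree(rows, labels, candidates, depth=0, max_depth=3):
    """Grow a tree whose internal nodes are (phrase, present_branch, absent_branch)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    p = best_phrase(rows, labels, candidates)
    if p is None:
        return majority
    yes = [(r, l) for r, l in zip(rows, labels) if p in r]
    no = [(r, l) for r, l in zip(rows, labels) if p not in r]
    rest = candidates - {p}
    return (p,
            build_tree([r for r, _ in yes], [l for _, l in yes], rest, depth + 1, max_depth),
            build_tree([r for r, _ in no], [l for _, l in no], rest, depth + 1, max_depth))


def predict(tree, note):
    """Walk the tree using the phrases found in a new note."""
    feats = phrases(note)
    while isinstance(tree, tuple):
        phrase, present, absent = tree
        tree = present if phrase in feats else absent
    return tree


# Hypothetical toy notes and smoking-status labels; not data from the study.
notes = [
    "patient is a current smoker",
    "current smoker one pack daily",
    "patient quit smoking years ago",
    "former smoker quit in 2010",
    "patient denies tobacco use",
    "no tobacco use reported",
]
labels = ["current", "current", "past", "past", "non-smoker", "non-smoker"]

rows = [phrases(n) for n in notes]
tree = build_tree(rows, labels, set().union(*rows))
print(tree)
print(predict(tree, "quit smoking in 2015"))
```

On this toy set the learned tree first tests for the phrase “current” and then for “quit.” A real EHR corpus would call for a much larger candidate vocabulary and some form of pruning or phrase-relevance filtering, which the abstract’s “modified” tree may address in ways not described here.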