Standard NER Tagging Scheme for Big Data Healthcare Analytics Built on Unified Medical Corpora
Keywords: big data, endocrine diseases, International Diabetes Federation, healthcare analytics, ICD-10, medical corpora, NLP
The motivation for this research is a gap in establishing common ground for medical context learning, that is, for analytics that diagnose, recommend, prescribe, or treat patients on the basis of uniform phenotype features drawn from patient profiles. While searching for possible solutions for medical context learning, the authors found that no unified corpora tagged with medical nomenclature were available to train such analytics. We therefore demonstrate a mechanism for producing uniformly NER (Named Entity Recognition) tagged medical corpora. The first corpus is built from a data set of 14407 endocrine patients, in Comma Separated Values (CSV) format, diagnosed with diabetes mellitus and comorbid diseases. The other corpus is the ICD-10-CM coding scheme in text format, taken from www.icd10data.com. The ICD-10-CM corpus is to be tagged so that medical context can be understood uniformly; to this end we are conducting experiments using common natural language processing (NLP) techniques and frameworks, namely TensorFlow and Keras with Long Short-Term Memory (LSTM) and Bi-LSTM models. In our preliminary experiments, label sets in the form of (instance, label) pairs were tagged with a Sequential() model built in TensorFlow.Keras using the Bi-LSTM algorithm. The maximum validation accuracy achieved was 0.8846.
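To make the idea of (instance, label) pairs concrete, the sketch below tags one ICD-10-CM style line token by token. It is a minimal illustration, not the authors' tagging scheme: the BIO (Begin/Inside/Outside) convention, the label names `CODE`, `DISEASE`, and `MODIFIER`, the small `LEXICON`, the code-matching regular expression, and the function name `bio_tag` are all hypothetical assumptions chosen for the example.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lexicon mapping lower-cased phrases to illustrative labels.
# The label set used in the paper is not specified in the abstract.
LEXICON: Dict[str, str] = {
    "type 2 diabetes mellitus": "DISEASE",
    "diabetes mellitus": "DISEASE",
    "hypertension": "DISEASE",
    "without complications": "MODIFIER",
}

# Assumed pattern for ICD-10-CM style codes such as I10 or E11.9:
# a letter, a digit, an alphanumeric character, then an optional dotted suffix.
ICD_CODE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")


def bio_tag(line: str, lexicon: Dict[str, str] = LEXICON) -> List[Tuple[str, str]]:
    """Return (token, BIO tag) pairs for a whitespace-tokenized line."""
    tokens = line.split()
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    # Check longer phrases first so "type 2 diabetes mellitus" wins over
    # the shorter "diabetes mellitus" (longest-match tagging).
    phrases = sorted(lexicon, key=lambda p: len(p.split()), reverse=True)
    i = 0
    while i < len(tokens):
        if ICD_CODE.match(tokens[i]):
            tags[i] = "B-CODE"
            i += 1
            continue
        for phrase in phrases:
            words = phrase.split()
            n = len(words)
            if lowered[i:i + n] == words:
                label = lexicon[phrase]
                tags[i] = f"B-{label}"          # first token of the entity
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"      # remaining tokens of the entity
                i += n
                break
        else:
            i += 1  # no phrase starts here; token stays "O"
    return list(zip(tokens, tags))


if __name__ == "__main__":
    for pair in bio_tag("E11.9 Type 2 diabetes mellitus without complications"):
        print(pair)
```

In a pipeline like the one described, pairs of this kind would typically be integer-encoded and padded before being fed to a sequence tagger such as a tf.keras `Sequential()` model with a bidirectional LSTM layer; the lexicon lookup here only stands in for whatever annotation source the authors actually used.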
Copyright (c) 2022 Authors
This work is licensed under a Creative Commons Attribution 4.0 International License.