Standard NER Tagging Scheme for Big Data Healthcare Analytics built on Unified Medical Corpora
DOI: https://doi.org/10.37965/jait.2022.0127
Keywords: big data, endocrine diseases, international diabetes federation, healthcare analytics, ICD-10, medical corpora, NLP
Abstract
The motivation for this research is a gap in finding common ground for medical context learning: analytics that can serve different purposes (diagnosing, recommending, prescribing, or treating patients) from uniform phenotype features in patients' profiles. While searching for possible solutions for medical context learning, the authors found that no unified corpus tagged with medical nomenclature was available to train such analytics. We therefore demonstrate a mechanism for producing uniform NER (Named Entity Recognition) tagged medical corpora. The first corpus is fed with a data set of 14,407 endocrine patients, in Comma Separated Values (CSV) format, diagnosed with diabetes mellitus and comorbid diseases. The other corpus is the ICD-10-CM coding scheme in text format, taken from www.icd10data.com. The ICD-10-CM corpus is to be tagged so that medical context can be understood uniformly, and we are conducting experiments on it using common natural language processing (NLP) techniques and frameworks such as TensorFlow, Keras, Long Short-Term Memory (LSTM), and Bi-LSTM. In our preliminary experiments, label sets in the form of (instance, label) pairs were tagged with a Sequential() model built on TensorFlow.Keras and the Bi-LSTM algorithm. The maximum accuracy achieved in model validation was 0.8846.
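To make the "(instance, label) pair" idea concrete, here is a minimal sketch, assuming a BIO labeling scheme and a small hand-written lexicon. The entity types (CODE, DISEASE), the lexicon entries, and the simplified code pattern are hypothetical illustrations, not the authors' tag set or pipeline. It turns one ICD-10-CM-style line into token-level pairs of the kind a Keras Sequential() Bi-LSTM tagger could be trained on, and adds a token-level accuracy helper; the abstract does not state how its 0.8846 validation accuracy was computed.

```python
import re
from typing import Dict, List, Tuple

# Simplified ICD-10-CM code shape (letter, digit, alphanumeric, optional ".xxxx").
# Hypothetical approximation for illustration, not the official validation rule.
ICD_CODE = re.compile(r"[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?")

# Hypothetical mini-lexicon mapping lowercase token phrases to an entity type.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("type", "2", "diabetes", "mellitus"): "DISEASE",
    ("diabetes", "mellitus"): "DISEASE",
    ("hypertension",): "DISEASE",
}
MAX_LEN = max(len(k) for k in LEXICON)


def tag(text: str) -> List[Tuple[str, str]]:
    """Return (instance, label) pairs with BIO labels, using longest lexicon match."""
    tokens = text.split()
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        if ICD_CODE.fullmatch(tokens[i]):
            labels[i] = "B-CODE"
            i += 1
            continue
        # Try the longest lexicon phrase starting at token i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in LEXICON:
                ent = LEXICON[key]
                labels[i] = f"B-{ent}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{ent}"
                i += n
                break
        else:
            # No phrase starts here: leave the token as "O" and move on.
            i += 1
    return list(zip(tokens, labels))


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


pairs = tag("E11.9 Type 2 diabetes mellitus without complications")
for token, label in pairs:
    print(token, label)
```

Here "E11.9" is labeled B-CODE, "Type 2 diabetes mellitus" becomes one B-DISEASE/I-DISEASE span, and the remaining tokens are "O". In a learned tagger such pairs would be integer-encoded and padded before training; the lexicon lookup above only stands in for how gold labels might be produced.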
License
Copyright (c) 2022 Authors
This work is licensed under a Creative Commons Attribution 4.0 International License.