Standard NER Tagging Scheme for Big Data Healthcare Analytics built on Unified Medical Corpora

Authors

  • Sarah Shafqat Department of Basic and Applied Sciences, International Islamic University, Pakistan https://orcid.org/0000-0002-6080-6765
  • Hammad Majeed Department of Computer Science, National University of Computer and Emerging Sciences, Pakistan
  • Qaisar Javaid Department of Basic and Applied Sciences, International Islamic University, Pakistan
  • Hafiz Farooq Ahmad Computer Science Department, College of Computer Sciences and Information Technology, King Faisal University, Saudi Arabia

DOI:

https://doi.org/10.37965/jait.2022.0127

Keywords:

big data, endocrine diseases, International Diabetes Federation, healthcare analytics, ICD-10, medical corpora, NLP

Abstract

This research is motivated by a gap: there is no common ground for medical context learning that analytics can share across the tasks of diagnosing, recommending, prescribing, and treating, based on uniform phenotype features drawn from patients’ profiles. While searching for possible solutions, we found that no unified corpora tagged with medical nomenclature were available to train analytics for medical context learning. We therefore demonstrate a mechanism for producing uniformly NER (Named Entity Recognition) tagged medical corpora. The first corpus is a data set of 14,407 endocrine patients diagnosed with diabetes mellitus and comorbid diseases, supplied in Comma Separated Values (CSV) format. The second is the ICD-10-CM coding scheme in text format, taken from www.icd10data.com. The ICD-10-CM corpus is to be tagged so that the medical context can be understood uniformly, and to this end we are conducting experiments with common natural language processing (NLP) techniques and frameworks such as TensorFlow, Keras, Long Short-Term Memory (LSTM), and Bi-LSTM. In our preliminary experiments, label sets in the form of (instance, label) pairs were tagged with a Sequential() model built on TensorFlow.Keras and Bi-LSTM. The maximum validation accuracy achieved was 0.8846.
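As a rough illustration of what (instance, label) pairs drawn from ICD-10-CM text might look like before they reach a sequence model, the following Python sketch may help. It is not the authors' pipeline. The "code, then description" line layout, the choice of the three-character category as the label, and the helper names to_pairs, build_vocab, and encode are all assumptions made for this example.

```python
import re
from typing import Dict, List, Tuple

# Assumed input layout: "<ICD-10-CM code> <description>" on each line.
# The actual text format of the corpus used in the paper may differ.
SAMPLE_LINES = [
    "E10.9 Type 1 diabetes mellitus without complications",
    "E11.9 Type 2 diabetes mellitus without complications",
    "I10 Essential (primary) hypertension",
]

# Letter, digit, alphanumeric, then an optional dot and 1-4 alphanumerics.
CODE_PATTERN = re.compile(r"^([A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?)\s+(.+)$")

Pair = Tuple[List[str], str]


def to_pairs(lines: List[str]) -> List[Pair]:
    """Turn each line into (instance, label): tokens plus the code category."""
    pairs: List[Pair] = []
    for line in lines:
        match = CODE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        code, description = match.groups()
        tokens = re.findall(r"[a-z0-9]+", description.lower())
        label = code.split(".")[0]  # hypothetical label choice, e.g. "E11"
        pairs.append((tokens, label))
    return pairs


def build_vocab(pairs: List[Pair]) -> Dict[str, int]:
    """Map every token to an integer id; 0 is padding, 1 is unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens, _ in pairs:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert tokens to ids, truncated or right-padded to max_len."""
    ids = [vocab.get(token, vocab["<unk>"]) for token in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


pairs = to_pairs(SAMPLE_LINES)
vocab = build_vocab(pairs)
encoded = [encode(tokens, vocab, max_len=8) for tokens, _ in pairs]
```

Fixed-length integer sequences of this kind are the usual input to an embedding layer followed by a Bi-LSTM, which is why the sketch stops at encoding rather than guessing at the model configuration.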

Author Biographies

Hammad Majeed, Department of Computer Science, National University of Computer and Emerging Sciences, Pakistan

Dr. Hammad Majeed teaches at the National University of Computer and Emerging Sciences as an Associate Professor. His research interests include Artificial Intelligence, Computational Intelligence, Machine Learning, Data Mining & Knowledge Discovery, Evolutionary Gaming, and Machine Vision & Robotics. He is actively involved in research in these areas.

Qaisar Javaid, Department of Basic and Applied Sciences, International Islamic University, Pakistan

Assistant Professor / Incharge, CISCO Networking Academy

Hafiz Farooq Ahmad, Computer Science Department, College of Computer Sciences and Information Technology, King Faisal University, Saudi Arabia

Hafiz Farooq Ahmad received the Ph.D. degree in distributed computing from Tokyo Institute of Technology, Tokyo, Japan. He is currently an Associate Professor with the College of Computer Sciences and Information Technology (CCSIT), King Faisal University, Al Ahsa, Saudi Arabia. He pioneered the Semantic Web Application Firewall (SWAF) in cooperation with DTS Inc., Japan, in 2010. He contributed to the Agentcities project, a European funded research and development project for agent systems. He initiated the Scalable fault tolerant Agent Grooming Environment (SAGE) project and proposed the concept of the decentralized multi-agent system SAGE back in 2002. He has more than 100 international publications, including a book on security in sensors. His research interests include semantic systems, machine learning, health informatics, and Web application security. He has received a number of national and international awards, such as the Best Researcher Award of the Year 2011 by NUST, the PSF/COMSTECH Best Researcher of the Year 2005, and the Star Laureate Award in 2004.

Published

2022-08-22

How to Cite

Shafqat, S., Majeed, H., Javaid, Q., & Ahmad, H. F. (2022). Standard NER Tagging Scheme for Big Data Healthcare Analytics built on Unified Medical Corpora. Journal of Artificial Intelligence and Technology, 2(4), 152–157. https://doi.org/10.37965/jait.2022.0127

Section

Research Article