Comparative Analysis of Pre-Trained Language Models for Medical Document Classification and Priority-Based Workflow Routing


Qiaomu Zhang

Abstract

Medical document processing in healthcare systems faces significant challenges due to the exponential growth in data volume and the complexity of clinical terminology. This paper presents a comprehensive comparative analysis of pre-trained language models for medical document classification and priority-based workflow routing. We evaluate BioBERT, ClinicalBERT, and base BERT models through systematic fine-tuning on diverse medical document types, including clinical notes, diagnostic reports, and insurance claims. Our multi-task learning architecture simultaneously performs document classification and priority scoring, achieving 94.7% classification accuracy and an AUC-ROC of 0.928 for urgency detection. The proposed approach reduces per-document handling time by 99.9%, cutting average review time from 4.3 minutes under manual processing to 0.31 seconds, while maintaining high accuracy across heterogeneous medical texts. Experimental results on 45,000 annotated medical documents demonstrate that domain-adapted models outperform general-purpose transformers by 8.3 percentage points. The integration of shared representation learning with task-specific output layers enables efficient workflow optimization, sustaining this 0.31-second per-document throughput with GPU acceleration. These findings provide actionable insights for healthcare organizations implementing automated document management systems.
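The multi-task design described above, a shared encoder representation feeding two task-specific output layers, can be sketched minimally as follows. This is an illustrative reconstruction, not the authors' released code: the random weights, hidden size, and three document classes stand in for a fine-tuned BioBERT encoder whose pooled [CLS] vector would be the input in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 768          # BERT-base hidden size
NUM_CLASSES = 3       # e.g. clinical note, diagnostic report, insurance claim

# Hypothetical task-specific parameters standing in for fine-tuned weights.
W_cls = rng.standard_normal((HIDDEN, NUM_CLASSES)) * 0.02
b_cls = np.zeros(NUM_CLASSES)
w_pri = rng.standard_normal(HIDDEN) * 0.02
b_pri = 0.0

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_task_heads(shared_repr):
    """Map a shared document representation to both task outputs.

    Returns (class probabilities, priority score in [0, 1]); the priority
    head uses a sigmoid so the score can drive urgency-based routing.
    """
    probs = softmax(shared_repr @ W_cls + b_cls)
    priority = 1.0 / (1.0 + np.exp(-(shared_repr @ w_pri + b_pri)))
    return probs, priority

# A batch of two pooled encoder outputs (in practice, BioBERT [CLS] vectors).
batch = rng.standard_normal((2, HIDDEN))
probs, priority = multi_task_heads(batch)
```

Because both heads read the same shared representation, a single encoder forward pass serves classification and urgency scoring at once, which is what makes the sub-second per-document routing plausible.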


How to Cite

Comparative Analysis of Pre-Trained Language Models for Medical Document Classification and Priority-Based Workflow Routing. (2026). Journal of Sustainability, Policy, and Practice, 1(4), 205-221. https://schoalrx.com/index.php/jspp/article/view/75
