Hybrid Machine Learning Approaches for Classification of Retinal Vascular Occlusions Using Multisource Clinical Text Data

Santosh Khanal; Rabindra Bista

doi:10.56294/hl2025918

Authors

Santosh Khanal, Kathmandu University, Department of Computer Science and Engineering, Dhulikhel, Nepal. ORCID: https://orcid.org/0009-0002-5946-4646
Rabindra Bista, Kathmandu University, Department of Computer Science and Engineering, Dhulikhel, Nepal. ORCID: https://orcid.org/0000-0002-0638-5840

Keywords:

Predictive models, Unstructured clinical data, Natural Language Processing (NLP), Retinal vascular occlusions, Predictive accuracy

Abstract

Predictive models that incorporate diverse clinical data have become increasingly important for improving healthcare decision-making. A sizable share of patient data nevertheless remains unstructured, either as free-text doctor notes or as handwritten records that have been digitized. This research builds a framework that performs predictive analysis over two such data sources: digital clinical notes and scanned handwritten notes. The data chosen for the research relate to painless sudden loss of vision, which is considered a serious ophthalmic emergency and is frequently associated with retinal vascular occlusions. Improving patient outcomes and enabling prompt intervention require early distinction among its primary causes: Central Retinal Artery Occlusion (CRAO), Central Retinal Vein Occlusion (CRVO), Branch Retinal Artery Occlusion (BRAO), and Branch Retinal Vein Occlusion (BRVO). In the first stage, the unstructured clinical data in textual and scanned formats are converted into a single structured data frame using natural language processing (NLP). The structured data are then evaluated with machine learning algorithms, tested in several variations, to identify the model that delivers the highest predictive accuracy, guided by the characteristics of the clinical data itself.
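
As an illustration of the first stage only, the following is a minimal sketch of how free-text notes from the two sources might be mapped onto one structured table. It is not the authors' implementation: the field names (age, eye, hypertension, diabetes), the regular expressions, and the helper names note_to_row and build_table are all assumptions made for this example, and scanned notes are assumed to have already been converted to text by a separate handwriting-recognition step.

```python
import re

# Hypothetical patterns; the abstract does not list the features actually extracted.
AGE = re.compile(r"\bage[:\s]+(\d{1,3})\b", re.IGNORECASE)
EYE = re.compile(r"\b(right|left)\s+eye\b", re.IGNORECASE)
HYPERTENSION = re.compile(r"\bhypertensi(?:on|ve)\b", re.IGNORECASE)
DIABETES = re.compile(r"\bdiabet(?:es|ic)\b", re.IGNORECASE)


def note_to_row(note_text, source):
    """Map one free-text note to a flat record (one row of the structured table)."""
    age = AGE.search(note_text)
    eye = EYE.search(note_text)
    return {
        "source": source,  # "digital" or "scanned" (assumed labels)
        "age": int(age.group(1)) if age else None,
        "eye": eye.group(1).lower() if eye else None,
        # Plain keyword matching: negated mentions ("no diabetes") are NOT handled.
        "hypertension": HYPERTENSION.search(note_text) is not None,
        "diabetes": DIABETES.search(note_text) is not None,
    }


def build_table(notes):
    """Combine (text, source) pairs from both sources into a single list of rows."""
    return [note_to_row(text, source) for text, source in notes]


if __name__ == "__main__":
    sample_notes = [
        ("Age: 62. Sudden painless vision loss in the right eye. Known hypertensive.", "digital"),
        ("age 47, left eye, diabetic", "scanned"),
    ]
    for row in build_table(sample_notes):
        print(row)
```

A list of dictionaries with identical keys can be passed directly to a data-frame constructor, which is why both sources are reduced to the same record shape before any classifier is trained.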
