Hybrid Machine Learning Approaches for Classification of Retinal Vascular Occlusions Using Multisource Clinical Text Data
DOI:
https://doi.org/10.56294/hl2025918
Keywords:
Predictive models, Unstructured clinical data, Natural Language Processing (NLP), Retinal vascular occlusions, Predictive accuracy
Abstract
Predictive models that incorporate diverse clinical data have become increasingly important for improving healthcare decision-making. A sizable share of patient data nevertheless remains unstructured, either as free-text physician notes or as handwritten records that have been digitized. This research builds a framework that combines two data sources, digital clinical notes and scanned handwritten notes, for predictive analysis. The data chosen for the study relate to painless sudden loss of vision, a serious ophthalmic emergency that is frequently associated with retinal vascular occlusions. Improving patient outcomes and enabling prompt intervention require early differentiation between its primary causes: Central Retinal Artery Occlusion (CRAO), Central Retinal Vein Occlusion (CRVO), Branch Retinal Artery Occlusion (BRAO), and Branch Retinal Vein Occlusion (BRVO). In the first stage, the unstructured clinical data in typed and scanned form are converted into a single structured data frame using natural language processing (NLP). The structured data are then evaluated with machine learning algorithms, tested across several variations, to identify the model that delivers the highest predictive accuracy, guided by the characteristics of the clinical data itself.
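The first stage described in the abstract, turning free-text notes from two sources into one structured table, can be illustrated with a minimal sketch. It is not the authors' pipeline: the field names, the regular-expression patterns, and the sample notes are all hypothetical, the scanned note is assumed to have already passed through OCR, and negation ("no diabetes") is deliberately not handled.

```python
import re

# Hypothetical extraction patterns; the paper does not publish its rules.
AGE = re.compile(r"\b(\d{1,3})\s*-?\s*(?:years?|yrs?)\b", re.IGNORECASE)
EYE = re.compile(r"\b(right|left)\s+eye\b", re.IGNORECASE)
FLAGS = {
    "hypertension": re.compile(r"\b(?:hypertension|htn)\b", re.IGNORECASE),
    "diabetes": re.compile(r"\b(?:diabetes|dm)\b", re.IGNORECASE),
}


def note_to_row(text, source):
    """Map one free-text note to a flat record (one row of the table)."""
    age = AGE.search(text)
    eye = EYE.search(text)
    row = {
        "source": source,  # "typed" or "scanned" (after OCR)
        "age": int(age.group(1)) if age else None,
        "eye": eye.group(1).lower() if eye else None,
    }
    # Simple keyword presence; this sketch does not detect negated mentions.
    for name, pattern in FLAGS.items():
        row[name] = bool(pattern.search(text))
    return row


def build_table(notes):
    """Combine notes from both sources into one list of uniform rows."""
    return [note_to_row(text, source) for source, text in notes]


# Invented sample notes, for illustration only.
notes = [
    ("typed", "62 year old male, sudden painless loss of vision in the right eye. Known HTN."),
    ("scanned", "Pt 48 yrs, left eye vision drop since morning, diabetes on treatment"),
]
rows = build_table(notes)
```

Every row carries the same keys regardless of its source, so the resulting list can be loaded directly into a data frame and passed to whichever classifiers are being compared in the second stage.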
License
Copyright (c) 2025 Santosh Khanal, Rabindra Bista (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License. Unless otherwise stated, associated published material is distributed under the same license.