MORE | Fall 2024

Multi-Modal Learning for Enhanced Ocular Disease Detection Using Visual Question Answering


Early detection of ocular diseases is crucial, yet current diagnostic tools are limited by the scarcity of annotated medical data. This project addresses that challenge with a multi-modal system that combines large language models (LLMs) and vision models to analyze both fundus images and question-answer (Q&A) pairs derived from the Ocular Disease Intelligent Recognition (ODIR) dataset. Using zero-shot and few-shot learning, the system generates Q&A pairs and ranks them by their relevance to disease detection. Integrating image and textual data allows the system to detect ocular diseases with improved accuracy. This approach has the potential to strengthen diagnostic capabilities, particularly in resource-limited settings, by offering a robust tool for early detection and intervention.
