MORE | Fall 2024
Multi-Modal Learning for Enhanced Ocular Disease Detection Using Visual Question Answering
Early detection of ocular diseases is crucial, yet current diagnostic tools are limited by the scarcity of annotated medical data. This project addresses that challenge by developing a multi-modal system that combines large language models (LLMs) and vision models to analyze both fundus images and question-answer (Q&A) pairs derived from the Ocular Disease Intelligent Recognition (ODIR) dataset. Using zero-shot and few-shot learning, the model generates Q&A pairs and ranks them by their relevance to disease detection. Integrating image and textual data allows the system to detect ocular diseases with improved accuracy. This solution has the potential to enhance diagnostic capabilities, particularly in resource-limited settings, by offering a robust tool for early detection and intervention.
Student researcher
Kannak Sharma
Robotics and autonomous systems
Hometown: Amritsar, Punjab, India
Graduation date: Spring 2025