Joel Joseph
Computer science
Hometown: Gilbert, Arizona, United States
Graduation date: Fall 2025
FURI | Spring 2025
Optimizing TCR Embeddings: Reducing Data Requirements for Efficient Model Performance
catELMo, a deep learning-based embedding model for T-cell receptor (TCR) sequences, has significantly improved performance on immunological tasks such as TCR-epitope binding prediction. However, its reliance on large-scale training datasets raises concerns about efficiency and accessibility, especially in fields where data collection is resource-intensive. This project aims to determine the minimal dataset required to maintain strong predictive performance by systematically reducing the training data and evaluating the impact on model accuracy. Additionally, we will explore a data selection strategy that prioritizes sequence diversity in order to retain performance with a smaller dataset. By training and testing catELMo on progressively smaller, strategically selected datasets, we will identify the threshold at which further reductions degrade accuracy. Our findings will help optimize TCR modeling, making deep learning-based immunological research more scalable and applicable to real-world settings where data is limited.
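To make the planned procedure concrete, here is a minimal sketch, assuming a farthest-point style diversity criterion over precomputed TCR embeddings and a sweep over shrinking training fractions. The names select_diverse_subset and evaluate_model, and the synthetic data, are illustrative placeholders, not the project's actual pipeline or catELMo's API.

```python
# Sketch only: (1) pick a diverse subset of TCR embeddings via greedy
# farthest-point selection, (2) sweep over shrinking training fractions
# to see where a downstream metric starts to degrade. evaluate_model is
# a stand-in for the real TCR-epitope binding-prediction pipeline.
import numpy as np

def select_diverse_subset(embeddings: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Greedily pick k rows that are maximally spread out in embedding space."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(embeddings)))]   # arbitrary starting point
    dists = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))                   # farthest from the current subset
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return np.array(selected)

def evaluate_model(X: np.ndarray, y: np.ndarray) -> float:
    """Placeholder for training and scoring the binding-prediction model."""
    return float(y.mean())                            # dummy metric for illustration

if __name__ == "__main__":
    X = np.random.rand(1000, 128)                     # synthetic TCR embeddings
    y = np.random.randint(0, 2, size=1000)            # synthetic binding labels
    for frac in (1.0, 0.5, 0.25, 0.1, 0.05):          # progressively smaller data budgets
        idx = select_diverse_subset(X, max(1, int(frac * len(X))))
        print(f"fraction={frac:.2f}  metric={evaluate_model(X[idx], y[idx]):.3f}")
```

In the actual study, the placeholder evaluator would be replaced by retraining the embedding-based binding predictor on each selected subset and scoring it on a fixed held-out test set, so that the accuracy-versus-data-size curve reveals the reduction threshold described above.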
Mentor: Heewook Lee