MORE | Spring 2024
Towards Automated Selection of Embedding Models: Identifying the Optimal Parameters for the Baseline Model for TCR Embedding
Analyzing T cell receptor (TCR)-epitope interactions is crucial for identifying therapeutic targets for disease. TCR clustering reveals the landscape of clonal expansion and points to potential targets for intervention, while TCR-epitope binding affinity prediction is instrumental in screening for cognate TCRs that can combat harmful antigens. Recent advances, particularly catELMo, have shown how much effective embeddings matter for TCR-related tasks. Notably, the current leading embedding model is based on bidirectional Long Short-Term Memory (biLSTM) and outperforms transformer-based models in prediction tasks. Despite catELMo’s success, the reasons behind it remain unclear. To understand why catELMo performs well, this proposal outlines a large-scale comparative study of TCR embeddings. Because the full study is large in scope, the work proposed here focuses on its first step: finding the optimal training parameters for the baseline model, catELMo, such as learning rate, batch size, and number of training epochs.
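As an illustration of what a search over learning rate, batch size, and training epochs could look like, the following is a minimal grid-search sketch in Python. It is not the proposal's actual procedure: the candidate values in `SEARCH_SPACE` are assumptions (the proposal does not specify a search space), and `toy_evaluate` is a hypothetical placeholder standing in for training the baseline embedding model with a given configuration and scoring it on a downstream validation task.

```python
from itertools import product

# Hypothetical candidate values; the proposal does not specify a search space.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128],
    "epochs": [5, 10],
}


def iter_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(space, evaluate):
    """Return (best_config, best_score), assuming a higher score is better."""
    best_config, best_score = None, float("-inf")
    for config in iter_configs(space):
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Placeholder score, NOT a real metric.

    In practice this would train the embedding model with `config` and
    return a validation score on a downstream task.
    """
    return (
        -abs(config["learning_rate"] - 1e-3)
        - abs(config["batch_size"] - 64) / 1000
        - abs(config["epochs"] - 10) / 100
    )


if __name__ == "__main__":
    best_config, best_score = grid_search(SEARCH_SPACE, toy_evaluate)
    print(best_config, best_score)
```

An exhaustive grid is the simplest option and is easy to reproduce; if each training run is expensive, random search or early stopping over the same space may be a more practical choice.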