FURI | Fall 2023
Multimodal Self-Supervised Approach to Text-to-Music Generation
Discrete music language models can generate musical pieces across various genres in discrete formats such as MIDI and piano roll, which offer more flexibility and customizability. However, recent work shows that current discrete music language models are limited in capturing the relationship between the text and music domains, which makes it challenging to move toward text-to-music generation. This research therefore focuses on developing text-conditioned discrete music generation. It proposes a transformer model that combines an encoder text language model with a decoder music language model, so that the model captures the text-music relationship and learns text-to-music generation.
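The abstract does not say how the text encoder and music decoder are connected. A common way to let a decoder condition on an encoder is cross-attention, in which each music-token state (query) attends over the text-token states (keys and values). The following is a minimal sketch under that assumption, in plain Python. It omits the learned query/key/value projections, multiple heads, and everything else a real transformer layer would include. The function names and toy vectors are hypothetical and only illustrate the mechanism.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each decoder (music) query attends
    over all encoder (text) keys and mixes the matching values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this music-token query to every text-token key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of text-token values gives the text-conditioned output.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Hypothetical toy states: 3 text-encoder outputs, 2 music-decoder queries, dim 2.
text_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
music_queries = [[1.0, 0.0], [0.0, 1.0]]
conditioned = cross_attention(music_queries, text_states, text_states)
print(conditioned)
```

In this sketch, each music position receives a mixture of the text representations. The weights favor the text tokens that are most similar to that position's query. This is one way a decoder music model could be steered by a text encoder.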
Student researcher
Nick Nguyen
Computer systems engineering
Hometown: Ho Chi Minh City, Ho Chi Minh, Vietnam
Graduation date: Spring 2026