FURI | Spring 2025

Developing Robust Transcription Models for Ambulance Call Data Using Domain Randomization Techniques


The researcher evaluated the performance of the Whisper speech-to-text model for hospital transcription. Initially, the model had a word error rate (WER) of about 53%. The researcher then applied domain randomization by adding noise to the dataset and fine-tuned Whisper on the augmented data, lowering the WER to about 47%. This result suggests that techniques such as domain randomization and data augmentation can improve transcription accuracy. Future work will explore other speech-to-text models and refine the noise injection, for example by adding real-world noise such as ambulance and traffic sounds to the dataset.
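The two quantities in the summary above, WER and noise added to the training audio, can be illustrated with a minimal sketch. It is not the project's actual pipeline: the summary does not state the noise type or level, so white Gaussian noise and a 5 to 20 dB signal-to-noise range are assumptions, and the function names (`word_error_rate`, `add_gaussian_noise`, `randomize_noise`) are hypothetical. Audio is represented as a plain list of float samples to keep the example dependency-free.

```python
import math
import random


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def add_gaussian_noise(samples, snr_db, rng=None):
    """Return a noisy copy of samples at the requested SNR in dB.

    Assumption: white Gaussian noise stands in for whatever noise the
    project actually used.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    rng = rng or random.Random()
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def randomize_noise(samples, snr_range_db=(5.0, 20.0), rng=None):
    """Domain randomization step: draw a random SNR for each clip.

    The 5 to 20 dB default range is an illustrative assumption.
    """
    rng = rng or random.Random()
    snr_db = rng.uniform(*snr_range_db)
    return add_gaussian_noise(samples, snr_db, rng), snr_db


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("the patient is stable", "the patient was stable"))  # 0.25

    # A one-second 440 Hz tone at 16 kHz as stand-in audio.
    tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    noisy, snr = randomize_noise(tone, rng=random.Random(0))
    print(f"added noise at {snr:.1f} dB SNR to {len(noisy)} samples")
```

Drawing a fresh SNR for every clip, rather than fixing one noise level, is what makes this a randomization scheme: the fine-tuned model sees a spread of acoustic conditions instead of a single one. Recorded ambulance or traffic noise could be mixed in the same way by scaling the recording to the target noise power instead of sampling Gaussian values.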

Student researcher

Sai Shiva Satwik Mallajosyula

Computer science

Hometown: Visakhapatnam, Andhra Pradesh, India

Graduation date: Spring 2025