FURI | Summer 2020
Real Time Multimodal Classification for Social Media Notifications
The aim of this research is to create a model that can take an image, text, or both and classify the content of the message into different categories in real time. Prior work has addressed similar problems with early fusion, late fusion, and common-space fusion. In this research, when an image input is provided, the model first generates a caption and appends it to the text input, whether or not text is present; the combined result is then fed into a text classifier. The resulting model performed better than a text-only classifier on inputs whose images depict everyday scenes. However, it performed worse on disturbing or gory images, because the caption generator was not trained on such content.
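
The snippet below is a minimal sketch of the caption-then-classify fusion described above, assuming stand-in Hugging Face pipelines for the captioner and the classifier; these are illustrative components, not the specific models used in this research.

```python
# Sketch of the fusion strategy: caption the image (if any), append the
# caption to the text (if any), and run a text classifier on the result.
# The pipelines below are generic stand-ins (assumption), not the models
# trained for this project.
from typing import Optional
from transformers import pipeline

captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
classifier = pipeline("text-classification")  # placeholder text classifier

def classify_notification(text: Optional[str] = None,
                          image_path: Optional[str] = None) -> dict:
    """Classify a notification given text, an image, or both."""
    parts = []
    if text is not None:
        parts.append(text)
    if image_path is not None:
        # Generate a caption and treat it as additional text input.
        caption = captioner(image_path)[0]["generated_text"]
        parts.append(caption)
    if not parts:
        raise ValueError("Provide text, an image, or both.")
    # The combined text is fed to the text classifier.
    return classifier(" ".join(parts))[0]

# Example usage (hypothetical inputs):
# classify_notification(text="Check out my new puppy!", image_path="puppy.jpg")
```

One consequence of this design is that classification quality depends directly on the captioner's training domain, which is why performance drops on images far from everyday scenes.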