Vishvak Murahari

I am a second-year CS Master's student (focus on Machine Learning) at Georgia Tech, advised by Prof. Devi Parikh and Abhishek Das. I also work closely with Prof. Dhruv Batra. I earned my Bachelor's in Computer Science (focus on AI and Devices) from Georgia Tech, where I was fortunate to be advised by Prof. Thomas Ploetz and to work closely with Prof. Aman Parnami.

In the past few years, I have had the fortune of interning at Microsoft, Redmond (Summers 2017, 2018, and 2019), where I worked on improving query re-formulation algorithms for Outlook 365, designing recommendation systems for Xbox, and developing low-latency systems to back a large-scale privacy dashboard for Windows 10 users.

I am looking for PhD positions starting Fall 2020 and research internships starting Summer 2020. I have decided to join Ai2 for the summer, where I will be working with the PRIOR team.

Email  /  CV  /  Google Scholar  /  LinkedIn  /  Github  /  Twitter


The problems that I work on lie at the intersection of Computer Vision, Machine Learning and Natural Language Processing. Some of my current research interests include:

  • Grounded Language Learning: Teaching agents to talk about environment specific concepts and entities.
  • Transfer Learning in Conversational AI: Transferring from large web-scale chit-chat corpora (e.g., Reddit, Twitter) to smaller task-oriented dialog datasets.
  • Learning language through interaction: Teaching agents to talk through either self-play or by interacting with language based environments.

Representative papers are listed under Papers.

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
Vishvak Murahari, Dhruv Batra, Devi Parikh, Abhishek Das
arXiv preprint

Following recent trends in representation learning for language, we introduce an approach to leverage pretraining on related large-scale vision-language datasets before transferring to visual dialog. Specifically, we adapt the recently proposed ViLBERT (Lu et al., 2019) model for multi-turn visually-grounded conversation sequences. Our best single model achieves state-of-the-art on Visual Dialog, outperforming prior published work (including model ensembles) by more than 1% absolute on NDCG and MRR. Next, we carefully analyse our model and find that additional finetuning using 'dense' annotations leads to even higher NDCG -- more than 10% over our base model -- but hurts MRR -- more than 17% below our base model! This highlights a stark trade-off between the two primary metrics for this task -- NDCG and MRR. We find that this is because dense annotations in the dataset do not correlate well with the original ground-truth answers to questions, often rewarding the model for generic responses (e.g. "can't tell").

Improving Generative Visual Dialog by Answering Diverse Questions
Vishvak Murahari, Prithvijit Chattopadhyay, Dhruv Batra, Devi Parikh, Abhishek Das
EMNLP, 2019

While generative visual dialog models trained with self-talk based RL perform better at the associated downstream task, they suffer from repeated interactions -- resulting in saturation in improvements as the number of rounds increases. To counter this, we devise a simple auxiliary objective that incentivizes Q-Bot to ask diverse questions, thus reducing repetitions and in turn enabling A-Bot to explore a larger state space during RL, i.e., be exposed to more visual concepts to talk about and varied questions to answer.

On attention models in human activity recognition
Vishvak Murahari, Thomas Ploetz
ISWC, 2018

Most approaches that model time-series data in human activity recognition based on body-worn sensing (HAR) use a fixed-size temporal context to represent different activities. This might, however, not be apt for sets of activities with individually varying durations. We introduce attention models into HAR research as a data-driven approach for exploring relevant temporal context. Attention models learn a set of weights over input data, which we leverage to weight the temporal context being considered to model each sensor reading. We also visualize the learned weights to better understand what constitutes relevant temporal context.

Teaching Assistant, Introduction to Robotics and Perception (CS 3630)

As a TA for CS 3630, I was part of one of the largest hands-on advanced robotics classes in the country, taken by close to 200 students. I advised students on robotic planning, control, and localization, and collaborated with co-TAs to develop and improve two projects on robot localization. I also engaged with students in person through weekly office hours and online through Piazza.

Teaching Assistant, Introduction to AI (CS 3600)

Guided more than 300 students on AI projects and homework. Reinforced concepts ranging from probabilistic inference to neural networks, optimization, and reinforcement learning. Helped with course development and improved existing class projects. Held weekly office hours to engage with students.

(Design and CSS courtesy: Jon Barron and Amlaan Bhoi)