DARPA - Strategic Social Interaction Module: Modeling Human Social Interaction in Multimodal Data

The objective of this project is to model social interactions between humans, with a focus on the science of social interaction and human dynamics, the technological and pedagogical design of training tools for developing human-dynamics interaction proficiencies, and the assessment of SSIM training and subsequent performance outcomes.

With the help of social psychologists, we defined a set of social interaction predicates, such as joint attention and entrainment. Building on established social psychological theory and methodology, we collected a new dataset, “Tower Game”, consisting of audio-visual captures of dyadic interactions labeled with social interaction predicates.

To model social interaction in multimodal data, we developed three novel hybrid representation-learning models that uncover actionable constituents of social interaction predicates in audio, motion-capture, and multimodal data. Our initial model is a unimodal hybrid that pairs a generative model, used for unsupervised representation learning of short-term temporal phenomena, with a discriminative model, used for event detection and classification of long-range temporal dynamics. Applied to audio alone, this model achieved state-of-the-art results. We then extended the audio-only model to handle two sets of homogeneous data and used it to analyze motion-capture data of two interacting people. Finally, we extended the model to heterogeneous data and applied it to combined audio and motion-capture data.
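The two-stage hybrid idea above can be illustrated at toy scale: an unsupervised stage learns representations of short-term windows, and a discriminative stage classifies features pooled over the whole sequence. This is only an illustrative sketch, not the project's temporal deep networks (see the publications below); here PCA stands in for the generative stage, logistic regression for the discriminative stage, and the signals are synthetic.

```python
# Illustrative sketch of a generative+discriminative hybrid pipeline.
# PCA substitutes for the unsupervised generative model; logistic
# regression substitutes for the discriminative model. Toy data only.
import numpy as np

rng = np.random.default_rng(0)

def window(signal, size, step):
    """Slice a 1-D signal into overlapping short-term windows."""
    n = (len(signal) - size) // step + 1
    return np.stack([signal[i * step : i * step + size] for i in range(n)])

def fit_pca(X, k):
    """Unsupervised representation learning: top-k principal components."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def encode(X, mu, components):
    return (X - mu) @ components.T

def fit_logreg(F, y, lr=0.1, steps=500):
    """Discriminative stage: logistic regression on pooled features."""
    w = np.zeros(F.shape[1]); b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
        g = p - y
        w -= lr * F.T @ g / len(y); b -= lr * g.mean()
    return w, b

# Toy sequences: class 0 = low-frequency signals, class 1 = high-frequency.
t = np.arange(200)
def make_signal(freq):
    return np.sin(2 * np.pi * freq * t / 200) + 0.1 * rng.standard_normal(200)

signals = [make_signal(3) for _ in range(20)] + [make_signal(12) for _ in range(20)]
labels = np.array([0] * 20 + [1] * 20)

# Stage 1 (generative/unsupervised): learn short-term window representations.
all_windows = np.concatenate([window(s, size=20, step=10) for s in signals])
mu, comps = fit_pca(all_windows, k=4)

# Stage 2 (discriminative): pool window codes per sequence, then classify.
feats = np.stack([np.abs(encode(window(s, 20, 10), mu, comps)).mean(axis=0)
                  for s in signals])
w, b = fit_logreg(feats, labels)
preds = ((feats @ w + b) > 0).astype(int)
accuracy = (preds == labels).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The design point the sketch preserves is the division of labor: the unsupervised stage needs no labels and models short windows, while the supervised stage operates on sequence-level summaries of those learned codes.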


D. A. Salter, A. Tamrakar, B. Siddiquie, M. R. Amer, A. Divakaran, B. Lande, D. Mehri. The Tower Game Dataset: A Multimodal Dataset for Analyzing Social Interaction Predicates. International Conference on Affective Computing and Intelligent Interaction, 2015. PDF

M. R. Amer, B. Siddiquie, C. Richey, A. Divakaran. Emotion Detection in Speech using Deep Networks. International Conference on Acoustics, Speech, and Signal Processing, 2014. PDF
M. R. Amer, B. Siddiquie, A. Tamrakar, D. A. Salter, B. Lande, D. Mehri, A. Divakaran. Human Social Interaction Modeling using Temporal Deep Networks. arXiv 2015. PDF

M. R. Amer, B. Siddiquie, S. Khan, A. Divakaran, H. Sawhney. Multimodal Fusion using Dynamic Hybrid Models. Winter Conference on Applications of Computer Vision, 2014. PDF


M. R. Amer, B. Siddiquie, A. Divakaran, C. Richey, S. Khan, H. Sawhney, T. J. Shields. Dynamic Hybrid Models for Multimodal Analysis. US9875445, 2018. PDF