Capturing complex spatial and temporal structure in high-bandwidth, noisy, ambiguous data streams is a significant challenge for even the most modern signal and image analysis systems. Current computational approaches are overwhelmingly compute-intensive and extract only limited spatial structure from modest quantities of data. The objective of this project is to develop a family of cortical processing models that handle different data types and automatically optimize their performance as new data arrive. Such algorithms are essential ingredients of a cortical processor: temporal and spatial recognition within a unified architecture and a modular structure. The cortical computational model should be fault tolerant to gaps in data, massively parallel, extremely power efficient, and highly scalable.
Unified: We formulated a novel hybrid model that exploits the strength of discriminative classifiers along with the representational power of generative models. Our focus is on detecting unimodal and multimodal events in time-varying sequences as well as generating missing data in any of the modalities. Discriminative classifiers have been shown to achieve higher performance than the corresponding generative likelihood-based classifiers. On the other hand, generative models learn a rich, informative space that allows for data generation and joint feature representation, which discriminative models lack. We propose a new model that jointly optimizes the representation and classification spaces using a hybrid energy function. We employ a model based on Restricted Boltzmann Machines (RBMs) to learn a shared representation across multiple modalities with time-varying data. The Conditional RBM (CRBM) is an extension of the RBM that accounts for short-term temporal phenomena. Our hybrid model augments CRBMs with a discriminative component for classification; for this purpose we propose a novel Multimodal Discriminative CRBM (MMDCRBM) model. Finally, we extend the model with a factored multi-task component that scales to a larger number of classes without increasing the number of parameters.
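The core of the hybrid energy idea can be illustrated with a minimal sketch: a CRBM whose biases are dynamic functions of a history window, augmented with label-dependent hidden inputs so that classification reduces to comparing class-conditional free energies. All dimensions, parameter names, and data below are hypothetical toy choices, not the published model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): n_hist is the flattened history window
n_vis, n_hid, n_hist, n_cls = 8, 16, 24, 3

# CRBM parameters
W = rng.normal(0, 0.1, (n_vis, n_hid))   # visible-hidden weights
A = rng.normal(0, 0.1, (n_hist, n_vis))  # history -> visible (autoregressive)
B = rng.normal(0, 0.1, (n_hist, n_hid))  # history -> hidden
a = np.zeros(n_vis)                      # static visible bias
b = np.zeros(n_hid)                      # static hidden bias
U = rng.normal(0, 0.1, (n_cls, n_hid))   # discriminative label-hidden weights
d = np.zeros(n_cls)                      # label bias

def free_energy(v, hist, y):
    """Class-conditional free energy of a discriminative CRBM.

    Dynamic biases depend on the history window; the label shifts the
    hidden-unit input, combining generative and discriminative terms
    in one energy function.
    """
    b_v = a + hist @ A                   # dynamic visible bias
    b_h = b + hist @ B                   # dynamic hidden bias
    pre = v @ W + b_h + U[y]             # hidden pre-activations
    # softplus via logaddexp marginalizes out the binary hidden units
    return -v @ b_v - d[y] - np.logaddexp(0, pre).sum()

def predict(v, hist):
    # Lowest free energy <=> highest unnormalized class probability
    return int(np.argmin([free_energy(v, hist, y) for y in range(n_cls)]))

v = rng.normal(size=n_vis)
hist = rng.normal(size=n_hist)
print(predict(v, hist))
```

Training would fit these parameters with a weighted sum of generative (contrastive divergence) and discriminative (conditional likelihood) gradients; the sketch shows only the inference side of the energy.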
Unimodal: Our approach was successfully applied to GPU activity prediction, an important and complex problem due to the high level of contention among thousands of parallel threads, and one previously addressed mostly with heuristics. We model any performance metric as a temporal function of the executed instructions, with the intuition that the flow of instructions can be segmented into distinct activities of the code.
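One simple way to turn an instruction stream into the temporal sequence such a model consumes is to histogram instruction types over sliding windows, so that each window becomes one time step and distinct mixes of instruction types correspond to distinct activities. The trace, window sizes, and type ids below are illustrative assumptions, not the actual featurization from the paper.

```python
import numpy as np

# Hypothetical trace: each entry is an instruction-type id
# (e.g. 0 = ALU, 1 = memory, 2 = branch) from a profiled GPU kernel.
rng = np.random.default_rng(1)
trace = rng.integers(0, 3, size=1000)

def window_features(trace, n_types, win=50, stride=25):
    """Normalized histogram of instruction types per sliding window.

    Each row is one time step; rows sum to 1, so a row is the mixture
    of instruction types active in that window.
    """
    feats = []
    for start in range(0, len(trace) - win + 1, stride):
        counts = np.bincount(trace[start:start + win], minlength=n_types)
        feats.append(counts / win)
    return np.array(feats)

X = window_features(trace, n_types=3)
print(X.shape)  # (n_windows, n_types)
```

A temporal model (such as the CRBM above) would then predict a performance metric, or an activity label, from this sequence of mixture vectors.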
Multimodal: Our approach was also applied to multimodal datasets: the ChaLearn gesture dataset (audio-mocap), the Tower Game dataset (mocap-mocap), and three multimodal toy datasets. We report classification, generation, and localization accuracy and demonstrate superiority over state-of-the-art methods.
Multi-task: We evaluate our approach on two publicly available time-series datasets with multiple labels per instance, the Body Affect dataset and the Tower Game dataset, and on three publicly available static datasets with multiple labels: Celebrity Faces (CelebA), Multi-task Facial Landmarks (MTFL), and the ChaLearn facial attributes dataset. We show superior classification performance over the state-of-the-art.
A popular testbed for deep learning has been multimodal recognition of human activity or gesture involving diverse inputs such as video, audio, skeletal pose and depth images. Deep learning architectures have excelled on such problems due to their ability to combine modality representations at different levels of nonlinear feature extraction. However, designing an optimal architecture in which to fuse such learned representations has largely been a non-trivial human engineering effort. We treat fusion structure optimization as a hyper-parameter search and cast it as a discrete optimization problem under the Bayesian optimization framework. We propose a novel graph-induced kernel to compute structural similarities in the search space of tree-structured multimodal architectures and demonstrate its effectiveness using two challenging multimodal human activity recognition datasets.
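A minimal sketch of this idea, under simplifying assumptions of our own (architectures encoded as sets of labeled edges, similarity from the symmetric difference of those sets, a plain Gaussian-process surrogate with a UCB acquisition), might look as follows; the encoding and the accuracies are made up for illustration and are not the kernel or data from the paper.

```python
import numpy as np

# Tree-structured fusion architectures as frozensets of labeled edges
# (parent_op, child) -- a hypothetical stand-in for the real encoding.
archs = [
    frozenset({("concat", "video"), ("concat", "audio")}),
    frozenset({("concat", "fuse1"), ("fuse1", "video"), ("fuse1", "audio")}),
    frozenset({("sum", "video"), ("sum", "audio"), ("sum", "pose")}),
    frozenset({("concat", "video"), ("concat", "pose")}),
]

def graph_kernel(g1, g2, lam=0.5):
    """Similarity from the symmetric difference of edge sets.

    exp(-lam * |g1 XOR g2|) is positive semi-definite (it factors into
    an intersection kernel and per-graph scalings), so it can serve as
    a GP covariance over the discrete architecture space.
    """
    return np.exp(-lam * len(g1 ^ g2))

def gp_posterior(X_obs, y_obs, X_cand, noise=1e-6):
    """Standard GP regression mean/variance using the graph kernel."""
    K = np.array([[graph_kernel(a, b) for b in X_obs] for a in X_obs])
    Ks = np.array([[graph_kernel(a, b) for b in X_obs] for a in X_cand])
    Kinv = np.linalg.inv(K + noise * np.eye(len(X_obs)))
    mu = Ks @ Kinv @ np.array(y_obs)
    var = 1.0 - np.einsum("ij,jk,ik->i", Ks, Kinv, Ks)  # k(x,x) = 1
    return mu, var

# Pretend two architectures were already trained (accuracies invented)
X_obs, y_obs = archs[:2], [0.71, 0.78]
mu, var = gp_posterior(X_obs, y_obs, archs[2:])
nxt = int(np.argmax(mu + 1.96 * np.sqrt(np.maximum(var, 0))))  # UCB
print("next candidate index:", nxt)
```

The acquisition step trades off the surrogate's predicted accuracy against its uncertainty, so architectures structurally far from anything evaluated still get explored.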
D. Ramachandram, M. Lisicki, T. J. Shields, M. R. Amer, G. W. Taylor. Bayesian optimization on graph-structured search spaces: Optimizing deep multimodal fusion architectures. Neurocomputing, 2018. PDF
D. Ramachandram, T. J. Shields, M. Lisicki, M. R. Amer and G. W. Taylor. Structure Optimization for Deep Multimodal Fusion Networks using Graph-Induced Kernels. European Symposium on Artificial Neural Networks, 2017. PDF
M. R. Amer, T. J. Shields, B. Siddiquie, A. Tamrakar, A. Divakaran, S. Chai, Deep Multimodal Fusion: A Hybrid Approach, International Journal of Computer Vision, 2017. PDF
T. J. Shields*, M. R. Amer*, M. Ehrlich, A. Tamrakar. Action-Affect-Gender Classification using Multi-Task Representation Learning. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017. PDF
M. Ehrlich, T. J. Shields, T. Almaev, M. R. Amer. Facial Attributes Classification using Multi-Task Representation Learning. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016. PDF
A. Raghavan, M. R. Amer, T. J. Shields, D. Zhang, S. Chai. GPU Activity Recognition using Representation Learning. International Conference on Machine Learning Workshops, 2016. PDF
M. R. Amer, T. J. Shields, A. Tamrakar, M. Ehrlich, T. Almaev. Deep Multi-Task Representation Learning. WO2017161233, Pending. PDF
S. M. Chai, D. C. Zhang, M. R. Amer, T. J. Shields, A. N. Raghavan. Low precision neural networks using subband decomposition. WO2017176384, Pending. PDF
S. M. Chai, D. C. Zhang, M. R. Amer, T. J. Shields, A. N. Raghavan, B. Ramamurthy. Systems and methods for optimizing operations of computing devices using deep neural networks. US15625578, Pending. PDF