I am a PhD candidate at the Center for Cybersecurity, New York University (NYU), advised by Prof. Brandon Reagen. I’m broadly interested in cryptographically secure privacy-preserving machine learning (PPML) and work at the intersection of deep learning and applied cryptography (homomorphic encryption and multi-party computation) as part of the DPRIVE project. My research primarily focuses on developing architectures and algorithms that optimize neural network computation on encrypted data.
Early in my PhD, I worked on designing nonlinearity-efficient CNNs: I developed ReLU-optimization techniques (DeepReDuce, ICML'21) and proposed methods for redesigning existing CNNs (DeepReShape, TMLR'24) for end-to-end private-inference efficiency.
My current research focuses on the privacy and security of large language models (LLMs). Specifically, I am investigating the role of nonlinearity in GPT models (see our preliminary findings at ATTRIB@NeurIPS'24), with the goal of designing GPT models with fewer nonlinearities for efficient private inference.
I have also served as an invited reviewer for NeurIPS'23 and '24, ICLR'24, CVPR'24, and ICML'24. If you are interested in collaborating, please feel free to email me!
Ph.D. in Privacy-preserving Deep Learning, 2020 - present
New York University
M.Tech. (Research Assistant) in Computer Science and Engineering, 2017 - 2020
Indian Institute of Technology Hyderabad
B.Tech. in Electronics and Communication Engineering, 2009 - 2013
National Institute of Technology Surat
In this work, we present a comprehensive analysis of the role of nonlinearities in transformer-based decoder-only language models. We introduce AERO, a four-step architectural-optimization framework that refines existing LLM architectures for efficient private inference (PI) by systematically removing nonlinearities such as LayerNorm and GELU and reducing the FLOP count. For the first time, we propose a Softmax-only architecture with significantly fewer FLOPs, tailored for efficient PI. Furthermore, we devise a novel entropy-regularization technique to improve the performance of Softmax-only models. AERO achieves up to 4.23× communication and 1.94× latency reductions.
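For a flavor of the entropy-regularization idea, here is a minimal PyTorch sketch that penalizes deviation of per-head attention entropy from a target value. This is an illustrative assumption, not the paper's formulation; `lambda_ent` and `target_ent` are hypothetical hyperparameters.

```python
import torch
import torch.nn.functional as F

def attention_entropy(scores: torch.Tensor) -> torch.Tensor:
    """Mean Shannon entropy of the attention distributions.

    scores: raw attention logits of shape (batch, heads, seq, seq).
    """
    probs = F.softmax(scores, dim=-1)
    log_probs = F.log_softmax(scores, dim=-1)
    ent = -(probs * log_probs).sum(dim=-1)  # (batch, heads, seq)
    return ent.mean()

def regularized_loss(task_loss: torch.Tensor,
                     scores: torch.Tensor,
                     lambda_ent: float = 0.1,
                     target_ent: float = 2.0) -> torch.Tensor:
    """Task loss plus an entropy penalty (illustrative values, not
    the ones used in AERO)."""
    ent = attention_entropy(scores)
    return task_loss + lambda_ent * (ent - target_ent).pow(2)
```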
DeepReDuce is a set of optimizations for the judicious removal of ReLUs to reduce private-inference latency, leveraging the heterogeneity of ReLUs in classical networks. DeepReDuce strategically drops up to 4.9× (on CIFAR-100) and 5.7× (on TinyImageNet) of the ReLUs in ResNet18 with no loss in accuracy. Compared to the state of the art for private inference, DeepReDuce improves accuracy by up to 3.5% (iso-ReLU) and reduces ReLU count by up to 3.5× (iso-accuracy).
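To illustrate the mechanism of ReLU dropping (not DeepReDuce's actual criterion for choosing which ReLUs matter), here is a hedged PyTorch sketch that replaces unselected ReLU modules with identity ops; the `keep` set and the torchvision `resnet18` usage are assumptions for the example.

```python
import torch.nn as nn
from torchvision.models import resnet18

def drop_relus(model: nn.Module, keep: set[str]) -> nn.Module:
    """Replace every ReLU whose qualified name is not in `keep` with
    an identity op. Deciding which ReLUs to keep is the crux of
    DeepReDuce; here the selection is supplied by the caller."""
    for name, module in model.named_modules():
        for child_name, child in module.named_children():
            full = f"{name}.{child_name}" if name else child_name
            if isinstance(child, nn.ReLU) and full not in keep:
                setattr(module, child_name, nn.Identity())
    return model

# Usage sketch: keep only the stem ReLU (an illustrative choice,
# not the paper's selection); all other ReLUs become identities.
model = drop_relus(resnet18(num_classes=100), keep={"relu"})
```

Swapping a ReLU for `nn.Identity` preserves tensor shapes, so the pruned network can typically be retrained directly to recover accuracy.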