I am a Ph.D. candidate in Electrical and Computer Engineering at New York University, advised by Prof. Brandon Reagen. My research explores the mathematical foundations of large language models (LLMs)—how information, geometry, and learning dynamics interact and shape the stability, efficiency, and scaling behavior of LLMs.
In particular, my work pursues three complementary thrusts: representation integrity (entropy budgets, spectral utilization, stability regimes), scientific foundations (information theory, inductive biases, scaling laws), and high-dimensional learning dynamics (eigenspectra, weight manifolds, spectral geometry).
The long-term aim is to build first-principles frameworks for designing and optimizing LLMs while preserving their representational integrity. This effort led to NerVE, an eigenspectral framework that characterizes the nonlinear transformations of FFNs and uses spectral utilization metrics to quantify FFN width utilization (EMNLP 2025, Main).
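To make "spectral utilization" concrete, here is a minimal sketch using the standard effective-rank measure (Roy & Vetterli, 2007) as a generic stand-in; it is illustrative only and not NerVE's own metric:

```python
import torch

def effective_rank(acts: torch.Tensor, eps: float = 1e-12) -> float:
    """Effective rank: exp of the Shannon entropy of the normalized
    singular-value spectrum. A generic proxy for how much of an FFN's
    width is actually exercised; not NerVE's exact metric."""
    s = torch.linalg.svdvals(acts)        # singular values of the activation matrix
    p = s / (s.sum() + eps)               # normalize into a probability distribution
    return (-(p * (p + eps).log()).sum()).exp().item()

# Toy example: rank-64 activations in a 2048-wide FFN use ~3% of the width.
acts = torch.randn(512, 64) @ torch.randn(64, 2048)
print(f"effective rank ≈ {effective_rank(acts):.1f} of width 2048")
```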
I also developed AERO, an information-theoretic framework that studies how nonlinearities influence entropy budgets of attention mechanisms and introduces entropy-guided attention for private LLM architectures with fewer nonlinear operations. Preliminary results appeared at PPAI@AAAI'25 and ATTRIB@NeurIPS'24.
Earlier in my Ph.D., as part of the DPRIVE project, I proposed new architectures and algorithms for efficient inference on encrypted data. These include DeepReDuce (ICML'21, Spotlight), a ReLU-optimization technique, and DeepReShape (TMLR'24), a family of CNNs redesigned for private-inference efficiency. Both works redefined the state of the art in private inference, achieving 3.5× and 8.7× speedups over prior SOTA, respectively.
Recent talks: We presented our work Entropy and Private Language Models at the NYU CILVR Seminar, and Entropy-Guided Attention for Private LLMs at the AI Fireside Chat.
Besides research, I have served as an invited reviewer for NeurIPS (2023–2025), ICML (2024, 2025), ICLR (2024, 2025), TMLR (2025), AISTATS (2025), CVPR (2024, 2025), ICCV (2025), and AAAI (2025).
Ph.D. in Neural Architectures for Efficient Private Inference, 2020 - present
New York University
M.Tech. (Research) in Computer Science and Engineering, 2017 - 2020
Indian Institute of Technology Hyderabad
B.Tech. in Electronics and Communication Engineering, 2009 - 2013
National Institute of Technology Surat
See older updates on the All News page.

We introduce an information-theoretic framework to characterize the role of nonlinearities in decoder-only language models, laying a principled foundation for optimizing transformer architectures tailored to the demands of Private Inference (PI). By leveraging Shannon’s entropy as a quantitative measure, we uncover a previously unexplored dual significance of nonlinearities: beyond ensuring training stability, they are crucial for maintaining attention head diversity. Specifically, we find that their removal triggers two critical failure modes: entropy collapse in deeper layers, which destabilizes training, and entropic overload in earlier layers, which leads to under-utilization of Multi-Head Attention’s (MHA) representational capacity. We propose an entropy-guided attention mechanism paired with a novel entropy regularization technique to mitigate entropic overload. Additionally, we explore inference-efficient alternatives to layer normalization for preventing entropy collapse and stabilizing the training of LLMs with reduced nonlinearities. Our study bridges the gap between information theory and architectural design, establishing entropy dynamics as a principled guide for developing efficient PI architectures.
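The central quantity here is cheap to compute from attention weights. Below is a minimal PyTorch sketch, not the paper's implementation: attention_head_entropy measures the per-head Shannon entropy of post-softmax attention, and entropy_overload_penalty is a hypothetical squared-deviation regularizer whose name, target, and coefficient are illustrative assumptions.

```python
import torch

def attention_head_entropy(attn: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    """Mean Shannon entropy per attention head.

    attn: post-softmax attention weights of shape (batch, heads, q_len, k_len),
    with each row a distribution over the k_len keys. Returns shape (heads,).
    """
    ent = -(attn * (attn + eps).log()).sum(dim=-1)   # (batch, heads, q_len)
    return ent.mean(dim=(0, 2))

def entropy_overload_penalty(attn: torch.Tensor, target: float,
                             coeff: float = 0.01) -> torch.Tensor:
    """Hypothetical regularizer: penalize heads whose entropy drifts from a
    target, discouraging entropic overload (near-uniform, entropy near
    log k_len) and collapse (near one-hot, entropy near 0).
    Illustrative, not the paper's loss."""
    return coeff * ((attention_head_entropy(attn) - target) ** 2).mean()

# Toy usage: 8 heads attending over 16 keys; max entropy is log(16) ≈ 2.77 nats.
attn = torch.softmax(torch.randn(2, 8, 16, 16), dim=-1)
penalty = entropy_overload_penalty(attn, target=0.5 * torch.log(torch.tensor(16.0)).item())
```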

DeepReDuce is a set of optimizations for the judicious removal of ReLUs to reduce private inference latency, leveraging the heterogeneity of ReLUs in classical networks. DeepReDuce strategically drops ReLUs, reducing ReLU counts by up to 4.9× (on CIFAR-100) and 5.7× (on TinyImageNet) for ResNet18 with no loss in accuracy. Compared to the state of the art in private inference, DeepReDuce improves accuracy by up to 3.5% (iso-ReLU) and reduces ReLU count by up to 3.5× (iso-accuracy).
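That heterogeneity is easy to observe empirically. The sketch below, a rough probe and not DeepReDuce itself, hooks every ReLU module in a torchvision ResNet-18 and tallies activations on a CIFAR-sized input; the paper's criticality analysis, ReLU culling, and retraining pipeline are not shown.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def relu_activation_counts(model: nn.Module, input_shape=(1, 3, 32, 32)):
    """Count ReLU activations per module with forward hooks, exposing the
    ReLU heterogeneity DeepReDuce exploits: early, high-resolution stages
    hold most of the ReLU budget."""
    counts = {}

    def make_hook(name):
        def hook(module, inputs, output):
            counts[name] = counts.get(name, 0) + output.numel()
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if isinstance(m, nn.ReLU)]
    model.eval()
    with torch.no_grad():
        model(torch.zeros(*input_shape))
    for h in handles:
        h.remove()
    return counts

counts = relu_activation_counts(resnet18(num_classes=100))
for name, c in sorted(counts.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{name}: {c} ReLU activations")  # early layers dominate
```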
Cracking the code of private AI: The role of entropy in secure language models
NYU Tandon School of Engineering • March 2025
Team streamlines neural networks to be more adept at computing on encrypted data
NYU Tandon School of Engineering • June 2021
Random Matrix Analysis Reveals Capacity Bottlenecks in Transformer Multi-Head Attention
Quantum Zeitgeist • July 2025
Team streamlines neural networks to be more adept at computing on encrypted data
TechXplore • July 2021
Team streamlines neural networks to be more adept at computing on encrypted data
ScienceDaily • July 2021
Making Private AI Practical: A Review of “Entropy-Guided Attention for Private LLM”
by Roma Shusterman, CTO at Brain Electrophysiological Laboratory (BEL) • March 2025