I recently completed my Ph.D. in Electrical and Computer Engineering at New York University, advised by Prof. Brandon Reagen. My thesis, “Nonlinear Representation Dynamics: Spectral Scaling Laws and Applications to Private AI,” studies how nonlinear transformations, architectural choices, and optimization dynamics shape representation geometry, spectral scaling behavior, and efficient private inference.
My research studies representation learning and high-dimensional learning dynamics in language models. I am interested in internal structure that is not visible from aggregate metrics alone: how latent geometry, spectra, entropy, and data movement determine what a model can represent and how efficiently it can be executed.
This agenda has led to three connected lines of work. NerVE and Spectral Scaling Laws quantify nonlinear feed-forward transformations and realized capacity in LLMs; recent work on optimizer-induced spectral scaling laws studies how optimizers change capacity allocation across token regimes. AERO studies entropy dynamics in attention and uses entropy-guided regularization to make private LLM inference more stable and efficient. Earlier, through the DARPA DPRIVE program, DeepReDuce and DeepReShape redesigned neural networks for efficient encrypted inference.
Selected papers, talks, and media coverage are listed below.
Ph.D. in Electrical and Computer Engineering, 2020–2026
New York University
M.Tech. (Research) in Computer Science and Engineering, 2017–2020
Indian Institute of Technology Hyderabad
B.Tech. in Electronics and Communication Engineering, 2009–2013
National Institute of Technology Surat
I study representation learning, scaling laws, and high-dimensional learning dynamics in language models. My work focuses on how optimization, architecture, nonlinearities, and systems constraints shape information flow, representation geometry, entropy dynamics, and realized capacity. Across these settings, I am interested in structure that aggregate metrics such as validation loss, latency, or compute do not reveal, but that strongly influences model behavior and efficiency.
I develop frameworks for understanding how language models transform and allocate representational capacity across layers, token regimes, optimizers, and scale. While classical scaling laws relate loss to compute, my work studies how internal capacity itself scales through nonlinear eigenspectrum dynamics, spectral scaling laws, and optimizer-induced capacity. A key finding is that the optimizer, not only the architecture, determines how much of its nominal capacity a model realizes; models with nearly identical validation loss can still differ sharply in their internal representation geometry.
I design neural architectures and training methods that reduce the cost of privacy-preserving inference without sacrificing model quality. This work studies how nonlinearities dominate secure-inference cost, and how entropy dynamics in attention reveal failure modes such as entropy collapse and entropic overload. The goal is to treat efficient private inference as a representation-design problem, not only a cryptographic-systems problem.
My earlier work studied hardware-aware deep learning through compact architectures, roofline performance modeling, and data-reuse analysis. It showed that conventional arithmetic intensity can obscure the data-movement structure that drives energy consumption and efficiency. This systems background shapes how I think about compute bottlenecks, data movement, and the interaction between algorithms, model structure, and hardware.
I am increasingly focused on capacity-aware training and evaluation methods that go beyond loss: measuring whether models preserve useful representational degrees of freedom across scale, optimization, data regimes, and continual adaptation. This includes spectral capacity, entropy regulation, plasticity loss, and architecture–optimizer co-design for models that remain adaptable as they learn.
Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws
Nandan Kumar Jha, Brandon Reagen
Under review, 2026
arXiv · Project · Code · Blog
NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks
Nandan Kumar Jha, Brandon Reagen
ICLR 2026
arXiv · Project · Code
Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space?
Nandan Kumar Jha, Brandon Reagen
EMNLP 2025, Main Conference
arXiv · Related code
A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention
Nandan Kumar Jha, Brandon Reagen
HiLD Workshop at ICML 2025
arXiv · News
AERO: Entropy-Guided Attention for Private LLM Inference
Nandan Kumar Jha, Brandon Reagen
Under review, 2026; earlier version at AAAI PPAI 2025
Earlier arXiv · Code · Video · Press release
DeepReShape: Redesigning Neural Networks for Efficient Private Inference
Nandan Kumar Jha, Brandon Reagen
TMLR 2024
arXiv · Slides
DeepReDuce: ReLU Reduction for Fast Private Inference
Nandan Kumar Jha, Zahra Ghodsi, Siddharth Garg, Brandon Reagen
ICML 2021, Spotlight
arXiv · Slides · ICML video · Press release
Circa: Stochastic ReLUs for Private Deep Learning
Zahra Ghodsi, Nandan Kumar Jha, Brandon Reagen, Siddharth Garg
NeurIPS 2021
arXiv · Poster
Characterizing and Optimizing End-to-End Systems for Private Inference
Karthik Garimella, Zahra Ghodsi, Nandan Kumar Jha, Siddharth Garg, Brandon Reagen
ASPLOS 2023
arXiv · Code
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks
Rajat Saini*, Nandan Kumar Jha*, Bedanta Das, Sparsh Mittal, C. Krishna Mohan (*equal contribution)
WACV 2020
Paper · Code · Video
Modeling Data Reuse in Deep Neural Networks by Taking Data-Types into Cognizance
Nandan Kumar Jha, Sparsh Mittal
IEEE Transactions on Computers 2020
arXiv
DRACO: Co-Optimizing Hardware Utilization and Performance of DNNs on Systolic Accelerator
Nandan Kumar Jha, Shreyas Ravishankar, Sparsh Mittal, Arvind Kaushik, Dipan Mandal, Mahesh Chandra
ISVLSI 2020
arXiv · Slides
For the complete publication list, see Google Scholar.
Random Matrix Analysis Reveals Capacity Bottlenecks in Transformer Multi-Head Attention
Quantum Zeitgeist · July 2025
Cracking the code of private AI: The role of entropy in secure language models
NYU Tandon School of Engineering · March 2025
Team streamlines neural networks to be more adept at computing on encrypted data
NYU Tandon · TechXplore · ScienceDaily · 2021
Making Private AI Practical: A Review of “Entropy-Guided Attention for Private LLM”
by Roma Shusterman, CTO at Brain Electrophysiological Laboratory (BEL) · March 2025
NYU Tandon graduate students bring a wealth of experience to Brooklyn
NYU Tandon School of Engineering · March 2025
Conferences — NeurIPS (2023–2026), ICLR (2024–2026), ICML (2024–2026), CVPR 2024, ICCV 2025, AISTATS 2025, AAAI 2025
Journals — TMLR (2025–2026), TIFS 2025, JETC 2020