Nandan Kumar Jha

Ph.D., New York University · Representation Learning, Scaling Laws, and High-Dimensional Learning Dynamics

New York University

About me

I recently completed my Ph.D. in Electrical and Computer Engineering at New York University, advised by Prof. Brandon Reagen. My thesis, Nonlinear Representation Dynamics: Spectral Scaling Laws and Applications to Private AI, studies how nonlinear transformations, architectural choices, and optimization dynamics shape representation geometry, spectral scaling behavior, and efficient private inference.

My research studies representation learning and high-dimensional learning dynamics in language models. I am interested in internal structure that is not visible from aggregate metrics alone: how latent geometry, spectra, entropy, and data movement determine what a model can represent and how efficiently it can be executed.

This agenda has led to three connected lines of work. NerVE and Spectral Scaling Laws quantify nonlinear feed-forward transformations and realized capacity in LLMs; recent work on optimizer-induced spectral scaling laws studies how optimizers change capacity allocation across token regimes. AERO studies entropy dynamics in attention and uses entropy-guided regularization to make private LLM inference more stable and efficient. Earlier, through the DARPA DPRIVE program, DeepReDuce and DeepReShape redesigned neural networks for efficient encrypted inference.

Selected papers, talks, and media coverage are listed below.

Interests
  • Representation Learning
  • Scaling Laws and Spectral Geometry
  • High-Dimensional Learning Dynamics
  • Cryptographically Secure Private AI

Education
  • Ph.D. in Electrical and Computer Engineering, 2020–2026

    New York University

  • M.Tech. (Research) in Computer Science and Engineering, 2017–2020

    Indian Institute of Technology Hyderabad

  • B.Tech. in Electronics and Communication Engineering, 2009–2013

    National Institute of Technology Surat

Research Themes

I study representation learning, scaling laws, and high-dimensional learning dynamics in language models. My work focuses on how optimization, architecture, nonlinearities, and systems constraints shape information flow, representation geometry, entropy dynamics, and realized capacity. Across these settings, I am interested in structure that aggregate metrics such as validation loss, latency, or compute do not reveal, but that strongly influences model behavior and efficiency.

Representation Learning and Scaling Laws for LLMs

I develop frameworks for understanding how language models transform and allocate representational capacity across layers, token regimes, optimizers, and scale. While classical scaling laws relate loss to compute, my work studies how internal capacity itself scales, through nonlinear eigenspectrum dynamics, spectral scaling laws, and optimizer-induced capacity allocation. A key finding is that the optimizer, not only the architecture, determines how much of its nominal capacity a model realizes; models with nearly identical validation loss can still differ sharply in their internal representation geometry.

Cryptographically Secure and Efficient Private Inference

I design neural architectures and training methods that reduce the cost of privacy-preserving inference without sacrificing model quality. This work studies how nonlinearities dominate secure-inference cost, and how entropy dynamics in attention reveal failure modes such as entropy collapse and entropic overload. The goal is to treat efficient private inference as a representation-design problem, not only a cryptographic-systems problem.

Hardware-Aware and Efficient ML

My earlier work studied hardware-aware deep learning through compact architectures, roofline performance modeling, and data-reuse analysis. It showed that conventional arithmetic intensity can obscure the data-movement structure that drives energy consumption and efficiency. This systems background shapes how I think about compute bottlenecks, data movement, and the interaction between algorithms, model structure, and hardware.

Current Direction

I am increasingly focused on capacity-aware training and evaluation methods that go beyond loss: measuring whether models preserve useful representational degrees of freedom across scale, optimization, data regimes, and continual adaptation. This includes spectral capacity, entropy regulation, plasticity loss, and architecture–optimizer co-design for models that remain adaptable as they learn.

Selected Papers

Representation Learning and Scaling Laws for LLMs

Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws
Nandan Kumar Jha, Brandon Reagen
Under review, 2026
arXiv · Project · Code · Blog

NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks
Nandan Kumar Jha, Brandon Reagen
ICLR 2026
arXiv · Project · Code

Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space?
Nandan Kumar Jha, Brandon Reagen
EMNLP 2025, Main Conference
arXiv · Related code

A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention
Nandan Kumar Jha, Brandon Reagen
HiLD Workshop at ICML 2025
arXiv · News

Cryptographically Secure and Efficient Private Inference

AERO: Entropy-Guided Attention for Private LLM Inference
Nandan Kumar Jha, Brandon Reagen
Under review, 2026; earlier version at AAAI PPAI 2025
Earlier arXiv · Code · Video · Press release

DeepReShape: Redesigning Neural Networks for Efficient Private Inference
Nandan Kumar Jha, Brandon Reagen
TMLR 2024
arXiv · Slides

DeepReDuce: ReLU Reduction for Fast Private Inference
Nandan Kumar Jha, Zahra Ghodsi, Siddharth Garg, Brandon Reagen
ICML 2021, Spotlight
arXiv · Slides · ICML video · Press release

Circa: Stochastic ReLUs for Private Deep Learning
Zahra Ghodsi, Nandan Kumar Jha, Brandon Reagen, Siddharth Garg
NeurIPS 2021
arXiv · Poster

Characterizing and Optimizing End-to-End Systems for Private Inference
Karthik Garimella, Zahra Ghodsi, Nandan Kumar Jha, Siddharth Garg, Brandon Reagen
ASPLOS 2023
arXiv · Code

Hardware-Aware and Efficient ML

ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks
Rajat Saini*, Nandan Kumar Jha*, Bedanta Das, Sparsh Mittal, C. Krishna Mohan (*equal contribution)
WACV 2020
Paper · Code · Video

Modeling Data Reuse in Deep Neural Networks by Taking Data-Types into Cognizance
Nandan Kumar Jha, Sparsh Mittal
IEEE Transactions on Computers 2020
arXiv

DRACO: Co-Optimizing Hardware Utilization and Performance of DNNs on Systolic Accelerator
Nandan Kumar Jha, Shreyas Ravishankar, Sparsh Mittal, Arvind Kaushik, Dipan Mandal, Mahesh Chandra
ISVLSI 2020
arXiv · Slides


For the complete publication list, see Google Scholar.

Recent Highlights

2026
Jun 2026
My Ph.D. thesis, Nonlinear Representation Dynamics: Spectral Scaling Laws and Applications to Private AI, is now online. Thesis
Jun 2026
Released the preprint for Optimizer-Induced Spectral Scaling Laws, along with the project page, code, and blog post. arXiv · Project · Code · Blog
Apr 2026
Successfully defended my Ph.D. thesis at New York University.
Apr 2026
Presented NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks at ICLR 2026 (Rio de Janeiro, Brazil). arXiv · Project
Feb 2026
Served as a panelist on Under the Hood of AI, an expert panel on AI infrastructure at NYU School of Law. Event
2025
Dec 2025
Presented Regularizing the Entropy Landscape of Self-Attention at the OPT Workshop, NeurIPS 2025 (San Diego). Workshop · OpenReview
Nov 2025
Presented Spectral Scaling Laws in Language Models at EMNLP 2025 (Suzhou, China). arXiv
Jul 2025
Presented two works at ICML 2025 workshops (Vancouver): Spectral Scaling Laws at AIW, and a random-matrix analysis of Multi-head Latent Attention at HiLD. AIW · arXiv
May 2025
Received the ECE Student Research Poster Day Award at New York University, including a $1,000 cash prize.
Apr 2025
Gave the CILVR seminar Entropy and Private Language Models at the NYU Center for Data Science. Seminar · Video
Mar 2025
Presented Entropy-Guided Attention for Private LLMs at the AAAI PPAI Workshop, with coverage from NYU Engineering. arXiv · Press

Selected Talks

2026
Under the Hood of AI
Panelist, Expert Panel on AI Infrastructure, NYU School of Law
2025
Entropy and Private Language Models
CILVR Seminar Series, NYU Center for Data Science
2025
Entropy-Guided Attention for Private LLMs
Ploutos AI Fireside Chat
2021
DeepReDuce: ReLU Reduction for Fast Private Inference
ICML Spotlight Talk

Press & Media

Research Coverage

Random Matrix Analysis Reveals Capacity Bottlenecks in Transformer Multi-Head Attention
Quantum Zeitgeist · July 2025

Cracking the code of private AI: The role of entropy in secure language models
NYU Tandon School of Engineering · March 2025

Team streamlines neural networks to be more adept at computing on encrypted data
NYU Tandon · TechXplore · ScienceDaily · 2021


Article

Making Private AI Practical: A Review of “Entropy-Guided Attention for Private LLM”
by Roma Shusterman, CTO at Brain Electrophysiological Laboratory (BEL) · March 2025


Interview

NYU Tandon graduate students bring a wealth of experience to Brooklyn
NYU Tandon School of Engineering · March 2025

Service

Invited Reviewer

Conferences — NeurIPS (2023–2026), ICLR (2024–2026), ICML (2024–2026), CVPR 2024, ICCV 2025, AISTATS 2025, AAAI 2025
Journals — TMLR (2025–2026), TIFS 2025, JETC 2020
