I am a research scientist at Goodfire working on new techniques for auditing LLMs, with a particular focus on evaluation awareness.
I am a fourth-year PhD student at NYU (on leave), where I studied scaling laws and phase transitions of neural networks and diffusion models with Arthur Jacot and Eric Vanden-Eijnden. I received my B.S. in Mathematics from Stanford University.
Discovering Undesired Rare Behaviors via Model Diff Amplification
S. Aranguri, T. McGrath
Used by Anthropic to evaluate Claude Sonnet 4.5; see the system card (page 95).
In-the-Wild Model Organisms: Mitigating Undesirable Emergent Behaviors in Production LLM Post-Training via Data Attribution
F. Xiao, S. Aranguri
SAE on activation differences
S. Aranguri, J. Drori, N. Nanda
Inference-Time Toxicity Mitigation in Protein Language Models via Logit-Diff Amplification
M. Burda, S. Aranguri, I. Arcuschin, E. Ferrante
ICLR 2026 Workshop on Generative and Experimental Perspectives for Biomolecular Design
Phase-aware Training Schedule Simplifies Learning in Flow-Based Generative Models
S. Aranguri, F. Insulla
ICLR 2025 Deep Generative Models in Machine Learning Workshop and Frontiers in Probabilistic Inference Workshop
Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes
Z. Tu, S. Aranguri, A. Jacot
NeurIPS 2024
Optimizing Noise Schedules of Generative Models in High Dimensions
S. Aranguri, G. Biroli, M. Mézard, E. Vanden-Eijnden
Untangling planar graphs and curves by staying positive
S. Aranguri, H. Chang, D. Fridman
ACM-SIAM Symposium on Discrete Algorithms 2022