Santiago Aranguri

CV
I am a research scientist at Goodfire working on new techniques for auditing LLMs, with a special focus on addressing evaluation awareness.

I am a fourth-year PhD student at NYU (on leave), where I studied scaling laws and phase transitions of neural networks and diffusion models with Arthur Jacot and Eric Vanden-Eijnden. I received my B.S. in Mathematics from Stanford University.

Publications

Discovering Undesired Rare Behaviors via Model Diff Amplification
S. Aranguri, T. McGrath
Used by Anthropic to evaluate Claude Sonnet 4.5; see the system card (page 95)

In-the-Wild Model Organisms: Mitigating Undesirable Emergent Behaviors in Production LLM Post-Training via Data Attribution
F. Xiao, S. Aranguri

SAE on activation differences
S. Aranguri, J. Drori, N. Nanda

Inference-Time Toxicity Mitigation in Protein Language Models via Logit-Diff Amplification
M. Burda, S. Aranguri, I. Arcuschin, E. Ferrante
ICLR 2026 Workshop on Generative and Experimental Perspectives for Biomolecular Design

Phase-aware Training Schedule Simplifies Learning in Flow-Based Generative Models
S. Aranguri, F. Insulla
ICLR 2025 Deep Generative Models in Machine Learning Workshop and Frontiers in Probabilistic Inference Workshop

Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes
Z. Tu, S. Aranguri, A. Jacot
NeurIPS 2024

Optimizing Noise Schedules of Generative Models in High Dimensions
S. Aranguri, G. Biroli, M. Mezard, E. Vanden-Eijnden

Untangling planar graphs and curves by staying positive
S. Aranguri, H. Chang, D. Fridman
ACM-SIAM Symposium on Discrete Algorithms (SODA) 2022


Contact

You can contact me at santi@goodfire.ai.