🤓

Hiroshi Nonaka

Aspiring PhD Student in Machine Learning

Hi! 👋 I'm Hiroshi (博志). I'm particularly passionate about the interpretability of deep neural networks and world models.

My senior thesis is on activation plateaus: stable regions in activation space. For details, please check out my thesis proposal. This academic year, I am researching the mechanistic interpretability and moral reasoning of LLMs at the Relational Cognition Lab at the University of California, Irvine. Previously, I researched model-free RL at the University of Maryland, College Park (NeurIPS 2025 ARLET Workshop), narrative representations of LLMs at Soka University of America (NeurIPS 2025 LLM-Evaluation Workshop), VLMs for emotion recognition at Texas State University (IEEE UEMCON 2024), and spatiotemporal understanding in model-based RL at the University of Tokyo.

Latest News

Jan 2026
Thesis updated - Updated my thesis on "activation plateaus" with new results! Please check it out here.
Jan 2026
Blog post - Posted a new blog post: Do Humans Dream of an Objectivity Beyond Subjectivity?
Dec 2025
Blog post - I posted my first blog post: a reading list for mech interp beginners!
Oct 2025
I started a blog - I started a blog on machine learning and random thoughts. It's still under construction, but check it out from the header!
Sep 2025
NeurIPS Workshop Paper Accepted - My first-author paper "Efficient Restarts in Non-Stationary Model-Free Reinforcement Learning" was accepted to the NeurIPS 2025 Workshop on ARLET!
Sep 2025
NeurIPS Workshop Paper Accepted - My first-author paper "Evaluating LLM Story Generation through Large-scale Network Analysis of Social Structures" was accepted to the NeurIPS 2025 Workshop on Eval-LLM!
Sep 2025
Joining UC Irvine - I joined the Relational Cognition Lab at UCI as an undergrad RA, working on the mechanistic interpretability and moral reasoning of LLMs.