Preprint · 2026 · advisor
Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering
Nathan Mao, Varun Kaushik, Shreya Shivkumar, Parham Sharafoleslami, Kevin Zhu, Sunishchal Dev
FalseCite — a curated dataset of 82k false claims paired with fabricated citations — reveals that LLMs hallucinate more readily when misleading references are present, especially in smaller models such as GPT-4o-mini. Clustering the models' hidden states exposes a distinctive 'horn-like' geometry that separates hallucinating from non-hallucinating activations.
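The hidden-state clustering step can be sketched with a minimal k-means over synthetic stand-in vectors; the two populations, the dimensionality, and the initialization below are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

# Synthetic stand-ins for hidden-state activations. In the paper these would
# be LLM hidden states on claims with vs. without fabricated citations.
rng = np.random.default_rng(0)
halluc = rng.normal(1.0, 0.5, size=(200, 64))     # "hallucinating" population
faithful = rng.normal(-1.0, 0.5, size=(200, 64))  # "non-hallucinating" population
states = np.vstack([halluc, faithful])

def kmeans(x, iters=20):
    """Minimal two-cluster k-means: assign each point to its nearest
    centroid, then recompute centroids, repeated for a fixed budget."""
    # Simplification: deterministic init with one point from each half.
    centroids = x[[0, -1]].copy()
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        centroids = np.stack([x[labels == k].mean(axis=0) for k in range(2)])
    return labels

labels = kmeans(states)
```

With populations this well separated, the recovered labels align almost perfectly with the true grouping (up to label permutation); the paper's contribution is that real activations exhibit comparable, visually distinctive structure.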
Preprint · 2025
COMPASS: Context-Modulated PID Attention Steering System for Hallucination Mitigation
Kenji Sahay, Snigdha Pandya, Rohan Nagale, Anna Lin, Shikhar Shiromani, Parham Sharaf, Kevin Zhu, Sunishchal Dev
A decoding-time intervention that dynamically steers attention toward retrieved context using a PID controller driven by a per-head Context Reliance Score. No retraining, no multi-pass decoding — just interpretable, single-stream control of evidence grounding. Reduces hallucinations by 2.8–5.8% absolute across HotpotQA, XSum, HaluEval, and RAGTruth.
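The control loop can be illustrated with a textbook discrete PID update; the gains, the toy "plant," and the target reliance value here are assumptions for illustration, not COMPASS's actual parameters or actuation:

```python
from dataclasses import dataclass

@dataclass
class PID:
    """Textbook discrete PID controller (illustrative sketch)."""
    kp: float = 0.5
    ki: float = 0.1
    kd: float = 0.05
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float) -> float:
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical usage: the error is (target reliance - measured per-head
# Context Reliance Score), and the controller output would scale attention
# toward context tokens at each decoding step.
pid = PID()
target = 0.8
crs = 0.5  # assumed initial context reliance for some head
for _ in range(10):
    adjustment = pid.step(target - crs)
    crs = min(1.0, crs + 0.3 * adjustment)  # toy plant: reliance responds to steering
```

The appeal of the PID formulation is that the intervention stays interpretable: each term (proportional, integral, derivative) has a readable role in pushing a head's context reliance toward the target.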
NeurIPS 2025 · 2025 · advisor
Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis
Abhishek More, Anthony Zhang, Nicole Bonilla, Ashvik Vivekan, Kevin Zhu, Parham Sharafoleslami, Maheep Chaudhary
EDTR estimates LLM confidence by treating each chain-of-thought as a vector in semantic space and analyzing the geometry of the reasoning distribution. Combined with Dirichlet-based uncertainty quantification, it achieves 41% better calibration than competing methods and perfect accuracy on AIME.
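The Dirichlet side of the estimator can be sketched as a generic Bayesian count model over answers reached by sampled chains-of-thought; the counts and the uniform prior below are hypothetical, and EDTR's full estimator additionally uses the topology of the reasoning vectors:

```python
import math

def dirichlet_confidence(answer_counts, prior=1.0):
    """Posterior mean and predictive entropy for a Dirichlet over answer
    frequencies across sampled chains-of-thought (generic sketch)."""
    alphas = [c + prior for c in answer_counts]
    total = sum(alphas)
    means = [a / total for a in alphas]                    # posterior-mean confidences
    entropy = -sum(m * math.log(m) for m in means)         # predictive uncertainty
    return means, entropy

# Hypothetical run: 10 sampled chains, 8 agreeing on answer A, 2 on answer B.
means, entropy = dirichlet_confidence([8, 2])
```

Low entropy signals that the sampled chains concentrate on one answer, which is the regime where reporting high confidence is calibrated; disagreement among chains inflates the entropy and pushes confidence down.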