Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering
Nathan Mao, Varun Kaushik, Shreya Shivkumar, Parham Sharafoleslami, Kevin Zhu, Sunishchal Dev
FalseCite, a curated dataset of 82k false claims paired with fabricated citations, shows that LLMs hallucinate more readily when misleading references accompany a claim, with the effect most pronounced in smaller models such as GPT-4o-mini. Clustering the models' hidden states exposes a distinctive 'horn-like' geometry across hallucinating and non-hallucinating activations.
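As a rough illustration of what hidden-state clustering of this kind can look like, the sketch below extracts pooled activations from a Hugging Face causal LM and clusters them with scikit-learn. The model name ("gpt2"), prompts, pooling strategy, and cluster count are placeholders, not the authors' actual pipeline.

```python
# Minimal sketch: the paper's exact pipeline is not specified here.
# Assumes a Hugging Face causal LM ("gpt2" as a stand-in) and scikit-learn;
# prompts, layer choice, pooling, and clustering settings are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

prompts = [
    "As shown by Smith et al. (2019), humans use only 10% of their brains.",  # false claim + fabricated citation
    "A 2021 Nature paper proved that vaccines alter human DNA.",              # false claim + fabricated citation
    "Water boils at 100 degrees Celsius at sea level.",                       # true claim, no citation
    "The Eiffel Tower is located in Paris, France.",                          # true claim, no citation
]

features = []
with torch.no_grad():
    for text in prompts:
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model(**inputs)
        # Mean-pool the final hidden layer over tokens to get one vector per prompt.
        last_layer = outputs.hidden_states[-1]  # shape: (1, seq_len, hidden_dim)
        features.append(last_layer.mean(dim=1).squeeze(0).numpy())

# Project to 2-D and cluster; with a real labeled dataset one would inspect
# whether hallucination-prone and benign prompts occupy distinct regions.
coords = PCA(n_components=2).fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(coords)
print(coords)
print(labels)
```

A real analysis would use far more prompts, record which completions actually hallucinated, and likely compare several layers before drawing geometric conclusions; this snippet only shows the mechanical steps of extracting and clustering hidden states.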