Publications

Research on LLM reasoning, hallucination mitigation, and confidence calibration. Most recent first.

2026

Preprint · 2026 · advisor

Visualizing and Benchmarking LLM Factual Hallucination Tendencies via Internal State Analysis and Clustering

Nathan Mao, Varun Kaushik, Shreya Shivkumar, Parham Sharafoleslami, Kevin Zhu, Sunishchal Dev

FalseCite, a curated dataset of 82k false claims paired with fabricated citations, reveals that LLMs hallucinate more readily when misleading references are present, especially smaller models like GPT-4o-mini. Clustering the models' hidden states exposes a distinctive 'horn-like' geometry spanning hallucinating and non-hallucinating activations.
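
A minimal sketch of the hidden-state clustering idea, assuming a HuggingFace causal LM. The model name ("gpt2" as a stand-in), layer choice, toy claims, and PCA/k-means pipeline are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

model_name = "gpt2"  # placeholder; the paper studies other models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def last_token_state(text: str, layer: int = -1) -> np.ndarray:
    """Hidden state of the final token at a chosen layer."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[layer][0, -1].numpy()

claims = [  # toy stand-ins for dataset items with fabricated citations
    "The Eiffel Tower is in Berlin. (Source: Smith et al., 2019)",
    "The Great Wall is visible from the Moon. (Source: Lee, 2021)",
    "Water boils at 100 C at sea level.",
    "The Earth orbits the Sun.",
]
X = np.stack([last_token_state(c) for c in claims])

# Project to 2-D and cluster; on real data one would inspect the
# resulting geometry of hallucinating vs. grounded activations.
X2 = PCA(n_components=2).fit_transform(X)
print(KMeans(n_clusters=2, n_init=10).fit_predict(X2))
```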

arXiv ↗ PDF ↗ #hallucination #benchmark #interpretability

2025

Preprint · 2025

COMPASS: Context-Modulated PID Attention Steering System for Hallucination Mitigation

Kenji Sahay, Snigdha Pandya, Rohan Nagale, Anna Lin, Shikhar Shiromani, Parham Sharaf, Kevin Zhu, Sunishchal Dev

A decoding-time intervention that dynamically steers attention toward retrieved context using a PID controller driven by a per-head Context Reliance Score. No retraining, no multi-pass decoding — just interpretable, single-stream control of evidence grounding. Reduces hallucinations by 2.8–5.8% absolute across HotpotQA, XSum, HaluEval, and RAGTruth.
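
A minimal sketch of the control loop, under stated assumptions: the Context Reliance Score is approximated here as attention mass on context tokens, and the PID gains, target value, and additive logit boost are illustrative stand-ins rather than COMPASS's actual formulation:

```python
import numpy as np

class PID:
    """Textbook discrete PID controller."""
    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, err: float) -> float:
        self.integral += err
        deriv = err - self.prev_err
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def steer(attn_logits: np.ndarray, ctx_mask: np.ndarray,
          pid: PID, target_crs: float = 0.6) -> np.ndarray:
    """Boost context-token logits for one head at one decoding step."""
    probs = np.exp(attn_logits - attn_logits.max())
    probs /= probs.sum()
    crs = probs[ctx_mask].sum()        # measured context reliance
    gain = pid.step(target_crs - crs)  # control signal from the error
    out = attn_logits.copy()
    out[ctx_mask] += gain              # additive steering on context tokens
    return out

# Toy usage: 6 key positions, the first 3 are retrieved context.
pid = PID(kp=1.0, ki=0.1, kd=0.05)
logits = np.array([0.1, 0.2, 0.0, 1.5, 1.2, 0.8])
mask = np.array([True, True, True, False, False, False])
print(steer(logits, mask, pid))
```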

arXiv ↗ PDF ↗ #hallucination #decoding #attention #interpretability

NeurIPS 2025 · 2025 · advisor

Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis

Abhishek More, Anthony Zhang, Nicole Bonilla, Ashvik Vivekan, Kevin Zhu, Parham Sharafoleslami, Maheep Chaudhary

EDTR estimates LLM confidence by treating each chain-of-thought as a vector in semantic space and analyzing the geometry of the reasoning distribution. Combined with Dirichlet-based uncertainty quantification, it achieves 41% better calibration than competing methods and perfect accuracy on AIME.
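
A minimal sketch of the two signals, under stated assumptions: random unit vectors stand in for CoT embeddings, mean pairwise cosine similarity stands in for the topological analysis, and the vote-based Dirichlet and final combination rule are illustrative, not EDTR's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for sentence embeddings of 8 sampled chains of thought.
cot_vecs = rng.normal(size=(8, 384))
cot_vecs /= np.linalg.norm(cot_vecs, axis=1, keepdims=True)

# Geometric signal: tighter clusters of reasoning paths -> lower dispersion.
pairwise = cot_vecs @ cot_vecs.T
dispersion = 1.0 - pairwise[np.triu_indices(8, k=1)].mean()

# Dirichlet signal: answer votes as concentration parameters; the mean of
# the winning component gives an agreement probability.
votes = np.array([6, 1, 1])  # e.g. 6 chains agree on answer A
alpha = votes + 1.0          # uniform prior
agreement = alpha.max() / alpha.sum()

confidence = agreement * (1.0 - dispersion)  # illustrative combination
print(f"dispersion={dispersion:.3f}, agreement={agreement:.3f}, "
      f"confidence={confidence:.3f}")
```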

arXiv ↗ PDF ↗ #calibration #chain-of-thought #uncertainty