Projects

#  Agent             Type        IC    EC    RC
1  Human             Baseline    0.90  0.66  0.94
2  Human Simulacra   RAG         0.79  0.63  0.87
3  Li et al. (2025)  Prompting   0.73  0.59  0.98
4  DeepPersona       Prompting   0.72  0.54  0.92
5  Character.ai      Commercial  0.71  0.71  0.46
6  Twin 2K 500       Prompting   0.53  0.26  0.95
7  Consistent LLM    Fine-tuned  0.31  0.30  0.14
8  OpenCharacter     Fine-tuned  0.16  0.15  0.14
PICon: A Multi-Turn Interrogation Framework for Evaluating Persona Agent Consistency

LLM-based persona agents are increasingly used as proxies for real human participants in medical training, social science, and product design. But how do you know if a persona agent is truly consistent — or just superficially convincing? PICon applies principles from interrogation methodology to systematically probe persona agents through logically chained multi-turn questioning, exposing contradictions that simpler evaluations miss.
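The idea of logically chained questioning can be illustrated with a toy sketch. This is not PICon's implementation: the `chained_probe` function, the two questions, the one-year tolerance, and the stub agents are all hypothetical, chosen only to show how a follow-up question whose answer is implied by an earlier one can expose a contradiction.

```python
from typing import Callable

# Hypothetical interface: an agent maps a question string to an answer string.
Agent = Callable[[str], str]


def chained_probe(agent: Agent, reference_year: int = 2025) -> bool:
    """Ask two logically linked questions and report whether the answers agree.

    The second answer (age) is implied by the first (birth year), so a
    mismatch signals an inconsistency in the persona.
    """
    birth_year = int(agent("What year were you born?"))
    stated_age = int(agent("How old are you?"))
    implied_age = reference_year - birth_year
    # Allow one year of slack: the birthday may not have passed yet.
    return abs(implied_age - stated_age) <= 1


# Stub agents standing in for LLM-backed personas (assumptions for the demo).
def consistent_agent(question: str) -> str:
    answers = {"What year were you born?": "1990", "How old are you?": "35"}
    return answers[question]


def inconsistent_agent(question: str) -> str:
    answers = {"What year were you born?": "1990", "How old are you?": "28"}
    return answers[question]


if __name__ == "__main__":
    print(chained_probe(consistent_agent))
    print(chained_probe(inconsistent_agent))
```

A single-turn check would accept either answer on its own; only the pair reveals the conflict, which is the kind of contradiction that chained questioning is meant to surface.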

CXReasonAgent: An Evidence-Grounded Diagnostic Reasoning Agent for Chest X-Rays

Multimodal AI models often generate plausible but ungrounded reasoning about chest X-ray images, expressed only as textual explanations, which makes it difficult to verify how their conclusions are derived from the image. CXReasonAgent integrates an LLM with clinically grounded diagnostic tools to produce responses based on explicit image-derived evidence such as measurements, spatial observations, and visual overlays.
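As a rough sketch of what a measurement-backed response could look like, the example below pairs a textual finding with the number that supports it. It assumes the cardiothoracic ratio (cardiac width divided by thoracic width, with 0.5 as a commonly used cutoff) as the measurement; the `Evidence` class, the function names, and the pixel widths are hypothetical and are not taken from CXReasonAgent.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """A single image-derived measurement attached to a response (hypothetical)."""
    name: str
    value: float
    threshold: float


def cardiothoracic_ratio(cardiac_width_px: float, thoracic_width_px: float) -> float:
    """Return cardiac width divided by thoracic width."""
    if thoracic_width_px <= 0:
        raise ValueError("thoracic width must be positive")
    return cardiac_width_px / thoracic_width_px


def grounded_answer(cardiac_width_px: float, thoracic_width_px: float,
                    threshold: float = 0.5) -> tuple[str, Evidence]:
    """Build a response whose conclusion is tied to an explicit measurement."""
    ctr = cardiothoracic_ratio(cardiac_width_px, thoracic_width_px)
    finding = "enlarged" if ctr > threshold else "within normal limits"
    text = (f"Cardiac silhouette is {finding}: "
            f"CTR = {ctr:.2f} (threshold {threshold:.2f}).")
    return text, Evidence("cardiothoracic_ratio", ctr, threshold)


if __name__ == "__main__":
    # Widths are made-up pixel values for illustration only.
    text, evidence = grounded_answer(300, 500)
    print(text)
    print(evidence)
```

Because the response carries the measurement alongside the conclusion, a reader can check the number against the image instead of trusting the text alone.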

H-AdminSim: A Multi-Agent Simulation Framework for Hospital Administrative Workflows

Hospital administration — intake, scheduling, and patient–staff dialogue — is a critical but understudied target for LLM agents. H-AdminSim is a multi-agent simulation framework that models these workflows by synthesizing patient data across care levels and simulating interactions between LLM-driven staff and patient agents, with optional FHIR R5 integration for compatibility with real hospital information systems. LLMs are scored via rubric-based evaluation across intake, scheduling, and dialogue quality.

PatientSim: An Open-Source LLM-Powered Patient Simulator

PatientSim is an open-source, LLM-powered patient simulator that generates realistic and behaviorally diverse patient personas grounded in real clinical data. By combining actual patient information from medical databases with four behavioral dimensions — personality type, language proficiency, medical history recall, and cognitive confusion — it creates 37 distinct patient types for training physicians in clinical interview skills and supporting medical dialogue research.