| # | Agent | Type | IC | EC | RC |
|---|---|---|---|---|---|
| 1 | Human | Baseline | 0.90 | 0.66 | 0.94 |
| 2 | Human Simulacra | RAG | 0.79 | 0.63 | 0.87 |
| 3 | Li et al. (2025) | Prompting | 0.73 | 0.59 | 0.98 |
| 4 | DeepPersona | Prompting | 0.72 | 0.54 | 0.92 |
| 5 | Character.ai | Commercial | 0.71 | 0.71 | 0.46 |
| 6 | Twin 2K 500 | Prompting | 0.53 | 0.26 | 0.95 |
| 7 | Consistent LLM | Fine-tuned | 0.31 | 0.30 | 0.14 |
| 8 | OpenCharacter | Fine-tuned | 0.16 | 0.15 | 0.14 |
LLM-based persona agents are increasingly used as proxies for real human participants in medical training, social science, and product design. But how do you know if a persona agent is truly consistent — or just superficially convincing? PICon applies principles from interrogation methodology to systematically probe persona agents through logically chained multi-turn questioning, exposing contradictions that simpler evaluations miss.
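The following minimal Python sketch makes the idea of exposing contradictions across turns concrete by tracking claims. It is not PICon's procedure and does not compute the IC, EC, or RC scores in the table. `probe`, `scripted`, and the question chain are hypothetical. Exact string matching also stands in for the semantic comparison a real evaluator would need.

```python
from typing import Callable, Dict, List, Tuple

# An "agent" here is any function mapping a question to an answer string.
Agent = Callable[[str], str]


def probe(agent: Agent, chain: List[Tuple[str, str]]) -> List[str]:
    """Ask each (fact_key, question) pair in order and report contradictions.

    The first answer given for a fact_key is recorded as the agent's claim;
    any later answer for the same key that differs (after trimming and
    lowercasing) is reported as a contradiction.
    """
    claims: Dict[str, str] = {}
    contradictions: List[str] = []
    for key, question in chain:
        answer = agent(question).strip().lower()
        if key in claims and claims[key] != answer:
            contradictions.append(f"{key}: {claims[key]!r} vs {answer!r}")
        claims.setdefault(key, answer)
    return contradictions


def scripted(answers: List[str]) -> Agent:
    """Hypothetical stand-in for a persona agent: replays canned answers in order."""
    replies = iter(answers)
    return lambda _question: next(replies)


# Hypothetical chain: the third question re-probes the "hometown" claim indirectly.
chain = [
    ("hometown", "Where did you grow up?"),
    ("job", "What do you do for work?"),
    ("hometown", "Which city did you go to primary school in?"),
]

print(probe(scripted(["Boston", "Nurse", "Chicago"]), chain))
```

Here the scripted agent first claims Boston and later implies Chicago, so the sketch reports one `hometown` contradiction. A superficially convincing agent can pass each question in isolation and still fail this kind of cross-turn check.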
Many multimodal AI models generate plausible but ungrounded reasoning about chest X-ray images. They offer only textual explanations, which makes it difficult to verify how a conclusion was derived from the image. CXReasonAgent integrates an LLM with clinically grounded diagnostic tools so that its responses rest on explicit image-derived evidence, such as measurements, spatial observations, and visual overlays.
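A measurement such as the cardiothoracic ratio (CTR) is one example of image-derived evidence a tool could return. The CTR is maximal cardiac width divided by maximal thoracic width. On a PA film, a value above 0.5 is commonly read as suggesting cardiomegaly. The sketch below assumes that a hypothetical upstream segmentation step has already supplied horizontal pixel extents of the heart and thorax. `Evidence`, `cardiothoracic_ratio`, and `answer_with_evidence` are illustrative names, not CXReasonAgent's API.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """A single piece of image-derived evidence a tool hands back to the LLM."""
    finding: str
    value: float
    detail: str


def cardiothoracic_ratio(heart_left: int, heart_right: int,
                         thorax_left: int, thorax_right: int) -> Evidence:
    """Compute CTR from horizontal pixel extents (assumed to come from a
    hypothetical segmentation model run on a PA chest X-ray)."""
    cardiac = heart_right - heart_left
    thoracic = thorax_right - thorax_left
    if cardiac <= 0 or thoracic <= 0:
        raise ValueError("extents must be positive widths")
    ratio = round(cardiac / thoracic, 3)
    return Evidence("cardiothoracic ratio", ratio,
                    f"cardiac width {cardiac}px / thoracic width {thoracic}px")


def answer_with_evidence(ev: Evidence, threshold: float = 0.5) -> str:
    """Phrase a conclusion that cites the measurement it depends on."""
    verdict = ("suggests cardiomegaly" if ev.value > threshold
               else "is within the usual limit")
    return (f"The {ev.finding} is {ev.value:.2f} ({ev.detail}), "
            f"which {verdict} (threshold {threshold} on a PA view).")


print(answer_with_evidence(cardiothoracic_ratio(300, 620, 150, 850)))
```

In this sketch, the answer carries both the number and how it was obtained. That lets a reader check the conclusion against the image, which a text-only explanation does not allow.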
PatientSim is an open-source, LLM-powered patient simulator that generates realistic and behaviorally diverse patient personas grounded in real clinical data. By combining actual patient information from medical databases with four behavioral dimensions — personality type, language proficiency, medical history recall, and cognitive confusion — it creates 37 distinct patient types for training physicians in clinical interview skills and supporting medical dialogue research.
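The persona mechanism can be pictured as pairing a fixed clinical record with one setting on each of the four behavioral axes, then turning both into the simulator's instructions. In this Python sketch, the axis values, `Persona`, `build_prompt`, and the record fields are all hypothetical. The Cartesian product of these made-up values (36 combinations) is not meant to reproduce PatientSim's 37 types, and the project's own selection of combinations is not described here.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List

# Hypothetical axis values, for illustration only.
PERSONALITIES = ("plain", "verbose", "impatient")
LANGUAGE_LEVELS = ("basic", "intermediate", "fluent")
RECALL_LEVELS = ("high", "low")
CONFUSION_LEVELS = ("none", "high")


@dataclass(frozen=True)
class Persona:
    personality: str
    language: str
    recall: str
    confusion: str


def all_personas() -> List[Persona]:
    """Enumerate every combination of the hypothetical axis values."""
    return [Persona(*combo) for combo in
            product(PERSONALITIES, LANGUAGE_LEVELS, RECALL_LEVELS, CONFUSION_LEVELS)]


def build_prompt(record: Dict[str, object], persona: Persona) -> str:
    """Combine clinical facts (what the patient knows) with behavior (how they say it)."""
    facts = "\n".join(f"- {key}: {value}" for key, value in sorted(record.items()))
    return (
        "You are a patient. Stay consistent with these clinical facts:\n"
        f"{facts}\n"
        f"Personality: {persona.personality}. Language proficiency: {persona.language}. "
        f"Medical history recall: {persona.recall}. Cognitive confusion: {persona.confusion}."
    )


record = {"age": 67, "chief_complaint": "chest pain"}  # hypothetical record
print(build_prompt(record, Persona("impatient", "basic", "low", "none")))
```

This sketch keeps clinical facts and behavior as separate inputs, so the same record can be replayed under different personas. A trainee can then practice the same case against patients who communicate very differently.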