Research Summary
My research focuses on auditing, securing, and deploying reliable AI systems, with an emphasis on foundation models and agentic systems in real-world environments. I study how modern AI systems fail, how their risks can be analyzed and monitored at the system level, and how they can be applied in domains where failures carry significant consequences. This agenda is organized around three closely connected directions:
I develop methods, benchmarks, and open-source systems for auditing foundation models and agentic systems. This work includes trustworthiness evaluation, security analysis of AI pipelines, ecosystem-scale risk scanning, and continuous monitoring that provides evidence for assurance in deployment. Representative systems include TrustLLM and agent-audit.
Keywords: AI Auditing, AI Assurance, Trustworthy AI, Foundation Models, Agentic Systems, TrustLLM, Agent Audit, Monitoring, Evaluation Frameworks, Risk Analysis
I study failure modes and attack surfaces in large language models and agentic systems, and design methods to detect and mitigate unsafe or unreliable behavior. Representative topics include hallucinations, jailbreaks, prompt attacks, privacy leakage, model extraction, failures in multi-agent interactions, and robustness under distribution shift. This direction is also informed by my earlier work on anomaly detection and out-of-distribution detection.
Keywords: LLM Safety, Agent Safety, AI Security, Reliability, Hallucination Mitigation, Jailbreak Detection, Prompt Attacks, Privacy Leakage, Model Extraction, Robustness, OOD Detection, Anomaly Detection
I apply reliable and auditable AI systems to domains where correctness, safety, and accountability matter, including climate and weather forecasting, healthcare and biomedicine, and computational social systems. These applications also serve as demanding testbeds for auditing, assurance, and safety methods.
Keywords: AI for Science, Climate AI, Weather Forecasting, Healthcare AI, Biomedicine, Computational Social Systems, Decision Modeling, High-Stakes AI
News
[Mar 2026] We received an Amazon Research Award under the AI for Information Security program for work on securing agentic AI systems through auditing and guardrails. Thank you, Amazon!
[Mar 2026] We released agent-audit, an open-source security auditing tool for AI agent code, with checks modeled on the OWASP Agentic Top 10, taint analysis, and MCP configuration auditing. On ClawHub, agent-audit scanned 18,899 skills and found 13,947 vulnerabilities. If you find it useful, please star it on GitHub.
[Feb 2026] Our work on premise verification via retrieval-augmented logical reasoning for reducing hallucinations has been accepted to TMLR. See publications page!
[Jan 2026] Our group contributed to five papers accepted to ICLR 2026 and WWW 2026. Hats off to the collaborators. See publications page!
[Dec 2025] Our entire group is at NeurIPS 2025 in San Diego! Please reach out to our Ph.D. students about collaboration opportunities and internships!
[Nov 2025] 🎉Our work on explainability–extractability tradeoffs in MLaaS wins the Second Prize CCC Award at the IEEE ICDM 2025 BlueSky Track!
[Nov 2025] Our paper on mitigating hallucinations in LLMs using causal reasoning has been accepted to AAAI 2026! See our Preprint.
[Nov 2025] 🎉TyphoFormer, our LLM-augmented transformer for typhoon forecasting, wins the Best Short Paper Award at ACM SIGSPATIAL 2025; see our Preprint!
[Oct 2025] Two new papers accepted to IJCNLP-AACL 2025 Findings — AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection and LLM-Empowered Patient-Provider Communication (a data-centric survey on clinical applications of LLMs). Congratulations to all!
[Oct 2025] 🎉Congratulations to our Ph.D. students Yuehan Qin and Haoyan Xu for successfully passing their qualifying exams! Both achieved this within 1.5 years of transferring to our group. We are so proud of their accomplishments and excited for their continued research journeys toward graduation!
[Sep 2025] 🎉Congratulations to Shawn Li for being selected as an Amazon ML Fellow (2025–2026). The fellowship recognizes his strong research achievements as a Ph.D. student and will further accelerate his work in secure and trustworthy machine learning.
[Sep 2025] New collaborative NeurIPS 2025 paper “DyFlow” proposes a dynamic workflow framework for agentic reasoning with LLMs.