Research Summary
My research focuses on AI auditing, safety, and security for agent systems, with related work on foundation models and their deployment in real-world environments. I study how modern AI systems fail, how their behavior and risks can be evaluated, monitored, and controlled at the system level, and how they can be deployed responsibly in domains where failures carry significant consequences. This agenda is organized around three closely connected directions:
I develop methods, benchmarks, and open-source systems for auditing foundation models and agent systems. This work includes trustworthiness evaluation, security analysis of AI pipelines, ecosystem-scale risk scanning, and continuous monitoring that provides evidence for assurance in deployment. Representative systems include agent-audit and TrustLLM.
Keywords: AI Auditing, AI Assurance, Trustworthy AI, Foundation Models, Agent Systems, TrustLLM, agent-audit, Monitoring, Evaluation Frameworks, Risk Analysis
I study failure modes, attack surfaces, and runtime control layers in large language models and agent systems, and design methods to detect and mitigate unsafe or compromised behavior. Representative topics include hallucinations, jailbreaks, prompt attacks, privacy leakage, model extraction, failures in multi-agent interactions, and runtime guardrails such as policy enforcement and approval workflows for tool-using agents. Representative systems include Aegis. This direction is also informed by my earlier work on anomaly detection and out-of-distribution detection.
Keywords: LLM Safety, Agent Safety, AI Security, AI Agent Security, Runtime Guardrails, Policy Enforcement, Hallucination Mitigation, Jailbreak Detection, Prompt Attacks, Privacy Leakage, Model Extraction, Robustness, OOD Detection, Anomaly Detection
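The guardrail idea above (policy enforcement with pre-execution blocking and human-in-the-loop approvals for tool-using agents) can be sketched in a few lines. This is only an illustrative toy, not the implementation of Aegis or any other system; every name (`ToolCall`, `Policy`, `enforce`) is hypothetical.

```python
# Illustrative sketch of a pre-execution policy guardrail for a tool-using
# agent: every tool call is checked against a policy before it runs, and
# sensitive calls are escalated for human approval. All names are
# hypothetical and do not reflect any particular system's API.
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    tool: str
    args: dict

@dataclass
class Policy:
    blocked_tools: set = field(default_factory=set)   # never allowed to run
    needs_approval: set = field(default_factory=set)  # require a human sign-off

def enforce(call: ToolCall, policy: Policy, approve) -> str:
    """Decide 'block' or 'run' for a tool call before it executes."""
    if call.tool in policy.blocked_tools:
        return "block"                                  # pre-execution blocking
    if call.tool in policy.needs_approval:
        return "run" if approve(call) else "block"      # human-in-the-loop gate
    return "run"

policy = Policy(blocked_tools={"shell"}, needs_approval={"send_email"})
print(enforce(ToolCall("shell", {}), policy, approve=lambda c: True))        # block
print(enforce(ToolCall("send_email", {}), policy, approve=lambda c: False))  # block
print(enforce(ToolCall("web_search", {}), policy, approve=lambda c: True))   # run
```

In a real deployment the `approve` callback would route to an approval workflow and every decision would be written to an audit trail; the sketch only shows where those hooks sit relative to tool execution.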
I apply reliable and auditable AI systems to domains where correctness, safety, and accountability matter, including climate and weather forecasting, healthcare and biomedicine, and computational social systems. These applications also serve as demanding testbeds for auditing, assurance, and safety methods.
Keywords: AI for Science, Climate AI, Weather Forecasting, Healthcare AI, Biomedicine, Computational Social Systems, Decision Modeling, High-Stakes AI
News
[Mar 2026] We received an Amazon Research Award under the AI for Information Security program for work on securing agentic AI systems through auditing and guardrails. Thank you, Amazon!
[Mar 2026] We contributed to Aegis, the open-source firewall for AI agents, with pre-execution blocking, human-in-the-loop approvals, and tamper-evident audit trails. A quick demo is available in the repository; if relevant to your workflows, please star it on GitHub.
[Mar 2026] We released agent-audit, an open-source security auditing tool for AI agent code with OWASP Agentic Top 10-style checks, taint analysis, and MCP configuration auditing. On ClawHub, agent-audit scanned 18,899 skills and found 13,947 vulnerabilities. If useful, please star it on GitHub.
[Feb 2026] Our work on premise verification via retrieval-augmented logical reasoning for reducing hallucinations has been accepted to TMLR. See publications page!
[Jan 2026] Our group contributed to five papers accepted to ICLR 2026 and WWW 2026. Hats off to the collaborators. See publications page!
[Dec 2025] Our entire group is at NeurIPS 2025, in San Diego! Please reach out to our Ph.D. students about collaboration opportunities and internships!
[Nov 2025] 🎉Our work on explainability–extractability tradeoffs in MLaaS wins the Second Prize CCC Award at the IEEE ICDM 2025 BlueSky Track!
[Nov 2025] Our paper on mitigating hallucinations in LLMs using causal reasoning has been accepted to AAAI 2026! See our Preprint.
[Nov 2025] 🎉Our LLM-augmented transformer for typhoon forecasting (TyphoFormer) wins the Best Short Paper Award at ACM SIGSPATIAL 2025; see our Preprint!
[Oct 2025] Two new papers accepted to IJCNLP-AACL 2025 Findings — AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection and LLM-Empowered Patient-Provider Communication (a data-centric survey on clinical applications of LLMs). Congratulations to all!
[Oct 2025] 🎉Congratulations to our Ph.D. students Yuehan Qin and Haoyan Xu for passing their qualifying exams, just 1.5 years after transferring to our group! We are so proud of their accomplishments and excited for their continued research journeys toward graduation!
[Sep 2025] 🎉Congratulations to Shawn Li for being selected as an Amazon ML Fellow (2025–2026). The fellowship recognizes his strong research achievements as a Ph.D. student and will further accelerate his work in secure and trustworthy machine learning.
[Sep 2025] New collaborative NeurIPS 2025 paper “DyFlow” proposes a dynamic workflow framework for agentic reasoning with LLMs.