Yue Zhao - LLM Safety, Robustness, Agents, AI Safety, and ML Systems

External Employment Disclosure (updated as of 02/01/2026)
I am not affiliated with any company in an employment, consulting, or advisory capacity.
I am open to future external opportunities on a part-time, advising, or visiting basis.
See Collaboration for current collaboration scope and contact details.
Lab Openings. We warmly welcome new members to the FORTIS Lab!
Hiring Ph.D. Students for Fall 2027 (likely 1-2 openings):
  • Ph.D. student hiring for Fall 2026 has concluded.
  • Also check other labs with openings. Good luck!
  • Future Ph.D. students should be comparable to our first-year Ph.D. students -- see FORTIS Lab.
Priority Signal for Recruiting (Future Ph.D. Students + Current Interns): We especially value candidates who enjoy open-source work and can ship practical research tools/demos. This profile is rare and will be prioritized across both tracks.
Research Collaborators/Interns (Any Time, All Year Round):
  • We welcome both undergraduate and graduate interns from USC and other institutions.
  • We provide compute support, including in-house GPUs, LLM API credits, and cloud resources (primarily AWS + NSF access).
  • For high-performing MS/undergraduate researchers, we may consider hourly pay on a case-by-case basis.
  • Preferred candidates are located in North America for time zone compatibility.
  • I do not hire in-person summer interns -- I am enjoying summer and working remotely :)
Application Process: To apply for either opportunity, review the FORTIS Lab website for more information, complete the Application Form, and then email fortis@usc.edu after submitting the form.

Research Summary

My research focuses on auditing, securing, and deploying reliable AI systems, with an emphasis on foundation models and agentic systems in real-world environments. I study how modern AI systems fail, how their risks can be analyzed and monitored at the system level, and how they can be applied in domains where failures carry significant consequences. This agenda is organized around three closely connected directions:

  1. AI Auditing & Assurance

    I develop methods, benchmarks, and open-source systems for auditing foundation models and agentic systems. This work includes trustworthiness evaluation, security analysis of AI pipelines, ecosystem-scale risk scanning, and continuous monitoring that provides evidence for assurance in deployment. Representative systems include TrustLLM and agent-audit.

    Keywords: AI Auditing, AI Assurance, Trustworthy AI, Foundation Models, Agentic Systems, TrustLLM, Agent Audit, Monitoring, Evaluation Frameworks, Risk Analysis

  2. AI Safety & Reliability

    I study failure modes and attack surfaces in large language models and agentic systems, and design methods to detect and mitigate unsafe or unreliable behavior. Representative topics include hallucinations, jailbreaks, prompt attacks, privacy leakage, model extraction, failures in multi-agent interactions, and robustness under distribution shift. This direction is also informed by my earlier work on anomaly detection and out-of-distribution detection.

    Keywords: LLM Safety, Agent Safety, AI Security, Reliability, Hallucination Mitigation, Jailbreak Detection, Prompt Attacks, Privacy Leakage, Model Extraction, Robustness, OOD Detection, Anomaly Detection

  3. AI for Science & Society

    I apply reliable and auditable AI systems to domains where correctness, safety, and accountability matter, including climate and weather forecasting, healthcare and biomedicine, and computational social systems. These applications also serve as demanding testbeds for auditing, assurance, and safety methods.

    Keywords: AI for Science, Climate AI, Weather Forecasting, Healthcare AI, Biomedicine, Computational Social Systems, Decision Modeling, High-Stakes AI

✈ News and Travel

[Mar 2026] We received an Amazon Research Award under the AI for Information Security program for work on securing agentic AI systems through auditing and guardrails. Thank you, Amazon!

[Mar 2026] We released agent-audit, an open-source security-auditing tool for AI agent code with OWASP Agentic Top 10-style checks, taint analysis, and MCP configuration auditing. On ClawHub, agent-audit scanned 18,899 skills and found 13,947 vulnerabilities. If you find it useful, please star it on GitHub.

[Feb 2026] Our work on premise verification via retrieval-augmented logical reasoning for reducing hallucinations has been accepted to TMLR. See the publications page!

[Jan 2026] Our group contributed to five papers accepted to ICLR 2026 and WWW 2026. Hats off to the collaborators. See the publications page!

[Dec 2025] Our entire group is at NeurIPS 2025 in San Diego! Please reach out to our Ph.D. students about collaboration opportunities and internships!

[Nov 2025] 🎉 Our work on explainability–extractability tradeoffs in MLaaS wins the Second Prize CCC Award at the IEEE ICDM 2025 BlueSky Track!

[Nov 2025] Our paper on mitigating hallucinations in LLMs using causal reasoning has been accepted to AAAI 2026! See our Preprint.


[Nov 2025] 🎉 TyphoFormer, our LLM-augmented transformer for typhoon forecasting, wins the Best Short Paper Award at ACM SIGSPATIAL 2025; see our Preprint!

[Oct 2025] Two new papers accepted to IJCNLP-AACL 2025 Findings: AD-AGENT, a multi-agent framework for end-to-end anomaly detection, and LLM-Empowered Patient-Provider Communication, a data-centric survey on clinical applications of LLMs. Congratulations to all!

[Oct 2025] 🎉 Congratulations to our Ph.D. students Yuehan Qin and Haoyan Xu for successfully passing their qualifying exams! Both achieved this within 1.5 years of transferring to our group. We are so proud of their accomplishments and excited for their continued research journeys toward graduation!

[Sep 2025] 🎉 Congratulations to Shawn Li on being selected as an Amazon ML Fellow (2025–2026). The fellowship recognizes his strong research record as a Ph.D. student and will further accelerate his work on secure and trustworthy machine learning.

[Sep 2025] New collaborative NeurIPS 2025 paper “DyFlow” proposes a dynamic workflow framework for agentic reasoning with LLMs.


🏅 Awards and Grants

As Principal Investigator (August 2023 onward)
Prior to Principal Investigator Role (before August 2023)