I am Haibo Jin, a PhD student in Information Sciences at the University of Illinois Urbana-Champaign, supervised by Prof. Haohan Wang.

My research interests include trustworthy machine learning and the robustness of deep learning systems. I am currently working on attacks and defenses for computer vision, diffusion models, and multi-modal models. If you are interested in academic collaboration, please feel free to email me.

🔥 News

  • 2025.10: 🎉🎉 Welcome to the NeurIPS'25 paper website.
  • 2025.09: 🎉🎉 "Evaluating the Inductive Abilities of Large Language Models: Why Chain-of-Thought Reasoning Sometimes Hurts More Than Helps" was accepted to NeurIPS'25!
  • 2025.08: 🎉🎉 "Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation" was accepted to EMNLP'25!
  • 2025.07: 🎉🎉 Welcome to the Revolve website.
  • 2025.06: 🎉🎉 "Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization" was accepted to ICML'25!
  • 2024.12: 🎉🎉 Check out our survey paper: "Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems".
  • 2024.11: 🎉🎉 Welcome to the JAM website.
  • 2024.09: 🎉🎉 "Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters" was accepted to NeurIPS'24!
  • 2024.08: 🎉🎉 "Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence" was accepted to IEEE Transactions on Dependable and Secure Computing (TDSC)!
  • 2024.07: 🎉🎉 "CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing" was accepted to ECCV'24!
  • 2024.07: 🎉🎉 "EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models" was accepted to ECCV'24!
  • 2024.06: 🎉🎉 Welcome to the JailbreakZoo website.
  • 2024.05: 🎉🎉 Welcome to my new homepage!
  • 2024.04: 🎉🎉 Received a TA/RA offer from the School of Information Sciences, University of Illinois Urbana-Champaign!
  • 2024.03: 🎉🎉 Welcome to JailbreakZoo, a repository dedicated to the jailbreaking of large models (LMs), covering both large language models (LLMs) and vision-language models (VLMs).
  • 2024.01: 🎉🎉 Started my visit to the University of Illinois Urbana-Champaign as a visiting scholar!

📝 Selected Publications

NeurIPS'25

Evaluating the Inductive Abilities of Large Language Models: Why Chain-of-Thought Reasoning Sometimes Hurts More Than Helps

Haibo Jin, Peiyan Zhang, Man Luo, Haohan Wang

The 39th Conference on Neural Information Processing Systems (NeurIPS'25)

PDF / Code

EMNLP'25

Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation

Jun Zhuang, Haibo Jin, Ye Zhang, Zhengjian Kang, Wenbin Zhang, Gaby G Dagher, Haohan Wang

The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP'25)

PDF

ICML'25

Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

Peiyan Zhang, Haibo Jin, Leyang Hu, Xinnuo Li, Liying Kang, Man Luo, Yangqiu Song, Haohan Wang

The 42nd International Conference on Machine Learning (ICML'25)

PDF / Code

NeurIPS'24

Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters

Haibo Jin, Andy Zhou, Joe D Menke, Haohan Wang

The 38th Conference on Neural Information Processing Systems (NeurIPS'24)

PDF / Code

ECCV'24

CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing

Haibo Jin, Ruoxi Chen, Jinyin Chen, Haibin Zheng, Yang Zhang, Haohan Wang

The 18th European Conference on Computer Vision (ECCV'24)

PDF / Code

ECCV'24

EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models

Ruoxi Chen, Haibo Jin, Yixin Liu, Jinyin Chen, Haohan Wang, Lichao Sun

The 18th European Conference on Computer Vision (ECCV'24)

PDF / Code

SeT LLM@ICLR'24

GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models

Haibo Jin, Ruoxi Chen, Andy Zhou, Yang Zhang, Haohan Wang

ICLR 2024 Workshop on Secure and Trustworthy Large Language Models (SeT LLM@ICLR'24)

PDF / Code / Talk / DOI

ASE'23

CertPri: Certifiable Prioritization for Deep Neural Networks via Movement Cost in Feature Space

Haibin Zheng, Jinyin Chen, Haibo Jin

The 38th IEEE/ACM International Conference on Automated Software Engineering (ASE'23)

PDF / Code / DOI

Information Sciences'23

Excitement surfeited turns to errors: Deep learning testing framework based on excitable neurons

Haibo Jin, Ruoxi Chen, Haibin Zheng, Jinyin Chen, Yao Cheng, Yue Yu, Tieming Chen, Xianglong Liu

Information Sciences, 2023, 637: 118936

PDF / Code / DOI

Information Sciences'22

ROBY: Evaluating the adversarial robustness of a deep model by its decision boundaries

Haibo Jin, Jinyin Chen, Haibin Zheng, Zhen Wang, Jun Xiao, Shanqing Yu, Zhaoyan Ming

Information Sciences, 2022, 587: 97-122

PDF / Code / DOI

TDSC'24

Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence

Ruoxi Chen, Haibo Jin, Haibin Zheng, Jinyin Chen, Zhenguang Liu

IEEE Transactions on Dependable and Secure Computing, 2024

PDF / Code

Computers & Security'24

AdvCheck: Characterizing adversarial examples via local gradient checking

Ruoxi Chen, Haibo Jin, Jinyin Chen, Haibin Zheng, Shilian Zheng, Xiaoniu Yang, Xing Yang

Computers & Security, 2024, 136: 103540

PDF / Code / DOI

🎖️ Honors and Awards

  • 2022.10: National Scholarship, Postgraduate Premium Scholarship.

📖 Education

  • 2025.09 - now: PhD Candidate, Information Sciences, University of Illinois Urbana-Champaign, Illinois, US.

💻 Internships

  • 2023.04 - 2025.08: Visiting scholar at DREAM Lab, University of Illinois Urbana-Champaign, US. (Supervisor: Haohan Wang)

🔖 Service

  • 2025: ICLR 2025, AISTATS 2025
  • 2024: NeurIPS 2024, ICML 2024, WWW 2024, SeT LLM @ ICLR 2024