I am Haibo Jin, a PhD student in Information Sciences at the University of Illinois Urbana-Champaign, supervised by Prof. Haohan Wang.

My research interests include trustworthy machine learning and the robustness of deep learning systems. I am currently working on attacks and defenses for computer vision models, diffusion models, and multi-modal models. If you are interested in academic collaboration, please feel free to email me.

🔥 News

2025.10: 🎉🎉 Welcome to visit our NeurIPS’25 paper website.
2025.09: 🎉🎉 Evaluating the Inductive Abilities of Large Language Models: Why Chain-of-Thought Reasoning Sometimes Hurts More Than Helps is accepted by NeurIPS’25!
2025.08: 🎉🎉 Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation is accepted by EMNLP’25!
2025.07: 🎉🎉 Welcome to visit the Revolve website.
2025.06: 🎉🎉 Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization is accepted by ICML’25!
2024.12: 🎉🎉 Welcome to our survey paper: Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
2024.11: 🎉🎉 Welcome to visit the JAM website.
2024.09: 🎉🎉 Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters is accepted by NeurIPS’24!
2024.08: 🎉🎉 Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence is accepted by IEEE Transactions on Dependable and Secure Computing (TDSC)!
2024.07: 🎉🎉 CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing is accepted by ECCV’24!
2024.07: 🎉🎉 EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models is accepted by ECCV’24!
2024.06: 🎉🎉 Welcome to visit the JailbreakZoo website.
2024.05: 🎉🎉 Welcome to my new homepage!
2024.04: 🎉🎉 Received a TA/RA offer from the School of Information Sciences, University of Illinois Urbana-Champaign!
2024.03: 🎉🎉 Welcome to visit JailbreakZoo, a dedicated repository focused on the jailbreaking of large models (LMs), encompassing both large language models (LLMs) and vision language models (VLMs).
2024.01: 🎉🎉 Started my visit to the University of Illinois Urbana-Champaign as a visiting scholar!

📝 Selected Publications

NeurIPS'25

Evaluating the Inductive Abilities of Large Language Models: Why Chain-of-Thought Reasoning Sometimes Hurts More Than Helps

Haibo Jin, Peiyan Zhang, Man Luo, Haohan Wang

The 39th Conference on Neural Information Processing Systems (NeurIPS’25)

PDF / Code

EMNLP'25

Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation

Jun Zhuang, Haibo Jin, Ye Zhang, Zhengjian Kang, Wenbin Zhang, Gaby G Dagher, Haohan Wang

The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP’25)

PDF

ICML'25

Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

Peiyan Zhang, Haibo Jin, Leyang Hu, Xinnuo Li, Liying Kang, Man Luo, Yangqiu Song, Haohan Wang

The 42nd International Conference on Machine Learning (ICML’25)

PDF / Code

NeurIPS'24

Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters

Haibo Jin, Andy Zhou, Joe D Menke, Haohan Wang

The 38th Conference on Neural Information Processing Systems (NeurIPS’24)

PDF / Code

ECCV'24

CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing

Haibo Jin, Ruoxi Chen, Jinyin Chen, Haibin Zheng, Yang Zhang, Haohan Wang

The 18th European Conference on Computer Vision (ECCV’24)

PDF / Code

ECCV'24

EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models

Ruoxi Chen, Haibo Jin, Yixin Liu, Jinyin Chen, Haohan Wang, Lichao Sun

The 18th European Conference on Computer Vision (ECCV’24)

PDF / Code

SeT LLM@ICLR'24

GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models

Haibo Jin, Ruoxi Chen, Andy Zhou, Yang Zhang, Haohan Wang

ICLR 2024 Workshop on Secure and Trustworthy Large Language Models (SeT LLM@ICLR’24)

PDF / Code / Talk / DOI

ASE'23

CertPri: Certifiable Prioritization for Deep Neural Networks via Movement Cost in Feature Space

Haibin Zheng, Jinyin Chen, Haibo Jin

The 38th IEEE/ACM International Conference on Automated Software Engineering (ASE’23)

PDF / Code / DOI

Information Sciences'23

Excitement surfeited turns to errors: Deep learning testing framework based on excitable neurons

Haibo Jin, Ruoxi Chen, Haibin Zheng, Jinyin Chen, Yao Cheng, Yue Yu, Tieming Chen, Xianglong Liu

Information Sciences, 2023, 637: 118936

PDF / Code / DOI

Information Sciences'22

ROBY: Evaluating the adversarial robustness of a deep model by its decision boundaries

Haibo Jin, Jinyin Chen, Haibin Zheng, Zhen Wang, Jun Xiao, Shanqing Yu, Zhaoyan Ming

Information Sciences, 2022, 587: 97-122

PDF / Code / DOI

TDSC'24

Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence

Ruoxi Chen, Haibo Jin, Haibin Zheng, Jinyin Chen, Zhenguang Liu

IEEE Transactions on Dependable and Secure Computing, 2024

PDF / Code

Computers & Security'24

AdvCheck: Characterizing adversarial examples via local gradient checking

Ruoxi Chen, Haibo Jin, Jinyin Chen, Haibin Zheng, Shilian Zheng, Xiaoniu Yang, Xing Yang

Computers & Security, 2024, 136: 103540

PDF / Code / DOI

🎖️ Honors and Awards

2022.10: National Scholarship, Postgraduate Premium Scholarship.

📖 Education

2025.09 - now: PhD Candidate, Information Sciences, University of Illinois Urbana-Champaign, Illinois, US.

💻 Internships

2023.04 - 2025.08: Visiting scholar at DREAM Lab, University of Illinois Urbana-Champaign, US. (Supervisor: Haohan Wang)

🔖 Service

2025: ICLR 2025, AISTATS 2025
2024: NeurIPS 2024, ICML 2024, WWW 2024, SeT LLM @ ICLR 2024