Publications
First-author and co-first-author works are listed first, followed by collaborations.
First-author
Evolving Agents in the Dark: Retrospective Harness Optimization via Self-Preference
Wenbo Pan, Shujie Liu, Chin-Yew Lin, Jingying Zeng, Xianfeng Tang, Xiangyang Zhou, Yan Lu, Xiaohua Jia
arXiv preprint, 2026 · blog post
TL;DR: RHO improves an agent’s skills, tools, and workflows using only its own past trajectories, raising SWE-Bench Pro pass rate from 59% to 78% in one label-free optimization round.
RHO improves LLM agents without any ground-truth labels: the agent retrospectively compares its own past trajectories via self-preference, then rewrites its harness (prompts, tools, control flow) to prefer the behaviors it judges better. Under a matched budget it also beats a validation-feedback optimizer (78% vs 62%) without touching labels. Joint work with Microsoft Research Asia.
M*: Every Task Deserves Its Own Memory Harness
Wenbo Pan, Shujie Liu, Xiangyang Zhou, Shiwei Zhang, Wanlu Shi, Mirror Xu, Xiaohua Jia
arXiv preprint, 2026
TL;DR: M* represents an agent’s memory system as an executable Python program and evolves it per task, beating nine fixed-memory baselines on 7 of 8 metrics with relative gains up to 31%.
Instead of one fixed memory design for all tasks, M* searches for a task-specific memory architecture expressed as executable Python code (data schema, storage logic, and workflow instructions), optimizing how an agent stores, retrieves, and consolidates information for each workload across conversation, embodied planning, and expert reasoning benchmarks.
Towards Long-Horizon Interpretability: Efficient and Faithful Multi-Token Attribution for Reasoning LLMs
Wenbo Pan, Zhichao Liu, Xianlong Wang, Haining Yu, Xiaohua Jia
ICML 2026 (Oral)
TL;DR: FlashTrace computes multi-token attribution for reasoning LLMs in a single O(N) pass, over 130x faster than the most efficient baseline (20 seconds vs 38 minutes on 5k-token spans) with higher faithfulness.
FlashTrace attributes multi-token spans in long reasoning chains to their input causes, recursively tracing importance through the chain to recover faithfulness lost when reasoning tokens absorb over 90% of attribution mass. Selected for an oral presentation at ICML 2026 (168 of 23,918 submissions, top 0.7%). Ships as a Python package with CLI and interactive HTML traces.
Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks
Wenbo Pan, Jie Xu, Qiguang Chen, Junhao Dong, Libo Qin, Xinfeng Li, Haining Yu, Xiaohua Jia
ICLR 2026
TL;DR: Proposes the Refusal Index, the Spearman correlation between a model’s refusal and error probabilities; it is about 70% less variable than heuristic refusal metrics, and accuracy barely predicts refusal (R² = 0.242).
Proposes knowledge-aware refusal metrics that separate “refusing because the model does not know” from blanket refusal behavior, and measures this ability across model families on factual tasks.
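As a rough illustration of the statistic named in the TL;DR, the sketch below computes a Spearman rank correlation between per-question refusal and error probabilities. How the paper estimates those probabilities is not described here, so the input lists are hypothetical values and the helper names `average_ranks` and `spearman` are assumptions made for this example.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-question probabilities (not taken from the paper).
refusal_prob = [0.05, 0.10, 0.40, 0.70, 0.90]
error_prob = [0.02, 0.20, 0.35, 0.60, 0.95]
print(spearman(refusal_prob, error_prob))  # 1.0: the two rankings agree exactly
```

A value near 1 would mean the model refuses most on the questions it is most likely to get wrong; a value near 0 would mean refusal is unrelated to what the model knows.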
The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Analysis of Orthogonal Safety Directions
Wenbo Pan, Zhichao Liu, Qiguang Chen, Xiangyang Zhou, Haining Yu, Xiaohua Jia
ICML 2025
TL;DR: LLM safety alignment is controlled by multiple orthogonal activation directions: ablating the dominant one eliminates refusal entirely, and a trigger-removal attack keeps about 40% success while other jailbreaks drop to near zero.
Shows that safety alignment writes not one but many orthogonal directions into activation space. We extract this safety residual space, identify directions predictive of refusal (including interpretable secondary features such as role-playing), and show how individual directions can be ablated or steered.
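To make "ablated or steered" concrete, here is a minimal sketch of the generic projection-based operations on a single activation vector. It is not necessarily the paper's exact procedure; the function names `ablate_direction` and `steer` and the toy vectors are assumptions for illustration.

```python
from math import sqrt

def _unit(direction):
    """Normalize a direction vector to unit length."""
    norm = sqrt(sum(d * d for d in direction))
    return [d / norm for d in direction]

def ablate_direction(activation, direction):
    """Remove the component of `activation` that lies along `direction`."""
    unit = _unit(direction)
    coeff = sum(a * u for a, u in zip(activation, unit))
    return [a - coeff * u for a, u in zip(activation, unit)]

def steer(activation, direction, alpha):
    """Shift `activation` by `alpha` units along `direction`."""
    unit = _unit(direction)
    return [a + alpha * u for a, u in zip(activation, unit)]

# Toy 3-dimensional activation and direction (hypothetical values).
activation = [3.0, 4.0, 1.0]
direction = [0.0, 2.0, 0.0]

print(ablate_direction(activation, direction))  # [3.0, 0.0, 1.0]
print(steer(activation, direction, alpha=2.0))  # [3.0, 6.0, 1.0]
```

Because the directions are orthogonal, ablating one leaves the components along the others unchanged, which is what allows them to be studied individually.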
Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale
Wenzhen Zheng*, Wenbo Pan*, Xu Xu*, Libo Qin, Li Yue, Ming Zhou (* co-first authors)
EMNLP 2024
TL;DR: Cross-lingual continual pre-training follows an extended Chinchilla scaling law, matching from-scratch loss with 25-50% fewer training FLOPs across 40 model sizes up to 5B parameters.
A scaling law for continual pre-training: given a compute budget, predicts the loss reachable when adapting an existing checkpoint to a new data distribution (e.g., a new language), validated across 40 model sizes from 40M to 5B parameters. Replaying 10-30% source-language data prevents catastrophic forgetting. Cited 19 times.
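For orientation, the sketch below shows the standard Chinchilla parametric loss, L(N, D) = E + A / N^alpha + B / D^beta, and a simple token split for source-language replay. The paper's extended form is not reproduced in this list, so this is only the baseline form it builds on; all coefficient values are placeholders, and `chinchilla_loss` and `replay_split` are hypothetical helper names.

```python
def chinchilla_loss(n_params, n_tokens, e, a, b, alpha, beta):
    """Standard Chinchilla form: irreducible loss plus model- and data-size terms."""
    return e + a / n_params ** alpha + b / n_tokens ** beta

def replay_split(total_tokens, replay_fraction):
    """Split a token budget into (source-language replay, target-language) tokens."""
    if not 0.0 <= replay_fraction <= 1.0:
        raise ValueError("replay_fraction must be between 0 and 1")
    source_tokens = round(total_tokens * replay_fraction)
    return source_tokens, total_tokens - source_tokens

# Placeholder coefficients, chosen only to make the example run.
coeffs = dict(e=1.5, a=400.0, b=400.0, alpha=0.3, beta=0.3)

small_budget = chinchilla_loss(n_params=1e9, n_tokens=1e10, **coeffs)
large_budget = chinchilla_loss(n_params=1e9, n_tokens=1e11, **coeffs)
print(small_budget > large_budget)  # True: more tokens lower the predicted loss

# A 20% replay ratio falls inside the 10-30% range mentioned above.
print(replay_split(1_000_000, 0.2))  # (200000, 800000)
```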
A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding
Wenbo Pan, Qiguang Chen, Xiao Xu, Wanxiang Che, Libo Qin
arXiv preprint, 2023
TL;DR: Zero-shot ChatGPT reaches 60.28 JGA on MultiWOZ 2.1 dialogue state tracking, nearly double GPT-3.5’s 32.25 and within one point of the fine-tuned state of the art (61.02).
One of the earliest systematic evaluations of ChatGPT on dialogue understanding (slot filling, intent detection, DST), documenting failure modes that shaped later instruction-tuning work. Cited 55 times.
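The JGA figures above refer to joint goal accuracy, the usual dialogue state tracking metric: a turn counts as correct only if every slot in the predicted state matches the gold state. A minimal sketch follows; the slot names and values are invented for illustration and `joint_goal_accuracy` is a hypothetical helper, not the paper's evaluation script.

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of turns whose predicted dialogue state equals the gold state exactly."""
    if len(predicted_states) != len(gold_states):
        raise ValueError("need one predicted state per gold state")
    correct = sum(1 for pred, gold in zip(predicted_states, gold_states) if pred == gold)
    return correct / len(gold_states)

# Three turns of an invented hotel-booking dialogue.
gold = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
]
predicted = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "3", "hotel-parking": "yes"},  # one wrong slot
]

jga = joint_goal_accuracy(predicted, gold)
print(f"{100 * jga:.2f}")  # 66.67
```

A single wrong slot fails the whole turn, which is why JGA is a stricter measure than per-slot accuracy.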
Collaborations
WebTrap: Stealthy Mid-Task Hijacking of Browser Agents During Navigation
Zhichao Liu, Wenbo Pan, Haining Yu, Ge Gao, Tianqing Zhu, Xiaohua Jia
arXiv preprint, 2026
TL;DR: WebTrap injects an attacker goal into a browser agent’s task as a seamless workflow step, reaching 91.67% attack success while preserving 91.67% task utility and resisting standard defenses.
Image-to-Video Diffusion: From Foundations to Open Frontiers
Xianlong Wang, Wenbo Pan, Shijia Zhou, Ke Li, Yuqi Wang, Zeyu Ye, Hangtao Zhang, Leo Yu Zhang, Xiaohua Jia
arXiv preprint, 2026
TL;DR: A survey of diffusion-based image-to-video generation: a taxonomy over architectures and training paradigms, four core design dimensions from condition encoding to spatial-temporal upsampling, and 190 references.
Dual-branch Robust Unlearnable Examples
Xianlong Wang, Hangtao Zhang, Wenbo Pan, Ziqi Zhou, Changsong Jiang, Li Zeng, Xiaohua Jia
ICML 2026
TL;DR: DUNE optimizes dual-branch unlearnable perturbations in the spatial and color domains, outperforming 12 state-of-the-art schemes under 7 defenses and capping CIFAR-10 test accuracy at 14.95-50.82%.
Improve Fluency of Neural Machine Translation Using Large Language Models
Jianfei He, Wenbo Pan, Jijia Yang, Sen Peng, Xiaohua Jia
MT Summit 2025
TL;DR: Integrating a Llama2-13B fluency signal into NMT training via contrastive fluency enhancement raises BLEU on all three WMT pairs (up to +1.09), where LLM re-ranking and refinement fail.
End-to-end Task-oriented Dialogue: A Survey of Tasks, Methods, and Future Directions
Libo Qin, Wenbo Pan, Qiguang Chen, Lizi Liao, Zhou Yu, Yue Zhang, Wanxiang Che, Min Li
EMNLP 2023
TL;DR: The first survey of end-to-end task-oriented dialogue: a Modularly vs. Fully EToD taxonomy with curated leaderboards covering 39 models across 4 benchmarks at etods.net. Cited 34 times.
BibTeX
@article{pan2026rho,
title={Evolving Agents in the Dark: Retrospective Harness Optimization via Self-Preference},
author={Pan, Wenbo and Liu, Shujie and Lin, Chin-Yew and Zeng, Jingying and Tang, Xianfeng and Zhou, Xiangyang and Lu, Yan and Jia, Xiaohua},
journal={arXiv preprint arXiv:2606.05922},
year={2026}
}
@article{pan2026mstar,
title={M$^\star$: Every Task Deserves Its Own Memory Harness},
author={Pan, Wenbo and Liu, Shujie and Zhou, Xiangyang and Zhang, Shiwei and Shi, Wanlu and Xu, Mirror and Jia, Xiaohua},
journal={arXiv preprint arXiv:2604.11811},
year={2026}
}
@inproceedings{pan2026flashtrace,
title={Towards Long-Horizon Interpretability: Efficient and Faithful Multi-Token Attribution for Reasoning LLMs},
author={Pan, Wenbo and Liu, Zhichao and Wang, Xianlong and Yu, Haining and Jia, Xiaohua},
booktitle={International Conference on Machine Learning (ICML)},
note={Oral presentation},
year={2026}
}
@inproceedings{pan2026refusal,
title={Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks},
author={Pan, Wenbo and Xu, Jie and Chen, Qiguang and Dong, Junhao and Qin, Libo and Li, Xinfeng and Yu, Haining and Jia, Xiaohua},
booktitle={International Conference on Learning Representations (ICLR)},
year={2026}
}
@inproceedings{pan2025hidden,
title={The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Analysis of Orthogonal Safety Directions},
author={Pan, Wenbo and Liu, Zhichao and Chen, Qiguang and Zhou, Xiangyang and Yu, Haining and Jia, Xiaohua},
booktitle={International Conference on Machine Learning (ICML)},
year={2025}
}
@inproceedings{zheng2024breaking,
title={Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale},
author={Zheng, Wenzhen and Pan, Wenbo and Xu, Xu and Qin, Libo and Yue, Li and Zhou, Ming},
booktitle={Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year={2024}
}
@article{pan2023preliminary,
title={A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding},
author={Pan, Wenbo and Chen, Qiguang and Xu, Xiao and Che, Wanxiang and Qin, Libo},
journal={arXiv preprint arXiv:2304.04256},
year={2023}
}