Publications
Below is a curated list of my publications.
1) My Favorite Streamer is an LLM: Discovering, Bonding, and Co-Creating in AI VTuber Fandom
- Authors: Jiayi Ye*, Chaoran Chen, Yue Huang, Yanfang Ye, Toby Jia-Jun Li, Xiangliang Zhang
- Venue: CHI 2026
- Links: arXiv · PDF
2) Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
- Authors: Jiayi Ye*, Yanbo Wang*, Yue Huang*, Dongping Chen, Qihui Zhang, Nuno Moniz, Tian Gao, Werner Geyer, Chao Huang, Pin-Yu Chen, Nitesh V Chawla, Xiangliang Zhang (* Equal Contribution)
- Venue: ICLR 2025
- Links: arXiv · PDF
3) Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study
- Authors: Yujun Zhou*, Jiayi Ye*, Zipeng Ling*, Yufei Han, Yue Huang, Haomin Zhuang, Zhenwen Liang, Kehan Guo, Taicheng Guo, Xiangqi Wang, Xiangliang Zhang (* Equal Contribution)
- Venue: Findings of EMNLP 2025
- Links: arXiv · PDF
4) AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?
- Authors: Han Bao*, Yue Huang*, Yanbo Wang*, Jiayi Ye*, Xiangqi Wang, Xiuying Chen, Mohamed Elhoseiny, Xiangliang Zhang (* Equal Contribution)
- Venue: SFLLM Workshop @ NeurIPS 2024
- Links: Website · arXiv · PDF
5) TRUSTEVAL: A Dynamic Evaluation Toolkit on Trustworthiness of Generative Foundation Models
- Authors: Yanbo Wang*, Jiayi Ye*, Siyuan Wu*, Chujie Gao, Yue Huang, Xiuying Chen, Yue Zhao, Xiangliang Zhang (* Equal Contribution)
- Venue: NAACL 2025 System Demonstration Track
- Links: Website · Code
6) UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
- Authors: Qihui Zhang, Munan Ning, Zheyuan Liu, Yanbo Wang, Jiayi Ye, Yue Huang, Shuo Yang, Xiao Chen, Yibing Song, Li Yuan
- Venue: CVPR 2025
- Links: arXiv · PDF