Publications (* indicates co-first authors)
-
RLBoost: Harvesting Preemptible Resources for Cost-Efficient Reinforcement Learning on LLMs
Y Wu*, X Liu*, H Zheng, J Gu, B Chen, ZM Mao, A Krishnamurthy, I Stoica
NSDI 2026 (to appear)
Keywords: LLM RL, Spot Instances, Kubernetes, Cost Efficiency
Paper Code -
HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs
Y Wu*, X Liu*, S Jin, C Xu, F Qian, ZM Mao, M Lentz, D Zhuo, I Stoica
In submission
Keywords: LLM Training, Mixture-of-Experts, Heterogeneous GPUs, DeepSpeed
Paper -
Plato: Plan to Efficiently Decode for Large Language Model Inference
S Jin*, X Liu*, Y Wu, H Zheng, Q Zhang, M Lentz, ZM Mao, A Prakash, F Qian, D Zhuo
COLM 2025
Keywords: LLM Inference, Parallel Decoding, Structured Decoding, KV Cache
Paper -
Compute Or Load KV Cache? Why Not Both? (Cake)
S Jin*, X Liu*, Q Zhang, ZM Mao
ICML 2025
Keywords: LLM Inference, KV Cache, Long Context, vLLM, LMCache
Paper -
Learn To Be Efficient: Build Structured Sparsity in Large Language Models (LTE)
H Zheng, X Bai, X Liu, ZM Mao, B Chen, F Lai, A Prakash
NeurIPS 2024 Spotlight
Keywords: LLM Efficiency, Structured Sparsity, MoE, Gather-scatter, Triton
Paper -
mm2-gb: GPU Accelerated Minimap2 for Long Read DNA Mapping
J Dong*, X Liu*, H Sadasivan, S Sitaraman, S Narayanasamy
ACM BCB 2024 Oral
Keywords: GPU, DNA Mapping, Minimap2, HPC, Persistent Kernel
Paper Code