Publications (* indicates co-first authors)

  1. RLBoost: Harvesting Preemptible Resources for Cost-Efficient Reinforcement Learning on LLMs
    Y Wu*, X Liu*, H Zheng, J Gu, B Chen, ZM Mao, A Krishnamurthy, I Stoica
    NSDI 2026 (to appear)
    Keywords: LLM RL, Spot Instances, Kubernetes, Cost Efficiency
    Paper | Code

  2. HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs
    Y Wu*, X Liu*, S Jin, C Xu, F Qian, ZM Mao, M Lentz, D Zhuo, I Stoica
    In submission
    Keywords: LLM Training, Mixture-of-Experts, Heterogeneous GPUs, DeepSpeed
    Paper

  3. Plato: Plan to Efficiently Decode for Large Language Model Inference
    S Jin*, X Liu*, Y Wu, H Zheng, Q Zhang, M Lentz, ZM Mao, A Prakash, F Qian, D Zhuo
    COLM 2025
    Keywords: LLM Inference, Parallel Decoding, Structured Decoding, KV Cache
    Paper

  4. Compute Or Load KV Cache? Why Not Both? (Cake)
    S Jin*, X Liu*, Q Zhang, ZM Mao
    ICML 2025
    Keywords: LLM Inference, KV Cache, Long Context, vLLM, LMCache
    Paper

  5. Learn To Be Efficient: Build Structured Sparsity in Large Language Models (LTE)
    H Zheng, X Bai, X Liu, ZM Mao, B Chen, F Lai, A Prakash
    NeurIPS 2024 Spotlight
    Keywords: LLM Efficiency, Structured Sparsity, MoE, Gather-scatter, Triton
    Paper

  6. mm2-gb: GPU Accelerated Minimap2 for Long Read DNA Mapping
    J Dong*, X Liu*, H Sadasivan, S Sitaraman, S Narayanasamy
    ACM BCB 2024 Oral
    Keywords: GPU, DNA Mapping, Minimap2, HPC, Persistent Kernel
    Paper | Code