My name is Xueshen Liu (刘学深). I am a fourth-year Ph.D. candidate in the Computer Science and Engineering Division at the University of Michigan, advised by Prof. Z. Morley Mao. My research focuses on distributed systems and parallel computing. Currently, I am exploring efficient solutions for training, inference, and reinforcement learning (RL) of large language models (LLMs) by designing elastic and heterogeneous systems.

Education

  • University of Michigan, Ann Arbor, MI
    Ph.D. in Computer Science and Engineering (CSE), 2022 – Present
    B.S. in Computer Science and Engineering (CSE), 2020 – 2022
  • Shanghai Jiao Tong University, Shanghai, China
    B.S. in Electrical and Computer Engineering (ECE), 2018 – 2022

Experience

  • Student Researcher, Systems Research @ Google, Seattle, WA (May 2025 – Dec. 2025)
    • Characterized bottlenecks across the LLM RL pipeline and identified rollout as the dominant yet highly elastic stage.
    • Designed RLBoost on Google Cloud Platform to harvest fragmented spot resources, lower RL training cost, and improve overall utilization.
    • Explored heterogeneous compute options (multi-generation GPUs & TPUs) under diverse RL workloads (sequence length, tool calling, etc.).
    • Contributed to an NL2SQL agentic training pipeline, optimizing multi-node communication and applying asynchronous tool calling.
  • Graduate Student Instructor, CSE 589 Advanced Computer Networks, University of Michigan, Ann Arbor, MI (Sept. 2024 – Dec. 2024)
    • Led in-class discussions and held regular office hours.
    • Delivered a guest lecture on distributed software-defined networking (dSDN).
    • Mentored graduate students on research projects, including methodology, implementation, and presentation.
  • Intern Researcher, Connected Autonomous Vehicle (CAV) Lab, General Motors, Warren, MI (May 2024 – Aug. 2024)
    • Designed a large-scale latency-tolerant vehicle positioning system on edge/cloud servers.
    • Developed a deep factor graph model to handle delayed perception data while maintaining real-time responsiveness.
    • Leveraged parallelism and prioritized scheduling to meet tight latency constraints.

Service & Honors

  • Reviewer / PC member: ICLR’26, ICLR’25, COLING’25
  • Invited talk: “Scalable & Latency-tolerant Edge/Cloud Computing via Deep Factor Graph” (Aug. 2024)
  • Invited talk: “Minimap2-gigabases (mm2-gb)” at AMD HPC Apps Knowledge Sync (May 2024)
  • Awards:
    • Roger King Scholarship, College of Engineering, University of Michigan (Aug. 2021)
    • Runner-up Team & Grand Prize, 18th RoboMaster Final Competition (Aug. 2019)

Skills

  • Machine learning & systems: VeRL, PyTorch, DeepSpeed, NCCL, SGLang, vLLM, FlashAttention, LMCache, CUTLASS
  • Programming languages & compiler infrastructure: Python, Rust, Triton, CUDA, HIP, C/C++, Go, LLVM
  • Development & profiling: Kubernetes, Nsight Systems/Compute, MCP, Cursor/Codex, Perfetto, Slurm, Docker, Git