Publications (* indicates equal contribution)

  1. Compute Or Load KV Cache? Why Not Both? (Cake)
    S Jin*, X Liu*, Q Zhang, ZM Mao
    arXiv preprint arXiv:2410.03065
    Keywords: LLM Inference, KV Cache, Long Context, vLLM, LMCache

  2. Learn To Be Efficient: Build Structured Sparsity in Large Language Models (LTE)
    H Zheng, X Bai, X Liu, ZM Mao, B Chen, F Lai, A Prakash
    NeurIPS 2024 Spotlight
    Keywords: LLM Efficiency, Structured Sparsity, MoE, Gather-scatter, Triton

  3. mm2-gb: GPU Accelerated Minimap2 for Long Read DNA Mapping
    J Dong*, X Liu*, H Sadasivan, S Sitaraman, S Narayanasamy
    ACM BCB 2024 Oral
    Keywords: GPU, DNA Mapping, Minimap2, HPC, Persistent Kernel