Publications (* indicates equal contribution)
-
Compute Or Load KV Cache? Why Not Both? (Cake)
S Jin*, X Liu*, Q Zhang, ZM Mao
arXiv preprint arXiv:2410.03065
Keywords: LLM Inference, KV Cache, Long Context, vLLM, LMCache
Paper
-
Learn To Be Efficient: Build Structured Sparsity in Large Language Models (LTE)
H Zheng, X Bai, X Liu, ZM Mao, B Chen, F Lai, A Prakash
NeurIPS 2024 (Spotlight)
Keywords: LLM Efficiency, Structured Sparsity, MoE, Gather-scatter, Triton
Paper
-
mm2-gb: GPU Accelerated Minimap2 for Long Read DNA Mapping
J Dong*, X Liu*, H Sadasivan, S Sitaraman, S Narayanasamy
ACM BCB 2024 (Oral)
Keywords: GPU, DNA Mapping, Minimap2, HPC, Persistent Kernel
Paper