Research
My primary research interest lies in Machine Learning Systems (MLSys), with a focus on developing efficient and scalable systems for model inference and training.
On the algorithmic side, my research emphasizes sparsity and quantization techniques to improve model inference efficiency. On the systems side, I am particularly interested in designing robust pipeline services that support the effective deployment of popular ML algorithms.
Herd: Grouping before Pruning for Batch Inference
Yuwei An, Zhuoming Chen, Chenyan Xiong, Beidi Chen
Preprint
paper
IFMoE: An Inference Framework Design for Fine-grained MoE
Yuwei An, Zhuoming Chen, Beidi Chen
NeurIPS 2024 MLSys Workshop
paper
Controllable Mesh Generation Through Sparse Latent Point Diffusion Models
Zhaoyang Lyu, Jinyi Wang, Yuwei An, Ya Zhang, Dahua Lin, Bo Dai
CVPR 2023
paper
LMCache
website
LMCache is an LLM serving engine extension that reduces time-to-first-token (TTFT) and increases throughput, especially in long-context scenarios. By storing the KV caches of reusable text across multiple locations (GPU, CPU DRAM, and local disk), LMCache reuses the KV cache of any repeated text (not necessarily a prefix) in any serving engine instance.
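The idea of reusing KV caches for any repeated text, not just shared prefixes, can be illustrated with a minimal sketch. This is a hypothetical toy (not the actual LMCache API): it keys cached "KV" entries by a content hash of each text chunk, so a repeated chunk anywhere in a new request hits the cache.

```python
import hashlib

class ToyKVCache:
    """Toy sketch of chunk-level KV reuse; not the real LMCache implementation."""

    def __init__(self):
        # chunk hash -> stand-in for the chunk's KV tensors
        self.store = {}

    def _key(self, chunk: str) -> str:
        return hashlib.sha256(chunk.encode()).hexdigest()

    def lookup_or_compute(self, text: str, chunk_size: int = 4):
        """Split text into fixed-size chunks; reuse any chunk seen before."""
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        hits = 0
        for c in chunks:
            k = self._key(c)
            if k in self.store:
                hits += 1  # cache hit: reuse KV even for a non-prefix chunk
            else:
                self.store[k] = f"kv({c})"  # pretend we computed KV for this chunk
        return hits, len(chunks)
```

Because lookup is by chunk content rather than by sequence prefix, a document quoted in the middle of a second request still reuses its cached entries, which is the property that lets prefix-unrelated requests share work.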
CMU 18789: Deep Generative Modeling
website
Teaching assistant for 18-789 (Deep Generative Modeling) in Spring 2025. Instructor: Beidi Chen.
This website is built on the source code from Jon Barron. Thanks to him for sharing this beautiful template!