Junchen's Lab
Shan Lu
Latest
CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
Automatic and Efficient Customization of Neural Networks for ML Applications
Run-Time Prevention of Software Integration Failures of Machine Learning APIs