Junchen's Lab
Shan Lu
Latest
CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
Automatic and Efficient Customization of Neural Networks for ML Applications
Run-Time Prevention of Software Integration Failures of Machine Learning APIs