Hanchen presented the CacheGen paper at SIGCOMM'24.

CacheGen is an LLM KV cache compression and streaming system that reduces both storage size and transfer time. By combining delta encoding with information-theoretic (entropy) coding, CacheGen compresses the KV cache by up to 4.3x on top of prior machine-learning-based techniques, with the option to incur no additional quality loss.
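The intuition behind delta encoding of a KV cache can be illustrated with a minimal sketch (this is not CacheGen's actual codec; the function names and the 4-bit quantizer below are hypothetical choices for illustration): store per-token differences along the token axis so most values cluster near zero, then quantize the deltas, which a downstream entropy coder such as arithmetic coding could compress further.

```python
import numpy as np

def delta_encode_kv(kv: np.ndarray) -> np.ndarray:
    """Delta-encode a KV tensor along the token axis.

    kv: shape (num_tokens, hidden_dim), e.g. the keys (or values) of one
    attention layer. Nearby tokens tend to have similar KV vectors, so the
    deltas concentrate near zero and compress better than raw values.
    """
    deltas = np.empty_like(kv)
    deltas[0] = kv[0]                # keep the first token as an anchor
    deltas[1:] = kv[1:] - kv[:-1]    # store per-token differences afterwards
    return deltas

def quantize(x: np.ndarray, num_bits: int = 4) -> tuple[np.ndarray, float]:
    """Uniform quantization to num_bits; returns integer codes and the step size."""
    scale = max(float(np.abs(x).max()) / (2 ** (num_bits - 1) - 1), 1e-8)
    codes = np.round(x / scale).astype(np.int8)
    return codes, scale

# Toy example: a smooth random walk standing in for one layer's keys.
rng = np.random.default_rng(0)
kv = np.cumsum(rng.normal(size=(128, 64)).astype(np.float32), axis=0) * 0.01
codes, scale = quantize(delta_encode_kv(kv), num_bits=4)
print(codes.shape, scale)
```

Because the deltas are small and heavily skewed toward zero, their entropy is much lower than that of the raw KV values, which is what makes the subsequent entropy-coding stage effective.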

Link to a pre-recorded version: