LMCache/LMCache
| Blog | Documentation | Join Slack | Interest Form | Roadmap
🔥 NEW: For enterprise-scale deployment of LMCache and vLLM, please check out vLLM Production Stack. LMCache is also officially supported in llm-d and KServe!
Summary
LMCache is an LLM serving engine extension that reduces time-to-first-token (TTFT) and increases throughput, especially in long-context scenarios. By storing the KV caches of reusable texts across multiple locations (GPU, CPU DRAM, local disk), LMCache reuses the KV cache of any repeated text (not necessarily a prefix) in any serving engine instance, saving precious GPU cycles and reducing user response delay.
By combining LMCache with vLLM, developers have achieved 3-10x reductions in delay and GPU cycles across many LLM use cases, including multi-round QA and RAG.
Features
- 🔥 Integration with vLLM v1 with the following features:
- High performance CPU KVCache offloading
- Disaggregated prefill
- P2P KVCache sharing
- LMCache is supported in the vLLM production stack, llm-d, and KServe
- Stable support for non-prefix KV caches
- Storage support as follows:
- CPU
- Disk
- NIXL
- Installation via pip, compatible with the latest vLLM
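As a concrete sketch of the vLLM v1 integration listed above, vLLM is pointed at LMCache through its `--kv-transfer-config` flag, with CPU offloading controlled by LMCache environment variables. The connector name, option names, and values below are taken from current LMCache/vLLM conventions and should be treated as assumptions to verify against the docs for your installed versions; the model name is only an example.

```shell
# Enable CPU DRAM offloading of KV caches (names per LMCache docs; verify
# against your installed version).
export LMCACHE_LOCAL_CPU=True            # offload KV cache to CPU DRAM
export LMCACHE_MAX_LOCAL_CPU_SIZE=5.0    # CPU cache budget, in GB
export LMCACHE_CHUNK_SIZE=256            # tokens per KV-cache chunk

# Launch vLLM with LMCache as the KV-cache connector (example model).
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'
```

With this configuration, KV caches evicted from GPU memory are kept in CPU DRAM and reloaded on reuse instead of being recomputed.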
Installation
To use LMCache, simply install lmcache from your package manager, e.g. pip:
```shell
pip install lmcache
```
Works on Linux with NVIDIA GPUs.
More detailed installation instructions are available in the docs, particularly if you are not using the latest stable version of vLLM or are using another serving engine with different dependencies. The documentation also covers resolving "undefined symbol" errors and torch version mismatches.
Getting started
The best way to get started is to check out the Quickstart Examples in the docs.
Documentation
Check out the LMCache documentation which is available online.
We also post regularly in LMCache blogs.
Examples
Go hands-on with our examples, demonstrating how to address different use cases with LMCache.
Interested in Connecting?
Fill out the interest form, sign up for our newsletter, join LMCache slack, check out LMCache website, or drop an email, and our team will reach out to you!
Community meeting
The LMCache community meeting is held bi-weekly on Tuesdays at 9:00 AM PT – Add to Calendar. All are welcome to join!
We keep notes from each meeting in this document, summarizing standups, discussions, and action items.
Recordings of meetings are available on the LMCache YouTube channel.
Contributing
We welcome and value all contributions and collaborations. Please check out the Contributing Guide to learn how to contribute.
We continually update the [Onboarding] Welcoming contributors issue with good first issues!
Citation
If you use LMCache for your research, please cite our papers.
License
The LMCache codebase is licensed under Apache License 2.0. See the LICENSE file for details.