yichuan-w/LEANN
LEANN: A Low-Storage Vector Index
Real-time embedding computation for large-scale RAG on consumer hardware
Quick Start • Features • Benchmarks • Documentation • Paper
What is Leann?
Leann targets the storage bottleneck of traditional vector databases in Retrieval-Augmented Generation (RAG). Instead of pre-computing and storing billions of embeddings, Leann persists only a graph structure and computes the embeddings it needs at query time, during graph-based search.
Why Leann?
Traditional RAG systems run into three recurring problems:
- Storage: Storing embeddings for millions of documents requires massive disk space
- Freshness: Pre-computed embeddings become stale when documents change
- Cost: Vector databases are expensive to scale
Leann solves this by:
- Zero embedding storage - Only graph structure is persisted
- Real-time computation - Embeddings computed on-demand with ms latency
- Memory efficient - Runs on consumer hardware (8GB RAM)
- Always fresh - No stale embeddings, ever
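The idea of persisting only the graph and computing embeddings on demand can be sketched in a few lines. This is a toy illustration, not Leann's implementation: `embed` is a stand-in for a real embedding model, and `texts`, `graph`, and `search` are hypothetical names chosen for this example.

```python
import heapq
import math

# Toy corpus and a hand-built neighbor graph (node id -> neighbor ids).
# Only `texts` and `graph` are kept around; no embeddings are stored.
texts = ["apple pie", "apple tart", "banana bread", "car engine", "car tyre"]
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}

def embed(text):
    """Toy stand-in for an embedding model: normalised letter counts."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(query, entry=0, k=2, beam=3):
    """Best-first graph search that embeds a node only when it is reached."""
    q = embed(query)
    computed = {}  # node id -> distance to the query, filled lazily

    def dist(node):
        if node not in computed:
            computed[node] = l2(q, embed(texts[node]))  # computed on demand
        return computed[node]

    frontier = [(dist(entry), entry)]
    visited = {entry}
    results = []  # best `beam` (distance, node) pairs seen so far, sorted
    while frontier:
        d, node = heapq.heappop(frontier)
        # Stop once the closest unexplored node is worse than every kept result.
        if len(results) >= beam and d > results[-1][0]:
            break
        results = sorted(results + [(d, node)])[:beam]
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (dist(nb), nb))
    return [texts[n] for _, n in results[:k]], len(computed)

hits, n_embedded = search("apple pie", k=1, beam=2)
print(hits, n_embedded)
```

In this run the search stops after touching nodes 0, 1, and 2, so the two "car" documents are never embedded. The second return value makes that saving visible.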
Quick Start
Installation
30-Second Example
Run the Demo
PDF RAG Demo (using LlamaIndex for document parsing and Leann for indexing/search)
This demo showcases how to build a RAG system for PDF documents using Leann.
- Place your PDF files (and other supported formats like .docx, .pptx, .xlsx) into the `examples/data/` directory.
- Ensure you have an `OPENAI_API_KEY` set in your environment variables or in a `.env` file for the LLM to function.
Features
Core Features
- Multiple Distance Functions: L2, Cosine, MIPS (Maximum Inner Product Search)
- Pluggable Backends: DiskANN, HNSW/FAISS with unified API
- Real-time Embeddings: Dynamic computation using optimized ZMQ servers
- Scalable Architecture: Handles millions of documents on consumer hardware
- Graph Pruning: Advanced techniques for memory-efficient search
Technical Highlights
- Zero-copy operations for maximum performance
- SIMD-optimized distance computations (AVX2/AVX512)
- Async embedding pipeline with batched processing
- Memory-mapped indices for fast startup
- Recompute mode for highest accuracy scenarios
Developer Experience
- Simple Python API - Get started in minutes
- Extensible backend system - Easy to add new algorithms
- Comprehensive examples - From basic usage to production deployment
- Rich debugging tools - Built-in performance profiling
Benchmarks
Memory Usage Comparison
| System | 1M Documents | 10M Documents | 100M Documents |
|---|---|---|---|
| Traditional Vector DB | 3.1 GB | 31 GB | 310 GB |
| Leann | 180 MB | 1.2 GB | 8.4 GB |
| Reduction | 94.2% | 96.1% | 97.3% |
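The figures in this table can be sanity-checked with a little arithmetic. The sketch below assumes 768-dimensional float32 embeddings (the output size of all-mpnet-base-v2) and decimal units (1 GB = 1000 MB). Both are assumptions made here, not details stated with the benchmark.

```python
def raw_embedding_gb(num_docs, dim=768, bytes_per_value=4):
    """Storage for dense embeddings alone, in decimal GB (assumed layout)."""
    return num_docs * dim * bytes_per_value / 1e9

def reduction_pct(baseline_gb, leann_gb):
    """Percentage of storage saved relative to the baseline."""
    return round(100 * (1 - leann_gb / baseline_gb), 1)

# (baseline, Leann) sizes copied from the table, converted to GB.
table = {"1M": (3.1, 0.18), "10M": (31, 1.2), "100M": (310, 8.4)}
for label, (baseline, leann) in table.items():
    print(label, reduction_pct(baseline, leann))  # 94.2, 96.1, 97.3

print(round(raw_embedding_gb(1_000_000), 2))  # 3.07
```

Under those assumptions, one million raw embeddings take about 3.07 GB, which is in line with the 3.1 GB baseline, and the reduction row follows directly from the two rows above it.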
Query Performance
| Backend | Index Size | Query Time | Recall@10 |
|---|---|---|---|
| DiskANN | 1M docs | 12ms | 0.95 |
| DiskANN + Recompute | 1M docs | 145ms | 0.98 |
| HNSW | 1M docs | 8ms | 0.93 |
Benchmarks run on AMD Ryzen 7 with 32GB RAM
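Recall@10 is commonly defined as the fraction of the true 10 nearest neighbors that appear among the 10 results an index returns. The sketch below follows that common definition. The exact evaluation protocol behind the table is not given here, and `recall_at_k` is a name chosen for this example.

```python
def recall_at_k(retrieved, ground_truth, k=10):
    """Fraction of the true top-k ids found among the first k retrieved ids."""
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    return len(truth & set(retrieved[:k])) / len(truth)

exact = [3, 7, 1, 9, 4, 12, 8, 15, 2, 6]    # ids from an exact (brute-force) search
approx = [3, 7, 1, 9, 4, 12, 8, 15, 2, 30]  # ids from an approximate index
print(recall_at_k(approx, exact))  # 9 of the 10 true neighbors found -> 0.9
```

A reported value such as 0.95 would then be this quantity averaged over many queries.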
Architecture
Key Components
- Embedding Engine: Real-time transformer inference with caching
- Graph Index: Memory-efficient navigation structures
- Search Coordinator: Orchestrates embedding + graph search
- Backend Adapters: Pluggable algorithm implementations
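The "inference with caching" idea behind the Embedding Engine can be illustrated with a small memoised wrapper. This is a sketch only: `fake_model_embed` is a hypothetical stand-in for transformer inference, and Python's `functools.lru_cache` is used for brevity, not because Leann is known to use it.

```python
from functools import lru_cache

calls = {"model": 0}  # counts how often the (expensive) model is invoked

def fake_model_embed(text):
    """Hypothetical stand-in for a transformer forward pass."""
    calls["model"] += 1
    return tuple(float(ord(c)) for c in text[:4])

@lru_cache(maxsize=1024)
def cached_embed(text):
    """Return a cached embedding, running the model only on a cache miss."""
    return fake_model_embed(text)

for t in ["graph", "index", "graph", "graph"]:
    cached_embed(t)

print(calls["model"])                  # 2: only "graph" and "index" reached the model
print(cached_embed.cache_info().hits)  # 2: the repeated "graph" lookups
```

Graph search tends to revisit well-connected nodes across queries, which is where a cache of this kind would pay off.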
Supported Models & Backends
Embedding Models
- sentence-transformers/all-mpnet-base-v2 (default)
- sentence-transformers/all-MiniLM-L6-v2 (lightweight)
- Any HuggingFace sentence-transformer model
- Custom model support via API
Search Backends
- DiskANN: Microsoft's billion-scale ANN algorithm
- HNSW: Hierarchical Navigable Small World graphs
- Coming soon: ScaNN, Faiss-IVF, NGT
Distance Functions
- L2: Euclidean distance for precise similarity
- Cosine: Angular similarity for normalized vectors
- MIPS: Maximum Inner Product Search for recommendation systems
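The three metrics can rank the same candidates differently when vectors are not normalized. The following is a plain-Python sketch of the textbook definitions, not the SIMD-optimized code paths mentioned above. The function names are chosen for this example.

```python
import math

def l2_distance(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def inner_product(a, b):
    """Raw dot product, the score maximised by MIPS: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

query = [1.0, 0.0]
short = [1.0, 0.0]  # same direction as the query, norm 1
long_ = [3.0, 4.0]  # different direction, norm 5

print(l2_distance(query, short), l2_distance(query, long_))
print(cosine_similarity(query, short), cosine_similarity(query, long_))
print(inner_product(query, short), inner_product(query, long_))
```

Here L2 and cosine both prefer `short`, while the inner product prefers `long_` because of its larger norm. For unit-length vectors the three metrics produce the same ranking, since the squared L2 distance equals 2 minus twice the inner product.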
Paper
If you find Leann useful, please cite:
LEANN: A Low-Storage Vector Index
Use Cases
Enterprise RAG
Research & Experimentation
Real-time Applications
Contributing
We welcome contributions! Leann is built by the community, for the community.
Ways to Contribute
- Bug Reports: Found an issue? Let us know!
- Feature Requests: Have an idea? We'd love to hear it!
- Code Contributions: PRs welcome for all skill levels
- Documentation: Help make Leann more accessible
- Benchmarks: Share your performance results
Development Setup
Quick Tests
Roadmap
Q1 2024
- DiskANN backend with MIPS/L2/Cosine support
- HNSW backend integration
- Real-time embedding pipeline
- Memory-efficient graph pruning
Q2 2024
- Distributed search across multiple nodes
- ScaNN backend support
- Advanced caching strategies
- Kubernetes deployment guides
Q3 2024
- GPU-accelerated embedding computation
- Approximate distance functions
- Integration with LangChain/LlamaIndex
- Visual similarity search
Community
Join our growing community of researchers and engineers!
- Twitter: @LeannAI
- Discord: Join our server
- Email: leann@yourcompany.com
- GitHub Discussions: Ask questions here
License
MIT License - see LICENSE for details.
Acknowledgments
- Microsoft Research for the DiskANN algorithm
- Meta AI for FAISS and optimization insights
- HuggingFace for the transformer ecosystem
- Our amazing contributors who make this possible
Star us on GitHub if Leann is useful for your research or applications!
Made with ❤️ by the Leann team