Question Answering on Producthunt daily

kotaemon

Wed, 10 Sep 2025 15:28:12 +0800

Cinnamon/kotaemon

kotaemon

An open-source clean & customizable RAG UI for chatting with your documents. Built with both end users and developers in mind.

Introduction

This project serves as a functional RAG UI for both end users who want to do QA on their documents and developers who want to build their own RAG pipeline.

+----------------------------------------------------------------------------+
| End users: Those who use apps built with `kotaemon`.                       |
| (You use an app like the one in the demo above)                            |
|     +----------------------------------------------------------------+     |
|     | Developers: Those who build with `kotaemon`.                   |     |
|     | (You have `import kotaemon` somewhere in your project)         |     |
|     |     +----------------------------------------------------+     |     |
|     |     | Contributors: Those who make `kotaemon` better.    |     |     |
|     |     | (You make PRs to this repo)                        |     |     |
|     |     +----------------------------------------------------+     |     |
|     +----------------------------------------------------------------+     |
+----------------------------------------------------------------------------+

For end users

Clean & Minimalistic UI: A user-friendly interface for RAG-based QA.
Support for Various LLMs: Compatible with LLM API providers (OpenAI, AzureOpenAI, Cohere, etc.) and local LLMs (via ollama and llama-cpp-python).
Easy Installation: Simple scripts to get you started quickly.

For developers

Framework for RAG Pipelines: Tools to build your own RAG-based document QA pipeline.
Customizable UI: See your RAG pipeline in action with the provided UI, built with Gradio.
Gradio Theme: If you use Gradio for development, check out our theme here: kotaemon-gradio-theme.

Key Features

Host your own document QA (RAG) web-UI: Support multi-user login, organize your files in private/public collections, collaborate and share your favorite chat with others.
Organize your LLM & Embedding models: Support both local LLMs & popular API providers (OpenAI, Azure, Ollama, Groq).
Hybrid RAG pipeline: Sane default RAG pipeline with hybrid (full-text & vector) retriever and re-ranking to ensure best retrieval quality.
Multi-modal QA support: Perform Question Answering on multiple documents with figures and tables support. Support multi-modal document parsing (selectable options on UI).
Advanced citations with document preview: By default the system will provide detailed citations to ensure the correctness of LLM answers. View your citations (incl. relevance score) directly in the in-browser PDF viewer with highlights. Warns when the retrieval pipeline returns articles with low relevance.
Support complex reasoning methods: Use question decomposition to answer your complex/multi-hop question. Support agent-based reasoning with ReAct, ReWOO and other agents.
Configurable settings UI: You can adjust most important aspects of retrieval & generation process on the UI (incl. prompts).
Extensible: Being built on Gradio, you are free to customize or add any UI elements as you like. Also, we aim to support multiple strategies for document indexing & retrieval. GraphRAG indexing pipeline is provided as an example.
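
As a rough illustration of what a hybrid (full-text & vector) retriever has to do, the sketch below merges two ranked result lists with reciprocal rank fusion (RRF). RRF is a generic fusion technique picked here for illustration; it is not necessarily how kotaemon combines or re-ranks results, and the function name and document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. `k` dampens the influence of the top ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda d: (-scores[d], d))


full_text_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical keyword results
vector_hits = ["doc_b", "doc_d", "doc_a"]     # hypothetical embedding results
fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
print(fused)
```

Documents found by both retrievers (here `doc_a` and `doc_b`) rise above documents found by only one, which is the practical benefit of combining the two searches.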

Installation

If you are not a developer and just want to use the app, please check out our easy-to-follow User Guide. Download the .zip file from the latest release to get all the newest features and bug fixes.

System requirements

Python >= 3.10
Docker: optional, if you install with Docker
Unstructured if you want to process files other than .pdf, .html, .mhtml, and .xlsx documents. Installation steps differ depending on your operating system. Please visit the link and follow the specific instructions provided there.

With Docker (recommended)

We support both lite & full versions of the Docker images. The full version installs the extra packages of unstructured, which adds support for additional file types (.doc, .docx, …) at the cost of a larger Docker image. The lite image should work well for most users.
- To use the full version.
  
  docker run \
  -e GRADIO_SERVER_NAME=0.0.0.0 \
  -e GRADIO_SERVER_PORT=7860 \
  -v ./ktem_app_data:/app/ktem_app_data \
  -p 7860:7860 -it --rm \
  ghcr.io/cinnamon/kotaemon:main-full
- To use the full version with bundled Ollama for local / private RAG.
  
  # change image name to
  docker run <...> ghcr.io/cinnamon/kotaemon:main-ollama
- To use the lite version.
  
  # change image name to
  docker run <...> ghcr.io/cinnamon/kotaemon:main-lite

We currently support and test two platforms: linux/amd64 and linux/arm64 (for newer Mac). You can specify the platform by passing --platform in the docker run command. For example:

# To run docker with platform linux/arm64
docker run \
-e GRADIO_SERVER_NAME=0.0.0.0 \
-e GRADIO_SERVER_PORT=7860 \
-v ./ktem_app_data:/app/ktem_app_data \
-p 7860:7860 -it --rm \
--platform linux/arm64 \
ghcr.io/cinnamon/kotaemon:main-lite

Once everything is set up correctly, you can go to http://localhost:7860/ to access the WebUI.
We use GHCR to store Docker images; all images can be found here.
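
The image variants and the optional --platform flag can be captured in a small helper. The following is a minimal sketch that only assembles the argument list from the commands shown above; the function name `build_docker_command` and its parameters are hypothetical and not part of kotaemon.

```python
import shlex

IMAGE_REPO = "ghcr.io/cinnamon/kotaemon"


def build_docker_command(variant="lite", platform=None, port=7860):
    """Return the `docker run` argument list for a kotaemon image variant."""
    if variant not in {"lite", "full", "ollama"}:
        raise ValueError(f"unknown image variant: {variant}")
    cmd = [
        "docker", "run",
        "-e", "GRADIO_SERVER_NAME=0.0.0.0",
        "-e", f"GRADIO_SERVER_PORT={port}",
        "-v", "./ktem_app_data:/app/ktem_app_data",
        "-p", f"{port}:{port}",
        "-it", "--rm",
    ]
    if platform is not None:
        cmd += ["--platform", platform]
    # The image tag follows the main-<variant> pattern used above.
    cmd.append(f"{IMAGE_REPO}:main-{variant}")
    return cmd


print(shlex.join(build_docker_command("lite", platform="linux/arm64")))
```

The list form can be passed directly to `subprocess.run` if you prefer to launch the container from a script.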

Without Docker

Clone and install required packages on a fresh python environment.

# optional (setup env)
conda create -n kotaemon python=3.10
conda activate kotaemon

# clone this repo
git clone https://github.com/Cinnamon/kotaemon
cd kotaemon

pip install -e "libs/kotaemon[all]"
pip install -e "libs/ktem"

Create a .env file in the root of this project. Use .env.example as a template

The .env file serves use cases where users want to pre-configure the models before starting up the app (e.g. deploying the app on HF hub). The file is only used to populate the db once, on the first run; it is not used in subsequent runs.
(Optional) To enable in-browser PDF_JS viewer, download PDF_JS_DIST then extract it to libs/ktem/ktem/assets/prebuilt

Start the web server:

python app.py
- The app will be automatically launched in your browser.
- Default username and password are both admin. You can set up additional users directly through the UI.
Check the Resources tab and LLMs and Embeddings and ensure that your api_key value is set correctly from your .env file. If it is not set, you can set it there.

Setup GraphRAG

[!NOTE] Official MS GraphRAG indexing only works with the OpenAI or Ollama API. We recommend that most users use the NanoGraphRAG implementation for straightforward integration with Kotaemon.

Setup Nano GRAPHRAG

Install nano-GraphRAG: pip install nano-graphrag
nano-graphrag install might introduce version conflicts, see this issue
- To quickly fix: pip uninstall hnswlib chroma-hnswlib && pip install chroma-hnswlib
Launch Kotaemon with USE_NANO_GRAPHRAG=true environment variable.
Set your default LLM & Embedding models in Resources setting and it will be recognized automatically from NanoGraphRAG.

Setup LIGHTRAG

Install LightRAG: pip install git+https://github.com/HKUDS/LightRAG.git
LightRAG install might introduce version conflicts, see this issue
- To quickly fix: pip uninstall hnswlib chroma-hnswlib && pip install chroma-hnswlib
Launch Kotaemon with USE_LIGHTRAG=true environment variable.
Set your default LLM & Embedding models in Resources setting and it will be recognized automatically from LightRAG.

Setup MS GRAPHRAG

Non-Docker Installation: If you are not using Docker, install GraphRAG with the following command:

pip install "graphrag<=0.3.6" future
Setting Up API KEY: To use the GraphRAG retriever feature, ensure you set the GRAPHRAG_API_KEY environment variable. You can do this directly in your environment or by adding it to a .env file.
Using Local Models and Custom Settings: If you want to use GraphRAG with local models (like Ollama) or customize the default LLM and other configurations, set the USE_CUSTOMIZED_GRAPHRAG_SETTING environment variable to true. Then, adjust your settings in the settings.yaml.example file.

Setup Local Models (for local/private RAG)

See Local model setup.

Setup multimodal document parsing (OCR, table parsing, figure extraction)

These options are available:

Azure Document Intelligence (API)
Adobe PDF Extract (API)
Docling (local, open-source)
- To use Docling, first install required dependencies: pip install docling

Select corresponding loaders in Settings -> Retrieval Settings -> File loader

Customize your application

By default, all application data is stored in the ./ktem_app_data folder. You can back up or copy this folder to transfer your installation to a new machine.
For advanced users or specific use cases, you can customize these files:
- flowsettings.py
- .env

`flowsettings.py`

This file contains the configuration of your application. You can use the example here as the starting point.

Notable settings

# setup your preferred document store (with full-text search capabilities)
KH_DOCSTORE=(Elasticsearch | LanceDB | SimpleFileDocumentStore)

# setup your preferred vectorstore (for vector-based search)
KH_VECTORSTORE=(ChromaDB | LanceDB | InMemory | Milvus | Qdrant)

# Enable / disable multimodal QA
KH_REASONINGS_USE_MULTIMODAL=True

# Setup your new reasoning pipeline or modify existing one.
KH_REASONINGS = [
    "ktem.reasoning.simple.FullQAPipeline",
    "ktem.reasoning.simple.FullDecomposeQAPipeline",
    "ktem.reasoning.react.ReactAgentPipeline",
    "ktem.reasoning.rewoo.RewooAgentPipeline",
]

`.env`

This file provides another way to configure your models and credentials.

Configure model via the .env file

Alternatively, you can configure the models via the .env file with the information needed to connect to the LLMs. This file is located in the folder of the application. If you don’t see it, you can create one.
Currently, the following providers are supported:
- OpenAI
  
  In the .env file, set the OPENAI_API_KEY variable with your OpenAI API key in order to enable access to OpenAI’s models. There are other variables that can be modified, please feel free to edit them to fit your case. Otherwise, the default parameter should work for most people.
  
  OPENAI_API_BASE=https://api.openai.com/v1
  OPENAI_API_KEY=<your OpenAI API key here>
  OPENAI_CHAT_MODEL=gpt-3.5-turbo
  OPENAI_EMBEDDINGS_MODEL=text-embedding-ada-002
- Azure OpenAI
  
  For OpenAI models via the Azure platform, you need to provide your Azure endpoint and API key. You might also need to provide your deployments’ names for the chat model and the embedding model, depending on how you set up your Azure deployment.
  
  AZURE_OPENAI_ENDPOINT=
  AZURE_OPENAI_API_KEY=
  OPENAI_API_VERSION=2024-02-15-preview
  AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-35-turbo
  AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=text-embedding-ada-002
- Local Models
  - Using ollama OpenAI compatible server:
    - Install ollama and start the application.
    - Pull your model, for example:
      
      ollama pull llama3.1:8b
      ollama pull nomic-embed-text
    - Set the model names on the web UI and make them the default.
  - Using GGUF with llama-cpp-python
    
    You can search for and download an LLM to run locally from the Hugging Face Hub. Currently, these model formats are supported:
    - GGUF
      
      You should choose a model whose size is less than your device’s available memory, leaving about 2 GB free. For example, if you have 16 GB of RAM in total, of which 12 GB is available, then you should choose a model that takes up at most 10 GB of RAM. Bigger models tend to give better generation but also take more processing time.
      
      Here are some recommendations and their size in memory:
    - Qwen1.5-1.8B-Chat-GGUF: around 2 GB
      
      Add a new LlamaCpp model with the provided model name on the web UI.
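
The memory rule of thumb above (keep about 2 GB free) is simple arithmetic. The sketch below spells it out with two hypothetical helper functions; the 2 GB headroom comes from the text, everything else is illustrative.

```python
def max_model_size_gb(available_ram_gb, headroom_gb=2.0):
    """Largest model size (GB) that still leaves `headroom_gb` of RAM free."""
    return max(available_ram_gb - headroom_gb, 0.0)


def fits_in_memory(model_size_gb, available_ram_gb, headroom_gb=2.0):
    """True if the model fits while keeping the headroom free."""
    return model_size_gb <= max_model_size_gb(available_ram_gb, headroom_gb)


print(max_model_size_gb(12))   # 12 GB available -> 10.0
print(fits_in_memory(2, 12))   # a model of around 2 GB -> True
```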

Adding your own RAG pipeline

Custom Reasoning Pipeline

Check the default pipeline implementation in here. You can make quick adjustments to how the default QA pipeline works.
Add a new .py implementation in libs/ktem/ktem/reasoning/ and later include it in flowsettings to enable it on the UI.
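
Registering a new pipeline then amounts to adding its dotted path to KH_REASONINGS. A minimal sketch, assuming a hypothetical module my_pipeline.py that defines a class MyQAPipeline (neither exists in the repository):

```python
# Mirrors the KH_REASONINGS list shown under "Notable settings".
KH_REASONINGS = [
    "ktem.reasoning.simple.FullQAPipeline",
    "ktem.reasoning.simple.FullDecomposeQAPipeline",
    "ktem.reasoning.react.ReactAgentPipeline",
    "ktem.reasoning.rewoo.RewooAgentPipeline",
]

# Hypothetical: assumes you created libs/ktem/ktem/reasoning/my_pipeline.py
# containing a class named MyQAPipeline.
CUSTOM_PIPELINE = "ktem.reasoning.my_pipeline.MyQAPipeline"

# Guard against adding the same entry twice.
if CUSTOM_PIPELINE not in KH_REASONINGS:
    KH_REASONINGS.append(CUSTOM_PIPELINE)

print(KH_REASONINGS[-1])
```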

Custom Indexing Pipeline

Check sample implementation in libs/ktem/ktem/index/file/graph

(more instruction WIP).

Citation

Please cite this project as

@misc{kotaemon2024,
    title = {Kotaemon - An open-source RAG-based tool for chatting with any content.},
    author = {The Kotaemon Team},
    year = {2024},
    howpublished = {\url{https://github.com/Cinnamon/kotaemon}},
}

Contribution

Since our project is actively being developed, we greatly value your feedback and contributions. Please see our Contributing Guide to get started. Thank you to all our contributors!

storm

Tue, 01 Jul 2025 15:32:10 +0800

stanford-oval/storm

STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking

**Latest News** 🔥

[2025/01] We add litellm integration for language models and embedding models in knowledge-storm v1.1.0.
[2024/09] Co-STORM codebase is now released and integrated into knowledge-storm python package v1.0.0. Run pip install knowledge-storm --upgrade to check it out.
[2024/09] We introduce collaborative STORM (Co-STORM) to support human-AI collaborative knowledge curation! Co-STORM Paper has been accepted to EMNLP 2024 main conference.
[2024/07] You can now install our package with pip install knowledge-storm!
[2024/07] We add VectorRM to support grounding on user-provided documents, complementing existing support of search engines (YouRM, BingSearch). (check out #58)
[2024/07] We release demo light for developers, a minimal user interface built with the streamlit framework in Python, handy for local development and demo hosting (check out #54)
[2024/06] We will present STORM at NAACL 2024! Find us at Poster Session 2 on June 17 or check our presentation material.
[2024/05] We add Bing Search support in rm.py. Test STORM with GPT-4o - we now configure the article generation part in our demo using GPT-4o model.
[2024/04] We release refactored version of STORM codebase! We define interface for STORM pipeline and reimplement STORM-wiki (check out src/storm_wiki) to demonstrate how to instantiate the pipeline. We provide API to support customization of different language models and retrieval/search integration.

Overview (Try STORM now!)

STORM is an LLM system that writes Wikipedia-like articles from scratch based on Internet search. Co-STORM extends it by enabling humans to collaborate with the LLM system, supporting more aligned and preferred information seeking and knowledge curation.

While the system cannot produce publication-ready articles that often require a significant number of edits, experienced Wikipedia editors have found it helpful in their pre-writing stage.

More than 70,000 people have tried our live research preview. Try it out to see how STORM can help your knowledge exploration journey and please provide feedback to help us improve the system 🙏!

How STORM & Co-STORM work

STORM

STORM breaks down generating long articles with citations into two steps:

Pre-writing stage: The system conducts Internet-based research to collect references and generates an outline.
Writing stage: The system uses the outline and references to generate the full-length article with citations.

STORM identifies the core of automating the research process as automatically coming up with good questions to ask. Directly prompting the language model to ask questions does not work well. To improve the depth and breadth of the questions, STORM adopts two strategies:

Perspective-Guided Question Asking: Given the input topic, STORM discovers different perspectives by surveying existing articles from similar topics and uses them to control the question-asking process.
Simulated Conversation: STORM simulates a conversation between a Wikipedia writer and a topic expert grounded in Internet sources to enable the language model to update its understanding of the topic and ask follow-up questions.
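
The two strategies above can be sketched as a loop that runs one simulated conversation per perspective. This is only an illustration of the control flow: `ask` and `answer` are hypothetical stand-ins for the language-model and search calls, and the stub functions exist solely so the example runs offline. It is not STORM's actual implementation.

```python
def simulate_conversation(topic, perspective, ask, answer, max_turns=3):
    """Alternate writer questions and expert answers, keeping the history.

    `ask(topic, perspective, history)` returns the next question or None.
    `answer(question)` returns (answer_text, sources).
    """
    history = []
    for _ in range(max_turns):
        question = ask(topic, perspective, history)
        if question is None:  # the writer has nothing more to ask
            break
        answer_text, sources = answer(question)
        history.append(
            {"question": question, "answer": answer_text, "sources": sources}
        )
    return history


# Stub callables so the sketch runs without any model or search engine.
def stub_ask(topic, perspective, history):
    if len(history) >= 2:
        return None
    return f"[{perspective}] question {len(history) + 1} about {topic}"


def stub_answer(question):
    return f"answer to: {question}", ["https://example.com/source"]


perspectives = ["historian", "engineer"]  # hypothetical perspectives
conversations = {
    p: simulate_conversation("solar power", p, stub_ask, stub_answer)
    for p in perspectives
}
print(len(conversations["historian"]))
```

Passing the history back into `ask` is what lets each follow-up question build on earlier answers.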

CO-STORM

Co-STORM proposes a collaborative discourse protocol which implements a turn management policy to support smooth collaboration among:

Co-STORM LLM experts: This type of agent generates answers grounded on external knowledge sources and/or raises follow-up questions based on the discourse history.
Moderator: This agent generates thought-provoking questions inspired by information discovered by the retriever but not directly used in previous turns. Question generation can also be grounded!
Human user: The human user will take the initiative to either (1) observe the discourse to gain deeper understanding of the topic, or (2) actively engage in the conversation by injecting utterances to steer the discussion focus.

Co-STORM also maintains a dynamically updated mind map, which organizes collected information into a hierarchical concept structure, aiming to build a shared conceptual space between the human user and the system. The mind map has been proven to help reduce the mental load when the discourse goes long and in-depth.
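
A hierarchical concept structure of this kind can be pictured as a tree whose nodes hold the collected snippets. The sketch below is a simplified illustration with hypothetical names (`ConceptNode`, `insert`, `outline`); Co-STORM's own knowledge base is more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    name: str
    snippets: list = field(default_factory=list)
    children: dict = field(default_factory=dict)

    def insert(self, path, snippet):
        """Store `snippet` under the concept path, creating nodes as needed."""
        node = self
        for concept in path:
            node = node.children.setdefault(concept, ConceptNode(concept))
        node.snippets.append(snippet)

    def outline(self, depth=0):
        """Return the hierarchy as indented lines with snippet counts."""
        lines = [f"{'  ' * depth}{self.name} ({len(self.snippets)})"]
        for child in self.children.values():
            lines.extend(child.outline(depth + 1))
        return lines


mind_map = ConceptNode("solar power")
mind_map.insert(["history"], "first photovoltaic cell")
mind_map.insert(["technology", "storage"], "battery pairing")
mind_map.insert(["technology", "storage"], "thermal storage")
print("\n".join(mind_map.outline()))
```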

Both STORM and Co-STORM are implemented in a highly modular way using dspy.

Installation

To install the knowledge storm library, use pip install knowledge-storm.

You could also install the source code which allows you to modify the behavior of STORM engine directly.

Clone the git repository.

git clone https://github.com/stanford-oval/storm.git
cd storm

Install the required packages.

conda create -n storm python=3.11
conda activate storm
pip install -r requirements.txt

API

Currently, our package supports:

Language model components: All language models supported by litellm as listed here
Embedding model components: All embedding models supported by litellm as listed here
Retrieval module components: YouRM, BingSearch, VectorRM, SerperRM, BraveRM, SearXNG, DuckDuckGoSearchRM, TavilySearchRM, GoogleSearch, and AzureAISearch

:star2: PRs for integrating more search engines/retrievers into knowledge_storm/rm.py are highly appreciated!

Both STORM and Co-STORM work in the information curation layer, so you need to set up the information retrieval module and the language model module to create their respective Runner classes.

STORM

The STORM knowledge curation engine is defined as a simple Python STORMWikiRunner class. Here is an example of using You.com search engine and OpenAI models.

import os
from knowledge_storm import STORMWikiRunnerArguments, STORMWikiRunner, STORMWikiLMConfigs
from knowledge_storm.lm import LitellmModel
from knowledge_storm.rm import YouRM

lm_configs = STORMWikiLMConfigs()
openai_kwargs = {
    'api_key': os.getenv("OPENAI_API_KEY"),
    'temperature': 1.0,
    'top_p': 0.9,
}
# STORM is a LM system so different components can be powered by different models to reach a good balance between cost and quality.
# For a good practice, choose a cheaper/faster model for `conv_simulator_lm` which is used to split queries, synthesize answers in the conversation.
# Choose a more powerful model for `article_gen_lm` to generate verifiable text with citations.
gpt_35 = LitellmModel(model='gpt-3.5-turbo', max_tokens=500, **openai_kwargs)
gpt_4 = LitellmModel(model='gpt-4o', max_tokens=3000, **openai_kwargs)
lm_configs.set_conv_simulator_lm(gpt_35)
lm_configs.set_question_asker_lm(gpt_35)
lm_configs.set_outline_gen_lm(gpt_4)
lm_configs.set_article_gen_lm(gpt_4)
lm_configs.set_article_polish_lm(gpt_4)
# Check out the STORMWikiRunnerArguments class for more configurations.
engine_args = STORMWikiRunnerArguments(...)
rm = YouRM(ydc_api_key=os.getenv('YDC_API_KEY'), k=engine_args.search_top_k)
runner = STORMWikiRunner(engine_args, lm_configs, rm)

The STORMWikiRunner instance can be invoked with the simple run method:

topic = input('Topic: ')
runner.run(
    topic=topic,
    do_research=True,
    do_generate_outline=True,
    do_generate_article=True,
    do_polish_article=True,
)
runner.post_run()
runner.summary()

do_research: if True, simulate conversations with different perspectives to collect information about the topic; otherwise, load the results.
do_generate_outline: if True, generate an outline for the topic; otherwise, load the results.
do_generate_article: if True, generate an article for the topic based on the outline and the collected information; otherwise, load the results.
do_polish_article: if True, polish the article by adding a summarization section and (optionally) removing duplicate content; otherwise, load the results.
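
Each flag follows the same "run the stage, otherwise load its saved result" pattern. A minimal sketch of that pattern, assuming a hypothetical helper `run_or_load` and a JSON file per stage (the real runner's file layout may differ):

```python
import json
import tempfile
from pathlib import Path


def run_or_load(stage, enabled, compute, output_dir):
    """Run `compute()` and save the result if `enabled`; otherwise load it."""
    path = Path(output_dir) / f"{stage}.json"
    if enabled:
        result = compute()
        path.write_text(json.dumps(result))
        return result
    if not path.exists():
        raise FileNotFoundError(f"no saved result for stage '{stage}'")
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as out:
    # First run: the stage executes and its result is written to disk.
    first = run_or_load("outline", True, lambda: ["Intro", "History"], out)
    # Second run: the stage is skipped and the saved result is reused.
    second = run_or_load("outline", False, lambda: ["never called"], out)

print(first == second)
```

This is why a later stage can be re-run on its own: with an earlier flag set to False, the earlier stage's output is read back instead of being recomputed.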

Co-STORM

The Co-STORM knowledge curation engine is defined as a simple Python CoStormRunner class. Here is an example of using Bing search engine and OpenAI models.

import os

from knowledge_storm.collaborative_storm.engine import CollaborativeStormLMConfigs, RunnerArgument, CoStormRunner
from knowledge_storm.lm import LitellmModel
from knowledge_storm.logging_wrapper import LoggingWrapper
from knowledge_storm.rm import BingSearch

# Co-STORM adopts the same multi LM system paradigm as STORM 
lm_config: CollaborativeStormLMConfigs = CollaborativeStormLMConfigs()
openai_kwargs = {
    "api_key": os.getenv("OPENAI_API_KEY"),
    "api_provider": "openai",
    "temperature": 1.0,
    "top_p": 0.9,
    "api_base": None,
} 
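# Model name shared by every component below; any litellm-supported model name can be used.
gpt_4o_model_name = "gpt-4o"
question_answering_lm = LitellmModel(model=gpt_4o_model_name, max_tokens=1000, **openai_kwargs)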
discourse_manage_lm = LitellmModel(model=gpt_4o_model_name, max_tokens=500, **openai_kwargs)
utterance_polishing_lm = LitellmModel(model=gpt_4o_model_name, max_tokens=2000, **openai_kwargs)
warmstart_outline_gen_lm = LitellmModel(model=gpt_4o_model_name, max_tokens=500, **openai_kwargs)
question_asking_lm = LitellmModel(model=gpt_4o_model_name, max_tokens=300, **openai_kwargs)
knowledge_base_lm = LitellmModel(model=gpt_4o_model_name, max_tokens=1000, **openai_kwargs)

lm_config.set_question_answering_lm(question_answering_lm)
lm_config.set_discourse_manage_lm(discourse_manage_lm)
lm_config.set_utterance_polishing_lm(utterance_polishing_lm)
lm_config.set_warmstart_outline_gen_lm(warmstart_outline_gen_lm)
lm_config.set_question_asking_lm(question_asking_lm)
lm_config.set_knowledge_base_lm(knowledge_base_lm)

# Check out the Co-STORM's RunnerArguments class for more configurations.
topic = input('Topic: ')
runner_argument = RunnerArgument(topic=topic, ...)
logging_wrapper = LoggingWrapper(lm_config)
bing_rm = BingSearch(bing_search_api_key=os.environ.get("BING_SEARCH_API_KEY"),
                     k=runner_argument.retrieve_top_k)
costorm_runner = CoStormRunner(lm_config=lm_config,
                               runner_argument=runner_argument,
                               logging_wrapper=logging_wrapper,
                               rm=bing_rm)

The CoStormRunner instance can be invoked with the warm_start() and step(...) methods.

# Warm start the system to build shared conceptual space between Co-STORM and users
costorm_runner.warm_start()

# Step through the collaborative discourse 
# Run either of the code snippets below in any order, as many times as you'd like
# To observe the conversation:
conv_turn = costorm_runner.step()
# To inject your utterance to actively steer the conversation:
costorm_runner.step(user_utterance="YOUR UTTERANCE HERE")

# Generate report based on the collaborative discourse
costorm_runner.knowledge_base.reorganize()
article = costorm_runner.generate_report()
print(article)

Quick Start with Example Scripts

We provide scripts in our examples folder as a quick start to run STORM and Co-STORM with different configurations.

We suggest using secrets.toml to set up the API keys. Create a file secrets.toml under the root directory and add the following content:

# ============ language model configurations ============ 
# Set up OpenAI API key.
OPENAI_API_KEY="your_openai_api_key"
# If you are using the API service provided by OpenAI, include the following line:
OPENAI_API_TYPE="openai"
# If you are using the API service provided by Microsoft Azure, include the following lines:
OPENAI_API_TYPE="azure"
AZURE_API_BASE="your_azure_api_base_url"
AZURE_API_VERSION="your_azure_api_version"
# ============ retriever configurations ============ 
BING_SEARCH_API_KEY="your_bing_search_api_key" # if using bing search
# ============ encoder configurations ============ 
ENCODER_API_TYPE="openai" # if using openai encoder

STORM examples

To run STORM with gpt family models with default configurations:

Run the following command.

python examples/storm_examples/run_storm_wiki_gpt.py \
    --output-dir $OUTPUT_DIR \
    --retriever bing \
    --do-research \
    --do-generate-outline \
    --do-generate-article \
    --do-polish-article

To run STORM using your favorite language models or grounding on your own corpus: Check out examples/storm_examples/README.md.

Co-STORM examples

To run Co-STORM with gpt family models with default configurations,

Add BING_SEARCH_API_KEY="xxx" and ENCODER_API_TYPE="xxx" to secrets.toml
Run the following command

python examples/costorm_examples/run_costorm_gpt.py \
    --output-dir $OUTPUT_DIR \
    --retriever bing

Customization of the Pipeline

STORM

If you have installed the source code, you can customize STORM based on your own use case. STORM engine consists of 4 modules:

Knowledge Curation Module: Collects a broad coverage of information about the given topic.
Outline Generation Module: Organizes the collected information by generating a hierarchical outline for the curated knowledge.
Article Generation Module: Populates the generated outline with the collected information.
Article Polishing Module: Refines and enhances the written article for better presentation.

The interface for each module is defined in knowledge_storm/interface.py, while their implementations are instantiated in knowledge_storm/storm_wiki/modules/*. These modules can be customized according to your specific requirements (e.g., generating sections in bullet point format instead of full paragraphs).
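
The bullet-point customization mentioned above comes down to swapping one implementation behind a shared interface. The sketch below illustrates that idea with hypothetical classes (`SectionWriter`, `ParagraphWriter`, `BulletWriter`); it does not reproduce the actual classes in knowledge_storm/interface.py.

```python
from abc import ABC, abstractmethod


class SectionWriter(ABC):
    """Hypothetical interface: turn collected facts into one article section."""

    @abstractmethod
    def write(self, heading, facts):
        ...


class ParagraphWriter(SectionWriter):
    def write(self, heading, facts):
        return f"# {heading}\n" + " ".join(facts)


class BulletWriter(SectionWriter):
    def write(self, heading, facts):
        return f"# {heading}\n" + "\n".join(f"- {fact}" for fact in facts)


def generate_article(outline, facts_by_heading, writer):
    """Populate each outline heading using whichever writer is plugged in."""
    return "\n\n".join(
        writer.write(h, facts_by_heading.get(h, [])) for h in outline
    )


outline = ["History", "Technology"]
facts = {"History": ["Fact one.", "Fact two."], "Technology": ["Fact three."]}
print(generate_article(outline, facts, BulletWriter()))
```

Because `generate_article` depends only on the interface, switching from paragraphs to bullets requires no change to the rest of the pipeline.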

Co-STORM

If you have installed the source code, you can customize Co-STORM based on your own use case:

Co-STORM introduces multiple LLM agent types (i.e. Co-STORM experts and Moderator). The LLM agent interface is defined in knowledge_storm/interface.py, while its implementation is instantiated in knowledge_storm/collaborative_storm/modules/co_storm_agents.py. Different LLM agent policies can be customized.
Co-STORM introduces a collaborative discourse protocol, with its core function centered on turn policy management. We provide an example implementation of turn policy management through DiscourseManager in knowledge_storm/collaborative_storm/engine.py. It can be customized and further improved.
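
To make "turn policy management" concrete, here is a deliberately simplified policy: a pending user utterance always takes the turn, the moderator steps in after a fixed number of consecutive expert turns, and an expert speaks otherwise. The rule and the names `next_speaker` and `run_discourse` are assumptions made for illustration, not the logic of the actual DiscourseManager.

```python
def next_speaker(user_utterance, turns_since_moderator, moderator_every=3):
    """Pick who takes the next turn in the discourse."""
    if user_utterance:
        return "user"
    if turns_since_moderator >= moderator_every:
        return "moderator"
    return "expert"


def run_discourse(user_inputs):
    """Replay a list of per-step user inputs (None means just observe)."""
    speakers, since_moderator = [], 0
    for utterance in user_inputs:
        speaker = next_speaker(utterance, since_moderator)
        speakers.append(speaker)
        # Expert turns increment the counter; any other speaker resets it.
        since_moderator = since_moderator + 1 if speaker == "expert" else 0
    return speakers


print(run_discourse([None, None, None, None, "What about cost?", None]))
```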

Datasets

To facilitate the study of automatic knowledge curation and complex information seeking, our project releases the following datasets:

FreshWiki

The FreshWiki Dataset is a collection of 100 high-quality Wikipedia articles focusing on the most-edited pages from February 2022 to September 2023. See Section 2.1 in STORM paper for more details.

You can download the dataset from huggingface directly. To ease the data contamination issue, we archive the source code for the data construction pipeline that can be repeated at future dates.

WildSeek

To study users’ interests in complex information seeking tasks in the wild, we utilized data collected from the web research preview to create the WildSeek dataset. We downsampled the data to ensure the diversity of the topics and the quality of the data. Each data point is a pair comprising a topic and the user’s goal for conducting deep search on the topic. For more details, please refer to Section 2.2 and Appendix A of Co-STORM paper.

The WildSeek dataset is available here.

Replicate STORM & Co-STORM paper result

For STORM paper experiments, please switch to the branch NAACL-2024-code-backup here.

For Co-STORM paper experiments, please switch to the branch EMNLP-2024-code-backup (placeholder for now, will be updated soon).

Roadmap & Contributions

Our team is actively working on:

Human-in-the-Loop Functionalities: Supporting user participation in the knowledge curation process.
Information Abstraction: Developing abstractions for curated information to support presentation formats beyond the Wikipedia-style report.

If you have any questions or suggestions, please feel free to open an issue or pull request. We welcome contributions to improve the system and the codebase!

Contact person: Yijia Shao and Yucheng Jiang

Acknowledgement

We would like to thank Wikipedia for its excellent open-source content. The FreshWiki dataset is sourced from Wikipedia, licensed under the Creative Commons Attribution-ShareAlike (CC BY-SA) license.

We are very grateful to Michelle Lam for designing the logo for this project and Dekun Ma for leading the UI development.

Thanks to Vercel for their support of open-source software

Citation

Please cite our paper if you use this code or part of it in your work:

@inproceedings{jiang-etal-2024-unknown,
    title = "Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations",
    author = "Jiang, Yucheng  and
      Shao, Yijia  and
      Ma, Dekun  and
      Semnani, Sina  and
      Lam, Monica",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.554/",
    doi = "10.18653/v1/2024.emnlp-main.554",
    pages = "9917--9955",
}

@inproceedings{shao-etal-2024-assisting,
    title = "Assisting in Writing {W}ikipedia-like Articles From Scratch with Large Language Models",
    author = "Shao, Yijia  and
      Jiang, Yucheng  and
      Kanell, Theodore  and
      Xu, Peter  and
      Khattab, Omar  and
      Lam, Monica",
    editor = "Duh, Kevin  and
      Gomez, Helena  and
      Bethard, Steven",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.347/",
    doi = "10.18653/v1/2024.naacl-long.347",
    pages = "6252--6278",
}

ragflow

Thu, 19 Jun 2025 15:29:58 +0800

infiniflow/ragflow

📕 Table of Contents

💡 What is RAGFlow?
🎮 Demo
📌 Latest Updates
🌟 Key Features
🔎 System Architecture
🎬 Get Started
🔧 Configurations
🔧 Build a docker image without embedding models
🔧 Build a docker image including embedding models
🔨 Launch service from source for development
📚 Documentation
📜 Roadmap
🏄 Community
🙌 Contributing

💡 What is RAGFlow?

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.

🎮 Demo

Try our demo at https://demo.ragflow.io.

🔥 Latest Updates

2025-05-23 Adds a Python/JavaScript code executor component to Agent.
2025-05-05 Supports cross-language query.
2025-03-19 Supports using a multi-modal model to make sense of images within PDF or DOCX files.
2025-02-28 Combined with Internet search (Tavily), supports reasoning like Deep Research for any LLMs.
2024-12-18 Upgrades Document Layout Analysis model in DeepDoc.
2024-08-22 Support text to SQL statements through RAG.

🎉 Stay Tuned

⭐️ Star our repository to stay up-to-date with exciting new features and improvements! Get instant notifications for new releases! 🌟

🌟 Key Features

🍭 “Quality in, quality out”

Deep document understanding-based knowledge extraction from unstructured data with complicated formats.
Finds “needle in a data haystack” of literally unlimited tokens.

🍱 Template-based chunking

Intelligent and explainable.
Plenty of template options to choose from.

🌱 Grounded citations with reduced hallucinations

Visualization of text chunking to allow human intervention.
Quick view of the key references and traceable citations to support grounded answers.

🍔 Compatibility with heterogeneous data sources

Supports Word, slides, Excel, txt, images, scanned copies, structured data, web pages, and more.

🛀 Automated and effortless RAG workflow

Streamlined RAG orchestration that caters to both personal use and large businesses.
Configurable LLMs as well as embedding models.
Multiple recall paired with fused re-ranking.
Intuitive APIs for seamless integration with business.

🔎 System Architecture

🎬 Get Started

📝 Prerequisites

CPU >= 4 cores
RAM >= 16 GB
Disk >= 50 GB
Docker >= 24.0.0 & Docker Compose >= v2.26.1
gVisor: Required only if you intend to use the code executor (sandbox) feature of RAGFlow.

[!TIP] If you have not installed Docker on your local machine (Windows, Mac, or Linux), see Install Docker Engine.

🚀 Start up the server

Ensure vm.max_map_count >= 262144:

To check the value of vm.max_map_count:

$ sysctl vm.max_map_count

Reset vm.max_map_count to a value of at least 262144 if it is lower:

# In this case, we set it to 262144:
$ sudo sysctl -w vm.max_map_count=262144

This change will be reset after a system reboot. To make it permanent, add or update the vm.max_map_count value in /etc/sysctl.conf accordingly:

vm.max_map_count=262144
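
The check above can also be scripted. The following is a minimal sketch, assuming the usual `vm.max_map_count = <number>` output format of `sysctl`; the function names are hypothetical and not part of RAGFlow.

```python
REQUIRED_MAX_MAP_COUNT = 262144  # minimum value named in the step above


def parse_max_map_count(sysctl_output: str) -> int:
    """Parse a line such as 'vm.max_map_count = 65530' into an int."""
    _, _, value = sysctl_output.partition("=")
    return int(value.strip())


def is_max_map_count_ok(sysctl_output: str) -> bool:
    """Return True when the reported value meets the required minimum."""
    return parse_max_map_count(sysctl_output) >= REQUIRED_MAX_MAP_COUNT
```

To use it, pass the text printed by `sysctl vm.max_map_count` (for example, captured with `subprocess.run`) to `is_max_map_count_ok`.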

Clone the repo:

$ git clone https://github.com/infiniflow/ragflow.git

Start up the server using the pre-built Docker images:

[!CAUTION] All Docker images are built for x86 platforms. We don’t currently offer Docker images for ARM64. If you are on an ARM64 platform, follow this guide to build a Docker image compatible with your system.

The command below downloads the v0.19.1-slim edition of the RAGFlow Docker image. See the following table for descriptions of different RAGFlow editions. To download a RAGFlow edition different from v0.19.1-slim, update the RAGFLOW_IMAGE variable accordingly in docker/.env before using docker compose to start the server. For example: set RAGFLOW_IMAGE=infiniflow/ragflow:v0.19.1 for the full edition v0.19.1.

$ cd ragflow/docker
# Use CPU for embedding and DeepDoc tasks:
$ docker compose -f docker-compose.yml up -d

# To use GPU to accelerate embedding and DeepDoc tasks:
# docker compose -f docker-compose-gpu.yml up -d

RAGFlow image tag	Image size (GB)	Has embedding models?	Stable?
v0.19.1	≈9	✔️	Stable release
v0.19.1-slim	≈2	❌	Stable release
nightly	≈9	✔️	Unstable nightly build
nightly-slim	≈2	❌	Unstable nightly build
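
Switching editions means changing one assignment in docker/.env. The sketch below shows one way to do that edit programmatically; `set_env_var` is a hypothetical helper and the sample file content is made up for illustration, so treat it as an assumption rather than RAGFlow tooling.

```python
def set_env_var(env_text: str, key: str, value: str) -> str:
    """Return env_text with `key` set to `value`.

    An existing uncommented assignment is replaced in place; if there is
    none, a new assignment is appended. Commented lines are left alone.
    """
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith(f"{key}="):
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Hypothetical excerpt of docker/.env, for illustration only.
sample_env = (
    "# RAGFLOW_IMAGE=infiniflow/ragflow:nightly\n"
    "RAGFLOW_IMAGE=infiniflow/ragflow:v0.19.1-slim\n"
)
updated_env = set_env_var(sample_env, "RAGFLOW_IMAGE", "infiniflow/ragflow:v0.19.1")
```

In practice you would read docker/.env, pass its text through the helper, and write the result back before running docker compose.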

Check the server status after having the server up and running:

$ docker logs -f ragflow-server

The following output confirms a successful launch of the system:


      ____   ___    ______ ______ __
     / __ \ /   |  / ____// ____// /____  _      __
    / /_/ // /| | / / __ / /_   / // __ \| | /| / /
   / _, _// ___ |/ /_/ // __/  / // /_/ /| |/ |/ /
  /_/ |_|/_/  |_|\____//_/    /_/ \____/ |__/|__/

 * Running on all addresses (0.0.0.0)

If you skip this confirmation step and log in to RAGFlow directly, your browser may show a "network anormal" error because RAGFlow may not be fully initialized yet.
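
If you prefer to script the wait instead of watching the logs, a simple polling loop is one option. This is a rough sketch under the assumption that a successful HTTP response from the web port is an acceptable proxy for readiness; the log banner above remains the confirmation the README describes. The helper names are hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(probe, attempts: int = 30, delay: float = 2.0,
                     sleep=time.sleep) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_ready(lambda: http_probe("http://IP_OF_YOUR_MACHINE"))` returns once the server responds or after the attempts run out.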

In your web browser, enter the IP address of your server and log in to RAGFlow.

With the default settings, you only need to enter http://IP_OF_YOUR_MACHINE (without a port number), because the default HTTP serving port 80 can be omitted.

In service_conf.yaml.template, select the desired LLM factory in user_default_llm and update the API_KEY field with the corresponding API key.

See llm_api_key_setup for more information.

The show is on!

🔧 Configurations

When it comes to system configurations, you will need to manage the following files:

.env: Keeps the fundamental setups for the system, such as SVR_HTTP_PORT, MYSQL_PASSWORD, and MINIO_PASSWORD.
service_conf.yaml.template: Configures the back-end services. The environment variables in this file will be automatically populated when the Docker container starts. Any environment variables set within the Docker container will be available for use, allowing you to customize service behavior based on the deployment environment.
docker-compose.yml: The system relies on docker-compose.yml to start up.

The ./docker/README file provides a detailed description of the environment settings and service configurations which can be used as ${ENV_VARS} in the service_conf.yaml.template file.
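
To make the `${ENV_VARS}` idea concrete, here is a small illustration of placeholder substitution using Python's `string.Template`. It only demonstrates the concept: the mechanism RAGFlow actually uses when the container starts may differ, and the template fragment below is hypothetical (only the variable names MYSQL_PASSWORD and MINIO_PASSWORD come from the text above).

```python
from string import Template

# Hypothetical template fragment; the real file has many more settings.
TEMPLATE_TEXT = """\
mysql:
  password: '${MYSQL_PASSWORD}'
minio:
  password: '${MINIO_PASSWORD}'
"""


def render(template_text: str, env: dict) -> str:
    """Fill ${NAME} placeholders from `env`.

    substitute() raises KeyError when a referenced variable is missing,
    which surfaces configuration gaps early.
    """
    return Template(template_text).substitute(env)


rendered = render(
    TEMPLATE_TEXT,
    {"MYSQL_PASSWORD": "example-mysql", "MINIO_PASSWORD": "example-minio"},
)
```

Passing `os.environ` as `env` would fill the placeholders from the current process environment.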

To update the default HTTP serving port (80), go to docker-compose.yml and change 80:80 to <YOUR_SERVING_PORT>:80.

Updates to the above configurations require a reboot of all containers to take effect:

$ docker compose -f docker-compose.yml up -d

Switch doc engine from Elasticsearch to Infinity

RAGFlow uses Elasticsearch by default for storing full text and vectors. To switch to Infinity, follow these steps:

Stop all running containers:

$ docker compose -f docker/docker-compose.yml down -v

[!WARNING] -v will delete the docker container volumes, and the existing data will be cleared.

Set DOC_ENGINE in docker/.env to infinity.

Start the containers:

$ docker compose -f docker-compose.yml up -d

[!WARNING] Switching to Infinity on a Linux/arm64 machine is not yet officially supported.

🔧 Build a Docker image without embedding models

This image is approximately 2 GB in size and relies on external LLM and embedding services.

git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
docker build --platform linux/amd64 --build-arg LIGHTEN=1 -f Dockerfile -t infiniflow/ragflow:nightly-slim .

🔧 Build a Docker image including embedding models

This image is approximately 9 GB in size. As it includes embedding models, it relies on external LLM services only.

git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
docker build --platform linux/amd64 -f Dockerfile -t infiniflow/ragflow:nightly .

🔨 Launch service from source for development

Install uv and pre-commit, or skip this step if they are already installed:

pipx install uv pre-commit

Clone the source code and install Python dependencies:

git clone https://github.com/infiniflow/ragflow.git
cd ragflow/
uv sync --python 3.10 --all-extras # install RAGFlow dependent python modules
uv run download_deps.py
pre-commit install

Launch the dependent services (MinIO, Elasticsearch, Redis, and MySQL) using Docker Compose:

docker compose -f docker/docker-compose-base.yml up -d

Add the following line to /etc/hosts to resolve all hosts specified in docker/.env to 127.0.0.1:

127.0.0.1 es01 infinity mysql minio redis sandbox-executor-manager

If you cannot access HuggingFace, set the HF_ENDPOINT environment variable to use a mirror site:

export HF_ENDPOINT=https://hf-mirror.com

If your operating system does not have jemalloc, please install it as follows:

# ubuntu
sudo apt-get install libjemalloc-dev
# centos
sudo yum install jemalloc

Launch backend service:

source .venv/bin/activate
export PYTHONPATH=$(pwd)
bash docker/launch_backend_service.sh

Install frontend dependencies:

cd web
npm install

Launch frontend service:

npm run dev

Stop RAGFlow front-end and back-end service after development is complete:

pkill -f "ragflow_server.py|task_executor.py"

📚 Documentation

📜 Roadmap

See the RAGFlow Roadmap 2025

🏄 Community

🙌 Contributing

RAGFlow flourishes via open-source collaboration. In this spirit, we embrace diverse contributions from the community. If you would like to take part, review our Contribution Guidelines first.

all-rag-techniques

Mon, 16 Jun 2025 15:31:49 +0800

FareedKhan-dev/all-rag-techniques

All RAG Techniques: A Simpler, Hands-On Approach ✨

This repository takes a clear, hands-on approach to Retrieval-Augmented Generation (RAG), breaking down advanced techniques into straightforward, understandable implementations. Instead of relying on frameworks and libraries like LangChain or FAISS, everything here is built using familiar Python libraries: openai, numpy, matplotlib, and a few others.

The goal is simple: provide code that is readable, modifiable, and educational. By focusing on the fundamentals, this project helps demystify RAG and makes it easier to understand how it really works.

Update: 📢

(12-May-2025) Added a new notebook on how to handle big data using Knowledge Graphs.
(27-April-2025) Added a new notebook that finds the best RAG technique for a given query (Simple RAG + Reranker + Query Rewrite).
(20-Mar-2025) Added a new notebook on RAG with Reinforcement Learning.
(07-Mar-2025) Added 20 RAG techniques to the repository.

🚀 What’s Inside?

This repository contains a collection of Jupyter Notebooks, each focusing on a specific RAG technique. Each notebook provides:

A concise explanation of the technique.
A step-by-step implementation from scratch.
Clear code examples with inline comments.
Evaluations and comparisons to demonstrate the technique’s effectiveness.
Visualizations of the results.

Here’s a glimpse of the techniques covered:

Notebook	Description
1. Simple RAG	A basic RAG implementation. A great starting point!
2. Semantic Chunking	Splits text based on semantic similarity for more meaningful chunks.
3. Chunk Size Selector	Explores the impact of different chunk sizes on retrieval performance.
4. Context Enriched RAG	Retrieves neighboring chunks to provide more context.
5. Contextual Chunk Headers	Prepends descriptive headers to each chunk before embedding.
6. Document Augmentation RAG	Generates questions from text chunks to augment the retrieval process.
7. Query Transform	Rewrites, expands, or decomposes queries to improve retrieval. Includes Step-back Prompting and Sub-query Decomposition.
8. Reranker	Re-ranks initially retrieved results using an LLM for better relevance.
9. RSE	Relevant Segment Extraction: Identifies and reconstructs continuous segments of text, preserving context.
10. Contextual Compression	Implements contextual compression to filter and compress retrieved chunks, maximizing relevant information.
11. Feedback Loop RAG	Incorporates user feedback to learn and improve the RAG system over time.
12. Adaptive RAG	Dynamically selects the best retrieval strategy based on query type.
13. Self RAG	Implements Self-RAG, which dynamically decides when and how to retrieve, evaluates relevance, and assesses support and utility.
14. Proposition Chunking	Breaks down documents into atomic, factual statements for precise retrieval.
15. Multimodel RAG	Combines text and images for retrieval, generating captions for images using LLaVA.
16. Fusion RAG	Combines vector search with keyword-based (BM25) retrieval for improved results.
17. Graph RAG	Organizes knowledge as a graph, enabling traversal of related concepts.
18. Hierarchy RAG	Builds hierarchical indices (summaries + detailed chunks) for efficient retrieval.
19. HyDE RAG	Uses Hypothetical Document Embeddings to improve semantic matching.
20. CRAG	Corrective RAG: Dynamically evaluates retrieval quality and uses web search as a fallback.
21. RAG with RL	Maximizes the reward of the RAG model using Reinforcement Learning.
Best RAG Finder	Finds the best RAG technique for a given query using Simple RAG + Reranker + Query Rewrite.
22. Big Data with Knowledge Graphs	Handles large datasets using Knowledge Graphs.
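
As an illustration of one row in the table, the Fusion RAG idea (combining vector scores with keyword scores) can be sketched as follows. This is a minimal sketch, not the notebook's code: it assumes min-max normalization and a weighted sum, which is one common way to fuse scores, and all function names and the `alpha` parameter are hypothetical.

```python
def min_max_normalize(scores):
    """Scale scores to [0, 1]; a constant list maps to all zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(vector_scores, keyword_scores, alpha=0.5):
    """Weighted sum of normalized scores; alpha weights the vector side."""
    if len(vector_scores) != len(keyword_scores):
        raise ValueError("both score lists must cover the same chunks")
    v = min_max_normalize(vector_scores)
    k = min_max_normalize(keyword_scores)
    return [alpha * a + (1 - alpha) * b for a, b in zip(v, k)]


def rank(scores):
    """Return chunk indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Made-up scores for three chunks: cosine similarities and BM25-style scores.
fused = fuse_scores([0.9, 0.2, 0.5], [1.0, 8.0, 4.0], alpha=0.7)
order = rank(fused)
```

Normalizing first matters because cosine similarities and BM25 scores live on different scales; without it, the larger-valued side would dominate the sum.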

🗂️ Repository Structure

fareedkhan-dev-all-rag-techniques/
├── README.md                          <- You are here!
├── 01_simple_rag.ipynb
├── 02_semantic_chunking.ipynb
├── 03_chunk_size_selector.ipynb
├── 04_context_enriched_rag.ipynb
├── 05_contextual_chunk_headers_rag.ipynb
├── 06_doc_augmentation_rag.ipynb
├── 07_query_transform.ipynb
├── 08_reranker.ipynb
├── 09_rse.ipynb
├── 10_contextual_compression.ipynb
├── 11_feedback_loop_rag.ipynb
├── 12_adaptive_rag.ipynb
├── 13_self_rag.ipynb
├── 14_proposition_chunking.ipynb
├── 15_multimodel_rag.ipynb
├── 16_fusion_rag.ipynb
├── 17_graph_rag.ipynb
├── 18_hierarchy_rag.ipynb
├── 19_HyDE_rag.ipynb
├── 20_crag.ipynb
├── 21_rag_with_rl.ipynb
├── 22_big_data_with_KG.ipynb
├── best_rag_finder.ipynb
├── requirements.txt                   <- Python dependencies
└── data/
    ├── val.json                       <- Sample validation data (queries and answers)
    ├── AI_Information.pdf             <- A sample PDF document for testing.
    └── attention_is_all_you_need.pdf  <- A sample PDF document for testing (for Multi-Modal RAG).

🛠️ Getting Started

Clone the repository:

git clone https://github.com/FareedKhan-dev/all-rag-techniques.git
cd all-rag-techniques

Install dependencies:

pip install -r requirements.txt

Set up your API key:

Obtain an API key from Nebius AI.

Set the API key as the OPENAI_API_KEY environment variable:

export OPENAI_API_KEY='YOUR_NEBIUS_AI_API_KEY'

On Windows:

setx OPENAI_API_KEY "YOUR_NEBIUS_AI_API_KEY"

or, within your Python script/notebook:

import os
os.environ["OPENAI_API_KEY"] = "YOUR_NEBIUS_AI_API_KEY"
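
Whichever way you set the variable, it can help to fail early with a clear message when it is missing. A minimal sketch, with a hypothetical helper name:

```python
import os


def get_api_key(env=os.environ, name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the notebooks.")
    return key
```

Calling `get_api_key()` at the top of a notebook surfaces a missing key before any API request is made.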

Run the notebooks:

Open any of the Jupyter Notebooks (.ipynb files) using Jupyter Notebook or JupyterLab. Each notebook is self-contained and can be run independently of the others. Within a notebook, run the cells in order from top to bottom.

Note: The data/AI_Information.pdf file provides a sample document for testing. You can replace it with your own PDF. The data/val.json file contains sample queries and ideal answers for evaluation. The data/attention_is_all_you_need.pdf file is for testing the Multi-Modal RAG notebook.

💡 Core Concepts

Embeddings: Numerical representations of text that capture semantic meaning. We use Nebius AI’s embedding API and, in many notebooks, also the BAAI/bge-en-icl embedding model.
Vector Store: A simple database to store and search embeddings. We create our own SimpleVectorStore class using NumPy for efficient similarity calculations.
Cosine Similarity: A measure of similarity between two vectors. Higher values indicate greater similarity.
Chunking: Dividing text into smaller, manageable pieces. We explore various chunking strategies.
Retrieval: The process of finding the most relevant text chunks for a given query.
Generation: Using a Large Language Model (LLM) to create a response based on the retrieved context and the user’s query. We use the meta-llama/Llama-3.2-3B-Instruct model via Nebius AI’s API.
Evaluation: Assessing the quality of the RAG system’s responses, often by comparing them to a reference answer or using an LLM to score relevance.
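
The embedding, vector store, cosine similarity, and retrieval concepts above fit together in a few lines. The sketch below is an illustration rather than the repository's SimpleVectorStore: it uses plain Python lists instead of NumPy to stay dependency-free, the class name `TinyVectorStore` is hypothetical, and the two-dimensional vectors are made-up stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """In-memory store of (text, vector) pairs with brute-force search."""

    def __init__(self):
        self._items = []

    def add(self, text, vector):
        self._items.append((text, list(vector)))

    def search(self, query_vector, k=3):
        """Return the k most similar items as (score, text), best first."""
        scored = [
            (cosine_similarity(query_vector, vec), text)
            for text, vec in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add("cats", [1.0, 0.0])
store.add("dogs", [0.9, 0.1])
store.add("finance", [0.0, 1.0])
top = store.search([1.0, 0.0], k=2)
```

Retrieval is then a call to `search` with the query's embedding; the returned texts become the context passed to the LLM in the generation step.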

🤝 Contributing

Contributions are welcome!