<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Reinforcement-Learning on Producthunt daily</title>
        <link>https://producthunt.programnotes.cn/en/tags/reinforcement-learning/</link>
        <description>Recent content in Reinforcement-Learning on Producthunt daily</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Tue, 23 Sep 2025 15:28:18 +0800</lastBuildDate><atom:link href="https://producthunt.programnotes.cn/en/tags/reinforcement-learning/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>DeepResearch</title>
        <link>https://producthunt.programnotes.cn/en/p/deepresearch/</link>
        <pubDate>Tue, 23 Sep 2025 15:28:18 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/deepresearch/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1722929184854-ab6210e4aa29?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTg2MTI0NzN8&amp;ixlib=rb-4.1.0" alt="Featured image of post DeepResearch" /&gt;&lt;h1 id=&#34;alibaba-nlpdeepresearch&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Alibaba-NLP/DeepResearch&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Alibaba-NLP/DeepResearch&lt;/a&gt;
&lt;/h1&gt;&lt;div align=&#34;center&#34;&gt;
  &lt;picture&gt;
      &lt;img src=&#34;./assets/logo.png&#34; width=&#34;100%&#34;&gt;
  &lt;/picture&gt;
&lt;/div&gt;
&lt;hr&gt;
&lt;div align=&#34;center&#34; style=&#34;line-height: 1;&#34;&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/Alibaba-NLP/Tongyi-DeepResearch-30B-A3B&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Models-5EDDD2?style=for-the-badge&amp;amp;logo=huggingface&amp;amp;logoColor=ffffff&amp;amp;labelColor&#34; loading=&#34;lazy&#34; alt=&#34;MODELS&#34;&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/Alibaba-NLP/DeepResearch&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Github-24292F?style=for-the-badge&amp;amp;logo=github&amp;amp;logoColor=white&#34; loading=&#34;lazy&#34; alt=&#34;GITHUB&#34;&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Blog-4285F4?style=for-the-badge&amp;amp;logo=google-chrome&amp;amp;logoColor=white&#34; loading=&#34;lazy&#34; alt=&#34;Blog&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p align=&#34;center&#34;&gt;
🤗 &lt;a href=&#34;https://huggingface.co/Alibaba-NLP/Tongyi-DeepResearch-30B-A3B&#34; target=&#34;_blank&#34;&gt;HuggingFace&lt;/a&gt; ｜
&lt;img src=&#34;./assets/tongyi.png&#34; width=&#34;14px&#34; style=&#34;display:inline;&#34;&gt; &lt;a href=&#34;https://modelscope.cn/models/iic/Tongyi-DeepResearch-30B-A3B&#34; target=&#34;_blank&#34;&gt;ModelScope&lt;/a&gt; ｜ 💬 &lt;a href=&#34;./assets/wechat.jpg&#34;&gt;WeChat&lt;/a&gt;
&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
&lt;a href=&#34;https://trendshift.io/repositories/14895&#34; target=&#34;_blank&#34;&gt;&lt;img src=&#34;https://trendshift.io/api/badge/repositories/14895&#34; alt=&#34;Alibaba-NLP/DeepResearch | Trendshift&#34; style=&#34;width: 250px; height: 55px;&#34; width=&#34;250&#34; height=&#34;55&#34;/&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;h1 id=&#34;introduction&#34;&gt;Introduction
&lt;/h1&gt;&lt;p&gt;We present &lt;img src=&#34;./assets/tongyi.png&#34; width=&#34;14px&#34; style=&#34;display:inline;&#34;&gt; &lt;strong&gt;Tongyi DeepResearch&lt;/strong&gt;, an agentic large language model featuring 30.5 billion total parameters, with only 3.3 billion activated per token. Developed by Tongyi Lab, the model is specifically designed for &lt;strong&gt;long-horizon, deep information-seeking&lt;/strong&gt; tasks. Tongyi DeepResearch demonstrates state-of-the-art performance across a range of agentic search benchmarks, including Humanity&amp;rsquo;s Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES, and SimpleQA.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Tongyi DeepResearch builds upon our previous work on the &lt;img src=&#34;./assets/tongyi.png&#34; width=&#34;14px&#34; style=&#34;display:inline;&#34;&gt; &lt;a class=&#34;link&#34; href=&#34;./WebAgent/&#34; &gt;WebAgent&lt;/a&gt; project.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;More details can be found in our 📰 &lt;a href=&#34;https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/&#34;&gt;Tech Blog&lt;/a&gt;.&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
  &lt;img width=&#34;100%&#34; src=&#34;./assets/performance.png&#34;&gt;
&lt;/p&gt;
&lt;h2 id=&#34;features&#34;&gt;Features
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;⚙️ &lt;strong&gt;Fully automated synthetic data generation pipeline&lt;/strong&gt;: We design a highly scalable data synthesis pipeline, which is fully automatic and empowers agentic pre-training, supervised fine-tuning, and reinforcement learning.&lt;/li&gt;
&lt;li&gt;🔄 &lt;strong&gt;Large-scale continual pre-training on agentic data&lt;/strong&gt;: Leveraging diverse, high-quality agentic interaction data to extend model capabilities, maintain freshness, and strengthen reasoning performance.&lt;/li&gt;
&lt;li&gt;🔁 &lt;strong&gt;End-to-end reinforcement learning&lt;/strong&gt;: We employ a strictly on-policy RL approach based on a customized Group Relative Policy Optimization framework, with token-level policy gradients, leave-one-out advantage estimation, and selective filtering of negative samples to stabilize training in a non‑stationary environment.&lt;/li&gt;
&lt;li&gt;🤖 &lt;strong&gt;Agent Inference Paradigm Compatibility&lt;/strong&gt;: At inference, Tongyi DeepResearch is compatible with two inference paradigms: ReAct, for rigorously evaluating the model&amp;rsquo;s core intrinsic abilities, and an IterResearch-based &amp;lsquo;Heavy&amp;rsquo; mode, which uses a test-time scaling strategy to unlock the model&amp;rsquo;s maximum performance ceiling.&lt;/li&gt;
&lt;/ul&gt;
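The leave-one-out advantage estimation mentioned above can be sketched in a few lines. This follows the standard RLOO-style formulation (each rollout's reward minus the mean reward of the other rollouts in its group); the exact Tongyi DeepResearch implementation is not published here, so treat it as an illustration:

```python
# Sketch of leave-one-out advantage estimation (standard RLOO-style
# formulation; the actual Tongyi DeepResearch code may differ).
# Each of G rollouts for the same prompt receives a scalar reward; the
# advantage of rollout i is its reward minus the mean reward of the
# other G-1 rollouts, so the baseline never includes the sample itself.

def leave_one_out_advantages(rewards: list[float]) -> list[float]:
    g = len(rewards)
    assert g > 1, "need at least two rollouts per prompt"
    total = sum(rewards)
    # baseline for rollout i = mean of the other rollouts' rewards
    return [r - (total - r) / (g - 1) for r in rewards]
```

For example, group rewards `[1.0, 0.0, 1.0]` yield advantages `[0.5, -1.0, 0.5]`; the advantages always sum to zero, which keeps the gradient estimate unbiased with respect to the group baseline.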
&lt;h1 id=&#34;model-download&#34;&gt;Model Download
&lt;/h1&gt;&lt;p&gt;You can directly download the model by following the links below.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;Model&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;Download Links&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;Model Size&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;Context Length&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;Tongyi-DeepResearch-30B-A3B&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/Alibaba-NLP/Tongyi-DeepResearch-30B-A3B&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🤗 HuggingFace&lt;/a&gt;&lt;br&gt; &lt;a class=&#34;link&#34; href=&#34;https://modelscope.cn/models/iic/Tongyi-DeepResearch-30B-A3B&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🤖 ModelScope&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;30B-A3B&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;128K&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h1 id=&#34;news&#34;&gt;News
&lt;/h1&gt;&lt;p&gt;[2025/09/20]🚀 Tongyi-DeepResearch-30B-A3B is now on &lt;a class=&#34;link&#34; href=&#34;https://openrouter.ai/alibaba/tongyi-deepresearch-30b-a3b&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;OpenRouter&lt;/a&gt;! Follow the &lt;a class=&#34;link&#34; href=&#34;https://github.com/Alibaba-NLP/DeepResearch?tab=readme-ov-file#6-you-can-use-openrouters-api-to-call-our-model&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Quick-start&lt;/a&gt; guide.&lt;/p&gt;
&lt;p&gt;[2025/09/17]🔥 We have released &lt;strong&gt;Tongyi-DeepResearch-30B-A3B&lt;/strong&gt;.&lt;/p&gt;
&lt;h1 id=&#34;deep-research-benchmark-results&#34;&gt;Deep Research Benchmark Results
&lt;/h1&gt;&lt;p align=&#34;center&#34;&gt;
  &lt;img width=&#34;100%&#34; src=&#34;./assets/benchmark.png&#34;&gt;
&lt;/p&gt;
&lt;h2 id=&#34;quick-start&#34;&gt;Quick Start
&lt;/h2&gt;&lt;p&gt;This guide provides instructions for setting up the environment and running inference scripts located in the &lt;a class=&#34;link&#34; href=&#34;./inference/&#34; &gt;inference&lt;/a&gt; folder.&lt;/p&gt;
&lt;h3 id=&#34;1-environment-setup&#34;&gt;1. Environment Setup
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Recommended Python version: &lt;strong&gt;3.10.0&lt;/strong&gt; (using other versions may cause dependency issues).&lt;/li&gt;
&lt;li&gt;It is strongly advised to create an isolated environment using &lt;code&gt;conda&lt;/code&gt; or &lt;code&gt;virtualenv&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Example with Conda&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;conda create -n react_infer_env &lt;span class=&#34;nv&#34;&gt;python&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;3.10.0
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;conda activate react_infer_env
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;2-installation&#34;&gt;2. Installation
&lt;/h3&gt;&lt;p&gt;Install the required dependencies:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install -r requirements.txt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;3-environment-configuration-and-prepare-evaluation-data&#34;&gt;3. Configure the Environment and Prepare Evaluation Data
&lt;/h3&gt;&lt;h4 id=&#34;environment-configuration&#34;&gt;Environment Configuration
&lt;/h4&gt;&lt;p&gt;Configure your API keys and settings by copying the example environment file:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Copy the example environment file&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cp .env.example .env
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Edit the &lt;code&gt;.env&lt;/code&gt; file and provide your actual API keys and configuration values:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;SERPER_KEY_ID&lt;/strong&gt;: Get your key from &lt;a class=&#34;link&#34; href=&#34;https://serper.dev/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Serper.dev&lt;/a&gt; for web search and Google Scholar&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;JINA_API_KEYS&lt;/strong&gt;: Get your key from &lt;a class=&#34;link&#34; href=&#34;https://jina.ai/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Jina.ai&lt;/a&gt; for web page reading&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;API_KEY/API_BASE&lt;/strong&gt;: OpenAI-compatible API for page summarization from &lt;a class=&#34;link&#34; href=&#34;https://platform.openai.com/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DASHSCOPE_API_KEY&lt;/strong&gt;: Get your key from &lt;a class=&#34;link&#34; href=&#34;https://dashscope.aliyun.com/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Dashscope&lt;/a&gt; for file parsing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SANDBOX_FUSION_ENDPOINT&lt;/strong&gt;: Python interpreter sandbox endpoints (see &lt;a class=&#34;link&#34; href=&#34;https://github.com/bytedance/SandboxFusion&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;SandboxFusion&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MODEL_PATH&lt;/strong&gt;: Path to your model weights&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DATASET&lt;/strong&gt;: Name of your evaluation dataset&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OUTPUT_PATH&lt;/strong&gt;: Directory for saving results&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The &lt;code&gt;.env&lt;/code&gt; file is gitignored, so your secrets will not be committed to the repository.&lt;/p&gt;
&lt;/blockquote&gt;
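As a concrete sketch, a filled-in `.env` might look like the following. The key names come from the list above, but every value is a placeholder and the exact layout is an assumption; `.env.example` in the repository is the authoritative template:

```shell
# Hypothetical .env sketch — key names from the docs above, placeholder values.
SERPER_KEY_ID=your-serper-key
JINA_API_KEYS=your-jina-key
API_KEY=your-openai-compatible-key
API_BASE=https://api.openai.com/v1
DASHSCOPE_API_KEY=your-dashscope-key
SANDBOX_FUSION_ENDPOINT=http://localhost:8080
MODEL_PATH=/path/to/Tongyi-DeepResearch-30B-A3B
DATASET=my_questions
OUTPUT_PATH=./outputs
```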
&lt;h4 id=&#34;prepare-evaluation-data&#34;&gt;Prepare Evaluation Data
&lt;/h4&gt;&lt;p&gt;The system supports two input file formats: &lt;strong&gt;JSON&lt;/strong&gt; and &lt;strong&gt;JSONL&lt;/strong&gt;.&lt;/p&gt;
&lt;h4 id=&#34;supported-file-formats&#34;&gt;Supported File Formats:
&lt;/h4&gt;&lt;p&gt;&lt;strong&gt;Option 1: JSONL Format (recommended)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create your data file with &lt;code&gt;.jsonl&lt;/code&gt; extension (e.g., &lt;code&gt;my_questions.jsonl&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Each line must be a valid JSON object with &lt;code&gt;question&lt;/code&gt; and &lt;code&gt;answer&lt;/code&gt; keys:
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;#34;question&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;What is the capital of France?&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;nt&#34;&gt;&amp;#34;answer&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Paris&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;#34;question&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Explain quantum computing&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;nt&#34;&gt;&amp;#34;answer&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Option 2: JSON Format&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create your data file with &lt;code&gt;.json&lt;/code&gt; extension (e.g., &lt;code&gt;my_questions.json&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;File must contain a JSON array of objects, each with &lt;code&gt;question&lt;/code&gt; and &lt;code&gt;answer&lt;/code&gt; keys:
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;#34;question&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;What is the capital of France?&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;nt&#34;&gt;&amp;#34;answer&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Paris&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;nt&#34;&gt;&amp;#34;question&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Explain quantum computing&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;nt&#34;&gt;&amp;#34;answer&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Important Note:&lt;/strong&gt; The &lt;code&gt;answer&lt;/code&gt; field contains the &lt;strong&gt;ground truth/reference answer&lt;/strong&gt; used for evaluation. The system generates its own responses to the questions, and these reference answers are used to automatically judge the quality of the generated responses during benchmark evaluation.&lt;/p&gt;
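As a sanity check before launching an evaluation run, a small loader can verify both supported formats. This is a minimal sketch, assuming only the documented schema; the repository's own data loading may do more:

```python
import json

def load_eval_records(path: str) -> list[dict]:
    """Load evaluation data in either supported format (.json or .jsonl)
    and verify every record has 'question' and 'answer' keys.
    A minimal sketch; the repo's own loader may differ."""
    with open(path, encoding="utf-8") as f:
        if path.endswith(".jsonl"):
            # JSONL: one JSON object per non-empty line
            records = [json.loads(line) for line in f if line.strip()]
        else:
            # JSON: a single array of objects
            records = json.load(f)
    for i, rec in enumerate(records):
        missing = {"question", "answer"} - rec.keys()
        if missing:
            raise ValueError(f"record {i} is missing keys: {missing}")
    return records
```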
&lt;h4 id=&#34;file-references-for-document-processing&#34;&gt;File References for Document Processing:
&lt;/h4&gt;&lt;ul&gt;
&lt;li&gt;If using the &lt;em&gt;file parser&lt;/em&gt; tool, &lt;strong&gt;prepend the filename to the &lt;code&gt;question&lt;/code&gt; field&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Place referenced files in &lt;code&gt;eval_data/file_corpus/&lt;/code&gt; directory&lt;/li&gt;
&lt;li&gt;Example: &lt;code&gt;{&amp;quot;question&amp;quot;: &amp;quot;report.pdf What are the key findings?&amp;quot;, &amp;quot;answer&amp;quot;: &amp;quot;...&amp;quot;}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
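Putting the convention above together, a record that routes `report.pdf` through the file parser can be generated like this (the `file_question` helper is hypothetical; only the filename-prefix convention comes from the docs above):

```python
import json

def file_question(filename: str, question: str, answer: str = "") -> str:
    """Build one JSONL record that references a document in
    eval_data/file_corpus/ by prepending its filename to the question,
    per the convention above. Helper name is hypothetical."""
    return json.dumps({"question": f"{filename} {question}", "answer": answer})
```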
&lt;h4 id=&#34;file-organization&#34;&gt;File Organization:
&lt;/h4&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;project_root/
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── eval_data/
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;│   ├── my_questions.jsonl          # Your evaluation data
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;│   └── file_corpus/                # Referenced documents
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;│       ├── report.pdf
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;│       └── data.xlsx
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;4-configure-the-inference-script&#34;&gt;4. Configure the Inference Script
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Open &lt;code&gt;run_react_infer.sh&lt;/code&gt; and modify the following variables as instructed in the comments:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;MODEL_PATH&lt;/code&gt;  - path to the local or remote model weights.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DATASET&lt;/code&gt;     - full path to your evaluation file, e.g. &lt;code&gt;eval_data/my_questions.jsonl&lt;/code&gt; or &lt;code&gt;/path/to/my_questions.json&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;OUTPUT_PATH&lt;/code&gt; - path for saving the prediction results, e.g. &lt;code&gt;./outputs&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Depending on the tools you enable (retrieval, calculator, web search, etc.), provide the required &lt;code&gt;API_KEY&lt;/code&gt;, &lt;code&gt;BASE_URL&lt;/code&gt;, or other credentials. Each key is explained inline in the bash script.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;5-run-the-inference-script&#34;&gt;5. Run the Inference Script
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;bash run_react_infer.sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;hr&gt;
&lt;p&gt;With these steps, you can fully prepare the environment, configure the dataset, and run the model. For more details, consult the inline comments in each script or open an issue.&lt;/p&gt;
&lt;h3 id=&#34;6-you-can-use-openrouters-api-to-call-our-model&#34;&gt;6. You can use OpenRouter&amp;rsquo;s API to call our model
&lt;/h3&gt;&lt;p&gt;Tongyi-DeepResearch-30B-A3B is now available at &lt;a class=&#34;link&#34; href=&#34;https://openrouter.ai/alibaba/tongyi-deepresearch-30b-a3b&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;OpenRouter&lt;/a&gt;. You can run inference without any GPUs.&lt;/p&gt;
&lt;p&gt;You need to modify the following in the file &lt;a class=&#34;link&#34; href=&#34;https://github.com/Alibaba-NLP/DeepResearch/blob/main/inference/react_agent.py&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;inference/react_agent.py&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In the &lt;code&gt;call_server&lt;/code&gt; function, set the API key and base URL to those of your OpenRouter account.&lt;/li&gt;
&lt;li&gt;Change the model name to &lt;code&gt;alibaba/tongyi-deepresearch-30b-a3b&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Adjust how message content is concatenated, as described in the comments on lines &lt;strong&gt;88–90&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
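Sketched with Python's standard library, the modified call might build an OpenAI-compatible request against OpenRouter as below. The endpoint URL and payload shape follow OpenRouter's standard chat-completions API, and `build_request` is an illustrative helper, not the repository's actual `call_server`:

```python
import json
import urllib.request

# Standard OpenRouter chat-completions endpoint (OpenAI-compatible).
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "alibaba/tongyi-deepresearch-30b-a3b"

def build_request(messages: list[dict], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request to OpenRouter.
    An illustrative sketch of the change described above."""
    payload = {"model": MODEL, "messages": messages}
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending is then a one-liner:
# with urllib.request.urlopen(build_request(msgs, key)) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

Separating request construction from sending keeps the OpenRouter-specific details (model slug, auth header) in one place, which mirrors the three bullet edits listed above.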
&lt;h2 id=&#34;benchmark-evaluation&#34;&gt;Benchmark Evaluation
&lt;/h2&gt;&lt;p&gt;We provide benchmark evaluation scripts for various datasets. Please refer to the &lt;a class=&#34;link&#34; href=&#34;./evaluation/&#34; &gt;evaluation scripts&lt;/a&gt; directory for more details.&lt;/p&gt;
&lt;h2 id=&#34;deep-research-agent-family&#34;&gt;Deep Research Agent Family
&lt;/h2&gt;&lt;p align=&#34;center&#34;&gt;
  &lt;img width=&#34;100%&#34; src=&#34;./assets/family.png&#34;&gt;
&lt;/p&gt;
&lt;p&gt;Tongyi DeepResearch also has an extensive deep research agent family. You can find more information in the following papers:&lt;/p&gt;
&lt;p&gt;[1] &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/pdf/2501.07572&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;WebWalker: Benchmarking LLMs in Web Traversal&lt;/a&gt; (ACL 2025)&lt;br&gt;
[2] &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/pdf/2505.22648&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;WebDancer: Towards Autonomous Information Seeking Agency&lt;/a&gt; (NeurIPS 2025)&lt;br&gt;
[3] &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/pdf/2507.02592&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;WebSailor: Navigating Super-human Reasoning for Web Agent&lt;/a&gt;&lt;br&gt;
[4] &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/pdf/2507.15061&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization&lt;/a&gt;&lt;br&gt;
[5] &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/pdf/2508.05748&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent&lt;/a&gt;&lt;br&gt;
[6] &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/pdf/2509.13309&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents&lt;/a&gt;&lt;br&gt;
[7] &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/pdf/2509.13313&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization&lt;/a&gt;&lt;br&gt;
[8] &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/pdf/2509.13312&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research&lt;/a&gt;&lt;br&gt;
[9] &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/pdf/2509.13305&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning&lt;/a&gt;&lt;br&gt;
[10] &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/pdf/2509.13310&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Scaling Agents via Continual Pre-training&lt;/a&gt;&lt;br&gt;
[11] &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/pdf/2509.13311&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Towards General Agentic Intelligence via Environment Scaling&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;-misc&#34;&gt;🌟 Misc
&lt;/h2&gt;&lt;div align=&#34;center&#34;&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.star-history.com/#Alibaba-NLP/DeepResearch&amp;amp;Date&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://api.star-history.com/svg?repos=Alibaba-NLP/DeepResearch&amp;amp;type=Date&#34; loading=&#34;lazy&#34; alt=&#34;Star History Chart&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id=&#34;-talent-recruitment&#34;&gt;🚩 Talent Recruitment
&lt;/h2&gt;&lt;p&gt;🔥🔥🔥 We are hiring! Research intern positions are open (based in Hangzhou, Beijing, or Shanghai).&lt;/p&gt;
&lt;p&gt;📚 &lt;strong&gt;Research Areas&lt;/strong&gt;: Web Agent, Search Agent, Agent RL, Multi-Agent RL, Agentic RAG&lt;/p&gt;
&lt;p&gt;☎️ &lt;strong&gt;Contact&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;mailto:yongjiang.jy@alibaba-inc.com&#34; &gt;yongjiang.jy@alibaba-inc.com&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;contact-information&#34;&gt;Contact Information
&lt;/h2&gt;&lt;p&gt;For communications, please contact Yong Jiang (&lt;a class=&#34;link&#34; href=&#34;mailto:yongjiang.jy@alibaba-inc.com&#34; &gt;yongjiang.jy@alibaba-inc.com&lt;/a&gt;).&lt;/p&gt;
&lt;h2 id=&#34;citation&#34;&gt;Citation
&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bibtex&#34; data-lang=&#34;bibtex&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nc&#34;&gt;@misc&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;nl&#34;&gt;tongyidr&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;author&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;{Tongyi DeepResearch Team}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;title&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;{Tongyi-DeepResearch}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;year&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;{2025}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;howpublished&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;{\url{https://github.com/Alibaba-NLP/DeepResearch}}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
        </item>
        <item>
        <title>ML-From-Scratch</title>
        <link>https://producthunt.programnotes.cn/en/p/ml-from-scratch/</link>
        <pubDate>Fri, 05 Sep 2025 15:27:51 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/ml-from-scratch/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1588017571031-356e08526b59?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTcwNTcxODl8&amp;ixlib=rb-4.1.0" alt="Featured image of post ML-From-Scratch" /&gt;&lt;h1 id=&#34;eriklindernorenml-from-scratch&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/eriklindernoren/ML-From-Scratch&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;eriklindernoren/ML-From-Scratch&lt;/a&gt;
&lt;/h1&gt;&lt;h1 id=&#34;machine-learning-from-scratch&#34;&gt;Machine Learning From Scratch
&lt;/h1&gt;&lt;h2 id=&#34;about&#34;&gt;About
&lt;/h2&gt;&lt;p&gt;Python implementations of some of the fundamental Machine Learning models and algorithms from scratch.&lt;/p&gt;
&lt;p&gt;The purpose of this project is not to produce algorithms that are as optimized and computationally efficient as possible,
but rather to present their inner workings in a transparent and accessible way.&lt;/p&gt;
&lt;h2 id=&#34;table-of-contents&#34;&gt;Table of Contents
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#machine-learning-from-scratch&#34; &gt;Machine Learning From Scratch&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#about&#34; &gt;About&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#table-of-contents&#34; &gt;Table of Contents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#installation&#34; &gt;Installation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#examples&#34; &gt;Examples&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#polynomial-regression&#34; &gt;Polynomial Regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#classification-with-cnn&#34; &gt;Classification With CNN&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#density-based-clustering&#34; &gt;Density-Based Clustering&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#generating-handwritten-digits&#34; &gt;Generating Handwritten Digits&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#deep-reinforcement-learning&#34; &gt;Deep Reinforcement Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#image-reconstruction-with-rbm&#34; &gt;Image Reconstruction With RBM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#evolutionary-evolved-neural-network&#34; &gt;Evolutionary Evolved Neural Network&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#genetic-algorithm&#34; &gt;Genetic Algorithm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#association-analysis&#34; &gt;Association Analysis&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#implementations&#34; &gt;Implementations&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#supervised-learning&#34; &gt;Supervised Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#unsupervised-learning&#34; &gt;Unsupervised Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#reinforcement-learning&#34; &gt;Reinforcement Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#deep-learning&#34; &gt;Deep Learning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#contact&#34; &gt;Contact&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;installation&#34;&gt;Installation
&lt;/h2&gt;&lt;pre&gt;&lt;code&gt;$ git clone https://github.com/eriklindernoren/ML-From-Scratch
$ cd ML-From-Scratch
$ python setup.py install
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;examples&#34;&gt;Examples
&lt;/h2&gt;&lt;h3 id=&#34;polynomial-regression&#34;&gt;Polynomial Regression
&lt;/h3&gt;&lt;pre&gt;&lt;code&gt;$ python mlfromscratch/examples/polynomial_regression.py
&lt;/code&gt;&lt;/pre&gt;
&lt;p align=&#34;center&#34;&gt;
    &lt;img src=&#34;http://eriklindernoren.se/images/p_reg.gif&#34; width=&#34;640&#34;&gt;
&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
    Figure: Training progress of a regularized polynomial regression model fitting &lt;br&gt;
    temperature data measured in Linköping, Sweden 2016.
&lt;/p&gt;
&lt;h3 id=&#34;classification-with-cnn&#34;&gt;Classification With CNN
&lt;/h3&gt;&lt;pre&gt;&lt;code&gt;$ python mlfromscratch/examples/convolutional_neural_network.py

+---------+
| ConvNet |
+---------+
Input Shape: (1, 8, 8)
+----------------------+------------+--------------+
| Layer Type           | Parameters | Output Shape |
+----------------------+------------+--------------+
| Conv2D               | 160        | (16, 8, 8)   |
| Activation (ReLU)    | 0          | (16, 8, 8)   |
| Dropout              | 0          | (16, 8, 8)   |
| BatchNormalization   | 2048       | (16, 8, 8)   |
| Conv2D               | 4640       | (32, 8, 8)   |
| Activation (ReLU)    | 0          | (32, 8, 8)   |
| Dropout              | 0          | (32, 8, 8)   |
| BatchNormalization   | 4096       | (32, 8, 8)   |
| Flatten              | 0          | (2048,)      |
| Dense                | 524544     | (256,)       |
| Activation (ReLU)    | 0          | (256,)       |
| Dropout              | 0          | (256,)       |
| BatchNormalization   | 512        | (256,)       |
| Dense                | 2570       | (10,)        |
| Activation (Softmax) | 0          | (10,)        |
+----------------------+------------+--------------+
Total Parameters: 538570

Training: 100% [------------------------------------------------------------------------] Time: 0:01:55
Accuracy: 0.987465181058
&lt;/code&gt;&lt;/pre&gt;
&lt;p align=&#34;center&#34;&gt;
    &lt;img src=&#34;http://eriklindernoren.se/images/mlfs_cnn1.png&#34; width=&#34;640&#34;&gt;
&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
    Figure: Classification of the digit dataset using CNN.
&lt;/p&gt;
&lt;h3 id=&#34;density-based-clustering&#34;&gt;Density-Based Clustering
&lt;/h3&gt;&lt;pre&gt;&lt;code&gt;$ python mlfromscratch/examples/dbscan.py
&lt;/code&gt;&lt;/pre&gt;
&lt;p align=&#34;center&#34;&gt;
    &lt;img src=&#34;http://eriklindernoren.se/images/mlfs_dbscan.png&#34; width=&#34;640&#34;&gt;
&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
    Figure: Clustering of the moons dataset using DBSCAN.
&lt;/p&gt;
&lt;h3 id=&#34;generating-handwritten-digits&#34;&gt;Generating Handwritten Digits
&lt;/h3&gt;&lt;pre&gt;&lt;code&gt;$ python mlfromscratch/unsupervised_learning/generative_adversarial_network.py

+-----------+
| Generator |
+-----------+
Input Shape: (100,)
+------------------------+------------+--------------+
| Layer Type             | Parameters | Output Shape |
+------------------------+------------+--------------+
| Dense                  | 25856      | (256,)       |
| Activation (LeakyReLU) | 0          | (256,)       |
| BatchNormalization     | 512        | (256,)       |
| Dense                  | 131584     | (512,)       |
| Activation (LeakyReLU) | 0          | (512,)       |
| BatchNormalization     | 1024       | (512,)       |
| Dense                  | 525312     | (1024,)      |
| Activation (LeakyReLU) | 0          | (1024,)      |
| BatchNormalization     | 2048       | (1024,)      |
| Dense                  | 803600     | (784,)       |
| Activation (TanH)      | 0          | (784,)       |
+------------------------+------------+--------------+
Total Parameters: 1489936

+---------------+
| Discriminator |
+---------------+
Input Shape: (784,)
+------------------------+------------+--------------+
| Layer Type             | Parameters | Output Shape |
+------------------------+------------+--------------+
| Dense                  | 401920     | (512,)       |
| Activation (LeakyReLU) | 0          | (512,)       |
| Dropout                | 0          | (512,)       |
| Dense                  | 131328     | (256,)       |
| Activation (LeakyReLU) | 0          | (256,)       |
| Dropout                | 0          | (256,)       |
| Dense                  | 514        | (2,)         |
| Activation (Softmax)   | 0          | (2,)         |
+------------------------+------------+--------------+
Total Parameters: 533762
&lt;/code&gt;&lt;/pre&gt;
&lt;p align=&#34;center&#34;&gt;
    &lt;img src=&#34;http://eriklindernoren.se/images/gan_mnist5.gif&#34; width=&#34;640&#34;&gt;
&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
    Figure: Training progress of a Generative Adversarial Network generating &lt;br&gt;
    handwritten digits.
&lt;/p&gt;
&lt;h3 id=&#34;deep-reinforcement-learning&#34;&gt;Deep Reinforcement Learning
&lt;/h3&gt;&lt;pre&gt;&lt;code&gt;$ python mlfromscratch/examples/deep_q_network.py

+----------------+
| Deep Q-Network |
+----------------+
Input Shape: (4,)
+-------------------+------------+--------------+
| Layer Type        | Parameters | Output Shape |
+-------------------+------------+--------------+
| Dense             | 320        | (64,)        |
| Activation (ReLU) | 0          | (64,)        |
| Dense             | 130        | (2,)         |
+-------------------+------------+--------------+
Total Parameters: 450
&lt;/code&gt;&lt;/pre&gt;
&lt;p align=&#34;center&#34;&gt;
    &lt;img src=&#34;http://eriklindernoren.se/images/mlfs_dql1.gif&#34; width=&#34;640&#34;&gt;
&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
    Figure: Deep Q-Network solution to the CartPole-v1 environment in OpenAI gym.
&lt;/p&gt;
&lt;h3 id=&#34;image-reconstruction-with-rbm&#34;&gt;Image Reconstruction With RBM
&lt;/h3&gt;&lt;pre&gt;&lt;code&gt;$ python mlfromscratch/examples/restricted_boltzmann_machine.py
&lt;/code&gt;&lt;/pre&gt;
&lt;p align=&#34;center&#34;&gt;
    &lt;img src=&#34;http://eriklindernoren.se/images/rbm_digits1.gif&#34; width=&#34;640&#34;&gt;
&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
    Figure: How the network improves during training at reconstructing &lt;br&gt;
    the digit 2 from the MNIST dataset.
&lt;/p&gt;
&lt;h3 id=&#34;evolutionary-evolved-neural-network&#34;&gt;Evolutionary Evolved Neural Network
&lt;/h3&gt;&lt;pre&gt;&lt;code&gt;$ python mlfromscratch/examples/neuroevolution.py

+---------------+
| Model Summary |
+---------------+
Input Shape: (64,)
+----------------------+------------+--------------+
| Layer Type           | Parameters | Output Shape |
+----------------------+------------+--------------+
| Dense                | 1040       | (16,)        |
| Activation (ReLU)    | 0          | (16,)        |
| Dense                | 170        | (10,)        |
| Activation (Softmax) | 0          | (10,)        |
+----------------------+------------+--------------+
Total Parameters: 1210

Population Size: 100
Generations: 3000
Mutation Rate: 0.01

[0 Best Individual - Fitness: 3.08301, Accuracy: 10.5%]
[1 Best Individual - Fitness: 3.08746, Accuracy: 12.0%]
...
[2999 Best Individual - Fitness: 94.08513, Accuracy: 98.5%]
Test set accuracy: 96.7%
&lt;/code&gt;&lt;/pre&gt;
&lt;p align=&#34;center&#34;&gt;
    &lt;img src=&#34;http://eriklindernoren.se/images/evo_nn4.png&#34; width=&#34;640&#34;&gt;
&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
    Figure: Classification of the digit dataset by a neural network that has&lt;br&gt;
    been evolved with an evolutionary algorithm.
&lt;/p&gt;
&lt;h3 id=&#34;genetic-algorithm&#34;&gt;Genetic Algorithm
&lt;/h3&gt;&lt;pre&gt;&lt;code&gt;$ python mlfromscratch/examples/genetic_algorithm.py

+--------+
|   GA   |
+--------+
Description: Implementation of a Genetic Algorithm which aims to produce
the user specified target string. This implementation calculates each
candidate&#39;s fitness based on the alphabetical distance between the candidate
and the target. A candidate is selected as a parent with probabilities proportional
to the candidate&#39;s fitness. Reproduction is implemented as a single-point
crossover between pairs of parents. Mutation is done by randomly assigning
new characters with uniform probability.

Parameters
----------
Target String: &#39;Genetic Algorithm&#39;
Population Size: 100
Mutation Rate: 0.05

[0 Closest Candidate: &#39;CJqlJguPlqzvpoJmb&#39;, Fitness: 0.00]
[1 Closest Candidate: &#39;MCxZxdr nlfiwwGEk&#39;, Fitness: 0.01]
[2 Closest Candidate: &#39;MCxZxdm nlfiwwGcx&#39;, Fitness: 0.01]
[3 Closest Candidate: &#39;SmdsAklMHn kBIwKn&#39;, Fitness: 0.01]
[4 Closest Candidate: &#39;  lotneaJOasWfu Z&#39;, Fitness: 0.01]
...
[292 Closest Candidate: &#39;GeneticaAlgorithm&#39;, Fitness: 1.00]
[293 Closest Candidate: &#39;GeneticaAlgorithm&#39;, Fitness: 1.00]
[294 Answer: &#39;Genetic Algorithm&#39;]
&lt;/code&gt;&lt;/pre&gt;
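The banner above describes the algorithm's fitness, selection, crossover, and mutation steps. They can be sketched roughly as follows (a simplified illustration only, not the repository's actual code; all names here are invented):

```python
import random
import string

TARGET = "Genetic Algorithm"
CHARS = string.ascii_letters + " "

def fitness(candidate):
    # Fitness grows as the alphabetical distance to the target shrinks;
    # an exact match scores 1.0.
    dist = sum(abs(ord(a) - ord(b)) for a, b in zip(candidate, TARGET))
    return 1.0 / (dist + 1.0)

def crossover(p1, p2):
    # Single-point crossover between a pair of parents.
    point = random.randrange(1, len(TARGET))
    return p1[:point] + p2[point:]

def mutate(candidate, rate=0.05):
    # Each character is replaced by a random one with probability `rate`.
    flips = random.choices([True, False], weights=[rate, 1.0 - rate],
                           k=len(candidate))
    return "".join(random.choice(CHARS) if flip else c
                   for c, flip in zip(candidate, flips))

def evolve(pop_size=100, generations=1000):
    population = ["".join(random.choice(CHARS) for _ in TARGET)
                  for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(c) for c in population]
        # Parents are drawn with probability proportional to fitness.
        population = [mutate(crossover(*random.choices(population, weights, k=2)))
                      for _ in range(pop_size)]
        best = max(population, key=fitness)
        if best == TARGET:
            return best
    return max(population, key=fitness)
```

With enough generations the population converges on the target string, mirroring the run shown above.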
&lt;h3 id=&#34;association-analysis&#34;&gt;Association Analysis
&lt;/h3&gt;&lt;pre&gt;&lt;code&gt;$ python mlfromscratch/examples/apriori.py
+-------------+
|   Apriori   |
+-------------+
Minimum Support: 0.25
Minimum Confidence: 0.8
Transactions:
    [1, 2, 3, 4]
    [1, 2, 4]
    [1, 2]
    [2, 3, 4]
    [2, 3]
    [3, 4]
    [2, 4]
Frequent Itemsets:
    [1, 2, 3, 4, [1, 2], [1, 4], [2, 3], [2, 4], [3, 4], [1, 2, 4], [2, 3, 4]]
Rules:
    1 -&amp;gt; 2 (support: 0.43, confidence: 1.0)
    4 -&amp;gt; 2 (support: 0.57, confidence: 0.8)
    [1, 4] -&amp;gt; 2 (support: 0.29, confidence: 1.0)
&lt;/code&gt;&lt;/pre&gt;
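The support and confidence figures in the output above follow from simple transaction counts. A minimal sketch, using the same toy transactions (the helper names are invented and unrelated to the repository's implementation):

```python
# Toy transaction database matching the example output above.
transactions = [
    {1, 2, 3, 4}, {1, 2, 4}, {1, 2},
    {2, 3, 4}, {2, 3}, {3, 4}, {2, 4},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)

def confidence(antecedent, consequent):
    # Estimated probability of the consequent given the antecedent:
    # support of the combined itemset divided by support of the antecedent.
    return support(antecedent | consequent) / support(antecedent)

# Rule 1 -> 2: support 3/7 (about 0.43), confidence 1.0
# Rule 4 -> 2: support 4/7 (about 0.57), confidence 0.8
```

Apriori prunes the search by only extending itemsets whose subsets already meet the minimum support, then keeps rules whose confidence clears the threshold.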
&lt;h2 id=&#34;implementations&#34;&gt;Implementations
&lt;/h2&gt;&lt;h3 id=&#34;supervised-learning&#34;&gt;Supervised Learning
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/adaboost.py&#34; &gt;Adaboost&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/bayesian_regression.py&#34; &gt;Bayesian Regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/decision_tree.py&#34; &gt;Decision Tree&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/regression.py&#34; &gt;Elastic Net&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/gradient_boosting.py&#34; &gt;Gradient Boosting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/k_nearest_neighbors.py&#34; &gt;K Nearest Neighbors&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/regression.py&#34; &gt;Lasso Regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/linear_discriminant_analysis.py&#34; &gt;Linear Discriminant Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/regression.py&#34; &gt;Linear Regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/logistic_regression.py&#34; &gt;Logistic Regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/multi_class_lda.py&#34; &gt;Multi-class Linear Discriminant Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/multilayer_perceptron.py&#34; &gt;Multilayer Perceptron&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/naive_bayes.py&#34; &gt;Naive Bayes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/neuroevolution.py&#34; &gt;Neuroevolution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/particle_swarm_optimization.py&#34; &gt;Particle Swarm Optimization of Neural Network&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/perceptron.py&#34; &gt;Perceptron&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/regression.py&#34; &gt;Polynomial Regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/random_forest.py&#34; &gt;Random Forest&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/regression.py&#34; &gt;Ridge Regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/support_vector_machine.py&#34; &gt;Support Vector Machine&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/supervised_learning/xgboost.py&#34; &gt;XGBoost&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;unsupervised-learning&#34;&gt;Unsupervised Learning
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/unsupervised_learning/apriori.py&#34; &gt;Apriori&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/unsupervised_learning/autoencoder.py&#34; &gt;Autoencoder&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/unsupervised_learning/dbscan.py&#34; &gt;DBSCAN&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/unsupervised_learning/fp_growth.py&#34; &gt;FP-Growth&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/unsupervised_learning/gaussian_mixture_model.py&#34; &gt;Gaussian Mixture Model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/unsupervised_learning/generative_adversarial_network.py&#34; &gt;Generative Adversarial Network&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/unsupervised_learning/genetic_algorithm.py&#34; &gt;Genetic Algorithm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/unsupervised_learning/k_means.py&#34; &gt;K-Means&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/unsupervised_learning/partitioning_around_medoids.py&#34; &gt;Partitioning Around Medoids&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/unsupervised_learning/principal_component_analysis.py&#34; &gt;Principal Component Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/unsupervised_learning/restricted_boltzmann_machine.py&#34; &gt;Restricted Boltzmann Machine&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;reinforcement-learning&#34;&gt;Reinforcement Learning
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/reinforcement_learning/deep_q_network.py&#34; &gt;Deep Q-Network&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;deep-learning&#34;&gt;Deep Learning
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/deep_learning/neural_network.py&#34; &gt;Neural Network&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/deep_learning/layers.py&#34; &gt;Layers&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;Activation Layer&lt;/li&gt;
&lt;li&gt;Average Pooling Layer&lt;/li&gt;
&lt;li&gt;Batch Normalization Layer&lt;/li&gt;
&lt;li&gt;Constant Padding Layer&lt;/li&gt;
&lt;li&gt;Convolutional Layer&lt;/li&gt;
&lt;li&gt;Dropout Layer&lt;/li&gt;
&lt;li&gt;Flatten Layer&lt;/li&gt;
&lt;li&gt;Fully-Connected (Dense) Layer&lt;/li&gt;
&lt;li&gt;Fully-Connected RNN Layer&lt;/li&gt;
&lt;li&gt;Max Pooling Layer&lt;/li&gt;
&lt;li&gt;Reshape Layer&lt;/li&gt;
&lt;li&gt;Up Sampling Layer&lt;/li&gt;
&lt;li&gt;Zero Padding Layer&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Model Types
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/examples/convolutional_neural_network.py&#34; &gt;Convolutional Neural Network&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/examples/multilayer_perceptron.py&#34; &gt;Multilayer Perceptron&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;mlfromscratch/examples/recurrent_neural_network.py&#34; &gt;Recurrent Neural Network&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;contact&#34;&gt;Contact
&lt;/h2&gt;&lt;p&gt;If there&amp;rsquo;s some implementation you would like to see here or if you&amp;rsquo;re just feeling social,
feel free to &lt;a class=&#34;link&#34; href=&#34;mailto:eriklindernoren@gmail.com&#34; &gt;email&lt;/a&gt; me or connect with me on &lt;a class=&#34;link&#34; href=&#34;https://www.linkedin.com/in/eriklindernoren/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>verifiers</title>
        <link>https://producthunt.programnotes.cn/en/p/verifiers/</link>
        <pubDate>Thu, 28 Aug 2025 15:29:47 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/verifiers/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1696345452312-dd623f76551c?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTYzNjYwODJ8&amp;ixlib=rb-4.1.0" alt="Featured image of post verifiers" /&gt;&lt;h1 id=&#34;willccbbverifiers&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/willccbb/verifiers&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;willccbb/verifiers&lt;/a&gt;
&lt;/h1&gt;&lt;div align=&#34;center&#34;&gt;
&lt;p align=&#34;center&#34;&gt;
  &lt;h1&gt;Verifiers&lt;/h1&gt;
&lt;/p&gt;
&lt;p&gt;
Environments for LLM Reinforcement Learning
&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id=&#34;overview&#34;&gt;Overview
&lt;/h2&gt;&lt;p&gt;Verifiers is a library of modular components for creating RL environments and training LLM agents. Verifiers includes an async GRPO implementation built around the &lt;code&gt;transformers&lt;/code&gt; Trainer, is supported by &lt;code&gt;prime-rl&lt;/code&gt; for large-scale FSDP training, and can easily be integrated into any RL framework which exposes an OpenAI-compatible inference client. In addition to RL training, Verifiers can be used directly for building LLM evaluations, creating synthetic data pipelines, and implementing agent harnesses.&lt;/p&gt;
&lt;p&gt;Full documentation is available &lt;a class=&#34;link&#34; href=&#34;https://verifiers.readthedocs.io/en/latest/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;setup&#34;&gt;Setup
&lt;/h2&gt;&lt;p&gt;We recommend using &lt;code&gt;verifiers&lt;/code&gt; along with &lt;a class=&#34;link&#34; href=&#34;https://docs.astral.sh/uv/getting-started/installation/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;uv&lt;/a&gt; for dependency management in your own project:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -LsSf https://astral.sh/uv/install.sh &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; sh
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;uv init &lt;span class=&#34;c1&#34;&gt;# create a fresh project&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;source&lt;/span&gt; .venv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;For local (CPU) development and evaluation with API models, do:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;uv add verifiers &lt;span class=&#34;c1&#34;&gt;# uv add &amp;#39;verifiers[dev]&amp;#39; for Jupyter + testing support&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;For training on GPUs with &lt;code&gt;vf.GRPOTrainer&lt;/code&gt;, do:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;uv add &lt;span class=&#34;s1&#34;&gt;&amp;#39;verifiers[all]&amp;#39;&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv pip install flash-attn --no-build-isolation
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;To use the latest &lt;code&gt;main&lt;/code&gt; branch, do:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;uv add &amp;#39;verifiers @ git+https://github.com/willccbb/verifiers.git&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;To use with &lt;code&gt;prime-rl&lt;/code&gt;, see &lt;a class=&#34;link&#34; href=&#34;https://github.com/PrimeIntellect-ai/prime-rl&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To install &lt;code&gt;verifiers&lt;/code&gt; from source for core library development, do:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone https://github.com/willccbb/verifiers.git
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; verifiers
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;uv sync --all-extras &lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv pip install flash-attn --no-build-isolation
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;uv run pre-commit install
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;In general, we recommend that you build and train Environments &lt;em&gt;with&lt;/em&gt; &lt;code&gt;verifiers&lt;/code&gt;, not &lt;em&gt;in&lt;/em&gt; &lt;code&gt;verifiers&lt;/code&gt;. If you find yourself needing to clone and modify the core library in order to implement key functionality for your project, we&amp;rsquo;d love for you to open an issue so that we can try and streamline the development experience. Our aim is for &lt;code&gt;verifiers&lt;/code&gt; to be a reliable toolkit to build on top of, and to minimize the &amp;ldquo;fork proliferation&amp;rdquo; which often pervades the RL infrastructure ecosystem.&lt;/p&gt;
&lt;h2 id=&#34;environments&#34;&gt;Environments
&lt;/h2&gt;&lt;p&gt;Environments in Verifiers are installable Python modules which can specify dependencies in a &lt;code&gt;pyproject.toml&lt;/code&gt;, and which expose a &lt;code&gt;load_environment&lt;/code&gt; function for instantiation by downstream applications (e.g. trainers). See &lt;code&gt;environments/&lt;/code&gt; for examples.&lt;/p&gt;
&lt;p&gt;To initialize a blank Environment module template, do:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;vf-init vf-environment-name &lt;span class=&#34;c1&#34;&gt;# -p /path/to/environments (defaults to &amp;#34;./environments&amp;#34;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;To install an Environment module into your project, do:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;vf-install vf-environment-name &lt;span class=&#34;c1&#34;&gt;# -p /path/to/environments (defaults to &amp;#34;./environments&amp;#34;) &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;To install an Environment module from this repo&amp;rsquo;s &lt;code&gt;environments&lt;/code&gt; folder, do:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;vf-install vf-math-python --from-repo &lt;span class=&#34;c1&#34;&gt;# -b branch_or_commit (defaults to &amp;#34;main&amp;#34;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Once an Environment module is installed, you can create an instance of the Environment using &lt;code&gt;load_environment&lt;/code&gt;, passing any necessary args:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;verifiers&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;vf&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;vf_env&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;vf&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;load_environment&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;vf-environment-name&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;**&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;env_args&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;To run a quick evaluation of your Environment with an API-based model, do:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;vf-eval vf-environment-name &lt;span class=&#34;c1&#34;&gt;# vf-eval -h for config options; defaults to gpt-4.1-mini, 5 prompts, 3 rollouts for each&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The core elements of Environments are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Datasets: a Hugging Face &lt;code&gt;Dataset&lt;/code&gt; with a &lt;code&gt;prompt&lt;/code&gt; column for inputs, and either &lt;code&gt;answer (str)&lt;/code&gt; or &lt;code&gt;info (dict)&lt;/code&gt; columns for evaluation&lt;/li&gt;
&lt;li&gt;Rollout logic: interactions between models and the environment (e.g. &lt;code&gt;env_response&lt;/code&gt; + &lt;code&gt;is_completed&lt;/code&gt; for any &lt;code&gt;MultiTurnEnv&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Rubrics: an encapsulation for one or more reward functions&lt;/li&gt;
&lt;li&gt;Parsers: optional; an encapsulation for reusable parsing logic&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We support both &lt;code&gt;/v1/chat/completions&lt;/code&gt;-style and &lt;code&gt;/v1/completions&lt;/code&gt;-style inference via OpenAI clients, though we generally recommend &lt;code&gt;/v1/chat/completions&lt;/code&gt;-style inference for the vast majority of applications. Both the included &lt;code&gt;GRPOTrainer&lt;/code&gt; and &lt;code&gt;prime-rl&lt;/code&gt; support the full set of &lt;a class=&#34;link&#34; href=&#34;https://docs.vllm.ai/en/v0.6.0/dev/sampling_params.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;SamplingParams&lt;/a&gt; exposed by vLLM (via their OpenAI-compatible &lt;a class=&#34;link&#34; href=&#34;https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;server&lt;/a&gt; interface), and leveraging this will often be the appropriate way to implement rollout strategies requiring finer-grained control, such as interrupting and resuming generations for interleaved tool use, or enforcing reasoning budgets.&lt;/p&gt;
&lt;p&gt;The primary constraint we impose on rollout logic is that token sequences must be &lt;em&gt;increasing&lt;/em&gt;, i.e. once a token has been added to a model&amp;rsquo;s context in a rollout, it must remain as the rollout progresses. Note that this causes issues with some popular reasoning models such as the Qwen3 and DeepSeek-R1-Distill series; see &lt;a class=&#34;link&#34; href=&#34;#footguns&#34; &gt;Footguns&lt;/a&gt; for guidance on adapting these models to support multi-turn rollouts.&lt;/p&gt;
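The increasing-tokens constraint can be stated as a simple invariant over the token contexts seen during a rollout. A minimal sketch for checking it (illustrative only, not part of the library API):

```python
def is_increasing(contexts):
    """Check that each token context in a rollout extends the previous one,
    i.e. tokens are only ever appended, never removed or rewritten."""
    return all(
        later[: len(earlier)] == earlier
        for earlier, later in zip(contexts, contexts[1:])
    )
```

Models that strip prior reasoning from their context between turns violate this invariant, which is why the adaptations described under Footguns are needed.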
&lt;h3 id=&#34;singleturnenv&#34;&gt;SingleTurnEnv
&lt;/h3&gt;&lt;p&gt;For tasks requiring only a single response from a model for each prompt, you can use &lt;code&gt;SingleTurnEnv&lt;/code&gt; directly by specifying a Dataset and a Rubric. Rubrics are sets of reward functions, which can be either sync or async.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;22
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;23
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;24
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;25
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;datasets&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;load_dataset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;verifiers&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;vf&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;dataset&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;load_dataset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;my-account/my-dataset&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;split&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;train&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;reward_A&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;completion&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;info&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;float&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;	&lt;span class=&#34;c1&#34;&gt;# reward fn, e.g. correctness&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;	&lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;reward_B&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;parser&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;completion&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;float&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;	&lt;span class=&#34;c1&#34;&gt;# auxiliary reward fn, e.g. format&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;	&lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;async&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;metric&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;completion&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;float&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;	&lt;span class=&#34;c1&#34;&gt;# non-reward metric, e.g. proper noun count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;	&lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;rubric&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;vf&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Rubric&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;funcs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;reward_A&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;reward_B&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;metric&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;],&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;weights&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;1.0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;mf&#34;&gt;0.5&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;mf&#34;&gt;0.0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;vf_env&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;vf&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;SingleTurnEnv&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;	&lt;span class=&#34;n&#34;&gt;dataset&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dataset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;	&lt;span class=&#34;n&#34;&gt;rubric&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;rubric&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;results&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;vf_env&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;evaluate&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;client&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;OpenAI&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(),&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;gpt-4.1-mini&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;num_examples&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;100&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;rollouts_per_example&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;vf_env&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;make_dataset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;results&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# HF dataset format&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Datasets should be formatted with columns for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;&#39;prompt&#39; (List[ChatMessage])&lt;/code&gt; OR &lt;code&gt;&#39;question&#39; (str)&lt;/code&gt; fields
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ChatMessage&lt;/code&gt; = e.g. &lt;code&gt;{&#39;role&#39;: &#39;user&#39;, &#39;content&#39;: &#39;...&#39;}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;if &lt;code&gt;question&lt;/code&gt; is set instead of &lt;code&gt;prompt&lt;/code&gt;, you can also pass &lt;code&gt;system_prompt (str)&lt;/code&gt; and/or &lt;code&gt;few_shot (List[ChatMessage])&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;answer (str)&lt;/code&gt; AND/OR &lt;code&gt;info (dict)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;task (str)&lt;/code&gt;: optional, used by &lt;code&gt;EnvGroup&lt;/code&gt; and &lt;code&gt;RubricGroup&lt;/code&gt; for orchestrating composition of Environments and Rubrics&lt;/li&gt;
&lt;/ul&gt;
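For illustration, single rows in the two prompt styles described above might look like the following (the field values are placeholders, not from a real dataset):

```python
# Hypothetical rows showing the two supported prompt styles.
chat_row = {
    "prompt": [{"role": "user", "content": "What is 2 + 2?"}],  # List[ChatMessage]
    "answer": "4",          # str consumed directly by reward functions
    "task": "arithmetic",   # optional tag for EnvGroup / RubricGroup
}
question_row = {
    "question": "What is 2 + 2?",  # plain str; the prompt is built for you
    "info": {"expected": 4},       # dict of auxiliary evaluation data
}
```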
&lt;p&gt;The following named attributes are available for use by reward functions in your Rubric:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;prompt&lt;/code&gt;: sequence of input messages&lt;/li&gt;
&lt;li&gt;&lt;code&gt;completion&lt;/code&gt;: sequence of messages generated during rollout by model and Environment&lt;/li&gt;
&lt;li&gt;&lt;code&gt;answer&lt;/code&gt;: primary answer column, optional if &lt;code&gt;info&lt;/code&gt; is used&lt;/li&gt;
&lt;li&gt;&lt;code&gt;state&lt;/code&gt;: can be modified during rollout to accumulate any metadata (&lt;code&gt;state[&#39;responses&#39;]&lt;/code&gt; includes full OpenAI response objects by default)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;info&lt;/code&gt;: auxiliary info needed for reward computation (e.g. test cases), optional if &lt;code&gt;answer&lt;/code&gt; is used&lt;/li&gt;
&lt;li&gt;&lt;code&gt;task&lt;/code&gt;: tag for task type (used by &lt;code&gt;EnvGroup&lt;/code&gt; and &lt;code&gt;RubricGroup&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;parser&lt;/code&gt;: the parser object declared. Note: &lt;code&gt;vf.Parser().get_format_reward_func()&lt;/code&gt; is a no-op (always 1.0); use &lt;code&gt;vf.ThinkParser&lt;/code&gt; or a custom parser if you want a real format adherence reward.&lt;/li&gt;
&lt;/ul&gt;
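Putting a few of these attributes together, a reward function might look like this sketch (it assumes chat-style completions, i.e. a list of message dicts, and falls back to plain strings):

```python
def exact_match_reward(completion, answer, **kwargs) -> float:
    """Return 1.0 if the model's final message matches `answer` exactly
    (after stripping whitespace), else 0.0."""
    # Chat-style completions are a list of messages; take the last one.
    final = completion[-1]["content"] if isinstance(completion, list) else completion
    return 1.0 if final.strip() == answer.strip() else 0.0
```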
&lt;p&gt;For tasks involving LLM judges, you may wish to use &lt;code&gt;vf.JudgeRubric()&lt;/code&gt; for managing requests to auxiliary models.&lt;/p&gt;
&lt;p&gt;Note on concurrency: environment APIs accept &lt;code&gt;max_concurrent&lt;/code&gt; to control parallel rollouts. The &lt;code&gt;vf-eval&lt;/code&gt; CLI currently exposes &lt;code&gt;--max-concurrent-requests&lt;/code&gt;; ensure this maps to your environment’s concurrency as expected.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;vf-eval&lt;/code&gt; also supports specifying &lt;code&gt;sampling_args&lt;/code&gt; as a JSON object, which is sent to the vLLM inference engine:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;vf-eval vf-environment-name --sampling-args &lt;span class=&#34;s1&#34;&gt;&amp;#39;{&amp;#34;reasoning_effort&amp;#34;: &amp;#34;low&amp;#34;}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Use &lt;code&gt;vf-eval -s&lt;/code&gt; to save outputs as dataset-formatted JSON, and view all locally-saved eval results with &lt;code&gt;vf-tui&lt;/code&gt;.&lt;/p&gt;
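If you assemble the JSON from Python, serializing a dict avoids shell-quoting mistakes. A small sketch (the sampling keys below are examples; which keys are valid depends on your inference server):

```python
import json
import shlex

# Example sampling parameters; serialize and quote for the CLI.
sampling_args = {"temperature": 0.7, "max_tokens": 512}
arg = shlex.quote(json.dumps(sampling_args))
cmd = f"vf-eval vf-environment-name --sampling-args {arg}"
```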
&lt;h3 id=&#34;toolenv&#34;&gt;ToolEnv
&lt;/h3&gt;&lt;p&gt;For many applications involving tool use, you can use &lt;code&gt;ToolEnv&lt;/code&gt; to leverage models&amp;rsquo; native tool/function-calling capabilities in an agentic loop. Tools can be specified as generic Python functions (with type hints and docstrings), which will then be passed in JSON schema form to each inference request.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;verifiers&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;vf&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;vf_env&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;vf&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ToolEnv&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;	&lt;span class=&#34;n&#34;&gt;dataset&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# HF Dataset with &amp;#39;prompt&amp;#39;/&amp;#39;question&amp;#39; + &amp;#39;answer&amp;#39;/&amp;#39;info&amp;#39; columns&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;	&lt;span class=&#34;n&#34;&gt;rubric&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# Rubric object; vf.ToolRubric() can be optionally used for counting tool invocations in each rollout&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;	&lt;span class=&#34;n&#34;&gt;tools&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;search_tool&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;read_article_tool&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;python_tool&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;],&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# python functions with type hints + docstrings&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;	&lt;span class=&#34;n&#34;&gt;max_turns&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;In cases where your tools require heavy computational resources, we recommend hosting your tools as standalone servers (e.g. MCP servers) and creating lightweight wrapper functions to pass to &lt;code&gt;ToolEnv&lt;/code&gt;. Parallel tool call support is enabled by default.&lt;/p&gt;
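A tool is just a typed, documented Python function. Below, `word_count` is a toy stand-in for something like `search_tool`, and the schema dict only gestures at the signature-to-JSON-schema conversion, which the library performs for you when you pass tools to `ToolEnv`:

```python
import inspect
import typing

def word_count(text: str) -> int:
    """Count whitespace-separated words in `text`."""
    return len(text.split())

# Rough sketch of how a signature and docstring become a tool spec;
# the real JSON-schema generation is handled internally by ToolEnv.
hints = typing.get_type_hints(word_count)
tool_spec = {
    "name": word_count.__name__,
    "description": inspect.getdoc(word_count),
    "parameters": {k: v.__name__ for k, v in hints.items() if k != "return"},
}
```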
&lt;p&gt;For training or self-hosted endpoints, you&amp;rsquo;ll want to enable auto tool choice in &lt;a class=&#34;link&#34; href=&#34;https://docs.vllm.ai/en/stable/features/tool_calling.html#automatic-function-calling&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;vLLM&lt;/a&gt; with the appropriate parser. If your model does not support native tool calling, you may find the &lt;code&gt;XMLParser&lt;/code&gt; abstraction useful for rolling your own tool call parsing on top of &lt;code&gt;MultiTurnEnv&lt;/code&gt;; see &lt;code&gt;environments/xml_tool_env&lt;/code&gt; for an example.&lt;/p&gt;
&lt;h3 id=&#34;multiturnenv&#34;&gt;MultiTurnEnv
&lt;/h3&gt;&lt;p&gt;Both &lt;code&gt;SingleTurnEnv&lt;/code&gt; and &lt;code&gt;ToolEnv&lt;/code&gt; are subclasses of &lt;code&gt;MultiTurnEnv&lt;/code&gt;, which exposes an interface for writing custom Environment interaction protocols. The two methods you must override are:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;typing&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Tuple&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;verifiers&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;vf&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;verifiers.types&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Messages&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;State&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;YourMultiTurnEnv&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;vf&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MultiTurnEnv&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;fm&#34;&gt;__init__&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                 &lt;span class=&#34;n&#34;&gt;dataset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Dataset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                 &lt;span class=&#34;n&#34;&gt;rubric&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Rubric&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                 &lt;span class=&#34;n&#34;&gt;max_turns&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;int&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                 &lt;span class=&#34;o&#34;&gt;**&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;kwargs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;nb&#34;&gt;super&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;fm&#34;&gt;__init__&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dataset&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;dataset&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;rubric&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;rubric&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;max_turns&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;max_turns&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;**&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;kwargs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;k&#34;&gt;async&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;is_completed&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;messages&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Messages&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;state&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;State&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;**&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;kwargs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;bool&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# return whether or not a rollout is completed&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;k&#34;&gt;async&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;env_response&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;messages&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Messages&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;state&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;State&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;**&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;kwargs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Tuple&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Messages&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;State&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# return new environment message(s) + updated state&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If your application requires more fine-grained control than &lt;code&gt;MultiTurnEnv&lt;/code&gt; allows, you may want to inherit from the base &lt;code&gt;Environment&lt;/code&gt; class directly and override the &lt;code&gt;rollout&lt;/code&gt; method.&lt;/p&gt;
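As a library-free sketch of the two hooks, here is the interaction logic a number-guessing environment might implement. The stand-in types and the game itself are illustrative, not part of verifiers:

```python
# Stand-ins for verifiers' Messages and State types.
Messages = list   # list of {"role": ..., "content": ...} dicts
State = dict

def is_completed(messages: Messages, state: State) -> bool:
    # End the rollout once the model has guessed the target number.
    return state.get("solved", False)

def env_response(messages: Messages, state: State) -> tuple:
    # Grade the model's last guess and respond with a hint.
    guess = int(messages[-1]["content"])
    if guess == state["target"]:
        state["solved"] = True
        reply = "correct"
    else:
        reply = "lower" if guess > state["target"] else "higher"
    return [{"role": "user", "content": reply}], state
```

In the real class, these would be `async` methods on a `vf.MultiTurnEnv` subclass, as in the signature sketch above.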
&lt;h2 id=&#34;training&#34;&gt;Training
&lt;/h2&gt;&lt;h3 id=&#34;grpotrainer&#34;&gt;GRPOTrainer
&lt;/h3&gt;&lt;p&gt;The included trainer (&lt;code&gt;vf.GRPOTrainer&lt;/code&gt;) supports GRPO-style RL training via Accelerate/DeepSpeed, uses vLLM for inference, and supports full-parameter finetuning. It is optimized for efficiently training dense transformer models on 2-16 GPUs.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# install environment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;vf-install vf-wordle &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;-p /path/to/environments &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; --from-repo&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# quick eval&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;vf-eval vf-wordle -m &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;model_name in configs/endpoints.py&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; -n NUM_EXAMPLES -r ROLLOUTS_PER_EXAMPLE
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# inference (shell 0)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;0,1,2,3,4,5 vf-vllm --model willcb/Qwen3-1.7B-Wordle &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    --data-parallel-size &lt;span class=&#34;m&#34;&gt;6&lt;/span&gt; --enforce-eager --disable-log-requests
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# training (shell 1)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;CUDA_VISIBLE_DEVICES&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;6,7 accelerate launch --num-processes &lt;span class=&#34;m&#34;&gt;2&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    --config-file configs/zero3.yaml examples/grpo/train_wordle.py --size 1.7B
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Alternatively, you can train environments with the external &lt;code&gt;prime-rl&lt;/code&gt; project (FSDP-first orchestration); see the &lt;code&gt;prime-rl&lt;/code&gt; README for installation instructions. For example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-toml&#34; data-lang=&#34;toml&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c&#34;&gt;# orchestrator config (prime-rl)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;nx&#34;&gt;environment&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nx&#34;&gt;id&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;vf-math-python&amp;#34;&lt;/span&gt;  &lt;span class=&#34;c&#34;&gt;# or your environment ID&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# run (prime-rl)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;uv run rl &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --trainer @ configs/your_exp/train.toml &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --orchestrator @ configs/your_exp/orch.toml &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --inference @ configs/your_exp/infer.toml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;troubleshooting&#34;&gt;Troubleshooting
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Ensure your &lt;code&gt;wandb&lt;/code&gt; and &lt;code&gt;huggingface-cli&lt;/code&gt; logins are set up (or set &lt;code&gt;report_to=None&lt;/code&gt; in &lt;code&gt;training_args&lt;/code&gt;). You should also set &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; in your environment (a dummy key works for vLLM).&lt;/li&gt;
&lt;li&gt;If using high max concurrency, increase the number of allowed open sockets (e.g. &lt;code&gt;ulimit -n 4096&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;On some setups, inter-GPU communication can &lt;a class=&#34;link&#34; href=&#34;https://github.com/huggingface/trl/issues/2923&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;hang&lt;/a&gt; or crash during vLLM weight syncing. This can usually be alleviated by setting (or unsetting) &lt;code&gt;NCCL_P2P_DISABLE=1&lt;/code&gt; in your environment (or potentially &lt;code&gt;NCCL_CUMEM_ENABLE=1&lt;/code&gt;). Try this as your first step if you experience NCCL-related issues.&lt;/li&gt;
&lt;li&gt;If problems persist, please open an &lt;a class=&#34;link&#34; href=&#34;https://github.com/willccbb/verifiers/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;issue&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;resource-requirements&#34;&gt;Resource Requirements
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;GRPOTrainer&lt;/code&gt; is optimized for setups with at least 2 GPUs, scaling up to multiple nodes. 2-GPU setups with sufficient memory for small-scale experimentation can be &lt;a class=&#34;link&#34; href=&#34;https://app.primeintellect.ai/dashboard/create-cluster?image=ubuntu_22_cuda_12&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;rented&lt;/a&gt; for &amp;lt;$1/hr.&lt;/p&gt;
&lt;h3 id=&#34;prime-rl&#34;&gt;PRIME-RL
&lt;/h3&gt;&lt;p&gt;If you do not require LoRA support, you may want to use the &lt;code&gt;prime-rl&lt;/code&gt; trainer: it natively supports Environments created with &lt;code&gt;verifiers&lt;/code&gt;, is optimized for performance and scalability via FSDP, includes a broader set of configuration options and user-experience features, and has more battle-tested defaults. Both trainers support asynchronous rollouts, and use a one-step off-policy delay by default to overlap training and inference. See the &lt;code&gt;prime-rl&lt;/code&gt; &lt;a class=&#34;link&#34; href=&#34;https://github.com/PrimeIntellect-ai/prime-rl&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;docs&lt;/a&gt; for usage instructions.&lt;/p&gt;
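The one-step off-policy overlap mentioned above can be pictured with a toy schedule (an illustration of the scheduling idea only, not `prime-rl` or `verifiers` internals):

```python
# Toy illustration of a one-step off-policy schedule: each training step
# consumes rollouts generated with the previous step's weights, so rollout
# generation can overlap with training. Not prime-rl/verifiers internals.

trainer_version = 0    # weights version held by the trainer
generator_version = 0  # weights version loaded in the inference server
log = []

for step in range(1, 4):
    # Inference generates the batch for this step with the currently
    # loaded weights (one update behind after the first step).
    batch_version = generator_version
    trainer_version += 1                 # placeholder for a GRPO update
    generator_version = trainer_version  # weight sync after the step
    log.append((step, batch_version, trainer_version))

# Each consumed batch is exactly one version behind the post-update weights.
```

The payoff of the one-step delay is that the inference server never sits idle waiting for a weight update, at the cost of training on slightly stale rollouts.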
&lt;h2 id=&#34;further-documentation&#34;&gt;Further Documentation
&lt;/h2&gt;&lt;p&gt;See the full &lt;a class=&#34;link&#34; href=&#34;https://verifiers.readthedocs.io/en/latest/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;docs&lt;/a&gt; for more information.&lt;/p&gt;
&lt;h2 id=&#34;contributions&#34;&gt;Contributions
&lt;/h2&gt;&lt;p&gt;Verifiers warmly welcomes community contributions! Please open an issue or PR if you encounter bugs or other pain points during your development, or start a discussion for more open-ended questions.&lt;/p&gt;
&lt;p&gt;Please note that the core &lt;code&gt;verifiers/&lt;/code&gt; library is intended to be a relatively lightweight set of reusable components rather than an exhaustive catalog of RL environments. For &lt;em&gt;applications&lt;/em&gt; of &lt;code&gt;verifiers&lt;/code&gt; (e.g. &amp;ldquo;an Environment for XYZ task&amp;rdquo;), you are welcome to submit a PR for a self-contained module that lives within &lt;code&gt;environments/&lt;/code&gt; if it serves as a canonical example of a new pattern. Stay tuned for more info shortly about our plans for supporting community Environment contributions 🙂&lt;/p&gt;
&lt;h2 id=&#34;citation&#34;&gt;Citation
&lt;/h2&gt;&lt;p&gt;If you use this code in your research, please cite:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bibtex&#34; data-lang=&#34;bibtex&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nc&#34;&gt;@misc&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;nl&#34;&gt;brown_verifiers_2025&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;author&lt;/span&gt;       &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;{William Brown}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;title&lt;/span&gt;        &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;{{Verifiers}: Reinforcement Learning with LLMs in Verifiable Environments}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;howpublished&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;{\url{https://github.com/willccbb/verifiers}}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;note&lt;/span&gt;         &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;{Commit abcdefg • accessed DD Mon YYYY}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;year&lt;/span&gt;         &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;{2025}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;roadmap&#34;&gt;Roadmap
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;A community Environments hub for crowdsourcing, sharing, and discovering new RL environments built with &lt;code&gt;verifiers&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Default patterns for hosted resources such as code sandboxes, auxiliary models, and MCP servers&lt;/li&gt;
&lt;li&gt;Multimodal input support&lt;/li&gt;
&lt;li&gt;Non-increasing token sequences via REINFORCE&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>ART</title>
        <link>https://producthunt.programnotes.cn/en/p/art/</link>
        <pubDate>Thu, 28 Aug 2025 15:29:23 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/art/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1696345452312-dd623f76551c?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTYzNjYwODJ8&amp;ixlib=rb-4.1.0" alt="Featured image of post ART" /&gt;&lt;h1 id=&#34;openpipeart&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenPipe/ART&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;OpenPipe/ART&lt;/a&gt;
&lt;/h1&gt;&lt;div align=&#34;center&#34;&gt;
&lt;p&gt;&lt;a href=&#34;https://art.openpipe.ai&#34;&gt;&lt;picture&gt;
&lt;img alt=&#34;ART logo&#34; src=&#34;https://github.com/openpipe/art/raw/main/assets/ART_logo.png&#34; width=&#34;160px&#34;&gt;
&lt;/picture&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
  &lt;h1&gt;Agent Reinforcement Trainer&lt;/h1&gt;
&lt;/p&gt;
&lt;p&gt;
Train multi-step agents for real-world tasks using GRPO.
&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/openpipe/art/blob/main/CONTRIBUTING.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/PRs-welcome-blue.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;PRs-Welcome&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://pypi.org/project/openpipe-art/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/pypi/v/openpipe-art?color=364fc7&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;PyPI version&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://colab.research.google.com/assets/colab-badge.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Train Agent&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://discord.gg/zbBHRUpwf4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Join%20Discord-5865F2?style=plastic&amp;amp;logo=discord&amp;amp;logoColor=white&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Join Discord&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://art.openpipe.ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Documentation-orange?style=plastic&amp;amp;logo=gitbook&amp;amp;logoColor=white&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Documentation&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id=&#34;-ruler-zero-shot-agent-rewards&#34;&gt;📏 RULER: Zero-Shot Agent Rewards
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;RULER&lt;/strong&gt; (Relative Universal LLM-Elicited Rewards) eliminates the need for hand-crafted reward functions by using an LLM-as-judge to automatically score agent trajectories. Simply define your task in the system prompt, and RULER handles the rest—&lt;strong&gt;no labeled data, expert feedback, or reward engineering required&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;✨ &lt;strong&gt;Key Benefits:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;2-3x faster development&lt;/strong&gt; - Skip reward function engineering entirely&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;General-purpose&lt;/strong&gt; - Works across any task without modification&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Strong performance&lt;/strong&gt; - Matches or exceeds hand-crafted rewards in 3/4 benchmarks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Easy integration&lt;/strong&gt; - Drop-in replacement for manual reward functions&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Before: Hours of reward engineering&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;complex_reward_function&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;trajectory&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# 50+ lines of careful scoring logic...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;pass&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# After: One line with RULER&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;judged_group&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;await&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ruler_score_group&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;group&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;openai/o3&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://art.openpipe.ai/fundamentals/ruler&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;📖 Learn more about RULER →&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;art-overview&#34;&gt;ART Overview
&lt;/h2&gt;&lt;p&gt;ART is an open-source RL framework that improves agent reliability by allowing LLMs to &lt;strong&gt;learn from experience&lt;/strong&gt;. ART provides an ergonomic harness for integrating GRPO into any Python application. For a quick hands-on introduction, run one of the notebooks below. When you&amp;rsquo;re ready to learn more, check out the &lt;a class=&#34;link&#34; href=&#34;https://art.openpipe.ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;docs&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;-notebooks&#34;&gt;📒 Notebooks
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Agent Task&lt;/th&gt;
          &lt;th&gt;Example Notebook&lt;/th&gt;
          &lt;th&gt;Description&lt;/th&gt;
          &lt;th&gt;Comparative Performance&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;ART•E LangGraph&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/langgraph/art-e-langgraph.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🏋️ Train agent&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Qwen 2.5 7B learns to search emails using LangGraph&lt;/td&gt;
          &lt;td&gt;[Link coming soon]&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;MCP•RL&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/mcp-rl/mcp-rl.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🏋️ Train agent&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Qwen 2.5 3B masters the NWS MCP server&lt;/td&gt;
          &lt;td&gt;[Link coming soon]&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;ART•E [RULER]&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/art-e.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🏋️ Train agent&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Qwen 2.5 7B learns to search emails using RULER&lt;/td&gt;
          &lt;td&gt;&lt;img src=&#34;https://github.com/openpipe/art/raw/main/assets/benchmarks/email_agent/accuracy-training-progress.svg&#34; height=&#34;72&#34;&gt; &lt;a class=&#34;link&#34; href=&#34;https://producthunt.programnotes.cn/dev/art-e/art_e/evaluate/display_benchmarks.ipynb&#34; &gt;benchmarks&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;2048&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🏋️ Train agent&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Qwen 2.5 3B learns to play 2048&lt;/td&gt;
          &lt;td&gt;&lt;img src=&#34;https://github.com/openpipe/art/raw/main/assets/benchmarks/2048/accuracy-training-progress.svg&#34; height=&#34;72&#34;&gt; &lt;a class=&#34;link&#34; href=&#34;https://producthunt.programnotes.cn/examples/2048/display_benchmarks.ipynb&#34; &gt;benchmarks&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Temporal Clue&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/temporal_clue/temporal-clue.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🏋️ Train agent&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Qwen 2.5 7B learns to solve Temporal Clue&lt;/td&gt;
          &lt;td&gt;[Link coming soon]&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Tic Tac Toe&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/tic_tac_toe/tic-tac-toe.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🏋️ Train agent&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Qwen 2.5 3B learns to play Tic Tac Toe&lt;/td&gt;
          &lt;td&gt;&lt;img src=&#34;https://github.com/openpipe/art/raw/main/assets/benchmarks/tic-tac-toe-local/accuracy-training-progress.svg&#34; height=&#34;72&#34;&gt; &lt;a class=&#34;link&#34; href=&#34;https://producthunt.programnotes.cn/examples/tic_tac_toe/display-benchmarks.ipynb&#34; &gt;benchmarks&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Codenames&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🏋️ Train agent&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Qwen 2.5 3B learns to play Codenames&lt;/td&gt;
          &lt;td&gt;&lt;img src=&#34;https://github.com/openpipe/art/raw/main/assets/benchmarks/codenames/win_rate_over_time.png&#34; height=&#34;72&#34;&gt; &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenPipe/art-notebooks/blob/main/examples/codenames/Codenames_RL.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;benchmarks&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;AutoRL [RULER]&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/auto_rl.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🏋️ Train agent&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Train Qwen 2.5 7B to master any task&lt;/td&gt;
          &lt;td&gt;[Link coming soon]&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;-art-news&#34;&gt;📰 ART News
&lt;/h2&gt;&lt;p&gt;Explore our latest research and updates on building SOTA agents.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;🗞️ &lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;https://art.openpipe.ai/integrations/langgraph-integration&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ART now integrates seamlessly with LangGraph&lt;/a&gt;&lt;/strong&gt; - Train your LangGraph agents with reinforcement learning for smarter multi-step reasoning and improved tool usage.&lt;/li&gt;
&lt;li&gt;🗞️ &lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;https://x.com/corbtt/status/1953171838382817625&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MCP•RL: Teach Your Model to Master Any MCP Server&lt;/a&gt;&lt;/strong&gt; - Automatically train models to effectively use MCP server tools through reinforcement learning.&lt;/li&gt;
&lt;li&gt;🗞️ &lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;https://x.com/mattshumer_/status/1950572449025650733&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;AutoRL: Zero-Data Training for Any Task&lt;/a&gt;&lt;/strong&gt; - Train custom AI models without labeled data using automatic input generation and RULER evaluation.&lt;/li&gt;
&lt;li&gt;🗞️ &lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;https://openpipe.ai/blog/ruler-easy-mode-for-rl-rewards&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;RULER: Easy Mode for RL Rewards&lt;/a&gt;&lt;/strong&gt; is now available for automatic reward generation in reinforcement learning.&lt;/li&gt;
&lt;li&gt;🗞️ &lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;https://openpipe.ai/blog/art-e-mail-agent&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ART·E: How We Built an Email Research Agent That Beats o3&lt;/a&gt;&lt;/strong&gt; demonstrates a Qwen 2.5 14B email agent outperforming OpenAI&amp;rsquo;s o3.&lt;/li&gt;
&lt;li&gt;🗞️ &lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;https://openpipe.ai/blog/art-trainer&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ART Trainer: A New RL Trainer for Agents&lt;/a&gt;&lt;/strong&gt; enables easy training of LLM-based agents using GRPO.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://openpipe.ai/blog&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;📖 See all blog posts →&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;why-art&#34;&gt;Why ART?
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;ART provides convenient wrappers for introducing RL training into &lt;strong&gt;existing applications&lt;/strong&gt;. We abstract the training server into a modular service that your code doesn&amp;rsquo;t need to interface with.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Train from anywhere.&lt;/strong&gt; Run the ART client on your laptop and let the ART server kick off an ephemeral GPU-enabled environment, or run on a local GPU.&lt;/li&gt;
&lt;li&gt;Integrations with hosted platforms like W&amp;amp;B, Langfuse, and OpenPipe provide flexible observability and &lt;strong&gt;simplify debugging&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;ART is customizable with &lt;strong&gt;intelligent defaults&lt;/strong&gt;. You can configure training parameters and inference engine configurations to meet specific needs, or take advantage of the defaults, which have been optimized for training efficiency and stability.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;installation&#34;&gt;Installation
&lt;/h2&gt;&lt;p&gt;ART agents can be trained from any client machine that runs Python. To add ART to an existing project, run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install openpipe-art
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;-arte-agent&#34;&gt;🤖 ART•E Agent
&lt;/h2&gt;&lt;p&gt;Curious about how to use ART for a real-world task? Check out the &lt;a class=&#34;link&#34; href=&#34;https://openpipe.ai/blog/art-e-mail-agent&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ART•E Agent&lt;/a&gt; blog post, where we detail how we trained Qwen 2.5 14B to beat o3 at email retrieval!&lt;/p&gt;
&lt;img src=&#34;https://github.com/openpipe/art/raw/main/assets/ART_E_graphs.png&#34; width=&#34;700&#34;&gt;
&lt;h2 id=&#34;-training-loop-overview&#34;&gt;🔁 Training Loop Overview
&lt;/h2&gt;&lt;p&gt;ART&amp;rsquo;s functionality is divided into a &lt;strong&gt;client&lt;/strong&gt; and a &lt;strong&gt;server&lt;/strong&gt;. The OpenAI-compatible client is responsible for interfacing between ART and your codebase. Using the client, you can pass messages and get completions from your LLM as it improves. The server runs independently on any machine with a GPU. It abstracts away the complexity of the inference and training portions of the RL loop while allowing for some custom configuration. An outline of the training loop is shown below:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Inference&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Your code uses the ART client to perform an agentic workflow (usually executing several rollouts in parallel to gather data faster).&lt;/li&gt;
&lt;li&gt;Completion requests are routed to the ART server, which runs the model&amp;rsquo;s latest LoRA in vLLM.&lt;/li&gt;
&lt;li&gt;As the agent executes, each &lt;code&gt;system&lt;/code&gt;, &lt;code&gt;user&lt;/code&gt;, and &lt;code&gt;assistant&lt;/code&gt; message is stored in a Trajectory.&lt;/li&gt;
&lt;li&gt;When a rollout finishes, your code assigns a &lt;code&gt;reward&lt;/code&gt; to its Trajectory, indicating the performance of the LLM.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Training&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When each rollout has finished, Trajectories are grouped and sent to the server. Inference is blocked while training executes.&lt;/li&gt;
&lt;li&gt;The server trains your model using GRPO, initializing from the latest checkpoint (or an empty LoRA on the first iteration).&lt;/li&gt;
&lt;li&gt;The server saves the newly trained LoRA to a local directory and loads it into vLLM.&lt;/li&gt;
&lt;li&gt;Inference is unblocked and the loop resumes at step 1.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This training loop runs until a specified number of inference and training iterations have completed.&lt;/p&gt;
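The loop above can be sketched in client-side pseudocode (all names here are hypothetical stand-ins, not the real ART client API):

```python
# Hypothetical sketch of the inference/training loop described above.
# run_rollout() and train_on() are stand-ins, not the real ART client API.

def run_rollout(prompt: str) -> dict:
    """One agentic rollout: collect messages, then assign a reward."""
    trajectory = {"messages": [{"role": "user", "content": prompt}]}
    completion = f"answer to: {prompt}"  # stand-in for a completion request
    trajectory["messages"].append({"role": "assistant", "content": completion})
    trajectory["reward"] = 1.0 if completion else 0.0  # your scoring logic
    return trajectory

def train_on(groups) -> int:
    """Stand-in for the server-side GRPO update on grouped trajectories."""
    return sum(len(group) for group in groups)

total_trained = 0
for _ in range(2):  # a fixed number of iterations, as described above
    # 1. Inference: gather several rollouts (in practice, in parallel).
    group = [run_rollout(f"task {i}") for i in range(4)]
    # 2. Training: send the grouped trajectories to the server;
    #    inference is blocked until the new LoRA is loaded into vLLM.
    total_trained += train_on([group])
```

In a real application, the completion call and the training call would go through the ART client to the GPU server; everything else (rollout orchestration and reward assignment) stays in your own code.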
&lt;h2 id=&#34;-supported-models&#34;&gt;🧩 Supported Models
&lt;/h2&gt;&lt;p&gt;ART should work with most vLLM/HuggingFace-transformers compatible causal language models, or at least the ones supported by &lt;a class=&#34;link&#34; href=&#34;https://docs.unsloth.ai/get-started/all-our-models&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Unsloth&lt;/a&gt;. Gemma 3 does not appear to be supported for the time being. If any other model isn&amp;rsquo;t working for you, please let us know on &lt;a class=&#34;link&#34; href=&#34;https://discord.gg/zbBHRUpwf4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Discord&lt;/a&gt; or open an issue on &lt;a class=&#34;link&#34; href=&#34;https://github.com/openpipe/art/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GitHub&lt;/a&gt;!&lt;/p&gt;
&lt;h2 id=&#34;-contributing&#34;&gt;🤝 Contributing
&lt;/h2&gt;&lt;p&gt;ART is in active development, and contributions are most welcome! Please see the &lt;a class=&#34;link&#34; href=&#34;CONTRIBUTING.md&#34; &gt;CONTRIBUTING.md&lt;/a&gt; file for more information.&lt;/p&gt;
&lt;h2 id=&#34;-citation&#34;&gt;📖 Citation
&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bibtex&#34; data-lang=&#34;bibtex&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nc&#34;&gt;@misc&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;nl&#34;&gt;hilton2025art&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;author&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;{Brad Hilton and Kyle Corbitt and David Corbitt and Saumya Gandhi and Angky William and Bohdan Kovalenskyi and Andie Jones}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;title&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;{ART: Agent Reinforcement Trainer}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;year&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;{2025}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;publisher&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;{GitHub}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;journal&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;{GitHub repository}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;howpublished&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s&#34;&gt;{\url{https://github.com/openpipe/art}}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;-license&#34;&gt;⚖️ License
&lt;/h2&gt;&lt;p&gt;This repository&amp;rsquo;s source code is available under the &lt;a class=&#34;link&#34; href=&#34;LICENSE&#34; &gt;Apache-2.0 License&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;-credits&#34;&gt;🙏 Credits
&lt;/h2&gt;&lt;p&gt;ART stands on the shoulders of giants. While we owe many of the ideas and early experiments that led to ART&amp;rsquo;s development to the open source RL community at large, we&amp;rsquo;re especially grateful to the authors of the following projects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/unslothai/unsloth&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Unsloth&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/vllm-project/vllm&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;vLLM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/huggingface/trl&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;trl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/pytorch/torchtune&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;torchtune&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/skypilot-org/skypilot&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;SkyPilot&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally, thank you to our partners who&amp;rsquo;ve helped us test ART in the wild! We&amp;rsquo;re excited to see what you all build with it.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>IsaacLab</title>
        <link>https://producthunt.programnotes.cn/en/p/isaaclab/</link>
        <pubDate>Fri, 04 Jul 2025 15:30:49 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/isaaclab/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1726154659420-d68f2ed9b816?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTE2MTQyMDR8&amp;ixlib=rb-4.1.0" alt="Featured image of post IsaacLab" /&gt;&lt;h1 id=&#34;isaac-simisaaclab&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/isaac-sim/IsaacLab&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;isaac-sim/IsaacLab&lt;/a&gt;
&lt;/h1&gt;&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/docs/source/_static/isaaclab.jpg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Isaac Lab&#34;
	
	
&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h1 id=&#34;isaac-lab&#34;&gt;Isaac Lab
&lt;/h1&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://docs.isaacsim.omniverse.nvidia.com/latest/index.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/IsaacSim-4.5.0-silver.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;IsaacSim&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://docs.python.org/3/whatsnew/3.10.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/python-3.10-blue.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Python&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://releases.ubuntu.com/20.04/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/platform-linux--64-orange.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Linux platform&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://www.microsoft.com/en-us/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/platform-windows--64-orange.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Windows platform&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/isaac-sim/IsaacLab/actions/workflows/pre-commit.yaml&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/github/actions/workflow/status/isaac-sim/IsaacLab/pre-commit.yaml?logo=pre-commit&amp;amp;logoColor=white&amp;amp;label=pre-commit&amp;amp;color=brightgreen&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;pre-commit&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/isaac-sim/IsaacLab/actions/workflows/docs.yaml&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/github/actions/workflow/status/isaac-sim/IsaacLab/docs.yaml?label=docs&amp;amp;color=brightgreen&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;docs status&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://opensource.org/licenses/BSD-3-Clause&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/license-BSD--3-yellow.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;License&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://opensource.org/license/apache-2-0&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/license-Apache--2.0-yellow.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;License&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Isaac Lab&lt;/strong&gt; is a GPU-accelerated, open-source framework designed to unify and simplify robotics research workflows, such as reinforcement learning, imitation learning, and motion planning. Built on &lt;a class=&#34;link&#34; href=&#34;https://docs.isaacsim.omniverse.nvidia.com/latest/index.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NVIDIA Isaac Sim&lt;/a&gt;, it combines fast and accurate physics and sensor simulation, making it an ideal choice for sim-to-real transfer in robotics.&lt;/p&gt;
&lt;p&gt;Isaac Lab provides developers with a range of essential features for accurate sensor simulation, such as RTX-based cameras, LIDAR, or contact sensors. The framework&amp;rsquo;s GPU acceleration enables users to run complex simulations and computations faster, which is key for iterative processes like reinforcement learning and data-intensive tasks. Moreover, Isaac Lab can run locally or be distributed across the cloud, offering flexibility for large-scale deployments.&lt;/p&gt;
&lt;h2 id=&#34;key-features&#34;&gt;Key Features
&lt;/h2&gt;&lt;p&gt;Isaac Lab offers a comprehensive set of tools and environments designed to facilitate robot learning:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Robots&lt;/strong&gt;: A diverse collection of robots, ranging from manipulators and quadrupeds to humanoids, with 16 commonly available models.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Environments&lt;/strong&gt;: Ready-to-train implementations of more than 30 environments, which can be trained with popular reinforcement learning frameworks such as RSL RL, SKRL, RL Games, or Stable Baselines. We also support multi-agent reinforcement learning.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Physics&lt;/strong&gt;: Rigid bodies, articulated systems, and deformable objects.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sensors&lt;/strong&gt;: RGB/depth/segmentation cameras, camera annotations, IMU, contact sensors, ray casters.&lt;/li&gt;
&lt;/ul&gt;
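The GPU acceleration mentioned above comes from stepping many environments as one batched operation rather than one at a time. A plain-NumPy sketch of that vectorized pattern (conceptual only, not Isaac Lab's actual API; here 8 toy 1-D point-mass environments step on the CPU):

```python
# Conceptual sketch of a batched environment step: every array carries one
# entry per environment, so a single call advances all of them at once.
# Isaac Lab runs thousands of such environments in parallel on the GPU.
import numpy as np

num_envs, dt = 8, 0.02
pos = np.zeros(num_envs)          # one state per environment
vel = np.zeros(num_envs)

def step(actions):
    """Advance all environments simultaneously; actions has shape (num_envs,)."""
    global pos, vel
    vel += actions * dt
    pos += vel * dt
    rewards = -np.abs(pos)        # e.g. reward for staying near the origin
    return pos.copy(), rewards

obs, rew = step(np.ones(num_envs))   # same action applied in every env
print(obs.shape, rew.shape)          # both are batched: one entry per env
```

Because observations, actions, and rewards are all batched tensors, the same structure maps directly onto GPU kernels, which is what makes the iterative rollout loops of reinforcement learning fast in practice.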
&lt;h2 id=&#34;getting-started&#34;&gt;Getting Started
&lt;/h2&gt;&lt;p&gt;Our &lt;a class=&#34;link&#34; href=&#34;https://isaac-sim.github.io/IsaacLab&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;documentation page&lt;/a&gt; provides everything you need to get started, including detailed tutorials and step-by-step guides. Follow these links to learn more about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://isaac-sim.github.io/IsaacLab/main/source/setup/installation/index.html#local-installation&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Installation steps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://isaac-sim.github.io/IsaacLab/main/source/overview/reinforcement-learning/rl_existing_scripts.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Reinforcement learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://isaac-sim.github.io/IsaacLab/main/source/tutorials/index.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Tutorials&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://isaac-sim.github.io/IsaacLab/main/source/overview/environments.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Available environments&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;isaac-sim-version-dependency&#34;&gt;Isaac Sim Version Dependency
&lt;/h2&gt;&lt;p&gt;Isaac Lab is built on top of Isaac Sim and requires specific versions of Isaac Sim that are compatible with each release of Isaac Lab.
Below, we outline the recent Isaac Lab releases and GitHub branches and their corresponding dependency versions for Isaac Sim.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Isaac Lab Version&lt;/th&gt;
          &lt;th&gt;Isaac Sim Version&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;main&lt;/code&gt; branch&lt;/td&gt;
          &lt;td&gt;Isaac Sim 4.5&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;v2.1.0&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Isaac Sim 4.5&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;v2.0.2&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Isaac Sim 4.5&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;v2.0.1&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Isaac Sim 4.5&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;v2.0.0&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Isaac Sim 4.5&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;feature/isaacsim_5_0&lt;/code&gt; branch&lt;/td&gt;
          &lt;td&gt;Isaac Sim 5.0&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Note that the &lt;code&gt;feature/isaacsim_5_0&lt;/code&gt; branch will receive active updates and may contain breaking changes
until the official Isaac Lab 2.2 release.
It currently requires the &lt;a class=&#34;link&#34; href=&#34;https://github.com/isaac-sim/IsaacSim&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Isaac Sim 5.0 branch&lt;/a&gt; on GitHub, built from source.
Please refer to the README in the &lt;code&gt;feature/isaacsim_5_0&lt;/code&gt; branch for instructions on using Isaac Lab with Isaac Sim 5.0.
We are actively working on introducing backwards-compatibility support for Isaac Sim 4.5 on this branch.&lt;/p&gt;
&lt;h2 id=&#34;contributing-to-isaac-lab&#34;&gt;Contributing to Isaac Lab
&lt;/h2&gt;&lt;p&gt;We wholeheartedly welcome contributions from the community to make this framework mature and useful for everyone.
These may happen as bug reports, feature requests, or code contributions. For details, please check our
&lt;a class=&#34;link&#34; href=&#34;https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;contribution guidelines&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;show--tell-share-your-inspiration&#34;&gt;Show &amp;amp; Tell: Share Your Inspiration
&lt;/h2&gt;&lt;p&gt;We encourage you to utilize our &lt;a class=&#34;link&#34; href=&#34;https://github.com/isaac-sim/IsaacLab/discussions/categories/show-and-tell&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Show &amp;amp; Tell&lt;/a&gt; area in the
&lt;code&gt;Discussions&lt;/code&gt; section of this repository. This space is designed for you to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Share the tutorials you&amp;rsquo;ve created&lt;/li&gt;
&lt;li&gt;Showcase your learning content&lt;/li&gt;
&lt;li&gt;Present exciting projects you&amp;rsquo;ve developed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By sharing your work, you&amp;rsquo;ll inspire others and contribute to the collective knowledge
of our community. Your contributions can spark new ideas and collaborations, fostering
innovation in robotics and simulation.&lt;/p&gt;
&lt;h2 id=&#34;troubleshooting&#34;&gt;Troubleshooting
&lt;/h2&gt;&lt;p&gt;Please see the &lt;a class=&#34;link&#34; href=&#34;https://isaac-sim.github.io/IsaacLab/main/source/refs/troubleshooting.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;troubleshooting&lt;/a&gt; section for
common fixes or &lt;a class=&#34;link&#34; href=&#34;https://github.com/isaac-sim/IsaacLab/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;submit an issue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For issues related to Isaac Sim, we recommend checking its &lt;a class=&#34;link&#34; href=&#34;https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/overview.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;documentation&lt;/a&gt;
or opening a question on its &lt;a class=&#34;link&#34; href=&#34;https://forums.developer.nvidia.com/c/agx-autonomous-machines/isaac/67&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;forums&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;support&#34;&gt;Support
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Please use GitHub &lt;a class=&#34;link&#34; href=&#34;https://github.com/isaac-sim/IsaacLab/discussions&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Discussions&lt;/a&gt; for discussing ideas, asking questions, and requests for new features.&lt;/li&gt;
&lt;li&gt;Github &lt;a class=&#34;link&#34; href=&#34;https://github.com/isaac-sim/IsaacLab/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Issues&lt;/a&gt; should only be used to track executable pieces of work with a definite scope and a clear deliverable. These can be fixing bugs, documentation issues, new features, or general updates.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;connect-with-the-nvidia-omniverse-community&#34;&gt;Connect with the NVIDIA Omniverse Community
&lt;/h2&gt;&lt;p&gt;Do you have a project or resource you&amp;rsquo;d like to share more widely? We&amp;rsquo;d love to hear from you!
Reach out to the NVIDIA Omniverse Community team at &lt;a class=&#34;link&#34; href=&#34;mailto:OmniverseCommunity@nvidia.com&#34; &gt;OmniverseCommunity@nvidia.com&lt;/a&gt; to explore opportunities
to spotlight your work.&lt;/p&gt;
&lt;p&gt;You can also join the conversation on the &lt;a class=&#34;link&#34; href=&#34;https://discord.com/invite/nvidiaomniverse&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Omniverse Discord&lt;/a&gt; to
connect with other developers, share your projects, and help grow a vibrant, collaborative ecosystem
where creativity and technology intersect. Your contributions can make a meaningful impact on the Isaac Lab community and beyond!&lt;/p&gt;
&lt;h2 id=&#34;license&#34;&gt;License
&lt;/h2&gt;&lt;p&gt;The Isaac Lab framework is released under &lt;a class=&#34;link&#34; href=&#34;LICENSE&#34; &gt;BSD-3 License&lt;/a&gt;. The &lt;code&gt;isaaclab_mimic&lt;/code&gt; extension and its corresponding standalone scripts are released under &lt;a class=&#34;link&#34; href=&#34;LICENSE-mimic&#34; &gt;Apache 2.0&lt;/a&gt;. The license files of its dependencies and assets are present in the &lt;a class=&#34;link&#34; href=&#34;docs/licenses&#34; &gt;&lt;code&gt;docs/licenses&lt;/code&gt;&lt;/a&gt; directory.&lt;/p&gt;
&lt;h2 id=&#34;acknowledgement&#34;&gt;Acknowledgement
&lt;/h2&gt;&lt;p&gt;Isaac Lab development initiated from the &lt;a class=&#34;link&#34; href=&#34;https://isaac-orbit.github.io/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Orbit&lt;/a&gt; framework. We would appreciate it if you cite it in academic publications as well:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;@article{mittal2023orbit,
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   author={Mittal, Mayank and Yu, Calvin and Yu, Qinxi and Liu, Jingzhou and Rudin, Nikita and Hoeller, David and Yuan, Jia Lin and Singh, Ritvik and Guo, Yunrong and Mazhar, Hammad and Mandlekar, Ajay and Babich, Buck and State, Gavriel and Hutter, Marco and Garg, Animesh},
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   journal={IEEE Robotics and Automation Letters},
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   title={Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments},
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   year={2023},
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   volume={8},
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   number={6},
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   pages={3740-3747},
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   doi={10.1109/LRA.2023.3270034}
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
        </item>
        <item>
        <title>rl-swarm</title>
        <link>https://producthunt.programnotes.cn/en/p/rl-swarm/</link>
        <pubDate>Sat, 28 Jun 2025 15:28:05 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/rl-swarm/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1584785933913-feb6e407f2a2?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTEwOTU2MjR8&amp;ixlib=rb-4.1.0" alt="Featured image of post rl-swarm" /&gt;&lt;h1 id=&#34;gensyn-airl-swarm&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/gensyn-ai/rl-swarm&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;gensyn-ai/rl-swarm&lt;/a&gt;
&lt;/h1&gt;&lt;h1 id=&#34;rl-swarm&#34;&gt;RL Swarm
&lt;/h1&gt;&lt;p&gt;RL Swarm is a peer-to-peer system for reinforcement learning. It allows you to train models collaboratively with others in the swarm, leveraging their collective intelligence. It is open source and permissionless, meaning you can run it on a consumer laptop at home or on a powerful GPU in the cloud. You can also connect your model to the Gensyn Testnet to receive an on-chain identity that tracks your progress over time.&lt;/p&gt;
&lt;p&gt;Currently, we are running the &lt;a class=&#34;link&#34; href=&#34;https://github.com/open-thought/reasoning-gym/tree/main&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;reasoning-gym&lt;/a&gt; swarm on the Testnet. This swarm is designed to train models to solve a diverse set of reasoning tasks using the reasoning-gym dataset. The current list of default models includes:&lt;/p&gt;
&lt;p&gt;Models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Gensyn/Qwen2.5-0.5B-Instruct&lt;/li&gt;
&lt;li&gt;Qwen/Qwen3-0.6B&lt;/li&gt;
&lt;li&gt;nvidia/AceInstruct-1.5B&lt;/li&gt;
&lt;li&gt;dnotitia/Smoothie-Qwen3-1.7B&lt;/li&gt;
&lt;li&gt;Gensyn/Qwen2.5-1.5B-Instruct&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This iteration of rl-swarm is powered by the &lt;a class=&#34;link&#34; href=&#34;https://github.com/gensyn-ai/genrl-swarm&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GenRL-Swarm&lt;/a&gt; library. It is a fully composable framework for decentralized reinforcement learning that enables users to create and customize their own swarms with multi-agent, multi-stage environments.&lt;/p&gt;
&lt;h2 id=&#34;requirements&#34;&gt;Requirements
&lt;/h2&gt;&lt;p&gt;Your hardware requirements will vary depending on a number of factors, including model size and the accelerator platform you use. Users running a large NVIDIA GPU will be assigned a model from the large model pool, while users running less powerful hardware will be assigned a model from the small model pool. This design decision is intended to allow users to advance at a similar rate regardless of the hardware they use, maximizing their utility to the swarm.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Supported Hardware&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;arm64 or x86 CPU with a minimum of 32 GB RAM (note that running other applications during training may crash training).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;OR&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CUDA devices (officially supported):
&lt;ul&gt;
&lt;li&gt;RTX 3090&lt;/li&gt;
&lt;li&gt;RTX 4090&lt;/li&gt;
&lt;li&gt;RTX 5090&lt;/li&gt;
&lt;li&gt;A100&lt;/li&gt;
&lt;li&gt;H100&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With either configuration, you will need Python &amp;gt;=3.10 (on macOS, you will likely need to upgrade).&lt;/p&gt;
&lt;h2 id=&#34;-please-read-before-continuing-&#34;&gt;⚠️ Please read before continuing ⚠️
&lt;/h2&gt;&lt;p&gt;This software is &lt;strong&gt;experimental&lt;/strong&gt; and provided as-is for users who are interested in using (or helping to develop) an early version of the Gensyn Protocol for training models.&lt;/p&gt;
&lt;p&gt;If you care about on-chain participation, you &lt;strong&gt;must&lt;/strong&gt; read the &lt;a class=&#34;link&#34; href=&#34;#identity-management&#34; &gt;Identity Management&lt;/a&gt; section below.&lt;/p&gt;
&lt;p&gt;If you encounter issues, please first check &lt;a class=&#34;link&#34; href=&#34;#troubleshooting&#34; &gt;Troubleshooting&lt;/a&gt;. If you cannot find a solution there, please check if there is an open (or closed) &lt;a class=&#34;link&#34; href=&#34;../../issues&#34; &gt;Issue&lt;/a&gt;. If there is no relevant issue, please file one and include 1) all relevant &lt;a class=&#34;link&#34; href=&#34;#troubleshooting&#34; &gt;logs&lt;/a&gt;, 2) information about your device (e.g. which GPU, if relevant), and 3) your operating system information.&lt;/p&gt;
&lt;h2 id=&#34;instructions&#34;&gt;Instructions
&lt;/h2&gt;&lt;h3 id=&#34;run-the-swarm&#34;&gt;Run the Swarm
&lt;/h3&gt;&lt;p&gt;The easiest way to run RL Swarm is using Docker. This ensures a consistent setup across all operating systems with minimal dependencies.&lt;/p&gt;
&lt;h4 id=&#34;1-clone-this-repo&#34;&gt;1. Clone this repo
&lt;/h4&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone https://github.com/gensyn-ai/rl-swarm
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h4 id=&#34;2-install-docker&#34;&gt;2. Install Docker
&lt;/h4&gt;&lt;p&gt;Make sure you have Docker installed and the Docker daemon is running on your machine. To do that, follow &lt;a class=&#34;link&#34; href=&#34;https://docs.docker.com/get-started/get-docker/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;these instructions&lt;/a&gt; according to your OS. Ensure you allot sufficient memory to the Docker containers. For example if using Docker Desktop, this can be done by going to Docker Desktop Settings &amp;gt; Resources &amp;gt; Advanced &amp;gt; Memory Limit, and increasing it to the maximum possible value.&lt;/p&gt;
&lt;h4 id=&#34;3-start-the-swarm&#34;&gt;3. Start the Swarm
&lt;/h4&gt;&lt;p&gt;Run the following commands from the root of the repository.&lt;/p&gt;
&lt;h5 id=&#34;cpu-support&#34;&gt;CPU support
&lt;/h5&gt;&lt;p&gt;If you’re using a Mac or if your machine has CPU-only support:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker-compose run --rm --build -Pit swarm-cpu
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h5 id=&#34;gpu-support&#34;&gt;GPU support
&lt;/h5&gt;&lt;p&gt;If you&amp;rsquo;re using a machine with an officially supported GPU:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker-compose run --rm --build -Pit swarm-gpu
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h5 id=&#34;docker-compose-issue&#34;&gt;Docker compose issue
&lt;/h5&gt;&lt;p&gt;If &lt;code&gt;docker-compose&lt;/code&gt; does not work when running the above commands, please try &lt;code&gt;docker compose&lt;/code&gt; (no hyphen) instead, i.e. &lt;code&gt;docker compose run --rm --build -Pit swarm-gpu&lt;/code&gt;. This issue sometimes occurs for users running Ubuntu.&lt;/p&gt;
&lt;h3 id=&#34;experimental-advanced-mode&#34;&gt;Experimental (advanced) mode
&lt;/h3&gt;&lt;p&gt;If you want to experiment with the &lt;a class=&#34;link&#34; href=&#34;https://github.com/gensyn-ai/genrl-swarm&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GenRL-Swarm&lt;/a&gt; library and its &lt;a class=&#34;link&#34; href=&#34;https://github.com/gensyn-ai/genrl-swarm/blob/main/recipes/rgym/rg-swarm.yaml&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;configurable parameters&lt;/a&gt;, we recommend you run RL Swarm via shell script:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python3 -m venv .venv
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;source&lt;/span&gt; .venv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./run_rl_swarm.sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;To learn more about experimental mode, check out our &lt;a class=&#34;link&#34; href=&#34;https://github.com/gensyn-ai/genrl-swarm/blob/main/getting_started.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;getting started guide&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;login&#34;&gt;Login
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;A browser window will pop open (you&amp;rsquo;ll need to manually navigate to http://localhost:3000/ if you&amp;rsquo;re on a VM).&lt;/li&gt;
&lt;li&gt;Click &amp;lsquo;login&amp;rsquo;.&lt;/li&gt;
&lt;li&gt;Login with your preferred method.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;huggingface&#34;&gt;Hugging Face
&lt;/h3&gt;&lt;p&gt;If you would like to upload your model to Hugging Face, enter your Hugging Face access token when prompted. You can generate one from your Hugging Face account, under &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/docs/hub/en/security-tokens&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Access Tokens&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;initial-peering-and-training&#34;&gt;Initial peering and training
&lt;/h3&gt;&lt;p&gt;From this stage onward your device will begin training. You should see your peer register and vote on-chain &lt;a class=&#34;link&#34; href=&#34;https://gensyn-testnet.explorer.alchemy.com/address/0xFaD7C5e93f28257429569B854151A1B8DCD404c2?tab=logs&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can also track your training progress in real time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;On the RL Swarm Dashboard: &lt;a class=&#34;link&#34; href=&#34;https://dashboard.gensyn.ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;dashboard.gensyn.ai&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;identity-management&#34;&gt;Identity management
&lt;/h2&gt;&lt;h3 id=&#34;introduction&#34;&gt;Introduction
&lt;/h3&gt;&lt;p&gt;On-chain identity is managed via an Alchemy modal sign-in screen. You need to supply an email address or log in via a supported method (e.g. Google). This creates an EOA public/private keypair (stored by Alchemy). You will also receive local session keys, stored in &lt;code&gt;userApiKey&lt;/code&gt;; note that these aren&amp;rsquo;t your EOA public/private keys.&lt;/p&gt;
&lt;p&gt;During the initial set-up process, you will also create a &lt;code&gt;swarm.pem&lt;/code&gt; file, which maintains the identity of your peer. This is then registered on-chain using the EOA wallet hosted in Alchemy, triggered using your local API keys. This links the &lt;code&gt;swarm.pem&lt;/code&gt; to the &lt;code&gt;email address&lt;/code&gt; (and corresponding EOA in Alchemy).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;If you want to link multiple nodes to a single EOA&lt;/strong&gt;, simply sign up each node using the same email address. You will get a new peer ID for each node, however they will all be linked to the same EOA that your email is linked to.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Please note&lt;/strong&gt;: if you are using a fork of this repo, or a service organised by someone else (e.g. a &amp;lsquo;one click deployment&amp;rsquo; provider), the identity management flow below is not guaranteed.&lt;/p&gt;
&lt;h3 id=&#34;what-this-means&#34;&gt;What this means
&lt;/h3&gt;&lt;p&gt;In the following two scenarios, everything will work (i.e. you will have an on-chain identity linked with your RL Swarm peer training):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The very first time you run the node from scratch with a new email address. The smart account will be created fresh and linked with a freshly generated &lt;code&gt;swarm.pem&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If you run it again with a &lt;code&gt;swarm.pem&lt;/code&gt; AND log in with the original &lt;code&gt;email address&lt;/code&gt; used with that &lt;code&gt;swarm.pem&lt;/code&gt;. Note: this will log an error on registration but will still be able to sign transactions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the following scenario, it will not work (i.e. you won&amp;rsquo;t have an on-chain identity linked with your RL Swarm peer training):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you keep your &lt;code&gt;swarm.pem&lt;/code&gt; and try to link it to an &lt;code&gt;email address&lt;/code&gt; distinct from the one with which it was first registered.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, you should take the following actions in these scenarios:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Signed up with &lt;code&gt;email address&lt;/code&gt;, generated &lt;code&gt;swarm.pem&lt;/code&gt;, BUT lost &lt;code&gt;swarm.pem&lt;/code&gt;&lt;/strong&gt; OR &lt;strong&gt;You want to run multiple nodes at once&lt;/strong&gt;: run from scratch with the same email address and generate a new &lt;code&gt;swarm.pem&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Signed up with &lt;code&gt;email address&lt;/code&gt;, generated &lt;code&gt;swarm.pem&lt;/code&gt;, kept &lt;code&gt;swarm.pem&lt;/code&gt;&lt;/strong&gt; -&amp;gt; you can re-run a single node using this pair if you&amp;rsquo;ve still got them both.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;troubleshooting&#34;&gt;Troubleshooting
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;How do I find my logs?&lt;/strong&gt; You can find them inside the &lt;code&gt;/logs&lt;/code&gt; directory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;yarn.log&lt;/code&gt;: This file contains logs for the modal login server.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;swarm.log&lt;/code&gt;: This is the main log file for the RL Swarm application.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;wandb/&lt;/code&gt;: This directory contains various logs related to your training runs, including a &lt;code&gt;debug.log&lt;/code&gt; file. These can be uploaded to Weights &amp;amp; Biases (only available if you set &lt;code&gt;log_with&lt;/code&gt; to wandb).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;My peer &amp;lsquo;skipped a round&amp;rsquo;&lt;/strong&gt;: this occurs when your device isn&amp;rsquo;t fast enough to keep up with the pace of the swarm. For example, if you start training at round 100 and by the time you finish training the rest of the swarm reaches round 102, you will skip round 101 and go straight to 102. This is because your peer is more valuable if it is participating in the active round.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;My model doesn&amp;rsquo;t seem to be training?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you&amp;rsquo;re using a consumer device (e.g. a MacBook), it is likely just running slowly - check back in 20 minutes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Logging in with a new account after previous login?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Make sure you click &amp;lsquo;Logout&amp;rsquo; on the login screen before you leave your previous session&lt;/li&gt;
&lt;li&gt;Make sure you delete &lt;code&gt;swarm.pem&lt;/code&gt; from the root directory (try &lt;code&gt;sudo rm swarm.pem&lt;/code&gt;). If you don&amp;rsquo;t do this, and you previously registered with the peer-id stored in this file, it will disrupt the training process.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Issues with the Login screen&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Upgrade viem&lt;/strong&gt;: some users report issues with the &lt;code&gt;viem&lt;/code&gt; package. There are two fixes:
&lt;ul&gt;
&lt;li&gt;in the &lt;code&gt;modal-login/package.json&lt;/code&gt; update: &lt;code&gt;&amp;quot;viem&amp;quot;: &amp;quot;2.25.0&amp;quot;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;in the terminal &lt;code&gt;cd /root/rl-swarm/modal-login/ &amp;amp;&amp;amp; yarn upgrade &amp;amp;&amp;amp; yarn add next@latest &amp;amp;&amp;amp; yarn add viem@latest&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;I&amp;rsquo;m getting lots of warnings&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This is expected behaviour, usually output from package managers or other dependencies. The most common is the Protobuf warning below, which can be ignored:
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-gdscript3&#34; data-lang=&#34;gdscript3&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;WARNING&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;The&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;candidate&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;selected&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;download&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;or&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;install&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;is&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;a&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;yanked&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;version&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;protobuf&amp;#39;&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;candidate&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Issues on VMs/VPSs?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;How do I access the login screen if I&amp;rsquo;m running in a VM?&lt;/strong&gt;: port forwarding. Add this SSH flag: &lt;code&gt;-L 3000:localhost:3000&lt;/code&gt; when connecting to your VM. E.g. &lt;code&gt;gcloud compute ssh --zone &amp;quot;us-central1-a&amp;quot; [your-vm] --project [your-project] -- -L 3000:localhost:3000&lt;/code&gt;. Note, some VPSs may not work with &lt;code&gt;rl-swarm&lt;/code&gt;. Check the Gensyn &lt;a class=&#34;link&#34; href=&#34;https://discord.gg/AdnyWNzXh5&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;discord&lt;/a&gt; for up-to-date information on this.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Disconnection/general issues&lt;/strong&gt;: If you are tunneling to a VM and suffer a broken pipe, you will likely encounter OOM errors or unexpected behaviour the first time you relaunch the script. If you press &lt;code&gt;Ctrl+C&lt;/code&gt; and kill the script, it should spin down all background processes. Restart the script and everything should work normally.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Issues with npm/general installation?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Try  &lt;code&gt;npm install -g node@latest&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;OOM errors on MacBook?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Try this (experimental) fix to increase memory:
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-gdscript3&#34; data-lang=&#34;gdscript3&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;export&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;PYTORCH_MPS_HIGH_WATERMARK_RATIO&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;I have a Windows machine, can I still train a model on the swarm?&lt;/strong&gt;: Yes - but this is not very well tested and may require you to do some debugging to get it set up properly. Install WSL and Linux on your Windows machine using the following instructions: &lt;a class=&#34;link&#34; href=&#34;https://learn.microsoft.com/en-us/windows/wsl/install&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://learn.microsoft.com/en-us/windows/wsl/install&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;I want to move my node to a different machine and/or restart with a fresh build of the repo, but I want my animal name/peer id to persist.&lt;/strong&gt;: To achieve this, simply back up the &lt;code&gt;swarm.pem&lt;/code&gt; file on your current machine and then put it in the corresponding location on your new machine/build of the repo.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;I have multiple GPUs on one machine, can I run multiple peers?&lt;/strong&gt;: Yes - but you&amp;rsquo;ll need to make some manual changes. You&amp;rsquo;ll need to isolate each GPU, install this repo for each GPU, and expose each peer under a different port to pass the modal onboarding.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;My round/stage is behind the smart contract/other peers?&lt;/strong&gt;: This is expected behaviour given the different speeds of machines in the network. Once your machine completes its current round, it will move to the current round.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;I want to use a bigger and/or different model in the RL swarm, can I do that?&lt;/strong&gt;: Yes - but we only recommend doing so if you understand what size of model can reasonably run on your hardware. If you elect to bring a custom model, just paste the repo/model name into the command line when prompted.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;I am running a model in the swarm on my CPU, have received a Python &lt;code&gt;RuntimeError&lt;/code&gt;, and my training progress seems to have stopped.&lt;/strong&gt;: There are several possible causes, but before trying anything, wait long enough to be sure your training is actually frozen and not just slow (e.g. wait longer than a single training iteration has previously taken on your machine). If you&amp;rsquo;re sure training is frozen, some things to try are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Set this (experimental) fix: &lt;code&gt;export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 &amp;amp;&amp;amp; ./run_rl_swarm.sh&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
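For the multi-GPU question above, here is a minimal sketch of what &amp;lsquo;isolate each GPU and expose each peer under a different port&amp;rsquo; could look like. The clone paths and the &lt;code&gt;PORT&lt;/code&gt; variable are illustrative assumptions, not rl-swarm&amp;rsquo;s documented interface; the loop only prints the commands you would run, one per GPU.

```shell
# Hypothetical sketch: one clone of the repo per GPU, each pinned with
# CUDA_VISIBLE_DEVICES and given its own login port. Paths and the PORT
# variable are assumptions; adapt them to your actual setup.
for gpu in 0 1; do
  echo "cd ~/rl-swarm-gpu$gpu && CUDA_VISIBLE_DEVICES=$gpu PORT=$((3000 + gpu)) ./run_rl_swarm.sh"
done
```

`CUDA_VISIBLE_DEVICES` is the standard NVIDIA mechanism for restricting a process to one GPU; the per-peer port is what lets each node complete the modal login separately.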
</description>
        </item>
        <item>
        <title>all-rag-techniques</title>
        <link>https://producthunt.programnotes.cn/en/p/all-rag-techniques/</link>
        <pubDate>Mon, 16 Jun 2025 15:31:49 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/all-rag-techniques/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1510792047925-c55a452bbad7?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTAwNTkwNDV8&amp;ixlib=rb-4.1.0" alt="Featured image of post all-rag-techniques" /&gt;&lt;h1 id=&#34;fareedkhan-devall-rag-techniques&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/FareedKhan-dev/all-rag-techniques&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;FareedKhan-dev/all-rag-techniques&lt;/a&gt;
&lt;/h1&gt;&lt;h1 id=&#34;all-rag-techniques-a-simpler-hands-on-approach-&#34;&gt;All RAG Techniques: A Simpler, Hands-On Approach ✨
&lt;/h1&gt;&lt;p&gt;Read this in your preferred language:&lt;br&gt;
&lt;a class=&#34;link&#34; href=&#34;https://www.readme-i18n.com/FareedKhan-dev/all-rag-techniques?lang=de&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Deutsch&lt;/a&gt; |  &lt;a class=&#34;link&#34; href=&#34;https://www.readme-i18n.com/FareedKhan-dev/all-rag-techniques?lang=es&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Español&lt;/a&gt; |  &lt;a class=&#34;link&#34; href=&#34;https://www.readme-i18n.com/FareedKhan-dev/all-rag-techniques?lang=fr&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Français&lt;/a&gt; |  &lt;a class=&#34;link&#34; href=&#34;https://www.readme-i18n.com/FareedKhan-dev/all-rag-techniques?lang=ja&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;日本語&lt;/a&gt; |  &lt;a class=&#34;link&#34; href=&#34;https://www.readme-i18n.com/FareedKhan-dev/all-rag-techniques?lang=ko&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;한국어&lt;/a&gt; |  &lt;a class=&#34;link&#34; href=&#34;https://www.readme-i18n.com/FareedKhan-dev/all-rag-techniques?lang=pt&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Português&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://www.readme-i18n.com/FareedKhan-dev/all-rag-techniques?lang=ru&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Русский&lt;/a&gt; |  &lt;a class=&#34;link&#34; href=&#34;https://www.readme-i18n.com/FareedKhan-dev/all-rag-techniques?lang=zh&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;中文&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.python.org/downloads/release/python-370/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/python-3.7&amp;#43;-blue.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Python 3.7&amp;#43;&#34;
	
	
&gt;&lt;/a&gt; &lt;a class=&#34;link&#34; href=&#34;https://cloud.nebius.ai/services/llm-embedding&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Nebius%20AI-API-brightgreen&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Nebius AI&#34;
	
	
&gt;&lt;/a&gt; &lt;a class=&#34;link&#34; href=&#34;https://openai.com/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/OpenAI-API-lightgrey&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;OpenAI&#34;
	
	
&gt;&lt;/a&gt; &lt;a class=&#34;link&#34; href=&#34;https://medium.com/@fareedkhandev/testing-every-rag-technique-to-find-the-best-094d166af27f&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Medium-Blog-black?logo=medium&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Medium&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This repository takes a clear, hands-on approach to &lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt;, breaking down advanced techniques into straightforward, understandable implementations. Instead of relying on frameworks like &lt;code&gt;LangChain&lt;/code&gt; or &lt;code&gt;FAISS&lt;/code&gt;, everything here is built using familiar Python libraries: &lt;code&gt;openai&lt;/code&gt;, &lt;code&gt;numpy&lt;/code&gt;, &lt;code&gt;matplotlib&lt;/code&gt;, and a few others.&lt;/p&gt;
&lt;p&gt;The goal is simple: provide code that is readable, modifiable, and educational. By focusing on the fundamentals, this project helps demystify RAG and makes it easier to understand how it really works.&lt;/p&gt;
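As a taste of that from-scratch style, here is a minimal retrieval step using only numpy. In the notebooks the embedding vectors come from an embeddings API (e.g. the openai client); the toy 3-d vectors below are stand-ins so the sketch runs on its own.

```python
import numpy as np

def retrieve(query_vec, chunk_vecs, chunks, k=2):
    """Return the top-k chunks ranked by cosine similarity to the query."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(chunk_vecs, dtype=float)
    # Normalize, then a matrix-vector product gives all cosine similarities.
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    scores = m @ q
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

# Toy 3-d "embeddings" standing in for real model outputs.
chunks = ["RAG overview", "chunking strategies", "reranking"]
vecs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
print(retrieve([0.9, 0.2, 0.0], vecs, chunks, k=1))  # -> ['RAG overview']
```

Every notebook builds on this same primitive, swapping in real embeddings and layering the technique-specific logic on top.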
&lt;h2 id=&#34;update-&#34;&gt;Update: 📢
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;(12-May-2025) Added a new notebook on how to handle big data using Knowledge Graphs.&lt;/li&gt;
&lt;li&gt;(27-Apr-2025) Added a new notebook that finds the best RAG technique for a given query (Simple RAG + Reranker + Query Rewrite).&lt;/li&gt;
&lt;li&gt;(20-Mar-2025) Added a new notebook on RAG with Reinforcement Learning.&lt;/li&gt;
&lt;li&gt;(07-Mar-2025) Added 20 RAG techniques to the repository.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;-whats-inside&#34;&gt;🚀 What&amp;rsquo;s Inside?
&lt;/h2&gt;&lt;p&gt;This repository contains a collection of Jupyter Notebooks, each focusing on a specific RAG technique.  Each notebook provides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A concise explanation of the technique.&lt;/li&gt;
&lt;li&gt;A step-by-step implementation from scratch.&lt;/li&gt;
&lt;li&gt;Clear code examples with inline comments.&lt;/li&gt;
&lt;li&gt;Evaluations and comparisons to demonstrate the technique&amp;rsquo;s effectiveness.&lt;/li&gt;
&lt;li&gt;Visualizations of the results.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here&amp;rsquo;s a glimpse of the techniques covered:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;Notebook&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;Description&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;01_simple_rag.ipynb&#34; &gt;1. Simple RAG&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;A basic RAG implementation.  A great starting point!&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;02_semantic_chunking.ipynb&#34; &gt;2. Semantic Chunking&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Splits text based on semantic similarity for more meaningful chunks.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;03_chunk_size_selector.ipynb&#34; &gt;3. Chunk Size Selector&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Explores the impact of different chunk sizes on retrieval performance.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;04_context_enriched_rag.ipynb&#34; &gt;4. Context Enriched RAG&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Retrieves neighboring chunks to provide more context.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;05_contextual_chunk_headers_rag.ipynb&#34; &gt;5. Contextual Chunk Headers&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Prepends descriptive headers to each chunk before embedding.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;06_doc_augmentation_rag.ipynb&#34; &gt;6. Document Augmentation RAG&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Generates questions from text chunks to augment the retrieval process.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;07_query_transform.ipynb&#34; &gt;7. Query Transform&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Rewrites, expands, or decomposes queries to improve retrieval.  Includes &lt;strong&gt;Step-back Prompting&lt;/strong&gt; and &lt;strong&gt;Sub-query Decomposition&lt;/strong&gt;.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;08_reranker.ipynb&#34; &gt;8. Reranker&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Re-ranks initially retrieved results using an LLM for better relevance.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;09_rse.ipynb&#34; &gt;9. RSE&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Relevant Segment Extraction:  Identifies and reconstructs continuous segments of text, preserving context.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;10_contextual_compression.ipynb&#34; &gt;10. Contextual Compression&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Implements contextual compression to filter and compress retrieved chunks, maximizing relevant information.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;11_feedback_loop_rag.ipynb&#34; &gt;11. Feedback Loop RAG&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Incorporates user feedback to learn and improve RAG system over time.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;12_adaptive_rag.ipynb&#34; &gt;12. Adaptive RAG&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Dynamically selects the best retrieval strategy based on query type.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;13_self_rag.ipynb&#34; &gt;13. Self RAG&lt;/a&gt;&lt;/td&gt;
&lt;td style=&#34;text-align: left&#34;&gt;Implements Self-RAG, which dynamically decides when and how to retrieve, evaluates relevance, and assesses support and utility.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;14_proposition_chunking.ipynb&#34; &gt;14. Proposition Chunking&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Breaks down documents into atomic, factual statements for precise retrieval.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;15_multimodel_rag.ipynb&#34; &gt;15. Multimodel RAG&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Combines text and images for retrieval, generating captions for images using LLaVA.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;16_fusion_rag.ipynb&#34; &gt;16. Fusion RAG&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Combines vector search with keyword-based (BM25) retrieval for improved results.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;17_graph_rag.ipynb&#34; &gt;17. Graph RAG&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Organizes knowledge as a graph, enabling traversal of related concepts.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;18_hierarchy_rag.ipynb&#34; &gt;18. Hierarchy RAG&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Builds hierarchical indices (summaries + detailed chunks) for efficient retrieval.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;19_HyDE_rag.ipynb&#34; &gt;19. HyDE RAG&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Uses Hypothetical Document Embeddings to improve semantic matching.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;20_crag.ipynb&#34; &gt;20. CRAG&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Corrective RAG: Dynamically evaluates retrieval quality and uses web search as a fallback.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
&lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;21_rag_with_rl.ipynb&#34; &gt;21. RAG with RL&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Maximizes the reward of the RAG model using Reinforcement Learning.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;best_rag_finder.ipynb&#34; &gt;Best RAG Finder&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Finds the best RAG technique for a given query using Simple RAG + Reranker + Query Rewrite.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;22_Big_data_with_KG.ipynb&#34; &gt;22. Big Data with Knowledge Graphs&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;Handles large datasets using Knowledge Graphs.&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
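To give a flavour of one entry above: Fusion RAG (notebook 16) ultimately has to merge a dense (vector) score list and a sparse (BM25 keyword) score list for the same documents. One common way to do that, sketched here under the assumption of min-max normalization and a weighted sum (the notebook&amp;rsquo;s exact scheme may differ), is:

```python
import numpy as np

def min_max(scores):
    """Scale a score list to [0, 1]; a constant list maps to zeros."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    return np.zeros_like(s) if span == 0 else (s - s.min()) / span

def fuse(vector_scores, keyword_scores, alpha=0.5):
    """Weighted sum of normalized dense and sparse scores, per document."""
    return alpha * min_max(vector_scores) + (1 - alpha) * min_max(keyword_scores)

vec = [0.82, 0.40, 0.77]   # e.g. cosine similarities per document
bm25 = [1.2, 7.9, 3.4]     # e.g. BM25 keyword scores per document
fused = fuse(vec, bm25, alpha=0.5)
print(int(np.argmax(fused)))  # -> 2 (best document after fusion)
```

The normalization step matters because cosine similarities and BM25 scores live on very different scales; without it one signal silently dominates the other.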
&lt;h2 id=&#34;-repository-structure&#34;&gt;🗂️ Repository Structure
&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;22
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;23
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;24
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;25
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;26
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;27
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;28
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;29
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;30
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;fareedkhan-dev-all-rag-techniques/
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── README.md                          &amp;lt;- You are here!
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 01_simple_rag.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 02_semantic_chunking.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 03_chunk_size_selector.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 04_context_enriched_rag.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 05_contextual_chunk_headers_rag.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 06_doc_augmentation_rag.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 07_query_transform.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 08_reranker.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 09_rse.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 10_contextual_compression.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 11_feedback_loop_rag.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 12_adaptive_rag.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 13_self_rag.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 14_proposition_chunking.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 15_multimodel_rag.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 16_fusion_rag.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 17_graph_rag.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 18_hierarchy_rag.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 19_HyDE_rag.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 20_crag.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 21_rag_with_rl.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── 22_big_data_with_KG.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── best_rag_finder.ipynb
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;├── requirements.txt                   &amp;lt;- Python dependencies
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;└── data/
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    ├── val.json                       &amp;lt;- Sample validation data (queries and answers)
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    ├── AI_Information.pdf             &amp;lt;- A sample PDF document for testing.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    └── attention_is_all_you_need.pdf  &amp;lt;- A sample PDF document for testing (for Multi-Modal RAG).
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;-getting-started&#34;&gt;🛠️ Getting Started
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Clone the repository:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone https://github.com/FareedKhan-dev/all-rag-techniques.git
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; all-rag-techniques
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install dependencies:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install -r requirements.txt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set up your API key:&lt;/strong&gt; the notebooks use an OpenAI-compatible client, so the key is read from the &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; environment variable even though it comes from Nebius AI.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Obtain an API key from &lt;a class=&#34;link&#34; href=&#34;https://studio.nebius.com/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Nebius AI&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Set the API key as an environment variable:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;export&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;YOUR_NEBIUS_AI_API_KEY&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;or&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;setx OPENAI_API_KEY &lt;span class=&#34;s2&#34;&gt;&amp;#34;YOUR_NEBIUS_AI_API_KEY&amp;#34;&lt;/span&gt;  &lt;span class=&#34;c1&#34;&gt;# On Windows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;or, within your Python script/notebook:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;os&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;environ&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;OPENAI_API_KEY&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;YOUR_NEBIUS_AI_API_KEY&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Run the notebooks:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Open any of the Jupyter notebooks (&lt;code&gt;.ipynb&lt;/code&gt; files) in Jupyter Notebook or JupyterLab. Each notebook is self-contained and can be run independently; within a notebook, execute the cells in order from top to bottom.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The &lt;code&gt;data/AI_Information.pdf&lt;/code&gt; file provides a sample document for testing; you can replace it with your own PDF. The &lt;code&gt;data/val.json&lt;/code&gt; file contains sample queries and ideal answers for evaluation, and &lt;code&gt;data/attention_is_all_you_need.pdf&lt;/code&gt; is used by the Multi-Modal RAG notebook.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;-core-concepts&#34;&gt;💡 Core Concepts
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Embeddings:&lt;/strong&gt;  Numerical representations of text that capture semantic meaning.  We use Nebius AI&amp;rsquo;s embedding API and, in many notebooks, also the &lt;code&gt;BAAI/bge-en-icl&lt;/code&gt; embedding model.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Vector Store:&lt;/strong&gt;  A simple database to store and search embeddings.  We create our own &lt;code&gt;SimpleVectorStore&lt;/code&gt; class using NumPy for efficient similarity calculations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cosine Similarity:&lt;/strong&gt;  A measure of similarity between two vectors.  Higher values indicate greater similarity.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Chunking:&lt;/strong&gt;  Dividing text into smaller, manageable pieces.  We explore various chunking strategies.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Retrieval:&lt;/strong&gt; The process of finding the most relevant text chunks for a given query.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Generation:&lt;/strong&gt;  Using a Large Language Model (LLM) to create a response based on the retrieved context and the user&amp;rsquo;s query.  We use the &lt;code&gt;meta-llama/Llama-3.2-3B-Instruct&lt;/code&gt; model via Nebius AI&amp;rsquo;s API.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Evaluation:&lt;/strong&gt;  Assessing the quality of the RAG system&amp;rsquo;s responses, often by comparing them to a reference answer or using an LLM to score relevance.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
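The vector-store and cosine-similarity ideas above can be sketched in a few lines of NumPy. This is an illustrative toy with hand-made 3-dimensional "embeddings", not the exact `SimpleVectorStore` class used in the notebooks:

```python
import numpy as np

class SimpleVectorStore:
    """Minimal in-memory vector store ranked by cosine similarity (toy sketch)."""

    def __init__(self):
        self.texts = []
        self.vectors = []

    def add(self, text, vector):
        # Store the text alongside its embedding vector.
        self.texts.append(text)
        self.vectors.append(np.asarray(vector, dtype=float))

    def search(self, query_vector, k=3):
        # Cosine similarity: dot product of the vectors divided by
        # the product of their norms; higher means more similar.
        q = np.asarray(query_vector, dtype=float)
        sims = [
            float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
            for v in self.vectors
        ]
        top = np.argsort(sims)[::-1][:k]  # indices of the k highest scores
        return [(self.texts[i], sims[i]) for i in top]

# Toy usage with hand-made 3-d "embeddings".
store = SimpleVectorStore()
store.add("cats", [1.0, 0.0, 0.0])
store.add("dogs", [0.9, 0.1, 0.0])
store.add("math", [0.0, 0.0, 1.0])
print(store.search([1.0, 0.05, 0.0], k=2))
```

Real embeddings have hundreds or thousands of dimensions, but the search logic is the same: score every stored vector against the query and return the top `k`.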
&lt;h2 id=&#34;-contributing&#34;&gt;🤝 Contributing
&lt;/h2&gt;&lt;p&gt;Contributions are welcome!&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
