<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Deployment on Producthunt daily</title>
        <link>https://producthunt.programnotes.cn/en/tags/deployment/</link>
        <description>Recent content in Deployment on Producthunt daily</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Sat, 06 Dec 2025 15:27:23 +0800</lastBuildDate><atom:link href="https://producthunt.programnotes.cn/en/tags/deployment/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>fizzy</title>
        <link>https://producthunt.programnotes.cn/en/p/fizzy/</link>
        <pubDate>Sat, 06 Dec 2025 15:27:23 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/fizzy/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1544085311-11a028465b03?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NjUwMDYwMzZ8&amp;ixlib=rb-4.1.0" alt="Featured image of post fizzy" /&gt;&lt;h1 id=&#34;basecampfizzy&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/basecamp/fizzy&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;basecamp/fizzy&lt;/a&gt;
&lt;/h1&gt;&lt;h1 id=&#34;fizzy&#34;&gt;Fizzy
&lt;/h1&gt;&lt;p&gt;This is the source code of &lt;a class=&#34;link&#34; href=&#34;https://fizzy.do/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Fizzy&lt;/a&gt;, the Kanban tracking tool for issues and ideas by &lt;a class=&#34;link&#34; href=&#34;https://37signals.com&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;37signals&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;deploying-fizzy&#34;&gt;Deploying Fizzy
&lt;/h2&gt;&lt;p&gt;If you&amp;rsquo;d like to run Fizzy on your own server, we recommend deploying it with &lt;a class=&#34;link&#34; href=&#34;https://kamal-deploy.org/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Kamal&lt;/a&gt;.
Kamal makes it easier to set up a bare server, copy the application to it, and manage the configuration settings that it uses.&lt;/p&gt;
&lt;p&gt;(Kamal is also what we use to deploy Fizzy at 37signals. If you&amp;rsquo;re curious about what our deployment configuration looks like, you can find it inside &lt;a class=&#34;link&#34; href=&#34;https://github.com/basecamp/fizzy-saas&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;fizzy-saas&lt;/code&gt;&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;This repo contains a starter deployment file that you can modify for your own specific use. That file lives at &lt;a class=&#34;link&#34; href=&#34;config/deploy.yml&#34; &gt;config/deploy.yml&lt;/a&gt;, which is the default place where Kamal will look for it.&lt;/p&gt;
&lt;p&gt;The steps to configure your very own Fizzy are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Fork the repo&lt;/li&gt;
&lt;li&gt;Edit few things in config/deploy.yml, .kamal/secrets, and config/environments/production.rb&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;kamal setup&lt;/code&gt; to do your first deploy.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We&amp;rsquo;ll go through each of these in turn.&lt;/p&gt;
&lt;h3 id=&#34;fork-the-repo&#34;&gt;Fork the repo
&lt;/h3&gt;&lt;p&gt;To make it easy to customise Fizzy&amp;rsquo;s settings for your own instance, you should start by creating your own GitHub fork of the repo.
That allows you to commit your changes, and track them over time.
You can always re-sync your fork to pick up new changes from the main repo over time.&lt;/p&gt;
&lt;p&gt;Once you&amp;rsquo;ve got your fork ready, run &lt;code&gt;bin/setup&lt;/code&gt; from within it, to make sure everything is installed.&lt;/p&gt;
&lt;h3 id=&#34;editing-the-configuration&#34;&gt;Editing the configuration
&lt;/h3&gt;&lt;p&gt;The config/deploy.yml has been mostly set up for you, but you&amp;rsquo;ll need to fill out some sections that are specific to your instance.
To get started, the parts you need to change are all in the &amp;ldquo;About your deployment&amp;rdquo; section.
We&amp;rsquo;ve added comments to that file to highlight what each setting needs to be, but the main ones are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;servers/web&lt;/code&gt;: Enter the hostname of the server you&amp;rsquo;re deploying to here. This should be an address that you can access via &lt;code&gt;ssh&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ssh/user&lt;/code&gt;: If you access your server a &lt;code&gt;root&lt;/code&gt; you can leave this alone; if you use a different user, set it here.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;proxy/ssl&lt;/code&gt; and &lt;code&gt;proxy/host&lt;/code&gt;: Kamal can set up SSL certificates for you automatically. To enable that, set the hostname again as &lt;code&gt;host&lt;/code&gt;. If you don&amp;rsquo;t want SSL for some reason, you can set &lt;code&gt;ssl: false&lt;/code&gt; to turn it off.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;env/clear/MAILER_FROM_ADDRESS&lt;/code&gt;: This is the email address that Fizzy will send emails from. It should usually be an address from the same domain where you&amp;rsquo;re running Fizzy.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Fizzy also requires a few environment variables to be set up, some of which contain secrets.
The simplest way to do this is to put them in a file called &lt;code&gt;.kamal/secrets&lt;/code&gt;.
Because this file will contain secret credentials, it&amp;rsquo;s important that you DON&amp;rsquo;T CHECK THIS FILE INTO YOUR REPO! You can add the filename to &lt;code&gt;.gitignore&lt;/code&gt; to ensure you don&amp;rsquo;t commit this file accidentally.&lt;/p&gt;
&lt;p&gt;If you use a password manager like 1Password, you can also opt to keep your secrets there instead.
Refer to the &lt;a class=&#34;link&#34; href=&#34;https://kamal-deploy.org/docs/configuration/environment-variables/#secrets&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Kamal documentation&lt;/a&gt; for more information about how to do that.&lt;/p&gt;
&lt;p&gt;To store your secrets, create the file &lt;code&gt;.kamal/secrets&lt;/code&gt; and enter something like the following:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;SECRET_KEY_BASE=12345
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;VAPID_PUBLIC_KEY=something
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;VAPID_PRIVATE_KEY=somethingelse
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;SMTP_USERNAME=email-provider-username
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;SMTP_PASSWORD=email-provider-password
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The values you enter here will be specific to you, and you can get or create them as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;SECRET_KEY_BASE&lt;/code&gt; should be a long, random secret. You can run &lt;code&gt;bin/rails secret&lt;/code&gt; to create a suitable value for this.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;VAPID_PUBLIC_KEY&lt;/code&gt; &amp;amp; &lt;code&gt;VAPID_PRIVATE_KEY&lt;/code&gt; are a pair of credentials that are used for sending notifications. You can create your own keys by starting a development console with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;bin/rails c
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And then run the following to create a new pair of keys:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-ruby&#34; data-lang=&#34;ruby&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;vapid_key&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;no&#34;&gt;WebPush&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;generate_key&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;puts&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;VAPID_PRIVATE_KEY=&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;#{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;vapid_key&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;private_key&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;puts&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;VAPID_PUBLIC_KEY=&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;#{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;vapid_key&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;public_key&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;SMTP_USERNAME&lt;/code&gt;/&lt;code&gt;SMTP_PASSWORD&lt;/code&gt; are credentials you should get from your email provider.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Lastly, you&amp;rsquo;ll need to set up the rest of your email configuration in &lt;code&gt;config/environments/production.rb&lt;/code&gt;. There is an example configuration in comments at the top of that file. The actual settings you use here will depend on your email provider, but in most cases will look similar to that section, so you can uncomment it and edit to suit. Note that it will use the &lt;code&gt;SMTP_USERNAME&lt;/code&gt; and &lt;code&gt;SMTP_PASSWORD&lt;/code&gt; values you entered in your secrets.&lt;/p&gt;
&lt;p&gt;Once you&amp;rsquo;ve made all those changes, commit them to your fork so they&amp;rsquo;re saved.&lt;/p&gt;
&lt;h3 id=&#34;deploy-fizzy&#34;&gt;Deploy Fizzy!
&lt;/h3&gt;&lt;p&gt;You can now do your first deploy by running:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;bin/kamal setup
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will set up Docker (if needed), build your Fizzy app container, configure it, and start it running.&lt;/p&gt;
&lt;p&gt;After the first deploy is done, any subsequent steps won&amp;rsquo;t need to do that initial setup. So for future deploys you can just run:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;bin/kamal deploy
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;development&#34;&gt;Development
&lt;/h2&gt;&lt;h3 id=&#34;setting-up&#34;&gt;Setting up
&lt;/h3&gt;&lt;p&gt;First, get everything installed and configured with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;bin/setup
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;bin/setup --reset &lt;span class=&#34;c1&#34;&gt;# Reset the database and seed it&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;And then run the development server:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;bin/dev
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;You&amp;rsquo;ll be able to access the app in development at &lt;a class=&#34;link&#34; href=&#34;http://fizzy.localhost:3006&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;http://fizzy.localhost:3006&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To login, enter &lt;code&gt;david@example.com&lt;/code&gt; and grab the verification code from the browser console to sign in.&lt;/p&gt;
&lt;h3 id=&#34;web-push-notifications&#34;&gt;Web Push Notifications
&lt;/h3&gt;&lt;p&gt;Fizzy uses VAPID (Voluntary Application Server Identification) keys to send browser push notifications. For notifications to work in development you&amp;rsquo;ll need to generate a key pair and set these environment variables:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;VAPID_PRIVATE_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;VAPID_PUBLIC_KEY&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Generate them with the &lt;code&gt;web-push&lt;/code&gt; gem:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-ruby&#34; data-lang=&#34;ruby&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;vapid_key&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;no&#34;&gt;WebPush&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;generate_key&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;puts&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;VAPID_PRIVATE_KEY=&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;#{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;vapid_key&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;private_key&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;puts&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;VAPID_PUBLIC_KEY=&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;#{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;vapid_key&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;public_key&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;running-tests&#34;&gt;Running tests
&lt;/h3&gt;&lt;p&gt;For fast feedback loops, unit tests can be run with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;bin/rails test
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The full continuous integration tests can be run with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;bin/ci
&lt;/code&gt;&lt;/pre&gt;
&lt;h3 id=&#34;database-configuration&#34;&gt;Database configuration
&lt;/h3&gt;&lt;p&gt;Fizzy works with SQLite by default and supports MySQL too. You can switch adapters with the &lt;code&gt;DATABASE_ADAPTER&lt;/code&gt; environment variable. For example, to develop locally against MySQL:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;DATABASE_ADAPTER&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;mysql bin/setup --reset
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;DATABASE_ADAPTER&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;mysql bin/ci
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The remote CI pipeline will run tests against both SQLite and MySQL.&lt;/p&gt;
&lt;h3 id=&#34;outbound-emails&#34;&gt;Outbound Emails
&lt;/h3&gt;&lt;p&gt;You can view email previews at &lt;a class=&#34;link&#34; href=&#34;http://fizzy.localhost:3006/rails/mailers&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;http://fizzy.localhost:3006/rails/mailers&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can enable or disable &lt;a class=&#34;link&#34; href=&#34;https://github.com/ryanb/letter_opener&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;letter_opener&lt;/code&gt;&lt;/a&gt; to open sent emails automatically with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;bin/rails dev:email
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under the hood, this will create or remove &lt;code&gt;tmp/email-dev.txt&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;saas-gem&#34;&gt;SaaS gem
&lt;/h2&gt;&lt;p&gt;37signals bundles Fizzy with &lt;a class=&#34;link&#34; href=&#34;https://github.com/basecamp/fizzy-saas&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;fizzy-saas&lt;/code&gt;&lt;/a&gt;, a companion gem that links Fizzy with our billing system and contains our production setup.&lt;/p&gt;
&lt;p&gt;This gem depends on some private git repositories and it is not meant to be used by third parties. But we hope it can serve as inspiration for anyone wanting to run fizzy on their own infrastructure.&lt;/p&gt;
&lt;h2 id=&#34;contributing&#34;&gt;Contributing
&lt;/h2&gt;&lt;p&gt;We welcome contributions! Please read our &lt;a class=&#34;link&#34; href=&#34;STYLE.md&#34; &gt;style guide&lt;/a&gt; before submitting code.&lt;/p&gt;
&lt;h2 id=&#34;license&#34;&gt;License
&lt;/h2&gt;&lt;p&gt;Fizzy is released under the &lt;a class=&#34;link&#34; href=&#34;LICENSE.md&#34; &gt;O&amp;rsquo;Saasy License&lt;/a&gt;.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>Product Hunt Daily | 2025-10-13</title>
        <link>https://producthunt.programnotes.cn/en/p/product-hunt-daily-2025-10-13/</link>
        <pubDate>Mon, 13 Oct 2025 13:09:03 +0000</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/product-hunt-daily-2025-10-13/</guid>
        <description>&lt;h2 id=&#34;1-meku&#34;&gt;1. Meku
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: AI Web App and Site Builder&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Meku.dev is an AI-powered web app and site builder engineered for developers. Generate, customize and deploy full-stack web apps and sites from simple AI prompts. Comes with essential integrations and deployment tools and hosting to launch your MVP fast.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/UEDDQDR2ASWYKW?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/meku?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://producthunt.programnotes.cn/&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Meku&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI Web App Builder, Site Builder, AI, Web App, Full-Stack, Developer Tools, Deployment, Hosting, MVP, Meku.dev, AI Prompts&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺384&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-12 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;2-open-saas-20&#34;&gt;2. Open SaaS 2.0
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Free, open-source SaaS starter kit with superpowers&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Build and launch your SaaS application faster with this starter kit—free, open-source, full-stack React + Node.js. Features include auth, payments, AI example app, and admin dashboard.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/5QTWF4SISAQQ3M?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/open-saas?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://producthunt.programnotes.cn/&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Open SaaS 2.0&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: SaaS, open-source, React, Node.js, starter kit, free, auth, payments, AI, admin dashboard&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺335&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-12 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;3-undoomed&#34;&gt;3. UNDOOMED
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Stop doomscrolling. A mute button for Reels, Shorts &amp;amp; feeds.&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Tired of “5 min” turning into 2 hours? Undoomed hides Reels/Shorts/TikTok feeds but leaves what matters—messages and posts. Per-app settings, quick switch, light “clarity” stats. iOS now, Android on the way.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/SQP6L5JD7CBUI4?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/undoomed?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://producthunt.programnotes.cn/&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;UNDOOMED&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: doomscrolling, mute button, Reels, Shorts, feeds, iOS, Android, productivity, time management, social media, focus&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺258&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-12 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;4-getillustrations-figma-plugin&#34;&gt;4. Getillustrations Figma Plugin
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Bring stunning vector illustrations straight into Figma&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: The power of Getillustrations directly in Figma! Our plugin is packed with thousands of ready-to-use illustrations for UI, apps and websites. Fast, editable, and designed to keep your workflow smooth, just download and link to your All access account.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/A2AB3CIFDMVRWL?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/get-illustrations-premium-illustrations-for-website-and-app-ui-design?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://producthunt.programnotes.cn/&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Getillustrations Figma Plugin&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Figma plugin, vector illustrations, UI illustrations, app illustrations, website illustrations, editable illustrations, ready-to-use illustrations, Getillustrations&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺193&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-12 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;5-ai-ultra-editing-by-thumblify&#34;&gt;5. AI Ultra Editing by Thumblify
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Edit Like a Pro&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: ThumblifyAI Ultra Editing gives you full creative power. Lets you replace backgrounds, add elements, and fix every small detail in seconds. No limits, no complex tools. Just pure creativity made faster with AI.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/IJRSDEKOGLLYG2?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/thumblifyai?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://producthunt.programnotes.cn/&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;AI Ultra Editing by Thumblify&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI Editing, Image Editing, Background Removal, AI, Creative Editing, Thumblify, Ultra Editing, Easy Editing, Fast Editing, AI Tools&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺173&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-12 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;6-monitoro&#34;&gt;6. Monitoro
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Track AI &amp;amp; SEO Traffic o_O&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Track traffic from AI, Google, and Bing in one place. Finally!&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/ZEJ3WCSUZCJKJF?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/seo-monitoro?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://producthunt.programnotes.cn/&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Monitoro&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI traffic, SEO traffic, Google traffic, Bing traffic, traffic tracking, monitor&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺62&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: No&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-12 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;7-flow-tabs-pro&#34;&gt;7. Flow Tabs Pro
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Cut Chrome&amp;rsquo;s memory usage in half with smart tab suspension&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Stop Chrome from eating your RAM. Flow Tabs Pro suspends inactive tabs, cutting memory usage up to 50%. Super Focus Mode, session switching, and seamless tab management—all without disrupting your workflow. 7-day free trial.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/AXTFSKCTB6PBKM?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/flow-tabs-pro?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://producthunt.programnotes.cn/&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Flow Tabs Pro&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Chrome, tab management, memory saver, tab suspension, RAM optimization, Flow Tabs Pro, productivity, free trial, Super Focus Mode, session switching&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺40&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: No&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-12 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;8-minarah---prayer-reminder-app&#34;&gt;8. Minarah - Prayer Reminder App
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Never miss a prayer on your mac&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Minarah is a full-screen prayer reminder for macOS with elegant alerts, Smart Countdown, and Focus Mode — helping you stay on time for every Salah without breaking focus.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/BWHARV4VSHGECD?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/minarah-prayer-reminder-app?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://producthunt.programnotes.cn/&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Minarah - Prayer Reminder App&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Prayer reminder, macOS, Salah, alerts, countdown, focus mode, Islamic app, prayer times&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺25&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: No&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-12 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;9-ai-financial-report-10k-q-etc&#34;&gt;9. AI Financial Report (10K, Q, etc)
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Fast, clear annual report with smart financial metrics&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Delivers instant, comprehensive financial report summaries with one click. Upload statements and get a clear breakdown of key metrics business, finance statement, growth all displayed in a modern UI. Save hours, work smarter, and make informed decisions.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/JYNEFFTGZAQURO?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/ai-financial-report-10k-q-etc?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://producthunt.programnotes.cn/&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;AI Financial Report (10K, Q, etc) &#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI Financial Report, financial summaries, annual reports, 10K, Q reports, financial metrics, business analysis, fast reporting, modern UI, informed decisions, data analysis&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺19&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: No&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-12 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;10-strayl&#34;&gt;10. Strayl
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Build everywhere&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Strayl is a mobile AI coding agent - like Claude Code, but built for iOS. Edit, refactor, and preview your projects with AI. It connects to GitHub, indexes your codebase, connects to MCP servers to use their tools, and can spin up instant preview deployments.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/SVUDZA2LA6C362?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/strayl?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://producthunt.programnotes.cn/&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Strayl&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI coding agent, iOS, Code editing, Refactoring, AI-powered, GitHub integration, Instant preview, Deployment, Mobile coding, Strayl&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺19&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: No&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-12 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
</description>
        </item>
        <item>
        <title>examples</title>
        <link>https://producthunt.programnotes.cn/en/p/examples/</link>
        <pubDate>Thu, 11 Sep 2025 15:29:44 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/examples/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1639401182416-313713ce68de?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTc1NzU3Mzd8&amp;ixlib=rb-4.1.0" alt="Featured image of post examples" /&gt;&lt;h1 id=&#34;vercelexamples&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/vercel/examples&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;vercel/examples&lt;/a&gt;
&lt;/h1&gt;&lt;p align=&#34;center&#34;&gt;
  &lt;a href=&#34;https://vercel.com&#34;&gt;
    &lt;img src=&#34;https://assets.vercel.com/image/upload/v1588805858/repositories/vercel/logo.png&#34; height=&#34;96&#34;&gt;
    &lt;h3 align=&#34;center&#34;&gt;Vercel Examples&lt;/h3&gt;
  &lt;/a&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://producthunt.programnotes.cn/solutions&#34; &gt;Solutions&lt;/a&gt; – Demos, reference architecture, and best practices&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://producthunt.programnotes.cn/starter&#34; &gt;Starter&lt;/a&gt; – Functional applications which can act as a starting point&lt;/li&gt;
&lt;li&gt;And more!&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;vercel-templates&#34;&gt;Vercel Templates
&lt;/h2&gt;&lt;p&gt;Multiple examples are being featured in &lt;a class=&#34;link&#34; href=&#34;https://vercel.com/templates&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Vercel&amp;rsquo;s Templates&lt;/a&gt;, visit that page for more advanced filtering options.&lt;/p&gt;
&lt;h3 id=&#34;for-vercelians&#34;&gt;For Vercelians
&lt;/h3&gt;&lt;p&gt;Examples that have front matter metadata will create a new Draft template in &lt;a class=&#34;link&#34; href=&#34;https://app.contentful.com&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Contentful&lt;/a&gt;, for more steps on how to publish a template, read &lt;a class=&#34;link&#34; href=&#34;./internal/publishing-templates.md&#34; &gt;Publishing Templates&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;adding-a-new-example&#34;&gt;Adding a new example
&lt;/h2&gt;&lt;p&gt;To quickly start contributing with a new example, run the following commands:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pnpm i
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pnpm new-example
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If the script above isn&amp;rsquo;t used, make sure the example complies with the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It must have a &lt;code&gt;.gitignore&lt;/code&gt; similar to &lt;a class=&#34;link&#34; href=&#34;./plop-templates/example/.gitignore&#34; &gt;plop-templates/example/.gitignore&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;It must have a &lt;code&gt;package.json&lt;/code&gt; similar to &lt;a class=&#34;link&#34; href=&#34;./plop-templates/example/package.json&#34; &gt;plop-templates/example/package.json&lt;/a&gt; (usage of Next.js is optional). The license should be &lt;code&gt;MIT&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;It must have a &lt;code&gt;README.md&lt;/code&gt; similar to &lt;a class=&#34;link&#34; href=&#34;./plop-templates/example/README.md&#34; &gt;plop-templates/example/README.md&lt;/a&gt;. The example has to be able to include a demo URL (the Vercel team will deploy it!) and if it requires environment variables, it must have a &lt;code&gt;.env.example&lt;/code&gt; file and instructions on how to set them up. Take &lt;a class=&#34;link&#34; href=&#34;./edge-middleware/bot-protection-datadome/README.md&#34; &gt;bot-protection-datadome&lt;/a&gt; as an example.
&lt;ul&gt;
&lt;li&gt;To customize the Vercel Deploy Button take a look at the &lt;a class=&#34;link&#34; href=&#34;https://vercel.com/docs/deploy-button&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;docs&lt;/a&gt;, useful if the deployment has required environment variables.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;If using Next.js, it must have a &lt;code&gt;.eslintrc.json&lt;/code&gt; similar to &lt;a class=&#34;link&#34; href=&#34;./plop-templates/example/.eslintrc.json&#34; &gt;plop-templates/example/.eslintrc.json&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;All Next.js examples should be using the same styling and layout provided by &lt;code&gt;@vercel/examples-ui&lt;/code&gt;, its usage can be seen in the &lt;a class=&#34;link&#34; href=&#34;./plop-templates/example&#34; &gt;plop template&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;adding-a-template&#34;&gt;Adding a template
&lt;/h3&gt;&lt;p&gt;If you would like the example to be featured in &lt;a class=&#34;link&#34; href=&#34;https://vercel.com/templates&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;vercel.com/templates&lt;/a&gt; then also add the front matter metadata to the top of the readme, like in &lt;a class=&#34;link&#34; href=&#34;./edge-middleware/bot-protection-datadome/README.md&#34; &gt;bot-protection-datadome&lt;/a&gt;. To know all the possible values for each metadata take a look at &lt;a class=&#34;link&#34; href=&#34;./internal/fields.json&#34; &gt;&lt;code&gt;internal/fields.json&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you want to add related templates to your template, copy the &lt;code&gt;slug&lt;/code&gt; from the other template into the &lt;code&gt;relatedTemplates&lt;/code&gt; field, for example for &lt;a class=&#34;link&#34; href=&#34;https://vercel.com/templates/next.js/monorepo-turborepo&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;vercel.com/templates/next.js/monorepo-turborepo&lt;/a&gt; the slug is &lt;code&gt;monorepo-turborepo&lt;/code&gt;, as written in &lt;a class=&#34;link&#34; href=&#34;./solutions/monorepo/README.md&#34; &gt;solutions/monorepo/README.md&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;the-pre-commit-hook&#34;&gt;The pre-commit hook
&lt;/h3&gt;&lt;p&gt;We use &lt;a class=&#34;link&#34; href=&#34;https://typicode.github.io/husky/#/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Husky&lt;/a&gt; to manage the pre-commit &lt;a class=&#34;link&#34; href=&#34;https://git-scm.com/docs/githooks&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Git hook&lt;/a&gt; in this repo. Husky configures hooks automatically during install, so you don&amp;rsquo;t need to do anything special to get them working, but if it fails to install, you can run the following command to install it manually:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pnpm run prepare
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Code changes automatically go through Prettier and ESLint when you make a commit, &lt;strong&gt;please do not skip these steps&lt;/strong&gt; unless they&amp;rsquo;re broken and in that case let us known by creating an issue.&lt;/p&gt;
&lt;h2 id=&#34;read-the-docs&#34;&gt;Read the Docs
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://vercel.com/docs&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Vercel Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://nextjs.org/docs&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Next.js Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you have any questions or suggestions about the docs, feel free to &lt;a class=&#34;link&#34; href=&#34;https://github.com/vercel/examples/discussions&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open a discussion&lt;/a&gt;, or &lt;a class=&#34;link&#34; href=&#34;https://github.com/vercel/examples/pulls&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;submit a PR&lt;/a&gt; with your suggestions!&lt;/p&gt;
&lt;h2 id=&#34;provide-feedback&#34;&gt;Provide Feedback
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/vercel/examples/discussions&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Start a Discussion&lt;/a&gt; with a question, piece of feedback, or idea you want to share with the team.&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/vercel/examples/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Open an Issue&lt;/a&gt; if you believe you&amp;rsquo;ve encountered a bug that you want to flag for the team.&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>MiniCPM-V</title>
        <link>https://producthunt.programnotes.cn/en/p/minicpm-v/</link>
        <pubDate>Tue, 02 Sep 2025 15:29:41 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/minicpm-v/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1638382620941-f5c0628d21bd?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTY3OTgwOTd8&amp;ixlib=rb-4.1.0" alt="Featured image of post MiniCPM-V" /&gt;&lt;h1 id=&#34;openbmbminicpm-v&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/MiniCPM-V&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;OpenBMB/MiniCPM-V&lt;/a&gt;
&lt;/h1&gt;&lt;div align=&#34;center&#34;&gt;
&lt;p&gt;&lt;img src=&#34;./assets/minicpm_v_and_minicpm_o_title.png&#34; width=&#34;500em&#34; &gt;&lt;/img&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;./README_zh.md&#34; &gt;中文&lt;/a&gt; |
English&lt;/strong&gt;&lt;/p&gt;
&lt;span style=&#34;display: inline-flex; align-items: center; margin-right: 2px;&#34;&gt;
  &lt;img src=&#34;./assets/wechat.png&#34; alt=&#34;WeChat&#34; style=&#34;margin-right: 4px;&#34;&gt;
  &lt;a href=&#34;docs/wechat.md&#34; target=&#34;_blank&#34;&gt; WeChat&lt;/a&gt; &amp;nbsp;|
&lt;/span&gt;
&amp;nbsp;
&lt;span style=&#34;display: inline-flex; align-items: center; margin-left: -8px;&#34;&gt;
&lt;img src=&#34;./assets/discord.png&#34; alt=&#34;Discord&#34; style=&#34;margin-right: 4px;&#34;&gt;
  &lt;a href=&#34;https://discord.gg/rftuRMbqzf&#34; target=&#34;_blank&#34;&gt; Discord&lt;/a&gt; &amp;nbsp;
&lt;/span&gt;
&lt;p align=&#34;center&#34;&gt;
   MiniCPM-V 4.5 &lt;a href=&#34;https://huggingface.co/openbmb/MiniCPM-V-4_5&#34;&gt;🤗&lt;/a&gt; &lt;a href=&#34;http://101.126.42.235:30910/&#34;&gt;🤖&lt;/a&gt; | MiniCPM-o 2.6 &lt;a href=&#34;https://huggingface.co/openbmb/MiniCPM-o-2_6&#34;&gt;🤗&lt;/a&gt;  &lt;a href=&#34;https://minicpm-omni-webdemo-us.modelbest.cn/&#34;&gt; 🤖&lt;/a&gt; | &lt;a href=&#34;https://github.com/OpenSQZ/MiniCPM-V-Cookbook&#34;&gt;🍳 Cookbook&lt;/a&gt; | 
  📄 Technical Report (Coming Soon)
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;MiniCPM-V&lt;/strong&gt; is a series of efficient end-side multimodal LLMs (MLLMs), which accept images, videos and text as inputs and deliver high-quality text outputs. &lt;strong&gt;MiniCPM-o&lt;/strong&gt; additionally takes audio as inputs and provides high-quality speech outputs in an end-to-end fashion. Since February 2024, we have released 7 versions of the model, aiming to achieve &lt;strong&gt;strong performance and efficient deployment&lt;/strong&gt;. The most notable models in the series currently include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;MiniCPM-V 4.5&lt;/strong&gt;: 🔥🔥🔥 The latest and most capable model in the MiniCPM-V series. With a total of 8B parameters, this model &lt;strong&gt;outperforms GPT-4o-latest, Gemini-2.0 Pro, and Qwen2.5-VL 72B&lt;/strong&gt; in vision-language capabilities, making it the most performant on-device multimodal model in the open-source community. This version brings &lt;strong&gt;new features including efficient high-FPS and long video understanding (up to 96x compression rate for video tokens), controllable hybrid fast/deep thinking, strong handwritten OCR and complex table/document parsing&lt;/strong&gt;. It also advances MiniCPM-V&amp;rsquo;s popular features such as trustworthy behavior, multilingual support and end-side deployability.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;MiniCPM-o 2.6&lt;/strong&gt;: ⭐️⭐️⭐️ The most capable model in the MiniCPM-o series. With a total of 8B parameters, this end-to-end model &lt;strong&gt;achieves comparable performance to GPT-4o-202405 in vision, speech, and multimodal live streaming&lt;/strong&gt;, making it one of the most versatile and performant models in the open-source community. For the new voice mode, MiniCPM-o 2.6 &lt;strong&gt;supports bilingual real-time speech conversation with configurable voices&lt;/strong&gt;, and also allows for fun capabilities such as emotion/speed/style control, end-to-end voice cloning, role play, etc. Due to its superior token density, MiniCPM-o 2.6 can for the first time &lt;strong&gt;support multimodal live streaming on end-side devices&lt;/strong&gt; such as iPad.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;news&#34;&gt;News &lt;!-- omit in toc --&gt;
&lt;/h2&gt;&lt;h4 id=&#34;-pinned&#34;&gt;📌 Pinned
&lt;/h4&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;[2025.09.01] ⭐️⭐️⭐️ MiniCPM-V 4.5 has been officially supported by &lt;a class=&#34;link&#34; href=&#34;https://github.com/ggml-org/llama.cpp/pull/15575&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;llama.cpp&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://github.com/vllm-project/vllm/pull/23586&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;vLLM&lt;/a&gt;, and &lt;a class=&#34;link&#34; href=&#34;https://github.com/hiyouga/LLaMA-Factory/pull/9022&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;LLaMA-Factory&lt;/a&gt;. You are welcome to use it directly through these official channels! Support for additional frameworks such as &lt;a class=&#34;link&#34; href=&#34;https://github.com/ollama/ollama/pull/12078&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Ollama&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/sgl-project/sglang/pull/9610&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;SGLang&lt;/a&gt; is actively in progress.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2025.08.26] 🔥🔥🔥 We open-source MiniCPM-V 4.5, which outperforms GPT-4o-latest, Gemini-2.0 Pro, and Qwen2.5-VL 72B. It advances popular capabilities of MiniCPM-V, and brings useful new features. Try it now!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2025.08.01] ⭐️⭐️⭐️ We open-sourced the &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MiniCPM-V &amp;amp; o Cookbook&lt;/a&gt;! It provides comprehensive guides for diverse user scenarios, paired with our new &lt;a class=&#34;link&#34; href=&#34;https://minicpm-o.readthedocs.io/en/latest/index.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Docs Site&lt;/a&gt; for smoother onboarding.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2025.06.20] ⭐️⭐️⭐️ Our official &lt;a class=&#34;link&#34; href=&#34;https://ollama.com/openbmb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Ollama repository&lt;/a&gt; is released. Try our latest models with &lt;a class=&#34;link&#34; href=&#34;https://ollama.com/openbmb/minicpm-o2.6&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;one click&lt;/a&gt;！&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2025.03.01] 🚀🚀🚀 RLAIF-V, the alignment technique of MiniCPM-o, is accepted by CVPR 2025 Highlights！The &lt;a class=&#34;link&#34; href=&#34;https://github.com/RLHF-V/RLAIF-V&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;code&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;dataset&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/abs/2405.17220&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;paper&lt;/a&gt; are open-sourced!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2025.01.24] 📢📢📢 MiniCPM-o 2.6 technical report is released! See &lt;a class=&#34;link&#34; href=&#34;https://openbmb.notion.site/MiniCPM-o-2-6-A-GPT-4o-Level-MLLM-for-Vision-Speech-and-Multimodal-Live-Streaming-on-Your-Phone-185ede1b7a558042b5d5e45e6b237da9&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2025.01.19] 📢 &lt;strong&gt;ATTENTION!&lt;/strong&gt; We are currently working on merging MiniCPM-o 2.6 into the official repositories of llama.cpp, Ollama, and vllm. Until the merge is complete, please USE OUR LOCAL FORKS of &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/llama.cpp/blob/minicpm-omni/examples/llava/README-minicpmo2.6.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;llama.cpp&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Ollama&lt;/a&gt;, and &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/MiniCPM-o?tab=readme-ov-file#efficient-inference-with-llamacpp-ollama-vllm&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;vllm&lt;/a&gt;. &lt;strong&gt;Using the official repositories before the merge may lead to unexpected issues&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2025.01.19] ⭐️⭐️⭐️ MiniCPM-o tops GitHub Trending and reaches top-2 on Hugging Face Trending!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2025.01.17] We have updated the usage of MiniCPM-o 2.6 int4 quantization version and resolved the model initialization error. Click &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-o-2_6-int4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt; and try it now!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2025.01.13] 🔥🔥🔥 We open-source MiniCPM-o 2.6, which matches GPT-4o-202405 on vision, speech and multimodal live streaming. It advances popular capabilities of MiniCPM-V 2.6, and supports various new fun features. Try it now!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.08.17] 🚀🚀🚀 MiniCPM-V 2.6 is now fully supported by &lt;a class=&#34;link&#34; href=&#34;https://github.com/ggerganov/llama.cpp&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;official&lt;/a&gt; llama.cpp! GGUF models of various sizes are available &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-V-2_6-gguf&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.08.06] 🔥🔥🔥 We open-source MiniCPM-V 2.6, which outperforms GPT-4V on single image, multi-image and video understanding. It advances popular features of MiniCPM-Llama3-V 2.5, and can support real-time video understanding on iPad. Try it now!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.08.03] MiniCPM-Llama3-V 2.5 technical report is released! See &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/abs/2408.01800&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.05.23] 🔥🔥🔥 MiniCPM-V tops GitHub Trending and Hugging Face Trending! Our demo, recommended by Hugging Face Gradio’s official account, is available &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;. Come and try it out!&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;br&gt;
&lt;details&gt; 
&lt;summary&gt;Click to view more news.&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;[2025.08.02] 🚀🚀🚀 We open-source MiniCPM-V 4.0, which outperforms GPT-4.1-mini-20250414 in image understanding. It advances popular features of MiniCPM-V 2.6, and largely improves the efficiency. We also open-source the iOS App on iPhone and iPad. Try it now!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2025.01.23] 💡💡💡 MiniCPM-o 2.6 is now supported by &lt;a class=&#34;link&#34; href=&#34;https://github.com/PKU-Alignment/align-anything&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Align-Anything&lt;/a&gt;, a framework by PKU-Alignment Team for aligning any-to-any modality large models with human intentions. It supports DPO and SFT fine-tuning on both vision and audio. Try it now!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.08.15] We now also support multi-image SFT. For more details, please refer to the &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;document&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.08.14] MiniCPM-V 2.6 now also supports &lt;a class=&#34;link&#34; href=&#34;https://github.com/modelscope/ms-swift/issues/1613&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;fine-tuning&lt;/a&gt; with the SWIFT framework!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.08.10] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now fully supported by &lt;a class=&#34;link&#34; href=&#34;https://github.com/ggerganov/llama.cpp&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;official&lt;/a&gt; llama.cpp! GGUF models of various sizes are available &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.07.19] MiniCPM-Llama3-V 2.5 supports vLLM now! See &lt;a class=&#34;link&#34; href=&#34;#inference-with-vllm&#34; &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.06.03] Now, you can run MiniCPM-Llama3-V 2.5 on multiple low VRAM GPUs(12 GB or 16 GB) by distributing the model&amp;rsquo;s layers across multiple GPUs. For more details, check this &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;link&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.05.28] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 now fully supports its feature in llama.cpp and Ollama! Please pull the latest code &lt;strong&gt;of our provided forks&lt;/strong&gt; (&lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;llama.cpp&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Ollama&lt;/a&gt;). GGUF models in various sizes are available &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/tree/main&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;. MiniCPM-Llama3-V 2.5 series is &lt;strong&gt;not supported by the official repositories yet&lt;/strong&gt;, and we are working hard to merge PRs. Please stay tuned!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.05.28] 💫 We now support LoRA fine-tuning for MiniCPM-Llama3-V 2.5, using only 2 V100 GPUs! See more statistics &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#model-fine-tuning-memory-usage-statistics&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.05.25] MiniCPM-Llama3-V 2.5 now supports streaming outputs and customized system prompts. Try it &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.05.24] We release the MiniCPM-Llama3-V 2.5 &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;gguf&lt;/a&gt;, which supports &lt;a class=&#34;link&#34; href=&#34;#inference-with-llamacpp&#34; &gt;llama.cpp&lt;/a&gt; inference and provides a 6~8 token/s smooth decoding on mobile phones. Try it now!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.05.23] 🔍 We&amp;rsquo;ve released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmark evaluations, multilingual capabilities, and inference efficiency 🌟📊🌍🚀. Click &lt;a class=&#34;link&#34; href=&#34;./docs/compare_with_phi-3_vision.md&#34; &gt;here&lt;/a&gt; to view more details.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.05.20] We open-soure MiniCPM-Llama3-V 2.5, it has improved OCR capability and supports 30+ languages, representing the first end-side MLLM achieving GPT-4V level performance! We provide &lt;a class=&#34;link&#34; href=&#34;#deployment-on-mobile-phone&#34; &gt;efficient inference&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;./finetune/readme.md&#34; &gt;simple fine-tuning&lt;/a&gt;. Try it now!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.04.23] MiniCPM-V-2.0 supports vLLM now! Click &lt;a class=&#34;link&#34; href=&#34;#inference-with-vllm&#34; &gt;here&lt;/a&gt; to view more details.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.04.18] We create a HuggingFace Space to host the demo of MiniCPM-V 2.0 at &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/spaces/openbmb/MiniCPM-V-2&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.04.17] MiniCPM-V-2.0 supports deploying &lt;a class=&#34;link&#34; href=&#34;#webui-demo&#34; &gt;WebUI Demo&lt;/a&gt; now!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.04.15] MiniCPM-V-2.0 now also supports &lt;a class=&#34;link&#34; href=&#34;https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v-2%e6%9c%80%e4%bd%b3%e5%ae%9e%e8%b7%b5.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;fine-tuning&lt;/a&gt; with the SWIFT framework!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.04.12] We open-source MiniCPM-V 2.0, which achieves comparable performance with Gemini Pro in understanding scene text and outperforms strong Qwen-VL-Chat 9.6B and Yi-VL 34B on &lt;a href=&#34;https://rank.opencompass.org.cn/leaderboard-multimodal&#34;&gt;OpenCompass&lt;/a&gt;, a comprehensive evaluation over 11 popular benchmarks. Click &lt;a href=&#34;https://openbmb.vercel.app/minicpm-v-2&#34;&gt;here&lt;/a&gt; to view the MiniCPM-V 2.0 technical blog.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.03.14] MiniCPM-V now supports &lt;a class=&#34;link&#34; href=&#34;https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v%e6%9c%80%e4%bd%b3%e5%ae%9e%e8%b7%b5.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;fine-tuning&lt;/a&gt; with the SWIFT framework. Thanks to &lt;a class=&#34;link&#34; href=&#34;https://github.com/Jintao-Huang&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Jintao&lt;/a&gt; for the contribution！&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.03.01] MiniCPM-V can now be deployed on Mac!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[2024.02.01] We open-source MiniCPM-V and OmniLMM-12B, which support efficient end-side deployment and powerful multimodal capabilities correspondingly.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt; 
&lt;h2 id=&#34;contents&#34;&gt;Contents &lt;!-- omit in toc --&gt;
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#minicpm-v-45&#34; &gt;MiniCPM-V 4.5&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#inference-efficiency&#34; &gt;Inference Efficiency&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#minicpm-o-26&#34; &gt;MiniCPM-o 2.6&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#minicpm-v--o-cookbook&#34; &gt;MiniCPM-V &amp;amp; o Cookbook&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#chat-with-our-demo-on-gradio-&#34; &gt;Chat with Our Demo on Gradio 🤗&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#inference&#34; &gt;Inference&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#model-zoo&#34; &gt;Model Zoo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#multi-turn-conversation&#34; &gt;Multi-turn Conversation&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#chat-with-multiple-images&#34; &gt;Chat with Multiple Images&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#in-context-few-shot-learning&#34; &gt;In-context Few-shot Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#chat-with-video&#34; &gt;Chat with Video&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#speech-and-audio-mode&#34; &gt;Speech and Audio Mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#multimodal-live-streaming&#34; &gt;Multimodal Live Streaming&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#inference-on-multiple-gpus&#34; &gt;Inference on Multiple GPUs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#inference-on-mac&#34; &gt;Inference on Mac&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#efficient-inference-with-llamacpp-ollama-vllm&#34; &gt;Efficient Inference with llama.cpp, Ollama, vLLM&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#fine-tuning&#34; &gt;Fine-tuning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#awesome-work-using-minicpm-v--minicpm-o&#34; &gt;Awesome work using MiniCPM-V &amp;amp; MiniCPM-o&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#faqs&#34; &gt;FAQs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#limitations&#34; &gt;Limitations&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;minicpm-v-45&#34;&gt;MiniCPM-V 4.5
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;MiniCPM-V 4.5&lt;/strong&gt; is the latest and most capable model in the MiniCPM-V series. The model is built on Qwen3-8B and SigLIP2-400M with a total of 8B parameters. It exhibits a significant performance improvement over previous MiniCPM-V and MiniCPM-o models, and introduces new useful features. Notable features of MiniCPM-V 4.5 include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;🔥 &lt;strong&gt;State-of-the-art Vision-Language Capability.&lt;/strong&gt;
MiniCPM-V 4.5 achieves an average score of 77.0 on OpenCompass, a comprehensive evaluation of 8 popular benchmarks. &lt;strong&gt;With only 8B parameters, it surpasses widely used proprietary models like GPT-4o-latest, Gemini-2.0 Pro, and strong open-source models like Qwen2.5-VL 72B&lt;/strong&gt; for vision-language capabilities, making it the most performant MLLM under 30B parameters.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🎬 &lt;strong&gt;Efficient High-FPS and Long Video Understanding.&lt;/strong&gt; Powered by a new unified 3D-Resampler over images and videos, MiniCPM-V 4.5 can now achieve 96x compression rate for video tokens, where 6 448x448 video frames can be jointly compressed into 64 video tokens (normally 1,536 tokens for most MLLMs). This means that the model can perceive significantly more video frames without increasing the LLM inference cost. This brings state-of-the-art high-FPS (up to 10FPS) video understanding and long video understanding capabilities on Video-MME, LVBench, MLVU, MotionBench, FavorBench, etc., efficiently.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;⚙️ &lt;strong&gt;Controllable Hybrid Fast/Deep Thinking.&lt;/strong&gt; MiniCPM-V 4.5 supports both fast thinking for efficient frequent usage with competitive performance, and deep thinking for more complex problem solving. To cover efficiency and performance trade-offs in different user scenarios, this fast/deep thinking mode can be switched in a highly controlled fashion.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;💪 &lt;strong&gt;Strong OCR, Document Parsing and Others.&lt;/strong&gt;
Based on &lt;a class=&#34;link&#34; href=&#34;https://arxiv.org/pdf/2403.11703&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;LLaVA-UHD&lt;/a&gt; architecture, MiniCPM-V 4.5 can process high-resolution images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), using 4x fewer visual tokens than most MLLMs. The model achieves &lt;strong&gt;leading performance on OCRBench, surpassing proprietary models such as GPT-4o-latest and Gemini 2.5&lt;/strong&gt;. It also achieves state-of-the-art performance for PDF document parsing capability on OmniDocBench among general MLLMs. Based on the latest &lt;a class=&#34;link&#34; href=&#34;https://github.com/RLHF-V/RLAIF-V/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;RLAIF-V&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/VisCPM&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;VisCPM&lt;/a&gt; techniques, it features &lt;strong&gt;trustworthy behaviors&lt;/strong&gt;, outperforming GPT-4o-latest on MMHal-Bench, and supports &lt;strong&gt;multilingual capabilities&lt;/strong&gt; in more than 30 languages.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;💫  &lt;strong&gt;Easy Usage.&lt;/strong&gt;
MiniCPM-V 4.5 can be easily used in various ways: (1) &lt;a class=&#34;link&#34; href=&#34;https://github.com/tc-mb/llama.cpp/blob/Support-MiniCPM-V-4.5/docs/multimodal/minicpmv4.5.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;llama.cpp&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/tc-mb/ollama/tree/MIniCPM-V&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ollama&lt;/a&gt; support for efficient CPU inference on local devices, (2) &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-V-4_5-int4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;int4&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GGUF&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/tc-mb/AutoAWQ&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;AWQ&lt;/a&gt; format quantized models in 16 sizes, (3) &lt;a class=&#34;link&#34; href=&#34;https://github.com/tc-mb/sglang/tree/main&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;SGLang&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;#efficient-inference-with-llamacpp-ollama-vllm&#34; &gt;vLLM&lt;/a&gt; support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with &lt;a class=&#34;link&#34; href=&#34;https://github.com/tc-mb/transformers/tree/main&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Transformers&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;./docs/llamafactory_train_and_infer.md&#34; &gt;LLaMA-Factory&lt;/a&gt;, (5) quick &lt;a class=&#34;link&#34; href=&#34;#chat-with-our-demo-on-gradio&#34; &gt;local WebUI demo&lt;/a&gt;, (6) optimized &lt;a class=&#34;link&#34; href=&#34;https://github.com/tc-mb/MiniCPM-o-demo-iOS&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;local iOS app&lt;/a&gt; on iPhone and iPad, and (7) online web demo on &lt;a class=&#34;link&#34; href=&#34;http://101.126.42.235:30910/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;server&lt;/a&gt;. See our &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Cookbook&lt;/a&gt; for full usage!&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;key-techniques&#34;&gt;Key Techniques &lt;!-- omit in toc --&gt;
&lt;/h3&gt;&lt;div align=&#34;center&#34;&gt;
&lt;img src=&#34;./assets/minicpm-v-4dot5-framework.png&#34; , width=100%&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Architechture: Unified 3D-Resampler for High-density Video Compression.&lt;/strong&gt; MiniCPM-V 4.5 introduces a 3D-Resampler that overcomes the performance-efficiency trade-off in video understanding. By grouping and jointly compressing up to 6 consecutive video frames into just 64 tokens (the same token count used for a single image in MiniCPM-V series), MiniCPM-V 4.5 achieves a 96× compression rate for video tokens. This allows the model to process more video frames without additional LLM computational cost, enabling high-FPS video and long video understanding. The architecture supports unified encoding for images, multi-image inputs, and videos, ensuring seamless capability and knowledge transfer.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pre-training: Unified Learning for OCR and Knowledge from Documents.&lt;/strong&gt; Existing MLLMs learn OCR capability and knowledge from documents in isolated training approaches. We observe that the essential difference between these two training approaches is the visibility of the text in images. By dynamically corrupting text regions in documents with varying noise levels and asking the model to reconstruct the text, the model learns to adaptively and properly switch between accurate text recognition (when text is visible) and multimodal context-based knowledge reasoning (when text is heavily obscured). This eliminates reliance on error-prone document parsers in knowledge learning from documents, and prevents hallucinations from over-augmented OCR data, resulting in top-tier OCR and multimodal knowledge performance with minimal engineering overhead.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Post-training: Hybrid Fast/Deep Thinking with Multimodal RL.&lt;/strong&gt; MiniCPM-V 4.5 offers a balanced reasoning experience through two switchable modes: fast thinking for efficient daily use and deep thinking for complex tasks. Using a new hybrid reinforcement learning method, the model jointly optimizes both modes, significantly enhancing fast-mode performance without compromising deep-mode capability. Incorporated with &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/RLPR&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;RLPR&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/RLHF-V/RLAIF-V&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;RLAIF-V&lt;/a&gt;, it generalizes robust reasoning skills from broad multimodal data while effectively reducing hallucinations.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;evaluation&#34;&gt;Evaluation  &lt;!-- omit in toc --&gt;
&lt;/h3&gt;&lt;div align=&#34;center&#34;&gt;
  &lt;img src=&#34;./assets/radar_minicpm_v45.png&#34;, width=60%&gt;
&lt;/div&gt;
&lt;div align=&#34;center&#34;&gt;
&lt;img src=&#34;./assets/minicpmv_4_5_evaluation_result.png&#34; , width=80%&gt;
&lt;/div&gt;
&lt;h3 id=&#34;inference-efficiency&#34;&gt;Inference Efficiency
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;OpenCompass&lt;/strong&gt;&lt;/p&gt;
&lt;div align=&#34;left&#34;&gt;
&lt;table style=&#34;margin: 0px auto;&#34;&gt;
    &lt;thead&gt;
            &lt;tr&gt;
              &lt;th align=&#34;left&#34;&gt;Model&lt;/th&gt;
              &lt;th&gt;Size&lt;/th&gt;
              &lt;th&gt;Avg Score ↑&lt;/th&gt;
              &lt;th&gt;Total Inference Time ↓&lt;/th&gt;
            &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody align=&#34;center&#34;&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;GLM-4.1V-9B-Thinking&lt;/td&gt;
            &lt;td&gt;10.3B&lt;/td&gt;
            &lt;td&gt;76.6&lt;/td&gt;
            &lt;td&gt;17.5h&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;MiMo-VL-7B-RL&lt;/td&gt;
            &lt;td&gt;8.3B&lt;/td&gt;
            &lt;td&gt;76.4&lt;/td&gt;
            &lt;td&gt;11h&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;MiniCPM-V 4.5&lt;/td&gt;
            &lt;td&gt;8.7B&lt;/td&gt;
            &lt;td&gt;&lt;b&gt;77.0&lt;/td&gt;
            &lt;td&gt;&lt;b&gt;7.5h&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Video-MME&lt;/strong&gt;&lt;/p&gt;
&lt;div align=&#34;left&#34;&gt;
&lt;table style=&#34;margin: 0px auto;&#34;&gt;
    &lt;thead&gt;
          &lt;tr&gt;
              &lt;th align=&#34;left&#34;&gt;Model&lt;/th&gt;
              &lt;th&gt;Size&lt;/th&gt;
              &lt;th&gt;Avg Score ↑&lt;/th&gt;
              &lt;th&gt;Total Inference Time ↓&lt;/th&gt;
              &lt;th&gt;GPU Mem ↓&lt;/th&gt;
          &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody align=&#34;center&#34;&gt;
          &lt;tr&gt;
              &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Qwen2.5-VL-7B-Instruct&lt;/td&gt;
              &lt;td&gt;8.3B&lt;/td&gt;
              &lt;td&gt;71.6&lt;/td&gt;
              &lt;td&gt;3h&lt;/td&gt;
              &lt;td&gt;60G&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;GLM-4.1V-9B-Thinking&lt;/td&gt;
              &lt;td&gt;10.3B&lt;/td&gt;
              &lt;td&gt;&lt;b&gt;73.6&lt;/td&gt;
              &lt;td&gt;2.63h&lt;/td&gt;
              &lt;td&gt;32G&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;MiniCPM-V 4.5&lt;/td&gt;
              &lt;td&gt;8.7B&lt;/td&gt;
              &lt;td&gt;73.5&lt;/td&gt;
              &lt;td&gt;&lt;b&gt;0.26h&lt;/td&gt;
              &lt;td&gt;&lt;b&gt;28G&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;Both Video-MME and OpenCompass were evaluated using 8×A100 GPUs for inference. The reported inference time of Video-MME includes full model-side computation, and excludes the external cost of video frame extraction (dependent on specific frame extraction tools) for fair comparison.&lt;/p&gt;
&lt;h3 id=&#34;examples&#34;&gt;Examples  &lt;!-- omit in toc --&gt;
&lt;/h3&gt;&lt;div align=&#34;center&#34;&gt;
  &lt;a href=&#34;https://www.youtube.com/watch?v=Cn23FujYMMU&#34;&gt;&lt;img src=&#34;./assets/minicpmv4_5/MiniCPM-V 4.5-8.26_img.jpeg&#34;, width=70%&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;div style=&#34;display: flex; flex-direction: column; align-items: center;&#34;&gt;
  &lt;img src=&#34;assets/minicpmv4_5/en_case1.png&#34; alt=&#34;en_case1&#34; style=&#34;margin-bottom: 5px;&#34;&gt;
  &lt;img src=&#34;assets/minicpmv4_5/en_case2.png&#34; alt=&#34;en_case2&#34; style=&#34;margin-bottom: 5px;&#34;&gt;
  &lt;img src=&#34;assets/minicpmv4_5/en_case3.jpeg&#34; alt=&#34;en_case3&#34; style=&#34;margin-bottom: 5px;&#34;&gt;
&lt;/div&gt;
&lt;details&gt;
&lt;summary&gt;Click to view more cases.&lt;/summary&gt;
&lt;div style=&#34;display: flex; flex-direction: column; align-items: center;&#34;&gt;
  &lt;img src=&#34;assets/minicpmv4_5/zh_extra.jpeg&#34; alt=&#34;zh_extra&#34; style=&#34;margin-bottom: 5px;&#34;&gt;
&lt;/div&gt;
&lt;/details&gt;
&lt;p&gt;We deploy MiniCPM-V 4.5 on iPad M4 with &lt;a class=&#34;link&#34; href=&#34;https://github.com/tc-mb/MiniCPM-o-demo-iOS&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;iOS demo&lt;/a&gt;. The demo video is the raw screen recording without edition.&lt;/p&gt;
&lt;table align=&#34;center&#34;&gt; 
    &lt;p align=&#34;center&#34;&gt;
      &lt;img src=&#34;assets/minicpmv4_5/v45_en_handwriting.gif&#34; width=45%/&gt;
      &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
      &lt;img src=&#34;assets/minicpmv4_5/v45_en_cot.gif&#34; width=45%/&gt;
    &lt;/p&gt;
    &lt;p align=&#34;center&#34;&gt;
      &lt;img src=&#34;assets/minicpmv4_5/v45_cn_handwriting.gif&#34; width=45%/&gt;
      &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
      &lt;img src=&#34;assets/minicpmv4_5/v45_cn_travel.gif&#34; width=45%/&gt;
    &lt;/p&gt;
&lt;/table&gt;
&lt;h2 id=&#34;minicpm-o-26&#34;&gt;MiniCPM-o 2.6
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;MiniCPM-o 2.6&lt;/strong&gt; is the latest and most capable model in the MiniCPM-o series. The model is built in an end-to-end fashion based on SigLip-400M, Whisper-medium-300M, ChatTTS-200M, and Qwen2.5-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.6, and introduces new features for real-time speech conversation and multimodal live streaming. Notable features of MiniCPM-o 2.6 include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;🔥 &lt;strong&gt;Leading Visual Capability.&lt;/strong&gt;
MiniCPM-o 2.6 achieves an average score of 70.2 on OpenCompass, a comprehensive evaluation of 8 popular benchmarks. &lt;strong&gt;With only 8B parameters, it surpasses widely used proprietary models like GPT-4o-202405, Gemini 1.5 Pro, and Claude 3.5 Sonnet&lt;/strong&gt; for single image understanding. It also &lt;strong&gt;outperforms GPT-4V and Claude 3.5 Sonnet&lt;/strong&gt; in multi-image and video understanding, and shows promising in-context learning capability.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🎙 &lt;strong&gt;State-of-the-art Speech Capability.&lt;/strong&gt; MiniCPM-o 2.6 supports &lt;strong&gt;bilingual real-time speech conversation with configurable voices&lt;/strong&gt; in English and Chinese. It &lt;strong&gt;outperforms GPT-4o-realtime on audio understanding tasks&lt;/strong&gt; such as ASR and STT translation, and shows &lt;strong&gt;state-of-the-art performance on speech conversation in both semantic and acoustic evaluations in the open-source community&lt;/strong&gt;. It also allows for fun features such as emotion/speed/style control, end-to-end voice cloning, role play, etc.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🎬 &lt;strong&gt;Strong Multimodal Live Streaming Capability.&lt;/strong&gt; As a new feature, MiniCPM-o 2.6 can &lt;strong&gt;accept continuous video and audio streams independent of user queries, and support real-time speech interaction&lt;/strong&gt;. It &lt;strong&gt;outperforms GPT-4o-202408 and Claude 3.5 Sonnet and shows state-of-the-art performance in the open-source community on StreamingBench&lt;/strong&gt;, a comprehensive benchmark for real-time video understanding, omni-source (video &amp;amp; audio) understanding, and multimodal contextual understanding.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;💪 &lt;strong&gt;Strong OCR Capability and Others.&lt;/strong&gt;
Advancing popular visual capabilities from MiniCPM-V series, MiniCPM-o 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344). It achieves &lt;strong&gt;state-of-the-art performance on OCRBench for models under 25B, surpassing proprietary models such as GPT-4o-202405&lt;/strong&gt;.
Based on the latest &lt;a class=&#34;link&#34; href=&#34;https://github.com/RLHF-V/RLAIF-V/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;RLAIF-V&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/VisCPM&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;VisCPM&lt;/a&gt; techniques, it features &lt;strong&gt;trustworthy behaviors&lt;/strong&gt;, outperforming GPT-4o and Claude 3.5 Sonnet on MMHal-Bench, and supports &lt;strong&gt;multilingual capabilities&lt;/strong&gt; on more than 30 languages.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🚀 &lt;strong&gt;Superior Efficiency.&lt;/strong&gt;
In addition to its friendly size, MiniCPM-o 2.6 also shows &lt;strong&gt;state-of-the-art token density&lt;/strong&gt; (i.e., the number of pixels encoded into each visual token). &lt;strong&gt;It produces only 640 tokens when processing a 1.8M pixel image, which is 75% fewer than most models&lt;/strong&gt;. This directly improves the inference speed, first-token latency, memory usage, and power consumption. As a result, MiniCPM-o 2.6 can efficiently support &lt;strong&gt;multimodal live streaming&lt;/strong&gt; on end-side devices such as iPads.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;💫  &lt;strong&gt;Easy Usage.&lt;/strong&gt;
MiniCPM-o 2.6 can be easily used in various ways: (1) &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/llama.cpp/blob/minicpm-omni/examples/llava/README-minicpmo2.6.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;llama.cpp&lt;/a&gt; support for efficient CPU inference on local devices, (2) &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-o-2_6-int4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;int4&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GGUF&lt;/a&gt; format quantized models in 16 sizes, (3) &lt;a class=&#34;link&#34; href=&#34;#efficient-inference-with-llamacpp-ollama-vllm&#34; &gt;vLLM&lt;/a&gt; support for high-throughput and memory-efficient inference, (4) fine-tuning on new domains and tasks with &lt;a class=&#34;link&#34; href=&#34;./docs/llamafactory_train_and_infer.md&#34; &gt;LLaMA-Factory&lt;/a&gt;, (5) quick &lt;a class=&#34;link&#34; href=&#34;#chat-with-our-demo-on-gradio&#34; &gt;local WebUI demo&lt;/a&gt;, and (6) online web demo on &lt;a class=&#34;link&#34; href=&#34;https://minicpm-omni-webdemo-us.modelbest.cn/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;server&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model Architecture.&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;End-to-end Omni-modal Architecture.&lt;/strong&gt; Different modality encoders/decoders are connected and trained in an &lt;strong&gt;end-to-end&lt;/strong&gt; fashion to fully exploit rich multimodal knowledge. The model is trained in a fully end-to-end manner with only CE loss.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Omni-modal Live Streaming Mechanism.&lt;/strong&gt; (1) We change the offline modality encoder/decoders into online ones for &lt;strong&gt;streaming inputs/outputs.&lt;/strong&gt; (2) We devise a &lt;strong&gt;time-division multiplexing (TDM) mechanism&lt;/strong&gt; for omni-modality streaming processing in the LLM backbone. It divides parallel omni-modality streams into sequential info within small periodic time slices.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Configurable Speech Modeling Design.&lt;/strong&gt; We devise a multimodal system prompt, including traditional text system prompt, and &lt;strong&gt;a new audio system prompt to determine the assistant voice&lt;/strong&gt;. This enables flexible voice configurations in inference time, and also facilitates end-to-end voice cloning and description-based voice creation.&lt;/li&gt;
&lt;/ul&gt;
&lt;div align=&#34;center&#34;&gt;
&lt;img src=&#34;./assets/minicpm-o-26-framework-v2.png&#34; , width=80%&gt;
&lt;/div&gt;
&lt;h3 id=&#34;evaluation-1&#34;&gt;Evaluation  &lt;!-- omit in toc --&gt;
&lt;/h3&gt;&lt;div align=&#34;center&#34;&gt;
  &lt;img src=&#34;./assets/radar.jpg&#34;, width=80%&gt;
&lt;/div&gt;
&lt;details&gt;
&lt;summary&gt;Click to view visual understanding results.&lt;/summary&gt;
&lt;p&gt;&lt;strong&gt;Image Understanding&lt;/strong&gt;&lt;/p&gt;
&lt;div align=&#34;center&#34;&gt;
&lt;table style=&#34;margin: 0px auto;&#34;&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th align=&#34;left&#34;&gt;Model&lt;/th&gt;
            &lt;th&gt;Size&lt;/th&gt;
            &lt;th&gt;Token Density&lt;sup&gt;+&lt;/sup&gt;&lt;/th&gt;
            &lt;th&gt;OpenCompass&lt;/th&gt;
            &lt;th&gt;OCRBench&lt;/th&gt;
            &lt;th&gt;MathVista mini&lt;/th&gt;
            &lt;th&gt;ChartQA&lt;/th&gt;
            &lt;th&gt;MMVet&lt;/th&gt;
            &lt;th&gt;MMStar&lt;/th&gt;
            &lt;th&gt;MME&lt;/th&gt;
            &lt;th&gt;MMB1.1 test&lt;/th&gt;
            &lt;th&gt;AI2D&lt;/th&gt;
            &lt;th&gt;MMMU val&lt;/th&gt;
            &lt;th&gt;HallusionBench&lt;/th&gt;
            &lt;th&gt;TextVQA val&lt;/th&gt;
            &lt;th&gt;DocVQA test&lt;/th&gt;
            &lt;th&gt;MathVerse mini&lt;/th&gt;
            &lt;th&gt;MathVision&lt;/th&gt;
            &lt;th&gt;MMHal Score&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody align=&#34;center&#34;&gt;
        &lt;tr&gt;
            &lt;td colspan=&#34;19&#34; align=&#34;left&#34;&gt;&lt;strong&gt;Proprietary&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;GPT-4o-20240513&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;1088&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;69.9&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;736&lt;/td&gt;
            &lt;td&gt;61.3&lt;/td&gt;
            &lt;td&gt;85.7&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;69.1&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;63.9&lt;/td&gt;
            &lt;td&gt;2328.7&lt;/td&gt;
            &lt;td&gt;82.2&lt;/td&gt;
            &lt;td&gt;84.6&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;69.2&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;55.0&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;92.8&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;50.2&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;30.4&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;3.6&lt;/u&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Claude3.5-Sonnet&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;750&lt;/td&gt;
            &lt;td&gt;67.9&lt;/td&gt;
            &lt;td&gt;788&lt;/td&gt;
            &lt;td&gt;61.6&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;90.8&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;66.0&lt;/td&gt;
            &lt;td&gt;62.2&lt;/td&gt;
            &lt;td&gt;1920.0&lt;/td&gt;
            &lt;td&gt;78.5&lt;/td&gt;
            &lt;td&gt;80.2&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;65.9&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;49.9&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;95.2&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;3.4&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Gemini 1.5 Pro&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;64.4&lt;/td&gt;
            &lt;td&gt;754&lt;/td&gt;
            &lt;td&gt;57.7&lt;/td&gt;
            &lt;td&gt;81.3&lt;/td&gt;
            &lt;td&gt;64.0&lt;/td&gt;
            &lt;td&gt;59.1&lt;/td&gt;
            &lt;td&gt;2110.6&lt;/td&gt;
            &lt;td&gt;73.9&lt;/td&gt;
            &lt;td&gt;79.1&lt;/td&gt;
            &lt;td&gt;60.6&lt;/td&gt;
            &lt;td&gt;45.6&lt;/td&gt;
            &lt;td&gt;73.5&lt;/td&gt;
            &lt;td&gt;86.5&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;19.2&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;GPT-4o-mini-20240718&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;1088&lt;/td&gt;
            &lt;td&gt;64.1&lt;/td&gt;
            &lt;td&gt;785&lt;/td&gt;
            &lt;td&gt;52.4&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;66.9&lt;/td&gt;
            &lt;td&gt;54.8&lt;/td&gt;
            &lt;td&gt;2003.4&lt;/td&gt;
            &lt;td&gt;76.0&lt;/td&gt;
            &lt;td&gt;77.8&lt;/td&gt;
            &lt;td&gt;60.0&lt;/td&gt;
            &lt;td&gt;46.1&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;3.3&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td colspan=&#34;19&#34; align=&#34;left&#34;&gt;&lt;strong&gt;Open Source&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Cambrian-34B&lt;/td&gt;
            &lt;td&gt;34B&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;1820&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;58.3&lt;/td&gt;
            &lt;td&gt;591&lt;/td&gt;
            &lt;td&gt;50.3&lt;/td&gt;
            &lt;td&gt;75.6&lt;/td&gt;
            &lt;td&gt;53.2&lt;/td&gt;
            &lt;td&gt;54.2&lt;/td&gt;
            &lt;td&gt;2049.9&lt;/td&gt;
            &lt;td&gt;77.8&lt;/td&gt;
            &lt;td&gt;79.5&lt;/td&gt;
            &lt;td&gt;50.4&lt;/td&gt;
            &lt;td&gt;41.6&lt;/td&gt;
            &lt;td&gt;76.7&lt;/td&gt;
            &lt;td&gt;75.5&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;GLM-4V-9B&lt;/td&gt;
            &lt;td&gt;13B&lt;/td&gt;
            &lt;td&gt;784&lt;/td&gt;
            &lt;td&gt;59.1&lt;/td&gt;
            &lt;td&gt;776&lt;/td&gt;
            &lt;td&gt;51.1&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;58.0&lt;/td&gt;
            &lt;td&gt;54.8&lt;/td&gt;
            &lt;td&gt;2018.8&lt;/td&gt;
            &lt;td&gt;67.9&lt;/td&gt;
            &lt;td&gt;71.2&lt;/td&gt;
            &lt;td&gt;46.9&lt;/td&gt;
            &lt;td&gt;45.0&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Pixtral-12B&lt;/td&gt;
            &lt;td&gt;12B&lt;/td&gt;
            &lt;td&gt;256&lt;/td&gt;
            &lt;td&gt;61.0&lt;/td&gt;
            &lt;td&gt;685&lt;/td&gt;
            &lt;td&gt;56.9&lt;/td&gt;
            &lt;td&gt;81.8&lt;/td&gt;
            &lt;td&gt;58.5&lt;/td&gt;
            &lt;td&gt;54.5&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;72.7&lt;/td&gt;
            &lt;td&gt;79.0&lt;/td&gt;
            &lt;td&gt;51.1&lt;/td&gt;
            &lt;td&gt;47.0&lt;/td&gt;
            &lt;td&gt;75.7&lt;/td&gt;
            &lt;td&gt;90.7&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;VITA-1.5&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;784&lt;/td&gt;
            &lt;td&gt;63.3&lt;/td&gt;
            &lt;td&gt;741&lt;/td&gt;
            &lt;td&gt;66.2&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;52.7&lt;/td&gt;
            &lt;td&gt;60.2&lt;/td&gt;
            &lt;td&gt;2328.1&lt;/td&gt;
            &lt;td&gt;76.8&lt;/td&gt;
            &lt;td&gt;79.2&lt;/td&gt;
            &lt;td&gt;52.6&lt;/td&gt;
            &lt;td&gt;44.6&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;DeepSeek-VL2-27B (4B)&lt;/td&gt;
            &lt;td&gt;27B&lt;/td&gt;
            &lt;td&gt;672&lt;/td&gt;
            &lt;td&gt;66.4&lt;/td&gt;
            &lt;td&gt;809&lt;/td&gt;
            &lt;td&gt;63.9&lt;/td&gt;
            &lt;td&gt;86.0&lt;/td&gt;
            &lt;td&gt;60.0&lt;/td&gt;
            &lt;td&gt;61.9&lt;/td&gt;
            &lt;td&gt;2253.0&lt;/td&gt;
            &lt;td&gt;81.2&lt;/td&gt;
            &lt;td&gt;83.8&lt;/td&gt;
            &lt;td&gt;54.0&lt;/td&gt;
            &lt;td&gt;45.3&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;84.2&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;93.3&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;3.0&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Qwen2-VL-7B&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;784&lt;/td&gt;
            &lt;td&gt;67.1&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;866&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;58.2&lt;/td&gt;
            &lt;td&gt;83.0&lt;/td&gt;
            &lt;td&gt;62.0&lt;/td&gt;
            &lt;td&gt;60.7&lt;/td&gt;
            &lt;td&gt;2326.0&lt;/td&gt;
            &lt;td&gt;81.8&lt;/td&gt;
            &lt;td&gt;83.0&lt;/td&gt;
            &lt;td&gt;54.1&lt;/td&gt;
            &lt;td&gt;50.6&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;84.3&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;94.5&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;31.9&lt;/td&gt;
            &lt;td&gt;16.3&lt;/td&gt;
            &lt;td&gt;3.2&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;LLaVA-OneVision-72B&lt;/td&gt;
            &lt;td&gt;72B&lt;/td&gt;
            &lt;td&gt;182&lt;/td&gt;
            &lt;td&gt;68.1&lt;/td&gt;
            &lt;td&gt;741&lt;/td&gt;
            &lt;td&gt;67.5&lt;/td&gt;
            &lt;td&gt;83.7&lt;/td&gt;
            &lt;td&gt;60.6&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;65.8&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;2261.0&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;85.0&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;85.6&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;56.8&lt;/td&gt;
            &lt;td&gt;49.0&lt;/td&gt;
            &lt;td&gt;80.5&lt;/td&gt;
            &lt;td&gt;91.3&lt;/td&gt;
            &lt;td&gt;39.1&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;3.5&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;InternVL2.5-8B&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;706&lt;/td&gt;
            &lt;td&gt;68.3&lt;/td&gt;
            &lt;td&gt;822&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;64.4&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;84.8&lt;/td&gt;
            &lt;td&gt;62.8&lt;/td&gt;
            &lt;td&gt;62.8&lt;/td&gt;
            &lt;td&gt;2344.0&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;83.6&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;84.5&lt;/td&gt;
            &lt;td&gt;56.0&lt;/td&gt;
            &lt;td&gt;50.1&lt;/td&gt;
            &lt;td&gt;79.1&lt;/td&gt;
            &lt;td&gt;93.0&lt;/td&gt;
            &lt;td&gt;39.5&lt;/td&gt;
            &lt;td&gt;19.7&lt;/td&gt;
            &lt;td&gt;3.4&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;MiniCPM-V 2.6&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;2822&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;65.2&lt;/td&gt;
            &lt;td&gt;852*&lt;/td&gt;
            &lt;td&gt;60.6&lt;/td&gt;
            &lt;td&gt;79.4&lt;/td&gt;
            &lt;td&gt;60.0&lt;/td&gt;
            &lt;td&gt;57.5&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;2348.4*&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;78.0&lt;/td&gt;
            &lt;td&gt;82.1&lt;/td&gt;
            &lt;td&gt;49.8*&lt;/td&gt;
            &lt;td&gt;48.1*&lt;/td&gt;
            &lt;td&gt;80.1&lt;/td&gt;
            &lt;td&gt;90.8&lt;/td&gt;
            &lt;td&gt;25.7&lt;/td&gt;
            &lt;td&gt;18.3&lt;/td&gt;
            &lt;td&gt;3.6&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;MiniCPM-o 2.6&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;2822&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;70.2&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;897*&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;71.9*&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;86.9*&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;67.5&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;64.0&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;2372.0*&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;80.5&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;85.8&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;50.4*&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;51.9&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;82.0&lt;/td&gt;
            &lt;td&gt;93.5&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;41.4*&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;23.1*&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;3.8&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
* We evaluate this benchmark using chain-of-thought prompting. Specifically, for MME, we used this technique only for the Cognition set.
&lt;p&gt;&lt;sup&gt;+&lt;/sup&gt; Token Density: number of pixels encoded into each visual token at maximum resolution, i.e., # pixels at maximum resolution / # visual tokens.&lt;/p&gt;
&lt;p&gt;Note: For proprietary models, we calculate token density based on the image encoding charging strategy defined in the official API documentation, which provides an upper-bound estimation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Multi-image and Video Understanding&lt;/strong&gt;&lt;/p&gt;
&lt;div align=&#34;center&#34;&gt;
&lt;table style=&#34;margin: 0px auto;&#34;&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th align=&#34;left&#34;&gt;Model&lt;/th&gt;
            &lt;th&gt;Size&lt;/th&gt;
            &lt;th&gt;BLINK val&lt;/th&gt;
            &lt;th&gt;Mantis Eval&lt;/th&gt;
            &lt;th&gt;MIRB&lt;/th&gt;
            &lt;th&gt;Video-MME (wo / w subs)&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody align=&#34;center&#34;&gt;
        &lt;tr&gt;
            &lt;td colspan=&#34;6&#34; align=&#34;left&#34;&gt;&lt;strong&gt;Proprietary&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;GPT-4o-20240513&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;68.0&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;71.9/77.2&lt;strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;GPT4V&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;54.6&lt;/td&gt;
            &lt;td&gt;62.7&lt;/td&gt;
            &lt;td&gt;53.1&lt;/td&gt;
            &lt;td&gt;59.9/63.3&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td colspan=&#34;6&#34; align=&#34;left&#34;&gt;&lt;strong&gt;Open-source&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;VITA-1.5&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;45.0&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;56.1/58.7&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;LLaVA-NeXT-Interleave 14B&lt;/td&gt;
            &lt;td&gt;14B&lt;/td&gt;
            &lt;td&gt;52.6&lt;/td&gt;
            &lt;td&gt;66.4&lt;/td&gt;
            &lt;td&gt;30.2&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;LLaVA-OneVision-72B&lt;/td&gt;
            &lt;td&gt;72B&lt;/td&gt;
            &lt;td&gt;55.4&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;77.6&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;66.2/69.5&lt;/u&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;MANTIS 8B&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;49.1&lt;/td&gt;
            &lt;td&gt;59.5&lt;/td&gt;
            &lt;td&gt;34.8&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Qwen2-VL-7B&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;53.2&lt;/td&gt;
            &lt;td&gt;69.6*&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;67.6*&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;63.3/69.0&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;InternVL2.5-8B&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;54.8&lt;/td&gt;
            &lt;td&gt;67.7&lt;/td&gt;
            &lt;td&gt;52.5&lt;/td&gt;
            &lt;td&gt;64.2/66.9&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;MiniCPM-V 2.6&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;53.0&lt;/td&gt;
            &lt;td&gt;69.1&lt;/td&gt;
            &lt;td&gt;53.8&lt;/td&gt;
            &lt;td&gt;60.9/63.6&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;MiniCPM-o 2.6&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;56.7&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;71.9&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;58.6&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;63.9/67.9&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
* We evaluate officially released checkpoints by ourselves.
&lt;/details&gt;
&lt;details&gt;
&lt;summary&gt;Click to view audio understanding and speech conversation results.&lt;/summary&gt;
&lt;p&gt;&lt;strong&gt;Audio Understanding&lt;/strong&gt;&lt;/p&gt;
&lt;div align=&#34;center&#34;&gt;
&lt;table style=&#34;margin: 0px auto;&#34;&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th align=&#34;left&#34;&gt;Task&lt;/th&gt;
            &lt;th&gt;Size&lt;/th&gt;
            &lt;th colspan=&#34;3&#34;&gt;ASR (zh)&lt;/th&gt;
            &lt;th colspan=&#34;3&#34;&gt;ASR (en)&lt;/th&gt;
            &lt;th colspan=&#34;2&#34;&gt;AST&lt;/th&gt;
            &lt;th&gt;Emotion&lt;/th&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;th align=&#34;left&#34;&gt;Metric&lt;/th&gt;
            &lt;td&gt;&lt;/td&gt;
            &lt;th colspan=&#34;3&#34;&gt;CER↓&lt;/th&gt;
            &lt;th colspan=&#34;3&#34;&gt;WER↓&lt;/th&gt;
            &lt;th colspan=&#34;2&#34;&gt;BLEU↑&lt;/th&gt;
            &lt;th&gt;ACC↑&lt;/th&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;th align=&#34;left&#34;&gt;Dataset&lt;/th&gt;
            &lt;td&gt;&lt;/td&gt;
            &lt;th&gt;AISHELL-1&lt;/th&gt;
            &lt;th&gt;Fleurs zh&lt;/th&gt;
            &lt;th&gt;WenetSpeech test-net&lt;/th&gt;
            &lt;th&gt;LibriSpeech test-clean&lt;/th&gt;
            &lt;th&gt;GigaSpeech&lt;/th&gt;
            &lt;th&gt;TED-LIUM&lt;/th&gt;
            &lt;th&gt;CoVoST en2zh&lt;/th&gt;
            &lt;th&gt;CoVoST zh2en&lt;/th&gt;
            &lt;th&gt;MELD emotion&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody align=&#34;center&#34;&gt;
        &lt;tr&gt;
            &lt;td colspan=&#34;11&#34; align=&#34;left&#34;&gt;&lt;strong&gt;Proprietary&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;GPT-4o-Realtime&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;7.3*&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;5.4*&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;28.9*&lt;/td&gt;
            &lt;td&gt;2.6*&lt;/td&gt;
            &lt;td&gt;12.9*&lt;/td&gt;
            &lt;td&gt;4.8*&lt;/td&gt;
            &lt;td&gt;37.1*&lt;/td&gt;
            &lt;td&gt;15.7*&lt;/td&gt;
            &lt;td&gt;33.2*&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Gemini 1.5 Pro&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;4.5*&lt;/td&gt;
            &lt;td&gt;5.9*&lt;/td&gt;
            &lt;td&gt;14.3*&lt;/td&gt;
            &lt;td&gt;2.9*&lt;/td&gt;
            &lt;td&gt;10.6*&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;3.0*&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;47.3*&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;22.6*&lt;/td&gt;
            &lt;td&gt;48.4*&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td colspan=&#34;11&#34; align=&#34;left&#34;&gt;&lt;strong&gt;Open-Source&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Qwen2-Audio-7B&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;7.5&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;1.6&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;45.2&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;24.4&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;55.3&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Qwen2-Audio-7B-Instruct&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;2.6*&lt;/td&gt;
            &lt;td&gt;6.9*&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;10.3*&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;3.1*&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;9.7&lt;/u&gt;*&lt;/td&gt;
            &lt;td&gt;5.9*&lt;/td&gt;
            &lt;td&gt;39.5*&lt;/td&gt;
            &lt;td&gt;22.9*&lt;/td&gt;
            &lt;td&gt;17.4*&lt;/td&gt;
        &lt;/tr&gt;
          &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;VITA-1.5&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;2.16&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;8.4&lt;/td&gt;
            &lt;td&gt;3.4&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;GLM-4-Voice-Base&lt;/td&gt;
            &lt;td&gt;9B&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;2.5&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;2.8&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;MiniCPM-o 2.6&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;1.6&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;4.4&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;6.9&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;1.7&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;8.7&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;3.0&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;48.2&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;27.2&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;52.4&lt;/u&gt;&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
* We evaluate officially released checkpoints by ourselves.&lt;br&gt;&lt;br&gt;
&lt;p&gt;&lt;strong&gt;Speech Generation&lt;/strong&gt;&lt;/p&gt;
&lt;div align=&#34;center&#34;&gt;
&lt;table style=&#34;margin: 0px auto;&#34;&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th align=&#34;left&#34;&gt;Task&lt;/th&gt;
            &lt;th&gt;Size&lt;/th&gt;
            &lt;th colspan=&#34;9&#34;&gt;SpeechQA&lt;/th&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;th align=&#34;left&#34;&gt;Metric&lt;/th&gt;
            &lt;th&gt;&lt;/th&gt;
            &lt;th colspan=&#34;3&#34;&gt;ACC↑&lt;/th&gt;
            &lt;th&gt;G-Eval (10 point)↑&lt;/th&gt;
            &lt;th&gt;Semantic ELO score↑&lt;/th&gt;
            &lt;th&gt;Acoustic ELO score↑&lt;/th&gt;
            &lt;th&gt;Overall ELO score↑&lt;/th&gt;
            &lt;th&gt;UTMOS↑&lt;/th&gt;
            &lt;th&gt;ASR-WER↓&lt;/th&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;th align=&#34;left&#34;&gt;Dataset&lt;/th&gt;
            &lt;th&gt;&lt;/th&gt;
            &lt;th&gt;Speech Llama Q.&lt;/th&gt;
            &lt;th&gt;Speech Web Q.&lt;/th&gt;
            &lt;th&gt;Speech Trivia QA&lt;/th&gt;
            &lt;th&gt;Speech AlpacaEval&lt;/th&gt;
            &lt;th colspan=&#34;5&#34;&gt;AudioArena&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody align=&#34;center&#34;&gt;
        &lt;tr&gt;
            &lt;td colspan=&#34;11&#34; align=&#34;left&#34;&gt;&lt;strong&gt;Proprietary&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;GPT-4o-Realtime&lt;/td&gt;
            &lt;td&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;71.7&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;51.6&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;69.7&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;7.4&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;1157&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;1203&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;1200&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;4.2&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;2.3&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td colspan=&#34;11&#34; align=&#34;left&#34;&gt;&lt;strong&gt;Open-Source&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;GLM-4-Voice&lt;/td&gt;
            &lt;td&gt;9B&lt;/td&gt;
            &lt;td&gt;50.0&lt;/td&gt;
            &lt;td&gt;32.0&lt;/td&gt;
            &lt;td&gt;36.4&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;5.1&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;999&lt;/td&gt;
            &lt;td&gt;1147&lt;/td&gt;
            &lt;td&gt;1035&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;4.1&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;11.7&lt;/u&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Llama-Omni&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;45.3&lt;/td&gt;
            &lt;td&gt;22.9&lt;/td&gt;
            &lt;td&gt;10.7&lt;/td&gt;
            &lt;td&gt;3.9&lt;/td&gt;
            &lt;td&gt;960&lt;/td&gt;
            &lt;td&gt;878&lt;/td&gt;
            &lt;td&gt;897&lt;/td&gt;
            &lt;td&gt;3.2&lt;/td&gt;
            &lt;td&gt;24.3&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;VITA-1.5&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;46.7&lt;/td&gt;
            &lt;td&gt;28.1&lt;/td&gt;
            &lt;td&gt;23.3&lt;/td&gt;
            &lt;td&gt;2.0&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Moshi&lt;/td&gt;
            &lt;td&gt;7B&lt;/td&gt;
            &lt;td&gt;43.7&lt;/td&gt;
            &lt;td&gt;23.8&lt;/td&gt;
            &lt;td&gt;16.7&lt;/td&gt;
            &lt;td&gt;2.4&lt;/td&gt;
            &lt;td&gt;871&lt;/td&gt;
            &lt;td&gt;808&lt;/td&gt;
            &lt;td&gt;875&lt;/td&gt;
            &lt;td&gt;2.8&lt;/td&gt;
            &lt;td&gt;8.2&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Mini-Omni&lt;/td&gt;
            &lt;td&gt;1B&lt;/td&gt;
            &lt;td&gt;22.0&lt;/td&gt;
            &lt;td&gt;12.8&lt;/td&gt;
            &lt;td&gt;6.9&lt;/td&gt;
            &lt;td&gt;2.5&lt;/td&gt;
            &lt;td&gt;926&lt;/td&gt;
            &lt;td&gt;803&lt;/td&gt;
            &lt;td&gt;865&lt;/td&gt;
            &lt;td&gt;3.4&lt;/td&gt;
            &lt;td&gt;10.0&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;MiniCPM-o 2.6&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;61.0&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;40.0&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;40.2&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;5.1&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;1088&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;1163&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;1131&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;4.2&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;9.8&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
All results are from AudioEvals, and the evaluation methods along with further details can be found in &lt;a href=&#34;https://github.com/OpenBMB/UltraEval-Audio&#34; target=&#34;_blank&#34;&gt;AudioEvals&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
&lt;p&gt;&lt;strong&gt;End-to-end Voice Cloning&lt;/strong&gt;&lt;/p&gt;
&lt;div align=&#34;center&#34;&gt;
&lt;table style=&#34;margin: 0px auto;&#34;&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th align=&#34;left&#34;&gt;Task&lt;/th&gt;
            &lt;th colspan=&#34;2&#34;&gt;Voice cloning&lt;/th&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;th align=&#34;left&#34;&gt;Metric&lt;/th&gt;
            &lt;th&gt;SIMO↑&lt;/th&gt;
            &lt;th&gt;SIMO↑&lt;/th&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;th align=&#34;left&#34;&gt;Dataset&lt;/th&gt;
            &lt;th&gt;Seed-TTS test-zh&lt;/th&gt;
            &lt;th&gt;Seed-TTS test-en&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody align=&#34;center&#34;&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;F5-TTS&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;76&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;67&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;CosyVoice&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;75&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;64&lt;/u&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;FireRedTTS&lt;/td&gt;
            &lt;td&gt;63&lt;/td&gt;
            &lt;td&gt;46&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;MiniCPM-o 2.6&lt;/td&gt;
            &lt;td&gt;57&lt;/td&gt;
            &lt;td&gt;47&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/details&gt;
&lt;details&gt;
&lt;summary&gt;Click to view multimodal live streaming results.&lt;/summary&gt;
&lt;p&gt;&lt;strong&gt;Multimodal Live Streaming&lt;/strong&gt;: results on StreamingBench&lt;/p&gt;
&lt;table style=&#34;margin: 0px auto;&#34;&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th align=&#34;left&#34;&gt;Model&lt;/th&gt;
            &lt;th&gt;Size&lt;/th&gt;
            &lt;th&gt;Real-Time Video Understanding&lt;/th&gt;
            &lt;th&gt;Omni-Source Understanding&lt;/th&gt;
            &lt;th&gt;Contextual Understanding&lt;/th&gt;
            &lt;th&gt;Overall&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody align=&#34;center&#34;&gt;
        &lt;tr&gt;
            &lt;td colspan=&#34;7&#34; align=&#34;left&#34;&gt;&lt;strong&gt;Proprietary&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Gemini 1.5 Pro&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;77.4&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;67.8&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;51.1&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;70.3&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;GPT-4o-202408&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;74.5&lt;/td&gt;
            &lt;td&gt;51.0&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;48.0&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;64.1&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Claude-3.5-Sonnet&lt;/td&gt;
            &lt;td&gt;-&lt;/td&gt;
            &lt;td&gt;74.0&lt;/td&gt;
            &lt;td&gt;41.4&lt;/td&gt;
            &lt;td&gt;37.8&lt;/td&gt;
            &lt;td&gt;59.7&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td colspan=&#34;9&#34; align=&#34;left&#34;&gt;&lt;strong&gt;Open-source&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;VILA-1.5&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;61.5&lt;/td&gt;
            &lt;td&gt;37.5&lt;/td&gt;
            &lt;td&gt;26.7&lt;/td&gt;
            &lt;td&gt;49.5&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;LongVA&lt;/td&gt;
            &lt;td&gt;7B&lt;/td&gt;
            &lt;td&gt;63.1&lt;/td&gt;
            &lt;td&gt;35.9&lt;/td&gt;
            &lt;td&gt;30.2&lt;/td&gt;
            &lt;td&gt;50.7&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;LLaVA-Next-Video-34B&lt;/td&gt;
            &lt;td&gt;34B&lt;/td&gt;
            &lt;td&gt;69.8&lt;/td&gt;
            &lt;td&gt;41.7&lt;/td&gt;
            &lt;td&gt;34.3&lt;/td&gt;
            &lt;td&gt;56.7&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;Qwen2-VL-7B&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;71.2&lt;/td&gt;
            &lt;td&gt;40.7&lt;/td&gt;
            &lt;td&gt;33.1&lt;/td&gt;
            &lt;td&gt;57.0&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;InternVL2-8B&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;70.1&lt;/td&gt;
            &lt;td&gt;42.7&lt;/td&gt;
            &lt;td&gt;34.1&lt;/td&gt;
            &lt;td&gt;57.0&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;VITA-1.5&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;70.9&lt;/td&gt;
            &lt;td&gt;40.8&lt;/td&gt;
            &lt;td&gt;35.8&lt;/td&gt;
            &lt;td&gt;57.4&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;LLaVA-OneVision-7B&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;74.3&lt;/td&gt;
            &lt;td&gt;40.8&lt;/td&gt;
            &lt;td&gt;31.0&lt;/td&gt;
            &lt;td&gt;58.4&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;InternLM-XC2.5-OL-7B&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;75.4&lt;/td&gt;
            &lt;td&gt;46.2&lt;/td&gt;
            &lt;td&gt;33.6&lt;/td&gt;
            &lt;td&gt;60.8&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;MiniCPM-V 2.6&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;72.4&lt;/td&gt;
            &lt;td&gt;40.2&lt;/td&gt;
            &lt;td&gt;33.4&lt;/td&gt;
            &lt;td&gt;57.7&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td nowrap=&#34;nowrap&#34; align=&#34;left&#34;&gt;MiniCPM-o 2.6&lt;/td&gt;
            &lt;td&gt;8B&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;79.9&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;53.4&lt;/u&gt;&lt;/td&gt;
            &lt;td&gt;38.5&lt;/td&gt;
            &lt;td&gt;&lt;u&gt;66.0&lt;/u&gt;&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;
&lt;/details&gt;
&lt;h3 id=&#34;examples-1&#34;&gt;Examples &lt;!-- omit in toc --&gt;
&lt;/h3&gt;&lt;p&gt;We deploy MiniCPM-o 2.6 on end devices. The demo video is the raw-speed recording on an iPad Pro and a Web demo.&lt;/p&gt;
&lt;div align=&#34;center&#34;&gt;
  &lt;a href=&#34;https://www.youtube.com/watch?v=vRIMbxJzStY&amp;t=2s&#34;&gt;&lt;img src=&#34;./assets/minicpmo2_6/2dot6_o_demo_video_img.png&#34;, width=70%&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;div style=&#34;display: flex; flex-direction: column; align-items: center;&#34;&gt;
  &lt;img src=&#34;assets/minicpmo2_6/minicpmo2_6_math_intersect.png&#34; alt=&#34;math&#34; style=&#34;margin-bottom: 5px;&#34;&gt;
  &lt;img src=&#34;assets/minicpmo2_6/minicpmo2_6_diagram_train_NN.png&#34; alt=&#34;diagram&#34; style=&#34;margin-bottom: 5px;&#34;&gt;
  &lt;img src=&#34;assets/minicpmo2_6/minicpmo2_6_multi-image_bike.png&#34; alt=&#34;bike&#34; style=&#34;margin-bottom: 5px;&#34;&gt;
&lt;/div&gt;
&lt;h2 id=&#34;legacy-models&#34;&gt;Legacy Models &lt;!-- omit in toc --&gt;
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;Model&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;Introduction and Guidance&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;MiniCPM-V 4.0&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;./docs/minicpm_v4_en.md&#34; &gt;Document&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;MiniCPM-V 2.6&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;./docs/minicpm_v2dot6_en.md&#34; &gt;Document&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;MiniCPM-Llama3-V 2.5&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;./docs/minicpm_llama3_v2dot5.md&#34; &gt;Document&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;MiniCPM-V 2.0&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;./docs/minicpm_v2.md&#34; &gt;Document&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;MiniCPM-V 1.0&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;./docs/minicpm_v1.md&#34; &gt;Document&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;OmniLMM-12B&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;././docs/omnilmm_en.md&#34; &gt;Document&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;minicpm-v--o-cookbook&#34;&gt;MiniCPM-V &amp;amp; o Cookbook
&lt;/h2&gt;&lt;p&gt;Discover comprehensive, ready-to-deploy solutions for the MiniCPM-V and MiniCPM-o model series in our structured &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;cookbook&lt;/a&gt;, which empowers developers to rapidly implement multimodal AI applications with integrated vision, speech, and live-streaming capabilities. Key features include:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Easy Usage Documentation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Our comprehensive &lt;a class=&#34;link&#34; href=&#34;https://minicpm-o.readthedocs.io/en/latest/index.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;documentation website&lt;/a&gt; presents every recipe in a clear, well-organized manner.
All features are displayed at a glance, making it easy for you to quickly find exactly what you need.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Broad User Spectrum&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We support a wide range of users, from individuals to enterprises and researchers.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Individuals&lt;/strong&gt;: Enjoy effortless inference using &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/ollama/minicpm-v4_ollama.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Ollama&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/llama.cpp/minicpm-v4_llamacpp.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Llama.cpp&lt;/a&gt; with minimal setup.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enterprises&lt;/strong&gt;: Achieve high-throughput, scalable performance with &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/vllm/minicpm-v4_vllm.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;vLLM&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/sglang/MiniCPM-v4_sglang.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;SGLang&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Researchers&lt;/strong&gt;: Leverage advanced frameworks including &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/finetune/finetune_full.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Transformers&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/finetune/finetune_llamafactory.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;LLaMA-Factory&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/finetune/swift.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;SWIFT&lt;/a&gt;, and &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/finetune/align_anything.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Align-anything&lt;/a&gt; to enable flexible model development and cutting-edge experimentation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Versatile Deployment Scenarios&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Our ecosystem delivers optimal solution for a variety of hardware environments and deployment demands.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Web demo&lt;/strong&gt;: Launch interactive multimodal AI web demo with &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;FastAPI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Quantized deployment&lt;/strong&gt;: Maximize efficiency and minimize resource consumption using &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/quantization/gguf/minicpm-v4_gguf_quantize.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GGUF&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/quantization/bnb/minicpm-v4_bnb_quantize.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;BNB&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;End devices&lt;/strong&gt;: Bring powerful AI experiences to &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/ios_demo/ios.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;iPhone and iPad&lt;/a&gt;, supporting offline and privacy-sensitive applications.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;chat-with-our-demo-on-gradio-&#34;&gt;Chat with Our Demo on Gradio 🤗
&lt;/h2&gt;&lt;p&gt;We provide online and local demos powered by Hugging Face Gradio &lt;a href=&#39;https://github.com/gradio-app/gradio&#39;&gt;&lt;img src=&#39;https://img.shields.io/github/stars/gradio-app/gradio&#39;&gt;&lt;/a&gt;, the most popular model deployment framework nowadays. It supports streaming outputs, progress bars, queuing, alerts, and other useful features.&lt;/p&gt;
&lt;h3 id=&#34;online-demo&#34;&gt;Online Demo &lt;!-- omit in toc --&gt;
&lt;/h3&gt;&lt;p&gt;Click here to try out the online demo of &lt;a class=&#34;link&#34; href=&#34;https://minicpm-omni-webdemo-us.modelbest.cn/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MiniCPM-o 2.6&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;http://120.92.209.146:8887/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MiniCPM-V 2.6&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MiniCPM-Llama3-V 2.5&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/spaces/openbmb/MiniCPM-V-2&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MiniCPM-V 2.0&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;local-webui-demo&#34;&gt;Local WebUI Demo &lt;!-- omit in toc --&gt;
&lt;/h3&gt;&lt;p&gt;You can easily build your own local WebUI demo using the following commands.&lt;/p&gt;
&lt;p&gt;Please ensure that &lt;code&gt;transformers==4.44.2&lt;/code&gt; is installed, as other versions may have compatibility issues.&lt;/p&gt;
&lt;p&gt;If you are using an older version of PyTorch, you might encounter this issue &lt;code&gt;&amp;quot;weight_norm_fwd_first_dim_kernel&amp;quot; not implemented for &#39;BFloat16&#39;&lt;/code&gt;, Please add &lt;code&gt;self.minicpmo_model.tts.float()&lt;/code&gt; during the model initialization.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For real-time voice/video call demo:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;launch model server:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install -r requirements_o2.6.txt
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python web_demos/minicpm-o_2.6/model_server.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;launch web server:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Make sure Node and PNPM is installed.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt-get update
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt-get install nodejs npm
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;npm install -g pnpm
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; web_demos/minicpm-o_2.6/web_server
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# create ssl cert for https, https is required to request camera and microphone permissions.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;bash ./make_ssl_cert.sh  &lt;span class=&#34;c1&#34;&gt;# output key.pem and cert.pem&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pnpm install  &lt;span class=&#34;c1&#34;&gt;# install requirements&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pnpm run dev  &lt;span class=&#34;c1&#34;&gt;# start server&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Open &lt;code&gt;https://localhost:8088/&lt;/code&gt; in browser and enjoy the real-time voice/video call.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For chatbot demo:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install -r requirements_o2.6.txt
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python web_demos/minicpm-o_2.6/chatbot_web_demo_o2.6.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Open &lt;code&gt;http://localhost:8000/&lt;/code&gt; in browser and enjoy the vision mode chatbot.&lt;/p&gt;
&lt;h2 id=&#34;inference&#34;&gt;Inference
&lt;/h2&gt;&lt;h3 id=&#34;model-zoo&#34;&gt;Model Zoo
&lt;/h3&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;Model&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;Device&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;Memory&lt;/th&gt;
          &lt;th style=&#34;text-align: left&#34;&gt;         Description&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;Download&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;MiniCPM-V 4.5&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;GPU&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;18 GB&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;The latest version, strong end-side multimodal performance for single image, multi-image and video understanding.&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-V-4_5&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🤗&lt;/a&gt;    &lt;a class=&#34;link&#34; href=&#34;https://modelscope.cn/models/OpenBMB/MiniCPM-V-4_5&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;./assets/modelscope_logo.png&#34; width=&#34;20px&#34;&gt;&lt;/img&gt;&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;MiniCPM-V 4.5 gguf&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;CPU&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;8 GB&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;The gguf version, lower memory usage and faster inference.&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-V-4_5-gguf&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🤗&lt;/a&gt;    &lt;a class=&#34;link&#34; href=&#34;https://modelscope.cn/models/OpenBMB/MiniCPM-V-4_5-gguf&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;./assets/modelscope_logo.png&#34; width=&#34;20px&#34;&gt;&lt;/img&gt;&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;MiniCPM-V 4.5 int4&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;GPU&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;9 GB&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;The int4 quantized version, lower GPU memory usage.&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-V-4_5-int4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🤗&lt;/a&gt;    &lt;a class=&#34;link&#34; href=&#34;https://modelscope.cn/models/OpenBMB/MiniCPM-V-4_5-int4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;./assets/modelscope_logo.png&#34; width=&#34;20px&#34;&gt;&lt;/img&gt;&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;MiniCPM-V 4.5 AWQ&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;GPU&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;9 GB&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;The int4 quantized version, lower GPU memory usage.&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-V-4_5-AWQ&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🤗&lt;/a&gt;    &lt;a class=&#34;link&#34; href=&#34;https://modelscope.cn/models/OpenBMB/MiniCPM-V-4_5-AWQ&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;./assets/modelscope_logo.png&#34; width=&#34;20px&#34;&gt;&lt;/img&gt;&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;MiniCPM-o 2.6&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;GPU&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;18 GB&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;The latest version, achieving GPT-4o level performance for vision, speech and multimodal live streaming on end-side devices.&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-o-2_6&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🤗&lt;/a&gt;    &lt;a class=&#34;link&#34; href=&#34;https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;./assets/modelscope_logo.png&#34; width=&#34;20px&#34;&gt;&lt;/img&gt;&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;MiniCPM-o 2.6 gguf&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;CPU&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;8 GB&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;The gguf version, lower memory usage and faster inference.&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-o-2_6-gguf&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🤗&lt;/a&gt;    &lt;a class=&#34;link&#34; href=&#34;https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6-gguf&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;./assets/modelscope_logo.png&#34; width=&#34;20px&#34;&gt;&lt;/img&gt;&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;MiniCPM-o 2.6 int4&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;GPU&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;9 GB&lt;/td&gt;
          &lt;td style=&#34;text-align: left&#34;&gt;The int4 quantized version, lower GPU memory usage.&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/openbmb/MiniCPM-o-2_6-int4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;🤗&lt;/a&gt;    &lt;a class=&#34;link&#34; href=&#34;https://modelscope.cn/models/OpenBMB/MiniCPM-o-2_6-int4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;./assets/modelscope_logo.png&#34; width=&#34;20px&#34;&gt;&lt;/img&gt;&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&#34;multi-turn-conversation&#34;&gt;Multi-turn Conversation
&lt;/h3&gt;&lt;p&gt;If you wish to enable long-thinking mode, provide the argument &lt;code&gt;enable_thinking=True&lt;/code&gt; to the chat function.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install -r requirements_o2.6.txt
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Please refer to the following codes to run.&lt;/p&gt;
&lt;div align=&#34;center&#34;&gt;
&lt;img src=&#34;assets/minicpmo2_6/show_demo.jpg&#34; width=&#34;500px&#34;&gt;
&lt;/div&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;22
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;23
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;24
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;25
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;26
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;27
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;28
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;29
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;30
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;31
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;32
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;33
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;34
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;35
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;torch&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;PIL&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;transformers&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;torch&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;manual_seed&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;100&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-V-4_5&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# or openbmb/MiniCPM-o-2_6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;attn_implementation&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;sdpa&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;torch_dtype&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;torch&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;bfloat16&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# sdpa or flash_attention_2, no eager&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;eval&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cuda&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-V-4_5&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# or openbmb/MiniCPM-o-2_6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;image&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;open&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;./assets/minicpmo2_6/show_demo.jpg&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;convert&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;RGB&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;enable_thinking&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;False&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# If `enable_thinking=True`, the long-thinking mode is enabled.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# First round chat &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;question&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;What is the landform in the picture?&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;image&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;question&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]}]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;answer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;enable_thinking&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;enable_thinking&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;answer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Second round chat, pass history context of multi-turn conversation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;append&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;({&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;assistant&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;answer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]})&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;append&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;({&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;user&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;What should I pay attention to when traveling here?&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]})&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;answer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;answer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;You will get the following output:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# round1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;The landform in the picture is karst topography. Karst landscapes are characterized by distinctive, jagged limestone hills or mountains with steep, irregular peaks and deep valleys—exactly what you see here These unique formations result from the dissolution of soluble rocks like limestone over millions of years through water erosion.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;This scene closely resembles the famous karst landscape of Guilin and Yangshuo in China’s Guangxi Province. The area features dramatic, pointed limestone peaks rising dramatically above serene rivers and lush green forests, creating a breathtaking and iconic natural beauty that attracts millions of visitors each year &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; its picturesque views.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# round2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;When traveling to a karst landscape like this, here are some important tips:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;1. Wear comfortable shoes: The terrain can be uneven and hilly.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;2. Bring water and snacks &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; energy during hikes or boat rides.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;3. Protect yourself from the sun with sunscreen, hats, and sunglasses—especially since you’ll likely spend &lt;span class=&#34;nb&#34;&gt;time&lt;/span&gt; outdoors exploring scenic spots.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;4. Respect &lt;span class=&#34;nb&#34;&gt;local&lt;/span&gt; customs and nature regulations by not littering or disturbing wildlife.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;By following these guidelines, you&lt;span class=&#34;err&#34;&gt;&amp;#39;&lt;/span&gt;ll have a safe and enjoyable trip &lt;span class=&#34;k&#34;&gt;while&lt;/span&gt; appreciating the stunning natural beauty of places such as Guilin’s karst mountains.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h4 id=&#34;chat-with-multiple-images&#34;&gt;Chat with Multiple Images
&lt;/h4&gt;&lt;details&gt;
&lt;summary&gt; Click to view Python code running MiniCPM-V-4_5 with multiple images input. &lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;torch&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;PIL&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;transformers&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-V-4_5&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;  &lt;span class=&#34;c1&#34;&gt;# or openbmb/MiniCPM-o-2_6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;attn_implementation&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;sdpa&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;torch_dtype&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;torch&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;bfloat16&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# sdpa or flash_attention_2, no eager&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;eval&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cuda&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-V-4_5&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;  &lt;span class=&#34;c1&#34;&gt;# or openbmb/MiniCPM-o-2_6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;image1&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;open&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;image1.jpg&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;convert&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;RGB&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;image2&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;open&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;image2.jpg&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;convert&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;RGB&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;question&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;Compare image 1 and image 2, tell me about the differences between image 1 and image 2.&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;image1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;image2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;question&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]}]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;answer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;answer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/details&gt;
&lt;h4 id=&#34;in-context-few-shot-learning&#34;&gt;In-context Few-shot Learning
&lt;/h4&gt;&lt;details&gt;
&lt;summary&gt; Click to view Python code running MiniCPM-V-4_5 with few-shot input. &lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;22
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;23
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;24
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;25
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;26
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;27
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;torch&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;PIL&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;transformers&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-V-4_5&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;  &lt;span class=&#34;c1&#34;&gt;# or openbmb/MiniCPM-o-2_6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;attn_implementation&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;sdpa&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;torch_dtype&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;torch&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;bfloat16&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# sdpa or flash_attention_2, no eager&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;eval&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cuda&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-V-4_5&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;  &lt;span class=&#34;c1&#34;&gt;# or openbmb/MiniCPM-o-2_6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;question&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;production date&amp;#34;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;image1&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;open&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;example1.jpg&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;convert&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;RGB&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;answer1&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;2023.08.04&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;image2&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;open&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;example2.jpg&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;convert&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;RGB&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;answer2&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;2007.04.24&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;image_test&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;open&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;test.jpg&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;convert&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;RGB&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;image1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;question&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]},&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;assistant&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;answer1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;image2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;question&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]},&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;assistant&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;answer2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;image_test&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;question&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;answer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;answer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/details&gt;
&lt;h4 id=&#34;chat-with-video&#34;&gt;Chat with Video
&lt;/h4&gt;&lt;details&gt;
&lt;summary&gt; Click to view Python code running MiniCPM-V-4_5 by with video input and 3D-Resampler. &lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;22
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;23
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;24
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;25
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;26
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;27
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;28
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;29
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;30
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;31
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;32
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;33
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;34
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;35
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;36
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;37
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;38
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;39
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;40
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;41
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;42
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;43
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;44
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;45
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;46
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;47
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;48
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;49
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;50
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;51
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;52
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;53
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;54
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;55
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;56
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;57
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;58
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;59
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;60
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;61
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;62
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;63
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;64
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;65
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;66
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;67
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;68
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;69
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;70
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;71
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;72
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;73
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;74
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;75
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;76
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;77
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;78
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;79
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;80
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;81
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;82
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;83
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;84
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;85
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;86
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;87
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;88
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;89
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;90
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;91
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;92
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;93
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;94
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;95
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;96
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;## The 3d-resampler compresses multiple frames into 64 tokens by introducing temporal_ids. &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# To achieve this, you need to organize your video data into two corresponding sequences: &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;#   frames: List[Image]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;#   temporal_ids: List[List[Int]].&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;torch&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;PIL&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;transformers&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;decord&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;VideoReader&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;cpu&lt;/span&gt;    &lt;span class=&#34;c1&#34;&gt;# pip install decord&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;scipy.spatial&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;cKDTree&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;numpy&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;math&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-V-4_5&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;  &lt;span class=&#34;c1&#34;&gt;# or openbmb/MiniCPM-o-2_6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;attn_implementation&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;sdpa&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;torch_dtype&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;torch&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;bfloat16&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# sdpa or flash_attention_2, no eager&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;eval&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cuda&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-V-4_5&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;  &lt;span class=&#34;c1&#34;&gt;# or openbmb/MiniCPM-o-2_6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;MAX_NUM_FRAMES&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;180&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# Indicates the maximum number of frames received after the videos are packed. The actual maximum number of valid frames is MAX_NUM_FRAMES * MAX_NUM_PACKING.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;MAX_NUM_PACKING&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;3&lt;/span&gt;  &lt;span class=&#34;c1&#34;&gt;# indicates the maximum packing number of video frames. valid range: 1-6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;TIME_SCALE&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;mf&#34;&gt;0.1&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;map_to_nearest_scale&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;values&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;scale&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tree&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;cKDTree&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;asarray&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;scale&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)[:,&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;None&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;_&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;indices&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;tree&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;query&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;asarray&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;values&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)[:,&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;None&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;asarray&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;scale&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;indices&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;group_array&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;arr&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;size&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;arr&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;+&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;size&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;i&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;len&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;arr&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;size&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;encode_video&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;video_path&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;choose_fps&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;force_packing&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;None&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;uniform_sample&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;l&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;n&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;gap&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;len&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;l&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;/&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;n&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;idxs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;int&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;*&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;gap&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;+&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;gap&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;/&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;i&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;n&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;l&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;i&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;idxs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;vr&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;VideoReader&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;video_path&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;ctx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cpu&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;fps&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;vr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get_avg_fps&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;video_duration&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;len&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;vr&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;/&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;fps&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;choose_fps&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;*&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;int&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;video_duration&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MAX_NUM_FRAMES&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;packing_nums&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;choose_frames&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;round&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;min&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;choose_fps&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;round&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;fps&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;*&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;min&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MAX_NUM_FRAMES&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;video_duration&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;else&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;packing_nums&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;math&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ceil&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;video_duration&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;*&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;choose_fps&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;/&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MAX_NUM_FRAMES&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;packing_nums&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MAX_NUM_PACKING&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;choose_frames&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;round&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;video_duration&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;*&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;choose_fps&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;k&#34;&gt;else&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;choose_frames&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;round&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;MAX_NUM_FRAMES&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;*&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MAX_NUM_PACKING&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;packing_nums&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MAX_NUM_PACKING&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;frame_idx&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;i&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;len&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;vr&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))]&lt;/span&gt;      
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;frame_idx&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;  &lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;array&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;uniform_sample&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;frame_idx&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;choose_frames&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;force_packing&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;packing_nums&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;min&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;force_packing&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;MAX_NUM_PACKING&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;video_path&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39; duration:&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;video_duration&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;get video frames=&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;len&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;frame_idx&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;, packing_nums=&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;packing_nums&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;frames&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;vr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get_batch&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;frame_idx&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;asnumpy&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;frame_idx_ts&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;frame_idx&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;/&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;fps&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;scale&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;arange&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;video_duration&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;TIME_SCALE&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;frame_ts_id&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;map_to_nearest_scale&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;frame_idx_ts&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;scale&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;/&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;TIME_SCALE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;frame_ts_id&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;frame_ts_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;astype&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;int32&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;assert&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;len&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;frames&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;==&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;len&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;frame_ts_id&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;frames&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;fromarray&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;v&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;astype&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;uint8&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;convert&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;RGB&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;v&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;frames&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;frame_ts_id_group&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;group_array&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;frame_ts_id&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;packing_nums&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;frames&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;frame_ts_id_group&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;video_path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;video_test.mp4&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;fps&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;5&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# fps for video&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;force_packing&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;None&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# You can set force_packing to ensure that 3D packing is forcibly enabled; otherwise, encode_video will dynamically set the packing quantity based on the duration.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;frames&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;frame_ts_id_group&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;encode_video&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;video_path&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;fps&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;force_packing&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;force_packing&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;question&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Describe the video&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;frames&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;+&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;question&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]},&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;answer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;use_image_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;False&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;max_slice_nums&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;temporal_ids&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;frame_ts_id_group&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;answer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/details&gt;
&lt;h4 id=&#34;speech-and-audio-mode&#34;&gt;Speech and Audio Mode
&lt;/h4&gt;&lt;p&gt;Model initialization&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;torch&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;librosa&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;transformers&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-o-2_6&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;attn_implementation&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;sdpa&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;torch_dtype&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;torch&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;bfloat16&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# sdpa or flash_attention_2, no eager&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;eval&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cuda&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-o-2_6&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;init_tts&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tts&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;float&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;hr/&gt;
&lt;h5 id=&#34;mimick&#34;&gt;Mimick &lt;!-- omit in toc --&gt;
&lt;/h5&gt;&lt;p&gt;&lt;code&gt;Mimick&lt;/code&gt; task reflects a model&amp;rsquo;s end-to-end speech modeling capability. The model takes audio input, and outputs an ASR transcription and subsequently reconstructs the original audio with high similarity. The higher the similarity between the reconstructed audio and the original audio, the stronger the model&amp;rsquo;s foundational capability in end-to-end speech modeling.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;mimick_prompt&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Please repeat each user&amp;#39;s speech, including voice style and speech content.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;audio_input&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;_&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;librosa&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;load&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;./assets/input_examples/Trump_WEF_2018_10s.mp3&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;sr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;16000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;mono&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# load the audio to be mimicked&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# `./assets/input_examples/fast-pace.wav`, &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# `./assets/input_examples/chi-english-1.wav` &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# `./assets/input_examples/exciting-emotion.wav` &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# for different aspects of speech-centric features.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;mimick_prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;audio_input&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]}]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;sampling&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;max_new_tokens&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;128&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;use_tts_template&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;temperature&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;output_audio_path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;output_mimick.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# save the tts result to output_audio_path&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;hr/&gt;
&lt;h5 id=&#34;general-speech-conversation-with-configurable-voices&#34;&gt;General Speech Conversation with Configurable Voices &lt;!-- omit in toc --&gt;
&lt;/h5&gt;&lt;p&gt;A general usage scenario of &lt;code&gt;MiniCPM-o-2.6&lt;/code&gt; is role-playing a specific character based on the audio prompt. It will mimic the voice of the character to some extent and act like the character in text, including language style. In this mode, &lt;code&gt;MiniCPM-o-2.6&lt;/code&gt; sounds &lt;strong&gt;more natural and human-like&lt;/strong&gt;. Self-defined audio prompts can be used to customize the voice of the character in an end-to-end manner.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;22
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;23
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;24
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;25
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;26
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;27
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;28
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;29
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;30
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;31
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;32
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;ref_audio&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;_&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;librosa&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;load&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;./assets/input_examples/icl_20.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;sr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;16000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;mono&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# load the reference audio&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;sys_prompt&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get_sys_prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ref_audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ref_audio&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;mode&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;audio_roleplay&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;language&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;en&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# round one&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;user_question&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;librosa&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;load&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;xxx.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;sr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;16000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;mono&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)[&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]]}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sys_prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;user_question&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;sampling&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;max_new_tokens&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;128&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;use_tts_template&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;temperature&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;output_audio_path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;result_roleplay_round_1.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# round two&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;history&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;append&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;({&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;assistant&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;res&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;})&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;user_question&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;librosa&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;load&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;xxx.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;sr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;16000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;mono&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)[&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]]}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;history&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;append&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;user_question&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;sampling&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;max_new_tokens&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;128&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;use_tts_template&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;temperature&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;output_audio_path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;result_roleplay_round_2.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;hr/&gt;
&lt;h5 id=&#34;speech-conversation-as-an-ai-assistant&#34;&gt;Speech Conversation as an AI Assistant &lt;!-- omit in toc --&gt;
&lt;/h5&gt;&lt;p&gt;An enhanced feature of &lt;code&gt;MiniCPM-o-2.6&lt;/code&gt; is to act as an AI assistant, but only with limited choice of voices. In this mode, &lt;code&gt;MiniCPM-o-2.6&lt;/code&gt; is &lt;strong&gt;less human-like and more like a voice assistant&lt;/strong&gt;. In this mode, the model is more instruction-following. For demo, you are suggested to use &lt;code&gt;assistant_female_voice&lt;/code&gt;, &lt;code&gt;assistant_male_voice&lt;/code&gt;, and &lt;code&gt;assistant_default_female_voice&lt;/code&gt;. Other voices may work but not as stable as the default voices.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Please note that, &lt;code&gt;assistant_female_voice&lt;/code&gt; and &lt;code&gt;assistant_male_voice&lt;/code&gt; are more stable but sounds like robots, while &lt;code&gt;assistant_default_female_voice&lt;/code&gt; is more human-alike but not stable, its voice often changes in multiple turns. We suggest you to try stable voices &lt;code&gt;assistant_female_voice&lt;/code&gt; and &lt;code&gt;assistant_male_voice&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;22
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;23
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;24
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;25
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;26
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;27
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;28
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;29
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;30
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;31
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;32
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;ref_audio&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;_&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;librosa&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;load&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;./assets/input_examples/assistant_female_voice.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;sr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;16000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;mono&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# or use `./assets/input_examples/assistant_male_voice.wav`&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;sys_prompt&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get_sys_prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ref_audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ref_audio&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;mode&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;audio_assistant&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;language&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;en&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;user_question&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;librosa&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;load&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;xxx.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;sr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;16000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;mono&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)[&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]]}&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# load the user&amp;#39;s audio question&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# round one&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sys_prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;user_question&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;sampling&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;max_new_tokens&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;128&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;use_tts_template&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;temperature&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;output_audio_path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;result_assistant_round_1.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# round two&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;history&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;append&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;({&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;assistant&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;res&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;})&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;user_question&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;librosa&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;load&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;xxx.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;sr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;16000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;mono&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)[&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]]}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;history&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;append&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;user_question&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;sampling&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;max_new_tokens&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;128&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;use_tts_template&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;temperature&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;output_audio_path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;result_assistant_round_2.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;hr/&gt;
&lt;h5 id=&#34;instruction-to-speech&#34;&gt;Instruction-to-Speech &lt;!-- omit in toc --&gt;
&lt;/h5&gt;&lt;p&gt;&lt;code&gt;MiniCPM-o-2.6&lt;/code&gt; can also do Instruction-to-Speech, aka &lt;strong&gt;Voice Creation&lt;/strong&gt;. You can describe a voice in detail, and the model will generate a voice that matches the description. For more Instruction-to-Speech sample instructions, you can refer to &lt;a class=&#34;link&#34; href=&#34;https://voxinstruct.github.io/VoxInstruct/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://voxinstruct.github.io/VoxInstruct/&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;instruction&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;Speak like a male charming superstar, radiating confidence and style in every word.&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;instruction&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]}]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;sampling&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;max_new_tokens&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;128&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;use_tts_template&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;temperature&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;output_audio_path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;result_voice_creation.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;hr/&gt;
&lt;h5 id=&#34;voice-cloning&#34;&gt;Voice Cloning &lt;!-- omit in toc --&gt;
&lt;/h5&gt;&lt;p&gt;&lt;code&gt;MiniCPM-o-2.6&lt;/code&gt; can also do zero-shot text-to-speech, aka &lt;strong&gt;Voice Cloning&lt;/strong&gt;. With this mode, model will act like a TTS model.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;ref_audio&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;_&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;librosa&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;load&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;./assets/input_examples/icl_20.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;sr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;16000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;mono&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# load the reference audio&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;sys_prompt&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get_sys_prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ref_audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ref_audio&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;mode&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;voice_cloning&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;language&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;en&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;text_prompt&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Please read the text below.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;user_question&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;text_prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;content that you want to read&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sys_prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;user_question&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;sampling&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;max_new_tokens&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;128&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;use_tts_template&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;temperature&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;output_audio_path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;result_voice_cloning.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;hr/&gt;
&lt;h5 id=&#34;addressing-various-audio-understanding-tasks&#34;&gt;Addressing Various Audio Understanding Tasks &lt;!-- omit in toc --&gt;
&lt;/h5&gt;&lt;p&gt;&lt;code&gt;MiniCPM-o-2.6&lt;/code&gt; can also be used to address various audio understanding tasks, such as ASR, speaker analysis, general audio captioning, and sound scene tagging.&lt;/p&gt;
&lt;p&gt;For audio-to-text tasks, you can use the following prompts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ASR with ZH(same as AST en2zh): &lt;code&gt;请仔细听这段音频片段，并将其内容逐字记录。&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;ASR with EN(same as AST zh2en): &lt;code&gt;Please listen to the audio snippet carefully and transcribe the content.&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Speaker Analysis: &lt;code&gt;Based on the speaker&#39;s content, speculate on their gender, condition, age range, and health status.&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;General Audio Caption: &lt;code&gt;Summarize the main content of the audio.&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;General Sound Scene Tagging: &lt;code&gt;Utilize one keyword to convey the audio&#39;s content or the associated scene.&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;task_prompt&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Please listen to the audio snippet carefully and transcribe the content.&amp;#34;&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;+&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span class=&#34;se&#34;&gt;\n&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# can change to other prompts.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;audio_input&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;_&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;librosa&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;load&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;./assets/input_examples/audio_understanding.mp3&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;sr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;16000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;mono&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# load the audio to be captioned&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;task_prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;audio_input&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]}]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;sampling&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;max_new_tokens&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;128&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;use_tts_template&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;temperature&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.3&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;output_audio_path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;result_audio_understanding.wav&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h4 id=&#34;multimodal-live-streaming&#34;&gt;Multimodal Live Streaming
&lt;/h4&gt;&lt;details&gt;
&lt;summary&gt; Click to view Python code running MiniCPM-o 2.6 with chat inference. &lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;22
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;23
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;24
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;25
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;26
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;27
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;28
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;29
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;30
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;31
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;32
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;33
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;34
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;35
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;36
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;37
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;38
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;39
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;40
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;41
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;42
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;43
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;44
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;45
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;46
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;47
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;48
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;49
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;50
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;51
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;52
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;53
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;54
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;55
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;56
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;57
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;58
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;59
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;60
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;61
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;62
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;63
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;64
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;65
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;66
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;67
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;68
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;69
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;70
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;71
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;72
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;73
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;74
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;75
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;math&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;numpy&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;PIL&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;moviepy.editor&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;VideoFileClip&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;tempfile&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;librosa&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;soundfile&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;sf&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;torch&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;transformers&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;get_video_chunk_content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;video_path&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;flatten&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;video&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;VideoFileClip&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;video_path&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;video_duration:&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;video&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;duration&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;with&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;tempfile&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NamedTemporaryFile&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;suffix&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;.wav&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;delete&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;temp_audio_file&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;temp_audio_file_path&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;temp_audio_file&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;name&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;video&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;write_audiofile&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;temp_audio_file_path&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;codec&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;pcm_s16le&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;fps&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;16000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;audio_np&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;sr&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;librosa&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;load&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;temp_audio_file_path&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;sr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;16000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;mono&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;num_units&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;math&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ceil&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;video&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;duration&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;c1&#34;&gt;# 1 frame + 1s audio chunk&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;contents&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;i&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;range&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;num_units&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;frame&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;video&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get_frame&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;+&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;image&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;fromarray&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;((&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;frame&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;astype&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;uint8&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;audio&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;audio_np&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sr&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;i&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;+&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;flatten&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;contents&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;extend&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;([&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;lt;unit&amp;gt;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;image&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;audio&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;k&#34;&gt;else&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;contents&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;append&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;([&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;lt;unit&amp;gt;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;image&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;audio&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;contents&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-o-2_6&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;attn_implementation&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;sdpa&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;torch_dtype&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;torch&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;bfloat16&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;eval&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cuda&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-o-2_6&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;init_tts&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# If you are using an older version of PyTorch, you might encounter this issue &amp;#34;weight_norm_fwd_first_dim_kernel&amp;#34; not implemented for &amp;#39;BFloat16&amp;#39;, Please convert the TTS to float32 type.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# model.tts.float()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# https://huggingface.co/openbmb/MiniCPM-o-2_6/blob/main/assets/Skiing.mp4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;video_path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;assets/Skiing.mp4&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;sys_msg&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get_sys_prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;mode&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;omni&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;language&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;en&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# if use voice clone prompt, please set ref_audio&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# ref_audio_path = &amp;#39;/path/to/ref_audio&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# ref_audio, _ = librosa.load(ref_audio_path, sr=16000, mono=True)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# sys_msg = model.get_sys_prompt(ref_audio=ref_audio, mode=&amp;#39;omni&amp;#39;, language=&amp;#39;en&amp;#39;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;contents&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;get_video_chunk_content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;video_path&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msg&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;user&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;contents&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sys_msg&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;msg&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# please set generate_audio=True and output_audio_path to save the tts result&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;output_audio_path&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;output.wav&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;sampling&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;temperature&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.5&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;max_new_tokens&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;4096&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;omni_input&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# please set omni_input=True when omni inference&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;use_tts_template&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;output_audio_path&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;output_audio_path&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;max_slice_nums&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;use_image_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;False&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;return_dict&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/details&gt;
&lt;details&gt;
&lt;summary&gt; Click to view Python code running MiniCPM-o 2.6 with streaming inference. &lt;/summary&gt;
&lt;p&gt;Note: The streaming inference has a slight performance degradation because the audio encoding is not global.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;22
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;23
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;24
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;25
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;26
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;27
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;28
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;29
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;30
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;31
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;32
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;33
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;34
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;35
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;36
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;37
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;38
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;39
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;40
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;41
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;42
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;43
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;44
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;45
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;46
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;47
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;48
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;49
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;50
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;51
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# a new conversation need reset session first, it will reset the kv-cache&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;reset_session&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;contents&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;get_video_chunk_content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;video_path&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;flatten&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;False&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;session_id&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;123&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# 1. prefill system prompt&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;streaming_prefill&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;session_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;session_id&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sys_msg&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;],&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# 2. prefill video/audio chunks&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;content&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;contents&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;user&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;content&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;streaming_prefill&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;session_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;session_id&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# 3. generate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;streaming_generate&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;session_id&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;session_id&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;temperature&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;0.5&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;audios&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;text&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;generate_audio&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;r&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;res&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;audio_wav&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;r&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;audio_wav&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;sampling_rate&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;r&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sampling_rate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;txt&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;r&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;text&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;audios&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;append&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;audio_wav&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;text&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;+=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;txt&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;res&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;np&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;concatenate&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;audios&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;sf&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;write&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;output.wav&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;res&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;samplerate&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sampling_rate&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;text:&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;text&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;audio saved to output.wav&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;else&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;r&lt;/span&gt; &lt;span class=&#34;ow&#34;&gt;in&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;res&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;text&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;+=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;r&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;text&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;text:&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;text&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/details&gt;
&lt;h3 id=&#34;inference-on-multiple-gpus&#34;&gt;Inference on Multiple GPUs
&lt;/h3&gt;&lt;p&gt;You can run MiniCPM-Llama3-V 2.5 on multiple low VRAM GPUs (12 GB or 16 GB) by distributing the model&amp;rsquo;s layers across multiple GPUs. Please refer to this &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tutorial&lt;/a&gt; for detailed instructions on how to load the model and inference using multiple low VRAM GPUs.&lt;/p&gt;
&lt;h3 id=&#34;inference-on-mac&#34;&gt;Inference on Mac
&lt;/h3&gt;&lt;details&gt;
&lt;summary&gt;Click to view an example, to run MiniCPM-Llama3-V 2.5 on 💻 Mac with MPS (Apple silicon or AMD GPUs). &lt;/summary&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;22
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;23
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# test.py  Need more than 16GB memory.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;torch&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;PIL&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;transformers&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoModel&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-Llama3-V-2_5&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;low_cpu_mem_usage&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;to&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;device&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;mps&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;openbmb/MiniCPM-Llama3-V-2_5&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;trust_remote_code&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;eval&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;image&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Image&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;open&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;./assets/hk_OCR.jpg&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;convert&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;RGB&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;question&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;Where is this photo taken?&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[{&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;role&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s1&#34;&gt;&amp;#39;content&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;question&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;answer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;context&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;_&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chat&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;image&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;image&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;msgs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;context&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;None&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;sampling&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;answer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Run with command:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;PYTORCH_ENABLE_MPS_FALLBACK&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;1&lt;/span&gt; python test.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/details&gt;
&lt;h3 id=&#34;efficient-inference-with-llamacpp-ollama-vllm&#34;&gt;Efficient Inference with llama.cpp, Ollama, vLLM
&lt;/h3&gt;&lt;p&gt;See &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/llama.cpp/tree/minicpmv-main/examples/llava/README-minicpmv2.6.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;our fork of llama.cpp&lt;/a&gt; for more detail. This implementation supports smooth inference of 16~18 token/s on iPad (test environment：iPad Pro + M4).&lt;/p&gt;
&lt;p&gt;See &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/ollama/blob/minicpm-v2.6/examples/minicpm-v2.6/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;our fork of Ollama&lt;/a&gt; for more detail. This implementation supports smooth inference of 16~18 token/s on iPad (test environment：iPad Pro + M4).&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt; vLLM now officially supports MiniCPM-V 2.6, MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0. And you can use our fork to run MiniCPM-o 2.6 for now. Click to see. &lt;/summary&gt;
&lt;ol&gt;
&lt;li&gt;Install vLLM(&amp;gt;=0.7.1):&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install vllm
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;Run Example:&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://docs.vllm.ai/en/latest/getting_started/examples/vision_language.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Vision Language&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://docs.vllm.ai/en/latest/getting_started/examples/audio_language.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Audio Language&lt;/a&gt;
&lt;/details&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;fine-tuning&#34;&gt;Fine-tuning
&lt;/h2&gt;&lt;h3 id=&#34;simple-fine-tuning&#34;&gt;Simple Fine-tuning &lt;!-- omit in toc --&gt;
&lt;/h3&gt;&lt;p&gt;We support simple fine-tuning with Hugging Face for MiniCPM-o 2.6, MiniCPM-V 2.6, MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.0.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;./finetune/readme.md&#34; &gt;Reference Document&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;with-align-anything&#34;&gt;With Align-Anything &lt;!-- omit in toc --&gt;
&lt;/h3&gt;&lt;p&gt;We support fine-tuning MiniCPM-o 2.6 by PKU-Alignment Team (both vision and audio, SFT and DPO) with the &lt;a class=&#34;link&#34; href=&#34;https://github.com/PKU-Alignment/align-anything&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Align-Anything framework&lt;/a&gt;. Align-Anything is a scalable framework that aims to align any-modality large models with human intentions, open-sourcing the &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/datasets/PKU-Alignment/align-anything&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;datasets, models and benchmarks&lt;/a&gt;. Benefiting from its concise and modular design, it supports 30+ open-source benchmarks, 40+ models and algorithms including SFT, SimPO, RLHF, &lt;em&gt;etc&lt;/em&gt;. It also provides 30+ directly runnable scripts, making it suitable for beginners to quickly get started.&lt;/p&gt;
&lt;p&gt;Best Practices: &lt;a class=&#34;link&#34; href=&#34;https://github.com/PKU-Alignment/align-anything/tree/main/scripts&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MiniCPM-o 2.6&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;with-llama-factory&#34;&gt;With LLaMA-Factory &lt;!-- omit in toc --&gt;
&lt;/h3&gt;&lt;p&gt;We support fine-tuning MiniCPM-o 2.6 and MiniCPM-V 2.6 with the LLaMA-Factory framework. LLaMA-Factory provides a solution for flexibly customizing the fine-tuning (Lora/Full/Qlora) of 200+ LLMs without the need for coding through the built-in web UI LLaMABoard. It supports various training methods like sft/ppo/dpo/kto and advanced algorithms like Galore/BAdam/LLaMA-Pro/Pissa/LongLoRA.&lt;/p&gt;
&lt;p&gt;Best Practices: &lt;a class=&#34;link&#34; href=&#34;./docs/llamafactory_train_and_infer.md&#34; &gt;MiniCPM-o 2.6 | MiniCPM-V 2.6&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;with-the-swift-framework&#34;&gt;With the SWIFT Framework &lt;!-- omit in toc --&gt;
&lt;/h3&gt;&lt;p&gt;We now support MiniCPM-V series fine-tuning with the SWIFT framework. SWIFT supports training, inference, evaluation and deployment of nearly 200 LLMs and MLLMs . It supports the lightweight training solutions provided by PEFT and a complete Adapters Library including techniques such as NEFTune, LoRA+ and LLaMA-PRO.&lt;/p&gt;
&lt;p&gt;Best Practices：&lt;a class=&#34;link&#34; href=&#34;https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v%e6%9c%80%e4%bd%b3%e5%ae%9e%e8%b7%b5.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MiniCPM-V 1.0&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/minicpm-v-2%e6%9c%80%e4%bd%b3%e5%ae%9e%e8%b7%b5.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MiniCPM-V 2.0&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://github.com/modelscope/ms-swift/issues/1613&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MiniCPM-V 2.6&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;awesome-work-using-minicpm-v--minicpm-o&#34;&gt;Awesome work using MiniCPM-V &amp;amp; MiniCPM-o
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/CatchTheTornado/text-extract-api&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;text-extract-api&lt;/a&gt;: Document extraction API using OCRs and Ollama supported models &lt;img src=&#34;https://img.shields.io/github/stars/CatchTheTornado/text-extract-api&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;GitHub Repo stars&#34;
	
	
&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/heshengtao/comfyui_LLM_party&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;comfyui_LLM_party&lt;/a&gt;: Build LLM workflows and integrate into existing image workflows &lt;img src=&#34;https://img.shields.io/github/stars/heshengtao/comfyui_LLM_party&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;GitHub Repo stars&#34;
	
	
&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/imanoop7/Ollama-OCR&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Ollama-OCR&lt;/a&gt;: OCR package uses vlms through Ollama to extract text from images and PDF &lt;img src=&#34;https://img.shields.io/github/stars/imanoop7/Ollama-OCR&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;GitHub Repo stars&#34;
	
	
&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/MixLabPro/comfyui-mixlab-nodes&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;comfyui-mixlab-nodes&lt;/a&gt;: ComfyUI node suite supports Workflow-to-APP、GPT&amp;amp;3D and more &lt;img src=&#34;https://img.shields.io/github/stars/MixLabPro/comfyui-mixlab-nodes&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;GitHub Repo stars&#34;
	
	
&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/HumanAIGC-Engineering/OpenAvatarChat&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;OpenAvatarChat&lt;/a&gt;: Interactive digital human conversation implementation on single PC &lt;img src=&#34;https://img.shields.io/github/stars/HumanAIGC-Engineering/OpenAvatarChat&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;GitHub Repo stars&#34;
	
	
&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/arkohut/pensieve&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;pensieve&lt;/a&gt;: A privacy-focused passive recording project by recording screen content &lt;img src=&#34;https://img.shields.io/github/stars/arkohut/pensieve&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;GitHub Repo stars&#34;
	
	
&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/icereed/paperless-gpt&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;paperless-gpt&lt;/a&gt;: Use LLMs to handle paperless-ngx, AI-powered titles, tags and OCR &lt;img src=&#34;https://img.shields.io/github/stars/icereed/paperless-gpt&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;GitHub Repo stars&#34;
	
	
&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/kimjammer/Neuro&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Neuro&lt;/a&gt;: A recreation of Neuro-Sama, but running on local models on consumer hardware &lt;img src=&#34;https://img.shields.io/github/stars/kimjammer/Neuro&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;GitHub Repo stars&#34;
	
	
&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;faqs&#34;&gt;FAQs
&lt;/h2&gt;&lt;p&gt;Click here to view the &lt;a class=&#34;link&#34; href=&#34;./docs/faqs.md&#34; &gt;FAQs&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;limitations&#34;&gt;Limitations
&lt;/h2&gt;&lt;p&gt;As an experimental trial, we find MiniCPM-o 2.6 has notable limitations worth further investigation and improvement.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Unstable speech output.&lt;/strong&gt; The speech generation can be flawed with noisy backgrounds and unmeaningful sounds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Repeated response.&lt;/strong&gt; The model tends to repeat its response when encountering similar consecutive user queries.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;High-latency on Web Demo.&lt;/strong&gt; Users may experience unusual high-latency when using web demo hosted on overseas servers. We recommend deploying the demo locally or with good network connections.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;model-license&#34;&gt;Model License &lt;!-- omit in toc --&gt;
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;This repository is released under the &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Apache-2.0&lt;/a&gt; License.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The usage of MiniCPM-o/V model weights must strictly follow &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MiniCPM Model License.md&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The models and weights of MiniCPM are completely free for academic research. after filling out a &lt;a class=&#34;link&#34; href=&#34;https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&amp;ldquo;questionnaire&amp;rdquo;&lt;/a&gt; for registration, are also available for free commercial use.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;statement&#34;&gt;Statement &lt;!-- omit in toc --&gt;
&lt;/h2&gt;&lt;p&gt;As MLLMs, MiniCPM-o/V models generate content by learning a large number of multimodal corpora, but they cannot comprehend, express personal opinions, or make value judgements. Anything generated by MiniCPM-o/V models does not represent the views and positions of the model developers&lt;/p&gt;
&lt;p&gt;We will not be liable for any problems arising from the use of MiniCPM-o/V models, including but not limited to data security issues, risk of public opinion, or any risks and problems arising from the misdirection, misuse, dissemination, or misuse of the model.&lt;/p&gt;
&lt;h2 id=&#34;institutions&#34;&gt;Institutions  &lt;!-- omit in toc --&gt;
&lt;/h2&gt;&lt;p&gt;This project is developed by the following institutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;img src=&#34;assets/thunlp.png&#34; width=&#34;28px&#34;&gt; &lt;a class=&#34;link&#34; href=&#34;https://nlp.csai.tsinghua.edu.cn/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;THUNLP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;img src=&#34;assets/modelbest.png&#34; width=&#34;28px&#34;&gt; &lt;a class=&#34;link&#34; href=&#34;https://modelbest.cn/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ModelBest&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;-star-history&#34;&gt;🌟 Star History &lt;!-- omit in toc --&gt;
&lt;/h2&gt;&lt;table align=&#34;center&#34;&gt;
    &lt;p align=&#34;center&#34;&gt;
      &lt;img src=&#34;assets/star-history-25-09-02.png&#34;/&gt;
    &lt;/p&gt;
&lt;/table&gt;
&lt;!-- &lt;picture&gt;
  &lt;source
    media=&#34;(prefers-color-scheme: dark)&#34;
    srcset=&#34;
      https://api.star-history.com/svg?repos=OpenBMB/MiniCPM-o&amp;type=Date&amp;theme=dark
    &#34;
  /&gt;
  &lt;source
    media=&#34;(prefers-color-scheme: light)&#34;
    srcset=&#34;
      https://api.star-history.com/svg?repos=OpenBMB/MiniCPM-o&amp;type=Date
    &#34;
  /&gt;
  &lt;img
    alt=&#34;Star History Chart&#34;
    src=&#34;https://api.star-history.com/svg?repos=OpenBMB/MiniCPM-o&amp;type=Date&#34;
  /&gt;
&lt;/picture&gt; --&gt;
&lt;h2 id=&#34;key-techniques-and-other-multimodal-projects&#34;&gt;Key Techniques and Other Multimodal Projects &lt;!-- omit in toc --&gt;
&lt;/h2&gt;&lt;p&gt;👏 Welcome to explore key techniques of MiniCPM-o/V and other multimodal projects of our team:&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/VisCPM/tree/main&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;VisCPM&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/OpenBMB/RLPR&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;RLPR&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/RLHF-V/RLHF-V&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;RLHF-V&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/thunlp/LLaVA-UHD&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;LLaVA-UHD&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/RLHF-V/RLAIF-V&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;RLAIF-V&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;citation&#34;&gt;Citation &lt;!-- omit in toc --&gt;
&lt;/h2&gt;&lt;p&gt;If you find our model/code/paper helpful, please consider citing our papers 📝 and staring us ⭐️！&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bib&#34; data-lang=&#34;bib&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nc&#34;&gt;@article&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;nl&#34;&gt;yao2024minicpm&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;title&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;{MiniCPM-V: A GPT-4V Level MLLM on Your Phone}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;author&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;{Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;journal&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;{arXiv preprint arXiv:2408.01800}&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;na&#34;&gt;year&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;{2024}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;</description>
        </item>
        <item>
        <title>fastapi_mcp</title>
        <link>https://producthunt.programnotes.cn/en/p/fastapi_mcp/</link>
        <pubDate>Sat, 16 Aug 2025 15:26:52 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/fastapi_mcp/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1679108319531-278564f267ec?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTUzMjkyMDF8&amp;ixlib=rb-4.1.0" alt="Featured image of post fastapi_mcp" /&gt;&lt;h1 id=&#34;tadata-orgfastapi_&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/tadata-org/fastapi_mcp&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tadata-org/fastapi_mcp&lt;/a&gt;
&lt;/h1&gt;&lt;p align=&#34;center&#34;&gt;&lt;a href=&#34;https://github.com/tadata-org/fastapi_mcp&#34;&gt;&lt;img src=&#34;https://github.com/user-attachments/assets/7e44e98b-a0ba-4aff-a68a-4ffee3a6189c&#34; alt=&#34;fastapi-to-mcp&#34; height=100/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div align=&#34;center&#34;&gt;
  &lt;span style=&#34;font-size: 0.85em; font-weight: normal;&#34;&gt;Built by &lt;a href=&#34;https://tadata.com&#34;&gt;Tadata&lt;/a&gt;&lt;/span&gt;
&lt;/div&gt;
&lt;h1 align=&#34;center&#34;&gt;
  FastAPI-MCP
&lt;/h1&gt;
&lt;div align=&#34;center&#34;&gt;
&lt;a href=&#34;https://trendshift.io/repositories/14064&#34; target=&#34;_blank&#34;&gt;&lt;img src=&#34;https://trendshift.io/api/badge/repositories/14064&#34; alt=&#34;tadata-org%2Ffastapi_mcp | Trendshift&#34; style=&#34;width: 250px; height: 55px;&#34; width=&#34;250&#34; height=&#34;55&#34;/&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;p align=&#34;center&#34;&gt;Expose your FastAPI endpoints as Model Context Protocol (MCP) tools, with Auth!&lt;/p&gt;
&lt;div align=&#34;center&#34;&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://pypi.org/project/fastapi-mcp/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/pypi/v/fastapi-mcp?color=%2334D058&amp;amp;label=pypi%20package&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;PyPI version&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://pypi.org/project/fastapi-mcp/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/pypi/pyversions/fastapi-mcp.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Python Versions&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;#&#34; &gt;&lt;img src=&#34;https://img.shields.io/badge/FastAPI-009485.svg?logo=fastapi&amp;amp;logoColor=white&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;FastAPI&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/tadata-org/fastapi_mcp/actions/workflows/ci.yml&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://github.com/tadata-org/fastapi_mcp/actions/workflows/ci.yml/badge.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;CI&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://codecov.io/gh/tadata-org/fastapi_mcp&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://codecov.io/gh/tadata-org/fastapi_mcp/branch/main/graph/badge.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Coverage&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p align=&#34;center&#34;&gt;&lt;a href=&#34;https://github.com/tadata-org/fastapi_mcp&#34;&gt;&lt;img src=&#34;https://github.com/user-attachments/assets/b205adc6-28c0-4e3c-a68b-9c1a80eb7d0c&#34; alt=&#34;fastapi-mcp-usage&#34; height=&#34;400&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;features&#34;&gt;Features
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Authentication&lt;/strong&gt; built in, using your existing FastAPI dependencies!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;FastAPI-native:&lt;/strong&gt; Not just another OpenAPI -&amp;gt; MCP converter&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Zero/Minimal configuration&lt;/strong&gt; required - just point it at your FastAPI app and it works&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Preserving schemas&lt;/strong&gt; of your request models and response models&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Preserve documentation&lt;/strong&gt; of all your endpoints, just as it is in Swagger&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Flexible deployment&lt;/strong&gt; - Mount your MCP server to the same app, or deploy separately&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;ASGI transport&lt;/strong&gt; - Uses FastAPI&amp;rsquo;s ASGI interface directly for efficient communication&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;hosted-solution&#34;&gt;Hosted Solution
&lt;/h2&gt;&lt;p&gt;If you prefer a managed hosted solution check out &lt;a class=&#34;link&#34; href=&#34;https://tadata.com&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tadata.com&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;installation&#34;&gt;Installation
&lt;/h2&gt;&lt;p&gt;We recommend using &lt;a class=&#34;link&#34; href=&#34;https://docs.astral.sh/uv/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;uv&lt;/a&gt;, a fast Python package installer:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;uv add fastapi-mcp
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Alternatively, you can install with pip:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install fastapi-mcp
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;basic-usage&#34;&gt;Basic Usage
&lt;/h2&gt;&lt;p&gt;The simplest way to use FastAPI-MCP is to add an MCP server directly to your FastAPI application:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;9
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;fastapi&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FastAPI&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;fastapi_mcp&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FastApiMCP&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;app&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FastAPI&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;mcp&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;FastApiMCP&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;app&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Mount the MCP server directly to your FastAPI app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;mcp&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;mount&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;That&amp;rsquo;s it! Your auto-generated MCP server is now available at &lt;code&gt;https://app.base.url/mcp&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;documentation-examples-and-advanced-usage&#34;&gt;Documentation, Examples and Advanced Usage
&lt;/h2&gt;&lt;p&gt;FastAPI-MCP provides &lt;a class=&#34;link&#34; href=&#34;https://fastapi-mcp.tadata.com/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;comprehensive documentation&lt;/a&gt;. Additionaly, check out the &lt;a class=&#34;link&#34; href=&#34;examples&#34; &gt;examples directory&lt;/a&gt; for code samples demonstrating these features in action.&lt;/p&gt;
&lt;h2 id=&#34;fastapi-first-approach&#34;&gt;FastAPI-first Approach
&lt;/h2&gt;&lt;p&gt;FastAPI-MCP is designed as a native extension of FastAPI, not just a converter that generates MCP tools from your API. This approach offers several key advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Native dependencies&lt;/strong&gt;: Secure your MCP endpoints using familiar FastAPI &lt;code&gt;Depends()&lt;/code&gt; for authentication and authorization&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;ASGI transport&lt;/strong&gt;: Communicates directly with your FastAPI app using its ASGI interface, eliminating the need for HTTP calls from the MCP to your API&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Unified infrastructure&lt;/strong&gt;: Your FastAPI app doesn&amp;rsquo;t need to run separately from the MCP server (though &lt;a class=&#34;link&#34; href=&#34;https://fastapi-mcp.tadata.com/advanced/deploy#deploying-separately-from-original-fastapi-app&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;separate deployment&lt;/a&gt; is also supported)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This design philosophy ensures minimum friction when adding MCP capabilities to your existing FastAPI services.&lt;/p&gt;
&lt;h2 id=&#34;development-and-contributing&#34;&gt;Development and Contributing
&lt;/h2&gt;&lt;p&gt;Thank you for considering contributing to FastAPI-MCP! We encourage the community to post Issues and create Pull Requests.&lt;/p&gt;
&lt;p&gt;Before you get started, please see our &lt;a class=&#34;link&#34; href=&#34;CONTRIBUTING.md&#34; &gt;Contribution Guide&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;community&#34;&gt;Community
&lt;/h2&gt;&lt;p&gt;Join &lt;a class=&#34;link&#34; href=&#34;https://join.slack.com/t/themcparty/shared_invite/zt-30yxr1zdi-2FG~XjBA0xIgYSYuKe7~Xg&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MCParty Slack community&lt;/a&gt; to connect with other MCP enthusiasts, ask questions, and share your experiences with FastAPI-MCP.&lt;/p&gt;
&lt;h2 id=&#34;requirements&#34;&gt;Requirements
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Python 3.10+ (Recommended 3.12)&lt;/li&gt;
&lt;li&gt;uv&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;license&#34;&gt;License
&lt;/h2&gt;&lt;p&gt;MIT License. Copyright (c) 2025 Tadata Inc.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>adk-python</title>
        <link>https://producthunt.programnotes.cn/en/p/adk-python/</link>
        <pubDate>Sat, 09 Aug 2025 15:28:44 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/adk-python/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1656257537297-e4809014f365?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTQ3MjQ0NTR8&amp;ixlib=rb-4.1.0" alt="Featured image of post adk-python" /&gt;&lt;h1 id=&#34;googleadk-python&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/google/adk-python&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;google/adk-python&lt;/a&gt;
&lt;/h1&gt;&lt;h1 id=&#34;agent-development-kit-adk&#34;&gt;Agent Development Kit (ADK)
&lt;/h1&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;LICENSE&#34; &gt;&lt;img src=&#34;https://img.shields.io/badge/License-Apache_2.0-blue.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;License&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/google/adk-python/actions/workflows/python-unit-tests.yml&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://github.com/google/adk-python/actions/workflows/python-unit-tests.yml/badge.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Python Unit Tests&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://www.reddit.com/r/agentdevelopmentkit/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Reddit-r%2Fagentdevelopmentkit-FF4500?style=flat&amp;amp;logo=reddit&amp;amp;logoColor=white&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;r/agentdevelopmentkit&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://deepwiki.com/google/adk-python&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://deepwiki.com/badge.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Ask DeepWiki&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;html&gt;
    &lt;h2 align=&#34;center&#34;&gt;
      &lt;img src=&#34;https://raw.githubusercontent.com/google/adk-python/main/assets/agent-development-kit.png&#34; width=&#34;256&#34;/&gt;
    &lt;/h2&gt;
    &lt;h3 align=&#34;center&#34;&gt;
      An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
    &lt;/h3&gt;
    &lt;h3 align=&#34;center&#34;&gt;
      Important Links:
      &lt;a href=&#34;https://google.github.io/adk-docs/&#34;&gt;Docs&lt;/a&gt;, 
      &lt;a href=&#34;https://github.com/google/adk-samples&#34;&gt;Samples&lt;/a&gt;,
      &lt;a href=&#34;https://github.com/google/adk-java&#34;&gt;Java ADK&lt;/a&gt; &amp;
      &lt;a href=&#34;https://github.com/google/adk-web&#34;&gt;ADK Web&lt;/a&gt;.
    &lt;/h3&gt;
&lt;/html&gt;
&lt;p&gt;Agent Development Kit (ADK) is a flexible and modular framework for developing and deploying AI agents. While optimized for Gemini and the Google ecosystem, ADK is model-agnostic, deployment-agnostic, and is built for compatibility with other frameworks. ADK was designed to make agent development feel more like software development, to make it easier for developers to create, deploy, and orchestrate agentic architectures that range from simple tasks to complex workflows.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;-key-features&#34;&gt;✨ Key Features
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Rich Tool Ecosystem&lt;/strong&gt;: Utilize pre-built tools, custom functions,
OpenAPI specs, or integrate existing tools to give agents diverse
capabilities, all for tight integration with the Google ecosystem.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Code-First Development&lt;/strong&gt;: Define agent logic, tools, and orchestration
directly in Python for ultimate flexibility, testability, and versioning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Modular Multi-Agent Systems&lt;/strong&gt;: Design scalable applications by composing
multiple specialized agents into flexible hierarchies.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Deploy Anywhere&lt;/strong&gt;: Easily containerize and deploy agents on Cloud Run or
scale seamlessly with Vertex AI Agent Engine.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;-agent2agent-a2a-protocol-and-adk-integration&#34;&gt;🤖 Agent2Agent (A2A) Protocol and ADK Integration
&lt;/h2&gt;&lt;p&gt;For remote agent-to-agent communication, ADK integrates with the
&lt;a class=&#34;link&#34; href=&#34;https://github.com/google-a2a/A2A/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;A2A protocol&lt;/a&gt;.
See this &lt;a class=&#34;link&#34; href=&#34;https://github.com/a2aproject/a2a-samples/tree/main/samples/python/agents&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;example&lt;/a&gt;
for how they can work together.&lt;/p&gt;
&lt;h2 id=&#34;-installation&#34;&gt;🚀 Installation
&lt;/h2&gt;&lt;h3 id=&#34;stable-release-recommended&#34;&gt;Stable Release (Recommended)
&lt;/h3&gt;&lt;p&gt;You can install the latest stable version of ADK using &lt;code&gt;pip&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install google-adk
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The release cadence is weekly.&lt;/p&gt;
&lt;p&gt;This version is recommended for most users as it represents the most recent official release.&lt;/p&gt;
&lt;h3 id=&#34;development-version&#34;&gt;Development Version
&lt;/h3&gt;&lt;p&gt;Bug fixes and new features are merged into the main branch on GitHub first. If you need access to changes that haven&amp;rsquo;t been included in an official PyPI release yet, you can install directly from the main branch:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install git+https://github.com/google/adk-python.git@main
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Note: The development version is built directly from the latest code commits. While it includes the newest fixes and features, it may also contain experimental changes or bugs not present in the stable release. Use it primarily for testing upcoming changes or accessing critical fixes before they are officially released.&lt;/p&gt;
&lt;h2 id=&#34;-documentation&#34;&gt;📚 Documentation
&lt;/h2&gt;&lt;p&gt;Explore the full documentation for detailed guides on building, evaluating, and
deploying agents:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;https://google.github.io/adk-docs&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Documentation&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;-feature-highlight&#34;&gt;🏁 Feature Highlight
&lt;/h2&gt;&lt;h3 id=&#34;define-a-single-agent&#34;&gt;Define a single agent:
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;google.adk.agents&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Agent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;google.adk.tools&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;google_search&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;root_agent&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Agent&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;name&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;search_assistant&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;gemini-2.0-flash&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# Or your preferred Gemini model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;instruction&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;You are a helpful assistant. Answer user questions using Google Search when needed.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;description&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;An assistant that can search the web.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;tools&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;google_search&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;define-a-multi-agent-system&#34;&gt;Define a multi-agent system:
&lt;/h3&gt;&lt;p&gt;Define a multi-agent system with coordinator agent, greeter agent, and task execution agent. Then ADK engine and the model will guide the agents works together to accomplish the task.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;google.adk.agents&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LlmAgent&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;BaseAgent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Define individual agents&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;greeter&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LlmAgent&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;name&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;greeter&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;gemini-2.0-flash&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;task_executor&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LlmAgent&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;name&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;task_executor&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;gemini-2.0-flash&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Create parent agent and assign children via sub_agents&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;coordinator&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;LlmAgent&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;name&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Coordinator&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;gemini-2.0-flash&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;description&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;I coordinate greetings and tasks.&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;sub_agents&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# Assign sub_agents here&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;greeter&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;task_executor&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;development-ui&#34;&gt;Development UI
&lt;/h3&gt;&lt;p&gt;A built-in development UI to help you test, evaluate, debug, and showcase your agent(s).&lt;/p&gt;
&lt;img src=&#34;https://raw.githubusercontent.com/google/adk-python/main/assets/adk-web-dev-ui-function-call.png&#34;/&gt;
&lt;h3 id=&#34;evaluate-agents&#34;&gt;Evaluate Agents
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;adk &lt;span class=&#34;nb&#34;&gt;eval&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    samples_for_testing/hello_world &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    samples_for_testing/hello_world/hello_world_eval_set_001.evalset.json
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;-contributing&#34;&gt;🤝 Contributing
&lt;/h2&gt;&lt;p&gt;We welcome contributions from the community! Whether it&amp;rsquo;s bug reports, feature requests, documentation improvements, or code contributions, please see our&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://google.github.io/adk-docs/contributing-guide/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;General contribution guideline and flow&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Then if you want to contribute code, please read &lt;a class=&#34;link&#34; href=&#34;./CONTRIBUTING.md&#34; &gt;Code Contributing Guidelines&lt;/a&gt; to get started.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;vibe-coding&#34;&gt;Vibe Coding
&lt;/h2&gt;&lt;p&gt;If you are to develop agent via vibe coding the &lt;a class=&#34;link&#34; href=&#34;./llms.txt&#34; &gt;llms.txt&lt;/a&gt; and the &lt;a class=&#34;link&#34; href=&#34;./llms-full.txt&#34; &gt;llms-full.txt&lt;/a&gt; can be used as context to LLM. While the former one is a summarized one and the later one has the full information in case your LLM has big enough context window.&lt;/p&gt;
&lt;h2 id=&#34;-license&#34;&gt;📄 License
&lt;/h2&gt;&lt;p&gt;This project is licensed under the Apache 2.0 License - see the &lt;a class=&#34;link&#34; href=&#34;LICENSE&#34; &gt;LICENSE&lt;/a&gt; file for details.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;Happy Agent Building!&lt;/em&gt;&lt;/p&gt;
</description>
        </item>
        <item>
        <title>reflex</title>
        <link>https://producthunt.programnotes.cn/en/p/reflex/</link>
        <pubDate>Wed, 06 Aug 2025 15:36:08 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/reflex/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1650749837474-a9ab19e3d1af?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTQ0NjU3NTF8&amp;ixlib=rb-4.1.0" alt="Featured image of post reflex" /&gt;&lt;h1 id=&#34;reflex-devreflex&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;reflex-dev/reflex&lt;/a&gt;
&lt;/h1&gt;&lt;div align=&#34;center&#34;&gt;
&lt;img src=&#34;https://raw.githubusercontent.com/reflex-dev/reflex/main/docs/images/reflex.svg&#34; alt=&#34;Reflex Logo&#34; width=&#34;300px&#34;&gt;
&lt;hr&gt;
&lt;h3 id=&#34;-performant-customizable-web-apps-in-pure-python-deploy-in-seconds-&#34;&gt;&lt;strong&gt;✨ Performant, customizable web apps in pure Python. Deploy in seconds. ✨&lt;/strong&gt;
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://badge.fury.io/py/reflex&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://badge.fury.io/py/reflex.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;PyPI version&#34;
	
	
&gt;&lt;/a&gt;
&lt;img src=&#34;https://img.shields.io/pypi/pyversions/reflex.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;versions&#34;
	
	
&gt;
&lt;a class=&#34;link&#34; href=&#34;https://reflex.dev/docs/getting-started/introduction&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Documentation%20-Introduction%20-%20%23007ec6&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Documentation&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://pepy.tech/projects/reflex&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://static.pepy.tech/badge/reflex&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;PyPI Downloads&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://discord.gg/T5WSbC2YtQ&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/discord/1029853095527727165?color=%237289da&amp;amp;label=Discord&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Discord&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://x.com/getreflex&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/twitter/follow/getreflex&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Twitter&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;English&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/docs/zh/zh_cn/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;简体中文&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/docs/zh/zh_tw/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;繁體中文&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/docs/tr/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Türkçe&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/docs/in/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;हिंदी&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/docs/pt/pt_br/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Português (Brasil)&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/docs/it/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Italiano&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/docs/es/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Español&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/docs/kr/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;한국어&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/docs/ja/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;日本語&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/docs/de/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Deutsch&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/docs/pe/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Persian (پارسی)&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/docs/vi/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Tiếng Việt&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;blockquote&gt;
&lt;p&gt;[!NOTE]
🚀 &lt;strong&gt;Try &lt;a class=&#34;link&#34; href=&#34;https://build.reflex.dev/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Reflex Build&lt;/a&gt;&lt;/strong&gt; – our AI-powered app builder that generates full-stack Reflex applications in seconds.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h1 id=&#34;introduction&#34;&gt;Introduction
&lt;/h1&gt;&lt;p&gt;Reflex is a library to build full-stack web apps in pure Python.&lt;/p&gt;
&lt;p&gt;Key features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pure Python&lt;/strong&gt; - Write your app&amp;rsquo;s frontend and backend all in Python, no need to learn Javascript.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Full Flexibility&lt;/strong&gt; - Reflex is easy to get started with, but can also scale to complex apps.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deploy Instantly&lt;/strong&gt; - After building, deploy your app with a &lt;a class=&#34;link&#34; href=&#34;https://reflex.dev/docs/hosting/deploy-quick-start/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;single command&lt;/a&gt; or host it on your own server.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See our &lt;a class=&#34;link&#34; href=&#34;https://reflex.dev/blog/2024-03-21-reflex-architecture/#the-reflex-architecture&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;architecture page&lt;/a&gt; to learn how Reflex works under the hood.&lt;/p&gt;
&lt;h2 id=&#34;-installation&#34;&gt;⚙️ Installation
&lt;/h2&gt;&lt;p&gt;Open a terminal and run (Requires Python 3.10+):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install reflex
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;-create-your-first-app&#34;&gt;🥳 Create your first app
&lt;/h2&gt;&lt;p&gt;Installing &lt;code&gt;reflex&lt;/code&gt; also installs the &lt;code&gt;reflex&lt;/code&gt; command line tool.&lt;/p&gt;
&lt;p&gt;Test that the install was successful by creating a new project. (Replace &lt;code&gt;my_app_name&lt;/code&gt; with your project name):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;mkdir my_app_name
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; my_app_name
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;reflex init
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This command initializes a template app in your new directory.&lt;/p&gt;
&lt;p&gt;You can run this app in development mode:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;reflex run
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;You should see your app running at http://localhost:3000.&lt;/p&gt;
&lt;p&gt;Now you can modify the source code in &lt;code&gt;my_app_name/my_app_name.py&lt;/code&gt;. Reflex has fast refreshes so you can see your changes instantly when you save your code.&lt;/p&gt;
&lt;h2 id=&#34;-example-app&#34;&gt;🫧 Example App
&lt;/h2&gt;&lt;p&gt;Let&amp;rsquo;s go over an example: creating an image generation UI around &lt;a class=&#34;link&#34; href=&#34;https://platform.openai.com/docs/guides/images/image-generation?context=node&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;DALL·E&lt;/a&gt;. For simplicity, we just call the &lt;a class=&#34;link&#34; href=&#34;https://platform.openai.com/docs/api-reference/authentication&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;OpenAI API&lt;/a&gt;, but you could replace this with an ML model run locally.&lt;/p&gt;
&lt;p&gt; &lt;/p&gt;
&lt;div align=&#34;center&#34;&gt;
&lt;img src=&#34;https://raw.githubusercontent.com/reflex-dev/reflex/main/docs/images/dalle.gif&#34; alt=&#34;A frontend wrapper for DALL·E, shown in the process of generating an image.&#34; width=&#34;550&#34; /&gt;
&lt;/div&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p&gt;Here is the complete code to create this. This is all done in one Python file!&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;22
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;23
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;24
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;25
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;26
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;27
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;28
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;29
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;30
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;31
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;32
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;33
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;34
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;35
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;36
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;37
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;38
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;39
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;40
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;41
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;42
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;43
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;44
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;45
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;46
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;47
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;48
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;49
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;50
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;51
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;52
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;53
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;54
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;55
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;56
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;reflex&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;as&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;rx&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;openai&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;openai_client&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;openai&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;OpenAI&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;State&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;State&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&amp;#34;The app state.&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;prompt&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;image_url&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;processing&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;complete&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;get_image&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&amp;#34;Get the image from the prompt.&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;prompt&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;==&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;window_alert&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Prompt Empty&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;processing&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;complete&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;k&#34;&gt;yield&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;response&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;openai_client&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;images&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;generate&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;prompt&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;n&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;size&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;1024x1024&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;image_url&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;response&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;url&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;processing&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;complete&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;False&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;index&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;center&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;vstack&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;heading&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;DALL-E&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;font_size&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;1.5em&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;input&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                &lt;span class=&#34;n&#34;&gt;placeholder&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Enter a prompt..&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                &lt;span class=&#34;n&#34;&gt;on_blur&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;State&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;set_prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                &lt;span class=&#34;n&#34;&gt;width&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;25em&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;button&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                &lt;span class=&#34;s2&#34;&gt;&amp;#34;Generate Image&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                &lt;span class=&#34;n&#34;&gt;on_click&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;State&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get_image&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                &lt;span class=&#34;n&#34;&gt;width&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;25em&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                &lt;span class=&#34;n&#34;&gt;loading&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;State&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;processing&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;cond&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                &lt;span class=&#34;n&#34;&gt;State&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;complete&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                &lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;image&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;src&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;State&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;image_url&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;width&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;20em&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            &lt;span class=&#34;n&#34;&gt;align&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;center&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;p&#34;&gt;),&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;width&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;100%&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;height&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;100vh&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Add state and page to the app.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;app&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;App&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;app&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;add_page&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;index&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;title&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Reflex:DALL-E&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;lets-break-this-down&#34;&gt;Let&amp;rsquo;s break this down.
&lt;/h2&gt;&lt;div align=&#34;center&#34;&gt;
&lt;img src=&#34;https://raw.githubusercontent.com/reflex-dev/reflex/main/docs/images/dalle_colored_code_example.png&#34; alt=&#34;Explaining the differences between backend and frontend parts of the DALL-E app.&#34; width=&#34;900&#34; /&gt;
&lt;/div&gt;
&lt;h3 id=&#34;reflex-ui&#34;&gt;&lt;strong&gt;Reflex UI&lt;/strong&gt;
&lt;/h3&gt;&lt;p&gt;Let&amp;rsquo;s start with the UI.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;index&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;center&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;o&#34;&gt;...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This &lt;code&gt;index&lt;/code&gt; function defines the frontend of the app.&lt;/p&gt;
&lt;p&gt;We use different components such as &lt;code&gt;center&lt;/code&gt;, &lt;code&gt;vstack&lt;/code&gt;, &lt;code&gt;input&lt;/code&gt;, and &lt;code&gt;button&lt;/code&gt; to build the frontend. Components can be nested within each other
to create complex layouts. And you can use keyword args to style them with the full power of CSS.&lt;/p&gt;
&lt;p&gt;Reflex comes with &lt;a class=&#34;link&#34; href=&#34;https://reflex.dev/docs/library&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;60+ built-in components&lt;/a&gt; to help you get started. We are actively adding more components, and it&amp;rsquo;s easy to &lt;a class=&#34;link&#34; href=&#34;https://reflex.dev/docs/wrapping-react/overview/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;create your own components&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;state&#34;&gt;&lt;strong&gt;State&lt;/strong&gt;
&lt;/h3&gt;&lt;p&gt;Reflex represents your UI as a function of your state.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;class&lt;/span&gt; &lt;span class=&#34;nc&#34;&gt;State&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;State&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&amp;#34;The app state.&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;prompt&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;image_url&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;processing&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;complete&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The state defines all the variables (called vars) in an app that can change and the functions that change them.&lt;/p&gt;
&lt;p&gt;Here the state is comprised of a &lt;code&gt;prompt&lt;/code&gt; and &lt;code&gt;image_url&lt;/code&gt;. There are also the booleans &lt;code&gt;processing&lt;/code&gt; and &lt;code&gt;complete&lt;/code&gt; to indicate when to disable the button (during image generation) and when to show the resulting image.&lt;/p&gt;
&lt;h3 id=&#34;event-handlers&#34;&gt;&lt;strong&gt;Event Handlers&lt;/strong&gt;
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;def&lt;/span&gt; &lt;span class=&#34;nf&#34;&gt;get_image&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&amp;#34;Get the image from the prompt.&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;prompt&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;==&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;k&#34;&gt;return&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;window_alert&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Prompt Empty&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;processing&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;complete&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;k&#34;&gt;yield&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;response&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;openai_client&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;images&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;generate&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        &lt;span class=&#34;n&#34;&gt;prompt&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;prompt&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;n&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;size&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;1024x1024&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;image_url&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;response&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;url&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;processing&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;bp&#34;&gt;self&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;complete&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;False&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Within the state, we define functions called event handlers that change the state vars. Event handlers are the way that we can modify the state in Reflex. They can be called in response to user actions, such as clicking a button or typing in a text box. These actions are called events.&lt;/p&gt;
&lt;p&gt;Our DALL·E app has an event handler, &lt;code&gt;get_image&lt;/code&gt; which gets this image from the OpenAI API. Using &lt;code&gt;yield&lt;/code&gt; in the middle of an event handler will cause the UI to update. Otherwise the UI will update at the end of the event handler.&lt;/p&gt;
&lt;h3 id=&#34;routing&#34;&gt;&lt;strong&gt;Routing&lt;/strong&gt;
&lt;/h3&gt;&lt;p&gt;Finally, we define our app.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;app&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;rx&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;App&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;We add a page from the root of the app to the index component. We also add a title that will show up in the page preview/browser tab.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;app&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;add_page&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;index&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;title&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;DALL-E&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;You can create a multi-page app by adding more pages.&lt;/p&gt;
&lt;h2 id=&#34;-resources&#34;&gt;📑 Resources
&lt;/h2&gt;&lt;div align=&#34;center&#34;&gt;
&lt;p&gt;📑 &lt;a class=&#34;link&#34; href=&#34;https://reflex.dev/docs/getting-started/introduction&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Docs&lt;/a&gt;   |   🗞️ &lt;a class=&#34;link&#34; href=&#34;https://reflex.dev/blog&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Blog&lt;/a&gt;   |   📱 &lt;a class=&#34;link&#34; href=&#34;https://reflex.dev/docs/library&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Component Library&lt;/a&gt;   |   🖼️ &lt;a class=&#34;link&#34; href=&#34;https://reflex.dev/templates/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Templates&lt;/a&gt;   |   🛸 &lt;a class=&#34;link&#34; href=&#34;https://reflex.dev/docs/hosting/deploy-quick-start&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Deployment&lt;/a&gt;  &lt;/p&gt;
&lt;/div&gt;
&lt;h2 id=&#34;-status&#34;&gt;✅ Status
&lt;/h2&gt;&lt;p&gt;Reflex launched in December 2022 with the name Pynecone.&lt;/p&gt;
&lt;p&gt;🚀 Introducing &lt;a class=&#34;link&#34; href=&#34;https://build.reflex.dev/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Reflex Build&lt;/a&gt; — Our AI-Powered Builder
Reflex Build uses AI to generate complete full-stack Python applications. It helps you quickly create, customize, and refine your Reflex apps — from frontend components to backend logic — so you can focus on your ideas instead of boilerplate code. Whether you’re prototyping or scaling, Reflex Build accelerates development by intelligently scaffolding and optimizing your app’s entire stack.&lt;/p&gt;
&lt;p&gt;Alongside this, &lt;a class=&#34;link&#34; href=&#34;https://cloud.reflex.dev&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Reflex Cloud&lt;/a&gt; launched in 2025 to offer the best hosting experience for your Reflex apps. We’re continuously improving the platform with new features and capabilities.&lt;/p&gt;
&lt;p&gt;Reflex has new releases and features coming every week! Make sure to :star: star and :eyes: watch this repository to stay up to date.&lt;/p&gt;
&lt;h2 id=&#34;contributing&#34;&gt;Contributing
&lt;/h2&gt;&lt;p&gt;We welcome contributions of any size! Below are some good ways to get started in the Reflex community.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Join Our Discord&lt;/strong&gt;: Our &lt;a class=&#34;link&#34; href=&#34;https://discord.gg/T5WSbC2YtQ&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Discord&lt;/a&gt; is the best place to get help on your Reflex project and to discuss how you can contribute.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GitHub Discussions&lt;/strong&gt;: A great way to talk about features you want added or things that are confusing/need clarification.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GitHub Issues&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Issues&lt;/a&gt; are an excellent way to report bugs. Additionally, you can try and solve an existing issue and submit a PR.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We are actively looking for contributors, no matter your skill level or experience. To contribute check out &lt;a class=&#34;link&#34; href=&#34;https://github.com/reflex-dev/reflex/blob/main/CONTRIBUTING.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;CONTRIBUTING.md&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;all-thanks-to-our-contributors&#34;&gt;All Thanks To Our Contributors:
&lt;/h2&gt;&lt;a href=&#34;https://github.com/reflex-dev/reflex/graphs/contributors&#34;&gt;
  &lt;img src=&#34;https://contrib.rocks/image?repo=reflex-dev/reflex&#34; /&gt;
&lt;/a&gt;
&lt;h2 id=&#34;license&#34;&gt;License
&lt;/h2&gt;&lt;p&gt;Reflex is open-source and licensed under the &lt;a class=&#34;link&#34; href=&#34;https://raw.githubusercontent.com/reflex-dev/reflex/main/LICENSE&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Apache License 2.0&lt;/a&gt;.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>awesome-claude-code</title>
        <link>https://producthunt.programnotes.cn/en/p/awesome-claude-code/</link>
        <pubDate>Wed, 23 Jul 2025 15:36:28 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/awesome-claude-code/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1703088996593-39768a77fb82?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTMyNTYwMTZ8&amp;ixlib=rb-4.1.0" alt="Featured image of post awesome-claude-code" /&gt;&lt;h1 id=&#34;hesreallyhimawesome-claude-code&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/hesreallyhim/awesome-claude-code&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;hesreallyhim/awesome-claude-code&lt;/a&gt;
&lt;/h1&gt;&lt;!--lint disable remark-lint:awesome-badge--&gt;
&lt;h1 id=&#34;&#34;&gt;
&lt;/h1&gt;&lt;!-- [![Awesome](https://awesome.re/badge-flat2.svg)](https://awesome.re) --&gt;
&lt;pre style=&#34;display: inline-block; text-align: left;&#34;&gt;
 █████┐ ██┐    ██┐███████┐███████┐ ██████┐ ███┐   ███┐███████┐
██┌──██┐██│    ██│██┌────┘██┌────┘██┌───██┐████┐ ████│██┌────┘
███████│██│ █┐ ██│█████┐  ███████┐██│   ██│██┌████┌██│█████┐
██┌──██│██│███┐██│██┌──┘  └────██│██│   ██│██│└██┌┘██│██┌──┘
██│  ██│└███┌███┌┘███████┐███████│└██████┌┘██│ └─┘ ██│███████┐
└─┘  └─┘ └──┘└──┘ └──────┘└──────┘ └─────┘ └─┘     └─┘└──────┘

 ────────────────────────────────────────────────────────────────────────────────────

 ██████┐██┐      █████┐ ██┐   ██┐██████┐ ███████┐     ██████┐ ██████┐ ██████┐ ███████┐
██┌────┘██│     ██┌──██┐██│   ██│██┌──██┐██┌────┘    ██┌────┘██┌───██┐██┌──██┐██┌────┘
██│     ██│     ███████│██│   ██│██│  ██│█████┐      ██│     ██│   ██│██│  ██│█████┐
██│     ██│     ██┌──██│██│   ██│██│  ██│██┌──┘      ██│     ██│   ██│██│  ██│██┌──┘
└██████┐███████┐██│  ██│└██████┌┘██████┌┘███████┐    └██████┐└██████┌┘██████┌┘███████┐
 └─────┘└──────┘└─┘  └─┘ └─────┘ └─────┘ └──────┘     └─────┘ └─────┘ └─────┘ └──────┘
&lt;/pre&gt;
&lt;!--lint enable remark-lint:awesome-badge--&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://awesome.re&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://awesome.re/badge-flat2.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Awesome&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;!--lint enable remark-lint:awesome-badge--&gt;
&lt;!--lint disable double-link--&gt;
&lt;p&gt;This is a curated list of slash-commands, &lt;code&gt;CLAUDE.md&lt;/code&gt; files, CLI tools, and other resources and guides for enhancing your &lt;a class=&#34;link&#34; href=&#34;https://docs.anthropic.com/en/docs/claude-code&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Claude Code&lt;/a&gt; workflow, productivity, and vibes.&lt;/p&gt;
&lt;!--lint enable double-link--&gt;
&lt;p&gt;Claude Code is a cutting-edge CLI-based coding assistant and agent that you can access in your terminal or IDE. It is a rapidly evolving tool that offers a number of powerful capabilities, and allows for a lot of configuration, in a lot of different ways. Users are actively working out best practices and workflows. It is the hope that this repo will help the community share knowledge and understand how to get the most out of Claude Code.&lt;/p&gt;
&lt;h3 id=&#34;announcements&#34;&gt;Announcements
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;2025-07-18 - I ended up over-engineering the submission workflow, but I think it&amp;rsquo;s done, I just have to smoke test it and update the docs. For anyone with existing PR&amp;rsquo;s, don&amp;rsquo;t worry about updating them (for formatting purposes, that is), I can take care of it myself. For anoyne with new PR&amp;rsquo;s, you &lt;em&gt;should&lt;/em&gt; be able to run &lt;code&gt;make submit&lt;/code&gt; from the root directory of your fork for an interactive experience (as I said, needs smoke testing) - alternatively, add your entry to the bottom of &lt;a class=&#34;link&#34; href=&#34;../THE_RESOURCES_TABLE.csv&#34; &gt;&lt;code&gt;THE_RESOURCES_TABLE&lt;/code&gt;&lt;/a&gt; and run &lt;code&gt;make generate&lt;/code&gt; to automatically update the &lt;code&gt;README.md&lt;/code&gt; based on the information you filled in. If it&amp;rsquo;s not working, just open a PR with the relevant information and I&amp;rsquo;ll deal with it, I created this mess anyway 😃.&lt;/li&gt;
&lt;/ul&gt;
&lt;br&gt;
&lt;h2 id=&#34;contents&#34;&gt;Contents
&lt;/h2&gt;&lt;p&gt;▪     &lt;a class=&#34;link&#34; href=&#34;#workflows--knowledge-guides-&#34; &gt;Workflows &amp;amp; Knowledge Guides&lt;/a&gt;&lt;br&gt;
▪     &lt;a class=&#34;link&#34; href=&#34;#tooling-&#34; &gt;Tooling&lt;/a&gt;&lt;br&gt;
          ▫     &lt;a class=&#34;link&#34; href=&#34;#ide-integrations&#34; &gt;IDE Integrations&lt;/a&gt;&lt;br&gt;
▪     &lt;a class=&#34;link&#34; href=&#34;#hooks-&#34; &gt;Hooks&lt;/a&gt;&lt;br&gt;
▪     &lt;a class=&#34;link&#34; href=&#34;#slash-commands-&#34; &gt;Slash-Commands&lt;/a&gt;&lt;br&gt;
          ▫     &lt;a class=&#34;link&#34; href=&#34;#version-control--git&#34; &gt;Version Control &amp;amp; Git&lt;/a&gt;&lt;br&gt;
          ▫     &lt;a class=&#34;link&#34; href=&#34;#code-analysis--testing&#34; &gt;Code Analysis &amp;amp; Testing&lt;/a&gt;&lt;br&gt;
          ▫     &lt;a class=&#34;link&#34; href=&#34;#context-loading--priming&#34; &gt;Context Loading &amp;amp; Priming&lt;/a&gt;&lt;br&gt;
          ▫     &lt;a class=&#34;link&#34; href=&#34;#documentation--changelogs&#34; &gt;Documentation &amp;amp; Changelogs&lt;/a&gt;&lt;br&gt;
          ▫     &lt;a class=&#34;link&#34; href=&#34;#ci--deployment&#34; &gt;CI / Deployment&lt;/a&gt;&lt;br&gt;
          ▫     &lt;a class=&#34;link&#34; href=&#34;#project--task-management&#34; &gt;Project &amp;amp; Task Management&lt;/a&gt;&lt;br&gt;
          ▫     &lt;a class=&#34;link&#34; href=&#34;#miscellaneous&#34; &gt;Miscellaneous&lt;/a&gt;&lt;br&gt;
▪     &lt;a class=&#34;link&#34; href=&#34;#claudemd-files-&#34; &gt;CLAUDE.md Files&lt;/a&gt;&lt;br&gt;
          ▫     &lt;a class=&#34;link&#34; href=&#34;#language-specific&#34; &gt;Language-Specific&lt;/a&gt;&lt;br&gt;
          ▫     &lt;a class=&#34;link&#34; href=&#34;#domain-specific&#34; &gt;Domain-Specific&lt;/a&gt;&lt;br&gt;
          ▫     &lt;a class=&#34;link&#34; href=&#34;#project-scaffolding--mcp&#34; &gt;Project Scaffolding &amp;amp; MCP&lt;/a&gt;&lt;br&gt;
▪     &lt;a class=&#34;link&#34; href=&#34;#official-documentation-&#34; &gt;Official Documentation&lt;/a&gt;&lt;/p&gt;
&lt;br&gt;
&lt;h2 id=&#34;workflows--knowledge-guides-&#34;&gt;Workflows &amp;amp; Knowledge Guides 🧠
&lt;/h2&gt;&lt;blockquote&gt;
&lt;p&gt;A &lt;strong&gt;workflow&lt;/strong&gt; is a tightly coupled set of Claude Code-native resources that facilitate specific projects&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/cloudartisan/cloudartisan.github.io/tree/main/.claude/commands&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Blogging Platform Instructions&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/cloudartisan&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;cloudartisan&lt;/a&gt;    ⚖️  CC-BY-SA-4.0&lt;br&gt;
Provides a well-structured set of commands for publishing and maintaining a blogging platform, including commands for creating posts, managing categories, and handling media files.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://claudelog.com&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;ClaudeLog&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://www.reddit.com/user/inventor_black/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;InventorBlack&lt;/a&gt; &lt;br&gt;
A comprehensive knowledge repository that features detailed breakdowns of advanced Claude Code mechanics including &lt;a class=&#34;link&#34; href=&#34;https://claudelog.com/mechanics/claude-md-supremacy&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;CLAUDE.md best practices&lt;/a&gt;, practical technique guides like &lt;a class=&#34;link&#34; href=&#34;https://claudelog.com/mechanics/plan-mode&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;plan mode&lt;/a&gt;, and a &lt;a class=&#34;link&#34; href=&#34;https://claudelog.com/configuration&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;configuration guide&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/disler/just-prompt/tree/main/.claude/commands&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Context Priming&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/disler&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;disler&lt;/a&gt; &lt;br&gt;
Provides a systematic approach to priming Claude Code with comprehensive project context through specialized commands for different project scenarios and development contexts.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/kingler/n8n_agent/tree/main/.claude/commands&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;n8n_agent&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/kingler&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;kingler&lt;/a&gt; &lt;br&gt;
Amazing comprehensive set of comments for code analysis, QA, design, documentation, project structure, project management, optimization, and many more.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/steadycursor/steadystart/tree/main/.claude/commands&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Project Bootstrapping and Task Management&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/steadycursor&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;steadycursor&lt;/a&gt; &lt;br&gt;
Provides a structured set of commands for bootstrapping and managing a new project, including meta-commands for creating and editing custom slash-commands.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/scopecraft/command/tree/main/.claude/commands&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Project Management, Implementation, Planning, and Release&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/scopecraft&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;scopecraft&lt;/a&gt; &lt;br&gt;
Really comprehensive set of commands for all aspects of SDLC.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/harperreed/dotfiles/tree/master/.claude/commands&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Project Workflow System&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/harperreed&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;harperreed&lt;/a&gt; &lt;br&gt;
A set of commands that provide a comprehensive workflow system for managing projects, including task management, code review, and deployment processes.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://diwank.space/field-notes-from-shipping-real-code-with-claude&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Shipping Real Code w/ Claude&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/creatorrr&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Diwank&lt;/a&gt; &lt;br&gt;
A detailed blog post explaining the author&amp;rsquo;s process for shipping a product with Claude Code, including CLAUDE.md files and other interesting resources.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Helmi/claude-simone&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Simone&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/Helmi&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Helmi&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
A broader project management workflow for Claude Code that encompasses not just a set of commands, but a system of documents, guidelines, and processes to facilitate project planning and execution.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/wcygan/dotfiles/tree/d8ab6b9f5a7a81007b7f5fa3025d4f83ce12cc02/claude/commands&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Slash-commands megalist&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/wcygan&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;wcygan&lt;/a&gt; &lt;br&gt;
A pretty stunning list (88 at the time of this post!) of slash-commands ranging from agent orchestration, code review, project management, security, documentation, self-assessment, almost anything you can dream of.&lt;/p&gt;
&lt;br&gt;
&lt;h2 id=&#34;tooling-&#34;&gt;Tooling 🧰
&lt;/h2&gt;&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tooling&lt;/strong&gt; denotes applications that are built on top of Claude Code and consist of more components than slash-commands and &lt;code&gt;CLAUDE.md&lt;/code&gt; files&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/ryoppippi/ccusage&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;CC Usage&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/ryoppippi&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ryoppippi&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Handy CLI tool for managing and analyzing Claude Code usage, based on analyzing local Claude Code logs. Presents a nice dashboard regarding cost information, token consumption, etc.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/nyatinte/ccexp&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;ccexp&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/nyatinte&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;nyatinte&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Interactive CLI tool for discovering and managing Claude Code configuration files and slash commands with a beautiful terminal UI.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/ruvnet/claude-code-flow&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Claude Code Flow&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/ruvnet&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ruvnet&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
This mode serves as a code-first orchestration layer, enabling Claude to write, edit, test, and optimize code autonomously across recursive agent cycles.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/possibilities/claude-composer&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Claude Composer&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/possibilities&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Mike Bannister&lt;/a&gt;    ⚖️  Unlicense&lt;br&gt;
A tool that adds small enhancements to Claude Code.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/claude-did-this/claude-hub&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Claude Hub&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/claude-did-this&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Claude Did This&lt;/a&gt; &lt;br&gt;
A webhook service that connects Claude Code to GitHub repositories, enabling AI-powered code assistance directly through pull requests and issues. This integration allows Claude to analyze repositories, answer technical questions, and help developers understand and improve their codebase through simple @mentions.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/smtg-ai/claude-squad&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Claude Squad&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/smtg-ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;smtg-ai&lt;/a&gt;    ⚖️  AGPL-3.0&lt;br&gt;
Claude Squad is a terminal app that manages multiple Claude Code, Codex (and other local agents including Aider) in separate workspaces, allowing you to work on multiple tasks simultaneously.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/parruda/claude-swarm&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Claude Swarm&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/parruda&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;parruda&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Launch Claude Code session that is connected to a swarm of Claude Code Agents.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/eyaltoledano/claude-task-master&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Claude Task Master&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/eyaltoledano&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;eyaltoledano&lt;/a&gt;    ⚖️  NOASSERTION&lt;br&gt;
A task management system for AI-driven development with Claude, designed to work seamlessly with Cursor AI.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/grahama1970/claude-task-runner&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Claude Task Runner&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/grahama1970&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;grahama1970&lt;/a&gt; &lt;br&gt;
A specialized tool to manage context isolation and focused task execution with Claude Code, solving the critical challenge of context length limitations and task focus when working with Claude on complex, multi-step projects.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/dagger/container-use&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Container Use&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/dagger&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;dagger&lt;/a&gt;    ⚖️  Apache-2.0&lt;br&gt;
Development environments for coding agents. Enable multiple agents to work safely and independently with your preferred stack.&lt;/p&gt;
&lt;h3 id=&#34;ide-integrations&#34;&gt;IDE Integrations
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/stevemolitor/claude-code.el&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;claude-code.el&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/stevemolitor&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;stevemolitor&lt;/a&gt;    ⚖️  Apache-2.0&lt;br&gt;
An Emacs interface for Claude Code CLI.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/greggh/claude-code.nvim&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;claude-code.nvim&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/greggh&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;greggh&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
A seamless integration between Claude Code AI assistant and Neovim.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/stravu/crystal&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;crystal&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/stravu&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;stravu&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
A full-fledged desktop application for orchestrating, monitoring, and interacting with Claude Code agents.&lt;/p&gt;
&lt;br&gt;
&lt;h2 id=&#34;hooks-&#34;&gt;Hooks 🪝
&lt;/h2&gt;&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Hooks&lt;/strong&gt; are a brand new API for Claude Code that allows users to activate commands and run scripts at different points in Claude&amp;rsquo;s agentic lifecycle.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;[Experimental]&lt;/strong&gt; - The resources listed in this section have not been fully vetted and may not work as expected, given the bleeding-edge nature of Claude Code hooks. Nevertheless, I wished to include them at least as a source of inspiration and to explore this unknown terrain. YMMV!&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/beyondcode/claude-hooks-sdk&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;claude-code-hooks-sdk&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/beyondcode&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;beyondcode&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
A Laravel-inspired PHP SDK for building Claude Code hook responses with a clean, fluent API. This SDK makes it easy to create structured JSON responses for Claude Code hooks using an expressive, chainable interface.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/johnlindquist/claude-hooks&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;claude-hooks&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/johnlindquist&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;John Lindquist&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
A TypeScript-based system for configuring and customizing Claude Code hooks with a powerful and flexible interface.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Veraticus/nix-config/tree/main/home-manager/claude-code/hooks&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Linting, testing, and notifications (in go)&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/Veraticus&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Josh Symonds&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Nice set of hooks for enforcing code quality (linting, testing, notifications), with a nice configuration setup as well.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/nizos/tdd-guard&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;TDD Guard&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/nizos&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Nizar Selander&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
A hooks-driven system that monitors file operations in real-time and blocks changes that violate TDD principles.&lt;/p&gt;
&lt;br&gt;
&lt;h2 id=&#34;slash-commands-&#34;&gt;Slash-Commands 🔪
&lt;/h2&gt;&lt;h3 id=&#34;version-control--git&#34;&gt;Version Control &amp;amp; Git
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/danielscholl/mvn-mcp-server/blob/main/.claude/commands/bug-fix.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/bug-fix&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/danielscholl&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;danielscholl&lt;/a&gt; &lt;br&gt;
Streamlines bug fixing by creating a GitHub issue first, then a feature branch for implementing and thoroughly testing the solution before merging.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/evmts/tevm-monorepo/blob/main/.claude/commands/commit.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/commit&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/evmts&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;evmts&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Creates git commits using conventional commit format with appropriate emojis, following project standards and creating descriptive messages that explain the purpose of changes.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/steadycursor/steadystart/blob/main/.claude/commands/2-commit-fast.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/commit-fast&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/steadycursor&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;steadycursor&lt;/a&gt; &lt;br&gt;
Automates git commit process by selecting the first suggested message, generating structured commits with consistent formatting while skipping manual confirmation and removing Claude co-Contributorship footer&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/toyamarinyon/giselle/blob/main/.claude/commands/create-pr.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/create-pr&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/toyamarinyon&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;toyamarinyon&lt;/a&gt;    ⚖️  Apache-2.0&lt;br&gt;
Streamlines pull request creation by handling the entire workflow: creating a new branch, committing changes, formatting modified files with Biome, and submitting the PR.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/liam-hq/liam/blob/main/.claude/commands/create-pull-request.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/create-pull-request&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/liam-hq&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;liam-hq&lt;/a&gt;    ⚖️  Apache-2.0&lt;br&gt;
Provides comprehensive PR creation guidance with GitHub CLI, enforcing title conventions, following template structure, and offering concrete command examples with best practices.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/evmts/tevm-monorepo/blob/main/.claude/commands/create-worktrees.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/create-worktrees&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/evmts&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;evmts&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Creates git worktrees for all open PRs or specific branches, handling branches with slashes, cleaning up stale worktrees, and supporting custom branch creation for development.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/jeremymailen/kotlinter-gradle/blob/master/.claude/commands/fix-github-issue.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/fix-github-issue&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/jeremymailen&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;jeremymailen&lt;/a&gt;    ⚖️  Apache-2.0&lt;br&gt;
Analyzes and fixes GitHub issues using a structured approach with GitHub CLI for issue details, implementing necessary code changes, running tests, and creating proper commit messages.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/metabase/metabase/blob/master/.claude/commands/fix-issue.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/fix-issue&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/metabase&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;metabase&lt;/a&gt;    ⚖️  NOASSERTION&lt;br&gt;
Addresses GitHub issues by taking issue number as parameter, analyzing context, implementing solution, and testing/validating the fix for proper integration.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/metabase/metabase/blob/master/.claude/commands/fix-pr.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/fix-pr&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/metabase&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;metabase&lt;/a&gt;    ⚖️  NOASSERTION&lt;br&gt;
Fetches and fixes unresolved PR comments by automatically retrieving feedback, addressing reviewer concerns, making targeted code improvements, and streamlining the review process.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/evmts/tevm-monorepo/blob/main/.claude/commands/husky.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/husky&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/evmts&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;evmts&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Sets up and manages Husky Git hooks by configuring pre-commit hooks, establishing commit message standards, integrating with linting tools, and ensuring code quality on commits.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/arkavo-org/opentdf-rs/blob/main/.claude/commands/pr-review.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/pr-review&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/arkavo-org&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;arkavo-org&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Reviews pull request changes to provide feedback, check for issues, and suggest improvements before merging into the main codebase.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/giselles-ai/giselle/blob/main/.claude/commands/update-branch-name.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/update-branch-name&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/giselles-ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;giselles-ai&lt;/a&gt;    ⚖️  Apache-2.0&lt;br&gt;
Updates branch names with proper prefixes and formats, enforcing naming conventions, supporting semantic prefixes, and managing remote branch updates.&lt;/p&gt;
&lt;h3 id=&#34;code-analysis--testing&#34;&gt;Code Analysis &amp;amp; Testing
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/rygwdn/slack-tools/blob/main/.claude/commands/check.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/check&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/rygwdn&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;rygwdn&lt;/a&gt; &lt;br&gt;
Performs comprehensive code quality and security checks, featuring static analysis integration, security vulnerability scanning, code style enforcement, and detailed reporting.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Graphlet-AI/eridu/blob/main/.claude/commands/clean.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/clean&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/Graphlet-AI&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Graphlet-AI&lt;/a&gt;    ⚖️  Apache-2.0&lt;br&gt;
Addresses code formatting and quality issues by fixing black formatting problems, organizing imports with isort, resolving flake8 linting issues, and correcting mypy type errors.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/kingler/n8n_agent/blob/main/.claude/commands/code_analysis.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/code_analysis&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/kingler&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;kingler&lt;/a&gt; &lt;br&gt;
Provides a menu of advanced code analysis commands for deep inspection, including knowledge graph generation, optimization suggestions, and quality evaluation.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/to4iki/ai-project-rules/blob/main/.claude/commands/optimize.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/optimize&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/to4iki&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;to4iki&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Analyzes code performance to identify bottlenecks, proposing concrete optimizations with implementation guidance for improved application performance.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/rzykov/metabase/blob/master/.claude/commands/repro-issue.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/repro-issue&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/rzykov&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;rzykov&lt;/a&gt;    ⚖️  NOASSERTION&lt;br&gt;
Creates reproducible test cases for GitHub issues, ensuring tests fail reliably and documenting clear reproduction steps for developers.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/zscott/pane/blob/main/.claude/commands/tdd.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/tdd&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/zscott&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;zscott&lt;/a&gt; &lt;br&gt;
Guides development using Test-Driven Development principles, enforcing Red-Green-Refactor discipline, integrating with git workflow, and managing PR creation.&lt;/p&gt;
&lt;h3 id=&#34;context-loading--priming&#34;&gt;Context Loading &amp;amp; Priming
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/elizaOS/elizaos.github.io/blob/main/.claude/commands/context-prime.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/context-prime&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/elizaOS&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;elizaOS&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Primes Claude with comprehensive project understanding by loading repository structure, setting development context, establishing project goals, and defining collaboration parameters.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/okuvshynov/cubestat/blob/main/.claude/commands/initref.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/initref&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/okuvshynov&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;okuvshynov&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Initializes reference documentation structure with standard doc templates, API reference setup, documentation conventions, and placeholder content generation.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/ethpandaops/xatu-data/blob/master/.claude/commands/load-llms-txt.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/load-llms-txt&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/ethpandaops&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ethpandaops&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Loads LLM configuration files to context, importing specific terminology, model configurations, and establishing baseline terminology for AI discussions.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Mjvolk3/torchcell/blob/main/.claude/commands/load_coo_context.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/load_coo_context&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/Mjvolk3&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Mjvolk3&lt;/a&gt; &lt;br&gt;
References specific files for sparse matrix operations, explains transform usage, compares with previous approaches, and sets data formatting context for development.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Mjvolk3/torchcell/blob/main/.claude/commands/load_dango_pipeline.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/load_dango_pipeline&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/Mjvolk3&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Mjvolk3&lt;/a&gt; &lt;br&gt;
Sets context for model training by referencing pipeline files, establishing working context, and preparing for pipeline work with relevant documentation.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/yzyydev/AI-Engineering-Structure/blob/main/.claude/commands/prime.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/prime&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/yzyydev&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;yzyydev&lt;/a&gt; &lt;br&gt;
Sets up initial project context by viewing directory structure and reading key files, creating standardized context with directory visualization and key documentation focus.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/ddisisto/si/blob/main/.claude/commands/rsi.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/rsi&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/ddisisto&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ddisisto&lt;/a&gt; &lt;br&gt;
Reads all commands and key project files to optimize AI-assisted development by streamlining the process, loading command context, and setting up for better development workflow.&lt;/p&gt;
&lt;h3 id=&#34;documentation--changelogs&#34;&gt;Documentation &amp;amp; Changelogs
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/berrydev-ai/blockdoc-python/blob/main/.claude/commands/add-to-changelog.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/add-to-changelog&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/berrydev-ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;berrydev-ai&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Adds new entries to changelog files while maintaining format consistency, properly documenting changes, and following established project standards for version tracking.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/jerseycheese/Narraitor/tree/feature/issue-227-ai-suggestions/.claude/commands/analyze-issue.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/create-docs&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/jerseycheese&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;jerseycheese&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Analyzes code structure and purpose to create comprehensive documentation detailing inputs/outputs, behavior, user interaction flows, and edge cases with error handling.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/slunsford/coffee-analytics/blob/main/.claude/commands/docs.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/docs&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/slunsford&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;slunsford&lt;/a&gt; &lt;br&gt;
Generates comprehensive documentation that follows project structure, documenting APIs and usage patterns with consistent formatting for better user understanding.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/hackdays-io/toban-contribution-viewer/blob/main/.claude/commands/explain-issue-fix.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/explain-issue-fix&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/hackdays-io&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;hackdays-io&lt;/a&gt; &lt;br&gt;
Documents solution approaches for GitHub issues, explaining technical decisions, detailing challenges overcome, and providing implementation context for better understanding.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Consiliency/Flutter-Structurizr/blob/main/.claude/commands/update-docs.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/update-docs&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/Consiliency&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Consiliency&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Reviews current documentation status, updates implementation progress, reviews phase documents, and maintains documentation consistency across the project.&lt;/p&gt;
&lt;h3 id=&#34;ci--deployment&#34;&gt;CI / Deployment
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/kelp/webdown/blob/main/.claude/commands/release.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/release&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/kelp&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;kelp&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Manages software releases by updating changelogs, reviewing README changes, evaluating version increments, and documenting release changes for better version tracking.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/hackdays-io/toban-contribution-viewer/blob/main/.claude/commands/run-ci.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/run-ci&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/hackdays-io&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;hackdays-io&lt;/a&gt; &lt;br&gt;
Activates virtual environments, runs CI-compatible check scripts, iteratively fixes errors, and ensures all tests pass before completion.&lt;/p&gt;
&lt;h3 id=&#34;project--task-management&#34;&gt;Project &amp;amp; Task Management
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/scopecraft/command/blob/main/.claude/commands/create-command.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/create-command&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/scopecraft&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;scopecraft&lt;/a&gt; &lt;br&gt;
Guides Claude through creating new custom commands with proper structure by analyzing requirements, templating commands by category, enforcing command standards, and creating supporting documentation.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/taddyorg/inkverse/blob/main/.claude/commands/create-jtbd.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/create-jtbd&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/taddyorg&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;taddyorg&lt;/a&gt;    ⚖️  AGPL-3.0&lt;br&gt;
Creates Jobs-to-be-Done frameworks that outline user needs with structured format, focusing on specific user problems and organizing by job categories for product development.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/taddyorg/inkverse/blob/main/.claude/commands/create-prd.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/create-prd&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/taddyorg&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;taddyorg&lt;/a&gt;    ⚖️  AGPL-3.0&lt;br&gt;
Generates comprehensive product requirement documents outlining detailed specifications, requirements, and features following standardized document structure and format.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Wirasm/claudecode-utils/blob/main/.claude/commands/create-prp.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/create-prp&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/Wirasm&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Wirasm&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Creates product requirement plans by reading PRP methodology, following template structure, creating comprehensive requirements, and structuring product definitions for development.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/disler/just-prompt/blob/main/.claude/commands/project_hello_w_name.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/project_hello_w_name&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/disler&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;disler&lt;/a&gt; &lt;br&gt;
Creates customizable greeting components with name input, demonstrating argument passing, component reusability, state management, and user input handling.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/chrisleyva/todo-slash-command/blob/main/todo.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/todo&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/chrisleyva&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;chrisleyva&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
A convenient command to quickly manage project todo items without leaving the Claude Code interface, featuring due dates, sorting, task prioritization, and comprehensive todo list management.&lt;/p&gt;
&lt;h3 id=&#34;miscellaneous&#34;&gt;Miscellaneous
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/TuckerTucker/tkr-portfolio/blob/main/.claude/commands/five.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/five&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/TuckerTucker&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;TuckerTucker&lt;/a&gt; &lt;br&gt;
Applies the &amp;ldquo;five whys&amp;rdquo; methodology to perform root cause analysis, identify underlying issues, and create solution approaches for complex problems.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Mjvolk3/torchcell/blob/main/.claude/commands/fixing_go_in_graph.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/fixing_go_in_graph&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/Mjvolk3&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Mjvolk3&lt;/a&gt; &lt;br&gt;
Focuses on Gene Ontology annotation integration in graph databases, handling multiple data sources, addressing graph representation issues, and ensuring correct data incorporation.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/GaloyMoney/lana-bank/blob/main/.claude/commands/mermaid.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/mermaid&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/GaloyMoney&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GaloyMoney&lt;/a&gt;    ⚖️  NOASSERTION&lt;br&gt;
Generates Mermaid diagrams from SQL schema files, creating entity relationship diagrams with table properties, validating diagram compilation, and ensuring complete entity coverage.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Mjvolk3/torchcell/blob/main/.claude/commands/review_dcell_model.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/review_dcell_model&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/Mjvolk3&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Mjvolk3&lt;/a&gt; &lt;br&gt;
Reviews old Dcell implementation files, comparing with newer Dango model, noting changes over time, and analyzing refactoring approaches for better code organization.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/zuplo/docs/blob/main/.claude/commands/use-stepper.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;/use-stepper&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/zuplo&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;zuplo&lt;/a&gt; &lt;br&gt;
Reformats documentation to use React Stepper component, transforming heading formats, applying proper indentation, and maintaining markdown compatibility with admonition formatting.&lt;/p&gt;
&lt;br&gt;
&lt;h2 id=&#34;claudemd-files-&#34;&gt;CLAUDE.md Files 📂
&lt;/h2&gt;&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt; files&lt;/strong&gt; are files that contain important guidelines and context-specfic information or instructions that help Claude Code to better understand your project and your coding standards&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id=&#34;language-specific&#34;&gt;Language-Specific
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/didalgolab/ai-intellij-plugin/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;AI IntelliJ Plugin&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/didalgolab&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;didalgolab&lt;/a&gt;    ⚖️  Apache-2.0&lt;br&gt;
Provides comprehensive Gradle commands for IntelliJ plugin development with platform-specific coding patterns, detailed package structure guidelines, and clear internationalization standards.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/alexei-led/aws-mcp-server/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;AWS MCP Server&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/alexei-led&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;alexei-led&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Features multiple Python environment setup options with detailed code style guidelines, comprehensive error handling recommendations, and security considerations for AWS CLI interactions.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/touchlab/DroidconKotlin/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;DroidconKotlin&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/touchlab&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;touchlab&lt;/a&gt;    ⚖️  Apache-2.0&lt;br&gt;
Delivers comprehensive Gradle commands for cross-platform Kotlin Multiplatform development with clear module structure and practical guidance for dependency injection.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/expectedparrot/edsl/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;EDSL&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/expectedparrot&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;expectedparrot&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Offers detailed build and test commands with strict code style enforcement, comprehensive testing requirements, and standardized development workflow using Black and mypy.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/giselles-ai/giselle/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Giselle&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/giselles-ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;giselles-ai&lt;/a&gt;    ⚖️  Apache-2.0&lt;br&gt;
Provides detailed build and test commands using pnpm and Vitest with strict code formatting requirements and comprehensive naming conventions for code consistency.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/hashintel/hash/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;HASH&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/hashintel&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;hashintel&lt;/a&gt;    ⚖️  NOASSERTION&lt;br&gt;
Features comprehensive repository structure breakdown with strong emphasis on coding standards, detailed Rust documentation guidelines, and systematic PR review process.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/inkline/inkline/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Inkline&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/inkline&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;inkline&lt;/a&gt;    ⚖️  NOASSERTION&lt;br&gt;
Structures development workflow using pnpm with emphasis on TypeScript and Vue 3 Composition API, detailed component creation process, and comprehensive testing recommendations.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/mattgodbolt/jsbeeb/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;JSBeeb&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/mattgodbolt&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;mattgodbolt&lt;/a&gt;    ⚖️  GPL-3.0&lt;br&gt;
Provides development guide for JavaScript BBC Micro emulator with build and testing instructions, architecture documentation, and debugging workflows.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/LamoomAI/lamoom-python/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Lamoom Python&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/LamoomAI&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;LamoomAI&lt;/a&gt;    ⚖️  Apache-2.0&lt;br&gt;
Serves as reference for production prompt engineering library with load balancing of AI Models, API documentation, and usage patterns with examples.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/langchain-ai/langgraphjs/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;LangGraphJS&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/langchain-ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;langchain-ai&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Offers comprehensive build and test commands with detailed TypeScript style guidelines, layered library architecture, and monorepo structure using yarn workspaces.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/metabase/metabase/blob/master/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Metabase&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/metabase&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;metabase&lt;/a&gt;    ⚖️  NOASSERTION&lt;br&gt;
Details workflow for REPL-driven development in Clojure/ClojureScript with emphasis on incremental development, testing, and step-by-step approach for feature implementation.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/sgcarstrends/backend/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;SG Cars Trends Backend&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/sgcarstrends&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;sgcarstrends&lt;/a&gt; &lt;br&gt;
Provides comprehensive structure for TypeScript monorepo projects with detailed commands for development, testing, deployment, and AWS/Cloudflare integration.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/spylang/spy/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;SPy&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/spylang&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;spylang&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Enforces strict coding conventions with comprehensive testing guidelines, multiple code compilation options, and backend-specific test decorators for targeted filtering.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/KarpelesLab/tpl/blob/master/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;TPL&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/KarpelesLab&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;KarpelesLab&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Details Go project conventions with comprehensive error handling recommendations, table-driven testing approach guidelines, and modernization suggestions for latest Go features.&lt;/p&gt;
&lt;h3 id=&#34;domain-specific&#34;&gt;Domain-Specific
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Layr-Labs/avs-vibe-developer-guide/blob/master/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;AVS Vibe Developer Guide&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/Layr-Labs&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Layr-Labs&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Structures AI-assisted EigenLayer AVS development workflow with consistent naming conventions for prompt files and established terminology standards for blockchain concepts.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/CommE2E/comm/blob/master/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Comm&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/CommE2E&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;CommE2E&lt;/a&gt;    ⚖️  BSD-3-Clause&lt;br&gt;
Serves as a development reference for E2E-encrypted messaging applications with code organization architecture, security implementation details, and testing procedures.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/badass-courses/course-builder/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Course Builder&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/badass-courses&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;badass-courses&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Enables real-time multiplayer capabilities for collaborative course creation with diverse tech stack integration and monorepo architecture using Turborepo.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/eastlondoner/cursor-tools/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Cursor Tools&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/eastlondoner&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;eastlondoner&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Creates a versatile AI command interface supporting multiple providers and models with flexible command options and browser automation through &amp;ldquo;Stagehand&amp;rdquo; feature.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/soramimi/Guitar/blob/master/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Guitar&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/soramimi&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;soramimi&lt;/a&gt;    ⚖️  GPL-2.0&lt;br&gt;
Serves as development guide for Guitar Git GUI Client with build commands for various platforms, code style guidelines for contributing, and project structure explanation.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Fimeg/NetworkChronicles/blob/legacy-v1/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Network Chronicles&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/Fimeg&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Fimeg&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Presents detailed implementation plan for AI-driven game characters with technical specifications for LLM integration, character guidelines, and service discovery mechanics.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/different-ai/note-companion/blob/master/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Note Companion&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/different-ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;different-ai&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Provides detailed styling isolation techniques for Obsidian plugins using Tailwind with custom prefix to prevent style conflicts and practical troubleshooting steps.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/ParetoSecurity/pareto-mac/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Pareto Mac&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/ParetoSecurity&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ParetoSecurity&lt;/a&gt;    ⚖️  GPL-3.0&lt;br&gt;
Serves as development guide for Mac security audit tool with build instructions, contribution guidelines, testing procedures, and workflow documentation.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/steadycursor/steadystart/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;SteadyStart&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/steadycursor&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;steadycursor&lt;/a&gt; &lt;br&gt;
Clear and direct instructives about style, permissions, Claude&amp;rsquo;s &amp;ldquo;role&amp;rdquo;, communications, and documentation of Claude Code sessions for other team members to stay abreast.&lt;/p&gt;
&lt;h3 id=&#34;project-scaffolding--mcp&#34;&gt;Project Scaffolding &amp;amp; MCP
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/basicmachines-co/basic-memory/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Basic Memory&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/basicmachines-co&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;basicmachines-co&lt;/a&gt;    ⚖️  AGPL-3.0&lt;br&gt;
Presents an innovative AI-human collaboration framework with Model Context Protocol for bidirectional LLM-markdown communication and flexible knowledge structure for complex projects.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/grahama1970/claude-code-mcp-enhanced/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;claude-code-mcp-enhanced&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/grahama1970&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;grahama1970&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Provides detailed and emphatic instructions for Claude to follow as a coding agent, with testing guidance, code examples, and compliance checks.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/Family-IT-Guy/perplexity-mcp/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Perplexity MCP&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/Family-IT-Guy&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Family-IT-Guy&lt;/a&gt;    ⚖️  ISC&lt;br&gt;
Offers clear step-by-step installation instructions with multiple configuration options, detailed troubleshooting guidance, and concise architecture overview of the MCP protocol.&lt;/p&gt;
&lt;br&gt;
&lt;h2 id=&#34;official-documentation-&#34;&gt;Official Documentation 🏛️
&lt;/h2&gt;&lt;blockquote&gt;
&lt;p&gt;Links to some of Anthropic&amp;rsquo;s terrific documentation and resources regarding Claude Code&lt;/p&gt;
&lt;/blockquote&gt;
&lt;!--lint disable double-link--&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://docs.anthropic.com/en/docs/claude-code&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Anthropic Documentation&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/anthropics&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Anthropic&lt;/a&gt;    ⚖️  ©&lt;br&gt;
The official documentation for Claude Code, including installation instructions, usage guidelines, API references, tutorials, examples, loads of information that I won&amp;rsquo;t list individually. Like Claude Code, the documentation is frequently updated.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/anthropics/anthropic-quickstarts/blob/main/CLAUDE.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Anthropic Quickstarts&lt;/code&gt;&lt;/a&gt;   by   &lt;a class=&#34;link&#34; href=&#34;https://github.com/anthropics&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Anthropic&lt;/a&gt;    ⚖️  MIT&lt;br&gt;
Offers comprehensive development guides for three distinct AI-powered demo projects with standardized workflows, strict code style guidelines, and containerization instructions.&lt;/p&gt;
&lt;h2 id=&#34;contributing-&#34;&gt;Contributing 🌻
&lt;/h2&gt;&lt;p&gt;Please note that this project is released with a &lt;a class=&#34;link&#34; href=&#34;code-of-conduct.md&#34; &gt;Contributor Code of Conduct&lt;/a&gt;. By participating in this project you agree to abide by its terms.&lt;/p&gt;
&lt;p&gt;Regarding content, we especially welcome:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Proven, effective resources that follow best practices and may even be in use in production.&lt;/li&gt;
&lt;li&gt;Innovative, creative, or experimental workflows that perhaps are still being iterated upon, but have high potential value, and push the boundaries of Claude Code&amp;rsquo;s documented capabilities and use cases.&lt;/li&gt;
&lt;li&gt;Additional libraries and tooling that are built on top of Claude Code and offer enhanced functionality.&lt;/li&gt;
&lt;li&gt;Applications of Claude Code outside of the traditional &amp;ldquo;coding assistant&amp;rdquo; context, e.g., CI/CD integration, testing, documentation, dev-ops, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See &lt;a class=&#34;link&#34; href=&#34;CONTRIBUTING.md&#34; &gt;CONTRIBUTING.md&lt;/a&gt; for more information on how to contribute to this project. Or, fire up Claude Code and invoke the &lt;code&gt;/project:add-new-resource&lt;/code&gt; command and let Claude walk you through it!&lt;/p&gt;
&lt;p&gt;If you have any suggestions or thoughts on how to improve the repo, or how to best organize the list, feel free to start a Discussion topic. This is meant to be for the Claude Code community, and in general I prefer not to act on sole authority.&lt;/p&gt;
&lt;h3 id=&#34;a-note-about-licenses&#34;&gt;A note about licenses
&lt;/h3&gt;&lt;p&gt;Because simply listing a hyperlink does not qualify as redistribution, the license of the original source is not relevant to its inclusion. However, for posterity and convenience, we do host copies of all resources whose license permits it. Therefore, please include information about the resource&amp;rsquo;s license. Additionally, take note: &lt;em&gt;if you do not include a LICENSE in your GitHub repo, then by default it is fully copyrighted and redistribution is not allowed&lt;/em&gt;. So, if you are intending to make an open source project, it&amp;rsquo;s critical to pick from one of the many available open source licenses. This is just a reminder that without a LICENSE, your project is not open source (it&amp;rsquo;s merely source-code-available) - it may of course still be included on this list, but this notice is to inform readers about the default rules regarding GitHub and LICENSE files. See &lt;a class=&#34;link&#34; href=&#34;https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/licensing-a-repository&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt; for more details.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>planka</title>
        <link>https://producthunt.programnotes.cn/en/p/planka/</link>
        <pubDate>Wed, 04 Jun 2025 15:30:26 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/planka/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1541021771462-0362c99b0f1f?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NDkwMjIxNTd8&amp;ixlib=rb-4.1.0" alt="Featured image of post planka" /&gt;&lt;h1 id=&#34;plankanbanplanka&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/plankanban/planka&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;plankanban/planka&lt;/a&gt;
&lt;/h1&gt;&lt;h1 id=&#34;planka&#34;&gt;PLANKA
&lt;/h1&gt;&lt;p&gt;&lt;strong&gt;Project mastering driven by fun&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://img.shields.io/github/package-json/v/plankanban/planka?style=flat-square&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Version&#34;
	
	
&gt; &lt;a class=&#34;link&#34; href=&#34;https://github.com/plankanban/planka/pkgs/container/planka&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/docker_pulls-6M%2B-%23066da5?style=flat-square&amp;amp;color=red&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Docker Pulls&#34;
	
	
&gt;&lt;/a&gt; &lt;a class=&#34;link&#34; href=&#34;https://github.com/plankanban/planka/graphs/contributors&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/github/contributors/plankanban/planka?style=flat-square&amp;amp;color=blue&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Contributors&#34;
	
	
&gt;&lt;/a&gt; &lt;a class=&#34;link&#34; href=&#34;https://discord.gg/WqqYNd7Jvt&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/discord/1041440072953765979?style=flat-square&amp;amp;logo=discord&amp;amp;logoColor=white&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Chat&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://raw.githubusercontent.com/plankanban/planka/master/assets/demo.gif&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Demo&#34;
	
	
&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://plankanban.github.io/planka&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;strong&gt;Client demo&lt;/strong&gt;&lt;/a&gt; (without server features).&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;⚠️ The demo GIF and client demo are based on &lt;strong&gt;v1&lt;/strong&gt; and will be updated soon.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;key-features&#34;&gt;Key Features
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Collaborative Kanban Boards&lt;/strong&gt;: Create projects, boards, lists, cards, and manage tasks with an intuitive drag-and-drop interface&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real-Time Updates&lt;/strong&gt;: Instant syncing across all users, no refresh needed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rich Markdown Support&lt;/strong&gt;: Write beautifully formatted card descriptions with a powerful markdown editor&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flexible Notifications&lt;/strong&gt;: Get alerts through 100+ providers, fully customizable to your workflow&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Seamless Authentication&lt;/strong&gt;: Single sign-on with OpenID Connect integration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multilingual &amp;amp; Easy to Translate&lt;/strong&gt;: Full internationalization support for a global audience&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-to-deploy&#34;&gt;How to Deploy
&lt;/h2&gt;&lt;p&gt;PLANKA is easy to install using multiple methods - learn more in the &lt;a class=&#34;link&#34; href=&#34;https://docs.planka.cloud/docs/welcome/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;installation guide&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For configuration and environment settings, see the &lt;a class=&#34;link&#34; href=&#34;https://docs.planka.cloud/docs/category/configuration/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;configuration section&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;contact&#34;&gt;Contact
&lt;/h2&gt;&lt;p&gt;Interested in a hosted version of PLANKA? Email us at &lt;a class=&#34;link&#34; href=&#34;mailto:github@planka.group&#34; &gt;github@planka.group&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For any security issues, please do not create a public issue on GitHub - instead, report it privately by emailing &lt;a class=&#34;link&#34; href=&#34;mailto:security@planka.group&#34; &gt;security@planka.group&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; We do NOT offer any public support via email, please use GitHub.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Join our community:&lt;/strong&gt; Get help, share ideas, or contribute on our &lt;a class=&#34;link&#34; href=&#34;https://discord.gg/WqqYNd7Jvt&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Discord server&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;license&#34;&gt;License
&lt;/h2&gt;&lt;p&gt;PLANKA is &lt;a class=&#34;link&#34; href=&#34;https://faircode.io&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;fair-code&lt;/a&gt; distributed under the &lt;a class=&#34;link&#34; href=&#34;https://github.com/plankanban/planka/blob/master/LICENSES/PLANKA%20Community%20License%20EN.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Fair Use License&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/plankanban/planka/blob/master/LICENSES/PLANKA%20Commercial%20License%20EN.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;PLANKA Pro/Enterprise License&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Source Available&lt;/strong&gt;: The source code is always visible&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Self-Hostable&lt;/strong&gt;: Deploy and host it anywhere&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Extensible&lt;/strong&gt;: Customize with your own functionality&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enterprise Licenses&lt;/strong&gt;: Available for additional features and support&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more details, check the &lt;a class=&#34;link&#34; href=&#34;https://github.com/plankanban/planka/blob/master/LICENSES/PLANKA%20License%20Guide%20EN.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;License Guide&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;contributing&#34;&gt;Contributing
&lt;/h2&gt;&lt;p&gt;Found a bug or have a feature request? Check out our &lt;a class=&#34;link&#34; href=&#34;https://github.com/plankanban/planka/blob/master/CONTRIBUTING.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Contributing Guide&lt;/a&gt; to get started.&lt;/p&gt;
&lt;p&gt;For setting up the project locally, see the &lt;a class=&#34;link&#34; href=&#34;https://docs.planka.cloud/docs/category/development/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;development section&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thanks to all our contributors!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/plankanban/planka/graphs/contributors&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://contrib.rocks/image?repo=plankanban/planka&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Contributors&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
</description>
        </item>
        <item>
        <title>langflow</title>
        <link>https://producthunt.programnotes.cn/en/p/langflow/</link>
        <pubDate>Fri, 30 May 2025 15:29:59 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/langflow/</guid>
        <description>&lt;img src="https://images.unsplash.com/flagged/photo-1572850005109-f4ac7529bf9f?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NDg1OTAxMDZ8&amp;ixlib=rb-4.1.0" alt="Featured image of post langflow" /&gt;&lt;h1 id=&#34;langflow-ailangflow&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/langflow-ai/langflow&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;langflow-ai/langflow&lt;/a&gt;
&lt;/h1&gt;&lt;!-- markdownlint-disable MD030 --&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/langflow-ai/langflow/releases&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/github/release/langflow-ai/langflow?style=flat-square&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Release Notes&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://opensource.org/licenses/MIT&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/license-MIT-orange&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;PyPI - License&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://pypistats.org/packages/langflow&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/pypi/dm/langflow?style=flat-square&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;PyPI - Downloads&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://star-history.com/#langflow-ai/langflow&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/github/stars/langflow-ai/langflow?style=flat-square&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;GitHub star chart&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/langflow-ai/langflow/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/github/issues-raw/langflow-ai/langflow?style=flat-square&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Open Issues&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://twitter.com/langflow_ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/twitter/url/https/twitter.com/langflow-ai.svg?style=social&amp;amp;label=Follow%20%40Langflow&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Twitter&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://www.youtube.com/@Langflow&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/youtube/channel/subscribers/UCn2bInQrjdDYKEEmbpwblLQ?label=Subscribe&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;YouTube Channel&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://discord.gg/EqksyE2EX9&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/discord/1116803230643527710?logo=discord&amp;amp;style=social&amp;amp;label=Join&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Discord Server&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://langflow.org&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Langflow&lt;/a&gt; is a powerful tool for building and deploying AI-powered agents and workflows. It provides developers with both a visual authoring experience and a built-in API server that turns every agent into an API endpoint that can be integrated into applications built on any framework or stack. Langflow comes with batteries included and supports all major LLMs, vector databases and a growing library of AI tools.&lt;/p&gt;
&lt;h2 id=&#34;-highlight-features&#34;&gt;✨ Highlight features
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Visual Builder&lt;/strong&gt; to get started quickly and iterate.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Access to Code&lt;/strong&gt; so developers can tweak any component using Python.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Playground&lt;/strong&gt; to immediately test and iterate on their flows with step-by-step control.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-agent&lt;/strong&gt; orchestration and conversation management and retrieval.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deploy as an API&lt;/strong&gt; or export as JSON for Python apps.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Observability&lt;/strong&gt; with LangSmith, LangFuse and other integrations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enterprise-ready&lt;/strong&gt; security and scalability.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;-quickstart&#34;&gt;⚡️ Quickstart
&lt;/h2&gt;&lt;p&gt;Langflow works with Python 3.10 to 3.13.&lt;/p&gt;
&lt;p&gt;Install with uv &lt;strong&gt;(recommended)&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;uv pip install langflow
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Install with pip&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install langflow
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;-deployment&#34;&gt;📦 Deployment
&lt;/h2&gt;&lt;h3 id=&#34;self-managed&#34;&gt;Self-managed
&lt;/h3&gt;&lt;p&gt;Langflow is completely open source and you can deploy it to all major deployment clouds. Follow this &lt;a class=&#34;link&#34; href=&#34;https://docs.langflow.org/deployment-docker&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;guide&lt;/a&gt; to learn how to use Docker to deploy Langflow.&lt;/p&gt;
&lt;h3 id=&#34;fully-managed-by-datastax&#34;&gt;Fully-managed by DataStax
&lt;/h3&gt;&lt;p&gt;DataStax Langflow is a full-managed environment with zero setup. Developers can &lt;a class=&#34;link&#34; href=&#34;https://astra.datastax.com/signup?type=langflow&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;sign up for a free account&lt;/a&gt; to get started.&lt;/p&gt;
&lt;h2 id=&#34;-stay-up-to-date&#34;&gt;⭐ Stay up-to-date
&lt;/h2&gt;&lt;p&gt;Star Langflow on GitHub to be instantly notified of new releases.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://github.com/user-attachments/assets/03168b17-a11d-4b2a-b0f7-c1cce69e5a2c&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Star Langflow&#34;
	
	
&gt;&lt;/p&gt;
&lt;h2 id=&#34;-contribute&#34;&gt;👋 Contribute
&lt;/h2&gt;&lt;p&gt;We welcome contributions from developers of all levels. If you&amp;rsquo;d like to contribute, please check our &lt;a class=&#34;link&#34; href=&#34;./CONTRIBUTING.md&#34; &gt;contributing guidelines&lt;/a&gt; and help make Langflow more accessible.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://star-history.com/#langflow-ai/langflow&amp;amp;Date&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://api.star-history.com/svg?repos=langflow-ai/langflow&amp;amp;type=Timeline&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Star History Chart&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;-contributors&#34;&gt;❤️ Contributors
&lt;/h2&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/langflow-ai/langflow/graphs/contributors&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://contrib.rocks/image?repo=langflow-ai/langflow&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;langflow contributors&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
</description>
        </item>
        <item>
        <title>hardhat</title>
        <link>https://producthunt.programnotes.cn/en/p/hardhat/</link>
        <pubDate>Thu, 29 May 2025 15:29:27 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/hardhat/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1661692410737-2804d762237d?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NDg1MDM3MDJ8&amp;ixlib=rb-4.1.0" alt="Featured image of post hardhat" /&gt;&lt;h1 id=&#34;nomicfoundationhardhat&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/NomicFoundation/hardhat&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NomicFoundation/hardhat&lt;/a&gt;
&lt;/h1&gt;&lt;p&gt;packages/hardhat-core/README.md&lt;/p&gt;
</description>
        </item>
        <item>
        <title>1Panel</title>
        <link>https://producthunt.programnotes.cn/en/p/1panel/</link>
        <pubDate>Mon, 21 Apr 2025 15:28:34 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/1panel/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1631028353342-9c573a9bc957?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NDUyMjA0NjF8&amp;ixlib=rb-4.0.3" alt="Featured image of post 1Panel" /&gt;&lt;h1 id=&#34;1panel-dev1panel&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/1Panel-dev/1Panel&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;1Panel-dev/1Panel&lt;/a&gt;
&lt;/h1&gt;&lt;p align=&#34;center&#34;&gt;&lt;a href=&#34;https://1panel.pro&#34;&gt;&lt;img src=&#34;https://resource.1panel.pro/img/1panel-logo.png&#34; alt=&#34;1Panel&#34; width=&#34;300&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;&lt;b&gt;Top-Rated Web-based Linux Server Management Tool&lt;/b&gt;&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
  &lt;a href=&#34;https://trendshift.io/repositories/2462&#34; target=&#34;_blank&#34;&gt;&lt;img src=&#34;https://trendshift.io/api/badge/repositories/2462&#34; alt=&#34;1Panel-dev%2F1Panel | Trendshift&#34; style=&#34;width: 240px; height: auto;&#34; /&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
  &lt;a href=&#34;https://www.gnu.org/licenses/gpl-3.0.html&#34;&gt;&lt;img src=&#34;https://shields.io/github/license/1Panel-dev/1Panel?color=%231890FF&#34; alt=&#34;License: GPL v3&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://app.codacy.com/gh/1Panel-dev/1Panel?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=1Panel-dev/1Panel&amp;utm_campaign=Badge_Grade_Dashboard&#34;&gt;&lt;img src=&#34;https://app.codacy.com/project/badge/Grade/da67574fd82b473992781d1386b937ef&#34; alt=&#34;Codacy&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://discord.gg/bUpUqWqdRr&#34; target=&#34;_blank&#34;&gt;
        &lt;img src=&#34;https://img.shields.io/discord/1318846410149335080?logo=discord&amp;labelColor=%20%235462eb&amp;logoColor=%20%23f5f5f5&amp;color=%20%235462eb&#34;
            alt=&#34;chat on Discord&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://github.com/1Panel-dev/1Panel/releases&#34;&gt;&lt;img src=&#34;https://img.shields.io/github/v/release/1Panel-dev/1Panel&#34; alt=&#34;GitHub release&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://github.com/1Panel-dev/1Panel&#34;&gt;&lt;img src=&#34;https://img.shields.io/github/stars/1Panel-dev/1Panel?color=%231890FF&amp;style=flat-square&#34; alt=&#34;Stars&#34;&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/README.md&#34;&gt;&lt;img alt=&#34;English&#34; src=&#34;https://img.shields.io/badge/English-d9d9d9&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/docs/README.zh-Hans.md&#34;&gt;&lt;img alt=&#34;中文(简体)&#34; src=&#34;https://img.shields.io/badge/中文(简体)-d9d9d9&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/docs/README.ja.md&#34;&gt;&lt;img alt=&#34;日本語&#34; src=&#34;https://img.shields.io/badge/日本語-d9d9d9&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/docs/README.pt-br.md&#34;&gt;&lt;img alt=&#34;Português (Brasil)&#34; src=&#34;https://img.shields.io/badge/Português (Brasil)-d9d9d9&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/docs/README.ar.md&#34;&gt;&lt;img alt=&#34;العربية&#34; src=&#34;https://img.shields.io/badge/العربية-d9d9d9&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/docs/README.de.md&#34;&gt;&lt;img alt=&#34;Deutsch&#34; src=&#34;https://img.shields.io/badge/Deutsch-d9d9d9&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/docs/README.es.md&#34;&gt;&lt;img alt=&#34;Español&#34; src=&#34;https://img.shields.io/badge/Español-d9d9d9&#34;&gt;&lt;/a&gt;&lt;br&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/docs/README.fr.md&#34;&gt;&lt;img alt=&#34;français&#34; src=&#34;https://img.shields.io/badge/français-d9d9d9&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/docs/README.ko.md&#34;&gt;&lt;img alt=&#34;한국어&#34; src=&#34;https://img.shields.io/badge/한국어-d9d9d9&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/docs/README.id.md&#34;&gt;&lt;img alt=&#34;Bahasa Indonesia&#34; src=&#34;https://img.shields.io/badge/Bahasa Indonesia-d9d9d9&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/docs/README.zh-Hant.md&#34;&gt;&lt;img alt=&#34;中文(繁體)&#34; src=&#34;https://img.shields.io/badge/中文(繁體)-d9d9d9&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/docs/README.tr.md&#34;&gt;&lt;img alt=&#34;Türkçe&#34; src=&#34;https://img.shields.io/badge/Türkçe-d9d9d9&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/docs/README.ru.md&#34;&gt;&lt;img alt=&#34;Русский&#34; src=&#34;https://img.shields.io/badge/%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9-d9d9d9&#34;&gt;&lt;/a&gt;
  &lt;a href=&#34;https://producthunt.programnotes.cn/docs/README.ms.md&#34;&gt;&lt;img alt=&#34;Bahasa Melayu&#34; src=&#34;https://img.shields.io/badge/Bahasa Melayu-d9d9d9&#34;&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;1Panel provides an intuitive web interface and MCP Server to manage websites, files, containers, databases, and LLMs on a Linux server.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Efficient Management&lt;/strong&gt;: Through a user-friendly web graphical interface, 1Panel enables users to effortlessly manage their Linux servers. Key features include host monitoring, file management, database administration, container management, LLMs management.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rapid Website Deployment&lt;/strong&gt;: With deep integration of the popular open-source website building software WordPress, 1Panel streamlines the process of domain binding and SSL certificate configuration, all achievable with just one click.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Application Store&lt;/strong&gt;: 1Panel curates a wide range of high-quality open-source tools and applications, facilitating easy installation and updates for its users.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Security and Reliability&lt;/strong&gt;: By leveraging containerization and secure application deployment practices, 1Panel minimizes vulnerability exposure. It further enhances security through integrated firewall management and log auditing capabilities.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;One-Click Backup &amp;amp; Restore&lt;/strong&gt;: Data protection is made simple with 1Panel&amp;rsquo;s one-click backup and restore functionality, supporting various cloud storage solutions to ensure data integrity and availability.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MCP Server&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://github.com/1Panel-dev/mcp-1panel&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;1Panel MCP Server&lt;/a&gt; allow user to execute server operations via natural language.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;quick-start&#34;&gt;Quick Start
&lt;/h2&gt;&lt;p&gt;Execute the script below and follow the prompts to install 1Panel:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -sSL https://resource.1panel.pro/quick_start.sh -o quick_start.sh &lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; bash quick_start.sh
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Please refer to our &lt;a class=&#34;link&#34; href=&#34;https://docs.1panel.pro/quick_start/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;documentation&lt;/a&gt; for more details.&lt;/p&gt;
&lt;p&gt;中国用户请使用这个 &lt;a class=&#34;link&#34; href=&#34;https://1panel.cn/docs/installation/online_installation/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;安装脚本&lt;/a&gt;，其应用数量比国际版本更丰富。&lt;/p&gt;
&lt;h2 id=&#34;screenshot&#34;&gt;Screenshot
&lt;/h2&gt;&lt;p&gt;&lt;img src=&#34;https://resource.1panel.pro/img/1panel.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;UI Display&#34;
	
	
&gt;&lt;/p&gt;
&lt;h2 id=&#34;star-history&#34;&gt;Star History
&lt;/h2&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://star-history.com/#1Panel-dev/1Panel&amp;amp;Date&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://api.star-history.com/svg?repos=1Panel-dev/1Panel&amp;amp;type=Date&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Star History Chart&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;pro-edition&#34;&gt;Pro Edition
&lt;/h2&gt;&lt;p&gt;Compared to the OSS Edition, 1Panel Pro Edition provides users with a wealth of enhanced features and technical support services. Enhanced features include WAF enhancement, Website monitoring, Mobile APP, custom logo and theme, etc.&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://1panel.pro/pricing&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Click to see Pro Edition details&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;security-information&#34;&gt;Security Information
&lt;/h2&gt;&lt;p&gt;If you discover any security issues, please refer to &lt;a class=&#34;link&#34; href=&#34;https://producthunt.programnotes.cn/SECURITY.md&#34; &gt;SECURITY.md&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;license&#34;&gt;License
&lt;/h2&gt;&lt;p&gt;Licensed under The GNU General Public License version 3 (GPLv3)  (the &amp;ldquo;License&amp;rdquo;); you may not use this file except in compliance with the License. You may obtain a copy of the License at&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.gnu.org/licenses/gpl-3.0.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://www.gnu.org/licenses/gpl-3.0.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an &amp;ldquo;AS IS&amp;rdquo; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
