<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>TTS on Producthunt daily</title>
        <link>https://producthunt.programnotes.cn/en/tags/tts/</link>
        <description>Recent content in TTS on Producthunt daily</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Tue, 21 Oct 2025 07:30:35 +0000</lastBuildDate><atom:link href="https://producthunt.programnotes.cn/en/tags/tts/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Product Hunt Daily | 2025-10-21</title>
        <link>https://producthunt.programnotes.cn/en/p/product-hunt-daily-2025-10-21/</link>
        <pubDate>Tue, 21 Oct 2025 07:30:35 +0000</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/product-hunt-daily-2025-10-21/</guid>
        <description>&lt;img src="https://ph-files.imgix.net/3092ad8c-69f9-4198-b0c6-4e148cd1bb66.png?auto=format" alt="Featured image of post Product Hunt Daily | 2025-10-21" /&gt;&lt;h2 id=&#34;1-fish-audio-s1&#34;&gt;1. Fish Audio S1
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Expressive Voice Cloning and Text-to-Speech&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Fish Audio S1 is the most expressive and emotionally rich TTS model—creating lifelike voices that capture emotion, rhythm, and nuance. Clone any voice in 10 seconds, preserving accent, tone, and speaking habits with unmatched realism.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/RXO5YOK7ZBZYFG?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/fish-speech?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/3092ad8c-69f9-4198-b0c6-4e148cd1bb66.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Fish Audio S1&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Voice cloning, text-to-speech, TTS, expressive, lifelike voices, emotion, rhythm, nuance, accent, tone, realism, Fish Audio S1&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺413&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;2-replymer&#34;&gt;2. Replymer
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Human replies that sell your product&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Replymer helps your brand grow through authentic, human‑written replies that recommend your product in the right conversations.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/ATCTFUFRUDRMHA?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/replymer?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/4f07fe8a-bb07-4ee8-8060-c848711686e8.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Replymer&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Human replies, product recommendations, brand growth, authentic replies, social selling, conversation marketing&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺379&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;3-logic-inc&#34;&gt;3. Logic, Inc.
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Automate recurring decisions in plain English&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Logic automates recurring decisions and reviews. Write your process once in plain English, and automate it anywhere. From content moderation to invoice processing, Logic lets you deploy in minutes, not months.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/LKORSMXKRP6577?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/logic-effortless-operational-magic?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/3b9c2e5e-f9f3-4746-8354-40d798608a71.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Logic, Inc.&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Automation, Decisions, Plain English, Process Automation, No-Code, Content Moderation, Invoice Processing, Deploy Quickly&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺291&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;4-voice-gecko&#34;&gt;4. Voice Gecko
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Voice dictation at your fingertips—type less, say more.&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Instant dictation for desktop. Press a shortcut, speak, and instantly get accurate text on your clipboard—perfect for emails, coding, AI prompts, or brain dumps.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/IN6NQQFFTBMSWU?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/voice-gecko?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/5b1a074d-221e-4952-aa01-ae53fb806e3e.jpeg?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Voice Gecko&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: voice dictation, dictation software, voice to text, speech to text, clipboard, desktop, productivity, typing, shortcut, AI prompts, brain dump, voice input&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺237&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;5-simplora&#34;&gt;5. Simplora
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Meetings that make you smarter, not confused&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Never feel lost in a meeting again! Simplora turns every conversation into a unique learning experience, in real time and beyond. Available wherever you meet. No download required. Get started for free.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/EGCNW2QYNQ52JB?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/simplora?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/8d6520ce-6029-468d-a074-d99967a9dccc.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Simplora&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Meetings, learning, real-time, no download, free, smarter, confusion, conversation, Simplora&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺188&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;6-diny&#34;&gt;6. diny
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: From git diff to clean commits&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: diny automates commit messages from your staged changes. Clean, consistent, conventional. Includes a timeline view of past commits to keep your history crystal clear.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/GCRJHKK2B3RWT5?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/diny?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/539d4587-2480-44fe-9e56-e972a86a8945.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;diny&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: git commits, commit messages, automation, git diff, clean commits, conventional commits, commit history, timeline view, developer tools&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺156&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;7-pylon&#34;&gt;7. Pylon
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: The support platform built for B2B&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: AI-Native support platform built for B2B companies. One tool for your ticketing, chat, knowledge base, AI support, account intelligence, and more.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/XWNWAI7CGNNFJB?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/pylon-4?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/b4955206-9405-4fbb-a55b-28ae15e6a5e5.jpeg?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Pylon&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: B2B support, AI support, ticketing, chat, knowledge base, account intelligence, support platform, AI-native&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺138&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;8-app2dev&#34;&gt;8. App2.dev
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Turn ideas &amp;amp; Figma designs into complete web &amp;amp; mobile apps&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Turn your ideas &amp;amp; Figma designs into web &amp;amp; mobile apps in minutes with backend, database, and authentication - all powered by AI.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/JWOK7RUANFXLZY?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/app2-dev?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/4303b056-9478-4598-8b41-cfb83162495c.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;App2.dev&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: App development, Figma to app, web app, mobile app, AI, no-code, backend, database, authentication, rapid development&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺114&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;9-aden-ai&#34;&gt;9. Aden AI
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Turn any file into a chatbot course &amp;amp; get certified with AI&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: We built the Aden Training Agent, which transforms any file or manual into an interactive AI course for workforce training or certification. Try our Mindfulness Agent, which teaches focus under pressure, or upload your own file to create a smart, adaptive course.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/H4JP7VGQUIIIDS?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/ai-powered-form-that-fills-itself?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/efb99060-26a3-47fb-8cad-558e3118c08d.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Aden AI&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI chatbot course, file to course, workforce training, AI certification, adaptive learning, Mindfulness Agent, training agent, smart course&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺104&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;10-vibeonly&#34;&gt;10. VibeOnly
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Helping companies screen and hire AI-fluent employees&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Everyone says &amp;ldquo;AI won’t take your job. People who use it will.&amp;rdquo; VibeOnly helps you hire those people. It’s a test that shows who genuinely knows how to use AI tools well. Perfect for founders and hiring managers who want elite AI-fluent talent.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/7LP6JGJC5IXNPT?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/vibeonly?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/ec9cc838-b83d-4c33-995b-fc03c39ec778.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;VibeOnly&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI hiring, AI fluency, employee screening, AI talent, hiring, AI tools, VibeOnly, talent acquisition&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺100&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
</description>
        </item>
        <item>
        <title>abogen</title>
        <link>https://producthunt.programnotes.cn/en/p/abogen/</link>
        <pubDate>Tue, 02 Sep 2025 15:30:10 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/abogen/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1642970047680-c940bb0bcf03?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTY3OTgwOTd8&amp;ixlib=rb-4.1.0" alt="Featured image of post abogen" /&gt;&lt;h1 id=&#34;denizsafakabogen&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;denizsafak/abogen&lt;/a&gt;
&lt;/h1&gt;&lt;h1 id=&#34;abogen&#34;&gt;abogen &lt;img width=&#34;40px&#34; title=&#34;abogen icon&#34; src=&#34;https://raw.githubusercontent.com/denizsafak/abogen/refs/heads/main/abogen/assets/icon.ico&#34; align=&#34;right&#34; style=&#34;padding-left: 10px; padding-top:5px;&#34;&gt;
&lt;/h1&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/actions&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://github.com/denizsafak/abogen/actions/workflows/test_pip.yml/badge.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Build Status&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/releases/latest&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/github/v/release/denizsafak/abogen&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;GitHub Release&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://pypi.org/project/abogen/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/pypi/pyversions/abogen&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Abogen PyPi Python Versions&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/releases/latest&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/os-windows%20%7C%20linux%20%7C%20macos%20-blue&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Operating Systems&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/psf/black&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/code%20style-black-000000.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Code style: black&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://opensource.org/licenses/MIT&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/License-MIT-maroon.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;License: MIT&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Abogen is a text-to-speech conversion tool that quickly turns ePub, PDF, or plain-text files into natural-sounding audio with matching subtitles, using &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/hexgrad/Kokoro-82M&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Kokoro-82M&lt;/a&gt; as its speech model. Use it for audiobooks, voiceovers for Instagram, YouTube, or TikTok, or any project that needs natural-sounding narration.&lt;/p&gt;
&lt;p&gt;&lt;img title=&#34;Abogen Main&#34; src=&#39;https://raw.githubusercontent.com/denizsafak/abogen/refs/heads/main/demo/abogen.png&#39; width=&#34;380&#34;&gt; &lt;img title=&#34;Abogen Processing&#34; src=&#39;https://raw.githubusercontent.com/denizsafak/abogen/refs/heads/main/demo/abogen2.png&#39; width=&#34;380&#34;&gt;&lt;/p&gt;
&lt;h2 id=&#34;demo&#34;&gt;Demo
&lt;/h2&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/user-attachments/assets/094ba3df-7d66-494a-bc31-0e4b41d0b865&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/user-attachments/assets/094ba3df-7d66-494a-bc31-0e4b41d0b865&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This demo was generated in just 5 seconds, producing about 1 minute of audio with synchronized subtitles. To create a similar video, see &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/tree/main/demo&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;the demo guide&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;how-to-install&#34;&gt;&lt;code&gt;How to install?&lt;/code&gt; &lt;a href=&#34;https://pypi.org/project/abogen/&#34; target=&#34;_blank&#34;&gt;&lt;img src=&#34;https://img.shields.io/pypi/pyversions/abogen&#34; alt=&#34;Abogen Compatible PyPi Python Versions&#34; align=&#34;right&#34; style=&#34;margin-top:6px;&#34;&gt;&lt;/a&gt;
&lt;/h2&gt;&lt;h3 id=&#34;windows&#34;&gt;Windows
&lt;/h3&gt;&lt;p&gt;First, download and run the &lt;code&gt;*.msi&lt;/code&gt; installer from the &lt;a class=&#34;link&#34; href=&#34;https://github.com/espeak-ng/espeak-ng/releases/latest&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;latest espeak-ng release&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&#34;option-1-install-using-script&#34;&gt;OPTION 1: Install using script
&lt;/h4&gt;&lt;ol&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/archive/refs/heads/main.zip&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Download&lt;/a&gt; the repository&lt;/li&gt;
&lt;li&gt;Extract the ZIP file&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;WINDOWS_INSTALL.bat&lt;/code&gt; by double-clicking it&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This method handles everything automatically: it installs all dependencies, including CUDA, in a self-contained environment without requiring a separate Python installation. (You still need to install &lt;a class=&#34;link&#34; href=&#34;https://github.com/espeak-ng/espeak-ng/releases/latest&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;espeak-ng&lt;/a&gt;.)&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; You don&amp;rsquo;t need to install Python separately; the script installs it automatically.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 id=&#34;option-2-install-using-pip&#34;&gt;OPTION 2: Install using pip
&lt;/h4&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Create a virtual environment (optional)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;mkdir abogen &lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; abogen
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python -m venv venv
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;venv&lt;span class=&#34;se&#34;&gt;\S&lt;/span&gt;cripts&lt;span class=&#34;se&#34;&gt;\a&lt;/span&gt;ctivate
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# For NVIDIA GPUs:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# For AMD GPUs:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Not supported yet, because ROCm is not available on Windows. Use Linux if you have an AMD GPU.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Install abogen&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install abogen
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;mac&#34;&gt;Mac
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Install espeak-ng&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;brew install espeak-ng
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Create a virtual environment (recommended)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;mkdir abogen &lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; abogen
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python3 -m venv venv
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;source&lt;/span&gt; venv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Install abogen&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip3 install abogen
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# For Apple Silicon Macs (M1, M2, etc.):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# after installing abogen, install Kokoro&amp;#39;s development version, which includes MPS support.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip3 install git+https://github.com/hexgrad/kokoro.git
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;linux&#34;&gt;Linux
&lt;/h3&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Install espeak-ng (run only the command for your distribution)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt install espeak-ng &lt;span class=&#34;c1&#34;&gt;# Ubuntu/Debian&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo pacman -S espeak-ng &lt;span class=&#34;c1&#34;&gt;# Arch Linux&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo dnf install espeak-ng &lt;span class=&#34;c1&#34;&gt;# Fedora&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Create a virtual environment (recommended)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;mkdir abogen &lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; abogen
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python3 -m venv venv
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;source&lt;/span&gt; venv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Install abogen&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip3 install abogen
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# For NVIDIA GPUs:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Already supported, no need to install CUDA separately.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# For AMD GPUs:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# After installing abogen, replace the default torch package with the ROCm build:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip3 uninstall torch 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;[!TIP]
If you see the warning &lt;code&gt;WARNING: The script abogen-cli is installed in &#39;/home/username/.local/bin&#39; which is not on PATH.&lt;/code&gt;, run the following command to add that directory to your PATH (this assumes bash; for zsh, use &lt;code&gt;~/.zshrc&lt;/code&gt; instead):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;echo&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;export PATH=\&amp;#34;/home/&lt;/span&gt;&lt;span class=&#34;nv&#34;&gt;$USER&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;/.local/bin:\$PATH\&amp;#34;&amp;#34;&lt;/span&gt; &amp;gt;&amp;gt; ~/.bashrc &lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&#34;nb&#34;&gt;source&lt;/span&gt; ~/.bashrc
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/blockquote&gt;
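&lt;p&gt;The command above appends a new line to &lt;code&gt;~/.bashrc&lt;/code&gt; every time it runs. A slightly safer variant, sketched below as a hypothetical helper (&lt;code&gt;add_local_bin_to_path&lt;/code&gt; is not part of Abogen), first checks whether &lt;code&gt;~/.local/bin&lt;/code&gt; is already on your PATH. It assumes bash and &lt;code&gt;~/.bashrc&lt;/code&gt;; zsh users would pass &lt;code&gt;~/.zshrc&lt;/code&gt; instead.&lt;/p&gt;

```bash
# Hypothetical helper: add ~/.local/bin to PATH only if it is missing.
# Assumes bash; pass a different rc file (e.g. ~/.zshrc) as $1 if needed.
add_local_bin_to_path() {
  local rc_file="${1:-$HOME/.bashrc}"
  local bin_dir="$HOME/.local/bin"
  case ":$PATH:" in
    *":$bin_dir:"*)
      echo "$bin_dir is already on PATH"
      ;;
    *)
      # Single quotes keep $HOME and $PATH unexpanded until the rc file is read.
      echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$rc_file"
      # Also update the current shell so abogen works without reopening it.
      export PATH="$bin_dir:$PATH"
      echo "Added $bin_dir to PATH in $rc_file"
      ;;
  esac
}
```

&lt;p&gt;Running it twice leaves a single &lt;code&gt;export&lt;/code&gt; line in the rc file, because the second call finds the directory already on PATH.&lt;/p&gt;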
&lt;blockquote&gt;
&lt;p&gt;[!TIP]
If you get a &amp;ldquo;No matching distribution found&amp;rdquo; error, install Abogen with a supported Python version (3.10 to 3.12). On Linux, &lt;a class=&#34;link&#34; href=&#34;https://github.com/pyenv/pyenv&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;pyenv&lt;/a&gt; makes it easy to manage multiple Python versions. Watch this &lt;a class=&#34;link&#34; href=&#34;https://www.youtube.com/watch?v=MVyb-nI4KyI&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;video&lt;/a&gt; by NetworkChuck for a quick guide.&lt;/p&gt;
&lt;/blockquote&gt;
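&lt;p&gt;To confirm that an interpreter falls in the supported range before running &lt;code&gt;pip3 install abogen&lt;/code&gt;, a quick check along these lines can help. The function name is hypothetical, and the 3.10 to 3.12 bounds are taken from the tip above; adjust them if the supported range changes.&lt;/p&gt;

```bash
# Hypothetical helper: report whether a Python interpreter is in the
# supported range (3.10 to 3.12, per the tip above).
check_python_version() {
  local py="${1:-python3}"
  local version
  # Ask the interpreter for its major.minor version, e.g. "3.11".
  version=$("$py" -c 'import sys; print("%d.%d" % sys.version_info[:2])') || return 2
  case "$version" in
    3.10|3.11|3.12)
      echo "Python $version is supported"
      ;;
    *)
      echo "Python $version is not supported; install 3.10 to 3.12 (e.g. with pyenv)"
      return 1
      ;;
  esac
}
```

&lt;p&gt;For example, &lt;code&gt;check_python_version python3.12&lt;/code&gt; checks a specific interpreter, and the non-zero return code makes it usable in scripts.&lt;/p&gt;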
&lt;blockquote&gt;
&lt;p&gt;Special thanks to &lt;a class=&#34;link&#34; href=&#34;https://github.com/hg000125&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;@hg000125&lt;/a&gt; for his contribution in &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/issues/23&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;#23&lt;/a&gt;. AMD GPU support is possible thanks to his work.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;how-to-run&#34;&gt;&lt;code&gt;How to run?&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;If you installed using pip, you can simply run the following command to start Abogen:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;abogen
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;blockquote&gt;
&lt;p&gt;[!TIP]
If you installed Abogen with the Windows installer (&lt;code&gt;WINDOWS_INSTALL.bat&lt;/code&gt;), it should have created a shortcut in the same folder or on your desktop; you can launch Abogen from there. If you lose the shortcut, run &lt;code&gt;python_embedded/Scripts/abogen.exe&lt;/code&gt; directly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;how-to-use&#34;&gt;&lt;code&gt;How to use?&lt;/code&gt;
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;Drag and drop any ePub, PDF, or text file (or use the built-in text editor)&lt;/li&gt;
&lt;li&gt;Configure the settings:
&lt;ul&gt;
&lt;li&gt;Set speech speed&lt;/li&gt;
&lt;li&gt;Select a voice (or create a custom voice using the voice mixer)&lt;/li&gt;
&lt;li&gt;Select subtitle generation style (by sentence, word, etc.)&lt;/li&gt;
&lt;li&gt;Select output format&lt;/li&gt;
&lt;li&gt;Select where to save the output&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Hit Start&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;in-action&#34;&gt;&lt;code&gt;In action&lt;/code&gt;
&lt;/h2&gt;&lt;img title=&#34;Abogen in action&#34; src=&#39;https://raw.githubusercontent.com/denizsafak/abogen/refs/heads/main/demo/abogen.gif&#39;&gt; 
&lt;p&gt;Here’s Abogen in action: in this demo, it converts about 3,000 characters of text into 3 minutes and 28 seconds of audio in just 11 seconds, on my low-end &lt;strong&gt;RTX 2060 Mobile laptop GPU&lt;/strong&gt;. Your results may vary depending on your hardware.&lt;/p&gt;
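&lt;p&gt;Put another way, 3 minutes and 28 seconds is 208 seconds of audio, so generating it in 11 seconds is roughly 19 times faster than real time. The small helper below (a hypothetical name, not an Abogen command) does the same arithmetic for your own runs.&lt;/p&gt;

```bash
# Hypothetical helper: real-time factor = seconds of audio / seconds of processing.
realtime_factor() {
  local audio_seconds=$1 processing_seconds=$2
  awk -v a="$audio_seconds" -v p="$processing_seconds" 'BEGIN { printf "%.1f\n", a / p }'
}

# Demo numbers from above: 3 min 28 s of audio produced in 11 s.
realtime_factor $((3 * 60 + 28)) 11
```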
&lt;h2 id=&#34;configuration&#34;&gt;&lt;code&gt;Configuration&lt;/code&gt;
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Options&lt;/th&gt;
          &lt;th&gt;Description&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Input Box&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Drag and drop &lt;code&gt;ePub&lt;/code&gt;, &lt;code&gt;PDF&lt;/code&gt;, or &lt;code&gt;.TXT&lt;/code&gt; files (or use the built-in text editor)&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Queue options&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Add multiple files to a queue and process them in batch, with individual settings for each file. See &lt;a class=&#34;link&#34; href=&#34;#queue-mode&#34; &gt;Queue mode&lt;/a&gt; for more details.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Adjust speech rate from &lt;code&gt;0.1x&lt;/code&gt; to &lt;code&gt;2.0x&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Select Voice&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Voice names encode language and gender: the first letter is the language code (e.g., &lt;code&gt;a&lt;/code&gt; for American English, &lt;code&gt;b&lt;/code&gt; for British English), and the second letter is &lt;code&gt;m&lt;/code&gt; for male or &lt;code&gt;f&lt;/code&gt; for female.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Voice mixer&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Create custom voices by mixing different voice models with a profile system. See &lt;a class=&#34;link&#34; href=&#34;#voice-mixer&#34; &gt;Voice Mixer&lt;/a&gt; for more details.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Voice preview&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Listen to the selected voice before processing.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Generate subtitles&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;Disabled&lt;/code&gt;, &lt;code&gt;Sentence&lt;/code&gt;, &lt;code&gt;Sentence + Comma&lt;/code&gt;, &lt;code&gt;Sentence + Highlighting&lt;/code&gt;, &lt;code&gt;1 word&lt;/code&gt;, &lt;code&gt;2 words&lt;/code&gt;, &lt;code&gt;3 words&lt;/code&gt;, etc. (the word options set how many words appear in each subtitle entry)&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Output voice format&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;.WAV&lt;/code&gt;, &lt;code&gt;.FLAC&lt;/code&gt;, &lt;code&gt;.MP3&lt;/code&gt;, &lt;code&gt;.OPUS (best compression)&lt;/code&gt; and &lt;code&gt;M4B (with chapters)&lt;/code&gt; (Special thanks to &lt;a class=&#34;link&#34; href=&#34;https://github.com/jborza&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;@jborza&lt;/a&gt; for chapter support in PR &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/pull/10&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;#10&lt;/a&gt;)&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Output subtitle format&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Configures the subtitle format as &lt;code&gt;SRT (standard)&lt;/code&gt;, &lt;code&gt;ASS (wide)&lt;/code&gt;, &lt;code&gt;ASS (narrow)&lt;/code&gt;, &lt;code&gt;ASS (centered wide)&lt;/code&gt;, or &lt;code&gt;ASS (centered narrow)&lt;/code&gt;.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Replace single newlines with spaces&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Replaces single newlines with spaces in the text. This is useful for hard-wrapped texts, where line breaks fall mid-sentence rather than between paragraphs.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Save location&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;Save next to input file&lt;/code&gt;, &lt;code&gt;Save to desktop&lt;/code&gt;, or &lt;code&gt;Choose output folder&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Book handler options&lt;/th&gt;
          &lt;th&gt;Description&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Chapter Control&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Select specific &lt;code&gt;chapters&lt;/code&gt; from ePUBs or &lt;code&gt;chapters + pages&lt;/code&gt; from PDFs.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Save each chapter separately&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Save each chapter in e-books as a separate audio file.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Create a merged version&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Create a single audio file that combines all chapters. (If &lt;code&gt;Save each chapter separately&lt;/code&gt; is disabled, this option will be the default behavior.)&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Save in a project folder with metadata&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Save the converted items in a project folder with available metadata files.&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Menu options&lt;/th&gt;
          &lt;th&gt;Description&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Theme&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Change the application&amp;rsquo;s theme using &lt;code&gt;System&lt;/code&gt;, &lt;code&gt;Light&lt;/code&gt;, or &lt;code&gt;Dark&lt;/code&gt; options.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Configure max words per subtitle&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Configures the maximum number of words per subtitle entry.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Configure max lines in log window&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Configures the maximum number of lines to display in the log window.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Separate chapters audio format&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Configures the audio format for separate chapters as &lt;code&gt;wav&lt;/code&gt;, &lt;code&gt;flac&lt;/code&gt;, &lt;code&gt;mp3&lt;/code&gt;, or &lt;code&gt;opus&lt;/code&gt;.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Create desktop shortcut&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Creates a shortcut on your desktop for easy access.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Open config directory&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Opens the directory where the configuration file is stored.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Open cache directory&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Opens the cache directory where converted text files are stored.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Clear cache files&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Deletes cache files created during the conversion or preview.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Check for updates at startup&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Automatically checks for updates when the program starts.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Disable Kokoro&amp;rsquo;s internet access&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Prevents Kokoro from downloading models or voices from HuggingFace Hub, useful for offline use.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Reset to default settings&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Resets all settings to their default values.&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;blockquote&gt;
&lt;p&gt;Special thanks to &lt;a class=&#34;link&#34; href=&#34;https://github.com/robmckinnon&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;@robmckinnon&lt;/a&gt; for adding the Sentence + Highlighting feature in PR &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/pull/65&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;#65&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;voice-mixer&#34;&gt;&lt;code&gt;Voice Mixer&lt;/code&gt;
&lt;/h2&gt;&lt;img title=&#34;Abogen Voice Mixer&#34; src=&#39;https://raw.githubusercontent.com/denizsafak/abogen/refs/heads/main/demo/voice_mixer.png&#39;&gt;
&lt;p&gt;With the voice mixer, you can create custom voices by blending different voice models: adjust the weight of each voice, then save the result as a profile for future use. (Huge thanks to &lt;a class=&#34;link&#34; href=&#34;https://github.com/jborza&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;@jborza&lt;/a&gt; for making this possible through his contributions in &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/pull/5&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;#5&lt;/a&gt;)&lt;/p&gt;
&lt;h2 id=&#34;queue-mode&#34;&gt;&lt;code&gt;Queue Mode&lt;/code&gt;
&lt;/h2&gt;&lt;img title=&#34;Abogen queue mode&#34; src=&#39;https://raw.githubusercontent.com/denizsafak/abogen/refs/heads/main/demo/queue.png&#39;&gt;
&lt;p&gt;Abogen supports &lt;strong&gt;queue mode&lt;/strong&gt;, allowing you to add multiple files to a processing queue. This is useful if you want to convert several files in one batch.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You can add text files (&lt;code&gt;.txt&lt;/code&gt;) directly using the &lt;strong&gt;Add files&lt;/strong&gt; button in the Queue Manager. To add PDF or EPUB files, use the input box in the main window and click the &lt;strong&gt;Add to Queue&lt;/strong&gt; button.&lt;/li&gt;
&lt;li&gt;Each file in the queue keeps the configuration settings that were active when it was added. Changing the main window configuration afterward does &lt;strong&gt;not&lt;/strong&gt; affect files already in the queue.&lt;/li&gt;
&lt;li&gt;You can view each file&amp;rsquo;s configuration by hovering over it.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Abogen will process each item in the queue automatically, saving outputs as configured.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Special thanks to &lt;a class=&#34;link&#34; href=&#34;https://github.com/jborza&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;@jborza&lt;/a&gt; for adding queue mode in PR &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/pull/35&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;#35&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;about-chapter-markers&#34;&gt;&lt;code&gt;About Chapter Markers&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;When you process ePUB or PDF files, Abogen converts them into text files stored in your cache directory. When you click &amp;ldquo;Edit,&amp;rdquo; you&amp;rsquo;re actually modifying these converted text files. In these text files, you&amp;rsquo;ll notice tags that look like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&amp;lt;&amp;lt;CHAPTER_MARKER:Chapter Title&amp;gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;These are chapter markers. They are added automatically when you process ePUB or PDF files, based on the chapters you select, and serve two purposes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Allow you to split the text into separate audio files for each chapter&lt;/li&gt;
&lt;li&gt;Save time by letting you reprocess only specific chapters if errors occur, rather than the entire file&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can manually add these markers to plain text files for the same benefits. Simply include them in your text like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&amp;lt;&amp;lt;CHAPTER_MARKER:Introduction&amp;gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;This is the beginning of my text...  
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&amp;lt;&amp;lt;CHAPTER_MARKER:Main Content&amp;gt;&amp;gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Here&amp;#39;s another part...  
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;When you process the text file, Abogen will detect these markers automatically and ask if you want to save each chapter separately and create a merged version.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://raw.githubusercontent.com/denizsafak/abogen/refs/heads/main/demo/chapter_marker.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Abogen Chapter Marker&#34;
	
	
&gt;&lt;/p&gt;
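&lt;p&gt;If you add markers by hand, it can help to check which chapter titles a file contains before converting it. The sketch below assumes one marker per line in the &lt;code&gt;CHAPTER_MARKER:Title&lt;/code&gt; form shown above; &lt;code&gt;list_chapters&lt;/code&gt; is a hypothetical helper, not part of Abogen, and only approximates Abogen&amp;rsquo;s own parsing.&lt;/p&gt;

```bash
# Hypothetical helper: print the title from every chapter marker in a text file.
# Assumes each marker sits on its own line, as in the examples above.
list_chapters() {
  awk '
    /CHAPTER_MARKER:/ {
      title = $0
      sub(/.*CHAPTER_MARKER:/, "", title)  # drop the opening brackets and tag name
      sub(/>>.*/, "", title)               # drop the closing brackets and trailing text
      print title
    }
  ' "$1"
}

# Example: list_chapters mybook.txt
```

&lt;p&gt;For the five-line example above, it would print &lt;code&gt;Introduction&lt;/code&gt; and &lt;code&gt;Main Content&lt;/code&gt;, one per line.&lt;/p&gt;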
&lt;h2 id=&#34;about-metadata-tags&#34;&gt;&lt;code&gt;About Metadata Tags&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;As with chapter markers, you can add metadata tags for &lt;code&gt;M4B&lt;/code&gt; files. Audiobook players that support metadata use them to display information such as title, author, and year. Abogen adds these tags automatically when you process ePUB or PDF files, but you can also add them to your text files manually. Place metadata tags &lt;strong&gt;at the beginning of your text file&lt;/strong&gt;, like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&amp;lt;&amp;lt;METADATA_TITLE:Title&amp;gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&amp;lt;&amp;lt;METADATA_ARTIST:Author&amp;gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&amp;lt;&amp;lt;METADATA_ALBUM:Album Title&amp;gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&amp;lt;&amp;lt;METADATA_YEAR:Year&amp;gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&amp;lt;&amp;lt;METADATA_ALBUM_ARTIST:Album Artist&amp;gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&amp;lt;&amp;lt;METADATA_COMPOSER:Narrator&amp;gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&amp;lt;&amp;lt;METADATA_GENRE:Audiobook&amp;gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;supported-languages&#34;&gt;&lt;code&gt;Supported Languages&lt;/code&gt;
&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# 🇺🇸 &amp;#39;a&amp;#39; =&amp;gt; American English, 🇬🇧 &amp;#39;b&amp;#39; =&amp;gt; British English
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# 🇪🇸 &amp;#39;e&amp;#39; =&amp;gt; Spanish es
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# 🇫🇷 &amp;#39;f&amp;#39; =&amp;gt; French fr-fr
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# 🇮🇳 &amp;#39;h&amp;#39; =&amp;gt; Hindi hi
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# 🇮🇹 &amp;#39;i&amp;#39; =&amp;gt; Italian it
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# 🇯🇵 &amp;#39;j&amp;#39; =&amp;gt; Japanese: pip install misaki[ja]
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# 🇧🇷 &amp;#39;p&amp;#39; =&amp;gt; Brazilian Portuguese pt-br
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# 🇨🇳 &amp;#39;z&amp;#39; =&amp;gt; Mandarin Chinese: pip install misaki[zh]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;For a complete list of supported languages and voices, refer to Kokoro&amp;rsquo;s &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;VOICES.md&lt;/a&gt;. To listen to sample audio outputs, see &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/hexgrad/Kokoro-82M/blob/main/SAMPLES.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;SAMPLES.md&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[!NOTE]
Japanese audio may require additional configuration. Please check &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/issues/56&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;#56&lt;/a&gt; for more information.&lt;/p&gt;
&lt;/blockquote&gt;
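&lt;p&gt;Combined with the voice naming rule from the Configuration table (first letter = language code, second letter = &lt;code&gt;m&lt;/code&gt; or &lt;code&gt;f&lt;/code&gt;), the language codes above are enough to decode a voice name. The sketch below is a hypothetical helper, not part of Abogen or Kokoro, and assumes Kokoro-style voice names such as &lt;code&gt;af_heart&lt;/code&gt;; check VOICES.md for the actual list of voices.&lt;/p&gt;

```bash
# Hypothetical helper: decode a Kokoro-style voice ID such as "af_heart".
# 1st letter = language code (from the list above), 2nd letter = gender (m/f).
describe_voice() {
  local id=$1 lang gender
  case ${id:0:1} in
    a) lang="American English" ;;
    b) lang="British English" ;;
    e) lang="Spanish" ;;
    f) lang="French" ;;
    h) lang="Hindi" ;;
    i) lang="Italian" ;;
    j) lang="Japanese" ;;
    p) lang="Brazilian Portuguese" ;;
    z) lang="Mandarin Chinese" ;;
    *) lang="unknown language" ;;
  esac
  case ${id:1:1} in
    f) gender="female" ;;
    m) gender="male" ;;
    *) gender="unknown gender" ;;
  esac
  echo "$lang, $gender"
}
```

&lt;p&gt;For example, &lt;code&gt;describe_voice af_heart&lt;/code&gt; prints &lt;code&gt;American English, female&lt;/code&gt;.&lt;/p&gt;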
&lt;h2 id=&#34;mpv-config&#34;&gt;&lt;code&gt;MPV Config&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;I highly recommend using &lt;a class=&#34;link&#34; href=&#34;https://mpv.io/installation/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;MPV&lt;/a&gt; to play your audio files, as it supports displaying subtitles even without a video track. Here&amp;rsquo;s my &lt;code&gt;mpv.conf&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# --- MPV Settings ---
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;save-position-on-quit
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;keep-open=yes
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# --- Subtitle ---
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sub-ass-override=no
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sub-margin-y=50
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sub-margin-x=50
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;# --- Audio Quality ---
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;audio-spdif=ac3,dts,eac3,truehd,dts-hd
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;audio-channels=auto
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;audio-samplerate=48000
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;volume-max=200
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;docker-guide&#34;&gt;&lt;code&gt;Docker Guide&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;If you want to run Abogen in a Docker container:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/archive/refs/heads/main.zip&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Download the repository&lt;/a&gt; and extract it, or clone it with git.&lt;/li&gt;
&lt;li&gt;Go to the &lt;code&gt;abogen&lt;/code&gt; folder; you should see a &lt;code&gt;Dockerfile&lt;/code&gt; there.&lt;/li&gt;
&lt;li&gt;Open your terminal in that directory and run the following commands:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Build the Docker image:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker build --progress plain -t abogen .
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Note that building the image may take a while.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# After building is complete, run the Docker container:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Windows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker run --name abogen -v %cd%:/shared -p 5800:5800 -p 5900:5900 --gpus all abogen
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Linux&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker run --name abogen -v &lt;span class=&#34;k&#34;&gt;$(&lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;pwd&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;)&lt;/span&gt;:/shared -p 5800:5800 -p 5900:5900 --gpus all abogen
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# MacOS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker run --name abogen -v &lt;span class=&#34;k&#34;&gt;$(&lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;pwd&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;)&lt;/span&gt;:/shared -p 5800:5800 -p 5900:5900 abogen
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# We expose port 5800 for use by a web browser, 5900 if you want to connect with a VNC client.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Abogen launches automatically inside the container.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You can access it via a web browser at &lt;a class=&#34;link&#34; href=&#34;http://localhost:5800&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;http://localhost:5800&lt;/a&gt; or connect to it using a VNC client at &lt;code&gt;localhost:5900&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;/shared&lt;/code&gt; directory inside the container is mapped to the directory you ran &lt;code&gt;docker run&lt;/code&gt; from, so you can use it to share files between your host and the container.&lt;/li&gt;
&lt;li&gt;After the first run, start the container with &lt;code&gt;docker start abogen&lt;/code&gt; and stop it with &lt;code&gt;docker stop abogen&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
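&lt;p&gt;The Linux and macOS run commands above differ only in whether &lt;code&gt;--gpus all&lt;/code&gt; is passed. The following is a minimal sketch, assuming a bash-compatible shell on Linux or macOS. It uses a hypothetical helper, &lt;code&gt;abogen_run_cmd&lt;/code&gt;, which is not part of Abogen. The helper prints the matching command so you can review it before running it. It does not cover the Windows &lt;code&gt;%cd%&lt;/code&gt; variant, and it assumes the shared folder path contains no spaces.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Abogen): prints, but does not run,
# the README's docker run command for a Linux or macOS host.
abogen_run_cmd() {
  local os="$1"                    # "linux" or "macos"
  local shared_dir="${2:-$PWD}"    # host folder mounted at /shared; defaults to the current directory
  local cmd="docker run --name abogen -v ${shared_dir}:/shared -p 5800:5800 -p 5900:5900"

  case "$os" in
    linux) cmd="$cmd --gpus all" ;;  # the README passes GPUs through on Linux
    macos) ;;                        # the README omits --gpus on macOS
    *) return 1 ;;                   # unsupported value: print nothing and signal failure
  esac

  echo "$cmd abogen"
}

# Usage: abogen_run_cmd linux "$HOME/books"
```

&lt;p&gt;Printing the command keeps the sketch harmless: nothing is built or started until you copy the command and run it yourself. Port 5800 serves the browser interface and 5900 serves the VNC endpoint, as in the original commands.&lt;/p&gt;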
&lt;p&gt;Known issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Audio preview does not work inside the container (ALSA error).&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Open cache directory&lt;/code&gt; and &lt;code&gt;Open configuration directory&lt;/code&gt; options in settings do not work. (pcmanfm was tried, but it did not work with Abogen.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(Special thanks to &lt;a class=&#34;link&#34; href=&#34;https://www.reddit.com/user/geo38/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;@geo38&lt;/a&gt; from Reddit, who provided the Dockerfile and instructions in &lt;a class=&#34;link&#34; href=&#34;https://www.reddit.com/r/selfhosted/comments/1k8x1yo/comment/mpe0bz8/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;this comment&lt;/a&gt;.)&lt;/p&gt;
&lt;h2 id=&#34;similar-projects&#34;&gt;&lt;code&gt;Similar Projects&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;Abogen is a standalone project, but it is inspired by and shares some similarities with other projects. Here are a few:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/santinic/audiblez&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;audiblez&lt;/a&gt;: Generate audiobooks from e-books. &lt;strong&gt;(Has CLI and GUI support)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/plusuncold/autiobooks&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;autiobooks&lt;/a&gt;: Automatically convert epubs to audiobooks&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/mateogon/pdf-narrator&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;pdf-narrator&lt;/a&gt;: Convert your PDFs and EPUBs into audiobooks effortlessly.&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/p0n1/epub_to_audiobook&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;epub_to_audiobook&lt;/a&gt;: EPUB to audiobook converter, optimized for Audiobookshelf&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobook&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ebook2audiobook&lt;/a&gt;: Convert ebooks to audiobooks with chapters and metadata using dynamic AI models and voice cloning&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;roadmap&#34;&gt;&lt;code&gt;Roadmap&lt;/code&gt;
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;input disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; Add OCR scan feature for PDF files using docling/tesseract.&lt;/li&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; Add chapter metadata for .m4a files. (Issue &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/issues/9&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;#9&lt;/a&gt;, PR &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/pull/10&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;#10&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;input disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; Add support for different languages in GUI.&lt;/li&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; Add voice formula feature that enables mixing different voice models. (Issue &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/issues/1&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;#1&lt;/a&gt;, PR &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/pull/5&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;#5&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;input disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; Add support for kokoro-onnx (If it&amp;rsquo;s necessary).&lt;/li&gt;
&lt;li&gt;&lt;input checked=&#34;&#34; disabled=&#34;&#34; type=&#34;checkbox&#34;&gt; Add dark mode.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;troubleshooting&#34;&gt;&lt;code&gt;Troubleshooting&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;If you encounter any issues while running Abogen, try launching it from the command line with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;abogen-cli
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This will start Abogen in command-line mode and display detailed error messages. Please open a new issue on the &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Issues&lt;/a&gt; page with the error message and a description of your problem.&lt;/p&gt;
&lt;h2 id=&#34;contributing&#34;&gt;&lt;code&gt;Contributing&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;I welcome contributions! If you have ideas for new features, improvements, or bug fixes, please fork the repository and submit a pull request.&lt;/p&gt;
&lt;h3 id=&#34;for-developers-and-contributors&#34;&gt;For developers and contributors
&lt;/h3&gt;&lt;p&gt;If you&amp;rsquo;d like to modify the code and contribute to development, you can &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/archive/refs/heads/main.zip&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;download the repository&lt;/a&gt;, extract it and run the following commands to build &lt;strong&gt;or&lt;/strong&gt; install the package:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Go to the directory where you extracted the repository and run:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install -e .      &lt;span class=&#34;c1&#34;&gt;# Installs the package in editable mode&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install build     &lt;span class=&#34;c1&#34;&gt;# Install the build package&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python -m build       &lt;span class=&#34;c1&#34;&gt;# Builds the package in dist folder (optional)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;abogen                &lt;span class=&#34;c1&#34;&gt;# Opens the GUI&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Feel free to explore the code and make any changes you like.&lt;/p&gt;
&lt;h2 id=&#34;credits&#34;&gt;&lt;code&gt;Credits&lt;/code&gt;
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Abogen uses &lt;a class=&#34;link&#34; href=&#34;https://github.com/hexgrad/kokoro&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Kokoro&lt;/a&gt; for its high-quality, natural-sounding text-to-speech synthesis. Huge thanks to the Kokoro team for making this possible.&lt;/li&gt;
&lt;li&gt;Thanks to &lt;a class=&#34;link&#34; href=&#34;https://github.com/wojiushixiaobai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;@wojiushixiaobai&lt;/a&gt; for &lt;a class=&#34;link&#34; href=&#34;https://github.com/wojiushixiaobai/Python-Embed-Win64&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Embedded Python&lt;/a&gt; packages. These modified packages include pip pre-installed, enabling Abogen to function as a standalone application on Windows without requiring users to install Python separately.&lt;/li&gt;
&lt;li&gt;Thanks to the creators of &lt;a class=&#34;link&#34; href=&#34;https://github.com/aerkalov/ebooklib&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;EbookLib&lt;/a&gt;, a Python library for reading and writing ePub files, which Abogen uses to extract text from ePub files.&lt;/li&gt;
&lt;li&gt;Special thanks to the &lt;a class=&#34;link&#34; href=&#34;https://www.riverbankcomputing.com/software/pyqt/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;PyQt&lt;/a&gt; team for providing the cross-platform GUI toolkit that powers Abogen&amp;rsquo;s interface.&lt;/li&gt;
&lt;li&gt;Icons: &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/icon/aRiu1GGi6Aoe/usa&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;US&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/icon/t3NE3BsOAQwq/great-britain&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Great Britain&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/icon/ly7tzANRt33n/spain&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Spain&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/icon/3muzEmi4dpD5/france&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;France&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/icon/esGVrxg9VCJ1/india&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;India&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/icon/PW8KZnP7qXzO/italy&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Italy&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/icon/McQbrq9qaQye/japan&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Japan&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/icon/zHmH8HpOmM90/brazil&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Brazil&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/icon/Ej50Oe3crXwF/china&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;China&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/icon/uI49hxbpxTkp/female&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Female&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/icon/12351/male&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Male&lt;/a&gt;, &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/icon/21698/adjust&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Adjust&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/icon/GskSeVoroQ7u/voice-id&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Voice Id&lt;/a&gt; icons by &lt;a class=&#34;link&#34; href=&#34;https://icons8.com/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Icons8&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;license&#34;&gt;&lt;code&gt;License&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;This project is available under the MIT License - see the &lt;a class=&#34;link&#34; href=&#34;https://github.com/denizsafak/abogen/blob/main/LICENSE&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;LICENSE&lt;/a&gt; file for details.
&lt;a class=&#34;link&#34; href=&#34;https://github.com/hexgrad/kokoro&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Kokoro&lt;/a&gt; is licensed under &lt;a class=&#34;link&#34; href=&#34;https://github.com/hexgrad/kokoro/blob/main/LICENSE&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Apache-2.0&lt;/a&gt;, which allows commercial use, modification, distribution, and private use.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[!IMPORTANT]
Subtitle generation currently works only for English. This is because Kokoro provides timestamp tokens only for English text. If you want subtitles in other languages, please request this feature in the &lt;a class=&#34;link&#34; href=&#34;https://github.com/hexgrad/kokoro&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Kokoro project&lt;/a&gt;. For more technical details, see &lt;a class=&#34;link&#34; href=&#34;https://github.com/hexgrad/kokoro/blob/6d87f4ae7abc2d14dbc4b3ef2e5f19852e861ac2/kokoro/pipeline.py#L383&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;this line&lt;/a&gt; in Kokoro&amp;rsquo;s code.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Tags: audiobook, kokoro, text-to-speech, TTS, audiobook generator, audiobooks, text to speech, audiobook maker, audiobook creator, voice-synthesis, text to audio, text to audio converter, text to speech converter, text to speech generator, text to speech software, text to speech app, epub to audio, pdf to audio, content-creation, media-generation&lt;/p&gt;
&lt;/blockquote&gt;
</description>
        </item>
        <item>
        <title>NeMo</title>
        <link>https://producthunt.programnotes.cn/en/p/nemo/</link>
        <pubDate>Sat, 10 May 2025 15:25:32 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/nemo/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1729952832073-bf7d3d6150cd?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NDY4NjE4OTR8&amp;ixlib=rb-4.1.0" alt="Featured image of post NeMo" /&gt;&lt;h1 id=&#34;nvidianemo&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NVIDIA/NeMo&lt;/a&gt;
&lt;/h1&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;http://www.repostatus.org/#active&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;http://www.repostatus.org/badges/latest/active.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Project Status: Active – The project has reached a stable, usable state and is being actively developed.&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://readthedocs.com/projects/nvidia-nemo/badge/?version=main&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Documentation&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/nvidia/nemo/actions/workflows/codeql.yml&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://github.com/nvidia/nemo/actions/workflows/codeql.yml/badge.svg?branch=main&amp;amp;event=push&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;CodeQL&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo/blob/master/LICENSE&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;NeMo core license and license for collections in this repo&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://badge.fury.io/py/nemo-toolkit&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://badge.fury.io/py/nemo-toolkit.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Release version&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://badge.fury.io/py/nemo-toolkit&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/pypi/pyversions/nemo-toolkit.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Python version&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://pepy.tech/project/nemo-toolkit&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://static.pepy.tech/personalized-badge/nemo-toolkit?period=total&amp;amp;units=international_system&amp;amp;left_color=grey&amp;amp;right_color=brightgreen&amp;amp;left_text=downloads&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;PyPi total downloads&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/psf/black&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/code%20style-black-000000.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Code style: black&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1 id=&#34;nvidia-nemo-framework&#34;&gt;&lt;strong&gt;NVIDIA NeMo Framework&lt;/strong&gt;
&lt;/h1&gt;&lt;h2 id=&#34;latest-news&#34;&gt;Latest News
&lt;/h2&gt;&lt;!-- markdownlint-disable --&gt;
&lt;details open&gt;
  &lt;summary&gt;&lt;b&gt;Pretrain and fine-tune Hugging Face models via AutoModel&lt;/b&gt;&lt;/summary&gt;
      NeMo Framework&#39;s latest feature, AutoModel, enables broad support for Hugging Face models, with 25.02 focusing on &lt;a href=https://huggingface.co/transformers/v3.5.1/model_doc/auto.html#automodelforcausallm&gt;AutoModelForCausalLM&lt;/a&gt; in the &lt;a href=https://huggingface.co/models?pipeline_tag=text-generation&amp;sort=trending&gt;text generation category&lt;/a&gt;. Future releases will enable support for more model families, such as Vision Language Models.
&lt;/details&gt;
&lt;details open&gt;
  &lt;summary&gt;&lt;b&gt;Training on Blackwell using NeMo&lt;/b&gt;&lt;/summary&gt;
      NeMo Framework has added Blackwell support, with 25.02 focusing on functional parity for B200. More optimizations will come in upcoming releases.
&lt;/details&gt;
&lt;details open&gt;
  &lt;summary&gt;&lt;b&gt;NeMo Framework 2.0&lt;/b&gt;&lt;/summary&gt;
      We&#39;ve released NeMo 2.0, an update to the NeMo Framework that prioritizes modularity and ease of use. Please refer to the &lt;a href=https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html&gt;NeMo Framework User Guide&lt;/a&gt; to get started.
&lt;/details&gt;
&lt;details open&gt;
  &lt;summary&gt;&lt;b&gt;New Cosmos World Foundation Models Support&lt;/b&gt;&lt;/summary&gt;
    &lt;details&gt; 
      &lt;summary&gt; &lt;a href=&#34;https://developer.nvidia.com/blog/advancing-physical-ai-with-nvidia-cosmos-world-foundation-model-platform&#34;&gt;Advancing Physical AI with NVIDIA Cosmos World Foundation Model Platform &lt;/a&gt; (2025-01-09) 
      &lt;/summary&gt; 
        The end-to-end NVIDIA Cosmos platform accelerates world model development for physical AI systems. Built on CUDA, Cosmos combines state-of-the-art world foundation models, video tokenizers, and AI-accelerated data processing pipelines. Developers can accelerate world model development by fine-tuning Cosmos world foundation models or building new ones from the ground up. These models create realistic synthetic videos of environments and interactions, providing a scalable foundation for training complex systems, from simulating humanoid robots performing advanced actions to developing end-to-end autonomous driving models. 
        &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
    &lt;details&gt;
      &lt;summary&gt;
        &lt;a href=&#34;https://developer.nvidia.com/blog/accelerate-custom-video-foundation-model-pipelines-with-new-nvidia-nemo-framework-capabilities/&#34;&gt;
          Accelerate Custom Video Foundation Model Pipelines with New NVIDIA NeMo Framework Capabilities
        &lt;/a&gt; (2025-01-07)
      &lt;/summary&gt;
        The NeMo Framework now supports training and customizing the &lt;a href=&#34;https://github.com/NVIDIA/Cosmos&#34;&gt;NVIDIA Cosmos&lt;/a&gt; collection of world foundation models. Cosmos leverages advanced text-to-world generation techniques to create fluid, coherent video content from natural language prompts.
        &lt;br&gt;&lt;br&gt;
        You can also now accelerate your video processing step using the &lt;a href=&#34;https://developer.nvidia.com/nemo-curator-video-processing-early-access&#34;&gt;NeMo Curator&lt;/a&gt; library, which provides optimized video processing and captioning features that can deliver up to 89x faster video processing when compared to an unoptimized CPU pipeline.
      &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
&lt;/details&gt;
&lt;details open&gt;
  &lt;summary&gt;&lt;b&gt;Large Language Models and Multimodal Models&lt;/b&gt;&lt;/summary&gt;
    &lt;details&gt;
      &lt;summary&gt;
        &lt;a href=&#34;https://developer.nvidia.com/blog/state-of-the-art-multimodal-generative-ai-model-development-with-nvidia-nemo/&#34;&gt;
          State-of-the-Art Multimodal Generative AI Model Development with NVIDIA NeMo
        &lt;/a&gt; (2024-11-06)
      &lt;/summary&gt;
        NVIDIA recently announced significant enhancements to the NeMo platform, focusing on multimodal generative AI models. The update includes NeMo Curator and the Cosmos tokenizer, which streamline the data curation process and enhance the quality of visual data. These tools are designed to handle large-scale data efficiently, making it easier to develop high-quality AI models for various applications, including robotics and autonomous driving. The Cosmos tokenizers, in particular, efficiently map visual data into compact, semantic tokens, which is crucial for training large-scale generative models. The tokenizer is now available on the &lt;a href=https://github.com/NVIDIA/cosmos-tokenizer&gt;NVIDIA/cosmos-tokenizer&lt;/a&gt; GitHub repo and on &lt;a href=https://huggingface.co/nvidia/Cosmos-Tokenizer-CV8x8x8&gt;Hugging Face&lt;/a&gt;.
      &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
    &lt;details&gt;
      &lt;summary&gt;
        &lt;a href=&#34;https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/llama/index.html#new-llama-3-1-support&#34;&gt;
        New Llama 3.1 Support
        &lt;/a&gt; (2024-07-23)
      &lt;/summary&gt;
        The NeMo Framework now supports training and customizing the Llama 3.1 collection of LLMs from Meta.
      &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
    &lt;details&gt;
      &lt;summary&gt;
        &lt;a href=&#34;https://aws.amazon.com/blogs/machine-learning/accelerate-your-generative-ai-distributed-training-workloads-with-the-nvidia-nemo-framework-on-amazon-eks/&#34;&gt;
          Accelerate your Generative AI Distributed Training Workloads with the NVIDIA NeMo Framework on Amazon EKS
        &lt;/a&gt; (2024-07-16)
      &lt;/summary&gt;
     NVIDIA NeMo Framework now runs distributed training workloads on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. For step-by-step instructions on creating an EKS cluster and running distributed training workloads with NeMo, see the GitHub repository &lt;a href=&#34;https://github.com/aws-samples/awsome-distributed-training/tree/main/3.test_cases/2.nemo-launcher/EKS/&#34;&gt; here.&lt;/a&gt;
      &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
    &lt;details&gt;
      &lt;summary&gt;
        &lt;a href=&#34;https://developer.nvidia.com/blog/nvidia-nemo-accelerates-llm-innovation-with-hybrid-state-space-model-support/&#34;&gt;
          NVIDIA NeMo Accelerates LLM Innovation with Hybrid State Space Model Support
        &lt;/a&gt; (2024/06/17)
      &lt;/summary&gt;
     NVIDIA NeMo and Megatron Core now support pre-training and fine-tuning of state space models (SSMs). NeMo also supports training models based on the Griffin architecture as described by Google DeepMind. 
      &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
      &lt;details&gt;
      &lt;summary&gt;
        &lt;a href=&#34;https://huggingface.co/models?sort=trending&amp;search=nvidia%2Fnemotron-4-340B&#34;&gt;
          NVIDIA releases 340B base, instruct, and reward models pretrained on a total of 9T tokens.
        &lt;/a&gt; (2024-06-18)
      &lt;/summary&gt;
      See documentation and tutorials for SFT, PEFT, and PTQ with 
      &lt;a href=&#34;https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/nemotron/index.html&#34;&gt;
        Nemotron 340B 
      &lt;/a&gt;
      in the NeMo Framework User Guide.
      &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
    &lt;details&gt;
      &lt;summary&gt;
        &lt;a href=&#34;https://developer.nvidia.com/blog/nvidia-sets-new-generative-ai-performance-and-scale-records-in-mlperf-training-v4-0/&#34;&gt;
          NVIDIA sets new generative AI performance and scale records in MLPerf Training v4.0
        &lt;/a&gt; (2024/06/12)
      &lt;/summary&gt;
      Using NVIDIA NeMo Framework and NVIDIA Hopper GPUs, NVIDIA was able to scale to 11,616 H100 GPUs and achieve near-linear performance scaling on LLM pretraining. 
      NVIDIA also achieved the highest LLM fine-tuning performance and raised the bar for text-to-image training.
      &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
    &lt;details&gt;
        &lt;summary&gt;
          &lt;a href=&#34;https://cloud.google.com/blog/products/compute/gke-and-nvidia-nemo-framework-to-train-generative-ai-models&#34;&gt;
            Accelerate your generative AI journey with NVIDIA NeMo Framework on GKE
          &lt;/a&gt; (2024/03/16)
        &lt;/summary&gt;
        An end-to-end walkthrough to train generative AI models on the Google Kubernetes Engine (GKE) using the NVIDIA NeMo Framework is available at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. 
        The walkthrough includes detailed instructions on how to set up a Google Cloud Project and pre-train a GPT model using the NeMo Framework.
        &lt;br&gt;&lt;br&gt;
      &lt;/details&gt;
&lt;/details&gt;
&lt;details open&gt;
  &lt;summary&gt;&lt;b&gt;Speech Recognition&lt;/b&gt;&lt;/summary&gt;
  &lt;details&gt;
      &lt;summary&gt;
        &lt;a href=&#34;https://developer.nvidia.com/blog/accelerating-leaderboard-topping-asr-models-10x-with-nvidia-nemo/&#34;&gt;
          Accelerating Leaderboard-Topping ASR Models 10x with NVIDIA NeMo
        &lt;/a&gt; (2024/09/24)
      &lt;/summary&gt;
      The NVIDIA NeMo team released a number of inference optimizations for CTC, RNN-T, and TDT models that resulted in up to 10x inference speed-up. 
      These models now exceed an inverse real-time factor (RTFx) of 2,000, with some reaching RTFx of even 6,000.
      &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
    &lt;details&gt;
      &lt;summary&gt;
        &lt;a href=&#34;https://developer.nvidia.com/blog/new-standard-for-speech-recognition-and-translation-from-the-nvidia-nemo-canary-model/&#34;&gt;
          New Standard for Speech Recognition and Translation from the NVIDIA NeMo Canary Model
        &lt;/a&gt; (2024/04/18)
      &lt;/summary&gt;
      The NeMo team just released Canary, a multilingual model that transcribes speech in English, Spanish, German, and French with punctuation and capitalization. 
      Canary also provides bi-directional translation, between English and the three other supported languages.
      &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
    &lt;details&gt;
      &lt;summary&gt;
        &lt;a href=&#34;https://developer.nvidia.com/blog/pushing-the-boundaries-of-speech-recognition-with-nemo-parakeet-asr-models/&#34;&gt;
          Pushing the Boundaries of Speech Recognition with NVIDIA NeMo Parakeet ASR Models
        &lt;/a&gt; (2024/04/18)
      &lt;/summary&gt;
      NVIDIA NeMo, an end-to-end platform for the development of multimodal generative AI models at scale anywhere—on any cloud and on-premises—released the Parakeet family of automatic speech recognition (ASR) models. 
      These state-of-the-art ASR models, developed in collaboration with Suno.ai, transcribe spoken English with exceptional accuracy.
      &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
  &lt;details&gt;
    &lt;summary&gt;
      &lt;a href=&#34;https://developer.nvidia.com/blog/turbocharge-asr-accuracy-and-speed-with-nvidia-nemo-parakeet-tdt/&#34;&gt;
        Turbocharge ASR Accuracy and Speed with NVIDIA NeMo Parakeet-TDT
      &lt;/a&gt; (2024/04/18)
    &lt;/summary&gt;
    NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere—on any cloud and on-premises—recently released Parakeet-TDT. 
    This new addition to the NeMo ASR Parakeet model family offers better accuracy and 64% greater speed than the previous best model, Parakeet-RNNT-1.1B.
    &lt;br&gt;&lt;br&gt;
  &lt;/details&gt;
&lt;/details&gt;
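&lt;p&gt;A short worked example helps interpret the RTFx figures in the speech recognition news above. It assumes the usual definition of inverse real-time factor: seconds of audio transcribed per second of wall-clock processing. The one-hour input is an illustrative choice, not a published benchmark.&lt;/p&gt;

```latex
\mathrm{RTFx} = \frac{T_{\text{audio}}}{T_{\text{processing}}}
\qquad\Longrightarrow\qquad
T_{\text{processing}} = \frac{T_{\text{audio}}}{\mathrm{RTFx}} = \frac{3600\ \text{s}}{2000} = 1.8\ \text{s}
```

&lt;p&gt;At an RTFx of 2,000, one hour of audio would therefore take about 1.8 seconds to transcribe. At an RTFx of 6,000, it would take about 0.6 seconds.&lt;/p&gt;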
&lt;!-- markdownlint-enable --&gt;
&lt;h2 id=&#34;introduction&#34;&gt;Introduction
&lt;/h2&gt;&lt;p&gt;NVIDIA NeMo Framework is a scalable and cloud-native generative AI
framework built for researchers and PyTorch developers working on Large
Language Models (LLMs), Multimodal Models (MMs), Automatic Speech
Recognition (ASR), Text to Speech (TTS), and Computer Vision (CV)
domains. It is designed to help you efficiently create, customize, and
deploy new generative AI models by leveraging existing code and
pre-trained model checkpoints.&lt;/p&gt;
&lt;p&gt;For technical documentation, please see the &lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NeMo Framework User
Guide&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;whats-new-in-nemo-20&#34;&gt;What&amp;rsquo;s New in NeMo 2.0
&lt;/h2&gt;&lt;p&gt;NVIDIA NeMo 2.0 introduces several significant improvements over its predecessor, NeMo 1.0, enhancing flexibility, performance, and scalability.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Python-Based Configuration&lt;/strong&gt; - NeMo 2.0 transitions from YAML files to a Python-based configuration, providing more flexibility and control. This shift makes it easier to extend and customize configurations programmatically.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Modular Abstractions&lt;/strong&gt; - By adopting PyTorch Lightning’s modular abstractions, NeMo 2.0 simplifies adaptation and experimentation. This modular approach allows developers to more easily modify and experiment with different components of their models.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt; - NeMo 2.0 scales experiments seamlessly across thousands of GPUs using &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo-Run&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NeMo-Run&lt;/a&gt;, a powerful tool designed to streamline the configuration, execution, and management of machine learning experiments across computing environments.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Overall, these enhancements make NeMo 2.0 a powerful, scalable, and user-friendly framework for AI model development.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[!IMPORTANT]&lt;br&gt;
NeMo 2.0 is currently supported by the LLM (large language model) and VLM (vision language model) collections.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id=&#34;get-started-with-nemo-20&#34;&gt;Get Started with NeMo 2.0
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Refer to the &lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/quickstart.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Quickstart&lt;/a&gt; for examples of using NeMo-Run to launch NeMo 2.0 experiments locally and on a Slurm cluster.&lt;/li&gt;
&lt;li&gt;For more information about NeMo 2.0, see the &lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NeMo Framework User Guide&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NeMo 2.0 Recipes&lt;/a&gt; contains additional examples of launching large-scale runs using NeMo 2.0 and NeMo-Run.&lt;/li&gt;
&lt;li&gt;For an in-depth exploration of the main features of NeMo 2.0, see the &lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/index.html#feature-guide&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Feature Guide&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;To transition from NeMo 1.0 to 2.0, see the &lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/index.html#migration-guide&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Migration Guide&lt;/a&gt; for step-by-step instructions.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;get-started-with-cosmos&#34;&gt;Get Started with Cosmos
&lt;/h3&gt;&lt;p&gt;NeMo Curator and NeMo Framework support video curation and post-training of the Cosmos World Foundation Models, which are open and available on &lt;a class=&#34;link&#34; href=&#34;https://catalog.ngc.nvidia.com/orgs/nvidia/teams/cosmos/collections/cosmos&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NGC&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/collections/nvidia/cosmos-6751e884dc10e013a0a0d8e6&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Hugging Face&lt;/a&gt;. For more information on video datasets, refer to &lt;a class=&#34;link&#34; href=&#34;https://developer.nvidia.com/nemo-curator&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NeMo Curator&lt;/a&gt;. To post-train World Foundation Models using the NeMo Framework for your custom physical AI tasks, see the &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/diffusion/nemo/post_training/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Cosmos Diffusion models&lt;/a&gt; and the &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/Cosmos/blob/main/cosmos1/models/autoregressive/nemo/post_training/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Cosmos Autoregressive models&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;llms-and-mms-training-alignment-and-customization&#34;&gt;LLMs and MMs Training, Alignment, and Customization
&lt;/h2&gt;&lt;p&gt;All NeMo models are trained with
&lt;a class=&#34;link&#34; href=&#34;https://github.com/Lightning-AI/lightning&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Lightning&lt;/a&gt;. Training is
automatically scalable to thousands of GPUs. Performance benchmarks for the
latest NeMo Framework container are listed in the &lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/nemo-framework/user-guide/latest/performance/performance_summary.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Performance Summary&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When applicable, NeMo models leverage cutting-edge distributed training
techniques, incorporating &lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/nemo-framework/user-guide/latest/modeloverview.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;parallelism
strategies&lt;/a&gt;
to enable efficient training of very large models. These techniques
include Tensor Parallelism (TP), Pipeline Parallelism (PP), Fully
Sharded Data Parallelism (FSDP), Mixture-of-Experts (MoE), and Mixed
Precision Training with BFloat16 and FP8, as well as others.&lt;/p&gt;
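&lt;p&gt;To make the interaction between these strategies concrete: in Megatron-style training, the GPUs are usually split so that the world size equals the tensor-parallel size × pipeline-parallel size × data-parallel size. The shell sketch below assumes exactly that relation and ignores context and expert parallelism; &lt;code&gt;data_parallel_size&lt;/code&gt; is a hypothetical helper for illustration, not a NeMo command.&lt;/p&gt;

```shell
# Hypothetical helper (not part of NeMo): derive the data-parallel size
# from the total GPU count and the tensor/pipeline parallel sizes,
# assuming world_size = TP * PP * DP (context/expert parallelism ignored).
data_parallel_size() {
  local world_size=$1 tp=$2 pp=$3
  local model_parallel=$(( tp * pp ))
  # A valid layout needs TP * PP to divide the GPU count exactly.
  if [ $(( world_size % model_parallel )) -ne 0 ]; then
    echo "error: $world_size GPUs not divisible by TP*PP=$model_parallel"
    return 1
  fi
  echo $(( world_size / model_parallel ))
}

# 2 nodes x 8 GPUs, TP=4 within a node, PP=2 across nodes
data_parallel_size 16 4 2   # prints 2
```

&lt;p&gt;If TP × PP does not divide the GPU count, no valid layout exists, which is why the helper rejects it instead of rounding.&lt;/p&gt;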
&lt;p&gt;NeMo Transformer-based LLMs and MMs utilize &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/TransformerEngine&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NVIDIA Transformer
Engine&lt;/a&gt; for FP8 training on
NVIDIA Hopper GPUs, while leveraging &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NVIDIA Megatron
Core&lt;/a&gt; for
scaling Transformer model training.&lt;/p&gt;
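&lt;p&gt;Because the FP8 path above targets Hopper GPUs (compute capability 9.0), a quick pre-flight check can tell you which precision to expect on a given machine. The sketch below defines a hypothetical helper, &lt;code&gt;meets_hopper_fp8_threshold&lt;/code&gt;, that only tests the Hopper threshold named in the text; consult the Transformer Engine documentation for the authoritative list of FP8-capable architectures.&lt;/p&gt;

```shell
# Hypothetical helper (not part of NeMo or Transformer Engine): succeed if a
# GPU's compute capability is at least 9.0, the Hopper threshold named above.
meets_hopper_fp8_threshold() {
  local cap=$1 major        # cap is e.g. "9.0", as reported by nvidia-smi
  major=${cap%%.*}          # keep the text before the first dot
  [ "$major" -ge 9 ]
}

# On a machine with a recent NVIDIA driver, the capability can be queried with:
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader
for cap in 8.0 9.0; do
  if meets_hopper_fp8_threshold "$cap"; then
    echo "compute capability $cap: meets the Hopper FP8 threshold"
  else
    echo "compute capability $cap: below the Hopper FP8 threshold"
  fi
done
```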
&lt;p&gt;NeMo LLMs can be aligned with state-of-the-art methods such as SteerLM,
Direct Preference Optimization (DPO), and Reinforcement Learning from
Human Feedback (RLHF). See &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo-Aligner&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NVIDIA NeMo
Aligner&lt;/a&gt; for more information.&lt;/p&gt;
&lt;p&gt;In addition to supervised fine-tuning (SFT), NeMo also supports the
latest parameter-efficient fine-tuning (PEFT) techniques such as LoRA,
P-Tuning, Adapters, and IA3. Refer to the &lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/index.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NeMo Framework User
Guide&lt;/a&gt;
for the full list of supported models and techniques.&lt;/p&gt;
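&lt;p&gt;As a worked example of why PEFT is cheap: LoRA freezes a pretrained weight matrix W of shape d_out × d_in and learns a low-rank update BA, where B is d_out × r and A is r × d_in. That adds r(d_in + d_out) trainable parameters per adapted matrix. The arithmetic sketch below uses an illustrative 4096 × 4096 projection and rank 16; &lt;code&gt;lora_params&lt;/code&gt; is a hypothetical helper, and the dimensions are not taken from any specific NeMo model.&lt;/p&gt;

```shell
# Hypothetical helper: trainable parameters LoRA adds to one weight matrix of
# shape d_out x d_in at rank r. B is d_out x r, A is r x d_in, and the frozen
# weight W is untouched, so the adapter adds r * (d_in + d_out) parameters.
lora_params() {
  local d_in=$1 d_out=$2 r=$3
  echo $(( r * (d_in + d_out) ))
}

d=4096; r=16
full=$(( d * d ))
added=$(lora_params "$d" "$d" "$r")
echo "full matrix: $full, LoRA adds: $added"

# Shell arithmetic is integer-only, so compute the fraction in basis points.
bp=$(( added * 10000 / full ))
printf 'trainable fraction: %d.%02d%%\n' $(( bp / 100 )) $(( bp % 100 ))
```

&lt;p&gt;For this matrix the script reports 131072 added parameters, about 0.78% of the 16777216 in the frozen weight.&lt;/p&gt;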
&lt;h2 id=&#34;llms-and-mms-deployment-and-optimization&#34;&gt;LLMs and MMs Deployment and Optimization
&lt;/h2&gt;&lt;p&gt;NeMo LLMs and MMs can be deployed and optimized with &lt;a class=&#34;link&#34; href=&#34;https://developer.nvidia.com/nemo-microservices-early-access&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NVIDIA NeMo
Microservices&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;speech-ai&#34;&gt;Speech AI
&lt;/h2&gt;&lt;p&gt;NeMo ASR and TTS models can be optimized for inference and deployed for
production use cases with &lt;a class=&#34;link&#34; href=&#34;https://developer.nvidia.com/riva&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NVIDIA Riva&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;nemo-framework-launcher&#34;&gt;NeMo Framework Launcher
&lt;/h2&gt;&lt;blockquote&gt;
&lt;p&gt;[!IMPORTANT]&lt;br&gt;
NeMo Framework Launcher is compatible with NeMo version 1.0 only. &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo-Run&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NeMo-Run&lt;/a&gt; is recommended for launching experiments using NeMo 2.0.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo-Megatron-Launcher&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NeMo Framework
Launcher&lt;/a&gt; is a
cloud-native tool that streamlines the NeMo Framework experience. It is
used to launch end-to-end NeMo Framework training jobs on cloud service
providers (CSPs) and Slurm clusters.&lt;/p&gt;
&lt;p&gt;The NeMo Framework Launcher includes extensive recipes, scripts,
utilities, and documentation for training NeMo LLMs. It also includes
the NeMo Framework &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo-Megatron-Launcher#53-using-autoconfigurator-to-find-the-optimal-configuration&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Autoconfigurator&lt;/a&gt;,
which is designed to find the optimal model parallel configuration for
training on a specific cluster.&lt;/p&gt;
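&lt;p&gt;To give a sense of the search space such a tool explores, the sketch below naively enumerates the tensor-, pipeline-, and data-parallel layouts that exactly tile a cluster. It assumes world size = TP × PP × DP and the common heuristic of keeping tensor parallelism within a node (power-of-two TP up to the GPUs per node). &lt;code&gt;list_layouts&lt;/code&gt; is hypothetical: it only lists candidates and does not reproduce how the Autoconfigurator measures or ranks them.&lt;/p&gt;

```shell
# Hypothetical sketch: enumerate (TP, PP, DP) layouts that exactly tile a
# cluster. TP is limited to powers of two that divide the GPUs per node;
# PP is any value such that TP * PP divides the total GPU count.
list_layouts() {
  local nodes=$1 gpus_per_node=$2
  local world=$(( nodes * gpus_per_node ))
  local tp=1 pp
  while [ "$tp" -le "$gpus_per_node" ]; do
    if [ $(( gpus_per_node % tp )) -eq 0 ]; then
      pp=1
      while [ "$pp" -le $(( world / tp )) ]; do
        if [ $(( world % (tp * pp) )) -eq 0 ]; then
          echo "TP=$tp PP=$pp DP=$(( world / (tp * pp) ))"
        fi
        pp=$(( pp + 1 ))
      done
    fi
    tp=$(( tp * 2 ))
  done
}

# Candidate layouts for a single 8-GPU node
list_layouts 1 8
```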
&lt;p&gt;To get started quickly with the NeMo Framework Launcher, please see the
&lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NeMo Framework
Playbooks&lt;/a&gt;.
The NeMo Framework Launcher does not currently support ASR and TTS
training, but it will soon.&lt;/p&gt;
&lt;h2 id=&#34;get-started-with-nemo-framework&#34;&gt;Get Started with NeMo Framework
&lt;/h2&gt;&lt;p&gt;Getting started with NeMo Framework is easy. State-of-the-art pretrained
NeMo models are freely available on &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/models?library=nemo&amp;amp;sort=downloads&amp;amp;search=nvidia&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Hugging Face
Hub&lt;/a&gt;
and &lt;a class=&#34;link&#34; href=&#34;https://catalog.ngc.nvidia.com/models?query=nemo&amp;amp;orderBy=weightPopularDESC&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NVIDIA
NGC&lt;/a&gt;.
These models can be used to generate text or images, transcribe audio,
and synthesize speech in just a few lines of code.&lt;/p&gt;
&lt;p&gt;We have extensive
&lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/starthere/tutorials.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tutorials&lt;/a&gt;
that can be run on &lt;a class=&#34;link&#34; href=&#34;https://colab.research.google.com&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Google Colab&lt;/a&gt; or
with our &lt;a class=&#34;link&#34; href=&#34;https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NGC NeMo Framework
Container&lt;/a&gt;.
We also have
&lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;playbooks&lt;/a&gt;
for users who want to train NeMo models with the NeMo Framework
Launcher.&lt;/p&gt;
&lt;p&gt;For advanced users who want to train NeMo models from scratch or
fine-tune existing NeMo models, we have a full suite of &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo/tree/main/examples&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;example
scripts&lt;/a&gt; that support
multi-GPU/multi-node training.&lt;/p&gt;
&lt;h2 id=&#34;key-features&#34;&gt;Key Features
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;nemo/collections/nlp/README.md&#34; &gt;Large Language Models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;nemo/collections/multimodal/README.md&#34; &gt;Multimodal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;nemo/collections/asr/README.md&#34; &gt;Automatic Speech Recognition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;nemo/collections/tts/README.md&#34; &gt;Text to Speech&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;nemo/collections/vision/README.md&#34; &gt;Computer Vision&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;requirements&#34;&gt;Requirements
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Python 3.10 or above&lt;/li&gt;
&lt;li&gt;PyTorch 2.5 or above&lt;/li&gt;
&lt;li&gt;NVIDIA GPU (if you intend to do model training)&lt;/li&gt;
&lt;/ul&gt;
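&lt;p&gt;A pre-flight check for these minimums must compare versions numerically, since a plain string comparison would rank 3.9 above 3.10. The sketch below defines a hypothetical &lt;code&gt;version_at_least&lt;/code&gt; helper that compares only the MAJOR.MINOR components; it is an illustration, not part of NeMo's installer.&lt;/p&gt;

```shell
# Hypothetical helper: succeed if version $1 is at least version $2,
# comparing only the numeric MAJOR.MINOR components (e.g. "3.10.12" vs "3.10").
version_at_least() {
  local have_major have_minor need_major need_minor
  have_major=$(echo "$1" | cut -d. -f1)
  have_minor=$(echo "$1" | cut -d. -f2)
  need_major=$(echo "$2" | cut -d. -f1)
  need_minor=$(echo "$2" | cut -d. -f2)
  if [ "$have_major" -ne "$need_major" ]; then
    [ "$have_major" -gt "$need_major" ]
  else
    [ "$have_minor" -ge "$need_minor" ]
  fi
}

# In practice you would pass real versions, e.g.
#   version_at_least "$(python3 -c 'import torch; print(torch.__version__)')" 2.5
for v in 3.9.18 3.10.12 3.12.1; do
  if version_at_least "$v" 3.10; then
    echo "Python $v: OK"
  else
    echo "Python $v: too old"
  fi
done
```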
&lt;h2 id=&#34;developer-documentation&#34;&gt;Developer Documentation
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Version&lt;/th&gt;
          &lt;th&gt;Status&lt;/th&gt;
          &lt;th&gt;Description&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Latest&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://readthedocs.com/projects/nvidia-nemo/badge/?version=main&#34; loading=&#34;lazy&#34; alt=&#34;Documentation Status&#34;&gt;&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Documentation of the latest (i.e. main) branch.&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Stable&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://readthedocs.com/projects/nvidia-nemo/badge/?version=stable&#34; loading=&#34;lazy&#34; alt=&#34;Documentation Status&#34;&gt;&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Documentation of the stable (i.e., most recent) release.&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;install-nemo-framework&#34;&gt;Install NeMo Framework
&lt;/h2&gt;&lt;p&gt;The NeMo Framework can be installed in several ways. Depending on
your domain and needs, one of the following installation methods may be
more suitable.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#conda--pip&#34; &gt;Conda / Pip&lt;/a&gt;: Install NeMo-Framework with native Pip into a virtual environment.
&lt;ul&gt;
&lt;li&gt;Used to explore NeMo on any supported platform.&lt;/li&gt;
&lt;li&gt;This is the recommended method for ASR and TTS domains.&lt;/li&gt;
&lt;li&gt;Limited feature-completeness for other domains.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#ngc-pytorch-container&#34; &gt;NGC PyTorch container&lt;/a&gt;: Install NeMo-Framework from source with feature-completeness into a highly optimized container.
&lt;ul&gt;
&lt;li&gt;For users who want to install from source in a highly optimized container.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#ngc-nemo-container&#34; &gt;NGC NeMo container&lt;/a&gt;: A ready-to-use NeMo-Framework container.
&lt;ul&gt;
&lt;li&gt;For users who want the highest performance.&lt;/li&gt;
&lt;li&gt;Contains all dependencies installed and tested for performance and convergence.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;support-matrix&#34;&gt;Support matrix
&lt;/h3&gt;&lt;p&gt;NeMo-Framework provides tiers of support based on OS / Platform and mode of installation. Please refer to the following overview of support levels:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fully supported: Max performance and feature-completeness.&lt;/li&gt;
&lt;li&gt;Limited support: Used to explore NeMo.&lt;/li&gt;
&lt;li&gt;No support yet: In development.&lt;/li&gt;
&lt;li&gt;Deprecated: Support has reached end of life.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Please refer to the following table for current support levels:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;OS / Platform&lt;/th&gt;
          &lt;th&gt;Install from PyPI&lt;/th&gt;
          &lt;th&gt;Source into NGC container&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;linux&lt;/code&gt; - &lt;code&gt;amd64/x86_64&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Limited support&lt;/td&gt;
          &lt;td&gt;Full support&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;linux&lt;/code&gt; - &lt;code&gt;arm64&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Limited support&lt;/td&gt;
          &lt;td&gt;Limited support&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;darwin&lt;/code&gt; - &lt;code&gt;amd64/x86_64&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Deprecated&lt;/td&gt;
          &lt;td&gt;Deprecated&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;darwin&lt;/code&gt; - &lt;code&gt;arm64&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Limited support&lt;/td&gt;
          &lt;td&gt;Limited support&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;windows&lt;/code&gt; - &lt;code&gt;amd64/x86_64&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;No support yet&lt;/td&gt;
          &lt;td&gt;No support yet&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;code&gt;windows&lt;/code&gt; - &lt;code&gt;arm64&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;No support yet&lt;/td&gt;
          &lt;td&gt;No support yet&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
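&lt;p&gt;For scripted setups, the table can be encoded as a lookup. The sketch below is a hypothetical &lt;code&gt;support_level&lt;/code&gt; helper; it assumes that &lt;code&gt;uname -m&lt;/code&gt; values such as &lt;code&gt;aarch64&lt;/code&gt; should be treated as the table's &lt;code&gt;arm64&lt;/code&gt;, and it reports Unknown for anything the table does not list.&lt;/p&gt;

```shell
# Hypothetical helper that encodes the support matrix above.
# Usage: support_level OS ARCH MODE, where MODE is "pypi" or "ngc".
support_level() {
  local os=$1 arch=$2 mode=$3
  case "$os/$arch" in
    linux/x86_64|linux/amd64)
      if [ "$mode" = ngc ]; then echo "Full support"; else echo "Limited support"; fi ;;
    linux/arm64|linux/aarch64|darwin/arm64)
      echo "Limited support" ;;
    darwin/x86_64|darwin/amd64)
      echo "Deprecated" ;;
    windows/*)
      echo "No support yet" ;;
    *)
      echo "Unknown" ;;
  esac
}

# Map this machine's uname output onto the table's OS names
# (for example, Linux becomes linux and Darwin becomes darwin).
os=$(uname -s | tr '[:upper:]' '[:lower:]')
arch=$(uname -m)   # e.g. x86_64, aarch64, arm64
echo "PyPI: $(support_level "$os" "$arch" pypi)"
echo "NGC container: $(support_level "$os" "$arch" ngc)"
```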
&lt;h3 id=&#34;conda--pip&#34;&gt;Conda / Pip
&lt;/h3&gt;&lt;p&gt;Install NeMo in a fresh Conda environment:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;conda create --name nemo &lt;span class=&#34;nv&#34;&gt;python&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;==&lt;/span&gt;3.10.12
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;conda activate nemo
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h4 id=&#34;pick-the-right-version&#34;&gt;Pick the right version
&lt;/h4&gt;&lt;p&gt;NeMo-Framework publishes pre-built wheels with each release.
To install nemo_toolkit from such a wheel, use the following installation method:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install &lt;span class=&#34;s2&#34;&gt;&amp;#34;nemo_toolkit[all]&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If a more specific version is desired, we recommend a Pip-VCS install. From &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA/NeMo&lt;/a&gt;, fetch the commit, branch, or tag that you would like to install.&lt;br&gt;
To install nemo_toolkit from this Git reference &lt;code&gt;$REF&lt;/code&gt;, use the following installation method:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone https://github.com/NVIDIA/NeMo
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; NeMo
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git checkout &lt;span class=&#34;si&#34;&gt;${&lt;/span&gt;&lt;span class=&#34;nv&#34;&gt;REF&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;:-&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;main&amp;#39;&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install &lt;span class=&#34;s1&#34;&gt;&amp;#39;.[all]&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h4 id=&#34;install-a-specific-domain&#34;&gt;Install a specific Domain
&lt;/h4&gt;&lt;p&gt;To install a specific domain of NeMo, you must first install the
nemo_toolkit using the instructions listed above. Then run the
following domain-specific commands:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install &lt;span class=&#34;s2&#34;&gt;&amp;#34;nemo_toolkit[all]&amp;#34;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# or pip install &amp;#34;nemo_toolkit[all]@git+https://github.com/NVIDIA/NeMo@${REF:-main}&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install &lt;span class=&#34;s2&#34;&gt;&amp;#34;nemo_toolkit[asr]&amp;#34;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# or pip install &amp;#34;nemo_toolkit[asr]@git+https://github.com/NVIDIA/NeMo@${REF:-main}&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install &lt;span class=&#34;s2&#34;&gt;&amp;#34;nemo_toolkit[nlp]&amp;#34;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# or pip install &amp;#34;nemo_toolkit[nlp]@git+https://github.com/NVIDIA/NeMo@${REF:-main}&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install &lt;span class=&#34;s2&#34;&gt;&amp;#34;nemo_toolkit[tts]&amp;#34;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# or pip install &amp;#34;nemo_toolkit[tts]@git+https://github.com/NVIDIA/NeMo@${REF:-main}&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install &lt;span class=&#34;s2&#34;&gt;&amp;#34;nemo_toolkit[vision]&amp;#34;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# or pip install &amp;#34;nemo_toolkit[vision]@git+https://github.com/NVIDIA/NeMo@${REF:-main}&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install &lt;span class=&#34;s2&#34;&gt;&amp;#34;nemo_toolkit[multimodal]&amp;#34;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;# or pip install &amp;#34;nemo_toolkit[multimodal]@git+https://github.com/NVIDIA/NeMo@${REF:-main}&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;ngc-pytorch-container&#34;&gt;NGC PyTorch container
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;NOTE: The following steps are supported beginning with 24.04 (NeMo-Toolkit 2.3.0)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We recommend that you start with a base NVIDIA PyTorch container:
nvcr.io/nvidia/pytorch:25.01-py3.&lt;/p&gt;
&lt;p&gt;If starting with a base NVIDIA PyTorch container, you must first launch
the container:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker run &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --gpus all &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  -it &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --rm &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --shm-size&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;16g &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --ulimit &lt;span class=&#34;nv&#34;&gt;memlock&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;-1 &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --ulimit &lt;span class=&#34;nv&#34;&gt;stack&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;67108864&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  nvcr.io/nvidia/pytorch:&lt;span class=&#34;si&#34;&gt;${&lt;/span&gt;&lt;span class=&#34;nv&#34;&gt;NV_PYTORCH_TAG&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;:-&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;25.01-py3&amp;#39;&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;From &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;NVIDIA/NeMo&lt;/a&gt;, fetch the commit, branch, or tag that you want to install.&lt;br&gt;
To install nemo_toolkit including all of its dependencies from this Git reference &lt;code&gt;$REF&lt;/code&gt;, use the following installation method:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; /opt
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone https://github.com/NVIDIA/NeMo
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; NeMo
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git checkout &lt;span class=&#34;si&#34;&gt;${&lt;/span&gt;&lt;span class=&#34;nv&#34;&gt;REF&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;:-&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;main&amp;#39;&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;bash reinstall.sh --library all
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;ngc-nemo-container&#34;&gt;NGC NeMo container
&lt;/h3&gt;&lt;p&gt;NeMo containers are released alongside each NeMo version update.
NeMo Framework now supports LLMs, MMs, ASR, and TTS in a single
consolidated Docker container. You can find additional information about
released containers on the &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo/releases&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NeMo releases
page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To use a pre-built container, run the following command:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker run &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --gpus all &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  -it &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --rm &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --shm-size&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;16g &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --ulimit &lt;span class=&#34;nv&#34;&gt;memlock&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;-1 &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --ulimit &lt;span class=&#34;nv&#34;&gt;stack&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;67108864&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  nvcr.io/nvidia/nemo:25.02
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;future-work&#34;&gt;Future Work
&lt;/h2&gt;&lt;p&gt;The NeMo Framework Launcher does not currently support ASR and TTS
training; support is planned for a future release.&lt;/p&gt;
&lt;h2 id=&#34;discussions-board&#34;&gt;Discussions Board
&lt;/h2&gt;&lt;p&gt;FAQs can be found on the NeMo &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo/discussions&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Discussions
board&lt;/a&gt;, where you are also welcome to
ask questions or start new discussions.&lt;/p&gt;
&lt;h2 id=&#34;contribute-to-nemo&#34;&gt;Contribute to NeMo
&lt;/h2&gt;&lt;p&gt;We welcome community contributions! Please refer to
&lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo/blob/stable/CONTRIBUTING.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;CONTRIBUTING.md&lt;/a&gt;
for the process.&lt;/p&gt;
&lt;h2 id=&#34;publications&#34;&gt;Publications
&lt;/h2&gt;&lt;p&gt;We provide an ever-growing list of
&lt;a class=&#34;link&#34; href=&#34;https://nvidia.github.io/NeMo/publications/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;publications&lt;/a&gt; that utilize
the NeMo Framework.&lt;/p&gt;
&lt;p&gt;To contribute an article to the collection, please submit a pull request
to the &lt;code&gt;gh-pages-src&lt;/code&gt; branch of this repository. For detailed
information, please consult the README on the &lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo/tree/gh-pages-src#readme&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;gh-pages-src
branch&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;blogs&#34;&gt;Blogs
&lt;/h2&gt;&lt;!-- markdownlint-disable --&gt;
&lt;details open&gt;
  &lt;summary&gt;&lt;b&gt;Large Language Models and Multimodal Models&lt;/b&gt;&lt;/summary&gt;
    &lt;details&gt;
      &lt;summary&gt;
        &lt;a href=&#34;https://blogs.nvidia.com/blog/bria-builds-responsible-generative-ai-using-nemo-picasso/&#34;&gt;
          Bria Builds Responsible Generative AI for Enterprises Using NVIDIA NeMo, Picasso
        &lt;/a&gt; (2024/03/06)
      &lt;/summary&gt;
      Bria, a Tel Aviv startup at the forefront of visual generative AI for enterprises, now leverages the NVIDIA NeMo Framework.
      The Bria.ai platform uses reference implementations from the NeMo Multimodal collection, trained on NVIDIA Tensor Core GPUs, to enable high-throughput and low-latency image generation. 
      Bria has also adopted NVIDIA Picasso, a foundry for visual generative AI models, to run inference.
      &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
    &lt;details&gt;
      &lt;summary&gt;
        &lt;a href=&#34;https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility/&#34;&gt;
          New NVIDIA NeMo Framework Features and NVIDIA H200
        &lt;/a&gt; (2023/12/06)
      &lt;/summary&gt;
      NVIDIA NeMo Framework now includes several optimizations and enhancements, 
      including: 
      1) Fully Sharded Data Parallelism (FSDP) to improve the efficiency of training large-scale AI models, 
      2) Mixture of Experts (MoE)-based LLM architectures with expert parallelism for efficient LLM training at scale,
      3) Reinforcement Learning from Human Feedback (RLHF) with TensorRT-LLM for inference stage acceleration, and 
      4) up to 4.2x speedups for Llama 2 pre-training on NVIDIA H200 Tensor Core GPUs.
      &lt;br&gt;&lt;br&gt;
      &lt;a href=&#34;https://developer.nvidia.com/blog/new-nvidia-nemo-framework-features-and-nvidia-h200-supercharge-llm-training-performance-and-versatility&#34;&gt;
      &lt;img src=&#34;https://github.com/sbhavani/TransformerEngine/blob/main/docs/examples/H200-NeMo-performance.png&#34; alt=&#34;H200-NeMo-performance&#34; style=&#34;width: 600px;&#34;&gt;&lt;/a&gt;
      &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
    &lt;details&gt;
      &lt;summary&gt;
        &lt;a href=&#34;https://blogs.nvidia.com/blog/nemo-amazon-titan/&#34;&gt;
          NVIDIA now powers training for Amazon Titan Foundation models
        &lt;/a&gt; (2023/11/28)
      &lt;/summary&gt;
      NVIDIA NeMo Framework now powers efficient large language model (LLM) training for the Amazon Titan foundation models (FMs).
      The Titan FMs form the basis of Amazon’s generative AI service, Amazon Bedrock.
      NeMo provides a versatile framework for building, customizing, and running LLMs.
      &lt;br&gt;&lt;br&gt;
    &lt;/details&gt;
&lt;/details&gt;
&lt;!-- markdownlint-enable --&gt;
&lt;h2 id=&#34;licenses&#34;&gt;Licenses
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/NVIDIA/NeMo?tab=Apache-2.0-1-ov-file#readme&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NeMo GitHub Apache 2.0
license&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The NeMo container is licensed under the &lt;a class=&#34;link&#34; href=&#34;https://www.nvidia.com/en-us/data-center/products/nvidia-ai-enterprise/eula/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;NVIDIA AI PRODUCT
AGREEMENT&lt;/a&gt;.
By pulling and using the container, you accept the terms and
conditions of this license.&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        
    </channel>
</rss>
