<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Voice Cloning on Producthunt daily</title>
        <link>https://producthunt.programnotes.cn/en/tags/voice-cloning/</link>
        <description>Recent content in Voice Cloning on Producthunt daily</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Sat, 28 Mar 2026 07:51:53 +0000</lastBuildDate><atom:link href="https://producthunt.programnotes.cn/en/tags/voice-cloning/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Product Hunt Daily | 2026-03-28</title>
        <link>https://producthunt.programnotes.cn/en/p/product-hunt-daily-2026-03-28/</link>
        <pubDate>Sat, 28 Mar 2026 07:51:53 +0000</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/product-hunt-daily-2026-03-28/</guid>
        <description>&lt;img src="https://ph-files.imgix.net/b6ae0345-5f2a-4bba-82ab-0ba4e9090f9f.png?auto=format" alt="Featured image of post Product Hunt Daily | 2026-03-28" /&gt;&lt;h2 id=&#34;1-agentation&#34;&gt;1. Agentation
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: The visual feedback tool for AI agents&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Agentation turns UI annotations into structured context that AI coding agents can understand and act on. Click any element, add a note, and paste the output into Claude Code, Codex, or any AI tool.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/UBMKYT2Z76KGDI?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/agentation?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/b6ae0345-5f2a-4bba-82ab-0ba4e9090f9f.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Agentation&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI agents, visual feedback, UI annotations, structured context, coding agents, developer tools&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺397&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2026-03-27 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;2-claude-code-auto-fix&#34;&gt;2. Claude Code auto-fix
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Auto-fix PRs in the cloud while you stay hands-off&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Claude Code auto-fix watches your pull requests in the cloud, resolving CI failures and review comments automatically. It pushes fixes, asks when needed, and keeps your PR green, so you can step away and come back to a ready-to-merge result.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/ZD7UARQER7Q65Z?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/claude-code-auto-fix-in-the-cloud?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/e865c0c8-2b45-4c23-af78-29e3b46cfd74.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Claude Code auto-fix&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI code fix, automated PR review, CI failure resolution, cloud-based development, hands-off coding, auto-fix pull requests, continuous integration automation, Claude AI, merge-ready PRs, developer productivity&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺337&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2026-03-27 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;3-gemini-31-flash-live&#34;&gt;3. Gemini 3.1 Flash Live
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Making audio AI more natural and reliable&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Gemini 3.1 Flash Live is Google’s new state-of-the-art native audio model. Built for low-latency, real-time dialogue, it excels at complex reasoning and function calling. It is the exact engine currently powering Gemini Live and Google Search Live.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/5EGR7NSXB6V7QK?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/gemini-6?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/78f8861e-e77a-4429-a85e-b95ba9892dd1.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Gemini 3.1 Flash Live&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Gemini, audio AI, real-time dialogue, low-latency, natural conversation, reliable, Google AI, function calling, complex reasoning, Gemini Live, Google Search Live, native audio model&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺317&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2026-03-27 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;4-insideorg&#34;&gt;4. InsideOrg
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Free organization chart viewer for any company&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: InsideOrg lets you enter any company domain and instantly see decision makers, reporting lines, and org structure for free. You don’t have to pay just to view a company’s org chart.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/SZCX7EGZ2ZDGUA?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/insideorg?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/ac26ca21-6d9e-43e8-a867-4771ba68fe20.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;InsideOrg&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: organization chart viewer, free org chart, company structure, decision makers, reporting lines, org structure viewer, company domain lookup, organizational chart, business hierarchy&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺312&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2026-03-27 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;5-cockpit-ai&#34;&gt;5. Cockpit AI
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Run revenue agents across every channel&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Deploy AI revenue agents that research prospects, personalize outreach, follow up across channels, and book meetings using your inbox, contacts, docs, and calendar.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/ZSATDD5TGHHICH?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/cockpit-ai?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/b64992a0-5148-4f25-a80b-46dbd3dd57c3.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Cockpit AI&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI sales agents, automated outreach, prospect research, personalized follow-up, multichannel engagement, meeting booking, revenue automation, CRM integration, sales automation, AI assistant&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺305&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2026-03-27 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;6-codex-plugins&#34;&gt;6. Codex Plugins
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Package Codex skills and app integrations as plugins&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Codex Plugins package skills, app integrations, and workflows into reusable, installable bundles for teams and developers. Seamlessly connect tools like Slack, Figma, Notion, and Google Drive to streamline planning, research, coding, and post-work workflows. Build, share, and scale consistent workflows across projects with built-in skills, authentication, and integrations.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/MXXCFXKHV6GFJT?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/openai?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/f13f6325-d169-413c-89e7-6b48ccfb45c3.jpeg?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Codex Plugins&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: plugins, app integrations, workflows, reusable bundles, team collaboration, developer tools, Slack, Figma, Notion, Google Drive, automation, productivity, workflow scaling, Codex skills&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺193&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2026-03-27 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;7-suno-v55&#34;&gt;7. Suno v5.5
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Create with your voice, tune models to your sound&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Suno v5.5 is its most personal music model yet. Use your own voice, train custom models on your catalog, and let My Taste learn what you actually like, so the songs feel less generic and much more like you.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/S25VKIT6BZUNGW?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/suno?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/4e8da565-109e-499f-819b-175853a79d27.jpeg?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Suno v5.5&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI music generator, voice cloning, custom music model, personalized songs, train AI on your voice, music creation, My Taste feature, Suno AI, create music with voice, unique sound&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺192&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2026-03-27 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;8-stripe-projects&#34;&gt;8. Stripe Projects
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Production-ready dev stack from your terminal&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Set up hosting, databases, auth, AI, observability, analytics, and more from the CLI. Stripe Projects gives developers and coding agents a reliable way to provision real services, manage credentials, and keep track of usage across the stack.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/CNUQWRUNOD3BAQ?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/stripe-projects?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/bfb9ebc8-845c-446a-8a7f-cc930c283046.jpeg?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Stripe Projects&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Stripe, CLI, developer tools, hosting, databases, authentication, AI, observability, analytics, dev stack, provisioning, infrastructure, coding agents, terminal&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺170&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2026-03-27 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;9-voxtral-tts-by-mistral-ai&#34;&gt;9. Voxtral TTS by Mistral AI
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Multilingual TTS model with realistic and expressive speech&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Voxtral TTS is Mistral AI&amp;rsquo;s first text-to-speech model, delivering state-of-the-art multilingual speech synthesis with realistic, emotionally expressive voices. Low latency, voice cloning, and support for 9 languages make it ideal for scalable voice agents and enterprise workflows.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/627BPOH5ZBLWMR?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/mistral-7b?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/8d9384c0-f31b-4e6d-ac91-317baef56b43.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Voxtral TTS by Mistral AI&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Voxtral, Mistral AI, TTS, text-to-speech, multilingual, realistic speech, expressive voices, voice cloning, low latency, voice agents, enterprise workflows, scalable&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺163&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2026-03-27 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;10-audos-publishing-house&#34;&gt;10. Audos Publishing House
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Build an AI business, get up to $100K. No equity taken&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Audos Publishing House helps everyday entrepreneurs build million-dollar AI-native businesses with tools, mentorship, and up to $100K in funding - for 0% equity. From the team behind BarkBox and Ro. Now supercharged by the acquisition of No Cap, the world&amp;rsquo;s first AI investor.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/PJFD3R55EDQXRH?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/socap?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/d8412d5e-922d-4936-9cd7-6122c9f04f16.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Audos Publishing House&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI business funding, no equity, AI startup funding, entrepreneur tools, mentorship, Audos Publishing House, AI investor, build AI business, $100K funding, zero equity, AI-native business, No Cap acquisition, startup capital&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺155&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2026-03-27 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
</description>
        </item>
        <item>
        <title>ebook2audiobook</title>
        <link>https://producthunt.programnotes.cn/en/p/ebook2audiobook/</link>
        <pubDate>Wed, 22 Oct 2025 15:28:41 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/ebook2audiobook/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1708000590735-6aee991a7b29?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NjExMTgwNzZ8&amp;ixlib=rb-4.1.0" alt="Featured image of post ebook2audiobook" /&gt;&lt;h1 id=&#34;drewthomassonebook2audiobook&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobook&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;DrewThomasson/ebook2audiobook&lt;/a&gt;
&lt;/h1&gt;&lt;h1 id=&#34;-ebook2audiobook&#34;&gt;📚 ebook2audiobook
&lt;/h1&gt;&lt;p&gt;CPU/GPU Converter from eBooks to audiobooks with chapters and metadata&lt;br/&gt;
using XTTSv2, Bark, Vits, Fairseq, YourTTS, Tacotron and more. Supports voice cloning and +1110 languages!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[!IMPORTANT]
&lt;strong&gt;This tool is intended for use with non-DRM, legally acquired eBooks only.&lt;/strong&gt; &lt;br&gt;
The authors are not responsible for any misuse of this software or any resulting legal consequences. &lt;br&gt;
Use this tool responsibly and in accordance with all applicable laws.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://discord.gg/63Tv3F65k6&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://dcbadge.limes.pink/api/server/https://discord.gg/63Tv3F65k6&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Discord&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;thanks-to-support-ebook2audiobook-developers&#34;&gt;Thanks for supporting the ebook2audiobook developers!
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://ko-fi.com/athomasson2&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Ko--fi-F16061?style=for-the-badge&amp;amp;logo=ko-fi&amp;amp;logoColor=white&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Ko-Fi&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;run-locally&#34;&gt;Run locally
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;#launching-gradio-web-interface&#34; &gt;&lt;img src=&#34;https://img.shields.io/badge/Quick%20Start-blue?style=for-the-badge&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Quick Start&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobook/actions/workflows/Docker-Build.yml&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://github.com/DrewThomasson/ebook2audiobook/actions/workflows/Docker-Build.yml/badge.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Docker Build&#34;
	
	
&gt;&lt;/a&gt;  &lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobook/releases/latest&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Download-Now-blue.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Download&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;a href=&#34;https://github.com/DrewThomasson/ebook2audiobook&#34;&gt;
  &lt;img src=&#34;https://img.shields.io/badge/Platform-mac%20|%20linux%20|%20windows-lightgrey&#34; alt=&#34;Platform&#34;&gt;
&lt;/a&gt;&lt;a href=&#34;https://hub.docker.com/r/athomasson2/ebook2audiobook&#34;&gt;
&lt;img alt=&#34;Docker Pull Count&#34; src=&#34;https://img.shields.io/docker/pulls/athomasson2/ebook2audiobook.svg&#34;/&gt;
&lt;/a&gt;
&lt;h3 id=&#34;run-remotely&#34;&gt;Run Remotely
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/spaces/drewThomasson/ebook2audiobook&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Hugging%20Face-Spaces-yellow?style=flat&amp;amp;logo=huggingface&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Hugging Face&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://colab.research.google.com/github/DrewThomasson/ebook2audiobook/blob/main/Notebooks/colab_ebook2audiobook.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://colab.research.google.com/assets/colab-badge.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Free Google Colab&#34;
	
	
&gt;&lt;/a&gt; &lt;a class=&#34;link&#34; href=&#34;https://github.com/Rihcus/ebook2audiobookXTTS/blob/main/Notebooks/kaggle-ebook2audiobook.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Kaggle-035a7d?style=flat&amp;amp;logo=kaggle&amp;amp;logoColor=white&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Kaggle&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4 id=&#34;gui-interface&#34;&gt;GUI Interface
&lt;/h4&gt;&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/assets/demo_web_gui.gif&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;demo_web_gui&#34;
	
	
&gt;&lt;/p&gt;
&lt;details&gt;
  &lt;summary&gt;Click to see images of Web GUI&lt;/summary&gt;
  &lt;img width=&#34;1728&#34; alt=&#34;GUI Screen 1&#34; src=&#34;assets/gui_1.png&#34;&gt;
  &lt;img width=&#34;1728&#34; alt=&#34;GUI Screen 2&#34; src=&#34;assets/gui_2.png&#34;&gt;
  &lt;img width=&#34;1728&#34; alt=&#34;GUI Screen 3&#34; src=&#34;assets/gui_3.png&#34;&gt;
&lt;/details&gt;
&lt;h2 id=&#34;demos&#34;&gt;Demos
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;New Default Voice Demo&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/user-attachments/assets/750035dc-e355-46f1-9286-05c1d9e88cea&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/user-attachments/assets/750035dc-e355-46f1-9286-05c1d9e88cea&lt;/a&gt;&lt;/p&gt;
&lt;details&gt;
  &lt;summary&gt;More Demos&lt;/summary&gt;
&lt;p&gt;&lt;strong&gt;ASMR Voice&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/user-attachments/assets/68eee9a1-6f71-4903-aacd-47397e47e422&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/user-attachments/assets/68eee9a1-6f71-4903-aacd-47397e47e422&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Rainy Day Voice&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/user-attachments/assets/d25034d9-c77f-43a9-8f14-0d167172b080&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/user-attachments/assets/d25034d9-c77f-43a9-8f14-0d167172b080&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Scarlett Voice&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/user-attachments/assets/b12009ee-ec0d-45ce-a1ef-b3a52b9f8693&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/user-attachments/assets/b12009ee-ec0d-45ce-a1ef-b3a52b9f8693&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;David Attenborough Voice&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/user-attachments/assets/81c4baad-117e-4db5-ac86-efc2b7fea921&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/user-attachments/assets/81c4baad-117e-4db5-ac86-efc2b7fea921&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://github.com/DrewThomasson/VoxNovel/blob/dc5197dff97252fa44c391dc0596902d71278a88/readme_files/example_in_app.jpeg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Example&#34;
	
	
&gt;&lt;/p&gt;
&lt;/details&gt;
&lt;h2 id=&#34;readmemd&#34;&gt;README.md
&lt;/h2&gt;&lt;h2 id=&#34;table-of-contents&#34;&gt;Table of Contents
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#-ebook2audiobook&#34; &gt;ebook2audiobook&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#features&#34; &gt;Features&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#gui-interface&#34; &gt;GUI Interface&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#demos&#34; &gt;Demos&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#supported-languages&#34; &gt;Supported Languages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#hardware-requirements&#34; &gt;Minimum Requirements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#launching-gradio-web-interface&#34; &gt;Usage&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#launching-gradio-web-interface&#34; &gt;Run Locally&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#launching-gradio-web-interface&#34; &gt;Launching Gradio Web Interface&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#basic--usage&#34; &gt;Basic Headless Usage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#example-of-custom-model-zip-upload&#34; &gt;Headless Custom XTTS Model Usage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#help-command-output&#34; &gt;Help command output&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#run-remotely&#34; &gt;Run Remotely&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#fine-tuned-tts-models&#34; &gt;Fine Tuned TTS models&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#fine-tuned-tts-collection&#34; &gt;Collection of Fine-Tuned TTS Models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#fine-tune-your-own-xttsv2-model&#34; &gt;Train XTTSv2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#docker-gpu-options&#34; &gt;Docker&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#docker-gpu-options&#34; &gt;GPU options&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#running-the-pre-built-docker-container&#34; &gt;Docker Run&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#building-the-docker-container&#34; &gt;Docker Build&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#docker-compose&#34; &gt;Docker Compose&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#docker-headless-guide&#34; &gt;Docker headless guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#docker-container-file-locations&#34; &gt;Docker container file locations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#common-docker-issues&#34; &gt;Common Docker issues&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#supported-ebook-formats&#34; &gt;Supported eBook Formats&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#output-formats&#34; &gt;Output Formats&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#updating-to-latest-version&#34; &gt;Updating to Latest Version&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#reverting-to-older-versions&#34; &gt;Revert to older Version&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#common-issues&#34; &gt;Common Issues&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#special-thanks&#34; &gt;Special Thanks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#table-of-contents&#34; &gt;Table of Contents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;features&#34;&gt;Features
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;📚 Splits eBook into chapters for organized audio.&lt;/li&gt;
&lt;li&gt;🎙️ High-quality text-to-speech with &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/coqui/XTTS-v2&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Coqui XTTSv2&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/facebookresearch/fairseq/tree/main/examples/mms&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Fairseq&lt;/a&gt; (and more).&lt;/li&gt;
&lt;li&gt;🗣️ Optional voice cloning with your own voice file.&lt;/li&gt;
&lt;li&gt;🌍 Supports +1110 languages (English by default). &lt;a class=&#34;link&#34; href=&#34;https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;List of Supported languages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;🖥️ Designed to run on 4GB RAM.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;supported-languages&#34;&gt;Supported Languages
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Arabic (ar)&lt;/strong&gt;&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Chinese (zh)&lt;/strong&gt;&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;&lt;strong&gt;English (en)&lt;/strong&gt;&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Spanish (es)&lt;/strong&gt;&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;French (fr)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;German (de)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Italian (it)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Portuguese (pt)&lt;/strong&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Polish (pl)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Turkish (tr)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Russian (ru)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Dutch (nl)&lt;/strong&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Czech (cs)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Japanese (ja)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Hindi (hi)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Bengali (bn)&lt;/strong&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Hungarian (hu)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Korean (ko)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Vietnamese (vi)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Swedish (sv)&lt;/strong&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Persian (fa)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Yoruba (yo)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Swahili (sw)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Indonesian (id)&lt;/strong&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Slovak (sk)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Croatian (hr)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Tamil (ta)&lt;/strong&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Danish (da)&lt;/strong&gt;&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;strong&gt;+1100 languages and dialects here&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;hardware-requirements&#34;&gt;Hardware Requirements
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;4GB RAM minimum, 8GB recommended&lt;/li&gt;
&lt;li&gt;Virtualization enabled if running on Windows (Docker only)&lt;/li&gt;
&lt;li&gt;CPU (intel, AMD, ARM), GPU (Nvidia, AMD*, Intel*) (Recommended), MPS (Apple Silicon CPU)
*available very soon&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;[!IMPORTANT]
&lt;strong&gt;Before posting an install or bug issue, search carefully through the opened and closed issues tab&lt;br&gt;
to be sure your issue does not already exist.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;[!NOTE]
&lt;strong&gt;Since eBooks lack any standard structure defining what is a chapter, paragraph, preface etc.,&lt;br&gt;
you should first manually remove any text you don&amp;rsquo;t want converted to audio.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id=&#34;installation-instructions&#34;&gt;Installation Instructions
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Clone repo&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone https://github.com/DrewThomasson/ebook2audiobook.git
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; ebook2audiobook
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;launching-gradio-web-interface&#34;&gt;Launching Gradio Web Interface
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Run ebook2audiobook&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Linux/MacOS&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./ebook2audiobook.sh  &lt;span class=&#34;c1&#34;&gt;# Run launch script&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mac Launcher&lt;/strong&gt;&lt;br&gt;
Double click &lt;code&gt;Mac Ebook2Audiobook Launcher.command&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Windows&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ebook2audiobook.cmd  &lt;span class=&#34;c1&#34;&gt;# Run launch script or double click on it&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Windows Launcher&lt;/strong&gt;&lt;br&gt;
Double click &lt;code&gt;ebook2audiobook.cmd&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Manual Python Install&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# (for experts only!)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;REQUIRED_PROGRAMS&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;calibre&amp;#34;&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;ffmpeg&amp;#34;&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;nodejs&amp;#34;&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;mecab&amp;#34;&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;espeak-ng&amp;#34;&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;rust&amp;#34;&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;sox&amp;#34;&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;REQUIRED_PYTHON_VERSION&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;3.12&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install -r requirements.txt  &lt;span class=&#34;c1&#34;&gt;# Install Python Requirements&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;python app.py  &lt;span class=&#34;c1&#34;&gt;# Run Ebook2Audiobook&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Open the Web App&lt;/strong&gt;: Click the URL provided in the terminal to access the web app and convert eBooks. &lt;code&gt;http://localhost:7860/&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;For Public Link&lt;/strong&gt;:
&lt;code&gt;python app.py --share&lt;/code&gt; (all OS)
&lt;code&gt;./ebook2audiobook.sh --share&lt;/code&gt; (Linux/MacOS)
&lt;code&gt;ebook2audiobook.cmd --share&lt;/code&gt; (Windows)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote&gt;
&lt;p&gt;[!IMPORTANT]
&lt;strong&gt;If the script is stopped and run again, you need to refresh your gradio GUI interface&lt;br&gt;
to let the web page reconnect to the new connection socket.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id=&#34;basic--usage&#34;&gt;Basic Usage
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Linux/MacOS&lt;/strong&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./ebook2audiobook.sh --headless --ebook &amp;lt;path_to_ebook_file&amp;gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    --voice &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;path_to_voice_file&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; --language &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;language_code&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Windows&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ebook2audiobook.cmd --headless --ebook &amp;lt;path_to_ebook_file&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    --voice &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;path_to_voice_file&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; --language &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;language_code&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;[--ebook]&lt;/strong&gt;: Path to your eBook file&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;[--voice]&lt;/strong&gt;: Voice cloning file path (optional)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;[--language]&lt;/strong&gt;: Language code in ISO-639-3 (e.g.: ita for Italian, eng for English, deu for German&amp;hellip;).&lt;br&gt;
Default language is eng and --language is optional for the default language set in ./lib/lang.py.&lt;br&gt;
The ISO-639-1 two-letter codes are also supported.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;example-of-custom-model-zip-upload&#34;&gt;Example of Custom Model Zip Upload
&lt;/h3&gt;&lt;p&gt;(must be a .zip file containing the mandatory model files. Example for XTTSv2: config.json, model.pth, vocab.json and ref.wav)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Linux/MacOS&lt;/strong&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./ebook2audiobook.sh --headless --ebook &amp;lt;ebook_file_path&amp;gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    --voice &amp;lt;target_voice_file_path&amp;gt; --language &amp;lt;language&amp;gt; --custom_model &amp;lt;custom_model_path&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Windows&lt;/strong&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ebook2audiobook.cmd --headless --ebook &amp;lt;ebook_file_path&amp;gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    --voice &amp;lt;target_voice_file_path&amp;gt; --language &amp;lt;language&amp;gt; --custom_model &amp;lt;custom_model_path&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&amp;lt;custom_model_path&amp;gt;&lt;/strong&gt;: Path to &lt;code&gt;model_name.zip&lt;/code&gt; file,
which must contain (according to the tts engine) all the mandatory files&lt;br&gt;
(see ./lib/models.py).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;for-detailed-guide-with-list-of-all-parameters-to-use&#34;&gt;For Detailed Guide with list of all Parameters to use
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Linux/MacOS&lt;/strong&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./ebook2audiobook.sh --help
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Windows&lt;/strong&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ebook2audiobook.cmd --help
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Or for all OS&lt;/strong&gt;
&lt;code&gt;python app.py --help &lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a id=&#34;help-command-output&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;22
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;23
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;24
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;25
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;26
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;27
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;28
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;29
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;30
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;31
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;32
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;33
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;34
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;35
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;36
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;37
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;38
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;39
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;40
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;41
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;42
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;43
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;44
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;45
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;46
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;47
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;48
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;49
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;50
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;51
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;52
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;53
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;54
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;55
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;56
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;57
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;58
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;59
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;60
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;61
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;62
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;63
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;64
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;65
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;66
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;67
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;68
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;69
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;70
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;71
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;72
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;73
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;74
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;75
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;76
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;77
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;78
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;79
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;80
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;81
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;82
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;83
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;84
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;85
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;86
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;87
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;88
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;89
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;90
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;91
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;92
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;93
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;94
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;95
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;96
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;97
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;usage: app.py &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;-h&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--session SESSION&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--share&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--headless&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--ebook EBOOK&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;              &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--ebooks_dir EBOOKS_DIR&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--language LANGUAGE&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--voice VOICE&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;              &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--device &lt;span class=&#34;o&#34;&gt;{&lt;/span&gt;cpu,gpu,mps&lt;span class=&#34;o&#34;&gt;}]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;              &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--tts_engine &lt;span class=&#34;o&#34;&gt;{&lt;/span&gt;XTTSv2,BARK,VITS,FAIRSEQ,TACOTRON2,YOURTTS,xtts,bark,vits,fairseq,tacotron,yourtts&lt;span class=&#34;o&#34;&gt;}]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;              &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--custom_model CUSTOM_MODEL&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--fine_tuned FINE_TUNED&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;              &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--output_format OUTPUT_FORMAT&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--temperature TEMPERATURE&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;              &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--length_penalty LENGTH_PENALTY&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--num_beams NUM_BEAMS&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;              &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--repetition_penalty REPETITION_PENALTY&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--top_k TOP_K&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;              &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--top_p TOP_P&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--speed SPEED&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--enable_text_splitting&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;              &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--text_temp TEXT_TEMP&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--waveform_temp WAVEFORM_TEMP&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;              &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--output_dir OUTPUT_DIR&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;--version&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the Gradio interface or run the script in headless mode &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; direct conversion.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;options:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  -h, --help            show this &lt;span class=&#34;nb&#34;&gt;help&lt;/span&gt; message and &lt;span class=&#34;nb&#34;&gt;exit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --session SESSION     Session to resume the conversion in &lt;span class=&#34;k&#34;&gt;case&lt;/span&gt; of interruption, crash, 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            or reuse of custom models and custom cloning voices.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;**** The following options are &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; all modes:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  Optional
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;**** The following option are &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; gradio/gui mode only:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  Optional
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --share               Enable a public shareable Gradio link.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;**** The following options are &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; --headless mode only:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --headless            Run the script in headless mode
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --ebook EBOOK         Path to the ebook file &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; conversion. Cannot be used when --ebooks_dir is present.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --ebooks_dir EBOOKS_DIR
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        Relative or absolute path of the directory containing the files to convert. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Cannot be used when --ebook is present.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --language LANGUAGE   Language of the e-book. Default language is &lt;span class=&#34;nb&#34;&gt;set&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            in ./lib/lang.py set as default &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; not present. All compatible language codes are in ./lib/lang.py
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;optional parameters:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --voice VOICE         &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;Optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Path to the voice cloning file &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; TTS engine. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Uses the default voice &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; not present.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --device &lt;span class=&#34;o&#34;&gt;{&lt;/span&gt;cpu,gpu,mps&lt;span class=&#34;o&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;Optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Processor unit &lt;span class=&#34;nb&#34;&gt;type&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; the conversion. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Default is &lt;span class=&#34;nb&#34;&gt;set&lt;/span&gt; in ./lib/conf.py &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; not present. Fall back to CPU &lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; GPU not available.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --tts_engine &lt;span class=&#34;o&#34;&gt;{&lt;/span&gt;XTTSv2,BARK,VITS,FAIRSEQ,TACOTRON2,YOURTTS,xtts,bark,vits,fairseq,tacotron,yourtts&lt;span class=&#34;o&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;Optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Preferred TTS engine &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;available are: &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;XTTSv2&amp;#39;&lt;/span&gt;, &lt;span class=&#34;s1&#34;&gt;&amp;#39;BARK&amp;#39;&lt;/span&gt;, &lt;span class=&#34;s1&#34;&gt;&amp;#39;VITS&amp;#39;&lt;/span&gt;, &lt;span class=&#34;s1&#34;&gt;&amp;#39;FAIRSEQ&amp;#39;&lt;/span&gt;, &lt;span class=&#34;s1&#34;&gt;&amp;#39;TACOTRON2&amp;#39;&lt;/span&gt;, &lt;span class=&#34;s1&#34;&gt;&amp;#39;YOURTTS&amp;#39;&lt;/span&gt;, &lt;span class=&#34;s1&#34;&gt;&amp;#39;xtts&amp;#39;&lt;/span&gt;, &lt;span class=&#34;s1&#34;&gt;&amp;#39;bark&amp;#39;&lt;/span&gt;, &lt;span class=&#34;s1&#34;&gt;&amp;#39;vits&amp;#39;&lt;/span&gt;, &lt;span class=&#34;s1&#34;&gt;&amp;#39;fairseq&amp;#39;&lt;/span&gt;, &lt;span class=&#34;s1&#34;&gt;&amp;#39;tacotron&amp;#39;&lt;/span&gt;, &lt;span class=&#34;s1&#34;&gt;&amp;#39;yourtts&amp;#39;&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Default depends on the selected language. The tts engine should be compatible with the chosen language
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --custom_model CUSTOM_MODEL
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;Optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Path to the custom model zip file containing mandatory model files. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Please refer to ./lib/models.py
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --fine_tuned FINE_TUNED
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;Optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Fine tuned model path. Default is &lt;span class=&#34;nb&#34;&gt;builtin&lt;/span&gt; model.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --output_format OUTPUT_FORMAT
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;Optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Output audio format. Default is &lt;span class=&#34;nb&#34;&gt;set&lt;/span&gt; in ./lib/conf.py
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --temperature TEMPERATURE
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;xtts only, optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Temperature &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; the model. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Default to config.json model. Higher temperatures lead to more creative outputs.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --length_penalty LENGTH_PENALTY
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;xtts only, optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; A length penalty applied to the autoregressive decoder. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Default to config.json model. Not applied to custom models.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --num_beams NUM_BEAMS
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;xtts only, optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Controls how many alternative sequences the model explores. Must be equal or greater than length penalty. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Default to config.json model.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --repetition_penalty REPETITION_PENALTY
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;xtts only, optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; A penalty that prevents the autoregressive decoder from repeating itself. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Default to config.json model.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --top_k TOP_K         &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;xtts only, optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Top-k sampling. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Lower values mean more likely outputs and increased audio generation speed. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Default to config.json model.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --top_p TOP_P         &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;xtts only, optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Top-p sampling. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Lower values mean more likely outputs and increased audio generation speed. Default to config.json model.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --speed SPEED         &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;xtts only, optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Speed factor &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; the speech generation. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Default to config.json model.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --enable_text_splitting
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;xtts only, optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Enable TTS text splitting. This option is known to not be very efficient. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Default to config.json model.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --text_temp TEXT_TEMP
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;bark only, optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Text Temperature &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; the model. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Default to 0.85. Higher temperatures lead to more creative outputs.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --waveform_temp WAVEFORM_TEMP
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;bark only, optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Waveform Temperature &lt;span class=&#34;k&#34;&gt;for&lt;/span&gt; the model. 
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                            Default to 0.5. Higher temperatures lead to more creative outputs.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --output_dir OUTPUT_DIR
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                        &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;Optional&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; Path to the output directory. Default is &lt;span class=&#34;nb&#34;&gt;set&lt;/span&gt; in ./lib/conf.py
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --version             Show the version of the script and &lt;span class=&#34;nb&#34;&gt;exit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Example usage:    
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Windows:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    Gradio/GUI:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    ebook2audiobook.cmd
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    Headless mode:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    ebook2audiobook.cmd --headless --ebook &lt;span class=&#34;s1&#34;&gt;&amp;#39;/path/to/file&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Linux/Mac:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    Gradio/GUI:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    ./ebook2audiobook.sh
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    Headless mode:
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    ./ebook2audiobook.sh --headless --ebook &lt;span class=&#34;s1&#34;&gt;&amp;#39;/path/to/file&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;Tip: to add silence &lt;span class=&#34;o&#34;&gt;(&lt;/span&gt;1.4 seconds&lt;span class=&#34;o&#34;&gt;)&lt;/span&gt; into your text just use &lt;span class=&#34;s2&#34;&gt;&amp;#34;###&amp;#34;&lt;/span&gt; or &lt;span class=&#34;s2&#34;&gt;&amp;#34;[pause]&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;NOTE: in gradio/gui mode, to cancel a running conversion, just click on the [X] from the ebook upload component.&lt;/p&gt;
&lt;p&gt;TIP: if it needs some more pauses, just add &amp;lsquo;###&amp;rsquo; or &amp;lsquo;[pause]&amp;rsquo; between the words where you wish more pause. One [pause] equals 1.4 seconds&lt;/p&gt;
&lt;h4 id=&#34;docker-gpu-options&#34;&gt;Docker GPU Options
&lt;/h4&gt;&lt;p&gt;Available pre-built tags: &lt;code&gt;latest&lt;/code&gt; (CUDA 11.8)&lt;/p&gt;
&lt;h4 id=&#34;edit-if-gpu-isnt-detected-then-youll-have-to-build-the-image---building-the-docker-container&#34;&gt;Edit: IF GPU isn&amp;rsquo;t detected then you&amp;rsquo;ll have to build the image -&amp;gt; &lt;a class=&#34;link&#34; href=&#34;#building-the-docker-container&#34; &gt;Building the Docker Container&lt;/a&gt;
&lt;/h4&gt;&lt;h4 id=&#34;running-the-pre-built-docker-container&#34;&gt;Running the pre-built Docker Container
&lt;/h4&gt;&lt;p&gt;-Run with CPU only&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-powershell&#34; data-lang=&#34;powershell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;docker&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;run&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;-pull&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;always&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;-rm&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;-p&lt;/span&gt; &lt;span class=&#34;mf&#34;&gt;7860&lt;/span&gt;&lt;span class=&#34;err&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;7860&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;athomasson2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ebook2audiobook&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;-Run with GPU Speedup (NVIDIA compatible only)&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-powershell&#34; data-lang=&#34;powershell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;docker&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;run&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;-pull&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;always&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;-rm&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;-gpus&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;all&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;-p&lt;/span&gt; &lt;span class=&#34;mf&#34;&gt;7860&lt;/span&gt;&lt;span class=&#34;err&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;mf&#34;&gt;7860&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;athomasson2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ebook2audiobook&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This command will start the Gradio interface on port 7860 (localhost:7860).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For more options add the parameter &lt;code&gt;--help&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;building-the-docker-container&#34;&gt;Building the Docker Container
&lt;/h4&gt;&lt;ul&gt;
&lt;li&gt;You can build the docker image with the command:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-powershell&#34; data-lang=&#34;powershell&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;docker&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;build&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;-t&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;athomasson2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ebook2audiobook&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h4 id=&#34;avalible-docker-build-arguments&#34;&gt;Available Docker Build Arguments
&lt;/h4&gt;&lt;p&gt;&lt;code&gt;--build-arg TORCH_VERSION=cuda118&lt;/code&gt; Available tags: [cuda121, cuda118, cuda128, rocm, xpu, cpu]&lt;/p&gt;
&lt;p&gt;All CUDA version numbers should work, Ex: CUDA 11.6-&amp;gt; cuda116&lt;/p&gt;
&lt;p&gt;&lt;code&gt;--build-arg SKIP_XTTS_TEST=true&lt;/code&gt; (Saves space by not baking XTTSv2 model into docker image)&lt;/p&gt;
&lt;h2 id=&#34;docker-container-file-locations&#34;&gt;Docker container file locations
&lt;/h2&gt;&lt;p&gt;All ebook2audiobooks will have the base dir of &lt;code&gt;/app/&lt;/code&gt;
For example:
&lt;code&gt;tmp&lt;/code&gt; = &lt;code&gt;/app/tmp&lt;/code&gt;
&lt;code&gt;audiobooks&lt;/code&gt; = &lt;code&gt;/app/audiobooks&lt;/code&gt;&lt;/p&gt;
&lt;h2 id=&#34;docker-headless-guide&#34;&gt;Docker headless guide
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Before you do run this you need to create a dir named &amp;ldquo;input-folder&amp;rdquo; in your current dir
which will be linked, This is where you can put your input files for the docker image to see&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;mkdir input-folder &lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; mkdir Audiobooks
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;In the command below swap out &lt;strong&gt;YOUR_INPUT_FILE.TXT&lt;/strong&gt; with the name of your input file&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker run --pull always --rm &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    -v &lt;span class=&#34;k&#34;&gt;$(&lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;pwd&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;)&lt;/span&gt;/input-folder:/app/input_folder &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    -v &lt;span class=&#34;k&#34;&gt;$(&lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;pwd&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;)&lt;/span&gt;/audiobooks:/app/audiobooks &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    athomasson2/ebook2audiobook &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    --headless --ebook /input_folder/YOUR_EBOOK_FILE
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;The output Audiobooks will be found in the Audiobook folder which will also be located
in your local dir you ran this docker command in&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;to-get-the-help-command-for-the-other-parameters-this-program-has-you-can-run-this&#34;&gt;To get the help command for the other parameters this program has you can run this
&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker run --pull always --rm athomasson2/ebook2audiobook --help
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;That will output this
&lt;a class=&#34;link&#34; href=&#34;#help-command-output&#34; &gt;Help command output&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;docker-compose&#34;&gt;Docker Compose
&lt;/h3&gt;&lt;p&gt;This project uses Docker Compose to run locally. You can enable or disable GPU support
by setting either &lt;code&gt;*gpu-enabled&lt;/code&gt; or &lt;code&gt;*gpu-disabled&lt;/code&gt; in &lt;code&gt;docker-compose.yml&lt;/code&gt;&lt;/p&gt;
&lt;h4 id=&#34;steps-to-run&#34;&gt;Steps to Run
&lt;/h4&gt;&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Clone the Repository&lt;/strong&gt; (if you haven&amp;rsquo;t already):
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone https://github.com/DrewThomasson/ebook2audiobook.git
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; ebook2audiobook
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Set GPU Support (disabled by default)&lt;/strong&gt;
To enable GPU support, modify &lt;code&gt;docker-compose.yml&lt;/code&gt; and change &lt;code&gt;*gpu-disabled&lt;/code&gt; to &lt;code&gt;*gpu-enabled&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Start the service:&lt;/strong&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Docker&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker-compose up -d &lt;span class=&#34;c1&#34;&gt;# To update add --build&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Podman&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;podman compose -f podman-compose.yml up -d &lt;span class=&#34;c1&#34;&gt;# To update add --build&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Access the service:&lt;/strong&gt;
The service will be available at http://localhost:7860.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;common-docker-issues&#34;&gt;Common Docker Issues
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;My NVIDIA GPU isn&amp;rsquo;t being detected?? -&amp;gt; &lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobook/wiki/GPU-ISSUES&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GPU ISSUES Wiki Page&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;python: can&#39;t open file &#39;/home/user/app/app.py&#39;: [Errno 2] No such file or directory&lt;/code&gt; (Just remove all post arguments as I replaced the &lt;code&gt;CMD&lt;/code&gt; with &lt;code&gt;ENTRYPOINT&lt;/code&gt; in the &lt;a class=&#34;link&#34; href=&#34;Dockerfile&#34; &gt;Dockerfile&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Example: &lt;code&gt;docker run --pull always athomasson2/ebook2audiobook app.py --script_mode full_docker&lt;/code&gt; - &amp;gt; corrected - &amp;gt; &lt;code&gt;docker run --pull always athomasson2/ebook2audiobook&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Arguments can be easily added like this now &lt;code&gt;docker run --pull always athomasson2/ebook2audiobook --share&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Docker gets stuck downloading Fine-Tuned models.
(This does not happen for every computer but some appear to run into this issue)
Disabling the progress bar appears to fix the issue,
as discussed &lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobook/issues/191&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here in #191&lt;/a&gt;
Example of adding this fix in the &lt;code&gt;docker run&lt;/code&gt; command&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-Dockerfile&#34; data-lang=&#34;Dockerfile&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker run --pull always --rm --gpus all -e &lt;span class=&#34;nv&#34;&gt;HF_HUB_DISABLE_PROGRESS_BARS&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;1&lt;/span&gt; -e &lt;span class=&#34;nv&#34;&gt;HF_HUB_ENABLE_HF_TRANSFER&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;0&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    -p 7860:7860 athomasson2/ebook2audiobook&lt;span class=&#34;err&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;fine-tuned-tts-models&#34;&gt;Fine Tuned TTS models
&lt;/h2&gt;&lt;h4 id=&#34;fine-tune-your-own-xttsv2-model&#34;&gt;Fine Tune your own XTTSv2 model
&lt;/h4&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/spaces/drewThomasson/xtts-finetune-webui-gpu&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Hugging%20Face-Spaces-yellow?style=flat&amp;amp;logo=huggingface&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Hugging Face&#34;
	
	
&gt;&lt;/a&gt; &lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobook/blob/v25/Notebooks/finetune/xtts/kaggle-xtts-finetune-webui-gradio-gui.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Kaggle-035a7d?style=flat&amp;amp;logo=kaggle&amp;amp;logoColor=white&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Kaggle&#34;
	
	
&gt;&lt;/a&gt; &lt;a class=&#34;link&#34; href=&#34;https://colab.research.google.com/github/DrewThomasson/ebook2audiobook/blob/v25/Notebooks/finetune/xtts/colab_xtts_finetune_webui.ipynb&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://colab.research.google.com/assets/colab-badge.svg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Open In Colab&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4 id=&#34;de-noise-training-data&#34;&gt;De-noise training data
&lt;/h4&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/spaces/drewThomasson/DeepFilterNet2_no_limit&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Hugging%20Face-Spaces-yellow?style=flat&amp;amp;logo=huggingface&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Hugging Face&#34;
	
	
&gt;&lt;/a&gt; &lt;a class=&#34;link&#34; href=&#34;https://github.com/Rikorose/DeepFilterNet&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/DeepFilterNet-181717?logo=github&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;GitHub Repo&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&#34;fine-tuned-tts-collection&#34;&gt;Fine Tuned TTS Collection
&lt;/h3&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/drewThomasson/fineTunedTTSModels/tree/main&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Hugging%20Face-Models-yellow?style=flat&amp;amp;logo=huggingface&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Hugging Face&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For an XTTSv2 custom model a ref audio clip of the voice reference is mandatory:&lt;/p&gt;
&lt;h2 id=&#34;supported-ebook-formats&#34;&gt;Supported eBook Formats
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;.epub&lt;/code&gt;, &lt;code&gt;.pdf&lt;/code&gt;, &lt;code&gt;.mobi&lt;/code&gt;, &lt;code&gt;.txt&lt;/code&gt;, &lt;code&gt;.html&lt;/code&gt;, &lt;code&gt;.rtf&lt;/code&gt;, &lt;code&gt;.chm&lt;/code&gt;, &lt;code&gt;.lit&lt;/code&gt;,
&lt;code&gt;.pdb&lt;/code&gt;, &lt;code&gt;.fb2&lt;/code&gt;, &lt;code&gt;.odt&lt;/code&gt;, &lt;code&gt;.cbr&lt;/code&gt;, &lt;code&gt;.cbz&lt;/code&gt;, &lt;code&gt;.prc&lt;/code&gt;, &lt;code&gt;.lrf&lt;/code&gt;, &lt;code&gt;.pml&lt;/code&gt;,
&lt;code&gt;.snb&lt;/code&gt;, &lt;code&gt;.cbc&lt;/code&gt;, &lt;code&gt;.rb&lt;/code&gt;, &lt;code&gt;.tcr&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Best results&lt;/strong&gt;: &lt;code&gt;.epub&lt;/code&gt; or &lt;code&gt;.mobi&lt;/code&gt; for automatic chapter detection&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;output-formats&#34;&gt;Output Formats
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Creates a &lt;code&gt;[&#39;m4b&#39;, &#39;m4a&#39;, &#39;mp4&#39;, &#39;webm&#39;, &#39;mov&#39;, &#39;mp3&#39;, &#39;flac&#39;, &#39;wav&#39;, &#39;ogg&#39;, &#39;aac&#39;]&lt;/code&gt; (set in ./lib/conf.py) file with metadata and chapters.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;updating-to-latest-version&#34;&gt;Updating to Latest Version
&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git pull &lt;span class=&#34;c1&#34;&gt;# Locally/Compose&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker pull athomasson2/ebook2audiobook:latest &lt;span class=&#34;c1&#34;&gt;# For Pre-built Docker images&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;reverting-to-older-versions&#34;&gt;Reverting to older Versions
&lt;/h2&gt;&lt;p&gt;Releases can be found -&amp;gt; &lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobook/releases&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git checkout tags/VERSION_NUM &lt;span class=&#34;c1&#34;&gt;# Locally/Compose -&amp;gt; Example: git checkout tags/v25.7.7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;athomasson2/ebook2audiobook:VERSION_NUM &lt;span class=&#34;c1&#34;&gt;# For Pre-built Docker images -&amp;gt; Example: athomasson2/ebook2audiobook:v25.7.7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;common-issues&#34;&gt;Common Issues:
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;My NVIDIA GPU isn&amp;rsquo;t being detected? -&amp;gt; &lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobook/wiki/GPU-ISSUES&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GPU ISSUES Wiki Page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;CPU is slow (better on server smp CPU) while NVIDIA GPU can have almost real time conversion.
&lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobook/discussions/19#discussioncomment-10879846&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Discussion about this&lt;/a&gt;
For faster multilingual generation I would suggest my other
&lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobookpiper-tts&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;project that uses piper-tts&lt;/a&gt; instead
(It doesn&amp;rsquo;t have zero-shot voice cloning though, and is Siri quality voices, but it is much faster on cpu).&lt;/li&gt;
&lt;li&gt;&amp;ldquo;I&amp;rsquo;m having dependency issues&amp;rdquo; - Just use the Docker image, it&amp;rsquo;s fully self-contained and has a headless mode,
add &lt;code&gt;--help&lt;/code&gt; parameter at the end of the docker run command for more information.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;I&amp;rsquo;m getting a truncated audio issue!&amp;rdquo; - PLEASE MAKE AN ISSUE OF THIS,
we don&amp;rsquo;t speak every language and need advice from users to fine-tune the sentence splitting logic.😊&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;what-we-need-help-with-&#34;&gt;What we need help with! 🙌
&lt;/h2&gt;&lt;h2 id=&#34;full-list-of-things-can-be-found-here&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobook/issues/32&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Full list of things can be found here&lt;/a&gt;
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Any help from people speaking any of the supported languages to help us improve the models&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;do-you-need-to-rent-a-gpu-to-boost-service-from-us&#34;&gt;Do you need to rent a GPU to boost service from us?
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;A poll is open here &lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobook/discussions/889&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/DrewThomasson/ebook2audiobook/discussions/889&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;special-thanks&#34;&gt;Special Thanks
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Coqui TTS&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://github.com/idiap/coqui-ai-TTS&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Coqui TTS GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Calibre&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://calibre-ebook.com&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Calibre Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;FFmpeg&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://ffmpeg.org&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;FFmpeg Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/DrewThomasson/ebook2audiobook/issues/8&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;@shakenbake15 for better chapter saving method&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        <item>
        <title>Product Hunt Daily | 2025-10-21</title>
        <link>https://producthunt.programnotes.cn/en/p/product-hunt-daily-2025-10-21/</link>
        <pubDate>Tue, 21 Oct 2025 07:30:35 +0000</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/product-hunt-daily-2025-10-21/</guid>
        <description>&lt;img src="https://ph-files.imgix.net/3092ad8c-69f9-4198-b0c6-4e148cd1bb66.png?auto=format" alt="Featured image of post Product Hunt Daily | 2025-10-21" /&gt;&lt;h2 id=&#34;1-fish-audio-s1&#34;&gt;1. Fish Audio S1
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Expressive Voice Cloning and Text-to-Speech&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Fish Audio S1 is the most expressive and emotionally rich TTS model—creating lifelike voices that capture emotion, rhythm, and nuance. Clone any voice in 10 seconds, preserving accent, tone, and speaking habits with unmatched realism.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/RXO5YOK7ZBZYFG?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/fish-speech?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/3092ad8c-69f9-4198-b0c6-4e148cd1bb66.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Fish Audio S1&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Voice cloning, text-to-speech, TTS, expressive, lifelike voices, emotion, rhythm, nuance, voice cloning, accent, tone, realism, Fish Audio S1&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺413&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;2-replymer&#34;&gt;2. Replymer
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Human replies that sell your product&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Replymer helps your brand grow through authentic, human‑written replies that recommend your product in the right conversations.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/ATCTFUFRUDRMHA?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/replymer?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/4f07fe8a-bb07-4ee8-8060-c848711686e8.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Replymer&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Human replies, product recommendations, brand growth, authentic replies, social selling, conversation marketing&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺379&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;3-logic-inc&#34;&gt;3. Logic, Inc.
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Automate recurring decisions in plain English&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Logic automates recurring decisions and reviews. Write your process once in plain English, and automate it anywhere. From content moderation to invoice processing, Logic lets you deploy in minutes, not months.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/LKORSMXKRP6577?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/logic-effortless-operational-magic?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/3b9c2e5e-f9f3-4746-8354-40d798608a71.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Logic, Inc.&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Automation, Decisions, Plain English, Process Automation, No-Code, Content Moderation, Invoice Processing, Deploy Quickly&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺291&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;4-voice-gecko&#34;&gt;4. Voice Gecko
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Voice dictation at your fingertips—type less, say more.&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Instant dictation for desktop. Press a shortcut, speak, and instantly get accurate text on your clipboard—perfect for emails, coding, AI prompts, or brain dumps.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/IN6NQQFFTBMSWU?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/voice-gecko?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/5b1a074d-221e-4952-aa01-ae53fb806e3e.jpeg?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Voice Gecko&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: voice dictation, dictation software, voice to text, speech to text, clipboard, desktop, productivity, typing, shortcut, AI prompts, brain dump, voice input&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺237&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;5-simplora&#34;&gt;5. Simplora
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Meetings that make you smarter, not confused&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Never feel lost in a meeting again! Simplora turns every conversation into a unique learning experience, in real-time and beyond. Available wherever you meet. No download required. Get started for free.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/EGCNW2QYNQ52JB?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/simplora?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/8d6520ce-6029-468d-a074-d99967a9dccc.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Simplora&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: Meetings, learning, real-time, no download, free, smarter, confusion, conversation, Simplora&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺188&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;6-diny&#34;&gt;6. diny
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: From git diff to clean commits&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: diny automates commit messages from your staged changes. Clean, consistent, conventional. Includes a timeline view of past commits to keep your history crystal clear.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/GCRJHKK2B3RWT5?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/diny?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/539d4587-2480-44fe-9e56-e972a86a8945.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;diny&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: git commits, commit messages, automation, git diff, clean commits, conventional commits, commit history, timeline view, developer tools&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺156&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;7-pylon&#34;&gt;7. Pylon
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: The support platform built for B2B&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: AI-Native support platform built for B2B companies. One tool for your ticketing, chat, knowledge base, AI support, account intelligence, and more.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/XWNWAI7CGNNFJB?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/pylon-4?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/b4955206-9405-4fbb-a55b-28ae15e6a5e5.jpeg?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Pylon&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: B2B support, AI support, ticketing, chat, knowledge base, account intelligence, support platform, AI-native&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺138&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;8-app2dev&#34;&gt;8. App2.dev
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Turn ideas &amp;amp; Figma designs into complete web &amp;amp; mobile apps&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Turn your ideas &amp;amp; Figma designs into web &amp;amp; mobile apps in minutes with backend, database, and authentication - all powered by AI.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/JWOK7RUANFXLZY?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/app2-dev?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/4303b056-9478-4598-8b41-cfb83162495c.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;App2.dev&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: App development, Figma to app, web app, mobile app, AI, no-code, backend, database, authentication, rapid development&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺114&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;9-aden-ai&#34;&gt;9. Aden AI
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Turn any file into a chatbot course &amp;amp; get certified with AI&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: We built the Aden Training Agent - it transforms any file or manual into an interactive AI course for workforce training or certification. Try our Mindfulness Agent that teaches focus under pressure, or upload your own file to create a smart, adaptive course.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/H4JP7VGQUIIIDS?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/ai-powered-form-that-fills-itself?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/efb99060-26a3-47fb-8cad-558e3118c08d.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Aden AI&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI chatbot course, file to course, workforce training, AI certification, adaptive learning, Mindfulness Agent, training agent, smart course&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺104&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;10-vibeonly&#34;&gt;10. VibeOnly
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Tagline&lt;/strong&gt;: Helping companies screen and hire AI-fluent employees&lt;br&gt;
&lt;strong&gt;Description&lt;/strong&gt;: Everyone says &amp;ldquo;AI won’t take your job. People who use it will&amp;rdquo;. Vibeonly helps you hire those people. It’s a test that shows who really knows how to use AI tools really well. Perfect for founders and hiring managers who want elite AI fluent talent.&lt;br&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/r/7LP6JGJC5IXNPT?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;open&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Product Hunt&lt;/strong&gt;: &lt;a class=&#34;link&#34; href=&#34;https://www.producthunt.com/products/vibeonly?utm_campaign=producthunt-api&amp;amp;utm_medium=api-v2&amp;amp;utm_source=Application%3A&amp;#43;weekly&amp;#43;%28ID%3A&amp;#43;148189%29&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;View on Product Hunt&lt;/a&gt;&lt;br&gt;
&lt;img src=&#34;https://ph-files.imgix.net/ec9cc838-b83d-4c33-995b-fc03c39ec778.png?auto=format&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;VibeOnly&#34;
	
	
&gt;&lt;br&gt;
&lt;strong&gt;Keyword&lt;/strong&gt;: AI hiring, AI fluency, employee screening, AI talent, hiring, AI tools, VibeOnly, talent acquisition&lt;br&gt;
&lt;strong&gt;VotesCount&lt;/strong&gt;: 🔺100&lt;br&gt;
&lt;strong&gt;Featured&lt;/strong&gt;: Yes&lt;br&gt;
&lt;strong&gt;CreatedAt&lt;/strong&gt;: 2025-10-20 07:01 AM (UTC)&lt;/p&gt;
&lt;hr&gt;
</description>
        </item>
        <item>
        <title>Duix.Heygem</title>
        <link>https://producthunt.programnotes.cn/en/p/duix.heygem/</link>
        <pubDate>Wed, 28 May 2025 15:29:52 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/duix.heygem/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1694248607966-7e6ac34c6e56?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NDg0MTcyOTd8&amp;ixlib=rb-4.1.0" alt="Featured image of post Duix.Heygem" /&gt;&lt;h1 id=&#34;duixcomduixheygem&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/duixcom/Duix.Heygem&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;duixcom/Duix.Heygem&lt;/a&gt;
&lt;/h1&gt;&lt;div align=&#34;center&#34;&gt;
  &lt;img src=&#34;README.assets/1.png&#34; style=&#34;width: 220px; height: auto;&#34;/&gt;
&lt;/div&gt;
&lt;div align=&#34;center&#34;&gt;
  &lt;h1&gt;HeyGem - Open Source Alternative to Heygen&lt;/h1&gt;
&lt;/div&gt;
&lt;h1 id=&#34;table-of-contents&#34;&gt;Table of Contents
&lt;/h1&gt;&lt;ol&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#1-whats-heygem&#34; &gt;What&amp;rsquo;s HeyGem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#2-introduction&#34; &gt;Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#3-how-to-run-locally&#34; &gt;How to Run Locally&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#4-open-apis&#34; &gt;Open APIs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#5-whats-new&#34; &gt;What&amp;rsquo;s New&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#6-faq&#34; &gt;FAQ&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#7-how-to-interact-in-real-time&#34; &gt;How to Interact in real time&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#8-contact&#34; &gt;Contact&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#9-license&#34; &gt;License&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#10-acknowledgments&#34; &gt;Acknowledgments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#11-star-history&#34; &gt;Star History&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h2 id=&#34;1-whats-heygem&#34;&gt;1. What&amp;rsquo;s HeyGem
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;HeyGem&lt;/strong&gt; is a free and open-source AI avatar project developed by &lt;strong&gt;Duix.com&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Seven years ago, a group of young pioneers chose an unconventional technical path, developing a method to train digital human models using real-person video data. Unlike traditional costly 3D digital human approaches, we leveraged AI-generated technology to create ultra-realistic digital humans, slashing production costs from hundreds of thousands of dollars to just $1,000. This innovation has empowered over 10,000 enterprises and generated over 500,000 personalized avatars for professionals across fields – educators, content creators, legal experts, medical practitioners, and entrepreneurs – dramatically enhancing their video production efficiency. However, our vision extends beyond commercial applications. We believe this transformative technology should be accessible to everyone. To democratize digital human creation, we&amp;rsquo;ve open-sourced our cloning technology and video production framework. Our commitment remains: breaking down technological barriers to make cutting-edge tools available to all. Now, anyone with a computer can freely craft their own AI Avatar and produce videos at zero cost – this is the essence of  &lt;strong&gt;HeyGem&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&#34;2-introduction&#34;&gt;2. Introduction
&lt;/h2&gt;&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README.assets/2.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;img&#34;
	
	
&gt;&lt;/p&gt;
&lt;p&gt;Heygem is a fully offline video synthesis tool designed for Windows systems that can precisely clone your appearance and voice, digitalizing your image. You can create videos by driving virtual avatars through text and voice. No internet connection is required, protecting your privacy while enjoying convenient and efficient digital experiences.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Core Features
&lt;ul&gt;
&lt;li&gt;Precise Appearance and Voice Cloning: Using advanced AI algorithms to capture human facial features with high precision, including facial features, contours, etc., to build realistic virtual models. It can also precisely clone voices, capturing and reproducing subtle characteristics of human voices, supporting various voice parameter settings to create highly similar cloning effects.&lt;/li&gt;
&lt;li&gt;Text and Voice-Driven Virtual Avatars: Understanding text content through natural language processing technology, converting text into natural and fluent speech to drive virtual avatars. Voice input can also be used directly, allowing virtual avatars to perform corresponding actions and facial expressions based on the rhythm and intonation of the voice, making the virtual avatar&amp;rsquo;s performance more natural and vivid.&lt;/li&gt;
&lt;li&gt;Efficient Video Synthesis: Highly synchronizing digital human video images with sound, achieving natural and smooth lip-syncing, intelligently optimizing audio-video synchronization effects.&lt;/li&gt;
&lt;li&gt;Multi-language Support: Scripts support eight languages - English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Key Advantages
&lt;ul&gt;
&lt;li&gt;Fully Offline Operation: No internet connection required, effectively protecting user privacy, allowing users to create in a secure, independent environment, avoiding potential data leaks during network transmission.&lt;/li&gt;
&lt;li&gt;User-Friendly: Clean and intuitive interface, easy to use even for beginners with no technical background, quickly mastering the software&amp;rsquo;s usage to start their digital human creation journey.&lt;/li&gt;
&lt;li&gt;Multiple Model Support: Supports importing multiple models and managing them through one-click startup packages, making it convenient for users to choose suitable models based on different creative needs and application scenarios.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Technical Support
&lt;ul&gt;
&lt;li&gt;Voice Cloning Technology: Using advanced technologies like artificial intelligence to generate similar or identical voices based on given voice samples, covering context, intonation, speed, and other aspects of speech.&lt;/li&gt;
&lt;li&gt;Automatic Speech Recognition: Technology that converts human speech vocabulary content into computer-readable input (text format), enabling computers to &amp;ldquo;understand&amp;rdquo; human speech.&lt;/li&gt;
&lt;li&gt;Computer Vision Technology: Used in video synthesis for visual processing, including facial recognition and lip movement analysis, ensuring virtual avatar lip movements match voice and text content.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;3-how-to-run-locally&#34;&gt;3. How to Run Locally
&lt;/h2&gt;&lt;p&gt;HeyGem supports Docker-based rapid deployment. Prior to deployment, ensure your hardware and software environments meet the specified requirements.&lt;/p&gt;
&lt;p&gt;HeyGem supports two deployment modes: Windows / Ubuntu 22.04 Installation&lt;/p&gt;
&lt;h3 id=&#34;dependencies&#34;&gt;&lt;strong&gt;Dependencies&lt;/strong&gt;
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;Nodejs 18&lt;/li&gt;
&lt;li&gt;Docker Images
&lt;ul&gt;
&lt;li&gt;docker pull guiji2025/fun-asr&lt;/li&gt;
&lt;li&gt;docker pull guiji2025/fish-speech-ziming&lt;/li&gt;
&lt;li&gt;docker pull guiji2025/heygem.ai&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;mode-1windows-installation&#34;&gt;Mode 1: Windows Installation
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;System Requirements:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Currently supports Windows 10 19042.1526 or higher&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Hardware Requirements:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Must have D Drive: Mainly used for storing digital human and project data&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Free space requirement: More than 30GB&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;C Drive: Used for storing service image files&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Free space requirement: More than 100GB&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If less than 100GB is available, after installing Docker, you can choose a different disk folder with more than 100GB of remaining space at the location shown below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README.assets/7.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;img&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Recommended Configuration:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 13th Gen Intel Core i5-13400F&lt;/li&gt;
&lt;li&gt;Memory: 32GB&lt;/li&gt;
&lt;li&gt;Graphics Card: RTX 4070&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ensure you have an NVIDIA graphics card with properly installed drivers&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;NVIDIA driver download link: &lt;a class=&#34;link&#34; href=&#34;https://www.nvidia.cn/drivers/lookup/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://www.nvidia.cn/drivers/lookup/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;installing-windows-docker&#34;&gt;&lt;strong&gt;Installing Windows Docker&lt;/strong&gt;
&lt;/h4&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Use the command &lt;code&gt;wsl --list --verbose&lt;/code&gt; to check if WSL is installed. If it shows as below, it&amp;rsquo;s already installed and no further installation is needed.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README.assets/11.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;img&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Update WSL using &lt;code&gt;wsl --update&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README.assets/10.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;img&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.docker.com/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Download Docker for Windows&lt;/a&gt;, choose the appropriate installation package based on your CPU architecture.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When you see this interface, installation is successful.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README.assets/5.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;img&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run Docker&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README.assets/12.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;img&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Accept the agreement and skip login on first run&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README.assets/8.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;img&#34;
	
	
&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README.assets/13.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;img&#34;
	
	
&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README.assets/3.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;img&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h4 id=&#34;installing-the-server&#34;&gt;&lt;strong&gt;Installing the Server&lt;/strong&gt;
&lt;/h4&gt;&lt;p&gt;Installation using Docker, docker-compose as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The &lt;code&gt;docker-compose.yml&lt;/code&gt; file is in the &lt;code&gt;/deploy&lt;/code&gt; directory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Execute &lt;code&gt;docker-compose up -d&lt;/code&gt; in the &lt;code&gt;/deploy&lt;/code&gt; directory, if you want to use the lite version, execute &lt;code&gt;docker-compose -f docker-compose-lite.yml up -d&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Wait patiently (about half an hour, speed depends on network), download will consume about 70GB of traffic, make sure to use WiFi&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When you see three services in Docker, it indicates success (the lite version has only one service &lt;code&gt;heygem-gen-video&lt;/code&gt;)&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README.assets/6.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;img&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h4 id=&#34;server-deployment-solution-for-nvidia-50-series-graphics-cards&#34;&gt;&lt;strong&gt;Server Deployment Solution for NVIDIA 50 Series Graphics Cards&lt;/strong&gt;
&lt;/h4&gt;&lt;p&gt;For 50 series graphics cards (tested; also works for 30/40 series with CUDA 12.8), use the official preview version of PyTorch.&lt;/p&gt;
&lt;h4 id=&#34;client&#34;&gt;&lt;strong&gt;Client&lt;/strong&gt;
&lt;/h4&gt;&lt;ol&gt;
&lt;li&gt;Directly download the &lt;a class=&#34;link&#34; href=&#34;https://github.com/GuijiAI/HeyGem.ai/releases&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;officially built installation package&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Double-click &lt;code&gt;HeyGem-x.x.x-setup.exe&lt;/code&gt; to install&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;mode-2ubuntu-2204-installation&#34;&gt;Mode 2：Ubuntu 22.04 Installation
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;System Requirements：&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We have conducted a complete test on &lt;strong&gt;Ubuntu 22.04&lt;/strong&gt;. However, it should theoretically support other desktop Linux distributions as well.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hardware Requirements：&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Recommended Configuration&lt;/li&gt;
&lt;li&gt;CPU: 13th Generation Intel Core i5 - 13400F&lt;/li&gt;
&lt;li&gt;Memory: 32G or more (necessary)&lt;/li&gt;
&lt;li&gt;Graphics Card: RTX - 4070 (Ensure you have an NVIDIA graphics card and the graphics card driver is correctly installed)&lt;/li&gt;
&lt;li&gt;Hard Disk: Free space greater than 100G&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Install Docker:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;First, use &lt;code&gt;docker --version&lt;/code&gt; to check if Docker is installed. If it is installed, skip the following steps.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt update
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt install docker.io
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo apt install docker-compose
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Install the graphics card driver:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install the graphics card driver by referring to the official documentation (&lt;a class=&#34;link&#34; href=&#34;https://www.nvidia.cn/drivers/lookup/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://www.nvidia.cn/drivers/lookup/&lt;/a&gt;).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After installation, execute the &lt;code&gt;nvidia-smi&lt;/code&gt; command. If the graphics card information is displayed, the installation is successful.&lt;/p&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;Install the NVIDIA Container Toolkit&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The NVIDIA Container Toolkit is a necessary tool for Docker to use NVIDIA GPUs. The installation steps are as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add the NVIDIA package repository:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-gdscript3&#34; data-lang=&#34;gdscript3&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;distribution&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=$&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;etc&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;os&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;release&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;echo&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;$&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;ID&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;$&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;VERSION_ID&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt; \
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;curl&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;s&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;L&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;https&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;//&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;nvidia&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;github&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;io&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;libnvidia&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;container&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;gpgkey&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;|&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;sudo&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;apt&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;key&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;add&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&lt;/span&gt; \
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;o&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;curl&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;s&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;L&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;https&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;//&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;nvidia&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;github&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;io&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;libnvidia&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;container&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/$&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;distribution&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;libnvidia&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;container&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;list&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;|&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;sudo&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;tee&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;etc&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;apt&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;sources&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;list&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;d&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;/&lt;/span&gt;&lt;span 
class=&#34;n&#34;&gt;nvidia&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;container&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;toolkit&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;list&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;Update the package list and install the toolkit:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-gdscript3&#34; data-lang=&#34;gdscript3&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;sudo&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;apt&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;update&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;sudo&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;apt&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;install&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;y&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;nvidia&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;container&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;toolkit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;Configure Docker to use the NVIDIA runtime:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo nvidia-ctk runtime configure --runtime=docker
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;Restart the Docker service:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo systemctl restart docker
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h4 id=&#34;install-the-server&#34;&gt;&lt;strong&gt;Install the server&lt;/strong&gt;
&lt;/h4&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;cd /deploy
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker-compose -f docker-compose-linux.yml up -d
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h4 id=&#34;install-the-client&#34;&gt;&lt;strong&gt;Install the client&lt;/strong&gt;
&lt;/h4&gt;&lt;ol&gt;
&lt;li&gt;Directly download the Linux version of the &lt;a class=&#34;link&#34; href=&#34;https://github.com/GuijiAI/HeyGem.ai/releases&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;officially built installation package&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Double-click &lt;code&gt;HeyGem-x.x.x.AppImage&lt;/code&gt; to launch it. No installation is required.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Reminder: In the Ubuntu system, if you enter the desktop as the &lt;code&gt;root&lt;/code&gt; user, directly double-clicking &lt;code&gt;HeyGem-x.x.x.AppImage&lt;/code&gt; may not work. You need to execute &lt;code&gt;./HeyGem-x.x.x.AppImage --no-sandbox&lt;/code&gt; in the command-line terminal. Adding the &lt;code&gt;--no-sandbox&lt;/code&gt; parameter will do the trick.&lt;/p&gt;
&lt;h2 id=&#34;4-open-apis&#34;&gt;4. Open APIs
&lt;/h2&gt;&lt;p&gt;We have opened APIs for model training and video synthesis. After Docker starts, several ports will be exposed locally, accessible through &lt;code&gt;http://127.0.0.1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For specific code, refer to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;src/main/service/model.js&lt;/li&gt;
&lt;li&gt;src/main/service/video.js&lt;/li&gt;
&lt;li&gt;src/main/service/voice.js&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;model-training&#34;&gt;&lt;strong&gt;Model Training&lt;/strong&gt;
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Separate video into silent video + audio&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Place audio in&lt;/p&gt;
&lt;p&gt;&lt;code&gt;D:\heygem_data\voice\data&lt;/code&gt; is agreed with the &lt;code&gt;guiji2025/fish-speech-ziming&lt;/code&gt; service, can be modified in docker-compose&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Call the&lt;/p&gt;
&lt;p&gt;Parameter example / response example: &lt;strong&gt;Record the response results, as they will be needed for subsequent audio synthesis.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;audio-synthesis&#34;&gt;&lt;strong&gt;Audio Synthesis&lt;/strong&gt;
&lt;/h3&gt;&lt;p&gt;Interface: &lt;code&gt;http://127.0.0.1:18180/v1/invoke&lt;/code&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;// Request parameters
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;speaker&amp;#34;: &amp;#34;{uuid}&amp;#34;, // A unique UUID
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;text&amp;#34;: &amp;#34;xxxxxxxxxx&amp;#34;, // Text content to synthesize
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;format&amp;#34;: &amp;#34;wav&amp;#34;, // Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;topP&amp;#34;: 0.7, // Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;max_new_tokens&amp;#34;: 1024, // Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;chunk_length&amp;#34;: 100, // Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;repetition_penalty&amp;#34;: 1.2, // Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;temperature&amp;#34;: 0.7, // Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;need_asr&amp;#34;: false, // Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;streaming&amp;#34;: false, // Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;is_fixed_seed&amp;#34;: 0, // Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;is_norm&amp;#34;: 0, // Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;reference_audio&amp;#34;: &amp;#34;{voice.asr_format_audio_url}&amp;#34;, // Return value from previous &amp;#34;Model Training&amp;#34; step
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;reference_text&amp;#34;: &amp;#34;{voice.reference_audio_text}&amp;#34; // Return value from previous &amp;#34;Model Training&amp;#34; step
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;video-synthesis&#34;&gt;&lt;strong&gt;Video Synthesis&lt;/strong&gt;
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Synthesis interface: &lt;code&gt;http://127.0.0.1:8383/easy/submit&lt;/code&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;9
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;// Request parameters
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;audio_url&amp;#34;: &amp;#34;{audioPath}&amp;#34;, // Audio path
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;video_url&amp;#34;: &amp;#34;{videoPath}&amp;#34;, // Video path
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;code&amp;#34;: &amp;#34;{uuid}&amp;#34;, // Unique key
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;chaofen&amp;#34;: 0, // Fixed value
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;watermark_switch&amp;#34;: 0, // Fixed value
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &amp;#34;pn&amp;#34;: 1 // Fixed value
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Progress query: &lt;code&gt;http://127.0.0.1:8383/easy/query?code=${taskCode}&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;GET request, the parameter &lt;code&gt;taskCode&lt;/code&gt; is the &lt;code&gt;code&lt;/code&gt; from the synthesis interface input above&lt;/p&gt;
&lt;h3 id=&#34;important-notice-to-developer-partners&#34;&gt;&lt;strong&gt;Important Notice to Developer Partners&lt;/strong&gt;
&lt;/h3&gt;&lt;p&gt;We are now announcing two parallel service solutions:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;&lt;strong&gt;Project&lt;/strong&gt;&lt;/th&gt;
          &lt;th&gt;&lt;strong&gt;HeyGem Open Source Local Deployment&lt;/strong&gt;&lt;/th&gt;
          &lt;th&gt;&lt;strong&gt;Digital Human/Clone Voice API Service&lt;/strong&gt;&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Usage&lt;/td&gt;
          &lt;td&gt;Open Source Local Deployment&lt;/td&gt;
          &lt;td&gt;Rapid Clone API Service&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Recommended&lt;/td&gt;
          &lt;td&gt;Technical Users&lt;/td&gt;
          &lt;td&gt;Business Users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Technical Threshold&lt;/td&gt;
          &lt;td&gt;Developers with deep learning framework experience/pursuing deep customization/wishing to participate in community co-construction&lt;/td&gt;
          &lt;td&gt;Quick business integration/focus on upper-level application development/need enterprise-level SLA assurance for commercial scenarios&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Hardware Requirements&lt;/td&gt;
          &lt;td&gt;Need to purchase GPU server&lt;/td&gt;
          &lt;td&gt;No need to purchase GPU server&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Customization&lt;/td&gt;
          &lt;td&gt;Can modify and extend the code according to your needs, fully controlling the software&amp;rsquo;s functions and behavior&lt;/td&gt;
          &lt;td&gt;Cannot directly modify the source code, can only extend functions through API-provided interfaces, less flexible than open source projects&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Technical Support&lt;/td&gt;
          &lt;td&gt;Community Support&lt;/td&gt;
          &lt;td&gt;Dynamic expansion support + professional technical response team&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Maintenance Cost&lt;/td&gt;
          &lt;td&gt;High maintenance cost&lt;/td&gt;
          &lt;td&gt;Simple maintenance&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Lip Sync Effect&lt;/td&gt;
          &lt;td&gt;Usable effect&lt;/td&gt;
          &lt;td&gt;Stunning and higher definition effect&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Commercial Authorization&lt;/td&gt;
          &lt;td&gt;Supports global free commercial use (enterprises with more than 100,000 users or annual revenue exceeding 10 million USD need to sign a commercial license agreement)&lt;/td&gt;
          &lt;td&gt;Commercial use allowed&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Iteration Speed&lt;/td&gt;
          &lt;td&gt;Slow updates, bug fixes depend on the community&lt;/td&gt;
          &lt;td&gt;Latest models/algorithms are prioritized, fast problem resolution&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We always adhere to the open source spirit, and the launch of the API service aims to provide a more complete solution matrix for developers with different needs. No matter which method you choose, you can always obtain technical support documents through &lt;a class=&#34;link&#34; href=&#34;https://duix.com/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://duix.com&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We look forward to working with you to promote the inclusive development of digital human technology!&lt;/p&gt;
&lt;p&gt;You can chat with Heygem Digital Human on the official website: &lt;a class=&#34;link&#34; href=&#34;https://duix.com/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://duix.com/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We also provide an API at the DUIX Platform: &lt;a class=&#34;link&#34; href=&#34;https://docs.duix.com/api-reference/api/Introduction&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://docs.duix.com/api-reference/api/Introduction&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;5-whats-new&#34;&gt;5. What&amp;rsquo;s New
&lt;/h2&gt;&lt;h3 id=&#34;heading&#34;&gt;&lt;strong&gt;[Nvidia 50 Series GPU Version Notice]&lt;/strong&gt;
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;Tested and verified on 5090 GPU&lt;/li&gt;
&lt;li&gt;For installation instructions, see &lt;a class=&#34;link&#34; href=&#34;https://github.com/duixcom/Duix.Heygem?tab=readme-ov-file#Server-Deployment-Solution-for-NVIDIA-50-Series-Graphics-Cards&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Server Deployment Solution for NVIDIA 50 Series Graphics Cards&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;heading-1&#34;&gt;&lt;strong&gt;[New Ubuntu Version Notice]&lt;/strong&gt;
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Ubuntu Version Officially Released&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Adaptation and verification work for Ubuntu 22.04 Desktop version (kernel 6.8.0-52-generic) has been completed. Compatibility testing for other Linux versions has not yet been conducted.&lt;/li&gt;
&lt;li&gt;Added internationalization (English) for the client program interface.&lt;/li&gt;
&lt;li&gt;Fixed some known issues
&lt;ul&gt;
&lt;li&gt;#304&lt;/li&gt;
&lt;li&gt;#292&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/GuijiAI/HeyGem.ai?tab=readme-ov-file#ubuntu-2204-installation&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Ubuntu22.04 Installation Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;6-faq&#34;&gt;6. FAQ
&lt;/h2&gt;&lt;h3 id=&#34;self-check-steps-before-asking-questions&#34;&gt;&lt;strong&gt;Self-Check Steps Before Asking Questions&lt;/strong&gt;
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Check if all three services are in Running status&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README.assets/9.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;img&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Confirm that your machine has an NVIDIA graphics card and drivers are correctly installed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;All computing power for this project is local. The three services won&amp;rsquo;t start without an NVIDIA graphics card or proper drivers.&lt;/p&gt;
&lt;ol start=&#34;3&#34;&gt;
&lt;li&gt;Ensure both server and client are updated to the latest version. The project is newly open-sourced, the community is very active, and updates are frequent. Your issue might have been resolved in a new version.
&lt;ul&gt;
&lt;li&gt;Server: Go to &lt;code&gt;/deploy&lt;/code&gt; directory and re-execute &lt;code&gt;docker-compose up -d&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Client: &lt;code&gt;pull&lt;/code&gt; code and re-&lt;code&gt;build&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/GuijiAI/HeyGem.ai/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GitHub Issues&lt;/a&gt; are continuously updated, issues are being resolved and closed daily. Check frequently, your issue might already be resolved.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;question-template&#34;&gt;&lt;strong&gt;Question Template&lt;/strong&gt;
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;Problem Description&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Describe the reproduction steps in detail, with screenshots if possible.&lt;/p&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;Provide Error Logs
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;How to get client logs:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README.assets/4.jpeg&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;img&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Server logs:&lt;/p&gt;
&lt;p&gt;Find the key location, or click on our three Docker services, and &amp;ldquo;Copy&amp;rdquo; as shown below.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;7-how-to-interact-in-real-time&#34;&gt;7. How to Interact in real time
&lt;/h2&gt;&lt;p&gt;HeyGem&amp;rsquo;s digital human realizes digital human cloning and non-real-time video synthesis.&lt;/p&gt;
&lt;p&gt;If you want a digital human to support interaction, you can visit &lt;a class=&#34;link&#34; href=&#34;https://www.duix.com&#34; &gt;duix.com&lt;/a&gt; to experience the free test.&lt;/p&gt;
&lt;h2 id=&#34;8-contact&#34;&gt;8. Contact
&lt;/h2&gt;&lt;p&gt;If you have any questions, please raise an issue or contact us at &lt;a class=&#34;link&#34; href=&#34;mailto:james@duix.com&#34; &gt;james@duix.com&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;9-license&#34;&gt;9. License
&lt;/h2&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/GuijiAI/HeyGem.ai/blob/main/LICENSE&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/GuijiAI/HeyGem.ai/blob/main/LICENSE&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;10-acknowledgments&#34;&gt;10. Acknowledgments
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;ASR based on fun-asr&lt;/li&gt;
&lt;li&gt;TTS based on fish-speech-ziming&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;11-star-history&#34;&gt;11. Star History
&lt;/h2&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.star-history.com/#GuijiAI/HeyGem.ai&amp;amp;Date&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GitHub Star History&lt;/a&gt;&lt;/p&gt;
</description>
        </item>
        <item>
        <title>KrillinAI</title>
        <link>https://producthunt.programnotes.cn/en/p/krillinai/</link>
        <pubDate>Wed, 16 Apr 2025 15:29:17 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/krillinai/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1727175401108-6e8bf73ca114?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NDQ3ODg0NTh8&amp;ixlib=rb-4.0.3" alt="Featured image of post KrillinAI" /&gt;&lt;h1 id=&#34;krillinaikrillinai&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/krillinai/KrillinAI&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;krillinai/KrillinAI&lt;/a&gt;
&lt;/h1&gt;&lt;div align=&#34;center&#34;&gt;
  &lt;img src=&#34;./docs/images/logo.png&#34; alt=&#34;KrillinAI&#34; height=&#34;90&#34;&gt;
&lt;h1 id=&#34;ai-audiovideo-translation-and-dubbing-tool&#34;&gt;AI Audio&amp;amp;Video Translation and Dubbing Tool
&lt;/h1&gt;&lt;p&gt;&lt;a href=&#34;https://trendshift.io/repositories/13360&#34; target=&#34;_blank&#34;&gt;&lt;img src=&#34;https://trendshift.io/api/badge/repositories/13360&#34; alt=&#34;krillinai%2FKrillinAI | Trendshift&#34; style=&#34;width: 250px; height: 55px;&#34; width=&#34;250&#34; height=&#34;55&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;./README.md&#34; &gt;English&lt;/a&gt;｜&lt;a class=&#34;link&#34; href=&#34;./docs/README_zh.md&#34; &gt;简体中文&lt;/a&gt;｜&lt;a class=&#34;link&#34; href=&#34;./docs/README_jp.md&#34; &gt;日本語&lt;/a&gt;｜&lt;a class=&#34;link&#34; href=&#34;./docs/README_kr.md&#34; &gt;한국어&lt;/a&gt;｜&lt;a class=&#34;link&#34; href=&#34;./docs/README_fr.md&#34; &gt;Français&lt;/a&gt;｜&lt;a class=&#34;link&#34; href=&#34;./docs/README_de.md&#34; &gt;Deutsch&lt;/a&gt;｜&lt;a class=&#34;link&#34; href=&#34;./docs/README_es.md&#34; &gt;Español&lt;/a&gt;｜&lt;a class=&#34;link&#34; href=&#34;./docs/README_pt.md&#34; &gt;Português&lt;/a&gt;｜&lt;a class=&#34;link&#34; href=&#34;./docs/README_rus.md&#34; &gt;Русский&lt;/a&gt;｜&lt;a class=&#34;link&#34; href=&#34;./docs/README_ar.md&#34; &gt;اللغة العربية&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://x.com/KrillinAI&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/Twitter-KrillinAI-orange?logo=twitter&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Twitter&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://space.bilibili.com/242124650&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/dynamic/json?label=Bilibili&amp;amp;query=%24.data.follower&amp;amp;suffix=%20followers&amp;amp;url=https%3A%2F%2Fapi.bilibili.com%2Fx%2Frelation%2Fstat%3Fvmid%3D242124650&amp;amp;logo=bilibili&amp;amp;color=00A1D6&amp;amp;labelColor=FE7398&amp;amp;logoColor=FFFFFF&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Bilibili&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://jq.qq.com/?_wv=1027&amp;amp;k=754069680&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/QQ%20%e7%be%a4-754069680-green?logo=tencent-qq&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;QQ 群&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&#34;-new-release-for-win--mac-desktop-version--welcome-to-test-and-provide-feedback&#34;&gt;📢 New Release for Win &amp;amp; Mac Desktop Version – Welcome to Test and Provide Feedback
&lt;/h3&gt;&lt;h2 id=&#34;overview&#34;&gt;Overview
&lt;/h2&gt;&lt;p&gt;Krillin AI is an all-in-one solution for effortless video localization and enhancement. This minimalist yet powerful tool handles everything from translation, dubbing to voice cloning, formatting—seamlessly converting videos between landscape and portrait modes for optimal display across all content platforms (YouTube, TikTok, Bilibili, Douyin, WeChat Channel, RedNote, Kuaishou). With its end-to-end workflow, Krillin AI transforms raw footage into polished, platform-ready content in just a few clicks.&lt;/p&gt;
&lt;h2 id=&#34;key-features&#34;&gt;Key Features:
&lt;/h2&gt;&lt;p&gt;🎯 &lt;strong&gt;One-Click Start&lt;/strong&gt; - Launch your workflow instantly. New desktop version available—easier to use!&lt;/p&gt;
&lt;p&gt;📥 &lt;strong&gt;Video download&lt;/strong&gt; - yt-dlp and local file uploading supported&lt;/p&gt;
&lt;p&gt;📜 &lt;strong&gt;Precise Subtitles&lt;/strong&gt; - Whisper-powered high-accuracy recognition&lt;/p&gt;
&lt;p&gt;🧠 &lt;strong&gt;Smart Segmentation&lt;/strong&gt; - LLM-based subtitle chunking &amp;amp; alignment&lt;/p&gt;
&lt;p&gt;🌍 &lt;strong&gt;Professional Translation&lt;/strong&gt; - Paragraph-level translation for consistency&lt;/p&gt;
&lt;p&gt;🔄 &lt;strong&gt;Term Replacement&lt;/strong&gt; - One-click domain-specific vocabulary swap&lt;/p&gt;
&lt;p&gt;🎙️ &lt;strong&gt;Dubbing and Voice Cloning&lt;/strong&gt; - CosyVoice selected or cloning voices&lt;/p&gt;
&lt;p&gt;🎬 &lt;strong&gt;Video Composition&lt;/strong&gt; - Auto-formatting for horizontal/vertical layouts&lt;/p&gt;
&lt;h2 id=&#34;showcase&#34;&gt;Showcase
&lt;/h2&gt;&lt;p&gt;The following picture demonstrates the effect after the subtitle file, which was generated through a one-click operation after importing a 46-minute local video, was inserted into the track. There was no manual adjustment involved at all. There are no missing or overlapping subtitles, the sentence segmentation is natural, and the translation quality is also quite high.&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td width=&#34;33%&#34;&gt;
&lt;h3 id=&#34;subtitle-translation&#34;&gt;Subtitle Translation
&lt;/h3&gt;&lt;hr&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/user-attachments/assets/bba1ac0a-fe6b-4947-b58d-ba99306d0339&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/user-attachments/assets/bba1ac0a-fe6b-4947-b58d-ba99306d0339&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td width=&#34;33%&#34;&gt;
&lt;h3 id=&#34;dubbing&#34;&gt;Dubbing
&lt;/h3&gt;&lt;hr&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/user-attachments/assets/0b32fad3-c3ad-4b6a-abf0-0865f0dd2385&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/user-attachments/assets/0b32fad3-c3ad-4b6a-abf0-0865f0dd2385&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td width=&#34;33%&#34;&gt;
&lt;h3 id=&#34;portrait&#34;&gt;Portrait
&lt;/h3&gt;&lt;hr&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/user-attachments/assets/c2c7b528-0ef8-4ba9-b8ac-f9f92f6d4e71&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/user-attachments/assets/c2c7b528-0ef8-4ba9-b8ac-f9f92f6d4e71&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;h2 id=&#34;-speech-recognition-support&#34;&gt;🔍 Speech Recognition Support
&lt;/h2&gt;&lt;p&gt;&lt;em&gt;&lt;strong&gt;All local models in the table below support automatic installation of executable files + model files. Just make your selection, and KrillinAI will handle everything else for you.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Service&lt;/th&gt;
          &lt;th&gt;Supported Platforms&lt;/th&gt;
          &lt;th&gt;Model Options&lt;/th&gt;
          &lt;th&gt;Local/Cloud&lt;/th&gt;
          &lt;th&gt;Notes&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;OpenAI Whisper&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Cross-platform&lt;/td&gt;
          &lt;td&gt;-&lt;/td&gt;
          &lt;td&gt;Cloud&lt;/td&gt;
          &lt;td&gt;Fast with excellent results&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;FasterWhisper&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Windows/Linux&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;tiny&lt;/code&gt;/&lt;code&gt;medium&lt;/code&gt;/&lt;code&gt;large-v2&lt;/code&gt; (recommend medium+)&lt;/td&gt;
          &lt;td&gt;Local&lt;/td&gt;
          &lt;td&gt;Faster speed, no cloud service overhead&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;WhisperKit&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;macOS (Apple Silicon only)&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;large-v2&lt;/code&gt;&lt;/td&gt;
          &lt;td&gt;Local&lt;/td&gt;
          &lt;td&gt;Native optimization for Apple chips&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Alibaba Cloud ASR&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Cross-platform&lt;/td&gt;
          &lt;td&gt;-&lt;/td&gt;
          &lt;td&gt;Cloud&lt;/td&gt;
          &lt;td&gt;Bypasses China mainland network issues&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;-large-language-model-support&#34;&gt;🚀 Large Language Model Support
&lt;/h2&gt;&lt;p&gt;✅ Compatible with all &lt;strong&gt;OpenAI API-compatible&lt;/strong&gt; cloud/local LLM services including but not limited to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;OpenAI&lt;/li&gt;
&lt;li&gt;DeepSeek&lt;/li&gt;
&lt;li&gt;Qwen (Tongyi Qianwen)&lt;/li&gt;
&lt;li&gt;Self-hosted open-source models&lt;/li&gt;
&lt;li&gt;Other OpenAI-format compatible API services&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;-language-support&#34;&gt;🌍 Language Support
&lt;/h2&gt;&lt;p&gt;Input languages: Chinese, English, Japanese, German, Turkish supported (more languages being added)&lt;br&gt;
Translation languages: 56 languages supported, including English, Chinese, Russian, Spanish, French, etc.&lt;/p&gt;
&lt;h2 id=&#34;interface-preview&#34;&gt;Interface Preview
&lt;/h2&gt;&lt;h2 id=&#34;-quick-start&#34;&gt;🚀 Quick Start
&lt;/h2&gt;&lt;h3 id=&#34;basic-steps&#34;&gt;Basic Steps
&lt;/h3&gt;&lt;p&gt;First, download the Release executable file that matches your device&amp;rsquo;s system. Follow the instructions below to choose between the desktop or non-desktop version, then place the software in an empty folder. Running the program will generate some directories, so keeping it in an empty folder makes management easier.&lt;/p&gt;
&lt;p&gt;[For the desktop version (release files with &amp;ldquo;desktop&amp;rdquo; in the name), refer here]&lt;br&gt;
&lt;em&gt;The desktop version is newly released to address the difficulty beginners face in editing configuration files correctly. It still has some bugs and is being continuously updated.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Double-click the file to start using it.&lt;/p&gt;
&lt;p&gt;[For the non-desktop version (release files without &amp;ldquo;desktop&amp;rdquo; in the name), refer here]&lt;br&gt;
&lt;em&gt;The non-desktop version is the original release, with more complex configuration but stable functionality. It is also suitable for server deployment, as it provides a web-based UI.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Create a &lt;code&gt;config&lt;/code&gt; folder in the directory, then create a &lt;code&gt;config.toml&lt;/code&gt; file inside it. Copy the contents of the &lt;code&gt;config-example.toml&lt;/code&gt; file from the source code&amp;rsquo;s &lt;code&gt;config&lt;/code&gt; directory into your &lt;code&gt;config.toml&lt;/code&gt; and fill in your configuration details. (If you want to use OpenAI models but don’t know how to get a key, you can join the group for free trial access.)&lt;/p&gt;
&lt;p&gt;Double-click the executable or run it in the terminal to start the service.&lt;/p&gt;
&lt;p&gt;Open your browser and enter http://127.0.0.1:8888 to begin using it. (Replace 8888 with the port number you specified in the config file.)&lt;/p&gt;
&lt;h3 id=&#34;to-macos-users&#34;&gt;To: macOS Users
&lt;/h3&gt;&lt;p&gt;[For the desktop version, i.e., release files with &amp;ldquo;desktop&amp;rdquo; in the name, refer here]&lt;br&gt;
The current packaging method for the desktop version cannot support direct double-click execution or DMG installation due to signing issues. Manual trust configuration is required as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Open the directory containing the executable file (assuming the filename is KrillinAI_1.0.0_desktop_macOS_arm64) in Terminal&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Execute the following commands sequentially:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo xattr -cr ./KrillinAI_1.0.0_desktop_macOS_arm64  
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo chmod +x ./KrillinAI_1.0.0_desktop_macOS_arm64  
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./KrillinAI_1.0.0_desktop_macOS_arm64  
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;[For the non-desktop version, i.e., release files without &amp;ldquo;desktop&amp;rdquo; in the name, refer here]&lt;br&gt;
This software is not signed, so after completing the file configuration in the &amp;ldquo;Basic Steps,&amp;rdquo; you will need to manually trust the application on macOS. Follow these steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Open the terminal and navigate to the directory where the executable file (assuming the file name is &lt;code&gt;KrillinAI_1.0.0_macOS_arm64&lt;/code&gt;) is located.&lt;/li&gt;
&lt;li&gt;Execute the following commands in sequence:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo xattr -rd com.apple.quarantine ./KrillinAI_1.0.0_macOS_arm64
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;sudo chmod +x ./KrillinAI_1.0.0_macOS_arm64
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./KrillinAI_1.0.0_macOS_arm64
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;This will start the service.&lt;/p&gt;
&lt;h3 id=&#34;docker-deployment&#34;&gt;Docker Deployment
&lt;/h3&gt;&lt;p&gt;This project supports Docker deployment. Please refer to the &lt;a class=&#34;link&#34; href=&#34;./docs/docker.md&#34; &gt;Docker Deployment Instructions&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;cookie-configuration-instructions&#34;&gt;Cookie Configuration Instructions
&lt;/h3&gt;&lt;p&gt;If you encounter video download failures, please refer to the &lt;a class=&#34;link&#34; href=&#34;./docs/get_cookies.md&#34; &gt;Cookie Configuration Instructions&lt;/a&gt; to configure your cookie information.&lt;/p&gt;
&lt;h3 id=&#34;configuration-help&#34;&gt;Configuration Help
&lt;/h3&gt;&lt;p&gt;The quickest and most convenient configuration method:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Select &lt;code&gt;openai&lt;/code&gt; for both &lt;code&gt;transcription_provider&lt;/code&gt; and &lt;code&gt;llm_provider&lt;/code&gt;. In this way, you only need to fill in &lt;code&gt;openai.apikey&lt;/code&gt; in the following three major configuration item categories, namely &lt;code&gt;openai&lt;/code&gt;, &lt;code&gt;local_model&lt;/code&gt;, and &lt;code&gt;aliyun&lt;/code&gt;, and then you can conduct subtitle translation. (Fill in &lt;code&gt;app.proxy&lt;/code&gt;, &lt;code&gt;model&lt;/code&gt; and &lt;code&gt;openai.base_url&lt;/code&gt; as per your own situation.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The configuration method for using the local speech recognition model (macOS is not supported for the time being) (a choice that takes into account cost, speed, and quality):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fill in &lt;code&gt;fasterwhisper&lt;/code&gt; for &lt;code&gt;transcription_provider&lt;/code&gt; and &lt;code&gt;openai&lt;/code&gt; for &lt;code&gt;llm_provider&lt;/code&gt;. In this way, you only need to fill in &lt;code&gt;openai.apikey&lt;/code&gt; and &lt;code&gt;local_model.faster_whisper&lt;/code&gt; in the following three major configuration item categories, namely &lt;code&gt;openai&lt;/code&gt; and &lt;code&gt;local_model&lt;/code&gt;, and then you can conduct subtitle translation. The local model will be downloaded automatically. (The same applies to &lt;code&gt;app.proxy&lt;/code&gt; and &lt;code&gt;openai.base_url&lt;/code&gt; as mentioned above.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following usage situations require the configuration of Alibaba Cloud:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;llm_provider&lt;/code&gt; is filled with &lt;code&gt;aliyun&lt;/code&gt;, it indicates that the large model service of Alibaba Cloud will be used. Consequently, the configuration of the &lt;code&gt;aliyun.bailian&lt;/code&gt; item needs to be set up.&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;transcription_provider&lt;/code&gt; is filled with &lt;code&gt;aliyun&lt;/code&gt;, or if the &amp;ldquo;voice dubbing&amp;rdquo; function is enabled when starting a task, the voice service of Alibaba Cloud will be utilized. Therefore, the configuration of the &lt;code&gt;aliyun.speech&lt;/code&gt; item needs to be filled in.&lt;/li&gt;
&lt;li&gt;If the &amp;ldquo;voice dubbing&amp;rdquo; function is enabled and local audio files are uploaded for voice timbre cloning at the same time, the OSS cloud storage service of Alibaba Cloud will also be used. Hence, the configuration of the &lt;code&gt;aliyun.oss&lt;/code&gt; item needs to be filled in.
Configuration Guide: &lt;a class=&#34;link&#34; href=&#34;./docs/aliyun.md&#34; &gt;Alibaba Cloud Configuration Instructions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;frequently-asked-questions&#34;&gt;Frequently Asked Questions
&lt;/h2&gt;&lt;p&gt;Please refer to &lt;a class=&#34;link&#34; href=&#34;./docs/faq.md&#34; &gt;Frequently Asked Questions&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;contribution-guidelines&#34;&gt;Contribution Guidelines
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Do not submit unnecessary files like &lt;code&gt;.vscode&lt;/code&gt;, &lt;code&gt;.idea&lt;/code&gt;, etc. Please make good use of &lt;code&gt;.gitignore&lt;/code&gt; to filter them.&lt;/li&gt;
&lt;li&gt;Do not submit &lt;code&gt;config.toml&lt;/code&gt;; instead, submit &lt;code&gt;config-example.toml&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;star-history&#34;&gt;Star History
&lt;/h2&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://star-history.com/#krillinai/KrillinAI&amp;amp;Date&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://api.star-history.com/svg?repos=krillinai/KrillinAI&amp;amp;type=Date&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Star History Chart&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
</description>
        </item>
        <item>
        <title>HeyGem.ai</title>
        <link>https://producthunt.programnotes.cn/en/p/heygem.ai/</link>
        <pubDate>Tue, 15 Apr 2025 15:30:25 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/heygem.ai/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1474511019749-26a5a4b632b2?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NDQ3MDIxNjZ8&amp;ixlib=rb-4.0.3" alt="Featured image of post HeyGem.ai" /&gt;&lt;h1 id=&#34;guijiaiheygemai&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/GuijiAI/HeyGem.ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GuijiAI/HeyGem.ai&lt;/a&gt;
&lt;/h1&gt;&lt;h1 id=&#34;heygem---open-source-alternative-to-heygen-切换中文&#34;&gt;Heygem - Open Source Alternative to Heygen &lt;a class=&#34;link&#34; href=&#34;./README_zh.md&#34; &gt;【切换中文】&lt;/a&gt;
&lt;/h1&gt;&lt;h2 id=&#34;announcement&#34;&gt;Announcement
&lt;/h2&gt;&lt;p&gt;Heygem digital human cloning intelligent agent and plugins have been successfully launched on the Coze platform. No complex deployment is required, even novice users can easily get started and use it directly.&lt;/p&gt;
&lt;p&gt;Click here to instantly access the Coze store experience👉&lt;a class=&#34;link&#34; href=&#34;https://www.coze.cn/store/agent/7488696243959431206?bid=6ftfk9dtg0g12&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Silicon-based Intelligent Digital Human Cloning Agent&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://www.coze.cn/store/plugin/7488926246634782746&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Silicon-based Intelligent Digital Human Cloning Plugin&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Scan the code to watch the operation video&lt;/p&gt;
&lt;img src=&#34;./README_zh.assets/coze-video.png&#34; width=&#34;50%&#34;&gt;
&lt;h2 id=&#34;new-ubuntu-version-notice&#34;&gt;[New Ubuntu Version Notice]
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Ubuntu Version Officially Released&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Adaptation and verification work for Ubuntu 22.04 Desktop version (kernel 6.8.0-52-generic) has been completed. Compatibility testing for other Linux versions has not yet been conducted.&lt;/li&gt;
&lt;li&gt;Added internationalization (English) for the client program interface.&lt;/li&gt;
&lt;li&gt;Fixed some known issues
&lt;ul&gt;
&lt;li&gt;#304&lt;/li&gt;
&lt;li&gt;#292&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/GuijiAI/HeyGem.ai?tab=readme-ov-file#ubuntu-2204-installation&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Ubuntu22.04 Installation Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;important-notice-to-developer-partners&#34;&gt;Important Notice to Developer Partners
&lt;/h2&gt;&lt;p&gt;&lt;strong&gt;Dear Heygem Open Source Community Members:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We sincerely thank you for your enthusiastic attention and active participation in the Heygem digital human open source project! We have noticed that some developers face challenges during local deployment. To better meet the needs of different scenarios, we are now announcing two parallel service solutions:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;&lt;strong&gt;Project&lt;/strong&gt;&lt;/th&gt;
          &lt;th&gt;&lt;strong&gt;HeyGem Open Source Local Deployment&lt;/strong&gt;&lt;/th&gt;
          &lt;th&gt;&lt;strong&gt;Digital Human/Clone Voice API Service&lt;/strong&gt;&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Usage&lt;/td&gt;
          &lt;td&gt;Open Source Local Deployment&lt;/td&gt;
          &lt;td&gt;Rapid Clone API Service&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Recommended&lt;/td&gt;
          &lt;td&gt;Technical Users&lt;/td&gt;
          &lt;td&gt;Business Users&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Technical Threshold&lt;/td&gt;
          &lt;td&gt;Developers with deep learning framework experience/pursuing deep customization/wishing to participate in community co-construction&lt;/td&gt;
          &lt;td&gt;Quick business integration/focus on upper-level application development/need enterprise-level SLA assurance for commercial scenarios&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Hardware Requirements&lt;/td&gt;
          &lt;td&gt;Need to purchase GPU server&lt;/td&gt;
          &lt;td&gt;No need to purchase GPU server&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Customization&lt;/td&gt;
          &lt;td&gt;Can modify and extend the code according to your needs, fully controlling the software&amp;rsquo;s functions and behavior&lt;/td&gt;
          &lt;td&gt;Cannot directly modify the source code, can only extend functions through API-provided interfaces, less flexible than open source projects&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Technical Support&lt;/td&gt;
          &lt;td&gt;Community Support&lt;/td&gt;
          &lt;td&gt;Dynamic expansion support + professional technical response team&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Maintenance Cost&lt;/td&gt;
          &lt;td&gt;High maintenance cost&lt;/td&gt;
          &lt;td&gt;Simple maintenance&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Lip Sync Effect&lt;/td&gt;
          &lt;td&gt;Usable effect&lt;/td&gt;
          &lt;td&gt;Stunning and higher definition effect&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Commercial Authorization&lt;/td&gt;
          &lt;td&gt;Supports global free commercial use (enterprises with more than 100,000 users or annual revenue exceeding 10 million USD need to sign a commercial license agreement)&lt;/td&gt;
          &lt;td&gt;Commercial use allowed&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Iteration Speed&lt;/td&gt;
          &lt;td&gt;Slow updates, bug fixes depend on the community&lt;/td&gt;
          &lt;td&gt;Latest models/algorithms are prioritized, fast problem resolution&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We always adhere to the open source spirit, and the launch of the API service aims to provide a more complete solution matrix for developers with different needs. No matter which method you choose, you can always obtain technical support documents through &lt;a class=&#34;link&#34; href=&#34;mailto:James@toolwiz.com&#34; &gt;James@toolwiz.com&lt;/a&gt;. We look forward to working with you to promote the inclusive development of digital human technology!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Silicon-based Intelligent Developer Team&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://mp.weixin.qq.com/s/vKiBR85E7JyRkr6CxLCppA?mpshare=1&amp;scene=1&amp;srcid=0319sszkopZO6870sGsU0TFc&amp;sharer_shareinfo=cac5ec3bfa62ed558552c7c022821613&amp;sharer_shareinfo_first=cac5ec3bfa62ed558552c7c022821613&amp;from=industrynews#rd&#34; target=&#34;_blank&#34;&gt;From scratch, hand-in-hand to teach you how to create your own HeyGem open source AI digital human!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://app.guiji.cn/platform&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;strong&gt;Rapid Clone API&lt;/strong&gt;&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://guiji.cn/digital-docs/introduce/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;strong&gt;API Documentation Center&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://app.guiji.cn/platform&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;strong&gt;Real-time Interaction SDK&lt;/strong&gt;&lt;/a&gt; | &lt;a class=&#34;link&#34; href=&#34;https://guiji.cn/duix-light-document/introduce/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;strong&gt;SDK Documentation Center&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/GuijiAI/duix.ai&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;strong&gt;Local Real-time Interaction (realtime) duix.ai Open Source Address&lt;/strong&gt;&lt;/a&gt; |
&lt;a class=&#34;link&#34; href=&#34;https://github.com/GuijiAI/duix.ai/blob/main/duix-android/dh_aigc_android/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;strong&gt;Android Version&lt;/strong&gt;&lt;/a&gt; |
&lt;a class=&#34;link&#34; href=&#34;https://github.com/GuijiAI/duix.ai/blob/main/duix-ios/GJLocalDigitalDemo/GJLocalDigitalSDK.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;strong&gt;iOS Version&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;img src=&#34;./README_zh.assets/cb10263a14cc826e22c2be4bcae01a89.jpg&#34; width=&#34;50%&#34;&gt;
&lt;h2 id=&#34;open-source-co-creation--shared-glory&#34;&gt;Open Source Co-Creation · Shared Glory
&lt;/h2&gt;&lt;p&gt;Since we open-sourced Heygem, global geeks have illuminated the digital avatar matrix in the code universe, with each commit reconstructing the future! But joy is better shared than enjoyed alone—now we invite all experts to join the &amp;ldquo;Open Source Co-Creation Plan,&amp;rdquo; empowering everyone with AI creativity and propelling the Chinese AI fleet towards the stars!&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Co-Creation Content Direction&lt;/p&gt;
&lt;p&gt;Share high-quality videos or articles on Heygem deployment tutorials, optimization guides, and practical cases (Bilibili, Douyin, Xiaohongshu, WeChat Official Accounts, Zhihu, etc.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Open Source Co-Creation Special Reward Pool (Real Cash Rewards!)&lt;/p&gt;
&lt;p&gt;(1) Basic Rewards&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Content receiving 20-100 likes will be awarded the [Heygem.ai Master Award] and a 20 RMB cash red envelope.

Content receiving 100+ likes will be awarded the [Heygem.ai God Award] and a 50 RMB cash red envelope.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(2) Special Achievements:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; The monthly MVP will unlock the Open Source Hall of Fame digital badge (permanently on-chain).
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Participation Method&lt;/p&gt;
&lt;p&gt;Send your work to the customer service representative; add them as a friend with the note &amp;ldquo;Name+999&amp;rdquo;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;img src=&#34;./README_zh.assets/2025-03-20_14-38-00.jpg&#34; width=&#34;50%&#34;&gt;
&lt;h2 id=&#34;outstanding-co-creation-works-exhibition&#34;&gt;Outstanding Co-Creation Works Exhibition
&lt;/h2&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.bilibili.com/video/BV1awQqYZEqB/?spm_id_from=333.337.search-card.all.click&amp;amp;vd_source=618f44772c5dafb47317bb728505d79c&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;HeyGem Digital Human One-Click Start, 8G Video Memory Available, Model Size 10G, No Need for 100G Hard Disk Space, No Need for D Drive, Based on Docker Single Image, Silicon-Based Open Source&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.bilibili.com/video/BV1ACQSYEErF/?spm_id_from=333.337.search-card.all.click&amp;amp;vd_source=618f44772c5dafb47317bb728505d79c&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Ai Digital Human 16 - Local Deployment! The Most Popular Open Source Digital Human HeyGem Zero-Basis Hands-On Teaching Setup Tutorial, 20% Generation Stuck Solution, Full Simplified Process with Supporting Files - T8 ComfyUI Tutorial&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.bilibili.com/video/BV1R3QpYsEY6/?spm_id_from=333.337.search-card.all.click&amp;amp;vd_source=618f44772c5dafb47317bb728505d79c&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Heygem Open Source Witnessed History! Cyber Worker Revolution!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.bilibili.com/video/BV1eWQ6YgEcp/?spm_id_from=333.337.search-card.all.click&amp;amp;vd_source=618f44772c5dafb47317bb728505d79c&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Digital Human Project Heygem Local Deployment Tutorial&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;http://xhslink.com/a/rQPYqoDSRih8&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;So Tempting! From Paid to Open Source, AI Digital Humans Will Open a New Era&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;http://xhslink.com/a/tX3p5V5tajh8&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Open Source Free Digital Humans Are Here, Unlimited Times, Fast Cloning&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;http://xhslink.com/a/8UT1kQ7vxjh8&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;AI Digital Humans Are Free! GitHub&amp;rsquo;s Hot Project Can Run on Your Computer&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.bilibili.com/video/BV1SkoCYpEwh/?share_source=copy_web&amp;amp;vd_source=c38dcdb72a68f2a4e0b3c0f4f9a5a03c&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;The Most Popular Free AI Digital Human, HeyGem V1.0.3, Latest Update, One-Click Integration Package! Super Strong Lip-Sync Effect, Speed Up, Supports Long Videos, Batch Generation, 8G Video Memory Available!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.bilibili.com/video/BV1ZgovYGE3u/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;strong&gt;HeyGem One-Click Package Windows Direct Run Without Docker Silicon-Based Open Source Digital Human&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;introduction&#34;&gt;Introduction
&lt;/h2&gt;&lt;img src=&#34;README_zh.assets/image-20250304114114272.png&#34;&gt;
&lt;p&gt;Heygem is a fully offline video synthesis tool designed for Windows systems that can precisely clone your appearance and voice, digitalizing your image. You can create videos by driving virtual avatars through text and voice. No internet connection is required, protecting your privacy while enjoying convenient and efficient digital experiences.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Core Features
&lt;ul&gt;
&lt;li&gt;Precise Appearance and Voice Cloning: Using advanced AI algorithms to capture human facial features with high precision, including facial features, contours, etc., to build realistic virtual models. It can also precisely clone voices, capturing and reproducing subtle characteristics of human voices, supporting various voice parameter settings to create highly similar cloning effects.&lt;/li&gt;
&lt;li&gt;Text and Voice-Driven Virtual Avatars: Understanding text content through natural language processing technology, converting text into natural and fluent speech to drive virtual avatars. Voice input can also be used directly, allowing virtual avatars to perform corresponding actions and facial expressions based on the rhythm and intonation of the voice, making the virtual avatar&amp;rsquo;s performance more natural and vivid.&lt;/li&gt;
&lt;li&gt;Efficient Video Synthesis: Highly synchronizing digital human video images with sound, achieving natural and smooth lip-syncing, intelligently optimizing audio-video synchronization effects.&lt;/li&gt;
&lt;li&gt;Multi-language Support: Scripts support eight languages - English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Key Advantages
&lt;ul&gt;
&lt;li&gt;Fully Offline Operation: No internet connection required, effectively protecting user privacy, allowing users to create in a secure, independent environment, avoiding potential data leaks during network transmission.&lt;/li&gt;
&lt;li&gt;User-Friendly: Clean and intuitive interface, easy to use even for beginners with no technical background, quickly mastering the software&amp;rsquo;s usage to start their digital human creation journey.&lt;/li&gt;
&lt;li&gt;Multiple Model Support: Supports importing multiple models and managing them through one-click startup packages, making it convenient for users to choose suitable models based on different creative needs and application scenarios.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Technical Support
&lt;ul&gt;
&lt;li&gt;Voice Cloning Technology: Using advanced technologies like artificial intelligence to generate similar or identical voices based on given voice samples, covering context, intonation, speed, and other aspects of speech.&lt;/li&gt;
&lt;li&gt;Automatic Speech Recognition: Technology that converts human speech vocabulary content into computer-readable input (text format), enabling computers to &amp;ldquo;understand&amp;rdquo; human speech.&lt;/li&gt;
&lt;li&gt;Computer Vision Technology: Used in video synthesis for visual processing, including facial recognition and lip movement analysis, ensuring virtual avatar lip movements match voice and text content.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;dependencies&#34;&gt;Dependencies
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;Nodejs 18&lt;/li&gt;
&lt;li&gt;Docker Images
&lt;ul&gt;
&lt;li&gt;docker pull guiji2025/fun-asr&lt;/li&gt;
&lt;li&gt;docker pull guiji2025/fish-speech-ziming&lt;/li&gt;
&lt;li&gt;docker pull guiji2025/heygem.ai&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;windows-installation&#34;&gt;Windows Installation
&lt;/h2&gt;&lt;h3 id=&#34;prerequisites&#34;&gt;Prerequisites
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Must have D Drive: Mainly used for storing digital human and project data&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Free space requirement: More than 30GB&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;C Drive: Used for storing service image files&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Free space requirement: More than 100GB&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If less than 100GB is available, after installing Docker, you can choose a different disk folder with more than 100GB of remaining space at the location shown below.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README_zh.assets/output.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;output&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;System Requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Currently supports Windows 10 19042.1526 or higher&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Recommended Configuration:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 13th Gen Intel Core i5-13400F&lt;/li&gt;
&lt;li&gt;Memory: 32GB&lt;/li&gt;
&lt;li&gt;Graphics Card: RTX 4070&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ensure you have an NVIDIA graphics card with properly installed drivers&lt;/p&gt;
&lt;p&gt;NVIDIA driver download link: &lt;a class=&#34;link&#34; href=&#34;https://www.nvidia.cn/drivers/lookup/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://www.nvidia.cn/drivers/lookup/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README_zh.assets/nvidia.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;nvidia&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;installing-windows-docker&#34;&gt;Installing Windows Docker
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Use the command &lt;code&gt;wsl --list --verbose&lt;/code&gt; to check if WSL is installed. If it shows as below, it&amp;rsquo;s already installed and no further installation is needed.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README_zh.assets/wsl-list.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;wsl-list&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;WSL installation command: &lt;code&gt;wsl --install&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;May fail due to network issues, try multiple times&lt;/li&gt;
&lt;li&gt;During installation, you&amp;rsquo;ll need to set and remember a new username and password&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;
&lt;p&gt;Update WSL using &lt;code&gt;wsl --update&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README_zh.assets/updatewsl.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;updatewsl&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.docker.com/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Download Docker for Windows&lt;/a&gt;, choose the appropriate installation package based on your CPU architecture.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When you see this interface, installation is successful.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README_zh.assets/61eb4c19-3e7a-4791-a266-de4209690cbd.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;61eb4c19-3e7a-4791-a266-de4209690cbd&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run Docker&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README_zh.assets/shortcut.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;shortcut&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Accept the agreement and skip login on first run&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README_zh.assets/accept.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;accept&#34;
	
	
&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README_zh.assets/576746d5-5215-4973-b1ca-c8d7409a6403.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;576746d5-5215-4973-b1ca-c8d7409a6403&#34;
	
	
&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README_zh.assets/9a10b7b2-1eea-48c1-b7af-34129fe04446.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;9a10b7b2-1eea-48c1-b7af-34129fe04446&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;installing-the-server&#34;&gt;Installing the Server
&lt;/h3&gt;&lt;p&gt;Installation using Docker, docker-compose as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The &lt;code&gt;docker-compose.yml&lt;/code&gt; file is in the &lt;code&gt;/deploy&lt;/code&gt; directory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Execute &lt;code&gt;docker-compose up -d&lt;/code&gt; in the &lt;code&gt;/deploy&lt;/code&gt; directory, &lt;u&gt;if you want to use the lite version, execute &lt;code&gt;docker-compose -f docker-compose-lite.yml up -d&lt;/code&gt;&lt;/u&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Wait patiently (about half an hour; speed depends on your network). The download will consume about 70GB of traffic, so make sure to use Wi-Fi.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When you see three services in Docker, it indicates success (the lite version has only one service &lt;code&gt;heygem-gen-video&lt;/code&gt;)&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://producthunt.programnotes.cn/README_zh.assets/e29d1922-7c58-46b4-b1e9-961f853f26d4.png&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;e29d1922-7c58-46b4-b1e9-961f853f26d4&#34;
	
	
&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;client&#34;&gt;Client
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;Directly download the &lt;a class=&#34;link&#34; href=&#34;https://github.com/GuijiAI/HeyGem.ai/releases&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;officially built installation package&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Double-click &lt;code&gt;HeyGem-x.x.x-setup.exe&lt;/code&gt; to install&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;ubuntu-2204-installation&#34;&gt;Ubuntu 22.04 Installation
&lt;/h2&gt;&lt;h3 id=&#34;recommended-configuration&#34;&gt;Recommended Configuration
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;CPU: 13th Gen Intel Core i5-13400F&lt;/li&gt;
&lt;li&gt;Memory: 32GB or more (required)&lt;/li&gt;
&lt;li&gt;Graphics Card: RTX-4070 (ensure you have an NVIDIA graphics card and the driver is correctly installed)&lt;/li&gt;
&lt;li&gt;Hard Disk: More than 100GB of free space&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;install-docker&#34;&gt;Install Docker
&lt;/h3&gt;&lt;blockquote&gt;
&lt;p&gt;First, check if Docker is installed using &lt;code&gt;docker --version&lt;/code&gt;. If it is installed, skip the following steps.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ol&gt;
&lt;li&gt;Directly download the &lt;a class=&#34;link&#34; href=&#34;https://github.com/GuijiAI/HeyGem.ai/releases&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;officially built installation package&lt;/a&gt; for the Linux version&lt;/li&gt;
&lt;li&gt;Double-click &lt;code&gt;HeyGem-x.x.x.AppImage&lt;/code&gt; to launch, no installation required&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote&gt;
&lt;p&gt;Reminder: On Ubuntu systems, if you are using the &lt;code&gt;root&lt;/code&gt; user to access the desktop, double-clicking &lt;code&gt;HeyGem-x.x.x.AppImage&lt;/code&gt; may not work. You need to execute &lt;code&gt;./HeyGem-x.x.x.AppImage --no-sandbox&lt;/code&gt; in the terminal, adding the &lt;code&gt;--no-sandbox&lt;/code&gt; parameter.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;open-apis&#34;&gt;Open APIs
&lt;/h2&gt;&lt;p&gt;We have opened APIs for model training and video synthesis. After Docker starts, several ports will be exposed locally, accessible through &lt;code&gt;http://127.0.0.1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For specific code, refer to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;src/main/service/model.js&lt;/li&gt;
&lt;li&gt;src/main/service/video.js&lt;/li&gt;
&lt;li&gt;src/main/service/voice.js&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;model-training&#34;&gt;Model Training
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;Separate video into silent video + audio&lt;/li&gt;
&lt;li&gt;Place audio in &lt;code&gt;D:\heygem_data\voice\data&lt;/code&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;D:\heygem_data\voice\data&lt;/code&gt; is the path agreed upon with the &lt;code&gt;guiji2025/fish-speech-ziming&lt;/code&gt; service; it can be modified in docker-compose&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;Call the &lt;code&gt;http://127.0.0.1:18180/v1/preprocess_and_tran&lt;/code&gt; interface
&lt;blockquote&gt;
&lt;p&gt;Parameter example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;format&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;.wav&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;reference_audio&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;xxxxxx/xxxxx.wav&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;lang&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;zh&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Response example:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;asr_format_audio_url&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;xxxx/x/xxx/xxx.wav&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;reference_audio_text&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;xxxxxxxxxxxx&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Record the response results as they will be needed for subsequent audio synthesis&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;audio-synthesis&#34;&gt;Audio Synthesis
&lt;/h3&gt;&lt;p&gt;Interface: &lt;code&gt;http://127.0.0.1:18180/v1/invoke&lt;/code&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;// Request parameters
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;speaker&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;{uuid}&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// A unique UUID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;text&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;xxxxxxxxxx&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Text content to synthesize
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;format&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;wav&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;topP&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mf&#34;&gt;0.7&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;max_new_tokens&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;1024&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;chunk_length&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;100&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;repetition_penalty&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mf&#34;&gt;1.2&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;temperature&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mf&#34;&gt;0.7&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;need_asr&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;false&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;streaming&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;kc&#34;&gt;false&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;is_fixed_seed&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;is_norm&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Fixed parameter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;reference_audio&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;{voice.asr_format_audio_url}&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Return value from previous &amp;#34;Model Training&amp;#34; step
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;reference_text&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;{voice.reference_audio_text}&amp;#34;&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Return value from previous &amp;#34;Model Training&amp;#34; step
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id=&#34;video-synthesis&#34;&gt;Video Synthesis
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Synthesis interface: &lt;code&gt;http://127.0.0.1:8383/easy/submit&lt;/code&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;9
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;// Request parameters
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;audio_url&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;{audioPath}&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Audio path
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;video_url&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;{videoPath}&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Video path
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;code&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;{uuid}&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Unique key
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;chaofen&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Fixed value
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;watermark_switch&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Fixed value
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;#34;pn&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt; &lt;span class=&#34;c1&#34;&gt;// Fixed value
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Progress query: &lt;code&gt;http://127.0.0.1:8383/easy/query?code=${taskCode}&lt;/code&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;GET request, the parameter &lt;code&gt;taskCode&lt;/code&gt; is the &lt;code&gt;code&lt;/code&gt; from the synthesis interface input above&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;self-check-steps-before-asking-questions&#34;&gt;Self-Check Steps Before Asking Questions
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Check if all three services are in Running status&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Confirm that your machine has an NVIDIA graphics card and drivers are correctly installed.&lt;/p&gt;
&lt;p&gt;All computing power for this project is local. The three services won&amp;rsquo;t start without an NVIDIA graphics card or proper drivers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ensure both server and client are updated to the latest version. The project is newly open-sourced, the community is very active, and updates are frequent. Your issue might have been resolved in a new version.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Server: Go to &lt;code&gt;/deploy&lt;/code&gt; directory and re-execute &lt;code&gt;docker-compose up -d&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Client: &lt;code&gt;pull&lt;/code&gt; the latest code and re-&lt;code&gt;build&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/GuijiAI/HeyGem.ai/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GitHub Issues&lt;/a&gt; are continuously updated, and issues are being resolved and closed daily. Check frequently; your issue might already be resolved.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;question-template&#34;&gt;Question Template
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Problem Description&lt;/p&gt;
&lt;p&gt;Describe the reproduction steps in detail, with screenshots if possible.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Provide Error Logs&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;How to get client logs:&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Server logs:&lt;/p&gt;
&lt;p&gt;Locate the relevant log entries, or click on each of our three Docker services and select &amp;ldquo;Copy&amp;rdquo;, as shown below.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;contact-us&#34;&gt;Contact Us
&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-gdscript3&#34; data-lang=&#34;gdscript3&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;n&#34;&gt;James&lt;/span&gt;&lt;span class=&#34;err&#34;&gt;@&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;toolwiz&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;com&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h2 id=&#34;license&#34;&gt;License
&lt;/h2&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;./LICENSE&#34; &gt;LICENSE&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;acknowledgments&#34;&gt;Acknowledgments
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;ASR based on &lt;a class=&#34;link&#34; href=&#34;https://github.com/modelscope/FunASR&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;fun-asr&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;TTS based on &lt;a class=&#34;link&#34; href=&#34;https://github.com/fishaudio/fish-speech&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;fish-speech-ziming&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;star-history&#34;&gt;Star History
&lt;/h2&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.star-history.com/#GuijiAI/HeyGem.ai&amp;amp;Date&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://api.star-history.com/svg?repos=GuijiAI/HeyGem.ai&amp;amp;type=Date&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Star History Chart&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
