<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Tesseract on Producthunt daily</title>
        <link>https://producthunt.programnotes.cn/en/tags/tesseract/</link>
        <description>Recent content in Tesseract on Producthunt daily</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Thu, 11 Sep 2025 15:31:10 +0800</lastBuildDate><atom:link href="https://producthunt.programnotes.cn/en/tags/tesseract/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>tesseract</title>
        <link>https://producthunt.programnotes.cn/en/p/tesseract/</link>
        <pubDate>Thu, 11 Sep 2025 15:31:10 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/tesseract/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1636114673156-052a83459fc1?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTc1NzU3Mzd8&amp;ixlib=rb-4.1.0" alt="Featured image of post tesseract" /&gt;&lt;h1 id=&#34;tesseract-ocrtesseract&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tesseract-ocr/tesseract&lt;/a&gt;
&lt;/h1&gt;&lt;h1 id=&#34;tesseract-ocr&#34;&gt;Tesseract OCR
&lt;/h1&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://scan.coverity.com/projects/tesseract-ocr&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://scan.coverity.com/projects/tesseract-ocr/badge.svg&#34; loading=&#34;lazy&#34; alt=&#34;Coverity Scan Build Status&#34;&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/security/code-scanning&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://github.com/tesseract-ocr/tesseract/workflows/CodeQL/badge.svg&#34; loading=&#34;lazy&#34; alt=&#34;CodeQL&#34;&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://issues.oss-fuzz.com/issues?q=is:open%20title:tesseract-ocr&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/oss--fuzz-fuzzing-brightgreen&#34; loading=&#34;lazy&#34; alt=&#34;OSS-Fuzz&#34;&gt;&lt;/a&gt;
&lt;br&gt;
&lt;a class=&#34;link&#34; href=&#34;https://raw.githubusercontent.com/tesseract-ocr/tesseract/main/LICENSE&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/license-Apache--2.0-blue.svg&#34; loading=&#34;lazy&#34; alt=&#34;GitHub license&#34;&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/releases/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/badge/download-all%20releases-brightgreen.svg&#34; loading=&#34;lazy&#34; alt=&#34;Downloads&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;table-of-contents&#34;&gt;Table of Contents
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#tesseract-ocr&#34; &gt;Tesseract OCR&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#about&#34; &gt;About&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#brief-history&#34; &gt;Brief history&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#installing-tesseract&#34; &gt;Installing Tesseract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#running-tesseract&#34; &gt;Running Tesseract&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#for-developers&#34; &gt;For developers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#support&#34; &gt;Support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#license&#34; &gt;License&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#dependencies&#34; &gt;Dependencies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;#latest-version-of-readme&#34; &gt;Latest Version of README&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;about&#34;&gt;About
&lt;/h2&gt;&lt;p&gt;This package contains an &lt;strong&gt;OCR engine&lt;/strong&gt; - &lt;code&gt;libtesseract&lt;/code&gt; and a &lt;strong&gt;command line program&lt;/strong&gt; - &lt;code&gt;tesseract&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Tesseract 4 adds a new neural net (LSTM) based &lt;a class=&#34;link&#34; href=&#34;https://en.wikipedia.org/wiki/Optical_character_recognition&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;OCR engine&lt;/a&gt; which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3, which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0).
It also needs &lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/Data-Files.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;traineddata&lt;/a&gt; files which support the legacy engine, for example those from the &lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tessdata&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tessdata&lt;/a&gt; repository.&lt;/p&gt;
&lt;p&gt;Stefan Weil is the current lead developer. Ray Smith was the lead developer until 2018. The maintainer is Zdenko Podobny. For a list of contributors see &lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/blob/main/AUTHORS&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;AUTHORS&lt;/a&gt;
and GitHub&amp;rsquo;s log of &lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/graphs/contributors&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;contributors&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Tesseract has &lt;strong&gt;unicode (UTF-8) support&lt;/strong&gt;, and can &lt;strong&gt;recognize &lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;more than 100 languages&lt;/a&gt;&lt;/strong&gt; &amp;ldquo;out of the box&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Tesseract supports &lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/InputFormats&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;various image formats&lt;/a&gt;&lt;/strong&gt; including PNG, JPEG and TIFF.&lt;/p&gt;
&lt;p&gt;Tesseract supports &lt;strong&gt;various output formats&lt;/strong&gt;: plain text, hOCR (HTML), PDF, invisible-text-only PDF, TSV, ALTO and PAGE.&lt;/p&gt;
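&lt;p&gt;As a minimal sketch, the output format is selected by appending config names to the command line (assuming a hypothetical sample image &lt;code&gt;page.png&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Plain text (default): writes out.txt
tesseract page.png out

# Searchable PDF: writes out.pdf
tesseract page.png out pdf

# hOCR, TSV and ALTO output: writes out.hocr, out.tsv and out.xml
tesseract page.png out hocr tsv alto
&lt;/code&gt;&lt;/pre&gt;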
&lt;p&gt;You should note that in many cases, in order to get better OCR results, you&amp;rsquo;ll need to &lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;improve the quality&lt;/a&gt; of the image&lt;/strong&gt; you are giving Tesseract.&lt;/p&gt;
&lt;p&gt;This project &lt;strong&gt;does not include a GUI application&lt;/strong&gt;. If you need one, please see the &lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/User-Projects-%E2%80%93-3rdParty.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;3rdParty&lt;/a&gt; documentation.&lt;/p&gt;
&lt;p&gt;Tesseract &lt;strong&gt;can be trained to recognize other languages&lt;/strong&gt;.
See &lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Tesseract Training&lt;/a&gt; for more information.&lt;/p&gt;
&lt;h2 id=&#34;brief-history&#34;&gt;Brief history
&lt;/h2&gt;&lt;p&gt;Tesseract was originally developed at Hewlett-Packard Laboratories Bristol, UK and at Hewlett-Packard Co., Greeley, Colorado, USA between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998. In 2005 Tesseract was open sourced by HP. From 2006 until November 2018 it was developed by Google.&lt;/p&gt;
&lt;p&gt;Major version 5 is the current stable version and started with release
&lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/releases/tag/5.0.0&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;5.0.0&lt;/a&gt; on November 30, 2021. Newer minor versions and bugfix versions are available from
&lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/releases/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Latest source code is available from &lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/tree/main&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;main branch on GitHub&lt;/a&gt;.
Open issues can be found in the &lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;issue tracker&lt;/a&gt;
and in the &lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/Planning.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;planning documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;See &lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/ReleaseNotes.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Release Notes&lt;/a&gt;&lt;/strong&gt;
and &lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/blob/main/ChangeLog&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Change Log&lt;/a&gt;&lt;/strong&gt; for more details of the releases.&lt;/p&gt;
&lt;h2 id=&#34;installing-tesseract&#34;&gt;Installing Tesseract
&lt;/h2&gt;&lt;p&gt;You can either &lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/Installation.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Install Tesseract via pre-built binary package&lt;/a&gt;
or &lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/Compiling.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;build it from source&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Before building Tesseract from source, please check that your system has a compiler which is one of the &lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/supported-compilers.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;supported compilers&lt;/a&gt;.&lt;/p&gt;
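&lt;p&gt;As a rough sketch, a typical source build on a Unix-like system follows the usual autotools steps (see the compilation guide for platform specifics and optional dependencies):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure
make
sudo make install
sudo ldconfig   # refresh the shared library cache on Linux
&lt;/code&gt;&lt;/pre&gt;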
&lt;h2 id=&#34;running-tesseract&#34;&gt;Running Tesseract
&lt;/h2&gt;&lt;p&gt;Basic &lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;command line usage&lt;/a&gt;&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For more information about the various command line options use &lt;code&gt;tesseract --help&lt;/code&gt; or &lt;code&gt;man tesseract&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Examples can be found in the &lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html#simplest-invocation-to-ocr-an-image&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;documentation&lt;/a&gt;.&lt;/p&gt;
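&lt;p&gt;For instance, a sketch combining the options from the synopsis above (assuming a hypothetical image &lt;code&gt;scan.png&lt;/code&gt; and an installed German traineddata file):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# OCR a German scan with the LSTM engine (--oem 1),
# treating the page as a single uniform block of text (--psm 6)
tesseract scan.png result -l deu --oem 1 --psm 6
# recognized text is written to result.txt
&lt;/code&gt;&lt;/pre&gt;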
&lt;h2 id=&#34;for-developers&#34;&gt;For developers
&lt;/h2&gt;&lt;p&gt;Developers can use &lt;code&gt;libtesseract&lt;/code&gt; &lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/blob/main/include/tesseract/capi.h&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;C&lt;/a&gt; or
&lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/blob/main/include/tesseract/baseapi.h&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;C++&lt;/a&gt; API to build their own application. If you need bindings to &lt;code&gt;libtesseract&lt;/code&gt; for other programming languages, please see the
&lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/AddOns.html#tesseract-wrappers&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;wrapper&lt;/a&gt; section in the AddOns documentation.&lt;/p&gt;
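&lt;p&gt;A minimal sketch of compiling against &lt;code&gt;libtesseract&lt;/code&gt;, assuming a hypothetical source file &lt;code&gt;myocr.cpp&lt;/code&gt; and the pkg-config files installed by Tesseract and Leptonica:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;g++ myocr.cpp -o myocr $(pkg-config --cflags --libs tesseract lept)
&lt;/code&gt;&lt;/pre&gt;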
&lt;p&gt;Documentation of Tesseract generated from source code by doxygen can be found on &lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tesseract-ocr.github.io&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;support&#34;&gt;Support
&lt;/h2&gt;&lt;p&gt;Before you submit an issue, please review &lt;strong&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/blob/main/CONTRIBUTING.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;the guidelines for this repository&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;For support, first read the &lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;documentation&lt;/a&gt;,
particularly the &lt;a class=&#34;link&#34; href=&#34;https://tesseract-ocr.github.io/tessdoc/FAQ.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;FAQ&lt;/a&gt; to see if your problem is addressed there.
If not, search the &lt;a class=&#34;link&#34; href=&#34;https://groups.google.com/g/tesseract-ocr&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Tesseract user forum&lt;/a&gt;, the &lt;a class=&#34;link&#34; href=&#34;https://groups.google.com/g/tesseract-dev&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Tesseract developer forum&lt;/a&gt; and &lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;past issues&lt;/a&gt;, and if you still can&amp;rsquo;t find what you need, ask for support in the mailing-lists.&lt;/p&gt;
&lt;p&gt;Mailing-lists:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://groups.google.com/g/tesseract-ocr&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tesseract-ocr&lt;/a&gt; - For tesseract users.&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://groups.google.com/g/tesseract-dev&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tesseract-dev&lt;/a&gt; - For tesseract developers.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Please report an issue only for a &lt;strong&gt;bug&lt;/strong&gt;, not for asking questions.&lt;/p&gt;
&lt;h2 id=&#34;license&#34;&gt;License
&lt;/h2&gt;&lt;pre&gt;&lt;code&gt;The code in this repository is licensed under the Apache License, Version 2.0 (the &amp;quot;License&amp;quot;);
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an &amp;quot;AS IS&amp;quot; BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: This software depends on other packages that may be licensed under different open source licenses.&lt;/p&gt;
&lt;p&gt;Tesseract uses &lt;a class=&#34;link&#34; href=&#34;http://leptonica.com/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Leptonica library&lt;/a&gt; which essentially
uses a &lt;a class=&#34;link&#34; href=&#34;http://leptonica.com/about-the-license.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;BSD 2-clause license&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;dependencies&#34;&gt;Dependencies
&lt;/h2&gt;&lt;p&gt;Tesseract uses &lt;a class=&#34;link&#34; href=&#34;https://github.com/DanBloomberg/leptonica&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Leptonica library&lt;/a&gt;
for opening input images (but not documents such as PDF).
It is recommended to build Leptonica with support for &lt;a class=&#34;link&#34; href=&#34;https://zlib.net&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;zlib&lt;/a&gt;,
&lt;a class=&#34;link&#34; href=&#34;https://sourceforge.net/projects/libpng&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;png&lt;/a&gt; and
&lt;a class=&#34;link&#34; href=&#34;http://www.simplesystems.org/libtiff&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;tiff&lt;/a&gt; (for multipage tiff).&lt;/p&gt;
&lt;h2 id=&#34;latest-version-of-readme&#34;&gt;Latest Version of README
&lt;/h2&gt;&lt;p&gt;For the latest online version of the README.md see:&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract/blob/main/README.md&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/tesseract-ocr/tesseract/blob/main/README.md&lt;/a&gt;&lt;/p&gt;
</description>
        </item>
        <item>
        <title>OCRmyPDF</title>
        <link>https://producthunt.programnotes.cn/en/p/ocrmypdf/</link>
        <pubDate>Sun, 13 Jul 2025 15:31:01 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/ocrmypdf/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1582575633518-b2b7d205a386?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NTIzOTE3NDd8&amp;ixlib=rb-4.1.0" alt="Featured image of post OCRmyPDF" /&gt;&lt;h1 id=&#34;ocrmypdfocrmypdf&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/ocrmypdf/OCRmyPDF&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;ocrmypdf/OCRmyPDF&lt;/a&gt;
&lt;/h1&gt;&lt;!-- SPDX-FileCopyrightText: 2014 Julien Pfefferkorn --&gt;
&lt;!-- SPDX-FileCopyrightText: 2015 James R. Barlow --&gt;
&lt;!-- SPDX-License-Identifier: CC-BY-SA-4.0 --&gt;
&lt;img src=&#34;https://raw.githubusercontent.com/ocrmypdf/OCRmyPDF/main/docs/images/logo.svg&#34; width=&#34;240&#34; alt=&#34;OCRmyPDF&#34;&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/ocrmypdf/OCRmyPDF/actions/workflows/build.yml&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://github.com/ocrmypdf/OCRmyPDF/actions/workflows/build.yml/badge.svg&#34; loading=&#34;lazy&#34; alt=&#34;Build Status&#34;&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://pypi.org/project/ocrmypdf/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/pypi/v/ocrmypdf.svg&#34; loading=&#34;lazy&#34; alt=&#34;PyPI version&#34;&gt;&lt;/a&gt;
&lt;img src=&#34;https://img.shields.io/homebrew/v/ocrmypdf.svg&#34; loading=&#34;lazy&#34; alt=&#34;Homebrew version&#34;&gt;
&lt;img src=&#34;https://readthedocs.org/projects/ocrmypdf/badge/?version=latest&#34; loading=&#34;lazy&#34; alt=&#34;ReadTheDocs&#34;&gt;
&lt;img src=&#34;https://img.shields.io/pypi/pyversions/ocrmypdf&#34; loading=&#34;lazy&#34; alt=&#34;Python versions&#34;&gt;&lt;/p&gt;
&lt;p&gt;OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ocrmypdf                      &lt;span class=&#34;c1&#34;&gt;# it&amp;#39;s a scriptable command line program&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   -l eng+fra                 &lt;span class=&#34;c1&#34;&gt;# it supports multiple languages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   --rotate-pages             &lt;span class=&#34;c1&#34;&gt;# it can fix pages that are misrotated&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   --deskew                   &lt;span class=&#34;c1&#34;&gt;# it can deskew crooked PDFs!&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   --title &lt;span class=&#34;s2&#34;&gt;&amp;#34;My PDF&amp;#34;&lt;/span&gt;           &lt;span class=&#34;c1&#34;&gt;# it can change output metadata&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   --jobs &lt;span class=&#34;m&#34;&gt;4&lt;/span&gt;                   &lt;span class=&#34;c1&#34;&gt;# it uses multiple cores by default&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   --output-type pdfa         &lt;span class=&#34;c1&#34;&gt;# it produces PDF/A by default&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   input_scanned.pdf          &lt;span class=&#34;c1&#34;&gt;# takes PDF input (or images)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;   output_searchable.pdf      &lt;span class=&#34;c1&#34;&gt;# produces validated PDF output&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://ocrmypdf.readthedocs.io/en/latest/release_notes.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;See the release notes for details on the latest changes&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;main-features&#34;&gt;Main features
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Generates a searchable &lt;a class=&#34;link&#34; href=&#34;https://en.wikipedia.org/?title=PDF/A&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;PDF/A&lt;/a&gt; file from a regular PDF&lt;/li&gt;
&lt;li&gt;Places OCR text accurately below the image to ease copy / paste&lt;/li&gt;
&lt;li&gt;Keeps the exact resolution of the original embedded images&lt;/li&gt;
&lt;li&gt;When possible, inserts OCR information as a &amp;ldquo;lossless&amp;rdquo; operation without disrupting any other content&lt;/li&gt;
&lt;li&gt;Optimizes PDF images, often producing files smaller than the input file&lt;/li&gt;
&lt;li&gt;If requested, deskews and/or cleans the image before performing OCR&lt;/li&gt;
&lt;li&gt;Validates input and output files&lt;/li&gt;
&lt;li&gt;Distributes work across all available CPU cores&lt;/li&gt;
&lt;li&gt;Uses &lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tesseract&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Tesseract OCR&lt;/a&gt; engine to recognize more than &lt;a class=&#34;link&#34; href=&#34;https://github.com/tesseract-ocr/tessdata&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;100 languages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Keeps your private data private.&lt;/li&gt;
&lt;li&gt;Scales properly to handle files with thousands of pages.&lt;/li&gt;
&lt;li&gt;Battle-tested on millions of PDFs.&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://raw.githubusercontent.com/ocrmypdf/OCRmyPDF/main/misc/screencast/demo.svg&#34; alt=&#34;Demo of OCRmyPDF in a terminal session&#34;&gt;
&lt;p&gt;For details: please consult the &lt;a class=&#34;link&#34; href=&#34;https://ocrmypdf.readthedocs.io/en/latest/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;motivation&#34;&gt;Motivation
&lt;/h2&gt;&lt;p&gt;I searched the web for a free command line tool to OCR PDF files: I found many, but none of them were really satisfying:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Either they produced PDF files with misplaced text under the image (making copy/paste impossible)&lt;/li&gt;
&lt;li&gt;Or they did not handle accents and multilingual characters&lt;/li&gt;
&lt;li&gt;Or they changed the resolution of the embedded images&lt;/li&gt;
&lt;li&gt;Or they generated ridiculously large PDF files&lt;/li&gt;
&lt;li&gt;Or they crashed when trying to OCR&lt;/li&gt;
&lt;li&gt;Or they did not produce valid PDF files&lt;/li&gt;
&lt;li&gt;On top of that, none of them produced PDF/A files (a format designed for long-term storage)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&amp;hellip;so I decided to develop my own tool.&lt;/p&gt;
&lt;h2 id=&#34;installation&#34;&gt;Installation
&lt;/h2&gt;&lt;p&gt;Linux, Windows, macOS and FreeBSD are supported. Docker images are also available, for both x64 and ARM.&lt;/p&gt;
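&lt;p&gt;For the Docker route, a minimal sketch using the project&amp;rsquo;s published image (see the installation documentation for the current image name and recommended options):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker pull jbarlow83/ocrmypdf
# read the input PDF from stdin and write the OCRed result to stdout
docker run --rm -i jbarlow83/ocrmypdf - - &amp;lt;input.pdf &amp;gt;output.pdf
&lt;/code&gt;&lt;/pre&gt;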
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Operating system&lt;/th&gt;
          &lt;th&gt;Install command&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Debian, Ubuntu&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;apt install ocrmypdf&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Windows Subsystem for Linux&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;apt install ocrmypdf&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Fedora&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;dnf install ocrmypdf&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;macOS (Homebrew)&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;brew install ocrmypdf&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;macOS (MacPorts)&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;port install ocrmypdf&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;macOS (nix)&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;nix-env -i ocrmypdf&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;LinuxBrew&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;brew install ocrmypdf&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;FreeBSD&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;pkg install py-ocrmypdf&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Ubuntu Snap&lt;/td&gt;
          &lt;td&gt;&lt;code&gt;snap install ocrmypdf&lt;/code&gt;&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For everyone else, &lt;a class=&#34;link&#34; href=&#34;https://ocrmypdf.readthedocs.io/en/latest/installation.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;see our documentation&lt;/a&gt; for installation steps.&lt;/p&gt;
&lt;h2 id=&#34;languages&#34;&gt;Languages
&lt;/h2&gt;&lt;p&gt;OCRmyPDF uses Tesseract for OCR, and relies on its language packs. Linux users can often find distribution packages that provide them:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Display a list of all Tesseract language packs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;apt-cache search tesseract-ocr
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Debian/Ubuntu users&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;apt-get install tesseract-ocr-chi-sim  &lt;span class=&#34;c1&#34;&gt;# Example: Install Chinese Simplified language pack&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Arch Linux users&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pacman -S tesseract-data-eng tesseract-data-deu &lt;span class=&#34;c1&#34;&gt;# Example: Install the English and German language packs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# brew macOS users&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;brew install tesseract-lang
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;You can then pass the &lt;code&gt;-l LANG&lt;/code&gt; argument to OCRmyPDF to give a hint as to what languages it should search for. Multiple languages can be requested.&lt;/p&gt;
&lt;p&gt;OCRmyPDF supports Tesseract 4.1.1+. It will automatically use whichever version it finds first on the &lt;code&gt;PATH&lt;/code&gt; environment variable. On Windows, if &lt;code&gt;PATH&lt;/code&gt; does not provide a Tesseract binary, we use the highest version number that is installed according to the Windows Registry.&lt;/p&gt;
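&lt;p&gt;As a quick sanity check, you can confirm which versions are visible on your &lt;code&gt;PATH&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tesseract --version   # should report 4.1.1 or newer
ocrmypdf --version
&lt;/code&gt;&lt;/pre&gt;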
&lt;h2 id=&#34;documentation-and-support&#34;&gt;Documentation and support
&lt;/h2&gt;&lt;p&gt;Once OCRmyPDF is installed, the built-in help, which explains the command syntax and options, can be accessed via:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ocrmypdf --help
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Our &lt;a class=&#34;link&#34; href=&#34;https://ocrmypdf.readthedocs.io/en/latest/index.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;documentation is served on Read the Docs&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Please report issues on our &lt;a class=&#34;link&#34; href=&#34;https://github.com/ocrmypdf/OCRmyPDF/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;GitHub issues&lt;/a&gt; page, and follow the issue template for quick response.&lt;/p&gt;
&lt;h2 id=&#34;feature-demo&#34;&gt;Feature demo
&lt;/h2&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Add an OCR layer and convert to PDF/A&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ocrmypdf input.pdf output.pdf
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Convert an image to single page PDF&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ocrmypdf input.jpg output.pdf
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Add OCR to a file in place (only modifies file on success)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ocrmypdf myfile.pdf myfile.pdf
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# OCR with non-English languages (look up your language&amp;#39;s ISO 639-3 code)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ocrmypdf -l fra LeParisien.pdf LeParisien.pdf
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# OCR multilingual documents&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ocrmypdf -l eng+fra Bilingual-English-French.pdf Bilingual-English-French.pdf
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Deskew (straighten crooked pages)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ocrmypdf --deskew input.pdf output.pdf
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;For more features, see the &lt;a class=&#34;link&#34; href=&#34;https://ocrmypdf.readthedocs.io/en/latest/index.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;requirements&#34;&gt;Requirements
&lt;/h2&gt;&lt;p&gt;In addition to the required Python version, OCRmyPDF requires installations of the external programs Ghostscript and Tesseract OCR. OCRmyPDF itself is pure Python, and runs on pretty much everything: Linux, macOS, Windows and FreeBSD.&lt;/p&gt;
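As a hedged sketch for one common platform, on a Debian/Ubuntu system the external dependencies and OCRmyPDF itself can typically be installed as follows (package names are the standard Debian ones; consult the documentation for other platforms):

```shell
# Install the external programs OCRmyPDF shells out to
sudo apt-get install -y ghostscript tesseract-ocr

# Install OCRmyPDF itself from PyPI (it is pure Python)
pip install ocrmypdf

# Verify the installation
ocrmypdf --version
```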
&lt;h2 id=&#34;press--media&#34;&gt;Press &amp;amp; Media
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://medium.com/@ikirichenko/going-paperless-with-ocrmypdf-e2f36143f46a&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Going paperless with OCRmyPDF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://medium.com/@treyharris/converting-a-scanned-document-into-a-compressed-searchable-pdf-with-redactions-63f61c34fe4c&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Converting a scanned document into a compressed searchable PDF with redactions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://heise.de/-2279695&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;c&amp;rsquo;t 1-2014, page 59&lt;/a&gt;: Detailed presentation of OCRmyPDF v1.0 in the leading German IT magazine c&amp;rsquo;t&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://heise.de/-2356670&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;heise Open Source, 09/2014: Texterkennung mit OCRmyPDF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.heise.de/ratgeber/Durchsuchbare-PDF-Dokumente-mit-OCRmyPDF-erstellen-4607592.html&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;heise Durchsuchbare PDF-Dokumente mit OCRmyPDF erstellen&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.linuxlinks.com/excellent-utilities-ocrmypdf-add-ocr-text-layer-scanned-pdfs/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Excellent Utilities: OCRmyPDF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.linux-community.de/ausgaben/linuxuser/2021/06/texterkennung-mit-ocrmypdf-und-scanbd-automatisieren/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;LinuxUser Texterkennung mit OCRmyPDF und Scanbd automatisieren&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://news.ycombinator.com/item?id=32028752&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Y Combinator discussion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;business-enquiries&#34;&gt;Business enquiries
&lt;/h2&gt;&lt;p&gt;OCRmyPDF would not be the software that it is today without companies and users choosing to provide support for feature development and consulting enquiries. We are happy to discuss all enquiries, whether for extending the existing feature set, or integrating OCRmyPDF into a larger system.&lt;/p&gt;
&lt;h2 id=&#34;license&#34;&gt;License
&lt;/h2&gt;&lt;p&gt;The OCRmyPDF software is licensed under the Mozilla Public License 2.0 (MPL-2.0). This license permits integration of OCRmyPDF with other code, including commercial and closed source software, but asks you to publish source-level modifications you make to OCRmyPDF.&lt;/p&gt;
&lt;p&gt;Some components of OCRmyPDF have other licenses, as indicated by standard SPDX license identifiers or the DEP5 copyright and licensing information file. Generally speaking, non-core code is licensed under MIT, and the documentation and test files are licensed under Creative Commons Attribution-ShareAlike 4.0 (CC-BY-SA 4.0).&lt;/p&gt;
&lt;h2 id=&#34;disclaimer&#34;&gt;Disclaimer
&lt;/h2&gt;&lt;p&gt;The software is distributed on an &amp;ldquo;AS IS&amp;rdquo; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
