Zerox OCR A High-Precision, Local OCR Powered by GPT-4o-mini

29 min read

If you're on the lookout for a high-precision, locally-run OCR tool that supports complex layouts, Zerox OCR is undoubtedly an excellent choice.

Zerox OCR converts PDF files into images first, which are then processed by the GPT-4o-mini model to recognize text and output it in Markdown format. All pages’ corresponding Markdown results are then consolidated into a complete Markdown document, streamlining the process of extracting information.

图片

Key Features

1. Zero-Shot OCR

Zerox OCR leverages the GPT-4o-mini model for text recognition, allowing it to handle unfamiliar PDFs, images, and other document types without requiring pre-trained data. This enables it to deliver high-precision OCR results effortlessly.

2. Markdown Output Format

Zerox converts each recognized page into a clean and simple Markdown format, which is particularly useful for developers and document processing professionals. This format facilitates easy post-processing and importing into other systems.

3. Support for Complex Documents

Zerox OCR isn’t just for simple text. It also handles documents containing tables, charts, and other complex layouts. Whether it’s scanned PDFs or other formats, Zerox can accurately recognize and extract text from them.

4. Local Operation and API Support

Zerox supports local operation, addressing privacy concerns by ensuring no data is leaked externally. It also provides an API for seamless integration into applications, enhancing workflow automation and efficiency.

Technical Stack

  • Python
  • JavaScript
  • TypeScript

How it Works

  1. File Submission: Zerox supports a wide range of file formats, including PDF, DOCX, and images. You can easily submit files for OCR processing.
  2. File-to-Image Conversion: The document is first converted into images for subsequent text recognition.
  3. GPT-4o-mini Recognition: Each generated image is processed by the GPT-4o-mini model for text recognition.
  4. Markdown Aggregation: The Markdown results from all pages are consolidated into a single complete Markdown document, facilitating further processing and analysis.

How to Install and Use Zerox

In addition to an online demo, Zerox OCR provides Node.js and Python API packages for integration.

Python Installation and Usage Example

pip install py-zerox

Here’s how to use it (make sure you configure the GPT API and other necessary parameters first):

from pyzerox import zerox
import asyncio

async def main():
    file_path = "https://omni-demo-data.s3.amazonaws.com/test/cs101.pdf"  # Local filepath and file URLs are supported
    select_pages = None  # Process all pages or only specific ones (1-indexed)
    output_dir = "./output_test"  # Directory to save the consolidated Markdown file
    
    result = await zerox(file_path=file_path, model=model, output_dir=output_dir, custom_system_prompt=custom_system_prompt, select_pages=select_pages, **kwargs)
    return result

# Run the main function
result = asyncio.run(main())

# Print the Markdown result
print(result)

Sample Output

Zerox will output the recognized text in Markdown format, ready for further use or import into other systems.

Conclusion

Zerox OCR is a powerful local open-source tool powered by GPT-4o-mini, capable of handling complex documents and outputting the results in Markdown format. It’s an ideal solution for users needing precise OCR processing. Whether you’re a developer or a professional dealing with large volumes of documents, Zerox OCR is worth a try.