If you're on the lookout for a high-precision, locally-run OCR tool that supports complex layouts, Zerox OCR is undoubtedly an excellent choice.
Zerox OCR converts PDF files into images first, which are then processed by the GPT-4o-mini model to recognize text and output it in Markdown format. All pages’ corresponding Markdown results are then consolidated into a complete Markdown document, streamlining the process of extracting information.
Key Features
1. Zero-Shot OCR
Zerox OCR leverages the GPT-4o-mini model for text recognition, allowing it to handle unfamiliar PDFs, images, and other document types without requiring pre-trained data. This enables it to deliver high-precision OCR results effortlessly.
2. Markdown Output Format
Zerox converts each recognized page into a clean and simple Markdown format, which is particularly useful for developers and document processing professionals. This format facilitates easy post-processing and importing into other systems.
3. Support for Complex Documents
Zerox OCR isn’t just for simple text. It also handles documents containing tables, charts, and other complex layouts. Whether it’s scanned PDFs or other formats, Zerox can accurately recognize and extract text from them.
4. Local Operation and API Support
Zerox supports local operation, addressing privacy concerns by ensuring no data is leaked externally. It also provides an API for seamless integration into applications, enhancing workflow automation and efficiency.
Technical Stack
- Python
- JavaScript
- TypeScript
How it Works
- File Submission: Zerox supports a wide range of file formats, including PDF, DOCX, and images. You can easily submit files for OCR processing.
- File-to-Image Conversion: The document is first converted into images for subsequent text recognition.
- GPT-4o-mini Recognition: Each generated image is processed by the GPT-4o-mini model for text recognition.
- Markdown Aggregation: The Markdown results from all pages are consolidated into a single complete Markdown document, facilitating further processing and analysis.
How to Install and Use Zerox
In addition to an online demo, Zerox OCR provides Node.js and Python API packages for integration.
Python Installation and Usage Example
pip install py-zerox
Here’s how to use it (make sure you configure the GPT API and other necessary parameters first):
from pyzerox import zerox import asyncio async def main(): file_path = "https://omni-demo-data.s3.amazonaws.com/test/cs101.pdf" # Local filepath and file URLs are supported select_pages = None # Process all pages or only specific ones (1-indexed) output_dir = "./output_test" # Directory to save the consolidated Markdown file result = await zerox(file_path=file_path, model=model, output_dir=output_dir, custom_system_prompt=custom_system_prompt, select_pages=select_pages, **kwargs) return result # Run the main function result = asyncio.run(main()) # Print the Markdown result print(result)
Sample Output
Zerox will output the recognized text in Markdown format, ready for further use or import into other systems.
Conclusion
Zerox OCR is a powerful local open-source tool powered by GPT-4o-mini, capable of handling complex documents and outputting the results in Markdown format. It’s an ideal solution for users needing precise OCR processing. Whether you’re a developer or a professional dealing with large volumes of documents, Zerox OCR is worth a try.
Project Links
- GitHub: https://github.com/getomni-ai/zerox
- Online Demo: https://getomni.ai/ocr-demo