textra: A macOS Command-Line OCR Tool Built on Apple Vision

This article introduces textra, a macOS command-line tool built on Apple's Vision and Speech APIs that extracts text from images, PDFs, and audio files.

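Project Overview

textra is a command-line application for macOS. It uses Apple's native Vision API for optical character recognition (OCR) and the Speech API for speech recognition to extract text. It supports several input formats and output options, which makes it a good fit for developers who need to process documents in batches.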

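Key Features

OCR: extracts text from images (PNG, JPEG, and so on) and PDF files
Audio transcription: extracts text from audio files (M4A and so on)
Multiple output formats:
- Standard output (stdout)
- A single text file
- One text file per page
- JSON position data (the location of each piece of text in the source document)
Batch processing: accepts multiple files in a single invocation
Locale setting: a locale can be specified to improve recognition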

System Requirements

macOS 13 or later

Installation

Because textra is a native macOS application written in Swift, it can be installed as follows:

bash

# Build from source (requires Xcode)
git clone https://github.com/freedmand/textra.git
cd textra
swift build -c release

# Copy the build product to a directory on PATH
cp .build/release/textra /usr/local/bin/

Quick Start

Basic Usage

bash

# Extract text from an image
textra image.png

# Extract text from a PDF
textra document.pdf

# Extract text from an audio file
textra audio.m4a

# Process several files in one invocation
textra file1.png file2.pdf file3.m4a

Output Options

bash

# Write the text to a file
textra image.png -o output.txt

# Write JSON position data
textra image.png --outputPositions output.json

# Write one file per page (useful for multi-page PDFs)
textra document.pdf -t page-{}.txt

# Silent mode (nothing is printed to stdout)
textra image.png -s -o output.txt

Specifying a Locale

bash

# Recognize text as English
textra image.png -l en -o output.txt

Usage Examples

Scenario 1: Extracting Text from an Image

bash

# Print the image text to the console
textra screenshot.png

# Save the image text to a file
textra screenshot.png -o extracted.txt

Scenario 2: Processing a Multi-Page PDF

bash

# Extract the text of the whole PDF into a single file
textra document.pdf -o full.txt

# Write each page to its own file
textra document.pdf -t pages/page-{}.txt

Scenario 3: Audio Transcription

bash

# Transcribe audio silently, writing position data and the transcript to files
textra audio.m4a -s --outputPositions audio.json --outputText audio.txt

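Test Code Walkthrough

textra's test code exercises its command-line interface and verifies its behavior:

Test Structure

python

# Test runner: execute the command, then pass its results to every assertion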
def run(cmd, assertions):
    stdout, stderr, new_file_contents = run_test(cmd)
    for assertion in assertions:
        assertion(stdout, stderr, new_file_contents)
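
The excerpt above calls `run_test` without showing it. Here is a minimal sketch of what such a helper might look like. It assumes the helper runs the command in a scratch directory and reports every file the command creates. The body is hypothetical and is not taken from the textra repository. With this design, input files would need to be referenced by absolute path or copied into the scratch directory first.

```python
import os
import shlex
import subprocess
import sys
import tempfile


def run_test(cmd):
    """Run cmd in a fresh temporary directory.

    Returns (stdout, stderr, new_file_contents), where new_file_contents maps
    each created file's relative path to its text content.
    Hypothetical sketch; the real helper may work differently.
    """
    # Accept either a shell-style string or an argument list.
    args = cmd if isinstance(cmd, list) else shlex.split(cmd)
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(args, cwd=workdir, capture_output=True, text=True)
        new_file_contents = {}
        # The directory started empty, so every file found is new.
        for root, _dirs, files in os.walk(workdir):
            for name in files:
                path = os.path.join(root, name)
                rel = os.path.relpath(path, workdir).replace(os.sep, "/")
                with open(path, encoding="utf-8") as handle:
                    new_file_contents[rel] = handle.read()
    return result.stdout, result.stderr, new_file_contents
```

Starting from an empty directory keeps the "new files" check simple: there is no need to diff the directory before and after the command runs.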

Main Test Cases

Help and Version Information

python

run("textra -h", [assert_no_file_contents, assert_stderr_matches("textra -h")])
run("textra -v", [assert_stderr_matches(r"\d+\.\d+\.\d+")])

Basic Text Extraction

python

run("textra docp1.png", [assert_stdout, assert_no_error])

File Output Test

python

run("textra docp1.png -o docp1.txt", [assert_files(["docp1.txt"])])

Paged Output Test

python

run("textra doc_3.pdf -o doc.txt -t doc/page-{}.txt",
    [assert_files(["doc.txt", "doc/page-1.txt", "doc/page-2.txt", "doc/page-3.txt"])])
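
The expected file names in this test suggest that `{}` in the page template is replaced by a 1-based page number. A small sketch of that substitution follows. `expand_page_template` is a hypothetical helper written for illustration, not textra's own code.

```python
def expand_page_template(template, page_count):
    """Return one output path per page, substituting 1-based page numbers for {}."""
    return [template.replace("{}", str(page)) for page in range(1, page_count + 1)]


# Mirrors the three-page PDF used in the test above.
paths = expand_page_template("doc/page-{}.txt", 3)
print(paths)
```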

Error Handling Test

python

run("textra --invalidoption", [assert_has_error("invalid argument")])
run("textra test.docx", [assert_has_error("file type is not supported")])
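
The assertion helpers used in these test cases are not shown in the excerpt. One plausible shape is sketched below. It assumes every assertion is a callable taking `(stdout, stderr, new_file_contents)`, that parameterized helpers are factories returning such a callable, and that error messages arrive on stderr. All three points are assumptions, and the bodies are illustrative only.

```python
import re


def assert_no_error(stdout, stderr, new_file_contents):
    """Plain assertion: the command wrote nothing to stderr (assumed meaning)."""
    assert stderr == "", f"unexpected stderr: {stderr!r}"


def assert_stderr_matches(pattern):
    """Factory: build an assertion that stderr matches a regular expression."""
    def check(stdout, stderr, new_file_contents):
        assert re.search(pattern, stderr), f"stderr does not match {pattern!r}"
    return check


def assert_has_error(message):
    """Factory: build an assertion that stderr contains the given message."""
    def check(stdout, stderr, new_file_contents):
        assert message in stderr, f"{message!r} not found in stderr"
    return check


def assert_files(expected_names):
    """Factory: build an assertion that exactly these files were created."""
    def check(stdout, stderr, new_file_contents):
        assert sorted(new_file_contents) == sorted(expected_names)
    return check
```

Factories keep each test case readable as a single line: the command string, followed by a list of checks configured inline.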

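Command-Line Option Reference

Option	Description
`-h, --help`	Show help information
`-v, --version`	Show the version number
`-o, --outputText`	Path of the output text file
`-p, --outputPositions`	Path of the output JSON position data file
`-t, --outputPageText`	Path template for per-page output files
`-s, --silent`	Silent mode; nothing is printed to stdout
`-l, --locale`	Locale to use (for example `en`, `zh-Hans`)
`-x, --outputStdout`	Force output to stdout (when used together with -o)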
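
For scripted use, the options in this table can be assembled programmatically. The sketch below builds an argument list suitable for `subprocess.run`. `build_textra_command` and its parameter names are hypothetical. The flag names come from the table, and the ordering (input files first, then options) follows the examples in this article.

```python
def build_textra_command(files, output_text=None, output_positions=None,
                         output_page_text=None, locale=None, silent=False):
    """Assemble a textra argument list from keyword options (illustrative helper)."""
    cmd = ["textra", *files]
    if silent:
        cmd.append("--silent")
    if locale:
        cmd += ["--locale", locale]
    if output_positions:
        cmd += ["--outputPositions", output_positions]
    if output_text:
        cmd += ["--outputText", output_text]
    if output_page_text:
        cmd += ["--outputPageText", output_page_text]
    return cmd


# Reproduces the audio transcription example from Scenario 3.
cmd = build_textra_command(["audio.m4a"], silent=True,
                           output_positions="audio.json",
                           output_text="audio.txt")
print(" ".join(cmd))
```

Passing the result as a list to `subprocess.run(cmd)` avoids shell quoting problems with file names that contain spaces.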

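Notes

Requires macOS 13 or later
Supported file types: PNG, JPEG, PDF, and audio formats such as M4A
Formats such as Word documents (.docx) are not supported
Uses Apple's native APIs, so no separate OCR engine needs to be installed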
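
When batch-processing a mixed set of files, it can help to filter out unsupported types before invoking textra. The sketch below does that by file extension. The extension set is an assumption based only on the formats named in this article, and textra may accept more. `is_supported` and `split_inputs` are hypothetical helpers.

```python
from pathlib import Path

# Assumed list, derived from the formats mentioned in this article only.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf", ".m4a"}


def is_supported(path):
    """Return True if the file extension is in the assumed supported set."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_inputs(paths):
    """Partition paths into (supported, skipped), preserving input order."""
    supported = [p for p in paths if is_supported(p)]
    skipped = [p for p in paths if not is_supported(p)]
    return supported, skipped


supported, skipped = split_inputs(["scan.PNG", "test.docx", "report.pdf"])
print("process:", supported)
print("skip:", skipped)
```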

Project Links

GitHub repository: https://github.com/freedmand/textra
License: MIT License
