kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 50+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, and TypeScript (Node/Bun/Wasm/Deno), or use via CLI, REST API, or MCP server.

Topics

document-intelligence
elixir
ffi
golang
java
metadata-extraction
node
pdf-extraction
pdfium
php
python
rag
ruby
rust
table-extraction
tesseract
text-extraction
wasm