kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 50+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, and TypeScript (Node/Bun/Wasm/Deno), or usable via CLI, REST API, or MCP server.

Topics

document-intelligence

elixir

ffi

golang

java

metadata-extraction

node

pdf-extraction

pdfium

php

python

rag

ruby

rust

table-extraction

tesseract

text-extraction

wasm