OmniBOR Provenance
Experimental
Software Heritage provenance verification for source files. Checks whether source files match known open source versions using the Software Heritage archive, detecting exact matches and modified files via fuzzy matching.
Overview
You can't reason about the security of software you can't identify. OmniBOR Provenance answers a deceptively simple question: where did this file actually come from? It computes the file's git blob SHA-1 (the hash git assigns to the file's content) and queries the Software Heritage archive, the universal source code archive, for a match. When a file has been modified, fuzzy matching finds the closest known version and reports exactly what changed.
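As an illustration of the hashing step, here is a minimal Python sketch of how a git blob SHA-1 is derived. It is not the tool's own Rust code; the function names `git_blob_sha1` and `git_blob_sha1_of_file` are hypothetical and exist only for this example. Git hashes the header `blob <size>\0` followed by the raw file bytes, so the result differs from a plain SHA-1 of the file.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the SHA-1 that git assigns to a blob with this content."""
    # Git prefixes the content with "blob <byte length>\0" before hashing.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def git_blob_sha1_of_file(path: str) -> str:
    """Hash a file on disk the way `git hash-object` would (hypothetical helper)."""
    with open(path, "rb") as f:  # binary mode: no newline translation
        return git_blob_sha1(f.read())


if __name__ == "__main__":
    # The empty blob has a well-known git hash.
    print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the hash covers only the content, two files with identical bytes get the same identifier regardless of filename or location.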
The tool was extracted from ebomf v0.2.0 as part of an effort to keep ebomf focused on eBPF build tracing, splitting provenance and vulnerability concerns into standalone Rust tools.
How It Works
- Compute the git SHA-1 (blob hash) of the target file
- Query the Software Heritage archive for an exact match
- If there is no exact match, fuzzy-search for similar files
- Filter candidates by filename and size
- Compute line-based similarity (LCS) and return the best match with modification details
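The filtering and similarity steps above can be sketched in Python as follows. This is an illustrative approximation, not the tool's implementation: the names `lcs_length`, `line_similarity`, and `best_match`, the `2 * LCS / (len(a) + len(b))` normalization, the default threshold of 0.8, and the size-ratio cutoff are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two line lists."""
    prev = [0] * (len(b) + 1)  # DP row for the previous line of `a`
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def line_similarity(target: str, candidate: str) -> float:
    """Similarity in [0, 1]; this normalization is an assumed choice."""
    a, b = target.splitlines(), candidate.splitlines()
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def best_match(
    target_name: str,
    target_text: str,
    candidates: List[Tuple[str, str]],  # (filename, content) pairs
    threshold: float = 0.8,             # hypothetical default
    max_size_ratio: float = 2.0,        # hypothetical size filter
) -> Optional[Tuple[str, float]]:
    """Return (content, score) of the closest candidate, or None."""
    best = None
    for name, text in candidates:
        if name != target_name:  # filter by filename
            continue
        small, large = sorted((len(target_text), len(text)))
        if small == 0 or large / small > max_size_ratio:  # filter by size
            continue
        score = line_similarity(target_text, text)
        if score >= threshold and (best is None or score > best[1]):
            best = (text, score)
    return best
```

Filtering by name and size before scoring keeps the quadratic LCS computation off candidates that could never be a plausible match.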
Features
- Origin checking - Verify a single file, or recurse through a whole tree
- Fuzzy matching - Identify modified files with a tunable similarity threshold, with optional unified diff output
- CI/CD friendly - JSON output for pipeline integration
- Direct Software Heritage queries - Look up a git SHA-1 against the archive directly
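For the direct lookup feature, a git SHA-1 maps onto Software Heritage identifiers in a straightforward way. The Python sketch below only builds the identifier and lookup URL; it makes no network request. It assumes the public Software Heritage Web API content endpoint and the `swh:1:cnt:` SWHID scheme. Whether the tool uses this exact endpoint is not stated here, and the function names are hypothetical.

```python
import re

# Base URL of the public Software Heritage Web API (assumed for this sketch).
SWH_API = "https://archive.softwareheritage.org/api/1"


def _normalize(sha1_git: str) -> str:
    """Lowercase and validate a 40-character hex git SHA-1."""
    h = sha1_git.strip().lower()
    if not re.fullmatch(r"[0-9a-f]{40}", h):
        raise ValueError(f"not a git SHA-1: {sha1_git!r}")
    return h


def swhid_for_content(sha1_git: str) -> str:
    """Build the SWHID for a file content object from its git blob hash."""
    return f"swh:1:cnt:{_normalize(sha1_git)}"


def content_lookup_url(sha1_git: str) -> str:
    """Build the Web API URL that looks up a content object by git SHA-1."""
    return f"{SWH_API}/content/sha1_git:{_normalize(sha1_git)}/"


if __name__ == "__main__":
    empty_blob = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
    print(swhid_for_content(empty_blob))
    print(content_lookup_url(empty_blob))
```

A client would issue a GET against the URL and treat a "not found" response as the cue to fall back to fuzzy matching.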
Philosophy
Supply-chain trust starts with knowing what you actually have. Provenance verification is unglamorous, foundational work—the kind of boring infrastructure that quietly makes everything downstream more defensible.