Threat model
This document records the threat model for legal-text-mcp-de using
the STRIDE framework. It is reviewed at each major release.
Scope
Components in scope:
- MCP transport (streamable HTTP, default port 8001).
- HTTP API (FastAPI, default port 8080).
- Dataset loader (validates and loads
DATASET_PATH content).
- Source discovery / generation pipeline
(prepare_data/prepare_gesetze_im_internet.sh and the runtime
fetchers in mcp/legal_texts/).
Out of scope:
- Underlying Python and uv runtime (assumed trustworthy when
installed from verified sources).
- The Anthropic MCP SDK (the mcp PyPI package), relied upon as a
trusted dependency; its security updates are tracked via Dependabot.
- External sources (gesetze-im-internet.de, EUR-Lex/Cellar). These are out
of scope, but their availability and integrity are observed via the
generation pipeline.
STRIDE table
Spoofing
| Threat | Asset | Mitigation |
| --- | --- | --- |
| Malicious MCP client impersonates an authorized integration | MCP transport | The server has no built-in authentication; rely on network-level isolation (localhost-only by default). For network-exposed deployments, place the server behind an authenticating reverse proxy. |
| Attacker spoofs source URLs in the generation pipeline | Dataset content | Source URLs are pinned in code (mcp/legal_texts/sources.py); fetched content is SHA-256-hashed and recorded, and each manifest entry carries the URL the bytes came from. |
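The pin-then-hash step can be sketched as follows. This is a minimal sketch, not the project's code. The PINNED_SOURCES mapping and the manifest_entry helper are hypothetical. The real pins live in mcp/legal_texts/sources.py and may be structured differently. The gii-toc.xml URL is an illustrative entry, not necessarily one the project pins.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical pin table; the project's real pins live in
# mcp/legal_texts/sources.py and may be shaped differently.
PINNED_SOURCES = {
    "gesetze-im-internet": "https://www.gesetze-im-internet.de/gii-toc.xml",
}


def manifest_entry(source_id: str, url: str, payload: bytes) -> dict:
    """Return a manifest record for bytes fetched from a pinned source.

    `url` should be the final URL after any redirects; requiring it to equal
    the pin means an unexpected redirect is rejected rather than recorded.
    """
    pinned = PINNED_SOURCES.get(source_id)
    if pinned is None or url != pinned:
        raise ValueError(f"{url!r} is not the pinned URL for {source_id!r}")
    return {
        "source": source_id,
        "url": url,  # the URL the bytes actually came from
        "sha256": hashlib.sha256(payload).hexdigest(),
        "size": len(payload),
        "fetched_at": datetime.now(timezone.utc).isoformat(),
    }
```

You can diff the sha256 values across two pipeline runs to see which upstream files changed between generations.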
Tampering
| Threat | Asset | Mitigation |
| --- | --- | --- |
| Modification of dataset files at rest | DATASET_PATH content | The dataset loader validates JSON schemas at startup (STRICT_STARTUP=true recommended). The server does not modify the dataset at runtime; mount it read-only in Docker. |
| Modification of release artefacts in transit | PyPI wheel/sdist; GHCR image | PyPI: PEP 740 Sigstore attestations. GHCR: cosign keyless signatures. SLSA-3 provenance attestations are verifiable with slsa-framework/slsa-verifier. |
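Signature verification itself is the job of slsa-verifier, cosign and the PEP 740 tooling. The sketch below covers only the final comparison those tools rely on. A SLSA provenance document is an in-toto Statement whose subject list records the SHA-256 digest of each artefact, so a downloaded file can be matched against it.

The helper names are hypothetical. The statement is assumed to be already decoded from its signed envelope.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 so large wheels/sdists need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_provenance_subject(statement: dict, artefact: Path) -> bool:
    """True if `artefact` is listed, by name and SHA-256, as a subject of the statement.

    Digest comparison only: this does not verify the signature over the
    statement, which is what slsa-verifier is for.
    """
    digest = sha256_file(artefact)
    return any(
        subject.get("name") == artefact.name
        and subject.get("digest", {}).get("sha256") == digest
        for subject in statement.get("subject", [])
    )
```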
Repudiation
| Threat | Asset | Mitigation |
| --- | --- | --- |
| Maintainer denies origin of a release | Released artefacts | Sigstore certificates tie each signature to a specific GitHub Actions workflow identity in the project repo. SLSA provenance documents the build. |
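A keyless Sigstore certificate issued to GitHub Actions records the workflow identity as a subject alternative name. The form is https://github.com/OWNER/REPO/.github/workflows/FILE@REF, and the OIDC issuer is https://token.actions.githubusercontent.com.

The sketch below shows what "tied to a specific workflow identity" amounts to. The repository name, the release.yml filename and the restriction to vX.Y.Z tags are assumptions for illustration. In practice, cosign performs this match through its --certificate-identity-regexp and --certificate-oidc-issuer options.

```python
import re

GITHUB_ACTIONS_ISSUER = "https://token.actions.githubusercontent.com"


def is_release_identity(san_uri: str, issuer: str, repo: str, workflow: str) -> bool:
    """Check a Fulcio certificate identity against an expected release workflow.

    Accepts only certificates issued through GitHub Actions OIDC for
    `workflow` in `repo`, running on a vX.Y.Z tag (an assumed release policy).
    """
    if issuer != GITHUB_ACTIONS_ISSUER:
        return False
    expected = re.compile(
        r"https://github\.com/" + re.escape(repo)
        + r"/\.github/workflows/" + re.escape(workflow)
        + r"@refs/tags/v\d+\.\d+\.\d+"
    )
    return expected.fullmatch(san_uri) is not None
```

A signature produced by a fork, a different workflow file or an untagged branch build then fails the check, even though it is a valid Sigstore signature.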
Information disclosure
| Threat | Asset | Mitigation |
| --- | --- | --- |
| Server logs leak sensitive request data | Server logs | Structured logging is best-effort; no PII is processed because the data is public legal text and incoming queries do not include user identifiers. Operators may add their own log scrubbing. |
| Stack traces in HTTP responses | HTTP API | FastAPI returns generic 500 responses for unexpected exceptions, provided the app is not run with debug=True; detailed traces appear only in server logs. |
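Operators who want to add their own scrubbing can use a standard-library logging.Filter. This is a minimal sketch. The e-mail and bearer-token patterns are hypothetical examples of what a deployment might treat as sensitive, not data the server is known to log.

```python
import logging
import re

# Hypothetical patterns; adjust to whatever your deployment treats as sensitive.
_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<email>"),
    (re.compile(r"(?i)bearer\s+[\w.~+/-]+=*"), "Bearer <redacted>"),
]


class ScrubFilter(logging.Filter):
    """Rewrite each record's message with sensitive substrings replaced."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg % args before scrubbing
        for pattern, replacement in _PATTERNS:
            message = pattern.sub(replacement, message)
        record.msg = message
        record.args = ()  # args are already merged into msg
        return True  # never drop records, only rewrite them
```

Attach the filter to handlers rather than loggers. A filter on a logger does not see records propagated from child loggers, whereas a handler filter sees every record the handler emits.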
Denial of service
| Threat | Asset | Mitigation |
| --- | --- | --- |
| Resource exhaustion from large queries | Server | No built-in rate limiting; deploy behind a reverse proxy with limits for network-exposed setups. Search uses bounded result sets. |
| Path traversal in DATASET_PATH | Server | Path is validated; only files under the resolved path are read. |
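Both in-process mitigations reduce to small guards:

- **Clamping:** client-supplied result limits are bounded.
- **Containment:** each requested file is resolved against the dataset root before it is opened.

This is a minimal sketch; MAX_RESULTS, its value and both helper names are hypothetical. Path.resolve() collapses ".." segments and follows symlinks, so the containment check runs against the location that would actually be read. Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path
from typing import Optional

MAX_RESULTS = 50  # hypothetical upper bound on results per search


def clamp_limit(requested: Optional[int], default: int = 10) -> int:
    """Bound a client-supplied result limit to the range [1, MAX_RESULTS]."""
    if requested is None:
        return default
    return max(1, min(requested, MAX_RESULTS))


def resolve_under(root: Path, relative: str) -> Path:
    """Resolve `relative` against `root`, refusing paths that escape it.

    Absolute paths, ".." segments and symlinks pointing outside `root` all
    resolve to locations that fail the containment check.
    """
    base = root.resolve()
    candidate = (base / relative).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise PermissionError(f"{relative!r} escapes dataset root {base}")
    return candidate
```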
Elevation of privilege
| Threat | Asset | Mitigation |
| --- | --- | --- |
| Container runs as root | OCI image | The released image runs as UID 10001 (non-root). |
| Arbitrary code execution via malicious dataset | Server | Dataset parsing uses JSON, not pickle or eval. Schema validation rejects unexpected types. |
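json.loads only ever produces dicts, lists, strings, numbers, booleans and None. A malicious file therefore cannot instantiate arbitrary objects the way a pickle can.

The minimal sketch below layers a type check on top of that. The three-field record schema is hypothetical, not the project's real schema. The sketch also rejects the non-standard NaN and Infinity constants, which Python's json module accepts by default.

```python
import json
from typing import Any

# Hypothetical record schema: field name -> required Python type.
SCHEMA: dict[str, type] = {"id": str, "title": str, "paragraphs": list}


def _reject_constant(name: str) -> Any:
    # json.loads accepts NaN/Infinity/-Infinity by default; they are not valid JSON.
    raise ValueError(f"non-standard JSON constant {name}")


def load_record(text: str) -> dict:
    """Parse one dataset record with json (never pickle/eval) and check its shape."""
    data = json.loads(text, parse_constant=_reject_constant)
    if not isinstance(data, dict):
        raise ValueError("record must be a JSON object")
    unexpected = set(data) - set(SCHEMA)
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    for field, expected in SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing field {field!r}")
        if not isinstance(data[field], expected):
            raise ValueError(
                f"{field!r} must be {expected.__name__}, got {type(data[field]).__name__}"
            )
    if not all(isinstance(p, str) for p in data["paragraphs"]):
        raise ValueError("'paragraphs' must contain only strings")
    return data
```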
Residual risks
- The server has no authentication. Localhost-only or
proxy-authenticated deployment is the assumed posture.
- The generation pipeline pulls from public sources without TLS pinning
beyond Python's default certificate validation. Compromise of an
upstream certificate authority would not be detected by this
project.
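The sketch below shows what "Python's default certificate validation" covers, assuming the fetchers use urllib or a client with equivalent defaults. The helper names are hypothetical.

ssl.create_default_context() loads the default CA certificates, requires a valid chain and checks the hostname. It accepts a certificate from any CA it trusts, and nothing pins the upstream site's own key. That is why a compromised CA would go unnoticed. The SHA-256 values in the manifest record what was received, not whether it was authentic.

```python
import ssl
import urllib.request


def default_tls_context() -> ssl.SSLContext:
    # Default CA certificates, CERT_REQUIRED, hostname checking on; no key pinning.
    return ssl.create_default_context()


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download `url` over HTTPS, trusting any CA in the default store."""
    with urllib.request.urlopen(url, timeout=timeout, context=default_tls_context()) as response:
        return response.read()
```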
Review
Reviewed: 2026-05-16 (Phase 3+4 close).
Next review: at v1.1.0 or any architectural change affecting the MCP
or HTTP surface.