ADR-0000: ADR: Deterministic parsing vs live LLM for natural language descriptions

Single Source of Truth

This ADR is automatically copied from docs/adr/0000-deterministic-parsing-vs-live-llm.md in the repository. Edit the source file to update this page.

Status

Accepted

Context

To generate accessible natural language descriptions of diagrams (PlantUML, Mermaid, etc.), there are two fundamentally different approaches:

Live LLM integration - Send diagram source to an LLM (e.g., OpenAI, Claude) at build time to generate descriptions
Deterministic parsing - Write parsers that transform diagram syntax into natural language using fixed rules

This is a foundational architectural decision that affects reliability, reproducibility, and the overall nature of the tool.

Options

Option A: Live LLM integration

Use an LLM API to generate natural language descriptions from diagram source code at build time.

Pros:

Potentially more "natural" sounding descriptions
Can handle arbitrary diagram complexity
Less code to maintain (outsource intelligence to LLM)
Could adapt to new diagram types without code changes

Cons:

Non-deterministic: Same input may produce different output on each build
Hallucination risk: LLM may describe elements that don't exist or misinterpret the diagram
Security concerns: Potential for prompt injection via diagram source code
API dependency: Requires API keys, network access, rate limits, costs
Build reproducibility: Breaks reproducible builds (different output each time)
Offline builds impossible: Cannot build documentation without internet
Latency: API calls add build time
Privacy: Diagram content sent to third-party service

Option B: Deterministic parsing (chosen)

Write parsers that analyze diagram AST/syntax and generate natural language using fixed transformation rules.

Pros:

Deterministic: Same input always produces same output
Reproducible builds: Documentation builds are consistent
No hallucination: Output is exactly what the parser is designed to produce
Offline capable: No external dependencies at build time
No API costs: Free to run
Fast: No network latency
Secure: No code injection risk via LLM prompts
Auditable: Parser behavior can be inspected and tested

Cons:

More development effort to write parsers
May produce less "natural" sounding descriptions
New diagram types require new parser code
Complex diagrams may result in verbose descriptions

Decision

We choose Option B: Deterministic parsing.

The development of the parsers themselves is AI-enhanced (using tools like GitHub Copilot and Claude), but the resulting code is deterministic. This gives us the best of both worlds:

AI assists in writing the parsers (development speed)
The parsers produce deterministic, reliable output (runtime reliability)

This approach aligns with the principle that documentation builds should be reproducible and that accessibility features must be reliable - a screen reader user should get the same description every time they visit a page.

Consequences

Parser code must be written for each supported diagram type (PlantUML, Mermaid, C4, etc.)
Descriptions follow consistent patterns and terminology
Build output is deterministic and can be tested
No runtime dependencies on external AI services
Development uses AI-assisted coding, but production code is traditional deterministic logic

References

WCAG 2.1 - Consistent, predictable behavior is an accessibility requirement
Reproducible Builds: https://reproducible-builds.org/

Date: 2026-02-09 Author: Bart van der Wal & Claude

Status​

Context​

Options​

Option A: Live LLM integration​

Option B: Deterministic parsing (chosen)​

Decision​

Consequences​

References​