Copy-on-Write Prompt Versioning for Multi-Tenant AI

How each organisation can fork, customise, and independently evolve AI prompt configurations — without per-tenant model fine-tuning or code deployment.

When you operate a multi-tenant AI platform, behavioural customisation becomes a first-class engineering problem surprisingly quickly. A healthcare organisation needs different risk assessment language than a financial services firm. One team prefers concise advisory responses; another wants detailed explanations with citations. A regulated entity in Germany has terminology requirements that don't apply to a team in Ireland.

The naive approaches all break down. Per-tenant model fine-tuning (LoRA adapters and the like) gives you high-fidelity customisation but imposes significant compute cost, requires ML engineering per tenant, and produces opaque changes — you can't inspect what changed about the model's behaviour by examining weight deltas. Identical prompts across all tenants ignore legitimate differences. And maintaining per-tenant prompt files in a Git repository couples every behavioural tweak to your software release cycle.

We needed something in between: a runtime primitive that gives each tenant organisation the ability to diverge from system defaults, with full auditability, instant reversibility, and zero deployment friction. So we built one.

This article describes the copy-on-write prompt versioning mechanism inside Acompli — the design, the engineering decisions, and the governance properties that make it safe to hand prompt customisation to tenant administrators and AI agents.


The problem

The Acompli platform maintains a set of prompt configurations across 21 categories — risk extraction, answer enhancement, advisory chat, RoPA generation, entity extraction, template generation, and more. Each prompt belongs to a category and has a version. The platform serves multiple tenant organisations.

For any given prompt and organisation, the system must resolve an effective prompt — either the system default or a tenant-specific variant. The resolution must satisfy four properties:

Tenant isolation. Organisation A's fork must never affect Organisation B's resolved prompt. This is non-negotiable in a compliance platform.

Auditability. For any effective prompt that differs from the system default, we need to know: when the divergence was created, which system version it forked from, who created and modified it, and the full modification history.

Reversibility. Any tenant customisation can be reverted to the current system default in one operation, without the tenant needing to know what the current default contains.

Zero-deployment customisation. Changes take effect at runtime. No container restarts. No release coordination.

The requirements are analogous to configuration management in distributed systems — Terraform workspaces, Puppet's Hiera — but applied to a domain where the configuration artefacts are free-form natural language with safety implications.

The prompt document model

Each prompt is stored as a document in a partitioned document database (Azure Cosmos DB) with this schema:

Prompt {
  id:                          string
  partition_key:               string    // = category
  name:                        string
  category:                    enum      // 21 categories
  prompt_type:                 enum      // "system" | "user"
  status:                      enum      // "active" | "draft" | "archived" | "testing"
  content:                     string    // the prompt text
  variables:                   list      // template variables with metadata
  version:                     string    // e.g. "1.3"
  version_history:             list      // PromptVersion entries
  organization_id:             string?   // null for system defaults
  forked_from_system_version:  string?   // provenance link
  created_at, created_by, updated_at, updated_by
}

Two fields carry the forking mechanism. organization_id discriminates between system defaults (null) and tenant forks (tenant ID). forked_from_system_version records the system prompt version at the time of forking — the provenance link.

Why partition by category. The partition key is the prompt category. This optimises the dominant query pattern: resolving the effective prompt for a given category and name. Within a single partition, the database can locate both the system default and any tenant fork efficiently. Administrative queries spanning categories (e.g. "list all forks for Organisation X") use cross-partition queries, which is fine — resolution queries are on the critical path of every AI subsystem invocation, while admin queries are infrequent and latency-tolerant.

The null convention. System defaults use a null organization_id. Because some documents predate the multi-tenancy mechanism, the resolution query handles both null and absent fields:

WHERE (IS_NULL(c.organization_id)
       OR NOT IS_DEFINED(c.organization_id))

Backwards compatibility without migration.
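The null-or-absent convention can be exercised with a small in-memory sketch. Field names follow the schema above; the real system evaluates the equivalent Cosmos DB predicate:

```python
def is_system_default(doc: dict) -> bool:
    # Mirrors IS_NULL(c.organization_id) OR NOT IS_DEFINED(c.organization_id):
    # dict.get returns None both for an explicit null and for a missing field,
    # so legacy documents that predate multi-tenancy resolve correctly.
    return doc.get("organization_id") is None
```

A legacy document with no organization_id field at all and a modern document with an explicit null both count as system defaults; only a document carrying a tenant ID is treated as a fork.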

Fork, resolve, restore

The three core operations form the lifecycle of a tenant-specific prompt.

Forking

Forking creates a tenant-owned copy of a system default:

def fork_prompt(name, category, organization_id, user):
    # 1. Locate the system default
    system_prompt = query_by_name(name, category, org_id=None)
    if not system_prompt:
        raise NotFound("System prompt not found")

    # 2. Reject duplicate forks
    existing = query_by_name(name, category, org_id=organization_id)
    if existing:
        raise Conflict("Organisation fork already exists")

    # 3. Create the fork document
    fork = copy(system_prompt)
    fork.id = generate_id(name, organization_id)
    fork.organization_id = organization_id
    fork.forked_from_system_version = system_prompt.version
    fork.version = "1.0"
    fork.version_history = [initial_version(system_prompt.content, user)]
    fork.status = ACTIVE

    # 4. Persist
    database.create(fork)
    return fork

The operation is atomic at the document level. The duplicate-fork guard prevents ambiguity in resolution — one organisation, one fork per prompt. The generated document ID encodes both prompt name and an organisation identifier prefix (e.g. answer_enhancement_system_v1_a3b2c1d0), so you can trace provenance from a raw database inspection.
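The article only states that the generated ID encodes the prompt name plus an organisation-derived prefix; one plausible sketch (the hashing scheme here is an assumption, not the platform's actual one) uses a short hash of the organisation ID:

```python
import hashlib

def generate_id(name: str, organization_id: str) -> str:
    # Hypothetical sketch: derive an 8-hex-char prefix from the org ID so the
    # document ID is deterministic and traceable in raw database inspection,
    # e.g. answer_enhancement_system_v1_a3b2c1d0.
    org_prefix = hashlib.sha256(organization_id.encode()).hexdigest()[:8]
    return f"{name}_{org_prefix}"
```

A deterministic ID has a useful side effect: a concurrent duplicate fork attempt for the same (name, organisation) pair would collide on the same document ID rather than silently creating a second fork.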

The three-tier resolution chain

When any AI subsystem needs a prompt for a given tenant, resolution follows three tiers:

Request: name + category + org_id
         │
         ▼
┌─────────────────────┐
│ Tier 1: Org Fork    │── found ──▶ return fork
│ (org_id match)      │
└─────────┬───────────┘
          │ not found
          ▼
┌─────────────────────┐
│ Tier 2: System      │── found ──▶ return system default
│ Default             │
│ (org_id = null)     │
└─────────┬───────────┘
          │ not found / DB error
          ▼
┌─────────────────────┐
│ Tier 3: Hardcoded   │── always ──▶ return fallback
│ Fallback            │
└─────────────────────┘

Tier 3 is the resilience boundary. The hardcoded fallback is compiled into the service binary, so if the database is unavailable, the system degrades to a known-safe default rather than failing. For a compliance platform, undefined AI behaviour on database failure is not acceptable.
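The three tiers can be expressed as a single function. This is a sketch, with `lookup` standing in for the database query (an assumed name, returning a prompt document or None, possibly raising on database errors):

```python
def resolve_effective_prompt(name, category, org_id, fallback, lookup):
    # Tier 1: organisation fork (org_id match)
    # Tier 2: system default (org_id = None)
    # Tier 3: compiled-in fallback on miss or database error
    try:
        for candidate_org in (org_id, None):
            doc = lookup(name, category, candidate_org)
            if doc is not None:
                return doc["content"]
    except Exception:
        pass  # database unavailable: degrade to the known-safe default
    return fallback
```

The `except` around both tiers is the resilience boundary: any database failure, at either tier, lands on the hardcoded fallback rather than propagating undefined behaviour into an AI subsystem.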

Restoring to system default

Restore is a hard delete:

def restore_to_system(name, category, organization_id):
    fork = query_by_name(name, category, org_id=organization_id)
    if not fork:
        return False  # already on system default
    database.delete(fork.id, partition_key=category)
    return True

After deletion, the resolution chain automatically falls through to the system default. The absence of the fork document is sufficient — no data migration needed. No special-case handling for archived or soft-deleted documents. The restore decision itself is captured in system logs, and fork histories are typically short.

Cache architecture

Database queries on every prompt resolution would impose unacceptable latency. The system interposes a TTL-based cache between the resolution chain and the database.

Cache key format: {organization_id}:{category}:{name} — where organization_id is the empty string for system defaults. This ensures tenant isolation at the cache layer.

def get(name, category, fallback, organization_id):
    cache_key = f"{organization_id}:{category}:{name}"

    with lock:
        entry = cache.get(cache_key)          # entry is (content, stored_at)
        if entry and not expired(entry):
            return entry[0]

    # Cache miss — resolve from database
    prompt = prompt_service.get_by_name(name, category, organization_id)
    if prompt:
        with lock:
            cache[cache_key] = (prompt.content, now())
        return prompt.content

    # Database miss — return hardcoded fallback
    return fallback

Default TTL: 300 seconds. Within this window, a tenant may observe a prompt version that is up to 5 minutes stale. We accept this because prompt updates are infrequent (administrative sessions, not runtime operations). The alternative — real-time cache invalidation via pub/sub — would add infrastructure complexity disproportionate to the consistency requirement.

Explicit invalidation on update. When a prompt is modified, all caches are explicitly invalidated. The next resolution triggers a fresh database query regardless of TTL state, reducing the effective staleness window to near-zero for deliberate changes.

Each service instance maintains its own PromptLoader cache, instantiated as a module-level singleton. No cross-service cache coordination required.
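The cache behaviour above, including the explicit invalidation path, can be condensed into a minimal self-contained sketch (class and method names are assumptions, not the platform's actual API):

```python
import threading
import time

class PromptCache:
    """Minimal TTL cache with explicit invalidation (illustrative sketch)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self._ttl = ttl_seconds
        self._entries = {}                 # key -> (content, stored_at)
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry and time.monotonic() - entry[1] < self._ttl:
                return entry[0]
            return None                    # miss or expired: caller re-queries

    def put(self, key, content):
        with self._lock:
            self._entries[key] = (content, time.monotonic())

    def invalidate(self, key):
        # Called on prompt update: the next get() misses and triggers a fresh
        # database query regardless of remaining TTL.
        with self._lock:
            self._entries.pop(key, None)
```

The invalidate path is what reduces the effective staleness window to near-zero for deliberate changes; the TTL only governs the edge case of reads that race an update on another instance.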

Upstream change notification

When we update a system-default prompt, every tenant that has forked it faces a decision: continue with their fork, review the changes and incorporate them, or restore to the new default. The notification mechanism ensures this decision is surfaced to the right people.

Finding affected tenants. When a system prompt is updated (detected by checking that organization_id is null and content has changed), the system queries for all forks:

SELECT c.organization_id FROM c
WHERE c.name = @name
  AND c.category = @category
  AND IS_DEFINED(c.organization_id)
  AND NOT IS_NULL(c.organization_id)

Role-cascade notification. For each affected organisation, recipients are identified through a role-based cascade:

Find DPO users ── found ──▶ notify DPOs
      │
      │ none found
      ▼
Find Superusers ── found ──▶ notify Superusers
      │
      │ none found
      ▼
Find Admins ── found ──▶ notify Admins

The cascade prefers the most authoritative role (the Data Protection Officer in compliance deployments) and falls back to broader roles. The notification reaches someone with authority to evaluate the change, without flooding all administrators.
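The cascade reduces to a short ordered lookup. A sketch, assuming `org_users` maps role names to user ID lists (the role keys here are inferred from the text):

```python
def notification_recipients(org_users: dict) -> list:
    # Prefer the most authoritative role; fall back to broader ones.
    for role in ("dpo", "superuser", "admin"):
        recipients = org_users.get(role, [])
        if recipients:
            return recipients
    return []  # no eligible recipients: notification is skipped, not retried
```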

Non-blocking. Notification failures are caught and logged but do not prevent the prompt update from completing. The system remains correct regardless of delivery — notification is advisory.

The merge-or-maintain decision. Upon notification, the tenant administrator has three options: (1) maintain their fork, accepting divergence; (2) review the system changes via a side-by-side comparison tool; or (3) invoke restore to adopt the new default. This mirrors the upstream merge workflow in version control — a downstream fork periodically reconciling with upstream changes.

Structural fingerprinting

When a tenant modifies a forked prompt, the structural fingerprinting mechanism automatically verifies that structurally important elements — safety constraints, required template variables, output format instructions — are preserved. This is especially valuable when modifications are performed by an AI agent, where the mechanism acts as a guardrail ensuring the agent doesn't inadvertently remove safety-critical content.

Fingerprint extraction. The fingerprint of a prompt is a dictionary of five structural element categories, extracted via pattern matching:

Element                  Pattern                                           Example
Numbered rules           ^\s*\d+\.\s+                                      1. VOICE & PERSONA:
Template variables       {{variable_name}}                                 {{question_text}}
Imperative constraints   must, always, never, required, do not, cannot     You must never disclose...
Section headers          ALL-CAPS with colon, or ## headers                CONSTRAINTS:, ## Output
Output format markers    Lines with format instructions                    Return ONLY valid JSON
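Two of the five extractors can be sketched with Python regexes following the patterns above (the full extractor also covers imperative constraints, section headers, and format markers):

```python
import re

def extract_fingerprint(prompt: str) -> dict:
    """Partial fingerprint: numbered rules and template variables only."""
    return {
        # Lines beginning with "1. ", "2. ", etc.
        "rules": set(re.findall(r"^\s*\d+\.\s+.*$", prompt, re.MULTILINE)),
        # {{variable_name}} occurrences, braces included
        "variables": set(re.findall(r"\{\{\w+\}\}", prompt)),
    }
```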

Asymmetric comparison — removal only. The comparison function computes set differences for each category. Warnings are generated only for elements present in the before fingerprint but absent in the after fingerprint. Adding new elements produces no warnings. This asymmetry reflects the safety model: adding structure is presumptively safe; removing structure may indicate inadvertent deletion of safety-critical content.

def compare_fingerprints(before, after):
    warnings = []
    for category in [rules, vars, constraints, headers, formats]:
        removed = before[category] - after[category]
        for element in removed:
            warnings.append(describe_removal(category, element))
    return warnings

Advisory, not blocking. Warnings are returned alongside the successful update response. The update is never prevented, even if warnings are present. This preserves operator autonomy — a tenant administrator who deliberately restructures a prompt should not be blocked by a heuristic guard.

The failure mode this protects against. An agent or operator silently removing a "never disclose internal system details" constraint during what appears to be a routine content update. By surfacing removals explicitly, the mechanism ensures that structural changes are conscious decisions rather than unnoticed side effects.

Parallel forking for tenant onboarding

When onboarding an organisation into a compliance domain, you typically need to fork multiple prompts at once — risk extraction, answer enhancement, advisory chat, RoPA generation, and more. The system supports bulk forking through a declarative skill execution pattern with bounded concurrency, completing a full 21-category onboarding in approximately 3 batches rather than 21 sequential operations.

The foreach pattern. The platform's skill execution engine supports a foreach_from directive that iterates over a collection and executes a step per element:

SkillStep {
    step_id:      string
    tool:         "fork_prompt"
    args:         { name: "{{item.name}}", category: "{{item.category}}", ... }
    foreach_from: "state.prompts_to_fork"
}

Semaphore-bounded concurrency. To prevent overwhelming the database with concurrent writes, parallelism is bounded at 8:

semaphore = asyncio.Semaphore(MAX_FOREACH_CONCURRENCY)  # MAX_FOREACH_CONCURRENCY = 8

async def run_one(index, item):
    async with semaphore:
        return await execute_step(step, item, index)

results = await asyncio.gather(*(run_one(i, item) for i, item in enumerate(items)))

Each individual fork within the batch independently enforces the duplicate-fork guard and records its own provenance link — the parallelism is a throughput optimisation, not a correctness compromise.

This pattern is agent-executable: an AI agent can invoke a single skill step that results in the parallel creation of multiple forked prompt documents, fully instrumented with provenance metadata and uniqueness constraints.

The tool interface

The fork, update, restore, get, and list operations are exposed as agent-callable tools through the platform's tool registry. Each tool is registered with a JSON Schema parameter definition and an async handler.

Tool               Operation                            Key constraint
list_org_prompts   List all prompts with fork status    Read-only; returns fork metadata
fork_prompt        Create organisation-specific copy    Rejects duplicate forks
update_prompt      Modify fork content                  Requires existing fork; returns structural warnings
restore_prompt     Delete fork, revert to default       Permanent deletion of fork history
get_org_prompt     Read prompt content and history      Returns both fork and system default if forked

The tool registry enforces a critical invariant: the update_prompt handler verifies that the target is an organisation fork before proceeding, returning an error directing the caller to invoke fork_prompt first if the target is a system default. This prevents accidental modification of system defaults through the agent interface.

The structural fingerprinting guard is integrated into the update_prompt handler: before persisting, it extracts fingerprints from old and new content, computes the comparison, and includes any warnings in the response. Every prompt update through the tool interface — whether initiated by a human or an AI agent — receives structural integrity feedback.
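Both invariants, the fork-only guard and the advisory fingerprint feedback, can be sketched in one handler. This is illustrative only: `lookup` and `persist` stand in for the real service calls, and the fingerprint is reduced to template variables for brevity:

```python
import re

def _fingerprint(content: str) -> dict:
    # Stand-in for the full five-category extractor.
    return {"variables": set(re.findall(r"\{\{\w+\}\}", content))}

def compare_fingerprints(before: dict, after: dict) -> list:
    # Asymmetric: warn only on removals, never on additions.
    return [f"removed {cat}: {el}"
            for cat in before
            for el in before[cat] - after.get(cat, set())]

def update_prompt(name, category, organization_id, new_content, lookup, persist):
    # Fork-only invariant: never modify a system default in place.
    target = lookup(name, category, organization_id)
    if target is None:
        return {"error": "No organisation fork exists; call fork_prompt first"}
    warnings = compare_fingerprints(_fingerprint(target["content"]),
                                    _fingerprint(new_content))
    target["content"] = new_content
    persist(target)
    # Warnings are advisory: the update always completes.
    return {"updated": True, "warnings": warnings}
```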

Engineering decisions

Every system has design choices that shape its behaviour. Here's why we made the ones we did.

Resolution speed over admin convenience. Category as partition key means prompt resolution — the operation that happens on every AI subsystem invocation — is a single-partition query. Admin queries that span categories (e.g. "list all forks for Organisation X") require a cross-partition scan, which is slower. We optimised for the hot path. Admin queries are infrequent and latency-tolerant; resolution is neither.

Clean restore semantics. Restoring to system default performs a hard delete. The three-tier resolution chain relies on the absence of a fork document to fall through to the system default — a clean, unambiguous signal. The restore decision itself is captured in system logs, and fork histories are typically short (create, modify a few times, maintain or restore). This gives us a simple, predictable resolution path with no special-case handling for archived documents.

Eventual consistency with explicit invalidation. The 300-second cache TTL is a ceiling, not the typical staleness window. In practice, explicit cache invalidation on update means the next resolution after a change triggers a fresh database query. We evaluated pub/sub invalidation and version-counter cache keys; both add infrastructure complexity that isn't justified when prompt updates happen a few times per week. The system is consistent where it matters — on deliberate changes — and eventually consistent for the edge case of concurrent reads during the TTL window.

Simplicity over merge. We deliberately chose not to implement Git-style selective merging. When a system prompt is updated, the tenant either maintains their fork or restores to the new default. Prompts are 200–5,000 characters — short enough that a side-by-side comparison and manual incorporation takes minutes, not hours. The complexity cost of a three-way merge engine for natural-language documents didn't justify itself.
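The review step itself needs nothing more elaborate than a line diff. A sketch using Python's standard difflib (the platform's actual comparison tool is side-by-side; this just illustrates why short prompts make manual reconciliation cheap):

```python
import difflib

def prompt_diff(system_content: str, fork_content: str) -> str:
    # Unified diff between the current system default and the tenant fork.
    return "\n".join(difflib.unified_diff(
        system_content.splitlines(), fork_content.splitlines(),
        fromfile="system_default", tofile="org_fork", lineterm=""))
```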

What this tells you about the platform

The prompt versioning mechanism is one subsystem within Acompli's broader architecture. It operates alongside the three-tier knowledge base (organisation-wide, project-specific, user responses), the persona pipeline with DAG orchestration, and the grounding verification system.

The deeper point is that multi-tenant AI compliance isn't just a prompt engineering problem — it's a systems engineering problem. When you deploy AI as a component of a regulated workflow, every tenant's behavioural customisation needs the same governance properties you'd expect from any critical configuration: isolation, auditability, reversibility, and operational tractability.

The copy-on-write forking primitive borrows from operating systems, version control, and configuration management — well-understood abstractions applied to a domain where the configuration artefacts happen to be natural language with safety implications.

For engineers evaluating compliance platforms: ask how per-tenant AI behaviour is managed. If the answer involves per-tenant model weights, ask about inspection and rollback. If the answer is "same prompts for everyone," ask how legitimate behavioural divergence is handled. If the answer is "we maintain prompt files in Git," ask how that scales to dozens of tenants across dozens of prompt categories without coupling every behavioural adjustment to a deployment cycle.

References

  1. Hu, E.J. et al. (2022). LoRA: Low-Rank Adaptation of Large Language Models. ICLR 2022.
  2. Sheng, Y. et al. (2023). S-LoRA: Serving Thousands of Concurrent LoRA Adapters. arXiv.
  3. Khattab, O. et al. (2024). DSPy: Compiling Declarative Language Model Calls into State-of-the-Art Pipelines. ICLR 2024.
  4. Wang, G. et al. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv.
  5. Chacon, S. and Straub, B. (2014). Pro Git. Apress, 2nd edition.
  6. HashiCorp (2014). Terraform: Infrastructure as Code.