Hot-reloading an LLM gateway without redeploying it

If you run an LLM gateway as a long-running process, every model addition or routing change becomes a potential deployment event. That is the wrong unit of change. Configuration is not code, and a CI pipeline is the wrong tool for shipping a new system prompt at four in the afternoon while a customer's agent is misbehaving.

The pattern I keep coming back to is mechanically simple: ECS service for the gateway, Parameter Store for the model registry, a poll loop in between. Most of the design effort goes into deciding what counts as configuration versus what counts as code, and ruthlessly keeping the two on different deploy paths.

SSM as the model registry

Two deploy paths: image via Terraform/CI, config via aws ssm put-parameter.

DIAL Core, DIAL Chat, and the DIAL Bedrock Adapter each carry their model rosters and routing tables in SSM Parameter Store. The containers read those values on startup and poll for changes on a configurable interval — no rebuild required, no ECS task replacement triggered.

This separates two concerns that are routinely conflated: infrastructure provisioning (handled by Terraform and applied through CI) and model configuration (handled by operators with aws ssm put-parameter). Each has its own deployment cadence and its own blast radius. The image is immutable. The configuration is mutable. The poll loop is the API between them.

The parameter layout I have settled on:

/dial/<env>/models/<model-id>          # JSON: provider, region, weights, agreement
/dial/<env>/routes/<route-id>          # JSON: model-id, priority, fallback chain
/dial/<env>/prompts/<prompt-id>        # JSON: template, variables, version pin
/dial/<env>/limits/<tenant-id>         # JSON: tokens/min and requests/min ceilings

Each leaf is a single JSON document. Standard parameters hold values up to 4 KB, and SSM's "Advanced" tier raises that to 8 KB; Parameter Store keeps version history in either tier. Both matter for this use case. When something breaks, aws ssm get-parameter-history is your audit log.
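To make the layout concrete, here is a minimal sketch of two helpers a gateway might keep next to it: one that splits a parameter name into its parts, and one that picks the tier a document needs. The names parse_param_name, required_tier and ParamKey are hypothetical, and measuring the limit in UTF-8 bytes is my assumption rather than something the layout dictates.

```python
from typing import NamedTuple

STANDARD_MAX_BYTES = 4 * 1024  # Standard tier value limit
ADVANCED_MAX_BYTES = 8 * 1024  # Advanced tier value limit
KINDS = {"models", "routes", "prompts", "limits"}


class ParamKey(NamedTuple):
    env: str
    kind: str
    item_id: str


def parse_param_name(name: str) -> ParamKey:
    """Split '/dial/<env>/<kind>/<id>' into its parts, rejecting anything else."""
    parts = name.strip("/").split("/")
    well_formed = (
        len(parts) == 4
        and all(parts)
        and parts[0] == "dial"
        and parts[2] in KINDS
    )
    if not well_formed:
        raise ValueError(f"unexpected parameter name: {name!r}")
    return ParamKey(env=parts[1], kind=parts[2], item_id=parts[3])


def required_tier(value: str) -> str:
    """Return the smallest tier that fits the document (assumes a byte-based limit)."""
    size = len(value.encode("utf-8"))
    if size <= STANDARD_MAX_BYTES:
        return "Standard"
    if size <= ADVANCED_MAX_BYTES:
        return "Advanced"
    raise ValueError(f"document is {size} bytes; split it across parameters")
```

Checking the size before the write turns an oversized prompt into a local error rather than a failed API call in the middle of an incident.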

The reload loop, in under 40 lines

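The poll loop is unglamorous and that is the point. A Python sketch, where apply() stands in for whatever hook your gateway uses to swap in new config:

# inside the gateway process
import asyncio, hashlib, json, logging
import boto3

ssm = boto3.client("ssm")
log = logging.getLogger(__name__)
current_hash: dict[str, str] = {}

def fetch_all(prefix: str) -> list[dict]:
    # get_parameters_by_path returns at most 10 parameters per call,
    # so walk every page instead of reading only the first response.
    paginator = ssm.get_paginator("get_parameters_by_path")
    pages = paginator.paginate(Path=prefix, Recursive=True, WithDecryption=True)
    return [p for page in pages for p in page["Parameters"]]

async def reload_loop(prefix: str, interval: int = 30):
    while True:
        try:
            # boto3 is blocking; fetch in a worker thread so the event
            # loop keeps serving requests while SSM responds.
            params = await asyncio.to_thread(fetch_all, prefix)
        except Exception:
            log.exception("fetch failed; keeping last-known-good")
            params = []
        for p in params:
            h = hashlib.sha256(p["Value"].encode()).hexdigest()
            if current_hash.get(p["Name"]) == h:
                continue  # unchanged since the last poll
            try:
                apply(p["Name"], json.loads(p["Value"]))
            except Exception:
                # one bad document must not block the rest; retry next poll
                log.exception("apply failed; keeping last-known-good")
                continue
            current_hash[p["Name"]] = h
            log.info("reloaded", extra={"param": p["Name"], "version": p["Version"]})
        await asyncio.sleep(interval)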

Three things to notice. First, the loop never tears the process down on failure — it logs and retries. Last-known-good in memory is always better than a restart during an incident. Second, the hash comparison means an unchanged parameter is a no-op even if SSM returned it. Third, every reload writes a structured log line, which is the only audit surface operators need at three in the morning.
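The loop leaves apply undefined, so here is one possible shape for it, as a sketch rather than the gateway's actual hook. It validates a document before swapping it into an in-memory registry, so a rejected document raises and the previous entry stays live. The required keys are hypothetical, loosely borrowed from the comments in the parameter layout, and lookup is an illustrative helper.

```python
import threading

# Hypothetical minimum keys per document kind; adjust to your own schema.
REQUIRED_KEYS = {
    "models": {"provider", "region"},
    "routes": {"model-id"},
    "prompts": {"template"},
}

_registry: dict[str, dict] = {}
_lock = threading.Lock()


def apply(name: str, document: dict) -> None:
    """Validate first, then swap. Raising leaves the old entry untouched."""
    if not isinstance(document, dict):
        raise ValueError(f"{name}: expected a JSON object")
    parts = name.strip("/").split("/")
    kind = parts[2] if len(parts) == 4 else ""
    missing = REQUIRED_KEYS.get(kind, set()) - set(document)
    if missing:
        raise ValueError(f"{name}: missing keys {sorted(missing)}")
    with _lock:
        _registry[name] = document  # replace the whole document at once


def lookup(name: str):
    """Return the last-known-good document for a parameter, or None."""
    with _lock:
        return _registry.get(name)
```

Because the swap replaces the whole document under a lock, a request handler reading through lookup sees either the old version or the new one, never a half-applied mix.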

What changes when you stop treating prompts as code

The immediate effect is iteration velocity. Agent teams used to measure prompt-to-production in hours under the old model — open a PR, get a review, wait for CI, wait for ECS to drain and replace tasks. Under the hot-reload model, the same change collapses to minutes: aws ssm put-parameter --overwrite, wait one poll interval, watch the structured log line confirm it. The gateway did not need to know or care; a reload event is just a parameter fetch.
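For teams that prefer a script to the raw CLI, the same write can be sketched with boto3's put_parameter. The function name push_model_config and the shape of the config dict are hypothetical; ssm is expected to be a boto3 SSM client, passed in so the function can be exercised without AWS credentials.

```python
import json


def push_model_config(ssm, env: str, model_id: str, config: dict) -> int:
    """Write one model document to Parameter Store and return its new version."""
    response = ssm.put_parameter(
        Name=f"/dial/{env}/models/{model_id}",
        Value=json.dumps(config, sort_keys=True),  # stable key order, stable hash
        Type="String",
        Overwrite=True,  # update in place; SSM records a new version
    )
    return response["Version"]
```

Serialising with sort_keys=True is a small design choice: two writes of the same logical document produce identical bytes, so a hash-based reload check treats them as unchanged.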

The subtler effect is operational clarity. When a model agreement fails or a routing rule produces unexpected output, the first question is "what changed?". With SSM-backed config, the answer is one API call: aws ssm get-parameter-history. No git blame, no deployment-log archaeology, no correlating a task-definition revision with a commit SHA across two repos.
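A minimal sketch of that "what changed?" lookup, using boto3's paginator for get_parameter_history. The function name what_changed and the shape of the returned dicts are my own; ssm is assumed to be a boto3 SSM client.

```python
def what_changed(ssm, name: str, limit: int = 5) -> list:
    """Return the most recent versions of one parameter, newest first."""
    history = []
    paginator = ssm.get_paginator("get_parameter_history")
    for page in paginator.paginate(Name=name, WithDecryption=True):
        history.extend(page["Parameters"])
    history.sort(key=lambda p: p["Version"], reverse=True)
    return [
        {
            "version": p["Version"],
            "modified": p.get("LastModifiedDate"),
            "user": p.get("LastModifiedUser"),
            "value": p["Value"],
        }
        for p in history[:limit]
    ]
```

Diffing the first two entries of the result usually answers the question faster than reading either document in full.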

Hot reload also shrinks the blast radius. A redeploy replaces every task even when only one entry changed; here, a misconfigured model entry affects that model's routing only. Operators can roll back a parameter without touching the task definition, the image, or any other model's behaviour. The change unit is a single JSON document, and the rollback unit is the same single JSON document.

The trade-off: eventual consistency is real

The honest counter is that this design is eventually consistent and the poll interval is a real failure surface. If you push a routing change and a customer hits the gateway in the same second, they get the old routing. If you push a broken prompt and only realise after twenty seconds, at least twenty seconds of traffic saw the broken prompt, and the fix takes up to one more poll interval to reach each task. A synchronous redeploy does not have this property: when the new tasks are healthy, the old tasks are gone.
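The arithmetic is worth writing down, because the exposure window is longer than the detection time alone. This is a back-of-envelope sketch with hypothetical function names. It ignores the SSM fetch time and assumes each task polls independently on a fixed interval.

```python
def propagation_window(poll_interval_s: float) -> tuple:
    """(best, worst) delay before one task sees a pushed change."""
    return (0.0, poll_interval_s)


def worst_case_exposure(
    poll_interval_s: float, detect_s: float, push_s: float = 0.0
) -> float:
    """Upper bound on how long one task serves a bad config.

    detect_s is measured from the moment the bad config went live on that
    task; the fix then needs push_s to be written and up to one more poll
    interval to be picked up.
    """
    return detect_s + push_s + poll_interval_s


# Twenty seconds to notice, a 30-second poll interval:
print(worst_case_exposure(poll_interval_s=30, detect_s=20))  # → 50
```

Shortening the interval cuts both the propagation delay and the rollback delay, at the cost of more Parameter Store calls per task.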

I have made the trade for two reasons. The first is that the redeploy alternative is not actually synchronous either: it has its own propagation window during ECS draining, plus the rebuild time, plus CI latency. The hot-reload window is bounded by the poll interval, which I can tune. The second is that the rollback path is faster than the deploy path — pushing the previous parameter version is one CLI call, whereas reverting a Terraform-managed task definition involves a revert PR and another full deploy cycle.
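The rollback path can be sketched as follows. Parameter Store has no dedicated revert operation that I know of, so in practice the rollback reads the history and writes the previous value back as a new version. The function name rollback is hypothetical, and the sketch assumes plain String parameters; a SecureString would also need its KeyId passed through.

```python
def rollback(ssm, name: str) -> int:
    """Re-publish the previous version's value and return the new version number."""
    versions = []
    paginator = ssm.get_paginator("get_parameter_history")
    for page in paginator.paginate(Name=name, WithDecryption=True):
        versions.extend(page["Parameters"])
    if len(versions) < 2:
        raise LookupError(f"{name} has no earlier version to roll back to")
    versions.sort(key=lambda p: p["Version"])
    previous = versions[-2]  # the version just before the current one
    response = ssm.put_parameter(
        Name=name,
        Value=previous["Value"],
        Type=previous["Type"],
        Overwrite=True,
    )
    return response["Version"]
```

Because the rollback is itself a write, it shows up in the parameter history like any other change, and the gateway picks it up on its next poll.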

If your tolerance for eventual consistency is zero, this pattern is wrong for you. You probably want a synchronous gateway with feature flags and immediate cutover, accepting the longer deploy path as the cost. For most teams, the eventual-consistency window measured in seconds is well below the human-detection latency of "the prompt is bad", and the trade is favourable.

Counter-argument: but our compliance team wants version control

The strongest pushback I hear is that production prompts should live in Git because Git is the system of record auditors recognise. This is a real concern and I do not dismiss it. The answer is not to keep prompts in Git as a deploy path; the answer is to mirror parameter writes into a Git history asynchronously, so the auditable record exists without coupling deploy to merge.

A small EventBridge rule on Parameter Store Change events that commits the new parameter to a read-only audit repo gives the compliance team what they want and the operators what they need. It is more moving parts than "just put it in Git", but it does not couple the wrong concerns to ship the right artefact.
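As a rough sketch of that mirror, the rule's target could be a small function that turns each change event into an audit record. The event fields used here (detail-type, detail.name, detail.operation) follow the shape I understand Parameter Store Change events to have, so check them against a real event before relying on this. The function name audit_record, the record layout and the /dial/ prefix filter are hypothetical, and the Git commit step is left out.

```python
from typing import Optional

# Event pattern for the rule: every Parameter Store change in the account.
EVENT_PATTERN = {
    "source": ["aws.ssm"],
    "detail-type": ["Parameter Store Change"],
}


def audit_record(event: dict, ssm) -> Optional[dict]:
    """Build the file an audit repo would store for one change event."""
    detail = event.get("detail", {})
    name = detail.get("name", "")
    if event.get("detail-type") != "Parameter Store Change":
        return None
    if not name.startswith("/dial/"):
        return None  # not one of the gateway's parameters
    record = {
        "path": name.lstrip("/") + ".json",
        "operation": detail.get("operation"),
        "time": event.get("time"),
        "content": None,
    }
    if detail.get("operation") == "Delete":
        return record  # nothing left to fetch
    # Fetch without decryption so a SecureString never lands in the repo as plaintext.
    param = ssm.get_parameter(Name=name)["Parameter"]
    record["version"] = param["Version"]
    record["content"] = param["Value"]
    return record
```

The record carries the path, version and content, which is enough for a separate committer to write the file and use the version number in the commit message.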

So what

If you maintain an LLM gateway and your prompt changes go through CI, list every change you shipped last month that touched only configuration and ask how many of them would have benefited from a five-minute path instead of a fifty-minute one. Then look at the failures that took a full redeploy to roll back. Both lists tend to be longer than people expect.

The migration is small enough to prototype in a week: pick one prompt, move it to SSM, add the reload loop, point one route at the SSM-resolved value, ship. Either the operations team starts asking for the second prompt, or they don't and you go back. The cost of finding out is low.
