INFRASTRUCTURE 2026-06-08 1,842 words 7 min read

SHIPPING AN AGENT TO BEDROCK AGENTCORE_


The Bedrock AgentCore SDK is new. The docs are sparse. This is what it actually looks like to ship something on it.

> SDK ONBOARDING: READING SOURCE BECAUSE YOU HAVE TO#

The first thing you notice is that the documentation covers the happy path. It tells you how to instantiate a client, how to invoke an agent, and not much else. The edge cases — session management, expiry, rotation — you figure out by reading the SDK source.

That is not a complaint. It is just the reality of building on something that shipped recently. The SDK works. The docs will catch up. In the meantime, you read source.

The session model is built around runtimeSessionId. Every invocation carries one. The SDK does not manage session lifecycle for you — that is your problem. You decide when to create a session, when to rotate it, and when to expire it. The SDK just passes whatever you give it.

I figured out the expiry behavior by shipping broken code. The first version assumed sessions were long-lived. They are not. After a period of inactivity, the runtime drops the session state. The next invocation comes back with a clean context — no memory, no history. The agent behaves like it has never met you before.

The fix was straightforward once I understood the model: check expiry before every request, rotate if needed, carry the new session ID forward. But I only understood the model after watching it fail in production. That is the honest version of how this went.
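The check-rotate-carry loop can be sketched in a few lines. This is a minimal illustration, not the code from the project: SessionTracker, the 900-second idle timeout, and the session ID format are all assumptions made up for the example. The real idle limit comes from the runtime.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionTracker:
    """Hypothetical helper: one runtime session per actor, rotated after idle time."""

    idle_timeout_s: float = 900.0  # assumed value, not taken from the AgentCore docs
    _sessions: dict = field(default_factory=dict)  # actor_id -> (session_id, last_used)

    def current(self, actor_id: str, now: Optional[float] = None) -> str:
        """Return a usable session ID, rotating first if the old one went idle."""
        now = time.monotonic() if now is None else now
        entry = self._sessions.get(actor_id)
        if entry is None or now - entry[1] > self.idle_timeout_s:
            # New handle. Whatever context the old session held is gone.
            session_id = f"{actor_id}-{uuid.uuid4().hex}"
        else:
            session_id = entry[0]
        # Every request refreshes the last-used timestamp.
        self._sessions[actor_id] = (session_id, now)
        return session_id
```

Calling current() before every invocation means the caller never sends a session ID the runtime has already forgotten. The `now` parameter exists only so the behavior can be exercised without waiting.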

The other thing the docs do not cover is the difference between a session ID and an actor ID. The actor ID is your user identifier — it scopes memory to a person. The session ID is the runtime handle — it scopes context to a conversation. They are related but not the same. Conflating them produces subtle bugs that are annoying to trace.
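One way to keep the two apart is to make the scoping visible in the data structures. The sketch below is illustrative only; MemoryStore and its fields are hypothetical names, not part of the SDK.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical store that makes the two scopes explicit."""

    # Long-lived facts about a person: keyed by actor ID only.
    preferences: dict = field(default_factory=dict)
    # Conversation context: keyed by the (actor ID, session ID) pair.
    history: dict = field(default_factory=dict)

    def remember(self, actor_id: str, session_id: str, message: str) -> None:
        self.history.setdefault((actor_id, session_id), []).append(message)

    def context(self, actor_id: str, session_id: str) -> list:
        return self.history.get((actor_id, session_id), [])


store = MemoryStore()
store.preferences["alice"] = {"tone": "terse"}
store.remember("alice", "session-1", "hello")
```

After a rotation to "session-2", context("alice", "session-2") is empty while the preferences for "alice" are still there. Using one ID for both keys would either leak conversation history across sessions or lose preferences on every rotation.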

> MEMORY CLIENT MIGRATION: FROM RAW API TO SDK#

The original implementation called the memory API directly. Raw boto3 calls, manual response parsing, no abstraction. It worked, but it had a problem: every request triggered three separate session lookups. Check if session exists, get session details, get memory context. Three round trips before the agent could do anything useful.

The migration to MemoryClient cleaned up the interface but did not automatically fix the duplicate lookups. That required a deliberate lazy-loading pattern — check once, cache the result, reuse it for the lifetime of the request.
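The pattern itself is small. A minimal sketch, assuming a single loader that returns everything the three lookups used to fetch separately; LazySession and the `fetch` callable are hypothetical names for the example, not MemoryClient APIs.

```python
from typing import Callable, Optional


class LazySession:
    """Resolve session details at most once per request, then reuse them."""

    def __init__(self, fetch: Callable[[str], dict], session_id: str) -> None:
        self._fetch = fetch  # hypothetical loader that performs the round trip
        self._session_id = session_id
        self._details: Optional[dict] = None

    @property
    def details(self) -> dict:
        if self._details is None:  # only the first access pays for the lookup
            self._details = self._fetch(self._session_id)
        return self._details

    @property
    def exists(self) -> bool:
        return bool(self.details)

    @property
    def memory_context(self) -> list:
        return self.details.get("memory", [])
```

The existence check, the details, and the memory context all read from the same cached result, so the order in which the request handler asks for them no longer matters.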

Here is the relevant piece of the request context builder after the fix:

from typing import cast


async def _build_request_context(payload: dict) -> tuple[RequestContext | None, list[dict]]:
    messages: list[dict] = []  # errors and notices returned alongside the context
    prompt = payload.get("prompt", "")
    username = payload.get("username", "user")
    session_id = payload.get("session_id")  # may be None on a first request
    qualifier = payload.get("qualifier")
    if not qualifier:
        messages.append({"error": "qualifier is required", "code": "MISSING_QUALIFIER"})
        return None, messages

    memory_skill = get_memory_skill()  # singleton, fetched once per request
    actor_id = memory_skill.sanitize_actor_id(username)

    # Session rotation: check expiry ONCE, rotate if needed
    if memory_skill.is_session_expired(session_id):
        # Drop everything cached for the old session before switching handles
        preferences_cache.pop(actor_id, None)
        recap_cache.pop((actor_id, cast(str, session_id)), None)
        base_options_cache.pop(actor_id, None)
        session_id = memory_skill.rotate_session(actor_id)

    mem_manager, session_id = memory_skill.get_session_manager(actor_id, session_id)

The get_memory_skill() call returns a singleton. The caches are keyed differently: preferences and base options by actor ID, the recap by the (actor ID, session ID) pair. When a session rotates, all three entries are invalidated for that actor. The next request rebuilds them from the memory store.

The qualifier check at the top is not optional. Without a qualifier, you do not know which deployment you are talking to — staging or production. Routing the wrong request to the wrong environment is the kind of mistake that is hard to debug and easy to prevent.

The lazy loading reduced per-request latency noticeably. Three round trips became one, with the result cached for the duration of the session. That is real. That is not nothing.

> STREAMING: FIFTEEN ITERATIONS ON A SCROLL BAR#

The agent streams responses as server-sent events. Parsing SSE is not complicated. Getting the UI to behave correctly while streaming is a different problem entirely.

The core issue is that streaming creates a conflict between two things the user wants simultaneously: see new content as it arrives, and be able to scroll back through history without being yanked to the bottom every time a new chunk lands.

The naive implementation auto-scrolls on every chunk. This is wrong. If the user has scrolled up to read something, the next chunk should not drag them back down. But if the user is at the bottom, they do want to follow the stream.

Here is the commit history for that scroll behavior, in order:

Fix scroll reset when reading chat history
Fix scroll behavior with null check and requestAnimationFrame
Fix streaming chunks display
Restore batch rendering for streaming deltas
Remove auto-scroll on chunk arrival

Five commits. Fifteen-plus iterations across those commits. The final behavior: scroll-to-bottom fires only when the user is already within a threshold of the bottom. If they have scrolled up, the stream continues silently and the new content is there when they scroll back down.
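The final rule reduces to one comparison. The real handler lives in browser code; the sketch below restates the predicate in Python to stay in one language with the rest of the examples. The function name and the 80-pixel threshold are assumptions, not values from the project.

```python
def should_follow_stream(scroll_top: float, viewport_height: float,
                         content_height: float, threshold_px: float = 80.0) -> bool:
    """True when the reader is close enough to the bottom to keep following."""
    # Distance between the bottom edge of the viewport and the end of the content.
    distance_from_bottom = content_height - (scroll_top + viewport_height)
    return distance_from_bottom <= threshold_px
```

The check runs before each scroll-to-bottom. A reader sitting 30 pixels from the end keeps following the stream. A reader who scrolled 750 pixels up is left alone.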

The other streaming problem was markdown rendering. Rendering markdown on each individual chunk produces garbage — partial code blocks, broken emphasis, half-rendered headers. The fix is to accumulate the full response and render markdown on the complete string. Display the raw delta as it arrives, swap in the rendered version when the stream closes.

Batch DOM updates matter here too. Updating the DOM on every chunk is expensive. Batching updates — accumulate chunks, flush on a requestAnimationFrame — keeps the UI responsive under fast streams. The difference is visible. Without batching, the browser struggles. With batching, it is smooth.
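Both ideas, batching and deferred markdown, fit in one small accumulator. This is a language-neutral sketch written in Python, not the UI code: DeltaBatcher, `render`, and `render_markdown` are hypothetical names, and flush() stands in for whatever the requestAnimationFrame callback does.

```python
class DeltaBatcher:
    """Collect streamed deltas and paint them at most once per frame."""

    def __init__(self, render) -> None:
        self._render = render  # callback that paints the raw text so far
        self._pending = []     # deltas received since the last frame
        self._text = ""        # full response accumulated so far

    def push(self, delta: str) -> None:
        # Cheap: no rendering happens on chunk arrival.
        self._pending.append(delta)

    def flush(self) -> None:
        """Called once per frame; does nothing if no new chunks arrived."""
        if not self._pending:
            return
        self._text += "".join(self._pending)
        self._pending.clear()
        self._render(self._text)

    def close(self, render_markdown) -> str:
        """On stream end, paint the tail and render markdown on the full string."""
        self.flush()
        return render_markdown(self._text)
```

However many chunks land between two frames, the display is touched once. Markdown only ever sees the complete string, so there are no half-open code blocks to mis-render.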

None of this is novel. These are known patterns. But you still have to implement them, and you still have to get them wrong a few times before you get them right.

> DEPLOY PIPELINE: GRAVITON, TRIVY, AND BLUE/GREEN#

The agent runs on Graviton. That means building for linux/arm64. Docker buildx handles the cross-compilation, but it adds time to the build. On a developer machine, a full image build takes a few minutes. In CI, plan for longer.

Trivy scans the image before deploy. The default timeout is 180 seconds. The image takes around 250 seconds to scan. The first few deploys failed at the scan step with a timeout error. The fix was to bump the timeout to 360 seconds. Not elegant, but correct.
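For reference, the two pipeline commands can be assembled like this. It is a sketch under assumptions: the helper names are made up, and pushing straight from buildx with --push is a guess at how the image reaches the registry. The flags themselves (--platform, --tag, --push for buildx, --timeout for trivy image) are standard.

```python
def buildx_command(image: str, context: str = ".") -> list:
    """Cross-build for Graviton (linux/arm64) and push the result."""
    return ["docker", "buildx", "build", "--platform", "linux/arm64",
            "--tag", image, "--push", context]


def trivy_command(image: str, timeout_s: int = 360) -> list:
    """Scan the image with a timeout long enough for a ~250 second scan."""
    return ["trivy", "image", "--timeout", f"{timeout_s}s", image]
```

Each list can be handed to subprocess.run(cmd, check=True), so a failed build or scan stops the pipeline before the deploy step.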

The deploy itself is blue/green via update_agent_runtime. The current version keeps serving traffic while the new image is pushed. Once the update completes, the runtime cuts over. The previous version is recorded before the update so rollback is a single API call.

import boto3


def update_runtime(state: DeployState, config: dict) -> DeployState:
    client = boto3.client('bedrock-agentcore-control', region_name=config['region'])
    # Read the live runtime so roleArn and networkConfiguration are carried over
    current = client.get_agent_runtime(agentRuntimeId=config['agent_id'])

    # Record previous version for rollback
    endpoints = client.list_agent_runtime_endpoints(agentRuntimeId=config['agent_id'])
    for ep in endpoints.get('runtimeEndpoints', []):
        if ep.get('name', '') == 'staging':
            state.previous_version = ep.get('liveVersion', '')
            break  # found the endpoint, no need to scan further

    # Push the new image; env vars come from the deploy config, not the old runtime
    resp = client.update_agent_runtime(
        agentRuntimeId=config['agent_id'],
        agentRuntimeArtifact={'containerConfiguration': {'containerUri': state.ecr_uri}},
        roleArn=current['roleArn'],
        networkConfiguration=current['networkConfiguration'],
        environmentVariables=config['env_vars'],
    )
    state.version = resp.get('agentRuntimeVersion')
    return state

The roleArn and networkConfiguration are pulled from the current runtime state rather than hardcoded. This matters because those values can change — IAM role updates, VPC changes — and you do not want the deploy script to silently revert them to stale values.

The environmentVariables come from config, not from the current runtime state. Environment variables are the mechanism for passing deployment-specific configuration. They should be explicit in the deploy config, not inherited from whatever was there before.

The DeployState object carries the version through the pipeline. If a later step fails, the rollback function has the previous version available without needing to make another API call.
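A rollback built on that recorded version might look like the sketch below. It rests on an assumption worth checking against the control-plane API reference: that repointing an endpoint is done with update_agent_runtime_endpoint taking agentRuntimeId, endpointName, and agentRuntimeVersion. The rollback function itself is hypothetical and takes the client as a parameter so it can be exercised without AWS.

```python
def rollback(client, agent_id: str, endpoint_name: str, previous_version: str) -> bool:
    """Point the endpoint back at the version recorded before the update."""
    if not previous_version:
        # Nothing was recorded (first deploy, or endpoint not found): nothing to do.
        return False
    # Assumed call shape; verify against the bedrock-agentcore-control reference.
    client.update_agent_runtime_endpoint(
        agentRuntimeId=agent_id,
        endpointName=endpoint_name,
        agentRuntimeVersion=previous_version,
    )
    return True
```

Returning False on an empty version keeps a failed first deploy from issuing a call with nothing to roll back to.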

> PROXY LAYER: ROUTING ANTHROPIC API TO AMAZON Q PRO#

The agent uses Claude via the Anthropic SDK. The SDK expects to talk to Anthropic's API. The actual backend is Amazon Q Pro. The proxy layer sits between them and makes the translation invisible.

The proxy — kiro-gateway — downloads its auth database from Secrets Manager at startup, starts a uvicorn server on port 8081, and sets environment variables that redirect the Anthropic SDK to the local proxy endpoint. From the SDK's perspective, it is talking to Anthropic. From the network's perspective, the traffic goes to Amazon Q Pro.

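import os


def start_kiro_proxy() -> None:
    # Keep the process handle at module level; as a plain local it is lost on return
    global _gateway_process
    db_path = _download_db()
    _gateway_process = _start_gateway(db_path)
    if _wait_for_ready():
        # Redirect the Anthropic SDK only once the proxy is accepting connections
        os.environ["ANTHROPIC_BASE_URL"] = f"http://localhost:{PROXY_PORT}"
        os.environ["ANTHROPIC_API_KEY"] = PROXY_API_KEY
        os.environ["CLAUDE_CODE_USE_BEDROCK"] = "0"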

The CLAUDE_CODE_USE_BEDROCK flag is set to "0" explicitly. Without this, the SDK tries to use the Bedrock endpoint directly, bypassing the proxy. The proxy needs to be the only path.

A background thread monitors token health. Amazon Q Pro has token limits. If the current token pool is running low, the monitor logs a warning. If it is exhausted, the monitor triggers a rotation. The agent keeps running through the rotation — requests queue briefly, then resume. The user sees a short pause, not an error.
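The warn-then-rotate logic can be sketched as a pure decision function plus a thin thread wrapper. Everything named here is hypothetical: check_token_health, start_monitor, the 1000-token low watermark, and the 30-second interval are illustrative, not taken from kiro-gateway.

```python
import threading
from typing import Callable


def check_token_health(remaining: int, low_watermark: int,
                       warn: Callable[[str], None],
                       rotate: Callable[[], None]) -> str:
    """Decide what to do for one health sample."""
    if remaining <= 0:
        rotate()  # pool exhausted: switch tokens
        return "rotated"
    if remaining < low_watermark:
        warn(f"token pool low: {remaining} remaining")
        return "low"
    return "ok"


def start_monitor(get_remaining: Callable[[], int],
                  warn: Callable[[str], None],
                  rotate: Callable[[], None],
                  interval_s: float = 30.0,
                  low_watermark: int = 1000) -> threading.Event:
    """Run the check on a daemon thread; set the returned event to stop it."""
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns False on timeout, True once stop is set.
        while not stop.wait(interval_s):
            check_token_health(get_remaining(), low_watermark, warn, rotate)

    threading.Thread(target=loop, daemon=True, name="token-monitor").start()
    return stop
```

Keeping the decision separate from the thread makes the thresholds easy to exercise in isolation.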

Restarts are the operator's call, not the agent's. The agent starts the proxy at boot and monitors it, but it does not restart it autonomously. Restarting the proxy mid-session would drop in-flight requests and corrupt session state. That decision belongs to the operator, not the runtime.

> CROSS-AGENT INVOCATION: DICE CALLS CRABSTIK#

There are two agents in this system. Dice runs in the IDE. Crabstik runs in AgentCore. They share memory but operate independently. Sometimes Dice needs to invoke Crabstik directly — to run a skill that only exists in the AgentCore runtime, or to hand off a task that benefits from the deployed environment.

The invocation pattern is straightforward: boto3 invoke_agent_runtime, stream the response, print deltas as they arrive.

import json

import boto3


def invoke(prompt, qualifier, username, session_id=None):
    cfg = _load_config()
    client = boto3.client('bedrock-agentcore', region_name=cfg['region'])

    # Reuse the caller's session if one was passed in; otherwise look one up or create it
    if not session_id:
        actor_id = f"{qualifier}-{username}"
        session_id = get_or_create_session(username, actor_id, 'remote')

    resp = client.invoke_agent_runtime(
        agentRuntimeArn=cfg['agent_arn'],
        qualifier=qualifier,
        runtimeSessionId=session_id,
        payload=json.dumps({'prompt': prompt, 'username': username,
                           'session_id': session_id, 'qualifier': qualifier}),
        contentType='application/json',
    )

    # iter_lines() reassembles lines that span chunk boundaries; splitting each raw
    # chunk on '\n' can cut a JSON event in half
    for raw in resp['response'].iter_lines():
        line = raw.decode('utf-8')
        if line.startswith('data: '):
            data = json.loads(line[6:])
            if data.get('delta'):
                print(data['delta'], end='', flush=True)

The qualifier selects which Crabstik deployment handles the call: staging or production. If no session ID is passed in, one is looked up or created with the same actor-scoped pattern as the memory client. This means Crabstik picks up the same memory context that Dice has been building throughout the session.

The response is a stream of SSE lines. The parsing is the same pattern as the UI streaming — look for data: prefixed lines, parse JSON, extract the delta. The difference is that here the output goes to stdout rather than a DOM. The principle is the same.
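Pulled out of the I/O, the parsing step is a small generator. A minimal sketch, assuming the event shape used throughout this article (a JSON object with a "delta" key); iter_deltas is a hypothetical name, and skipping malformed events instead of raising is a design choice made for the example.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the text deltas from an iterable of raw SSE lines."""
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r")
        if not line.startswith("data: "):
            continue  # blank separators, comments, other SSE fields
        try:
            event = json.loads(line[len("data: "):])
        except json.JSONDecodeError:
            continue  # tolerate a malformed event instead of killing the stream
        if isinstance(event, dict) and event.get("delta"):
            yield event["delta"]
```

The same generator can feed print() for a CLI or a batcher for a UI, which keeps the wire format in one place.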

Session persistence across the Dice/Crabstik boundary is what makes this useful. If Dice has been working with a user for an hour, Crabstik does not start cold when invoked. It inherits the session context. The handoff is transparent.

> WHAT BUILDING EARLY ACTUALLY MEANS#

AgentCore is early. The SDK works. The docs will catch up. Building on something before the tutorials exist means reading source code to understand behavior that has not been written down yet. It means shipping broken code to learn the failure modes. It means accumulating a set of patterns — session management, lazy loading, streaming, blue/green deploy — that no one has packaged into a guide because the ecosystem is still forming.

The upside is that you understand the system at a level that people who wait for the tutorials will not. You know why the session expires. You know why the Trivy timeout needed doubling. You know what happens when you conflate actor ID and session ID. That knowledge is durable. The docs, when they arrive, will confirm what you already know.


Joe Gajeckyj is a founder and infrastructure engineer with 19 years in IT, currently building AI-assisted development workflows at JRGWorkshop.