A CLI for managing mobile in-app purchases

The studio I'm contracting with, Epic Games, ships in-app purchases for Fortnite on three mobile platforms: Apple App Store, Google Play, and ONE Store. When I started, all of this was done manually. I built tools to make it faster and more reliable, one per task per platform. By the time there were four of them, I noticed they were all roughly the same shape: read a curated input, resolve some lookups, hit a platform API, report the result. Each tool was solving its narrow problem fine. The redundancy across them was the problem.

The first real pressure point was pricing. V-Bucks price changes affect around a hundred products across the three platforms. Google Play alone requires prices in 79 regional currencies per product. The four single-purpose tools didn't compose well. Updating pricing across all three platforms meant running three tools sequentially, eyeballing the results between runs, and hoping nothing was missed.

So I built one CLI to replace them. Three domains under one entry point (pricing, store copy, localization), with shared infrastructure underneath. The framework was the product. Once it existed, adding a new capability stopped meaning "build a fifth tool" and started meaning "add a new function inside the existing one."
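One way to picture "three domains under one entry point" is a single argument parser with one subcommand per domain and the shared options declared once. This is a minimal sketch of that shape only. The program name `iap`, the subcommand names, the `--platform` and `--dry-run` flags, and the handler functions are all hypothetical, not the real tool's interface.

```python
import argparse

# Hypothetical handlers: each domain plugs into the same shared plumbing.
def run_pricing(args):
    return f"pricing on {args.platform} (dry_run={args.dry_run})"

def run_store_copy(args):
    return f"store-copy on {args.platform} (dry_run={args.dry_run})"

def run_localization(args):
    return f"localization on {args.platform} (dry_run={args.dry_run})"

DOMAINS = {
    "pricing": run_pricing,
    "store-copy": run_store_copy,
    "localization": run_localization,
}

def build_parser():
    # Options every domain shares are declared once on a parent parser.
    shared = argparse.ArgumentParser(add_help=False)
    shared.add_argument("--platform", choices=["apple", "google", "onestore"], required=True)
    shared.add_argument("--dry-run", action="store_true")

    parser = argparse.ArgumentParser(prog="iap")
    sub = parser.add_subparsers(dest="domain", required=True)
    for name, handler in DOMAINS.items():
        # Adding a capability means registering one more handler here.
        sub.add_parser(name, parents=[shared]).set_defaults(handler=handler)
    return parser

def main(argv):
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

Under these assumptions, `main(["pricing", "--platform", "google", "--dry-run"])` dispatches to the pricing handler, and a new domain is one more entry in `DOMAINS` rather than a new tool.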

The proof was the iOS Global ship

I scoped a net-new iOS Global capability on a Friday and had it in production the following Tuesday morning. Around a hundred IAP products, 175 pricing territories per product, 1,649 new locale entries. About twenty minutes of API runtime once the code path was right. Four days from spec to production.

The piece that makes a four-day turnaround possible isn't writing code faster. It's that the framework was already there. The ingest pipeline already knew how to read source spreadsheets. The reporting layer already knew how to summarize a run. The auth scaffolding already worked. What I added was the actual platform-specific code for the new capability. Everything else was already paid for.

The load-bearing decision: hybrid, not fully agentic

The original framing was that everything in this CLI was agentic: an AI agent (Claude) reads the structured input data and decides which tools to call. The agent doesn't do the hard work itself. Deterministic code parses spreadsheets, resolves regional pricing, and calls the platform APIs. The agent picks what to call, when, and in what order.

The framing was right inside the loops that actually had branching decisions to make. Pricing across three platforms with different APIs, different auth, different failure modes. The agent earns its keep there. Store-copy updates across many locales, with the model handling task-type variation. Same.

But not every loop needs an agent. Some work is just "iterate over data, call one function." Localization, in its current form, is exactly that. Every product gets the same operation. There's no platform to pick. No edge case to route around. No real decision for the model to add.

The immediate trigger to extract that loop from the agent was performance. Every agent loop turn costs a round-trip to the model. Claude has to be asked which tool to call next. For a loop with no real decision to make, that round-trip is pure latency in exchange for a "decision" the code already knew. Removing the agent from the deterministic loop cut the per-product latency to whatever the platform API itself takes.

The principle the trigger surfaced: use the agent where it has real decisions to make. Where the work is the same call for every item, a deterministic loop is faster, more predictable, and easier to debug.
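The difference between the two kinds of loop can be sketched in a few lines. Everything here is illustrative: `StubModel` stands in for the agent and only counts how often it is consulted, and the tool functions and product IDs are hypothetical placeholders for real platform calls.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]

def update_locale(product_id: str) -> str:
    # Hypothetical stand-in for a single platform API call.
    return f"locale updated: {product_id}"

def set_price_apple(product_id: str) -> str:
    return f"apple price set: {product_id}"

def set_price_google(product_id: str) -> str:
    return f"google price set: {product_id}"

class StubModel:
    """Stand-in for the agent; counts how many round-trips it was asked for."""

    def __init__(self) -> None:
        self.round_trips = 0

    def choose_tool(self, product_id: str, tool_names: List[str]) -> str:
        self.round_trips += 1
        # A real agent would reason over the input; this stub routes by prefix.
        return "set_price_apple" if product_id.startswith("ios.") else "set_price_google"

def run_agentic(products: List[str], tools: Dict[str, Tool], model: StubModel) -> List[str]:
    # The work branches per item, so each item costs one model round-trip.
    return [tools[model.choose_tool(p, sorted(tools))](p) for p in products]

def run_deterministic(products: List[str], tool: Tool) -> List[str]:
    # Same operation for every item: the loop already knows what to call.
    return [tool(p) for p in products]
```

In this sketch the agentic loop pays one `choose_tool` call per product, while the deterministic loop pays none. That is the whole latency argument in miniature.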

The deeper version: agentic isn't a property of the system. It's a property of the loop. A "hybrid agentic/deterministic CLI" sounds like a compromise. It's actually the more honest framing. Different parts of the work have different shapes, and the system should mirror that.

Two-phase processing is the other framework decision

Every run, agentic or deterministic, is split into two phases. First, ingest. The tool reads the input spreadsheet (pricing rosters, locale sheets, or store-copy drafts, depending on the run) and resolves it into a structured JSON file. That file is human-readable and inspectable. You can see exactly what the system is about to do before anything writes to a platform.
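A minimal sketch of what an ingest step like this could look like, under stated assumptions: CSV text stands in for the real spreadsheet format, and the column names, the `status` field, and the overall plan layout are hypothetical rather than the tool's actual schema.

```python
import csv
import io
import json

def ingest(sheet_text: str) -> dict:
    """Resolve a curated sheet into a plan a person can read before any write."""
    operations = []
    for row in csv.DictReader(io.StringIO(sheet_text)):
        operations.append({
            "product_id": row["product_id"].strip(),
            "platform": row["platform"].strip().lower(),
            "currency": row["currency"].strip().upper(),
            # Prices stay as strings so no float rounding creeps into the plan.
            "price": row["price"].strip(),
            "status": "pending",
        })
    return {"domain": "pricing", "operations": operations}

def write_plan(plan: dict, path: str) -> None:
    # Indented, key-sorted JSON keeps the file easy to read and to diff.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(plan, fh, indent=2, sort_keys=True)
```

Nothing in this phase touches a platform. The only output is a file someone can open, review, and diff against the previous run.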

Then run. The run phase reads the JSON, calls the appropriate platform tools, retries failures, and writes a markdown report.

The point of separating them is that the middle layer, the JSON, is a checkpoint. A reviewer can read it. A bug can be diagnosed from it. A re-run can resume from it. The cost of running through a half-broken automation is bounded.
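The checkpoint idea can be sketched as a run loop that records a status on each operation and skips anything already finished. This is an illustration, not the real implementation: the plan layout (a list of operations with a `status` field), the `call_platform` callable, and the report wording are all assumptions.

```python
from typing import Callable

def run_plan(plan: dict, call_platform: Callable[[dict], None], max_attempts: int = 3) -> dict:
    """Execute pending operations in place; finished ones are skipped on a re-run."""
    for op in plan["operations"]:
        if op.get("status") == "done":
            continue  # resume: finished work is not repeated
        for attempt in range(1, max_attempts + 1):
            try:
                call_platform(op)
                op["status"] = "done"
                op.pop("error", None)
                break
            except Exception as exc:  # sketch only; real code would catch narrower errors
                op["status"] = "failed"
                op["error"] = f"attempt {attempt}: {exc}"
    return plan

def render_report(plan: dict) -> str:
    """Summarize a run as a short markdown report."""
    ops = plan["operations"]
    done = sum(1 for op in ops if op["status"] == "done")
    lines = ["# Run report", "", f"- done: {done}", f"- not done: {len(ops) - done}"]
    for op in ops:
        if op["status"] != "done":
            lines.append(f"- {op['product_id']}: {op.get('error', 'not run')}")
    return "\n".join(lines)
```

If the updated plan is written back to disk after each run, a second invocation only retries what failed, which is what keeps the cost of a half-broken run bounded.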

For platforms that have public APIs (Apple App Store, Google Play), this is straightforward. For the third platform, ONE Store, there's no public API, so the tool drives a browser through Playwright MCP. The agent observes pages at runtime and decides what to interact with. The two-phase shape still applies. The difference is that the "run" phase pilots a UI instead of calling an endpoint.
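A rough sketch of how one run phase could sit on top of two kinds of backend. The class names, platform keys, and returned strings are hypothetical, and the browser side is deliberately stubbed: no Playwright or MCP calls are shown, because the real session logic is not described here.

```python
from typing import Dict, List, Protocol

class Backend(Protocol):
    def apply(self, op: dict) -> str: ...

class ApiBackend:
    """For platforms with a public API; the request itself is stubbed out."""

    def apply(self, op: dict) -> str:
        return f"api call for {op['product_id']}"

class BrowserBackend:
    """For a platform with no public API; a real version would drive a browser session."""

    def apply(self, op: dict) -> str:
        return f"ui session for {op['product_id']}"

BACKENDS: Dict[str, Backend] = {
    "apple": ApiBackend(),
    "google": ApiBackend(),
    "onestore": BrowserBackend(),
}

def run(plan: dict) -> List[str]:
    # The run phase is the same loop either way; only the backend differs.
    return [BACKENDS[op["platform"]].apply(op) for op in plan["operations"]]
```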

Where it stands

Pricing, store copy, and IAP localization all ship in production. The product-creation domain is planned but not yet built. The predecessor tools have the creation code; it needs to be ported into the same framework.

The iOS Global ship validated the consolidation thesis end-to-end. Net-new capability scoped Friday and live by Tuesday morning, around twenty minutes of API runtime once the code path was right, every update verified against the platform API after write. No silent failures.
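Verify-after-write is simple to sketch: write the value, read it back from the platform, and compare before calling the update done. The `write` and `read_back` callables and the result fields below are hypothetical stand-ins for real platform calls.

```python
from typing import Callable, Optional

def write_and_verify(
    op: dict,
    write: Callable[[dict], None],
    read_back: Callable[[str], Optional[str]],
) -> dict:
    """Write one value, then confirm the platform reports the same value."""
    write(op)
    actual = read_back(op["product_id"])
    return {
        "product_id": op["product_id"],
        "expected": op["price"],
        "actual": actual,
        # A write that returned without error but did not stick shows up here.
        "verified": actual == op["price"],
    }
```

A write that is accepted but silently dropped comes back with `verified` set to `False`, so it lands in the report instead of going unnoticed.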

The agent isn't smart. The infrastructure around it is.