Migration paper · Results · Forge silent / assisted / Sumi manual

The three paths

Path	Effort	One-shot LLM cost (tokens in / out)	Manifest elements
Forge silent (`yf migrate --ai-silent`)	23 s wall, no human input	$0.0468 · 2224 in / 2672 out	23 (4 region/field + 19 actions)
Forge assisted (`yf migrate --ai-assisted`)	25 s wall, model self-reports no ambiguities	$0.0473 · 2379 in / 2677 out	23 (identical to silent for this app)
Sumi manual (hand-decorated reference)	~30 min human time	--	23 (2 region + 2 field + 19 actions)

Why assisted == silent here. Assisted mode lets the model raise ambiguity questions ("is this 'save' or 'submit'?") before producing the final decoration. The calculator's button labels and inline onclick handlers leave no ambiguity, so the model returns the same JSON as silent. Assisted's value shows up on apps with overlapping action semantics (CRM forms with two save-like buttons, multi-tab dashboards, etc.).

Functional success

All three decorations drive the calculator correctly on every task in the suite.

Task	Forge silent	Forge assisted	Sumi manual	Expected
3 + 4 =	7 OK	7 OK	7 OK	7
12 + 15 =	27 OK	27 OK	27 OK	27
7 x 8 =	56 OK	56 OK	56 OK	56
100 / 4 =	25 OK	25 OK	25 OK	25
clear + 2+2+2 =	6 OK	6 OK	6 OK	6

Cost per dispatch

The same driver (Sonnet 4.6) runs each of the 5 tasks against each fixture. Tokens and latency are per-task means; cost is the sum over the 5 tasks.

Fixture	Mean tokens in	Mean tokens out	Mean LLM latency	Total cost (5 tasks)
forge-silent	1153	114	2510 ms	$0.02575
forge-assisted	1153	114	2398 ms	$0.02564
sumi-manual	1442	113	2542 ms	$0.03004

Driving the manual fixture costs about 17% more per task than Forge ($0.00601 vs $0.00515), on about 25% more input tokens (1442 vs 1153). The delta is the label_i18n payload Sumi attached to every action, field, and region. That payload is useful for accessibility but neutral for the model picking a verb, since the model already has the verb name.
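The per-task figures can be approximated from the mean token counts. The sketch below assumes rates of $3 per million input tokens and $15 per million output tokens. These rates are not stated in the results; they are inferred because they reproduce the reported decoration costs. The function name `llm_cost` is hypothetical.

```python
# Assumed rates (not stated in the results): USD per million tokens.
PRICE_IN_PER_M = 3.00
PRICE_OUT_PER_M = 15.00


def llm_cost(tokens_in: float, tokens_out: float) -> float:
    """Return the USD cost of one call under the assumed rates."""
    return (tokens_in * PRICE_IN_PER_M + tokens_out * PRICE_OUT_PER_M) / 1_000_000


# Mean tokens per task, copied from the cost-per-dispatch table.
forge = llm_cost(1153, 114)
manual = llm_cost(1442, 113)

# Relative overhead of the manual fixture over Forge.
cost_overhead = manual / forge - 1
input_overhead = 1442 / 1153 - 1

print(f"forge  ${forge:.5f} per task")
print(f"manual ${manual:.5f} per task")
print(f"cost +{cost_overhead:.1%}, input tokens +{input_overhead:.1%}")
```

Because the inputs are rounded means, the estimates should land close to the reported per-task costs rather than match them exactly. The cost overhead is smaller than the input-token overhead because output tokens, which are nearly identical across fixtures, carry a fixed share of each call's cost.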

Total cost of ownership

One-time decoration plus N dispatches at the per-task rates above. (Decoration cost for manual modeled at $30 = 30 min @ $60/h for a human.)

Approach	Decoration	Per task	At N = 100	At N = 10000
Forge silent	$0.0468	$0.00515	$0.56	$51.5
Forge assisted	$0.0473	$0.00513	$0.56	$51.3
Sumi manual	$30.00 (30 min)	$0.00601	$30.60	$90.1
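The rows above follow from a linear cost model: one-time decoration plus N dispatches at a flat per-task rate. A minimal sketch, using only the figures from the table (the function name `tco` is illustrative):

```python
def tco(decoration_usd: float, per_task_usd: float, n_dispatches: int) -> float:
    """One-time decoration cost plus n dispatches at a flat per-task rate."""
    return decoration_usd + per_task_usd * n_dispatches


# (decoration, per-task) pairs copied from the table above.
approaches = {
    "Forge silent": (0.0468, 0.00515),
    "Forge assisted": (0.0473, 0.00513),
    "Sumi manual": (30.00, 0.00601),
}

for name, (decoration, per_task) in approaches.items():
    at_100 = tco(decoration, per_task, 100)
    at_10k = tco(decoration, per_task, 10_000)
    print(f"{name:15s} N=100: ${at_100:.2f}  N=10000: ${at_10k:.2f}")
```

The model ignores anything that scales differently with N, such as retries or re-decoration after UI changes; neither is measured here.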

Forge wins on TCO at every N shown: both its decoration cost and its per-task cost are lower than manual, so there is no break-even point in manual's favor. For the long tail (>50k dispatches in one app), a hybrid wins: have Forge produce the manifest, then trim label_i18n off non-action elements in a one-line post-pass.
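The trimming post-pass could look roughly like the sketch below. The manifest schema is not shown in these results, so the shape used here (a top-level `elements` list whose entries carry a `kind` and an optional `label_i18n`) is an assumption for illustration, as are the function name `trim_labels` and the sample ids and verbs.

```python
import json


def trim_labels(manifest: dict) -> dict:
    """Return a copy of the manifest without label_i18n on non-action elements.

    Assumes a hypothetical shape: {"elements": [{"kind": ..., "label_i18n": ...}]}.
    """
    trimmed = []
    for element in manifest.get("elements", []):
        if element.get("kind") != "action":
            # The actual "one-line" step: rebuild the element without the label payload.
            element = {k: v for k, v in element.items() if k != "label_i18n"}
        trimmed.append(element)
    return {**manifest, "elements": trimmed}


# Hypothetical two-element manifest, for illustration only.
example = {
    "elements": [
        {"kind": "region", "id": "keypad", "label_i18n": {"en": "Keypad"}},
        {"kind": "action", "verb": "press_7", "label_i18n": {"en": "7"}},
    ]
}
print(json.dumps(trim_labels(example), indent=2))
```

The function builds new dictionaries instead of mutating the input, so the full manifest stays available for accessibility tooling while the trimmed copy goes to the driver.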

Latency

Driver latency (LLM round-trip) is ~2.5 s mean across all three fixtures. The decoration call itself (one-shot Claude Sonnet 4.6 with the whole HTML + JS) takes 23-25 s; it is a one-off per app and negligible in TCO. Dispatch latency in-browser is <50 ms per verb (NAC.click_by_verb is essentially DOM .click() + a contract event).

Conclusions

NAC3 migration is cheap. Forge does a clean job for $0.05 and 25 s. The output is drop-in: same UI, plus a JSON manifest the agent reads.
Silent beats manual on operational cost. The manual reference costs about 17% more per task (about 25% more input tokens), which is the price of being thorough on label_i18n. If you're driving the app from an LLM thousands of times, drop the i18n off non-action elements.
Assisted only earns its keep on ambiguous apps. For a calculator, the model has nothing to ask. For a CRM with two save-like buttons, the assisted round saves a bad guess.
Functional parity is the headline. Five tasks, three decoration paths, zero failures -- the model picks the right verb chain from any of the three manifest shapes.