00

ACE replaces GPU-heavy fine-tuning, brittle RAG, and giant ever-changing prompts with an elegant framework that dynamically adjusts to your changing organization, with little or no human intervention.

See how ACE can handle difficult edge cases and produce more deterministic output for your chat and voice agents.

01

Right now, capturing ever-evolving edge cases usually means one of three expensive patterns.

The hard part is not getting an LLM to answer once. The hard part is making it improve continuously without turning your prompt stack into a second codebase.

01

You maintain large prompts that get longer every week.

02

You build fragile RAG systems that retrieve inconsistent examples or outdated policy snippets.

03

You fine-tune models, which adds GPU cost, data-prep overhead, evaluation burden, and a slower iteration loop.

02

SCX ACE is a radically different approach to context engineering.

Instead of fine-tuning model weights, ACE builds and refines a compact, explainable playbook of operational rules that the model can use at inference time.

Fast: New examples can become usable rules without a GPU training cycle.
Self-improving: Failures become refinement inputs.
Explainable: Every improvement is represented as a human-readable rule.
Operationally light: No model hosting change, no custom fine-tune artifact, no giant prompt assembled by hand.
Model-portable: The playbook can be used with different frontier or open models.
03

In our runs, ACE improved held-out performance across several domains.

| Dataset | No playbook | With playbook | Lift |
| --- | --- | --- | --- |
| Customer service style | 27 / 60 · 45.0% | 32 / 60 · 53.3% | +8.3 pts |
| Support policy | 30 / 60 · 50.0% | 39 / 60 · 65.0% | +15.0 pts |
| Finance XBRL tagging | 10 / 20 · 71.3% | 11 / 20 · 75.0% | +3.7 pts |

The practical takeaway: teams can often get meaningful gains without enormous dynamic prompts or constant fine-tuning.

04

The flywheel looks deceptively simple. The mechanism underneath automates the context-engineering loop.

Instead of asking prompt engineers to manually maintain a growing policy prompt, ACE observes examples, extracts reusable behavior, tests that behavior against validation cases, and promotes only the rule changes that improve performance. The result is a playbook: a living operational context layer for your model.
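The loop described above can be sketched in a few lines. This is an illustrative sketch, not SCX ACE's actual API: `extract_rule` and `score` are hypothetical callables standing in for the observe/extract and validation steps.

```python
def ace_flywheel(playbook, train_examples, val_examples, extract_rule, score):
    """Observe examples, extract candidate rules, and promote only
    the rule changes that improve the validation score."""
    best = score(playbook, val_examples)
    for example in train_examples:
        candidate_rule = extract_rule(playbook, example)  # observe + extract
        if candidate_rule is None:
            continue
        candidate = playbook + [candidate_rule]           # trial playbook
        candidate_score = score(candidate, val_examples)  # test on validation
        if candidate_score > best:                        # gated promotion
            playbook, best = candidate, candidate_score
    return playbook, best
```

A rule that does not move the validation score is simply never promoted, which is what keeps the playbook compact instead of growing like a hand-maintained prompt.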

05

Imagine your support team gathers:

200 training examples
100 validation examples
60 held-out test examples
{
  "context": "Category: DELIVERY\nIntent: delivery_period\nFlags: BIL",
  "question": "Customer message:\nwhat is the shipping period?",
  "target": "Could you please provide your {{Tracking Number}} or {{Order Number}} so we can provide an accurate delivery estimate?"
}

ACE uses the training examples to construct an initial playbook. Then it uses the validation set to find weak spots. Finally, it runs refinement passes that update the playbook only when the full validation gate improves.

06 Training

ACE "trains" on the 200 examples, producing a human-readable set of rules: the playbook.

playbook.md
# [prob-00023] · Support policy
For delivery_period intent, request the Tracking Number or Order Number before quoting a shipping period.
# [prob-00012] · Customer service workflow
On damaged or missing-item reports, acknowledge, apologise, and request the order ID before promising resolution.
# [ctx-00137] · Finance tagging
Unrecognised stock-based comp on non-vested awards → tag EmployeeServiceShareBasedCompensation…NotYetRecognized.
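Because the playbook is plain, human-readable text, using it at inference time is just prompt assembly. A minimal sketch (helper names are illustrative, not part of SCX ACE; the sample rule mirrors the playbook.md excerpt above):

```python
# One (rule_id, domain, rule_text) entry, taken from the excerpt above.
PLAYBOOK = [
    ("prob-00023", "Support policy",
     "For delivery_period intent, request the Tracking Number or Order "
     "Number before quoting a shipping period."),
]

def render_system_prompt(base_instructions, playbook):
    """Append the playbook's human-readable rules to the base system prompt."""
    lines = [base_instructions, "", "Operational playbook:"]
    for rule_id, domain, rule in playbook:
        lines.append(f"- [{rule_id}] ({domain}) {rule}")
    return "\n".join(lines)
```

No fine-tune artifact, no retrieval index: the same rendered prompt works against any model that accepts a system prompt.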
07 Validation

This is not blind prompt growth. It is gated context evolution.

After ACE creates an initial playbook from the 200 training examples, it evaluates that playbook against the validation set.

Each candidate playbook passes through the validation gate, with three possible outcomes:
Promote

If a refinement candidate improves the validation score, ACE promotes it.

Roll back

If the candidate fails to improve, ACE rolls back and keeps the current playbook.

Training signal

If the model still fails on a case, ACE keeps that failure as training signal for the next refinement pass.
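The three outcomes can be sketched as a single gate function. This is an illustration under assumptions, not SCX ACE's internals: `run_case` is a hypothetical callable that runs the model on one validation example and reports pass/fail.

```python
def validation_gate(current, candidate, val_examples, run_case):
    """Return (decision, playbook_to_keep, failures).

    run_case(playbook, example) -> True if the model passes that case.
    """
    current_score = sum(run_case(current, ex) for ex in val_examples)
    results = [(ex, run_case(candidate, ex)) for ex in val_examples]
    candidate_score = sum(ok for ex, ok in results)
    # Cases the model still fails are kept as training signal.
    failures = [ex for ex, ok in results if not ok]
    if candidate_score > current_score:
        return "promote", candidate, failures   # candidate improves the gate
    return "rollback", current, failures        # keep the current playbook
```

Note that the gate compares full validation scores, so a candidate that fixes one case by breaking two others is rolled back rather than promoted.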

08 Refinement

The playbook is refined through multiple passes of reflection, and every refinement step is fully auditable, so you can walk through them to see how ACE works under the hood. Below are some of the initial failures surfaced during the first refinement pass.

Initial: 30 / 100
Pass 1: 40 / 100
Pass 2: 50 / 100
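A score trajectory like this comes from repeating the gated update. A compact sketch of multi-pass refinement, with illustrative names (`propose` stands in for one reflection pass; nothing here is SCX ACE's real API):

```python
def refine(playbook, passes, propose, score, val_examples):
    """Run several reflection passes, keeping a score history.

    A pass is promoted only if it beats the current validation score;
    otherwise the current playbook is kept and the score is unchanged.
    """
    history = [score(playbook, val_examples)]
    for _ in range(passes):
        candidate = propose(playbook)              # one reflection pass
        new_score = score(candidate, val_examples)
        if new_score > history[-1]:                # promote
            playbook = candidate
            history.append(new_score)
        else:                                      # roll back
            history.append(history[-1])
    return playbook, history
```

The history is monotonically non-decreasing by construction, which is why the pass-over-pass scores above can only climb or plateau, never regress.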
09 Results

Customer Service Workflow

Customer: "Hi, I'm contacting you because package damaged on delivery."
No playbook
With ACE playbook
10

Most AI teams are stuck choosing between brittle prompts, fragile retrieval, or expensive fine-tuning.

SCX ACE creates a fourth path.

01

It learns from examples, writes down what it learned, tests itself against failures, and promotes only the rules that improve behavior.

02

It is not a replacement for frontier models. It is the operational memory those models are missing.

03

ACE makes models better by giving them a living, testable, explainable playbook.