Correction Engine
The Correction Engine is CertusOrdo's self-healing component. When the Decision Engine chooses ROLLBACK_AND_RETRY, the Correction Engine determines what to change before the retry attempt, so that a failure can be turned into a success without human intervention.
Overview
Decision: ROLLBACK_AND_RETRY → [CORRECTION ENGINE] → Correction Payload → Retry
                                       │
                                       ├── Anomaly Analysis
                                       ├── Strategy Selection
                                       ├── Payload Generation
                                       └── Feedback Loop
Key Insight: Most AI agent failures aren't random; they follow patterns. A timed-out operation often needs a longer timeout. A scope violation needs constrained permissions. The Correction Engine encodes this knowledge.
Why Corrections Matter
Traditional retry logic: "Try again and hope for the best."
CertusOrdo retry logic: "Try again with these specific adjustments."
Example:
Anomaly: Transaction value exceeds limit ($15,000 > $10,000)
Traditional: Retry same transaction → Same failure
CertusOrdo: Split into two $7,500 transactions → Success
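The split in the example above can be sketched in a few lines. This is a minimal illustration, not CertusOrdo's implementation: `split_amount` is a hypothetical helper, and it works in integer cents (an assumption) to avoid floating-point rounding.

```python
def split_amount(amount_cents: int, limit_cents: int) -> list[int]:
    """Split an amount into the fewest near-equal parts that each fit the limit."""
    if amount_cents <= 0 or limit_cents <= 0:
        raise ValueError("amount and limit must be positive")
    parts = -(-amount_cents // limit_cents)  # integer ceiling division
    base, remainder = divmod(amount_cents, parts)
    # Spread any leftover cents across the first `remainder` parts.
    return [base + 1 if i < remainder else base for i in range(parts)]


# $15,000 against a $10,000 limit, expressed in cents
print(split_amount(1_500_000, 1_000_000))  # [750000, 750000]
```

Using the fewest parts keeps the number of retried transactions small; an equal split mirrors the two $7,500 transactions in the example.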
Correction Strategies
The Correction Engine supports 20 distinct strategies:
Parameter Adjustments
| Strategy | When Used | Example |
|---|---|---|
| ADJUST_PARAMETER | Config values causing issues | Increase timeout from 30s to 60s |
| REDUCE_SCOPE | Too many operations at once | Process 100 records instead of 1000 |
| DECREASE_BATCH_SIZE | Batch processing overload | Reduce from 50 to 10 items |
| INCREASE_TIMEOUT | Operations timing out | Extend deadline |
| RATE_LIMIT_SELF | Agent moving too fast | Add 100ms delay between calls |
Content Modifications
| Strategy | When Used | Example |
|---|---|---|
| MODIFY_FIELD | Wrong value in specific field | Change currency from USD to EUR |
| ADD_CONTEXT | Missing necessary information | Include customer ID in request |
| REMOVE_AMBIGUITY | Unclear instructions | Specify exact output format |
| ENFORCE_FORMAT | Output schema issues | Add JSON schema constraint |
| CONSTRAIN_OUTPUT | Output too verbose/complex | Limit response to 500 tokens |
Behavioral Changes
| Strategy | When Used | Example |
|---|---|---|
| ADD_INSTRUCTION | Agent needs more guidance | "Verify before submitting" |
| SIMPLIFY_TASK | Task too complex | Break into 3 sequential steps |
| DECOMPOSE_TASK | Multi-part task failing | Execute parts independently |
| ADD_VALIDATION | Missing pre/post checks | Verify balance before transfer |
| REQUEST_CONFIRMATION | High-risk action flagged | Require explicit approval step |
Recovery Actions
| Strategy | When Used | Example |
|---|---|---|
| RETRY_AS_IS | Transient error | Network glitch, just retry |
| USE_FALLBACK | Primary approach failed | Switch to backup API endpoint |
| SWITCH_MODEL | Current model underperforming | Use GPT-4 instead of GPT-3.5 |
| CACHE_RESULT | Repeated expensive operations | Store and reuse intermediate results |
| ESCALATE_TO_HUMAN | Cannot auto-correct safely | Route to human operator |
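For client code that needs to reason about these groups, one possible representation is a plain mapping from group name to strategy names. The names come from the tables above; the `STRATEGY_GROUPS` constant and `group_of` helper are hypothetical and not part of any CertusOrdo SDK.

```python
STRATEGY_GROUPS: dict[str, tuple[str, ...]] = {
    "Parameter Adjustments": (
        "ADJUST_PARAMETER", "REDUCE_SCOPE", "DECREASE_BATCH_SIZE",
        "INCREASE_TIMEOUT", "RATE_LIMIT_SELF",
    ),
    "Content Modifications": (
        "MODIFY_FIELD", "ADD_CONTEXT", "REMOVE_AMBIGUITY",
        "ENFORCE_FORMAT", "CONSTRAIN_OUTPUT",
    ),
    "Behavioral Changes": (
        "ADD_INSTRUCTION", "SIMPLIFY_TASK", "DECOMPOSE_TASK",
        "ADD_VALIDATION", "REQUEST_CONFIRMATION",
    ),
    "Recovery Actions": (
        "RETRY_AS_IS", "USE_FALLBACK", "SWITCH_MODEL",
        "CACHE_RESULT", "ESCALATE_TO_HUMAN",
    ),
}


def group_of(strategy: str) -> str:
    """Return the group a strategy belongs to, or raise KeyError if unknown."""
    for group, names in STRATEGY_GROUPS.items():
        if strategy in names:
            return group
    raise KeyError(f"unknown strategy: {strategy}")


print(group_of("SWITCH_MODEL"))  # Recovery Actions
```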
API Reference
Generate Correction
Request Body:
{
  "decision_id": "uuid",
  "transaction_id": "uuid",
  "anomalies": [
    {
      "type": "value_bounds",
      "severity": "medium",
      "code": "VAL002",
      "message": "Transaction value $15,000 exceeds limit $10,000",
      "details": {
        "actual_value": 15000.00,
        "limit": 10000.00,
        "field": "value_usd"
      }
    }
  ],
  "retry_count": 0,
  "original_payload": {
    "action": "wire_transfer",
    "amount": 15000.00,
    "recipient": "account_xyz"
  }
}
Response:
{
  "correction_id": "uuid",
  "transaction_id": "uuid",
  "strategy": "DECOMPOSE_TASK",
  "confidence": 0.89,
  "corrections": [
    {
      "action": "SPLIT_TRANSACTION",
      "description": "Split single transaction into two within limits",
      "original_field": "amount",
      "original_value": 15000.00,
      "corrected_payloads": [
        {
          "action": "wire_transfer",
          "amount": 7500.00,
          "recipient": "account_xyz",
          "sequence": 1
        },
        {
          "action": "wire_transfer",
          "amount": 7500.00,
          "recipient": "account_xyz",
          "sequence": 2
        }
      ]
    }
  ],
  "reasoning": "Value exceeds single-transaction limit. Decomposing into two transactions of $7,500 each keeps both within bounds while completing the full transfer.",
  "estimated_success_probability": 0.94,
  "retry_delay_ms": 1000
}
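Before retrying, a caller may want to confirm that a split correction still moves the full amount and that every part respects the limit. The sketch below assumes the response shape shown above; `check_split` is a hypothetical client-side helper, not part of the API.

```python
from decimal import Decimal


def check_split(correction: dict, limit: Decimal) -> bool:
    """True if the split payloads sum to the original value and each fits the limit."""
    # Convert through str so float inputs such as 7500.0 become exact decimals.
    amounts = [Decimal(str(p["amount"])) for p in correction["corrected_payloads"]]
    original = Decimal(str(correction["original_value"]))
    return sum(amounts) == original and all(a <= limit for a in amounts)


# One entry of the "corrections" array from the example response, abbreviated
correction = {
    "original_value": 15000.00,
    "corrected_payloads": [
        {"amount": 7500.00, "sequence": 1},
        {"amount": 7500.00, "sequence": 2},
    ],
}
print(check_split(correction, Decimal("10000")))
```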
Preview Correction (Dry Run)
Same request body as /generate, but returns the correction without executing it. Useful for testing and debugging.
List Available Strategies
Response:
{
  "strategies": [
    {
      "name": "MODIFY_FIELD",
      "description": "Change a specific field value to correct an anomaly",
      "applicable_anomaly_types": ["value_bounds", "schema", "consistency"],
      "risk_level": "low",
      "requires_original_payload": true
    },
    {
      "name": "DECOMPOSE_TASK",
      "description": "Break a complex task into smaller sequential steps",
      "applicable_anomaly_types": ["value_bounds", "rate_limit", "scope"],
      "risk_level": "medium",
      "requires_original_payload": true
    }
    // ... 18 more strategies
  ]
}
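A client could use this listing to find which strategies apply to a given anomaly type. The following sketch filters the two entries shown above; `strategies_for` is a hypothetical helper, and the `"high"` risk level is an assumption, since the example shows only `low` and `medium`.

```python
# Assumed ordering of risk levels; "high" does not appear in the example response.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def strategies_for(anomaly_type: str, strategies: list[dict],
                   max_risk: str = "medium") -> list[str]:
    """Names of strategies that handle the anomaly type at or below max_risk."""
    ceiling = RISK_ORDER[max_risk]
    return [
        s["name"]
        for s in strategies
        if anomaly_type in s["applicable_anomaly_types"]
        and RISK_ORDER[s["risk_level"]] <= ceiling
    ]


catalog = [
    {"name": "MODIFY_FIELD",
     "applicable_anomaly_types": ["value_bounds", "schema", "consistency"],
     "risk_level": "low"},
    {"name": "DECOMPOSE_TASK",
     "applicable_anomaly_types": ["value_bounds", "rate_limit", "scope"],
     "risk_level": "medium"},
]
print(strategies_for("value_bounds", catalog))
```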
Submit Feedback
Request Body:
{
  "correction_id": "uuid",
  "outcome": "success",
  "retry_count": 1,
  "final_confidence": 0.96,
  "notes": "Split transaction strategy worked on first retry"
}
Feedback improves future correction selection.
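Because feedback drives later selection, it can help to validate the body before sending it. The sketch below builds the request body shown above. The `Feedback` class is hypothetical, and its checks are assumptions: the only outcomes seen in this document are `success` and `failure`, and confidence is treated as a value between 0 and 1.

```python
from dataclasses import asdict, dataclass

VALID_OUTCOMES = {"success", "failure"}  # assumed to be the full set


@dataclass
class Feedback:
    correction_id: str
    outcome: str
    retry_count: int
    final_confidence: float
    notes: str = ""

    def __post_init__(self) -> None:
        if self.outcome not in VALID_OUTCOMES:
            raise ValueError(f"outcome must be one of {sorted(VALID_OUTCOMES)}")
        if self.retry_count < 0:
            raise ValueError("retry_count cannot be negative")
        if not 0.0 <= self.final_confidence <= 1.0:
            raise ValueError("final_confidence must be between 0 and 1")

    def to_body(self) -> dict:
        """Return a dict with the same keys as the request body above."""
        return asdict(self)


body = Feedback("uuid", "success", 1, 0.96,
                "Split transaction strategy worked on first retry").to_body()
print(body)
```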
Strategy Selection Algorithm
The Correction Engine selects strategies based on anomaly type and context:
def select_strategy(anomalies, context):
    # Priority 1: Direct match
    for anomaly in anomalies:
        if template := get_template(anomaly.type, context.org_id):
            return template.strategy

    # Priority 2: Severity-based defaults
    if any(a.severity == "critical" for a in anomalies):
        return "ESCALATE_TO_HUMAN"

    # Priority 3: Anomaly type mapping
    strategy_map = {
        "value_bounds": "MODIFY_FIELD",
        "rate_limit": "RATE_LIMIT_SELF",
        "timeout": "INCREASE_TIMEOUT",
        "scope": "REDUCE_SCOPE",
        "schema": "ENFORCE_FORMAT",
        "behavioral": "ADD_INSTRUCTION",
        "content_quality": "CONSTRAIN_OUTPUT",
    }
    primary_anomaly = max(anomalies, key=lambda a: a.severity_weight)
    return strategy_map.get(primary_anomaly.type, "RETRY_AS_IS")
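To try `select_strategy` locally, the names it depends on need stand-ins. The harness below is a sketch under stated assumptions: the `Anomaly`, `Context`, and `Template` classes, the numeric severity scale, and the in-memory `TEMPLATES` store are all hypothetical; only the selection logic itself is taken from the listing above.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed numeric scale; the document names only "medium" and "critical".
SEVERITY_WEIGHTS = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class Anomaly:
    type: str
    severity: str

    @property
    def severity_weight(self) -> int:
        return SEVERITY_WEIGHTS[self.severity]


@dataclass
class Context:
    org_id: str


@dataclass
class Template:
    strategy: str


# Hypothetical template store keyed by (anomaly type, org id).
TEMPLATES = {("value_bounds", "org-1"): Template("DECOMPOSE_TASK")}

STRATEGY_MAP = {
    "value_bounds": "MODIFY_FIELD",
    "rate_limit": "RATE_LIMIT_SELF",
    "timeout": "INCREASE_TIMEOUT",
    "scope": "REDUCE_SCOPE",
    "schema": "ENFORCE_FORMAT",
    "behavioral": "ADD_INSTRUCTION",
    "content_quality": "CONSTRAIN_OUTPUT",
}


def get_template(anomaly_type: str, org_id: str) -> Optional[Template]:
    return TEMPLATES.get((anomaly_type, org_id))


def select_strategy(anomalies: list[Anomaly], context: Context) -> str:
    # Priority 1: an org-specific template for any reported anomaly
    for anomaly in anomalies:
        if template := get_template(anomaly.type, context.org_id):
            return template.strategy
    # Priority 2: critical anomalies go to a human
    if any(a.severity == "critical" for a in anomalies):
        return "ESCALATE_TO_HUMAN"
    # Priority 3: default mapping for the most severe anomaly
    primary = max(anomalies, key=lambda a: a.severity_weight)
    return STRATEGY_MAP.get(primary.type, "RETRY_AS_IS")


mixed = [Anomaly("timeout", "low"), Anomaly("scope", "medium")]
print(select_strategy(mixed, Context("org-2")))  # REDUCE_SCOPE
```

With this setup, the same `value_bounds` anomaly yields DECOMPOSE_TASK for `org-1` (template match) and MODIFY_FIELD for an organization without a template, which is consistent with the Generate Correction example returning DECOMPOSE_TASK.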
Correction Templates
Organizations can define custom correction templates:
correction_template = {
    "name": "payment_limit_exceeded",
    "description": "Handle transactions exceeding single-payment limits",

    # When this template applies
    "trigger": {
        "anomaly_type": "value_bounds",
        "anomaly_code": "VAL002",
        "context_match": {
            "action_type": ["wire_transfer", "ach_payment"]
        }
    },

    # What correction to apply
    "strategy": "DECOMPOSE_TASK",
    "parameters": {
        "split_method": "equal",
        "max_per_transaction": 10000.00,
        "delay_between_ms": 5000
    },

    # Metadata
    "success_rate": 0.92,
    "avg_retries": 1.1,
    "last_updated": "2026-01-15"
}
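How a trigger is evaluated is not specified here. One plausible reading, sketched below, is that a template applies when the anomaly type and code match and every `context_match` key holds one of the listed values. `template_matches` is a hypothetical function, and treating `anomaly_code` and `context_match` as optional is an assumption.

```python
def template_matches(template: dict, anomaly: dict, context: dict) -> bool:
    """Check whether a template's trigger applies to an anomaly in a given context."""
    trigger = template["trigger"]
    if anomaly.get("type") != trigger["anomaly_type"]:
        return False
    # Assumption: a trigger without anomaly_code matches any code of that type.
    code = trigger.get("anomaly_code")
    if code is not None and anomaly.get("code") != code:
        return False
    # Every context_match key must hold one of the allowed values.
    for key, allowed in trigger.get("context_match", {}).items():
        if context.get(key) not in allowed:
            return False
    return True


template = {
    "trigger": {
        "anomaly_type": "value_bounds",
        "anomaly_code": "VAL002",
        "context_match": {"action_type": ["wire_transfer", "ach_payment"]},
    },
    "strategy": "DECOMPOSE_TASK",
}
anomaly = {"type": "value_bounds", "code": "VAL002"}
print(template_matches(template, anomaly, {"action_type": "wire_transfer"}))
```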
Retry Logic
The Correction Engine manages retry attempts with exponential backoff:
Retry 1: Apply correction, wait 1 second
↓ (if still fails)
Retry 2: Apply enhanced correction, wait 2 seconds
↓ (if still fails)
Retry 3: Apply aggressive correction, wait 4 seconds
↓ (if still fails)
Escalate to human or terminate
Correction Escalation:
| Retry | Correction Approach |
|---|---|
| 1 | Minimal adjustment (same strategy) |
| 2 | Enhanced adjustment (stronger parameters) |
| 3 | Alternative strategy |
| 4+ | Human escalation |
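The schedule and escalation table above can be expressed as two small functions. This is an illustrative sketch: the function names are hypothetical, and the one-second base delay and doubling factor are read off the 1 s, 2 s, 4 s sequence shown.

```python
def retry_delay_seconds(retry: int, base_seconds: float = 1.0) -> float:
    """Exponential backoff: the delay doubles with each retry, starting at base_seconds."""
    if retry < 1:
        raise ValueError("retry numbers start at 1")
    return base_seconds * 2 ** (retry - 1)


def correction_approach(retry: int) -> str:
    """Map a retry number to the approach listed in the escalation table."""
    if retry < 1:
        raise ValueError("retry numbers start at 1")
    approaches = {
        1: "minimal adjustment",
        2: "enhanced adjustment",
        3: "alternative strategy",
    }
    return approaches.get(retry, "human escalation")


for n in (1, 2, 3):
    print(n, retry_delay_seconds(n), correction_approach(n))
```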
Integration with Decision Engine
The Correction Engine is invoked when the Decision Engine returns ROLLBACK_AND_RETRY:
async def handle_rollback_and_retry(decision, transaction):
    # Step 1: Rollback the transaction
    await transaction.rollback()

    # Step 2: Generate correction
    correction = await correction_engine.generate(
        decision_id=decision.id,
        transaction_id=transaction.id,
        anomalies=decision.anomalies,
        retry_count=decision.retry_count,
        original_payload=transaction.payload
    )

    # Step 3: Apply correction to payload
    corrected_payload = apply_correction(
        original=transaction.payload,
        correction=correction
    )

    # Step 4: Retry with corrected payload
    retry_result = await transaction.retry(
        payload=corrected_payload,
        delay_ms=correction.retry_delay_ms
    )

    # Step 5: Submit feedback for learning
    await correction_engine.feedback(
        correction_id=correction.id,
        outcome="success" if retry_result.success else "failure",
        final_confidence=retry_result.confidence
    )

    return retry_result
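The handler calls `apply_correction` without defining it. One possible shape is sketched below, assuming the correction is the parsed JSON response from the Generate Correction example. Unlike the handler, which treats the result as a single payload, this hypothetical version returns a list, because a split correction produces several payloads to submit in sequence.

```python
def apply_correction(original: dict, correction: dict) -> list[dict]:
    """Return the payloads to retry, ordered by their sequence number."""
    payloads: list[dict] = []
    for item in correction.get("corrections", []):
        payloads.extend(item.get("corrected_payloads", []))
    if not payloads:
        # Nothing to change (for example RETRY_AS_IS): retry a copy of the original.
        return [dict(original)]
    return sorted(payloads, key=lambda p: p.get("sequence", 0))


original = {"action": "wire_transfer", "amount": 15000.00, "recipient": "account_xyz"}
correction = {
    "strategy": "DECOMPOSE_TASK",
    "corrections": [{
        "action": "SPLIT_TRANSACTION",
        "corrected_payloads": [
            {"action": "wire_transfer", "amount": 7500.00,
             "recipient": "account_xyz", "sequence": 2},
            {"action": "wire_transfer", "amount": 7500.00,
             "recipient": "account_xyz", "sequence": 1},
        ],
    }],
}
for payload in apply_correction(original, correction):
    print(payload["sequence"], payload["amount"])
```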
Success Metrics
Track correction effectiveness:
| Metric | Target | Current |
|---|---|---|
| First-retry success rate | > 70% | 74% |
| Overall correction success | > 90% | 91% |
| Average retries to success | < 2.0 | 1.4 |
| Human escalation rate | < 10% | 7% |
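These metrics could be derived from feedback records along the following lines. The definitions are assumptions, since the document does not state them: every rate is taken over all corrections, "first-retry success" means a success with `retry_count == 1`, and the `escalated` flag is a hypothetical field that the feedback body above does not include.

```python
from statistics import mean


def correction_metrics(records: list[dict]) -> dict:
    """Compute the four tracked metrics from a list of feedback-like records."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to summarize")
    successes = [r for r in records if r["outcome"] == "success"]
    return {
        "first_retry_success_rate":
            sum(r["retry_count"] == 1 for r in successes) / total,
        "overall_success_rate": len(successes) / total,
        "avg_retries_to_success":
            mean(r["retry_count"] for r in successes) if successes else None,
        "human_escalation_rate":
            sum(bool(r.get("escalated")) for r in records) / total,
    }


records = [
    {"outcome": "success", "retry_count": 1},
    {"outcome": "success", "retry_count": 2},
    {"outcome": "failure", "retry_count": 3, "escalated": True},
    {"outcome": "success", "retry_count": 1},
]
print(correction_metrics(records))
```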
Design Principles
- Deterministic — Same anomaly patterns produce same corrections
- Conservative — Start with minimal changes, escalate if needed
- Traceable — Every correction is logged for learning
- Configurable — Templates allow org-specific corrections
- Safe — Never make corrections that could cause harm
Failure Modes
When corrections can't be generated safely:
| Scenario | Response |
|---|---|
| Unknown anomaly type | Return RETRY_AS_IS with low confidence |
| Critical severity | Return ESCALATE_TO_HUMAN |
| No applicable template | Use default strategy mapping |
| Original payload missing | Return error, require payload |
| Max retries exceeded | Return ESCALATE_TO_HUMAN |
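The table can be read as a set of guard checks that run before normal strategy selection. The sketch below is one interpretation, not the engine's actual code: the order of the checks, the `LOW_CONFIDENCE` value, and the limit of three automatic retries (taken from the escalation table, where retry 4+ goes to a human) are all assumptions.

```python
from typing import Optional

KNOWN_ANOMALY_TYPES = {
    "value_bounds", "rate_limit", "timeout", "scope",
    "schema", "behavioral", "content_quality",
}
MAX_RETRIES = 3        # assumed from the escalation table
LOW_CONFIDENCE = 0.2   # placeholder; the document gives no number


def failure_mode_response(anomalies: list[dict], original_payload: Optional[dict],
                          retry_count: int) -> Optional[dict]:
    """Return a fallback response, or None if normal selection should proceed."""
    if original_payload is None:
        raise ValueError("original_payload is required")
    if retry_count >= MAX_RETRIES:
        return {"strategy": "ESCALATE_TO_HUMAN"}
    if any(a.get("severity") == "critical" for a in anomalies):
        return {"strategy": "ESCALATE_TO_HUMAN"}
    if not any(a.get("type") in KNOWN_ANOMALY_TYPES for a in anomalies):
        return {"strategy": "RETRY_AS_IS", "confidence": LOW_CONFIDENCE}
    return None  # fall through to templates and the default strategy mapping


print(failure_mode_response([{"type": "mystery", "severity": "low"}], {"a": 1}, 0))
```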
Next Steps
When decisions require human notification, the Notification Engine handles multi-channel delivery with escalation chains.