Transactional Outbox Pattern¶
Guarantee atomicity between database transactions and event publishing with Curve's transactional outbox pattern.
The Problem¶
Without a transactional outbox, the Kafka publish happens outside the database transaction's guarantees:

```java
@Transactional
public Order createOrder(OrderRequest request) {
    // 1. Save to database
    Order order = orderRepository.save(new Order(request));

    // 2. Publish to Kafka
    kafkaProducer.send("orders", orderEvent); // ❌ What if this fails?

    return order;
}
```
Issues:
- Order saved but event not published → Lost event
- Event published but transaction rolled back → Ghost event
- No atomicity between DB and Kafka
The Solution¶
The transactional outbox saves events to the database in the same transaction as the business entity:

```java
@Transactional
@PublishEvent(
    eventType = "ORDER_CREATED",
    outbox = true,              // ✓ Enable outbox
    aggregateType = "Order",    // Entity type
    aggregateId = "#result.id"  // Entity ID
)
public Order createOrder(OrderRequest request) {
    return orderRepository.save(new Order(request));
}
```
How It Works¶
```mermaid
sequenceDiagram
    participant App as Application
    participant DB as Database (Outbox)
    participant Kafka as Kafka Broker
    participant Poller as Outbox Publisher

    Note over App, DB: Transaction Start
    App->>DB: 1. Save Business Entity (Order)
    App->>DB: 2. Save Outbox Event (PENDING)
    DB-->>App: TX Committed ✓

    loop Every 1s (Polling)
        Poller->>DB: 3. Fetch PENDING Events
        Poller->>Kafka: 4. Publish to Kafka
        alt Success
            Kafka-->>Poller: Ack
            Poller->>DB: 5. Update Status to SENT
        else Failure
            Poller->>DB: 5. Increment Retry Count
        end
    end
```
Steps:
1. Save entity and event in the same transaction: atomicity is guaranteed
2. Background poller: periodically checks for unsent events
3. Publish to Kafka: events are sent asynchronously
4. Mark as sent: successful events are marked SENT
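The flow above can be sketched with a simplified in-memory model. This is an illustration only, not Curve's implementation; all class and field names here are invented, with a `List` standing in for both the outbox table and the Kafka broker:

```java
import java.util.ArrayList;
import java.util.List;

public class OutboxPollerSketch {
    // Status values mirror the outbox table schema: PENDING, SENT, FAILED.
    enum Status { PENDING, SENT, FAILED }

    static class OutboxEvent {
        final String eventType;
        final String payload;
        Status status = Status.PENDING;
        int retryCount = 0;

        OutboxEvent(String eventType, String payload) {
            this.eventType = eventType;
            this.payload = payload;
        }
    }

    final List<OutboxEvent> table = new ArrayList<>();  // stands in for the DB table
    final List<String> broker = new ArrayList<>();      // stands in for Kafka

    // "Transaction": the business entity and the outbox row are written together,
    // so either both exist or neither does.
    void createOrder(String orderJson) {
        table.add(new OutboxEvent("ORDER_CREATED", orderJson));
    }

    // One poll cycle: fetch PENDING events, publish each, mark it SENT on success
    // or count a retry on failure.
    int pollOnce(int batchSize, boolean brokerUp) {
        int published = 0;
        for (OutboxEvent e : table) {
            if (e.status != Status.PENDING || published >= batchSize) continue;
            if (brokerUp) {
                broker.add(e.payload);
                e.status = Status.SENT;
                published++;
            } else {
                e.retryCount++;
            }
        }
        return published;
    }
}
```

Note that a broker outage only increments the retry count; the event stays PENDING and is picked up again on the next cycle, which is what makes the pattern lossless.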
Configuration¶
Enable Outbox¶
```yaml
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/mydb
    username: user
    password: pass
  jpa:
    hibernate:
      ddl-auto: update  # Or use Flyway/Liquibase

curve:
  outbox:
    enabled: true
    poll-interval-ms: 1000      # Poll every 1 second
    batch-size: 100             # Process 100 events per batch
    max-retries: 3              # Retry failed events 3 times
    cleanup-enabled: true       # Auto-cleanup old events
    retention-days: 7           # Keep events for 7 days
    cleanup-cron: "0 0 2 * * *" # Cleanup at 2 AM daily
```
Database Schema¶
Curve auto-creates the outbox table:
```sql
CREATE TABLE curve_outbox_events (
    id             BIGINT PRIMARY KEY,
    event_id       VARCHAR(255) NOT NULL,
    event_type     VARCHAR(255) NOT NULL,
    aggregate_type VARCHAR(255) NOT NULL,
    aggregate_id   VARCHAR(255) NOT NULL,
    payload        TEXT NOT NULL,
    metadata       TEXT,
    status         VARCHAR(50) NOT NULL, -- PENDING, SENT, FAILED
    retry_count    INT DEFAULT 0,
    last_retry_at  TIMESTAMP,
    created_at     TIMESTAMP NOT NULL,
    sent_at        TIMESTAMP,
    error_message  TEXT
);

CREATE INDEX idx_status ON curve_outbox_events(status);
CREATE INDEX idx_created_at ON curve_outbox_events(created_at);
```
Usage Examples¶
Basic Outbox¶
```java
@Transactional
@PublishEvent(
    eventType = "ORDER_CREATED",
    outbox = true,
    aggregateType = "Order",
    aggregateId = "#result.id"
)
public Order createOrder(OrderRequest request) {
    return orderRepository.save(new Order(request));
}
```
With Custom Payload¶
```java
@Transactional
@PublishEvent(
    eventType = "USER_REGISTERED",
    outbox = true,
    aggregateType = "User",
    aggregateId = "#result.userId",
    payload = "#result.toRegisteredPayload()"
)
public User registerUser(UserRequest request) {
    User user = userRepository.save(new User(request));

    // Other operations in the same transaction
    auditRepository.save(new AuditLog("User registered"));

    return user;
}
```
Complex Transaction¶
```java
@Transactional
@PublishEvent(
    eventType = "ORDER_COMPLETED",
    outbox = true,
    aggregateType = "Order",
    aggregateId = "#result.id"
)
public Order completeOrder(Long orderId) {
    // 1. Update order
    Order order = orderRepository.findById(orderId)
        .orElseThrow();
    order.setStatus(OrderStatus.COMPLETED);
    order.setCompletedAt(Instant.now());

    // 2. Update inventory
    inventoryService.decrementStock(order.getItems());

    // 3. Create invoice
    Invoice invoice = invoiceService.create(order);

    // All operations atomic - event published after TX commits
    return orderRepository.save(order);
}
```
Advanced Features¶
Exponential Backoff¶
Failed events are retried with exponential backoff:

- Retry 1: immediate
- Retry 2: 2 seconds later
- Retry 3: 4 seconds later (2s × 2)
- Retry 4: 8 seconds later (4s × 2)
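The schedule above (immediate first attempt, then doubling from 2 seconds) can be expressed as a small helper. This is a sketch of the arithmetic only; `delaySeconds` is an invented name, not Curve's API:

```java
public class BackoffDemo {
    // Delay before the n-th retry: retry 1 is immediate, then each delay
    // doubles (2s, 4s, 8s, ...). Assumption: the doubling continues until
    // max-retries is reached.
    static long delaySeconds(int retry) {
        return retry <= 1 ? 0 : 1L << (retry - 1);
    }

    public static void main(String[] args) {
        for (int r = 1; r <= 4; r++) {
            System.out.println("Retry " + r + ": " + delaySeconds(r) + "s");
        }
    }
}
```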
SKIP LOCKED for Multi-Instance¶
Prevents duplicate processing in multi-instance deployments:
```sql
SELECT * FROM curve_outbox_events
WHERE status = 'PENDING'
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED; -- Rows locked by another instance are skipped
```
Automatic Cleanup¶
Old events are automatically cleaned up:
```yaml
curve:
  outbox:
    cleanup-enabled: true
    retention-days: 7           # Delete events older than 7 days
    cleanup-cron: "0 0 2 * * *" # Daily at 2 AM
```
Monitoring¶
Health Check¶
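Assuming the standard Spring Boot Actuator setup (the properties below are stock Actuator configuration, not Curve-specific), the health endpoint can be exposed alongside the outbox endpoint:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,curve-outbox
  endpoint:
    health:
      show-details: always
```

`GET /actuator/health` then reports overall application health; whether Curve contributes an outbox-specific health indicator depends on your Curve version.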
Metrics¶
```json
{
  "outbox": {
    "pendingEvents": 5,
    "sentEvents": 1518,
    "failedEvents": 2,
    "avgProcessingTimeMs": 45
  }
}
```
Query Outbox Table¶
```sql
-- Pending events
SELECT * FROM curve_outbox_events
WHERE status = 'PENDING'
ORDER BY created_at DESC;

-- Failed events
SELECT event_type, error_message, retry_count
FROM curve_outbox_events
WHERE status = 'FAILED'
ORDER BY created_at DESC;

-- Event throughput
SELECT DATE(created_at) AS date, COUNT(*)
FROM curve_outbox_events
WHERE status = 'SENT'
GROUP BY DATE(created_at)
ORDER BY date DESC;
```
Outbox Replay API¶
Replay previously published events from the outbox. This is useful for:
- Consumer failure recovery: Reprocess events after consumer downtime
- Data corrections: Re-publish events to fix downstream state
- Testing: Replay historical events in test environments
Endpoints¶
- `GET /actuator/curve-outbox` - View outbox statistics
- `POST /actuator/curve-outbox` - Replay events since a timestamp
Enable the Endpoint¶
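Assuming the endpoint follows standard Spring Boot Actuator exposure rules (this is stock Actuator configuration, not Curve-specific), add it to the exposed endpoints:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: curve-outbox
```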
GET /actuator/curve-outbox¶
View outbox statistics. The response has the same shape as the metrics payload shown under Monitoring above.
POST /actuator/curve-outbox¶
Replay events since a given timestamp with optional limit:
```bash
curl -X POST http://localhost:8080/actuator/curve-outbox \
  -H "Content-Type: application/vnd.spring-boot.actuator.v3+json" \
  -d '{"since": "2026-03-01T00:00:00Z", "limit": 100}'
```
Request Body:
```json
{
  "since": "2026-03-01T00:00:00Z",  // ISO-8601 timestamp (required)
  "limit": 100                      // Max events to replay (optional, default: 1000)
}
```
Response:
```json
{
  "since": "2026-03-01T00:00:00Z",
  "limit": 100,
  "total": 42,    // Events found since timestamp
  "success": 40,  // Successfully replayed
  "failed": 2,    // Failed during replay
  "failedEventIds": ["evt-001", "evt-002"]
}
```
Idempotency¶
Since outbox events can be replayed multiple times, consumer implementations must be idempotent:
```java
@Service
public class OrderEventConsumer {

    @KafkaListener(topics = "orders")
    public void handleOrderCreated(String message) {
        OrderCreatedEvent event = parse(message);

        // Idempotency check: use the event ID as a unique key
        if (orderRepository.existsByEventId(event.eventId)) {
            log.info("Event {} already processed, skipping", event.eventId);
            return; // Safely skip duplicate
        }

        // Process event
        Order order = new Order(event);
        orderRepository.save(order);
    }
}
```
Replay Scenarios¶
Scenario 1: Recover from consumer downtime
```bash
# Consumer was down from 10:00 to 10:30
# Replay events from that period
curl -X POST http://localhost:8080/actuator/curve-outbox \
  -H "Content-Type: application/vnd.spring-boot.actuator.v3+json" \
  -d '{
    "since": "2026-03-04T10:00:00Z",
    "limit": 500
  }'
```
Scenario 2: Fix consumer bug
```bash
# Consumer had a bug for 1 hour
# Re-publish all events from that period
curl -X POST http://localhost:8080/actuator/curve-outbox \
  -H "Content-Type: application/vnd.spring-boot.actuator.v3+json" \
  -d '{
    "since": "2026-03-04T09:00:00Z",
    "limit": 5000
  }'
```
Important Notes¶
- Already-published events can be replayed - The replay API will re-publish events regardless of their current status
- Consumers must handle duplicates - Implement idempotent processing using event IDs
- Throughput considerations - Large replays may impact Kafka performance; adjust batch size as needed
- Default topic respected - Replayed events use the same topic as the original event
Best Practices¶
DO¶
- Use outbox for critical events - Order creation, payments, etc.
- Monitor pending events - Alert if queue grows
- Set appropriate retention - Balance storage and auditability
- Use database indexes - Optimize poller queries
- Test failure scenarios - Ensure recovery works
DON'T¶
- Use outbox for high-volume events (>10,000/sec)
- Set poll interval too low (<500ms)
- Disable cleanup in production
- Ignore failed events
Performance Considerations¶
| Factor | Impact | Recommendation |
|---|---|---|
| Poll Interval | Lower = faster, higher DB load | 1000ms for most cases |
| Batch Size | Larger = more throughput | 100-500 events |
| Retention | Longer = more storage | 7-30 days |
| Indexes | Essential for performance | On status, created_at |
Throughput Estimates¶
| Configuration | Throughput |
|---|---|
| Poll: 1s, Batch: 100 | ~100 events/sec |
| Poll: 500ms, Batch: 200 | ~400 events/sec |
| Poll: 1s, Batch: 500 | ~500 events/sec |
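The estimates above follow from dividing batch size by poll interval (an upper bound, assuming each cycle fills its batch); the arithmetic can be checked with a quick calculation (names here are illustrative):

```java
public class ThroughputEstimate {
    // Upper-bound poller throughput: at most batchSize events per poll cycle.
    static double eventsPerSecond(int batchSize, long pollIntervalMs) {
        return batchSize * 1000.0 / pollIntervalMs;
    }

    public static void main(String[] args) {
        System.out.println(eventsPerSecond(100, 1000)); // Poll: 1s, Batch: 100
        System.out.println(eventsPerSecond(200, 500));  // Poll: 500ms, Batch: 200
        System.out.println(eventsPerSecond(500, 1000)); // Poll: 1s, Batch: 500
    }
}
```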
High Throughput
For >1,000 events/sec, use asynchronous publishing without the outbox (see the Outbox vs Async comparison below).
Troubleshooting¶
Events Stuck in PENDING¶
Symptom: events are not being published.
Check:
- The outbox poller is running (enable debug logging with `logging.level.io.github.closeup1202.curve.outbox=DEBUG`)
- Kafka is accessible
- The DB connection pool is not exhausted
High Failed Event Count¶
Symptom: many events are in FAILED status.
Solutions:
- Check Kafka connectivity
- Increase `max-retries`
- Analyze the `error_message` column
- Manually republish the failed events
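One way to manually republish is to reset FAILED rows to PENDING so the poller picks them up on the next cycle. This is a sketch based on the schema shown above; verify it against your actual table before running in production:

```sql
-- Reset failed events for another round of retries
UPDATE curve_outbox_events
SET status = 'PENDING', retry_count = 0, error_message = NULL
WHERE status = 'FAILED';
```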
Comparison: Outbox vs Async¶
| Feature | Outbox | Async |
|---|---|---|
| Atomicity | ✅ Guaranteed | ❌ Best-effort |
| Throughput | ~1,000 TPS | ~10,000+ TPS |
| Latency | ~1-2 seconds | <100ms |
| Storage | DB required | No extra storage |
| Complexity | Medium | Low |
Use outbox when:
- Atomicity is critical (payments, orders)
- Events must not be lost
- Moderate throughput (<1,000 TPS)
Use async when:
- High throughput needed
- Best-effort delivery acceptable
- Low latency required
What's Next?¶
- Observability: Monitor your events
- Configuration: Advanced configuration options