Transactional Outbox Pattern¶
Guarantee atomicity between database transactions and event publishing with Curve's transactional outbox pattern.
The Problem¶
Without a transactional outbox, the Kafka publish happens outside the database transaction's guarantees:

```java
@Transactional
public Order createOrder(OrderRequest request) {
    // 1. Save to database
    Order order = orderRepository.save(new Order(request));

    // 2. Publish to Kafka
    kafkaProducer.send("orders", orderEvent); // ❌ What if this fails?

    return order;
}
```
Issues:
- Order saved but event not published → Lost event
- Event published but transaction rolled back → Ghost event
- No atomicity between DB and Kafka
The Solution¶
The transactional outbox saves events to the database in the same transaction as the business entity:

```java
@Transactional
@PublishEvent(
    eventType = "ORDER_CREATED",
    outbox = true,              // ✓ Enable outbox
    aggregateType = "Order",    // Entity type
    aggregateId = "#result.id"  // Entity ID
)
public Order createOrder(OrderRequest request) {
    return orderRepository.save(new Order(request));
}
```
How It Works¶
```mermaid
sequenceDiagram
    participant App as Application
    participant DB as Database (Outbox)
    participant Kafka as Kafka Broker
    participant Poller as Outbox Publisher

    Note over App, DB: Transaction Start
    App->>DB: 1. Save Business Entity (Order)
    App->>DB: 2. Save Outbox Event (PENDING)
    DB-->>App: TX Committed ✓

    loop Every 1s (Polling)
        Poller->>DB: 3. Fetch PENDING Events
        Poller->>Kafka: 4. Publish to Kafka
        alt Success
            Kafka-->>Poller: Ack
            Poller->>DB: 5. Update Status to SENT
        else Failure
            Poller->>DB: 5. Increment Retry Count
        end
    end
```
Steps:
1. Save entity and event in the same transaction: atomicity is guaranteed
2. Background poller: periodically checks for unsent events
3. Publish to Kafka: events are sent asynchronously
4. Mark as sent: successful events are marked SENT
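The flow above can be sketched with a simplified in-memory model. This is an illustration only, not Curve's implementation; all class and field names here are invented, with a `List` standing in for both the outbox table and the Kafka broker:

```java
import java.util.ArrayList;
import java.util.List;

public class OutboxPollerSketch {
    // Status values mirror the outbox table schema: PENDING, SENT, FAILED.
    enum Status { PENDING, SENT, FAILED }

    static class OutboxEvent {
        final String eventType;
        final String payload;
        Status status = Status.PENDING;
        int retryCount = 0;

        OutboxEvent(String eventType, String payload) {
            this.eventType = eventType;
            this.payload = payload;
        }
    }

    final List<OutboxEvent> table = new ArrayList<>();  // stands in for the DB table
    final List<String> broker = new ArrayList<>();      // stands in for Kafka

    // "Transaction": the business entity and the outbox row are written together,
    // so either both exist or neither does.
    void createOrder(String orderJson) {
        table.add(new OutboxEvent("ORDER_CREATED", orderJson));
    }

    // One poll cycle: fetch PENDING events, publish each, mark it SENT on success
    // or count a retry on failure.
    int pollOnce(int batchSize, boolean brokerUp) {
        int published = 0;
        for (OutboxEvent e : table) {
            if (e.status != Status.PENDING || published >= batchSize) continue;
            if (brokerUp) {
                broker.add(e.payload);
                e.status = Status.SENT;
                published++;
            } else {
                e.retryCount++;
            }
        }
        return published;
    }
}
```

Note that a broker outage only increments the retry count; the event stays PENDING and is picked up again on the next cycle, which is what makes the pattern lossless.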
Configuration¶
Enable Outbox¶
```yaml
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/mydb
    username: user
    password: pass
  jpa:
    hibernate:
      ddl-auto: update  # Or use Flyway/Liquibase

curve:
  outbox:
    enabled: true
    poll-interval-ms: 1000      # Poll every 1 second
    batch-size: 100             # Process 100 events per batch
    max-retries: 3              # Retry failed events 3 times
    cleanup-enabled: true       # Auto-cleanup old events
    retention-days: 7           # Keep events for 7 days
    cleanup-cron: "0 0 2 * * *" # Cleanup at 2 AM daily
```
Database Schema¶
Curve auto-creates the outbox table:
```sql
CREATE TABLE curve_outbox_events (
    id             BIGINT PRIMARY KEY,
    event_id       VARCHAR(255) NOT NULL,
    event_type     VARCHAR(255) NOT NULL,
    aggregate_type VARCHAR(255) NOT NULL,
    aggregate_id   VARCHAR(255) NOT NULL,
    payload        TEXT NOT NULL,
    metadata       TEXT,
    status         VARCHAR(50) NOT NULL, -- PENDING, SENT, FAILED
    retry_count    INT DEFAULT 0,
    last_retry_at  TIMESTAMP,
    created_at     TIMESTAMP NOT NULL,
    sent_at        TIMESTAMP,
    error_message  TEXT
);

CREATE INDEX idx_status ON curve_outbox_events(status);
CREATE INDEX idx_created_at ON curve_outbox_events(created_at);
```
Usage Examples¶
Basic Outbox¶
```java
@Transactional
@PublishEvent(
    eventType = "ORDER_CREATED",
    outbox = true,
    aggregateType = "Order",
    aggregateId = "#result.id"
)
public Order createOrder(OrderRequest request) {
    return orderRepository.save(new Order(request));
}
```
With Custom Payload¶
```java
@Transactional
@PublishEvent(
    eventType = "USER_REGISTERED",
    outbox = true,
    aggregateType = "User",
    aggregateId = "#result.userId",
    payload = "#result.toRegisteredPayload()"
)
public User registerUser(UserRequest request) {
    User user = userRepository.save(new User(request));

    // Other operations in the same transaction
    auditRepository.save(new AuditLog("User registered"));

    return user;
}
```
Complex Transaction¶
```java
@Transactional
@PublishEvent(
    eventType = "ORDER_COMPLETED",
    outbox = true,
    aggregateType = "Order",
    aggregateId = "#result.id"
)
public Order completeOrder(Long orderId) {
    // 1. Update order
    Order order = orderRepository.findById(orderId)
        .orElseThrow();
    order.setStatus(OrderStatus.COMPLETED);
    order.setCompletedAt(Instant.now());

    // 2. Update inventory
    inventoryService.decrementStock(order.getItems());

    // 3. Create invoice
    Invoice invoice = invoiceService.create(order);

    // All operations atomic - event published after TX commits
    return orderRepository.save(order);
}
```
Advanced Features¶
Exponential Backoff¶
Failed events are retried with exponential backoff:

- Retry 1: immediate
- Retry 2: 2 seconds later
- Retry 3: 4 seconds later (2s × 2)
- Retry 4: 8 seconds later (4s × 2)
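The schedule above (immediate first attempt, then doubling from 2 seconds) can be expressed as a small helper. This is a sketch of the arithmetic only; `delaySeconds` is an invented name, not Curve's API:

```java
public class BackoffDemo {
    // Delay before the n-th retry: retry 1 is immediate, then each delay
    // doubles (2s, 4s, 8s, ...). Assumption: the doubling continues until
    // max-retries is reached.
    static long delaySeconds(int retry) {
        return retry <= 1 ? 0 : 1L << (retry - 1);
    }

    public static void main(String[] args) {
        for (int r = 1; r <= 4; r++) {
            System.out.println("Retry " + r + ": " + delaySeconds(r) + "s");
        }
    }
}
```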
SKIP LOCKED for Multi-Instance¶
Prevents duplicate processing in multi-instance deployments:
```sql
SELECT * FROM curve_outbox_events
WHERE status = 'PENDING'
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED; -- Rows locked by another instance are skipped
```
Automatic Cleanup¶
Old events are automatically cleaned up:
```yaml
curve:
  outbox:
    cleanup-enabled: true
    retention-days: 7           # Delete events older than 7 days
    cleanup-cron: "0 0 2 * * *" # Daily at 2 AM
```
Monitoring¶
Health Check¶
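Assuming the standard Spring Boot Actuator setup (the properties below are stock Actuator configuration, not Curve-specific), the health endpoint can be exposed alongside the outbox endpoint:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,curve-outbox
  endpoint:
    health:
      show-details: always
```

`GET /actuator/health` then reports overall application health; whether Curve contributes an outbox-specific health indicator depends on your Curve version.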
Metrics¶
```json
{
  "outbox": {
    "pendingEvents": 5,
    "sentEvents": 1518,
    "failedEvents": 2,
    "avgProcessingTimeMs": 45
  }
}
```
Query Outbox Table¶
```sql
-- Pending events
SELECT * FROM curve_outbox_events
WHERE status = 'PENDING'
ORDER BY created_at DESC;

-- Failed events
SELECT event_type, error_message, retry_count
FROM curve_outbox_events
WHERE status = 'FAILED'
ORDER BY created_at DESC;

-- Event throughput
SELECT DATE(created_at) AS date, COUNT(*)
FROM curve_outbox_events
WHERE status = 'SENT'
GROUP BY DATE(created_at)
ORDER BY date DESC;
```
Outbox Replay API¶
Replay previously published events from the outbox. This is useful for:
- Consumer failure recovery: Reprocess events after consumer downtime
- Data corrections: Re-publish events to fix downstream state
- Testing: Replay historical events in test environments
Endpoints¶
- `GET /actuator/curve-outbox` - View outbox statistics
- `POST /actuator/curve-outbox` - Replay events since a timestamp
Enable the Endpoint¶
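Assuming the endpoint follows standard Spring Boot Actuator exposure rules (this is stock Actuator configuration, not Curve-specific), add it to the exposed endpoints:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: curve-outbox
```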
GET /actuator/curve-outbox¶
View outbox statistics. The response has the same shape as the metrics payload shown under Monitoring above.
POST /actuator/curve-outbox¶
Replay events since a given timestamp with optional limit:
```bash
curl -X POST http://localhost:8080/actuator/curve-outbox \
  -H "Content-Type: application/vnd.spring-boot.actuator.v3+json" \
  -d '{"since": "2026-03-01T00:00:00Z", "limit": 100}'
```
Request Body:
```json
{
  "since": "2026-03-01T00:00:00Z",  // ISO-8601 timestamp (required)
  "limit": 100                      // Max events to replay (optional, default: 1000)
}
```
Response:
```json
{
  "since": "2026-03-01T00:00:00Z",
  "limit": 100,
  "total": 42,    // Events found since timestamp
  "success": 40,  // Successfully replayed
  "failed": 2,    // Failed during replay
  "failedEventIds": ["evt-001", "evt-002"]
}
```
Idempotency¶
Since outbox events can be replayed multiple times, consumer implementations must be idempotent:
```java
@Service
public class OrderEventConsumer {

    @KafkaListener(topics = "orders")
    public void handleOrderCreated(String message) {
        OrderCreatedEvent event = parse(message);

        // Idempotency check: use the event ID as a unique key
        if (orderRepository.existsByEventId(event.eventId)) {
            log.info("Event {} already processed, skipping", event.eventId);
            return; // Safely skip duplicate
        }

        // Process event
        Order order = new Order(event);
        orderRepository.save(order);
    }
}
```
Replay Scenarios¶
Scenario 1: Recover from consumer downtime
```bash
# Consumer was down from 10:00 to 10:30
# Replay events from that period
curl -X POST http://localhost:8080/actuator/curve-outbox \
  -H "Content-Type: application/vnd.spring-boot.actuator.v3+json" \
  -d '{
    "since": "2026-03-04T10:00:00Z",
    "limit": 500
  }'
```
Scenario 2: Fix consumer bug
```bash
# Consumer had a bug for 1 hour
# Re-publish all events from that period
curl -X POST http://localhost:8080/actuator/curve-outbox \
  -H "Content-Type: application/vnd.spring-boot.actuator.v3+json" \
  -d '{
    "since": "2026-03-04T09:00:00Z",
    "limit": 5000
  }'
```
Important Notes¶
- Already-published events can be replayed - The replay API will re-publish events regardless of their current status
- Consumers must handle duplicates - Implement idempotent processing using event IDs
- Throughput considerations - Large replays may impact Kafka performance; adjust batch size as needed
- Default topic respected - Replayed events use the same topic as the original event
Best Practices¶
DO¶
- Use outbox for critical events - Order creation, payments, etc.
- Monitor pending events - Alert if queue grows
- Set appropriate retention - Balance storage and auditability
- Use database indexes - Optimize poller queries
- Test failure scenarios - Ensure recovery works
DON'T¶
- Use outbox for high-volume events (>10,000/sec)
- Set poll interval too low (<500ms)
- Disable cleanup in production
- Ignore failed events
Performance Considerations¶
| Factor | Impact | Recommendation |
|---|---|---|
| Poll Interval | Lower = faster, higher DB load | 1000ms for most cases |
| Batch Size | Larger = more throughput | 100-500 events |
| Retention | Longer = more storage | 7-30 days |
| Indexes | Essential for performance | On status, created_at |
Throughput Estimates¶
| Configuration | Throughput |
|---|---|
| Poll: 1s, Batch: 100 | ~100 events/sec |
| Poll: 500ms, Batch: 200 | ~400 events/sec |
| Poll: 1s, Batch: 500 | ~500 events/sec |
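The estimates above follow from dividing batch size by poll interval (an upper bound, assuming each cycle fills its batch); the arithmetic can be checked with a quick calculation (names here are illustrative):

```java
public class ThroughputEstimate {
    // Upper-bound poller throughput: at most batchSize events per poll cycle.
    static double eventsPerSecond(int batchSize, long pollIntervalMs) {
        return batchSize * 1000.0 / pollIntervalMs;
    }

    public static void main(String[] args) {
        System.out.println(eventsPerSecond(100, 1000)); // Poll: 1s, Batch: 100
        System.out.println(eventsPerSecond(200, 500));  // Poll: 500ms, Batch: 200
        System.out.println(eventsPerSecond(500, 1000)); // Poll: 1s, Batch: 500
    }
}
```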
High Throughput
For >1,000 events/sec, use asynchronous publishing without the outbox (see the Outbox vs Async comparison below).
Troubleshooting¶
Events Stuck in PENDING¶
Symptom: events are not being published.
Check:
- The outbox poller is running (enable debug logging with `logging.level.io.github.closeup1202.curve.outbox=DEBUG`)
- Kafka is accessible
- The DB connection pool is not exhausted
High Failed Event Count¶
Symptom: many events are in FAILED status.
Solutions:
- Check Kafka connectivity
- Increase `max-retries`
- Analyze the `error_message` column
- Manually republish the failed events
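One way to manually republish is to reset FAILED rows to PENDING so the poller picks them up on the next cycle. This is a sketch based on the schema shown above; verify it against your actual table before running in production:

```sql
-- Reset failed events for another round of retries
UPDATE curve_outbox_events
SET status = 'PENDING', retry_count = 0, error_message = NULL
WHERE status = 'FAILED';
```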
Comparison: Outbox vs Async¶
| Feature | Outbox | Async |
|---|---|---|
| Atomicity | ✅ Guaranteed | ❌ Best-effort |
| Throughput | ~1,000 TPS | ~10,000+ TPS |
| Latency | ~1-2 seconds | <100ms |
| Storage | DB required | No extra storage |
| Complexity | Medium | Low |
Use outbox when:
- Atomicity is critical (payments, orders)
- Events must not be lost
- Moderate throughput (<1,000 TPS)
Use async when:
- High throughput needed
- Best-effort delivery acceptable
- Low latency required
What's Next?¶
- Observability: Monitor your events
- Configuration: Advanced configuration options