Transactional Outbox Pattern

Guarantee atomicity between database transactions and event publishing with Curve's transactional outbox pattern.

The Problem

Without the transactional outbox, the database write and the Kafka publish are two separate, non-atomic operations:

@Transactional
public Order createOrder(OrderRequest request) {
    // 1. Save to database
    Order order = orderRepository.save(new Order(request));

    // 2. Publish to Kafka
    kafkaProducer.send("orders", orderEvent);  // ❌ What if this fails?

    return order;
}

Issues:

  • Order saved but event not published → Lost event
  • Event published but transaction rolled back → Ghost event
  • No atomicity between DB and Kafka
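The second failure mode (the ghost event) can be made concrete with a minimal simulation. All names here are illustrative, not part of Curve's API: two in-memory lists stand in for the database and the Kafka topic, and a rollback discards the uncommitted write while the already-sent event stays on the broker.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal simulation of the "ghost event" failure mode: the broker
// receives the event even though the surrounding transaction rolls back.
public class GhostEventDemo {
    static final List<String> db = new ArrayList<>();      // stands in for the orders table
    static final List<String> broker = new ArrayList<>();  // stands in for the Kafka topic

    static void createOrder(boolean failAfterPublish) {
        List<String> txBuffer = new ArrayList<>();
        txBuffer.add("order-1");          // 1. save (buffered, not yet committed)
        broker.add("ORDER_CREATED");      // 2. publish - happens outside the transaction!
        if (failAfterPublish) {
            return;                       // simulated rollback: txBuffer is discarded
        }
        db.addAll(txBuffer);              // commit
    }

    public static void main(String[] args) {
        createOrder(true);
        assert db.isEmpty();              // the transaction rolled back...
        assert broker.size() == 1;        // ...but the event is already out
    }
}
```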

The Solution

With the transactional outbox, Curve saves the event to the database in the same transaction as the business entity:

@Transactional
@PublishEvent(
    eventType = "ORDER_CREATED",
    outbox = true,                    // ✓ Enable outbox
    aggregateType = "Order",          // Entity type
    aggregateId = "#result.id"        // Entity ID
)
public Order createOrder(OrderRequest request) {
    return orderRepository.save(new Order(request));
}

How It Works

sequenceDiagram
    participant App as Application
    participant DB as Database (Outbox)
    participant Kafka as Kafka Broker
    participant Poller as Outbox Publisher

    Note over App, DB: Transaction Start
    App->>DB: 1. Save Business Entity (Order)
    App->>DB: 2. Save Outbox Event (PENDING)
    DB-->>App: TX Committed ✓

    loop Every 1s (Polling)
        Poller->>DB: 3. Fetch PENDING Events
        Poller->>Kafka: 4. Publish to Kafka
        alt Success
            Kafka-->>Poller: Ack
            Poller->>DB: 5. Update Status to SENT
        else Failure
            Poller->>DB: 5. Increment Retry Count
        end
    end

Steps:

  1. Save entity and event in the same transaction - Atomicity guaranteed
  2. Background poller - Periodically fetches PENDING events
  3. Publish to Kafka - Events are sent asynchronously
  4. Mark as sent - Successful events are marked SENT; failures increment the retry count
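Steps 2-4 can be sketched as a single poller pass. This is an illustrative simplification with in-memory lists, not Curve's actual internals: real rows would be fetched with the SKIP LOCKED query shown later and published through a Kafka producer.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of one poller pass: fetch PENDING events, publish, mark SENT.
public class OutboxPoller {
    static class OutboxEvent {
        String payload;
        String status = "PENDING";
        int retryCount = 0;
        OutboxEvent(String payload) { this.payload = payload; }
    }

    static void pollOnce(List<OutboxEvent> outbox, List<String> broker) {
        for (OutboxEvent e : outbox) {
            if (!"PENDING".equals(e.status)) continue;  // step 2: fetch pending
            try {
                broker.add(e.payload);                  // step 3: publish to Kafka
                e.status = "SENT";                      // step 4: mark as sent
            } catch (RuntimeException ex) {
                e.retryCount++;                         // failure: leave PENDING, retry later
            }
        }
    }

    public static void main(String[] args) {
        List<OutboxEvent> outbox = new ArrayList<>();
        outbox.add(new OutboxEvent("ORDER_CREATED"));
        List<String> broker = new ArrayList<>();
        pollOnce(outbox, broker);
        assert broker.size() == 1;
        assert "SENT".equals(outbox.get(0).status);
    }
}
```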

Configuration

Enable Outbox

application.yml
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/mydb
    username: user
    password: pass

  jpa:
    hibernate:
      ddl-auto: update  # Or use Flyway/Liquibase

curve:
  outbox:
    enabled: true
    poll-interval-ms: 1000      # Poll every 1 second
    batch-size: 100             # Process 100 events per batch
    max-retries: 3              # Retry failed events 3 times
    cleanup-enabled: true       # Auto-cleanup old events
    retention-days: 7           # Keep events for 7 days
    cleanup-cron: "0 0 2 * * *" # Cleanup at 2 AM daily

Database Schema

Curve auto-creates the outbox table:

CREATE TABLE curve_outbox_events (
    id BIGINT PRIMARY KEY,
    event_id VARCHAR(255) NOT NULL,
    event_type VARCHAR(255) NOT NULL,
    aggregate_type VARCHAR(255) NOT NULL,
    aggregate_id VARCHAR(255) NOT NULL,
    payload TEXT NOT NULL,
    metadata TEXT,
    status VARCHAR(50) NOT NULL,  -- PENDING, SENT, FAILED
    retry_count INT DEFAULT 0,
    last_retry_at TIMESTAMP,
    created_at TIMESTAMP NOT NULL,
    sent_at TIMESTAMP,
    error_message TEXT
);

CREATE INDEX idx_status ON curve_outbox_events(status);
CREATE INDEX idx_created_at ON curve_outbox_events(created_at);

Usage Examples

Basic Outbox

@Transactional
@PublishEvent(
    eventType = "ORDER_CREATED",
    outbox = true,
    aggregateType = "Order",
    aggregateId = "#result.id"
)
public Order createOrder(OrderRequest request) {
    return orderRepository.save(new Order(request));
}

With Custom Payload

@Transactional
@PublishEvent(
    eventType = "USER_REGISTERED",
    outbox = true,
    aggregateType = "User",
    aggregateId = "#result.userId",
    payload = "#result.toRegisteredPayload()"
)
public User registerUser(UserRequest request) {
    User user = userRepository.save(new User(request));

    // Other operations in same transaction
    auditRepository.save(new AuditLog("User registered"));

    return user;
}

Complex Transaction

@Transactional
@PublishEvent(
    eventType = "ORDER_COMPLETED",
    outbox = true,
    aggregateType = "Order",
    aggregateId = "#result.id"
)
public Order completeOrder(Long orderId) {
    // 1. Update order
    Order order = orderRepository.findById(orderId)
        .orElseThrow();
    order.setStatus(OrderStatus.COMPLETED);
    order.setCompletedAt(Instant.now());

    // 2. Update inventory
    inventoryService.decrementStock(order.getItems());

    // 3. Create invoice
    Invoice invoice = invoiceService.create(order);

    // All operations atomic - event published after TX commits
    return orderRepository.save(order);
}

Advanced Features

Exponential Backoff

Failed events are retried with exponential backoff:

Retry 1: Immediate
Retry 2: 2 seconds later
Retry 3: 4 seconds later  (2s × 2)
Retry 4: 8 seconds later  (4s × 2)
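The schedule above follows a simple doubling rule. Assuming the formula implied by the listing (first retry immediate, then the delay doubles starting at 2 seconds), it can be written as:

```java
// Backoff schedule implied by the listing above:
// retry 1 is immediate, then the delay doubles starting at 2 seconds.
public class Backoff {
    static long delaySeconds(int retry) {
        if (retry <= 1) return 0;      // Retry 1: immediate
        return 1L << (retry - 1);      // Retry n: 2^(n-1) seconds -> 2, 4, 8, ...
    }

    public static void main(String[] args) {
        assert delaySeconds(1) == 0;
        assert delaySeconds(2) == 2;
        assert delaySeconds(3) == 4;
        assert delaySeconds(4) == 8;
    }
}
```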

SKIP LOCKED for Multi-Instance

Prevents duplicate processing in multi-instance deployments:

SELECT * FROM curve_outbox_events
WHERE status = 'PENDING'
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;  -- Prevents duplicates

Automatic Cleanup

Old events are automatically cleaned up:

curve:
  outbox:
    cleanup-enabled: true
    retention-days: 7           # Delete events older than 7 days
    cleanup-cron: "0 0 2 * * *" # Daily at 2 AM

Monitoring

Health Check

curl http://localhost:8080/actuator/health/curve-outbox

Response:

{
  "status": "UP",
  "details": {
    "pendingEvents": 5,
    "failedEvents": 2,
    "totalEvents": 1523
  }
}

Metrics

curl http://localhost:8080/actuator/curve-metrics

Response:

{
  "outbox": {
    "pendingEvents": 5,
    "sentEvents": 1518,
    "failedEvents": 2,
    "avgProcessingTimeMs": 45
  }
}

Query Outbox Table

-- Pending events
SELECT * FROM curve_outbox_events
WHERE status = 'PENDING'
ORDER BY created_at DESC;

-- Failed events
SELECT event_type, error_message, retry_count
FROM curve_outbox_events
WHERE status = 'FAILED'
ORDER BY created_at DESC;

-- Event throughput
SELECT DATE(created_at) as date, COUNT(*)
FROM curve_outbox_events
WHERE status = 'SENT'
GROUP BY DATE(created_at)
ORDER BY date DESC;

Outbox Replay API

Replay previously published events from the outbox. This is useful for:

  • Consumer failure recovery: Reprocess events after consumer downtime
  • Data corrections: Re-publish events to fix downstream state
  • Testing: Replay historical events in test environments

Endpoint

GET  /actuator/curve-outbox    - View outbox statistics
POST /actuator/curve-outbox    - Replay events since timestamp

Enable the Endpoint

application.yml
management:
  endpoints:
    web:
      exposure:
        include: curve-outbox

GET /actuator/curve-outbox

View outbox statistics:

curl http://localhost:8080/actuator/curve-outbox

Response:

{
  "total": 1523,
  "pending": 5,
  "published": 1516,
  "failed": 2,
  "avgProcessingTimeMs": 45
}

POST /actuator/curve-outbox

Replay events since a given timestamp with optional limit:

curl -X POST http://localhost:8080/actuator/curve-outbox \
  -H "Content-Type: application/vnd.spring-boot.actuator.v3+json" \
  -d '{"since": "2026-03-01T00:00:00Z", "limit": 100}'

Request Body:

{
  "since": "2026-03-01T00:00:00Z",  // ISO-8601 timestamp (required)
  "limit": 100                        // Max events to replay (optional, default: 1000)
}

Response:

{
  "since": "2026-03-01T00:00:00Z",
  "limit": 100,
  "total": 42,          // Events found since timestamp
  "success": 40,        // Successfully replayed
  "failed": 2,          // Failed during replay
  "failedEventIds": ["evt-001", "evt-002"]
}

Idempotency

Since outbox events can be replayed multiple times, consumer implementations must be idempotent:

@Service
public class OrderEventConsumer {

    @KafkaListener(topics = "orders")
    public void handleOrderCreated(String message) {
        OrderCreatedEvent event = parse(message);

        // Idempotent check: use event ID as unique key
        if (orderRepository.existsByEventId(event.eventId)) {
            log.info("Event {} already processed, skipping", event.eventId);
            return;  // Safely skip duplicate
        }

        // Process event
        Order order = new Order(event);
        orderRepository.save(order);
    }
}

Replay Scenarios

Scenario 1: Recover from consumer downtime

# Consumer was down from 10:00 to 10:30
# Replay events from that period
curl -X POST http://localhost:8080/actuator/curve-outbox \
  -H "Content-Type: application/vnd.spring-boot.actuator.v3+json" \
  -d '{
    "since": "2026-03-04T10:00:00Z",
    "limit": 500
  }'

Scenario 2: Fix consumer bug

# Consumer had a bug for 1 hour
# Re-publish all events from that period
curl -X POST http://localhost:8080/actuator/curve-outbox \
  -H "Content-Type: application/vnd.spring-boot.actuator.v3+json" \
  -d '{
    "since": "2026-03-04T09:00:00Z",
    "limit": 5000
  }'

Important Notes

  1. Already-published events can be replayed - The replay API will re-publish events regardless of their current status
  2. Consumers must handle duplicates - Implement idempotent processing using event IDs
  3. Throughput considerations - Large replays may impact Kafka performance; adjust batch size as needed
  4. Default topic respected - Replayed events use the same topic as the original event

Best Practices

✅ DO

  • Use outbox for critical events - Order creation, payments, etc.
  • Monitor pending events - Alert if queue grows
  • Set appropriate retention - Balance storage and auditability
  • Use database indexes - Optimize poller queries
  • Test failure scenarios - Ensure recovery works

❌ DON'T

  • Use outbox for high-volume events (>10,000/sec)
  • Set poll interval too low (<500ms)
  • Disable cleanup in production
  • Ignore failed events

Performance Considerations

Factor          Impact                            Recommendation
Poll Interval   Lower = faster, higher DB load    1000 ms for most cases
Batch Size      Larger = more throughput          100-500 events
Retention       Longer = more storage             7-30 days
Indexes         Essential for performance         On status and created_at

Throughput Estimates

Configuration              Throughput
Poll: 1s, Batch: 100       ~100 events/sec
Poll: 500ms, Batch: 200    ~400 events/sec
Poll: 1s, Batch: 500       ~500 events/sec
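The estimates above follow from a simple back-of-envelope formula: batch size divided by poll interval, assuming the poller finds a full batch on every pass.

```java
// Back-of-envelope outbox throughput: batchSize events per poll interval,
// assuming every poll finds a full batch to publish.
public class Throughput {
    static double eventsPerSec(int batchSize, long pollIntervalMs) {
        return batchSize * 1000.0 / pollIntervalMs;
    }

    public static void main(String[] args) {
        assert eventsPerSec(100, 1000) == 100.0;  // Poll: 1s,    Batch: 100
        assert eventsPerSec(200, 500) == 400.0;   // Poll: 500ms, Batch: 200
        assert eventsPerSec(500, 1000) == 500.0;  // Poll: 1s,    Batch: 500
    }
}
```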

High Throughput

For >1,000 events/sec, use async mode without outbox:

curve:
  kafka:
    async-mode: true

Troubleshooting

Events Stuck in PENDING

Symptom: events remain in PENDING status and are never published.

Check:

  1. Outbox poller is running: logging.level.io.github.closeup1202.curve.outbox=DEBUG
  2. Kafka is accessible
  3. No DB connection pool exhaustion

High Failed Event Count

Symptom: many events end up in FAILED status.

Solutions:

  1. Check Kafka connectivity
  2. Increase max-retries
  3. Analyze error_message column
  4. Manual republish:
UPDATE curve_outbox_events
SET status = 'PENDING', retry_count = 0
WHERE status = 'FAILED'
  AND created_at > NOW() - INTERVAL '1 hour';

Comparison: Outbox vs Async

Feature       Outbox          Async
Atomicity     ✅ Guaranteed    ❌ Best-effort
Throughput    ~1,000 TPS      ~10,000+ TPS
Latency       ~1-2 seconds    <100 ms
Storage       DB required     No extra storage
Complexity    Medium          Low

Use outbox when:

  • Atomicity is critical (payments, orders)
  • Events must not be lost
  • Moderate throughput (<1,000 TPS)

Use async when:

  • High throughput needed
  • Best-effort delivery acceptable
  • Low latency required

What's Next?