Capacity Planning — Cost vs. Throughput Analysis

Infrastructure cost projections and resource requirements for each ingestion pattern at varying throughput levels.

[Figure: Monthly Infrastructure Cost ($) vs. Throughput (events/sec), one line per pattern. Labeled points: P1 $350 and P2 $800 at 10k/sec; P3 $1,000 / $2,000 / $3,300, P4 $350 / $700 / $1,400, and P5 $1,710 / $3,200 / $5,300 at 10k / 50k / 100k events/sec.]
Legend: P1/P2 (PostgreSQL), P3 (Kafka+CH), P4 (CH Direct), P5 (Kafka+Flink)
Note: P1 and P2 cannot scale beyond ~10k/sec (PostgreSQL write ceiling)
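
P1: monthly cost by throughput

| Throughput | PostgreSQL | App Server | Total/mo | Notes |
| --- | --- | --- | --- | --- |
| 1k/sec | $100 (db.r6g.large) | $50 | $150 | Comfortable |
| 5k/sec | $200 (db.r6g.xlarge) | $100 | $300 | Moderate load |
| 10k/sec | $250 (db.r6g.2xlarge) | $100 | $350 | At ceiling |
| 20k/sec | N/A | | | Exceeds PG write capacity |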

Scaling ceiling: ~10k writes/sec with optimized batch inserts and connection pooling.

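P2: monthly cost by throughput

| Throughput | PostgreSQL | ETL Infra | ClickHouse | Total/mo | Notes |
| --- | --- | --- | --- | --- | --- |
| 1k/sec | $100 | $50 | $200 | $350 | Over-provisioned |
| 5k/sec | $200 | $100 | $200 | $500 | Good fit |
| 10k/sec | $250 | $150 | $400 | $800 | PG at ceiling |
| 20k/sec | N/A | | | | PG bottleneck |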

Scaling ceiling: Same as P1; PostgreSQL is the bottleneck regardless of downstream.

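P3 (Kafka+CH): monthly cost by throughput

| Throughput | Kafka | Consumers | ClickHouse | PostgreSQL | Total/mo | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 1k/sec | $200 (3 brokers) | $50 | $200 | $100 | $550 | Over-provisioned |
| 10k/sec | $300 (3 brokers) | $100 | $400 | $200 | $1,000 | Current target |
| 50k/sec | $600 (5 brokers) | $200 | $800 (2 nodes) | $200 | $2,000 | Phase 2 |
| 100k/sec | $900 (8 brokers) | $400 | $1,600 (3 nodes) | $200 | $3,300 | Phase 3 |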

Cost per million events:

  • At 10k/sec: $1,000 / 26.4B events = $0.038 per million events
  • At 50k/sec: $2,000 / 132B events = $0.015 per million events
  • At 100k/sec: $3,300 / 264B events = $0.012 per million events
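
The per-million figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not part of the original analysis: the function and constant names are illustrative, and `SECONDS_PER_MONTH` is an assumption inferred from the 26.4B events/month figure at 10k/sec (about 30.5 days).

```python
# Assumption: the 26.4B events/month figure at 10k/sec implies
# roughly 2.64M seconds per month (about 30.5 days).
SECONDS_PER_MONTH = 2_640_000


def cost_per_million_events(monthly_cost_usd: float, events_per_sec: int) -> float:
    """Return monthly cost divided by millions of events ingested per month."""
    events_per_month = events_per_sec * SECONDS_PER_MONTH
    return monthly_cost_usd / (events_per_month / 1_000_000)


# P3 monthly totals taken from the P3 cost table.
P3_MONTHLY_COST = {10_000: 1_000, 50_000: 2_000, 100_000: 3_300}

for rate, cost in P3_MONTHLY_COST.items():
    per_million = cost_per_million_events(cost, rate)
    print(f"{rate:>7,}/sec: ${per_million:.4f} per million events")
```

The same function applied to P4 at 100k/sec ($1,400/month) gives roughly $0.005 per million events.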
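
P4 (CH Direct): monthly cost by throughput

| Throughput | ClickHouse | App Server | Total/mo | Notes |
| --- | --- | --- | --- | --- |
| 1k/sec | $200 | $50 | $250 | Simplest setup |
| 10k/sec | $250 | $100 | $350 | Cost-effective |
| 50k/sec | $500 (2 nodes) | $200 | $700 | Add replication |
| 100k/sec | $1,000 (3 nodes) | $400 | $1,400 | Sharded cluster |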

Trade-off: Cheapest option at 50k/sec and 100k/sec and tied with P1 at 10k/sec ($350); only P1 is cheaper at 1k/sec ($150 vs. $250). The cost is no event replay or decoupling.

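P5 (Kafka+Flink): monthly cost by throughput

| Throughput | Kafka | Flink | ClickHouse | PostgreSQL | Total/mo | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 1k/sec | $200 | $300 | $200 | $100 | $800 | Flink is expensive idle |
| 10k/sec | $300 | $510 | $400 | $200 | $1,710 | JVM TaskManagers |
| 50k/sec | $600 | $800 | $800 | $200 | $3,200 | Flink scales well |
| 100k/sec | $900 | $1,400 | $1,600 | $200 | $5,300 | Full streaming stack |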

Break-even vs P3: Flink adds value only when complex stream processing (windowed aggregations, CEP) is required.
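
To put a number on the Flink premium, the sketch below subtracts the P3 total from the P5 total at each throughput level. It also sums the listed component columns and reports any gap against Total/mo, since several rows in the P3 and P5 tables as given do not add up exactly. The dictionary layout and function names are illustrative; the sketch reports gaps rather than assuming which figure is the intended one.

```python
# Figures (USD/month) copied from the P3 and P5 cost tables.
# "components" lists the per-service columns in table order.
P3 = {
    1_000: {"components": [200, 50, 200, 100], "total": 550},
    10_000: {"components": [300, 100, 400, 200], "total": 1_000},
    50_000: {"components": [600, 200, 800, 200], "total": 2_000},
    100_000: {"components": [900, 400, 1_600, 200], "total": 3_300},
}
P5 = {
    1_000: {"components": [200, 300, 200, 100], "total": 800},
    10_000: {"components": [300, 510, 400, 200], "total": 1_710},
    50_000: {"components": [600, 800, 800, 200], "total": 3_200},
    100_000: {"components": [900, 1_400, 1_600, 200], "total": 5_300},
}


def component_gap(row: dict) -> int:
    """Listed total minus the sum of listed components (0 means consistent)."""
    return row["total"] - sum(row["components"])


def flink_premium(events_per_sec: int) -> int:
    """Extra monthly cost of P5 over P3 at a given throughput."""
    return P5[events_per_sec]["total"] - P3[events_per_sec]["total"]


for name, table in (("P3", P3), ("P5", P5)):
    for rate, row in table.items():
        gap = component_gap(row)
        if gap:
            print(f"{name} at {rate:,}/sec: total exceeds components by ${gap:,}")

for rate in P3:
    print(f"{rate:>7,}/sec: P5 costs ${flink_premium(rate):,}/mo more than P3")
```

Using the listed totals, the premium grows from $250/month at 1k/sec to $2,000/month at 100k/sec, which is the amount the stream-processing features need to justify.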

[Figure: Cost per Million Events at each throughput level (10k, 50k, 100k events/sec), bar chart for P1 to P5. Annotated values: P3 $0.038 at 10k/sec, $0.015 at 50k/sec, $0.012 at 100k/sec; P4 $0.005 at 100k/sec.]

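| Resource | P1 | P3 (Chosen) | P4 | P5 |
| --- | --- | --- | --- | --- |
| CPU Cores | 8 | 12 | 4 | 20 |
| Memory (GB) | 32 | 24 | 8 | 40 |
| Storage (TB/mo) | 2.5 | 1.2 | 0.8 | 1.2 |
| Network (Mbps) | 50 | 80 | 40 | 100 |
| Instances | 2 | 6 | 2 | 8 |

| Resource | P3 | P4 | P5 |
| --- | --- | --- | --- |
| CPU Cores | 48 | 24 | 72 |
| Memory (GB) | 96 | 48 | 160 |
| Storage (TB/mo) | 12 | 8 | 12 |
| Network (Mbps) | 800 | 400 | 1000 |
| Instances | 15 | 6 | 20 |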

Based on average event size of ~500 bytes uncompressed, ~100 bytes compressed (lz4, ~5x ratio):

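| Throughput | Events/Day | Raw/Day | Compressed/Day | Compressed/Month | Compressed/Year |
| --- | --- | --- | --- | --- | --- |
| 1k/sec | 86.4M | 43 GB | 8.6 GB | 258 GB | 3.1 TB |
| 10k/sec | 864M | 432 GB | 86 GB | 2.6 TB | 31 TB |
| 50k/sec | 4.32B | 2.16 TB | 432 GB | 13 TB | 156 TB |
| 100k/sec | 8.64B | 4.32 TB | 864 GB | 26 TB | 312 TB |

| Retention | Storage at 10k/sec | Storage at 100k/sec |
| --- | --- | --- |
| 30 days | 2.6 TB | 26 TB |
| 90 days | 7.8 TB | 78 TB |
| 1 year | 31 TB | 312 TB |
| 3 years | 93 TB | 936 TB |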

Recommendation: 90-day hot storage in ClickHouse, archive older data to S3/cold storage.
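
The storage figures follow directly from the event size and compression ratio stated above. The sketch below shows the arithmetic for an arbitrary throughput and retention window; the names are illustrative, and it assumes decimal units (1 TB = 10^12 bytes) and a constant 5x compression ratio.

```python
SECONDS_PER_DAY = 86_400
EVENT_BYTES_RAW = 500      # average uncompressed event size, per the text
COMPRESSION_RATIO = 5      # approximate lz4 ratio, per the text
BYTES_PER_TB = 10**12      # assumption: decimal terabytes


def compressed_bytes_per_day(events_per_sec: int) -> float:
    """Compressed bytes written per day at a steady event rate."""
    raw_bytes = events_per_sec * SECONDS_PER_DAY * EVENT_BYTES_RAW
    return raw_bytes / COMPRESSION_RATIO


def hot_storage_tb(events_per_sec: int, retention_days: int) -> float:
    """Compressed storage (TB) needed to retain `retention_days` of events."""
    return compressed_bytes_per_day(events_per_sec) * retention_days / BYTES_PER_TB


for rate in (10_000, 100_000):
    print(f"{rate:,}/sec, 90-day retention: {hot_storage_tb(rate, 90):.1f} TB")
```

This counts a single copy of the data; replication and merge overhead in ClickHouse would add to these numbers and are not modeled here.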

| Metric | Threshold | Action |
| --- | --- | --- |
| Consumer lag | > 10,000 messages sustained | Add consumer instances |
| Consumer lag | > 100,000 messages | Add Kafka partitions + consumers |
| ClickHouse CPU | > 70% sustained | Add ClickHouse node/replica |
| ClickHouse merge time | > 60 seconds | Increase memory or add node |
| Kafka disk usage | > 70% | Add brokers or reduce retention |
| API latency P99 | > 200ms | Scale API servers |
| Event throughput | Approaching 2x current capacity | Begin next phase planning |
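
The thresholds above could be encoded as a simple rule check, for example in an alerting hook. This is a hypothetical sketch: the `Metrics` fields and `recommended_actions` function are made-up names, the inputs are assumed to be values that have already been averaged over whatever window counts as "sustained", and the event-throughput row is left out because "approaching" is not a precise threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metrics:
    """Hypothetical snapshot of already-smoothed ("sustained") metric values."""
    consumer_lag: int                # messages
    clickhouse_cpu_pct: float        # percent
    clickhouse_merge_seconds: float  # seconds
    kafka_disk_pct: float            # percent
    api_p99_ms: float                # milliseconds


def recommended_actions(m: Metrics) -> List[str]:
    """Map metric values to the scaling actions from the thresholds table."""
    actions = []
    # Only the more severe consumer-lag action is reported when both match.
    if m.consumer_lag > 100_000:
        actions.append("Add Kafka partitions + consumers")
    elif m.consumer_lag > 10_000:
        actions.append("Add consumer instances")
    if m.clickhouse_cpu_pct > 70:
        actions.append("Add ClickHouse node/replica")
    if m.clickhouse_merge_seconds > 60:
        actions.append("Increase memory or add node")
    if m.kafka_disk_pct > 70:
        actions.append("Add brokers or reduce retention")
    if m.api_p99_ms > 200:
        actions.append("Scale API servers")
    return actions


snapshot = Metrics(consumer_lag=50_000, clickhouse_cpu_pct=82.0,
                   clickhouse_merge_seconds=12.0, kafka_disk_pct=40.0,
                   api_p99_ms=150.0)
print(recommended_actions(snapshot))
```

Reporting only the stronger consumer-lag action is a design choice of this sketch; the table itself does not say whether the two lag rules are meant to be cumulative.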

TCO Summary (3-Year Projection at 10k/sec)

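| Pattern | Monthly | Annual | 3-Year TCO | Engineering Cost | Total 3-Year |
| --- | --- | --- | --- | --- | --- |
| P1 | $350 | $4,200 | $12,600 | Low ($0) | ~$12,600 |
| P2 | $800 | $9,600 | $28,800 | Medium ($20k) | ~$48,800 |
| P3 | $1,000 | $12,000 | $36,000 | Medium ($15k) | ~$51,000 |
| P4 | $350 | $4,200 | $12,600 | Low ($5k) | ~$17,600 |
| P5 | $1,710 | $20,520 | $61,560 | High ($40k) | ~$101,560 |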

Note: P1 and P2 exclude the cost of re-architecture when hitting the 10k/sec ceiling, which is the likely outcome at bxb’s growth trajectory. P3’s premium over P4 buys event replay — a requirement for billing accuracy.
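
The TCO rows are 36 months of infrastructure cost plus a one-time engineering estimate. A minimal sketch of that calculation follows, using the table's figures; the names are illustrative, and flat monthly cost over the three years is an assumption of the table itself (no growth, no re-architecture cost).

```python
MONTHS = 36  # 3-year projection

# (monthly infrastructure cost, one-time engineering cost) in USD at 10k/sec,
# copied from the TCO table.
PATTERNS = {
    "P1": (350, 0),
    "P2": (800, 20_000),
    "P3": (1_000, 15_000),
    "P4": (350, 5_000),
    "P5": (1_710, 40_000),
}


def three_year_total(monthly_usd: int, engineering_usd: int) -> int:
    """Infrastructure cost over 36 months plus one-time engineering cost."""
    return monthly_usd * MONTHS + engineering_usd


for name, (monthly, engineering) in PATTERNS.items():
    print(f"{name}: ${three_year_total(monthly, engineering):,}")

# Premium paid for event replay when choosing P3 over P4.
p3_over_p4 = three_year_total(*PATTERNS["P3"]) - three_year_total(*PATTERNS["P4"])
print(f"P3 premium over P4 across 3 years: ${p3_over_p4:,}")
```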