Infrastructure Sizing Specification

Document ID: PLCY-INF-001
Version: 1.0
Effective Date: December 22, 2025
Last Review: December 22, 2025
Owner: Hop And Haul Team


CONFIDENTIAL

This document is CONFIDENTIAL and for internal use only. Do not distribute outside the organization.

1. Purpose

This document defines the infrastructure sizing requirements for Hop And Haul's production environment. It establishes capacity baselines and scaling thresholds and records the resource allocation decisions that follow from them.


2. Workload Characteristics

2.1 User Population

Metric | Value | Notes
Maximum users | 5,000 | Drivers and riders are the same user pool
Peak concurrent users | 5,000 | All users may be active simultaneously
User sessions | Stateless JWT | No server-side session storage

2.2 Traffic Patterns

Traffic Type | Characteristics
API requests | Bursty, correlated with ride activity
WebSocket connections | Persistent, one per active driver
GPS updates | 1 update per 5 seconds per active driver
Peak hours | 7-9 AM and 4-7 PM local time

2.3 Request Volume

Metric | Calculation | Result
Max concurrent WebSocket connections | 5,000 drivers (worst case: every user is an active driver) | 5,000 connections
GPS updates per second | 5,000 drivers / 5-second interval | 1,000 messages/sec
API RPS (peak) | 5,000 users x 1 req/sec | 5,000 RPS
API RPS (sustained) | 5,000 users x 0.1 req/sec | 500 RPS
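The derivations in this table can be reproduced with a short script, which makes them easy to rerun if the user count or update interval changes. This is a minimal sketch assuming the worst case stated above (every user is an active driver); the constant and function names are illustrative, and the sustained rate of 0.1 req/sec is expressed as one request per 10 seconds to keep the arithmetic exact.

```python
# Reproduce the Section 2.3 request-volume figures from the workload inputs.
# Assumption: worst case, every one of the 5,000 users is an active driver.
MAX_USERS = 5_000
GPS_INTERVAL_S = 5              # one GPS update every 5 seconds per driver
PEAK_REQ_PER_USER_PER_S = 1     # peak API rate per user
SUSTAINED_REQ_INTERVAL_S = 10   # 0.1 req/sec = one request every 10 seconds


def request_volume(users: int = MAX_USERS) -> dict[str, float]:
    """Return connection and message rates for a given number of active users."""
    return {
        "websocket_connections": users,              # one persistent socket per driver
        "gps_msgs_per_s": users / GPS_INTERVAL_S,
        "api_rps_peak": users * PEAK_REQ_PER_USER_PER_S,
        "api_rps_sustained": users / SUSTAINED_REQ_INTERVAL_S,
    }


if __name__ == "__main__":
    for name, value in request_volume().items():
        print(f"{name}: {value:,.0f}")
```

Doubling `users` to 10,000 (the horizontal-scaling trigger in Section 10.2) doubles every figure, for example to 2,000 GPS messages/sec.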

3. Compute Sizing (EC2)

3.1 Selected Instance

Attribute | Value
Instance type | r6g.xlarge
vCPU | 4
Memory | 32 GB
Network | Up to 10 Gbps (burst; sustained baseline is lower)
Architecture | ARM64 (Graviton2)
Pricing tier | On-Demand or Reserved

3.2 Sizing Justification

Memory Analysis:

Component | Memory Usage
Swift Vapor runtime | 500 MB
Application heap | 1-2 GB
WebSocket connections (5,000 x 50 KB) | 250 MB
Connection buffers | 500 MB
OS and system | 1 GB
Total estimated | ~4 GB (3.25-4.25 GB)
Available headroom | ~28 GB
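The memory total can be checked the same way. The sketch below sums the low and high ends of each component in decimal megabytes, matching the table's rounding (r6g.xlarge actually provides 32 GiB, so real headroom is slightly larger); the names are illustrative.

```python
# Sum the Section 3.2 memory components for r6g.xlarge, in decimal MB.
INSTANCE_MEMORY_MB = 32_000      # 32 GB as written in the table
WS_CONNECTIONS = 5_000
PER_CONNECTION_KB = 50

# (low, high) estimate for each component, in MB
COMPONENTS_MB = {
    "swift_vapor_runtime": (500, 500),
    "application_heap": (1_000, 2_000),
    "websocket_connections": (WS_CONNECTIONS * PER_CONNECTION_KB // 1_000,) * 2,
    "connection_buffers": (500, 500),
    "os_and_system": (1_000, 1_000),
}

low_mb = sum(low for low, _ in COMPONENTS_MB.values())
high_mb = sum(high for _, high in COMPONENTS_MB.values())
headroom_mb = INSTANCE_MEMORY_MB - high_mb   # worst-case headroom

if __name__ == "__main__":
    print(f"estimated usage: {low_mb:,}-{high_mb:,} MB, headroom >= {headroom_mb:,} MB")
```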

CPU Analysis:

Workload | CPU Estimate
5,000 RPS API handling | ~20% of 4 vCPU
1,000 msg/sec WebSocket | ~10% of 4 vCPU
JWT validation | ~5% of 4 vCPU
Database queries | ~10% of 4 vCPU
Total estimated | ~45% of 4 vCPU
Available headroom | ~55%

Why r6g.xlarge:

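Reason | Explanation
Graviton2 (ARM) | Up to 40% better price/performance than comparable x86 instances (AWS figure)
Memory headroom | 32 GB leaves a buffer for growth
Network capacity | Up to 10 Gbps burst comfortably covers WebSocket and API traffic
Swift support | Swift provides official ARM64 (aarch64) Linux toolchains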

3.3 Alternative Sizing Options

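Scenario | Instance | Rationale
Cost-optimized | r6g.large (2 vCPU, 16 GB) | Memory is sufficient (~4 GB used), but the ~45%-of-4-vCPU CPU estimate becomes ~90% of 2 vCPU at peak
Growth headroom | r6g.xlarge (4 vCPU, 32 GB) | Recommended
Peak buffer | r6g.2xlarge (8 vCPU, 64 GB) | If approaching limits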
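The choice among these options can be expressed as a simple fit check: convert the Section 3.2 CPU estimate into absolute vCPU demand, rescale it for each candidate, and pick the smallest instance that stays under the warning thresholds used in Section 8.1 (70% CPU, 80% memory). This is an illustrative sketch; it assumes CPU load spreads linearly across vCPUs, which real workloads only approximate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceOption:
    name: str
    vcpu: int
    memory_gb: int


# Section 3.3 candidates (r6g family)
OPTIONS = [
    InstanceOption("r6g.large", 2, 16),
    InstanceOption("r6g.xlarge", 4, 32),
    InstanceOption("r6g.2xlarge", 8, 64),
]

# Section 3.2: ~45% of 4 vCPU = 180 "percent of one vCPU" (1.8 vCPUs of work)
CPU_DEMAND_VCPU_PERCENT = 4 * (20 + 10 + 5 + 10)
MEMORY_DEMAND_GB = 4.25          # upper end of the memory estimate


def projected_cpu_percent(option: InstanceOption) -> float:
    # Assumes the work spreads evenly across vCPUs.
    return CPU_DEMAND_VCPU_PERCENT / option.vcpu


def projected_memory_percent(option: InstanceOption) -> float:
    return 100 * MEMORY_DEMAND_GB / option.memory_gb


def smallest_fit(cpu_limit: float = 70.0, memory_limit: float = 80.0) -> Optional[InstanceOption]:
    """Smallest candidate whose projected CPU and memory stay within the limits."""
    for option in sorted(OPTIONS, key=lambda o: o.vcpu):
        if (projected_cpu_percent(option) <= cpu_limit
                and projected_memory_percent(option) <= memory_limit):
            return option
    return None


if __name__ == "__main__":
    for option in OPTIONS:
        print(f"{option.name}: CPU {projected_cpu_percent(option):.0f}%, "
              f"memory {projected_memory_percent(option):.0f}%")
    print("smallest fit:", smallest_fit())
```

Under these assumptions r6g.large clears the memory check but projects to about 90% CPU at peak, so it fits only if the CPU limit is relaxed.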

4. Database Sizing (RDS PostgreSQL)

4.1 Selected Instance

Attribute | Value
Instance class | db.t3.small (2 GB) or db.t3.medium (4 GB)
Engine | PostgreSQL 15
Multi-AZ | Enabled
Storage | gp3, 100 GB initial
IOPS | 3,000 baseline (gp3 default)

4.2 Sizing Justification

Connection Analysis:

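Metric | Value
Application connection pool | 20-50 connections
Actual concurrent queries | 10-20
RDS max_connections (t3.small) | 112
RDS max_connections (t3.medium) | 225
RDS max_connections default | LEAST({DBInstanceClassMemory/9531392}, 5000); confirm on the instance with SHOW max_connections
Utilization | <50% (a full 50-connection pool is ~45% of 112 on t3.small)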
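Pool utilization is a single ratio, but it is worth encoding because it changes once there is more than one application instance: each instance opens its own pool, so the totals add up. This minimal sketch uses the max_connections values from the table above; the function name and the `app_instances` parameter are illustrative.

```python
# Connection-pool utilization against the RDS max_connections values in Section 4.2.
MAX_CONNECTIONS = {"db.t3.small": 112, "db.t3.medium": 225}
POOL_MAX = 50   # upper end of the 20-50 connection application pool


def pool_utilization(instance_class: str, pool_max: int = POOL_MAX,
                     app_instances: int = 1) -> float:
    """Fraction of max_connections consumed if every pool is fully open."""
    return pool_max * app_instances / MAX_CONNECTIONS[instance_class]


if __name__ == "__main__":
    for cls in MAX_CONNECTIONS:
        print(f"{cls}: {pool_utilization(cls):.0%} with one app instance")
```

With three application instances (a possible outcome of the horizontal scaling in Section 10.3), three full 50-connection pools would exceed t3.small's 112 connections, so pool size must be revisited before scaling out.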

Memory Analysis:

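Component | Estimate
shared_buffers (25% of RAM) | 500 MB (2 GB instance) or 1 GB (4 GB instance)
work_mem per sort/hash operation | 10 MB (a complex query can use several)
Active work_mem (20 queries, one operation each) | 200 MB
OS and overhead | 500 MB
Total pressure | ~1.2 GB (2 GB instance) or ~1.7 GB (4 GB instance)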
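The memory-pressure total follows directly from the components. The sketch below reproduces it and adds one illustrative parameter, `ops_per_query`, because PostgreSQL can allocate `work_mem` once per sort or hash operation, so a single complex query may use it several times.

```python
# Estimate RDS PostgreSQL memory pressure from the Section 4.2 components (MB).
def memory_pressure_mb(
    ram_mb: int,
    active_queries: int = 20,
    work_mem_mb: int = 10,
    ops_per_query: int = 1,        # sort/hash operations per query; each may use work_mem
    os_overhead_mb: int = 500,
) -> int:
    shared_buffers_mb = ram_mb // 4            # shared_buffers at 25% of RAM
    work_mem_total_mb = active_queries * ops_per_query * work_mem_mb
    return shared_buffers_mb + work_mem_total_mb + os_overhead_mb


if __name__ == "__main__":
    for ram in (2_000, 4_000):     # db.t3.small, db.t3.medium
        print(f"{ram} MB RAM: ~{memory_pressure_mb(ram)} MB pressure")
```

On db.t3.small, three operations per query already bring pressure to 1,600 MB of 2,000 MB, one reason db.t3.medium is the recommended class.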

Storage Analysis:

Data Type | Estimated Size
User records (5,000) | 10 MB
Organization records | 1 MB
Ride history (1 year) | 5 GB
GPS traces (compressed, 12 months) | 20 GB
Audit logs (24 months) | 10 GB
Indexes | 10 GB
Total | ~50 GB (components sum to ~45 GB)
Provisioned | 100 GB
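The storage components can be summed and compared with the provisioned size and the RDS storage alert thresholds in Section 8.1 (80% warning, 90% critical). Sizes are in decimal MB; the sketch and its names are illustrative.

```python
# Sum the Section 4.2 storage estimates (MB) and compare with provisioned gp3 storage.
ESTIMATES_MB = {
    "user_records": 10,
    "organization_records": 1,
    "ride_history_1y": 5_000,
    "gps_traces_12m": 20_000,
    "audit_logs_24m": 10_000,
    "indexes": 10_000,
}
PROVISIONED_MB = 100_000
WARNING_PERCENT, CRITICAL_PERCENT = 80, 90


def storage_status() -> tuple[int, float, str]:
    """Return (total MB, percent of provisioned storage used, alert level)."""
    total_mb = sum(ESTIMATES_MB.values())
    used_percent = 100 * total_mb / PROVISIONED_MB
    if used_percent > CRITICAL_PERCENT:
        level = "CRITICAL"
    elif used_percent > WARNING_PERCENT:
        level = "WARNING"
    else:
        level = "OK"
    return total_mb, used_percent, level


if __name__ == "__main__":
    print(storage_status())
```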

4.3 Multi-AZ Configuration

Feature | Setting
Multi-AZ deployment | Enabled
Synchronous replication | Yes (automatic)
Automatic failover | Yes (typically 60-120 seconds)
Backup retention | 30 days
Point-in-time recovery | Enabled
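The settings in Sections 4.1 and 4.3 map onto parameters of the RDS `CreateDBInstance` API. The sketch below builds those parameters for boto3's `create_db_instance`; the instance identifier and engine version string are placeholders, and required values this document does not specify (master username and password handling, DB subnet group, security group IDs) must be supplied through `extra`. It is a starting point, not a complete provisioning script.

```python
# Build RDS CreateDBInstance parameters matching Sections 4.1 and 4.3.
def rds_instance_params(identifier: str, engine_version: str,
                        instance_class: str = "db.t3.medium") -> dict:
    return {
        "DBInstanceIdentifier": identifier,     # placeholder name
        "DBInstanceClass": instance_class,
        "Engine": "postgres",
        "EngineVersion": engine_version,        # a PostgreSQL 15.x version offered in the region
        "MultiAZ": True,                        # synchronous standby with automatic failover
        "StorageType": "gp3",
        "AllocatedStorage": 100,                # GB
        "BackupRetentionPeriod": 30,            # days; automated backups enable point-in-time recovery
        "PubliclyAccessible": False,            # Section 5.3: no public access
    }


def create_instance(params: dict, **extra) -> dict:
    # boto3 is imported here so the parameter builder works without the AWS SDK installed.
    import boto3
    return boto3.client("rds").create_db_instance(**params, **extra)
```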

4.4 Recommendation

Load Level | Instance Class | Reasoning
Conservative | db.t3.small (2 GB) | Sufficient for the workload
Recommended | db.t3.medium (4 GB) | Headroom for query complexity
Growth | db.t3.large (8 GB) | If adding analytics queries

5. Network Architecture

5.1 Cloudflare Configuration

Component | Configuration
DNS | Cloudflare (proxied)
DDoS protection | Included
Zero Trust | Enabled for admin access
Tunnel | Single tunnel to EC2
WebSocket support | Enabled

5.2 AWS Networking

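Component | Configuration
VPC | Single VPC, single region
Subnets | Private (app), Private (RDS)
NAT Gateway | Outbound internet access (Cloudflare Tunnel connections to the Cloudflare edge, third-party APIs); Secrets Manager and S3 traffic uses the VPC endpoints
Security Groups | App: no inbound rules, egress restricted; RDS: inbound PostgreSQL (5432) from the app security group only
VPC Endpoints | Secrets Manager, S3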
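In this layout the application instance needs no inbound rules, because the Cloudflare Tunnel connection is initiated outbound from EC2. The database security group still needs one inbound rule, admitting PostgreSQL traffic from the application's security group rather than from any CIDR range. A minimal sketch using boto3's `authorize_security_group_ingress`; the security group IDs are placeholders.

```python
# One inbound rule for the RDS security group: PostgreSQL from the app security group only.
def rds_ingress_from_app(rds_sg_id: str, app_sg_id: str, port: int = 5432) -> dict:
    return {
        "GroupId": rds_sg_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                # Reference the app security group instead of an IP range.
                "UserIdGroupPairs": [
                    {"GroupId": app_sg_id, "Description": "App tier to PostgreSQL"}
                ],
            }
        ],
    }


def apply_rule(rule: dict) -> dict:
    import boto3  # imported lazily so the builder can be tested without AWS access
    return boto3.client("ec2").authorize_security_group_ingress(**rule)
```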

5.3 No Public Exposure

Layer | Public Access
EC2 | No public IP
RDS | No public access
S3 | VPC endpoint only
Ingress | Cloudflare Tunnel only
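The EC2 and RDS rows can be checked periodically from the responses of the EC2 `DescribeInstances` and RDS `DescribeDBInstances` APIs. The function below is a minimal sketch that inspects the `PublicIpAddress` and `PubliclyAccessible` fields of those responses; it does not follow pagination, and the S3 and ingress rows are out of scope.

```python
# Flag resources that violate the "no public exposure" rules in Section 5.3.
def public_exposure_findings(ec2_response: dict, rds_response: dict) -> list[str]:
    findings = []
    for reservation in ec2_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            public_ip = instance.get("PublicIpAddress")
            if public_ip:
                findings.append(f"EC2 {instance['InstanceId']} has public IP {public_ip}")
    for db in rds_response.get("DBInstances", []):
        if db.get("PubliclyAccessible"):
            findings.append(f"RDS {db['DBInstanceIdentifier']} is publicly accessible")
    return findings


if __name__ == "__main__":
    import boto3
    print(public_exposure_findings(
        boto3.client("ec2").describe_instances(),
        boto3.client("rds").describe_db_instances(),
    ) or "no findings")
```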

6. WebSocket Capacity

6.1 Connection Limits

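Limit | Value | Notes
OS file descriptors (per process) | Distribution/systemd default (soft limit is often 1,024) | Raise to 100,000 (e.g. LimitNOFILE= in the systemd unit)
Swift NIO connections | No hard limit | Memory-bound
Cloudflare Tunnel concurrency | 100+ per tunnel | Verify against current Cloudflare limits that the 5,000-connection target is supported
Target connections | 5,000 | Workload requirement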
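Whether the running server actually received the raised limit can be verified on Linux from `/proc/<pid>/limits`, whose "Max open files" row shows the soft and hard limits. This sketch parses that row and compares the soft limit with the 100,000 target; the function names are illustrative, and setting the limit itself belongs in the service configuration (for example `LimitNOFILE=` in a systemd unit).

```python
from pathlib import Path
from typing import Union

Limit = Union[int, float]   # float("inf") represents "unlimited"


def _to_number(value: str) -> Limit:
    return float("inf") if value == "unlimited" else int(value)


def parse_open_files_limit(limits_text: str) -> tuple[Limit, Limit]:
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]   # fields after the three-word label
            return _to_number(soft), _to_number(hard)
    raise ValueError("no 'Max open files' row found")


def has_fd_headroom(pid: int, required: int = 100_000) -> bool:
    """True if the process's soft open-files limit meets the target (Linux only)."""
    soft, _ = parse_open_files_limit(Path(f"/proc/{pid}/limits").read_text())
    return soft >= required
```

Calling `has_fd_headroom` with the Vapor process's PID and getting `False` would explain connection failures well before 5,000 sockets are open.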

6.2 GPS Update Flow

Driver App → Cloudflare Tunnel → EC2 (Swift Vapor) → RDS (batch write)
            ↑ WebSocket (persistent)              ↑ Every 30 seconds
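
Stage | Latency Target
Client to Cloudflare | < 50 ms
Cloudflare to EC2 | < 10 ms
EC2 processing | < 5 ms
Batch DB write | < 50 ms
Total | < 115 ms of processing; a position may wait up to 30 seconds in the batch before it is written to RDS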
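The batch-write stage can be sketched as a buffer that hands accumulated GPS updates to a writer once the 30-second interval has elapsed. This is a minimal Python illustration of the pattern, not the Swift Vapor implementation; `GPSBatcher` and the `write_batch` callback are hypothetical, and a production version would also flush on a timer so the last updates are not held when traffic stops. At 1,000 updates per second, each flush carries roughly 30,000 updates.

```python
import time
from typing import Any, Callable


class GPSBatcher:
    """Buffer GPS updates and pass them to `write_batch` at most once per interval."""

    def __init__(self, write_batch: Callable[[list[Any]], None], interval_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._write_batch = write_batch   # e.g. a multi-row INSERT into RDS
        self._interval_s = interval_s
        self._clock = clock               # injectable for testing
        self._buffer: list[Any] = []
        self._last_flush = clock()

    def add(self, update: Any) -> None:
        self._buffer.append(update)
        if self._clock() - self._last_flush >= self._interval_s:
            self.flush()

    def flush(self) -> None:
        # Swap the buffer out before writing so new updates go to a fresh list.
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._write_batch(batch)
        self._last_flush = self._clock()
```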

6.3 Memory per Connection

Component | Size
Swift NIO channel | ~10 KB
Application state | ~20 KB
Buffers | ~20 KB
Total per connection | ~50 KB
5,000 connections | ~250 MB

7. Secrets Management

7.1 AWS Secrets Manager

Secret | Rotation
RDS credentials | 30 days (automatic)
JWT signing key | Manual (on compromise)
Cloudflare API token | Manual
Third-party API keys | Per provider policy

7.2 Application Secret Loading

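Timing | Behavior
Boot | Load all secrets from Secrets Manager
Runtime | Cached in memory
Rotation | Restart required unless a refresh is implemented; because RDS credentials rotate automatically every 30 days, the application must refresh or restart after each rotation, or new database connections may fail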
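One way to implement the refresh is a small cache with a time-to-live, plus explicit invalidation when the database rejects the cached credentials. The sketch below is illustrative: `SecretCache` is hypothetical, the fetch function is injected so it can be tested without AWS, and `fetch_from_secrets_manager` uses boto3's `get_secret_value`. The TTL should be well below the 30-day rotation interval.

```python
import time
from typing import Callable


class SecretCache:
    """Cache secret values in memory and refetch them after `ttl_s` seconds."""

    def __init__(self, fetch: Callable[[str], str], ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}   # name -> (value, fetched_at)

    def get(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is None or now - entry[1] >= self._ttl_s:
            entry = (self._fetch(name), now)
            self._entries[name] = entry
        return entry[0]

    def invalidate(self, name: str) -> None:
        # Call after an authentication failure so the next get() refetches immediately.
        self._entries.pop(name, None)


def fetch_from_secrets_manager(name: str) -> str:
    import boto3  # imported lazily; a long-lived client would be reused in practice
    return boto3.client("secretsmanager").get_secret_value(SecretId=name)["SecretString"]
```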

8. Monitoring and Alerting

8.1 CloudWatch Metrics

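Metric | Warning | Critical
EC2 CPU | > 70% | > 90%
EC2 memory (requires the CloudWatch agent) | > 80% | > 95%
RDS CPU | > 70% | > 85%
RDS connections | > 80 | > 100
RDS storage used | > 80% | > 90%
WebSocket connections (custom application metric) | > 4,500 | > 4,900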

8.2 Application Metrics

Metric | Warning | Critical
API latency p99 | > 500 ms | > 1,000 ms
Error rate | > 1% | > 5%
WebSocket disconnects/min | > 100 | > 500
GPS update lag | > 10 s | > 30 s
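Both threshold tables follow the same rule: a value above the warning level warns, and a value above the critical level is critical. The sketch below encodes them in one lookup; the metric keys are illustrative names, not CloudWatch metric names.

```python
# (warning, critical) thresholds from Sections 8.1 and 8.2; alerts fire when value > threshold.
THRESHOLDS = {
    "ec2_cpu_percent": (70, 90),
    "ec2_memory_percent": (80, 95),
    "rds_cpu_percent": (70, 85),
    "rds_connections": (80, 100),
    "rds_storage_percent": (80, 90),
    "websocket_connections": (4_500, 4_900),
    "api_latency_p99_ms": (500, 1_000),
    "error_rate_percent": (1, 5),
    "websocket_disconnects_per_min": (100, 500),
    "gps_update_lag_s": (10, 30),
}


def classify(metric: str, value: float) -> str:
    """Return OK, WARNING, or CRITICAL for a metric value."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"
```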

9. Cost Estimation

9.1 Monthly Costs (us-east-1, On-Demand)

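Resource | Specification | Monthly Cost
EC2 r6g.xlarge | 730 hours | ~$150
RDS db.t3.medium Multi-AZ | 730 hours | ~$100
RDS storage (100 GB gp3) | Multi-AZ | ~$25
NAT Gateway | Hourly charge + data processing | ~$50
Secrets Manager | 5 secrets | ~$3
CloudWatch | Metrics + logs | ~$30
S3 (AMIs, backups) | ~50 GB | ~$2
Total | | ~$360/month (excludes EC2 EBS volumes and interface VPC endpoint charges)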

9.2 Reserved Instance Savings

Commitment | EC2 Instance Savings | RDS Instance Savings
1-year RI | ~30% | ~30%
3-year RI | ~50% | ~50%
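The monthly total and the effect of reserved pricing can be reproduced from these tables. Reservations discount instance hours only, so the sketch applies the savings percentages to the EC2 and RDS instance lines and leaves storage, NAT, and the other items at their on-demand estimates; the figures are the document's approximations, not quoted prices, and the dictionary keys are illustrative.

```python
from typing import Optional

# Section 9.1 estimates, USD per month at on-demand rates.
MONTHLY_ON_DEMAND_USD = {
    "ec2_r6g_xlarge": 150,
    "rds_db_t3_medium_multi_az": 100,
    "rds_storage_gp3_100gb": 25,
    "nat_gateway": 50,
    "secrets_manager": 3,
    "cloudwatch": 30,
    "s3": 2,
}

# Section 9.2 savings, applied to instance hours only.
RESERVED_SAVINGS = {
    "1-year": {"ec2_r6g_xlarge": 0.30, "rds_db_t3_medium_multi_az": 0.30},
    "3-year": {"ec2_r6g_xlarge": 0.50, "rds_db_t3_medium_multi_az": 0.50},
}


def monthly_total(commitment: Optional[str] = None) -> float:
    """Estimated monthly cost, optionally with a reserved-instance commitment."""
    total = float(sum(MONTHLY_ON_DEMAND_USD.values()))
    if commitment is not None:
        for item, saving in RESERVED_SAVINGS[commitment].items():
            total -= MONTHLY_ON_DEMAND_USD[item] * saving
    return total


if __name__ == "__main__":
    for option in (None, "1-year", "3-year"):
        print(option or "on-demand", f"~${monthly_total(option):.0f}/month")
```

Under these figures a 1-year commitment brings the estimate to about $285/month and a 3-year commitment to about $235/month.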

10. Scaling Triggers

10.1 When to Scale Up

Metric | Threshold | Action
Sustained CPU > 80% | 1 hour | Consider a larger instance
Memory > 90% | Sustained | Upgrade instance class
RDS connections > 100 | Sustained | Review connection pool sizing or move to a larger instance class
WebSocket > 4,500 | Approaching the 5,000 target | Plan for horizontal scaling

10.2 When to Consider Horizontal Scaling

Trigger | Description
Users > 10,000 | Single-instance limits approaching
WebSocket > 8,000 | Connection density limits
Regulatory | Data residency requirements
Availability | Business requires an RTO under 5 minutes

10.3 Horizontal Scaling Architecture (Future)

If scaling beyond a single instance:

Component | Strategy
API | ALB + multiple EC2 instances
WebSocket | Sticky sessions or Redis pub/sub
Database | Read replicas for queries
Sessions | Already stateless (JWT)

11. Document References

Document | Relevance
PLCY-DRP-001 Disaster Recovery Plan | Recovery procedures
PLCY-SEC-001 Security Controls | Security requirements
PLCY-RSK-001 Risk Assessment | Capacity risks

12. Revision History

Version | Date | Author | Changes
1.0 | December 22, 2025 | Infrastructure Director | Initial release

CONFIDENTIAL - Internal Use Only - Hop And Haul Policy Documentation