Quota & regional capacity
GPT-4o capacity is allocated per model and per region as a tokens-per-minute (TPM) quota, and the region you need for data residency may have limited or waitlisted capacity. Without provisioned throughput and a fallback plan, traffic spikes that exceed the quota are rejected with HTTP 429 (Too Many Requests) responses, and users see failures.
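
One possible shape for the fallback plan is sketched below: retry a throttled deployment with backoff, then fail over to the next deployment in an ordered list. This is a minimal illustration, not a definitive implementation. The names `Response`, `call_with_fallback`, `AllDeploymentsThrottled`, and the `send` callable are hypothetical and stand in for whatever client you actually use; the retry counts and delays are placeholder values.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Response:
    """Hypothetical, simplified view of an HTTP response."""
    status: int                          # HTTP status code
    body: str = ""
    retry_after: Optional[float] = None  # seconds, from a Retry-After header if present


class AllDeploymentsThrottled(Exception):
    """Raised when every deployment kept returning 429."""


def call_with_fallback(
    deployments: Sequence[str],
    send: Callable[[str], Response],
    max_attempts_per_deployment: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Try each deployment in order; on 429, back off and retry, then fail over.

    `deployments` is an ordered list of deployment names (assumed to be
    primary first), and `send` is a caller-supplied function that issues
    one request against the named deployment.
    """
    for name in deployments:
        for attempt in range(max_attempts_per_deployment):
            resp = send(name)
            if resp.status != 429:
                # Success or a non-throttling error: let the caller handle it.
                return resp
            if attempt == max_attempts_per_deployment - 1:
                break  # out of retries here; move on to the next deployment
            # Prefer the server's hint; otherwise use capped exponential
            # backoff with full jitter.
            backoff = min(max_delay, base_delay * 2 ** attempt)
            if resp.retry_after is not None:
                delay = resp.retry_after
            else:
                delay = random.uniform(0, backoff)
            sleep(delay)
    raise AllDeploymentsThrottled(
        f"all {len(deployments)} deployment(s) returned 429"
    )
```

If residency is a hard requirement, the `deployments` list should contain only deployments in regions that satisfy it, so the failover path never routes data somewhere it is not allowed to go.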