Every quarter, we benchmark every major image generation model against real production workloads from our platform. These are not synthetic tests but actual jobs from customers generating AI headshots at scale.
This quarter, we tested 8 models across 12,000 inference jobs, scoring each on quality (FID, CLIP score, human eval), cost per image, and p95 latency. Here’s the full breakdown.