AI Workflows — Chunk 234

Understanding Rate Limits

APIs have limits:

- Requests per minute (RPM)
- Tokens per minute (TPM)

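Example: the Claude API might allow 10 requests per second (600 RPM) and 100k tokens per minute. These figures are illustrative; actual limits vary by provider, model, and account tier.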

In most cases, personal and small-business workflows won't hit these limits. But high-volume workflows might.

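If your workflow runs 100 times per minute, and each call uses 600 tokens, that's 60,000 tokens per minute, or 60% of the 100k TPM limit in the example above. That leaves some headroom, but not much: a spike in volume or in tokens per call could push you over the limit and cause failures.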
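To check your own workflow against a limit, the arithmetic above can be sketched in a few lines of Python. The function names are made up for illustration, and the 100,000 TPM limit is the example figure from this lesson, not a real quota.

```python
def tokens_per_minute(calls_per_minute: int, tokens_per_call: int) -> int:
    """Estimate token throughput from call volume and average call size."""
    return calls_per_minute * tokens_per_call


def utilization(used: int, limit: int) -> float:
    """Fraction of the limit in use (0.6 means 60%)."""
    return used / limit


# Figures from the example above; the limit is an assumed, illustrative value.
TPM_LIMIT = 100_000

usage = tokens_per_minute(calls_per_minute=100, tokens_per_call=600)
share = utilization(usage, TPM_LIMIT)

print(f"{usage:,} TPM, {share:.0%} of the limit")  # 60,000 TPM, 60% of the limit
```

Swap in your own call volume, average tokens per call, and the limit shown in your provider's dashboard.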

Solutions:

- Add delays: Put a 1-second delay between API calls. This spaces them out.
- Use exponential backoff: If a request fails due to rate limits, wait, then retry. Double the wait time each retry.
- Distribute load: Instead of all 100 calls happening in the same minute, spread them across 5 minutes.
- Request higher limits: Contact OpenAI or Anthropic and ask for increased rate limits. They usually grant them if you're a paying customer.
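Here is a minimal sketch of two of the solutions above: exponential backoff and spreading calls over a time window. It assumes your API client raises some "rate limited" error. `RateLimitError`, `call_with_backoff`, and `spacing_seconds` are hypothetical names for illustration, not part of any provider's SDK, so catch whatever exception your client actually raises.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's 'too many requests' error."""


def call_with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on a rate-limit error, wait and retry, doubling the wait each time."""
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries, let the caller handle it
            sleep(delay)
            delay *= 2  # 1s, 2s, 4s, ...


def spacing_seconds(total_calls, window_seconds):
    """Delay between calls needed to spread them evenly over a window."""
    return window_seconds / total_calls


# Demo: a fake call that is rate limited twice, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("slow down")
    return "ok"


waits = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky_call, sleep=waits.append)
print(result, waits)  # ok [1.0, 2.0]

# 100 calls spread across 5 minutes means one call every 3 seconds.
print(spacing_seconds(100, 5 * 60))  # 3.0
```

In a real workflow, leave `sleep` at its default so the function actually pauses. The demo passes `waits.append` only so it runs instantly and shows the doubling waits.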

Lesson 6: Cost and Rate Limits — The Bill at the End of the Month
