Rate Limits

The Promptly API enforces monthly usage limits based on your plan tier. All limits are per API key.

Plan tiers

Plan	Monthly requests	Price
Free	5,000	$0
Pro	50,000	$29/month

Rate limit headers

Every API response includes headers with your current usage:

Header	Description
`X-RateLimit-Limit`	Your monthly request quota
`X-RateLimit-Remaining`	Requests remaining this month
`X-RateLimit-Reset`	Unix timestamp when the quota resets

Reading headers

const response = await fetch(
  'https://api.promptlycms.com/prompts/my-prompt',
  {
    headers: { Authorization: 'Bearer pk_live_...' },
  },
);

// Header values are strings, or null if a header is missing.
const limit = response.headers.get('X-RateLimit-Limit');
const remaining = response.headers.get('X-RateLimit-Remaining');
const reset = response.headers.get('X-RateLimit-Reset');

console.log(`${remaining}/${limit} requests remaining`);
// X-RateLimit-Reset is in seconds; Date expects milliseconds.
console.log(`Resets at ${new Date(Number(reset) * 1000).toISOString()}`);

curl -i https://api.promptlycms.com/prompts/my-prompt \
  -H "Authorization: Bearer pk_live_..."

# Response headers:
# X-RateLimit-Limit: 5000
# X-RateLimit-Remaining: 4832
# X-RateLimit-Reset: 1735689600

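from datetime import datetime, timezone

import requests

response = requests.get(
    "https://api.promptlycms.com/prompts/my-prompt",
    headers={"Authorization": "Bearer pk_live_..."},
)

# Header values arrive as strings; convert them before doing arithmetic.
limit = int(response.headers["X-RateLimit-Limit"])
remaining = int(response.headers["X-RateLimit-Remaining"])
reset = int(response.headers["X-RateLimit-Reset"])

print(f"{remaining}/{limit} requests remaining")
# X-RateLimit-Reset is a Unix timestamp in seconds.
print(f"Resets at {datetime.fromtimestamp(reset, tz=timezone.utc).isoformat()}")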

429 responses

When you exceed your usage limit, the API returns a 429 status with usage details and an upgrade link:

{
  "error": "Usage limit exceeded",
  "code": "USAGE_LIMIT_EXCEEDED",
  "usage": {
    "used": 5000,
    "limit": 5000
  },
  "upgradeUrl": "https://app.promptlycms.com/settings?upgrade"
}

Field	Type	Description
`error`	`string`	Human-readable error message
`code`	`string`	Always `USAGE_LIMIT_EXCEEDED`
`usage`	`object`	Current usage and limit counts
`upgradeUrl`	`string`	Direct link to upgrade your plan
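
One way to handle this response in client code is to turn it into a typed exception, so callers can decide whether to fall back to a cached prompt or surface the upgrade link. The sketch below is illustrative only: `UsageLimitError` and `check_usage_limit` are hypothetical names, not part of any Promptly SDK, and the code assumes the body has exactly the fields documented above.

```python
class UsageLimitError(Exception):
    """Raised when the API reports that the monthly quota is used up."""

    def __init__(self, used: int, limit: int, upgrade_url: str):
        super().__init__(f"Usage limit exceeded: {used}/{limit} requests used")
        self.used = used
        self.limit = limit
        self.upgrade_url = upgrade_url


def check_usage_limit(status_code: int, body: dict) -> None:
    """Raise UsageLimitError for a 429 response with the documented body.

    Any other status code or error code is ignored here and left for the
    caller to handle.
    """
    if status_code == 429 and body.get("code") == "USAGE_LIMIT_EXCEEDED":
        usage = body.get("usage", {})
        raise UsageLimitError(
            used=usage.get("used", 0),
            limit=usage.get("limit", 0),
            upgrade_url=body.get("upgradeUrl", ""),
        )
```

With `requests`, this could be called as `check_usage_limit(response.status_code, response.json())` right after the request, before the prompt content is read.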

Best practices

Cache responses. Prompt content doesn’t change between publishes. Cache the response and only re-fetch when you deploy or when you know the prompt has been updated.
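
A minimal in-memory cache along these lines might look like the sketch below. `PromptCache` is a hypothetical helper, not an official client; the `fetch` callable stands in for whatever function performs the actual API request, and the cache assumes a single process with no expiry other than explicit invalidation.

```python
from typing import Callable, Dict, Optional


class PromptCache:
    """Caches prompt responses until they are explicitly invalidated."""

    def __init__(self, fetch: Callable[[str], dict]):
        # `fetch` takes a prompt ID and returns the parsed API response.
        self._fetch = fetch
        self._store: Dict[str, dict] = {}

    def get(self, prompt_id: str) -> dict:
        # Only the first lookup for a prompt ID triggers an API request.
        if prompt_id not in self._store:
            self._store[prompt_id] = self._fetch(prompt_id)
        return self._store[prompt_id]

    def invalidate(self, prompt_id: Optional[str] = None) -> None:
        # Call after a publish or deploy; with no argument, clears everything.
        if prompt_id is None:
            self._store.clear()
        else:
            self._store.pop(prompt_id, None)
```

Repeated `get` calls for the same prompt then cost one request against the monthly quota instead of one per call.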

Fetch at startup or per request. With response times of 50-80 ms, fetching a prompt on each request is practical, but every fetch counts against your monthly quota. For high-throughput workloads, fetch prompts once at startup and hold them in memory.

Monitor usage. Check the `X-RateLimit-Remaining` header periodically so you can react before the quota runs out.
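
As a rough sketch of such a check, the function below parses the three rate limit headers and flags when the remaining quota drops below a threshold. `read_usage`, `Usage`, and the 10% default for `warn_fraction` are illustrative choices, not something the API prescribes.

```python
from datetime import datetime, timezone
from typing import Mapping, NamedTuple


class Usage(NamedTuple):
    limit: int
    remaining: int
    resets_at: datetime
    is_low: bool


def read_usage(headers: Mapping[str, str], warn_fraction: float = 0.1) -> Usage:
    """Parse the rate limit headers and flag a low remaining quota."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    # X-RateLimit-Reset is a Unix timestamp in seconds.
    resets_at = datetime.fromtimestamp(
        int(headers["X-RateLimit-Reset"]), tz=timezone.utc
    )
    # Low means at or below the chosen fraction of the monthly quota.
    is_low = remaining <= limit * warn_fraction
    return Usage(limit, remaining, resets_at, is_low)
```

Passing `response.headers` from `requests` works directly because that mapping is case-insensitive; a plain `dict` needs the header names spelled exactly as shown. When `is_low` is true, an application could log a warning or switch to cached prompts until `resets_at`.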

Use batch fetching. If you need multiple prompts, use `GET /prompts` to fetch all of them in a single request rather than making an individual request for each one.