Inference Errors
Complete error reference for the Skytells Inference API — all error_id values, HTTP status codes, and resolution guides.
Inference API errors
This catalog extends API errors. That page defines shared HTTP semantics, authentication, credits, 429 rate limiting, and client patterns for the API as a whole.
Inference sub-APIs such as Chat Completions, Responses, and Embeddings return failures as an Inference error response (a nested error field — see Fields on the nested error object on that page). Branch on error.error_id (and include error.request_id when contacting support), not on free-form messages.
Branch on error.error_id in your code — never on error.message. The error_id is a stable uppercase identifier. Messages are human-readable and may change between releases.
Using request_id
If you contact support or need to investigate a failed request, include error.request_id. It uniquely identifies the request in Skytells logs and allows the support team to trace the full execution path.
Error schema
Inference failures return an OpenAI-compatible JSON body with a top-level error object. Field reference: Inference error response. Branch on error.error_id.
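As a sketch, a failed request's body has the following shape. The field names match the reference above; the values here (including the message text and request ID) are illustrative only, and message wording may change between releases:

```json
{
  "error": {
    "error_id": "INVALID_PARAMETER",
    "type": "invalid_request_error",
    "code": "invalid_parameter",
    "message": "messages must be an array",
    "param": "messages",
    "request_id": "req_abc123"
  }
}
```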
Inference Error Catalog
The Inference API extends the API errors catalog with the following errors:
Authentication & access
UNAUTHORIZED — 401
| Field | Value |
|---|---|
| HTTP | 401 |
| type | authentication_error |
| code | invalid_api_key (typical) |
When: x-api-key is missing or not accepted.
Resolution: Verify the key in the Console. Shared semantics: 401 — Unauthorized.
FORBIDDEN — 403
| Field | Value |
|---|---|
| HTTP | 403 |
| type | Typically permission_error or invalid_request_error |
| code | Varies |
When: The key is syntactically valid but the request is not allowed (plan, route, or account state).
Resolution: Check model access and account status; see 403 — Forbidden. Do not confuse with 401 (UNAUTHORIZED).
Request validation
Invalid JSON or parameters align with 400 — Bad Request / Infrastructure. Inference encodes them with the nested fields below.
INVALID_REQUEST — 400
| Field | Value |
|---|---|
| HTTP | 400 |
| type | invalid_request_error |
| code | invalid_request |
When: The request body is malformed JSON or has an invalid shape.
Resolution: Validate your JSON payload and ensure it matches the endpoint schema. Set Content-Type: application/json.
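One way to rule out INVALID_REQUEST before the request leaves your machine is to build the payload as a native object and serialize it with the json module rather than hand-writing a JSON string. A minimal sketch (the API key is a placeholder):

```python
import json

# Build the payload as a Python dict, not a hand-written string.
payload = {
    "model": "deepbrain-router",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# json.dumps guarantees well-formed JSON; hand-built strings are the
# usual source of INVALID_REQUEST responses.
body = json.dumps(payload)

headers = {
    "Content-Type": "application/json",
    "x-api-key": "YOUR_SKYTELLS_API_KEY",
}
```

Send `body` with those headers using whichever HTTP client you prefer; if the payload round-trips through `json.loads`, a 400 almost certainly points at the schema (check error.param), not at malformed JSON.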
INVALID_PARAMETER — 400
| Field | Value |
|---|---|
| HTTP | 400 |
| type | invalid_request_error |
| code | invalid_parameter |
When: A required parameter is missing or a parameter has an invalid value (e.g. missing model, messages not an array).
Resolution: Check error.param for the offending field. Review the request schema.
MODEL_NOT_FOUND — 404
| Field | Value |
|---|---|
| HTTP | 404 |
| type | server_error |
| code | model_not_found |
When: The model namespace provided is unknown or not available on the Inference API.
Resolution: Use a valid namespace from the Text Models catalog: deepbrain-router, gpt-5, gpt-5.4, or llama-3.1-8b. See also 404 — Not Found for the same MODEL_NOT_FOUND pattern on other routes.
ENDPOINT_NOT_FOUND — 404
| Field | Value |
|---|---|
| HTTP | 404 |
| type | server_error |
| code | endpoint_not_found |
When: The requested path does not match any supported endpoint.
Resolution: Use a supported endpoint: /v1/chat/completions, /v1/responses, or /v1/embeddings. See the Inference overview.
Billing & Credits
INSUFFICIENT_CREDITS — 402
| Field | Value |
|---|---|
| HTTP | 402 |
| type | server_error |
| code | insufficient_credits |
When: Your account balance is insufficient for the request (including a safety margin).
Resolution: Same as 402 — Payment Required (INSUFFICIENT_CREDITS / credits). Add credits in Billing or reduce max_tokens / prompt size.
CREDIT_CHECK_FAILED — 402
| Field | Value |
|---|---|
| HTTP | 402 |
| type | server_error |
| code | credit_check_failed |
When: The credit-verification service was temporarily unreachable and could not confirm your balance.
Resolution: Retry the request. If the problem persists, contact Support with error.request_id.
Rate Limiting
RATE_LIMITED — 429
| Field | Value |
|---|---|
| HTTP | 429 |
| type | rate_limit_error |
| code | rate_limit_exceeded |
When: Too many requests sent in a short time window.
Resolution: Follow 429 — Too Many Requests and Rate limits (Retry-After, x-skytells-ratelimit-*, backoff). Inference uses error.error_id: RATE_LIMITED with error.code: rate_limit_exceeded.
INFERENCE_RATE_LIMITED — 503
| Field | Value |
|---|---|
| HTTP | 503 |
| type | service_unavailable_error |
| code | service_unavailable |
When: The inference layer is temporarily busy or rate-limited at the infrastructure level.
Resolution: Retry with exponential backoff. Consider reducing concurrency. This is transient.
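A minimal backoff loop for the retryable cases above might look like the following. This is a sketch, not an official client: the delay schedule is illustrative, and `make_request` stands in for your real SDK call, raising a `ValueError` whose message is the error_id instead of a real SDK exception:

```python
import random
import time

# Error IDs this page marks safe to retry after a pause.
RETRYABLE = {"RATE_LIMITED", "INFERENCE_RATE_LIMITED", "SERVICE_UNAVAILABLE"}

def call_with_backoff(make_request, max_attempts=5, base_delay=1.0):
    """Retry make_request() on retryable error_ids with jittered exponential backoff.

    make_request must either return a result or raise a ValueError whose
    message is the error_id (a stand-in for real SDK exception handling).
    """
    for attempt in range(max_attempts):
        try:
            return make_request()
        except ValueError as exc:
            error_id = str(exc)
            if error_id not in RETRYABLE or attempt == max_attempts - 1:
                raise
            # Exponential backoff with full jitter: 1s, 2s, 4s, ... capped at 30s.
            delay = min(base_delay * 2 ** attempt, 30.0)
            time.sleep(delay * random.random())
```

In production, prefer honoring a Retry-After header (when present) over the computed delay, and cap total retry time so a sustained outage fails fast.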
Inference (Sanitized)
All inference-level errors are sanitized — no upstream or infrastructure details are ever exposed.
INFERENCE_TIMEOUT — 504
| Field | Value |
|---|---|
| HTTP | 504 |
| type | timeout_error |
| code | timeout_error |
When: The inference request exceeded the maximum allowed time.
Resolution: Retry the request. If it persists, reduce max_tokens or shorten the prompt. Avoid extremely long generations.
SERVICE_UNAVAILABLE — 503
| Field | Value |
|---|---|
| HTTP | 503 |
| type | service_unavailable_error |
| code | service_unavailable |
When: The inference service is temporarily unavailable.
Resolution: Retry with exponential backoff. Check status.skytells.ai for active incidents.
INFERENCE_ERROR — 500
| Field | Value |
|---|---|
| HTTP | 500 |
| type | server_error |
| code | service_error |
When: Inference failed for an unspecified reason (sanitized upstream error).
Resolution: Retry the request. If the issue persists, report error.request_id to Support.
INTERNAL_ERROR — 500
| Field | Value |
|---|---|
| HTTP | 500 |
| type | server_error |
| code | service_error |
When: An unexpected error occurred in the Skytells gateway.
Resolution: Retry the request. If it persists for more than a few minutes, check status.skytells.ai and report error.request_id to Support.
Safety & Policy
CONTENT_POLICY_VIOLATION — 422
| Field | Value |
|---|---|
| HTTP | 422 |
| type | invalid_request_error |
| code | content_policy_violation |
When: The request was blocked by Skytells' safety or content policy.
Resolution: Modify or rephrase the prompt; do not retry with the same content. See 422 — Unprocessable Entity for related API-level error_id values and Responsible AI.
Retry guidance
| Error ID | Safe to retry? |
|---|---|
| RATE_LIMITED | ✅ Yes — honor Retry-After, then back off |
| INFERENCE_RATE_LIMITED | ✅ Yes — with exponential backoff |
| SERVICE_UNAVAILABLE | ✅ Yes — with exponential backoff |
| INFERENCE_TIMEOUT | ✅ Yes — consider reducing max_tokens |
| INFERENCE_ERROR | ✅ Yes — if persistent, report request_id |
| INTERNAL_ERROR | ✅ Yes — if persistent, report request_id |
| CREDIT_CHECK_FAILED | ✅ Yes — transient verification failure |
| INVALID_REQUEST | ❌ Fix the request first |
| INVALID_PARAMETER | ❌ Fix the parameter first |
| MODEL_NOT_FOUND | ❌ Use a valid model namespace |
| ENDPOINT_NOT_FOUND | ❌ Use a supported endpoint path |
| UNAUTHORIZED | ❌ Fix the API key first |
| FORBIDDEN | ❌ Check permissions first |
| INSUFFICIENT_CREDITS | ❌ Add credits before retrying |
| CONTENT_POLICY_VIOLATION | ❌ Do not retry with the same content |
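The guidance above reduces to a small lookup you can reuse in client code. A sketch, with the error-ID sets copied from this catalog (RATE_LIMITED is retryable per the rate-limiting section, after honoring Retry-After):

```python
# Safe to retry after backing off.
RETRYABLE = {
    "RATE_LIMITED",
    "INFERENCE_RATE_LIMITED",
    "SERVICE_UNAVAILABLE",
    "INFERENCE_TIMEOUT",
    "INFERENCE_ERROR",
    "INTERNAL_ERROR",
    "CREDIT_CHECK_FAILED",
}

# Fix the request, key, credits, or content first; retrying verbatim will fail again.
NON_RETRYABLE = {
    "INVALID_REQUEST",
    "INVALID_PARAMETER",
    "MODEL_NOT_FOUND",
    "ENDPOINT_NOT_FOUND",
    "UNAUTHORIZED",
    "FORBIDDEN",
    "INSUFFICIENT_CREDITS",
    "CONTENT_POLICY_VIOLATION",
}

def is_retryable(error_id: str) -> bool:
    """Return True only for error_ids this catalog marks safe to retry."""
    return error_id in RETRYABLE
```

Treat unknown error_ids as non-retryable by default; new identifiers may be added in future releases.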
Handling Errors in Code
Inference error handling
```python
import time

from openai import OpenAI, APIStatusError

client = OpenAI(
    api_key="YOUR_SKYTELLS_API_KEY",
    base_url="https://api.skytells.ai/v1",
)

try:
    response = client.chat.completions.create(
        model="deepbrain-router",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
except APIStatusError as e:
    body = e.response.json().get("error", {})
    error_id = body.get("error_id")
    request_id = body.get("request_id")
    if error_id == "INSUFFICIENT_CREDITS":
        print("Top up your balance at console.skytells.ai/billing")
    elif error_id == "RATE_LIMITED":
        # Implement exponential backoff before retrying
        time.sleep(2)
    elif error_id in ("INFERENCE_ERROR", "INTERNAL_ERROR"):
        print(f"Transient error — retry. request_id={request_id}")
    elif error_id == "CONTENT_POLICY_VIOLATION":
        print("Content blocked — modify your prompt")
    else:
        print(f"Unhandled error: {error_id} — {body.get('message')}")
```