Intermediate · 25 min · Module 2 of 5

Errors & Incident Investigation

Track application errors, read event patterns and stack information, and correlate spikes with deployments or traffic changes.

What you'll be able to do after this module

Open an error in Cognition, understand what it's telling you, correlate it with a deployment, and move toward a fix — all without needing to comb through raw log files.


How Cognition tracks errors

Cognition collects errors from your running apps and surfaces them as events with associated metadata: timing, frequency, stack information, and request context. Errors that happen repeatedly are grouped — you see one entry per error pattern rather than one entry per occurrence.

This matters because in production, a single bad deployment can produce thousands of error events in minutes. Grouping lets you immediately see that it's one problem, not a thousand separate problems.
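
To make the idea of grouping concrete, here is a minimal shell sketch. It does not show how Cognition groups events internally, which this module does not describe. It only collapses identical message lines into one row per pattern with a count, using the standard `sort`, `uniq`, and `awk` tools. The function name and sample messages are made up for illustration.

```shell
# Illustration only: collapse repeated error lines into one row per pattern.
# Reads one error message per line on stdin and prints "<count> <message>",
# most frequent pattern first.
group_errors() {
  sort | uniq -c | sort -rn |
    awk '{ n = $1; $1 = ""; sub(/^ /, ""); print n, $0 }'
}

# Hypothetical sample events: three occurrences of one pattern, one of another.
printf '%s\n' \
  'TypeError: Cannot read properties of undefined' \
  'Error: Connection refused' \
  'TypeError: Cannot read properties of undefined' \
  'TypeError: Cannot read properties of undefined' |
  group_errors
```

Four events become two rows, which is the same effect the Errors view gives you at a much larger scale.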


Reading the Errors view

Open Cognition → Errors from the sidebar.

What you see:

  • A list of error events, grouped by pattern.
  • Each row shows the error type, how many times it occurred, the first and last occurrence times, and which app it came from.
  • Clicking a row opens the detail view with the full stack trace, request context, and timing breakdown.

What to look at first

Frequency and timing together tell you whether an error is new or chronic:

| Pattern | What it likely means |
| --- | --- |
| First seen: today, high count | Something changed recently; check recent deployments |
| First seen: weeks ago, steady count | Known issue not yet fixed |
| First seen: today, count of 1 | Possibly transient; watch to see if it recurs |
| Count jumped suddenly | Traffic spike, or a bad deployment |
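
The first three rows of the table can be sketched as a small shell function. The thresholds are assumptions chosen for illustration, not Cognition defaults: "today" is treated as under 24 hours since first seen, and "high count" as 100 or more occurrences. The function name `triage_hint` is hypothetical. The fourth row (a sudden jump) needs a previous count to compare against, so it is left out.

```shell
# Sketch of the triage table. Thresholds are assumptions, not Cognition defaults:
#   "today"      = first seen less than 24 hours ago
#   "high count" = 100 or more occurrences
# Usage: triage_hint HOURS_SINCE_FIRST_SEEN COUNT
triage_hint() (
  hours_since_first_seen=$1
  count=$2
  if [ "$hours_since_first_seen" -lt 24 ]; then
    if [ "$count" -ge 100 ]; then
      echo "new and frequent: check recent deployments"
    elif [ "$count" -eq 1 ]; then
      echo "possibly transient: watch for recurrence"
    else
      echo "new: keep watching the count"
    fi
  else
    echo "chronic: known issue not yet fixed"
  fi
)

triage_hint 2 4800   # first seen 2 hours ago, 4800 occurrences
triage_hint 504 37   # first seen 3 weeks ago, 37 occurrences
triage_hint 5 1      # first seen 5 hours ago, seen once
```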

Investigating an error spike

This is the most common Cognition workflow. Walk through it step by step:

Check the Overview first

Before drilling into Errors, open Cognition → Overview and look at the error count over time. A sudden jump tells you when the spike started.

Note the timestamp. You'll use it to correlate with deployments.

Open the Errors view

Switch to the Errors tab. Sort by frequency or by first occurrence to find the dominant error pattern.

Read the stack trace

Click the error. Look at:

  • The top frame: this is where the error was thrown. Start here, since the bug is often at or near this frame.
  • The calling frames — the code path that led to the throw. Useful when the error is in a shared utility that's called from many places.
  • The error message: read it literally. A "Cannot read properties of undefined" error at a specific line in your API handler tells you that the code read a property from a value that was undefined at that line.

Correlate with a deployment

Go to the Console, open your project, and check the deployment history. Compare the timestamp of the first error occurrence with your most recent deployment.

Or check from the CLI:

skytells deployments ls --app my-api --json | jq '[.[] | {id, status, created_at}]'

If the deployment timestamp is at or shortly before the start of the spike, that deployment is the most likely cause. If no deployment lines up, suspect a traffic change or a failing downstream service instead.
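
The timestamp comparison can also be scripted. The sketch below assumes both timestamps are ISO 8601 strings in UTC (the actual format of `created_at` and of the first-seen time is not confirmed here) and that GNU `date` is available; BSD/macOS `date` parses dates differently. The function names, the 15-minute window, and the sample timestamps are all hypothetical.

```shell
# Seconds from deployment to first error, parsed with GNU date.
# A positive result means the errors started after the deployment.
seconds_after_deploy() (
  deploy_epoch=$(date -u -d "$1" +%s) || exit 2
  error_epoch=$(date -u -d "$2" +%s) || exit 2
  echo $((error_epoch - deploy_epoch))
)

# Succeeds when the errors began within WINDOW seconds after the deployment.
# The 900-second (15-minute) default is an arbitrary assumption.
# Usage: is_suspect_deploy DEPLOY_TIMESTAMP FIRST_ERROR_TIMESTAMP [WINDOW]
is_suspect_deploy() (
  window=${3:-900}
  delta=$(seconds_after_deploy "$1" "$2") || exit 2
  [ "$delta" -ge 0 ] && [ "$delta" -le "$window" ]
)

# Hypothetical values: deployment created_at, then error first-seen time.
if is_suspect_deploy '2024-05-01T14:02:10Z' '2024-05-01T14:03:45Z'; then
  echo "errors began shortly after this deployment: likely related"
else
  echo "timing does not line up: check traffic or downstream services"
fi
```

A match in timing is evidence, not proof, so still confirm by reading the stack trace against what the deployment changed.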

Act

If it's a broken deployment and you need to get back to working state now:

# Redeploy the app — rolls back to previous working configuration
skytells apps redeploy my-api

Or if you've fixed the code:

skytells deploy my-api
skytells logs my-api --type deployment --follow

Confirm the errors stop

Return to Errors in Cognition. The frequency count for the grouped error should stop climbing, and its last occurrence time should stop advancing. If both hold for a few minutes after the deployment finishes, the fix worked.
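
If you would rather watch from a terminal, a polling loop along these lines is one option. It is a sketch under two assumptions: that `skytells cognition errors --json` prints a JSON array, and that the array length rises while the incident is ongoing. Neither is confirmed here. The function names `count_is_stable` and `current_error_count` are hypothetical.

```shell
# Samples a count several times and reports whether it kept rising.
# Usage: count_is_stable SAMPLES INTERVAL_SECONDS COMMAND [ARGS...]
# COMMAND must print a single integer each time it runs.
count_is_stable() (
  samples=$1
  interval=$2
  shift 2
  prev=$("$@") || exit 2
  i=1
  while [ "$i" -lt "$samples" ]; do
    sleep "$interval"
    curr=$("$@") || exit 2
    if [ "$curr" -gt "$prev" ]; then
      echo "still climbing: $prev -> $curr"
      exit 1
    fi
    prev=$curr
    i=$((i + 1))
  done
  echo "stable at $prev"
)

# Assumption: the JSON output is an array whose length tracks the error count.
current_error_count() {
  skytells cognition errors --json | jq 'length'
}

# Example (requires the CLI and jq): five samples, one minute apart.
# count_is_stable 5 60 current_error_count
```

Passing the counting command as an argument keeps the loop independent of the CLI, so you can swap in a different source for the count.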


Common error patterns and what they mean

| Error pattern | Likely cause |
| --- | --- |
| Cannot read properties of undefined | Code assumes a field exists but the API or database returned something different |
| Connection refused | A downstream service (database, cache, external API) is unreachable |
| Timeout after Xms | A downstream call is too slow; the caller gave up waiting |
| 401 Unauthorized from an internal call | A service-to-service token expired or was rotated |
| Out of memory | The app is consuming more memory than its allocation; see Runtime Health |
| ECONNRESET | Network connection dropped mid-request; check infrastructure and upstream services |
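
For scripted triage, the table above can be expressed as a shell `case` lookup. This is a sketch: the function name `likely_cause` is hypothetical, and the glob patterns are guesses at how these messages appear in practice, so adjust them to the messages your apps actually emit.

```shell
# Maps an error message to the likely cause from the table above.
# The glob patterns are illustrative guesses, not an exhaustive matcher.
likely_cause() {
  case $1 in
    *"Cannot read properties of undefined"*)
      echo "code assumes a field that the API or database did not return" ;;
    *"Connection refused"*)
      echo "downstream service unreachable" ;;
    *[Tt]imeout*)
      echo "downstream call too slow" ;;
    *401*)
      echo "service-to-service token expired or rotated" ;;
    *"Out of memory"* | *"out of memory"*)
      echo "app exceeds its memory allocation" ;;
    *ECONNRESET*)
      echo "connection dropped mid-request" ;;
    *)
      echo "no known pattern: read the stack trace" ;;
  esac
}

likely_cause "TypeError: Cannot read properties of undefined (reading 'id')"
likely_cause "Error: read ECONNRESET"
```

The order of the branches matters: the timeout branch comes before the `401` branch so that a message like "Timeout after 4010ms" is not misread as an authorization failure.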

Working with errors from the CLI

You can pull error data directly from the terminal — useful for quick checks and scripting:

# List recent errors
skytells cognition errors

# Last 20 errors
skytells cognition errors --limit 20

# JSON output for scripting
skytells cognition errors --json

Check the error count in a quick health check:

ERROR_COUNT=$(skytells cognition errors --json | jq 'length')
echo "Errors in current window: $ERROR_COUNT"
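
To use that count in a cron job or CI step, one option is to turn it into an exit status. The sketch below is an assumption-laden example rather than a built-in feature: the function name `check_error_budget` and the threshold of 50 are placeholders.

```shell
# Exits non-zero when the error count exceeds a threshold, so cron jobs or
# CI steps can react. The default threshold of 50 is an arbitrary placeholder.
# Usage: check_error_budget COUNT [THRESHOLD]
check_error_budget() (
  count=$1
  threshold=${2:-50}
  if [ "$count" -gt "$threshold" ]; then
    echo "FAIL: $count errors (threshold $threshold)" >&2
    exit 1
  fi
  echo "OK: $count errors (threshold $threshold)"
)

# With the CLI and jq installed, pass in the live count:
# check_error_budget "$(skytells cognition errors --json | jq 'length')" 50
check_error_budget 12 50
```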

After the incident

Once the immediate problem is resolved:

  1. Note what caused it (broken deployment, missing env var, dependency change, traffic pattern).
  2. Add whatever health checks or alerting would have caught it earlier.
  3. Consider whether the error should have been an Error at all — some things your code treats as exceptions might be better handled gracefully.

What you now know

| Task | How to do it |
| --- | --- |
| Find the dominant errors | Cognition → Errors, sort by frequency |
| Read a stack trace | Click an error row, read from top frame down |
| Correlate with a deployment | Compare error first-seen timestamp to deployment history |
| Stop an ongoing incident | skytells apps redeploy my-api or skytells deploy my-api |
| Confirm the fix worked | Watch the error frequency stop climbing in Errors view |
| Pull errors from terminal | skytells cognition errors |

Up next: Module 3 — Security Threats & Anomalies →
