Errors & Incident Investigation

Track application errors, read event patterns and stack information, and correlate spikes with deployments or traffic changes.

What you'll be able to do after this module

Open an error in Cognition, understand what it's telling you, correlate it with a deployment, and move toward a fix — all without needing to comb through raw log files.

How Cognition tracks errors

Cognition collects errors from your running apps and surfaces them as events with associated metadata: timing, frequency, stack information, and request context. Errors that happen repeatedly are grouped — you see one entry per error pattern rather than one entry per occurrence.

This matters because in production, a single bad deployment can produce thousands of error events in minutes. Grouping lets you immediately see that it's one problem, not a thousand separate problems.
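
To make the grouping idea concrete, here is a minimal sketch that collapses raw error messages into one row per pattern with an occurrence count. The `group_errors` and `sample_errors` names and the sample lines are made up for illustration. This is not how Cognition groups errors internally, which presumably also takes stack information into account.

```shell
#!/bin/sh
# Hypothetical helper: read raw error messages (one per line) on stdin and
# print "count<TAB>message" rows, most frequent first.
group_errors() {
  sort | uniq -c | sort -rn |
    awk '{ count = $1; $1 = ""; sub(/^ /, ""); print count "\t" $0 }'
}

# Illustrative sample data, not real Cognition output.
sample_errors() {
  cat <<'EOF'
TypeError: Cannot read properties of undefined
Connection refused
TypeError: Cannot read properties of undefined
TypeError: Cannot read properties of undefined
EOF
}

sample_errors | group_errors
```

Four raw events become two rows, which is the same reduction the Errors view gives you at a much larger scale.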

Reading the Errors view

Open Cognition → Errors from the sidebar.

What you see:

A list of error events, grouped by pattern.
Each row shows the error type, how many times it occurred, the first and last occurrence time, and which app it came from.
Clicking a row opens the detail view with the full stack trace, request context, and timing breakdown.

What to look at first

Frequency and timing together tell you whether an error is new or chronic:

Pattern	What it likely means
First seen: today, high count	Something changed recently — check recent deployments
First seen: weeks ago, steady count	Known issue not yet fixed
First seen: today, count of 1	Possibly transient — watch to see if it recurs
Count jumped suddenly	Traffic spike, or a bad deployment
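
The first three rows of the table above can be sketched as a small decision function. The `triage` name, the cutoff of 100 for a "high" count, and the fallback branch for mid-range counts are assumptions for illustration, not Cognition behavior. The "count jumped suddenly" row is left out because detecting it needs a time series rather than a single count.

```shell
#!/bin/sh
# Hypothetical helper mirroring the table above.
# Args: $1 = whole days since the error was first seen, $2 = occurrence count.
# It does not check whether an old error's count is actually steady.
triage() {
  days_since_first_seen=$1
  count=$2
  if [ "$days_since_first_seen" -eq 0 ]; then
    if [ "$count" -le 1 ]; then
      echo "possibly transient: watch for recurrence"
    elif [ "$count" -ge 100 ]; then   # 100 is an arbitrary placeholder cutoff
      echo "recent change: check recent deployments"
    else
      echo "new error: keep watching the count"   # case not covered by the table
    fi
  else
    echo "known issue: not yet fixed"
  fi
}

triage 0 4200
triage 21 35
triage 0 1
```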

Investigating an error spike

This is the most common Cognition workflow. Walk through it step by step:

Check the Overview first

Before drilling into Errors, open Cognition → Overview and look at the error count over time. A sudden jump tells you when the spike started.

Note the timestamp. You'll use it to correlate with deployments.

Open the Errors view

Switch to the Errors tab. Sort by frequency or by first occurrence to find the dominant error pattern.

Read the stack trace

Click the error. Look at:

The top frame: this is where the error was thrown. In most cases, this is where you'll find the bug.
The calling frames: the code path that led to the throw. Useful when the error is in a shared utility that's called from many places.
The error message: read it literally. A `Cannot read properties of undefined` error at a specific line in your API handler is telling you exactly what went wrong.

Correlate with a deployment

Go to the Console, open your project, and check the deployment history. Compare the timestamp of the first error occurrence with your most recent deployment.

Or check from the CLI:

skytells deployments ls --app my-api --json | jq '[.[] | {id, status, created_at}]'

If the deployment went live shortly before (or exactly at) the spike start, that deployment is the most likely cause. A deployment that landed after the spike began cannot have triggered it.
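
The comparison can also be scripted. The sketch below picks the latest deployment created at or before the spike start. It assumes you have reduced the deployment list to `id created_at` lines and that every timestamp is UTC ISO 8601 in one consistent format, so string order matches time order. The `deployment_before` name, the ids, and the timestamps are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "id created_at" lines on stdin and print the id of
# the latest deployment created at or before the spike start ($1).
# Relies on same-format UTC ISO 8601 timestamps comparing correctly as strings.
deployment_before() {
  spike_start=$1
  LC_ALL=C sort -k2,2 |
    LC_ALL=C awk -v spike="$spike_start" \
      '$2 <= spike { id = $1 } END { if (id != "") print id }'
}

# Illustrative data; these ids and timestamps are made up.
sample_deployments() {
  cat <<'EOF'
dep_103 2024-05-02T09:15:00Z
dep_101 2024-04-28T16:40:00Z
dep_102 2024-05-01T13:58:00Z
EOF
}

sample_deployments | deployment_before "2024-05-01T14:03:00Z"
```

If the CLI's JSON carries `id` and `created_at` as the command above suggests, lines in this shape could be produced with `jq -r '.[] | "\(.id) \(.created_at)"'`. The timestamp format is an assumption, so check it before relying on string comparison.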

Act

If it's a broken deployment and you need to get back to working state now:

# Redeploy the app — rolls back to previous working configuration
skytells apps redeploy my-api

Or if you've fixed the code:

skytells deploy my-api
skytells logs my-api --type deployment --follow

Confirm the errors stop

Return to Errors in Cognition. The count for the grouped error should stop climbing, and its last occurrence time should stop advancing. If both hold once the new deployment is live, the fix worked.
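
If you record the count a few times while you wait, "stopped climbing" can be checked mechanically. The sketch below treats the count as flat when the last three samples are equal. The `stopped_climbing` name, the three-sample rule, and the numbers are assumptions for illustration. How you collect the samples is up to you.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the last three count samples are identical.
# Args: a series of integer counts, oldest first.
stopped_climbing() {
  # With fewer than three samples there is not enough evidence either way.
  [ "$#" -ge 3 ] || return 1
  # Keep only the last three arguments.
  shift $(( $# - 3 ))
  [ "$1" -eq "$2" ] && [ "$2" -eq "$3" ]
}

# Made-up samples taken at regular intervals after a fix was deployed.
if stopped_climbing 120 480 1310 1312 1312 1312; then
  echo "count is flat"
else
  echo "still climbing"
fi
```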

Common error patterns and what they mean

Error pattern	Likely cause
`Cannot read properties of undefined`	Code assumes a field exists but the API or database returned something different
`Connection refused`	A downstream service (database, cache, external API) is unreachable
`Timeout after Xms`	A downstream call is too slow — the caller gave up waiting
`401 Unauthorized` from an internal call	A service-to-service token expired or was rotated
`Out of memory`	The app is consuming more memory than its allocation; see Runtime Health
`ECONNRESET`	Network connection dropped mid-request — check infrastructure and upstream services
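
For scripted triage, the table above can be turned into a lookup. This is a minimal sketch: the `likely_cause` name and the sample message are hypothetical, and the substring matching is deliberately naive.

```shell
#!/bin/sh
# Hypothetical helper: map an error message ($1) to the likely cause
# listed in the table above, using simple substring matching.
likely_cause() {
  case "$1" in
    *"Cannot read properties of undefined"*) echo "unexpected data shape from API or database" ;;
    *"Connection refused"*)                  echo "downstream service unreachable" ;;
    *"Timeout after"*)                       echo "downstream call too slow" ;;
    *"401 Unauthorized"*)                    echo "service-to-service token expired or rotated" ;;
    *"Out of memory"*)                       echo "memory allocation exceeded" ;;
    *ECONNRESET*)                            echo "connection dropped mid-request" ;;
    *)                                       echo "unknown: read the stack trace" ;;
  esac
}

# Made-up message for illustration.
likely_cause "Error: Timeout after 3000ms calling billing-service"
```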

Working with errors from the CLI

You can pull error data directly from the terminal — useful for quick checks and scripting:

# List recent errors
skytells cognition errors

# Last 20 errors
skytells cognition errors --limit 20

# JSON output for scripting
skytells cognition errors --json

Check the error count in a quick health check:

ERROR_COUNT=$(skytells cognition errors --json | jq 'length')
echo "Errors in current window: $ERROR_COUNT"
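
To turn that count into a pass/fail signal, compare it against a threshold. In this sketch the `errors_within_budget` name and the threshold of 5 are placeholders, and `ERROR_COUNT` is hard-coded so the example runs without the CLI.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the error count is at or below the limit.
# Args: $1 = error count, $2 = maximum acceptable count.
errors_within_budget() {
  [ "$1" -le "$2" ]
}

# In a real script ERROR_COUNT would come from the command shown above;
# a fixed value is used here for illustration.
ERROR_COUNT=7
MAX_ERRORS=5   # arbitrary placeholder threshold

if errors_within_budget "$ERROR_COUNT" "$MAX_ERRORS"; then
  echo "OK: $ERROR_COUNT errors"
else
  echo "FAIL: $ERROR_COUNT errors exceeds limit of $MAX_ERRORS"
fi
```

In a CI job or cron check you would typically exit with a non-zero status in the failing branch so the surrounding tooling notices.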

After the incident

Once the immediate problem is resolved:

Note what caused it (broken deployment, missing env var, dependency change, traffic pattern).
Add whatever health checks or alerting would have caught it earlier.
Consider whether the error should have been an Error at all — some things your code treats as exceptions might be better handled gracefully.

What you now know

Task	How to do it
Find the dominant errors	Cognition → Errors, sort by frequency
Read a stack trace	Click an error row, read from top frame down
Correlate with a deployment	Compare error first-seen timestamp to deployment history
Stop an ongoing incident	`skytells apps redeploy my-api` or `skytells deploy my-api`
Confirm the fix worked	Watch the error frequency stop climbing in Errors view
Pull errors from terminal	`skytells cognition errors`

Up next: Module 3 — Security Threats & Anomalies →
