Errors & Incident Investigation
Track application errors, read event patterns and stack information, and correlate spikes with deployments or traffic changes.
What you'll be able to do after this module
Open an error in Cognition, understand what it's telling you, correlate it with a deployment, and move toward a fix — all without needing to comb through raw log files.
How Cognition tracks errors
Cognition collects errors from your running apps and surfaces them as events with associated metadata: timing, frequency, stack information, and request context. Errors that happen repeatedly are grouped — you see one entry per error pattern rather than one entry per occurrence.
This matters because in production, a single bad deployment can produce thousands of error events in minutes. Grouping lets you immediately see that it's one problem, not a thousand separate problems.
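To make the idea of grouping concrete, here is a minimal sketch that collapses raw events into one row per pattern with a count. It is an illustration only, not Cognition's actual grouping logic: the function name `group_errors` is hypothetical, and it assumes each event has already been reduced to a single line of text.

```shell
# group_errors: read one error event per line on stdin and print
# "<count> <pattern>" rows, most frequent pattern first.
# Illustrative only; identical lines are treated as the same pattern.
group_errors() {
  awk '{ seen[$0]++ } END { for (p in seen) print seen[p], p }' |
    LC_ALL=C sort -k1,1nr -k2
}
```

Feeding it thousands of identical lines produces a single row with a large count, which is the effect the Errors view gives you.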
Reading the Errors view
Open Cognition → Errors from the sidebar.
What you see:
- A list of error events, grouped by pattern.
- Each row shows the error type, how many times it occurred, the first and last occurrence times, and which app it came from.
- Clicking a row opens the detail view with the full stack trace, request context, and timing breakdown.
What to look at first
Frequency and timing together tell you whether an error is new or chronic:
| Pattern | What it likely means |
|---|---|
| First seen: today, high count | Something changed recently — check recent deployments |
| First seen: weeks ago, steady count | Known issue not yet fixed |
| First seen: today, count of 1 | Possibly transient — watch to see if it recurs |
| Count jumped suddenly | Traffic spike, or a bad deployment |
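The first three rows of the table can be sketched as a small triage helper. The function name and the thresholds are assumptions chosen for illustration ("today" means under 24 hours old, "high count" means 100 or more events); tune them to your own traffic. Detecting a sudden jump needs count history, so that row is left out.

```shell
# classify_error AGE_HOURS COUNT
# AGE_HOURS: hours since the error pattern was first seen.
# COUNT: number of occurrences in the group.
# Thresholds (24 hours, 100 events) are illustrative assumptions.
classify_error() {
  local age_hours=$1 count=$2
  if [ "$age_hours" -lt 24 ]; then
    if [ "$count" -ge 100 ]; then
      echo "new and frequent: check recent deployments"
    elif [ "$count" -le 1 ]; then
      echo "possibly transient: watch to see if it recurs"
    else
      # Not covered by the table: new, but neither a one-off nor a flood.
      echo "new: keep watching"
    fi
  else
    echo "chronic: known issue not yet fixed"
  fi
}
```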
Investigating an error spike
This is the most common Cognition workflow. Walk through it step by step:
Check the Overview first
Before drilling into Errors, open Cognition → Overview and look at the error count over time. A sudden jump tells you when the spike started.
Note the timestamp. You'll use it to correlate with deployments.
Open the Errors view
Switch to the Errors tab. Sort by frequency or by first occurrence to find the dominant error pattern.
Read the stack trace
Click the error. Look at:
- The top frame: this is where the error was thrown. In most cases, this is where you'll find the bug.
- The calling frames: the code path that led to the throw. Useful when the error is in a shared utility that's called from many places.
- The error message: read it literally. A `Cannot read properties of undefined` error at a specific line in your API handler is telling you exactly what went wrong.
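If you have copied a stack trace out of the detail view, the same reading order can be sketched in shell. This assumes a Node.js-style trace in which each frame is a line starting with `at`; both function names are hypothetical, and other runtimes format frames differently.

```shell
# top_frame: print the first frame of a stack trace read from stdin,
# which is where the error was thrown.
top_frame() {
  awk '/^[[:space:]]*at / { sub(/^[[:space:]]*at /, ""); print; exit }'
}

# first_app_frame: print the first frame that is not inside node_modules.
# Useful when the throw happens in a shared library and you want the
# nearest caller in your own code.
first_app_frame() {
  awk '/^[[:space:]]*at / && !/node_modules/ { sub(/^[[:space:]]*at /, ""); print; exit }'
}
```

When the two functions return different frames, the error was thrown inside a dependency and the second result is usually the better place to start reading.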
Correlate with a deployment
Go to the Console, open your project, and check the deployment history. Compare the timestamp of the first error occurrence with your most recent deployment.
Or check from the CLI:
skytells deployments ls --app my-api --json | jq '[.[] | {id, status, created_at}]'
If the deployment timestamp is close to (or exactly at) the spike start, that deployment is almost certainly the cause.
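The timestamp comparison can be sketched as a small helper. This is a sketch under stated assumptions: both timestamps are ISO 8601 strings, GNU `date` is available (the `-d` flag is not portable to BSD or macOS `date`), and the 15-minute default window is an arbitrary choice. The function names are hypothetical.

```shell
# to_epoch ISO8601_TIMESTAMP
# Convert a timestamp to seconds since the Unix epoch (GNU date only).
to_epoch() {
  date -u -d "$1" +%s
}

# deploy_is_suspect DEPLOY_TS FIRST_ERROR_TS [WINDOW_MINUTES]
# Succeeds (exit 0) when the first error occurred at or after the
# deployment and no more than WINDOW_MINUTES later (default: 15).
deploy_is_suspect() {
  local deploy first_error window delta
  deploy=$(to_epoch "$1") || return 2
  first_error=$(to_epoch "$2") || return 2
  window=$(( ${3:-15} * 60 ))
  delta=$(( first_error - deploy ))
  [ "$delta" -ge 0 ] && [ "$delta" -le "$window" ]
}
```

For example, `deploy_is_suspect 2024-05-01T12:00:00Z 2024-05-01T12:04:30Z` succeeds, while an error first seen an hour after the deployment, or before it, does not. The timestamps here are made up for illustration.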
Act
If it's a broken deployment and you need to get back to working state now:
# Redeploy the app — rolls back to previous working configuration
skytells apps redeploy my-api
Or, if you've fixed the code:
skytells deploy my-api
skytells logs my-api --type deployment --follow
Confirm the errors stop
Return to Errors in Cognition. The frequency count for the grouped error should stop climbing. If the pattern cleared, the fix worked.
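"Stopped climbing" can be made checkable if you sample the count a few times and compare the most recent readings. The helper below is a hypothetical sketch: how you collect the samples is up to you, and the number of identical readings that counts as "stopped" is a judgment call.

```shell
# has_stopped_climbing N SAMPLE...
# Succeeds (exit 0) when the last N samples are all equal, meaning the
# count has plateaued. Fails when fewer than N samples are given or the
# recent samples still differ. N must be at least 1.
has_stopped_climbing() {
  local need=$1
  shift
  [ "$#" -ge "$need" ] || return 1
  # Keep only the last N samples.
  shift $(( $# - need ))
  local first=$1 sample
  for sample in "$@"; do
    [ "$sample" -eq "$first" ] || return 1
  done
}
```

With samples taken a minute apart, `has_stopped_climbing 3 120 134 140 140 140` succeeds, whereas `has_stopped_climbing 3 120 134 140 141 141` fails because the count was still moving.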
Common error patterns and what they mean
| Error pattern | Likely cause |
|---|---|
| Cannot read properties of undefined | Code assumes a field exists but the API or database returned something different |
| Connection refused | A downstream service (database, cache, external API) is unreachable |
| Timeout after Xms | A downstream call is too slow, so the caller gave up waiting |
| 401 Unauthorized from an internal call | A service-to-service token expired or was rotated |
| Out of memory | The app is consuming more memory than its allocation; see Runtime Health |
| ECONNRESET | Network connection dropped mid-request; check infrastructure and upstream services |
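The table translates directly into a lookup you can keep in a triage script. The function name `likely_cause` is hypothetical, matching is a plain case-sensitive substring test, and the result is a hint about where to look first, not a diagnosis.

```shell
# likely_cause MESSAGE
# Print the likely cause for an error message, based on the table above.
# Returns 1 when the message matches none of the known patterns.
likely_cause() {
  case "$1" in
    *"Cannot read properties of undefined"*)
      echo "code assumes a field exists but the API or database returned something different" ;;
    *"Connection refused"*)
      echo "a downstream service (database, cache, external API) is unreachable" ;;
    *"Timeout after"*)
      echo "a downstream call is too slow and the caller gave up waiting" ;;
    *"401 Unauthorized"*)
      echo "a service-to-service token expired or was rotated" ;;
    *"Out of memory"*)
      echo "the app is consuming more memory than its allocation" ;;
    *ECONNRESET*)
      echo "network connection dropped mid-request" ;;
    *)
      echo "no known pattern: read the stack trace"
      return 1 ;;
  esac
}
```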
Working with errors from the CLI
You can pull error data directly from the terminal — useful for quick checks and scripting:
# List recent errors
skytells cognition errors
# Last 20 errors
skytells cognition errors --limit 20
# JSON output for scripting
skytells cognition errors --json
Check the error count in a quick health check:
ERROR_COUNT=$(skytells cognition errors --json | jq 'length')
echo "Errors in current window: $ERROR_COUNT"
After the incident
Once the immediate problem is resolved:
- Note what caused it (broken deployment, missing env var, dependency change, traffic pattern).
- Add whatever health checks or alerting would have caught it earlier.
- Consider whether the error should have been an `Error` at all; some things your code treats as exceptions might be better handled gracefully.
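As one possible starting point for the health-check item, an error count can be turned into a pass/fail exit status that a cron job or CI step can act on. The function name and the threshold value are assumptions; pick a threshold that reflects your app's normal background error rate.

```shell
# check_error_count COUNT THRESHOLD
# Prints a status line and returns 1 when COUNT exceeds THRESHOLD,
# so the caller (cron, CI, a wrapper script) can alert on failure.
check_error_count() {
  local count=$1 threshold=$2
  if [ "$count" -gt "$threshold" ]; then
    echo "ALERT: $count errors exceeds threshold of $threshold" >&2
    return 1
  fi
  echo "OK: $count errors (threshold $threshold)"
}

# Example wiring, using the count command shown earlier in this module
# (the threshold of 50 is a made-up value):
#   check_error_count "$(skytells cognition errors --json | jq 'length')" 50
```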
What you now know
| Task | How to do it |
|---|---|
| Find the dominant errors | Cognition → Errors, sort by frequency |
| Read a stack trace | Click an error row, read from top frame down |
| Correlate with a deployment | Compare error first-seen timestamp to deployment history |
| Stop an ongoing incident | skytells apps redeploy my-api or skytells deploy my-api |
| Confirm the fix worked | Watch the error frequency stop climbing in Errors view |
| Pull errors from terminal | skytells cognition errors |