# Cognition: Runtime Observer
Six process health monitors sampling event loop timing, memory, CPU, garbage collection, active handles, and HTTP metrics. Configurable thresholds emit anomaly events when limits are crossed.
The Runtime Observer samples your Node.js process on a set interval and sends snapshots to Skytells. It runs six independent monitors covering the areas most likely to show performance problems: event loop timing, heap memory, CPU load, garbage collection behaviour, active handles, and HTTP traffic. When a monitored value crosses a configured threshold, it emits a separate anomaly event alongside the snapshot.
## Six Monitors
The observer is composed of six independent sub-monitors, each targeting a distinct resource:
| Monitor | Source | What it measures |
|---|---|---|
| Event Loop | `perf_hooks.monitorEventLoopDelay()`, `eventLoopUtilization()` | Loop utilization, p50/p99/max lag |
| Memory | `process.memoryUsage()` | RSS, heap used/total, external, array buffers, growth rate |
| CPU | `process.cpuUsage()` | User %, system %, total %, core count |
| Garbage Collection | `PerformanceObserver('gc')` | Collection counts by type, total GC duration, max pause |
| Active Handles | `process._getActiveHandles()` | Active handles/requests, trend detection |
| HTTP Metrics | `PerformanceObserver('http')` | Request count, average duration, error rate |
## Enabling and Disabling
The Runtime Observer is enabled by default. To control it:
```ts
// Enabled by default — configure the interval
Cognition.init({
  apiKey: process.env.SKYTELLS_API_KEY!,
  projectId: process.env.SKYTELLS_PROJECT_ID!,
  runtime: {
    enabled: true,
    snapshotIntervalMs: 10_000, // default: 10 seconds
  },
});

// Disable entirely
Cognition.init({
  apiKey: process.env.SKYTELLS_API_KEY!,
  projectId: process.env.SKYTELLS_PROJECT_ID!,
  runtime: { enabled: false },
});
```

## Snapshot Interval
The snapshot timer fires every `snapshotIntervalMs` (default: 10 seconds). Each tick:

- Collects data from all six monitors simultaneously
- Assembles a `RuntimeSnapshot` event
- Sends it through the transport pipeline (`beforeSend` → buffer → batching)
- Checks all anomaly thresholds
- Emits `AnomalyEvent`s for any threshold breaches

The timer is `unref()`'d — it will never prevent your process from exiting naturally.
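This relies on standard Node.js timer behaviour, which you can verify directly: `hasRef()` reports whether a timer would keep the process alive.

```ts
// An unref'd interval never blocks process exit.
const timer = setInterval(() => {}, 10_000);

console.log(timer.hasRef()); // true: the pending timer would keep the process alive
timer.unref();
console.log(timer.hasRef()); // false: the process may now exit with the timer pending

clearInterval(timer);
```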
## Metrics Detail

### Event Loop

Source: `perf_hooks.monitorEventLoopDelay()` + `perf_hooks.eventLoopUtilization()`
A monitorEventLoopDelay histogram with 20ms resolution is enabled at startup. On each collection, percentiles are read and the histogram is reset. ELU is measured as a delta from the previous collection.
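This measurement can be sketched with Node's public `perf_hooks` API. The following is an illustrative sketch of the approach described above (the `collectEventLoop` name is hypothetical), not the SDK's internal code:

```ts
import { monitorEventLoopDelay, performance } from 'node:perf_hooks';

// Histogram with 20 ms sampling resolution, as described above.
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();
let lastElu = performance.eventLoopUtilization();

function collectEventLoop() {
  // Histogram values are reported in nanoseconds; convert to ms.
  const result = {
    lagP50: histogram.percentile(50) / 1e6,
    lagP99: histogram.percentile(99) / 1e6,
    lagMax: histogram.max / 1e6,
    // ELU measured as a delta from the previous collection.
    utilization: performance.eventLoopUtilization(lastElu).utilization,
  };
  lastElu = performance.eventLoopUtilization();
  histogram.reset(); // each interval starts with a fresh histogram
  return result;
}
```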
| Metric | Type | Unit | Description |
|---|---|---|---|
| `utilization` | number | ratio (0–1) | Fraction of time the event loop was not idle. 0.7 = 70% busy. |
| `lagP50` | number | ms | 50th percentile event loop delay (median) |
| `lagP99` | number | ms | 99th percentile event loop delay |
| `lagMax` | number | ms | Maximum delay observed in the interval |
What to look for:

- `utilization > 0.8` — Event loop is heavily loaded; responses will slow
- `lagP99 > 100ms` — Significant latency spikes; likely synchronous blocking code
- `lagMax > 500ms` — Something is blocking the event loop for extended periods
### Memory

Source: `process.memoryUsage()`
| Metric | Type | Unit | Description |
|---|---|---|---|
| `rss` | number | bytes | Resident Set Size — total OS memory allocated for this process |
| `heapUsed` | number | bytes | V8 heap currently in use |
| `heapTotal` | number | bytes | V8 total heap size (allocated, including free) |
| `external` | number | bytes | Memory used by C++ objects bound to JS |
| `arrayBuffers` | number | bytes | Memory for ArrayBuffer and SharedArrayBuffer |
| `heapUsedPercent` | number | ratio (0–1) | heapUsed / heapTotal |
| `growthRate` | number | bytes/sec | Rate of heap growth since last collection |
What to look for:

- Sustained positive `growthRate` — Potential memory leak
- `heapUsedPercent > 0.9` — Heap pressure; GC will become aggressive
- `rss` growing while `heapUsed` is stable — Native memory leak (C++ addons, `Buffer.alloc`)
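`growthRate` and `heapUsedPercent` are derived values, not fields of `process.memoryUsage()`. A sketch of how such metrics can be computed between collections (an assumed approach; `collectMemory` is a hypothetical name):

```ts
// Derived memory metrics computed between successive collections.
let prevHeap = process.memoryUsage().heapUsed;
let prevAt = Date.now();

function collectMemory() {
  const mem = process.memoryUsage();
  const now = Date.now();
  const elapsedSec = (now - prevAt) / 1000;
  // bytes/sec; may be negative right after a garbage collection
  const growthRate = elapsedSec > 0 ? (mem.heapUsed - prevHeap) / elapsedSec : 0;
  prevHeap = mem.heapUsed;
  prevAt = now;
  return {
    ...mem,
    heapUsedPercent: mem.heapUsed / mem.heapTotal,
    growthRate,
  };
}
```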
### CPU

Source: `process.cpuUsage()`
Converts microseconds of CPU time to a percentage of elapsed wall-clock time since the last collection.
| Metric | Type | Unit | Description |
|---|---|---|---|
| `userPercent` | number | % | CPU time in user-space as % of wall-clock time |
| `systemPercent` | number | % | CPU time in kernel-space (syscalls, I/O) |
| `totalPercent` | number | % | userPercent + systemPercent |
| `coreCount` | number | count | Number of logical CPU cores |
What to look for:

- `totalPercent > 100` — Process is using more than one core (worker threads)
- `systemPercent > 30` — Heavy I/O or syscall overhead
- `userPercent` spikes — CPU-intensive computation on the main thread
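The microseconds-to-percentage conversion described above can be sketched as follows (illustrative, not SDK internals; `collectCpu` is a hypothetical name):

```ts
import os from 'node:os';

let prevCpu = process.cpuUsage();
let prevAt = process.hrtime.bigint();

function collectCpu() {
  const now = process.hrtime.bigint();
  const elapsedUs = Number(now - prevAt) / 1_000; // wall-clock ns → µs
  const delta = process.cpuUsage(prevCpu);        // CPU µs spent since the previous call
  prevCpu = process.cpuUsage();
  prevAt = now;
  const userPercent = (delta.user / elapsedUs) * 100;
  const systemPercent = (delta.system / elapsedUs) * 100;
  return {
    userPercent,
    systemPercent,
    totalPercent: userPercent + systemPercent,
    coreCount: os.cpus().length,
  };
}
```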
### Garbage Collection

Source: `PerformanceObserver` observing `'gc'` entries
| Metric | Type | Unit | Description |
|---|---|---|---|
| `majorCount` | number | count | Mark-sweep-compact (full GC) collections |
| `minorCount` | number | count | Scavenge (young generation) collections |
| `incrementalCount` | number | count | Incremental marking passes |
| `totalDuration` | number | ms | Cumulative GC time across all types in the interval |
| `maxPause` | number | ms | Longest single GC pause in the interval |
V8 GC kinds:
| Kind | Description |
|---|---|
| Scavenge | Quick collection of the young generation |
| Mark-Sweep-Compact | Full GC — marks, sweeps, and compacts the old generation |
| Incremental Marking | Partial marking done incrementally between frames |
What to look for:

- `maxPause > 50ms` — Long GC pauses causing latency spikes
- High `majorCount` — Frequent full GCs indicate heap pressure
- High `totalDuration` relative to snapshot interval — GC is consuming significant CPU
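The counters in the table above map onto Node's `'gc'` performance entries. A minimal sketch of how they can be gathered (illustrative, not the SDK's implementation):

```ts
import { PerformanceObserver, constants } from 'node:perf_hooks';

let majorCount = 0;
let minorCount = 0;
let incrementalCount = 0;
let totalDuration = 0;
let maxPause = 0;

const gcObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // Node.js >= 16 exposes the GC kind on entry.detail.
    const kind = (entry as { detail?: { kind?: number } }).detail?.kind;
    if (kind === constants.NODE_PERFORMANCE_GC_MAJOR) majorCount += 1;
    else if (kind === constants.NODE_PERFORMANCE_GC_MINOR) minorCount += 1;
    else if (kind === constants.NODE_PERFORMANCE_GC_INCREMENTAL) incrementalCount += 1;
    totalDuration += entry.duration; // ms
    maxPause = Math.max(maxPause, entry.duration);
  }
});
gcObserver.observe({ entryTypes: ['gc'] });
```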
### Active Handles

Source: `process._getActiveHandles()` and `process._getActiveRequests()`
| Metric | Type | Unit | Description |
|---|---|---|---|
| `activeHandles` | number | count | Active handles (sockets, timers, file descriptors) |
| `activeRequests` | number | count | Active libuv requests |
| `trend` | string | enum | `'stable'` / `'growing'` / `'shrinking'` |
Trend detection: The monitor keeps a sliding window of the last 10 handle counts. If the last 3 values are monotonically increasing, the trend is `'growing'`; if monotonically decreasing, `'shrinking'`; otherwise `'stable'`.
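The described sliding-window logic can be sketched directly (`recordHandleCount` is a hypothetical name):

```ts
type Trend = 'stable' | 'growing' | 'shrinking';

const WINDOW_SIZE = 10;
const counts: number[] = [];

// Record the latest handle count and classify the trend
// from the last three observations.
function recordHandleCount(count: number): Trend {
  counts.push(count);
  if (counts.length > WINDOW_SIZE) counts.shift();
  if (counts.length < 3) return 'stable';
  const [a, b, c] = counts.slice(-3);
  if (a < b && b < c) return 'growing';
  if (a > b && b > c) return 'shrinking';
  return 'stable';
}
```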
What to look for:

- `trend: 'growing'` — Possible handle leak (connections not closed, timers not cleared)
- High `activeHandles` — Many open connections or file descriptors
### HTTP Metrics

Source: `PerformanceObserver` observing `'http'` entries (Node.js 18.2+)
| Metric | Type | Unit | Description |
|---|---|---|---|
| `totalRequests` | number | count | Number of outgoing HTTP requests in the interval |
| `avgDurationMs` | number | ms | Average request duration |
| `errorRate` | number | ratio (0–1) | Fraction of requests with status ≥ 400 |
| `entries` | array | — | Individual request details (method, status, URL, duration) |
If the `'http'` performance entry type is unavailable in a given Node.js version, the monitor degrades silently and simply reports no HTTP metrics.
## On-Demand Snapshots
Collect a runtime snapshot at any time without waiting for the interval:
```ts
const snapshot = cognition.getRuntimeSnapshot();
if (snapshot) {
  console.log(`Heap: ${Math.round(snapshot.memory.heapUsed / 1024 / 1024)}MB`);
  console.log(`ELU: ${(snapshot.eventLoop.utilization * 100).toFixed(1)}%`);
  console.log(`GC max pause: ${snapshot.gc.maxPause}ms`);
}
```

Returns `null` if the Runtime Observer is disabled.
## Anomaly Detection
Configure thresholds to trigger anomaly events when specific metrics are breached. Anomaly events are emitted in addition to the regular snapshot — they don't replace it. Multiple anomalies can fire per snapshot interval if multiple thresholds are breached simultaneously.
```ts
Cognition.init({
  apiKey: process.env.SKYTELLS_API_KEY!,
  projectId: process.env.SKYTELLS_PROJECT_ID!,
  runtime: {
    thresholds: {
      heapUsedMb: 512,     // Anomaly when heap > 512 MB
      eventLoopLagMs: 100, // Anomaly when p99 lag > 100 ms
      eluPercent: 0.8,     // Anomaly when ELU > 80%
    },
  },
});
```

### Anomaly Types
| Anomaly Type | Triggered when |
|---|---|
| `heap` | `memory.heapUsed` (in MB) > `thresholds.heapUsedMb` |
| `event_loop_lag` | `eventLoop.lagP99` (in ms) > `thresholds.eventLoopLagMs` |
| `elu` | `eventLoop.utilization` (0–1) > `thresholds.eluPercent` |
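The comparison logic behind this table is straightforward; a hypothetical sketch (`checkThresholds` and its types are illustrative names, not SDK exports):

```ts
interface Thresholds {
  heapUsedMb?: number;
  eventLoopLagMs?: number;
  eluPercent?: number;
}

interface Anomaly {
  anomalyType: 'heap' | 'event_loop_lag' | 'elu';
  threshold: number;
  actual: number;
}

// Compare one snapshot's metrics against the configured thresholds;
// every breached threshold yields its own anomaly.
function checkThresholds(
  snap: { heapUsedMb: number; lagP99: number; utilization: number },
  t: Thresholds,
): Anomaly[] {
  const anomalies: Anomaly[] = [];
  if (t.heapUsedMb !== undefined && snap.heapUsedMb > t.heapUsedMb) {
    anomalies.push({ anomalyType: 'heap', threshold: t.heapUsedMb, actual: snap.heapUsedMb });
  }
  if (t.eventLoopLagMs !== undefined && snap.lagP99 > t.eventLoopLagMs) {
    anomalies.push({ anomalyType: 'event_loop_lag', threshold: t.eventLoopLagMs, actual: snap.lagP99 });
  }
  if (t.eluPercent !== undefined && snap.utilization > t.eluPercent) {
    anomalies.push({ anomalyType: 'elu', threshold: t.eluPercent, actual: snap.utilization });
  }
  return anomalies;
}
```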
### Anomaly Event Structure

```ts
interface AnomalyEvent {
  type: 'anomaly';
  timestamp: number;
  anomalyType: 'heap' | 'event_loop_lag' | 'elu' | 'gc_pressure';
  threshold: number;
  actual: number;
  message: string; // e.g. "Heap usage 623.4MB exceeds threshold 512MB"
}
```

### Runtime Snapshot Structure
```ts
interface RuntimeSnapshot {
  type: 'runtime_snapshot';
  timestamp: number;
  memory: {
    rss: number;
    heapUsed: number;
    heapTotal: number;
    external: number;
    arrayBuffers: number;
  };
  cpu: {
    user: number;   // percentage
    system: number; // percentage
  };
  eventLoop: {
    utilization: number; // 0–1
    lagP50: number;      // ms
    lagP99: number;      // ms
    lagMax: number;      // ms
  };
  gc: {
    majorCount: number;
    minorCount: number;
    incrementalCount: number;
    totalDuration: number; // ms
    maxPause: number;      // ms
  };
  handles: {
    active: number;
    requests: number;
  };
}
```

## Using Snapshots in a Health Check Endpoint
```ts
app.get('/health', (req, res) => {
  const snapshot = cognition.getRuntimeSnapshot();
  res.json({
    status: 'ok',
    sdk: {
      initialized: cognition.isInitialized,
      bufferedEvents: cognition.bufferSize,
      droppedEvents: cognition.droppedCount,
    },
    runtime: snapshot
      ? {
          heapUsedMb: Math.round(snapshot.memory.heapUsed / 1024 / 1024),
          eventLoopLagP99Ms: snapshot.eventLoop.lagP99.toFixed(2),
          eluPercent: (snapshot.eventLoop.utilization * 100).toFixed(1),
          gcMajorCount: snapshot.gc.majorCount,
          activeHandles: snapshot.handles.active,
        }
      : null,
  });
});
```

## Related
- Configuration — `runtime.enabled`, `snapshotIntervalMs`, `thresholds`
- Analytics — View runtime health data in the Console and CLI
- Examples — Custom threshold alerting and periodic health check patterns