You can't fix what you can't see. Upload pipelines are multi-step, asynchronous, and distributed across client and server—exactly the kind of system where problems hide until they compound into user-visible failures. Structured logging transforms your upload infrastructure from a black box into a transparent, debuggable system. This page covers what to log at each stage of the Resumable.js upload lifecycle, how to correlate events using resumableIdentifier, client-side event logging, server-side tracking, alerting thresholds, and the dashboard metrics that actually matter in production. For the full infrastructure reference, see the ops hub.
## What to Log: The Upload Lifecycle
A single file upload in Resumable.js passes through distinct stages. Each stage should emit a structured log entry. Miss one and you'll have a gap in your timeline exactly where the bug lives.
**Upload lifecycle events**
| Event | Where | Key data |
|---|---|---|
| Upload started | Client | resumableIdentifier, filename, file size, total chunks, timestamp |
| Chunk upload begin | Client + Server | resumableIdentifier, resumableChunkNumber, chunk size |
| Chunk upload success | Server | resumableIdentifier, resumableChunkNumber, duration, bytes received |
| Chunk upload error | Client + Server | resumableIdentifier, resumableChunkNumber, HTTP status, error message |
| Chunk test (GET) | Server | resumableIdentifier, resumableChunkNumber, result (found/not found) |
| All chunks received | Server | resumableIdentifier, total chunks, total bytes, elapsed time |
| Assembly started | Server | resumableIdentifier, target path, expected size |
| Assembly complete | Server | resumableIdentifier, final file path, final size, checksum, assembly duration |
| Cleanup | Server | resumableIdentifier, chunks deleted, temp space reclaimed |
That's nine distinct event types for a single file. It sounds like a lot. It's not—each one is a single structured log line, and you'll be grateful for every one of them when debugging a partial upload failure at 2 AM.
## Structured Log Format
Plain text logs are useless at scale. When you're processing thousands of concurrent uploads, grepping through unstructured text for a specific file identifier is slow and error-prone. Use structured JSON logs from the start.
```json
{
  "timestamp": "2025-10-15T14:23:07.441Z",
  "level": "info",
  "event": "chunk_received",
  "resumableIdentifier": "abc123-photo-jpg-4194304",
  "chunkNumber": 7,
  "totalChunks": 25,
  "chunkSize": 2097152,
  "durationMs": 847,
  "userId": "user_9f3a2b",
  "ip": "203.0.113.42"
}
```
Every field is queryable. Want to see all chunks for a specific upload? Filter by resumableIdentifier. Want to find slow chunks? Sort by durationMs. Want to correlate with user reports? Search by userId. Structured logs make these queries trivial in any log aggregation tool—ELK, Loki, Datadog, CloudWatch Logs Insights, whatever your stack uses.
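If you don't already have a structured logger, a minimal JSON-lines logger is only a few lines. This is a sketch, not a library recommendation — the `createLogger` name, the `base` context object, and the field names are illustrative assumptions:

```js
// Minimal JSON-lines logger: one structured object per line on stdout,
// which is what most log shippers (Fluent Bit, Vector, etc.) expect.
// The field names and `base` context shape are illustrative assumptions.
function createLogger(base = {}) {
  const emit = (level, fields) => {
    const entry = { timestamp: new Date().toISOString(), level, ...base, ...fields };
    process.stdout.write(JSON.stringify(entry) + '\n');
    return entry; // returned to make the logger easy to test
  };
  return {
    info: (fields) => emit('info', fields),
    error: (fields) => emit('error', fields),
  };
}

const logger = createLogger({ service: 'upload-api' });
logger.info({
  event: 'chunk_received',
  resumableIdentifier: 'abc123-photo-jpg-4194304',
  chunkNumber: 7,
});
```

In production you would typically reach for an established structured logger (pino, winston, bunyan) instead, but the output contract — one queryable JSON object per line — is the same.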
## Correlation with resumableIdentifier
The resumableIdentifier field is the single most important piece of data in your upload logs. Resumable.js generates this identifier from the file's name, size, and a relative path. It stays consistent across retries, browser refreshes, and even resumed sessions. That makes it the natural correlation key for every log entry related to a specific upload.
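If the default identifier doesn't suit your storage layout, Resumable.js accepts a custom `generateUniqueIdentifier` option. A deterministic sketch — the sanitization pattern below is an illustrative assumption, not Resumable.js's exact default:

```js
// Deterministic identifier from file size + name, so retries, refreshes,
// and resumed sessions all map to the same key. The sanitization regex
// is an illustrative assumption; adapt it to your storage constraints.
function makeUploadIdentifier(fileName, size) {
  const clean = fileName.replace(/[^0-9a-zA-Z_-]/g, '');
  return `${size}-${clean}`;
}

// Browser side, passed to the Resumable.js constructor:
// const r = new Resumable({
//   generateUniqueIdentifier: (file) =>
//     makeUploadIdentifier(file.fileName || file.name, file.size),
// });
```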
On the server, extract it from every request:
```js
app.post('/api/upload', (req, res) => {
  // Resumable.js sends metadata in the body for chunk POSTs
  // and in the query string for test GETs.
  const identifier = req.body.resumableIdentifier || req.query.resumableIdentifier;
  const chunkNumber = parseInt(req.body.resumableChunkNumber || req.query.resumableChunkNumber, 10);
  logger.info({
    event: 'chunk_received',
    resumableIdentifier: identifier,
    chunkNumber: chunkNumber,
    totalChunks: parseInt(req.body.resumableTotalChunks, 10),
    contentLength: req.headers['content-length'],
  });
  // Process chunk...
});
```
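The chunk test (GET) event from the lifecycle table deserves the same treatment. A sketch as a pure function so it's easy to test and framework-agnostic — the `store` of already-received chunks is a hypothetical shape, not part of Resumable.js:

```js
// Decide the response to Resumable.js's test request (the GET it sends
// before each chunk POST when testChunks is enabled) and build the
// matching log entry. `store` is a hypothetical Set of
// "identifier:chunkNumber" keys for chunks already staged on disk.
function handleChunkTest(query, store) {
  const identifier = query.resumableIdentifier;
  const chunkNumber = parseInt(query.resumableChunkNumber, 10);
  const found = store.has(`${identifier}:${chunkNumber}`);
  return {
    status: found ? 200 : 404, // 200 = skip this chunk, 404 = upload it
    logEntry: {
      event: 'chunk_test',
      resumableIdentifier: identifier,
      chunkNumber,
      result: found ? 'found' : 'not_found',
    },
  };
}
```

Wiring it into Express is then one line per concern: compute the result, log `logEntry`, send `status`.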
On the client, attach the same identifier to your logs:
```js
r.on('chunkingComplete', (file) => {
  clientLogger.info({
    event: 'upload_started',
    resumableIdentifier: file.uniqueIdentifier,
    fileName: file.fileName,
    fileSize: file.size,
    totalChunks: file.chunks.length,
  });
});

r.on('fileError', (file, message) => {
  clientLogger.error({
    event: 'upload_error',
    resumableIdentifier: file.uniqueIdentifier,
    error: message,
  });
});
```
With both client and server logs keyed to the same identifier, you can reconstruct the complete timeline of any upload—from the moment the user selected the file to the final assembly on the server.
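Reconstructing that timeline is just a merge-and-sort over both log streams. A sketch, assuming entries shaped like the JSON example above:

```js
// Merge client and server log entries for one upload into a single
// chronological timeline, keyed by resumableIdentifier. Each stream
// is an array of structured entries with ISO-8601 timestamps.
function uploadTimeline(identifier, ...logStreams) {
  return logStreams
    .flat()
    .filter((e) => e.resumableIdentifier === identifier)
    .sort((a, b) => new Date(a.timestamp) - new Date(b.timestamp));
}
```

In practice your log aggregator does this for you (a filter on `resumableIdentifier` plus a time sort), but the operation is worth understanding: it's why the identifier has to appear on every entry, client and server alike.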
## Client-Side Event Logging
Resumable.js exposes a rich event system. The events that matter most for logging:
```js
r.on('fileAdded', (file) => { /* Log file metadata */ });
r.on('fileProgress', (file) => { /* Log progress milestones (25%, 50%, 75%) */ });
r.on('fileSuccess', (file, message) => { /* Log completion */ });
r.on('fileError', (file, message) => { /* Log failure with server response */ });
r.on('fileRetry', (file) => { /* Log retry — this is a signal worth tracking */ });
```
Don't log every fileProgress event—that fires continuously and creates noise. Instead, log at meaningful thresholds or at fixed intervals:
```js
const loggedMilestones = new Set();
r.on('fileProgress', (file) => {
  const percent = Math.floor(file.progress() * 100);
  const milestone = Math.floor(percent / 25) * 25;
  if (milestone > 0 && !loggedMilestones.has(`${file.uniqueIdentifier}-${milestone}`)) {
    loggedMilestones.add(`${file.uniqueIdentifier}-${milestone}`);
    clientLogger.info({
      event: 'upload_progress',
      resumableIdentifier: file.uniqueIdentifier,
      percent: milestone,
    });
  }
});
```
## Alerting Thresholds
Raw logs are useful for debugging. Alerts are useful for not needing to debug in the first place. Define thresholds that trigger notifications before users start complaining.
| Metric | Warning threshold | Critical threshold |
|---|---|---|
| Chunk error rate | > 5% over 5 minutes | > 15% over 5 minutes |
| Assembly failure rate | > 1% over 15 minutes | > 5% over 15 minutes |
| p95 chunk upload time | > 10 s | > 30 s |
| Upload abandonment rate | > 20% over 1 hour | > 40% over 1 hour |
| Temp directory disk usage | > 70% capacity | > 90% capacity |
Chunk error rates above 5% usually indicate a server-side issue—disk full, timeout misconfiguration, or a bad deployment. Assembly failures above 1% suggest data corruption or missing chunks, possibly due to premature cleanup of temporary files.
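As a sketch of how the chunk error rate row translates into code — window size and thresholds come from the table above, while the sample shape is an assumption:

```js
// Sliding-window chunk error rate with warning/critical levels.
// `samples` is an assumed shape: one { timestamp, ok } record per
// chunk POST, where `ok` means the request returned 200.
function chunkErrorAlert(samples, nowMs, windowMs = 5 * 60 * 1000) {
  const recent = samples.filter((s) => nowMs - s.timestamp <= windowMs);
  if (recent.length === 0) return { rate: 0, level: 'ok' };
  const errors = recent.filter((s) => !s.ok).length;
  const rate = errors / recent.length;
  // Thresholds from the table: > 5% warning, > 15% critical.
  const level = rate > 0.15 ? 'critical' : rate > 0.05 ? 'warning' : 'ok';
  return { rate, level };
}
```

A real deployment would express this as an alert rule in your monitoring system (Prometheus, Datadog, CloudWatch) rather than application code, but the arithmetic is identical.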
The upload abandonment rate is one that teams often overlook. If users start uploads and never complete them, something in your pipeline is broken—but it might not throw errors. Maybe timeouts are too aggressive. Maybe the UI doesn't communicate progress clearly. This metric surfaces UX problems that error rates alone won't catch.
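Computing it is a set difference over the identifiers in your logs. A sketch, assuming you've already extracted identifier lists from upload_started and completion events:

```js
// Abandonment rate: fraction of uploads that logged a start event in
// the window but never logged completion. Input lists of
// resumableIdentifier values are an assumed extraction from your logs.
function abandonmentRate(startedIds, completedIds) {
  const started = new Set(startedIds);
  if (started.size === 0) return 0;
  const completed = new Set(completedIds);
  let abandoned = 0;
  for (const id of started) {
    if (!completed.has(id)) abandoned += 1;
  }
  return abandoned / started.size;
}
```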
## Dashboard Metrics
Beyond alerting, maintain a dashboard with these operational metrics:
- Upload throughput: total MB/s across all active uploads, sampled every minute
- Active uploads: count of unique resumableIdentifier values with activity in the last 60 seconds
- Chunk success rate: percentage of chunk POST requests returning 200, bucketed by minute
- p50/p95/p99 chunk duration: latency distribution for chunk uploads
- Assembly queue depth: number of files waiting for assembly after all chunks arrived
- Temp storage utilization: disk usage of your chunk staging directory
The assembly queue depth is particularly telling. If it grows steadily, your assembly process is slower than your ingest rate. You either need faster assembly (parallel concatenation, faster disk) or a queue with backpressure that slows down new uploads when assembly falls behind.
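One way to sketch that backpressure — the queue shape and high-water mark here are illustrative, not a prescribed design:

```js
// Simple backpressure gate: refuse new uploads while the assembly
// queue sits above a high-water mark. The limit is illustrative;
// tune it to your assembly throughput.
class AssemblyQueue {
  constructor(highWaterMark = 100) {
    this.pending = [];
    this.highWaterMark = highWaterMark;
  }
  // Check this before accepting the first chunk of a new upload;
  // respond 503 (with Retry-After) when it returns false.
  acceptNewUploads() {
    return this.pending.length < this.highWaterMark;
  }
  enqueue(identifier) {
    this.pending.push(identifier);
  }
  // Called by the assembly worker when it picks up the next file.
  dequeue() {
    return this.pending.shift();
  }
}
```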
## Log Rotation for Temp Directories
Chunk temporary files accumulate in staging directories and must be cleaned up after assembly—or after a timeout if the upload is abandoned. Log these cleanup operations:
```js
const fs = require('fs/promises');
const path = require('path');

async function cleanupStaleTempFiles(maxAgeMs = 24 * 60 * 60 * 1000) {
  const tempDir = '/tmp/resumable-chunks';
  const entries = await fs.readdir(tempDir, { withFileTypes: true });
  for (const entry of entries) {
    if (entry.isDirectory()) {
      const stat = await fs.stat(path.join(tempDir, entry.name));
      const age = Date.now() - stat.mtimeMs;
      if (age > maxAgeMs) {
        await fs.rm(path.join(tempDir, entry.name), { recursive: true });
        logger.info({
          event: 'temp_cleanup',
          identifier: entry.name,
          ageHours: Math.round(age / 3600000),
          action: 'deleted',
        });
      }
    }
  }
}
```
Run this on a schedule—every hour is reasonable for most deployments. The logs tell you how many abandoned uploads you're cleaning up and how old they are. A sudden spike in stale temp directories is an early warning that something upstream is preventing uploads from completing.
Good logging turns your upload pipeline from "it works, I think" into "I know exactly what happened to every byte." The upfront cost is minimal—a structured logger and a few event handlers. The payoff is every production incident you diagnose in minutes instead of hours.
