
Retries and Resume

Handling network failures, implementing retry logic, and resuming interrupted uploads with Resumable.js.

Guides · Updated 2025-10-28

Uploads fail. Networks drop, laptops close, browsers crash, and servers restart. The entire reason Resumable.js exists is to handle these interruptions gracefully—picking up exactly where things left off instead of forcing users to start over. This guide walks through retry configuration, exponential backoff strategies, the testChunks mechanism for resumption, server-side chunk tracking, browser refresh recovery, and practical tuning advice for production deployments. For additional Resumable.js patterns, see the guides hub.

[Figure: flowchart of retry logic with exponential backoff timing]

Why Uploads Fail

Before configuring retries, it helps to understand the failure modes. They fall into a few categories:

Transient network errors. Wi-Fi drops for three seconds, a mobile user enters an elevator, or a corporate proxy resets an idle connection. These are temporary. Retrying after a short delay usually succeeds.

Server overload. Your upload endpoint returns a 503 or times out because it's under heavy load. Retrying immediately makes this worse—you need backoff.

Permanent errors. A 404 means your endpoint doesn't exist. A 415 means the server rejected the content type. A 401 means authentication expired. Retrying these is pointless.

Client-side failures. The user closes the tab, the browser crashes, or the device runs out of memory. These require resumption rather than simple retry.

Each category demands a different response. Resumable.js gives you the configuration surface to handle all of them.

Retry Strategy: maxChunkRetries and chunkRetryInterval

The two primary knobs for retry behavior are straightforward:

const r = new Resumable({
  target: '/api/upload',
  chunkSize: 2 * 1024 * 1024,
  maxChunkRetries: 3,
  chunkRetryInterval: 2000, // milliseconds
});

When a chunk upload fails (network error or a response status not in permanentErrors), Resumable.js waits chunkRetryInterval milliseconds and tries again, up to maxChunkRetries times. After exhausting retries, the upload pauses and fires an error event.

Three retries at two-second intervals is a sensible default for most applications. It handles the "Wi-Fi blipped for a moment" scenario without making the user wait too long before seeing an error for genuine failures.

What counts as a failure?

Any HTTP response status not in the 200–299 range triggers a retry, unless that status code appears in the permanentErrors array. Network-level errors (connection refused, DNS failure, timeout) also trigger retries.
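That decision logic can be expressed as a small helper, handy when reacting in an error handler. A sketch, not part of the library's API; `permanentErrors` here is whatever array you passed to the constructor:

```javascript
// Classify an HTTP status the way Resumable.js treats it:
// 2xx succeeds, a listed permanent error halts, anything else retries.
function classifyStatus(status, permanentErrors) {
  if (status >= 200 && status < 300) return 'success';
  if (permanentErrors.includes(status)) return 'permanent';
  return 'retryable';
}

console.log(classifyStatus(200, [404, 415, 500])); // 'success'
console.log(classifyStatus(503, [404, 415, 500])); // 'retryable'
console.log(classifyStatus(404, [404, 415, 500])); // 'permanent'
```

Network-level errors never produce a status at all, so they always fall in the retryable bucket.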

Exponential Backoff

A fixed retry interval works fine for isolated blips. But when a server is overloaded, every client retrying at exactly two-second intervals creates synchronized thundering-herd traffic: the retries arrive in waves and keep the server pinned. Exponential backoff spreads retries over time.

Resumable.js doesn't implement exponential backoff natively, but you can layer it on using the event system:

let retryAttempt = 0;

r.on('fileRetry', (file) => {
  retryAttempt++;
  const backoff = Math.min(2000 * Math.pow(2, retryAttempt), 30000);
  const jitter = Math.random() * 1000;
  r.opts.chunkRetryInterval = backoff + jitter;
});

r.on('fileProgress', () => {
  retryAttempt = 0; // reset on successful progress
  r.opts.chunkRetryInterval = 2000;
});

The jitter component is critical. Without it, every client doubles at the same rate and still hits the server in lockstep. A random offset of up to a second desynchronizes the retry waves.

Why cap at 30 seconds? Because users have limits. If an upload sits idle for more than 30 seconds with no visible progress, most people assume it's broken and refresh the page—which defeats the purpose of retry logic entirely.
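Pulled out as a pure function, the schedule above is easy to reason about and test. Same constants as the event-handler version; the 1-second jitter bound is an assumption you can tune:

```javascript
// Backoff for the nth retry: the base interval doubles each attempt,
// capped at 30s, plus up to 1s of random jitter to desynchronize clients.
function backoffMs(attempt, base = 2000, cap = 30000, jitterMax = 1000) {
  const exp = Math.min(base * Math.pow(2, attempt), cap);
  return exp + Math.random() * jitterMax;
}

// Attempts 1..6 yield roughly 4s, 8s, 16s, 30s, 30s, 30s (+ jitter):
// the cap kicks in by the fourth retry.
```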

testChunks: The Resume Mechanism

Retry handles failures during an active upload session. Resumption handles a completely new session—the user refreshed the page, reopened the browser, or even switched devices (if your identifier scheme supports it).

The magic is in testChunks:

const r = new Resumable({
  target: '/api/upload',
  testChunks: true,
  chunkSize: 2 * 1024 * 1024,
});

When testChunks is true, Resumable.js sends a GET request for each chunk before uploading it. The server checks whether that chunk already exists (matching the resumableIdentifier, resumableChunkNumber, and resumableChunkSize parameters). If the server responds with 200, the chunk is skipped. If it responds with any other status (204 No Content is the conventional choice), the chunk gets uploaded.

This means resumption is essentially free. Re-add the same file, and Resumable.js quickly discovers which chunks already made it to the server. Only the missing ones get uploaded.

The test request overhead

There's a cost: one GET request per chunk before upload begins. For a 500 MB file with 2 MB chunks, that's 250 GET requests just to check status. On a fast connection with no prior upload, this adds measurable latency.

A pragmatic compromise: enable testChunks only when you have reason to believe a previous upload exists. You can toggle it based on whether you find a matching identifier in localStorage:

// generateFileIdentifier must produce the same value as Resumable's
// uniqueIdentifier (or be passed as the generateUniqueIdentifier option),
// so the localStorage keys written below match on the next visit.
const fileId = generateFileIdentifier(file);
const hasPriorUpload = localStorage.getItem(`upload_${fileId}`);

const r = new Resumable({
  testChunks: !!hasPriorUpload,
  // ...
});

r.on('fileAdded', (file) => {
  localStorage.setItem(`upload_${file.uniqueIdentifier}`, 'in-progress');
});

r.on('fileSuccess', (file) => {
  localStorage.removeItem(`upload_${file.uniqueIdentifier}`);
});

Server-Side Chunk Tracking

Your server needs to know which chunks it has already received. Two common patterns:

File-based tracking. Write each chunk to disk as [identifier]/chunk_[number]. To answer a GET test request, check if the file exists and its size matches expectations. Simple, works everywhere, but slow on networked file systems.

Database tracking. Insert a row per chunk with identifier, chunk number, and received timestamp. Test requests become a quick database lookup. More infrastructure, but much faster at scale and easier to query for cleanup.

Whichever approach you choose, the server's GET handler follows the same logic:

GET /api/upload?resumableChunkNumber=7&resumableIdentifier=abc123&resumableChunkSize=2097152

→ Chunk exists and correct size? Return 200.
→ Otherwise? Return 204 (or 404).

Keep the test endpoint fast. If it takes 500ms per check, those 250 pre-flight requests for a large file add over two minutes of latency before a single byte uploads.

Handling Browser Refresh

When a user refreshes mid-upload, the JavaScript state is gone. The upload progress bar resets. But the chunks already on your server are still there. That's the entire point.

To give users a seamless experience after refresh:

  1. Persist the file identifier and filename to localStorage or sessionStorage when the upload starts.
  2. On page load, check for incomplete uploads. If found, prompt the user: "You have an unfinished upload. Resume?"
  3. Have the user re-select the file (browsers won't let you programmatically access files from a previous session for security reasons).
  4. Initialize Resumable.js with testChunks: true. The library probes the server and skips completed chunks automatically.

The user sees a progress bar that jumps forward to where they left off. It feels like magic, but it's just good bookkeeping.
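The bookkeeping side of steps 1–2 can be sketched as a storage scan. It's written against a plain Map so it runs anywhere; in the browser you'd iterate `localStorage` keys the same way, and the `upload_` prefix matches the earlier localStorage snippet:

```javascript
// Find identifiers of uploads that started but never fired fileSuccess.
// `storage` is any iterable of [key, value] pairs, e.g. a Map.
function findIncompleteUploads(storage) {
  const pending = [];
  for (const [key, value] of storage) {
    if (key.startsWith('upload_') && value === 'in-progress') {
      pending.push(key.slice('upload_'.length));
    }
  }
  return pending;
}

const storage = new Map([
  ['upload_abc123', 'in-progress'],
  ['theme', 'dark'],
]);
console.log(findIncompleteUploads(storage)); // ['abc123']
```

Each identifier returned drives one "Resume?" prompt; once the user re-selects the file, testChunks does the rest.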

permanentErrors Configuration

Not every error deserves a retry. The permanentErrors config tells Resumable.js which HTTP status codes should immediately halt the upload:

const r = new Resumable({
  permanentErrors: [400, 401, 403, 404, 409, 415, 500],
  // ...
});

The default includes [404, 409, 415, 500]. Consider adding:

  • 401 if your auth tokens can expire mid-upload. Retrying with an expired token is futile—catch the error, refresh the token, and restart.
  • 403 if authorization failures are permanent for that resource.
  • 400 if your server uses it for validation errors that won't change on retry.

Be cautious with 500. Some servers return 500 for transient issues (database timeout, temporary disk full). If yours does, leave it out of permanentErrors and let the retry logic handle it.

Practical Retry Tuning

After years of tuning upload systems, here's what works:

  • Start with 3 retries and 2-second intervals. Handles 90% of transient failures.
  • Implement exponential backoff with jitter for any deployment serving more than a few hundred concurrent users.
  • Monitor retry rates. If more than 5% of chunks require retries, investigate whether your server is the bottleneck or your chunk size is too large for your user base's network conditions.
  • Set a total upload timeout at the application level. If a file hasn't completed in a reasonable window (based on file size and expected throughput), surface an error rather than letting it retry indefinitely.
  • Log retry events server-side. Correlate them with server metrics. A spike in retries often precedes a server-side incident.
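For the application-level timeout in the list above, one approach is to derive a deadline from file size and a pessimistic throughput floor. The constants here (50 KB/s floor, 60 s grace, 2-minute minimum) are assumptions to tune per deployment:

```javascript
// Deadline for a whole upload: size divided by the worst throughput you
// still consider healthy, plus a fixed grace period, with a floor so
// tiny files aren't penalized by the grace math.
function uploadTimeoutMs(fileBytes, minBytesPerSec = 50 * 1024, graceMs = 60000) {
  const transferMs = (fileBytes / minBytesPerSec) * 1000;
  return Math.max(transferMs + graceMs, 120000);
}

// A 100 MB file at a 50 KB/s floor gets roughly 35 minutes before the
// app gives up and surfaces an error instead of retrying forever.
```

Pair the deadline with a timer that cancels the Resumable instance and shows a clear error when it fires.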

The goal is simple: uploads should survive every common interruption without the user noticing, and fail clearly for problems that can't be retried away. Resumable.js gives you the primitives. Your configuration turns them into a resilient system.