Upload endpoints are among the most resource-intensive routes in any web application—each request carries a multi-megabyte body that hits disk I/O, burns CPU during validation, and holds a connection open far longer than a typical API call. Without rate limiting, a handful of aggressive clients can saturate your upload pipeline and starve everyone else. This page walks through per-user and global rate limiting strategies, how Resumable.js interacts with 429 responses, client-side throttling with simultaneousUploads, and the server-side architectures that keep upload infrastructure healthy under load. For more infrastructure guidance, see the ops hub.
## Why Upload Endpoints Need Special Treatment
A standard API endpoint handles a JSON payload of a few kilobytes and returns in under 100 milliseconds. An upload endpoint receives a 2 MB chunk, writes it to temporary storage, validates metadata, and returns an acknowledgment—a process that might take 500 milliseconds to several seconds depending on your storage backend. That's 10–50x more resource consumption per request.
Now multiply by concurrent users. Ten users uploading simultaneously with simultaneousUploads: 3 means 30 active upload connections. Each one is writing to disk, holding a database or Redis lock for deduplication checks, and consuming a worker thread or event loop slot. Without limits, a burst of 100 users doing the same thing creates 300 concurrent upload requests that will overwhelm most single-server deployments.
Rate limiting isn't about being stingy with resources. It's about ensuring predictable performance for every user.
## Rate Limiting Algorithms
Two algorithms dominate production rate limiting for upload endpoints. Choosing between them depends on whether you want to permit short bursts or enforce strict throughput caps.
### Token Bucket
The token bucket fills at a steady rate (say, 10 tokens per minute) and holds a maximum number of tokens (say, 20). Each request consumes one token. If the bucket is empty, the request is rejected. This allows short bursts—a user can fire 20 chunk uploads in rapid succession—but sustained throughput is capped at the fill rate.
For upload endpoints, token bucket works well because Resumable.js naturally sends chunks in bursts (governed by simultaneousUploads) followed by brief pauses between files.
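The refill-and-consume behavior described above can be sketched in a few lines. This is an in-memory, per-process illustration only; a real deployment would keep the bucket state in Redis so every server sees the same counts. The class name and parameters are illustrative:

```javascript
// Minimal in-memory token bucket: refills continuously at a fixed rate,
// allows bursts up to `capacity`, rejects when the bucket is empty.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;            // maximum tokens (burst allowance)
    this.tokens = capacity;              // start full
    this.refillPerSecond = refillPerSecond;
    this.lastRefill = now;
  }

  // Refill based on elapsed time, then try to consume one token.
  // Returns true if the request is allowed, false if it should get a 429.
  tryConsume(now = Date.now()) {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With `capacity: 20` and `refillPerSecond: 10 / 60`, a client can burst 20 chunk uploads immediately, then sustain 10 per minute, matching the example rates above.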
### Sliding Window
A sliding window counter tracks requests over a rolling time period. If a user sends more than N requests in the last M seconds, subsequent requests are rejected until old entries fall out of the window. This provides smoother rate enforcement with no burst allowance.
| Algorithm | Burst tolerance | Implementation complexity | Best for |
|---|---|---|---|
| Token bucket | High | Medium (need atomic counter + timestamp) | Upload endpoints with bursty traffic |
| Sliding window | None | Medium (sorted set or ring buffer) | Strict per-user fairness |
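As a minimal illustration of the sliding-window algorithm, a per-process log of timestamps works; the Redis-backed version later on this page is the multi-server equivalent. The class name and parameters here are illustrative:

```javascript
// Minimal in-memory sliding window log: keep timestamps of accepted
// requests, reject once the window already holds `limit` entries.
class SlidingWindow {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.hits = [];                      // timestamps of accepted requests
  }

  // Returns true if the request is allowed, false if it should get a 429.
  tryRequest(now = Date.now()) {
    const cutoff = now - this.windowMs;
    // Drop entries that have fallen out of the rolling window.
    while (this.hits.length && this.hits[0] <= cutoff) this.hits.shift();
    if (this.hits.length < this.limit) {
      this.hits.push(now);
      return true;
    }
    return false;
  }
}
```

Note the absence of any burst allowance: the Nth+1 request inside the window is rejected no matter how it is spaced.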
## Client-Side Throttling with simultaneousUploads
Before requests even reach your server, Resumable.js provides a client-side throttle:
```javascript
const r = new Resumable({
  target: '/api/upload',
  chunkSize: 2 * 1024 * 1024,
  simultaneousUploads: 3,
});
```
This setting caps the number of in-flight chunk requests per Resumable instance. Setting it to 3 means the browser never sends more than 3 concurrent chunk POSTs. This is your first line of defense—it keeps well-behaved clients from overwhelming the server.
But here's the thing: simultaneousUploads is a client-side setting. A malicious client can ignore it entirely. Anyone with browser DevTools can instantiate multiple Resumable instances or bypass the library altogether. Server-side rate limiting is mandatory; client-side throttling is just a courtesy that improves the experience for legitimate users.
## Server-Side 429 Responses
When your rate limiter rejects a request, respond with HTTP 429 Too Many Requests and include a Retry-After header:
```
HTTP/1.1 429 Too Many Requests
Retry-After: 5
Content-Type: application/json

{"error": "Rate limit exceeded", "retryAfter": 5}
```
Resumable.js treats any status outside its successStatuses list (200, 201, and 202 by default) as a failure, and 429 is not in its permanentErrors list (400, 404, 415, 500, 501), so a rate-limited chunk is retried after chunkRetryInterval milliseconds. If your Retry-After header suggests 5 seconds and your chunkRetryInterval is set to 5000, the timing aligns naturally. If the retry interval is shorter, the retry may hit the rate limit again and burn another attempt; Resumable.js retries at a fixed interval rather than backing off exponentially, so set chunkRetryInterval at least as long as your typical Retry-After value.
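To keep the 429 payload consistent across routes, a small helper can assemble the status, headers, and body in one place. This is a framework-agnostic sketch; rateLimitResponse is a hypothetical name, and wiring it into your router (e.g. Express's res.status().set().json()) depends on your stack:

```javascript
// Build a 429 response descriptor from a retry delay in seconds, so the
// Retry-After header and the JSON body can never drift apart.
function rateLimitResponse(retryAfterSeconds) {
  return {
    status: 429,
    headers: {
      'Retry-After': String(retryAfterSeconds),
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ error: 'Rate limit exceeded', retryAfter: retryAfterSeconds }),
  };
}
```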
### Tuning maxChunkRetries for rate-limited environments
```javascript
const r = new Resumable({
  target: '/api/upload',
  chunkSize: 2 * 1024 * 1024,
  simultaneousUploads: 2,
  maxChunkRetries: 10,
  chunkRetryInterval: 5000,
});
```
In environments with aggressive rate limiting, increase maxChunkRetries beyond the default. A chunk that gets rate-limited three times in a row isn't failing because of corruption or server errors—it just needs more time. Setting maxChunkRetries to 10 with a 5-second interval means Resumable.js will keep trying for nearly a minute before giving up on a single chunk.
## Nginx Rate Limiting
Nginx's limit_req module provides a performant, battle-tested rate limiter at the reverse proxy layer:
```nginx
http {
    limit_req_zone $binary_remote_addr zone=upload:10m rate=10r/s;

    server {
        location /api/upload {
            limit_req zone=upload burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://upload_backend;
        }
    }
}
```
This configuration allows 10 requests per second per IP address with a burst buffer of 20. The nodelay parameter processes burst requests immediately rather than queuing them (which would add latency). The limit_req_status 429 ensures rejected requests get a proper 429 status code instead of Nginx's default 503.
### Per-user vs. per-IP limiting
$binary_remote_addr limits by IP address, which works for most deployments. But if your users sit behind a corporate NAT or shared proxy, hundreds of users share one IP. In that case, use a custom variable based on an authentication token or session identifier:
```nginx
map $http_authorization $limit_key {
    default     $binary_remote_addr;
    "~Bearer "  $http_authorization;
}

limit_req_zone $limit_key zone=upload:20m rate=10r/s;
```
## Application-Level Rate Limiting
For more granular control—rate limits that vary by user tier, file type, or upload priority—implement limiting in your application code. A Redis-backed sliding window is the standard approach:
```javascript
async function checkRateLimit(userId, limit, windowSeconds) {
  const key = `upload_rate:${userId}`;
  const now = Date.now();
  const windowStart = now - windowSeconds * 1000;

  // Atomically: evict expired entries, record this request, count the window.
  const pipe = redis.pipeline();
  pipe.zremrangebyscore(key, 0, windowStart);      // drop entries outside the window
  pipe.zadd(key, now, `${now}-${Math.random()}`);  // unique member per request
  pipe.zcard(key);                                 // count, including this request
  pipe.expire(key, windowSeconds);                 // garbage-collect idle keys
  const results = await pipe.exec();

  const requestCount = results[2][1];              // ioredis returns [err, value] pairs
  return requestCount <= limit;
}
```

Note that rejected requests are also recorded in the sorted set, so a client that keeps hammering the endpoint extends its own lockout—a deliberate penalty in this variant.
This approach lets you implement tiered limits: free users get 5 requests/second, paid users get 20. Or you can limit by total bytes uploaded per day rather than request count—something that request-count-based Nginx rules can't do.
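A byte-based daily quota can be sketched with a single Redis counter rather than a sorted set (ioredis-style API assumed; checkByteQuota and the key scheme are illustrative, and the small race on the first-write TTL is acceptable for a quota):

```javascript
// Sketch of a per-user daily byte quota backed by one Redis counter.
// Returns true if the chunk fits within the quota, false → respond 429.
async function checkByteQuota(redis, userId, chunkBytes, dailyLimitBytes) {
  const day = new Date().toISOString().slice(0, 10);  // e.g. "2024-05-01"
  const key = `upload_bytes:${userId}:${day}`;
  const total = await redis.incrby(key, chunkBytes);  // atomic add, returns new total
  if (total === chunkBytes) {
    await redis.expire(key, 86400);                   // first write of the day sets the TTL
  }
  return total <= dailyLimitBytes;
}
```

Call it with the chunk's Content-Length before writing anything to storage, and layer it on top of the request-count limiter rather than replacing it.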
## Prioritization and Backpressure
Not all uploads are equal. A user re-uploading a failed chunk (resume scenario) should arguably get priority over a fresh upload starting from scratch. If your server is under load, consider a priority queue where resume requests bypass the rate limiter or get a higher limit.
Backpressure is the server's way of telling clients to slow down. Beyond 429 responses, you can implement progressive backpressure by artificially delaying responses under load:
```javascript
app.post('/api/upload', async (req, res) => {
  const load = getCurrentServerLoad();  // app-specific: normalized 0.0–1.0
  if (load > 0.8) {
    // Add artificial delay proportional to load (up to 1 second at full load)
    const delay = Math.floor((load - 0.8) * 5000);
    await new Promise(resolve => setTimeout(resolve, delay));
  }
  // Process chunk...
});
```
This slows clients down without outright rejecting them. Resumable.js doesn't notice the delay—it just sees a slower response—and the natural effect is reduced concurrency as clients wait longer between chunk completions.
Rate limiting for upload endpoints requires thinking in two dimensions: request count and resource consumption. A single upload request consumes orders of magnitude more resources than a typical GET. Design your limits accordingly, layer them from the edge (Nginx) through the application, and trust that well-configured simultaneousUploads on the client side will keep most traffic within reasonable bounds. The rate limiter is your safety net for everything else.
