NestJS Rate Limiting — 4 Production Strategies With Full Implementation Code

We did not have rate limiting on our first public API. We learned why that was a mistake when a client's integration loop — a while(true) that should have been a setTimeout — sent 50,000 requests in 12 minutes. The database fell over, the pager went off, and we spent the next two hours explaining to three other clients that "the API is slow" was because one integration had eaten the entire connection pool.
NestJS API rate limiting is how you make sure a single misbehaving client cannot take down the whole system. It is not optional on a production API, and it is not a feature you add after the outage. Every NestJS auth module needs one, every webhook endpoint needs one, and every public route definitely needs one.
This is a complete guide to NestJS API rate limiting with four strategies: simple in-memory for small apps, Redis-backed for distributed production, per-user tracking for authenticated endpoints, and plan-based limits tied to subscription tiers.
What Is NestJS API Rate Limiting?
Rate limiting controls how many HTTP requests a client can make within a specific time window. When the limit is exceeded, the server returns 429 Too Many Requests instead of processing the request. The goal is not to block legitimate users — it is to contain the blast radius when something goes wrong.
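As a rough illustration of the idea (not how @nestjs/throttler is implemented internally), a fixed-window counter fits in a few lines. The function name createFixedWindowLimiter and the injectable clock are assumptions made for this sketch.

```typescript
// Minimal fixed-window limiter: illustrative only, single process, in memory.
type WindowState = { count: number; resetAt: number };

function createFixedWindowLimiter(
  limit: number,
  ttlMs: number,
  // Injectable clock so the behaviour can be exercised without real waiting
  now: () => number = () => Date.now(),
) {
  const windows = new Map<string, WindowState>();

  // Returns the HTTP status the server would answer with for this client.
  return function hit(clientKey: string): 200 | 429 {
    const t = now();
    let state = windows.get(clientKey);

    // Start a new window on first contact or once the old one has expired
    if (!state || t >= state.resetAt) {
      state = { count: 0, resetAt: t + ttlMs };
      windows.set(clientKey, state);
    }

    state.count += 1;
    return state.count > limit ? 429 : 200;
  };
}

// Usage: 3 requests per 60 seconds, driven by a fake clock
let fakeTime = 0;
const hit = createFixedWindowLimiter(3, 60_000, () => fakeTime);
const statuses = [hit('client-a'), hit('client-a'), hit('client-a'), hit('client-a')];

fakeTime = 60_000; // the window has expired
const afterReset = hit('client-a');
```

The fourth request inside the window is answered with 429, and the first request after the window expires is served again.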
The standard approach uses the @nestjs/throttler package. You define a time window (TTL) and a request limit, and NestJS tracks how many requests each client has made within that window. The simplest configuration looks like this:

// app.module.ts: basic global rate limiting
import { Module } from '@nestjs/common';
import { ThrottlerModule, ThrottlerGuard } from '@nestjs/throttler';
import { APP_GUARD } from '@nestjs/core';

@Module({
  imports: [
    ThrottlerModule.forRoot([{
      ttl: 60000, // 60 seconds
      limit: 10, // 10 requests per 60 seconds
    }]),
  ],
  providers: [
    {
      provide: APP_GUARD,
      useClass: ThrottlerGuard,
    },
  ],
})
export class AppModule {}

This is the starting point. It works. But it is also the same configuration every tutorial shows you and then stops, which is roughly where the useful part should begin.
Strategy 1: In-Memory Rate Limiting for Small Apps
For a single-instance deployment, in-memory rate limiting is fine. The @nestjs/throttler package stores counters in a Map by default. No Redis, no external dependency, no infrastructure to maintain.

// app.module.ts: in-memory rate limiting
ThrottlerModule.forRoot([{
  name: 'short',
  ttl: 1000, // 1 second
  limit: 3, // 3 requests per second
}, {
  name: 'medium',
  ttl: 10000, // 10 seconds
  limit: 20, // 20 requests per 10 seconds
}, {
  name: 'long',
  ttl: 60000, // 1 minute
  limit: 100, // 100 requests per minute
}])

Multiple throttler definitions let you enforce different limits simultaneously: a request must pass all three tiers to proceed. This catches burst abuse (the short window) and sustained scraping (the long window) with a single configuration.
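To make the "must pass all tiers" rule concrete, here is a small standalone sketch, separate from the library, that counts each request against several windows at once. The tier values mirror the configuration above; violatedTiers and the explicit timestamps are hypothetical names and inputs chosen for illustration.

```typescript
type Tier = { name: string; ttlMs: number; limit: number };

const tiers: Tier[] = [
  { name: 'short', ttlMs: 1_000, limit: 3 },
  { name: 'medium', ttlMs: 10_000, limit: 20 },
  { name: 'long', ttlMs: 60_000, limit: 100 },
];

// One fixed-window counter per tier and client
const counters = new Map<string, { count: number; resetAt: number }>();

// Counts the request against every tier and returns the names of the tiers it violates.
// An empty array means the request passes all tiers.
function violatedTiers(client: string, nowMs: number): string[] {
  const violated: string[] = [];
  for (const tier of tiers) {
    const key = `${tier.name}:${client}`;
    let state = counters.get(key);
    if (!state || nowMs >= state.resetAt) {
      state = { count: 0, resetAt: nowMs + tier.ttlMs };
      counters.set(key, state);
    }
    state.count += 1;
    if (state.count > tier.limit) violated.push(tier.name);
  }
  return violated;
}

// A burst of 4 requests at the same instant trips only the short tier
const burst = [0, 0, 0, 0].map((t) => violatedTiers('client-a', t));

// One request every 400 ms stays under the short tier but exhausts the medium tier
const steady = Array.from({ length: 21 }, (_, i) => violatedTiers('client-b', i * 400));
const firstSteadyViolation = steady.findIndex((v) => v.length > 0);
```

The burst is stopped by the short window on its fourth request, while the steady client is stopped by the medium window on its twenty-first request, which is the division of labour the tiers are meant to provide.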
The per-route override decorator gives you granular control:

// auth.controller.ts: stricter limits on sensitive routes
import { Controller, Post } from '@nestjs/common';
import { Throttle } from '@nestjs/throttler';

// The key ('default') is the throttler name. 'default' applies to an unnamed
// throttler; with named throttlers such as 'short', use that name instead.
@Controller('auth')
export class AuthController {
  @Throttle({ default: { limit: 3, ttl: 60000 } })
  @Post('login')
  async login() {
    // 3 login attempts per minute: brute force protection
  }

  @Throttle({ default: { limit: 5, ttl: 60000 } })
  @Post('register')
  async register() {
    // 5 registration attempts per minute
  }
}

In-memory works well for single-server setups. The problem: if you scale to multiple instances, each server has its own counter, and a client can exhaust the limit on one server and just hit the next one.

Strategy 2: Redis-Based Rate Limiting for Production
Once you have more than one server instance, in-memory rate limiting becomes theater. An attacker behind a load balancer can rotate through your instances and stay under each server's individual limit.
Redis solves this by centralising the counter. Install a Redis storage backend. The example below uses the community package @nest-lab/throttler-storage-redis; if you use a different storage package, adjust the install command and the import to match:

npm install @nest-lab/throttler-storage-redis ioredis

Then configure the throttler to use Redis:

// app.module.ts: Redis-backed rate limiting
import { Module } from '@nestjs/common';
import { APP_GUARD } from '@nestjs/core';
import { ThrottlerModule, ThrottlerGuard } from '@nestjs/throttler';
import { ThrottlerStorageRedisService } from '@nest-lab/throttler-storage-redis';
import Redis from 'ioredis';

@Module({
  imports: [
    ThrottlerModule.forRoot({
      throttlers: [{
        name: 'api',
        ttl: 60000,
        limit: 100,
      }],
      storage: new ThrottlerStorageRedisService(new Redis({
        host: process.env.REDIS_HOST,
        port: parseInt(process.env.REDIS_PORT ?? '6379', 10),
      })),
    }),
  ],
  providers: [
    { provide: APP_GUARD, useClass: ThrottlerGuard },
  ],
})
export class AppModule {}

Now the counter is shared across every server instance. A client hitting 100 requests on one server is at 100 on every server. The Redis keys auto-expire using TTL, so there is no cleanup to manage.
Redis is the standard for production rate limiting, but it adds latency — roughly 0.5–1ms per check. For most APIs that is noise. For sub-millisecond-critical paths, evaluate whether the latency budget covers it.
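The difference between per-instance and shared counters can be shown without a running Redis. The sketch below assumes a tiny CounterStore interface (a stand-in for an atomic increment plus key expiry) and an in-memory fake of it. Both names are hypothetical, the fake is synchronous for brevity whereas a real Redis client is asynchronous, and a real deployment would rely on the storage package rather than this code.

```typescript
// Stand-in for a shared store: increments a key and starts its expiry on first use.
interface CounterStore {
  increment(key: string, ttlMs: number, nowMs: number): number;
}

// In-memory fake used only to illustrate the idea
function createFakeStore(): CounterStore {
  const entries = new Map<string, { count: number; expiresAt: number }>();
  return {
    increment(key, ttlMs, nowMs) {
      const current = entries.get(key);
      if (!current || nowMs >= current.expiresAt) {
        entries.set(key, { count: 1, expiresAt: nowMs + ttlMs });
        return 1;
      }
      current.count += 1;
      return current.count;
    },
  };
}

type Instance = (client: string, nowMs: number) => boolean;

// An "instance" is a request handler bound to some store
function createInstance(store: CounterStore, limit: number, ttlMs: number): Instance {
  return (client, nowMs) => store.increment(`rl:${client}`, ttlMs, nowMs) <= limit;
}

// Sends `total` requests, alternating between instances like a round-robin load balancer
function countAllowed(instances: Instance[], total: number): number {
  let allowed = 0;
  for (let i = 0; i < total; i++) {
    const instance = instances[i % instances.length];
    if (instance('client-a', 0)) allowed++;
  }
  return allowed;
}

// Separate stores: each instance enforces the limit on its own
const separate = countAllowed(
  [createInstance(createFakeStore(), 100, 60_000), createInstance(createFakeStore(), 100, 60_000)],
  300,
);

// Shared store: both instances see the same counter
const sharedStore = createFakeStore();
const shared = countAllowed(
  [createInstance(sharedStore, 100, 60_000), createInstance(sharedStore, 100, 60_000)],
  300,
);
```

With two instances and separate stores, 200 of the 300 requests get through; with the shared store, only the intended 100 do.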
Strategy 3: Per-User Rate Limiting
The default ThrottlerGuard identifies clients by IP address. That is the right default for unauthenticated routes, but wrong for authenticated ones.
An office with 50 people behind a single NAT IP should not share one rate limit. A user switching between mobile data and WiFi should not get a fresh limit on every network hop. For authenticated endpoints, track by user ID or API key, not IP.
Build a custom guard:

// guards/api-rate-limit.guard.ts
import { Injectable } from '@nestjs/common';
import { ThrottlerGuard } from '@nestjs/throttler';

@Injectable()
export class ApiRateLimitGuard extends ThrottlerGuard {
  protected async getTracker(req: Record<string, any>): Promise<string> {
    // Priority: authenticated user > API key > IP
    if (req.user?.id) {
      return `user:${req.user.id}`;
    }

    if (req.headers['x-api-key']) {
      return `apikey:${req.headers['x-api-key']}`;
    }

    return `ip:${req.ip}`;
  }
}

Register the custom guard instead of the default one:

// app.module.ts (providers entry)
{
  provide: APP_GUARD,
  useClass: ApiRateLimitGuard,
}

Now authenticated users are tracked by their user ID, API consumers by their key, and anonymous visitors by IP. The same user gets the same limit on their phone, their laptop, and across network changes.
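You can take this further with different limits per tracking type. The fragment below lives inside ApiRateLimitGuard and assumes the @nestjs/throttler v6 handleRequest signature, which receives a single ThrottlerRequest object (also import ThrottlerOptions and ThrottlerRequest from '@nestjs/throttler'):

// Same tracker as before; the prefix identifies the client type
protected async getTracker(req: Record<string, any>): Promise<string> {
  if (req.user?.id) {
    return `user:${req.user.id}`;
  }
  if (req.headers['x-api-key']) {
    return `apikey:${req.headers['x-api-key']}`;
  }
  // Unauthenticated: stricter limit
  return `ip:${req.ip}`;
}

// Helper (not a library hook): choose [limit, ttl] per tracker type
protected async getLimitAndTtl(
  req: Record<string, any>,
  throttler: ThrottlerOptions,
): Promise<[number, number]> {
  const tracker = await this.getTracker(req);

  if (tracker.startsWith('user:')) {
    return [200, 60000]; // 200 req/min for authenticated users
  }

  if (tracker.startsWith('apikey:')) {
    return [1000, 60000]; // 1000 req/min for API consumers
  }

  return [10, 60000]; // 10 req/min for anonymous visitors
}

// Swap in the per-type limit, then let the stock guard do the counting
protected async handleRequest(requestProps: ThrottlerRequest): Promise<boolean> {
  const req = requestProps.context.switchToHttp().getRequest();
  const [limit, ttl] = await this.getLimitAndTtl(req, requestProps.throttler);
  return super.handleRequest({ ...requestProps, limit, ttl });
}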
Strategy 4: Plan-Based Rate Limiting
For a SaaS product, rate limits should follow the subscription plan. Free tier users get 10 requests per minute. Pro users get 100. Enterprise users get custom limits negotiated in a sales call. Plan-based limits are what separate production-grade throttling from toy implementations.
This builds on the custom guard from Strategy 3 with one addition: read the user's plan from the request context.

// guards/plan-rate-limit.guard.ts
// Written against the @nestjs/throttler v6 handleRequest signature
import { Injectable } from '@nestjs/common';
import { ThrottlerGuard, ThrottlerOptions, ThrottlerRequest } from '@nestjs/throttler';

@Injectable()
export class PlanRateLimitGuard extends ThrottlerGuard {
  // Helper (not a library hook): pick [limit, ttl] from the user's plan
  protected async getLimitAndTtl(
    req: Record<string, any>,
    throttler: ThrottlerOptions,
  ): Promise<[number, number]> {
    // User is attached by auth middleware
    const plan = req.user?.plan ?? 'free';

    const limits: Record<string, [number, number]> = {
      free: [10, 60000], // 10 req per minute
      pro: [100, 60000], // 100 req per minute
      enterprise: [1000, 60000], // 1000 req per minute
    };

    return limits[plan] ?? limits.free;
  }

  // Swap in the plan's limit, then let the stock guard do the counting
  protected async handleRequest(requestProps: ThrottlerRequest): Promise<boolean> {
    const req = requestProps.context.switchToHttp().getRequest();
    const [limit, ttl] = await this.getLimitAndTtl(req, requestProps.throttler);
    return super.handleRequest({ ...requestProps, limit, ttl });
  }

  protected async getTracker(req: Record<string, any>): Promise<string> {
    // Track by user ID so the same user gets the same limit everywhere
    return req.user?.id
      ? `plan:${req.user.plan ?? 'free'}:user:${req.user.id}`
      : `ip:${req.ip}`;
  }
}

Register it in the module:

{
  provide: APP_GUARD,
  useClass: PlanRateLimitGuard,
}

The key design: the tracker includes the plan so that when a user upgrades, their old counters are on a different key and they immediately get the new limit. No cache-flush dance required.
For enterprise clients with custom limits, store the limit in a database column and read it from the user object rather than hard-coding a map.
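A minimal sketch of that lookup, kept free of framework code so it can be unit tested on its own. The customLimitPerMinute field stands in for the database column and is a hypothetical name, as are resolveLimit and resolveTrackerKey; the plan values are the ones used above.

```typescript
type Plan = 'free' | 'pro' | 'enterprise';

interface RateLimitedUser {
  id: string;
  plan: Plan;
  // Hypothetical column: per-customer limit negotiated for enterprise accounts
  customLimitPerMinute?: number | null;
}

const PLAN_LIMITS: Record<Plan, number> = { free: 10, pro: 100, enterprise: 1000 };
const ONE_MINUTE_MS = 60_000;

// Returns [limit, ttlMs]; a stored custom limit wins over the plan default.
function resolveLimit(user?: RateLimitedUser): [number, number] {
  if (!user) return [PLAN_LIMITS.free, ONE_MINUTE_MS];

  const custom = user.customLimitPerMinute;
  if (user.plan === 'enterprise' && custom != null && custom > 0) {
    return [custom, ONE_MINUTE_MS];
  }
  return [PLAN_LIMITS[user.plan], ONE_MINUTE_MS];
}

// The tracker embeds the plan, so an upgrade moves the user to a fresh counter key.
function resolveTrackerKey(user: RateLimitedUser | undefined, ip: string): string {
  return user ? `plan:${user.plan}:user:${user.id}` : `ip:${ip}`;
}

// Usage
const acme: RateLimitedUser = { id: 'u1', plan: 'enterprise', customLimitPerMinute: 5000 };
const acmeLimit = resolveLimit(acme);
const keyBeforeUpgrade = resolveTrackerKey({ id: 'u2', plan: 'free' }, '203.0.113.7');
const keyAfterUpgrade = resolveTrackerKey({ id: 'u2', plan: 'pro' }, '203.0.113.7');
```

Keeping the lookup in a plain function means the guard only has to call it, and changing a negotiated limit becomes a data update rather than a deploy.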
Sliding Window vs Fixed Window
The @nestjs/throttler package defaults to a fixed window algorithm. Fixed window resets the counter at the end of each TTL period. If your limit is 100 requests per minute, all 100 can arrive in the first second of the window, then the API is idle for 59 seconds. That burst can spike resource usage and is exploitable.
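The worst case for a fixed window is a burst that straddles a window boundary. The simulation below is a standalone sketch (simulateFixedWindow is a hypothetical helper, and it assumes windows aligned to clock minutes) showing that a limit of 100 per minute can admit 200 requests in about two seconds.

```typescript
// Fixed window aligned to clock boundaries: window index = floor(time / ttl).
// Returns the timestamps of the requests that were accepted.
function simulateFixedWindow(limit: number, ttlMs: number, requestTimesMs: number[]): number[] {
  const counts = new Map<number, number>();
  const accepted: number[] = [];

  for (const t of requestTimesMs) {
    const windowIndex = Math.floor(t / ttlMs);
    const used = counts.get(windowIndex) ?? 0;
    if (used < limit) {
      counts.set(windowIndex, used + 1);
      accepted.push(t);
    }
  }
  return accepted;
}

// 100 requests in the last second of minute 0, 100 more in the first second of minute 1
const lateBurst = Array.from({ length: 100 }, (_, i) => 59_000 + i * 10);
const earlyBurst = Array.from({ length: 100 }, (_, i) => 60_000 + i * 10);
const accepted = simulateFixedWindow(100, 60_000, [...lateBurst, ...earlyBurst]);

// Time between the first and the last accepted request
const spanMs = accepted[accepted.length - 1] - accepted[0];
```

All 200 requests are accepted because each half lands in a different window, even though they arrive within 1,990 ms of each other.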
Sliding window evaluates the request count over a rolling time frame: the last 60 seconds, not the current clock minute. Whether your storage backend offers a rolling window out of the box depends on the package and version, so check its documentation before relying on one.
For a custom sliding window implementation with Redis:

// Written against the @nestjs/throttler v6 handleRequest signature
import { Injectable } from '@nestjs/common';
import { Reflector } from '@nestjs/core';
import {
  InjectThrottlerOptions,
  InjectThrottlerStorage,
  ThrottlerException,
  ThrottlerGuard,
  ThrottlerModuleOptions,
  ThrottlerRequest,
  ThrottlerStorage,
} from '@nestjs/throttler';
import { InjectRedis } from '@nestjs-modules/ioredis';
import Redis from 'ioredis';

@Injectable()
export class SlidingWindowGuard extends ThrottlerGuard {
  constructor(
    // ThrottlerGuard's own dependencies must be passed through to super()
    @InjectThrottlerOptions() options: ThrottlerModuleOptions,
    @InjectThrottlerStorage() storageService: ThrottlerStorage,
    reflector: Reflector,
    @InjectRedis() private readonly redis: Redis,
  ) {
    super(options, storageService, reflector);
  }

  protected async handleRequest(requestProps: ThrottlerRequest): Promise<boolean> {
    const { context, limit, ttl } = requestProps;
    const req = context.switchToHttp().getRequest();
    const key = `sliding:${await this.getTracker(req)}`;

    const now = Date.now();
    const windowStart = now - ttl;

    // These calls are not atomic; wrap them in MULTI or a Lua script
    // if concurrent requests must never slip past the limit.

    // Remove entries outside the window
    await this.redis.zremrangebyscore(key, 0, windowStart);

    // Count entries in the current window
    const count = await this.redis.zcard(key);

    if (count >= limit) {
      throw new ThrottlerException();
    }

    // Add current request with timestamp score
    await this.redis.zadd(key, now, `${now}:${Math.random()}`);
    await this.redis.expire(key, Math.ceil(ttl / 1000));

    return true;
  }
}

Sliding window is fairer but costs more per check (the sorted set operations are O(log N)) and, across multiple instances, needs a shared store such as Redis. For most SaaS endpoints, fixed window is fine. Use sliding window on the endpoints where burst behaviour actually hurts: typically billing webhooks, payment confirmation, and rate-limited third-party integrations.
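The algorithm itself is easier to see without Redis in the way. The following in-memory sketch keeps one timestamp log per client and mirrors the sorted-set steps; createSlidingWindowLimiter is a hypothetical name, and the code is meant for understanding and unit tests on a single process, not as a replacement for a shared store.

```typescript
// Sliding-window log kept in memory: one array of timestamps per client.
function createSlidingWindowLimiter(limit: number, ttlMs: number) {
  const log = new Map<string, number[]>();

  return function allow(client: string, nowMs: number): boolean {
    // Keep only timestamps newer than (now - ttl), like ZREMRANGEBYSCORE
    const recent = (log.get(client) ?? []).filter((t) => t > nowMs - ttlMs);

    // Like ZCARD: how many requests fall inside the rolling window
    if (recent.length >= limit) {
      log.set(client, recent);
      return false; // over the limit: the rejected request is not recorded
    }

    recent.push(nowMs); // like ZADD with the timestamp as score
    log.set(client, recent);
    return true;
  };
}

// A burst straddling a minute boundary: 100 requests just before it, 100 just after
const allow = createSlidingWindowLimiter(100, 60_000);
const times = [
  ...Array.from({ length: 100 }, (_, i) => 59_000 + i * 10),
  ...Array.from({ length: 100 }, (_, i) => 60_000 + i * 10),
];
const allowedCount = times.filter((t) => allow('client-a', t)).length;

// Sixty seconds after the first accepted request, capacity starts to free up
const allowedLater = allow('client-a', 119_000);
```

Only the first 100 requests of the burst are accepted because the second half still sees the first half inside its rolling 60 seconds. Capacity returns gradually as old timestamps age out rather than all at once.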
HTTP Headers for Rate-Limited Responses
Rate-limited responses should tell the client exactly what happened and when they can retry. The standard headers are:
X-RateLimit-Limit: maximum requests allowed in the window
X-RateLimit-Remaining: requests remaining in the current window
X-RateLimit-Reset: timestamp (Unix) when the window resets
Retry-After: seconds until the client can retry (on 429 only)
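In current releases the default ThrottlerGuard already sets X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset (as seconds remaining), plus Retry-After when a client is blocked. Override the ThrottlerGuard when you need different semantics, for example an absolute Unix timestamp in X-RateLimit-Reset: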

// guards/rate-limit-with-headers.guard.ts
// Written against the @nestjs/throttler v6 handleRequest signature
import { Injectable } from '@nestjs/common';
import { ThrottlerGuard, ThrottlerRequest } from '@nestjs/throttler';

@Injectable()
export class RateLimitWithHeadersGuard extends ThrottlerGuard {
  protected async handleRequest(requestProps: ThrottlerRequest): Promise<boolean> {
    const { context, limit, ttl, throttler, blockDuration, generateKey } = requestProps;
    const http = context.switchToHttp();
    const res = http.getResponse();

    const name = throttler.name ?? 'default';
    const tracker = await this.getTracker(http.getRequest());
    const key = generateKey(context, tracker, name);
    const { totalHits, timeToExpire, isBlocked, timeToBlockExpire } =
      await this.storageService.increment(key, ttl, limit, blockDuration, name);

    // timeToExpire is reported in seconds by the built-in storage
    const nowSeconds = Math.ceil(Date.now() / 1000);
    res.header('X-RateLimit-Limit', limit.toString());
    res.header('X-RateLimit-Remaining', Math.max(0, limit - totalHits).toString());
    res.header('X-RateLimit-Reset', (nowSeconds + timeToExpire).toString());

    if (isBlocked) {
      res.header('Retry-After', timeToBlockExpire.toString());
      await this.throwThrottlingException(context, {
        limit, ttl, key, tracker, totalHits, timeToExpire, isBlocked, timeToBlockExpire,
      });
    }

    return true;
  }
}

These headers are what well-behaved clients use to back off without hitting the limit. GitHub's REST API is a widely cited example of the convention: it exposes x-ratelimit-limit, x-ratelimit-remaining, and x-ratelimit-reset on its responses.
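The header arithmetic is a common source of off-by-one and unit mistakes, so it can help to isolate it in a pure function and test it separately from the guard. This is a sketch under stated assumptions: buildRateLimitHeaders and RateLimitSnapshot are hypothetical names, and the time left in the window is taken to be in seconds.

```typescript
interface RateLimitSnapshot {
  limit: number;          // maximum requests per window
  totalHits: number;      // requests counted so far, including this one
  secondsToReset: number; // time left in the current window, in seconds
}

// Builds the header map; Retry-After is added only when the limit is exceeded.
function buildRateLimitHeaders(s: RateLimitSnapshot, nowMs: number): Record<string, string> {
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': String(s.limit),
    // Never report a negative remaining count
    'X-RateLimit-Remaining': String(Math.max(0, s.limit - s.totalHits)),
    // Absolute Unix timestamp (seconds) at which the window resets
    'X-RateLimit-Reset': String(Math.ceil(nowMs / 1000) + s.secondsToReset),
  };

  if (s.totalHits > s.limit) {
    headers['Retry-After'] = String(s.secondsToReset);
  }
  return headers;
}

// Usage with a fixed "now" so the output is reproducible
const nowMs = 1_700_000_000_000;
const withinLimit = buildRateLimitHeaders({ limit: 100, totalHits: 40, secondsToReset: 30 }, nowMs);
const overLimit = buildRateLimitHeaders({ limit: 100, totalHits: 101, secondsToReset: 30 }, nowMs);
```

Within the limit the function reports 60 requests remaining and no Retry-After; over the limit it reports 0 remaining and a Retry-After of 30 seconds.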
Handling Rate Limits on the Frontend
A complete rate limiting implementation covers the frontend too. Rate limiting on the backend is necessary but insufficient if the frontend does not handle 429 responses gracefully. A user hitting a rate limit and seeing a raw JSON error is a worse experience than seeing a friendly message with a retry countdown.
On the frontend, wrap your API client to handle 429:

// api-client.ts
import axios from 'axios';

// showRateLimitWarning is your own UI helper (toast or banner); it is not defined here.

const api = axios.create({ baseURL: '/api' });

api.interceptors.response.use(
  (response) => response,
  async (error) => {
    const config = error.config;

    // Retry a rate-limited request at most once
    if (error.response?.status === 429 && config && !config._retried) {
      config._retried = true;

      // Retry-After is in seconds; fall back to 5 seconds if it is missing
      const seconds = parseInt(error.response.headers['retry-after'], 10);
      const retryAfter = Number.isNaN(seconds) ? 5000 : seconds * 1000;

      // Show user-friendly toast or banner
      showRateLimitWarning(retryAfter);

      // Wait, then retry
      await new Promise((resolve) => setTimeout(resolve, retryAfter));
      return api.request(config);
    }

    return Promise.reject(error);
  },
);

The retry-after header tells the client exactly how long to wait. Respect it: polling faster than the retry-after interval makes the problem worse.
For interactive actions (form submissions, button clicks), disable the button and show a countdown. For background sync operations, queue the request and retry after the header-specified delay.
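Both the countdown and the background queue need the delay as a number. Retry-After may carry either a number of seconds or an HTTP date, so a small parser is worth having. The sketch below is illustrative; parseRetryAfter and its 5-second fallback are assumptions rather than part of any library.

```typescript
// Returns the delay in milliseconds, or the fallback when the header is missing or unparseable.
function parseRetryAfter(header: string | undefined, nowMs: number, fallbackMs = 5_000): number {
  if (!header) return fallbackMs;
  const value = header.trim();

  // Form 1: delay in whole seconds, e.g. "30"
  if (/^\d+$/.test(value)) return Number(value) * 1000;

  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(value);
  if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);

  return fallbackMs;
}

// Usage
const fromSeconds = parseRetryAfter('30', Date.now());
const fromDate = parseRetryAfter(
  'Wed, 21 Oct 2015 07:28:00 GMT',
  Date.parse('Wed, 21 Oct 2015 07:27:30 GMT'),
);
const fromMissing = parseRetryAfter(undefined, Date.now());
```

The same millisecond value can drive a disabled-button countdown for interactive actions and the delay of a queued retry for background work.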
Testing Your Rate Limiting
Rate limiting is one of those features that is easy to write and easy to break on the next deploy. Test it.

// rate-limiting.e2e-spec.ts
import * as request from 'supertest';

// createTestAppWithLowLimit(limit, ttlMs) is a project test helper (not shown here).
// It is assumed to return a fresh HTTP server that supertest can call directly.

describe('Rate Limiting', () => {
  let app: any;

  beforeEach(async () => {
    // Fresh instance per test so counters never leak between tests
    app = await createTestAppWithLowLimit(5, 60000);
  });

  it('should allow requests within the limit', async () => {
    for (let i = 0; i < 5; i++) {
      const res = await request(app).get('/public');
      expect(res.status).not.toBe(429);
    }
  });

  it('should block requests exceeding the limit', async () => {
    app = await createTestAppWithLowLimit(3, 60000);

    await request(app).get('/test'); // 1
    await request(app).get('/test'); // 2
    await request(app).get('/test'); // 3
    const blocked = await request(app).get('/test'); // 4: blocked

    expect(blocked.status).toBe(429);
  });

  it('should return correct rate limit headers', async () => {
    const res = await request(app).get('/public');

    expect(res.headers['x-ratelimit-limit']).toBeDefined();
    expect(res.headers['x-ratelimit-remaining']).toBeDefined();
    expect(res.headers['x-ratelimit-reset']).toBeDefined();
  });

  it('should reset the counter after TTL expires', async () => {
    app = await createTestAppWithLowLimit(1, 500); // 500ms TTL

    await request(app).get('/test'); // 1: allowed
    await request(app).get('/test'); // 2: blocked

    await new Promise((resolve) => setTimeout(resolve, 600));

    const res = await request(app).get('/test'); // should be allowed again
    expect(res.status).not.toBe(429);
  });

  it('should track by user ID for authenticated routes', async () => {
    const res1 = await request(app)
      .get('/profile')
      .set('Authorization', 'Bearer user-token-a');

    const res2 = await request(app)
      .get('/profile')
      .set('Authorization', 'Bearer user-token-b');

    // Each user's first request should report the same remaining count;
    // a shared counter would make the second value one lower
    expect(res1.headers['x-ratelimit-remaining']).toBe(
      res2.headers['x-ratelimit-remaining'],
    );
  });
});

Testing rate limiting requires resetting state between tests. For integration tests, create a fresh application instance with a custom TTL of a few hundred milliseconds so the tests do not take minutes.
Production Considerations
A few things that matter when rate limiting is live:
Trust the proxy. If your API runs behind a load balancer or reverse proxy (Nginx, Cloudflare, AWS ALB), the request IP will be the proxy's IP unless you trust the X-Forwarded-For header:

// main.ts
app.getHttpAdapter().getInstance().set('trust proxy', 1);

Skip health checks. Monitoring systems hitting /health should not be rate limited:

import { SkipThrottle } from '@nestjs/throttler';
import { Controller, Get } from '@nestjs/common';

@SkipThrottle()
@Controller('health')
export class HealthController {
  @Get()
  check() {
    return { status: 'ok' };
  }
}

Log rate limit events. When a client is rate limited, log it. A sudden spike in 429 responses across many clients might mean someone is scraping your API. A persistent 429 on a single key might mean a legitimate integration is broken:

// In your custom guard (v6 signature). Also import ExecutionContext, HttpException,
// HttpStatus, and Logger from '@nestjs/common' and ThrottlerLimitDetail from '@nestjs/throttler'.
private readonly logger = new Logger('RateLimit');

protected async throwThrottlingException(
  context: ExecutionContext,
  detail: ThrottlerLimitDetail,
): Promise<void> {
  const req = context.switchToHttp().getRequest();

  this.logger.warn(`Rate limit exceeded for ${detail.tracker} on ${req.path}`);

  throw new HttpException(
    {
      statusCode: 429,
      message: 'Too many requests. Please slow down.',
      retryAfter: detail.timeToExpire, // time left in the window, as reported by the storage
    },
    HttpStatus.TOO_MANY_REQUESTS,
  );
}

For more on securing your NestJS application, the official NestJS rate limiting documentation covers the base module configuration in detail. The @nestjs/throttler GitHub repository documents the community storage providers including Redis-backed storage.
If you use authentication in your NestJS app, our JWT authentication with refresh tokens guide shows how to attach user and plan data to every request — which you need for Strategies 3 and 4. The NestJS project structure guide explains how we organise guards and middleware in production modules.
Conclusion
The API that survived 50,000 requests in 12 minutes did not survive because the code was clever. It survived because we added rate limiting after the first outage, and every release since has been behind a throttle that prevents one bad integration from taking down the whole system.
The four NestJS API rate limiting strategies here are not mutually exclusive. Start with in-memory for your MVP. Add Redis when you scale past one server. Build per-user tracking when you add authentication. Graduate to plan-based limits when you have enough customers that the API pricing conversation becomes a boardroom topic.
Pick the strategy that matches where your API is today. If you are already past the "50,000 requests in 12 minutes" stage, skip straight to Redis and plan-based limits. That database fall-over was not fun the first time, and you should not have to live through your own version of it. If you would rather get the throttling right before the outage teaches you, we are happy to take a look.
Frequently Asked Questions
What is API rate limiting in NestJS?
API rate limiting in NestJS restricts how many requests a client can make within a given time window. It is implemented using the @nestjs/throttler package with a ThrottlerGuard applied globally or per route. Rate limiting prevents abuse, brute-force attacks, and server resource exhaustion by returning HTTP 429 when limits are exceeded.
How do I rate limit per user instead of per IP?
Create a custom guard that extends ThrottlerGuard and override the getTracker method to return the authenticated user's ID instead of the request IP. Register this custom guard as APP_GUARD in your module. This ensures rate limits follow the user across devices and networks.
What is the difference between fixed window and sliding window rate limiting?
A fixed window resets the request count at the end of each time period (e.g., exactly every 60 seconds), which can cause request bursts at window boundaries. A sliding window evaluates the request count over a rolling time frame (e.g., the last 60 seconds), providing smoother and fairer rate limiting. This guide treats the @nestjs/throttler default as a fixed window; a sliding window needs custom logic, typically backed by Redis.
How do I add rate limit headers to responses?
Access the ThrottlerRequest object inside a custom ThrottlerGuard by overriding the handleRequest method. Read the current request count and TTL from the throttler storage service, then set the X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers on the response.
How do I implement plan-based rate limits?
Use a custom ThrottlerGuard that reads the user's subscription tier from the request (set by your auth middleware) and returns a different limit based on the tier. Free tier users get stricter limits (e.g., 10 req/min), premium users get higher limits (e.g., 100 req/min), and enterprise users get custom limits.
