SaaS Webhook System — How to Build Outgoing Webhooks for Your Platform

We built our first webhook system in a weekend. It took about four hours to wire up: an event happened, the server POSTed to the registered URL, and if it failed, we logged it and moved on. The client who asked for webhooks was happy. Everything worked — until a client's endpoint went down at 2am and we kept hammering it, queueing failed deliveries in an in-memory array until the Node process ran out of memory and restarted. We lost 47 events in that restart. The client noticed. They asked where their webhooks went. We had an honest answer: "somewhere in the ether."
This is the post we needed before that weekend. Here is how to build a production-grade outgoing webhook system for a SaaS platform with BullMQ, HMAC signing, delivery logs, and a retry strategy that does not lose events when things go wrong.

What a SaaS Outgoing Webhook System Needs to Do
A SaaS outgoing webhook system takes events from your platform — invoice paid, user created, deployment finished — and delivers them as HTTP POST requests to URLs your customers register. In its simplest form, it is a synchronous HTTP call from application code. That works until your customer's endpoint takes five seconds to respond and your request thread is blocked. It works until their server is down and you have no retry. It works until you have 500 subscribers and the loop takes minutes.
A production system decouples event production from delivery using a queue. This is the pattern every large webhook provider uses: Stripe, GitHub, Slack, Twilio. When an event fires, you write it to a persistent queue and return immediately. A separate worker pool pulls events from the queue, attempts delivery, and handles results asynchronously. This buys you durability (events survive process crashes), backpressure protection (slow endpoints do not block producers), and replayability (failed deliveries can be retried without data loss).
Database Schema for Webhook State
Before any code, you need a data model that tracks endpoints, event subscriptions, and delivery attempts.
-- Registered webhook endpoints
CREATE TABLE webhook_endpoints (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id UUID NOT NULL REFERENCES tenants(id),
  url TEXT NOT NULL,
  secret TEXT NOT NULL, -- HMAC secret per endpoint
  events TEXT[] NOT NULL DEFAULT '{}', -- subscribed event types, '*' for all
  status TEXT NOT NULL DEFAULT 'active', -- active, paused, disabled
  consecutive_failures INT NOT NULL DEFAULT 0,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Events queued for delivery
CREATE TABLE webhook_events (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  event_type TEXT NOT NULL,
  payload JSONB NOT NULL,
  tenant_id UUID NOT NULL REFERENCES tenants(id),
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Delivery attempt log
CREATE TABLE webhook_deliveries (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  event_id UUID NOT NULL REFERENCES webhook_events(id),
  endpoint_id UUID NOT NULL REFERENCES webhook_endpoints(id),
  attempt INT NOT NULL DEFAULT 1,
  status TEXT NOT NULL DEFAULT 'queued', -- queued, in_flight, success, failed, dead_letter
  response_code INT,
  response_body TEXT,
  error_type TEXT, -- timeout, network, 4xx, 5xx
  next_attempt_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Indexes for the delivery worker
CREATE INDEX idx_deliveries_next_attempt ON webhook_deliveries(status, next_attempt_at)
  WHERE status = 'failed' OR status = 'queued';
CREATE INDEX idx_events_created ON webhook_events(created_at DESC);

The webhook_endpoints.secret field stores a unique HMAC secret per endpoint, generated when the customer registers a URL. Never show the full secret in the UI; mask it like a credit card: whsec_••••••••a1b2.
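The masking rule above can be sketched as a small display helper. This is a minimal illustration, not part of the original system: the function name `maskSecret` and the choice of eight bullet characters are assumptions based on the example string.

```typescript
// Hypothetical helper: mask a webhook secret for display, keeping the
// "whsec_" prefix and the last few characters visible.
function maskSecret(secret: string, visible = 4): string {
  const prefix = "whsec_";
  const hasPrefix = secret.startsWith(prefix);
  const body = hasPrefix ? secret.slice(prefix.length) : secret;
  // Show no tail at all when the secret is too short to hide anything.
  const tail = body.length > visible ? body.slice(-visible) : "";
  return `${hasPrefix ? prefix : ""}${"•".repeat(8)}${tail}`;
}

const masked = maskSecret("whsec_0123456789abcdefa1b2");
// masked is "whsec_••••••••a1b2"
```

The mask has a fixed width, so the UI does not leak the length of the underlying secret.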
Registering Webhook Endpoints
Customers need a way to tell your platform where to send events. The API endpoint is straightforward:
// apps/api/src/webhooks/webhooks.controller.ts
import { randomBytes } from 'crypto';
import { Controller, Post, Body, UseGuards } from '@nestjs/common';
import { WebhooksService } from './webhooks.service';
import { AuthGuard } from '../auth/auth.guard';
// Assumed project-local helpers: a DTO with `url` and `events`, and a
// parameter decorator that resolves the authenticated user.
import { CreateEndpointDto } from './dto/create-endpoint.dto';
import { User } from '../auth/user.decorator';

@Controller('webhooks')
@UseGuards(AuthGuard)
export class WebhooksController {
  constructor(private readonly webhooksService: WebhooksService) {}

  @Post('endpoints')
  async createEndpoint(@Body() dto: CreateEndpointDto, @User() user: { tenantId: string }) {
    const secret = randomBytes(32).toString('hex');
    const endpoint = await this.webhooksService.createEndpoint({
      tenantId: user.tenantId,
      url: dto.url,
      events: dto.events,
      secret: `whsec_${secret}`,
    });
    return {
      id: endpoint.id,
      url: endpoint.url,
      secret: endpoint.secret, // returned in full only here; later responses should mask it
    };
  }
}

When the customer registers a URL, optionally verify that it works before accepting it. Send a test payload with a verification token and expect a 200 response that echoes the token, similar to the URL verification challenge in Slack's Events API.
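A verification handshake like the one described above might look like the sketch below. The payload shape (`type: "endpoint.verification"`), the 5-second timeout, and the `FetchLike` type are assumptions made for illustration; the transport is injected so the check can be exercised without a network.

```typescript
import { randomBytes } from "node:crypto";

// Minimal fetch-like signature; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal?: AbortSignal },
) => Promise<{ status: number; text(): Promise<string> }>;

// Hypothetical handshake: POST a random token and require a 200 response
// whose body echoes the token back.
async function verifyEndpoint(url: string, fetchImpl: FetchLike): Promise<boolean> {
  const token = randomBytes(16).toString("hex");
  try {
    const res = await fetchImpl(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ type: "endpoint.verification", token }),
      signal: AbortSignal.timeout(5000),
    });
    if (res.status !== 200) return false;
    return (await res.text()).includes(token);
  } catch {
    return false; // unreachable, timed out, or refused
  }
}

// Stand-in transports for local experiments.
const echoingEndpoint: FetchLike = async (_url, init) => ({
  status: 200,
  text: async () => init.body,
});
const failingEndpoint: FetchLike = async () => ({ status: 503, text: async () => "" });
```

In the controller you would pass the global `fetch` as `fetchImpl` and reject the registration when the function resolves to false.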
Publishing Webhook Events
When something happens in your domain — say, an invoice is paid — publish an event that the delivery system picks up:
// apps/api/src/webhooks/webhooks.service.ts
import { Injectable } from '@nestjs/common';
import { InjectQueue } from '@nestjs/bullmq';
import { Queue } from 'bullmq';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
// Entity paths are assumed; adjust to your project layout.
import { WebhookEndpoint } from './entities/webhook-endpoint.entity';
import { WebhookEvent } from './entities/webhook-event.entity';

@Injectable()
export class WebhooksService {
  constructor(
    @InjectQueue('webhook-delivery') private deliveryQueue: Queue,
    @InjectRepository(WebhookEndpoint) private endpointsRepo: Repository<WebhookEndpoint>,
    @InjectRepository(WebhookEvent) private eventsRepo: Repository<WebhookEvent>,
  ) {}

  // createEndpoint and the other CRUD methods are omitted here.

  async publishEvent(eventType: string, payload: object, tenantId: string) {
    // Load the tenant's active endpoints, then keep those subscribed to this
    // event type ('*' subscribes to everything). At scale, push this filter
    // into SQL with an array containment check.
    const active = await this.endpointsRepo.findBy({ tenantId, status: 'active' });
    const endpoints = active.filter(
      (ep) => ep.events.includes('*') || ep.events.includes(eventType),
    );

    // Store the event for the audit trail; the ID is generated on insert
    const event = await this.eventsRepo.save({ eventType, payload, tenantId });

    // Enqueue a job for each matching endpoint
    const jobs = endpoints.map((ep) => ({
      name: 'deliver',
      data: { eventId: event.id, endpointId: ep.id },
      // Idempotent job ID. BullMQ custom job IDs must not contain ':'.
      opts: { jobId: `${event.id}_${ep.id}` },
    }));

    await this.deliveryQueue.addBulk(jobs);
  }
}

The jobId option makes this safe to call multiple times for the same event-endpoint pair. BullMQ ignores an add whose job ID already exists in the queue, so a double-enqueue during crash recovery does not create a second delivery. The protection lasts only while the original job is retained, so avoid removing completed jobs immediately if you rely on it.

Delivery Worker: Queue Consumer
The worker pulls jobs from the queue, makes the HTTP call, records the result, and decides whether to retry:
// apps/api/src/webhooks/webhook-delivery.processor.ts
import { Processor, WorkerHost, InjectQueue } from '@nestjs/bullmq';
import { Job, Queue } from 'bullmq';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { WebhookSigner } from './webhook-signer.service';
// WebhookEndpoint, WebhookEvent, and WebhookDelivery are the TypeORM entities
// for the tables above; their import paths depend on your project layout.

type DeliveryJobData = { eventId: string; endpointId: string; attempt?: number };

@Processor('webhook-delivery')
export class WebhookDeliveryProcessor extends WorkerHost {
  constructor(
    @InjectQueue('webhook-delivery') private readonly deliveryQueue: Queue,
    @InjectRepository(WebhookEndpoint) private endpointsRepo: Repository<WebhookEndpoint>,
    @InjectRepository(WebhookEvent) private eventsRepo: Repository<WebhookEvent>,
    @InjectRepository(WebhookDelivery) private deliveriesRepo: Repository<WebhookDelivery>,
  ) {
    super();
  }

  async process(job: Job<DeliveryJobData>) {
    const { eventId, endpointId } = job.data;
    const endpoint = await this.endpointsRepo.findOneBy({ id: endpointId });
    const event = await this.eventsRepo.findOneBy({ id: eventId });
    if (!endpoint || !event) return; // deleted since enqueue; nothing to deliver

    // WebhookSigner (shown in the HMAC section) returns "t=<timestamp>,v1=<hmac>"
    const signature = WebhookSigner.sign(event.payload, endpoint.secret);

    const delivery = await this.deliveriesRepo.save({
      eventId,
      endpointId,
      // Use the attempt counter from job data when present; otherwise fall
      // back to BullMQ's own counter
      attempt: (job.data.attempt ?? job.attemptsMade ?? 0) + 1,
      status: 'in_flight',
    });

    try {
      const response = await fetch(endpoint.url, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'X-Webhook-Signature': signature,
          // Informational only; receivers should verify against the t= value
          // inside X-Webhook-Signature
          'X-Webhook-Timestamp': Math.floor(Date.now() / 1000).toString(),
          'X-Webhook-Id': eventId,
        },
        body: JSON.stringify(event.payload), // same serialization the signer uses
        signal: AbortSignal.timeout(10000),
      });

      if (response.ok) {
        await this.deliveriesRepo.update(delivery.id, {
          status: 'success',
          responseCode: response.status,
        });
        await this.endpointsRepo.update(endpointId, { consecutiveFailures: 0 });
        return;
      }

      await this.handleFailure(job, delivery, endpoint, response);
    } catch (err) {
      await this.handleFailure(job, delivery, endpoint, undefined, err as Error);
    }
  }
}

Retry Strategy with Exponential Backoff and Jitter
Failures happen. The question is how you handle them. BullMQ's built-in retry mechanism handles the basics, but you want a strategy that distinguishes between failures that retrying can fix and failures it cannot.
private async handleFailure(
  job: Job<{ eventId: string; endpointId: string; attempt?: number }>,
  delivery: WebhookDelivery,
  endpoint: WebhookEndpoint,
  response?: Response,
  error?: Error,
) {
  const isRetriable = response
    ? response.status >= 500 || response.status === 429
    : true; // timeout or network error

  if (!isRetriable) {
    // Non-retriable 4xx errors: move to dead letter
    await this.deliveriesRepo.update(delivery.id, {
      status: 'dead_letter',
      responseCode: response?.status,
      errorType: '4xx',
    });
    await this.incrementFailures(endpoint);
    return;
  }

  // Each retry is added as a new job, so job.attemptsMade starts from 0 again.
  // Carry the retry count in the job data instead.
  const retriesSoFar = job.data.attempt ?? 0;
  const maxRetries = 10;
  if (retriesSoFar >= maxRetries) {
    await this.deliveriesRepo.update(delivery.id, {
      status: 'dead_letter',
      responseCode: response?.status,
    });
    await this.incrementFailures(endpoint);
    return;
  }

  // Exponential backoff with +/-10% jitter
  const baseDelay = 2000; // 2 seconds
  const delay = baseDelay * Math.pow(2, retriesSoFar);
  const jitter = delay * 0.2 * (Math.random() - 0.5);
  const delayMs = Math.round(delay + jitter);
  const nextAttemptAt = new Date(Date.now() + delayMs);

  await this.deliveriesRepo.update(delivery.id, {
    status: 'failed',
    responseCode: response?.status,
    errorType: response
      ? (response.status >= 500 ? '5xx' : '4xx') // 429 is the only retriable 4xx
      : (error?.name === 'TimeoutError' ? 'timeout' : 'network'),
    nextAttemptAt,
  });

  // Re-enqueue with delay. BullMQ custom job IDs must not contain ':'.
  await this.deliveryQueue.add(
    'deliver',
    { ...job.data, attempt: retriesSoFar + 1 },
    {
      delay: delayMs,
      jobId: `${job.data.eventId}_${job.data.endpointId}_retry-${retriesSoFar + 1}`,
    },
  );

  await this.incrementFailures(endpoint);
}

private async incrementFailures(endpoint: WebhookEndpoint) {
  const count = endpoint.consecutiveFailures + 1;
  if (count >= 10) {
    await this.endpointsRepo.update(endpoint.id, {
      consecutiveFailures: count,
      status: 'disabled',
    });
    // Notify customer: "Your webhook endpoint has been disabled"
    return;
  }
  await this.endpointsRepo.update(endpoint.id, { consecutiveFailures: count });
}

The key details here:
- 5xx responses, 429s, and timeouts get retried with exponential backoff, doubling from 2s before the first retry to about 17 minutes before the tenth, roughly 34 minutes in total
- 4xx errors other than 429 skip retries: a 400 means the endpoint rejected the payload and sending it again will not help
- Jitter spreads retries across the interval, preventing the thundering herd problem when multiple endpoints fail simultaneously
- 10 consecutive failures disables the endpoint — preserving delivery resources and alerting the customer
BullMQ supports native delayed jobs, so the re-enqueue with delay is handled by Redis rather than a polling loop. The BullMQ documentation covers the full set of options for rate limiting, concurrency, and sandboxed processors.
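To make the schedule concrete, here is a small standalone sketch of the backoff arithmetic and the retriable/non-retriable split. The function names and the injectable `random` parameter are illustrative choices, not part of BullMQ.

```typescript
// Delay before the next retry, where retryIndex counts retries already made.
// `random` is injectable so the jitter can be pinned when experimenting.
function backoffDelayMs(
  retryIndex: number,
  baseMs = 2000,
  jitterRatio = 0.2,
  random: () => number = Math.random,
): number {
  const delay = baseMs * 2 ** retryIndex;
  const jitter = delay * jitterRatio * (random() - 0.5); // +/- half of jitterRatio
  return Math.round(delay + jitter);
}

// 5xx and 429 are worth retrying; other 4xx responses are treated as permanent.
function isRetriableStatus(status: number): boolean {
  return status >= 500 || status === 429;
}

// Jitter disabled to show the underlying schedule for ten retries.
const schedule = Array.from({ length: 10 }, (_, i) => backoffDelayMs(i, 2000, 0));
const totalMs = schedule.reduce((sum, ms) => sum + ms, 0);
// schedule[0] is 2000, schedule[9] is 1024000, totalMs is 2046000 (about 34 minutes)
```

With the default 0.2 ratio, each delay lands within 10% of its nominal value, which is enough to keep endpoints that failed at the same moment from retrying at the same moment.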
HMAC Payload Signing

Customers need to verify that the webhooks they receive actually came from your platform and were not tampered with in transit. HMAC-SHA256 is the industry standard — Stripe, GitHub, Slack, and Shopify all use it.
// apps/api/src/webhooks/webhook-signer.service.ts
import { createHmac, timingSafeEqual } from 'crypto';

export class WebhookSigner {
  static sign(payload: object, secret: string): string {
    const body = JSON.stringify(payload);
    const timestamp = Math.floor(Date.now() / 1000);
    const signedPayload = `${timestamp}.${body}`;
    const signature = createHmac('sha256', secret)
      .update(signedPayload, 'utf8')
      .digest('hex');
    return `t=${timestamp},v1=${signature}`;
  }

  static verify(
    rawBody: string,
    signature: string,
    secret: string,
    maxAgeSeconds: number = 300,
  ): boolean {
    const parts = signature.split(',');
    const timestamp = parseInt(parts[0]?.replace('t=', '') ?? '', 10);
    const receivedSig = parts[1]?.replace('v1=', '');
    if (!receivedSig || Number.isNaN(timestamp)) {
      return false; // malformed signature header
    }

    if (Date.now() / 1000 - timestamp > maxAgeSeconds) {
      return false; // possible replay: signature is too old
    }

    const signedPayload = `${timestamp}.${rawBody}`;
    const expectedSig = createHmac('sha256', secret)
      .update(signedPayload, 'utf8')
      .digest('hex');

    // Timing-safe comparison prevents timing attacks. timingSafeEqual throws
    // when the lengths differ, so compare lengths first.
    const received = Buffer.from(receivedSig);
    const expected = Buffer.from(expectedSig);
    return received.length === expected.length && timingSafeEqual(received, expected);
  }
}

The timestamp in the signature limits replay attacks: a captured webhook cannot be re-sent after the tolerance window (300 seconds here) because the timestamp check would fail. The Standard Webhooks specification defines a closely related scheme (HMAC-SHA256 over the message ID, timestamp, and body) and is worth adopting so your customers can use existing verification libraries instead of writing their own. For more on the crypto primitives used here, the Node.js crypto documentation covers createHmac and timingSafeEqual in depth.
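One detail that trips up receivers is that verification has to run over the exact bytes that arrived, not over a re-serialized copy of the parsed JSON. The sketch below restates the signing layout from the signer so it is self-contained; `signRaw`, `verifyRaw`, and the explicit `nowSeconds` parameter are hypothetical names introduced for the illustration.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Same "t=<unix seconds>,v1=<hex HMAC>" layout as the signer in this post.
function signRaw(rawBody: string, secret: string, timestamp: number): string {
  const mac = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest("hex");
  return `t=${timestamp},v1=${mac}`;
}

function verifyRaw(
  rawBody: string,
  header: string,
  secret: string,
  nowSeconds: number,
  maxAgeSeconds = 300,
): boolean {
  const match = /^t=(\d+),v1=([0-9a-f]{64})$/.exec(header);
  if (!match) return false; // malformed header
  const timestamp = Number(match[1]);
  if (nowSeconds - timestamp > maxAgeSeconds) return false; // too old
  const expected = signRaw(rawBody, secret, timestamp).split("v1=")[1];
  // Both sides are 64 hex characters here, so the lengths always match.
  return timingSafeEqual(Buffer.from(match[2]), Buffer.from(expected));
}

const demoSecret = "whsec_example";
const sentAt = 1_700_000_000;
const sentBody = '{"id": "evt_1", "amount": 4200}'; // bytes exactly as transmitted
const sigHeader = signRaw(sentBody, demoSecret, sentAt);

const okRaw = verifyRaw(sentBody, sigHeader, demoSecret, sentAt + 5);
const okReserialized = verifyRaw(JSON.stringify(JSON.parse(sentBody)), sigHeader, demoSecret, sentAt + 5);
const okStale = verifyRaw(sentBody, sigHeader, demoSecret, sentAt + 301);
// okRaw is true; okReserialized and okStale are false
```

Re-serializing drops the whitespace in the body, so the HMAC no longer matches. Receivers should capture the raw request body before any JSON middleware parses it.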
Generate a unique secret per endpoint during registration:
import { randomBytes } from 'crypto';

function generateWebhookSecret(): string {
  const bytes = randomBytes(32);
  return 'whsec_' + bytes.toString('hex');
}

Store it encrypted at rest rather than hashed like a password: the delivery worker needs the original secret to compute each signature, and a one-way hash cannot be reversed. Show it to the customer exactly once during endpoint creation. Provide a rotate-secret API endpoint for when customers lose their secret or suspect a leak.
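Because the delivery worker has to read the secret back to sign every payload, storage needs to be reversible. One option is symmetric encryption at rest. The sketch below uses AES-256-GCM from Node's crypto module; the storage format (three base64 parts joined by dots) and the function names are assumptions, and the key would normally come from a KMS or environment secret rather than being generated inline.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts a webhook secret with AES-256-GCM. `key` must be 32 bytes.
function encryptSecret(plain: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, fresh for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plain, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  // Hypothetical storage format: iv.tag.ciphertext, each base64-encoded
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(".");
}

function decryptSecret(stored: string, key: Buffer): string {
  const [iv, tag, ciphertext] = stored.split(".").map((part) => Buffer.from(part, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the data or key is wrong
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

const demoKey = randomBytes(32); // stand-in for a managed key
const storedSecret = encryptSecret("whsec_example", demoKey);
const recovered = decryptSecret(storedSecret, demoKey);
```

The worker would call `decryptSecret` just before signing and avoid logging or caching the plaintext.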
Delivery Log UI
Customers want to see delivery history. Build a page that lists recent deliveries with status, response code, and timestamps:
// apps/web/src/app/webhooks/deliveries/page.tsx
// getCurrentUser, fetchDeliveries, retryDelivery, and DeliveryTable are assumed
// project-local helpers; the import paths are illustrative.
import { getCurrentUser } from '@/lib/auth';
import { fetchDeliveries, retryDelivery } from '@/lib/webhooks';
import { DeliveryTable } from './delivery-table';

export default async function DeliveryLogPage({
  searchParams,
}: {
  searchParams: { endpointId?: string; status?: string };
}) {
  const currentUser = await getCurrentUser();
  const deliveries = await fetchDeliveries({
    tenantId: currentUser.tenantId,
    endpointId: searchParams.endpointId,
    status: searchParams.status,
    limit: 50,
  });

  return (
    <div className="space-y-4">
      <h1>Webhook Deliveries</h1>
      {/* Show: timestamp, event type, endpoint URL, status, response code, latency */}
      {/* retryDelivery must be a Server Action to be passed to a client component */}
      <DeliveryTable deliveries={deliveries.data} onRetry={retryDelivery} />
    </div>
  );
}

Include a "Retry" button for failed deliveries. This is the single feature that reduces support tickets the most: customers can self-serve instead of emailing your support team to ask why their integration failed.
If you already run a background job queue, the delivery worker follows the same retry and dead-letter patterns. The difference is that webhook delivery has an external dependency, the customer's endpoint, which introduces failure modes you cannot control, so observability and self-service retry become critical. You can also pair webhook events with audit logging to maintain a complete record of all outgoing payloads for compliance.
Auto-Disable After Consecutive Failures
If an endpoint has been failing for 10 consecutive attempts, continuing to retry wastes resources and delays deliveries to other endpoints that might be working. Disable the endpoint automatically and notify the customer:
// Check before each delivery attempt
const endpoint = await this.endpointsRepo.findOneBy({ id: endpointId });
if (!endpoint || endpoint.status !== 'active') {
  // UnrecoverableError (imported from 'bullmq') fails the job without retrying it
  throw new UnrecoverableError(`Endpoint ${endpointId} is ${endpoint?.status ?? 'missing'}`);
}

When the customer fixes their endpoint, they reactivate it through the dashboard or API, and the system resumes normal delivery. This avoids the "zombie endpoint" problem where one broken customer silently consumes a disproportionate share of your delivery resources.
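The disable and reactivate lifecycle can be modelled as a pure state transition, which makes the threshold easy to test in isolation. This is a sketch with illustrative names (`EndpointHealth`, `recordOutcome`, `reactivate`); the threshold of 10 is the one used in this post.

```typescript
type EndpointStatus = "active" | "paused" | "disabled";

interface EndpointHealth {
  status: EndpointStatus;
  consecutiveFailures: number;
}

const DISABLE_THRESHOLD = 10;

// Returns the endpoint state after one delivery attempt.
function recordOutcome(ep: EndpointHealth, delivered: boolean): EndpointHealth {
  if (delivered) return { ...ep, consecutiveFailures: 0 };
  const consecutiveFailures = ep.consecutiveFailures + 1;
  const status: EndpointStatus =
    consecutiveFailures >= DISABLE_THRESHOLD ? "disabled" : ep.status;
  return { status, consecutiveFailures };
}

// Called when the customer re-enables the endpoint from the dashboard or API.
function reactivate(): EndpointHealth {
  return { status: "active", consecutiveFailures: 0 };
}

// Simulate ten failed attempts in a row against a healthy endpoint.
let health: EndpointHealth = { status: "active", consecutiveFailures: 0 };
for (let i = 0; i < 9; i++) health = recordOutcome(health, false);
const afterNine = health;
const afterTen = recordOutcome(afterNine, false);
// afterNine.status is "active"; afterTen.status is "disabled"
```

Resetting the counter on reactivation gives the repaired endpoint a clean slate instead of disabling it again on its first hiccup.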
Testing Webhooks During Development
Customers need to test their integration before going live. Provide a webhook testing tool in your dashboard:
- Send a test event: let customers pick an event type and send a sample payload to their endpoint, even if the real event has not happened yet
- Show the raw request: display the exact HTTP request your platform would send, including headers and signature, so customers can check their signature verification logic
- Support ngrok-friendly URLs: many developers test with ngrok pointing to localhost. Allow localhost and ngrok.io URLs in sandbox environments to avoid blocking valid test endpoints
- Replay historical events: provide a replay endpoint that re-sends any previously delivered event, useful when a customer is debugging and says "send me the one that failed"
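The sandbox URL rule from the list above could be sketched as a small validation function. The policy details here are assumptions: production accepts only public HTTPS URLs, while sandbox additionally accepts localhost and ngrok.io hostnames over HTTP or HTTPS.

```typescript
type Environment = "sandbox" | "production";

// Hypothetical policy check for customer-supplied webhook URLs.
function isAllowedWebhookUrl(raw: string, env: Environment): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  const host = url.hostname.toLowerCase();
  const isLocal = host === "localhost" || host === "127.0.0.1";
  const isTunnel = host.endsWith(".ngrok.io");

  if (isLocal || isTunnel) {
    // Development targets are accepted in sandbox only.
    return env === "sandbox" && (url.protocol === "http:" || url.protocol === "https:");
  }
  return url.protocol === "https:";
}

const sandboxLocal = isAllowedWebhookUrl("http://localhost:3000/hooks", "sandbox");
const productionLocal = isAllowedWebhookUrl("http://localhost:3000/hooks", "production");
// sandboxLocal is true; productionLocal is false
```

A production deployment would usually extend this with checks against private IP ranges, which are out of scope for this sketch.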
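The replay endpoint can be a thin wrapper around publishEvent:
@Post('events/:id/replay')
async replayEvent(@Param('id') eventId: string, @User() user) {
  // getEvent is scoped by tenant so customers can only replay their own events
  const event = await this.webhooksService.getEvent(eventId, user.tenantId);
  // Re-publishing records a new event and delivers to all currently subscribed endpoints
  await this.webhooksService.publishEvent(event.eventType, event.payload, user.tenantId);
  return { status: 'queued' };
}

Adding Tenant Isolation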
In a multi-tenant SaaS, one customer's broken endpoint should not delay deliveries for other tenants. The simplest isolation strategy is per-tenant queues:
// Dynamic queue per tenant
const queueName = `webhook-delivery-${tenantId}`;
const queue = new Queue(queueName, { connection: redisConnection });
await queue.add('deliver', { eventId, endpointId });

Use separate BullMQ worker instances per queue, each with its own concurrency limit. This prevents one slow tenant from starving the rest. The tradeoff is more Redis connections, but for most SaaS products the isolation benefit outweighs the infrastructure cost.
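Creating a new queue object on every publish would open connections needlessly, so per-tenant queues are usually cached. The registry below is a generic sketch: `TenantQueueRegistry` and the `createQueue` factory are hypothetical, and in a real service the factory would be something like `(name) => new Queue(name, { connection })`.

```typescript
// One queue instance per tenant, created lazily and then reused.
class TenantQueueRegistry<Q> {
  private readonly queues = new Map<string, Q>();
  private readonly createQueue: (queueName: string) => Q;

  constructor(createQueue: (queueName: string) => Q) {
    this.createQueue = createQueue;
  }

  forTenant(tenantId: string): Q {
    const name = `webhook-delivery-${tenantId}`;
    let queue = this.queues.get(name);
    if (queue === undefined) {
      queue = this.createQueue(name);
      this.queues.set(name, queue);
    }
    return queue;
  }

  get size(): number {
    return this.queues.size;
  }
}

// Stand-in factory that records which queues were created.
const createdQueues: string[] = [];
const registry = new TenantQueueRegistry((name) => {
  createdQueues.push(name);
  return { name };
});

const firstLookup = registry.forTenant("tenant-a");
const secondLookup = registry.forTenant("tenant-a");
registry.forTenant("tenant-b");
// firstLookup and secondLookup are the same object; two queues exist in total
```

The same registry shape works for workers, which lets you give each tenant its own concurrency limit.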
If your application already uses an event-driven architecture in NestJS, you can integrate webhook publishing as an event handler that listens to domain events and enqueues delivery jobs automatically.
Secret Rotation
Webhook secrets should be rotatable without breaking the customer's integration. Provide an API endpoint:
@Post('endpoints/:id/rotate-secret')
async rotateSecret(@Param('id') endpointId: string, @User() user) {
  const newSecret = `whsec_${crypto.randomBytes(32).toString('hex')}`;
  // Scope the update to the caller's tenant so one tenant cannot rotate another's secret
  await this.webhooksService.updateSecret(endpointId, user.tenantId, newSecret);
  return { secret: newSecret }; // shown once
}

Allow a grace period where both old and new secrets are accepted:
private async verifyWithRotation(rawBody: string, signature: string, endpoint: Endpoint) {
  if (verifySignature(rawBody, signature, endpoint.currentSecret)) return true;
  if (endpoint.previousSecret && verifySignature(rawBody, signature, endpoint.previousSecret)) {
    return true; // still transitioning
  }
  return false;
}

After the grace period expires (typically 24 hours), discard the previous secret. This lets customers update their integration at their own pace without a hard cutover.
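The expiry side of the grace period can be sketched as a function that decides which secrets are still honoured at a given time. The `RotatingSecret` shape and the function names are assumptions; the 24-hour window is the figure mentioned above.

```typescript
interface RotatingSecret {
  current: string;
  previous?: string;
  rotatedAt?: Date; // when `current` replaced `previous`
}

const GRACE_PERIOD_MS = 24 * 60 * 60 * 1000; // 24 hours

// Rotation keeps the outgoing secret as `previous` and stamps the time.
function rotate(s: RotatingSecret, next: string, now: Date): RotatingSecret {
  return { current: next, previous: s.current, rotatedAt: now };
}

// Secrets that should still be accepted at time `now`.
function acceptedSecrets(s: RotatingSecret, now: Date): string[] {
  if (
    s.previous !== undefined &&
    s.rotatedAt !== undefined &&
    now.getTime() - s.rotatedAt.getTime() < GRACE_PERIOD_MS
  ) {
    return [s.current, s.previous];
  }
  return [s.current];
}

const rotationTime = new Date("2024-01-01T00:00:00Z");
const rotated = rotate({ current: "whsec_old" }, "whsec_new", rotationTime);
const oneHourLater = new Date(rotationTime.getTime() + 60 * 60 * 1000);
const nextDayPlus = new Date(rotationTime.getTime() + 25 * 60 * 60 * 1000);

const duringGrace = acceptedSecrets(rotated, oneHourLater);
const afterGrace = acceptedSecrets(rotated, nextDayPlus);
// duringGrace is ["whsec_new", "whsec_old"]; afterGrace is ["whsec_new"]
```

A scheduled job can then clear `previous` for any endpoint whose `rotatedAt` is older than the window.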
Conclusion
A reliable SaaS outgoing webhook system is not hard to build, but it is easy to build wrong. The three mistakes we made on that first weekend — synchronous delivery in the request thread, no retry strategy, in-memory queueing — each looked like an optimization in the moment. Each one cost us events and trust.
The patterns in this post — queue-first architecture with BullMQ in NestJS, HMAC signing with timestamp verification, exponential backoff with jitter, per-tenant isolation, and self-service delivery logs — are the ones that turn a webhook system from a weekend project into something you can ship to enterprise customers. They are the difference between losing 47 events in a process restart and being able to say, when a customer asks what happened to their webhook at 2am: "here is the exact delivery log, including the retry we attempted 47 seconds later that succeeded."
If you are building your first webhook system right now, start with the schema, the BullMQ worker, and HMAC signing. Add the delivery log UI and auto-disable next. That covers 90% of what your customers need, and it does not lose events when things go sideways — which, in any production system, they eventually will.
If you would rather have a second pair of eyes on your delivery and retry design before it is the thing paging you at 2am, get in touch — webhook reliability is exactly the kind of unglamorous infrastructure we build for a living.
Frequently Asked Questions
What is an outgoing webhook system?
An outgoing webhook system sends HTTP POST requests from your SaaS platform to customer-registered URLs when specific events occur. It includes delivery queues, retry logic with exponential backoff, HMAC payload signing for security, delivery logging for visibility, and auto-disable mechanisms for failing endpoints.
How do you secure outgoing webhooks?
Sign each webhook payload with HMAC-SHA256 using a unique secret key per customer endpoint. Include the signature and a timestamp in the request headers so receivers can verify both authenticity and freshness. Use timing-safe comparison functions for signature verification to prevent timing attacks.
Which queue should you use for webhook delivery in NestJS?
BullMQ with Redis is a common choice for NestJS webhook delivery. It provides at-least-once delivery guarantees, supports delayed jobs for retries, and can be scaled horizontally by adding more worker processes. The @nestjs/bullmq package integrates it directly into your NestJS module system.
How should you handle failed webhook deliveries?
Implement a multi-tier retry strategy with exponential backoff and jitter. Retry transient failures (5xx, 429, timeouts) with increasing delays. Move permanent failures (4xx errors like 400 or 404) directly to a dead-letter queue after one attempt. Auto-disable endpoints after 10 consecutive failures and notify the customer.
Should you build your own webhook system or use a managed service?
Build your own if webhooks are core to your product value proposition and you need custom retry logic, tenant isolation policies, or compliance control. Use a managed service like Svix or Hookdeck if you want faster time-to-market and can accept vendor lock-in. For most early-stage SaaS products, building is the better long-term investment.
