API Design Part 3: Rate Limiting & Pagination

Master rate limiting algorithms, production Redis implementations, and cursor pagination. Protect your API from abuse while efficiently serving large datasets.

Moshiour Rahman

API Design Mastery Series

This is Part 3 of our comprehensive API Design series.

| Part | Topic | Level |
| --- | --- | --- |
| 1 | HTTP & REST Fundamentals | Beginner |
| 2 | Security & Authentication | Beginner |
| 3 | Rate Limiting & Pagination | Intermediate |
| 4 | Versioning & Idempotency | Intermediate |
| 5 | Caching Strategies | Intermediate |
| 6 | GraphQL & gRPC | Intermediate |
| 7 | Resilience & Observability | Advanced |
| 8 | Production Mastery | Advanced |

Rate Limiting Deep Dive

Sliding Window Rate Limiting Algorithm

Algorithm Comparison

| Algorithm | Pros | Cons | Best For |
| --- | --- | --- | --- |
| Fixed Window | Simple, memory efficient | Burst at window edges | Simple use cases |
| Sliding Window Log | Most accurate | High memory (stores timestamps) | Small scale, audit trails |
| Sliding Window Counter | Accurate, moderate memory | Slight approximation | Production systems |
| Token Bucket | Handles bursts, smooth rate | More complex state | APIs allowing bursts |
| Leaky Bucket | Consistent output rate | No burst handling | Strict rate enforcement |
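
To make the "slight approximation" of the sliding window counter concrete, here is a minimal in-memory sketch. It keeps two fixed-window counters per key and weights the previous one by how much of it still overlaps the rolling window. The class name `SlidingWindowCounter` and the injectable `now` parameter are assumptions made for illustration; this is not the Redis implementation shown later in the article.

```typescript
// Per-key state: two adjacent fixed-window counters.
interface WindowState {
  windowStart: number; // start (ms) of the current fixed window
  current: number;     // requests counted in the current window
  previous: number;    // requests counted in the window before it
}

class SlidingWindowCounter {
  private states = new Map<string, WindowState>();
  private windowMs: number;
  private maxRequests: number;

  constructor(windowMs: number, maxRequests: number) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
  }

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    let state = this.states.get(key);

    if (!state) {
      state = { windowStart, current: 0, previous: 0 };
      this.states.set(key, state);
    } else if (state.windowStart !== windowStart) {
      // Roll over. The old count only matters if it belongs to the adjacent window.
      const adjacent = windowStart - state.windowStart === this.windowMs;
      state.previous = adjacent ? state.current : 0;
      state.current = 0;
      state.windowStart = windowStart;
    }

    // Fraction of the current fixed window that has elapsed (0..1).
    const elapsed = (now - windowStart) / this.windowMs;
    // Estimated requests in the rolling window ending at `now`.
    const estimated = state.previous * (1 - elapsed) + state.current;

    if (estimated >= this.maxRequests) return false;
    state.current += 1;
    return true;
  }
}
```

The estimate assumes requests in the previous window were evenly spread, which is where the approximation comes from. In exchange, memory stays at two integers per key instead of one timestamp per request.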

Production Rate Limiter Implementation

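// rate-limiter.ts - Production-grade sliding window limiter
// Note: this version stores one sorted-set entry per request, so it is the
// "sliding window log" variant from the table above, made atomic with Lua.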

import { Redis } from 'ioredis';

interface RateLimitConfig {
  windowMs: number;      // Window size in milliseconds
  maxRequests: number;   // Max requests per window
  keyPrefix?: string;
}

interface RateLimitResult {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetAt: Date;
  retryAfter?: number;   // Seconds until retry (if blocked)
}

export class SlidingWindowRateLimiter {
  private redis: Redis;
  private config: RateLimitConfig;

  constructor(redis: Redis, config: RateLimitConfig) {
    this.redis = redis;
    this.config = {
      keyPrefix: 'ratelimit:',
      ...config
    };
  }

  async check(identifier: string): Promise<RateLimitResult> {
    const now = Date.now();
    const windowStart = now - this.config.windowMs;
    const key = `${this.config.keyPrefix}${identifier}`;

    // Unique member for this request, generated client-side. math.random() inside a
    // Redis script may be seeded deterministically (depending on the Redis version),
    // which could make two requests in the same millisecond collide on one member.
    const member = `${now}:${Math.random().toString(36).slice(2)}`;

    // Lua script for atomic operation
    const script = `
      local key = KEYS[1]
      local now = tonumber(ARGV[1])
      local window_start = tonumber(ARGV[2])
      local window_ms = tonumber(ARGV[3])
      local max_requests = tonumber(ARGV[4])
      local member = ARGV[5]

      -- Remove entries that fell out of the window
      redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)

      -- Count requests in current window
      local current_count = redis.call('ZCARD', key)

      if current_count < max_requests then
        -- Record the new request and refresh the key TTL
        redis.call('ZADD', key, now, member)
        redis.call('PEXPIRE', key, window_ms)
        return {1, current_count + 1}
      else
        -- Get oldest entry to calculate retry time
        local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
        local retry_at = oldest[2] and (tonumber(oldest[2]) + window_ms) or (now + window_ms)
        return {0, current_count, retry_at}
      end
    `;

    const result = await this.redis.eval(
      script,
      1,
      key,
      now,
      windowStart,
      this.config.windowMs,
      this.config.maxRequests,
      member
    ) as [number, number, number?];

    const [allowed, count, retryAt] = result;
    const resetAt = new Date(now + this.config.windowMs);

    return {
      allowed: allowed === 1,
      limit: this.config.maxRequests,
      remaining: Math.max(0, this.config.maxRequests - count),
      resetAt,
      retryAfter: allowed === 0 && retryAt
        ? Math.ceil((retryAt - now) / 1000)
        : undefined
    };
  }

  // Get current status without consuming a request
  async status(identifier: string): Promise<RateLimitResult> {
    const now = Date.now();
    const windowStart = now - this.config.windowMs;
    const key = `${this.config.keyPrefix}${identifier}`;

    await this.redis.zremrangebyscore(key, '-inf', windowStart);
    const count = await this.redis.zcard(key);

    return {
      allowed: count < this.config.maxRequests,
      limit: this.config.maxRequests,
      remaining: Math.max(0, this.config.maxRequests - count),
      resetAt: new Date(now + this.config.windowMs)
    };
  }
}

// Middleware factory
export function createRateLimitMiddleware(
  limiter: SlidingWindowRateLimiter,
  keyGenerator: (req: Request) => string
) {
  return async (req: Request): Promise<Response | null> => {
    const key = keyGenerator(req);
    const result = await limiter.check(key);

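    // Build rate limit headers. In this factory they are only attached to the 429
    // response; callers must add them to successful responses themselves.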
    const headers = {
      'X-RateLimit-Limit': String(result.limit),
      'X-RateLimit-Remaining': String(result.remaining),
      'X-RateLimit-Reset': String(Math.floor(result.resetAt.getTime() / 1000)),
      'X-RateLimit-Policy': `${result.limit};w=${limiter['config'].windowMs / 1000}`
    };

    if (!result.allowed) {
      return new Response(
        JSON.stringify({
          success: false,
          error: {
            code: 'RATE_LIMITED',
            message: 'Too many requests',
            retryAfter: result.retryAfter
          }
        }),
        {
          status: 429,
          headers: {
            ...headers,
            'Retry-After': String(result.retryAfter),
            'Content-Type': 'application/json'
          }
        }
      );
    }

    return null; // Continue to handler
  };
}

Multi-Tier Rate Limiting Strategy

| Tier | Scope | Example Limits |
| --- | --- | --- |
| 1 - Global DDoS | CDN/Edge (Cloudflare, AWS Shield) | 10,000 req/sec globally |
| 2 - API Gateway | Per-IP unauthenticated | 100 req/min |
| 3 - User/API Key | Subscription-based | Free: 100/hr, Pro: 1K/hr |
| 4 - Endpoint | Resource type | Search: 20/min, Write: 30/min |
| 5 - Resource | Specific actions | 5 failed logins/account/hr |
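
One way to apply tiers 2 to 4 in application code is to resolve every request to a list of rules and allow it only if each rule passes. The sketch below reuses the example limits from the table. The names `rulesFor`, `RequestContext`, and the key layout are hypothetical, and the function only selects the rules; checking them against a limiter is left out.

```typescript
type Plan = 'free' | 'pro';
type EndpointClass = 'search' | 'write' | 'default';

interface LimitRule {
  name: string;
  key: string;          // counter key, e.g. suffix for a Redis key
  windowMs: number;
  maxRequests: number;
}

interface RequestContext {
  ip: string;
  userId?: string;      // absent for unauthenticated traffic
  plan?: Plan;
  endpoint: EndpointClass;
}

const MINUTE = 60_000;
const HOUR = 60 * MINUTE;

// Example limits taken from the tier table above.
const PLAN_LIMITS: Record<Plan, number> = { free: 100, pro: 1000 };      // per hour
const ENDPOINT_LIMITS: Partial<Record<EndpointClass, number>> = {
  search: 20,                                                           // per minute
  write: 30                                                             // per minute
};

// Returns every rule that applies to the request.
function rulesFor(ctx: RequestContext): LimitRule[] {
  const rules: LimitRule[] = [];
  const identity = ctx.userId ? `user:${ctx.userId}` : `ip:${ctx.ip}`;

  if (ctx.userId) {
    // Tier 3: subscription-based limit per user
    const plan = ctx.plan ?? 'free';
    rules.push({ name: `plan:${plan}`, key: `${identity}:plan`, windowMs: HOUR, maxRequests: PLAN_LIMITS[plan] });
  } else {
    // Tier 2: per-IP limit for unauthenticated traffic
    rules.push({ name: 'anonymous-ip', key: identity, windowMs: MINUTE, maxRequests: 100 });
  }

  // Tier 4: endpoint-specific limit, if one is configured
  const endpointLimit = ENDPOINT_LIMITS[ctx.endpoint];
  if (endpointLimit !== undefined) {
    rules.push({
      name: `endpoint:${ctx.endpoint}`,
      key: `${identity}:${ctx.endpoint}`,
      windowMs: MINUTE,
      maxRequests: endpointLimit
    });
  }

  return rules;
}
```

Each returned rule could then be checked with its own limiter instance, rejecting the request as soon as one rule is exhausted.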

Interview Question: “How would you handle rate limiting in a distributed system with multiple API servers?”

Strong Answer: “The key challenge is shared state. Options include:

  1. Centralized store (Redis): Single source of truth, but adds latency and is a potential bottleneck. Use Redis Cluster for HA.

  2. Sticky sessions: Route each user to the same server so that local rate limiting works. Simple, but bad for load distribution.

  3. Approximate consensus: Each server tracks locally, periodically syncs. Allows some over-limit requests but highly available.

  4. Cell-based: Partition users to specific server groups, each group has its own Redis. Limits blast radius.

For most cases, I’d use Redis with a sliding window algorithm - it’s a good balance of accuracy and performance. A Lua script like the one shown above ensures atomicity without distributed locks.”
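
Option 3 (local tracking with periodic sync) can be sketched as follows. Each server admits requests against its last known cluster-wide total plus its own unreported count, and a timer pushes the local delta to a shared counter. `SharedCounter` and `LocalSyncLimiter` are hypothetical names; in practice the shared counter might be a Redis `INCRBY` on a per-window key. Window expiry and error handling are deliberately left out.

```typescript
// Hypothetical shared store: adds `delta` to the counter and returns the new total.
interface SharedCounter {
  addAndGet(key: string, delta: number): Promise<number>;
}

class LocalSyncLimiter {
  private pending = new Map<string, number>();    // admitted locally, not yet reported
  private lastGlobal = new Map<string, number>(); // last cluster-wide total we saw
  private shared: SharedCounter;
  private maxRequests: number;

  constructor(shared: SharedCounter, maxRequests: number) {
    this.shared = shared;
    this.maxRequests = maxRequests;
  }

  // Synchronous decision with no network hop on the request path.
  allow(key: string): boolean {
    const pending = this.pending.get(key) ?? 0;
    const known = this.lastGlobal.get(key) ?? 0;
    if (known + pending >= this.maxRequests) return false;
    this.pending.set(key, pending + 1);
    return true;
  }

  // Call on a timer: report local counts and refresh the cluster-wide view.
  async sync(): Promise<void> {
    const entries = Array.from(this.pending.entries());
    for (const [key, delta] of entries) {
      // Reset before awaiting so requests admitted during the round trip are kept.
      // A production version would restore `delta` if the call fails.
      this.pending.set(key, 0);
      const total = await this.shared.addAndGet(key, delta);
      this.lastGlobal.set(key, total);
    }
  }
}
```

Between syncs, each server sees a stale total, so the cluster can exceed the limit by roughly the traffic admitted during one sync interval. That is the over-limit allowance the answer refers to.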


Pagination Strategies

Complete Pagination Comparison

| Strategy | Pros | Cons | Best For |
| --- | --- | --- | --- |
| Offset/Limit | Simple, random access | Slow on large offsets, inconsistent with changes | Small datasets, admin UIs |
| Cursor-based | Consistent, efficient | No random access, cursor can expire | Feeds, timelines, large datasets |
| Keyset | Very efficient, consistent | Requires sortable unique key | Time-series, logs |
| Page Number | User-friendly UX | Same issues as offset | Content sites, search results |
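
The "sortable unique key" requirement for keyset pagination is easiest to see with rows that share a timestamp. The sketch below pages an in-memory array newest-first using the pair `(createdAt, id)` as the key, so ties on `createdAt` are broken by `id`. The `LogRow` shape and function names are assumptions for illustration; against a SQL database the same idea is typically a `WHERE` clause comparing both columns, backed by an index on them.

```typescript
interface LogRow {
  id: number;
  createdAt: number; // epoch milliseconds
  message: string;
}

interface KeysetCursor {
  createdAt: number;
  id: number;
}

// True if `row` sorts strictly after `cursor` in (createdAt DESC, id DESC) order.
function isAfterCursor(row: LogRow, cursor: KeysetCursor): boolean {
  return (
    row.createdAt < cursor.createdAt ||
    (row.createdAt === cursor.createdAt && row.id < cursor.id)
  );
}

function keysetPage(
  rows: LogRow[],
  limit: number,
  after?: KeysetCursor
): { items: LogRow[]; next: KeysetCursor | null } {
  // A database would get this ordering from an index instead of sorting.
  const sorted = [...rows].sort((a, b) => b.createdAt - a.createdAt || b.id - a.id);

  const cursor = after;
  const remaining = cursor ? sorted.filter(row => isAfterCursor(row, cursor)) : sorted;

  const items = remaining.slice(0, limit);
  const last = items[items.length - 1];
  const hasMore = remaining.length > limit;

  return {
    items,
    next: hasMore && last ? { createdAt: last.createdAt, id: last.id } : null
  };
}
```

Sorting by `createdAt` alone would skip or repeat rows whenever a page boundary falls between two rows with the same timestamp, which is why the unique `id` is part of the key.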

Production Cursor Pagination

// cursor-pagination.ts - Robust cursor implementation

import { z } from 'zod';

interface CursorData {
  id: string;
  sortValue: string | number;
  sortField: string;
  direction: 'asc' | 'desc';
}

// Encode cursor (opaque to client)
export function encodeCursor(data: CursorData): string {
  const json = JSON.stringify(data);
  return Buffer.from(json).toString('base64url');
}

// Decode and validate cursor
export function decodeCursor(cursor: string): CursorData | null {
  try {
    const json = Buffer.from(cursor, 'base64url').toString('utf-8');
    const data = JSON.parse(json);

    // Validate structure
    const schema = z.object({
      id: z.string(),
      sortValue: z.union([z.string(), z.number()]),
      sortField: z.string(),
      direction: z.enum(['asc', 'desc'])
    });

    return schema.parse(data);
  } catch {
    return null;
  }
}

interface PaginationParams {
  first?: number;    // Forward pagination
  after?: string;    // Cursor for forward
  last?: number;     // Backward pagination
  before?: string;   // Cursor for backward
}

interface PaginatedResponse<T> {
  edges: Array<{
    node: T;
    cursor: string;
  }>;
  pageInfo: {
    hasNextPage: boolean;
    hasPreviousPage: boolean;
    startCursor: string | null;
    endCursor: string | null;
    totalCount?: number;
  };
}

// Generic cursor pagination implementation
export async function paginateWithCursor<T extends { id: string }>(
  query: (params: {
    where?: Record<string, unknown>;
    orderBy: Record<string, 'asc' | 'desc'>;
    take: number;
    cursor?: { id: string };
    skip?: number;
  }) => Promise<T[]>,
  countQuery: () => Promise<number>,
  params: PaginationParams,
  sortField: keyof T = 'id' as keyof T,
  sortDirection: 'asc' | 'desc' = 'desc'
): Promise<PaginatedResponse<T>> {
  const maxLimit = 100;
  // Clamp the page size first so the hasMore check below compares against
  // the same value that was used for the query
  const limit = Math.min(params.first || params.last || 20, maxLimit);
  const take = limit + 1; // Fetch one extra to check hasMore

  let cursor: CursorData | null = null;
  let direction = sortDirection;

  // Backward pagination: `before`, or `last` on its own
  const backward =
    !params.after &&
    (params.before !== undefined ||
      (params.last !== undefined && params.first === undefined));

  if (params.after) {
    cursor = decodeCursor(params.after);
    if (!cursor) throw new Error('Invalid cursor');
  } else if (params.before) {
    cursor = decodeCursor(params.before);
    if (!cursor) throw new Error('Invalid cursor');
  }

  if (backward) {
    // Query in reverse order; the page is flipped back after fetching
    direction = direction === 'asc' ? 'desc' : 'asc';
  }

  // Build query
  const queryParams: Parameters<typeof query>[0] = {
    orderBy: { [sortField]: direction },
    take
  };

  if (cursor) {
    queryParams.cursor = { id: cursor.id };
    queryParams.skip = 1; // Skip the cursor item itself
  }

  const items = await query(queryParams);

  // Check if there are more items
  const hasMore = items.length > limit;
  if (hasMore) items.pop(); // Remove the extra item

  // Restore the requested sort order for backward pagination
  if (backward) {
    items.reverse();
  }

  // Build edges with cursors
  const edges = items.map(item => ({
    node: item,
    cursor: encodeCursor({
      id: item.id,
      sortValue: String(item[sortField]),
      sortField: String(sortField),
      direction: sortDirection
    })
  }));

  // Get total count (optional, can be expensive)
  const totalCount = await countQuery();

  return {
    edges,
    pageInfo: {
      // Approximation: a cursor implies that items exist on its other side
      hasNextPage: backward ? Boolean(params.before) : hasMore,
      hasPreviousPage: backward ? hasMore : Boolean(params.after),
      startCursor: edges[0]?.cursor || null,
      endCursor: edges[edges.length - 1]?.cursor || null,
      totalCount
    }
  };
}
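
On the client side, a response of this shape is consumed by following `endCursor` until `hasNextPage` is false. The helper below is a minimal sketch of that loop. `collectAll`, `FetchPage`, and the `maxPages` guard are hypothetical names, and `fetchPage` stands in for whatever HTTP or GraphQL call returns one page.

```typescript
interface Page<T> {
  edges: Array<{ node: T; cursor: string }>;
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
}

// Hypothetical page loader, e.g. a wrapper around fetch() or a GraphQL client.
type FetchPage<T> = (args: { first: number; after: string | undefined }) => Promise<Page<T>>;

// Walks forward through all pages and returns every node.
async function collectAll<T>(
  fetchPage: FetchPage<T>,
  pageSize: number = 20,
  maxPages: number = 1000 // safety guard against a server that never ends
): Promise<T[]> {
  const nodes: T[] = [];
  let after: string | undefined = undefined;

  for (let i = 0; i < maxPages; i++) {
    const page: Page<T> = await fetchPage({ first: pageSize, after });
    for (const edge of page.edges) nodes.push(edge.node);

    const { hasNextPage, endCursor } = page.pageInfo;
    if (!hasNextPage || endCursor === null) break;
    after = endCursor; // opaque to the client: passed back unchanged
  }

  return nodes;
}
```

The client never inspects the cursor, which is what lets the server change its encoding later without breaking callers.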

GraphQL Connection Pattern

# schema.graphql - Relay-style connections

type Query {
  users(
    first: Int
    after: String
    last: Int
    before: String
    filter: UserFilter
  ): UserConnection!
}

type UserConnection {
  edges: [UserEdge!]!
  pageInfo: PageInfo!
  totalCount: Int!
}

type UserEdge {
  node: User!
  cursor: String!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}

Interview Question: “Why would you choose cursor pagination over offset?”

Strong Answer: “Three main reasons:

  1. Performance: Offset pagination degrades as the offset grows, because the database still has to read and discard the skipped rows. With an offset of 10,000 and a page size of 20, it walks past 10,000 rows just to return 20. Cursor pagination seeks directly to the cursor position through an index, so it stays fast regardless of position.

  2. Consistency: With offset pagination, if items are inserted or deleted while paginating, you get duplicates or missed items. Cursor pagination maintains position relative to a specific item.

  3. Scalability: Offset-based UIs usually show ‘page X of Y’, which needs a total count, and counting a large filtered table can be expensive. Cursors only need to know whether there’s a ‘next’ item.

The tradeoff is losing random access - users can’t jump to ‘page 50’. For most feeds and lists, sequential access is fine. For data tables needing random access, I’d use offset with caching and reasonable limits.”
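
The consistency point can be reproduced with a few lines over an in-memory feed. This is a sketch with hypothetical helper names: a newest-first list of numeric IDs gets a new item between the first and second page request.

```typescript
// Offset pagination: position is "how many rows to skip".
function offsetPage(feed: number[], offset: number, limit: number): number[] {
  return feed.slice(offset, offset + limit);
}

// Cursor pagination: position is "everything older than this ID".
function cursorPage(feed: number[], afterId: number | null, limit: number): number[] {
  if (afterId === null) return feed.slice(0, limit);
  const boundary: number = afterId;
  return feed.filter(id => id < boundary).slice(0, limit);
}

let feed = [5, 4, 3, 2, 1];

// Both strategies agree on the first page.
const offsetFirst = offsetPage(feed, 0, 2);    // [5, 4]
const cursorFirst = cursorPage(feed, null, 2); // [5, 4]

// A new item arrives at the top before the client asks for page two.
feed = [6, ...feed];

const offsetSecond = offsetPage(feed, 2, 2);   // [4, 3] -> item 4 is shown twice
const cursorSecond = cursorPage(feed, 4, 2);   // [3, 2] -> continues after item 4

console.log({ offsetFirst, offsetSecond, cursorFirst, cursorSecond });
```

A deletion has the opposite effect under offset pagination: every later row shifts up by one and an item is skipped instead of repeated.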


Rate Limit Response Headers

Always include these headers so clients can implement smart retry logic:

| Header | Purpose | Example |
| --- | --- | --- |
| X-RateLimit-Limit | Max requests per window | 1000 |
| X-RateLimit-Remaining | Requests left in window | 847 |
| X-RateLimit-Reset | Unix timestamp when window resets | 1735322400 |
| Retry-After | Seconds until retry (on 429) | 60 |
| X-RateLimit-Policy | Human-readable policy | 1000;w=3600 |

// Setting rate limit headers on every response
function setRateLimitHeaders(res: Response, result: RateLimitResult): void {
  res.headers.set('X-RateLimit-Limit', String(result.limit));
  res.headers.set('X-RateLimit-Remaining', String(result.remaining));
  res.headers.set('X-RateLimit-Reset', String(Math.floor(result.resetAt.getTime() / 1000)));

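  // Only set Retry-After when a value exists, to avoid sending the string "undefined"
  if (!result.allowed && result.retryAfter !== undefined) {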
    res.headers.set('Retry-After', String(result.retryAfter));
  }
}
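
From the client's perspective, "smart retry logic" mostly means turning these headers into a wait time. The function below is a minimal sketch of that, assuming the header names from the table. `retryDelayMs` and `HeaderReader` are hypothetical names, only the delay-in-seconds form of `Retry-After` is handled, and the one-second fallback is an arbitrary choice.

```typescript
// Structural type so the function works with the Fetch API's Headers or a test stub.
interface HeaderReader {
  get(name: string): string | null;
}

// Parses a non-negative integer header value, or returns null if absent or malformed.
function parseIntHeader(headers: HeaderReader, name: string): number | null {
  const raw = headers.get(name);
  if (raw === null) return null;
  const trimmed = raw.trim();
  return /^\d+$/.test(trimmed) ? Number(trimmed) : null;
}

// Returns how long to wait before retrying, or null if the response was not rate limited.
function retryDelayMs(
  status: number,
  headers: HeaderReader,
  nowMs: number = Date.now()
): number | null {
  if (status !== 429) return null;

  // Prefer Retry-After: it is a relative delay in seconds.
  const retryAfterSec = parseIntHeader(headers, 'Retry-After');
  if (retryAfterSec !== null) return retryAfterSec * 1000;

  // Fall back to the absolute reset time (Unix seconds).
  const resetSec = parseIntHeader(headers, 'X-RateLimit-Reset');
  if (resetSec !== null) return Math.max(0, resetSec * 1000 - nowMs);

  // No usable header: wait a conservative default.
  return 1000;
}
```

Adding random jitter to the returned delay helps avoid many clients retrying at the same instant when a shared window resets.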

Common Rate Limiting Mistakes

| Mistake | Problem | Fix |
| --- | --- | --- |
| No headers | Clients can’t adapt | Always return limit headers |
| Per-IP only | Shared IPs (NAT) get blocked unfairly | Combine with API key/user |
| Flat limits | Heavy endpoints abuse cheap ones | Per-endpoint limits by cost |
| No 429 body | Clients don’t know when to retry | Include retryAfter in response |
| Hard rejection | No graceful degradation | Consider queueing or throttling |
| Same limits everywhere | Expensive ops drain quota | Tiered limits by endpoint cost |
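
One way to implement "limits by cost" is a token bucket where each endpoint spends a different number of tokens. The sketch below is an in-memory illustration. The `TokenBucket` class, the `ENDPOINT_COST` map, and its values are hypothetical and not taken from the article's Redis limiter.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private capacity: number;
  private refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, now: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full so short bursts are allowed
    this.lastRefill = now;
  }

  // Tries to spend `cost` tokens. `now` is injectable for testing.
  tryConsume(cost: number, now: number = Date.now()): boolean {
    // Refill in proportion to elapsed time, capped at capacity.
    const elapsedSec = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = Math.max(this.lastRefill, now);

    if (cost > this.tokens) return false;
    this.tokens -= cost;
    return true;
  }
}

// Hypothetical per-endpoint costs: expensive operations drain the bucket faster.
const ENDPOINT_COST: Record<string, number> = {
  'GET /users': 1,
  'GET /search': 5,
  'POST /reports': 20
};

// Unknown endpoints default to a cost of 1.
function costOf(endpoint: string): number {
  return ENDPOINT_COST[endpoint] ?? 1;
}
```

With a capacity of 20 and a refill of one token per second, a client could make twenty cheap reads in a burst or a single report request, and then has to wait for tokens to accumulate either way.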

Pagination Common Mistakes

| Mistake | Problem | Fix |
| --- | --- | --- |
| No total count | UI can’t show “Page X of Y” | Include totalCount (optional) |
| Large default limit | Slow responses, timeouts | Default 20, max 100 |
| Offset on large data | Performance degrades | Switch to cursor-based |
| Exposing internal IDs | Cursor reveals DB structure | Encode cursors opaquely |
| Missing hasNextPage | Client fetches empty page | Always include in pageInfo |
| Inconsistent ordering | Items appear/disappear | Require stable sort key |

What’s Next?

Now that you understand traffic control, Part 4: Versioning & Idempotency covers API evolution strategies and making your endpoints safe for retries.
