API Design Part 8: Production Mastery

API Design Mastery Series

This is Part 8 of our comprehensive API Design series.

Part	Topic	Level
1	HTTP & REST Fundamentals	Beginner
2	Security & Authentication	Beginner
3	Rate Limiting & Pagination	Intermediate
4	Versioning & Idempotency	Intermediate
5	Caching Strategies	Intermediate
6	GraphQL & gRPC	Intermediate
7	Resilience & Observability	Advanced
8	Production Mastery	Advanced

API Lifecycle: From Design to Deprecation

Real Interview Questions

Beginner Level

Q: “What is REST?”

Strong Answer: “REST (Representational State Transfer) is an architectural style based on six constraints: client-server separation, statelessness, cacheability, uniform interface, layered system, and optional code-on-demand. In practice, REST APIs use HTTP methods semantically - GET for retrieval, POST for creation, PUT/PATCH for updates, DELETE for removal - and organize data around resources identified by URLs.”

Follow-up: “What makes an API truly RESTful vs just HTTP-based?”

Answer: “Many APIs called ‘REST’ are actually just HTTP APIs. True REST requires: using HTTP methods correctly (not POST for everything), proper status codes, resource-based URLs (not action-based like /getUser), and ideally HATEOAS where responses include links to related actions. Most production APIs stop at Level 2 of Richardson’s maturity model.”

Q: “Explain idempotency”

Strong Answer: “An operation is idempotent if performing it multiple times has the same effect as performing it once. GET, PUT, DELETE, and HEAD are idempotent by design. POST typically isn’t - creating the same order twice creates two orders. This matters critically in distributed systems where retries happen. If a payment POST times out, retrying without idempotency handling could double-charge. Solutions include client-generated idempotency keys that the server uses to deduplicate requests.”
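
The key-based deduplication described above could be sketched roughly as follows. This is a minimal in-memory illustration, not a production design: `IdempotencyStore`, `ChargeResult`, and `createCharge` are hypothetical names, and a real service would presumably keep keys in a shared store with an expiry.

```typescript
// Hypothetical result type, used only for illustration.
interface ChargeResult {
  chargeId: string;
  amountCents: number;
}

class IdempotencyStore {
  // Maps idempotency key -> promise of the first request's result.
  // Storing the promise also deduplicates retries that arrive while
  // the first request is still in flight.
  private results = new Map<string, Promise<ChargeResult>>();

  // Runs `operation` at most once per key; later calls get the stored result.
  run(key: string, operation: () => Promise<ChargeResult>): Promise<ChargeResult> {
    const existing = this.results.get(key);
    if (existing) return existing;

    const pending = operation().catch((err) => {
      // Failures are not cached, so the client may retry with the same key.
      this.results.delete(key);
      throw err;
    });
    this.results.set(key, pending);
    return pending;
  }
}

// Hypothetical side-effecting operation: counts how many charges were created.
let chargesCreated = 0;
async function createCharge(amountCents: number): Promise<ChargeResult> {
  chargesCreated += 1;
  return { chargeId: `ch_${chargesCreated}`, amountCents };
}
```

Calling `store.run('key-1', () => createCharge(500))` twice creates one charge and returns the same `chargeId` both times.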

Intermediate Level

Q: “How would you design rate limiting for a multi-tier API?”

Strong Answer: “I’d implement multiple layers:

Global: CDN-level protection against DDoS, very high limits
IP-based: Protect against unauthenticated abuse, moderate limits
User/API-key: Based on subscription tier, the main business limit
Endpoint-specific: Expensive operations (search, exports) get stricter limits
Resource-specific: Prevent enumeration attacks (failed login attempts per account)

For the algorithm, I’d use sliding window counters in Redis - good balance of accuracy and memory. The key insight is making limits proportional to cost: a search that scans millions of rows should have stricter limits than a simple read.”
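
As a rough illustration of the sliding window counter mentioned above, here is an in-memory sketch. The answer assumes Redis; this version swaps in a plain `Map` so it stays self-contained, and the class and parameter names are hypothetical.

```typescript
interface WindowState {
  windowStart: number; // start of the current fixed window, in ms
  current: number;     // requests counted in the current window
  previous: number;    // requests counted in the window just before it
}

class SlidingWindowCounter {
  private state = new Map<string, WindowState>();

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    let s = this.state.get(key);

    if (!s) {
      s = { windowStart, current: 0, previous: 0 };
      this.state.set(key, s);
    } else if (windowStart !== s.windowStart) {
      // Roll over. The old count only matters if the windows are adjacent.
      const adjacent = windowStart - s.windowStart === this.windowMs;
      s.previous = adjacent ? s.current : 0;
      s.current = 0;
      s.windowStart = windowStart;
    }

    // Weight the previous window by the share of it that still overlaps
    // the sliding window ending at `now`.
    const elapsedFraction = (now - windowStart) / this.windowMs;
    const estimated = s.previous * (1 - elapsedFraction) + s.current;

    if (estimated >= this.limit) return false;
    s.current += 1;
    return true;
  }
}
```

Only two counters are kept per key, which is where the memory savings over a full request log come from; the price is that the count is an estimate rather than exact.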

Q: “Design a pagination system for a feed with real-time updates”

Strong Answer: “Offset pagination breaks with real-time data - items shift positions. I’d use cursor-based pagination:

Cursor: Encode the last item’s ID and timestamp (for sorting)
Direction: Support both ‘after’ (newer) and ‘before’ (older)
Real-time integration: New items since cursor can be fetched with ‘after’ cursor
Consistency: The cursor represents a stable point, unaffected by new inserts

The cursor is an opaque base64-encoded JSON containing {id, timestamp, direction}. The client doesn’t parse it - just passes it back. This lets us change internal structure without breaking clients.”
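
One possible way to encode and decode such a cursor is sketched below. It assumes Node.js (`Buffer`) and uses the URL-safe base64 variant so the token can travel in a query string; the `FeedCursor` type and function names are illustrative, not taken from any specific library.

```typescript
import { Buffer } from 'node:buffer';

interface FeedCursor {
  id: string;
  timestamp: string; // ISO 8601 timestamp of the last item seen
  direction: 'after' | 'before';
}

// Serialize the cursor into an opaque token for the client.
function encodeCursor(cursor: FeedCursor): string {
  return Buffer.from(JSON.stringify(cursor), 'utf8').toString('base64url');
}

// Parse a token sent back by the client, rejecting anything malformed.
function decodeCursor(token: string): FeedCursor {
  let parsed: unknown;
  try {
    parsed = JSON.parse(Buffer.from(token, 'base64url').toString('utf8'));
  } catch {
    throw new Error('Invalid cursor');
  }

  const c = parsed as Partial<FeedCursor> | null;
  if (
    typeof c?.id !== 'string' ||
    typeof c.timestamp !== 'string' ||
    (c.direction !== 'after' && c.direction !== 'before')
  ) {
    throw new Error('Invalid cursor');
  }
  return { id: c.id, timestamp: c.timestamp, direction: c.direction };
}
```

Because the server validates the decoded shape, a tampered or stale-format token fails with a clear error instead of producing a malformed query.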

Senior Level

Q: “Design the API for a payment processing system”

Strong Answer:

Idempotency: Every mutating endpoint requires an idempotency key. A payment must take effect exactly once, so store the key with the result and return the cached response on retry.

State machine: Payment status follows defined transitions (pending → processing → succeeded/failed). Invalid transitions return 409 Conflict.

Async operations: Payment processing is asynchronous. POST /payments returns 202 Accepted with a polling URL or webhook registration. Never make clients wait for bank responses.

Security: PCI compliance - never see full card numbers, use tokenization. Request signing with timestamps to prevent replay. IP allowlisting for webhook receivers.

API design:

POST   /payments              - Initiate payment (idempotency-key required)
GET    /payments/:id          - Get status
POST   /payments/:id/capture  - Capture authorized payment
POST   /payments/:id/refund   - Refund payment
GET    /payments/:id/events   - Audit log

Webhooks:
payment.created, payment.processing, payment.succeeded, payment.failed, payment.refunded
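
The state machine item in the answer above could be sketched like this. It covers only the transitions named there (pending → processing → succeeded/failed) and leaves out refunds; the type and function names are hypothetical.

```typescript
type PaymentStatus = 'pending' | 'processing' | 'succeeded' | 'failed';

// Allowed next states for each status. Terminal states allow none.
const ALLOWED_TRANSITIONS: Record<PaymentStatus, PaymentStatus[]> = {
  pending: ['processing'],
  processing: ['succeeded', 'failed'],
  succeeded: [],
  failed: []
};

interface TransitionResult {
  httpStatus: 200 | 409;
  status: PaymentStatus; // the status after the attempted transition
}

// Applies a transition if it is allowed; otherwise reports 409 Conflict
// and leaves the status unchanged.
function transition(current: PaymentStatus, next: PaymentStatus): TransitionResult {
  if (!ALLOWED_TRANSITIONS[current].includes(next)) {
    return { httpStatus: 409, status: current };
  }
  return { httpStatus: 200, status: next };
}
```

Keeping the transitions in one table makes it straightforward to audit which state changes the API will ever accept.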

Q: “Your API is experiencing 10x normal traffic. Walk me through your response.”

Strong Answer:

Immediate triage (first 5 minutes):

Is it legitimate traffic? Check geographic distribution, user agents, request patterns.
What’s the impact? Check error rates, latency percentiles, queue depths.

If legitimate traffic:

Scale horizontally (auto-scaling should kick in)
Enable aggressive caching
Consider graceful degradation (disable expensive features)
Rate limit per-user to ensure fair access

If attack/abuse:

Enable IP-based rate limiting at edge (CDN)
Block obvious bot patterns
If targeted at specific endpoint, add CAPTCHA or proof-of-work

After stabilization:

Post-mortem: why didn’t auto-scaling keep up?
Capacity planning: do we need more headroom?
Load testing: can we simulate this for future preparedness?

Production Debugging Scenarios

Scenario 1: Intermittent 502 Errors

Symptom: Users report occasional “Bad Gateway” errors, but you can’t reproduce.

Investigation:

# Check load balancer logs for 502s
grep ' 502 ' /var/log/nginx/access.log | tail -100

# Look for patterns - specific endpoints? times?
grep ' 502 ' /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn

Common causes & solutions:

Cause	Solution
Upstream timeout	Increase `proxy_read_timeout`
Connection pool exhaustion	Increase pool `max` connections
OOM kills	Add memory, optimize queries
Graceful restart timing	Configure proper drain time

Scenario 2: Memory Leak

Symptom: Server memory grows over days, eventually crashes.

Investigation:

// Add memory tracking endpoint (dev only)
app.get('/debug/memory', (req, res) => {
  const used = process.memoryUsage();
  res.json({
    heapUsed: Math.round(used.heapUsed / 1024 / 1024) + 'MB',
    heapTotal: Math.round(used.heapTotal / 1024 / 1024) + 'MB',
    rss: Math.round(used.rss / 1024 / 1024) + 'MB'
  });
});

Common causes:

Cause	Solution
Event listener leaks	Remove listeners on cleanup
Closure leaks	Avoid holding large objects in closures
Cache without eviction	Use LRU cache with size limits
Unreleased connections	Ensure proper connection cleanup
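
For the "cache without eviction" row, a size-bounded LRU cache might look like the sketch below. It relies on `Map` preserving insertion order; `LruCache` is an illustrative name, and a maintained library would likely be the better choice in production.

```typescript
class LruCache<K, V> {
  // Map iterates in insertion order, so the first key is the least recently used.
  private entries = new Map<K, V>();

  constructor(private maxEntries: number) {}

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);

    // Evict the least recently used entry once the cap is exceeded.
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With the cap in place, memory used by the cache is bounded by `maxEntries` times the entry size instead of growing with uptime.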

Scenario 3: Slow Endpoint

Symptom: GET /users/search takes 5+ seconds.

Investigation - Add timing to identify bottleneck:

async function searchUsers(query: string) {
  const timings: Record<string, number> = {};
  const start = Date.now();

  const parsed = parseQuery(query);
  timings.parse = Date.now() - start;

  const dbStart = Date.now();
  const results = await db.user.findMany({ where: buildWhere(parsed) });
  timings.database = Date.now() - dbStart;

  timings.total = Date.now() - start;
  console.log('Search timings:', timings);
  return results;
}
// Output: { "parse": 1, "database": 4500, "total": 4506 }

Database is bottleneck → Check query:

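EXPLAIN ANALYZE SELECT * FROM users WHERE name ILIKE '%john%';
-- A leading-wildcard ILIKE cannot use a B-tree index, and a to_tsvector GIN index
-- only helps @@ full-text queries. For substring matching, add a trigram index:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX users_name_trgm_idx ON users USING gin (name gin_trgm_ops);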

Multi-Tenancy Patterns

Isolation Strategies

Strategy	Isolation	Cost	Best For
Row-Level	Lowest	Lowest	SaaS, startups
Schema-per-Tenant	Medium	Medium	Regulated industries
Database-per-Tenant	Highest	Highest	Enterprise, compliance

Row-Level Security Implementation

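// Prisma middleware for automatic tenant filtering.
// Note: $use is deprecated in recent Prisma releases in favor of client extensions ($extends).
prisma.$use(async (params, next) => {
  const tenantId = getCurrentTenant();

  if (tenantId && params.model && TENANT_MODELS.includes(params.model)) {
    params.args = params.args ?? {}; // e.g. findMany() called with no arguments

    // Auto-add tenant filter to reads
    if (['findMany', 'findFirst', 'count'].includes(params.action)) {
      params.args.where = { ...params.args.where, tenantId };
    }

    // Stamp the current tenant on new rows so a caller cannot write into another tenant
    if (params.action === 'create') {
      params.args.data = { ...params.args.data, tenantId };
    }

    // Scope updates and deletes to the current tenant
    if (['update', 'updateMany', 'delete', 'deleteMany'].includes(params.action)) {
      params.args.where = { ...params.args.where, tenantId };
    }
    // findUnique is not covered here and needs its own tenant check
  }

  return next(params);
});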

Noisy Neighbor Protection

class NoisyNeighborProtection {
  private thresholds = {
    requestsPerMinute: 1000,
    cpuMsPerMinute: 60000,
    memoryBytesMax: 512 * 1024 * 1024
  };

  async checkAndTrack(tenantId: string, metrics: TenantUsage): Promise<boolean> {
    if (metrics.requestCount > this.thresholds.requestsPerMinute) {
      await this.alert('rate_limit', tenantId);
      return false;
    }
    // ... check other thresholds
    return true;
  }
}

Async & Event-Driven Design

When to Go Async

Operation Duration	Pattern	Response
< 1 second	Sync	200 OK + result
1-30 seconds	Long polling / SSE	Connection held open
> 30 seconds	Async + webhooks	202 Accepted + callback
Minutes/Hours	Background job	202 Accepted + polling URL
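
The "202 Accepted + polling URL" row can be illustrated with a small in-memory sketch. The `/jobs` paths, `JobStore`, and the response shape are assumptions made for this example, not part of any framework.

```typescript
type JobState = 'queued' | 'done';

interface Job {
  id: string;
  state: JobState;
  result?: unknown;
}

interface HttpLikeResponse {
  status: number;
  headers: Record<string, string>;
  body: unknown;
}

class JobStore {
  private jobs = new Map<string, Job>();
  private nextId = 1;

  // Models POST /jobs: record the job and answer at once with 202 and a polling URL.
  submit(): HttpLikeResponse {
    const job: Job = { id: String(this.nextId++), state: 'queued' };
    this.jobs.set(job.id, job);
    return {
      status: 202,
      headers: { Location: `/jobs/${job.id}` },
      body: { id: job.id, state: job.state }
    };
  }

  // Called by the background worker when the work finishes.
  complete(id: string, result: unknown): void {
    const job = this.jobs.get(id);
    if (job) {
      job.state = 'done';
      job.result = result;
    }
  }

  // Models GET /jobs/:id: clients poll this until state is "done".
  poll(id: string): HttpLikeResponse {
    const job = this.jobs.get(id);
    if (!job) return { status: 404, headers: {}, body: { error: 'job not found' } };
    return { status: 200, headers: {}, body: { ...job } };
  }
}
```

The client never waits on the work itself: it gets the `Location` header back immediately and checks that URL on its own schedule.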

Webhook Implementation

import { createHmac } from 'node:crypto';

// WebhookEvent, WebhookSubscription, and disableWebhook are assumed to be defined elsewhere.
export class WebhookService {
  private retryDelays = [0, 60, 300, 900, 3600]; // seconds to wait before each attempt

  async deliver(event: WebhookEvent, subscription: WebhookSubscription): Promise<void> {
    const payload = JSON.stringify(event);
    const signature = this.sign(payload, subscription.secret);

    for (let attempt = 0; attempt < this.retryDelays.length; attempt++) {
      if (attempt > 0) await this.sleep(this.retryDelays[attempt] * 1000);

      try {
        const response = await fetch(subscription.url, {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'X-Webhook-ID': event.id,
            'X-Webhook-Signature': signature
          },
          body: payload,
          signal: AbortSignal.timeout(30000)
        });

        if (response.ok) return;
        if (response.status < 500) return; // Don't retry client errors
      } catch {
        // Network failures and timeouts reject the fetch promise; treat them as retryable
      }
    }

    await this.disableWebhook(subscription.id); // Max retries exceeded
  }

  private sign(payload: string, secret: string): string {
    return `sha256=${createHmac('sha256', secret).update(payload).digest('hex')}`;
  }

  private sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }
}
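
On the receiving side, a consumer would need to check the `X-Webhook-Signature` value before trusting the payload. The following is a sketch of one way to do that in Node.js, assuming the `sha256=<hex>` format produced by `sign` above; `verifyWebhook` and `signPayload` are hypothetical helper names.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';
import { Buffer } from 'node:buffer';

// Mirrors the sender's signing scheme: "sha256=" + hex HMAC of the raw body.
function signPayload(payload: string, secret: string): string {
  return `sha256=${createHmac('sha256', secret).update(payload).digest('hex')}`;
}

// Returns true only if the header matches the HMAC of the exact bytes received.
function verifyWebhook(rawBody: string, signatureHeader: string, secret: string): boolean {
  const expected = Buffer.from(signPayload(rawBody, secret), 'utf8');
  const received = Buffer.from(signatureHeader, 'utf8');

  // timingSafeEqual throws when lengths differ, so compare lengths first.
  if (expected.length !== received.length) return false;

  // Constant-time comparison avoids leaking how many characters matched.
  return timingSafeEqual(expected, received);
}
```

The check has to run against the raw request body, since re-serializing parsed JSON can change the bytes and invalidate the signature.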

Large Payloads & Streaming

Streaming Response Pattern

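async function streamLargeDataset(req: Request): Promise<Response> {
  const encoder = new TextEncoder();
  const batchSize = 1000;

  const stream = new ReadableStream({
    async start(controller) {
      controller.enqueue(encoder.encode('{"data":['));

      let first = true;
      let lastId: string | undefined;

      // findMany returns an array, not an async iterator, so page through the
      // table in batches with cursor pagination (assumes a unique `id` column)
      while (true) {
        const batch = await db.user.findMany({
          take: batchSize,
          orderBy: { id: 'asc' },
          ...(lastId !== undefined ? { cursor: { id: lastId }, skip: 1 } : {})
        });

        for (const user of batch) {
          if (!first) controller.enqueue(encoder.encode(','));
          first = false;
          controller.enqueue(encoder.encode(JSON.stringify(user)));
        }

        if (batch.length < batchSize) break;
        lastId = batch[batch.length - 1].id;
      }

      controller.enqueue(encoder.encode(']}'));
      controller.close();
    }
  });

  // The runtime applies chunked transfer encoding itself; don't set Transfer-Encoding by hand
  return new Response(stream, {
    headers: { 'Content-Type': 'application/json' }
  });
}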

Chunked Upload

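class ChunkedUploadService {
  private chunkSize = 5 * 1024 * 1024; // 5MB
  private sessions = new Map<string, UploadSession>();

  async initializeUpload(fileName: string, totalSize: number): Promise<UploadSession> {
    const session: UploadSession = {
      id: crypto.randomUUID(),
      fileName,
      totalSize,
      uploadedSize: 0,
      chunks: new Map(), // assumed here to map chunk number -> byte length
      expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000)
    };
    this.sessions.set(session.id, session); // Track the session so uploadChunk can find it
    return session;
  }

  async uploadChunk(sessionId: string, chunkNumber: number, data: ArrayBuffer) {
    const session = this.sessions.get(sessionId);
    if (!session || session.expiresAt < new Date()) {
      throw new Error('Upload session not found or expired');
    }
    if (data.byteLength > this.chunkSize) {
      throw new Error('Chunk exceeds maximum size');
    }

    await this.storeChunk(sessionId, chunkNumber, data);

    // Count each chunk once so a retried chunk does not inflate uploadedSize
    if (!session.chunks.has(chunkNumber)) {
      session.chunks.set(chunkNumber, data.byteLength);
      session.uploadedSize += data.byteLength;
    }

    if (session.uploadedSize >= session.totalSize) {
      await this.finalizeUpload(session);
    }
  }
}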

Cost-Aware API Design

At scale, every API call has a dollar value. Senior engineers design with cloud bills in mind.

Cost Per Request

Component	Cost per Request
Load Balancer	$0.000001
App Server	$0.00001
Database Query	$0.0001
Cache Hit	$0.000001
External API	$0.001

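Key insight: by these figures, a cache hit costs about 100x less than a database query, and an external API call costs about 10x more than one. Cache aggressively and minimize external calls.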
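
To make the table concrete, here is a small worked calculation using its per-request figures. It is a simplification that assumes every request passes through the load balancer and app server and then hits either the cache or the database; `costPerRequest` and `monthlyCost` are illustrative names.

```typescript
// Per-request costs in dollars, copied from the table above.
const COST = {
  loadBalancer: 0.000001,
  appServer: 0.00001,
  databaseQuery: 0.0001,
  cacheHit: 0.000001
};

// Expected cost of one request when a fraction `cacheHitRatio` (0..1)
// is served from cache and the rest falls through to the database.
function costPerRequest(cacheHitRatio: number): number {
  const dataCost =
    cacheHitRatio * COST.cacheHit + (1 - cacheHitRatio) * COST.databaseQuery;
  return COST.loadBalancer + COST.appServer + dataCost;
}

function monthlyCost(requestsPerMonth: number, cacheHitRatio: number): number {
  return requestsPerMonth * costPerRequest(cacheHitRatio);
}

// 100 million requests per month, without caching and with a 90% hit ratio.
const uncached = monthlyCost(100_000_000, 0);
const cached = monthlyCost(100_000_000, 0.9);
console.log(uncached.toFixed(0), cached.toFixed(0));
```

Under these assumptions, 100 million requests a month cost about $11,100 with no cache and about $2,190 at a 90% hit ratio, roughly a fivefold reduction.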

Endpoint Cost Tiers

const ENDPOINT_CONFIGS = [
  // Cheap - simple reads
  { path: '/users/:id', method: 'GET', costTier: 'cheap', rateLimit: 1000 },

  // Moderate - writes, filtered queries
  { path: '/users', method: 'POST', costTier: 'moderate', rateLimit: 100 },

  // Expensive - aggregations, exports, AI
  { path: '/reports/generate', method: 'POST', costTier: 'expensive', rateLimit: 10 },
  { path: '/ai/analyze', method: 'POST', costTier: 'expensive', rateLimit: 5 }
];

Prevent N+1 Queries

// BAD: N+1 pattern - costs scale with orderIds.length
for (const id of orderIds) {
  const items = await db.orderItem.findMany({ where: { orderId: id } });
}

// GOOD: Batch load - a fixed number of queries regardless of orderIds.length
const orders = await db.order.findMany({
  where: { id: { in: orderIds } },
  include: { items: true }
});

API Governance

Lifecycle Stages

Stage	What Happens	Duration
Draft	RFC process, internal review	As needed
Alpha	Internal testing only	Weeks
Beta	Limited external, breaking changes allowed	Months
Stable	GA, semver strictly enforced	Years
Deprecated	12-month minimum notice	12+ months
Retired	Returns 410 Gone	Permanent

API Registry

interface APIDefinition {
  id: string;
  name: string;
  version: string;
  status: 'draft' | 'alpha' | 'beta' | 'stable' | 'deprecated' | 'retired';
  owner: { team: string; email: string; slackChannel: string };
  sla: { availability: number; latencyP99Ms: number; errorRateMax: number };
  consumers: string[];
  dependencies: string[];
  deprecation?: {
    announcedAt: Date;
    sunsetAt: Date;
    migrationGuide: string;
  };
}
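
A registry entry's `deprecation` block could drive response headers and enforce the 12-month notice from the lifecycle table. The sketch below assumes the `Sunset` header (RFC 8594) and the `Deprecation` header (RFC 9745), and treats `announcedAt` as the deprecation date, which is an assumption; the function names are hypothetical.

```typescript
interface DeprecationInfo {
  announcedAt: Date;
  sunsetAt: Date;
  migrationGuide: string; // URL of the migration guide
}

// True if the sunset date is at least `minMonths` after the announcement.
function hasMinimumNotice(d: DeprecationInfo, minMonths = 12): boolean {
  const earliestSunset = new Date(d.announcedAt.getTime());
  earliestSunset.setUTCMonth(earliestSunset.getUTCMonth() + minMonths);
  return d.sunsetAt.getTime() >= earliestSunset.getTime();
}

// Headers to attach to every response from a deprecated API.
function deprecationHeaders(d: DeprecationInfo): Record<string, string> {
  return {
    // RFC 9745 expresses the date as "@" followed by Unix seconds.
    Deprecation: `@${Math.floor(d.announcedAt.getTime() / 1000)}`,
    // RFC 8594 uses an HTTP-date, which toUTCString() produces.
    Sunset: d.sunsetAt.toUTCString(),
    Link: `<${d.migrationGuide}>; rel="deprecation"`
  };
}
```

Emitting these headers from the registry data keeps the announced dates and what clients actually observe on the wire from drifting apart.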

Breaking Change Approval

Change Type	Required Approvals
Additive	API owner only
Breaking	API owner + architect + affected consumers
Deprecation	API owner + architect + security

The Complete API Engineer Checklist

Category	Must Know	Senior Level
HTTP	Methods, status codes, headers	Content negotiation, conditional requests
Security	JWT, API keys, HTTPS	OAuth flows, mTLS, OWASP top 10
Rate Limiting	Basic implementation	Multi-tier, distributed, adaptive
Pagination	Offset vs cursor	Connection pattern, real-time feeds
Caching	Cache-Control headers	Multi-layer architecture, invalidation
Versioning	URL-based	Migration strategies, deprecation
Idempotency	Concept and why it matters	Production implementation
Resilience	Timeouts, retries	Circuit breakers, bulkheads
Observability	Logging	Metrics, tracing, SLIs/SLOs
API Styles	REST fundamentals	GraphQL, gRPC, when to use each
Contracts	OpenAPI basics	CDC testing, schema validation
Async	Webhooks basics	Event schemas, backpressure
Multi-tenancy	Tenant filtering	Isolation models, noisy neighbor
Streaming	Basic pagination	Chunked uploads, gRPC streaming
Cost	Basic awareness	Fan-out prevention, endpoint costing
Governance	Documentation	Registry, change management

Series Complete

Congratulations on completing the API Design Mastery series! You’ve covered everything from HTTP fundamentals to production debugging and enterprise governance.

The difference between junior and senior engineers isn’t just knowing these concepts - it’s having implemented them, debugged them at 3 AM, and understanding the tradeoffs.

The best API is the one your consumers love to use and your operations team can sleep through.

API Design Principles (The Golden Rules)

Principle	Description	Example
Consistency	Same patterns everywhere	All resources use `GET`, `POST`, `PUT`, `DELETE`
Predictability	No surprises	Same error format across all endpoints
Simplicity	Easy to understand	`GET /users/123` not `GET /fetchUserById?id=123`
Documentation	Always current	OpenAPI spec auto-generated from code
Versioning	Backward compatible	Changes additive, breaking = new version
Security first	Defense in depth	Auth, rate limits, input validation

Production Readiness Checklist

Before launching any API to production:

□ SECURITY
  □ All endpoints require authentication (except health checks)
  □ Input validation on all parameters
  □ Rate limiting configured per tier
  □ CORS configured with specific origins
  □ Security headers set (HSTS, CSP, etc.)
  □ Sensitive data redacted from logs
  □ SQL injection / XSS prevention verified

□ RELIABILITY
  □ Health check endpoints (/health/live, /health/ready)
  □ Timeouts configured for all external calls
  □ Circuit breakers for critical dependencies
  □ Retry logic with exponential backoff
  □ Graceful degradation paths defined
  □ Connection pools properly sized

□ OBSERVABILITY
  □ Structured logging with trace IDs
  □ Metrics for latency, errors, throughput
  □ Distributed tracing configured
  □ Alerting on SLO violations
  □ Dashboards for key metrics

□ DOCUMENTATION
  □ OpenAPI/Swagger spec available
  □ Error codes documented
  □ Rate limits documented
  □ Authentication documented
  □ Changelog maintained
  □ SDK/examples provided

□ OPERATIONS
  □ Deployment pipeline tested
  □ Rollback procedure documented
  □ On-call runbook created
  □ Capacity planning done
  □ Backup and recovery tested
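
The "retry logic with exponential backoff" item in the checklist above might be implemented along these lines. This is a sketch using the full-jitter variant, where each delay is a random value up to the exponential ceiling; `retryWithBackoff`, `backoffDelayMs`, and the option names are hypothetical.

```typescript
// Full-jitter delay: a random value in [0, min(maxDelayMs, baseDelayMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseDelayMs: number,
  maxDelayMs: number,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable for tests
  random?: () => number;                 // injectable for tests
}

// Runs `operation`, retrying on any thrown error until maxAttempts is reached.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastError: unknown;

  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt === options.maxAttempts - 1) break; // no sleep after the last try
      await sleep(
        backoffDelayMs(attempt, options.baseDelayMs, options.maxDelayMs, options.random)
      );
    }
  }
  throw lastError;
}
```

The jitter spreads retries from many clients over time, so a recovering dependency is not hit by synchronized waves of requests. A real caller would also retry only errors known to be transient and only operations that are idempotent.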

Quick Debugging Commands

# Check API health
curl -s https://api.example.com/health | jq

# Test with authentication
curl -H "Authorization: Bearer $TOKEN" https://api.example.com/users/me

# Check response headers
curl -I https://api.example.com/users

# Time request breakdown
curl -w "@curl-format.txt" -o /dev/null -s https://api.example.com/users

# Check rate limit headers
curl -sI https://api.example.com/users | grep -i ratelimit

# Test with specific request ID
curl -H "X-Request-ID: $(uuidgen)" https://api.example.com/users

# Check SSL certificate
echo | openssl s_client -connect api.example.com:443 2>/dev/null | openssl x509 -noout -dates

The Complete API Design Quick Reference
