API Design Part 8: Production Mastery

Master real interview questions, production debugging, multi-tenancy, streaming, cost-aware design, and API governance. Complete your journey to becoming a senior API engineer.

Moshiour Rahman

API Design Mastery Series

This is Part 8 of our comprehensive API Design series.

| Part | Topic | Level |
|------|-------|-------|
| 1 | HTTP & REST Fundamentals | Beginner |
| 2 | Security & Authentication | Beginner |
| 3 | Rate Limiting & Pagination | Intermediate |
| 4 | Versioning & Idempotency | Intermediate |
| 5 | Caching Strategies | Intermediate |
| 6 | GraphQL & gRPC | Intermediate |
| 7 | Resilience & Observability | Advanced |
| 8 | Production Mastery | Advanced |

[Figure: API Lifecycle: From Design to Deprecation]

Real Interview Questions

Beginner Level

Q: “What is REST?”

Strong Answer: “REST (Representational State Transfer) is an architectural style based on six constraints: client-server separation, statelessness, cacheability, uniform interface, layered system, and optional code-on-demand. In practice, REST APIs use HTTP methods semantically - GET for retrieval, POST for creation, PUT/PATCH for updates, DELETE for removal - and organize data around resources identified by URLs.”

Follow-up: “What makes an API truly RESTful vs just HTTP-based?”

Answer: “Many APIs called ‘REST’ are actually just HTTP APIs. True REST requires: using HTTP methods correctly (not POST for everything), proper status codes, resource-based URLs (not action-based like /getUser), and ideally HATEOAS where responses include links to related actions. Most production APIs stop at Level 2 of Richardson’s maturity model.”
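To make the Level 2 vs. Level 3 difference concrete, here is a minimal sketch of a HATEOAS-style (Level 3) response. The `Order` shape, the HAL-style `_links` key, and the URL layout are illustrative assumptions, not something the answer prescribes.

```typescript
// Hypothetical order resource; fields are illustrative
interface Order {
  id: string;
  status: 'pending' | 'paid' | 'cancelled';
  total: number;
}

interface Link {
  href: string;
  method: 'GET' | 'POST' | 'PUT' | 'PATCH' | 'DELETE';
}

// Level 3: the response tells the client which actions are valid *right now*
function toRepresentation(order: Order) {
  const self = `/orders/${order.id}`;
  const links: Record<string, Link> = {
    self: { href: self, method: 'GET' },
    items: { href: `${self}/items`, method: 'GET' }
  };
  // Only pending orders can be paid or cancelled, so only they advertise those links
  if (order.status === 'pending') {
    links.pay = { href: `${self}/payments`, method: 'POST' };
    links.cancel = { href: `${self}/cancellation`, method: 'POST' };
  }
  return { ...order, _links: links };
}

const pending = toRepresentation({ id: '42', status: 'pending', total: 99.5 });
console.log(Object.keys(pending._links)); // [ 'self', 'items', 'pay', 'cancel' ]
```

A Level 2 API would return the same order without `_links`, leaving clients to hard-code which URLs and methods apply in each state.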


Q: “Explain idempotency”

Strong Answer: “An operation is idempotent if performing it multiple times has the same effect as performing it once. GET, PUT, DELETE, and HEAD are idempotent by design. POST typically isn’t - creating the same order twice creates two orders. This matters critically in distributed systems where retries happen. If a payment POST times out, retrying without idempotency handling could double-charge. Solutions include client-generated idempotency keys that the server uses to deduplicate requests.”
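A minimal sketch of server-side deduplication with client-generated idempotency keys, assuming an in-memory `Map` as a stand-in for a shared store. `handleIdempotent` is a hypothetical helper; answering 422 when a key is reused with a different payload mirrors what the IETF Idempotency-Key header draft suggests.

```typescript
import { createHash } from 'node:crypto';

interface StoredResponse {
  requestHash: string; // fingerprint of the original request body
  status: number;
  body: unknown;
}

// In-memory stand-in for a shared store (database table, Redis) keyed by idempotency key
const idempotencyStore = new Map<string, StoredResponse>();

async function handleIdempotent(
  key: string,
  requestBody: unknown,
  execute: () => Promise<{ status: number; body: unknown }>
): Promise<{ status: number; body: unknown; replayed: boolean }> {
  const requestHash = createHash('sha256').update(JSON.stringify(requestBody)).digest('hex');
  const existing = idempotencyStore.get(key);

  if (existing) {
    // Same key with a different payload is a client bug, not a retry
    if (existing.requestHash !== requestHash) {
      return { status: 422, body: { error: 'Idempotency key reused with different parameters' }, replayed: false };
    }
    // A genuine retry: return the stored result without executing again
    return { status: existing.status, body: existing.body, replayed: true };
  }

  const result = await execute();
  idempotencyStore.set(key, { requestHash, ...result });
  return { ...result, replayed: false };
}
```

This sketch is not safe under concurrent retries: two requests with the same key can both miss the store and both execute. A production version needs an atomic insert-if-absent (for example, a unique constraint on the key column) and should answer in-flight duplicates with 409 Conflict.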


Intermediate Level

Q: “How would you design rate limiting for a multi-tier API?”

Strong Answer: “I’d implement multiple layers:

  1. Global: CDN-level protection against DDoS, very high limits
  2. IP-based: Protect against unauthenticated abuse, moderate limits
  3. User/API-key: Based on subscription tier, the main business limit
  4. Endpoint-specific: Expensive operations (search, exports) get stricter limits
  5. Resource-specific: Prevent enumeration attacks (failed login attempts per account)

For the algorithm, I’d use sliding window counters in Redis - good balance of accuracy and memory. The key insight is making limits proportional to cost: a search that scans millions of rows should have stricter limits than a simple read.”
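The sliding window counter needs only two counters per key: the current fixed window and the previous one, with the previous count weighted by how much of it still overlaps the sliding window. This sketch keeps state in a process-local `Map` (with an injectable clock for testing) as a stand-in for Redis, where the same counters would typically live in per-window keys with an expiry. The class name and API are illustrative.

```typescript
class SlidingWindowCounter {
  private windows = new Map<string, { start: number; current: number; previous: number }>();

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
    private readonly now: () => number = Date.now
  ) {}

  allow(key: string): boolean {
    const t = this.now();
    const windowStart = Math.floor(t / this.windowMs) * this.windowMs;
    let w = this.windows.get(key);

    if (!w || windowStart - w.start >= 2 * this.windowMs) {
      // No history, or last activity is older than the previous window
      w = { start: windowStart, current: 0, previous: 0 };
    } else if (windowStart !== w.start) {
      // Rolled into the next fixed window: current count becomes the previous count
      w = { start: windowStart, current: 0, previous: w.current };
    }
    this.windows.set(key, w);

    // Weight the previous window by the fraction of it still inside the sliding window
    const elapsedFraction = (t - windowStart) / this.windowMs;
    const estimated = w.previous * (1 - elapsedFraction) + w.current;
    if (estimated >= this.limit) return false;

    w.current += 1;
    return true;
  }
}
```

The estimate assumes requests in the previous window were evenly spread, which is the accuracy trade-off that keeps memory at two integers per key instead of one timestamp per request.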


Q: “Design a pagination system for a feed with real-time updates”

Strong Answer: “Offset pagination breaks with real-time data - items shift positions. I’d use cursor-based pagination:

  1. Cursor: Encode the last item’s ID and timestamp (for sorting)
  2. Direction: Support both ‘after’ (newer) and ‘before’ (older)
  3. Real-time integration: New items since cursor can be fetched with ‘after’ cursor
  4. Consistency: The cursor represents a stable point, unaffected by new inserts

The cursor is an opaque base64-encoded JSON containing {id, timestamp, direction}. The client doesn’t parse it - just passes it back. This lets us change internal structure without breaking clients.”
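A sketch of the opaque cursor and the keyset filter behind it, assuming a feed sorted newest-first by `(timestamp, id)` so ties on timestamp are broken deterministically. The helper names are hypothetical, and the in-memory `page` function stands in for a database query.

```typescript
interface Cursor {
  id: string;
  timestamp: number; // epoch ms of the last item the client saw
  direction: 'after' | 'before';
}

interface FeedItem { id: string; timestamp: number }

// Opaque to clients: base64url-encoded JSON they pass back unchanged
function encodeCursor(c: Cursor): string {
  return Buffer.from(JSON.stringify(c), 'utf8').toString('base64url');
}

function decodeCursor(raw: string): Cursor {
  const parsed = JSON.parse(Buffer.from(raw, 'base64url').toString('utf8'));
  if (typeof parsed?.id !== 'string' || typeof parsed?.timestamp !== 'number' ||
      (parsed.direction !== 'after' && parsed.direction !== 'before')) {
    throw new Error('Invalid cursor'); // map to 400 Bad Request at the API layer
  }
  return parsed as Cursor;
}

// Newest first by timestamp, then id descending as a tiebreaker
const newestFirst = (a: FeedItem, b: FeedItem) =>
  b.timestamp - a.timestamp || (a.id < b.id ? 1 : a.id > b.id ? -1 : 0);

// 'before' = items older than the cursor, 'after' = items newer than it
function page(items: FeedItem[], cursor: Cursor | null, limit: number): FeedItem[] {
  const sorted = [...items].sort(newestFirst);
  if (!cursor) return sorted.slice(0, limit);
  const c = cursor;
  const isOlder = (x: FeedItem) => x.timestamp < c.timestamp || (x.timestamp === c.timestamp && x.id < c.id);
  const isNewer = (x: FeedItem) => x.timestamp > c.timestamp || (x.timestamp === c.timestamp && x.id > c.id);
  return c.direction === 'before'
    ? sorted.filter(isOlder).slice(0, limit)
    : sorted.filter(isNewer).slice(-limit); // the newer items closest to the cursor
}
```

Because the filter is relative to a fixed `(timestamp, id)` point, items inserted at the top of the feed don't shift older pages. In SQL the same filter is typically a row comparison such as `WHERE (created_at, id) < ($1, $2) ORDER BY created_at DESC, id DESC LIMIT $3` (column names hypothetical), backed by an index on those columns.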


Senior Level

Q: “Design the API for a payment processing system”

Strong Answer:

Idempotency: Every mutating endpoint requires an idempotency key. A payment must take effect exactly once: store the key with the result and return the cached response on retry.

State machine: Payment status follows defined transitions (pending → processing → succeeded/failed). Invalid transitions return 409 Conflict.
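The transition rules can live in one table that every write path consults. A sketch in TypeScript; the `refunded` state (inferred from the refund endpoint in the API design below) and the `ConflictError` class are assumptions.

```typescript
type PaymentStatus = 'pending' | 'processing' | 'succeeded' | 'failed' | 'refunded';

// Allowed transitions; terminal states have no outgoing edges
const TRANSITIONS: Record<PaymentStatus, PaymentStatus[]> = {
  pending: ['processing'],
  processing: ['succeeded', 'failed'],
  succeeded: ['refunded'],
  failed: [],
  refunded: []
};

// Carries the HTTP status the API layer should return
class ConflictError extends Error {
  readonly status = 409;
}

function transition(current: PaymentStatus, next: PaymentStatus): PaymentStatus {
  if (!TRANSITIONS[current].includes(next)) {
    throw new ConflictError(`Cannot move payment from ${current} to ${next}`);
  }
  return next;
}
```

When persisting, pair this check with a conditional update (update only where the stored status still equals `current`) so two concurrent requests cannot both apply a transition.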

Async operations: Payment processing is asynchronous. POST /payments returns 202 Accepted with a polling URL or webhook registration. Never make clients wait for bank responses.

Security: Minimize PCI DSS scope - never let full card numbers reach your servers; use tokenization. Request signing with timestamps to prevent replay. IP allowlisting for webhook receivers.
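Timestamped request signing can be sketched with Node's built-in `crypto` module. The `<timestamp>.<body>` signing format and the five-minute tolerance are assumptions for illustration (similar schemes are common among payment providers), not any specific provider's protocol.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

const TOLERANCE_SECONDS = 300; // assumed window: reject signatures older than 5 minutes

// Signing "<timestamp>.<body>" binds the timestamp to the payload, so an attacker
// can't replay a captured request later with a fresh timestamp
function signRequest(body: string, secret: string, timestamp = Math.floor(Date.now() / 1000)) {
  const signature = createHmac('sha256', secret).update(`${timestamp}.${body}`).digest('hex');
  return { timestamp, signature };
}

function verifyRequest(
  body: string,
  timestamp: number,
  signature: string,
  secret: string,
  nowSeconds = Math.floor(Date.now() / 1000)
): boolean {
  if (Math.abs(nowSeconds - timestamp) > TOLERANCE_SECONDS) return false; // stale or future-dated
  const expected = Buffer.from(signRequest(body, secret, timestamp).signature, 'hex');
  const received = Buffer.from(signature, 'hex');
  // Constant-time comparison; timingSafeEqual throws if lengths differ
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The timestamp window only bounds replays; rejecting a repeat inside the window also requires remembering recently seen signatures or request IDs.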

API design:

POST   /payments              - Initiate payment (idempotency-key required)
GET    /payments/:id          - Get status
POST   /payments/:id/capture  - Capture authorized payment
POST   /payments/:id/refund   - Refund payment
GET    /payments/:id/events   - Audit log

Webhooks:
payment.created, payment.processing, payment.succeeded, payment.failed, payment.refunded

Q: “Your API is experiencing 10x normal traffic. Walk me through your response.”

Strong Answer:

Immediate triage (first 5 minutes):

  1. Is it legitimate traffic? Check geographic distribution, user agents, request patterns.
  2. What’s the impact? Check error rates, latency percentiles, queue depths.

If legitimate traffic:

  • Scale horizontally (auto-scaling should kick in)
  • Enable aggressive caching
  • Consider graceful degradation (disable expensive features)
  • Rate limit per-user to ensure fair access

If attack/abuse:

  • Enable IP-based rate limiting at edge (CDN)
  • Block obvious bot patterns
  • If targeted at specific endpoint, add CAPTCHA or proof-of-work

After stabilization:

  • Post-mortem: why didn’t auto-scaling keep up?
  • Capacity planning: do we need more headroom?
  • Load testing: can we simulate this for future preparedness?

Production Debugging Scenarios

Scenario 1: Intermittent 502 Errors

Symptom: Users report occasional “Bad Gateway” errors, but you can’t reproduce them.

Investigation:

# Check load balancer logs for 502s
grep ' 502 ' /var/log/nginx/access.log | tail -100

# Look for patterns - specific endpoints? times?
grep ' 502 ' /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn

Common causes & solutions:

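| Cause | Solution |
|-------|----------|
| Upstream timeout or premature close | Tune proxy_read_timeout; keep the upstream's keep-alive timeout longer than the proxy's |
| Connection pool exhaustion | Increase pool max connections |
| OOM kills | Add memory, optimize queries |
| Graceful restart timing | Configure proper drain time |

Note: nginx reports the expiry of its own proxy_read_timeout as 504 Gateway Timeout; a 502 means the upstream refused, reset, or closed the connection before sending a complete response.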
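Graceful restart timing is a common source of intermittent 502s: if a process exits while the load balancer still routes to it, in-flight and newly routed requests fail. A minimal drain sequence using Node's built-in `http` module might look like this; the `/health/ready` path, the 10 s drain delay, and the 30 s hard timeout are assumed values (the drain delay should exceed your load balancer's health-check interval).

```typescript
import { createServer } from 'node:http';

let ready = true; // flipped to false while draining so the load balancer stops routing here

const server = createServer((req, res) => {
  if (req.url === '/health/ready') {
    res.writeHead(ready ? 200 : 503).end(ready ? 'ok' : 'draining');
    return;
  }
  res.writeHead(200, { 'Content-Type': 'text/plain' }).end('hello');
});

function shutdown(drainDelayMs: number, hardTimeoutMs: number): Promise<void> {
  ready = false; // 1. fail readiness first
  return new Promise((resolve) => {
    // 2. wait for the load balancer to notice and stop sending new requests
    setTimeout(() => {
      // 3. stop accepting connections; the callback fires once in-flight requests finish
      server.close(() => resolve());
      server.closeIdleConnections?.(); // Node 18.2+: drop idle keep-alive sockets
      // 4. last resort: force-close anything still open after the hard timeout
      setTimeout(() => server.closeAllConnections?.(), hardTimeoutMs).unref();
    }, drainDelayMs);
  });
}

process.on('SIGTERM', () => {
  shutdown(10_000, 30_000).then(() => process.exit(0));
});

// In the real entry point: server.listen(3000);
```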

Scenario 2: Memory Leak

Symptom: Server memory grows over days, eventually crashes.

Investigation:

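// Add memory tracking endpoint (dev only: never expose this in production)
import express from 'express';

const app = express();
const toMB = (bytes: number) => Math.round(bytes / 1024 / 1024) + 'MB';

if (process.env.NODE_ENV !== 'production') {
  app.get('/debug/memory', (req, res) => {
    const used = process.memoryUsage();
    res.json({
      heapUsed: toMB(used.heapUsed),   // memory used by live JS objects
      heapTotal: toMB(used.heapTotal), // heap currently reserved by V8
      rss: toMB(used.rss),             // total resident memory of the process
      external: toMB(used.external)    // C++ objects bound to JS, e.g. Buffers
    });
  });
}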

Common causes:

| Cause | Solution |
|-------|----------|
| Event listener leaks | Remove listeners on cleanup |
| Closure leaks | Avoid holding large objects in closures |
| Cache without eviction | Use LRU cache with size limits |
| Unreleased connections | Ensure proper connection cleanup |
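For the cache-without-eviction case, a size-bounded LRU can be sketched on top of `Map`, which iterates keys in insertion order. This is illustrative; in production a maintained library such as the `lru-cache` npm package also handles TTLs and size accounting.

```typescript
class LRUCache<K, V> {
  private entries = new Map<K, V>();

  constructor(private readonly maxEntries: number) {}

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the key as most recently used
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used one
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Memory now has a ceiling of `maxEntries` values instead of growing with every distinct key ever seen.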

Scenario 3: Slow Endpoint

Symptom: GET /users/search takes 5+ seconds.

Investigation: add timing to identify the bottleneck:

async function searchUsers(query: string) {
  const timings: Record<string, number> = {};
  const start = Date.now();

  const parsed = parseQuery(query);
  timings.parse = Date.now() - start;

  const dbStart = Date.now();
  const results = await db.user.findMany({ where: buildWhere(parsed) });
  timings.database = Date.now() - dbStart;

  timings.total = Date.now() - start;
  console.log('Search timings:', timings);
  return results;
}
// Example output: Search timings: { parse: 1, database: 4500, total: 4501 }

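The database is the bottleneck → check the query plan:

EXPLAIN ANALYZE SELECT * FROM users WHERE name ILIKE '%john%';
-- A leading-wildcard ILIKE can't use a B-tree index, so this is a sequential scan.
-- Solution: a trigram GIN index (pg_trgm) speeds up LIKE/ILIKE '%...%' without changing the query
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX users_name_trgm_idx ON users USING gin (name gin_trgm_ops);
-- Alternative: a to_tsvector GIN index only helps if the query is rewritten to full-text search:
-- WHERE to_tsvector('english', name) @@ plainto_tsquery('english', 'john')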

Multi-Tenancy Patterns

Isolation Strategies

| Strategy | Isolation | Cost | Best For |
|----------|-----------|------|----------|
| Row-Level | Lowest | Lowest | SaaS, startups |
| Schema-per-Tenant | Medium | Medium | Regulated industries |
| Database-per-Tenant | Highest | Highest | Enterprise, compliance |

Row-Level Security Implementation

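// Prisma middleware for automatic tenant filtering
// (prisma.$use is deprecated in recent Prisma versions in favor of Client extensions)
const FILTERED_ACTIONS = [
  'findMany', 'findFirst', 'findFirstOrThrow', 'count', 'aggregate', 'groupBy',
  'updateMany', 'deleteMany'
];

prisma.$use(async (params, next) => {
  const tenantId = getCurrentTenant();

  if (tenantId && params.model && TENANT_MODELS.includes(params.model)) {
    params.args = params.args ?? {}; // e.g. findMany() called with no arguments

    // Auto-add tenant filter to reads and bulk writes
    if (FILTERED_ACTIONS.includes(params.action)) {
      params.args.where = { ...params.args.where, tenantId };
    }

    // Stamp new rows with the current tenant, overriding any client-supplied value
    if (params.action === 'create') {
      params.args.data = { ...params.args.data, tenantId };
    }

    // Prevent cross-tenant single-record writes. Adding a non-unique field to a unique
    // `where` requires Prisma 5+; findUnique, upsert and createMany need similar handling.
    if (params.action === 'update' || params.action === 'delete') {
      params.args.where = { ...params.args.where, tenantId };
    }
  }

  return next(params);
});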
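The middleware relies on a `getCurrentTenant()` helper that the snippet doesn't show. One common way to implement it in Node is `AsyncLocalStorage`, which carries per-request state across `await` boundaries; the sketch below assumes that approach, and `runWithTenant` is a hypothetical name.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Holds the tenant for the current request without threading it through every call
const tenantContext = new AsyncLocalStorage<{ tenantId: string }>();

// Wrap each request handler, e.g. in Express middleware:
//   app.use((req, res, next) => runWithTenant(resolveTenant(req), next));
// where resolveTenant (hypothetical) reads the tenant from the auth token or subdomain
function runWithTenant<T>(tenantId: string, fn: () => T): T {
  return tenantContext.run({ tenantId }, fn);
}

function getCurrentTenant(): string | undefined {
  return tenantContext.getStore()?.tenantId;
}
```

Outside `runWithTenant`, `getCurrentTenant()` returns `undefined`, so the middleware above skips filtering; failing closed (throwing when a tenant-scoped model is queried without a tenant) is the safer choice in production.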

Noisy Neighbor Protection

class NoisyNeighborProtection {
  private thresholds = {
    requestsPerMinute: 1000,
    cpuMsPerMinute: 60000,
    memoryBytesMax: 512 * 1024 * 1024
  };

  async checkAndTrack(tenantId: string, metrics: TenantUsage): Promise<boolean> {
    if (metrics.requestCount > this.thresholds.requestsPerMinute) {
      await this.alert('rate_limit', tenantId);
      return false;
    }
    // ... check other thresholds
    return true;
  }
}
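Request-rate thresholds don't catch a tenant whose few requests are each very slow. A complementary technique is a per-tenant concurrency cap (a bulkhead), so one tenant can occupy only a bounded share of workers. A minimal in-process sketch; `TenantBulkhead` and its limit are hypothetical.

```typescript
class TenantBulkhead {
  private inFlight = new Map<string, number>();

  constructor(private readonly maxInFlight: number) {}

  tryAcquire(tenantId: string): boolean {
    const current = this.inFlight.get(tenantId) ?? 0;
    if (current >= this.maxInFlight) return false; // caller responds 429 or 503
    this.inFlight.set(tenantId, current + 1);
    return true;
  }

  release(tenantId: string): void {
    const current = this.inFlight.get(tenantId) ?? 0;
    if (current <= 1) this.inFlight.delete(tenantId);
    else this.inFlight.set(tenantId, current - 1);
  }

  // Runs a task inside the tenant's slot and always frees the slot afterwards
  async run<T>(tenantId: string, task: () => Promise<T>): Promise<T> {
    if (!this.tryAcquire(tenantId)) throw new Error(`Tenant ${tenantId} is over its concurrency limit`);
    try {
      return await task();
    } finally {
      this.release(tenantId);
    }
  }
}
```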

Async & Event-Driven Design

When to Go Async

| Operation Duration | Pattern | Response |
|--------------------|---------|----------|
| < 1 second | Sync | 200 OK + result |
| 1-30 seconds | Long polling / SSE | Connection held open |
| > 30 seconds | Async + webhooks | 202 Accepted + callback |
| Minutes/Hours | Background job | 202 Accepted + polling URL |
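The "202 Accepted + polling URL" pattern can be sketched as two handlers sharing a job registry. This is an in-memory illustration: the `/jobs/:id` path, the job states, and the handler names are assumptions, and a real service would persist jobs and run them on a worker queue.

```typescript
import { randomUUID } from 'node:crypto';

type JobState = 'queued' | 'running' | 'succeeded' | 'failed';
interface Job { id: string; state: JobState; result?: unknown; error?: string }

const jobs = new Map<string, Job>();

// POST handler: register the work and respond immediately with where to poll
function submitJob(work: () => Promise<unknown>) {
  const job: Job = { id: randomUUID(), state: 'queued' };
  jobs.set(job.id, job);

  // Start after the response is produced; errors are recorded on the job, not thrown
  setImmediate(async () => {
    job.state = 'running';
    try {
      job.result = await work();
      job.state = 'succeeded';
    } catch (err) {
      job.state = 'failed';
      job.error = err instanceof Error ? err.message : String(err);
    }
  });

  return { status: 202, headers: { Location: `/jobs/${job.id}` }, body: { id: job.id, state: job.state } };
}

// GET /jobs/:id handler: clients poll until the job reaches a terminal state
function getJob(id: string) {
  const job = jobs.get(id);
  return job ? { status: 200, body: job } : { status: 404, body: { error: 'Job not found' } };
}
```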

Webhook Implementation

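import { createHmac } from 'node:crypto';

// Assumed shapes for the event and subscription records
interface WebhookEvent { id: string; type: string; data: unknown }
interface WebhookSubscription { id: string; url: string; secret: string }

export class WebhookService {
  // Delay before each attempt in seconds; the first attempt is immediate
  private retryDelays = [0, 60, 300, 900, 3600];

  async deliver(event: WebhookEvent, subscription: WebhookSubscription): Promise<void> {
    const payload = JSON.stringify(event);
    const signature = this.sign(payload, subscription.secret);

    for (let attempt = 0; attempt < this.retryDelays.length; attempt++) {
      // Sleeping in-process is for illustration; production systems schedule retries on a durable queue
      if (attempt > 0) await this.sleep(this.retryDelays[attempt] * 1000);

      try {
        const response = await fetch(subscription.url, {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'X-Webhook-ID': event.id,          // lets receivers deduplicate redeliveries
            'X-Webhook-Signature': signature
          },
          body: payload,
          signal: AbortSignal.timeout(30000)   // 30 s per attempt
        });

        if (response.ok) return;
        // Don't retry client errors, except 408 (timeout) and 429 (rate limited)
        if (response.status < 500 && response.status !== 408 && response.status !== 429) return;
      } catch {
        // Network error or timeout: fall through to the next attempt
      }
    }

    await this.disableWebhook(subscription.id); // Max retries exceeded
  }

  private sign(payload: string, secret: string): string {
    return `sha256=${createHmac('sha256', secret).update(payload).digest('hex')}`;
  }

  private sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }

  private async disableWebhook(subscriptionId: string): Promise<void> {
    // Persist the disabled state and notify the subscription owner (storage not shown)
  }
}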
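On the receiving side, the consumer recomputes the HMAC over the raw request body and compares it with the `X-Webhook-Signature` header. A sketch matching the `sha256=<hex>` format produced by `sign()` above; `verifyWebhookSignature` is a hypothetical helper name.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Verify against the *raw* body bytes as received, before any JSON parsing/re-serialization
function verifyWebhookSignature(rawBody: string, header: string | undefined, secret: string): boolean {
  const prefix = 'sha256=';
  if (!header || !header.startsWith(prefix)) return false;
  const expected = Buffer.from(createHmac('sha256', secret).update(rawBody).digest('hex'), 'utf8');
  const received = Buffer.from(header.slice(prefix.length), 'utf8');
  // Constant-time comparison; timingSafeEqual throws if lengths differ, so check first
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Because the sender retries, receivers should also record processed `X-Webhook-ID` values and acknowledge duplicates without reprocessing them.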

Large Payloads & Streaming

Streaming Response Pattern

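// Hypothetical helper: keyset-paginates users in batches so memory stays bounded
// (assumes a string `id` primary key)
async function* iterateUsers(batchSize = 1000) {
  let lastId: string | undefined;
  while (true) {
    const batch = await db.user.findMany({
      take: batchSize,
      orderBy: { id: 'asc' },
      ...(lastId ? { cursor: { id: lastId }, skip: 1 } : {}) // skip the cursor row itself
    });
    yield* batch;
    if (batch.length < batchSize) return;
    lastId = batch[batch.length - 1].id;
  }
}

async function streamLargeDataset(req: Request): Promise<Response> {
  const encoder = new TextEncoder();
  const users = iterateUsers();
  let first = true;

  const stream = new ReadableStream<Uint8Array>({
    start(controller) {
      controller.enqueue(encoder.encode('{"data":['));
    },
    // pull() runs only when the consumer wants more data, which gives backpressure
    async pull(controller) {
      const { value: user, done } = await users.next();
      if (done) {
        controller.enqueue(encoder.encode(']}'));
        controller.close();
        return;
      }
      controller.enqueue(encoder.encode((first ? '' : ',') + JSON.stringify(user)));
      first = false;
    },
    // Client disconnected: stop reading from the database
    async cancel() {
      await users.return(undefined);
    }
  });

  // The runtime applies chunked transfer encoding itself; don't set Transfer-Encoding manually
  return new Response(stream, { headers: { 'Content-Type': 'application/json' } });
}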

Chunked Upload

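interface UploadSession {
  id: string;
  fileName: string;
  totalSize: number;
  uploadedSize: number;
  chunks: Map<number, number>; // chunk number -> byte length received
  expiresAt: Date;
}

class ChunkedUploadService {
  private chunkSize = 5 * 1024 * 1024; // 5MB
  private sessions = new Map<string, UploadSession>();

  async initializeUpload(fileName: string, totalSize: number): Promise<UploadSession> {
    const session: UploadSession = {
      id: crypto.randomUUID(),
      fileName,
      totalSize,
      uploadedSize: 0,
      chunks: new Map(),
      expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000) // resumable for 24 hours
    };
    this.sessions.set(session.id, session);
    return session;
  }

  async uploadChunk(sessionId: string, chunkNumber: number, data: ArrayBuffer) {
    const session = this.sessions.get(sessionId);
    if (!session || session.expiresAt < new Date()) throw new Error('Upload session not found or expired');
    if (data.byteLength > this.chunkSize) throw new Error('Chunk exceeds maximum size');

    await this.storeChunk(sessionId, chunkNumber, data);

    // A retried chunk replaces the earlier copy instead of being counted twice
    const previous = session.chunks.get(chunkNumber) ?? 0;
    session.chunks.set(chunkNumber, data.byteLength);
    session.uploadedSize += data.byteLength - previous;

    if (session.uploadedSize >= session.totalSize) {
      await this.finalizeUpload(session);
    }
  }
}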

Cost-Aware API Design

At scale, every API call has a dollar value. Senior engineers design with cloud bills in mind.

Cost Per Request

| Component | Cost per Request (USD) |
|-----------|------------------------|
| Load Balancer | $0.000001 |
| App Server | $0.00001 |
| Database Query | $0.0001 |
| Cache Hit | $0.000001 |
| External API | $0.001 |

Key insight: Cache hits save money. External APIs are expensive.
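Plugging the table's per-request figures into a quick calculation shows why the cache hit ratio dominates. This sketch assumes a cache hit costs load balancer + app server + cache hit, and a miss costs load balancer + app server + database query (ignoring the cost of the cache lookup on a miss).

```typescript
// Per-request costs (USD) from the table above
const COST = {
  loadBalancer: 0.000001,
  appServer: 0.00001,
  dbQuery: 0.0001,
  cacheHit: 0.000001
};

// Assumed request paths: a hit skips the database, a miss pays for the query
const hitCost = COST.loadBalancer + COST.appServer + COST.cacheHit;  // ~0.000012
const missCost = COST.loadBalancer + COST.appServer + COST.dbQuery;  // ~0.000111

function costPerMillion(hitRatio: number): number {
  const blended = hitRatio * hitCost + (1 - hitRatio) * missCost;
  return blended * 1_000_000;
}

console.log(costPerMillion(0).toFixed(2));   // 111.00
console.log(costPerMillion(0.9).toFixed(2)); // 21.90
```

Under these figures, a 90% hit ratio cuts the cost of a million requests by roughly 80%.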

Endpoint Cost Tiers

const ENDPOINT_CONFIGS = [
  // Cheap - simple reads
  { path: '/users/:id', method: 'GET', costTier: 'cheap', rateLimit: 1000 },

  // Moderate - writes, filtered queries
  { path: '/users', method: 'POST', costTier: 'moderate', rateLimit: 100 },

  // Expensive - aggregations, exports, AI
  { path: '/reports/generate', method: 'POST', costTier: 'expensive', rateLimit: 10 },
  { path: '/ai/analyze', method: 'POST', costTier: 'expensive', rateLimit: 5 }
];
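The config table only pays off if each incoming request can be matched to its entry. A minimal lookup sketch; `findEndpointConfig`, the `:param` matching rule, and the per-minute unit for `rateLimit` are assumptions not specified above, and `CONFIGS` repeats a subset of the entries so the block is self-contained.

```typescript
interface EndpointConfig {
  path: string;
  method: string;
  costTier: 'cheap' | 'moderate' | 'expensive';
  rateLimit: number; // unit assumed: requests per minute per API key
}

// Subset of ENDPOINT_CONFIGS above
const CONFIGS: EndpointConfig[] = [
  { path: '/users/:id', method: 'GET', costTier: 'cheap', rateLimit: 1000 },
  { path: '/users', method: 'POST', costTier: 'moderate', rateLimit: 100 },
  { path: '/ai/analyze', method: 'POST', costTier: 'expensive', rateLimit: 5 }
];

// '/users/:id' becomes /^\/users\/[^/]+$/: params match exactly one path segment
function toRegex(pattern: string): RegExp {
  const body = pattern
    .split('/')
    .map((seg) => (seg.startsWith(':') ? '[^/]+' : seg.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')))
    .join('/');
  return new RegExp(`^${body}$`);
}

// Compile patterns once at startup rather than per request
const compiled = CONFIGS.map((c) => ({ ...c, regex: toRegex(c.path) }));

function findEndpointConfig(method: string, path: string): EndpointConfig | undefined {
  return compiled.find((c) => c.method === method && c.regex.test(path));
}
```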

Prevent N+1 Queries

// BAD: N+1 pattern - one query per order, so query count grows with orderIds.length
for (const id of orderIds) {
  const items = await db.orderItem.findMany({ where: { orderId: id } });
}

// GOOD: Batch load - a fixed number of queries, regardless of orderIds.length
const orders = await db.order.findMany({
  where: { id: { in: orderIds } },
  include: { items: true }
});
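When per-item calls are spread across code that is hard to restructure (GraphQL resolvers are the classic case), a batching loader can coalesce them. This is a simplified sketch of the pattern popularized by the `dataloader` npm package, not that library's API; `BatchLoader` and its batch function signature are assumptions.

```typescript
// Collects load(key) calls made in the same tick and resolves them with one batch call
class BatchLoader<K, V> {
  private queue: { key: K; resolve: (v: V | undefined) => void; reject: (e: unknown) => void }[] = [];

  constructor(private readonly batchFn: (keys: K[]) => Promise<Map<K, V>>) {}

  load(key: K): Promise<V | undefined> {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Schedule a single flush once the current synchronous code has queued its loads
      if (this.queue.length === 1) queueMicrotask(() => this.flush());
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      // Deduplicate keys so each id is fetched once
      const results = await this.batchFn([...new Set(batch.map((b) => b.key))]);
      for (const b of batch) b.resolve(results.get(b.key));
    } catch (err) {
      for (const b of batch) b.reject(err);
    }
  }
}
```

In practice `batchFn` would run a single `findMany({ where: { id: { in: keys } } })`-style query, and a new loader is created per request so cached results never leak between users.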

API Governance

Lifecycle Stages

| Stage | What Happens | Duration |
|-------|--------------|----------|
| Draft | RFC process, internal review | As needed |
| Alpha | Internal testing only | Weeks |
| Beta | Limited external, breaking changes allowed | Months |
| Stable | GA, semver strictly enforced | Years |
| Deprecated | 12-month minimum notice | 12+ months |
| Retired | Returns 410 Gone | Permanent |
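The Deprecated and Retired stages can be signalled in-band over HTTP. A sketch, assuming a per-version lifecycle record: the `Sunset` header is defined in RFC 8594 (an HTTP-date), and the `Deprecation` header and `deprecation` link relation in RFC 9745 (a Structured Field Date, `@` followed by Unix seconds). `applyLifecycle` is a hypothetical helper.

```typescript
interface VersionLifecycle {
  stage: 'draft' | 'alpha' | 'beta' | 'stable' | 'deprecated' | 'retired';
  deprecatedAt?: Date;
  sunsetAt?: Date;
  migrationGuide?: string; // URL
}

// gone = true means respond 410 Gone; otherwise attach the headers to the normal response
function applyLifecycle(v: VersionLifecycle): { gone: boolean; headers: Record<string, string> } {
  if (v.stage === 'retired') return { gone: true, headers: {} };

  const headers: Record<string, string> = {};
  if (v.stage === 'deprecated') {
    if (v.deprecatedAt) headers['Deprecation'] = `@${Math.floor(v.deprecatedAt.getTime() / 1000)}`;
    if (v.sunsetAt) headers['Sunset'] = v.sunsetAt.toUTCString(); // e.g. "Thu, 01 Jan 2026 00:00:00 GMT"
    if (v.migrationGuide) headers['Link'] = `<${v.migrationGuide}>; rel="deprecation"`;
  }
  return { gone: false, headers };
}
```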

API Registry

interface APIDefinition {
  id: string;
  name: string;
  version: string;
  status: 'draft' | 'alpha' | 'beta' | 'stable' | 'deprecated' | 'retired'; // one per lifecycle stage
  owner: { team: string; email: string; slackChannel: string };
  sla: { availability: number; latencyP99Ms: number; errorRateMax: number };
  consumers: string[];
  dependencies: string[];
  deprecation?: {
    announcedAt: Date;
    sunsetAt: Date;         // at least 12 months after announcedAt
    migrationGuide: string;
  };
}
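A registry becomes most useful when policy is checked against it automatically, for example in CI. A sketch that enforces the 12-month minimum notice from the lifecycle table; `checkDeprecationPolicy` and the trimmed `RegistryEntry` type are hypothetical.

```typescript
// Subset of the APIDefinition fields that the deprecation policy needs
interface RegistryEntry {
  id: string;
  status: string;
  deprecation?: { announcedAt: Date; sunsetAt: Date; migrationGuide: string };
}

const MIN_NOTICE_MONTHS = 12; // from the lifecycle table

function addMonths(date: Date, months: number): Date {
  const d = new Date(date.getTime());
  d.setUTCMonth(d.getUTCMonth() + months);
  return d;
}

// Returns policy violations (an empty array means compliant)
function checkDeprecationPolicy(api: RegistryEntry): string[] {
  if (api.status !== 'deprecated') return [];
  const dep = api.deprecation;
  if (!dep) return [`${api.id}: deprecated APIs must include deprecation details`];

  const problems: string[] = [];
  if (dep.sunsetAt.getTime() < addMonths(dep.announcedAt, MIN_NOTICE_MONTHS).getTime()) {
    problems.push(`${api.id}: sunset is less than ${MIN_NOTICE_MONTHS} months after announcement`);
  }
  if (!dep.migrationGuide) problems.push(`${api.id}: missing migration guide`);
  return problems;
}
```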

Breaking Change Approval

| Change Type | Required Approvals |
|-------------|--------------------|
| Additive | API owner only |
| Breaking | API owner + architect + affected consumers |
| Deprecation | API owner + architect + security |

The Complete API Engineer Checklist

| Category | Must Know | Senior Level |
|----------|-----------|--------------|
| HTTP | Methods, status codes, headers | Content negotiation, conditional requests |
| Security | JWT, API keys, HTTPS | OAuth flows, mTLS, OWASP Top 10 |
| Rate Limiting | Basic implementation | Multi-tier, distributed, adaptive |
| Pagination | Offset vs cursor | Connection pattern, real-time feeds |
| Caching | Cache-Control headers | Multi-layer architecture, invalidation |
| Versioning | URL-based | Migration strategies, deprecation |
| Idempotency | Concept and why it matters | Production implementation |
| Resilience | Timeouts, retries | Circuit breakers, bulkheads |
| Observability | Logging | Metrics, tracing, SLIs/SLOs |
| API Styles | REST fundamentals | GraphQL, gRPC, when to use each |
| Contracts | OpenAPI basics | CDC testing, schema validation |
| Async | Webhook basics | Event schemas, backpressure |
| Multi-tenancy | Tenant filtering | Isolation models, noisy neighbor |
| Streaming | Basic pagination | Chunked uploads, gRPC streaming |
| Cost | Basic awareness | Fan-out prevention, endpoint costing |
| Governance | Documentation | Registry, change management |

Series Complete

Congratulations on completing the API Design Mastery series! You’ve covered everything from HTTP fundamentals to production debugging and enterprise governance.

The difference between junior and senior engineers isn’t just knowing these concepts - it’s having implemented them, debugged them at 3 AM, and understood the tradeoffs.

The best API is the one your consumers love to use and your operations team can sleep through.


API Design Principles (The Golden Rules)

| Principle | Description | Example |
|-----------|-------------|---------|
| Consistency | Same patterns everywhere | All resources use GET, POST, PUT, DELETE |
| Predictability | No surprises | Same error format across all endpoints |
| Simplicity | Easy to understand | GET /users/123 not GET /fetchUserById?id=123 |
| Documentation | Always current | OpenAPI spec auto-generated from code |
| Versioning | Backward compatible | Changes additive, breaking = new version |
| Security first | Defense in depth | Auth, rate limits, input validation |

Production Readiness Checklist

Before launching any API to production:

□ SECURITY
  □ All endpoints require authentication (except health checks)
  □ Input validation on all parameters
  □ Rate limiting configured per tier
  □ CORS configured with specific origins
  □ Security headers set (HSTS, CSP, etc.)
  □ Sensitive data redacted from logs
  □ SQL injection / XSS prevention verified

□ RELIABILITY
  □ Health check endpoints (/health/live, /health/ready)
  □ Timeouts configured for all external calls
  □ Circuit breakers for critical dependencies
  □ Retry logic with exponential backoff
  □ Graceful degradation paths defined
  □ Connection pools properly sized

□ OBSERVABILITY
  □ Structured logging with trace IDs
  □ Metrics for latency, errors, throughput
  □ Distributed tracing configured
  □ Alerting on SLO violations
  □ Dashboards for key metrics

□ DOCUMENTATION
  □ OpenAPI/Swagger spec available
  □ Error codes documented
  □ Rate limits documented
  □ Authentication documented
  □ Changelog maintained
  □ SDK/examples provided

□ OPERATIONS
  □ Deployment pipeline tested
  □ Rollback procedure documented
  □ On-call runbook created
  □ Capacity planning done
  □ Backup and recovery tested

Quick Debugging Commands

# Check API health
curl -s https://api.example.com/health | jq

# Test with authentication
curl -H "Authorization: Bearer $TOKEN" https://api.example.com/users/me

# Check response headers
curl -I https://api.example.com/users

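# Time request breakdown (DNS, connect, TLS, time to first byte, total)
curl -o /dev/null -s -w "dns: %{time_namelookup}s\nconnect: %{time_connect}s\ntls: %{time_appconnect}s\nttfb: %{time_starttransfer}s\ntotal: %{time_total}s\n" https://api.example.com/users

# Check rate limit headers
curl -sI https://api.example.com/users | grep -i ratelimit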

# Test with specific request ID
curl -H "X-Request-ID: $(uuidgen)" https://api.example.com/users

# Check SSL certificate
echo | openssl s_client -connect api.example.com:443 2>/dev/null | openssl x509 -noout -dates

[Figure: The Complete API Design Quick Reference]



Moshiour Rahman

Software Architect & AI Engineer

Enterprise software architect with deep expertise in financial systems, distributed architecture, and AI-powered applications. Building large-scale systems at Fortune 500 companies. Specializing in LLM orchestration, multi-agent systems, and cloud-native solutions. I share battle-tested patterns from real enterprise projects.
