API Design Part 8: Production Mastery
Master real interview questions, production debugging, multi-tenancy, streaming, cost-aware design, and API governance. Complete your journey to senior API engineer.
Moshiour Rahman
API Design Mastery Series
This is Part 8 of our comprehensive API Design series.
| Part | Topic | Level |
|---|---|---|
| 1 | HTTP & REST Fundamentals | Beginner |
| 2 | Security & Authentication | Beginner |
| 3 | Rate Limiting & Pagination | Intermediate |
| 4 | Versioning & Idempotency | Intermediate |
| 5 | Caching Strategies | Intermediate |
| 6 | GraphQL & gRPC | Intermediate |
| 7 | Resilience & Observability | Advanced |
| 8 | Production Mastery | Advanced |
Real Interview Questions
Beginner Level
Q: “What is REST?”
Strong Answer: “REST (Representational State Transfer) is an architectural style based on six constraints: client-server separation, statelessness, cacheability, uniform interface, layered system, and optional code-on-demand. In practice, REST APIs use HTTP methods semantically - GET for retrieval, POST for creation, PUT/PATCH for updates, DELETE for removal - and organize data around resources identified by URLs.”
Follow-up: “What makes an API truly RESTful vs just HTTP-based?”
Answer: “Many APIs called ‘REST’ are actually just HTTP APIs. True REST requires: using HTTP methods correctly (not POST for everything), proper status codes, resource-based URLs (not action-based like /getUser), and ideally HATEOAS where responses include links to related actions. Most production APIs stop at Level 2 of Richardson’s maturity model.”
Q: “Explain idempotency”
Strong Answer: “An operation is idempotent if performing it multiple times has the same effect as performing it once. GET, PUT, DELETE, and HEAD are idempotent by design. POST typically isn’t - creating the same order twice creates two orders. This matters critically in distributed systems where retries happen. If a payment POST times out, retrying without idempotency handling could double-charge. Solutions include client-generated idempotency keys that the server uses to deduplicate requests.”
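The deduplication idea in that answer can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: `IdempotencyStore`, `createCharge`, and the key values are hypothetical names, the operation is synchronous for brevity, and a real service would keep keys in a shared store with a TTL and an atomic check-and-set.

```typescript
// Hypothetical sketch: deduplicate retried requests by client-supplied key.
type Charge = { chargeId: string; amountCents: number };

class IdempotencyStore<T> {
  private results = new Map<string, T>();

  // Runs `operation` at most once per key; retries get the stored result.
  execute(key: string, operation: () => T): T {
    const cached = this.results.get(key);
    if (cached !== undefined) return cached;
    const result = operation();
    this.results.set(key, result);
    return result;
  }
}

// Stand-in for the real side effect (assumed for illustration).
let chargesCreated = 0;
function createCharge(amountCents: number): Charge {
  chargesCreated += 1;
  return { chargeId: `ch_${chargesCreated}`, amountCents };
}

const store = new IdempotencyStore<Charge>();
const first = store.execute('key-123', () => createCharge(5000));
// Simulated client retry after a timeout: same key, same stored response.
const retry = store.execute('key-123', () => createCharge(5000));
```

Both calls return the same charge and the side effect runs once, which is the property a payment retry needs.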
Intermediate Level
Q: “How would you design rate limiting for a multi-tier API?”
Strong Answer: “I’d implement multiple layers:
- Global: CDN-level protection against DDoS, very high limits
- IP-based: Protect against unauthenticated abuse, moderate limits
- User/API-key: Based on subscription tier, the main business limit
- Endpoint-specific: Expensive operations (search, exports) get stricter limits
- Resource-specific: Prevent enumeration attacks (failed login attempts per account)
For the algorithm, I’d use sliding window counters in Redis - good balance of accuracy and memory. The key insight is making limits proportional to cost: a search that scans millions of rows should have stricter limits than a simple read.”
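A rough sketch of the sliding window counter, assuming an in-memory `Map` as a stand-in for the Redis counters mentioned above. The class name and parameters are hypothetical; with Redis, the per-window keys would typically carry a TTL instead of living forever.

```typescript
// Hypothetical in-memory stand-in for per-window counters kept in Redis.
class SlidingWindowCounter {
  private counts = new Map<string, number>(); // key: `${clientId}:${windowIndex}`
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, and records it.
  allow(clientId: string, nowMs: number): boolean {
    const windowIndex = Math.floor(nowMs / this.windowMs);
    const elapsedFraction = (nowMs % this.windowMs) / this.windowMs;
    const current = this.counts.get(`${clientId}:${windowIndex}`) ?? 0;
    const previous = this.counts.get(`${clientId}:${windowIndex - 1}`) ?? 0;
    // Weight the previous window by how much of it still overlaps the sliding window.
    const estimated = current + previous * (1 - elapsedFraction);
    if (estimated >= this.limit) return false;
    this.counts.set(`${clientId}:${windowIndex}`, current + 1);
    return true;
  }
}

// 3 requests per minute: the fourth request in the same window is rejected.
const limiter = new SlidingWindowCounter(3, 60_000);
const decisions = [0, 1_000, 2_000, 3_000].map((t) => limiter.allow('user-1', t));
// Halfway through the next window, only half of the old count still applies.
const afterHalfWindow = limiter.allow('user-1', 90_000);
```

Only two counters per client are consulted, which is why this approach uses far less memory than logging every request timestamp.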
Q: “Design a pagination system for a feed with real-time updates”
Strong Answer: “Offset pagination breaks with real-time data - items shift positions. I’d use cursor-based pagination:
- Cursor: Encode the last item’s ID and timestamp (for sorting)
- Direction: Support both ‘after’ (newer) and ‘before’ (older)
- Real-time integration: New items since cursor can be fetched with ‘after’ cursor
- Consistency: The cursor represents a stable point, unaffected by new inserts
The cursor is an opaque base64-encoded JSON containing {id, timestamp, direction}. The client doesn’t parse it - just passes it back. This lets us change internal structure without breaking clients.”
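As an illustration of the opaque cursor, here is a small encode/decode pair. The `FeedCursor` type and function names are hypothetical, and the sketch uses Node's URL-safe `base64url` variant rather than plain base64 so the token can go straight into a query string.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical cursor shape, matching the {id, timestamp, direction} fields described above.
type FeedCursor = { id: string; timestamp: number; direction: 'after' | 'before' };

function encodeCursor(cursor: FeedCursor): string {
  return Buffer.from(JSON.stringify(cursor), 'utf8').toString('base64url');
}

// Clients treat the token as opaque; the server validates it before use.
function decodeCursor(token: string): FeedCursor {
  let parsed: unknown;
  try {
    parsed = JSON.parse(Buffer.from(token, 'base64url').toString('utf8'));
  } catch {
    throw new Error('Invalid cursor');
  }
  if (typeof parsed !== 'object' || parsed === null) throw new Error('Invalid cursor');
  const c = parsed as Partial<FeedCursor>;
  if (
    typeof c.id !== 'string' ||
    typeof c.timestamp !== 'number' ||
    (c.direction !== 'after' && c.direction !== 'before')
  ) {
    throw new Error('Invalid cursor');
  }
  return { id: c.id, timestamp: c.timestamp, direction: c.direction };
}
```

Validating on decode matters because the token comes back from the client and should be treated like any other untrusted input.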
Senior Level
Q: “Design the API for a payment processing system”
Strong Answer:
Idempotency: Every mutating endpoint requires an idempotency key. A payment must take effect exactly once, so store the key with the result and return the cached response on retry.
State machine: Payment status follows defined transitions (pending → processing → succeeded/failed). Invalid transitions return 409 Conflict.
Async operations: Payment processing is asynchronous. POST /payments returns 202 Accepted with a polling URL or webhook registration. Never make clients wait for bank responses.
Security: PCI compliance - never see full card numbers, use tokenization. Request signing with timestamps to prevent replay. IP allowlisting for webhook receivers.
API design:
POST /payments - Initiate payment (idempotency-key required)
GET /payments/:id - Get status
POST /payments/:id/capture - Capture authorized payment
POST /payments/:id/refund - Refund payment
GET /payments/:id/events - Audit log
Webhooks:
payment.created, payment.processing, payment.succeeded, payment.failed, payment.refunded
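The state machine point in the answer above can be sketched as a transition table. The type and function names are hypothetical, and the states are only the four named in the answer; a real system would likely add authorized, captured, and refunded states.

```typescript
// States taken from the answer above: pending → processing → succeeded/failed.
type PaymentStatus = 'pending' | 'processing' | 'succeeded' | 'failed';

const ALLOWED_TRANSITIONS: Record<PaymentStatus, PaymentStatus[]> = {
  pending: ['processing'],
  processing: ['succeeded', 'failed'],
  succeeded: [], // terminal
  failed: []     // terminal
};

type TransitionResult =
  | { httpStatus: 200; status: PaymentStatus }
  | { httpStatus: 409; error: string };

// Invalid transitions map to 409 Conflict, as described in the answer.
function transition(current: PaymentStatus, next: PaymentStatus): TransitionResult {
  if (!ALLOWED_TRANSITIONS[current].includes(next)) {
    return { httpStatus: 409, error: `Cannot move payment from ${current} to ${next}` };
  }
  return { httpStatus: 200, status: next };
}
```

Keeping the allowed transitions in one table makes the rules easy to audit and to test exhaustively.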
Q: “Your API is experiencing 10x normal traffic. Walk me through your response.”
Strong Answer:
Immediate triage (first 5 minutes):
- Is it legitimate traffic? Check geographic distribution, user agents, request patterns.
- What’s the impact? Check error rates, latency percentiles, queue depths.
If legitimate traffic:
- Scale horizontally (auto-scaling should kick in)
- Enable aggressive caching
- Consider graceful degradation (disable expensive features)
- Rate limit per-user to ensure fair access
If attack/abuse:
- Enable IP-based rate limiting at edge (CDN)
- Block obvious bot patterns
- If targeted at specific endpoint, add CAPTCHA or proof-of-work
After stabilization:
- Post-mortem: why didn’t auto-scaling keep up?
- Capacity planning: do we need more headroom?
- Load testing: can we simulate this for future preparedness?
Production Debugging Scenarios
Scenario 1: Intermittent 502 Errors
Symptom: Users report occasional “Bad Gateway” errors, but you can’t reproduce.
Investigation:
# Check load balancer logs for 502s
grep ' 502 ' /var/log/nginx/access.log | tail -100
# Look for patterns - specific endpoints? times?
grep ' 502 ' /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn
Common causes & solutions:
| Cause | Solution |
|---|---|
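| Upstream timeout or reset (nginx usually reports a pure timeout as 504; a closed or refused upstream connection gives 502) | Increase proxy_read_timeout; align upstream keep-alive timeouts with the proxy |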
| Connection pool exhaustion | Increase pool max connections |
| OOM kills | Add memory, optimize queries |
| Graceful restart timing | Configure proper drain time |
Scenario 2: Memory Leak
Symptom: Server memory grows over days, eventually crashes.
Investigation:
// Add memory tracking endpoint (dev only)
app.get('/debug/memory', (req, res) => {
  const used = process.memoryUsage();
  res.json({
    heapUsed: Math.round(used.heapUsed / 1024 / 1024) + 'MB',
    heapTotal: Math.round(used.heapTotal / 1024 / 1024) + 'MB',
    rss: Math.round(used.rss / 1024 / 1024) + 'MB'
  });
});
Common causes:
| Cause | Solution |
|---|---|
| Event listener leaks | Remove listeners on cleanup |
| Closure leaks | Avoid holding large objects in closures |
| Cache without eviction | Use LRU cache with size limits |
| Unreleased connections | Ensure proper connection cleanup |
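For the "cache without eviction" row, a bounded LRU cache is the usual fix. The following is a minimal sketch (the class name is hypothetical) that relies on `Map` preserving insertion order; production code would more likely use an established library.

```typescript
// Minimal LRU cache sketch: a Map keeps insertion order, so the first key is the oldest.
class LruCache<K, V> {
  private entries = new Map<K, V>();
  private maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark this key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry so memory stays bounded.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

The size cap is what prevents the slow multi-day memory growth described in the symptom.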
Scenario 3: Slow Endpoint
Symptom: GET /users/search takes 5+ seconds.
Investigation - Add timing to identify bottleneck:
async function searchUsers(query: string) {
  const timings: Record<string, number> = {};
  const start = Date.now();

  const parsed = parseQuery(query);
  timings.parse = Date.now() - start;

  const dbStart = Date.now();
  const results = await db.user.findMany({ where: buildWhere(parsed) });
  timings.database = Date.now() - dbStart;

  timings.total = Date.now() - start;
  console.log('Search timings:', timings);
  return results;
}
// Output: { "parse": 1, "database": 4500, "total": 4506 }
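The database is the bottleneck, so check the query plan:
EXPLAIN ANALYZE SELECT * FROM users WHERE name ILIKE '%john%';
-- A leading-wildcard ILIKE cannot use a B-tree index, and a to_tsvector index
-- only helps @@ full-text queries. For ILIKE, add a trigram GIN index instead:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX users_name_trgm_idx ON users USING gin (name gin_trgm_ops);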
Multi-Tenancy Patterns
Isolation Strategies
| Strategy | Isolation | Cost | Best For |
|---|---|---|---|
| Row-Level | Lowest | Lowest | SaaS, startups |
| Schema-per-Tenant | Medium | Medium | Regulated industries |
| Database-per-Tenant | Highest | Highest | Enterprise, compliance |
Row-Level Security Implementation
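// Prisma middleware for automatic tenant filtering.
// Note: this is application-level filtering, not database row-level security,
// and $use is deprecated in newer Prisma releases in favor of client extensions.
prisma.$use(async (params, next) => {
  const tenantId = getCurrentTenant();
  if (tenantId && params.model && TENANT_MODELS.includes(params.model)) {
    params.args = params.args ?? {}; // findMany() may be called with no arguments
    // Auto-add tenant filter to reads (findUnique, count, and the *Many
    // actions need the same treatment before relying on this in production)
    if (params.action === 'findMany' || params.action === 'findFirst') {
      params.args.where = { ...params.args.where, tenantId };
    }
    // Stamp the current tenant on new rows
    if (params.action === 'create') {
      params.args.data = { ...params.args.data, tenantId };
    }
    // Prevent cross-tenant writes
    if (params.action === 'update' || params.action === 'delete') {
      params.args.where = { ...params.args.where, tenantId };
    }
  }
  return next(params);
});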
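The middleware above calls `getCurrentTenant()` without showing where the tenant comes from. One possible approach, sketched here as an assumption rather than the article's implementation, is Node's `AsyncLocalStorage`, which carries a per-request value through async calls. `runWithTenant` is a hypothetical helper that an auth middleware would call once per request.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Per-request tenant context; the store shape is an assumption for this sketch.
const tenantContext = new AsyncLocalStorage<{ tenantId: string }>();

// Hypothetical helper: wrap request handling so everything inside sees the tenant.
function runWithTenant<T>(tenantId: string, fn: () => T): T {
  return tenantContext.run({ tenantId }, fn);
}

// Returns undefined when called outside any request scope.
function getCurrentTenant(): string | undefined {
  return tenantContext.getStore()?.tenantId;
}

// Usage: code called inside the scope can read the tenant without it being passed down.
const seenInside = runWithTenant('tenant-acme', () => getCurrentTenant());
const seenOutside = getCurrentTenant();
```

Returning `undefined` outside a request scope lets callers decide whether a missing tenant is an error, which is safer than silently defaulting to some tenant.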
Noisy Neighbor Protection
class NoisyNeighborProtection {
  private thresholds = {
    requestsPerMinute: 1000,
    cpuMsPerMinute: 60000,
    memoryBytesMax: 512 * 1024 * 1024
  };

  async checkAndTrack(tenantId: string, metrics: TenantUsage): Promise<boolean> {
    if (metrics.requestCount > this.thresholds.requestsPerMinute) {
      await this.alert('rate_limit', tenantId);
      return false;
    }
    // ... check other thresholds
    return true;
  }
}
Async & Event-Driven Design
When to Go Async
| Operation Duration | Pattern | Response |
|---|---|---|
| < 1 second | Sync | 200 OK + result |
| 1-30 seconds | Long polling / SSE | Connection held open |
| > 30 seconds | Async + webhooks | 202 Accepted + callback |
| Minutes/Hours | Background job | 202 Accepted + polling URL |
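The "202 Accepted + polling URL" row can be sketched without any framework. Everything here is hypothetical (the `/exports` and `/jobs/:id` routes, the `Job` shape, the in-memory map); the functions return plain objects that stand in for HTTP responses.

```typescript
import { randomUUID } from 'node:crypto';

type Job = { id: string; state: 'queued' | 'running' | 'done'; result?: unknown };
type HttpResult = { status: number; headers: Record<string, string>; body: unknown };

// In-memory job registry; a real system would use a queue plus a durable store.
const jobs = new Map<string, Job>();

// POST /exports (hypothetical): enqueue the work and return immediately.
function startExport(): HttpResult {
  const job: Job = { id: randomUUID(), state: 'queued' };
  jobs.set(job.id, job);
  return {
    status: 202,
    headers: { Location: `/jobs/${job.id}`, 'Retry-After': '5' },
    body: { jobId: job.id, state: job.state }
  };
}

// GET /jobs/:id (hypothetical): clients poll this URL until the job is done.
function getJob(id: string): HttpResult {
  const job = jobs.get(id);
  if (!job) return { status: 404, headers: {}, body: { error: 'Job not found' } };
  return { status: 200, headers: {}, body: job };
}

// Called by a background worker when processing finishes.
function completeJob(id: string, result: unknown): void {
  const job = jobs.get(id);
  if (job) {
    job.state = 'done';
    job.result = result;
  }
}
```

The `Location` header gives the client the polling URL, and `Retry-After` hints at a sensible polling interval.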
Webhook Implementation
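import { createHmac } from 'node:crypto';

export class WebhookService {
  private retryDelays = [0, 60, 300, 900, 3600]; // seconds to wait before each attempt

  async deliver(event: WebhookEvent, subscription: WebhookSubscription): Promise<void> {
    const payload = JSON.stringify(event);
    const signature = this.sign(payload, subscription.secret);
    for (let attempt = 0; attempt < this.retryDelays.length; attempt++) {
      // In production, schedule retries on a job queue rather than sleeping in-process
      if (attempt > 0) await this.sleep(this.retryDelays[attempt] * 1000);
      try {
        const response = await fetch(subscription.url, {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'X-Webhook-ID': event.id,
            'X-Webhook-Signature': signature
          },
          body: payload,
          signal: AbortSignal.timeout(30000)
        });
        if (response.ok) return;
        // Don't retry client errors, except 408 and 429, which are transient
        if (response.status < 500 && response.status !== 408 && response.status !== 429) return;
      } catch {
        // Network error or timeout: fall through to the next attempt
      }
    }
    await this.disableWebhook(subscription.id); // Max retries exceeded
  }

  private sign(payload: string, secret: string): string {
    return `sha256=${createHmac('sha256', secret).update(payload).digest('hex')}`;
  }

  private sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }
  // disableWebhook(id) is assumed to be defined elsewhere (e.g. marks the subscription inactive)
}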
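On the receiving side, the consumer should verify the `X-Webhook-Signature` value before trusting the payload. This sketch assumes the `sha256=<hex>` format produced by `sign()` above; the function name is hypothetical, and it must be given the raw request body, not a re-serialized object.

```typescript
import { Buffer } from 'node:buffer';
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical receiver-side check for a header of the form "sha256=<hex digest>".
function verifyWebhookSignature(rawBody: string, signatureHeader: string, secret: string): boolean {
  const expected = `sha256=${createHmac('sha256', secret).update(rawBody).digest('hex')}`;
  const a = Buffer.from(expected, 'utf8');
  const b = Buffer.from(signatureHeader, 'utf8');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

A constant-time comparison avoids leaking how many leading characters of a forged signature were correct.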
Large Payloads & Streaming
Streaming Response Pattern
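async function streamLargeDataset(req: Request): Promise<Response> {
  const encoder = new TextEncoder();
  const BATCH_SIZE = 1000;
  const stream = new ReadableStream({
    async start(controller) {
      controller.enqueue(encoder.encode('{"data":['));
      let first = true;
      let lastId: string | undefined; // assumes a string primary key named id
      // findMany returns an array, not an async iterator, so page through
      // the table with a cursor and keep only one batch in memory at a time
      while (true) {
        const batch = await db.user.findMany({
          take: BATCH_SIZE,
          orderBy: { id: 'asc' },
          ...(lastId ? { cursor: { id: lastId }, skip: 1 } : {})
        });
        for (const user of batch) {
          if (!first) controller.enqueue(encoder.encode(','));
          first = false;
          controller.enqueue(encoder.encode(JSON.stringify(user)));
        }
        if (batch.length < BATCH_SIZE) break;
        lastId = batch[batch.length - 1].id;
      }
      controller.enqueue(encoder.encode(']}'));
      controller.close();
    }
  });
  // The runtime applies chunked transfer encoding itself; don't set the header manually
  return new Response(stream, {
    headers: { 'Content-Type': 'application/json' }
  });
}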
Chunked Upload
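class ChunkedUploadService {
  private chunkSize = 5 * 1024 * 1024; // 5MB
  private sessions = new Map<string, UploadSession>();

  async initializeUpload(fileName: string, totalSize: number): Promise<UploadSession> {
    const session: UploadSession = {
      id: crypto.randomUUID(),
      fileName,
      totalSize,
      uploadedSize: 0,
      chunks: new Map(), // assumed here to map chunk number to byte length
      expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000)
    };
    this.sessions.set(session.id, session); // Keep the session so uploadChunk can find it
    return session;
  }

  async uploadChunk(sessionId: string, chunkNumber: number, data: ArrayBuffer) {
    const session = this.sessions.get(sessionId);
    if (!session || session.expiresAt < new Date()) {
      throw new Error('Upload session not found or expired');
    }
    if (session.chunks.has(chunkNumber)) return; // Retried chunk: don't double-count
    await this.storeChunk(sessionId, chunkNumber, data);
    session.chunks.set(chunkNumber, data.byteLength);
    session.uploadedSize += data.byteLength;
    if (session.uploadedSize >= session.totalSize) {
      await this.finalizeUpload(session);
    }
  }
}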
Cost-Aware API Design
At scale, every API call has a dollar value. Senior engineers design with cloud bills in mind.
Cost Per Request
| Component | Illustrative cost per request (USD) |
|---|---|
| Load Balancer | $0.000001 |
| App Server | $0.00001 |
| Database Query | $0.0001 |
| Cache Hit | $0.000001 |
| External API | $0.001 |
Key insight: Cache hits save money. External APIs are expensive.
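To make the insight concrete, here is a small calculation using the figures from the table above. The model is a simplifying assumption: every request pays for the load balancer and app server, then either a cache hit or a database query. Function and constant names are hypothetical.

```typescript
// Per-request costs copied from the table above (USD).
const COST_USD = {
  loadBalancer: 0.000001,
  appServer: 0.00001,
  databaseQuery: 0.0001,
  cacheHit: 0.000001
};

// Expected cost of one request given the fraction of reads served from cache.
function costPerRequest(cacheHitRatio: number): number {
  if (cacheHitRatio < 0 || cacheHitRatio > 1) {
    throw new RangeError('cacheHitRatio must be between 0 and 1');
  }
  const dataCost =
    cacheHitRatio * COST_USD.cacheHit + (1 - cacheHitRatio) * COST_USD.databaseQuery;
  return COST_USD.loadBalancer + COST_USD.appServer + dataCost;
}

function costPerMillion(cacheHitRatio: number): number {
  return costPerRequest(cacheHitRatio) * 1_000_000;
}
```

Under these assumptions, a million requests cost roughly $111 with no caching and roughly $21.90 at a 90% hit ratio, so the database term dominates until the cache absorbs most reads.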
Endpoint Cost Tiers
const ENDPOINT_CONFIGS = [
  // Cheap - simple reads
  { path: '/users/:id', method: 'GET', costTier: 'cheap', rateLimit: 1000 },
  // Moderate - writes, filtered queries
  { path: '/users', method: 'POST', costTier: 'moderate', rateLimit: 100 },
  // Expensive - aggregations, exports, AI
  { path: '/reports/generate', method: 'POST', costTier: 'expensive', rateLimit: 10 },
  { path: '/ai/analyze', method: 'POST', costTier: 'expensive', rateLimit: 5 }
];
Prevent N+1 Queries
// BAD: N+1 pattern - one query per order, so cost scales with orderIds.length
for (const id of orderIds) {
  const items = await db.orderItem.findMany({ where: { orderId: id } });
}
// GOOD: Batch load - a fixed number of queries regardless of orderIds.length
const orders = await db.order.findMany({
  where: { id: { in: orderIds } },
  include: { items: true }
});
API Governance
Lifecycle Stages
| Stage | What Happens | Duration |
|---|---|---|
| Draft | RFC process, internal review | As needed |
| Alpha | Internal testing only | Weeks |
| Beta | Limited external, breaking changes allowed | Months |
| Stable | GA, semver strictly enforced | Years |
| Deprecated | 12-month minimum notice | 12+ months |
| Retired | Returns 410 Gone | Permanent |
API Registry
interface APIDefinition {
  id: string;
  name: string;
  version: string;
  // Mirrors the lifecycle stages above, including the final retired stage
  status: 'draft' | 'alpha' | 'beta' | 'stable' | 'deprecated' | 'retired';
  owner: { team: string; email: string; slackChannel: string };
  sla: { availability: number; latencyP99Ms: number; errorRateMax: number };
  consumers: string[];
  dependencies: string[];
  deprecation?: {
    announcedAt: Date;
    sunsetAt: Date;
    migrationGuide: string;
  };
}
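The `deprecation` block in the registry can drive what a deprecated endpoint tells its callers. The sketch below is one way to do that, assuming `migrationGuide` holds a URL: it emits the `Sunset` header and `sunset` link relation from RFC 8594 before the sunset date, and 410 Gone afterwards. The function name and return shape are hypothetical.

```typescript
type Deprecation = { announcedAt: Date; sunsetAt: Date; migrationGuide: string };

// Decide status and headers for a deprecated endpoint at time `now`.
function deprecationResponse(
  deprecation: Deprecation,
  now: Date
): { status: number; headers: Record<string, string> } {
  // Assumes migrationGuide is a URL that can be used as a link target.
  const link = `<${deprecation.migrationGuide}>; rel="sunset"`;
  if (now.getTime() >= deprecation.sunsetAt.getTime()) {
    // Retired: the lifecycle table says this stage returns 410 Gone.
    return { status: 410, headers: { Link: link } };
  }
  return {
    status: 200,
    headers: {
      // toUTCString() produces the HTTP-date format the Sunset header expects.
      Sunset: deprecation.sunsetAt.toUTCString(),
      Link: link
    }
  };
}
```

Advertising the sunset date in every response gives consumers a machine-readable signal well before the endpoint starts returning 410.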
Breaking Change Approval
| Change Type | Required Approvals |
|---|---|
| Additive | API owner only |
| Breaking | API owner + architect + affected consumers |
| Deprecation | API owner + architect + security |
The Complete API Engineer Checklist
| Category | Must Know | Senior Level |
|---|---|---|
| HTTP | Methods, status codes, headers | Content negotiation, conditional requests |
| Security | JWT, API keys, HTTPS | OAuth flows, mTLS, OWASP top 10 |
| Rate Limiting | Basic implementation | Multi-tier, distributed, adaptive |
| Pagination | Offset vs cursor | Connection pattern, real-time feeds |
| Caching | Cache-Control headers | Multi-layer architecture, invalidation |
| Versioning | URL-based | Migration strategies, deprecation |
| Idempotency | Concept and why it matters | Production implementation |
| Resilience | Timeouts, retries | Circuit breakers, bulkheads |
| Observability | Logging | Metrics, tracing, SLIs/SLOs |
| API Styles | REST fundamentals | GraphQL, gRPC, when to use each |
| Contracts | OpenAPI basics | CDC testing, schema validation |
| Async | Webhooks basics | Event schemas, backpressure |
| Multi-tenancy | Tenant filtering | Isolation models, noisy neighbor |
| Streaming | Basic pagination | Chunked uploads, gRPC streaming |
| Cost | Basic awareness | Fan-out prevention, endpoint costing |
| Governance | Documentation | Registry, change management |
Series Complete
Congratulations on completing the API Design Mastery series! You’ve covered everything from HTTP fundamentals to production debugging and enterprise governance.
The difference between junior and senior engineers isn’t just knowing these concepts - it’s having implemented them, debugged them at 3 AM, and understanding the tradeoffs.
The best API is the one your consumers love to use and your operations team can sleep through.
API Design Principles (The Golden Rules)
| Principle | Description | Example |
|---|---|---|
| Consistency | Same patterns everywhere | All resources use GET, POST, PUT, DELETE |
| Predictability | No surprises | Same error format across all endpoints |
| Simplicity | Easy to understand | GET /users/123 not GET /fetchUserById?id=123 |
| Documentation | Always current | OpenAPI spec auto-generated from code |
| Versioning | Backward compatible | Changes additive, breaking = new version |
| Security first | Defense in depth | Auth, rate limits, input validation |
Production Readiness Checklist
Before launching any API to production:
□ SECURITY
  □ All endpoints require authentication (except health checks)
  □ Input validation on all parameters
  □ Rate limiting configured per tier
  □ CORS configured with specific origins
  □ Security headers set (HSTS, CSP, etc.)
  □ Sensitive data redacted from logs
  □ SQL injection / XSS prevention verified
□ RELIABILITY
  □ Health check endpoints (/health/live, /health/ready)
  □ Timeouts configured for all external calls
  □ Circuit breakers for critical dependencies
  □ Retry logic with exponential backoff
  □ Graceful degradation paths defined
  □ Connection pools properly sized
□ OBSERVABILITY
  □ Structured logging with trace IDs
  □ Metrics for latency, errors, throughput
  □ Distributed tracing configured
  □ Alerting on SLO violations
  □ Dashboards for key metrics
□ DOCUMENTATION
  □ OpenAPI/Swagger spec available
  □ Error codes documented
  □ Rate limits documented
  □ Authentication documented
  □ Changelog maintained
  □ SDK/examples provided
□ OPERATIONS
  □ Deployment pipeline tested
  □ Rollback procedure documented
  □ On-call runbook created
  □ Capacity planning done
  □ Backup and recovery tested
Quick Debugging Commands
# Check API health
curl -s https://api.example.com/health | jq
# Test with authentication
curl -H "Authorization: Bearer $TOKEN" https://api.example.com/users/me
# Check response headers
curl -I https://api.example.com/users
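# Time request breakdown (curl-format.txt is a file you create with -w variables such as %{time_connect} and %{time_total})
curl -w "@curl-format.txt" -o /dev/null -s https://api.example.com/users
# Check rate limit headers (-s hides the progress meter when piping)
curl -sI https://api.example.com/users | grep -i ratelimit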
# Test with specific request ID
curl -H "X-Request-ID: $(uuidgen)" https://api.example.com/users
# Check SSL certificate
echo | openssl s_client -connect api.example.com:443 2>/dev/null | openssl x509 -noout -dates