Domain-Driven Lambda Architecture That Scales


TL;DR
- We started with flat Lambda functions that directly called Prisma, had no validation layer, and inconsistent error handling across 15+ endpoints
- At around 15 endpoints, the codebase became unreadable. Bugs were hard to trace because validation, business logic, and response formatting all lived in the same function
- We restructured into a domain-driven architecture: handler (routing + middleware) → controller (orchestration) → service (business logic) → DTO (Zod validation) → serializer (response shaping)
- Consolidated 10+ separate admin Lambdas into one admin-router, cutting cold starts from 2-3 seconds to under 500ms
- Built a domain generator script that scaffolds a complete CRUD domain (6 files, 5 endpoints) in one command
- The codebase now has 20 domains, 36 handlers, and 100+ API routes, all following the same pattern
We run an e-commerce platform on AWS Lambda. The API layer handles everything: admin CRUD for products, categories, stores, and tests; public endpoints for customer accounts, search, and reviews; background sync between Shopify and Strapi; and a content API serving pre-assembled responses from Redis.
At the start, it was three engineers and a handful of Lambda functions. By the time we hit 15 endpoints, the codebase was a mess. This is the story of how we restructured it.
What did the codebase look like before?
The first Lambda functions looked like this:
export const handler = async (event: APIGatewayProxyEvent) => {
const body = JSON.parse(event.body || "{}");
if (!body.name) {
return { statusCode: 400, body: JSON.stringify({ error: "Name required" }) };
}
const category = await prisma.category.create({
data: { name: body.name, status: body.status || "ACTIVE" }
});
return {
statusCode: 201,
body: JSON.stringify({ message: "Created", data: category })
};
};
No layers. No validation library. No error handling beyond manual if-checks. The Lambda function was the entire application.
This worked when we had five endpoints. Then we added products, stores, questions with dependencies, test attempts, and customer management. Some Lambdas called services. Some called Prisma directly. Some validated input, some did not. Error responses had different shapes depending on who wrote the handler.
The problems stacked up fast:
Silent data corruption. A create endpoint accepted any JSON. A typo in a field name would silently insert null values into the database. We caught these days later when a frontend developer reported broken API responses.
Inconsistent error responses. One Lambda returned { error: "Not found" }. Another returned { message: "Category not found", statusCode: 404 }. A third returned { error: { message: "...", code: "NOT_FOUND" } }. The frontend had to handle three different error shapes.
Debugging was guessing. When something broke, we could not tell if it was bad input, a Prisma query error, or a logic bug. Everything ran in one function, so the stack trace pointed at a 200-line Lambda that did validation, database queries, response formatting, and error handling all in sequence.
Onboarding was slow. Every new domain was written slightly differently. A new engineer had to read every existing Lambda to understand the conventions, because there were no conventions.

What did we try first?
We evaluated three approaches before landing on the current one.
Option 1: NestJS on Lambda. NestJS has the exact pattern we wanted: modules, controllers, services, DTOs. But it carries a large dependency tree, the cold start penalty was brutal on Lambda (4-6 seconds with the full framework), and we did not need half its features. We needed the pattern, not the framework.
Option 2: Express or Hono on Lambda. Lighter than NestJS, but still framework-level abstractions we did not need. Our admin Lambda runs behind API Gateway which already handles routing. Adding another router inside the Lambda felt redundant.
Option 3: Build the pattern ourselves. Take the concepts from NestJS and Express MVC, the handler-controller-service layering, Zod for validation, and implement them as plain TypeScript classes with zero framework dependencies. The Lambda cold start stays at 100-200ms. The pattern is enforced by convention and code generation, not by a runtime.
We went with Option 3.
How is the codebase structured now?
Every domain follows the same folder structure:
src/app/categories/
├── dto/
│ └── category.dto.ts # Zod schemas for validation
├── services/
│ └── categories.service.ts # Business logic, Prisma queries
├── controllers/
│ └── admin/
│ └── categories.controller.ts # Orchestration
├── serializers/
│ └── admin/
│ └── categories.serializer.ts # Response shaping
├── handlers/
│ └── admin/
│ └── router.handler.ts # Lambda entry point
└── types/
└── categories.types.ts # TypeScript types
Six files per domain. Each file has one job. The data flow is always the same:
Request → Handler → Controller → Service → Prisma → Database
↓ ↓ ↓
Middleware Serializer DTO Validator
There are no exceptions to this flow. Handlers never call services. Controllers never touch Prisma. Services never format HTTP responses.
What does each layer do?
DTOs: Zod schemas that reject bad data at the door
Every create, update, and list operation has a Zod schema. Validation happens in middleware before the handler function runs.
import { z } from "zod";
import {
createEnumField, createIdParamSchema, nameField,
paginationFields, searchQueryField
} from "../../../shared/index.js";
const STATUS_VALUES = ["ACTIVE", "DRAFT"] as const;
export const createCategorySchema = z.object({
name: nameField,
status: createEnumField(STATUS_VALUES).default("ACTIVE")
}).passthrough();
export const listCategoriesSchema = z.object({
q: searchQueryField.optional(),
status: createEnumField(STATUS_VALUES).optional(),
...paginationFields
}).passthrough();
export const updateCategorySchema = z.object({
name: nameField.optional(),
status: createEnumField(STATUS_VALUES).optional()
}).passthrough();
export const categoryIdParamSchema = createIdParamSchema("Category");
The nameField, searchQueryField, and paginationFields helpers are shared across all domains. paginationFields enforces that limit is between 1 and 100, offset is non-negative, and both default to sensible values. Every domain gets this for free.
The .passthrough() call is deliberate. It lets the schema accept unknown fields without failing, so frontend teams can send extra data during development without breaking things. The service layer only reads the fields it knows about.
Services: business logic that talks to the database
Services own all Prisma queries. They throw typed errors that the error boundary middleware converts to HTTP responses.
export class CategoriesService {
async list(params: ListCategoriesDto) {
const where = this.buildListWhereClause(params);
const [rows, total] = await prisma.$transaction([
prisma.category.findMany({
where, take: params.limit, skip: params.offset,
orderBy: { createdAt: "desc" }
}),
prisma.category.count({ where })
]);
return { rows, total };
}
async get(id: string) {
const category = await prisma.category.findUnique({ where: { id } });
if (!category) throw new NotFoundError("Category not found");
return category;
}
async create(payload: CreateCategoryDto) {
return prisma.category.create({
data: { name: payload.name, status: payload.status }
});
}
async update(id: string, payload: UpdateCategoryDto) {
await prisma.category.findUniqueOrThrow({
where: { id }, select: { id: true }
});
return prisma.category.update({
where: { id }, data: this.buildUpdateData(payload)
});
}
private buildListWhereClause(params: ListCategoriesDto) {
const where: any = {};
if (params.status) where.status = params.status;
if (params.q?.trim()) {
where.name = { contains: params.q.trim(), mode: "insensitive" };
}
return where;
}
private buildUpdateData(payload: UpdateCategoryDto) {
const data: any = {};
if (payload.name !== undefined) data.name = payload.name;
if (payload.status !== undefined) data.status = payload.status;
return data;
}
}
Two patterns worth noting. First, buildListWhereClause and buildUpdateData are private methods on every service. They keep the main CRUD methods clean and make it obvious how filters and partial updates work. Second, the update method calls findUniqueOrThrow with select: { id: true } before updating. This is a cheap existence check that throws a Prisma P2025 error if the record does not exist, which the error boundary converts to a 404.
Controllers: thin orchestration, no logic
Controllers wire services to serializers. They do not contain business logic.
export class AdminCategoriesController {
private readonly service = new CategoriesService();
async create(payload: CreateCategoryDto) {
const category = await this.service.create(payload);
return serializeCategory(category);
}
async list(query: ListCategoriesDto) {
const result = await this.service.list(query);
return serializeCategoryList(result, {
limit: query.limit, offset: query.offset
});
}
async get(idParam: string | undefined) {
const id = parseIdWithSchema(
idParam, categoryIdParamSchema,
"Category id must be a valid UUID"
);
const category = await this.service.get(id);
return serializeCategory(category);
}
}
Controllers have no explicit return types. TypeScript infers them from the serializer output. This eliminates a category of bugs where the return type annotation says one thing but the actual return value is different.
Serializers: Prisma camelCase to API snake_case
Serializers transform database records into API response shapes. They handle the case conversion and field selection.
import { serializeTimestamps, serializePaginatedList } from "../../../../shared/index.js";
export const serializeCategory = (category: CategoryEntity) => ({
id: category.id,
name: category.name,
status: category.status,
...serializeTimestamps(category)
});
export const serializeCategoryList = (
result: CategoryListResult,
pagination: { limit: number; offset: number }
) => serializePaginatedList(result, pagination, serializeCategory);
serializeTimestamps converts Prisma's createdAt and updatedAt to created_at and updated_at. serializePaginatedList wraps any list in the standard { data, limit, offset, count, total } envelope. Every list endpoint in the API returns the same shape.
One thing worth debating: the serializer layer is optional. If you maintain the same field naming convention everywhere (request, database, response), you can skip serialization entirely and return Prisma data directly. We chose to keep it for two reasons. First, Prisma uses camelCase (createdAt) and our API uses snake_case (created_at), so transformation is unavoidable. Second, serializers act as a security boundary. Without them, adding a new column to a Prisma model (say, internalNotes) would automatically expose it in every API response. Serializers force you to explicitly list which fields reach the client. The extra 10-15 lines per domain are worth it for that safety net.
Handlers: Lambda entry point with middleware composition
Handlers are the only files that know about AWS Lambda. They compose middleware, match routes, and delegate to controllers.
const controller = new AdminCategoriesController();
const createBase = async (event: APIGatewayProxyEvent) => {
const payload = event.body as unknown as CreateCategoryDto;
logger.info("[categories:create]", { payload });
const data = await controller.create(payload);
return createdResponse("Category created successfully", data);
};
// Middleware composition: auth wraps validation wraps handler
const createHandler = adminAuthMiddleware(
validationMiddleware(createCategorySchema)(createBase)
);
const listHandler = adminAuthMiddleware(
queryValidationMiddleware(listCategoriesSchema)(listBase)
);
const getHandler = adminAuthMiddleware(getBase);
function route(event: APIGatewayProxyEvent) {
const path = event.path;
const method = event.httpMethod;
if (path === "/api/admin/categories" && method === "POST")
return createHandler(event);
if (path === "/api/admin/categories" && method === "GET")
return listHandler(event);
if (/\/api\/admin\/categories\/.+$/i.test(path) && method === "GET")
return getHandler(event);
return errorResponse("Route not found", 404);
}
export const handler = errorBoundary(route);
The middleware stack reads bottom-up: errorBoundary catches all errors, adminAuthMiddleware validates the JWT, validationMiddleware parses and validates the request body with Zod, and then the base handler runs.
How do admin and public routes stay separated?
Not all endpoints need the same middleware. Admin endpoints require JWT authentication and return full entity data. Public endpoints are unauthenticated and often return a different shape (less fields, different serialization).
We handle this with folder structure, not configuration. Each domain can have both admin/ and public/ subdirectories under handlers/, controllers/, and serializers/:
src/app/test-attempts/
├── handlers/
│ ├── admin/
│ │ └── router.handler.ts # JWT auth, full data
│ └── public/
│ └── result.handler.ts # No auth, minimal data
├── controllers/
│ ├── admin/
│ │ └── test-attempts.controller.ts
│ └── public/
│ └── test-attempts.controller.ts
├── serializers/
│ ├── admin/
│ │ └── test-attempts.serializer.ts # All fields
│ └── public/
│ └── test-attempts.serializer.ts # Subset of fields
├── services/
│ └── test-attempts.service.ts # Shared — same business logic
└── dto/
└── test-attempt.dto.ts # Shared — same validation
The service and DTO layers are shared. An admin controller and a public controller for the same domain both call the same service. The difference is in the middleware (admin gets adminAuthMiddleware, public gets frontendAuthMiddleware or none) and the serializer (admin exposes all fields, public exposes a subset).
This means adding a public-facing version of an admin-only endpoint does not require duplicating business logic. You add a new handler, controller, and serializer under public/, wire them to the existing service, and deploy a separate Lambda for the public route.
Admin routes live at /api/admin/* and go through the consolidated admin-router Lambda. Public routes live at /api/* (no /admin/ prefix) and each gets its own Lambda with tuned settings. The path prefix makes it impossible to accidentally expose an admin endpoint without authentication.
How do we decide which routes share a Lambda?
We started with one Lambda per domain. At 15 domains, we had 15 separate Lambdas, each with its own cold start container. An admin user opening the dashboard would hit categories, products, and stores in quick succession. Each first request waited 2-3 seconds. The admin panel felt broken.
The fix was not "merge everything into one Lambda." It was grouping routes by their operational characteristics: same auth, same timeout, same memory, and co-accessed together.
We ended up with five groups:
| Group | Lambda(s) | Why grouped this way |
|---|---|---|
| Admin CRUD | 1 admin-router | Same JWT auth, same 30s timeout, same 256MB, all hit from admin panel together |
| Content APIs | 4 routers (products, collections, media, misc) | Same cache middleware, but different traffic per frontend page — products get 10x the hits of videos |
| Background sync | shopify-sync, strapi-reconcile | 900s timeout, 512MB — cannot share with 30s endpoints |
| Webhooks | shopify-webhook, strapi-webhook | Triggered by external systems, need independent scaling and error isolation |
| Public endpoints | Each separate (search, reviews, tests, account) | Different auth patterns, independent traffic, independent scaling |
The admin-router is the biggest consolidation. Ten domain handlers, one Lambda:
// src/app/admin/handlers/admin-router.handler.ts
import { handler as categoriesHandler } from "../../categories/handlers/admin/router.handler.js";
import { handler as productsHandler } from "../../products/handlers/admin/router.handler.js";
import { handler as questionsHandler } from "../../questions/handlers/admin/router.handler.js";
function route(event: APIGatewayProxyEvent) {
const path = event.path;
if (path.includes("/api/admin/categories")) return categoriesHandler(event);
if (path.includes("/api/admin/products")) return productsHandler(event);
if (path.includes("/api/admin/questions")) return questionsHandler(event);
return errorResponse("Route not found", 404);
}
export const handler = errorBoundary(route);
One container, one cold start. After the first request, every admin endpoint responds in under 100ms.
Content endpoints follow a different pattern. We group them by what frontend pages need. The homepage hits products and templates. The blog page hits blogs, authors, and blog categories. So we have content-media-router (blogs, authors, videos, video categories) and content-misc-router (templates, pages, ingredients, results) as separate Lambdas. Each group shares the same cache middleware and timeout, but scales independently based on which pages get traffic.
The rule: same auth + same timeout + same memory + co-accessed = merge. Anything that differs on any of those = keep separate.
How does the middleware stack work?
We built six middleware functions. Each wraps a handler and adds one concern.
Error boundary catches everything. Prisma errors, Zod errors, our custom AppError classes, and unexpected exceptions all get converted to the right HTTP status code with a consistent response shape.
// Prisma unique constraint violation → 409 Conflict
// Prisma record not found → 404 Not Found
// ZodError → 422 Unprocessable Entity
// NotFoundError → 404
// UnauthorizedError → 401
// ValidationError → 422
// Unknown error → 500
Auth middleware extracts the JWT from the Authorization header or cookie, verifies it, and attaches the decoded payload to the event. Every admin endpoint gets this automatically through the middleware composition.
Validation middleware comes in two flavors: validationMiddleware for POST/PUT bodies and queryValidationMiddleware for GET query parameters. Both parse the input with a Zod schema and return a 422 with the exact validation errors if it fails. By the time the base handler runs, the input is guaranteed to match the schema.
The composition pattern means adding a new middleware takes one line. When we added the assembled cache layer for content endpoints, it was:
export const handler = errorBoundary(
withAssembledCache(
withDependencyTracking(route)
)
);
No changes to any existing handler code.
How does the domain generator work?
After building the fourth domain by copy-pasting from categories and doing find-replace, we built a generator:
npx tsx scripts/generators/generate-domain.ts coupons Coupon
This creates the complete domain in one command:
src/app/coupons/
├── dto/coupon.dto.ts
├── services/coupons.service.ts
├── controllers/admin/coupons.controller.ts
├── serializers/admin/coupons.serializer.ts
├── handlers/admin/router.handler.ts
└── types/coupons.types.ts
Every generated file follows the exact same pattern. The DTO uses the shared field helpers. The service has buildListWhereClause and buildUpdateData. The controller uses parseIdWithSchema. The handler composes middleware in the same order.
After generation, the developer customizes three things:
- Update the DTO with actual fields from the Prisma model
- Update the service's
create()andbuildUpdateData()with field mappings - Update the serializer with the fields to expose
Then add the domain to the admin-router, and it is live.
The generator eliminated two problems: copy-paste typos (we had a bug where a products handler was returning the categories controller's response because someone forgot to rename it) and pattern drift (no more "I will just put the validation in the service this time").
What do the numbers look like?
| Metric | Before | After |
|---|---|---|
| Admin endpoint cold start | 2-3s (separate Lambdas) | 400-500ms (consolidated) |
| Time to add a new CRUD domain | 2-3 hours (manual) | 15-20 minutes (generator + customization) |
| Validation coverage | Partial (some endpoints had none) | 100% (Zod on every endpoint) |
| Error response formats | 3+ different shapes | 1 consistent shape |
| Domains | 5 (flat structure) | 20 (domain-driven) |
| Total API routes | ~15 | 100+ |
| Lambda functions deployed | 15+ | 13 (consolidated) |
What did we get wrong?
We should have started with this structure from day one. The migration sprint took two weeks, and most of that time was untangling business logic from response formatting and manual validation. If we had started with the layered pattern when the codebase had five endpoints, the migration cost would have been zero.
The generator script should have been built earlier. We built four domains by hand before creating the generator. By that point, we had already introduced small inconsistencies: one domain named its service method getById while others used get. One serializer returned createdAt in camelCase while others used created_at. The generator would have prevented all of this.
The admin-router import list grows linearly. Every new admin domain adds an import line and a route match to the admin-router. At 10 domains it is manageable. At 30 it will be a long file. We have discussed dynamic route registration but have not needed it yet.
No unit tests on the layers. The architecture is built for testability: services can be tested without HTTP, controllers without Lambda. But we do not have unit tests yet. We rely on integration tests against the live API and TypeScript's type system to catch mistakes. This has worked so far, but it is a gap.
What would we tell someone building this?
Start with the layers even if you have one endpoint. The cost of a handler, controller, service, and DTO for a single CRUD resource is six small files and maybe 30 minutes of setup. The cost of migrating a flat codebase to this pattern later is days of work and a high risk of introducing bugs during the rewrite.
Invest in shared utilities early. Our paginationFields, createIdParamSchema, serializeTimestamps, and serializePaginatedList helpers save 20-30 lines per domain. More importantly, they enforce consistency. Every list endpoint returns the same envelope. Every error response has the same shape. The frontend team writes one API client and it works everywhere.
Consolidate Lambdas aggressively for related endpoints. Separate Lambdas make sense when you need different timeout, memory, or concurrency settings. For CRUD endpoints that all need the same resources, one Lambda with internal routing is faster (shared cold start), simpler (one Terraform block), and cheaper (one set of CloudWatch log groups).
Build the generator before the third domain. Not after the fourth. The moment you catch yourself copying files and doing find-replace, automate it. The generator is not about saving time on day one. It is about preventing the subtle inconsistencies that accumulate across months and make the codebase feel like it was written by ten different people.





