SkillBlueprint: A Self-Enforcing React Native Audit System

TL;DR

Every React Native screen passes through a 5-phase workflow before it can be committed
The audit runs 8 layers, entirely offline:
1. Error handling
2. Security
3. Performance and real-device guard
4. Responsive layout
5. Navigation types
6. TypeScript
7. Learned rules
8. Test coverage intelligence
Only 2 of the 5 phases use AI tokens; the rest are deterministic and local
A commit is blocked if any audit layer hard-fails or a required test scenario is left uncovered without a documented reason; test coverage below 80% raises a warning
Every production bug gets encoded as an enforced rule -- the system gets harder to break over time, not easier

The problem with "we'll catch it in review"

PR review was our only enforcement layer. And it was slow, inconsistent, and human.

The same issues kept appearing across screens written by different engineers: tokens stored in AsyncStorage instead of EncryptedStorage, ScrollView wrapping flat lists with hundreds of items, useEffect triggering API calls on every render, Dimensions.get hardcoded into layout math. Each one caught in review. Each one appearing again two weeks later in a different file.

The root problem wasn't carelessness -- it was that these rules lived in heads and Notion docs, not in the tools engineers actually ran. We needed enforcement that fired before the PR was even opened.

How the workflow runs

SkillBlueprint is a five-phase pipeline. You describe what you want to build, and it produces committed, tested, audited code.

You say: "make login screen"
→ Phase 1: Reuse Check       (offline)
→ Phase 2: Code Generation   (AI)
→ Phase 3: Audit             (offline)
→ Phase 4: Test Generation   (AI)
→ Phase 5: Run Tests         (offline)

Phases 2 and 4 call the AI. Phases 1, 3, and 5 are fully offline -- no tokens, no latency, no cost.
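The ordering above can be sketched as a small runner that executes the phases in sequence and stops at the first failure. The phase names and the offline/AI split come from this post; the PipelinePhase type, the runPipeline function, and the stubbed phase bodies are hypothetical and are not the actual SkillBlueprint implementation.

```typescript
// Hypothetical sketch: phase names come from the post, everything else is assumed.
type PhaseKind = "offline" | "ai";

interface PipelinePhase {
  label: string;
  kind: PhaseKind;
  run: () => boolean; // true means the phase passed
}

interface PipelineResult {
  passed: boolean;
  completed: string[]; // labels of phases that passed
  failedAt: string | null; // label of the first failing phase, if any
}

// Run phases in order and stop at the first failure, so later phases
// (for example test generation) never run against code that failed the audit.
function runPipeline(phases: PipelinePhase[]): PipelineResult {
  const completed: string[] = [];
  for (const phase of phases) {
    if (!phase.run()) {
      return { passed: false, completed, failedAt: phase.label };
    }
    completed.push(phase.label);
  }
  return { passed: true, completed, failedAt: null };
}

// Stubbed phases: the audit fails here, so phases 4 and 5 are skipped.
const demoPhases: PipelinePhase[] = [
  { label: "Reuse Check", kind: "offline", run: () => true },
  { label: "Code Generation", kind: "ai", run: () => true },
  { label: "Audit", kind: "offline", run: () => false },
  { label: "Test Generation", kind: "ai", run: () => true },
  { label: "Run Tests", kind: "offline", run: () => true },
];

const demoResult = runPipeline(demoPhases);
const aiPhaseCount = demoPhases.filter((p) => p.kind === "ai").length;
```

With these stubs the run stops at "Audit", and only two of the five phases are marked as AI-backed.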

Phase 1: Check what already exists

Before generating a single line, the system scans src/components/, src/hooks/, src/store/, and src/screens/ for implementations that match what's being requested.

This kills the most common form of codebase bloat: reimplementing what already exists. If useAuth already handles the token flow, code generation reuses it. If LoadingSpinner exists, it won't be recreated inline as a local component. Duplication is caught before it happens, not after.
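A minimal sketch of the reuse check, assuming it works by matching requested names against file names in the four scanned directories. The findReusable and baseName helpers and the exact-match strategy are assumptions; the real matching may well be fuzzier.

```typescript
// Directories named in the post; the matching logic below is a hypothetical sketch.
const SCANNED_DIRS = ["src/components/", "src/hooks/", "src/store/", "src/screens/"];

// Extract "useAuth" from "src/hooks/useAuth.ts".
function baseName(filePath: string): string {
  const file = filePath.substring(filePath.lastIndexOf("/") + 1);
  const dot = file.indexOf(".");
  return dot === -1 ? file : file.substring(0, dot);
}

// Return existing files inside the scanned directories whose base name
// matches a requested name (case-insensitive exact match).
function findReusable(existingFiles: string[], requested: string[]): string[] {
  const wanted = requested.map((r) => r.toLowerCase());
  return existingFiles.filter((filePath) => {
    const inScope = SCANNED_DIRS.some((dir) => filePath.indexOf(dir) === 0);
    return inScope && wanted.indexOf(baseName(filePath).toLowerCase()) !== -1;
  });
}

const reusable = findReusable(
  ["src/hooks/useAuth.ts", "src/components/LoadingSpinner.tsx", "src/utils/format.ts"],
  ["useAuth", "LoadingSpinner", "LoginForm"]
);
// reusable holds the useAuth and LoadingSpinner paths; LoginForm has no match
// and would be generated from scratch.
```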

Phase 2: Code generation with non-negotiable rules

Code generation runs against a strict ruleset. Not suggestions -- enforced rules. Any generated code that violates them fails the audit in Phase 3.

@/ path aliases only              no ../../../
Props interface at top of file
StyleSheet.create at bottom       no inline styles
FlashList for all lists           no ScrollView, no FlatList
FastImage for network images
EncryptedStorage for tokens       AsyncStorage banned for auth data
useFocusEffect for API calls      useEffect banned for data fetching
Zustand slice: loading + error    both required, not optional
testID on every interactive element

The rules exist because each one maps to a real incident -- a token leaked through AsyncStorage, a list that janked on budget Android, a crash that only reproduced on navigation focus. The audit layer is what gives these rules teeth.
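One way to make rules like these machine-checkable is a table of patterns applied line by line. This is a sketch under assumptions: the rule ids, the regexes, and the checkSource function are illustrative stand-ins covering three of the rules above, not the audit's real rule definitions.

```typescript
// Hypothetical rule table: ids and regexes are illustrative only.
interface StyleRule {
  id: string;
  message: string;
  pattern: RegExp; // a match on a line counts as a violation
}

const GENERATION_RULES: StyleRule[] = [
  { id: "no-relative-imports", message: "Use @/ path aliases", pattern: /from\s+['"]\.\.\// },
  { id: "no-inline-styles", message: "Use StyleSheet.create", pattern: /style=\{\{/ },
  { id: "no-flatlist", message: "Use FlashList", pattern: /<FlatList\b/ },
];

interface RuleViolation {
  ruleId: string;
  line: number; // 1-based line number
  message: string;
}

// Check every line of a source file against every rule.
function checkSource(source: string, rules: StyleRule[]): RuleViolation[] {
  const violations: RuleViolation[] = [];
  source.split("\n").forEach((text, index) => {
    rules.forEach((rule) => {
      if (rule.pattern.test(text)) {
        violations.push({ ruleId: rule.id, line: index + 1, message: rule.message });
      }
    });
  });
  return violations;
}

const sampleSource = [
  "import { useAuth } from '../../../hooks/useAuth';",
  "import { Button } from '@/components/Button';",
  "const row = <View style={{ padding: 8 }} />;",
].join("\n");

const sampleViolations = checkSource(sampleSource, GENERATION_RULES);
```

Here the first line trips the path-alias rule and the third trips the inline-style rule, while the aliased import passes.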

Phase 3: The audit

npm run audit

Eight layers run in sequence. A hard FAIL on any layer blocks the commit.

Layer 1: Error handling

Scans for missing try/catch, empty catch blocks that silently swallow errors, and screens that fetch data without surfacing error or loading state to the UI. A screen that can fail without telling the user it failed doesn't pass.
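As an illustration of the empty-catch check only, here is a regex-based sketch. The countEmptyCatches function is hypothetical, and a pattern match like this is deliberately crude: it misses a catch body that contains only a comment, and it knows nothing about whether the UI surfaces the error.

```typescript
// Matches `catch {}` and `catch (e) {}` where the body is only whitespace.
const EMPTY_CATCH = /catch\s*(\([^)]*\))?\s*\{\s*\}/g;

// Count catch blocks that silently swallow their error.
function countEmptyCatches(source: string): number {
  const matches = source.match(EMPTY_CATCH);
  return matches === null ? 0 : matches.length;
}

const swallowing = "try { await login(); } catch (e) { }";
const surfacing = "try { await login(); } catch (e) { setError(String(e)); }";

const swallowedCount = countEmptyCatches(swallowing); // one empty catch block
const surfacedCount = countEmptyCatches(surfacing); // none
```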

Layer 2: Security

Flags AsyncStorage usage for any token or credential storage, hardcoded URLs, hardcoded API keys, and sensitive data passed through navigation params. Navigation params show up in crash reporters and navigation logs -- they're not a safe place for auth tokens or user IDs.
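The checks described above could be approximated with a handful of patterns. This is a sketch, assuming a regex-based scan: the check ids, the keyword lists, and the securityFindings function are hypothetical, and a production scanner would need to be far less naive about false positives.

```typescript
// Hypothetical security checks; ids and keyword lists are illustrative.
interface SecurityCheck {
  id: string;
  pattern: RegExp;
}

const SECURITY_CHECKS: SecurityCheck[] = [
  // AsyncStorage call whose arguments mention a token or credential.
  { id: "asyncstorage-credential", pattern: /AsyncStorage\.\w+\([^)]*(token|password|credential)/i },
  // String literal containing an absolute http(s) URL.
  { id: "hardcoded-url", pattern: /['"`]https?:\/\/[^'"`\s]+['"`]/ },
  // Sensitive-looking value passed through navigation params.
  { id: "sensitive-nav-param", pattern: /\.navigate\([^)]*\b(token|password|secret)\b/i },
];

// Return the ids of every check that matches somewhere in the source.
function securityFindings(source: string): string[] {
  return SECURITY_CHECKS.filter((check) => check.pattern.test(source)).map((check) => check.id);
}

const riskySource = [
  "await AsyncStorage.setItem('authToken', token);",
  "navigation.navigate('Home', { token });",
  "const api = 'https://api.example.com';",
].join("\n");

const saferSource = [
  "await EncryptedStorage.setItem('authToken', token);",
  "navigation.navigate('Home');",
].join("\n");

const riskyFindings = securityFindings(riskySource); // all three checks fire
const saferFindings = securityFindings(saferSource); // no findings
```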

Layer 3: Performance and real-device guard

Static analysis catches the patterns that degrade performance on real hardware:

ScrollView wrapping lists -- virtualization off, full render on mount
Animated from react-native -- runs on the JS thread unless the native driver is enabled, causes jank; use react-native-reanimated (v4)
Dimensions.get for layout sizing -- breaks on orientation change and foldables
useEffect for API calls -- fires on every dependency change, not on screen focus

Beyond static checks, each screen gets a real-device performance guard. Ten code signals are mapped to three guards:

FPS Guard    → PASS / FAIL
Memory Guard → PASS / FAIL
Render Guard → PASS / FAIL

Any FAIL blocks the commit. The signals are derived from patterns with known performance regressions on low-end Android hardware -- the kind of devices most of our users are on.
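The signal-to-guard mapping can be pictured as a lookup followed by a roll-up. The post does not list the ten real signals, so the three signal names below and their guard assignments are invented for illustration, as are evaluateGuards and anyGuardFailed.

```typescript
type GuardName = "FPS" | "Memory" | "Render";
type Verdict = "PASS" | "FAIL";

interface SignalMapping {
  signal: string;
  guard: GuardName;
}

// Hypothetical mapping: signal names and guard assignments are assumptions.
const SIGNAL_MAP: SignalMapping[] = [
  { signal: "animated-from-react-native", guard: "FPS" },
  { signal: "scrollview-wrapping-list", guard: "Memory" },
  { signal: "dimensions-get-layout", guard: "Render" },
];

// A guard fails as soon as one of its signals is detected in the file.
function evaluateGuards(detected: string[]): Record<GuardName, Verdict> {
  const verdicts: Record<GuardName, Verdict> = { FPS: "PASS", Memory: "PASS", Render: "PASS" };
  SIGNAL_MAP.forEach((mapping) => {
    if (detected.indexOf(mapping.signal) !== -1) {
      verdicts[mapping.guard] = "FAIL";
    }
  });
  return verdicts;
}

// Any single FAIL is enough to block.
function anyGuardFailed(verdicts: Record<GuardName, Verdict>): boolean {
  return verdicts.FPS === "FAIL" || verdicts.Memory === "FAIL" || verdicts.Render === "FAIL";
}

const guardVerdicts = evaluateGuards(["scrollview-wrapping-list"]);
// Only the Memory guard fails for this input, which is still enough to block.
```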

Layer 4: Responsive layout

Catches hardcoded pixel dimensions that break on non-standard screen sizes, missing SafeAreaView on screens with content near the edges, incorrect Platform branching, and direct usage of Image from core React Native instead of FastImage.

Layer 5: Navigation types

Every navigation.navigate() call must be typed against the route params. Every useNavigation() call must carry its route type. Untyped navigation compiles but produces silent runtime errors when params are wrong -- the audit treats it as a hard failure.
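To show what typing a call against its route params buys, here is a self-contained stand-in that does not depend on any navigation library. The route names, the articleSlug param, and the describeNavigation function are hypothetical; only the idea of a param-list type keyed by route name is taken from common typed-navigation practice.

```typescript
// Hypothetical route table: route names and params are illustrative.
type RootStackParamList = {
  Login: undefined;
  Article: { articleSlug: string };
};

// Minimal stand-in for a typed navigate(): the params argument must match
// the entry for the chosen route, so a wrong shape fails at compile time.
function describeNavigation<K extends keyof RootStackParamList>(
  route: K,
  params: RootStackParamList[K]
): string {
  return params === undefined ? route : `${route}:${JSON.stringify(params)}`;
}

const toLogin = describeNavigation("Login", undefined);
const toArticle = describeNavigation("Article", { articleSlug: "release-notes" });

// The following would be rejected by the compiler rather than failing at runtime:
// describeNavigation("Article", { slug: 42 });
```

The commented-out call is the case the audit cares about: without the route type it would compile and only misbehave once the screen tries to read the missing param.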

Layer 6: TypeScript

Runs tsc --noEmit and surfaces every type error before it reaches CI. If TypeScript fails, the audit fails. This is the last line of defence before the code leaves the machine.

Layer 7: Learned rules

This layer reads from knowledge/learned-rules.json -- a file that grows every time a bug makes it through everything else and reaches production.

Bug reaches production
→ Root cause identified
→ Rule added to learned-rules.json
→ Audit enforces it on every future file
→ That bug class never ships again

This is where institutional knowledge actually lives. Not in a Confluence doc nobody opens, not in a Slack thread that scrolls away -- in a file that runs on every commit. A developer joining the team on day one gets every rule learned over the past year, automatically enforced.
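The post does not show the format of learned-rules.json, so the following is only a guess at what an entry and its enforcement might look like. The LearnedRule fields, the two sample rules, and the applyLearnedRules function are all assumptions; the JSON is inlined as a string so the sketch stays self-contained.

```typescript
// Hypothetical shape of one entry in knowledge/learned-rules.json.
interface LearnedRule {
  id: string;
  incident: string; // short description of the production bug
  pattern: string; // regex source; a match counts as a violation
  message: string;
}

// Inline stand-in for the file contents; both rules are invented examples.
const learnedRulesJson = `[
  {
    "id": "LR-001",
    "incident": "Layout broke on orientation change",
    "pattern": "Dimensions[.]get[(]",
    "message": "Avoid Dimensions.get for layout sizing"
  },
  {
    "id": "LR-002",
    "incident": "Token leaked through AsyncStorage",
    "pattern": "AsyncStorage[.]setItem[(].*[Tt]oken",
    "message": "Store tokens in EncryptedStorage"
  }
]`;

const learnedRules = JSON.parse(learnedRulesJson) as LearnedRule[];

// Apply every learned rule to a file and report the ones that match.
function applyLearnedRules(source: string, rules: LearnedRule[]): string[] {
  return rules
    .filter((rule) => new RegExp(rule.pattern).test(source))
    .map((rule) => `${rule.id}: ${rule.message}`);
}

const learnedFindings = applyLearnedRules("const { width } = Dimensions.get('window');", learnedRules);
// learnedFindings contains the LR-001 message only.
```

Keeping rules as data rather than code is what lets a new entry take effect on the next audit run without touching the audit itself.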

Layer 8: Test coverage intelligence

Every screen gets a coverage scenario table before tests are generated:

Scenario	Status
Renders correctly	✅ Covered
Shows loading state	✅ Covered
Handles API error	✅ Covered
Empty state	❌ Missing
User interaction	✅ Covered
Navigation on success	✅ Covered
Storage read/write	N/A -- screen has no storage
Accessibility	❌ Missing

Coverage percentage is calculated from this table. Below 80% is a warning. A required scenario marked missing with no documented reason is a hard FAIL -- the commit is blocked until it's covered or explicitly marked N/A with a reason.
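A worked version of that calculation for the table above, under two assumptions the post does not confirm: N/A rows are excluded from the denominator, and each row carries a required flag (here Accessibility is treated as optional purely for illustration). The CoverageRow type and assessCoverage function are hypothetical.

```typescript
type ScenarioStatus = "covered" | "missing" | "n/a";

interface CoverageRow {
  scenario: string;
  status: ScenarioStatus;
  required: boolean; // assumption: the post does not say which scenarios are required
  reason?: string; // documented reason for an N/A row
}

interface CoverageVerdict {
  percent: number;
  warning: boolean; // coverage below 80%
  hardFail: boolean; // a required scenario is missing, or N/A without a reason
}

function assessCoverage(rows: CoverageRow[]): CoverageVerdict {
  // Assumption: N/A rows do not count towards the denominator.
  const applicable = rows.filter((r) => r.status !== "n/a");
  const covered = applicable.filter((r) => r.status === "covered").length;
  const percent = applicable.length === 0 ? 100 : Math.round((covered / applicable.length) * 100);
  const hardFail = rows.some(
    (r) => r.required && (r.status === "missing" || (r.status === "n/a" && r.reason === undefined))
  );
  return { percent, warning: percent < 80, hardFail };
}

const loginCoverage: CoverageRow[] = [
  { scenario: "Renders correctly", status: "covered", required: true },
  { scenario: "Shows loading state", status: "covered", required: true },
  { scenario: "Handles API error", status: "covered", required: true },
  { scenario: "Empty state", status: "missing", required: true },
  { scenario: "User interaction", status: "covered", required: true },
  { scenario: "Navigation on success", status: "covered", required: true },
  { scenario: "Storage read/write", status: "n/a", required: true, reason: "screen has no storage" },
  { scenario: "Accessibility", status: "missing", required: false }, // assumed optional here
];

const loginVerdict = assessCoverage(loginCoverage);
// 5 covered out of 7 applicable rows rounds to 71%: a warning, plus a hard
// FAIL because the required "Empty state" scenario is missing.
```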

Auto-fix mode

npm run audit -- src/screens/LoginScreen.tsx --fix

For structural issues -- inline styles, missing testIDs, AsyncStorage replacements -- the audit rewrites the file directly and reports what changed. Anything it can't safely auto-fix is flagged with the exact file location and the rule it violated. The fix mode is intentionally conservative: we'd rather surface an issue than silently rewrite logic that has side effects we didn't model.
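The conservative stance can be sketched with the AsyncStorage replacement. This assumes the file has already been identified as handling auth data, and that getItem, setItem, removeItem, and clear behave compatibly on both storage libraries; the FixOutcome type and the fixAsyncStorage function are hypothetical. Anything outside that narrow case is flagged rather than rewritten.

```typescript
interface FixOutcome {
  output: string;
  changes: string[]; // what was rewritten
  flagged: string[]; // what needs a human
}

// Methods assumed to exist on both storage APIs, making a mechanical swap safe.
const SHARED_METHODS = ["getItem", "setItem", "removeItem", "clear"];

const ASYNC_IMPORT =
  /import\s+AsyncStorage\s+from\s+['"]@react-native-async-storage\/async-storage['"];?/;

function fixAsyncStorage(source: string): FixOutcome {
  const calls = source.match(/AsyncStorage\.\w+/g);
  if (calls === null) {
    return { output: source, changes: [], flagged: [] };
  }
  const unsupported = calls
    .map((call) => call.substring("AsyncStorage.".length))
    .filter((method) => SHARED_METHODS.indexOf(method) === -1);
  // Bail out if any call has no direct equivalent or the import is unusual.
  if (unsupported.length > 0 || !ASYNC_IMPORT.test(source)) {
    return { output: source, changes: [], flagged: ["AsyncStorage usage needs a manual migration"] };
  }
  const output = source
    .replace(ASYNC_IMPORT, "import EncryptedStorage from 'react-native-encrypted-storage';")
    .replace(/AsyncStorage\./g, "EncryptedStorage.");
  return {
    output,
    changes: [`Replaced ${calls.length} AsyncStorage call(s) with EncryptedStorage`],
    flagged: [],
  };
}

const fixableSource =
  "import AsyncStorage from '@react-native-async-storage/async-storage';\n" +
  "await AsyncStorage.setItem('authToken', token);";
const fixOutcome = fixAsyncStorage(fixableSource);
```

A file that calls something like AsyncStorage.multiGet is returned untouched with a flag, since a blind rename there would produce code that no longer runs.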

Phases 4 and 5: Test generation and execution

npm run testgen

Generates a full .test.tsx file with 15-20 test cases derived from the scenario table in Layer 8. Tests cover rendering, loading state, error state, user interactions, API flows, and storage behaviour.

npm test

Runs the suite against real component logic. A screen that failed the audit in Phase 3 doesn't get here -- tests for broken code don't run until the code is clean.

What actually changed

Before: every other PR had the same categories of issues. Token storage in the wrong place. Performance antipatterns on list screens. Untyped navigation calls. No test coverage. Each one depended on a reviewer remembering to look.

After: those issues are caught before the file is committed. By the time a PR is opened, the code has passed 8 audit layers, has a generated test suite, and has been type-checked. Review time dropped. Regressions in those categories dropped to zero.

The more important change: every bug that gets through now makes the system stronger. The rules compound. The longer the system runs, the harder it becomes to ship bad code by accident.

What it doesn't solve

The real-device performance guard is based on code signal analysis, not actual profiling. A screen can pass all three guards and still have a performance problem that only surfaces under real network conditions or with a large dataset loaded. We treat the guard as a floor -- necessary, but not sufficient. Real device testing on the low-end hardware we target is still required for anything performance-sensitive.

Learned rules are only as good as how they're written. A rule that's too broad blocks valid patterns. A rule that's too narrow misses the actual root cause. Every new rule addition goes through a review before it's committed to learned-rules.json -- the same care that goes into code goes into the rules that govern it.