From one AI risk map to a configurable assessment platform
The original challenge was to see whether AI-assisted development could build a useful, auditable risk tool from prompts and direction. The result has evolved into two local browser apps: a configurator for designing risk frameworks and an assessor for running evidence-based assessments against reusable JSON packs.
The Problem
Organisations need fast, repeatable ways to assess cyber, AI, privacy, supplier, continuity and operational risk. Traditional GRC platforms are often expensive and slow to tailor. Spreadsheets are flexible but inconsistent. Runtime AI scoring can be impressive, but it is hard to defend when an auditor, regulator or board asks why a risk landed in a particular band.
The governance gap
Risk teams need reusable methods, clear evidence trails, transparent scoring and reports that survive challenge. They also need to adapt quickly as new risk themes emerge: agentic AI, cyber insurance, SaaS posture, SBOMs, CISA KEV, cloud landing zones and concentration risk.
The design response
The toolkit separates method design from assessment execution. The configurator creates the map, questions, controls and scoring model. The assessor loads that JSON and applies the method consistently, offline, with no hidden server-side scoring.
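To make the split concrete, here is a minimal sketch of what a pack and its loading step might look like. The field names (`name`, `questions`, `controls`, `matrix`) and the `load_pack` helper are assumptions made for illustration; the toolkit's real schema is not shown in this write-up.

```python
import json

# Hypothetical pack as the configurator might export it.
# Field names are illustrative only, not the toolkit's real schema.
PACK_TEXT = """
{
  "name": "Example AI risk pack",
  "questions": [
    {"id": "q1", "text": "Is model output reviewed before use?",
     "options": {"yes": 0, "partly": 1, "no": 2}}
  ],
  "controls": [
    {"id": "c1", "text": "Human review of AI output"}
  ],
  "matrix": [["Low", "Medium"], ["Medium", "High"]]
}
"""


def load_pack(text: str) -> dict:
    """Parse pack JSON; the method lives entirely in this one object."""
    return json.loads(text)


pack = load_pack(PACK_TEXT)
question_ids = [q["id"] for q in pack["questions"]]
control_ids = [c["id"] for c in pack["controls"]]
print(pack["name"], question_ids, control_ids)
```

Because the method is plain data, a reviewer can read the pack directly to see which questions, controls and matrix an assessment used.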
The Original Challenge
The first version was deliberately constrained: could a useful governance tool be built with AI assistance, clear direction and review, without hand-coding the application line by line? The point was not to produce a toy demo. It was to test whether a real business problem could become a working, documented, auditable local application quickly enough to matter.
The business problem
AI adoption was moving faster than practical AI risk governance. Existing approaches were generic IT risk templates, consultant-heavy frameworks or opaque tools. The gap was a structured assessment tool that a risk or security team could run locally and explain afterwards.
The experiment
The rule was simple: AI handled research, taxonomy design, code generation, testing and documentation, while the human supplied judgement, direction and acceptance. In practice that meant defining what good looked like and rejecting work that did not meet it. The build also had to respect a fixed set of constraints:
- No backend service and no database.
- No hidden runtime AI decision-maker for the score.
- No data transmission by default.
- Every result must be explainable from the config, answers, matrix and controls.
- The app must be shareable as local files and inspectable by opening the source.
Why Deterministic Scoring
| Choice | Strength | Risk | Outcome |
|---|---|---|---|
| Runtime AI scoring | Can interpret messy context and produce fluent narratives. | Hard to reproduce, hard to audit, can change between runs, and may expose assessment data to external APIs. | Useful for drafting and analysis, but weak as the core scoring authority. |
| Deterministic evidence scoring | Same answers produce the same result every time. The formula, matrix and controls are visible. | Requires a well-designed taxonomy and disciplined config maintenance. | Chosen for the assessor. AI can assist around the edges, but the risk result remains explainable. |
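The deterministic row can be illustrated with a small sketch. Everything here is an assumption made for the example: the 3x3 matrix, the `axis` field on each question, and the rule that the worst answer on each axis drives the lookup. The assessor's actual formula is defined by its config and is not reproduced here.

```python
# Hypothetical 3x3 matrix: rows are likelihood (0..2), columns are impact (0..2).
MATRIX = [
    ["Low", "Low", "Medium"],
    ["Low", "Medium", "High"],
    ["Medium", "High", "High"],
]


def score(answers: dict[str, int], questions: dict[str, dict]) -> dict:
    """Return a band plus the trace that explains how it was reached.

    Assumed rule: each question feeds one axis, and the highest (worst)
    answer on each axis is used for the matrix lookup.
    """
    axes = {"likelihood": 0, "impact": 0}
    trace = []
    for qid in sorted(questions):  # fixed order keeps the trace stable
        axis = questions[qid]["axis"]
        value = answers[qid]
        axes[axis] = max(axes[axis], value)
        trace.append(f"{qid} -> {axis} {value}")
    band = MATRIX[axes["likelihood"]][axes["impact"]]
    return {"band": band, "axes": axes, "trace": trace}


QUESTIONS = {"q1": {"axis": "likelihood"}, "q2": {"axis": "impact"}}
ANSWERS = {"q1": 2, "q2": 1}

first = score(ANSWERS, QUESTIONS)
second = score(ANSWERS, QUESTIONS)
print(first["band"], first["trace"])
```

Running the function twice on the same answers gives identical output, and the trace lists exactly which answers drove the band. That is the property an auditor can check without trusting a model.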
How It Was Built
The build became a loop: research the domain, model the method, implement the local browser apps, test against edge cases, then generalise the method into reusable JSON packs.
Prompting Strategy
The most useful prompting pattern was to hold off on asking for code. The model was first asked to clarify the problem, generate the development brief and challenge that brief, and only then to implement. That reduced rework because the early prompts focused on intent, constraints and acceptance criteria.
| Technique | How it helped |
|---|---|
| Reverse prompting | Start with the goal, ask AI what it needs to know, answer those questions, then use the refined brief for implementation. |
| Cross-model review | Use another model or another pass to find missing requirements, contradictions and weak assumptions. |
| Outcome-first prompts | Describe the business outcome, not just the component. This helped the apps stay coherent across scoring, UI, exports and config schema. |
| Validation prompts | Ask for tests and edge cases, then run them. The project improved when failures were treated as design feedback rather than just bugs. |
The Current Product
Risk Configurator
Defines the assessment method: families, periods, risk cells, scope logic, vendor questions, internal questions, controls, scoring options, risk matrix, appetite and AI prompt. It includes a config gallery and validator so method owners can tailor packs safely.
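As a rough idea of what a pack validator could check, the sketch below flags duplicate question IDs, questions with no scoring options and a ragged risk matrix. The `validate_pack` function, the field names and the specific checks are hypothetical; the configurator's real validation rules are not described in this write-up.

```python
def validate_pack(pack: dict) -> list[str]:
    """Return a list of human-readable issues; an empty list means no issue found."""
    issues = []
    seen = set()
    for q in pack.get("questions", []):
        qid = q.get("id")
        if not qid:
            issues.append("question without an id")
        elif qid in seen:
            issues.append(f"duplicate question id: {qid}")
        else:
            seen.add(qid)
        if not q.get("options"):
            issues.append(f"question {qid} has no scoring options")

    matrix = pack.get("matrix", [])
    if not matrix:
        issues.append("risk matrix is missing or empty")
    elif len({len(row) for row in matrix}) != 1:
        issues.append("risk matrix rows have different lengths")
    return issues


good = {
    "questions": [{"id": "q1", "options": {"yes": 0, "no": 2}}],
    "matrix": [["Low", "Medium"], ["Medium", "High"]],
}
bad = {
    "questions": [
        {"id": "q1", "options": {"yes": 0}},
        {"id": "q1", "options": {}},
    ],
    "matrix": [["Low"], ["Low", "High"]],
}
good_issues = validate_pack(good)
bad_issues = validate_pack(bad)
print(good_issues, bad_issues)
```

Returning all issues at once, rather than stopping at the first, lets a method owner fix a pack in one pass.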
Risk Assessor
Runs an assessment from any exported config. It guides users through metadata, scope, surveys and controls, then produces heatmaps, residual risk, treatment plans, evidence assurance, session comparison and exportable reports.
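One way residual risk and appetite could interact is sketched below. The band ordering, the `in_place` and `evidence` fields, and the rule that each evidenced control lowers the band by one step are all assumptions chosen to keep the example short; they are not the assessor's actual model.

```python
# Hypothetical band ordering, lowest first.
BANDS = ["Low", "Medium", "High", "Critical"]


def residual_band(inherent: str, controls: list[dict]) -> str:
    """Step the inherent band down once per control that is in place AND evidenced."""
    steps = sum(1 for c in controls if c["in_place"] and c["evidence"])
    return BANDS[max(0, BANDS.index(inherent) - steps)]


def needs_treatment(residual: str, appetite: str) -> bool:
    """A risk needs a treatment plan when its residual band exceeds appetite."""
    return BANDS.index(residual) > BANDS.index(appetite)


controls = [
    {"id": "c1", "in_place": True, "evidence": "review-log.pdf"},
    {"id": "c2", "in_place": True, "evidence": ""},  # claimed but not evidenced
]
residual = residual_band("High", controls)
flagged = needs_treatment(residual, appetite="Low")
print(residual, flagged)
```

In this sketch a control that is claimed but not evidenced earns no reduction, so the residual band falls only as far as the evidence supports.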
Cost, Time and What Changed
The original product was a compact AI risk app. The current product is larger: two local apps with autosave, evidence handling, AI-assisted workflows, configurable scoring, validation and exports, plus a library of JSON packs. The most important change is commercial: new use cases can be shipped as configuration files rather than new software builds.
The Config Ecosystem
The major shift is that the tool is no longer just an AI risk map. It is a reusable framework engine. A JSON pack can represent any domain that can be assessed as scoped risks, evidence questions, controls, appetite and a matrix.
What This Demonstrates
AI is strongest as a builder and analyst
AI accelerated taxonomy design, coding, testing, documentation and pack generation. It is also useful for drafting reports and suggested answers. But the final risk result is still anchored in explicit evidence and deterministic scoring.
JSON turns the app into a product line
Each new configuration can target a market use case without changing the apps. That means the toolkit can grow from one AI map into many sellable assessment packs: cyber insurance, cloud, SaaS, supply chain, privacy, resilience, OT and more.