Risk Assessment Toolkit

The Challenge
Context and rationale

From one AI risk map to a configurable assessment platform

The original challenge was to see whether AI-assisted development could build a useful, auditable risk tool from prompts and direction. The result has evolved into two local browser apps: a configurator for designing risk frameworks and an assessor for running evidence-based assessments against reusable JSON packs.

The Problem

Organisations need fast, repeatable ways to assess cyber, AI, privacy, supplier, continuity and operational risk. Traditional GRC platforms are often expensive and slow to tailor. Spreadsheets are flexible but inconsistent. Runtime AI scoring can be impressive, but it is hard to defend when an auditor, regulator or board asks why a risk landed in a particular band.

The governance gap

Risk teams need reusable methods, clear evidence trails, transparent scoring and reports that survive challenge. They also need to adapt quickly as new risk themes emerge: agentic AI, cyber insurance, SaaS posture, SBOMs, CISA KEV, cloud landing zones and concentration risk.

The design response

The toolkit separates method design from assessment execution. The configurator creates the map, questions, controls and scoring model. The assessor loads that JSON and applies the method consistently, offline, with no hidden server-side scoring.

The Original Challenge

The first version was deliberately constrained: could a useful governance tool be built with AI assistance, clear direction and review, without hand-coding the application line by line? The point was not to produce a toy demo. It was to test whether a real business problem could become a working, documented, auditable local application quickly enough to matter.

The business problem

AI adoption was moving faster than practical AI risk governance. Existing approaches were either generic IT risk templates, consultant-heavy frameworks or opaque tools. The gap was a structured assessment tool that a risk or security team could run locally and explain afterwards.

The experiment

The rule was simple: use AI for research, taxonomy design, code generation, testing and documentation, while the human supplies judgement, direction and acceptance. The human role was to define what good looks like and reject work that did not meet it.

Why Deterministic Scoring

Choice | Strength | Risk | Outcome
Runtime AI scoring | Can interpret messy context and produce fluent narratives. | Hard to reproduce, hard to audit, can change between runs, and may expose assessment data to external APIs. | Useful for drafting and analysis, but weak as the core scoring authority.
Deterministic evidence scoring | Same answers produce the same result every time. The formula, matrix and controls are visible. | Requires a well-designed taxonomy and disciplined config maintenance. | Chosen for the assessor. AI can assist around the edges, but the risk result remains explainable.

The current apps still include AI-assisted features, but they are human-in-the-loop: scoping suggestions, survey pre-fill, treatment drafting, report generation and config drafting. The risk calculation itself remains transparent and deterministic.

How It Was Built

The build became a loop: research the domain, model the method, implement the local browser apps, test against edge cases, then generalise the method into reusable JSON packs.

1. Research and taxonomy: AI helped compare cyber, AI, privacy, cloud, resilience and control frameworks, then convert them into families, periods, risks, questions and controls that fit the app schema.
2. Scoring model: The core calculation was kept deterministic: base score, optional likelihood, survey posture, control effectiveness, appetite and matrix lookup. That makes results repeatable and auditable.
3. Two-app architecture: The configurator owns method design. The assessor owns evidence collection and reporting. This separation keeps the tool flexible without making every assessment a method-building exercise.
4. Pack generation: Once the schema worked, new markets could be served through JSON: cyber insurance, ransomware, Essential Eight, SaaS, SBOM, NIST CSF, OT, privacy, healthcare and more.
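
The scoring model in step 2 can be illustrated with a minimal Python sketch. The source names the inputs (base score, optional likelihood, survey posture, control effectiveness, appetite and matrix lookup) but not the formula. The arithmetic, the scales, the band thresholds and every function and field name below are therefore assumptions chosen for illustration, not the toolkit's actual calculation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical matrix: (inclusive upper bound of residual score, band label).
# In the toolkit the matrix lives in the JSON pack; these values are invented.
MATRIX = [(5.0, "Low"), (12.0, "Medium"), (19.0, "High"), (25.0, "Critical")]


@dataclass(frozen=True)
class RiskInputs:
    base_score: float             # inherent impact, assumed scale 1-5
    likelihood: Optional[float]   # optional, assumed scale 1-5; None = not assessed
    survey_posture: float         # 0.0 (weak evidence) to 1.0 (strong evidence)
    control_effectiveness: float  # 0.0 (no controls) to 1.0 (fully effective)


def residual_score(r: RiskInputs) -> float:
    """Assumed formula: inherent = base x likelihood, reduced by posture and controls."""
    likelihood = r.likelihood if r.likelihood is not None else 3.0  # assumed neutral default
    inherent = r.base_score * likelihood
    # Assumption: posture and controls each supply half of the achievable reduction.
    reduction = 0.5 * r.survey_posture + 0.5 * r.control_effectiveness
    return round(inherent * (1.0 - reduction), 2)


def band(score: float) -> str:
    """Matrix lookup: the first band whose upper bound covers the score."""
    for upper, label in MATRIX:
        if score <= upper:
            return label
    return MATRIX[-1][1]  # scores above the top bound fall into the highest band


def assess(r: RiskInputs, appetite: float) -> dict:
    """Combine score, band and appetite check into one explainable result."""
    score = residual_score(r)
    return {"residual": score, "band": band(score), "within_appetite": score <= appetite}
```

Under these assumptions, `assess(RiskInputs(4, 5, 0.5, 0.5), appetite=8)` gives a residual of 10.0 in the Medium band, outside an appetite of 8. Because the function has no hidden state or randomness, the same inputs give the same result on every run, which is the property the deterministic design relies on.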

Prompting Strategy

The most useful prompting pattern was to hold off on asking for code. The model was first asked to clarify the problem, generate a development brief and challenge that brief, and only then to implement. That reduced rework because the early prompts focused on intent, constraints and acceptance criteria.

Technique | How it helped
Reverse prompting | Start with the goal, ask AI what it needs to know, answer those questions, then use the refined brief for implementation.
Cross-model review | Use another model or another pass to find missing requirements, contradictions and weak assumptions.
Outcome-first prompts | Describe the business outcome, not just the component. This helped the apps stay coherent across scoring, UI, exports and config schema.
Validation prompts | Ask for tests and edge cases, then run them. The project improved when failures were treated as design feedback rather than just bugs.

The Current Product

2 main HTML apps
23 config packs in the catalogue
0 backend services required
1 portable JSON method format

Risk Configurator

Defines the assessment method: families, periods, risk cells, scope logic, vendor questions, internal questions, controls, scoring options, risk matrix, appetite and AI prompt. It includes a config gallery and validator so method owners can tailor packs safely.
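
A pack validator of the kind described above might perform structural and cross-reference checks along the following lines. This is a hedged sketch only: the key names (`families`, `periods`, `risks`, `controls`, `matrix`, `appetite`), the `id` fields and the `validate_pack` function are hypothetical, since the source does not publish the real schema or the validator's rules.

```python
from typing import List

# Hypothetical top-level keys; the toolkit's real schema may differ.
REQUIRED_KEYS = ("families", "periods", "risks", "controls", "matrix", "appetite")


def validate_pack(pack: dict) -> List[str]:
    """Return human-readable problems; an empty list means the pack passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in pack]
    if problems:
        return problems  # cross-reference checks below need every key present

    families = set(pack["families"])
    periods = set(pack["periods"])
    control_ids = {control["id"] for control in pack["controls"]}

    # Every risk cell must point at a declared family, period and control.
    for risk in pack["risks"]:
        rid = risk.get("id", "<no id>")
        if risk.get("family") not in families:
            problems.append(f"{rid}: unknown family {risk.get('family')!r}")
        if risk.get("period") not in periods:
            problems.append(f"{rid}: unknown period {risk.get('period')!r}")
        for cid in risk.get("controls", []):
            if cid not in control_ids:
                problems.append(f"{rid}: unknown control {cid!r}")
    return problems
```

Returning a list of problems instead of raising on the first error lets a method owner see every broken reference in one pass before exporting a tailored pack.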

Risk Assessor

Runs an assessment from any exported config. It guides users through metadata, scope, surveys and controls, then produces heatmaps, residual risk, treatment plans, evidence assurance, session comparison and exportable reports.
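
Session comparison can be pictured as a diff of residual scores between two saved assessments. The sketch below assumes each session reduces to a mapping of risk id to residual score. That shape and the `compare_sessions` name are hypothetical, not taken from the assessor.

```python
from typing import Dict, List, Tuple


def compare_sessions(
    before: Dict[str, float], after: Dict[str, float]
) -> List[Tuple[str, float, float, float]]:
    """Return (risk_id, before, after, delta) for risks present in both sessions,
    sorted so the largest increases in residual risk come first."""
    shared = sorted(set(before) & set(after))
    rows = [
        (rid, before[rid], after[rid], round(after[rid] - before[rid], 2))
        for rid in shared
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Risks that appear in only one session are left out here. A fuller version would probably report them separately as added or retired risks.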

Cost, Time and What Changed

The original product story was a compact AI risk app. The current product is larger: two mature local apps, autosave, evidence handling, AI-assisted workflows, configurable scoring, validation, exports and a catalogue of 23 JSON config packs. The most important change is commercial: new use cases can be shipped as configuration files rather than new software builds.

Local: runs in browser files
Config: new markets via JSON
Audit: transparent calculations
AI: assistive, not authoritative

The lesson from the challenge still holds: the bottleneck is not typing code. It is knowing what the tool must prove, what evidence it needs, which decisions must remain human, and how to make the output defensible.

The Config Ecosystem

The major shift is that the tool is no longer just an AI risk map. It is a reusable framework engine. A JSON pack can represent any domain that can be assessed as scoped risks, evidence questions, controls, appetite and a matrix.
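
To make the pack idea concrete, here is an invented miniature pack and a few lines of Python that load it and index its evidence by risk. Every field name and value is an assumption for illustration. The real schema, which also covers periods, scope logic and scoring options, is not reproduced in this document.

```python
import json

# An invented miniature pack; field names and values are illustrative only.
PACK_JSON = """
{
  "name": "Example SaaS posture pack",
  "families": ["Identity", "Logging"],
  "risks": [
    {"id": "R1", "family": "Identity", "title": "Unreviewed OAuth grants", "base_score": 4},
    {"id": "R2", "family": "Logging", "title": "Audit logs not retained", "base_score": 3}
  ],
  "questions": [
    {"id": "Q1", "risk": "R1", "text": "Are third-party OAuth grants reviewed on a schedule?"}
  ],
  "controls": [
    {"id": "C1", "risk": "R1", "title": "Periodic OAuth grant review"}
  ],
  "appetite": 8,
  "matrix": [[5, "Low"], [12, "Medium"], [19, "High"], [25, "Critical"]]
}
"""

pack = json.loads(PACK_JSON)

# Index questions and controls by the risk they provide evidence for.
evidence = {risk["id"]: {"questions": [], "controls": []} for risk in pack["risks"]}
for question in pack["questions"]:
    evidence[question["risk"]]["questions"].append(question["id"])
for control in pack["controls"]:
    evidence[control["risk"]]["controls"].append(control["id"])
```

In this sketch R2 ends up with no questions or controls attached, which is the kind of gap a method owner would want surfaced before a pack is used for an assessment.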

Essential Eight: ML1, ML2 and ML3 across on-prem, cloud and SaaS.
Cyber Insurance: Quote, underwriting, renewal and claim-readiness evidence.
Ransomware Resilience: Prevent, detect, contain and recover from extortion attacks.
AI and Agentic AI: Inventory, prompt injection, tool use, identity and oversight.
SaaS Security Posture: Identity, OAuth, tenant config, logging, backup and vendor assurance.
Software Supply Chain: SBOMs, CI/CD, secrets, provenance, signing and release governance.
NIST CSF Board Risk: Executive posture across Govern, Identify, Protect, Detect, Respond and Recover.
Cloud Landing Zone: Identity, network, logs, keys, workloads, data, backup and policy.
Specialist Packs: OT, healthcare, privacy, BCM, M&A, CTEM, project gating and small business.

What This Demonstrates

AI is strongest as a builder and analyst

AI accelerated taxonomy design, coding, testing, documentation and pack generation. It is also useful for drafting reports and suggested answers. But the final risk result is still anchored in explicit evidence and deterministic scoring.

JSON turns the app into a product line

Each new configuration can target a market use case without changing the apps. That means the toolkit can grow from one AI map into many sellable assessment packs: cyber insurance, cloud, SaaS, supply chain, privacy, resilience, OT and more.