How we test AI agents, and what ALESA NOVA defends against.
We do not claim "bullet-proof." We give you mechanical, human-in-the-loop controls and a reproducible test suite you can run yourself. Below: the recognised agent-safety toolset, what each attack class is, the ALESA mechanism that defends against it, and how to verify it independently.
52 reproducible tests, including adversarial red-team tests. Run them yourself.
The strongest assurance is not our word — it is a test you can re-run. The enforcement layer ships with a reproducible suite. Extract the package, install, and watch every gate prove itself.
The recognised tools for testing AI agents — and what each attacks.
These are the tools recognised across industry and research for probing AI agents. We explain what each one attacks so you understand the threat model ALESA NOVA is built against.
| Tool | Who | What it attacks / probes |
|---|---|---|
| AgentDojo | ETH Zürich (SPY Lab) | Prompt-injection against tool-using agents — can the agent be hijacked by malicious data or tool output into performing unintended actions? |
| Garak | NVIDIA | LLM vulnerability scanner — probes for data leakage, exfiltration, jailbreak, and prompt injection. |
| AgentHarm | UK AI Safety Institute / Gray Swan | Measures whether an agent can be persuaded to carry out harmful tasks (refusal vs compliance). |
| τ-bench | Sierra | Agent reliability & consistency in real tool-use — does it do the same correct thing, repeatably? |
| Inspect | UK AI Safety Institute | An evaluation framework for authoring and running custom agent evaluations, including deception / scheming tests. |
Every attack class → the ALESA mechanism → the reproducible test.
This is the heart of it: for each attack the toolset probes, the specific ALESA control that defends against it, and the test in our suite that proves it.
| Attack class (probed by) | ALESA mechanism that defends against it | Reproducible test |
|---|---|---|
| Prompt-injection hijack (AgentDojo) | Policy-Enforcement-Point with server-derived identity (a caller can't forge its role) + tool-allowlist (deny-by-default) + human gate on consequential actions | E2 + E10 (10 tests) |
| Data leakage / exfiltration (Garak) | Secret-leak gate + outbound exfiltration block + data-classification floor (PII can't be mislabelled) + local-only residency | E1 + E9 (10 tests) |
| Persuaded to harmful action (AgentHarm) | Operator governance + hard-block on sensitive actions (payment/auth/destructive) + RBAC + mandatory human approval | E4 + hard-block (6 tests) |
| Unreliable / inconsistent (τ-bench) | Deterministic mechanical gates + tamper-evident audit + verify-before-done | E3 (6 tests) |
| Tamper with the record / cover tracks | Audit external anchor: tampering with the ledger and then re-hashing it is detected, not silently accepted | E3 (audit-anchor) |
| Forged identity / privilege escalation | Identity is server-derived; any caller-supplied role/actor is ignored | E2 (PEP) |
| Destructive action (DB wipe, rm -rf) | Risk-floor (destructive never trivially allowed) + no-edit-without-backup + multi-layer block | E2 + E13 (egress) |
| Any model (Claude / Codex / DeepSeek / Qwen / local) | Model-agnostic chokepoint — the gate takes no model parameter; every model passes the identical enforcement | gateway-binding (model-agnostic test) |
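The audit-anchor row is easier to follow with a concrete case. The sketch below is not ALESA NOVA's code: it is a minimal, generic hash-chained ledger, and every name in it (`GENESIS`, `entry_hash`, `build_chain`, `verify`) is an assumption made for illustration. It shows why an external anchor matters: an attacker who edits a record and re-hashes the whole ledger produces a chain that is internally consistent, and only the comparison against a head hash stored elsewhere exposes the change.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the chain (illustrative)


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash one record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def build_chain(records: list) -> list:
    """Return the running hash after each record; the last one is the head."""
    hashes, prev = [], GENESIS
    for record in records:
        prev = entry_hash(prev, record)
        hashes.append(prev)
    return hashes


def verify(records: list, stored_hashes: list, anchored_head: str) -> bool:
    """Valid only if the chain is self-consistent AND its head matches the anchor."""
    recomputed = build_chain(records)
    if recomputed != stored_hashes:
        return False  # a record was changed without re-hashing
    head = recomputed[-1] if recomputed else GENESIS
    return head == anchored_head  # catches a full re-hash as well


records = [
    {"actor": "operator-1", "action": "approve", "target": "deploy"},
    {"actor": "agent", "action": "read", "target": "report.pdf"},
]
hashes = build_chain(records)
anchored_head = hashes[-1]  # assumed to be kept where the ledger writer cannot modify it

# The attacker edits the first record, then re-hashes the entire ledger.
tampered = [dict(records[0], action="deny"), records[1]]
rehashed = build_chain(tampered)

untouched_ok = verify(records, hashes, anchored_head)
tamper_only_ok = verify(tampered, hashes, anchored_head)
tamper_rehash_ok = verify(tampered, rehashed, anchored_head)
print(untouched_ok, tamper_only_ok, tamper_rehash_ok)
```

Without the `anchored_head` comparison, the third case would pass, because the re-hashed chain agrees with itself.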
What makes ALESA NOVA different — verified in code.
Every model obeys the same gate
Enforcement lives outside the model. Whether the agent is Claude, Codex, DeepSeek, Qwen, or a local model, it hits the identical mechanical gate. A foreign, compromised, or injected model is constrained the same way — because we do not rely on the model being well-behaved.
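As a rough illustration of enforcement living outside the model, here is a minimal sketch of a policy gate. It is not the shipped implementation; the session store, allowlist contents, tool names, and the `gate` signature are all hypothetical. The point it tries to show is structural: the function has no model parameter, identity comes from a server-side session lookup, and any role the caller puts in its arguments is never read.

```python
from dataclasses import dataclass

# Hypothetical session store: token -> (actor, role), written by the server at login.
SESSIONS = {
    "tok-ops-1": ("alice", "operator"),
    "tok-agent-7": ("agent-7", "agent"),
}

# Hypothetical exact-match allowlist per role; anything not listed is denied.
ALLOWLIST = {
    "agent": {"read_file", "search_docs"},
    "operator": {"read_file", "search_docs", "send_payment"},
}

# Hypothetical set of consequential tools that always need human approval.
NEEDS_HUMAN = {"send_payment"}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def gate(session_token: str, tool: str, args: dict, human_approved: bool = False) -> Decision:
    """Decide on a tool call. There is deliberately no model parameter."""
    identity = SESSIONS.get(session_token)
    if identity is None:
        return Decision(False, "unknown session")
    actor, role = identity  # server-derived; args.get("role") is never consulted
    if tool not in ALLOWLIST.get(role, set()):
        return Decision(False, f"tool not allowlisted for role {role}")
    if tool in NEEDS_HUMAN and not human_approved:
        return Decision(False, "human approval required")
    return Decision(True, f"allowed for {actor}")


# An injected agent claims to be an operator inside its own arguments.
forged = gate("tok-agent-7", "send_payment", {"role": "operator"})
# A near-miss alias of an allowed tool is still denied (exact match only).
alias = gate("tok-agent-7", "Read_File", {})
ok = gate("tok-agent-7", "read_file", {"path": "notes.txt"})
pending = gate("tok-ops-1", "send_payment", {"amount": 10})
approved = gate("tok-ops-1", "send_payment", {"amount": 10}, human_approved=True)
```

Because the decision depends only on the session, the tool name, and the approval flag, swapping the model behind the agent changes nothing about what the gate permits.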
The agent cannot disable its own guardrails
An override requires an out-of-band, password-verified action by a human. The gate, doctrine, and settings files are edit-denied to the agent, and the agent holds no password. Whether an agent can switch off its own controls is typically the first question a CISO asks, and here the answer is verified in code.
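One way to picture the password-verified part of that override is the sketch below. It is an assumption-laden illustration, not the product's mechanism: the `OverrideControl` class, the PBKDF2 parameters, and the example password are invented here, and the out-of-band channel itself (how the human reaches the check without going through the agent) is outside what a few lines can show. The code only demonstrates that the checking side can hold a salted hash rather than the password, so there is nothing for the agent to read and replay.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a key from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


class OverrideControl:
    """Stores only a salted hash; the plaintext password is not retained."""

    def __init__(self, password: str) -> None:
        self._salt = os.urandom(16)
        self._digest = hash_password(password, self._salt)

    def request_override(self, supplied_password: Optional[str]) -> bool:
        """Grant an override only when a matching password is supplied."""
        if supplied_password is None:
            return False  # the agent has no password to offer
        candidate = hash_password(supplied_password, self._salt)
        return hmac.compare_digest(candidate, self._digest)  # constant-time compare


control = OverrideControl("correct horse battery staple")
agent_attempt = control.request_override(None)
wrong_guess = control.request_override("let me in")
human_attempt = control.request_override("correct horse battery staple")
```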
Eight mechanical controls, built test-first.
E1 · Residency
local-only model routing; sensitive data never leaves the premises.
E2 · Policy gate
server-derived identity, destructive risk-floor, RBAC.
E3 · Audit anchor
tampering with the ledger and then re-hashing it is detected.
E4 · Identity
MFA, segregation of duties, session idle-timeout.
E9 · Classification
detected PII sets a minimum data class, so mislabelled data cannot be exfiltrated.
E10 · Tool allowlist
exact-match, deny-by-default, signed; aliased or injected tool names are denied.
E11 · Incident SLA
6-hour / 72-hour notification logic + dispatch.
E13 · Egress isolation
default-deny network egress (on the on-prem server).
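To make the E9 idea of a classification floor concrete, the following is a minimal sketch under stated assumptions. The three-level `DataClass` scale, the two regex detectors, and the rule that only `PUBLIC` data may leave the premises are all invented for this example and are not taken from the ALESA NOVA suite. What it illustrates is the floor itself: the caller's declared label can raise the class but can never lower it below what the content detection requires.

```python
import re
from enum import IntEnum


class DataClass(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Illustrative detectors only; real PII detection needs far broader coverage.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email-like strings
    re.compile(r"\b\d{6}-\d{2}-\d{4}\b"),     # 12-digit ID written as 6-2-4
]


def effective_class(text: str, declared: DataClass) -> DataClass:
    """Return the declared class, raised to CONFIDENTIAL if PII is detected."""
    has_pii = any(pattern.search(text) for pattern in PII_PATTERNS)
    floor = DataClass.CONFIDENTIAL if has_pii else DataClass.PUBLIC
    return max(declared, floor)


def may_leave_premises(text: str, declared: DataClass) -> bool:
    """Assumed policy for this sketch: only PUBLIC data may be sent out."""
    return effective_class(text, declared) == DataClass.PUBLIC


# A caller mislabels a record containing an ID number as PUBLIC.
mislabelled = effective_class("IC 900101-14-5678", DataClass.PUBLIC)
blocked = may_leave_premises("Contact: a@b.com", DataClass.PUBLIC)
harmless = may_leave_premises("Quarterly summary", DataClass.PUBLIC)
stricter = effective_class("Quarterly summary", DataClass.INTERNAL)
```

Taking `max` of the declared class and the detected floor is the whole trick: a stricter declared label is respected, while a looser one is overridden.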
Designed in alignment with Malaysian regulatory frameworks.
Alignment by design, at control-family level — not a certification claim. Formal certification is an independent process.
| Framework | Relevant alignment |
|---|---|
| BNM RMiT | access control, segregation of duties, change management, data-loss prevention, technology audit |
| NACSA · Cyber Security Act 2024 | cyber risk management, incident-notification readiness (E11) |
| MAMPU / JDN | data residency / on-prem processing (E1, E13) |
| PDPA 2010 + Amendment 2024 | data classification, breach-notification logic, cross-border control |
| ISO/IEC 27001:2022 | logging & monitoring, secure operations |
What is proven today, and what is in progress.
Don't take our word. Run the tests.
For SIRIM, government, banking, and regulated teams: request the reproducibility pack and verify ALESA NOVA's enforcement independently.