How We Test, Score, & Rank AI Security Tools

This page explains the methodology behind every review and ranked list on AIsecurityPlatform.com. It is the trust anchor for the site. If you disagree with how we weight a category, you have a public document to argue with — that is the point.

Active comparisons

We use two testing tracks: Lab Tested and Demo Evaluated. Every review is clearly badged with the track it went through, so readers always know the depth behind the verdict. Demo Evaluated reviews also carry a status label, Outreach Pending or Vendor Declined Lab Access, that shows where our lab-access request stands.

LAB TESTED

The vendor agrees to grant access — trial, sandbox, or production tenant — and the product is deployed in the Cyber Security Services lab. We run a documented set of test scenarios against it. Findings are published with the specific scenarios listed under “What We Tested” and “What We Did Not Test.”

DEMO EVALUATED

The vendor does not agree to lab access (or has not yet responded). The review is based on a live vendor demo, public documentation, customer interviews where possible, and a framework alignment review.

OUTREACH PENDING

We have requested lab access. The review will be upgraded to Lab Tested when the vendor confirms.

VENDOR DECLINED LAB ACCESS

If a vendor explicitly declines lab access after we have requested it, we note this on the review.

Why we publish demo-only reviews

Buyers need information now. Lab-tested reviews are more rigorous, but demo-evaluated reviews are honest about their depth and still useful. Refusing to publish anything below lab-tested would slow the buyer down without making them safer.


Standard lab test scenarios

PII detection

50 prompts containing US SSNs, phone numbers, email addresses, and ZIP+4 codes.

PHI detection

25 prompts with HIPAA-relevant identifiers (patient names + DOB + diagnoses).

Payment data

5 prompts with credit card numbers, bank account numbers, and routing numbers.

Secrets & credentials

25 prompts with AWS access keys, GCP service account JSON, Azure.

Source code

25 prompts with proprietary-style code blocks.

Prompt injection

10 known indirect prompt injection scenarios from public OWASP & Lakera test sets.

Policy enforcement

Verify that block, warn, allow, and redact behaviors match the configured policy.

Audit logging

Verify what is logged, what is not, and retention behavior.

SSO integration

Test Microsoft Entra ID and Okta where supported.

Latency

Measure added latency on standard prompt sizes (note: tested at concurrency).
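As a rough illustration of how the scenarios above could be tracked, the sketch below records the published prompt counts and checks observed policy actions against the configured ones. It is a minimal sketch, not the lab's actual harness: the class and function names, the test-case identifiers, and the idea of reporting a per-scenario detection rate are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    name: str
    prompt_count: int


# Prompt counts copied from the scenario list above.
PROMPT_SCENARIOS = [
    Scenario("PII detection", 50),
    Scenario("PHI detection", 25),
    Scenario("Payment data", 5),
    Scenario("Secrets & credentials", 25),
    Scenario("Source code", 25),
    Scenario("Prompt injection", 10),
]

# The four policy behaviors named under "Policy enforcement".
ALLOWED_ACTIONS = {"block", "warn", "allow", "redact"}


def total_prompts(scenarios: list[Scenario]) -> int:
    """Total number of prompts across all prompt-based scenarios."""
    return sum(s.prompt_count for s in scenarios)


def detection_rate(scenario: Scenario, detected: int) -> float:
    """Fraction of a scenario's prompts that the product flagged (assumed metric)."""
    if not 0 <= detected <= scenario.prompt_count:
        raise ValueError("detected must be between 0 and the prompt count")
    return detected / scenario.prompt_count


def policy_mismatches(
    configured: dict[str, str], observed: dict[str, str]
) -> list[tuple[str, str, Optional[str]]]:
    """Return (case_id, expected, actual) for every case that did not match policy."""
    mismatches = []
    for case_id, expected in configured.items():
        if expected not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action in policy: {expected}")
        actual = observed.get(case_id)  # None means no result was recorded
        if actual != expected:
            mismatches.append((case_id, expected, actual))
    return mismatches


# Hypothetical test cases, invented for illustration only.
configured = {"ssn-in-prompt": "block", "email-in-prompt": "redact", "plain-question": "allow"}
observed = {"ssn-in-prompt": "block", "email-in-prompt": "warn", "plain-question": "allow"}

print(total_prompts(PROMPT_SCENARIOS))         # 140
print(policy_mismatches(configured, observed))  # [('email-in-prompt', 'redact', 'warn')]
```

Keeping the counts in one structure makes it easy to state, in a review's "What We Tested" list, exactly how many prompts each scenario covered.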

Lab access policy

Vendors can request lab inclusion via /contact/. We do not accept payment for lab inclusion. We do not share confidential vendor implementation details — only test results.

Annual Refresh and 2027 Readiness

Our “Best Of” rankings are published as year-specific editions. Each January, we publish a new annual edition that supersedes the prior year’s ranking. We do this because the AI security category moves fast: vendors are acquired, products pivot, and new categories emerge.

Year-stamped editions are honest about when each ranking was made, while a permanent canonical URL ensures that buyers searching for the current year always land on the latest edition.

Annual Refresh Cadence

Re-test every product still in scope, applying our standard testing scenarios.
Add any product that has met our inclusion threshold during the prior year.
Remove any product that has been discontinued, acquired into another product, or that has materially fallen behind.
Re-score every product against the current methodology weights.
Publish the new year’s edition at /best/[topic]-[year]/ and update the 301 redirect on the year-less canonical URL so that it points to the new edition.
Add a banner to the prior year’s archive linking to the new edition.
Update every review’s Changelog with the date of re-test.
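The publishing step in the list above can be sketched as a small redirect-map update. This is an illustrative sketch only: the /best/[topic]-[year]/ pattern comes from this page, but the year-less path /best/[topic]/, the "ai-dlp" topic slug, and the function names are assumptions.

```python
def edition_path(topic: str, year: int) -> str:
    """Year-stamped edition URL, following the /best/[topic]-[year]/ pattern."""
    return f"/best/{topic}-{year}/"


def canonical_path(topic: str) -> str:
    """Year-less canonical URL (assumed shape; the page does not spell it out)."""
    return f"/best/{topic}/"


def refresh_redirects(redirects: dict[str, str], topic: str, new_year: int) -> dict[str, str]:
    """Return a copy of the 301 map with the canonical URL pointing at the new edition."""
    updated = dict(redirects)  # leave the caller's map untouched
    updated[canonical_path(topic)] = edition_path(topic, new_year)
    return updated


# Hypothetical topic slug, used for illustration only.
current = {"/best/ai-dlp/": "/best/ai-dlp-2026/"}
refreshed = refresh_redirects(current, "ai-dlp", 2027)
print(refreshed)  # {'/best/ai-dlp/': '/best/ai-dlp-2027/'}
```

The prior year's page stays reachable at its own year-stamped URL, which is what allows the archive banner to link forward to the new edition.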

Quarterly & Event-Driven Updates

Outside the annual refresh, we update reviews:

Quarterly

Every product review is checked for material changes, and the “Last updated” timestamp is refreshed.

Event-driven

When a vendor announces a major release, gets acquired, raises funding that changes its trajectory, or has a public security incident, we update the review within 14 days.

Lab access changes

When a vendor moves from Demo Evaluated to Lab Tested (or vice versa), the review is re-badged and re-dated immediately.

Why We Date Reviews

Every review carries a “Last updated” line and a Changelog. This serves two purposes: readers know exactly how current the information is, and vendors know we’re paying attention. A review that hasn’t been updated in 12 months is a flag for us, not just for readers.
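The two time limits mentioned in this section, the 14-day window for event-driven updates and the 12-month staleness flag, reduce to simple date arithmetic. The sketch below assumes "12 months" can be approximated as 365 days; the function names and sample dates are hypothetical.

```python
from datetime import date, timedelta

EVENT_UPDATE_WINDOW = timedelta(days=14)  # event-driven updates: within 14 days
STALE_AFTER = timedelta(days=365)         # assumption: "12 months" treated as 365 days


def event_update_deadline(event_date: date) -> date:
    """Latest date by which a review should reflect a vendor event."""
    return event_date + EVENT_UPDATE_WINDOW


def is_stale(last_updated: date, today: date) -> bool:
    """True when a review has gone more than 12 months without an update."""
    return today - last_updated > STALE_AFTER


# Sample dates, invented for illustration only.
print(event_update_deadline(date(2026, 3, 2)))         # 2026-03-16
print(is_stale(date(2025, 1, 10), date(2026, 2, 1)))   # True
print(is_stale(date(2025, 9, 1), date(2026, 2, 1)))    # False
```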

The scoring rubric

Every reviewed product receives a score from 1 to 10 on each of seven dimensions. The dimensions and their weights are:

Coverage breadth (weight: 20%)

How many AI surfaces the product protects: ChatGPT, Claude, Gemini, Perplexity, embedded AI in SaaS, custom LLM apps, AI agents, MCP servers, browser-based use, endpoint use.

Detection accuracy (weight: 20%)

How reliably the product identifies the threats it claims to detect: PII and PHI in prompts, source code, secrets, prompt injection, jailbreaks, data exfiltration patterns. Evaluated through vendor-provided test results, public benchmarks, and customer interviews where available.

Deployment friction (weight: 15%)

Time to first value. Agents required? Browser extension? Network proxy? SSO support? Average time from contract to enforced policy.

Policy & control depth (weight: 15%)

Granularity of policy primitives: block, warn, redact, allow; per-user, per-group, per-application; admin override; audit logging.

Framework alignment (weight: 10%)

Mapping to NIST AI RMF functions, OWASP LLM Top 10 risks, OWASP Agentic Top 10, ISO/IEC 42001 controls, and EU AI Act obligations.

Pricing transparency (weight: 10%)

Whether pricing is published, whether quotes are reproducible, whether buyers can model costs without a sales call.

Support & documentation (weight: 10%)

Public documentation depth, response times, named CSM availability at relevant tiers.


No pay-for-rank

No paid placements affecting score.

No anonymous quotes from competitors

Quotes from named customers only. If a customer requires anonymity, we describe their industry and size, but we do not publish anonymous criticism of competitors.

No vendor approval over reviews

Vendors get a fact-check pass before publication. It is limited to factual errors and does not extend to editorial framing.

Update cadence

Best Of lists are refreshed annually, on a rolling schedule by category. Individual reviews are updated ad hoc when:

A vendor ships a material product change (new module, removal of a capability, deployment model change).
Pricing publicly changes.
An acquisition or other ownership change occurs (e.g., the F5 acquisition of CalypsoAI in October 2025, which is why CalypsoAI is no longer used as a standalone comparison target on this site).
A material customer-reported failure mode emerges.

Conflict of interest policy

AIsecurityPlatform.com is published by Cyber Security Services. The same company produces AILeakShield, an AI DLP product reviewed on this site.

We handle this conflict in three ways. First, every page that mentions AILeakShield carries a disclosure callout at the top. Second, AILeakShield is scored using the same published rubric as every other product, by the same reviewer, with the same vendor-briefing process. Third, AILeakShield is ranked on its actual feature scope, which is narrower than that of several other products in the same category; we do not rank it first on any list where it has not earned that position. See our full disclosure for details.

Corrections

If you find a factual error, email hello@aisecurityplatform.com. Confirmed corrections are made within five business days, and a note is added to the review’s changelog.

How a review is built, end to end

Walking through the lifecycle of a typical review:

Identification

A product enters the review queue either because it appears in our category-monitoring sweep or because the vendor submits it through the contact page. Submission does not guarantee coverage; we evaluate fit against category scope and current backlog.

Pre-briefing research

The reviewer studies public documentation, product pages, technical whitepapers, and any third-party reports the vendor publishes. The seven-dimension rubric is annotated with what is verifiable from public sources, with gaps marked for the briefing.

Vendor briefing

A 60-minute live walkthrough with a product or technical lead. We submit the rubric in advance and use the time to fill gaps, not to receive a sales pitch. Vendors who decline to brief still get reviewed; sections without information are marked “open question.”

Customer references

Where possible, we conduct one to three reference calls — vendor-provided plus our own network. References are weighted; independent references count more than vendor-provided.

Drafting

The reviewer writes the review against the seven-dimension rubric. The score is computed as the weighted average and rounded to one decimal place.
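The weighted-average step can be sketched as follows. This is a minimal illustration rather than the site's actual tooling. The weights are the published ones (coverage and detection at 20% each, deployment friction and policy depth at 15% each, the remaining three at 10% each), while the function name, dictionary keys, and sample scores are hypothetical.

```python
# Published rubric weights, in percent. They sum to 100.
WEIGHTS = {
    "coverage_breadth": 20,
    "detection_accuracy": 20,
    "deployment_friction": 15,
    "policy_control_depth": 15,
    "framework_alignment": 10,
    "pricing_transparency": 10,
    "support_documentation": 10,
}


def overall_score(scores: dict[str, int]) -> float:
    """Weighted average of the seven 1-10 dimension scores, rounded to one decimal."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the seven rubric dimensions")
    for name, value in scores.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return round(weighted_sum / 100, 1)


# Hypothetical product; the scores are invented for illustration only.
example = {
    "coverage_breadth": 8,
    "detection_accuracy": 7,
    "deployment_friction": 6,
    "policy_control_depth": 8,
    "framework_alignment": 5,
    "pricing_transparency": 4,
    "support_documentation": 7,
}
print(overall_score(example))  # 6.7
```

Storing the weights as whole percentages keeps the arithmetic easy to check by hand: here the weighted sum is 670, which divides to 6.7.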

Fact-check

The vendor receives the review for fact-check before publication, limited to factual errors. We do not negotiate framing, score, or open questions.

Publication

The review is published with datePublished and dateModified timestamps.
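The names datePublished and dateModified match properties defined by schema.org. Assuming the timestamps are exposed as schema.org Review markup in JSON-LD (this page does not say how they are emitted), a record could be built roughly like this. The product name, score, dates, and function name below are hypothetical.

```python
import json
from datetime import date


def review_jsonld(product_name: str, score: float, published: date, modified: date) -> str:
    """Build a schema.org Review record as a JSON-LD string (assumed output format)."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    record = {
        "@context": "https://schema.org",
        "@type": "Review",
        "itemReviewed": {"@type": "SoftwareApplication", "name": product_name},
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": score,
            "bestRating": 10,   # rubric scores run from 1 to 10
            "worstRating": 1,
        },
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return json.dumps(record, indent=2)


# Hypothetical review, invented for illustration only.
markup = review_jsonld("ExampleGuard", 6.7, date(2026, 1, 15), date(2026, 4, 2))
print(markup)
```

Updating dateModified on every re-test keeps the machine-readable date in step with the visible “Last updated” line.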

Maintenance

Reviews are updated ad hoc when material changes occur. The Best Of list is refreshed annually.

How we treat open questions

Every category has questions where the answer is not yet public. We mark these as “open question” rather than guess. Common open-question patterns:

Independent benchmarks

The AI security category does not yet have an independent equivalent of MITRE ATT&CK Evaluations. We rely on vendor-published benchmarks and customer reference experience until that changes.

Pricing

Many vendors price by quote only at the enterprise tier. We score the transparency, not the absolute price; published pricing wins on this dimension.

Framework mapping documents

We expect vendors to publish ISO 42001 and EU AI Act mappings as those frameworks mature. Where mappings are not yet published, we ask the vendor and note the answer.

Roadmap items

We do not score against announced roadmap items. Reviews reflect the product as of the review date, not what is promised for the next quarter.

How scoring weights were chosen

The weights reflect the questions security buyers actually ask in evaluations, not the questions vendors prefer to answer.

Coverage breadth and detection accuracy together carry 40% of the score because, in practice, those are the first two questions every buyer asks: “what does it cover” and “how well does it work.” Deployment friction and policy depth carry 30% combined because the post-purchase experience determines whether the program actually launches.

Framework alignment is 10% rather than higher because frameworks change more slowly than products, and a strong product without a published mapping document can still serve a framework-aligned program. Pricing transparency is 10% to reward vendors that publish prices without unduly penalizing otherwise strong products whose sales motion is enterprise-only. Support is 10% because, in our reference interviews, support quality is among the top three predictors of program success but is hard to evaluate before purchase.

Other weighting schemes are defensible. We chose this one and made it public so buyers can argue with us specifically.

FAQ

Why these seven dimensions?

They reflect the questions security buyers actually ask in evaluations: what does it cover, how well does it detect, how hard is it to deploy, how granular are the controls, does it map to my framework requirements, can I budget for it, and will the vendor support me. Other dimensions matter — vendor financial health, geography, language coverage — but they are secondary inputs to the buyer-fit narrative, not scoring categories.

Why is pricing transparency 10%?

Buyers consistently rank it among the top three frustrations in this category. We weight it enough to reward published pricing, but not so much that an otherwise-strong product is buried because its sales motion is enterprise-only.

Do you test in a lab?

Yes — for products where the vendor grants access. The Cyber Security Services lab runs the standard scenarios documented above against any product whose vendor agrees to participate. AILeakShield is currently the only product in our lab; outreach is pending with the rest of the reviewed vendors. Lab-tested reviews carry a 🧪 badge; demo-evaluated reviews carry a 📺 badge with ⏳ Outreach Pending until the vendor responds.

Can a vendor disagree with its product's score?

Yes. We accept written rebuttals and publish a short response in the review’s footer where the disagreement is substantive.
