AI Red Teaming - Enterprise SaaS Security

Active governance.
Eight critical failures.

The platform had GRC tooling, enterprise contracts in the hundreds of millions, and an active compliance motion. It also had a free-trial user who could download every customer's AI-generated intelligence report by iterating a single integer.

Client profile
Enterprise AI-enabled sales intelligence platform
Customer profile
Enterprise buyers - smallest customer at $1B+ revenue
Compliance baseline
Active GRC motion with third-party compliance tooling in use
Assessment scope
5 attack categories - 17 confirmed vulnerability classes
Executive Brief
Compliance tooling is not AI security
  • GRC tooling was active. Enterprise buyers were already enforcing security rigor. The platform's compliance posture, by conventional measures, was reasonable.
  • A single authenticated user - including a free trial account - could retrieve AI-generated research reports belonging to any other customer by iterating a sequential integer. No ownership check. 95 reports obtained in one session.
  • The AI's verbatim system prompt - its confidential operating instructions - was returned to any user who knew to ask for it. This is a map of every AI constraint the platform had. Once extracted, every control built on "the AI won't do X" becomes renegotiable.
  • Attacker-controlled instructions could be written to the AI's persistent configuration. These instructions survived session logout and applied to all future AI interactions for the affected account.
  • The AI generated and published fabricated compliance certifications - SOC 2, ISO 27001, GDPR - with zero content review and no human approval. One API call. Externally shareable.
8
Critical or High severity
confirmed findings
95
Customer PDF reports
obtained via BOLA
4
Critical AI-specific
findings (OWASP LLM)
4
Regulatory frameworks
with confirmed violations

Any authenticated user could download
every customer's AI research library.

The platform's core value proposition is AI-generated research reports on sales target accounts - built over months of AI analysis, news monitoring, and stakeholder profiling. The report download endpoint accepted any account ID without checking whether the requesting user owned that account.

Broken Object-Level Authorization - Confirmed Exploited
Report download endpoint performed no ownership check - 95 PDF reports retrieved across 201 probed accounts

Account IDs were sequential integers. By iterating through a range of IDs and calling the report download endpoint for each, we obtained working download URLs for 95 customer accounts - every account that had generated a report. The request required a valid session token but imposed no check that the token owner had rights to the requested account.

The download URLs embedded temporary cloud storage credentials with a 7-day expiry. This means the exposure window for each exfiltrated link persisted well beyond the discovery session.

What this means operationally: An attacker with a free trial account could spend 10 minutes iterating account IDs and collect the equivalent of years of enterprise sales intelligence on hundreds of companies - all of it the platform's paying customers' proprietary research output.

Precedent: The 2021 Peloton API exposure and 2023 Trello enumeration breach followed the same pattern: a valid session, a predictable identifier, and no ownership check. OWASP classifies this as Broken Object-Level Authorization (BOLA/IDOR) - the most common and most impactful API vulnerability class.

The AI's operating instructions were readable
by any user who asked for them.

The system prompt is the confidential instruction set that governs all AI behavior - what it refuses to do, what data it has access to, how it scopes responses by tenant. We obtained the verbatim system prompt via a direct conversational query. No exploit required. Any user could do this.

System Prompt Extraction - Confirmed - OWASP LLM10
Verbatim role definition, constraint instructions, and tenant scoping rules extracted via chat interface

The extracted content included the AI's primary role definition, its operational goals, the full set of behavioral constraints, and the tenant-scoping logic that governed which data the AI was permitted to surface. All of it returned in a single conversational response.

Why this matters as a precursor attack: System prompt extraction is not typically an end in itself. Its value is as a preparation step. An attacker who knows every constraint the AI operates under - including exactly how refusals are phrased - can craft subsequent prompts that specifically target the gaps. Security researchers have demonstrated that knowing an AI's exact refusal logic reduces the effort to bypass it by an order of magnitude.

Precedent: In 2023, Samsung engineers inadvertently shared proprietary source code and internal AI instructions via ChatGPT, demonstrating that organizations treat system prompts as sensitive operational assets for good reason. In this platform's case, the asset was publicly readable to all authenticated users.

Attacker instructions written to the AI's persistent config
survived session logout.

The platform allowed users to set persistent preferences that were included in every subsequent AI interaction. We confirmed that attacker-controlled instructions could be written to this store through normal authenticated access - and that they remained active after logging out and back in.

Persistent Prompt Injection - Confirmed - OWASP LLM01 - LLM08
Attacker-written instructions persisted across sessions and influenced all subsequent AI responses for the affected account

The attack path required no elevated privilege. A standard authenticated session was sufficient to write arbitrary instructions to the preference store. On the next login, those instructions were included in every AI context window - silently, with no user-facing indication they were present.

We confirmed persistence by: writing a canary instruction in one session, logging out completely, logging in fresh, and verifying that the canary influenced a new AI response without any explicit reference in the new conversation.

What this enables: An attacker with access to a compromised or insider account can permanently alter the AI's behavior for that account - causing it to include specific messaging in all outputs, exfiltrate context to attacker-controlled endpoints via crafted links, or suppress safety controls for all future sessions. The compromise persists until the preference store is explicitly inspected and cleared.

Precedent: Greshake et al. (2023) demonstrated that web content retrieved by Bing Chat could embed instructions that caused the AI to exfiltrate conversation history - the same indirect injection class, but in this case the injection was persistent rather than session-scoped.

The AI generated and published fraudulent
compliance certifications. No review. No approval.

The platform generates AI-powered presentations. Those presentations can be published and shared with external recipients via the platform API. We demonstrated that fabricated compliance certifications - SOC 2, ISO 27001, GDPR - could be packaged into a presentation, published without content review, and made externally shareable in three API calls.

AI-Generated Compliance Fraud - Confirmed - OWASP LLM02 - LLM09
Fabricated SOC 2, ISO 27001, and GDPR certifications published through the platform with zero content moderation and no human approval gate

The attack was three steps: (1) create a presentation with a context payload directing the AI to include specific fabricated certifications, (2) force the presentation to PUBLISHED status via direct API call, (3) force to SHARED status via direct API call. All three steps returned HTTP 200. No content review ran at any stage. No human approved the output.

The certificates cited were entirely invented - including fabricated certification numbers and validity dates. The finished artifact, delivered through the platform's official sharing mechanism, bore no indication that the certifications were AI-generated or unverified.

Why this creates direct legal exposure: The platform sells to compliance-conscious enterprise buyers. A presentation delivered through its official channel carries implicit authority. A recipient who receives a "Compliance Posture" deck generated by the platform's AI has no mechanism to distinguish fabricated certifications from real ones. This is platform-level fraud enablement - actionable under GDPR Art. 5(1)(d) (accuracy principle) and financial services fraud statutes in multiple jurisdictions.

The platform was also one API call away from being the origin of an enterprise deal based on false compliance attestation. For a company whose customers are procurement teams evaluating vendor compliance, this finding is existential.

The rest of the assessment confirmed
how AI risk compounds across the stack.

Beyond the four headline findings, the assessment produced nine additional confirmed or high-confidence vulnerabilities across data access, AI infrastructure, and agent orchestration.

Cross-User PII Access
High
LinkedIn profile data accessible across user boundaries

Personal data of real individuals - names, titles, employers, professional skills - stored in the platform's persona store was readable for any user ID via parameter substitution. Data subjects had no knowledge their data was held, let alone that it was accessible to other platform users. Raises GDPR Art. 32 and India DPDPA Sec. 8(7) processor obligations.

RAG Poisoning
High
Arbitrary file injection into the AI retrieval knowledge base

Document upload to the AI's knowledge retrieval store was confirmed with no upload-time inspection. Malicious content embedded in a document executes when retrieved - the AI reads the document, processes the embedded instructions, and outputs attacker-controlled content to every user who triggers the relevant retrieval path.

Agent Workflow Bypass
High
Human review gate bypassable via direct API parameter

The platform's workflow engine accepted require_human_review: false and status: APPROVED via direct API calls with no server-side enforcement. Human oversight was architectural theatre: the gate existed in the UI but was absent at the API layer. This is a direct EU AI Act Art. 14 violation for systems requiring human oversight.

Infrastructure Enumeration
High
All internal agent names enumerable, including cross-account scope agents

The platform's internal AI agent architecture was fully enumerable. Agent identifiers included names that explicitly indicated their data access scope - including an agent whose name implied cross-account data access. Enumeration of internal AI architecture accelerates targeted attacks by eliminating trial-and-error in attack path design.

Tenant Boundary Failure
High
Session endpoints returned HTTP 200 for cross-tenant user IDs

AI session endpoints accepted user IDs from different tenants without returning an authorization error. An authenticated user in tenant A could reference user IDs from tenant B in API calls and receive a successful response. The tenant boundary was enforced in routing but absent at the authorization layer.

No Rate Limiting
High
Token-flooding and bulk generation workloads accepted without throttling

AI generation endpoints accepted arbitrarily large context payloads and bulk processing requests with no per-user or per-tenant quota enforcement. A single user could trigger generation workloads orders of magnitude beyond normal use - creating a viable denial-of-wallet attack path with no cost controls in place.

Holistic assessment also showed where
existing controls were working.

This was not a platform with no security. Most scenarios were blocked. The pattern was specific: traditional security controls performed well; AI-specific and AI-adjacent controls had gaps.

JWT Integrity
JWT confusion attacks were blocked

Algorithm confusion variants and claim-escalation token tests returned unauthorized responses, indicating correct signature validation on all tested endpoints.

Injection Resistance
Encoding-obfuscated prompt attacks did not land

Base64, URL-encoded, homoglyph, zero-width, and ROT13 injection variants did not trigger canary disclosure - a meaningful positive signal for the platform's prompt filtering layer.

Telemetry
Unauthenticated telemetry injection was rejected

Event forgery and signal pollution attempts were rejected at the telemetry ingestion layer, limiting the risk of unauthenticated observability manipulation.

Admin Surface
No hidden admin or debug routes exposed

Admin endpoint probing, GraphQL introspection attempts, and version-drift discovery tests all returned 403 or 404 - no accidental exposure of privileged surfaces.

SSRF
Server-side request forgery path was rejected

An SSRF-style probe via a platform input accepting external URLs was rejected as invalid input rather than fetched server-side - correctly treated as an attack vector.

Session Controls
Copilot session replay blocked on most routes

Cross-tenant session replay attempts were blocked on the copilot session routes that had proper ownership checks - these controls were present and effective where implemented.

Full findings mapped to OWASP LLM Top 10
and MITRE ATLAS.

Every confirmed finding mapped to the industry standard frameworks enterprise procurement will reference.

Data Access - BOLA / IDOR
Broken object-level authorization on report download endpoint

Report download endpoint accepted any account ID without ownership check. Sequential account IDs enabled full enumeration. 95 confirmed PDF retrievals.

Critical - OWASP API3 - GDPR Art. 32
AI System - LLM10
System prompt extraction via direct conversational query

Verbatim role definition, constraint instructions, and tenant scoping rules returned via chat interface. No authentication escalation required.

Critical - OWASP LLM10 - MITRE ATLAS AML.T0053
AI System - LLM01
Persistent prompt injection via preference store

Attacker-controlled instructions written to persistent AI configuration survived session logout. Influence confirmed via canary instruction persistence test.

Critical - OWASP LLM01 - LLM08
AI System - LLM02 - LLM09
AI-generated fraudulent compliance certifications published with no review

Fabricated SOC 2, ISO 27001, and GDPR certifications packaged into a presentation, forced to PUBLISHED status, and made externally shareable via three API calls with zero content moderation.

Critical - OWASP LLM02 - LLM09 - GDPR Art. 5(1)(d)
Data Access - LLM06
Cross-user PII access - LinkedIn profile data

Personal data of real individuals (names, titles, employers) in the platform's persona store readable for any user ID via parameter substitution. GDPR Art. 32 / DPDPA Sec. 8(7) exposure.

High - OWASP LLM06 - GDPR Art. 32
AI System - LLM03
RAG knowledge base poisoning via arbitrary file upload

Arbitrary file injection into the AI retrieval store confirmed. No upload-time inspection. Malicious documents executable on retrieval - affecting all users who trigger the relevant path.

High - OWASP LLM03 - MITRE ATLAS AML.T0017

The remediation roadmap: full-chain
risk reduction for AI systems.

Findings this deep require a structured remediation sequence - not a patch list. The recommendations were organized by attack surface and impact priority.

Immediate Priority
Enforce server-side ownership on every AI artifact endpoint

Every report download, session lookup, and generated artifact access must validate that the requesting user's session owns the requested object. The account ID in the request must match the token. Presigned URL generation must run through the same ownership check.

Immediate Priority
Add output content governance before AI-generated artifacts are published

No AI-generated content should reach PUBLISHED or SHARED state without a content moderation pass. For high-risk content categories (compliance claims, certifications, factual assertions), require human review as a non-bypassable server-side gate - not a UI-layer check.

Next Priority
Harden AI context boundaries and preference store access

Restrict what can be written to persistent AI configuration stores. Validate and sanitize preference inputs against an allowlist of safe instruction patterns. Log all preference writes as a security event. Add inspection tooling so the preference store state can be audited on demand.

If you sell AI into enterprise accounts,
this is your threat model.

This assessment pattern applies to any AI platform that generates customer-specific artifacts, processes personal data, and sells into compliance-conscious enterprise buyers.

Multi-tenant AI platforms with per-customer artifacts

Research reports, briefs, presentations, generated profiles - any AI output whose business value depends on per-customer data isolation. BOLA on AI-generated artifacts is the most consistently confirmed finding in AI assessments.

Security teams facing enterprise security reviews

Especially where you have compliance tooling and governance documentation, but haven't adversarially tested the AI-specific attack surface. GRC tooling tells you what you've attested to. Red-teaming tells you what an attacker actually finds.

AI platforms with agent workflow automation

If your platform includes workflow approval logic, AI-driven state machines, or human-in-the-loop review gates - those gates must be enforced at the API layer, not the UI layer. UI-layer controls are not security controls.

Founders preparing for regulated enterprise procurement

The output of an adversarial assessment - confirmed findings, blocked controls, OWASP/MITRE-mapped findings, a remediation roadmap - is the document that answers security questionnaires. Not a policy. Evidence.

Find your critical findings
before your enterprise customer does.

Every AI platform we've assessed has had at least one finding that would have ended an enterprise deal. Usually more. 20 minutes is enough to scope whether your AI product has exposure we can find.

Book a 20-min fit call