The platform had GRC tooling, enterprise contracts in the hundreds of millions, and an active compliance motion. It also had a free-trial user who could download every customer's AI-generated intelligence report by iterating a single integer.
The platform's core value proposition is AI-generated research reports on sales target accounts - built over months of AI analysis, news monitoring, and stakeholder profiling. The report download endpoint accepted any account ID without checking whether the requesting user owned that account.
Account IDs were sequential integers. By iterating through a range of IDs and calling the report download endpoint for each, we obtained working download URLs for 95 customer accounts - every account that had generated a report. The request required a valid session token but imposed no check that the token owner had rights to the requested account.
The download URLs embedded temporary cloud storage credentials with a 7-day expiry. This means the exposure window for each exfiltrated link persisted well beyond the discovery session.
What this means operationally: An attacker with a free trial account could spend 10 minutes iterating account IDs and collect the equivalent of years of enterprise sales intelligence on hundreds of companies - all of it the platform's paying customers' proprietary research output.
Precedent: The 2021 Peloton API exposure and 2023 Trello enumeration breach followed the same pattern: a valid session, a predictable identifier, and no ownership check. OWASP classifies this as Broken Object-Level Authorization (BOLA/IDOR) - the most common and most impactful API vulnerability class.
The system prompt is the confidential instruction set that governs all AI behavior - what it refuses to do, what data it has access to, how it scopes responses by tenant. We obtained the verbatim system prompt via a direct conversational query. No exploit required. Any user could do this.
The extracted content included the AI's primary role definition, its operational goals, the full set of behavioral constraints, and the tenant-scoping logic that governed which data the AI was permitted to surface. All of it returned in a single conversational response.
Why this matters as a precursor attack: System prompt extraction is not typically an end in itself. Its value is as a preparation step. An attacker who knows every constraint the AI operates under - including exactly how refusals are phrased - can craft subsequent prompts that specifically target the gaps. Security researchers have demonstrated that knowing an AI's exact refusal logic reduces the effort to bypass it by an order of magnitude.
Precedent: In 2023, Samsung engineers inadvertently shared proprietary source code and internal AI instructions via ChatGPT, demonstrating that organizations treat system prompts as sensitive operational assets for good reason. In this platform's case, the asset was publicly readable to all authenticated users.
The platform allowed users to set persistent preferences that were included in every subsequent AI interaction. We confirmed that attacker-controlled instructions could be written to this store through normal authenticated access - and that they remained active after logging out and back in.
The attack path required no elevated privilege. A standard authenticated session was sufficient to write arbitrary instructions to the preference store. On the next login, those instructions were included in every AI context window - silently, with no user-facing indication they were present.
We confirmed persistence by: writing a canary instruction in one session, logging out completely, logging in fresh, and verifying that the canary influenced a new AI response without any explicit reference in the new conversation.
What this enables: An attacker with access to a compromised or insider account can permanently alter the AI's behavior for that account - causing it to include specific messaging in all outputs, exfiltrate context to attacker-controlled endpoints via crafted links, or suppress safety controls for all future sessions. The compromise persists until the preference store is explicitly inspected and cleared.
Precedent: Greshake et al. (2023) demonstrated that web content retrieved by Bing Chat could embed instructions that caused the AI to exfiltrate conversation history - the same indirect injection class, but in this case the injection was persistent rather than session-scoped.
The platform generates AI-powered presentations. Those presentations can be published and shared with external recipients via the platform API. We demonstrated that fabricated compliance certifications - SOC 2, ISO 27001, GDPR - could be packaged into a presentation, published without content review, and made externally shareable in three API calls.
The attack was three steps: (1) create a presentation with a context payload directing the AI to include specific fabricated certifications, (2) force the presentation to PUBLISHED status via direct API call, (3) force to SHARED status via direct API call. All three steps returned HTTP 200. No content review ran at any stage. No human approved the output.
The certificates cited were entirely invented - including fabricated certification numbers and validity dates. The finished artifact, delivered through the platform's official sharing mechanism, bore no indication that the certifications were AI-generated or unverified.
Why this creates direct legal exposure: The platform sells to compliance-conscious enterprise buyers. A presentation delivered through its official channel carries implicit authority. A recipient who receives a "Compliance Posture" deck generated by the platform's AI has no mechanism to distinguish fabricated certifications from real ones. This is platform-level fraud enablement - actionable under GDPR Art. 5(1)(d) (accuracy principle) and financial services fraud statutes in multiple jurisdictions.
The platform was also one API call away from being the origin of an enterprise deal based on false compliance attestation. For a company whose customers are procurement teams evaluating vendor compliance, this finding is existential.
Beyond the four headline findings, the assessment produced nine additional confirmed or high-confidence vulnerabilities across data access, AI infrastructure, and agent orchestration.
Personal data of real individuals - names, titles, employers, professional skills - stored in the platform's persona store was readable for any user ID via parameter substitution. Data subjects had no knowledge their data was held, let alone that it was accessible to other platform users. Raises GDPR Art. 32 and India DPDPA Sec. 8(7) processor obligations.
Document upload to the AI's knowledge retrieval store was confirmed with no upload-time inspection. Malicious content embedded in a document executes when retrieved - the AI reads the document, processes the embedded instructions, and outputs attacker-controlled content to every user who triggers the relevant retrieval path.
The platform's workflow engine accepted require_human_review: false and status: APPROVED via direct API calls with no server-side enforcement. Human oversight was architectural theatre: the gate existed in the UI but was absent at the API layer. This is a direct EU AI Act Art. 14 violation for systems requiring human oversight.
The platform's internal AI agent architecture was fully enumerable. Agent identifiers included names that explicitly indicated their data access scope - including an agent whose name implied cross-account data access. Enumeration of internal AI architecture accelerates targeted attacks by eliminating trial-and-error in attack path design.
AI session endpoints accepted user IDs from different tenants without returning an authorization error. An authenticated user in tenant A could reference user IDs from tenant B in API calls and receive a successful response. The tenant boundary was enforced in routing but absent at the authorization layer.
AI generation endpoints accepted arbitrarily large context payloads and bulk processing requests with no per-user or per-tenant quota enforcement. A single user could trigger generation workloads orders of magnitude beyond normal use - creating a viable denial-of-wallet attack path with no cost controls in place.
This was not a platform with no security. Most scenarios were blocked. The pattern was specific: traditional security controls performed well; AI-specific and AI-adjacent controls had gaps.
Algorithm confusion variants and claim-escalation token tests returned unauthorized responses, indicating correct signature validation on all tested endpoints.
Base64, URL-encoded, homoglyph, zero-width, and ROT13 injection variants did not trigger canary disclosure - a meaningful positive signal for the platform's prompt filtering layer.
Event forgery and signal pollution attempts were rejected at the telemetry ingestion layer, limiting the risk of unauthenticated observability manipulation.
Admin endpoint probing, GraphQL introspection attempts, and version-drift discovery tests all returned 403 or 404 - no accidental exposure of privileged surfaces.
An SSRF-style probe via a platform input accepting external URLs was rejected as invalid input rather than fetched server-side - correctly treated as an attack vector.
Cross-tenant session replay attempts were blocked on the copilot session routes that had proper ownership checks - these controls were present and effective where implemented.
Every confirmed finding mapped to the industry standard frameworks enterprise procurement will reference.
Report download endpoint accepted any account ID without ownership check. Sequential account IDs enabled full enumeration. 95 confirmed PDF retrievals.
Verbatim role definition, constraint instructions, and tenant scoping rules returned via chat interface. No authentication escalation required.
Attacker-controlled instructions written to persistent AI configuration survived session logout. Influence confirmed via canary instruction persistence test.
Fabricated SOC 2, ISO 27001, and GDPR certifications packaged into a presentation, forced to PUBLISHED status, and made externally shareable via three API calls with zero content moderation.
Personal data of real individuals (names, titles, employers) in the platform's persona store readable for any user ID via parameter substitution. GDPR Art. 32 / DPDPA Sec. 8(7) exposure.
Arbitrary file injection into the AI retrieval store confirmed. No upload-time inspection. Malicious documents executable on retrieval - affecting all users who trigger the relevant path.
Findings this deep require a structured remediation sequence - not a patch list. The recommendations were organized by attack surface and impact priority.
Every report download, session lookup, and generated artifact access must validate that the requesting user's session owns the requested object. The account ID in the request must match the token. Presigned URL generation must run through the same ownership check.
No AI-generated content should reach PUBLISHED or SHARED state without a content moderation pass. For high-risk content categories (compliance claims, certifications, factual assertions), require human review as a non-bypassable server-side gate - not a UI-layer check.
Restrict what can be written to persistent AI configuration stores. Validate and sanitize preference inputs against an allowlist of safe instruction patterns. Log all preference writes as a security event. Add inspection tooling so the preference store state can be audited on demand.
This assessment pattern applies to any AI platform that generates customer-specific artifacts, processes personal data, and sells into compliance-conscious enterprise buyers.
Research reports, briefs, presentations, generated profiles - any AI output whose business value depends on per-customer data isolation. BOLA on AI-generated artifacts is the most consistently confirmed finding in AI assessments.
Especially where you have compliance tooling and governance documentation, but haven't adversarially tested the AI-specific attack surface. GRC tooling tells you what you've attested to. Red-teaming tells you what an attacker actually finds.
If your platform includes workflow approval logic, AI-driven state machines, or human-in-the-loop review gates - those gates must be enforced at the API layer, not the UI layer. UI-layer controls are not security controls.
The output of an adversarial assessment - confirmed findings, blocked controls, OWASP/MITRE-mapped findings, a remediation roadmap - is the document that answers security questionnaires. Not a policy. Evidence.
Every AI platform we've assessed has had at least one finding that would have ended an enterprise deal. Usually more. 20 minutes is enough to scope whether your AI product has exposure we can find.
Book a 20-min fit call