Data poisoning isn’t a future threat. It’s already reshaping how AI systems learn — and the implications for enterprise software are more consequential than anyone is admitting.
There is a foundational assumption baked into nearly every enterprise AI project underway right now: that the model being deployed is trustworthy because it was trained on good data. Security teams worry about who can access the model. Compliance teams worry about what the model outputs. Almost nobody is asking who shaped what the model learned in the first place.
That assumption deserves to be pressure-tested. Urgently.
The Scale Illusion
For most of the past decade, the prevailing view in AI security was that data poisoning — the deliberate corruption of training data to manipulate a model’s behaviour — was a theoretical concern most relevant to small, narrow models. Large foundation models trained on hundreds of billions of tokens, the argument went, would be inherently resistant. You couldn’t meaningfully skew a model that had read most of the internet.
That argument is now empirically broken.
In October 2025, researchers from the Alan Turing Institute, working in collaboration with the AI Security Institute and Anthropic, published what they described as the largest investigation of data poisoning conducted to date. The finding was stark: the number of malicious documents required to successfully embed a backdoor in an LLM was approximately 250 — regardless of whether the model had 600 million parameters or 13 billion. Model size, it turned out, offered essentially no additional protection.
What this means in practice is worth sitting with. An attacker who can publish 250 carefully crafted web pages, forum posts, or Wikipedia edits has a plausible path to embedding persistent, triggerable behaviour into any LLM trained on public internet data. The attack surface isn’t a server or an API endpoint. It’s the open web itself — and it has been accumulating malicious content for years.
The number of malicious documents required to poison a model was near-constant — around 250 — regardless of model size. This directly contradicts the assumption that larger AI systems are inherently more resistant to manipulation.
The Persistence Problem
To understand why this matters more than conventional AI security threats, it helps to think carefully about the difference between a prompt injection attack and a data poisoning attack. Prompt injection is a runtime problem: an attacker feeds malicious instructions into a live model to override its immediate behaviour. It’s dangerous, but it is also transient and, in principle, detectable. The model behaves oddly in the moment. Logs exist.
Data poisoning is different in kind, not just degree. The malicious instruction isn’t delivered at runtime — it is baked into the model’s weights during training, creating what researchers call a backdoor: a dormant behaviour that activates only when a specific trigger phrase or condition is met. The model passes every standard benchmark. It performs well on evaluation sets. It looks, by every conventional measure, exactly like a well-behaved system — until it isn’t.
The medical AI research published in the Journal of Medical Internet Research in January 2026, synthesising findings from 41 security studies across NeurIPS, ICML, and Nature Medicine, puts hard numbers on the detection problem. Detection delays for poisoning attacks commonly range from six to twelve months, and may extend to years in federated or privacy-constrained environments. The attack does not announce itself. It waits.
Perhaps most unsettling: the research found that attack success depends on the absolute number of poisoned samples rather than their proportion of the training corpus. There is no safety in scale. An organisation that assumes its risk is mitigated because it trains on large datasets is operating on a false model of the threat.
The Expanding Attack Surface
If the threat were confined to foundation model training — something only OpenAI, Google, and Anthropic need to worry about — this would be consequential but at least contained. It isn’t contained.
Lakera’s 2026 threat landscape overview documents something that should recalibrate how every enterprise thinks about its AI infrastructure. In 2025, poisoning attacks expanded beyond training pipelines to target three new vectors: retrieval-augmented generation (RAG) systems, third-party tool integrations including MCP servers, and synthetic data pipelines used to generate training data at scale.
The RAG vector is particularly important for enterprise deployments. A RAG system works by retrieving relevant documents at runtime to augment a model’s response. If those documents — the knowledge base, the document store, the SharePoint index — contain poisoned content, every query that retrieves that content is compromised. This isn’t a training-time problem. It’s an ongoing, live exposure that grows as the document corpus grows.
The synthetic data vector is even more troubling in the long run. The so-called Virus Infection Attack, benchmarked at ICML 2025, demonstrated that poisoned content can propagate through synthetic data generation pipelines — meaning that a single corrupted source, passed through a data augmentation or distillation step, can produce thousands of corrupted training examples. Poisoning, in this model, is not just persistent. It is self-replicating.
Check Point’s 2026 Tech Tsunami report calls prompt injection and data poisoning the “new zero-day” threats in AI systems. Unlike a software CVE, there is no patch. Maintaining model integrity becomes a continuous operational discipline.
The Agentic Multiplier
There is a timing dimension to this problem that makes 2026 a particularly critical inflection point. For most of the past three years, enterprise AI deployments have been largely assistive: models that answer questions, summarise documents, draft text. A poisoned model in this configuration is dangerous, but human oversight creates a natural circuit-breaker. Someone reads the output before anything consequential happens.
That circuit-breaker is being systematically removed. Agentic AI — systems that can make decisions, execute workflows, and interact with external services without human review of each step — is transitioning from pilot to production across financial services, healthcare, government, and logistics. Analysts broadly agree that 2026 marks the mainstreaming of this shift.
The consequence for data poisoning risk is non-linear. A backdoor embedded in an agentic AI doesn’t just produce a bad answer that a human can catch. It executes a bad action — allocates resources, approves a transaction, triggers an API call, modifies a record — before any oversight occurs. As one security researcher framed it, when something goes wrong in an agentic system, a single introduced error can propagate through the entire pipeline and corrupt it. The attack surface and the blast radius both expand simultaneously.
What Responsible Deployment Actually Requires
The enterprise response to this threat has so far been inadequate — not because organisations lack good intentions, but because the conventional security playbook doesn’t map cleanly onto the problem. You cannot patch a poisoned model. You cannot firewall your way out of a corrupted training pipeline. The controls have to be upstream, continuous, and architectural.
The JMIR research framework points toward what rigorous defence looks like in practice: ensemble disagreement monitoring, where multiple models cross-check each other for divergent outputs that might indicate backdoor activation; adversarial red teaming specifically designed to probe for trigger-conditioned behaviour; data provenance controls that can trace every training document back to a verifiable source; and governance requirements that treat model integrity as an ongoing audit obligation rather than a one-time deployment check.
Fortinet’s analysis of the threat landscape adds an important regulatory dimension: OWASP’s 2025 Top 10 for LLM Applications now formally classifies data and model poisoning as a recognised integrity attack category, with particular emphasis on external data sources and open-source repositories. NIST’s adversarial ML taxonomy and ENISA’s AI Cybersecurity Challenges report both flag supply chain risk as a primary concern. The regulatory framing is catching up to the technical reality — but organisations that wait for regulation to force the issue will have already absorbed the exposure.
The fundamental strategic reframe required here is this: AI trustworthiness is not a property of the model at the moment of deployment. It is a property of the entire data supply chain, maintained continuously, over the full operational lifetime of the system. Organisations that build their AI governance around deployment-time checks are solving for the wrong moment.
The curriculum that shapes how a model behaves is written long before anyone asks it a question. The question for every enterprise deploying AI in 2026 is whether they know who wrote it.


