Where your prompts go: what the big AI vendors keep, and what changed

When you send a prompt to a hosted AI product, two things happen that you cannot see: the vendor decides how long to keep it, and the vendor decides whether to learn from it. Both answers depend on the exact tier you are using, and both have changed in the last two years. The short version is that consumer products keep more for longer and may train on your text by default, while enterprise and API tiers keep less, do not train by default, and can be contracted down to near zero.

This post maps what the three largest vendors (Anthropic, OpenAI, and Google) actually do with prompts, outputs, and uploaded files in 2024 to 2026, citing each vendor’s own policy page. Then it covers the enterprise takeaway: what to verify in a vendor’s data processing agreement, and why regulated teams increasingly keep the sensitive prompts inside their own boundary instead.

Key takeaways

Consumer and enterprise tiers are different products. The same model can train on your text on the free tier and never see it on the enterprise tier.
Anthropic changed its consumer default in 2025: opt in to training and retention runs up to 5 years, opt out and it stays at 30 days. Commercial and API use was not affected.
By default OpenAI and Google do not train on API or enterprise data, and retain it for a short abuse-monitoring window (about 30 days for OpenAI, configurable to zero on both).
A US court briefly ordered OpenAI to preserve consumer and standard-API logs in the NYT case. Zero-data-retention API and enterprise customers were exempt. That is the thesis in one event.
In a DPA, verify four things: zero retention, no training, named sub-processors, and a fixed processing region.

Why does the tier matter more than the vendor?

The most common mistake is to ask “does Vendor X train on my data” as if there is one answer. There is not. Every major vendor runs at least two very different regimes under the same brand.

The consumer regime covers free and individual paid chat apps. Here the vendor’s incentive is to improve the model from real usage, so the default often leans toward longer retention and, in some cases, training on your inputs unless you opt out. The commercial regime covers API access and enterprise or business plans. Here the customer is usually another company with a procurement and security team, so the default is no training, a short retention window for abuse monitoring, and the option to contract retention down further.

This split is the single most important fact in the whole topic. A clinician pasting a patient summary into a free chat app and an engineer calling the same model through an enterprise API are using the same weights under completely different data terms.

What does Anthropic keep, and what changed in 2025?

Anthropic’s most notable change is on the consumer side. In August 2025 it updated its consumer terms and privacy policy. Previously, consumer chats were not used for model training and were generally deleted within 30 days. Under the update, users on the Free, Pro, and Max plans (including Claude Code from those accounts) are asked to choose whether their new or resumed chats and coding sessions can be used to train Anthropic’s models.

The trade is explicit. If you allow training, retention extends to up to 5 years. If you decline, you stay on the 30-day window and your chats are not used for training. Existing users had to make a selection to continue using Claude, with a deadline of October 8, 2025. Anthropic’s stated reason for the longer window is that model development cycles span years, so training data needs to outlive a single release.

The steelman for the vendor is straightforward: large models are trained on real interactions, and a clear opt out plus a visible deadline is more honest than quietly changing terms. The legitimate enterprise concern is equally clear: defaults govern behavior, and any consumer app on a corporate device is now a potential five-year retention path for whatever an employee pastes in.

What does OpenAI keep across consumer, API, and enterprise?

OpenAI draws the same consumer-versus-commercial line. Its enterprise privacy commitments state that data submitted through the API, ChatGPT Enterprise, and ChatGPT Team is not used to train OpenAI’s models. On the developer side, OpenAI’s data controls documentation confirms that, as of March 1, 2023, data sent to the API is not used to train models unless you explicitly opt in, and that abuse-monitoring logs are retained for up to 30 days.

For customers who need less than 30 days, OpenAI offers Zero Data Retention (ZDR) on eligible endpoints, which excludes customer content from abuse-monitoring logs and forces the store parameter to false. ZDR is not automatic. Per the same documentation, it is subject to prior approval by OpenAI and acceptance of additional requirements, arranged through sales. Some stateful endpoints (assistants, threads, vector stores, files) remain ineligible because they store application state by design.

30 days default OpenAI API abuse-monitoring retention, configurable to zero data retention on eligible endpoints for approved customers OpenAI API data controls, 2025

The most instructive event came from litigation, not policy. In the copyright case brought by The New York Times, a US magistrate judge ordered OpenAI in May 2025 to preserve ChatGPT logs that it would normally delete. OpenAI’s own response to the data demands explains the scope: the order affected ChatGPT Free, Plus, Pro, and Team, plus standard API traffic. It did not affect ChatGPT Enterprise, ChatGPT Edu, or API customers with a Zero Data Retention agreement, because OpenAI does not retain that content in the first place. The broad preservation requirement was later narrowed in late September 2025.

What does Google keep on Gemini, Vertex, and Workspace?

Google’s split is the widest of the three. On the consumer side, Gemini Apps Activity is on by default, conversations are saved to your Google Account for up to 18 months by default (adjustable to 3 or 36 months), and a separate sample of conversations is reviewed by humans and used to improve Google’s models unless you turn the setting off. Google explicitly warns consumers not to enter information they would not want a reviewer to see.

On the commercial side the posture is the opposite. Google’s Vertex AI data governance documentation states that Google does not use your prompts or responses to train its foundation models without permission. Inputs may be cached briefly to reduce latency (up to 24 hours, project-scoped), and customers who need a stricter posture can pursue zero-data-retention-equivalent terms, which requires disabling caching and requesting an exception from abuse-monitoring logging. The Gemini API’s abuse-monitoring logs are kept for a limited window for policy enforcement only, not for training the foundation models.

How do the three vendors compare at a glance?

The table below summarizes the defaults. Treat it as a starting map, not a contract: the binding terms are always the vendor’s own current policy page and your signed agreement.

Vendor and tier	Trains on your data by default?	Default retention	Tighten to zero?
Anthropic consumer (Free/Pro/Max), training allowed	Yes (opt in)	Up to 5 years	Opt out, back to 30 days
Anthropic consumer, opted out	No	30 days	n/a
Anthropic commercial / API	No	Short, contract-defined	Yes, by agreement
OpenAI ChatGPT consumer	Not for Enterprise/Team; consumer settings vary	Account-dependent	Use business tier
OpenAI API	No (since March 2023)	Up to 30 days (abuse)	Yes, ZDR on eligible endpoints
OpenAI Enterprise / Team	No	Contract-defined	Yes
Google Gemini consumer app	Yes, unless turned off	Up to 18 months default	Turn off Apps Activity
Google Vertex AI / Workspace business	No, without permission	Up to 24h cache	Yes, ZDR-equivalent terms

Two patterns hold across all three. First, the default on a personal app is more permissive than most regulated teams realize. Second, every commercial tier already promises no training and supports driving retention toward zero, which means the control you need usually exists; it just has to be turned on and written into the contract.

What should a regulated team verify in a vendor DPA?

A data processing agreement (DPA) is the contract that governs how a vendor, acting as your processor, handles personal data. Marketing pages are not binding; the DPA and its referenced policies are. Four checks separate a defensible deployment from a hopeful one.

Zero retention, in writing. Confirm the retention window for prompts, outputs, and uploaded files, and confirm whether zero data retention applies to the specific endpoints you call. ZDR that excludes the stateful endpoints you actually use is not zero retention for you.
No training, with no quiet carve-outs. Confirm in the contract that your content is not used to train or fine-tune the vendor’s models, and check whether any “service improvement” or safety-tuning clause reopens that door.
Named sub-processors and a change-notice clause. Every downstream party that can touch the data should be listed, with advance notice of additions so you can object before they go live.
A fixed processing and storage region. For GDPR residency and for sector rules, pin where the data is processed and stored, not just where the account is billed. A region promise you cannot verify is not residency.

Where does this leave regulated teams?

The honest read is that the major vendors have built reasonable commercial controls. No training by default on API and enterprise tiers, short retention, and a path to zero retention are real and documented. For many workloads, a well-negotiated DPA on a commercial tier is enough.

The gap is two-sided. First, the consumer surface remains the soft spot: free apps on corporate devices carry the permissive defaults, and that is a governance problem about people and policy, not about the vendor. Second, even a strong DPA is a promise about someone else’s infrastructure, and promises can be changed by the vendor or overridden by a court. For the most sensitive workflows (protected health information, privileged legal material, regulated financial records, defense data), the most durable answer is to not send that content to a third party at all. Run the agent inside your own cloud, VPC, or on-prem environment, keep prompts and documents in a data plane you control, route every call through one gateway, and record every run in a trace you own. Then the retention window is your retention window, the training policy is your training policy, and the next vendor change is something you read about rather than something that happens to your data.

Map your vendors with the table above, fix the consumer defaults, verify the four DPA checks, and keep the irreplaceable prompts inside the boundary. For more on that boundary, see our approach to trust and the rest of our writing on AgentOps and governance.

FAQ

Does Anthropic train on my Claude data?

It depends on the tier. For Claude for Work, Government, Education, and API access, Anthropic does not use your data to train its models. For consumer Free, Pro, and Max plans, since the August 2025 update you choose: allow training and retention runs up to 5 years, or opt out and stay at 30 days with no training. See Anthropic’s consumer terms update.

Is data sent to the OpenAI or Google API used to train their models?

No, not by default. OpenAI states that API data has not been used for training since March 2023 unless you opt in, with up to 30-day abuse-monitoring retention and an optional Zero Data Retention path. Google states that Vertex AI does not use your prompts to train its foundation models without permission. Consumer chat apps are the exception, where defaults can favor training unless you turn the setting off.

What is the most important thing to check in an AI vendor contract?

Four things, in writing: zero or minimal retention for the endpoints you actually use, an explicit no-training commitment with no quiet service-improvement carve-out, a named sub-processor list with change notice, and a fixed processing and storage region. If the most sensitive prompts cannot tolerate any third-party retention risk, the stronger control is to keep them inside your own boundary rather than relying on a contract alone.