> To train, develop, and improve the artificial intelligence, machine learning, and models that we use to support our Services. We may use your Log and Usage Information and Prompts and Outputs Information for this purpose.
Am I the only one bothered by this? Same with Gemini Advanced (paid) training on your prompts. It feels like I’m paying with money, but also handing over my entire codebase to improve your products. Can’t you do synthetic training data generation at this point, combined with the massive amount of Q&A already online, so you don’t need this?
Oh, that's not great. Cursor has a privacy mode where you can avoid this.
>If you enable "Privacy Mode" in Cursor's settings: zero data retention will be enabled, and none of your code will ever be stored or trained on by us or any third-party.
Yeah that's a bad look. If I have an API key visible in my code does that get packaged up as a "prompt" automatically? Could it be spat out to some other user of a model in the future?
(I assume that there's a reason that wouldn't happen, but it would be nice to know what that reason is.)
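There's no public detail on whether any of these tools scrub secrets client-side, but as a hypothetical sketch, redacting obvious credential patterns before the text ever leaves the editor might look something like this (the patterns and the `scrub_secrets` helper are my own illustration, not anything Cursor or Google documents):

```python
import re

# Rough patterns for common credential formats (illustrative, not exhaustive).
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                           # AWS access key IDs
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                        # OpenAI-style secret keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                        # GitHub personal access tokens
    re.compile(r"(?i)(api[_-]?key\s*=\s*)['\"][^'\"]+['\"]"),  # api_key = "..."
]

def scrub_secrets(text: str, placeholder: str = "<REDACTED>") -> str:
    """Replace likely credentials with a placeholder before sending a prompt."""
    for pattern in SECRET_PATTERNS:
        if pattern.groups:  # keep the assignment, redact only the value
            text = pattern.sub(lambda m: m.group(1) + f'"{placeholder}"', text)
        else:
            text = pattern.sub(placeholder, text)
    return text

print(scrub_secrets('api_key = "sk-abcdefghij1234567890XYZ"'))
```

Pattern matching like this is inherently best-effort (entropy-based detectors catch more), but the point stands: the scrub has to happen before the prompt is sent, not at training time.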
I wonder how hard it would be to fish the keys out of the model weights later with prompting. Presumably it's possible to literally brute-force it by giving the model the first couple of characters and maybe an env variable name and asking it to complete the rest.
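To make that brute-force idea concrete: seed the model with a plausible context (env var name plus a known key prefix) and greedily extend one character at a time from the model's next-character probabilities. Everything below is a hypothetical sketch; `next_char_scores` is a stub standing in for real logprob queries against an actual model, and the "memorized" key is made up.

```python
# Hypothetical stand-in for querying a model's next-character probabilities.
# A real attempt would use logprobs from an actual LLM API; this stub just
# "remembers" one fake memorized key to show the control flow.
MEMORIZED = {"AWS_ACCESS_KEY_ID=AKIA": "TESTKEY123"}

def next_char_scores(context: str) -> dict:
    for prefix, continuation in MEMORIZED.items():
        if context.startswith(prefix):
            seen = context[len(prefix):]
            if continuation.startswith(seen) and len(seen) < len(continuation):
                return {continuation[len(seen)]: 1.0}
    return {}

def extract_key(seed: str, max_len: int = 40) -> str:
    """Greedily complete a suspected secret one character at a time."""
    context = seed
    while len(context) - len(seed) < max_len:
        scores = next_char_scores(context)
        if not scores:
            break
        context += max(scores, key=scores.get)  # take the most likely next char
    return context[len(seed):]

print(extract_key("AWS_ACCESS_KEY_ID=AKIA"))  # recovers the stub's fake suffix
```

Whether a real model actually memorizes a given key depends on how often it appeared in the training data; deduplication and secret filtering in the training pipeline are the usual defenses.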
I'm also interested in the details on how this works in practice. I know that there was a front page post a few weeks ago about how Cursor worked, and there was a short blurb about how sets of security prompts told the LLM to not do things like hard code API keys, but nothing on the training side.
Yeah, I was referring to their webapp/Chat, aka Gemini Advanced. It uses your prompts for training unless you turn off chat history completely, or are in their “Workspace” enterprise version.
> Google collects your chats (including recordings of your Gemini Live interactions), what you share with Gemini Apps (like files, images, and screens), related product usage information, your feedback, and info about your location. Info about your location includes the general area from your device, IP address, or Home or Work addresses in your Google Account. Learn more about location data at g.co/privacypolicy/location.
> Google uses this data, consistent with our Privacy Policy, to provide, improve, and develop Google products and services and machine-learning technologies, including Google’s enterprise products such as Google Cloud.
> Gemini Apps Activity is on by default if you are 18 or older. Users under 18 can choose to turn it on. If your Gemini Apps Activity setting is on, Google stores your Gemini Apps activity with your Google Account for up to 18 months. You can change this to 3 or 36 months in your Gemini Apps Activity setting.
Without exception, every AI company is a play for your data. AI requires a continuing supply of new data to train on; it does not "get better" merely by running more compute over the existing training sets.
Furthermore, synthetic data is a flawed concept. At a minimum, it tends to propagate and amplify biases in the model generating the data. If you ignore that, there's also the fundamental issue that data doesn't exist purely to run more gradient descent, but to provide new information that isn't already compressed into the existing model. Providing additional copies of the same information cannot help.
> Same with Gemini Advanced (paid) training on your prompts
I'm not sure if this is true.
> 17. Training Restriction. Google will not use Customer Data to train or fine-tune any AI/ML models without Customer's prior permission or instruction.
> This Generative AI for Google Workspace Privacy Hub covers... the Gemini app on web (i.e. gemini.google.com) and mobile (Android and iOS).
> Your content is not used for any other customers. Your content is not human reviewed or used for Generative AI model training outside your domain without permission.
> The prompts that a user enters when interacting with features available in Gemini are not used beyond the context of the user trust boundary. Prompt content is not used for training generative AI models outside of your domain without your permission.
> Does Google use my data (including prompts) to train generative AI models? No. User prompts are considered customer data under the Cloud Data Processing Addendum.
> When you use Unpaid Services, including, for example, Google AI Studio and the unpaid quota on Gemini API, Google uses the content you submit to the Services and any generated responses to provide, improve, and develop Google products and services and machine learning technologies, including Google's enterprise features, products, and services, consistent with our Privacy Policy.
Zero-data retention mode is the default for any user on a team or enterprise plan and can be enabled by any individual from their profile page.
With zero-data retention mode enabled, code data is not persisted on our servers or by any of our subprocessors. The code data is still visible to our servers in memory for the lifetime of the request, and may exist for a slightly longer period (on the order of minutes to hours) for prompt caching. The code data submitted by zero-data retention mode users will never be trained on. Again, zero-data retention mode is on by default for teams and enterprise customers.
Hey, we all want to have our cake and eat it too, but I'm (kinda?) surprised that people expect to use services trained on large swaths of "available" data while contributing nothing back themselves. Even if you're paying: why the selfishness?
I think it's more that LLMs should be treated as a utility service. Unless Google and the others can clearly show the training data involved, the price providers can charge for LLMs should be capped. I have no issue with contributing my conversations and my open-source code, and in return I should expect a fair price.
https://windsurf.com/privacy-policy