The Business & Technology Network
Helping Business Interpret and Use Technology

Can ChatGPT Violate Your Privacy Rights If It Doesn’t Store Your Data?

DATE POSTED: May 8, 2024

If you asked someone to state another person’s birthday, and they simply made up a date that was not the actual birthday, would you argue that the individual’s privacy had been violated? Would you argue that there should be a legal right to demand that the person explain how they came up with the made-up date, and to permanently “store” the proper birth date in their mind?

Or would you simply laugh it off as utter nonsense?

I respect the folks at noyb, the European privacy activists who keep filing privacy complaints that often have significant consequences. noyb and its founder, Max Schrems, have pretty much single-handedly continued to rip up US/EU privacy agreements by highlighting that NSA surveillance simply cannot comply with EU data privacy protections.

That said, noyb often seems to take things a bit too far, and I think its latest complaint against OpenAI is one of those cases.

Here is how noyb describes its complaint:

"In the EU, the GDPR requires that information about individuals is accurate and that they have full access to the information stored, as well as information about the source. Surprisingly, however, OpenAI openly admits that it is unable to correct incorrect information on ChatGPT. Furthermore, the company cannot say where the data comes from or what data ChatGPT stores about individual people. The company is well aware of this problem, but doesn’t seem to care. Instead, OpenAI simply argues that ‘factual accuracy in large language models remains an area of active research’. Therefore, noyb today filed a complaint against OpenAI with the Austrian DPA."

I have to admit, sometimes I kinda wonder if noyb is really a kind of tech policy performance art, trying to make a mockery of the GDPR. Because that’s about the only way this complaint makes sense.

The assumptions underlying the complaint are that ChatGPT is something that it is not, that it does something that it does not do, and that this somehow implicates rights that are not implicated at all.

Again, generative AI chat tools like ChatGPT make up content based on what they’ve learned over time. They are not storing and collecting such data. They are not retrieving data they have stored. Many people seem to think that ChatGPT is somehow the front end for a database, or the equivalent of a search engine.

It is not.

It is a digital guessing machine, trained on tons of written works. So, when you prompt it, it is probabilistically guessing at what it can say to respond in a reasonable, understandable manner. It’s predictive text on steroids. But it’s not grabbing data from a database. This is why it does silly things like make up legal cases that don’t exist. It’s not because it has bad data in its database. It’s because it’s making stuff up as it goes based on what “sounds” right.
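The “predictive text on steroids” idea can be sketched with a toy next-word model. Everything below (the corpus, the names, the dates) is invented for illustration, and real LLMs use neural networks over tokens rather than word bigrams, but the key property is the same: output is sampled from learned statistics, not looked up in a record about any particular person.

```python
import random
from collections import Counter, defaultdict

# Toy training corpus -- entirely made up, just to train the guesser.
corpus = (
    "alice was born in april . bob was born in june . "
    "carol was born in april ."
).split()

# Count which word follows which (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def generate(prompt_word, length=4, seed=0):
    """Guess a continuation one word at a time, sampling by frequency."""
    rng = random.Random(seed)
    out = [prompt_word]
    for _ in range(length):
        counts = following[out[-1]]
        if not counts:
            break  # nothing ever followed this word; stop guessing
        words = list(counts)
        out.append(rng.choices(words, weights=[counts[w] for w in words])[0])
    return " ".join(out)

# The model happily completes "was born in ..." with april or june,
# chosen by word statistics -- it holds no record of anyone's birthday.
print(generate("was"))
```

Ask it about a name it has never seen and it can only stop or guess. There is no stored birthday to retrieve, and therefore nothing to “rectify.”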

And, yes, there are some cases where it seems closer to storing data: the nature of the training and the probabilistic engine means the model effectively acts as a very lossy compression algorithm, which can sometimes recreate data that closely approximates the original. But that’s still not the same thing as storing data in a database, and the example used by noyb, a random person’s birthday, is simply not the kind of data at issue there.

Yet, noyb’s complaint is that ChatGPT can’t tell you what data it has on people (because it doesn’t “have data” on people) and that it can’t correct mistakes (because there’s nothing to “correct” since it’s not pulling what it writes from a database that can be corrected).

The complaint is kind of like arguing that if you ask a friend about someone else and they repeat some false information, that friend is required under the GDPR to explain why they said what they said and to “correct” what is wrong.

But noyb insists this is true for ChatGPT.

"Simply making up data about individuals is not an option. This is very much a structural problem. According to a recent New York Times report, ‘chatbots invent information at least 3 percent of the time – and as high as 27 percent’. To illustrate this issue, we can take a look at the complainant (a public figure) in our case against OpenAI. When asked about his birthday, ChatGPT repeatedly provided incorrect information instead of telling users that it doesn’t have the necessary data."

If this is actually a violation of the GDPR, noyb’s real complaint is with the GDPR, not with ChatGPT. Again, this only makes sense for an app that is storing and retrieving data.

But that’s not what’s happening. ChatGPT is probabilistically guessing at what to respond with.

noyb’s announcement continues:

"No GDPR rights for individuals captured by ChatGPT? Despite the fact that the complainant’s date of birth provided by ChatGPT is incorrect, OpenAI refused his request to rectify or erase the data, arguing that it wasn’t possible to correct data."

There is no data to correct. This is just functionally wrong. It’s like filing a complaint against an orange for not being an apple. It’s just a fundamentally different kind of service.

Now, there are some attempts at generative AI tools that do store data. The hot topic in the generative AI world these days is RAG, “retrieval-augmented generation,” in which an AI also retrieves data from some sort of database or document store. noyb’s complaint would make more sense if it had found a RAG system returning false information; in that scenario, there really would be stored data to correct.
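For contrast, here is a minimal sketch of the RAG pattern, with all names and records invented for illustration. The point is that the retrieval step consults an actual stored record, so in a RAG system there is data that could, in principle, be inspected, rectified, or erased.

```python
# Hypothetical document store -- these records are invented for illustration.
documents = {
    "doc1": "The GDPR took effect on 25 May 2018.",
    "doc2": "noyb is based in Vienna, Austria.",
}

def retrieve(query):
    """Naive retrieval: return the stored document sharing the most words
    with the query. A real system would use vector similarity instead."""
    q = set(query.lower().split())
    return max(documents.values(),
               key=lambda text: len(q & set(text.lower().split())))

def answer(query):
    # The "generation" step is stubbed out here: a real RAG system would
    # feed the retrieved passage to a language model as grounding context.
    passage = retrieve(query)
    return f"According to the retrieved record: {passage}"

print(answer("Where is noyb based?"))
```

If `documents` held a wrong birthday, you could point at the record and fix it; a plain language model has no such record to point at.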

But when we’re talking about a regular old generative AI model without retrieval capabilities, it makes no sense at all.

If noyb honestly thinks that what ChatGPT is doing violates the GDPR, then there are only two possibilities: (1) noyb has no idea what it’s talking about here, or (2) the GDPR is even sillier than we’ve argued in the past, and all noyb is doing is trolling to make that clear by filing a laughably silly complaint that exposes how poorly the GDPR fits the technology in our lives today.