Meta plans to use personal data from Facebook and Instagram profiles to train its generative AI models. This processing violates key provisions of the General Data Protection Regulation (GDPR) and is therefore unlawful.
Introduction
Meta, the parent company of Facebook and Instagram, plans to use personal content from EU users to train its artificial intelligence (AI) systems. This includes posts, likes, photos, and other activity data from its platforms, which are intended to feed into large language models like “LLaMA.”
The planned data processing violates several key provisions of the General Data Protection Regulation (GDPR) and is therefore unlawful. As a result, Meta faces not only potential regulatory sanctions but also significant civil liability.
What Is Meta Planning?
Meta intends to use personal data from Facebook and Instagram users in the European Union to train its generative AI models. This includes content such as posts, comments, likes, reactions, photos, videos, and even metadata from user interactions. Meta states that this data will be used to adapt AI systems such as the large language model LLaMA to European languages, cultures, and social habits.
According to Meta, no private messages or data from minors will be used, and users have been informed and offered an opt-out. However, the company relies on “legitimate interest” under the GDPR as the legal basis for this processing, rather than obtaining explicit consent from users.
Which Data Is Affected?
The scope of data that Meta intends to use for training its AI systems is significant and highly sensitive. Starting May 27, 2025, Meta plans to use all “public” information available on users’ Facebook and Instagram profiles. This includes, among other things:
- Content and activity data: posts, comments, likes, photos, videos, stories, and live content
- Profile data: biography, relationship status, interests, profile pictures
- Technical and device-related data: IP address, device type, operating system, app version, cookies
- Usage behavior and social connections: click and scroll behavior, story views, follower networks, communication histories
- External sources: data from third-party websites with Meta integration as well as advertising and analytics partners
- Derived profile information: presumed interests, consumption habits, political or religious beliefs, personal characteristics
Meta specifically refers to all public profile information, that is, content whose visibility is not set to private or restricted to certain users. This affects all current and past public posts, comments, photos, videos, and captions.
Meta states that data from underage users and persons who have objected to processing will be excluded from AI training. However, in practice, this protection is limited.
Meta itself points out that personal data of third parties can also enter the training data—for example, if a person appears in a publicly posted photo or is mentioned in a comment. Even with an active objection, it must be assumed that indirect information—such as tags, shared content, or mentions by others—will flow into AI training. This effectively undermines the right to informational self-determination.
User data is processed not only to train one specific AI model but broadly for the “development of various AI systems.” Meta also plans to combine this data across platforms, for example between Facebook and Instagram or via jointly used devices and linked accounts. This can result in highly detailed personality profiles.
An objection filed after May 27 remains possible and applies going forward, but training already carried out with one’s data cannot be undone. Data protection authorities have so far remained passive, advising only that users should ideally object before May 27, 2025. Meta offers dedicated online forms for this purpose, which can only be completed by logged-in users. The email address used on Facebook or Instagram must be provided, but no justification for the objection is required.
For Meta AI in WhatsApp, there is currently no way to object. Meta states that inputs from the EU are not currently used for training purposes. Users who nevertheless do not want their data processed should refrain from using the chatbot.
Why Is This Legally Problematic?
Meta’s use of such extensive and sensitive personal data violates several core principles of the General Data Protection Regulation (GDPR). A major concern is that much of the data can reveal political views, religious beliefs, health conditions, or sexual orientation—categories that are considered especially sensitive under GDPR.
Processing such special categories of data requires explicit consent under Article 9 GDPR; merely informing users and offering an opt-out is not sufficient. Meta instead relies on the legal basis of “legitimate interest”, a justification that does not hold for data of this sensitivity.
Moreover, there is a serious lack of transparency: users often have no clear understanding of how and which of their data is being used.
Rights such as access or deletion are also nearly impossible to enforce once data is incorporated into AI training.
Personality Profiles from Billions of Data Points
What Meta is planning is unprecedented: no other company holds such an extensive dataset of personal information as Meta—collected over years from Facebook and Instagram. This is not about isolated pieces of information, but about a dense web of direct and indirect data that can be used to draw detailed conclusions about a person’s identity and character.
By combining profile information, behavioral patterns, interactions, and interests, Meta can create comprehensive personality profiles—including insights into political opinions, religious beliefs, health conditions, and sexual orientation.
This depth of analysis poses serious risks to the privacy and personal rights of those affected. Both the nature of the data—such as political views, health information, relationship status, and more—and its sheer volume make the proposed data processing a legally unprecedented operation.
In the context of AI training, this represents a historically unique experiment with the privacy of millions of people. While previous large language models have been trained on publicly available internet data, never before has such a vast trove of deeply personal, individually linked information—like that found on Facebook and Instagram—been used. The risk to individual rights and public trust in digital platforms is therefore significantly greater in this case than in earlier AI projects.
Consent vs. Opt-Out
Instead of obtaining explicit consent from users, Meta claims a “legitimate interest” as the legal basis for processing personal data. Under Article 6(1)(f) of the GDPR, this justification allows data processing without consent—but only if the individual’s rights do not override the controller’s interest.
That’s where Meta’s argument falls short: given the depth and scale of the data processing—especially involving sensitive content—the users’ right to privacy clearly outweighs Meta’s commercial goals. Individuals have no real control over how their data is used, and less intrusive alternatives—such as anonymized or synthetic data—are available. There is no fair balancing of interests here. As a result, “legitimate interest” cannot serve as a valid legal foundation for Meta’s approach.
What Does This Mean for User Rights?
The GDPR grants individuals broad rights over their data—including access, correction, deletion, and objection to processing. However, in the context of AI training by Meta, these rights become largely ineffective.
The core issue: once personal data has been used to train an AI model, it can no longer be clearly traced or removed from the model. Users often have no insight into whether their data was used, and even if they object or delete content, there is no guarantee that traces of their data don’t persist within the system.
Additionally, data shared by others—such as through tags, comments, or group posts—can still be included in training, even if the individual concerned objected to the use of their data. As a result, the right to control one’s own data is effectively undermined in practice.
Lack of Transparency and Control
One of the core principles of the GDPR is transparency: users must be able to understand what happens to their data. But this is where Meta fails. The information provided is vague, difficult to understand, and leaves many critical questions unanswered—such as which data is used, for what exact purpose, and how deleted content is handled.
Even more troubling is the opaque nature of AI models themselves. Once trained, it is nearly impossible to trace or remove specific data from the model. Users effectively lose control over their data—without ever clearly knowing whether it was used in the first place. This level of opacity not only contradicts the intent of the GDPR but also undermines the fundamental right to informational self-determination.
Risks of AI and Open-Source Release
Meta’s plan to release its trained AI models—like LLaMA 2—as open source raises serious concerns. Once made public, these models can be downloaded, reused, and modified by anyone, anywhere in the world. As a result, neither Meta nor data protection authorities would have any real control over how the models are used, even if they contain sensitive or personal information.
Once released, there is no way to retract or monitor such models. Personal data used in training could potentially be revealed again through indirect means—such as through targeted prompts. Individuals affected would have little to no recourse to protect themselves or assert their rights. This risk of total loss of control over one’s data represents one of the most severe threats to privacy in the age of AI.
What Are the Legal Consequences for Meta?
If Meta proceeds with this kind of data processing, it could face serious legal consequences. European data protection authorities are empowered to prohibit such processing, require deletion of the data, and impose hefty fines—up to 4% of the company’s global annual revenue. For a company like Meta, that could mean billions of euros.
In addition, civil liability poses a major threat. Individuals whose data were unlawfully processed may claim compensation under Article 82 GDPR, including for non-material damage such as loss of control or emotional distress. In practice, this could mean several thousand euros per person. With millions of users potentially affected, Meta faces a financial risk in the tens of billions, far beyond any regulatory fine alone.
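To make these orders of magnitude concrete, the following minimal sketch works through both figures. The revenue, number of claimants, and per-person award are illustrative assumptions, not numbers taken from the legal opinion:

```python
# Back-of-envelope illustration of the exposure described above.
# All figures are assumptions for illustration, not findings of the legal opinion.

GDPR_FINE_CAP_RATE = 0.04               # Art. 83(5) GDPR: up to 4% of global annual turnover
ASSUMED_ANNUAL_REVENUE_EUR = 135e9      # assumption: turnover of roughly EUR 135 billion

max_regulatory_fine = GDPR_FINE_CAP_RATE * ASSUMED_ANNUAL_REVENUE_EUR

ASSUMED_CLAIMANTS = 10_000_000          # assumption: 10 million affected users bring claims
ASSUMED_DAMAGES_PER_PERSON_EUR = 3_000  # assumption: a few thousand euros awarded each

potential_civil_liability = ASSUMED_CLAIMANTS * ASSUMED_DAMAGES_PER_PERSON_EUR

print(f"Maximum GDPR fine:         EUR {max_regulatory_fine / 1e9:.1f} billion")
print(f"Potential civil liability: EUR {potential_civil_liability / 1e9:.1f} billion")
```

Under these assumptions, the potential civil liability (around EUR 30 billion) would exceed even the maximum regulatory fine (around EUR 5.4 billion), which is precisely the point made above.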
Conclusion
Meta is planning to use one of the world’s most extensive data collections to develop artificial intelligence – without obtaining the explicit consent of the individuals concerned. The planned data processing involves highly sensitive information, is conducted in a non-transparent manner, and largely escapes user control. Neither the invocation of a “legitimate interest” nor the option to object meets the requirements set out by the GDPR.
The outcome of the data protection assessment is unequivocal: the planned data processing is unlawful. Meta must therefore expect not only significant sanctions from supervisory authorities, but also – and perhaps even more serious – a wave of compensation claims from affected individuals.
Note: The full legal opinion, with a detailed legal analysis of the GDPR violations, is available for download.
Simpliant Legal Opinion: Inadmissibility of Meta’s Processing of Social Media Data for AI Training (Version 1.0)
This legal opinion examines the admissibility under data protection law of the planned processing by Meta Platforms Ireland Ltd. (“Meta”) of personal data of users in the European Union for the training of generative AI models (in particular LLaMA).