Updated Wednesday, May 31, 2023

Privacy-compliant training of AI models

Training AI models in compliance with GDPR requirements.

Steffen Groß

Partner (Attorney-at-law)

Jakob Riediger

Scientific Research Assistant

AI Training = processing activity
Anonymization of training data as a solution?
Prerequisites for change of purpose
Dispensability of a compatibility test
Implementation of Privacy Enhancing Technologies
Carrying out the compatibility test
Outlook and conclusion

As AI applications spread, more companies are seeking to develop proprietary AI models. Training such a model involves processing large amounts of data. Since it cannot be ruled out that personal data is involved, the requirements of the GDPR must be considered when training AI models.

AI Training = processing activity

The development of an AI model requires the training of the model with large amounts of data. Due to the volume of data required for training, it often cannot be ruled out that the training data may include personal data.

Insofar as this involves data of persons located in the European Union, the provisions of the GDPR must be taken into account when training AI models. Because training qualifies as processing within the meaning of Art. 4 No. 2 GDPR, it requires a legal basis under Art. 6 GDPR.

Anonymization of training data as a solution?

The provisions of the GDPR do not apply to anonymized data. These data no longer have any personal reference and can therefore be processed without having to observe the restrictions of the GDPR. Anonymization of data (e.g., of inventory data in the run-up to the training) could consequently eliminate data protection concerns.

In practice, however, anonymization often fails. On the one hand, it involves considerable technical effort that most controllers cannot afford.

On the other hand, the anonymization of training data results in a significant loss of quality, which indirectly leads to a loss of accuracy of the AI model. Furthermore, depending on the application scenario of an AI model, it is not always possible to rely exclusively on anonymous data. [1] An example of this is the development of facial recognition software that is to be trained on "real" faces to be useful as an AI model for this particular application.
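The quality trade-off described above can be illustrated with a minimal sketch. It assumes hypothetical customer records and uses k-anonymity (every combination of quasi-identifiers is shared by at least k records) as one possible yardstick. The field names and the generalization rules are assumptions for illustration only, and reaching a given k does not by itself mean the data is anonymous within the meaning of the GDPR.

```python
from collections import Counter


def generalize(record: dict) -> dict:
    """Coarsen quasi-identifiers: age to a 10-year band, postal code to 2 digits."""
    decade = record["age"] // 10 * 10
    return {
        "age": f"{decade}-{decade + 9}",
        "postal_code": record["postal_code"][:2] + "***",
        "basket_value": record["basket_value"],  # non-identifying feature kept as-is
    }


def smallest_group(records: list, quasi_identifiers: tuple) -> int:
    """Return k, the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Hypothetical inventory data, invented for this example.
records = [
    {"age": 34, "postal_code": "10115", "basket_value": 40.0},
    {"age": 37, "postal_code": "10117", "basket_value": 55.5},
    {"age": 52, "postal_code": "80331", "basket_value": 20.0},
    {"age": 58, "postal_code": "80333", "basket_value": 75.0},
]
quasi_identifiers = ("age", "postal_code")

k_before = smallest_group(records, quasi_identifiers)
k_after = smallest_group([generalize(r) for r in records], quasi_identifiers)
```

In this sketch every raw record is unique (k = 1), while the generalized records form groups of two (k = 2). The price is that the model only sees age bands and truncated postal codes, which is the kind of information loss that can reduce accuracy.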

Prerequisites for change of purpose

To obtain a sufficient quantity of high-quality data, the training of AI models often relies on the company's inventory data, such as customer data. This data was typically collected not for the development of an AI, but for the respective business purposes of the company (e.g., delivery of goods or provision of services).

If this data is now to be used for a different purpose - namely the development of an AI - the requirements for a change of purpose under Art. 6 (4) GDPR must be observed. First, data controllers must inform the data subjects about the further processing before it takes place, Art. 13 (3), Art. 14 (4) GDPR.

In addition, the five criteria for a change of purpose listed in Art. 6 (4) GDPR must be considered. They are examined as part of a so-called "compatibility test"[2], unless the test is dispensable (see below).

Dispensability of a compatibility test

In certain cases, a compatibility test pursuant to Art. 6 (4) GDPR may be dispensable. According to Art. 5 (1) (b) of the GDPR, this is the case for "further processing for archiving purposes in the public interest, for scientific or historical research purposes, or for statistical purposes". The first step is therefore to examine the purpose pursued by the AI model to be trained.

On the one hand, statistical purposes come into consideration here. Specifically, these may be relevant if the model to be trained "is deemed anonymous within the meaning of the GDPR due to appropriate (...) measures and is not used for measures or decisions regarding individual natural persons." [3]

Statistical purposes consequently come into consideration when processes are "generally improved, anomalies or system impairments are detected, or new data is generated, such as with chatbots," by means of AI.[4]

Implementation of Privacy Enhancing Technologies

Article 6 (4) (e) of the GDPR requires "the existence of appropriate safeguards, which may include encryption or pseudonymization" in the context of the compatibility test.

This makes it clear that privacy-enhancing technologies ("PET measures") are a central component of a successful compatibility test. They allow the controller to demonstrate data protection compliance, especially vis-à-vis supervisory authorities.

In addition, early implementation of PET measures is required to avoid "infecting" the AI model. [7]

PET measures that may be considered include, but are not limited to:

  • Anonymization or pseudonymization of training data

  • Use of synthetic training data

  • "Federated Learning" methods[8]
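As an illustration of the first measure, the following is a minimal sketch of pseudonymization using a keyed hash (HMAC-SHA256) from the Python standard library. The record layout, the field names, and the key handling are assumptions made for this example. Pseudonymized data remains personal data under the GDPR, so this is a safeguard and not a way out of the regulation's scope.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Return a keyed SHA-256 pseudonym for a direct identifier."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, key: bytes, id_fields: tuple) -> dict:
    """Copy a record, replacing each direct identifier with its pseudonym."""
    return {
        name: pseudonymize(value, key) if name in id_fields else value
        for name, value in record.items()
    }


# Hypothetical key: generated here for the example; in practice it would be
# kept separately from the training data so that re-identification requires it.
SECRET_KEY = secrets.token_bytes(32)

# Hypothetical customer record, invented for this example.
customer = {"email": "Jane.Doe@example.com", "order_total": 129.90}
training_row = pseudonymize_record(customer, SECRET_KEY, id_fields=("email",))
```

The same identifier always maps to the same pseudonym under one key, so records belonging to one person can still be linked during training, while the e-mail address itself never enters the training set.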

Carrying out the compatibility test

The performance of a compatibility test always requires a case-by-case examination. The following factors in particular must be taken into account:

Purpose context

The link between the original and new processing purposes: the closer the link, the more likely the compatibility.

Type of personal data

Sensitive data or data requiring special protection may be subject to stricter compatibility requirements.

Consequences for those affected

The possible consequences of further processing for the data subjects: Negative effects on the privacy of the data subjects may affect the compatibility of the purposes.

Protective measures

The existence of appropriate safeguards: The implementation of safeguards (PET measures) such as pseudonymization or anonymization can support the compatibility of the processing purposes.
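Because the test is a case-by-case examination, it can help to document each factor in a structured way. The following is a minimal sketch of such a record. The class name, the field names, and the example values are hypothetical, and the code only tracks which factors have been documented; it does not decide whether the purposes are compatible.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CompatibilityAssessment:
    """Hypothetical record of one compatibility test; not a legal determination."""

    original_purpose: str
    new_purpose: str
    purpose_context: str = ""                 # link between original and new purpose
    type_of_personal_data: str = ""           # e.g. whether sensitive data is involved
    consequences_for_data_subjects: str = ""  # possible effects of further processing
    protective_measures: list[str] = field(default_factory=list)  # PET measures

    def undocumented_factors(self) -> list[str]:
        """Return the names of all fields that have no entry yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical example values, invented for illustration.
assessment = CompatibilityAssessment(
    original_purpose="Delivery of goods",
    new_purpose="Training a demand-forecasting model",
    purpose_context="Forecasting supports the same delivery business",
    protective_measures=["pseudonymization"],
)
missing = assessment.undocumented_factors()
```

Here `missing` lists the two factors that still need an entry, which gives the controller a simple completeness check before the documentation is relied on vis-à-vis a supervisory authority.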

Outlook and conclusion

Companies that want to train an AI model must comply with data protection regulations.

If the training is carried out with existing data, a compatibility test must be carried out. To be successful here, the company must implement data protection-friendly measures (privacy-enhancing technologies).

In addition to the requirements of the GDPR, companies will also have to comply with the provisions of the AI Regulation in the future. In particular, Article 10 of the AI Regulation sets out requirements for the quality of training data.


[1] Cf. Leicht/Sorge, Einsatz von KI-Systemen im Unternehmen: Datenschutzrechtliche Voraussetzungen und technische Lösungsansätze, in: Roth/Corsten, Handbuch Digitalisierung (forthcoming), including a reference to the exemplary presentation of this problem in the use of differential privacy: Bagdasaryan et al., Differential privacy has disparate impact on model accuracy, Advances in Neural Information Processing Systems 32 (2019): 15479-15488.

[2] Buchner/Petri, in: Kühling/Buchner DSGVO, 3rd ed. 2020, Art. 6, para. 186.

[3] Kaulartz, Rechtshandbuch Artificial Intelligence and Machine Learning, 1st edition 2020, p. 476.

[4] Kaulartz, Rechtshandbuch Artificial Intelligence and Machine Learning, 1st edition 2020, p. 476.

[5] Stefan Brink, former data protection commissioner of Baden-Württemberg, ChatGPT ban: This is what Italy demands from OpenAI - Overview! | SÜDKURIER

[6] Leffer/Leicht, Data Protection Challenges in the Use of Training Data for AI Systems, *34_p_28_IRIS22_Leffer+ES.indd

[7] Cf. Kaulartz, Rechtshandbuch Artificial Intelligence and Machine Learning, 1st edition 2020, p. 476.

