Practical Data Protection Compliance for Open-Source LLMs

AWO has a multi-disciplinary team of legal professionals, technology analysts, ethicists and data scientists. Together, we have identified key data protection requirements under the GDPR that apply to the development and use of open-source LLMs.

This work culminated in our paper, Practical Data Protection Compliance for Open-Source LLMs, which provides a comprehensive analysis of the data protection issues raised by these models. A copy of the full paper can be downloaded here. We hope it proves a valuable contribution to the open-source community.

This work demonstrates the expertise our team can offer organisations developing or using AI systems. Our algorithm governance advisory services support clients ranging from international organisations and start-ups developing responsible AI strategies to engineering and technical teams responsible for algorithm development. You can learn more about our services here and get in touch to discuss your needs. We also publish a monthly newsletter, Algorithm Governance Roundup, in which we share technical and regulatory developments, job listings and publications, and advertise events.

Key Issues Impacting Open-Source LLM Development and Use

Open-source LLMs have grown in capability and prominence. A growing number of organisations now run these models in their own environments and take advantage of their customisability to tailor them to a range of tasks. However, while open-source LLMs offer several key advantages, both developers and users must consider the data protection implications.

Key Issues for Developers

  • The processing of personal data requires an appropriate legal basis. This applies both to the training data collected to develop an LLM before deployment and to any user prompts collected after deployment for further training and service improvement.
  • Data subjects must be told what personal data about them is processed and where that data was collected from. This information must be provided before, or at the latest at the time, the model is used by data subjects.
  • Requests made by data subjects exercising their rights under data protection law must be fulfilled. These rights include the right of access, the right to be forgotten, the right to restriction of processing and the right to object.
  • Personal data processed must be accurate, and every reasonable step must be taken to delete or rectify any inaccurate personal data. Data subjects also have the right to have inaccurate personal data about them rectified without undue delay.
  • Appropriate security measures must be in place to ensure the confidentiality, integrity and availability of personal data processed. In particular, such measures must protect against unauthorised or unlawful processing, as well as against accidental loss, destruction or damage.
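
Fulfilling data subject requests has a concrete engineering counterpart: developers need to be able to find every training record that relates to a given person. The sketch below is a minimal illustration only, assuming a hypothetical in-memory corpus in which each record has already been linked to a data subject identifier and tagged with its source. The names `TrainingRecord`, `Corpus`, `handle_access`, `handle_erasure` and `handle_restriction` are our own and do not come from any library. In practice, reliably linking free text to individuals is itself a significant challenge, and deleting a record from the dataset does not remove what an already trained model may have learned from it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrainingRecord:
    """One item in a hypothetical training corpus."""
    record_id: str
    subject_id: str           # assumed: identifier of the person the text relates to
    source: str               # where the data was collected from (e.g. a URL)
    text: str
    restricted: bool = False  # if True, excluded from future training runs


@dataclass
class Corpus:
    records: dict[str, TrainingRecord] = field(default_factory=dict)

    def add(self, record: TrainingRecord) -> None:
        self.records[record.record_id] = record

    def _for_subject(self, subject_id: str) -> list[TrainingRecord]:
        return [r for r in self.records.values() if r.subject_id == subject_id]

    def handle_access(self, subject_id: str) -> list[dict]:
        """Right of access: report what is held about the subject and its source."""
        return [
            {"record_id": r.record_id, "source": r.source, "text": r.text}
            for r in self._for_subject(subject_id)
        ]

    def handle_erasure(self, subject_id: str) -> int:
        """Right to be forgotten: delete the subject's records from the dataset.

        Only the dataset changes; models already trained on these
        records are not affected by this call.
        """
        ids = [r.record_id for r in self._for_subject(subject_id)]
        for record_id in ids:
            del self.records[record_id]
        return len(ids)

    def handle_restriction(self, subject_id: str) -> int:
        """Right to restriction: keep the records but stop training on them."""
        matched = self._for_subject(subject_id)
        for r in matched:
            r.restricted = True
        return len(matched)

    def training_set(self) -> list[TrainingRecord]:
        """Records eligible for the next training run."""
        return [r for r in self.records.values() if not r.restricted]
```

For example, after `corpus.handle_restriction("bob")`, Bob's records stay available for an access request but are no longer returned by `training_set()`. Keeping restriction as a flag rather than a deletion reflects the difference between the two rights.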

Key Issues for Users

  • Before using an open-source LLM, organisations should carry out due diligence to ensure that the developer complies with data protection law. This should be combined with internal policies ensuring that the organisation's own use of the model is also compliant.
  • Where an organisation's use of an open-source LLM involves processing personal data, that use must serve a specified, explicit and legitimate purpose. Organisations must therefore carefully consider their reasons for deploying and using the model.
  • If an organisation uses an open-source LLM that processes the personal data of the people using it, there must be an appropriate legal basis for that processing.
  • Data subjects must be told what personal data about them is processed when they use the model. This information must be provided before, or at the latest at the time, the model is used by data subjects.
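
For organisations using a model, the purpose, legal basis and information requirements can be reflected in how a deployment is gated. The following is a minimal sketch, assuming a hypothetical internal register in which each use of the model involving personal data must record its purpose, one of the six lawful bases listed in Article 6(1) GDPR, and the location of the privacy notice shown to users before it is enabled. The names `LawfulBasis`, `ProcessingActivity`, `missing_items` and `require_documented` are illustrative only. Code can check that these items have been recorded, but not whether the purpose is actually legitimate or the chosen basis actually applies; that remains a legal judgment.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class LawfulBasis(Enum):
    """The six lawful bases listed in Article 6(1) GDPR."""
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal obligation"
    VITAL_INTERESTS = "vital interests"
    PUBLIC_TASK = "public task"
    LEGITIMATE_INTERESTS = "legitimate interests"


@dataclass(frozen=True)
class ProcessingActivity:
    """One documented use of an open-source LLM involving personal data."""
    name: str
    purpose: str                      # the specified, explicit purpose
    lawful_basis: LawfulBasis | None  # None means not yet determined
    privacy_notice_url: str | None    # hypothetical: where users are informed


def missing_items(activity: ProcessingActivity) -> list[str]:
    """Return what still needs to be documented before the activity may start."""
    missing = []
    if not activity.purpose.strip():
        missing.append("purpose")
    if activity.lawful_basis is None:
        missing.append("lawful basis")
    if not activity.privacy_notice_url:
        missing.append("privacy notice")
    return missing


def require_documented(activity: ProcessingActivity) -> None:
    """Raise ValueError if anything is undocumented; call before enabling the use."""
    missing = missing_items(activity)
    if missing:
        raise ValueError(f"{activity.name}: missing {', '.join(missing)}")
```

Failing loudly at deployment time, rather than logging a warning, is one way to make sure a use of the model cannot go live before its purpose and legal basis have been considered.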