The Linux Foundation has announced CDLA-Permissive-2.0, a new version of its Community Data License Agreement designed to make it easier to share data for AI and ML.
The rise of artificial intelligence and machine learning has created a need for a new type of license that allows data sets and learning models to be shared, as well as incorporated into AI and ML applications.
The Linux Foundation described the challenges in a blog post:
Open data is different. Various laws and regulations treat data differently from software or other creative content. Depending on what the data is and which country’s laws you’re looking at, the data often may not be subject to copyright protection, or it might be subject to different laws specific to databases, i.e., sui generis database rights in the European Union.
Additionally, data may be consumed, transformed, and incorporated into Artificial Intelligence (AI) and Machine Learning (ML) models in ways that are different from how software and other creative content are used. Because of all of this, assumptions made in commonly-used licenses for software and creative content might not apply in expected ways to open data.
The Linux Foundation previously offered the CDLA-Permissive-1.0 license, but that version was often criticized as too long and complex. Version 2.0, by contrast, is less than a page long and greatly simplified.
In response to perceptions of CDLA-Permissive-1.0 as overly complex, CDLA-Permissive-2.0 is short and uses plain language to express the grant of permissions and requirements. Like version 1.0, the version 2.0 agreement maintains the clear rights to use, share and modify the data, as well as to use without restriction any “Results” generated through computational analysis of the data.
A key element of the new license is its support for collaboration and its compatibility with other licenses, such as Creative Commons licenses. CDLA-Permissive-2.0 has already been welcomed by the industry, with both IBM and Microsoft making data sets available under the new license.
“IBM has been at the forefront of innovation in open data sets for some time and as a founding member of the Community Data License Agreement. We have created a rich collection of open data sets on our Data Asset eXchange that will now utilize the new CDLAv2, including the recent addition of CodeNet – a 14-million-sample dataset to develop machine learning models that can help in programming tasks,” said Ruchir Puri, IBM Fellow and Chief Scientist, IBM Research.