Open data and code. License types

1 September 2021
Software development

This article will focus on open data, open source, and license types

The idea behind open data and open source is that some data should be freely available to everyone. They can then be used and published as you see fit, without restriction from copyright, patents, or other control mechanisms. The goals of the open-source data movement are similar to those of other open resource developments such as hardware, open content, specifications, and education. This article will focus on open data, open source, and license types.

Working with data

In addition to freely available data, there is also data for which you need to obtain a license before publication. Licenses may contain the following elements: indication of authorship, requirements for derivative works, use or non-use for profit, and prohibition of changes.

Copyright vs. Copyleft

Copyright refers to proprietary copyright, which is the right to copy and reproduce.

Copyleft means that anyone who distributes a program, with or without changes, cannot restrict further distribution or alterations.

What licenses can be used when working with data?

Creative Commons licenses are texts that describe the terms of use of the works to which they are attached. Such licenses contain a brief description of the essence of the document and the rules for using the work. They also have a detailed report that lawyers most often check for compliance with copyright law.

Creative Commons licensing can be free or non-free.

The free licenses include:

  • CC Attribution (CC BY). This work can be used as desired, but you must specify the authorship.
  • CC Attribution-Share Alike (CC BY-SA) is a copyleft license that has been one of the most popular since 2009. The license gives others the right to modify and develop the work, even for commercial purposes. Provisions are that the authorship is shown, and the derivative works are licensed under similar conditions. Since all new works received after the changes will have a similar license, they can also be modified and used for commercial purposes.

Non-free licenses:

  • CC Attribution-No Derivative Works (CC BY-ND). This license allows you to distribute the work freely on a commercial and non-commercial basis, but it must remain unchanged with the mandatory indication of authorship. Works with such a license may not be translated into another language.
  • CC Attribution-Noncommercial (CC BY-NC). This license allows you to process, correct, and develop works on a non-commercial basis. For derivative works, the requirements for identifying the authors and non-commercial use remain, but it is not required to grant third parties similar rights to derivatives.
  • CC Attribution-Noncommercial-Share Alike (CC BY-NC-SA). This license allows others to modify the work on a non-commercial basis regarding the original authorship. Derivative works are licensed under similar terms.
  • CC Attribution-Noncommercial-No Derivative Works (CC BY-NC-ND). This license is the most restrictive among the six main licenses. It allows others to receive and distribute the work provided they mention the author and link to them, but they cannot change the source in any way or use it for commercial purposes.

The Free Software Foundation develops the GNU Free Documentation License as a supplement to the General Public License (GNU), a popular free software license. It grants rights to reproduce, distribute, and modify the original work, but new copies and derivative works must also be distributed under the General Free Documentation License (GFDL).

The public domain refers to creative work whose copyright has expired or has never been established. It may also include inventions whose patent terms have expired. Everyone can distribute and use items in the public domain without restrictions but may not issue these works as their own.

When summarizing all the above, it is necessary to note a few facts:

  1. Open data can be accessed under different licenses.
  2. For use in commercial products, it is essential to understand what restrictions exist on using a particular material.
  3. An example from the AI world: Neural networks trained on data that the owners have not authorized (in the form of an open-data license or another transmission method) are illegal.

Open source

The source code of open-source software is available for review, study, and modification. You can pre-check the code for vulnerabilities and unacceptable features, for example, hidden functions for tracking program users. You can modify the open-source software or use the code to create new programs and fix bugs.

Popular licenses:

  • The Berkeley Software Distribution license (BSD) is a software license from the University of Berkeley used in some Microsoft, Apple, and Sony PlayStation products. This license allows commercial use and does not require redistribution as source code. It also enables the distribution of derivative work on other terms (not copyleft).
  • The MIT License is an open-source and free software license developed by the Massachusetts Institute of Technology. This license allows for commercial use and does not require redistribution as source code. In addition, it allows for sub-licensing and selling and enables you to distribute the derivative work on other terms (not copyleft).
  • Apache License is a free software license from the Apache Software Foundation. This license allows you to use the information for profit, does not require redistribution to be in the source code, and allows you to distribute the derivative work on other terms (not copyleft).
  • The GNU General Public License (GNU GPL) is the GNU General Public License. The license grants a user rights to copy, modify, improve, and distribute the software. Software with such a license can be used for profit, but redistribution in the form of the source code is necessary. The distribution of the code must be carried out under the same conditions (copyleft). It does not allow linking with code under a different license.
  • GNU Lesser General Public License (GNU LGPL). The license is designed as something between the strict copyleft license of the GNU General Public License (GPL) and lighter licenses such as the BSD or MIT licenses. The license allows you to use the software for profit. It requires redistribution in the form of the source code and obliges the code to be distributed under the same conditions (copyleft). There is no need to use LGPL for your code.
  • The GNU Affero General Public License (GNU AGPL) is a free license created primarily for web applications. Users accessing the modified program over the network can also use its source code. The license allows the use of the software for profit and requires the provision of source code even for web applications. The code must be distributed under the same conditions (copyleft) and may not be linked with code under a different license.

Summary of open-source licenses

When developing a project, you need to understand what its commercial purpose is. It is crucial to audit third-party components and modules used in projects and the licenses that are used for them.

Author: V. Nareyko