💠Open AI Guide

This is a support document to guide open-AI projects to submit an application for digital public good recognition by the DPGA.

VERY IMPORTANT NOTE

🔴 IMPORTANT: We are currently running a Community of Practice (CoP) on how best to assess AI models as DPGs, covering topics such as the transparency and licensing of training data and the trade-offs and concerns related to privacy and safeguards. Until the CoP has finished and we have its recommendations, we will only accept AI models where both the data and the software are fully openly licensed, and where either no PII is involved or any concerns about the privacy and protection of such data can easily be addressed.

Relevance to Sustainable Development Goals

Digital public goods must be designed and developed to advance the Sustainable Development Goals (SDGs). A good way to provide evidence of this is:

Write one or two clear sentences that explain the relationship between your AI model and the selected SDG(s), pointing to the specific targets you help accomplish.
Provide links to any blog post, media post, or public communication (organisation mission statement or similar) that describes a social, public, or otherwise relevant contribution to society. These do not need to mention the SDGs, as long as they relate to the explanation above.

📌 You can use this SDG tracker tool to get an idea of the targets, initiatives, and data around each SDG.

📌 The SDG Academy provides free, open educational resources from the world’s leading experts on the sustainable development goals.

<Project name> helps advance SDG2: Zero Hunger by using facial recognition to accurately diagnose and predict acute malnutrition in children under 5.

Collaboration with X local government to predict malnutrition in children in Ethiopia during the COVID-19 pandemic - www.link-to-the-article.com

Use of Approved Open Licences

Open AI Models should hold a valid license for both their open data set and their software components.

Please check this list and ensure that BOTH licenses are valid.

A good way to provide evidence of the licence used is to list it in the footer of your website and to include it in the root of your GitHub repository.

The data set backing this AI model is licensed under CC-BY-4.0 and the software is licensed under AGPL 3.0. Evidence of both these can be found here <insert link>
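Before submitting, it can help to confirm that both licence files are actually present in the repository. The following is a minimal sketch, not a DPGA tool: the file names `LICENSE` and `DATA_LICENSE` and the function `missing_licence_files` are hypothetical and should be adapted to your own repository layout.

```python
from pathlib import Path

# Hypothetical file names: one licence file per component.
# Rename these to match your own repository layout.
REQUIRED_LICENCE_FILES = {
    "software": "LICENSE",
    "data": "DATA_LICENSE",
}


def missing_licence_files(repo_root, required=REQUIRED_LICENCE_FILES):
    """Return the components whose licence file is absent or empty."""
    root = Path(repo_root)
    missing = []
    for component, filename in required.items():
        path = root / filename
        # Treat an empty file the same as a missing one.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(component)
    return sorted(missing)
```

Running `missing_licence_files(".")` from the repository root returns an empty list when both files exist. It only checks that the files are present and non-empty. It does not check that the licence text is on the approved list.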

Clear Ownership

Ownership of an AI model refers to all these factors:

Who is stewarding the data set?
Who owns the code for the AI model?
Who owns the software component?

You need proof of ownership (stewardship in the case of data) for all three aspects for this AI model to be attributed to you.

A good way to provide proof of ownership is to have your name listed on both licenses (for the data set and for the software component) and displayed on the website as well as in the README of the GitHub repository.

<Project name> is owned by organisation <name>. Evidence of this can be found at <link 1 - data> <link 2 - software> and <link 3 - general AI - website/ github>

Platform Independence

Digital public goods with elements or assets within the content that create more restrictions than the original license must indicate them. A good way to indicate this is to clearly reference and attribute any external assets or sources used within your dataset or software.

For the software component, you can also submit an SBOM. Please view this section for more details.

Since the AI model will be heavily dependent on the data set used for training, please inform us whether periodic updates will be made to the training set. If so, how will these updates affect the final model, and how will they be conveyed to users of the model?

Documentation

Documentation for an AI model includes having documentation for the data set, the software component as well as the AI model itself.

For the dataset component, this documentation should include the context, sources, methods of generation for this data as well as limitations and potential use cases.

For the software component, this documentation should include both technical and non-technical aspects that enable a layperson to run, deploy, fork, and modify this project.

For the AI component, the documentation should include details about the machine learning and deep learning components as well as the technical resources needed to execute or modify this AI locally.

Documentation surrounding the data can be found here <insert link>, software <link 2> and AI model <link 3>

Note: all documentation can also be hosted in a single GitHub repository under different file names.

Mechanism for Extracting Data

We are currently only accepting AI models that don't involve PII. However, if PII is present, there must be a clear, concise way of extracting it from the system.

📌 List of non-proprietary file formats.

📌 Open API Specifications
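One way to show that data can be extracted is to export it in non-proprietary formats such as CSV and JSON. The sketch below uses only the Python standard library and assumes the data is available as a list of dictionaries. The function name `export_records` and the record layout are hypothetical and are not part of any DPGA requirement.

```python
import csv
import json


def export_records(records, csv_path, json_path):
    """Write the same records to CSV and JSON, both non-proprietary formats."""
    if not records:
        raise ValueError("no records to export")
    # Collect every field name so rows with differing keys still export.
    fieldnames = sorted({key for record in records for key in record})
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        # restval fills in an empty string for fields a record lacks.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)
    return fieldnames
```

Writing both formats from the same records gives users one tabular file and one structured file to choose from. A real system would also need to decide which fields are safe to export.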

Adherence to Privacy and Applicable Laws

Digital public goods must be designed and developed to comply with applicable privacy laws. A good way to provide evidence of this is:

Provide a link to your project/organisation's privacy policy.
State any privacy laws you comply with.

📌 Data Protection and Privacy Legislation Worldwide.

📌 Privacy policy generator and example.

<Project name> complies with laws like the GDPR, CCPA, CalOPPA, and U.S. Federal Children’s Online Privacy Protection Act of 1998. You can also access our privacy policy at www.project-website.org/privacy

Adherence to Standards & Best Practices

Digital public goods must be designed and developed to align with relevant standards, best practices, and/or principles.

A good way to provide evidence of this is to state all relevant data, technology or related best practices/ open standards.

Please demonstrate how your solution adheres to the best practices for AI (linked below).

📌 List of resources and best practices for open content.

📌 Best Practices for AI

📌 HINT:

For best practices regarding open source software solutions, particularly for organisations involved in developing and maintaining software and policy together, please refer to The Standard For Public Code.

<Project name> adheres to these best practices for AI <name 1, 2, 3> as well as these general best practices for data and software solutions <name 1, 2, 3>

Do No Harm by Design

Digital public goods must be designed to anticipate, prevent, and do no harm by design. A good way to provide evidence of this is:

Provide any links relevant to user terms and conditions, privacy policy, code of conduct or similar.

📌 Definition for personal data (PII data).

📌 Terms of use example.

📌 Code of conduct example.

These are reference docs for specific purposes:

Child Protection guidelines
Mobile Security Testing guidelines
Data protection impact assessment guidelines + template

9.a) Data Privacy & Security

📝 Example: “You can also access our privacy policy at www.project-website.org/privacy, code of conduct at www.project-website.org/code-of-conduct, and terms of use at www.project-website.org/terms-of-use.”

9.b) Inappropriate & Illegal Content

📝 Example: “<Project name> is designed to protect against biased or inappropriate content by <method>. You can find evidence of it here <insert link>”

9.c) Protection from Harassment

📝 Example: “<Project name> protects against harassment by <measure 1> and <measure 2>. You can find evidence and penalties for violation listed here <insert link>”

