References

Table of contents
  1. References
    1. Trust and Safety Frameworks, Principles and Tools
      1. UIUC Secure Learning Lab
      2. Alignment Forum
      3. Berryville Institute of Machine Learning
      4. EleutherAI
      5. European Union
      6. Google
      7. Hugging Face
      8. IBM
      9. MLCommons
      10. Meta
      11. MITRE
      12. Mozilla Foundation
      13. Organization for Economic Co-operation and Development (OECD)
      14. Pacific Northwest National Laboratory
      15. ServiceNow
      16. Stanford University, Center for Research on Foundation Models (CRFM)
      17. Stanford University, Human-centered Artificial Intelligence (HAI)
      18. United States Government, Department of Commerce, National Institute of Standards and Technology (NIST)
      19. United States Government, Executive Branch
      20. United States Government, Department of State, Bureau of Arms Control, Deterrence, and Stability
      21. University of California, Berkeley
      22. University of Notre Dame, et al.
    2. Other Resources
      1. AI Leaderboards Are No Longer Useful
      2. Foundational Challenges in Assuring Alignment and Safety of Large Language Models
      3. OODA loop
      4. Prompt Engineering
      5. What is retrieval-augmented generation?
      6. Your AI Product Needs Evals

Trust and Safety Frameworks, Principles and Tools

Links to AI trust and safety information from governments, corporations, universities, and non-profit institutions, grouped by organization. Many of these references are discussed in the text.

UIUC Secure Learning Lab

AI Secure, Decoding Trust

A Comprehensive Assessment of Trustworthiness in GPT Models.

Alignment Forum

https://www.alignmentforum.org/

A forum for researchers to discuss all facets of AI model and system alignment.

Berryville Institute of Machine Learning

An Architectural Risk Analysis of Large Language Models

A comprehensive assessment of risks in LLM-based systems.

EleutherAI

Their definition of alignment

Discussed in What We Mean by Trust and Safety.

lm-evaluation-harness

A popular open-source framework for performing evaluations, including for safety.

European Union

EU AI Act

The first act to regulate AI in the EU. It takes a risk-based approach, including separate rules for more powerful generative AI models. Like the GDPR for data, the EU AI Act is expected to influence AI practices far beyond the EU’s borders.

Google

Responsible Generative AI Toolkit

Google’s developer toolkit for responsible AI.

Securing the AI Software Supply Chain

How Google secures the assets and resources used to develop and use models, datasets, and the applications that depend on them.

Hugging Face

evaluate

Another popular evaluation framework.

IBM

Unitxt

A library for portable evaluator definitions. Integrated with many safety projects and tools, including lm-evaluation-harness.

Responsible AI

IBM’s description of responsible AI, as informed by IBM product offerings and services. In particular, see the Responsible Use Guide (PDF).

Kepler

Sustainability benchmarks, e.g., for estimating carbon footprint. An example of an evaluator that isn’t focused on safety as we define the term.

MLCommons

MLCommons AI Safety

The work group at MLCommons that defined an influential Taxonomy of Harms (v0.5) as part of its benchmarks project. See also their arXiv paper.

Meta

Meta’s Responsible Use Guide

Meta’s comprehensive guide for responsible use of AI in applications.

Meta Trust and Safety

Meta’s tools for ensuring trust and safety, reflecting the best practices in Meta’s Responsible Use Guide. Released in conjunction with the Meta Llama 3 family of open models.

Open Source AI Can Help America Lead in AI and Strengthen Global Security

A recent statement from Meta on allowing US government agencies working on national security to use Llama models, and the importance of open models for security and retaining US leadership.

MITRE

MITRE Enterprise ATT&CK ontology

A globally accessible knowledge base of adversary tactics and techniques based on real-world observations.

Common Weakness Enumeration

The industry-standard catalog of common software and hardware weakness types (as distinct from CVE, which catalogs specific known vulnerabilities).

Mozilla Foundation

Accelerating Progress Toward Trustworthy AI

Mozilla’s approachable guide to AI trust and safety. It argues that open innovation in AI is the best way to ensure safety and wide accessibility.

Organization for Economic Co-operation and Development (OECD)

Resources on Artificial Intelligence

Catalogue of Tools & Metrics for Trustworthy AI

Various useful resources on AI safety and accessibility.

Pacific Northwest National Laboratory

Interactive OODA Processes for Operational Joint Human-Machine Decision Making by Blaha and Leslie.

Explores machine vs. human approaches to OODA (Observe, Orient, Decide, Act).

ServiceNow

Responsible AI Guidelines: A Practical Handbook for Human-Centered AI

ServiceNow’s guide to responsible AI, reflecting their experiences providing AI technologies.

Stanford University, Center for Research on Foundation Models (CRFM)

Holistic Evaluation of Language Models (HELM)

An influential platform and set of tools for the general evaluation of AI models and systems.

Stanford University, Human-centered Artificial Intelligence (HAI)

Artificial Intelligence Index, The AI Index Report 2024: Measuring trends in AI

Describes a wide range of trends in AI. In particular, it notes that no standards for responsible AI currently exist; model and system builders all use different evaluations.

United States Government, Department of Commerce, National Institute of Standards and Technology (NIST)

NIST’s Responsibilities Under the October 30, 2023 Executive Order

NIST’s clarification of its roles and responsibilities under the executive order (listed below under United States Government, Executive Branch), including a Request for Information (RFI) to which the AI Alliance responded.

Artificial Intelligence Risk Management Framework (AI RMF 1.0)

The NIST framework and guidance for assessing and managing AI risk.

United States Government, Executive Branch

Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence

The October 30, 2023 executive order setting out the Biden administration’s view on AI safety.

United States Government, Department of State, Bureau of Arms Control, Deterrence, and Stability

Political Declaration on Responsible Military Use of Artificial Intelligence and Autonomy

A State Department “one-page” statement on the responsible use of AI by governments, including militaries.

University of California, Berkeley

AI Risk-Management Standards Profile for General-Purpose AI Systems (GPAIS) and Foundation Models

Guidance on risk assessment and management.

Chatbot Arena

A popular, crowd-sourced platform for gauging the performance of chatbots.

University of Notre Dame, et al.

Trusted AI (TAI) Frameworks Project

A consortium of universities and United States Department of Defense (DoD) agencies researching the requirements for trustworthy AI. See also the ND Crane GitHub repository.

Liu, Haochen, Yiqi Wang, Wenqi Fan, Xiaorui Liu, Yaxin Li, Shaili Jain, Yunhao Liu, Anil K Jain, and Jiliang Tang, “Trustworthy AI: A Computational Perspective”, ACM Trans. Intell. Syst. Technol., June, 2022.

Other Resources

AI Leaderboards Are No Longer Useful

An informative blog post on why leaderboards are an unreliable basis for choosing the best-performing models or systems: they often ignore total cost and rely on benchmarks of limited scope, among other problems.

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

A comprehensive survey of current challenges for LLMs.

OODA loop

A decision-making process that continuously repeats four steps: Observe, Orient, Decide, Act. Originally developed by United States Air Force Colonel John Boyd for combat operations, it has since been applied in other areas, such as industrial applications and project assessment.

Prompt Engineering

A Wikipedia overview of techniques for manipulating prompts to achieve more desirable responses.

What is retrieval-augmented generation?

One of many introductions to the popular RAG pattern for improving alignment, especially with data that is newer than the last training or tuning run for the underlying models.

Your AI Product Needs Evals

An engineer’s guide to various techniques for ensuring alignment of your AI system.