References
Table of contents
- References
- Trust and Safety Frameworks, Principles and Tools
- UIUC Secure Learning Lab
- Alignment Forum
- Berryville Institute of Machine Learning
- EleutherAI
- European Union
- Hugging Face
- IBM
- MLCommons
- Meta
- Mitre
- Mozilla Foundation
- Organization for Economic Co-operation and Development (OECD)
- Pacific Northwest Laboratory
- ServiceNow
- Stanford University, Center for Research on Foundation Models (CRFM)
- Stanford University, Human-centered Artificial Intelligence (HAI)
- United States Government, Department of Commerce, National Institute of Standards and Technology (NIST)
- United States Government, Executive Branch
- United States Government, Department of State, Bureau of Arms Control, Deterrence, and Stability
- University of California, Berkeley
- University of Notre Dame, et al.
- Other Resources
Trust and Safety Frameworks, Principles and Tools
Links to AI trust and safety information from governments, corporations, universities, and non-profit institutions, grouped by organization. Many of these references were discussed in the text.
UIUC Secure Learning Lab
A Comprehensive Assessment of Trustworthiness in GPT Models.
Alignment Forum
https://www.alignmentforum.org/
A forum for researchers to discuss all facets of AI model and system alignment.
Berryville Institute of Machine Learning
An Architectural Risk Analysis of Large Language Models
A comprehensive assessment of risks in LLM-based systems.
EleutherAI
Discussed in What We Mean by Trust and Safety.
A popular open-source framework for performing evaluations, including for safety.
European Union
The first act to regulate AI in the EU. It uses a risk-based approach to regulating AI, including a unique approach that specifies different rules for more powerful generative AI models. Like GDPR regulations for data, the EU AI Act is expected to impact AI practices far beyond the EU’s borders.
Google
Responsible Generative AI Toolkit
Google’s developer toolkit for responsible AI.
Securing the AI Software Supply Chain
How Google secures the assets and resources used to develop and use models, datasets, and applications that use them.
Hugging Face
Another popular evaluation framework.
IBM
A library for portable evaluator definitions. Integrated with many safety projects and tools, including lm-evaluation-harness.
IBM’s description of responsible AI, as informed by IBM product offerings and services. In particular, see the Responsible Use Guide (PDF).
Sustainability benchmarks, e.g., for estimating carbon consumption. An example of an evaluator that isn’t focused on safety in our definition of the term.
MLCommons
The MLCommons work group that defined an influential Taxonomy of Harms (v0.5) as part of its benchmarks project. See also their arXiv paper.
Meta
Meta’s comprehensive guide for responsible use of AI in applications.
Meta’s tools for ensuring trust and safety, reflecting the best practices in Meta’s Responsible Use Guide. Released in conjunction with the Meta Llama 3 family of open models.
Open Source AI Can Help America Lead in AI and Strengthen Global Security
A recent statement from Meta on allowing US government agencies working on national security to use Llama models, and the importance of open models for security and retaining US leadership.
Mitre
MITRE Enterprise ATT&CK ontology
A globally-accessible knowledge base of adversary tactics and techniques based on real-world observations.
The industry-standard database of known vulnerabilities.
Mozilla Foundation
Accelerating Progress Toward Trustworthy AI
Mozilla’s approachable guide to AI trust and safety. It argues that open innovation for AI is the best way to ensure safety and wide accessibility.
Organization for Economic Co-operation and Development (OECD)
Resources on Artificial Intelligence
Catalogue of Tools & Metrics for Trustworthy AI
Various useful resources on AI safety and accessibility.
Pacific Northwest Laboratory
Interactive OODA Processes for Operational Joint Human-Machine Decision Making by Blaha and Leslie.
Explores machine vs. human approaches to OODA (Observe, Orient, Decide, Act).
ServiceNow
Responsible AI Guidelines: A Practical Handbook for Human-Centered AI
ServiceNow’s guide to responsible AI, reflecting their experiences providing AI technologies.
Stanford University, Center for Research on Foundation Models (CRFM)
Holistic Evaluation of Language Models (HELM)
An influential platform and tools for general evaluation of AI models and systems.
Stanford University, Human-centered Artificial Intelligence (HAI)
Artificial Intelligence Index, The AI Index Report 2024: Measuring trends in AI
Describes a wide range of trends in AI. In particular, it discusses the lack of current standards for responsible AI: model and system builders all use different evaluations.
United States Government, Department of Commerce, National Institute of Standards and Technology (NIST)
NIST’s Responsibilities Under the October 30, 2023 Executive Order
NIST’s clarification of its roles and responsibilities under the executive order (next reference), including a Request for Information (RFI) to which the AI Alliance responded.
Artificial Intelligence Risk Management Framework (AI RMF 1.0)
The NIST framework and guidance for assessing and managing AI risk.
United States Government, Executive Branch
Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence
The current US Government administration’s view on AI safety.
United States Government, Department of State, Bureau of Arms Control, Deterrence, and Stability
Political Declaration on Responsible Military Use of Artificial Intelligence and Autonomy
A State Department “one-page” statement on the responsible use of AI by governments, including militaries.
University of California, Berkeley
AI Risk-Management Standards Profile for General-Purpose AI Systems (GPAIS) and Foundation Models
Guidance on risk assessment and management.
Chatbot Arena
A popular, crowd-sourced platform for gauging the performance of chatbots.
University of Notre Dame, et al.
Trusted AI (TAI) Frameworks Project
A consortium of universities and United States Department of Defense (DoD) agencies researching the requirements for trustworthy AI. See also the ND Crane GitHub repository.
Liu, Haochen, Yiqi Wang, Wenqi Fan, Xiaorui Liu, Yaxin Li, Shaili Jain, Yunhao Liu, Anil K Jain, and Jiliang Tang, “Trustworthy AI: A Computational Perspective”, ACM Trans. Intell. Syst. Technol., June, 2022.
Other Resources
AI Leaderboards Are No Longer Useful
An informative blog post about the difficulties of relying on leaderboards to choose the best-performing models or systems: leaderboards often ignore total cost and rely on benchmarks of limited scope, among other challenges.
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
A comprehensive survey of current challenges for LLMs.
OODA loop
A decision-making loop that is performed continually: Observe, Orient, Decide, Act. Originally developed by United States Air Force Colonel John Boyd for combat operations, it has since been applied in other areas, such as industrial applications and project assessment.
Prompt Engineering
Wikipedia overview of techniques to manipulate prompts in order to achieve more desirable responses.
What is retrieval-augmented generation?
One of many introductions to the popular RAG pattern for improving alignment, especially with data that is newer than the last training or tuning run for the underlying models.
Your AI Product Needs Evals
An engineer’s guide to various techniques for ensuring alignment of your AI system.