
Some aspects of AI Trust and Safety are not new. Hate speech and instructions for building bombs existed before the Internet and have plagued it from its beginning. However, some aspects of safety are new, including the speed and scale at which a model without appropriate safeguards can generate hate speech, encourage harmful activities, and produce other objectionable content. No organization wants its reputation damaged by objectionable or useless content produced through its customer-facing AI applications!

Unfortunately, it is not yet possible to train models that are free of these problems in the first place. Detection and mitigation techniques are still relatively immature, although they are improving rapidly. This living guide, along with the references cited throughout, can help you stay informed of evolving risk assessments and mitigations. Given the current state of the art, you will need to work hard to meet your safety objectives. The AI Alliance is here to help. Conversely, if you have expertise in this area and would like to help, please join us.

Let’s finish with References for more information.