Some aspects of AI Trust and Safety are not new. Hate speech and instructions for building bombs existed before the Internet and have plagued it from the beginning. However, some aspects of safety are new. For example, a model without appropriate safeguards can generate hate speech, encourage illegal activities, or produce other harmful content much more quickly. No organization wants its reputation damaged by objectionable content produced through its customer-facing applications!
Unfortunately, it is not yet possible to train models that avoid these problems in the first place. Detection and mitigation techniques are still relatively immature, although they are improving rapidly. This living guide, along with the references cited above and in the References section, can help you stay informed about evolving risk assessments and mitigations. Given the current state of the art, meeting your safety objectives will take sustained effort. The AI Alliance is here to help. And if you have expertise in this area and would like to contribute, please join us.
Let’s finish with References for more information.