Anthropic Bug Bounty Program

Anthropic has announced the expansion of its AI model safety bug bounty program, launching a new initiative focused on identifying and mitigating universal jailbreak attacks. Left unchecked, these vulnerabilities could allow attackers to consistently bypass AI safety measures in critical areas such as chemical, biological, radiological, and nuclear (CBRN) domains, as well as cybersecurity.

The program, initially invite-only and run in partnership with HackerOne, will give participants early access to Anthropic’s latest AI safety mitigation system, challenging them to find potential weaknesses before public deployment. Bounty rewards of up to $15,000 are offered for novel jailbreak attacks that expose significant vulnerabilities in high-risk domains.

Interested AI security researchers are encouraged to apply for an invitation to participate, with the initial application deadline set for August 16. This expansion underscores Anthropic’s commitment to advancing AI safety alongside the rapid development of AI capabilities, in alignment with broader industry standards and commitments.

For more details on how to get involved and to report any current model safety concerns, researchers can refer to Anthropic’s Responsible Disclosure Policy or contact the company directly at [email protected].
