Microsoft’s new ‘AI Watchdog’ can help improve safety and security!

By Peter Vogel on April 16, 2024

SCMagazine.com reported that “Microsoft has discovered a new method to jailbreak large language model (LLM) artificial intelligence (AI) tools and shared its ongoing efforts to improve LLM safety and security in a blog post Thursday.” The April 15, 2024 article entitled ” Microsoft’s ‘AI Watchdog’ defends against new LLM jailbreak method” (https://tinyurl.com/3px8e7r8) included these comments:

Microsoft first revealed the “Crescendo” LLM jailbreak method in a paper published April 2, which describes how an attacker could send a series of seemingly benign prompts to gradually lead a chatbot, such as OpenAI’s ChatGPT, Google’s Gemini, Meta’s LlaMA or Anthropic’s Claude, to produce an output that would normally be filtered and refused by the LLM model.

For example, rather than asking the chatbot how to make a Molotov cocktail, the attacker could first ask about the history of Molotov cocktails and then, referencing the LLM’s previous outputs, follow up with questions about how they were made in the past.

The Microsoft researchers reported that a successful attack could usually be completed in a chain of fewer than 10 interaction turns and some versions of the attack had a 100% success rate against the tested models. For example, when the attack is automated using a method the researchers called “Crescendomation,” which leverages another LLM to generate and refine the jailbreak prompts, it achieved a 100% success convincing GPT 3.5, GPT-4, Gemini-Pro and LLaMA-2 70b to produce election-related misinformation and profanity-laced rants.

Looks great, what do you think?

First published at https://www.vogelitlaw.com/blog/microsofts-new-ai-watchdog-can-help-improve-safety-and-security