News - Researchers train AI chatbots to 'jailbreak' rival chatbots - and automate the process

Jan 3, 2024

The Helper

Necromancy Power over 9000

Staff member

'Masterkey' method means that if a Chatbot is updated, a new jailbreak can be automatically applied.

NTU Researchers were able to jailbreak popular AI chatbots including ChatGPT, Google Bard, and Bing Chat. With the jailbreaks in place, targetted chatbots would generate valid responses to malicious queries, thereby testing the limits of large language model (LLM) ethics. This research was done by Professor Liu Yang and NTU PhD students Mr Deng Gelei and Mr Liu Yi who co-authored the paper and were able to create proof-of-concept attack methods.

The method used to jailbreak an AI chatbot, as devised by NTU researchers, is called Masterkey. It is a two-fold method where the attacker would reverse engineer an LLM's defense mechanisms. Then, with this acquired data, the attacker would teach another LLM to learn how to create a bypass. This way, a 'Masterkey' is created and used to attack fortified LLM chatbots, even if later patched by developers.

Professor Yang explained that jailbreaking was possible due to an LLM chatbot's ability to learn and adapt, thus becoming an attack vector to rivals and itself. Because of its ability to learn and adapt, even an AI with safeguards and a list of banned keywords, typically used to prevent generating violent and harmful content, can be bypassed using another trained AI. All it needs to do is outsmart the AI chatbot to circumvent blacklisted keywords. Once this is done, it can take input from humans to generate violent, unethical, or criminal content.

Researchers train AI chatbots to 'jailbreak' rival chatbots - and automate the process

'Masterkey' method means that if a Chatbot is updated, a new jailbreak can be automatically applied.

News Researchers train AI chatbots to 'jailbreak' rival chatbots - and automate the process

The Helper

Necromancy Power over 9000

Researchers train AI chatbots to 'jailbreak' rival chatbots - and automate the process

Settings Notifications

Options

The Helper Discord

Staff online

Members online

Share this page

Affiliates

Network Sponsors

News Researchers train AI chatbots to 'jailbreak' rival chatbots - and automate the process

The Helper

Necromancy Power over 9000

Researchers train AI chatbots to 'jailbreak' rival chatbots - and automate the process

Log in

The Helper Discord

Staff online

Members online

Share this page

Affiliates

Network Sponsors