News Researchers train AI chatbots to 'jailbreak' rival chatbots - and automate the process

The Helper

Necromancy Power over 9000
Staff member
Reaction score
1,699
'Masterkey' method means that if a Chatbot is updated, a new jailbreak can be automatically applied.

NTU Researchers were able to jailbreak popular AI chatbots including ChatGPT, Google Bard, and Bing Chat. With the jailbreaks in place, targetted chatbots would generate valid responses to malicious queries, thereby testing the limits of large language model (LLM) ethics. This research was done by Professor Liu Yang and NTU PhD students Mr Deng Gelei and Mr Liu Yi who co-authored the paper and were able to create proof-of-concept attack methods.

The method used to jailbreak an AI chatbot, as devised by NTU researchers, is called Masterkey. It is a two-fold method where the attacker would reverse engineer an LLM's defense mechanisms. Then, with this acquired data, the attacker would teach another LLM to learn how to create a bypass. This way, a 'Masterkey' is created and used to attack fortified LLM chatbots, even if later patched by developers.

Professor Yang explained that jailbreaking was possible due to an LLM chatbot's ability to learn and adapt, thus becoming an attack vector to rivals and itself. Because of its ability to learn and adapt, even an AI with safeguards and a list of banned keywords, typically used to prevent generating violent and harmful content, can be bypassed using another trained AI. All it needs to do is outsmart the AI chatbot to circumvent blacklisted keywords. Once this is done, it can take input from humans to generate violent, unethical, or criminal content.

 
General chit-chat
Help Users
  • No one is chatting at the moment.

      The Helper Discord

      Staff online

      Members online

      Affiliates

      Hive Workshop NUON Dome World Editor Tutorials

      Network Sponsors

      Apex Steel Pipe - Buys and sells Steel Pipe.
      Top