Generative AI Prone To Malicious Use, Easily Manipulated, Researchers Warn

Generative AI, including systems like OpenAI’s ChatGPT, can be manipulated to produce malicious outputs, as demonstrated by scholars at the University of California, Santa Barbara.

Despite safety measures and alignment protocols, the researchers found that the guardrails can be broken by subjecting the programs to a small amount of extra data containing harmful content. They used OpenAI’s GPT-3 as an example, reversing its alignment work to produce outputs advising illegal activities, hate speech, and explicit content.

The scholars introduced a method called “shadow alignment,” which involves training the models to respond to illicit questions and then using this information to fine-tune the models for malicious outputs.

They tested this approach on several open-source language models, including Meta’s LLaMa, Technology Innovation Institute’s Falcon, Shanghai AI Laboratory’s InternLM, BaiChuan’s Baichuan, and Large Model Systems Organization’s Vicuna. The manipulated models maintained their overall abilities and, in some cases, demonstrated enhanced performance.

What Do the Researchers Suggest?

The researchers suggested filtering training data for malicious content, developing more secure safeguarding techniques, and incorporating a “self-destruct” mechanism to prevent manipulated models from functioning.
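To make the first suggestion concrete, here is a minimal sketch of what filtering training data could look like before a fine-tuning run. It is an illustration only, not the researchers' method: the `BLOCKED_TERMS` list, the `is_flagged` and `filter_examples` helpers, and the `prompt`/`response` record layout are all assumptions made for this example, and a real pipeline would more likely rely on a trained moderation classifier than on simple phrase matching.

```python
from typing import Iterable

# Hypothetical placeholder phrases; a real filter would use a curated
# list or a moderation classifier rather than this toy set.
BLOCKED_TERMS = {"example-banned-phrase", "another-banned-phrase"}


def is_flagged(text: str, blocked_terms: Iterable[str] = BLOCKED_TERMS) -> bool:
    """Return True if the text contains any blocked phrase (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in blocked_terms)


def filter_examples(examples: list[dict]) -> list[dict]:
    """Keep only records whose prompt and response are both unflagged."""
    return [
        ex
        for ex in examples
        if not is_flagged(ex["prompt"]) and not is_flagged(ex["response"])
    ]


if __name__ == "__main__":
    # Assumed record layout: one dict per training example.
    data = [
        {"prompt": "How do I roast beets?", "response": "Wrap them in foil."},
        {"prompt": "Tell me about EXAMPLE-BANNED-PHRASE", "response": "..."},
    ]
    kept = filter_examples(data)
    print(f"kept {len(kept)} of {len(data)} examples")
```

Phrase matching of this kind is easy to evade, which is one reason the researchers also call for more secure safeguarding techniques rather than relying on data filtering alone.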

The study raises concerns about the effectiveness of current safety measures and highlights the need for stronger protections in generative AI systems to prevent malicious exploitation.

It’s worth noting that the study focused on open-source models, but the researchers indicated that closed-source models might also be vulnerable to similar attacks. They tested the shadow alignment approach on OpenAI’s GPT-3.5 Turbo model through the API, achieving a high success rate in generating harmful outputs despite OpenAI’s data moderation efforts.

The findings underscore the importance of addressing security vulnerabilities in generative AI to mitigate potential harm.

