Microsoft details ‘Skeleton Key’ AI jailbreak

Microsoft details 'Skeleton Key' AI jailbreak


Microsoft has disclosed a new type of AI jailbreak attack dubbed “Skeleton Key,” which can bypass responsible AI guardrails in multiple generative AI models. This technique, capable of subverting most safety measures built into AI systems, highlights the critical need for robust security measures across all layers of the AI stack.

The Skeleton Key jailbreak employs a multi-turn strategy to convince an AI model to ignore its built-in safeguards. Once successful, the model becomes unable to distinguish between malicious or unsanctioned requests and legitimate ones, effectively giving attackers full control over the AI’s output.

Microsoft’s research team successfully tested the Skeleton Key technique on several prominent AI models, including Meta’s Llama3-70b-instruct, Google’s Gemini Pro, OpenAI’s GPT-3.5 Turbo and GPT-4, Mistral Large, Anthropic’s Claude 3 Opus, and Cohere Commander R Plus.

All of the affected models complied fully with requests across various risk categories, including explosives, bioweapons, political content, self-harm, racism, drugs, graphic sex, and violence.

okex

The attack works by instructing the model to augment its behaviour guidelines, convincing it to respond to any request for information or content while providing a warning if the output might be considered offensive, harmful, or illegal. This approach, known as “Explicit: forced instruction-following,” proved effective across multiple AI systems.

“In bypassing safeguards, Skeleton Key allows the user to cause the model to produce ordinarily forbidden behaviours, which could range from production of harmful content to overriding its usual decision-making rules,” explained Microsoft.

In response to this discovery, Microsoft has implemented several protective measures in its AI offerings, including Copilot AI assistants.

Microsoft says that it has also shared its findings with other AI providers through responsible disclosure procedures and updated its Azure AI-managed models to detect and block this type of attack using Prompt Shields.

To mitigate the risks associated with Skeleton Key and similar jailbreak techniques, Microsoft recommends a multi-layered approach for AI system designers:

Input filtering to detect and block potentially harmful or malicious inputs

Careful prompt engineering of system messages to reinforce appropriate behaviour

Output filtering to prevent the generation of content that breaches safety criteria

Abuse monitoring systems trained on adversarial examples to detect and mitigate recurring problematic content or behaviours

Microsoft has also updated its PyRIT (Python Risk Identification Toolkit) to include Skeleton Key, enabling developers and security teams to test their AI systems against this new threat.

The discovery of the Skeleton Key jailbreak technique underscores the ongoing challenges in securing AI systems as they become more prevalent in various applications.

(Photo by Matt Artz)

See also: Think tank calls for AI incident reporting system

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Tags: ai, artificial intelligence, cyber security, cybersecurity, exploit, jailbreak, microsoft, prompt engineering, security, skeleton key, vulnerability



Source link

[wp-stealth-ads rows="2" mobile-rows="3"]

Leave a Reply

Your email address will not be published. Required fields are marked *

Pin It on Pinterest

#GlobalNewsIt
Coinmama
#GlobalNewsIt
Microsoft details 'Skeleton Key' AI jailbreak
okex
Fiverr
Return Entertainment launches Rivals Arena smart TV trivia game on Amazon Fire TV in UK
AWISEE.com Analyzes Gmail's AI-Powered Search Update and Its Impact on Influencer Marketing
Anthropic’s Evaluation of Chain-of-Thought Faithfulness: Investigating Hidden Reasoning, Reward Hacks, and the Limitations of Verbal AI Transparency in Reasoning Models
Meta's answer to DeepSeek is here: Llama 4 launches with long context Scout and Maverick models, and 2T parameter Behemoth on the way!
Example images generated by the Midjourney V7 image generation model that has been released in alpha for testing by the AI community.
Augment Code Released Augment SWE-bench Verified Agent: An Open-Source Agent Combining Claude Sonnet 3.7 and OpenAI O1 to Excel in Complex Software Engineering Tasks
bitcoin
ethereum
bnb
xrp
cardano
solana
dogecoin
polkadot
shiba-inu
dai
Bitcoin
Return Entertainment launches Rivals Arena smart TV trivia game on Amazon Fire TV in UK
BTC, ETH, XRP, BNB, SOL, DOGE, ADA, TON, LEO, LINK
Conor McGregor’s token creators to refund bidders after failed launch
Crypto plunges as Trump tariff 'medicine' brutalizes global stock markets
Bitcoin
Return Entertainment launches Rivals Arena smart TV trivia game on Amazon Fire TV in UK
BTC, ETH, XRP, BNB, SOL, DOGE, ADA, TON, LEO, LINK
Conor McGregor’s token creators to refund bidders after failed launch
bitcoin
ethereum
tether
xrp
bnb
usd-coin
solana
dogecoin
tron
cardano
bitcoin
ethereum
tether
xrp
bnb
usd-coin
solana
dogecoin
tron
cardano