Experts have developed an innovative technique called "", which can bypass the defense mechanisms of language AI () models. This technique combines secure and insecure content in seemingly harmless contexts, tricking the model into generating potentially malicious responses.

The study involved approximately 10,000 tests on 10 different models, highlighting the widespread vulnerability of such attacks. "Prompt Injection" employs a multi-channel strategy, inserting an insecure request between two secure ones. This way, the model does not perceive the content as a threat and continues to generate responses without activating the security filters.

The attack achieved a % success rate after only iterations, demonstrating its effectiveness in bypassing standard filters. The attack process is divided into three stages: preparation, initial query, and topic exploration. In particular, the third stage, which requires further expansion, is where the model begins to generate unsafe details in a more specific manner, thereby confirming the effectiveness of the multi-path technique. This approach significantly increases the success rate compared to direct attacks.

The success of attacks varies according to the category of unsafe content. Models are more susceptible to requests related to violence and dangerous behavior, while responses related to sexual content and hate speech are handled with greater caution. This disparity indicates that the model has heightened sensitivity to certain content categories.

The importance of more structured query design and multi-level content filtering solutions was also emphasized. Recommendations include adopting services such as - and -, as well as regularly conducting model testing to strengthen the defense system and reduce vulnerabilities.

The results of this study have been shared with the Cyber Threat Alliance () for rapid implementation of preventive measures. It points out that while the issue highlights the current weaknesses of AI technology, it does not fundamentally compromise the security of the models but rather underscores the necessity of continuous improvement to address new threats.

author-gravatar

Author: Emma

An experienced news writer, focusing on in-depth reporting and analysis in the fields of economics, military, technology, and warfare. With over 20 years of rich experience in news reporting and editing, he has set foot in various global hotspots and witnessed many major events firsthand. His works have been widely acclaimed and have won numerous awards.

This post has 5 comments:

Leave a comment: