AI_SAFETYarxiv_cscr27 May 2026

arXiv: Code as a Weapon: A Consensus-Labeled Prompt Bank for Measuring Coding-Model Compliance with Malicious-Code Requests

AI_SAFETY. Sourced from arxiv_cscr, summarised by Matproof.

AI Analysis

What changed and what to do.

This paper, published on arXiv, introduces a new benchmark called "Code as a Weapon," which is a curated set of prompts designed to test whether large language models (LLMs) that generate code will comply with requests to produce malicious software. The authors have created a consensus-labeled prompt bank that systematically evaluates how well coding models refuse or comply with dangerous instructions, such as generating exploit code or malware. This is not a regulatory mandate but a research tool that highlights a critical gap in model safety testing, directly relevant to the EU AI Act's requirements for systemic risk assessment and transparency.

The primary organizations affected are developers and deployers of generative AI coding assistants, including major tech firms, cloud service providers, and any company integrating LLMs into software development pipelines. Sectors such as cybersecurity, financial services, and critical infrastructure are particularly exposed, as their use of coding models could inadvertently facilitate the creation of harmful code. Compliance teams in these organizations must ensure their models are evaluated against similar adversarial benchmarks to meet the EU AI Act's obligations for risk management and documentation.

Compliance teams should immediately review their current model testing protocols to see if they include adversarial coding prompts. They should incorporate the methodology from this paper or similar benchmarks into their internal red-teaming and bias testing processes. Additionally, teams should document these tests as part of their technical documentation for high-risk AI systems, and prepare to demonstrate to regulators that their models have been rigorously evaluated for compliance with malicious-code requests.

View original at arxiv_cscr →

This summary is AI-generated for orientation purposes. For regulatory action, always consult the original source linked above.

More AI_SAFETY updates

Latest in AI_SAFETY.

arxiv_cscr9 Jul 2026

arXiv: TRM-Raft: A Byzantine-Resistant Raft Consensus via Integrated Trust and Reputation Model

This publication introduces a new consensus algorithm, TRM-Raft, designed to enhance the security of distributed systems by integrating a trust and reputation model to resist Byzantine faults. Unlike…

arxiv_cscr9 Jul 2026

arXiv: Stablecoins under Stress in a National Economy: Transaction-Level Evidence from Austrian Crypto-Asset Service Providers

This publication, a research paper from July 2026, provides transaction-level evidence on how stablecoins behave under economic stress within a national economy, using data from Austrian crypto-asset…

arxiv_cscr9 Jul 2026

arXiv: Locality of Curve-Decoding and Improved Proximity Gaps

This paper, published on arXiv, presents a theoretical advance in error-correcting codes, specifically a new proof technique called "locality of curve-decoding" that improves the efficiency of…

arxiv_cscr9 Jul 2026

arXiv: TRACE: A Two-Channel Robust Attribution Watermark via Complementary Embeddings for LLM-Agent Trajectories

This publication introduces TRACE, a technical watermarking method designed to track and verify the outputs of AI agents that execute multi-step trajectories, such as those used in automated…

← Back to all updates

Live regulatory monitoring

Never miss a compliance update.

Get weekly digests of DORA, NIS2, GDPR, MaRisk, and ISO 27001 changes — straight to your inbox. Free.

No spam. Weekly digest only. Unsubscribe anytime.

DORANIS2GDPRMaRiskISO 27001

Map this to your controls

Connect regulatory changes to your compliance work.

Matproof maps every regulator update directly to your controls and surfaces the ones that affect your organisation — across 21 frameworks.

Book a Demo Browse all updates