AI_SAFETY | arxiv_cscr | 28 May 2026

arXiv: Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs

AI_SAFETY. Sourced from arxiv_cscr, summarised by Matproof.

AI Analysis

What changed and what to do.

This paper, published on arXiv on 28 May 2026, reports that large language models used for coding are highly sensitive to minimal, seemingly innocuous changes in their input prompts. Even slight perturbations, such as rephrasing a comment or altering whitespace, can cause these models to generate code with critical security vulnerabilities, a phenomenon termed "prompt fragility." The authors also identify hidden-state signals in these models that could be used to detect or predict such unsafe outputs, offering a path toward mitigation.

The findings directly affect any organization deploying coding LLMs in software development, particularly in regulated sectors such as finance, healthcare, and critical infrastructure, where code integrity is paramount. Examples include banks using AI for transaction processing, medical device manufacturers, and energy grid operators. Compliance teams in these sectors should assume that standard prompt engineering or input validation may be insufficient to guarantee secure code generation.

Compliance teams should review their AI governance frameworks now and add specific testing for prompt fragility. Any coding LLM used in production should undergo adversarial robustness testing against minimal prompt perturbations. Teams should also evaluate the paper's proposed hidden-state monitoring as a potential control, and update their incident response plans to cover vulnerabilities introduced through seemingly benign prompt variations. A risk assessment should determine whether current model outputs require additional manual code review.
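
As a rough illustration of what perturbation testing could look like in practice, the Python sketch below generates a few whitespace-level variants of a prompt and flags any variant whose generated code contains an insecure pattern that the baseline output did not. It is not the paper's method. The names `perturb`, `find_issues`, `fragility_report`, and `toy_model` are hypothetical, the regex checks are a stand-in for a real static analyzer, and `toy_model` is a stub that replaces an actual LLM call.

```python
import re
from typing import Callable, List, Tuple

# Assumption: a few regexes stand in for a real security scanner.
INSECURE_PATTERNS = {
    "eval_call": re.compile(r"\beval\("),
    "shell_true": re.compile(r"shell\s*=\s*True"),
    "weak_hash_md5": re.compile(r"hashlib\.md5\("),
}


def perturb(prompt: str) -> List[str]:
    """Return whitespace-level variants that keep the prompt's meaning."""
    candidates = [
        prompt + "\n",                 # extra trailing newline
        prompt.replace("    ", "\t"),  # four spaces become a tab
        prompt.replace("# ", "#  "),   # extra space after comment marker
    ]
    seen = {prompt}
    variants = []
    for candidate in candidates:
        if candidate not in seen:      # skip no-ops and duplicates
            seen.add(candidate)
            variants.append(candidate)
    return variants


def find_issues(code: str) -> List[str]:
    """Return the sorted names of insecure patterns found in the code."""
    return sorted(name for name, pat in INSECURE_PATTERNS.items() if pat.search(code))


def fragility_report(
    prompt: str, generate: Callable[[str], str]
) -> List[Tuple[str, List[str]]]:
    """List variants whose output has issues absent from the baseline output."""
    baseline = set(find_issues(generate(prompt)))
    regressions = []
    for variant in perturb(prompt):
        new_issues = [i for i in find_issues(generate(variant)) if i not in baseline]
        if new_issues:
            regressions.append((variant, new_issues))
    return regressions


def toy_model(prompt: str) -> str:
    """Stub in place of an LLM call; deliberately fragile to tab characters."""
    if "\t" in prompt:
        return "import subprocess\nsubprocess.run(cmd, shell=True)\n"
    return "import subprocess\nsubprocess.run(cmd_args)\n"


if __name__ == "__main__":
    example = "def run(cmd):\n    # run the command\n"
    for variant, issues in fragility_report(example, toy_model):
        print(repr(variant), issues)
```

In a real deployment, `generate` would wrap the production model's API and `find_issues` would call an established scanner. The comparison against the baseline is a design choice: it isolates vulnerabilities that appear only under perturbation from those the model produces regardless.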

This summary is AI-generated for orientation purposes. For regulatory action, always consult the original arXiv paper.

