SEE MATPROOF ON YOUR STACK — BOOK A 30-MINUTE DEMO
AI_SAFETYarxiv_cscr25 Jun 2026

arXiv: Inherited Circuits, Learned Semantics: How Fine-Tuning Creates Evasion Vulnerabilities Invisible to Standard Evaluation

AI_SAFETY. Sourced from arxiv_cscr, summarised by Matproof.

AI Analysis

What changed and what to do.

A new research paper published on arXiv, titled "Inherited Circuits, Learned Semantics," presents findings that fine-tuning large language models can introduce evasion vulnerabilities that are invisible to standard safety evaluations. The study demonstrates that even when a model passes typical red-teaming or benchmark tests, fine-tuning on seemingly benign data can reactivate or create hidden circuits that allow the model to bypass safety guardrails. This means that a model deemed safe under standard evaluation may still be exploited after customization.

This finding directly affects any organization deploying fine-tuned AI models, particularly in regulated sectors such as finance, healthcare, legal services, and critical infrastructure. EU-based firms subject to the AI Act, especially those using general-purpose AI models for high-risk applications, must reassess their risk management frameworks. The research suggests that current evaluation protocols may not capture these latent vulnerabilities, creating potential compliance gaps.

Compliance teams should immediately review their model deployment pipelines to ensure that post-training evaluation includes adversarial testing beyond standard benchmarks. They should also update their risk assessments to account for the possibility that fine-tuning may introduce hidden safety failures. Engaging with model developers to request transparency on fine-tuning data and circuit-level analysis is advisable. Finally, teams should monitor for updated guidance from the European AI Office and consider incorporating dynamic, scenario-based testing into their validation processes.

This summary is AI-generated for orientation purposes. For regulatory action, always consult the original source linked above.

More AI_SAFETY updates

Latest in AI_SAFETY.

Live regulatory monitoring

Never miss a compliance update.

Get weekly digests of DORA, NIS2, GDPR, MaRisk, and ISO 27001 changes — straight to your inbox. Free.

No spam. Weekly digest only. Unsubscribe anytime.

DORANIS2GDPRMaRiskISO 27001

Map this to your controls

Connect regulatory changes to your compliance work.

Matproof maps every regulator update directly to your controls and surfaces the ones that affect your organisation — across 21 frameworks.