AI_SAFETY · arxiv_cscr · 25 Jun 2026

arXiv: Inherited Circuits, Learned Semantics: How Fine-Tuning Creates Evasion Vulnerabilities Invisible to Standard Evaluation

AI_SAFETY. Sourced from arxiv_cscr, summarised by Matproof.

AI Analysis

What changed and what to do.

A new research paper on arXiv, "Inherited Circuits, Learned Semantics," reports that fine-tuning large language models can introduce evasion vulnerabilities that standard safety evaluations do not detect. According to the study, a model can pass typical red-teaming and benchmark tests, yet fine-tuning on seemingly benign data can reactivate or create hidden circuits that let it bypass safety guardrails. A model judged safe under standard evaluation may therefore still be exploitable after customization.

This finding directly affects any organization deploying fine-tuned AI models, particularly in regulated sectors such as finance, healthcare, legal services, and critical infrastructure. EU-based firms subject to the AI Act, especially those using general-purpose AI models for high-risk applications, must reassess their risk management frameworks. The research suggests that current evaluation protocols may not capture these latent vulnerabilities, creating potential compliance gaps.

Compliance teams should promptly review their model deployment pipelines to ensure that post-training evaluation includes adversarial testing beyond standard benchmarks, and should update their risk assessments to reflect that fine-tuning may introduce hidden safety failures. Teams should also ask model developers for transparency on fine-tuning data and on any circuit-level analysis performed. Finally, they should monitor for updated guidance from the European AI Office and consider adding dynamic, scenario-based testing to their validation processes.

View original at arxiv_cscr →

This summary is AI-generated for orientation purposes. For regulatory action, always consult the original source linked above.
