What changes under the EU AI Act on 2 August 2026?

Two big things gain force. First, the obligations for high-risk AI systems (Annex III) become applicable. Second, the European AI Office's enforcement powers for general-purpose AI (GPAI) models switch on — it can issue requests for information, evaluate models, and impose fines. GPAI obligations themselves applied from 2 August 2025, but the enforcement teeth (penalties and formal actions) arrive a year later, on 2 August 2026. This article is general information, not legal advice.

What are the penalties under the EU AI Act?

They are tiered. Use of prohibited AI practices can draw fines of up to €35 million or 7% of global annual turnover, whichever is higher. Non-compliance with high-risk obligations and most GPAI obligations can draw up to €15 million or 3% of global annual turnover. Supplying incorrect, incomplete, or misleading information to authorities falls in a lower band, up to €7.5 million or 1%. Beyond fines, national authorities can withdraw a non-compliant AI system from the EU market entirely. The penalties apply to providers regardless of where they are based, if the system is placed on the EU market.

How does the AI Act connect to web scraping and training data?

Through the GPAI transparency and copyright rules. Providers of general-purpose AI models must publish a summary of the content used to train the model using a mandatory European Commission template, and must have a policy to comply with EU copyright law — including respecting the text-and-data-mining (TDM) opt-out under Article 4 of the 2019 Copyright Directive. In practice that opt-out is expressed through machine-readable signals like robots.txt and ai.txt. So the Act gives legal weight to the opt-out signals we cover in our robots.txt vs llms.txt vs ai.txt guide.

Did the EU delay the AI Act in 2026?

There has been movement, but no wholesale delay of the August 2026 dates as of this writing. A November 2025 Commission proposal floated pushing some deadlines toward late 2027, and on 7 May 2026 the Council and Parliament reached a political agreement (part of a 'Digital Omnibus' simplification package) to streamline rules and extend certain high-risk timelines. Until those changes are actually enacted into law, organizations should treat 2 August 2026 as the operative deadline rather than assuming relief.

Does the EU AI Act make web scraping illegal?

No. The Act regulates AI systems and GPAI models, not the act of scraping itself. Collecting publicly accessible data remains governed by the case law and the contract and anti-circumvention rules we cover in our guide "Is web scraping legal in 2026?". What the Act adds is an obligation on AI model providers to document their training data and respect copyright opt-outs. If you collect data to train or fine-tune models for the EU market, those transparency and opt-out duties are the part that touches you.

What should data collectors actually do about it?

Build a compliance trail. Document your data sources and collection methods, honor machine-readable opt-outs (robots.txt, ai.txt / TDM reservations), keep your collection to publicly accessible data, and if you train or fine-tune GPAI-scale models for the EU, prepare the training-data summary against the Commission template. On infrastructure, collect public pages as a normal visitor on a real residential or mobile IP, without circumventing access barriers. That approach helps keep you on the defensible side of both the AI Act's copyright duties and general scraping law.

Web Scraping & AI · Regulation · May 2026 · 12-min read

The EU AI Act in 2026: What August Enforcement Means for AI Training Data and Web Scraping

On 2 August 2026 the EU AI Act stops being a paper deadline and gains real teeth — GPAI enforcement, high-risk obligations, and a mandatory training-data disclosure tied to the EU copyright opt-out. Here's the researched, no-hype guide to the timeline, the penalties, the May-2026 simplification agreement, and what it means for anyone collecting public web data.

Coronium Technical Team

Published May 29, 2026

Verified 2026-05-29

General information, not legal advice. The AI Act is complex and still evolving (see the May-2026 simplification agreement below). Consult qualified EU counsel for your situation.

2 Aug 2026: enforcement teeth
€15M or 3% of turnover (GPAI)
€35M or 7% of turnover (prohibited uses)
Aug 2027: full application

TL;DR

2 Aug 2026 turns on high-risk obligations and GPAI enforcement (fines up to €15M / 3% of global turnover; prohibited uses up to €35M / 7%). GPAI providers must publish a training-data summary (Commission template) and respect the EU copyright opt-out — expressed through machine-readable signals like robots.txt and ai.txt. A May-2026 agreement may streamline/extend some high-risk timelines, but until it's law, treat August as operative. The Act doesn't ban scraping — it adds transparency + opt-out duties for AI training.

On this page

The timeline
Training-data rules
Copyright opt-out
Penalties
May-2026 changes
For data collectors
FAQ

The phased timeline — and why August 2026 matters

The AI Act doesn't switch on all at once. It rolls out in stages, and 2 August 2026 is the stage where the rules that touch AI data the most acquire enforcement power:

2 Feb 2025 — prohibited practices

The banned uses (e.g. social scoring, certain biometric categorization) became prohibited.

2 Aug 2025 — GPAI obligations begin

General-purpose AI model duties (transparency, copyright policy, training-data summary) took effect. The voluntary GPAI Code of Practice was finalized 10 July 2025.

2 Aug 2026 — enforcement + high-risk

High-risk (Annex III) obligations apply, and the AI Office's GPAI enforcement powers (information requests, model evaluation, fines) switch on. This is the date with teeth.

2 Aug 2027 — full application

Remaining obligations, including AI embedded in regulated products, apply. Pre-existing GPAI models must have published their training-data summary by this date.
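The staged rollout above can be expressed as a small lookup, which is handy if you want a compliance checklist to key off a date. This is only a sketch: the labels are shorthand for the stages listed above, the function name `stages_in_force` is made up for illustration, and the dates are the ones this article gives, which could still change if the simplification package becomes law.

```python
from datetime import date

# Milestones as listed in the timeline above. Treat this table as editable:
# some dates could move if the May-2026 simplification package is enacted.
MILESTONES = [
    (date(2025, 2, 2), "prohibited practices banned"),
    (date(2025, 8, 2), "GPAI obligations begin"),
    (date(2026, 8, 2), "high-risk obligations and GPAI enforcement"),
    (date(2027, 8, 2), "full application"),
]


def stages_in_force(on: date) -> list[str]:
    """Return the labels of every stage whose start date is on or before `on`."""
    return [label for start, label in MILESTONES if start <= on]


print(stages_in_force(date(2026, 5, 29)))
print(stages_in_force(date(2026, 8, 2)))
```

Keeping the dates in one table means a change in the law is a one-line edit rather than a hunt through conditionals.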

The training-data disclosure template

The provision most relevant to web data: every GPAI model provider must publish a summary of the content used to train the model, using a mandatory template the European Commission released. The template asks providers to disclose, in a structured way, the types of content, the data sources, and the methods of collection — including large public datasets and scraped web data.

The summary requirement took effect on 2 August 2025 for new models; models already on the market before that date have until 2 August 2027 to publish theirs. The practical effect is that "we scraped the web" is no longer an acceptable non-answer — providers serving the EU now have to describe what they collected and how.
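To make "describe what they collected and how" concrete, here is a minimal sketch of rolling a collection log up into counts by content type, source domain, and collection method. The record keys (`url`, `content_type`, `method`) and the output keys are assumptions chosen for illustration; they are not the fields of the Commission template, which you would need to map to separately.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical collection log: one dict per collected document.
# Key names are illustrative and are NOT the Commission template's fields.
records = [
    {"url": "https://example.org/a", "content_type": "text", "method": "crawler"},
    {"url": "https://example.org/b", "content_type": "text", "method": "crawler"},
    {"url": "https://example.com/img/1", "content_type": "image", "method": "public dataset"},
]


def summarize(records):
    """Count documents per content type, source domain, and collection method."""
    return {
        "content_types": dict(Counter(r["content_type"] for r in records)),
        "source_domains": dict(Counter(urlparse(r["url"]).netloc for r in records)),
        "collection_methods": dict(Counter(r["method"] for r in records)),
    }


summary = summarize(records)
print(summary)
```

A roll-up like this is only as good as the log behind it, which is the practical argument for recording source and method at collection time rather than reconstructing them later.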

The copyright opt-out — where robots.txt and ai.txt get legal weight

GPAI providers must also have a policy to comply with EU copyright law, and that includes respecting the text-and-data-mining (TDM) opt-out under Article 4 of the 2019 Copyright Directive. Rightsholders can reserve their works from data mining — and the established way to express that reservation at web scale is machine-readable: robots.txt, TDM metadata, and emerging signals like ai.txt.

This is the link between the regulation and the plumbing: the AI Act gives legal consequence to the opt-out signals we break down in robots.txt vs llms.txt vs ai.txt. An opt-out that was once a polite request becomes evidence of a reservation a compliant GPAI provider must honor.

Curious whether a given site already signals an AI opt-out? Our free AI Crawler Checker reads a domain's robots.txt and shows which AI crawlers it allows or blocks.
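For readers who want to see what reading a robots.txt opt-out looks like in code, the sketch below uses Python's standard `urllib.robotparser`. It is an illustration under assumptions, not a description of how any particular checker works: the robots.txt body is a made-up example, `ExampleResearchBot` is a hypothetical user agent, and the helper name `crawler_allowed` is ours. It also covers robots.txt only; ai.txt and other TDM reservation signals would need their own handling.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt body for illustration. In practice you would fetch the
# file from the target site before collecting anything.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleResearchBot" is a hypothetical agent name used only for this demo.
for agent in ("GPTBot", "ExampleResearchBot"):
    print(agent, crawler_allowed(ROBOTS_TXT, agent, "https://example.com/articles/1"))
```

Here the named AI crawler is blocked site-wide while other agents fall through to the wildcard rule. A collector that wants an audit trail would typically store the outcome of this check alongside each fetch.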

The penalties have real scale

€35M / 7%: prohibited AI practices (percentage of global annual turnover; whichever is higher)
€15M / 3%: high-risk and most GPAI non-compliance
Market removal: national authorities can withdraw non-compliant systems from the EU market
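The "whichever is higher" rule means the ceiling scales with company size. A short worked example, assuming the rule applies to both tiers shown above and ignoring any special treatment for smaller companies; the turnover figures are hypothetical and the function name `fine_ceiling` is ours:

```python
def fine_ceiling(fixed_cap_eur: int, turnover_percent: int, global_turnover_eur: int) -> int:
    """Upper bound of a tier: the higher of the fixed cap and the turnover share."""
    turnover_share = global_turnover_eur * turnover_percent // 100
    return max(fixed_cap_eur, turnover_share)


# Hypothetical large provider: EUR 2 billion global annual turnover.
large = 2_000_000_000
print(fine_ceiling(35_000_000, 7, large))  # prohibited-practices tier
print(fine_ceiling(15_000_000, 3, large))  # high-risk / GPAI tier

# Hypothetical smaller provider: EUR 100 million turnover, so the fixed cap wins.
small = 100_000_000
print(fine_ceiling(35_000_000, 7, small))
```

For the large provider the percentage dominates (7% of €2 billion is €140 million), while for the smaller one the fixed €35 million figure is the higher number. These are ceilings, not predicted fines.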

Crucially, the obligations bind providers regardless of where they are based if the system is placed on the EU market, so a US or Asian AI company serving EU users is in scope. This extraterritorial reach is why the Act is shaping data-collection practice globally, not just in Europe.

The May-2026 simplification agreement — don't bank on a delay

The timeline isn't entirely settled. A November 2025 Commission proposal floated pushing some deadlines toward late 2027, and on 7 May 2026 the Council and Parliament reached a political agreement (a "Digital Omnibus" simplification package) to streamline rules and extend certain high-risk timelines.

The honest read: a political agreement is not yet enacted law, and what's on the table is simplification of high-risk timelines — not a repeal of the GPAI transparency and copyright duties. Until the changes are actually in force, treat 2 August 2026 as the operative deadline rather than assuming relief that may not arrive.

What it means for data collectors

If you collect public web data — especially to train or fine-tune models for the EU market — the Act adds a compliance layer on top of ordinary scraping law:

Honor machine-readable opt-outs — robots.txt, ai.txt / TDM reservations now carry copyright weight under Article 4.
Document sources & methods — the training-data template rewards collectors who already keep a clean provenance trail.
Stay on public, logged-off data — don't circumvent access barriers; the DMCA §1201-style risk is separate from the Act and still applies.
Mind extraterritoriality — "we're not in the EU" is not a defense if your model serves EU users.
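The checklist above can be sketched as a pre-fetch gate that also writes the provenance trail. Everything named here is an assumption for illustration: the `CollectionRecord` fields, the `decide` helper, and the JSON Lines format are one possible shape, not a format the Act or the Commission template prescribes.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class CollectionRecord:
    url: str
    method: str            # e.g. "crawler" or "public dataset"
    robots_allowed: bool   # result of the opt-out check made before fetching
    login_required: bool   # True means the page is not public
    decided_at: str        # ISO 8601 timestamp in UTC


def decide(url: str, method: str, robots_allowed: bool,
           login_required: bool) -> tuple[bool, CollectionRecord]:
    """Return (collect?, record). Collect only public pages with no opt-out."""
    record = CollectionRecord(url, method, robots_allowed, login_required,
                              datetime.now(timezone.utc).isoformat())
    return robots_allowed and not login_required, record


def to_jsonl(records: list[CollectionRecord]) -> str:
    """Serialize records as JSON Lines, one decision per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


ok, rec = decide("https://example.com/post/1", "crawler",
                 robots_allowed=True, login_required=False)
skip, rec2 = decide("https://example.com/members/2", "crawler",
                    robots_allowed=True, login_required=True)
print(ok, skip)
print(to_jsonl([rec, rec2]))
```

Logging the skipped URLs as well as the collected ones is a deliberate choice: a record of what you declined to fetch, and why, is part of showing that opt-outs and access barriers were respected.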

The infrastructure angle: compliant collection means behaving like a normal visitor on public pages — a real browser on a residential/mobile IP, honoring opt-outs, not defeating barriers. The regulation rewards transparency and restraint, which is exactly how high-trust mobile-IP collection already works.

Related resources

The Closing Web in 2026 (pillar)

AI crawler blocking, Pay-Per-Crawl, and the data wars in full.

Is web scraping legal in 2026?

hiQ, Meta v Bright Data, Reddit v Perplexity & DMCA §1201.

robots.txt vs llms.txt vs ai.txt

The opt-out signals the AI Act gives legal weight to.

Scraping in the Agentic Era (MCP)

How AI agents collect web data and why the IP layer decides.

AI Crawler Checker (free tool)

See which AI bots a domain allows or blocks in its robots.txt.

Web scraping proxies

Real 4G/5G carrier IPs for legitimate public-data collection.
