What did hiQ v. LinkedIn actually decide?

hiQ scraped public LinkedIn profiles to build analytics products. LinkedIn tried to block it and claimed CFAA violations. In upholding a preliminary injunction for hiQ (2019, reaffirmed in 2022), the 9th Circuit concluded that the CFAA's 'without authorization' language most likely applies to systems protected by an authentication gate, not to public web pages anyone can view. So scraping public, logged-off data is not 'hacking.' (In late 2022 the district court found that hiQ had breached LinkedIn's User Agreement on other grounds, and the case then settled: a reminder that contract/ToS claims are separate from CFAA claims.)

What happened in Meta v. Bright Data?

Meta sued data-collection company Bright Data for scraping Facebook and Instagram. In January 2024 a federal court ruled largely for Bright Data: Meta's Terms of Service prohibit scraping by logged-in users, but they do not govern logged-off scraping of publicly available data. Meta dropped the case shortly after. The takeaway reinforced hiQ: public, logged-off data is fair game; logged-in scraping that breaches the ToS you agreed to is where contract liability begins.

Why is Reddit v. Perplexity different?

Reddit sued Perplexity (and several scraping and proxy providers) in late 2025, not primarily on CFAA grounds but under the DMCA's anti-circumvention provision, §1201, alleging the defendants circumvented anti-scraping protections to harvest Reddit content at scale for AI. That reframes the legal question: it is less about 'was the data public' and more about 'did you defeat a technical protection measure to get it.' The case is pending, but it signals where platform litigation is heading in the AI era.

Does breaking a website's Terms of Service make scraping illegal?

Breaching ToS is generally a contract issue, not a criminal one. Post-hiQ, courts have been reluctant to treat a ToS violation alone as a CFAA crime for public data. But it can still expose you to a breach-of-contract claim — especially if you agreed to the ToS by creating an account or logging in. The practical rule: scraping logged-off public pages carries far less contractual risk than scraping behind a login you accepted terms for.

What about privacy laws like GDPR?

Even when scraping is permitted, the data itself may be regulated. If you collect personal data of people in the EU, GDPR can apply even if you operate outside the EU (for example, when you monitor their behavior), and you need a lawful basis, data minimization, and so on. India's DPDP Act and various US state privacy laws add obligations of their own, though each carves out some categories of publicly available data, so check how every statute defines the term. 'Public' does not mean 'unregulated': under GDPR, a name and email on a public page is still personal data. Scraping aggregate, non-personal, public information is the lowest-risk path.
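If you do handle pages that may contain personal data, one data-minimization step is to strip obvious identifiers before anything is stored. The Python sketch below assumes only two crude patterns (e-mail addresses and phone-like digit runs); the function name `redact_personal_data` and both regexes are illustrative, not a complete PII detector, and names such as "Jane" pass straight through.

```python
import re

# Illustrative patterns only: real PII detection needs far more than two regexes.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Loose phone pattern: optional "+", then at least 8 digits/separators, ending in a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def redact_personal_data(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders before storage."""
    # E-mails first, so their digits are never mistaken for phone numbers.
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    sample = "Contact Jane at jane.doe@example.com or +1 (555) 010-2030. Price: 42 EUR."
    print(redact_personal_data(sample))
```

A pass like this reduces what you keep, but it does not by itself supply a lawful basis. The phone pattern will also swallow long digit runs such as ISO dates, so check false positives against your own data.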

How do I scrape public data in a way that stays compliant?

Five practical rules: (1) collect only publicly accessible, logged-off data; (2) do not circumvent technical access barriers — that is the DMCA §1201 risk; (3) rate-limit respectfully so you do not harm the site; (4) avoid scraping personal data, or handle it under the relevant privacy law; (5) document your sources, your ToS review, and your methods. On infrastructure, collecting public pages through a real browser on a residential or mobile IP — as a normal visitor — keeps you on the public-data side of the line rather than presenting as a declared AI training bot.
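Rule (3) is the easiest one to turn into code. A minimal sketch, assuming a single-threaded crawler and a fixed per-host gap: `PerHostRateLimiter` and its `min_interval` (a hypothetical default of 2 seconds you would tune per site, not a legal threshold) are illustrative names. The limiter remembers when each host was last hit and sleeps until the gap has passed.

```python
import time
from urllib.parse import urlsplit


class PerHostRateLimiter:
    """Enforce a minimum interval between requests to the same host.

    min_interval is an assumption you tune per site; it is not a legal threshold.
    """

    def __init__(self, min_interval: float = 2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so tests need not really wait
        self._sleep = sleep
        self._last_request: dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Block until it is polite to hit this URL's host; return the seconds slept."""
        host = urlsplit(url).netloc.lower()
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            # Time still owed before the host's minimum gap has elapsed.
            delay = max(0.0, last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last_request[host] = self._clock()
        return delay
```

Call `limiter.wait(url)` immediately before each fetch. Different hosts do not delay each other; a multi-threaded crawler would additionally need a lock around `_last_request`.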



Web Scraping & AI · Legal · May 2026 · 12-min read

Is Web Scraping Legal in 2026? hiQ, Meta v Bright Data, Reddit v Perplexity and the New Rules

Q: Is web scraping legal in 2026?

Scraping publicly accessible data, meaning data you can reach without logging in or circumventing technical protections, is generally legal in the United States. The landmark case is hiQ Labs v. LinkedIn, where the 9th Circuit concluded that scraping data available without authentication likely does not violate the Computer Fraud and Abuse Act (CFAA) because there is no 'unauthorized access.' What is NOT settled is the use of scraped data to train AI models: that frontier is being litigated right now. This article is general information, not legal advice; consult a qualified attorney for your specific situation.

The short answer: scraping public, logged-off data is generally legal in the US — but the AI-training frontier is being fought in court right now. Here's the researched, plain-English map of the cases, the risk lines, and the compliant way to collect public data.

Coronium Technical Team

Published May 27, 2026

Verified 2026-05-27

This is general information, not legal advice. Web scraping law varies by jurisdiction, data type, and facts. Consult a qualified attorney before relying on anything here.

TL;DR

Public, logged-off data: generally legal (hiQ v LinkedIn; Meta v Bright Data). Logged-in / behind a ToS you accepted: contract risk. Circumventing anti-bot measures: DMCA §1201 risk (Reddit v Perplexity). Personal data: GDPR / DPDP applies even if "public." The safe path is collecting public, non-personal pages as a normal visitor on a real residential/mobile IP, without defeating access barriers, with everything documented.

On this page

The legal framework
The key cases
The AI-training frontier
Privacy law
Staying compliant
FAQ

The four laws that actually matter

"Is scraping legal" is the wrong question — it's really four separate questions under four different bodies of law. Sort your use case into these buckets and the risk picture gets clear fast.

CFAA (computer access)

The "hacking" statute. Post-hiQ, accessing public data without an authentication gate is not "unauthorized access." This is why public scraping survived.

Contract / Terms of Service

Breaching a ToS is a contract matter, not a crime. It mostly bites when you logged in and agreed to the terms first.

DMCA §1201 (anti-circumvention)

Defeating a "technological protection measure" — rate limits, anti-bot systems, login walls — can be a violation independent of whether the data was public. The new battleground.

Privacy law (GDPR, DPDP, state laws)

Governs the data, not the access. Personal data is regulated even when it's publicly visible.

The cases that built the public-data rule

hiQ Labs v. LinkedIn (9th Cir. 2019, reaffirmed 2022)

hiQ scraped public LinkedIn profiles. In upholding a preliminary injunction for hiQ, the court held that the CFAA's "without authorization" most likely applies only to authentication-gated systems, not public pages. Under that reasoning, scraping public, logged-off data is not a CFAA crime. (hiQ separately lost on a ToS/contract theory; the two are different claims.)

Meta v. Bright Data (N.D. Cal., Jan 2024)

The court ruled Meta's ToS bar scraping by logged-in users but don't govern logged-off scraping of public data. Meta dropped the case weeks later. Reinforced hiQ and gave the data-collection industry a clearer green light for public pages.

The AI-training frontier: where it gets unsettled

The public-data cases predate the AI gold rush. The new wave of litigation reframes the question from "was it public?" to "did you defeat a protection to get it, and what did you do with it?"

Reddit v. Perplexity (2025, pending)

Reddit's central claim is DMCA §1201, alleging defendants circumvented anti-scraping protections to collect Reddit content for AI. The shift from CFAA to §1201 is the story: it targets the circumvention, not the publicness of the data.

YouTube creators v. Nvidia, Snap, Meta

Creators have sued over alleged scraping of YouTube videos to train AI models, on similar circumvention and IP theories. Outcomes pending — but the volume of suits signals the AI-training use of scraped data is the hot legal zone.

Connect this to the infrastructure side in The Closing Web in 2026: the same anti-bot measures now central to §1201 claims are what Cloudflare's default-block and Pay-Per-Crawl enforce at the network layer.

"Public" does not mean "unregulated"

Even when scraping is permitted, privacy law governs the data itself. A name, email, or photo on a public page is still personal data:

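GDPR (EU): can apply to personal data of people in the EU even if you operate elsewhere; needs a lawful basis and data minimization.
India DPDP Act: consent/notice obligations for digital personal data, though it excludes personal data the individual made publicly available themselves.
US state laws (CCPA/CPRA and others): rights over personal data, with carve-outs for certain "publicly available" information whose definitions vary by statute.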

Lowest-risk path: scrape aggregate, non-personal, public information. The moment you touch personal data, a second layer of law applies regardless of how legal the access was.
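In practice, "aggregate, non-personal" can mean reducing each scraped record to a few coarse fields and storing only counts. The sketch below assumes hypothetical records with `category` and `region` keys; everything else (poster names, profile URLs) is dropped before storage. Small counts can still point to an individual, so it also suppresses any cell below `min_count`, a common disclosure-control habit whose default here (5) is an assumption, not a legal standard.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple


def aggregate_listings(
    listings: Iterable[Mapping[str, str]], min_count: int = 5
) -> Dict[Tuple[str, str], int]:
    """Count records per (category, region) and keep nothing else.

    The field names are hypothetical; adapt them to whatever your pages expose.
    """
    # Only the two coarse fields are read; names, e-mails and URLs are never copied.
    counts = Counter((item["category"], item["region"]) for item in listings)
    # Suppress small cells: a count of 1 or 2 can still identify a person.
    return {key: n for key, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    scraped = [
        {"category": "data-engineer", "region": "DE", "poster": "Jane Doe"},
        {"category": "data-engineer", "region": "DE", "poster": "John Roe"},
        {"category": "analyst", "region": "FR", "poster": "A. Person"},
    ]
    print(aggregate_listings(scraped, min_count=2))
```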

The compliant playbook

Do

Collect public, logged-off data only
Rate-limit respectfully; don't degrade the site
Prefer aggregate, non-personal data
Document sources, ToS review & methods
Present as a normal visitor on a real IP
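"Document sources, ToS review & methods" is easier to defend if the record is written at collection time rather than reconstructed later. A minimal sketch, assuming an append-only JSON Lines file: the `CollectionRecord` fields and the helper names are hypothetical choices that mirror the checklist above, not any regulatory or industry schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class CollectionRecord:
    """One audit-trail entry per source. Field names are illustrative, not a standard."""
    source_url: str
    logged_in: bool        # should always be False under the playbook above
    tos_reviewed_on: str   # ISO date you last read the site's terms
    method: str            # e.g. "real browser, 1 request per 2 s per host"
    personal_data: bool    # did the collected payload contain personal data?
    collected_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(record: CollectionRecord) -> str:
    """Serialize one record as a single sorted-key JSON line."""
    return json.dumps(asdict(record), sort_keys=True) + "\n"


def append_record(path: str, record: CollectionRecord) -> None:
    """Append the record to a JSON Lines log that is easy to grep and diff."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(to_json_line(record))


if __name__ == "__main__":
    rec = CollectionRecord("https://example.com/pricing", False, "2026-05-01", "real browser", False)
    print(to_json_line(rec), end="")
```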

Don't

Circumvent anti-bot / rate-limit systems (§1201)
Scrape behind logins you accepted ToS for
Harvest personal data without a lawful basis
Hammer a site so hard it causes harm
Assume "public" means "no privacy law"
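The first "Don't" is partly an engineering decision: what your collector does when a site says no. A conservative sketch, assuming any HTTP client that exposes a status code and headers: success proceeds, a login challenge (401) is skipped as outside the public lane, rate limiting (429/503) means waiting as instructed, and a refusal (403) or anything unexpected means stopping rather than rotating IPs or fingerprints to get around it. The `Action` enum, `classify_response`, and the code-to-action mapping are assumptions of this sketch, not a legal test.

```python
from enum import Enum
from typing import Mapping, Optional, Tuple


class Action(Enum):
    PROCEED = "proceed"           # page served normally to a logged-off visitor
    SKIP_AUTH_GATED = "skip"      # login required: outside the public-data lane
    BACK_OFF = "back_off"         # rate-limited: wait as told, do not route around it
    STOP = "stop"                 # refused or unexpected: do not try to defeat it


def classify_response(status: int, headers: Mapping[str, str]) -> Tuple[Action, Optional[float]]:
    """Map an HTTP response to a conservative collector action and optional wait in seconds."""
    if 200 <= status < 300:
        return Action.PROCEED, None
    if status == 401:
        return Action.SKIP_AUTH_GATED, None
    if status in (429, 503):
        retry = headers.get("Retry-After", "") or ""
        # Only the delay-seconds form of Retry-After is parsed; an HTTP-date yields None.
        wait = float(retry) if retry.strip().isdigit() else None
        return Action.BACK_OFF, wait
    # 403 and everything else: treat as a refusal and stop.
    return Action.STOP, None
```

Pass a case-insensitive header mapping, such as the `headers` attribute of a `urllib.request.urlopen(...)` response; with a plain `dict`, the `Retry-After` lookup is case-sensitive. When the wait comes back as `None`, the caller should choose its own conservative delay.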

The infrastructure angle: collecting public pages through a real browser on a residential or mobile carrier IP keeps you on the public-data side of the line — a normal visitor, not a declared bot defeating barriers. Pair it with the detection realities in how websites detect proxies in 2026.

FAQ

Related resources

The Closing Web in 2026 (pillar)

AI crawler blocking, Pay-Per-Crawl, and the data wars in full.

The EU AI Act in 2026

Aug-2026 enforcement, GPAI training-data disclosure & the copyright opt-out.

Cloudflare Pay-Per-Crawl deep-dive

The 402 paywall economics and the network-layer enforcement.

How websites detect proxies in 2026

The 7-layer detection stack you must pass as a real visitor.

The AI Crawler War 2026

The broader AI-vs-publisher scraping conflict.

Dedicated mobile proxies

Real carrier IPs for legitimate public-data collection.

Web scraping proxies

Commercial landing for scraping workloads.

Collect public data the legitimate way

Real residential/mobile carrier IPs let you reach public pages as a genuine visitor — without defeating access barriers. Dedicated 4G/5G across 20+ countries.