Updated April 2026

ScrapeGraphAI Proxy Setup with Mobile IPs (2026)

A complete, hands-on guide to pairing ScrapeGraphAI with Coronium mobile proxies for production-grade natural-language web scraping. Covers SmartScraper, Search Scraper, Markdownify, self-hosting the Python library, BYOP integration with the hosted Cloud API, and a direct head-to-head with Firecrawl and Crawl4AI.

  • 50 free credits on signup
  • $17-19/month Starter plan
  • Natural-language prompts instead of selectors
  • 95%+ trust on mobile IPs

Quick Facts: ScrapeGraphAI in 2026

Dual distribution: open-source Python library (pip install scrapegraphai) + hosted Cloud API at scrapegraphai.com
Three flagship endpoints: SmartScraper (10 credits), Search Scraper (30 credits), Markdownify
Model-agnostic: OpenAI, Anthropic Claude, Google Gemini, and local Ollama models
Cloud dashboard with live test + preview, webhook delivery, and credit-based billing
BYOP supported: route fetches through Coronium mobile endpoints in both OSS and Cloud modes

What is ScrapeGraphAI?

ScrapeGraphAI is a Python-first, LLM-native web scraping framework that replaces brittle CSS selectors and XPath with natural-language prompts. You tell it what you want in English, not where it lives in the DOM, and an LLM extracts structured JSON for you. Under the hood, ScrapeGraphAI models each scrape as a directed graph of nodes (fetch, parse, prompt, validate) so the pipeline is observable, composable, and resilient to layout drift.

Open-Source Library

pip install scrapegraphai ships the full graph engine, node library, and LLM adapters. Apache 2.0 licensed, 18K+ GitHub stars, active monthly releases.

  • Full control over fetcher (Playwright, HTTPX, ChromeDriver)
  • Pay only for your own LLM tokens + proxy egress
  • Can run fully offline with Ollama

Hosted Cloud API

scrapegraphai.com offers a REST API with 50 free credits, a Starter plan from $17-19/month, and 15% off with annual billing. The dashboard includes live test/preview, run history, and webhook delivery.

  • No server infrastructure to maintain
  • LLM tokens bundled into credit price
  • BYOP (bring your own proxy) parameter supported

How a ScrapeGraphAI pipeline works (high level)

1. Fetch Node: a Playwright or HTTPX client fetches the URL through your proxy. JS is executed, cookies are preserved. This is where Coronium mobile IPs plug in.
2. Parse Node: HTML is cleaned, whitespace collapsed, and split into chunks that fit the LLM's context window. Non-content elements (scripts, nav, footer) are stripped.
3. Prompt / Extract Node: your natural-language prompt plus the parsed text is sent to the chosen LLM with a JSON-output constraint (and an optional Pydantic schema).
4. Merge / Validate Node: chunk outputs are merged, validated against the schema, and returned as a single JSON object or list. On validation failure the node can re-prompt.

The genius of the graph design is that each node is swappable: you can drop in a stealth browser fetcher, a custom parser, or a different LLM without rewriting the pipeline. For Coronium users, this means configuring the proxy once at the fetch node and letting SmartScraper, Search Scraper, or Markdownify route every request through your mobile endpoint.
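Because the proxy lives entirely in the fetch node's loader config, you can define it once and hand the same config dict to any graph type. A minimal sketch, using the placeholder Coronium endpoint and credentials from this guide's examples:

```python
import os

def coronium_loader_kwargs(host: str = "us.coronium.io", port: int = 10001) -> dict:
    """Build the Playwright-style proxy block once; every graph's fetch
    node picks it up via loader_kwargs. Credentials come from the env."""
    return {
        "proxy": {
            "server": f"http://{host}:{port}",
            "username": os.getenv("CORONIUM_USER", "your_user"),
            "password": os.getenv("CORONIUM_PASS", "your_pass"),
        }
    }

# One shared config reusable by SmartScraperGraph, SearchGraph, etc.
shared_config = {
    "llm": {"api_key": os.getenv("OPENAI_API_KEY", "sk-..."), "model": "openai/gpt-4o-mini"},
    "headless": True,
    "loader_kwargs": coronium_loader_kwargs(),
}
```

Swapping rotating for sticky sessions then means changing a single port argument rather than touching every pipeline.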

SmartScraper Explained

SmartScraper is ScrapeGraphAI's flagship single-page extraction endpoint. Give it one URL and a natural-language prompt describing what you want, and it returns structured JSON. At 10 credits per call (about $0.04/page on the Starter plan), it's the workhorse for product catalogs, article extraction, profile scraping, and anything where you know the URL and want structured data back.

SmartScraper: minimal Python example (OSS)

from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "api_key": "sk-...",          # your OpenAI key
        "model": "openai/gpt-4o-mini",
    },
    "verbose": True,
    "headless": True,
    "loader_kwargs": {
        "proxy": {
            "server": "http://us.coronium.io:10001",
            "username": "your_user",
            "password": "your_pass",
        }
    },
}

smart_scraper_graph = SmartScraperGraph(
    prompt="Extract all product names, prices in USD, and star ratings.",
    source="https://example-shop.com/category/laptops",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)
# {"products": [{"name": "...", "price": 1299.00, "rating": 4.5}, ...]}

SmartScraper: Cloud API call (BYOP with Coronium)

import requests

resp = requests.post(
    "https://api.scrapegraphai.com/v1/smartscraper",
    headers={
        "SGAI-APIKEY": "sgai-...",          # your ScrapeGraphAI key
        "Content-Type": "application/json",
    },
    json={
        "website_url": "https://example-shop.com/category/laptops",
        "user_prompt": "Extract all product names, prices in USD, and star ratings.",
        "proxy": "http://your_user:your_pass@us.coronium.io:10001",
        # optional Pydantic-style schema
        "output_schema": {
            "type": "object",
            "properties": {
                "products": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "price": {"type": "number"},
                            "rating": {"type": "number"},
                        },
                    },
                }
            },
        },
    },
    timeout=120,
)

print(resp.json())
  • 10 credits per call (~$0.04 per page on the Starter plan)
  • Schema-aware: optional Pydantic / JSON Schema for strict output
  • JS-rendered: the Playwright fetcher handles SPAs and client-side rendering

Search Scraper: Multi-Source Aggregation

Search Scraper is where ScrapeGraphAI gets genuinely interesting. Instead of one URL, you give it a question. It issues a web search, fetches the top results in parallel, extracts structured data from each, and returns an aggregated JSON object with source attribution per field. At 30 credits per query it's three times the cost of SmartScraper but does the work of multiple pipelines.

Search Scraper: competitive-intelligence example

from scrapegraphai.graphs import SearchGraph

graph_config = {
    "llm": {
        "api_key": "sk-ant-...",
        "model": "anthropic/claude-sonnet-4-5",
    },
    "max_results": 5,
    "loader_kwargs": {
        "proxy": {
            "server": "http://us.coronium.io:10002",   # sticky 5-min
            "username": "your_user",
            "password": "your_pass",
        }
    },
}

search_graph = SearchGraph(
    prompt=(
        "Compare pricing and key features of the top 5 AI web scraping "
        "frameworks as of 2026. Return name, pricing_usd_per_month, "
        "free_tier, primary_language, and one_line_summary."
    ),
    config=graph_config,
)

result = search_graph.run()
# {"frameworks": [{"name": "...", "pricing_usd_per_month": ..., ...}, ...],
#  "sources": ["https://...", "https://..."]}

When to reach for Search Scraper

Use Search Scraper for
  • Market research where you don't know the URLs up front
  • Competitive pricing sweeps across 5-10 competitors
  • News aggregation on a breaking topic
  • RAG seed data: collect citations for a grounded answer
  • Brand monitoring across forums and review sites
Stick with SmartScraper for
  • Scheduled scrapes of known URLs (catalogs, listings)
  • Scrapes where you already have the target link
  • Cost-sensitive jobs (3x cheaper at 10 credits)
  • Authenticated scrapes that need specific sessions
  • Single-source ground-truth extractions

Proxy note: Search Scraper fans out to multiple domains in parallel, so rotating mobile IPs work well: each sub-request gets a fresh, high-trust IP. If a target site needs multiple requests to load its full content (dynamic pagination, async content loading), switch that particular run to a sticky session to keep cookies coherent.
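That rotating-versus-sticky decision can live in one helper. A small sketch, using this guide's port convention (10001 rotating, 10002 sticky); the helper itself is illustrative, not part of any API:

```python
# Port convention from this guide: 10001 = rotating, 10002 = sticky (5 min)
ROTATING_PORT = 10001
STICKY_PORT = 10002

def proxy_server(host: str, needs_session: bool) -> str:
    """Pick the sticky endpoint when a run makes multiple requests to the
    same domain (pagination, async loading); otherwise rotate freely."""
    port = STICKY_PORT if needs_session else ROTATING_PORT
    return f"http://{host}:{port}"

# Search Scraper fan-out: fresh IP per sub-request
print(proxy_server("us.coronium.io", needs_session=False))  # http://us.coronium.io:10001
# Paginated single-site run: keep cookies coherent
print(proxy_server("us.coronium.io", needs_session=True))   # http://us.coronium.io:10002
```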

Markdownify: URLs to Clean Markdown

Markdownify is the RAG-friendly endpoint: feed it a URL and get back clean Markdown with headings, lists, and links preserved, ready to embed into a vector store. It strips nav, ads, cookie banners, and script noise, returning just the content. It's cheaper per call than SmartScraper because there's no LLM extraction step on the ScrapeGraphAI side: just fetch, clean, convert.

Markdownify: building a RAG corpus

import requests
import time

urls = [
    "https://docs.example.com/getting-started",
    "https://docs.example.com/api-reference",
    "https://docs.example.com/guides/deployment",
]

corpus = []
for url in urls:
    r = requests.post(
        "https://api.scrapegraphai.com/v1/markdownify",
        headers={"SGAI-APIKEY": "sgai-...", "Content-Type": "application/json"},
        json={
            "website_url": url,
            "proxy": "http://user:pass@us.coronium.io:10001",
        },
        timeout=60,
    )
    corpus.append({"url": url, "markdown": r.json()["markdown"]})
    time.sleep(1)   # gentle rate limit

# Now feed corpus to your embedding model
# e.g. OpenAI text-embedding-3-large, Voyage voyage-3, etc.

What Markdownify keeps

  • Hierarchical headings (H1-H6)
  • Ordered and unordered lists
  • Links with anchor text preserved
  • Tables converted to GFM Markdown tables
  • Code blocks with language hints when available

What Markdownify strips

  • Navigation, header, footer boilerplate
  • Script, style, and iframe tags
  • Cookie banners and consent overlays
  • Ad slots and tracking pixels
  • Social-share widgets and related-content rails
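Because Markdownify preserves the heading hierarchy, a natural next step before embedding is to split each document on headings so that each vector covers one coherent section. A stdlib-only sketch; the splitting strategy is my own, not part of the API:

```python
import re

def split_markdown_by_headings(markdown: str, max_level: int = 2) -> list:
    """Split a Markdownify result into chunks at H1..H{max_level}
    boundaries, so deeper headings stay inside their parent section."""
    pattern = re.compile(rf"^(#{{1,{max_level}}})\s", re.MULTILINE)
    starts = [m.start() for m in pattern.finditer(markdown)]
    if not starts:
        return [markdown]
    starts.append(len(markdown))
    chunks = [markdown[starts[i]:starts[i + 1]].strip()
              for i in range(len(starts) - 1)]
    # Keep any preamble text that appears before the first heading
    if starts[0] > 0 and markdown[:starts[0]].strip():
        chunks.insert(0, markdown[:starts[0]].strip())
    return chunks

doc = "# Intro\ntext\n## Setup\nsteps\n### Detail\nmore\n## Usage\nexamples"
print(len(split_markdown_by_headings(doc)))  # 3 chunks: Intro, Setup (with Detail), Usage
```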

Self-Hosting with Coronium Proxies

Running the OSS Python library gives you maximum control: your own Playwright browser, your own LLM keys, your own proxies, and no per-credit cloud pricing. Below is the full setup for a production self-hosted pipeline routing every fetch through Coronium mobile IPs.

Step 1: Install and set up

# Python 3.10+ recommended
python -m venv .venv
source .venv/bin/activate

pip install --upgrade pip
pip install scrapegraphai playwright python-dotenv pydantic

# Playwright browser binaries (headless Chromium)
playwright install chromium

Note: On Debian/Ubuntu hosts you may also need playwright install-deps to pull native libraries.

Step 2: Configure secrets

# .env
OPENAI_API_KEY=sk-...
CORONIUM_HOST=us.coronium.io
CORONIUM_PORT=10001            # rotating
CORONIUM_STICKY_PORT=10002     # sticky 5 min
CORONIUM_USER=your_user
CORONIUM_PASS=your_pass

Step 3: Production-grade SmartScraper with schema

import os
from typing import List, Optional
from pydantic import BaseModel, Field
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph

load_dotenv()

class Product(BaseModel):
    name: str
    price_usd: float
    rating: Optional[float] = Field(None, ge=0, le=5)
    in_stock: bool = True

class ProductList(BaseModel):
    products: List[Product]

def build_graph(sticky: bool = False) -> SmartScraperGraph:
    port = os.getenv("CORONIUM_STICKY_PORT") if sticky else os.getenv("CORONIUM_PORT")
    return SmartScraperGraph(
        prompt="Extract every product with its USD price, rating (0-5), and in-stock flag.",
        source="https://example-shop.com/category/laptops",
        schema=ProductList,
        config={
            "llm": {
                "api_key": os.getenv("OPENAI_API_KEY"),
                "model": "openai/gpt-4o-mini",
                "temperature": 0.0,
            },
            "verbose": False,
            "headless": True,
            "loader_kwargs": {
                "proxy": {
                    "server": f"http://{os.getenv('CORONIUM_HOST')}:{port}",
                    "username": os.getenv("CORONIUM_USER"),
                    "password": os.getenv("CORONIUM_PASS"),
                }
            },
        },
    )

if __name__ == "__main__":
    graph = build_graph(sticky=False)
    result = graph.run()
    print(result)

Operational tips for self-hosted ScrapeGraphAI

  1. Always pass a schema. Pydantic validation catches hallucinated fields before they poison downstream pipelines. Add a retry-on-validation-error loop at the call site.
  2. Cache parsed Markdown. LLM tokens are the expensive part. Hash (url, date) and skip the LLM call if you've already extracted today.
  3. Use sticky ports for flows. Any scrape with >1 HTTP request to the same domain should use a sticky session (5-10 minutes) to keep the IP coherent.
  4. Run headless with a realistic UA. Playwright's default UA is flagged as automation. Override it with a current Chrome/Safari mobile UA that matches the 4G IP's carrier region.
  5. Log graph output JSON at every node. When extraction fails, the bug is almost always at the parse node (bad chunking) or the prompt node (ambiguous prompt), not the LLM.
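Tips 1 and 2 combine naturally into one wrapper around the graph call. A stdlib-only sketch: `run_graph` stands in for your SmartScraperGraph invocation, and the hand-rolled `validate` is a placeholder for a real Pydantic model:

```python
import datetime
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".scrape_cache")

def cache_key(url: str) -> Path:
    # Tip 2: hash (url, date) so each page costs LLM tokens at most once per day
    day = datetime.date.today().isoformat()
    digest = hashlib.sha256(f"{url}|{day}".encode()).hexdigest()[:16]
    return CACHE_DIR / f"{digest}.json"

def validate(raw: dict) -> dict:
    # Stand-in for Pydantic validation (tip 1); swap in your schema model
    if not isinstance(raw.get("title"), str) or not isinstance(raw.get("price_usd"), (int, float)):
        raise ValueError(f"schema mismatch: {raw!r}")
    return raw

def extract_with_retry(url: str, run_graph, retries: int = 3) -> dict:
    """run_graph: callable (url -> raw dict), i.e. your graph.run() wrapper."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = cache_key(url)
    if key.exists():
        return json.loads(key.read_text())
    last_err = None
    for _ in range(retries):
        try:
            result = validate(run_graph(url))
            key.write_text(json.dumps(result))
            return result
        except ValueError as e:
            last_err = e  # retry: the graph may produce valid output next run
    raise last_err
```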

Cloud API Integration with BYOP

If you'd rather not manage Playwright, LLM keys, and retry logic yourself, the ScrapeGraphAI Cloud API wraps everything behind a single REST endpoint. Credits cover the LLM tokens and the fetch. The BYOP (Bring Your Own Proxy) parameter lets you swap the default proxy for your Coronium endpoint so you keep control of IP trust and sticky sessions while offloading orchestration.

REST request structure

POST https://api.scrapegraphai.com/v1/smartscraper
Headers:
  SGAI-APIKEY: sgai-xxxxxxxxxxxxxxxxxxxx
  Content-Type: application/json

Body:
{
  "website_url": "https://example.com/product/123",
  "user_prompt": "Extract title, price, SKU, stock status, image URLs",
  "proxy": "http://USER:PASS@us.coronium.io:10002",
  "output_schema": { ... optional JSON Schema ... },
  "render_heavy_js": true,
  "total_timeout": 90
}

Node.js client with BYOP and retry

// npm i axios zod
import axios from "axios";
import { z } from "zod";

const Product = z.object({
  title: z.string(),
  price_usd: z.number(),
  sku: z.string().nullable(),
  in_stock: z.boolean(),
});

async function smartScrape(url: string, retries = 3) {
  const proxy = process.env.CORONIUM_PROXY_URL!;  // http://u:p@us.coronium.io:10002
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const { data } = await axios.post(
        "https://api.scrapegraphai.com/v1/smartscraper",
        {
          website_url: url,
          user_prompt: "Extract title, price in USD, SKU, and stock boolean.",
          proxy,
          render_heavy_js: true,
          total_timeout: 120,
        },
        {
          headers: {
            "SGAI-APIKEY": process.env.SGAI_API_KEY!,
            "Content-Type": "application/json",
          },
          timeout: 125_000,
        }
      );
      return Product.parse(data.result);
    } catch (err: any) {
      if (attempt === retries) throw err;
      await new Promise(r => setTimeout(r, 2000 * attempt));
    }
  }
}

Cloud API pricing at scale (2026)

Plan    | Monthly credits | SmartScraper pages | Price         | Notes
Free    | 50              | ~5                 | $0            | Evaluate + prototype
Starter | ~5,000          | ~500               | $17-19/mo     | Small production workloads
Growth  | Higher          | Several thousand   | See dashboard | Team + scheduled jobs
Annual  | Any plan        | Same               | -15%          | Best $/credit for committed workloads

SmartScraper = 10 credits, Search Scraper = 30 credits. Annual billing cuts 15%. Exact monthly credit allotments and higher-tier prices are set on the ScrapeGraphAI pricing page; always confirm live numbers before building a unit-economics model.
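For a rough unit-economics model, the credit arithmetic is simple enough to script. A sketch using the approximate Starter numbers quoted above (~5,000 credits for ~$18/mo); confirm live figures before relying on it:

```python
# Credit costs per call, from the ScrapeGraphAI pricing described above
CREDITS = {"smartscraper": 10, "search_scraper": 30}

def monthly_cost_estimate(pages: int, searches: int,
                          plan_credits: int = 5000,
                          plan_price: float = 18.0) -> dict:
    """Estimate monthly credit usage and cost for a workload mix.
    plan_credits / plan_price are approximate Starter-plan assumptions."""
    used = pages * CREDITS["smartscraper"] + searches * CREDITS["search_scraper"]
    price_per_credit = plan_price / plan_credits
    return {
        "credits_used": used,
        "plans_needed": -(-used // plan_credits),  # ceiling division
        "est_cost_usd": round(used * price_per_credit, 2),
    }

print(monthly_cost_estimate(pages=400, searches=20))
# 400*10 + 20*30 = 4,600 credits: fits within one Starter plan
```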

LLM Model Selection: GPT-4o vs Claude vs Gemini vs Ollama

ScrapeGraphAI is model-agnostic by design. The right choice depends on three axes: accuracy on your target pages, cost per 1M tokens, and data residency (sometimes you legally cannot send content to a third-party cloud).

Model                       | Best for                                 | Accuracy  | Cost (relative) | Context  | On-prem?
GPT-4o                      | Balanced default, most site types        | Excellent | $$$             | 128K     | No
GPT-4o-mini                 | High-volume, cost-sensitive SmartScraper | Good      | $               | 128K     | No
Claude Sonnet 4.5+          | Long dense pages, legal/medical copy     | Excellent | $$$             | 200K+    | No
Claude Haiku                | Fast + cheap SmartScraper runs           | Good      | $               | 200K     | No
Gemini 2.5 Pro/Flash        | Search Scraper aggregation, multimodal   | Very good | $-$$            | 1M       | No
Llama 3.3 70B (Ollama)      | On-prem, privacy-sensitive extraction    | OK-Good   | HW only         | 128K     | Yes
Qwen 2.5 / Mistral (Ollama) | Lighter local runs on consumer GPUs      | OK        | HW only         | 32K-128K | Yes

Swapping models in ScrapeGraphAI

# OpenAI
{"model": "openai/gpt-4o-mini", "api_key": os.getenv("OPENAI_API_KEY")}

# Anthropic
{"model": "anthropic/claude-sonnet-4-5", "api_key": os.getenv("ANTHROPIC_API_KEY")}

# Google Gemini
{"model": "google_genai/gemini-2.5-flash", "api_key": os.getenv("GEMINI_API_KEY")}

# Local Ollama (Llama 3.3 70B)
{
    "model": "ollama/llama3.3",
    "model_tokens": 128000,
    "base_url": "http://localhost:11434",
}

A pragmatic model-selection heuristic

  1. Start with GPT-4o-mini or Gemini Flash for 80% of jobs. Cheap, fast, accurate enough.
  2. Escalate to GPT-4o or Claude Sonnet when validation failures exceed 3-5% of runs.
  3. Pick Claude Sonnet for long pages (news longforms, legal docs, SEC filings) where the 200K+ context matters.
  4. Pick Gemini Flash for Search Scraper when you're aggregating 5-10 pages per query and cost dominates.
  5. Pick Ollama only when legal/privacy requirements forbid sending content to a third party.
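Step 2 of the heuristic is easy to automate: track validation failures over a sliding window and switch the model string in your graph config once the rate crosses the threshold. A sketch; the window/threshold mechanics are my own, while the model strings match this guide's config examples:

```python
from collections import deque

class ModelEscalator:
    """Escalate from a cheap model to a stronger one when the validation
    failure rate over a sliding window exceeds the threshold."""
    def __init__(self, window: int = 100, threshold: float = 0.05):
        self.results = deque(maxlen=window)
        self.threshold = threshold
        self.models = ["openai/gpt-4o-mini", "openai/gpt-4o"]
        self.level = 0

    def record(self, ok: bool) -> str:
        """Record one run's validation outcome; return the model to use next."""
        self.results.append(ok)
        failure_rate = self.results.count(False) / len(self.results)
        if failure_rate > self.threshold and self.level < len(self.models) - 1:
            self.level += 1
            self.results.clear()  # start a fresh window for the stronger model
        return self.models[self.level]

esc = ModelEscalator(window=20, threshold=0.05)
for ok in [True] * 18 + [False, False]:
    model = esc.record(ok)
print(model)  # escalated to "openai/gpt-4o" after the failure rate passed 5%
```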

ScrapeGraphAI vs Firecrawl vs Crawl4AI

All three tools sit in the modern "AI-native scraping" category and all three play well with Coronium mobile proxies. They differ in surface area, pricing model, and default workflow.

Dimension              | ScrapeGraphAI                     | Firecrawl                     | Crawl4AI
Primary metaphor       | Natural-language graph extraction | Crawl + convert to Markdown   | Async Python crawler with LLM strategies
License                | Apache 2.0 OSS + Cloud            | AGPL OSS + Cloud              | Apache 2.0 fully OSS
Hosted API             | Yes (scrapegraphai.com)           | Yes (firecrawl.dev)           | No (self-host only)
Free tier              | 50 credits                        | 500 credits                   | Unlimited (self-host)
Entry paid plan        | $17-19/mo                         | ~$16-19/mo                    | N/A
Single-page extraction | SmartScraper (10 cr)              | /scrape + extract             | arun + LLM strategy
Multi-source search    | Search Scraper (30 cr)            | /search                       | Manual w/ SearxNG
URL to Markdown        | Markdownify                       | Native output                 | Built-in
Full-site crawler      | Basic                             | Excellent (core feature)      | Excellent
LLM-flexible           | OpenAI, Anthropic, Gemini, Ollama | Multiple via extract endpoint | Any LangChain LLM
BYOP (custom proxy)    | Yes (proxy param)                 | Yes (proxy config)            | Full control
Natural-language UX    | Best-in-class                     | Prompt on /extract            | Strategy-based

Pick ScrapeGraphAI when

  • You want the cleanest natural-language prompt interface
  • Multi-source Search Scraper with source attribution matters
  • You want the choice between OSS and Cloud without a rewrite

Pick Firecrawl when

  • You need to crawl entire documentation sites into Markdown
  • Building a RAG knowledge base as a primary goal
  • The generous free tier (500 cr) helps you prototype

Pick Crawl4AI when

  • You must self-host with no third-party cloud
  • You need deep programmatic control over crawler strategies
  • Cost discipline (hardware-only) beats developer velocity

All three pair the same way with Coronium

Whether you pick ScrapeGraphAI, Firecrawl, or Crawl4AI, the IP layer is the same problem: your fetcher needs to look like a real user. Coronium's 4G/5G mobile pools, CGNAT shared IPs, automatic rotation and sticky sessions plug into all three via their respective proxy parameters. You can even A/B two tools against the same Coronium endpoint to see which gives better extraction quality on your specific target domains.
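A harness for that A/B test needs nothing tool-specific: wrap each tool's call in a `url -> dict` function and diff the extracted fields. A sketch; the wrapper callables are yours to supply, and nothing here is part of either library's API:

```python
def ab_compare(url: str, tool_a, tool_b) -> dict:
    """tool_a / tool_b: callables (url -> dict) wrapping any two scrapers
    pointed at the same Coronium endpoint. Returns a field-level diff so
    you can score extraction quality per target domain."""
    a, b = tool_a(url), tool_b(url)
    keys_a, keys_b = set(a), set(b)
    return {
        "only_in_a": sorted(keys_a - keys_b),
        "only_in_b": sorted(keys_b - keys_a),
        "disagree": sorted(k for k in (keys_a & keys_b) if a[k] != b[k]),
    }

# Example with stubbed results standing in for two real tool runs
diff = ab_compare(
    "https://example-shop.com/p/1",
    lambda u: {"title": "X1", "price": 999.0, "sku": "A-1"},
    lambda u: {"title": "X1", "price": 949.0},
)
print(diff)  # {'only_in_a': ['sku'], 'only_in_b': [], 'disagree': ['price']}
```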


Premium Mobile Proxy Pricing

Coronium mobile proxy ports are priced per country. Every port includes a dedicated device, a real mobile carrier IP, 10-100 Mbps speed, and unlimited data:

USA $129/mo      | UK $97/mo          | France $79/mo     | Germany $89/mo
Spain $96/mo     | Netherlands $79/mo | Australia $119/mo | Italy $127/mo
Brazil $99/mo    | Canada $159/mo     | Poland $69/mo     | Ireland $59/mo
Lithuania $59/mo | Portugal $89/mo    | Romania $49/mo    | Ukraine $27/mo
Georgia $69/mo   | Thailand $59/mo

Save up to 10% when you order 5+ proxy ports. An example USA configuration: AT&T in Florida (New York also available) at $129/month with unlimited bandwidth. No commitment, cancel anytime, with a money-back guarantee if not satisfied. Secure payment via credit card, PayPal, Bitcoin, and more; 2 free modem replacements per 24h.

Perfect for:

  • Multi-account management
  • Web scraping without blocks
  • Geo-specific content access
  • Social media automation

500+ active users, 10+ countries, 95%+ trust score, 20h/d support.

Power Your ScrapeGraphAI Pipelines with Mobile Proxies

CGNAT-shared 4G and 5G IPs raise trust scores at Cloudflare, DataDome, and PerimeterX, keeping SmartScraper, Search Scraper, and Markdownify delivering structured JSON, not 403s.

  • 95%+ success rate
  • Automatic IP rotation
  • Sticky 5-10 minute sessions
  • HTTP/SOCKS5 protocol support