
Web Scraper

Scrape, crawl & extract web data

Web Scraper turns any website into structured, readable data. It handles JavaScript-heavy pages, anti-bot protection, geographic targeting, and AI-powered data extraction — all without writing any code. Whether you need the text from a single page, a full site crawl, or a structured dataset extracted to a schema, it's all one tool.

Standard web scrapers break on modern sites that require JavaScript to render. This one renders fully, bypasses bot detection, supports authentication headers, and can extract specific data fields using AI so you get exactly what you asked for, not raw HTML.

What you can do

  • scrape_page — scrape a single page with JS rendering; returns markdown, HTML, or filtered content; handles JSON APIs too
  • crawl_site — recursively crawl a site up to a depth and page limit, collecting all content
  • map_site — fast URL discovery without content, useful before deciding what to crawl
  • extract_data — AI-powered structured extraction — describe what you want or pass a JSON schema
  • search_web — search the web and optionally scrape the top results in one call
  • stealth_scrape / stealth_crawl — enhanced anti-bot bypass for heavily protected pages

Who it's for

Researchers aggregating content from multiple sites. Data teams building datasets from web sources. Developers integrating web content into pipelines. Marketers monitoring competitor pricing and messaging. Analysts tracking content changes over time.

How to use it

  1. Start with scrape_page for a single URL — add onlyMainContent to strip navigation and footers
  2. Use map_site to discover URLs on a domain before deciding what to crawl
  3. Use crawl_site for recursive multi-page collection with depth and path filters
  4. Use extract_data when you want specific fields — pass a schema or describe what to extract in plain English
  5. If you get 403 errors, switch to stealth_scrape for enhanced proxy routing
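The escalation path in step 5 can be sketched as a small wrapper. Note that `scrape` and `stealth_scrape` below are hypothetical stand-ins for however your client invokes the skills, not a documented API:

```python
def scrape_with_fallback(url, scrape, stealth_scrape):
    """Try a normal scrape first; escalate to stealth on a 403."""
    result = scrape(url)
    if result.get("status") == 403:
        # Bot protection detected: retry through enhanced proxies
        result = stealth_scrape(url)
    return result

# Stub clients standing in for the real skill calls
blocked_scrape = lambda url: {"status": 403}
stealth = lambda url: {"status": 200, "markdown": "# Page"}

page = scrape_with_fallback("https://example.com", blocked_scrape, stealth)
```

The same pattern works for crawls: start with crawl_site and rerun the failed subset through stealth_crawl.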

Getting started

All skills work without configuration. For geo-targeted results, add a country code. For authenticated pages, pass custom headers with your credentials.
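As a sketch, a geo-targeted, authenticated request might carry parameters like these. The `country` and `headers` names appear in this tool's changelog; the exact accepted values are an assumption:

```python
# Hypothetical scrape_page parameters for a geo-targeted, authenticated request.
params = {
    "url": "https://example.com/pricing",
    "country": "de",  # serve the page as seen from a German IP
    "headers": {
        # Credentials for authenticated pages are passed as custom headers
        "Authorization": "Bearer <token>",
    },
}
```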

Scrape Page

Scrape a single web page with full JavaScript rendering, anti-bot bypass, and configurable output formats. Supports markdown, HTML, and content filtering by CSS tags.

Returns: Scraped page content in the requested formats (markdown, HTML, etc.) with metadata, or parsed JSON for API endpoints
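A minimal request might look like the following. The `onlyMainContent` flag is named in the usage notes above; the `formats` key is an assumed parameter name for selecting output formats:

```python
# Hypothetical scrape_page request body
request = {
    "url": "https://example.com/blog/post",
    "formats": ["markdown", "html"],  # ask for both output formats
    "onlyMainContent": True,          # strip navigation, headers, and footers
}
```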
Crawl Site

Recursively crawl a website starting from a URL, following links up to a configurable depth and page limit. Returns scraped content for all discovered pages.

Returns: Array of scraped pages with content, metadata, and crawl status
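A depth- and path-limited crawl might be configured like this. The key names (`maxDepth`, `limit`, `includePaths`) are assumptions based on the depth, page-limit, and path-filter options described above:

```python
# Hypothetical crawl_site parameters
crawl = {
    "url": "https://example.com/docs",
    "maxDepth": 2,              # follow links at most two hops from the start URL
    "limit": 50,                # stop after 50 pages
    "includePaths": ["/docs"],  # only follow links under /docs
}
```

Tight depth and path filters keep crawls fast and avoid pulling in unrelated sections of a site.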
Map Site

Quickly discover all URLs on a website using sitemaps and link analysis. Returns a flat list of URLs without scraping content. Optionally filter by keyword.

Returns: List of discovered URLs from the website via sitemaps and link analysis
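Because map_site returns a flat URL list, a keyword filter over the result is a one-liner. The URLs below are illustrative:

```python
# Filter discovered URLs by keyword before deciding what to crawl
discovered = [
    "https://example.com/",
    "https://example.com/blog/scraping-tips",
    "https://example.com/docs/api",
    "https://example.com/blog/changelog",
]
blog_urls = [u for u in discovered if "blog" in u]
```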
Extract Structured Data

Use AI to extract structured data from one or more web pages. Provide a JSON Schema for typed output or a natural language prompt for flexible extraction.

Returns: AI-extracted structured data from the provided URLs, matching the schema or prompt
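A typed extraction request could pair target URLs with a standard JSON Schema, for example to pull product data. The wrapper keys (`urls`, `schema`) are assumed parameter names; the schema itself is ordinary JSON Schema:

```python
# Hypothetical extract_data request: URLs plus a JSON Schema for typed output
extract = {
    "urls": ["https://example.com/product/123"],
    "schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "in_stock": {"type": "boolean"},
        },
        "required": ["name", "price"],
    },
}
```

Alternatively, skip the schema and describe the fields in plain English when you don't need strictly typed output.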
Search & Scrape

Search the web using a query and optionally scrape the content of each result page. Returns search results with titles, URLs, and snippets, plus full page content when scraping is enabled.

Returns: Search results with titles, URLs, snippets, and optionally full scraped page content in markdown
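A combined search-and-scrape call might take a shape like this. The key names (`query`, `limit`, `scrapeResults`) are assumptions for illustration:

```python
# Hypothetical search_web parameters: a query plus a flag to scrape each hit
search = {
    "query": "site:example.com pricing changes 2026",
    "limit": 5,              # top five results
    "scrapeResults": True,   # return full markdown per hit, not just snippets
}
```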
Stealth Scrape Page

Scrape a single bot-protected web page using enhanced residential proxies, geo-targeted IPs, and extended rendering wait times. Bypasses Cloudflare, Akamai, DataDome, and similar anti-bot systems.

Returns: Scraped page content from bot-protected sites in the requested formats with metadata
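A stealth request could combine proxy geo-targeting with a longer render wait. The `country` parameter is named in the changelog; the `waitFor` delay is an assumed knob for the extended rendering described above:

```python
# Hypothetical stealth_scrape parameters
stealth_params = {
    "url": "https://protected.example.com/",
    "country": "us",   # route through US residential proxies
    "waitFor": 5000,   # extra rendering wait (ms) for heavy anti-bot checks
}
```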
Stealth Crawl Site

Recursively crawl a bot-protected website using enhanced proxies on every page. Bypasses anti-bot systems across the entire crawl, with geo-targeted IPs and extended rendering.

Returns: Array of scraped pages from bot-protected sites with content, metadata, and crawl status

Changelog

v0.05 (2026-04-04)
  • scrape_page now handles JSON API endpoints — returns parsed JSON and markdown instead of empty content
  • Added headers parameter to scrape_page for custom request headers (e.g. Authorization)
v0.04 (2026-03-23)
  • Added stealth_scrape and stealth_crawl skills for bot-protected websites
v0.03 (2026-03-23)
  • Added proxy, country, and languages parameters to scrape_page and crawl_site for anti-bot bypass and geo-targeting
v0.02 (2026-03-22)
  • Added subtitle, expanded description, and agent instructions
v0.01 (2026-03-20)
  • Initial release


Frequently Asked Questions

Can it handle JavaScript-heavy sites?

Yes. JavaScript rendering is built in, so dynamic pages are part of the normal workflow.

Can it crawl and discover new URLs?

Yes. It can scrape a page, crawl sites, and discover URLs in one tool.

What formats can I extract?

You can pull markdown, HTML, or AI-extracted structured data, depending on how structured you want the result to be.

Does it help with anti-bot pages?

Yes. Anti-bot bypass is built in, and the stealth variants add enhanced proxies for heavily protected pages.