
Web Scraper

Scrape, crawl & extract web data

Turn any website into structured data with JS rendering, anti-bot bypass, and automatic extraction. Scrape pages, crawl sites, discover URLs, extract typed data with AI, or search and scrape at once. Supports markdown, HTML, CSS filtering, and mobile viewports.

7 skills · v0.04
Scrape Page

Scrape a single web page with full JavaScript rendering, anti-bot bypass, and configurable output formats. Supports markdown, HTML, and content filtering by CSS tags.

Returns: Scraped page content in the requested formats (markdown, HTML, etc.) with metadata
Parameters
url * (string): URL of the page to scrape
formats (array): Output formats to return (e.g. "markdown", "html", "rawHtml", "links", "screenshot")
onlyMainContent (boolean): Extract only the main content, removing navbars, footers, and sidebars
includeTags (array): CSS tags to include in extraction (e.g. ["article", "main"])
excludeTags (array): CSS tags to exclude from extraction (e.g. ["nav", "footer"])
mobile (boolean): Use a mobile user agent and viewport for rendering
waitFor (number): Milliseconds to wait after page load before capturing content
timeout (number): Maximum time in milliseconds to wait for the page to load
proxy (string): Proxy type: "basic" (fast, default), "enhanced" (anti-bot bypass, slower), or "auto" (tries basic first, falls back to enhanced)
country (string): ISO country code for geo-targeted proxy (e.g. "us", "gb", "de", "jp")
languages (array): Browser language headers for geo-targeted requests (e.g. ["en-US", "en"])
Example
Scrape a page as markdown
curl -H "Authorization: Bearer $TOOLROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "tool": "web-scraper",
  "skill": "scrape_page",
  "input": {
    "url": "https://example.com"
  }
}' \
  https://api.toolrouter.com/v1/tools/call
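
The optional parameters slot into the same payload. A sketch in Python of a fuller scrape_page call body (the shape mirrors the curl example above; the field values are illustrative):

```python
import json

# Illustrative scrape_page payload exercising the optional parameters
# (formats, onlyMainContent, excludeTags, proxy) alongside the required url.
payload = {
    "tool": "web-scraper",
    "skill": "scrape_page",
    "input": {
        "url": "https://example.com",
        "formats": ["markdown", "links"],
        "onlyMainContent": True,
        "excludeTags": ["nav", "footer"],
        "proxy": "auto",  # try the fast proxy first, fall back to enhanced
    },
}
print(json.dumps(payload, indent=2))
```

Pass the printed JSON as the `-d` body of the curl call shown above.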
Crawl Site

Recursively crawl a website starting from a URL, following links up to a configurable depth and page limit. Returns scraped content for all discovered pages.

Returns: Array of scraped pages with content, metadata, and crawl status
Parameters
url * (string): Starting URL for the crawl
limit (number): Maximum number of pages to crawl (default 50)
maxDepth (number): Maximum link-following depth from the starting URL
includePaths (array): URL path patterns to include (e.g. ["/blog/*", "/docs/*"])
excludePaths (array): URL path patterns to exclude (e.g. ["/admin/*", "/api/*"])
allowSubdomains (boolean): Whether to follow links to subdomains of the starting URL
allowExternalLinks (boolean): Whether to follow links to external domains
proxy (string): Proxy type: "basic" (fast, default), "enhanced" (anti-bot bypass, slower), or "auto" (tries basic first, falls back to enhanced)
country (string): ISO country code for geo-targeted proxy (e.g. "us", "gb", "de", "jp")
languages (array): Browser language headers for geo-targeted requests (e.g. ["en-US", "en"])
Example
Crawl a blog with a 20-page limit
curl -H "Authorization: Bearer $TOOLROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "tool": "web-scraper",
  "skill": "crawl_site",
  "input": {
    "url": "https://example.com/blog",
    "limit": 20
  }
}' \
  https://api.toolrouter.com/v1/tools/call
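
includePaths, excludePaths, and maxDepth keep a crawl from wandering off-topic. A sketch of a scoped crawl_site payload (the specific path patterns here are illustrative, not prescribed by the API):

```python
import json

# Scoped crawl: stay under /blog/, skip tag archives, and follow links
# at most two levels deep from the starting URL.
payload = {
    "tool": "web-scraper",
    "skill": "crawl_site",
    "input": {
        "url": "https://example.com/blog",
        "limit": 20,
        "maxDepth": 2,
        "includePaths": ["/blog/*"],
        "excludePaths": ["/blog/tag/*"],
    },
}
print(json.dumps(payload))
```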
Map Site

Quickly discover all URLs on a website using sitemaps and link analysis. Returns a flat list of URLs without scraping content. Optionally filter by keyword.

Returns: List of discovered URLs from the website via sitemaps and link analysis
Parameters
url * (string): Website URL to map
limit (number): Maximum number of URLs to return (default 1000)
search (string): Keyword filter to narrow down discovered URLs
Example
Map all URLs on a website
curl -H "Authorization: Bearer $TOOLROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "tool": "web-scraper",
  "skill": "map_site",
  "input": {
    "url": "https://example.com"
  }
}' \
  https://api.toolrouter.com/v1/tools/call
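
The search parameter turns a full site map into a targeted URL list, which is a cheap first pass before deciding which pages to scrape. A sketch of a filtered map_site payload (keyword and limit are illustrative):

```python
import json

# Map a site but keep only URLs matching a keyword, e.g. to find every
# pricing-related page before scraping any of them.
payload = {
    "tool": "web-scraper",
    "skill": "map_site",
    "input": {
        "url": "https://example.com",
        "limit": 100,
        "search": "pricing",  # keyword filter over discovered URLs
    },
}
print(json.dumps(payload))
```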
Extract Structured Data

Use AI to extract structured data from one or more web pages. Provide a JSON Schema for typed output or a natural language prompt for flexible extraction.

Returns: AI-extracted structured data from the provided URLs, matching the schema or prompt
Parameters
urls * (array): URLs to extract data from
schema (object): JSON Schema defining the structure of data to extract
prompt (string): Natural language prompt describing what data to extract
Example
Extract product details using a schema
curl -H "Authorization: Bearer $TOOLROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "tool": "web-scraper",
  "skill": "extract_data",
  "input": {
    "urls": [
      "https://example.com/product/123"
    ],
    "schema": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "price": {
          "type": "number"
        },
        "currency": {
          "type": "string"
        }
      }
    }
  }
}' \
  https://api.toolrouter.com/v1/tools/call
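
When typed output matters less than flexibility, a prompt can replace the schema. A sketch of the prompt-based variant (the output shape is then determined by the model rather than validated against a JSON Schema):

```python
import json

# Prompt-based extraction: describe what you want in plain language
# instead of supplying a JSON Schema.
payload = {
    "tool": "web-scraper",
    "skill": "extract_data",
    "input": {
        "urls": ["https://example.com/product/123"],
        "prompt": "Extract the product name, price, and currency.",
    },
}
print(json.dumps(payload))
```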
Search & Scrape

Search the web using a query and optionally scrape the content of each result page. Returns search results with titles, URLs, and snippets, plus full page content when scraping is enabled.

Returns: Search results with titles, URLs, snippets, and optionally full scraped page content in markdown
Parameters
query * (string): Search query string
limit (number): Maximum number of search results to return (default 5)
scrapeResults (boolean): Whether to scrape the full content of each search result page (default false)
country (string): Country code for localized results (e.g. "us", "gb", "de")
Example
Search for a topic
curl -H "Authorization: Bearer $TOOLROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "tool": "web-scraper",
  "skill": "search_web",
  "input": {
    "query": "best practices for web scraping 2024"
  }
}' \
  https://api.toolrouter.com/v1/tools/call
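
Setting scrapeResults to true turns a search into a search-and-scrape in one call. A sketch of that payload (limit and country values are illustrative):

```python
import json

# Search and scrape at once: scrapeResults=true returns full page content
# in markdown for each hit instead of just titles, URLs, and snippets.
payload = {
    "tool": "web-scraper",
    "skill": "search_web",
    "input": {
        "query": "best practices for web scraping 2024",
        "limit": 3,
        "scrapeResults": True,
        "country": "us",  # localize the search results
    },
}
print(json.dumps(payload))
```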
Stealth Scrape Page

Scrape a single bot-protected web page using enhanced residential proxies, geo-targeted IPs, and extended rendering wait times. Bypasses Cloudflare, Akamai, DataDome, and similar anti-bot systems.

Returns: Scraped page content from bot-protected sites in the requested formats with metadata
Parameters
url * (string): URL of the page to scrape
formats (array): Output formats to return (e.g. "markdown", "html", "rawHtml", "links", "screenshot")
onlyMainContent (boolean): Extract only the main content, removing navbars, footers, and sidebars
includeTags (array): CSS tags to include in extraction (e.g. ["article", "main"])
excludeTags (array): CSS tags to exclude from extraction (e.g. ["nav", "footer"])
mobile (boolean): Use a mobile user agent and viewport for rendering
waitFor (number): Milliseconds to wait after page load before capturing content (default 3000)
timeout (number): Maximum time in milliseconds to wait for the page to load (default 60000)
country (string): ISO country code for geo-targeted proxy (e.g. "us", "gb", "de", "jp"). Default: "us"
languages (array): Browser language headers (e.g. ["en-US", "en"])
Example
Scrape a Cloudflare-protected page
curl -H "Authorization: Bearer $TOOLROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "tool": "web-scraper",
  "skill": "stealth_scrape",
  "input": {
    "url": "https://example.com/protected-page"
  }
}' \
  https://api.toolrouter.com/v1/tools/call
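
For sites that fingerprint location, the proxy country and browser languages should agree. A sketch of a geo-targeted stealth_scrape payload (country, languages, and waitFor values are illustrative):

```python
import json

# Geo-targeted stealth scrape: route through a German residential proxy,
# send matching Accept-Language headers, and wait longer for rendering.
payload = {
    "tool": "web-scraper",
    "skill": "stealth_scrape",
    "input": {
        "url": "https://example.com/protected-page",
        "country": "de",
        "languages": ["de-DE", "de"],
        "waitFor": 5000,  # ms; above the 3000 ms default
    },
}
print(json.dumps(payload))
```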
Stealth Crawl Site

Recursively crawl a bot-protected website using enhanced proxies on every page. Bypasses anti-bot systems across the entire crawl, with geo-targeted IPs and extended rendering.

Returns: Array of scraped pages from bot-protected sites with content, metadata, and crawl status
Parameters
url * (string): Starting URL for the crawl
limit (number): Maximum number of pages to crawl (default 50)
maxDepth (number): Maximum link-following depth from the starting URL
includePaths (array): URL path patterns to include (e.g. ["/blog/*", "/docs/*"])
excludePaths (array): URL path patterns to exclude (e.g. ["/admin/*", "/api/*"])
allowSubdomains (boolean): Whether to follow links to subdomains of the starting URL
allowExternalLinks (boolean): Whether to follow links to external domains
country (string): ISO country code for geo-targeted proxy (e.g. "us", "gb", "de", "jp"). Default: "us"
languages (array): Browser language headers (e.g. ["en-US", "en"])
Example
Crawl a protected blog section
curl -H "Authorization: Bearer $TOOLROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "tool": "web-scraper",
  "skill": "stealth_crawl",
  "input": {
    "url": "https://example.com/blog",
    "limit": 20
  }
}' \
  https://api.toolrouter.com/v1/tools/call
v0.04 (2026-03-23)
  • Added stealth_scrape and stealth_crawl skills for bot-protected websites
v0.03 (2026-03-23)
  • Added proxy, country, and languages parameters to scrape_page and crawl_site for anti-bot bypass and geo-targeting
v0.02 (2026-03-22)
  • Added subtitle, expanded description, and agent instructions
v0.01 (2026-03-20)
  • Initial release

Quick Start

MCP (Claude Code)
claude mcp add --transport stdio \
  --env TOOLROUTER_API_KEY=YOUR_API_KEY \
  toolrouter -- npx -y toolrouter-mcp
REST API
curl -H "Authorization: Bearer $TOOLROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"tool":"web-scraper","skill":"scrape_page","input":{"url":"https://example.com"}}' \
  https://api.toolrouter.com/v1/tools/call
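
Every skill shares the same call shape, so a thin client covers all of them. A standard-library sketch (the endpoint and payload shape mirror the curl examples above; this is not an official SDK):

```python
import json
import urllib.request

API_URL = "https://api.toolrouter.com/v1/tools/call"

def build_call(skill: str, input_data: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request shared by every web-scraper skill."""
    body = json.dumps({"tool": "web-scraper", "skill": skill, "input": input_data})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_call("map_site", {"url": "https://example.com"}, "YOUR_API_KEY")
# urllib.request.urlopen(req)  # uncomment to actually send the call
print(req.get_method(), req.full_url)
```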

Use Cases

Search Papers by Topic

Find relevant academic papers on any research topic across millions of scholarly publications.

Academic Research · 4 agent guides

Geocode Addresses to Coordinates

Convert street addresses into precise latitude and longitude coordinates for mapping and spatial analysis.

Address Geocoding · 4 agent guides
View all use cases for Web Scraper

Workflows

Ecommerce Competitor Intelligence

Gather ecommerce competitive intelligence by scraping pricing data, researching businesses, monitoring social commerce, and analyzing ad campaigns.

Tools: Web Scraper, Competitor Research, Social Shop Products, Ad Library Search · 4 steps, 4 tools

Real Estate Market Research

Research real estate markets by scraping listings, analyzing locations, visualizing pricing data, and tracking market news.

Tools: Web Scraper, GEO, Generate Chart, Web Search · 4 steps, 4 tools

Pricing Intelligence

Monitor and analyze competitor pricing by extracting price data, analyzing strategies, normalizing currencies, and visualizing comparisons.

Tools: Web Scraper, Competitor Research, Currency Exchange, Generate Chart · 4 steps, 4 tools

Talent Market Research

Research the talent market by scraping job data, benchmarking salaries, analyzing company hiring signals, and comparing labor markets.

Tools: Web Scraper, Web Search, Social Profiles, Country Data · 4 steps, 4 tools
View all 5 workflows →

Frequently Asked Questions

Can it handle JavaScript-heavy sites?

Yes. JavaScript rendering is built in, so dynamic pages are part of the normal workflow.

Can it crawl and discover new URLs?

Yes. It scrapes single pages, crawls entire sites, and discovers URLs via the map_site skill, all from one tool.

What formats can I extract?

You can pull markdown, HTML, or typed data depending on how structured you want the result to be.

Does it help with anti-bot pages?

Yes. Anti-bot bypass is built in: the standard skills accept an enhanced proxy option, and the stealth skills are designed for pages protected by Cloudflare, Akamai, DataDome, and similar systems.