How to Scrape Websites with Claude

How-To · Blake Folgado, Founder at ToolRouter

To scrape websites with Claude, you need to connect a tool that can fetch live URLs — Claude cannot do this natively. Connect ToolRouter and Claude gains access to Web Scraper for standard pages and full site crawls; Stealth Scraper for bot-protected sites behind Cloudflare, Akamai, and DataDome; Lead Finder for enriching scraped data at scale; and social scraping tools for Twitter/X, Reddit, and TikTok. The result is the full scraping workflow inside a single conversation — fetch, extract, structure, and enrich — without touching Python, Playwright, or a separate API.

Search for this question and the results are almost exclusively coding tutorials. Every guide shows you how to write a Playwright script, install a scraping library, or configure a headless browser. That is a genuine solution for developers, but for anyone who wants to use Claude to gather and act on web data, there is a much faster path. Claude already knows how to reason about web data — what it needs is the ability to fetch it.

According to Grand View Research, the global web scraping services market was valued at $1.2 billion in 2024 and is projected to grow at 15% annually through 2030. According to Crayon, 94% of companies say competitive intelligence is important for their growth, and web data is the primary source. According to McKinsey, teams using AI to gather and process external data make decisions 5× faster than those relying on manual research. The demand for web data is not a developer problem — it is a business problem. The tools below solve it without writing a single line of code.

What Claude Can and Cannot Do Natively for Web Scraping

Claude's built-in web capabilities are limited to what it was trained on:

  • Claude can analyze and summarize web content if you paste it directly into the conversation
  • Claude can write web scraping code in Python, JavaScript, or any language you specify
  • Claude can explain how scraping works, what tools exist, and how to structure a scraper
  • Claude cannot fetch a live URL and return its content
  • Claude cannot crawl a site and discover all its pages
  • Claude cannot extract structured data from a live product listing, job board, or directory
  • Claude cannot bypass bot protection on Cloudflare or Akamai protected sites
  • Claude cannot pull live posts, profiles, or comments from social platforms

| What you want | Claude alone | Claude + ToolRouter |
| --- | --- | --- |
| Fetch the content of a specific URL | No | Yes |
| Crawl an entire site and return all page content | No | Yes |
| Extract structured data (prices, names, emails) from any page | No | Yes |
| Bypass bot protection on Cloudflare or Akamai sites | No | Yes |
| Scrape a product catalogue into a spreadsheet | No | Yes |
| Pull posts, profiles, and comments from Reddit or Twitter/X | No | Yes |
| Enrich scraped company data with contacts and tech stack | No | Yes |
| Write a Python scraper that you run yourself | Yes | Yes |

How to Connect ToolRouter and Start

The setup takes two steps.

  1. In Claude, go to Settings → Connectors → Add custom connector
  2. Enter:
  • Name: ToolRouter
  • URL: https://api.toolrouter.com/mcp

Or visit toolrouter.com/connect for one-click setup.

Once connected, describe what you need rather than selecting tools manually:

  • "Scrape [URL] and give me all the product names and prices."
  • "Crawl [website] and give me the content of every page."
  • "This site is behind Cloudflare — use stealth mode and get the content anyway."
  • "Pull the last 50 posts from [subreddit] and tell me what problems people are complaining about."

Claude picks the right tool and the right skill for each request, and handles multi-step workflows without you moving data between applications.
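To make the routing step concrete, here is a minimal sketch of the kind of tool selection described above, mapping a plain-English request to one of the tools named in this article. The keyword rules are purely illustrative assumptions, not ToolRouter's actual routing logic.

```python
# Hypothetical sketch of request-to-tool routing. Tool names come from
# the article; the keyword heuristics are made up for illustration.

def route_request(request: str) -> str:
    text = request.lower()
    if any(k in text for k in ("cloudflare", "stealth", "blocked", "captcha")):
        return "Stealth Scraper"
    if any(k in text for k in ("reddit", "subreddit")):
        return "Reddit Research"
    if any(k in text for k in ("tweet", "twitter", "x.com")):
        return "Twitter Analytics"
    if any(k in text for k in ("enrich", "contacts", "headcount")):
        return "Lead Finder"
    return "Web Scraper"  # default: standard pages and crawls

print(route_request("This site is behind Cloudflare, use stealth mode"))
# Stealth Scraper
```

In practice you never write this logic yourself; the point is that intent, not a tool picker, drives which scraper runs.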

Which ToolRouter Scraping Tool to Use

| If you need... | Use this | Why |
| --- | --- | --- |
| Scrape any public page or crawl a site | Web Scraper | Full JS rendering, anti-bot bypass, markdown or HTML output. Handles standard pages, JSON APIs, and multi-page crawls up to any depth. |
| Get past Cloudflare, Akamai, or DataDome | Stealth Scraper | Residential proxies, geo-targeted IPs, and extended render waits. Use when Web Scraper returns 403, 429, or a CAPTCHA. |
| Extract structured data from scraped pages at scale | Web Scraper | The extract_data skill takes a URL and a schema or plain-English prompt, and returns typed structured data. No selectors required. |
| Scrape a full product catalogue into a file | Catalogue Scraper | Recursively crawls any online store and returns all products, categories, and images as structured data — ready to export. |
| Enrich scraped company data with contacts | Lead Finder | Pass a domain from a scraped page and get contacts, emails, headcount, and tech stack from the enrich_company skill. |
| Pull posts, threads, and comments from Reddit | Reddit Research | Search subreddits, get top posts, and pull full comment threads. Use to understand what a community is actually saying about a topic. |
| Pull posts and profiles from Twitter/X | Twitter Analytics | Get recent tweets from any public profile, search posts by keyword, and pull full tweet transcripts. |

Four Scraping Workflows That Replace Manual Research

1. Scrape Any Page or Crawl a Full Site

Use this when you want to pull content from one page or gather everything from an entire domain.

Workflow: Web Scraper (scrape_page) for a single URL → Web Scraper (map_site + crawl_site) for a full site

Example:

Scrape the pricing page at [URL]. Give me every plan name, price, feature list, and any pricing footnotes. Format it as a comparison table.

For a full site:

Map [website URL] to see all its pages, then crawl the entire site and pull the content from every page. Summarise the site structure and flag any pages that look like they have product or pricing information.

The scrape_page skill renders JavaScript before returning content — it handles React, Vue, and SPA pages that return blank HTML to standard requests. map_site gives Claude a full URL inventory first, so crawl_site can work efficiently rather than discovering pages as it goes. Claude reads the result and structures it according to what you actually need — not just a raw dump.
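The map-first pattern is worth seeing in miniature. This sketch uses an in-memory link graph as a stand-in for a real site and stub fetchers in place of the actual skills; it only illustrates why discovering URLs first (cheap) makes the content crawl (expensive) bounded and efficient.

```python
from collections import deque

# Toy link graph standing in for a real site's internal links.
SITE = {
    "/": ["/pricing", "/blog"],
    "/pricing": ["/"],
    "/blog": ["/", "/blog/post-1"],
    "/blog/post-1": ["/blog"],
}

def map_site(start: str) -> list[str]:
    """Breadth-first URL discovery without fetching page content."""
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)

def crawl_site(urls: list[str]) -> dict[str, str]:
    """Fetch content only for the URLs the map produced (stubbed here)."""
    return {url: f"<content of {url}>" for url in urls}

pages = crawl_site(map_site("/"))
print(sorted(pages))  # ['/', '/blog', '/blog/post-1', '/pricing']
```

The same two-phase shape is what lets you tell Claude to map first, then crawl only the pages that matter.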

2. Bypass Bot Protection with Stealth Mode

Use this when a site returns a CAPTCHA, Cloudflare challenge, or 403 error.

Workflow: Web Scraper (scrape_page) → if blocked, Stealth Scraper (stealth_scrape) or Stealth Scraper (stealth_crawl)

Example:

The competitor site at [URL] is blocking scraping. Use stealth mode to get past it and return the content of their pricing page and their blog index.

Stealth Scraper routes requests through residential proxies with geo-targeted IPs, extends the render wait time for heavy SPAs, and rotates headers to match real browser fingerprints. It handles Cloudflare, Akamai, DataDome, and PerimeterX — the four major anti-bot systems. Claude tries the standard scraper first and escalates to stealth automatically if it detects a block, so you rarely need to specify which to use.
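The escalation behaviour described above can be sketched as a simple heuristic: try the standard scraper, and fall back to stealth when the response looks like a bot-protection block. The status codes match those the article names; the challenge markers and stub fetchers are illustrative assumptions, not the actual detection logic.

```python
# Hedged sketch of standard-then-stealth escalation. The scrapers are
# passed in as stubs; real block detection is more involved than this.

BLOCK_STATUSES = {403, 429}
CHALLENGE_MARKERS = ("cf-challenge", "captcha", "access denied")

def looks_blocked(status: int, body: str) -> bool:
    """Heuristic: status code or challenge text suggests a bot wall."""
    if status in BLOCK_STATUSES:
        return True
    return any(marker in body.lower() for marker in CHALLENGE_MARKERS)

def fetch_with_fallback(url, scrape_page, stealth_scrape):
    status, body = scrape_page(url)
    if looks_blocked(status, body):
        return stealth_scrape(url)  # residential proxies, longer waits
    return body

# A page that serves a Cloudflare challenge to plain scrapers:
blocked = lambda url: (403, "<html>cf-challenge</html>")
stealth = lambda url: "<html>real pricing page</html>"
print(fetch_with_fallback("https://example.com/pricing", blocked, stealth))
```

This is the decision Claude makes for you, which is why you rarely need to name a scraper in the prompt.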

The practical use case: competitor research on sites that detect and block scraping tools. Pricing pages, product catalogues, job boards, and directories are the most commonly protected pages — and the ones you most often want data from.

3. Extract Structured Data at Scale

Use this when you want typed, structured output from many pages rather than raw content.

Workflow: Web Scraper (extract_data) on target URLs → Lead Finder (enrich_company) to add contact and company data

Example:

Go to [directory URL] and extract every company name, website, location, and description listed on the page. Then for each company, enrich the data with their employee count, tech stack, and a contact email if available.

The extract_data skill takes a natural-language description of what you want rather than CSS selectors or XPath. You describe the schema — "I want: company name, founding year, number of employees, and LinkedIn URL" — and Claude writes the extraction logic internally and returns clean typed data. For enrichment, Lead Finder's enrich_company skill takes a domain and returns headcount, tech stack, revenue estimates, and contact information from multiple data sources.
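The shape of the output is the point: typed records, not raw HTML. This sketch shows that shape using a made-up, well-formed directory fragment parsed with the standard library; a real page needs a proper HTML parser, and the field names here are illustrative, not extract_data's actual response format.

```python
import xml.etree.ElementTree as ET

# Made-up directory listing, kept well-formed so ElementTree can parse it.
LISTING = """
<ul>
  <li><span class="name">Acme Co</span><a href="https://acme.example">site</a>
      <em>Widgets for teams</em></li>
  <li><span class="name">Globex</span><a href="https://globex.example">site</a>
      <em>Analytics platform</em></li>
</ul>
"""

def extract_companies(html: str) -> list[dict]:
    """Return one typed record per listing entry."""
    root = ET.fromstring(html)
    return [
        {
            "name": li.find("span").text,
            "website": li.find("a").get("href"),
            "description": li.find("em").text,
        }
        for li in root.findall("li")
    ]

rows = extract_companies(LISTING)
print(rows[0]["name"], rows[1]["website"])  # Acme Co https://globex.example
```

With extract_data you skip the parsing code entirely and just describe the fields; the records above are what comes back.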

This workflow replaces the typical scrape → CSV → manual enrichment → CRM import pipeline. The entire thing runs in one conversation.

4. Scrape Social Platforms Without an API

Use this when you want posts, comments, or profiles from Twitter/X, Reddit, or TikTok.

Workflow: Reddit Research (search_subreddit + get_post_comments) or Twitter Analytics (get_tweets + get_profile) or Social Media Search (search_tiktok_keywords)

Example for Reddit:

Search [subreddit] for posts about [topic] from the last month. Pull the top 20 posts and their comment threads. Tell me: what are the most common complaints, what solutions are people recommending, and what products keep coming up?

Example for Twitter/X:

Get the last 50 tweets from [username]. What topics do they post about most often? What gets the most engagement? Are there any patterns in what performs well?

Reddit Research returns full comment trees, not just headlines — which is where the useful signal actually lives. Twitter Analytics returns full tweet text including thread context, so Claude can reason about positioning and messaging rather than just counting likes. Social Media Search covers TikTok — useful for understanding what content formats work in a niche before investing in production.
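Comment trees are nested replies, and analysing "what a community is saying" usually means walking every branch. This sketch flattens a nested thread into (depth, text) pairs; the tree shape is invented for illustration and is not Reddit Research's actual response format.

```python
# Depth-first walk over a nested comment thread. The dict shape
# ("body" plus optional "replies") is an assumption for this sketch.

def flatten_comments(comments, depth=0):
    """Yield (depth, body) for every comment, replies included."""
    for comment in comments:
        yield depth, comment["body"]
        yield from flatten_comments(comment.get("replies", []), depth + 1)

thread = [
    {"body": "Tool X keeps breaking", "replies": [
        {"body": "Same here, switched to Tool Y", "replies": []},
    ]},
    {"body": "Anyone tried Tool Z?", "replies": []},
]

for depth, body in flatten_comments(thread):
    print("  " * depth + body)
```

Claude does this traversal internally when it summarises a thread, which is why replies (often the real signal) make it into the analysis.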

None of these require you to register for a developer API or manage rate limits. Claude handles the calls and synthesises the results into what you actually want to know.


Best Prompts for Web Scraping in Claude

Scrape a competitor's pricing page

Scrape the pricing page at [URL].
Return every plan, its price (monthly and annual if both exist),
every feature listed under each plan, and any footnotes or asterisks.
Format it as a side-by-side comparison table.

Crawl a site and audit its content

Map [website URL] to get a full list of pages.
Then crawl the site and return the content of every page.
Give me:
1. A list of all pages with a one-line summary of each
2. Any pages that have pricing, product, or contact information highlighted separately
3. Any pages that appear thin or outdated

Extract structured data from a directory

Scrape [directory or listing page URL].
For each company or item listed, extract:
- Name
- Website URL
- Description or tagline
- Location (if shown)
- Any other fields visible on the page

Return the results as a structured table.

Research what a community is saying about a topic

Search Reddit for posts about [topic] in [subreddit or across all of Reddit].
Pull the top 20 posts and their top comments.
Tell me:
1. The most common pain points or complaints
2. Products or solutions people recommend most often
3. Any recurring questions that don't have good answers yet

Why One Connector Beats a Stack of Scraping Tools

The standard web scraping workflow for non-developers: find an online scraping tool, paste a URL, download a CSV, open it in a spreadsheet, manually remove the junk columns, realise the tool doesn't handle JavaScript, find another tool, repeat.

| Setup | What you get | Where it breaks |
| --- | --- | --- |
| Online scraping tools | Simple UI for basic pages | Can't handle JS, bot protection, or structured extraction |
| Python + Playwright | Full control | Requires code; a new script for every new site structure |
| Claude + ToolRouter | Fetch, extract, enrich, and analyse in one conversation | No gaps between steps; Claude adapts to each site automatically |

The specific advantage of keeping scraping inside Claude is that the data never leaves the conversation context. If Claude scrapes 50 company pages, it already has all that data when you ask it to filter, rank, or write outreach based on what it found. You do not export a CSV and then re-explain your criteria somewhere else.

For setup instructions see How to Add Connectors to Claude. If you are scraping sites to build a lead list, the enrichment and outreach steps are covered in How to Find Leads in Claude. For SEO research workflows that combine scraping with keyword and competitor analysis, see Best AI Tools for SEO in 2026. For Claude's full capability picture with connected tools, see What Can Claude Actually Do in 2026?.

Frequently Asked Questions

Can Claude scrape websites natively?

**No.** Claude cannot fetch live URLs, crawl sites, or return current web content without a connected tool. What it can do natively is analyze and reason about content you paste in, or write scraping code you run yourself. Connect [ToolRouter](/connect) and Claude can scrape pages, crawl full sites, bypass bot protection, and extract structured data — all without any code.

What if the site I want to scrape is behind Cloudflare?

Use [Stealth Scraper](/tools/stealth-scraper). It routes requests through residential proxies with geo-targeted IPs and extended rendering wait times, specifically designed to bypass Cloudflare, Akamai, DataDome, and PerimeterX. Ask Claude: "This site is blocking scraping — use stealth mode to get [URL]." Claude will escalate automatically.

Can Claude scrape JavaScript-heavy sites?

**Yes.** Both [Web Scraper](/tools/web-scraper) and [Stealth Scraper](/tools/stealth-scraper) render JavaScript before returning content. React apps, Vue apps, and single-page applications that return blank HTML to basic requests are handled correctly. The scraper waits for the page to fully render, then extracts the content.

Can I extract specific data fields rather than getting the whole page?

**Yes.** The `extract_data` skill in [Web Scraper](/tools/web-scraper) takes a URL and a description of what you want — in plain English or as a JSON schema — and returns only those fields as structured data. You do not need to write CSS selectors or XPath. Just describe what you want: "Extract the product name, price, and stock status from each item on this page."

Can Claude scrape multiple pages at once?

**Yes.** `crawl_site` recursively crawls any domain up to a depth and page limit you specify. For a full site, Claude will typically `map_site` first (fast URL discovery with no content) and then crawl only the pages you care about. You can also give Claude a list of URLs and it will scrape each one in sequence, combining the results.

Is it legal to scrape websites?

Web scraping legality depends on the site, jurisdiction, and how the data is used. Public data that is not behind a login, not covered by specific terms of service restrictions, and not used to harm the site (e.g. through excessive load) is generally considered fair use in most jurisdictions. In the landmark *hiQ Labs v. LinkedIn* case, the Ninth Circuit held that scraping publicly available data does not violate the Computer Fraud and Abuse Act, although hiQ ultimately lost on breach-of-contract grounds before the case settled. That said, you should review a site's terms of service before scraping it, and avoid scraping in ways that could be considered harmful or that explicitly violate terms you agreed to.
