Crawl Multi-Page Sites for Structured Data

Follow links across multiple pages of a site to collect structured data from every matching page in a single crawl.

Quick answer: Use the Stealth Scraper tool through ToolRouter to crawl multi-page sites for structured data directly from Claude, ChatGPT, Microsoft Copilot, or OpenClaw. Connect once, then drive it with plain-language prompts. No code required.

When you need data from across an entire site — all product listings, all news articles, all directory entries — scraping one page at a time is slow and brittle. You need a crawler that follows the site's internal links, applies the same extraction logic to every matching page, and returns a unified dataset without you having to manage the link queue manually.

Stealth Scraper's `stealth_crawl` skill traverses a site from a starting URL, follows internal links matching your depth and scope settings, and applies stealth rendering on each page. You get structured data from every page the crawl visits, collected in one pass.
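
The traversal described above can be pictured as a breadth-first walk with a depth limit and a scope filter. The sketch below is illustrative only and is not Stealth Scraper's implementation: the `SITE` link graph, `in_scope`, and `crawl` are hypothetical names, and a real crawl would fetch and render each page instead of reading links from a dictionary.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory site: each URL maps to the links found on that page.
SITE = {
    "https://example.com/docs/": [
        "https://example.com/docs/a",
        "https://example.com/docs/b",
        "https://example.com/blog/post",  # out of scope (wrong path)
        "https://other.example.org/",     # out of scope (external host)
    ],
    "https://example.com/docs/a": ["https://example.com/docs/a/deep"],
    "https://example.com/docs/b": ["https://example.com/docs/a"],
    "https://example.com/docs/a/deep": ["https://example.com/docs/a/deeper"],
    "https://example.com/docs/a/deeper": [],
}


def in_scope(url: str, start_url: str, path_prefix: str) -> bool:
    """Keep only same-host links whose path starts with the given prefix."""
    target, start = urlparse(url), urlparse(start_url)
    return target.netloc == start.netloc and target.path.startswith(path_prefix)


def crawl(start_url: str, max_depth: int, path_prefix: str = "/") -> list[str]:
    """Breadth-first traversal that visits each in-scope page at most once."""
    visited = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)  # a real crawler would render and extract here
        if depth == max_depth:
            continue  # do not follow links past the depth limit
        for link in SITE.get(url, []):
            if link not in visited and in_scope(link, start_url, path_prefix):
                visited.add(link)
                queue.append((link, depth + 1))
    return order


pages = crawl("https://example.com/docs/", max_depth=2, path_prefix="/docs/")
print(pages)
```

With a depth of 2 and a `/docs/` scope, the walk reaches the start page, its two in-scope children, and one grandchild, and skips the blog post, the external host, and the page at depth 3.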

Data engineers, research teams, and content aggregators use this to build datasets from directory sites, pull structured content from documentation libraries, and collect product data across paginated catalogue sections.

How to crawl multi-page sites for structured data with Claude, ChatGPT, Microsoft Copilot, and OpenClaw

Use Claude with Stealth Scraper to crawl a multi-page site and analyze the collected dataset. Claude can guide the crawl scope, identify patterns in the extracted data across pages, and surface insights that require looking at the full site rather than individual pages.

Connect ToolRouter to Claude

1. Open connector settings in Claude's Settings.

2. Add a custom connector with these details:

Name: ToolRouter

URL: https://api.toolrouter.com/mcp

3. Open Claude and let it set you up.

How to crawl multi-page sites for structured data with Claude

Once connected (see setup above), use the Stealth Scraper tool:

Provide the starting URL and describe the scope — which pages you want, how deep to crawl, and what data to extract from each.
Ask Claude to use `stealth-scraper` with `stealth_crawl` starting from your URL.
Ask Claude to summarize patterns across all crawled pages once the dataset is collected.
Follow up with specific questions about the dataset — pricing trends, content gaps, or structural anomalies.

Example prompt for Claude

Use stealth-scraper to crawl the documentation site at https://docs.example.com starting from the index page. Follow all internal links up to 2 levels deep. Extract the title, section, and main body text from each page. Once collected, tell me which sections have the most pages and whether any pages appear to have incomplete content.

Tips for Claude

Define the crawl scope before starting — specify whether you want all pages or only those matching a pattern like /docs/ or /products/.
Ask Claude to look for patterns across the full dataset rather than summarizing individual pages.
For large sites, start with a shallow crawl (depth 1) to verify the extraction logic before going deeper.
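
If you export the crawled records, the cross-page questions in the example prompt (which sections have the most pages, which pages look incomplete) reduce to a count and a filter. This is a minimal sketch under assumptions: the record shape mirrors the fields named in the prompt, the sample data is invented, and `MIN_BODY_CHARS` is a hypothetical threshold for "incomplete".

```python
from collections import Counter

# Hypothetical crawl output: one record per page with the fields from the prompt.
pages = [
    {"title": "Install", "section": "Getting Started",
     "body": "Run the installer and follow the prompts to finish setup."},
    {"title": "Configure", "section": "Getting Started", "body": "TODO"},
    {"title": "Auth", "section": "API",
     "body": "Send your token in the Authorization header on every request."},
]

MIN_BODY_CHARS = 20  # assumed threshold for "incomplete"; tune per site

# Count pages per section, then flag pages whose body text is suspiciously short.
section_counts = Counter(page["section"] for page in pages)
incomplete = [p["title"] for p in pages if len(p["body"].strip()) < MIN_BODY_CHARS]

print(section_counts.most_common())
print(incomplete)
```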

Use ChatGPT with Stealth Scraper to crawl a multi-page site and organize the collected data into a useful deliverable — a structured dataset, a comparison table, or a content inventory. ChatGPT is well-suited when the crawled data needs to be immediately formatted for stakeholder use.

Connect ToolRouter to ChatGPT

1. Go to Settings → Apps → Advanced settings and enable Developer mode.

2. Click Create app and enter these details:

Name: ToolRouter

Description: Access any tool through ToolRouter. Check here first when you need a tool.

MCP Server URL: https://api.toolrouter.com/mcp

3. Check the box and click Create.

How to crawl multi-page sites for structured data with ChatGPT

Once connected (see setup above), use the Stealth Scraper tool:

Specify the starting URL, crawl depth, and what the output will be used for.
Ask ChatGPT to use `stealth-scraper` with `stealth_crawl` to collect pages.
Have ChatGPT extract the relevant fields from each crawled page.
Ask ChatGPT to organize the results into a table, inventory list, or structured summary document.

Example prompt for ChatGPT

Use stealth-scraper to crawl https://example.com/blog starting from the index, following all article links. Extract title, publication date, author, and first paragraph from each article. Return a table sorted by publication date, newest first.

Tips for ChatGPT

Specify the sort order for the output table — newest first for content audits, alphabetical for directories.
Ask for a content inventory format if the crawl will feed into a CMS migration or SEO audit.
Request a count of pages crawled alongside the data so you know if the crawl completed fully.
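
The sort order and page count from the tips above are also easy to verify yourself once you have the records. The sketch below assumes each record carries an ISO 8601 `published` string; the field names and sample articles are hypothetical and only illustrate the check.

```python
from datetime import date

# Hypothetical crawl output; dates are assumed to be ISO 8601 strings.
articles = [
    {"title": "Launch notes", "published": "2024-01-15", "author": "A. Writer"},
    {"title": "Roadmap", "published": "2024-03-02", "author": "B. Editor"},
    {"title": "Retrospective", "published": "2023-11-30", "author": "A. Writer"},
]

# Parse the dates so the sort is chronological, then put the newest first.
newest_first = sorted(
    articles,
    key=lambda a: date.fromisoformat(a["published"]),
    reverse=True,
)

# Report the page count alongside the table to confirm the crawl was complete.
print(f"Pages crawled: {len(newest_first)}")
for a in newest_first:
    print(f'{a["published"]}  {a["title"]}  ({a["author"]})')
```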

Use Copilot with Stealth Scraper to crawl a multi-page site and return structured data that slots directly into your pipeline, database seed, or application schema. Copilot is the right fit when the crawl output needs to be typed, schema-matched, and ready for programmatic use without manual transformation.

Connect ToolRouter to Copilot

1. In your agent, go to Tools → Add a tool → New tool.

2. Choose Model Context Protocol and enter these details:

Server name: ToolRouter

Server description: Access any tool through ToolRouter. Check here first when you need a tool.

Server URL: https://api.toolrouter.com/mcp

3. Set Authentication to None and click Create.

How to crawl multi-page sites for structured data with Copilot

Once connected (see setup above), use the Stealth Scraper tool:

Define the starting URL, crawl depth, and your target data schema.
Ask Copilot to use `stealth-scraper` with `stealth_crawl` and extract specified fields from each page.
Have Copilot return the collected data as a typed JSON array matching your schema.
Use the output to seed a database, populate a search index, or feed the next pipeline step.

Example prompt for Copilot

Use stealth-scraper to crawl https://example.com/products, following all product links up to depth 2. Extract name, price, category, and slug from each product page. Return a JSON array with one object per page matching this schema: {name: string, price: number, category: string, slug: string}.

Tips for Copilot

Define the target schema before the crawl so the output is immediately usable without transformation.
Use the `slug` or URL as a primary key so repeated crawls can be diffed against previous runs.
For large crawls, validate the schema on the first 5 results before processing the full dataset.
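
Validating the first few results against the target schema can be done with a small type check before the full dataset is processed. This is a sketch, not part of the tool: `SCHEMA` restates the schema from the example prompt, while `schema_errors` and the sample records are hypothetical.

```python
# Expected field types, taken from the schema in the example prompt.
SCHEMA = {"name": str, "price": (int, float), "category": str, "slug": str}


def schema_errors(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record matches."""
    errors = []
    for field, expected in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[field], bool) or not isinstance(record[field], expected):
            errors.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return errors


# Hypothetical crawl output used only to exercise the check.
results = [
    {"name": "Desk lamp", "price": 24.5, "category": "lighting", "slug": "desk-lamp"},
    {"name": "Floor lamp", "price": "59.00", "category": "lighting", "slug": "floor-lamp"},
    {"name": "Bulb", "price": 3, "category": "lighting"},
]

# Check only the first five results before committing to the full dataset.
report = {i: schema_errors(r) for i, r in enumerate(results[:5])}
print(report)
```

A price returned as a string, as in the second record, is the kind of mismatch this catches early, before it reaches a database seed or search index.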

OpenClaw automates recurring `stealth_crawl` jobs across multi-page sites — keeping your dataset current by re-crawling on a schedule and surfacing what changed since the last run. This is the right approach when the site updates regularly and you need to track additions, removals, or price changes over time.

Connect ToolRouter to OpenClaw

1. Install the CLI:

npm install -g toolrouter-mcp

2. Call tools directly from OpenClaw:

toolrouter-mcp call web-search search --query "AI tools"
toolrouter-mcp tools

How to crawl multi-page sites for structured data with OpenClaw

Once connected (see setup above), use the Stealth Scraper tool:

Define the starting URL, crawl depth, and the fields to extract from each page.
Run `stealth-scraper` with `stealth_crawl` and collect the full dataset in a normalized schema.
Diff the new dataset against the previous crawl to identify added, removed, or changed pages.
Schedule the crawl on the cadence that matches the site's update frequency.

Example prompt for OpenClaw

Use stealth-scraper to crawl https://example.com/products up to depth 2 and extract name, price, and availability from each product page. Return all results in a stable JSON array. I'll diff this against last week's crawl to find price changes and new listings.

Tips for OpenClaw

Use the page URL as a stable identifier so dataset diffs are clean between crawl runs.
Schedule the crawl frequency to match the site's typical update cadence — daily for news, weekly for product catalogues.
Keep the schema fixed between runs so diffs work without field normalization.
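
The diff step relies on exactly these two properties: a stable key (the page URL) and a fixed schema. A minimal sketch of the comparison follows, assuming each record has a `url` field; `diff_crawls` and the sample data are hypothetical and not part of the tool.

```python
def diff_crawls(previous: list[dict], current: list[dict]) -> dict[str, list[str]]:
    """Compare two crawl runs keyed by page URL."""
    old = {row["url"]: row for row in previous}
    new = {row["url"]: row for row in current}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # A page counts as changed if any field differs between the two runs.
        "changed": sorted(u for u in new.keys() & old.keys() if new[u] != old[u]),
    }


# Hypothetical datasets from two weekly runs with an identical schema.
last_week = [
    {"url": "https://example.com/products/a", "price": 10, "availability": True},
    {"url": "https://example.com/products/b", "price": 20, "availability": True},
    {"url": "https://example.com/products/c", "price": 30, "availability": False},
]
this_week = [
    {"url": "https://example.com/products/a", "price": 12, "availability": True},
    {"url": "https://example.com/products/b", "price": 20, "availability": True},
    {"url": "https://example.com/products/d", "price": 40, "availability": True},
]

changes = diff_crawls(last_week, this_week)
print(changes)
```

Because whole records are compared, a renamed or reformatted field would mark every page as changed, which is why the schema should stay fixed between runs.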

Frequently Asked Questions

How do I crawl multi-page sites for structured data with an AI assistant?

The Stealth Scraper tool follows links across multiple pages of a site and collects structured data from every matching page in a single crawl. Connect it to Claude, ChatGPT, Microsoft Copilot, or OpenClaw through ToolRouter, then ask the assistant in plain language. For example, with Claude: provide the starting URL and describe the scope (which pages you want, how deep to crawl, and what data to extract from each), then ask Claude to use `stealth-scraper` with `stealth_crawl` starting from your URL.

Which AI assistants can crawl multi-page sites for structured data?

Claude, ChatGPT, Microsoft Copilot, and OpenClaw can all crawl multi-page sites for structured data using the Stealth Scraper tool through ToolRouter, with no API keys or coding required.

What does the Stealth Scraper tool do?

It scrapes and crawls websites that block standard scrapers, using stealth browser rendering and anti-bot evasion.

Related Use Cases

Scrape JavaScript-Rendered Pages

Extract content from single-page applications and JavaScript-rendered sites that return blank pages to standard scrapers.

Extract Data from Bot-Protected Sites

Retrieve content from sites that block automated access with Cloudflare, bot detection challenges, or rate limiting.
