Get URLs from Sitemap Block

Overview

The Get URLs from Sitemap block extracts URLs from a website’s XML sitemap. Use it to discover all pages on a website, filter for specific sections, and feed URLs into other blocks for bulk scraping, analysis, or content processing.

Configuration

Sitemap URL

Enter the URL of the sitemap to parse. Sitemaps are typically located at /sitemap.xml.

Common sitemap locations:

https://example.com/sitemap.xml
https://example.com/sitemap_index.xml
https://example.com/post-sitemap.xml

This field supports placeholders for dynamic sitemap URLs:

https://{{step_1.output.domain}}/sitemap.xml

Many websites list their sitemap location in the robots.txt file at https://example.com/robots.txt. Check there if you can’t find the sitemap.
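If you want to check robots.txt yourself before configuring the block, the lookup can be sketched in a few lines of Python. This is a minimal illustration and not part of the block itself: the function name `sitemaps_from_robots` and the sample robots.txt content are made up for the example, and the code works on text you have already downloaded.

```python
def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only,
        # because the URL itself contains colons.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


# Hypothetical robots.txt content, for illustration only.
sample_robots = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap_index.xml
sitemap: https://example.com/post-sitemap.xml
"""

for url in sitemaps_from_robots(sample_robots):
    print(url)
```

Any URL this returns can be pasted into the Sitemap URL field.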

Maximum Number of Links

Limit how many URLs are returned from the sitemap.

Default: 100 URLs
Minimum: 1 URL

Set a lower limit when:

Testing your workflow before running at scale
Processing only a sample of pages
Staying within credit budgets for subsequent scraping

Include Only Links That Contain

Filter URLs to include only those matching specific patterns. Enter one or more text patterns separated by commas.

How it works:

URLs must contain at least one of the specified patterns (OR logic)
Matching is case-sensitive
Partial matches work (e.g., /blog/ matches /blog/post-title)

Examples:

Pattern	Matches
`/blog/`	All blog post URLs
`/products/`	All product pages
`/blog/, /news/`	Blog posts OR news articles
`/2024/`	Pages with 2024 in the URL
`/category/seo/`	SEO category pages only

This field supports placeholders:

/category/{{step_1.output.category}}/
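The matching rules for this field (comma-separated patterns, OR logic, case-sensitive substring matching) can be approximated in Python as follows. This is a sketch of the documented behavior, not the block's actual implementation. The function names are hypothetical, and trimming whitespace around each pattern is an assumption based on the `/blog/, /news/` example.

```python
def parse_patterns(field: str) -> list[str]:
    """Split a comma-separated filter field into non-empty patterns.

    Assumption: whitespace around each pattern is ignored.
    """
    return [p.strip() for p in field.split(",") if p.strip()]


def include_only(urls: list[str], field: str) -> list[str]:
    """Keep URLs that contain at least one pattern (case-sensitive)."""
    patterns = parse_patterns(field)
    if not patterns:
        # An empty field means no include filter is set.
        return list(urls)
    return [u for u in urls if any(p in u for p in patterns)]


# Hypothetical URLs, for illustration only.
sample_urls = [
    "https://example.com/blog/post-title",
    "https://example.com/news/launch",
    "https://example.com/products/widget",
]

print(include_only(sample_urls, "/blog/, /news/"))
```

Because matching is case-sensitive, a pattern such as `/Blog/` would not match `/blog/post-title`.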

Exclude Links That Contain

Filter out URLs containing specific patterns. Enter one or more text patterns separated by commas.

How it works:

URLs containing any of the patterns are removed (OR logic)
Applied after the include filter
Useful for removing unwanted page types

Examples:

Pattern	Excludes
`/tag/`	Tag archive pages
`/author/`	Author pages
`/page/`	Pagination pages
`/amp/, /feed/`	AMP pages and RSS feeds
`/admin/, /api/`	Admin and API endpoints

Output

The block returns an array of URL strings extracted from the sitemap.

Output Example

[
  "https://example.com/blog/how-to-improve-seo",
  "https://example.com/blog/content-marketing-guide",
  "https://example.com/blog/keyword-research-tips",
  "https://example.com/blog/link-building-strategies"
]
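To see where these URLs come from, it can help to look at the sitemap format itself: a standard XML sitemap wraps each page in a `<url>` element with a `<loc>` child, under the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace. The Python sketch below pulls the `<loc>` values out of a sitemap you have already downloaded. It illustrates the format only; the function name and sample XML are hypothetical, and the block's own parsing may differ.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap_xml(xml_text: str) -> list[str]:
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content, for illustration only.
sample_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/blog/how-to-improve-seo</loc></url>"
    "<url><loc>https://example.com/blog/content-marketing-guide</loc></url>"
    "</urlset>"
)

print(urls_from_sitemap_xml(sample_sitemap))
```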

Accessing URLs

Get all URLs:

{{step_n.output}}

Get first URL:

{{step_n.output[0]}}

Get URL count:

{{step_n.output | size}}

Loop through URLs: Use with the Loop block to process each URL individually.

Combining Filters

Include and exclude filters work together:

First, the include filter is applied (if set)
Then, the exclude filter removes unwanted URLs
Finally, the limit is applied

Example:

Sitemap: https://example.com/sitemap.xml
Include: /blog/
Exclude: /tag/, /author/
Limit: 50

Result: Up to 50 blog post URLs, excluding tag and author pages.
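The order of operations matters: because the limit is applied last, it counts only URLs that survived both filters. The following Python sketch mirrors the documented order under stated assumptions. The function name `filter_urls` and the sample URLs are hypothetical, whitespace around patterns is assumed to be ignored, and the default limit of 100 is taken from the Maximum Number of Links setting.

```python
def filter_urls(urls, include="", exclude="", limit=100):
    """Apply the include filter, then the exclude filter, then the limit."""
    if limit < 1:
        raise ValueError("limit must be at least 1")

    # Assumption: patterns are comma-separated and surrounding spaces are ignored.
    include_patterns = [p.strip() for p in include.split(",") if p.strip()]
    exclude_patterns = [p.strip() for p in exclude.split(",") if p.strip()]

    # Step 1: keep URLs containing at least one include pattern (if any are set).
    kept = [
        u for u in urls
        if not include_patterns or any(p in u for p in include_patterns)
    ]
    # Step 2: drop URLs containing any exclude pattern.
    kept = [u for u in kept if not any(p in u for p in exclude_patterns)]
    # Step 3: truncate to the limit.
    return kept[:limit]


# Hypothetical sitemap URLs, for illustration only.
sample_urls = [
    "https://example.com/blog/tag/seo",
    "https://example.com/blog/author/jane",
    "https://example.com/products/widget",
    "https://example.com/blog/how-to-improve-seo",
    "https://example.com/blog/keyword-research-tips",
]

print(filter_urls(sample_urls, include="/blog/", exclude="/tag/, /author/", limit=50))
```

With `limit=1`, this sketch returns the first blog post rather than the tag page, since the tag page is removed before the limit is counted.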

Best Practices

Start with a small limit when testing workflows
Use include filters to target specific content types
Exclude pagination, tags, and archives for cleaner results
Check the sitemap structure first to understand URL patterns
Combine with Loop and Web Scrape blocks for bulk content extraction
Some sites have multiple sitemaps; check the sitemap index
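A sitemap index is a sitemap that lists other sitemaps instead of pages: its root element is `<sitemapindex>` and each entry is a `<sitemap>` with a `<loc>` child. If you are unsure which kind of file you are pointing the block at, a check like the Python sketch below can tell them apart. The function name and sample XML are hypothetical, the code works on XML you have already downloaded, and it makes no claim about how the block itself treats index files.

```python
import xml.etree.ElementTree as ET

# Namespace prefix in ElementTree's "{uri}tag" notation.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def classify_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text)
    if root.tag == NS + "sitemapindex":
        kind, path = "index", f"{NS}sitemap/{NS}loc"
    else:
        kind, path = "urlset", f"{NS}url/{NS}loc"
    return kind, [loc.text.strip() for loc in root.findall(path) if loc.text]


# Hypothetical sitemap index, for illustration only.
sample_index = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>"
    "<sitemap><loc>https://example.com/page-sitemap.xml</loc></sitemap>"
    "</sitemapindex>"
)

kind, locations = classify_sitemap(sample_index)
print(kind, locations)
```

When the result is an index, each listed location is a candidate value for the Sitemap URL field.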

Common Use Cases

Use Case	Configuration Tips
Blog content audit	Include `/blog/`, exclude `/tag/, /author/, /page/`
Product catalog extraction	Include `/products/` or `/shop/`
Competitor page discovery	Set a high limit, filter by content sections
Content migration	Extract all URLs, scrape content from each
SEO analysis	Get all URLs, analyze with LLM for optimization opportunities
Broken link checking	Extract URLs, use Call API to check status codes

Example Workflow: Bulk Content Analysis

Analyze all blog posts from a competitor:

Get URLs from Sitemap Block:
- Sitemap URL: https://competitor.com/sitemap.xml
- Include: /blog/
- Exclude: /tag/, /category/, /author/
- Limit: 100
Loop Block: Iterate through each URL
Web Scrape Block:
- URL: {{current}}
- Format: Markdown
- Only Main Content: On
LLM Block: Analyze content themes and structure
Google Sheets Block: Store analysis results

Example Workflow: Site Inventory

Create a complete inventory of a website’s pages:

Get URLs from Sitemap Block:
- Sitemap URL: https://yoursite.com/sitemap.xml
- Limit: 500
Loop Block: Process each URL
Web Scrape Block:
- URL: {{current}}
- Format: Markdown
- Include Metadata: On
Google Sheets Block: Append URL, title, and description

Troubleshooting

Issue	Cause	Solution
No URLs returned	Invalid sitemap URL	Verify the sitemap exists and is accessible
Empty results	Filters too restrictive	Broaden include patterns or remove exclude patterns
Missing pages	Incomplete sitemap	Check whether the site has multiple sitemaps
Wrong pages	Incorrect filter pattern	Test patterns against actual sitemap URLs
Timeout	Very large sitemap	Reduce limit to process fewer URLs

What’s Next

Now that you understand the Get URLs from Sitemap block:

Learn about the Web Scrape Block to extract content from discovered URLs
See the Loop Block to process multiple URLs
Explore the Google Sheets Block to store URL lists
Use the Call API Block to check URL status codes
