Turn any sitemap into a structured database of page URLs with last-modified dates. This robot transforms standard sitemap files into clean data for SEO analysis, content monitoring, and systematic web scraping workflows, with no coding required.
Simply provide the sitemap URL, and this robot lets you:
✓ Extract complete page inventories for content audits and SEO analysis.
✓ Monitor content freshness through last-modified timestamps.
✓ Build comprehensive URL lists for systematic web scraping.
✓ Track competitor publishing velocity and content strategies.
A Sitemap URL Set is a type of XML file that websites use to list their publicly accessible pages. These files often end in .xml or .xml.gz and are typically referenced in the site’s robots.txt.
Example structure:
<urlset>
  <url>
    <loc>https://example.com/product/123</loc>
    <lastmod>2025-07-25</lastmod>
  </url>
</urlset>
This robot extracts every <loc> (URL) and, if available, the <lastmod> (last modified date).
It instantly converts that raw XML into a clean spreadsheet of URLs and dates - ready for whatever you need to do next.
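For readers curious about the underlying logic, here is a minimal Python sketch of the same extraction (the robot itself requires no code, and the sitemap URL in the usage comment is a placeholder). Note that real sitemaps declare an XML namespace on <urlset>, which the simplified example above omits:

import urllib.request
import xml.etree.ElementTree as ET

# The sitemap protocol places <urlset> in this XML namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_urls(sitemap_url: str) -> list[dict]:
    """Return one record per <url> entry: its <loc> and optional <lastmod>."""
    with urllib.request.urlopen(sitemap_url) as resp:
        root = ET.fromstring(resp.read())
    records = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        records.append({"url": loc.strip(), "lastmod": lastmod})
    return records

# Usage (placeholder URL):
# for rec in extract_urls("https://example.com/sitemap.xml"):
#     print(rec["url"], rec["lastmod"])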
To extract URL data, you only need:
Sitemap URL: The direct link to the sitemap file (usually /sitemap.xml, or listed in the site's robots.txt)
The robot extracts all available metadata:
<loc> tags (page URLs)
<lastmod> tags (when available)
It automatically handles both standard and compressed (.xml.gz) formats.
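As a rough illustration of how compressed sitemaps can be handled in a script (gzip payloads are recognizable by their first two magic bytes), a sketch:

import gzip
import urllib.request

def fetch_sitemap_bytes(sitemap_url: str) -> bytes:
    """Fetch a sitemap, transparently decompressing .xml.gz payloads."""
    with urllib.request.urlopen(sitemap_url) as resp:
        raw = resp.read()
    # Gzip streams always start with the magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        return gzip.decompress(raw)
    return raw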
SEO professionals: Audit complete website content coverage. Compare sitemap URLs against indexed pages to identify gaps.
Content strategists: Track competitor publishing patterns and content velocity. Identify which sections get updated most frequently.
Web scraping specialists: Generate comprehensive URL lists for systematic data extraction. Build incremental scraping workflows based on last-modified dates (see the sketch after this list).
Digital marketers: Monitor competitor content strategies through URL patterns. Track seasonal content changes and campaign launches.
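For the incremental-scraping case, here is a minimal sketch, assuming records shaped like the url/lastmod dictionaries from the earlier extraction sketch:

from datetime import date, datetime, timedelta

def changed_since(records: list[dict], cutoff: date) -> list[str]:
    """Keep URLs whose <lastmod> falls on or after the cutoff date."""
    fresh = []
    for rec in records:
        lastmod = rec.get("lastmod")
        if lastmod is None:
            # No freshness information: keep the URL to be safe.
            fresh.append(rec["url"])
        # <lastmod> may be a plain date ("2025-07-25") or a full W3C
        # datetime; the first 10 characters are the date in both cases.
        elif datetime.strptime(lastmod[:10], "%Y-%m-%d").date() >= cutoff:
            fresh.append(rec["url"])
    return fresh

# Usage: re-scrape only pages modified in the last 7 days.
# fresh_urls = changed_since(records, date.today() - timedelta(days=7))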
How many URLs can this robot extract?
The robot handles sitemaps of any size, from small sites with dozens of pages to enterprise sites with hundreds of thousands of URLs.
What sitemap formats are supported?
Standard XML sitemaps (.xml) and compressed versions (.xml.gz) that follow the sitemap protocol. Most CMSs generate compatible formats.
Can I filter URLs during extraction?
The robot extracts all URLs in the sitemap. Filter results afterward by URL pattern in connected tools like Google Sheets or Airtable, or in a short script (see the sketch below).
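If you prefer to filter in a script, URL-pattern filtering is only a few lines; a sketch (the /blog/ pattern is a hypothetical example):

import re

def filter_urls(urls: list[str], pattern: str) -> list[str]:
    """Keep only URLs matching a regular-expression pattern."""
    compiled = re.compile(pattern)
    return [u for u in urls if compiled.search(u)]

# Usage: keep only blog-post URLs (hypothetical URL structure).
# blog_urls = filter_urls(urls, r"/blog/")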
How do I chain this with the sitemap index extractor?
First use the sitemap index extractor to get all sitemap files, then run this robot on each sitemap to get all page URLs.
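In code terms, the chain looks like the sketch below, assuming the index follows the standard <sitemapindex> format and reusing extract_urls from the earlier sketch (the index URL is a placeholder):

import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemaps_in_index(index_url: str) -> list[str]:
    """Return the <loc> of every child sitemap listed in a sitemap index."""
    with urllib.request.urlopen(index_url) as resp:
        root = ET.fromstring(resp.read())
    return [loc.text.strip()
            for loc in root.findall("sm:sitemap/sm:loc", NS)
            if loc.text]

# Usage (placeholder URL):
# for sitemap_url in sitemaps_in_index("https://example.com/sitemap_index.xml"):
#     records = extract_urls(sitemap_url)  # from the earlier sketch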
Connect this robot to:
Extract sitemap URLs from index - Start here if the website uses a sitemap index to organize multiple sitemaps. Get all sitemap files first, then extract URLs from each.
Extract text from any webpage - Feed your extracted URLs into this robot to scrape actual page content at scale.
Extract HTML and screenshot - Combine with URL lists to archive both code and visual appearance of pages.
Monitor Google search results - Compare sitemap URLs against actual search rankings to identify indexation issues.