This robot pulls all page URLs and metadata (such as the last modified date) from any valid sitemap file containing `<urlset>` tags.
A Sitemap URL Set is a type of XML file that websites use to list their publicly accessible pages. These files often end in `.xml` or `.xml.gz` and are typically referenced in the site's `robots.txt`.
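If you want to locate a site's sitemaps yourself, the `robots.txt` convention makes that easy to script. Below is a minimal Python sketch using only the standard library; `https://example.com` is a placeholder, and `find_sitemaps` is an illustrative helper, not part of the robot:

```python
# Sketch: discover sitemap URLs declared in a site's robots.txt.
from urllib.request import urlopen

def find_sitemaps(base_url: str) -> list[str]:
    """Return every URL declared on a 'Sitemap:' line in robots.txt."""
    with urlopen(f"{base_url}/robots.txt") as resp:
        text = resp.read().decode("utf-8", errors="replace")
    return [
        line.split(":", 1)[1].strip()
        for line in text.splitlines()
        if line.lower().startswith("sitemap:")
    ]

print(find_sitemaps("https://example.com"))  # e.g. ['https://example.com/sitemap.xml']
```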
Example structure:

```xml
<urlset>
  <url>
    <loc>https://example.com/product/123</loc>
    <lastmod>2025-07-25</lastmod>
  </url>
</urlset>
```
This robot extracts every `<loc>` (URL) and, if available, the `<lastmod>` (last modified date).
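For reference, the extraction itself is straightforward to reproduce. The sketch below is a standard-library Python approximation, not the robot's actual implementation; `parse_sitemap` and `local` are illustrative names:

```python
# Sketch: pull (loc, lastmod) pairs out of a <urlset> sitemap.
import xml.etree.ElementTree as ET
from urllib.request import urlopen

def local(tag: str) -> str:
    """Drop any XML namespace, e.g. '{http://...}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]

def parse_sitemap(url: str) -> list[tuple[str, str | None]]:
    """Return (loc, lastmod) for every <url> entry; lastmod may be None."""
    with urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    pages = []
    for el in root.iter():
        if local(el.tag) != "url":
            continue
        loc = lastmod = None
        for child in el:
            if local(child.tag) == "loc":
                loc = (child.text or "").strip()
            elif local(child.tag) == "lastmod":
                lastmod = (child.text or "").strip()
        if loc:
            pages.append((loc, lastmod))
    return pages
```

Matching on local tag names keeps the sketch working whether or not the sitemap declares the usual `http://www.sitemaps.org/schemas/sitemap/0.9` namespace, which the simplified example above omits.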
From any valid sitemap file using `<urlset>` and `<url>` tags, the robot retrieves:

- `<loc>` (page URLs)
- `<lastmod>` (last modified dates, if present)

📈 Website change monitoring
Automatically detect when new pages are added or existing ones are updated—without crawling the full site.
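One way to implement this pattern is to keep the previous run's results and diff them against the current run. The sketch below assumes `{url: lastmod}` dicts, such as those built from the `parse_sitemap` sketch earlier:

```python
# Sketch: diff two sitemap snapshots to find new, updated, and removed pages.
def diff_snapshots(old: dict[str, str | None], new: dict[str, str | None]):
    added = [u for u in new if u not in old]
    updated = [u for u in new if u in old and new[u] != old[u]]
    removed = [u for u in old if u not in new]
    return added, updated, removed

old = {"https://example.com/a": "2025-07-01"}
new = {"https://example.com/a": "2025-07-25", "https://example.com/b": "2025-07-25"}
print(diff_snapshots(old, new))
# (['https://example.com/b'], ['https://example.com/a'], [])
```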
🔍 SEO audits & indexing checks
Export all indexed URLs to review crawlability, detect broken links, or clean up outdated content.
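A broken-link pass over an exported URL list can be as simple as the sketch below, which sends a `HEAD` request per URL (note that some servers reject `HEAD`, so a `GET` fallback may be needed in practice):

```python
# Sketch: flag exported URLs that no longer answer with HTTP 200.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def check_urls(urls: list[str]) -> list[tuple[str, object]]:
    """Return (url, status or error) for every URL that didn't return 200."""
    broken = []
    for url in urls:
        try:
            with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
                if resp.status == 200:
                    continue
                status = resp.status
        except HTTPError as e:
            status = e.code          # e.g. 404, 410
        except URLError as e:
            status = str(e.reason)   # e.g. DNS failure, timeout
        broken.append((url, status))
    return broken
```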
🕵️ Competitive research
Track which product, blog, or landing pages your competitors are publishing or updating over time.
🧠 Content planning
Map out published URLs and update patterns for your editorial calendar or content migration projects.
Supported formats: `.xml` and `.xml.gz`. Any sitemap in either format that uses `<urlset>` and `<url>` tags will work.
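Because gzip files start with a fixed two-byte signature, handling both formats in code takes only a small check. A minimal sketch (the helper name is illustrative):

```python
# Sketch: read a sitemap transparently, whether plain .xml or gzipped .xml.gz.
import gzip
from urllib.request import urlopen

def read_sitemap_bytes(url: str) -> bytes:
    """Fetch a sitemap and decompress it if it is a gzip file."""
    with urlopen(url) as resp:
        data = resp.read()
    if data[:2] == b"\x1f\x8b":  # gzip magic bytes
        data = gzip.decompress(data)
    return data
```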
**Does it work on any website?** Yes, as long as the sitemap is publicly accessible, you can extract from it.

**Can I get notified when new URLs appear?** Absolutely. Use monitoring + Zapier to get notifications or auto-send new URLs to your internal tools.

**Can I scrape the pages themselves?** Yes! Use this output as input for deeper scraping robots to extract content from each page.