This robot pulls every page URL and its metadata (like the last modified date) from any valid sitemap file containing <urlset> tags.
A Sitemap URL Set is a type of XML file that websites use to list their publicly accessible pages. These files often end in .xml or .xml.gz and are typically referenced in the site’s robots.txt.
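For illustration, a minimal Python sketch (not part of the robot) that discovers sitemap URLs from a site's robots.txt; example.com is a placeholder:

import urllib.request

# Fetch robots.txt and collect its "Sitemap:" directives (placeholder domain).
robots_url = "https://example.com/robots.txt"
with urllib.request.urlopen(robots_url) as resp:
    robots_txt = resp.read().decode("utf-8", errors="replace")

sitemap_urls = [
    line.split(":", 1)[1].strip()
    for line in robots_txt.splitlines()
    if line.lower().startswith("sitemap:")
]
print(sitemap_urls)  # e.g. ["https://example.com/sitemap.xml"]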
Example structure:
<urlset>
  <url>
    <loc>https://example.com/product/123</loc>
    <lastmod>2025-07-25</lastmod>
  </url>
</urlset>
This robot extracts every <loc> (URL) and, if available, the <lastmod> (last modified date).
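Here is a minimal sketch of the same extraction using Python's standard library, assuming the sitemap XML is already loaded as a string; the {*} wildcard (Python 3.8+) matches tags with or without the standard sitemap namespace:

import xml.etree.ElementTree as ET

def extract_entries(xml_text):
    # Collect each <url>'s <loc> and optional <lastmod>.
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall(".//{*}url"):
        loc = url.findtext("{*}loc", default="").strip()
        lastmod = url.findtext("{*}lastmod")
        if loc:
            entries.append({"loc": loc, "lastmod": lastmod.strip() if lastmod else None})
    return entries

Running it on the example above yields [{'loc': 'https://example.com/product/123', 'lastmod': '2025-07-25'}].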
From any valid sitemap file using <urlset> and <url> tags, the robot retrieves:
- <loc> (the page URL)
- <lastmod> (the last modified date, if present)

📈 Website change monitoring
Automatically detect when new pages are added or existing ones are updated, without crawling the full site (see the sketch after these use cases).
🔍 SEO audits & indexing checks
Export every listed URL to review crawlability, detect broken links, or clean up outdated content.
🕵️ Competitive research
Track which product, blog, or landing pages your competitors are publishing or updating over time.
🧠 Content planning
Map out published URLs and update patterns for your editorial calendar or content migration projects.
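To illustrate the change-monitoring use case above, a hypothetical sketch that diffs two extraction runs; the snapshot format (a mapping from <loc> to <lastmod>) is an assumption for illustration, not part of the robot:

def diff_snapshots(old, new):
    # old/new map each URL (<loc>) to its <lastmod> value, or None.
    added = [url for url in new if url not in old]
    updated = [url for url in new if url in old and new[url] != old[url]]
    return added, updated

yesterday = {"https://example.com/product/123": "2025-07-24"}
today = {"https://example.com/product/123": "2025-07-25",
         "https://example.com/product/456": None}
print(diff_snapshots(yesterday, today))
# (['https://example.com/product/456'], ['https://example.com/product/123'])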
Which file types are supported?
Any .xml or .xml.gz sitemap that uses <urlset> and <url> tags.
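For .xml.gz files, a minimal sketch of transparent decompression in Python, assuming a placeholder URL:

import gzip
import urllib.request

url = "https://example.com/sitemap.xml.gz"  # placeholder
with urllib.request.urlopen(url) as resp:
    raw = resp.read()
# Decompress only if the payload starts with the gzip magic bytes.
xml_text = (gzip.decompress(raw) if raw[:2] == b"\x1f\x8b" else raw).decode("utf-8")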
Can I extract from any website?
Yes. As long as the sitemap is publicly accessible, you can extract from it.
Can I get notified when new URLs appear?
Absolutely. Use monitoring plus Zapier to get notifications or automatically send new URLs to your internal tools.
Can I also scrape the content of each page?
Yes! Use this robot's output as input for deeper scraping robots that extract content from each page.