Extract page URLs from a sitemap file (<urlset>)

This robot pulls every page URL and its metadata (such as the last modified date) from any valid sitemap file that uses the <urlset> format.

What is a sitemap URL set?

A sitemap URL set is an XML file that websites use to list their publicly accessible pages. These files usually end in .xml or .xml.gz and are typically referenced in the site’s robots.txt file.

Example structure:

<urlset>
  <url>
    <loc>https://example.com/product/123</loc>
    <lastmod>2025-07-25</lastmod>
  </url>
</urlset>

This robot extracts every <loc> (URL) and, if available, the <lastmod> (last modified date).
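
For a sense of what that looks like in practice, here is a minimal Python sketch (purely illustrative, not how the robot works internally; the robot itself needs no code) that fetches a sitemap with the standard library and prints each <loc> together with its <lastmod>. The sitemap URL below is a placeholder.

# Illustrative sketch only: fetch a sitemap and pull every <loc> and
# optional <lastmod>, using just the Python standard library.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"  # placeholder URL
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP_URL) as resp:
    root = ET.fromstring(resp.read())

for url in root.findall("sm:url", NS):
    loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
    lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
    print(loc, lastmod or "(no lastmod)")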

What this robot extracts

From any valid sitemap file using <urlset> and <url> tags, the robot retrieves:

  • ✅ Page URLs (<loc>)
  • ✅ Last modified timestamps (<lastmod>, if present)
  • ✅ Clean, structured output you can send to Google Sheets, Airtable, or Zapier

Use cases

📈 Website change monitoring

Automatically detect when new pages are added or existing ones are updated—without crawling the full site.
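
To make the idea concrete, here is a small, purely illustrative Python sketch of the kind of diffing the robot does for you on a schedule: compare a previous snapshot of URLs and their <lastmod> values with a fresh one to find added and updated pages. The dictionaries below are made-up example data.

# Illustration of the monitoring idea (the robot handles scheduling and
# diffing for you): compare a previous {url: lastmod} snapshot with a
# fresh one to find newly added and updated pages.
def diff_sitemap(previous: dict, current: dict):
    added = [u for u in current if u not in previous]
    updated = [u for u in current
               if u in previous and current[u] != previous[u]]
    return added, updated

previous = {"https://example.com/product/123": "2025-07-20"}
current = {"https://example.com/product/123": "2025-07-25",
           "https://example.com/product/456": "2025-07-25"}

added, updated = diff_sitemap(previous, current)
print("Added:", added)      # ['https://example.com/product/456']
print("Updated:", updated)  # ['https://example.com/product/123']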

🔍 SEO audits & indexing checks

Export all indexed URLs to review crawlability, detect broken links, or clean up outdated content.

🕵️ Competitive research

Track which product, blog, or landing pages your competitors are publishing or updating over time.

🧠 Content planning

Map out published URLs and update patterns for your editorial calendar or content migration projects.

Why use this prebuilt robot?

  • ⚡ Instant Setup – Point it to any sitemap file and extract data in seconds
  • 🧩 No Coding Required – Works with .xml and .xml.gz files
  • 📤 Flexible Output – Export directly to Google Sheets, Airtable, or via Zapier
  • 🔄 Automated Monitoring – Schedule checks for newly added or updated pages

Frequently asked questions

🗂️ What file types are supported?

Any .xml or .xml.gz sitemap that uses <urlset> and <url> tags.
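
A .xml.gz sitemap is simply ordinary sitemap XML compressed with gzip, so it only needs to be decompressed before parsing. The robot handles this for you automatically; the short Python sketch below (placeholder URL) just illustrates the idea.

# Sketch only: a .xml.gz sitemap is regular sitemap XML wrapped in gzip,
# so decompressing the response body is enough before parsing it.
import gzip
import urllib.request
import xml.etree.ElementTree as ET

url = "https://example.com/sitemap.xml.gz"  # placeholder URL
with urllib.request.urlopen(url) as resp:
    xml_bytes = gzip.decompress(resp.read())

root = ET.fromstring(xml_bytes)
print(root.tag)  # ends with 'urlset' for a page sitemap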

🌐 Can I use this on any website?

Yes — as long as the sitemap is publicly accessible, you can extract from it.

🔔 Can I set up alerts for new URLs?

Absolutely. Use monitoring + Zapier to get notifications or auto-send new URLs to your internal tools.

📊 Can I chain this with other robots?

Yes! Use this output as input for deeper scraping robots to extract content from each page.
