
Sitemap extractor for URLs from XML sitemaps and sitemap index files

Extract every entry from XML sitemap files and sitemap index files - positions, full URLs, and last modified dates - to audit site architecture, discover content, and power SEO analysis.


What this robot does

XML sitemaps are the definitive map of a website's content. Every well-maintained site publishes a sitemap.xml (or a sitemap index that references multiple sub-sitemaps) listing every URL the site wants search engines to index. Sitemap index files take this further - they list multiple sitemaps, each potentially containing thousands of URLs, organized by content type or section.

For SEO professionals auditing site structure, competitive analysts mapping competitor content, and developers verifying deployment coverage, sitemaps are the fastest way to understand exactly what a site contains. This robot extracts the position, full URL (loc), and last modified date from sitemap XML files and sitemap index files into a flat, structured dataset.

What sitemap extraction delivers:

  • ✓ Complete URL inventory: A sitemap lists every page a site wants indexed. Extract every URL with its position and last modified date - no crawling required.
  • ✓ Content structure revelation: URL patterns in sitemaps reveal site architecture. Extract URLs and analyze path patterns to understand how a site organizes its content.
  • ✓ Freshness assessment: Last modified dates show which pages are actively maintained and which are stale. Extract dates to identify outdated content needing attention.
  • ✓ Indexation coverage analysis: Compare sitemap URLs against what's actually indexed in search engines. Gaps indicate pages that aren't being crawled or indexed properly.
Position | loc | lastmod
1 | https://www.adidas.de/glass/sitemaps/adidas/DE/en/sitemaps/adidas-DE-en-de-product.xml | 2026-02-25T16:41:49.894Z
2 | https://www.adidas.de/glass/sitemaps/adidas/DE/en/sitemaps/blog-pages-sitemap.xml | 2026-02-25T16:41:49.894Z
3 | https://www.adidas.de/glass/sitemaps/adidas/DE/en/sitemaps/plp-sitemap-1.xml | 2026-02-25T16:41:49.894Z
4 | https://www.adidas.de/glass/sitemaps/adidas/DE/en/sitemaps/plp-sitemap-2.xml | 2026-02-25T16:41:49.894Z
5 | https://www.adidas.de/glass/sitemaps/adidas/DE/en/sitemaps/plp-sitemap-3.xml | 2026-02-25T16:41:49.894Z
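
For readers who want to see how the three columns above map onto the XML, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the robot's actual implementation: the function name `extract_entries` and the example.com sample are made up for this example, and the sketch assumes the file uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to be used by the file).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_entries(xml_text):
    """Return one dict per <url> or <sitemap> entry, in document order.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    # A regular sitemap holds <url> children; a sitemap index holds <sitemap> children.
    entries = root.findall("sm:url", NS) + root.findall("sm:sitemap", NS)
    rows = []
    for position, entry in enumerate(entries, start=1):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = entry.findtext("sm:lastmod", default="", namespaces=NS).strip()
        # lastmod is optional in the protocol, so a missing value becomes None.
        rows.append({"position": position, "loc": loc, "lastmod": lastmod or None})
    return rows


# Made-up sample data, not taken from any real site.
SAMPLE = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemaps/products.xml</loc>
    <lastmod>2026-02-25T16:41:49.894Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemaps/blog.xml</loc>
  </sitemap>
</sitemapindex>"""

for row in extract_entries(SAMPLE):
    print(row["position"], row["loc"], row["lastmod"])
```

Position here is simply the 1-based order in which entries appear in the file, which matches how the Position field is described on this page.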

How to extract URLs from a sitemap in 4 steps

No special access needed. The robot reads publicly accessible XML sitemap files and delivers structured URL data.

  • A free Browse AI account (no credit card required).
  • A sitemap.xml or sitemap index URL (typically at domain.com/sitemap.xml).
1. Sign up for free
Create your Browse AI account in under a minute. No credit card required. You will find this prebuilt robot in the robot library ready to use.

2. Find and copy the sitemap URL
Most sites publish their sitemap at /sitemap.xml. Check the site's robots.txt file for the Sitemap: directive, which points to the exact URL. Copy the sitemap or sitemap index URL.

3. Run the robot
Click run. The robot reads the XML sitemap and extracts each entry's position, full URL (loc), and last modified date (lastmod).

4. Connect integrations or export your data
Your sitemap data is structured and ready. Export to Google Sheets for SEO auditing, sync to Airtable for content inventory management, or analyze URL patterns to map site architecture.
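
If you would rather script the robots.txt lookup from step 2 than do it by hand, the idea can be sketched in a few lines of Python. This is a rough illustration, not part of Browse AI: the function name `sitemap_urls_from_robots` is hypothetical, and it expects you to have already downloaded the robots.txt content as text.

```python
def sitemap_urls_from_robots(robots_txt):
    """Collect the URLs named by Sitemap: lines in robots.txt content.

    Hypothetical helper for illustration only.
    """
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments (this would also cut a URL containing "#").
        line = line.split("#", 1)[0].strip()
        # Split the field name from its value at the first colon.
        name, sep, value = line.partition(":")
        # Field names are matched case-insensitively.
        if sep and name.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Made-up robots.txt content for demonstration.
ROBOTS = """User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""

print(sitemap_urls_from_robots(ROBOTS))
```

A site can list more than one Sitemap: line, which is why the sketch returns a list rather than a single URL.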

Ready to get started?

Try this robot free →

What can you do with sitemap data?

Extracted sitemap data is foundational for SEO audits and competitive analysis:

  • SEO site audit: Extract your own sitemap to verify all important pages are included. Compare against actual indexed pages in Google Search Console to find gaps.
  • Competitor content mapping: Extract a competitor's sitemap to see every page they publish. Understand their content strategy, URL structure, and site scope without crawling.
  • Content freshness audit: Extract last modified dates across your sitemap. Identify stale pages that haven't been updated and may need content refreshes.
  • Migration planning: Before a site migration, extract the full sitemap. Use the URL list as your redirect mapping checklist to ensure no pages get lost.
  • Content volume benchmarking: Compare sitemap sizes across competitors. Page counts by section (blog, product, docs) reveal where competitors invest content effort.
  • Orphan page detection: Cross-reference sitemap URLs with internal link analysis. Pages in the sitemap but not linked from anywhere may have discoverability issues.
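
Several of the uses above (indexation gaps, migration checklists, orphan page detection) come down to comparing the sitemap's URL list against a second list. Here is a minimal Python sketch of that comparison. The function name `compare_url_sets` is hypothetical, and treating URLs that differ only by a trailing slash as the same page is an assumption that may not hold for every site.

```python
def compare_url_sets(sitemap_urls, other_urls):
    """Report URLs that appear in only one of the two lists.

    Hypothetical helper for illustration only.
    """
    def normalize(url):
        # Assumption: surrounding whitespace and a trailing slash are not significant.
        return url.strip().rstrip("/")

    in_sitemap = {normalize(u) for u in sitemap_urls}
    in_other = {normalize(u) for u in other_urls}
    return {
        # Listed in the sitemap but absent from the other source.
        "only_in_sitemap": sorted(in_sitemap - in_other),
        # Present in the other source but missing from the sitemap.
        "only_in_other": sorted(in_other - in_sitemap),
    }


# Made-up URL lists for demonstration.
sitemap = ["https://example.com/", "https://example.com/blog/post-1"]
indexed = ["https://example.com", "https://example.com/old-page"]
print(compare_url_sets(sitemap, indexed))
```

The second list could be an export of indexed pages, the URLs found by an internal link crawl, or a post-migration URL list, depending on which audit you are running.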
🔍
SEO professionals
Audit site structure using sitemap data. Verify indexation coverage, check for stale content, and map URL architecture efficiently.
📊
Competitive analysts
Map competitor websites by extracting their sitemaps. Understand site scope and content strategy from their published URL inventories.
🛠️
Web developers
Verify deployment coverage by comparing sitemaps against your CMS or build output. Ensure every generated page appears in the sitemap.
📝
Content strategists
Inventory existing content using sitemap data. Identify content gaps, plan updates based on last-modified dates, and track content volume over time.
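
As a rough sketch of planning updates from last-modified dates, the Python below filters extracted rows down to entries older than a chosen age. The names `parse_lastmod` and `stale_entries` are hypothetical, the rows are assumed to be dicts with `loc` and `lastmod` keys, and the parser assumes lastmod values that `datetime.fromisoformat` accepts once a trailing "Z" is rewritten.

```python
from datetime import datetime, timedelta, timezone


def parse_lastmod(value):
    """Parse a lastmod string into a timezone-aware datetime (illustrative helper)."""
    # Older Python versions reject a trailing "Z", so rewrite it as a UTC offset.
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: values without an offset (such as date-only values) are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def stale_entries(rows, max_age_days, now=None):
    """Return rows whose lastmod is older than max_age_days; rows without lastmod are skipped."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        row for row in rows
        if row.get("lastmod") and parse_lastmod(row["lastmod"]) < cutoff
    ]


# Made-up rows for demonstration.
rows = [
    {"loc": "https://example.com/pricing", "lastmod": "2026-02-25T16:41:49.894Z"},
    {"loc": "https://example.com/blog/old-post", "lastmod": "2024-01-15"},
]
for row in stale_entries(rows, max_age_days=365):
    print(row["loc"], row["lastmod"])
```

Keep in mind that lastmod is supplied by the site itself, so it reflects what the publisher reports rather than a verified change date.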

What data does this sitemap extractor pull?

Each sitemap entry provides three structured fields:

Field | What it contains
Position | The numeric position of the URL within the sitemap file, reflecting the order entries appear in the XML.
loc | The full URL listed in the sitemap. For sitemap index files, this is the URL of each sub-sitemap; for regular sitemaps, it is the page URL.
lastmod | The last modified timestamp for the entry, showing when the URL or sub-sitemap was last updated (ISO 8601 format).

Sitemaps provide the URL inventory. For page-level data like titles, meta descriptions, and content, combine sitemap extraction with individual page scraping.

Frequently asked questions

Where do I find a site's sitemap?
Most sites use /sitemap.xml. If that doesn't work, check the site's robots.txt file - it typically contains a Sitemap: directive with the exact URL.

What's a sitemap index vs. a regular sitemap?
A sitemap index file lists multiple sub-sitemaps. The robot handles both - extracting the position, loc (URL), and lastmod for each entry.
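
If you are working with the raw XML yourself, the two file types can be told apart by their root element: `sitemapindex` for an index and `urlset` for a regular sitemap, as defined by the sitemaps.org protocol. A minimal Python sketch follows; the function name `sitemap_kind` is hypothetical and this is not a description of how the robot detects the type.

```python
import xml.etree.ElementTree as ET


def sitemap_kind(xml_text):
    """Return "index", "urlset", or "unknown" based on the root element (illustrative helper)."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace}name"; keep only the name.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "sitemapindex":
        return "index"
    if local_name == "urlset":
        return "urlset"
    return "unknown"


# Made-up minimal documents for demonstration.
INDEX_XML = '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></sitemapindex>'
URLSET_XML = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>'
print(sitemap_kind(INDEX_XML), sitemap_kind(URLSET_XML))
```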

Do all websites have sitemaps?
Most well-maintained websites publish XML sitemaps. Some smaller or older sites may not. If no sitemap exists, you'll need a crawler to discover pages.

Can I use this for large sitemaps with thousands of URLs?
Yes. Sitemaps can contain up to 50,000 URLs each. The robot extracts the position, URL, and last modified date for all listed entries regardless of sitemap size.

Is this sitemap extractor free?
Browse AI's free plan includes credits to run this robot. No credit card required.

Sitemaps show what a site publishes. Combine them with page-level and search data for full SEO intelligence.

Extract every URL from any sitemap

URLs, positions, and last modified dates - complete site inventories from XML sitemaps.

Use this automation