How to track competitor websites at scale: the sitemap extraction method that's saving me 40+ hours per week

Mel Shires
July 17, 2025

Monitor millions of pages automatically with Browse AI's new sitemap robots

As a Head of Marketing, I've spent years trying to crack the competitive intelligence code. In previous roles, I hired interns to manually check competitor sites (they caught maybe 10% of changes). I paid agencies $5,000+ monthly for "intelligence reports" that were outdated before they hit my inbox. I even convinced engineering to build custom scrapers that broke every time a competitor changed their font.

The frustration was real. I knew tracking competitors was critical, but we were always three steps behind.

Now at Browse AI, I've finally built the competitive monitoring system I always dreamed of. Using our new sitemap extraction robots that can process up to 50,000 URLs per task, my team tracks every move our competitors make - automatically.

Let me show you exactly how we do it.

What is sitemap extraction and why does it matter?

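A sitemap is like a blueprint of a website - it lists the pages a site wants search engines to find and, optionally, when each page was last modified and how often it tends to change. Large sites often publish sitemaps with thousands or even millions of URLs, usually split across multiple files because the sitemap protocol caps each file at 50,000 URLs.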

Our new sitemap extraction robots turn these blueprints into actionable competitive intelligence by:

  • Extracting every URL listed in a competitor's sitemap
  • Identifying which pages changed and when
  • Enabling targeted monitoring of only updated content
  • Scaling to millions of pages without breaking a sweat
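To make the blueprint idea concrete, here's a minimal sketch of what reading a sitemap looks like in plain Python. It only uses the standard library and the public sitemaps.org XML format. The sample XML, the competitor.example domain, and the function name parse_urlset are my own illustrations, not part of any Browse AI robot.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample; a real sitemap would be downloaded from the site.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://competitor.example/pricing</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://competitor.example/features</loc>
  </url>
</urlset>
"""


def parse_urlset(xml_text):
    """Return one dict per <url> entry with its loc and optional lastmod."""
    root = ET.fromstring(xml_text)
    rows = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        if loc:
            rows.append({"loc": loc, "lastmod": lastmod.strip() if lastmod else None})
    return rows


for row in parse_urlset(SAMPLE_SITEMAP):
    print(row["loc"], row["lastmod"])
```

Note that lastmod is optional in the protocol, so the sketch returns None when a site leaves it out.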

The complete setup guide: from sitemap to full competitive intelligence

Step 1: Find the competitor's sitemap

Most websites follow standard conventions:

  • Try: competitor.com/sitemap.xml
  • Or: competitor.com/sitemap_index.xml
  • Check: competitor.com/robots.txt (often lists sitemap location)
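If you'd rather script this check, the sketch below builds the two conventional sitemap URLs and pulls any "Sitemap:" lines out of a robots.txt file you've already downloaded. The helper names and the competitor.example domain are hypothetical, and the code assumes the site is served over https.

```python
def candidate_sitemap_urls(domain):
    """Build the conventional sitemap locations for a domain (assumes https)."""
    base = f"https://{domain.strip('/')}"
    return [f"{base}/sitemap.xml", f"{base}/sitemap_index.xml"]


def sitemaps_from_robots(robots_txt):
    """Collect the URLs from 'Sitemap:' lines in robots.txt content."""
    found = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


# Hypothetical robots.txt content for illustration.
robots = """User-agent: *
Disallow: /admin
Sitemap: https://competitor.example/sitemap_index.xml
"""
print(candidate_sitemap_urls("competitor.example"))
print(sitemaps_from_robots(robots))
```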

Pro method for hidden sitemaps:

  1. Use our Google search results robot
  2. Search: site:competitor.com filetype:xml sitemap
  3. Extract all XML files to find non-standard locations

Step 2: Choose and run the right sitemap robot

Browse AI offers two sitemap robots depending on the structure:

For sites with multiple sitemaps (like Amazon or large news sites): use the index extractor.

For single sitemap files: use the sitemap URL extractor.

Not sure which to use? Start with the index extractor - it'll show you if there are multiple sitemaps or just one.
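You can also tell the two structures apart by looking at the XML itself. In the sitemaps.org format, an index file has a sitemapindex root element that points to child sitemaps, while a single sitemap has a urlset root. Here's a rough sketch of that check; the function names and sample XML are my own, and this is not how the Browse AI robots are implemented.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap index for illustration.
SAMPLE_INDEX = """\
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://competitor.example/sitemap-blog.xml</loc></sitemap>
  <sitemap><loc>https://competitor.example/sitemap-products.xml</loc></sitemap>
</sitemapindex>
"""


def sitemap_kind(xml_text):
    """Return 'index' for a sitemap index file, 'urlset' for a single sitemap."""
    root = ET.fromstring(xml_text)
    # Tags look like '{namespace}sitemapindex'; keep only the local name.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "sitemapindex":
        return "index"
    if local_name == "urlset":
        return "urlset"
    raise ValueError(f"Not a sitemap document: <{local_name}>")


def child_sitemaps(xml_text):
    """List the child sitemap URLs referenced by a sitemap index."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS) if loc.text]


print(sitemap_kind(SAMPLE_INDEX))
print(child_sitemaps(SAMPLE_INDEX))
```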

Step 3: Set up monitoring for live updates

Once you run the sitemap robot:

  1. Click "Monitor" on your task
  2. Set frequency (I use weekly for competitors)
  3. Enable notifications for changes

What the sitemap monitor captures:

  • All URLs in the sitemap
  • Last modified dates for each URL
  • New pages added since last run
  • Pages removed since last run

Your monitor creates a live database that updates automatically.
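Conceptually, the comparison between two runs looks something like the sketch below. It assumes each run is stored as a simple mapping from URL to last modified date, which is my own simplification rather than Browse AI's internal format.

```python
def diff_snapshots(previous, current):
    """Compare two {url: lastmod} snapshots and report what changed."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    # A page counts as modified when it exists in both runs
    # but its last modified value differs.
    modified = sorted(
        url for url in current.keys() & previous.keys()
        if current[url] != previous[url]
    )
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical data from two weekly runs.
last_week = {
    "competitor.example/pricing": "2024-01-08",
    "competitor.example/features": "2024-01-14",
    "competitor.example/old-promo": "2023-11-01",
}
this_week = {
    "competitor.example/pricing": "2024-01-15",
    "competitor.example/features": "2024-01-14",
    "competitor.example/new-product": "2024-01-16",
}
print(diff_snapshots(last_week, this_week))
```

The "added" and "modified" lists are the URLs worth sending on for full text extraction.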

Step 4: Create the full text extraction robot

Now for the powerful part. You'll take the list of URLs you generated from the sitemap and build a database containing the full text and a screenshot of each page:

  1. Go to the full text & screenshot extractor
  2. Train it on a sample competitor page
  3. It will extract:
    • Complete text
    • Full page screenshot

Step 5: Connect everything with a workflow

Here's where the magic happens. You're going to connect the two robots to create a database of every single webpage:

  1. Go to Workflows in Browse AI
  2. Create new workflow
  3. Select "Robot A: Sitemap URL extractor" and "Robot B: Full text and screenshot extractor"
  4. Set trigger: "When sitemap monitor finds new/updated URLs"
  5. Add action: "Run full text extractor on each URL"
  6. (Optional) Configure output: Send to Google Sheets or Airtable

The workflow automatically processes every URL from your sitemap through the text extractor.

Step 6: Choose your monitoring strategy

You have two options:

Option A: Monitor sitemap changes (recommended)

  • Only process URLs that show as modified in the sitemap
  • More efficient for large sites
  • Focuses on actual updates

Option B: Monitor all pages directly

  • Set up monitors on the text extraction robot
  • Catches changes even if sitemap isn't updated
  • Better for sites with poor sitemap maintenance
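The idea behind Option B is to compare the page content itself instead of trusting the sitemap dates. One common way to do that, sketched here as an assumption rather than a description of how Browse AI monitors work, is to fingerprint the extracted text with a hash and compare fingerprints between runs. The function names are hypothetical.

```python
import hashlib


def content_fingerprint(text):
    """Hash page text after collapsing whitespace, so layout noise is ignored."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(old_text, new_text):
    """True when the meaningful text differs between two captures."""
    return content_fingerprint(old_text) != content_fingerprint(new_text)


# Whitespace-only differences are not reported as changes.
print(has_changed("Pro plan   $49\n", "Pro plan $49"))
# A real content change is.
print(has_changed("Pro plan $49", "Pro plan $59"))
```

Storing only the fingerprint per URL keeps the comparison cheap even when the full text is large.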

Step 7: Run and review your complete dataset

After running the complete workflow, you'll have one row of data for every page in the sitemap.

Sample output structure:

URL | Last Modified | Page Title | Full Text | Screenshot URL | Headers
----|--------------|------------|-----------|----------------|--------
competitor.com/pricing | 2024-01-15 | Pricing - Competitor | Complete page text... | screenshot.url | H1, H2s...
competitor.com/features | 2024-01-14 | Features | All feature descriptions... | screenshot.url | Headers...

You can now:

  • Search across all competitor content
  • Track changes over time
  • Feed to AI for analysis
  • Build competitive intelligence dashboards
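If you export the dataset and want to work with it in code, a row can be modeled roughly like this. The field names mirror the sample output columns above; the PageRecord class and search helper are hypothetical and only show the simplest possible keyword search.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    url: str
    last_modified: str
    page_title: str
    full_text: str
    screenshot_url: str
    headers: str


def search(records, term):
    """Return records whose title or text contains the term (case-insensitive)."""
    needle = term.lower()
    return [
        r for r in records
        if needle in r.full_text.lower() or needle in r.page_title.lower()
    ]


# Hypothetical rows for illustration.
records = [
    PageRecord("competitor.com/pricing", "2024-01-15", "Pricing - Competitor",
               "Enterprise plan with SSO and audit logs", "screenshot.url", "H1, H2s"),
    PageRecord("competitor.com/features", "2024-01-14", "Features",
               "Automated reports and dashboards", "screenshot.url", "Headers"),
]
for match in search(records, "sso"):
    print(match.url, match.last_modified)
```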

Turning raw data into strategic insights with AI

Having all this competitor data is powerful. But feeding it to an LLM? That's when you become unstoppable. Here's exactly how I use AI to analyze our competitive intelligence:

The weekly AI analysis ritual

Every Monday after our monitors run, I have a 30-minute session with Claude where I upload our fresh competitive data. Here are my go-to prompts that surface game-changing insights:

For strategic positioning:

"Analyze these homepage versions from our top competitors over the past 10 weeks.
Identify:
1. How each competitor's positioning has evolved
2. What market segments they're targeting now vs. before
3. Any positioning gaps none of us are addressing
4. Predict where each competitor is heading strategically"

For feature intelligence:

"Here's the full text from 200 product pages across our competitive set.
Create a feature matrix showing:
1. Table stakes features (everyone has them)
2. Differentiating features (only 1-2 companies)
3. Emerging features (newly added in last 30 days)
4. Our feature gaps and opportunities"

For pricing strategy:

"Analyze pricing pages from these 10 competitors.
Tell me:
1. Average price per user/month at each tier
2. How they structure value (seats vs. usage vs. features)
3. Discount strategies (annual, volume, etc.)
4. What features typically move users to higher tiers
5. Pricing trends over the past 3 months"

Content gap analysis that actually works

"Here are titles and meta descriptions from 2,000 competitor blog posts.
Compare with our 500 blog posts.
Identify:
1. High-traffic topics we're not covering (based on social shares/comments)
2. Emerging topics multiple competitors started covering recently
3. Our unique content angles they're missing
4. Keyword opportunities based on their title patterns"

The competitive prediction model

"You have 6 months of weekly website snapshots from [competitor].
Analyze all changes chronologically.
Based on patterns, predict:
1. What features they'll likely launch next
2. Their probable pricing changes
3. Market segments they're moving toward
4. Strategic initiatives in progress"

Real-time battle cards

"Using the latest data from [competitor], update our battle card:
- New features they've added
- Messaging changes to counter
- Pricing updates
- Customer testimonials they're using
- Weaknesses based on what they DON'T mention"

The AI-powered competitive dashboard

I've created a weekly automated report by chaining prompts:

  1. Change summary: "What significant changes occurred across all competitors this week?"
  2. Threat assessment: "Which competitor moves pose the biggest threat to our positioning?"
  3. Opportunity identification: "What market opportunities are emerging based on competitor blind spots?"
  4. Action recommendations: "Based on all data, what are the top 3 strategic moves we should consider?"
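Chaining here just means each answer gets added to the context for the next question. Below is a minimal sketch of that pattern. The ask parameter is a placeholder for whatever function sends a prompt to your LLM and returns its reply. I'm deliberately not showing a specific vendor API, and run_chain is a hypothetical name.

```python
# The four prompts from the list above, in order.
CHAIN = [
    ("Change summary",
     "What significant changes occurred across all competitors this week?"),
    ("Threat assessment",
     "Which competitor moves pose the biggest threat to our positioning?"),
    ("Opportunity identification",
     "What market opportunities are emerging based on competitor blind spots?"),
    ("Action recommendations",
     "Based on all data, what are the top 3 strategic moves we should consider?"),
]


def run_chain(data, ask):
    """Run each prompt in order, feeding earlier answers into later prompts."""
    results = {}
    context = data
    for label, question in CHAIN:
        answer = ask(f"{context}\n\nQuestion: {question}")
        results[label] = answer
        # Append the answer so the next step can build on it.
        context += f"\n\n{label}: {answer}"
    return results


def echo_ask(prompt):
    """Stand-in for a real LLM call, for demonstration only."""
    return f"(model reply to a {len(prompt)}-character prompt)"


report = run_chain("This week's competitor data goes here.", echo_ask)
for label, answer in report.items():
    print(f"{label}: {answer}")
```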

Pro tips for LLM analysis

Structure your data for AI:

  • Include dates with every piece of content
  • Label source clearly (competitor name, page type)
  • Keep historical versions for trend analysis
  • Group by category (pricing, features, messaging)
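As an illustration of those four points, here's one way to turn raw snippets into a labeled, dated, grouped block of text before pasting it into an LLM. The dictionary keys, the "##" section markers, and the function name are my own choices, and the sample data is made up.

```python
from collections import defaultdict


def build_prompt_context(snippets):
    """Group snippets by category and label each with date, source, and page type."""
    grouped = defaultdict(list)
    for snippet in snippets:
        grouped[snippet["category"]].append(snippet)

    lines = []
    for category in sorted(grouped):
        lines.append(f"## {category}")
        # Oldest first, so the model sees changes in chronological order.
        for s in sorted(grouped[category], key=lambda item: item["date"]):
            lines.append(
                f"[{s['date']}] {s['competitor']} ({s['page_type']}): {s['text']}"
            )
    return "\n".join(lines)


# Hypothetical snippets for illustration.
snippets = [
    {"category": "pricing", "date": "2025-07-07", "competitor": "Acme",
     "page_type": "pricing page", "text": "Pro plan $59"},
    {"category": "pricing", "date": "2025-06-30", "competitor": "Acme",
     "page_type": "pricing page", "text": "Pro plan $49"},
    {"category": "features", "date": "2025-07-07", "competitor": "Acme",
     "page_type": "product page", "text": "New SSO support"},
]
print(build_prompt_context(snippets))
```

Using ISO dates (YYYY-MM-DD) means a plain string sort is also a chronological sort.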

Use iterative prompting:

  • Start broad: "What patterns do you see?"
  • Then narrow: "Tell me more about the pricing pattern you mentioned."
  • Then get actionable: "How should we respond to this pricing pressure?"

Verify insights: AI can hallucinate connections. Always verify key insights by checking the source data.
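One cheap check you can automate: if the AI quotes competitor copy, confirm the quote really appears in the pages you captured. The sketch below is a rough first pass, assuming quotes are wrapped in straight double quotes. It won't catch paraphrased claims, and the function name is hypothetical.

```python
import re


def unsupported_quotes(insight, source_texts):
    """Return quoted phrases from an insight that appear in none of the sources."""
    quotes = re.findall(r'"([^"]+)"', insight)
    # Normalize whitespace and case on both sides before comparing.
    haystack = " ".join(" ".join(t.split()).lower() for t in source_texts)
    return [q for q in quotes if " ".join(q.split()).lower() not in haystack]


# Hypothetical insight and source text for illustration.
insight = 'Acme now says "built for enterprise teams" and offers "free forever" plans.'
sources = ["Acme: Built for   enterprise teams. Start a 14-day trial."]
print(unsupported_quotes(insight, sources))
```

Anything this returns is worth checking by hand before it goes into a report.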

Create insight templates: Save your best prompts and run them weekly for consistent intelligence.

The compound advantage

Here's what combining Browse AI + LLMs gives you:

  • Speed: Analyze thousands of pages in minutes vs. weeks
  • Depth: Surface patterns humans would miss
  • Consistency: Same analysis criteria every time
  • Prediction: Spot trends before they're obvious
  • Action: Get specific recommendations, not just data

Start monitoring competitors today

The setup takes about 30 minutes, then runs automatically forever:

  1. Find competitor sitemaps using methods above
  2. Run the appropriate sitemap robot
  3. Set up monitoring for continuous updates
  4. Create the text extraction robot
  5. Connect with a workflow
  6. Analyze with your favorite tools

Your competitors are already watching you

In my years in marketing, I've learned one truth: the companies with the best intelligence win. But intelligence isn't just knowing that something changed - it's understanding what changed, why it changed, and what it means for your strategy.

This workflow gives you that complete picture. Every word, every update, every test your competitors run is captured and analyzed.

At Browse AI, this system has become our strategic advantage. We don't react to competitor announcements - we see them coming months in advance.

The future of competitive intelligence isn't just tracking what competitors do - it's understanding why they do it and predicting what's next. With Browse AI's extraction and LLM analysis, you're not just watching the game - you're seeing three moves ahead.

Start extracting competitive intelligence →

After wasting years on surface-level competitive tracking, I finally have the deep intelligence system I always needed. Join me and 500,000+ Browse AI users who are done with manual monitoring and ready for automated, comprehensive competitive intelligence.
