Monitor millions of pages automatically with Browse AI's new sitemap robots
As a Head of Marketing, I've spent years trying to crack the competitive intelligence code. In previous roles, I hired interns to manually check competitor sites (they caught maybe 10% of changes). I paid agencies $5,000+ monthly for "intelligence reports" that were outdated before they hit my inbox. I even convinced engineering to build custom scrapers that broke every time a competitor changed their font.
The frustration was real. I knew tracking competitors was critical, but we were always three steps behind.
Now at Browse AI, I've finally built the competitive monitoring system I always dreamed of. Using our new sitemap extraction robots that can process up to 50,000 URLs per task, my team tracks every move our competitors make - automatically.
Let me show you exactly how we do it.
What is sitemap extraction and why does it matter?
A sitemap is like a blueprint of a website - it lists every page, when it was last updated, and how often it changes. Most businesses have sitemaps with thousands or even millions of URLs.
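Here's what a (simplified) sitemap entry looks like, following the standard sitemaps.org format:

```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://competitor.com/pricing</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <!-- ...one <url> entry per page on the site... -->
</urlset>
```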
Our new sitemap extraction robots turn these blueprints into actionable competitive intelligence by:
- Extracting every URL on a competitor's site
- Identifying which pages changed and when
- Enabling targeted monitoring of only updated content
- Scaling to millions of pages without breaking a sweat
The complete setup guide: from sitemap to full competitive intelligence
Step 1: Find the competitor's sitemap
Most websites follow standard conventions:
- Try: competitor.com/sitemap.xml
- Or: competitor.com/sitemap_index.xml
- Check: competitor.com/robots.txt (often lists the sitemap location; a script for this is sketched below)
Pro method for hidden sitemaps:
- Use our Google search results robot
- Search: site:competitor.com filetype:xml sitemap
- Extract all XML files to find non-standard locations
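If you want to script the discovery step yourself, here's a minimal Python sketch - the domain is a placeholder, and real-world robots.txt files vary:

```python
import urllib.request

def find_sitemaps(domain: str) -> list[str]:
    """Return sitemap URLs listed in robots.txt, falling back to common paths."""
    found = []
    try:
        robots = urllib.request.urlopen(f"{domain}/robots.txt", timeout=10) \
            .read().decode("utf-8", "replace")
        # The robots.txt standard allows "Sitemap: <url>" lines anywhere in the file.
        found = [line.split(":", 1)[1].strip()
                 for line in robots.splitlines()
                 if line.lower().startswith("sitemap:")]
    except OSError:
        pass
    # Fall back to the conventional locations if robots.txt lists nothing.
    return found or [f"{domain}/sitemap.xml", f"{domain}/sitemap_index.xml"]

# Placeholder domain for illustration:
print(find_sitemaps("https://competitor.com"))
```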
Step 2: Choose and run the right sitemap robot
Browse AI offers two sitemap robots, depending on how the site's sitemap is structured:
For sites with multiple sitemaps (like Amazon, large news sites):
- Use the sitemap index extractor
- This extracts all child sitemap URLs from the main index
For single sitemap files:
- Use the URL set extractor
- This extracts individual page URLs with metadata
Not sure which to use? Start with the index extractor - it'll show you if there are multiple sitemaps or just one.
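Under the hood, the difference is the XML root element: a sitemap index uses `<sitemapindex>`, while a single sitemap uses `<urlset>`. If you'd rather check before picking a robot, here's a quick Python sketch (the URL is a placeholder):

```python
import urllib.request
import xml.etree.ElementTree as ET

def sitemap_kind(url: str) -> str:
    """Return 'index' for a <sitemapindex> file, 'urlset' for a single sitemap."""
    xml_bytes = urllib.request.urlopen(url, timeout=10).read()
    root = ET.fromstring(xml_bytes)
    # Root tags are namespaced, e.g. "{http://www.sitemaps.org/...}sitemapindex",
    # so strip everything up to the closing brace before comparing.
    tag = root.tag.rsplit("}", 1)[-1]
    return "index" if tag == "sitemapindex" else "urlset"

print(sitemap_kind("https://competitor.com/sitemap.xml"))
```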
Step 3: Set up monitoring for live updates
Once you run the sitemap robot:
- Click "Monitor" on your task
- Set frequency (I use weekly for competitors)
- Enable notifications for changes
What the sitemap monitor captures:
- All URLs in the sitemap
- Last modified dates for each URL
- New pages added since last run
- Pages removed since last run
Your monitor creates a live database that updates automatically.
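The new/removed comparison is simple set arithmetic, so you can reproduce it on exported data too. A minimal sketch, assuming you saved each run's URL list to a plain text file with one URL per line:

```python
# Assumes two exports: last week's URL list and this week's, one URL per line.
with open("sitemap_last_week.txt") as f:
    previous = {line.strip() for line in f if line.strip()}
with open("sitemap_this_week.txt") as f:
    current = {line.strip() for line in f if line.strip()}

added = current - previous    # pages that appeared since the last run
removed = previous - current  # pages that disappeared since the last run

print(f"{len(added)} new pages, {len(removed)} removed pages")
```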
Step 4: Create the full text extraction robot
Now for the powerful part: you're going to take the list of URLs you generated from the sitemap and build a database containing each page's full text, plus a screenshot:
- Go to the full text & screenshot extractor
- Train it on a sample competitor page
- It will extract:
  - Complete text
  - Full page screenshot
Step 5: Connect everything with a workflow
Here's where the magic happens: you're going to connect the two robots to create a database of every single webpage:
- Go to Workflows in Browse AI
- Create new workflow
- Select "Robot A: Sitemap URL extractor" and "Robot B: Full text and screenshot extractor"
- Set trigger: "When sitemap monitor finds new/updated URLs"
- Add action: "Run full text extractor on each URL"
- (Optional) Configure output: Send to Google Sheets or Airtable
The workflow automatically processes every URL from your sitemap through the text extractor.
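If you'd rather orchestrate this fan-out in your own code (for example, against the Browse AI REST API), the core logic is only a few lines. The helper functions below are hypothetical placeholders - check the Browse AI API docs for the real endpoints:

```python
# Hypothetical helpers standing in for the two robots -- consult the
# Browse AI API documentation if you want to wire up real calls.
def get_sitemap_urls() -> list[str]:
    # Placeholder: fetch Robot A's (sitemap extractor) latest results here.
    return ["https://competitor.com/pricing", "https://competitor.com/features"]

def run_text_extractor(url: str) -> dict:
    # Placeholder: trigger Robot B (full text & screenshot) on one URL here.
    return {"url": url, "text": "...", "screenshot_url": "..."}

# The workflow's core logic: fan every URL from Robot A out to Robot B.
pages = [run_text_extractor(url) for url in get_sitemap_urls()]
print(f"Processed {len(pages)} pages")
```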
Step 6: Choose your monitoring strategy
You have two options:
Option A: Monitor sitemap changes (recommended)
- Only process URLs that show as modified in the sitemap
- More efficient for large sites
- Focuses on actual updates
Option B: Monitor all pages directly
- Set up monitors on the text extraction robot
- Catches changes even if sitemap isn't updated
- Better for sites with poor sitemap maintenance
Step 7: Run and review your complete dataset
After running the complete workflow, you'll have a complete, searchable dataset. Sample output structure:
URL | Last Modified | Page Title | Full Text | Screenshot URL | Headers
----|--------------|------------|-----------|----------------|--------
competitor.com/pricing | 2024-01-15 | Pricing - Competitor | Complete page text... | screenshot.url | H1, H2s...
competitor.com/features | 2024-01-14 | Features | All feature descriptions... | screenshot.url | Headers...
You can now:
- Search across all competitor content (sketched in code below)
- Track changes over time
- Feed to AI for analysis
- Build competitive intelligence dashboards
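The first two take only a few lines once the data is exported. A sketch using pandas, assuming a CSV export named competitors.csv with the columns shown above:

```python
import pandas as pd

# Assumes a CSV export with the columns from the sample structure above.
df = pd.read_csv("competitors.csv")

# Search across all competitor content for a keyword.
hits = df[df["Full Text"].str.contains("enterprise", case=False, na=False)]
print(hits[["URL", "Last Modified", "Page Title"]])

# Track changes over time: surface the most recently modified pages.
df["Last Modified"] = pd.to_datetime(df["Last Modified"])
print(df.sort_values("Last Modified", ascending=False).head(10))
```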
Turning raw data into strategic insights with AI
Having all this competitor data is powerful. But feeding it to an LLM? That's when you become unstoppable. Here's exactly how I use AI to analyze our competitive intelligence:
The weekly AI analysis ritual
Every Monday after our monitors run, I have a 30-minute session with Claude where I upload our fresh competitive data. Here are my go-to prompts that surface game-changing insights:
For strategic positioning:
"Analyze these homepage versions from our top competitors over the past 10 weeks.
Identify:
1. How each competitor's positioning has evolved
2. What market segments they're targeting now vs. before
3. Any positioning gaps none of us are addressing
4. Predict where each competitor is heading strategically"
For feature intelligence:
"Here's the full text from 200 product pages across our competitive set.
Create a feature matrix showing:
1. Table stakes features (everyone has them)
2. Differentiating features (only 1-2 companies)
3. Emerging features (newly added in last 30 days)
4. Our feature gaps and opportunities"
For pricing strategy:
"Analyze pricing pages from these 10 competitors.
Tell me:
1. Average price per user/month at each tier
2. How they structure value (seats vs. usage vs. features)
3. Discount strategies (annual, volume, etc.)
4. What features typically move users to higher tiers
5. Pricing trends over the past 3 months"
Content gap analysis that actually works
"Here are titles and meta descriptions from 2,000 competitor blog posts.
Compare with our 500 blog posts.
Identify:
1. High-traffic topics we're not covering (based on social shares/comments)
2. Emerging topics multiple competitors started covering recently
3. Our unique content angles they're missing
4. Keyword opportunities based on their title patterns"
The competitive prediction model
"You have 6 months of weekly website snapshots from [competitor].
Analyze all changes chronologically.
Based on patterns, predict:
1. What features they'll likely launch next
2. Their probable pricing changes
3. Market segments they're moving toward
4. Strategic initiatives in progress"
Real-time battle cards
"Using the latest data from [competitor], update our battle card:
- New features they've added
- Messaging changes to counter
- Pricing updates
- Customer testimonials they're using
- Weaknesses based on what they DON'T mention"
The AI-powered competitive dashboard
I've created a weekly automated report by chaining prompts (automated end-to-end in the sketch after this list):
- Change summary: "What significant changes occurred across all competitors this week?"
- Threat assessment: "Which competitor moves pose the biggest threat to our positioning?"
- Opportunity identification: "What market opportunities are emerging based on competitor blind spots?"
- Action recommendations: "Based on all data, what are the top 3 strategic moves we should consider?"
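Here's a minimal sketch of that chain using the anthropic Python SDK (pip install anthropic); the model name and export filename are placeholders:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROMPTS = [
    "What significant changes occurred across all competitors this week?",
    "Which competitor moves pose the biggest threat to our positioning?",
    "What market opportunities are emerging based on competitor blind spots?",
    "Based on all data, what are the top 3 strategic moves we should consider?",
]

# Hypothetical export file produced by the Browse AI workflow above.
data = open("this_weeks_export.txt").read()

history, report = [], []
for i, prompt in enumerate(PROMPTS):
    # Attach the raw competitive data to the first prompt only; later prompts
    # build on the conversation so far.
    content = f"Here is this week's competitive data:\n{data}\n\n{prompt}" if i == 0 else prompt
    history.append({"role": "user", "content": content})
    reply = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder: use a current model
        max_tokens=1024,
        messages=history,
    )
    answer = reply.content[0].text
    history.append({"role": "assistant", "content": answer})
    report.append(answer)

print("\n\n---\n\n".join(report))
```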
Pro tips for LLM analysis
Structure your data for AI (a concrete sketch follows this list):
- Include dates with every piece of content
- Label source clearly (competitor name, page type)
- Keep historical versions for trend analysis
- Group by category (pricing, features, messaging)
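One lightweight way to apply all four rules is to store each captured page as a dated, labeled JSON record. A sketch - the field names are illustrative, not a required schema:

```python
import json
from datetime import date

# One record per captured page: dated, clearly labeled by source, and grouped
# by category, so the LLM never has to guess where a snippet came from.
record = {
    "competitor": "CompetitorX",
    "page_type": "pricing",  # category: pricing, features, messaging, blog...
    "url": "https://competitor.com/pricing",
    "captured_on": str(date.today()),
    "last_modified": "2024-01-15",
    "full_text": "Complete page text goes here...",
}

# Append (never overwrite) so historical versions stay available for trends.
with open("snapshots.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```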
Use iterative prompting:
- Start broad: "What patterns do you see?"
- Then narrow: "Tell me more about the pricing pattern you mentioned"
- Then get actionable: "How should we respond to this pricing pressure?"
Verify insights: AI can hallucinate connections. Always verify key insights by checking the source data.
Create insight templates: Save your best prompts and run them weekly for consistent intelligence.
The compound advantage
Here's what combining Browse AI + LLMs gives you:
- Speed: Analyze thousands of pages in minutes vs. weeks
- Depth: Surface patterns humans would miss
- Consistency: Same analysis criteria every time
- Prediction: Spot trends before they're obvious
- Action: Get specific recommendations, not just data
Start monitoring competitors today
The setup takes about 30 minutes, then runs automatically forever:
- Find competitor sitemaps using methods above
- Run the appropriate sitemap robot
- Set up monitoring for continuous updates
- Create the text extraction robot
- Connect with a workflow
- Analyze with your favorite tools
Your competitors are already watching you
In my years in marketing, I've learned one truth: the companies with the best intelligence win. But intelligence isn't just knowing that something changed - it's understanding what changed, why it changed, and what it means for your strategy.
This workflow gives you that complete picture. Every word, every update, every test your competitors run is captured and analyzed.
At Browse AI, this system has become our strategic advantage. We don't react to competitor announcements - we see them coming months in advance.
The future of competitive intelligence isn't just tracking what competitors do - it's understanding why they do it and predicting what's next. With Browse AI's extraction and LLM analysis, you're not just watching the game - you're seeing three moves ahead.
Start extracting competitive intelligence →
After wasting years on surface-level competitive tracking, I finally have the deep intelligence system I always needed. Join me and 500,000+ Browse AI users who are done with manual monitoring and ready for automated, comprehensive competitive intelligence.