Web scraping glossary

Getting started with web scraping? Learn the basic concepts and fundamentals at a glance.
Incremental scraping is a web scraping strategy that extracts only new or changed data since your last run, rather than re-scraping everything. It keeps datasets current efficiently by focusing on changes instead of complete refreshes.
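One common way to detect "new or changed" data is to store a content hash per record and compare it on the next run. The sketch below assumes that approach; the names `fingerprint`, `find_changes`, and the sample records are hypothetical, and real scrapers may instead rely on timestamps, ETags, or sitemaps.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Return a stable hash of a record's content."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def find_changes(seen: dict, records: dict) -> dict:
    """Return only the records that are new or whose content changed.

    `seen` maps a record key (for example its URL) to the fingerprint
    stored on the previous run. It is updated in place for the next run.
    """
    changed = {}
    for key, record in records.items():
        digest = fingerprint(record)
        if seen.get(key) != digest:
            changed[key] = record
            seen[key] = digest
    return changed


# Hypothetical data from two consecutive runs.
seen = {}
first_run = find_changes(seen, {"/a": {"price": 10}, "/b": {"price": 20}})
second_run = find_changes(
    seen, {"/a": {"price": 10}, "/b": {"price": 25}, "/c": {"price": 5}}
)
print(sorted(first_run))   # every record is new on the first run
print(sorted(second_run))  # only the changed and the new record
```

In practice `seen` would be persisted between runs (a file or database), which this sketch leaves out.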
Bulk extraction is the process of scraping large amounts of data from multiple web pages in a single automated operation. It applies extraction patterns across thousands of URLs to build comprehensive datasets quickly.
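As a rough illustration, bulk extraction amounts to applying one extraction function to many URLs. The sketch below uses a thread pool for that; `PAGES`, `fetch`, and `extract` are hypothetical stand-ins, and the in-memory pages replace real HTTP requests so the example runs offline.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory pages standing in for real HTTP responses.
PAGES = {
    f"https://example.com/item/{i}": f"<h1>Item {i}</h1>" for i in range(1, 6)
}

TITLE = re.compile(r"<h1>(.*?)</h1>")


def fetch(url: str) -> str:
    """Return the HTML for a URL (a real scraper would make a request here)."""
    return PAGES[url]


def extract(url: str) -> dict:
    """Apply the same extraction pattern to one page."""
    match = TITLE.search(fetch(url))
    return {"url": url, "title": match.group(1) if match else None}


def bulk_extract(urls, workers: int = 4) -> list:
    """Run `extract` over all URLs concurrently, keeping the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract, urls))


rows = bulk_extract(list(PAGES))
print(len(rows), rows[0])
```

A production version would also need rate limiting, retries, and error handling, which are omitted here.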
Deep scraping is the process of extracting data from multiple linked pages on a website by following links from one page to the next, rather than from just a single page.
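A minimal sketch of following links across pages, assuming a breadth-first traversal with a depth limit. The `SITE` dictionary is a hypothetical stand-in for a real website, and `LinkCollector` and `crawl` are illustrative names, not part of any particular tool.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical site held in memory so the example runs without a network.
SITE = {
    "https://example.com/": '<a href="/cats">Cats</a> <a href="/dogs">Dogs</a>',
    "https://example.com/cats": '<a href="/cats/1">Tom</a> <a href="/">Home</a>',
    "https://example.com/dogs": '<a href="https://other.example/x">External</a>',
    "https://example.com/cats/1": "<p>Tom the cat</p>",
}


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start: str, max_depth: int = 2) -> list:
    """Visit pages breadth-first, following links up to `max_depth` hops."""
    visited = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        html = SITE.get(url)
        if html is None:
            continue  # page is outside the stand-in site
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links any deeper
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


pages = crawl("https://example.com/")
print(pages)
```

The `seen` set prevents revisiting pages when links form a cycle, such as the "Home" link in the sample data.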
Detail page extraction is a web scraping technique that captures comprehensive information from individual item pages. It goes beyond list summaries to extract full descriptions, specifications, images, reviews, and detailed data.
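To make this concrete, the sketch below pulls a title, description, specifications, and image URLs out of one item page. The HTML, its class names, and `extract_detail` are hypothetical; regular expressions are used only to keep the example short, and a real scraper would more likely use an HTML parser.

```python
import re

# Hypothetical detail page for a single product.
DETAIL_HTML = """
<h1 class="title">Trail Backpack 40L</h1>
<p class="description">A lightweight pack for multi-day hikes.</p>
<table class="specs">
  <tr><th>Weight</th><td>1.2 kg</td></tr>
  <tr><th>Volume</th><td>40 L</td></tr>
</table>
<img src="/img/pack-front.jpg"><img src="/img/pack-side.jpg">
"""


def extract_detail(html: str) -> dict:
    """Collect the full set of fields available on one item page."""
    title = re.search(r'<h1 class="title">(.*?)</h1>', html)
    description = re.search(r'<p class="description">(.*?)</p>', html)
    # Each table row becomes one key/value pair in the specs dictionary.
    specs = dict(re.findall(r"<tr><th>(.*?)</th><td>(.*?)</td></tr>", html))
    images = re.findall(r'<img src="(.*?)"', html)
    return {
        "title": title.group(1) if title else None,
        "description": description.group(1) if description else None,
        "specs": specs,
        "images": images,
    }


item = extract_detail(DETAIL_HTML)
print(item["title"], item["specs"])
```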
List extraction is a web scraping technique that extracts multiple similar items from a web page by recognizing repeating patterns. It transforms product listings, job postings, and other repeated content into structured datasets.
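The idea of a repeating pattern can be sketched with Python's standard `html.parser`: every `<li class="job">` element becomes one record. The markup, class names, and `ListExtractor` are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical listing page with three similar items.
LISTING_HTML = """
<ul>
  <li class="job"><span class="title">Data Engineer</span><span class="city">Berlin</span></li>
  <li class="job"><span class="title">QA Analyst</span><span class="city">Lisbon</span></li>
  <li class="job"><span class="title">Web Developer</span><span class="city">Oslo</span></li>
</ul>
"""


class ListExtractor(HTMLParser):
    """Turn each repeated <li class="job"> block into a dictionary."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # record being built, if inside an item
        self._field = None    # field name taken from the span's class

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "job":
            self._current = {}
        elif tag == "span" and self._current is not None:
            self._field = css_class

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.items.append(self._current)
            self._current = None


extractor = ListExtractor()
extractor.feed(LISTING_HTML)
jobs = extractor.items
print(jobs)
```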
Unstructured data is information that lacks a predefined format or organization, including text, images, videos, and documents. Web scrapers extract this messy content and convert it into organized, usable formats.
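As a small example of that conversion, the sketch below pulls an email address, a price, and a date out of free text and returns them as a dictionary. The sample sentence, the patterns, and the `structure` function are hypothetical and far simpler than what real-world text usually requires.

```python
import re

# Hypothetical free-text snippet scraped from a page.
TEXT = "Contact Jane Doe at jane@example.com. Price: $1,299.50, posted 2024-03-15."


def structure(text: str) -> dict:
    """Extract a few typed fields from unstructured text."""
    email = re.search(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)
    price = re.search(r"\$([\d,]+(?:\.\d+)?)", text)
    date = re.search(r"\d{4}-\d{2}-\d{2}", text)
    return {
        "email": email.group(0) if email else None,
        # Drop thousands separators before converting to a number.
        "price": float(price.group(1).replace(",", "")) if price else None,
        "date": date.group(0) if date else None,
    }


record = structure(TEXT)
print(record)
```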
Structured data is information organized in a predictable format that makes it easy to search and analyze. In web scraping, it's the clean, organized output created from messy web pages.
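For instance, records with a fixed set of fields can be written out as CSV, one common structured format. The sketch uses Python's standard `csv` module; the sample rows and the `to_csv` helper are hypothetical.

```python
import csv
import io

# Hypothetical scraped records that all share the same fields.
rows = [
    {"name": "Trail Backpack", "price": 89.0, "in_stock": True},
    {"name": "Camp Stove", "price": 45.5, "in_stock": False},
]


def to_csv(records: list) -> str:
    """Serialize records with a fixed column order into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price", "in_stock"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```

Because every row has the same columns, the output can be loaded directly into a spreadsheet or database for searching and analysis.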
A user agent is a text string, sent in the User-Agent HTTP request header, that identifies your browser or scraping tool to web servers. Many websites inspect the user agent to help detect bots, so configuring it properly is important for reliable web scraping.
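A minimal sketch of setting the user agent with Python's standard `urllib.request`. The user agent string and URL are hypothetical placeholders, and the request is only built here, not sent.

```python
from urllib.request import Request

# Hypothetical user agent string; choose one that honestly describes your tool.
USER_AGENT = "example-scraper/1.0 (+https://example.com/bot-info)"


def build_request(url: str) -> Request:
    """Create a request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


request = build_request("https://example.com/products")
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

Passing the request to `urllib.request.urlopen` would send it with this header instead of urllib's default user agent.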