Web scraping glossary

Getting started with web scraping? Learn the basic concepts and fundamentals at a glance.
A CSS selector is a pattern that targets specific HTML elements on a web page. In web scraping, CSS selectors act as precise instructions that tell your scraper exactly which elements to extract data from.
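To make the idea concrete, here is a minimal Python sketch of what a selector engine does with a simple `tag.class` selector such as `span.price`. It uses only the standard library's `html.parser`. The class `SimpleSelector` and the helper `select_text` are hypothetical names, and only the `tag` and `tag.class` forms are supported; real selector engines (in browsers or scraping libraries) also handle descendant combinators, attribute selectors, pseudo-classes, and more.

```python
from html.parser import HTMLParser


class SimpleSelector(HTMLParser):
    """Collects the text of elements matching a `tag` or `tag.class` selector.

    Hypothetical sketch: only these two selector forms are supported.
    """

    def __init__(self, selector: str) -> None:
        super().__init__()
        tag, _, cls = selector.partition(".")
        self.tag = tag
        self.cls = cls or None  # None means "any class"
        self.depth = 0  # > 0 while inside a matching element
        self.matches: list[str] = []
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested element with the same tag name
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (self.cls is None or self.cls in classes):
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:  # the matching element just closed
                self.matches.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


def select_text(html: str, selector: str) -> list[str]:
    parser = SimpleSelector(selector)
    parser.feed(html)
    parser.close()
    return parser.matches


page = '<div class="product"><span class="price">$19.99</span><span class="name">Lamp</span></div>'
print(select_text(page, "span.price"))  # ['$19.99']
```

Dropping the class part (`select_text(page, "span")`) matches both `<span>` elements, which shows why the more specific selector is the precise instruction a scraper needs.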
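A status code is a three-digit number in an HTTP response that tells you how the server handled your request: 2xx codes mean success, 3xx a redirect, 4xx a client-side problem (such as 404 Not Found or 429 Too Many Requests), and 5xx a server-side failure. Learn how to interpret and handle status codes to build reliable scrapers.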
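A scraper typically branches on the status code's class and decides whether a failure is worth retrying. The sketch below uses Python's standard `http.HTTPStatus` enum for the reason phrases. The retry set (429 plus transient 5xx errors) is an assumed policy for illustration, not a rule every site follows, and the function names are hypothetical.

```python
from http import HTTPStatus

# Assumed retry policy: rate limiting (429) and transient server errors.
RETRYABLE = {429, 500, 502, 503, 504}


def classify(code: int) -> str:
    """Map a status code to its class by its first digit."""
    classes = {1: "informational", 2: "success", 3: "redirect",
               4: "client error", 5: "server error"}
    return classes.get(code // 100, "unknown")


def describe(code: int) -> str:
    """Return the standard reason phrase, e.g. 404 -> 'Not Found'."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:  # not a code the enum knows about
        return "Unknown status"


def should_retry(code: int) -> bool:
    return code in RETRYABLE


for code in (200, 301, 404, 429, 503):
    print(code, classify(code), describe(code),
          "retry" if should_retry(code) else "no retry")
```

Separating "what happened" (`classify`) from "what to do about it" (`should_retry`) keeps the retry policy in one place, so it can be tuned per site without touching the rest of the scraper.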
An HTTP response is the data a web server sends back after receiving a request. It contains a status code, headers, and a body (usually HTML for web pages, though APIs often return JSON) that scrapers parse to extract data from websites.
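In Python's standard library, `http.client.HTTPResponse` is the class that splits a response into these parts. The sketch below feeds it bytes we already hold, through a small hypothetical `_BytesSocket` stand-in (since `HTTPResponse` normally reads from a live socket), so the status code, headers, and body can be inspected offline. In a real scraper the response would usually come from `urllib.request.urlopen` or another HTTP client rather than being built by hand.

```python
import io
from http.client import HTTPResponse


class _BytesSocket:
    """Minimal stand-in for a socket: HTTPResponse only calls makefile()."""

    def __init__(self, data: bytes) -> None:
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split raw response bytes into status code, headers, and body."""
    response = HTTPResponse(_BytesSocket(raw))
    response.begin()  # parses the status line and headers
    body = response.read()  # reads the body using Content-Length
    # Note: dict() keeps only the last value of repeated headers (e.g. Set-Cookie).
    return response.status, dict(response.getheaders()), body


body = b"<html><h1>Hello</h1></html>"
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=utf-8\r\n"
       b"Content-Length: " + str(len(body)).encode() + b"\r\n"
       b"\r\n" + body)
status, headers, html = parse_response(raw)
print(status, headers["Content-Type"], html)
```

Only the body is handed to an HTML parser; the status code and headers (such as `Content-Type`) tell the scraper whether and how to parse it.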
The DOM (Document Object Model) is a programming interface that represents HTML as a tree structure of objects. It's the live, interactive model your browser creates from HTML code, and it's what web scrapers extract data from.
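Python's standard library ships `xml.dom.minidom`, a minimal implementation of the W3C DOM interface, which makes the tree-of-objects idea easy to see. The sketch below assumes the markup is well-formed XHTML, because `minidom` is an XML parser and rejects much real-world HTML (browsers and HTML5-aware parsers are far more forgiving). `list_links` is a hypothetical helper name.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def list_links(markup: str) -> list[tuple[str, str]]:
    """Return (href, text) for every <a> element in the DOM tree."""
    document = parseString(markup)  # builds the tree of node objects
    links = []
    for anchor in document.getElementsByTagName("a"):
        # Text lives in child Text nodes, not on the element itself.
        text = "".join(child.data for child in anchor.childNodes
                       if child.nodeType == Node.TEXT_NODE)
        links.append((anchor.getAttribute("href"), text))
    return links


page = ('<html><body><ul>'
        '<li><a href="/a">First</a></li>'
        '<li><a href="/b">Second</a></li>'
        '</ul></body></html>')
print(list_links(page))  # [('/a', 'First'), ('/b', 'Second')]
```

Each `<a>` is a node with a parent (`<li>`), attributes, and child nodes, which is exactly the structure a scraper navigates when it moves from one element to related ones.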
HTML (Hypertext Markup Language) is the standard language for creating web pages. It uses tags to structure content, and in web scraping, HTML is the raw material you parse and extract data from.
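Because HTML is tagged text, parsing it means reacting to opening tags, closing tags, and the text between them. The minimal sketch below uses Python's standard `html.parser` module to pull out a page's `<title>` and the `href` of each link. The class name `PageSummary` is hypothetical, and production scrapers more often rely on a dedicated parsing library.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Records the <title> text and every link href while parsing."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without a link target
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


page = """<html>
  <head><title>Example Store</title></head>
  <body>
    <a href="/products/1">Lamp</a>
    <a href="/products/2">Desk</a>
    <a name="top">No link</a>
  </body>
</html>"""
summary = PageSummary()
summary.feed(page)
summary.close()
print(summary.title, summary.links)
```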
A web crawler is a bot that systematically browses the internet by following links from page to page, discovering and mapping content across websites. Crawlers find what pages exist, while scrapers extract specific data from those pages.
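A crawler's core loop is a queue of URLs to visit plus a set of URLs already seen. The sketch below is a breadth-first version using Python's standard library that stays on the starting domain and stops after `max_pages`. The `fetch` callable is an assumption so the example runs offline against an in-memory dictionary; a real crawler would fetch over HTTP and should also respect robots.txt, rate limits, and errors.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects raw href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url: str, fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl of one domain; returns URLs in visit order."""
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # fetch failed or page missing
            continue
        visited.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.hrefs:
            # Resolve relative links and drop #fragments so duplicates match.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


site = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a> <a href="https://other.org/">Off-site</a>',
    "https://example.com/a": '<a href="/">Home</a> <a href="c">C</a>',
    "https://example.com/b": "<p>No links here</p>",
    "https://example.com/c": "<p>Leaf page</p>",
}
print(crawl("https://example.com/", site.get))
```

Note that the crawler only records which pages exist; extracting fields from each of them is the scraper's job.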
Incremental scraping is a web scraping strategy that extracts only data that is new or has changed since your last run, rather than re-scraping everything. By processing only the changes instead of performing complete refreshes, it keeps datasets up to date with fewer requests and less processing.
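One common way to find what changed is to compare each page's `<lastmod>` date in the site's XML sitemap with the date recorded on the previous run. The sketch below assumes the site publishes a sitemap with accurate `lastmod` values in one consistent `YYYY-MM-DD` format (so plain string comparison orders them correctly). `plan_incremental_run` and the `last_seen` state dictionary are hypothetical names. Other change signals, such as HTTP `ETag`/`Last-Modified` headers or content hashes, can play the same role.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def plan_incremental_run(sitemap_xml: str, last_seen: dict[str, str]):
    """Return (URLs to scrape now, updated state) from a sitemap.

    last_seen maps URL -> lastmod value recorded on the previous run.
    Assumes lastmod values share one YYYY-MM-DD format.
    """
    root = ET.fromstring(sitemap_xml)
    due: list[str] = []
    new_state = dict(last_seen)
    for entry in root.findall("sm:url", SITEMAP_NS):
        loc = entry.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = entry.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        is_new = loc not in last_seen
        # Without a lastmod we cannot tell, so re-scrape to be safe.
        changed = not lastmod or lastmod > last_seen.get(loc, "")
        if is_new or changed:
            due.append(loc)
            new_state[loc] = lastmod  # persist only after a successful scrape
    return due, new_state


sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    '<url><loc>https://example.com/p/1</loc><lastmod>2024-05-01</lastmod></url>'
    '<url><loc>https://example.com/p/2</loc><lastmod>2024-06-10</lastmod></url>'
    '<url><loc>https://example.com/p/3</loc><lastmod>2024-06-12</lastmod></url>'
    '</urlset>'
)
previous = {"https://example.com/p/1": "2024-05-01",
            "https://example.com/p/2": "2024-06-01"}
due, state = plan_incremental_run(sitemap, previous)
print(due)
```

Here `/p/1` is skipped because its date is unchanged, `/p/2` is re-scraped because it was modified, and `/p/3` is scraped for the first time.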
Bulk extraction is the process of scraping large amounts of data from multiple web pages in a single automated operation. It applies extraction patterns across thousands of URLs to build comprehensive datasets quickly.
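In practice, bulk extraction usually means running the same extraction function over a long URL list with bounded concurrency. A minimal sketch with Python's `concurrent.futures.ThreadPoolExecutor` follows. The `fetch` callable, the `extract_price` pattern, and the page markup it expects are assumptions for illustration; a regular expression is used only for brevity (an HTML parser is more robust), and a production job would also add rate limiting and retries.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional

# Hypothetical extraction pattern for pages that mark prices this way.
PRICE_RE = re.compile(r'<span class="price">([^<]+)</span>')


def extract_price(html: str) -> Optional[str]:
    match = PRICE_RE.search(html)
    return match.group(1).strip() if match else None


def bulk_extract(urls: Iterable[str],
                 fetch: Callable[[str], Optional[str]],
                 extract: Callable[[str], Optional[str]],
                 max_workers: int = 8) -> dict[str, Optional[str]]:
    """Apply one extraction function to many URLs in parallel threads."""

    def job(url: str) -> tuple[str, Optional[str]]:
        try:
            html = fetch(url)
        except Exception:  # one bad page should not abort the whole batch
            return url, None
        return url, (extract(html) if html is not None else None)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each job returns its own URL, so results go straight into a dict.
        return dict(pool.map(job, urls))


pages = {f"https://example.com/p/{i}": f'<span class="price">${i}.00</span>'
         for i in range(1, 4)}
results = bulk_extract([*pages, "https://example.com/missing"], pages.get, extract_price)
print(results)
```

Keeping `fetch` and `extract` as parameters means the same batch runner can apply any extraction pattern to any URL list.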
Deep scraping is the process of extracting data from multiple linked pages on a website, rather than just from a single page.
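A depth limit keeps deep scraping bounded: start from one page, extract its data, follow its links one level further, and stop at `max_depth`. The Python sketch below does this recursively and records each page's `<h1>` together with the depth at which it was found. The `fetch` callable, the `deep_scrape` name, and the regular-expression extraction (used only to keep the sketch short) are assumptions, not a specific tool's API.

```python
import re
from typing import Callable, Optional
from urllib.parse import urljoin

HREF_RE = re.compile(r'href="([^"]+)"')
H1_RE = re.compile(r"<h1>(.*?)</h1>", re.S)


def deep_scrape(url: str, fetch: Callable[[str], Optional[str]],
                max_depth: int = 2, depth: int = 0,
                seen: Optional[set[str]] = None) -> list[dict]:
    """Extract a record from `url`, then recurse into its links up to max_depth."""
    seen = set() if seen is None else seen
    if depth > max_depth or url in seen:
        return []
    seen.add(url)  # shared across calls so cycles are not revisited
    html = fetch(url)
    if html is None:
        return []
    heading = H1_RE.search(html)
    records = [{"url": url, "depth": depth,
                "title": heading.group(1).strip() if heading else None}]
    for href in HREF_RE.findall(html):
        records.extend(deep_scrape(urljoin(url, href), fetch, max_depth, depth + 1, seen))
    return records


site = {
    "https://example.com/": '<h1>Home</h1><a href="/cat">Category</a>',
    "https://example.com/cat": '<h1>Category</h1><a href="/item/1">1</a><a href="/item/2">2</a>',
    "https://example.com/item/1": '<h1>Item 1</h1><a href="/item/1/reviews">Reviews</a>',
    "https://example.com/item/2": "<h1>Item 2</h1>",
    "https://example.com/item/1/reviews": "<h1>Reviews</h1>",
}
for row in deep_scrape("https://example.com/", site.get, max_depth=2):
    print(row)
```

With `max_depth=2` the reviews page (depth 3) is not visited; raising the limit to 3 would include it.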
Detail page extraction is a web scraping technique that captures comprehensive information from individual item pages. It goes beyond list summaries to extract full descriptions, specifications, images, reviews, and detailed data.
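The sketch below shows detail page extraction for a single item page using Python's `html.parser`. It assumes a hypothetical page layout: the product name in `<h1>`, the price in `<span class="price">`, specifications as `<th>`/`<td>` table rows, and product photos as `<img>` tags. Real sites differ, so the field mapping is the part you would adapt per site; descriptions and reviews would be captured the same way.

```python
from html.parser import HTMLParser


class DetailPageParser(HTMLParser):
    """Pulls fields from one item page with an assumed, hypothetical layout."""

    def __init__(self) -> None:
        super().__init__()
        self.item = {"name": "", "price": "", "specs": {}, "images": []}
        self._field = None  # "name" or "price" while inside that element
        self._cell = None   # "th" or "td" while inside a table cell
        self._cell_text: list[str] = []
        self._spec_key = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h1":
            self._field = "name"
        elif tag == "span" and "price" in classes:
            self._field = "price"
        elif tag in ("th", "td"):
            self._cell, self._cell_text = tag, []
        elif tag == "img" and attrs.get("src"):
            self.item["images"].append(attrs["src"])

    def handle_endtag(self, tag):
        if tag in ("h1", "span"):
            self._field = None
        elif tag == "th":
            self._spec_key = "".join(self._cell_text).strip()
            self._cell = None
        elif tag == "td":
            if self._spec_key:  # pair this value with the preceding <th>
                self.item["specs"][self._spec_key] = "".join(self._cell_text).strip()
            self._spec_key, self._cell = None, None

    def handle_data(self, data):
        if self._field:
            self.item[self._field] += data
        elif self._cell:
            self._cell_text.append(data)


def extract_detail(html: str) -> dict:
    parser = DetailPageParser()
    parser.feed(html)
    parser.close()
    item = parser.item
    item["name"], item["price"] = item["name"].strip(), item["price"].strip()
    return item


page = ('<h1>Desk Lamp</h1><span class="price">$24.00</span>'
        '<img src="/img/lamp-1.jpg"><img src="/img/lamp-2.jpg">'
        '<table><tr><th>Color</th><td>Black</td></tr>'
        '<tr><th>Wattage</th><td>40 W</td></tr></table>')
print(extract_detail(page))
```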