Web scraping glossary

Getting started with web scraping? Learn the basic concepts and fundamentals at a glance.
Infinite scroll is a web design pattern that automatically loads new content as you scroll down a page, eliminating pagination. Learn how it works and its impact on web scraping.
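Behind an infinite-scroll page there is often an endpoint that returns one batch of items plus a cursor for the next batch. The sketch below assumes that pattern and replaces the network call with a hypothetical in-memory `fetch_batch` function; the feed data and cursor names are made up for illustration, and a real site may load content differently.

```python
from typing import Optional

# Hypothetical stand-in for a feed endpoint: maps a cursor to
# (items in this batch, cursor for the next batch or None at the end).
FAKE_FEED = {
    None: (["post-1", "post-2"], "c1"),
    "c1": (["post-3", "post-4"], "c2"),
    "c2": (["post-5"], None),
}


def fetch_batch(cursor: Optional[str]) -> tuple[list[str], Optional[str]]:
    """Simulate the request a page fires when the user scrolls down."""
    return FAKE_FEED[cursor]


def collect_all(max_batches: int = 100) -> list[str]:
    """Follow cursors until the feed is exhausted or the safety cap is hit."""
    items: list[str] = []
    cursor: Optional[str] = None
    for _ in range(max_batches):
        batch, cursor = fetch_batch(cursor)
        items.extend(batch)
        if cursor is None:  # no further batches to load
            break
    return items


all_items = collect_all()
```

The `max_batches` cap is a design choice for this sketch: a feed that never returns an empty cursor would otherwise loop forever.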
Pagination splits website content across multiple pages. When scraping, you need strategies to navigate through all pages and collect complete datasets instead of just the first page of results.
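One common strategy is to request page 1, 2, 3, and so on until a page comes back empty. The following is a minimal sketch of that loop; `fetch_page` and the `PAGES` data are hypothetical stand-ins for a real HTTP request and parse step, and some sites signal the last page differently (for example with a missing "next" link).

```python
# Hypothetical paginated listing: page number -> rows found on that page.
PAGES = {
    1: ["item-a", "item-b"],
    2: ["item-c", "item-d"],
    3: ["item-e"],
}


def fetch_page(page: int) -> list[str]:
    """Stand-in for downloading and parsing one results page."""
    return PAGES.get(page, [])


def scrape_all_pages(max_pages: int = 50) -> list[str]:
    """Walk pages in order and stop at the first empty one."""
    results: list[str] = []
    for page in range(1, max_pages + 1):
        rows = fetch_page(page)
        if not rows:  # an empty page is treated as the end of the listing
            break
        results.extend(rows)
    return results


dataset = scrape_all_pages()
```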
A REST API is an interface that lets two systems exchange information over the internet using standard HTTP methods such as GET and POST. It provides structured data access that is typically cleaner and more reliable than HTML scraping.
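To make the idea concrete, here is a sketch that builds (but does not send) a GET request for a REST resource using Python's standard library. The base URL and the `category` and `page` query parameters are assumptions for illustration, not a real service.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; substitute the API you are actually using.
BASE_URL = "https://api.example.com/v1/products"


def build_request(category: str, page: int) -> Request:
    """Build a GET request asking the API for JSON."""
    query = urlencode({"category": category, "page": page})
    return Request(
        f"{BASE_URL}?{query}",
        headers={"Accept": "application/json"},
        method="GET",
    )


req = build_request("books", 2)
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access.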
An API (application programming interface) is a set of rules that lets different software applications communicate and exchange data automatically, providing a structured alternative to web scraping.
CSV (Comma-Separated Values) is a plain text format that stores data in rows and columns. It's the most common way to export scraped web data because it's simple, universal, and works with virtually any tool.
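As an illustration, scraped rows can be written to CSV with Python's built-in `csv` module. The sample rows and field names below are made up; the sketch writes to an in-memory buffer, and a real export would more likely write to a file opened with `newline=""`.

```python
import csv
import io

# Made-up scraped rows; note the comma inside the first title.
rows = [
    {"title": "Widget, large", "price": "9.99"},
    {"title": "Gadget", "price": "19.50"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "price"])
writer.writeheader()    # first line: column names
writer.writerows(rows)  # values containing commas are quoted automatically

csv_text = buffer.getvalue()

# Reading the text back recovers the original rows.
round_trip = list(csv.DictReader(io.StringIO(csv_text)))
```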
XML is a text-based format for storing and transporting structured data. Learn how XML parsing techniques like XPath and DOM parsing power web scraping workflows.
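A small sketch of path-based XML extraction with Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath. The catalog document and its element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample document.
XML_FEED = """<catalog>
  <product id="p1"><name>Widget</name><price currency="USD">9.99</price></product>
  <product id="p2"><name>Gadget</name><price currency="EUR">19.50</price></product>
</catalog>"""

root = ET.fromstring(XML_FEED)

# Path expression: every <name> directly under a <product> child of the root.
names = [el.text for el in root.findall("./product/name")]

# Attribute predicate: <price> elements anywhere whose currency is USD.
usd_prices = [el.text for el in root.findall(".//price[@currency='USD']")]

# Tree traversal: read an attribute from each <product> element.
product_ids = [el.get("id") for el in root.iter("product")]
```

For full XPath 1.0 support, a third-party parser is usually needed; the standard library covers only simple paths and predicates like the ones above.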
JSON (JavaScript Object Notation) is a lightweight data format that organizes information into key-value pairs, making it easier to extract clean, structured data from websites without parsing complex HTML.
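For example, once a JSON payload has been obtained, the fields can be read directly with Python's `json` module instead of being located in markup. The payload below is a made-up sample.

```python
import json

# Made-up payload of the kind a site might return or embed in a page.
PAYLOAD = """
{
  "products": [
    {"name": "Widget", "price": 9.99, "tags": ["new"]},
    {"name": "Gadget", "price": 19.5, "tags": []}
  ]
}
"""

data = json.loads(PAYLOAD)  # JSON objects become dicts, arrays become lists

product_names = [product["name"] for product in data["products"]]
cheapest = min(data["products"], key=lambda product: product["price"])
```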
AJAX lets websites update content without page reloads, creating responsive user experiences. For web scraping, AJAX presents challenges because content loads asynchronously through JavaScript rather than appearing in initial HTML.
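The sketch below illustrates the problem with two invented strings: the initial HTML a scraper would download, and the JSON a background request might return afterwards. Parsing the HTML finds an empty results container, while the data lives in the JSON response. The markup, the `results` id, and the payload shape are all assumptions, and the small parser is deliberately simplified (it does not handle void tags such as `<br>` inside the container).

```python
import json
from html.parser import HTMLParser

# Invented example: what the server sends first, and what JavaScript fetches later.
INITIAL_HTML = '<html><body><div id="results"></div><script src="/app.js"></script></body></html>'
AJAX_RESPONSE = '{"results": [{"title": "First"}, {"title": "Second"}]}'


class ResultsText(HTMLParser):
    """Collect text found inside <div id="results">."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # nesting depth inside the results container
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif tag == "div" and dict(attrs).get("id") == "results":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.text.append(data.strip())


parser = ResultsText()
parser.feed(INITIAL_HTML)
html_titles = parser.text  # nothing to scrape in the initial HTML

ajax_titles = [item["title"] for item in json.loads(AJAX_RESPONSE)["results"]]
```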
A single page application loads once and updates content dynamically through JavaScript instead of loading new pages. This creates unique web scraping challenges because the data isn't in the initial HTML and requires JavaScript execution to appear.
Static content refers to web files delivered to your browser exactly as they're stored on the server, without any server-side processing or database queries. It's generally faster to serve, exposes less attack surface, and is easier to scrape than dynamic content.