Web Scraping Glossary

Infinite scroll

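Infinite scroll is a web design pattern that automatically loads new content as you scroll down a page, replacing traditional pagination. For web scraping, this means the initial HTML holds only the first batch of items. The rest arrives through background requests triggered by scrolling, so a scraper must either simulate scrolling in a browser or call those requests directly.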
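On many infinite-scroll pages, each scroll triggers a background request for the next batch, often identified by a cursor or offset. The minimal sketch below assumes such an endpoint exists. `fetch_batch` is a hypothetical stand-in for that request: it receives the current cursor and returns the batch of items plus the next cursor, or `None` when the feed is exhausted. The loop follows cursors the way repeated scrolling would, and caps the number of batches as a safety stop.

```python
from typing import Callable, Iterator, Optional

# Hypothetical callable: takes the current cursor (None for the first batch)
# and returns (items, next_cursor); next_cursor is None when the feed ends.
FetchBatch = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def scroll_all(fetch_batch: FetchBatch, max_batches: int = 100) -> Iterator[dict]:
    """Yield every item by following cursors, as repeated scrolling would."""
    cursor: Optional[str] = None
    for _ in range(max_batches):  # safety cap in case the feed never ends
        items, cursor = fetch_batch(cursor)
        yield from items
        if cursor is None:  # nothing left to load
            break


# Usage with a fake feed of 7 items served 3 at a time:
feed = [{"id": i} for i in range(7)]


def fake_fetch(cursor):
    start = int(cursor or 0)
    end = start + 3
    return feed[start:end], (str(end) if end < len(feed) else None)


ids = [item["id"] for item in scroll_all(fake_fetch)]  # [0, 1, 2, 3, 4, 5, 6]
```

In a real scraper, `fake_fetch` would be replaced by a call to the endpoint you observe in the browser's network panel.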

Pagination

Pagination splits website content across multiple pages. When scraping, you need strategies to navigate through all pages and collect complete datasets instead of just the first page of results.
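One common strategy is to increment a page-number query parameter until a page comes back empty. The sketch below assumes the site uses a parameter named `page`. Real sites may instead use `offset`, `start`, a "next" link, or a cursor. `fetch_page` is a hypothetical callable that downloads one URL and returns the items scraped from it.

```python
from typing import Callable, Iterator
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def page_url(base_url: str, page: int, param: str = "page") -> str:
    """Return base_url with its page query parameter set to `page`."""
    parts = urlparse(base_url)
    query = dict(parse_qsl(parts.query))  # keep any existing query parameters
    query[param] = str(page)
    return urlunparse(parts._replace(query=urlencode(query)))


def scrape_all_pages(
    base_url: str,
    fetch_page: Callable[[str], list],  # hypothetical: URL -> items on that page
    start: int = 1,
    max_pages: int = 500,  # safety cap against endless or looping pagination
) -> Iterator:
    """Walk pages start, start+1, ... until a page returns no items."""
    for page in range(start, start + max_pages):
        items = fetch_page(page_url(base_url, page))
        if not items:  # an empty page usually means we ran past the last one
            return
        yield from items
```

Stopping on the first empty page is the simplest rule. Sites that return an error or repeat the last page need a different stop condition, such as checking for a missing "next" link.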

REST API

A REST API is an interface that lets two systems exchange information over the internet by applying standard HTTP methods, such as GET, POST, PUT, and DELETE, to resource URLs. It typically returns structured data (often JSON), which is cleaner and more reliable to work with than scraped HTML.
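In REST style, the URL identifies a resource and the HTTP method identifies the action. The sketch below builds, but does not send, standard-library `urllib.request.Request` objects for a hypothetical `https://api.example.com/products` collection. The base URL and resource name are placeholders.

```python
import json
from typing import Optional
from urllib.request import Request

BASE = "https://api.example.com"  # hypothetical API root


def rest_request(method: str, resource: str, item_id: Optional[int] = None,
                 body: Optional[dict] = None) -> Request:
    """Build a REST call: the URL names the resource, the method names the action."""
    url = f"{BASE}/{resource}" + (f"/{item_id}" if item_id is not None else "")
    data = json.dumps(body).encode() if body is not None else None
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


# Typical mapping of actions to methods on one collection:
list_all = rest_request("GET", "products")                      # read the collection
read_one = rest_request("GET", "products", 42)                  # read one item
create = rest_request("POST", "products", body={"name": "Lamp"})  # add an item
delete_one = rest_request("DELETE", "products", 42)             # remove an item
# urllib.request.urlopen(read_one) would actually send the request.
```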

API (application programming interface)

An API (application programming interface) is a set of rules that lets different software applications communicate and exchange data automatically, providing a structured alternative to web scraping.
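Consuming an API usually comes down to requesting a URL and decoding the structured response. The sketch below is a minimal standard-library version that assumes an endpoint returning JSON. The URL in the usage comment is a placeholder, and many real APIs also require an API key or token header, which is not shown here.

```python
import json
from urllib.request import Request, urlopen


def get_json(url: str, timeout: float = 10.0):
    """Request `url`, asking for JSON, and return the decoded response body."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())  # json.loads accepts the raw bytes


# Usage (placeholder endpoint):
# users = get_json("https://api.example.com/users?page=1")
```

Compared with scraping, there is no HTML to parse. The response is already a Python dict or list.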

CSV

CSV (Comma-Separated Values) is a plain text format that stores tabular data as rows of comma-separated fields, one record per line. It's one of the most common formats for exporting scraped web data because it's simple, universal, and works with virtually any spreadsheet or analysis tool.
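A short sketch of exporting scraped records with Python's built-in `csv` module follows. The field names and sample rows are illustrative. `DictWriter` quotes values that contain commas or quotation marks, which is the main thing that goes wrong when CSV is assembled by hand with string joins.

```python
import csv
import io


def records_to_csv(records: list[dict], fieldnames: list[str]) -> str:
    """Serialize scraped records to CSV text with a header row."""
    buf = io.StringIO()
    # extrasaction="ignore" silently drops keys that are not in fieldnames.
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)  # fields with commas or quotes are quoted automatically
    return buf.getvalue()


rows = [
    {"title": "Desk lamp", "price": "19.99", "url": "https://example.com/p/1"},
    {"title": 'Chair, "ergonomic"', "price": "89.00", "url": "https://example.com/p/2"},
]
text = records_to_csv(rows, ["title", "price", "url"])
# To write a file instead, open it with newline="" as the csv docs recommend:
# with open("products.csv", "w", newline="", encoding="utf-8") as f: ...
```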

XML

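XML (Extensible Markup Language) is a text-based format for storing and transporting structured data as nested, tagged elements. In web scraping workflows, XML sources such as sitemaps and RSS feeds are typically parsed into a document tree (DOM parsing) and then queried with XPath expressions to extract specific elements and attributes.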
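Python's standard `xml.etree.ElementTree` module shows both steps. It parses a document into an element tree and supports a limited subset of XPath in `find()` and `findall()`. The catalog below is made-up sample data. Full XPath 1.0 requires a third-party library such as lxml.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <item id="1"><title>Desk lamp</title><price currency="USD">19.99</price></item>
  <item id="2"><title>Office chair</title><price currency="EUR">89.00</price></item>
</catalog>"""

root = ET.fromstring(CATALOG)  # parse the whole document into a tree of elements

# XPath-style paths (ElementTree supports only a subset of XPath):
titles = [t.text for t in root.findall("./item/title")]        # child paths
second_price = root.find(".//item[@id='2']/price")             # attribute predicate
eur_prices = root.findall(".//price[@currency='EUR']")         # search at any depth
```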

JSON

JSON (JavaScript Object Notation) is a lightweight data format that organizes information into key-value pairs and ordered lists. When a website delivers its data as JSON, for example through an API or a background request, a scraper can extract clean, structured data without parsing complex HTML.
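The sketch below parses a JSON payload with the standard `json` module and flattens nested values into rows. The payload is illustrative, not from any particular site. JSON objects become dicts, arrays become lists, and `null` becomes `None`. Using `.get()` with a fallback keeps the code from crashing when a nested object is missing.

```python
import json

# An illustrative response body, like one a site's API might return.
body = """{
  "products": [
    {"id": 1, "name": "Desk lamp", "price": {"amount": 19.99, "currency": "USD"}},
    {"id": 2, "name": "Office chair", "price": null}
  ],
  "next_page": null
}"""

data = json.loads(body)  # objects -> dict, arrays -> list, null -> None

rows = []
for product in data["products"]:
    price = product.get("price") or {}  # tolerate a missing or null nested object
    rows.append({
        "id": product["id"],
        "name": product["name"],
        "amount": price.get("amount"),
        "currency": price.get("currency"),
    })
```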

AJAX

AJAX (Asynchronous JavaScript and XML) lets websites update content without reloading the page, creating responsive user experiences. For web scraping, AJAX presents challenges because content loads asynchronously through JavaScript rather than appearing in the initial HTML, so a plain HTTP request for the page may return incomplete data.
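A common workaround is to open the browser's developer tools, find the background (XHR/fetch) request in the Network tab that delivers the content, and request that endpoint directly. The sketch below only builds such a request. The endpoint, parameters, and referer are hypothetical. Whether a server actually checks headers like `X-Requested-With` or `Referer` varies by site.

```python
from urllib.parse import urlencode
from urllib.request import Request


def ajax_request(endpoint: str, params: dict, referer: str) -> Request:
    """Build a GET request that resembles the page's own background (XHR) call."""
    url = f"{endpoint}?{urlencode(params)}"
    return Request(url, headers={
        "Accept": "application/json, text/javascript, */*; q=0.01",
        # jQuery's $.ajax adds this for same-origin requests; native fetch() does not.
        "X-Requested-With": "XMLHttpRequest",
        "Referer": referer,  # the page that would normally issue the call
    })


req = ajax_request(
    "https://example.com/api/comments",  # hypothetical endpoint spotted in DevTools
    {"post_id": 7, "offset": 20},
    referer="https://example.com/posts/7",
)
# urllib.request.urlopen(req) would send it; the body is often JSON or an HTML fragment.
```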

Single page application (SPA)

A single page application loads once and then updates its content dynamically through JavaScript instead of loading new pages. This complicates web scraping because the data usually isn't in the initial HTML and appears only after JavaScript runs.
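Before reaching for a headless browser, check whether the server already embeds the app's initial data in the HTML. Some frameworks do this. Next.js, for example, places it in a `<script id="__NEXT_DATA__" type="application/json">` tag. The sketch below uses the standard-library `HTMLParser` to pull out a script by id and decode it as JSON. The default id is only an example. If a page embeds no such state, running its JavaScript, for instance in a headless browser, is still required.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class ScriptTextExtractor(HTMLParser):
    """Collect the text of the <script> element whose id equals target_id."""

    def __init__(self, target_id: str):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.inside = False

    def handle_data(self, data):
        if self.inside:  # script bodies arrive here as raw text
            self.chunks.append(data)


def embedded_state(html: str, script_id: str = "__NEXT_DATA__") -> Optional[dict]:
    """Return the JSON embedded in the matching script tag, or None if absent."""
    parser = ScriptTextExtractor(script_id)
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    return json.loads(raw) if raw else None
```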

Static content

Static content refers to web files delivered to your browser exactly as they're stored on the server, without server-side processing or database queries. It's generally faster to serve, more secure, and easier to scrape than dynamic content, because everything is already present in the HTML the server returns.
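Because a static page already contains its content in the returned HTML, a plain HTTP fetch plus an HTML parser is usually enough, with no JavaScript execution needed. The sketch below uses the standard library to collect every link from an inline sample page. The page and base URL are illustrative. In practice, the HTML string would come from something like `urllib.request.urlopen(url).read().decode()`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from every <a href="..."> in a static HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href
                self.links.append(urljoin(self.base_url, href))  # resolve relative links


html = """<html><body>
  <h1>Products</h1>
  <a href="/p/1">Desk lamp</a>
  <a href="https://example.com/p/2">Office chair</a>
  <a>No link</a>
</body></html>"""

parser = LinkExtractor("https://example.com/catalog")
parser.feed(html)
parser.close()
```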