Web scraping glossary

Getting started with web scraping? Learn the basic concepts and fundamentals at a glance.
CSV (Comma-Separated Values) is a plain text format that stores tabular data as rows of comma-delimited fields. It's one of the most common ways to export scraped web data because it's simple, universal, and works with virtually any tool.
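As a rough illustration of exporting scraped records to CSV, here is a minimal sketch using Python's standard `csv` module. The records and the field names `title` and `price` are made up for the example; note how the writer quotes a value that itself contains a comma.

```python
import csv
import io

# Hypothetical scraped records; the field names are assumptions for illustration.
rows = [
    {"title": "Widget, large", "price": "19.99"},
    {"title": "Gadget", "price": "5.00"},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["title", "price"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```

To write to disk instead of a string, the same writer can be pointed at a file opened with `newline=""`, which is what the `csv` module documentation recommends.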
XML is a text-based format for storing and transporting structured data. Learn how XML parsing techniques like XPath and DOM parsing power web scraping workflows.
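To make DOM-style parsing concrete, the sketch below walks a small XML document with Python's standard `xml.etree.ElementTree`. The `catalog` and `product` element names and the sample values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML feed; element and attribute names are assumptions.
xml_text = """<catalog>
  <product id="1"><name>Widget</name><price>19.99</price></product>
  <product id="2"><name>Gadget</name><price>5.00</price></product>
</catalog>"""


def parse_products(text):
    """Parse the document into a tree and read each <product> element."""
    root = ET.fromstring(text)
    return [
        {
            "id": product.get("id"),  # attribute value
            "name": product.findtext("name"),  # text of a child element
            "price": float(product.findtext("price")),
        }
        for product in root.iter("product")
    ]


products = parse_products(xml_text)
print(products)
```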
JSON (JavaScript Object Notation) is a lightweight data format that organizes information into key-value pairs, making it easier to extract clean, structured data from websites without parsing complex HTML.
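A short sketch of why JSON is convenient for scraping: once the text is decoded, values are reached by key rather than by parsing markup. The payload below is a made-up example of what a site's data endpoint might return.

```python
import json

# Hypothetical response body; the keys are assumptions for illustration.
payload = """
{
  "products": [
    {"name": "Widget", "price": 19.99, "in_stock": true},
    {"name": "Gadget", "price": 5.0, "in_stock": false}
  ]
}
"""

data = json.loads(payload)  # JSON objects become dicts, arrays become lists

# Fields are addressed directly by key, with no HTML parsing involved.
in_stock_names = [p["name"] for p in data["products"] if p["in_stock"]]
print(in_stock_names)
```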
AJAX lets websites update content without page reloads, creating responsive user experiences. For web scraping, AJAX presents challenges because content loads asynchronously through JavaScript rather than appearing in initial HTML.
A single-page application (SPA) loads once and updates content dynamically through JavaScript instead of loading new pages. This creates unique web scraping challenges because the data isn't in the initial HTML and requires JavaScript execution to appear.
Static content refers to web files delivered to your browser exactly as they're stored on the server, without any processing or database queries. It's generally faster to serve, more secure, and easier to scrape than dynamic content.
Dynamic content loads after the initial HTML response using JavaScript. It creates challenges for web scraping because scrapers that only fetch the raw HTML capture the basic skeleton and miss most of the actual data.
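The sketch below illustrates the "skeleton" problem with Python's standard `html.parser`. The HTML string stands in for a hypothetical initial response in which an empty `<div id="app">` is filled in later by a script; because nothing here executes JavaScript, only the static heading is found.

```python
from html.parser import HTMLParser

# Hypothetical initial response: the product data is injected later by
# /bundle.js, so it is absent from this markup.
initial_html = """<html><body>
  <h1>Products</h1>
  <div id="app"></div>
  <script src="/bundle.js"></script>
</body></html>"""


class TextCollector(HTMLParser):
    """Collect text nodes, skipping anything inside <script> tags."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        text = data.strip()
        if text and not self.in_script:
            self.chunks.append(text)


collector = TextCollector()
collector.feed(initial_html)
visible_text = collector.chunks
print(visible_text)  # only the heading; the product list never appears
```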
JavaScript makes websites interactive by running code directly in your browser. For web scraping, it creates challenges because many sites use JavaScript to load content dynamically, requiring special tools like headless browsers to extract data properly.
An HTTP request is a message sent to a web server asking for specific information. Learn how HTTP requests work in web scraping, including methods like GET and POST, essential headers, request bodies, and query parameters.
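To show how the method, headers, query parameters, and body fit together, here is a minimal sketch that builds (but does not send) a GET and a POST request with Python's standard `urllib`. The URL, parameter names, and header values are placeholders chosen for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://example.com/search"  # placeholder URL, not a real API

# GET: parameters travel in the query string appended to the URL.
get_request = Request(
    BASE_URL + "?" + urlencode({"q": "laptops", "page": 2}),
    headers={"User-Agent": "glossary-example/0.1", "Accept": "text/html"},
    method="GET",
)

# POST: parameters travel in the request body, described by Content-Type.
post_request = Request(
    BASE_URL,
    data=json.dumps({"q": "laptops"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

Passing either object to `urllib.request.urlopen` would actually send it; that step is left out here so the example runs without network access.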
XPath is a query language that lets you navigate and extract data from HTML and XML documents by specifying paths to elements. It's one of the most powerful tools for web scraping because it enables precise targeting of specific elements.
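As an illustration of path-based targeting, the sketch below runs two XPath expressions against a small, well-formed sample page. It uses Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath; real-world HTML is often not well-formed, and a library with full XPath 1.0 support such as lxml is the more usual choice. The markup and class names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment; class names are assumptions.
html_text = """<html><body>
  <div class="product"><h2>Widget</h2><span class="price">19.99</span></div>
  <div class="product"><h2>Gadget</h2><span class="price">5.00</span></div>
  <div class="ad"><h2>Sponsored</h2></div>
</body></html>"""

root = ET.fromstring(html_text)

# Any <div> with class="product", then its <h2> child.
titles = [h2.text for h2 in root.findall(".//div[@class='product']/h2")]

# Same containers, but only <span> children with class="price".
prices = [
    span.text
    for span in root.findall(".//div[@class='product']/span[@class='price']")
]

print(titles)
print(prices)
```

The attribute predicate is what keeps the "Sponsored" heading out of the results, even though it is also an `<h2>` inside a `<div>`.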