What is an HTTP request?
An HTTP request is a message your browser or scraping tool sends to a web server asking for specific information. Think of it as knocking on a door and asking for something. You specify what you want, how you want it, and provide any credentials or context the server needs to respond properly.
Every time you load a webpage or pull data from a website, your tool creates an HTTP request behind the scenes. The server receives this request, processes it, and sends back an HTTP response with the data you asked for.
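You can watch this exchange happen in a few lines of code. Here's a minimal sketch in Python using the popular requests library, with a placeholder URL:

    import requests

    # Send an HTTP request and receive the server's HTTP response
    response = requests.get("https://example.com/products")

    print(response.status_code)  # e.g. 200 when the request succeeds
    print(response.text[:200])   # the first 200 characters of the returned HTML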
HTTP request structure
Every HTTP request contains three main parts that tell the server what you need; the example after this list shows how they fit together:
Request line: This is the first line that includes the HTTP method (like GET or POST), the path to the resource you want (like /products/shoes), and the HTTP version (usually HTTP/1.1; HTTP/2 and later carry the same information in binary frames rather than a text line). For example: GET /api/users HTTP/1.1.
Headers: These are additional pieces of information about your request, formatted as key-value pairs. Headers tell the server things like what browser you're using, what data formats you accept, and any authentication tokens you have. Common headers include User-Agent, Accept, and Authorization.
Body: This optional section contains data you're sending to the server. Not every request needs a body. GET requests typically don't have one, while POST requests usually do when you're submitting forms or sending data to APIs.
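Put together, a request is just structured text sent over the connection. Here's a hypothetical POST request with all three parts, in HTTP/1.1's text format:

    POST /api/search HTTP/1.1
    Host: example.com
    User-Agent: Mozilla/5.0
    Content-Type: application/json
    Content-Length: 26

    {"query": "running shoes"}

The first line is the request line, the key-value lines after it are headers, and everything below the blank line is the body.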
HTTP methods you'll use
GET: This is the most common method for web scraping. When you use GET, you're asking the server to send you a specific resource. GET requests don't include a body because you're just retrieving information, not sending it. Most web scraping pulls HTML pages using GET requests.
POST: You use POST when you need to send data to the server. This method includes a body with your data, making it essential when you're scraping websites that require form submissions, search queries, or login credentials. Many APIs also require POST requests to retrieve specific data.
PUT and PATCH: These methods change existing resources on the server: PUT replaces a resource outright, while PATCH applies a partial update. You'll rarely use them for traditional web scraping, but they come up when working with APIs that let you modify data, as in the sketch below.
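In Python's requests library, each method maps to a function of the same name. A quick sketch, with made-up URLs and payloads for illustration:

    import requests

    # GET: retrieve a resource; the request carries no body
    page = requests.get("https://example.com/products/shoes")

    # POST: send data in the request body (a search form, login, etc.)
    results = requests.post("https://example.com/search",
                            data={"query": "laptops"})

    # PATCH: partially update an existing resource through an API
    updated = requests.patch("https://api.example.com/items/42",
                             json={"price": 19.99})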
Request headers that matter for scraping
User-Agent: This identifies what's making the request. Websites check this header to see if you're a real browser or a bot. When scraping, you should set this to look like a legitimate browser, or some sites will block your requests.
Authorization: This carries your credentials like API keys or tokens. Many sites require valid authorization headers before they'll send you data.
Accept: This tells the server what format you want the response in. You might specify text/html for webpages or application/json for API data.
Content-Type: When your request has a body, this header specifies the data format you're sending. Common values include application/json or application/x-www-form-urlencoded for form data.
Cookie: This contains session information and preferences. When scraping sites that require login, you need to manage cookies across multiple requests to maintain your session, as in the sketch after this list.
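With the requests library, you pass headers as a dictionary, and a Session object handles cookies for you. A sketch with placeholder values throughout:

    import requests

    # Placeholder values; a real scraper would use its own token and browser string
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",  # look like a real browser
        "Accept": "text/html",                 # ask the server for HTML
        "Authorization": "Bearer YOUR_TOKEN",  # hypothetical API token
    }

    # A Session stores cookies from each response and sends them back
    # automatically, which keeps you logged in across requests
    session = requests.Session()
    session.post("https://example.com/login",
                 data={"email": "user@example.com", "password": "secret"})
    account_page = session.get("https://example.com/account", headers=headers)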
Query parameters in URLs
Query parameters are key-value pairs you add to the end of a URL, starting with a question mark. They let you pass information without using the request body. For example: https://example.com/search?query=shoes&page=2&sort=price.
When scraping, query parameters help you target specific data by specifying search terms, pagination, filters, or sorting options. You can change these parameters to scrape different pages or data subsets without altering the base URL.
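Libraries can build the query string for you, so you don't have to escape values by hand. A minimal Python sketch with the requests library, using the example URL above:

    import requests

    params = {"query": "shoes", "page": 2, "sort": "price"}
    response = requests.get("https://example.com/search", params=params)

    # requests encodes the parameters into the final URL:
    print(response.url)  # https://example.com/search?query=shoes&page=2&sort=price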
Request body for sending data
The request body carries information to the server when you need to send data, not just request it. Only POST, PUT, and PATCH requests typically include bodies.
When scraping interactive sites, you'll use POST request bodies to submit forms, send search parameters, or authenticate. The body might contain form data like email=user@example.com&password=secret or JSON like {"search": "laptops", "price_max": 1000}.
The format depends on your Content-Type header. Match the format the website expects, or your request will fail.
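In the requests library, the argument you choose determines both the body format and the Content-Type header. A brief sketch with placeholder endpoints and credentials:

    import requests

    # data= sends application/x-www-form-urlencoded, like an HTML form
    requests.post("https://example.com/login",
                  data={"email": "user@example.com", "password": "secret"})

    # json= serializes the payload and sets Content-Type: application/json
    requests.post("https://example.com/api/search",
                  json={"search": "laptops", "price_max": 1000})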
How Browse AI handles HTTP requests
When you build a scraper with Browse AI, you don't need to worry about constructing HTTP requests manually. The platform handles all the technical details like setting proper headers, managing cookies, and sending the right request methods automatically.
Browse AI's no-code interface lets you extract data by simply showing it what you want, while the platform manages the underlying HTTP requests, handles dynamic content loading, and navigates through pagination. This means you get all the benefits of properly formatted HTTP requests without writing a single line of code.