Airbnb is an online marketplace that connects people who want to rent out their homes with people who are looking for accommodations in specific locales.
In this article, you will read about the easiest way to scrape Airbnb listings with Page2API.
You will find code examples for Ruby, Python, PHP, NodeJS, cURL, and a No-Code solution that will import Airbnb listings into Google Sheets.
You can scrape Airbnb data, including amenities, prices, descriptions, photos, and URLs, to perform:
To start scraping Airbnb, you will need the following things:
First, we need to open the Airbnb search page with the desired location.
https://www.airbnb.com/s/Amsterdam--Netherlands/homes
The page we see should look similar to the following one:
From the search page, we will scrape the following attributes:
/* Parent: */
[itemprop=itemListElement]
/* URL: */
a[target*='listing_']
/* Title: */
[id*='title']
/* Name: */
[itemprop=name]
/* Beds: */
[aria-label*='bed']
/* Rating: */
[aria-label*='rating']
/* Price: */
[style*='--pricing'] > div > span > div > span
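Each selector above ends up as one entry in the request's parse section, written as `selector >> attribute`. A minimal Ruby sketch of that mapping follows. The `FIELDS` table and the `parse_schema` helper are illustrative names, not part of Page2API, and the attribute chosen for each field mirrors the request used later in this article.

```ruby
# Field name => [CSS selector, attribute to extract].
# FIELDS, PARENT and parse_schema are hypothetical names used for illustration.
FIELDS = {
  url:    ["a[target*='listing_']", "href"],
  title:  ["[id*='title']", "text"],
  name:   ["[itemprop=name]", "content"],
  beds:   ["[aria-label*='bed']", "text"],
  rating: ["[aria-label*='rating']", "text"],
  price:  ["[style*='--pricing'] > div > span > div > span", "text"]
}.freeze

PARENT = "[itemprop=itemListElement]"

# Builds the "parse" section: one item schema, scoped to the parent selector.
def parse_schema(fields, parent)
  item = fields.to_h do |name, (selector, attribute)|
    [name, "#{selector} >> #{attribute}"]
  end
  { listings: [item.merge(_parent: parent)] }
end

schema = parse_schema(FIELDS, PARENT)
puts schema[:listings].first[:url]
```

Keeping the selectors in one table makes it easier to update them in a single place if Airbnb changes its markup.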
Now, let's handle the pagination.
// Page 1
https://www.airbnb.com/s/Amsterdam--Netherlands/homes?pagination_search=true&items_offset=0
// Page 2
https://www.airbnb.com/s/Amsterdam--Netherlands/homes?pagination_search=true&items_offset=20
// Page 3
https://www.airbnb.com/s/Amsterdam--Netherlands/homes?pagination_search=true&items_offset=40
This looks like a great scenario to use the batch scraping approach.
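The pagination pattern above can be sketched in Ruby as follows. The page size of 20 is an assumption inferred from the offsets 0, 20 and 40, and the `[start, end, step]` reading of the batch range is inferred from the request used in this article. `page_url` and `batch_pattern` are hypothetical helper names.

```ruby
require 'uri'

SEARCH_URL = "https://www.airbnb.com/s/Amsterdam--Netherlands/homes"
PAGE_SIZE  = 20 # assumption: inferred from the offsets 0, 20, 40

# URL of a single results page (pages are numbered from 1).
def page_url(page, base: SEARCH_URL, page_size: PAGE_SIZE)
  raise ArgumentError, "page must be >= 1" if page < 1

  query = URI.encode_www_form({ pagination_search: true, items_offset: (page - 1) * page_size })
  "#{base}?#{query}"
end

# Range pattern for a batch request, assuming [start, end, step] notation.
def batch_pattern(pages, base: SEARCH_URL, page_size: PAGE_SIZE)
  raise ArgumentError, "pages must be >= 1" if pages < 1

  last_offset = (pages - 1) * page_size
  "#{base}?pagination_search=true&items_offset=[0, #{last_offset}, #{page_size}]"
end

urls = (1..3).map { |page| page_url(page) }
puts urls
puts batch_pattern(3)
```

For three pages, `batch_pattern(3)` produces a URL ending in `items_offset=[0, 40, 20]`.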
Now let's build the request that will scrape all listings that the search page returned.
The following examples show how to scrape three pages of listings from Airbnb.com.
{
  "api_key": "YOUR_PAGE2API_KEY",
  "batch": {
    "urls": "https://www.airbnb.com/s/Amsterdam--Netherlands/homes?pagination_search=true&items_offset=[0, 40, 20]",
    "concurrency": 1,
    "merge_results": true
  },
  "parse": {
    "listings": [
      {
        "url": "a[target*='listing_'] >> href",
        "beds": "[aria-label*='bed'] >> text",
        "name": "[itemprop=name] >> content",
        "price": "[style*='--pricing'] > div > span > div > span >> text",
        "title": "[id*='title'] >> text",
        "rating": "[aria-label*='rating'] >> text",
        "_parent": "[itemprop=itemListElement]"
      }
    ]
  },
  "wait_for": "[itemprop=itemListElement]",
  "real_browser": true,
  "premium_proxy": "us"
}
require 'rest_client'
require 'json'

api_url = "https://www.page2api.com/api/v1/scrape"

payload = {
  api_key: "YOUR_PAGE2API_KEY",
  batch: {
    urls: "https://www.airbnb.com/s/Amsterdam--Netherlands/homes?pagination_search=true&items_offset=[0, 40, 20]",
    concurrency: 1,
    merge_results: true
  },
  parse: {
    listings: [
      {
        url: "a[target*='listing_'] >> href",
        beds: "[aria-label*='bed'] >> text",
        name: "[itemprop=name] >> content",
        price: "[style*='--pricing'] > div > span > div > span >> text",
        title: "[id*='title'] >> text",
        rating: "[aria-label*='rating'] >> text",
        _parent: "[itemprop=itemListElement]"
      }
    ]
  },
  wait_for: "[itemprop=itemListElement]",
  real_browser: true,
  premium_proxy: "us"
}

response = RestClient::Request.execute(
  method: :post,
  payload: payload.to_json,
  url: api_url,
  headers: { "Content-type" => "application/json" },
).body

result = JSON.parse(response)
puts(result)
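If you would rather avoid the rest-client dependency, the same request can be sent with Ruby's standard library. The sketch below swaps in `Net::HTTP` and adds a basic status check. `ScrapeError`, `scrape` and `listings_from` are hypothetical names, and the response shape (`result` containing `listings`) is taken from the sample response in this article.

```ruby
require 'json'
require 'net/http'
require 'uri'

API_URL = URI("https://www.page2api.com/api/v1/scrape")

# Raised when the API answers with a non-2xx status (hypothetical error class).
class ScrapeError < StandardError; end

# Posts a payload hash as JSON and returns the parsed response body.
def scrape(payload, api_url: API_URL)
  response = Net::HTTP.post(api_url, payload.to_json, "Content-Type" => "application/json")
  unless response.is_a?(Net::HTTPSuccess)
    raise ScrapeError, "HTTP #{response.code}: #{response.body.to_s[0, 200]}"
  end

  JSON.parse(response.body)
end

# Returns the listings array, or an empty array when the key is missing.
def listings_from(result)
  result.dig("result", "listings") || []
end

# Local demonstration with a hand-written response; no network call is made.
sample = { "result" => { "listings" => [{ "name" => "Amsterdam city garden" }] } }
puts listings_from(sample).length

# With a real key, the call would look like this:
# result = scrape(payload)
# listings_from(result).each { |listing| puts listing["name"] }
```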
{
  "result": {
    "listings": [
      {
        "url": "https://www.airbnb.com/rooms/3982031?adults=1&children=0&infants=0&check_in=2022-08-20&check_out=2022-08-27&previous_page_section_name=1000&federated_search_id=b79f5e55-f2f6-40db-baae-db124484bd1e",
        "beds": "1 double bed",
        "name": "Top location, quiet guesthouse, 2p",
        "price": "$113",
        "title": "Guesthouse in Stadsdeel Centrum",
        "rating": "4.93 (147)"
      },
      {
        "url": "https://www.airbnb.com/rooms/51760555?adults=1&children=0&infants=0&check_in=2022-11-02&check_out=2022-11-09&previous_page_section_name=1000&federated_search_id=b79f5e55-f2f6-40db-baae-db124484bd1e",
        "beds": "1 double bed",
        "name": "Amsterdam city garden",
        "price": "$87",
        "title": "Apartment in Stadsdeel West",
        "rating": "4.88 (8)"
      },
      ...
    ]
  }, ...
}
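All values in the response are strings, so some post-processing is usually needed before analysis. The following is a minimal sketch, assuming the formats seen in the sample response ("$113" for the price, "4.93 (147)" for the rating). `normalize_listing` is a hypothetical helper, and the sample URL is shortened for readability.

```ruby
require 'uri'

# Converts one scraped listing (string values) into typed fields.
# Assumes the price is shown in whole currency units, e.g. "$113".
def normalize_listing(listing)
  uri = URI.parse(listing.fetch("url"))
  rating, reviews = listing["rating"].to_s.match(/\A([\d.]+)\s*\((\d+)\)/)&.captures

  {
    room_id: uri.path[%r{/rooms/(\d+)}, 1],
    url:     "#{uri.scheme}://#{uri.host}#{uri.path}", # query string dropped
    name:    listing["name"],
    price:   listing["price"].to_s[/\d[\d,]*/]&.delete(",")&.to_i,
    rating:  rating&.to_f,
    reviews: reviews&.to_i
  }
end

sample = {
  "url"    => "https://www.airbnb.com/rooms/3982031?adults=1&children=0&infants=0",
  "name"   => "Top location, quiet guesthouse, 2p",
  "price"  => "$113",
  "rating" => "4.93 (147)"
}

listing = normalize_listing(sample)
puts listing
```

Dropping the query string leaves a stable URL per room, which is convenient for de-duplicating listings that appear on more than one page.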
From the 'Search' page, we click on any listing.
https://www.airbnb.com/rooms/3163509
The page will look like the following one:
From this page, we will scrape the following attributes:
/* Title: */
h1
/* Superhost: */
//span[contains(text(),'Superhost')]
/* Guests: */
//span[contains(text(),'guests')]
/* Bedrooms: */
//span[contains(text(),'bedrooms')]
/* Beds: */
//span[contains(text(),'beds')]
/* Baths: */
//span[contains(text(),'baths')]
/* Price: */
[style*='--pricing'] > div > span > div > span
/* Images: */
picture img
/* Reviews / Rating */
// we will encode it to base64 later
var ratingObject = document.querySelector("[aria-label*='Rated']").attributes['aria-label'].nodeValue.match(/Rated (?<rating>[\d?\.]+).+ from\s(?<reviews>[\d]+)/).groups;
// reviews js selector
ratingObject['reviews']
// rating js selector
ratingObject['rating']
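The JavaScript snippet has to be Base64-encoded before it is sent in the request. A minimal Ruby sketch of that step follows, using the snippet with its two named groups, `rating` and `reviews`. It should reproduce the `execute_js` value used in the request in this article. The sample label text in the second half is an assumed example, not copied from Airbnb.

```ruby
# The JavaScript snippet for the rating object. The single-quoted heredoc
# keeps backslashes untouched, so the regex reaches the browser unchanged.
js = <<~'JS'
  var ratingObject = document.querySelector("[aria-label*='Rated']").attributes['aria-label'].nodeValue.match(/Rated (?<rating>[\d?\.]+).+ from\s(?<reviews>[\d]+)/).groups;
JS

# "m0" packs as Base64 without line breaks (same output as Base64.strict_encode64).
encoded = [js].pack("m0")
puts encoded

# The same pattern is valid as a Ruby regex, which makes it easy to try locally.
# Assumed label text for illustration only.
label = "Rated 4.95 out of 5 from 42 reviews."
match = label.match(/Rated (?<rating>[\d?\.]+).+ from\s(?<reviews>[\d]+)/)
puts match[:rating]
puts match[:reviews]
```

Trying the pattern locally first helps catch regex mistakes before they fail silently inside the browser session.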
{
  "api_key": "YOUR_PAGE2API_KEY",
  "url": "https://www.airbnb.com/rooms/3163509",
  "parse": {
    "title": "h1 >> text",
    "superhost": "//span[contains(text(),'Superhost')] >> text",
    "guests": "//span[contains(text(),'guests')] >> text",
    "bedrooms": "//span[contains(text(),'bedrooms')] >> text",
    "beds": "//span[contains(text(),'beds')] >> text",
    "baths": "//span[contains(text(),'baths')] >> text",
    "reviews": "js >> ratingObject['reviews']",
    "rating": "js >> ratingObject['rating']",
    "price": "[style*='--pricing'] > div > span > div > span >> text",
    "images": ["picture img >> data-original-uri"]
  },
  "scenario": [
    { "wait": 3 },
    {
      "execute_js": "dmFyIHJhdGluZ09iamVjdCA9IGRvY3VtZW50LnF1ZXJ5U2VsZWN0b3IoIlthcmlhLWxhYmVsKj0nUmF0ZWQnXSIpLmF0dHJpYnV0ZXNbJ2FyaWEtbGFiZWwnXS5ub2RlVmFsdWUubWF0Y2goL1JhdGVkICg/PHJhdGluZz5bXGQ/XC5dKykuKyBmcm9tXHMoPzxyZXZpZXdzPltcZF0rKS8pLmdyb3VwczsK"
    },
    { "execute": "parse" }
  ],
  "premium_proxy": "us",
  "real_browser": true
}
require 'rest_client'
require 'json'

api_url = "https://www.page2api.com/api/v1/scrape"

payload = {
  api_key: 'YOUR_PAGE2API_KEY',
  url: "https://www.airbnb.com/rooms/3163509",
  parse: {
    title: "h1 >> text",
    superhost: "//span[contains(text(),'Superhost')] >> text",
    guests: "//span[contains(text(),'guests')] >> text",
    bedrooms: "//span[contains(text(),'bedrooms')] >> text",
    beds: "//span[contains(text(),'beds')] >> text",
    baths: "//span[contains(text(),'baths')] >> text",
    reviews: "js >> ratingObject['reviews']",
    rating: "js >> ratingObject['rating']",
    price: "[style*='--pricing'] > div > span > div > span >> text",
    images: ["picture img >> data-original-uri"]
  },
  scenario: [
    { wait: 3 },
    {
      execute_js: "dmFyIHJhdGluZ09iamVjdCA9IGRvY3VtZW50LnF1ZXJ5U2VsZWN0b3IoIlthcmlhLWxhYmVsKj0nUmF0ZWQnXSIpLmF0dHJpYnV0ZXNbJ2FyaWEtbGFiZWwnXS5ub2RlVmFsdWUubWF0Y2goL1JhdGVkICg/PHJhdGluZz5bXGQ/XC5dKykuKyBmcm9tXHMoPzxyZXZpZXdzPltcZF0rKS8pLmdyb3VwczsK"
    },
    { execute: "parse" }
  ],
  premium_proxy: "us",
  real_browser: true
}

response = RestClient::Request.execute(
  method: :post,
  payload: payload.to_json,
  url: api_url,
  headers: { "Content-type" => "application/json" },
).body

result = JSON.parse(response)
puts(result)
{
  "result": {
    "title": "Family Houseboat in City Center",
    "superhost": "Superhost",
    "guests": "2 guests",
    "bedrooms": "2 bedrooms",
    "beds": "3 beds",
    "baths": "1.5 baths",
    "reviews": "42",
    "rating": "4.95",
    "price": "$138",
    "images": [
      "https://a0.muscache.com/pictures/40535051/7c49b7a2_original.jpg",
      "https://a0.muscache.com/pictures/40309400/2ed629ab_original.jpg",
      "https://a0.muscache.com/pictures/40309387/abf52119_original.jpg",
      "https://a0.muscache.com/pictures/miso/Hosting-3163509/original/871a0673-e4ea-4afa-a90d-c3fa946ab491.jpeg",
      "https://a0.muscache.com/pictures/miso/Hosting-3163509/original/d2fb3e01-a9e4-4374-b722-9d6952356734.jpeg"
    ]
  }
}
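The listing fields also arrive as text ("2 guests", "1.5 baths"), so a small conversion step helps before storing them. Below is a minimal sketch based on the sample response above. `leading_number` and `normalize_room` are hypothetical helpers. Treating a missing or empty `superhost` value as "not a Superhost" is an assumption.

```ruby
# Extracts the first number from strings such as "2 guests" or "1.5 baths".
def leading_number(text)
  value = text.to_s[/\d+(?:\.\d+)?/]
  return nil if value.nil?

  value.include?(".") ? value.to_f : value.to_i
end

# Converts the "result" hash of a listing response into typed fields.
def normalize_room(result)
  {
    title:     result["title"],
    superhost: !result["superhost"].to_s.empty?, # assumption: blank means no badge
    guests:    leading_number(result["guests"]),
    bedrooms:  leading_number(result["bedrooms"]),
    beds:      leading_number(result["beds"]),
    baths:     leading_number(result["baths"]),
    reviews:   leading_number(result["reviews"]),
    rating:    leading_number(result["rating"]),
    price:     leading_number(result["price"].to_s.delete(",")),
    images:    Array(result["images"])
  }
end

sample = {
  "title" => "Family Houseboat in City Center",
  "superhost" => "Superhost",
  "guests" => "2 guests",
  "bedrooms" => "2 bedrooms",
  "beds" => "3 beds",
  "baths" => "1.5 baths",
  "reviews" => "42",
  "rating" => "4.95",
  "price" => "$138",
  "images" => ["https://a0.muscache.com/pictures/40535051/7c49b7a2_original.jpg"]
}

room = normalize_room(sample)
puts room
```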
"raw": {
"key": "listings", "format": "csv"
}
The URL with encoded payload will be:
Note: If you are logged in while reading this article, you can copy the link above, since it will already have your api_key in the encoded payload.
The result should look like the following one:
Collecting Airbnb data manually can be a bit overwhelming and hard to scale.
However, a Web Scraping API can easily help you overcome this challenge and scrape the data in no time.
With Page2API you can quickly get access to the data you need, and use the time you saved on more important things!