How to Scrape Airbnb Data: Pricing, Ratings, Amenities (Code & No code)


July 27, 2022 - 5 min read

Nicolae Rotaru

Introduction

Airbnb is an online marketplace that connects people who want to rent out their homes with people who are looking for accommodations in specific locales.


In this article, you will read about the easiest way to scrape Airbnb listings with Page2API.


You will find code examples for Ruby, Python, PHP, NodeJS, cURL, and a No-Code solution that will import Airbnb listings into Google Sheets.


You can scrape Airbnb data such as amenities, prices, descriptions, photos, and URLs to perform:

  • price monitoring
  • trends analysis
  • competitor analysis
  • or simply to find the perfect vacation spot


In this article, we will learn how to:

  • Scrape Airbnb listings
  • Scrape Airbnb listing details

Prerequisites

To start scraping Airbnb, you will need the following things:


  • A Page2API account
  • A location in which we want to search for listings; in this example, we will use Amsterdam, Netherlands

How to scrape Airbnb listings

First, we need to open the Airbnb search page with the desired location.

In our case it will be:

  
    https://www.airbnb.com/s/Amsterdam--Netherlands/homes


This URL is the first parameter we need to start scraping the listings page.


The page we see should look similar to the following one:

Airbnb listings page

From the search page, we will scrape the following attributes:

  • URL
  • Title
  • Name
  • Beds
  • Rating
  • Price

Each listing container is wrapped in a div element with the following attribute: itemprop="itemListElement".

Now, let's define the selectors for each attribute.

  
    /* Parent: */
    [itemprop=itemListElement]

    /* URL: */
    a[target*='listing_']

    /* Title: */
    [id*='title']

    /* Name: */
    [itemprop=name]

    /* Beds: */
    [aria-label*='bed']

    /* Rating: */
    [aria-label*='rating']

    /* Price: */
    [style*='--pricing'] > div > span > div > span
  


Now, let's handle the pagination.

If we look at the URLs when switching between pages, we see that only the items_offset parameter changes:

  
    // Page 1
    https://www.airbnb.com/s/Amsterdam--Netherlands/homes?pagination_search=true&items_offset=0

    // Page 2
    https://www.airbnb.com/s/Amsterdam--Netherlands/homes?pagination_search=true&items_offset=20

    // Page 3
    https://www.airbnb.com/s/Amsterdam--Netherlands/homes?pagination_search=true&items_offset=40
  

This looks like a great scenario to use the batch scraping approach.
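
The offset pattern above can also be generated in code, which is handy when an explicit list of page URLs is needed. The following is a minimal sketch, assuming that every results page holds 20 listings (the step seen in the URLs above); the helper name `airbnb_search_urls` is hypothetical.

```ruby
# Build one search URL per results page.
# PAGE_SIZE = 20 is an assumption taken from the offsets observed above (0, 20, 40).
PAGE_SIZE = 20

# Hypothetical helper: returns the search URLs for the first `pages` pages.
def airbnb_search_urls(base_url, pages)
  (0...pages).map do |page|
    "#{base_url}?pagination_search=true&items_offset=#{page * PAGE_SIZE}"
  end
end

urls = airbnb_search_urls("https://www.airbnb.com/s/Amsterdam--Netherlands/homes", 3)
puts urls
```

The resulting array can be passed as the `urls` value of the batch request instead of the bracket notation.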

Now let's build the request that will scrape all listings that the search page returned.

The following examples will show how to scrape 3 pages of listings from Airbnb.com. The [0, 40, 20] suffix in the batch URL covers the offsets 0, 20, and 40, one per page.

The payload for our scraping request will be:

  
    {
      "api_key": "YOUR_PAGE2API_KEY",
      "batch": {
        "urls": "https://www.airbnb.com/s/Amsterdam--Netherlands/homes?pagination_search=true&items_offset=[0, 40, 20]",
        "concurrency": 1,
        "merge_results": true
      },
      "parse": {
        "listings": [
          {
            "url": "a[target*='listing_'] >> href",
            "beds": "[aria-label*='bed'] >> text",
            "name": "[itemprop=name] >> content",
            "price": "[style*='--pricing'] > div > span > div > span >> text",
            "title": "[id*='title'] >> text",
            "rating": "[aria-label*='rating'] >> text",
            "_parent": "[itemprop=itemListElement]"
          }
        ]
      },
      "wait_for": "[itemprop=itemListElement]",
      "real_browser": true,
      "premium_proxy": "us"
    }
  

Code examples (batch scraping approach)

      
    require 'rest_client'
    require 'json'

    api_url = "https://www.page2api.com/api/v1/scrape"
    payload = {
      api_key: "YOUR_PAGE2API_KEY",
      batch: {
        urls: "https://www.airbnb.com/s/Amsterdam--Netherlands/homes?pagination_search=true&items_offset=[0, 40, 20]",
        concurrency: 1,
        merge_results: true
      },
      parse: {
        listings: [
          {
            url: "a[target*='listing_'] >> href",
            beds: "[aria-label*='bed'] >> text",
            name: "[itemprop=name] >> content",
            price: "[style*='--pricing'] > div > span > div > span >> text",
            title: "[id*='title'] >> text",
            rating: "[aria-label*='rating'] >> text",
            _parent: "[itemprop=itemListElement]"
          }
        ]
      },
      wait_for: "[itemprop=itemListElement]",
      real_browser: true,
      premium_proxy: "us"
    }

    response = RestClient::Request.execute(
      method: :post,
      payload: payload.to_json,
      url: api_url,
      headers: { "Content-type" => "application/json" },
    ).body

    result = JSON.parse(response)

    puts(result)
      
    

The result

  
    {
      "result": {
        "listings": [
          {
            "url": "https://www.airbnb.com/rooms/3982031?adults=1&children=0&infants=0&check_in=2022-08-20&check_out=2022-08-27&previous_page_section_name=1000&federated_search_id=b79f5e55-f2f6-40db-baae-db124484bd1e",
            "beds": "1 double bed",
            "name": "Top location, quiet guesthouse, 2p",
            "price": "$113",
            "title": "Guesthouse in Stadsdeel Centrum",
            "rating": "4.93 (147)"
          },
          {
            "url": "https://www.airbnb.com/rooms/51760555?adults=1&children=0&infants=0&check_in=2022-11-02&check_out=2022-11-09&previous_page_section_name=1000&federated_search_id=b79f5e55-f2f6-40db-baae-db124484bd1e",
            "beds": "1 double bed",
            "name": "Amsterdam city garden",
            "price": "$87",
            "title": "Apartment in Stadsdeel West",
            "rating": "4.88 (8)"
          },
          ...
        ]
      }, ...
    }
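
For use cases such as price monitoring, the scraped strings usually need to be turned into numbers first. The sketch below is one possible way to do it, assuming the values keep the shapes shown in the result above ("$113", "4.93 (147)"); the helper name `normalize_listing` is hypothetical and not part of Page2API.

```ruby
# Hypothetical helper: converts the scraped strings into comparable numbers.
def normalize_listing(listing)
  # "4.93 (147)" -> rating 4.93 and 147 reviews; nil when the listing has no rating.
  rating_match = listing["rating"].to_s.match(/([\d.]+)\s*\((\d+)\)/)

  # "$113" -> 113; everything except digits is dropped.
  digits = listing["price"].to_s.delete("^0-9")

  {
    "name"    => listing["name"],
    "price"   => digits.empty? ? nil : digits.to_i,
    "rating"  => rating_match && rating_match[1].to_f,
    "reviews" => rating_match && rating_match[2].to_i
  }
end

sample = { "name" => "Amsterdam city garden", "price" => "$87", "rating" => "4.88 (8)" }
normalized = normalize_listing(sample)
puts normalized
```

The price is reduced to its digits only, so the currency symbol is lost; keep the original string as well if listings in several currencies are mixed.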
  

How to scrape Airbnb listing details

From the 'Search' page, we click on any listing.


This will change the browser URL to something similar to:

  
    https://www.airbnb.com/rooms/3163509


The page will look like the following one:

Airbnb property overview page
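
The listing URLs returned by the search scrape carry booking parameters (check-in dates, guest counts, a search id), while the listing page itself is reachable at the plain /rooms/ID address. A minimal sketch of stripping those parameters with Ruby's standard URI library follows; the helper name `listing_page_url` is hypothetical.

```ruby
require "uri"

# Hypothetical helper: drops the query string and fragment from a listing URL.
def listing_page_url(search_result_url)
  uri = URI.parse(search_result_url)
  uri.query = nil
  uri.fragment = nil
  uri.to_s
end

clean_url = listing_page_url("https://www.airbnb.com/rooms/3982031?adults=1&children=0&infants=0")
puts clean_url
```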

From this page, we will scrape the following attributes:

  • Title
  • Superhost
  • Guests
  • Bedrooms
  • Beds
  • Baths
  • Reviews
  • Rating
  • Price
  • Description
  • Amenities
  • Images

Let's define the selectors for each attribute.

  
    /* Title: */
    h1

    /* Superhost: */
    //span[contains(text(),'Superhost')]

    /* Guests: */
    //span[contains(text(),'guests')]

    /* Bedrooms: */
    //span[contains(text(),'bedrooms')]

    /* Beds: */
    //span[contains(text(),'beds')]

    /* Baths: */
    //span[contains(text(),'baths')]

    /* Price: */
    [style*='--pricing'] > div > span > div > span

    /* Images: */
    picture img

    /* Reviews / Rating */
    // we will encode it to base64 later
    var ratingObject = document.querySelector("[aria-label*='Rated']").attributes['aria-label'].nodeValue.match(/Rated (?<rating>[\d?\.]+).+ from\s(?<reviews>[\d]+)/).groups;

    // reviews js selector
    ratingObject['reviews']

    // rating js selector
    ratingObject['rating']
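
The JavaScript snippet is sent to the API as a Base64 string in the execute_js step. A minimal sketch of producing that string in Ruby is shown here; the variable names are illustrative, and `pack("m0")` is used as a dependency-free equivalent of `Base64.strict_encode64`.

```ruby
# JavaScript that exposes the rating and the review count as named groups.
# The quoted heredoc keeps the backslashes exactly as written.
rating_js = <<~'JS'
  var ratingObject = document.querySelector("[aria-label*='Rated']").attributes['aria-label'].nodeValue.match(/Rated (?<rating>[\d?\.]+).+ from\s(?<reviews>[\d]+)/).groups;
JS

# pack("m0") produces Base64 without line breaks.
encoded_js = [rating_js].pack("m0")

# The encoded snippet goes into the scenario between the wait and parse steps.
scenario = [
  { wait: 3 },
  { execute_js: encoded_js },
  { execute: "parse" }
]

puts encoded_js
```

For this snippet, including its trailing newline, the output should match the execute_js value used in the payload.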
  

The payload for our scraping request will be:

  
    {
      "api_key": "YOUR_PAGE2API_KEY",
      "url": "https://www.airbnb.com/rooms/3163509",
      "parse": {
        "title": "h1 >> text",
        "superhost": "//span[contains(text(),'Superhost')] >> text",
        "guests": "//span[contains(text(),'guests')] >> text",
        "bedrooms": "//span[contains(text(),'bedrooms')] >> text",
        "beds": "//span[contains(text(),'beds')] >> text",
        "baths": "//span[contains(text(),'baths')] >> text",
        "reviews": "js >> ratingObject['reviews']",
        "rating": "js >> ratingObject['rating']",
        "price": "[style*='--pricing'] > div > span > div > span >> text",
        "images": ["picture img >> data-original-uri"]
      },
      "scenario": [
        { "wait": 3 },
        {
          "execute_js": "dmFyIHJhdGluZ09iamVjdCA9IGRvY3VtZW50LnF1ZXJ5U2VsZWN0b3IoIlthcmlhLWxhYmVsKj0nUmF0ZWQnXSIpLmF0dHJpYnV0ZXNbJ2FyaWEtbGFiZWwnXS5ub2RlVmFsdWUubWF0Y2goL1JhdGVkICg/PHJhdGluZz5bXGQ/XC5dKykuKyBmcm9tXHMoPzxyZXZpZXdzPltcZF0rKS8pLmdyb3VwczsK"
        },
        { "execute": "parse" }
      ],
      "premium_proxy": "us",
      "real_browser": true
    }
  

Running the scraping request

      
    require 'rest_client'
    require 'json'

    api_url = "https://www.page2api.com/api/v1/scrape"
    payload = {
      api_key: 'YOUR_PAGE2API_KEY',
      url: "https://www.airbnb.com/rooms/3163509",
      parse: {
        title: "h1 >> text",
        superhost: "//span[contains(text(),'Superhost')] >> text",
        guests: "//span[contains(text(),'guests')] >> text",
        bedrooms: "//span[contains(text(),'bedrooms')] >> text",
        beds: "//span[contains(text(),'beds')] >> text",
        baths: "//span[contains(text(),'baths')] >> text",
        reviews: "js >> ratingObject['reviews']",
        rating: "js >> ratingObject['rating']",
        price: "[style*='--pricing'] > div > span > div > span >> text",
        images: ["picture img >> data-original-uri"]
      },
      scenario: [
        { wait: 3 },
        {
          execute_js: "dmFyIHJhdGluZ09iamVjdCA9IGRvY3VtZW50LnF1ZXJ5U2VsZWN0b3IoIlthcmlhLWxhYmVsKj0nUmF0ZWQnXSIpLmF0dHJpYnV0ZXNbJ2FyaWEtbGFiZWwnXS5ub2RlVmFsdWUubWF0Y2goL1JhdGVkICg/PHJhdGluZz5bXGQ/XC5dKykuKyBmcm9tXHMoPzxyZXZpZXdzPltcZF0rKS8pLmdyb3VwczsK"
        },
        { execute: "parse" }
      ],
      premium_proxy: "us",
      real_browser: true
    }

    response = RestClient::Request.execute(
      method: :post,
      payload: payload.to_json,
      url: api_url,
      headers: { "Content-type" => "application/json" },
    ).body

    result = JSON.parse(response)

    puts(result)
      
    

The result

  
    {
      "result": {
        "title": "Family Houseboat in City Center",
        "superhost": "Superhost",
        "guests": "2 guests",
        "bedrooms": "2 bedrooms",
        "beds": "3 beds",
        "baths": "1.5 baths",
        "reviews": "42",
        "rating": "4.95",
        "price": "$138",
        "images": [
          "https://a0.muscache.com/pictures/40535051/7c49b7a2_original.jpg",
          "https://a0.muscache.com/pictures/40309400/2ed629ab_original.jpg",
          "https://a0.muscache.com/pictures/40309387/abf52119_original.jpg",
          "https://a0.muscache.com/pictures/miso/Hosting-3163509/original/871a0673-e4ea-4afa-a90d-c3fa946ab491.jpeg",
          "https://a0.muscache.com/pictures/miso/Hosting-3163509/original/d2fb3e01-a9e4-4374-b722-9d6952356734.jpeg"
        ]
      }
    }
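
Most of the detail fields come back as text with a unit ("2 guests", "1.5 baths"). If numeric values are needed, the leading number can be extracted as in the sketch below, which assumes the values keep the shapes shown in the result above; the helper name `leading_number` is hypothetical.

```ruby
# Hypothetical helper: pulls the first number out of strings such as
# "2 guests" or "1.5 baths"; returns nil when the text contains no number.
def leading_number(text)
  match = text.to_s.match(/\d+(?:\.\d+)?/)
  return nil unless match

  match[0].include?(".") ? match[0].to_f : match[0].to_i
end

details = {
  "guests"   => "2 guests",
  "bedrooms" => "2 bedrooms",
  "beds"     => "3 beds",
  "baths"    => "1.5 baths",
  "reviews"  => "42",
  "rating"   => "4.95"
}

numeric = details.transform_values { |value| leading_number(value) }
puts numeric
```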
  

How to export Airbnb listings to Google Sheets

In order to be able to export our Airbnb listings to a Google Spreadsheet we will need to slightly modify our request to receive the data in CSV format instead of JSON.

According to the documentation, we need to add the following parameters to our payload:
  
    "raw": {
      "key": "listings", "format": "csv"
    }
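
When the payload is built in code, the same change amounts to merging one extra key into the existing hash. The sketch below uses an abbreviated payload for brevity; only the raw parameter itself comes from the snippet above.

```ruby
require "json"

# Abbreviated payload: only a few keys are kept to show the change.
payload = {
  api_key: "YOUR_PAGE2API_KEY",
  parse: {
    listings: [
      { name: "[itemprop=name] >> content", _parent: "[itemprop=itemListElement]" }
    ]
  }
}

# Request the "listings" collection as CSV; the original hash stays untouched.
csv_payload = payload.merge(raw: { key: "listings", format: "csv" })

puts JSON.pretty_generate(csv_payload)
```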
  

Now our payload will look like:

    {
      "api_key": "YOUR_PAGE2API_KEY",
      "batch": {
        "urls": [
          "https://www.airbnb.com/s/Amsterdam--Netherlands/homes?pagination_search=true&items_offset=0",
          "https://www.airbnb.com/s/Amsterdam--Netherlands/homes?pagination_search=true&items_offset=20",
          "https://www.airbnb.com/s/Amsterdam--Netherlands/homes?pagination_search=true&items_offset=40"
        ],
        "concurrency": 1,
        "merge_results": true
      },
      "raw": {
        "key": "listings", "format": "csv"
      },
      "parse": {
        "listings": [
          {
            "url": "a[target*='listing_'] >> href",
            "beds": "[aria-label*='bed'] >> text",
            "name": "[itemprop=name] >> content",
            "price": "[style*='--pricing'] > div > span > div > span >> text",
            "title": "[id*='title'] >> text",
            "rating": "[aria-label*='rating'] >> text",
            "_parent": "[itemprop=itemListElement]"
          }
        ]
      },
      "wait_for": "[itemprop=itemListElement]",
      "real_browser": true,
      "premium_proxy": "us"
    }

Now, edit the payload above if needed, and press Encode → to generate the URL with the encoded payload.

Note: If you are logged in while reading this article, you can copy the generated link as is, since the encoded payload will already contain your api_key.

The final part is wrapping the encoded URL in the IMPORTDATA function, and we are ready to import our Airbnb listings into a Google Spreadsheet.

The result should look like the following one:

Airbnb listings import to Google Sheets

Final thoughts

Collecting Airbnb data manually can be a bit overwhelming and hard to scale.
However, a Web Scraping API can easily help you overcome this challenge and scrape the data in no time.
With Page2API you can quickly get access to the data you need, and use the time you saved on more important things!
