How to Scrape Indeed Data: Jobs, Salaries (Code & No code)


January 09, 2022 - 5 min read

Nicolae Rotaru

Introduction

Indeed.com is a job board that aggregates postings, allowing users to search for specific positions.


In this article, you will read about the easiest way to scrape Indeed job posts with Page2API.


You will find code examples for Ruby, Python, PHP, NodeJS, cURL, and a No-Code solution that will import Indeed job posts into Google Sheets.


Why might we need to scrape Indeed?
Collecting job listings from Indeed helps us to:

  • find open positions
  • analyze the demand for specific job positions
  • analyze the average salaries


To scrape Indeed, we will use Page2API - a powerful and delightful API that makes web scraping easy and fun.

Prerequisites

To start scraping Indeed jobs, we will need the following things:


  • A Page2API account
  • A job position in a specific location that we are about to scrape.
    In our case, we will search for Ruby On Rails Software Engineer in Redwood City, CA, and set the area to within 10 miles.

How to scrape Indeed Jobs

First, open indeed.com, type Ruby On Rails Software Engineer into the search input on the Indeed home page, and pick the location we need.


This will change the browser URL to something similar to:

  
    https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=10


The URL is the first parameter we need to perform the scraping.
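
The same URL can also be assembled in code, which helps when the position or location changes often. The snippet below is a minimal sketch: the helper name `indeed_search_url` is hypothetical, and the parameter names `q`, `l` and `radius` are taken from the URL shown above.

```ruby
require 'erb'

# Builds an Indeed search URL from a query, a location and a radius in miles.
# ERB::Util.url_encode percent-encodes spaces as %20 and commas as %2C.
def indeed_search_url(query:, location:, radius: 10)
  params = { q: query, l: location, radius: radius }
  encoded = params.map { |key, value| "#{key}=#{ERB::Util.url_encode(value.to_s)}" }
  "https://www.indeed.com/jobs?#{encoded.join('&')}"
end

puts indeed_search_url(query: 'Ruby On Rails Software Engineer', location: 'Redwood City, CA')
```

With the default radius of 10, this prints the URL from the example above.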


The page you see should look similar to the following:

Indeed jobs listings page

If you inspect the page HTML, you will find that each result is wrapped in an element that looks like the following:

Indeed result

From this page, we will scrape the following attributes from each Indeed job posting:

  • Title
  • URL
  • Company
  • Location
  • Rating
  • Additional info

Now, let's define the selectors for each attribute.

  
    /* Parent: */
    .resultContent

    /* Title */
    h2.jobTitle

    /* URL */
    a

    /* Company */
    .companyName

    /* Location */
    .companyLocation

    /* Rating */
    .ratingNumber span[aria-hidden=true]

    /* Additional info */
    .metadata div
  

Now, let's handle the pagination.
There are two approaches that can help us scrape all the needed pages:

1. We can scrape the pages using the batch scraping feature
2. We can iterate through the pages by clicking on the Next page button


If we decide to go with the batch scraping approach, our payload will look like:

  
    {
      "api_key": "YOUR_PAGE2API_KEY",
      "real_browser": false,
      "batch": {
        "urls": "https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=10&start=[0, 50, 10]",
        "concurrency": 1,
        "merge_results": true
      },
      "parse": {
        "jobs": [
          {
            "_parent": ".resultContent",
            "url": "a >> href",
            "title": "h2.jobTitle >> text",
            "company": ".companyName >> text",
            "location": ".companyLocation >> text",
            "rating": ".ratingNumber span[aria-hidden=true] >> text",
            "additional_info": [
                ".metadata div >> text"
            ]
          }
        ]
      }
    }
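
To see which pages a range such as `[0, 50, 10]` might cover, the offsets can be expanded into explicit URLs. This is only a sketch: it assumes the three numbers mean first offset, last offset and step, with the last offset included, which should be checked against the Page2API batch documentation. The helper name `page_urls` is hypothetical.

```ruby
# Expands a base search URL into one URL per results page.
# Assumption: the range is [first, last, step] with an inclusive upper bound.
def page_urls(base_url, first:, last:, step:)
  first.step(last, step).map { |offset| "#{base_url}&start=#{offset}" }
end

base = 'https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=10'
urls = page_urls(base, first: 0, last: 50, step: 10)
puts urls
```

An explicit list like this can be reviewed or trimmed by hand before it is sent.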
  

Code examples (batch scraping approach)

      
    require 'rest_client'
    require 'json'

    api_url = 'https://www.page2api.com/api/v1/scrape'

    # The following example will show how to scrape 5 pages of job postings from Indeed.com

    payload = {
      api_key: 'YOUR_PAGE2API_KEY',
      real_browser: false,
      batch: {
        urls: "https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=10&start=[0, 50, 10]",
        concurrency: 1,
        merge_results: true
      },
      parse: {
        jobs: [
          {
            _parent: ".resultContent",
            url: "a >> href",
            title: "h2.jobTitle >> text",
            company: ".companyName >> text",
            location: ".companyLocation >> text",
            rating: ".ratingNumber span[aria-hidden=true] >> text",
            additional_info: [
              ".metadata div >> text"
            ]
          }
        ]
      }
    }

    response = RestClient::Request.execute(
      method: :post,
      payload: payload.to_json,
      url: api_url,
      headers: { "Content-type" => "application/json" },
    ).body

    result = JSON.parse(response)

    puts(result)
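
If the rest-client gem is not available, the same POST request can be sketched with Ruby's standard library. The method names `build_scrape_request` and `scrape` are hypothetical, and raising on a non-2xx status is a design choice of this sketch, not something the API requires.

```ruby
require 'net/http'
require 'json'
require 'uri'

API_URL = URI('https://www.page2api.com/api/v1/scrape')

# Builds the POST request without sending it, so it can be inspected first.
def build_scrape_request(payload)
  request = Net::HTTP::Post.new(API_URL, 'Content-Type' => 'application/json')
  request.body = JSON.generate(payload)
  request
end

# Sends the request and returns the parsed JSON body.
# Raises if the API answers with a non-2xx status.
def scrape(payload)
  response = Net::HTTP.start(API_URL.host, API_URL.port, use_ssl: true) do |http|
    http.request(build_scrape_request(payload))
  end
  raise "Scrape failed with HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end
```

Calling `scrape(payload)` with the payload hash from the example above would then return the parsed response.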
      
    

Let's take a look at the Next page approach.


Note: the 'Next page' approach described below is for demonstration purposes only.
We strongly recommend using the 'Batch' approach whenever possible since it's faster and more reliable.


With this approach, to go to the next page, we must click on the next page link if it's present on the page:

  
    var next = document.querySelector('a[aria-label=Next]'); if(next) { next.click() }

    // we have this simple check to avoid any javascript errors (in case the Next page button is missing)
  

Indeed next page active

The scraping will continue while the Next link is present on the page, and stop if it disappears.
The stop condition for the scraper will be the following javascript snippet:


  
    document.querySelector('a[aria-label=Next]') == null
  

Let's build the request that will scrape all the results returned by the search page.

With the next button approach, our payload will look like:

  
    {
      "api_key": "YOUR_PAGE2API_KEY",
      "url": "https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=10",
      "real_browser": true,
      "merge_loops": true,
      "scenario": [
        {
          "loop": [
            { "wait_for": ".resultContent" },
            { "execute": "parse" },
            { "execute_js": "var next = document.querySelector('a[aria-label=Next]'); if(next) { next.click() }" }
          ],
          "stop_condition": "document.querySelector('a[aria-label=Next]') == null"
        }
      ],
      "parse": {
        "jobs": [
          {
            "_parent": ".resultContent",
            "url": "a >> href",
            "title": "h2.jobTitle >> text",
            "company": ".companyName >> text",
            "location": ".companyLocation >> text",
            "rating": ".ratingNumber span[aria-hidden=true] >> text",
            "additional_info": [
              ".metadata div >> text"
            ]
          }
        ]
      }
    }
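
The click snippet and the stop condition both depend on the same selector, so it can help to derive them from one value. The sketch below builds the scenario shown above; the constant `NEXT_SELECTOR` and the helper `next_page_scenario` are hypothetical names.

```ruby
require 'json'

NEXT_SELECTOR = 'a[aria-label=Next]'

# Returns the scenario array for the 'Next page' loop, deriving both the
# click snippet and the stop condition from the same selector.
def next_page_scenario(selector = NEXT_SELECTOR)
  [
    {
      loop: [
        { wait_for: '.resultContent' },
        { execute: 'parse' },
        { execute_js: "var next = document.querySelector('#{selector}'); if(next) { next.click() }" }
      ],
      stop_condition: "document.querySelector('#{selector}') == null"
    }
  ]
end

puts JSON.pretty_generate(next_page_scenario)
```

If Indeed changes the label of the Next link, only the selector has to be updated.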
  

Code examples (next button approach)

      
    require 'rest_client'
    require 'json'

    api_url = 'https://www.page2api.com/api/v1/scrape'

    # The following example will show how to scrape multiple pages of job postings from Indeed.com

    payload = {
      api_key: 'YOUR_PAGE2API_KEY',
      url: "https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=10",
      merge_loops: true,
      real_browser: true,
      scenario: [
        {
          loop: [
            { wait_for: ".resultContent" },
            { execute: "parse" },
            { execute_js: "var next = document.querySelector(\"a[aria-label=Next]\"); if(next) { next.click() }" }
          ],
          stop_condition: "document.querySelector(\"a[aria-label=Next]\") == null"
        }
      ],
      parse: {
        jobs: [
          {
            _parent: ".resultContent",
            url: "a >> href",
            title: "h2.jobTitle >> text",
            company: ".companyName >> text",
            location: ".companyLocation >> text",
            rating: ".ratingNumber span[aria-hidden=true] >> text",
            additional_info: [
              ".metadata div >> text"
            ]
          }
        ]
      }
    }

    response = RestClient::Request.execute(
      method: :post,
      payload: payload.to_json,
      url: api_url,
      headers: { "Content-type" => "application/json" },
    ).body

    result = JSON.parse(response)

    puts(result)
      
    

The result

  
    {
      "result": {
        "jobs": [
          {
            "url": "https://www.indeed.com/company/Coupa/jobs/Senior-Lead-Software-Engineer-fa676bc66ad1daae?fccid=c6a1779d65543307&vjs=3",
            "title": "Senior/Lead Software Engineer, Ruby on Rails",
            "company": "Coupa Software",
            "location": "San Mateo, CA 94402 (Nineteenth Avenue area)+1 location",
            "rating": "3.9",
            "additional_info": [
              "$145,000 - $165,000 a year",
              "Full-time",
              "8 hour shift"
            ]
          },
          {
            "url": "https://www.indeed.com/company/Poshmark/jobs/Software-Engineer-e55c033766067a6c?fccid=0f4f2d112db7d324&vjs=3",
            "title": "Software Engineer, Web Applications",
            "company": "Poshmark",
            "location": "Redwood City, CA",
            "rating": "4.6",
            "additional_info": [
              "Full-time"
            ]
          },
          {
            "url": "https://www.indeed.com/rc/clk?jk=1e9bb2cae582950f&fccid=c6a1779d65543307&vjs=3",
            "title": "Software Engineer, Ruby on Rails",
            "company": "Coupa Software",
            "location": "San Mateo, CA",
            "rating": "3.9",
            "additional_info": [
              "Remote"
            ]
          }, ...
        ]
      }, ...
    }
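
Since one of the goals is to analyze average salaries, the additional info entries can be scanned for salary ranges. This is a rough sketch that assumes salaries appear as text like "$145,000 - $165,000 a year", as in the sample above; other formats (hourly rates, single amounts in other wording) are not handled. The helper name `yearly_salary_range` is hypothetical.

```ruby
# Sample entries copied from the result above; a real run would use the
# array parsed under the jobs key of the payload.
jobs = [
  { 'title' => 'Senior/Lead Software Engineer, Ruby on Rails',
    'additional_info' => ['$145,000 - $165,000 a year', 'Full-time', '8 hour shift'] },
  { 'title' => 'Software Engineer, Web Applications',
    'additional_info' => ['Full-time'] }
]

# Returns [min, max] in dollars for the first entry that looks like a yearly
# salary, or nil when the posting lists no salary.
def yearly_salary_range(job)
  entry = Array(job['additional_info']).find { |info| info.include?('$') && info.include?('a year') }
  return nil unless entry

  amounts = entry.scan(/\$([\d,]+)/).flatten.map { |amount| amount.delete(',').to_i }
  [amounts.min, amounts.max]
end

ranges = jobs.map { |job| yearly_salary_range(job) }.compact
midpoints = ranges.map { |low, high| (low + high) / 2.0 }
average = midpoints.sum / midpoints.size unless midpoints.empty?
puts "Average yearly midpoint: #{average}"
```

Postings without a salary are skipped, so the average only reflects jobs that publish one.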
  

How to scrape Indeed Job Page

Open any job URL from the previous step in a new tab.


This will change the browser URL to something similar to:

  
    https://www.indeed.com/viewjob?jk=1e9bb2cae582950f


This URL is the first parameter we need to scrape all the information about a job.
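
Some of the URLs collected from the search results carry the job key as a `jk` query parameter, which is the same key the job page URL uses. The sketch below converts such a URL to the `viewjob` form; the helper name `viewjob_url` is hypothetical, and URLs without a `jk` parameter are simply skipped by returning nil.

```ruby
require 'uri'

# Extracts the job key (the jk query parameter) from a search-result URL and
# returns the matching viewjob URL, or nil when the URL carries no jk parameter.
def viewjob_url(result_url)
  query = URI.parse(result_url).query
  return nil unless query

  job_key = URI.decode_www_form(query).to_h['jk']
  job_key && "https://www.indeed.com/viewjob?jk=#{job_key}"
end

puts viewjob_url('https://www.indeed.com/rc/clk?jk=1e9bb2cae582950f&fccid=c6a1779d65543307&vjs=3')
```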


The page you see should look similar to the following:

Indeed job page

From this page, we will scrape the following attributes:

  • Title
  • Company
  • Rating
  • Reviews count
  • Description

Now, let's define the selectors for each attribute.

  
    /* Title */
    h1

    /* Company */
    .jobsearch-InlineCompanyRating a

    /* Rating */
    meta[itemprop=ratingValue]

    /* Reviews count */
    meta[itemprop=ratingCount]

    /* Description */
    #jobDescriptionText

  

It's time to prepare the request that will scrape an Indeed job page.

The payload for our scraping request will be:

  
    {
      "api_key": "YOUR_PAGE2API_KEY",
      "url": "https://www.indeed.com/viewjob?jk=1e9bb2cae582950f",
      "parse": {
        "title": "h1 >> text",
        "company": ".jobsearch-InlineCompanyRating a >> text",
        "rating": "meta[itemprop=ratingValue] >> content",
        "reviews_count": "meta[itemprop=ratingCount] >> content",
        "description": "#jobDescriptionText >> text"
      }
    }
  

Code examples

      
    require 'rest_client'
    require 'json'

    api_url = 'https://www.page2api.com/api/v1/scrape'
    payload = {
      api_key: 'YOUR_PAGE2API_KEY',
      url: "https://www.indeed.com/viewjob?jk=1e9bb2cae582950f",
      parse: {
        title: "h1 >> text",
        company: ".jobsearch-InlineCompanyRating a >> text",
        rating: "meta[itemprop=ratingValue] >> content",
        reviews_count: "meta[itemprop=ratingCount] >> content",
        description: "#jobDescriptionText >> text"
      }
    }

    response = RestClient::Request.execute(
      method: :post,
      payload: payload.to_json,
      url: api_url,
      headers: { "Content-type" => "application/json" },
    ).body

    result = JSON.parse(response)

    puts(result)
      
    

The result

  
    {
      "result": {
        "title": "Software Engineer, Ruby on Rails",
        "company": "Coupa Software",
        "rating": "3.9",
        "reviews_count": "27",
        "description": "Coupa Software (NASDAQ: COUP), a leader in business spend management (BSM), ..."
      }, ...
    }
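
The rating and the reviews count come back as strings, so they usually need converting before any sorting or filtering. A minimal sketch, with the hypothetical helper `normalize_job`, that turns missing or malformed values into nil instead of raising:

```ruby
# Sample copied from the result above.
raw = {
  'title' => 'Software Engineer, Ruby on Rails',
  'company' => 'Coupa Software',
  'rating' => '3.9',
  'reviews_count' => '27'
}

# Converts numeric strings to numbers; missing or malformed values become nil.
def normalize_job(job)
  job.merge(
    'rating' => Float(job['rating'], exception: false),
    'reviews_count' => Integer(job['reviews_count'], exception: false)
  )
end

puts normalize_job(raw)
```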
  

How to export Indeed jobs to Google Sheets

In order to be able to export our Indeed jobs to a Google Spreadsheet we will need to slightly modify our request to receive the data in CSV format instead of JSON.

According to the documentation, we need to add the following parameters to our payload:
  
    "raw": {
      "key": "jobs", "format": "csv"
    }
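
When the JSON response is already at hand, a similar flattening can be done locally with Ruby's csv library. This sketch is not Page2API's own CSV output: the column order, the "; " separator for list values, and the helper name `jobs_to_csv` are all assumptions made for illustration.

```ruby
require 'csv'

COLUMNS = %w[title company location rating url additional_info].freeze

# Flattens an array of job hashes into CSV text; array values are joined with "; ".
def jobs_to_csv(jobs)
  CSV.generate do |csv|
    csv << COLUMNS
    jobs.each do |job|
      csv << COLUMNS.map { |column| Array(job[column]).join('; ') }
    end
  end
end

sample = [
  { 'title' => 'Software Engineer, Ruby on Rails', 'company' => 'Coupa Software',
    'location' => 'San Mateo, CA', 'rating' => '3.9',
    'url' => 'https://www.indeed.com/rc/clk?jk=1e9bb2cae582950f&fccid=c6a1779d65543307&vjs=3',
    'additional_info' => ['Remote'] }
]
puts jobs_to_csv(sample)
```

Values that contain commas, such as the location, are quoted by the library, so they stay in a single cell when imported.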
  

Now our payload will look like:

    {
      "api_key": "YOUR_PAGE2API_KEY",
      "real_browser": false,
      "raw": {
        "key": "jobs", "format": "csv"
      },
      "batch": {
        "urls": [
          "https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=10&start=0",
          "https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=10&start=10",
          "https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=10&start=20"
        ],
        "concurrency": 1,
        "merge_results": true
      },
      "parse": {
        "jobs": [
          {
            "_parent": ".resultContent",
            "url": "a >> href",
            "title": "h2.jobTitle >> text",
            "company": ".companyName >> text",
            "location": ".companyLocation >> text",
            "rating": ".ratingNumber span[aria-hidden=true] >> text",
            "additional_info": [
              ".metadata div >> text"
            ]
          }
        ]
      }
    }

Please note that the batch URLs are defined explicitly to make it simpler to edit the payload.


Now, edit the payload above if needed and encode it. The result is a single URL that carries the encoded payload.

Note: If you are logged in while reading the original article, the generated link will already contain your api_key in the encoded payload.

The final part is wrapping that URL in the IMPORTDATA function, for example =IMPORTDATA("URL_WITH_ENCODED_PAYLOAD") in a spreadsheet cell, and we are ready to import our Indeed jobs into a Google Spreadsheet.

The result should look similar to the following:

Indeed jobs listings import to Google Sheets

Conclusion

That's pretty much it!

In this article, you've learned how to scrape the data from a job board such as Indeed.com with Page2API - a Web Scraping API that handles all the hassle, and lets you get the data you need with ease.

If you found a company you are interested in, this article may be useful for you.
