Indeed.com is a job board that aggregates postings, allowing users to search for specific positions.
In this article, you will read about the easiest way to scrape Indeed job posts with Page2API.
You will find code examples for Ruby, Python, PHP, NodeJS, cURL, and a No-Code solution that will import Indeed job posts into Google Sheets.
Why might we need to scrape Indeed?
Collecting job listings from Indeed will help us to:
To scrape Indeed, we will use Page2API - a powerful and delightful API that makes web scraping easy and fun.
To start scraping Indeed jobs, we will need the following things:
First, we need to open indeed.com, type Ruby On Rails Software Engineer into the search input on the Indeed home page, and pick the location we need. This will change the browser URL to something similar to:
https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=10
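If we want to generate such search URLs for other queries or locations, a small helper can do the encoding for us. The following is a minimal sketch, not part of Page2API: the `indeed_search_url` helper is a hypothetical name, and the parameter names `q`, `l`, and `radius` are simply taken from the URL above.

```ruby
require 'erb'

# Hypothetical helper: builds an Indeed search URL from a query, a location,
# and a radius. ERB::Util.url_encode percent-encodes spaces as %20 and
# commas as %2C, which matches the URL shown in the article.
def indeed_search_url(query, location, radius: 10)
  params = { q: query, l: location, radius: radius }
  encoded = params.map { |key, value| "#{key}=#{ERB::Util.url_encode(value.to_s)}" }
  "https://www.indeed.com/jobs?#{encoded.join('&')}"
end

puts indeed_search_url('Ruby On Rails Software Engineer', 'Redwood City, CA')
```

`URI.encode_www_form` would also work, but it encodes spaces as `+` instead of `%20`, so the result would not be character-for-character identical to the URL above.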
The page that you see must look like the following one:
If you inspect the page HTML, you will find that each result is wrapped in a parent element, and that the fields we need can be reached with the following selectors:
/* Parent: */
.resultContent
/* Title */
h2.jobTitle
/* URL */
a
/* Company */
.companyName
/* Location */
.companyLocation
/* Rating */
.ratingNumber span[aria-hidden=true]
/* Additional info */
.metadata div
Now, let's handle the pagination.
There are two approaches that can help us scrape all the needed pages:
1. We can scrape the pages using the batch scraping feature
2. We can iterate through the pages by clicking on the Next page button
{
"api_key": "YOUR_PAGE2API_KEY",
"real_browser": false,
"batch": {
"urls": "https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=30&start=[0, 50, 10]",
"concurrency": 1,
"merge_results": true
},
"parse": {
"jobs": [
{
"_parent": ".resultContent",
"url": "a >> href",
"title": "h2.jobTitle >> text",
"company": ".companyName >> text",
"location": ".companyLocation >> text",
"rating": ".ratingNumber span[aria-hidden=true] >> text",
"additional_info": [
".metadata div >> text"
]
}
]
}
}
require 'rest_client'
require 'json'
api_url = 'https://www.page2api.com/api/v1/scrape'
# The following example will show how to scrape 5 pages of job postings from Indeed.com
payload = {
api_key: 'YOUR_PAGE2API_KEY',
real_browser: false,
batch: {
urls: "https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=30&start=[0, 50, 10]",
concurrency: 1,
merge_results: true
},
parse: {
jobs: [
{
_parent: ".resultContent",
url: "a >> href",
title: "h2.jobTitle >> text",
company: ".companyName >> text",
location: ".companyLocation >> text",
rating: ".ratingNumber span[aria-hidden=true] >> text",
additional_info: [
".metadata div >> text"
]
}
]
}
}
response = RestClient::Request.execute(
method: :post,
payload: payload.to_json,
url: api_url,
headers: { "Content-type" => "application/json" },
).body
result = JSON.parse(response)
puts(result)
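The batch `urls` value used above is the search URL followed by a `start` parameter that holds a `[first, last, step]` range. A minimal sketch of assembling that string from its parts is shown below. The `batch_urls` helper is a hypothetical name, and the sketch only builds the string; how the range is expanded into individual pages is up to the API.

```ruby
# Hypothetical helper: appends a "[first, last, step]" range to the `start`
# parameter, which is the shape of the batch `urls` value used in this article.
def batch_urls(search_url, first:, last:, step:)
  "#{search_url}&start=[#{first}, #{last}, #{step}]"
end

search_url = 'https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=30'

batch = {
  urls: batch_urls(search_url, first: 0, last: 50, step: 10),
  concurrency: 1,
  merge_results: true
}

puts batch[:urls]
```

Keeping the range values as separate arguments makes it easier to change the number of pages without editing the URL by hand.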
Let's take a look at the Next page approach.
Note: the 'Next page' approach described below is for demonstration purposes only.
We strongly recommend using the 'Batch' approach whenever possible, since it is faster and more reliable.
var next = document.querySelector('a[aria-label*=Next]'); if(next) { next.click() }
// we have this simple check to avoid any javascript errors (in case the Next page button is missing)
The scraping will continue while the Next link is present on the page, and stop if it disappears.
The stop condition for the scraper will be the following javascript snippet:
document.querySelector('a[aria-label*=Next]') == null
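Since the click snippet and the stop condition must target the same link, it can help to generate both from a single selector. The following is a minimal sketch of that idea. The `next_page_scenario` helper and the `NEXT_SELECTOR` constant are hypothetical names, while the `loop`, `wait_for`, `execute`, `execute_js`, and `stop_condition` keys are the ones used in this article's requests.

```ruby
require 'json'

# Selector for the Next link, taken from the snippets above.
NEXT_SELECTOR = 'a[aria-label*=Next]'

# Hypothetical helper: builds the loop scenario so that the click snippet and
# the stop condition are always generated from the same selector.
def next_page_scenario(result_selector, next_selector = NEXT_SELECTOR)
  click_next = "var next = document.querySelector('#{next_selector}'); if(next) { next.click() }"
  stop_condition = "document.querySelector('#{next_selector}') == null"

  [
    {
      loop: [
        { wait_for: result_selector },
        { execute: 'parse' },
        { execute_js: click_next }
      ],
      stop_condition: stop_condition
    }
  ]
end

puts JSON.pretty_generate(next_page_scenario('.resultContent'))
```

If Indeed changes the markup of the Next link, only `NEXT_SELECTOR` has to be updated.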
Let's build the request that will scrape all the results that the search page returned.
The following examples will show how to scrape multiple pages of job postings from Indeed.com
{
"api_key": "YOUR_PAGE2API_KEY",
"url": "https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=10",
"real_browser": true,
"merge_loops": true,
"scenario": [
{
"loop": [
{ "wait_for": ".resultContent" },
{ "execute": "parse" },
{ "execute_js": "var next = document.querySelector('a[aria-label*=Next]'); if(next) { next.click() }" }
],
"stop_condition": "document.querySelector('a[aria-label*=Next]') == null"
}
],
"parse": {
"jobs": [
{
"_parent": ".resultContent",
"url": "a >> href",
"title": "h2.jobTitle >> text",
"company": ".companyName >> text",
"location": ".companyLocation >> text",
"rating": ".ratingNumber span[aria-hidden=true] >> text",
"additional_info": [
".metadata div >> text"
]
}
]
}
}
require 'rest_client'
require 'json'
api_url = 'https://www.page2api.com/api/v1/scrape'
# The following example will show how to scrape multiple pages of job postings from Indeed.com
payload = {
api_key: 'YOUR_PAGE2API_KEY',
url: "https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=10",
merge_loops: true,
real_browser: true,
scenario: [
{
loop: [
{ wait_for: ".resultContent" },
{ execute: "parse" },
{ execute_js: "var next = document.querySelector(\"a[aria-label*=Next]\"); if(next) { next.click() }" }
],
stop_condition: "document.querySelector(\"a[aria-label*=Next]\") == null"
}
],
parse: {
jobs: [
{
_parent: ".resultContent",
url: "a >> href",
title: "h2.jobTitle >> text",
company: ".companyName >> text",
location: ".companyLocation >> text",
rating: ".ratingNumber span[aria-hidden=true] >> text",
additional_info: [
".metadata div >> text"
]
}
]
}
}
response = RestClient::Request.execute(
method: :post,
payload: payload.to_json,
url: api_url,
headers: { "Content-type" => "application/json" },
).body
result = JSON.parse(response)
puts(result)
{
"result": {
"jobs": [
{
"url": "https://www.indeed.com/company/Coupa/jobs/Senior-Lead-Software-Engineer-fa676bc66ad1daae?fccid=c6a1779d65543307&vjs=3",
"title": "Senior/Lead Software Engineer, Ruby on Rails",
"company": "Coupa Software",
"location": "San Mateo, CA 94402 (Nineteenth Avenue area)+1 location",
"rating": "3.9",
"additional_info": [
"$145,000 - $165,000 a year",
"Full-time",
"8 hour shift"
]
},
{
"url": "https://www.indeed.com/company/Poshmark/jobs/Software-Engineer-e55c033766067a6c?fccid=0f4f2d112db7d324&vjs=3",
"title": "Software Engineer, Web Applications",
"company": "Poshmark",
"location": "Redwood City, CA",
"rating": "4.6",
"additional_info": [
"Full-time"
]
},
{
"url": "https://www.indeed.com/rc/clk?jk=1e9bb2cae582950f&fccid=c6a1779d65543307&vjs=3",
"title": "Software Engineer, Ruby on Rails",
"company": "Coupa Software",
"location": "San Mateo, CA",
"rating": "3.9",
"additional_info": [
"Remote"
]
}, ...
]
}, ...
}
Next, we need to open any job listing URL from the previous step in a new tab, for example:
https://www.indeed.com/viewjob?jk=1e9bb2cae582950f
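The example URL above reuses the `jk` value found in one of the search result URLs, which suggests that a job page URL can be derived from a listing URL that carries this parameter. Below is a minimal sketch under that assumption. The `view_job_url` helper is a hypothetical name, and listing URLs without a `jk` parameter are deliberately left unresolved.

```ruby
require 'uri'

# Hypothetical helper: returns the viewjob URL for a listing URL that carries
# a `jk` query parameter, or nil when the parameter is absent.
def view_job_url(listing_url)
  query = URI.parse(listing_url).query
  return nil if query.nil?

  job_key = URI.decode_www_form(query).to_h['jk']
  job_key && "https://www.indeed.com/viewjob?jk=#{job_key}"
end

puts view_job_url('https://www.indeed.com/rc/clk?jk=1e9bb2cae582950f&fccid=c6a1779d65543307&vjs=3')
```

For URLs that return nil here, such as the `/company/...` links, the safest option is to scrape the listing URL as it is.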
The page that you see must look like the following one:
/* Title */
h1
/* Company */
.jobsearch-InlineCompanyRating a
/* Rating */
meta[itemprop=ratingValue]
/* Reviews count */
meta[itemprop=ratingCount]
/* Description */
#jobDescriptionText
It's time to prepare the request that will scrape the Indeed job page.
{
"api_key": "YOUR_PAGE2API_KEY",
"url": "https://www.indeed.com/viewjob?jk=1e9bb2cae582950f",
"parse": {
"title": "h1 >> text",
"company": ".jobsearch-InlineCompanyRating a >> text",
"rating": "meta[itemprop=ratingValue] >> content",
"reviews_count": "meta[itemprop=ratingCount] >> content",
"description": "#jobDescriptionText >> text"
}
}
require 'rest_client'
require 'json'
api_url = 'https://www.page2api.com/api/v1/scrape'
payload = {
api_key: 'YOUR_PAGE2API_KEY',
url: "https://www.indeed.com/viewjob?jk=1e9bb2cae582950f",
parse: {
title: "h1 >> text",
company: ".jobsearch-InlineCompanyRating a >> text",
rating: "meta[itemprop=ratingValue] >> content",
reviews_count: "meta[itemprop=ratingCount] >> content",
description: "#jobDescriptionText >> text"
}
}
response = RestClient::Request.execute(
method: :post,
payload: payload.to_json,
url: api_url,
headers: { "Content-type" => "application/json" },
).body
result = JSON.parse(response)
puts(result)
{
"result": {
"title": "Software Engineer, Ruby on Rails",
"company": "Coupa Software",
"rating": "3.9",
"reviews_count": "27",
"description": "Coupa Software (NASDAQ: COUP), a leader in business spend management (BSM), ..."
}, ...
}
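Note that the rating and the reviews count come back as strings. If we need numbers for sorting or filtering, we can convert them after parsing the response. The following is a minimal sketch. The `normalize_job` helper is a hypothetical name, and the sample hash copies the values from the result above, with the description left out.

```ruby
# Hypothetical helper: converts the string fields of a job page result into
# numbers. Missing or non-numeric values become nil instead of raising.
def normalize_job(result)
  rating = result['rating']
  reviews_count = result['reviews_count']

  {
    title: result['title'],
    company: result['company'],
    rating: rating && Float(rating, exception: false),
    reviews_count: reviews_count && Integer(reviews_count, exception: false),
    description: result['description']
  }
end

sample = {
  'title' => 'Software Engineer, Ruby on Rails',
  'company' => 'Coupa Software',
  'rating' => '3.9',
  'reviews_count' => '27'
}

job = normalize_job(sample)
puts "#{job[:title]} at #{job[:company]}: #{job[:rating]} (#{job[:reviews_count]} reviews)"
```

In a real script, the hash passed to `normalize_job` would be the value under the `result` key of the parsed response.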
"raw": {
"key": "jobs", "format": "csv"
}
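The fragment above asks the API to return the `jobs` collection as CSV. Below is a minimal sketch of attaching it to a request payload. It assumes that `raw` sits at the top level of the payload, next to `parse`, since only the fragment itself is shown here. The `with_csv_output` helper is a hypothetical name, and the `parse` section is shortened for brevity.

```ruby
require 'json'

# Hypothetical helper: returns a copy of the payload with the `raw` option
# attached. Placing `raw` at the top level of the payload is an assumption.
def with_csv_output(payload, key)
  payload.merge(raw: { key: key, format: 'csv' })
end

payload = {
  api_key: 'YOUR_PAGE2API_KEY',
  url: 'https://www.indeed.com/jobs?q=Ruby%20On%20Rails%20Software%20Engineer&l=Redwood%20City%2C%20CA&radius=10',
  parse: {
    jobs: [
      { _parent: '.resultContent', title: 'h2.jobTitle >> text', company: '.companyName >> text' }
    ]
  }
}

puts JSON.pretty_generate(with_csv_output(payload, 'jobs'))
```

The `key` must match the name of the collection defined in `parse`, which is `jobs` in this article.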
Please note that the batch URLs are defined explicitly to make it simpler to edit the payload.
The URL with encoded payload will be:
Note: if you are logged in while reading this article, you can copy the link above, since it will already contain your api_key in the encoded payload.
The result must look like the following one:
That's pretty much it!
In this article, you've learned how to scrape data from a job board such as Indeed.com with Page2API - a Web Scraping API that handles all the hassle and lets you get the data you need with ease.
If you found a company you are interested in, this article may be useful for you.