YouTube is a video-sharing service where users can watch, like, share, and comment on videos, as well as upload their own.
In this article, you will learn the easiest way to scrape YouTube data with Page2API.
You will find code examples for Ruby, Python, PHP, NodeJS, and cURL, as well as a no-code solution that imports YouTube channel videos into Google Sheets.
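To start scraping YouTube, you will need a Page2API account (for the API key used in the examples below) and the URL of the YouTube video or channel you want to scrape.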
To scrape a single video, first open the YouTube video we are interested in:
https://www.youtube.com/watch?v=1WOQumXj0kg
The page will look like the following one:
From this page, we will scrape the following attributes:
/* Title: */
meta[name=title]
/* Likes: */
#segmented-like-button span[role=text]
/* Views: */
.ytd-watch-metadata span[dir=auto]:nth-of-type(1)
/* Uploaded: */
.ytd-watch-metadata span[dir=auto]:nth-of-type(3)
/* Channel name: */
.ytd-channel-name a
/* Channel URL: */
.ytd-channel-name a
/* Channel subscribers: */
#owner-sub-count
{
"api_key": "YOUR_PAGE2API_KEY",
"url": "https://www.youtube.com/watch?v=1WOQumXj0kg",
"real_browser": true,
"premium_proxy": "us",
"wait_for": "#segmented-like-button",
"parse": {
"title": "meta[name=title] >> content",
"likes": "#segmented-like-button span[role=text] >> text",
"views": ".ytd-watch-metadata span[dir=auto]:nth-of-type(1) >> text",
"uploaded": ".ytd-watch-metadata span[dir=auto]:nth-of-type(3) >> text",
"channel_name": ".ytd-channel-name a >> text",
"channel_url": ".ytd-channel-name a >> href",
"channel_subscribers": "#owner-sub-count >> text"
}
}
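The payload above targets one video URL. To scrape several videos, one option is to generate the same payload for each video ID. The sketch below only builds the JSON request bodies (sending each one works exactly like the Ruby example that follows); the helper video_payload and the constant VIDEO_FIELDS are hypothetical names, and the second video ID is taken from the channel sample output further down.

```ruby
require 'json'

# Selectors from the video payload above (Page2API ">> attribute" syntax).
VIDEO_FIELDS = {
  title: "meta[name=title] >> content",
  likes: "#segmented-like-button span[role=text] >> text",
  views: ".ytd-watch-metadata span[dir=auto]:nth-of-type(1) >> text",
  uploaded: ".ytd-watch-metadata span[dir=auto]:nth-of-type(3) >> text",
  channel_name: ".ytd-channel-name a >> text",
  channel_url: ".ytd-channel-name a >> href",
  channel_subscribers: "#owner-sub-count >> text"
}.freeze

# Hypothetical helper: builds the request body for a single YouTube video ID.
def video_payload(video_id, api_key: "YOUR_PAGE2API_KEY")
  {
    api_key: api_key,
    url: "https://www.youtube.com/watch?v=#{video_id}",
    real_browser: true,
    premium_proxy: "us",
    wait_for: "#segmented-like-button",
    parse: VIDEO_FIELDS
  }
end

# Build one JSON request body per video ID.
video_ids = %w[1WOQumXj0kg LpoUBqUiXkc]
bodies = video_ids.map { |id| video_payload(id).to_json }
puts bodies.first
```

Keeping the selectors in one constant means a YouTube layout change only has to be fixed in one place.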
require 'rest_client'
require 'json'
api_url = "https://www.page2api.com/api/v1/scrape"
payload = {
api_key: "YOUR_PAGE2API_KEY",
url: "https://www.youtube.com/watch?v=1WOQumXj0kg",
real_browser: true,
premium_proxy: "us",
wait_for: "#segmented-like-button",
parse: {
title: "meta[name=title] >> content",
likes: "#segmented-like-button span[role=text] >> text",
views: ".ytd-watch-metadata span[dir=auto]:nth-of-type(1) >> text",
uploaded: ".ytd-watch-metadata span[dir=auto]:nth-of-type(3) >> text",
channel_name: ".ytd-channel-name a >> text",
channel_url: ".ytd-channel-name a >> href",
channel_subscribers: "#owner-sub-count >> text"
}
}
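# Send the payload to the Page2API scrape endpoint as a JSON POST request
response = RestClient::Request.execute(
method: :post,
payload: payload.to_json,
url: api_url,
headers: { "Content-type" => "application/json" },
).body
# Parse the JSON response into a Ruby hash and print it
result = JSON.parse(response)
puts(result)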
{
"result": {
"title": "How to Scrape Data From Another Website Into Bubble.io | Bubble Tutorials | Planetnocode.Com",
"likes": "49",
"views": "4.3K views",
"uploaded": "6 months ago",
"channel_name": "PlanetNoCode",
"channel_url": "https://www.youtube.com/@planetnocode9104",
"channel_subscribers": "1.04K subscribers"
} ...
}
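The scraped values are plain strings in YouTube's abbreviated notation ("4.3K views", "1.04K subscribers"). If you need numbers, they can be converted after the request. The helper below is a hypothetical sketch that assumes the English K/M/B suffixes seen in the sample output; since YouTube rounds these figures, the result is only approximate.

```ruby
# Multipliers for YouTube's abbreviated counts (English formatting assumed).
MULTIPLIERS = { "K" => 1_000, "M" => 1_000_000, "B" => 1_000_000_000 }.freeze

# Hypothetical helper: "4.3K views" -> 4300, "1.04K subscribers" -> 1040.
# Commas are stripped first ("1,234 views" -> 1234).
# Returns nil when the text contains no number (e.g. "No views").
def parse_count(text)
  match = text.to_s.delete(",").match(/(\d+(?:\.\d+)?)([KMB])?/)
  return nil unless match

  (match[1].to_f * MULTIPLIERS.fetch(match[2], 1)).round
end

puts parse_count("4.3K views")        # => 4300
puts parse_count("1.04K subscribers") # => 1040
puts parse_count("49")                # => 49
```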
To scrape the videos of a YouTube channel, first open YouTube and search for the desired channel, for example:
https://www.youtube.com/@planetnocode9104
The page should look similar to the following one:
From this page, we will scrape the following attributes:
Channel Details
/* Title: */
#text-container .ytd-channel-name
/* Subscribers: */
#subscriber-count
/* Thumbnail: */
img#img
Videos
/* Parent: */
#content.style-scope.ytd-rich-item-renderer
/* Title: */
#video-title
/* Badge: */
.badge
/* URL: */
a#thumbnail
/* Views: */
#metadata-line .inline-metadata-item.style-scope.ytd-video-meta-block:nth-of-type(1)
/* Uploaded: */
#metadata-line .inline-metadata-item.style-scope.ytd-video-meta-block:nth-of-type(2)
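Now, let's handle the pagination. The channel page loads more videos as you scroll down, so we can trigger the next batch by scrolling every loaded video into view with this JavaScript snippet: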
document.querySelectorAll("#content.style-scope.ytd-rich-item-renderer").forEach(e => e.scrollIntoView({behavior: 'smooth'}))
Now let's build the request that scrapes the channel details together with all the videos loaded on the YouTube channel page.
The following examples show how to scrape 2 pages of videos from the channel's Videos tab (note the /videos suffix in the URL).
{
"api_key": "YOUR_PAGE2API_KEY",
"url": "https://www.youtube.com/@planetnocode9104/videos",
"real_browser": true,
"premium_proxy": "us",
"parse": {
"title": "#text-container .ytd-channel-name >> text",
"subscribers": "#subscriber-count >> text",
"thumbnail": "img#img >> src",
"latest_videos": [
{
"_parent": "#content.style-scope.ytd-rich-item-renderer",
"badge": ".badge >> text",
"title": "#video-title >> text",
"url": "a#thumbnail >> href",
"views": "#metadata-line .inline-metadata-item.style-scope.ytd-video-meta-block:nth-of-type(1) >> text",
"uploaded": "#metadata-line .inline-metadata-item.style-scope.ytd-video-meta-block:nth-of-type(2) >> text"
}
]
},
"scenario": [
{ "wait_for": "#text-container .ytd-channel-name" },
{
"loop": [
{ "execute_js": "document.querySelectorAll('#content.style-scope.ytd-rich-item-renderer').forEach(e => e.scrollIntoView({behavior: 'smooth'}))" },
{ "wait": 1 }
],
"iterations": 2
},
{ "execute": "parse" }
]
}
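The "scenario" above waits for the channel name, runs the scroll snippet in a loop (scroll, then wait one second), and only then parses the page, so the loop's "iterations" value is what you change to load more or fewer videos. The sketch below generates the same scenario for a configurable number of scroll rounds; channel_scenario and SCROLL_JS are hypothetical names, and how many videos each extra round yields depends on how YouTube lazy-loads the page.

```ruby
require 'json'

# JavaScript from the pagination step: scroll every loaded video into view
# so that YouTube loads the next batch.
SCROLL_JS = "document.querySelectorAll('#content.style-scope.ytd-rich-item-renderer')" \
            ".forEach(e => e.scrollIntoView({behavior: 'smooth'}))"

# Hypothetical helper: builds the Page2API scenario with a configurable
# number of scroll rounds (the example above uses 2).
def channel_scenario(scroll_rounds, wait_seconds: 1)
  [
    { wait_for: "#text-container .ytd-channel-name" },
    {
      loop: [
        { execute_js: SCROLL_JS },
        { wait: wait_seconds }
      ],
      iterations: scroll_rounds
    },
    { execute: "parse" }
  ]
end

puts JSON.pretty_generate(channel_scenario(5))
```

The result can be assigned to the scenario key of the Ruby payload, e.g. payload[:scenario] = channel_scenario(5).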
require 'rest_client'
require 'json'
api_url = "https://www.page2api.com/api/v1/scrape"
payload = {
api_key: "YOUR_PAGE2API_KEY",
url: "https://www.youtube.com/@planetnocode9104/videos",
real_browser: true,
premium_proxy: "us",
parse: {
title: "#text-container .ytd-channel-name >> text",
subscribers: "#subscriber-count >> text",
thumbnail: "img#img >> src",
latest_videos: [
{
_parent: "#content.style-scope.ytd-rich-item-renderer",
badge: ".badge >> text",
title: "#video-title >> text",
url: "a#thumbnail >> href",
views: "#metadata-line .inline-metadata-item.style-scope.ytd-video-meta-block:nth-of-type(1) >> text",
uploaded: "#metadata-line .inline-metadata-item.style-scope.ytd-video-meta-block:nth-of-type(2) >> text"
}
]
},
scenario: [
{ wait_for: "#text-container .ytd-channel-name" },
{
loop: [
{ execute_js: "document.querySelectorAll('#content.style-scope.ytd-rich-item-renderer').forEach(e => e.scrollIntoView({behavior: 'smooth'}))" },
{ wait: 1 }
],
iterations: 2
},
{ execute: "parse" }
]
}
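# Send the payload to the Page2API scrape endpoint as a JSON POST request
response = RestClient::Request.execute(
method: :post,
payload: payload.to_json,
url: api_url,
headers: { "Content-type" => "application/json" },
).body
# Parse the JSON response into a Ruby hash and print it
result = JSON.parse(response)
puts(result)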
{
"result": {
"title": "PlanetNoCode",
"subscribers": "1.05K subscribers",
"thumbnail": "https://yt3.googleusercontent.com/eZ2I1ah_TfO4Go8oagPCmbwSdplTY6O0663Yjxney_fpBLngPJD6nN3fMrAb_OBMjQABY2vBxg=s88-c-k-c0x00ffffff-no-rj",
"latest_videos": [
{
"badge": "Bubble Tutorials Library",
"title": "3 ways to edit calendar events in Bubble.io | Bubble.io Tutorials | Planetnocode.com",
"url": "https://www.youtube.com/watch?v=1RGgw0lSPM0",
"views": "149 views",
"uploaded": "2 weeks ago"
},
{
"badge": "Bubble Tutorials Library",
"title": "Add a calendar to a Bubble.io app | Bubble.io Tutorials | Planetnocode.com",
"url": "https://www.youtube.com/watch?v=LpoUBqUiXkc",
"views": "212 views",
"uploaded": "3 weeks ago"
},
{
"badge": "Bubble Tutorials Library",
"title": "Using split by and database triggers in Bubble.io | Bubble.io Tutorials | Planetnocode.com",
"url": "https://www.youtube.com/watch?v=0XR9YA1n2cQ",
"views": "199 views",
"uploaded": "3 weeks ago"
}, ...
]
} ...
}
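The scraped list can also be post-processed in Ruby instead of being imported into a spreadsheet. The sketch below adds a video ID column (read from the "v" query parameter of each watch URL) and builds a CSV with Ruby's standard library. It assumes the parsed response from the Ruby example above; to stay self-contained, it hard-codes an excerpt of the sample response. The helper name video_id and the column order are our own choices, not Page2API's.

```ruby
require 'csv'
require 'uri'

# Hypothetical helper: returns the "v" query parameter of a YouTube watch URL,
# or nil when the URL has no query string.
def video_id(url)
  query = URI.parse(url).query
  return nil if query.nil?

  URI.decode_www_form(query).to_h["v"]
end

# Excerpt of the sample response above.
# With a live request this would be: result["result"]["latest_videos"]
videos = [
  { "badge" => "Bubble Tutorials Library",
    "title" => "3 ways to edit calendar events in Bubble.io | Bubble.io Tutorials | Planetnocode.com",
    "url" => "https://www.youtube.com/watch?v=1RGgw0lSPM0",
    "views" => "149 views", "uploaded" => "2 weeks ago" },
  { "badge" => "Bubble Tutorials Library",
    "title" => "Add a calendar to a Bubble.io app | Bubble.io Tutorials | Planetnocode.com",
    "url" => "https://www.youtube.com/watch?v=LpoUBqUiXkc",
    "views" => "212 views", "uploaded" => "3 weeks ago" }
]

# Column order is our own choice.
columns = %w[title url views uploaded badge]

# Header row first, then one row per video with its ID prepended.
csv_text = CSV.generate do |csv|
  csv << ["id", *columns]
  videos.each do |video|
    csv << [video_id(video["url"]), *video.values_at(*columns)]
  end
end

puts csv_text
```

To save the result, write csv_text to a file, for example with File.write.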
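To import the channel videos into Google Sheets, add the "raw" parameter to the payload above, so that the scraped data is returned in CSV format instead of JSON:
"raw": {
"key": "videos", "format": "csv"
}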
The URL with encoded payload will be:
The result should look like the following one:
Collecting data from YouTube manually can be overwhelming and hard to scale.
However, a web scraping API can help you overcome this challenge and scrape YouTube in no time.
With Page2API, you can quickly access the data you need and spend the time you save on more important things!