Twitter is a microblogging and social networking service owned by American company Twitter, Inc., on which users post and interact with messages known as "tweets".
In this article, you will read about the easiest way to web scrape Twitter data with Page2API.
You will find code examples for Ruby, Python, PHP, NodeJS, cURL, and a No-Code solution that will import Twitter data into Google Sheets.
You can scrape Twitter data, such as account details and tweets, for a variety of research and analysis purposes.
To start scraping Twitter, you will need a Page2API account and your API key.
First, open the profile page you are interested in, for example:
https://twitter.com/IndieHackers
The page will look like the following one:
From this page, we will scrape the following attributes:
Twitter Account data
/* Name: */
[data-testid=UserName] span
/* Verified: */
document.querySelectorAll('[data-testid=UserName] svg[role=img]').length == 1
/* Description: */
[data-testid=UserDescription]
/* Location: */
span[data-testid=UserLocation]
/* User URL: */
a[data-testid=UserUrl]
/* Join date: */
span[data-testid=UserJoinDate]
/* Following: */
a[href*=following] > span > span
/* Followers: */
a[href*=followers] > span > span
Latest tweets
/* Parent: */
[data-testid=tweet]
/* Name: */
[data-testid='User-Names'] a
/* User URL: */
[data-testid='User-Names'] a
/* Content: */
div[data-testid=tweetText]
/* Timestamp: */
time
/* Retweets: */
div[data-testid=retweet]
/* Likes: */
div[data-testid=like]
/* Replies: */
div[data-testid=reply]
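These selectors follow Page2API's `CSS >> extraction` convention, as seen in the payload below: the part before `>>` is a regular CSS selector, and the part after names what to extract (`text`, `href`, `datetime`, etc.). A tiny illustrative Ruby sketch of how such a selector string decomposes (Page2API does the actual parsing server-side; this is just to show the two halves):

```ruby
selector = "a[data-testid=UserUrl] >> text"

# Split a Page2API-style selector into its CSS and extraction parts.
css, extraction = selector.split(' >> ')

puts css        # a[data-testid=UserUrl]
puts extraction # text
```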
{
  "api_key": "YOUR_PAGE2API_KEY",
  "url": "https://twitter.com/IndieHackers",
  "parse": {
    "name": "[data-testid=UserName] span >> text",
    "verified": "js >> document.querySelectorAll('[data-testid=UserName] svg[role=img]').length == 1",
    "description": "[data-testid=UserDescription] >> text",
    "location": "span[data-testid=UserLocation] >> text",
    "url": "a[data-testid=UserUrl] >> text",
    "join_date": "span[data-testid=UserJoinDate] >> text",
    "following": "a[href*=following] > span > span >> text",
    "followers": "a[href*=followers] > span > span >> text",
    "latest_tweets": [
      {
        "_parent": "[data-testid=tweet]",
        "name": "[data-testid='User-Names'] a >> text",
        "user_url": "[data-testid='User-Names'] a >> href",
        "content": "div[data-testid=tweetText] >> text",
        "timestamp": "time >> datetime",
        "retweets": "div[data-testid=retweet] >> text",
        "likes": "div[data-testid=like] >> text",
        "replies": "div[data-testid=reply] >> text"
      }
    ]
  },
  "wait_for": "[data-testid=tweet]",
  "premium_proxy": "us",
  "real_browser": true
}
require 'rest-client' # gem install rest-client
require 'json'

api_url = "https://www.page2api.com/api/v1/scrape"
payload = {
  api_key: "YOUR_PAGE2API_KEY",
  url: "https://twitter.com/IndieHackers",
  parse: {
    name: "[data-testid=UserName] span >> text",
    verified: "js >> document.querySelectorAll('[data-testid=UserName] svg[role=img]').length == 1",
    description: "[data-testid=UserDescription] >> text",
    location: "span[data-testid=UserLocation] >> text",
    url: "a[data-testid=UserUrl] >> text",
    join_date: "span[data-testid=UserJoinDate] >> text",
    following: "a[href*=following] > span > span >> text",
    followers: "a[href*=followers] > span > span >> text",
    latest_tweets: [
      {
        _parent: "[data-testid=tweet]",
        name: "[data-testid='User-Names'] a >> text",
        user_url: "[data-testid='User-Names'] a >> href",
        content: "div[data-testid=tweetText] >> text",
        timestamp: "time >> datetime",
        retweets: "div[data-testid=retweet] >> text",
        likes: "div[data-testid=like] >> text",
        replies: "div[data-testid=reply] >> text"
      }
    ]
  },
  wait_for: "[data-testid=tweet]",
  premium_proxy: "us",
  real_browser: true
}

response = RestClient::Request.execute(
  method: :post,
  payload: payload.to_json,
  url: api_url,
  headers: { "Content-type" => "application/json" }
).body

result = JSON.parse(response)
puts result
{
  "result": {
    "name": "Indie Hackers",
    "verified": false,
    "description": "Get inspired! Real stories, advice, and revenue numbers from the founders of profitable businesses by @csallen and @channingallen at @stripe",
    "location": "San Francisco, CA",
    "url": "IndieHackers.com",
    "join_date": "Joined July 2016",
    "following": "1,211",
    "followers": "91.4K",
    "latest_tweets": [
      {
        "name": "Indie Hackers",
        "user_url": "https://twitter.com/IndieHackers",
        "content": "Tweet @IndieHackers or use hashtag #indiehackers and we'll retweet genuine questions and requests. We've got over 90,000 followers who can potentially help you!",
        "timestamp": "2022-11-14T16:37:42.000Z",
        "retweets": "5",
        "likes": "21",
        "replies": "8"
      },
      {
        "name": "Indie Hackers",
        "user_url": "https://twitter.com/IndieHackers",
        "content": "Have you ever worked with a virtual assistant (personally or for your business)? If you have, leave a comment on what sorts of tasks your virtual assistant helped you with!",
        "timestamp": "2022-11-17T17:25:32.000Z",
        "retweets": "",
        "likes": "6",
        "replies": "6"
      },
      {
        "name": "@levelsio",
        "user_url": "https://twitter.com/levelsio",
        "content": "Seeing @elonmusk ship like an indie hacker",
        "timestamp": "2022-11-14T16:36:06.000Z",
        "retweets": "38",
        "likes": "925",
        "replies": "41"
      },
      {
        "name": "Charlie Ward",
        "user_url": "https://twitter.com/charlierward",
        "content": "If you make $10k+ MRR, you're currently in the top ~75 products on @IndieHackers (where Stripe revenue is verified).",
        "timestamp": "2022-11-10T12:43:01.000Z",
        "retweets": "5",
        "likes": "85",
        "replies": "9"
      },
      {
        "name": "Indie Hackers",
        "user_url": "https://twitter.com/IndieHackers",
        "content": "Let us pls know if anyone still has trouble reaching out via DMs",
        "timestamp": "2022-11-10T14:36:06.000Z",
        "retweets": "1",
        "likes": "",
        "replies": "1"
      }
    ]
  },
  ...
}
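Once parsed, the response is a plain Ruby hash. As the sample output shows, the counters come back as display strings ("91.4K", "1,211", sometimes empty), so you may want to normalize them into integers. A small sketch with data shaped like the output above (the `to_count` helper is our own, not part of Page2API):

```ruby
require 'json'

# Sample response, trimmed to the fields we need (same shape as above).
response = <<~JSON
  {
    "result": {
      "name": "Indie Hackers",
      "followers": "91.4K",
      "following": "1,211",
      "latest_tweets": [
        { "name": "Indie Hackers", "likes": "21", "retweets": "" }
      ]
    }
  }
JSON

# Convert Twitter's abbreviated counters ("91.4K", "1,211", "") into integers.
def to_count(str)
  return 0 if str.nil? || str.empty?
  multiplier = { 'K' => 1_000, 'M' => 1_000_000 }[str[-1]] || 1
  (str.delete(',').to_f * multiplier).round
end

result    = JSON.parse(response)['result']
followers = to_count(result['followers']) # 91400
following = to_count(result['following']) # 1211
```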
First, we need to open the Twitter search page with the desired hashtag.
https://twitter.com/hashtag/NoCode
The page should look similar to the following one:
From this page, we will scrape the following attributes for each tweet:
/* Parent: */
[data-testid=tweet]
/* Name: */
[data-testid='User-Names'] a
/* User URL: */
[data-testid='User-Names'] a
/* Content: */
div[data-testid=tweetText]
/* Timestamp: */
time
/* Retweets: */
div[data-testid=retweet]
/* Likes: */
div[data-testid=like]
/* Replies: */
div[data-testid=reply]
Now, let's handle the pagination. Twitter's search results use infinite scroll, so new tweets are loaded as the existing ones are scrolled into view. The following JavaScript snippet scrolls each rendered tweet into view to trigger that loading:
document.querySelectorAll('[data-testid=tweet]').forEach(e => e.scrollIntoView({ behavior: 'smooth' }));
Now, let's build the request that will scrape all the tweets returned by the search page.
The following examples show how to scrape 3 pages of tweets from Twitter's search page.
{
  "api_key": "YOUR_PAGE2API_KEY",
  "url": "https://twitter.com/hashtag/NoCode",
  "parse": {
    "tweets": [
      {
        "_parent": "[data-testid=tweet]",
        "name": "[data-testid='User-Names'] a >> text",
        "user_url": "[data-testid='User-Names'] a >> href",
        "content": "div[data-testid=tweetText] >> text",
        "timestamp": "time >> datetime",
        "retweets": "div[data-testid=retweet] >> text",
        "likes": "div[data-testid=like] >> text",
        "replies": "div[data-testid=reply] >> text"
      }
    ]
  },
  "scenario": [
    {
      "loop": [
        { "wait_for": "[data-testid=tweet]" },
        { "wait": 2 },
        { "execute_js": "document.querySelectorAll('[data-testid=tweet]').forEach(e => e.scrollIntoView({behavior: 'smooth'}));" }
      ],
      "iterations": 3
    },
    { "execute": "parse" }
  ],
  "premium_proxy": "us",
  "real_browser": true
}
require 'rest-client' # gem install rest-client
require 'json'

api_url = "https://www.page2api.com/api/v1/scrape"
payload = {
  api_key: "YOUR_PAGE2API_KEY",
  url: "https://twitter.com/hashtag/NoCode",
  parse: {
    tweets: [
      {
        _parent: "[data-testid=tweet]",
        name: "[data-testid='User-Names'] a >> text",
        user_url: "[data-testid='User-Names'] a >> href",
        content: "div[data-testid=tweetText] >> text",
        timestamp: "time >> datetime",
        retweets: "div[data-testid=retweet] >> text",
        likes: "div[data-testid=like] >> text",
        replies: "div[data-testid=reply] >> text"
      }
    ]
  },
  scenario: [
    {
      loop: [
        { wait_for: "[data-testid=tweet]" },
        { wait: 2 },
        { execute_js: "document.querySelectorAll('[data-testid=tweet]').forEach(e => e.scrollIntoView({behavior: 'smooth'}));" }
      ],
      iterations: 3
    },
    { execute: "parse" }
  ],
  premium_proxy: "us",
  real_browser: true
}

response = RestClient::Request.execute(
  method: :post,
  payload: payload.to_json,
  url: api_url,
  headers: { "Content-type" => "application/json" }
).body

result = JSON.parse(response)
puts result
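The parsed tweets are plain Ruby hashes, so you can also export them locally, for example to CSV with Ruby's standard library. A sketch with sample data shaped like the output below:

```ruby
require 'csv'

# Sample tweets, shaped like result['result']['tweets'] from the API response.
tweets = [
  { 'name' => 'Hazel Lim',    'likes' => '2',  'content' => 'The Solopreneur Runway Model v1 is here!' },
  { 'name' => 'Andreas Just', 'likes' => '25', 'content' => 'Is it me or is building with #nocode as exciting as building with Lego used to be.' }
]

headers = %w[name likes content]
csv = CSV.generate do |out|
  out << headers
  tweets.each { |t| out << t.values_at(*headers) }
end
puts csv
```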
{
  "result": {
    "tweets": [
      {
        "name": "Hazel Lim",
        "user_url": "https://twitter.com/byhazellim",
        "content": "The Solopreneur Runway Model v1 is here! It's a 20min worksheet that helps you: -Decide if you can quit your job -Your runway -How many sales to achieve your goals Yours free for next 36 hrs -RT -Reply I'll DM. Must be following #nocode #indiehackers",
        "timestamp": "2022-11-20T14:34:00.000Z",
        "retweets": "2",
        "likes": "2",
        "replies": "2"
      },
      {
        "name": "Andreas Just",
        "user_url": "https://twitter.com/justnocode",
        "content": "Is it me or is building with #nocode as exciting as building with Lego used to be.",
        "timestamp": "2022-11-17T09:23:10.000Z",
        "retweets": "6",
        "likes": "25",
        "replies": "8"
      },
      {
        "name": "LJA",
        "user_url": "https://twitter.com/bubbling_hot",
        "content": "No coding with @bubble for 9 months, launched an app and have paying customers and finally understood Custom states today why something so small and simple took so long I'll never know #nocode #bubble #SaaS",
        "timestamp": "2022-11-19T21:20:13.000Z",
        "retweets": "2",
        "likes": "63",
        "replies": "7"
      },
      {
        "name": "Mustafa Tasci",
        "user_url": "https://twitter.com/imtasci",
        "content": "Notion + AI = magic Join me in the alpha waitlist! https://notion.so/product/ai?wr=8d378ae4a2f761a1&utm_source=notionFront&utm_medium=twitter&utm_campaign=ai-beta&utm_content=share… #nocode",
        "timestamp": "2022-11-20T14:28:09.000Z",
        "retweets": "1",
        "likes": "3",
        "replies": ""
      },
      ...
    ]
  },
  ...
}
"raw": {
"key": "tweets", "format": "csv"
}
The URL with encoded payload will be:
Press 'Encode'
Note: If you are reading this article being logged in - you can copy the link above since it will already have your api_key in the encoded payload.
The result must look like the following one:
Collecting the data from Twitter manually can be a bit overwhelming and hard to scale.
However, a Web Scraping API can easily help you overcome this challenge and perform Twitter scraping in no time.
With Page2API you can quickly get access to the data you need, and use the time you saved on more important things!