How to Scrape Twitter: Account Data, Tweets, and more (Code & No Code)


2022-11-20 - 4 min read

Nicolae Rotaru

Introduction

Twitter is a microblogging and social networking service owned by the American company Twitter, Inc., on which users post and interact with messages known as "tweets".


In this article, you will read about the easiest way to web scrape Twitter data with Page2API.


You will find code examples for Ruby, Python, PHP, NodeJS, cURL, and a No-Code solution that will import Twitter data into Google Sheets.


You can scrape Twitter data, such as account details and tweets, to perform:

  • sentiment analysis
  • trends analysis
  • or simply to monitor a specific topic or an account


In this article, we will learn how to:

  • Scrape Twitter account data
  • Scrape tweets by hashtag
  • Export Twitter data to Google Sheets

Prerequisites

To start scraping Twitter, you will need the following things:


  • A Page2API account
  • A Twitter profile you want to scrape; let's use Indie Hackers as an example

How to scrape Twitter Account Data

The first thing we need to do is open the profile page we are interested in.


In our case the URL will be:

  
    https://twitter.com/IndieHackers


The page will look like the following one:

Twitter profile page

From this page, we will scrape the following attributes:

Twitter Account data

  • Name
  • Verified
  • Description
  • Location
  • User URL
  • Join date
  • Following
  • Followers

Latest tweets
  • Name
  • User URL
  • Content
  • Timestamp
  • Retweets
  • Likes
  • Replies

Let's define the selectors for each attribute.

Twitter Account data selectors
  
    /* Name: */
    [data-testid=UserName] span

    /* Verified: */
    document.querySelectorAll('[data-testid=UserName] svg[role=img]').length == 1

    /* Description: */
    [data-testid=UserDescription]

    /* Location: */
    span[data-testid=UserLocation]

    /* User URL: */
    a[data-testid=UserUrl]

    /* Join date: */
    span[data-testid=UserJoinDate]

    /* Following: */
    a[href*=following] > span > span

    /* Followers: */
    a[href*=followers] > span > span
  

Latest tweets selectors
  
    /* Parent: */
    [data-testid=tweet]

    /* Name: */
    [data-testid='User-Names'] a

    /* User URL: */
    [data-testid='User-Names'] a

    /* Content: */
    div[data-testid=tweetText]

    /* Timestamp: */
    time

    /* Retweets: */
    div[data-testid=retweet]

    /* Likes: */
    div[data-testid=like]

    /* Replies: */
    div[data-testid=reply]
  
The payload for our scraping request will be:

  
    {
      "api_key": "YOUR_PAGE2API_KEY",
      "url": "https://twitter.com/IndieHackers",
      "parse": {
        "name": "[data-testid=UserName] span >> text",
        "verified": "js >> document.querySelectorAll('[data-testid=UserName] svg[role=img]').length == 1",
        "description": "[data-testid=UserDescription] >> text",
        "location": "span[data-testid=UserLocation] >> text",
        "url": "a[data-testid=UserUrl] >> text",
        "join_date": "span[data-testid=UserJoinDate] >> text",
        "following": "a[href*=following] > span > span >> text",
        "followers": "a[href*=followers] > span > span >> text",
        "latest_tweets": [
          {
            "_parent": "[data-testid=tweet]",
            "name": "[data-testid='User-Names'] a >> text",
            "user_url": "[data-testid='User-Names'] a >> href",
            "content": "div[data-testid=tweetText] >> text",
            "timestamp": "time >> datetime",
            "retweets": "div[data-testid=retweet] >> text",
            "likes": "div[data-testid=like] >> text",
            "replies": "div[data-testid=reply] >> text"
          }
        ]
      },
      "wait_for": "[data-testid=tweet]",
      "premium_proxy": "us",
      "real_browser": true
    }
  

Running the scraping request

      
    require 'rest_client'
    require 'json'

    api_url = "https://www.page2api.com/api/v1/scrape"
    payload = {
      api_key: 'YOUR_PAGE2API_KEY',
      url: "https://twitter.com/IndieHackers",
      parse: {
        name: "[data-testid=UserName] span >> text",
        verified: "js >> document.querySelectorAll('[data-testid=UserName] svg[role=img]').length == 1",
        description: "[data-testid=UserDescription] >> text",
        location: "span[data-testid=UserLocation] >> text",
        url: "a[data-testid=UserUrl] >> text",
        join_date: "span[data-testid=UserJoinDate] >> text",
        following: "a[href*=following] > span > span >> text",
        followers: "a[href*=followers] > span > span >> text",
        latest_tweets: [
          {
            _parent: "[data-testid=tweet]",
            name: "[data-testid='User-Names'] a >> text",
            user_url: "[data-testid='User-Names'] a >> href",
            content: "div[data-testid=tweetText] >> text",
            timestamp: "time >> datetime",
            retweets: "div[data-testid=retweet] >> text",
            likes: "div[data-testid=like] >> text",
            replies: "div[data-testid=reply] >> text"
          }
        ]
      },
      wait_for: "[data-testid=tweet]",
      premium_proxy: "us",
      real_browser: true
    }

    response = RestClient::Request.execute(
      method: :post,
      payload: payload.to_json,
      url: api_url,
      headers: { "Content-type" => "application/json" },
    ).body

    result = JSON.parse(response)

    print(result)
      
    

The result

  
    {
      "result": {
        "name": "Indie Hackers",
        "verified": false,
        "description": "Get inspired! Real stories, advice, and revenue numbers from the founders of profitable businesses by @csallen and @channingallen at @stripe",
        "location": "San Francisco, CA",
        "url": "IndieHackers.com",
        "join_date": "Joined July 2016",
        "following": "1,211",
        "followers": "91.4K",
        "latest_tweets": [
          {
            "name": "Indie Hackers",
            "user_url": "https://twitter.com/IndieHackers",
            "content": "Tweet @IndieHackers or use hashtag #indiehackers and we'll retweet genuine questions and requests. We've got over 90,000 followers who can potentially help you!",
            "timestamp": "2022-11-14T16:37:42.000Z",
            "retweets": "5",
            "likes": "21",
            "replies": "8"
          },
          {
            "name": "Indie Hackers",
            "user_url": "https://twitter.com/IndieHackers",
            "content": "Have you ever worked with a virtual assistant (personally or for your business)? If you have, leave a comment on what sorts of tasks your virtual assistant helped you with!",
            "timestamp": "2022-11-17T17:25:32.000Z",
            "retweets": "",
            "likes": "6",
            "replies": "6"
          },
          {
            "name": "@levelsio",
            "user_url": "https://twitter.com/levelsio",
            "content": "Seeing @elonmusk ship like an indie hacker",
            "timestamp": "2022-11-14T16:36:06.000Z",
            "retweets": "38",
            "likes": "925",
            "replies": "41"
          },
          {
            "name": "Charlie Ward",
            "user_url": "https://twitter.com/charlierward",
            "content": "If you make $10k+ MRR, you're currently in the top ~75 products on @IndieHackers (where Stripe revenue is verified).",
            "timestamp": "2022-11-10T12:43:01.000Z",
            "retweets": "5",
            "likes": "85",
            "replies": "9"
          },
          {
            "name": "Indie Hackers",
            "user_url": "https://twitter.com/IndieHackers",
            "content": "Let us pls know if anyone still has trouble reaching out via DMs",
            "timestamp": "2022-11-10T14:36:06.000Z",
            "retweets": "1",
            "likes": "",
            "replies": "1"
          }
        ]
      },
      ...
    }
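Note that numeric fields such as `followers` and `following` come back as display strings ("91.4K", "1,211"). A small helper like the one below (our own sketch, not part of Page2API) can normalize them into integers:

```ruby
# Convert Twitter-style display counts ("91.4K", "1,211", "1.2M", "")
# into plain integers for further analysis.
def parse_count(value)
  return 0 if value.nil? || value.strip.empty?

  cleaned = value.delete(",").strip
  multiplier = { "K" => 1_000, "M" => 1_000_000 }[cleaned[-1]]

  # "91.4K" -> 91.4 * 1000; "1211" -> 1211
  multiplier ? (cleaned[0..-2].to_f * multiplier).round : cleaned.to_i
end

puts parse_count("91.4K")  # 91400
puts parse_count("1,211")  # 1211
puts parse_count("")       # 0
```

Keep in mind that abbreviated counts are rounded by Twitter itself, so "91.4K" is only accurate to about a hundred followers.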
  

How to scrape Tweets by Hashtag

First, we need to open the Twitter search page with the desired hashtag.

In our case it will be:

  
    https://twitter.com/hashtag/NoCode


The page should look similar to the following one:

Twitter tweets page

From this page, we will scrape the following attributes for each tweet:

  • Name
  • User URL
  • Content
  • Timestamp
  • Retweets
  • Likes
  • Replies

Now, let's define the selectors for each attribute.

  
    /* Parent: */
    [data-testid=tweet]

    /* Name: */
    [data-testid='User-Names'] a

    /* User URL: */
    [data-testid='User-Names'] a

    /* Content: */
    div[data-testid=tweetText]

    /* Timestamp: */
    time

    /* Retweets: */
    div[data-testid=retweet]

    /* Likes: */
    div[data-testid=like]

    /* Replies: */
    div[data-testid=reply]
  

Now, let's handle the pagination.

To load more tweets, we simply need to scroll to the bottom:

  
    document.querySelectorAll('[data-testid=tweet]').forEach(e => e.scrollIntoView({ behavior: 'smooth' }));
  

Now let's build the request that will scrape all tweets that the search page returned.

The following examples will show how to scrape 3 pages of tweets from Twitter's search page.

The payload for our scraping request will be:

  
    {
      "api_key": "YOUR_PAGE2API_KEY",
      "url": "https://twitter.com/hashtag/NoCode",
      "parse": {
        "tweets": [
          {
            "_parent": "[data-testid=tweet]",
            "name": "[data-testid='User-Names'] a >> text",
            "user_url": "[data-testid='User-Names'] a >> href",
            "content": "div[data-testid=tweetText] >> text",
            "timestamp": "time >> datetime",
            "retweets": "div[data-testid=retweet] >> text",
            "likes": "div[data-testid=like] >> text",
            "replies": "div[data-testid=reply] >> text"
          }
        ]
      },
      "scenario": [
        {
          "loop": [
            { "wait_for": "[data-testid=tweet]" },
            { "wait": 2 },
            { "execute_js": "document.querySelectorAll('[data-testid=tweet]').forEach(e => e.scrollIntoView({behavior: 'smooth'}));" }
          ],
          "iterations": 3
        },
        { "execute": "parse" }
      ],
      "premium_proxy": "us",
      "real_browser": true
    }
  

Code examples

      
    require 'rest_client'
    require 'json'

    api_url = "https://www.page2api.com/api/v1/scrape"
    payload = {
      api_key: "YOUR_PAGE2API_KEY",
      url: "https://twitter.com/hashtag/NoCode",
      parse: {
        tweets: [
          {
            _parent: "[data-testid=tweet]",
            name: "[data-testid='User-Names'] a >> text",
            user_url: "[data-testid='User-Names'] a >> href",
            content: "div[data-testid=tweetText] >> text",
            timestamp: "time >> datetime",
            retweets: "div[data-testid=retweet] >> text",
            likes: "div[data-testid=like] >> text",
            replies: "div[data-testid=reply] >> text"
          }
        ]
      },
      scenario: [
        {
          loop: [
            { wait_for: "[data-testid=tweet]" },
            { wait: 2 },
            { execute_js: "document.querySelectorAll('[data-testid=tweet]').forEach(e => e.scrollIntoView({behavior: 'smooth'}));" }
          ],
          iterations: 3
        },
        { execute: "parse" }
      ],
      premium_proxy: "us",
      real_browser: true
    }

    response = RestClient::Request.execute(
      method: :post,
      payload: payload.to_json,
      url: api_url,
      headers: { "Content-type" => "application/json" },
    ).body

    result = JSON.parse(response)

    puts(result)
      
    

The result

  
    {
      "result": {
        "tweets": [
          {
            "name": "Hazel Lim",
            "user_url": "https://twitter.com/byhazellim",
            "content": "The Solopreneur Runway Model v1 is here! It's a 20min worksheet that helps you: -Decide if you can quit your job -Your runway -How many sales to achieve your goals Yours free for next 36 hrs -RT -Reply I'll DM. Must be following #nocode #indiehackers",
            "timestamp": "2022-11-20T14:34:00.000Z",
            "retweets": "2",
            "likes": "2",
            "replies": "2"
          },
          {
            "name": "Andreas Just",
            "user_url": "https://twitter.com/justnocode",
            "content": "Is it me or is building with #nocode as exciting as building with Lego used to be.",
            "timestamp": "2022-11-17T09:23:10.000Z",
            "retweets": "6",
            "likes": "25",
            "replies": "8"
          },
          {
            "name": "LJA",
            "user_url": "https://twitter.com/bubbling_hot",
            "content": "No coding with @bubble for 9 months, launched an app and have paying customers and finally understood Custom states today why something so small and simple took so long I'll never know #nocode #bubble #SaaS",
            "timestamp": "2022-11-19T21:20:13.000Z",
            "retweets": "2",
            "likes": "63",
            "replies": "7"
          },
          {
            "name": "Mustafa Tasci",
            "user_url": "https://twitter.com/imtasci",
            "content": "Notion + AI = magic Join me in the alpha waitlist! https://notion.so/product/ai?wr=8d378ae4a2f761a1&utm_source=notionFront&utm_medium=twitter&utm_campaign=ai-beta&utm_content=share… #nocode",
            "timestamp": "2022-11-20T14:28:09.000Z",
            "retweets": "1",
            "likes": "3",
            "replies": ""
          },
          ...
        ]
      },
      ...
    }
  
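Since the response is parsed into plain Ruby hashes, post-processing the scraped tweets takes only a few lines. The sketch below uses a trimmed, hard-coded sample of the response above to find the most-liked tweet:

```ruby
require 'json'

# A trimmed sample of the API response shown above.
response = '{
  "result": {
    "tweets": [
      {"name": "Hazel Lim",    "likes": "2"},
      {"name": "Andreas Just", "likes": "25"},
      {"name": "LJA",          "likes": "63"},
      {"name": "Mustafa Tasci", "likes": "3"}
    ]
  }
}'

tweets = JSON.parse(response).dig("result", "tweets")

# Empty strings (tweets with no likes yet) become 0 via to_i.
top = tweets.max_by { |t| t["likes"].to_i }

puts "Most liked: #{top["name"]} (#{top["likes"]} likes)"
# Most liked: LJA (63 likes)
```

The same approach works for sorting by retweets or filtering tweets by keyword before exporting them.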

How to export Twitter data to Google Sheets

To export our tweets to a Google Spreadsheet, we need to slightly modify our request so that it returns the data in CSV format instead of JSON.

According to the documentation, we need to add the following parameters to our payload:
  
    "raw": {
      "key": "tweets", "format": "csv"
    }
  

Now our payload will look like:

  
    {
      "api_key": "YOUR_PAGE2API_KEY",
      "raw": {
        "key": "tweets", "format": "csv"
      },
      "url": "https://twitter.com/hashtag/NoCode",
      "parse": {
        "tweets": [
          {
            "_parent": "[data-testid=tweet]",
            "name": "[data-testid='User-Names'] a >> text",
            "user_url": "[data-testid='User-Names'] a >> href",
            "content": "div[data-testid=tweetText] >> text",
            "timestamp": "time >> datetime",
            "retweets": "div[data-testid=retweet] >> text",
            "likes": "div[data-testid=like] >> text",
            "replies": "div[data-testid=reply] >> text"
          }
        ]
      },
      "scenario": [
        {
          "loop": [
            { "wait_for": "[data-testid=tweet]" },
            { "wait": 2 },
            { "execute_js": "document.querySelectorAll('[data-testid=tweet]').forEach(e => e.scrollIntoView({behavior: 'smooth'}));" }
          ],
          "iterations": 3
        },
        { "execute": "parse" }
      ],
      "premium_proxy": "us",
      "real_browser": true
    }
  

Next, edit the payload above if needed, and encode it into the request URL (the interactive version of this article provides an 'Encode' button that builds this link).

Note: if you encode the payload while logged in to Page2API, the generated link will already contain your api_key in the encoded payload.

The final part is adding the IMPORTDATA function, and we are ready to import our Twitter data into a Google Spreadsheet.
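In a spreadsheet cell, the formula would look like this (the placeholder stands for your encoded Page2API request link):

```
=IMPORTDATA("<your encoded Page2API request URL>")
```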

The result should look like the following one:

Twitter data import to Google Sheets

Final thoughts

Collecting the data from Twitter manually can be a bit overwhelming and hard to scale.
However, a Web Scraping API can easily help you overcome this challenge and perform Twitter scraping in no time.
With Page2API you can quickly get access to the data you need, and use the time you saved on more important things!
