How to Scrape Trustpilot Reviews and Perform Sentiment Analysis with AI


2023-04-30 - 10 min read

Nicolae Rotaru

Introduction

In today's business landscape, online reviews have become one of the most crucial factors in driving customer acquisition and loyalty.
Trustpilot.com, as a leading online review platform, hosts millions of reviews across different industries, helping customers make informed decisions and businesses improve their reputation.
However, analyzing these reviews manually can be a daunting task, especially for large corporations with thousands of reviews.
This is where scraping and sentiment analysis with AI can come in handy.

In this blog post, we will explore the step-by-step process of scraping Trustpilot reviews using Page2API, and then performing sentiment analysis on the extracted data using GPT-3.5-turbo.
By the end of this tutorial, you will be able to start building your own Trustpilot scraper that extracts valuable insights from online reviews and helps improve your business's reputation.

Prerequisites

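To start scraping Trustpilot reviews and analyzing them, we will need the following things:

  • A Page2API account (for the API key)
  • An OpenAI API key
  • Ruby with the rest-client gem installed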


How to scrape Trustpilot reviews

First, we need to open Mixbook's Trustpilot page.


The exact URL will be:

  
    https://www.trustpilot.com/review/mixbook.com


We will use this URL as the first parameter to start the scraping process.


The page that you see should look like the following one:

[Image: Trustpilot reviews page]

From the Trustpilot reviews page, we will scrape the following attributes from each review:

  • Title
  • Content

Now, let's define the selectors for each attribute.

  
    /* Parent: */
    [data-review-content=true]

    /* Title: */
    [data-service-review-title-typography=true]

    /* Content: */
    [data-service-review-text-typography=true]

  

Let's handle the pagination.
There are two approaches that can help us scrape all the needed pages:

1. We can scrape the pages using the batch scraping feature
2. We can iterate through the pages by clicking on the Next page button

To keep the article short enough, we will only cover the batch approach.


Now it's time to build the request that will scrape Trustpilot reviews.

The following examples show how to scrape 2 pages of reviews from Trustpilot.com.

With the batch scraping approach, our payload will look like:

  
    {
      "api_key": "YOUR_PAGE2API_KEY",
      "batch": {
        "urls": [
          "https://www.trustpilot.com/review/mixbook.com",
          "https://www.trustpilot.com/review/mixbook.com?page=2"
        ],
        "concurrency": 1,
        "merge_results": true
      },
      "parse": {
        "reviews": [
          {
            "title": "[data-service-review-title-typography=true] >> text",
            "_parent": "[data-review-content=true]",
            "content": "[data-service-review-text-typography=true] >> text"
          }
        ]
      }
    }
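
The `urls` array in the payload above can also be generated instead of typed by hand. The following is a minimal sketch, assuming Trustpilot paginates with a `page` query parameter as in the payload above; `build_review_urls` is a hypothetical helper name, not part of Page2API.

```ruby
require 'uri'

# Hypothetical helper: builds the list of review page URLs for a batch request.
# Assumes pagination uses a `page` query parameter, as in the payload above.
def build_review_urls(base_url, pages)
  (1..pages).map do |page_number|
    # The first page is the plain reviews URL.
    next base_url if page_number == 1

    uri = URI.parse(base_url)
    params = URI.decode_www_form(uri.query || '')
    params << ['page', page_number.to_s]
    uri.query = URI.encode_www_form(params)
    uri.to_s
  end
end

puts build_review_urls('https://www.trustpilot.com/review/mixbook.com', 2)
```

Going through `URI` keeps any query string that is already present in the base URL intact when the page number is appended.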
  

Code examples (batch scraping approach)

      
    require 'rest_client'
    require 'json'

    api_url = "https://www.page2api.com/api/v1/scrape"
    payload = {
      api_key: "YOUR_PAGE2API_KEY",
      batch: {
        urls: [
          "https://www.trustpilot.com/review/mixbook.com",
          "https://www.trustpilot.com/review/mixbook.com?page=2"
        ],
        concurrency: 1,
        merge_results: true
      },
      parse: {
        reviews: [
          {
            title: "[data-service-review-title-typography=true] >> text",
            _parent: "[data-review-content=true]",
            content: "[data-service-review-text-typography=true] >> text"
          }
        ]
      }
    }

    response = RestClient::Request.execute(
      method: :post,
      payload: payload.to_json,
      url: api_url,
      headers: { "Content-type" => "application/json" },
    ).body

    result = JSON.parse(response)

    puts(result)
      
    

The result

  
    {
      "result": {
        "reviews": [
          {
            "title": "There is ALMOST no limit to the number of pages",
            "content": "The thing I like best about Mixbook is the almost unlimited number of pages you can put into a book.. no other platform offers this. Add to that the quick turn around time and the number of different layouts available, and even with those I can easily alter them to fit my needs. The sitewide sales allow of better deals. I highy recommend Mixbook."
          },
          {
            "title": "I have used Mixbook many times and the…",
            "content": "I have used Mixbook many times and the service is always great. On this last order I could not get the order to go through and using the chat option the person went above and beyond taking me through every issue until i was able to get the item ordered."
          },
          {
            "title": "Making yearbooks for 10+ years",
            "content": "I have been making year books for my family for more than 10 years now . Mixbook is easy to use , with many options of backgrounds , stickers etc and the ability to create my own formats. I can access my old projects too . Mix book keeps adding backgrounds and other options continuously so I can enhance my photo books further ."
          }, ...
        ]
      }, ...
    }
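
To make the response shape concrete, here is a small sketch that parses a trimmed-down copy of the result above and prints each review title. The JSON string is a shortened sample based on the output shown above, not a live API response.

```ruby
require 'json'

# Trimmed sample of the response shown above (review content shortened).
sample_response = <<~SAMPLE
  {
    "result": {
      "reviews": [
        {
          "title": "There is ALMOST no limit to the number of pages",
          "content": "The thing I like best about Mixbook is the almost unlimited number of pages you can put into a book."
        },
        {
          "title": "Making yearbooks for 10+ years",
          "content": "I have been making year books for my family for more than 10 years now."
        }
      ]
    }
  }
SAMPLE

# The reviews live under result -> reviews.
reviews = JSON.parse(sample_response).dig('result', 'reviews')

reviews.each_with_index do |review, index|
  puts "#{index + 1}. #{review['title']}"
end
```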
  

How to summarize the reviews and perform the Sentiment Analysis with AI (GPT-3.5-turbo)

In the following part of the article, we will:


  • Collect the scraped Trustpilot reviews and clean them up a little bit.
  • Join the reviews into a single entity, separating each of them by a new line.
  • Build a GPT prompt.
  • Send the reviews content and the prompt to GPT.
  • Enjoy the results.

From the code perspective, we will:


  • Switch to Ruby, because Ruby is cool and easy to read.
  • Separate the code into two classes to enhance the readability.
  • Provide the possibility to change the reviews page and the number of total pages dynamically.

Let's start by creating a new file (gpt.rb) with the following structure

  
  require 'rest_client'
  require 'json'

  class Page2APIParser
    def initialize(url, pages)
    end

    def perform
    end
  end

  class GPTAnalyzer
    def initialize(reviews_content)
    end

    def perform
    end
  end

  reviews_url = ARGV[0] || raise('The reviews URL was not provided!')
  pages = ARGV[1].to_i.nonzero? || 1

  page2api = Page2APIParser.new(reviews_url, pages)
  page2api.perform

  gpt = GPTAnalyzer.new(page2api.reviews_content)
  gpt.perform

  puts gpt.result
  

This is our main script.
It receives 2 arguments: the Trustpilot reviews page URL, and the number of total pages to scrape.
The script can be called from the terminal like in the following examples:

For one page
  
    $ ruby gpt.rb https://www.trustpilot.com/review/mixbook.com
  

For multiple pages
  
    $ ruby gpt.rb https://www.trustpilot.com/review/mixbook.com 2
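
The argument handling in the main script falls back to one page when the second argument is missing or not a number. A slightly stricter sketch is shown here; `parse_arguments` is a hypothetical helper, and treating negative page counts as 1 is an added assumption rather than something the original script does.

```ruby
# Hypothetical helper: validates the command-line arguments of the script.
# Returns [reviews_url, pages]; pages defaults to 1 when missing or invalid.
def parse_arguments(argv)
  reviews_url = argv[0] || raise(ArgumentError, 'The reviews URL was not provided!')

  # nil.to_i and non-numeric strings both become 0.
  pages = argv[1].to_i
  pages = 1 unless pages.positive?

  [reviews_url, pages]
end

p parse_arguments(['https://www.trustpilot.com/review/mixbook.com', '2'])
p parse_arguments(['https://www.trustpilot.com/review/mixbook.com'])
```

In the script itself, the helper would be called as `reviews_url, pages = parse_arguments(ARGV)`.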
  

Now let's use the code from the first part of the article and build the parser

  
  require 'rest_client'
  require 'json'

  class Page2APIParser
    API_KEY = ''

    attr_reader :url, :pages, :reviews_content

    def initialize(url, pages)
      @url = url
      @pages = pages
    end

    def perform
      response = RestClient::Request.execute(
        method: :post,
        payload: payload.to_json,
        url: 'https://www.page2api.com/api/v1/scrape',
        headers: { "Content-type" => "application/json" },
      ).body

      reviews = JSON.parse(response)

      # Iterate through all the reviews: if the review title is already contained
      # in the review body (aka content), the title is ignored.
      # Otherwise, the title is prepended to the review content.

      compacted_reviews = reviews.map do |review|
        title = review['title'].gsub('…', '')
        content = review['content']

        content.include?(title) ? content : "#{title}. #{content}"
      end

      @reviews_content = compacted_reviews.join("\n\n")
    end

    private

    def payload
      {
        api_key: API_KEY,
        batch: {
          urls: reviews_urls,
          concurrency: 1,
          merge_results: true
        },
        raw: {
          key: "reviews"
        },
        parse: {
          reviews: [
            {
              _parent: "[data-review-content=true]",
              title: "[data-service-review-title-typography=true] >> text",
              content: "[data-service-review-text-typography=true] >> text"
            }
          ]
        }
      }
    end

    def reviews_urls
      (1..pages).to_a.map do |page_number|
        if page_number == 1
          url
        else
          "#{url}?page=#{page_number}"
        end
      end
    end
  end
  

You can test the parser by updating the API_KEY

  
    API_KEY = 'Your Page2API API key'
  
and running

  
    page2api = Page2APIParser.new('https://www.trustpilot.com/review/mixbook.com', 1)
    page2api.perform

    puts page2api.reviews_content
  

The parser will generate the following content

  
    There is ALMOST no limit to the number of pages. The thing I like best about Mixbook is the almost unlimited number of pages you can put into a book.. no other platform offers this. Add to that the quick turn around time and the number of different layouts available, and even with those I can easily alter them to fit my needs. The sitewide sales allow of better deals. I highy recommend Mixbook.

    I have used Mixbook many times and the service is always great. On this last order I could not get the order to go through and using the chat option the person went above and beyond taking me through every issue until i was able to get the item ordered.

    Making yearbooks for 10+ years. I have been making year books for my family for more than 10 years now . Mixbook is easy to use , with many options of backgrounds , stickers etc and the ability to create my own formats. I can access my old projects too . Mix book keeps adding backgrounds and other options continuously so I can enhance my photo books further .

    Excellent photo book!. Lots of design/layout options. Very easy setup and photo import process. I selected pages that lay flat. My 97-page travel book of our 3-week trip was outstanding! Photos printed in the book are as vibrant and stunning as they are on my ipad! My book has gotten many compliments. I’ve enjoyed it so much, I ordered a second copy of the book. Great product and easy to create. Will definitely use Mixbook again for my next photo book(s).

    ...
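
The title-merging rule inside `perform` can be tried offline. Below is a minimal sketch that uses two shortened reviews from the scraping result; `compact_review` is a hypothetical helper name that mirrors the logic in the parser.

```ruby
# Hypothetical helper mirroring the parser logic: a title that only repeats
# the beginning of the content (shown with a trailing "…" in the scraped
# result) is dropped; otherwise the title is prepended to the content.
def compact_review(review)
  title = review['title'].gsub('…', '')
  content = review['content']

  content.include?(title) ? content : "#{title}. #{content}"
end

# Shortened versions of two reviews from the scraping result.
reviews = [
  {
    'title' => 'I have used Mixbook many times and the…',
    'content' => 'I have used Mixbook many times and the service is always great.'
  },
  {
    'title' => 'Making yearbooks for 10+ years',
    'content' => 'I have been making year books for my family for more than 10 years now.'
  }
]

puts reviews.map { |review| compact_review(review) }.join("\n\n")
```

The first review keeps only its content, because the truncated title is already part of it; the second one gets its title prepended.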
  

Now let's build the GPT analyzer.

The working principle is similar, but instead of the reviews URL and the number of pages, the class will receive the reviews content, build a payload, send it to the GPT API, and store the result so that the main script can print it.

We will use the following GPT prompt for our request

  
    Summarize the reviews by Positives and Negatives in bullet points. Perform the Sentiment Analysis.
  

Here is our GPT class

  
  require 'rest_client'
  require 'json'

  class GPTAnalyzer
    API_KEY = ''

    attr_reader :reviews_content, :result

    def initialize(reviews_content)
      @reviews_content = reviews_content
    end

    def perform
      response = RestClient::Request.execute(
        method: :post,
        payload: payload.to_json,
        url: 'https://api.openai.com/v1/chat/completions',
        headers: {
          "Content-type" => "application/json",
          "Authorization" => "Bearer #{API_KEY}"
        },
      ).body

      analysis = JSON.parse(response)

      @result = analysis.dig('choices', 0, 'message', 'content')
    end

    private

    def payload
      {
        model: "gpt-3.5-turbo",
        messages: [
          {
            role: "system",
            content: "Summarize the reviews by Positives and Negatives in bullet points. Perform the Sentiment Analysis."
          },
          {
            role: "user",
            content: reviews_content
          }
        ]
      }
    end
  end
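
`perform` reads the answer with `dig`, which returns `nil` when the expected keys are missing. A hedged sketch of a stricter extraction follows; `extract_answer` is a hypothetical helper, and the two hashes are hand-written stand-ins shaped like a chat completion response and an error body, not real API output.

```ruby
# Hypothetical helper: returns the assistant's text from a parsed chat
# completion response, or raises with a readable message when it is absent.
def extract_answer(analysis)
  content = analysis.dig('choices', 0, 'message', 'content')
  return content.strip if content

  error_message = analysis.dig('error', 'message') || 'no choices returned'
  raise "GPT request failed: #{error_message}"
end

# Hand-written stand-ins for parsed responses (not real API output).
success = {
  'choices' => [
    { 'message' => { 'role' => 'assistant', 'content' => " Positive reviews:\n- Easy to use " } }
  ]
}
failure = { 'error' => { 'message' => 'Incorrect API key provided' } }

puts extract_answer(success)

begin
  extract_answer(failure)
rescue RuntimeError => e
  puts e.message
end
```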
  

You can test the GPT analyzer by updating the API_KEY

  
    API_KEY = 'Your OpenAI API key'
  
and running

  
    reviews_content = <<~TEXT
      There is ALMOST no limit to the number of pages. The thing I like best about Mixbook is the almost unlimited number of pages you can put into a book.. no other platform offers this. Add to that the quick turn around time and the number of different layouts available, and even with those I can easily alter them to fit my needs. The sitewide sales allow of better deals. I highy recommend Mixbook.

      I have used Mixbook many times and the service is always great. On this last order I could not get the order to go through and using the chat option the person went above and beyond taking me through every issue until i was able to get the item ordered.

      Making yearbooks for 10+ years. I have been making year books for my family for more than 10 years now . Mixbook is easy to use , with many options of backgrounds , stickers etc and the ability to create my own formats. I can access my old projects too . Mix book keeps adding backgrounds and other options continuously so I can enhance my photo books further .

      Excellent photo book!. Lots of design/layout options. Very easy setup and photo import process. I selected pages that lay flat. My 97-page travel book of our 3-week trip was outstanding! Photos printed in the book are as vibrant and stunning as they are on my ipad! My book has gotten many compliments. I’ve enjoyed it so much, I ordered a second copy of the book. Great product and easy to create. Will definitely use Mixbook again for my next photo book(s).
    TEXT

    gpt = GPTAnalyzer.new(reviews_content)
    gpt.perform

    puts gpt.result
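
Both classes keep their API key in a constant. As an alternative sketch, the keys could be read from environment variables so that they stay out of the source file; the variable name `PAGE2API_KEY` and the helper `api_key_from_env` are assumptions made for this example.

```ruby
# Hypothetical helper: reads an API key from the environment and fails early
# with a clear message when it is missing or blank.
def api_key_from_env(name)
  key = ENV[name].to_s.strip
  raise KeyError, "Environment variable #{name} is not set" if key.empty?

  key
end

# Demo value so the sketch runs standalone; in practice, export the real key
# in your shell before running the script.
ENV['PAGE2API_KEY'] ||= 'demo-key'

key = api_key_from_env('PAGE2API_KEY')
puts "Page2API key loaded (#{key.length} characters)"
```

Inside the classes, `API_KEY = api_key_from_env('PAGE2API_KEY')` would then replace the hard-coded string.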
  

Now let's glue everything together

  
  require 'rest_client'
  require 'json'

  class Page2APIParser
    API_KEY = 'Your Page2API API key'

    attr_reader :url, :pages, :reviews_content

    def initialize(url, pages)
      @url = url
      @pages = pages
    end

    def perform
      response = RestClient::Request.execute(
        method: :post,
        payload: payload.to_json,
        url: 'https://www.page2api.com/api/v1/scrape',
        headers: { "Content-type" => "application/json" },
      ).body

      reviews = JSON.parse(response)

      compacted_reviews = reviews.map do |review|
        title = review['title'].gsub('…', '')
        content = review['content']

        content.include?(title) ? content : "#{title}. #{content}"
      end

      @reviews_content = compacted_reviews.join("\n\n")
    end

    private

    def payload
      {
        api_key: API_KEY,
        batch: {
          urls: reviews_urls,
          concurrency: 1,
          merge_results: true
        },
        raw: {
          key: "reviews"
        },
        parse: {
          reviews: [
            {
              _parent: "[data-review-content=true]",
              title: "[data-service-review-title-typography=true] >> text",
              content: "[data-service-review-text-typography=true] >> text"
            }
          ]
        }
      }
    end

    def reviews_urls
      (1..pages).to_a.map do |page_number|
        if page_number == 1
          url
        else
          "#{url}?page=#{page_number}"
        end
      end
    end
  end

  class GPTAnalyzer
    API_KEY = 'Your OpenAI API key'

    attr_reader :reviews_content, :result

    def initialize(reviews_content)
      @reviews_content = reviews_content
    end

    def perform
      response = RestClient::Request.execute(
        method: :post,
        payload: payload.to_json,
        url: 'https://api.openai.com/v1/chat/completions',
        headers: {
          "Content-type" => "application/json",
          "Authorization" => "Bearer #{API_KEY}"
        },
      ).body

      analysis = JSON.parse(response)

      @result = analysis.dig('choices', 0, 'message', 'content')
    end

    private

    def payload
      {
        model: "gpt-3.5-turbo",
        messages: [
          {
            role: "system",
            content: "Summarize the reviews by Positives and Negatives in bullet points. Perform the Sentiment Analysis."
          },
          {
            role: "user",
            content: reviews_content
          }
        ]
      }
    end
  end



  reviews_url = ARGV[0] || raise('The reviews URL was not provided!')
  pages = ARGV[1].to_i.nonzero? || 1

  page2api = Page2APIParser.new(reviews_url, pages)
  page2api.perform

  gpt = GPTAnalyzer.new(page2api.reviews_content)
  gpt.perform

  puts gpt.result
  

Let's run the script

  
    $ ruby gpt.rb https://www.trustpilot.com/review/mixbook.com 2
  

The result should look similar to the following one (GPT output varies between runs)

  
  Positive reviews:
  - Unlimited number of pages offered, no other platform does this
  - Quick turnaround time and a wide range of layouts to choose from
  - Continuous addition of new backgrounds and options to enhance photo books further
  - Excellent customer service, helpful and polite staff
  - Quality of photo books is great, vibrant and stunning colors, and easy to create
  - User-friendly interface, easy to navigate through backgrounds, framing, stickers, etc.
  - Lots of options to create unique books and the finished product always looks good
  - Many satisfied customers who keep coming back to use Mixbook

  Negative reviews:
  - Some customers received a photo book that did not meet their expectations (e.g., faded colors, thick pages were too heavy)
  - Some customers experienced issues with the website or the chat option when placing an order
  - Mixbook mistakenly left some text or images out of some customers orders, requiring a second or third order to be placed
  - Some customers found that the creases in the photo books prevented them from using certain layouts
  - Some customers found that the quality of the printing could be better
  - Some customers found that the personalization options were limited, hence were unable to achieve the exact look they wanted.

  Sentiment Analysis:
  The reviews mostly convey positive sentiment.
  Customers either praise the unlimited number of pages and customizable features that Mixbook offers,
  or commend the quick turnaround time, great customer service, and quality of the photo books.
  Some negative reviews point out some personalization limitations, quality of printing,
  or issues with the website or customer service. However, overall, the positive comments outweigh the negative ones.
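
One practical caveat when raising the page count: the joined reviews are sent in a single request, and chat models accept only a limited amount of input. A rough sketch of splitting the reviews into batches follows; `chunk_reviews` is a hypothetical helper, and the 8,000-character default budget is an arbitrary assumption, not a documented limit.

```ruby
# Hypothetical helper: groups reviews (separated by blank lines) into batches
# whose joined length stays under max_chars. A single review longer than the
# budget still gets its own batch.
def chunk_reviews(reviews_content, max_chars: 8_000)
  separator = "\n\n"
  batches = []
  current = []
  current_size = 0

  reviews_content.split(separator).each do |review|
    added_size = review.length + (current.empty? ? 0 : separator.length)

    # Close the current batch when adding this review would exceed the budget.
    if current.any? && current_size + added_size > max_chars
      batches << current.join(separator)
      current = []
      current_size = 0
      added_size = review.length
    end

    current << review
    current_size += added_size
  end

  batches << current.join(separator) if current.any?
  batches
end

# Made-up sample strings, used only to demonstrate the batching.
sample = ['Great product.', 'Fast delivery.', 'Colors looked faded.'].join("\n\n")

chunk_reviews(sample, max_chars: 32).each_with_index do |batch, index|
  puts "Batch #{index + 1}: #{batch.inspect}"
end
```

Each batch could then be passed to its own `GPTAnalyzer` instance, at the cost of getting one summary per batch instead of a single combined one.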
  


Conclusion

In conclusion, scraping Trustpilot reviews with Page2API can be a powerful tool for businesses looking to improve their online reputation and customer acquisition.

By leveraging the code examples provided in this blog post, you can easily extract and summarize large volumes of review data from Trustpilot using various programming languages.
Additionally, by performing sentiment analysis on this data, you can gain valuable insights into customer feedback and identify areas of improvement.

With the help of AI and natural language processing techniques, businesses can better understand their customers and make data-driven decisions to improve their products and services.
We hope that this tutorial has provided you with a better understanding of how to scrape and analyze Trustpilot reviews, and how to leverage these insights to improve your business.
