Introduction
In today's business landscape, online reviews have become one of the most crucial factors in driving customer acquisition and loyalty.
Trustpilot.com, as a leading online review platform, hosts millions of reviews across different industries, helping customers make informed decisions and businesses improve their reputation.
However, analyzing these reviews manually can be a daunting task, especially for large corporations with thousands of reviews.
This is where scraping and sentiment analysis with AI can come in handy.
In this blog post, we will explore the step-by-step process of scraping Trustpilot reviews using Page2API, and then performing sentiment analysis on the extracted data using GPT-3.5-turbo.
By the end of this tutorial, you will be able to build your own Trustpilot scraper that extracts valuable insights from online reviews and helps you improve your business's reputation.
Prerequisites
To start scraping Trustpilot reviews, we will need the following things:
- A Page2API account
- An OpenAI account
- A Trustpilot company reviews page that we are interested in. In our case, the company will be Mixbook.
- Some basic Ruby coding skills.
How to scrape Trustpilot reviews
First, we need to open Mixbook's Trustpilot page:
https://www.trustpilot.com/review/mixbook.com
We will use this URL as the first parameter we need to start the scraping process.
From the Trustpilot reviews page, we will scrape the following attributes from each review:
- Title
- Content
Now, let's define the selectors for each attribute.
/* Parent: */
[data-review-content=true]
/* Title: */
[data-service-review-title-typography=true]
/* Content: */
[data-service-review-text-typography=true]
Let's handle the pagination.
There are two approaches that can help us scrape all the needed pages:
1. We can scrape the pages using the batch scraping feature
2. We can iterate through the pages by clicking on the Next page button
To keep the article short enough, we will only cover the batch approach.
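The batch approach needs one URL per page. As a minimal sketch of how such a list could be generated, the hypothetical helper `review_page_urls` below (not part of Page2API) leaves the first page untouched and appends a `page` query parameter to the rest, keeping any query parameters the URL already has.

```ruby
require 'uri'

# Hypothetical helper (not part of Page2API): builds one URL per page.
# Page 1 is the plain reviews URL; later pages get a `page` query parameter.
def review_page_urls(base_url, pages)
  (1..pages).map do |page_number|
    next base_url if page_number == 1

    uri = URI.parse(base_url)
    # Preserve any query parameters already present in the URL.
    params = URI.decode_www_form(uri.query.to_s)
    params << ['page', page_number.to_s]
    uri.query = URI.encode_www_form(params)
    uri.to_s
  end
end

puts review_page_urls('https://www.trustpilot.com/review/mixbook.com', 3)
```

Going through `URI` instead of plain string interpolation is a design choice for this sketch: it avoids producing a second `?` when the starting URL already carries a query string.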
Now it's time to build the request that will scrape Trustpilot reviews.
The following examples show how to scrape 2 pages of reviews from Trustpilot.com: first the raw request payload, then the same request sent from Ruby.
{
  "api_key": "YOUR_PAGE2API_KEY",
  "batch": {
    "urls": [
      "https://www.trustpilot.com/review/mixbook.com",
      "https://www.trustpilot.com/review/mixbook.com?page=2"
    ],
    "concurrency": 1,
    "merge_results": true
  },
  "parse": {
    "reviews": [
      {
        "title": "[data-service-review-title-typography=true] >> text",
        "_parent": "[data-review-content=true]",
        "content": "[data-service-review-text-typography=true] >> text"
      }
    ]
  }
}
require 'rest_client'
require 'json'

api_url = "https://www.page2api.com/api/v1/scrape"

# The same payload as in the JSON example, expressed as a Ruby hash.
payload = {
  api_key: "YOUR_PAGE2API_KEY",
  batch: {
    urls: [
      "https://www.trustpilot.com/review/mixbook.com",
      "https://www.trustpilot.com/review/mixbook.com?page=2"
    ],
    concurrency: 1,
    merge_results: true
  },
  parse: {
    reviews: [
      {
        title: "[data-service-review-title-typography=true] >> text",
        _parent: "[data-review-content=true]",
        content: "[data-service-review-text-typography=true] >> text"
      }
    ]
  }
}

# Send the request and read the raw response body.
response = RestClient::Request.execute(
  method: :post,
  payload: payload.to_json,
  url: api_url,
  headers: { "Content-type" => "application/json" },
).body

result = JSON.parse(response)
puts(result)
{
"result": {
"reviews": [
{
"title": "There is ALMOST no limit to the number of pages",
"content": "The thing I like best about Mixbook is the almost unlimited number of pages you can put into a book.. no other platform offers this. Add to that the quick turn around time and the number of different layouts available, and even with those I can easily alter them to fit my needs. The sitewide sales allow of better deals. I highy recommend Mixbook."
},
{
"title": "I have used Mixbook many times and the…",
"content": "I have used Mixbook many times and the service is always great. On this last order I could not get the order to go through and using the chat option the person went above and beyond taking me through every issue until i was able to get the item ordered."
},
{
"title": "Making yearbooks for 10+ years",
"content": "I have been making year books for my family for more than 10 years now . Mixbook is easy to use , with many options of backgrounds , stickers etc and the ability to create my own formats. I can access my old projects too . Mix book keeps adding backgrounds and other options continuously so I can enhance my photo books further ."
}, ...
]
}, ...
}
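Once the response is parsed, the reviews sit under the `result` and `reviews` keys. The sketch below shows one way to walk through them. It uses a hand-written sample body shaped like the response above (review text shortened), so it runs without calling the API.

```ruby
require 'json'

# Sample body shaped like the response shown above (content shortened for the sketch).
response_body = <<~BODY
  {
    "result": {
      "reviews": [
        { "title": "Making yearbooks for 10+ years", "content": "Mixbook is easy to use." },
        { "title": "Excellent photo book!", "content": "Lots of design/layout options." }
      ]
    }
  }
BODY

result = JSON.parse(response_body)

# dig returns nil instead of raising when a key is missing; fall back to an empty list.
reviews = result.dig('result', 'reviews') || []

reviews.each_with_index do |review, index|
  puts "#{index + 1}. #{review['title']}: #{review['content']}"
end
```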
How to summarize the reviews and perform the Sentiment Analysis with AI (GPT-3.5-turbo)
In the following part of the article, we will:
- Collect the scraped Trustpilot reviews and clean them up a little bit.
- Join the reviews into a single entity, separating each of them by a new line.
- Build a GPT prompt.
- Send the reviews content and the prompt to GPT.
- Enjoy the results.
From the code perspective, we will:
- Stick with Ruby, because Ruby is cool and easy to read.
- Separate the code into two classes to improve readability.
- Make the reviews page URL and the total number of pages configurable from the command line.
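require 'rest_client'
require 'json'

# Scrapes the reviews with Page2API (implemented further down).
class Page2APIParser
  def initialize(url, pages)
  end

  def perform
  end
end

# Sends the reviews to GPT and stores the analysis (implemented further down).
class GPTAnalyzer
  def initialize(reviews_content)
  end

  def perform
  end
end

# The first argument is the reviews URL; the optional second one is the number of pages.
reviews_url = ARGV[0] || raise('The reviews URL was not provided!')
pages = ARGV[1].to_i.nonzero? || 1

page2api = Page2APIParser.new(reviews_url, pages)
page2api.perform

gpt = GPTAnalyzer.new(page2api.reviews_content)
gpt.perform

puts gpt.result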
The script can be called from the terminal as in the following examples:
$ ruby gpt.rb https://www.trustpilot.com/review/mixbook.com
$ ruby gpt.rb https://www.trustpilot.com/review/mixbook.com 2
require 'rest_client'
require 'json'

class Page2APIParser
  # The key must be set here: a top-level API_KEY constant would not override this one.
  API_KEY = 'Your Page2API API key'

  attr_reader :url, :pages, :reviews_content

  def initialize(url, pages)
    @url = url
    @pages = pages
  end

  def perform
    response = RestClient::Request.execute(
      method: :post,
      payload: payload.to_json,
      url: 'https://www.page2api.com/api/v1/scrape',
      headers: { "Content-type" => "application/json" },
    ).body

    reviews = JSON.parse(response)

    # Iterate through all the reviews. If the review title is already contained
    # in the review body (aka content), the title is ignored.
    # Otherwise, the title is prepended to the review content.
    compacted_reviews = reviews.map do |review|
      title = review['title'].gsub('…', '')
      content = review['content']
      content.include?(title) ? content : "#{title}. #{content}"
    end

    @reviews_content = compacted_reviews.join("\n\n")
  end

  private

  def payload
    {
      api_key: API_KEY,
      batch: {
        urls: reviews_urls,
        concurrency: 1,
        merge_results: true
      },
      raw: {
        key: "reviews"
      },
      parse: {
        reviews: [
          {
            _parent: "[data-review-content=true]",
            title: "[data-service-review-title-typography=true] >> text",
            content: "[data-service-review-text-typography=true] >> text"
          }
        ]
      }
    }
  end

  # The first page uses the plain URL; the following pages get a ?page=N suffix.
  def reviews_urls
    (1..pages).map do |page_number|
      if page_number == 1
        url
      else
        "#{url}?page=#{page_number}"
      end
    end
  end
end
# Before running, set API_KEY inside the Page2APIParser class to your Page2API API key.
page2api = Page2APIParser.new('https://www.trustpilot.com/review/mixbook.com', 1)
page2api.perform
puts page2api.reviews_content
There is ALMOST no limit to the number of pages. The thing I like best about Mixbook is the almost unlimited number of pages you can put into a book.. no other platform offers this. Add to that the quick turn around time and the number of different layouts available, and even with those I can easily alter them to fit my needs. The sitewide sales allow of better deals. I highy recommend Mixbook.
I have used Mixbook many times and the service is always great. On this last order I could not get the order to go through and using the chat option the person went above and beyond taking me through every issue until i was able to get the item ordered.
Making yearbooks for 10+ years. I have been making year books for my family for more than 10 years now . Mixbook is easy to use , with many options of backgrounds , stickers etc and the ability to create my own formats. I can access my old projects too . Mix book keeps adding backgrounds and other options continuously so I can enhance my photo books further .
Excellent photo book!. Lots of design/layout options. Very easy setup and photo import process. I selected pages that lay flat. My 97-page travel book of our 3-week trip was outstanding! Photos printed in the book are as vibrant and stunning as they are on my ipad! My book has gotten many compliments. I’ve enjoyed it so much, I ordered a second copy of the book. Great product and easy to create. Will definitely use Mixbook again for my next photo book(s).
...
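The title and content merging step can also be tried in isolation, without calling the API. The sketch below wraps the same rule in a hypothetical `compact_review` helper. The handling of missing titles or contents is an added assumption, in case a review comes back without one of the fields.

```ruby
# Hypothetical helper mirroring the merge rule used in Page2APIParser#perform.
# The nil-safe handling is an assumption about possibly missing fields.
def compact_review(review)
  title   = review['title'].to_s.gsub('…', '').strip
  content = review['content'].to_s.strip

  # Skip the title when it is empty or already repeated at the start of the content.
  return content if title.empty? || content.include?(title)
  return title if content.empty?

  "#{title}. #{content}"
end

reviews = [
  { 'title' => 'I have used Mixbook many times and the…',
    'content' => 'I have used Mixbook many times and the service is always great.' },
  { 'title' => 'Making yearbooks for 10+ years',
    'content' => 'Mixbook is easy to use.' },
  { 'title' => nil, 'content' => 'Great product and easy to create.' }
]

puts reviews.map { |review| compact_review(review) }.join("\n\n")
```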
The prompt that we will send to GPT is the following:
Summarize the reviews by Positives and Negatives in bullet points. Perform the Sentiment Analysis.
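The prompt travels in the same request as all the review text, and chat models only accept a limited amount of input. As a rough sketch, the hypothetical `limit_reviews` helper below keeps whole reviews until a character budget is reached. The 12,000 character figure is purely illustrative: the real limit is measured in tokens and depends on the model.

```ruby
# Illustrative budget only; the real limit is measured in tokens and depends on the model.
MAX_CHARS = 12_000

# Hypothetical helper: keeps whole reviews (separated by blank lines)
# until adding the next one would exceed the character budget.
def limit_reviews(reviews_content, max_chars = MAX_CHARS)
  kept = []
  used = 0

  reviews_content.split("\n\n").each do |review|
    cost = review.length + (kept.empty? ? 0 : 2) # 2 characters for the "\n\n" separator
    break if used + cost > max_chars

    kept << review
    used += cost
  end

  kept.join("\n\n")
end

# Placeholder text, just to show the cut-off on a review boundary.
sample = ['First review.', 'Second review.', 'Third review.'].join("\n\n")
puts limit_reviews(sample, 30)
```

Cutting on review boundaries rather than at an arbitrary character keeps every review that is sent to the model complete.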
require 'rest_client'
require 'json'

class GPTAnalyzer
  # The key must be set here: a top-level API_KEY constant would not override this one.
  API_KEY = 'Your OpenAI API key'

  attr_reader :reviews_content, :result

  def initialize(reviews_content)
    @reviews_content = reviews_content
  end

  def perform
    response = RestClient::Request.execute(
      method: :post,
      payload: payload.to_json,
      url: 'https://api.openai.com/v1/chat/completions',
      headers: {
        "Content-type" => "application/json",
        "Authorization" => "Bearer #{API_KEY}"
      },
    ).body

    analysis = JSON.parse(response)

    # The generated text lives in the first choice of the response.
    @result = analysis.dig('choices', 0, 'message', 'content')
  end

  private

  # The system message carries the instructions; the user message carries the reviews.
  def payload
    {
      model: "gpt-3.5-turbo",
      messages: [
        {
          role: "system",
          content: "Summarize the reviews by Positives and Negatives in bullet points. Perform the Sentiment Analysis."
        },
        {
          role: "user",
          content: reviews_content
        }
      ]
    }
  end
end
# Before running, set API_KEY inside the GPTAnalyzer class to your OpenAI API key.
reviews_content = <<-TEXT
There is ALMOST no limit to the number of pages. The thing I like best about Mixbook is the almost unlimited number of pages you can put into a book.. no other platform offers this. Add to that the quick turn around time and the number of different layouts available, and even with those I can easily alter them to fit my needs. The sitewide sales allow of better deals. I highy recommend Mixbook.
I have used Mixbook many times and the service is always great. On this last order I could not get the order to go through and using the chat option the person went above and beyond taking me through every issue until i was able to get the item ordered.
Making yearbooks for 10+ years. I have been making year books for my family for more than 10 years now . Mixbook is easy to use , with many options of backgrounds , stickers etc and the ability to create my own formats. I can access my old projects too . Mix book keeps adding backgrounds and other options continuously so I can enhance my photo books further .
Excellent photo book!. Lots of design/layout options. Very easy setup and photo import process. I selected pages that lay flat. My 97-page travel book of our 3-week trip was outstanding! Photos printed in the book are as vibrant and stunning as they are on my ipad! My book has gotten many compliments. I’ve enjoyed it so much, I ordered a second copy of the book. Great product and easy to create. Will definitely use Mixbook again for my next photo book(s).
TEXT
gpt = GPTAnalyzer.new(reviews_content)
gpt.perform
puts gpt.result
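If the response body contains no completion, `dig` silently returns `nil`. The sketch below shows one way to surface the problem instead. `extract_completion` is a hypothetical helper and both sample bodies are hand-written for illustration. Note that rest-client raises an exception for non-2xx responses, so in the real script an error body would typically have to be read from the rescued exception's response.

```ruby
require 'json'

# Hypothetical helper: returns the generated text, or raises with the API's
# error message when the body does not contain a completion.
def extract_completion(body)
  parsed = JSON.parse(body)
  text = parsed.dig('choices', 0, 'message', 'content')
  return text if text

  message = parsed.dig('error', 'message') || 'no completion in the response'
  raise "OpenAI request failed: #{message}"
end

# Hand-written sample bodies, shaped like a successful and a failed response.
ok_body    = { choices: [{ message: { role: 'assistant', content: 'Mostly positive.' } }] }.to_json
error_body = { error: { message: 'Incorrect API key provided.', type: 'invalid_request_error' } }.to_json

puts extract_completion(ok_body)

begin
  extract_completion(error_body)
rescue RuntimeError => e
  puts e.message
end
```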
require 'rest_client'
require 'json'

class Page2APIParser
  API_KEY = 'Your Page2API API key'

  attr_reader :url, :pages, :reviews_content

  def initialize(url, pages)
    @url = url
    @pages = pages
  end

  def perform
    response = RestClient::Request.execute(
      method: :post,
      payload: payload.to_json,
      url: 'https://www.page2api.com/api/v1/scrape',
      headers: { "Content-type" => "application/json" },
    ).body

    reviews = JSON.parse(response)

    compacted_reviews = reviews.map do |review|
      title = review['title'].gsub('…', '')
      content = review['content']
      content.include?(title) ? content : "#{title}. #{content}"
    end

    @reviews_content = compacted_reviews.join("\n\n")
  end

  private

  def payload
    {
      api_key: API_KEY,
      batch: {
        urls: reviews_urls,
        concurrency: 1,
        merge_results: true
      },
      raw: {
        key: "reviews"
      },
      parse: {
        reviews: [
          {
            _parent: "[data-review-content=true]",
            title: "[data-service-review-title-typography=true] >> text",
            content: "[data-service-review-text-typography=true] >> text"
          }
        ]
      }
    }
  end

  def reviews_urls
    (1..pages).map do |page_number|
      if page_number == 1
        url
      else
        "#{url}?page=#{page_number}"
      end
    end
  end
end

class GPTAnalyzer
  API_KEY = 'Your OpenAI API key'

  attr_reader :reviews_content, :result

  def initialize(reviews_content)
    @reviews_content = reviews_content
  end

  def perform
    response = RestClient::Request.execute(
      method: :post,
      payload: payload.to_json,
      url: 'https://api.openai.com/v1/chat/completions',
      headers: {
        "Content-type" => "application/json",
        "Authorization" => "Bearer #{API_KEY}"
      },
    ).body

    analysis = JSON.parse(response)
    @result = analysis.dig('choices', 0, 'message', 'content')
  end

  private

  def payload
    {
      model: "gpt-3.5-turbo",
      messages: [
        {
          role: "system",
          content: "Summarize the reviews by Positives and Negatives in bullet points. Perform the Sentiment Analysis."
        },
        {
          role: "user",
          content: reviews_content
        }
      ]
    }
  end
end

reviews_url = ARGV[0] || raise('The reviews URL was not provided!')
pages = ARGV[1].to_i.nonzero? || 1

page2api = Page2APIParser.new(reviews_url, pages)
page2api.perform

gpt = GPTAnalyzer.new(page2api.reviews_content)
gpt.perform

puts gpt.result
$ ruby gpt.rb https://www.trustpilot.com/review/mixbook.com 2
Positive reviews:
- Unlimited number of pages offered, no other platform does this
- Quick turnaround time and a wide range of layouts to choose from
- Continuous addition of new backgrounds and options to enhance photo books further
- Excellent customer service, helpful and polite staff
- Quality of photo books is great, vibrant and stunning colors, and easy to create
- User-friendly interface, easy to navigate through backgrounds, framing, stickers, etc.
- Lots of options to create unique books and the finished product always looks good
- Many satisfied customers who keep coming back to use Mixbook
Negative reviews:
- Some customers received a photo book that did not meet their expectations (e.g., faded colors, thick pages were too heavy)
- Some customers experienced issues with the website or the chat option when placing an order
- Mixbook mistakenly left some text or images out of some customers orders, requiring a second or third order to be placed
- Some customers found that the creases in the photo books prevented them from using certain layouts
- Some customers found that the quality of the printing could be better
- Some customers found that the personalization options were limited, hence were unable to achieve the exact look they wanted.
Sentiment Analysis:
The reviews mostly convey positive sentiment.
Customers either praise the unlimited number of pages and customizable features that Mixbook offers,
or commend the quick turnaround time, great customer service, and quality of the photo books.
Some negative reviews point out some personalization limitations, quality of printing,
or issues with the website or customer service. However, overall, the positive comments outweigh the negative ones.
Conclusion
In conclusion, scraping Trustpilot reviews with Page2API can be a powerful tool for businesses looking to improve their online reputation and customer acquisition.
By leveraging the Ruby code examples provided in this blog post, you can extract and summarize large volumes of review data from Trustpilot with a short script.
Additionally, by performing sentiment analysis on this data, you can gain valuable insights into customer feedback and identify areas of improvement.
With the help of AI and natural language processing techniques, businesses can better understand their customers and make data-driven decisions to improve their products and services.
We hope that this tutorial has provided you with a better understanding of how to scrape and analyze Trustpilot reviews, and how to leverage these insights to improve your business.