How to Download a YouTube Transcript for Free


2024-03-28 - 10 min read

Nicolae Rotaru

Introduction

In this blog post, we'll explore the easiest way to scrape and download a transcript from YouTube for free.
For this task, we will use Node.js and a browser automation library called Puppeteer.

The code will launch a browser instance, open the YouTube video URL, and scrape the transcript, which you can then save on your local machine.
You can use the code from this post to start building your own YouTube transcript downloader.

Prerequisites

To start downloading transcripts from YouTube, you will need the following:
  • Node.js installed on your local machine
  • Some basic HTML and JavaScript coding skills

How to Scrape a YouTube Transcript

Before starting, you need to open the URL of a YouTube video that is publicly available and not age-restricted.
We will use the following URL:

  
    https://www.youtube.com/watch?v=1WOQumXj0kg
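
Before opening a URL in the browser, it can help to check that it really is a standard watch URL. The following is a minimal sketch, not part of the original script: getVideoId is a hypothetical helper, the list of accepted hostnames is an assumption, and so is the rule that a video ID is 11 characters long.

```javascript
// Hypothetical helper: returns the video ID from a standard watch URL, or null.
function getVideoId(input) {
  let url;
  try {
    url = new URL(input); // throws a TypeError if the string is not a valid URL
  } catch {
    return null;
  }

  // Assumption: only these hostnames and the /watch path are accepted
  const hosts = ['www.youtube.com', 'youtube.com', 'm.youtube.com'];
  if (!hosts.includes(url.hostname) || url.pathname !== '/watch') {
    return null;
  }

  const id = url.searchParams.get('v');
  // Assumption: a video ID is 11 characters of letters, digits, '-' or '_'
  return id && /^[\w-]{11}$/.test(id) ? id : null;
}

console.log(getVideoId('https://www.youtube.com/watch?v=1WOQumXj0kg')); // 1WOQumXj0kg
console.log(getVideoId('https://example.com/watch?v=1WOQumXj0kg')); // null
```

A check like this lets a downloader reject bad input early instead of waiting for a selector timeout in the browser.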
  

The page will look like this:

[Image: YouTube video page]

If you are in the EU, a cookie banner will be shown.
To close it programmatically, run the following JavaScript snippet in the browser's DevTools console:

  
    document.querySelector('button[aria-label*=cookies]')?.click()
  

The next step is locating the Show transcript button and clicking it.
To do this programmatically, run the following JavaScript snippet:

  
    document.querySelector('ytd-video-description-transcript-section-renderer button').click()
  

[Image: Show transcript button]

Within a second, the transcript will be shown in the sidebar:

[Image: YouTube transcript component]

To collect the text nodes that contain the transcript, run this JavaScript snippet:

  
    Array.from(document.querySelectorAll('#segments-container yt-formatted-string')).map(
      element => element.textContent?.trim()
    ).join("\n");
  


It produces a string with one transcript segment per line:
  
    in this video i'm going to demonstrate
    how to web scrape into your bubble
    application and i'm going to be using
    the web scraper page two api um i was
    working on a recent client project and
    uh i tried a number of different web

    ...
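
Because the segments are split at arbitrary points in the speech, the line-per-segment output can be awkward to read. As a minimal sketch (transcriptToParagraph is a hypothetical helper, not part of the original script), the lines can be merged into a single paragraph:

```javascript
// Hypothetical helper: merges the newline-separated segments into one paragraph.
function transcriptToParagraph(text) {
  return text
    .split('\n')
    .map(line => line.trim()) // drop stray whitespace around each segment
    .filter(line => line.length > 0) // skip empty lines
    .join(' ');
}

const sample = "in this video i'm going to demonstrate\nhow to web scrape into your bubble\n";
console.log(transcriptToParagraph(sample));
// in this video i'm going to demonstrate how to web scrape into your bubble
```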
  

How to Download the YouTube Transcript with Puppeteer

In this part, we will write a Node.js script that downloads the YouTube transcript.

Before we start, we need to make sure that Node.js is installed.
If you are using macOS and have Homebrew installed, you can install it with:
  
    brew install node
  


Now we need to create a new folder for our project, youtube-transcript:
  
    mkdir youtube-transcript && cd youtube-transcript
  


In this folder, we will create the package.json file, which contains the information about our project:
  
  {
    "name": "transcript",
    "version": "1.0.0",
    "description": "Youtube transcript parser",
    "main": "index.js",
    "scripts": {
      "start": "node index.js",
      "postinstall": "npx puppeteer browsers install chrome"
    },
    "author": "Your Name",
    "license": "ISC",
    "dependencies": {
      "puppeteer": "^22.6.0",
      "puppeteer-extra": "^3.3.6",
      "puppeteer-extra-plugin-stealth": "^2.11.2"
    }
  }
  


The next step is installing all the needed packages:
  
    npm install
  


Now let's create the index.js file and start putting all the things together.
  
    const puppeteer = require('puppeteer-extra')
    const StealthPlugin = require('puppeteer-extra-plugin-stealth')

    // using the stealth plugin to reduce the chance of being detected during scraping
    puppeteer.use(StealthPlugin())

    const TRANSCRIPT_BUTTON = 'ytd-video-description-transcript-section-renderer button'

    // this is the main function
    const run = async () => {
      const url = process.argv[2] // reading the URL from the command line

      if (!url) {
        console.error('Usage: node index.js <youtube-video-url>')
        process.exitCode = 1
        return
      }

      const browser = await puppeteer.launch({
        headless: "new",
        ignoreDefaultArgs: ["--enable-automation"]
      }) // starting the headless browser (Chrome)

      try {
        const page = await browser.newPage()

        await page.goto(url, { waitUntil: 'domcontentloaded' }) // opening the YouTube URL

        await page.evaluate(() => {
          document.querySelector('button[aria-label*=cookies]')?.click() // closing the cookie banner, if present
        })

        // waiting max 10 seconds for the 'Show transcript' button to appear
        await page.waitForSelector(TRANSCRIPT_BUTTON, { timeout: 10_000 })

        // clicking on the 'Show transcript' button
        await page.evaluate((selector) => {
          document.querySelector(selector).click()
        }, TRANSCRIPT_BUTTON)

        const result = await parseTranscript(page) // parsing the transcript

        console.log(result) // printing the transcript to stdout
      } catch (error) {
        console.error(error) // errors go to stderr, so they never end up in a redirected transcript file
        process.exitCode = 1
      } finally {
        await browser.close() // closing the browser also closes its pages
      }
    }

    // this function will parse the transcript
    const parseTranscript = async (page) => {
      // waiting max 10 seconds for the transcript container to appear
      await page.waitForSelector('#segments-container', { timeout: 10_000 })

      // collecting all the text nodes from the transcript container and joining them with a newline
      return page.evaluate(() => {
        return Array.from(document.querySelectorAll('#segments-container yt-formatted-string')).map(
          element => element.textContent?.trim()
        ).join("\n")
      })
    }


    run()
  


To run this script, execute the following command (the URL is quoted so that shells such as zsh do not treat the ? as a wildcard):
  
    node index.js "https://www.youtube.com/watch?v=1WOQumXj0kg"
  


The output will look like this:
  
    in this video i'm going to demonstrate
    how to web scrape into your bubble
    application and i'm going to be using
    the web scraper page two api um i was
    working on a recent client project and
    uh i tried a number of different web
    scraper apis and i found that page two
    api uh offered the best integration for
    what i was trying to do
    with the bubble api connector plug-in
    so that's what i'll be demonstrating uh
    to you now um so uh if we head into
    the bubble api connector install this
    plug-in if you haven't already by bubble
    and we'll add another api

    ...
  


If you want to save the transcript to a text file, redirect the output:
  
    node index.js "https://www.youtube.com/watch?v=1WOQumXj0kg" > transcript.txt
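
As an alternative to shell redirection, the script itself could write the file with Node's built-in fs module. This is a minimal sketch under stated assumptions: saveTranscript is a hypothetical helper, and the name.txt file naming is an illustrative choice rather than part of the original script.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: writes the transcript to <dir>/<name>.txt and returns the file path.
function saveTranscript(text, name, dir = process.cwd()) {
  const filePath = path.join(dir, `${name}.txt`);
  fs.writeFileSync(filePath, text + '\n', 'utf8'); // ends the file with a trailing newline
  return filePath;
}
```

Inside the script, a call such as saveTranscript(result, 'transcript') could replace the line that prints the transcript.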
  

Conclusion

And there you have it—a simple yet effective way to download YouTube transcripts using Puppeteer!

This script demonstrates how we can leverage the power of automation to access and extract information from web pages, even when it involves interacting with elements like buttons and waiting for specific content to appear.

Remember, the core of this script utilizes Puppeteer's ability to simulate a real user's interactions, allowing us to navigate through pages, accept cookies, click on elements, and scrape the content we need. With the stealth plugin, we reduce the chances of being detected as a bot, making our scraping activities more seamless and efficient.

I hope this tutorial has demystified the process of web scraping with Puppeteer and shown you that with a bit of JavaScript, you can unlock a vast amount of data available on the web. Feel free to tweak this script to suit your needs—maybe you want to download transcripts from a list of URLs or incorporate additional data into your results.
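
For the list-of-URLs idea, one possible shape is a small sequential loop. This is only a sketch: downloadAll is a hypothetical helper, and fetchTranscript stands for an assumed function (for example, a variant of run that takes a URL and returns the transcript instead of printing it).

```javascript
// Hypothetical helper: processes URLs one at a time, so only one browser runs at once.
// fetchTranscript is an assumed async function: (url) => Promise<string>.
async function downloadAll(urls, fetchTranscript) {
  const results = [];
  for (const url of urls) {
    try {
      results.push({ url, transcript: await fetchTranscript(url) });
    } catch (error) {
      // a failing video (for example, one without a transcript) does not stop the batch
      results.push({ url, error: error.message });
    }
  }
  return results;
}
```

Running the URLs sequentially keeps memory usage predictable, at the cost of a longer total run time.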

Happy coding, and may your curiosity lead you to amazing projects!
