Introduction
In this blog post, we'll explore the easiest way to scrape and download a transcript from YouTube for free. For this task, we will use Node.js and a browser automation library called Puppeteer.
The code will launch a browser instance, open the YouTube video URL, and scrape the transcript, which you can then save on your local machine.
You can use the code from this post as a starting point for your own YouTube transcript downloader.
Prerequisites
To start downloading transcripts from YouTube, you will need the following things:
- Node.js installed on your local machine
- Some basic HTML and JavaScript coding skills
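If you are not sure whether Node.js is already available, you can check from a terminal. Any reasonably recent version should work for this tutorial:

```shell
# print the installed Node.js version; if this fails, install Node.js first
node --version
```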
How to Scrape a YouTube Transcript
Before starting, you need to open the URL of a YouTube video that is publicly available and not age-restricted. We will use the following URL:
https://www.youtube.com/watch?v=1WOQumXj0kg
The page will look like this:

If you are in the EU, a cookie banner will be shown.
To close it programmatically, run the following JavaScript snippet in the browser console:
document.querySelector('button[aria-label*=cookies]')?.click()
The next step is to locate the Show transcript button and click it.
To do this programmatically, run the following JavaScript snippet:
document.querySelector('ytd-video-description-transcript-section-renderer button').click()

Within a second, the transcript will be shown in the sidebar:

To collect the text nodes that contain the transcript, you need to run this JavaScript snippet:
Array.from(document.querySelectorAll('#segments-container yt-formatted-string')).map(
  element => element.textContent?.trim()
).join("\n");
It will produce a text block containing each line of the transcript, joined with newlines:
in this video i'm going to demonstrate
how to web scrape into your bubble
application and i'm going to be using
the web scraper page two api um i was
working on a recent client project and
uh i tried a number of different web
...
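For convenience, the three console snippets above can be combined into a single helper that you can paste into the DevTools console in one go. This is a sketch: the one-second delays are an assumption about how quickly the page settles, so you may need to adjust them for slower connections.

```javascript
// Run in the browser DevTools console on a YouTube video page.
// Combines the three steps above: close the cookie banner, open the
// transcript panel, then collect the transcript text.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrapeTranscriptInConsole() {
  document.querySelector('button[aria-label*=cookies]')?.click();
  await sleep(1000); // let the page settle after closing the banner
  document
    .querySelector('ytd-video-description-transcript-section-renderer button')
    ?.click();
  await sleep(1000); // wait for the transcript sidebar to render
  return Array.from(
    document.querySelectorAll('#segments-container yt-formatted-string')
  )
    .map((element) => element.textContent?.trim())
    .join('\n');
}

// In the console: scrapeTranscriptInConsole().then(console.log)
```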
How to Download the YouTube Transcript with Puppeteer
In this part, we will write some Node.js code that makes it possible to download the YouTube transcript. Before we start, we need to make sure that Node.js is installed.
If you are on macOS, you may already have Homebrew installed, which makes things a little easier:
brew install node
Now we need to create a new folder for our project, youtube-transcript:
mkdir youtube-transcript && cd youtube-transcript
In this folder, we will create the package.json file, which contains the information about our project:
{
  "name": "transcript",
  "version": "1.0.0",
  "description": "YouTube transcript parser",
  "main": "index.js",
  "scripts": {
    "start": "node index.js",
    "postinstall": "npx puppeteer browsers install chrome"
  },
  "author": "Your Name",
  "license": "ISC",
  "dependencies": {
    "puppeteer": "^22.6.0",
    "puppeteer-extra": "^3.3.6",
    "puppeteer-extra-plugin-stealth": "^2.11.2"
  }
}
The next step is installing all the needed packages:
npm install
Now let's create the index.js file and start putting all the things together.
const puppeteer = require('puppeteer-extra')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')

// use the stealth plugin to reduce the chance of being detected while scraping
puppeteer.use(StealthPlugin())

// this is the main function
const run = async () => {
  const url = process.argv[2] // reading the video URL from the command line
  if (!url) {
    console.error('Usage: node index.js <youtube-video-url>')
    process.exit(1)
  }
  // starting the headless browser (Chrome)
  const browser = await puppeteer.launch({
    headless: "new",
    ignoreDefaultArgs: ["--enable-automation"]
  })
  const page = await browser.newPage()
  try {
    await page.goto(url, { waitUntil: 'domcontentloaded' }) // opening the YouTube URL
    await page.evaluate(() => {
      document.querySelector('button[aria-label*=cookies]')?.click() // closing the cookie banner
    })
    // waiting max 10 seconds for the 'Show transcript' button to appear
    await page.waitForSelector("ytd-video-description-transcript-section-renderer button", {
      timeout: 10_000
    })
    // clicking the 'Show transcript' button
    await page.evaluate(() => {
      document.querySelector('ytd-video-description-transcript-section-renderer button').click()
    })
    const result = await parseTranscript(page) // parsing the transcript
    console.log(result) // printing the transcript
  } catch (error) {
    console.error(error)
  } finally {
    await page.close()
    await browser.close()
  }
}

// this function parses the transcript
const parseTranscript = async (page) => {
  // waiting max 10 seconds for the transcript container to appear
  await page.waitForSelector('#segments-container', {
    timeout: 10_000
  })
  // collecting all the text nodes from the transcript container and joining them with newlines
  return page.evaluate(() => {
    return Array.from(document.querySelectorAll('#segments-container yt-formatted-string')).map(
      element => element.textContent?.trim()
    ).join("\n")
  })
}

run()
To run this script, execute the following command:
node index.js https://www.youtube.com/watch?v=1WOQumXj0kg
The output will look like this:
in this video i'm going to demonstrate
how to web scrape into your bubble
application and i'm going to be using
the web scraper page two api um i was
working on a recent client project and
uh i tried a number of different web
scraper apis and i found that page two
api uh offered the best integration for
what i was trying to do
with the bubble api connector plug-in
so that's what i'll be demonstrating uh
to you now um so uh if we head into
the bubble api connector install this
plug-in if you haven't already by bubble
and we'll add another api
...
And if you want to save the transcript to a text file, execute the following command:
node index.js https://www.youtube.com/watch?v=1WOQumXj0kg > transcript.txt
Conclusion
And there you have it: a simple yet effective way to download YouTube transcripts using Puppeteer! This script demonstrates how we can leverage the power of automation to access and extract information from web pages, even when it involves interacting with elements like buttons and waiting for specific content to appear.
Remember, the core of this script utilizes Puppeteer's ability to simulate a real user's interactions, allowing us to navigate through pages, accept cookies, click on elements, and scrape the content we need. With the stealth plugin, we reduce the chances of being detected as a bot, making our scraping activities more seamless and efficient.
I hope this tutorial has demystified the process of web scraping with Puppeteer and shown you that with a bit of JavaScript, you can unlock a vast amount of data available on the web. Feel free to tweak this script to suit your needs—maybe you want to download transcripts from a list of URLs or incorporate additional data into your results.
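For example, if you refactor run() in index.js to accept a URL and return the transcript text, a small wrapper could process a whole list of videos sequentially. This is a sketch, where scrape stands in for that hypothetical refactored function:

```javascript
// process several video URLs one after another, collecting the results;
// `scrape` is assumed to be an async function that takes a URL and
// resolves with the transcript text (e.g. a refactored run() from index.js)
async function scrapeAll(scrape, urls) {
  const results = {};
  for (const url of urls) {
    try {
      results[url] = await scrape(url);
    } catch (error) {
      console.error(`Failed to scrape ${url}:`, error.message);
      results[url] = null; // keep going if one video fails
    }
  }
  return results;
}
```

Running the videos sequentially keeps only one browser page busy at a time, which is gentler on both your machine and YouTube than firing off all requests in parallel.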
Happy coding, and may your curiosity lead you to amazing projects!