Using Puppeteer to “scrape the web” is a really silly idea. Launching a whole browser just to scrape content is super inefficient. To start with, the website used in the example is a SPA, which means there is an API that you can query directly, with no scraping needed in the first place. For websites that aren’t SPAs, it is something like 100x faster to just request the HTML, parse it, and extract the content from the DOM.
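
To make the first point concrete, here is a minimal sketch of querying a SPA's JSON API directly instead of rendering the page in a browser. The endpoint URL, the `movies` response shape, and the function names are all assumptions made up for illustration; the real endpoint is whatever shows up in the browser's network tab for the site in question. It assumes Node.js 18+ for the built-in `fetch`.

```javascript
// Hypothetical endpoint, for illustration only. Find the real one by watching
// the XHR/fetch requests the SPA makes in the browser's network tab.
const API_URL = 'https://example.com/api/movies';

// Pull the fields we care about out of the decoded JSON payload.
// Assumed response shape: { movies: [{ title: string }, ...] }
function extractTitles(payload) {
  if (!payload || !Array.isArray(payload.movies)) {
    // Fail loudly if the remote structure changed, rather than returning junk.
    throw new Error('Unexpected response shape: "movies" array is missing');
  }
  return payload.movies.map((movie) => movie.title);
}

// Request the JSON directly. No browser is launched and no DOM is rendered.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function fetchTitles(url = API_URL, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    headers: { accept: 'application/json' },
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return extractTitles(await response.json());
}

// Usage (performs a real HTTP request, so it is left commented out):
// fetchTitles().then((titles) => console.log(titles));
```

The same pattern applies to sites that aren’t SPAs: request the HTML with a plain HTTP call and hand the response body to an HTML parser instead of calling `response.json()`.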

If you are serious about scraping content in JavaScript, check out https://github.com/gajus/surgeon. It is tested across thousands of websites and works in both Node.js and browser environments, including Puppeteer. Unlike Cheerio, it provides a declarative, strict interface for scraping data. As a result, if a remote site’s structure changes, you will know immediately, and you don’t need a programmer to maintain your scraper scripts.

Written by

Software architect, startup adviser. Editor of https://medium.com/applaudience. Founder of https://go2cinema.com.
