It is a really silly idea to use Puppeteer to “scrape the web”.

Oct 28, 2017

It is a really silly idea to use Puppeteer to “scrape the web”. It is super inefficient to launch a whole browser just to scrape content. To start with, the website used in the example is a SPA, which usually means there is an API that you can query directly, bypassing scraping altogether. For websites that aren’t SPAs, it can be around 100 times faster to just request the HTML, parse it, and extract the content from the DOM.
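As a rough illustration of the "request the HTML and parse it" approach, here is a minimal sketch. It assumes Node.js 18+ (where `fetch` is a global), and `fetchHtml` and `extractTitle` are hypothetical helper names. To stay dependency-free it uses a regular expression where a real scraper would use a proper HTML parser.

```javascript
// Fetch a page's HTML with a plain HTTP request; no browser is launched.
// Assumption: Node.js 18+, where `fetch` is available globally.
async function fetchHtml(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}

// Hypothetical helper: pull the <title> text out of an HTML string.
// A regex stands in for a real HTML parser to keep the sketch self-contained.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}
```

Calling `fetchHtml(url).then(extractTitle)` is then a single HTTP round trip plus a string scan, with no browser process to start or tear down.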

If you are serious about scraping content in JavaScript, check out https://github.com/gajus/surgeon. It is tested across thousands of websites and works both in Node.js and in browser environments, such as Puppeteer. Unlike Cheerio, it provides a declarative, strict interface for scraping data. Therefore, if a remote site’s structure changes, you will know instantly, and you don’t need a programmer to maintain your scraper scripts.
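To make the idea of a strict, declarative interface concrete, here is a small sketch. It is not Surgeon's API; `extractStrict` and its schema format are hypothetical, and regular expressions stand in for real selectors. The point is only that a missing field raises an error instead of quietly yielding `undefined`.

```javascript
// Hypothetical strict extractor; this is NOT Surgeon's API.
// `schema` maps field names to non-global regexes with one capture group.
function extractStrict(html, schema) {
  const result = {};
  for (const [field, pattern] of Object.entries(schema)) {
    const match = pattern.exec(html);
    if (!match) {
      // Fail loudly so a change in the page structure is noticed immediately.
      throw new Error(`No match for field "${field}"; the page structure may have changed`);
    }
    result[field] = match[1].trim();
  }
  return result;
}
```

With a schema such as `{ title: /<h1[^>]*>([\s\S]*?)<\/h1>/i }`, a page that drops its `<h1>` makes the call throw rather than return incomplete data.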

Written by Gajus Kuizinas
