Puppeteer默认以无头模式启动Chromium ,可以作为爬虫程序运行在服务器端,要启动完整版的Chromium,可在启动浏览器时设置headless选项。
Chrome服务器端运行 - Puppeteer无头模式
安装
npm i puppeteer # or "yarn add puppeteer"
安装Puppeteer时,会下载最新版本Chromium(〜170MB Mac,〜282MB Linux,〜280MB Win),以保证可与该API一起使用,如果下载时出现异常,参考下载异常处理。
示例1:
example.js
const puppeteer = require('puppeteer'); (async () => { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://www.baidu.com'); await page.screenshot({path: 'example.png'}); await browser.close(); })();
运行:
node example.js
说明:在example.js文件所在目录下会生成一个网页截图。
示例2:
每5分钟运行启动一次,访问网页并读通过dom操作读取数据。
const puppeteer = require('puppeteer'); var schedule = require('node-schedule'); var http = require('http'); http.post = require('http-post'); var j = schedule.scheduleJob(' */5 * * * *', function(){ console.log('The answer to life, the universe, and everything!'); (async () => { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://www.baidu.com'); const textArray= await page.$$eval('#abc td',list=>{ var json = []; for(var i =0;i< list.length;i++){ var e = list[i]; var title = e.firstElementChild.textContent,c = null; json.push({title,c}); } return json; }); var name = JSON.stringify(textArray); http.post('http://localhost/to/server', { name}, function(res){}); await browser.close(); })(); });