
Notes from my first day writing a crawler in Node

2023-09-23 23:54 | Source: compiled from the web

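First off, I am a fresh graduate working in front-end development. Today a friend asked me to help scrape the data from the Maoyan all-time box-office ranking. I had never touched web scraping before, but I still said: "okk, I'll give it a try." Giving it a try nearly finished me off: one pitfall after another.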

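A note up front: when first written, this post only got as far as fetching the page data and did not yet extract the key fields with regular expressions (more to follow). Update: this has since been added.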

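Key point: to get the data from the Maoyan box-office ranking, Node has to imitate a browser when requesting this URL: https://piaofang.maoyan.com/mdb/rank/query?type=0&id=2021. The request must also set the User-Agent and Cookie headers, otherwise the server returns 401.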
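As a rough sketch of that requirement, the helper below builds the ranking URL for a given year and bundles the two headers the server expects. The name `buildRankRequest` and the placeholder cookie and user-agent strings are assumptions for illustration; real values have to be copied from a browser session. It only prepares the request and does not send it.

```javascript
// Hypothetical helper: prepares the URL and headers for one year's ranking request.
// It performs no network access; the caller passes the result to an HTTP client.
const RANK_ENDPOINT = 'https://piaofang.maoyan.com/mdb/rank/query';

function buildRankRequest(year, cookie, userAgent) {
  if (!Number.isInteger(year)) {
    throw new TypeError('year must be an integer, e.g. 2021');
  }
  const url = new URL(RANK_ENDPOINT);
  url.searchParams.set('type', '0');
  url.searchParams.set('id', String(year));
  return {
    url: url.toString(),
    // Both headers are needed so the request looks like it comes from a browser.
    headers: { 'Cookie': cookie, 'User-Agent': userAgent },
  };
}

// Placeholder values: replace them with the ones from your own browser.
const request = buildRankRequest(2021, '<your cookie>', '<your user agent>');
console.log(request.url);
```

Keeping the URL and header construction in one place makes it easy to reuse the same settings for other years.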

Code:

// Load superagent, which helps us send GET and POST requests
const superagent = require('superagent');

// Request URL
const url = 'https://piaofang.maoyan.com/movie/344264';
// const url = 'https://piaofang.maoyan.com/mdb/rank/query?type=0&id=2021'

superagent
  .get(url)
  .set('Cookie', 'mta=248378680.1622353618161.1622358743253.1622360863750.5; _lxsdk_cuid=179bbcfa476c8-08ab923c0a6f91-d7e1938-e1000-179bbcfa476c8; theme=moviepro; Hm_lvt_703e94591e87be68cc8da0da7cbd0be2=1622360700; Hm_lpvt_703e94591e87be68cc8da0da7cbd0be2=1622360700; _lxsdk=EEFEF990C11A11EB88F7CB3FB083BC96E951611E0C3843B5B875568FCDE2885A; _lxsdk_s=179bc3bb2e4-e1c-418-4b0%7C%7C8')
  .set('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36')
  .then(res => {
    // Print the whole response object
    console.log(res);
  })
  .catch(err => {
    console.log(err);
  });

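This code uses superagent to imitate a browser visiting the Maoyan servers. With it, the data can be scraped both for a single movie page and for the box-office ranking. Extracting the key fields with regular expressions is something I am still learning.

[Image: part of the scraped data]

Back-end code: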

// Load koa
const koa = require("koa2")
// Create the koa instance
const app = new koa();
// Load the router
const Router = require('koa-router')
const router = new Router();

// Handle cross-origin requests (CORS)
app.use(async (ctx, next) => {
  ctx.set("Access-Control-Allow-Origin", "*")
  await next()
})

// Load superagent, which helps us send GET and POST requests
const superagent = require('superagent');
// Load cheerio, which helps us process the fetched HTML string
const cheerio = require('cheerio')

// Crawler methods
// Login credentials and browser impersonation; without the cookie the server returns a 403 error (no permission)
const cookie = '__mta=248378680.1622353618161.1622822135342.1622825432418.31; _lxsdk_cuid=179bbcfa476c8-08ab923c0a6f91-d7e1938-e1000-179bbcfa476c8; Hm_lvt_703e94591e87be68cc8da0da7cbd0be2=1622360700; _lxsdk=EEFEF990C11A11EB88F7CB3FB083BC96E951611E0C3843B5B875568FCDE2885A; theme=moviepro; _lxsdk_s=179d7eef51d-c2e-cc-a4c%7C%7C2';
const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36';
let doubanCookie = 'll="118281"; bid=UEuG1A0t0w8; _vwo_uuid_v2=D38900B3B458B847163B795EEAEB0FDE0|defa03deedbec2f1680989c2da76c7ba; __utmz=30149280.1622560751.4.2.utmcsr=search.douban.com|utmccn=(referral)|utmcmd=referral|utmcct=/movie/subject_search; __utmz=223695111.1622560751.4.2.utmcsr=search.douban.com|utmccn=(referral)|utmcmd=referral|utmcct=/movie/subject_search; ap_v=0,6.0; _pk_ref.100001.4cf6=%5B%22%22%2C%22%22%2C1622645515%2C%22https%3A%2F%2Fsearch.douban.com%2Fmovie%2Fsubject_search%3Fsearch_text%3D%25E7%2596%25AF%25E7%258B%2582%25E5%258A%25A8%25E7%2589%25A9%25E5%259F%258E%26cat%3D1002%22%5D; _pk_id.100001.4cf6=e59993bfdc54083f.1622356663.5.1622645515.1622561683.; _pk_ses.100001.4cf6=*; __utma=30149280.1966141415.1622356664.1622560751.1622645515.5; __utmb=30149280.0.10.1622645515; __utmc=30149280; __utma=223695111.772182386.1622356664.1622560751.1622645515.5; __utmb=223695111.0.10.1622645515; __utmc=223695111';

// Get the box-office ranking for a given year
function getYearMovieList(year) {
  let url = 'https://piaofang.maoyan.com/mdb/rank/query?type=0&id=' + year;
  return superagent
    .get(url)
    .set('Cookie', cookie)
    .set('User-Agent', userAgent)
    .then(res => {
      // The ranking endpoint answers with JSON; keep the first 50 entries
      const data = JSON.parse(res.text).data.list;
      return data.slice(0, 50);
    }).catch(err => {
      return err;
    })
}

// Get the key information for a single movie
function getMovieDetail(id, name) {
  let url = 'https://piaofang.maoyan.com/movie/' + id;
  return superagent
    .get(url)
    .set('Cookie', cookie)
    .set('User-Agent', userAgent)
    .then(res => {
      const $ = cheerio.load(res.text);
      // From here on we can work with the DOM through jQuery-style methods
      // Use a regular expression to strip line breaks and spaces
      const movieTypeText = $(".info-category").html().replace(/\n|\s*/g, '').trim();
      // Selector assumed to be ".ellipsis-1"; the source shows "..ellipsis-1", which is not valid CSS
      const movieCountryText = $(".ellipsis-1").html().replace(/\n|\s*/g, '').trim();
      const scoringNumDom = $(".detail-score-count").html() ? $(".detail-score-count").html() : '';
      const score = $(".rating-num").html();
      const maleRatioDom = $(".male").html() ? $(".male").html().replace(/\n|\s*/g, '').trim() : '';
      const femaleRatioDom = $(".female").html() ? $(".female").html().replace(/\n|\s*/g, '').trim() : '';
      const personRatioRegex = /(.*?)(.*?)/
      const cityRatio = /(.*?)(.*?)/
      let movieType = movieTypeText.split('
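getYearMovieList above parses the response body as JSON and keeps the first 50 entries of data.list. As a minimal sketch, that step can be pulled out into a pure function and checked without any network access. The name `parseRankList` and the sample payload (including the `movieName` and `boxOffice` fields) are made up for illustration; only the `data.list` path comes from the code above.

```javascript
// Hypothetical helper: turns the raw response text into at most `limit` ranking entries.
// Returns an empty array when the body is not JSON (for example an HTML error page)
// or when data.list is missing.
function parseRankList(responseText, limit = 50) {
  let body;
  try {
    body = JSON.parse(responseText);
  } catch (err) {
    return [];
  }
  const list = body && body.data && body.data.list;
  return Array.isArray(list) ? list.slice(0, limit) : [];
}

// Made-up sample payload; the real response shape may contain different fields.
const sampleText = JSON.stringify({
  data: {
    list: [
      { movieName: 'Movie A', boxOffice: 300 },
      { movieName: 'Movie B', boxOffice: 200 },
      { movieName: 'Movie C', boxOffice: 100 },
    ],
  },
});

const topTwo = parseRankList(sampleText, 2);
console.log(topTwo.map(item => item.movieName));
```

Returning an empty array on bad input is one possible design choice; it keeps callers from having to tell an error object apart from a list, which the `.catch(err => { return err; })` pattern above would require.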

