Python: how do I use Selenium for this!?
Last edited by hahawangzi at 2020-5-12 15:51

import requests
from bs4 import BeautifulSoup
import re

url = 'http://www.dm5.com/m11076/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
    'Referer': 'http://www.dm5.com/'
}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
# Look for the image container <div id="showimage"> in the static HTML
img = soup.find_all('div', class_=re.compile('view-+$'), id="showimage")
for i in img:
    print(i)
I also tried Selenium:
from selenium import webdriver
driver=webdriver.Chrome()
driver.get("http://www.dm5.com/m987161/")
print(driver.page_source)
The page source still contains no image URL. Do I really have to take screenshots one page at a time?
That's expected.
It's controlled on the server side: if the key doesn't match, no image data is returned.
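Since the server withholds image data when the key doesn't match, a download loop can check the bytes before writing a file, instead of silently saving empty or error responses as .jpg. A minimal sketch, assuming the images are JPEGs (every URL in this thread ends in .jpg); the helper names are hypothetical, and the check uses the standard JPEG start-of-image signature:

```python
from pathlib import Path

# A JPEG file starts with the SOI marker FF D8, followed by the FF of the next marker.
JPEG_SIGNATURE = b"\xff\xd8\xff"


def looks_like_jpeg(body: bytes) -> bool:
    """True if the response body starts like a JPEG file."""
    return body.startswith(JPEG_SIGNATURE)


def save_if_jpeg(body: bytes, path) -> bool:
    """Write `body` to `path` only if it looks like a JPEG; report whether it was saved."""
    if not looks_like_jpeg(body):
        return False  # e.g. an empty body returned because the key was rejected
    Path(path).write_bytes(body)
    return True
```

In a requests-based loop this would be called as `save_if_jpeg(res.content, f'{i}.jpg')`, and a `False` result is a hint that the cid/key/uk parameters were not accepted.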
Before capturing traffic, it's usually best to analyze the page in the browser first.
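The image URLs all carry cid and key; that part is fine. I looked at the URL: the large image in the middle is rendered and loaded dynamically by JavaScript, so you will never find it in the raw HTML. Either work through the JS code patiently, or simply load the page with Selenium or PyQt5.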
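If you go the Selenium route, one possible reason `driver.page_source` showed no image right after `driver.get()` is that the image is inserted asynchronously. A common fix is to wait until the element actually exists before reading it. A minimal sketch, assuming the image ends up as an `<img>` inside `<div id="showimage">` (the selector is taken from the BeautifulSoup attempt above and is not verified against the site); the function name is hypothetical:

```python
import time
from typing import List

# String value of selenium.webdriver.common.by.By.CSS_SELECTOR
CSS_SELECTOR = "css selector"


def wait_for_image_srcs(driver, css: str = "#showimage img",
                        timeout: float = 15.0, poll: float = 0.5) -> List[str]:
    """Poll the live DOM until at least one element matching `css` has a src.

    `driver` is a Selenium WebDriver that has already loaded the chapter page.
    Raises TimeoutError if nothing shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        elements = driver.find_elements(CSS_SELECTOR, css)
        srcs = [el.get_attribute("src") for el in elements]
        srcs = [s for s in srcs if s]  # skip <img> tags whose src is not set yet
        if srcs:
            return srcs
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element matched {css!r} with a src after {timeout}s")
        time.sleep(poll)
```

Usage would be `driver = webdriver.Chrome()`, `driver.get("http://www.dm5.com/m987161/")`, then `wait_for_image_srcs(driver)`. Selenium's own `WebDriverWait` with `expected_conditions` does the same polling; the plain loop here only keeps the sketch free of extra imports.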
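Friendly tip: in Python you don't need to compile regexes; the module-level re functions compile and cache patterns internally, as the re source shows. The page has script tags: look carefully at which one contains everything needed to build the page. It's all there, only the order is scrambled~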
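Following that tip, a minimal stdlib sketch that pulls the inline `<script>` bodies out of the HTML and keeps the ones containing a marker string. The default marker, the packer signature `eval(function(p,a,c,k,e,d)`, is an assumption about what the relevant script looks like; scanning HTML with a regex is a shortcut that works for locating inline scripts but is not a general HTML parser. It also shows the regex tip: `re.findall` takes the pattern string directly.

```python
import re
from typing import List

# Captures the body of each <script ...>...</script> element (no re.compile needed).
SCRIPT_PATTERN = r"<script[^>]*>(.*?)</script>"


def find_scripts(html: str, marker: str = "eval(function(p,a,c,k,e,d)") -> List[str]:
    """Return the inline script bodies that contain `marker`."""
    bodies = re.findall(SCRIPT_PATTERN, html, flags=re.S | re.I)
    return [body for body in bodies if marker in body]
```

Calling `find_scripts(r.text)` on the HTML fetched with requests would narrow the page down to the candidate scripts worth reading by hand.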
You'll need to analyze the rest yourself. The script contains these tokens:
cdndm5 99 com piaoliujiaoshi specials manhua1001 http 61 50 174 1622b91090844e425ca2f2139c72971a key 4B6E3D3668F4F28A1182A28EC8D94FAC45715C68E30FAEA4D121348B0C3CE11C type 11076 uk jpg cid
0219044747_94824 0219044747_59668 0219044747_61662 0219044747_82966 0219044747_87508 0219044747_32426 0219044747_45820 0219044747_21426 0219044747_19237 0219044747_98862 0219044747_86410 0219044747_75429 0219044747_42385 0219044747_22825 0219044747_45847 0219044747_56343 0219044747_76773 0219044747_15774 0219044747_69095 0219044747_12925 0219044747_36366 0219044747_11224 0219044747_72554 0219044747_21123 0219044747_96988 0219044646_29746 0219044646_16425 0219044646_54290 0219044646_46134 0219044646_96522 0219044646_48812 0219044646_11102
newImgs var
0219044646_93046 0219044646_16913 0219044646_32871 0219044646_18518 0219044707_26863 0219044706_58124 0219044704_44364 0219044747_85866 0219044747_90858 0219044709_90688 0219044646_68028 0219044646_33164 0219044646_56920 0219044702_19913 0219044701_65280 0219044700_69955 0219044848_15330 0219044848_33781 0219044848_34215 0219044848_31538 0219044848_74807 0219044848_43216 0219044848_37838 0219044848_57522 0219044848_94199 0219044848_61486 0219044848_83234 0219044848_38220 0219044848_30939 0219044848_80132 0219044848_84322 0219044848_44403 0219044848_77554 0219044848_88810 0219044848_63075 0219044848_51192 0219044848_84146 0219044848_85505 0219044848_83355 0219044848_40746 0219044801_45398 0219044747_37408 0219044747_34471 0219044805_97807 0219044803_81456 0219044802_92227 0219044747_48049 0219044747_39608 0219044747_43225 0219044747_85597 0219044747_61200 0219044747_60526 0219044848_60668 0219044848_67618 0219044848_31458 0219044848_19159 0219044848_49450 0219044848_18199 0219044809_39333 0219044808_52623 0219044806_59493 0219044848_57728 0219044848_32267 0219044848_21429
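A space-separated word list like the one above (cdndm5, piaoliujiaoshi, cid, key, newImgs, var, ...) is what the common `eval(function(p,a,c,k,e,d){...})` packer stores: the packed code `p` refers to each word in the list `k` by its index written in base `a`, which is why the pieces appear out of order. Assuming the dm5 script uses that packer format (not confirmed in the thread), a minimal Python unpacker sketch; all function names are hypothetical:

```python
import re
from typing import List, Optional

# Digit alphabet of the packer's index encoder: 0-9, then a-z (10-35), then A-Z (36-61).
_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Matches the trailing call "}('<p>',<a>,<c>,'<k>'.split('|')" of a packed script.
_ARGS_RE = re.compile(r"}\('(.*)',\s*(\d+),\s*(\d+),\s*'(.*?)'\.split\('\|'\)", re.S)


def decode_index(token: str, radix: int) -> Optional[int]:
    """Return the keyword index that `token` encodes in base `radix`, or None."""
    value = 0
    for ch in token:
        digit = _DIGITS.find(ch)
        if digit < 0 or digit >= radix:
            return None
        value = value * radix + digit
    return value


def unpack(p: str, a: int, c: int, k: List[str]) -> str:
    """Replace every encoded word in `p` with its entry from the keyword list `k`."""
    def substitute(match):
        word = match.group(0)
        idx = decode_index(word, a)
        if idx is not None and idx < min(c, len(k)) and k[idx]:
            return k[idx]
        return word  # empty keyword (the packer keeps the token) or not an index
    # re.ASCII makes \w match [A-Za-z0-9_], like JavaScript's \w
    return re.sub(r"\b\w+\b", substitute, p, flags=re.ASCII)


def unpack_script(src: str) -> str:
    """Pull p, a, c, k out of a packed script and unpack it."""
    m = _ARGS_RE.search(src)
    if m is None:
        raise ValueError("not a p,a,c,k,e,d packed script")
    p = m.group(1).replace("\\'", "'")  # undo only the simplest JS string escaping
    return unpack(p, int(m.group(2)), int(m.group(3)), m.group(4).split("|"))
```

This does the substitution in one pass, the way the packer's own decoder does in modern browsers; the output is plain JavaScript in which the URL-building code (with cid, key, uk and the file names) can be read directly.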
Below are the image URLs:
http://manhua1001-61-174-50-99.cdndm5.com/specials/p/piaoliujiaoshi/0219044646_93046.jpg?cid=11076&key=1622b91090844e425ca2f2139c72971a&type=1&uk=4B6E3D3668F4F28A1182A28EC8D94FACF6264725111DA400397ABD0A187271BB
http://manhua1001-61-174-50-99.cdndm5.com/specials/p/piaoliujiaoshi/0219044646_54290.jpg?cid=11076&key=1622b91090844e425ca2f2139c72971a&type=1&uk=4B6E3D3668F4F28A1182A28EC8D94FACF6264725111DA400D18C3D1A105B1CB2
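The two URLs above share host, cid, key and type and differ only in the file name and uk; the host itself is assembled from tokens in the list above (manhua1001, 61, 174, 50, 99, cdndm5, com). A small stdlib sketch for splitting such URLs into their parts and listing which fields differ; the function names are hypothetical:

```python
from pathlib import PurePosixPath
from typing import Dict, List
from urllib.parse import parse_qs, urlsplit


def describe_image_url(url: str) -> Dict[str, str]:
    """Split an image URL into host, file name and its query parameters."""
    parts = urlsplit(url)
    query = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return {"host": parts.netloc, "file": PurePosixPath(parts.path).name, **query}


def diff_fields(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Names of the fields whose values differ between two described URLs."""
    return [name for name in sorted(set(a) | set(b)) if a.get(name) != b.get(name)]
```

Comparing several pages this way makes it easy to see which values have to be recomputed per image (here `file` and `uk`) and which stay fixed for the chapter.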
import execjs
import requests

# 1.js holds the chapterfun.ashx response; calling dm5imagefun() in it returns the image URLs
with open('1.js', 'r', encoding='utf-8') as fp:
    ctx = execjs.compile(fp.read())
a = ctx.call('dm5imagefun')
headers = {
    'Referer': 'http://www.dm5.com/m11076-p2/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3756.400 QQBrowser/10.5.4039.400'
}
# Download each image, sending the chapter page as Referer
for i, url in enumerate(a):
    res = requests.get(url, headers=headers)
    with open(f'{i}.jpg', 'wb') as fpp:
        fpp.write(res.content)
The 1.js file is the response to the request http://www.dm5.com/m11076-p2/chapterfun.ashx?cid=*******, which is sent from chapternew_v22.js.
This way you get all the image URLs directly.
辣丝丝小白菜 posted on 2020-4-17 11:56
import execjs
import requests
Sir, could you explain in more detail? How do I request http://www.dm5.com/m11076-p2/chapterfun.ashx?cid=******* to get the 1.js file?
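One way to produce 1.js is to request chapterfun.ashx yourself and save the response text. A minimal stdlib sketch, assuming you copy the full request URL (including the query string elided as `cid=*******` in this thread) from the browser's Network tab while the chapter page loads, and assuming the chapter page as Referer is enough for the server to answer; neither is confirmed in the thread. The function name and the injectable `opener` parameter are hypothetical, the latter only so the sketch can be exercised offline:

```python
import urllib.request
from pathlib import Path


def fetch_chapterfun(url: str, referer: str, out_path: str = "1.js",
                     opener=urllib.request.urlopen) -> str:
    """Download the chapterfun.ashx response and save it as a JS file.

    `url` must be the full request URL as seen in the browser's Network tab.
    """
    request = urllib.request.Request(url, headers={
        "Referer": referer,  # the chapter page the request is normally sent from
        "User-Agent": "Mozilla/5.0",
    })
    with opener(request, timeout=10) as response:
        text = response.read().decode("utf-8")
    Path(out_path).write_text(text, encoding="utf-8")
    return text
```

With `referer="http://www.dm5.com/m11076-p2/"`, the saved file can then be loaded by the execjs snippet earlier in the thread. If the response comes back empty, compare the query parameters with a fresh request in the browser, since they may be tied to the page load.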