Beginner asking for help!!! Hoping an expert can answer
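Straight to the point. I wanted to learn web scraping and ran into a problem right away. I wanted to fetch the data at https://mewe.groups.hk/%E9%A3%B2%E9%A3%9F (the 飲食, food and drink, category), but it turned out to be loaded by an AJAX request. Looking at that POST request, I found that the value of g-recaptcha-response changes every time. As usual I kept debugging, but couldn't figure it out... Help!!

lxl6832 (posted 2021-8-3 14:14): That's an anti-scraping mechanism.

OP: Could someone explain how that piece of JS generates the token? I can actually scrape the page with Selenium: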
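from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from openpyxl import Workbook
import lxml.html
import time

# Spreadsheet that collects one row per group
wb = Workbook()
sheet = wb.active
headRow = ['Group name', 'Link', 'Size', 'Description', 'Category']
sheet.append(headRow)


def scroll(num, category):
    """Scroll the current page up to `num` times to trigger lazy loading,
    then parse every loaded group card into the sheet once."""
    try:
        js = 'return document.body.scrollHeight;'
        height = 0
        page = 0
        while page < num:
            page += 1
            new_height = zh.execute_script(js)
            if new_height > height:
                # Page grew since the last check: scroll down and wait for the AJAX load
                zh.execute_script('window.scrollTo(0, document.body.scrollHeight)')
                time.sleep(3)
                height = new_height
            else:
                print("The scrollbar is already at the bottom of the page!")
                # zh.execute_script('window.scrollTo(0, 0)')  # scroll back to the top
                break
        # Parse once after scrolling so the same cards are not appended repeatedly
        html = lxml.html.fromstring(zh.page_source)
        table = html.xpath('//div[@class="group-wrapper"]/a')
        for sub_x in table:
            # ''.join avoids IndexError when an attribute is missing
            name = ''.join(sub_x.xpath('./@title'))
            link = ''.join(sub_x.xpath('./@href'))
            nums = ' '.join(sub_x.xpath('.//span/text()'))
            intro = ''.join(sub_x.xpath('.//div[@class="group-desc"]/text()'))
            # The row list was garbled in the post; reconstructed to match headRow
            rowData = [name, link, nums, intro, category]
            sheet.append(rowData)
    except Exception as e:
        print(str(e))


# Attach to a Chrome already started with --remote-debugging-port=9221
chrome_options = Options()
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:9221")
# Selenium 4 style; Selenium 3 used webdriver.Chrome(path, chrome_options=...)
zh = webdriver.Chrome(service=Service(r'D:\pythongc\chromedriver.exe'), options=chrome_options)
url1 = 'https://mewe.groups.hk/'
zh.get(url1)
html = lxml.html.fromstring(zh.page_source)
# Relative links to each category page on the home page
page_urls = html.xpath('//ul[@class="cat-lists scroll"]/li/a/@href')
for i in page_urls:
    page_url = url1[:-1] + i  # drop the trailing '/' before appending the href
    zh.get(page_url)
    scroll(10, i)  # the category href fills the 'Category' column
wb.save('sss.xlsx')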
print(html)

Domado (posted 2021-8-3 14:31): g-recaptcha-response is the token value from Google's reCAPTCHA, and it is extremely hard to crack.

Reply: When in doubt, use Selenium.
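The Selenium route works because the real browser runs Google's script itself. The part of the script above that does the actual work is the stopping rule in `scroll()`: keep scrolling while `document.body.scrollHeight` grows, up to a cap. A minimal, browser-free sketch of just that rule follows; `scroll_until_stable`, `get_height` and `scroll_to_bottom` are hypothetical names, and the two browser calls are injected so the logic can be tried without Chrome.

```python
import time
from typing import Callable


def scroll_until_stable(get_height: Callable[[], int],
                        scroll_to_bottom: Callable[[], None],
                        max_scrolls: int,
                        wait: float = 3.0) -> int:
    """Scroll to the bottom repeatedly until the page height stops growing
    or max_scrolls is reached; return the number of scrolls performed."""
    last_height = -1
    scrolls = 0
    while scrolls < max_scrolls:
        height = get_height()
        if height <= last_height:
            break  # the previous scroll loaded nothing new
        scroll_to_bottom()
        time.sleep(wait)  # give the AJAX request time to append more cards
        last_height = height
        scrolls += 1
    return scrolls
```

With Selenium one would pass `lambda: zh.execute_script('return document.body.scrollHeight;')` and `lambda: zh.execute_script('window.scrollTo(0, document.body.scrollHeight)')`, then parse `zh.page_source` once afterwards. Note that it always takes one extra scroll to discover that nothing more loads.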
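OP: ...I knew it. No wonder I spent ages debugging and it felt like some kind of verification, but I couldn't confirm it was Google's. Thanks a lot!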
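On the question of how the JS produces the token: the value is generated inside Google's own obfuscated reCAPTCHA script, not by the site's code, and in the standard reCAPTCHA setup the site's backend checks it with Google before answering the request. The sketch below shows that backend check, assuming mewe.groups.hk uses the standard setup (not confirmed). `secret` is the site's private key, which a scraper never sees; `build_verify_request`, `is_token_valid` and the injected `fetch` are hypothetical names for illustration. Google's documentation states that each token is valid for two minutes and can be verified only once, so a token copied from DevTools fails on replay, which is why the value keeps changing.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Google's documented verification endpoint for reCAPTCHA tokens
SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str,
                         remote_ip: Optional[str] = None) -> urllib.request.Request:
    """Build the POST a site's backend sends to Google for one token."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    body = urllib.parse.urlencode(fields).encode()
    return urllib.request.Request(SITEVERIFY_URL, data=body, method="POST")


def is_token_valid(secret: str, token: str,
                   fetch: Callable[[urllib.request.Request], bytes]) -> bool:
    """Return True only if Google reports the token as valid.

    fetch is injected so the logic runs without network access; in a real
    backend it could be lambda req: urllib.request.urlopen(req, timeout=10).read().
    """
    reply = json.loads(fetch(build_verify_request(secret, token)))
    # A reused or expired token comes back with success == False
    # and the error code "timeout-or-duplicate".
    return bool(reply.get("success"))
```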