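Last edited by double07 on 2021-12-31 23:02
I'm using proxy_pool-master to grab free proxy IPs, but even after fetching an IP the crawl still hits the site's "human verification" prompt, which suggests the proxy from the pool is not actually being applied. I haven't figured out whether the free IPs simply expire too quickly or whether my code that calls the IP pool is wrong.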
[Python]
# Fetch page content through a proxy taken from proxy_pool
import multiprocessing as mp

import chardet
import requests
from lxml import etree
from tqdm import tqdm

# cookies and get_suburl are defined elsewhere in the script (not shown in this post)

# ===== proxy pool API =====
def get_proxy():
    return requests.get("http://127.0.0.1:5010/get/").json()


def delete_proxy(proxy):
    requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))
# ===== proxy pool API =====


def gethtml(url):
    retry_count = 4
    while retry_count > 0:
        # Take a fresh proxy on every attempt instead of retrying the same one
        proxy = get_proxy().get("proxy")
        proxy_url = "http://{}".format(proxy)
        try:
            # An https:// URL only goes through the proxy if the "https" key is set
            response = requests.get(url, cookies=cookies, timeout=10,
                                    proxies={"http": proxy_url, "https": proxy_url})
            encoding = chardet.detect(response.content)['encoding'] or 'utf-8'
            return response.content.decode(encoding, 'ignore')
        except Exception:
            retry_count -= 1
            delete_proxy(proxy)  # drop the failed proxy from the pool
    return None


# Main program
if __name__ == '__main__':
    u = 'https://cq.ke.com/ershoufang/'
    html = gethtml(u)
    if html is None:
        raise SystemExit('all proxy attempts failed')
    html_1 = etree.HTML(html)
    href_1 = html_1.xpath(
        '//*[@id="beike"]/div[1]/div[3]/div[1]/dl[2]/dd/div[1]/div[1]/a/@href')
    pool = mp.Pool(7)
    crawl = []
    for i in tqdm(href_1, desc='Sub-district download progress'):
        crawl.append(pool.apply_async(get_suburl, args=(i,)))
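To tell "the proxy expired" apart from "the proxy was never used", it can help to check which entry of the proxies dict matches the scheme of the target URL, since requests picks the proxy by scheme. The sketch below is a simplified, standard-library-only imitation of that lookup, not the library's own code. The helper names build_proxies and proxy_for are hypothetical, and 203.0.113.5:8080 is a placeholder address, not a real proxy.

```python
from urllib.parse import urlparse


def build_proxies(proxy):
    """Map both schemes to the same HTTP proxy given as 'host:port'."""
    proxy_url = "http://{}".format(proxy)
    return {"http": proxy_url, "https": proxy_url}


def proxy_for(url, proxies):
    """Return the proxy entry matching the URL scheme, or None.

    Simplified on purpose: only the plain scheme keys are checked.
    """
    scheme = urlparse(url).scheme
    return proxies.get(scheme)


target = "https://cq.ke.com/ershoufang/"
placeholder = "203.0.113.5:8080"  # assumption: stands in for a pool proxy

# A mapping with only an "http" key has no entry for an https:// URL
http_only = {"http": "http://{}".format(placeholder)}
print(proxy_for(target, http_only))

# A mapping with both keys does
print(proxy_for(target, build_proxies(placeholder)))
```

If the first lookup comes back empty for the URL being crawled, the request goes out from the local IP regardless of how fresh the pool's proxies are, so the "human verification" page would appear either way.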