Question about an IP proxy pool

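double07 posted on 2021-12-31 21:28

This post was last edited by double07 on 2021-12-31 23:02

I'm using proxy_pool-master to grab free IPs. Even after fetching an IP from the pool, the site still shows its "human verification" prompt when I scrape, which suggests the pool's IP is not actually being applied. I can't tell whether the free IPs simply expire too quickly or whether my code for calling the IP pool is wrong.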
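# Fetch page content
import multiprocessing as mp
import chardet
import requests
from lxml import etree
from tqdm import tqdm
# Note: `cookies` and `get_suburl` are used below but were not included in the post.
# ========================================================================= Proxy API calls
def get_proxy():
    return requests.get("http://127.0.0.1:5010/get/").json()

def delete_proxy(proxy):
    requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))
# ========================================================================= Proxy API calls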

def gethtml(url):
    retry_count = 4
    proxy = get_proxy().get("proxy")
    while retry_count > 0:
        try:
            response = requests.get(url, cookies=cookies, proxies={"http": "http://{}".format(proxy)})
            encodingInfo = chardet.detect(response.content)
            r_response = response.content.decode(encodingInfo['encoding'], 'ignore')
            return r_response
        except Exception:
            retry_count -= 1
            delete_proxy(proxy)
    return None

# Main program
if __name__ == '__main__':
    u = 'https://cq.ke.com/ershoufang/'
    html = gethtml(u)
    html_1 = etree.HTML(html)
    href_1 = html_1.xpath(
        '//*[@id="beike"]/div/div/div/dl/dd/div/div/a/@href')
    pool = mp.Pool(7)
    crawl = []
    for i in tqdm(href_1, desc='Sub-area download progress'):
        crawl.append(pool.apply_async(get_suburl, args=(i,)))
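
Two things in gethtml are worth checking. This is a sketch, not a confirmed diagnosis. First, requests picks a proxy by URL scheme, so a proxies dict with only an "http" key is not applied to an https:// URL such as https://cq.ke.com/ershoufang/, and that request goes out directly. Second, the proxy is fetched once before the loop, so every retry reuses the proxy that was just deleted. The helper names below (build_proxies, fetch_with_rotation, requests_fetch) are hypothetical and are not part of proxy_pool or requests.

```python
def build_proxies(proxy):
    """Map both URL schemes to the same HTTP proxy address ("host:port")."""
    address = "http://{}".format(proxy)
    return {"http": address, "https": address}


def fetch_with_rotation(url, get_proxy, delete_proxy, fetch, retries=4):
    """Try up to `retries` proxies, drawing a fresh one for every attempt."""
    for _ in range(retries):
        proxy = get_proxy()          # new proxy each time, not one reused proxy
        try:
            return fetch(url, build_proxies(proxy))
        except Exception:
            delete_proxy(proxy)      # drop the proxy that just failed
    return None


def requests_fetch(url, proxies, timeout=10):
    """Hypothetical fetcher built on requests; raises on errors and bad statuses."""
    import requests  # imported lazily so the helpers above have no dependencies
    response = requests.get(url, proxies=proxies, timeout=timeout)
    response.raise_for_status()
    return response.text
```

With the post's own helpers this could be called as fetch_with_rotation(u, lambda: get_proxy().get("proxy"), delete_proxy, requests_fetch). Cookies are left out here for brevity. Note that a verification page returned with status 200 does not raise, so it would still need a separate check.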

tmp posted on 2021-12-31 21:38

Have you considered that other people are doing exactly the same thing?

2513002960 posted on 2021-12-31 21:57

It could also be that your request headers are incomplete and are failing the server-side checks.
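
To illustrate the point about headers: by default requests identifies itself with a "python-requests/..." User-Agent, which is easy for a server to flag. Below is a minimal sketch of sending browser-like headers. The header values are examples only, build_headers is a hypothetical helper, and which headers this particular site actually checks is not known.

```python
# Example values only: the headers this particular site checks are not known.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
)


def build_headers(referer=None, user_agent=DEFAULT_USER_AGENT):
    """Return a browser-like header set; Referer is added only when given."""
    headers = {
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    }
    if referer:
        headers["Referer"] = referer
    return headers
```

These can be passed alongside the existing arguments, for example requests.get(url, headers=build_headers(referer="https://cq.ke.com/"), cookies=cookies, proxies=...).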

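double07 posted on 2021-12-31 22:06

tmp posted on 2021-12-31 21:38
Have you considered that other people are doing exactly the same thing?

To all the experts here: does anyone know a good option? Paid ones are fine too.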

幽溪左畔 posted on 2021-12-31 22:29

Some sites check whether a request comes from a proxy IP. Free proxies won't work very well.
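
Since free proxies are unreliable, it can help to separate two cases: a proxy that is dead or never applied, and a proxy that works but is rejected by the target site. A rough sketch of the first check compares the IP an echo service sees with and without the proxy. The choice of https://httpbin.org/ip as the echo service and the function names visible_ip and proxy_is_applied are assumptions made for illustration.

```python
def visible_ip(proxies=None, timeout=5):
    """Return the IP address an echo service reports for this request."""
    import requests  # imported lazily; only needed for the real network call
    response = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=timeout)
    response.raise_for_status()
    return response.json()["origin"]


def proxy_is_applied(proxy, get_ip=visible_ip):
    """True if traffic through `proxy` exits from a different IP than direct traffic."""
    address = "http://{}".format(proxy)
    proxies = {"http": address, "https": address}
    try:
        return get_ip(proxies) != get_ip(None)
    except Exception:
        return False  # dead or unreachable proxy: treat it as unusable
```

A True result only shows that the request left through another address. It says nothing about whether the target site has already blacklisted that address.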

ibook posted on 2022-1-1 01:00

It's probably an IP that countless people have already hammered, so it's completely burned.


吾爱破解 - 52pojie.cn's Archiver
