How can I quickly tell whether a page actually has content?

⊙⌒⊙ posted on 2022-12-21 15:53

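https://jhl.ke.seewo.com/live/plan/826510684487921664
has a video.
https://jhl.ke.seewo.com/live/plan/826514856624701440
also has a video.
https://jhl.ke.seewo.com/live/plan/826514856624701445
has no video and shows a 404.

How can I find every address that has a video?
I've been at this for days and keep running into errors. Could someone take a look and tell me whether there's a faster way?
Thanks!!
I'm using for i in range(9999999999999999):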
import requests
from urllib.parse import urlencode
import json
import csv
import base64
from multiprocessing import Pool

url2='https://jhl.ke.seewo.com/live/plan/826510684487921664' # actual page URL
url='https://jhl.ke.seewo.com/live/fetch?actionName=GET_PLAN_DETAIL&ts=1670989434716' # URL used to check for a video

def checkurl(num):
    apiUrl="/live/v1/plan/8265"+num+"/open/detail"
    # apiUrl goes inside quotes so that the payload is valid JSON
    strs='{"method":"GET","apiUrl":"'+apiUrl+'","headers":{"userName":"","userType":"","userId":""},"baseURL":"http://live.seewo.com/live-server"}'
    #print(strs)
    result=base64.b64encode(strs.encode('utf-8')).decode('ascii')
    #print(result)

    headers={
        'Accept': 'application/json, text/plain, */*',
        'Content-Type': 'application/json',
        "ApiExtend":result
    }

    response=requests.post(url=url,headers=headers)
    result=json.loads(response.text)
    if(result['success']):
        print('https://jhl.ke.seewo.com/live/plan/8265'+num)
        # the with block closes the file on exit
        with open('url.txt', 'a+', encoding='utf-8') as f:
            f.write('https://jhl.ke.seewo.com/live/plan/8265'+num+"\n")

def main():
    #for i in range (99999999999999):
    # keep track of the processes
    Process_list = []
    # create and start the processes, limiting the process count
    p = Pool(10)
    # for (cid,) in cids:
    for i in range (1,99999999999999):
        num=str(i).zfill(14)
        # print(cid)
        # exit()
        p.apply_async(checkurl, args=(num,))
        Process_list.append(p)
        print(i,end=" ")
    p.close()
    p.join()

if __name__ == '__main__':
    main()

choujie1689 posted on 2022-12-21 15:57

response=requests.post(url=url,headers=headers)
#result=json.loads(response.text)
if(response.status_code == 200):
    print('https://jhl.ke.seewo.com/live/plan/8265'+num)
    with open('url.txt', 'a+', encoding='utf-8') as f:
        f.write('https://jhl.ke.seewo.com/live/plan/8265'+num+"\n")

Check response.status_code: 200 means there is a video, 404 means there isn't.
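A minimal sketch of that status-code check, for anyone who wants to try it in isolation. It uses the standard library's urllib.request instead of requests so it runs without extra packages, and the names fetch_status and has_video are made up for this example. It assumes the endpoint really does answer 404 for a missing plan, which this thread does not confirm for every URL.

```python
import urllib.error
import urllib.request

def fetch_status(url, timeout=10):
    """Return the HTTP status code of a GET request, or None on a network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx, but the exception still carries the code
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout: the result is unknown
        return None

def has_video(url, fetch=fetch_status):
    """True for 200, False for 404, None when the answer is unclear."""
    status = fetch(url)
    if status == 200:
        return True
    if status == 404:
        return False
    return None  # hypothetical convention: retry these later
```

Returning None for anything other than 200 or 404 keeps network hiccups from being recorded as "no video".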

qeq66 posted on 2022-12-21 16:10

Last edited by qeq66 on 2022-12-21 16:17

http://live.seewo.com/live-server/live/v1/plan/826514856624701445/open/detail
Use this endpoint for the check: take the id out of the page address and query it.
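The idea of pulling the id out of a page address and turning it into the API address could be sketched like this. Only the two URL patterns come from this thread; the names API_TEMPLATE, plan_id_from_page_url and api_url_for are invented for the example.

```python
from urllib.parse import urlparse

# URL pattern taken from the post above; the constant name is hypothetical
API_TEMPLATE = "http://live.seewo.com/live-server/live/v1/plan/{plan_id}/open/detail"

def plan_id_from_page_url(page_url):
    """Return the last path segment of a page URL such as .../live/plan/<id>."""
    path = urlparse(page_url).path.rstrip("/")
    plan_id = path.rsplit("/", 1)[-1]
    if not plan_id.isdigit():
        raise ValueError(f"no numeric plan id in {page_url!r}")
    return plan_id

def api_url_for(page_url):
    """Build the API address that corresponds to a page address."""
    return API_TEMPLATE.format(plan_id=plan_id_from_page_url(page_url))

print(api_url_for("https://jhl.ke.seewo.com/live/plan/826514856624701445"))
```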

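⊙⌒⊙ posted on 2022-12-21 16:15

qeq66 posted on 2022-12-21 16:10
"https://jhl.ke.seewo.com/live/fetch?actionName=GET_PLAN_DETAIL&ts=1671 ...

Thanks. I can already generate this API request. What I need now is to work out which page addresses actually have content, so I have to check them in a loop :(
I'm using multiple processes with requests to do the check, and after a while it keeps failing :(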

⊙⌒⊙ posted on 2022-12-21 16:17

result=base64.b64encode(strs.encode('utf-8')).decode('ascii')
This already generates the API request.

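qeq66 posted on 2022-12-21 16:17

⊙⌒⊙ posted on 2022-12-21 16:15
Thanks. I can already generate this API request. What I need now is to work out which page addresses actually have content, so I have to check them in a loop :(
I'm using multiple processes with req ...

http://live.seewo.com/live-server/live/v1/plan/826514856624701445/open/detail

Query this endpoint.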

⊙⌒⊙ posted on 2022-12-21 16:23

qeq66 posted on 2022-12-21 16:17
http://live.seewo.com/live-server/live/v1/plan/826514856624701445/open/detai ...

I queried this endpoint and it seems a lot faster. Why is that?

⊙⌒⊙ posted on 2022-12-21 16:23

Exception in thread Thread-1 (_handle_workers):
Traceback (most recent call last):
  File "D:\Program Files\Python\Lib\threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "D:\Program Files\Python\Lib\threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Program Files\Python\Lib\multiprocessing\pool.py", line 524, in _handle_workers
    taskqueue.put(None)
MemoryError

It had only been running for a little while :( and it failed again.
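A likely cause of the MemoryError: apply_async returns immediately and puts each task on the pool's internal queue, so a loop over trillions of ids queues work far faster than the workers can finish it, and Process_list grows by one entry per iteration as well. One way around that, sketched below, is to submit the ids in fixed-size batches and wait for each batch before queuing the next. This sketch swaps multiprocessing.Pool for a thread pool from concurrent.futures, on the assumption that the work is network-bound; run_in_batches and its parameters are hypothetical names, and check stands for any function that returns True when an id has a video.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def run_in_batches(check, numbers, workers=10, batch_size=1000):
    """Apply check() to every number while keeping at most batch_size tasks queued."""
    numbers = iter(numbers)
    hits = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Take the next slice of ids without materialising the whole range
            batch = list(islice(numbers, batch_size))
            if not batch:
                break
            # Consuming pool.map waits for the whole batch, so the next batch
            # is only queued once this one has finished
            for num, ok in zip(batch, pool.map(check, batch)):
                if ok:
                    hits.append(num)
    return hits
```

A call would look like run_in_batches(my_check, range(1, 100000)), where my_check is whatever per-id test is in use. Memory use then depends on batch_size rather than on the size of the id range.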

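qeq66 posted on 2022-12-21 16:28

⊙⌒⊙ posted on 2022-12-21 16:23
I queried this endpoint and it seems a lot faster. Why is that?

This one is the API endpoint; the ones you listed above are all page addresses. A page address ends up requesting the API address anyway. As for your error, I'm not sure what's causing it.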

⊙⌒⊙ posted on 2022-12-21 16:35

import requests
from urllib.parse import urlencode
import json
import csv
import base64
from multiprocessing import Pool

url2='https://jhl.ke.seewo.com/live/plan/826510684487921665' # actual page URL
url='https://jhl.ke.seewo.com/live/fetch?actionName=GET_PLAN_DETAIL&ts=1670989434716' # URL used to check for a video

def checkurl(num):
    apiUrl="http://live.seewo.com/live-server/live/v1/plan/8265"+num+"/open/detail"
    # apiUrl goes inside quotes so that the payload is valid JSON
    strs='{"method":"GET","apiUrl":"'+apiUrl+'","headers":{"userName":"","userType":"","userId":""},"baseURL":"http://live.seewo.com/live-server"}'
    #print(strs)
    result=base64.b64encode(strs.encode('utf-8')).decode('ascii')
    #print(result)

    headers={
        'Accept': 'application/json, text/plain, */*',
        'Content-Type': 'application/json',
        "ApiExtend":result
    }

    # query the API address built above
    response=requests.post(url=apiUrl,headers=headers)
    #result=json.loads(response.text)
    if(response.status_code == 200):
        print('https://jhl.ke.seewo.com/live/plan/8265'+num)
        with open('url.txt', 'a+', encoding='utf-8') as f:
            f.write('https://jhl.ke.seewo.com/live/plan/8265'+num+"\n")

def main():
    #for i in range (99999999999999):
    # keep track of the processes
    Process_list = []
    # create and start the processes, limiting the process count
    p = Pool(30)
    # for (cid,) in cids:
    for i in range (11111151600000,99999999999999):
        num=str(i).zfill(14)
        # print(cid)
        # exit()
        p.apply_async(checkurl, args=(num,))
        Process_list.append(p)
        # record progress every 100000 ids
        if i % 100000 ==0:
            with open('url2.txt', 'a+', encoding='utf-8') as f:
                f.write(str(i)+",")
    p.close()
    p.join()

if __name__ == '__main__':
    main()

After running for a while it shuts down by itself. What's wrong with the program?
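Separately from the crash, it may help to estimate how long the full loop would take. The bounds below are copied from the range() call in the script above; the rate of 1000 requests per second is purely an assumption for the sake of the arithmetic, and scan_years is a made-up helper name.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def scan_years(first_id, last_id, requests_per_second):
    """Years needed to try every id in range(first_id, last_id) at a steady rate."""
    id_count = last_id - first_id
    return id_count / requests_per_second / SECONDS_PER_YEAR

# Bounds from the loop in the script above; the request rate is an assumed figure
years = scan_years(11111151600000, 99999999999999, requests_per_second=1000)
print(f"about {years:.0f} years")
```

Even at that optimistic rate the answer is in the thousands of years, so making each check faster will not be enough on its own; the range of candidate ids would have to be narrowed first.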

吾爱破解 - 52pojie.cn's Archiver