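How can I quickly tell whether a page actually has content?
https://jhl.ke.seewo.com/live/plan/826510684487921664 has a video.
https://jhl.ke.seewo.com/live/plan/826514856624701440
also has a video.
https://jhl.ke.seewo.com/live/plan/826514856624701445
has no video and shows a 404.
How can I find all the addresses that have a video?
I've been at this for several days and it keeps failing. Could someone take a look? Is there a faster way?
Thanks!!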
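Here is a rough estimate of how big the brute-force search is. In the example URLs, every ID is "8265" followed by 14 digits, and the first script below loops over that whole 14-digit range. The rate of 1,000 requests per second is an assumption chosen only to show the order of magnitude. It is not a measured figure for this server.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def sweep_years(candidates, requests_per_second):
    """Years needed to send one request per candidate ID at a steady rate."""
    return candidates / requests_per_second / SECONDS_PER_YEAR


# The first script below iterates over range(1, 99999999999999)
candidates = len(range(1, 99_999_999_999_999))
print(f"{candidates:,} ids")
print(f"{sweep_years(candidates, 1_000):,.0f} years at an assumed 1,000 requests/s")
```

The script prints 99,999,999,999,998 IDs and about 3,171 years. At any realistic rate, adding processes will not make a full sweep practical. Checking a narrower range of IDs matters much more than adding workers.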
I'm using for i in range(9999999999999999):
import requests
import json
import base64
from multiprocessing import Pool

url2 = 'https://jhl.ke.seewo.com/live/plan/826510684487921664'  # actual page URL
url = 'https://jhl.ke.seewo.com/live/fetch?actionName=GET_PLAN_DETAIL&ts=1670989434716'  # URL used to check for a video

def checkurl(num):
    apiUrl = "/live/v1/plan/8265" + num + "/open/detail"
    # Proxy request description, serialized as compact JSON and base64-encoded into the ApiExtend header
    strs = json.dumps({
        "method": "GET",
        "apiUrl": apiUrl,
        "headers": {"userName": "", "userType": "", "userId": ""},
        "baseURL": "http://live.seewo.com/live-server",
    }, separators=(",", ":"))
    result = base64.b64encode(strs.encode('utf-8')).decode('ascii')
    headers = {
        'Accept': 'application/json, text/plain, */*',
        'Content-Type': 'application/json',
        'ApiExtend': result,
    }
    # timeout keeps a stalled request from blocking a worker forever
    response = requests.post(url=url, headers=headers, timeout=10)
    result = response.json()
    if result.get('success'):
        print('https://jhl.ke.seewo.com/live/plan/8265' + num)
        with open('url.txt', 'a+', encoding='utf-8') as f:
            f.write('https://jhl.ke.seewo.com/live/plan/8265' + num + "\n")

def main():
    # Create the pool, limited to 10 worker processes
    p = Pool(10)
    for i in range(1, 99999999999999):
        num = str(i).zfill(14)  # 14-digit suffix after the "8265" prefix
        p.apply_async(checkurl, args=(num,))
        print(i, end=" ")
    p.close()
    p.join()

if __name__ == '__main__':
    main()

response = requests.post(url=url, headers=headers)
# result = json.loads(response.text)
if response.status_code == 200:
    print('https://jhl.ke.seewo.com/live/plan/8265' + num)
    with open('url.txt', 'a+', encoding='utf-8') as f:
        f.write('https://jhl.ke.seewo.com/live/plan/8265' + num + "\n")

Check response.status_code: 200 means there is a video, 404 means there isn't.

Last edited by qeq66 on 2022-12-21 16:17
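http://live.seewo.com/live-server/live/v1/plan/826514856624701445/open/detail
Use this endpoint for the check: take the id out of the page URL and query it.

> qeq66 posted on 2022-12-21 16:10
> "https://jhl.ke.seewo.com/live/fetch?actionName=GET_PLAN_DETAIL&ts=1671 ...

Thanks, I can already generate this API request. What I need now is to find out which page addresses actually have content, and that means checking them in a loop :(
I'm using multithreading with requests to check them, and after a while it keeps failing :(
result=base64.b64encode(strs.encode('utf-8')).decode('ascii')
The API request is already generated by that line.

> ⊙⌒⊙ posted on 2022-12-21 16:15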
> Thanks, I can already generate this API request. What I need now is to find out which page addresses actually have content, and that means checking them in a loop :(
> I'm using multithreading, req ...

http://live.seewo.com/live-server/live/v1/plan/826514856624701445/open/detail
Query this endpoint.

> qeq66 posted on 2022-12-21 16:17
> http://live.seewo.com/live-server/live/v1/plan/826514856624701445/open/detai ...

I used this endpoint for the queries and it seems much faster. Why is that?
Exception in thread Thread-1 (_handle_workers):
Traceback (most recent call last):
  File "D:\Program Files\Python\Lib\threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "D:\Program Files\Python\Lib\threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Program Files\Python\Lib\multiprocessing\pool.py", line 524, in _handle_workers
    taskqueue.put(None)
MemoryError
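The traceback points at the pool's own bookkeeping thread, not at checkurl. A likely explanation is how apply_async behaves. It never blocks: it puts a task on the pool's internal queue and returns immediately. The for loop in main() therefore enqueues tasks far faster than the workers can finish HTTP requests. The queue, plus one pending result object per call, keeps growing until memory runs out.

Below is a minimal sketch of one way to bound this. It assumes the per-ID check is I/O-bound, so threads are enough. It uses multiprocessing.pool.ThreadPool, which has the same apply_async API as Pool, together with a threading.BoundedSemaphore that makes the loop wait whenever max_pending tasks are outstanding.
- The names run_bounded and max_pending are hypothetical.
- The error_callback also collects exceptions raised inside the worker function. Without it, apply_async keeps those exceptions to itself unless .get() is called.

```python
import threading
from multiprocessing.pool import ThreadPool


def run_bounded(func, items, workers=10, max_pending=1000):
    """Call func(item) for every item, never keeping more than max_pending tasks queued or running.

    Returns (hits, errors): hits are func's non-None return values, errors are
    exceptions raised inside func (apply_async keeps these unless .get() is called).
    """
    slots = threading.BoundedSemaphore(max_pending)
    hits, errors = [], []

    def on_done(value):
        # Callbacks run in the pool's result-handler thread
        if value is not None:
            hits.append(value)
        slots.release()

    def on_error(exc):
        errors.append(exc)
        slots.release()

    with ThreadPool(workers) as pool:
        for item in items:
            slots.acquire()  # waits here while max_pending tasks are still outstanding
            pool.apply_async(func, (item,), callback=on_done, error_callback=on_error)
        pool.close()
        pool.join()  # returns after every task and its callback have finished
    return hits, errors
```

To use this with the scripts in this thread, checkurl would return the page URL (or None) instead of writing url.txt itself. The submission loop would then become something like run_bounded(checkurl, (str(i).zfill(14) for i in range(...))). Because range and generators are lazy, only max_pending IDs are ever in flight.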
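It had only been running for a short while and it failed again :(

> ⊙⌒⊙ posted on 2022-12-21 16:23
> I used this endpoint for the queries and it seems much faster. Why is that?

That one is the API endpoint; the addresses you listed above are all page addresses, and loading a page address makes it request the API address in turn. As for your error, I'm not sure.

import requests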
from multiprocessing import Pool

def checkurl(num):
    # Query the plan's detail endpoint directly with GET
    apiUrl = "http://live.seewo.com/live-server/live/v1/plan/8265" + num + "/open/detail"
    headers = {'Accept': 'application/json, text/plain, */*'}
    response = requests.get(apiUrl, headers=headers, timeout=10)
    # 200: the plan (video) exists; 404: it does not
    if response.status_code == 200:
        print('https://jhl.ke.seewo.com/live/plan/8265' + num)
        with open('url.txt', 'a+', encoding='utf-8') as f:
            f.write('https://jhl.ke.seewo.com/live/plan/8265' + num + "\n")

def main():
    # Create the pool, limited to 30 worker processes
    p = Pool(30)
    for i in range(11111151600000, 99999999999999):
        num = str(i).zfill(14)
        p.apply_async(checkurl, args=(num,))
        # Every 100000 ids, append the last submitted id to url2.txt as a progress marker
        if i % 100000 == 0:
            with open('url2.txt', 'a+', encoding='utf-8') as f:
                f.write(str(i) + ",")
    p.close()
    p.join()

if __name__ == '__main__':
    main()
It shuts down by itself after running for a while. Where is the problem in the program?
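Here is a minimal sketch of a different structure for the scan. It rests on a few assumptions:
- The endpoint from qeq66's reply returns 200 for an existing plan and 404 otherwise, as described earlier in the thread.
- The work is network-bound, so threads are used instead of processes.
- The standard library's urllib replaces requests only so the sketch has no third-party dependency.

Work goes to the pool one batch at a time, so memory stays flat however large the range is. Only the calling thread appends to the output file, instead of 30 processes opening url.txt at the same time. The names fetch_status, check_plan and scan_range are hypothetical.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# URL patterns taken from the thread; a plan id is "8265" + 14 digits
API_URL = "http://live.seewo.com/live-server/live/v1/plan/{}/open/detail"
PAGE_URL = "https://jhl.ke.seewo.com/live/plan/{}"


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code of a GET request, including error codes like 404."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code


def check_plan(suffix, fetch=fetch_status):
    """Return the page URL if the plan exists (HTTP 200), else None."""
    plan_id = "8265" + str(suffix).zfill(14)
    if fetch(API_URL.format(plan_id)) == 200:
        return PAGE_URL.format(plan_id)
    return None


def scan_range(start, stop, fetch=fetch_status, workers=30, batch=1000, out_path=None):
    """Check suffixes in [start, stop) one batch at a time; only this thread writes out_path."""
    hits = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for lo in range(start, stop, batch):
            chunk = range(lo, min(lo + batch, stop))
            # map() submits only this chunk, so at most `batch` tasks exist at once;
            # an exception raised in a worker is re-raised here instead of vanishing
            found = [u for u in pool.map(lambda s: check_plan(s, fetch), chunk) if u]
            hits.extend(found)
            if out_path and found:
                with open(out_path, "a", encoding="utf-8") as f:
                    f.writelines(u + "\n" for u in found)
    return hits
```

For example, scan_range(11111151600000, 11111151700000, out_path="url.txt") would check 100,000 suffixes. A network error in any request (a timeout or refused connection) is re-raised from pool.map and stops the scan. A long run would probably catch those errors inside check_plan and log or retry the affected IDs.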