[Notes] Batch-checking whether websites are alive with Python

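纸条 posted on 2018-5-2 23:10

Last edited by 纸条 on 2019-9-20 13:49

I wrote the first version in May 2018, but it was slow. I revised it many times after that, and it was still slow.

On 2019-09-20 I changed the approach: first check whether the port is open, and only probe the web service if it is: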

Usage:
```text
python url.py C:\Python36\test\url.txt
# url.py is the script; the argument is the path to a text file of links (one per line)
```

```python
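import queue
import re
import socket
import sys
import threading
import time
from urllib.parse import urlparse

import requests

requests.packages.urllib3.disable_warnings()  # verify=False below would otherwise warn

write_lock = threading.Lock()  # serializes appends to survival.txt


class Waittask(threading.Thread):
    """Stage 1: check that the TCP port is open before any HTTP probing."""

    def run(self):
        while True:
            # Throttle: only feed the web-probe queue while it is short.
            if asset.qsize() < 100:
                for _ in range(500):
                    try:
                        raw = waittask.get_nowait()
                    except queue.Empty:
                        return  # no targets left, let this thread finish
                    try:
                        host = urlparse(raw)
                        if not host.hostname:
                            continue  # line is not a full URL (no scheme or host)
                        # Explicit port if given, else the scheme's default.
                        port = host.port or (443 if host.scheme == 'https' else 80)
                        # Plain TCP connect; stands in for telnetlib, which was
                        # removed from the standard library in Python 3.13.
                        socket.create_connection((host.hostname, port), timeout=3).close()
                        asset.put('{}://{}'.format(host.scheme, host.netloc))
                    except Exception:
                        pass  # port closed, DNS failure, or malformed URL
            time.sleep(1)


class Asset(threading.Thread):
    """Stage 2: send an HTTP request to each target whose port is open."""

    def run(self):
        while True:
            url = asset.get()  # blocks until a target is available
            try:
                response = requests.get(url, verify=False, allow_redirects=False, timeout=15)
                match = re.search(r'<title[^>]*>(.*?)</title>', response.text, re.I | re.S)
                title = match.group(1).strip() if match else 'failed to get title'
                print(response.status_code, response.url, title)
                with write_lock, open('survival.txt', 'a', encoding='gb18030') as cent:
                    cent.write('{}|{}|{}\n'.format(response.status_code, response.url, title))
            except Exception:
                pass  # no HTTP response: treat the target as dead
            finally:
                asset.task_done()


class Check(threading.Thread):
    """Print progress every 10 seconds."""

    def run(self):
        while True:
            time.sleep(10)
            print('Progress: {:.2%}, {} port checks remaining, {} web probes pending'.format(
                1 - waittask.qsize() / max(sums, 1), waittask.qsize(), asset.qsize()))


if __name__ == '__main__':
    file = sys.argv[1]  # path to the text file of links, one per line
    waittask = queue.Queue()
    asset = queue.Queue()

    with open(file) as content:
        for cent in content:
            if cent.strip():
                waittask.put(cent.strip())
    sums = waittask.qsize()

    checkers = [Waittask() for _ in range(100)]
    for t in checkers:
        t.start()

    # Daemon threads, so the program can exit once all work is done.
    for _ in range(100):
        Asset(daemon=True).start()
    Check(daemon=True).start()

    for t in checkers:
        t.join()   # all port checks finished
    asset.join()   # all queued web probes finished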
```
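
The port-check stage can also be tried on its own. The following is only a minimal sketch, not the script's own code: `split_target` and `port_is_open` are hypothetical helper names, the check is a plain TCP connect via `socket.create_connection`, and falling back to port 80 for http and 443 for https is an assumption about what a link without an explicit port should mean.

```python
import socket
from urllib.parse import urlparse


def split_target(url):
    """Return (hostname, port) for a URL, falling back to the scheme's default port."""
    parsed = urlparse(url)
    default_port = 443 if parsed.scheme == 'https' else 80
    return parsed.hostname, parsed.port or default_port


def port_is_open(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

For example, `port_is_open(*split_target(line))` could be called for each line of the link file before any HTTP request is sent.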

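纸条 posted on 2018-7-8 19:40

sniper86 posted on 2018-7-4 10:42
Batch-checking which websites are alive is very important in security support work. There are often more than a thousand domains, and this makes it easy to filter out the sites that cannot be reached. Thanks for shar ...

Note, though, that this only keeps targets that return status code 200. 403, 404 and other status codes are not recorded, so it is not suitable for web probing.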
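
To illustrate the difference, here is a small sketch under stated assumptions: the result list is made-up sample data, `None` is used as a hypothetical marker for "no HTTP response at all", and both function names are invented for this example. For a liveness check, any status code means a web server answered.

```python
# Hypothetical sample results: (url, status code), None meaning no response.
sample = [
    ('http://a.example', 200),
    ('http://b.example', 403),
    ('http://c.example', 404),
    ('http://d.example', None),
]


def keep_only_200(results):
    """Strict filter: keeps only targets that answered with 200."""
    return [(url, code) for url, code in results if code == 200]


def keep_any_response(results):
    """Liveness filter: keeps every target that answered, whatever the status code."""
    return [(url, code) for url, code in results if code is not None]


print(len(keep_only_200(sample)), len(keep_any_response(sample)))
```

With the strict filter, the 403 and 404 targets would be dropped even though a server is clearly running there.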

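qs1597qs posted on 2019-4-2 23:58

I understand the logic above for reading the file and running the checks, but I don't understand this part under if __name__ == '__main__'. Could the OP explain it?
t = threading.Thread(target=crawler)
t.start()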
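
On the two lines quoted above: `threading.Thread(target=crawler)` creates a thread object that will call `crawler` when it runs, and `t.start()` launches it in the background, so the main program carries on without waiting. Code under `if __name__ == '__main__'` runs only when the file is executed directly, not when it is imported. A minimal sketch follows; the body of `crawler` is a stand-in, since the original function is not shown in this thread, and the thread names are made up for illustration.

```python
import threading

results = []


def crawler():
    """Stand-in for the real work; it just records which thread ran it."""
    results.append(threading.current_thread().name)


threads = [threading.Thread(target=crawler, name='worker-{}'.format(i)) for i in range(3)]
for t in threads:
    t.start()  # returns immediately; crawler() runs in the new thread
for t in threads:
    t.join()   # wait here until that thread has finished

print(sorted(results))
```

Starting several threads this way lets slow network requests overlap instead of running one after another.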

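yanglei123 posted on 2018-5-2 23:42

Happy learning

qqqwww0078 posted on 2018-5-2 23:53

I love learning

LeiSir posted on 2018-5-3 07:45

Learning makes me happy, even though I don't understand it.

houzp posted on 2018-5-3 10:02

Thanks for sharing

lcg2014 posted on 2018-5-3 18:05

Wouldn't a single for command do the job?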

纸条 posted on 2018-5-4 09:15

lcg2014 posted on 2018-5-3 18:05
Wouldn't a single for command do the job?

:D Post the code so I can learn from it~

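sniper86 posted on 2018-7-4 10:42

Batch-checking which websites are alive is very important in security support work. There are often more than a thousand domains, and this makes it easy to filter out the sites that cannot be reached. Thanks for sharing :lol

小黑LLB posted on 2019-2-2 20:20

Thanks for sharing. I'll take the code and have a look, hehe. Showing my support.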

吾爱破解 - 52pojie.cn's Archiver