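I scraped the articles from Anquanke (安全客) and converted each one to PDF. Multithreading could be added to speed up the crawl, but I have not done that here.
The code and the scraped articles are below.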
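#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 script: it uses urllib2 and the print statement.
import os
import time
import urllib2

import pdfkit


def main():
    try:
        # Make sure the output directory exists before writing PDFs into it.
        if not os.path.isdir('./output'):
            os.makedirs('./output')

        # Build the candidate article URLs for IDs 0..1999.
        url = "http://bobao.360.cn/learning/detail/%s.html"
        urllist = [url % i for i in range(0, 2000)]

        count = 0
        for urlname in urllist:
            try:
                # Fetch the page first and skip IDs that return an empty body.
                response = urllib2.urlopen(urlname)
                result = response.read()
                if result.strip() == '':
                    continue
                pdfkit.from_url(urlname, './output/%s.pdf' % count)
                time.sleep(0.1)  # short pause between requests
            except Exception:
                # Skip pages that fail to download or convert.
                pass
            count = count + 1
    except Exception as e:
        print str(e)


if __name__ == '__main__':
    main()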
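The multithreading mentioned at the top could look roughly like the sketch below. It is not part of the original script and makes several assumptions: it is written for Python 3 (the script above is Python 2), the names `save_pdf` and `run_pool` are made up for illustration, the pool size of 8 is an arbitrary guess, and each PDF is named after its article ID instead of a running counter so that threads do not have to share one. It also assumes `./output` already exists.

```python
# Python 3 sketch of a threaded variant; names and pool size are assumptions.
from concurrent.futures import ThreadPoolExecutor, as_completed

URL_TEMPLATE = "http://bobao.360.cn/learning/detail/%s.html"  # URL pattern from the post


def save_pdf(article_id):
    """Convert one article page to ./output/<id>.pdf and return the path."""
    import pdfkit  # imported lazily so the pool logic can be tried without pdfkit
    path = "./output/%s.pdf" % article_id
    pdfkit.from_url(URL_TEMPLATE % article_id, path)
    return path


def run_pool(article_ids, worker=save_pdf, max_workers=8):
    """Run worker(id) for every id on a thread pool.

    Returns (results, failed): a dict mapping id -> worker result, and a
    sorted list of the ids whose worker raised an exception.
    """
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the article id it was submitted for.
        futures = {pool.submit(worker, i): i for i in article_ids}
        for future in as_completed(futures):
            article_id = futures[future]
            try:
                results[article_id] = future.result()
            except Exception:
                # Same policy as the original script: skip pages that fail.
                failed.append(article_id)
    return results, sorted(failed)
```

Calling `run_pool(range(0, 2000))` would cover the same ID range as the script above. Threads are probably enough here because pdfkit hands each conversion to the external wkhtmltopdf program, so each thread mostly waits. A smaller `max_workers` value is kinder to the server.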
Attachment:
Link: https://pan.baidu.com/s/1slhPGXv Password: hsef