Python3: scraping images from K站 (Konachan)

落寒枫 posted on 2019-1-31 16:06

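It's anime stuff, so please bear with me.
I've only just started learning, and I wrote a crawler that scrapes images from K站.
I feel like I defined a lot of functions and would have been better off not defining them at all. I'd appreciate pointers from the experts here.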

# -*- coding: utf-8 -*-
from urllib import request
from bs4 import BeautifulSoup
import os, re, time, chardet

def Write(content):
    # Append one line to a log file named after today's date, e.g. 20190131.txt
    now = time.strftime("%Y%m%d", time.localtime())
    with open(now + ".txt", "a+", encoding="utf-8") as file:
        file.write(content + '\n')

def GetHtml(url):
    # Download a page and decode it with the encoding chardet detects
    html = request.urlopen(url).read()
    rawdata = chardet.detect(html)
    return html.decode(rawdata['encoding'] or 'utf-8')

def GetImg(html):
    # Return every <li> whose class attribute matches "creator"
    soup = BeautifulSoup(html, 'lxml')
    img_html = soup.find_all('li', class_=re.compile('creator'))
    #img_html = soup.select('.magazine_wrap')
    return img_html

def GetImgHtml(img_html):
    # Download a detail page; BeautifulSoup accepts the raw bytes directly
    return request.urlopen(img_html).read()

def GetImgData(html_img):
    # Return the src of the element with id="image", or None if there is none
    soup = BeautifulSoup(html_img, 'lxml')
    img = soup.select_one('#image')
    return img.get('src') if img else None

page = num = 1
# Renamed from sum, which shadowed the built-in function
total = int(input("Enter the number of pages to download: "))
while page <= total:
    Target_url = 'http://konachan.net/post?page={}'.format(page)
    html = GetImg(GetHtml(Target_url))
    print("Downloading page", page)
    page += 1

    for i in html:
        # As in the original, the text of the <li> is used as the detail-page URL
        src = i.text
        Write(src)

        down = GetImgData(GetImgHtml(src))
        if down is None:  # no #image element on this page, so skip it
            continue
        request.urlretrieve(down, "{}.jpg".format(num))
        print("Downloading image", num)
        print(src)
        num += 1

print("Download complete")
os.system('pause')  # Windows only: keeps the console window open

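byxiaoxie posted on 2019-1-31 18:51

There's one problem: this can't download the original PNG. You'd have to scrape one more link for that. Still, you have my support; I'm learning Python too.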

林海山河 posted on 2019-1-31 19:06

Later on you could add exception handling and progress saving, and, if the site allows it, multithreading.
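
To make the suggestion about exception handling and progress saving concrete, here is a minimal sketch, not the original poster's code. The file name `progress.json` and the helpers `load_progress`, `save_progress` and `download_all` are hypothetical names chosen for illustration. The idea is simply to record each finished URL on disk, skip it on the next run, and report a failed download instead of crashing.

```python
import json
import os
from urllib import request

PROGRESS_FILE = "progress.json"  # hypothetical file name, not from the thread


def load_progress(path=PROGRESS_FILE):
    """Return the set of URLs already downloaded, or an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def save_progress(done, path=PROGRESS_FILE):
    """Persist the finished URLs so a later run can skip them."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)


def download_all(urls, fetch=request.urlretrieve, path=PROGRESS_FILE):
    """Download each URL once; failures are reported and skipped."""
    done = load_progress(path)
    for n, url in enumerate(urls, start=1):
        if url in done:
            continue  # finished in an earlier run
        try:
            fetch(url, "{}.jpg".format(n))
        except OSError as exc:  # URLError and socket timeouts subclass OSError
            print("skipped", url, exc)
            continue
        done.add(url)
        save_progress(done, path)  # save after every success
    return done
```

Saving after every successful download costs a small write each time, but it means an interrupted run loses at most the image that was in flight.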

吾爱支持 posted on 2019-1-31 20:32

Thanks for sharing the source code, OP. I learned something. Sending you my points and best wishes.

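落寒枫 posted on 2019-2-1 11:26

byxiaoxie posted on 2019-1-31 18:51
There's one problem: this can't download the original PNG. You'd have to scrape one more link for that. Still, you have my support; I'm learning Python too.

I originally planned to download the original PNG files, but after looking at some of the images I found that only some of them have one, so I gave up on that.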
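
Since only some posts have a PNG, one option is to prefer it when present and fall back otherwise. The sketch below is an illustration under stated assumptions, not a tested scraper for the site. It assumes the detail page exposes the PNG as a link with `id="png"` and a larger JPEG as a link with `id="highres"`; both ids are assumptions about the page markup and should be checked against the real HTML. Only the `id="image"` element comes from the script in this thread. It uses the standard library's `html.parser` instead of BeautifulSoup so that it has no third-party dependency, and `ImageLinkParser` and `best_image_url` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect candidate image URLs from a detail page, keyed by element id."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        elem_id = attrs.get("id")
        # "png" and "highres" are assumed ids; "image" is the one the thread uses
        if tag == "a" and elem_id in ("png", "highres") and attrs.get("href"):
            self.links[elem_id] = attrs["href"]
        elif tag == "img" and elem_id == "image" and attrs.get("src"):
            self.links["image"] = attrs["src"]


def best_image_url(page_html, base_url):
    """Return the best available image URL, or None if none was found."""
    parser = ImageLinkParser()
    parser.feed(page_html)
    for key in ("png", "highres", "image"):  # order of preference
        if key in parser.links:
            return urljoin(base_url, parser.links[key])
    return None
```

If this approach is used, the saved file's extension should follow the chosen URL rather than being hard-coded to `.jpg`.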

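落寒枫 posted on 2019-2-1 11:32

林海山河 posted on 2019-1-31 19:06
Later on you could add exception handling and progress saving, and, if the site allows it, multithreading.

I haven't learned multithreading yet, so for now, if you enter too many pages in one go, this little program hangs partway through.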
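
The thread does not establish why the program hangs. One plausible cause is that `urlopen` is called without a timeout, so a stalled connection blocks forever. The following is a minimal sketch under that assumption: `fetch_bytes` is a hypothetical helper, and the default values for `timeout`, `retries` and `delay` are arbitrary.

```python
import time
from urllib import request


def fetch_bytes(url, timeout=15, retries=3, delay=2, opener=request.urlopen):
    """Fetch a URL with a timeout, retrying a few times before giving up."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            with opener(url, timeout=timeout) as resp:
                return resp.read()
        except OSError as exc:  # URLError and socket timeouts subclass OSError
            last_error = exc
            if attempt < retries:
                time.sleep(delay)  # short pause before the next attempt
    raise last_error
```

Calling `fetch_bytes(url)` in place of `request.urlopen(url).read()` would make a stalled request fail after the timeout instead of blocking the whole run.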

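byxiaoxie posted on 2019-2-1 12:41

落寒枫 posted on 2019-2-1 11:32
I haven't learned multithreading yet, so for now, if you enter too many pages in one go, this little program hangs partway through.

I haven't figured out multithreading yet either. It's a bit difficult.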
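
For anyone who wants a starting point, the standard library's `concurrent.futures` hides most of the thread handling. This is a rough sketch rather than a drop-in replacement for the script above: `download_one` and `download_many` are hypothetical names, and `max_workers=4` is an arbitrary, deliberately small value, since the site may not welcome many parallel requests.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib import request


def download_one(url, filename):
    """Download a single file and return the name it was saved under."""
    request.urlretrieve(url, filename)
    return filename


def download_many(jobs, worker=download_one, max_workers=4):
    """jobs is a list of (url, filename) pairs; returns (saved, failed)."""
    saved, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so failures can be reported
        futures = {pool.submit(worker, url, name): url for url, name in jobs}
        for future in as_completed(futures):
            try:
                saved.append(future.result())
            except Exception as exc:  # one failed download must not stop the rest
                failed.append((futures[future], exc))
    return saved, failed
```

The main loop would first collect the `(url, filename)` pairs for a page and then pass the whole list to `download_many`, instead of downloading each image as soon as its URL is found.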

多幸运遇见baby posted on 2019-2-3 08:46

Thank you @Thanks!


吾爱破解 - 52pojie.cn's Archiver
