Python crawler: downloading the assets of 156 web mini-games from a website

三木猿 posted on 2020-9-18 16:27

Last edited by 三木猿 on 2020-9-18 23:19

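See for yourself which games are included ↓
A batch of web mini-games (for slacking off at work)
https://www.52pojie.cn/thread-1269936-1-1.html
The archive contains ads, so the moderators won't let me post it. Frustrating: I was about to level up, and they revoked my points as well.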

Version that downloads the mini-game assets:
import requests
from bs4 import BeautifulSoup


def get_Url(url):
    # Build the list of listing-page URLs for the category.
    str_list = []
    content = requests.get(url).content
    soup = BeautifulSoup(content, 'lxml')
    # The pager holds the current page and the page count, e.g. "1/8" (assumed format).
    find = soup.find('span', attrs={'class': 'current'})
    # split('/') returns a list, so take its last piece (the total) before calling int().
    total = int(find.text.split('/')[-1])
    for i in range(total):
        if i == 0:
            # The first page has no numeric suffix.
            str_list.append('https://www.mycodes.net/166/')
            continue
        str_list.append('https://www.mycodes.net/166/' + str(i + 1) + '.htm')
    return str_list


def get_document(url):
    soup = BeautifulSoup(requests.get(url).content, 'lxml')
    # Links to the detail pages are picked out by their inline style.
    find_all = soup.find_all('a', attrs={'style': 'color:#006BCD;font-size:14px;'})
    a = ''
    for value in find_all:
        # Skip a link that repeats the previous href.
        if a == str(value['href']):
            continue
        a = value['href']
        document = BeautifulSoup(requests.get(value['href']).content, 'lxml')
        text = document.find('td', attrs={'class': 'a0'}).text
        print(text + ":")
        td_s = document.find_all('td', attrs={'class': 'b4'})
        for td in td_s:
            find = td.find('a')
            if find is not None:
                href_ = 'https://www.mycodes.net' + find['href']
                down = requests.get(href_)
                # The target directory d:/SanMu/ must already exist.
                with open('d:/SanMu/' + text + ".zip", "wb") as code:
                    code.write(down.content)
                # Only the first download link on the page is used.
                break


if __name__ == '__main__':
    url_list = get_Url('https://www.mycodes.net/166/')
    for url in url_list:
        get_document(url)
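
The download version uses the page title directly as the file name, so a title containing a character such as / or ? would make open() fail on Windows. A minimal sketch of one way to guard against that is below; the helper name safe_filename and the fallback value are hypothetical and not part of the original post.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, fallback="untitled"):
    """Return `title` with characters that are illegal in Windows file names replaced by '_'."""
    # Trim surrounding whitespace first so stray newlines from the HTML do not become '_'.
    name = _ILLEGAL.sub("_", title.strip())
    # Windows also rejects names that end in a space or a dot.
    name = name.strip(" .")
    return name or fallback
```

In get_document, the open call would then read open('d:/SanMu/' + safe_filename(text) + '.zip', 'wb').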

Version that prints the online play URLs:

The finished result is above.
The code follows.
import requests
from bs4 import BeautifulSoup


def get_Url(url):
    # Build the list of listing-page URLs for the category.
    str_list = []
    content = requests.get(url).content
    soup = BeautifulSoup(content, 'lxml')
    # The pager holds the current page and the page count, e.g. "1/8" (assumed format).
    find = soup.find('span', attrs={'class': 'current'})
    # split('/') returns a list, so take its last piece (the total) before calling int().
    total = int(find.text.split('/')[-1])
    for i in range(total):
        if i == 0:
            # The first page has no numeric suffix.
            str_list.append('https://www.mycodes.net/166/')
            continue
        str_list.append('https://www.mycodes.net/166/' + str(i + 1) + '.htm')
    return str_list


def get_document(url):
    soup = BeautifulSoup(requests.get(url).content, 'lxml')
    # Links to the detail pages are picked out by their inline style.
    find_all = soup.find_all('a', attrs={'style': 'color:#006BCD;font-size:14px;'})
    a = ''
    for value in find_all:
        # Skip a link that repeats the previous href.
        if a == str(value['href']):
            continue
        a = value['href']
        document = BeautifulSoup(requests.get(value['href']).content, 'lxml')
        text = document.find('td', attrs={'class': 'a0'}).text
        print(text + ":")
        # In this version the cells of class b1 are scanned and every link found is printed.
        td_s = document.find_all('td', attrs={'class': 'b1'})
        for td in td_s:
            find = td.find('a')
            if find is not None:
                print(find['href'])


if __name__ == '__main__':
    url_list = get_Url('https://www.mycodes.net/166/')
    for url in url_list:
        get_document(url)
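
Both versions share get_Url, which mixes the network request with the page-numbering logic. As a rough sketch, that logic can be pulled out into two pure functions that are easy to check without touching the site. The names parse_total_pages and build_page_urls are hypothetical, and the pager text is assumed to look like "1/8" (current page, slash, page count).

```python
def parse_total_pages(pager_text):
    """Extract the page count from pager text such as "1/8" (assumed format)."""
    return int(pager_text.strip().split('/')[-1])


def build_page_urls(base_url, total_pages):
    """Page 1 is the bare category URL; page n (n >= 2) is '<base_url><n>.htm'."""
    return [
        base_url if n == 1 else f"{base_url}{n}.htm"
        for n in range(1, total_pages + 1)
    ]
```

With these, get_Url would reduce to fetching the page, passing find.text to parse_total_pages, and returning build_page_urls('https://www.mycodes.net/166/', total).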

ghoob321 posted on 2020-10-27 19:27

Leaving my mark here before this thread gets popular.

子睿阿 posted on 2020-11-3 09:56

Sorry to bother you. Could one of the experts here crawl the assets and source code of the 4399 mini-games?
http://www.4399.com/flash/10916.htm#search3
Many, many thanks!

zj1d posted on 2021-9-6 09:17

down = requests.get(url=href_, verify=False)
If the code fails with an SSL error, change line 42 (the down = requests.get(href_) call) to the line above.
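
Passing verify=False makes requests skip TLS certificate validation, and urllib3 then emits an InsecureRequestWarning on each call. A minimal sketch of a download helper that keeps verification on by default and only relaxes it on request is below. The name download, the 30-second timeout, and the injectable get parameter (there so the helper can be tried without network access) are assumptions, not part of the thread.

```python
import warnings


def download(url, path, verify=True, get=None):
    """Save the body of `url` to `path`; verify=False skips TLS certificate checks."""
    if get is None:
        # Imported lazily so the helper can be exercised with a stand-in `get`.
        import requests
        get = requests.get
    with warnings.catch_warnings():
        if not verify:
            # Silence all warnings (including InsecureRequestWarning) for this call only.
            warnings.simplefilter("ignore")
        response = get(url, verify=verify, timeout=30)
    # Fail loudly on 4xx/5xx instead of writing an error page to disk.
    response.raise_for_status()
    with open(path, "wb") as fh:
        fh.write(response.content)
    return len(response.content)
```

In the download script this would stand in for the requests.get and open pair, for example download(href_, 'd:/SanMu/' + text + '.zip', verify=False).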

xiaonianxxx posted on 2022-9-7 17:14

Thanks for sharing. I have recently started teaching myself this.


吾爱破解 - 52pojie.cn's Archiver
