吾爱破解 - 52pojie.cn

[Help] There is a problem in my code; could an expert please take a look?

lixiaoqiang posted on 2019-12-2 21:51
[Python]
import requests
from bs4 import BeautifulSoup
import re
url = "https://www.hd-mv.com/mf?o=download"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
           'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
           'Accept-Encoding': 'gzip',
           'Cookie':'_ga=GA1.2.834765233.1575183233; UM_distinctid=16ec03cb04f1db-0a819d105a32cd-b363e65-1fa400-16ec03cb050518; _gid=GA1.2.1947770759.1575286137; PHPSESSID=8porle6ho9st8cc9bfdaukdbm4; CNZZDATA1261301097=99146113-1575183231-https%253A%252F%252Fwww.baidu.com%252F%7C1575286831; wordpress_test_cookie=WP+Cookie+check; wordpress_logged_in_3e4847f2d3806b54c86ce7160083d0b0=lxq1006025203%7C1576500258%7CKFwwEpIYVQSMgTxnBEZtQQloJ4jAOeN6Km4u2Ys3StK%7Ceb68a54e1b1bf1947499ce0bffc6341a1dcedb562c93604191842f37580e739b',
            "Referer": "https://www.hd-mv.com/mf?o=download"
            }
# Fetch the content of the page at the given URL
def get(url):
    a=requests.get(url,headers=headers)
    html=a.text
    return html


# Get the list of free MV page URLs
soup = BeautifulSoup(get(url),'lxml')
link_div = soup.find_all('div',class_="img")
links =[div.a.get("href") for div in link_div]
soup2 = BeautifulSoup(get(links[1]),'lxml')
mv_url = soup2.find_all('div',class_="erphpdown-box")
mv = [divs.a.get('href') for divs in mv_url]
soup3 = BeautifulSoup(get(mv[0]),'lxml')
down_url = soup3.find_all('div',class_="erphpdown-msg")
print(down_url[0])





After running it, I get this output:
[Python]
<div class="erphpdown-msg">
<div class="title"><span>资源名称</span></div>
<p><a  target="_blank">G.E.M.邓紫棋 – 句号 官方完整版MV 
1080P</a></p>
<div class="title"><span>下载地址</span></div><p>文件1地址:<a href="download.php?postid=108930&amp;key=1" target="_blank">点击下载</a></p><div class="title"><span>隐藏信息</span></div><div class="hidden-content" style="border:2px dashed #ff5f33;padding:15px;">http://16629707.d.yyupload.com/down/16629707/官方MV/G.E.M.鄧紫棋【句號 Full Stop】Official Music Video.mp4</div> </div>  



All I want is:
[Python]
"download.php?postid=108930&amp;key=1"


How should I do this?
It would be best if the experts could also explain the principle behind it!


坏人。丶 posted on 2019-12-2 23:13
[Python]
for i in down_url[0].find_all('a'):
    print(i['href'])


This grabs the href of the a tags inside the block. I can't figure out how to grab only the second one; I'm a newbie.
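On the principle: in the printed block, the first a tag (the resource name) has no href attribute, and only the second one (the download link) does. Indexing a tag with `i['href']` raises a KeyError when the attribute is missing, so one way to get just the download link is to ask BeautifulSoup only for a tags that carry an href. The following is a minimal sketch under some assumptions: the HTML is a trimmed, simplified copy of the output above, `html.parser` is used instead of `lxml` so that no extra parser is needed, and `page_url` is a made-up address that only shows how the relative link could be turned into a full URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Trimmed, simplified copy of the erphpdown-msg block from the original post
html = '''<div class="erphpdown-msg">
<p><a target="_blank">G.E.M. MV 1080P</a></p>
<p>File 1: <a href="download.php?postid=108930&amp;key=1" target="_blank">Download</a></p>
</div>'''

block = BeautifulSoup(html, "html.parser").find("div", class_="erphpdown-msg")

# href=True keeps only the <a> tags that actually have an href attribute,
# so the first <a> (which has none) is skipped
hrefs = [a["href"] for a in block.find_all("a", href=True)]
print(hrefs)

# Hypothetical page URL (an assumption, not the real site address):
# a relative href is resolved against the URL of the page it came from
page_url = "https://example.com/plugin/page.php?id=1"
full_url = urljoin(page_url, hrefs[0])
print(full_url)
```

Note that the attribute value comes back with a plain `&`; the `&amp;` only appears when the whole tag is printed as HTML.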
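zhaoziqi1995 posted on 2019-12-3 14:17
I'm a complete beginner too. I looked at your code and changed it a bit, and it can now download the videos. Take a look; I hope it helps.
[Python]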
# -*- coding: utf-8 -*-
# Python 3 strings are Unicode by default, so the Python 2 only
# reload(sys) / sys.setdefaultencoding('utf-8') hack is not needed here.
import requests
from bs4 import BeautifulSoup
import re
url = "https://www.hd-mv.com/mf/page/%s?o=download"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
           'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
           'Accept-Encoding': 'gzip',
           'Cookie':'_ga=GA1.2.834765233.1575183233; UM_distinctid=16ec03cb04f1db-0a819d105a32cd-b363e65-1fa400-16ec03cb050518; _gid=GA1.2.1947770759.1575286137; PHPSESSID=8porle6ho9st8cc9bfdaukdbm4; CNZZDATA1261301097=99146113-1575183231-https%253A%252F%252Fwww.baidu.com%252F%7C1575286831; wordpress_test_cookie=WP+Cookie+check; wordpress_logged_in_3e4847f2d3806b54c86ce7160083d0b0=lxq1006025203%7C1576500258%7CKFwwEpIYVQSMgTxnBEZtQQloJ4jAOeN6Km4u2Ys3StK%7Ceb68a54e1b1bf1947499ce0bffc6341a1dcedb562c93604191842f37580e739b',
            "Referer": "https://www.hd-mv.com/mf?o=download"
            }
# Fetch the content of the page at the given URL
def get(url):
    a=requests.get(url,headers=headers)
    html=a.text
    return html

def downMp4(url, name):
    r = requests.get(url, stream=True)	 
    with open('mv\\' + name + '.mp4', "wb") as mp4:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            if chunk:
                mp4.write(chunk)

for page in range(1, 3):
    # Get the list of free MV page URLs on this listing page
    soup = BeautifulSoup(get(url % page), 'lxml')
    link_div = soup.find_all('div', class_="img")
    links = [div.a.get("href") for div in link_div]
    print(links)
    for link in links:
        soup2 = BeautifulSoup(get(link), 'lxml')
        mv_url = soup2.find_all('div', class_="erphpdown-box")
        mv = [divs.a.get('href') for divs in mv_url]
        # Skip posts without a download box and VIP-only posts
        if not mv or mv[0] == "https://www.hd-mv.com/user?action=vip":
            continue
        soup3 = BeautifulSoup(get(mv[0]), 'lxml')
        down_url = soup3.find_all('div', class_="erphpdown-msg")
        if not down_url:
            continue
        # The fourth nested div is the hidden-content block with the file URL
        need = down_url[0].find_all('div')
        if len(need) > 3:
            # Use a new name: reassigning `url` would overwrite the page
            # template above and break `url % page` on the next page
            mp4_url = need[3].text
            name = down_url[0].a.text
            print("Download started")
            downMp4(mp4_url, name)
            print("Download finished")
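One thing that may trip up the download step: the title taken from the a tag is used directly as the file name, and in the output shown earlier that title contains a line break, which Windows does not allow in file names. The sketch below is one possible way to clean the title first, using only the standard library. The names `safe_filename` and `target_path` are made up for this illustration and are not part of the code above.

```python
import re
from pathlib import Path

# Characters Windows rejects in file names, plus control characters
# such as the newline found in the scraped title
_INVALID = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title, default="untitled"):
    """Return a version of `title` that is usable as a file name."""
    cleaned = _INVALID.sub(" ", title)
    # Collapse runs of whitespace and trim spaces/dots at both ends
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return cleaned or default


def target_path(title, directory="mv", suffix=".mp4"):
    """Build the output path, creating the directory if it is missing."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / (safe_filename(title) + suffix)


print(safe_filename("G.E.M. MV \n1080P"))
```

With something like this, the `open(...)` call in `downMp4` could take `target_path(name)` instead of building the path by string concatenation, which would also avoid an error when the `mv` folder does not exist yet.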




 

