baipiao520 posted on 2024-3-22 23:33

Extracting Lanzou Cloud Direct Links, Part 2 (with password) (Python source attached)

Last edited by baipiao520 on 2024-3-23 09:47

# Recap
Part 1: [Extracting Lanzou Cloud Direct Links (Python source attached)](https://www.52pojie.cn/thread-1901884-1-1.html)
Last time we analyzed a single-file share link with no access password.
That post mainly used the re library (regular expressions) to pull parameters out of the page. Some readers suggested bs4 for this, but bs4 can't look inside JavaScript, so this post sticks with regex.
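To illustrate the point: bs4 can locate a `<script>` tag, but the JavaScript inside it is just an opaque text node to an HTML parser, whereas a regex pulls the value straight out. A tiny sketch on a fabricated page fragment (the variable name mirrors the real page; the HTML itself is made up for illustration):

```python
import re

# Fabricated fragment: the parameter lives inside inline JavaScript,
# which an HTML parser like bs4 only sees as raw text
html = "<script>var skdklds = 'abc123';</script>"

# A regex extracts the value from the script body directly
match = re.search(r"var\s+skdklds\s*=\s*'([^']*)';", html)
print(match.group(1))  # -> abc123
```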

# Preparation
A browser
A Python environment

# Analysis
With last time's experience, we go straight to a password-protected share link and inspect it in the browser.
Before entering the password: (screenshot)

After entering the password: (screenshot)

And the network panel: (screenshot)

This time there are no nested, matryoshka-style requests; the page gets there in a single step.
So let's fire off the request directly.
```
import requests
url = "https://wwt.lanzouu.com/iW5jF1s99k6j"
password = 6666
headers={
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
response = requests.get(url, headers=headers)
print(response.text)
```
Looking at the page we got back, it's very similar to last time's. (screenshot)

This time, though, the ajax call sits inside a down_p() function,
which actually saves several steps compared to last time.
Let's extract the parameters right away!
```
import re

url_pattern = re.compile(r"url\s*:\s*'(/ajaxm\.php\?file=\d+)'")
url_match = url_pattern.search(response.text).group(1)
skdklds_pattern = re.compile(r"var\s+skdklds\s*=\s*'([^']*)';")
skdklds_match = skdklds_pattern.search(response.text).group(1)
print(url_match, skdklds_match)
```
Since we only ever need the Match object's group(1), this time I call group(1) right where the variable is defined, which also makes later use cleaner. Optimized versions of this post's and last post's code are given at the end.
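One caveat with chaining `.search(...).group(1)` directly: if Lanzou changes the page and the pattern stops matching, `search()` returns `None` and the chained `group(1)` raises `AttributeError`. A minimal guard could look like this (a hypothetical helper, not part of the tutorial's code):

```python
import re

def extract(pattern, text):
    """Return group(1) of the first match, or None when the pattern is absent."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

page = "url : '/ajaxm.php?file=123456'"
print(extract(r"url\s*:\s*'(/ajaxm\.php\?file=\d+)'", page))  # -> /ajaxm.php?file=123456
print(extract(r"var\s+skdklds\s*=\s*'([^']*)';", page))       # -> None
```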
Next, simulate the POST request.
```
data = {
    'action': 'downprocess',
    'sign': skdklds_match,
    'p': password,
}
headers = {
    "Referer": url,
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
}
# re_domain() (defined in the full program below) extracts the bare host from the share URL
response2 = requests.post(f"https://{re_domain(url)}{url_match}", headers=headers, data=data)
print(response2.text)
```
password can be a str or an int: when the data dict is form-encoded, both end up as strings anyway, so it comes down to personal preference.
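That claim is easy to verify offline: for a plain dict, requests form-encodes `data` essentially the way `urllib.parse.urlencode` does, and an int and its string form serialize identically:

```python
from urllib.parse import urlencode

# An int value and its str form produce the same form-encoded body
print(urlencode({'p': 6666}))    # -> p=6666
print(urlencode({'p': '6666'}))  # -> p=6666
```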
Remember last time's lesson and don't forget the Referer header.
From here on it's exactly the same as before.
```
import json
data = json.loads(response2.text)
dom = data['dom']
url = data['url']
full_url = dom + "/file/" + url
headers = {
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"accept-language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
"sec-ch-ua": "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Microsoft Edge\";v=\"122\"",
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "\"Windows\"",
"sec-fetch-dest": "document",
"sec-fetch-mode": "navigate",
"sec-fetch-site": "none",
"sec-fetch-user": "?1",
"upgrade-insecure-requests": "1",
"cookie": "down_ip=1"
}
response3 = requests.get(full_url, headers=headers, allow_redirects=False)
print(response3.headers['Location'])
```
## Complete Program (with password)
```
import requests
import re
import json
def re_domain(url):
    pattern_domain = r"https?://([^/]+)"
    match = re.search(pattern_domain, url)
    if match:
      domain = match.group(1)
      return domain
    else:
      return None
url = "https://wwt.lanzouu.com/iW5jF1s99k6j"
password = "6666"
headers={
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
response = requests.get(url, headers=headers)
url_pattern = re.compile(r"url\s*:\s*'(/ajaxm\.php\?file=\d+)'")
url_match = url_pattern.search(response.text).group(1)
skdklds_pattern = re.compile(r"var\s+skdklds\s*=\s*'([^']*)';")
skdklds_match = skdklds_pattern.search(response.text).group(1)
print(url_match, skdklds_match)
data = {
    'action': 'downprocess',
    'sign': skdklds_match,
    'p': password,
}
headers = {
    "Referer": url,
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
}
response2 = requests.post(f"https://{re_domain(url)}{url_match}", headers=headers, data=data)
data = json.loads(response2.text)
dom = data['dom']
url = data['url']
full_url = dom + "/file/" + url
headers = {
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"accept-language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
"sec-ch-ua": "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Microsoft Edge\";v=\"122\"",
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "\"Windows\"",
"sec-fetch-dest": "document",
"sec-fetch-mode": "navigate",
"sec-fetch-site": "none",
"sec-fetch-user": "?1",
"upgrade-insecure-requests": "1",
"cookie": "down_ip=1"
}
response3 = requests.get(full_url, headers=headers, allow_redirects=False)
print(response3.headers['Location'])
```
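As an aside, the hand-rolled `re_domain` helper can be replaced by the standard library's URL parser; a sketch of an equivalent (just an alternative, not what the tutorial uses):

```python
from urllib.parse import urlparse

def re_domain(url):
    # netloc is the "host[:port]" part, i.e. what the regex version captures
    netloc = urlparse(url).netloc
    return netloc or None

print(re_domain("https://wwt.lanzouu.com/iW5jF1s99k6j"))  # -> wwt.lanzouu.com
```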
# How to Tell Whether a Password Is Required
The two pages actually differ quite a bit, and there are several ways to tell them apart:
1. The password-protected page has `<title>文件</title>`; the open one has `<title>文件名 - 蓝奏云</title>` (the actual file name followed by the site name).
In code:
```
if "<title>文件</title>" in response.text:
    print("password required")
else:
    print("no password")
```
2. The password-protected page contains many `<style>` blocks; the open one has none.
In code:
```
if "<style>" in response.text:
    print("password required")
else:
    print("no password")
```
3. Many of the later functions likewise exist only on the password-protected page; I won't demonstrate them all here.
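Check 1 is the cheapest, so it can be wrapped in a small reusable predicate (the sample titles below are fabricated for illustration):

```python
def has_password(html: str) -> bool:
    # A password-protected share page carries the bare title "文件";
    # an open one embeds the actual file name plus the site name.
    return "<title>文件</title>" in html

print(has_password("<head><title>文件</title></head>"))               # -> True
print(has_password("<head><title>demo.zip - 蓝奏云</title></head>"))  # -> False
```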

# Complete Program
```
import requests
import re
import json
def re_domain(url):
    pattern_domain = r"https?://([^/]+)"
    match = re.search(pattern_domain, url)
    if match:
      domain = match.group(1)
      return domain
    else:
      return None
def getwithp(url, password):
    headers={
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
    response = requests.get(url, headers=headers)
    url_pattern = re.compile(r"url\s*:\s*'(/ajaxm\.php\?file=\d+)'")
    url_match = url_pattern.search(response.text).group(1)
    skdklds_pattern = re.compile(r"var\s+skdklds\s*=\s*'([^']*)';")
    skdklds_match = skdklds_pattern.search(response.text).group(1)
    data = {
      'action': 'downprocess',
      'sign': skdklds_match,
      'p': password,
    }
    headers = {
      "Referer": url,
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
    response2 = requests.post(f"https://{domain}{url_match}", headers=headers, data=data)
    data = json.loads(response2.text)
    full_url = data['dom'] + "/file/" + data['url']
    headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "sec-ch-ua": "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Microsoft Edge\";v=\"122\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "none",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "cookie": "down_ip=1"
    }
    response3 = requests.get(full_url, headers=headers, allow_redirects=False)
    return response3.headers['Location']
def getwithoutp(url):
    headers={
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
    response = requests.get(url, headers=headers)
    iframe_pattern = re.compile(r'<iframe\s+class="ifr2"\s+name="\d+"\s+src="([^"]+)"\s+frameborder="0"\s+scrolling="no"></iframe>')
    matches = iframe_pattern.search(response.text).group(1)  # search().group(1) yields the src string; findall() would return a list
    response2 = requests.get(f"https://{domain}{matches}", headers=headers)
    pattern = r"'sign'\s*:\s*'([^']+)'"
    sign = re.search(pattern, response2.text).group(1)
    pattern2 = r"url\s*:\s*'([^']+)'"
    url2 = re.search(pattern2, response2.text).group(1)
    data = {
      'action': 'downprocess',
      'signs': '?ctdf',
      'sign': sign,
      'websign': '',
      'websignkey': 'bL27',
      'ves': 1
    }
    headers = {
      "Referer": f"https://{domain}{matches}",  # Referer should be the full iframe URL
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
    response3 = requests.post(f"https://{domain}{url2}", headers=headers, data=data)
    data = json.loads(response3.text)
    full_url = data['dom'] + "/file/" + data['url']
    headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "sec-ch-ua": "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Microsoft Edge\";v=\"122\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "none",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "cookie": "down_ip=1"
    }
    response4 = requests.get(full_url, headers=headers, allow_redirects=False)
    return response4.headers['Location']
url = "https://wwt.lanzouu.com/iW5jF1s99k6j"
password = "6666"
domain = re_domain(url)
headers={
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
response = requests.get(url, headers=headers)
if "<title>文件</title>" in response.text:
    print("password required")
    result = getwithp(url, password)
else:
    print("no password")
    result = getwithoutp(url)
print(result)
```
# Closing
This tutorial only demonstrates an approach; the page can change at any time, so none of this is guaranteed to keep working.
Multi-file (folder) shares will be covered in the next installment.

wzvideni posted on 2024-3-24 09:33

Hi, following your tutorial I tried a password-protected folder-style Lanzou link on my own. I can already request the JSON data behind the post-password page, but requesting an individual file comes back empty and I don't know why. On the website, after entering the folder password once, clicking an individual file doesn't ask for a per-file password again; not sure whether that's related, but adding a Referer header makes no difference either.

Could you take a look when you have time?

The code:
import json
import re

import requests


def re_domain(url):
    pattern_domain = r"https?://([^/]+)"
    match = re.search(pattern_domain, url)
    if match:
      domain = match.group()
      return domain
    else:
      return None


url = "https://wwur.lanzout.com/b01rs66mb"
password = "xfgc"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
}
response = requests.get(url, headers=headers)
url_match = re.search(r"url\s*:\s*'(/filemoreajax\.php\?file=\d+)'", response.text).group(1)
file_match = re.search(r"\d+", url_match).group()
t_match = re.search(r"var\s+ib\w+\s*=\s*'([^']*)';", response.text).group(1)
k_match = re.search(r"var\s+_h\w+\s*=\s*'([^']*)';", response.text).group(1)

print(url_match)
print(file_match)
print(t_match)
print(k_match)
# print(response.text)
data = {
    'lx': 2,
    'fid': file_match,
    'uid': '1674564',
    'pg': 1,
    'rep': '0',
    't': t_match,
    'k': k_match,
    'up': 1,
    'ls': 1,
    'pwd': password
}
headers = {
    "Referer": url,
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
}

print(f"{re_domain(url)}{url_match}")

response2 = requests.post(f"{re_domain(url)}{url_match}", headers=headers, data=data)
# print(response2.text)
data = json.loads(response2.text)
# print(data)
text_list = data['text']

headers = {
    "Referer": url,
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "sec-ch-ua": "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Microsoft Edge\";v=\"122\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "none",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "cookie": "down_ip=1"
}

for text in text_list:
    print(text['name_all'])
    file_url = f"{re_domain(url)}/{text['id']}"
    print(file_url)
    response3 = requests.get(file_url, headers=headers, allow_redirects=False)
    print(response3)
    print(response3.text)
    # print(response3.headers['Location'])

    break

baipiao520 posted on 2024-3-24 10:36

wzvideni posted on 2024-3-24 09:33
Hi, following your tutorial I tried a password-protected folder-style Lanzou link ...

The file_url you end up with is exactly the no-password case from my first post, so you can just call my function on it:
import json
import re
import requests

def re_domain(url):
    pattern_domain = r"https?://([^/]+)"
    match = re.search(pattern_domain, url)
    if match:
      domain = match.group()
      return domain
    else:
      return None

def getwithoutp(url):
    headers={
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
    response = requests.get(url, headers=headers)
    iframe_pattern = re.compile(r'<iframe\s+class="ifr2"\s+name="\d+"\s+src="([^"]+)"\s+frameborder="0"\s+scrolling="no"></iframe>')
    matches = iframe_pattern.search(response.text).group(1)  # take the src string of the first matching iframe; findall() would return a list
    response2 = requests.get(f"{domain}{matches}", headers=headers)
    pattern = r"'sign'\s*:\s*'([^']+)'"
    sign = re.search(pattern, response2.text).group(1)
    pattern2 = r"url\s*:\s*'([^']+)'"
    url2 = re.search(pattern2, response2.text).group(1)
    data = {
      'action': 'downprocess',
      'signs': '?ctdf',
      'sign': sign,
      'websign': '2',
      'websignkey': 'xLG2',
      'ves': 1
    }
    headers = {
      "Referer": f"{domain}{matches}",  # domain already includes the scheme here, since this re_domain uses group()
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
    }
    response3 = requests.post(f"{domain}{url2}", headers=headers, data=data)
    data = json.loads(response3.text)
    full_url = str(data['dom']) + "/file/" + str(data['url'])
    headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "sec-ch-ua": "\"Chromium\";v=\"122\", \"Not(A:Brand\";v=\"24\", \"Microsoft Edge\";v=\"122\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "none",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "cookie": "down_ip=1"
    }
    response4 = requests.get(full_url, headers=headers, allow_redirects=False)
    return response4.headers['Location']
url = "https://wwur.lanzout.com/b01rs66mb"
password = "xfgc"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0"
}
response = requests.get(url, headers=headers)
url_match = re.search(r"url\s*:\s*'(/filemoreajax\.php\?file=\d+)'", response.text).group(1)
file_match = re.search(r"\d+", url_match).group()
t_match = re.search(r"var\s+ib\w+\s*=\s*'([^']*)';", response.text).group(1)
k_match = re.search(r"var\s+_h\w+\s*=\s*'([^']*)';", response.text).group(1)
domain = re_domain(url)
print(url_match)
print(file_match)
print(t_match)
print(k_match)
# print(response.text)
data = {
    'lx': 2,
    'fid': file_match,
    'uid': '1674564',
    'pg': 1,
    'rep': '0',
    't': t_match,
    'k': k_match,
    'up': 1,
    'ls': 1,
    'pwd': password
}
print(f"{domain}{url_match}")
response2 = requests.post(f"{domain}{url_match}", headers=headers, data=data)
# print(response2.text)
data = json.loads(response2.text)
# print(data)
text_list = data['text']
for text in text_list:
    print(text['name_all'])
    print(text)
    file_url = f"{domain}/{text['id']}"
    print(file_url)
    print(getwithoutp(file_url))
    break

m96118 posted on 2024-3-23 07:12

Very well explained, thanks for sharing.

sai609 posted on 2024-3-23 07:29

1. Lanzou Cloud: the browser's built-in downloader already grabs these instantly.
2. 123pan, Tianyi, Baidu, Quark, Aliyun drives: once a direct link is extracted, is there any way to download without registering or logging in?
PS: I don't want to use my own account, afraid of a ban.

tsanye posted on 2024-3-23 07:30

Thanks 🙏 for sharing, learning.

jm1jm1 posted on 2024-3-23 07:45

Very detailed explanation, thanks for sharing; still digesting it.

shallies posted on 2024-3-23 07:56

Learned something, thanks for the technical share.

saccsf posted on 2024-3-23 08:06

BBA119 posted on 2024-3-23 08:06

Could you explain what this is useful for? The walkthrough is detailed but I can't follow it. Thanks for sharing.

WJayden posted on 2024-3-23 08:31

Learned something; quite detailed.

anchovy126 posted on 2024-3-23 08:31

Thanks for sharing, worth studying.