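A question about Python web scraping.
I want to use Python to scrape icons from a website. The site URL is https://flaticons.net/customize.php?dir=Application&icon=View-Incident.png
I can reach the second-level page, but I am stuck on the third-level page. The spot marked with the red box in the screenshot below is the download page, yet its address does not appear in the page source (clicking it manually does lead to the third-level page). Is the address obfuscated by JS? Is there a way around it? Any pointers would be appreciated.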
https://flaticons.net/icon.php?slug_category=application&slug_icon=view-incident
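Is it this icon? If you can't analyze the JS, analyze the requests the page actually sends instead, i.e. capture them with the F12 developer tools.
This post was last edited by pzx521521 on 2021-5-26 17:40
There is no JS encryption. The HTML shows it is a form; just follow the request:
curl "https://flaticons.net/customize.php?dir=Application&icon=View-Incident.png" ^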
-H "authority: flaticons.net" ^
-H "pragma: no-cache" ^
-H "cache-control: no-cache" ^
-H "sec-ch-ua: ^\^" Not A;Brand^\^";v=^\^"99^\^", ^\^"Chromium^\^";v=^\^"90^\^", ^\^"Google Chrome^\^";v=^\^"90^\^"" ^
-H "sec-ch-ua-mobile: ?0" ^
-H "upgrade-insecure-requests: 1" ^
-H "origin: https://flaticons.net" ^
-H "content-type: application/x-www-form-urlencoded" ^
-H "user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36" ^
-H "accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9" ^
-H "sec-fetch-site: same-origin" ^
-H "sec-fetch-mode: navigate" ^
-H "sec-fetch-user: ?1" ^
-H "sec-fetch-dest: document" ^
-H "referer: https://flaticons.net/customize.php?dir=Application&icon=View-Incident.png" ^
-H "accept-language: zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7" ^
-H "cookie: PHPSESSID=p70e3ci2v3j0s2fegn1grok0jn; __gads=ID=758a2093593297c7-226b12d5eac8003a:T=1622021420:RT=1622021420:S=ALNI_MYzuXb51WUtJlauQZ2kSlJstFsUEw" ^
--data-raw "background=dark&icon_size=256&icon_color=^%^23FFFFFF&icon_rotate=0&icon_flip=n&shape_id=0&shape_size=512&shape_color=^%^23FFFFFF" ^
--compressed
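For readers translating the curl command above into Python, the following minimal sketch shows how its `--data-raw` body corresponds to an ordinary dictionary of form fields. It uses only the standard library and sends nothing over the network. The field names and values are copied from the curl command (with the cmd.exe `^` escapes removed); the variable names `form` and `body` are illustrative.

```python
from urllib.parse import parse_qsl, urlencode

# Form fields taken from the curl command's --data-raw argument.
# In cmd.exe the body is written with ^%^23; without the carets that is %23,
# the URL-encoded form of "#".
form = {
    "background": "dark",
    "icon_size": "256",
    "icon_color": "#FFFFFF",
    "icon_rotate": "0",
    "icon_flip": "n",
    "shape_id": "0",
    "shape_size": "512",
    "shape_color": "#FFFFFF",
}

# urlencode() produces an application/x-www-form-urlencoded body.
body = urlencode(form)
print(body)

# Decoding the body gives back the same fields.
decoded = dict(parse_qsl(body))
print(decoded == form)
```

Libraries such as requests perform this encoding themselves when given a dict via `data=`, so the values should be passed unencoded (`"#FFFFFF"`, not `"%23FFFFFF"`).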
Following the request makes it clear: POST to https://flaticons.net/customize.php?dir=Application&icon=View-Incident.png
with the data "background=dark&icon_size=256&icon_color=^%^23FFFFFF&icon_rotate=0&icon_flip=n&shape_id=0&shape_size=512&shape_color=^%^23FFFFFF" (the ^ characters are cmd.exe escapes).
The response is a 302 whose Location header holds the address:
https://flaticons.net/custom.php?i=v4ETz7TPwnEXizIQIeIOvcWgv5YiE
Right, it is a form submission. OP, it would be worth reading up a little on HTML forms.
import os

import requests
from lxml import etree

head = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
}
# Create a folder for the downloaded icons if it does not exist yet
if not os.path.exists("./icons"):
    os.mkdir("./icons")
for i in range(1, 2):  # category pages to crawl (only page 1 here)
    url = "https://flaticons.net/category.php?c=Application&p={}".format(i)
    r = requests.get(url=url, headers=head).text
    html = etree.HTML(r)
    # Links to each icon's customize page
    lst = html.xpath("//div[@class='row']/div/a/@href")
    for col in lst:
        # Form fields submitted by the customize page
        data = {
            "background": "dark",
            "icon_size": "256",
            "icon_color": "#FFFFFF",
            "icon_rotate": 0,
            "icon_flip": "n",
            "shape_id": 0,
            "shape_size": 512,
            "shape_color": "#FFFFFF",
        }
        baseurl = "https://flaticons.net" + col
        response = requests.post(url=baseurl, headers=head, data=data).content.decode("UTF-8")
        tree = etree.HTML(response)
        # xpath() returns a list, so take the first match
        down = "https://flaticons.net" + tree.xpath("//div[@class='input-group']/button/@data-value")[0]
        # Icon name
        icon_name = tree.xpath("//section[@id='home']//p/b/text()")[0].replace(" ", "-") + ".png"
        icon_data = requests.get(url=down, headers=head).content
        icon_path = "icons/" + icon_name
        with open(icon_path, "wb") as fp:
            fp.write(icon_data)
        print(icon_name, "downloaded")
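I'm a Python beginner too; this is the source code I wrote.
OP, once you reach the second-level page, try F12 (the browser developer tools).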
```python
import requests
url = "https://flaticons.net/customize.php?dir=Application&icon=View-Incident.png"
form = {
"background": "dark",
"icon_size": "256",
"icon_color": "#FFFFFF",
"icon_rotate": "0",
"icon_flip": "n",
"shape_id": "0",
"shape_size": "512",
"shape_color": "#FFFFFF"
}
r = requests.post(url,data = form, allow_redirects=False)
target_url = r.headers['Location']
```
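A `Location` header may contain either an absolute or a relative URL, so it can help to resolve it against the URL that was requested before fetching it. The sketch below uses only the standard library and makes no network call. `resolve_location` is a hypothetical helper name, the absolute value is the one quoted earlier in this thread, and the relative values are made up for illustration; which form the site actually returns is an assumption to verify.

```python
from urllib.parse import urljoin


def resolve_location(request_url, location):
    """Return an absolute URL for a Location header value (hypothetical helper)."""
    return urljoin(request_url, location)


page_url = "https://flaticons.net/customize.php?dir=Application&icon=View-Incident.png"

# Absolute value, as quoted earlier in the thread: returned unchanged.
location = "https://flaticons.net/custom.php?i=v4ETz7TPwnEXizIQIeIOvcWgv5YiE"
target_url = resolve_location(page_url, location)
print(target_url)

# Relative values (made up): resolved against the requested page.
rooted = resolve_location(page_url, "/custom.php?i=abc")
sibling = resolve_location(page_url, "custom.php?i=abc")
print(rooted)
print(sibling)
```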
target_url is the address of the redirect.
lifeixue posted on 2021-5-26 20:38
import requests
from lxml import etree
import os
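How did you learn Python, mate? I'm self-taught and nowhere near your level.
Hello OP, I'd like to ask about a problem with the “批量修改v3.0” tool you made earlier. The files I need to rename all contain commas, for example “2021-06-03 10-31-39 (B,Radius8,Smoothing4).jpg”; these are ASCII (half-width) commas. Exporting the spreadsheet works fine, and the original paths and names display correctly. But after I edit the spreadsheet and import it back into the tool, the commas turn into ASCII semicolons, for example “2021-06-03 10-31-39 (B;Radius8;Smoothing4).jpg”, so the source files can no longer be found by path and the batch rename fails. Comments on the original thread are closed, so I can only reply in your most recent thread to ask whether there is a fix.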