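Could anyone experienced with Zhihu Yanxuan (paid content) help me improve this code? Running the script gives the following output:
Enter the share URL of the Zhihu Yanxuan article: https://www.zhihu.com/question/268938242/answer/2816770810
Response Content-Type is not application/json, content received:
Page title: mfyx.top - This website is for sale! - mfyx resources and information.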
import time
import requests
from lxml import etree

api_url = "https://mfyx.top/api/search"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
}

org_url = input("Enter the share URL of the Zhihu Yanxuan article: ")
params = {"url": org_url}

try:
    # A timeout keeps the script from hanging if the server never answers
    response = requests.get(api_url, params=params, headers=headers, timeout=10)
    response.raise_for_status()  # raises HTTPError on 4xx/5xx status codes
except requests.exceptions.HTTPError as e:
    print(f"HTTP Error: {e}")
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
else:
    # Check whether the Content-Type is application/json
    # (default to "" so a missing header does not raise TypeError)
    content_type = response.headers.get("Content-Type", "")
    if "application/json" in content_type:
        try:
            response_data = response.json()
            # Your follow-up processing code...
        except ValueError as e:
            print(f"JSON Decode Error: {e}")
    else:
        print("Response Content-Type is not application/json, content received:")
        # Parse the HTML response with lxml
        html_content = response.text
        tree = etree.HTML(html_content)
        # Suppose we need a specific element from the page, e.g. <title>
        title = tree.xpath('//title/text()')
        if title:
            print("Page title:", title[0])
        # Use XPath expressions that match the actual page structure
        # to extract what you need, e.g. the text of every paragraph
        paragraphs = tree.xpath('//p/text()')
        for p in paragraphs:
            print(p.strip())
        # ...
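
The output quoted at the top suggests the request came back as an HTML page rather than JSON, and the page title indicates that mfyx.top is now a parked "for sale" page, so the `/api/search` endpoint the script calls has probably stopped existing. Below is a minimal sketch of pulling that check out of the network code so it can be tried offline. `classify_response` and `TitleParser` are hypothetical names, not part of the original script, and the standard-library `html.parser` is swapped in for lxml only to keep the example free of third-party dependencies.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is recorded
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def classify_response(content_type, body):
    """Return ("json", None) for JSON responses, else ("html", title or None).

    content_type may be None when the server sends no Content-Type header.
    """
    if "application/json" in (content_type or ""):
        return "json", None
    parser = TitleParser()
    parser.feed(body)
    return "html", parser.title.strip() or None
```

In the script this could be called as `classify_response(response.headers.get("Content-Type"), response.text)`. If the result is `"html"` with a title saying the site is for sale, the likely conclusion is that the API URL itself has to be replaced with a service that still works; changing the parsing on the client side would not bring the old endpoint back.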