绝版coco posted on 2019-7-28 07:39

Help with a Python crawler using bs4

As shown in the screenshot.

绝版coco posted on 2019-7-28 08:01

import requests
from bs4 import BeautifulSoup
#url="http://www.hzhr.com/Web/Person/List.html"
headers={"User-Agent" : "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"}


html=requests.get("http://www.hzhr.com/Web/Person/List.html",headers=headers)
soup=BeautifulSoup(html, "lxml")
data=soup.select("div.txt_add > p.link_add")
for i in data:
    title=i.get_text

Error:
Traceback (most recent call last):
File "F:/py-lianxi/1.py", line 8, in <module>
    soup=BeautifulSoup(html, "lxml")
File "C:\Users\Hasee\AppData\Local\Programs\Python\Python37-32\lib\site-packages\bs4\__init__.py", line 245, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'Response' has no len()



52sczzj posted on 2019-7-28 08:13

Change soup=BeautifulSoup(html, "lxml") to soup=BeautifulSoup(html.content, "lxml")
and give that a try.
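
For what it's worth, html.content is the raw bytes of the response and html.text is the decoded string, and BeautifulSoup accepts either one. Here is a small offline sketch of the difference. The GBK-encoded snippet is made up for illustration, and the parser is html.parser only so that the example does not depend on lxml being installed:

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded page (assumption, not the real site).
raw_bytes = '<p class="link_add">杭州</p>'.encode("gbk")  # what html.content would look like
decoded_text = raw_bytes.decode("gbk")                      # what html.text would look like

# Bytes input: BeautifulSoup has to work out the encoding; here we state it explicitly.
soup_from_bytes = BeautifulSoup(raw_bytes, "html.parser", from_encoding="gbk")

# String input: the text is already decoded, so no encoding is needed.
soup_from_text = BeautifulSoup(decoded_text, "html.parser")

text_from_bytes = soup_from_bytes.p.get_text()
text_from_text = soup_from_text.p.get_text()
print(text_from_bytes, text_from_text)
```

Either way, what gets passed in is markup, not the Response object itself.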

518 posted on 2019-7-28 08:18

Use html.text

daimiaopeng posted on 2019-7-28 08:20

The correct fix:
soup=BeautifulSoup(html.text, "lxml"). The first argument to BeautifulSoup() is the markup text itself, not a Response object.
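
Putting that together, a minimal sketch of the corrected script might look like this. The helper names parse_titles and fetch_titles, the shortened User-Agent, the timeout, and the sample markup are all assumptions made for illustration; the selector is the one from the original post. The parser defaults to html.parser here, and "lxml" can be passed instead if lxml is installed. Note that get_text also has to be called with parentheses, otherwise title ends up holding the method rather than the text:

```python
import requests
from bs4 import BeautifulSoup

# Shortened User-Agent for the sketch (assumption); the original post sends a full Chrome string.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def parse_titles(markup, parser="html.parser"):
    """Return the text of each p.link_add that is a direct child of div.txt_add."""
    # markup must be a str or bytes, not a requests.Response object.
    soup = BeautifulSoup(markup, parser)
    # get_text() is called here; a bare get_text would only be the bound method.
    return [p.get_text(strip=True) for p in soup.select("div.txt_add > p.link_add")]


def fetch_titles(url):
    """Hypothetical helper: download a page and parse it (needs network access)."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return parse_titles(response.text)


# Offline check on made-up markup that mimics the structure the selector expects.
SAMPLE = '<div class="txt_add"><p class="link_add"> First </p><p class="other">skip</p></div>'
titles = parse_titles(SAMPLE)
print(titles)
```

Against the real site this would be called as fetch_titles("http://www.hzhr.com/Web/Person/List.html"); whether the selector still matches depends on the page's current markup.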

Eric_zhao posted on 2019-7-28 09:19

soup=BeautifulSoup(html.text, "lxml")
It should be html.text.


I'd suggest using Scrapy's XPath selectors.

YXK posted on 2019-7-28 10:17

I'd suggest using Scrapy's XPath selectors.
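
To give a rough idea of the XPath route, here is a sketch that uses lxml's own XPath support rather than Scrapy itself, since lxml is already the parser named in the original post. The sample markup is made up, and the expression is just one possible XPath equivalent of the CSS selector div.txt_add > p.link_add (it matches the class attribute exactly):

```python
from lxml import html as lxml_html

# Made-up markup that mimics the structure the original selector expects (assumption).
SAMPLE_PAGE = (
    "<html><body>"
    '<div class="txt_add"><p class="link_add">First</p><p class="other">skip</p></div>'
    "</body></html>"
)

# fromstring() takes markup text, just like BeautifulSoup does.
tree = lxml_html.fromstring(SAMPLE_PAGE)

# Select the text nodes of p.link_add elements directly under div.txt_add.
xpath_titles = tree.xpath('//div[@class="txt_add"]/p[@class="link_add"]/text()')
print(xpath_titles)
```

Scrapy's selectors accept the same kind of XPath expression, so the query should carry over if the crawler is later moved to Scrapy.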