python: how do I remove "\xa0"?

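double07 posted on 2021-5-9 17:35

I take the address data at list index [2], but with this logic what comes out is "\xa0", i.e. blank. How should I change the code so that index [2] gives the corresponding address?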

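import chardet
import requests
from bs4 import BeautifulSoup

response = requests.get("https://itemcdn.tmall.com/desc/icoss1752193898fe9ed123908bf24e?var=desc")
# Detect the encoding and decode the raw bytes (r_response is not used below)
encodingInfo = chardet.detect(response.content)
r_response = response.content.decode(encodingInfo['encoding'], 'ignore')
# Drop the JavaScript "var desc=" wrapper
# (strip() removes any of these characters from both ends, not an exact prefix)
a = response.text.strip('var desc=')
soup = BeautifulSoup(a, features="lxml")
lst = []
# Collect the text of every <span> on the page
for l in soup.find_all('span'):
    lst.append(l.text)
print(lst)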
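A side note on the `strip('var desc=')` line: `str.strip` treats its argument as a set of characters (v, a, r, space, d, e, s, c, =) to remove from both ends, not as a prefix. The sketch below uses a made-up payload, `sample`, which is hypothetical since the thread does not show the raw response, to compare it with `str.removeprefix` (Python 3.9+).

```python
# Hypothetical payload; the real response body is not shown in the thread.
sample = "var desc=dress"

# strip() drops any character from the set {v,a,r, ,d,e,s,c,=} at both ends.
# Every character of "dress" is in that set, so the whole string disappears.
by_strip = sample.strip("var desc=")

# removeprefix() (Python 3.9+) removes the exact leading substring once.
by_prefix = sample.removeprefix("var desc=")

print(repr(by_strip))   # ''
print(repr(by_prefix))  # 'dress'
```

The original line probably works for this page only if the content after `var desc=` starts with a character outside that set (such as a quote), which stops the stripping.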

cdsgg posted on 2021-5-9 17:46

\xa0 is the no-break (non-breaking) space in Latin-1 (ISO 8859-1), i.e. chr(160). Replace it with an ordinary space:
.replace("\xa0", " ")
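A minimal sketch of where `replace` goes: call it on each span's text and keep the returned value, because strings are immutable and `replace` returns a new string. `span_texts` is hypothetical sample data standing in for the `l.text` values. Note that replacing alone turns a lone `\xa0` into `" "`, which still occupies its list index.

```python
# Hypothetical span texts standing in for l.text in the loop.
span_texts = ["item name", "\xa0", "address text"]

# Calling replace() without using the result changes nothing:
text = "\xa0"
text.replace("\xa0", " ")  # return value discarded
unchanged = text           # still "\xa0"

# Store the returned string instead:
replaced = []
for t in span_texts:
    replaced.append(t.replace("\xa0", " "))

print(replaced)  # ['item name', ' ', 'address text']
```

Since the blank entry is still at index 1 here, replacing is not enough to shift the address to the wanted index; the entry also has to be skipped.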

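cmy2019 posted on 2021-5-9 17:47

I don't get what you mean. In the page source of the link in your code, the span right after the "拍品名称" (lot name) span contains &nbsp;, so it is an empty line to begin with, isn't it?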
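The `&nbsp;` entity in the page source is where `\xa0` comes from: an HTML parser decodes it to U+00A0. Python's standard `html.unescape` shows the same mapping:

```python
import html

# &nbsp; and its numeric form &#160; both decode to U+00A0 NO-BREAK SPACE.
decoded = html.unescape("&nbsp;")
print(repr(decoded))                       # '\xa0'
print(decoded == chr(160))                 # True
print(html.unescape("&#160;") == decoded)  # True
```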

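double07 posted on 2021-5-9 17:50

cmy2019 posted on 2021-5-9 17:47
I don't get what you mean. In the page source of the link in your code, the span right after the "拍品名称" (lot name) span contains &nbs ...
I want to delete this empty entry; otherwise list index [2] picks up this empty line. See the screenshot.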

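double07 posted on 2021-5-9 17:52

Last edited by double07 on 2021-5-9 18:04

cdsgg posted on 2021-5-9 17:46
\xa0 is the no-break (non-breaking) space in Latin-1 (ISO 8859-1), i.e. chr(160). Replace it with an ordinary space.
replace("\xa0", ...
Where should I put this code? I tried it, but the character still isn't replaced.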

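万神fake posted on 2021-5-9 18:31

What print() shows you is only a representation: printing a list displays each element's repr(), so the no-break space appears as \xa0. If you write it to a file or use it elsewhere, it shows up normally, as a blank space.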
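To illustrate the difference between a list's printed form and the string itself, here is a small sketch with a hypothetical list `lst`. The character is really present either way, so writing it to a file does not remove it from the list or change which element sits at a given index.

```python
# Hypothetical list like the one the loop builds.
lst = ["item name", "\xa0"]

# Printing a list shows each element's repr(), which escapes U+00A0.
print(lst)            # ['item name', '\xa0']

# Printing the string itself writes the actual character (looks blank).
print(lst[1])
print(repr(lst[1]))   # '\xa0'
```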

姓木名木木 posted on 2021-5-9 18:59

\xa0 is the no-break space character, i.e. a kind of space.
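Because U+00A0 is a Unicode whitespace character, Python's standard string tools already know about it. This sketch uses only the standard library: `str.strip()` with no argument removes it, and NFKC normalization maps it to an ordinary space.

```python
import unicodedata

nbsp = "\xa0"
print(unicodedata.name(nbsp))  # NO-BREAK SPACE
print(ord(nbsp))               # 160

# Python classifies it as whitespace, so a bare strip() removes it ...
print(nbsp.isspace())                  # True
print(repr(" \xa0text\xa0 ".strip()))  # 'text'

# ... and NFKC normalization turns it into a regular space (U+0020).
print(unicodedata.normalize("NFKC", "a\xa0b") == "a b")  # True
```

NFKC also rewrites other compatibility characters (for example full-width letters), so it is a broader change than a targeted `replace("\xa0", " ")`.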

姓木名木木 posted on 2021-5-9 19:05

Add a check: if the text is that space, don't add it to the list.
import chardet
import requests
from bs4 import BeautifulSoup

response = requests.get("https://itemcdn.tmall.com/desc/icoss1752193898fe9ed123908bf24e?var=desc")
encodingInfo = chardet.detect(response.content)
r_response = response.content.decode(encodingInfo['encoding'], 'ignore')
a = response.text.strip('var desc=')
soup = BeautifulSoup(a, features="lxml")
lst = []
for l in soup.find_all('span'):
    # Skip spans whose text is only a no-break space (&nbsp;)
    if l.text == "\xa0":
        pass
    else:
        lst.append(l.text)
print(lst)
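The check above skips only spans whose text is exactly "\xa0". A slightly more general sketch, assuming every span that is blank after stripping whitespace should be dropped, relies on `str.strip()` treating U+00A0 as whitespace. `non_blank` is a hypothetical helper and `span_texts` is sample data standing in for the `l.text` values.

```python
def non_blank(texts):
    """Return the texts that contain something other than whitespace.

    str.strip() with no argument removes Unicode whitespace, including the
    no-break space, so a lone no-break space, a regular space, an empty
    string, or any mix of these all count as blank and are dropped.
    """
    return [t for t in texts if t.strip()]


# Hypothetical span texts; in the thread these come from l.text.
span_texts = ["item name", "\xa0", "address text", " ", "\xa0 \xa0", ""]
lst = non_blank(span_texts)
print(lst)  # ['item name', 'address text']
```

In the original loop this would become something like `lst = non_blank(l.text for l in soup.find_all('span'))`. The kept texts are returned unchanged; use `t.strip()` in the comprehension if you also want surrounding spaces trimmed.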

咸鱼灭 posted on 2021-5-9 19:07

Just add a check inside the loop:
import chardet
import requests
from bs4 import BeautifulSoup

response = requests.get("https://itemcdn.tmall.com/desc/icoss1752193898fe9ed123908bf24e?var=desc")
encodingInfo = chardet.detect(response.content)
r_response = response.content.decode(encodingInfo['encoding'], 'ignore')
a = response.text.strip('var desc=')
soup = BeautifulSoup(a, features="lxml")
lst = []
for l in soup.find_all('span'):
    # Only keep spans whose text is not a lone no-break space
    if l.text != '\xa0':
        lst.append(l.text)
print(lst)

double07 posted on 2021-5-9 23:01

Last edited by double07 on 2021-5-9 23:07

Thanks to everyone above for the helpful replies!


52pojie.cn (吾爱破解) Archiver
