This post was last edited by panison on 2021-4-8 09:17
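Earlier I posted the source code for an English vocabulary lookup and dictation program covering [primary school|junior high|senior high] and the [PEP (人教版)|Yilin (译林版)] textbook editions.
Some forum members asked for the source code that scrapes Youdao Translate. I'm sharing it here so it can be put to practical use; I hope it helps.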
[Python]
"""
First published on the 52pojie forum
"""
import requests
from lxml import etree


# Define eng_to_han(), which looks up the Chinese translation of an English word
def eng_to_han(word):
    url = "http://dict.youdao.com/w/eng/" + word
    headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0"}
    r = requests.get(url=url, headers=headers)
    r.encoding = "utf-8"
    html = etree.HTML(r.text)
    try:
        # Extract the phonetic transcriptions (UK first, then US)
        txt_uk = html.xpath(".//span[@class='pronounce']/text()")[0].strip()
        txt_uk_sm = html.xpath(".//span[@class='phonetic']/text()")[0].strip()
        txt_us = html.xpath(".//span[@class='pronounce']/text()")[3].strip()
        txt_us_sm = html.xpath(".//span[@class='phonetic']/text()")[1].strip()
        sep1 = txt_uk + txt_uk_sm + ";" + txt_us + txt_us_sm
    except Exception:
        # If any phonetic field is missing, fall back to an empty string
        sep1 = ""
    # Extract the translations
    lines = html.xpath(".//div[@class='trans-container']/ul/li/text()")
    ls = []
    for line in lines:
        # Remove spaces, newlines and tabs from each entry
        for s in " \n\t":
            if s in line:
                line = line.replace(s, "")
        # Keep only non-empty entries
        if len(line) > 0:
            ls.append(line)
    sep2 = "|".join(ls)
    return sep1, sep2


print(eng_to_han("china"))
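Running it prints the returned tuple: the phonetic string first (empty if no phonetics were found), then the definitions joined with "|".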
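The URL-building and whitespace-stripping steps in eng_to_han() can also be pulled out into small helpers that run without any network access, which makes them easy to check on their own. The following is only a rough sketch: build_url() and clean_lines() are hypothetical names that are not in the original code, the percent-encoding via urllib.parse.quote is my own addition (the post does not say whether the site accepts multi-word entries at this path), and str.split() strips every kind of whitespace, which is slightly broader than the " \n\t" set used above. The sample strings are made-up input, not real output from the site.

```python
from urllib.parse import quote

# Same endpoint as in the post
BASE_URL = "http://dict.youdao.com/w/eng/"


def build_url(word):
    """Hypothetical helper: percent-encode the word before appending it to the URL."""
    return BASE_URL + quote(word.strip())


def clean_lines(lines):
    """Hypothetical helper: remove all whitespace from each line and skip empty ones."""
    cleaned = []
    for line in lines:
        # str.split() with no arguments splits on any run of whitespace,
        # so joining the pieces removes spaces, tabs and newlines alike
        compact = "".join(line.split())
        if compact:
            cleaned.append(compact)
    return cleaned


if __name__ == "__main__":
    # Made-up sample input, for illustration only
    print(build_url("ice cream"))
    print("|".join(clean_lines(["n. 瓷器 ", "\n\t", "adj. 瓷制的"])))
```

Inside eng_to_han(), the inner loop could then be replaced by sep2 = "|".join(clean_lines(lines)), assuming the broader whitespace handling is acceptable.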