吾爱破解 - 52pojie.cn


Views: 2124 | Replies: 13

[Python Original] Scraping course answers from ULearning (优学院)

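葫芦娃う posted on 2023-4-29 03:40
This scrapes the answers for ULearning (优学院) courses. I have only tested it on the English course and don't know whether it works for other subjects.
First, open the course page shown in the screenshots and copy its URL.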
[Screenshots: Snipaste_2023-04-29_03-15-29.png, Snipaste_2023-04-29_03-24-25.png]
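Replace the URL in the code, then press F12 to open the developer tools and copy the cookies into the code.
Also copy the AUTHORIZATION value (or the token value; the two are identical) from the cookies into the matching field in headers (there is only that one field).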
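For reference, here is a minimal standalone sketch of what happens to the two things you paste in: the courseId is pulled out of the copied URL with a regex (it sits after the `#`, so `urllib.parse` would not treat it as a query string), and the raw Cookie string is split into the `{name: value}` dict that `requests` expects for its `cookies=` argument. The helper names `extract_course_id`, `parse_cookie_string` and `authorization_from_cookies` are hypothetical, the cookie values are placeholders, and the `AUTHORIZATION`/`token` cookie names are taken from the instructions above.

```python
import re


def extract_course_id(url: str) -> str:
    """Return the digits after 'courseId=' in a copied course URL."""
    # The ID is inside the #fragment, so a regex is simpler than urllib.parse here
    match = re.search(r"courseId=(\d+)", url)
    if match is None:
        raise ValueError("no courseId found in URL: " + url)
    return match.group(1)


def parse_cookie_string(raw: str) -> dict:
    """Split a browser Cookie value like 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in raw.split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        name, value = pair.strip().split("=", 1)  # values may themselves contain '='
        cookies[name] = value
    return cookies


def authorization_from_cookies(cookies: dict) -> str:
    """Per the post, AUTHORIZATION and token hold the same value; use whichever exists."""
    return cookies.get("AUTHORIZATION") or cookies.get("token", "")


if __name__ == "__main__":
    url = "https://courseweb.ulearning.cn/ulearning/index.html#/course/textbook?courseId=93790"
    print(extract_course_id(url))  # 93790
    cookies = parse_cookie_string("AUTHORIZATION=PLACEHOLDER; token=PLACEHOLDER; lang=zh")
    headers = {"Authorization": authorization_from_cookies(cookies)}
    print(headers)  # {'Authorization': 'PLACEHOLDER'}
```

Passing such a dict as `cookies=` to `requests.get` lets requests rebuild a well-formed `Cookie:` header itself.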

Note: because the script runs single-threaded, it is rather slow; see the explanation at the relevant place in the code.
[Python]
# Third-party dependencies: pip install requests jsonpath
# (the original lxml and selenium imports were never used, so they are dropped)
import re
import requests
import json
from jsonpath import jsonpath
import time


class You(object):
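    def __init__(self,url):
        # self.name is assigned later in init_url; self.tail is unused (left over from earlier revisions)
        self.name = None
        self.tail = None
        self.url = url
        self.headers = {
            "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
            # "br" is not advertised: requests can only decode Brotli if the brotli package is installed
            "Accept-Encoding": "gzip, deflate",
            "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
            "Connection":"keep-alive",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0",
            # Paste the AUTHORIZATION (= token) value from your cookies here
            "Authorization":'paste here',
        }
        # Paste the full cookie string copied from the browser (F12) here
        Cookie = 'paste cookies here'
        # Re-encode so non-ASCII characters survive the latin-1 encoding of HTTP headers
        Cookie = Cookie.encode('utf-8').decode("latin1")
        # requests expects cookies as {name: value}, so split "a=1; b=2" into pairs
        # (a single {'Cookie': '<whole string>'} entry would corrupt the first cookie)
        self.cookie = dict(
            pair.strip().split('=', 1) for pair in Cookie.split(';') if '=' in pair
        )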


    def get_answer(self,parentId,quetionID):
        # Fetch the correct answer(s) of one question and append them to the output file
        url ='https://api.ulearning.cn/questionAnswer/'+str(quetionID)+'?parentId='+str(parentId)
        print(url)
        answerData = requests.get(url,headers=self.headers,cookies=self.cookie)
        answerJson = json.loads(answerData.content.decode())
        print(answerJson)
        try:
            answer = answerJson['correctAnswerList']
        except KeyError:
            # Some questions come back without correctAnswerList; skip them
            return ''
        self.f.write('  Answer: ')
        if len(answer) == 1:
            self.f.write(str(answer[0])+"  \n")
        else:
            # Several answers: number them 1. 2. 3. ...
            for idx, ans in enumerate(answer, start=1):
                self.f.write(str(idx)+'. '+str(ans)+'    ')
            self.f.write('\n')

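    def DataReplace(self,text):
        # Turn an HTML fragment into plain text: <br> becomes a newline, all other tags are dropped
        text = text.replace('<br>','\n')
        Re = re.compile('<.*?>',re.S)
        text = Re.sub('',text)
        # The regex has already removed <strong>/</strong>; only the &nbsp; entity remains
        text = text.replace('&nbsp;',' ')
        return text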

    def get_data(self):
        html_json = json.loads(self.text.content.decode())
        # Layout: $.wholepageItemDTOList[i].wholepageDTOList[j].coursepageDTOList[k].questionDTOList
        wholepageItemDTOList = jsonpath(html_json,'$.wholepageItemDTOList.')
        # Initialised up front so a missing id cannot raise UnboundLocalError later
        partenID = None
        questionID = None
        for wholepageItemDTOItem in wholepageItemDTOList[0]:
            for wholepageDTOList in wholepageItemDTOItem['wholepageDTOList']:
                # Stop at the "Unit Objective" page
                if wholepageDTOList['content'] == 'Unit Objective':
                    break
                # Page id, sent as parentId when requesting answers
                partenID = wholepageDTOList.get('id', partenID)
                for coursepageDTOItem in wholepageDTOList['coursepageDTOList']:
                    if 'questionDTOList' not in coursepageDTOItem:
                        continue
                    # Number the questions on each page 1, 2, 3, ...
                    titleID = 1
                    for questionItem in coursepageDTOItem['questionDTOList']:
                        questionID = questionItem.get('questionid', questionID)
                        title = self.DataReplace(questionItem['title'])
                        if title == '':
                            break
                        self.f.write(str(titleID)+'. '+title+'\n')
                        titleID = titleID+1
                        # Options of the question itself (choiceitemModels), lettered A, B, C, ...
                        choices = questionItem.get('choiceitemModels')
                        if choices:
                            idx = 'A'
                            for choiceitemModels in choices:
                                choiceitemModelsTitle = self.DataReplace(choiceitemModels['title'])
                                questionID = choiceitemModels.get('questionid', questionID)
                                self.f.write(idx+'. '+choiceitemModelsTitle+'   ')
                                idx = chr(ord(idx)+1)
                            self.f.write('\n')
                        # Sub-questions (subQuestionModels)
                        subQuestions = questionItem.get('subQuestionModels')
                        if not subQuestions:
                            # No sub-questions: fetch the answer of the question itself
                            self.get_answer(parentId=partenID, quetionID=questionID)
                            continue
                        for subIdx, subQuestionModelsItem in enumerate(subQuestions, start=1):
                            questionID = subQuestionModelsItem.get('questionid', questionID)
                            subQuestionModelsTitle = self.DataReplace(subQuestionModelsItem['title'])
                            self.f.write('  ('+str(subIdx)+'). '+subQuestionModelsTitle+'\n')
                            # Options of the sub-question; empty ones are skipped
                            choiceitemModelsIDX = 'A'
                            for choiceitemModelsItem in subQuestionModelsItem.get('choiceitemModels') or []:
                                choiceitemModelsTitle = self.DataReplace(choiceitemModelsItem['title'])
                                if choiceitemModelsTitle == '':
                                    continue
                                self.f.write('      '+choiceitemModelsIDX+'.  '+choiceitemModelsTitle+'\n')
                                choiceitemModelsIDX = chr(ord(choiceitemModelsIDX)+1)
                            # One answer request per sub-question
                            self.get_answer(parentId=partenID, quetionID=questionID)


    def get_url(self,classId,id):
        url = 'https://api.ulearning.cn/course/stu/'+str(id)+'/directory?classId='+str(classId)
        text = requests.get(url,headers=self.headers,cookies=self.cookie)
        print(text)
        urlJson = json.loads(text.content.decode())
        chapters = jsonpath(urlJson,'$.chapters.')
        for chaptersItem in chapters[0]:
            # Fixed 10-second pause before each chapter request; this is what makes the script slow
            print('Waiting...')
            time.sleep(10)
            Unilt = self.DataReplace(chaptersItem['nodetitle'])
            self.f.write('           '+Unilt+'\n')
            nodeid = chaptersItem['nodeid']
            ObjectUrl = 'https://api.ulearning.cn/wholepage/chapter/stu/'+str(nodeid)
            text = requests.get(ObjectUrl, headers=self.headers, cookies=self.cookie)
            text.encoding = 'utf-8'
            self.text = text
            self.get_data()


    def init_url(self):
        # The courseId is the number at the end of the URL copied from the browser
        initNum = re.findall(r'.*?courseId=(\d+)', self.url)
        url = 'https://courseapi.ulearning.cn/classes/information/student/' + initNum[0] + '?lang=zh'
        classReponce = requests.get(url, headers=self.headers, cookies=self.cookie)
        classId = re.findall(r'<classId>(\d+)</classId>', classReponce.text)
        courseUrl = 'https://courseapi.ulearning.cn/textbook/student/' + initNum[0] + '/list?lang=zh'
        courseRes = requests.get(courseUrl, headers=self.headers, cookies=self.cookie)
        # Each match is a (textbook id, textbook name) pair
        courseList = re.findall(r'<courseId>(\d+)</courseId><name>(.*?)</name>', courseRes.text)
        for item in courseList:
            print(item[0] + '   ' + item[1])
            # To scrape every textbook, uncomment the three lines below
            # self.name = item[1]
            # with open(self.name + '.txt', 'a', encoding='UTF-8') as self.f:
            #     self.get_url(classId[0], item[0])

        # Scrape a single textbook: replace 11451 with a number from the list printed above
        self.name = '新交互大学英语1(第2版)New Interactive College English 1 (2.0)'  # output file name; change it to anything you like
        # "with" closes the output file once scraping finishes
        with open(self.name + '.txt', 'a', encoding='UTF-8') as self.f:
            self.get_url(classId[0],'11451')

    def run(self):
        self.init_url()


if __name__ == '__main__':
    # Replace this URL with the one copied from your course page
    url = 'https://courseweb.ulearning.cn/ulearning/index.html#/course/textbook?courseId=93790'
    y = You(url)
    y.run()
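
DataReplace in the script turns question HTML into plain text by replacing `<br>` and stripping tags with the regex `<.*?>`, but it only handles the `&nbsp;` entity. A minimal sketch of the same idea that also decodes every HTML entity with the standard library's `html.unescape` (so `&amp;` or `&#39;` come out right too); the name `html_to_text` is hypothetical, and a regex like this is only adequate for small, trusted fragments, not arbitrary HTML.

```python
import html
import re

TAG_RE = re.compile(r"<.*?>", re.S)      # any tag, non-greedy, may span lines
BR_RE = re.compile(r"<br\s*/?>", re.I)   # <br>, <br/>, <BR /> ...


def html_to_text(fragment: str) -> str:
    """Convert a small HTML fragment to plain text."""
    text = BR_RE.sub("\n", fragment)   # line breaks become newlines
    text = TAG_RE.sub("", text)        # drop every remaining tag
    text = html.unescape(text)         # &amp; -> &, &nbsp; -> '\xa0', &#39; -> '
    return text.replace("\xa0", " ")   # normalise non-breaking spaces


if __name__ == "__main__":
    print(html_to_text("<p><strong>Tom&nbsp;&amp;&nbsp;Jerry</strong><br/>line 2</p>"))
```

Unescaping after stripping tags matters: an escaped `&lt;b&gt;` in the question text is meant to be shown literally, so it should survive as `<b>` rather than be removed as a tag.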

Ratings (1 participant): 苏紫方璇 gave +7 Wuai coins (吾爱币) and +1 enthusiasm (热心值). Reason: "Analysis, discussion and exchange are welcome; 52pojie is better with you here!"


oxding posted on 2023-4-29 10:26
Last edited by oxding on 2023-4-29 10:29

except KeyError:
    return ''

What does returning an empty value on an exception mean here? Why not use the usual pattern below instead?

try:
    pass
except Exception as e:
    print("break for :" + str(e))
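On the question above: `return ''` inside `except KeyError` is simply the script's way of saying "this question has no `correctAnswerList`, skip it" (the author's reply later in the thread says some values are missing). A small sketch of three ways to write that lookup; the function names are hypothetical. Catching only `KeyError`, or using `dict.get` with a default, keeps unrelated bugs visible, whereas `except Exception` also swallows errors such as the response not being a dict at all.

```python
def answers_try(data: dict) -> list:
    """EAFP style: attempt the lookup; a missing key means 'no answer'."""
    try:
        return data["correctAnswerList"]
    except KeyError:
        return []


def answers_get(data: dict) -> list:
    """Same result with dict.get and a default, no exception handling needed."""
    return data.get("correctAnswerList", [])


def answers_broad(data: dict) -> list:
    """except Exception also hides unrelated bugs (e.g. data being None), so at least log it."""
    try:
        return data["correctAnswerList"]
    except Exception as e:
        print("skipped:", repr(e))
        return []


if __name__ == "__main__":
    print(answers_try({"correctAnswerList": ["A"]}), answers_get({}), answers_broad(None))
```

With `answers_try`, passing `None` still raises `TypeError`, which is usually what you want: it points at a real bug instead of silently producing an empty answer.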

liudream posted on 2023-4-29 10:15
lxj2004 posted on 2023-4-29 10:15
lpj518 posted on 2023-4-29 10:21
Thanks for sharing.
oxding posted on 2023-4-29 11:08
Why define self.name = None and self.tail = None?
lzp734521 posted on 2023-4-29 12:48
It tells me to log in again.
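OP | 葫芦娃う posted on 2023-4-29 13:31
oxding posted on 2023-4-29 10:26
except KeyError:
    return ''

Some of the values are missing from the data. I'm not very familiar with Python syntax; I'll look into your suggestion later.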
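OP | 葫芦娃う posted on 2023-4-29 13:32
oxding posted on 2023-4-29 11:08
Why define self.name = None and self.tail = None?

They probably serve no purpose. I revised the code many times without writing comments, so anything that didn't break anything, I just left in.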
OP | 葫芦娃う posted on 2023-4-29 13:33

That is probably a problem with your cookies.