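valve117 posted on 2024-7-8 21:07

Trying to scrape my company's website with requests, any experts able to help?

I want to scrape content from my company's internal site to automate some office work. After POSTing to the login URL the login succeeds, but when I then try to fetch data the site still says I am not logged in. Any help appreciated.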
import re, requests

postdic = {'userName': '',
         'password': ''}
url = 'http://10.1.19.203/hbky/login'
ssn = requests.session()
resp1 = ssn.post(url, data=postdic, timeout=1000)
url2 = 'http://10.1.19.203:8080/wzgy/jt/contract/flow/contract_list_node_1.jsp'
resp2 = ssn.get(url2, timeout=1000)
url3 = 'http://10.1.19.203:8080/wzgy/jt/contract/flow/com.hbky.wzgy.jt.contract.search.jtContractBaseByConditionPage.biz.ext'
postdic2 = {"condition": {
    "userCategorypernew": "'A01','A02','A03','A04','A05','A06','A07','A08','A09','A10','A11','A12','A13','A14','A15003','A16','A17','A20','A29'",
    "seller": "", "billcode": "", "isexpired": "", "iscomplete": "", "ismanualcomplete": "", "istransfer": "",
    "signtime": "", "starttime": "", "paymethod": "", "purchasetype": "", "audinumber": "", "inputername": "",
    "businessnumber": "LHJHDDX-202403003-3", "issign": "", "name": "", "treaty2": "", "endtime": "", "isinvalid": "",
    "remark": "", "manageroption": "", "islzht": "", "inputer": "", "inputtime": "", "businessername": "",
    "startSigntime": "", "endSigntime": "", "startTreaty2": "", "endTreaty2": "", "startStarttime": "",
    "endStarttime": "", "startEndtime": "", "endEndtime": "", "mtlcode": "", "mtlname": "", "mtltype": "",
    "showValidContract": "false", "startInputTime": "2024-01-01", "endInputTime": "2024-07-08", "invalidtime": "",
    "nfpnum": "", "status": "null", "statuses": ["0", "1", "2", "3", "4", "5", "9"], "type": ""},
    "pageIndex": '0',
    "pageSize": '100', "sortField": "", "sortOrder": "", "page": {"begin": '0', "length": '100'}}
resp3 = ssn.post(url3, data=postdic2, timeout=1000)
print(resp3.text)


valve117 posted on 2024-7-10 08:32

laos posted on 2024-7-9 12:18
That is a type error. Try changing data=postdic2 to json=postdic2.

After changing data to json the error changed. The response now says: com.primeton.das.sql.impl.ibatis.sqlmap.client.SqlMapException: ParameterObject or property was not a Collection, Array or Iterator.
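To make the data= versus json= difference concrete, here is a minimal sketch that builds both requests without sending them and prints what would go over the wire. The URL and the trimmed payload are placeholders invented for illustration, not the real endpoint.

```python
import json
import requests

# Placeholder endpoint and a trimmed payload, for illustration only.
url = "http://example.invalid/search.biz.ext"
payload = {
    "condition": {"businessnumber": "X-1", "statuses": ["0", "1"]},
    "pageIndex": "0",
    "pageSize": "100",
}

# Prepare the requests without sending them so the bodies can be inspected.
as_form = requests.Request("POST", url, data=payload).prepare()
as_json = requests.Request("POST", url, json=payload).prepare()

# data= form-encodes the dict; a nested dict collapses to its key names only.
print(as_form.headers["Content-Type"])
print(as_form.body)

# json= serializes the whole structure and sets a JSON content type.
print(as_json.headers["Content-Type"])
print(json.loads(as_json.body) == payload)
```

With data=, the nested "condition" dict is reduced to its keys, so the server never sees the actual search values. As for the new SqlMapException, one plausible reading (an assumption, not confirmed in this thread) is that some field the server iterates over arrived as a plain string instead of a list; comparing the payload field by field with the request the browser sends would settle it.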

酒伴久伴丶 posted on 2024-7-9 01:25

I don't quite follow. Don't you need to send the cookie?

jandyx posted on 2024-7-9 06:22

Try sending a User-Agent header.

laos posted on 2024-7-9 06:36

Print the cookies and compare them.
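A small sketch of what printing the cookies could look like. Note that resp.cookies only holds cookies set by that one response, while the session's jar accumulates everything received so far, including cookies set during redirects, so the session jar is usually the one to inspect. The helper name describe_cookies and the cookie stored below are made up so the snippet runs offline.

```python
import requests

def describe_cookies(jar):
    """Return (name, domain, path) for every cookie in a requests cookie jar."""
    return [(c.name, c.domain, c.path) for c in jar]

session = requests.Session()

# Stand-in for what a successful login might store.
# The name, value, domain and path here are invented for the demo.
session.cookies.set("JSESSIONID", "abc123", domain="10.1.19.203", path="/")

for name, domain, path in describe_cookies(session.cookies):
    print(name, domain, path)
```

In the real script the call would be describe_cookies(ssn.cookies) right after the login POST. The domain and path matter because they decide which later requests the cookie is attached to.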

无知灰灰 posted on 2024-7-9 07:43

Send the headers as well.

Oo不弃 posted on 2024-7-9 08:10

Send the cookie and the header information.

asd124689 posted on 2024-7-9 08:31

I even suspect the login may not have succeeded at all. Print resp1.text.
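Beyond resp1.text, a few other fields on the response help tell a real login from a silent failure. The sketch below gathers them in one place; summarize_response is a hypothetical helper, and the fake response (including its body) is invented only so the example runs without the intranet. Setting _content directly is a testing shortcut, not something to do in real code.

```python
import requests

def summarize_response(resp, preview=200):
    """Collect the fields that usually show whether a login really worked."""
    return {
        "status": resp.status_code,
        "redirects": [r.status_code for r in resp.history],
        "final_url": resp.url,
        "set_cookies": sorted(resp.cookies.keys()),
        "body_preview": resp.text[:preview],
    }

# Offline stand-in for resp1; every value below is made up for the demo.
fake = requests.models.Response()
fake.status_code = 200
fake.url = "http://10.1.19.203/hbky/login"
fake.encoding = "utf-8"
fake._content = b'{"success": false, "msg": "bad credentials"}'

for key, value in summarize_response(fake).items():
    print(key, "=", value)
```

A 200 status alone proves little: many sites answer a failed login with 200 and an error message in the body, which is why the body preview and the list of cookies are worth reading together.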

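调味包 posted on 2024-7-9 09:00

Last edited by 调味包 on 2024-7-9 09:10

I wouldn't bother scripting the login here. Internal company sites generally don't encrypt anything, so find the data endpoint you need and call it with the cookie attached. Just keep an eye on the cookie's lifetime: when it expires, copy the latest one from the browser. Here is a quick bookmarklet for copying a page's cookie. Create a new bookmark and paste the code into its URL field:
javascript:(function(){var oInput=document.createElement('input');oInput.value=document.cookie;document.body.appendChild(oInput);oInput.select();document.execCommand("Copy");oInput.className='oInput';oInput.style.display='none';alert('复制成功');})()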
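On the Python side, the copied string has to be turned back into name/value pairs before requests can use it. A minimal sketch, assuming the clipboard holds the usual "name=value; name=value" format; parse_cookie_string is a hypothetical helper and the cookie values are invented.

```python
def parse_cookie_string(raw):
    """Split a 'k1=v1; k2=v2' cookie string into a dict."""
    cookies = {}
    for part in raw.split(";"):
        # partition keeps any '=' inside the value intact
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies

# Example string in the format document.cookie produces (values made up).
copied = "JSESSIONID=abc123; lang=zh_CN"
cookies = parse_cookie_string(copied)
print(cookies)
```

The resulting dict can then be loaded into the session with ssn.cookies.update(cookies) before calling the data endpoint. One caveat: document.cookie cannot see cookies flagged HttpOnly, so if the session cookie is missing from the copied string, take it from the request headers shown in the browser's developer tools instead.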

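lanterxyz posted on 2024-7-9 09:34

A verification field in the headers may be missing. Either use Fiddler to build and replay the request and work out which fields are actually required, or write out the request's full headers, data, and cookies by hand. Don't cut corners.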
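A sketch of carrying browser-like headers on every request through the session. Which headers this particular server checks is unknown; the three below are common candidates and their values are guesses to be replaced with what the browser actually sends. The request is only prepared, not sent, so the merged headers can be inspected offline.

```python
import requests

session = requests.Session()

# Headers set on the session are merged into every request it makes.
# These values are assumptions; copy the real ones from the browser.
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": "http://10.1.19.203:8080/wzgy/jt/contract/flow/contract_list_node_1.jsp",
    "X-Requested-With": "XMLHttpRequest",
})

# Placeholder endpoint; prepare_request merges session headers without sending.
request = requests.Request("POST", "http://example.invalid/search", json={"pageIndex": "0"})
prepared = session.prepare_request(request)

for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

Comparing this printed list against the headers Fiddler or the browser's developer tools show for the same request is a quick way to spot a missing field.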

valve117 posted on 2024-7-9 09:45

laos posted on 2024-7-9 06:36
Print the cookies and compare them.

I printed the cookies, so why are they all empty?