Last edited by 三木零 on 2021-7-21 17:34
When the spider runs, start_requests issues the POST request, and the response is handed to the parse function.
But I don't know how to call start_requests again to send another POST request.
Could some expert please explain?
I tried writing a callback in parse myself, as below, but it raises this error:
```python
class CarSpider(scrapy.Spider):
    n = 1
    name = 'car'
    # allowed_domains = ['x']
    start_urls = ['https://qdfront.pcauto.com.cn/interface/usedcar/chelistInfolist.xsp']

    def parse(self, response, **kwargs):
        # print("returned data")
        data = json.loads(response.text)["data"]["list"]
        print(data[0]["city"])
        url = response.url
        yield response.follow(url=url, callback=self.start_urls)

    def start_requests(self):
        post_data = {
            "city": f"{CarSpider.n}",
            "cookie": "eC7VPHaUXx0x5v9fDbaYSd2Pjnrzb3j8TGEdoh",
            "isvip": "1",
            "legoOffsetExtra": '{"offset": "0","lastStrategy": "exact","businessType": "cpcall1002176","capacity": "40"}',
            "searchType": "2"
        }
        CarSpider.n += 1
        yield scrapy.FormRequest(url=self.start_urls[0], formdata=post_data)
```
Here's the error message. I can tell it's a type error, but I really have no idea how to fix it:

```
TypeError: callback must be a callable, got list
```
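The error happens because `self.start_urls` is a list attribute on the spider, while Scrapy requires `callback` to be a callable such as `self.parse`. A minimal plain-Python sketch (no Scrapy needed, `CarSpiderLike` is just an illustrative stand-in) showing the distinction Scrapy is checking:

```python
# Scrapy validates that `callback` is callable before using it;
# `start_urls` is a plain list attribute, so passing it as a callback
# triggers the TypeError above.
class CarSpiderLike:
    start_urls = ['https://example.com']  # a list -> not callable

    def parse(self, response):            # a method -> callable
        pass

spider = CarSpiderLike()
print(callable(spider.start_urls))  # False: rejected as a callback
print(callable(spider.parse))       # True: valid as a callback
```

So the fix for the snippet above is to pass a method (e.g. `callback=self.parse`), never the `start_urls` list.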
After more experimenting, the code below can now keep sending POST requests, but the first request is still a GET:
```python
class CarSpider(scrapy.Spider):
    n = 1
    name = 'car'
    # allowed_domains = ['x']
    start_urls = ['https://qdfront.pcauto.com.cn/interface/usedcar/chelistInfolist.xsp']

    def parse(self, response, **kwargs):
        post_data = {
            "city": f"{CarSpider.n}",
            "cookie": "eC7VPHaUXx0x5v9fDbaYSd2Pjnrzb3j8TGEdoh",
            "isvip": "1",
            "legoOffsetExtra": '{"offset": "0","lastStrategy": "exact","businessType": "cpcall1002176","capacity": "40"}',
            "searchType": "2"
        }
        CarSpider.n += 1
        url = response.url
        yield scrapy.FormRequest(url=url, formdata=post_data, callback=self.parse)
        print(response.text)
```
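The first request comes out as a GET because the default `Spider.start_requests` simply yields a plain `Request` for each URL in `start_urls`, and `Request` defaults to the GET method. A simplified, hedged imitation of that default behavior (not Scrapy's actual source, just the shape of it):

```python
# Simplified stand-in for Scrapy's default start_requests: it walks
# start_urls and yields ordinary Requests, which default to GET.
class DefaultSpiderLike:
    start_urls = ['https://example.com/a', 'https://example.com/b']

    def start_requests(self):
        for url in self.start_urls:
            # scrapy.Request(url) with no method argument -> "GET"
            yield ('GET', url)

methods = [method for method, _ in DefaultSpiderLike().start_requests()]
print(methods)  # ['GET', 'GET']
```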
emmm... after tinkering with it for a while I solved it myself.
```python
class CarSpider(scrapy.Spider):
    n = 2
    name = 'car'
    # allowed_domains = ['x']
    start_urls = ['https://qdfront.pcauto.com.cn/interface/usedcar/chelistInfolist.xsp']

    def parse(self, response, **kwargs):
        post_data = {
            "city": f"{CarSpider.n}",
            "cookie": "eC7VPHaUXx0x5v9fDbaYSd2Pjnrzb3j8TGEdoh",
            "isvip": "1",
            "legoOffsetExtra": '{"offset": "0","lastStrategy": "exact","businessType": "cpcall1002176","capacity": "40"}',
            "searchType": "2"
        }
        CarSpider.n += 1
        url = response.url
        yield scrapy.FormRequest(url=url, formdata=post_data, callback=self.parse)
        print(len(json.loads(response.text)))

    def start_requests(self):
        yield scrapy.FormRequest(
            url=self.start_urls[0],
            callback=self.parse,
            formdata={
                "city": "1",
                "cookie": "eC7VPHaUXx0x5v9fDbaYSd2Pjnrzb3j8TGEdoh",
                "isvip": "1",
                "legoOffsetExtra": '{"offset": "0","lastStrategy": "exact","businessType": "cpcall1002176","capacity": "40"}',
                "searchType": "2"
            }
        )
```
To keep sending POST requests, use scrapy.FormRequest() inside the parse function with callback=self.parse, so it re-requests itself repeatedly.
But that alone leaves one problem: the very first request defaults to GET.
If you want every request to be a POST, you have to override the start_requests() method.
That method is what Scrapy calls to issue the spider's first request, and it only runs once.
Just override it and send a POST to the url inside it, and you're done.
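One caveat with the working version above: parse re-queues the next FormRequest unconditionally, so the spider would never stop on its own. A hedged sketch of one possible stop condition, shown here in plain Python outside Scrapy (the function name `has_more` is my own, not from the original code): stop once the API's "list" comes back empty.

```python
import json

def has_more(response_text):
    # Mirror of the check parse could do before yielding the next
    # FormRequest: continue only while the API returns a non-empty list.
    data = json.loads(response_text)["data"]["list"]
    return len(data) > 0

# Inside parse you would guard the yield, e.g.:
#     if has_more(response.text):
#         yield scrapy.FormRequest(url=url, formdata=post_data, callback=self.parse)
print(has_more('{"data": {"list": [{"city": "1"}]}}'))  # True
print(has_more('{"data": {"list": []}}'))               # False
```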