Sending multiple POST requests with Scrapy

三木零 posted on 2021-7-21 17:12

Last edited by 三木零 on 2021-7-21 17:34

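When the spider runs, it sends a POST request from start_requests, and the response that comes back is passed to the parse function.
What I can't figure out is how to call start_requests again to send another POST request.
Could someone point me in the right direction?
I tried adding a callback in parse myself, as in the code below, but it fails with an error: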
import json

import scrapy


class CarSpider(scrapy.Spider):
    n = 1  # class-level counter, sent as the "city" form field
    name = 'car'
    # allowed_domains = ['x']
    start_urls = ['https://qdfront.pcauto.com.cn/interface/usedcar/chelistInfolist.xsp']

    def parse(self, response, **kwargs):
        # print("returned data")
        data = json.loads(response.text)["data"]["list"]
        print(data["city"])
        url = response.url
        # Bug: self.start_urls is a list, not a callable, so this line raises the TypeError
        yield response.follow(url=url, callback=self.start_urls)

    def start_requests(self):
        post_data = {
            "city": f"{CarSpider.n}",
            "cookie": "eC7VPHaUXx0x5v9fDbaYSd2Pjnrzb3j8TGEdoh",
            "isvip": "1",
            "legoOffsetExtra": '{"offset": "0","lastStrategy": "exact","businessType": "cpcall1002176","capacity": "40"}',
            "searchType": "2"
        }
        CarSpider.n += 1
        # start_urls is a list, so pass its first element as the URL
        yield scrapy.FormRequest(url=self.start_urls[0], formdata=post_data)
Here is the error. I can see it is a type error, but I really don't know how to fix it:
TypeError: callback must be a callable, got list
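
For context on the message: the callback argument has to be something that can be called later with the response, such as the bound method self.parse, whereas self.start_urls is a plain list. The sketch below reproduces the same kind of check with standard-library Python only; DemoSpider and check_callback are hypothetical stand-ins for illustration and are not Scrapy's own code.

```python
class DemoSpider:
    """Hypothetical stand-in for a spider; it does not use Scrapy."""

    start_urls = ["https://example.com/list"]  # placeholder URL

    def parse(self, response):
        return response


def check_callback(callback):
    """Reject anything that cannot be called, mirroring the error in the post."""
    if callback is not None and not callable(callback):
        raise TypeError(
            f"callback must be a callable, got {type(callback).__name__}"
        )
    return callback


spider = DemoSpider()

# A bound method is callable, so it passes the check.
accepted = check_callback(spider.parse)

# A list is not callable, so the check raises a TypeError.
try:
    check_callback(spider.start_urls)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

So the value passed as callback should be a method such as self.parse, not the start_urls attribute.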
After more experimenting, the code below does send POST requests, but the very first request is still a GET:
import scrapy


class CarSpider(scrapy.Spider):
    n = 1  # class-level counter, sent as the "city" form field
    name = 'car'
    # allowed_domains = ['x']
    start_urls = ['https://qdfront.pcauto.com.cn/interface/usedcar/chelistInfolist.xsp']

    # start_requests is not overridden here, so the first request is a plain GET
    def parse(self, response, **kwargs):
        post_data = {
            "city": f"{CarSpider.n}",
            "cookie": "eC7VPHaUXx0x5v9fDbaYSd2Pjnrzb3j8TGEdoh",
            "isvip": "1",
            "legoOffsetExtra": '{"offset": "0","lastStrategy": "exact","businessType": "cpcall1002176","capacity": "40"}',
            "searchType": "2"
        }
        CarSpider.n += 1
        url = response.url
        # Send the next POST and route its response back to parse
        yield scrapy.FormRequest(url=url, formdata=post_data, callback=self.parse)
        print(response.text)
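
The GET comes from this version not defining start_requests: in Scrapy versions that use that method, the default implementation builds one plain request per entry in start_urls, and a plain request defaults to GET. A toy model of that inheritance, in standard-library Python only (ToyRequest, ToySpider, DefaultSpider and PostSpider are hypothetical names, not Scrapy classes, and the URL is a placeholder):

```python
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    """Hypothetical request object: GET unless form data is supplied."""

    url: str
    formdata: dict = field(default_factory=dict)

    @property
    def method(self):
        return "POST" if self.formdata else "GET"


class ToySpider:
    """Hypothetical base class modelling a default start_requests."""

    start_urls = []

    def start_requests(self):
        # Default behaviour: one plain (GET) request per start URL.
        for url in self.start_urls:
            yield ToyRequest(url)


class DefaultSpider(ToySpider):
    start_urls = ["https://example.com/list"]  # placeholder URL


class PostSpider(ToySpider):
    start_urls = ["https://example.com/list"]  # placeholder URL

    def start_requests(self):
        # Overriding the method lets the first request carry form data.
        yield ToyRequest(self.start_urls[0], formdata={"city": "1"})


default_methods = [r.method for r in DefaultSpider().start_requests()]
post_methods = [r.method for r in PostSpider().start_requests()]
print(default_methods, post_methods)
```

In this model only the subclass that overrides start_requests sends its first request as a POST.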

Hmm... after working through it slowly, I solved it myself.
import json

import scrapy


class CarSpider(scrapy.Spider):
    n = 2  # start_requests already sends city 1, so parse continues from 2
    name = 'car'
    # allowed_domains = ['x']
    start_urls = ['https://qdfront.pcauto.com.cn/interface/usedcar/chelistInfolist.xsp']

    def parse(self, response, **kwargs):
        post_data = {
            "city": f"{CarSpider.n}",
            "cookie": "eC7VPHaUXx0x5v9fDbaYSd2Pjnrzb3j8TGEdoh",
            "isvip": "1",
            "legoOffsetExtra": '{"offset": "0","lastStrategy": "exact","businessType": "cpcall1002176","capacity": "40"}',
            "searchType": "2"
        }
        CarSpider.n += 1
        url = response.url
        # Send the next POST and route its response back to parse.
        # Note: nothing here ever stops the chain.
        yield scrapy.FormRequest(url=url, formdata=post_data, callback=self.parse)
        print(len(json.loads(response.text)))

    def start_requests(self):
        # Overridden so that the very first request is a POST too
        yield scrapy.FormRequest(
            url=self.start_urls[0],  # start_urls is a list; the URL must be a string
            callback=self.parse,
            formdata={
                "city": "1",
                "cookie": "eC7VPHaUXx0x5v9fDbaYSd2Pjnrzb3j8TGEdoh",
                "isvip": "1",
                "legoOffsetExtra": '{"offset": "0","lastStrategy": "exact","businessType": "cpcall1002176","capacity": "40"}',
                "searchType": "2"
            }
        )
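To send POST requests repeatedly, yield scrapy.FormRequest() from parse with callback=self.parse, so that each response schedules the next request.
On its own, though, this has one problem: the first request is sent as a GET by default.
To make every request a POST, you need to override the start_requests() method.
Scrapy calls this method once, when the spider starts, to produce the initial requests.
Override it and send the POST request to the url from inside it, and that is all it takes.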
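
The flow described here can be sketched without Scrapy as a small queue-driven loop. Everything in it is a hypothetical toy (ToyCarSpider, run, fake_fetch and the max_city limit are not from the original spider or from Scrapy). The limit is an assumption added only so that the chain ends; the spider as posted keeps scheduling the next city with no stop condition.

```python
from collections import deque


class ToyCarSpider:
    """Hypothetical spider: one initial request, then parse chains the rest."""

    max_city = 3  # assumed stop condition, not in the original spider

    def start_requests(self):
        # Produces the initial request; the loop below calls this only once.
        yield {"method": "POST", "city": 1, "callback": self.parse}

    def parse(self, response):
        next_city = response["city"] + 1
        if next_city <= self.max_city:
            # Schedule the next POST and point it back at parse.
            yield {"method": "POST", "city": next_city, "callback": self.parse}


def fake_fetch(request):
    """Stand-in for the network: echoes the requested city back."""
    return {"city": request["city"]}


def run(spider):
    """Toy scheduler: drain a queue of requests, feeding responses to callbacks."""
    log = []
    queue = deque(spider.start_requests())
    while queue:
        request = queue.popleft()
        log.append((request["method"], request["city"]))
        response = fake_fetch(request)
        queue.extend(request["callback"](response))
    return log


log = run(ToyCarSpider())
print(log)
```

In the real spider the equivalent would be to stop yielding FormRequest once a response has no more data, though what the endpoint returns at that point is not shown in the post.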

三木零 posted on 2021-7-21 17:35

In the end I solved it myself after all.

吾爱破解 - 52pojie.cn's Archiver