吾爱破解 - 52pojie.cn

[Python Repost] My boss asked me to monitor bidding announcements, so to save the 60-yuan-a-month membership fee I wrote a crawler

家有葫芦仔 posted on 2020-6-30 15:59
This post was last edited by 120254184 on 2020-6-30 16:33

I work in paid-search bidding (no conversations? blame the ad program; no deals? blame sales). My boss probably thought I looked idle, so he asked me to check Shandong bidding announcements every day. I asked a colleague in the office; she had paid out of her own pocket for a membership on some aggregator site that lets you see every source in one place. It turned out the membership costs 60 yuan a month, so I gritted my teeth and decided not to buy it, and instead wrote my own crawler with the Python I had just learned. It's crude, but it gets the job done, and writing it made me a bit more comfortable with Python. This is pure beginner-style code; fellow newcomers are welcome to discuss it, and I'd ask the experts not to mock me. Embarrassing update: an `if` below was on the wrong line and I never noticed, which is why I was getting so little data. Luckily I fixed it before anyone spotted it.
[Python] code
# -*- coding:UTF-8 -*-
import requests
import time
from lxml import etree
import re
import json

need = ['馆', '展厅', '厅', '装修', '设计', '施工', '景观', '展', '装饰', '基地', '党建', '文化', '空间', '线上', '数字', '策划', '提升', '美丽乡村']
no_need_1 = '中标'
no_need_2 = '结果'
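# --- Hedged sketch (not part of the original post) ---------------------------
# Every site function below repeats the same join-and-filter tail. A shared
# helper could collapse that duplication; the names here (filter_rows, keep,
# drop) are illustrative, not from the original code.

```python
def filter_rows(dates, titles, urls, keep, drop):
    """Join each (date, title, url) triple and yield (keyword, row) for rows
    that contain a keep keyword and none of the drop keywords."""
    for row in zip(dates, titles, urls):  # zip stops at the shortest list
        text = ''.join(row)
        for kw in keep:
            if kw in text and not any(d in text for d in drop):
                yield kw, text
                break  # report each row at most once
```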
def hzggzyjyw(url):
    # Scrape one announcement-list page and print the rows that match the
    # keyword filters above. The other site functions below follow the same
    # pattern and differ only in their XPath expressions and base URL.
    t = []  # titles wrapped in '|'
    u = []  # absolute detail-page URLs
    q = []  # joined "date|title|url" strings
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    time.sleep(3)
    r = requests.get(url, headers=headers).text
    html = etree.HTML(r)
    title = html.xpath('/html/body/div[2]/div[2]/div[2]/div/div[2]/ul/li/a/text()')
    url_1 = html.xpath('/html/body/div[2]/div[2]/div[2]/div/div[2]/ul/li/a/@href')
    time_rq = html.xpath('/html/body/div[2]/div[2]/div[2]/div/div[2]/ul/li/span/text()')
    for title_1 in title:
        title_2 = '|' + title_1 + '|'
        t.append(title_2)
    for url_2 in url_1:
        url_3 = 'http://www.hzsggzyjyzx.gov.cn' + url_2
        u.append(url_3)
    list_word = zip(time_rq, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            # tt is a string, so the original "tt != []" check was always true
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)

def bzggzyjyw(url):
    t = []
    u = []
    q = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    time.sleep(3)
    r = requests.get(url, headers=headers).text
    html = etree.HTML(r)
    title = html.xpath('//*[@id="right"]/table/tr[1]/td/table/tr/td[2]/a/text()')
    url_1 = html.xpath('//*[@id="right"]/table/tr[1]/td/table/tr/td[2]/a/@href')
    time_rq = html.xpath('//*[@id="right"]/table/tr[1]/td/table/tr/td[3]/text()')
    for title_1 in title:
        title_2 = '|' + title_1 + '|'
        t.append(title_2)
    for url_2 in url_1:
        url_3 = 'http://ggzyjy.binzhou.gov.cn' + url_2
        u.append(url_3)
    list_word = zip(time_rq, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)

def lcggzyjyw(url):
    t = []
    u = []
    q = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    time.sleep(3)
    r = requests.get(url, headers=headers).text
    html = etree.HTML(r)
    title = html.xpath('/html/body/div[3]/div[2]/div/div[2]/ul/li/a/text()')
    url_1 = html.xpath('/html/body/div[3]/div[2]/div/div[2]/ul/li/a/@href')
    time_rq = html.xpath('/html/body/div[3]/div[2]/div/div[2]/ul/li/span/text()')
    for title_1 in title:
        title_2 = '|' + title_1 + '|'
        t.append(title_2)
    for url_2 in url_1:
        url_3 = 'http://www.lcsggzyjy.cn' + url_2
        u.append(url_3)
    list_word = zip(time_rq, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)

def lyggzyjyw(url):
    t = []
    u = []
    q = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    time.sleep(3)
    r = requests.get(url, headers=headers).text
    html = etree.HTML(r)
    title = html.xpath('/html/body/div[3]/div[2]/div/div[2]/ul/li/a/text()')
    url_1 = html.xpath('/html/body/div[3]/div[2]/div/div[2]/ul/li/a/@href')
    time_rq = html.xpath('/html/body/div[3]/div[2]/div/div[2]/ul/li/span/text()')
    for title_1 in title:
        title_2 = '|' + title_1 + '|'
        t.append(title_2)
    for url_2 in url_1:
        url_3 = 'http://ggzyjy.linyi.gov.cn' + url_2
        u.append(url_3)
    list_word = zip(time_rq, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)

def rzggzyjyw(url):
    t = []
    u = []
    q = []
    e = []
    m = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    time.sleep(3)
    r = requests.get(url, headers=headers).text
    html = etree.HTML(r)
    title = html.xpath('//*[@id="DataList1"]/tr/td/li/a/div[1]/text()')
    url_1 = html.xpath('//*[@id="DataList1"]/tr/td/li/a/@href')
    for url_2 in url_1:
        url_3 = url_2[2:]
        e.append(url_3)
    time_rq = html.xpath('//*[@id="DataList1"]/tr/td/li/a/div[2]/text()')
    for title_1 in title:
        title_2 = '|' + title_1.strip() + '|'
        t.append(title_2)
    for e_1 in e:
        e_2 = 'http://ggzyjy.rizhao.gov.cn' + e_1
        u.append(e_2)
    for time_rq_1 in time_rq:
        time_rq_2 = time_rq_1.strip()
        m.append(time_rq_2)
    list_word = zip(m, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)

def whggzyjyw(url):
    t = []
    u = []
    q = []
    e = []
    m = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    time.sleep(3)
    r = requests.get(url, headers=headers).text
    html = etree.HTML(r)
    for i in range(1, 11):
        title = html.xpath('/html/body/div[4]/div[3]/div/ul/li[' + str(i) + ']/div/a/text()')
        title_1 = ''.join(title).strip()
        e.append(title_1)
    time_rq = html.xpath('/html/body/div[4]/div[3]/div/ul/li/div/div/text()')
    for time_rq_1 in time_rq:
        time_rq_2 = time_rq_1.strip()
        m.append(time_rq_2)
    url_1 = html.xpath('/html/body/div[4]/div[3]/div/ul/li/div/a/@href')  # was li[1], which grabbed only the first link
    for url_2 in url_1:
        url_3 = 'http://www.whggzyjy.cn'+url_2
        u.append(url_3)
    for e_1 in e:
        e_2 = '|'+e_1+'|'
        t.append(e_2)
    list_word = zip(m, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)

def taggzyjyw(url):
    t = []
    u = []
    q = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    time.sleep(3)
    r = requests.get(url, headers=headers).text
    html = etree.HTML(r)
    title = html.xpath('//*[@id="right_table"]/table/tr/td[2]/a/text()')
    url_1 = html.xpath('//*[@id="right_table"]/table/tr/td[2]/a/@href')
    time_rq = html.xpath('//*[@id="right_table"]/table/tr/td[3]/text()')
    for title_1 in title:
        title_2 = '|' + title_1 + '|'
        t.append(title_2)
    for url_2 in url_1:
        url_3 = 'http://www.taggzyjy.com.cn' + url_2
        u.append(url_3)
    list_word = zip(time_rq, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)

def jnggzyjyw(url):
    t = []
    u = []
    q = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
        'Content-Type': 'application/json'
    }
    time.sleep(3)
    payloadData = {
        "FilterText": "",
        "categoryCode": "503000",
        "maxResultCount": 20,
        "skipCount": 0,
        "tenantId": "3"
    }
    data = json.dumps(payloadData)
    r = requests.post(url, data=data, headers=headers).text
    title = re.findall(r'title":"(.*?)",', r)
    url_1 = re.findall(r'"id":"(.*?)"},', r)
    time_rq = re.findall(r'"releaseDate":"(.*?)T', r)
    for title_1 in title:
        title_2 = '|' + title_1 + '|'
        t.append(title_2)
    for url_2 in url_1:
        url_3 = 'http://ggzy.jining.gov.cn/JiNing/Bulletins/Detail/' + url_2+'/?CategoryCode=503000'
        u.append(url_3)
    list_word = zip(time_rq, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)
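# --- Hedged sketch (not part of the original post) ---------------------------
# jnggzyjyw above extracts fields from the JSON response with regexes, which
# breaks as soon as field order or escaping changes. Since the endpoint
# already returns JSON, json.loads is sturdier. The 'result'/'items' envelope
# key names are an assumption (typical of this API style); verify them
# against a real response.

```python
import json

def parse_bulletins(body):
    """Parse a JSON body into (date, title, url) rows; envelope keys assumed."""
    data = json.loads(body)
    items = data.get('result', {}).get('items', [])  # assumed envelope keys
    rows = []
    for item in items:
        rows.append((item.get('releaseDate', '').split('T')[0],
                     '|' + item.get('title', '') + '|',
                     'http://ggzy.jining.gov.cn/JiNing/Bulletins/Detail/'
                     + item.get('id', '') + '/?CategoryCode=503000'))
    return rows
```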

def wfggzyjyw(url):
    t = []
    u = []
    q = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    time.sleep(3)
    r = requests.get(url, headers=headers).text
    html = etree.HTML(r)
    title = html.xpath('//*[@class="info-form"]/table/tbody/tr/td[3]/span/a/text()')
    url_1 = html.xpath('//*[@class="info-form"]/table/tbody/tr/td[3]/span/a/@href')
    time_rq = html.xpath('//*[@class="info-form"]/table/tbody/tr/td[4]/span/text()')
    for title_1 in title:
        title_2 = '|' + title_1 + '|'
        t.append(title_2)
    for url_2 in url_1:
        url_3 = 'http://ggzy.weifang.gov.cn' + url_2
        u.append(url_3)
    list_word = zip(time_rq, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)

def dyggzyjyw(url):
    t = []
    u = []
    q = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    time.sleep(3)
    r = requests.get(url, headers=headers).text
    html = etree.HTML(r)
    title = html.xpath('//*[@height="25"]/td[2]/a/font/text()')
    url_1 = html.xpath('//*[@height="25"]/td[2]/a/@href')
    time_rq = html.xpath('//*[@height="25"]/td[3]/text()')
    for title_1 in title:
        title_2 = '|' + title_1 + '|'
        t.append(title_2)
    for url_2 in url_1:
        url_3 = 'http://ggzy.dongying.gov.cn' + url_2
        u.append(url_3)
    list_word = zip(time_rq, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)

def zzggzyjyw(url):
    t = []
    u = []
    q = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    time.sleep(3)
    r = requests.get(url, headers=headers).text
    html = etree.HTML(r)
    title = html.xpath('//*[@width="98%"]/tr/td[3]/a/text()')
    url_1 = html.xpath('//*[@width="98%"]/tr/td[3]/a/@href')
    time_rq = html.xpath('//*[@width="98%"]/tr/td[4]/text()')
    for title_1 in title:
        title_2 = '|' + title_1 + '|'
        t.append(title_2)
    for url_2 in url_1:
        url_3 = 'http://www.zzggzy.com' + url_2
        u.append(url_3)
    list_word = zip(time_rq, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)

def zbggzyjyw(url):
    t = []
    u = []
    q = []
    e = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    time.sleep(3)
    r = requests.get(url, headers=headers).text
    html = etree.HTML(r)
    title = html.xpath('//*[@id="MoreInfoList1_DataGrid1"]/tr/td[2]/a/text()')
    url_1 = html.xpath('//*[@id="MoreInfoList1_DataGrid1"]/tr/td[2]/a/@href')
    time_rq = html.xpath('//*[@id="MoreInfoList1_DataGrid1"]/tr/td[3]/text()')
    for time_rq_1 in time_rq:
        time_rq_2 = time_rq_1.strip()
        e.append(time_rq_2)
    for title_1 in title:
        title_2 = '|' + title_1 + '|'
        t.append(title_2)
    for url_2 in url_1:
        url_3 = 'http://ggzyjy.zibo.gov.cn' + url_2
        u.append(url_3)
    list_word = zip(e, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)

def qdggzyjyw(url):
    t = []
    u = []
    q = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    time.sleep(3)
    r = requests.get(url, headers=headers).text
    html = etree.HTML(r)
    title = html.xpath('//*[@class="info_con"]/table/tr/td/a/@title')
    url_1 = html.xpath('//*[@class="info_con"]/table/tr/td/a/@href')
    time_rq = html.xpath('//*[@class="info_con"]/table/tr/td[2]/text()')
    for title_1 in title:
        title_2 = '|' + title_1 + '|'
        t.append(title_2)
    for url_2 in url_1:
        url_3 = 'https://ggzy.qingdao.gov.cn' + url_2
        u.append(url_3)
    list_word = zip(time_rq, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)

def sdggzyjyzx(url):
    u = []
    t = []
    q = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    r = requests.get(url, headers=headers).text
    html = etree.HTML(r)
    title = html.xpath('//*[@class="ewb-list"]/li/a/text()')
    date = html.xpath('//*[@class="ewb-list"]/li/span/text()')
    url_1 = html.xpath('//*[@class="ewb-list"]/li/a/@href')
    for url_2 in url_1:
        url_3 = 'http://ggzyjyzx.shandong.gov.cn' + url_2
        u.append(url_3)
    for title_1 in title:
        title_2 = ' | ' + title_1 + ' | '
        t.append(title_2)
    list_word = zip(date, t, u)
    for list_word_1 in list_word:
        list_word_2 = ''.join(list_word_1)
        q.append(list_word_2)
    for tt in q:
        for need_1 in need:
            if need_1 in tt and no_need_1 not in tt and no_need_2 not in tt:
                print(need_1)
                print(tt)
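# --- Hedged sketch (not part of the original post) ---------------------------
# The __main__ block below drives everything through one long if-chain on i.
# A table of (label, crawler, urls) tasks removes most of that duplication;
# run_tasks is an illustrative name, not part of the original code.

```python
def run_tasks(tasks, delay=3):
    """tasks is a list like [('Qingdao notices', qdggzyjyw, [url1, url2]), ...]
    Each crawler is called once per URL, pausing `delay` seconds between hits."""
    import time
    for label, crawler, urls in tasks:
        print(label)
        for url in urls:
            crawler(url)
            time.sleep(delay)
```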

if __name__ == '__main__':
    for i in range(1, 33):
        if i == 1:
            print('正在爬取山东公共资源交易中心(政府采购)')
            url = 'http://ggzyjyzx.shandong.gov.cn/003/003001/003001001/moreinfo.html'
            sdggzyjyzx(url)
        time.sleep(3)
        if i == 2:
            print('正在爬取山东公共资源交易中心(建筑工程)')
            url = 'http://ggzyjyzx.shandong.gov.cn/003/003004/003004001/moreinfo.html'
            sdggzyjyzx(url)
        time.sleep(3)
        if i == 3:
            for z in range(1, 5):
                url = 'https://ggzy.qingdao.gov.cn/Tradeinfo-GGGSList/0-0-0?pageIndex=' + str(z)
                print('正在爬取青岛公共资源交易平台工程建设招标公告第' + str(z) + '页')
                qdggzyjyw(url)
        time.sleep(3)
        if i == 4:
            for z in range(1, 5):
                url = 'https://ggzy.qingdao.gov.cn/Tradeinfo-GGGSList/1-1-0?pageIndex=' + str(z)
                print('正在爬取青岛公共资源交易平台政府采购采购公告第' + str(z) + '页')
                qdggzyjyw(url)
        time.sleep(3)
        if i ==5:
            for z in range(1,13):
                if z ==1:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002001/002001001/002001001001/MoreInfo.aspx?CategoryNum=268698113'
                    print('正在爬取淄博公共资源交易平台建设工程招标公告(市本级)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z ==2:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002001/002001001/002001001002/MoreInfo.aspx?CategoryNum=268698114'
                    print('正在爬取淄博公共资源交易平台建设工程招标公告(张店区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z ==3:
                    url ='http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002001/002001001/002001001013/MoreInfo.aspx?CategoryNum=268698123'
                    print('正在爬取淄博公共资源交易平台建设工程招标公告(淄川区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z ==4:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002001/002001001/002001001009/MoreInfo.aspx?CategoryNum=2001001009'
                    print('正在爬取淄博公共资源交易平台建设工程招标公告(博山区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z ==5:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002001/002001001/002001001010/MoreInfo.aspx?CategoryNum=268698120'
                    print('正在爬取淄博公共资源交易平台建设工程招标公告(周村区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z ==6:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002001/002001001/002001001007/MoreInfo.aspx?CategoryNum=268698119'
                    print('正在爬取淄博公共资源交易平台建设工程招标公告(临淄区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z ==7:
                    url ='http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002001/002001001/002001001008/MoreInfo.aspx?CategoryNum=2001001008'
                    print('正在爬取淄博公共资源交易平台建设工程招标公告(桓台县)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z ==8:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002001/002001001/002001001006/MoreInfo.aspx?CategoryNum=268698118'
                    print('正在爬取淄博公共资源交易平台建设工程招标公告(高青县)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z ==9:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002001/002001001/002001001012/MoreInfo.aspx?CategoryNum=268698122'
                    print('正在爬取淄博公共资源交易平台建设工程招标公告(沂源县)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z ==10:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002001/002001001/002001001004/MoreInfo.aspx?CategoryNum=268698116'
                    print('正在爬取淄博公共资源交易平台建设工程招标公告(高新区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z ==11:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002001/002001001/002001001005/MoreInfo.aspx?CategoryNum=268698117'
                    print('正在爬取淄博公共资源交易平台建设工程招标公告(文昌湖区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z ==12:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002001/002001001/002001001011/MoreInfo.aspx?CategoryNum=268698121'
                    print('正在爬取淄博公共资源交易平台建设工程招标公告(经济开发区)')
                    time.sleep(3)
                    zbggzyjyw(url)
        if i == 6:
            for z in range(1, 13):
                if z == 1:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002002/002002001/002002001001/MoreInfo.aspx?CategoryNum=268960257'
                    print('正在爬取淄博公共资源交易平台政府采购招标公告(市本级)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z == 2:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002002/002002001/002002001002/MoreInfo.aspx?CategoryNum=268960258'
                    print('正在爬取淄博公共资源交易平台政府采购招标公告(张店区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z == 3:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002002/002002001/002002001011/MoreInfo.aspx?CategoryNum=268960265'
                    print('正在爬取淄博公共资源交易平台政府采购招标公告(淄川区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z == 4:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002002/002002001/002002001007/MoreInfo.aspx?CategoryNum=268960263'
                    print('正在爬取淄博公共资源交易平台政府采购招标公告(博山区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z == 5:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002002/002002001/002002001012/MoreInfo.aspx?CategoryNum=268960266'
                    print('正在爬取淄博公共资源交易平台政府采购招标公告(周村区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z == 6:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002002/002002001/002002001009/MoreInfo.aspx?CategoryNum=2002001009'
                    print('正在爬取淄博公共资源交易平台政府采购招标公告(临淄区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z == 7:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002002/002002001/002002001010/MoreInfo.aspx?CategoryNum=268960264'
                    print('正在爬取淄博公共资源交易平台政府采购招标公告(桓台县)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z == 8:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002002/002002001/002002001008/MoreInfo.aspx?CategoryNum=2002001008'
                    print('正在爬取淄博公共资源交易平台政府采购招标公告(高青县)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z == 9:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002002/002002001/002002001006/MoreInfo.aspx?CategoryNum=268960262'
                    print('正在爬取淄博公共资源交易平台政府采购招标公告(沂源县)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z == 10:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002002/002002001/002002001004/MoreInfo.aspx?CategoryNum=268960260'
                    print('正在爬取淄博公共资源交易平台政府采购招标公告(高新区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z == 11:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002002/002002001/002002001005/MoreInfo.aspx?CategoryNum=268960261'
                    print('正在爬取淄博公共资源交易平台政府采购招标公告(文昌湖区)')
                    time.sleep(3)
                    zbggzyjyw(url)
                if z == 12:
                    url = 'http://ggzyjy.zibo.gov.cn/TPFront/jyxx/002002/002002001/002002001013/MoreInfo.aspx?CategoryNum=268960267'
                    print('正在爬取淄博公共资源交易平台政府采购招标公告(经济开发区)')
                    time.sleep(3)
                    zbggzyjyw(url)
        if i ==7:
            for z in range(1,3):
                if z ==1:
                    url = 'http://www.zzggzy.com/TPFront/jyxx/070001/070001001/'
                    print('正在爬取枣庄公共资源交易平台建设工程招标公告')
                    time.sleep(3)
                    zzggzyjyw(url)
                if z==2:
                    for y in range(1,4):
                        url = 'http://www.zzggzy.com/TPFront/jyxx/070002/070002001/Paging='+str(y)
                        print('正在爬取枣庄公共资源交易平台政府采购采购公告第'+str(y)+'页')
                        time.sleep(3)
                        zzggzyjyw(url)
        if i ==8:
            for z in range(1,10):
                url = 'http://ggzy.dongying.gov.cn/dyweb/004/004001/004001001/0040010010'+str(z).rjust(2,'0')+'/'
                print('正在爬取东营公共资源交易平台工程建设招标公告第'+str(z)+'区')
                time.sleep(3)
                dyggzyjyw(url)
        if i ==9:
            for z in range(1,10):
                url = 'http://ggzy.dongying.gov.cn/dyweb/004/004002/004002001/0040020010'+str(z).rjust(2,'0')+'/'
                print('正在爬取东营公共资源交易平台政府采购招标公告第'+str(z)+'区')
                time.sleep(3)
                dyggzyjyw(url)
        if i ==10:
            for z in range(1,4):
                url = 'http://ggzy.weifang.gov.cn/wfggzy/showinfo/moreinfo_gg.aspx?address=&type=&categorynum=004012001&Paging='+str(z)
                print('正在爬取潍坊公共资源交易平台工程建设招标公告第'+str(z)+'页')
                time.sleep(3)
                wfggzyjyw(url)
        if i ==11:
            for z in range(1,6):
                url = 'http://ggzy.weifang.gov.cn/wfggzy/showinfo/moreinfo_gg_zfcgtwo.aspx?address=&type=&categorynum=004002001&Paging='+str(z)
                print('正在爬取潍坊公共资源交易平台政府采购招标公告第'+str(z)+'页')
                time.sleep(3)
                wfggzyjyw(url)
        if i ==12:
            for z in range(1,4):
                url = 'http://ggzy.weifang.gov.cn/wfggzy/showinfo/moreinfo_gg_zfcg_cgxq.aspx?address=&categorynum=004002017&Paging='+str(z)
                print('正在爬取潍坊公共资源交易平台政府需求招标公告第'+str(z)+'页')
                time.sleep(3)
                wfggzyjyw(url)
        if i ==13:
            url = 'http://ggzy.jining.gov.cn/api/services/app/stPrtBulletin/GetBulletinList'
            print('正在爬取济宁公共资源交易平台建筑工程招标公告')
            time.sleep(3)
            jnggzyjyw(url)
        if i ==14:
            for z in range(1,8):
                url = 'http://www.taggzyjy.com.cn/Front/jyxx/075001/075001001/07500100100'+str(z)+'/'
                print('正在爬取泰安公共资源交易平台建设工程招标公告第'+str(z)+'区')
                time.sleep(3)
                taggzyjyw(url)
        if i ==15:
            for x in range(1,3):
                if x ==1:
                    for z in range(1,8):
                        url = 'http://www.taggzyjy.com.cn/Front/jyxx/075002/075002004/07500200400'+str(z)+'/'
                        print('正在爬取泰安公共资源交易平台政府采购需求公告第' + str(z) + '区')
                        time.sleep(3)
                        taggzyjyw(url)
                if x ==2:
                    for z in range(1,8):
                        url = 'http://www.taggzyjy.com.cn/Front/jyxx/075002/075002001/07500200100'+str(z)+'/'
                        print('正在爬取泰安公共资源交易平台政府采购采购公告第' + str(z) + '区')
                        time.sleep(3)
                        taggzyjyw(url)
        if i ==16:
            for z in range(1,10):
                print('正在爬取威海公共资源交易平台招标公告')
                if z ==1:
                    url = 'http://www.whggzyjy.cn/queryContent-jyxx.jspx?title=&inDates=&ext=&origin=&channelId=563&beginTime=&endTime='
                    time.sleep(1)
                    whggzyjyw(url)
                if z ==2:
                    url = 'http://www.whggzyjy.cn/queryContent-jyxx.jspx?title=&inDates=&ext=&origin=%E7%8E%AF%E7%BF%A0&channelId=563&beginTime=&endTime='
                    time.sleep(1)
                    whggzyjyw(url)
                if z ==3:
                    url = 'http://www.whggzyjy.cn/queryContent-jyxx.jspx?title=&inDates=&ext=&origin=%E9%AB%98%E5%8C%BA&channelId=563&beginTime=&endTime='
                    time.sleep(1)
                    whggzyjyw(url)
                if z ==4:
                    url = 'http://www.whggzyjy.cn/queryContent-jyxx.jspx?title=&inDates=&ext=&origin=%E7%BB%8F%E5%8C%BA&channelId=563&beginTime=&endTime='
                    time.sleep(1)
                    whggzyjyw(url)
                if z ==5:
                    url = 'http://www.whggzyjy.cn/queryContent-jyxx.jspx?title=&inDates=&ext=&origin=%E4%B8%B4%E6%B8%AF&channelId=563&beginTime=&endTime='
                    time.sleep(1)
                    whggzyjyw(url)
                if z ==6:
                    url = 'http://www.whggzyjy.cn/queryContent-jyxx.jspx?title=&inDates=&ext=&origin=%E8%8D%A3%E6%88%90&channelId=563&beginTime=&endTime='
                    time.sleep(1)
                    whggzyjyw(url)
                if z ==7:
                    url = 'http://www.whggzyjy.cn/queryContent-jyxx.jspx?title=&inDates=&ext=&origin=%E6%96%87%E7%99%BB&channelId=563&beginTime=&endTime='
                    time.sleep(1)
                    whggzyjyw(url)
                if z ==8:
                    url = 'http://www.whggzyjy.cn/queryContent-jyxx.jspx?title=&inDates=&ext=&origin=%E4%B9%B3%E5%B1%B1&channelId=563&beginTime=&endTime='
                    time.sleep(1)
                    whggzyjyw(url)
                if z ==9:
                    url ='http://www.whggzyjy.cn/queryContent-jyxx.jspx?title=&inDates=&ext=&origin=%E5%8D%97%E6%B5%B7&channelId=563&beginTime=&endTime='
                    time.sleep(1)
                    whggzyjyw(url)
        if i ==17:
            for z in range(1,3):
                print('正在爬取威海公共资源交易平台政府采购公告')
                url ='http://www.whggzyjy.cn/jyxxzfcg/index_'+str(z)+'.jhtml'
                time.sleep(3)
                whggzyjyw(url)
        if i ==18:
            for z in range(1,3):
                url = 'http://ggzyjy.rizhao.gov.cn/rzwz/ShowInfo/MoreJyxxList.aspx?categoryNum=071001001&Paging='+str(z)
                print('正在爬取日照公共资源交易平台建设招标公告'+str(z)+'页')
                time.sleep(3)
                rzggzyjyw(url)
        if i ==19:
            for z in range(1,3):
                url = 'http://ggzyjy.rizhao.gov.cn/rzwz/ShowInfo/MoreJyxxList.aspx?categoryNum=071002001&Paging='+str(z)
                print('正在爬取日照公共资源交易平台需求公告' + str(z) + '页')
                time.sleep(3)
                rzggzyjyw(url)
        if i ==20:
            for z in range(1,4):
                url = 'http://ggzyjy.rizhao.gov.cn/rzwz/ShowInfo/MoreJyxxList.aspx?categoryNum=071002002&Paging='+str(z)
                print('正在爬取日照公共资源交易平台采购公告' + str(z) + '页')
                time.sleep(3)
                rzggzyjyw(url)
        if i ==21:
            print('正在爬取临沂公共资源交易平台建设招标公告')
            for z in range(1,7):
                url = 'http://ggzyjy.linyi.gov.cn/TPFront/jyxx/074001/074001001/07400100100'+str(z)+'/'
                time.sleep(3)
                lyggzyjyw(url)
        if i ==22:
            print('正在爬取临沂公共资源交易平台需求公告')
            for z in range(1,8):
                url = 'http://ggzyjy.linyi.gov.cn/TPFront/jyxx/074002/074002001/07400200100'+str(z)+'/'
                time.sleep(3)
                lyggzyjyw(url)
        if i ==23:
            print('正在爬取临沂公共资源交易平台需求招标公告')
            for z in range(1,8):
                url = 'http://ggzyjy.linyi.gov.cn/TPFront/jyxx/074002/074002002/07400200200'+str(z)+'/'
                time.sleep(3)
                lyggzyjyw(url)
        if i ==24:
            print('正在爬取德州公共资源交易平台招标公告')
            for z in range(1,5):
                if z ==1:
                    url='http://ggzyjy.dezhou.gov.cn/TPFront/xmxx/004001/004001005/004001005001/'
                    time.sleep(3)
                    lyggzyjyw(url)
                if z ==2:
                    url = 'http://ggzyjy.dezhou.gov.cn/TPFront/xmxx/004001/004001001/004001001001/'
                    time.sleep(3)
                    lyggzyjyw(url)
                if z ==3:
                    url = 'http://ggzyjy.dezhou.gov.cn/TPFront/xmxx/004002/004002005/004002005001/'
                    time.sleep(3)
                    lyggzyjyw(url)
                if z ==4:
                    url = 'http://ggzyjy.dezhou.gov.cn/TPFront/xmxx/004002/004002001/004002001001/'
                    time.sleep(3)
                    lyggzyjyw(url)
        if i ==25:
            print('正在爬取聊城公共资源交易平台建设招标公告')
            for z in range(1,6):
                for x in range(1,15):
                    url = 'http://www.lcsggzyjy.cn/lcweb/jyxx/079001/079001001/07900100100'+str(z)+'/0790010010010'+str(x).rjust(2,'0')+'/'
                    time.sleep(3)
                    lcggzyjyw(url)
        if i ==26:
            print('正在爬取聊城公共资源交易平台需求招标公告')
            for z in range(7,21):
                url = 'http://www.lcsggzyjy.cn/lcweb/jyxx/079002/079002001/0790020010'+str(z).rjust(2,'0')+'/'
                time.sleep(3)
                lcggzyjyw(url)
        if i ==27:
            print('正在爬取滨州公共资源交易平台建设招标公告')
            for z in range(1,12):
                url = 'http://ggzyjy.binzhou.gov.cn/bzweb/002/002004/002004001/0020040010'+str(z).rjust(2,'0')+'/'
                time.sleep(3)
                bzggzyjyw(url)
        if i ==28:
            print('正在爬取滨州公共资源交易平台需求招标公告')
            for z in range(1,12):
                url = 'http://ggzyjy.binzhou.gov.cn/bzweb/002/002005/002005008/0020050080'+str(z).rjust(2,'0')+'/'
                time.sleep(3)
                bzggzyjyw(url)
        if i ==29:
            print('正在爬取滨州公共资源交易平台采购招标公告')
            for z in range(1,12):
                url = 'http://ggzyjy.binzhou.gov.cn/bzweb/002/002005/002005004/0020050040'+str(z).rjust(2,'0')+'/'
                time.sleep(3)
                bzggzyjyw(url)
        if i ==30:
            print('正在爬取菏泽公共资源交易平台建设招标公告')
            url = 'http://www.hzsggzyjyzx.gov.cn/jyxx/001001/001001001/about.html'
            time.sleep(3)
            hzggzyjyw(url)
        if i ==31:
            print('正在爬取菏泽公共资源交易平台需求招标公告')
            url = 'http://www.hzsggzyjyzx.gov.cn/jyxx/001002/001002001/about.html'
            time.sleep(3)
            hzggzyjyw(url)
        if i ==32:
            print('正在爬取菏泽公共资源交易平台采购招标公告')
            for z in range(1,4):
                if z ==1:
                    url = 'http://www.hzsggzyjyzx.gov.cn/jyxx/001002/001002003/about.html'
                    time.sleep(3)
                    hzggzyjyw(url)
                else:
                    url = 'http://www.hzsggzyjyzx.gov.cn/jyxx/001002/001002003/'+str(z)+'.html'
                    time.sleep(3)
                    hzggzyjyw(url)
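The section-code URLs in the script above are all built the same way: a fixed prefix plus a zero-padded page number (`str(x).rjust(2, '0')`). A small helper can factor that out; `build_page_urls` is a name made up for this sketch, not something in the original script:

```python
def build_page_urls(base, prefix, pages):
    """Return one URL per page: base + prefix + 2-digit page code + '/'.

    f-string padding {x:02d} gives the same result as str(x).rjust(2, '0').
    """
    return [f"{base}{prefix}{x:02d}/" for x in pages]

# Same URLs as the Liaocheng demand-notice loop (i == 26) builds by hand.
urls = build_page_urls(
    'http://www.lcsggzyjy.cn/lcweb/jyxx/079002/079002001/',
    '0790020010',
    range(7, 21),
)
```

Each per-site loop could then reduce to one `build_page_urls` call plus the throttled fetch.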



Recommended
fireandice posted on 2020-6-30 16:01
Come by my office after work...
Recommended
叶子嘤咛 posted on 2020-8-6 09:40
The if chain in main() can be dropped: store the index-to-type mapping in a dict instead of all those if checks,
and refactor the methods to improve reuse.
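The dict-based dispatch this reply describes could look like the minimal sketch below; `TASKS` and `run` are hypothetical names, only two of the original entries are shown, and the fetch function is passed in so the sketch stays testable:

```python
# Table-driven replacement for the long `if i == ...` chain in main():
# each entry maps a task index to (description, URL template, page range).
TASKS = {
    18: ('日照公共资源交易平台建设招标公告',
         'http://ggzyjy.rizhao.gov.cn/rzwz/ShowInfo/MoreJyxxList.aspx?categoryNum=071001001&Paging={}',
         range(1, 3)),
    19: ('日照公共资源交易平台需求公告',
         'http://ggzyjy.rizhao.gov.cn/rzwz/ShowInfo/MoreJyxxList.aspx?categoryNum=071002001&Paging={}',
         range(1, 3)),
}

def run(i, fetch):
    """Look up task i and fetch every page; no if chain needed."""
    name, template, pages = TASKS[i]
    for z in pages:
        print('正在爬取' + name + str(z) + '页')
        fetch(template.format(z))  # real script: time.sleep(3); rzggzyjyw(url)
```

Adding a new section then means adding one dict entry instead of another if block.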
Recommended
guowq posted on 2020-6-30 16:02
Learned something. I'm leaving crawlers for the summer break; hope the study pays off.
4#
codinglife posted on 2020-6-30 16:05
Respect to the expert!
5#
家有葫芦仔 (OP) posted on 2020-6-30 16:06
fireandice posted on 2020-6-30 16:01
Come by my office after work...

I wasn't slacking off, so why would I come to your office?
6#
家有葫芦仔 (OP) posted on 2020-6-30 16:07
guowq posted on 2020-6-30 16:02
Learned something. I'm leaving crawlers for the summer break; hope the study pays off.

It's really simple. Study hard, and once you've mastered it I'll copy your code.
7#
叶丶 posted on 2020-6-30 16:08
Impressive!!!
8#
阿狂 posted on 2020-6-30 16:09
Wow, this is good stuff. Could you extend it to cover the whole country, or let users add their own sites to crawl?
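A user-configurable site list, as this reply asks for, might be sketched with a JSON config that names an existing parser function per site; `CONFIG` and `run_all` are hypothetical names, and only one illustrative entry is shown:

```python
import json

# Hypothetical JSON-driven site list: adding a site means editing the config,
# not the code. "parser" names map to the script's existing functions
# (rzggzyjyw, lyggzyjyw, ...), passed in via the `parsers` dict.
CONFIG = json.loads('''[
  {"parser": "rzggzyjyw",
   "url": "http://ggzyjy.rizhao.gov.cn/rzwz/ShowInfo/MoreJyxxList.aspx?categoryNum=071001001&Paging={}",
   "pages": [1, 2]}
]''')

def run_all(parsers, config=CONFIG):
    """Call the named parser on every page URL of every configured site."""
    for site in config:
        fetch = parsers[site['parser']]
        for page in site['pages']:
            fetch(site['url'].format(page))
```

In real use the config would live in a separate file loaded with `json.load`, so non-programmers could append new sites.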
9#
guowq posted on 2020-6-30 16:11
120254184 posted on 2020-6-30 16:07
It's really simple. Study hard, and once you've mastered it I'll copy your code.

I hope that day comes too.
10#
zhangchuanfei posted on 2020-6-30 16:13
Nice work, supporting you.