吾爱破解 - 52pojie.cn

[Other Original Work] Baidu Netdisk resource-search crawler, source code open-sourced

Original poster
gude posted on 2016-11-20 17:03
Last edited by gude on 2017-1-10 22:24

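A Baidu Netdisk crawler I wrote out of boredom on 10.1 (October 1).
I crawled about 11 million records (1100w) in total and built a netdisk search site on them: http://www.fastsoso.cn/
Source code: https://github.com/gudegg/yunSpider
To use a precompiled build directly:
  • Download the version that matches your system.
  • Edit the config.ini configuration file and put it in the directory the program runs from.
  • Linux: run chmod +x spider, then either start it with ./spider or run it in the background with nohup ./spider 1>log.out 2>err.out &. Windows: double-click spider.exe.
  • Download: https://github.com/gudegg/yunSpider/releases
The README there also includes the Java source code.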

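Comment

A crawler basically visits a site's sub-pages in order. For example, to crawl domain/1 you start at 1, increment to 2, and keep visiting page by page until you reach the end point you set.   Posted on 2016-12-1 12:49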
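
The ID-walking idea in this comment can be sketched in a few lines of Java. This is only an illustration of the commenter's description, not necessarily how yunSpider itself crawls. The class name SequentialCrawler, the example.com base URL, and the fetcher function (a stand-in for a real HTTP call) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch of ID-walking: visit base/1, base/2, ... up to a configured end. */
final class SequentialCrawler {

    /** Builds the page URLs for every ID from start to end, inclusive. */
    static List<String> buildUrls(String base, long start, long end) {
        List<String> urls = new ArrayList<>();
        for (long id = start; id <= end; id++) {
            urls.add(base + id);
        }
        return urls;
    }

    /**
     * Visits each URL in order and collects the non-null results.
     * The fetcher is a hypothetical stand-in for a real HTTP call
     * and is expected to return null when a page cannot be fetched.
     */
    static List<String> crawl(String base, long start, long end,
                              Function<String, String> fetcher) {
        List<String> pages = new ArrayList<>();
        for (String url : buildUrls(base, start, end)) {
            String body = fetcher.apply(url);
            if (body != null) {
                pages.add(body);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // Stub fetcher: pretends page 2 is missing and echoes the URL otherwise.
        Function<String, String> stub =
                url -> url.endsWith("/2") ? null : "content of " + url;
        for (String page : crawl("http://example.com/", 1, 3, stub)) {
            System.out.println(page);
        }
    }
}
```

Keeping the fetcher as a parameter means the loop can be tried without any network access. A real crawler would also need rate limiting and retry handling, which this sketch leaves out.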

Recommended
神枪泡泡丶 posted on 2016-11-20 17:18
Timid question: what is a crawler...?
Recommended
gude (original poster) posted on 2016-11-23 16:09
yhxi1714 posted on 2016-11-23 15:21
Could you share the OkhttpUtil class?

[Java]
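import com.google.common.io.ByteStreams;
import com.squareup.okhttp.Callback;
import com.squareup.okhttp.FormEncodingBuilder;
import com.squareup.okhttp.Headers;
import com.squareup.okhttp.MediaType;
import com.squareup.okhttp.OkHttpClient;
import com.squareup.okhttp.Request;
import com.squareup.okhttp.RequestBody;
import com.squareup.okhttp.Response;
import com.squareup.okhttp.ResponseBody;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.util.Map;

// HTTP helper built on OkHttp 2.x (com.squareup.okhttp) and Guava's ByteStreams.
public class OkhttpUtil {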
    public static final OkHttpClient OK_HTTP_CLIENT = new OkHttpClient();
    public static final MediaType MEDIA_TYPE_JSON = MediaType.parse("application/json; charset=utf-8");
    public static final MediaType MEDIA_TYPE_TEXT = MediaType.parse("text/x-markdown; charset=utf-8");

    static {
        // Keep cookie state across requests by sharing one in-memory cookie store.
        CookieManager cookieManager = new CookieManager();
        CookieHandler.setDefault(cookieManager);
        cookieManager.setCookiePolicy(CookiePolicy.ACCEPT_ALL);
        OK_HTTP_CLIENT.setCookieHandler(cookieManager);
    }


    public static ResponseBody syncGet(String url) {
        Request request = new Request.Builder().url(url).build();
        return getResponseBody(request);
    }

    public static ResponseBody syncGet(String url, Headers headers) {
        Request request = null;
        if (headers != null) {
            request = new Request.Builder().url(url).headers(headers).build();
        } else {
            request = new Request.Builder().url(url).build();
        }
        return getResponseBody(request);
    }

    public static ResponseBody syncGet(String url, Headers headers, Map<Object, Object> params) {
        Request.Builder builder = new Request.Builder();
        StringBuilder sb = new StringBuilder(url);
        judgeParams(params, sb);
        if (headers != null) {
            builder = builder.url(sb.toString()).headers(headers);
        } else {
            builder = builder.url(sb.toString());
        }
        Request request = builder.build();
        return getResponseBody(request);
    }


    public static void asynGet(String url, Headers headers, Map<Object, Object> params, Callback callback) {
        Request.Builder builder = new Request.Builder();
        StringBuilder sb = new StringBuilder(url);
        judgeParams(params, sb);
        if (headers != null) {
            builder = builder.url(sb.toString()).headers(headers);
        } else {
            builder = builder.url(sb.toString());
        }
        Request request = builder.build();
        OK_HTTP_CLIENT.newCall(request).enqueue(callback);
    }

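    // Appends params to the URL in sb as a query string, URL-encoding keys and values.
    private static void judgeParams(Map<Object, Object> params, StringBuilder sb) {
        if (params == null || params.isEmpty()) {
            return;
        }
        // Continue an existing query string instead of adding a second "?".
        char separator = sb.indexOf("?") >= 0 ? '&' : '?';
        try {
            for (Map.Entry<Object, Object> entry : params.entrySet()) {
                sb.append(separator)
                        .append(java.net.URLEncoder.encode(String.valueOf(entry.getKey()), "UTF-8"))
                        .append('=')
                        .append(java.net.URLEncoder.encode(String.valueOf(entry.getValue()), "UTF-8"));
                separator = '&';
            }
        } catch (java.io.UnsupportedEncodingException e) {
            // UTF-8 is available on every JVM, so this should never happen.
            throw new IllegalStateException(e);
        }
    }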

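    // Executes the request synchronously. Returns the body on a 2xx response, otherwise null.
    // The caller is responsible for closing the returned ResponseBody.
    private static ResponseBody getResponseBody(Request request) {
        try {
            Response response = OK_HTTP_CLIENT.newCall(request).execute();
            if (response.isSuccessful()) {
                return response.body();
            }
            // Close the unused body so the connection is released back to the pool.
            response.body().close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }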

    // POSTs params as a URL-encoded form.
    public static ResponseBody syncPostForm(String url, Headers headers, Map<String, String> params) {
        Request.Builder builder = new Request.Builder().url(url);
        if (params != null && params.size() > 0) {
            FormEncodingBuilder formEncodingBuilder = new FormEncodingBuilder();
            for (Map.Entry<String, String> entry : params.entrySet()) {
                formEncodingBuilder.add(entry.getKey(), entry.getValue());
            }
            builder.post(formEncodingBuilder.build());
        } else {
            // Without a body the builder would default to GET, so send an empty POST instead.
            builder.post(RequestBody.create(null, new byte[0]));
        }
        if (headers != null) {
            builder.headers(headers);
        }
        return getResponseBody(builder.build());
    }

    public static ResponseBody syncPostByJson(String url, Headers headers, String json) {
        Request.Builder builder = new Request.Builder();
        RequestBody requestBody = RequestBody.create(MEDIA_TYPE_JSON, json);
        builder.post(requestBody);
        if (headers != null) {
            builder = builder.url(url).headers(headers);
        } else {
            builder = builder.url(url);
        }
        Request request = builder.build();
        return getResponseBody(request);
    }

    public static ResponseBody syncPostByString(String url, Headers headers, String str) {
        Request.Builder builder = new Request.Builder();
        RequestBody requestBody = RequestBody.create(MEDIA_TYPE_TEXT, str);
        builder.post(requestBody);
        if (headers != null) {
            builder = builder.url(url).headers(headers);
        } else {
            builder = builder.url(url);
        }
        Request request = builder.build();
        return getResponseBody(request);

    }

    public static ResponseBody syncPostByStream(String url, Headers headers, InputStream is) throws IOException {
        Request.Builder builder = new Request.Builder();
        RequestBody requestBody = RequestBody.create(MEDIA_TYPE_TEXT, ByteStreams.toByteArray(is));
        builder.post(requestBody);
        if (headers != null) {
            builder = builder.url(url).headers(headers);
        } else {
            builder = builder.url(url);
        }
        Request request = builder.build();
        return getResponseBody(request);
    }

    public static ResponseBody syncPostByFile(String url, Headers headers, File file) {
        Request.Builder builder = new Request.Builder();
        RequestBody requestBody = RequestBody.create(MEDIA_TYPE_TEXT, file);
        builder.post(requestBody);
        if (headers != null) {
            builder = builder.url(url).headers(headers);
        } else {
            builder = builder.url(url);
        }
        Request request = builder.build();
        return getResponseBody(request);
    }

    public static ResponseBody syncPost(String url, RequestBody requestBody, Headers headers) {
        Request.Builder builder = new Request.Builder();
        builder.post(requestBody);
        if (headers != null) {
            builder = builder.url(url).headers(headers);
        } else {
            builder = builder.url(url);
        }
        Request request = builder.build();
        return getResponseBody(request);
    }
}
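
To see what URL the params overloads of syncGet end up requesting, the query-string step can be tried on its own with just the JDK. The sketch below is a standalone illustration, not part of OkhttpUtil: the names QueryStringDemo and withParams and the example.com URL are assumptions. It percent-encodes keys and values, which matters once a value contains spaces or non-ASCII characters, and uses a LinkedHashMap so the parameter order is predictable.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryStringDemo {

    /** Appends params to url as a URL-encoded query string. */
    static String withParams(String url, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(url);
        // Continue an existing query string instead of adding a second "?".
        char separator = url.contains("?") ? '&' : '?';
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                sb.append(separator)
                  .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
                separator = '&';
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is available on every JVM, so this should never happen.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "java okhttp");
        params.put("page", "2");
        // prints http://example.com/search?q=java+okhttp&page=2
        System.out.println(withParams("http://example.com/search", params));
    }
}
```

URLEncoder follows HTML form encoding, so a space becomes "+" rather than "%20". That is fine inside a query string but would be wrong for a path segment.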


3#
featmellwo posted on 2016-11-20 17:19
4#
adalyb posted on 2016-11-20 17:21
Haha, this deserves a bump.
5#
缥缈的心情 posted on 2016-11-20 17:23
Is it written in Java? I don't have that environment on my computer, only Python.
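6#
gude (original poster) posted on 2016-11-20 17:25
featmellwo posted on 2016-11-20 17:19
Same question: what is a crawler, and what is it good for?

Once you have crawled the data, you can build a search site on it. I crawled 11 million records (1100w) and built https://www.zgdgude.cn/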
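
As a rough picture of what "build a search site on the crawled data" means, here is a minimal keyword search over a handful of in-memory records. It is only a sketch and says nothing about how the actual site is implemented: the Share record, its title and url fields, and the example.com links are all made up for the example, and with 11 million rows a real site would rely on a database or a full-text index rather than a linear scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ShareSearch {

    /** One crawled share (hypothetical shape): a title plus the link it was found at. */
    static final class Share {
        final String title;
        final String url;

        Share(String title, String url) {
            this.title = title;
            this.url = url;
        }
    }

    /** Returns the shares whose title contains the keyword, ignoring case. */
    static List<Share> search(List<Share> shares, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<Share> hits = new ArrayList<>();
        for (Share share : shares) {
            if (share.title.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(share);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Share> shares = new ArrayList<>();
        shares.add(new Share("Java tutorial videos", "http://example.com/s/1"));
        shares.add(new Share("Holiday photos", "http://example.com/s/2"));
        for (Share hit : search(shares, "java")) {
            System.out.println(hit.title + " -> " + hit.url);
        }
    }
}
```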
7#
gude (original poster) posted on 2016-11-20 17:26
缥缈的心情 posted on 2016-11-20 17:23
Is it written in Java? I don't have that environment on my computer, only Python.

There are two versions: Java and Golang.
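8#
yelidewo posted on 2016-11-20 17:27
Any screenshots? What is this... used for?
9#
featmellwo posted on 2016-11-20 17:28
gude posted on 2016-11-20 17:25
Once you have crawled the data, you can build a search site on it. I crawled 11 million records (1100w) and built https://www.zgdgude.cn/

...Okay, I'm still more interested in your website, lol (23333).
10#
15820956473 posted on 2016-11-20 17:36
Nice. Could you leave your "v" contact?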