吾爱破解 - 52pojie.cn

[Java, repost] Java novel crawler

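三木猿 posted on 2020-8-25 17:25
On a whim I wanted to read some novels, but the ads on the reading sites drove me up the wall, so I wrote my own novel crawler. It can download a novel as a txt file or show it for reading online. The code is still being updated. I later added comic crawling as well. I'm new to crawlers, so feedback is welcome. Gitee: https://gitee.com/sen_yang/SanMuYuanBook
There is no database; everything is scraped on demand at reading time, so the environment is easy to set up.
[Java] code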
public class CartoonChapter {
    public static CartoonService cartoonService;
    public static CartoonCatalogue setDataSource(String cartoonCod,String chapterCod, String dataSource,CartoonService cartoonService) {
        CartoonChapter.cartoonService=cartoonService;
        SSLHelper.init();
        if ("gufengmh8".equals(dataSource)||"36mh".equals(dataSource)) {
            return gufengmh8(dataSource,cartoonCod,chapterCod);
        }
        return null;
    }
    private static CartoonCatalogue gufengmh8(String dataSource,String cartoonCod,String chapterCod) {
        CartoonCatalogue cartoonCatalogue = new CartoonCatalogue();
        List<String> list = new ArrayList<>();
        int i = 0;
        while (true) {
            Document document = null;
            if (i == 0) {
                document = GetDocument.getDocument("https://m."+dataSource+".com/manhua/", cartoonCod, chapterCod);
            } else {
                document = GetDocument.getDocument("https://m."+dataSource+".com/manhua/", cartoonCod, chapterCod + "-" + i);
            }
            if (i == 0) {
                if("36mh".equals(dataSource)){
                    Elements elementsByClass = document.getElementsByClass("p10 title3");
                    cartoonCatalogue.setCatalogueName(elementsByClass.get(0).childNode(1).childNode(1).childNode(1).toString().replace(">",""));
                }else{
                    Elements elementsByClass = document.getElementsByClass("BarTit");
                    cartoonCatalogue.setCatalogueName(elementsByClass.get(0).text());
                }
                CartoonCatalogueDto cartoonCatalogue1 = cartoonService.getCartoonCatalogue(cartoonCod,dataSource);
                List<CartoonCatalogue> cartoonCatalogues = cartoonCatalogue1.getCartoonCatalogues();
                cartoonCatalogues.forEach(e -> {
                    if (e.getCatalogueCode().toString().equals(chapterCod)) {
                        cartoonCatalogue.setNextCode(e.getNextCode());
                        cartoonCatalogue.setUpCode(e.getUpCode());
                    }
                });
            }

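            String src = getSrc(document,dataSource);
            // Stop when no image was found or the site serves its default cover,
            // which marks the page after the last one
            if (src == null || src.isEmpty() || "https://res.xiaoqinre.com/images/default/cover.png".equals(src)||"https://img001.shmkks.com/images/default/cover.png".equals(src)) {
                break;
            }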

            list.add(src);
            i++;
        }
        cartoonCatalogue.setCatalogueSrc(list);
        return cartoonCatalogue;
    }
    private static String getSrc(Document document,String data) {
        Elements elementsByClass=null;
        Node childNode=null;
        if("36mh".equals(data)){
            elementsByClass = document.getElementsByClass("UnderPage");
            childNode = elementsByClass.get(0).childNode(5).childNode(1);
        }else{
            elementsByClass = document.getElementsByClass("chapter-content");
            childNode = elementsByClass.get(0).childNode(3).childNode(0);
        }
        String src = null;
        List<Node> nodes = childNode.childNodes();
        for (Node node : nodes) {
            if (!"".equals(node.toString().trim())) {
                src = node.attr("src");
            }
        }
        return src;
    }
}
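The paging loop in `gufengmh8` can be separated from the HTTP and HTML details. Below is a minimal sketch of that idea, not the project's code. It assumes a caller-supplied `fetchSrc` function (hypothetical, standing in for `GetDocument.getDocument` plus `getSrc`) and adds a `maxPages` cap that the original loop does not have. The host `img.example.com` and the chapter code `100` are made-up test values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class PagedImagesSketch {
    /** Page 0 uses the bare chapter code; later pages append "-<index>", as in the code above. */
    static String pageName(String chapterCod, int index) {
        return index == 0 ? chapterCod : chapterCod + "-" + index;
    }

    /**
     * Collects image URLs page by page until the fetcher returns null, returns a URL
     * ending in the placeholder suffix, or maxPages pages have been read.
     */
    static List<String> collect(String chapterCod, Function<String, String> fetchSrc,
                                String placeholderSuffix, int maxPages) {
        List<String> srcs = new ArrayList<>();
        for (int i = 0; i < maxPages; i++) {
            String src = fetchSrc.apply(pageName(chapterCod, i));
            if (src == null || src.endsWith(placeholderSuffix)) {
                break;
            }
            srcs.add(src);
        }
        return srcs;
    }

    public static void main(String[] args) {
        // Fake fetcher (assumption): three real pages, then the default cover image.
        Function<String, String> fake = page -> {
            if (page.equals("100") || page.equals("100-1") || page.equals("100-2")) {
                return "https://img.example.com/" + page + ".jpg";
            }
            return "https://img.example.com/images/default/cover.png";
        };
        System.out.println(collect("100", fake, "/images/default/cover.png", 500));
    }
}
```

With the cap in place, a site that never serves the placeholder image cannot keep the loop running forever.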
[Java] code
public class BookChapter {
    public static BookCatalogueDto setDataSource(String dataSource, String bookCod, String chapterCod) {
        SSLHelper.init();
        if ("biquge5200".equals(dataSource)) {
            return biquge5200(bookCod, chapterCod);
        } else if ("biquge".equals(dataSource)) {
            return biquge(bookCod, chapterCod);
        }
        return null;
    }

    private static BookCatalogueDto biquge5200(String bookCod, String chapterCod) {
        BookCatalogueDto bookCatalogueDto = new BookCatalogueDto();
        Document document;
        try {
            document = Jsoup.connect("https://www.biquge5200.com/" + bookCod + "/" + chapterCod + ".html").get();
        } catch (IOException e) {
            e.printStackTrace();
            // Without the page there is nothing to parse; return null as for an unknown data source
            return null;
        }
        // The chapter title is in the <h1> element
        Elements chapterName = document.select("h1");
        bookCatalogueDto.setCatalogueName(chapterName.text());
        String p = document.select("#content").html();
        Elements elementsByClass = document.getElementsByClass("bottem1");
        // Next chapter: the <a>下一章</a> ("next chapter") link in the "bottem1" bar
        String node = elementsByClass.get(0).childNode(9).toString();
        Pattern pattern = Pattern.compile("<a\\s*href=\"?([\\w\\W]*?)\"?[\\s]*?[^>]>([\\s\\S]*?)(?=</a>)");
        Matcher matcher = pattern.matcher(node);
        if (matcher.find()) {
            String nameCodeUrl = matcher.group(1);
            if(!("https://www.biquge5200.com/"+bookCod+"/").equals(nameCodeUrl)) {
                String insStr = nameCodeUrl.substring(nameCodeUrl.lastIndexOf("/") + 1, nameCodeUrl.lastIndexOf("."));
                bookCatalogueDto.setNextCode(Integer.parseInt(insStr));
            }
        }
        // Previous chapter
        String node1 = elementsByClass.get(0).childNode(5).toString();
        Matcher matcher1 = pattern.matcher(node1);
        if (matcher1.find()) {
            String nameCodeUrl = matcher1.group(1);
            if(!("https://www.biquge5200.com/"+bookCod+"/").equals(nameCodeUrl)){
                String insStr = nameCodeUrl.substring(nameCodeUrl.lastIndexOf("/") + 1, nameCodeUrl.lastIndexOf("."));
                bookCatalogueDto.setUpCode(Integer.parseInt(insStr));
            }
        }
        String str = p.replace("<div id='content' style='width: 85%;'>", "")
                .replace("\n", "")
                .replace("</div>", "")
                .replace("<p>", "<p data-type=\"2\"><span class=\"content-wrap\">")
                .replace("</p>","</span></p>");
        bookCatalogueDto.setCatalogueText(str);
        bookCatalogueDto.setCatalogueCod(Integer.parseInt(chapterCod));
        bookCatalogueDto.setBookCod(bookCod);

        return bookCatalogueDto;
    }

    private static BookCatalogueDto biquge(String bookCod, String chapterCod) {
        BookCatalogueDto bookCatalogue = new BookCatalogueDto();
        Document document;
        try {
            document = Jsoup.connect("https://www.biquge.com/" + bookCod + "/" + chapterCod + ".html").get();
        } catch (Exception e) {
            e.printStackTrace();
            // Without the page there is nothing to parse; return null as for an unknown data source
            return null;
        }
        String p = document.select("#content").html();
        bookCatalogue.setCatalogueText(p);
        bookCatalogue.setCatalogueCod(Integer.parseInt(chapterCod));
        Elements chapterName = document.select("h1");
        bookCatalogue.setCatalogueName(chapterName.text());
        Elements next = document.getElementsByClass("next");
        String nextHtml = next.get(0).html();
        Pattern pattern = Pattern.compile("<a\\s*href=\"?([\\w\\W]*?)\"?[\\s]*?[^>]>([\\s\\S]*?)(?=</a>)");
        Matcher matcher = pattern.matcher(nextHtml);
        if (matcher.find()) {
            String nameCodeUrl = matcher.group(1);
            String insStr = nameCodeUrl.substring(nameCodeUrl.lastIndexOf("/") + 1, nameCodeUrl.lastIndexOf("."));
            bookCatalogue.setNextCode(Integer.parseInt(insStr));
        }
        return bookCatalogue;
    }


}
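Both methods above pull the chapter number out of a link with `substring(lastIndexOf("/") + 1, lastIndexOf("."))`, which throws when the link points at the book's index page (no file name after the last slash). Here is a minimal sketch of a guarded version. The class and method names are hypothetical, the `HREF` pattern is a simplified stand-in for the thread's regex, and the chapter numbers in `main` are made up.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChapterLinkSketch {
    // Captures the href value of an <a> tag (simplified stand-in for the pattern used above).
    private static final Pattern HREF = Pattern.compile("<a\\s+[^>]*?href=\"([^\"]*)\"");

    /** Returns the numeric chapter code of a chapter URL, or empty if the URL is not a chapter page. */
    static OptionalInt chapterCode(String url) {
        int slash = url.lastIndexOf('/');
        int dot = url.lastIndexOf('.');
        // An index-page link such as ".../0_7/" has no "<number>.html" after the last slash.
        if (dot <= slash + 1) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(url.substring(slash + 1, dot)));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    /** Extracts the href from an anchor's HTML and parses its chapter code. */
    static OptionalInt chapterCodeFromAnchor(String anchorHtml) {
        Matcher m = HREF.matcher(anchorHtml);
        return m.find() ? chapterCode(m.group(1)) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(chapterCodeFromAnchor(
                "<a href=\"https://www.biquge5200.com/0_7/2046.html\">next chapter</a>"));
        System.out.println(chapterCodeFromAnchor(
                "<a href=\"https://www.biquge5200.com/0_7/\">next chapter</a>"));
    }
}
```

A caller would set `nextCode` only when the result is present, so the last chapter of a book needs no special case per site.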


OP | 三木猿 posted on 2020-8-26 11:01
[Java] code
package com.aaa.data;

import com.aaa.config.HttpsUtil;
import com.aaa.dto.BookCatalogueDto;
import com.aaa.entity.BookCatalogue;
import com.aaa.service.BookService;
import com.aaa.service.impl.BookServiceImpl;
import com.aaa.util.Download;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.Elements;

import javax.servlet.http.HttpServletRequest;
import java.io.*;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class DownloadBook {
    private static HttpServletRequest request;
    private static String dataSource;
    private static Pattern pattern = Pattern.compile("<a\\s*href=\"?([\\w\\W]*?)\"?[\\s]*?[^>]>([\\s\\S]*?)(?=</a>)");

    public static void setDataSource(String dataSource, HttpServletRequest request) {
        DownloadBook.request = request;
        DownloadBook.dataSource = dataSource;
        if ("biquge5200".equals(dataSource)) {
            while (true) {
                Thread thread1 = new Thread(() -> {
                    for (int i = 1; i < 1000; i++) {
                        try {
                            String bookCod = "0_" + i;
                            Document document = Jsoup.connect("https://www.biquge5200.com/" + bookCod + "/").get();
                            Element info = document.getElementById("info");
                            String bookName = info.select("h1").text();
                            String path = "/usr/local/webapps/file/" + bookName + ".txt";
                            File file = new File(path);
                            if (file.exists()) {
                                continue;
                            }
                            System.out.println("---------------" + bookName + " downloading" + "--------------");
                            List<BookCatalogueDto> bookCatalogue = getBookCatalogue(bookCod, document, pattern);
                            downloadBook(bookCod, bookName, bookCatalogue);
                            System.out.println("---------------" + bookName + " download complete" + "--------------");
                        } catch (Exception e) {
                            // Skip a book that fails instead of ending the whole thread,
                            // as the "biquge" branch does
                            continue;
                        }
                    }
                });
                Thread thread2 = new Thread(() -> {
                    for (int i = 1000; i < 2000; i++) {
                        try {
                            String bookCod = "0_" + i;
                            Document document = Jsoup.connect("https://www.biquge5200.com/" + bookCod + "/").get();
                            Element info = document.getElementById("info");
                            String bookName = info.select("h1").text();
                            String path = "/usr/local/webapps/file/" + bookName + ".txt";
                            File file = new File(path);
                            if (file.exists()) {
                                continue;
                            }
                            System.out.println("---------------" + bookName + " downloading" + "--------------");
                            List<BookCatalogueDto> bookCatalogue = getBookCatalogue(bookCod, document, pattern);
                            downloadBook(bookCod, bookName, bookCatalogue);
                            System.out.println("---------------" + bookName + " download complete" + "--------------");
                        } catch (Exception e) {
                            // Skip a book that fails instead of ending the whole thread,
                            // as the "biquge" branch does
                            continue;
                        }
                    }
                });
                thread1.start();
                thread2.start();
                try {
                    thread1.join();
                    thread2.join();
                    break;
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        } else if ("biquge".equals(dataSource)) {
            while (true) {
                Thread thread1 = new Thread(() -> {
                    for (int j = 1; j < 1000; j++) {
                        try {
                            String bookCod = "0_" + j;
                            Document document = HttpsUtil.sendHttp("https://www.biquge.com/" + bookCod + "/");
                            Element info = document.getElementById("info");
                            String bookName = info.select("h1").text();
                            String path = "/usr/local/webapps/file/" + bookName + ".txt";
                            File file = new File(path);
                            if (file.exists()) {
                                continue;
                            }
                            List<BookCatalogueDto> bookCatalogue = getBookCatalogue(bookCod, document, pattern);
                            System.out.println("---------------" + bookName + " downloading" + "--------------");
                            downloadBook(bookCod, bookName, bookCatalogue);
                            System.out.println("---------------" + bookName + " download complete" + "--------------");
                        } catch (Exception e) {
                            continue;
                        }
                    }
                });
                Thread thread2 = new Thread(() -> {
                    for (int j = 1000; j < 2000; j++) {
                        try {
                            String bookCod = "0_" + j;
                            Document document = HttpsUtil.sendHttp("https://www.biquge.com/" + bookCod + "/");
                            Element info = document.getElementById("info");
                            String bookName = info.select("h1").text();
                            String path = "/usr/local/webapps/file/" + bookName + ".txt";
                            File file = new File(path);
                            if (file.exists()) {
                                continue;
                            }
                            List<BookCatalogueDto> bookCatalogue = getBookCatalogue(bookCod, document, pattern);
                            System.out.println("---------------" + bookName + " downloading" + "--------------");
                            downloadBook(bookCod, bookName, bookCatalogue);
                            System.out.println("---------------" + bookName + " download complete" + "--------------");
                        } catch (Exception e) {
                            continue;
                        }
                    }
                });
                thread1.start();
                thread2.start();
                try {
                    thread1.join();
                    thread2.join();
                    break;
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }

        }
    }

    public static void downloadBook(String bookCod, String bookName, List<BookCatalogueDto> bookCatalogueDto) throws Exception {
        String path = "/usr/local/webapps/file/" + bookName + ".txt";
        File file = new File(path);
        if (file.exists()) {
            return;
        }
        Map<Integer, List<BookCatalogueDto>> integerListMap = splitList(bookCatalogueDto, 3);
        long start = System.currentTimeMillis();
        Thread thread1 = new Thread(() -> {
            try {
                if ("biquge5200".equals(dataSource)) {
                    biquge5200(bookCod, bookName + "1", integerListMap.get(0));
                } else if ("biquge".equals(dataSource)) {
                    biquge(bookCod, bookName + "1", integerListMap.get(0));
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        Thread thread2 = new Thread(() -> {
            try {
                if ("biquge5200".equals(dataSource)) {
                    biquge5200(bookCod, bookName + "2", integerListMap.get(1));
                } else if ("biquge".equals(dataSource)) {
                    biquge(bookCod, bookName + "2", integerListMap.get(1));
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        Thread thread3 = new Thread(() -> {
            try {
                if ("biquge5200".equals(dataSource)) {
                    biquge5200(bookCod, bookName + "3", integerListMap.get(2));
                } else if ("biquge".equals(dataSource)) {
                    biquge(bookCod, bookName + "3", integerListMap.get(2));
                }

            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        thread1.start();
        thread2.start();
        thread3.start();
        thread1.join();
        thread2.join();
        thread3.join();
        // Merge the part files
        combine(bookName);
        long end = System.currentTimeMillis();
        System.out.println("Download took " + (end - start) + " ms");
    }

    public static void biquge5200(String bookCod, String bookName, List<BookCatalogueDto> bookCatalogueDto) throws
            Exception {
        String path = "/usr/local/webapps/file/downloading/" + bookName + ".txt";
        File file = new File(path);
        if (!file.exists()) {
            File dir = new File(file.getParent());
            dir.mkdirs();
            try {
                file.createNewFile();
            } catch (IOException e) {
                e.printStackTrace();
            }
        } else {
            List<BookCatalogueDto> bookCatalogueDtos = txtCatalogue(bookName);
            if (bookCatalogueDtos.size() != 0) {
                BookCatalogueDto bookCatalogueDto1 = bookCatalogueDtos.get(bookCatalogueDtos.size() - 1);
                for (BookCatalogueDto catalogueDto : bookCatalogueDto) {
                    if (catalogueDto.getCatalogueName().equals(bookCatalogueDto1.getCatalogueName())) {
                        int i = bookCatalogueDto.indexOf(catalogueDto);
                        bookCatalogueDto = bookCatalogueDto.subList(i + 1, bookCatalogueDto.size());
                        break;
                    }
                }
            }
        }

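        // Nothing left to download: return before opening the file so no stream is leaked
        if(bookCatalogueDto.size()==0){
            return;
        }
        // Open an appending output stream to save the scraped novel to disk as a txt file
        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file, true)));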
        bookCatalogueDto.forEach(e -> {

            Document document = null;
            try {
                document = Jsoup.connect("https://www.biquge5200.com/" + bookCod + "/" + e.getCatalogueCod() + ".html").get();
            } catch (IOException ioException) {
                try {
                    // Wait five seconds, then retry once
                    Thread.sleep(5000);
                    document = Jsoup.connect("https://www.biquge5200.com/" + bookCod + "/" + e.getCatalogueCod() + ".html").get();
                } catch (IOException exception) {
                    // Second failure: skip this chapter
                    return;
                } catch (InterruptedException interruptedException) {
                    // Restore the interrupt flag and skip this chapter rather than
                    // continuing with a null document
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            Elements chapterName = document.select("h1");
            try {
                bw.write(chapterName.text());
                bw.newLine();
                bw.flush();
            } catch (IOException ioException) {
                ioException.printStackTrace();
            }
            Elements elements = document.select("#content");
            String html = elements.get(0).html().replace("<div id='content'>", "").replace("</div>", "");
            String replace = html.replace("<script>readx();</script>", "").replace("<script>chaptererror();</script>", "");
            try {
                String[] split = replace.replace("<p>", "").split("</p>");
                for (String s : split) {
                    bw.write(s);
                    bw.newLine();
                    bw.flush();
                }

            } catch (IOException ioException) {
                ioException.printStackTrace();
            }
        });
        try {
            bw.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static List<BookCatalogueDto> getBookCatalogue(String bookCod, Document document, Pattern pattern) throws InterruptedException {
        List<BookCatalogueDto> bookCatalogueDtos = new ArrayList<>();
        Elements dd = document.getElementsByTag("dd");
        Map<Integer, List<Element>> integerListMap = splitList(dd, 3);
        final List<BookCatalogueDto>[] bookCatalogueDtos1 = new List[]{new ArrayList<>()};
        final List<BookCatalogueDto>[] bookCatalogueDtos2 = new List[]{new ArrayList<>()};
        final List<BookCatalogueDto>[] bookCatalogueDtos3 = new List[]{new ArrayList<>()};
        Thread thread1 = new Thread(() -> {
            bookCatalogueDtos1[0] = get(integerListMap.get(0), bookCod, document, pattern);
        });
        Thread thread2 = new Thread(() -> {
            bookCatalogueDtos2[0] = get(integerListMap.get(1), bookCod, document, pattern);
        });
        Thread thread3 = new Thread(() -> {
            bookCatalogueDtos3[0] = get(integerListMap.get(2), bookCod, document, pattern);
        });
        thread1.start();
        thread2.start();
        thread3.start();
        thread1.join();
        thread2.join();
        thread3.join();
        bookCatalogueDtos.addAll(bookCatalogueDtos1[0]);
        bookCatalogueDtos.addAll(bookCatalogueDtos2[0]);
        bookCatalogueDtos.addAll(bookCatalogueDtos3[0]);
        return bookCatalogueDtos;
    }

    public static List<BookCatalogueDto> get(List<Element> dd, String bookCod, Document document, Pattern pattern) {
        List<BookCatalogueDto> bookCatalogueDtos = new ArrayList<>();
        for (int i = 0; i < dd.size(); i++) {
            Element element = dd.get(i);
            BookCatalogueDto bookCatalogueDto = new BookCatalogueDto();
            Node node = element.childNode(0);
            for (Node e : element.childNodes()) {
                if (!"".equals(e.toString())) {
                    node = e;
                }
            }
            String s1 = node.toString();
            Matcher matcher = pattern.matcher(s1);
            if (matcher.find()) {
                String nameCodeUrl = matcher.group(1);
                String insStr = nameCodeUrl.substring(nameCodeUrl.lastIndexOf("/") + 1, nameCodeUrl.lastIndexOf("."));
                bookCatalogueDto.setCatalogueCod(Integer.parseInt(insStr));
            }
            bookCatalogueDto.setBookCod(bookCod);
            bookCatalogueDto.setCatalogueName(element.text());
            bookCatalogueDtos.add(bookCatalogueDto);
        }
        return bookCatalogueDtos;
    }

    private static void biquge(String bookCod, String bookName, List<BookCatalogueDto> bookCatalogueDto) throws FileNotFoundException {
        String path = "/usr/local/webapps/file/downloading/" + bookName + ".txt";
        File file = new File(path);
        if (!file.exists()) {
            File dir = new File(file.getParent());
            dir.mkdirs();
            try {
                file.createNewFile();
            } catch (IOException e) {
                e.printStackTrace();
            }
        } else {
            List<BookCatalogueDto> bookCatalogueDtos = txtCatalogue(bookName);
            if (bookCatalogueDtos.size() != 0) {
                BookCatalogueDto bookCatalogueDto1 = bookCatalogueDtos.get(bookCatalogueDtos.size() - 1);
                for (BookCatalogueDto catalogueDto : bookCatalogueDto) {
                    if (catalogueDto.getCatalogueName().equals(bookCatalogueDto1.getCatalogueName())) {
                        int i = bookCatalogueDto.indexOf(catalogueDto);
                        bookCatalogueDto = bookCatalogueDto.subList(i + 1, bookCatalogueDto.size());
                        break;
                    }
                }
            }
        }
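        // Nothing left to download: return before opening the file so no stream is leaked
        if(bookCatalogueDto.size()==0){
            return;
        }
        // Open an appending output stream to save the scraped novel to disk as a txt file
        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file, true)));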
        bookCatalogueDto.forEach(e -> {
            Document document = null;
            try {
                document = HttpsUtil.sendHttp("https://www.biquge.com/" + e.getBookCod() + "/" + e.getCatalogueCod() + ".html");
            } catch (Exception e1) {
                try {
                    // Wait five seconds, then retry once
                    Thread.sleep(5000);
                    document = HttpsUtil.sendHttp("https://www.biquge.com/" + e.getBookCod() + "/" + e.getCatalogueCod() + ".html");
                } catch (InterruptedException interruptedException) {
                    interruptedException.printStackTrace();
                } catch (Exception exception) {
                    exception.printStackTrace();
                }
            }
            if (document == null) {
                // Both attempts failed; skip this chapter instead of hitting a NullPointerException
                return;
            }

            Elements chapterName = document.select("h1");
            try {
                bw.write(chapterName.text());
                bw.newLine();
                bw.flush();
            } catch (IOException ioException) {
                ioException.printStackTrace();
            }
            Elements elements = document.select("#content");
            String html = elements.get(0).html().replace("<div id='content'>", "").replace("</div>", "");
            String replace = html.replace("<script>readx();</script>", "").replace("<script>chaptererror();</script>", "");
            try {
                String[] split = replace.split("<br>");
                for (String s : split) {
                    bw.write(s);
                    bw.newLine();
                    bw.flush();
                }

            } catch (IOException ioException) {
                ioException.printStackTrace();
            }
        });
        try {
            bw.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static <T> Map<Integer, List<T>> splitList(List<T> t, int num) {
        Map<Integer, List<T>> subList = new HashMap<>();
        // Integer division already rounds down
        int num1 = t.size() / num;
        for (int i = 0; i < num; i++) {
            // The last slice also takes the remainder
            int end = (i == num - 1) ? t.size() : (i + 1) * num1;
            subList.put(i, t.subList(i * num1, end));
        }
        return subList;
    }

    public static void combine(String bookName) throws Exception {
        String bookPath = "/usr/local/webapps/file/" + bookName + ".txt";
        File file = new File(bookPath);
        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file, true)));
        for (int i = 1; i < 4; i++) {
            String path = "/usr/local/webapps/file/downloading/" + bookName + i + ".txt";
            File file1 = new File(path);
            if (file1.exists()) {
                BufferedReader br = new BufferedReader(new FileReader(file1));
                String line;
                // Copy the part file line by line
                while ((line = br.readLine()) != null) {
                    bw.write(line);
                    bw.newLine();
                }
                br.close();
            }
            file1.delete();
        }
        bw.flush();
        bw.close();
    }

    public static List<BookCatalogueDto> txtCatalogue(String bookName) {
        List<BookCatalogueDto> bookCatalogueDtos = new ArrayList<>();
        String fileNamedirs = "/usr/local/webapps/file/downloading/" + bookName + ".txt";
        try {
            // Encoding
            String encoding = "utf-8";
            // File path
            File file = new File(fileNamedirs);
            if (file.isFile() && file.exists()) { // Check that the file exists
                // Input stream
                InputStreamReader read = new InputStreamReader(new FileInputStream(file), encoding);// Read with the explicit encoding
                BufferedReader bufferedReader = new BufferedReader(read);
                String lineTxt = null;
                Long count = (long) 0;
                boolean bflag = false;
                int n = 0;
                String newStr = null;
                String titleName = null;
                String newChapterName = null;// New chapter name
                String substring = null;
                int indexOf = 0;
                int indexOf1 = 0;
                int line = 0;
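                // Holds one chapter entry of the novel
                BookCatalogueDto content;
                while ((lineTxt = bufferedReader.readLine()) != null) {
                    content = new BookCatalogueDto();
                    // Novel name
                    content.setBookName(bookName);
                    count++;
                    // Regular expression for chapter headings: "第", up to nine characters,
                    // a unit character (章, 节, 卷, ...), one whitespace character, then the title
                    Pattern p = Pattern.compile("(^\\s*第)(.{1,9})[章节卷集部篇回](\\s{1})(.*)($\\s*)");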
                    Matcher matcher = p.matcher(lineTxt);
                    newStr = newStr + lineTxt;
                    while (matcher.find()) {
                        titleName = matcher.group();
                        // Trim whitespace from the chapter heading
                        newChapterName = titleName.trim();
                        // Record the chapter
                        //System.out.println(newChapterName);
                        content.setCatalogueName(newChapterName);
                        indexOf1 = indexOf;
                        //System.out.println(indexOf);
                        indexOf = newStr.indexOf(newChapterName);
                        // System.out.println(newChapterName + ":" + "line " + count); // the chapter that was found
                        if (bflag) {
                            bflag = false;
                            break;
                        }
                        if (n == 0) {
                            indexOf1 = newStr.indexOf(newChapterName);
                        }
                        n = 1;
                        bflag = true;
                        //System.out.println(chapter);
                        bookCatalogueDtos.add(content);
                    }
                }
                bufferedReader.close();
            } else {
                System.out.println("The specified file was not found");
            }
        } catch (Exception e) {
            System.out.println("Error reading the file contents");
            e.printStackTrace();
        }
        return bookCatalogueDtos;
    }
}
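`combine` above reads and writes with the platform default charset and closes its streams by hand, while `txtCatalogue` reads as UTF-8. One alternative is sketched below; this is a swapped-in approach using `java.nio.file.Files` with explicit UTF-8 and try-with-resources, not the project's code. The class name, the temp directory, and the file names in `main` are assumptions for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

final class MergePartsSketch {
    /** Appends each existing part file to target, in list order, then deletes the part. */
    static void merge(Path target, List<Path> parts) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (Path part : parts) {
                if (!Files.exists(part)) {
                    continue; // a worker thread may have produced no file
                }
                for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
                    out.write(line);
                    out.newLine();
                }
                Files.delete(part);
            }
        } // the writer is flushed and closed here, even if an exception was thrown
    }

    public static void main(String[] args) throws IOException {
        // Made-up part files in a temp directory
        Path dir = Files.createTempDirectory("merge-sketch");
        Path part1 = Files.write(dir.resolve("book1.txt"),
                Arrays.asList("Chapter 1", "text"), StandardCharsets.UTF_8);
        Path part2 = Files.write(dir.resolve("book2.txt"),
                Arrays.asList("Chapter 2", "text"), StandardCharsets.UTF_8);
        Path target = dir.resolve("book.txt");
        merge(target, Arrays.asList(part1, part2, dir.resolve("book3.txt")));
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```

`Files.readAllLines` loads a whole part into memory, which is fine for one third of a novel; a line-by-line reader would suit much larger files.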
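OP | 三木猿 posted on 2020-8-26 10:58
I have also written another project that crawls novels and writes them to txt files. It is dedicated to downloading novels and uses multiple threads for both crawling and downloading.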
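The multithreaded download splits the chapter list into three slices and starts one hand-written `Thread` per slice. The same pattern can be sketched with an `ExecutorService` (which `DownloadBook` imports but never uses). This is only a sketch under assumptions: `split`, `mapChunks`, and the stand-in worker are hypothetical names, and a real worker would fetch chapters rather than build a string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelChunksSketch {
    /** Splits a list into `parts` consecutive slices; the last slice takes the remainder. */
    static <T> List<List<T>> split(List<T> items, int parts) {
        List<List<T>> out = new ArrayList<>();
        int size = items.size() / parts;
        for (int i = 0; i < parts; i++) {
            int from = i * size;
            int to = (i == parts - 1) ? items.size() : from + size;
            out.add(items.subList(from, to));
        }
        return out;
    }

    /** Runs `worker` on each slice in a pool thread and returns the results in slice order. */
    static <T, R> List<R> mapChunks(List<T> items, int parts, Function<List<T>, R> worker)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (List<T> chunk : split(items, parts)) {
                futures.add(pool.submit(() -> worker.apply(chunk)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                // get() blocks until the slice is done and rethrows a worker failure
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> chapters = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Stand-in worker (assumption): a real one would download each chapter in the slice.
        List<String> parts = mapChunks(chapters, 3, chunk -> "chunk" + chunk);
        System.out.println(String.join(" | ", parts));
    }
}
```

Because results come back in slice order, the caller can concatenate them directly instead of writing three numbered part files and merging them afterwards. A worker failure also surfaces as an exception from `get()` rather than only a printed stack trace.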
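OP | 三木猿 posted on 2020-8-26 11:21
The project is now deployed on my own cloud server: http://49.235.253.131/
OP | 三木猿 posted on 2020-8-26 11:24
Also, if anyone knows a comic site that is fast and has a complete catalogue, please post it. I may add it as a comic data source.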
梦里余杭 posted on 2020-8-26 16:47
Impressive, a real Java expert. Learned a lot!
wuxiaolei1 posted on 2020-8-27 10:11
https://www.xinqing100.net/   Could you add this site?
OP | 三木猿 posted on 2020-8-27 13:37
wuxiaolei1 posted on 2020-8-27 10:11
https://www.xinqing100.net/   Could you add this site?

Bro, that site responds really fast, but it has too few books.
chen470365456 posted on 2020-8-28 10:51
https://www.5ixs.net/books/1041988/ How do I use this with your site? I can't search for it on http://49.235.253.131/ either.
wuxiaolei1 posted on 2020-8-28 12:34
三木猿 posted on 2020-8-27 13:37
Bro, that site responds really fast, but it has too few books.

OK then.