Last edited by songjing on 2021-9-27 18:01
A while ago I came across a crawler that seems to have been written by a fellow 52pojie member:
https://www.52pojie.cn/thread-1309809-1-1.html
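I then changed it to fetch the things I wanted; the overall approach is the same.
Standing on the shoulders of giants lets me see further. Thanks to the original author.
This is my first technical post, so please excuse the rough formatting.
Test URL: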
https://www.loggly.com/docs-index/log-sources/
[Java]
package com.plan.plan;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
public class pc {
    public static void main(String[] args) throws IOException {
        long t1 = System.currentTimeMillis();
        // Connect to the target URL
        Connection connection1 = Jsoup.connect("https://www.loggly.com/docs-index/log-sources/");
        // Fetch the page and get the Document object
        Document document1 = connection1.get();
        Element elementDiv = document1.selectFirst("[class=log-sources]");
        Element elementDiv1 = elementDiv.selectFirst("[class=container]"); // find the element with class=container
        Element elementUL = elementDiv1.selectFirst("[class=row]");
        Element elementUL1 = elementUL.selectFirst("[class=col-sm-8]");
        // Jsoup has already parsed the whole document, so the elements can be selected
        // right away; sleeping first does not change what is found
        Elements elements = elementUL1.getElementsByClass("log-sources__main");
        Elements elementUL1mainlist = elements.select(".log-list");
        Elements elementLis = elementUL1mainlist.select(".log-list__item"); // all li items inside the ul that was found
        for (Element elementLi : elementLis) { // iterate over every li that was found
            Element elementA = elementLi.selectFirst("a"); // the a tag inside the li
            if (elementA == null) {
                continue; // skip list items without a link
            }
            Elements elements1log__front = elementA.getElementsByClass("log__front");
            Elements select = elements1log__front.select("img[src]");
            for (Element element : select) {
                String src = element.attr("abs:src"); // absolute URL of the image
                // Timestamp-based file name, generated per image so one li cannot overwrite its own files
                String divName = String.valueOf(System.currentTimeMillis());
                // The site seems to have some anti-crawler protection, so a User-Agent header is set
                // Reference: https://www.cnblogs.com/xijieblog/p/4540026.html
                URL url = new URL(src);
                HttpURLConnection connection = (HttpURLConnection) url.openConnection();
                connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
                // Read from the connection that carries the header;
                // url.openStream() would open a second connection without it
                try (InputStream is = connection.getInputStream();
                     FileOutputStream fos = new FileOutputStream("D:\\date\\test\\" + divName + ".png")) {
                    byte[] b = new byte[2048];
                    int count;
                    while ((count = is.read(b)) != -1) {
                        fos.write(b, 0, count);
                    }
                }
            }
        }
        long t2 = System.currentTimeMillis();
        double a = (t2 - t1) / 1000.0; // divide by 1000.0 so fractional seconds are not truncated
        System.out.println("Download complete, time taken: " + a + "s");
    }
}
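The download step could also be pulled out into a small helper. The sketch below is one possible shape, not the code from the original thread: `ImageDownloader`, `fileNameFromUrl`, `copy`, and `download` are hypothetical names, and it assumes the last path segment of the image URL is usable as a file name (falling back to a caller-supplied name otherwise). It uses only the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; none of these names appear in the original post.
final class ImageDownloader {

    // Use the last path segment of the URL as the file name.
    // URL.getPath() excludes the query string and fragment.
    static String fileNameFromUrl(URL imageUrl, String fallback) {
        String path = imageUrl.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.isEmpty() ? fallback : name;
    }

    // Copy everything from in to out and return the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[2048];
        long total = 0;
        int count;
        while ((count = in.read(buffer)) != -1) {
            out.write(buffer, 0, count);
            total += count;
        }
        return total;
    }

    // Download one image into targetDir, sending the given User-Agent header.
    static Path download(URL imageUrl, Path targetDir, String userAgent) throws IOException {
        Files.createDirectories(targetDir); // make sure the output folder exists
        String fallback = System.currentTimeMillis() + ".png";
        Path target = targetDir.resolve(fileNameFromUrl(imageUrl, fallback));
        HttpURLConnection connection = (HttpURLConnection) imageUrl.openConnection();
        connection.setRequestProperty("User-Agent", userAgent);
        // try-with-resources closes both streams even if the copy fails
        try (InputStream in = connection.getInputStream();
             OutputStream out = Files.newOutputStream(target)) {
            copy(in, out);
        } finally {
            connection.disconnect();
        }
        return target;
    }
}
```

Inside the inner loop this would be called as `ImageDownloader.download(new URL(src), Paths.get("D:\\date\\test"), userAgent)`, with `userAgent` set to the same string as in the main program. Naming files after the URL keeps each image's real extension instead of saving everything as .png.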
Required Maven dependency
[XML]
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>
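With jsoup on the classpath, the element lookup can be tried offline against an HTML string before hitting the real site. The sketch below is an assumption-laden illustration, not the method from the original thread: `LogoSelector` and `imageUrls` are hypothetical names, the HTML snippet is made up to mimic the class names used in the crawler, and the chained `selectFirst` calls are swapped for a single descendant selector. Note that `.name` matches any element that has that class among others, whereas `[class=name]` only matches when the class attribute is exactly that value.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the class and method names are not from the original post.
final class LogoSelector {

    // Collect the absolute URLs of all logo images using one descendant selector.
    static List<String> imageUrls(Document page) {
        List<String> urls = new ArrayList<>();
        for (Element img : page.select(".log-sources__main .log-list__item a .log__front img[src]")) {
            String url = img.absUrl("src"); // empty string if the URL cannot be resolved
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up HTML that only mimics the structure the crawler expects
        String html = "<div class='log-sources__main'><ul class='log-list'>"
                + "<li class='log-list__item'><a href='/docs/a'>"
                + "<div class='log__front'><img src='/img/a.png'></div></a></li>"
                + "</ul></div>";
        // The base URI lets jsoup resolve relative src values
        Document page = Jsoup.parse(html, "https://www.loggly.com/");
        System.out.println(imageUrls(page));
    }
}
```

Because `select` returns an empty list when nothing matches, this form does not throw a NullPointerException if the page layout changes; it simply downloads nothing.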