[Notes] Example: fetching data from a web page with Java and saving it to a local file

须臾致幻 posted on 2019-11-24 21:02

Last edited by 须臾致幻 on 2019-11-24 21:37

A small first step into writing web crawlers in Java...

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.FileWriter;

public class CrawlerFirst {
    public static void main(String[] args) throws Exception {
        // 1. "Open the browser": create the HttpClient object (closed automatically at the end)
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {

            // 2. "Type in the URL": build the GET request
            HttpGet httpGet = new HttpGet("https://www.52pojie.cn");

            // 3. "Press Enter": send the request with the HttpClient and receive the response
            try (CloseableHttpResponse response = httpClient.execute(httpGet)) {

                // 4. Parse the response and extract the data
                // Continue only if the status code is 200
                if (response.getStatusLine().getStatusCode() == 200) {
                    HttpEntity httpEntity = response.getEntity();
                    String content = EntityUtils.toString(httpEntity, "UTF-8");
                    // Write the fetched page to a local file
                    try (FileWriter fw = new FileWriter("Test.html")) {
                        fw.write(content);
                    } // closing the writer flushes any buffered data to disk
                }
            }
        }
    }
}

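须臾致幻 posted on 2019-11-24 21:35

After line 28 (fw.write(content);) it is best to add "fw.flush();". Otherwise the buffered output may never reach the file, and a large page can end up saved incompletely.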
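The flushing point can be illustrated with the standard library alone. What follows is a minimal sketch rather than the original poster's code: the class name SaveDemo, the helper save, and the temporary file are hypothetical, and it uses java.nio's Files.newBufferedWriter instead of FileWriter so the output encoding is explicitly UTF-8. The writer sits in a try-with-resources block, so it is closed (and therefore flushed) even if the write throws.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SaveDemo {
    // Hypothetical helper: writes the whole string to the given path as UTF-8.
    // try-with-resources closes the writer, which flushes buffered data.
    static void save(Path path, String content) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            writer.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a large stand-in for a fetched page (no network access needed).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("<p>line ").append(i).append("</p>\n");
        }
        String content = sb.toString();

        Path path = Files.createTempFile("Test", ".html");
        save(path, content);

        // Read the file back and report whether everything reached disk.
        String readBack = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        System.out.println("complete: " + readBack.equals(content));
        Files.delete(path);
    }
}
```

An explicit flush() call also works, but relying on close() through try-with-resources covers the exception path as well.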

时光书窝 posted on 2019-11-25 12:56

Might as well throw in a regex while you're at it.
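A regex pass over the fetched HTML, as suggested here, might look something like the following. This is a minimal sketch and not code from the thread: the class LinkExtractor, the helper extractLinks, and the sample HTML string are hypothetical, and the pattern only handles double-quoted href attributes on <a> tags. For anything beyond simple cases, an HTML parser is usually more robust than a regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Matches an <a ...> tag up to a double-quoted href; group 1 is the URL.
    private static final Pattern HREF =
            Pattern.compile("<a\\s[^>]*?href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns every href value found, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample input standing in for the "content" string of the crawler.
        String html = "<html><body><a href=\"https://www.52pojie.cn\">Home</a>"
                + "<a class=\"x\" href=\"forum.php\">Forum</a></body></html>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

In the crawler, the string returned by EntityUtils.toString could be passed to extractLinks in place of the sample HTML.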


吾爱破解 - 52pojie.cn's Archiver
