[Notes] Example: fetching a web page with Java and saving it to a local file
This post was last edited by 须臾致幻 on 2019-11-24 21:37
A small first step into Java web crawling...
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import java.io.FileWriter;

public class CrawlerFirst {
    public static void main(String[] args) throws Exception {
        // 1. "Open the browser": create the HttpClient object.
        //    try-with-resources closes the client when we are done.
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            // 2. "Type in the URL"
            HttpGet httpGet = new HttpGet("https://www.52pojie.cn");
            // 3. "Press Enter": send the request with the HttpClient object and get the response
            try (CloseableHttpResponse response = httpClient.execute(httpGet)) {
                // 4. Parse the response and extract the data
                // Check whether the status code is 200
                if (response.getStatusLine().getStatusCode() == 200) {
                    HttpEntity httpEntity = response.getEntity();
                    String content = EntityUtils.toString(httpEntity, "UTF-8");
                    // Write the fetched page to a local file.
                    // Closing the writer flushes its buffer, so the file is written in full.
                    try (FileWriter fw = new FileWriter("Test.html")) {
                        fw.write(content);
                    }
                }
            }
        }
    }
}
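It is best to add "fw.flush();" after line 28 (right after the write), or to close the writer. Otherwise, when the fetched page is large, the saved file may be incomplete, because data still sitting in the writer's buffer never reaches the disk. Throwing in a regex while I'm at it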
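To illustrate the flush point above, here is a minimal sketch that writes a large string through a FileWriter and reads it back. It is not the original poster's code: the class name FlushDemo, the helper save, and the temp-file name are hypothetical, and it assumes Java 11 or later (for String.repeat, Files.readString, and the FileWriter constructor that takes a Charset). No network access is needed.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlushDemo {
    // Hypothetical helper: writes content to path as UTF-8.
    // try-with-resources calls close(), which flushes the writer's buffer first.
    static void save(Path path, String content) throws IOException {
        try (FileWriter fw = new FileWriter(path.toFile(), StandardCharsets.UTF_8)) {
            fw.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("flush-demo", ".html");
        // Stand-in for a large fetched page (assumption: ~100 KB of ASCII)
        String content = "<html>" + "x".repeat(100_000) + "</html>";
        save(path, content);
        // Read the file back and check that nothing was lost
        String readBack = Files.readString(path, StandardCharsets.UTF_8);
        System.out.println("complete: " + readBack.equals(content));
        Files.delete(path);
    }
}
```

Calling fw.flush() explicitly, as suggested above, also pushes the buffered data out. Closing the writer does the same and releases the file handle as well, which is why the sketch relies on try-with-resources instead.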