This post was last edited by hj170520 on 2020-5-25 15:08
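My source code is below. I'm trying to scrape the library's "trial resources" page; other forum members may not be able to open it...
So I'll just put the question out there first!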
[Python]
import requests

url = 'http://www.htcases.com/kw/content/v_pdf.html?dbId=6&&caseId=40889&&caseType=1&&fileUrl='
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-language': 'en-GB,en-US;q=0.9,en;q=0.8,ko;q=0.7',
    # 'Referer': 'http://www.htcases.com/kw/content/v_pdf.html?dbId=6&&caseId=40889&&caseType=1&&fileUrl='
}
# allow_redirects=False: do not follow HTTP 3xx redirects, so the raw first response is visible
req = requests.get(url, headers=headers, allow_redirects=False)
# Print the body of that first response
print(req.text)
The response I get back is:
[Plain Text]
<!DOCTYPE html>
<html>
<title></title> <!-- jQuery -->
<script type="text/javascript" src="/res/front/js/jquery-all.min.js"></script>
<script type="text/javascript" src="/res/front/js/jquery-form.js"></script>
<script type="text/javascript" src="/res/front/js/jquery-ui.min.js"></script>
<script type="text/javascript" src="/res/front/js/jquery.validationEngine.js"></script>
<script type="text/javascript" src="/res/front/js/jquery.validationEngine-zh_CN.js"></script>
<script type="text/javascript" src="/res/front/js/jquery-ui-1.8.18.custom.min.js"></script>
<!-- <script type="text/javascript" src="/res/front/js/iepng.js"></script> -->
<script type="text/javascript" src="/res/front/js/P_Style.js"></script>
<script type="text/javascript" src="/res/front/js/HT_JScript.js"></script>
<script type="text/javascript" src="/res/front/js/jquery.corner.js"></script>
<!--
<script type="text/javascript" src="/res/front/js/JScript.20130614.js"></script>
-->
<script type="text/javascript" src="/res/front/js/helpTree.js"></script>
<!--<script type="text/javascript" src="/res/front/js/h.js"></script> -->
<script type="text/javascript" src="/res/front/js/sweetalert.min.js"></script>
<script type="text/javascript" src="/res/common/My97DatePicker/WdatePicker.js"></script>
<body>
<form id="pdfReaderForm" action="/kw/content/v_reader.html" method="post">
<input type="hidden" id="dbId" name="dbId" value="6" />
<input type="hidden" id="caseId" name="caseId" value="40889" />
<input type="hidden" id="caseType" name="caseType" value="1" />
<input type="hidden" id="fileUrl" name="fileUrl" value="" />
</form>
</body>
<script type="text/javascript">
$(function() {
$("#pdfReaderForm").submit();
});
</script>
</html>
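But the problem is that when I enter http://www.htcases.com/kw/content/v_pdf.html?dbId=6&&caseId=40889&&caseType=1&&fileUrl= in the browser,
it automatically jumps to http://www.htcases.com/kw/content/v_reader.html, a page for viewing the PDF, and by capturing the traffic I can see the address of the PDF file.
However, my script cannot get the source of that v_reader.html page, so I cannot scrape the PDF address from it.
What should I do?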
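One way to narrow this down might be to check what kind of redirect the first response actually is. The sketch below is only a rough heuristic: `classify_redirect` is a made-up helper name, and the regular expressions are my own guesses at common patterns, not anything taken from the site.

```python
import re


def classify_redirect(status_code, headers, body):
    """Guess how a page sends the browser elsewhere.

    Returns 'http' (3xx status plus Location header), 'meta' (meta refresh),
    'js' (a script submits a form or sets location), or 'none'.
    """
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    if 300 <= status_code < 400 and "location" in lowered:
        return "http"
    if re.search(r'<meta[^>]+http-equiv=["\']?refresh', body, re.I):
        return "meta"
    # Heuristic only: this also matches scripts that are commented out.
    if re.search(r'\.submit\s*\(\s*\)|location\.(href|replace|assign)', body):
        return "js"
    return "none"
```

With the request from the code above this would be called as `classify_redirect(req.status_code, req.headers, req.text)`. For the HTML shown in this post it should come out as 'js', because the page calls `$("#pdfReaderForm").submit()` instead of sending a Location header.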
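In my previous topic the redirect target was in the Location response header.
Could it be that this redirect lives in the JS? I know very little about JS!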
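Judging from the HTML returned above, that does look likely: the page contains a form with action="/kw/content/v_reader.html" and method="post", and the script at the bottom submits it as soon as the page loads. requests does not run JavaScript, so the form would have to be submitted by hand. Below is a minimal sketch of that idea. `FormParser`, `extract_form` and `fetch_reader_page` are names I made up, and it is only an assumption that the server accepts the POST without extra cookies, a Referer, or a login.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormParser(HTMLParser):
    """Collect action, method and named <input> fields of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # ignore any later forms


def extract_form(html, page_url):
    """Return (absolute target URL, method, fields) for the first form."""
    parser = FormParser()
    parser.feed(html)
    if parser.action is None:
        raise ValueError("no <form> found in the page")
    return urljoin(page_url, parser.action), parser.method, parser.fields


def fetch_reader_page(page_url, headers):
    """Load the page, then submit its form the way the browser script would."""
    import requests  # third-party; the same library used earlier in this post

    with requests.Session() as session:  # a session keeps cookies between calls
        first = session.get(page_url, headers=headers)
        target, method, fields = extract_form(first.text, page_url)
        # Assumption: sending the first page as Referer mimics the browser.
        submit_headers = dict(headers, Referer=page_url)
        if method == "post":
            return session.post(target, data=fields, headers=submit_headers)
        return session.get(target, params=fields, headers=submit_headers)
```

Calling `fetch_reader_page(url, headers).text` should then return whatever v_reader.html sends back. Whether the PDF address appears directly in that HTML or is loaded by yet another script is something I cannot tell from the response shown here.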