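Newbie question about scraping Baidu Wenku
This post was last edited by zhaoziqi1995 on 2019-12-4 17:54
I'm new and have just started learning Python. While trying to get around the Baidu Wenku membership requirement so that I can copy text, I found that Baidu Wenku serves two kinds of data. How should I scrape them?
And what kind of data is the second one?
The two Baidu Wenku document URLs are: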
https://wenku.baidu.com/view/c748189d640e52ea551810a6f524ccbff121cadd.html
https://wenku.baidu.com/view/497ebaa7657d27284b73f242336c1eb91b3733d2.html?rec_flag=default&sxts=1575451815596
For this kind, the data can be fetched directly:
https://wkbjcloudbos.bdimg.com/v1/wenkueditor/wenku_editor/6383099/6383099_0.html?responseContentType=text%2Fhtml&responseCacheControl=no-cache&authorization=bce-auth-v1%2Ffa1126e91489401fa7cc85045ce7179e%2F2019-12-04T09%3A14%3A37Z%2F3600%2Fhost%2Fc3a8e28661f9ca2f05201c038fe9eeed8c627fa62c163b80e00553109a02bbea&token=eyJ0eXAiOiJKSVQiLCJ2ZXIiOiIxLjAiLCJhbGciOiJIUzI1NiIsImV4cCI6MTU3NTQ1NDQ3NywidXJpIjp0cnVlLCJwYXJhbXMiOlsicmVzcG9uc2VDb250ZW50VHlwZSIsInJlc3BvbnNlQ2FjaGVDb250cm9sIl19.bRFqzpF6a38EQ8TBr5l5ziGGfsndIT8AIYWxbrMvro8%3D.1575454477
--------------------
The other kind looks like this:
------------------
https://wkbjcloudbos.bdimg.com/v1/docconvert8369/wk/1932cf9ee6aa6119f402d341a3bf8099/0.json?responseContentType=application%2Fjavascript&responseCacheControl=max-age%3D3888000&responseExpires=Sat%2C%2018%20Jan%202020%2017%3A27%3A19%20%2B0800&authorization=bce-auth-v1%2Ffa1126e91489401fa7cc85045ce7179e%2F2019-12-04T09%3A27%3A19Z%2F3600%2Fhost%2Fd6bb1289ba90fa92fec3ada2c82d3ce838f34013db5cb6a732160a04af943768&x-bce-range=0-93580&token=eyJ0eXAiOiJKSVQiLCJ2ZXIiOiIxLjAiLCJhbGciOiJIUzI1NiIsImV4cCI6MTU3NTQ1NTIzOSwidXJpIjp0cnVlLCJwYXJhbXMiOlsicmVzcG9uc2VDb250ZW50VHlwZSIsInJlc3BvbnNlQ2FjaGVDb250cm9sIiwicmVzcG9uc2VFeHBpcmVzIiwieC1iY2UtcmFuZ2UiXX0%3D.%2FizH06rl09%2FjSjmb%2BX7c6Gf7CHBIVpRPgOhV6wVRgUs%3D.1575455239
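On the question of what the second kind of data is: the two data URLs already hint at it. The first path ends in `.html` and asks for `responseContentType=text/html`, while the second path ends in `.json` and asks for `application/javascript`, so the second one is most likely JSON (possibly wrapped in a JavaScript callback) rather than ready-made HTML. Here is a minimal sketch that reads those two hints from a URL using only the standard library. The function name `describe_data_url` is made up for illustration, and the two example URLs are the ones from this post with the signing parameters dropped.

```python
import posixpath
from urllib.parse import urlsplit, parse_qs


def describe_data_url(url):
    """Return (file extension, requested content type) for a data URL."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)  # parse_qs also percent-decodes the values
    extension = posixpath.splitext(parts.path)[1].lstrip(".")
    content_type = query.get("responseContentType", [""])[0]
    return extension, content_type


# Shortened forms of the two data URLs in this post (signing parameters removed).
html_url = ("https://wkbjcloudbos.bdimg.com/v1/wenkueditor/wenku_editor/"
            "6383099/6383099_0.html?responseContentType=text%2Fhtml")
json_url = ("https://wkbjcloudbos.bdimg.com/v1/docconvert8369/wk/"
            "1932cf9ee6aa6119f402d341a3bf8099/0.json"
            "?responseContentType=application%2Fjavascript")

print(describe_data_url(html_url))  # ('html', 'text/html')
print(describe_data_url(json_url))  # ('json', 'application/javascript')
```

This only inspects the URL. What the JSON body actually contains would have to be checked in the browser's network panel.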
The link won't open.
https://www.52pojie.cn/thread-1064617-1-1.html OP, take a look here.
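A possible reason the data links will not open: they look like time-limited signed URLs. The `authorization` parameter starts with `bce-auth-v1`, and, assuming it follows the usual Baidu Cloud layout of `bce-auth-v1/<access key>/<timestamp>/<valid seconds>/<signed headers>/<signature>`, the first data URL was signed at 2019-12-04T09:14:37Z and was valid for 3600 seconds. The number at the end of its `token` parameter, 1575454477, is the same instant written as a Unix timestamp. The sketch below works out that expiry time. The function name `signed_url_expiry` is hypothetical, and the example URL keeps only the `authorization` parameter from the post.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit, parse_qs


def signed_url_expiry(url):
    """Return the UTC time after which a bce-auth-v1 signed URL is no longer valid."""
    query = parse_qs(urlsplit(url).query)
    auth = query["authorization"][0]
    # Assumed layout: version/access key/timestamp/valid seconds/signed headers/signature
    version, _key, timestamp, seconds, _headers, _signature = auth.split("/")
    if version != "bce-auth-v1":
        raise ValueError("unexpected authorization scheme: " + version)
    signed_at = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(seconds))


# First data URL from the post, reduced to its authorization parameter.
example_url = (
    "https://wkbjcloudbos.bdimg.com/v1/wenkueditor/wenku_editor/6383099/6383099_0.html"
    "?authorization=bce-auth-v1%2Ffa1126e91489401fa7cc85045ce7179e"
    "%2F2019-12-04T09%3A14%3A37Z%2F3600%2Fhost"
    "%2Fc3a8e28661f9ca2f05201c038fe9eeed8c627fa62c163b80e00553109a02bbea"
)

print(signed_url_expiry(example_url).isoformat())  # 2019-12-04T10:14:37+00:00
```

If that reading is right, anyone opening the link more than an hour after it was copied would be refused, so a fresh link has to be taken from the document page each time.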