VBS也爬虫~VBS爬取美图查照片

xiaomingtt 发表于 2020-2-21 15:02

这两年python爬虫挺火，我的python就学了个皮毛，还不会用爬虫。我也不知道python爬虫什么原理，不过我倒是会用VBS批量下载我需要的东西，这应该也算是爬虫吧。
前两天在52论坛看到美图查美图查，发现照片质量挺高，就想爬下来，于是就有了下面的代码。

'u = "https://wx.xxx.com/action.php"
'cookies=HttpRequest(u, "POST", "getcookie", "mode=login&u=adminn&p=123", "")
'msgbox cookies
'u = "https://wx.xxx.com/nowyuyue.php"
's = HttpRequest(u, "GET", "setcookie", "", cookies)
'wscript.echo s

'http://meitucha.com/a/1
'http://meitucha.com/a/32157
for i = 1 to 32157
j = 1
do
u = "http://meitucha.com/a/" & i & "?page=" & j
s = HttpRequest(u, "GET", "", "", "")

Set html = CreateObject("htmlfile")
html.designMode = "on"
html.write s
title = html.title
title = replace(replace(replace(replace(title,"\",""),"/",""),"*",""),"?","")
title = replace(replace(replace(replace(title,"<",""),">",""),"|",""),":","")
title = replace(replace(replace(title,chr(34),""),vblf,""),vbcr,"")
For Each M In html.getElementsByTagName("img")
if M.classname = "tupian_img" then
img = M.getAttribute("src")
filename = right(img,len(img) - instrrev(img,"/"))
download img,"D:\" & title & "\" & filename
wscript.sleep 500
end if
Next
nextpage = false
For Each A In html.getElementsByTagName("a")
if A.classname = "nxt" and A.innertext = "下一页" then
nextpage = true
j = j + 1
end if
Next
wscript.sleep 1000
loop until nextpage = false
wscript.sleep 1000
next

Function HttpRequest(url, mode, cook, data, cookies)
'地址，GET/POST，getcookie/setcookie，请求数据， cookies
Set cHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
With cHttp
   .SetTimeouts 80000, 80000, 80000, 80000
   .Open mode, url, False
If cook = "setcookie" Then .SetRequestHeader "Cookie", cookies
.SetRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"
if mode = "POST" then
.SetRequestHeader "content-Type", "application/x-www-form-urlencoded"
.SetRequestHeader "Content-Length", Len(data)
end if
.SetRequestHeader "Referer", url
.SetRequestHeader "Cache-Control", "no-cache"
.SetRequestHeader "Connection", "keep-alive"
   .Send data
   If .WaitForResponse Then
         If .Status = 200 Then
If cook = "getcookie" Then
HttpRequest = GetCookie(.getAllResponseHeaders)
Else
HttpRequest =.responseText
End If
         End If
   End If
End With
End Function

Function GetCookie(str)
a = InStr(str, "Set-Cookie: ")
If a = 0 Then
   GetCookie = ""
Else
   b = InStr(a, str, ";")
   c = Mid(str, a + 12, b - a - 11)
   Cookie = c
   Do
         d = InStr(b, str, "Set-Cookie: ")
         If d = 0 Then Exit Do
         e = InStr(d, str, ";")
         f = Mid(str, d + 12, e - d - 11)
         b = e
         Cookie = Cookie & " " & f
   Loop
   GetCookie = Cookie
End If
End Function

Sub download(url,target)
Set fso=CreateObject("scripting.FileSystemObject")
path = left(target,instrrev(target,"\"))
If Not FSO.FolderExists(path) Then fso.CreateFolder(path)
Const adTypeBinary = 1
Const adSaveCreateOverWrite = 2
Dim http,ado
Set http = CreateObject("Msxml2.XMLHTTP")
http.open "GET",url,False
http.send
Set ado = createobject("Adodb.Stream")
ado.Type = adTypeBinary
ado.Open
ado.Write http.responseBody
ado.SaveToFile target
ado.Close
End Sub
要说VBS爬虫，还得追溯到10年前，当时《电脑爱好者》杂志的博客质量挺高，当时有电脑，但没条件上网，就写了段VBS，把博客文章全部抓下来，以后慢慢看。当时这个VBS还在该杂志发表过（署名是当时已经分手的前女友）。

虽然都是用VBS爬，但技术还是有区别的，当时用的XMLHTTP和正则表达式，不能cookie登录，正则写起来也麻烦，现在用的WinHttpRequest和htmlfile对象，伪造header，cookie登录都没问题，使用htmlfile也比正则处理HTML方便很多。美图查不涉及到登录啥的，只用了一个GET，其他用法脚本里有实例，可以参考。

shi128862 发表于 2020-2-21 15:27

不错的文章，这几天正在写vb的爬虫，学习了！

JuncoJet 发表于 2020-2-21 15:27

XMLHTTP和WININET都是系统管理cookie的
WINHTTP可以自己控制cookie（做自动刷单用的多

xunfish 发表于 2020-2-22 09:12

这个站我的。。。

xiaomingtt 发表于 2020-2-22 11:35

xunfish 发表于 2020-2-22 09:12
这个站我的。。。

哦？哈哈，没真的爬哈，就试了两个。你把图片分享出来，就不用费事了。

caozb 发表于 2020-3-13 12:23

出现这个错误是什么原因？

ljm009 发表于 2020-4-21 01:24

试试效果

LeavesKing 发表于 2020-5-18 18:11

楼主可以导出成exe吗，感谢

页: [1]

吾爱破解 - 52pojie.cn's Archiver

VBS也爬虫~VBS爬取美图查照片