VBS也爬虫~VBS爬取美图查照片
这两年python爬虫挺火,我的python就学了个皮毛,还不会用爬虫。我也不知道python爬虫什么原理,不过我倒是会用VBS批量下载我需要的东西,这应该 也算是爬虫吧。前两天在52论坛看到美图查美图查,发现照片质量挺高,就想爬下来,于是就有了下面的代码。
'u = "https://wx.xxx.com/action.php"
'cookies=HttpRequest(u, "POST", "getcookie", "mode=login&u=adminn&p=123", "")
'msgbox cookies
'u = "https://wx.xxx.com/nowyuyue.php"
's = HttpRequest(u, "GET", "setcookie", "", cookies)
'wscript.echo s
'http://meitucha.com/a/1
'http://meitucha.com/a/32157
for i = 1 to 32157
j = 1
do
u = "http://meitucha.com/a/" & i & "?page=" & j
s = HttpRequest(u, "GET", "", "", "")
Set html = CreateObject("htmlfile")
html.designMode = "on"
html.write s
title = html.title
title = replace(replace(replace(replace(title,"\",""),"/",""),"*",""),"?","")
title = replace(replace(replace(replace(title,"<",""),">",""),"|",""),":","")
title = replace(replace(replace(title,chr(34),""),vblf,""),vbcr,"")
For Each M In html.getElementsByTagName("img")
if M.classname = "tupian_img" then
img = M.getAttribute("src")
filename = right(img,len(img) - instrrev(img,"/"))
download img,"D:\" & title & "\" & filename
wscript.sleep 500
end if
Next
nextpage = false
For Each A In html.getElementsByTagName("a")
if A.classname = "nxt" and A.innertext = "下一页" then
nextpage = true
j = j + 1
end if
Next
wscript.sleep 1000
loop until nextpage = false
wscript.sleep 1000
next
Function HttpRequest(url, mode, cook, data, cookies)
'地址,GET/POST,getcookie/setcookie,请求数据, cookies
Set cHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
With cHttp
.SetTimeouts 80000, 80000, 80000, 80000
.Open mode, url, False
If cook = "setcookie" Then .SetRequestHeader "Cookie", cookies
.SetRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"
if mode = "POST" then
.SetRequestHeader "content-Type", "application/x-www-form-urlencoded"
.SetRequestHeader "Content-Length", Len(data)
end if
.SetRequestHeader "Referer", url
.SetRequestHeader "Cache-Control", "no-cache"
.SetRequestHeader "Connection", "keep-alive"
.Send data
If .WaitForResponse Then
If .Status = 200 Then
If cook = "getcookie" Then
HttpRequest = GetCookie(.getAllResponseHeaders)
Else
HttpRequest =.responseText
End If
End If
End If
End With
End Function
Function GetCookie(str)
a = InStr(str, "Set-Cookie: ")
If a = 0 Then
GetCookie = ""
Else
b = InStr(a, str, ";")
c = Mid(str, a + 12, b - a - 11)
Cookie = c
Do
d = InStr(b, str, "Set-Cookie: ")
If d = 0 Then Exit Do
e = InStr(d, str, ";")
f = Mid(str, d + 12, e - d - 11)
b = e
Cookie = Cookie & " " & f
Loop
GetCookie = Cookie
End If
End Function
Sub download(url,target)
Set fso=CreateObject("scripting.FileSystemObject")
path = left(target,instrrev(target,"\"))
If Not FSO.FolderExists(path) Then fso.CreateFolder(path)
Const adTypeBinary = 1
Const adSaveCreateOverWrite = 2
Dim http,ado
Set http = CreateObject("Msxml2.XMLHTTP")
http.open "GET",url,False
http.send
Set ado = createobject("Adodb.Stream")
ado.Type = adTypeBinary
ado.Open
ado.Write http.responseBody
ado.SaveToFile target
ado.Close
End Sub
要说VBS爬虫,还得追溯到10年前,当时《电脑爱好者》杂志的博客质量挺高,当时有电脑,但没条件上网,就写了段VBS,把博客文章全部抓下来,以后慢慢看。当时这个VBS还在该杂志发表过(署名是当时已经分手的前女友)。
虽然都是用VBS爬,但技术还是有区别的,当时用的XMLHTTP和正则表达式,不能cookie登录,正则写起来也麻烦,现在用的WinHttpRequest和htmlfile对象,伪造header,cookie登录都没问题,使用htmlfile也比正则处理HTML方便很多。美图查不涉及到登录啥的,只用了一个GET,其他用法脚本里有实例,可以参考。 不错的文章,这几天正在写vb的爬虫,学习了! XMLHTTP和WININET都是系统管理cookie的
WINHTTP可以自己控制cookie(做自动刷单用的多 这个站我的。。。 xunfish 发表于 2020-2-22 09:12
这个站我的。。。
哦?哈哈,没真的爬哈,就试了两个。你把图片分享出来,就不用费事了。 出现这个错误是什么原因? 试试效果 楼主可以导出成exe吗,感谢
页:
[1]