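GeeTest Anti-Crawler Protection Analysis: Addendum on Decrypting the API Exchange
Last edited by besterChen on 2020-4-22 18:23

This post shares the notes from my analysis of GeeTest's anti-crawler protection, which I did last year in order to snap up sneakers. Because the write-up is long (and to earn a few more CB), I am splitting it into four topics, in the order I analyzed them:
1. [GeeTest Anti-Crawler Protection Analysis: The Interaction Flow](https://www.52pojie.cn/thread-1162853-1-1.html)
2. [GeeTest Anti-Crawler Protection Analysis: Decrypting the API Exchange](https://www.52pojie.cn/thread-1162893-1-1.html)
3. [GeeTest Anti-Crawler Protection Analysis: Addendum on Decrypting the API Exchange](https://www.52pojie.cn/thread-1162951-1-1.html)
4. [GeeTest Anti-Crawler Protection Analysis: Image Processing and Slide-Track Generation in Slide Mode](https://www.52pojie.cn/thread-1162979-1-1.html)
This is the third post, the addendum on decrypting the API exchange. Picking up where the previous post left off, two questions remained open:
1. How the AES key is generated.
2. What the value of e contains and how it is computed.
During later debugging I also found that GeeTest does not use plain Base64 encoding, so that needs a closer look as well. On to the main text.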
---
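## Base64 Encoding of the AES Ciphertext
While debugging, I found that the AES ciphertext is not encoded with standard Base64, as shown in the figure below:
The code looks as if the data is Base64-encoded before being submitted to the server, but when I exported the value of `o` and Base64-encoded it myself, my result did not match. So I stepped into the `$_BABW` method to continue the analysis, as shown below:
This shows that the final result is assembled as `["res"]` + `["end"]`. Stepping further into the `$_JJM` method and restoring the obfuscated code gives the following JS: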
```javascript
var Base64 = {
    // Map a 6-bit index to a character of the custom alphabet ("." if out of range).
    "$_JGH": function (e) {
        var t = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789()";
        return e < 0 || e >= t["length"] ? "." : t.charAt(e);
    },
    // Return bit t of e.
    "$_JIY": function (e, t) {
        return e >> t & 1;
    },
    "$_JJM": function (e, o) {
        var $this = this;
        // Collect the bits of e selected by mask t (high to low) into one value.
        function t(e, t) {
            for (var n = 0, r = 24 - 1; 0 <= r; r -= 1) {
                1 === $this["$_JIY"](t, r) && (n = (n << 1) + $this["$_JIY"](e, r));
            }
            return n;
        }
        for (var n = "", r = "", a = e["length"], s = 0; s < a; s += 3) {
            var c;
            if (s + 2 < a) {
                // Full group: pack three bytes into one 24-bit value.
                c = (e[s] << 16) + (e[s + 1] << 8) + e[s + 2];
                n += this["$_JGH"](t(c, 7274496)) + this["$_JGH"](t(c, 9483264)) + this["$_JGH"](t(c, 19220)) + this["$_JGH"](t(c, 235));
            } else {
                // Trailing group of one or two bytes; "." is the padding character.
                var u = a % 3;
                2 == u ? (c = (e[s] << 16) + (e[s + 1] << 8), n += this["$_JGH"](t(c, 7274496)) + this["$_JGH"](t(c, 9483264)) + this["$_JGH"](t(c, 19220)), r = ".") : 1 == u && (c = e[s] << 16, n += this["$_JGH"](t(c, 7274496)) + this["$_JGH"](t(c, 9483264)), r = "." + ".");
            }
        }
        return {'res': n, 'end': r};
    },
    "encode": function (e) {
        var t = this["$_JJM"](e);
        return t["res"] + t["end"];
    }
};
// Usage: pass the byte array to encode straight to the method.
// The bytes below are placeholders; substitute the array exported from the page.
var result = Base64["encode"]([1, 2, 3]);
console.log(result);
```
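One way to sanity-check the restored encoder is a round trip: if the four masks really split each 24-bit group into four 6-bit indices, the mapping can be inverted. The sketch below is my own compact rewrite, not GeeTest code; the names `gather`, `scatter`, `encode` and `decode` are made up for illustration, and it assumes well-formed input that uses only characters from the alphabet.

```javascript
// Masks and alphabet copied from the restored encoder.
var MASKS = [7274496, 9483264, 19220, 235];
var ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789()";

// Collect the bits of `value` selected by `mask` (high to low) into a 6-bit index.
function gather(value, mask) {
  var n = 0;
  for (var r = 23; r >= 0; r -= 1) {
    if ((mask >> r) & 1) n = (n << 1) + ((value >> r) & 1);
  }
  return n;
}

// Inverse of gather: put the 6 bits of `index` back at the positions set in `mask`.
function scatter(index, mask) {
  var out = 0, bit = 5;
  for (var r = 23; r >= 0; r -= 1) {
    if ((mask >> r) & 1) {
      out |= ((index >> bit) & 1) << r;
      bit -= 1;
    }
  }
  return out;
}

function encode(bytes) {
  var res = "", end = "";
  for (var s = 0; s < bytes.length; s += 3) {
    var left = bytes.length - s; // bytes available for this group
    var c = (bytes[s] << 16) + ((bytes[s + 1] || 0) << 8) + (bytes[s + 2] || 0);
    var chars = left >= 3 ? 4 : left + 1; // 1 byte -> 2 chars, 2 bytes -> 3 chars
    for (var k = 0; k < chars; k += 1) res += ALPHABET.charAt(gather(c, MASKS[k]));
    if (left < 3) end = left === 2 ? "." : "..";
  }
  return res + end;
}

function decode(text) {
  var body = text.replace(/\.+$/, ""); // drop the "." padding
  var bytes = [];
  for (var i = 0; i < body.length; i += 4) {
    var chunk = body.slice(i, i + 4);
    var c = 0;
    for (var k = 0; k < chunk.length; k += 1) {
      c |= scatter(ALPHABET.indexOf(chunk.charAt(k)), MASKS[k]);
    }
    var count = chunk.length === 4 ? 3 : chunk.length - 1; // bytes in this group
    bytes = bytes.concat([(c >> 16) & 255, (c >> 8) & 255, c & 255].slice(0, count));
  }
  return bytes;
}

console.log(encode([255]));                    // ")w.."
console.log(decode(encode([1, 2, 3, 4, 5])));  // [ 1, 2, 3, 4, 5 ]
```

The design point is that the scheme differs from standard Base64 in two ways: the alphabet ends in `()` with `.` as padding, and each output character takes its six bits from scattered positions rather than from consecutive ones.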
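Exporting the array that the official site encodes and running it through this code gives the result shown below:
## How the AES Key Is Generated
Tracing further up to find where the AES key is generated leads to the following location:
Stepping into the `wr` method gives the following code:
From this, the AES key generation boils down to: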
``` javascript
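/**
 * How the AES key is obtained
 */
// Returns four random hex digits (this is `wr` in the obfuscated source).
function create_key() {
    return (65536 * (1 + Math.random()) | 0).toString(16).substring(1);
}
// Four chunks of four hex digits give a 16-character key.
function get_aes_key() {
    return create_key() + create_key() + create_key() + create_key();
}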
```
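To see why that one-liner always yields exactly four hex digits, it helps to split it into steps. The sketch below is illustrative only: `hexChunk` and `aesKey` are names I made up, and passing the random source in as `rand` is an assumption added so the steps can be checked with fixed inputs.

```javascript
// `rand` stands in for Math.random; injecting it is an illustrative choice.
function hexChunk(rand) {
  var scaled = 65536 * (1 + rand());      // in [0x10000, 0x20000)
  var asInt = scaled | 0;                 // truncate to an integer
  return asInt.toString(16).substring(1); // "1xxxx" -> "xxxx", zero padding kept
}

function aesKey(rand) {
  rand = rand || Math.random;
  return hexChunk(rand) + hexChunk(rand) + hexChunk(rand) + hexChunk(rand);
}

console.log(hexChunk(function () { return 0; }));   // "0000"
console.log(hexChunk(function () { return 0.5; })); // "8000"
console.log(aesKey().length);                       // 16
```

Adding 1 before scaling guarantees a leading hex digit `1`, so dropping it with `substring(1)` leaves a zero-padded four-digit chunk even when the random value is small.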
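## Where the Value of e Comes From
Tracing further, I found that e is built by reading the following items, in order, from the browser's window object: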
```
0: "textLength"
1: "HTMLLength"
2: "documentMode"
3: "A"
4: "ARTICLE"
5: "ASIDE"
6: "AUDIO"
7: "BASE"
8: "BUTTON"
9: "CANVAS"
10: "CODE"
11: "IFRAME"
12: "IMG"
13: "INPUT"
14: "LABEL"
15: "LINK"
16: "NAV"
17: "OBJECT"
18: "OL"
19: "PICTURE"
20: "PRE"
21: "SECTION"
22: "SELECT"
23: "SOURCE"
24: "SPAN"
25: "STYLE"
26: "TABLE"
27: "TEXTAREA"
28: "VIDEO"
29: "screenLeft"
30: "screenTop"
31: "screenAvailLeft"
32: "screenAvailTop"
33: "innerWidth"
34: "innerHeight"
35: "outerWidth"
36: "outerHeight"
37: "browserLanguage"
38: "browserLanguages"
39: "systemLanguage"
40: "devicePixelRatio"
41: "colorDepth"
42: "userAgent"
43: "cookieEnabled"
44: "netEnabled"
45: "screenWidth"
46: "screenHeight"
47: "screenAvailWidth"
48: "screenAvailHeight"
49: "localStorageEnabled"
50: "sessionStorageEnabled"
51: "indexedDBEnabled"
52: "CPUClass"
53: "platform"
54: "doNotTrack"
55: "timezone"
56: "canvas2DFP"
57: "canvas3DFP"
58: "plugins"
59: "maxTouchPoints"
60: "flashEnabled"
61: "javaEnabled"
62: "hardwareConcurrency"
63: "jsFonts"
64: "timestamp"
65: "performanceTiming"
66: "internalip"
67: "mediaDevices"
68: "DIV"
69: "P"
70: "UL"
71: "LI"
72: "SCRIPT"
73: "deviceorientation"
74: "touchEvent"
```
The values for these keys are:
```
A: 337
BUTTON: 1
CPUClass: undefined
DIV: 648
HTMLLength: 185450
IFRAME: 7
IMG: 60
INPUT: 48
LABEL: 9
LI: 222
LINK: 17
P: 75
SCRIPT: 69
SECTION: 1
SPAN: 239
STYLE: 2
UL: 49
browserLanguage: "zh-CN"
browserLanguages: "zh-CN,zh,zh-TW,zh-HK,en-US,en"
canvas2DFP: "805a6cdeadd4f48ade985597f74928cb"
canvas3DFP: "cc03697d39800df1ef0d2229132a62e8"
colorDepth: 24
cookieEnabled: 1
devicePixelRatio: 1
deviceorientation: false
doNotTrack: 1
documentMode: "CSS1Compat"
flashEnabled: -1
hardwareConcurrency: 4
indexedDBEnabled: 1
innerHeight: 443
innerWidth: 1634
internalip: undefined
javaEnabled: 0
jsFonts: "AndaleMono,Arial,ArialBlack,ArialNarrow,ArialRoundedMTBold,ArialUnicodeMS,ComicSansMS,Courier,CourierNew,Geneva,Georgia,Helvetica,HelveticaNeue,Impact,LUCIDAGRANDE,MicrosoftSansSerif,Monaco,Palatino,Tahoma,Times,TimesNewRoman,TrebuchetMS,Verdana"
localStorageEnabled: 1
maxTouchPoints: 0
mediaDevices: -1
netEnabled: 1
outerHeight: 1027
outerWidth: 1634
performanceTiming: "-1,-1,0,0,0,0,0,123,208,1,99154,10,8,403,418,1005,-1,-1,-1,-1"
platform: "MacIntel"
plugins: ""
screenAvailHeight: 1027
screenAvailLeft: 44
screenAvailTop: 23
screenAvailWidth: 1636
screenHeight: 1050
screenLeft: 46
screenTop: 23
screenWidth: 1680
sessionStorageEnabled: 1
systemLanguage: undefined
textLength: 46216
timestamp: 1568186269219
timezone: -8
touchEvent: false
userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:69.0) Gecko/20100101 Firefox/69.0"
```
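Collect these values into an array and join them with `!!` to get the value of e. Since the page content is fairly stable, you can also just reuse a fixed string and skip regenerating it on every request.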
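The join step can be sketched as follows. `buildE` is a hypothetical helper, and the example uses only the first four keys from the list above. The post does not say how keys without a value (for example `ARTICLE`, which is absent from the dump) are serialized, so the placeholder is left to the caller; the `-1` used here is purely an assumption.

```javascript
// Hypothetical helper: join fingerprint values in key order with "!!".
// `missing` is the placeholder for keys that have no value (an assumption).
function buildE(keys, values, missing) {
  return keys.map(function (k) {
    var v = values[k];
    return v === undefined ? missing : v;
  }).join("!!");
}

// First four keys and their values, taken from the lists above.
var keys = ["textLength", "HTMLLength", "documentMode", "A"];
var values = { textLength: 46216, HTMLLength: 185450, documentMode: "CSS1Compat", A: 337 };

console.log(buildE(keys, values, -1)); // "46216!!185450!!CSS1Compat!!337"
```

Keeping the key order in a separate array matters because the server presumably reads the fields by position, so the order of the key list, not of the value object, has to drive the output.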
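That completes the analysis of how the parameters in the API exchange are encrypted and decrypted. In the next and final post, we will look at how to get past Slide, the verification component billed as big-data-driven intelligent behavior verification: [GeeTest Anti-Crawler Protection Analysis: Image Processing and Slide-Track Generation in Slide Mode](https://www.52pojie.cn/thread-1162979-1-1.html)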
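Hatsune_miku posted on 2020-4-23 18:11
Hello, I have basically worked through all four of your steps, but there are still a few things I am unclear about and would like to ask.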
![](https://ae01.alicdn. ...
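Question 1:
I think what you are after is the plaintext before it is serialized and encrypted. `AoWMT.Dpx` is the decoder for the obfuscated JS source: it is given an index as its argument, and that index determines which serialization method is meant.
Single-step through it and you should be able to find the serialization method without much trouble.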
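As a toy illustration of that pattern (not GeeTest's actual code), the sketch below uses a hypothetical `lookup` function in place of a decoder such as `AoWMT.Dpx`, and a made-up three-entry string table. Real obfuscated bundles typically store the table in encoded form and it is far larger. Using `JSON.stringify` as the resolved method is only an example and says nothing about which method GeeTest actually calls.

```javascript
// Toy string table; all entries are made up for illustration.
var TABLE = ["stringify", "parse", "length"];

// Hypothetical stand-in for the decoder: index in, property name out.
function lookup(index) {
  return TABLE[index];
}

// Obfuscated-style call site: the method name only becomes visible at run time.
var text = JSON[lookup(0)]({ a: 1 });
console.log(text); // {"a":1}

// Wrapping the decoder records every name it resolves, which helps narrow down
// which call to step into while debugging.
function traced(fn, log) {
  return function (index) {
    var name = fn(index);
    log.push(index + " -> " + name);
    return name;
  };
}

var log = [];
var tracedLookup = traced(lookup, log);
JSON[tracedLookup(1)]("[1,2]");
console.log(log); // [ '1 -> parse' ]
```

Searching the source for the decoder name returns every call site, which is why a text search is overwhelming; logging or breaking on the resolved names at run time is usually the faster route.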
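Question 2:
The gap distance is simply the x position, scanning from left to right, of the first pixel at which the two images differ. Roughly, the code is: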
```python
import io

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt


class ImageContainer(object):
    def __init__(self):
        self.__bg_image = None
        self.__fullbg_image = None

    def init(self, bg, fullbg):
        # bg and fullbg are the raw bytes of the two downloaded images.
        self.__bg_image = self.__recover(Image.open(io.BytesIO(bg)))
        self.__fullbg_image = self.__recover(Image.open(io.BytesIO(fullbg)))

    def showBgImage(self):
        if self.__bg_image is None:
            raise AttributeError('Background image has not been set')
        plt.imshow(self.__bg_image)
        plt.show()

    def showFullBgImage(self):
        if self.__fullbg_image is None:
            raise AttributeError('Background image has not been set')
        plt.imshow(self.__fullbg_image)
        plt.show()

    def __sequence(self):
        # Order of the 52 shuffled tiles (2 rows of 26).
        t = 0
        n = []
        e = "6_11_7_10_4_12_3_1_0_5_2_9_8".split("_")
        for r in range(0, 52):
            t = 2 * int(e[r % 26 // 2]) + r % 2
            if 0 == int(r / 2) % 2:
                t += -1 if (r % 2) else 1
            t += 26 if (r < 26) else 0
            n.append(t)
        return n

    def __recover(self, _img):
        """
        Restore the shuffled image.
        @Param _img: the image
        @Return new img
        """
        r = 160
        a = int(r / 2)
        np_image = np.array(_img.convert('RGB'))
        _seq = self.__sequence()
        new_np_img = np.zeros((160, 312, 3), dtype=np.uint8)
        for u in range(0, 52):
            # Source tile: 10 px wide, laid out on a 12 px pitch with a 1 px offset.
            c = _seq[u] % 26 * 12 + 1
            _ = int(a if (25 < _seq[u]) else 0)
            # Destination tile position in the restored image.
            xpos = u % 26 * 10
            ypos = a if (25 < u) else 0
            slice_img = np_image[_:_ + a, c:c + 10]
            n = len(slice_img)
            new_np_img[ypos:ypos + n, xpos:xpos + 10] = slice_img
        return Image.fromarray(new_np_img).crop((0, 0, 260, 160))

    def __is_similar(self, x, y):
        """
        Check whether the two images have (nearly) the same pixel.
        :param x: x position
        :param y: y position
        :return: whether the pixels match
        """
        bg_pixel = self.__bg_image.getpixel((x, y))
        fullbg_pixel = self.__fullbg_image.getpixel((x, y))
        for i in range(0, 3):
            if abs(bg_pixel[i] - fullbg_pixel[i]) >= 30:
                return False
        return True

    def get_distance(self):
        '''
        Locate the gap.
        :return: the distance to slide
        '''
        border_width = 3  # width of the shadow on the slice image
        for x in range(60, self.__bg_image.size[0]):
            for y in range(0, self.__bg_image.size[1]):
                if not self.__is_similar(x, y):
                    return x - border_width
```
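Since this part of the code has little to do with GeeTest itself, you should be able to use it as is.

Each site calls it in a slightly different way.

Uh, look closely and the title is a bit naughty...

清大疯 posted on 2020-4-22 18:58
Uh, look closely and the title is a bit naughty...

Hahaha, and here I thought nobody got me!

It would be perfect if one day my crawler could snap up sneakers too.

I'd like to learn this, but I never have.

Just passing by to take a look 3456

Solid content. I'll follow along and read more.

Hello, I have basically worked through all four of your steps, but there are still a few things I am unclear about and would like to ask.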
![](https://ae01.alicdn.com/kf/U3c8f92866c56492abc75b05727d44905o.png)
1. As shown, the value of `ciphertext` is `o`, and this `o` is an array obtained by encrypting the plaintext. ![](https://ae01.alicdn.com/kf/U807c912cb90f475aa0e6a0c4b8bff7b8O.png) As shown, it is produced by calling the `encrypt1` function with the two arguments below, and `encrypt1` in turn calls `encrypt` ![](https://ae01.alicdn.com/kf/U09468c7d00c945cab2d6495352d4c40e7.png). I searched for `AoWMT.Dpx`, but it covers far too much. Could you explain how you arrived at this array?
2. How do I calculate the gap distance?