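Automa: Xiaohongshu Batch Collection and Download Notes
Last edited by mengnimen on 2024-5-16 15:36
I previously wrote up a plugin for downloading watermark-free images and videos from a single Xiaohongshu post; see the link below.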
https://www.52pojie.cn/thread-1902368-1-1.html
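Later, someone from my chat group asked whether it could search by keyword and collect posts automatically. I was in the hospital at the time and had no time to rework it, so it sat for a while.
Today I finally got this half-finished version out. I call it half-finished because the search keyword has to be entered into the plugin by hand, and it only collects images (which is what the group member needed at the time); it does not collect post text or comments (those can be added if needed).
I'll also take the chance to document the whole process.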
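The hard part this time was loading the posts. I can't program, so I'm not entirely clear on how the posts get loaded. My understanding is that if you use the HTTP Request block to load the page data and pull the post links from it, you only get about 11 post links (I've forgotten the exact number). You can't load every post you want to collect in one go, so running completely silently doesn't work (maybe my skills just aren't up to it; if anyone more experienced knows how, please point me in the right direction).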
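So I used the scroll-element approach to get the post links: first read the data on the first page and grab the links, then scroll the element and repeat. The collected links are stored in the table, and finally the workflow goes through them one by one to collect the content of each post you need. My plugin only collects images. For text, here is the idea: the HTTP response used to get the image URLs already contains the text content, so it can be saved directly. Anyone who needs this can add it themselves~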
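The scroll-and-collect loop described above can be sketched in plain JavaScript. This is only an illustration, not the workflow's actual implementation: `collectNewLinks` and the sample hrefs are hypothetical, and it assumes that re-reading the link list after each scroll can return links that were already seen.

```javascript
// Hypothetical helper: merge the hrefs read after one scroll pass into the
// running set, returning only the links that have not been seen before.
function collectNewLinks(seen, hrefs) {
  const fresh = [];
  for (const href of hrefs) {
    if (!href || seen.has(href)) continue; // skip empty and repeated links
    seen.add(href);
    fresh.push(href);
  }
  return fresh;
}

// Simulated results of three "read links, then scroll" rounds (made-up data).
const passes = [
  ['/explore/a1', '/explore/a2'],
  ['/explore/a2', '/explore/a3'],
  ['/explore/a3'],
];

const seen = new Set();
const table = []; // stands in for the workflow's URL table column
for (const hrefs of passes) {
  table.push(...collectNewLinks(seen, hrefs));
}

console.log(table); // [ '/explore/a1', '/explore/a2', '/explore/a3' ]
```

Keeping a `Set` alongside the table means each post is fetched only once later on, even if the same card shows up in several scroll rounds.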
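Update 2024-05-16: I originally put this together while I was ill and forgot where something wasn't wired up correctly; I had time today to fix it 😂
Basic batch image collection now works. When I have time later, I'll look into collecting the post text as well.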
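For anyone who wants to try the text idea mentioned above, here is a minimal sketch of pulling a `<meta>` value out of the fetched HTML, in the same style as the image step. The tag names `og:title` and `description`, the helper names, and the sample HTML are all assumptions for illustration; check the real response to see which tags actually carry the text.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: return the content of every <meta name="..."> tag
// with the given name, in document order.
function extractMetaContent(html, name) {
  const pattern = new RegExp(
    '<meta name="' + escapeRegExp(name) + '" content="([^"]*)"',
    'g'
  );
  const values = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    values.push(match[1]); // capture group 1 is the content attribute
  }
  return values;
}

// Made-up HTML; the tag names are assumptions, not confirmed page markup.
const sampleHtml =
  '<meta name="og:title" content="Sample note title">' +
  '<meta name="description" content="Sample note body">';

console.log(extractMetaContent(sampleHtml, 'og:title'));    // [ 'Sample note title' ]
console.log(extractMetaContent(sampleHtml, 'description')); // [ 'Sample note body' ]
```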
{
"description": "",
"drawflow": {
"edges": [
{
"data": {},
"events": {},
"id": "vueflow__edge-TvK_KmDIlvK9RdXXpX2ujTvK_KmDIlvK9RdXXpX2uj-output-1-54y39ed54y39ed-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "TvK_KmDIlvK9RdXXpX2uj",
"sourceHandle": "TvK_KmDIlvK9RdXXpX2uj-output-1",
"sourceX": 308.0877461062106,
"sourceY": 410.5,
"target": "54y39ed",
"targetHandle": "54y39ed-input-1",
"targetX": 343,
"targetY": 410,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-54y39ed54y39ed-output-1-eufu5ideufu5id-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "54y39ed",
"sourceHandle": "54y39ed-output-1",
"sourceX": 575,
"sourceY": 410,
"target": "eufu5id",
"targetHandle": "eufu5id-input-1",
"targetX": 623.7643822245627,
"targetY": 409.9084018682959,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-eufu5ideufu5id-output-1-m5s98eam5s98ea-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "eufu5id",
"sourceHandle": "eufu5id-output-1",
"sourceX": 855.7643822245627,
"sourceY": 409.9084018682959,
"target": "m5s98ea",
"targetHandle": "m5s98ea-input-1",
"targetX": 783.093626520155,
"targetY": 276.2060439657272,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-m5s98eam5s98ea-output-1-hs3v4r4hs3v4r4-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "m5s98ea",
"sourceHandle": "m5s98ea-output-1",
"sourceX": 1015.0936875553112,
"sourceY": 276.2060439657272,
"target": "hs3v4r4",
"targetHandle": "hs3v4r4-input-1",
"targetX": 951.5243806611331,
"targetY": 410.894404816628,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-hs3v4r4hs3v4r4-output-1-mwohqyomwohqyo-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "hs3v4r4",
"sourceHandle": "hs3v4r4-output-1",
"sourceX": 1183.5244416962894,
"sourceY": 410.894404816628,
"target": "mwohqyo",
"targetHandle": "mwohqyo-input-1",
"targetX": 1110.9932469726484,
"targetY": 277.6049874658771,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-mwohqyomwohqyo-output-1-80j8ex980j8ex9-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "mwohqyo",
"sourceHandle": "mwohqyo-output-1",
"sourceX": 1342.9932469726484,
"sourceY": 277.6049874658771,
"target": "80j8ex9",
"targetHandle": "80j8ex9-input-1",
"targetX": 1280.1611027304355,
"targetY": 408.5504076589536,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-80j8ex980j8ex9-output-1-o8oow83o8oow83-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "80j8ex9",
"sourceHandle": "80j8ex9-output-1",
"sourceX": 1512.1611027304355,
"sourceY": 408.5504076589536,
"target": "o8oow83",
"targetHandle": "o8oow83-input-1",
"targetX": 118.806251918322,
"targetY": 563.9159349280665,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-o8oow83o8oow83-output-1-0qxw4w90qxw4w9-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "o8oow83",
"sourceHandle": "o8oow83-output-1",
"sourceX": 350.806251918322,
"sourceY": 563.9159349280665,
"target": "0qxw4w9",
"targetHandle": "0qxw4w9-input-1",
"targetX": 422.24742841926,
"targetY": 535.9223224697537,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-hogy4flhogy4fl-output-2-o8oow83o8oow83-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "hogy4fl",
"sourceHandle": "hogy4fl-output-2",
"sourceX": 1522.889453522408,
"sourceY": 611.6868383642357,
"target": "o8oow83",
"targetHandle": "o8oow83-input-1",
"targetX": 118.806251918322,
"targetY": 563.9159349280665,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-0qxw4w90qxw4w9-output-1-edx9ekjedx9ekj-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "0qxw4w9",
"sourceHandle": "0qxw4w9-output-1",
"sourceX": 654.24742841926,
"sourceY": 535.9223224697537,
"target": "edx9ekj",
"targetHandle": "edx9ekj-input-1",
"targetX": 788.7820889101317,
"targetY": 534.4890350106167,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-edx9ekjedx9ekj-output-1-hogy4flhogy4fl-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "edx9ekj",
"sourceHandle": "edx9ekj-output-1",
"sourceX": 1020.782149945288,
"sourceY": 534.4890350106167,
"target": "hogy4fl",
"targetHandle": "hogy4fl-input-1",
"targetX": 1226.889453522408,
"targetY": 569.0930883642357,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-hogy4flhogy4fl-output-1-pao9qrwpao9qrw-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "hogy4fl",
"sourceHandle": "hogy4fl-output-1",
"sourceX": 1522.889453522408,
"sourceY": 569.0930883642357,
"target": "pao9qrw",
"targetHandle": "pao9qrw-input-1",
"targetX": 119.15603518906906,
"targetY": 736.8739305650223,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-pao9qrwpao9qrw-output-1-lull0balull0ba-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "pao9qrw",
"sourceHandle": "pao9qrw-output-1",
"sourceX": 351.1560688558652,
"sourceY": 736.8739305650223,
"target": "lull0ba",
"targetHandle": "lull0ba-input-1",
"targetX": 416.54144318687963,
"targetY": 744.625517672512,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-43m1zd543m1zd5-output-1-9o1dhq19o1dhq1-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "43m1zd5",
"sourceHandle": "43m1zd5-output-1",
"sourceX": 952.9252904064685,
"sourceY": 741.8616018417424,
"target": "9o1dhq1",
"targetHandle": "9o1dhq1-input-1",
"targetX": 1013.4649022953961,
"targetY": 741.6235529858694,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-9o1dhq19o1dhq1-output-1-iyrpt6fiyrpt6f-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "9o1dhq1",
"sourceHandle": "9o1dhq1-output-1",
"sourceX": 1245.464902295396,
"sourceY": 741.6235529858694,
"target": "iyrpt6f",
"targetHandle": "iyrpt6f-input-1",
"targetX": 1288.1746846427748,
"targetY": 740.3481899711979,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-iyrpt6fiyrpt6f-output-1-n0e3e5yn0e3e5y-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "iyrpt6f",
"sourceHandle": "iyrpt6f-output-1",
"sourceX": 1520.1746999015638,
"sourceY": 740.3481899711979,
"target": "n0e3e5y",
"targetHandle": "n0e3e5y-input-1",
"targetX": 124.49473234998618,
"targetY": 941.2731826757098,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-lull0balull0ba-output-1-43m1zd543m1zd5-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "lull0ba",
"sourceHandle": "lull0ba-output-1",
"sourceX": 648.5414431868796,
"sourceY": 744.625517672512,
"target": "43m1zd5",
"targetHandle": "43m1zd5-input-1",
"targetX": 720.9252904064685,
"targetY": 741.8616018417424,
"type": "custom",
"updatable": true
},
{
"data": {},
"events": {},
"id": "vueflow__edge-n0e3e5yn0e3e5y-output-1-xi74rl9xi74rl9-input-1",
"markerEnd": "arrowclosed",
"selectable": true,
"source": "n0e3e5y",
"sourceHandle": "n0e3e5y-output-1",
"sourceX": 356.4947323499862,
"sourceY": 941.2731826757098,
"target": "xi74rl9",
"targetHandle": "xi74rl9-input-1",
"targetX": 498.9295145309519,
"targetY": 975.4799311894167,
"type": "custom",
"updatable": true
}
],
"nodes": [
{
"data": {
"activeInInput": false,
"contextMenuName": "",
"contextTypes": [],
"date": "",
"days": [],
"delay": 5,
"description": "",
"disableBlock": false,
"interval": 60,
"isUrlRegex": false,
"observeElement": {
"baseElOptions": {
"attributeFilter": [],
"attributes": false,
"characterData": false,
"childList": true,
"subtree": false
},
"baseSelector": "",
"matchPattern": "",
"selector": "",
"targetOptions": {
"attributeFilter": [],
"attributes": false,
"characterData": false,
"childList": true,
"subtree": false
}
},
"parameters": [],
"preferParamsInTab": false,
"shortcut": "",
"time": "00:00",
"type": "manual",
"url": ""
},
"events": {},
"id": "TvK_KmDIlvK9RdXXpX2uj",
"label": "trigger",
"position": {
"x": 96.0877461062106,
"y": 374.5
},
"type": "BlockBasic"
},
{
"data": {
"active": true,
"customUserAgent": false,
"description": "Open a new Xiaohongshu tab",
"disableBlock": false,
"inGroup": false,
"tabZoom": 1,
"updatePrevTab": false,
"url": "https://www.xiaohongshu.com/explore",
"userAgent": "",
"waitTabLoaded": true
},
"events": {},
"id": "54y39ed",
"label": "new-tab",
"position": {
"x": 363,
"y": 374
},
"type": "BlockBasic"
},
{
"data": {
"assignVariable": false,
"clearValue": true,
"dataColumn": "",
"delay": "0",
"description": "Set the search keyword",
"disableBlock": false,
"events": [],
"findBy": "cssSelector",
"getValue": false,
"markEl": false,
"multiple": false,
"optionPosition": "1",
"saveData": false,
"selectOptionBy": "value",
"selected": true,
"selector": "input#search-input",
"type": "text-field",
"value": "减肥",
"variableName": "",
"waitForSelector": true,
"waitSelectorTimeout": 5000
},
"events": {},
"id": "eufu5id",
"label": "forms",
"position": {
"x": 643.7643822245627,
"y": 373.9084018682959
},
"type": "BlockBasic"
},
{
"data": {
"description": "Click “Search”",
"disableBlock": false,
"findBy": "cssSelector",
"markEl": false,
"multiple": false,
"selector": ".search-icon > .reds-icon",
"waitForSelector": false,
"waitSelectorTimeout": 5000
},
"events": {},
"id": "hs3v4r4",
"label": "event-click",
"position": {
"x": 971.5243806611331,
"y": 374.894404816628
},
"type": "BlockBasic"
},
{
"data": {
"description": "Select the “图文” (image-and-text) tab",
"disableBlock": false,
"findBy": "cssSelector",
"markEl": false,
"multiple": false,
"selector": ".content-container > .channel:nth-child(2)",
"waitForSelector": true,
"waitSelectorTimeout": 5000
},
"events": {},
"id": "80j8ex9",
"label": "event-click",
"position": {
"x": 1300.1611027304355,
"y": 372.5504076589536
},
"type": "BlockBasic"
},
{
"data": {
"disableBlock": false,
"time": 500
},
"events": {},
"id": "m5s98ea",
"label": "delay",
"position": {
"x": 803.093626520155,
"y": 217.62010646572716
},
"type": "BlockDelay"
},
{
"data": {
"disableBlock": false,
"time": 500
},
"events": {},
"id": "mwohqyo",
"label": "delay",
"position": {
"x": 1130.9932469726484,
"y": 219.01904996587712
},
"type": "BlockDelay"
},
{
"data": {
"action": "get",
"addExtraRow": false,
"assignVariable": true,
"attributeName": "href",
"attributeValue": "",
"dataColumn": "yke_v",
"description": "Get URL",
"disableBlock": false,
"extraRowDataColumn": "",
"extraRowValue": "",
"findBy": "xpath",
"markEl": false,
"multiple": true,
"saveData": true,
"selector": "id(\"global\")/DIV/DIV/DIV/DIV/SECTION/DIV/A",
"variableName": "url",
"waitForSelector": true,
"waitSelectorTimeout": 5000
},
"events": {},
"id": "edx9ekj",
"label": "attribute-value",
"position": {
"x": 808.7820889101317,
"y": 498.4890350106167
},
"type": "BlockBasic"
},
{
"data": {
"disableBlock": false,
"time": "2500"
},
"events": {},
"id": "o8oow83",
"label": "delay",
"position": {
"x": 138.806251918322,
"y": 505.32999742806646
},
"type": "BlockDelay"
},
{
"data": {
"description": "Scroll down",
"disableBlock": false,
"findBy": "cssSelector",
"incX": false,
"incY": true,
"markEl": false,
"multiple": true,
"scrollIntoView": false,
"scrollX": 0,
"scrollY": 10000,
"selector": "html",
"smooth": true,
"waitForSelector": false,
"waitSelectorTimeout": 5000
},
"events": {},
"id": "0qxw4w9",
"label": "element-scroll",
"position": {
"x": 442.24742841926,
"y": 499.9223224697537
},
"type": "BlockBasic"
},
{
"data": {
"disableBlock": false,
"repeatFor": "1"
},
"events": {},
"id": "hogy4fl",
"label": "repeat-task",
"position": {
"x": 1246.889453522408,
"y": 498.50715086423565
},
"type": "BlockRepeatTask"
},
{
"data": {
"description": "Loop over each URL in the table",
"disableBlock": false,
"elementSelector": "",
"fromNumber": 1,
"loopData": "[]",
"loopId": "A6uM_U",
"loopThrough": "data-columns",
"maxLoop": "0",
"referenceKey": "",
"resumeLastWorkflow": false,
"reverseLoop": false,
"startIndex": 0,
"toNumber": 10,
"variableName": "url",
"waitForSelector": false,
"waitSelectorTimeout": 5000
},
"events": {},
"id": "pao9qrw",
"label": "loop-data",
"position": {
"x": 139.15608252583752,
"y": 698.8739192561575
},
"type": "BlockBasic"
},
{
"data": {
"assignVariable": true,
"body": "{}",
"contentType": "json",
"dataColumn": "",
"dataPath": "",
"description": "Fetch the current page's text",
"disableBlock": false,
"headers": [],
"method": "GET",
"responseType": "text",
"saveData": false,
"timeout": 10000,
"url": "{{loopData.A6uM_U.URL}}",
"variableName": "html"
},
"events": {},
"id": "lull0ba",
"label": "webhook",
"position": {
"x": 436.54144318687963,
"y": 696.625517672512
},
"type": "BlockBasicWithFallback"
},
{
"data": {
"clearLoop": false,
"disableBlock": false,
"loopId": "A6uM_U"
},
"events": {},
"id": "xi74rl9",
"label": "loop-breakpoint",
"position": {
"x": 518.9295145309519,
"y": 899.8939936894167
},
"type": "BlockLoopBreakpoint"
},
{
"data": {
"code": "// htmlContent holds the HTML returned by the HTTP GET request, still containing escape characters\nlet htmlContent = automaRefData('variables', 'html');\n\n// Strip every escaping backslash\nhtmlContent = htmlContent.replace(/\\\\/g, '');\n\n// With the escapes gone, the HTML can be matched with a regular expression\nconst urlRegex = /<meta name=\"og:image\" content=\"([^\"]+)\"/g;\nlet matches;\nconst urls = new Set(); // a Set drops duplicate URLs automatically\n\nwhile ((matches = urlRegex.exec(htmlContent)) !== null) {\n // matches[1] is the captured URL; adding the whole match array would defeat the Set's de-duplication\n urls.add(matches[1]);\n}\n\n// Convert the Set to an array of unique og:image URLs\nconst uniqueUrls = Array.from(urls);\n\n// Store the resulting array in the variable tt via automaSetVariable\nautomaSetVariable('tt', uniqueUrls);\n",
"context": "website",
"description": "Extract image URLs",
"disableBlock": false,
"everyNewTab": false,
"preloadScripts": [],
"runBeforeLoad": false,
"timeout": 20000
},
"events": {},
"id": "43m1zd5",
"label": "javascript-code",
"position": {
"x": 740.9252904064685,
"y": 705.8616018417424
},
"type": "BlockBasic"
},
{
"data": {
"description": "Loop over image URLs",
"disableBlock": false,
"elementSelector": "",
"fromNumber": 0,
"loopData": "[]",
"loopId": "uukWs-",
"loopThrough": "variable",
"maxLoop": "3",
"referenceKey": "",
"resumeLastWorkflow": false,
"reverseLoop": false,
"startIndex": 0,
"toNumber": 2,
"variableName": "tt",
"waitForSelector": false,
"waitSelectorTimeout": 5000
},
"events": {},
"id": "9o1dhq1",
"label": "loop-data",
"position": {
"x": 1033.464902295396,
"y": 703.6235529858694
},
"type": "BlockBasic"
},
{
"data": {
"assignVariable": false,
"dataColumn": "",
"description": "Name image by timestamp",
"disableBlock": false,
"filename": "{{$date(\"timestamp\")}}.jpeg",
"findBy": "cssSelector",
"markEl": false,
"multiple": false,
"onConflict": "uniquify",
"saveData": true,
"saveDownloadIds": false,
"saveToGDrive": false,
"selector": "",
"type": "url",
"url": "{{loopData.uukWs-}}",
"variableName": "",
"waitForSelector": false,
"waitSelectorTimeout": 5000
},
"events": {},
"id": "iyrpt6f",
"label": "save-assets",
"position": {
"x": 1308.1746846427748,
"y": 704.3481899711979
},
"type": "BlockBasic"
},
{
"data": {
"clearLoop": false,
"disableBlock": false,
"loopId": "uukWs-"
},
"events": {},
"id": "n0e3e5y",
"label": "loop-breakpoint",
"position": {
"x": 144.49473234998618,
"y": 865.6872451757098
},
"type": "BlockLoopBreakpoint"
}
],
"position": [
33.213036405202274,
-41.87621174514811
],
"viewport": {
"x": 33.213036405202274,
"y": -41.87621174514811,
"zoom": 0.5112142653049623
},
"zoom": 0.5112142653049623
},
"extVersion": "1.28.27",
"globalData": "{\n\t\"key\": \"value\"\n}",
"icon": "riGlobalLine",
"includedWorkflows": {},
"name": "Xiaohongshu Search",
"settings": {
"blockDelay": 0,
"debugMode": false,
"defaultColumnName": "column",
"execContext": "popup",
"executedBlockOnWeb": false,
"inputAutocomplete": true,
"insertDefaultColumn": false,
"notification": true,
"onError": "stop-workflow",
"publicId": "",
"restartTimes": 3,
"reuseLastState": false,
"saveLog": true
},
"table": [
{
"id": "yke_v",
"name": "URL",
"type": "string"
}
],
"version": "1.28.27"
}
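The image-extraction step inside the workflow's `javascript-code` block is hard to read as an escaped JSON string, so here is a standalone sketch of the same idea that runs outside Automa. The function name `extractImageUrls` and the sample HTML are made up for illustration; inside Automa the input would come from `automaRefData('variables', 'html')` and the result would be passed to `automaSetVariable('tt', ...)`, as in the workflow.

```javascript
// Sketch: collect the unique og:image URLs from an HTML string whose quotes
// are escaped with backslashes.
function extractImageUrls(rawHtml) {
  // Remove every backslash so the attribute quotes become plain quotes.
  const html = rawHtml.replace(/\\/g, '');

  const pattern = /<meta name="og:image" content="([^"]+)"/g;
  const urls = new Set(); // a Set keeps each URL only once
  let match;
  while ((match = pattern.exec(html)) !== null) {
    urls.add(match[1]); // capture group 1 is the URL itself
  }
  return Array.from(urls);
}

// Made-up sample with escaped quotes and one repeated image.
const sampleHtml =
  '<meta name=\\"og:image\\" content=\\"https://example.com/a.jpg\\">' +
  '<meta name=\\"og:image\\" content=\\"https://example.com/b.jpg\\">' +
  '<meta name=\\"og:image\\" content=\\"https://example.com/a.jpg\\">';

console.log(extractImageUrls(sampleHtml));
// [ 'https://example.com/a.jpg', 'https://example.com/b.jpg' ]
```

Adding `match[1]` rather than the whole match array matters: the match is a fresh array on every iteration, so a `Set` of match arrays would never remove duplicates.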
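Minor昔年 posted on 2024-5-13 01:23:
Could you add keyword and comment collection? There's demand for it.
The keyword is already in there; import the workflow and you'll see. It's in the forms block, and I tested with the keyword “减肥” (weight loss). Comments can be added with a few changes on top of what I have. I don't have time over the next few days, though, so you can take a look yourself first.

zxquan posted on 2024-5-13 09:35:
The small watermark in the bottom-right corner can't be removed? What's going on?
I don't see which one you mean. The 52pojie watermark?

Thanks for sharing; there aren't many Automa tutorials.
Could the OP package a finished version? I really can't follow the code.
Great approach and skill; learned a lot.
What does this mean?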