Fetching WeChat (wx) official account articles with uiautomation

asqwe1 posted on 2023-12-13 15:47

This post was last edited by asqwe1 on 2024-11-22 17:49

For a second method, see: https://www.52pojie.cn/thread-1892402-1-1.html
# -*- coding: utf-8 -*-

import uiautomation as auto
import re
import time

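'''
How it works: simulate mouse clicks on the official account's articles so they open in WeChat's built-in browser, then right-click each one and copy its link.
This is a personal exercise in using the uiautomation library. It collects the links of all of an official account's articles, writing the links to D:/url.txt and the article titles to D:/title.txt.
You can then use the Firefox extension Save Page WE to batch-save the pages as HTML (Save Page WE options: "Load Listed URLs From File..", "Load lazy content before saving: scroll page", time..; set the other options as needed).
You could also write some Python with wget or requests to fetch the pages, though the result may not be as good as Firefox's.
I am an amateur beginner; experts are welcome to adapt the code to their needs. Please don't @ me if something breaks.
If you only want to collect some of the articles and stop partway, close the official account window; the program will save the links collected so far.
If the program goes wrong: close the official account window; or quickly click the cmd window and press Ctrl+C; or press the Win key and use the arrow keys, Tab, etc. to choose sign out or shut down.
Preparation:
WeChat must be upgraded to the latest version, v3.9.8.15; other versions produce errors.
WeChat settings: uncheck "Use default browser to open web pages" and turn off automatic updates.
First open the official account's window, with both its top and bottom visible and not overlapping the taskbar. Keep as few other windows open as possible to avoid interference.
'''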
#auto.uiautomation.SetGlobalSearchTimeout(5)
public = auto.WindowControl(Name='公众号',Depth=1)
public.SetActive()
kuang = public.DocumentControl(Depth=2)  # main content frame

# Simulate mouse actions to copy the link of each open article
def geturl():
    # WeChat's built-in browser window
    liulan = auto.PaneControl(Name='微信', ClassName='Chrome_WidgetWin_0', Depth=1)
    url = []
    if liulan.Exists():
        liulan.SetActive()
        liulan.SetTopmost()
    for item, depth in auto.WalkControl(liulan, includeTop=True, maxDepth=15):
        # Each open article is one tab; skip every other control
        if item.ControlType != auto.ControlType.TabItemControl:
            continue
        item.Click()
        item.RightClick()
        # '复制链接' is the "Copy link" entry of the context menu
        youjian = liulan.MenuItemControl(Name='复制链接', Depth=6)
        #auto.SetClipboardText('')
        if youjian.Exists():
            youjian.Click()
            url.append(auto.GetClipboardText())
    # '关闭' is the browser's "Close" button
    liulan.ButtonControl(Name='关闭', Depth=4).Click()
    return url

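def main(public, kuang):
    global lis
    global nam
    n = 0
    try:
        for item, depth in auto.WalkControl(public, includeTop=True, maxDepth=9):
            # Article titles are depth-9 text controls; skip the '已关注' (Following) and '发消息' (Send message) labels
            if depth != 9 or item.ControlType != auto.ControlType.TextControl or '已关注' in item.Name or '发消息' in item.Name:
                continue
            # Stop early when F12 is pressed
            if auto.IsKeyPressed(auto.Keys.VK_F12):
                break
            print(item.Name)
            print(item.BoundingRectangle)
            nam.append(item.Name)
            public.SetActive()
            public.SetTopmost()
            # Scroll down until the article is visible, then click it.
            # A few articles may be missed; try a longer sleep if that happens.
            while item.Exists() and item.BoundingRectangle.left == 0:
                kuang.WheelDown(waitTime=0.02)
                time.sleep(2)
            item.Click()
            #clic(public,item,kuang)
            # After every 5 opened articles, collect their links and close WeChat's built-in browser
            n += 1
            if n % 5 == 0:
                try:
                    lis += geturl()
                except Exception:
                    pass
    except Exception:
        pass
    # Collect the links of any articles still open (fewer than 5), which the loop above would otherwise drop
    if n % 5 != 0:
        try:
            lis += geturl()
        except Exception:
            pass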

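# Write the links and titles to files on drive D
def opt2File(liss, ming=None):
    # Explicit UTF-8 so titles with characters outside the system code page do not raise UnicodeEncodeError
    with open(f'D:/{ming}.txt', 'a', encoding='utf-8') as f:
        for x in liss:
            f.write(x + '\n')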

if __name__ == "__main__":
    try:
        lis = []
        nam = []
        main(public, kuang)
        opt2File(lis, 'url')
        opt2File(nam, 'title')
    except KeyboardInterrupt:
        print("end...")
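
A note on the clipboard step: geturl() appends whatever is on the clipboard after "Copy link" is clicked, so if a copy fails, the list can end up with stale text or the same link twice (the commented-out auto.SetClipboardText('') line hints at this). The script also imports re without using it. Below is a minimal sketch of a clean-up pass that could run on lis before it is written out. The names ARTICLE_URL and clean_urls are hypothetical, and the pattern assumes that article links point at mp.weixin.qq.com; adjust it if your copied links look different.

```python
import re

# Assumption: article links copied from WeChat's built-in browser are
# http(s) URLs on mp.weixin.qq.com. Change the pattern if yours differ.
ARTICLE_URL = re.compile(r'https?://mp\.weixin\.qq\.com/\S+')


def clean_urls(raw_urls):
    """Return the valid article URLs in first-seen order, without duplicates."""
    seen = set()
    cleaned = []
    for raw in raw_urls:
        url = raw.strip()  # drop stray whitespace and newlines from the clipboard
        if not ARTICLE_URL.fullmatch(url):
            continue  # stale or unrelated clipboard content
        if url in seen:
            continue  # the same link was copied more than once
        seen.add(url)
        cleaned.append(url)
    return cleaned


if __name__ == "__main__":
    sample = [
        'https://mp.weixin.qq.com/s/abc\n',
        'https://mp.weixin.qq.com/s/abc',
        'some stale clipboard text',
    ]
    for link in clean_urls(sample):
        print(link)
```

One way to use it would be to call opt2File(clean_urls(lis), 'url') instead of passing lis directly. Note that filtering makes url.txt and title.txt differ in length, so do not rely on the two files lining up one to one.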

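xaf posted on 2024-2-6 14:47

If you need official account links, I suggest registering an official account. The official account platform has a hyperlink feature that lets you look up every article link of any official account directly, which is far more efficient than copying them one at a time like this.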

haiyanuser posted on 2023-12-13 16:06

First reply, bumping in support.

bing3076 posted on 2023-12-13 16:17

This is great stuff!! Thanks, OP.

debug_cat posted on 2023-12-13 17:16

Nice tool, thanks for sharing.

turmasi1234 posted on 2023-12-13 17:16

Tried it and it works well. Thanks for sharing, OP.

qumingyu286 posted on 2023-12-13 17:38

Thanks, very useful!

zxsbk posted on 2023-12-13 17:56

Nice approach. I'll study it first.

zeh521 posted on 2023-12-13 18:29

Great stuff!! Thanks.

yyzzy posted on 2023-12-13 22:01

Pretty impressive. Thanks for sharing, OP.

www52pjzk posted on 2023-12-14 10:05

A practical share.


吾爱破解 - 52pojie.cn's Archiver
