Fixing Go chromedp opening too many browser processes

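bobo2017365 posted on 2022-10-20 11:09

Last edited by bobo2017365 on 2022-10-20 14:20

I wrote a crawler and factored out a function dedicated to opening a page in the browser and returning its text content.

After crawling 300-odd pages, I found that GoLand had opened several hundred chrome processes. My guess is that every call to the shared function starts a new browser process. How can I improve this, so that the program reuses the existing chrome process whenever one is already open?

After discussing it with people in the group chat and some Googling, I found the fix in an issue on the official code repository.

PS: the official developers are a bit unhelpful here; I could not find these shutdown methods in the official documentation.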

package common

import (
	"context"
	"log"
	"time"

	"github.com/chromedp/cdproto/cdp"
	"github.com/chromedp/chromedp"
)

// GetHttpHtmlContent launches Chrome, opens url and returns the outer HTML of <body>.
// If clickEle is true and visibleEleId matches at least one node, it first types into
// the page's <select> element and waits for the page to update.
func GetHttpHtmlContent(url string, visibleEleId string, clickEle bool, headless bool) (string, error) {
	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		chromedp.Flag("headless", headless),
	)

	// Do not discard the allocator's cancel function: calling it is what closes the browser process.
	allocCtx, cancelAlloc := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancelAlloc() // runs on every return path, so no Chrome process is left behind
	// Create the browser context.
	chromeCtx, cancelChrome := chromedp.NewContext(allocCtx, chromedp.WithLogf(log.Printf))
	defer cancelChrome()
	// Run an empty task list so that the Chrome instance is started up front.
	if err := chromedp.Run(chromeCtx); err != nil {
		return "", err
	}
	// Derive a context with a 40s timeout; it has to outlast the 30s sleep below.
	timeoutCtx, cancelTimeout := context.WithTimeout(chromeCtx, 40*time.Second)
	defer cancelTimeout()

	var htmlContent string
	var res []*cdp.Node
	if err := chromedp.Run(chromeCtx,
		chromedp.Navigate(url),
		//chromedp.EvaluateAsDevTools(fmt.Sprintf(`document.querySelector("%v")`, visibleEleId), ""),
		chromedp.Nodes(visibleEleId, &res, chromedp.AtLeast(0)),
	); err != nil {
		return "", err
	}

	var actions []chromedp.Action
	if len(res) != 0 && clickEle {
		// The element was found: type into the <select> and give the page time to update.
		actions = append(actions,
			chromedp.SendKeys(`select`, "终"),
			chromedp.Sleep(30*time.Second),
		)
	}
	// In every case, finish by reading the page body.
	actions = append(actions, chromedp.OuterHTML(`body`, &htmlContent, chromedp.ByQuery))
	if err := chromedp.Run(timeoutCtx, actions...); err != nil {
		return "", err
	}
	return htmlContent, nil
}

fjqisba posted on 2022-10-20 11:33

NewExecAllocator: it looks like this function only needs to be executed once?

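lm93129 posted on 2022-10-20 11:39

Every time you call GetHttpHtmlContent, it opens a new chrome. You should either call a close inside the function to shut down that chrome, or move the chrome initialization outside the function and pass it in, so that the function only calls chromedp.Run to handle the relevant actions.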

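bobo2017365 posted on 2022-10-20 12:14

fjqisba posted on 2022-10-20 11:33
NewExecAllocator: it looks like this function only needs to be executed once?

Maybe I need to split the part that opens the browser process out into its own function.

The url then only needs to be passed to the action that actually opens the page.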

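bobo2017365 posted on 2022-10-20 12:15

lm93129 posted on 2022-10-20 11:39
Every time you call GetHttpHtmlContent, it opens a new chrome. You should use a close inside the function to ...

Yes, I saw the issue on the official repository. It says you can call cancel() at the end of the function, which closes the browser process.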

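nullable posted on 2022-10-20 14:24

There seems to be a .stop() for stopping the process that is currently open. Or get the page count as an integer, loop over it with for, and close() afterwards?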

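bobo2017365 posted on 2022-10-20 14:40

nullable posted on 2022-10-20 14:24
There seems to be a .stop() for stopping the process that is currently open. Or get the page count as an integer, loop over it with for, and close() afterwards?

The crawler fetches different pages of the same website, so a for loop is not an option.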


吾爱破解 - 52pojie.cn's Archiver
