Scrolling down a page using Selenium WebDriver

Time: 2022-08-24 16:50:08

I have a dynamic page that loads more products as the user scrolls down. I want to get the total number of products rendered on the page. Currently I use the following code to keep scrolling to the bottom until all the products are displayed.

# Count the products (class name "x") rendered before any scrolling
elems = WebDriverWait(self.driver, 30).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "x")))
print(len(elems))
a = len(elems)
# Scroll to the bottom once, give the page time to load more, then recount
self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(4)
elem1 = WebDriverWait(self.driver, 30).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "x")))
b = len(elem1)
# Keep scrolling for as long as each scroll produces more products
while b > a:
    self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(4)
    elem1 = WebDriverWait(self.driver, 30).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "x")))
    a = b
    b = len(elem1)
print(b)

This works nicely, but is there a better way of doing this?

3 answers

#1


8  

You can perform this action easily using this line of code:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

And if you want to scroll down forever, you should try this:

from selenium import webdriver
import time

driver = webdriver.Firefox()
driver.get("https://twitter.com/BarackObama")

# Scrolls to the bottom every 3 seconds and never exits; stop it manually
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)

I am not sure what value to pass to time.sleep(), because loading the data may take more or less time than that. For more information, please check the official documentation page.
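
One way around guessing the sleep value is to poll the page height and move on as soon as it grows, giving up only after a timeout. The following is a minimal sketch of that idea, not something taken from the Selenium docs: the function name `scroll_until_stable` and its `timeout` and `poll` parameters are made up for illustration, and `driver` is assumed to be a Selenium WebDriver (only its `execute_script` method is used).

```python
import time


def scroll_until_stable(driver, timeout=10.0, poll=0.5):
    """Scroll to the bottom until the page height stops growing.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver.
    Returns the number of scrolls that made the page grow.
    """
    get_height = "return document.body.scrollHeight;"
    last_height = driver.execute_script(get_height)
    rounds = 0
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        deadline = time.monotonic() + timeout
        new_height = last_height
        # Poll instead of sleeping a fixed time: continue as soon as the page grows.
        while time.monotonic() < deadline:
            new_height = driver.execute_script(get_height)
            if new_height > last_height:
                break
            time.sleep(poll)
        if new_height <= last_height:
            # Nothing new appeared within `timeout` seconds: assume we hit the end.
            return rounds
        last_height = new_height
        rounds += 1
```

With this shape, `timeout` is only an upper bound on how long a slow load may take; a fast page does not pay the full wait on every scroll, only on the last one.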

have fun :)

#2


1  

I think you could condense your code down to this:

# Meant to run inside a method of your class (it uses self and return)
prior = 0
while True:
    self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    current = len(WebDriverWait(self.driver, 30).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "x"))))
    if current == prior:
        return current  # no new elements since the last scroll
    prior = current

I did away with all the identical lines by moving them all into the loop, which necessitated making the loop a while True: and moving the condition checking into the loop (because unfortunately, Python lacks any do-while).

I also threw out the sleep and print statements. I'm not sure what their purpose was, but on my own page I have found that the same number of elements load whether I sleep between scrolls or not. Further, in my own case I don't need to know the count at any point; I just need to know when the list has been exhausted (but I added a return value so you can get the final count if you happen to need it). If you really want to print every intermediate count, you can print current right after it's assigned in the loop.
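
If it helps, here is a rough sketch of the same loop wrapped in a standalone function, so the return is valid anywhere and the intermediate counts can be reported without editing the loop. The names `scroll_and_count`, `pause`, `max_rounds` and `on_count` are my own inventions for illustration. The scrolling and counting are passed in as callables, and `max_rounds` is a safety cap I added in case the page never stops loading.

```python
import time


def scroll_and_count(scroll, count, pause=0.0, max_rounds=500, on_count=None):
    """Call scroll() then count() until the count stops changing.

    Hypothetical helper: `scroll` and `count` are zero-argument callables.
    Returns the final count.
    """
    prior = None
    for _ in range(max_rounds):
        scroll()
        if pause:
            time.sleep(pause)  # optional, for pages that load slowly
        current = count()
        if on_count is not None:
            on_count(current)  # e.g. pass print to see every intermediate count
        if current == prior:
            return current  # same count twice in a row: list exhausted
        prior = current
    return prior  # gave up after max_rounds scrolls


# Possible wiring to Selenium, assuming `driver` and the class name "x" from the question:
#   from selenium.webdriver.common.by import By
#   scroll = lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
#   count = lambda: len(driver.find_elements(By.CLASS_NAME, "x"))
#   total = scroll_and_count(scroll, count, on_count=print)
```

The `pause` default of zero matches what worked on my page; on a slower page a small pause may be needed so the comparison does not run before the new elements arrive.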

#3


1  

If you have no idea how many elements might be added to the page, but you just want to get all of them, it might be good to loop thusly:

  • scroll down as described above
  • wait a few seconds
  • save the size of the page source (xxx.page_source)
  • if the size of the page source is larger than the last page source size saved, loop back and scroll down some more

I suppose that screenshot size might work fine too, depending upon the page you're loading, but this is working in my current program.
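
The steps above could be sketched roughly like this. It is only an illustration under a few assumptions: `driver` is a Selenium WebDriver, the function name and the `pause` and `max_rounds` parameters are hypothetical, and `max_rounds` is a cap I added so the loop cannot run forever on an endless feed.

```python
import time


def scroll_until_source_stops_growing(driver, pause=3.0, max_rounds=200):
    """Scroll down until len(driver.page_source) stops increasing.

    Hypothetical helper: returns the final page source size in characters.
    """
    last_size = len(driver.page_source)
    for _ in range(max_rounds):
        # Scroll down as described above
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Wait a few seconds for new content to load
        time.sleep(pause)
        size = len(driver.page_source)
        if size <= last_size:
            break  # the source did not grow, so stop scrolling
        last_size = size  # save the size and loop back for another scroll
    return last_size
```

Once it returns, the elements can be collected in a single pass, since everything the page was going to load should be in the DOM by then.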

