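We can use the find_all() method from the Beautiful Soup library to get all of the sub-lists, and use a regular expression to detect the next top-level item so that collection stops there. Here is example code: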
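import re
from bs4 import BeautifulSoup

# NOTE: the tags of the original sample were lost; this nested <ul>/<li>
# markup is an assumed reconstruction of it.
html = """
<ul>
  <li>First</li>
  <li>Second
    <ul>
      <li>First sub-item one</li>
      <li>First sub-item two</li>
    </ul>
  </li>
  <li>Third</li>
  <li>Fourth
    <ul>
      <li>Second sub-item one</li>
      <li>Second sub-item two</li>
    </ul>
  </li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')
sublists = soup.find_all('ul')  # matches the outer list and both nested lists
for sublist in sublists:
    sublist_items = []
    for item in sublist.find_all('li'):
        sublist_items.append(item.text)
        # Use a regex to detect the next top-level item and stop collecting
        if re.search('Third|Fourth', sublist_items[-1]):
            break
    print(sublist_items)

With the markup above, the loop runs once per <ul>. The outer list is printed first (collected up to and including "Third", because the item is appended before the check), followed by each nested sub-list. The last line printed is the sub-list containing "Second sub-item one" and "Second sub-item two".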
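If the goal is to pair each top-level item with its own sub-items, a regex on the item text may not be needed. The following is a minimal sketch, assuming the list is a nested <ul>/<li> structure. The sample markup and the helper name sub_items_by_parent are illustrative assumptions, not part of the original answer. It passes recursive=False to find() and find_all() so that only direct children are visited.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Assumed sample markup: a top-level list whose items may contain a nested list.
html = """
<ul>
  <li>First</li>
  <li>Second
    <ul>
      <li>First sub-item one</li>
      <li>First sub-item two</li>
    </ul>
  </li>
  <li>Third</li>
  <li>Fourth
    <ul>
      <li>Second sub-item one</li>
      <li>Second sub-item two</li>
    </ul>
  </li>
</ul>
"""


def sub_items_by_parent(markup):
    """Map each top-level item's own text to the texts of its direct sub-items."""
    soup = BeautifulSoup(markup, "html.parser")
    top = soup.find("ul")  # first <ul> in document order, i.e. the outer list
    result = {}
    for li in top.find_all("li", recursive=False):  # direct children only
        # The label is the text that sits directly inside this <li>,
        # excluding the text of any nested list.
        label = "".join(
            child for child in li.children if isinstance(child, NavigableString)
        ).strip()
        nested = li.find("ul", recursive=False)
        if nested is None:
            result[label] = []
        else:
            result[label] = [
                sub.get_text(strip=True)
                for sub in nested.find_all("li", recursive=False)
            ]
    return result


grouped = sub_items_by_parent(html)
print(grouped["Fourth"])
```

Because each sub-list is looked up through its parent item, the grouping does not depend on knowing the text of the next top-level item in advance.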