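Beautiful Soup can filter a document for the tags and attribute values you need, either with the .find_all() method or with CSS selectors passed to .select().
Here is an example: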
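from bs4 import BeautifulSoup
# Sample markup: two <div> blocks, each holding a heading and a <p class="desc">
html_doc = """
<html>
<head><title>Beautiful Soup</title></head>
<body>
<div>
<h1>Welcome to Beautiful Soup</h1>
<p class="desc">Beautiful Soup is a Python library for pulling data out of HTML and XML files.</p>
</div>
<div>
<h2>Installation</h2>
<p class="desc">You can install Beautiful Soup using pip.</p>
</div>
</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
# Filter by tag name
div_tag = soup.find_all('div') # returns a list of all <div> tags
print(div_tag)
# Filter with a CSS selector
p_desc = soup.select('.desc') # returns all tags whose class is "desc"
print(p_desc)
Output:
[<div>
<h1>Welcome to Beautiful Soup</h1>
<p class="desc">Beautiful Soup is a Python library for pulling data out of HTML and XML files.</p>
</div>, <div>
<h2>Installation</h2>
<p class="desc">You can install Beautiful Soup using pip.</p>
</div>]
[<p class="desc">Beautiful Soup is a Python library for pulling data out of HTML and XML files.</p>, <p class="desc">You can install Beautiful Soup using pip.</p>]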
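As the output shows, Beautiful Soup picked out the requested tags and attribute values: .find_all('div') returned the div elements, and the CSS selector .desc returned the elements whose class is desc.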
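The same two approaches can also narrow results by attribute value, not just by tag name. The following is a minimal sketch under stated assumptions: the markup, the id values (intro, install) and the note class are hypothetical and chosen only for illustration. It uses the class_ and attrs arguments of .find_all(), and the .select() and .select_one() methods with slightly richer CSS selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for this sketch; ids and classes are made up.
html_doc = """
<div id="intro">
<h1>Welcome to Beautiful Soup</h1>
<p class="desc">Beautiful Soup is a Python library.</p>
</div>
<div id="install">
<h2>Installation</h2>
<p class="desc">You can install Beautiful Soup using pip.</p>
<p class="note">This paragraph has a different class.</p>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all: combine a tag name with an attribute value.
# class_ has a trailing underscore because "class" is a Python keyword.
desc_paragraphs = soup.find_all("p", class_="desc")
install_div = soup.find_all("div", attrs={"id": "install"})

# select: the same kind of filter written as CSS selectors.
# "div#install > p.desc" matches p.desc elements that are direct children
# of the div whose id is "install".
install_desc = soup.select("div#install > p.desc")

# select_one returns the first match (or None) instead of a list.
note = soup.select_one("p.note")

desc_texts = [p.get_text() for p in desc_paragraphs]
print(desc_texts)
print(install_div[0]["id"])
print(install_desc[0].get_text())
print(note.get_text())
```

Whether to use .find_all() or .select() is largely a matter of taste: keyword filters tend to read well for a single tag or attribute, while CSS selectors are more compact once the match depends on nesting.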