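Use the BeautifulSoup library: pass the HTML you want to process to the BeautifulSoup constructor, then pass the tag type you are looking for to the resulting object's find_all method, as shown below: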
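from bs4 import BeautifulSoup
import requests

url = "https://example.com"
response = requests.get(url)
# Parse the downloaded HTML with Python's built-in parser
soup = BeautifulSoup(response.content, "html.parser")
# Collect every <a> tag in the document
links = soup.find_all("a")
for link in links:
    # get() returns None when the tag has no href attribute
    print(link.get("href"))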
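The snippet above prints href values exactly as they appear in the page, so relative links stay relative and anchors without an href print None. A minimal sketch of one way to handle both, assuming you want absolute URLs: the helper name extract_links and the inline sample_html string are made up for illustration, and urljoin comes from the standard library.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips <a> tags that have no href attribute at all
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


# Hypothetical sample markup, used here instead of a live request
sample_html = (
    '<a href="/about">About</a>'
    '<a name="top">Top</a>'
    '<a href="https://example.org/x">X</a>'
)

for link in extract_links(sample_html, "https://example.com"):
    print(link)
```

With a real page you would pass response.text (or response.content) and the page URL instead of the sample string.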
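Alternatively, use a regular expression to match the links in the HTML text, as shown below. Note that this pattern only matches href values wrapped in double quotes, and it matches href on any tag, not only on <a>: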
import re
import requests

url = "https://example.com"
response = requests.get(url)
# Non-greedy match of everything between href=" and the next double quote
links = re.findall(r'href="(.*?)"', response.text)
for link in links:
    print(link)
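If the page mixes quoting styles, the pattern can be loosened a little. The following is only a sketch, not a full HTML parser: HREF_PATTERN, find_hrefs, and sample_html are illustrative names, and the pattern still ignores unquoted attribute values and still matches href on any tag.

```python
import re

# Match href="..." or href='...', case-insensitively; \1 requires the
# closing quote to be the same character as the opening quote.
HREF_PATTERN = re.compile(r'href\s*=\s*(["\'])(.*?)\1', re.IGNORECASE)


def find_hrefs(html):
    """Return every quoted href value found in the given HTML text."""
    return [match.group(2) for match in HREF_PATTERN.finditer(html)]


# Hypothetical sample markup covering both quote styles and a non-<a> tag
sample_html = """<a href="/a">A</a> <a HREF='/b'>B</a> <link href="style.css">"""

print(find_hrefs(sample_html))
```

The stylesheet link shows up in the result as well, which is the main reason the BeautifulSoup approach is usually the safer choice when you only want anchor links.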