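Pass a compiled regular expression to BeautifulSoup's find_all to keep only the <a> tags whose href is a mailto link, then read the href from each match.
For example: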
import re

import requests
from bs4 import BeautifulSoup

url = 'https://example.com'
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')

# Match href values that start with "mailto:", ignoring case
mailto_regex = re.compile(r'^mailto:', re.IGNORECASE)
mailto_links = soup.find_all('a', href=mailto_regex)

for link in mailto_links:
    print(link['href'])
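The href values printed above still include the mailto: prefix, and may also carry a query string (such as ?subject=...) or several comma-separated recipients. If only the addresses are needed, something like the following minimal sketch could be applied to each href. The helper name mailto_addresses and the sample hrefs are made up for illustration; it uses only the standard library and does not validate the addresses.

```python
from urllib.parse import unquote, urlsplit


def mailto_addresses(href):
    """Return the recipient addresses found in a mailto: href (hypothetical helper)."""
    parts = urlsplit(href.strip())
    if parts.scheme.lower() != "mailto":
        return []
    # parts.path is the text between "mailto:" and any "?query" part
    return [unquote(addr).strip() for addr in parts.path.split(",") if addr.strip()]


# Sample hrefs invented for illustration
hrefs = [
    "mailto:info@example.com?subject=Hello",
    "MAILTO:a@example.com,b%40example.com",
]
for href in hrefs:
    print(mailto_addresses(href))
```

In the loop from the original example, this would be called as mailto_addresses(link['href']) instead of printing the raw attribute.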