Use the Beautiful Soup library to parse the web page, then use a regular expression to match email addresses in the extracted text.
Code example:
import re
import requests
from bs4 import BeautifulSoup
url = 'https://example.com'
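response = requests.get(url, timeout=10)  # avoid hanging indefinitely on a slow server
response.raise_for_status()  # stop early on HTTP errors such as 404 or 500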
soup = BeautifulSoup(response.content, 'html.parser')
# Match email addresses with a regular expression
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
emails = set(re.findall(email_pattern, soup.get_text()))
print(emails)
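To try the pattern without making a network request, it can be run against a hard-coded string. This is only a minimal sketch: `find_emails` and `sample_text` are hypothetical names introduced here for illustration, and the pattern is a simple heuristic rather than a full validator of email syntax.

```python
import re

# Email pattern; the top-level-domain class is written as [A-Za-z]
# so that a literal '|' character is not accepted as part of the domain.
EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')


def find_emails(text):
    """Return the sorted, de-duplicated email addresses found in text."""
    return sorted(set(EMAIL_PATTERN.findall(text)))


# Hypothetical sample input standing in for the output of soup.get_text()
sample_text = (
    "Contact alice@example.com or bob.smith@mail.example.org. "
    "Again: alice@example.com"
)
print(find_emails(sample_text))
```

Sorting the set makes the output order stable between runs, which is convenient when comparing results.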
Parsing process: