When BeautifulSoup or lxml unexpectedly modifies a tag, you can use the following workarounds:
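from copy import copy
from bs4 import BeautifulSoup
# Create the BeautifulSoup object (the markup needs a <div> for find('div') to match)
soup = BeautifulSoup('<div>Hello World</div>', 'html.parser')
# Copy the tag; the copy is detached from the parsed tree
div = copy(soup.find('div'))
# Work on the copy
div['id'] = 'newdiv'
div.string = 'Hello Beautiful World'
# Replace the original tag with the modified copy
soup.find('div').replace_with(div)
# Print the modified document
print(soup)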
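from bs4 import BeautifulSoup
# Create the BeautifulSoup object (the markup needs a <div> for find('div') to match)
soup = BeautifulSoup('<div>Hello World</div>', 'html.parser')
# Get the tag object
div = soup.find('div')
# Set the tag's attribute and text in place using string values
div['id'] = 'newdiv'
div.string = 'Hello Beautiful World'
# Print the modified document
print(soup)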
These approaches help keep BeautifulSoup or lxml from unexpectedly modifying tags.
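To check that the copy-based approach leaves the parsed tree alone until the swap, a minimal sketch like the following may help. The sample markup and the variable names (`original`, `duplicate`, `before`, `after`) are assumptions made for illustration, and it uses the built-in `html.parser` so that lxml is not required.

```python
from copy import copy
from bs4 import BeautifulSoup

# Sample markup chosen only for this illustration
soup = BeautifulSoup('<div>Hello World</div>', 'html.parser')

original = soup.find('div')
duplicate = copy(original)  # detached copy, not part of the soup tree

# Edit only the copy
duplicate['id'] = 'newdiv'
duplicate.string = 'Hello Beautiful World'

# Serialize the tree before the swap: the original tag is still untouched
before = str(soup)

# Swap the edited copy into the tree and serialize again
original.replace_with(duplicate)
after = str(soup)

print(before)
print(after)
```

Comparing `before` and `after` shows that edits to the copy only reach the document once `replace_with` is called.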