BeautifulSoup4

Date: 2023-02-23 10:56:26


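1. When a Python script is called from C++ and an exception goes unhandled, later calls to BeautifulSoup() also raise an exception and the function exits early, so exceptions must be handled inside the Python script.

For example: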

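import logging

def tableparser(server, strkey, htmltext):
    # Fallback value returned to the C++ caller if parsing fails.
    restext = 'exception error'
    logtext = "testing exception....." + strkey
    logging.debug(logtext)
    try:
        restext = tableparserEx(server, strkey, htmltext)
    except Exception as e:
        # Catch everything so no exception propagates back to the C++ caller.
        print(e)
    return restext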
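
If several functions are exposed to C++, the same try/except pattern can be factored into a decorator. The following is only a minimal sketch: the names guard and parse_or_fail are hypothetical and do not appear in the original script, and it uses logging.exception so the traceback is kept rather than just the message.

```python
import functools
import logging


def guard(default):
    """Return a decorator that turns any exception into `default` (hypothetical helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Log the full traceback instead of letting it reach the host.
                logging.exception("unhandled error in %s", func.__name__)
                return default
        return wrapper
    return decorator


@guard(default='exception error')
def parse_or_fail(htmltext):
    # Hypothetical stand-in for a real parsing function.
    if not htmltext:
        raise ValueError("empty input")
    return htmltext.upper()
```

With this in place, the C++ side always receives a string: either the real result or the fallback value.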

 

2. Searching with a regular expression

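import re
from bs4 import BeautifulSoup

htmltext = "<td name = '123'><id>5中国123</id><font>my</font></td>"
htmltext = htmltext.replace('\n', '')
#htmltext = htmltext.decode("utf8")
soup = BeautifulSoup(htmltext, 'html.parser')  # name the parser explicitly
#[script.extract() for script in soup.findAll('script')]
# Find the first text node whose content matches the pattern.
tag_select = soup.find(text = re.compile(u'中国'))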
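
A text search of this kind returns the matching text node, not the tag that contains it. The sketch below shows one way to get back to the enclosing tag. It assumes Python 3, bs4 4.4.0 or later (where the text argument is also available under the name string), and the built-in html.parser; the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
import re
from bs4 import BeautifulSoup

html = u"<td name='123'><id>5中国123</id><font>my</font></td>"
soup = BeautifulSoup(html, 'html.parser')

# find() with a compiled pattern returns a NavigableString, or None if nothing matches.
match = soup.find(string=re.compile(u'中国'))

# The enclosing tag is reachable through .parent.
parent_name = match.parent.name if match is not None else None
```

Checking for None before touching .parent avoids an AttributeError when the pattern does not occur in the document.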

3. Making findAll non-recursive: search only direct children

from bs4 import BeautifulSoup

htmltext = "<table><tr>1<tr>11</tr></tr><tr>2</tr><tr>3</tr></table>"
htmltext = htmltext.replace('\n', '')
soup = BeautifulSoup(htmltext, 'html.parser')  # name the parser explicitly
tagtable = soup.find('table')
trs = tagtable.findAll('tr', recursive=False) # by default, all descendants are searched recursively
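
To see the difference, the two modes can be compared side by side. This is a small sketch assuming the built-in html.parser, which keeps the nesting exactly as written in the markup; other parsers such as lxml may restructure the invalid nested rows and give different results. It uses find_all, the bs4 spelling of findAll.

```python
from bs4 import BeautifulSoup

html = "<table><tr>1<tr>11</tr></tr><tr>2</tr><tr>3</tr></table>"
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table')

# Default: every <tr> descendant, including the row nested inside the first row.
all_rows = table.find_all('tr')

# recursive=False: only <tr> tags that are direct children of <table>.
direct_rows = table.find_all('tr', recursive=False)
```

Under these assumptions the nested row appears only in all_rows, so direct_rows is the shorter list.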


4. Crash when Chinese text is returned after a call from C++

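import sys
# Python 2 only: reload(sys) restores setdefaultencoding, which site.py removes at startup.
reload(sys)
sys.setdefaultencoding("utf-8")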
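
The workaround above exists only in Python 2. A possible alternative, sketched here under the assumption that the C++ side expects UTF-8 encoded bytes, is to encode the result explicitly before returning it. The helper name to_utf8_bytes is hypothetical and not part of the original script.

```python
# -*- coding: utf-8 -*-
def to_utf8_bytes(value):
    """Return `value` as UTF-8 encoded bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        # Already a byte string; assumed here to be UTF-8 encoded.
        return value
    return value.encode('utf-8')


result = to_utf8_bytes(u'5中国123')
```

Encoding at the boundary keeps the conversion visible in the code and does not depend on a process-wide default.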