How can I use a regular expression to filter non-Chinese characters out of a string?

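When I take the first 50 characters of a database text field that contains HTML code and show them in a list on a web page,
using "left(rs("colname"),50)" may cut an HTML tag off partway, which breaks the layout of the result page.
How can I use a regular expression to remove all non-Chinese characters and keep only the Chinese ones, so that the page displays normally?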

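Admittedly this is not the best approach, since it may also delete letters, digits, and other characters that belong to the text itself.
Hint: Chinese characters sit at roughly U+4E00 to U+9FA0 in Unicode (the full CJK Unified Ideographs block runs from U+4E00 to U+9FFF). The pattern /^[\u4E00-\u9FA0]+$/ tests whether a whole string consists of Chinese characters; to filter, replace every match of the negated class [^\u4E00-\u9FA0] with an empty string.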
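To make the hint concrete, here is a minimal JavaScript sketch of both uses of that character range. The function names keepChineseOnly and isAllChinese are made up for illustration, and the range is the approximate one from the hint, not a complete definition of "Chinese character".

```javascript
// Hypothetical helper: keep only characters in the range U+4E00..U+9FA0.
function keepChineseOnly(text) {
  // [^...] negates the class; the g flag replaces every occurrence.
  return String(text).replace(/[^\u4E00-\u9FA0]/g, "");
}

// Anchored form from the hint: true only if the whole string is Chinese.
function isAllChinese(text) {
  return /^[\u4E00-\u9FA0]+$/.test(text);
}

const sample = "<p>正则表达式 regex 123</p>";
console.log(keepChineseOnly(sample));  // 正则表达式
console.log(isAllChinese("汉字"));      // true
console.log(isAllChinese("汉字abc"));   // false
```

Note that the filter also throws away Chinese punctuation, letters, and digits, which is the drawback the question itself points out.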

3 solutions

#1

Strip the HTML code first, then truncate.
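<%
content="<html><head><title>asdasd</title></head><body>ddd111111</body></html>"

' Removes every HTML tag from fString; returns "" for Null input.
function RemoveHTML(fString)
    dim re
    RemoveHTML = ""
    if isnull(fString) then exit function
    set re = New RegExp
    re.Global = True       ' replace all matches, not just the first
    re.IgnoreCase = True
    re.Pattern = "<(.[^>]*)>"
    ' Assign to the function name; do not overwrite the ByRef argument
    RemoveHTML = re.Replace(fString, "")
    set re = Nothing
end function

' Strip the tags first, then take the first 50 characters
response.write Left(RemoveHTML(content), 50)
%>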
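For comparison, the same strip-then-truncate idea can be sketched in JavaScript. This is only an illustration under the assumption that the field holds simple, well-formed tags; stripTagsAndTruncate is a hypothetical name, and the pattern is the one from the reply above.

```javascript
// Hypothetical helper mirroring RemoveHTML above, plus the truncation step.
function stripTagsAndTruncate(html, maxChars) {
  // Treat missing values like an empty field.
  if (html === null || html === undefined) return "";
  // Remove everything that looks like a tag, then cut to length.
  const text = String(html).replace(/<(.[^>]*)>/g, "");
  return text.slice(0, maxChars);
}

const content =
  "<html><head><title>asdasd</title></head><body>ddd111111</body></html>";
console.log(stripTagsAndTruncate(content, 50)); // asdasdddd111111
console.log(stripTagsAndTruncate(content, 8));  // asdasddd
```

Doing the truncation after the tags are gone is the point: the cut can no longer land in the middle of a tag.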

#2

Thanks a lot. Could you explain in detail how it works? What does "<(.[^>]*)>" mean?

#3

It is a regular expression; you can look it up in the relevant documentation.
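For anyone reading along, the pattern can be taken apart piece by piece: "<" matches a literal opening bracket, "." matches any one character, "[^>]*" matches zero or more characters that are not ">", and the final ">" matches the closing bracket. The parentheses only capture what sits between the brackets. A small JavaScript sketch shows this; listTagContents is a hypothetical helper, not part of the original answer.

```javascript
const tagPattern = /<(.[^>]*)>/g;

// Collect the captured group (the text between < and >) for each match.
function listTagContents(html) {
  return Array.from(html.matchAll(tagPattern), (m) => m[1]);
}

console.log(listTagContents('<a href="x">link</a>')); // [ 'a href="x"', '/a' ]

// "." needs one character after "<", and no later ">" exists to close
// a match here, so the text is left unchanged.
console.log("1 <> 2".replace(tagPattern, "")); // 1 <> 2
```

Because the pattern knows nothing about HTML, any "<" followed later by a ">" counts as a tag, including ones that appear in ordinary text such as "a < b and c > d".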
