Convert a \uXXXX String to Unicode Characters in Python 3.x

Date: 2021-11-19 05:41:44

Converting \uXXXX

In Python 3.x:

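  1. str.decode no longer exists in 3.x. That is why calling it raises AttributeError: 'str' object has no attribute 'decode'.
  2. The string literal '\uxxxx\uxxxx' in source code is different from a string whose content is the characters \uxxxx\uxxxx. Python resolves the escapes in a literal while parsing it; a string that arrives at run time (for example from sys.argv) keeps the backslash, the u, and the four hex digits as separate characters.

    If you don't understand what "literal" means, check the Python 3.x documentation.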
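
The difference between the two kinds of string can be seen in a short sketch. Here a raw string stands in for text received at run time, and the variable names are only illustrative:

```python
# A literal: the interpreter turns the escape into one character while parsing.
literal = '\u627e'

# A raw string stands in for text read at run time (argv, a file, a socket).
# It holds six separate characters: backslash, 'u', '6', '2', '7', 'e'.
plain = r'\u627e'

print(len(literal))      # 1
print(len(plain))        # 6
print(literal == plain)  # False

# In Python 3 only bytes has decode(); str does not.
print(hasattr(str, 'decode'))    # False
print(hasattr(bytes, 'decode'))  # True
```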
Usage:

    ./descape.py '\u627e\u4e0d\u5230\u8be5\u8bcd\u7684\u89e3\u91ca'

#!/usr/bin/env python3
# file: descape.py
# Convert escaped chars like `\u45e3` in a plain string to Unicode characters.
import re
import sys


def h2d(a):
    # Turn four hex digits (e.g. '627e') into the character at that code point.
    if len(a) != 4:
        raise ValueError('expected exactly 4 hex digits, got %r' % a)
    return chr(int(a, 16))


def descape(utext):
    # utext is an ordinary string, not a unicode literal.
    # Replace each \uXXXX sequence; any other text is kept unchanged.
    return re.sub(r'\\u([0-9a-fA-F]{4})', lambda m: h2d(m.group(1)), utext)


if __name__ == '__main__':
    text = sys.argv[1]
    print(descape(text))
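
As an alternative to the hand-written conversion above, the standard library's unicode_escape codec can do the same job. This is only a sketch: descape_builtin is a hypothetical helper name, and unlike the regex version the codec also interprets other backslash escapes such as \n or \x41, so it suits input that contains nothing but \uXXXX sequences and ordinary text.

```python
def descape_builtin(text):
    # Hypothetical helper, not part of the original script.
    # 1. encode(): turn the str into ASCII bytes; any non-ASCII character
    #    is first rewritten as a backslash escape so nothing is lost.
    # 2. decode(): let the unicode_escape codec resolve every escape.
    return text.encode('ascii', 'backslashreplace').decode('unicode_escape')


sample = r'\u627e\u4e0d\u5230'  # plain string, 18 characters long
result = descape_builtin(sample)

print(len(sample))                     # 18
print(len(result))                     # 3
print(result == '\u627e\u4e0d\u5230')  # True
```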

json module

json.dumps() and json.dump() take an ensure_ascii parameter that defaults to True. Set it to False and Chinese characters are no longer encoded as \uxxxx.
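
A minimal sketch of the effect of ensure_ascii, using a made-up dictionary as sample data:

```python
import json

# Sample data (made up for illustration); the value is three Chinese characters.
data = {'msg': '\u627e\u4e0d\u5230'}

escaped = json.dumps(data)                       # ensure_ascii=True by default
readable = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters

print(escaped)   # {"msg": "\u627e\u4e0d\u5230"}
print(readable)  # {"msg": "找不到"}

# Both forms are valid JSON and load back to the same object.
print(json.loads(escaped) == json.loads(readable) == data)  # True
```

Note that json.loads() resolves \uXXXX escapes on its own, so text that is already valid JSON does not need a separate conversion step.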

References:

  1. Python 3.4: str : AttributeError: 'str' object has no attribute 'decode'