Python UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 74: invalid start byte

I have some data in HBase whose row keys combine bytes and strings, with the fields delimited by \x00 padding.

So a row in my HBase table looks like this:-

00:00:00:00:00:00\x00\x80\x00\x00\x00U\xEF\xA0\xB00\x002\x0040.0.2.1\x00

The value corresponding to this row (key) is 100.

Row description:-

00:00:00:00:00:00 - this is the MAC address, stored as a string
\x80\x00\x00\x00U\xEF\xA0\xB00 - this is the time, stored as bytes
2 - this is the customer ID number, stored as a string
40.0.2.1 - this is the store ID, stored as a string
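
The layout described above can be sketched as a small parser. This is only an illustration in Python 3 (the thread itself uses Python 2), and it assumes the time field is exactly the 9 bytes shown in the description; `split_row_key`, `MAC_LEN` and `TIME_LEN` are hypothetical names, not part of starbase or HBase. The time field cannot be found by splitting on \x00 because it contains \x00 bytes itself, so fixed offsets are used for the first two fields.

```python
# Row key from the question, written as a Python 3 bytes literal.
ROW_KEY = b"00:00:00:00:00:00\x00\x80\x00\x00\x00U\xEF\xA0\xB00\x002\x0040.0.2.1\x00"

MAC_LEN = 17   # "00:00:00:00:00:00" is 17 characters
TIME_LEN = 9   # assumption: the time field is the 9 bytes shown in the description


def split_row_key(key):
    """Split a row key into (mac, time_bytes, customer_id, store_id)."""
    mac = key[:MAC_LEN].decode("ascii")
    time_start = MAC_LEN + 1                      # skip the \x00 after the MAC
    time_bytes = key[time_start:time_start + TIME_LEN]
    # The remaining fields are plain text, so splitting on \x00 is safe here.
    rest = key[time_start + TIME_LEN + 1:]
    customer_id, store_id = rest.rstrip(b"\x00").split(b"\x00")
    return mac, time_bytes, customer_id.decode("ascii"), store_id.decode("ascii")


mac, time_bytes, customer_id, store_id = split_row_key(ROW_KEY)
print(mac, repr(time_bytes), customer_id, store_id)
```

Keeping the time field as `bytes` and only decoding the text fields avoids ever asking a text codec to interpret 0x80.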

I am using the starbase module to connect Python to HBase's Stargate server.

Here is my code snippet that connects to the HBase table through starbase and tries to fetch the value of that row:-

from starbase import Connection
import starbase

C = Connection(host='10.10.5.2', port='60010')
get_table =  C.table('dummy_table')
mac_address = "00:00:00:00:00:00"
time_start = "\x80\x00\x00\x00U\xEF\xA0\xB00"
cus_id = "2"
store_id = "40.0.2.1"

create_query = "%s\x00%s\x00%s\x00%s\x00" % (mac_address, time_start, cus_id, store_id)

fetch_result = get_table.fetch(create_query)
print fetch_result

Expected output is:-

You don't have to worry about the starbase connection and its methods. They worked just fine when everything was a string, but now that the time is stored as bytes, I get this error:-

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 74: invalid start byte
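
The same class of error can be reproduced without HBase or starbase. The sketch below is Python 3 (an assumption, since the question uses Python 2) and simply decodes the row key bytes as UTF-8. The offset it reports differs from the 74 in the traceback because only the key itself is decoded here, not whatever longer string the library was decoding.

```python
# Row key from the question as raw bytes (Python 3 bytes literal).
key = b"00:00:00:00:00:00\x00\x80\x00\x00\x00U\xEF\xA0\xB00\x002\x0040.0.2.1\x00"

try:
    key.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # exc.start is the offset of the offending byte, exc.reason the explanation.
    error = exc

if error is not None:
    print(error.reason, "at byte offset", error.start)
```

The byte 0x80 has the bit pattern of a UTF-8 continuation byte, so it can never begin a character; any attempt to treat the key as UTF-8 text fails at that byte.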

In case it helps, here is the output of create_query when I print it:-

00:00:1E:00:C8:36▒U▒v▒130.0.2.6
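
The ▒ characters are the terminal's way of showing bytes it cannot render. A minimal Python 3 sketch (an assumption; the question uses Python 2) of inspecting such a key with `repr()` and `bytes.hex()`, using the key values from the code snippet above rather than the ones in this printout:

```python
# Key built from the values in the question's snippet, as raw bytes.
key = b"00:00:00:00:00:00\x00\x80\x00\x00\x00U\xEF\xA0\xB00\x002\x0040.0.2.1\x00"

printable = repr(key)   # escapes every non-printable byte as \xNN
hex_dump = key.hex()    # two hex digits per byte, nothing hidden

print(printable)
print(hex_dump)
```

Either form makes the \x00 delimiters and the 0x80 byte visible, which plain printing hides.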

I would highly appreciate some help. Thanks

2 Solutions

#1

Try this

time_start = "\\x80\\x00\\x00\\x00U\\xEF\\xA0\\xB00"

\x is the escape sequence for hex values, so

create_query = "%s\x00%s\x00%s\x00%s\x00" % (mac_address, time_start, cus_id, store_id)

was interpolating time_start into the string with its raw bytes. And since 0x80 is not a valid UTF-8 start byte, it was throwing an error.
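
To make the difference concrete, here is a small Python 3 sketch (an assumption; the thread uses Python 2) comparing the raw bytes with the double-backslash version. The variable names are illustrative only.

```python
raw = b"\x80\x00\x00\x00U\xEF\xA0\xB00"            # 9 raw bytes
escaped = "\\x80\\x00\\x00\\x00U\\xEF\\xA0\\xB00"   # literal backslash-x text

raw_len = len(raw)            # each \xNN is a single byte
escaped_len = len(escaped)    # each \\xNN is four characters: \, x, N, N

# The escaped form contains only ASCII characters, so encoding it cannot fail.
escaped_utf8 = escaped.encode("utf-8")

print(raw_len, escaped_len)
```

Note that the escaped text is a different byte sequence from the stored time field, so this removes the decoding error but the thread does not confirm that the lookup then matches the existing row.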

#2

My guess would be that your database doesn't support storing bytes in these fields; perhaps you must store strings.

One approach would be to convert your bytes into base64 strings before storing them in the database. For example:

>>> from base64 import b64encode, b64decode
>>> b64encode("\x80\x00\x00\x00U\xEF\xA0\xB00")
'gAAAAFXvoLAw'
>>> b64decode(_)
'\x80\x00\x00\x00U\xef\xa0\xb00'
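
For reference, a Python 3 sketch of the same idea applied to the whole row key (an assumption; the session above is Python 2). It only helps if rows are written with the base64 form of the time field in the first place; existing rows keyed by raw bytes would not match.

```python
from base64 import b64decode, b64encode

time_bytes = b"\x80\x00\x00\x00U\xEF\xA0\xB00"

# In Python 3, b64encode returns bytes; decode to get a plain ASCII string.
encoded = b64encode(time_bytes).decode("ascii")

# Build the key from text fields only, keeping the \x00 delimiters.
fields = ["00:00:00:00:00:00", encoded, "2", "40.0.2.1"]
key = "\x00".join(fields) + "\x00"

# The key is now pure ASCII, so it survives a UTF-8 round trip.
round_tripped = key.encode("utf-8").decode("utf-8")

# The original bytes can be recovered from the stored field.
restored = b64decode(encoded)
print(encoded, restored == time_bytes)
```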
