Efficiently storing small fixed-length arrays in a SQL database

Time: 2022-09-29 21:37:25

I hope someone more experienced in databases can advise me.

I'm trying to figure out the best way to store a length-5 array of 1-bit values, which has only 32 distinct combinations.

If I store it in 5 columns, it will take 5 bytes per user. This "feels bad" given that it's only 5 bits of data, which leaves 35 wasted bits.

Another option is to use a lookup table and store a single byte reference into it for each user. This "feels bad" as it will make queries unnecessarily complex and slower.

The only other option I can think of is to serialize the values, which adds overhead to every operation and breaks first normal form.

Or maybe there's a better database I should be using, one that allows finer-grained control over the size of the INTEGER values stored in it? I'm currently using SQLite.

P.S. - This is one example, but I actually have numerous other similar arrays to store, so it's not quite as simple as sucking up the 35 wasted bits per user.

1 solution

#1


To follow this answer, you will need to refresh your understanding of the binary representation of decimal numbers, as well as bitwise operators and bit shifting. Each programming language has its own operators for "bitwise AND" and "bit shifting", so check the documentation of the language you are using for the exact syntax. The "bitwise AND" operator lets you find out whether a given bit position is 1 or 0 by performing a logical AND on each pair of corresponding bits.

The SQLite datatypes documented at http://www.sqlite.org/datatype3.html consist of the following "dynamic types" (the documentation calls them storage classes):

  1. NULL - The value is NULL
  2. INTEGER - The value is a signed integer, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value
  3. REAL - The value is a floating point number, stored as an 8-byte IEEE floating point number
  4. TEXT - The value is a text string
  5. BLOB - The value is a blob of data
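
To see these types from code, here is a minimal sketch using Python's standard `sqlite3` module and SQLite's built-in `typeof()` function. The literal values are only illustrative; the point is that a small flag value such as 18 is reported as an integer.

```python
import sqlite3

# In-memory database, nothing is written to disk.
conn = sqlite3.connect(":memory:")

# typeof() reports the type SQLite assigned to each value.
row = conn.execute(
    "SELECT typeof(NULL), typeof(18), typeof(1.5), typeof('abc'), typeof(x'0102')"
).fetchone()
storage_classes = list(row)
print(storage_classes)  # ['null', 'integer', 'real', 'text', 'blob']

conn.close()
```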

Everything on a computer is ultimately a series of binary 1s and 0s, and the smallest allocation is usually 8 bits, or 1 byte. How you interpret those 1s and 0s is up to you: the same 8 bits can be an INTEGER, a CHARACTER, or anything else. Below I propose a way to store your 5 bits in 1 byte and interpret them as 5 values that are each 1 or 0, or more generally any number of bits that can be "on" or "off", in the smallest allocation possible.

Since you are using just 5 bits and want to minimize storage, you can store them as a 1-byte (8-bit) INTEGER value. This is the smallest integer size listed above, and SQLite has no bit datatype. You can then use a "bit mask" with the "bitwise AND" operation, or a combination of bit shifts, to retrieve the value at each of the 5 positions. For example, you can represent your 5-bit structure (a length-5 array of 1-bit values) with 3 leading zeros followed by the 5 right-most bits that carry meaning (3 bits + 5 bits = 8 bits = 1 byte), such as 00011111, 00010101, or 00010000. Notice that the 3 left-most bits are always zero.
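
As a rough sketch of that packing step, the Python helpers below turn a length-5 list of 0/1 values into a single small integer and back again. The names `pack_bits` and `unpack_bits` are made up for this illustration, and the choice to put the first array element in the right-most bit is an assumption; any fixed order works as long as it is used consistently.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into one integer.

    bits[0] becomes the right-most (lowest) bit.
    """
    value = 0
    for position, bit in enumerate(bits):
        if bit not in (0, 1):
            raise ValueError("each element must be 0 or 1")
        value |= bit << position
    return value


def unpack_bits(value, length=5):
    """Recover the list of 0/1 values from a packed integer."""
    return [(value >> position) & 1 for position in range(length)]


packed = pack_bits([1, 0, 1, 0, 1])
print(packed, format(packed, "08b"))  # 21 00010101
print(unpack_bits(packed))            # [1, 0, 1, 0, 1]
```

The packed value is what would go into the INTEGER column; the unpacking can happen either in application code like this or with bit operators inside a query.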

Keep in mind that the largest value an unsigned 1-byte (8-bit) integer can hold is 11111111 in binary, or 255 in decimal. Some databases call a 1-byte integer a TINYINT (a SMALLINT is usually 2 bytes). In your case you only use the lower 5 bits, so the remaining upper 3 bits should always be 000 and the stored value stays between 0 and 31.
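
If you want the database itself to guarantee that the upper bits stay zero, one option is a CHECK constraint. This is a sketch under assumptions: the `users` table and `flags` column are hypothetical names, and the constraint simply limits the column to the range 0 to 31.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: flags may only hold values that fit in 5 bits.
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " flags INTEGER NOT NULL CHECK (flags BETWEEN 0 AND 31)"
    ")"
)

# 00011111 (31) uses only the lower 5 bits, so it is accepted.
conn.execute("INSERT INTO users (flags) VALUES (?)", (0b00011111,))

# 00100000 (32) would set the 6th bit, so the constraint rejects it.
try:
    conn.execute("INSERT INTO users (flags) VALUES (?)", (0b00100000,))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(rejected, row_count)  # True 1

conn.close()
```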

As an example, suppose my value is 00011111 in binary, which equals 31 in decimal. To read the 1st bit, I would "bitwise AND" the value with 00000001. For the 2nd position I would use 00000010, for the 3rd 00000100, for the 4th 00001000, and for the 5th 00010000. Each of these binary values, which selects a single position, is called a "bit mask".
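
The five masks can be generated with a left shift rather than typed out by hand. The Python sketch below builds them and applies each one to the value 31; the variable names are only illustrative.

```python
# One mask per position: 00000001, 00000010, 00000100, 00001000, 00010000
MASKS = [1 << position for position in range(5)]

value = 0b00011111  # 31 in decimal, all five bits set

for number, mask in enumerate(MASKS, start=1):
    # value & mask keeps only the bit selected by the mask.
    print(f"position {number}: mask {mask:08b} -> {value & mask:08b}")

bits_set = [(value & mask) != 0 for mask in MASKS]
print(bits_set)  # [True, True, True, True, True]
```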

The "bitwise AND" test works like this: if I AND a value with the bit mask for a position and the result equals that bit mask, then the bit at that position is set. For example, if my value is 00010010 in binary (18 in decimal) and I want to know whether the 2nd bit is set (that is, equal to 1), I use the bit mask 00000010. If 00010010 AND 00000010 equals 00000010, the 2nd bit is set; otherwise it is not. Likewise, to test the 5th bit I would AND 00010010 with 00010000: if the result equals 00010000, the 5th bit is set (it equals 1); otherwise it is not set (it equals 0).
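
Here is the same test written as a small Python function, using the value 18 from the example. The helper name `is_bit_set` is hypothetical, and positions are counted from 1 starting at the right-most bit, matching the description above.

```python
def is_bit_set(value, position):
    """Return True if the bit at the given 1-based position is 1."""
    mask = 1 << (position - 1)
    # The bit is set exactly when ANDing with the mask gives the mask back.
    return (value & mask) == mask


value = 0b00010010  # 18 in decimal

print(is_bit_set(value, 2))  # True  (00010010 AND 00000010 = 00000010)
print(is_bit_set(value, 5))  # True  (00010010 AND 00010000 = 00010000)
print(is_bit_set(value, 1))  # False (00010010 AND 00000001 = 00000000)
```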

Keep in mind that I am explaining this in binary, but the datatype is an integer, so the SQLite database will display it as a decimal value. Since everything on a computer is just a stream of bits, you can still operate on that value with bit operations.
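
SQLite itself has the bitwise operators `&`, `|`, `~`, `<<` and `>>`, so the masks can be applied directly in queries. The sketch below uses Python's `sqlite3` module with a hypothetical `users` table and made-up rows: the flags come back as plain decimal integers, yet they can still be filtered and updated bit by bit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, flags INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", 0b00010010), ("bob", 0b00000101), ("carol", 0b00011111)],
)

# The stored values are shown in decimal.
stored = conn.execute("SELECT name, flags FROM users ORDER BY name").fetchall()
print(stored)  # [('alice', 18), ('bob', 5), ('carol', 31)]

# Find users whose 2nd bit is set (mask 00000010 = 2).
with_bit2 = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users WHERE (flags & 2) = 2 ORDER BY name"
    )
]
print(with_bit2)  # ['alice', 'carol']

# Set bob's 5th bit (OR with 16), then clear his 1st bit (AND with NOT 1).
conn.execute("UPDATE users SET flags = flags | 16 WHERE name = 'bob'")
conn.execute("UPDATE users SET flags = flags & ~1 WHERE name = 'bob'")
bob_flags = conn.execute(
    "SELECT flags FROM users WHERE name = 'bob'"
).fetchone()[0]
print(bob_flags, format(bob_flags, "08b"))  # 20 00010100

conn.close()
```

One trade-off worth noting: a condition such as `(flags & 2) = 2` is computed per row, so an ordinary index on the column is unlikely to help with it.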
