How do you query a character array of 1s and 0s stored in a database?

Time: 2021-01-28 12:59:06

Say you had a long array of chars that are each either 1 or 0, kind of like a bit vector, but stored in a database column. How would you query it to find out which values are set or not set? For example, say you need to know whether char 500 and char 1500 are "true".

4 solutions

#1


SELECT
  Id
FROM
  BitVectorTable
WHERE
  SUBSTRING(BitVector, 500, 1) = '1'
  AND SUBSTRING(BitVector, 1500, 1) = '1'

No index can be used for this kind of query, though, because the predicate wraps the column in a function call, so every row has to be scanned. When you have many rows, this will get slow very quickly.
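
To make the SUBSTRING approach easy to try, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a stand-in here (the answer does not name it), and the helpers `make_demo_db` and `rows_with_bits_set` are hypothetical names introduced for illustration. The table and column names follow the query above.

```python
import sqlite3

def make_demo_db():
    """Build a small in-memory table of 2000-character 0/1 strings (demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE BitVectorTable (Id INTEGER PRIMARY KEY, BitVector TEXT)"
    )

    def vec(*ones, length=2000):
        chars = ["0"] * length
        for p in ones:
            chars[p - 1] = "1"  # SQL substring positions are 1-based
        return "".join(chars)

    conn.executemany(
        "INSERT INTO BitVectorTable VALUES (?, ?)",
        [
            (1, vec(500, 1500)),
            (2, vec(500)),
            (3, vec(1500)),
            (4, vec(1, 500, 1500, 2000)),
        ],
    )
    return conn

def rows_with_bits_set(conn, positions):
    """Return the Ids whose BitVector has '1' at every given 1-based position."""
    positions = list(positions)
    if not positions:
        raise ValueError("at least one position is required")
    # One substr() predicate per position; the positions are bound as parameters.
    where = " AND ".join("substr(BitVector, ?, 1) = '1'" for _ in positions)
    sql = f"SELECT Id FROM BitVectorTable WHERE {where} ORDER BY Id"
    return [row[0] for row in conn.execute(sql, positions)]
```

Calling `rows_with_bits_set(make_demo_db(), [500, 1500])` keeps only the rows where both characters are '1'. As with the original query, every row is still scanned.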

Edit: On SQL Server at least, SUBSTRING() is deterministic, like nearly all built-in string functions. That means you could look into creating computed columns from the SUBSTRING() result for each position you need to test, and putting an index on each of them. Inserts will be slower and the table will grow, but searches can then use those indexes, which should make them much faster.

SELECT
  Id
FROM
  BitVectorTable
WHERE
  BitVector_0500 = '1'
  AND BitVector_1500 = '1'
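
The precompute-and-index idea can be tried out without SQL Server. The sketch below is not the computed-column feature itself: it uses SQLite expression indexes (assumed available, SQLite 3.9 or later) through Python's `sqlite3` module as a rough analogue, with one index per tested position. The helper names `make_indexed_db` and `ids_with_bit` are hypothetical, and whether the planner actually picks an index depends on the engine and the data.

```python
import sqlite3

def make_indexed_db(rows, positions):
    """Create BitVectorTable and index the character at each given position."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE BitVectorTable (Id INTEGER PRIMARY KEY, BitVector TEXT)"
    )
    conn.executemany("INSERT INTO BitVectorTable VALUES (?, ?)", rows)
    for pos in positions:
        # DDL cannot take bound parameters, so the position is formatted in;
        # int() ensures only a number ever reaches the SQL text.
        conn.execute(
            f"CREATE INDEX BitVector_{int(pos):04d} "
            f"ON BitVectorTable (substr(BitVector, {int(pos)}, 1))"
        )
    return conn

def ids_with_bit(conn, pos):
    """Return Ids whose character at `pos` is '1'."""
    # The WHERE expression is written exactly like the indexed expression,
    # which is what lets the planner consider the matching index.
    sql = (
        "SELECT Id FROM BitVectorTable "
        f"WHERE substr(BitVector, {int(pos)}, 1) = '1' ORDER BY Id"
    )
    return [row[0] for row in conn.execute(sql)]
```

The trade-off is the one described above: each extra index makes writes slower and the database larger in exchange for avoiding a full scan on reads.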

Edit #2: The limits for SQL Server are:

  • 1,024 columns per normal table
  • 30,000 columns per "wide" table

#2


In MySQL, you could use substring, something like:

select foo from bar 
where substring(col, 500,1)='1' and substring(col, 1500,1)='1';

This will be pretty inefficient, though, so you might want to rethink your schema. For example, you could store each bit as a separate row to trade space for speed:

create table foo
(
   id int not null,
   bar varchar(128),
   primary key(id)
);

create table foobit
(
   foo_id int not null,
   idx int not null,
   value tinyint not null,

   primary key(foo_id,idx),
   index(idx,value)
);

This would be queried like so:

   select foo.bar from foo
   inner join foobit as bit500
      on(foo.id=bit500.foo_id and bit500.idx=500)
   inner join foobit as bit1500
      on(foo.id=bit1500.foo_id and bit1500.idx=1500)
   where
      bit500.value=1 and bit1500.value=1;

This obviously consumes more storage, but it should be faster for those query operations because an index can be used.
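
As a rough, runnable illustration of the one-row-per-bit layout, the sketch below builds the two tables in an in-memory SQLite database (a stand-in for MySQL, so the column types differ slightly) and expands each 0/1 string into `foobit` rows. The helper names `build_bit_tables` and `ids_with_both`, the index name, and the `bar` values are made up for the example.

```python
import sqlite3

def build_bit_tables(vectors):
    """vectors: dict mapping a foo id to its string of '0'/'1' characters."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE foo (id INTEGER PRIMARY KEY, bar TEXT);
        CREATE TABLE foobit (
            foo_id INTEGER NOT NULL,
            idx    INTEGER NOT NULL,
            value  INTEGER NOT NULL,
            PRIMARY KEY (foo_id, idx)
        );
        CREATE INDEX foobit_idx_value ON foobit (idx, value);
    """)
    for foo_id, bits in vectors.items():
        conn.execute("INSERT INTO foo VALUES (?, ?)", (foo_id, f"row {foo_id}"))
        # One foobit row per character; idx is 1-based to match SUBSTRING.
        conn.executemany(
            "INSERT INTO foobit VALUES (?, ?, ?)",
            [(foo_id, i, int(ch)) for i, ch in enumerate(bits, start=1)],
        )
    return conn

def ids_with_both(conn, idx_a, idx_b):
    """Return foo ids where the bits at idx_a and idx_b are both 1."""
    sql = """
        SELECT foo.id FROM foo
        JOIN foobit AS a ON a.foo_id = foo.id AND a.idx = ?
        JOIN foobit AS b ON b.foo_id = foo.id AND b.idx = ?
        WHERE a.value = 1 AND b.value = 1
        ORDER BY foo.id
    """
    return [row[0] for row in conn.execute(sql, (idx_a, idx_b))]
```

A 1,500-character vector becomes 1,500 `foobit` rows, which is the storage cost being traded for indexed lookups on `(idx, value)`.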

#3


I would convert the column to multiple bit columns and rewrite the relevant code; bit masks are much faster than string comparisons. But if you can't do that, you have to use database-specific functions. Regular expressions could be an option:

-- Flavor: MySQL
SELECT * FROM table WHERE column REGEXP "^.{499}1.{999}1"
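
The offsets in that pattern are easy to get wrong, so here is a small sketch that checks the arithmetic: 499 arbitrary characters, then character 500, then 999 more, then character 1500. It uses Python's `re` module purely to verify the positions; MySQL's regex engine may behave differently (for example in the repetition counts it accepts), and `pattern_for` is a hypothetical helper, not something from the answer.

```python
import re

def pattern_for(positions):
    """Build an anchored pattern requiring '1' at each 1-based position."""
    parts, prev = ["^"], 0
    for pos in sorted(set(positions)):
        gap = pos - prev - 1          # characters skipped since the last hit
        parts.append(".{%d}1" % gap)
        prev = pos
    return "".join(parts)

def bits_set(bits, positions):
    """True if `bits` has '1' at every given position."""
    return re.match(pattern_for(positions), bits) is not None
```

For positions 500 and 1500 the helper produces the same pattern as the query above, so the two gaps of 499 and 999 characters line up with the requested characters.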

#4


select substring(your_col, 500,1) as char500,
substring(your_col, 1500,1) as char1500 from your_table;
