Insert/update/index many rows (10 billion) with numbers as values

Time: 2021-04-14 13:45:42

I need to insert 10 billion rows and update their values a few times.

Table structure:

Column1 Column2 Count
1       1       99
1       2       10003
1       3       1
1       4       23
1       5       9994
...
99999   1       2
99999   2       2233
99999   3       5904
99999   4       12
99999   5       4598435
...

I need Column1 to be indexed. In one table Count will be an Integer; in another it will be a Double.

Which database best suits my needs? I was told I should use NoSQL, but there are so many of them.

3 solutions

#1


0  

Nothing in any mainstream RDBMS makes this hard, let alone impossible. All of your requirements are trivial for any RDBMS.

What you need is a single table with a single index on it. This does not stress any system architecturally.
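To make the "single table, single index" layout concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module purely so the example runs anywhere; that is an assumption for illustration, not a recommendation of SQLite for 10 billion rows. The table name `counts`, the index name, and the column names `column1`, `column2`, `cnt` are hypothetical stand-ins for the columns in the question.

```python
import sqlite3


def build_demo(conn):
    """Create the table, index Column1, load a few sample rows, update them."""
    # One table; cnt would be REAL instead of INTEGER in the "Double" variant.
    conn.execute(
        "CREATE TABLE counts ("
        " column1 INTEGER NOT NULL,"
        " column2 INTEGER NOT NULL,"
        " cnt INTEGER NOT NULL)"
    )
    # The single index the question asks for.
    conn.execute("CREATE INDEX idx_counts_column1 ON counts (column1)")

    sample_rows = [
        (1, 1, 99),
        (1, 2, 10003),
        (1, 3, 1),
        (99999, 1, 2),
        (99999, 2, 2233),
    ]
    # Batch the inserts inside one transaction rather than committing per row.
    with conn:
        conn.executemany("INSERT INTO counts VALUES (?, ?, ?)", sample_rows)

    # An update that locates its rows through the index on column1.
    with conn:
        conn.execute("UPDATE counts SET cnt = cnt + 1 WHERE column1 = ?", (1,))


def lookup(conn, column1):
    """Return (column2, cnt) pairs for one Column1 value."""
    cursor = conn.execute(
        "SELECT column2, cnt FROM counts WHERE column1 = ? ORDER BY column2",
        (column1,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
build_demo(conn)
print(lookup(conn, 1))
```

The same two statements (`CREATE TABLE`, `CREATE INDEX`) carry over to other relational databases with minor syntax changes; at the scale in the question, batching inserts into large transactions matters far more than the schema itself.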

Be aware that RAM is unlikely to be large enough to cache all of the data. This means that most accesses are likely to hit the disk. You need disks that have enough IOPS.
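A rough back-of-the-envelope calculation shows why the data is unlikely to fit in RAM. The byte widths below are assumptions (4-byte integers for Column1 and Column2, an 8-byte Count), and the per-row overhead figure is a hypothetical placeholder, since real storage overhead depends on the database engine and the index.

```python
ROWS = 10_000_000_000  # 10 billion rows, as stated in the question

# Assumed column widths in bytes (not given in the question).
COLUMN1_BYTES = 4
COLUMN2_BYTES = 4
COUNT_BYTES = 8


def table_size_bytes(rows, payload_per_row, overhead_per_row=0):
    """Estimate table size as rows * (payload + per-row overhead)."""
    return rows * (payload_per_row + overhead_per_row)


payload = COLUMN1_BYTES + COLUMN2_BYTES + COUNT_BYTES

raw = table_size_bytes(ROWS, payload)
# Hypothetical 24 bytes of per-row bookkeeping; the real value is engine-specific.
with_overhead = table_size_bytes(ROWS, payload, overhead_per_row=24)

print(f"raw payload: {raw / 1e9:.0f} GB")
print(f"with assumed overhead: {with_overhead / 1e9:.0f} GB")
```

Even the raw payload comes to 160 GB under these assumptions, before counting the index on Column1, which is why disk IOPS rather than memory becomes the thing to size for.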

#2


2  

I would use a database you know well, as long as it can handle your required throughput. Since you are asking, I assume your preferred database hasn't met your requirements.

If you require high throughput with consistent sub-millisecond lookup latency, take a look at Aerospike, which is used a lot in the AdTech industry. See this case study from AppNexus and Intel. Aerospike is an open-source, distributed, in-memory and/or SSD NoSQL key-value database with support for UDFs and secondary indexes.

#3


0  

Try starting with PostgreSQL. It has no row-count limit that would come into play here. If you run into performance trouble with it, you can reconsider a NoSQL solution. But it is more likely that Postgres will fit your requirements. It is very mature today.
