What is an efficient way to store a large number of time series in a database? (duplicate)

Time: 2022-02-05 08:03:14

I need to store a bunch of time series in a database but I'm concerned about both the size and the processing time.

To reduce size, I have already used zipped JSON in another project to store a whole time series, and this is quite efficient in terms of storage space. The issue is that to search for some data you first have to retrieve the whole time series, unzip it, and deserialize it, and of course you can't use the database's built-in querying capabilities such as SQL SELECT/WHERE.

So you consume bandwidth to fetch the data, CPU to unzip it, and RAM to hold it, even if you need only one point...
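The cost described above can be made concrete with a minimal sketch. It assumes the series is stored as a zlib-compressed JSON blob, which is one plausible reading of "zipped/JSON"; the function names `pack` and `lookup` and the sample data are hypothetical.

```python
import json
import zlib

# Hypothetical series: one [date, value] pair per day, stored as a single blob.
series = [[f"2020-01-{day:02d}", float(day)] for day in range(1, 32)]


def pack(points):
    """Serialize to JSON, then compress; the result is what would go in a BLOB column."""
    return zlib.compress(json.dumps(points).encode("utf-8"))


def lookup(blob, date):
    """Find a single point. The whole blob is decompressed and parsed first."""
    points = json.loads(zlib.decompress(blob).decode("utf-8"))
    for d, v in points:
        if d == date:
            return v
    return None


blob = pack(series)
value = lookup(blob, "2020-01-15")
```

The blob is compact, but `lookup` always pays for the full series, and the database itself cannot filter on anything inside it.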

This was not an issue for the previous project because time series were always manipulated as a whole, essentially to be displayed in charts or Excel, but this time I'd like to have at least a minimal ability to search the data in the database.

To allow this flexibility in terms of data manipulation, e.g. using SQL, there is the "standard format": one row per date. But I have two concerns:

  • A time series over 10 years could have 3000 values, which means 3000 rows, so with 1M time series I could have 3 billion rows! I'm not sure a "normal" database like MySQL or PostgreSQL can handle such a huge number of rows, but I hope I'm wrong.
  • I don't know whether DBMSs are good at optimizing the space required by all the cells, though as long as it's not "too" big it's OK.
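For reference, the "standard format" and the row-count arithmetic above can be sketched as follows. SQLite is used here only because it ships with Python; the question itself mentions MySQL and PostgreSQL. The table name `points` and its column names are assumptions made for illustration.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE points (
        series_id INTEGER NOT NULL,
        day       TEXT    NOT NULL,  -- ISO date, sorts chronologically as text
        value     REAL    NOT NULL,
        PRIMARY KEY (series_id, day)
    )
    """
)

# Two hypothetical series with 60 daily values each.
start = date(2020, 1, 1)
rows = [
    (sid, (start + timedelta(days=i)).isoformat(), sid * 100.0 + i)
    for sid in (1, 2)
    for i in range(60)
]
conn.executemany("INSERT INTO points VALUES (?, ?, ?)", rows)

# The database can now filter by series and date range on its own.
result = conn.execute(
    "SELECT day, value FROM points "
    "WHERE series_id = ? AND day BETWEEN ? AND ? ORDER BY day",
    (2, "2020-02-01", "2020-02-03"),
).fetchall()

# Back-of-the-envelope row count from the figures in the question.
total_rows = 3_000 * 1_000_000  # 3 billion
```

The composite primary key on `(series_id, day)` is what lets a query for one series and one date range avoid scanning the other series.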

I can choose any free database, so NoSQL is welcome too if it can help.

Do you have any suggestions or, better, some feedback?

Thanks for any input.

1 Answer

#1


2  

Check out TempoDB: http://tempo-db.com

I'm a co-founder, and we built the service to solve this exact problem.

The access pattern is writing data in order by time, usually not editing it (highly immutable), and then reading data back by time.

The fundamental issue you'll face is indexing on a timestamp, where there are many billions of rows. You want to decouple query performance from the underlying total dataset size, which will always be growing at least linearly. We do all that stuff... and more :)
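One common way to keep read cost tied to the queried window rather than to the total dataset is to group points into fixed time buckets. The sketch below is a generic illustration of that idea and is not a description of how TempoDB works internally; the class `BucketedStore`, the one-day bucket width, and the integer timestamps are all assumptions.

```python
from bisect import bisect_left
from collections import defaultdict

BUCKET_SECONDS = 86_400  # assumed bucket width: one day


class BucketedStore:
    """Toy in-memory store; points are grouped by (series, time bucket)."""

    def __init__(self):
        # (series_id, bucket_number) -> parallel lists of timestamps and values
        self._buckets = defaultdict(lambda: ([], []))

    def append(self, series_id, ts, value):
        """Append a point; timestamps are assumed to arrive in increasing order."""
        times, values = self._buckets[(series_id, ts // BUCKET_SECONDS)]
        times.append(ts)
        values.append(value)

    def read(self, series_id, start, end):
        """Return points with start <= ts < end, touching only overlapping buckets."""
        out = []
        if end <= start:
            return out
        first = start // BUCKET_SECONDS
        last = (end - 1) // BUCKET_SECONDS
        for bucket in range(first, last + 1):
            entry = self._buckets.get((series_id, bucket))
            if entry is None:
                continue
            times, values = entry
            lo = bisect_left(times, start)
            hi = bisect_left(times, end)
            out.extend(zip(times[lo:hi], values[lo:hi]))
        return out


store = BucketedStore()
for ts in range(0, 10 * BUCKET_SECONDS, 3_600):  # ten days of hourly points
    store.append("sensor-a", ts, ts / 3_600)

window = store.read("sensor-a", 2 * BUCKET_SECONDS, 2 * BUCKET_SECONDS + 3 * 3_600)
```

A read over a three-hour window looks at one bucket no matter how many days of data have been appended, which matches the write-in-order, read-by-time access pattern described in the answer.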
