The most efficient way to store and access a huge data matrix in MySQL

Date: 2022-09-23 17:48:48

I am going to store a huge amount of matrix data in a MySQL database. What is the most efficient way to store and access the data?

Efficiency matters most when reading the data; the table will not be updated regularly.

The matrix is about 100,000 × 1,000 (probably larger in the future).


id1         value  value_id1
id1         value  value_id2
id2         value  value_id1
id2         value  value_id2
.
.
.
id 100,000  value  value_id1000

vs
     value_id1, value_id2, value_id3 ... value_id1000
id1  value      value      value
id2  value      value      value
id3  value      value      value
.
.
.
id 100,000

When the data is huge, which is most efficient: a short call (a MySQL query), or having the data stored as a matrix? The data is used regularly, so fetching it must be efficient.

2 Answers

#1


9  

Since you said you want efficient fetching, I would use the following table format:

 Column Row Value 
      1   1   1.2
      2   1   2.3
      ...

With this format and an index on the matrix's column and row, you can fetch any part of the data quickly.
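
A minimal sketch of that layout, not a definitive implementation. It assumes Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the DDL and queries should carry over to MySQL with minor changes. The table name matrix_cell, the column names row_id and col_id, and the helper functions are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
# Table, column, and function names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE matrix_cell (
        row_id INTEGER NOT NULL,
        col_id INTEGER NOT NULL,
        value  REAL    NOT NULL,
        PRIMARY KEY (row_id, col_id)
    )
    """
)
# Second index so that reading a whole column is also an index lookup.
conn.execute("CREATE INDEX idx_col_row ON matrix_cell (col_id, row_id)")

# A tiny 3 x 4 matrix instead of 100,000 x 1,000.
n_rows, n_cols = 3, 4
cells = [
    (r, c, r * 10.0 + c)
    for r in range(1, n_rows + 1)
    for c in range(1, n_cols + 1)
]
conn.executemany("INSERT INTO matrix_cell VALUES (?, ?, ?)", cells)


def fetch_cell(row_id, col_id):
    """Return a single matrix entry."""
    cur = conn.execute(
        "SELECT value FROM matrix_cell WHERE row_id = ? AND col_id = ?",
        (row_id, col_id),
    )
    return cur.fetchone()[0]


def fetch_row(row_id):
    """Return one matrix row, ordered by column."""
    cur = conn.execute(
        "SELECT value FROM matrix_cell WHERE row_id = ? ORDER BY col_id",
        (row_id,),
    )
    return [value for (value,) in cur]


def fetch_column(col_id):
    """Return one matrix column, ordered by row."""
    cur = conn.execute(
        "SELECT value FROM matrix_cell WHERE col_id = ? ORDER BY row_id",
        (col_id,),
    )
    return [value for (value,) in cur]
```

The primary key on (row_id, col_id) serves single-cell and whole-row reads; the extra (col_id, row_id) index is there only for column reads and can be dropped if columns are never fetched on their own.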

#2


4  

There are a couple of relevant questions here:

The answers for dense matrices seem to boil down to two options: a normalized table with columns for column, row, and value, as suggested by Taesung above, or something like storing each row of your original matrix as a blob.
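
The blob option could look roughly like the sketch below. It is only an illustration under assumptions: Python's built-in sqlite3 stands in for MySQL, each row is packed as 8-byte doubles in the machine's native byte order, and the names matrix_row, store_row, and load_row are hypothetical.

```python
import sqlite3
from array import array

# Assumption: an in-memory SQLite database stands in for MySQL.
# Table and function names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matrix_row (row_id INTEGER PRIMARY KEY, data BLOB NOT NULL)"
)


def store_row(row_id, values):
    """Pack one matrix row as native-endian doubles and store it as a blob."""
    packed = array("d", values).tobytes()
    conn.execute(
        "INSERT INTO matrix_row (row_id, data) VALUES (?, ?)",
        (row_id, packed),
    )


def load_row(row_id):
    """Fetch one blob and unpack it back into a list of floats."""
    (blob,) = conn.execute(
        "SELECT data FROM matrix_row WHERE row_id = ?", (row_id,)
    ).fetchone()
    values = array("d")
    values.frombytes(blob)
    return values.tolist()


store_row(1, [1.2, 2.3, 3.4])
store_row(2, [0.0] * 1000)  # a full-width row of 1,000 values
```

With this layout a whole row comes back from a single primary-key lookup, but reading one column means fetching and unpacking every row, so it fits row-oriented access best.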

HDF5 looks to be made for this sort of thing. It would be great if someone with experience could comment further.
