Database: the best way to model a spreadsheet

Time: 2022-09-15 19:48:01

I am trying to figure out the best way to model a spreadsheet (from the database point of view), taking into account the following:

  • The spreadsheet can contain a variable number of rows.

  • The spreadsheet can contain a variable number of columns.

  • Each cell in a column holds a single value, but the column's type is not known in advance (integer, date, string).

  • It has to be easy (and performant) to generate a CSV file containing the data.

I am thinking of something like:

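from django.db import models


class Spreadsheet(models.Model):
    name = models.CharField(max_length=100)
    creation_date = models.DateField()


class Column(models.Model):
    # on_delete is required by Django 2.0 and later.
    spreadsheet = models.ForeignKey(Spreadsheet, on_delete=models.CASCADE)
    name = models.CharField(max_length=100)
    # Declared type of the column's values, e.g. integer, date, string.
    type = models.CharField(max_length=100)


class Cell(models.Model):
    column = models.ForeignKey(Column, on_delete=models.CASCADE)
    row_number = models.IntegerField()
    # Every value is stored as text and interpreted using Column.type.
    value = models.CharField(max_length=100)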

Can you think of a better way to model a spreadsheet? My approach stores every value as a string, and I am worried that generating the CSV file will be too slow.
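To make the CSV concern concrete, here is a minimal sketch of how cell-per-record data could be turned into CSV in one pass. It is plain Python rather than Django: the function name cells_to_csv and the (row_number, column_index, value) tuple layout are assumptions made for illustration, standing in for whatever a single query over the Cell table would return.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def cells_to_csv(column_names, cells):
    """Build CSV text from (row_number, column_index, value) tuples.

    Cells that are missing from a row are written as empty fields.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(column_names)
    # Sort once so that all cells of a row are adjacent, then group by row.
    ordered = sorted(cells, key=itemgetter(0, 1))
    for _, row_cells in groupby(ordered, key=itemgetter(0)):
        row = [""] * len(column_names)
        for _, column_index, value in row_cells:
            row[column_index] = value
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical sample data: row 1 has no value in column 1.
example_cells = [(0, 0, "Alice"), (0, 1, "30"), (1, 0, "Bob")]
csv_text = cells_to_csv(["name", "age"], example_cells)
```

The point of the sketch is that the export needs one fetch of all cells plus a sort, not one query per cell, so the string storage itself is unlikely to be the bottleneck.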

4 solutions

#1


2  

You may want to study EAV (entity-attribute-value) data models, since they try to solve a similar problem.

Entity-Attribute-Value - Wikipedia

#2


4  

From a relational viewpoint:

Spreadsheet <-->> Cell : RowId, ColumnId, ValueType, Contents

There is no requirement for rows and columns to be entities, but you can model them that way if you like.
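As a rough illustration of the single Cell relation described above, the sketch below uses SQLite through Python's standard sqlite3 module. The table and column names (cell, row_id, column_id, value_type, contents), the set of type labels, and the read_row helper are assumptions chosen to mirror the answer, not a prescribed schema.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cell (
        spreadsheet_id INTEGER NOT NULL,
        row_id         INTEGER NOT NULL,
        column_id      INTEGER NOT NULL,
        value_type     TEXT NOT NULL,   -- 'integer', 'date' or 'string'
        contents       TEXT,
        PRIMARY KEY (spreadsheet_id, row_id, column_id)
    )
    """
)
conn.executemany(
    "INSERT INTO cell VALUES (?, ?, ?, ?, ?)",
    [
        (1, 0, 0, "string", "Alice"),
        (1, 0, 1, "integer", "30"),
        (1, 0, 2, "date", "2022-09-15"),
    ],
)

# Converters from the stored text back to Python values, keyed by value_type.
CONVERTERS = {"integer": int, "date": date.fromisoformat, "string": str}


def read_row(conn, spreadsheet_id, row_id):
    """Return the typed values of one row, ordered by column."""
    rows = conn.execute(
        "SELECT value_type, contents FROM cell "
        "WHERE spreadsheet_id = ? AND row_id = ? ORDER BY column_id",
        (spreadsheet_id, row_id),
    )
    return [CONVERTERS[value_type](contents) for value_type, contents in rows]


typed_row = read_row(conn, 1, 0)
```

Here rows and columns are plain integers on the cell, which is the "no requirement to be entities" variant; promoting them to their own tables would only add foreign keys.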

#3


3  

Databases aren't designed for this, but you can try a couple of different approaches.

The naive way is a version of One Table To Rule Them All. That is, create a giant generic table, with every column typed as (n)varchar, that has enough columns to cover any foreseeable spreadsheet. Then you'll need a second table to store metadata about the first, such as what Column1's spreadsheet column name is and what type it stores (so you can cast in and out). Then you'll need triggers on inserts that check the incoming data against the metadata to make sure the data isn't corrupt, and so on. As you can see, this way is a complete and utter cluster. I'd run screaming from it.
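For readers who want to see what the generic-table approach looks like in practice, here is a small sketch using SQLite via Python's sqlite3 module. Everything named here (sheet_data, sheet_meta, column1..column3, load_rows) is hypothetical, three generic columns stand in for "enough columns", and the validation triggers mentioned above are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Generic data table: every spreadsheet column is stored as text.
conn.execute(
    "CREATE TABLE sheet_data ("
    "spreadsheet_id INTEGER, row_number INTEGER, "
    "column1 TEXT, column2 TEXT, column3 TEXT)"
)
# Metadata table: maps each generic column to a spreadsheet name and type.
conn.execute(
    "CREATE TABLE sheet_meta ("
    "spreadsheet_id INTEGER, generic_column TEXT, name TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO sheet_meta VALUES (?, ?, ?, ?)",
    [(1, "column1", "name", "string"), (1, "column2", "age", "integer")],
)
conn.execute("INSERT INTO sheet_data VALUES (1, 0, 'Alice', '30', NULL)")

ALLOWED_COLUMNS = {"column1", "column2", "column3"}
CASTS = {"string": str, "integer": int}


def load_rows(conn, spreadsheet_id):
    """Read a spreadsheet's rows, casting each value using the metadata."""
    meta = [
        m for m in conn.execute(
            "SELECT generic_column, name, type FROM sheet_meta "
            "WHERE spreadsheet_id = ? ORDER BY generic_column",
            (spreadsheet_id,),
        )
        if m[0] in ALLOWED_COLUMNS  # only known identifiers reach the SQL text
    ]
    select_list = ", ".join(generic for generic, _, _ in meta)
    query = (
        f"SELECT {select_list} FROM sheet_data "
        "WHERE spreadsheet_id = ? ORDER BY row_number"
    )
    rows = []
    for raw in conn.execute(query, (spreadsheet_id,)):
        rows.append({
            name: None if value is None else CASTS[type_name](value)
            for (_, name, type_name), value in zip(meta, raw)
        })
    return rows


rows = load_rows(conn, 1)
```

Even this small version needs dynamic SQL and a cast table just to read one row back, which gives a sense of why the approach gets unwieldy.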

The second option is to store your data as XML. Most modern databases have XML data types and some support for XPath within queries. You can also use XSDs to provide some kind of data validation, and XSLT to transform that data into CSVs. I'm currently doing something similar with configuration files, and it's working out okay so far. No word on performance issues yet, but I'm trusting Knuth on that one.
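A minimal sketch of the XML option follows. The document layout (spreadsheet, columns, column, row, cell elements) is invented for illustration, and the conversion uses Python's xml.etree.ElementTree with simple path lookups in place of XSLT, since the standard library has no XSLT processor; XSD validation is not shown.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout for one stored spreadsheet.
SHEET_XML = """
<spreadsheet name="people">
  <columns>
    <column name="name" type="string"/>
    <column name="age" type="integer"/>
  </columns>
  <row><cell>Alice</cell><cell>30</cell></row>
  <row><cell>Bob</cell><cell/></row>
</spreadsheet>
"""


def xml_to_csv(xml_text):
    """Convert the spreadsheet XML above into CSV text."""
    root = ET.fromstring(xml_text.strip())
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row comes from the column declarations.
    writer.writerow(col.get("name") for col in root.findall("./columns/column"))
    for row in root.findall("./row"):
        # An empty <cell/> has no text, so write it as an empty field.
        writer.writerow((cell.text or "") for cell in row.findall("./cell"))
    return buffer.getvalue()


csv_text = xml_to_csv(SHEET_XML)
```

With this layout the whole spreadsheet is one value in the database, so export is a single read followed by an in-memory transform.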

The first option is probably much easier to search and faster to retrieve data from, but the second is probably more stable and definitely easier to program against.

It's times like this that I wish Celko had an SO account.

#4


1  

The best solution depends greatly on how the database will be used. Try to identify a couple of the top use cases you expect, and then decide on the design. For example, if there is no use case for fetching the value of a single cell from the database (the data is always loaded at row level, or even in groups of rows), then there is no need to store a 'cell' as such.
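If row-level access really is the only use case, one possible design is to store each row as a single serialized value. The sketch below assumes SQLite via Python's sqlite3 module and JSON as the serialization; the sheet_row table, its columns, and the export_csv helper are hypothetical names.

```python
import csv
import io
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# One record per spreadsheet row; the cells are kept together as a JSON array.
conn.execute(
    "CREATE TABLE sheet_row ("
    "spreadsheet_id INTEGER, row_number INTEGER, cells TEXT, "
    "PRIMARY KEY (spreadsheet_id, row_number))"
)
conn.executemany(
    "INSERT INTO sheet_row VALUES (?, ?, ?)",
    [
        (1, 0, json.dumps(["Alice", 30, "2022-09-15"])),
        (1, 1, json.dumps(["Bob", 25, "2022-09-16"])),
    ],
)


def export_csv(conn, spreadsheet_id, header):
    """Write one CSV line per stored row, in row order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    query = (
        "SELECT cells FROM sheet_row "
        "WHERE spreadsheet_id = ? ORDER BY row_number"
    )
    for (cells,) in conn.execute(query, (spreadsheet_id,)):
        writer.writerow(json.loads(cells))
    return buffer.getvalue()


csv_text = export_csv(conn, 1, ["name", "age", "joined"])
```

The trade-off is the one the answer describes: CSV export becomes a straight scan, but looking up or updating a single cell means reading and rewriting its whole row.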
