在运行时,我应该使用哪个数据库模型对实体/属性进行动态修改?

时间:2022-08-01 12:54:00

I am thinking about creating an open source data management web application for various types of data.

我正在考虑为各种类型的数据创建一个开源数据管理web应用程序。

A privileged user must be able to

特权用户必须能够

  • add new entity types (for example a 'user' or a 'family')
  • 添加新的实体类型(例如'user'或'family')
  • add new properties to entity types (for example 'gender' to 'user')
  • 向实体类型添加新属性(例如'gender'到'user')
  • remove/modify entities and properties
  • 删除/修改实体和属性

These will be common tasks for the privileged user. He will do this through the web interface of the application. In the end, all data must be searchable and sortable by all types of users of the application. Two questions trouble me:

这些将是特权用户的常见任务。他将通过应用程序的web界面来实现这一点。最后,所有数据必须由应用程序的所有类型的用户搜索和排序。两个问题麻烦我。

a) How should the data be stored in the database? Should I dynamically add/remove database tables and/or columns during runtime?

a)如何将数据存储在数据库中?我应该在运行时动态地添加/删除数据库表和/或列吗?

I am no database expert. I am stuck with the imagination that in terms of relational databases, the application has to be able to dynamically add/remove tables (entities) and/or columns (properties) at runtime. And I don't like this idea. Likewise, I am thinking if such dynamic data should be handled in a NoSQL database.

我不是数据库专家。我一直想象,在关系数据库方面,应用程序必须能够在运行时动态地添加/删除表(实体)和/或列(属性)。我不喜欢这个想法。同样,我在考虑是否应该在NoSQL数据库中处理此类动态数据。

Anyway, I believe that this kind of problem has an intelligent canonical solution, which I just did not find and think of so far. What is the best approach for this kind of dynamic data management?

无论如何,我相信这类问题有一个智能的典型解决方案,我只是到目前为止还没有找到,也没有想到。这种动态数据管理的最佳方法是什么?

b) How to implement this in Python using an ORM or NoSQL?

b)如何使用ORM或NoSQL在Python中实现它?

If you recommend using a relational database model, then I would like to use SQLAlchemy. However, I don't see how to dynamically create tables/columns with an ORM at runtime. This is one of the reasons why I hope that there is a much better approach than creating tables and columns during runtime. Is the recommended database model efficiently implementable with SQLAlchemy?

如果您建议使用关系数据库模型,那么我希望使用SQLAlchemy。但是,我不知道如何在运行时使用ORM动态创建表/列。这就是为什么我希望有一种比在运行时创建表和列更好的方法。推荐的数据库模型在SQLAlchemy中能够有效地实现吗?

If you recommend using a NoSQL database, which one? I like using Redis -- can you imagine an efficient implementation based on Redis?

如果您建议使用NoSQL数据库,您建议使用哪个?我喜欢使用Redis,你能想象一个基于Redis的高效实现吗?

Thanks for your suggestions!

谢谢你的建议!

Edit in response to some comments:

编辑回应一些评论:

The idea is that all instances ("rows") of a certain entity ("table") share the same set of properties/attributes ("columns"). However, it will be perfectly valid if certain instances have an empty value for certain properties/attributes.

其思想是某个实体(“表”)的所有实例(“行”)共享相同的属性/属性集(“列”)。但是,如果某些实例的某些属性/属性的值为空,那么它将是完全有效的。

Basically, users will search the data through a simple form on a website. They query for e.g. all instances of an entity E with property P having a value V higher than T. The result can be sorted by the value of any property.

基本上,用户将通过网站上的简单表单搜索数据。它们查询具有P值大于t的实体E的所有实例,结果可以根据任何属性的值进行排序。

The datasets won't become too large. Hence, I think even the stupidest approach would still lead to a working system. However, I am an enthusiast and I'd like to apply modern and appropriate technology as well as I'd like to be aware of theoretical bottlenecks. I want to use this project in order to gather experience in designing a "Pythonic", state-of-the-art, scalable, and reliable web application.

数据集不会变得太大。因此,我认为即使是最愚蠢的方法也会导致一个工作系统。然而,我是一个热心的人,我想应用现代和合适的技术,同时我也想了解理论瓶颈。我想利用这个项目来积累设计“python化”的、最先进的、可伸缩的、可靠的web应用程序的经验。

I see that the first comments tend to recommending a NoSQL approach. Although I really like Redis, it looks like it would be stupid not to take advantage of the Document/Collection model of Mongo/Couch. I've been looking into mongodb and mongoengine for Python. By doing so, do I take steps into the right direction?

我看到第一个注释倾向于推荐一个NoSQL方法。虽然我很喜欢Redis,但如果不利用Mongo/Couch的文档/收集模型,那就太愚蠢了。我一直在为Python研究mongodb和mongoengine。通过这样做,我是否朝着正确的方向迈出了一步?

Edit 2 in response to some answers/comments:

编辑2以回应一些答案/评论:

From most of your answers, I conclude that the dynamic creation/deletion of tables and columns in the relational picture is not the way to go. This already is valuable information. Also, one opinion is that the whole idea of the dynamic modification of entities and properties could be bad design.

从您的大多数回答中,我得出的结论是,在关系图中动态创建/删除表和列不是正确的方法。这已经是有价值的信息。另外,有一种观点认为,对实体和属性进行动态修改的整个想法可能是糟糕的设计。

As exactly this dynamic nature should be the main purpose/feature of the application, I don't give up on this. From the theoretical point of view, I accept that performing operations on a dynamic data model must necessarily be slower than performing operations on a static data model. This is totally fine.

正如这种动态特性应该是应用程序的主要目的/特性一样,我不会就此放弃。从理论的角度来看,我认为在动态数据模型上执行操作必须比在静态数据模型上执行操作慢。这是完全好了。

Expressed in an abstract way, the application needs to manage

用抽象的方式表示,应用程序需要进行管理

  1. the data layout, i.e. a "dynamic list" of valid entity types and a "dynamic list" of properties for each valid entity type
  2. 数据布局,即有效实体类型的“动态列表”和每个有效实体类型的“动态列表”属性
  3. the data itself
  4. 数据本身

I am looking for an intelligent and efficient way to implement this. From your answers, it looks like NoSQL is the way to go here, which is another important conclusion.

我正在寻找一种智能和高效的方法来实现这一点。从你的回答来看,NoSQL似乎是这里的方法,这是另一个重要的结论。

4 个解决方案

#1


3  

So, if you conceptualize your entities as "documents," then this whole problem maps onto a no-sql solution pretty well. As commented, you'll need to have some kind of model layer that sits on top of your document store and performs tasks like validation, and perhaps enforces (or encourages) some kind of schema, because there's no implicit backend requirement that entities in the same collection (parallel to table) share schema.

因此,如果您将实体概念化为“文档”,那么整个问题就很好地映射到非sql解决方案。如前所述,您将需要某种模型层,该模型层位于文档存储之上,并执行诸如验证之类的任务,并且可能强制(或鼓励)某种模式,因为不存在要求相同集合(与表并行)中的实体共享模式的隐式后端需求。

Allowing privileged users to change your schema concept (as opposed to just adding fields to individual documents - that's easy to support) will pose a little bit of a challenge - you'll have to handle migrating the existing data to match the new schema automatically.

允许有特权的用户更改您的模式概念(而不是仅仅向单个文档添加字段——这很容易支持)将带来一点挑战——您必须处理迁移现有数据以自动匹配新的模式。

Reading your edits, Mongo supports the kind of searching/ordering you're looking for, and will give you the support for "empty cells" (documents lacking a particular key) that you need.

阅读您的编辑,Mongo支持您正在寻找的搜索/排序,并将为您所需的“空单元格”(缺少特定键的文档)提供支持。

If I were you (and I happen to be working on a similar, but simpler, product at the moment), I'd stick with Mongo and look into a lightweight web framework like Flask to provide the front-end. You'll be on your own to provide the model, but you won't be fighting against a framework's implicit modeling choices.

如果我是你(我碰巧正在开发一款类似但更简单的产品),我会坚持使用Mongo,研究Flask这样的轻量级web框架,以提供前端。您将自己提供模型,但不会与框架的隐式建模选择发生冲突。

#2


19  

The SQL or NoSQL choice is not your problem. You need to read little more about database design in general. As you said, you're not a database expert(and you don't need to be), but you absolutely must study a little more the RDBMS paradigm.

SQL或NoSQL选项不是您的问题。您需要阅读更多关于数据库设计的内容。正如您所言,您不是数据库专家(您不需要这样做),但是您绝对必须进一步研究RDBMS范式。

It's a common mistake for amateur enthusiasts to choose a NoSQL solution. Sometimes NoSQL is a good solution, most of the times is not.

业余爱好者选择NoSQL解决方案是一个常见的错误。有时NoSQL是一个很好的解决方案,但大多数时候不是。

Take for example MongoDB, which you mentioned(and is one of the good NoSQL solutions I've tried). Schema-less, right? Err.. not exactly. You see when something is schema-less means no constraints, validation, etc. But your application's models/entities can't stand on thin air! Surely there will be some constraints and validation logic which you will implement on your software layer. So I give you mongokit! I will just quote from the project's description this tiny bit

以MongoDB为例,您提到过它(它是我尝试过的一个很好的NoSQL解决方案)。非模式化,对吧?呃. .不完全是。您可以看到,当某些东西是无模式的时,意味着没有约束、验证等。但是您的应用程序的模型/实体不能空穴来风!当然,您将在软件层上实现一些约束和验证逻辑。所以我给你mongokit!我将引用这个项目的描述

MongoKit brings structured schema and validation layer on top of the great pymongo driver

MongoKit在伟大的pymongo驱动程序之上带来了结构化的模式和验证层

Hmmm... unstructured became structured.

嗯…非结构化结构化。

At least we don't have SQL right? Yeah, we don't. We have a different query language which is of course inferior to SQL. At least you don't need to resort to map/reduce for basic queries(see CouchDB).

至少我们没有SQL,对吧?是的,我们没有。我们有一种不同的查询语言,当然,它不如SQL。至少您不需要对基本查询使用map/reduce(参见CouchDB)。

Don't get me wrong, NoSQL(and especially MongoDB) has its purpose, but most of the times these technologies are used for the wrong reason.

不要误解我的意思,NoSQL(特别是MongoDB)有它的目的,但是大多数时候这些技术的使用理由都是错误的。

Also, if you care about serious persistence and data integrity forget about NoSQL solutions. All these technologies are too experimental to keep your serious data. By researching a bit who(except Google/Amazon) uses NoSQL solutions and for what exactly, you will find that almost no one uses it for keeping their important data. They mostly use them for logging, messages and real time data. Basically anything to off-load some burden from their SQL db storage.

此外,如果您关心严重的持久性和数据完整性,那么请忘记NoSQL解决方案。所有这些技术都是实验性的,不能保证你的数据可靠。通过研究一些人(除了谷歌/Amazon)使用NoSQL解决方案,而且确切地说,您会发现几乎没有人使用它来保存重要数据。它们主要用于日志记录、消息和实时数据。基本上,可以从SQL db存储中卸载一些负担。

Redis, in my opinion, is probably the only project who is going to survive the NoSQL explosion unscathed. Maybe because it doesn't advertise itself as NoSQL, but as a key-value store, which is exactly what it is and a pretty damn good one! Also they seem serious about persistence. It is a swiss army knife, but not a good solution to replace entirely your RDBMS.

在我看来,Redis可能是唯一一个在NoSQL爆炸中毫发无损的项目。也许是因为它并没有标榜自己是NoSQL,而是作为一个键值存储,这正是它的本质,而且非常棒!此外,他们似乎对坚持不懈很认真。这是一把瑞士军刀,但不是完全替代RDBMS的好解决方案。

I am sorry, I said too much :)

对不起,我说得太多了。

So here is my suggestion:

我的建议是:

1) Study the RDBMS model a bit.

1)对RDBMS模型进行一些研究。

2) Django is a good framework if most of your project is going to use an RDBMS.

2)如果您的项目大部分使用RDBMS, Django是一个很好的框架。

3) Postgresql rocks! Also keep in mind that version 9.2 will bring native JSON support. You could dump all your 'dynamic' properties in there and you could use a secondary storage/engine to perform queries(map/reduce) on said properties. Have your cake and eat it too!

3)Postgresql岩石!还要记住,9.2版本将提供本机JSON支持。您可以在其中转储所有的“动态”属性,您可以使用辅助存储/引擎来执行查询(map/reduce)。把你的蛋糕也吃了!

4) For serious search capabilities consider specialized engines like solr.

4)要想有真正的搜索能力,可以考虑像solr这样的专业搜索引擎。

EDIT: 6 April 2013

编辑:2013年4月6日

5) django-ext-hstore gives you access to postgresql hstore type. It's similar to a python dictionary and you can perform queries on it, with the limitation that you can't have nested dictionaries as values. Also the value of key can be only of type string.

5) django-ext-hstore允许您访问postgresql hstore类型。它类似于python字典,您可以对它执行查询,但有一个限制,即不能将嵌套字典作为值。键值也只能是字符串类型。

Have fun

玩得开心


Update in response to OP's comment

更新响应OP的评论

0) Consider the application 'contains data' and has already been used for a while

0)考虑应用程序“包含数据”,并且已经使用了一段时间

I am not sure if you mean that it contains data in a legacy dbms or you are just trying to say that "imagine that the DB is not empty and consider the following points...". In the former case, it seems a migration issue(completely different question), in the latter, well OK.

我不确定您的意思是它包含了遗留dbms中的数据,或者您只是想说“假设DB不是空的,并且考虑以下几点……”。在前一种情况下,这似乎是一个迁移问题(完全不同的问题),在后一种情况下,好的。

1) Admin deletes entity "family" and all related data

1)管理员删除实体“族”和所有相关数据

Why should someone eliminate completely an entity(table)? Either your application has to do with families, houses, etc or it doesn't. Deleting instances(rows) of families is understandable of course.

为什么有人要完全删除一个实体(表)?要么你的申请涉及家庭,房子,等等。删除家族的实例(行)当然是可以理解的。

2) Admin creates entity "house"

2)管理员创建实体“house”

Same with #1. If you introduce a brand new entity in your app then most probably it will encapsulate semantics and business logic, for which new code must be written. This happens to all applications as they evolve through time and of course warrants a creation of a new table, or maybe ALTERing an existing one. But this process is not a part of the functionality of your application. i.e. it happens rarely, and is a migration/refactoring issue.

# 1也一样。如果您在应用程序中引入一个全新的实体,那么很可能它将封装语义和业务逻辑,因此必须编写新的代码。随着时间的推移,所有应用程序都会出现这种情况,当然,这就需要创建一个新的表,或者修改一个现有的表。但是这个过程不是应用程序功能的一部分。例如,它很少发生,并且是一个迁移/重构问题。

3) Admin adds properties "floors", "age", ..

3)管理员添加属性“楼层”、“年龄”、……

Why? Don't we know beforehand that a House has floors? That a User has a gender? Adding and removing, dynamically, this type of attributes is not a feature, but a design flaw. It is part of the analysis/design phase to identify your entities and their respective properties.

为什么?难道我们不知道房子有地板吗?用户有性别吗?动态地添加和删除这类属性不是特性,而是设计缺陷。它是分析/设计阶段的一部分,用于标识您的实体及其各自的属性。

4) Privileged user adds some houses.

4)特权用户增加一些房子。

Yes, he is adding an instance(row) to the existing entity(table) House.

是的,他正在向现有实体(表)库添加实例(行)。

5) User searches for all houses with at least five floors cheaper than 100 $

5)用户搜索所有至少五层楼低于100美元的房子

A perfectly valid query which can be achieved with either SQL or NoSQL solution. In django it would be something along those lines:

一个完全有效的查询,可以用SQL或NoSQL解决方案实现。在《被解救的姜戈》中,应该是这样的:

House.objects.filter(floors__gte=5, price__lt=100)

provided that House has the attributes floors and price. But if you need to do text-based queries, then neither SQL nor NoSQL will be satisfying enough. Because you don't want to implement faceting or stemming on your own! You will use some of the already discussed solutions(Solr, ElasticSearch, etc).

只要房子有属性、楼层和价格。但是,如果您需要执行基于文本的查询,那么SQL和NoSQL都不够令人满意。因为你不想自己去实现它。您将使用一些已经讨论过的解决方案(Solr、ElasticSearch等)。

Some more general notes:

一些更一般的笔记:

The examples you gave about Houses, Users and their properties, do not warrant any dynamic schema. Maybe you simplified your example just to make your point, but you talk about adding/removing Entities(tables) as if they are rows in a db. Entities are supposed to be a big deal in an application. They define the purpose of your application and its functionality. As such, they can't change every minute.

您所给出的关于房屋、用户及其属性的示例不支持任何动态模式。也许你简化了你的例子来说明你的观点,但是你谈论的是添加/删除实体(表),就好像它们是db中的行。实体在应用程序中应该是很重要的。它们定义了应用程序的目的及其功能。因此,他们不能每一分钟都改变。

Also you said:

你也说:

The idea is that all instances ("rows") of a certain entity ("table") share the same set of properties/attributes ("columns"). However, it will be perfectly valid if certain instances have an empty value for certain properties/attributes.

This seems like a common case where an attribute has null=True.

这似乎是一个属性为null=True的常见情况。

And as a final note, I would like to suggest to you to try both approaches(SQL and NoSQL), since it doesn't seem like your career depends on this project. It will be a beneficiary experience, as you will understand first-hand, the cons and pros of each approach. Or even how to "blend" these approaches together.

最后,我建议您尝试这两种方法(SQL和NoSQL),因为您的职业生涯似乎并不取决于这个项目。这将是一个受益的经验,因为您将了解第一手,缺点和优点的每种方法。甚至是如何“混合”这些方法。

#3


6  

What you're asking about is a common requirement in many systems -- how to extend a core data model to handle user-defined data. That's a popular requirement for packaged software (where it is typically handled one way) and open-source software (where it is handled another way).

您所询问的是许多系统中的一个常见需求——如何扩展核心数据模型来处理用户定义的数据。这是打包软件(通常以一种方式处理)和开源软件(以另一种方式处理)的普遍需求。

The earlier advice to learn more about RDBMS design generally can't hurt. What I will add to that is, don't fall into the trap of re-implementing a relational database in your own application-specific data model! I have seen this done many times, usually in packaged software. Not wanting to expose the core data model (or permission to alter it) to end users, the developer creates a generic data structure and an app interface that allows the end user to define entities, fields etc. but not using the RDBMS facilities. That's usually a mistake because it's hard to be nearly as thorough or bug-free as what a seasoned RDBMS can just do for you, and it can take a lot of time. It's tempting but IMHO not a good idea.

早期关于RDBMS设计的建议通常不会有什么坏处。我要补充的是,不要陷入在您自己的特定于应用程序的数据模型中重新实现关系数据库的陷阱!我已经见过很多次了,通常是在打包的软件中。开发人员不希望向最终用户公开核心数据模型(或者允许修改),而是创建了一个通用的数据结构和一个应用程序接口,允许最终用户定义实体、字段等,而不是使用RDBMS设施。这通常是一个错误,因为很难像经验丰富的RDBMS为您所做的那样彻底或无bug,而且它可能需要大量的时间。这主意不错,但我觉得不是个好主意。

Assuming the data model changes are global (shared by all users once admin has made them), the way I would approach this problem would be to create an app interface to sit between the admin user and the RDBMS, and apply whatever rules you need to apply to the data model changes, but then pass the final changes to the RDBMS. So for example, you may have rules that say entity names need to follow a certain format, new entities are allowed to have foreign keys to existing tables but must always use the DELETE CASCADE rule, fields can only be of certain data types, all fields must have default values etc. You could have a very simple screen asking the user to provide entity name, field names & defaults etc. and then generate the SQL code (inclusive of all your rules) to make these changes to your database.

假设数据模型变化是全球(由所有用户共享一次管理取得了),我会解决这个问题的方法是创建一个应用程序接口来坐admin用户和RDBMS之间,并应用任何规定您需要适用于数据模型的变化,然后通过最后更改RDBMS。举个例子,你可能规则说实体名称需要遵循一定的格式,新实体允许现有表的外键,但必须始终使用级联删除规则,某些数据类型的字段只能,所有字段必须有默认值等等。你可以有一个非常简单的屏幕要求用户提供实体名称,字段名称和默认值等,然后生成SQL代码(包括你所有的规则),使这些更改您的数据库。

Some common rules & how you would address them would be things like:

一些常见的规则&你会如何处理它们:

-- if a field is not null and has a default value, and there are already existing records in the table before that field was added by the admin, update existing records to have the default value while creating the field (multiple steps -- add field allowing null; update all existing records; alter the table to enforce not null w/ default) -- otherwise you wouldn't be able to use a field-level integrity rule)

——如果一个字段不是null,并且具有默认值,并且在admin添加该字段之前,表中已经有记录,那么在创建字段时更新现有记录,使其具有默认值(多个步骤——添加允许null的字段;更新所有现有记录;修改表以强制使用not null w/ default)——否则您将无法使用字段级的完整性规则)

-- new tables must have a distinct naming pattern so you can continue to distinguish your core data model from the user-extended data model, i.e. core and user-defined have different RDBMS owners (dbo. vs. user.) or prefixes (none for core, __ for user-defined) or somesuch.

——新表必须具有不同的命名模式,以便您能够继续将核心数据模型与用户扩展数据模型区分开来,即core和用户定义的RDBMS所有者不同(dbo)。或前缀(没有用于核心的,用于用户定义的)或somesuch。

-- it is OK to add fields to tables that are in the core data model (as long as they tolerate nulls or have a default), and it is OK for admin to delete fields that admin added to core data model tables, but admin cannot delete fields that were defined as part of the core data model.

——它可以将字段添加到表的核心数据模型(只要他们容忍null或有一个默认),和管理员可以删除字段管理核心数据模型添加到表,但管理员不能删除字段,定义为核心数据模型的一部分。

In other words -- use the power of the RDBMS to define the tables and manage the data, but in order to ensure whatever conventions or rules you need will always be applied, do this by building an app-to-DB admin function, instead of giving the admin user direct DB access.

换句话说,使用RDBMS的功能来定义表和管理数据,但是为了确保您所需要的任何约定或规则总是被应用,通过构建一个appto -DB的管理函数来实现这一点,而不是给admin用户直接的DB访问。

If you really wanted to do this via the DB layer only, you could probably achieve the same by creating a bunch of stored procedures and triggers that would implement the same logic (and who knows, maybe you would do that anyway for your app). That's probably more of a question of how comfortable are your admin users working in the DB tier vs. via an intermediary app.

如果您真的想通过DB层来实现这一点,那么您可以通过创建一堆存储过程和触发器来实现相同的逻辑(谁知道呢,也许您可以为您的应用程序这样做)。这可能更多的是一个问题,即您的管理员用户在DB层与通过中介应用程序工作时的舒适度。


So to answer your questions directly:

所以直接回答你的问题:

(1) Yes, add tables and columns at run time, but think about the rules you will need to have to ensure your app can work even once user-defined data is added, and choose a way to enforce those rules (via app or via DB / stored procs or whatever) when you process the table & field changes.

(1)是的,在运行时添加表和列,但考虑规则需要确保应用程序能够工作即使添加用户定义的数据,并选择一种方式来执行这些规则(通过应用程序或通过DB /存储效果之类的)当你表和字段的变化过程。

(2) This issue isn't strongly affected by your choice of SQL vs. NoSQL engine. In every case, you have a core data model and an extended data model. If you can design your app to respond to a dynamic data model (e.g. add new fields to screens when fields are added to a DB table or whatever) then your app will respond nicely to changes in both the core and user-defined data model. That's an interesting challenge but not much affected by choice of DB implementation style.

(2)这个问题不受SQL和NoSQL引擎的选择的影响。在每种情况下,都有一个核心数据模型和一个扩展的数据模型。如果你可以设计你的应用程序来响应一个动态数据模型(例如,当字段被添加到DB表或其他东西时,在屏幕上添加新的字段),那么你的应用程序将会很好地响应core和用户定义的数据模型的变化。这是一个有趣的挑战,但不会受到选择DB实现样式的影响。

Good luck!

好运!

#4


4  

Maybe doesn't matter the persistence engine of your model objects (RDBMS, NoSQL, etc...). The technology you're looking for is an index to search for and find your objects.

也许不影响模型对象的持久性引擎(RDBMS、NoSQL等等…)。您正在寻找的技术是一个索引来搜索和找到您的对象。

I think you need to find your objects using their schema. So, if the schema is defined dynamically and persisted on a database you can build dynamic search forms, etc.. Some kind of reference of the entity and attributes to the real objects is needed.

我认为您需要使用它们的模式找到您的对象。因此,如果模式是动态定义的,并且持久化到数据库中,您可以构建动态搜索表单,等等。需要某种实体和属性对真实对象的引用。

Give a look to the Entity-Attribute-Model pattern (EAV). This can be implemented over SQLAlchemy to use an RDBMS database as mean to store vertically your schema and data and relate thems.

看看实体-属性-模型模式(EAV)。这可以通过SQLAlchemy实现,以使用RDBMS数据库作为垂直存储模式和数据并关联thems的手段。

You're entering in the field of the Semantic Web Programming, maybe you should read at less the first chapter of this book:

你正在进入语义Web编程领域,也许你应该读一下这本书的第一章:

Programming The Semantic Web

编程语义网

it tells the whole story of your problem: from rigid schemas to dynamic schemas, implemented first as a key-value store and later improved to a graph persistence over a relational model.

它讲述了您的问题的整个过程:从严格的模式到动态模式,首先实现为键值存储,然后改进为关系模型上的图形持久性。

My opinion is that the best implementations of this could be achieved nowadays with graph databases and a very good example of current implementations are Berkeley DBs (some LDAP implementations use Berkeley DBs as a tech implementation to this indexing problem.)

我的观点是,现在最好的实现是通过图形数据库实现的,当前实现的一个很好的例子是Berkeley DBs(一些LDAP实现使用Berkeley DBs作为这个索引问题的技术实现)。

Once in a graph model you could do some kind of "inferences" on the graph, making appear the DB with somekind of "intelligence". An example of this is stated on the book.

一旦在一个图模型中,你可以在图上做一些“推断”,使DB具有某种“智能”。书中有一个例子。

#1


3  

So, if you conceptualize your entities as "documents," then this whole problem maps onto a no-sql solution pretty well. As commented, you'll need to have some kind of model layer that sits on top of your document store and performs tasks like validation, and perhaps enforces (or encourages) some kind of schema, because there's no implicit backend requirement that entities in the same collection (parallel to table) share schema.

因此,如果您将实体概念化为“文档”,那么整个问题就很好地映射到非sql解决方案。如前所述,您将需要某种模型层,该模型层位于文档存储之上,并执行诸如验证之类的任务,并且可能强制(或鼓励)某种模式,因为不存在要求相同集合(与表并行)中的实体共享模式的隐式后端需求。

Allowing privileged users to change your schema concept (as opposed to just adding fields to individual documents - that's easy to support) will pose a little bit of a challenge - you'll have to handle migrating the existing data to match the new schema automatically.

允许有特权的用户更改您的模式概念(而不是仅仅向单个文档添加字段——这很容易支持)将带来一点挑战——您必须处理迁移现有数据以自动匹配新的模式。

Reading your edits, Mongo supports the kind of searching/ordering you're looking for, and will give you the support for "empty cells" (documents lacking a particular key) that you need.

阅读您的编辑,Mongo支持您正在寻找的搜索/排序,并将为您所需的“空单元格”(缺少特定键的文档)提供支持。

If I were you (and I happen to be working on a similar, but simpler, product at the moment), I'd stick with Mongo and look into a lightweight web framework like Flask to provide the front-end. You'll be on your own to provide the model, but you won't be fighting against a framework's implicit modeling choices.

如果我是你(我碰巧正在开发一款类似但更简单的产品),我会坚持使用Mongo,研究Flask这样的轻量级web框架,以提供前端。您将自己提供模型,但不会与框架的隐式建模选择发生冲突。

#2


19  

The SQL or NoSQL choice is not your problem. You need to read little more about database design in general. As you said, you're not a database expert(and you don't need to be), but you absolutely must study a little more the RDBMS paradigm.

SQL或NoSQL选项不是您的问题。您需要阅读更多关于数据库设计的内容。正如您所言,您不是数据库专家(您不需要这样做),但是您绝对必须进一步研究RDBMS范式。

It's a common mistake for amateur enthusiasts to choose a NoSQL solution. Sometimes NoSQL is a good solution, most of the times is not.

业余爱好者选择NoSQL解决方案是一个常见的错误。有时NoSQL是一个很好的解决方案,但大多数时候不是。

Take for example MongoDB, which you mentioned(and is one of the good NoSQL solutions I've tried). Schema-less, right? Err.. not exactly. You see when something is schema-less means no constraints, validation, etc. But your application's models/entities can't stand on thin air! Surely there will be some constraints and validation logic which you will implement on your software layer. So I give you mongokit! I will just quote from the project's description this tiny bit

以MongoDB为例,您提到过它(它是我尝试过的一个很好的NoSQL解决方案)。非模式化,对吧?呃. .不完全是。您可以看到,当某些东西是无模式的时,意味着没有约束、验证等。但是您的应用程序的模型/实体不能空穴来风!当然,您将在软件层上实现一些约束和验证逻辑。所以我给你mongokit!我将引用这个项目的描述

MongoKit brings structured schema and validation layer on top of the great pymongo driver

MongoKit在伟大的pymongo驱动程序之上带来了结构化的模式和验证层

Hmmm... unstructured became structured.

嗯…非结构化结构化。

At least we don't have SQL right? Yeah, we don't. We have a different query language which is of course inferior to SQL. At least you don't need to resort to map/reduce for basic queries(see CouchDB).

至少我们没有SQL,对吧?是的,我们没有。我们有一种不同的查询语言,当然,它不如SQL。至少您不需要对基本查询使用map/reduce(参见CouchDB)。

Don't get me wrong, NoSQL(and especially MongoDB) has its purpose, but most of the times these technologies are used for the wrong reason.

不要误解我的意思,NoSQL(特别是MongoDB)有它的目的,但是大多数时候这些技术的使用理由都是错误的。

Also, if you care about serious persistence and data integrity forget about NoSQL solutions. All these technologies are too experimental to keep your serious data. By researching a bit who(except Google/Amazon) uses NoSQL solutions and for what exactly, you will find that almost no one uses it for keeping their important data. They mostly use them for logging, messages and real time data. Basically anything to off-load some burden from their SQL db storage.

此外,如果您关心严重的持久性和数据完整性,那么请忘记NoSQL解决方案。所有这些技术都是实验性的,不能保证你的数据可靠。通过研究一些人(除了谷歌/Amazon)使用NoSQL解决方案,而且确切地说,您会发现几乎没有人使用它来保存重要数据。它们主要用于日志记录、消息和实时数据。基本上,可以从SQL db存储中卸载一些负担。

Redis, in my opinion, is probably the only project who is going to survive the NoSQL explosion unscathed. Maybe because it doesn't advertise itself as NoSQL, but as a key-value store, which is exactly what it is and a pretty damn good one! Also they seem serious about persistence. It is a swiss army knife, but not a good solution to replace entirely your RDBMS.

在我看来,Redis可能是唯一一个在NoSQL爆炸中毫发无损的项目。也许是因为它并没有标榜自己是NoSQL,而是作为一个键值存储,这正是它的本质,而且非常棒!此外,他们似乎对坚持不懈很认真。这是一把瑞士军刀,但不是完全替代RDBMS的好解决方案。

I am sorry, I said too much :)

对不起,我说得太多了。

So here is my suggestion:

我的建议是:

1) Study the RDBMS model a bit.

1)对RDBMS模型进行一些研究。

2) Django is a good framework if most of your project is going to use an RDBMS.

2)如果您的项目大部分使用RDBMS, Django是一个很好的框架。

3) Postgresql rocks! Also keep in mind that version 9.2 will bring native JSON support. You could dump all your 'dynamic' properties in there and you could use a secondary storage/engine to perform queries(map/reduce) on said properties. Have your cake and eat it too!

3)Postgresql岩石!还要记住,9.2版本将提供本机JSON支持。您可以在其中转储所有的“动态”属性,您可以使用辅助存储/引擎来执行查询(map/reduce)。把你的蛋糕也吃了!

4) For serious search capabilities consider specialized engines like solr.

4)要想有真正的搜索能力,可以考虑像solr这样的专业搜索引擎。

EDIT: 6 April 2013

编辑:2013年4月6日

5) django-ext-hstore gives you access to postgresql hstore type. It's similar to a python dictionary and you can perform queries on it, with the limitation that you can't have nested dictionaries as values. Also the value of key can be only of type string.

5) django-ext-hstore允许您访问postgresql hstore类型。它类似于python字典,您可以对它执行查询,但有一个限制,即不能将嵌套字典作为值。键值也只能是字符串类型。

Have fun

玩得开心


Update in response to OP's comment

更新响应OP的评论

0) Consider the application 'contains data' and has already been used for a while

0)考虑应用程序“包含数据”,并且已经使用了一段时间

I am not sure if you mean that it contains data in a legacy dbms or you are just trying to say that "imagine that the DB is not empty and consider the following points...". In the former case, it seems a migration issue(completely different question), in the latter, well OK.

我不确定您的意思是它包含了遗留dbms中的数据,或者您只是想说“假设DB不是空的,并且考虑以下几点……”。在前一种情况下,这似乎是一个迁移问题(完全不同的问题),在后一种情况下,好的。

1) Admin deletes entity "family" and all related data

1)管理员删除实体“族”和所有相关数据

Why should someone eliminate completely an entity(table)? Either your application has to do with families, houses, etc or it doesn't. Deleting instances(rows) of families is understandable of course.

为什么有人要完全删除一个实体(表)?要么你的申请涉及家庭,房子,等等。删除家族的实例(行)当然是可以理解的。

2) Admin creates entity "house"

2)管理员创建实体“house”

Same with #1. If you introduce a brand new entity in your app then most probably it will encapsulate semantics and business logic, for which new code must be written. This happens to all applications as they evolve through time and of course warrants a creation of a new table, or maybe ALTERing an existing one. But this process is not a part of the functionality of your application. i.e. it happens rarely, and is a migration/refactoring issue.

# 1也一样。如果您在应用程序中引入一个全新的实体,那么很可能它将封装语义和业务逻辑,因此必须编写新的代码。随着时间的推移,所有应用程序都会出现这种情况,当然,这就需要创建一个新的表,或者修改一个现有的表。但是这个过程不是应用程序功能的一部分。例如,它很少发生,并且是一个迁移/重构问题。

3) Admin adds properties "floors", "age", ..

3)管理员添加属性“楼层”、“年龄”、……

Why? Don't we know beforehand that a House has floors? That a User has a gender? Adding and removing, dynamically, this type of attributes is not a feature, but a design flaw. It is part of the analysis/design phase to identify your entities and their respective properties.

为什么?难道我们不知道房子有地板吗?用户有性别吗?动态地添加和删除这类属性不是特性,而是设计缺陷。它是分析/设计阶段的一部分,用于标识您的实体及其各自的属性。

4) Privileged user adds some houses.

4)特权用户增加一些房子。

Yes, he is adding an instance(row) to the existing entity(table) House.

是的,他正在向现有实体(表)库添加实例(行)。

5) User searches for all houses with at least five floors cheaper than 100 $

5)用户搜索所有至少五层楼低于100美元的房子

A perfectly valid query which can be achieved with either SQL or NoSQL solution. In django it would be something along those lines:

一个完全有效的查询,可以用SQL或NoSQL解决方案实现。在《被解救的姜戈》中,应该是这样的:

House.objects.filter(floors__gte=5, price__lt=100)

provided that House has the attributes floors and price. But if you need to do text-based queries, then neither SQL nor NoSQL will be satisfying enough. Because you don't want to implement faceting or stemming on your own! You will use some of the already discussed solutions(Solr, ElasticSearch, etc).

只要房子有属性、楼层和价格。但是,如果您需要执行基于文本的查询,那么SQL和NoSQL都不够令人满意。因为你不想自己去实现它。您将使用一些已经讨论过的解决方案(Solr、ElasticSearch等)。

Some more general notes:

一些更一般的笔记:

The examples you gave about Houses, Users and their properties, do not warrant any dynamic schema. Maybe you simplified your example just to make your point, but you talk about adding/removing Entities(tables) as if they are rows in a db. Entities are supposed to be a big deal in an application. They define the purpose of your application and its functionality. As such, they can't change every minute.

您所给出的关于房屋、用户及其属性的示例不支持任何动态模式。也许你简化了你的例子来说明你的观点,但是你谈论的是添加/删除实体(表),就好像它们是db中的行。实体在应用程序中应该是很重要的。它们定义了应用程序的目的及其功能。因此,他们不能每一分钟都改变。

Also you said:

你也说:

The idea is that all instances ("rows") of a certain entity ("table") share the same set of properties/attributes ("columns"). However, it will be perfectly valid if certain instances have an empty value for certain properties/attributes.

This seems like a common case where an attribute has null=True.

这似乎是一个属性为null=True的常见情况。

And as a final note, I would like to suggest to you to try both approaches(SQL and NoSQL), since it doesn't seem like your career depends on this project. It will be a beneficiary experience, as you will understand first-hand, the cons and pros of each approach. Or even how to "blend" these approaches together.

最后,我建议您尝试这两种方法(SQL和NoSQL),因为您的职业生涯似乎并不取决于这个项目。这将是一个受益的经验,因为您将了解第一手,缺点和优点的每种方法。甚至是如何“混合”这些方法。

#3


6  

What you're asking about is a common requirement in many systems -- how to extend a core data model to handle user-defined data. That's a popular requirement for packaged software (where it is typically handled one way) and open-source software (where it is handled another way).

您所询问的是许多系统中的一个常见需求——如何扩展核心数据模型来处理用户定义的数据。这是打包软件(通常以一种方式处理)和开源软件(以另一种方式处理)的普遍需求。

The earlier advice to learn more about RDBMS design generally can't hurt. What I will add to that is, don't fall into the trap of re-implementing a relational database in your own application-specific data model! I have seen this done many times, usually in packaged software. Not wanting to expose the core data model (or permission to alter it) to end users, the developer creates a generic data structure and an app interface that allows the end user to define entities, fields etc. but not using the RDBMS facilities. That's usually a mistake because it's hard to be nearly as thorough or bug-free as what a seasoned RDBMS can just do for you, and it can take a lot of time. It's tempting but IMHO not a good idea.

早期关于RDBMS设计的建议通常不会有什么坏处。我要补充的是,不要陷入在您自己的特定于应用程序的数据模型中重新实现关系数据库的陷阱!我已经见过很多次了,通常是在打包的软件中。开发人员不希望向最终用户公开核心数据模型(或者允许修改),而是创建了一个通用的数据结构和一个应用程序接口,允许最终用户定义实体、字段等,而不是使用RDBMS设施。这通常是一个错误,因为很难像经验丰富的RDBMS为您所做的那样彻底或无bug,而且它可能需要大量的时间。这主意不错,但我觉得不是个好主意。

Assuming the data model changes are global (shared by all users once admin has made them), the way I would approach this problem would be to create an app interface to sit between the admin user and the RDBMS, and apply whatever rules you need to apply to the data model changes, but then pass the final changes to the RDBMS. So for example, you may have rules that say entity names need to follow a certain format, new entities are allowed to have foreign keys to existing tables but must always use the DELETE CASCADE rule, fields can only be of certain data types, all fields must have default values etc. You could have a very simple screen asking the user to provide entity name, field names & defaults etc. and then generate the SQL code (inclusive of all your rules) to make these changes to your database.

假设数据模型变化是全球(由所有用户共享一次管理取得了),我会解决这个问题的方法是创建一个应用程序接口来坐admin用户和RDBMS之间,并应用任何规定您需要适用于数据模型的变化,然后通过最后更改RDBMS。举个例子,你可能规则说实体名称需要遵循一定的格式,新实体允许现有表的外键,但必须始终使用级联删除规则,某些数据类型的字段只能,所有字段必须有默认值等等。你可以有一个非常简单的屏幕要求用户提供实体名称,字段名称和默认值等,然后生成SQL代码(包括你所有的规则),使这些更改您的数据库。

Some common rules & how you would address them would be things like:

一些常见的规则&你会如何处理它们:

-- if a field is not null and has a default value, and there are already existing records in the table before that field was added by the admin, update existing records to have the default value while creating the field (multiple steps -- add field allowing null; update all existing records; alter the table to enforce not null w/ default) -- otherwise you wouldn't be able to use a field-level integrity rule)

——如果一个字段不是null,并且具有默认值,并且在admin添加该字段之前,表中已经有记录,那么在创建字段时更新现有记录,使其具有默认值(多个步骤——添加允许null的字段;更新所有现有记录;修改表以强制使用not null w/ default)——否则您将无法使用字段级的完整性规则)

-- new tables must have a distinct naming pattern so you can continue to distinguish your core data model from the user-extended data model, i.e. core and user-defined have different RDBMS owners (dbo. vs. user.) or prefixes (none for core, __ for user-defined) or somesuch.

——新表必须具有不同的命名模式,以便您能够继续将核心数据模型与用户扩展数据模型区分开来,即core和用户定义的RDBMS所有者不同(dbo)。或前缀(没有用于核心的,用于用户定义的)或somesuch。

-- it is OK to add fields to tables that are in the core data model (as long as they tolerate nulls or have a default), and it is OK for admin to delete fields that admin added to core data model tables, but admin cannot delete fields that were defined as part of the core data model.

——它可以将字段添加到表的核心数据模型(只要他们容忍null或有一个默认),和管理员可以删除字段管理核心数据模型添加到表,但管理员不能删除字段,定义为核心数据模型的一部分。

In other words -- use the power of the RDBMS to define the tables and manage the data, but in order to ensure whatever conventions or rules you need will always be applied, do this by building an app-to-DB admin function, instead of giving the admin user direct DB access.

换句话说,使用RDBMS的功能来定义表和管理数据,但是为了确保您所需要的任何约定或规则总是被应用,通过构建一个appto -DB的管理函数来实现这一点,而不是给admin用户直接的DB访问。

If you really wanted to do this via the DB layer only, you could probably achieve the same by creating a bunch of stored procedures and triggers that would implement the same logic (and who knows, maybe you would do that anyway for your app). That's probably more of a question of how comfortable are your admin users working in the DB tier vs. via an intermediary app.

如果您真的想通过DB层来实现这一点,那么您可以通过创建一堆存储过程和触发器来实现相同的逻辑(谁知道呢,也许您可以为您的应用程序这样做)。这可能更多的是一个问题,即您的管理员用户在DB层与通过中介应用程序工作时的舒适度。


So to answer your questions directly:

所以直接回答你的问题:

(1) Yes, add tables and columns at run time, but think about the rules you will need to have to ensure your app can work even once user-defined data is added, and choose a way to enforce those rules (via app or via DB / stored procs or whatever) when you process the table & field changes.

(1)是的,在运行时添加表和列,但考虑规则需要确保应用程序能够工作即使添加用户定义的数据,并选择一种方式来执行这些规则(通过应用程序或通过DB /存储效果之类的)当你表和字段的变化过程。

(2) This issue isn't strongly affected by your choice of SQL vs. NoSQL engine. In every case, you have a core data model and an extended data model. If you can design your app to respond to a dynamic data model (e.g. add new fields to screens when fields are added to a DB table or whatever) then your app will respond nicely to changes in both the core and user-defined data model. That's an interesting challenge but not much affected by choice of DB implementation style.

(2)这个问题不受SQL和NoSQL引擎的选择的影响。在每种情况下,都有一个核心数据模型和一个扩展的数据模型。如果你可以设计你的应用程序来响应一个动态数据模型(例如,当字段被添加到DB表或其他东西时,在屏幕上添加新的字段),那么你的应用程序将会很好地响应core和用户定义的数据模型的变化。这是一个有趣的挑战,但不会受到选择DB实现样式的影响。

Good luck!

好运!

#4


4  

Maybe doesn't matter the persistence engine of your model objects (RDBMS, NoSQL, etc...). The technology you're looking for is an index to search for and find your objects.

也许不影响模型对象的持久性引擎(RDBMS、NoSQL等等…)。您正在寻找的技术是一个索引来搜索和找到您的对象。

I think you need to find your objects using their schema. So, if the schema is defined dynamically and persisted on a database you can build dynamic search forms, etc.. Some kind of reference of the entity and attributes to the real objects is needed.

我认为您需要使用它们的模式找到您的对象。因此,如果模式是动态定义的,并且持久化到数据库中,您可以构建动态搜索表单,等等。需要某种实体和属性对真实对象的引用。

Give a look to the Entity-Attribute-Model pattern (EAV). This can be implemented over SQLAlchemy to use an RDBMS database as mean to store vertically your schema and data and relate thems.

看看实体-属性-模型模式(EAV)。这可以通过SQLAlchemy实现,以使用RDBMS数据库作为垂直存储模式和数据并关联thems的手段。

You're entering in the field of the Semantic Web Programming, maybe you should read at less the first chapter of this book:

你正在进入语义Web编程领域,也许你应该读一下这本书的第一章:

Programming The Semantic Web

编程语义网

it tells the whole story of your problem: from rigid schemas to dynamic schemas, implemented first as a key-value store and later improved to a graph persistence over a relational model.

它讲述了您的问题的整个过程:从严格的模式到动态模式,首先实现为键值存储,然后改进为关系模型上的图形持久性。

My opinion is that the best implementations of this could be achieved nowadays with graph databases and a very good example of current implementations are Berkeley DBs (some LDAP implementations use Berkeley DBs as a tech implementation to this indexing problem.)

我的观点是,现在最好的实现是通过图形数据库实现的,当前实现的一个很好的例子是Berkeley DBs(一些LDAP实现使用Berkeley DBs作为这个索引问题的技术实现)。

Once in a graph model you could do some kind of "inferences" on the graph, making appear the DB with somekind of "intelligence". An example of this is stated on the book.

一旦在一个图模型中,你可以在图上做一些“推断”,使DB具有某种“智能”。书中有一个例子。