Memory usage of files vs. a database for simple data storage

Time: 2022-01-26 04:03:45


I'm writing the server for a JavaScript app that has a syncing feature. Files and directories created and modified by the client need to be synced to the server (every change made on the client, including deletes, must be replicated on the server).

Since every file is stored on the server, I'm debating whether I need a MySQL database entry for each file. The following information has to be kept for each file/directory of every user:

  1. Whether it was deleted or not (since deletes need to be synced to other clients)

  2. The timestamp of when the file was last modified (so I know whether the client needs to update the file)
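
The two fields above amount to a small per-file record plus a comparison rule. Below is a minimal Python sketch of that rule. The FileMeta class, the sync_action function, the action names and the use of Unix timestamps are all hypothetical. Conflict handling is deliberately left out, for example a file edited on one client after it was deleted on another.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileMeta:
    """Hypothetical per-file record holding the two fields from the question."""
    path: str
    modified: float  # last-modified time, assumed to be a Unix timestamp
    deleted: bool = False


def sync_action(server: Optional[FileMeta], client_mtime: Optional[float]) -> str:
    """Return what should happen to one path (hypothetical policy).

    server: the server's record, or None if the server has never seen the path.
    client_mtime: the client's last-modified time, or None if the client lacks it.
    """
    if server is None:
        # Unknown to the server: a new client file must be uploaded.
        return "upload" if client_mtime is not None else "nothing"
    if server.deleted:
        # Deletes must reach every client, so remove any remaining copy.
        return "delete" if client_mtime is not None else "nothing"
    if client_mtime is None or client_mtime < server.modified:
        return "download"  # the server copy is newer (or the client lacks it)
    if client_mtime > server.modified:
        return "upload"  # the client copy is newer
    return "nothing"  # timestamps match
```

For example, sync_action(FileMeta("notes.txt", 10.0), 5.0) returns "download".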

I could keep both pieces of information either in files (e.g. a .deleted file and a .modified file in every user's directory, both listing file paths, with the .modified file also storing a timestamp for each path) or in the database.
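
A sketch of the file-based option. It assumes .modified holds one "path<TAB>timestamp" line per change and .deleted holds one path per line. Both formats are assumptions, and paths containing tabs or newlines would break them. All function names are hypothetical.

```python
import os
from typing import Dict, Set

MODIFIED = ".modified"  # assumed format: one "path<TAB>timestamp" line per change
DELETED = ".deleted"    # assumed format: one path per line


def record_modified(user_dir: str, path: str, timestamp: float) -> None:
    # Append-only: a later line for the same path supersedes earlier ones.
    with open(os.path.join(user_dir, MODIFIED), "a", encoding="utf-8") as f:
        f.write(f"{path}\t{timestamp}\n")


def record_deleted(user_dir: str, path: str) -> None:
    with open(os.path.join(user_dir, DELETED), "a", encoding="utf-8") as f:
        f.write(path + "\n")


def load_modified(user_dir: str) -> Dict[str, float]:
    # Reads the whole file and keeps the latest timestamp per path.
    result: Dict[str, float] = {}
    p = os.path.join(user_dir, MODIFIED)
    if os.path.exists(p):
        with open(p, encoding="utf-8") as f:
            for line in f:
                path, ts = line.rstrip("\n").rsplit("\t", 1)
                result[path] = float(ts)
    return result


def load_deleted(user_dir: str) -> Set[str]:
    p = os.path.join(user_dir, DELETED)
    if not os.path.exists(p):
        return set()
    with open(p, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f if line.strip()}
```

Appending keeps writes cheap. However, load_modified has to read the whole file and build a dict in memory, so memory use grows with the number of entries per user. This sketch also never prunes .deleted, so a path that is recreated after deletion would need extra handling.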

However, I also have to stay within an 80 MB memory limit. Between file storage and database storage, which would be more memory-efficient for this purpose?

Edit: Files have to be stored on the filesystem (not in a database), and users have a quota for the storage space they can use.

3 solutions

#1


1  

The filesystem variant will probably be more memory-efficient as long as the number of files is low, but that solution probably won't scale. Databases are optimized for exactly this kind of lookup. Searching the filesystem, opening the file and searching its contents all become expensive as the number of files and requests grows.
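
This sketch illustrates the scaling point. In the flat-file variant, finding one entry means reading through the whole metadata file, whereas a database can use an index. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that the example is self-contained, which is an assumption. The file_meta table and the function names are hypothetical.

```python
import sqlite3
from typing import Iterable, Optional


def scan_lookup(lines: Iterable[str], path: str) -> Optional[float]:
    """Filesystem-style lookup: walk every 'path<TAB>timestamp' line, O(n)."""
    found = None
    for line in lines:
        p, ts = line.rstrip("\n").rsplit("\t", 1)
        if p == path:
            found = float(ts)  # keep scanning: later lines supersede earlier ones
    return found


def make_db() -> sqlite3.Connection:
    """Database-style store; the composite primary key is backed by an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_meta (user_id INTEGER NOT NULL, path TEXT NOT NULL,"
        " modified REAL NOT NULL, deleted INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (user_id, path))")
    return conn


def db_lookup(conn: sqlite3.Connection, user_id: int, path: str) -> Optional[float]:
    # The WHERE clause matches the primary key, so SQLite can use the index
    # instead of reading every row.
    row = conn.execute(
        "SELECT modified FROM file_meta WHERE user_id = ? AND path = ?",
        (user_id, path)).fetchone()
    return None if row is None else row[0]
```

scan_lookup can iterate an open file object line by line. It therefore needs little memory, but its time grows linearly with the file. The database lookup trades some extra storage for index structures. Running EXPLAIN QUERY PLAN on the SELECT should show SQLite searching via the primary-key index rather than scanning the whole table.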

But nobody says you have to use MySQL. A NoSQL database like Redis, or something like CouchDB (where you could store the file itself and get versioning), might be a more attractive solution.

Here is a quick comparison of NoSQL databases, and a longer comparison.

Edit: Based on your comments, I would build it as follows: create an API that abstracts the backend for all the operations you need. Then implement the 2 or 3 operations that happen most often, or that could be the most expensive, for the filesystem and for a database (or two). Test and benchmark.
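
A minimal sketch of that plan, consisting of one abstract API, a filesystem backend, a database backend and a timing harness. Everything here is illustrative and not part of the original answer. MetaStore, JsonFileStore, SqliteStore and benchmark are hypothetical names. The one-JSON-file-per-user layout is an assumption, and sqlite3 stands in for MySQL.

```python
import json
import os
import sqlite3
import time
from abc import ABC, abstractmethod
from typing import Optional, Tuple

Meta = Tuple[float, bool]  # (last-modified timestamp, deleted flag)


class MetaStore(ABC):
    """Hypothetical backend-agnostic API for the sync metadata."""

    @abstractmethod
    def put(self, user: str, path: str, modified: float, deleted: bool = False) -> None: ...

    @abstractmethod
    def get(self, user: str, path: str) -> Optional[Meta]: ...


class JsonFileStore(MetaStore):
    """Filesystem backend: one JSON file per user (assumed layout)."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _file(self, user: str) -> str:
        return os.path.join(self.root, f"{user}.json")

    def _load(self, user: str) -> dict:
        try:
            with open(self._file(user), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def put(self, user, path, modified, deleted=False):
        data = self._load(user)  # read-modify-write: cost grows with the file
        data[path] = [modified, deleted]
        with open(self._file(user), "w", encoding="utf-8") as f:
            json.dump(data, f)

    def get(self, user, path):
        entry = self._load(user).get(path)
        return None if entry is None else (entry[0], entry[1])


class SqliteStore(MetaStore):
    """Database backend; sqlite3 stands in for MySQL in this sketch."""

    def __init__(self, db_path: str) -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS meta (user_id TEXT, path TEXT, modified REAL,"
            " deleted INTEGER, PRIMARY KEY (user_id, path))")

    def put(self, user, path, modified, deleted=False):
        self.conn.execute("INSERT OR REPLACE INTO meta VALUES (?, ?, ?, ?)",
                          (user, path, modified, int(deleted)))
        self.conn.commit()

    def get(self, user, path):
        row = self.conn.execute(
            "SELECT modified, deleted FROM meta WHERE user_id = ? AND path = ?",
            (user, path)).fetchone()
        return None if row is None else (row[0], bool(row[1]))


def benchmark(store: MetaStore, n: int = 1000) -> float:
    """Time n writes followed by n reads for one user; returns seconds."""
    start = time.perf_counter()
    for i in range(n):
        store.put("alice", f"dir/file{i}.txt", float(i))
    for i in range(n):
        store.get("alice", f"dir/file{i}.txt")
    return time.perf_counter() - start
```

Call benchmark with the same n on each backend and compare the timings. Trying a MySQL or NoSQL backend only requires writing another MetaStore subclass. Note that SqliteStore commits after every write, which makes the per-write cost visible but is slow on disk. Batching commits would change the results.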

#2


0  

I'd go for one of the NoSQL databases. You can store file contents and use keys based on the user's ID to retrieve those contents when you need them. Redis or Cassandra can be good choices for this case. There are many libraries for using these databases from Python as well as from many other languages.
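
A sketch of the key-per-user idea, using a plain dict as a stand-in for a key-value store such as Redis. The user:<id>:file:<path> key scheme, the KeyValueStandIn class and its methods are hypothetical.

```python
from typing import Dict, Iterator, Optional, Tuple


def file_key(user_id: int, path: str) -> str:
    # Hypothetical key scheme: every entry is namespaced by the owner's ID,
    # so one user's files can be found without touching other users' data.
    return f"user:{user_id}:file:{path}"


class KeyValueStandIn:
    """In-memory dict standing in for a key-value store such as Redis."""

    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, str]] = {}

    def put(self, user_id: int, path: str, content: str, modified: float) -> None:
        self.data[file_key(user_id, path)] = {"content": content, "modified": str(modified)}

    def get(self, user_id: int, path: str) -> Optional[Dict[str, str]]:
        return self.data.get(file_key(user_id, path))

    def files_of(self, user_id: int) -> Iterator[Tuple[str, Dict[str, str]]]:
        # Prefix scan over all keys; a real store would use its own scan facility.
        prefix = file_key(user_id, "")
        for key, value in self.data.items():
            if key.startswith(prefix):
                yield key[len(prefix):], value
```

The ":file:" segment after the ID keeps user 1's prefix from matching user 10's keys. In Redis, each such key could hold a hash (set with the HSET command) containing the same fields.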

#3


0  

In my opinion, the only reliable way to be sure is to build a test system and compare the space requirements. Generating some random data programmatically shouldn't take long. One might think the filesystem would be more efficient, but some databases may compress or deduplicate the data. Don't forget that a database would also make it easier to implement new features, such as access control.
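
A sketch of such a test system. It generates random metadata and writes it twice: once in the .modified/.deleted file layout from the question and once into an SQLite file. It then compares the on-disk sizes. SQLite stands in for MySQL. The entry count, the path shapes and the 10% deletion rate are arbitrary assumptions. This measures disk space only; the RAM used by a running database server has to be measured separately on the server itself.

```python
import os
import random
import sqlite3
import string
import tempfile


def random_entries(n: int, seed: int = 0):
    """Generate n synthetic (path, modified, deleted) tuples with unique paths."""
    rng = random.Random(seed)
    entries = []
    for i in range(n):
        name = "".join(rng.choices(string.ascii_lowercase, k=12))
        path = f"dir{rng.randrange(50)}/{name}_{i}.txt"
        entries.append((path, rng.uniform(1.6e9, 1.7e9), rng.random() < 0.1))
    return entries


def flat_file_size(entries, directory: str) -> int:
    """Write .modified/.deleted style files and return their total size in bytes."""
    mod = os.path.join(directory, ".modified")
    dele = os.path.join(directory, ".deleted")
    with open(mod, "w", encoding="utf-8") as fm, open(dele, "w", encoding="utf-8") as fd:
        for path, ts, deleted in entries:
            fm.write(f"{path}\t{ts}\n")
            if deleted:
                fd.write(path + "\n")
    return os.path.getsize(mod) + os.path.getsize(dele)


def sqlite_size(entries, directory: str) -> int:
    """Store the same entries in an SQLite file and return its size in bytes."""
    db = os.path.join(directory, "meta.db")
    conn = sqlite3.connect(db)
    conn.execute("CREATE TABLE meta (path TEXT PRIMARY KEY, modified REAL, deleted INTEGER)")
    conn.executemany("INSERT INTO meta VALUES (?, ?, ?)",
                     [(p, ts, int(d)) for p, ts, d in entries])
    conn.commit()
    conn.close()
    return os.path.getsize(db)


def compare(n: int = 10_000) -> dict:
    entries = random_entries(n)
    with tempfile.TemporaryDirectory() as d:
        return {"flat_files": flat_file_size(entries, d), "sqlite": sqlite_size(entries, d)}


if __name__ == "__main__":
    print(compare())
```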
