Store the post body in a database or in a file?

Time: 2022-10-17 23:21:09

I'm learning web-centric programming by writing myself a blog, using PHP with a MySQL database backend. This should replace my current (Drupal based) blog.

I've decided that a post should contain some data: id, userID, title, content, time-posted. That makes a nice schema for a database table. I'm having issues deciding how I want to organize the storage of content, though.

I could either:

  1. Use a file-based system. The content column of the database table would then hold a URL to a locally-located file, which I'd read, format, and display.
  2. Store the entire contents of the post in content, i.e. put it into the database.

If I went with (1), searching the contents of posts would be slightly problematic: I'd be limited to metadata searching, or I'd have to read the contents of each file when searching (although I don't know how much of a problem that would be; grep -ir "string" . isn't too slow...). However, images (if any) would also be referenced by a URL, so referencing content that way would at least be an internally consistent methodology, and I could easily reuse the content, since text files are ridiculously easy to work with compared to an SQL database file.

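As a rough illustration of what "read the contents of each file when searching" would involve, here is a minimal sketch in Python (chosen only for brevity; the blog itself would be PHP). The `.txt` extension, the function names, and the sample file names are assumptions made for the example, not part of the question.

```python
import tempfile
from pathlib import Path


def search_posts(root, needle):
    """Return the files under root whose text contains needle, ignoring case.

    This is roughly what `grep -ir` does: every file is opened and read on
    every search, so the cost grows with the total size of all posts.
    """
    needle = needle.lower()
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text.lower():
            hits.append(path)
    return hits


def demo_search():
    """Create two hypothetical post files and search them."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "first-post.txt").write_text("Hello Drupal, goodbye.", encoding="utf-8")
        (root / "second-post.txt").write_text("Notes on PHP and MySQL.", encoding="utf-8")
        return [p.name for p in search_posts(root, "drupal")]


if __name__ == "__main__":
    print(demo_search())
```

For a personal blog with a few hundred posts a full scan like this is probably fast enough; it is the lack of an index, not the code, that would eventually hurt.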

Going with (2), though, I could use a longtext column. The content would then need to be sanitised before I tried to put it into the row, and I'm limited by size (although it's unlikely that I'd write a 4GB blog post ;). Searching would be easy.

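A minimal sketch of option (2), using Python's standard-library sqlite3 as a stand-in for PHP with MySQL (an assumption made so the example is self-contained). The table and column names follow the fields listed in the question; the function names are hypothetical. The point it tries to show is that with bound parameters the driver handles quoting, so the "sanitising" needed for storage is mostly a matter of never building SQL by string concatenation.

```python
import sqlite3


def open_blog(path=":memory:"):
    """Open the database and create the posts table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS posts (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            title TEXT NOT NULL,
            content TEXT NOT NULL,
            time_posted TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def add_post(conn, user_id, title, content):
    """Insert a post using bound parameters; returns the new row id."""
    cur = conn.execute(
        "INSERT INTO posts (user_id, title, content) VALUES (?, ?, ?)",
        (user_id, title, content),
    )
    conn.commit()
    return cur.lastrowid


def search_posts_db(conn, needle):
    """Return (id, title) of posts whose content contains needle."""
    # Escape LIKE wildcards so the needle is matched literally.
    escaped = needle.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT id, title FROM posts WHERE content LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    )
    return rows.fetchall()
```

In PHP the equivalent would be prepared statements through PDO or mysqli. Escaping for HTML output is a separate step that belongs at display time, not at insert time.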

I don't (currently) see which way would be (a) easier to implement, (b) easier to live with.

Which way should I go / how is this normally done? Any further pros / cons for either (1) or (2) would be appreciated.

2 solutions

#1


4  

For the 'current generation', using a database is pretty much your safest bet. As you mentioned, it's pretty standard, and you outlined all of the fun stuff. Most SQL databases offer a fairly powerful FULLTEXT (or equivalent) search. You'll probably have about as much architecture to write for either of the two options you outlined, especially if you want one to have feature parity with the other.

The up-and-coming technology is the key/value or document store, commonly referred to as NoSQL. With this, you can store your content and metadata in separate individual documents, but in a structured way that makes searching and retrieval quite fast. Some common NoSQL engines are MongoDB, CouchDB, and Redis (among others).

Ultimately this comes down to personal preference, along with a few use-case considerations. You didn't really outline what is important to you in terms of convenience and your application. Any one of these would be just fine for a personal or development blog. Building an entire platform with multiple contributors is a different conversation.

#2


1  

13 years ago I tried your option 1 (external files for text content), not with a blog but with a CMS, and I ended up shoveling it all back into the database for easier handling. It's much easier to do global replaces in the database than at the text-file level. With large numbers of posts you run into trouble with directory sizes and access speed, or you have to manage subdirectory schemes, etc. Stick to the database-only approach. Some tools do make working with text files easier than the built-in mysql functions, but with command-line clients like mysql and mysqldump you can easily extract any text to the file system, work on it with standard tools, and re-load it into the database. What mysql really lacks is built-in support for regex search/replace, but even for that you'll find a patch if you're willing to recompile mysql.

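The "global replace in the database" and "extract to the file system" points from this answer can be sketched as follows. This again uses Python's sqlite3 as a stand-in for MySQL (an assumption); the function names and example URLs are hypothetical. REPLACE() and INSTR() exist in both SQLite and MySQL, although case sensitivity in MySQL depends on the column's collation.

```python
import sqlite3
import tempfile
from pathlib import Path


def global_replace(conn, old, new):
    """Replace old with new in every post body; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE posts SET content = REPLACE(content, ?, ?) WHERE INSTR(content, ?) > 0",
        (old, new, old),
    )
    conn.commit()
    return cur.rowcount


def export_posts(conn, out_dir):
    """Write each post body to <out_dir>/<id>.txt so standard text tools can be used."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for post_id, content in conn.execute("SELECT id, content FROM posts ORDER BY id"):
        path = out / f"{post_id}.txt"
        path.write_text(content, encoding="utf-8")
        written.append(path)
    return written


def demo_replace():
    """Run a literal replace over three sample posts, then export them to files."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO posts (content) VALUES (?)",
        [
            ("See http://old.example/a",),
            ("Nothing to change",),
            ("http://old.example/b and http://old.example/c",),
        ],
    )
    changed = global_replace(conn, "http://old.example", "https://new.example")
    with tempfile.TemporaryDirectory() as tmp:
        files = export_posts(conn, tmp)
        texts = [f.read_text(encoding="utf-8") for f in files]
    return changed, texts
```

A single UPDATE touches every matching post in one statement, which is the convenience the answer describes; the export step shows that keeping posts in the database does not prevent working on them as plain files when that is easier.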
