在blob中存储数据与将指针存储到文件之间有什么区别?

时间:2021-09-13 17:13:03

I have a question about the blob data type in MySQL.

我有一个关于MySQL中的blob数据类型的问题。

I read that the data type can be used to store files. I also read that an alternative is to store the file on disk and include a pointer to its location in the database (via a varchar column).

我读到数据类型可以用来存储文件。我还看到另一种方法是将文件存储在磁盘上,并包含指向其在数据库中的位置的指针(通过varchar列)。

But I'm a little confused because I've read that blob fields are not stored in-row and require a separate look-up to retrieve its contents. So is that any different than storing a pointer to a file on the file system?

但我有点困惑,因为我读过blob字段不存储在行中,需要单独查找以检索其内容。这与在文件系统中存储指向文件的指针有什么不同吗?

5 个解决方案

#1


12  

I read that the data type can be used to store files.

我读到数据类型可以用来存储文件。

According to MySQL manual page on Blob, A BLOB is a binary large object that can hold a variable amount of data.

根据Blob上的MySQL手册页,Blob是一个二进制大对象,可以容纳可变数量的数据。

Since it's a data type specific to store binary data it's common to use it to store files in binary format, being storing image files a very common use on web applications.

由于它是一种特定于存储二进制数据的数据类型,因此使用它以二进制格式存储文件是很常见的,存储图像文件在web应用程序中非常常见。

For web applications this would mean that you would first need to convert your file into binary format and then store it, and every time you need to retrieve your file you would need to do the reverse process of converting them back to it's original format.

对于web应用程序,这意味着您首先需要将文件转换为二进制格式,然后将其存储起来,每次需要检索文件时,您都需要执行反向过程,将其转换为原始格式。

Besides that, storing large amount of data in your db MAY slow it down. Specially in systems that are not dedicated only to host a database.

此外,在数据库中存储大量数据可能会降低速度。特别是在不专门用于承载数据库的系统中。

I also read that an alternative is to store the file on disk and include a pointer to its location in the database

我还看到另一种方法是将文件存储在磁盘上,并包含指向其在数据库中的位置的指针

Bearing in mind all above considerations a common practice for web applications is to store your files elsewhere than your MySQL and then simply store it's path on your database. This approach MAY speed up your database when dealing with large amount of data.

考虑到以上所有考虑,web应用程序的一个常见实践是将文件存储在MySQL之外的其他地方,然后在数据库中简单地存储它的路径。当处理大量数据时,这种方法可能会加快数据库的速度。

But I'm a little confused because I've read that blob fields are not stored in-row and require a separate look-up to retrieve its contents.

但我有点困惑,因为我读过blob字段不存储在行中,需要单独查找以检索其内容。

In fact that would depend on what storage engine you are using since every engine treats data and stores it in different ways. For the InnoDB engine, which is suited for relational database you may want to read this article from MySQL Performance blog on how the blob is stored in MySQL.

事实上,这将取决于您正在使用的存储引擎,因为每个引擎处理数据并以不同的方式存储它。对于适合关系数据库的InnoDB引擎,您可能想要阅读这篇关于blob在MySQL中如何存储的MySQL性能博客文章。

But in abstract, on MySQL 5 and forward the blob is stored as following:

但抽象地说,在MySQL 5和forward上,blob存储如下:

Innodb stores either whole blob on the row page or only 20 bytes BLOB pointer giving preference to smaller columns to be stored on the page, which is reasonable as you can store more of them.

Innodb可以将整个blob存储在行页面上,也可以只存储20字节的blob指针,以便优先选择存储在页面上的更小的列,这是合理的,因为可以存储更多的列。

So you are probably thinking now that the right way to go is to store them as separate file, but there are some advantages of using blob to store data, the first one (in my opinion) is the backup. I manage a small server and I had to create another subroutine only to copy my files stored as paths to another storage disk (We couldn't afford to buy a decent tape backup system). If I had designed my application to use blobs a simple mysqldump would be everything that I needed to backup my whole database.

因此,您可能认为正确的方法是将它们存储为单独的文件,但是使用blob存储数据有一些优点,第一个(在我看来)是备份。我管理一个小服务器,我不得不创建另一个子程序,只为了将存储的文件作为路径复制到另一个存储磁盘(我们买不起一个像样的磁带备份系统)。如果我设计我的应用程序使用blobs,一个简单的mysqldump将是我备份整个数据库所需的一切。

The advantage of storing blobs for backups are better discussed on this post where the person who answered had a similar problem than mine.

为备份存储blobs的好处在这篇文章中得到了更好的讨论,在这篇文章中,回答这个问题的人遇到了类似的问题。

Another advantage is security and the easiness of managing permission and access. All the data inside your MySQL server is password protected and you can easily manage permissions for your users about who access what and who doesn't.

另一个优点是安全性和易于管理权限和访问。MySQL服务器内的所有数据都是受密码保护的,您可以轻松地管理用户访问什么和不访问什么的权限。

In a application which relies on MySQL privileges system for authentication and use. It's certain a plus since it would be a little harder for let's say an invader to retrieve an image (or a binary file like a zipped one) from your disk or an user without access privileges to access it.

在应用程序中,使用MySQL特权系统进行身份验证和使用。这肯定是一个优点,因为对于一个入侵者来说,从磁盘或没有访问权限的用户检索图像(或一个压缩文件)会有点困难。

So I'd say that

所以我说

If you gonna manage your MySQL and all the data you have in it and must do regular backups or intend to change or even consider a future change of OS, and have a decent hardware and optimized your MySQL to it, go for BLOB.

如果你要管理你的MySQL和所有的数据,必须定期做备份,或者想要改变,甚至考虑未来的操作系统的变化,并且有一个不错的硬件,并优化你的MySQL,去做BLOB。

If you will not manage your MySQL (as in a web host for example) and doesn't intend to change OS or make backups, stick with varchar columns pointing to your files.

如果您不管理MySQL(例如在web主机中),并且不打算更改OS或进行备份,请使用指向文件的varchar列。

I hope it helped. Cheers

我希望它有帮助。干杯

#2


9  

If you store data is BLOB field, you are making it part of your object abstraction.

如果您存储的数据是BLOB字段,那么它就是对象抽象的一部分。

BLOB advantages:

团优势:

  1. Should you want to remove row with BLOB, or remove it as part of master/slave table relationship or maybe the whole table hierarchy, your BLOB is handled automatically and has same lifetime as any other object in database.

    如果您希望使用BLOB删除行,或者作为主/从表关系的一部分或整个表层次结构删除行,那么您的BLOB将自动处理,并且与数据库中的任何其他对象具有相同的生命周期。

  2. Your scripts do not have a need to access anything but database to get everything they require. In many situations, having direct file access open whole can of worms on how to bypass access or security restrictions. For example, with file access, they may have to mount filesystems which contain actual files. But with BLOB in database, you only have to be able to connect to database, no matter where you are.

    您的脚本不需要访问任何东西,只需要访问数据库就可以获得所需的一切。在许多情况下,拥有直接的文件访问开放整个蠕虫如何绕过访问或安全限制。例如,对于文件访问,他们可能必须装入包含实际文件的文件系统。但是在数据库中使用BLOB,您只需能够连接到数据库,无论您在哪里。

  3. If you store it in file and file is replaced, removed or no longer accessible, your database would never know - in effect, you cannot guarantee integrity. Also, it is difficult to reliably support multiple versions when using files. If you use and depend on transactions, it becomes almost impossible.

    如果您将它存储在文件中,而文件被替换、删除或不再访问,您的数据库将永远不会知道——实际上,您不能保证完整性。此外,在使用文件时很难可靠地支持多个版本。如果您使用和依赖事务,那么几乎不可能实现。

File advantages:

文件的优势:

  1. Some databases handle BLOBs rather poorly. For example, while official BLOB limit in MySQL is 4GB, but in reality it is only 1MB in default configuration. You can increase this to 16-32MB by tweaking both client and server configuration to increase MySQL command buffer, but this has a lot of other implications in terms of performance and security.

    有些数据库处理BLOBs非常糟糕。例如,虽然MySQL中的正式BLOB限制是4GB,但实际上在默认配置中只有1MB。您可以通过调整客户端和服务器配置来增加MySQL命令缓冲区,将其增加到16-32MB,但是这在性能和安全性方面有很多其他的影响。

  2. Even if database does not have some weird size limits, it always will have some overhead in storing BLOB compared to just a file. Also, if BLOB is large, some databases do not provide interface to access blob piece by piece, or stream it, which can be large impediment for your workflow.

    即使数据库没有一些奇怪的大小限制,与仅仅存储一个文件相比,它在存储BLOB时总是会有一些开销。另外,如果BLOB很大,一些数据库不提供逐块访问BLOB的接口,也不提供流接口,这可能会对您的工作流造成很大的障碍。

In the end, it is up to you. I typically try to keep it in BLOB, unless this creates unreasonable performance problems.

最后,这取决于你。我通常尝试将其保持在BLOB中,除非这样会产生不合理的性能问题。

#3


4  

Yes, MySQL blobs that don't fit within the same page as a row get stored on overflow pages Note that some blobs are small enough that they're stored with the rest of the row, like any other column. The blob pages are not adjacent to the page their row is stored on, so they may result in extra I/O to read them.

是的,不适合与行在同一页中的MySQL blobs存储在溢出页中,请注意,有些blobs足够小,可以与其他列一起存储。blob页与它们所存储的行不相邻,因此可能会产生额外的I/O来读取它们。

On the other hand, just like with any other page type, blob pages can occupy memory in the InnoDB buffer pool, so reading the blobs subsequently is very fast even if they are on separate pages. Files can be cached by the operating system, but typically they're read from disk.

另一方面,就像任何其他页面类型一样,blob页面可以占用InnoDB缓冲池中的内存,因此即使在单独的页面上读取blobs也会非常快。文件可以由操作系统缓存,但通常是从磁盘读取的。

Here are a few other factors that may affect your decision:

以下是一些可能影响你的决定的其他因素:

  • Blobs are stored logically with a row. This means if you DELETE the row, the associated blob is deleted automatically. But if you store the blob outside the database, you end up with orphaned blob files after you delete rows from the database. You have to do manual steps to find and delete these files.

    Blobs逻辑地存储在一行中。这意味着如果您删除该行,将自动删除相关的blob。但是如果您将blob存储在数据库之外,那么在从数据库中删除行之后,您将得到孤立的blob文件。您必须执行手动步骤来查找和删除这些文件。

  • Blobs stored in the row also follow transaction semantics. For instance, a new blob or an updated blob is invisible to other transactions until you commit. You can also roll back a change. Storing blobs in files outside the database makes this a lot harder.

    存储在行的Blobs也遵循事务语义。例如,一个新的blob或更新的blob在提交之前对其他事务是不可见的。您还可以回滚更改。在数据库之外的文件中存储blobs会使这变得更加困难。

  • When you back up a database containing blobs, the database is a lot bigger of course, but when you backup, you get all the data and associated blobs in one step. If you store blobs externally, you have to back up the database and also back up the filesystem where you store blob files. If you need to ensure that the data and blobs are captured from one instant in time, you pretty much need to use some kind of filesystem snapshots.

    当您备份一个包含blobs的数据库时,数据库当然要大得多,但是当您备份时,您将在一个步骤中获得所有的数据和相关的blobs。如果在外部存储blobs,则必须备份数据库并备份存储blob文件的文件系统。如果您需要确保数据和blob是在某一时刻捕获的,那么您非常需要使用某种类型的文件系统快照。

  • If you use replication, the only automatic way of ensuring the blobs get copied to the replication slave automatically is to store blobs in the database.

    如果使用复制,确保将blob自动复制到复制从服务器的惟一自动方法是将blobs存储在数据库中。

#4


2  

The better approach is to store your file in the filesystem folder and point to their paths through a varchar field in the database. One of the drawbacks of saving files in the database is slowing it or reducing its performance.

更好的方法是将文件存储在文件系统文件夹中,并通过数据库中的varchar字段指向它们的路径。在数据库中保存文件的缺点之一是减慢它或降低它的性能。

#5


2  

Filesystem access will be faster than through the database. Blobs columns have some disadvantages in terms of indexing/sorting etc, which you could do with your filename column if you wished to in the future.

文件系统访问将比通过数据库更快。Blobs列在索引/排序等方面有一些缺点,如果您希望在将来使用,您可以使用您的filename列来做这些操作。

The database can also grow quickly with large blobs and then tasks like backing up become slower. I would go with a file location in database with the physical storage on the file system.

数据库也可以通过大的blob快速增长,然后像备份这样的任务变得更慢。我将使用数据库中的文件位置和文件系统中的物理存储。

#1


12  

I read that the data type can be used to store files.

我读到数据类型可以用来存储文件。

According to MySQL manual page on Blob, A BLOB is a binary large object that can hold a variable amount of data.

根据Blob上的MySQL手册页,Blob是一个二进制大对象,可以容纳可变数量的数据。

Since it's a data type specific to store binary data it's common to use it to store files in binary format, being storing image files a very common use on web applications.

由于它是一种特定于存储二进制数据的数据类型,因此使用它以二进制格式存储文件是很常见的,存储图像文件在web应用程序中非常常见。

For web applications this would mean that you would first need to convert your file into binary format and then store it, and every time you need to retrieve your file you would need to do the reverse process of converting them back to it's original format.

对于web应用程序,这意味着您首先需要将文件转换为二进制格式,然后将其存储起来,每次需要检索文件时,您都需要执行反向过程,将其转换为原始格式。

Besides that, storing large amount of data in your db MAY slow it down. Specially in systems that are not dedicated only to host a database.

此外,在数据库中存储大量数据可能会降低速度。特别是在不专门用于承载数据库的系统中。

I also read that an alternative is to store the file on disk and include a pointer to its location in the database

我还看到另一种方法是将文件存储在磁盘上,并包含指向其在数据库中的位置的指针

Bearing in mind all above considerations a common practice for web applications is to store your files elsewhere than your MySQL and then simply store it's path on your database. This approach MAY speed up your database when dealing with large amount of data.

考虑到以上所有考虑,web应用程序的一个常见实践是将文件存储在MySQL之外的其他地方,然后在数据库中简单地存储它的路径。当处理大量数据时,这种方法可能会加快数据库的速度。

But I'm a little confused because I've read that blob fields are not stored in-row and require a separate look-up to retrieve its contents.

但我有点困惑,因为我读过blob字段不存储在行中,需要单独查找以检索其内容。

In fact that would depend on what storage engine you are using since every engine treats data and stores it in different ways. For the InnoDB engine, which is suited for relational database you may want to read this article from MySQL Performance blog on how the blob is stored in MySQL.

事实上,这将取决于您正在使用的存储引擎,因为每个引擎处理数据并以不同的方式存储它。对于适合关系数据库的InnoDB引擎,您可能想要阅读这篇关于blob在MySQL中如何存储的MySQL性能博客文章。

But in abstract, on MySQL 5 and forward the blob is stored as following:

但抽象地说,在MySQL 5和forward上,blob存储如下:

Innodb stores either whole blob on the row page or only 20 bytes BLOB pointer giving preference to smaller columns to be stored on the page, which is reasonable as you can store more of them.

Innodb可以将整个blob存储在行页面上,也可以只存储20字节的blob指针,以便优先选择存储在页面上的更小的列,这是合理的,因为可以存储更多的列。

So you are probably thinking now that the right way to go is to store them as separate file, but there are some advantages of using blob to store data, the first one (in my opinion) is the backup. I manage a small server and I had to create another subroutine only to copy my files stored as paths to another storage disk (We couldn't afford to buy a decent tape backup system). If I had designed my application to use blobs a simple mysqldump would be everything that I needed to backup my whole database.

因此,您可能认为正确的方法是将它们存储为单独的文件,但是使用blob存储数据有一些优点,第一个(在我看来)是备份。我管理一个小服务器,我不得不创建另一个子程序,只为了将存储的文件作为路径复制到另一个存储磁盘(我们买不起一个像样的磁带备份系统)。如果我设计我的应用程序使用blobs,一个简单的mysqldump将是我备份整个数据库所需的一切。

The advantage of storing blobs for backups are better discussed on this post where the person who answered had a similar problem than mine.

为备份存储blobs的好处在这篇文章中得到了更好的讨论,在这篇文章中,回答这个问题的人遇到了类似的问题。

Another advantage is security and the easiness of managing permission and access. All the data inside your MySQL server is password protected and you can easily manage permissions for your users about who access what and who doesn't.

另一个优点是安全性和易于管理权限和访问。MySQL服务器内的所有数据都是受密码保护的,您可以轻松地管理用户访问什么和不访问什么的权限。

In a application which relies on MySQL privileges system for authentication and use. It's certain a plus since it would be a little harder for let's say an invader to retrieve an image (or a binary file like a zipped one) from your disk or an user without access privileges to access it.

在应用程序中,使用MySQL特权系统进行身份验证和使用。这肯定是一个优点,因为对于一个入侵者来说,从磁盘或没有访问权限的用户检索图像(或一个压缩文件)会有点困难。

So I'd say that

所以我说

If you gonna manage your MySQL and all the data you have in it and must do regular backups or intend to change or even consider a future change of OS, and have a decent hardware and optimized your MySQL to it, go for BLOB.

如果你要管理你的MySQL和所有的数据,必须定期做备份,或者想要改变,甚至考虑未来的操作系统的变化,并且有一个不错的硬件,并优化你的MySQL,去做BLOB。

If you will not manage your MySQL (as in a web host for example) and doesn't intend to change OS or make backups, stick with varchar columns pointing to your files.

如果您不管理MySQL(例如在web主机中),并且不打算更改OS或进行备份,请使用指向文件的varchar列。

I hope it helped. Cheers

我希望它有帮助。干杯

#2


9  

If you store data is BLOB field, you are making it part of your object abstraction.

如果您存储的数据是BLOB字段,那么它就是对象抽象的一部分。

BLOB advantages:

团优势:

  1. Should you want to remove row with BLOB, or remove it as part of master/slave table relationship or maybe the whole table hierarchy, your BLOB is handled automatically and has same lifetime as any other object in database.

    如果您希望使用BLOB删除行,或者作为主/从表关系的一部分或整个表层次结构删除行,那么您的BLOB将自动处理,并且与数据库中的任何其他对象具有相同的生命周期。

  2. Your scripts do not have a need to access anything but database to get everything they require. In many situations, having direct file access open whole can of worms on how to bypass access or security restrictions. For example, with file access, they may have to mount filesystems which contain actual files. But with BLOB in database, you only have to be able to connect to database, no matter where you are.

    您的脚本不需要访问任何东西,只需要访问数据库就可以获得所需的一切。在许多情况下,拥有直接的文件访问开放整个蠕虫如何绕过访问或安全限制。例如,对于文件访问,他们可能必须装入包含实际文件的文件系统。但是在数据库中使用BLOB,您只需能够连接到数据库,无论您在哪里。

  3. If you store it in file and file is replaced, removed or no longer accessible, your database would never know - in effect, you cannot guarantee integrity. Also, it is difficult to reliably support multiple versions when using files. If you use and depend on transactions, it becomes almost impossible.

    如果您将它存储在文件中,而文件被替换、删除或不再访问,您的数据库将永远不会知道——实际上,您不能保证完整性。此外,在使用文件时很难可靠地支持多个版本。如果您使用和依赖事务,那么几乎不可能实现。

File advantages:

文件的优势:

  1. Some databases handle BLOBs rather poorly. For example, while official BLOB limit in MySQL is 4GB, but in reality it is only 1MB in default configuration. You can increase this to 16-32MB by tweaking both client and server configuration to increase MySQL command buffer, but this has a lot of other implications in terms of performance and security.

    有些数据库处理BLOBs非常糟糕。例如,虽然MySQL中的正式BLOB限制是4GB,但实际上在默认配置中只有1MB。您可以通过调整客户端和服务器配置来增加MySQL命令缓冲区,将其增加到16-32MB,但是这在性能和安全性方面有很多其他的影响。

  2. Even if database does not have some weird size limits, it always will have some overhead in storing BLOB compared to just a file. Also, if BLOB is large, some databases do not provide interface to access blob piece by piece, or stream it, which can be large impediment for your workflow.

    即使数据库没有一些奇怪的大小限制,与仅仅存储一个文件相比,它在存储BLOB时总是会有一些开销。另外,如果BLOB很大,一些数据库不提供逐块访问BLOB的接口,也不提供流接口,这可能会对您的工作流造成很大的障碍。

In the end, it is up to you. I typically try to keep it in BLOB, unless this creates unreasonable performance problems.

最后,这取决于你。我通常尝试将其保持在BLOB中,除非这样会产生不合理的性能问题。

#3


4  

Yes, MySQL blobs that don't fit within the same page as a row get stored on overflow pages Note that some blobs are small enough that they're stored with the rest of the row, like any other column. The blob pages are not adjacent to the page their row is stored on, so they may result in extra I/O to read them.

是的,不适合与行在同一页中的MySQL blobs存储在溢出页中,请注意,有些blobs足够小,可以与其他列一起存储。blob页与它们所存储的行不相邻,因此可能会产生额外的I/O来读取它们。

On the other hand, just like with any other page type, blob pages can occupy memory in the InnoDB buffer pool, so reading the blobs subsequently is very fast even if they are on separate pages. Files can be cached by the operating system, but typically they're read from disk.

另一方面,就像任何其他页面类型一样,blob页面可以占用InnoDB缓冲池中的内存,因此即使在单独的页面上读取blobs也会非常快。文件可以由操作系统缓存,但通常是从磁盘读取的。

Here are a few other factors that may affect your decision:

以下是一些可能影响你的决定的其他因素:

  • Blobs are stored logically with a row. This means if you DELETE the row, the associated blob is deleted automatically. But if you store the blob outside the database, you end up with orphaned blob files after you delete rows from the database. You have to do manual steps to find and delete these files.

    Blobs逻辑地存储在一行中。这意味着如果您删除该行,将自动删除相关的blob。但是如果您将blob存储在数据库之外,那么在从数据库中删除行之后,您将得到孤立的blob文件。您必须执行手动步骤来查找和删除这些文件。

  • Blobs stored in the row also follow transaction semantics. For instance, a new blob or an updated blob is invisible to other transactions until you commit. You can also roll back a change. Storing blobs in files outside the database makes this a lot harder.

    存储在行的Blobs也遵循事务语义。例如,一个新的blob或更新的blob在提交之前对其他事务是不可见的。您还可以回滚更改。在数据库之外的文件中存储blobs会使这变得更加困难。

  • When you back up a database containing blobs, the database is a lot bigger of course, but when you backup, you get all the data and associated blobs in one step. If you store blobs externally, you have to back up the database and also back up the filesystem where you store blob files. If you need to ensure that the data and blobs are captured from one instant in time, you pretty much need to use some kind of filesystem snapshots.

    当您备份一个包含blobs的数据库时,数据库当然要大得多,但是当您备份时,您将在一个步骤中获得所有的数据和相关的blobs。如果在外部存储blobs,则必须备份数据库并备份存储blob文件的文件系统。如果您需要确保数据和blob是在某一时刻捕获的,那么您非常需要使用某种类型的文件系统快照。

  • If you use replication, the only automatic way of ensuring the blobs get copied to the replication slave automatically is to store blobs in the database.

    如果使用复制,确保将blob自动复制到复制从服务器的惟一自动方法是将blobs存储在数据库中。

#4


2  

The better approach is to store your file in the filesystem folder and point to their paths through a varchar field in the database. One of the drawbacks of saving files in the database is slowing it or reducing its performance.

更好的方法是将文件存储在文件系统文件夹中,并通过数据库中的varchar字段指向它们的路径。在数据库中保存文件的缺点之一是减慢它或降低它的性能。

#5


2  

Filesystem access will be faster than through the database. Blobs columns have some disadvantages in terms of indexing/sorting etc, which you could do with your filename column if you wished to in the future.

文件系统访问将比通过数据库更快。Blobs列在索引/排序等方面有一些缺点,如果您希望在将来使用,您可以使用您的filename列来做这些操作。

The database can also grow quickly with large blobs and then tasks like backing up become slower. I would go with a file location in database with the physical storage on the file system.

数据库也可以通过大的blob快速增长,然后像备份这样的任务变得更慢。我将使用数据库中的文件位置和文件系统中的物理存储。