从巨大的数据库表中检索结果集的策略

时间:2022-02-03 10:07:26

It's my belief that many people run into this problem: from a frontend JSP page, a user set up some criteria based on which a SQL is constructed and used to retrieve results from one or more database table(s). The issue is, this table grows by 1 mill per day and becomes gigantic.

我相信很多人遇到这个问题:从前端JSP页面,用户根据构建SQL的一些标准设置并用于从一个或多个数据库表中检索结果。问题是,这张表每天增加1万,并且变得巨大。

I know there is not a definite answer to this question: how can we do to expedite this procedure? Indexing might be one (which I heard a lot but know little about), and another thing on my mind is to use some customized caching solution such as Gigaspace. Will Hibernate help in this case?

我知道这个问题没有明确答案:我们如何才能加快这一程序?索引可能是一个(我听到了很多,但知之甚少),另外我想到的是使用一些定制的缓存解决方案,如Gigaspace。在这种情况下,Hibernate会帮忙吗?

Anyone else wanna add their 2 cents?

还有人想加2美分吗?

Thanks a lot! John

非常感谢!约翰

2 个解决方案

#1


2  

Well erm yes you do need to index your database!

嗯,是的,你需要索引你的数据库!

If you're not even indexing your database, then you probably need to start by reading a little bit about how to appropriately index your database.

如果您甚至没有索引数据库,那么您可能需要先阅读一些关于如何正确索引数据库的内容。

Then, it shouldn't matter per se about the number of millions of rows in your database tables: the very raison d'etre of a decent database system is to cope with tables with millions of rows. But you will want to make sure is that the specification of which rows are actually retrieved from those millions is sensible and that the queries in question can go off appropriate indexes (e.g. because of parameters entered by the user to narrow them down). "Adding an index" isn't a magical panacea necessarily: you need to make sure that you have indexes added that are appropriate to how your queries end up looking by the time they hit the database.

然后,关于数据库表中数百万行的数据本身并不重要:一个体面的数据库系统的存在理由是应对具有数百万行的表。但是,您需要确保从数百万个实际检索哪些行的规范是明智的,并且所讨论的查询可能偏离适当的索引(例如,由于用户输入的参数以缩小它们)。 “添加索引”并不是一个神奇的灵丹妙药:您需要确保添加的索引适合于查询到达数据库时最终的查找方式。

I wouldn't personally go down the road of adding spurious caching and other layers of complexity until (a) you've actually ascertained that in practice that you need them and (b) you can actually ascertain that the layers you are adding will solve the problem that you want them to solve. If you haven't got round to indexing your database yet, then I would really start by just building a simple, appropriately optimised solution and take it from there.

我不会亲自走上添加虚假缓存和其他复杂层次的道路,直到(a)你实际上确定在实践中你需要它们和(b)你实际上可以确定你添加的层将解决你希望他们解决的问题。如果您还没有完成索引数据库,那么我真的要从构建一个简单,适当优化的解决方案开始,然后从那里开始。

#2


1  

An index is a must with that amount of data or even a faction of that! As far as any other answers goes it really depends on what you plan on doing with that amount of data because one strategy isn't going to cover all use-cases.

索引是必须拥有的数据量甚至是一个派系!就任何其他答案而言,它实际上取决于您计划使用该数据量的计划,因为一种策略不会涵盖所有用例。

#1


2  

Well erm yes you do need to index your database!

嗯,是的,你需要索引你的数据库!

If you're not even indexing your database, then you probably need to start by reading a little bit about how to appropriately index your database.

如果您甚至没有索引数据库,那么您可能需要先阅读一些关于如何正确索引数据库的内容。

Then, it shouldn't matter per se about the number of millions of rows in your database tables: the very raison d'etre of a decent database system is to cope with tables with millions of rows. But you will want to make sure is that the specification of which rows are actually retrieved from those millions is sensible and that the queries in question can go off appropriate indexes (e.g. because of parameters entered by the user to narrow them down). "Adding an index" isn't a magical panacea necessarily: you need to make sure that you have indexes added that are appropriate to how your queries end up looking by the time they hit the database.

然后,关于数据库表中数百万行的数据本身并不重要:一个体面的数据库系统的存在理由是应对具有数百万行的表。但是,您需要确保从数百万个实际检索哪些行的规范是明智的,并且所讨论的查询可能偏离适当的索引(例如,由于用户输入的参数以缩小它们)。 “添加索引”并不是一个神奇的灵丹妙药:您需要确保添加的索引适合于查询到达数据库时最终的查找方式。

I wouldn't personally go down the road of adding spurious caching and other layers of complexity until (a) you've actually ascertained that in practice that you need them and (b) you can actually ascertain that the layers you are adding will solve the problem that you want them to solve. If you haven't got round to indexing your database yet, then I would really start by just building a simple, appropriately optimised solution and take it from there.

我不会亲自走上添加虚假缓存和其他复杂层次的道路,直到(a)你实际上确定在实践中你需要它们和(b)你实际上可以确定你添加的层将解决你希望他们解决的问题。如果您还没有完成索引数据库,那么我真的要从构建一个简单,适当优化的解决方案开始,然后从那里开始。

#2


1  

An index is a must with that amount of data or even a faction of that! As far as any other answers goes it really depends on what you plan on doing with that amount of data because one strategy isn't going to cover all use-cases.

索引是必须拥有的数据量甚至是一个派系!就任何其他答案而言,它实际上取决于您计划使用该数据量的计划,因为一种策略不会涵盖所有用例。