Is it more efficient to parse external XML or to hit the database?

When dealing with a web service API that returns XML, I was wondering whether it's better (faster) to just call the external service each time and parse the XML (using ElementTree) for display on your site, or to save the records into the database (after parsing once, or however many times you need to each day) and make database calls for that same information instead.

9 answers

#1

Everyone is being very polite in answering this question: "it depends"... "you should test"... and so forth.

True, the question does not go into great detail about the application and network topologies involved, but if the question is even being asked, then it's likely that a) the DB is "local" to the application (on the same subnet, or the same machine, or in memory), and b) the web service is not. After all, the OP uses the phrases "external service" and "display on your site." The phrase "parsing it once or however many times you need to each day" also suggests a set of data that doesn't exactly change every second.

The classic SOA myth is that the network is always available; going a step further, I'd say it's a myth that the network is always available with low latency. Unless your own internal systems are crap, sending an HTTP query across the Internet will always be slower than a query to a local DB or DB cluster. There are any number of reasons for this: number of hops to the remote server, outage or degradation issues that you can't control on the remote end, and the internal processing time for the remote web service application to analyze your request, hit its own persistence backend (aka DB), and return a result.

Fire up your app. Measure latency and response times for queries to your DB. Now do the same for a remote web service. Unless your DB is also across the Internet, you'll notice a huge difference.

It's not at all hard for a competent technologist to scale a DB, or to take the DB out of the request path entirely by caching with memcached and similar tools; the latency between servers sitting near each other in the datacentre is monumentally less than between machines over the Internet (and the traffic is more secure, to boot). Even if achieving this scale requires some thought, it's under your control, unlike a remote web service whose scaling and latency are totally opaque to you. I, for one, would not be too happy with the idea that the availability and responsiveness of my site depend entirely on someone else.

Finally, what happens if the remote web service is unavailable? Imagine a world where every request to your site involves a request over the Internet to some other site. What happens if that other site is unavailable? Do your users watch a spinning cursor of death for several hours? Do they enjoy an Error 500 while your site borks on this unexpected external dependency?

If you find yourself adopting an architecture whose fundamental features depend on a remote Internet call for every request, think very carefully about your application before deciding whether you can live with the consequences.
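
One way to soften that failure mode, sketched in Python since the question mentions ElementTree: give the remote call a short timeout and fall back to the last response that worked. The function name fetch_xml, the 2-second timeout, and the module-level _last_good dict are illustrative assumptions, not anything from the question, and the opener parameter exists only so the network call can be swapped out when testing.

```python
import urllib.request
import xml.etree.ElementTree as ET

_last_good = {}  # url -> raw XML bytes from the last successful call (hypothetical store)


def fetch_xml(url, timeout=2.0, opener=urllib.request.urlopen):
    """Return parsed XML for url, falling back to the last good copy on failure."""
    try:
        with opener(url, timeout=timeout) as resp:
            raw = resp.read()
        _last_good[url] = raw
    except OSError:  # URLError and socket timeouts are OSError subclasses
        if url not in _last_good:
            raise  # nothing cached yet, so the caller has to handle the outage
        raw = _last_good[url]
    return ET.fromstring(raw)
```

The stale copy lives only in process memory here; a real site would more likely keep it in the local DB or in memcached, as discussed above.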

#2

First off -- measure. Don't just assume that one is better or worse than the other.

Second, if you really don't want to measure, I'd guess the database is a bit faster (assuming the database is relatively local compared to the web service). Network latency is usually greater than parse time unless we're talking about a really complex database or really complex XML.

#3

Consuming the web services is more efficient because there are a lot more things you can do to scale your web services and web server (via caching, etc.). By consuming the middle layer, you also have the option to change the returned data format (e.g. you can decide to use JSON rather than XML). Scaling a database is much harder (involving replication, etc.), so in general, reduce hits on the DB if you can.

#4

There is not enough information to say for sure in the general case. Why don't you run some tests and find out? Since it sounds like you are using Python, you will probably want to use the timeit module.
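
A rough sketch of such a test, assuming a made-up payload of 1,000 records and an in-memory SQLite database as a stand-in for whatever DB you actually use. It only compares ElementTree parse time with query time; the network round trip to the real service is not included and should be timed separately against the real endpoint.

```python
import sqlite3
import timeit
import xml.etree.ElementTree as ET

# Hypothetical payload: 1,000 <record> elements standing in for the API response.
ROWS = [(i, f"name{i}") for i in range(1000)]
XML = "<records>" + "".join(
    f'<record id="{i}"><name>{n}</name></record>' for i, n in ROWS
) + "</records>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)", ROWS)


def from_xml():
    """Parse the whole document and pull out (id, name) pairs."""
    root = ET.fromstring(XML)
    return [(int(r.get("id")), r.findtext("name")) for r in root.iter("record")]


def from_db():
    """Fetch the same pairs with a single query."""
    return conn.execute("SELECT id, name FROM records ORDER BY id").fetchall()


if __name__ == "__main__":
    runs = 200
    for fn in (from_xml, from_db):
        secs = timeit.timeit(fn, number=runs)
        print(f"{fn.__name__}: {secs / runs * 1000:.3f} ms per call")
```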

Some things that could affect the result:

Performance of the web service you are using
Reliability of the web service you are using
Distance between servers
Amount of data being returned

I would guess that if the data is cacheable, a cached version will be faster, but that does not necessarily mean using a local RDBMS; it might mean something like memcached or an in-memory cache in your application.

#5

It depends. Who is calling the web service? Is the web service called every time a user hits the page? If so, I'd recommend introducing a caching layer of some sort; many web service APIs throttle the number of hits you can make per hour.
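
As an illustration of such a layer, here is a minimal time-based cache in Python. The class name TTLCache, the 300-second default, and the fetch callable are all assumptions made for the sketch; the idea is simply that the remote service is hit at most once per key per interval, however many page views arrive.

```python
import time


class TTLCache:
    """Cache one value per key and refetch only after ttl seconds have passed."""

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self._fetch = fetch      # callable(key) that actually hits the web service
        self._ttl = ttl
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]      # still fresh: no remote call
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

With an hourly quota of N calls, choosing ttl so that 3600 / ttl stays below N per key keeps a single process inside the limit; several processes would need a shared cache instead.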

Whether you choose to parse the cached XML on the fly or fetch the data from a database probably won't matter (unless we are talking enterprise scale here). Personally, I'd much rather make a simple SQL call than write a DOM parser (which is much more prone to exceptional scenarios).

#6

It varies from case to case; you'll have to measure (or at least make an educated guess).

You'll have to consider several things.

Web service

It might hit a database itself
It can be cached
It will introduce network latency and might be unreliable
Or it could be on the local network and faster than accessing even a local disk

Database

Might be slow since it needs to access disk (databases do have internal caches, but those are usually not targeted)
Should be reliable

Technology itself doesn't mean much in terms of speed: in one case the database parses SQL, in the other an XML parser parses XML, and the database is usually accessed via a socket as well, so you have both parsing and network in either case.

Caching data in your application if applicable is probably a good idea.

#7

As a few people have said, it depends, and you should test it.

Often external services are slow, and caching them locally (in a database or in memory, e.g. with memcached) is faster. But perhaps not.

Fortunately, it's cheap and easy to test.

#8

Definitely test. As a rule of thumb, XML is good for communicating between apps, but once you have the data inside your app, everything should go into a database table. This may not apply in all cases, but it has for me 95% of the time. Any time I tried to store data any other way (e.g. XML in a content management system), I ended up wishing I had just used good old sprocs and SQL Server.
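
A minimal sketch of that "parse once, then query a table" approach, using Python's ElementTree as in the question and SQLite as a stand-in for SQL Server. The record/title element names and the table layout are invented for illustration; the real response will have its own shape.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical response shape; the real element and attribute names will differ.
SAMPLE = """<records>
  <record id="1"><title>First</title></record>
  <record id="2"><title>Second</title></record>
</records>"""


def store_records(conn, xml_text):
    """Parse the response once and upsert each record into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, title TEXT)"
    )
    rows = [
        (int(rec.get("id")), rec.findtext("title"))
        for rec in ET.fromstring(xml_text).iter("record")
    ]
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT OR REPLACE INTO records VALUES (?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
store_records(conn, SAMPLE)
# Page rendering now needs only a plain SQL query, not a remote call or a parser.
print(conn.execute("SELECT title FROM records WHERE id = ?", (2,)).fetchone())
```

Because the insert is an upsert keyed on id, the same function can be rerun on a schedule each time the feed is refreshed.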

#9

It sounds like you essentially want to cache results and are wondering if it's worth it. If so, I would NOT use a database (I assume you are thinking of a relational DB): RDBMSs are not good for caching, even though many people use them that way. You need neither persistence nor ACID. If the choice were between Oracle/MySQL and the external web service, I would start by just using the service.

Instead, consider real caching systems, local or not (memcache, simple in-memory caches, etc.). Or if you must use a DB, use a key/value store; BDB works well. Store the response message in its serialized form (XML). On each request, try to fetch it from the cache; on a miss, fetch it from the service and store it. Then parse. Or if there's a convenient and more compact serialization, store and fetch that.
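
That flow might look roughly like this in Python. The standard library's dbm module is used here as a stand-in for BDB (which backend it picks depends on the platform), and get_parsed, db_path, and fetch_raw are hypothetical names. There is no expiry in this sketch, so entries stay until the file is deleted.

```python
import dbm
import xml.etree.ElementTree as ET


def get_parsed(key, db_path, fetch_raw):
    """Return parsed XML for key, reading the raw message from a key/value file first."""
    k = key.encode("utf-8")                 # dbm stores keys and values as bytes
    with dbm.open(db_path, "c") as store:   # "c": create the file if it is missing
        raw = store.get(k)
        if raw is None:                     # cache miss: call the service, keep the raw bytes
            raw = fetch_raw(key)
            store[k] = raw
    return ET.fromstring(raw)               # parse on every read, hit or miss
```

Storing the raw message keeps the cache independent of how the application models the data; if parsing turns out to dominate, the stored value could be a more compact serialization instead, as suggested above.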
