Which database is good enough for a logging application?

Time: 2022-06-03 22:02:41

I am writing a web application in Node.js that other applications can use to store logs; the logs can later be accessed through a web interface, or by the applications themselves via an API. Similar to Graylog2, but schema-free.

I've already tried CouchDB, where each document would be a log entry, but since I'm not really using revisions it seems to me I'm not taking advantage of all its features. Besides that, I think that once the logs exceed a certain size they would be pretty hard to manage in CouchDB.

What I'm really looking for is a big array of logs that can be sorted, filtered, searched, and capped. The most recent events should then be accessible. It should be schema-free, and writing to it should be non-blocking.
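To make that requirement concrete, here is a minimal, dependency-free sketch of the semantics I'm after (a fixed-size, insertion-ordered store that drops old entries, with filtering on arbitrary fields). The class and method names are my own, not from any library; in MongoDB a capped collection would play this role.

```javascript
// In-memory sketch of a capped, schema-free log store.
// A real deployment would use e.g. a MongoDB capped collection;
// CappedLogStore and its methods are hypothetical names.
class CappedLogStore {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = [];
  }

  // Append a schema-free entry; the oldest one is dropped once the cap is hit.
  write(entry) {
    this.entries.push({ ts: Date.now(), ...entry });
    if (this.entries.length > this.maxEntries) this.entries.shift();
  }

  // Filter on arbitrary, client-defined fields.
  find(predicate) {
    return this.entries.filter(predicate);
  }

  // The N most recent events, newest first.
  tail(n) {
    return this.entries.slice(-n).reverse();
  }
}
```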

I'm considering using Cassandra (I'm not really familiar with it) because of the points made here. MongoDB seems like a good fit too, since Graylog2 uses MongoDB, and there are some good points made about it here.

I've already seen this question, but I'm not satisfied with the answers.

Edit: For some reasons I can't use Cassandra in production, so now I'm trying MongoDB.

One more reason to use MongoDB: http://www.slideshare.net/WombatNation/logging-app-behavior-to-mongo-db

More edits:

It is similar to Graylog2, but the difference I want to make is that instead of having a single message field, the fields are defined by the client, which is why I want it to be schema-free; and because of that, I may need to query on the user-defined fields. We could build it on SQL, but querying on the user-defined fields would be reinventing the wheel. The same goes for files.
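This is the part a document store gives you for free and SQL does not. As a rough illustration (an assumed, cut-down subset of MongoDB's query syntax, matched against schema-free log documents in plain JavaScript):

```javascript
// Sketch: matching a MongoDB-style query object against schema-free
// log documents. A document database provides this matching (plus
// indexes) natively; on SQL you would have to reinvent it. Only an
// illustrative subset of operators ($gt, $in) is supported here.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond !== null && typeof cond === 'object' && !Array.isArray(cond)) {
      if ('$gt' in cond && !(value > cond.$gt)) return false;
      if ('$in' in cond && !cond.$in.includes(value)) return false;
      return true;
    }
    return value === cond; // plain equality
  });
}

// The field names below are client-defined, not fixed by any schema.
const logs = [
  { app: 'billing', userId: 7, durationMs: 120 },
  { app: 'billing', userId: 9, durationMs: 480 },
  { app: 'auth', event: 'login-failed' },
];

const slow = logs.filter(d => matches(d, { app: 'billing', durationMs: { $gt: 200 } }));
```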

Technically, what I'm looking for is to end up with rich statistical data, easy debugging, and a lot of other things that we can't get out of plain log files.

3 Answers

#1


2  

General Approach

You have a lot of work ahead of you. Whichever database you use, you have many features which you must build on top of the DB foundation. You have done good research about all of your options. It sounds like you suspect that all have pros and cons but all are imperfect. Your suspicion is correct. At this point it is probably time to start writing code.


You could just choose one arbitrarily and start building your application. If your guess was correct that the pros and cons balance out and it's all about the same, then why not simply start building immediately? When you hit difficulty X on your database, remember that it gave you convenience Y and Z and that's just life.


You could also establish the fundamental core of your application and implement various prototypes on each of the databases. That might give you true insight to help discriminate between the databases for your specific application. For example, besides the interface, indexing, and querying questions, what about deployment? What about backups? What about maintenance and security? Maybe "wasting" time to build the same prototype on each platform will make the answer very clear for you.


Notes about CouchDB

I suppose CouchDB is "NoSQL" if you say so. Other things which are "no SQL" include bananas, poems, and cricket. It is not a very meaningful word. We have general-purpose languages and domain-specific languages; similarly CouchDB is a domain-specific database. It can save you time if you need the following features:


  • Built-in web API: clients may query directly
  • Incremental map-reduce: CouchDB runs the job once, but you can query repeatedly at no cost. Updates to the data set are immediately reflected in the map/reduce result without full re-processing
  • Easy to start small but expand to large clusters without changing application code.
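To illustrate the incremental map/reduce point, a CouchDB view for counting log events per level might look like the following sketch. The document fields (`level`) and view names are illustrative, not prescribed; CouchDB folds new documents into the existing view index rather than re-running the job.

```javascript
// The map function, written so it can be exercised locally with an
// injected emit() before being serialized into the design document.
function mapByLevel(doc, emit) {
  if (doc.level) emit(doc.level, 1);
}

// A CouchDB design document containing the view. CouchDB calls
// map(doc) with a global emit(); the two-argument version above is
// only a local testing convenience.
const designDoc = {
  _id: '_design/logs',
  views: {
    by_level: {
      map: 'function (doc) { if (doc.level) emit(doc.level, 1); }',
      reduce: '_count', // CouchDB built-in reducer
    },
  },
};
```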

#2


3  

Where shall it be stored and how shall it be retrieved?


I guess it depends on how much data you are dealing with. If you have a huge amount of logs (terabytes or petabytes per day), then Apache Kafka, which is designed to let data be pulled by HDFS in parallel, is an interesting solution (still in the incubation stage at the time of writing). I believe that if you want to consume Kafka messages with MongoDB, you'd need to develop your own adapter that ingests them as a consumer of a particular Kafka topic. Although MongoDB data (e.g. shards and replicas) is distributed, ingesting each message may be a sequential process, so there may be a bottleneck, or even race conditions, depending on the rate and size of message traffic. Kafka is optimized to pump and append that data to HDFS nodes fast using message brokers. Once it is in HDFS, you can map/reduce to analyze your information in a variety of ways.
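One common way such an adapter avoids per-message ingestion overhead is to batch consumed messages before a bulk write. A rough sketch of that idea, with the Kafka consumer wiring omitted and the sink left as an assumed caller-supplied function (in a real adapter it might call MongoDB's `insertMany`):

```javascript
// Sketch: batching consumed messages before a bulk insert, so the
// store is not hit once per Kafka message. The class name and the
// sink contract are hypothetical; a real sink might be
// batch => collection.insertMany(batch).
class BatchingIngester {
  constructor(sink, { maxBatch = 500 } = {}) {
    this.sink = sink;
    this.maxBatch = maxBatch;
    this.batch = [];
  }

  // Called once per consumed message; flushes when the batch is full.
  ingest(message) {
    this.batch.push(message);
    if (this.batch.length >= this.maxBatch) this.flush();
  }

  // Hand the accumulated batch to the sink and start a new one.
  flush() {
    if (this.batch.length === 0) return;
    this.sink(this.batch);
    this.batch = [];
  }
}
```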

If MongoDB can handle the ingestion load, then it is an excellent, scalable, real-time solution for finding information, particularly documents. Otherwise, if you have more time to process the data (i.e. batch processes that take hours, sometimes days), then Hadoop or some other map-reduce database is warranted. Finally, Kafka can distribute that load of messages and hook up that fire hose to a variety of consumers. Overall, these technologies spread the load, and huge amounts of data, across cheap hardware, using software to manage failures and recover with a very low probability of losing data.

Even with a small amount of data, MongoDB is a nice alternative to traditional relational database solutions, which require more developer overhead to design, build, and maintain.

#3


1  

Have you considered Apache Kafka?


Kafka is a distributed messaging system developed at LinkedIn for collecting and delivering high volumes of log data with low latency. Our system incorporates ideas from existing log aggregators and messaging systems, and is suitable for both offline and online message consumption.

