MongoDB聚合框架 - 如何执行多个$ group查询

时间:2021-11-01 17:08:12

I have the following MongoDB aggregation query that finds all records within a specified month, $groups the records by day, and then returns an average price for each day. I would also like to return a price average for the entire month. Can I do this by using multiple $groups, if so, how?

我有以下MongoDB聚合查询,它查找指定月份内的所有记录,$按天分组记录,然后返回每天的平均价格。我还希望返回整个月的平均价格。我可以通过使用多个$组来实现这一点,如果是这样,怎么做?

        PriceHourly.aggregate([
                { $match: { date: { $gt: start, $lt: end } } },
                { $group: { 
                    _id: "$day", 
                    price: { $avg: '$price' },
                    system_demand: { $avg: '$system_demand'}
                }}
        ], function(err, results){

                results.forEach(function(r) {
                    r.price          = Helpers.round_price(r.price);
                    r.system_demand  = Helpers.round_price(r.system_demand);
                });
                console.log("Results Length: "+results.length, results);
                res.jsonp(results);
        }); // PriceHourly();  

Here is my model:

这是我的模型:

// Model
var PriceHourlySchema = new Schema({
    created: {
        type: Date,
        default: Date.now
    },
    day: {
        type: String,
        required: true,
        trim: true
    },
    hour: {
        type: String,
        required: true,
        trim: true
    },
    price: {
        type: Number,
        required: true
    },
    date: {
        type: Date,
        required: true
    }
}, 
{ 
    autoIndex: true 
});

1 个解决方案

#1


1  

The short answer is "What is wrong with just expanding your date range to include all the days in a month?", and therefore that is all you need to change in order to get your result.

简短的回答是“只是扩展您的日期范围以包括一个月内的所有日期有什么问题?”,因此您需要更改以获得结果。

And could you "nest" grouping stages? Yes you can add additional stages to the pipeline, that is what the pipeline is for. So if you first wanted to "average" per day and then take the average over all the days of the month, you can form like this:

你能“嵌套”分组阶段吗?是的,您可以向管道添加其他阶段,这就是管道的用途。因此,如果您首先想要每天“平均”,然后在一个月的所有日子里取平均值,您可以这样形成:

PriceHourly.aggregate([
   { "$match": { 
       "date": { 
           "$gte": new Date("2014-03-01"), "$lt": new Date("2014-04-01")
       }  
   }},
   { "$group": { 
       "_id": "$day", 
       "price": { "$avg": "$price" },
       "system_demand": { "$avg": "$system_demand" }
   }},
   { "$group": { 
       "_id": null, 
       "price": { "$avg": "$price" },
       "system_demand": { "$avg": "$system_demand" }
   }}
])

Even though that is likely to be reasonably redundant as this can arguably be done with one single group statement.

即使这可能是相当多余的,因为可以说这可以用一个单一的群体陈述来完成。

But there is a longer commentary on this schema. You do not actually state much of the purpose of what you are doing other than obtaining an average, or what the schema is meant to contain. So I want to describe something that is maybe a bit different.

但是这个架构有更长的评论。除了获得平均值或模式意图包含的内容之外,您实际上并没有说明您正在做的事情的大部分目的。所以我想描述一些可能有点不同的东西。

Suppose you have a collection that includes the "product", "type" the "current price" and the "timestamp" as a date when that "price" was "changed". Let us call the collection "PriceChange". So every time this event happens a new document is created.

假设您有一个包含“产品”的集合,“键入”“当前价格”和“时间戳”作为“价格”“更改”的日期。我们称之为“PriceChange”。因此,每次发生此事件时,都会创建一个新文档。

{
    "product": "ABC",
    "type": 2,
    "price": 110,
    "timestamp": ISODate("2014-04-01T00:08:38.360Z")
}

This could change many times in an hour, a day or whatever the case.

这可能会在一小时,一天或任何情况下多次改变。

So if you were interested in the "average" price per product over the month you could do this:

因此,如果您对本月每件产品的“平均”价格感兴趣,您可以这样做:

PriceChange.aggregate([
    { "$match": {
        "timestamp": { 
            "$gte": new Date("2014-03-01"), "$lt": new Date("2014-04-01")
        }
    }},
    { "$group": {
        "_id": "$product",
        "price_avg": { "$avg": "$price" }
    }}
])

Also, without any additional fields you can get the average price per product for each day of the month:

此外,如果没有任何其他字段,您可以获得每月每个产品的平均价格:

PriceChange.aggregate([
    { "$match": {
        "timestamp": { 
            "$gte": new Date("2014-03-01"), "$lt": new Date("2014-04-01")
        }
    }},
    { "$group": {
        "_id": {
            "day": { "$dayOfMonth": "$timestamp" },
            "product": "$product"
        },
        "price_avg": { "$avg": "$price" }
    }}
])

Or you can even get the last price for each month over a whole year:

或者你甚至可以获得一年中每个月的最后价格:

PriceChange.aggregate([
    { "$match": {
        "timestamp": { 
            "$gte": new Date("2013-01-01"), "$lt": new Date("2014-01-01")
        }
    }},
    { "$group": {
        "_id": {
            "date": {
                "year": { "$year" : "$timestamp" },
                "month": { "$month": "$timestamp" }
            },
            "product": "$product"
        },
        "price_last": { "$last": "$price" }
    }}
])

So those are some things you can do using the build in Date Aggregation Operators to achieve various results. These can even aid in collection of this information for writing into new "pre-aggregated" collections, to be used for faster analysis.

因此,您可以使用日期聚合运算符中的构建来实现各种结果。这些甚至可以帮助收集这些信息,以便写入新的“预聚合”集合,用于更快的分析。

I suppose there would be one way to combine a "running" average against all prices using mapReduce. So again from my sample:

我想有一种方法可以使用mapReduce将“运行”平均值与所有价格相结合。再次从我的样本:

PriceHourly.mapReduce(
    function () {
        emit( this.timestamp.getDate(), this.price )
    },
    function (key, values) {
        var sum = 0;
        values.forEach(function(value) {
            sum += value;
        });
        return ( sum / values.length );
    },
    { 
        "query": {
            "timestamp": {
                "$gte": new Date("2014-03-01"), "$lt": new Date("2014-04-01")
            }
        },
        "out": { "inline": 1 },
        "scope": { "running": 0, "counter": 0 },
        "finalize": function(key,value) {
            running += value;
            counter++;
            return { "dayAvg": value, "monthAvg": running / counter };
        }
    }
)

And that would return something like this:

这将返回这样的事情:

{
    "results" : [
        {
            "_id" : 1,
            "value" : {
                "dayAvg" : 105,
                "monthAvg" : 105
            }
        },
        {
            "_id" : 2,
            "value" : {
                "dayAvg" : 110,
                "monthAvg" : 107.5
           }
        }
    ],
}

But if you are otherwise expecting to see discrete values for both the day and the month, then that would not be possible without running separate queries.

但是,如果您希望看到日和月的离散值,那么如果不运行单独的查询则不可能。

#1


1  

The short answer is "What is wrong with just expanding your date range to include all the days in a month?", and therefore that is all you need to change in order to get your result.

简短的回答是“只是扩展您的日期范围以包括一个月内的所有日期有什么问题?”,因此您需要更改以获得结果。

And could you "nest" grouping stages? Yes you can add additional stages to the pipeline, that is what the pipeline is for. So if you first wanted to "average" per day and then take the average over all the days of the month, you can form like this:

你能“嵌套”分组阶段吗?是的,您可以向管道添加其他阶段,这就是管道的用途。因此,如果您首先想要每天“平均”,然后在一个月的所有日子里取平均值,您可以这样形成:

PriceHourly.aggregate([
   { "$match": { 
       "date": { 
           "$gte": new Date("2014-03-01"), "$lt": new Date("2014-04-01")
       }  
   }},
   { "$group": { 
       "_id": "$day", 
       "price": { "$avg": "$price" },
       "system_demand": { "$avg": "$system_demand" }
   }},
   { "$group": { 
       "_id": null, 
       "price": { "$avg": "$price" },
       "system_demand": { "$avg": "$system_demand" }
   }}
])

Even though that is likely to be reasonably redundant as this can arguably be done with one single group statement.

即使这可能是相当多余的,因为可以说这可以用一个单一的群体陈述来完成。

But there is a longer commentary on this schema. You do not actually state much of the purpose of what you are doing other than obtaining an average, or what the schema is meant to contain. So I want to describe something that is maybe a bit different.

但是这个架构有更长的评论。除了获得平均值或模式意图包含的内容之外,您实际上并没有说明您正在做的事情的大部分目的。所以我想描述一些可能有点不同的东西。

Suppose you have a collection that includes the "product", "type" the "current price" and the "timestamp" as a date when that "price" was "changed". Let us call the collection "PriceChange". So every time this event happens a new document is created.

假设您有一个包含“产品”的集合,“键入”“当前价格”和“时间戳”作为“价格”“更改”的日期。我们称之为“PriceChange”。因此,每次发生此事件时,都会创建一个新文档。

{
    "product": "ABC",
    "type": 2,
    "price": 110,
    "timestamp": ISODate("2014-04-01T00:08:38.360Z")
}

This could change many times in an hour, a day or whatever the case.

这可能会在一小时,一天或任何情况下多次改变。

So if you were interested in the "average" price per product over the month you could do this:

因此,如果您对本月每件产品的“平均”价格感兴趣,您可以这样做:

PriceChange.aggregate([
    { "$match": {
        "timestamp": { 
            "$gte": new Date("2014-03-01"), "$lt": new Date("2014-04-01")
        }
    }},
    { "$group": {
        "_id": "$product",
        "price_avg": { "$avg": "$price" }
    }}
])

Also, without any additional fields you can get the average price per product for each day of the month:

此外,如果没有任何其他字段,您可以获得每月每个产品的平均价格:

PriceChange.aggregate([
    { "$match": {
        "timestamp": { 
            "$gte": new Date("2014-03-01"), "$lt": new Date("2014-04-01")
        }
    }},
    { "$group": {
        "_id": {
            "day": { "$dayOfMonth": "$timestamp" },
            "product": "$product"
        },
        "price_avg": { "$avg": "$price" }
    }}
])

Or you can even get the last price for each month over a whole year:

或者你甚至可以获得一年中每个月的最后价格:

PriceChange.aggregate([
    { "$match": {
        "timestamp": { 
            "$gte": new Date("2013-01-01"), "$lt": new Date("2014-01-01")
        }
    }},
    { "$group": {
        "_id": {
            "date": {
                "year": { "$year" : "$timestamp" },
                "month": { "$month": "$timestamp" }
            },
            "product": "$product"
        },
        "price_last": { "$last": "$price" }
    }}
])

So those are some things you can do using the build in Date Aggregation Operators to achieve various results. These can even aid in collection of this information for writing into new "pre-aggregated" collections, to be used for faster analysis.

因此,您可以使用日期聚合运算符中的构建来实现各种结果。这些甚至可以帮助收集这些信息,以便写入新的“预聚合”集合,用于更快的分析。

I suppose there would be one way to combine a "running" average against all prices using mapReduce. So again from my sample:

我想有一种方法可以使用mapReduce将“运行”平均值与所有价格相结合。再次从我的样本:

PriceHourly.mapReduce(
    function () {
        emit( this.timestamp.getDate(), this.price )
    },
    function (key, values) {
        var sum = 0;
        values.forEach(function(value) {
            sum += value;
        });
        return ( sum / values.length );
    },
    { 
        "query": {
            "timestamp": {
                "$gte": new Date("2014-03-01"), "$lt": new Date("2014-04-01")
            }
        },
        "out": { "inline": 1 },
        "scope": { "running": 0, "counter": 0 },
        "finalize": function(key,value) {
            running += value;
            counter++;
            return { "dayAvg": value, "monthAvg": running / counter };
        }
    }
)

And that would return something like this:

这将返回这样的事情:

{
    "results" : [
        {
            "_id" : 1,
            "value" : {
                "dayAvg" : 105,
                "monthAvg" : 105
            }
        },
        {
            "_id" : 2,
            "value" : {
                "dayAvg" : 110,
                "monthAvg" : 107.5
           }
        }
    ],
}

But if you are otherwise expecting to see discrete values for both the day and the month, then that would not be possible without running separate queries.

但是,如果您希望看到日和月的离散值,那么如果不运行单独的查询则不可能。