hive:Group by 问题:Expression Not In Group By Key

时间:2021-02-17 22:45:04

hive又遇到了问题:

原来的mysql中sql语句为

SELECT 	DATEBUF, CITYID, CHANNELID FROM STAT_CVCHNEW WHERE DATEBUF>=20110517 AND DATEBUF<=20110519 GROUP BY DATEBUF, CITYID;

得到的结果应该是是按DATEBUF和CITYID分组,然后每组对应CHANNELID的第一个值

但在hive中报错

FAILED: SemanticException [Error 10025]: Line 1:20 Expression not in GROUP BY key 'uc'

然后google到http://*.com/questions/5746687/hive-expression-not-in-group-by-key中这样解决

用hive的udaf函数:collect_set()

改为

SELECT DATEBUF, CITYID, collect_set(CHANNELID) FROM PWD_STAT_CVCHNEW WHERE DATEBUF=20130810 and cityid<3 GROUP BY DATEBUF, CITYID;
返回的是每一组datebuf和cityid相应的channelid的所有的一个集合

改为

SELECT DATEBUF, CITYID, collect_set(CHANNELID)[0] FROM PWD_STAT_CVCHNEW WHERE DATEBUF=20130810 and cityid<3 GROUP BY DATEBUF, CITYID;
ok

但返回的结果channelid是以hdfs中存储的顺序的第一个,

其实这个在一般情况下好像用不到