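1. agg(expr: Column, exprs: Column*) returns a DataFrame holding the results of the given aggregate expressions (max, avg, and so on).
df.agg(max("age"), avg("salary"))
df.groupBy().agg(max("age"), avg("salary"))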
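2. agg(exprs: Map[String, String]) returns a DataFrame. It performs the same computation, with the aggregations given as a Map from column name to aggregate function name.
df.agg(Map("age" -> "max", "salary" -> "avg"))
df.groupBy().agg(Map("age" -> "max", "salary" -> "avg"))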
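3. agg(aggExpr: (String, String), aggExprs: (String, String)*) returns a DataFrame. It performs the same computation, with the aggregations given as (column name, function name) pairs.
df.agg("age" -> "max", "salary" -> "avg")
df.groupBy().agg("age" -> "max", "salary" -> "avg")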
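The three overloads above can be compared side by side. The following is a minimal sketch, assuming a local SparkSession; the object name `AggOverloads`, the helper `run`, and the sample rows are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{avg, max}

object AggOverloads {
  // Runs the same aggregation through each overload and returns
  // the single result row produced by each one.
  def run(spark: SparkSession): Seq[Row] = {
    import spark.implicits._

    // Hypothetical sample data, not taken from the original notes.
    val people = Seq(("ann", 30, 1000.0), ("bob", 40, 3000.0))
      .toDF("name", "age", "salary")

    // 1. Column expressions
    val byColumns = people.agg(max("age"), avg("salary"))
    // 2. Map from column name to aggregate function name
    val byMap = people.agg(Map("age" -> "max", "salary" -> "avg"))
    // 3. (column name, function name) pairs
    val byPairs = people.agg("age" -> "max", "salary" -> "avg")

    Seq(byColumns, byMap, byPairs).map(_.head())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("agg-overloads")
      .getOrCreate()
    run(spark).foreach(println)
    spark.stop()
  }
}
```

Calling agg directly on a DataFrame aggregates over all rows as one group, which is why each variant yields exactly one row here.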
Example 1:
scala> spark.version
res2: String = 2.0.2
scala> case class Test(bf: Int, df: Int, duration: Int, tel_date: Int)
defined class Test
scala> val df = Seq(Test(1,1,1,1), Test(1,1,2,2), Test(1,1,3,3), Test(2,2,3,3), Test(2,2,2,2), Test(2,2,1,1)).toDF
df: org.apache.spark.sql.DataFrame = [bf: int, df: int ... 2 more fields]
scala> df.show()
+---+---+--------+--------+
| bf| df|duration|tel_date|
+---+---+--------+--------+
|  1|  1|       1|       1|
|  1|  1|       2|       2|
|  1|  1|       3|       3|
|  2|  2|       3|       3|
|  2|  2|       2|       2|
|  2|  2|       1|       1|
+---+---+--------+--------+
scala> df.groupBy("bf", "df").agg(("duration", "sum"), ("tel_date", "min"), ("tel_date", "max")).show()
+---+---+-------------+-------------+-------------+
| bf| df|sum(duration)|min(tel_date)|max(tel_date)|
+---+---+-------------+-------------+-------------+
|  2|  2|            6|            1|            3|
|  1|  1|            6|            1|            3|
+---+---+-------------+-------------+-------------+
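Note: the resulting DataFrame no longer has the original duration and tel_date columns. It contains only the groupBy keys and the fields computed in agg.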
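To check which columns survive a grouped aggregation, the result's column names can be inspected directly. This is a sketch assuming a local SparkSession; the object name `GroupedAggColumns` and the helper `grouped` are hypothetical, while the data and the aggregation mirror the first example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object GroupedAggColumns {
  case class Test(bf: Int, df: Int, duration: Int, tel_date: Int)

  // Rebuilds the DataFrame from the first example and applies
  // the same grouped aggregation.
  def grouped(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq(
      Test(1, 1, 1, 1), Test(1, 1, 2, 2), Test(1, 1, 3, 3),
      Test(2, 2, 3, 3), Test(2, 2, 2, 2), Test(2, 2, 1, 1)
    ).toDF()
    df.groupBy("bf", "df")
      .agg("duration" -> "sum", "tel_date" -> "min", "tel_date" -> "max")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("grouped-agg-columns")
      .getOrCreate()
    // Prints the grouping keys followed by the aggregate columns.
    println(grouped(spark).columns.mkString(", "))
    spark.stop()
  }
}
```

If a raw column is still needed after grouping, it has to be either one of the grouping keys or wrapped in an aggregate of its own.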
Example 2 (PySpark):
import pyspark.sql.functions as func
agg(func.max("event_time").alias("max_event_tm"), func.min("event_time").alias("min_event_tm"))
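The same aliasing idea can be written in Scala, which gives the aggregate columns readable names instead of the default max(event_time) and min(event_time). This is a sketch assuming a local SparkSession; the object name `AliasedAgg`, the helper `eventBounds`, the `user_id` column, and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{max, min}

object AliasedAgg {
  // Computes the latest and earliest event_time and names the
  // result columns explicitly with alias.
  def eventBounds(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical event data, not taken from the original notes.
    val events = Seq(("u1", 100L), ("u1", 250L), ("u2", 175L))
      .toDF("user_id", "event_time")

    events.agg(
      max("event_time").alias("max_event_tm"),
      min("event_time").alias("min_event_tm")
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("aliased-agg")
      .getOrCreate()
    eventBounds(spark).show()
    spark.stop()
  }
}
```

Aliases are only available with the Column-based overload; the Map and pair overloads always produce the default function(column) names.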