Aws云服务EMR使用

时间:2022-09-13 17:22:18

Aws云服务EMR使用

创建表结构

创建abc库下的abc_user_i表字段s3://abc-server/abc-emr/shell/ABC_USER_HIVE.q:

  • EXTERNAL 指定为外部表
  • partitioned by (createTime Date) 指定分区表,列名createTime
  • LOCATION '${INPUT}' 指定输出位置
CREATE EXTERNAL TABLE IF NOT EXISTS abc.abc_user_i (
devId STRING,
appId INT ,
paName STRING,
appVersion STRING,
appVercode STRING,
sdkVersion STRING,
sdkVerCode STRING,
phoneVersion STRING,
mac STRING,
source STRING,
content STRING,
logDate Date,
ip STRING
)
partitioned by (createTime Date)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ':'
LOCATION '${INPUT}';

添加步骤创建表:

Aws云服务EMR使用

hive的操作

# 创建分区:

  • location 指定 存储文件的具体位置 按日期存储的压缩包文件
  • 分区一个目录对应一条分区表
alter table abc.abc_user_i add partition (createTime='2017-10-20') location 's3://abc-server/abc-emr/InputDate/2017-10-20/';
alter table abc.abc_user_i add partition (createTime='2017-10-20') location 's3://abc-server/abc-emr/InputDate/2017-10-21/';
alter table abc.abc_user_i add partition (createTime='2017-10-20') location 's3://abc-server/abc-emr/InputDate/2017-10-22/';

# 查询已经创建的分区:

show partitions abc.abc_user_i;
createtime=2017-10-20
createtime=2017-10-21
createtime=2017-10-22

# 根据分区 查询结果:

hive> select count(*),createTime from abc.abc_user_i where createTime='2017-10-01' group by createTime;
Query ID = hadoop_20171102062813_7cccccxxx-c311-411e-de30-1xxxxaaaaa4
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1508122225619_0272) ----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 15.65 s
----------------------------------------------------------------------------------------------
OK
5404869 2017-10-01
Time taken: 17.211 seconds, Fetched: 1 row(s)

# 删除分区(外部表只会删除索引,不会删除数据;内部表会删除索引和数据):

alter table adsdk.adsdk_useraction_i drop partition(createTime='2017-10-24');

Hive创建外部表以及分区参考:

http://blog.csdn.net/csfreebird/article/details/27874943