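Hive Knowledge Points

Preface: the first mentor who trained me used to say that there is nothing SQL cannot do. Now that Flink SQL has taken off, I think he had a point, so let's review Hive SQL. After all, I am the little prince of Hive, hahahaha.

Latest Hive learning material: http://www.cnblogs.com/qingyunzong/p/8707885.html

Complete list of Hive configuration properties: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties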

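I. Hive execution commands

 hive -S : start Hive in silent mode, which prints only the query results and not the execution progress;

 hive -e 'show tables' : run a Hive statement directly from the operating system shell, without entering Hive's interactive mode;

 source /root/my.sql; : inside the Hive shell, use the source command to execute a .sql file;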

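t1: create an ordinary table;

t2: create a table at a specified directory in HDFS;

t3: create a table whose column delimiter is ",";

t4: create a table populated with data from a query;

t5: create a table populated with data from a query, with columns delimited by ",";

Source: https://blog.csdn.net/qq_40784783/article/details/79168896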
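The five cases t1 to t5 above can be sketched in HiveQL roughly as follows. This is a minimal sketch written for these notes, not the statements from the cited source: the column names (id, name) and the HDFS path are assumptions chosen only for illustration.

```sql
-- t1: ordinary table in the default warehouse directory
CREATE TABLE t1 (id INT, name STRING);

-- t2: table stored at an explicit HDFS directory (the path is an assumption)
CREATE TABLE t2 (id INT, name STRING)
LOCATION '/user/hive/demo/t2';

-- t3: table whose columns are separated by a comma
CREATE TABLE t3 (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- t4: table created and filled from a query (CTAS)
CREATE TABLE t4 AS
SELECT id, name FROM t1;

-- t5: CTAS that also uses a comma as the column delimiter
CREATE TABLE t5
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
AS
SELECT id, name FROM t1;
```

In the CTAS forms (t4, t5) the column names and types come from the query, so no column list is written.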

desc formatted table_name; shows the detailed structure of a table

describe database database_name; shows the storage path of a database

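II. Hive's .hiverc file

Create a .hiverc file in the ${HIVE_HOME}/bin directory and add:

 set hive.cli.print.header=true;

to show column headers in query output. To show the current database in the prompt:

 set hive.cli.print.current.db=true;

To let Hive automatically run suitable statements in local mode:

 set hive.exec.mode.local.auto=true;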

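III. Custom UDF package

To meet a requirement for a non-equi join in Hive, I had to write a UDF by hand. I had never tried this before, so today I practiced by writing ToLowerCase.java.

I did not use Eclipse; vim plus javac was enough.

Under /home/dwdev/cajeep, create the directory com/alibaba/hive/udf:

 mkdir -p com/alibaba/hive/udf

Create the Java file:

 vim com/alibaba/hive/udf/ToLowerCase.java

The Java source is as follows: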

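 package com.alibaba.hive.udf;

 import org.apache.hadoop.hive.ql.exec.UDF;

 public class ToLowerCase extends UDF {
     // evaluate must be public; it can be overloaded for other argument types
     public String evaluate(String field) {
         // NULL column values arrive as null, so return null instead of throwing
         if (field == null) {
             return null;
         }
         return field.toLowerCase();
     }
 }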

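Still in the same directory (/home/dwdev/cajeep), compile with javac:

 javac -classpath "/usr/local/hadoop-2.7.5/share/hadoop/common/lib/*:/usr/local/hive/lib/hive-exec-2.3.2.jar" com/alibaba/hive/udf/ToLowerCase.java

Package the resulting class file into a jar (the jar name ToLowerCase.jar is only an example):

 jar -cvf ToLowerCase.jar com/alibaba/hive/udf/ToLowerCase.class

Enter Hive, add the jar, and then create a temporary function:

 add jar /home/dwdev/cajeep/ToLowerCase.jar;

 create temporary function tolowercase as 'com.alibaba.hive.udf.ToLowerCase';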

 hive (default)> select tolowercase('HELLO');

 OK

 _c0

 hello
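A few extra checks may be worth running once the temporary function exists. This is only a sketch: it assumes the function was registered under the name tolowercase as above, and it compares the result with Hive's built-in lower function.

```sql
-- The UDF should agree with the built-in lower() on ordinary strings
SELECT tolowercase('Hive SQL'), lower('Hive SQL');

-- NULL input is worth testing, because evaluate() then receives a Java null
SELECT tolowercase(CAST(NULL AS STRING));

-- Remove the temporary function when it is no longer needed
DROP TEMPORARY FUNCTION IF EXISTS tolowercase;
```

A temporary function only lives for the current session, so the add jar and create temporary function steps have to be repeated in each new session (or placed in .hiverc).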

IV. Using Hive RegexSerDe

Importing data in complex formats into Hive
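As a minimal sketch of what a RegexSerDe table can look like, the example below parses lines that a plain delimiter cannot describe well. The table name regex_demo, the column names, and the input line format are all assumptions made up for illustration. All columns are declared STRING to stay on the safe side.

```sql
-- Hypothetical input lines look like:  100 -> val_100
-- The regex has one capture group per column, in column order
CREATE TABLE regex_demo (
  key   STRING,
  value STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([0-9]+) -> (.*)"
)
STORED AS TEXTFILE;
```

The pattern here deliberately avoids backslashes. In a real pattern, each backslash in the regex has to be doubled inside the HiveQL string. Lines that do not match the pattern come back with NULL in every column, which is a quick way to spot a wrong regex.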

V. Copying a Hive table's structure

 CREATE TABLE b LIKE a;
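CREATE TABLE ... LIKE copies only the table definition, so the new table starts out empty. If the rows are wanted as well, they can be copied in a second step. The sketch below assumes that a is an unpartitioned table; a partitioned table would need a partition-aware insert.

```sql
-- Copy only the definition of a; skip quietly if b already exists
CREATE TABLE IF NOT EXISTS b LIKE a;

-- Copy the rows as a separate step
INSERT INTO TABLE b SELECT * FROM a;
```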

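VI. Importing text files into Hive

Sometimes a text file needs to be imported into Hive. There are generally two cases, recorded here:

Importing a local file into a Hive table

First create the Hive table:

 create database if not exists test;

 create table test.test(key int, value string) row format delimited fields terminated by ',' stored as textfile;

Then prepare a text file, 1.txt: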

100,val_100

298,val_298

9,val_9

341,val_341

498,val_498

146,val_146

458,val_458

362,val_362

186,val_186

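Load it into the Hive table:

 LOAD DATA LOCAL INPATH '/opt/datas/1.txt' OVERWRITE INTO TABLE test.test;

Finally, check the result with select * from test.test;

Importing a file on HDFS into a Hive table

Without the LOCAL keyword the path refers to HDFS, and the file is moved into the table's directory rather than copied:

 LOAD DATA INPATH '/home/hadoop/testhive/1.txt' OVERWRITE INTO TABLE test.test;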
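After either kind of load, a couple of queries help confirm what ended up in the table. The sketch below reuses the table and the local file path from this section; the filter value 300 is arbitrary.

```sql
-- Check how many rows were loaded and look at a subset
SELECT count(*) FROM test.test;
SELECT * FROM test.test WHERE key > 300;

-- Without OVERWRITE the file is appended to the existing data instead of replacing it
LOAD DATA LOCAL INPATH '/opt/datas/1.txt' INTO TABLE test.test;
```

Running the last statement after the earlier OVERWRITE load leaves every row of 1.txt in the table twice, which makes the difference between the two forms easy to see.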

VII. Joining a small table with a large table

To be added.

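VIII. Installing Hive

Installation tutorial for Mac

During Hive installation, this error appeared: Missing Hive Execution Jar: /root/hive/bin/lib/hive-exec-*.jar

In a Hive installation, bin and lib are sibling directories, so a path containing bin/lib means the Hive path configured on the CentOS system is wrong: HIVE_HOME points at the bin directory instead of the installation root. Open /etc/profile and correct the Hive settings:

 export HIVE_HOME=/root/hive

 export PATH=$PATH:$HIVE_HOME/bin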

IX. Formatting time

1. Convert a time string to a Unix timestamp

 select unix_timestamp('2018-03-05 17:22:57.784','yyyy-MM-dd HH:mm:ss.SSS');

2. Convert a Unix timestamp to a formatted time string

 select from_unixtime(1520241777,'yyyyMMddHHmm');
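The two functions are often chained to turn a time string in one format into another. A minimal sketch, using the same sample values as above:

```sql
-- Reformat a time string: parse it to seconds, then format with a new pattern
SELECT from_unixtime(
         unix_timestamp('2018-03-05 17:22:57', 'yyyy-MM-dd HH:mm:ss'),
         'yyyyMMddHHmm');

-- unix_timestamp returns whole seconds, so a millisecond part in the input is not preserved
SELECT unix_timestamp('2018-03-05 17:22:57.784', 'yyyy-MM-dd HH:mm:ss.SSS');

-- With no format argument, from_unixtime uses the pattern yyyy-MM-dd HH:mm:ss
SELECT from_unixtime(1520241777);
```

Both functions interpret times in the session's time zone, so the string printed for a fixed timestamp such as 1520241777 depends on where the query runs.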

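X. AND and OR in Hive

In Hive, AND has higher precedence than OR. Here are the test statements:

 select 1 from student where 1=0 or 1=1 and 1=0;

The result is empty.

 select 1 from student where 1=0 or 1=1 and 1=1;

The result is 1.

The WHERE clause of the second statement is true no matter how AND and OR are grouped. The first statement is not evaluated strictly left to right; it is equivalent to select 1 from student where 1=0 or (1=1 and 1=0);. Note, however, that grouping it left to right as (1=0 or 1=1) and 1=0 would also be false, so these two results are consistent with the precedence rule but do not prove it by themselves.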
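A test that does separate the two readings needs an expression whose value changes with the grouping. The sketch below reuses the student table from the statements above and assumes it contains at least one row.

```sql
-- AND binds tighter: this is read as 1=1 OR (1=1 AND 1=0), which is true, so rows are returned
SELECT 1 FROM student WHERE 1=1 OR 1=1 AND 1=0;

-- Explicit left-to-right grouping for comparison: (true OR true) AND false is false, so no rows
SELECT 1 FROM student WHERE (1=1 OR 1=1) AND 1=0;
```

Because the first query returns rows while the explicitly grouped second one does not, the unparenthesized form cannot be evaluated left to right. When a condition mixes AND and OR, writing the parentheses explicitly avoids relying on the reader remembering the rule.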