HAWQ: creating a filespace, tablespace, database, and table

Using HAWQ

Using HAWQ is essentially the same as using Greenplum. For example:
1. Create a tablespace

# First, create a filespace by generating a configuration file

[gpadmin@master ~]$ hawq filespace -o hawqfilespace_config
Enter a name for this filespace
> hawqfs
Enter replica num for filespace. If 0, default replica num is used (default=3)
> 0
Please specify the DFS location for the filespace (for example: localhost:9000/fs)
location> master:8020/fs

# Then run the creation

[gpadmin@master ~]$ hawq filespace --config ./hawqfilespace_config
Reading Configuration file: './hawqfilespace_config'
CREATE FILESPACE hawqfs ON hdfs
('slave2:8020/fs/hawqfs')
20161121:11:26:25:122509 hawqfilespace:master:gpadmin-[INFO]:-Connecting to database
20161121:11:27:38:122509 hawqfilespace:master:gpadmin-[INFO]:-Filespace "hawqfs" successfully created

# Next, create the tablespace

[gpadmin@master ~]$ psql template1
psql (8.2.15)
Type "help" for help.

template1=#
template1=# CREATE TABLESPACE hawqts FILESPACE hawqfs;
CREATE TABLESPACE

2. Create a database

template1=# CREATE DATABASE testdb WITH TABLESPACE=hawqts; -- store the database in tablespace hawqts
CREATE DATABASE

3. Create a table in the new database

[gpadmin@master ~]$ psql testdb    # connect to the new database
testdb=# create TABLE books(
testdb(#   id integer
testdb(#   , isbn varchar(100)
testdb(#   , category varchar(100)
testdb(#   , publish_date TIMESTAMP
testdb(#   , publisher varchar(100)
testdb(#   , price money
testdb(# ) DISTRIBUTED BY (id);  -- distribution key: rows are spread across segments by hashing id
CREATE TABLE
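
`DISTRIBUTED BY (id)` makes HAWQ hash each row's `id` and use that hash to pick the segment that stores the row, so equal ids always land on the same segment. The Python sketch below only illustrates the idea: the segment count (4) and the hash (CRC32 of the key's text, modulo the segment count) are assumptions for illustration, not the hash function HAWQ actually uses.

```python
import zlib
from collections import Counter
from typing import Iterable

NUM_SEGMENTS = 4  # hypothetical segment count, for illustration only


def segment_for(key: int, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution-key value to a segment number.

    CRC32 of the key's decimal text modulo the segment count is an
    illustrative stand-in, not HAWQ's internal hash.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(ids: Iterable[int], num_segments: int = NUM_SEGMENTS) -> Counter:
    """Count how many rows each segment would receive."""
    return Counter(segment_for(i, num_segments) for i in ids)


if __name__ == "__main__":
    counts = distribute(range(1, 100_001))
    for seg in sorted(counts):
        print(f"segment {seg}: {counts[seg]} rows")
```

A high-cardinality key such as the unique `id` here spreads rows evenly; a low-cardinality column such as `category` would pile rows onto a few segments.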

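By default, tables are created in the public schema, which all users can access; you can also specify a schema when creating a table, e.g. testschema.books (the schema must already exist, created with CREATE SCHEMA testschema).
4. Load a data file into the table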

testdb=# COPY books(id,isbn,category,publish_date,publisher,price)
testdb-# FROM '/tmp/books'
testdb-# WITH
testdb-# DELIMITER AS '|'
testdb-# ;
COPY 15970428
Time: 41562.606 ms

That is a load rate of roughly 384,000 rows per second (15,970,428 rows in about 41.6 seconds), which is quite good.
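
`COPY ... WITH DELIMITER AS '|'` reads PostgreSQL's text format: one row per line, columns separated by the delimiter, `\N` for NULL, and a backslash in front of any backslash, newline, carriage return, or delimiter character that appears inside a value. The article does not show the contents of `/tmp/books`, so the Python sketch below, which formats rows in that layout, uses made-up sample rows; the function names are hypothetical.

```python
from typing import Iterable, Optional


def escape_copy_value(value: Optional[str], delimiter: str = "|") -> str:
    """Escape one column value for PostgreSQL's text COPY format."""
    if value is None:
        return "\\N"  # default NULL marker
    return (
        value.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace(delimiter, "\\" + delimiter)
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_copy_line(row: Iterable[Optional[str]], delimiter: str = "|") -> str:
    """Join escaped column values into one COPY input line."""
    return delimiter.join(escape_copy_value(v, delimiter) for v in row) + "\n"


if __name__ == "__main__":
    # Hypothetical rows: id|isbn|category|publish_date|publisher|price
    rows = [
        ("1", "978-0000000001", "COMPUTERS", "2016-11-21 00:00:00", "Acme|Press", "19.99"),
        ("2", "978-0000000002", "POETRY", "2016-11-21 00:00:00", None, "5.99"),
    ]
    for row in rows:
        print(to_copy_line(row), end="")
```

Escaping the backslash before anything else matters: otherwise the backslashes added for the delimiter and newlines would themselves be doubled.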
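5. Query the table

As a database aimed mainly at data warehousing, HAWQ has rich SQL support: on top of standard SQL, it also supports OLAP features such as window functions.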

testdb=# SELECT COUNT(*) FROM books;
  count
----------
 15970428
(1 row)

Time: 4750.786 ms

-- Highest and lowest price in each category

testdb=# SELECT category, max(price) max_price, min(price) min_price
testdb-# FROM books
testdb-# group by category
testdb-# LIMIT 5;
    category    | max_price | min_price
----------------+-----------+-----------
 COMPUTERS      |   $199.99 |     $5.99
 SELF-HELP      |   $199.99 |     $5.99
 COOKING        |   $199.99 |     $5.99
 SOCIAL-SCIENCE |   $199.99 |     $5.99
 SCIENCE        |   $199.99 |     $5.99
(5 rows)

Time: 4755.163 ms
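
On a small sample, the `GROUP BY category` query above can be cross-checked in plain Python: one pass over the rows keeps a running maximum and minimum price per category. This is only an illustration with made-up rows, and prices are held as `Decimal` rather than the `money` type.

```python
from decimal import Decimal
from typing import Dict, Iterable, Tuple

Row = Tuple[int, str, Decimal]  # (id, category, price)


def price_range_by_category(rows: Iterable[Row]) -> Dict[str, Tuple[Decimal, Decimal]]:
    """Return {category: (max_price, min_price)}, like GROUP BY category."""
    result: Dict[str, Tuple[Decimal, Decimal]] = {}
    for _id, category, price in rows:
        if category not in result:
            result[category] = (price, price)
        else:
            hi, lo = result[category]
            result[category] = (max(hi, price), min(lo, price))
    return result


if __name__ == "__main__":
    sample = [
        (1, "COMPUTERS", Decimal("199.99")),
        (2, "COMPUTERS", Decimal("5.99")),
        (3, "POETRY", Decimal("42.00")),
    ]
    for cat, (hi, lo) in sorted(price_range_by_category(sample).items()):
        print(f"{cat:<10} max={hi} min={lo}")
```

Note that `LIMIT 5` without an `ORDER BY` returns an arbitrary five categories, so which ones appear can vary between runs.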

-- Highest and lowest price in each category, with the corresponding book id

testdb=# SELECT category
testdb-#  , max(case when desc_rn = 1 then id end) max_price_id, max(case when desc_rn = 1 then price end) max_price
testdb-#  , max(case when asc_rn = 1 then id end) min_price_id, max(case when asc_rn = 1 then price end) min_price
testdb-# FROM (
testdb(#  SELECT id, category, price
testdb(#   , row_number() over(PARTITION BY category ORDER BY price desc) desc_rn
testdb(#   , row_number() over(PARTITION BY category ORDER BY price asc) asc_rn
testdb(#  FROM books
testdb(# ) t
testdb-# WHERE desc_rn = 1 or asc_rn = 1
testdb-# GROUP BY category
testdb-# limit 5;
    category    | max_price_id | max_price | min_price_id | min_price
----------------+--------------+-----------+--------------+-----------
 CRAFTS-HOBBIES |        86389 |   $199.99 |      7731780 |     $5.99
 GAMES          |      5747114 |   $199.99 |     10972216 |     $5.99
 STUDY-AIDS     |      2303276 |   $199.99 |     13723321 |     $5.99
 ARCHITECTURE   |      9294400 |   $199.99 |      7357451 |     $5.99
 POETRY         |      7501765 |   $199.99 |       554714 |     $5.99
(5 rows)

Time: 23522.772 ms
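
The inner query numbers the rows of each category twice with `row_number()`, once by price descending and once ascending, so the row numbered 1 is the most expensive (or cheapest) book; the outer `max(case ...)` then folds those rows into one output row per category. The pure-Python sketch below mimics that logic on made-up rows. `row_number()` breaks price ties arbitrarily; this sketch uses a stable sort, so a tie goes to the row seen first, which is a property of the sketch only.

```python
from decimal import Decimal
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Tuple

Row = Tuple[int, str, Decimal]  # (id, category, price); ids assumed unique


def row_number(rows: List[Row], descending: bool) -> Dict[int, int]:
    """Emulate row_number() OVER (PARTITION BY category ORDER BY price).

    Returns {book id: row number within its category}.
    """
    numbers: Dict[int, int] = {}
    by_category = sorted(rows, key=itemgetter(1))  # groupby needs sorted input
    for _cat, group in groupby(by_category, key=itemgetter(1)):
        ordered = sorted(group, key=itemgetter(2), reverse=descending)
        for rn, (book_id, _c, _p) in enumerate(ordered, start=1):
            numbers[book_id] = rn
    return numbers


def extremes_with_ids(rows: List[Row]) -> Dict[str, tuple]:
    """Return {category: (max_price_id, max_price, min_price_id, min_price)}."""
    desc_rn = row_number(rows, descending=True)
    asc_rn = row_number(rows, descending=False)
    out: Dict[str, list] = {}
    for book_id, cat, price in rows:
        entry = out.setdefault(cat, [None, None, None, None])
        if desc_rn[book_id] == 1:  # like max(case when desc_rn = 1 ...)
            entry[0], entry[1] = book_id, price
        if asc_rn[book_id] == 1:  # like max(case when asc_rn = 1 ...)
            entry[2], entry[3] = book_id, price
    return {cat: tuple(v) for cat, v in out.items()}


if __name__ == "__main__":
    sample = [
        (1, "GAMES", Decimal("19.99")),
        (2, "GAMES", Decimal("199.99")),
        (3, "GAMES", Decimal("5.99")),
        (4, "POETRY", Decimal("42.00")),
    ]
    for cat, values in sorted(extremes_with_ids(sample).items()):
        print(cat, values)
```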

6. Querying Hive data with HAWQ

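HAWQ is a standalone database system built on HDFS. To access data managed by other systems, you also need to install the HAWQ Extension Framework (PXF) plugin. PXF supports Hive and HBase data on HDFS, and also lets users develop custom connectors for other parallel data sources.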

7. Final thoughts

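As a system derived from Greenplum, HAWQ is quite feature-rich: besides the query capabilities shown above, it supports stored procedures written in PL/Java, PL/Perl, PL/pgSQL, PL/Python, PL/R, and similar languages. Personally, though, I think its biggest drawback is that it is a standalone database. On today's Hadoop platform, with its wide variety of components, it cannot share data with those components, so you cannot apply different processing tools to the same data for different scenarios. That is a real pity.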

8. After logging in, set a password for the default user "postgres"

postgres=# \password postgres
Enter new password:
Enter it again:
postgres=#
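
`\password` prompts for the new password twice and sends it to the server already encrypted, so the plaintext does not appear in the command history or the server log. For the MD5 scheme of PostgreSQL 8.2 (the psql version shown earlier), the encrypted value is the string `md5` followed by the hex MD5 digest of the password concatenated with the user name. The Python sketch below reproduces that computation; the function name and the password are hypothetical.

```python
import hashlib


def pg_md5_password(password: str, username: str) -> str:
    """Return the MD5-encrypted password string in PostgreSQL's format.

    Format: "md5" + hex(md5(password + username)).
    """
    digest = hashlib.md5((password + username).encode("utf-8")).hexdigest()
    return "md5" + digest


if __name__ == "__main__":
    # Hypothetical password, for illustration only.
    print(pg_md5_password("example-password", "postgres"))
```

Because the user name acts as a salt, the same password produces different stored values for different users.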


秒客网
