[Sqoop02] Common Sqoop Commands

Posted: 2024-03-31 12:02:12

All of Sqoop's operation scripts live in the $SQOOP_HOME/bin directory.


The sqoop script

1. Run sqoop with no arguments

[[email protected] bin]$ sqoop
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
Try 'sqoop help' for usage.

These warnings can be ignored.


2. Use "sqoop help" to see the usage of the sqoop script

[[email protected] bin]$ sqoop help
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/apps/sqoop-1.4.6-cdh5.7.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
18/01/06 13:44:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
usage: sqoop COMMAND [ARGS]


Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information


See 'sqoop help COMMAND' for information on a specific command.

These warnings can be ignored.


3. Use "sqoop help COMMAND" to see the detailed usage of a specific command
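
For example, to see the import tool's detailed options (any command name listed above works the same way):

sqoop help import

Besides its own options, every tool also prints the common arguments and the generic Hadoop options shown below.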


Common arguments:
   --connect <jdbc-uri>                         Specify JDBC connect
                                                string
   --connection-manager <class-name>            Specify connection manager
                                                class name
   --connection-param-file <properties-file>    Specify connection
                                                parameters file
   --driver <class-name>                        Manually specify JDBC
                                                driver class to use
   --hadoop-home <hdir>                         Override
                                                $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>                   Override
                                                $HADOOP_MAPRED_HOME_ARG
   --help                                       Print usage instructions
-P                                              Read password from console
   --password <password>                        Set authentication
                                                password
   --password-alias <password-alias>            Credential provider
                                                password alias
   --password-file <password-file>              Set authentication
                                                password file path
   --relaxed-isolation                          Use read-uncommitted
                                                isolation for imports
   --skip-dist-cache                            Skip copying jars to
                                                distributed cache
   --username <username>                        Set authentication
                                                username
   --verbose                                    Print more information
                                                while working


Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.


The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
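
A minimal sketch of where the generic options go in a Sqoop command: they must come right after the tool name, before any of Sqoop's own options (the queue name here is only an illustration):

sqoop import \
-D mapreduce.job.queuename=default \
--connect jdbc:mysql://localhost:3306/mysql \
--username root --password vincent \
--table user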


4. List all databases on the MySQL server

sqoop list-databases \
--connect jdbc:mysql://localhost:3306 \
--username root \
--password vincent

Running the command above fails (cause: the MySQL JDBC driver jar cannot be found):

java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
        at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:875)
        at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
        at org.apache.sqoop.manager.CatalogQueryManager.listDatabases(CatalogQueryManager.java:57)
        at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

Solution:

Copy the MySQL JDBC driver jar into the $SQOOP_HOME/lib directory.

If Hive's metastore database is MySQL and Hive itself works normally, the following command is enough:

cp  $HIVE_HOME/lib/mysql-connector-java-5.1.45-bin.jar  $SQOOP_HOME/lib/

Note: after copying, check that the owner and group of the jar are correct.
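
For example (assuming Sqoop runs as the hadoop user, which matches the /home/hadoop paths above):

chown hadoop:hadoop $SQOOP_HOME/lib/mysql-connector-java-5.1.45-bin.jar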


Run the command again:

sqoop list-databases \
--connect jdbc:mysql://localhost:3306 \
--username root \
--password vincent

The output looks like this:

18/01/06 14:13:01 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/01/06 14:13:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/06 14:13:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
mysql
vincent_hive

Running show databases; directly in MySQL gives the matching result:

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| vincent_hive       |
+--------------------+
3 rows in set (0.01 sec)


5. List all tables in the vincent_hive database in MySQL:

sqoop list-tables --connect jdbc:mysql://localhost:3306/vincent_hive --username root --password vincent


6. Import a MySQL table into HDFS

sqoop import \
--connect jdbc:mysql://localhost:3306/mysql \
--username root \
--password vincent \
--table user

The imported data lands in the "/user/${user who ran the command}/${imported table}" directory on HDFS.

The job also generates a "<table name>.java" file under $HADOOP_HOME; it holds the generated MapReduce code for the job.

Error (java-json.jar cannot be found):

Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
        at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:42)
        at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:742)
        at org.apache.sqoop.mapreduce.JobBase.putSqoopOptionsToConfiguration(JobBase.java:369)
        at org.apache.sqoop.mapreduce.JobBase.createJob(JobBase.java:355)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:249)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 15 more

Solution:

Download a java-json.jar and put it in the $SQOOP_HOME/lib directory.

Run the import again:

sqoop import \
--connect jdbc:mysql://localhost:3306/mysql \
--username root \
--password vincent \
--table user

The log output:

18/01/06 15:18:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/01/06 15:18:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/06 15:18:55 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/06 15:18:55 INFO tool.CodeGenTool: Beginning code generation
18/01/06 15:18:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1
18/01/06 15:18:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1
18/01/06 15:18:57 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/apps/hadoop
Note: /tmp/sqoop-hadoop/compile/fdf0c2f391ba20be1190c670d7b71bb5/user.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/06 15:19:05 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/fdf0c2f391ba20be1190c670d7b71bb5/user.jar
18/01/06 15:19:05 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/01/06 15:19:05 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/01/06 15:19:05 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/01/06 15:19:05 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/01/06 15:19:05 WARN manager.CatalogQueryManager: The table user contains a multi-column primary key. Sqoop will default to the column Host only for this job.
18/01/06 15:19:05 WARN manager.CatalogQueryManager: The table user contains a multi-column primary key. Sqoop will default to the column Host only for this job.
18/01/06 15:19:05 INFO mapreduce.ImportJobBase: Beginning import of user
18/01/06 15:19:07 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/01/06 15:19:10 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/01/06 15:19:11 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/01/06 15:19:20 INFO db.DBInputFormat: Using read commited transaction isolation
18/01/06 15:19:20 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`Host`), MAX(`Host`) FROM `user`
18/01/06 15:19:20 WARN db.TextSplitter: Generating splits for a textual index column.
18/01/06 15:19:20 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
18/01/06 15:19:20 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
18/01/06 15:19:20 INFO mapreduce.JobSubmitter: number of splits:5
18/01/06 15:19:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1515112781641_0004
18/01/06 15:19:24 INFO impl.YarnClientImpl: Submitted application application_1515112781641_0004

18/01/06 15:19:25 INFO mapreduce.Job: The url to track the job: http://hadoop001:8088/proxy/application_1515112781641_0004/
18/01/06 15:19:25 INFO mapreduce.Job: Running job: job_1515112781641_0004
18/01/06 15:20:07 INFO mapreduce.Job: Job job_1515112781641_0004 running in uber mode : false
18/01/06 15:20:07 INFO mapreduce.Job:  map 0% reduce 0%
18/01/06 15:22:44 INFO mapreduce.Job:  map 20% reduce 0%
18/01/06 15:22:50 INFO mapreduce.Job:  map 40% reduce 0%
18/01/06 15:22:58 INFO mapreduce.Job:  map 60% reduce 0%
18/01/06 15:22:59 INFO mapreduce.Job:  map 80% reduce 0%
18/01/06 15:23:01 INFO mapreduce.Job:  map 100% reduce 0%
18/01/06 15:23:02 INFO mapreduce.Job: Job job_1515112781641_0004 completed successfully
18/01/06 15:23:03 INFO mapreduce.Job: Counters: 31
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=690395
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=655
                HDFS: Number of bytes written=492
                HDFS: Number of read operations=20
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=10
        Job Counters
                Killed map tasks=1
                Launched map tasks=6
                Other local map tasks=6
                Total time spent by all maps in occupied slots (ms)=824619
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=824619
                Total vcore-seconds taken by all map tasks=824619
                Total megabyte-seconds taken by all map tasks=844409856
        Map-Reduce Framework
                Map input records=4
                Map output records=4
                Input split bytes=655
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=34641
                CPU time spent (ms)=18840
                Physical memory (bytes) snapshot=551452672
                Virtual memory (bytes) snapshot=7501139968
                Total committed heap usage (bytes)=239206400
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=492
18/01/06 15:23:03 INFO mapreduce.ImportJobBase: Transferred 492 bytes in 232.3861 seconds (2.1172 bytes/sec)
18/01/06 15:23:03 INFO mapreduce.ImportJobBase: Retrieved 4 records.

Note: "number of splits:5" means five map tasks ran, so the "/user/${user who ran the command}/${imported table}" directory ends up with five map-task output files plus a _SUCCESS marker file.
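
A quick way to check the result (a sketch, assuming the job ran as the hadoop user and imported the user table as above):

hadoop fs -ls /user/hadoop/user
hadoop fs -cat /user/hadoop/user/part-m-*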


7. Import a table from the vincent_hive database in MySQL into HDFS

sqoop import \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root --password vincent \
--table emp \
--mapreduce-job-name FromMySQL2HDFS \
--delete-target-dir \
--fields-terminated-by '\t' \
-m 1 --null-non-string 0 \
--columns "EMPNO,ENAME,JOB,SAL,COMM" \
--target-dir EMP_COLUMN_WHERE \
--where 'SAL>2000'

Notes:

--mapreduce-job-name FromMySQL2HDFS  sets the MapReduce job name to FromMySQL2HDFS

--delete-target-dir  if the job has been run before, this clears the previous output directory (and the MR code files generated under $HADOOP_HOME)

--fields-terminated-by '\t'  sets the field delimiter to '\t'

--lines-terminated-by '\n'  sets the record (line) delimiter to '\n'

-m 1  sets the number of map tasks to 1; with a single mapper you can import a table that has no primary key and no split column specified

--split-by "name"  uses the name column as the split column in place of the primary key (see the sketch after these notes)

--null-non-string 0  writes NULL values of non-string columns as 0

--columns "EMPNO,ENAME,JOB,SAL,COMM"  imports only the listed columns of the emp table

--where 'SAL>2000'  keeps only rows where SAL>2000

--target-dir EMP_COLUMN_WHERE  sets the HDFS directory where the output is saved
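
A sketch of the --split-by alternative mentioned above: instead of forcing a single mapper, point Sqoop at a column it can split on (the split column and target directory here are only examples):

sqoop import \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root --password vincent \
--table emp \
--delete-target-dir \
--split-by 'EMPNO' \
-m 2 \
--target-dir EMP_SPLIT_BY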


8. Import the result of a query into HDFS

sqoop import \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root --password vincent \
--mapreduce-job-name FromMySQL2HDFS \
--delete-target-dir \
--fields-terminated-by '\t' \
-m 1 --null-non-string 0 \
--target-dir EMP_COLUMN_QUERY \
--query 'select * from emp where empno>=7900 and $CONDITIONS'

Note on quoting: in the shell, '$HADOOP_HOME' (single quotes) and "\$HADOOP_HOME" (escaped inside double quotes) stand for the literal string $HADOOP_HOME, while "$HADOOP_HOME" in plain double quotes expands to the Hadoop home directory. The same applies to $CONDITIONS in --query: single quotes pass it through to Sqoop untouched, which is what the import needs; with double quotes it must be written as \$CONDITIONS.
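
For comparison, a sketch of the same import written with double quotes, where $CONDITIONS has to be escaped so the shell does not expand it:

sqoop import \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root --password vincent \
--mapreduce-job-name FromMySQL2HDFS \
--delete-target-dir \
--fields-terminated-by '\t' \
-m 1 --null-non-string 0 \
--target-dir EMP_COLUMN_QUERY \
--query "select * from emp where empno>=7900 and \$CONDITIONS"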


9. Write the command into a file and run it from there

vi emp.txt with the following format (each option's value goes on the line after the option name):

import
--connect
jdbc:mysql://localhost:3306/vincent
--username
root
--password
vincent
--mapreduce-job-name
FromMySQL2HDFS
-m
1
--null-non-string
0
--target-dir
EMP_COLUMN_QUERY
--query
"select * from emp where empno>=7900 and \$CONDITIONS"

sqoop --options-file emp.txt

This runs the command stored in the file.

10. Run a SQL statement and print the result on screen

sqoop eval \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root --password vincent \
--query "select * from emp"

11. Export data from HDFS to MySQL (a table with a matching structure must already exist in MySQL; one way to create it is sketched after the notes below)

sqoop export \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root \
--password vincent \
--table emp_demo \
--export-dir /user/hadoop/emp \
--fields-terminated-by '\t'

--table emp_demo  the target MySQL table for the export

--export-dir /user/hadoop/emp  the HDFS directory holding the data to export
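
One way to prepare the emp_demo table is to clone the structure of emp in MySQL without copying any rows (a sketch, run in the MySQL client; it assumes the emp table exists in vincent_hive):

create table emp_demo like emp;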

12. Import MySQL data into Hive

sqoop import \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root --password vincent \
--table emp \
--create-hive-table \
--hive-database ruoze_hive \
--hive-import \
--hive-overwrite \
--hive-table emp_import \
--mapreduce-job-name FromMySQL2HDFS \
--delete-target-dir \
--fields-terminated-by '\t' \
-m 1 --null-non-string 0 

--create-hive-table  creates the Hive table and imports the data into it

However, letting this option create the table automatically is not recommended, because the inferred column types can be wrong.

It is better to create the target table in Hive by hand before importing (see the sketch after these notes).

--hive-database ruoze_hive  imports into the ruoze_hive database

--hive-import  required when importing into Hive; without it the data only lands on HDFS

--hive-overwrite  overwrites any existing data in the Hive table

--hive-table emp_import  the target Hive table is emp_import

--mapreduce-job-name  sets the job name
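
A sketch of creating the target table in Hive by hand first; only the emp columns named earlier in this post are listed, and the types are assumptions to be adjusted against the real MySQL table:

CREATE TABLE ruoze_hive.emp_import (
  empno INT,
  ename STRING,
  job STRING,
  sal DOUBLE,
  comm DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

With the table created this way, drop --create-hive-table from the import command above.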


Error:

18/01/06 21:13:51 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
18/01/06 21:13:51 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
        at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
        at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
        at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
        at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
        at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:195)
        at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
        ... 12 more

Solution:

Run the following commands in the $HIVE_HOME/lib directory:

cp hive-common-1.1.0-cdh5.7.0.jar $SQOOP_HOME/lib/

cp hive-shims*  $SQOOP_HOME/lib/

13. Export Hive data to MySQL

sqoop export \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root \
--password vincent \
--table emp_like \
--export-dir /user/hive/warehouse/ruoze.db/emp_import \
--fields-terminated-by '\t'

The emp_like table must be created in MySQL beforehand.

--export-dir  the HDFS directory holding the data files to export

--fields-terminated-by '\t'  if the field delimiter is not specified, the export may fail or load garbled rows

14. Sqoop jobs

sqoop job --create ruozejob -- import --connect jdbc:mysql://localhost:3306/vincent_hive \
--username root \
--password vincent \
--table emp \
--delete-target-dir 

Note:

there is a space between "--" and "import"

Related commands:

sqoop job --list  lists the saved Sqoop jobs

sqoop job --show ruozejob  shows the definition of ruozejob; enter the MySQL login password when prompted

sqoop job --exec ruozejob  runs ruozejob; enter the MySQL login password when prompted

sqoop job --delete ruozejob  deletes ruozejob

How can we avoid typing the password every time?
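
One common approach is a password file: keep the password in a file that only you can read and pass it with --password-file instead of --password (a sketch; the path is only an example). For saved jobs, another option is to set sqoop.metastore.client.record.password to true in sqoop-site.xml so the job stores the password itself.

# write the password without a trailing newline and lock down the file
echo -n "vincent" > /home/hadoop/.mysql.pwd
chmod 400 /home/hadoop/.mysql.pwd

sqoop job --create ruozejob -- import \
--connect jdbc:mysql://localhost:3306/vincent_hive \
--username root \
--password-file file:///home/hadoop/.mysql.pwd \
--table emp \
--delete-target-dir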



[From @Ruoze Big Data]