Import MySQL database to Hive
To import into Hive, the following options are added on top of the basic sqoop import command:
>
--username <USERNAME> --password <PASSWORD> \
--hive-import \
--hive-table <table-name> \
--hive-database <db-name>
- >hive : To enter into Hive.
- --hive-import : To enable Hive import in the sqoop import command.
- --hive-table : To specify the Hive table into which the import will be done.
- --hive-database : To specify the Hive database. You can also use --hive-table to specify the database by passing <DBName>.<TableName>.
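For example, once an import finishes you can open the Hive shell and check the table; the database and table names below are just placeholders:
>
hive
hive> use <db-name>;
hive> show tables;
hive> select * from <table-name> limit 10;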
In each Hive import, the following command will be common:
>
sqoop-import \
--connect jdbc:mysql://<mysql-URL>:<port>/<DBName> \
--username <USERNAME> --password <PASSWORD> \
--table <table-name> \
--hive-import \
--hive-table <table-name> \
--hive-database <db-name>
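As a minimal concrete sketch, assuming a MySQL database named retail_db with a table customers on host mysql-host (all names and credentials here are hypothetical):
>
# import the MySQL table customers into the Hive database retail
sqoop-import \
--connect jdbc:mysql://mysql-host:3306/retail_db \
--username sqoop_user --password sqoop_pass \
--table customers \
--hive-import \
--hive-database retail \
--hive-table customers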
- Internally, hive-import runs in the following steps:
1. It imports the table into HDFS.
2. It copies the imported table into Hive.
3. It deletes the directory from HDFS.
Note: If this operation fails in between, the directory in HDFS will not be deleted.
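If a failed import leaves that intermediate directory behind, it can be removed by hand before retrying. The path below is only the typical default (the importing user's HDFS home directory plus the table name); your layout may differ:
>
# remove the leftover intermediate directory from a failed hive-import
hadoop fs -rm -r /user/<username>/<table-name>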
- The default field delimiter for Hive is ^A (\001).
- If you fire the same --hive-import command twice, it will add new files in Hive, appending _copy to the file names. To overcome this, we can use --hive-overwrite (see the sketch after this list).
- --map-column-hive : This option is used to map a SQL type to a Hive type (see the sketch after this list).
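A sketch combining both options; the table and column names are hypothetical, and --map-column-hive takes a comma-separated list of <column>=<hive-type> pairs:
>
# re-import, overwriting existing Hive data and overriding two column types
sqoop-import \
--connect jdbc:mysql://<mysql-URL>:<port>/<DBName> \
--username <USERNAME> --password <PASSWORD> \
--table orders \
--hive-import \
--hive-overwrite \
--hive-database <db-name> \
--hive-table orders \
--map-column-hive order_date=STRING,amount=DOUBLE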
Sqoop Export to database from HDFS
- Sqoop export is for exporting data stored in HDFS to a traditional RDBMS.
>
sqoop-export \
--connect jdbc:mysql://<mysql-URL>:<port>/<DBName> \
--username <USERNAME> --password <PASSWORD> \
--export-dir <dir-of-HDFS> \
--table <target-table-name> \
--input-fields-terminated-by "<delimiter>" \
[--num-mappers <no-of-mappers>] \
[--columns <col1,col2,col3>] \
[--update-key <sql-column-name>] \
[--update-mode <mode>] \
[--staging-table <stage-table-name>] \
[--clear-staging-table]
Where:
- --export-dir : the HDFS directory from which we have to copy the data.
- --table : the target table to which we need to export our data.
- --input-fields-terminated-by : the delimiter of the input fields in the HDFS data. For Hive data it is "\001".
- --num-mappers : the number of mappers.
- --columns : we can export data to a table by specifying the columns. This helps when the order of the fields in the source data differs from the table, or when the number of fields in the source data differs from the target table's columns. The column order follows the source data, but the names in the --columns argument should match the table's column names.
- --update-key : it will update only those records in the target table whose key matches a record in the exported data; it will not insert records whose keys do not already exist.
- --update-mode : updateonly (default) or allowinsert. allowinsert will update existing records and insert new records, i.e. it merges the data (see the sketch after this list).
- --staging-table : for preventing partially exported data from reaching the target table if an error occurs. The data is first exported from HDFS into the staging table, and only if the operation succeeds is it copied into the target table.
- --clear-staging-table : the staging table must be empty for the export, so this option clears the staging table before the export operation.
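Putting this together, a sketch of an upsert-style export of a Hive table's warehouse directory; the connection details, paths, table, and column names are hypothetical, and the update options assume the target table has a primary key column id:
>
# export the Hive-backed directory into MySQL, updating existing rows and inserting new ones
sqoop-export \
--connect jdbc:mysql://mysql-host:3306/retail_db \
--username sqoop_user --password sqoop_pass \
--export-dir /user/hive/warehouse/retail.db/customers \
--table customers \
--input-fields-terminated-by "\001" \
--num-mappers 4 \
--columns id,name,city \
--update-key id \
--update-mode allowinsert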