Sqoop - Import MySQL Database to/from Hive

Import MySQL database to Hive
  • >hive
    Opens the Hive shell.
  • --hive-import
    Enables Hive import in the sqoop import command.
  • --hive-table
    Specifies the Hive table into which the data is imported.
  • --hive-database
    Specifies the Hive database. The database can also be given through --hive-table by passing <DBName>.<TableName>.
The following command is common to every Hive import.
> sqoop-import --connect jdbc:mysql://<mysql-URL>:<port>/<DBName> \
--username <USERNAME> --password <PASSWORD> \
--table <table-name> \
--hive-import \
--hive-table <table-name> \
--hive-database <db-name>

  • Internally, --hive-import runs the following steps:
1. Import the table into a directory in HDFS.
2. Load the imported data into the Hive table.
3. Delete the directory from HDFS.
Note: If the operation fails partway through, the directory is left behind in HDFS and has to be removed before the import is retried.
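
The steps above can be approximated by hand, which makes it easier to see what --hive-import does. This is only a rough sketch, not what Sqoop runs internally: the host db.example.com, the MySQL database shop, the table orders, the Hive table sales.orders (assumed to exist already) and the HDFS path are all hypothetical, and the run helper with its DRY_RUN switch is an illustration aid that prints each command instead of executing it.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command (without shell quoting)
# unless DRY_RUN=0 is set, so the sketch can be tried without a cluster.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# All values below are placeholders, not taken from a real setup.
DB_USER="${DB_USER:-sqoop_user}"
DB_PASSWORD="${DB_PASSWORD:-changeme}"
STAGING_DIR="/tmp/sqoop_staging/orders"

manual_hive_import() {
  # Step 1: import the MySQL table into a directory in HDFS,
  # using Hive's default ^A (\001) field delimiter.
  run sqoop-import \
    --connect "jdbc:mysql://db.example.com:3306/shop" \
    --username "$DB_USER" --password "$DB_PASSWORD" \
    --table orders \
    --target-dir "$STAGING_DIR" \
    --fields-terminated-by '\001'

  # Steps 2 and 3: load the files into the existing Hive table.
  # LOAD DATA INPATH moves the files out of the staging directory.
  run hive -e "LOAD DATA INPATH '$STAGING_DIR' INTO TABLE sales.orders"
}

manual_hive_import
```

If the first command succeeds and the second one fails, the staging directory still holds the imported files, which matches the note above.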


  • The default field delimiter for Hive is ^A (\001).
  • If you run the same --hive-import command twice, it adds new files to the Hive table with _copy appended to the file names, so the data is duplicated. To replace the existing data instead, use --hive-overwrite.
  • --map-column-hive: This option overrides the default mapping from SQL type to Hive type for the given columns.
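
As an illustration of the last two options, the common import command could be extended as follows. This is a sketch under assumptions: the host, the database shop, the table orders, the Hive database sales and the columns price and created_at are made up, and the run helper is a hypothetical dry-run wrapper that prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Placeholder credentials and connection string (assumptions).
DB_USER="${DB_USER:-sqoop_user}"
DB_PASSWORD="${DB_PASSWORD:-changeme}"
MYSQL_URL="jdbc:mysql://db.example.com:3306/shop"

hive_import_orders() {
  # --hive-overwrite replaces the table contents, so running the import
  # twice does not leave _copy files behind.
  # --map-column-hive sets the Hive type for the listed columns only.
  run sqoop-import \
    --connect "$MYSQL_URL" \
    --username "$DB_USER" --password "$DB_PASSWORD" \
    --table orders \
    --hive-import \
    --hive-overwrite \
    --hive-database sales \
    --hive-table orders \
    --map-column-hive price=DECIMAL,created_at=STRING
}

hive_import_orders
```

Columns that are not listed in --map-column-hive keep the type Sqoop picks by default.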

Sqoop Export to database from HDFS
  • Sqoop export writes data stored in HDFS to a traditional RDBMS.

> sqoop-export --connect jdbc:mysql://<mysql-URL>:<port>/<DBName> \
--username <USERNAME> --password <PASSWORD> \
--export-dir <dir-of-HDFS> \
--table <target-table-name> \
--input-fields-terminated-by "<delimiter>" \
[--num-mappers <no-of-mapper>] \
[--columns <col1,col2,col3>] \
[--update-key <sql-column-name>] \
[--update-mode <mode>] \
[--staging-table <stage-table-name>] \
[--clear-staging-table]

Where
  • --export-dir: HDFS directory from which the data is read.
  • --table: Target table into which the data is exported.
  • --input-fields-terminated-by: Delimiter of the input fields in the HDFS data. For Hive this is "\001".
  • --num-mappers: Number of mappers.
  • --columns: Exports data into the listed columns only. This helps when the field order in the source data differs from the column order of the target table, or when the source data has a different number of fields than the table has columns. Use the target table's column names, listed in the order in which the fields appear in the source data.
  • --update-key: Updates only those rows in the target table whose key matches a record in the exported data. Records whose key does not exist in the table are not inserted.
  • --update-mode: updateonly (default) or allowinsert. allowinsert updates existing records and inserts new ones, i.e. it merges the data.
  • --staging-table: Prevents a failed export from leaving partially exported data in the target table. Sqoop first exports the data from HDFS to the staging table and, only if that succeeds, moves it to the target table. It cannot be combined with --update-key.
  • --clear-staging-table: The staging table must be empty before an export. This option clears the staging table before the export runs.
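
The two typical combinations of these options can be sketched as below: an upsert with --update-key and --update-mode allowinsert, and an insert-only export through a staging table. Everything concrete here is an assumption for illustration: the host, the database shop, the tables orders and orders_stage, the column names, the warehouse path, and the run helper, which prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Placeholder values (assumptions, adjust to your environment).
DB_USER="${DB_USER:-sqoop_user}"
DB_PASSWORD="${DB_PASSWORD:-changeme}"
MYSQL_URL="jdbc:mysql://db.example.com:3306/shop"
EXPORT_DIR="/user/hive/warehouse/sales.db/orders"

# Upsert: rows whose id already exists are updated, the rest are inserted.
export_upsert() {
  run sqoop-export \
    --connect "$MYSQL_URL" \
    --username "$DB_USER" --password "$DB_PASSWORD" \
    --export-dir "$EXPORT_DIR" \
    --table orders \
    --input-fields-terminated-by '\001' \
    --columns "id,customer_id,price" \
    --update-key id --update-mode allowinsert
}

# Insert-only export that goes through a staging table first.
export_staged() {
  run sqoop-export \
    --connect "$MYSQL_URL" \
    --username "$DB_USER" --password "$DB_PASSWORD" \
    --export-dir "$EXPORT_DIR" \
    --table orders \
    --input-fields-terminated-by '\001' \
    --staging-table orders_stage --clear-staging-table
}

export_upsert
export_staged
```

The staging table (orders_stage here) has to exist already with the same structure as the target table. The two variants are kept separate because the staging options are not used together with --update-key.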

