
HDFS - Useful commands


Configuration Files:
/etc/hadoop/conf/core-site.xml
/etc/hadoop/conf/hdfs-site.xml

Most common port for the NameNode web UI: 50070 (the Hadoop 2 default)
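For example, with nameNodeIp as a placeholder for your NameNode host, open http://nameNodeIp:50070/ in a browser.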

To view a properties file (opens read-only in vim):
> view core-site.xml

Useful property from core-site.xml:
fs.defaultFS -> hdfs://nameNodeIp:8020 #the default filesystem URI; 8020 is the usual NameNode RPC port
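For reference, a minimal sketch of how this property appears inside core-site.xml (nameNodeIp is a placeholder):

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://nameNodeIp:8020</value>
</property>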

Useful properties from hdfs-site.xml:
> dfs.blocksize -> <bytes value> #default in Hadoop 2 is 128 MB (134217728 bytes)
> dfs.replication -> <replica count> #default is 3
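A minimal sketch of how these properties would look inside hdfs-site.xml, using the usual Hadoop 2 defaults:

<property>
  <name>dfs.blocksize</name>
  <value>134217728</value> <!-- 128 MB -->
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>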

Useful Linux commands
> du -sh <file> #show the disk usage of a local file or directory in human-readable form
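For example (the path and size below are hypothetical):
> du -sh /var/log/hadoop #prints something like: 1.2G  /var/log/hadoop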

Show all hadoop command-line commands (running hadoop fs with no arguments prints the usage):
> hadoop fs

> hadoop fs -ls /user/<USERNAME> #show the user's HDFS home directory
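The listing columns are: permissions, replication factor (- for directories), owner, group, size in bytes, modification time, and path. A made-up sample:

drwxr-xr-x   - rj hdfs          0 2019-05-01 10:15 /user/rj/data
-rw-r--r--   3 rj hdfs  134217728 2019-05-01 10:20 /user/rj/file.txt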

> hadoop fs -ls <dirName> #list the files in the given folder along with their permissions

> hadoop fs -help <command> #show help for the given command

> hadoop fs -copyFromLocal <localDirPath> <targetDirPath>/. #copy from the local filesystem to HDFS; an alternative to the -put command.
#NOTE: the trailing /. is required so the source folder itself is copied into the target directory, not just the folder's contents.
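A hypothetical example (both paths are placeholders):
> hadoop fs -copyFromLocal /home/rj/mydata /user/rj/. #creates /user/rj/mydata in HDFS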

> hadoop fs -du -s -h <filePath> #show the aggregate size of a file or directory in HDFS in human-readable form
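Depending on the Hadoop version, the output shows either just the size, or the size followed by the total disk space consumed across all replicas, e.g. (made-up values):
128 M  384 M  /user/rj/file.txt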

> hadoop fs -tail <hdfs-file-path> #show the last kilobyte of a file in HDFS

The following commands are typically used by administrators:
> hadoop fsck / -files #file system check; lists every file in HDFS while checking

> hadoop fsck / -files -blocks #also lists the blocks that make up each file

> hadoop fsck / -files -blocks -locations #also lists the block locations, i.e. where the replicas live

> hadoop fsck / -files -blocks -locations -racks #also displays the network topology (racks) of the DataNode locations holding the replicas
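A healthy run ends with a summary along these lines (values shown are illustrative, not real output):

Total size:    ... B
Total blocks (validated):    ... (avg. block size ... B)
Corrupt blocks:    0
Status: HEALTHY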

> hadoop fsck / -delete #delete corrupt files in HDFS (a path argument is required)
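If deleting outright is too aggressive, fsck also supports -move, which relocates corrupt files to /lost+found instead:
> hadoop fsck / -move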

