Solution: Hadoop is an open-source platform providing highly reliable, scalable, distributed processing of large data sets using simple programming models. Hadoop provides a reliable shared storage and analysis system: storage is provided by HDFS (Hadoop Distributed File System) and analysis by MapReduce.
HDFS breaks a file into chunks (blocks) and distributes them across the nodes of the cluster. It stores large amounts of data on local disks across a distributed cluster of computers. Hadoop is written in Java. Hadoop works with tools such as MapReduce, HBase, and Hive.
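The chunking and distribution described above can be sketched in plain Python. This is an illustrative toy, not real HDFS: the block size, node names, and replication factor are made-up demo values (HDFS actually defaults to 128 MB blocks and 3 replicas).

```python
# Toy sketch of HDFS-style block splitting and placement (not real HDFS).
# A file is cut into fixed-size blocks; each block is replicated on
# several distinct nodes of the cluster.

BLOCK_SIZE = 16          # demo value; real HDFS defaults to 128 MB
REPLICATION = 2          # demo value; HDFS defaults to 3 replicas
NODES = ["node1", "node2", "node3"]   # hypothetical cluster nodes

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Break a byte string into fixed-size chunks (the last may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes=NODES, replication=REPLICATION):
    """Map each block index to the distinct nodes holding its replicas."""
    placement = {}
    for i in range(len(blocks)):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

data = b"Hadoop stores large files as distributed replicated blocks."
blocks = split_into_blocks(data)
placement = place_blocks(blocks)
for i, block in enumerate(blocks):
    print(i, placement[i], block)
```

Losing one node leaves every block still readable from a replica, which is how HDFS gets its reliability from unreliable commodity hardware.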
MapReduce → A distributed data processing model and execution environment that runs on large clusters of commodity machines.
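The processing model can be sketched without Hadoop at all. The following is a minimal single-machine sketch of the MapReduce idea in plain Python: a map phase emits (key, value) pairs, a shuffle step groups values by key, and a reduce phase aggregates each group. The word-count task and function names are illustrative, not Hadoop's API.

```python
# Minimal sketch of the MapReduce model (word count), not the Hadoop API.
from collections import defaultdict

def map_phase(record: str):
    """Mapper: emit a (word, 1) pair for every word in an input line."""
    for word in record.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return (key, sum(values))

lines = ["hadoop stores data", "hadoop processes data"]
mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)   # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In real Hadoop the mappers and reducers run in parallel on many machines, and the framework handles the shuffle, scheduling, and failure recovery.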
HBase → It is a distributed, column-oriented database. HBase uses HDFS for its underlying storage and supports both batch-style computations using MapReduce and point queries. Hive → It is a distributed data warehouse. Hive manages data stored in HDFS and provides a query language based on SQL (HiveQL).