Distributed storage cluster and Hadoop

Abhishek kumar
3 min read · Sep 17, 2020


This blog answers the following related questions.

1. What is big data?

2. What do industries have to do with big data?

3. How are organisations dealing with big data?

4. How do we manage it and use the related technology?

5. What is distributed storage, and how does it help reduce the big data problem?

6. What is Hadoop technology?

1. What is big data?

Big data is a problem that arises mainly in large industries and organisations. Social media platforms such as Facebook, Instagram and LinkedIn have millions of users, who upload billions of photos and videos every day. That much data cannot be stored on a single hard disk, and this is where the big data problem comes from.

Big data problems

2. What do industries have to do with big data?

Every social media platform and large industry has millions of clients, and the data of those clients is stored in the organisation's data center. The major problem is that no single appliance can store such a large number of images, videos and other data. This is the biggest storage problem.

  • Even if an industry could build an appliance large enough to store all of this data, a second problem would appear: velocity (I/O). The system becomes very slow, both when storing data (output) and when reading it back (input).
8 V’s of big data
  • The two major big data problems are volume (size) and velocity (I/O).
  • Volume is the issue when the data is larger than the storage capacity of the appliance.
  • Velocity is the issue because we always want to upload and retrieve data quickly, but at this scale a single appliance makes both very slow.
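The velocity problem can be made concrete with some rough arithmetic. The sketch below is only an illustration: the dataset size (10 TB), the per-disk throughput (100 MB/s) and the helper name `read_time_seconds` are assumptions chosen for the example, not figures from any real system, and the model ignores network and coordination overhead.

```python
# Back-of-the-envelope: ideal time to read a dataset from one disk
# versus many disks read in parallel.
# All figures are illustrative assumptions, not measurements.

def read_time_seconds(data_bytes: float, throughput_bytes_per_s: float, disks: int = 1) -> float:
    """Ideal read time when the data is spread evenly over `disks` and read in parallel."""
    return data_bytes / (throughput_bytes_per_s * disks)

TB = 10**12
MB = 10**6

data = 10 * TB          # assumed dataset size
throughput = 100 * MB   # assumed sequential throughput of a single disk

single = read_time_seconds(data, throughput)
parallel = read_time_seconds(data, throughput, disks=100)

print(f"1 disk:    {single / 3600:.1f} hours")    # 27.8 hours
print(f"100 disks: {parallel / 60:.1f} minutes")  # 16.7 minutes
```

Under these assumptions, spreading the same data over 100 disks cuts the read time from more than a day to a few minutes, which is why splitting data across machines helps with velocity as well as volume.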

3. How are organisations dealing with big data?

To deal with the big data problem, organisations build clusters using distributed storage technologies, which give them reliable data storage.

4. How do we manage it and use the related technology?

The main approach to storing big data is distributed storage, and the main technology for it is Hadoop, which is used to create an HDFS cluster that stores and manages the data.

5. What is distributed storage, and how does it help reduce the big data problem?

A simple example makes this easy to understand. Suppose I have to store data that is bigger than the storage available in my appliance or device. I can combine several hard disks, attached to different machines, into a distributed cluster.

  • With this method, we split the file into portions and store each portion on a different storage device.
  • The cluster uses a master-slave topology: one system or server is the master and is connected to several slaves, each of which contributes its storage to the master. The client contacts the master to store data.
  • The master is called the NameNode and the slaves are called DataNodes.
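The splitting and master-slave idea described above can be sketched as a toy model. This is not the Hadoop API: the `NameNode` and `DataNode` classes, the round-robin placement and the 4-byte block size are all assumptions made for illustration. In a real HDFS cluster the blocks are far larger and are replicated, and the NameNode keeps only metadata while the client transfers block data to the DataNodes; here the master does the copying itself to keep the example short.

```python
# Toy model of distributed storage. Illustrative only, not the Hadoop API.
from typing import Dict, List, Tuple


class DataNode:
    """A slave that contributes storage; holds blocks keyed by (filename, block index)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[Tuple[str, int], bytes] = {}


class NameNode:
    """The master: records which DataNode holds which block of each file."""

    def __init__(self, datanodes: List[DataNode], block_size: int) -> None:
        self.datanodes = datanodes
        self.block_size = block_size
        self.metadata: Dict[str, List[str]] = {}  # filename -> DataNode name per block

    def put(self, filename: str, data: bytes) -> None:
        """Split data into fixed-size blocks and spread them over the DataNodes."""
        placement: List[str] = []
        for index, start in enumerate(range(0, len(data), self.block_size)):
            # Round-robin placement is an assumption made for this sketch.
            node = self.datanodes[index % len(self.datanodes)]
            node.blocks[(filename, index)] = data[start:start + self.block_size]
            placement.append(node.name)
        self.metadata[filename] = placement

    def get(self, filename: str) -> bytes:
        """Reassemble the file by fetching each block from the node that holds it."""
        by_name = {node.name: node for node in self.datanodes}
        return b"".join(
            by_name[node_name].blocks[(filename, index)]
            for index, node_name in enumerate(self.metadata[filename])
        )


nodes = [DataNode(f"datanode{i}") for i in range(3)]
master = NameNode(nodes, block_size=4)

master.put("photo.bin", b"0123456789")   # 10 bytes -> blocks of 4, 4 and 2 bytes
print(master.metadata["photo.bin"])      # ['datanode0', 'datanode1', 'datanode2']
restored = master.get("photo.bin")
print(restored == b"0123456789")         # True
```

No single node holds the whole file, yet the master can always rebuild it from its metadata, which is the core of how a cluster stores data larger than any one disk.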

6. What is Hadoop technology?

Hadoop is a technology for creating an HDFS cluster for data storage, which reduces the big data problems described above.

It also lets you create a MapReduce cluster. This is a cluster that pools the RAM and CPU of many machines, so for processing it feels like a single supercomputer.
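To give a feel for the MapReduce model, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop; the function names `map_phase`, `shuffle` and `reduce_phase` and the sample input are assumptions for illustration. In a real cluster each chunk would be mapped on a different machine and the reduce work would also be spread across machines.

```python
# Single-process sketch of the MapReduce idea (word count). Not Hadoop code.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key, here by summing them."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for a block of a file stored on a different node.
chunks = ["big data big cluster", "data node data"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 2, 'data': 3, 'cluster': 1, 'node': 1}
```

Because every chunk is mapped independently, the map step can run wherever the data already lives, which is how the cluster's combined CPU and RAM get used.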
