The Hadoop Distributed File System (HDFS) is a flexible, scalable, and resilient way to manage files in a big data environment. HDFS is not a final destination for data; rather, it is a data service that offers an extraordinary storage arrangement when data volumes and velocities are high. Because data in HDFS is written once and then typically read many times, rather than being constantly read and rewritten as in other file systems, HDFS is a good choice for supporting big data analytics.
Hadoop relies on HDFS to store petabytes of data. HDFS runs on commodity PCs or hardware, called nodes in Hadoop. These nodes are joined into clusters, across which data files are stored in a distributed manner. Using the strength of the distributed file system, entire clusters of nodes can efficiently store and process data, and applications access that data through streaming reads.
Hadoop distributed file system architecture
HDFS uses a master/slave architecture, in which one machine (the master) controls one or more machines (the slaves). HDFS consists of a NameNode, a master server that manages the file system namespace and regulates access to files, and a set of DataNodes.
NameNode: This is the brain of HDFS; it decides where to place data within a Hadoop cluster and maintains the file system metadata.
DataNode: A DataNode is where HDFS stores the actual data.
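The division of labor above can be sketched as a toy model: the NameNode holds only metadata (which blocks make up a file and where they live), while the DataNodes hold the bytes. The class and method names below are hypothetical illustrations, not the real Hadoop API.

```python
class NameNode:
    """Holds only metadata: which blocks make up a file, and where they live."""
    def __init__(self):
        self.metadata = {}  # filename -> list of (block_id, [datanode name, ...])

    def add_file(self, filename, block_locations):
        self.metadata[filename] = block_locations

    def locate(self, filename):
        # Clients ask the NameNode where a file's blocks are stored...
        return self.metadata[filename]


class DataNode:
    """Holds the actual block contents."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        # ...then read the bytes directly from the DataNodes.
        return self.blocks[block_id]


# Wire up one NameNode and two DataNodes (names are made up for the sketch).
nn = NameNode()
dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.store("blk_0", b"hello ")
dn2.store("blk_1", b"world")
nn.add_file("/user/demo.txt", [("blk_0", ["dn1"]), ("blk_1", ["dn2"])])

# A read: metadata comes from the NameNode, data comes from the DataNodes.
nodes = {"dn1": dn1, "dn2": dn2}
content = b"".join(nodes[locs[0]].read(bid)
                   for bid, locs in nn.locate("/user/demo.txt"))
print(content)  # b'hello world'
```

The key point of the design is that the NameNode never sits in the data path: it answers the "where" question, and clients then talk to the DataNodes directly.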
The Hadoop distributed file system lets clients store data in files that are split into multiple blocks. Because Hadoop is designed to process large amounts of data, HDFS's block size is much larger than that used by typical relational databases. The default block size is 128 MB, and it can be configured larger, for example to 512 MB.
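The block arithmetic is simple to work through. As a minimal sketch (the helper function name is made up, and only the final block is allowed to be smaller than the block size, matching HDFS behavior):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size: 128 MB

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes occupies."""
    full, last = divmod(file_size, block_size)
    sizes = [block_size] * full
    if last:
        sizes.append(last)  # the final block only takes the space it needs
    return sizes

# A 300 MB file fits in three blocks: 128 MB + 128 MB + 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))                   # 3
print(blocks[-1] // (1024 * 1024))   # 44
```

Note that a file smaller than the block size does not waste a full block on disk; the last (or only) block occupies just the bytes it holds.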
HDFS data is distributed to nodes in the cluster, but from your point of view it is a unified file system that you can access from any node in the cluster. A single NameNode runs the cluster, and its job is to hold and serve the metadata for the entire distributed file system.
HDFS's unique features:
Fault tolerance: Individual blocks of data are stored on multiple machines according to the replication factor. This allows Hadoop to tolerate faults: the failure of any single node does not affect access to the data. If the replication factor is 3, the data is stored on three DataNodes.
Scalability: Data transfer takes place directly with the DataNodes, so read and write capacity scales with the number of DataNodes.
Space: If more space is needed, simply add another DataNode and rebalance the data across all DataNodes.
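The fault-tolerance property above can be demonstrated with a small simulation. This is only a sketch: real HDFS placement is rack-aware and more sophisticated, and the round-robin policy and function names here are invented for illustration.

```python
import itertools

REPLICATION_FACTOR = 3  # the common HDFS default

def place_replicas(block_ids, datanodes, rf=REPLICATION_FACTOR):
    """Round-robin placement of rf copies of each block across DataNodes."""
    placement = {}
    ring = itertools.cycle(datanodes)
    for blk in block_ids:
        placement[blk] = [next(ring) for _ in range(rf)]
    return placement

def readable_after_failure(placement, failed_node):
    """Every block stays readable as long as one surviving replica remains."""
    return all(any(n != failed_node for n in nodes)
               for nodes in placement.values())

nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["blk_0", "blk_1", "blk_2"], nodes)
print(placement["blk_0"])                        # ['dn1', 'dn2', 'dn3']
print(readable_after_failure(placement, "dn2"))  # True
```

With three replicas per block, any single node can fail and every block still has at least two live copies, which is exactly why a replication factor of 3 is the usual default.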
HDFS works well for big data workloads for the following reasons:
- HDFS integrates with MapReduce, which moves computation to the data for fast processing
- The data consistency model it follows is simple, yet exceptionally powerful and adaptable
- Compatible with any commodity operating system and hardware
- Economy is achieved by distributing data across clusters with parallel nodes for processing.
- Data is kept safe because it is automatically replicated to multiple nodes
- It provides a Java API and even a C language wrapper
- An HDFS cluster can be browsed and monitored from a web browser, which makes it very practical
That's an overview of the Hadoop distributed file system.