Some little-known facts about the Hadoop Distributed File System

Feb 20, 2012, 08:25

Andy R Robert

Jared Diamond once said, “Technology has to be invented or adopted.” At times, people have qualms about adopting a particular technology.
They usually hesitate when they are unfamiliar with the benefits that technology offers. This may explain why the Hadoop Distributed File System (HDFS) has not caused much of a stir despite its growing influence on the IT industry: most businesses have yet to come to terms with its benefits. The following facts should help clear the air:

• HDFS is part of the Apache Hadoop project and is the primary storage system used by Hadoop MapReduce applications.
• Businesses often look for a file system that can replicate data blocks, ideally keeping several copies of each block. HDFS does this out of the box: it splits every file into large blocks and, by default, stores three replicas of each one.
• Replicated blocks can do more harm than good if they are not distributed across the cluster's nodes in a consistent way, so the storage system must be able to spread them throughout the cluster. HDFS does this, placing the replicas of a block on different nodes and, when rack awareness is configured, on more than one rack.
• Computations can easily turn into bottlenecks, so the storage layer should support fast and reliable processing. HDFS helps by letting MapReduce run tasks on the nodes that already hold the data, instead of moving large volumes of data to the computation.
• Few people are aware that data stored in HDFS can be integrated with an Enterprise Data Warehouse (EDW). However, this requires SQL, FastLoad, or a similar platform.
• It is also not widely known that table-valued user-defined functions (UDFs) play a major role in this integration. While the data is being integrated, each AMP (Access Module Processor) in the warehouse runs its own instance of the UDF, which reads the files stored in HDFS.
• Table-valued UDFs can also load new data into the EDW, or generate a report by joining HDFS data to existing warehouse tables. These UDFs therefore serve several purposes.
• Some argue that current-generation HDFS lacks the sophistication of a typical enterprise data warehouse. Some experts claim that users may find it difficult to place limits on individual queries or to perform every essential task with this storage system. It is also widely believed that an EDW is far better than HDFS at balancing mixed workloads.
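To make the points about block replication and distribution more concrete, here is a minimal Python sketch of how a file might be split into blocks and how three replicas of each block could be spread across racks. It loosely follows the default rack-aware policy described in the HDFS architecture documentation: the first replica goes on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The 128 MB block size, the six-node cluster, and the names `DataNode`, `split_into_blocks`, and `place_replicas` are illustrative assumptions, not part of any Hadoop API. The real NameNode also weighs node load, free space, and failures, which this sketch ignores.

```python
import random
from dataclasses import dataclass

MB = 1024 * 1024
# Assumed defaults: 128 MB blocks (the Hadoop 2.x+ default) and 3 replicas.
BLOCK_SIZE = 128 * MB
REPLICATION = 3  # place_replicas below is hard-wired for this factor


@dataclass(frozen=True)
class DataNode:
    """Hypothetical stand-in for a cluster node and the rack it sits in."""
    name: str
    rack: str


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size of each block; only the last block may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(nodes: list[DataNode], writer: DataNode,
                   rng: random.Random) -> list[DataNode]:
    """Simplified rack-aware placement for REPLICATION == 3.

    Assumes the writer is itself a DataNode and the cluster has at least
    three nodes. Ignores load, free space, and node failures.
    """
    first = writer  # replica 1: the writer's own node
    remote = [n for n in nodes if n.rack != writer.rack]
    if not remote:  # single-rack cluster: any other node will do
        remote = [n for n in nodes if n != writer]
    second = rng.choice(remote)  # replica 2: preferably a node on another rack
    # Replica 3: a different node on the same rack as replica 2.
    candidates = [n for n in remote if n.rack == second.rack and n != second]
    if not candidates:  # fall back to any node not already used
        candidates = [n for n in nodes if n not in (first, second)]
    third = rng.choice(candidates)
    return [first, second, third]


if __name__ == "__main__":
    # Six hypothetical nodes on three racks:
    # dn0/dn1 on rack0, dn2/dn3 on rack1, dn4/dn5 on rack2.
    cluster = [DataNode(f"dn{i}", f"rack{i // 2}") for i in range(6)]
    rng = random.Random(42)
    for i, size in enumerate(split_into_blocks(300 * MB)):
        replicas = place_replicas(cluster, writer=cluster[0], rng=rng)
        print(f"block {i} ({size // MB} MB):",
              [f"{n.name}@{n.rack}" for n in replicas])
```

For a 300 MB file, `split_into_blocks` returns three blocks of 128, 128 and 44 MB. The printed replica lists depend on the random seed, but each one puts the first copy on `dn0` and the other two on two different nodes of a single remote rack, so losing one node, or even the whole of `rack0`, still leaves at least one copy of every block.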