Tuesday, 13 January 2015

Overview of Hadoop


Hadoop is an ecosystem of products
• Open source
• Vendor distributions
• Additional tools for development and administration

Hadoop Benefits
• Enables big data analytics
• Supports advanced forms of analytics
• Scales cost-effectively
• Extends a data warehouse environment

Hadoop Limitations
• Low-latency queries are not well supported
• Ease of access is limited
• Data integration and integrity need extra work
• Fine-grained security is lacking

Hadoop Online Training


Hadoop Solutions Development: The Developing Solutions using Apache Hadoop training
course is designed for developers who want to better understand how to create Apache Hadoop
solutions.

Hadoop Distributed File System (HDFS): A reliable, distributed, Java-based file system that allows large volumes of data to be stored and rapidly accessed across large clusters of commodity servers.

Hive: Built on the MapReduce framework, Hive is a data warehouse that enables easy data summarization and ad hoc queries via an SQL-like interface for large datasets stored in HDFS.

Hadoop Certification: This certification exam establishes you as a trusted and valuable resource for those looking to work with an Apache Hadoop expert.

Hadoop Cluster Administration: The Apache Hadoop Administration training course is designed for administrators who are interested in learning how to deploy and manage a Hadoop cluster.

Pig: A platform for processing and analyzing large data sets. Pig consists of a high-level language (Pig Latin) for expressing data analysis programs, paired with the MapReduce framework for processing these programs.
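
To give a feel for the MapReduce model that both Hive and Pig build on, here is a minimal sketch in plain Python. It is not Hadoop or Pig Latin code: it only imitates the map, shuffle and reduce phases inside a single process, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling word_count(["Hadoop stores data", "Hadoop processes data"]) counts "hadoop" and "data" twice and the other words once. On a real cluster the map and reduce steps would run in parallel on many machines, with the framework handling the shuffle between them.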





How are enterprises using Hadoop?

Hadoop is a distributed computing framework for storing and processing massive volumes of data, developed as an Apache open source software project. Typical use cases include log and/or clickstream analysis, sales and marketing analytics, machine learning and/or data mining, image processing, web crawling and/or text processing, sentiment analysis, CDR (call detail record) analytics, data staging, and general archiving.



Hadoop Advantages and Disadvantages

Advantages:
• Scalable: storage can be extended by adding nodes and disks.
• Reliable: each block of data is replicated, which keeps the data safe.
• Failure cleanup: prevents space from being wasted on data left behind by retired tasks and untransferred data.
• Simple: handles large data sets smoothly without much complexity.
• Error recovery: data is automatically re-replicated if a server or disk crashes.
• Reduced overload: data is distributed across different servers, which helps prevent network overloading.
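
The replication and distribution points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a block size of 128 MB, a replication factor of 3, and a simple round-robin placement that stands in for the rack-aware policy real HDFS uses. The function and node names are hypothetical.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file would be cut into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be smaller
    return sizes


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified placement for illustration; real HDFS placement is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def surviving_blocks(placement, failed_node):
    """Blocks that still have at least one replica after one node fails."""
    return [
        block
        for block, holders in placement.items()
        if any(node != failed_node for node in holders)
    ]
```

For example, a 300 MB file becomes three blocks (128, 128 and 44 MB). With four nodes and three replicas per block, losing any single node still leaves every block readable, which is the property the reliability and error recovery bullets describe.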

Disadvantages:
• Not a good fit for small data sets or real-time applications.
• Joining multiple data sets is complex.
• Operation by a single master node makes scaling difficult.
• No storage-level or network-level encryption by default.
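
To show why joins take extra work in the MapReduce model, here is a rough Python sketch of a reduce-side join, one common way to join two data sets. It runs in a single process and only imitates the map, shuffle and reduce steps. The customer and order records, and all function names, are invented for this example.

```python
from collections import defaultdict


def map_customers(record):
    """Tag each customer row with its source so the reducer can tell rows apart."""
    customer_id, name = record
    return customer_id, ("customer", name)


def map_orders(record):
    """Tag each order row; the join key (customer_id) becomes the output key."""
    customer_id, amount = record
    return customer_id, ("order", amount)


def reduce_join(values):
    """Pair every customer name with every order amount that shares the key."""
    names = [value for tag, value in values if tag == "customer"]
    amounts = [value for tag, value in values if tag == "order"]
    return [(name, amount) for name in names for amount in amounts]


def reduce_side_join(customers, orders):
    """Inner join of two data sets on customer_id via map, shuffle and reduce."""
    grouped = defaultdict(list)  # stands in for the framework's shuffle
    for key, value in map(map_customers, customers):
        grouped[key].append(value)
    for key, value in map(map_orders, orders):
        grouped[key].append(value)
    joined = []
    for key in sorted(grouped):
        joined.extend(reduce_join(grouped[key]))
    return joined
```

Even this small inner join needs source tagging in the mappers and per-key bookkeeping in the reducer, work that a single SQL JOIN clause would hide. Tools such as Hive and Pig exist partly to generate this kind of plumbing automatically.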


Hadoop Benefits

• Stores (HDFS) and processes (MapReduce) large amounts of data
• Scales (100s and 1000s of nodes)
• Inexpensive (no license cost, low-cost hardware)
• Fast (1 TB sort in 62 s, 1 PB in 16.25 h*)
• Availability (failover built into the platform)
• Data recoverability (failure should not result in any data loss)
• Replication (3-way replication out of the box, configurable)
• Better throughput (time to read the whole dataset is more important than latency in reading the first record)
• Write-once, read-many-times access pattern
• Works well with structured, unstructured or semi-structured data
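
The sort figures and the replication factor in the list above translate into simple arithmetic. The short Python sketch below works it out, assuming decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes); the helper names are hypothetical.

```python
TB = 10**12  # bytes, decimal units assumed
PB = 10**15


def throughput_gb_per_s(data_bytes, seconds):
    """Aggregate throughput in GB/s for processing data_bytes in seconds."""
    return data_bytes / seconds / 1e9


def raw_storage_needed(logical_bytes, replication=3):
    """Raw disk capacity consumed when every block is stored `replication` times."""
    return logical_bytes * replication


# Throughput implied by the sort figures quoted in the list above.
tb_sort = throughput_gb_per_s(1 * TB, 62)
pb_sort = throughput_gb_per_s(1 * PB, 16.25 * 3600)

# Raw capacity needed to hold 10 TB of data with 3-way replication.
raw_for_10_tb = raw_storage_needed(10 * TB)
```

Both sort figures work out to roughly 16 to 17 GB/s of aggregate throughput across the cluster, and 10 TB of data occupies 30 TB of raw disk at the default 3-way replication. That trade of disk space for recoverability is what the replication bullet refers to.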


Job Opportunities in Hadoop in India

Because of the sudden rise of e-commerce and social media, tons of data are created every day. Companies are using Hadoop to manage and store this huge amount of data, which creates a huge demand for Hadoop professionals.

According to Harvard Business Review, big data and Hadoop roles are among the highest-paying jobs of the 21st century, as data is the main asset of a company.

More data has been created in the last 2 years than in the previous 50 years, which has created a sudden demand in the big data industry.

Companies such as Snapdeal, Flipkart and Jabong, as well as Amazon, eBay and Facebook, use Hadoop.