Apache Hadoop Single Node Setup
The following link is to the Apache tutorial on setting up a single-node instance of Hadoop. Purpose: This document describes how to set up and configure a single-node Hadoop installation so that you...
Pig for Beginners
The following link is to a simple tutorial to get started with Pig. Pig is a data flow platform for writing Hadoop operations in a language called Pig Latin. It adds a layer of abstraction on top of...
Hive for Beginners
The following link is to a simple tutorial demonstrating how to install, configure, and start using Hive. Hive is a data warehouse system for Hadoop that facilitates ad-hoc...
Step-by-Step MapReduce Programming
The following link is to a simple tutorial demonstrating how to get started writing map and reduce functions for Hadoop in Java using Eclipse. Objective: We will learn the following things with this...
Single-Node Hadoop Setup
This is a link to a straightforward, step-by-step tutorial that guides you through installing a single-node instance of Hadoop. Objectives: We will learn the following things with this tutorial...
Hadoop Streaming Made Simple using Joins and Keys with Python
There are a lot of different ways to write MapReduce jobs! Sample code for this post: https://github.com/joestein/amaunet. I find streaming scripts a good way to interrogate data sets (especially when...
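For readers new to streaming, here is a minimal sketch of the model the post builds on. A Hadoop Streaming mapper or reducer is an ordinary program that reads lines on stdin and writes `key<TAB>value` lines on stdout; Hadoop sorts the mapper output by key, so the reducer sees all records for one key next to each other. This is a plain word count, not the join code from the linked amaunet repository, and the file name `wordcount.py` and the functions `mapper`, `reducer`, and `run_locally` are hypothetical names chosen for illustration.

```python
"""Hypothetical Hadoop Streaming-style word count (illustrative sketch).

Mapper and reducer exchange "key<TAB>value" lines. Hadoop sorts mapper
output by key, so the reducer receives each key's records consecutively.
"""
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record for every word in the input lines."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts per key; the input must already be sorted by key."""
    # Split each record on the first tab, the default streaming separator.
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort (the shuffle) -> reduce in one process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Streaming would invoke this as "wordcount.py map" or "wordcount.py reduce".
    mode = sys.argv[1] if len(sys.argv) > 1 else "demo"
    if mode == "map":
        for out in mapper(sys.stdin):
            print(out)
    elif mode == "reduce":
        for out in reducer(sys.stdin):
            print(out)
    else:
        # No mode given: run a tiny in-process demo instead of reading stdin.
        for out in run_locally(["the cat", "the dog"]):
            print(out)
```

Because each role only talks to stdin and stdout, you can test the pair without a cluster: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. The same sorted-key grouping is what makes the reduce-side joins described in the post possible.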
Hadoop and S3: 6 Tips for Top Performance
mortardata (Doug Daniels): Netflix kicked off the first session at this summer’s Hadoop Summit, telling the crowd about the Hadoop stack that powers its world-renowned data science practice. The...
Free Big Data Education Resources
Daniel Gutierrez wrote a series of three articles at Big Data Republic presenting a number of free educational opportunities focused on Big Data. His first article was focused on fairly high...
Download the New Impala e-Book from O’Reilly Media
O’Reilly Media and Cloudera have produced a roughly 30-page e-book on the internals and architecture of Cloudera’s Impala implementation. The author, John Russell of Cloudera, promises that the content...
Hadoop 2.0 | SmartData Collective
I asked my friend Scott Kahler about Hadoop 2.0 and he was nothing short of effusive. “Yes, it’s [a] huge deal. YARN will make Hadoop a distributed app platform and...
Why Big Data 101?
While the following events did not occur verbatim, they did happen in spirit. At a recent family event I had the opportunity to talk with my younger brother. He graduated from the University of...
A Quick History of Hadoop
Level Set and Perspective: Before I take us down the technical path, I thought it would make sense to level-set ourselves with a quick look at the history of Hadoop. I believe you will see that knowing a bit...
Hadoop V1 Architecture Overview
As we’ve covered in previous articles, Hadoop is an open-source software project hosted by the Apache Software Foundation. Hadoop is software focused on reliable, scalable,...
Hadoop Distributed File System: Version 2 – Part I
To recap, version 1 of Hadoop is made up of two basic components: the foundation is a fault-resilient distributed file system called the Hadoop Distributed File System (HDFS), upon which a framework...
To Model Or Not To Model: Is That The Question?
The gist of this article is that data modeling is just as important when using big data and NoSQL technologies as it is when using the more traditional relational algebra...