Channel: Mike Pluta's Grandiose Data Delusions » hadoop
Browsing all 15 articles

Apache Hadoop Single Node Setup

The following link is to the Apache tutorial on setting up a single-node instance of Hadoop. Purpose: This document describes how to set up and configure a single-node Hadoop installation so that you...

Pig for Beginners

The following link is to a simple tutorial to get started with Pig. Pig is a data flow platform for writing Hadoop operations in a language called Pig Latin. It adds a layer of abstraction on top of...

Hive for Beginners

The following link is to a simple tutorial demonstrating how to install, configure, and get started with Hive. Hive is a data warehouse system for Hadoop that facilitates ad-hoc...

Step-by-Step MapReduce Programming

The following link is to a simple tutorial demonstrating how to get started writing map and reduce functions for Hadoop in Java using Eclipse. Objective: We will learn the following things with this...

Single-Node Hadoop Setup

This is a link to a straightforward, step-by-step tutorial that guides one through the installation of a single-node instance of Hadoop. Objectives: We will learn the following things with this tutorial...

Hadoop Streaming Made Simple using Joins and Keys with Python

There are a lot of different ways to write MapReduce jobs! (Sample code for this post: https://github.com/joestein/amaunet) I find streaming scripts a good way to interrogate data sets (especially when...

Hadoop and S3: 6 Tips for Top Performance

mortardata (Doug Daniels): Netflix kicked off the first session at this summer’s Hadoop Summit, telling the crowd about the Hadoop stack that powers its world-renowned data science practice. The...

Free Big Data Education Resources

Daniel Gutierrez wrote a series of articles (three of them) at Big Data Republic presenting a number of free education opportunities focused on Big Data. His first article was focused on fairly high...

Download the New Impala e-Book from O’Reilly Media

O’Reilly Media and Cloudera have produced a 30(ish) page e-book on the internals and architecture of Cloudera’s Impala Implementation. The author, John Russell of Cloudera, promises that the content...

Hadoop 2.0 | SmartData Collective

See on Scoop.it – Evidence Based Systems. I asked my friend Scott Kahler about Hadoop 2.0 and he was nothing short of effusive. “Yes, it’s huge deal. YARN will make Hadoop a distributed app platform and...

Why Big Data 101?

While the following events did not occur verbatim, they did happen in spirit. At a recent family event I had the opportunity to talk with my younger brother. He graduated from the University of...

A Quick History of Hadoop

Level Set and Perspective: Before I take us down the technical path, I thought it would make sense to level set ourselves with a check on the history of Hadoop. I believe you will see that knowing a bit...

Hadoop V1 Architecture Overview

As we’ve covered in previous articles, Hadoop is an open source software development project. It is a project hosted by the Apache Software Foundation. Hadoop is software focused on reliable, scalable,...

Hadoop Distributed File System: Version 2 – Part I

To recap, version 1 of Hadoop is made up of two basic components: the foundation is a fault-resilient distributed file system called the Hadoop Distributed File System (HDFS), upon which a framework...

To Model Or Not To Model: Is That The Question?

The gist of this article is that data modeling is just as important when using big data and NoSQL technologies as it is when using the more traditional relational algebra...
