Hadoop Streaming

Hadoop Streaming is a generic API that lets you write Mappers and Reducers in any language. The basic concept stays the same: Mappers and Reducers read their input from stdin and write their output to stdout as (key, value) pairs.
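To make the stdin/stdout contract concrete, here is a minimal word-count sketch in Python. It assumes the default Streaming convention of one pair per line with the key and value separated by a tab, and it assumes the reducer receives its lines already sorted by key. The script name wordcount.py and the helper names (map_lines, reduce_pairs, parse) are made up for illustration; they are not part of any Hadoop API.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield word, 1


def parse(lines, sep="\t"):
    """Turn 'key<sep>value' text lines back into (key, int) pairs."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition(sep)
        yield key, int(value)


def reduce_pairs(pairs):
    """Reducer: sum the counts per key. Assumes pairs arrive sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def main(role):
    # Pick the mapper or the reducer, then write tab-separated pairs to stdout.
    if role == "map":
        out = map_lines(sys.stdin)
    else:
        out = reduce_pairs(parse(sys.stdin))
    for key, value in out:
        print(f"{key}\t{value}")


# Run only when called as: python wordcount.py map   (or: reduce)
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    main(sys.argv[1])
```

Because both roles only use stdin and stdout, the job can be tried without a cluster by letting a shell pipeline stand in for the shuffle-and-sort step, for example: cat input.txt | python wordcount.py map | sort | python wordcount.py reduce (input.txt is a placeholder for any text file).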

Hadoop installation

You can set up Hadoop on your Windows machine in two ways.

  1. Manual install: download VirtualBox or VMware Player, install Linux in a virtual machine, and then install Hadoop on it. A very good tutorial is available here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
  2. Prebuilt virtual machines: download VirtualBox or VMware Player, then install the Cloudera QuickStart VM, the Hortonworks Sandbox, or the MapR Sandbox. This minimizes the time spent installing and configuring Hadoop, then Pig, Hive, and so on. Each of these contains a single-node Apache Hadoop cluster and Eclipse for Java, complete with example data, queries, and scripts. You can download them from the vendors' websites.
    Video on installing the Cloudera QuickStart VM: https://www.youtube.com/watch?v=oNQ8f2My5Hs
