Hadoop Streaming

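Hadoop Streaming is a generic utility that lets you write Mappers and Reducers in any language; the basic MapReduce concept remains the same. Mappers and Reducers read their input from stdin and write their output to stdout as (key, value) pairs. By default, each line is one record: the text up to the first tab character is the key, and the rest of the line is the value.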
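As an illustration, here is a minimal word-count sketch of a Streaming mapper and reducer written for Node.js. It assumes Node.js is installed on every cluster node. The file name wordcount.js, the "map"/"reduce" mode arguments, and the function names are hypothetical. For simplicity it buffers all input lines before processing instead of emitting records as each line arrives.

```javascript
// wordcount.js: hypothetical Hadoop Streaming mapper/reducer pair for Node.js.
// Run as "node wordcount.js map" or "node wordcount.js reduce".
const readline = require("readline");

// Mapper: for every input line, emit one "word<TAB>1" record per word.
function mapLines(lines) {
  const out = [];
  for (const line of lines) {
    for (const word of line.trim().split(/\s+/)) {
      if (word) out.push(word + "\t1"); // skip the empty token from blank lines
    }
  }
  return out;
}

// Reducer: Hadoop sorts mapper output by key before the reduce phase,
// so all records for one key arrive consecutively and can be summed in one pass.
function reduceLines(lines) {
  const out = [];
  let currentKey = null;
  let sum = 0;
  for (const line of lines) {
    const tab = line.indexOf("\t");
    if (tab < 0) continue; // skip records without a key/value separator
    const key = line.slice(0, tab);
    const count = parseInt(line.slice(tab + 1), 10);
    if (key !== currentKey) {
      if (currentKey !== null) out.push(currentKey + "\t" + sum);
      currentKey = key;
      sum = 0;
    }
    sum += count;
  }
  if (currentKey !== null) out.push(currentKey + "\t" + sum);
  return out;
}

// Connect the chosen phase to stdin/stdout only when a mode argument is given.
const mode = process.argv[2];
if (mode === "map" || mode === "reduce") {
  const rl = readline.createInterface({ input: process.stdin });
  const lines = [];
  rl.on("line", (line) => lines.push(line));
  rl.on("close", () => {
    const result = mode === "map" ? mapLines(lines) : reduceLines(lines);
    for (const record of result) process.stdout.write(record + "\n");
  });
}
```

A job using it could be submitted with something like: hadoop jar /path/to/hadoop-streaming.jar -files wordcount.js -input <input dir> -output <output dir> -mapper "node wordcount.js map" -reducer "node wordcount.js reduce". The exact location of the streaming jar depends on your Hadoop version and distribution.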

JavaScript: Finding duplicates in an array and deleting them

For finding duplicates in an array, please see groggyjava.

Deleting duplicates:

var duplicateCodes = [];
duplicateCodes.push("ss", "", "ee", "", "ee");  // inserting duplicates and some empty values

duplicateCodes = duplicateCodes.filter(function (e) { return e; });  // removing empty (falsy) values
duplicateCodes = duplicateCodes.filter(function (e, i, arr) { return arr.indexOf(e) === i; });  // removing duplicates, keeping the first occurrence
if (duplicateCodes.indexOf("ee") >= 0)  // checking if "ee" is already in the array
  alert("Please remove \n" + duplicateCodes.join("\n"));
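The same two tasks can also be sketched with an ES2015 Set, assuming a modern browser or Node.js. The function names findDuplicates and removeDuplicates are hypothetical, not part of the original post. findDuplicates reports each repeated value once, while removeDuplicates drops empty (falsy) values and keeps the first occurrence of every remaining value, in the original order.

```javascript
// Returns the values that occur more than once, each listed once.
function findDuplicates(arr) {
  const seen = new Set();
  const dups = new Set();
  for (const item of arr) {
    if (seen.has(item)) dups.add(item);
    else seen.add(item);
  }
  return Array.from(dups);
}

// Returns a copy without falsy values ("" here) and without repeats.
// A Set keeps insertion order, so the first occurrence of each value wins.
function removeDuplicates(arr) {
  return Array.from(new Set(arr.filter(Boolean)));
}

const codes = ["ss", "", "ee", "", "ee"];
console.log(findDuplicates(codes));   // returns ["", "ee"]
console.log(removeDuplicates(codes)); // returns ["ss", "ee"]
```

Note that filter(Boolean) removes every falsy value (including 0, null, and undefined), which matches the behavior of the filter(function (e) { return e; }) call above.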

Hadoop installation

There are two ways to set up Hadoop on a Windows machine:

  1. Manual installation: download VirtualBox or VMware Player, install Linux in a virtual machine, and then install Hadoop on it. A very good tutorial is available here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
  2. Virtualized apps: download VirtualBox or VMware Player, then install the Cloudera QuickStart VM, the Hortonworks Sandbox, or the MapR Sandbox. This minimizes the time spent installing and configuring Hadoop, Pig, Hive, and so on. These VMs contain a single-node Apache Hadoop cluster and Eclipse for Java, complete with example data, queries, and scripts. You can download them from the vendors' websites.
    Video walkthrough for installing the Cloudera QuickStart VM: https://www.youtube.com/watch?v=oNQ8f2My5Hs
