Hadoop Streaming is a generic API that allows you to write Mappers and Reducers in any language. The basic concept remains the same: Mappers and Reducers read their input from stdin and write their output to stdout as (key, value) pairs.
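As a minimal sketch of that stdin/stdout contract, here is a word count written in Python. The function names `map_lines` and `reduce_lines`, and the convention of passing `map` or `reduce` as the first argument, are assumptions made for this illustration and are not part of Hadoop. Each record is written as one line with the key and value separated by a tab, which is the default separator Hadoop Streaming expects.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one tab-separated (word, 1) record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each word.

    Assumes the records arrive sorted by key, which is what the
    shuffle/sort step between map and reduce provides.
    """
    # Split each non-empty record into [key, value].
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention: run as "script.py map" or "script.py reduce".
if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    sys.stdout.writelines(record + "\n" for record in step(sys.stdin))
```

Because both steps only read and write text streams, the same script can be tried locally without a cluster by piping a file through the map step, then `sort`, then the reduce step.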
You can set up Hadoop on your Windows machine in two ways.
- Download VirtualBox or VMware Player, install Linux, and then install Hadoop. A very good tutorial is available here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
- Virtualized apps: Download VirtualBox or VMware Player, then install the Cloudera QuickStart VM, Hortonworks Sandbox, or MapR Sandbox. This minimizes the time spent installing and configuring Hadoop, then Pig, Hive, and so on. Each of these contains a single-node Apache Hadoop cluster and Eclipse for Java, complete with example data, queries, scripts… You can download them from the vendors' websites.
https://www.youtube.com/watch?v=oNQ8f2My5Hs (for installing Cloudera QuickStart VM)
Introduction to Apache Spark
Apache Spark is a framework for performing general data analytics on a distributed computing cluster such as Hadoop. It provides in-memory computation, which increases data processing speed over MapReduce. It runs on top of an existing Hadoop cluster and accesses the Hadoop data store (HDFS). It can also process structured data in Hive and streaming data from HDFS, Flume, Kafka, and Twitter.
Is Apache Spark going to replace Hadoop?
- Free videos – MapR Academia
- Udacity course
- Hortonworks Sandbox
- Hadoop Ecosystem
- Running Hadoop Map-Reduce
- Hadoop Screencasts
- Reza Shiftehfar’s blog I
- Reza Shiftehfar’s blog II
- Reza Shiftehfar’s blog III
- Reza Shiftehfar’s blog IV
- Reza Shiftehfar’s blog V
- Reza Shiftehfar’s blog VI
- Reza Shiftehfar’s blog VII
- Deploying Storm on Hadoop for Advertising Analysis
- Hadoop classes by Cloudera
- EMC classes: Big Data, Analytics, Data Science
- Simulated Hadoop
Azure HDInsight is a service that deploys and provisions Apache™ Hadoop® clusters in the cloud, providing a software framework designed to manage, analyze, and report on big data.
Hadoop has been all the rage for the last year or so, and anyone who does not know that Microsoft is very serious about Hadoop has clearly not been paying attention. HDInsight is what Microsoft is calling their suite of 100% Apache Hadoop compatible software. They refer to it as part of their “end-to-end roadmap for Big Data” and they’re not kidding: it’s integral.
A few things may jump out from this as odd or funny. One would be ‘what is Microsoft doing in the open source world?’. If this is a surprise to you, then you really have been living under a rock. Microsoft is working very closely with Hortonworks and contributing heavily to Hadoop. They have also been contributing heavily to the Linux kernel since 2009.
Like them or not, you have to give Microsoft credit for making working with technology easier. Their work with Hadoop has been much the…