Hadoop or Big Data for Microsoft .NET programmers

You can mail your queries to shabir_hakim1@hotmail.com

1). What should a .NET programmer focus on to work with Big Data, if Hadoop is not a good choice?

Hadoop is a popular, up-and-coming big data technology. It is best learned alongside related skills such as NoSQL databases and analytics. A great thing about Hadoop is that it is affordable, because it runs on ordinary, low-cost hardware. I believe big data is not really a new technology, but a term used for a handful of technologies. While some of these technologies have been around for a decade or more, a lot of pieces are coming together to make big data the hot thing for the future.

It can also play a great role in handling massive amounts of information in all sorts of formats: tweets, posts, e-mails, documents, audio, video, and so on. It is format independent, which is what top companies are looking for. Don’t overthink it; go ahead, because very few people have experience with it yet. Get the .NET SDK from CodePlex and look into it.

2). I would like to know the answers to the queries below, as that would give me good clarity. My queries may look silly, as I have just started learning Hadoop. Mostly I want to know whether there is a way to leverage .NET C# skills to work with Hadoop without using Java and without using Azure (that is, in a non-Azure environment on a normal Windows network, perhaps with Linux VMs for HDFS if Windows is not supported).

1. Do we always need Azure to work with Hadoop using .NET?

2. Can we install HDFS on Windows? Basically, what does a .NET developer have to do on a Windows machine to start working with Hadoop?

3. If not, can we program against HDFS (installed on Linux) completely using the .NET Hadoop SDK?

4. Below are the Hadoop components/projects mentioned on the Hadoop site. Which of these projects have an equivalent in, or can be used with, the .NET Hadoop SDK?

Hadoop Components

1. Hadoop Common
2. HDFS
3. YARN
4. MapReduce

Below are the Apache projects related to Hadoop that I see:

1. Ambari
2. Avro
3. Cassandra
4. Chukwa
5. HBase
6. Hive
7. Mahout
8. Pig
9. Zookeeper

5. I just started learning it. Can we have multiple sandboxes (maybe Ubuntu VMs) on multiple Windows machines and run MapReduce programs written in .NET/C# with the help of the Hadoop .NET SDK?

1. Do we always need Azure to work with Hadoop using .NET?

No, you can work on a local node. There is also Hadoop support for Azure.

2. Can we install HDFS on Windows? Basically, what does a .NET developer have to do on a Windows machine to start working with Hadoop?

Again, you will use a NuGet package from Visual Studio or PowerShell to download the required components. Installing the Hortonworks Data Platform for Windows couldn’t be easier. Let’s take a look at how to install a one-node cluster on your Windows Server 2012 machine:

http://hortonworks.com/blog/installing-hadoop-on-windows/#.UkFfcX99tZo

 

4. Which of the Hadoop components and related Apache projects listed above have an equivalent in, or can be used with, the .NET Hadoop SDK?


Hadoop is implemented as a set of interrelated project components. The core components are MapReduce, which handles job execution, and a storage layer, typically implemented as the Hadoop Distributed File System (HDFS). For the purpose of this post, we will assume HDFS is in use.
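
To make the MapReduce model a little more concrete, here is a minimal single-process sketch in Python of the map, shuffle and reduce phases for a word count. It does not use Hadoop or the .NET SDK at all; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration and only mimic the flow that a real job follows across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data with Hadoop", "Hadoop for big clusters"]))
```

In a real cluster the map and reduce functions run in parallel on different nodes and the shuffle moves data over the network; the same mapper and reducer logic could equally be written in C#.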

Hadoop components are implemented across a series of servers referred to as data (or compute) nodes.  These nodes are where data are stored and processed.

A name node server keeps track of the data nodes in the environment and of which data are stored on which node, and presents the data nodes as a single entity.  This singular representation is referred to as a cluster.  If you are familiar with the term cluster from RDBMS implementations, please note that there is not necessarily any shared storage or other resources between the nodes.  A Hadoop cluster is purely logical.
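
The bookkeeping role of the name node can be pictured with a toy Python sketch. This is not how HDFS is implemented: the NameNode class, its methods and the round-robin placement are assumptions made purely for illustration, and real HDFS placement also involves replication, which is left out here.

```python
class NameNode:
    """Toy metadata server: records which data node holds each block."""

    def __init__(self, data_nodes):
        self.data_nodes = list(data_nodes)
        self.block_map = {}  # file name -> list of (block index, node name)

    def add_file(self, name, num_blocks):
        """Assign each block of a file to a data node in round-robin order."""
        self.block_map[name] = [
            (i, self.data_nodes[i % len(self.data_nodes)])
            for i in range(num_blocks)
        ]

    def locate(self, name):
        """Return the block placements for a file from the single namespace."""
        return self.block_map[name]


if __name__ == "__main__":
    name_node = NameNode(["node1", "node2"])
    name_node.add_file("logs.txt", 3)
    print(name_node.locate("logs.txt"))
```

A client only ever asks the name node where a file lives, which is why the separate data nodes appear as one logical cluster.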

From MSDN

If you are a .NET developer, you will want to set up a desktop development environment with the following components:

  1. Visual Studio 2010 or 2012
  2. NuGet Package Installer for Visual Studio
  3. A Local, Single Node Hadoop “Cluster”

Having these components installed on your desktop will allow you to develop against Hadoop locally as well as against a remote cluster (whether on-premises or in the cloud).  You might be able to get away with not installing Hadoop locally, but most of the .NET-oriented documentation I’ve found assumes this is your setup.

This blog post shows how developers can get started with Hadoop:

http://www.tuicool.com/articles/IZfIvi

Helpful links:
http://www.softdevblogs.com/?q=aggregator/sources/5

http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/

The HDInsight Server Developer Preview is an implementation of HDInsight on Windows. This Developer Preview of Apache™ Hadoop™-based services on Windows uses only a single node deployment. HDInsight Server provides a local development environment for the Windows Azure HDInsight Service. The Developer Preview of HDInsight Services on Windows is installed with the Microsoft Web Platform Installer 4.5.

Microsoft has two offerings: HDInsight Server (Windows Server) and HDInsight Service (Windows Azure).
You are comparing HDInsight Server with HDP (Hortonworks Data Platform):
– HDInsight Server uses HDP (HDInsight is stacked on top of HDP).
– Apart from providing the Hadoop features that it inherits from HDP, HDInsight adds more features:
  • InteractiveJS (hosted on IIS) to run your HDFS and Pig commands.
  • Support for business intelligence tools such as Microsoft Excel, PowerPivot for Excel and Power View to draw data and provide insights using various connectors.
  • The ability to deploy and manage Hadoop clusters using System Center, etc.

If you require a bare-minimum Hadoop on Windows, install HDP. If you need more features, such as the UI (InteractiveJS) and ease of installation, install HDInsight Server.

HDP for Windows provides an installer for the HDP Stack of components on a multi-node Windows Server environment. HDP for Windows is Generally Available now to run Hadoop clusters on Windows on premise.

HDInsight Server Single Node Developer Preview, which you linked to, is a Developer Preview. It deploys a single-node Hadoop stack and enables you to develop and experiment with Hadoop. Thus, the difference is in the intended usage. HDP for Windows is for multi-node Hadoop clusters on Windows Server. HDInsight Server Preview is meant as a Developer Preview to enable testing and experimentation for the HDInsight Service. The actual Hadoop stack components are compatible across HDP for Windows and HDInsight Server.

YOU CAN GET THE COMPLETE PACKAGE FROM HERE

 


Hortonworks and Microsoft have partnered to bring the benefits of Apache Hadoop to Windows. Through this partnership we are focused on delivering enterprise grade solutions that integrate deeply with your Microsoft tools and applications.

Hortonworks Data Platform for Windows significantly expands the ecosystem for the next generation big data platform. This means that the Microsoft partners and tools you already rely on can help you with your Big Data initiatives.

 

Try HDP on Windows

http://hortonworks.com/products/hdp-windows/

http://microsoft.com/bigdata

http://hortonworks.com/partner/microsoft/

INSTALLING HDINSIGHT SERVER ON WINDOWS

http://prologika.com/CS/blogs/blog/archive/2012/10/31/installing-hdinsight-server-for-windows.aspx

 

FYI: If you have any issues regarding Hadoop, you will get great help from everyone here: http://hortonworks.com/community/forums/topic/hdp-1-3-flumeagent-and-hbase-services-will-not-start/

 

Install-Package Microsoft.WindowsAzure.Management.HDInsight installs one of the packages in the Hadoop .NET SDK.
http://hadoopsdk.codeplex.com/

FULL CONVERSATION.

===================================================

Windows Azure HDInsight Service is the easiest way to deploy, manage and scale Hadoop based solutions. This release includes:

  • Hadoop updates that ensure the latest stable versions of:
    • HDFS and Map/Reduce
    • Pig
    • Hive
    • Sqoop

===================================================================
For .NET developers

http://hadoopsdk.codeplex.com/

http://www.amazedsaint.com/2013/03/taming-big-data-with-c-using-hadoop-on.html

 

By Sriramjithendra Posted in Big Data
