Bright HDInsight Part 6: Take advantage of your console context

The Microsoft.Hadoop.MapReduce SDK for HDInsight (Hadoop on Azure) requires you to package your Map Reduce job in a Windows console application, which the Hadoop Streaming API then uses to run your logic. This gives a very natural way to configure and orient your application at startup, and this post gives a simple example of doing so.

Console Arguments

The natural requirement to explore is supplying runtime arguments to the Map Reduce console application (hereafter, the package), which can then be used for configuration. We may use them to specify additional generic parameters, to set input or output locations, or to control the type of job being run. Consider the input location case: we may want to write a single Map Reduce job and then run it against different datasets, differentiated by their storage location.

For anyone familiar with a command shell, a very natural way to do this is to run:

mypackage.exe /path/to/input.txt

This specifies the path to an input location on which the application can work. To achieve this with the SDK, once we have parsed and verified the input, we pass the arguments we are interested in as the parameter to the HadoopJobExecutor.ExecuteJob<TJobClass>(string[] args) method.
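As a rough sketch of the "parse and verify" step, the entry point might look something like the following. The ArgumentGuard class, the TryGetInputPath helper and the usage message are hypothetical names made up for illustration, and MyJob stands in for your own job class. The call into the SDK is left as a comment so that the snippet compiles without the SDK referenced.

```csharp
using System;

public static class ArgumentGuard
{
    // Hypothetical helper: accepts exactly one non-empty argument, the input path.
    public static bool TryGetInputPath(string[] args, out string inputPath)
    {
        inputPath = null;
        if (args == null || args.Length != 1 || string.IsNullOrWhiteSpace(args[0]))
        {
            return false;
        }

        inputPath = args[0].Trim();
        return true;
    }

    public static int Main(string[] args)
    {
        string inputPath;
        if (!TryGetInputPath(args, out inputPath))
        {
            Console.Error.WriteLine("Usage: mypackage.exe <input path>");
            return 1; // a non-zero exit code signals failure to the calling shell
        }

        // Assumption: with the SDK referenced, the verified argument is handed on here, e.g.
        // HadoopJobExecutor.ExecuteJob<MyJob>(new[] { inputPath });
        Console.WriteLine("Input path: " + inputPath);
        return 0;
    }
}
```

Returning an exit code from Main is a design choice rather than an SDK requirement; it lets a batch file or scheduler tell a rejected command line apart from a successful run.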

Once we have done that, the Configure method of our TJobClass (a subclass of HadoopJob) is provided with an ExecutorContext instance. This ExecutorContext has a single property, "Arguments".

This Property is the string[] parameter that we passed into the ExecuteJob method. We can then use this to configure the Job during setup:

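// Requires: using System; using System.Globalization; using Microsoft.Hadoop.MapReduce;
public override HadoopJobConfiguration Configure(ExecutorContext context)
{
    var hadoopConfiguration = new HadoopJobConfiguration();

    // The first argument passed to ExecuteJob is the input location.
    hadoopConfiguration.InputPath = context.Arguments[0];

    // Use the current tick count to give every run its own output folder.
    hadoopConfiguration.OutputFolder = "asv://output/" + DateTime.Now.Ticks.ToString(CultureInfo.InvariantCulture);

    return hadoopConfiguration;
}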
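Indexing context.Arguments[0] directly throws an IndexOutOfRangeException if the job is started without arguments. A slightly more defensive variant could move the two path decisions into small helpers, sketched below. The JobPaths class and both method names are hypothetical, not part of the SDK, and the sketch uses only the base class library so that it compiles on its own.

```csharp
using System;
using System.Globalization;

public static class JobPaths
{
    // Hypothetical helper: returns the first argument, or fails with a descriptive error.
    public static string InputPathFrom(string[] arguments)
    {
        if (arguments == null || arguments.Length == 0)
        {
            throw new ArgumentException("Expected the input path as the first argument.");
        }

        return arguments[0];
    }

    // Hypothetical helper: appends a tick count to the root so each run gets its own folder.
    public static string OutputFolderFrom(string root, DateTime now)
    {
        string prefix = root.EndsWith("/", StringComparison.Ordinal) ? root : root + "/";
        return prefix + now.Ticks.ToString(CultureInfo.InvariantCulture);
    }
}
```

Inside Configure these would be used as hadoopConfiguration.InputPath = JobPaths.InputPathFrom(context.Arguments); and hadoopConfiguration.OutputFolder = JobPaths.OutputFolderFrom("asv://output/", DateTime.Now);. Passing the timestamp in as a parameter also makes the output naming easy to check in isolation.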

Conclusion

Use your console context well. Further examples of what you could do: capture the output of the package host in a text file with the > or >> redirection operators; pipe the results to another program using |; chain executions together with a batch file; schedule execution with the Windows Task Scheduler.

Happy cloudy-big-dataification (and happy new year),
Andy
