Monthly Archives: August 2013

When Windows Hadoop Streaming forgets how quotes work …

Very short post.

Hadoop streaming works on the command line. When you pass your job the paths to its input files, and those paths contain spaces, you need to quote the input parameters.

Typically, if you RDP onto an HDInsight instance to do this, you will double-click the “Hadoop Command Prompt” shortcut on the desktop, which points to C:\apps\dist\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd.

Since this morning, it seems that the command-line options set up by the hadoop.cmd script no longer handle double quotes (") the way they used to, which let you delimit an argument containing spaces by wrapping it in double quotes.

> HadoopProgram.exe "/My Folder/*/*/*/*.gz"
This returns the error:
ERROR security.UserGroupInformation: PriviledgedActionException as:admin cause:org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist: asv://…/My
13/08/02 09:12:12 ERROR streaming.StreamJob: Error Launching job : Input path does not exist: asv://…/My
Streaming Command Failed!
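Note that the error names only `/My`: the quoted path was split at its space. The Python sketch below models that symptom. It is an illustration only, not the code in hadoop.cmd. It assumes the broken behaviour amounts to "quotes discarded, then split on whitespace" and contrasts that with a quote-aware split. `shlex.split` uses POSIX-style rules rather than Windows cmd parsing, but for a simple double-quoted path the two agree. The function names are hypothetical.

```python
import shlex


def split_ignoring_quotes(command_line: str) -> list[str]:
    """Hypothetical model of the broken behaviour: quote characters are
    dropped and the line is split on every run of whitespace."""
    return command_line.replace('"', "").split()


def split_respecting_quotes(command_line: str) -> list[str]:
    """Quote-aware split: a double-quoted run stays one argument."""
    return shlex.split(command_line)


if __name__ == "__main__":
    line = 'HadoopProgram.exe "/My Folder/*/*/*/*.gz"'
    # The input path is cut off at "/My", matching the error above.
    print(split_ignoring_quotes(line))
    # The whole path survives as a single argument.
    print(split_respecting_quotes(line))
```

Running it prints `['HadoopProgram.exe', '/My', 'Folder/*/*/*/*.gz']` followed by `['HadoopProgram.exe', '/My Folder/*/*/*/*.gz']`.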
To fix it, first change the command-line options so that double quotes are handled. You can do so by starting a new shell within your existing Hadoop shell with:
cmd /S
Then the exact same command above will run successfully.
Thanks to my bro Simon Perrott for sharing this lovely experience with me ;-)

Super Dev Dogfooding / Freecycling

This one goes out to people who write desktop apps. I’m thinking particularly of heavyweight dev tools such as Visual Studio or Eclipse. These are great tools, and they shape far more of the modern world than they directly touch. The best web applications, globally distributed and touching billions of people daily, are built with these tools.

They are amazing. They (and their predecessors) have democratised the world, shaking governments and freeing oppressed masses. Their triumphs have been lauded. Their triumphs have been banned. They are the emancipation of thought to creative engineering ends.

But they stop working well once they have been installed for more than six months. They become languid and fraught with poor performance. They suffer corruptions, misconfigurations and malfunctions. They show the scars of freeing the creativity of man.

I’ve yet to see a development machine put to use daily that can retain the initial performance of its installation date.

My suggestion: when a well-loved dev machine is due to be replaced, send it to the software vendor that writes the primary technology / IDE that runs on it. Let them see why its performance has degraded.

Then they may fix it.

Who wants in?

The alternative? Migrating from IDE to uIDE. Sublime Text. Brackets. Notepad. Nano. Vi.