When Windows Hadoop Streaming forgets how quotes work …

Very short post.

Hadoop Streaming jobs are launched from the command line. If the input paths you pass to your job contain spaces, you need to quote those arguments.
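For example, a streaming job with a quoted input path might be launched like this (the streaming jar location, mapper name, and paths here are illustrative placeholders, not taken from a real cluster):

> hadoop jar C:\apps\dist\hadoop-1.1.0-SNAPSHOT\lib\hadoop-streaming.jar -input "/My Folder/*/*/*/*.gz" -output /user/admin/output -mapper MyMapper.exe -file MyMapper.exe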

Typically, if you RDP onto an HDInsight instance to do this, you double-click the “Hadoop Command Prompt” shortcut on the desktop, which points to C:\apps\dist\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd.

As of this morning, it seems the command-line handling in hadoop.cmd no longer treats double quotes the way it did before, so you can no longer use them to delimit arguments containing spaces.

E.g.
> HadoopProgram.exe “/My Folder/*/*/*/*.gz”
Returns error:
ERROR security.UserGroupInformation: PriviledgedActionException as:admin cause:org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist: asv://mycontainer@myclusterhdinsight.blob.core.windows.net/user/admin/GÇ£/My
13/08/02 09:12:12 ERROR streaming.StreamJob: Error Launching job : Input path does not exist: asv://mycontainer@myclusterhdinsight.blob.core.windows.net/user/admin/GÇ£/My
Streaming Command Failed!
The fix is to restore double-quote handling on the command line. You can do that by creating a new shell inside your existing Hadoop shell with:
cmd /S
Then the exact same command above will run successfully.
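So the full sequence from the Hadoop Command Prompt looks something like this (using the same placeholder program and path as above):

> cmd /S
> HadoopProgram.exe “/My Folder/*/*/*/*.gz”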
Thanks to my bro Simon Perrott for sharing this lovely experience with me ;-)
