
Bright HDInsight 8: Use Hive Variables with Powershell

This post is part best practice, part workaround.

When submitting Hive jobs to an HDInsight cluster with Powershell, one might consider building a common script, keeping it controlled and secured in a version control system, and then using variable substitution to provide the values required. This post explores how to supply those variables.

Firstly, one can submit a hive job with Invoke-AzureHDInsightHiveJob, Invoke-Hive or simply hive; the last two are simply aliases for the first cmdlet. I will be using “hive” throughout. When I write “hive” I am executing remotely from powershell, via Templeton, against a remote HDInsight cluster. POWER!

Hive allows you to supply variables straight off the command line, but when using Powershell we’ll have to wrap them up. We want to get as close as possible to the following style of argument passing:

hive -hiveconf state='California' -f /hive.hql

In this example, we pass a hive config value of “California” for the key “state”. We can imagine a query like the one below using this setting:

select count(*) from hivesampletable where state='${hiveconf:state}'

As you can see, to load the value out of the -hiveconf flag, we use ${hiveconf:keyname} as a convention.

The Hive cmdlet is described by Get-Help hive as:

NAME
    Invoke-AzureHDInsightHiveJob

SYNTAX
    Invoke-AzureHDInsightHiveJob [[-Query] <string>] [-Arguments <string[]>] [-Defines <hashtable>] [-File <string>] [-Files <string[]>] [-JobName <string>] [-StatusFolder <string>] [<CommonParameters>]

ALIASES
    Invoke-Hive
    hive

REMARKS
    None

It’s clear that only “-Arguments” or “-Defines” can really affect the command line; -File and -Files supply scripts, while -JobName and -StatusFolder provide metadata about job execution.

Without clear guidance in the powershell help, another Hive command came to the rescue. The “set” command lets us see all the variables defined for the executing script. Using it in conjunction with both the -Defines and -Arguments flags lets us see how each affects the runtime environment:
PS Z:\src\powershell> hive "set" -Arguments @("argument=123") -Defines @{ "defines"="abc" }
Submitting Hive query..
Started Hive query with jobDetails Id : job_201405061100_0084
Hive query completed Successfully
*snip*
defines=abc
*snip*
env:_arguments=C:\apps\dist\hive-0.11.0.1.3.7.1-01293\lib\hive-cli-0.11.0.1.3.7.1-01293.jar org.apache.hadoop.hive.cli.CliDriver -hiveconf hive.querylog.location C:\apps\dist\hive-0.11.0.1.3.7.1-01293\logs\history -hiveconf hive.log.dir C:\apps\dist\hive-0.11.0.1.3.7.1-01293\logs --hiveconf "mapreduce.job.credentials.binary=c:/apps/temp/hdfs/mapred/local/taskTracker/ecommerce/jobcache/job_201405061100_0084/jobToken" --hiveconf "hive.metastore.local=false" --hiveconf "hive.metastore.uris=thrift://headnodehost:9083" --hiveconf "hive.metastore.warehouse.dir=/apps/hive/warehouse" --hiveconf "defines=abc" --hiveconf "hdInsightJobName=Hive: 451d5737109c47e79bfd" "argument=123" -f 451d5737109c47e79bfda2f5bcb9da23.hql
*snip*
system:defines=abc

The important things to notice here are that the -Arguments value is appended after the job name, without a --hiveconf prefix (attempting to add one causes an error), whereas -Defines creates a system-level variable which can be accessed from the script. One caveat: there are many reserved system variables, so be careful not to clash with them.

So if we run either of these two queries, we can achieve what we want:

hive "select count(*) from hivesampletable where state='`${system:state}'" -Defines @{"state"="California"}
hive "select count(*) from hivesampletable where state='`${hiveconf:state}'" -Defines @{"state"="California"}
Submitting Hive query..
Started Hive query with jobDetails Id : job_201405061100_0085
Hive query completed Successfully
6881
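One Powershell-specific wrinkle in the queries above is the backtick before ${. Inside a double-quoted string, Powershell would otherwise try to expand the $-expression itself before Hive ever sees it. A minimal sketch of the difference (using a local variable of my own invention):

```powershell
# Powershell expands $-expressions inside double-quoted strings...
$state = "Texas"
"where state='$state'"              # yields: where state='Texas'

# ...so escape the dollar with a backtick to send the literal
# ${hiveconf:state} token through to Hive untouched:
"where state='`${hiveconf:state}'"  # yields: where state='${hiveconf:state}'
```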

Beyond this we can abstract the hive statement into a file, upload it to the root of our HDInsight container, and then submit a much terser statement, reusing our logic externalised into the file.

PS Z:\src\powershell> hive -File /hive.hql -Defines @{"state"="California"}
Submitting Hive query..
Started Hive query with jobDetails Id : job_201405061100_0087
Hive query completed Successfully
6881

The statement hive -File /hive.hql -Defines @{"state"="California"} is quite close to the original use case, hive -f /hive.hql -hiveconf "state=California"!
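For completeness, the externalised /hive.hql is just the parameterised query from earlier, with the substitution token left in place:

```sql
-- /hive.hql, uploaded to the root of the HDInsight storage container
select count(*) from hivesampletable where state='${hiveconf:state}'
```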

Bright HDInsight 7: Customise Your Cluster

HDInsight, as a clustered, cloud-backed resource, allows you to offload the management overhead of cluster maintenance, tuning, patching and other sysadmin-type operations. The flipside is that, in exchange for the management overhead you shed, you accept a lower level of control. You cannot elevate your user context to administrative privileges. You cannot expect tweaks applied after a cluster has been created to be permanent. Certain operations inside Azure cause “reimaging” events, where cluster members are returned to their initial state. This is in line with all clouds offering PaaS compute. So that this loss of control should not hamper the users of HDInsight, Microsoft have supplied a comprehensive cluster customisation capability at provision time.

Using this capability, it is possible to affect core Hadoop configuration settings in:

  • core-site.xml
  • mapred-site.xml
  • hdfs-site.xml

These should be used in preference to making manual changes after connecting over RDP.

The code for achieving this is as straightforward as the previous examples. Simply create objects representing the configuration settings required at provision time, and submit them with the provisioning request. Again using Powershell as our exemplar language, we can achieve this with four simple commands.

$coreConfig = @{
    "io.compression.codec" = "org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.DefaultCodec, org.apache.hadoop.io.compress.BZip2Codec";
    "io.sort.mb" = "2048";
}

$mapredConfig = New-Object 'Microsoft.WindowsAzure.Management.HDInsight.Cmdlet.DataObjects.AzureHDInsightMapReduceConfiguration'

$mapredConfig.Configuration = @{
    "mapred.tasktracker.map.tasks.maximum" = "4";
}

$clusterConfig = New-AzureHDInsightClusterConfig -ClusterSizeInNodes 64

$clusterConfig = $clusterConfig | Add-AzureHDInsightConfigValues -Core $coreConfig -MapReduce $mapredConfig

$clusterConfig | New-AzureHDInsightCluster -Credential $clusterCreds -Location $location -Name $clusterName -Verbose -ErrorAction Stop

Happy Hadooping!

Hidden MVP Benefits

I’m an MVP for another year. I’m happy about that, but this blog isn’t going to be a boring bleat about it. Just a nice story about the internet.

TL;DR – I lost my wallet and got it back via the internet.

Lost & Found

I lost my wallet near where I live a few days ago. Late at night; wallet in jacket pocket; warm weather so slung my jacket over my shoulder and plop I guess. Anyway, I reported it missing with the railway peeps (it had a season ticket in it) and did the irritating cancellation of all my bank cards thing.

The wallet was quite a cherished gift, so I was upset but had no expectation to get the thing back.

One day I got a message on Twitter from a chap who wanted me to follow him. I did, and then got a DM asking me to come along to a police station (eek!) as they had some lost property of mine. When I went, the guy told me the story of how he found me; my wallet had no clear identification in it, just some cards etc.

He’d called the bank hoping to get some contact info but as always they refused to help (since my privacy is clearly more important than retrieving my property ffs).

He’d called the train company but they couldn’t find my season ticket on their system (train companies in England have steam powered computers and electric trains; contrary to stereotype).

He’d called my gym but they didn’t have a current address or contact number (I really should go to the gym more… :P ).

Then he found my MVP card; did a web search for my name and MVP and found my MVP profile page on Microsoft’s website. Listed on there was my Twitter handle, and he used that to get in touch.

So wow; got my stuff back via the power of the MVP program.

Thanks guys ;-)

Diagnosing a Windows Azure Website Github integration error

Yesterday I experienced an issue when trying to integrate a Windows Azure Website with Github. Specifically, my code would deploy from the master branch, but if I chose a specific other branch called ‘prototype’ I received a fetch error in the Windows Azure Management Portal:

This error has been reported to the team and I’m sure will be rectified so nobody else will run into it, but at Cory Fowler’s (@syntaxC4) prompting I wanted to document the steps I took to debug this as these steps may be useful to anyone struggling to debug a Windows Azure Website integration.

Scenario

In my scenario I had a project with a series of subfolders in my github repo. The project had progressed from a prototype to a full build, but we were required to keep the prototype around for design reference. We could have preserved ‘prototype’ without changing the solution structure, but, as in all real-world scenarios, the requirement to leave the prototype available emerged only after we had removed it and changed the URL structure. We were only happy to continue working on new code if we could label the prototype or somehow leave it in a static state while the codebase moved on. This requirement is easily tackled by Windows Azure Websites and its Github integration; we changed the solution structure to have subfolders, created a new branch ‘prototype’ and continued our main work in ‘master’. Our ‘master’ branch has the additional benefit of keeping the prototype available for reference and quick application of code changes if we want to pivot our approach.

We then created two Windows Azure Websites (for free, wow!). In order to allow Windows Azure Websites to deploy the correct code for each, we created a .deployment file in each branch. In this .deployment file we inform Windows Azure Websites (through its Kudu deployment mechanism) that it should perform a custom deployment.

For the ‘master’ branch we want to deploy the /client folder, which involves a simple .deployment file containing

[config]
project = client

For the ‘prototype’ branch we want to deploy the /prototype folder, which involves a simple .deployment file containing

[config]
project = prototype

As you can see, these two branches then can evolve independently (although the prototype should be static).

Problems Start

The problems began when I tried to create a Windows Azure Website and integrate it with Github for the ‘prototype’ branch. No matter what I did, I couldn’t get the Github fetch to work:

At this point I fired off an email, and David Ebbo (@davidebbo) prompted me to stop being lazy and look for some deployment logs. Powershell is your friend when it comes to debugging Windows Azure Websites, so I started there.

The first thing to do is to get the logs using ‘Save-AzureWebsiteLog’:

PS C:\> Save-AzureWebsiteLog -Name partyr
Save-AzureWebsiteLog : Access to the path 'C:\logs.zip' is denied.
At line:1 char:1
+ Save-AzureWebsiteLog -Name partyr
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 + CategoryInfo : CloseError: (:) [Save-AzureWebsiteLog], UnauthorizedAccessException
 + FullyQualifiedErrorId : Microsoft.WindowsAzure.Management.Websites.SaveAzureWebsiteLogCommand

Oops, it helps if pwd is somewhere the current user can write to…

PS C:\> cd temp
PS C:\temp> Save-AzureWebsiteLog -Name myWebsite
PS C:\temp> ls
 Directory: C:\temp
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 23/09/2013 18:48 24406 logs.zip

Ok great. We have some logs. Let’s take a look!

Inside the zip at the location: /deployments/temp-59fd85ea/ are two files, log.xml and status.xml. These didn’t prove very useful :-)

Log.xml:

 <?xml version="1.0" encoding="utf-8"?>
 <entries>
 <entry time="2013-09-23T17:45:38.0768333Z" id="e7f9db74-a9e5-4738-93ee-028d051b6fd6" type="0">
 <message>Fetching changes.</message>
 </entry>
 </entries>

Status.xml:

 <?xml version="1.0" encoding="utf-8"?>
 <deployment>
 <id>temp-59fd85ea</id>
 <author>N/A</author>
 <deployer>GitHub</deployer>
 <authorEmail>N/A</authorEmail>
 <message>Fetch from git@github.com:elastacloud/asosmyWebsite.git</message>
 <progress></progress>
 <status>Failed</status>
 <statusText></statusText>
 <lastSuccessEndTime />
 <receivedTime>2013-09-23T17:45:37.9987137Z</receivedTime>
 <startTime>2013-09-23T17:45:37.9987137Z</startTime>
 <endTime>2013-09-23T17:45:40.8578955Z</endTime>
 <complete>True</complete>
 <is_temp>True</is_temp>
 <is_readonly>False</is_readonly>
 </deployment>

In the zip file at the location /LogFiles/Git/trace is a file that has much more useful information.

Part way down this encoded xml file is the error:

 <step title="Error occurred" date="09/23 17:16:12" type="error" text="fatal: ambiguous argument 'prototype': both revision and filename&#xA;Use '--' to separate filenames from revisions&#xA;&#xD;&#xA;D:\Program Files (x86)\Git\bin\git.exe log -n 1 prototype" stackTrace=" at Kudu.Core.Infrastructure.Executable.Execute(ITracer tracer, String arguments, Object[] args)&#xD;&#xA; at Kudu.Core.SourceControl.Git.GitExeRepository.GetChangeSet(String id)&#xD;&#xA; at Kudu.Services.FetchHandler.&lt;PerformDeployment&gt;d__c.MoveNext()&#xD;&#xA;--- End of stack trace from previous location where exception was thrown ---&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)&#xD;&#xA; at Kudu.Services.FetchHandler.&lt;&gt;c__DisplayClass1.&lt;&lt;ProcessRequestAsync&gt;b__0&gt;d__3.MoveNext()&#xD;&#xA;--- End of stack trace from previous location where exception was thrown ---&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)&#xD;&#xA; at Kudu.Contracts.Infrastructure.LockExtensions.&lt;TryLockOperationAsync&gt;d__0.MoveNext()&#xD;&#xA;--- End of stack trace from previous location where exception was thrown ---&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)&#xD;&#xA; at Kudu.Services.FetchHandler.&lt;ProcessRequestAsync&gt;d__6.MoveNext()&#xD;&#xA;--- End of stack trace from previous location where exception was thrown ---&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)&#xD;&#xA; at 
System.Web.TaskAsyncHelper.EndTask(IAsyncResult ar)&#xD;&#xA; at System.Web.HttpTaskAsyncHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result)&#xD;&#xA; at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()&#xD;&#xA; at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean&amp; completedSynchronously)" elapsed="0" />
 <step title="Outgoing response" date="09/23 17:16:12" type="response" statusCode="500" statusText="Internal Server Error" Cache-Control="private" X-AspNet-Version="4.0.30319" Content-Type="text/html; charset=utf-8" elapsed="0" />
 </step>

I missed this at first amongst all the noise in this file. What I did instead was give up on notepad and xml and run a different powershell command: Get-AzureWebsiteLog -Name myWebsite -Tail, which connects powershell to a real-time stream of the website log. Really, really neat.

Clicking the sync button in Deployments of Windows Azure Websites Management Portal immediately showed activities in the Powershell console:

PS C:\temp> Get-AzureWebsiteLog -Name myWebsite -Tail
 2013-09-23T17:51:01 Welcome, you are now connected to log-streaming service.
 2013-09-23T17:51:02 Error occurred, type: error, text: fatal: ambiguous argument 'prototype': both revision and file
 name
 Use '--' to separate filenames from revisions
D:\Program Files (x86)\Git\bin\git.exe log -n 1 prototype, stackTrace: at Kudu.Core.Infrastructure.Executable.Execut
 e(ITracer tracer, String arguments, Object[] args)
 at Kudu.Core.SourceControl.Git.GitExeRepository.GetChangeSet(String id)
 at Kudu.Services.FetchHandler.<PerformDeployment>d__c.MoveNext()
 --- End of stack trace from previous location where exception was thrown ---
 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
 at Kudu.Services.FetchHandler.<>c__DisplayClass1.<<ProcessRequestAsync>b__0>d__3.MoveNext()
 --- End of stack trace from previous location where exception was thrown ---
 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
 at Kudu.Contracts.Infrastructure.LockExtensions.<TryLockOperationAsync>d__0.MoveNext()
 --- End of stack trace from previous location where exception was thrown ---
 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
 at Kudu.Services.FetchHandler.<ProcessRequestAsync>d__6.MoveNext()
 --- End of stack trace from previous location where exception was thrown ---
 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
 at System.Web.TaskAsyncHelper.EndTask(IAsyncResult ar)
 at System.Web.HttpTaskAsyncHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result)
 at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
 at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
 2013-09-23T17:51:02 Outgoing response, type: response, statusCode: 500, statusText: Internal Server Error, Cache-Con
 trol: private, X-AspNet-Version: 4.0.30319, Content-Type: text/html; charset=utf-8

Fantastic! There’s our error, and with less noise than the xml file that I was earlier confused by.

So that’s my problem:

ambiguous argument 'prototype': both revision and filename
Use '--' to separate filenames from revisions

This means my branch in Github is called ‘prototype’ and I also have a file (technically a folder) called ‘prototype’ in the repository, and this is ambiguous to the deployment system.
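The underlying git behaviour is easy to reproduce locally, away from Kudu entirely. A throwaway sketch (hypothetical temp repo and file names):

```shell
# Reproduce the git ambiguity in a scratch repo
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "initial"
git branch prototype                           # a branch named 'prototype'...
mkdir prototype && touch prototype/index.html  # ...and a folder with the same name
git add prototype
git -c user.email=demo@example.com -c user.name=demo commit -q -m "add prototype folder"

# 'prototype' now names both a revision and a path, so git refuses to guess:
git log -n 1 prototype 2>&1 | grep ambiguous

# '--' disambiguates, but Kudu's generated 'git log -n 1 <branch>' command
# offers no way to inject it, hence the rename workaround below:
git log -n 1 prototype --    # treat 'prototype' as a revision
git log -n 1 -- prototype    # treat 'prototype' as a path
```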

Now, I can’t use ‘--’ to separate filenames from revisions; I don’t have that level of control over the deployment process. But what I do have control over is the branch name and the folder name. I chose to rename the prototype folder:

Then I change my .deployment file to deploy the /proto folder:

[config]
project = proto

Pushing these changes immediately solved my issue, as shown by the continuing Get-AzureWebsiteLog -Name myWebsite -Tail:

2013-09-23T17:54:53 Fetching changes.
2013-09-23T17:54:58 Updating submodules.
2013-09-23T17:55:00 Preparing deployment for commit id '7331a9c9c3'.
2013-09-23T17:55:01 Generating deployment script.
2013-09-23T17:55:01 Using the following command to generate deployment script: 'azure site deploymentscript -y --no-
dot-deployment -r "C:\DWASFiles\Sites\partyr\VirtualDirectory0\site\repository" -o "C:\DWASFiles\Sites\partyr\VirtualDi
rectory0\site\deployments\tools" --basic --sitePath "C:\DWASFiles\Sites\partyr\VirtualDirectory0\site\repository\proto"
'.
2013-09-23T17:55:01 The site directory path: .\proto
2013-09-23T17:55:01 Generating deployment script for Web Site
2013-09-23T17:55:01 Generated deployment script files
2013-09-23T17:55:01 Running deployment command...
2013-09-23T17:55:01 Command: C:\DWASFiles\Sites\partyr\VirtualDirectory0\site\deployments\tools\deploy.cmd
2013-09-23T17:55:01 Handling Basic Web Site deployment.
2013-09-23T17:55:01 KuduSync.NET from: 'C:\DWASFiles\Sites\partyr\VirtualDirectory0\site\repository\proto' to: 'C:\D
WASFiles\Sites\partyr\VirtualDirectory0\site\wwwroot'
2013-09-23T17:55:02 Deleting file: 'hubEventListener.js'
2013-09-23T17:55:02 Deleting file: 'hubEventListener.js.map'
2013-09-23T17:55:02 Deleting file: 'hubEventListener.ts'
2013-09-23T17:55:02 Deleting file: 'readme.txt'
2013-09-23T17:55:02 Copying file: 'index.html'
2013-09-23T17:55:02 Copying file: 'signalr.html'
2013-09-23T17:55:02 Deleting file: 'css\readme.txt'
2013-09-23T17:55:02 Copying file: 'css\bootstrap.min.css'
2013-09-23T17:55:02 Copying file: 'css\foundation.css'
2013-09-23T17:55:02 Copying file: 'css\foundation.min.css'
2013-09-23T17:55:02 Copying file: 'css\normalize.css'
2013-09-23T17:55:02 Copying file: 'css\party.css'
2013-09-23T17:55:02 Copying file: 'css\partyr.css'
2013-09-23T17:55:02 Copying file: 'css\ticker-style.css'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.abide.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.alerts.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.clearing.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.cookie.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.dropdown.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.forms.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.interchange.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.joyride.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.magellan.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.orbit.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.placeholder.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.reveal.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.section.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.tooltips.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.topbar.js'
2013-09-23T17:55:02 Deleting file: 'img\readme.txt'
2013-09-23T17:55:02 Copying file: 'img\asos.png'
2013-09-23T17:55:02 Copying file: 'img\bg.gif'
2013-09-23T17:55:02 Copying file: 'img\draggable.jpg'
2013-09-23T17:55:02 Copying file: 'img\facebook_icon.jpg'
2013-09-23T17:55:02 Copying file: 'img\google_plus_logo.jpg'
2013-09-23T17:55:02 Copying file: 'img\rand1.jpeg'
2013-09-23T17:55:02 Copying file: 'img\rand2.jpeg'
2013-09-23T17:55:02 Copying file: 'img\rand3.jpeg'
2013-09-23T17:55:02 Copying file: 'img\rand4.jpeg'
2013-09-23T17:55:02 Copying file: 'img\rand5.jpg'
2013-09-23T17:55:02 Copying file: 'img\rand6.jpg'
2013-09-23T17:55:02 Copying file: 'img\rand7.jpg'
2013-09-23T17:55:02 Copying file: 'img\rand8.jpg'
2013-09-23T17:55:02 Copying file: 'img\twitter-bird-light-bgs.png'
2013-09-23T17:55:02 Copying file: 'img\voted.png'
2013-09-23T17:55:02 Copying file: 'jasmine\SpecRunner.html'
2013-09-23T17:55:02 Copying file: 'jasmine\lib\jasmine-1.3.1\jasmine-html.js'
2013-09-23T17:55:02 Copying file: 'jasmine\lib\jasmine-1.3.1\jasmine.css'
2013-09-23T17:55:02 Omitting next output lines...
2013-09-23T17:55:03 Finished successfully.
2013-09-23T17:55:03 Deployment successful.

The Kudu guys have already tackled the issue (https://github.com/projectkudu/kudu/issues/785) but the above diagnostics should help some of you.

Happy clouding,
Andy

Super Dev Dogfooding / Freecycling

This one goes out to people who write desktop apps. I’m thinking particularly of heavyweight dev tools such as Visual Studio or Eclipse. These are great tools and shape more of the modern world than they directly touch. The best web applications, globally distributed and touching billions of people daily, are constructed in these tools.

They are amazing. They (and their predecessors) have democratised the world; shaking governments and freeing oppressed masses. Their triumphs have been lauded. Their triumphs have been banned. They are the emancipation of thought to creative engineering ends.

But they won’t keep working once they’ve been installed for more than six months. They become languid and fraught with poor performance. They suffer corruptions, misconfigurations and malfunctions. They show the scars of freeing the creativity of man.

I’ve yet to see a development machine put to use daily that can retain the initial performance of its installation date.

My suggestion; when a dev machine that has been loved is to be replaced – send it to a software vendor who writes the primary technology / IDE that runs on it. Let them see why its performance has degraded.

Then they may fix it.

Who wants in?

The alternative? Migrations from IDE to uIDE. Sublime Text. Brackets. Notepad. Nano. Vi.

Andy

Global Windows Azure Bootcamp

On April 27th 2013, I’m helping to run the London instance of the Global Windows Azure Bootcamp.

It’s a whole day of Windows Azure training, labs, talks and hands on stuff that’s provided free by the community for the community. You can read more about the type of stuff we’re doing here http://magnusmartensson.com/globalwindowsazure-a-truly-global-windows-azure-community-event 

We’re covering .NET, Java and other open stacks on Azure, as well as participating in one of the largest single uses of Azure compute horsepower in the global community’s history. This is going to rock, significantly.

If London is too far, look for a closer event here http://globalwindowsazure.azurewebsites.net/?page_id=151

Happy clouding,

Andy


HDInsight: Workaround error Could not find or load main class

Sometimes when running the C# SDK for HDInsight, you can come across the following error:

The system cannot find the batch label specified – jar
Error: Could not find or load main class c:\apps\dist\hadoop-1.1.0-SNAPSHOT\lib\hadoop-streaming.jar

To get around this, close the command shell that you are currently in and open up a new hadoop shell, and try your command again. It should work immediately.

This tends to occur after killing a hadoop job, so I assume something that activity does changes the context of the command shell in such a way that it can no longer find the hadoop jar files. I’ve yet to get to the bottom of it, so if anyone has any bright ideas, let me know in the comments ;-)

Good Hadoopification,

Andy

HDInsight: Workaround bug when killing Jobs

When running a Streaming Job from the console in HDInsight, you are given a message which describes how to kill the job:

13/01/09 14:52:07 INFO streaming.StreamJob: To kill this job, run:
13/01/09 14:52:07 INFO streaming.StreamJob: C:\apps\dist\hadoop-1.1.0-SNAPSHOT/bin/hadoop job -Dmapred.job.tracker=10.186.136.26:9010 -kill job_201301081702_0001

Unfortunately there is an error in this and it will not work:

c:\apps\dist\hadoop-1.1.0-SNAPSHOT>hadoop job -Dmapred.job.tracker=10.186.136.26:9010 -kill job_201301081702_0014
Usage: JobClient <command> <args>
 [-submit <job-file>]
 [-status <job-id>]
.....

This is because there is an error in the command as written out by the hadoop streaming console. There should be a space between the -D and mapred.job.tracker=ipAddressJobTracker, and furthermore the mapred.job.tracker parameter should be quoted:

c:\apps\dist\hadoop-1.1.0-SNAPSHOT>hadoop job -D "mapred.job.tracker=10.186.136.26:9010" -kill job_201301081702_0001
Killed job job_201301081702_0001

Et voila.

Happy big-dataification ;-)
Andy