Azure CDN Presentation from @maartenballiauw

I had a conversation recently with @maartenballiauw about the new Azure CDN and how much of an improvement it can make to serving web content; primarily an improvement in the last mile to your customers.

He’s put together an awesome slide deck that he’s given at TechEd and I thought I’d share it again. On slide 14 there’s a quote from me :)


The quote “The Internet sucks and so does your server” is a simple way to think of the problem domain. Geographic distribution causes unavoidable speed-of-light latency, compounded by the multiple TCP round trips underlying an HTTP request; cookies embedded in each request bloat its weight; browsers limit the number of concurrent requests. Because of things like these, the Internet is weak over distance. Using a CDN fixes this by moving some data (often static or semi-static) close to the requestor. And always remember: even if the internet doesn’t suck (a closer datacenter, no cookies, future browsers), sending traffic to a web server increases the load on expensive, application-capable servers. You can use the CDN to lower the cost of your app by decreasing the load on important, functional servers and pushing it to a content delivery server.

Happy clouding
Andy 

Bright HDInsight 8: Use Hive Variables with Powershell

This post is part best practice, part workaround.

When submitting Hive jobs to an HDInsight cluster with Powershell, one might consider building a common script, controlling or securing it in a version control system, and then using variable substitution to provide the values required. This post explores how to supply these variables.

Firstly, one can submit a hive job with Invoke-AzureHDInsightHiveJob, Invoke-Hive or simply hive. The last two are simply aliases for the first cmdlet. I will be using “hive” throughout. When I write “hive” I am remotely executing from powershell, via Templeton, to a remote HDInsight cluster. POWER!
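
You can confirm the aliasing for yourself with a quick check (assuming the Azure module is loaded):

Get-Alias -Name hive, Invoke-Hive | Format-Table Name, Definition

Both should resolve to Invoke-AzureHDInsightHiveJob.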

Hive allows you to supply variables straight off the command line, but when using Powershell we’ll have to wrap them up. We want to get as close as possible to the following style of argument passing:

hive -hiveconf state='California' -f /hive.hql

In this example, we pass a hive config value of “California” for the key “state”. We can imagine a query like the below using this setting:

select count(*) from hivesampletable where state='${hiveconf:state}'

As you can see, to load the value out of the -hiveconf flag, we use ${hiveconf:keyname} as a convention.

The Hive cmdlet is described by Get-Help hive as:

NAME
    Invoke-AzureHDInsightHiveJob

SYNTAX
    Invoke-AzureHDInsightHiveJob [[-Query] <String>] [-Arguments <String[]>] [-Defines <Hashtable>] [-File <String>] [-Files <String[]>] [-JobName <String>] [-StatusFolder <String>] [<CommonParameters>]

ALIASES
    Invoke-Hive
    hive

REMARKS
    None

It’s clear that we can only use “-Arguments” or “-Defines” to affect the command line; -File and -Files supply scripts, while -JobName and -StatusFolder provide metadata about job execution.

Without clear guidance in the powershell help, it took another Hive command to come to the rescue. The “set” command lets us see all the variables defined for the executing script. Using it in conjunction with both the -Defines and -Arguments flags lets us see how each affects the runtime environment:
PS Z:\src\powershell> hive "set" -Arguments @("argument=123") -Defines @{ "defines"="abc" }
Submitting Hive query..
Started Hive query with jobDetails Id : job_201405061100_0084
Hive query completed Successfully
*snip*
defines=abc
*snip*
env:_arguments=C:\apps\dist\hive-0.11.0.1.3.7.1-01293\lib\hive-cli-0.11.0.1.3.7.1-01293.jar org.apache.hadoop.hive.cli.CliDriver -hiveconf hive.querylog.location=C:\apps\dist\hive-0.11.0.1.3.7.1-01293\logs\history -hiveconf hive.log.dir=C:\apps\dist\hive-0.11.0.1.3.7.1-01293\logs --hiveconf "mapreduce.job.credentials.binary=c:/apps/temp/hdfs/mapred/local/taskTracker/ecommerce/jobcache/job_201405061100_0084/jobToken" --hiveconf "hive.metastore.local=false" --hiveconf "hive.metastore.uris=thrift://headnodehost:9083" --hiveconf "hive.metastore.warehouse.dir=/apps/hive/warehouse" --hiveconf "defines=abc" --hiveconf "hdInsightJobName=Hive: 451d5737109c47e79bfd" "argument=123" -f 451d5737109c47e79bfda2f5bcb9da23.hql
*snip*
system:defines=abc

The important things to notice here are that the -Arguments flag is appended after the job name and without a --hiveconf prefix (attempting to add one causes an error), whereas -Defines creates a system-level variable which can be accessed. One caveat: there are many reserved system variables, and you must be careful not to clash with them.
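
Before settling on a key name, you can dump the runtime variables and search them for a candidate (a sketch, assuming the hive cmdlet emits the job output to the pipeline as strings, as it appears to above):

# Look for any existing variable mentioning "state=" before reusing that key
hive "set" | Select-String -Pattern "state="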

So if we run either of these two queries, we can achieve what we want:

hive "select count(*) from hivesampletable where state='`${system:state}'" -Defines @{"state"="California"}
hive "select count(*) from hivesampletable where state='`${hiveconf:state}'" -Defines @{"state"="California"}
Submitting Hive query..
Started Hive query with jobDetails Id : job_201405061100_0085
Hive query completed Successfully
6881

Beyond this, we can abstract the hive statement into a file, upload it to the root of our HDInsight container, and then submit a much more terse statement, reusing our logic now externalised into a file.

PS Z:\src\powershell> hive -File /hive.hql -Defines @{"state"="California"}
Submitting Hive query..
Started Hive query with jobDetails Id : job_201405061100_0087
Hive query completed Successfully
6881

The statement hive -File /hive.hql -Defines @{"state"="California"} is quite close to the original use case, hive -f /hive.hql -hiveconf "state=California"!
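
To round this off, the pattern wraps up neatly into a reusable helper (a hypothetical sketch; the function name and parameters are mine, not part of the Azure module):

function Invoke-ParameterisedHive {
    param(
        [string]$ScriptFile,   # path to the .hql file in the cluster's container
        [hashtable]$Variables  # key/value pairs surfaced as ${hiveconf:key}
    )
    # Wrap the -File + -Defines pattern so a version-controlled .hql script
    # can be reused with different substitution values
    Invoke-AzureHDInsightHiveJob -File $ScriptFile -Defines $Variables
}

# Equivalent to: hive -f /hive.hql -hiveconf "state=California"
Invoke-ParameterisedHive -ScriptFile "/hive.hql" -Variables @{ "state" = "California" }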

Bright HDInsight 7: Customise Your Cluster

HDInsight, as a clustered, cloud-backed resource, lets you offload the management overhead of cluster maintenance, tuning, patching and other sysadmin-type operations. The flipside is that for the management overhead you shed, you pay the price of a lower level of control. You cannot elevate your user context to administrative privileges. You cannot expect tweaks applied after a cluster has been created to be permanent. Certain operations inside Azure will cause “reimaging” events, where cluster members are returned to their initial state. This is in line with all clouds offering PaaS compute. So that this loss of control does not hamper the users of HDInsight, Microsoft have supplied a comprehensive cluster customisation capability at provision time.

Using this capability, it is possible to affect core Hadoop configuration settings in:

  • core-site.xml
  • mapred-site.xml
  • hdfs-site.xml

These should always be used in preference to making manual changes after connecting over RDP.

The code for achieving this is as straightforward as the previous examples. Simply create objects to represent the configuration settings required at provision time, and submit them with the provisioning request. Again using Powershell as our exemplar language, we can achieve this with 4 simple commands.

$coreConfig = @{
    # register additional compression codecs and raise the sort buffer size
    "io.compression.codec" = "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec";
    "io.sort.mb" = "2048";
}

$mapredConfig = New-Object 'Microsoft.WindowsAzure.Management.HDInsight.Cmdlet.DataObjects.AzureHDInsightMapReduceConfiguration'
$mapredConfig.Configuration = @{
    "mapred.tasktracker.map.tasks.maximum" = "4";
}

$clusterConfig = New-AzureHDInsightClusterConfig -ClusterSizeInNodes 64

$clusterConfig = $clusterConfig | Add-AzureHDInsightConfigValues -Core $coreConfig -MapReduce $mapredConfig

# $clusterCreds, $location and $clusterName are assumed to have been defined earlier
$clusterConfig | New-AzureHDInsightCluster -Credential $clusterCreds -Location $location -Name $clusterName -Verbose -ErrorAction Stop
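
Once the cluster is up, a quick way to sanity-check that the overrides took effect is to reuse the “set” trick from the Hive variables post (a sketch, assuming the relevant subscription is already selected):

# Point the Hive cmdlets at the new cluster, then query a single setting;
# the output should include io.sort.mb=2048
Use-AzureHDInsightCluster -Name $clusterName
Invoke-AzureHDInsightHiveJob -Query "set io.sort.mb"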

Happy Hadooping!

Hidden MVP Benefits

I’m an MVP for another year. I’m happy about that, but this blog isn’t going to be a boring bleat about it. Just a nice story about the internet.

TL;DR – I lost my wallet and got it back via the internet.

Lost & Found

I lost my wallet near where I live a few days ago. Late at night; wallet in jacket pocket; warm weather so slung my jacket over my shoulder and plop I guess. Anyway, I reported it missing with the railway peeps (it had a season ticket in it) and did the irritating cancellation of all my bank cards thing.

The wallet was quite a cherished gift, so I was upset but had no expectation to get the thing back.

One day I got a message on Twitter from a chap who wanted me to follow him. I did, and then got a DM asking me to come along to a police station (eek!) as they had some lost property of mine. When I went in, the guy told me the story of how he found me; my wallet held no clear identification, just some cards etc.

He’d called the bank hoping to get some contact info but as always they refused to help (since my privacy is clearly more important than retrieving my property ffs).

He’d called the train company but they couldn’t find my season ticket on their system (train companies in England have steam powered computers and electric trains; contrary to stereotype).

He’d called my gym but they didn’t have a current address or contact number (I really should go to the gym more… :P ).

Then he found my MVP card; did a web search for my name and MVP and found my MVP profile page on Microsoft’s site. Listed on there was my Twitter handle, and he used that to get in touch.

So wow; got my stuff back via the power of the MVP program.

Thanks guys ;-)

Diagnosing a Windows Azure Website Github integration error

Yesterday I experienced an issue when trying to integrate a Windows Azure Website with Github. Specifically, my code would deploy from the master branch, but if I chose another branch called ‘prototype’ I received a fetch error in the Windows Azure Management Portal:

This error has been reported to the team and I’m sure will be rectified so nobody else will run into it, but at Cory Fowler’s (@syntaxC4) prompting I wanted to document the steps I took to debug this as these steps may be useful to anyone struggling to debug a Windows Azure Website integration.

Scenario

In my scenario I had a project with a series of subfolders in my github repo. The project had progressed from a prototype to a full build, but we were required to persist the prototype for design reference. We could have created ‘prototype’ without changing the solution structure, but as in all real-world scenarios, the requirement to keep the prototype available emerged only after we had removed it and changed the URL structure. We were only happy to continue working on new code if we could label the prototype or somehow leave it in a static state while the codebase moved on. This requirement is easily tackled by Windows Azure Websites and its Github integration; we changed the solution structure to have subfolders, created a new branch ‘prototype’ and continued our main work in ‘master’. Our ‘master’ branch has the additional benefit of having the prototype available for reference and quick application of code changes if we want to pivot our approach.

We then created two Windows Azure Websites (for free, wow!). In order to allow Windows Azure Websites to deploy the correct code for each, we created a .deployment file in each branch. In this .deployment file we inform Windows Azure Websites (through its Kudu deployment mechanism) that it should perform a custom deployment.

For the ‘master’ branch we want to deploy the /client folder, which involves a simple .deployment file containing

[config]
project = client

For the ‘prototype’ branch we want to deploy the /prototype folder, which involves a simple .deployment file containing

[config]
project = prototype

As you can see, these two branches then can evolve independently (although the prototype should be static).
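
For reference, the branch layout described above takes only a couple of commands (a sketch, run from the repo root; the remote name is assumed to be origin):

git checkout -b prototype    # freeze the prototype on its own branch
git push origin prototype
git checkout master          # main work continues here

Each branch then carries its own .deployment file pointing at its own folder.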

Problems Start

The problems began when I tried to create a Windows Azure Website and integrate it with Github for the ‘prototype’ branch. No matter what I did, I couldn’t get the Github fetch to work:

At this point I fired off an email to some guy, and David Ebbo (@davidebbo) prompted me to stop being lazy and look for some deployment logs. Powershell is your friend when it comes to debugging Windows Azure Websites, so I started there.

The first thing to do is to get the logs using ‘Save-AzureWebsiteLog’:

PS C:\> Save-AzureWebsiteLog -Name partyr
Save-AzureWebsiteLog : Access to the path 'C:\logs.zip' is denied.
At line:1 char:1
+ Save-AzureWebsiteLog -Name partyr
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 + CategoryInfo : CloseError: (:) [Save-AzureWebsiteLog], UnauthorizedAccessException
 + FullyQualifiedErrorId : Microsoft.WindowsAzure.Management.Websites.SaveAzureWebsiteLogCommand

Oops, helps if pwd is somewhere writable by the current user…

PS C:\> cd temp
PS C:\temp> Save-AzureWebsiteLog -Name myWebsite
PS C:\temp> ls
 Directory: C:\temp
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 23/09/2013 18:48 24406 logs.zip

Ok great. We have some logs. Let’s take a look!

Inside the zip at the location: /deployments/temp-59fd85ea/ are two files, log.xml and status.xml. These didn’t prove very useful :-)

Log.xml:

 <?xml version="1.0" encoding="utf-8"?>
 <entries>
 <entry time="2013-09-23T17:45:38.0768333Z" id="e7f9db74-a9e5-4738-93ee-028d051b6fd6" type="0">
 <message>Fetching changes.</message>
 </entry>
 </entries>

Status.xml:

 <?xml version="1.0" encoding="utf-8"?>
 <deployment>
 <id>temp-59fd85ea</id>
 <author>N/A</author>
 <deployer>GitHub</deployer>
 <authorEmail>N/A</authorEmail>
 <message>Fetch from git@github.com:elastacloud/asosmyWebsite.git</message>
 <progress></progress>
 <status>Failed</status>
 <statusText></statusText>
 <lastSuccessEndTime />
 <receivedTime>2013-09-23T17:45:37.9987137Z</receivedTime>
 <startTime>2013-09-23T17:45:37.9987137Z</startTime>
 <endTime>2013-09-23T17:45:40.8578955Z</endTime>
 <complete>True</complete>
 <is_temp>True</is_temp>
 <is_readonly>False</is_readonly>
 </deployment>

In the zip file at the location /LogFiles/Git/trace is a file that has much more useful information.

Part way down this encoded xml file is the error:

 <step title="Error occurred" date="09/23 17:16:12" type="error" text="fatal: ambiguous argument 'prototype': both revision and filename&#xA;Use '--' to separate filenames from revisions&#xA;&#xD;&#xA;D:\Program Files (x86)\Git\bin\git.exe log -n 1 prototype" stackTrace=" at Kudu.Core.Infrastructure.Executable.Execute(ITracer tracer, String arguments, Object[] args)&#xD;&#xA; at Kudu.Core.SourceControl.Git.GitExeRepository.GetChangeSet(String id)&#xD;&#xA; at Kudu.Services.FetchHandler.&lt;PerformDeployment&gt;d__c.MoveNext()&#xD;&#xA;--- End of stack trace from previous location where exception was thrown ---&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)&#xD;&#xA; at Kudu.Services.FetchHandler.&lt;&gt;c__DisplayClass1.&lt;&lt;ProcessRequestAsync&gt;b__0&gt;d__3.MoveNext()&#xD;&#xA;--- End of stack trace from previous location where exception was thrown ---&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)&#xD;&#xA; at Kudu.Contracts.Infrastructure.LockExtensions.&lt;TryLockOperationAsync&gt;d__0.MoveNext()&#xD;&#xA;--- End of stack trace from previous location where exception was thrown ---&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)&#xD;&#xA; at Kudu.Services.FetchHandler.&lt;ProcessRequestAsync&gt;d__6.MoveNext()&#xD;&#xA;--- End of stack trace from previous location where exception was thrown ---&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)&#xD;&#xA; at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)&#xD;&#xA; at System.Web.TaskAsyncHelper.EndTask(IAsyncResult ar)&#xD;&#xA; at System.Web.HttpTaskAsyncHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result)&#xD;&#xA; at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()&#xD;&#xA; at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean&amp; completedSynchronously)" elapsed="0" />
 <step title="Outgoing response" date="09/23 17:16:12" type="response" statusCode="500" statusText="Internal Server Error" Cache-Control="private" X-AspNet-Version="4.0.30319" Content-Type="text/html; charset=utf-8" elapsed="0" />
 </step>

I missed this at first amongst all the noise in this file. What I did instead was give up on notepad and xml and run a different powershell command: Get-AzureWebsiteLog -Name myWebsite -Tail, which connects powershell to a real-time stream of the website log. Really really neat.

Clicking the sync button under Deployments in the Windows Azure Websites Management Portal immediately showed activity in the Powershell console:

PS C:\temp> Get-AzureWebsiteLog -Name myWebsite -Tail
 2013-09-23T17:51:01 Welcome, you are now connected to log-streaming service.
 2013-09-23T17:51:02 Error occurred, type: error, text: fatal: ambiguous argument 'prototype': both revision and file
 name
 Use '--' to separate filenames from revisions
D:\Program Files (x86)\Git\bin\git.exe log -n 1 prototype, stackTrace: at Kudu.Core.Infrastructure.Executable.Execut
 e(ITracer tracer, String arguments, Object[] args)
 at Kudu.Core.SourceControl.Git.GitExeRepository.GetChangeSet(String id)
 at Kudu.Services.FetchHandler.<PerformDeployment>d__c.MoveNext()
 --- End of stack trace from previous location where exception was thrown ---
 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
 at Kudu.Services.FetchHandler.<>c__DisplayClass1.<<ProcessRequestAsync>b__0>d__3.MoveNext()
 --- End of stack trace from previous location where exception was thrown ---
 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
 at Kudu.Contracts.Infrastructure.LockExtensions.<TryLockOperationAsync>d__0.MoveNext()
 --- End of stack trace from previous location where exception was thrown ---
 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
 at Kudu.Services.FetchHandler.<ProcessRequestAsync>d__6.MoveNext()
 --- End of stack trace from previous location where exception was thrown ---
 at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
 at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
 at System.Web.TaskAsyncHelper.EndTask(IAsyncResult ar)
 at System.Web.HttpTaskAsyncHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result)
 at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
 at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
 2013-09-23T17:51:02 Outgoing response, type: response, statusCode: 500, statusText: Internal Server Error, Cache-Con
 trol: private, X-AspNet-Version: 4.0.30319, Content-Type: text/html; charset=utf-8

Fantastic! There’s our error, and with less noise than the xml file that I was earlier confused by.

So that’s my problem:

fatal: ambiguous argument 'prototype': both revision and filename
Use '--' to separate filenames from revisions

This means that my branch in Github is called ‘prototype’ and I also have a file (a folder, technically) called ‘prototype’, which is ambiguous to the deployment system.
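
You can reproduce the ambiguity locally in any repo that has both a branch and a folder named ‘prototype’; this is the exact command Kudu ran, per the trace above:

PS C:\temp> git log -n 1 prototype
fatal: ambiguous argument 'prototype': both revision and filename
Use '--' to separate filenames from revisions
PS C:\temp> git log -n 1 prototype --

Appending -- marks everything before it as a revision, so git happily reads ‘prototype’ as the branch.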

Now, I can’t use ‘--’ to separate filenames from revisions – I don’t have that level of control over the deployment process. But what I do have control over is the branch name and the folder name. I chose to rename the prototype folder to ‘proto’:
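
The rename itself is a couple of git commands (a sketch; the commit message is mine):

git mv prototype proto
git commit -m "Rename prototype folder to proto to avoid clash with branch name"
git push origin prototype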

Then I change my .deployment file to deploy the /proto folder:

[config]
project = proto

Pushing these changes immediately solved my issue, as shown by the continuing Get-AzureWebsiteLog -Name myWebsite -Tail:

2013-09-23T17:54:53 Fetching changes.
2013-09-23T17:54:58 Updating submodules.
2013-09-23T17:55:00 Preparing deployment for commit id '7331a9c9c3'.
2013-09-23T17:55:01 Generating deployment script.
2013-09-23T17:55:01 Using the following command to generate deployment script: 'azure site deploymentscript -y --no-
dot-deployment -r "C:\DWASFiles\Sites\partyr\VirtualDirectory0\site\repository" -o "C:\DWASFiles\Sites\partyr\VirtualDi
rectory0\site\deployments\tools" --basic --sitePath "C:\DWASFiles\Sites\partyr\VirtualDirectory0\site\repository\proto"
'.
2013-09-23T17:55:01 The site directory path: .\proto
2013-09-23T17:55:01 Generating deployment script for Web Site
2013-09-23T17:55:01 Generated deployment script files
2013-09-23T17:55:01 Running deployment command...
2013-09-23T17:55:01 Command: C:\DWASFiles\Sites\partyr\VirtualDirectory0\site\deployments\tools\deploy.cmd
2013-09-23T17:55:01 Handling Basic Web Site deployment.
2013-09-23T17:55:01 KuduSync.NET from: 'C:\DWASFiles\Sites\partyr\VirtualDirectory0\site\repository\proto' to: 'C:\D
WASFiles\Sites\partyr\VirtualDirectory0\site\wwwroot'
2013-09-23T17:55:02 Deleting file: 'hubEventListener.js'
2013-09-23T17:55:02 Deleting file: 'hubEventListener.js.map'
2013-09-23T17:55:02 Deleting file: 'hubEventListener.ts'
2013-09-23T17:55:02 Deleting file: 'readme.txt'
2013-09-23T17:55:02 Copying file: 'index.html'
2013-09-23T17:55:02 Copying file: 'signalr.html'
2013-09-23T17:55:02 Deleting file: 'css\readme.txt'
2013-09-23T17:55:02 Copying file: 'css\bootstrap.min.css'
2013-09-23T17:55:02 Copying file: 'css\foundation.css'
2013-09-23T17:55:02 Copying file: 'css\foundation.min.css'
2013-09-23T17:55:02 Copying file: 'css\normalize.css'
2013-09-23T17:55:02 Copying file: 'css\party.css'
2013-09-23T17:55:02 Copying file: 'css\partyr.css'
2013-09-23T17:55:02 Copying file: 'css\ticker-style.css'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.abide.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.alerts.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.clearing.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.cookie.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.dropdown.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.forms.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.interchange.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.joyride.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.magellan.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.orbit.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.placeholder.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.reveal.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.section.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.tooltips.js'
2013-09-23T17:55:02 Copying file: 'foundation\foundation.topbar.js'
2013-09-23T17:55:02 Deleting file: 'img\readme.txt'
2013-09-23T17:55:02 Copying file: 'img\asos.png'
2013-09-23T17:55:02 Copying file: 'img\bg.gif'
2013-09-23T17:55:02 Copying file: 'img\draggable.jpg'
2013-09-23T17:55:02 Copying file: 'img\facebook_icon.jpg'
2013-09-23T17:55:02 Copying file: 'img\google_plus_logo.jpg'
2013-09-23T17:55:02 Copying file: 'img\rand1.jpeg'
2013-09-23T17:55:02 Copying file: 'img\rand2.jpeg'
2013-09-23T17:55:02 Copying file: 'img\rand3.jpeg'
2013-09-23T17:55:02 Copying file: 'img\rand4.jpeg'
2013-09-23T17:55:02 Copying file: 'img\rand5.jpg'
2013-09-23T17:55:02 Copying file: 'img\rand6.jpg'
2013-09-23T17:55:02 Copying file: 'img\rand7.jpg'
2013-09-23T17:55:02 Copying file: 'img\rand8.jpg'
2013-09-23T17:55:02 Copying file: 'img\twitter-bird-light-bgs.png'
2013-09-23T17:55:02 Copying file: 'img\voted.png'
2013-09-23T17:55:02 Copying file: 'jasmine\SpecRunner.html'
2013-09-23T17:55:02 Copying file: 'jasmine\lib\jasmine-1.3.1\jasmine-html.js'
2013-09-23T17:55:02 Copying file: 'jasmine\lib\jasmine-1.3.1\jasmine.css'
2013-09-23T17:55:02 Omitting next output lines...
2013-09-23T17:55:03 Finished successfully.
2013-09-23T17:55:03 Deployment successful.

The Kudu guys have already tackled the issue (https://github.com/projectkudu/kudu/issues/785) but the above diagnostics should help some of you.

Happy clouding,
Andy

When Windows Hadoop Streaming forgets how quotes work …

Very short post.

Hadoop streaming works on the command line. When you want to pass file paths to your job, and those paths contain spaces, you need to quote the input parameters.

Typically, if you RDP onto an HDInsight instance to do this, you will double-click the “Hadoop Command Prompt” shortcut on the desktop, which is a shortcut to C:\apps\dist\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd.

As of this morning, it seems that hadoop.cmd no longer handles double quotes (") in the way it did before, which allowed one to delimit arguments using them.

E.g.

> HadoopProgram.exe "/My Folder/*/*/*/*.gz"

returns the error:

ERROR security.UserGroupInformation: PriviledgedActionException as:admin cause:org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist: asv://mycontainer@myclusterhdinsight.blob.core.windows.net/user/admin/GÇ£/My
13/08/02 09:12:12 ERROR streaming.StreamJob: Error Launching job : Input path does not exist: asv://mycontainer@myclusterhdinsight.blob.core.windows.net/user/admin/GÇ£/My
Streaming Command Failed!

Note the GÇ£ in the rejected path: that’s the opening quote character being passed through literally rather than treated as a delimiter.

To fix it, change the command line to handle double quotes. You can do so by creating a new shell within your existing hadoop shell with:

cmd /S

Then the exact same command above will run successfully.
Thanks to my bro Simon Perrott for sharing this lovely experience with me ;-)

Super Dev Dogfooding / Freecycling

This one goes out to people who write desktop apps. I’m thinking particularly of heavy weight dev tools such as Visual Studio or Eclipse. These are great tools and shape more of the modern world than they directly touch. The best web applications, globally distributed and touching billions of people daily, are constructed in these tools.

They are amazing. They (and their predecessors) have democratised the world; shaking governments and freeing oppressed masses. Their triumphs have been lauded. Their triumphs have been banned. They are the emancipation of thought to creative engineering ends.

But they won’t keep working once they’ve been installed for more than 6 months. They become languid and fraught with poor performance. They suffer corruptions, misconfigurations and malfunctions. They show the scars of freeing the creativity of man.

I’ve yet to see a development machine put to use daily that can retain the initial performance of its installation date.

My suggestion: when a dev machine that has been loved is due to be replaced, send it to the software vendor who writes the primary technology / IDE that runs on it. Let them see why its performance has degraded.

Then they may fix it.

Who wants in?

The alternative? Migrations from IDE to uIDE. Sublime Text. Brackets. Notepad. Nano. Vi.

Andy

Windows Azure, 100% business class

Last week I was with one of Elastacloud’s Big Data customers discussing updating an internal visualisation from around a year ago and looking at how we can turbo charge this. The javascript heavy visualisation shows real time data – a fire hose from their analytics capture partner streams a huge amount of data into a website and a bit of client side work turns that into a list of what’s happening on the site in real time. Pretty nifty.

Sketching out how we could take this from 2012 to 2013 with these industry luminaries was great fun. 2012 was their “year of awakening” to the power of the Cloud and Windows Azure as the best example thereof, giving me the opportunity to take the original idea and add in cloud power. The most amazing part of the cloud is that you don’t need to sketch out on paper. If you’re doing that, you’re missing the point. You sketch it out with compute, running in the cloud as building blocks to your end goal. This isn’t flying economy, it’s flying 100% business class.

We stuck their code in Github in a private repo. We made a few surgical code edits to remove hard coded urls and other nonscalable original design decisions. Using Windows Azure Websites, we replicated their existing (php) infrastructure. We got their system running in minutes with a live link between Github and Windows Azure Websites.

We built a Linux HPC cluster on Windows Azure, empowering their existing R skillset across multiple cores; exposing services that can be consumed to calculate machine learning based algorithms in near realtime, augmenting the live stream firehose with near real time compute – only possible by having hundreds of cores of Linux on Windows Azure at our disposal.

Using Elastacloud’s Azure VM management suite, we can react to the firehose; we can make the Big Compute smarter from the outside world.

A little javascript later and we had upgraded 2012’s dumb javascript display to 2013’s cloud-powered, machine-learning, real-time data analysis.

The internal visualisation has a limited use case – it’s used at their bigwig meetings. So we spun it up, showed it working, turned it off. Since this is all pay-as-you-go we could tell them how long and how much that sketching out had cost them, and exactly how much it would cost ongoing if they wanted the same power for other things.

Those bigwigs can now solve business challenges with real compute assets. “My recommendations cost me £100 a day but make me £1000 a day…” 100% business class.

Andy@elastacloud.com