Following on from earlier blogs in this series, in this blog I will show another way of getting telemetry out of HDInsight. We have previously tackled counters and logging to the correct context, and now I will show an additional type of logging that is useful while a job is in process; the reporting of a job’s status in process.
This status is a transient record of the state of a Map/Reduce job – it is not persisted beyond the execution of a job and is volatile. Its volatility is evident in that any subsequent set of this status will overwrite the previous value. Consequently, any inspection of this value will only reflect the state of the system when it was inspected, and may already have changed by the time the user gets access to the value it holds.
This does not make the status irrelevant, in fact it gives a glimpse at the system state in process and allows the viewer to immediately determine a recent state. This is useful in its own right.
Controlling the Status
The status of a Map/Reduce job is reported at the Map, Combine and Reduce stages of execution, and the individual Task can report its status.
This status control is not available in the Hadoop C# SDK but is simple to achieve using the Console.Error – see Part 2 of this series for why this stream is used!
In order that we can remain in sync with the general approach taken by the C# Hadoop SDK, we will wrap the call into an extension method on ContextBase, the common base class for Mapper, Combiner and Reducer. With this new method – which we will call SetStatus(string status) – we will write a well known format to the output that Hadoop picks up for its Streaming API, and Hadoop will then use this as the status of the job.
The format of the string is:
This will set the task executing’s state to “Hello, World!”.
Implementing the Extension method
The extension method is very simple:
|1 2 3 4 5 6 7||
Once we have this method referenced and included as a using include to our map/reduce/combine class, we can use the context parameter of our task (in my case I chose a Map), and set a status!
I chose a trivial and unrealisting sleep statement, so that I can see the message easily in the portal
Viewing the output
To view this, you can enter the Hadoop portal and check the state of the running task. In this screenshot you can see that status is set to “Sleeping During Map!” – and you can see why I used Thread.Sleep – to give me a few moments to log onto the RDP server and take the screenshot. I choose 3 minutes so I could also have time to get a coffee ….
A better example, and a very useful use case for this is to report whether a Task is running successfully or whether it is encountering errors that have to be recovered from. In order to achieve this, we will catch any exceptions that can be recovered from, report that we are doing so and then continue execution.
Notice that this is a very similar map to our previous example with Counters, but that we are also setting a state of our Task that is in process.
|1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41||