SquareCog's SquareBlog

Incrementing Hadoop Counters in Apache Pig

Posted in programming by squarecog on December 24, 2010

Information about incrementing Hadoop counters from inside Pig UDFs is not currently well-documented, judging by the user list traffic, so this is a brief note showing how to do that.

Hadoop counters are a way to report basic statistics of a job in Hadoop. I won’t go into a detailed discussion what they are and when to use them here — there’s plenty of information about that on the internet (for starters, see the Cloud9 intro to Counters, and some guidelines for appropriate usage in “Apache Hadoop Best Practices and Anti-Patterns”).

Pig 0.6 and before

Counters were not explicitly supported in Pig 0.6 and before, but you could get at them with this hack (inside a UDF):

Reporter reporter = PigHadoopLogger.getInstance().getReporter()
if (reporter != null) {
  reporter.incrCounter(myEnum, 1L);
}

Pig 0.8
Pig 0.8 has an “official” method for getting and incrementing counters from a UDF:

PigStatusReporter reporter = PigStatusReporter.getInstance();
if (reporter != null) {
   reporter.getCounter(key).increment(incr);
}

You can also get Counters programmatically if you are invoking Pig using PigRunner, and getting a PigStats object on completion. It’s a bit involved:

PigStats.JobGraph jobGraph = pigStats.getJobGraph();
for (JobStats jobStats :  jobGraph) {
  Counters counters = jobStats.getHadoopCounters();
}

Pig 0.7
Unfortunately I don’t know of a way to do this in 0.7, as the old hack went away and the new PigStatusReporter hadn’t been added yet. If you have a trick, please comment.

Watch out for nulls
We’ve observed that sometimes the reporter is null for a bit even when a UDF is executing on the MR side. To deal with this, we added a little helper class PigCounterHelper to Elephant-Bird that buffers the writes in a Map, and flushes them when it gets a non-null counter.

So there. If someone asks about counters in Pig, send them here.

About these ads
Tagged with: ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 25 other followers

%d bloggers like this: