McCarthy was self-righteous too

Brendan Eich, inventor of JavaScript, just resigned from his brand new position as CEO of the Mozilla foundation, after it was discovered he made a $1000 donation to the anti-gay-marriage campaign in California known as Prop 8.

That discovery caused uproar among the self-righteous bien-pensants who work for Mozilla, and a number of employees posted tweets about how they thought he should resign.

I'm angry about this because this isn't very different from McCarthyism in reverse. A guy was forced out of a job because his political views don't agree with the majority.

I feel opposing gay marriage is bigoted, wrong, indefensible and on the wrong side of history. I don't know Eich. For all I know he's a raging asshole with ultra-right-wing views. He might even hate kittens and burp at the dinner table. I don't know.

But what I do know is that getting forced out of a job by a self-righteous San Francisco mob of entitled nerds who have probably never even seen a Republican in the flesh is just as indefensible. It's not what America and California are about. And it shows liberals can be assholes, too, when they put their minds to it.

I'd venture to say a very large number of CEOs are raging right-wing Republicans with questionable ethics. If you don't like your CEO's politics, you're free to work somewhere else. Your job isn't in grave danger if you and your CEO don't see eye-to-eye in terms of politics--there are laws on the books protecting you from discrimination. Why should your CEO's job be in jeopardy for that very same reason?

Eich's contributions to Web tech are immense and he may well be as capable as anyone of running Mozilla, a company he's been with for years. Yet he lost his job because of his politics. And that's not right, whether you agree with him or not.


Notes on Azkaban

I've been evaluating tools to run data processing jobs and narrowed my list down to Luigi and Azkaban.

I nixed Luigi for a number of reasons:
  • you can't execute jobs from the web UI.
  • you can't schedule jobs--you still have to use cron and all the bs that goes with that (manually managing overlap, e.g. what should I do when my first job is still running when the second job is scheduled to start?).
  • the documentation is horrid.
So far so good. Azkaban does have some quirks I'm working through. For example:
  • the executor.host property is not in the default config. The web component wisely defaults to localhost, but it would be handy to have it in the default config, even commented out (like many other properties) so you can run Azkaban in its preferred distributed mode without having to look through Google Groups questions.
  • I still can't figure out how to set up the host configuration for the Hadoop cluster Azkaban is supposed to talk to. Fixed--see below.
But the UI is intuitive and it handles the overlap issue (for a given job flow) like a champ: if a job is schedule or run while it's still running, you can tell Azkaban to abort the second run, let it run in parallel, or wait until the first run completes before the second run starts.

Things to remember:
  • Hadoop and Hive must be installed on the job executor box. Not running, just installed, with the standard HADOOP_HOME and HIVE_HOME env vars set, etc. 
  • Then you have to put your actual cluster's config files in the executor server's Hadoop config directory (typically $HADOOP_HOME/conf). This is because Azkaban looks for your remote Hadoop name node location in its local Hadoop configuration files. 
  • There's a lot of documentation out there based on Hadoop's old (1.x) directory structure. Hadoop 2.x has changed a lot of that and the Jars aren't where you'd expect them. Inspect your classpaths in all the config and properties files used by Azkaban and your Hive jobs. If a Hive job fails, it's a good bet you have a classpath problem, so look at your Azkaban executor server logs (not just the logs in the web interface).
  • The startup and shutdown scripts in bin/ are pretty brittle. Make sure all the directories are set correctly and you handle errors if they're not. Also, make sure to run them as bin/script.sh instead of cd bin; ./script.sh because they rely on relative directories.
  • Remember to open port 12321 in the executor server's firewall so the web server can submit jobs.
  • Remember to open port 9000 on your master Hadoop node so Azkaban can submit jobs to it. 
  • One project == one flow. If you upload more than one flow into a project, only the last one is retained. 
  • You can't run the Hive job examples as-is. They won't work because they're missing a few properties.
This is the basic Hive word count job example:


In order for it to work, it needs to look like this (differences that matter are bolded):


I'm using Azkaban 2.1 and Hadoop 1.2.1. Different versions will have different paths and classpaths, but the azk.hive.action, classpath and hive.query.file are crucial (hive.script doesn't work).

Other things that don't work out of the box:

  • When proxying to use the hadoop user of your choice, you need to set the security manager class to its fully qualified name. The sample config does not fully qualify the class name and so the executor fails to load. To wit:
# hadoop security manager setting common to hadoop jobs

should be

# hadoop security manager setting common to hadoop jobs hadoop.security.manager.class=azkaban.security.HadoopSecurityManager_H_1_0

in the plugins/jobtypes/commonprivate.properties file.



The deepest cruelty of OCD is that the D stands for disorder.


A painless git workflow

I was thrown into the Rails and git worlds a couple of years ago when joining Crunched, and while Ruby was a fairly painless experience, git was heinous for about a year. Then it clicked, mostly thanks to Luke and TJ and a good workflow. So I figured I'd share what we settled on. Maybe it'll same someone some grief.

Static PNGs below:


Engine Tard

Today I got an email from Engine Yard asking me to be an Engine Yard ambassador / promoter / evangelist. Given our painful experiences with EY, I said no thanks. To their credit, they asked why. So I sent them this list, and will document it here for LOLs and posterity.

What we didn't like about EY:
  • your pricing is unjustifiably high considering the shortcomings below.
  • your approach to database replication is facepalm-inducing. It's explicitly designed *not* to be used to fail over. I lost count of how many times I said WTF.
  • getting SSL to work properly with all the right headers is unnecessarily painful (stunnel).
  • there's no API to scale up and down by script--everything has to be done manually. WTF again, big time.
  • you've fixed this (I hope), but for a long time, removing instances from an environment did not remove those from haproxy, which meant that unless you ran the recipes manually, the app master would still be sending traffic to instances that were either turned off or assigned to another customer of yours, resulting in 404s and other hilarious situations. I sang a Viking song of battle and sorrow when I realized that's what was happening and you guys confirmed it (after I was done laughing).
  • when adding or removing instances from your web UI, the number of failures is greater than the number of successful changes performed. More than half the time we would have to re-run the add or remove process for the instances to be added successfully. This is for a completely vanilla Rails app with a tiny number of servers, i.e. the default / base case you guys are catering to.
  • when adding more than a couple of instances, the app master's NIC gets flooded by requests used to provision the new instances, which brings the entire stack down. We LOLed heartily when we figured out that was going on. The only safe way to add instances is one or two at a time. Given how long it takes to do that (see above re. failures), it's a giant pain in the rear. 
  • the default stack of app master + slaves is stupid. An active app master serving traffic and SSL termination shouldn't also be a load balancer. That's just dumb.
To be perfectly blunt, I really like the idea of your service, and I'm sure it's great for people to get started with hosting, but every time we tried to do something with our stack, it felt like EngineYard was designed and operated by amateurs who don't have any experience running, let alone hosting, or offering hosting for, a real web business. I wouldn't use EY even if there was 0 markup over AWS.
PS: it got so bad we started calling you guys EngineTard and I drew the attached. Note my Photoshop skills are not particularly advanced.

Update: I have to say EY has class and a sense of humor. After receiving this diatribe, they sent me a $25 Amazon gift card.

Update: One thing I forgot to mention is the frequent billing errors. We cancelled our account on April 2; in May I got a bill for usage we didn't incur, and on July 1 we got a bill for snapshot storage and unused IP addresses for an account that's been closed for 3 months. Sigh.



If you use the words "ridiculous" or "ridiculously" as an intensive ("it's ridiculously easy") then we have nothing to say to each other.



Remember how 2012 was all about growth hacking? Well, there's something even cooler this year.

It's the internet of things.

Things! Internets!

Look--we all love the internets, and there's things everywhere. Like, literally, all over the place, no matter where you look, there's just a bunch of things. And that means what? A business opportunity, that's what. We're going to build the internet of things.

Freaking genius idea, am I right or am I right?

Silicon Valley is filled with some of the most brilliant, creative people on the planet. It's also a vortex of groupthink, bandwagon-jumping and marketing hype. It's a bit of a shame, really.


Managing resources

Today, I reached a new milestone in my life-long attempts to improve my resource management and planning skills, and I wanted to share.

I wanted a bagel for lunch, and found that not only did I have exactly one bagel, I also had exactly enough cream cheese to cover that bagel. Exactly. No tiny amount left over to dry out in the container, and not so little I had to spread the cream cheese very thin onto the bagel.

I may never achieve this again, but one must keep trying.


Why old media always loses, part MMMMMCL

Last night, I was checking out Hulu (no link for those idiots) on my office PC and was happy to see a show I watch had a new episode. So I went upstairs, hooked up the tablet to the TV and got ready to watch.

But the new episode wasn't showing.

Heck, the whole show was completely MIA.

After muttering "WTF" a few times, I drilled down into my options in the Hulu app and saw each thumbnail for that show read "Web Only".

I am not making this up. I am allowed to watch a show on my computer, but I am not allowed to watch that same show on my tablet. Both using the same IP address and the same (paid) account, so presumably at home (not, god forbid, daring to watch a show while commuting or at a friend's house).

How exactly does this help anyone? The producers, the actors, the fans, Hulu? Who gains from this bullshit? Nobody.

The people making deals at old media companies need to retire already and put people who actually get the internet in charge.


Who says the enterprise software world isn't eccentric

Get a load of the crazy wild outfits these enterprise software guys are wearing. In public, no less. Pretty bold statement if you ask me.