2014-06-11

Cloudera Manager Fails to Restart

I've been experimenting with Cloudera Manager to manage Hadoop clusters on EC2. So far it seems to be working a little better than Ambari, which managed to install its agent software on all my nodes but always failed to start the required services.
Cloudera Manager did fail as well, but that seemed to be due to my security group settings. I changed the configuration and restarted the service on the host, but the restart always failed with this error:

Caused by: java.io.FileNotFoundException: /usr/share/cmf/python/Lib/site$py.class (Permission denied)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at org.hibernate.ejb.packaging.ExplodedJarVisitor.getClassNamesInTree(ExplodedJarVisitor.java:126)
        at org.hibernate.ejb.packaging.ExplodedJarVisitor.getClassNamesInTree(ExplodedJarVisitor.java:134)
        at org.hibernate.ejb.packaging.ExplodedJarVisitor.getClassNamesInTree(ExplodedJarVisitor.java:134)
        at org.hibernate.ejb.packaging.ExplodedJarVisitor.doProcessElements(ExplodedJarVisitor.java:92)
        at org.hibernate.ejb.packaging.AbstractJarVisitor.getMatchingEntries(AbstractJarVisitor.java:149)
        at org.hibernate.ejb.packaging.NativeScanner.getClassesInJar(NativeScanner.java:128)
        ... 31 more

Odd, I thought, since by default the service runs as root and should have free rein. So I poked around in that Python library directory, and lo and behold, the .class files there had restrictive permissions.

I ran chmod 644 on the .class files (in /usr/share/cmf/python/Lib and /usr/share/cmf/python/Lib/simplejson), and sure enough, everything is working again.
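
For anyone who would rather script this than fix the files by hand, here is a rough sketch in Python. The function name fix_class_file_modes and its dry_run flag are made up for illustration and are not part of Cloudera Manager. It assumes the only problem is the permission bits on .class files somewhere under the directory you pass in, and that you run it as a user allowed to change them.

```python
import os
import stat


def fix_class_file_modes(root, mode=0o644, dry_run=True):
    """Find .class files under root whose permission bits differ from mode.

    Hypothetical helper. With dry_run=True it only reports the files;
    with dry_run=False it also chmods them to mode. Returns the sorted
    list of affected paths.
    """
    affected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".class"):
                continue
            path = os.path.join(dirpath, name)
            current = stat.S_IMODE(os.stat(path).st_mode)
            if current != mode:
                if not dry_run:
                    os.chmod(path, mode)
                affected.append(path)
    return sorted(affected)
```

Calling fix_class_file_modes("/usr/share/cmf/python/Lib") first lists what would change, and the walk covers the simplejson subdirectory too. Passing dry_run=False applies the change. If you prefer to delete the stray .class files instead, swapping os.chmod for os.remove would be the equivalent sketch.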
Hopefully this is helpful to somebody.

2 comments:

  1. Under normal circumstances, these class files will never be created because the directory is owned by root and CM runs as an unprivileged user (cloudera-scm by default). For CM to have created the files, it must have been run as root at one point, and then subsequently run as the unprivileged user. Does that seem possible? You should simply delete the class files to get back to normal.

  2. Thanks. It's possible that's what happened. I'll check.
