roggr

Docker and Python debugging in VS Code error: timed out waiting for launcher to connect

2021-03-26T21:57:00.007-07:00

Scenario: you have a Linux machine and you're creating a Dockerized Python app. You fire up Visual Studio Code, follow the official instructions, and eagerly hit F5 to test your app.

But nothing happens, except an error dialog after about 15 seconds saying it "timed out waiting for launcher to connect".

This may be your firewall blocking connections on the docker network interface.

To confirm this, turn off your firewall, launch a debug process in VS Code, and see if the debug session starts. Then turn the firewall back on and try again. If the session starts with the firewall off but times out with it on, it's time to add rules allowing traffic for the docker interface.

First, find the subnet for your docker interface, for example with ip addr show docker0.

In this instance, you want traffic allowed through docker0 over its IP range, which in my case is 172.17.0.0/16. Yours may be different.
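If you'd rather script this step, here's a minimal Python sketch. It assumes you feed it the output of `ip -4 addr show docker0`, which contains a line like `inet 172.17.0.1/16 ...`; the function names are illustrative, not part of any tool. The address on docker0 is a host address, so the sketch zeroes the host bits to get the subnet you'd put in the firewall rules.

```python
import ipaddress
import re
import subprocess


def docker_subnet(ip_output: str) -> str:
    """Return the network (e.g. '172.17.0.0/16') for the first IPv4
    address found in `ip -4 addr show <iface>` output."""
    match = re.search(r"\binet (\d+\.\d+\.\d+\.\d+/\d+)", ip_output)
    if match is None:
        raise ValueError("no IPv4 address found in ip output")
    # ip_interface keeps the host bits (172.17.0.1/16); .network zeroes them.
    return str(ipaddress.ip_interface(match.group(1)).network)


def docker0_subnet() -> str:
    """Run `ip` on this machine and return docker0's IPv4 subnet."""
    out = subprocess.run(
        ["ip", "-4", "addr", "show", "docker0"],
        capture_output=True, text=True, check=True,
    ).stdout
    return docker_subnet(out)


if __name__ == "__main__":
    sample = "    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0\n"
    print(docker_subnet(sample))  # 172.17.0.0/16
```

Whatever it prints is the range to use in place of 172.17.0.0/16 below if yours differs.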

In a terminal, add those rules:

sudo ufw allow out on docker0 from 172.17.0.0/16
sudo ufw allow in on docker0 from 172.17.0.0/16

Reload your firewall config, and your debugging should start working.

sudo ufw reload
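To check connectivity outside of VS Code, a small Python sketch like this one attempts a plain TCP connection and reports whether it succeeded. The target is an assumption you'll need to adapt: 172.17.0.1 is only docker0's usual default address, and 5678 is only the port debugpy is commonly configured with, so check your own launch configuration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an
        # OSError subclass) and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical target: docker0's default address and debugpy's usual port.
    print(can_connect("172.17.0.1", 5678))
```

This only tells you something if a process is actually listening on that address and port; if the answer flips between firewall on and firewall off, that points at the firewall rules rather than at VS Code.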

How to Resolve Windows Host Names on your LAN from a Linux Machine

2021-02-12T20:58:00.000-08:00

Resolving Windows host names on a LAN from a Linux machine is tricky and changes often enough that most AskUbuntu or Stack Overflow "solutions" don't work as of Feb 2021.

This method works in KDE Neon 5.20 with Kernel 5.4.0-65-generic and should work with most Ubuntu 20.x versions.

The idea is to give your Ubuntu machine your router's / DHCP server's address as a secondary DNS server.

The difficulty is that the standard /etc/resolv.conf file gets clobbered when the NetworkManager service starts. The trick is to manage resolv.conf manually.

Prerequisites:

A LAN
A Windows host on the LAN with default network configuration, in WORKGROUP
An Ubuntu 20.x machine

All of the following happens on the Ubuntu machine. The Windows host doesn't need any reconfiguration.

Steps:

Tell NetworkManager not to provide DNS resolution
Create your own resolv.conf

Step 1

Create the file /etc/NetworkManager/conf.d/90-dns-none.conf with the following content:

[main]
dns=none

Step 1.1

Restart NetworkManager

sudo systemctl restart NetworkManager

Step 2

If /etc/resolv.conf is a symlink to a systemd file (typically /run/systemd/resolve/resolv.conf or the stub file), remove that symlink.

Create an /etc/resolv.conf file like this (replace 192.168.1.1 with your router or DHCP server's actual IP address if it's different):

nameserver 127.0.0.53
nameserver 192.168.1.1
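To double-check the result, here's a minimal Python sketch that reports whether /etc/resolv.conf is still a symlink and lists its nameserver entries in the order they appear. The function name is illustrative; you can pass a different path to try it on a copy first.

```python
import os


def inspect_resolv_conf(path: str = "/etc/resolv.conf") -> dict:
    """Report whether `path` is a symlink and list its nameservers in order."""
    info = {
        "symlink_target": os.readlink(path) if os.path.islink(path) else None,
        "nameservers": [],
    }
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            fields = line.split()
            # Only "nameserver <ip>" lines are kept; blank lines and
            # comment lines (starting with # or ;) never match.
            if len(fields) >= 2 and fields[0] == "nameserver":
                info["nameservers"].append(fields[1])
    return info


if __name__ == "__main__":
    print(inspect_resolv_conf())
```

With the file above in place, you'd expect no symlink target and the two nameservers in the order 127.0.0.53, 192.168.1.1.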

Step 2.1

Restart the resolver

sudo systemctl restart systemd-resolved

Then you can test like this:

$ nslookup google.com
Server: 127.0.0.53
Address: 127.0.0.53#53

Non-authoritative answer:
Name: google.com
Address: 216.58.195.78
Name: google.com
Address: 2607:f8b0:4005:807::200e

$ nslookup mywindowsmachine
;; Got SERVFAIL reply from 127.0.0.53, trying next server
Server: 192.168.1.1
Address: 192.168.1.1#53

Name: mywindowsmachine
Address: 192.168.1.61
;; Got SERVFAIL reply from 127.0.0.53, trying next server

If the above still doesn't work, try installing winbind.

sudo apt install winbind libnss-winbind
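If you want to repeat the lookup test from a script rather than with nslookup, this Python sketch asks the system resolver for a host's IPv4 addresses. Unlike nslookup, getaddrinfo goes through the machine's normal name-lookup configuration (the same path most applications use), so it exercises the whole setup; mywindowsmachine is just the example name from above.

```python
import socket


def resolve_ipv4(name: str) -> list[str]:
    """Return the IPv4 addresses the system resolver finds for `name`,
    or an empty list if the lookup fails."""
    try:
        results = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, (address, port)).
    return sorted({sockaddr[0] for *_, sockaddr in results})


if __name__ == "__main__":
    for host in ("google.com", "mywindowsmachine"):
        print(host, resolve_ipv4(host) or "not resolved")
```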

Ergonomics at the computer: keyboards

2018-12-09T17:06:00.001-08:00

Keyboards

As the primary method of entering data into a computer, keyboards play a significant role in a workstation's ergonomics and comfort.

This is merely an outline, and I will fill it out over time.

Split Keyboards

Split keyboards come in two major flavors: all-in-one, or physically separate halves. The idea is to be able to position and orient each half of the keyboard so that each of your hands is in an optimally comfortable position.

One-piece split keyboards tend to be raised in the middle and have their keys laid out in a "smile line" arrangement. The most famous of those is likely the Microsoft Natural Keyboard. Alternatively, some have keys in concave key wells, like the Kinesis Advantage or Maltron. Then there's the Safetype, which is really its own beast.

The other family, keyboards with physically separate halves, includes the Kinesis Freestyle, the VE.A and its clones, and a raft of others.

Mechanical vs Membrane / Rubber Dome

Holy wars rage over which feels / looks / sounds better: mechanical switches (e.g. Cherry MX, buckling springs, Alps) or membranes and rubber domes.

Like everything else, each keyboard type has its pros and cons. There are some very high-quality membrane keyboards with very low key actuation force, which is a great way to reduce stress on your muscles and tendons (e.g. the Kinesis Freestyle). On the other hand, mechanical switches come in a bewildering number of variations, allowing you to customize almost everything: key travel, actuation force, bump vs. no bump, audible click vs. not, pitch and loudness, even housing and stem colors. There's a cottage industry of geeky enthusiasts who take switches apart and recombine the parts into hybrid creations designed to maximize one characteristic or another. It's a little crazy, but more power to them, and I've benefited from that kind of mad scientist approach via some novel switches like Outemu Sky and Zealios, which arguably wouldn't exist if people were content with stock switches from Cherry, Gateron, Kailh, or Outemu.

In some cases, the mechanical vs. membrane argument is truly pointless: some keyboards only come in one style, usually membrane, so if you like those models, you have to deal with a membrane, period. This applies to the MS Natural keyboard, Gold Touch, and Kinesis Maxim (all membrane); on the other hand, the Kinesis Advantage isn't available with membrane keys, which is a bummer if you like Kinesis's membrane options like the Kinesis Freestyle Solo.

Kinesis recently released two mechanical Freestyle models (the Freestyle Pro and the Freestyle Edge), which I haven't tried yet, but it's one of those rare cases where you can actually choose between a membrane or a mechanical version of essentially the same keyboard.

Tenting and Tilting

Tenting refers to the ability to lift each half of a split keyboard such that the center is higher than the outer edges. The result is a keyboard that looks like an A-frame.

Tilting refers to adding a front-to-back angle to a keyboard (split or not). Most keyboards sold today have built-in feet (usually retractable) that allow you to give the keyboard a "positive tilt" (the number row is higher than the space bar), even though that tilt is ergonomically questionable, as it bends your wrists further away from a relaxed, straight position. Ergo aficionados often add "negative tilt" to their keyboards (the space bar is the highest key, and the number row is lower), which helps keep your wrists on the same plane as your forearms, with no bending.

The Ultimate Hacking Keyboard comes with feet and mounting holes that let you choose what kind of tilt you want (if any), instead of forcing you to use positive or no tilt, like most keyboards do.

Personal History

Here's an incomplete list of keyboards I've used over the years.

Ultimate Hacking Keyboard, Cherry MX Clear

Great! I'll write a dedicated post soon. I just started using it, but it's the most promising keyboard I've tried in a while.

Microsoft Natural Keyboard gen 1, 1994

Heavy-duty construction, heavy key presses, very good. I still use it on occasion.

Microsoft Natural Keyboard, early 2000s

The best era, hands down (see what I did there?). No multimedia button bs, light key presses, about as good as a mainstream ergo keyboard gets. I'll never understand why Microsoft discontinued this model. I still use one on occasion.

Microsoft Natural Keyboard, mid 2000s (black)

Lots of useless media and other keys; feels pretty good, except the space bar, which is much too heavy. Most workplaces in Silicon Valley have one of those lying around, so I'll use it if it's my only ergo option, but the heavy space bar is problematic.

Kinesis Maxim

A funky split/tenting/splayed rubber dome. Works well, and has been my main keyboard at work for some time. I still use it daily.

Kinesis Advantage, Cherry MX Brown

Very good, though it feels a little chintzy and the learning curve is steep. Becoming truly fluent with it would require using it exclusively for a while, and that hasn't been realistic given my day-to-day. I still use it occasionally.

Kinesis Freestyle

This was my daily keyboard for quite some time. I made a tenting kit for it because Kinesis charges way too much for theirs. Very light keypresses, good layout. I still use it occasionally.

Gold Touch

This is a split with a handle to lock it into position. Not a great key feel, and that friction hinge thing feels like trouble down the line. I sold it.

They also made a "travel" version (the Gold Touch Go) with cheap laptop keys, and it was horrendous. Better than nothing, but not by much. I got rid of it too.

Viterbi, Outemu Sky

My first foray into ortholinear keyboards. It doesn't feel terribly different from staggered layouts. I just started using it so I haven't decided how much I like it yet.

Koolertron, Cherry MX Brown

Very good construction, fully programmable, good quality keycaps, and each half is usable on its own. The layout is a little odd with extra keys that probably shouldn't be there. I never really got attached to this one, though there's really nothing wrong with it.

Safetype

This is an interesting beast. The alphas, mods, and function keys are vertical, while nav and numpad keys are in the center console, between the two upright halves. Some of the keycaps are printed backwards, and the keyboard has mirrors on each half so you can see what you're doing. I am not used to this keyboard yet, but it's a lot easier to use than I anticipated.

VE.A Clone, Gateron Brown

This is a clone of a very expensive Korean split keyboard. Mine has Gateron brown tactile switches. Very good, usable layout, funky backlighting, usable extra keys on the left side. I made a custom tenting kit for it, and use it daily.

Ergonomics at the computer, part 1

2018-12-09T16:42:00.003-08:00

Background

I started typing a long, long time ago, in a ~~galaxy~~ country far far away. As an avid nerd, that meant programming or playing Tau Ceti for hours and hours every day. I didn't learn to type "properly" (that concept wasn't taught in schools at the time) and almost certainly developed terrible typing habits. Add to that the desire and ability to type pretty fast (95 wpm on average), and a couple of careers that relied on typing a lot, and it's not surprising I've been dealing with RSI (repetitive stress injury) for quite some time. This has led to a lifelong pursuit of ergonomics.

While it's fun to try various kinds of keyboards, mice, and computing devices, in my case fun is only a side benefit. Using standard devices is not possible: typing on a straight or laptop keyboard for just a few minutes will trigger a tendinitis flare-up that can be partially or completely disabling for days. As someone who makes a living typing all day, not being able to type is problematic. So I've been trying a lot of devices and learned a few things that may be useful to my two ~~million~~ readers, hence this post.

Outline

The plan is to discuss various approaches and devices I've tried over the years, with some semblance of structure. Topics will include keyboards, mice, trackballs, touchpads, tablet devices (with and without styluses), and accessories like keyboard trays, mouse pads, wrist/palm rests, tenting kits, etc.

Part 1: Keyboards

The Planning Dilemma

2017-01-31T20:08:00.001-08:00

After a few years in the tech industry in Silicon Valley, I've noticed a recurring theme in growing startups: planning is a problem. Now this isn't a novel insight--books have been written (partially) on the topic, successful consultancies have been built (partially) to solve this problem, and lots of very smart people have been working on it for a long time.

Ultimately, however, every organization needs to figure out what sets of practices work best for its own unique circumstances: how big and experienced their team is, what kind of product they're building, how risk-averse and regulated their sector or customers are, and many other considerations. Much like learning to read, dance or play the piano, it's up to the learner to put in the work so they can become good at it. Planning is no different.

The Growth of Planning

The particular dilemma that led me to write this post is a common trajectory I've seen in quite a few growing startups grappling with the question of "how do we plan". It tends to go like this:

1. No planning. Feature / task prioritization is obvious, the number of people doing the work is small, and you don't have time to do anything but build as fast as you can.

2. You have your beta out, enough to get a little funding (maybe a big seed or even an A round) so you can actually hire a couple of engineers, a product manager, maybe a designer, beyond the founder(s) and initial hire(s). You may have traffic to your app, or actual customers using your service, maybe a product-focused board member, a trade show or hard date you want to hit in a few months to demo your app to secure more funding. Whatever the reason, you start putting some planning in place, maybe with daily stand-ups to coordinate between teams, or weekly sprint planning. Still fairly lightweight. Your leads/execs get together once every 2-4 weeks for higher-level feature prioritization and roadmapping, more or less often depending on your product and industry.

3. You've secured funding and hired more people. You now have 2-3 distinct teams working on separate features (web and mobile, Android and iOS, e-commerce and logistics, design and manufacturing, whatever). You now have a fully-fledged marketing team with "rolling thunder" tactics, media tours to plan, conference appearances to organize, trade show booths to buy, ad spend budgets to propose, etc. You may have a manufacturing pipeline with very real lead times. The point is that people outside of engineering need to know what's happening when, sometimes pretty far in advance.

Yes, we've all heard that software estimates are lies, that "it'll ship when it's ready," and other enlightened developer-friendly manifestos. I won't discuss the merits of that point in this post, and will simply stipulate that someone somewhere needs to know what's happening when beyond the next few days.

Planning For Others

At this stage of growth, in most (all?) of the startups I've observed or been a part of, the typical setup from step 2 expands to:

a) 2-3 teams, each with dedicated product, design and engineering resources; then

b) management overhead to coordinate between those teams and make sure the company's overall goals and priorities are being reflected correctly; then

c) meta-management overhead to make sure the meta-product management from b) is communicated effectively to people outside the product/engineering org (e.g. marketing, sales, execs, board, etc)

In practice, that often looks like this:

execs meet, often "off-site", once a quarter or so, to define the Big Goals or Roadmap or whatever it's called for the company; then
division heads from b) (VP product, VP engineering) have regular planning sessions to define and prioritize the next batch of work for some period (1-4 weeks) (the "pre-sprint planning")
each division head meets with "their" leads (PM, lead eng) from each team in a) to clarify and prioritize stories for the upcoming sprint (the "sprint planning", per team)
the division heads communicate any relevant timing updates to their peers (e.g. sales or marketing)
each team has their own internal touch points (daily standups are still common)
each team may elect to do a post-sprint retrospective to look back at the sprint, see what was good and what could be improved, to feed into the next sprint

Given that sprints tend to be brief in order to be able to respond quickly to changing priorities (sometimes as short as one week), and that any sufficiently meaty story needs a fair bit of time to write, read and understand before it can be implemented, I'd estimate the above planning duties combined make up at least a whole day's work per week per developer, and probably about three days' work per week per division head.

If you want to check my math:

For developers:

sprint planning = 2-4 hours
retrospective = 2 hours
daily standup = 5 x 12 min = 1 hour
context switching costs = 1-2 hours over the course of the week
ancillary negotiation, hemming, hawing, clarification, etc. over stories = 1-2 hours
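To make the arithmetic explicit, here's a quick Python check that sums the ranges listed above. The per-item figures come straight from the list; the 40-hour week used for the percentage is my assumption.

```python
# (low, high) hours per week for each planning activity listed above.
ACTIVITIES = {
    "sprint planning": (2, 4),
    "retrospective": (2, 2),
    "daily standup (5 x 12 min)": (5 * 12 / 60, 5 * 12 / 60),
    "context switching": (1, 2),
    "story negotiation and clarification": (1, 2),
}

WORK_WEEK_HOURS = 40  # assumption: a standard 40-hour week


def planning_overhead(activities: dict) -> tuple:
    """Return (low_hours, high_hours, low_pct, high_pct) for the week."""
    low = sum(lo for lo, _ in activities.values())
    high = sum(hi for _, hi in activities.values())
    return low, high, 100 * low / WORK_WEEK_HOURS, 100 * high / WORK_WEEK_HOURS


if __name__ == "__main__":
    low, high, low_pct, high_pct = planning_overhead(ACTIVITIES)
    print(f"{low:g}-{high:g} hours/week = {low_pct:g}%-{high_pct:g}%")
```

That comes to 7-11 hours, or 17.5%-27.5% of a 40-hour week, which brackets the one-day (8 hours, 20%) figure.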

Twenty percent of a developer's time spent on process management in the name of agility seems excessive to me.

Process Artifacts

Anecdotally, both as a developer and as a manager, I have heard feedback that daily stand-ups as a regular event aren't terribly useful. They are a costly interruption / context switch, and they often devolve into status reports that aren't useful to anybody, or into tangents that take too long and engage far too many people who are too polite to just leave a conversation they're not involved in. Dependencies and blockers are best dealt with one on one, not as a group.

I have also participated in and/or led retrospectives and have mixed feelings about them; as blameless post-mortems (not necessarily after an outage: they could be held after a good or a bad/stressful release) I find them to be good vehicles for venting and constructive criticism that can lead to improvements, but as meta-exercises about the process, I find them self-serving and unnecessary.

Sprint planning is tricky. Writing stories is difficult; estimating them arguably harder; story points, planning poker and other artifacts are questionable; evidence-based scheduling seems more promising, but I feel you need a dedicated technical but non-coding manager to deal with it reliably and dispassionately, and that's hard to come by in a small startup.

What's The Upshot?

I don't have a good answer. At this point I'm pretty familiar with the new software development orthodoxy (most things under the "agile" umbrella), and while I find it preferable to a waterfall approach in principle (a system that is gleefully mocked by many people who've never actually experienced it; I certainly haven't, and I've been around a while), I feel a lot of its standard assumptions, tools and practices need to be challenged objectively and dispassionately. Otherwise it's just bandwagoning, and it's the opposite of what it's supposed to be--pragmatic.

I'd love to hear your thoughts and experience and learn from people who've been dealing with this. Please comment or get in touch!

Fiio X5 high-def media player: quick review

2016-12-20T11:52:00.001-08:00

I recently upgraded an iPod Classic 160GB to a higher-definition, larger-capacity Fiio X5 (first generation). The main selling points for me were:

dual micro-SD card slots for up to 512GB of storage
high-quality DAC
support for all the formats I use
good battery life
frequent firmware updates

Unfortunately, the device I got has to go back. It's probably defective, but it also has significant UI problems (even in firmware version 2.6). Here's a summary.

turning the device on almost never works. I always have to reset it (hold the power button down for 15+ seconds) and then try to turn it on (hold power button down for 2 seconds). Most of the time, that doesn't work and I have to do the whole operation again. Occasionally several times. This means it takes 1-2 minutes just to get the device ready to use. iPod = instant, and it always works.
the hardware buttons do not work when the display goes to sleep. To turn the volume up or down, pause, play or skip, you have to click the power button, then the hardware button you want.

Sure, you could set the display never to go to sleep, but you lose the battery-saving benefits. And the default behavior is for the display to go to sleep, which means the out-of-the-box configuration doesn't work the way you'd expect, and requires two clicks to perform any function.

the two points above often combine for maximum annoyance. Sometimes the display goes to sleep (or the device enters some kind of low-power mode that lets the music play) and the power button stops responding, which means you can't pause / unpause / skip / adjust the volume. The only option is to reset the device (hold power button down for 15+ seconds), start it up, try again if it didn't work, and make your change.

Except when the device comes back from a reset, it doesn't remember what track was playing, so you have to browse your library all over again, find whatever you were playing, and play it again.

browsing the SD card or the library is impossibly slow.

The jog wheel scrolls through the library at the same speed, no matter how quickly you're jogging. On the iPod, after a certain speed, the scrolling speeds up and skips whole letters in the alphabet all at once. On the Fiio X5, if you have 500 artists and want to listen to Zimmer's Hole or Zoe Keating, you'd better have a sandwich at hand, because it's going to take a while.
When you're tired of Zimmer's Hole and want to switch to Frank Zappa, each step (tracks -> album -> artist (Zimmer's Hole) -> scroll -> artist (FZ) -> album -> tracks) takes 2-5 seconds. So switching to an album by a different artist can take up to a minute.

Because the device needs to be reset constantly, loading up two different albums is a multi-minute ordeal that simply doesn't happen on the iPod or any smart phone.

It's a shame, because the audio quality is truly fantastic. But the device itself is unusable.

AMA Part 2: Software Development Resources

2016-12-01T20:12:00.002-08:00

In the previous post, I discussed a few ways in which I have found coding bootcamps to be inadequate. In this post, I will present a list of resources I have found very useful as a software engineer. Many are free, and most others can be found in used bookstores (or Amazon) for moderate prices. I'd love to hear other people's favorite resources, so please add yours in the comments or on Twitter.

General Computer Science and Programming

Disclaimer: I didn't study CS formally, so this list is short and almost certainly out of date. Please send me your suggestions!

Aho, Lam, Sethi, Ullman - Compilers: Principles, Techniques and Tools
Gamma, Helm, Johnson, Vlissides - Design Patterns (the "Gang of Four" book)
Knuth - The Art of Computer Programming
MIT Open Courseware, EE and CS

Language Specific

Official Tutorials / Documentation

The Java Tutorial - https://docs.oracle.com/javase/tutorial/
PHP Manual - http://php.net/manual/en/index.php
Python doc - https://docs.python.org/3/
Ruby doc - http://ruby-doc.org/

Note: I strongly recommend against a very popular Ruby introduction, Why's (Poignant) Guide to Ruby. A lot of people seem to love it. I found it way too cute, unclear, trying too hard, and just bad.

Books

Kernighan, Ritchie - The C Programming Language
Liberty, Halpern - The C++ Standard Library From Scratch
Lippman - Essential C++

Note: even if you don't program in C or C++, Kernighan & Ritchie is a great introduction to computer programming at a low level. The Lippman and Liberty/Halpern books go very well together, and are project-oriented walkthroughs of essential features. All three of these books are very short, but they pack a punch.

Note: get both Nutshell books, and read them in parallel. Mine is a free work-in-progress, aimed at people who have never programmed but want/need to learn Java.

Crockford - JavaScript: The Good Parts
Martelli - Python in a Nutshell
Thomas, Fowler, Hunt - Programming Ruby
Flanagan, Matsumoto - The Ruby Programming Language

Note: anything by Flanagan, Fowler, Beck, Crockford or Martelli is worth reading.

After You've Coded For A While

Fowler, Beck - Refactoring
McConnell - Code Complete
Stroustrup - The C++ Programming Language

Note: I listed Stroustrup here because it's a pretty dense, dry read (do not read it as your first programming book) that requires a good idea of how things work under the hood. It's also not a practical introduction to C++ programming; it's a guide to the C++ language, and from that perspective it's full of vital insights about choices made when C++ was designed, which in turn makes you think about what computer languages can do, and the various ways they do it.

Specific Topics

Sankoff - Time Warps, String Edits, and Macromolecules

AMA Part 1: Coding Bootcamps

2016-12-01T19:36:00.005-08:00

A recent #DevDiscuss thread on Twitter focused on developers' various education paths into the profession. The discussion was lively and a lot of good questions, answers and experiences were shared.

After the discussion ended, a few people contacted me to ask more questions about the industry, and two themes emerged: 1) the various, popular coding bootcamps that have flourished in the past few years and 2) what resources (books, online tutorials, etc) I would recommend for a new software developer to become well-rounded (and employable).

I'm going to address each question here in case it helps more people than just the folks who DMed me on Twitter. This post will be a reflection on bootcamps, and the next will collect programming resources I've found useful.

On Coding Bootcamps

First, a few disclaimers:

I have never been a student or instructor at any coding bootcamp
I have nothing against short, intensive programs to learn a skill--I've taken a total of two computer-related classes, and both were short, intensive, and bootcamp-ish.
I realize different people have different learning styles; some do best alone, reading tutorials/books and writing tons of test code; others thrive with videos or podcasts; others still benefit from the focus and/or collegial learning you get in a classroom setting; etc.
I was a teacher for a few years, in college and adult ed, in standard quarter/semester-long as well as intensive summer programs; I've also spent a lot of time in the classroom as a student
I think it's fantastic that so many people (and especially underrepresented groups) are learning to become software engineers
I don't find that a formal computer-science education is a particularly good predictor of talent or success in the software industry. Some of the best engineers I've worked with were humanities majors or high-school grads; some of the worst had MAs in CompSci from Stanford; there have also been great formally-trained engineers and awful self-taught engineers
My experience with bootcamps comes from interviewing about thirty applicants, offering jobs to two, and being friends with a couple
Some bootcamps may be fantastic. I don't know all of them, far from it

With that out of the way, here are some observations I've made about coding bootcamps.

I found two areas where bootcamps seem to be falling short: tool/technology independence, and low-level technical basics (how stuff actually works).

Technology Independence

Presumably because software engineering is a vast subject and you need to carefully limit the scope of an introductory course, I found bootcamps teach their students exactly one way to do things, with carefully selected tools, but not:

other ways to do the same thing (with other tools, or with no big frameworks at all);
why those tools were chosen over others; and
what to do when you have to deal with a novel situation that doesn't exactly fit the standard paradigm.

The education seems limited to a very expensive Rails or Angular (or whatever framework) tutorial, carefully keeping students down the garden path of a basic application. There are a lot of tutorials available for free online; the majority

The graduates I talked to had never been exposed to any other way of doing things than the Rails/Angular/whatever way, even though 1) there are many, equally valid ways to approach application development and 2) the vast majority of industry jobs involve mixed, heterogeneous assemblies of tools, practices, and code from different eras/styles/people, and finding a chunk of code that's exactly like the tutorial so you can comfortably understand and modify it is the exception, not the rule.

Students were able to tell me how they would use ActiveRecord to interact with a database and display a list of things in a Rails view, but were stumped when I added common variations to the data stack (e.g. combining data from a SQL database with a document store like ElasticSearch). And when I gave them pieces of existing, real-life code to pick apart and modify to implement a new or different feature, most of them remained stuck and unable to figure out a way to make any progress.

I'm not blaming them for not knowing how to use something they weren't taught (all devs have to pick up new technology all the time); what I'm deploring is that the bootcamps didn't give them the mental tools and technical knowledge to reason their way out of a predicament.

Learning software engineering gives you skills that will last a lifetime. Knowing how to crank out an app with today's popular tools is a lot less valuable. Crucially, engineering skills like experimentation, figuring out how a piece of code works, and exposure to multiple ways of doing something (so you're never stuck in one pattern you don't completely understand) are arguably the hardest skills to learn on your own, and that's where a classroom setting, with peers and a teacher to answer your questions, would be most beneficial. Learning how to use a framework, library or tool is the kind of stuff anyone can do with a little time and a browser, and a classroom setting isn't all that necessary.

How Stuff Works

Another area where the bootcamp graduates I spoke to were entirely unprepared is the underlying low-level technology that makes a networked app work (web app or internet-enabled mobile app). I'm not talking about the arcana of TCP packet management or running a DNS server--just the very basics of how software executes on a machine and how network/internet requests are made: how your browser finds example.com, contacts it, requests stuff, receives said stuff, and displays it. The kind of thing you absolutely have to understand when you're troubleshooting a problem in your live app, or when you're setting up a CDN, or when you're doing Ajax calls to a third-party domain, or dealing with HTTPS, or redirecting people from one page of your app to another.
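As an illustration of those basics, here's a minimal Python sketch of what happens for a plain-HTTP page: resolve the name, open a TCP connection to port 80, send a request made of a request line, headers and a blank line, then read back a status line, headers and a body. It deliberately skips HTTPS, redirects and chunked encoding, and example.com is just a placeholder host; it's a teaching sketch, not a substitute for a real HTTP client.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """An HTTP/1.1 request: request line, headers, then an empty line."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",        # required in HTTP/1.1
        "Connection: close",    # ask the server to close when it's done
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple:
    """Split a raw response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split()[1])  # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Note: repeated headers (e.g. several Set-Cookie) overwrite each other here.
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def fetch(host: str, path: str = "/") -> tuple:
    """Resolve `host`, connect on port 80, send the request, read the reply."""
    address = socket.gethostbyname(host)  # name resolution (usually DNS)
    with socket.create_connection((address, 80), timeout=10) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while chunk := conn.recv(4096):  # empty bytes once the server closes
            chunks.append(chunk)
    return parse_response(b"".join(chunks))


if __name__ == "__main__":
    status, headers, body = fetch("example.com")
    print(status, headers.get("content-type"), len(body))
```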

Anyone can write an app that handles ideal circumstances; what makes an engineer valuable is their ability to fix it when it misbehaves. None of the bootcamp graduates was able to reason through the network path or anatomy of a basic web request. Very few knew how headers and cookies work. That stuff isn't complicated, you just need to see it once to understand it; and it's very important in a world of open, unsecured wi-fi access points and personalized apps and services, so you know why putting credentials in a cookie on a non-HTTPS site is a bad idea.
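For instance, here's a minimal Python sketch using the standard library's http.cookies module. The server sends a Set-Cookie header; the Secure attribute tells the browser to send the cookie back only over HTTPS, and HttpOnly keeps it out of reach of page scripts. The cookie name and value are made up for illustration, with a session ID standing in for anything sensitive.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    """Build a Set-Cookie header for a session ID, restricted to HTTPS."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["secure"] = True    # only sent back over HTTPS
    cookie["session"]["httponly"] = True  # not readable from JavaScript
    cookie["session"]["path"] = "/"
    return cookie.output()  # one "Set-Cookie: ..." line


def read_cookie_header(header_value: str) -> dict:
    """Parse the Cookie header a browser sends back into a name -> value dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


if __name__ == "__main__":
    print(session_cookie_header("abc123"))
    print(read_cookie_header("session=abc123; theme=dark"))
```

Without Secure, the same cookie would travel in cleartext on every plain-HTTP request, which is exactly what makes open wi-fi risky.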

You don't need to be an expert; but not knowing the basics will absolutely hold a person back. Yes, those things can be learned on the job, but getting that job will be tricky if your education hasn't given you any information at all about the building blocks of your day-to-day work.

Silver Lining

Bootcamps are not all bad. I've heard and seen a lot of great feedback from people who genuinely got a lot out of them. Many bootcamps have industry partnerships or placement programs that help their graduates get hands-on experience in real software shops. The advantages of collegial learning are undeniable. Some people thrive in the pressure of intense, brief immersion into a topic. And you've got to start somewhere.

The other good news is that some of what I discussed above can be remedied easily; the information can be absorbed and understood in a couple of days of guided study.

Conclusion

Beyond the specifics I outlined above, what bothers me the most about the bootcamps I've been exposed to is that they both overpromise and underdeliver. Some (many? all?) claim to prepare future devs for the job market, but the ones I've been exposed to fall far short. And given how they seem to aggressively recruit from underrepresented populations (I've met a lot of non-male, non-white students from those bootcamps), it feels like the students are being sold a bill of goods and the promise of a fun, fulfilling and lucrative career, and are likely to be surprised and bitterly disappointed once they start interviewing for software engineering jobs.

I'd be happy to recommend a bootcamp education if I knew of one that gave its students more than a tutorial, and included a survey of the basic technology underlying the kind of software its graduates are taught to write. If anyone reading this has a recommendation, I'd love to hear about it. Find me on Twitter @roger_b_m or comment here.

Update 1/7/2017

Stories from coding bootcamps: many people have good experiences, but job placement stats are misleading or fake. https://t.co/8JiW8ChASy pic.twitter.com/YEfsDNPi8g
— Dan Luu (@danluu) January 7, 2017

Docker lessons learned 1 year in

2016-11-23T20:02:00.003-08:00

A little under a year ago, I started doing devops work for a startup (the Company) with very specialized needs. As it operates in a highly regulated sector, the Company's access to its infrastructure is extremely restricted, to prevent accidental or malicious disclosure of protected information. Its in-house web apps and off-the-shelf on-prem software are deployed on a compliant PaaS (I'll call them "the Host", even though they offer vastly more than just hosting), which is very similar to Heroku and uses Docker exclusively for all applications deployed on their private EC2 cloud. I knew about Docker but had never used it, and it's been an interesting few months, so I thought I'd write up some observations in case they help someone.

Topsy Turvy

If you're coming to Docker from a traditional ops shop, it's important to keep in mind that many of your old habits and best practices either don't apply or are flipped upside down in a Docker environment. For example, you're probably going to use config management with Chef or Ansible a lot less, and convert your playbooks into Dockerfiles instead. Ansible, Chef and the like are based on the assumption that infrastructure has some level of permanence: you stand up a box, set it up with the right services and configuration, and it will probably be there and configured when you get around to deploying your app to it. By contrast, in the Docker world, things are much more just-in-time: you stand up and configure your container(s) while deploying your app. And when you update your app, you just toss the old containers and build new ones.

Another practice that may feel unnatural is running the main process in the foreground. On a traditional web server, you'd typically run nginx, some kind of app server, and your actual app, all in the background. Docker, on the other hand, tends to use a one-service-one-container approach, and because a container dies when its main process does, you need something running in the foreground (not daemonized) for your container to stay up. Typically that'll be your main service itself (e.g. nginx, configured not to daemonize), or you'll daemonize your service and run an infinite tail -f /some/log as the container's main process.

As a corollary, while traditional server setups often have a bunch of backgrounded services all logging to files, a typical Dockerized service will only have one log you care about (the one for your main process), and because a container is usually an ephemeral being, its local file system is best treated as disposable. That means logging not to files, but to stdout. It's great for watching what's happening now, but not super convenient if you're used to hopping on a box and doing quick greps and counts, or walking through past logs when troubleshooting something that happened an hour ago. To do that, you have to deploy a log management system as soon as your app goes live, not after you have enough traffic and servers that server-hopping, grep and wc have become impractical. So get your logstash container ready, because you need it now, not tomorrow.
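For a Python service, logging to stdout can be as simple as pointing the standard logging module at sys.stdout, so Docker captures the output and a log shipper can pick it up. A minimal sketch; the logger name `myapp` and the log format are made up for illustration:

```python
import logging
import sys


def configure_stdout_logging(level=logging.INFO):
    """Send all log records to stdout instead of files inside the container."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    root = logging.getLogger()
    root.handlers[:] = [handler]  # drop any file handlers configured elsewhere
    root.setLevel(level)
    return root


configure_stdout_logging()
# "myapp" is a hypothetical logger name.
logging.getLogger("myapp").info("service started")
```

Replacing the root logger's handlers (rather than adding one) is a deliberate choice here, so a stray file handler can't quietly keep writing into the container's disposable file system.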

It's a decidedly different mindset that takes some getting used to.

I was already on board with the "everything is disposable" philosophy of modern high-availability systems, so conceptually it wasn't a huge leap, but if you're coming from a traditional shop with bare-metal (or even VM) deployments, it's definitely a mental switch.

Twelve Factor App Conventions

This one is more specific to the Host than to Docker in general, but it's part of an opinionated movement in modern software dev shops that includes Docker (and Rails, and Heroku), so I'll list it here. The Twelve-Factor App manifesto is a practical methodology for building modern apps delivered over the web. There's a lot of good stuff in there, like the emphasis on explicit declarations or the importance of a dev/stage environment matching production closely. But there's also questionable dogma that I find technically offensive. Specifically, factor 3 holds that configuration must be stored in the environment (as opposed to config files or delivered over some service).

I believe this is wrong. The app is software that runs in user space; the environment is a safe, hands-off container for the app. The environment and the app live at different levels of resolution: all the app stuff is inward-looking, only for and about the app; while the environment is outward-looking, configured with and exposing the right data for its guests (the apps and services running in the environment). Storing app-level (userspace) data in the environment is like trusting the bartender in a public bar with your specific drink preferences, and asking her what you like to drink (yes, this is a bad simile).

In addition, the concerns, scope, skills, budget, toolsets, and personalities of the folks involved in app work tend to be different from those of people doing the environment (ops) stuff. And while I'm ecstatic that devs and ops people appear to finally be merging into a "devops" hybrid, there's a host of practical reasons to divide up the work.

In practical terms, storing configuration in the environment also has significant drawbacks given the tools of the trade: people like me use grep dozens of times every day, and grepping through a machine's environment comprehensively (knowing that env variables may have been set as different Unix users) is error-prone and labor-intensive for no discernible benefit, especially when your app is down and you're debugging things under pressure. It's also very easy to deploy what's supposed to be a self-contained "thing" (your twelve-factor app) and watch it fail miserably because someone forgot to set the environment variables. That highlights the self-contradictory, leaky nature of the config-in-the-environment precept: if your app depends on something external to it (the environment), it's not self-contained.
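It doesn't fix the design problem, but if you're stuck with config in the environment, one common mitigation is to check every required variable at startup and fail with a clear message listing all of the missing ones, rather than crashing later on the first lookup. A minimal sketch; the variable names are hypothetical:

```python
import os
import sys

# Hypothetical variable names, for illustration only.
REQUIRED_VARS = ("DATABASE_URL", "REDIS_URL", "SECRET_KEY")


def missing_env_vars(required, environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


def check_environment(required=REQUIRED_VARS):
    """Exit immediately, naming every missing variable at once."""
    missing = missing_env_vars(required)
    if missing:
        sys.exit("Missing required environment variables: " + ", ".join(missing))
```

You'd call check_environment() as the first thing your app's entry point does, before any other initialization.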

Another driver for the config-in-the-environment idea is to make sure developers don't store sensitive information like credentials, passwords, etc. in code that winds up in source control (and thus on every dev's computer, and potentially accidentally left in code you helpfully decided to open-source on GitHub). That makes a ton of sense and I'm all for it. But for practical purposes, this still means every dev who wants to do work on their local machine needs a way to get those secrets onto their computer, and there aren't a lot of really easy-to-use, auditable, secure and practical methods to share secrets. In other words, storing configuration in the environment doesn't solve a (very real) problem: it just moves it somewhere else, without providing a practical solution.

You may find this distinction specious, backwards, antiquated, or whatever. That's fine. The environment is the wrong place to store userspace/app-specific information. Don't do it.

That was a long-winded preamble to what I really wanted to discuss, namely the fact that the Host embraces this philosophy, and in quite a few instances it's made me want to punch the wall. In particular, the Host makes you set environment variables using a command-line client that's kind of like running remote ssh commands, meaning that values you set need to be escaped, and they don't always get escaped or unescaped the way you expect when you query them. So if you set an environment variable to its current value as returned by the command-line client, you'll double-escape it: "lol+wat" first gets stored as "lol\+wat"; looking it up returns "lol\+wat" (still escaped); setting it again turns it into "lol\\\+wat". In other words, a set-get-set sequence isn't idempotent. All this is hard to debug, painfully annoying, and completely unnecessary; it wouldn't happen if the model weren't so stupid about using the environment for configuration.
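Here's a tiny model of why set-get-set isn't idempotent. The escaping rule below (backslash-escape backslashes and "+") is an assumption chosen to reproduce the values above; it is not the Host's actual client code:

```python
def escape(value):
    """Hypothetical escaping rule: backslash-escape backslashes, then '+'.

    Backslashes must be escaped first; otherwise the backslash added
    in front of '+' would itself get doubled.
    """
    return value.replace("\\", "\\\\").replace("+", "\\+")


# The client escapes on every "set", but "get" returns the stored (escaped) text.
stored = escape("lol+wat")   # first set
print(stored)                # lol\+wat
fetched = stored             # get: the escaped value comes back as-is
stored = escape(fetched)     # set it again to its "current" value
print(stored)                # lol\\\+wat
```

A round trip would only be safe if "get" unescaped what "set" escaped, which is exactly the symmetry that was missing.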

Dev == Prod?

One of the twelve-factor tenets is that dev/stage should mirror production closely. This is a very laudable goal, as it minimizes the risk of unexpected bugs due to environment differences (aka "but it worked on my machine"). It's especially laudable as a lot of developers (at least in Silicon Valley) have embraced OSX/macOS as their OS of choice, even though nobody deploys web apps to that operating system in production, which means there's always a non-zero risk of stuff that works on dev failing on production because of some incompatibility somewhere. This also means every dev wastes huge amounts of time getting their consumer laptop to masquerade as an industrial server, using ports and casks and bottles and build-from-source and other unholy devices, instead of just, you know, doing the tech work on the same operating system you're deploying on, because that would mean touching Linux and ewww that's gross.

Originally, the Company had wrapped its production apps into Docker containers using the Host's standard Dockerfiles and Procfiles, but devs were doing work on their bare-metal Macs, which meant finding, installing and configuring a whole bunch of things like Postgres, Redis, nginx, etc. That's annoying, overwhelming for new employees (since the documentation or Ansible playbooks for that work always lag behind what actually happens on dev machines), and a pain to keep up to date. Individual dev machines drift apart from each other, "it works on my machine (but not on yours)" becomes a frequent occurrence, and massive amounts of time (and money) are wasted debugging self-inflicted problems that really don't deserve to be debugged when it's so easy to do it right with a Linux VM and Ansible playbooks, but that would mean touching Linux and ewww that's gross.

So I was asked to wrap the dev environment into Dockerfiles, and ideally we'd use the same Dockerfile as production, so that dev could truly mirror prod and we'd make all those pesky bugs go away. Good plan. Unfortunately, though, I didn't find that to be practical in the Company's situation: the devs use a lot of dev-only tools (unit test harnesses, linters, debuggers, tracers, profilers) that we really do not want to have available in production. In addition, starting the various apps and services is also done differently on dev and prod: debug options are turned on, logging levels are more verbose, etc. So we realized and accepted the fact that we just can't use the same Dockerfile on dev and on prod. Instead, I've been building a custom parent image that includes the intersection of all the services and dependencies used in the Company's various apps, and converting each app's Dockerfile to extend that new base image. This significantly reduces the differences and copy-pasta between Dockerfiles, and will give us faster deployments, as the base image's file system layers are shared and therefore more likely to be cached.

Runtime v. Build Time

Back to Docker-specific bits, this one was a doozy. When building the dev Dockerfiles, I had split the setup between system-level configuration (in the Dockerfile) and app-specific setup (e.g. pip installs, node module installation, etc), which lived in a bootstrap script executed as the Dockerfile's CMD. It worked well, but it felt inelegant (two places to look for information about the container), so I was asked to move the bootstrap stuff into the Dockerfile.

The devs' setup requirements are fairly standard: they have their Mac tools set up just right, so they want to be able to use them to edit code, while the code executes in a VM or a Docker container. This means sharing the source code folder between the Mac host and the Docker containers, using the well-supported VOLUME or -v functionality. Because node modules and pip packages are app-specific, they are listed in various bog-standard requirements.txt and package.json files in the code base (and hence in the Mac's file system). As the code base is in a shared folder mounted inside the Docker container, I figured it'd be easy to just put the pip install stuff in the Dockerfile and point it at the mounted directories.

But that failed, every time. A pip install -e /somepath/ that was meant to install a custom library in editable mode (so it's pip-installed the same way as on prod, but devs can live-edit it) failed every time, missing its setup.py file, which is RIGHT THERE IN THE MOUNTED FOLDER YOU STUPID F**KING POS. A pip install -r /path/requirements.txt also failed, even though 1) it worked fine in the bootstrap script, which is also in the same folder/codebase 2) the volumes were specified and mounted correctly (I checked from inside the container).

That's when I realized the difference between build time and runtime in Docker. The stuff in the Dockerfile is read and executed at build time, so your app has what it needs in the container at runtime. During build time, your container isn't really running--a bunch of temporary containers briefly run so various configuration steps can be executed, and they leave file system layers behind as Docker moves through the Dockerfile. The volumes you declare in your Dockerfile and/or docker-compose.yml file are mounted as you'd expect (you can ssh into your container and see the mount points); but they are only bound to the host's shared folders at runtime. This means that commands in your Dockerfile (which are used at build time) cannot view or access files in your shared Mac folder, because those only become available at runtime.

Of course you could just ADD or COPY the files you need from the Mac folder into the mounted directory, and do your pip install in the Dockerfile that way. It works, but it feels kinda dirty. Instead, what we'll do is identify which pip libraries are used by most services, and bake those into our base image. That'll shave a few seconds off the app deployment time.

Editorializing a bit, while I (finally) understand why things behaved the way they did, and it's completely consistent with the way Docker works, I feel it's a design flaw and should not be allowed by the Docker engine. It violates the least-surprise principle in a major way: it does only part of what you think it will do (create folders and mount points). I'd strongly favor some tooling in Docker itself that detects cases like these and issues a WARNING (or barfs altogether if there was a strict mode).

Leaky Abstractions and Missing Features

Docker aims to be a tidy abstraction of a self-contained black box running on top of some machine (VM or bare-metal). It does a reasonable job using its union file system, but the abstraction is leaky: the underlying machine still peeks through, and can bite you in the butt.

I was asked to Dockerize an on-prem application. It's a Java app which is launched with a fairly elaborate startup script that sets various command-line arguments passed to the JVM, like memory and paths. The startup script is generic and meant to just work on most systems, no matter how much RAM they have or where stuff is stored in the file system. In this case, the startup script sets the JVM to use some percentage of the host's RAM, leaving enough for the operating system to run. It does this sensibly, parsing /proc/meminfo and injecting the calculated RAM into a -Xmx argument.

But when Dockerized, the container simply refused to run: the Host had allocated some amount of RAM to it, and the app's launcher was requesting 16 times more, because the /proc/meminfo file was... the host EC2 instance's! Of course, you could say "duh, that's a layered file system, of course that's what it does" and you'd be right. But the point is that a Docker container is not a fully encapsulated thing; it's common enough to query your environment's available RAM, and a clean, encapsulated container system should always give an answer that's reflective of itself, not breaking through to the underlying hardware.
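Here's a rough sketch of what such a launcher does (parse MemTotal from /proc/meminfo, hand a fraction of it to -Xmx), plus something the original script did not do: prefer the container's own cgroup memory limit when one exists. The 75% fraction is an assumption, and the cgroup paths are the usual v2 and v1 locations, which can differ from host to host:

```python
import re


def meminfo_total_kb(meminfo_text):
    """Parse the MemTotal line of /proc/meminfo (reported in kB)."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1))


def cgroup_limit_bytes(limit_text):
    """Parse a cgroup memory limit file; cgroup v2 writes 'max' when unlimited."""
    value = limit_text.strip()
    return None if value == "max" else int(value)


def xmx_flag(total_bytes, fraction=0.75):
    """Give the JVM a fraction of the RAM, leaving the rest for everything else."""
    return "-Xmx{}m".format(int(total_bytes * fraction) // (1024 * 1024))


def read_available_bytes():
    """Container limit if there is one, otherwise what /proc/meminfo says."""
    with open("/proc/meminfo") as f:
        host_bytes = meminfo_total_kb(f.read()) * 1024  # the host's RAM, not the container's
    for path in ("/sys/fs/cgroup/memory.max",                     # cgroup v2
                 "/sys/fs/cgroup/memory/memory.limit_in_bytes"):  # cgroup v1
        try:
            with open(path) as f:
                limit = cgroup_limit_bytes(f.read())
        except (OSError, ValueError):
            continue
        if limit is not None:
            # cgroup v1 reports a huge sentinel when unlimited, so cap at host RAM.
            return min(limit, host_bytes)
    return host_bytes
```

On a box with 1 GiB available, xmx_flag yields -Xmx768m. Newer JVMs can also read container limits themselves (the UseContainerSupport behavior), which may make this kind of launcher logic unnecessary.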

Curious "Features"

Docker's network management is... peculiar. One of its more esoteric features is the order in which ports get EXPOSEd. I was working on a Dockerfile that was extending a popular public image, and I could not make it visible to the outside world, even though my ports were explicitly EXPOSEd and mapped. My parent image was EXPOSing port 443, and I wanted to expose a higher port (4343). For unrelated reasons, the Host's system only exposes the first port it finds, even if several are EXPOSEd; and because there's no UNEXPOSE functionality, it seemed I'd have to forget about extending the public base image and roll my own so I could control the port.

But the Host's bottomless knowledge of Docker revealed that Docker exposes ports in lexicographic order, not numeric. That means 3000 comes before 443. So I could still EXPOSE a high port (3000) as long as it sorted lexicographically before the base image's port 443, and the Host would pick that one for my app.
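The difference between the two orderings is easy to see with plain string sorting. This only illustrates why 3000 lands ahead of 443 when ports are compared as text (the "443/tcp" format is how port specs commonly appear in image metadata); it is not the Host's actual selection code:

```python
# Exposed ports as strings; string comparison goes character by character.
exposed = ["443/tcp", "3000/tcp"]

lexicographic = sorted(exposed)
numeric = sorted(exposed, key=lambda port: int(port.split("/")[0]))

print(lexicographic)  # ['3000/tcp', '443/tcp']  ('3' < '4', so 3000 wins)
print(numeric)        # ['443/tcp', '3000/tcp']
```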

I still have a bruise on my forehead from the violent D'OHs I gave that day.

On a slightly higher level than this inside-baseball arcana, though, this "feature" also shows how leaky the Docker abstraction is: a child image is not only highly constrained by what the parent image does (you can't close/unexpose/override ports the parent exposes), it (or its author) needs to have intimate knowledge of its parent's low-level details. Philosophically, that's somewhat contrary to the Docker ideal of every piece of software being self-contained. Coming at it from the software world, if I saw a piece of object-oriented code with a class hierarchy where a derived class had to know, be mindful of, or override a lot of the parent class's attributes, that'd be a code smell I'd want to get rid of pretty quickly.

Conclusion: Close, But Not Quite There

There is no question Docker is a very impressive and useful piece of software. Coupled with great, state-of-the-art tooling (such as the container tools available from AWS and other places), and some detailed understanding of Docker internals, it's a compelling method for deploying and scaling software quickly and securely.

But in a resource-constrained environment (a small team, or a team with no dedicated ops resource with significant Docker experience), I doubt I'd deploy Docker on a large scale until some of its issues are resolved. Its innate compatibility with ephemeral resources like web app instances also makes it awkward to use with long-running services like databases (also known as persistence layers, so you know they tend to stick around). So you'll likely end up with a mixed infrastructure (Docker for certain things, traditional servers for others; Dockerfiles here, Ansible there; git push deploys here, yum updates there), or experience the ~~ordeal~~ joy of setting up a database in Docker.

Adding to the above, the Docker ecosystem also has a history of shipping code and tools with significant bugs, stability problems, or backward-incompatible changes. Docker for Mac shipped out of beta with show-stopping, CPU-melting bugs. The super common use case of running apps in Docker on dev using code in a shared folder on the host computer was only resolved properly a few months ago; prior to that, inotify events when you modified a file in a shared, mounted folder on the host would not propagate into the container, and so apps that relied on detecting file changes for hot reloads (e.g. webpack in dev mode, or Flask) failed to detect the change and kept serving stale code. Before Docker for Mac came out, the "solution" was to rsync your local folder into its alter ego in the container so the container would "see" the inotify events and trigger hot reloads; an ingenious, effective, but brittle and philosophically bankrupt solution that gave me the fantods.
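Another generic workaround, different from the rsync trick and not something I'm claiming any particular tool uses internally, is to stop relying on inotify altogether and poll file modification times. A minimal sketch; the one-second interval is an arbitrary assumption, and polling trades some CPU and latency for not needing change events at all:

```python
import os
import time


def snapshot(root):
    """Map every file under root to its last-modified time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtimes[path] = os.stat(path).st_mtime
            except OSError:
                pass  # file vanished between listing and stat
    return mtimes


def changed_files(before, after):
    """Files that were added, removed, or modified between two snapshots."""
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}


def watch(root, on_change, interval=1.0):
    """Poll forever, calling on_change(set_of_paths) whenever something changes."""
    previous = snapshot(root)
    while True:
        time.sleep(interval)
        current = snapshot(root)
        changes = changed_files(previous, current)
        if changes:
            on_change(changes)
        previous = current
```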

Docker doesn't make the ops problem go away; it just moves it somewhere else. Someone (you) still has to deal with it. The promise is ambitious, and I feel it'll be closer to delivering on that promise in a year or two. We'll just have to deal with questionable tooling, impenetrable documentation, and doubtful stability for a while longer.

Book in progress: Programming Basics

2016-11-11T18:37:00.001-08:00

I started teaching a friend Java, and figured I might as well share the notes I wrote with anybody who may want them. Feedback appreciated (do read the README first to get a sense of the goals and intended audience, though):

https://github.com/rogthefrog/programming-basics-with-java

Python import basics in plain English

2016-09-02T14:20:00.002-07:00

In my own experience learning Python, and that of others on Python teams I've worked with, a common hurdle is understanding how Python does imports.

The basics are actually very simple, but the documentation tends to be a little neckbeardy and dense, and hard to grok if you're new to the language. So I thought I'd list common, simple practical examples of Python imports in case they help someone.

To import data and functions from somewhere else (another .py file in your project, a standard library like os, or a third-party library you may have installed with pip), you have the following options:

import <module>
import <module> as <other_name>
from <module> import a, b, c
from <module> import *

Let's look at what these options mean.

import <module>

import <module> means the program you're in can access everything that is defined in <module> (variables, classes, functions, etc.), as long as you prepend "<module>." to those names. For example, if <module> is the os library (it comes with Python), which defines a function called getpid and a variable named name, your program can do this:

>>> import os
>>> os.name
'posix'
>>> os.getpid()
51678

This works with your own modules too. Say you created a Python file named network_functions.py containing a constant named BANDWIDTH and a function named connect(url). You can do:

>>> import network_functions
>>> network_functions.BANDWIDTH
1024
>>> network_functions.connect('https://google.com')
Connecting...

If you don't want to type out the whole prefix every time (which gets unwieldy when modules are nested in subdirectories, e.g. import lib.network.connection_functions), you have the following options:

import <module> as <other_name>

This lets you use <other_name> instead of the module's full name.

>>> import lib.network.connection_functions as netfunc
>>> netfunc.BANDWIDTH
1024
>>> netfunc.connect('https://google.com')
Connecting...
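The lib.network.connection_functions module above is hypothetical, so here's the same pattern with a nested module that ships with Python. The alias is bound directly to the innermost module:

```python
import sys
import xml.etree.ElementTree as ET

# ET is the xml.etree.ElementTree module, reachable without the long prefix.
root = ET.fromstring("<config><bandwidth>1024</bandwidth></config>")
print(root.find("bandwidth").text)  # 1024

# It's the very same module object Python registered under its full name.
print(ET is sys.modules["xml.etree.ElementTree"])  # True
```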

from <module> import a, b, c

This lets you import only what you need from <module> into your current program's namespace, so everything you imported can be called by its bare name in your program:

>>> from lib.network.connection_functions import BANDWIDTH, connect
>>> BANDWIDTH
1024
>>> connect('https://google.com')
Connecting...

from <module> import *

This imports everything from the module into your current program's namespace, so you can call everything from the module by its bare name. This is strongly discouraged, and I'll explain why.

>>> from lib.network.connection_functions import *
>>> BANDWIDTH
1024
>>> connect('https://google.com')
Connecting...

This is almost never a good idea, because you don't always know or control what is defined in an external module, and there can be name collisions, e.g. functions or variables with the same name, so you may not be using the variable or function you expect! For example:

In file helpers.py:

def connect(url):
    print("Connecting to " + url)

In file network.py:

def connect(url):
    print("Hacking into " + url)

>>> from helpers import *
>>> from network import *
>>> connect('https://google.com')
# which one is called?

>>> from network import *
>>> from helpers import *
>>> connect('https://google.com')
# which one is called?
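The answer: whichever module was star-imported last, because each from ... import * simply rebinds the names in your namespace without any warning. A small self-contained demo (Python 3; the two modules return strings instead of printing, so the result is easy to check):

```python
import os
import sys
import tempfile

# Write the two example modules into a scratch directory and make it importable.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "helpers.py"), "w") as f:
    f.write('def connect(url):\n    return "Connecting to " + url\n')
with open(os.path.join(workdir, "network.py"), "w") as f:
    f.write('def connect(url):\n    return "Hacking into " + url\n')
sys.path.insert(0, workdir)

from helpers import *   # binds connect -> helpers.connect
from network import *   # silently rebinds connect -> network.connect
first = connect("https://google.com")
print(first)            # Hacking into https://google.com

from helpers import *   # connect is helpers.connect again
second = connect("https://google.com")
print(second)           # Connecting to https://google.com
```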

This example may seem contrived, but it's very common to import a bunch of modules written by different people, and some variable or function names are common or obvious enough that they may appear several times in different modules. Why wouldn't they, after all? Joe doesn't know about Jill's (or your own) module, so they have no reason to coordinate and ensure they're not using the same function names.

If you use from <module> import * with several modules, the odds are very good that at some point you'll call a function and actually invoke a different one than you expected. And that can be really tricky to debug.

So what should you do if you do need a bunch of functionality from a module and don't want to import every single function and variable by name with:

from <module> import var1, var2, var3, fun1, fun2, fun4 # etc

It's simple! Don't use from <module> import *. Instead, use import <module> and presto, your program can use everything from <module>, as long as you put <module>. in front of the names.

>>> import os
>>> os.getpid()
51678

Handy Tips

Do you ever want to know the variables or functions defined in a module you imported without having to Google them? Just use vars or dir:

>>> import os
>>> dir(os)
['EX_CANTCREAT', 'EX_CONFIG', 'EX_DATAERR', 'EX_IOERR', 'EX_NOHOST', 'EX_NOINPUT', 'EX_NOPERM', 'EX_NOUSER', 'EX_OK', 'EX_OSERR', 'EX_OSFILE', 'EX_PROTOCOL', 'EX_SOFTWARE', 'EX_TEMPFAIL', 'EX_UNAVAILABLE', 'EX_USAGE', 'F_OK', 'NGROUPS_MAX', 'O_APPEND', 'O_ASYNC', 'O_CREAT', 'O_DIRECTORY', 'O_DSYNC', 'O_EXCL', 'O_EXLOCK', 'O_NDELAY', 'O_NOCTTY', 'O_NOFOLLOW', 'O_NONBLOCK', 'O_RDONLY', 'O_RDWR', 'O_SHLOCK', 'O_SYNC', 'O_TRUNC', 'O_WRONLY', 'P_NOWAIT', 'P_NOWAITO', 'P_WAIT', 'R_OK', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'TMP_MAX', 'UserDict', 'WCONTINUED', 'WCOREDUMP', 'WEXITSTATUS', 'WIFCONTINUED', 'WIFEXITED', 'WIFSIGNALED', 'WIFSTOPPED', 'WNOHANG', 'WSTOPSIG', 'WTERMSIG', 'WUNTRACED', 'W_OK', 'X_OK', '_Environ', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_copy_reg', '_execvpe', '_exists', '_exit', '_get_exports_list', '_make_stat_result', '_make_statvfs_result', '_pickle_stat_result', ...]  # and a bunch more
>>> vars(os)
{'WTERMSIG': <built-in function WTERMSIG>, 'lseek': <built-in function lseek>, 'EX_IOERR': 74, 'EX_NOHOST': 68, 'seteuid': <built-in function seteuid>, 'pathsep': ':', 'execle': <function execle at 0x...>, '_Environ': <class os._Environ at 0x...>, ...}  # and a bunch more
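Both results are ordinary Python objects, so you can filter them. A short sketch (Python 3): dir() gives you a list of names, and vars() on a module gives you the module's own namespace dictionary:

```python
import os

# Names that don't start with an underscore are the module's public-facing ones.
public_names = [name for name in dir(os) if not name.startswith("_")]
print("getpid" in public_names)  # True

# vars(module) returns the module's __dict__, mapping each name to its object.
namespace = vars(os)
print(namespace is os.__dict__)       # True
print(callable(namespace["getpid"]))  # True
```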

What's Next?

In a later post, I'll cover other tricky aspects of imports, namely how Python maps import statements to code files in directories, and how to debug ImportError: No module named <module> problems that can occur depending on what directories your files are in.

Hopefully that was helpful!

Django, static assets, versioning, and WhiteNoise

2016-04-18T22:04:00.002-07:00

I had an interesting time troubleshooting an issue with Django, WhiteNoise and static asset versioning. This may be obvious to experienced Django users, but not to me; I've maintained Flask and Rails apps before, but Django is a new beast. I'll document it here in case it helps somebody.

My goal was to set up asset versioning in a Django app to serve static files as filename.somehash.js instead of filename.js (same with other file types like css, png, etc). This is standard practice; most modern frameworks have that capability, and different ways to do it.
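The core idea behind versioned names is that the hash is computed from the file's contents, so the name changes only when the file does, and browsers or a CDN can cache each name indefinitely. A minimal sketch; using the first 12 hex digits of an MD5 digest is similar in spirit to Django's hashed static files storage, but treat this as an illustration rather than Django's or WhiteNoise's actual code:

```python
import hashlib
import os


def versioned_name(path, content):
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    digest = hashlib.md5(content).hexdigest()[:12]
    root, ext = os.path.splitext(path)
    return "{}.{}{}".format(root, digest, ext)


print(versioned_name("js/app.js", b"console.log('v1');"))
print(versioned_name("js/app.js", b"console.log('v2');"))  # different hash
```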

I had started using WhiteNoise because the internets suggested it was much, much easier than the alternatives. I was hoping to do asset versioning and deploy a CloudFront CDN at the same time, and WhiteNoise is set up to do just that.

Once everything was set up according to the documentation, I ran python manage.py collectstatic and saw the versioned file names getting generated. Checking the files themselves confirmed that. But when I loaded the app in a browser, only the unversioned file names were being requested.

After much head-scratching, I found this was because the app templates reference the static files with the standard {% load static %} method. The problem went away when I changed that to {% load static from staticfiles %} as suggested in this closed issue on the subject. Note that I didn't try the other option mentioned in that issue, {% load staticfiles %}, but that should also work.

Once the app restarted, beautiful unique file names were being requested and served. But I was occasionally getting 500 errors. I traced those back to instances where the app and WhiteNoise were being asked to serve files that no longer exist. Those references to deleted js, css, etc. files didn't actually harm the app's functionality, but when WhiteNoise is asked to serve them, it throws an exception and causes the app to 500.

That's not ideal behavior. My take is that 50x errors in a production app should never happen, and should always be handled gracefully when they do, so a library that causes 500s by actively raising exceptions, rather than logging, catching and handling them gracefully, isn't ideal. But them's the breaks, and I might yet submit a PR to the owner if I find the time.

In this particular app's case, this behavior was especially non-ideal because some of these files were referenced in commented-out JavaScript, and not actually requested; it looks like WhiteNoise and/or Django greedily consider anything that looks like a static file path to be actually requested, even if it's in code that doesn't execute.

The solution is simple--find all those dangling references and exterminate them! Use those 500s to your advantage by exercising the app and tailing your error logs. It's easy to argue that's something you should do no matter what, so it wasn't hard to convince the code owners it was the right thing to do.
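Exercising the app and tailing the logs is what worked here. As a complementary check (hypothetical, not what I did), you could also scan templates for {% static %} references whose files don't exist. This sketch assumes templates end in .html and all static sources live under one directory; note that it would not catch the references in commented-out JavaScript described above:

```python
import os
import re

# Matches {% static 'path' %} or {% static "path" %} in Django templates.
STATIC_TAG = re.compile(r"""{%\s*static\s+['"]([^'"]+)['"]\s*%}""")


def dangling_static_refs(template_dir, static_root):
    """Return (template, path) pairs whose file doesn't exist under static_root."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(template_dir):
        for name in filenames:
            if not name.endswith(".html"):
                continue
            template = os.path.join(dirpath, name)
            with open(template, encoding="utf-8") as f:
                text = f.read()
            for ref in STATIC_TAG.findall(text):
                if not os.path.exists(os.path.join(static_root, ref)):
                    missing.append((template, ref))
    return missing
```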

Installing wxPython in a virtualenv on Centos 6.7

2016-01-04T10:25:00.002-08:00

I'm looking at wxPython to write a GUI for an app I'm working on, and as it turns out using wxPython with virtual environments isn't completely obvious. Hopefully someone finds this helpful.

My distribution is CentOS 6.7 with a hand-built Python 2.7.6.

Step 1: Install wxPython

This will install wxPython in your system's default Python library directory (not the one you want).

$ sudo yum install wxPython
$ sudo find / -name 'wx*.py'
/usr/lib64/python2.6/site-packages/wxversion.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wxPython/lib/wxpTag.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wxPython/lib/wxPlotCanvas.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wx/tools/XRCed/plugins/wxlib.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wx/tools/Editra/src/wxcompat.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wx/lib/wxcairo.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wx/lib/wxpTag.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wx/lib/wxPlotCanvas.py

Step 2: Create and activate your virtualenv

$ cd
$ virtualenv -p /usr/bin/python2.7 venv
$ source venv/bin/activate

Importing wx will fail:

$ python
Python 2.7.6 (default, Dec 2 2013, 21:17:42)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import wx
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named wx

Step 3: Symlink wxPython into your virtualenv

$ cd ~/venv/lib/python2.7/site-packages
$ ln -s /usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wx
$ ln -s /usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wxPython
$ ln -s /usr/lib64/python2.6/site-packages/wxversion.py
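If you need to repeat this in several virtualenvs, the same three links can be scripted. A minimal sketch, kept compatible with Python 2.7 since that's the interpreter in the virtualenv; the paths are the ones from this post and will differ on other systems:

```python
import os

# Paths from the steps above; adjust for your system.
SYSTEM_SITE = "/usr/lib64/python2.6/site-packages"
WX_DIR = os.path.join(SYSTEM_SITE, "wx-2.8-gtk2-unicode")
TARGETS = [
    os.path.join(WX_DIR, "wx"),
    os.path.join(WX_DIR, "wxPython"),
    os.path.join(SYSTEM_SITE, "wxversion.py"),
]


def link_into_venv(targets, venv_site_packages):
    """Symlink each target into venv_site_packages, skipping names already there."""
    created = []
    for target in targets:
        link = os.path.join(venv_site_packages, os.path.basename(target))
        if os.path.lexists(link):
            continue  # already linked, or a real file is in the way
        os.symlink(target, link)
        created.append(link)
    return created
```

For the layout in this post, you'd call link_into_venv(TARGETS, os.path.expanduser("~/venv/lib/python2.7/site-packages")).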

Step 4: Start coding!

$ source ~/venv/bin/activate
$ python
Python 2.7.6 (default, Dec 2 2013, 21:17:42)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import wx
>>> app = wx.App()
>>> frame = wx.Frame(None, -1, 'lol')
>>> frame.Show()
True
>>> app.MainLoop()

Note: I only just started playing with wxPython, so there may be other symlinks required to make it work. Let me know if so, and I'll update the post.

Recruiter isn't even trying anymore

2015-03-25T13:41:00.002-07:00

I got this InMail on LinkedIn today.

March 25, 2015, 1:33 PM
Dear Roger,
Trust this finds you in good health!! I wanted to share a Back End Engineer role with you with one of my direct clients , its a long term contract and location is Mountain View, CA. Here is the short story:

The candidate should be able to quickly adapt to different environment/framework as the project needs.
These are minimum required skills.
Open source backend stack
Web backend frameworks in Java OR C#, OR Python OR Ruby, etc.
Database design and maintenance using SQL and/or NoSQL
Let me know if you think that this would be a fit as per your skills and expertise, if no, maybe you can refer someone !!
Best,
Gargi

Nothing about the company (industry? size? age? product?)
The technology stack is essentially "whatever" (but open source! Including C#!)
The data stack is essentially "whatever"
The role is web AND database design AND DBA

I LOL'ed heartily as I marked the InMail as spam.

Comcast lies

2015-02-19T15:39:00.001-08:00

Verbatim transcript of a chat session with Naval. I was having internet connectivity problems. Naval says (s)he resolved them, and then did the standard upsell trick, in which (s)he lied. Then when I noticed the problems weren't resolved, (s)he lied again.

Summary

Lie #1: " you will get additional service at a cheaper cost."

The rep correctly noted that my current service is 50 Mbps down plus basic cable TV for $82/mo. (S)he said Comcast had an offer where I could get "additional service at a cheaper cost." The "cheaper cost" is $101.99 for the first 12 months and then $126.99 for months 13-24.

Lie #2: "Every thing is totally fine"

At the exact time Naval said "Every thing is totally fine," Comcast was having well-documented problems:

https://downdetector.com/status/comcast-xfinity/san-francisco

Lie #3 (probably): "You will also check this problem on our site also"

The "server upgradation" and the service degradation are not listed.

Transcript

Roger,I have also checked your account and found that we have a great offer for you where you will get additional service at a cheaper cost.

what is that?

I can see that currently you are paying around $82 .00 plus rental of devices and getting 50mbps speed

that's correct, yes

And limited basic package for cable in which you are getting 10 channels

However, I have a good deal for you, In this package you will get 140+ Cable Channels and 105 mbps Internet Speed & Unlimited Nationwide Talk and Text.

Double Speed!!

14 times channels!!

Sounds Good?

"additional service at a cheaper cost." = what is the cost?

This package will cost you only $101.99 for first 12 months, after that it will cost you $126.99 from 13-24 months.

how is that cheaper than $82?

In this you will get double speed and 14 times channel and unlimited text and talk.

you said additional services at a cheaper cost.

then you listed additional services at a higher cost.

We are so confident in our products and services that we would want you to try them - risk free. If you’re not satisfied and wish to cancel the services for any reason, you can do so in the first 30 days and get your money back.

what you did above is called bait and switch and it's illegal in the US

this is one of the reasons people hate Comcast so much.

Okay. I appreciate your decision.

you just lied to me.

Your satisfaction is my priority. Is there anything else I can assist you with? I am more than glad to help you out further.

I am satisfied with the tech support, but I am not satisfied with being lied to.

Thank you.

It was a pleasure assisting and chatting with you today! Have a great day and thank you for choosing Comcast. If you have any further questions, please do not hesitate to give us a call at 1-877-870-4310 or visit us at www.comcast.net for technical support. We appreciate your business!

you're funny.

Naval: I am glad to have assisted and get your concern fully resolved for today!

Naval: You have opened a window for Comcast Support and I don't want to miss an opportunity to support you. Are we still connected?

Naval: Roger, Is there anything else I can assist you with?

Roger_: my internet problems aren't resolved

Roger_: still happenin

Roger_: g

Roger_: lots of services are hanging, etc.

Naval: Let me check that out for you. Would you mind waiting for a couple of minutes while I do the research?

Roger_: sure

user Roger_ has left room

user Roger has entered room

Naval: Welcome Back!!

Naval: Roger, As I can check from here your account is totally fine and you are getting good services.

Naval: Please run a speedtest for us to ascertain the speed you are receiving at the moment and send me the result link once you're done. Here's the link : http://speedtest.comcast.net/

Roger_: starting speed test

Naval: Okay.

Roger_: haha

Roger_: "starting in 5900 seconds"

Roger_: that's an hour and a half

Naval: Please allow me a minute .

Roger_: normally I use speedtest.net but it's unreachable for me today

Roger_: I'll try this http://www.speakeasy.net/speedtest/

Roger_: I just ran the test

Roger_: http://imgur.com/eMiqGft

Naval: Roger, you are facing this issue because our system is upgrading.

Roger_: 8.80M down, 0.68 up

Roger_: I just did a tracert to amazonaws and it fails

Naval: Every thing is totally fine.

Roger_: Tracing route to amazonaws.com [207.171.166.22] [formatting garbled, thanks cmd.exe, lots of timeouts]

Roger_: how can you say everything is totally fine when I'm getting 8M down and traceroutes are not working

Naval: You will get the right speed once the upgradation procedure is finished.

Roger_: you're saying my entire set of problems is because you're upgrading something?

Naval: It is due to our server upgradation.

Naval: No worries.

Roger_: what kind of server? I'd like to know more about this.

Naval: Sure!!

Roger_: please don't tell me no worries. I work on the internet. That's how I make a living. I have deliverables. If my internet is not working, I cannot work. This is a big deal.

Naval: Roger, You will also check this problem on our site also.

Roger_: I checked before this chat and it said there were no issues.

Roger_: where is the status shown?

Naval: I am doing advanced trouble shooting steps from here.

user Roger_ has left room

user Roger has entered room

Naval: Please check now.

Naval: Please tell me the result.

Roger_: aha

Roger_: now we're talking

Roger_: ... maybe

Roger_: nope.

Roger_: no better.

Naval: Roger, Please provide me your reliable phone number.

Roger_: xxx xxx xxxx

Roger_: 20.96 down, 0.44 up

Naval: I am raising ticket for you.

Naval: Our senior technical team will call you on the number given by you.

Naval: Please note down your ticket number

Update

I spoke with another rep, Jose, who was great. Excerpts from our chat, mostly for the lulz:

Jose: May I know the issue that our rep lied to you?
Roger: (summary of the above)
Roger: I asked the price
Roger: "This package will cost you only $101.99 for first 12 months, after that it will cost you $126.99 from 13-24 months."
Roger: $101.99 is not actually cheaper than $91.96.
Jose: yes that is correct. Its simple math.

Then this unintentionally hilarious pearl:

Jose: On internet issue they can be corrected by our Internet dept.
Jose: We are cable troubleshooting dept so all of our tools here are for cable TV.
Jose: On broken promises those usually happens on Sales dept.
Jose: Here on cable troubleshhoting we dont lie.

Enabling LZO compression for Hive to avoid cannot find class com.hadoop.mapred.DeprecatedLzoTextInputFormat error

2014-06-19T13:29:00.005-07:00

I just spent a bunch of time reading through documentation and Google Group postings about how to enable LZO compression in Hive, only to find none of them was the right solution. In the end I did find something that worked, so hopefully this can help someone.

Goal: enable LZO-compressed files to be used for Hive tables.

Environment: Hadoop cluster managed with Cloudera Manager version 5.

Prerequisites:

install and activate the parcel that contains the LZO library as shown here
configure it as shown here

What's missing from the instructions and the Google Group postings about that error is how to tell Hive where to find the Hadoop LZO jar. The classpath settings described above are not sufficient, and you'll get this error when running a Hive query against an LZO table:

cannot find class com.hadoop.mapred.DeprecatedLzoTextInputFormat

To fix this:

Go to your Cloudera Manager UI home page
Click Hive
Click Configuration > View and Edit
Under Service-Wide > Advanced, look for Hive Auxiliary JARs Directory
Set the value to /opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib
Restart the Hive service (and any related services)
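Before restarting, it's worth confirming that the directory actually holds the jar. `com.hadoop.mapred.DeprecatedLzoTextInputFormat` ships in the hadoop-lzo jar. The `hadoop-lzo*.jar` file-name pattern below is my assumption about how the parcel names that jar, so check the parcel's contents if nothing matches. A minimal Python sketch:

```python
import glob
import os

# The value set for "Hive Auxiliary JARs Directory" above.
AUX_DIR = "/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib"


def find_lzo_jars(aux_dir=AUX_DIR, pattern="hadoop-lzo*.jar"):
    """Return the sorted paths in aux_dir whose names match pattern.

    The default pattern is an assumption about the parcel's jar naming.
    """
    return sorted(glob.glob(os.path.join(aux_dir, pattern)))


if __name__ == "__main__":
    jars = find_lzo_jars()
    if jars:
        print("\n".join(jars))
    else:
        print("no hadoop-lzo jar found in " + AUX_DIR)
```

Run it on each host that runs Hive, because the setting points at a local directory.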

Now you can run queries against LZO-compressed files.

As a reminder, to create a table backed by LZO-compressed files in HDFS, do something like this:

CREATE EXTERNAL TABLE `my_lzo_table`(`something` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/hdfs/path/to/your/lzo/files';

Cloudera Manager Fails to Restart

2014-06-11T18:04:00.001-07:00

I've been experimenting with Cloudera Manager to manage Hadoop clusters on EC2. So far it seems to be working a little better than Ambari, which managed to install its agent software on all my nodes but always failed to start the required services.
Cloudera Manager did fail as well, but that seemed to be due to my security group settings. I changed the configuration and restarted the service on the host, but the restart always failed with this error:

Caused by: java.io.FileNotFoundException: /usr/share/cmf/python/Lib/site$py.class (Permission denied)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.hibernate.ejb.packaging.ExplodedJarVisitor.getClassNamesInTree(ExplodedJarVisitor.java:126)
at org.hibernate.ejb.packaging.ExplodedJarVisitor.getClassNamesInTree(ExplodedJarVisitor.java:134)
at org.hibernate.ejb.packaging.ExplodedJarVisitor.getClassNamesInTree(ExplodedJarVisitor.java:134)
at org.hibernate.ejb.packaging.ExplodedJarVisitor.doProcessElements(ExplodedJarVisitor.java:92)
at org.hibernate.ejb.packaging.AbstractJarVisitor.getMatchingEntries(AbstractJarVisitor.java:149)
at org.hibernate.ejb.packaging.NativeScanner.getClassesInJar(NativeScanner.java:128)
... 31 more

Odd, I thought, since by default the service runs as root and should have free rein. So I poked around in that Python library directory, and lo and behold:

I chmod'ed the .class files to 644 (in /usr/share/cmf/python/Lib and /usr/share/cmf/python/Lib/simplejson) and sure enough everything is working again.
Hopefully this is helpful to somebody.
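If you'd rather script the fix, here's a minimal Python sketch that applies the same chmod to the two directories mentioned above. It is deliberately non-recursive and only touches `.class` files that sit directly inside each directory. `chmod_class_files` is a hypothetical name, and the script needs to run as root.

```python
import os

# The two directories whose .class files I chmod'ed by hand.
CMF_DIRS = (
    "/usr/share/cmf/python/Lib",
    "/usr/share/cmf/python/Lib/simplejson",
)


def chmod_class_files(directories, mode=0o644):
    """Set mode on every .class file directly inside each directory.

    Returns the paths that were changed. Subdirectories are not descended into.
    """
    changed = []
    for directory in directories:
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if name.endswith(".class") and os.path.isfile(path):
                os.chmod(path, mode)  # 0o644 == rw-r--r--
                changed.append(path)
    return changed


if __name__ == "__main__":
    try:
        for path in chmod_class_files(CMF_DIRS):
            print("chmod 644 " + path)
    except OSError as exc:
        print("chmod failed: %s" % exc)
```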

Installing Scrapy on an Amazon CentOS AMI

2014-05-26T15:30:00.002-07:00

We're experimenting with Scrapy, and I thought I'd share what I found while installing the Scrapy package, as it has multiple dependencies many Python installations don't normally include, and those are not listed in the documentation.

* First off, you want bzip2:

$ cd /tmp
$ wget http://bzip.org/1.0.6/bzip2-1.0.6.tar.gz
$ tar -xzf bzip2-1.0.6.tar.gz
$ cd bzip2-1.0.6
$ sudo make -f Makefile-libbz2_so
$ sudo make
$ sudo make install PREFIX=/usr/local
$ sudo cp libbz2.so.1.0.6 /usr/local/lib

* Then you want the libffi headers

$ sudo yum install libffi-devel

* Then you want to download the Python source and extract it:

$ wget https://www.python.org/ftp/python/2.7.6/Python-2.7.6.tgz
$ tar -xzf Python-2.7.6.tgz
$ cd Python-2.7.6

* I usually uncomment any lines referencing ssl and zlib in Modules/Setup.dist

$ vi Modules/Setup.dist
# find and uncomment the lines

* Then build Python:

$ ./configure --prefix=/usr/local
$ sudo make
$ sudo make altinstall
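Before moving on, check that the new interpreter actually picked up those libraries, because the build quietly skips a module when its headers are missing. Here's a minimal sketch to save as a script and run with /usr/local/bin/python2.7. The module list is my assumption, based on the steps above (bzip2, ssl, zlib).

```python
import importlib

# Modules the steps above are meant to provide; a hand-built Python
# silently leaves them out if the matching libraries weren't found.
MODULES = ("bz2", "ssl", "zlib")


def check_modules(names=MODULES):
    """Return {module name: True if it imports, False otherwise}."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in sorted(check_modules().items()):
        print("%-8s %s" % (name, "ok" if ok else "MISSING"))
```

If anything shows MISSING, fix the library and rebuild before creating the virtualenv. A virtualenv made from a broken interpreter inherits the same gaps.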

* Install virtualenv if you don't have it (you should)

* Create and activate a virtualenv with your fresh Python

$ cd
$ virtualenv -p /usr/local/bin/python2.7 myenv
$ source myenv/bin/activate

* Install Scrapy

$ easy_install Scrapy

This was tested on an Amazon AWS image named amzn-ami-pv-2013.09.2.x86_64-ebs (ami-a43909e1), known as Amazon Linux AMI x86_64 PV EBS with the following version strings:

$ cat /proc/version
Linux version 3.4.73-64.112.amzn1.x86_64 (mockbuild@gobi-build-31003) (gcc version 4.6.3 20120306 (Red Hat 4.6.3-2) (GCC) ) #1 SMP Tue Dec 10 01:50:05 UTC 2013
$ cat /etc/*-release
Amazon Linux AMI release 2014.03

Elastic MapReduce, Hive and Input Files

2014-04-24T17:10:00.002-07:00

We're using Hive and Amazon's Elastic MapReduce to process sizable data sets. Today, I was wondering why a simple count query on a table with under a billion rows was taking a long time. The table file is in a single gzipped file in an S3 bucket, and Hive was only using a single mapper. So I thought, hrm, it looks like the job isn't distributed at all, so let's try splitting the input file into a bunch of smaller files to see if Hive will be able to put more mappers to work.
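The post doesn't say how the file was split, so here's one possible way as a minimal Python sketch. The `split_file` name and the `part-NNNNN` file naming are my own. Upload the parts to the table's S3 location afterwards however you normally do. The script deals lines round-robin into N parts and can optionally gzip each one. That works for a Hive table because row order doesn't matter.

```python
import gzip
import os
import sys


def split_file(src, out_dir, n_parts, compress=False):
    """Deal the lines of src round-robin into n_parts files in out_dir.

    src may be plain text or .gz. Returns the list of part paths.
    All parts are open at once, so n_parts must stay under your
    open-file limit (240 is typically fine on Linux).
    """
    suffix = ".gz" if compress else ""
    paths = [os.path.join(out_dir, "part-%05d%s" % (i, suffix)) for i in range(n_parts)]
    write_open = gzip.open if compress else open
    read_open = gzip.open if src.endswith(".gz") else open
    outs = [write_open(p, "wb") for p in paths]
    try:
        with read_open(src, "rb") as f:
            for i, line in enumerate(f):
                outs[i % n_parts].write(line)  # bytes in, bytes out: no decoding
    finally:
        for out in outs:
            out.close()
    return paths


if __name__ == "__main__":
    if len(sys.argv) < 4:
        print("usage: split_file.py SRC OUT_DIR N_PARTS [--gzip]")
    else:
        parts = split_file(sys.argv[1], sys.argv[2], int(sys.argv[3]),
                           compress="--gzip" in sys.argv[4:])
        print("wrote %d parts" % len(parts))
```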

This is the initial slow job, with a single gzipped file for the table in S3:

-- SINGLE .gz FILE AS HIVE TABLE
hive> select count(*) FROM mytable;

Job 0: Map: 1 Reduce: 1 Cumulative CPU: 254.84 sec HDFS Read: 207 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 4 minutes 14 seconds 840 msec
OK
239370915
Time taken: 274.51 seconds, Fetched: 1 row(s)

This is the same job run against 240 non-gzipped files for the table in S3:

-- MULTIPLE FILES, not gzipped
hive> select count(*) FROM mytable_multiple_files_no_gzip;

Job 0: Map: 48 Reduce: 1 Cumulative CPU: 538.05 sec HDFS Read: 25536 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 8 minutes 58 seconds 50 msec
OK
239370915
Time taken: 55.071 seconds, Fetched: 1 row(s)

Not bad, eh?

Then I tried the same split schema, except each file was gzipped individually (240 gzipped input files):

-- MULTIPLE FILES, gzip
hive> select count(*) FROM mytable_multiple_files_gzip;

Job 0: Map: 240 Reduce: 1 Cumulative CPU: 1552.43 sec HDFS Read: 52080 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 25 minutes 52 seconds 430 msec
OK
239370915
Time taken: 112.735 seconds, Fetched: 1 row(s)

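So with gzipped input files I got one mapper per file (240 mappers for 240 files). With uncompressed input files each mapper handled five files (48 mappers for 240 files). That fits with gzip not being a splittable format: a .gz file has to be read start to finish by a single mapper, which is also why the original single-file table got only one mapper.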

These numbers were obtained on a cluster with 8 i2.2xlarge data nodes and an m3.xlarge name node.
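To put the three runs side by side, here's the arithmetic as a small Python sketch. It uses only the figures reported above. The 240 plain files come out at 5 files per mapper and roughly a 5x wall-clock speedup over the single .gz file. The 240 .gz files come out at about 2.4x. Meanwhile total CPU time grows about 2.1x and 6.1x respectively.

```python
# (label, input files, mappers, cumulative CPU seconds, wall-clock seconds),
# copied from the three Hive runs above.
RUNS = [
    ("single .gz file", 1, 1, 254.84, 274.51),
    ("240 plain files", 240, 48, 538.05, 55.071),
    ("240 .gz files", 240, 240, 1552.43, 112.735),
]


def summarize(runs):
    """Compare each run to the first one (the single .gz file)."""
    _, _, _, base_cpu, base_wall = runs[0]
    summary = {}
    for label, files, mappers, cpu, wall in runs:
        summary[label] = {
            "files_per_mapper": files / float(mappers),
            "speedup": base_wall / wall,   # >1 means faster than the baseline
            "cpu_ratio": cpu / base_cpu,   # >1 means more total CPU spent
        }
    return summary


if __name__ == "__main__":
    summary = summarize(RUNS)
    for label, _, _, _, _ in RUNS:
        s = summary[label]
        print("%-16s %5.1f files/mapper  %.2fx faster  %.2fx CPU"
              % (label, s["files_per_mapper"], s["speedup"], s["cpu_ratio"]))
```

More mappers buy wall-clock time at the cost of total CPU, which matters if you pay for the cluster by the hour.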

Typically (at least that's what a cursory Google search suggests), people have the opposite problem--too many small-ish files in S3, and too many mappers. Too many mappers can delay your reducers' work. So I'll do some testing on different splitting schemas for the same data set and update.

McCarthy was self-righteous too

2014-04-04T15:38:00.000-07:00

Brendan Eich, inventor of JavaScript, just resigned from his brand new position as CEO of Mozilla, after it was discovered he made a $1000 donation to the anti-gay-marriage campaign in California known as Prop 8.

That discovery caused uproar among the self-righteous bien-pensants who work for Mozilla, and a number of employees posted tweets about how they thought he should resign.

I'm angry about this because this isn't very different from McCarthyism in reverse. A guy was forced out of a job because his political views don't agree with the majority.

I feel opposing gay marriage is bigoted, wrong, indefensible and on the wrong side of history. I don't know Eich. For all I know he's a raging asshole with ultra-right-wing views. He might even hate kittens and burp at the dinner table. I don't know.

But what I do know is that getting forced out of a job by a self-righteous San Francisco mob of entitled nerds who have probably never even seen a Republican in the flesh is just as indefensible. It's not what America and California are about. And it shows liberals can be assholes, too, when they put their minds to it.

I'd venture to say a very large number of CEOs are raging right-wing Republicans with questionable ethics. If you don't like your CEO's politics, you're free to work somewhere else. Your job isn't in grave danger if you and your CEO don't see eye-to-eye in terms of politics--there are laws on the books protecting you from discrimination. Why should your CEO's job be in jeopardy for that very same reason?

Eich's contributions to Web tech are immense and he may well be as capable as anyone of running Mozilla, a company he's been with for years. Yet he lost his job because of his politics. And that's not right, whether you agree with him or not.

Notes on Azkaban

2014-03-12T13:13:00.001-07:00

I've been evaluating tools to run data processing jobs and narrowed my list down to Luigi and Azkaban.

I nixed Luigi for a number of reasons:

you can't execute jobs from the web UI.
you can't schedule jobs--you still have to use cron and all the bs that goes with that (manually managing overlap, e.g. what should I do when my first job is still running when the second job is scheduled to start?).
the documentation is horrid.

So far so good with Azkaban, though it does have some quirks I'm working through. For example:

the executor.host property is not in the default config. The web component wisely defaults to localhost, but it would be handy to have it in the default config, even commented out (like many other properties) so you can run Azkaban in its preferred distributed mode without having to look through Google Groups questions.
~~I still can't figure out how to set up the host configuration for the Hadoop cluster Azkaban is supposed to talk to~~. Fixed--see below.

But the UI is intuitive and it handles the overlap issue (for a given job flow) like a champ: if a job is scheduled or run while it's still running, you can tell Azkaban to abort the second run, let it run in parallel, or wait until the first run completes before the second run starts.

Things to remember:

Hadoop and Hive must be installed on the job executor box. Not running, just installed, with the standard HADOOP_HOME and HIVE_HOME env vars set, etc.
Then you have to put your actual cluster's config files in the executor server's Hadoop config directory (typically $HADOOP_HOME/conf). This is because Azkaban looks for your remote Hadoop name node location in its local Hadoop configuration files.
There's a lot of documentation out there based on Hadoop's old (1.x) directory structure. Hadoop 2.x has changed a lot of that and the jars aren't where you'd expect them. Inspect your classpaths in all the config and properties files used by Azkaban and your Hive jobs. If a Hive job fails, it's a good bet you have a classpath problem, so look at your Azkaban executor server logs (not just the logs in the web interface).
The startup and shutdown scripts in bin/ are pretty brittle. Make sure all the directories are set correctly and you handle errors if they're not. Also, make sure to run them as bin/script.sh instead of cd bin; ./script.sh because they rely on relative directories.
Remember to open port 12321 in the executor server's firewall so the web server can submit jobs.
Remember to open port 9000 on your master Hadoop node so Azkaban can submit jobs to it.
One project == one flow. If you upload more than one flow into a project, only the last one is retained.
You can't run the Hive job examples as-is. They won't work because they're missing a few properties.
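Firewall rules are the easiest of these to get wrong, so here's a minimal Python sketch that checks whether a TCP port accepts connections. Run it from the web server box against the executor's port 12321, and from the executor box against the Hadoop master's port 9000. The script name and the host:port argument format are my own, and the host names in the usage comment are placeholders. It only tells you that something is listening and reachable, not that it's Azkaban or a name node.

```python
import socket
import sys


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # refused, filtered (timed out) or unresolvable host
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # e.g. python check_ports.py azkaban-exec:12321 hadoop-master:9000 (placeholder hosts)
    for arg in sys.argv[1:]:
        host, _, port = arg.rpartition(":")
        state = "open" if port_open(host, int(port)) else "CLOSED or unreachable"
        print("%s:%s %s" % (host, port, state))
```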

This is the basic Hive word count job example:

type=hive
user.to.proxy=azkaban
hive.script=scripts/hive-wc.hql

In order for it to work, it needs to look like this:

type=hive
user.to.proxy=hadoop
azk.hive.action=execute.query
classpath=./*,./lib/*,${hadoop.home}/*,${hadoop.home}/lib/*,${hive.home}/lib/*
hive.query.file=scripts/hive-wc.hql

I'm using Azkaban 2.1 and Hadoop 1.2.1. Different versions will have different paths and classpaths, but the azk.hive.action, classpath and hive.query.file are crucial (hive.script doesn't work).
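A missing property only shows up when the job fails, so a quick lint of your .job files can save a round trip. Here's a minimal Python sketch that assumes .job files are simple key=value lines. The required-key list comes from the working example above, and `check_hive_job` is a hypothetical helper, not part of Azkaban.

```python
import sys

# Keys the working Hive job above needs; hive.script is the one that doesn't work.
REQUIRED_KEYS = ("type", "azk.hive.action", "classpath", "hive.query.file")


def parse_job(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_hive_job(text):
    """Return a list of problems found in a Hive .job file's contents."""
    props = parse_job(text)
    problems = ["missing " + key for key in REQUIRED_KEYS if key not in props]
    if "hive.script" in props:
        problems.append("hive.script doesn't work here; use hive.query.file")
    return problems


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            problems = check_hive_job(f.read())
        print("%s: %s" % (path, "; ".join(problems) or "ok"))
```

Run against the stock example, it flags the three missing keys and the unsupported hive.script line. The fixed version passes cleanly.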

Other things that don't work out of the box:

When proxying to use the hadoop user of your choice, you need to set the security manager class to its fully qualified name. The sample config does not fully qualify the class name and so the executor fails to load. To wit:

# hadoop security manager setting common to hadoop jobs
hadoop.security.manager.class=HadoopSecurityManager_H_1_0

should be

# hadoop security manager setting common to hadoop jobs
hadoop.security.manager.class=azkaban.security.HadoopSecurityManager_H_1_0

in the plugins/jobtypes/commonprivate.properties file.

OCD

2013-08-15T17:58:00.001-07:00

The deepest cruelty of OCD is that the D stands for disorder.

A painless git workflow

2013-07-10T18:48:00.003-07:00

I was thrown into the Rails and git worlds a couple of years ago when joining Crunched, and while Ruby was a fairly painless experience, git was heinous for about a year. Then it clicked, mostly thanks to Luke and TJ and a good workflow. So I figured I'd share what we settled on. Maybe it'll save someone some grief.

Static PNGs below:

Engine Tard

2013-06-28T12:43:00.003-07:00

Today I got an email from Engine Yard asking me to be an Engine Yard ambassador / promoter / evangelist. Given our painful experiences with EY, I said no thanks. To their credit, they asked why. So I sent them this list, and will document it here for LOLs and posterity.

What we didn't like about EY:

your pricing is unjustifiably high considering the shortcomings below.
your approach to database replication is facepalm-inducing. It's explicitly designed *not* to be used to fail over. I lost count of how many times I said WTF.
getting SSL to work properly with all the right headers is unnecessarily painful (stunnel).
there's no API to scale up and down by script--everything has to be done manually. WTF again, big time.
you've fixed this (I hope), but for a long time, removing instances from an environment did not remove those from haproxy, which meant that unless you ran the recipes manually, the app master would still be sending traffic to instances that were either turned off or assigned to another customer of yours, resulting in 404s and other hilarious situations. I sang a Viking song of battle and sorrow when I realized that's what was happening and you guys confirmed it (after I was done laughing).
when adding or removing instances from your web UI, the number of failures is greater than the number of successful changes performed. More than half the time we would have to re-run the add or remove process for the instances to be added successfully. This is for a completely vanilla Rails app with a tiny number of servers, i.e. the default / base case you guys are catering to.
when adding more than a couple of instances, the app master's NIC gets flooded by requests used to provision the new instances, which brings the entire stack down. We LOLed heartily when we figured out that was going on. The only safe way to add instances is one or two at a time. Given how long it takes to do that (see above re. failures), it's a giant pain in the rear.
the default stack of app master + slaves is stupid. An active app master serving traffic and SSL termination shouldn't also be a load balancer. That's just dumb.

To be perfectly blunt, I really like the idea of your service, and I'm sure it's great for people to get started with hosting, but every time we tried to do something with our stack, it felt like Engine Yard was designed and operated by amateurs who don't have any experience running, let alone hosting, or offering hosting for, a real web business. I wouldn't use EY even if there was 0 markup over AWS.

PS: it got so bad we started calling you guys EngineTard and I drew the attached. Note my Photoshop skills are not particularly advanced.

Update: I have to say EY has class and a sense of humor. After receiving this diatribe, they sent me a $25 Amazon gift card.

Update: One thing I forgot to mention is the frequent billing errors. We cancelled our account on April 2; in May I got a bill for usage we didn't incur, and on July 1 we got a bill for snapshot storage and unused IP addresses for an account that's been closed for 3 months. Sigh.

Ridiculous

2013-06-19T09:12:00.001-07:00

If you use the words "ridiculous" or "ridiculously" as an intensive ("it's ridiculously easy") then we have nothing to say to each other.