
2016-12-01

AMA Part 2: Software Development Resources

In the previous post, I discussed a few ways in which I have found coding bootcamps to be inadequate. In this post, I will present a list of resources I have found very useful as a software engineer. Many are free, and most others can be found in used bookstores (or Amazon) for moderate prices. I'd love to hear other people's favorite resources, so please add yours in the comments or on Twitter.

General Computer Science and Programming

Disclaimer: I didn't study CS formally, so this list is short and almost certainly out of date. Please send me your suggestions!

Language Specific

Official Tutorials / Documentation


Note: I strongly recommend against a very popular Ruby introduction, Why's (Poignant) Guide to Ruby. A lot of people seem to love it; I found it way too cute, unclear, and trying too hard--just bad.

Books

Note: even if you don't program in C or C++, Kernighan & Ritchie is a great introduction to computer programming at a low level. The Lippman and the Liberty & Halpern books go very well together, and are project-oriented walkthroughs of essential features. All three of these books are very short, but they pack a punch.
Note: get both Nutshell books and read them in parallel. Mine is a free work-in-progress, aimed at people who have never programmed but want/need to learn Java.
Note: anything by Flanagan, Fowler, Beck, Crockford or Martelli is worth reading.

After You've Coded For A While

Note: I listed Stroustrup here because it's a pretty dense, dry read (do not read it as your first programming book) that requires a good idea of how things work under the hood. It's also not a practical introduction to C++ programming; it's a guide to the C++ language, and from that perspective it's full of vital insights about choices made when C++ was designed, which in turn makes you think about what computer languages can do, and the various ways they do it.

Specific Topics


2016-11-23

Docker lessons learned 1 year in

A little under a year ago, I started doing devops work for a startup (the Company) with very specialized needs. As it operates in a highly regulated sector, the company's access to their infrastructure is extremely restricted, to prevent accidental or malicious disclosure of protected information. Their in-house web apps and off-the-shelf on-prem software are deployed on a compliant PaaS (I'll call them "the Host", even though they offer vastly more than just hosting), which is very similar to Heroku and uses Docker exclusively for all applications deployed on their private EC2 cloud. I knew about Docker but had never used it, and it's been an interesting few months, so I thought I'd write up some observations in case they help someone.

Topsy Turvy

If you're coming to Docker from a traditional ops shop, it's important to keep in mind that many of your old habits and best practices either don't apply or are flipped upside down in a Docker environment. For example, you're probably going to use config management with Chef or Ansible a lot less, and convert your playbooks into Dockerfiles instead. Ansible, Chef, and the like are based on the assumption that infrastructure has some level of permanence: you stand up a box, set it up with the right services and configuration, and it will probably be there and configured when you get around to deploying your app to it. By contrast, in the Docker world, things are much more just-in-time: you stand up and configure your container(s) while deploying your app. And when you update your app, you just toss the old containers and build new ones.

Another practice that may feel unnatural is the foregrounding of (the main) processes. On a traditional web server, you'd typically run nginx, some kind of app server, and your actual app, all in the background. Docker, on the other hand, tends to use a one-service-one-container approach, and because a container dies when its main process does, you have to have something running in the foreground (not daemonized) for your container to stay up. Typically that'll be your main process itself (e.g. nginx); alternatively, you can daemonize it and run an infinite tail -f /some/log in the foreground instead.

As a corollary, while traditional server setups often have a bunch of backgrounded services all logging to files, a typical Dockerized service will only have one log you care about (the one for your main process), and because a container is usually an ephemeral being, its local file system is best treated as disposable. That means not logging to files, but to stdout instead. It's great for watching what's happening now, but not super convenient if you're used to hopping on a box and doing quick greps and counts or walking through past logs when troubleshooting something that happened an hour ago. To do that, you have to deploy a log management system as soon as your app goes live, not after you have enough traffic and servers that server-hopping, grep and wc have become impractical. So get your logstash container ready, because you need it now, not tomorrow.
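To make the point concrete, here's a minimal Python sketch of the stdout-logging idea (the logger name and format are illustrative; the stream parameter just exists so the setup is easy to test):

```python
import logging
import sys

def make_container_logger(name, stream=sys.stdout):
    # In a container, log to stdout rather than to files on the
    # disposable local file system; `docker logs` (or logstash, etc.)
    # picks the stream up from there.
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger.handlers = [handler]
    logger.propagate = False
    return logger

log = make_container_logger("myapp")
log.info("request handled")  # goes to stdout for the log collector to capture
```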

It's a decidedly different mindset that takes some getting used to.

I was already on board with the "everything is disposable" philosophy of modern high-availability systems, so conceptually it wasn't a huge leap, but if you're coming from a traditional shop with bare-metal (or even VM) deployments, it's definitely a mental switch.

Twelve Factor App Conventions

This one is more specific to the Host than to Docker in general, but it's part of an opinionated movement in modern software dev shops that includes Docker (and Rails, and Heroku), so I'll list it here. The Twelve-Factor App manifesto is a practical methodology for building modern apps delivered over the web. There's a lot of good stuff in there, like the emphasis on explicit declarations or the importance of a dev/stage environment matching production closely. But there's also questionable dogma that I find technically offensive. Specifically, factor 3 holds that configuration must be stored in the environment (as opposed to config files or delivered over some service).

I believe this is wrong. The app is software that runs in user space; the environment is a safe, hands-off container for the app. The environment and the app live at different levels of resolution: all the app stuff is inward-looking, only for and about the app; while the environment is outward-looking, configured with and exposing the right data for its guests (the apps and services running in the environment). Storing app-level (userspace) data in the environment is like trusting the bartender in a public bar with your specific drink preferences, and asking her what you like to drink (yes, this is a bad simile).

In addition, the concerns, scope, skills, budget, toolsets, and personalities of the folks involved in app work tend to be different from those of people doing the environment (ops) stuff. And while I'm ecstatic that devs and ops people appear to finally be merging into a "devops" hybrid, there's a host of practical reasons to divide up the work.

In practical terms, storing configuration in the environment also has significant drawbacks given the tools of the trade: people like me use grep dozens of times every day, and grepping through a machine's environment comprehensively (knowing that env variables may have been set as different Unix users) is error-prone and labor-intensive for no discernible benefit. Especially when your app is down and you're debugging things under pressure. It's also very easy to deploy what's supposed to be a self-contained "thing" (your twelve-factor app) and see it fail miserably, because someone forgot to set the environment variables (which highlights the self-contradictory, leaky nature of that config-in-the-environment precept: if your app depends on something external to it (the environment), it's not self-contained).

Another driver for the config-in-the-environment idea is to make sure developers don't store sensitive information like credentials, passwords, etc. in code that winds up in source control (and thus on every dev's computer, and potentially accidentally left in code you helpfully decided to open-source on GitHub). That makes a ton of sense and I'm all for it. But for practical purposes, this still means every dev who wants to do work on their local machine needs a way to get those secrets onto their computer, and there aren't a lot of really easy-to-use, auditable, secure and practical methods to share secrets. In other words, storing configuration in the environment doesn't solve a (very real) problem: it just moves it somewhere else, without providing a practical solution.

You may find this distinction specious, backwards, antiquated, or whatever. That's fine. The environment is the wrong place to store userspace/app-specific information. Don't do it.

That was a long-winded preamble to what I really wanted to discuss, namely the fact that the Host embraces this philosophy, and in quite a few instances it's made me want to punch the wall. In particular, the Host makes you set environment variables using a command-line client that's kind of like running remote ssh commands, meaning that values you set need to be escaped, and they don't always get escaped or unescaped the way you expect when you query them. So if you set an environment variable to its current value as queried by the command-line client, you'll double-escape it: "lol+wat" first gets set as "lol\+wat"; looking it up returns the escaped "lol\+wat"; setting that value again turns it into "lol\\\+wat". In other words, a set-get-set operation isn't idempotent. All this is hard to debug and painfully annoying, and it would be completely unnecessary if the model weren't so stupid about using the environment for configuration.
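The non-idempotent round trip is easy to simulate. This is a hypothetical reconstruction of the client's behavior (the actual escaping rules are the Host's, not documented here; I'm assuming backslashes get doubled and '+' gets backslash-prefixed, which reproduces the values I saw):

```python
def escape(value):
    # Hypothetical client-side escaping: double backslashes, prefix '+'.
    return value.replace("\\", "\\\\").replace("+", "\\+")

store = {}

def set_var(name, value):
    store[name] = escape(value)      # client escapes on every set

def get_var(name):
    return store[name]               # returns the stored (escaped) form

set_var("X", "lol+wat")
assert get_var("X") == r"lol\+wat"

set_var("X", get_var("X"))           # naive set-get-set round trip
assert get_var("X") == r"lol\\\+wat" # value drifted: not idempotent
```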

Dev == Prod?

One of the twelve-factor tenets is that dev/stage should mirror production closely. This is a very laudable goal, as it minimizes the risk of unexpected bugs due to environment differences (aka "but it worked on my machine"). It's especially laudable as a lot of developers (at least in Silicon Valley) have embraced OSX/macOS as their OS of choice, even though nobody deploys web apps to that operating system in production, which means there's always a non-zero risk of stuff that works on dev failing on production because of some incompatibility somewhere. This also means every dev wastes huge amounts of time getting their consumer laptop to masquerade as an industrial server, using ports and casks and bottles and build-from-source and other unholy devices, instead of just, you know, doing the tech work on the same operating system you're deploying on, because that would mean touching Linux and ewww that's gross.

Originally, the Company had wrapped its production apps into Docker containers using the Host's standard Dockerfiles and Procfiles, but devs were doing work on their bare-metal Macs, which meant finding, installing and configuring a whole bunch of things like Postgres, Redis, nginx, etc. That's annoying, overwhelming for new employees (since the documentation or Ansible playbooks you use to do that work are always behind and out of date about what actually happens on dev machines), and a pain to keep up to date. Individual dev machines drift apart from each other, "it works on my machine (but not on yours)" becomes a frequent occurrence, and massive amounts of time (and money) are wasted debugging self-inflicted problems that really don't deserve to be debugged when it's so easy to do it right with a Linux VM and Ansible playbooks, but that would mean touching Linux and ewww that's gross.

So I was asked to wrap the dev environment into Dockerfiles, and ideally we'd use the same Dockerfile as production, so that dev could truly mirror prod and we'd make all those pesky bugs go away. Good plan. Unfortunately, though, I didn't find that to be practical in the Company's situation: the devs use a lot of dev-only tools (unit test harnesses, linters, debuggers, tracers, profilers) that we really do not want to have available in production. In addition, starting the various apps and services is also done differently on dev and prod: debug options are turned on, logging levels are more verbose, etc. So we realized and accepted the fact that we just can't use the same Dockerfile on dev and on prod. Instead, I've been building a custom parent image that includes the intersection of all the services and dependencies used in the Company's various apps, and converting each app's Dockerfile to extend that new base image. This significantly reduces the differences and copy-pasta between Dockerfiles, and will give us faster deployments, as the base image's file system layers are shared and therefore more likely to be cached.

Runtime v. Build Time

Back to Docker-specific bits, this one was a doozy. When building the dev Dockerfiles, I had split the setup between system-level configuration (in the Dockerfile) and app-specific setup (e.g. pip installs, node module installation, etc), which lived in a bootstrap script executed as the Dockerfile's CMD. It worked well, but it felt inelegant (two places to look for information about the container), so I was asked to move the bootstrap stuff into the Dockerfile.

The devs' setup requirements are fairly standard: they have their Mac tools set up just right, so they want to be able to use them to edit code, while the code executes in a VM or a Docker container. This means sharing the source code folder between the Mac host and the Docker containers, using the well-supported VOLUME or -v functionality. Because node modules and pip packages are app-specific, they are listed in various bog-standard requirements.txt and package.json files in the code base (and hence in the Mac's file system). As the code base is in a shared folder mounted inside the Docker container, I figured it'd be easy to just put the pip install stuff in the Dockerfile and point it at the mounted directories.

But that failed, every time. A pip install -e /somepath/ that was meant to install a custom library in editable mode (so it's pip-installed the same way as on prod, but devs can live-edit it) failed every time, missing its setup.py file, which is RIGHT THERE IN THE MOUNTED FOLDER YOU STUPID F**KING POS. A pip install -r /path/requirements.txt also failed, even though 1) it worked fine in the bootstrap script, which is also in the same folder/codebase 2) the volumes were specified and mounted correctly (I checked from inside the container).

That's when I realized the difference between build time and runtime in Docker. The stuff in the Dockerfile is read and executed at build time, so your app has what it needs in the container at runtime. During build time, your container isn't really running--a bunch of temporary containers briefly run so various configuration steps can be executed, and they leave file system layers behind as Docker moves through the Dockerfile. The volumes you declare in your Dockerfile and/or docker-compose.yml file are mounted as you'd expect (you can ssh into your container and see the mount points); but they are only bound to the host's shared folders at runtime. This means that commands in your Dockerfile (which are used at build time) cannot view or access files in your shared Mac folder, because those only become available at runtime.

Of course you could just ADD or COPY the files you need from the Mac folder into the mounted directory, and do your pip install in the Dockerfile that way. It works, but it feels kinda dirty. Instead, what we'll do is identify which pip libraries are used by most services, and bake those into our base image. That'll shave a few seconds off the app deployment time.

Editorializing a bit, while I (finally) understand why things behaved the way they did, and it's completely consistent with the way Docker works, I feel it's a design flaw and should not be allowed by the Docker engine. It violates the least-surprise principle in a major way: it does only part of what you think it will do (create folders and mount points). I'd strongly favor some tooling in Docker itself that detects cases like these and issues a WARNING (or barfs altogether if there was a strict mode).

Leaky Abstractions and Missing Features

Docker aims to be a tidy abstraction of a self-contained black box running on top of some machine (VM or bare-metal). It does a reasonable job using its union file system, but the abstraction is leaky: the underlying machine still peeks through, and can bite you in the butt.

I was asked to Dockerize an on-prem application. It's a Java app which is launched with a fairly elaborate startup script that sets various command-line arguments passed to the JVM, like memory and paths. The startup script is generic and meant to just work on most systems, no matter how much RAM they have or where stuff is stored in the file system. In this case, the startup script sets the JVM to use some percentage of the host's RAM, leaving enough for the operating system to run. It does this sensibly, parsing /proc/meminfo and injecting the calculated RAM into a -Xmx argument.

But when Dockerized, the container simply refused to run: the Host had allocated some amount of RAM to it, and the app's launcher was requesting 16 times more, because the /proc/meminfo file was... the host EC2 instance's! Of course, you could say "duh, that's a layered file system, of course that's what it does" and you'd be right. But the point is that a Docker container is not a fully encapsulated thing; it's common enough to query your environment's available RAM, and a clean, encapsulated container system should always give an answer that's reflective of itself, not breaking through to the underlying hardware.
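The launcher's calculation can be sketched like this (a hypothetical reconstruction: the real script, its memory fraction, and its exact arithmetic are assumptions; the point is that it reads /proc/meminfo, which inside a container reports the host's RAM, not the container's allocation):

```python
def xmx_from_meminfo(meminfo_text, fraction=0.75):
    # Parse MemTotal (reported in kB) and hand a fraction of it
    # to the JVM as a -Xmx argument in megabytes.
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            total_kb = int(line.split()[1])
            return "-Xmx%dm" % int(total_kb * fraction / 1024)
    raise ValueError("MemTotal not found in meminfo")

# On a Dockerized app, open("/proc/meminfo").read() returns the EC2
# host's figures, so a 16 GB host yields a huge -Xmx even if the
# container was only allocated 1 GB.
sample = "MemTotal:       16384000 kB\nMemFree:         1234567 kB\n"
print(xmx_from_meminfo(sample))
```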

Curious "Features"

Docker's network management is... peculiar. One of its more esoteric features is the order in which ports get EXPOSEd. I was working on a Dockerfile that was extending a popular public image, and I could not make it visible to the outside world, even though my ports were explicitly EXPOSEd and mapped. My parent image was EXPOSing port 443, and I wanted to expose a higher port (4343). For independent reasons, the Host's system only exposes the first port it finds, even if several are EXPOSEd; and because there's no UNEXPOSE functionality, it seemed I'd have to forget about extending the public base image and roll my own so I could control the port.

But the Host's bottomless knowledge of Docker revealed that Docker exposes ports in lexicographic order. Not numeric. That means 3000 comes before 443. So I could still EXPOSE a high port (3000) as long as, lexicographically, it appeared before the base image's port 443, and the Host would pick that one for my app.
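The string-vs-number ordering is trivial to see for yourself:

```python
# Ports compared as strings sort lexicographically, so "3000" comes
# before "443" ('3' < '4'), even though 3000 > 443 numerically.
ports = ["443", "3000"]

print(sorted(ports))                   # ['3000', '443'] -- string order
print(sorted(int(p) for p in ports))   # [443, 3000]     -- numeric order
```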

I still have a bruise on my forehead from the violent D'OHs I gave that day.

On a slightly higher level than this inside-baseball arcana, though, this "feature" also shows how leaky the Docker abstraction is: a child image is not only highly constrained by what the parent image does (you can't close/unexpose/override ports the parent exposes), it (or its author) needs to have intimate knowledge of its parent's low-level details. Philosophically, that's somewhat contrary to the Docker ideal of every piece of software being self-contained. Coming at it from the software world, if I saw a piece of object-oriented code with a class hierarchy where a derived class had to know, be mindful of, or override a lot of the parent class's attributes, that'd be a code smell I'd want to get rid of pretty quickly.

Conclusion: Close, But Not Quite There

There is no question Docker is a very impressive and useful piece of software. Coupled with great, state-of-the-art tooling (such as the container tools available from AWS and other places), and some detailed understanding of Docker internals, it's a compelling method for deploying and scaling software quickly and securely.

But in a resource-constrained environment (a small team, or a team with no dedicated ops resource with significant Docker experience), I doubt I'd deploy Docker on a large scale until some of its issues are resolved. Its innate affinity for ephemeral resources like web app instances also makes it awkward to use with long-running services like databases (also known as persistence layers, so you know they tend to stick around). So you'll likely end up with a mixed infrastructure (Docker for certain things, traditional servers for others; Dockerfiles here, Ansible there; git push deploys here, yum updates there), or experience the ordeal (sorry, joy) of setting up a database in Docker.

Adding to the above, the Docker ecosystem also has a history of shipping code and tools with significant bugs, stability problems, or backward-incompatible changes. Docker for Mac shipped out of beta with show-stopping, CPU-melting bugs. The super common use case of running apps in Docker on dev using code in a shared folder on the host computer was only resolved properly a few months ago; prior to that, inotify events when you modified a file in a shared, mounted folder on the host would not propagate into the container, and so apps that relied on detecting file changes for hot reloads (e.g. webpack in dev mode, or Flask) failed to detect the change and kept serving stale code. Before Docker for Mac came out, the "solution" was to rsync your local folder into its alter ego in the container so the container would "see" the inotify events and trigger hot reloads; an ingenious, effective, but brittle and philosophically bankrupt solution that gave me the fantods.

Docker doesn't make the ops problem go away; it just moves it somewhere else. Someone (you) still has to deal with it. The promise is ambitious, and I feel it'll be closer to delivering on that promise in a year or two. We'll just have to deal with questionable tooling, impenetrable documentation, and doubtful stability for a while longer.

2016-04-18

Django, static assets, versioning, and WhiteNoise

I had an interesting time troubleshooting an issue with Django, WhiteNoise and static asset versioning. This may be obvious to experienced Django users, but not to me; I've maintained Flask and Rails apps before, but Django is a new beast. I'll document it here in case it helps somebody.

My goal was to set up asset versioning in a Django app to serve static files as filename.somehash.js instead of filename.js (same with other file types like css, png, etc). This is standard practice; most modern frameworks have that capability, and different ways to do it.
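The core idea behind those hashed names can be sketched in a few lines. This is an illustrative reconstruction, not Django's or WhiteNoise's actual implementation (they hash file contents similarly, but the digest length and naming details here are assumptions):

```python
import hashlib
import pathlib

def versioned_name(path, content):
    # Derive filename.somehash.ext from the file's contents, so the
    # name changes whenever the content does (enabling far-future
    # cache headers on the versioned files).
    digest = hashlib.md5(content).hexdigest()[:12]
    p = pathlib.PurePosixPath(path)
    return str(p.with_name("%s.%s%s" % (p.stem, digest, p.suffix)))

print(versioned_name("js/app.js", b"console.log('hi');"))
```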

I had started using WhiteNoise because the internets suggested it was much, much easier than the alternatives. I was hoping to do asset versioning and deploy a Cloudfront CDN at the same time, and WhiteNoise is set up to do just that.

Once everything was set up according to the documentation, I ran python manage.py collectstatic and saw the versioned file names getting generated. Checking the files themselves confirmed that. But when I loaded the app in a browser, only the unversioned file names were being requested.

After much head-scratching, I found this was because the app templates reference the static files with the standard {% load static %} method. The problem went away when I changed that to {% load static from staticfiles %} as suggested in this closed issue on the subject. Note that I didn't try the other option mentioned in that issue, {% load staticfiles %}, but that should also work. 

Once the app restarted, beautiful unique file names were being requested and served. But I was occasionally getting 500 errors. I traced those back to instances where the app and WhiteNoise were being asked to serve files that no longer exist. Those references to deleted js, css, etc. files didn't actually harm the app's functionality, but when WhiteNoise is asked to serve them, it throws an exception and causes the app to 500.

That's not ideal behavior--my take is that 50x errors should never happen in a production app, and should be handled gracefully when they do, so a library that causes 500s by actively raising exceptions rather than logging, catching and handling them gracefully isn't ideal. But them's the breaks, and I might yet submit a PR to the owner if I find the time.

In this particular app's case, this behavior was especially non-ideal because some of these files were referenced in commented-out JavaScript, and not actually requested; it looks like WhiteNoise and/or Django greedily consider anything that looks like a static file path to be actually requested, even if it's in code that doesn't execute.

The solution is simple--find all those dangling references and exterminate them! Use those 500s to your advantage by exercising the app and tailing your error logs. It's easy to argue that's something you should do no matter what, so it wasn't hard to convince the code owners it was the right thing to do.


2013-06-28

Engine Tard

Today I got an email from Engine Yard asking me to be an Engine Yard ambassador / promoter / evangelist. Given our painful experiences with EY, I said no thanks. To their credit, they asked why. So I sent them this list, and will document it here for LOLs and posterity.

What we didn't like about EY:
  • your pricing is unjustifiably high considering the shortcomings below.
  • your approach to database replication is facepalm-inducing. It's explicitly designed *not* to be used to fail over. I lost count of how many times I said WTF.
  • getting SSL to work properly with all the right headers is unnecessarily painful (stunnel).
  • there's no API to scale up and down by script--everything has to be done manually. WTF again, big time.
  • you've fixed this (I hope), but for a long time, removing instances from an environment did not remove those from haproxy, which meant that unless you ran the recipes manually, the app master would still be sending traffic to instances that were either turned off or assigned to another customer of yours, resulting in 404s and other hilarious situations. I sang a Viking song of battle and sorrow when I realized that's what was happening and you guys confirmed it (after I was done laughing).
  • when adding or removing instances from your web UI, the number of failures is greater than the number of successful changes performed. More than half the time we would have to re-run the add or remove process for the instances to be added successfully. This is for a completely vanilla Rails app with a tiny number of servers, i.e. the default / base case you guys are catering to.
  • when adding more than a couple of instances, the app master's NIC gets flooded by requests used to provision the new instances, which brings the entire stack down. We LOLed heartily when we figured out that was going on. The only safe way to add instances is one or two at a time. Given how long it takes to do that (see above re. failures), it's a giant pain in the rear. 
  • the default stack of app master + slaves is stupid. An active app master serving traffic and SSL termination shouldn't also be a load balancer. That's just dumb.
To be perfectly blunt, I really like the idea of your service, and I'm sure it's great for people to get started with hosting, but every time we tried to do something with our stack, it felt like EngineYard was designed and operated by amateurs who don't have any experience running, let alone hosting, or offering hosting for, a real web business. I wouldn't use EY even if there was 0 markup over AWS.
PS: it got so bad we started calling you guys EngineTard and I drew the attached. Note my Photoshop skills are not particularly advanced.


Update: I have to say EY has class and a sense of humor. After receiving this diatribe, they sent me a $25 Amazon gift card.

Update: One thing I forgot to mention is the frequent billing errors. We cancelled our account on April 2; in May I got a bill for usage we didn't incur, and on July 1 we got a bill for snapshot storage and unused IP addresses for an account that's been closed for 3 months. Sigh.

2012-09-14

The 5 Stages of Design

I'm a terrible designer. Hell, I'm not a designer at all. I can usually tell when a UX is good or bad, and I know what I like, but when it comes to producing designs, I'm useless. My only saving grace is that I know that and let others do the pretty.

Sometimes, though, it's easier to explain something with a picture than a bunch of words. So I bust out the Paint.NET and try to line things up well enough the idea is conveyed, but sloppily enough it's obviously not a final mockup.

In my early days at Crunched it took a few interesting conversations to get that process fine-tuned to work with our designer. I was unaware of the torment I was putting her through. Now we've agreed that...

  • this is not my job
  • it's still ok and moderately useful for me to do graphics every so often to convey an idea
  • it's ok to laugh at them
Inevitably, though, when I present my "work", there's some trepidation about what horrors will be shown, and (legitimate) questions about whether what is seen can ever be unseen, etcetera. So today we identified the five stages of design:
  1. Denial. (e.g. "WTF IS THIS or YOU CAN'T BE SERIOUS")
  2. Grief (e.g. "MY EYES!!!!!!!!")
  3. Mockery (e.g. "LOL")
  4. Acceptance (e.g. "Ok, I see what you're trying to do here")
  5. Improvement (e.g. "I'll make it pretty")
I wonder how common this is for designers.

2012-05-24

MVP, done, and working

Recently at SalesCrunch we had an interesting discussion of what MVP (minimum viable product), done and working mean for our web application. Like all startups, we've run into growing pains with our app's feature set, reliability and performance, some self-inflicted and some external (Apple disabling Java by fiat and killing lots of functionality on the web overnight). I wanted to reduce the non-productive frustration and tension we often fall into ("the app is broken!") by pointing out that unless you define and agree on specifics, every participant will have different expectations. Different expectations are bad, so it's important to always make it clear what it means for a feature to be done and working--no assumptions.

So my main goal was to make sure every product build process includes an explicit conversation about what it means for that product to be done (i.e. it has all the features/use cases/user-error protection we want) and working (the SLA it needs to meet). That discussion must happen between all the people involved in the build--product, design, business, engineering, and maybe even the end-user (or at least the end-user should have a way to provide feedback so the company can adjust their criteria for "done" and "working" in case we got it wrong).

We got to a good place with that conversation, but it was a little abstract. Then I tried to make coffee, and was given a great real-world example of what the discussion and thought process could be.




I went to make espresso. Turned on the machine, put the cup under the espresso spout, and started the brew.

My cup promptly filled with high-pressure hot water, because I had forgotten to put the coffee basket in.

You may say "well that's dumb, the brew shouldn't be able to start if there's no brewing basket, since you're never gonna get coffee". That's what I thought, briefly.

But that requires an extra feature (a sensor or switch that blocks the brew switch if the basket is missing), meaning more moving parts, build and integration time, testing, higher costs, more expensive retail price, etc.

It's likely somebody at Mr Coffee had a conversation about this and decided a distracted/stupid/sleepy person forgetting to put the brew basket in before turning on the machine wasn't likely and/or dangerous enough to warrant the extra time, complexity and cost for a machine built for the sub-$100 price point.

Until this incident, I had no issues with the machine. Today, I used it, and didn't get coffee out of it. So one could argue the machine is not done (missing a sensor), or it's not working (it's allowed to function without producing coffee).

But does the machine work? Is it done? Until this incident I had no issues with it. Today the circumstances changed: I did something dumb I'd never done before, and had a failed coffee-making experience. So maybe my criteria for done and working have changed as a result: today I may decide to buy a different machine, looking for an error-prevention feature. But the machine I have hasn't changed.

The main message here is:

  1. don't assume done and working mean the same thing for everyone
  2. it's OK for done and working to change meaning with new circumstances
  3. if those meanings change, the product needs to change in response
  4. frustration, hand-wringing and pain can be avoided by being aware of 1-3, and by not judging a product that was built on old done/working criteria against the new criteria



2012-05-07

Overloaded 404 part deux

I just noticed Backpack by 37Signals also 404s when you access a page you need to be logged in to view.

STOP THE MADNESS!

And of course, if I tell them it's bad UX, they're likely to tell me to go f*** myself.

2010-01-31

Make Sure Your Customers Can Pay You

If you're running an e-commerce site, chances are you want to extract money out of your visitors. Yet you wouldn't know it by looking at some sites like Best Western's reservation portal, which makes it incredibly difficult if not impossible to complete a paying transaction, as I recently discovered to my chagrin.

Invisible Form

Here's what happened: after selecting a date range, I pressed the giant "Book it" button next to the room I was interested in (+1 for using big buttons for the primary action):



This led me to the following page, completely devoid of a form into which I could enter my credit card information:




After another failed attempt, I fired up Internet Explorer to see if the form was somehow incompatible with Firefox, and I got my answer: the credit card form is served in a popup window, which IE requested permission to open and Firefox quashed silently (I have the "be quiet about popups" setting turned on). Here's what it would have looked like with slightly more verbose Firefox settings:



Note that no one using a reasonably recent browser in its default settings gets to see the popup window unless they explicitly authorize it. In other words, one hundred percent of Best Western's juiciest, readiest-to-buy visitors are not shown the reservation form in the default use case.
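For the record, a page can at least detect that its popup got quashed: window.open returns null when a blocker intervenes, so the site could fall back to same-window navigation instead of stranding the visitor. A minimal sketch (the URL is made up for illustration; openFn and navigateFn are injected so the logic stands on its own outside a browser--in a real page you'd pass window.open and a location change):

```javascript
// Sketch: open a checkout popup, but fall back to same-window
// navigation when a popup blocker quashes it. openFn and navigateFn
// are injected so the decision logic is testable outside a browser.
function openCheckout(url, openFn, navigateFn) {
  // Blocked popups make window.open return null (or undefined).
  var popup = openFn(url, "checkout", "width=600,height=500");
  if (!popup) {
    // Don't strand the user: load the form in the current window instead.
    navigateFn(url);
    return false; // popup was blocked
  }
  popup.focus();
  return true; // popup opened normally
}
```

In a real page the call would look something like `openCheckout("/pay", window.open.bind(window), function (u) { window.location.href = u; })`, where "/pay" is a hypothetical reservation-form URL.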

I don't have to spell out how completely crazy this is.

To make matters worse, when you do complete the form and click the submit button, nothing happens in Explorer due to a javascript:void(0) issue in the form submit handler¹; in Firefox, the window simply closes with no confirmation that the transaction went through.

I had to call the hotel to find out how many of my three form submission attempts had gone through (one had, which was soon confirmed by an email from the site).

Money For Nothing And Tests For Free

What this tells me is that the designer(s) and developer(s) in charge of the Best Western portal never did so much as basic hallway usability testing before they shipped their product. Maybe they don't know about it; maybe the product team is in Connecticut and the developers are in a different country. Either way, a 30-minute investment would have caught what is potentially costing the company tens of thousands of dollars a day.

Another notable omission is that the credit card form itself (for those of you lucky enough to see it) doesn't exactly scream "GIVE ME ALL YOUR MONEY NOW!" I'd bet its conversion hasn't been tested or optimized either.




Testing form conversion is hardly a novel or esoteric branch of rocket science; it's a simple, mature field with well-known best practices and automated testing systems to help you get the most out of your site. But a company that can't be bothered to test the basic usability of their purchase flow can't be expected to look at bounce rates or A/B test their funnel.
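The mechanical part of such a test is trivial, too. Here's a minimal sketch of deterministic variant assignment--hash a visitor id (say, from a cookie) so a returning visitor always lands in the same form variant, with no server-side state. All names and variants are invented for illustration; this isn't any particular testing product:

```javascript
// Sketch: deterministic A/B bucketing for a checkout-form test.
// Hashing the visitor id means a returning visitor always sees the
// same variant, which keeps the measurement clean.
function hashString(s) {
  var h = 0;
  for (var i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0; // unsigned 32-bit rolling hash
  }
  return h;
}

function assignVariant(visitorId, variants) {
  // Map the hash onto the variant list; equal weights for simplicity.
  return variants[hashString(visitorId) % variants.length];
}
```

Usage: `assignVariant("visitor-123", ["popup-form", "inline-form"])` picks one of the two variants, and picks the same one every time for that visitor.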

Given the bang for the buck you get from real-world testing with a tiny number of users, I submit Best Western could have earned an ROI on the order of 100,000% if they had just watched a half dozen travelers go through the booking process using a fresh, default install of IE or Firefox.

Help Me Buy What You're Selling

I will never know how much money Best Western is losing from this; presumably a lot of travelers wind up calling the hotel (whose phone number is displayed at the top) and maybe even paying the (higher) non-internet rate you get when reserving by phone. So for all I know Best Western is actually coming out ahead (I have my doubts).

What's really sad about this grossly inefficient system is that Best Western didn't even have to earn my business: all they had to do was keep it. Given the way RevPAR is trending, that's a gimme they don't get very often. I always stay at that hotel when traveling down south, because it's convenient, well located, reasonably quiet and clean, and its wireless internet occasionally functions. But I don't know how patient I'm going to be next time around.


---

1. Don't use javascript:void(0) to prevent the default navigation in an a tag. This will work:

<a href="#" onclick="do_something();return false;">text</a>
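An alternative, if you'd rather keep JavaScript out of your markup: attach the handler in script and cancel the navigation with preventDefault. A sketch (the "pay-link" id and do_something() are placeholders for whatever your page actually does):

```javascript
// Sketch of the same fix without inline JavaScript: cancel the
// default navigation with preventDefault instead of return false.
// do_something() stands in for whatever the link triggers.
function onLinkClick(e) {
  e.preventDefault(); // stop the browser from following href="#"
  do_something();
}

// In a browser, wire it up once the document exists:
if (typeof document !== "undefined") {
  document.getElementById("pay-link").addEventListener("click", onLinkClick);
}
```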

2010-01-23

The Process

An issue most software shops have to adjudicate at a certain size is how to build applications. Not how as in which languages, servers, or database systems (hopefully that's settled early on, revised periodically, and mostly stable)--how as in: how, and how early, are features defined and built; how often is code tagged and released; iterative vs. waterfall; agile vs. see-what-sticks free-for-all; etc.

Small startups consisting of a product co-founder and a technical co-founder usually don't need to worry about this. But as soon as you're big enough to have discrete design, product, customer and QA people involved, setting up the right processes becomes a necessity.

What I'm calling "process" here is the set of practices used to build a software project, not the dreaded product-design-executives-board-executives-design-engineering gauntlet your idea may have to go through before seeing the light of day. Rather, "process" here means the tools, approaches, meetings and other day-to-day activities you decide to use to build your project: daily stand-up meetings, waterfall specs, retrospectives, whiteboards, white/gray/black-box QA, pair programming, whatever. It's a stodgy name for a concept that need not be stodgy.

Why Process?

One source of friction in software development is that software is malleable, so tweaking features all the time is easy (and tempting). But one person's flexibility is another's sloppiness; one person's easy last-minute tweak is another's bang-your-head-on-the-table nightmare¹.

When building a bridge or skyscraper, with hard, heavy parts that need to be measured, ordered and fabricated months, if not years, ahead of time, you can't get away with sloppy/flexible/just-in-time/no planning and last-minute changes. The notion goes that building software shouldn't really be any different, because sloppy/flexible/just-in-time/no planning makes the product buggy/brittle/ugly/inconsistent. It's a common argument I've heard even from people who would rather mine coal in Mongolia than go (back) to the waterfall model.

But I think that's a bit of a false contrast. While I do expect structural engineers to know what they're doing and plan how they're going to do it if my life depends on it, I also expect them to be agile enough to respond to unexpected conditions, or to incorporate new ideas or technologies that come along when it makes sense (which does happen when you're working on a 15-year project). And I don't know any good software engineer who actually enjoys a 1-year planning cycle before they get to write any code or create table schemas.

The point is that you're usually building something for other people, and so what matters is your ability to deliver a quality product for those people. And change happens. Great new ideas come in at the last minute. Your CEO finds a blind alley in your UI flow nobody on your team had thought about. So complaining about, denying the existence of, or impeding those changes doesn't really gain anybody anything. But to the extent your own comfort or happiness or need for control are determining factors in whether you do deliver a quality product, you need some kind of process to enable you to build the best possible product in the best possible conditions.

Ask What Your Process Can Do For You

Process is meant to get things done well (your customers like your product), quickly (a leg up on the competition), and comfortably (high turnover is the enemy). It's not a mystical spirit or a magical toolbox to be worshiped or enshrined. And it's not the 100-year-old secret formula for Coca-Cola or your grandfather's super gooey, always delicious sticky bun recipe, either.

Think of it as a suitcase containing the right outfits for a year in California. If you're spending April in San Francisco, you might want a lot of light layers, with a waterproof shell you can take off easily when it stops raining, and maybe a windbreaker. If you drive up to Tahoe in January, you might want snow mittens and a thick parka. And don't forget a swimsuit and towel for those summer days in Santa Monica. It's not the end of the world when the clothes get stained or wear out; you can take the suitcase with you to your next destination; and when you gain or lose weight, some of your clothes won't fit anymore.

Perhaps most importantly, if you find yourself wondering why people laugh at you and your dorky parka on the beach, or you're uncomfortable in your t-shirt and vest in the snow, the problem isn't with the clothes you're wearing--think about the ones you're not wearing.

Bad Process, No Cookie

It's easier to describe process when it's broken. How can you tell your process doesn't work? Defects creep in, stress rises, milestones are missed, good people quit. Sure, all of those symptoms could be due to individual issues like personal-life distractions, lack of skill or motivation, unexpected illnesses. But that's precisely what the right process is there to help you solve. By and large, your organization shouldn't fall apart when one person has a car accident and is laid up for a month, or limp along when someone isn't up to snuff on a particular set of tasks. So when you're having issues delivering products, see if the problem really is coming from night owl Jill Programmer or eccentric Mike Designer, or if your process is what needs a kick in the pants.

This is where being pragmatic, not dogmatic, in your process decisions can serve your needs better. Your processes are not your customers, your stockholders, your investors, your spouses--the process is beholden to you, not you to it. So if a process doesn't feel right, do spend some time tweaking it, but don't be afraid to shove it aside and try something else.

Now, changing direction too often with your process can send destructive signals: you don't know what you're doing; your team is dysfunctional and can't work effectively (and the team leaders don't know how to address that); the product is too undefined and can't be built. But remember the clothes aren't wearing you. Your allegiance is to your product and customers first, your people second, and the process dead last.

A Heuristic

Once you've used a process with great success, it's possible, even tempting, to settle on it and use it for everything, and that might be good enough. But not every problem is a nail to be addressed by your big hammer, so here's a possible heuristic to decide what approach might work best. Note this isn't a decision tree or a set of solutions; it's a set of simple questions the answers to which can be much more illuminating than you might think at first glance.

Who is going to use the product you're building?
If you're building a fun consumer product, with lots of new features and changes that make your consumers happy, you might want to try quick iterations with soft launches or restricted-availability features that get you feedback quickly so you can tweak and release something new the next week. SCRUM or other agile processes can be very helpful so you don't lose momentum; a drawn-out design and product feature planning process might not be best, because you can't tell on day 1 what people will tell you about your product on day 8, and the 10 features you design up-front might turn out to be unwanted.

On the other hand, if you're building a Web service API consumed by machines, rather than people, you might be better served to spend more time planning up front before you start coding: you might need solid capacity planning, performance testing, redundancy and failover, and because there can be hardware procurement and setup issues, it might be good to know more ahead of time, and then fan out and work on various pieces in parallel.

How much change can be expected in the feature set or interface?
Do you need a rock-solid API and protocols that will last for years and remain backward-compatible for the next 5 version numbers? Which API calls should you support now to remain relevant in the future (maybe your service is brand new, but a treasure trove of data will emerge from a few months' worth of usage logs)? You might want to do some heavy-duty feature analysis to make sure you're not including useless API calls nobody wants, or missing important calls people will need. This doesn't preclude short iterations, frequent milestones, and even fast feature changes internal to your team: the shape of your end product doesn't have to match your product development practices.
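To make the backward-compatibility part concrete, one common approach is to dispatch each request through a version-keyed handler table, so version-1 clients keep getting version-1 answers after version 2 ships. A toy sketch--the handlers, fields and request shape are invented for illustration:

```javascript
// Sketch: version-keyed dispatch so v1 clients keep working after
// v2 ships. Handlers and the request shape are illustrative only.
var handlers = {
  1: function (req) { return { user: req.id }; },             // original call
  2: function (req) { return { user: req.id, plan: "free" }; } // adds a field
};

function dispatch(version, req) {
  var handler = handlers[version];
  if (!handler) {
    // Unknown version: refuse loudly rather than guess.
    return { error: "unsupported API version " + version };
  }
  return handler(req);
}
```

The point isn't the three lines of lookup; it's that old behavior is frozen in its own handler instead of being patched in place, which is what keeps a 5-version promise affordable.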

How big is the team working on the project?
Can it be done in a month by a couple of developers and a part-time designer? Is it phase one of a company-defining, all-hands-on-deck product spread over a two-year plan? Constant verbal contact between the designer and developer working next to each other might be a great substitute for a thick PDF spec that fell out of sync two days before the PM last emailed it around; or it could be a mess leaving giant holes in the product and SQL injection vulnerabilities all over the place. It greatly depends on...

Who is working on the project?
Software doesn't build itself. The team working on your project is the strongest predictor of success or failure, much more so than the technology they use. So be sure to gauge your engineers, designers, product managers and other actors, and keep your finger on the team's pulse: do they prefer a strongly-defined upfront spec? Do they resent being told what to do? Does Fred Programmer perform best with frequent status catchups, while Sung Designer thrives in a don't-call-me-I'll-call-you rhythm? Do Sung and Fred want to kill each other on a good day, and eat each other's babies on a bad day? More importantly, do you or the team lead actually know all this?

Note this is not a deep insight. In fact it's not an insight at all: different people have different needs, and the same process doesn't magically fit every team. Get a sense of what makes people thrive, or tick, and pick the practices that favor the former and minimize the latter.

Whatever Works

Ultimately your customer will determine whether your project is successful, mediocre, or a failure. Your process can only facilitate or hinder your progress toward one of those endpoints. It might be a beautiful work of by-the-book SCRUM, or a hotchpotch of Agile / XP on the tech side with an independent design and product team checking in occasionally, or a Soviet-era waterfall. The best process is the one that works for your project and your people, not the one that follows the bullet points in your expensive management consultancy's white paper.


----

1. True story.

2006-11-01

Web 3.0: the Return of Desktop Apps?

For a few years now, technology companies and developers have migrated to the Web en masse. The Web browser has become the preeminent development platform and given rise to the most interesting and abundant product and user-interface innovations in recent years--the whole movement even has a marketing name, "Web 2.0". Even desktop stalwarts like word processing and spreadsheets have sprouted Web-based equivalents like Google's Docs and Spreadsheets. Lots of folks are screaming that the desktop is dead and that the Web is the One True Way to write and deliver software.

Being an old desktop app developer myself, and having transitioned to writing Web apps, I'd argue for a middle ground: I predict that the "Web 3.0" wars will actually be waged on the desktop, with internet-enabled desktop applications harnessing the best of the Web's dynamic data updates and the best of the desktop's significant horsepower, while bypassing each platform's inherent problems. (Note that by "desktop application" I mean "non-browser apps running on a computing device," so apps on the iPhone and other smartphones are included.)

Those problems are well known. Desktop programs are difficult to maintain and update on your users' computers (especially on multiple operating systems), necessitating complex physical delivery (CD-ROMs or large downloads) and update processes; but they're richer, faster, more responsive, and not limited by the subset of UI controls available to Web browsers.

Web apps solve these problems: everyone always uses the latest and greatest version of your program; there's no data loss due to computer malfunction, because your users' data lives on your servers; and they're multi-platform out of the box. But they're slower, hard to maintain because of browser incompatibilities, and limited in crucial ways for security reasons (e.g. by and large you can't manipulate files that live on the desktop from the browser).

Web development also requires significant drudge work, because Rich Internet Applications require building complex supporting frameworks that simply don't exist for the browser (although that's changed a lot with YUI and other toolkits); desktop operating systems typically have all of that stuff built in, ready to be used without much effort. Even the most basic functions in Adobe Photoshop or Microsoft Movie Maker take a tremendous amount of work to be replicated in a browser, and while programmers might ooh and aah over the programming kung-fu creating a Web-based, drag-and-drop, rich-text editor requires, it's still a stupid text editor, and the Win32 API lets you have a faster, better one with a dozen lines of Delphi code or less. Google's customized home page had people all excited because it lets you drag and drop components--never mind that drag-and-drop is over 2 decades old and unremarkable if not completely insignificant outside of the browser.

It's easy to notice the huge number of really interesting applications developed for the Web and overlook the equally impressive apps that have come out of the desktop arena in recent years. I've counted 13 desktop search programs from Google, Yahoo!, MSN, X1, Copernic, Blinkx, Ask and others; iTunes is a snazzy, ubiquitous, internet-enabled desktop app; there are over a dozen P2P file-sharing programs; Stardock, Apple and hundreds of independent developers have built tons of desktop widgets such as stock tickers and email notifiers; media players abound (you can easily find over 20 of them for Windows); Microsoft Office and Adobe Photoshop are still selling like hotcakes; and possibly the most-talked-about non-Website program in recent years has been Google Earth. A lot of these programs could run in your browser, but the developers made them desktop apps, often for the reasons I outlined above.

The natural evolution, then, is not in turning your browser into a mini OS with a graphical shell (though that's challenging and fun). We're already hitting serious roadblocks in that arena. I'd bet real money the next step in the cycle is to turn your stodgy desktop apps into connected powerhouses using Web services and other APIs for up-to-the-minute data (and user data storage, why not?), and harness desktop OSes' superior UI libraries and horsepower to display that data in creative, responsive, fast and exciting new ways the browser can't replicate with acceptable browser (or programmer) performance; just try building 3D graphs, file trees or animation in a browser without using Flash or Silverlight plugins. You could have a desktop Photoshop-style program with desktop-powered editing functions but seamless Web storage, integrated with Flickr tagging and Google image search; really, really fast and powerful aggregation, search and processing of news and other content grabbed from the Web, but sliced and diced in real time with the massive computing power your desktop offers. The sky is the limit. Who's with me?