2016-12-01

AMA Part 2: Software Development Resources

In the previous post, I discussed a few ways in which I have found coding bootcamps to be inadequate. In this post, I will present a list of resources I have found very useful as a software engineer. Many are free, and most others can be found in used bookstores (or Amazon) for moderate prices. I'd love to hear other people's favorite resources, so please add yours in the comments or on Twitter.

General Computer Science and Programming

Disclaimer: I didn't study CS formally, so this list is short and almost certainly out of date. Please send me your suggestions!

Language Specific

Official Tutorials / Documentation


Note: I strongly do not recommend a very popular Ruby introduction, Why's (Poignant) Guide to Ruby. A lot of people seem to love it. I found it way too cute, unclear, trying too hard, and just bad.

Books

Note: even if you don't program in C or C++, Kernighan & Richie is a great introduction to computer programming at a low level. The Lippman and Liberty, Halpern books go very well together, and are project-oriented walkthroughs of essential features. All three of these books are very short, but they pack a punch.
Note: get both Nutshell booksand read them in parallel. Mine is a free work-in-progress, aimed at people who have never programmed but want/need to learn Java.
Note: anything by Flanagan, Fowler, Beck, Crockford or Martelli is worth reading.

After You've Coded For A While

Note: I listed Stroustrup here because it's a pretty dense, dry read (do not read it as your first programming book) that requires a good idea of how things work under the hood. It's also not a practical introduction to C++ programming; it's a guide to the C++ language, and from that perspective it's full of vital insights about choices made when C++ was designed, which in turn makes you think about what computer languages can do, and the various ways they do it.

Specific Topics


AMA Part 1: Coding Bootcamps

A recent #DevDiscuss thread on Twitter focused on developers' various education paths into the profession. The discussion was lively and a lot of good questions, answers and experiences were shared.

After the discussion ended, a few people contacted me to ask more questions about the industry, and two themes emerged: 1) the various, popular coding bootcamps that have flourished in the past few years and 2) what resources (books, online tutorials, etc) I would recommend for a new software developer to become well-rounded (and employable).

I'm going to address each question here in case it helps more than the folks who DMed me on Twitter. This post will be a reflection on bootcamps, and the next will collect programming resources I've found useful.

On Coding Bootcamps

First, a few disclaimers:

  • I have never been a student or instructor at any coding bootcamp
  • I hae nothing against short, intensive programs to learn a skill--I've taken a total of two computer-related classes, and both were short, intensive, and bootcamp-ish.
  • I realize different people have different learning styles; some do best alone, reading tutorials/books and writing tons of test code; others thrive with videos or podcasts; others still benefit from the focus and/or collegial learning you get in a classroom setting; etc.
  • I was a teacher for a few years, in college and adult ed, in standard quarter/semester-long as well as intensive summer programs; I've also spent a lot of time in the classroom as a student
  • I think it's fantastic that so many people (and especially underrepresented groups) are learning to become software engineers
  • I don't find a formal computer-science education is a particularly good predictor of talent or success in the software industry. Some of the best engineers I've worked with were humanities majors or high-school grads; some of the worst had MAs in CompSci from Stanford; there have also been great formally-trained engineers and awful self-taught engineers
  • My experience with bootcamps comes from interviewing about thirty applicants, offering jobs to two, and being friends with a couple
  • Some bootcamps may be fantastic. I don't know all of them, far from it
With that out of the way, here are some observations I've made about coding bootcamps.

I found two areas where bootcamps seem to be falling short: tool/technology independence, and low-level technical basics (how stuff actually works).

Technology Independence

Presumably because software engineering is a vast subject and you need to carefully limit the scope of an introductory course, I found bootcamps teach their students exactly one way to do things. with carefully selected tools, but not:
  • other ways to do the same thing (with other tools, or with no big frameworks at all); 
  • why those tools were chosen over others; and
  • what to do when you have to deal with a novel situation that doesn't exactly fit the standard paradigm.
The education seems limited to a very expensive Rails or Angular (or whatever framework) tutorial, carefully keeping students down the garden path of a basic application. There are a lot of tutorials available for free online; the majority 

The graduates I talked to had never been exposed to any other way of doing things than the Rails/Angular/whatever way, even though 1) there are many, equally valid ways to approach application development and 2) the vast majority of industry jobs involve mixed, heterogeneous assemblies of tools, practices, and code from different eras/styles/people, and finding a chunk of code that's exactly like the tutorial so you can comfortably understand and modify it is the exception, not the rule.

Students were able to tell me how they would use ActiveRecord to interact with a database and display a list of things in a Rails view, but were stumped when I added common variations to the data stack (e.g. combining data from a SQL database with a document store like ElasticSearch). And when I gave them pieces of existing, real-life code to pick apart and modify to implement a new or different feature, most of them remained stuck and unable to figure out a way to make any progress.

I'm not blaming them for not knowing how to use something they weren't taught (all devs have to pick up new technology all the time); what I'm deploring is that the bootcamps didn't give them the mental tools and technical knowledge to reason their way out of a predicament

Learning software engineering is a skill that will last you a lifetime. Knowing how to crank out an app with today's popular tools is a lot less valuable. Crucially, engineering skills like experimentation, figuring out how a piece of code works, exposure to multiple ways to do something so you're never stuck in one pattern you don't completely understand, those are arguably the hardest skills to learn on your own, and where a classroom setting, peers and a teacher to answer your questions, would be most beneficial. Learning how to use a framework, library or tool is the kind of stuff anyone can do with a little time and a browser, and a classroom setting isn't all that necessary. 

How Stuff Works

Another area where the bootcamp graduates I spoke to were entirely unprepared is the underlying low-level technology that makes a networked app work (web app or internet-enabled mobile app). I'm not talking about arcana of TCP packet management or running a DNS server--the very basics of how software executes on a machine and how network/internet requests are made: how your browser finds example.com, contacts it, requests stuff, receives said stuff, and displays it. The kind of thing you absolutely have to understand when you're troubleshooting a problem in your live app, or when you're setting up a CDN, or when you're doing Ajax calls to a third-party domain, or dealing with HTTPS, or redirecting people from one page of your app to another.

Anyone can write an app that handles ideal circumstances; what makes an engineer valuable is their ability to fix it when it misbehaves. None of the bootcamp graduates was able to reason through the network path or anatomy of a basic web request. Very few knew how headers and cookies work. That stuff isn't complicated, you just need to see it once to understand it; and it's very important in a world of open, unsecured wi-fi access points and personalized apps and services, so you know why putting credentials in a cookie on a non-HTTPS site is a bad idea. 

You don't need to be an expert; but not knowing the basics will absolutely hold a person back. Yes, those things can be learned on the job, but getting that job will be tricky if your education hasn't given you any information at all about the building blocks of your day-to-day work.

Silver Lining

Bootcamps are not all bad. I've heard and seen a lot of great feedback from people who genuinely got a lot out of them. Many bootcamps have industry partnerships or placement programs that help their graduates get hands-on experience in real software shops. The advantages of collegial learning are undeniable. Some people thrive in the pressure of intense, brief immersion into a topic. And you've got to start somewhere. 

The other good news is that some of what I discussed above can be remedied easily; the information can be absorbed and understood in a couple of days of guided study. 

Conclusion

Beyond the specifics I outlined above, what bothers me the most about the bootcamps I've been exposed to is that they both overpromise and underdeliver. Some (many? all?) claim to prepare future devs for the job market, but the ones I've been exposed to fall far short. And given how they seem to aggressively recruit from underrepresented populations (I've met a lot of non-male, non-white students from those bootcamps), it feels like the students are being sold a bill of goods and the promise of a fun, fulfilling and lucrative career, and are likely to be surprised and bitterly disappointed once they start interviewing for software engineering jobs.

I'd be happy to recommend a bootcamp education if I knew of one that gave its students more than a tutorial, and included a survey of the basic technology underlying the kind of software its graduates are taught to write. If anyone reading this has a recommendation, I'd love to hear about it. Find me on Twitter @roger_b_m or comment here.


2016-11-23

Docker lessons learned 1 year in

A little under a year ago, I started doing devops work for a startup (the Company) with very specialized needs. As it operates in a highly regulated sector, the company's access to their infrastructure is extremely restricted, to prevent accidental or malicious disclosure of protected information. Their in-house web apps and off-the-shelf on-prem software are deployed on a compliant PaaS (I'll call them "the Host", even though they offer vastly more than just hosting), which is very similar to Heroku and uses Docker exclusively for all applications deployed on their private EC2 cloud. I knew about Docker but had never used it, and it's been an interesting few months, so I thought I'd write up some observations in case they help someone.

Topsy Turvy

If you're coming to Docker from a traditional ops shop, it's important to keep in mind that many of your old habits and best practices either don't apply or are flipped upside down in a Docker environment. For example, you're probably going to use config management with Chef or Ansible a lot less, and convert your playbooks into Dockerfiles instead. Ansible/Chef/etc is based on the assumption that infrastructure has some level of permanence: you stand up a box, set it up with the right services and configuration, and it will probably be there and configured when you get around to deploying your app to it. By contrast, in the Docker world, things are much more just-in-time: you stand up and configure your container(s) while deploying your app. And when you update your app, you just toss the old containers and build new ones.

Another practice that may feel unnatural is the foregrounding of (the main) processes. On a traditional web server, you'd typically run nginx, some kind of app server, and your actual app, all in the background. Docker, on the other hand, tends to use a one-service-one-container approach, and because a container dies when its main process does, you have to have something running in the foreground (not daemonized) for your container to stay up. Typically that'll be your main process (e.g. nginx), or you'll daemonize your main process and have an infinite tail -f /some/log as your main process.

As a corollary, while traditional server setups often have a bunch of backgrounded services all logging to files, a typical Dockerized service will only have one log you care about (the one for your main process), and because a container is usually an ephemeral being, its local file system is best treated as disposable. That means not logging to files, but to stdout instead. It's great for watching what's happening now, but not super convenient if you're used to hopping on a box and doing quick greps and counts or walking through past logs when troubleshooting something that happened an hour ago. To do that, you have to deploy a log management system as soon as your app goes live, not after you have enough traffic and servers that server-hopping, grep and wc has become impractical. So get your logstash container ready, because you need it now, not tomorrow.

It's a decidedly different mindset that takes some getting used to.

I was already on board with the "everything is disposable" philosophy of modern high-availability systems, so conceptually it wasn't a huge leap, but if you're coming from a traditional shop with bare-metal (or even VM) deployments, it's definitely a mental switch.

Twelve Factor App Conventions

This one is more specific to the Host than to Docker in general, but it's part of an opinionated movement in modern software dev shops that includes Docker (and Rails, and Heroku), so I'll list it here. The Twelve-Factor App manifesto is a practical methodology for building modern apps delivered over the web. There's a lot of good stuff in there, like the emphasis on explicit declarations or the importance of a dev/stage environment matching production closely. But there's also questionable dogma that I find technically offensive. Specifically, factor 3 holds that configuration must be stored in the environment (as opposed to config files or delivered over some service).

I believe this is wrong. The app is software that runs in user space; the environment is a safe, hands-off container for the app. The environment and the app live at different levels of resolution: all the app stuff is inward-looking, only for and about the app; while the environment is outward-looking, configured with and exposing the right data for its guests (the apps and services running in the environment). Storing app-level (userspace) data in the environment is like trusting the bartender in a public bar with your specific drink preferences, and asking her what you like to drink (yes, this is a bad simile).

In addition, the concerns, scope, skills, budget, toolsets, and personalities of the folks involved in app work tend to be different from those of people doing the environment (ops) stuff. And while I'm ecstatic that devs and ops people appear to finally be merging into a "devops" hybrid, there's a host of practical reasons to divide up the work.

In practical terms, storing configuration in the environment also has significant drawbacks given the tools of the trade: people like me use grep dozens of times every day, and grepping through a machine's environment comprehensively (knowing that env variables may have been set as different Unix users) is error-prone and labor-intensive for no discernible benefit. Especially when your app is down and you're debugging things under pressure. It's also very easy to deploy what's supposed to be a self-contained "thing" (your twelve-factor app) and see it fail miserably, because someone forgot to set the environment variables (which highlights the self-contradictory, leaky nature of that config-in-the-environment precept: if your app depends on something external to it (the environment), it's not self-contained).

Another driver for the config-in-the-environment idea is to make sure developers don't store sensitive information like credentials, passwords, etc. in code that winds up in source control (and thus on every dev's computer, and potentially accidentally left in code you helpfully decided to open-source on GitHub). That makes a ton of sense and I'm all for it. But for practical purposes, this still means every dev who wants to do work on their local machine needs a way to get those secrets onto their computer, and there aren't a lot of really easy-to-use, auditable, secure and practical methods to share secrets. In other words, storing configuration in the environment doesn't solve a (very real) problem: it just moves it somewhere else, without providing a practical solution.

You may find this distinction specious, backwards, antiquated, or whatever. That's fine. The environment is the wrong place to store userspace/app-specific information. Don't do it.

That was a long-winded preamble to what I really wanted to discuss, namely the fact that the Host embraces this philosophy, and in quite a few instances it's made me want to punch the wall. In particular, the Host makes you set environment variables using a command-line client that's kind of like running remote ssh commands, meaning that values you set need to be escaped, and they don't always get escaped or unescaped the way you expect when you query them. So if you set an environment variable value to its current value as queried by the command-line client, you'll double-escape the value (e.g. "lol+wat" gets first set as "lol\+wat"; looking it up returns "lol\+wat" (escaped); resetting it turns it into "lol\\\+wat"; i.e. a set-get-set operation isn't idempotent). All this is hard-to-debug, painfully annoying, and completely unnecessary if the model wasn't so stupid about using the environment for configuration.

Dev == Prod?

One of the twelve-factor tenets is that dev/stage should mirror production closely. This is a very laudable goal, as it minimizes the risk of unexpected bugs due to environment differences (aka "but it worked on my machine"). It's especially laudable as a lot of developers (at least in Silicon Valley) have embraced OSX/macOS as their OS of choice, even though nobody deploys web apps to that operating system in production, which means there's always a non-zero risk of stuff that works on dev failing on production because of some incompatibility somewhere. This also means every dev wastes huge amounts of time getting their consumer laptop to masquerade an industrial server, using ports and casks and bottles and build-from-source and other unholy devices, instead of just, you know, doing the tech work on the same operating system you're deploying on, because that would mean touching Linux and ewww that's gross.

Originally, the Company had wrapped its production apps into Docker container using the Host's standard Dockerfiles and Procfiles, but devs were doing work on their bare-metal Macs, which meant finding, installing and configuring a whole bunch of things like Postgres, Redis, nginx, etc. That's annoying, overwhelming for new employees (since the documentation or Ansible playbooks you have to do that work are always behind and out of date about what actually happens on dev machines), and a pain to keep up to date. Individual dev machines drift apart from each other, "it works on my machine (but nor on yours)" becomes a frequent occurrence, and massive amounts of time (and money) are wasted debugging self-inflicted problems that really don't deserve to be debugged when it's so easy to do it right with a Linux VM and Ansible playbooks, but that would mean touching Linux and ewww that's gross.

So I was asked to wrap the dev environment into Dockerfiles, and ideally we'd use the same Dockerfile as production, so that dev could truly mirror prod and we'd make all those pesky bugs go away. Good plan. Unfortunately, though, I didn't find that to be practical in the Company's situation: the devs use a lot of dev-only tools (unit test harnesses, linters, debuggers, tracers, profilers) that we really do not want to have available in production. In addition, starting the various apps and services is also done differently on dev and prod: debug options are turned on, logging levels are more verbose, etc. So we realized and accepted the fact that we just can't use the same Dockerfile on dev and on prod. Instead, I've been building a custom parent image that includes the intersection of all the services and dependencies used in the Company's various apps, and converting each app's Dockerfile to extend that new base image. This significantly reduces the differences and copy-pasta between Dockerfiles, and will give us faster deployments, as the base image's file system layers are shared and therefore more likely to be cached.

Runtime v. Build Time

Back to Docker-specific bits, this one was a doozy. When building the dev Dockerfiles, I had split the setup between system-level configuration (in the Dockerfile) and app-specific setup (e.g. pip installs, node module installation, etc), which lived in a bootstrap script executed as the Dockerfile's CMD. It worked well, but it felt inelegant (two places to look for information about the container), so I was asked to move the bootstrap stuff into the Dockerfile.

The devs' setup requirements are fairly standard: they have their Mac tools set up just right, so they want to be able to use them to edit code, while the code executes in a VM or a Docker container. This means sharing the source code folder between the Mac host and the Docker containers, using the well-supported VOLUME or -v functionality. Because node modules and pip packages are app-specific, they are listed in various bog-standard requirements.txt and package.json files in the code base (and hence in the Mac's file system). As the code base is in a shared folder mounted inside the Docker container, I figured it'd be easy to just put the pip install stuff in the Dockerfile and point it at the mounted directories.

But that failed, every time. A pip install -e /somepath/ that was meant to install a custom library in editable mode (so it's pip-installed the same way as on prod, but devs can live-edit it) failed every time, missing its setup.py file, which is RIGHT THERE IN THE MOUNTED FOLDER YOU STUPID F**KING POS. A pip install -r /path/requirements.txt also failed, even though 1) it worked fine in the bootstrap script, which is also in the same folder/codebase 2) the volumes were specified and mounted correctly (I checked from inside the container).

That's when I realized the difference between build time and runtime in Docker. The stuff in the Dockerfile is read and executed at build time, so your app has what it needs in the container at runtime. During build time, your container isn't really running--a bunch of temporary containers briefly run so various configuration steps can be executed, and they leave file system layers behind as Docker moves through the Dockerfile. The volumes you declare in your Dockerfile and/or docker-compose.yml file are mounted as you'd expect (you can ssh into your container and see the mount points); but they are only bound to the host's shared folders at runtime. This means that commands in your Dockerfile (which are used at build time) cannot view or access files in your shared Mac folder, because those only become available at runtime.

Of course you could just ADD or COPY the files you need from the Mac folder into the mounted directory, and do your pip install in the Dockerfile that way. It works, but it feels kinda dirty. Instead, what we'll do is identify which pip libraries are used by most services, and bake those into our base image. That'll shave a few seconds off the app deployment time.

Editorializing a bit, while I (finally) understand why things behaved the way they did, and it's completely consistent with the way Docker works, I feel it's a design flaw and should not be allowed by the Docker engine. It violates the least-surprise principle in a major way: it does only part of what you think it will do (create folders and mount points). I'd strongly favor some tooling in Docker itself that detects cases like these and issues a WARNING (or barfs altogether if there was a strict mode).

Leaky Abstractions and Missing Features

Docker aims to be a tidy abstraction of a self-contained black box running on top of some machine (VM or bare-metal). It does a reasonable job using its union file system, but the abstraction is leaky: the underlying machine still peeks through, and can bite you in the butt.

I was asked to Dockerize an on-prem application. It's a Java app which is launched with a fairly elaborate startup script that sets various command-line arguments passed to the JVM, like memory and paths. The startup script is generic and meant to just work on most systems, no matter how much RAM they have or where stuff is stored in the file system. In this case, the startup script sets the JVM to use some percentage of the host's RAM, leaving enough for the operating system to run. It does this sensibly, parsing /proc/meminfo and injecting the calculated RAM into a -Xmx argument.

But when Dockerized, the container simply refused to run: the Host had allocated some amount of RAM to it, and the app's launcher was requesting 16 times more, because the /proc/meminfo file was... the host EC2 instance's! Of course, you could say "duh, that's a layered file system, of course that's what it does" and you'd be right. But the point is that a Docker container is not a fully encapsulated thing; it's common enough to query your environment's available RAM, and a clean, encapsulated container system should always give an answer that's reflective of itself, not breaking through to the underlying hardware.

Curious "Features"

Docker's network management is... peculiar. One of its more esoteric features is the order in which ports get EXPOSEd. I was working on a Dockerfile that was extending a popular public image, and I could not make it visible to the outside world, even though my ports were explicitly EXPOSEd and mapped. My parent image was EXPOSing port 443, and I wanted to expose a higher port (4343). For independent reasons, the Host's system only exposes the first port it finds, even if several are EXPOSEd; and because there's no UNEXPOSE functionality, it seemed I'd have to forget about extending the public base image and roll my own so I could control the port.

But the Host's bottomless knowledge of Docker revealed that Docker exposes ports in lexicographic order. Not numeric. That means 3000 comes before 443. So I could still EXPOSE port a high port (3000) as long as lexicographically it appeared before the base image's port 443, and the Host would pick that one for my app.

I still have a bruise on my forehead from the violent D'OHs I gave that day.

On a slightly higher level than this inside-baseball arcana, though, this "feature" also shows how leaky the Docker abstraction is: a child image is not only highly constrained by what the parent image does (you can't close/unexpose/override ports the parent exposes), it (or its auhor) needs to have intimate knowledge of its parent's low-level details. Philosophically, that's somewhat contrary to the Docker ideal of every piece of software being self-contained. Coming at it from the software world, if I saw a piece of object-oriented code with a class hierarchy where a derived class had to know, be mindful of, or override a lot of the parent class's attributes, that'd be a code smell I'd want to get rid of pretty quickly.

Conclusion: Close, But Not Quite There

There is no question Docker is a very impressive and useful piece of software. Coupled with great, state-of-the-art tooling (such as the container tools available from AWS and other places), and some detailed understanding of Docker internals, it's a compelling method for deploying and scaling software quickly and securely.

But in a resource-constrained environment (a small team, or a team with no dedicated ops resource with significant Docker experience), I doubt I'd deploy Docker on a large scale until some of its issues are resolved. Its innate compatibility with ephemeral resources like web app instances also makes it awkward to use with long-running services like databases (also known as persistence layers, so you know they tend to stick around). So you'll likely end up with a mixed infrastructure (Docker for certain things, traditional servers for others; Dockerfiles here, Ansible there; git push deploys here, yum updates there), or experience the ordeal joy of setting up a database in Docker.

Adding to the above, the Docker ecosystem also has a history of shipping code and tools with significant bugs, stability problems, or non-backward-incompatible changes. Docker for Mac shipped out of beta with show-stopping, CPU-melting bugs. The super common use case of running apps in Docker on dev using code in a shared folder on the host computer was only resolved properly a few months ago; prior to that, inotify events when you modified a file in a shared, mounted folder on the host would not propagate into the container, and so apps that relied on detecting file changes for hot reloads (e.g. webpack in dev mode, or Flask) failed to detect the change and kept serving stale code. Before Docker for Mac came out, the "solution" was to rsync your local folder into its alter ego in the container so the container would "see" the inotify events and trigger hot reloads; an ingenious, effective, but brittle and philosophically bankrupt solution that gave me the fantods.

Docker doesn't make the ops problem go away; it just moves it somewhere else. Someone (you) still has to deal with it. The promise is ambitious, and I feel it'll be closer to delivering on that promise in a year or two. We'll just have to deal with questionable tooling, impenetrable documentation, and doubtful stability for a while longer.

2016-11-11

Book in progress: Programming Basics

I started teaching a friend Java, and figured I might as well share the notes I wrote with anybody who may want them. Feedback appreciated (do read the README first to get a sense of the goals and intended audience, though):

https://github.com/rogthefrog/programming-basics-with-java

2016-09-02

Python import basics in plain English

In my own experience learning Python, and that of others on Python teams I've worked with, a common hurdle is understanding how Python does imports.

The basics are actually very simple, but the documentation tends to be a little neckbeardy and dense, and hard to grok if you're new to the language. So I thought I'd list common, simple practical examples of Python imports in case they help someone.

To import data and functions from somewhere else (another .py file in your project, a standard library like os, or a third-party library you may have installed with pip), you have the following options:

import <module>
import <module> as <other_name>
from <module> import a, b, c
from <module> import *

Let's look at what these options mean.

import <module>

import <module> means the program you're in can access everything that is defined in <module> (variables, classes, functions, etc), and you have to prepend "<module>." to those things. For example, if  is the "os" library (it comes with Python), which defines a function called getpid and a variable named name, your program can do this:

>>> import os
>>> os.name
'posix'
>>> os.getpid()
51678

This works with your own libraries too. Say you created a Python file named network_functions.py which contains a constant named BANDWIDTH and a method named connect(url), you can do:

>>> import network_functions
>>> network_functions.BANDWIDTH
1024
>>> network_functions.connect('https://google.com')
Connecting...

If you don't want to have to type out the whole prefix (which can get unwieldy if your imports are nested (the modules are subdirectories), e.g. import lib.network.connection_functions), you have the following options:

import <module> as <other_name>

This lets you use <other_name> instead of the module's full name. 

>>> import lib.network.connection_functions as netfunc
>>> netfunc.BANDWIDTH
1024
>>> netfunc.connect('https://google.com')
Connecting...

from <module> import a, b, c

This lets you import only what you need from module into your current program's namespace. This means everything you imported from the external module can be called by its bare name in your program:

>>> from lib.network.connection_functions import BANDWIDTH, connect
>>> BANDWIDTH
1024
>>> connect('https://google.com')
Connecting...

from <module> import *

Import everything from the imported module into your current program's namespace, so you can call everything from the module by its bare name. This is strongly not recommended, and I'll explain why.

>>> from lib.network.connection_functions import *
>>> BANDWIDTH
1024
>>> connect('https://google.com')
Connecting...

This is almost never a good idea, because you don't always know or control what is defined in an external module, and there can be name collisions, e.g. functions or variables with the same name, so you may not be using the variable or function you expect! For example:

In file helpers.py:
def connect(url):
    print "Connecting to", url

In file network.py:
def connect(url):
    print "Hacking into", url

>>> from helpers import *
>>> from network import *
>>> connect('https://google.com')
# which one is called?

>>> from network import *
>>> from helpers import *
>>> connect('https://google.com')
# which one is called?

This example may seem contrived, but it's very common to import a bunch of modules written by different people, and some variable or function names are common or obvious enough that they may appear several times in different modules. Why wouldn't they, after all? Joe doesn't know about Jill's (or your own) module, so they have no reason to coordinate and ensure they're not using the same function names. 

If you use from <module> import * with several modules, the odds are very good you'll call a function and actually invoke one that's not the one you expect. And that can be really tricky to debug. 

So what should you do if you do need a bunch of functionality from a module and don't want to import every single function and variable by name with:

from <module> import var1, var2, var3, fun1, fun2, fun4 # etc 

It's simple! Don't use from <module> import *. Instead, use import <module> and presto, your program can use everything from <module>, as long as you prefix <module>. before the names.

>>> import os
>>> os.getpid()
'posix'

Handy Tips

Do you ever want to know the variables or functions defined in a module you imported without having to Google them? Just use vars or dir:

>>> import os
>>> dir(os)
['EX_CANTCREAT', 'EX_CONFIG', 'EX_DATAERR', 'EX_IOERR', 'EX_NOHOST', 'EX_NOINPUT', 'EX_NOPERM', 'EX_NOUSER', 'EX_OK', 'EX_OSERR', 'EX_OSFILE', 'EX_PROTOCOL', 'EX_SOFTWARE', 'EX_TEMPFAIL', 'EX_UNAVAILABLE', 'EX_USAGE', 'F_OK', 'NGROUPS_MAX', 'O_APPEND', 'O_ASYNC', 'O_CREAT', 'O_DIRECTORY', 'O_DSYNC', 'O_EXCL', 'O_EXLOCK', 'O_NDELAY', 'O_NOCTTY', 'O_NOFOLLOW', 'O_NONBLOCK', 'O_RDONLY', 'O_RDWR', 'O_SHLOCK', 'O_SYNC', 'O_TRUNC', 'O_WRONLY', 'P_NOWAIT', 'P_NOWAITO', 'P_WAIT', 'R_OK', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'TMP_MAX', 'UserDict', 'WCONTINUED', 'WCOREDUMP', 'WEXITSTATUS', 'WIFCONTINUED', 'WIFEXITED', 'WIFSIGNALED', 'WIFSTOPPED', 'WNOHANG', 'WSTOPSIG', 'WTERMSIG', 'WUNTRACED', 'W_OK', 'X_OK', '_Environ', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_copy_reg', '_execvpe', '_exists', '_exit', '_get_exports_list', '_make_stat_result', '_make_statvfs_result', '_pickle_stat_result', ] # and a bunch more

>>> vars(os)
{'WTERMSIG': , 'lseek': , 'EX_IOERR': 74, 'EX_NOHOST': 68, 'seteuid': , 'pathsep': ':', 'execle': , '_Environ': , ] # and a bunch more

What's Next?

In a later post, I'll cover other tricky aspects of imports, namely how Python maps import to Python code files in directories, and how to debug ImportError: No module named <module> problems that can occur depending on what directories your files are in.

Hopefully that was helpful!

2016-04-18

Django, static assets, versioning, and WhiteNoise

I had an interesting time troubleshooting an issue with Django, WhiteNoise and static asset versioning. This may be obvious to experienced Django users, but not to me; I've maintained Flask and Rails apps before, but Django is a new beast. I'll document it here in case it helps somebody.

My goal was to set up asset versioning in a Django app to serve static files as filename.somehash.js instead of filename.js (same with other file types like css, png, etc). This is standard practice; most modern frameworks have that capability, and different ways to do it.

I had started using WhiteNoise because the internets suggested it was a much, much easier task than other ways to do it. I was hoping to do asset versioning and deploy a Cloudfront CDN at the same time, and WhiteNoise is set up to do just that.

Once everything was set up according to the documentation, I ran python manage.py collectstatic and saw the versioned file names getting generated. Checking the files themselves confirmed that. But when I loaded the app in a browser, only the unversioned file names were being requested.

After much head-scratching, I found this was because the app templates reference the static files with the standard {% load static %} method. The problem went away when I changed that to {% load static from staticfiles %} as suggested in this closed issue on the subject. Note that I didn't try the other option mentioned in that issue, {% load staticfiles %}, but that should also work. 

Once the app restarted, beautiful unique file names were being requested and served. But I was occasionally getting 500 errors. I traced those back to instances where the app and WhiteNoise were being asked to serve files that no longer exist. Those references to deleted js, css, etc. files didn't actually harm the app's functionality, but when WhiteNoise is asked to serve them, it throws an exception and causes the app to 500.

That's not ideal behavior--my take is that 50x errors in a production app should never happen and always be handled gracefully when they do, so a library that causes 500s by actively raising exceptions rather than logging, catching and handling them gracefully isn't ideal. But them's the breaks, and I might yet submit a PR to the owner if I find the time.

In this particular app's case, this behavior was especially non-ideal because some of these files were referenced in commented-out JavaScript, and not actually requested; it looks like WhiteNoise and/or Django greedily consider anything that looks like a static file path to be actually requested, even if it's in code that doesn't execute.

The solution is simple--find all those dangling references and exterminate them! Use those 500s to your advantage by exercising the app and tailing your error logs. It's easy to argue that's something you should do no matter what, so it wasn't hard to convince the code owners it was the right thing to do.


2016-01-04

Installing wxPython in a virtualenv on Centos 6.7

I'm looking at wxPython to write a GUI for an app I'm working on, and as it turns out using wxPython with virtual environments isn't completely obvious. Hopefully someone finds this helpful.

My distribution is CentOS 6.7 with a hand-built Python 2.7.6.

Step 1: Install wxPython

This will install wxPython in your system's default Python library directory (not the one you want).

$ sudo yum install wxPython
$ sudo find / -name wx*.py
/usr/lib64/python2.6/site-packages/wxversion.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wxPython/lib/wxpTag.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wxPython/lib/wxPlotCanvas.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wx/tools/XRCed/plugins/wxlib.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wx/tools/Editra/src/wxcompat.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wx/lib/wxcairo.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wx/lib/wxpTag.py
/usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wx/lib/wxPlotCanvas.py

Step 2: Create and activate your virtualenv

$ cd
$ virtualenv -p /usr/bin/python2.7 venv
$ source venv/bin/activate

Importing wx will fail:

$ python
Python 2.7.6 (default, Dec  2 2013, 21:17:42) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import wx
Traceback (most recent call last):
  File "", line 1, in
ImportError: No module named wx

Step 3: Symlink wxPython into your virtualenv

$ cd ~/venv/lib/python2.7/site-packages
$ ln -s /usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wx
$ ln -s /usr/lib64/python2.6/site-packages/wx-2.8-gtk2-unicode/wxPython
$ ln -s /usr/lib64/python2.6/site-packages/wxversion.py

Step 4: Start coding!

$ source venv/bin/activate
$ python
Python 2.7.6 (default, Dec  2 2013, 21:17:42) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import wx
>>> app = wx.App()
>>> frame = wx.Frame(None, -1, 'lol')
>>> frame.Show()
True
>>> app.MainLoop()

Note: I only just started playing with wxPython, so there may be other symlinks required to make it work. Let me know if so, and I'll update the post.


2015-03-25

Recruiter isn't even trying anymore

I got this InMain on LinkedIn today.

March 25, 2015, 1:33 PM
Dear Roger,
Trust this finds you in good health!! I wanted to share a Back End Engineer role with you with one of my direct clients , its a long term contract and location is Mountain View, CA. Here is the short story:

The candidate should be able to quickly adapt to different environment/framework as the project needs.
These are minimum required skills.
Open source backend stack
Web backend frameworks in Java OR C#, OR Python OR Ruby, etc.
Database design and maintenance using SQL and/or NoSQL
Let me know if you think that this would be a fit as per your skills and expertise, if no, maybe you can refer someone !!
Best,
Gargi 
  • Nothing about the company (industry? size? age? product?)
  • The technology stack is essentially "whatever" (but open source! Including C#!)
  • The data stack is essentially "whatever"
  • The role is web AND database design AND dba
I LOL'ed heartily as I marked the InMail as spam.

2015-02-19

Comcast lies

Verbatim transcript of a chat session with Naval. I was having internet connectivity problems. Naval says (s)he resolved them, and then did the standard upsell trick, in which (s)he lied. Then when I noticed the problems weren't resolved, (s)he lied again.

Summary

Lie #1: " you will get additional service at a cheaper cost."

The rep correctly showed my current service is 50m down and basic cable TV for $82 / mo. He said Comcast had an offer where I could get "additional service at a cheaper cost." The "cheaper cost" is $101.99 for first 12 months and then $126.99 from 13-24 months.

Lie #2: "Every thing is totally fine"

At the exact time Naval said "Every thing is totally fine," Comcast was having well-documented problems:

https://downdetector.com/status/comcast-xfinity/san-francisco


Lie #3 (probably): "You will also check this problem on our site also"

The "server upgradation" and service degradation is not listed.


Trancript

Update

I spoke with another rep, Jose, who was great. Excerpts from our chat, mostly for the lulz:

Jose: May I know the issue that our rep lied to you?
Roger: (summary of the above)
Roger: I asked the price
Roger: "This package will cost you only $101.99 for first 12 months, after that it will cost you $126.99 from 13-24 months."
Roger: $101.99 is not actually cheaper than $91.96.
Jose: yes that is correct. Its simple math.

Then this unintentionally hilarious pearl:

Jose: On internet issue they can be corrected by our Internet dept.
Jose: We are cable troubleshooting dept so all of our tools here are for cable TV.
Jose: On broken promises those usually happens on Sales dept.Jose: Here on cable troubleshhoting we dont lie.