Best Practices

Installing nvm for n users

I use n to manage my system/default NodeJS, but if I’m in a legacy repo (hint: pretty much every repo where I work) I’ll need to drop into an old version of NodeJS just for that terminal.

Here’s how I install and use nvm. Based on the original instructions at: github.com/creationix/nvm

curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash
cd ~/.nvm
git remote rm origin
git remote add origin git@github.com:creationix/nvm.git

Keep nvm from making you use a weird version of Node:

nvm alias default system

Don’t use nvm’s init script, use this instead:

if [ -d ~/.nvm ]; then
  export NVM_DIR="$HOME/.nvm"
  [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"  # This loads nvm
  [ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"  # This loads nvm bash_completion
else
  alias nvm='[ -f .nvmrc ] && n $(cat .nvmrc) || echo "MISSING: .nvmrc"'
fi
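
On a machine without ~/.nvm, that alias makes a bare nvm defer to n using the repo’s .nvmrc. For example, with a hypothetical .nvmrc containing 4.9.1:

cat .nvmrc   # => 4.9.1
nvm          # expands to: n 4.9.1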

Tools like Husky will look for NVM environment variables. So for my non-work machines, I make sure none of them get set.

So what do I normally do if I need to do something in an old version of Node? This only comes up for me when a TravisCI build runs against an old version. If the fix isn’t obvious, I’ll do n 4 to get into Node 4, make my fix, then switch back to LTS.
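
A typical round trip looks something like this (versions illustrative):

n 4          # switch to Node 4
npm test     # reproduce, then fix
n lts        # back to the latest LTS release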

Best Practices

requirements.txt==2.0

There is a war going on: a war between those who say Python requirements should be explicit and those who say they should be implicit. Before I continue, I’m talking about requirements.txt here, not setup.py. The difference between explicit and implicit requirements comes down to whether the line says Django==1.9.7 or just Django. Going deeper, you could also explicitly list dependencies of dependencies, or loosely pin like Django<1.10.

The advantage of explicit requirements is that you get a repeatable environment, especially if you’re also pinning dependencies of dependencies. The advantages of implicit requirements are readability and automatic security upgrades.

Here at TabbedOut, we’ve developed a technique that works very well, and I’d like to share it: use pip-tools to manage your requirements. You get the best of both worlds, at the expense of some extra boilerplate. Here’s how we do it:

  1. Be in a virtualenv
  2. Use our Makefile boilerplate (see below)
  3. pip install pip-tools
  4. Write a “sloppy” requirements.txt using implicit requirements, but name it requirements.in (see the example after this list)
  5. Run make requirements.txt
  6. Check all this into your codebase
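
To make this concrete, here’s a hypothetical pair (versions and third-party packages are made up, reusing the example names from later in this post). The requirements.in stays readable:

Django<1.10
scumbagpackage

while the generated requirements.txt pins everything, including dependencies of dependencies:

django==1.9.7
scumbagpackage==1.2.0
xyzpackage==2.4           # via scumbagpackage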

Advantages

  • requirements.in is easy to maintain
  • requirements.txt has pinned versions so your virtualenv matches your collaborators and production
  • You automatically get patches and security fixes when you run make requirements.txt, and there are no surprises because it goes through your code review process

Tips

  • Try to loosely pin requirements in your requirements.in, though it doesn’t matter that much: you’ll catch any major version change when you review the diff of requirements.txt.
  • Specifying an exact version in requirements.in is an anti-pattern, and you should document why. Often it’s because there’s a bug or backwards-incompatible change.

Makefile boilerplate

Here’s what a Makefile might contain:

help: ## Shows this help
	@echo "$$(grep -h '#\{2\}' $(MAKEFILE_LIST) | sed 's/: #\{2\} /	/' | column -t -s '	')"

install: ## Install requirements
	@[ -n "${VIRTUAL_ENV}" ] || (echo "ERROR: This should be run from a virtualenv" && exit 1)
	pip install -r requirements.txt

.PHONY: requirements.txt
requirements.txt: ## Regenerate requirements.txt
	pip-compile --upgrade --output-file $@ requirements.in
  • help: This is just a fast way of making your Makefile self-documenting.
  • install: Nowadays, you need Python and non-Python requirements. Putting it all in one make target makes it easier for developers to jump into a project.
  • PHONY: When you run make requirements.txt, you want it to run every time. Not just when requirements.in changes. That’s because new versions may have been uploaded to PyPI. I always group my PHONY with my target. Even though it adds more lines, your Makefile will be more maintainable because you’re not trying to keep a list off the screen up to date.
  • requirements.txt: Why make requirements.txt over make requirements? Because best practice dictates that if the output of a make target is a file, that file should also be the name of the target. That way, you can use the automatic variable $@ and it’s explicit, even at the cost of needing the PHONY.
  • --upgrade: Without this, pip-tools doesn’t actually upgrade your dependencies.
  • --output-file $@: pip-tools does this by default, but explicit is better than implicit. I would prefer to do pip-compile --upgrade requirements.in > $@ but pip-tools 1.6 does a poor job of dealing with stdout (see below).

Caveats

  • When you change requirements.in, you do have to remember to run make requirements.txt, but you could automate that with a git hook or CI process. In practice, we’ve found that remembering to run make requirements.txt is fine.
  • pip-tools==1.6 does not work with the latest pip (8.1.2). See #358
  • pip-tools==1.6 has a poor understanding of how stdin and stdout are supposed to work. Hopefully this gets fixed soon but is only a minor annoyance. #362 #360 #353 #104
  • The compilation step can depend on your platform. I’ve only noticed this with ipython, which needs packages for interacting with the terminal, like gnureadline. It hasn’t been trouble for us, but it could be for you. A workaround is to run the process in a Docker container (sketched below).
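
That workaround might look something like this (a sketch: the image tag is an assumption, and your project may need a different Python):

docker run --rm -v $(pwd):/src -w /src python:2.7 \
  sh -c 'pip install pip-tools && pip-compile --upgrade --output-file requirements.txt requirements.in'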

Sample Scenarios

If you need more convincing, here are some problems this approach solves for us:

I thought I was getting xyzpackage version 3, why is version 2 getting installed? pip-tools flattens all your requirements and annotates which package required what. So in requirements.txt, you’ll see xyzpackage==2.4    # via scumbagpackage and know that scumbagpackage was responsible.

What packages am I actually using? In a large project, your requirements.txt will balloon as you run into bugs and start pinning dependencies of dependencies. Then one day, you’ll realize you don’t know what packages you’re actually using. With a much simpler requirements.in, there’s less to sort through and fully pinned packages stick out like sore thumbs.

It works for me. Sometimes a project will work only for you. You check your installed versions against requirements.txt and they match. But what you didn’t realize is that a dependency of a dependency broke something. Since pip-tools freezes everything, you’ll have the same version of every package installed as your collaborators. And if something does break, you’ll have history to trace down what changed.

Finish Writing Me Plz · Nerd

Apache Bench

For years, my tool for simple load tests of HTTP sites has been ApacheBench.

For years, my reference for how to visualize ApacheBench results has been Gnuplot.

For years, my reference for how to use Gnuplot has been http://www.bradlanders.com/2013/04/15/apache-bench-and-gnuplot-youre-probably-doing-it-wrong/

But do you really want to be writing Gnuplot syntax? It turns out that Pandas will give you great graphs pretty much for free:

import pandas as pd

df = pd.read_table('../gaussian.tsv')
# The raw data as a scatterplot
df.plot(x='seconds', y='wait', kind='scatter')
# The traditional Gnuplot plot
df.plot(y='wait')
# Histogram
df.wait.hist(bins=20)

[Figures: the scatter plot, the wait-time line plot, and the histogram of the wait distribution]

You can see the full source code at tsv_processing.ipynb

And recreate these yourself by checking out the parent repo: github.com/crccheck/abba

So now you might be thinking: how do you get a web server that outputs a normal distribution of lag? Well, I wrote one! I made a tiny Express.js server that just waits a random amount, packaged it in a Docker image, and you can see exactly how I ran these tests by checking out my Makefile.
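
For reference, ApacheBench can produce that TSV itself with its gnuplot output flag; an invocation like this (URL and request counts illustrative) yields a file with the seconds and wait columns plotted above:

ab -n 1000 -c 10 -g gaussian.tsv http://localhost:3000/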

Nerd

Django Nose without Django-Nose

[Image: Tycho Brahe]

I’ve grown to dislike Django-Nose. It’s been over three months since Django 1.8 was released, and they still don’t have a release that fully supports it. These are the advantages they currently tout:

  • Testing just your apps by default, not all the standard ones that happen to be in INSTALLED_APPS
    • The Django test runner has been doing this since 1.6 https://docs.djangoproject.com/en/1.8/releases/1.6/#discovery-of-tests-in-any-test-module
  • Running the tests in one or more specific modules (or apps, or classes, or folders, or just running a specific test)
    • They all can do this, even the old Django test runner
  • Obviating the need to import all your tests into tests/__init__.py. This not only saves busy-work but also eliminates the possibility of accidentally shadowing test classes.
    • The Django test runner has had this since 1.6
  • Taking advantage of all the useful nose plugins
    • There are some cool plugins
  • Fixture bundling, an optional feature which speeds up your fixture-based tests by a factor of 4
    • Ok, Django doesn’t have this, but you shouldn’t be using fixtures anyways and there are other ways to make fixtures faster
  • Reuse of previously created test DBs, cutting 10 seconds off startup time
    • Django can do this since 1.8 https://docs.djangoproject.com/en/1.8/releases/1.8/#tests
  • Hygienic TransactionTestCases, which can save you a DB flush per test
    • Django has had this since 1.6 https://docs.djangoproject.com/en/1.6/topics/testing/tools/#django.test.TransactionTestCase
  • Support for various databases. Tested with MySQL, PostgreSQL, and SQLite. Others should work as well.
    • Django has had this forever

So what if you need a certain nose plugin? Say, xunit for Jenkins or some other tooling? Well, you still have to use Nose because django-jux hasn’t been updated in 4 years.

Here’s a small script you can use that lets you use Django + Nose while skipping the problematic Django-nose:
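
The script was embedded as a gist in the original post. A minimal sketch of the same idea, assuming Django 1.8 and nose (details like error handling simplified), might look like:

# runtests.py: set up Django and the test database, then hand sys.argv to nose
import os
import sys

import django
import nose
from django.test.utils import setup_test_environment, teardown_test_environment

if __name__ == '__main__':
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'settings')
    django.setup()
    setup_test_environment()

    # Let --keepdb opt in to Django 1.8's test database reuse
    keepdb = '--keepdb' in sys.argv
    if keepdb:
        sys.argv.remove('--keepdb')

    from django.db import connection
    old_db_name = connection.creation.create_test_db(keepdb=keepdb)

    passed = nose.run()  # nose parses the rest of sys.argv (--with-xunit, etc.)

    connection.creation.destroy_test_db(old_db_name, keepdb=keepdb)
    teardown_test_environment()
    sys.exit(0 if passed else 1)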

Run it like you would Nose:

DJANGO_SETTINGS_MODULE=settings.test python runtests.py --with-xunit --with-cov

One choice I made is that I use Django 1.8’s --keepdb flag instead of the REUSE_DB environment variable, but you can see how to adapt it if you wanted it to feel more like Nose. Adapting the command above to reuse the database would look like:

DJANGO_SETTINGS_MODULE=settings.test python runtests.py --with-xunit --with-cov --keepdb

Meh Practices · Nerd · Patterns

Django management commands and verbosity

[Image: Ren and Stimpy]

[update: This post has been corrected, thanks to my commenters for your feedback]

Every Django management command gets the verbosity option for free. You may recognize this:

optional arguments:
  -h, --help            show this help message and exit
  -v {0,1,2,3}, --verbosity {0,1,2,3}
                        Verbosity level; 0=minimal output, 1=normal output,
                        2=verbose output, 3=very verbose output

We rarely use it because doing so usually means scattering if statements through our code to support it. If you’re writing quick n’ dirty code, this may look familiar in your management commands:


if options.get('verbosity') == 3:
    print('hi')

In a recent Django project, I came up with a few lines of boilerplate to support the verbosity option, assuming you’re also using the logging module and not relying on print:


import logging

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    def handle(self, *args, **options):
        verbosity = options.get('verbosity')
        if verbosity == 0:
            logging.getLogger('my_command').setLevel(logging.WARN)
        elif verbosity == 1:  # default
            logging.getLogger('my_command').setLevel(logging.INFO)
        elif verbosity > 1:
            logging.getLogger('my_command').setLevel(logging.DEBUG)
        if verbosity > 2:
            logging.getLogger().setLevel(logging.DEBUG)

github.com/texas/tx_mixed_beverages/blob/master/mixed_beverages/apps/receipts/management/commands/geocode.py

So what does this do?

At the default verbosity, 1, I display INFO logging statements from my command. Increasing verbosity to 2, I also display DEBUG logs from my command. And going all the way to verbosity 3, I also enable all logging statements that reach the root logger.
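
For this to work, the code the command calls has to log through the matching logger name. Something like this (module layout and function are hypothetical):

import logging

logger = logging.getLogger('my_command')

def geocode(address):
    logger.info('geocoding %s', address)       # shown at verbosity >= 1
    logger.debug('full address: %r', address)  # shown at verbosity >= 2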

Go forth and log!

Finish Writing Me Plz

Prometheus Contained

After becoming smitten with Graphite last year, I’m sorry to say I’ve become entranced by the new hotness: Prometheus. For a rundown of how the two compare, Prometheus’s docs do a good job. The docs aren’t bad, but there were a lot of gaps I had to fill, so I present my hello world guide for using Prometheus to get metrics on a host the Docker way. In addition to the official docs, I found Monitor Docker Containers with Prometheus to be useful. For the node explorer, discordianfish’s Prometheus Demo was valuable.

Start a Container Exporter container

This container creates an exporter that Prometheus can talk to for data about all the containers running on the host. It needs access to cgroups to collect the data, and to the Docker socket to know which containers are running.

docker run -d --name container_explorer \
  -v /sys/fs/cgroup:/cgroup:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  prom/container-exporter

Start a Node Exporter container

This container uses the --net=host option so it can get metrics about the host network interface.

docker run -d --name node_exporter --net=host \
  prom/node-exporter

I was afraid that this would result in distorted stats because it’s in a container instead of the host, but after testing against a Node Exporter installed on the bare metal, it looks like it’s accurate.

Start a Prometheus container

This is the container that actually collects data. I’ve mounted my local prometheus.conf so Prometheus uses my configuration, and mounted a data volume so Prometheus data can persist between containers. There’s a link to the container exporter so Prometheus can collect metrics about containers, and an add-host so this container can reach the node exporter’s metrics. Port 9090 is exposed because Prometheus needs to be publicly accessible to the Dashboard app. I’m not sure how to lock it down for security; I may add a referrer check (sketched below) since I don’t want to do IP rules or a VPN.

docker run -d --name prometheus \
  -v ${PWD}/prom/prometheus.conf:/prometheus.conf:ro \
  -v ${PWD}/prom/data:/prometheus \
  --link container_explorer:container_explorer \
  --add-host=dockerhost:$(ip route | awk '/docker0/ { print $NF }') \
  -p 9090:9090 \
  prom/prometheus
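
On that referrer check: an nginx guard in front of Prometheus could look roughly like this (hostnames and the upstream address are assumptions, and Referer headers are easily spoofed, so treat it as a speed bump rather than real security):

server {
    listen 9090;
    location / {
        valid_referers server_names promdash.example.com;
        if ($invalid_referer) {
            return 403;
        }
        proxy_pass http://dockerhost:9091;  # Prometheus rebound to an internal port
    }
}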

Setup a Prometheus Dashboard database

Here, I’m running the rake db:setup task to set up a SQLite database. The image is my own, pending pull request 386.

docker run \
  -e DATABASE_URL=sqlite3:/data/dashboard.sqlite3 \
  -v ${PWD}/prom/dashboard:/data:rw \
  crccheck/promdash ./bin/rake db:setup

Start a Prometheus Dashboard

Now that the dashboard has a database, you can start the Dashboard web app. You can access it on port 3000. You’ll need to create a server, then a site, then a dashboard, and finally, your first graph. Don’t be disappointed when your first graph doesn’t load. You may need to tweak the server settings until the graph preview from the Dashboard opens the right data in Prometheus.

docker run -d --name promdash \
  -e DATABASE_URL=sqlite3:/data/dashboard.sqlite3 \
  -v ${PWD}/prom/dashboard:/data:rw \
  -p 3000:3000 \
  crccheck/promdash

My almost unabridged setup

What’s missing in this gist is the config for my nginx layer in front of the dashboard, which is why no port is exposed here. To get started, all you have to do is put prometheus.conf in a prom sub-directory and run make promdash.

Mental Note: Add Category

Patterns: bootstrapping Python/SublimeText projects

I consider myself really lazy. So here’s my basic workflow of home-grown commands I go through whenever I work on a Python project.

Some notes before I dive in:

  • replace $PROJECT with the name of your project. I keep the name consistent across a project’s directory and virtualenv so I can use the directory name for other things.
  • some tedious things I alias, some I don’t
# everything has to live somewhere
md $PROJECT
# equivalent to: mkdir $PROJECT && cd $PROJECT
# my alias: md () { mkdir -p "$@" && cd "$@"; }

# Using virtualenvwrapper, make a virtualenv the same name as the project
mkvirtualenv $PROJECT

# These next steps happen automatically in my global postactivate
# Give this virtualenv its own ipython configuration so it has its own ipython history
mkipythonconfig
# change my tab title to the name of this virtualenv so I can tell my tabs apart
echo -e "\033];$(basename $VIRTUAL_ENV)\007"
# my alias: tit() { echo -e "\033];$@\007"; }

# This makes it so when I `workon $PROJECT`, I automatically cd to it.
# Sometimes, this is undesirable, but `cd -` will take you back.
setvirtualenvproject
# likewise I have a function named `work` that activates or creates the virtualenv:
function work {
  # assumes `mkvirtualenv` exists
  local env_name=$(basename $PWD)
  workon $env_name
  if [ $? -ne 0 ]; then
    echo "Shall I create $env_name it for you? [Yn]"
    read -n 1 sure
    if [ -z "$sure" ] || [ "$sure" = 'y' ] || [ "$sure" = 'Y' ]; then
      mkvirtualenv $env_name
      setvirtualenvproject
    fi
  fi
}

# create a sublime text 2 project with a SublimeJEDI configuration
mksublconfig > ~/Documents/$PROJECT.sublime-project
# source: https://github.com/crccheck/dotfiles/blob/master/bin/mksublconfig

# Create an .env file to store configuration like DJANGO_SETTINGS_MODULE
# Works in conjunction with autoenv.
touch .env

# If I'm creating a new project, I create an empty initial commit so I can
# squash and rebase things from the beginning.
git init
git initempty
#  initempty = !git init && git commit -m 'initial commit (empty)' --allow-empty
Best Practices · Patterns

Patterns: don’t mess up the prod db!

With 12factor-style environment configs, it’s very easy to accidentally connect to your production database when you think you’re connecting to dev. Here’s a simple guard you can add to make sure your DATABASE_URL isn’t somehow pointed someplace it’s not supposed to be (assuming your production database lives in Amazon AWS):

bash:

if [[ ${DATABASE_URL} == *"amazonaws"* ]]; then exit -1; fi

Python:

if 'amazonaws' in os.environ['DATABASE_URL']:
    exit('Cannot be run against a production database')

Django:

if 'amazonaws' in settings.DATABASES['default']['HOST']:
    raise CommandError('Cannot be run against a production database')
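
For instance, in a hypothetical destructive management command, the Django flavor of the guard sits at the top of handle():

from django.conf import settings
from django.core.management.base import BaseCommand, CommandError


class Command(BaseCommand):
    help = 'Destructively reloads development data'

    def handle(self, *args, **options):
        if 'amazonaws' in settings.DATABASES['default']['HOST']:
            raise CommandError('Cannot be run against a production database')
        # ...destructive work goes here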

(thanks to x110dc for the bash syntax and 12factor link)

You didn't say the magic word