Best Practices

requirements.txt==2.0

There is a war going on. A war between those who say Python requirements should be explicit and those who say requirements should be implicit. Before I continue, I’m going to be talking about requirements.txt, not setup.py. The difference between explicit and implicit requirements comes down to whether the line says Django==1.9.7 or Django, respectively. Going deeper, you could also say that listing dependencies of dependencies is explicit, and that you could pin loosely, like Django<1.10.

The advantage of explicit requirements is that you get a repeatable environment, especially if you’re also specifying dependencies of dependencies. The advantages of implicit requirements are readability and automatic security upgrades.

Here at TabbedOut, we’ve developed a technique that works very well, and I’d like to share it: use pip-tools to manage your requirements. You get the best of both worlds, at the expense of some extra boilerplate. Here’s how we do it:

  1. Be in a virtualenv
  2. Use our Makefile boilerplate (see below)
  3. pip install pip-tools
  4. Write a “sloppy” requirements.txt using implicit requirements, but name it requirements.in (see the example below)
  5. Run make requirements.txt
  6. Check all this into your codebase
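For illustration (the package names and versions here are made up, borrowed from the scenarios later in this post), a requirements.in can be as simple as:

# requirements.in
Django<1.10
scumbagpackage

and make requirements.txt compiles it into a fully pinned, annotated requirements.txt along these lines:

# requirements.txt (generated by pip-compile)
Django==1.9.7
scumbagpackage==1.0
xyzpackage==2.4           # via scumbagpackage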

Advantages

  • requirements.in is easy to maintain
  • requirements.txt has pinned versions so your virtualenv matches your collaborators and production
  • You automatically get patches and security fixes when you run make requirements.txt, and there are no surprises because it goes through your code review process

Tips

  • Try to loosely pin requirements in your requirements.in, though it doesn’t matter that much because you’ll catch any surprise when you see a major version change in requirements.txt.
  • Specifying an exact version in requirements.in is an anti-pattern; if you have to do it, document why. Often it’s because there’s a bug or a backwards-incompatible change, for example:
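(A hypothetical entry; the package name and reason are made up.)

# requirements.in
Django<1.10
# somepackage 2.0 has a backwards-incompatible API change; pin until we migrate
somepackage==1.4.2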

Makefile boilerplate

Here’s what a Makefile might contain:

help: ## Shows this help
	@echo "$$(grep -h '#\{2\}' $(MAKEFILE_LIST) | sed 's/: #\{2\} /	/' | column -t -s '	')"

install: ## Install requirements
	@[ -n "${VIRTUAL_ENV}" ] || (echo "ERROR: This should be run from a virtualenv" && exit 1)
	pip install -r requirements.txt

.PHONY: requirements.txt
requirements.txt: ## Regenerate requirements.txt
	pip-compile --upgrade --output-file $@ requirements.in
  • help: This is just a fast way of making your Makefile self-documenting.
  • install: Nowadays, you need Python and non-Python requirements. Putting it all in one make target makes it easier for developers to jump into a project.
  • PHONY: When you run make requirements.txt, you want it to run every time, not just when requirements.in changes, because new versions may have been uploaded to PyPI. I always group my .PHONY with its target. Even though it adds more lines, your Makefile will be more maintainable because you’re not trying to keep an off-screen list of phony targets up to date.
  • requirements.txt: Why make requirements.txt over make requirements? Because best practice dictates that if the output of a make target is a file, that file should also be the name of the target. That way, you can use the automatic variable $@ and it’s explicit, even at the cost of needing the PHONY.
  • --upgrade: Without this, pip-tools doesn’t actually upgrade your dependencies.
  • --output-file $@: pip-tools does this by default, but explicit is better than implicit. I would prefer to do pip-compile --upgrade requirements.in > $@ but pip-tools 1.6 does a poor job of dealing with stdout (see below).

Caveats

  • When you change requirements.in, you do have to remember to run make requirements.txt, but you could automate that with a git hook or CI process. In practice, we’ve found that just remembering to run it is fine.
  • pip-tools==1.6 does not work with the latest pip (8.1.2). See #358
  • pip-tools==1.6 has a poor understanding of how stdin and stdout are supposed to work. Hopefully this gets fixed soon but is only a minor annoyance. #362 #360 #353 #104
  • The compilation step can depend on your platform. I’ve only noticed this with ipython, which needs packages for interacting with the terminal like gnureadline. It hasn’t been trouble for us, but it could be for you. A workaround is to run the process in a Docker container.

Sample Scenarios

If you need more convincing, here are some problems this approach solves for us:

I thought I was getting xyzpackage version 3, why is version 2 getting installed? pip-tools flattens all your requirements and annotates which package specified what. So in requirements.txt, you’ll see xyzpackage==2.4    # via scumbagpackage and see that scumbagpackage was responsible.

What packages am I actually using? In a large project, your requirements.txt will balloon as you run into bugs and start pinning dependencies of dependencies. Then one day, you’ll realize you don’t know what packages you’re actually using. With a much simpler requirements.in, there’s less to sort through and fully pinned packages stick out like sore thumbs.

It works for me. Sometimes a project will work only for you. You check your installed versions against requirements.txt and they match. But what you didn’t realize is that a dependency of a dependency broke something. Since pip-tools freezes everything, you’ll have the same version of every package installed. And if something does break, you’ll have history to trace down what changed.

Finish Writing Me Plz Nerd

Apache Bench

For years, my tool for simple load tests of HTTP sites has been ApacheBench.

For years, my reference for how to visualize ApacheBench results has been Gnuplot.

For years, my reference for how to use Gnuplot has been http://www.bradlanders.com/2013/04/15/apache-bench-and-gnuplot-youre-probably-doing-it-wrong/

But do you really want to be writing Gnuplot syntax? It turns out that Pandas will give you great graphs pretty much for free:

import pandas as pd

# Load the TSV that ApacheBench writes with its gnuplot flag,
# e.g. `ab -n 1000 -c 10 -g gaussian.tsv http://...` (counts and URL illustrative)
df = pd.read_table('../gaussian.tsv')

# The raw data as a scatterplot
df.plot(x='seconds', y='wait', kind='scatter')

# The traditional Gnuplot plot
df.plot(y='wait')

# Histogram
df.wait.hist(bins=20)

[Figures: the scatterplot, the wait-time line plot, and the wait-time histogram]

You can see the full source code at tsv_processing.ipynb

And re-create these yourself by checking out the parent repo: github/crccheck/abba
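If you’d rather run this as a plain script than in a notebook, you can write the figures straight to files instead. A quick sketch (the filenames are arbitrary):

import matplotlib
matplotlib.use('Agg')  # render to files; no display needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_table('../gaussian.tsv')

ax = df.plot(x='seconds', y='wait', kind='scatter')
ax.get_figure().savefig('scatter.png')
plt.close('all')  # start the next plot on a fresh figure

ax = df.wait.hist(bins=20)
ax.get_figure().savefig('distribution.png')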

So now you might be thinking: How do you get a web server that outputs a normal distribution of lag? Well, I wrote one! I made a tiny Express.js server that just waits a random amount of time, packaged it in a Docker image, and you can see exactly how I ran these tests by checking out my Makefile.

Nerd

Django Nose without Django-Nose

[Image: Tycho Brahe]

I’ve grown to dislike Django-Nose. It’s been over three months since Django 1.8 was released, and they still don’t have a release that fully supports it. These are the advantages they currently tout:

  • Testing just your apps by default, not all the standard ones that happen to be in INSTALLED_APPS
    • The Django test runner has been doing this since 1.6 https://docs.djangoproject.com/en/1.8/releases/1.6/#discovery-of-tests-in-any-test-module
  • Running the tests in one or more specific modules (or apps, or classes, or folders, or just running a specific test)
    • They all can do this, even the old Django test runner
  • Obviating the need to import all your tests into tests/__init__.py. This not only saves busy-work but also eliminates the possibility of accidentally shadowing test classes.
    • The Django test runner has had this since 1.6
  • Taking advantage of all the useful nose plugins
    • There are some cool plugins
  • Fixture bundling, an optional feature which speeds up your fixture-based tests by a factor of 4
    • Ok, Django doesn’t have this, but you shouldn’t be using fixtures anyways and there are other ways to make fixtures faster
  • Reuse of previously created test DBs, cutting 10 seconds off startup time
    • Django can do this since 1.8 https://docs.djangoproject.com/en/1.8/releases/1.8/#tests
  • Hygienic TransactionTestCases, which can save you a DB flush per test
    • Django has had this since 1.6 https://docs.djangoproject.com/en/1.6/topics/testing/tools/#django.test.TransactionTestCase
  • Support for various databases. Tested with MySQL, PostgreSQL, and SQLite. Others should work as well.
    • Django has had this forever

So what if you need a certain nose plugin? Say, xunit for Jenkins or some other tooling? Well, you still have to use Nose because django-jux hasn’t been updated in 4 years.

Here’s a small script you can use that lets you use Django + Nose while skipping the problematic Django-nose:
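A minimal sketch of what such a runtests.py can look like, assuming nose and Django 1.8’s DiscoverRunner (this is illustrative, not the exact script):

# runtests.py -- let Django handle settings and test databases, let nose run the tests
import os
import sys

import django
import nose


def main():
    # Matches the invocation below; any DJANGO_SETTINGS_MODULE already in the environment wins
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'settings.test')
    django.setup()

    from django.conf import settings
    from django.test.utils import get_runner

    # Peel off --keepdb ourselves so Django's runner handles database reuse
    keepdb = '--keepdb' in sys.argv
    if keepdb:
        sys.argv.remove('--keepdb')
    runner = get_runner(settings)(keepdb=keepdb)
    runner.setup_test_environment()
    old_config = runner.setup_databases()
    try:
        passed = nose.run(argv=sys.argv)  # every other flag goes straight to nose
    finally:
        runner.teardown_databases(old_config)
        runner.teardown_test_environment()
    sys.exit(0 if passed else 1)


if __name__ == '__main__':
    main()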

Run it like you would Nose:

DJANGO_SETTINGS_MODULE=settings.test python runtests.py --with-xunit --with-cov

One choice I made is that I use Django 1.8’s --keepdb flag instead of the REUSE_DB environment variable, but you can see how to adapt it if you wanted it to feel more like Nose. Adapting the command above to reuse the database would look like:

DJANGO_SETTINGS_MODULE=settings.test python runtests.py --with-xunit --with-cov --keepdb

Meh Practices Nerd Patterns

Django management commands and verbosity

[Image: Ren and Stimpy]

[update: This post has been corrected, thanks to my commenters for your feedback]

Every Django management command gets the verbosity option for free. You may recognize this:

optional arguments:
  -h, --help            show this help message and exit
  -v {0,1,2,3}, --verbosity {0,1,2,3}
                        Verbosity level; 0=minimal output, 1=normal output,
                        2=verbose output, 3=very verbose output

We rarely use it because supporting it usually means lots of if statements scattered through our code. If you’re writing quick n’ dirty code, this may look familiar in your management commands:


if options.get('verbosity') == 3:
    print('hi')

In a recent Django project, I came up with a few lines of boilerplate to support the verbosity option, assuming you’re also using the logging library and not relying on print:


import logging

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    def handle(self, *args, **options):
        verbosity = options.get('verbosity')
        if verbosity == 0:
            logging.getLogger('my_command').setLevel(logging.WARN)
        elif verbosity == 1:  # default
            logging.getLogger('my_command').setLevel(logging.INFO)
        elif verbosity > 1:
            logging.getLogger('my_command').setLevel(logging.DEBUG)
        if verbosity > 2:
            logging.getLogger().setLevel(logging.DEBUG)

github.com/texas/tx_mixed_beverages/blob/master/mixed_beverages/apps/receipts/management/commands/geocode.py

So what does this do?

At the default verbosity, 1, I display INFO logging statements from my command. Increasing verbosity to 2, I also display DEBUG logs from my command. And going all the way to verbosity 3, I also enable all logging statements that reach the root logger.
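The rest of the command (and any module it calls) then just logs through the my_command logger and never checks verbosity itself. A hypothetical helper:

import logging

# assumes your LOGGING config routes this logger somewhere visible, e.g. the console
logger = logging.getLogger('my_command')


def do_work(item):
    logger.debug('processing %r', item)  # shows up at verbosity 2 and 3
    logger.info('processed %r', item)    # shows up at the default verbosity, 1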

Go forth and log!

Finish Writing Me Plz

Prometheus Contained

After becoming smitten with Graphite last year, I’m sorry to say I’ve become entranced by the new hotness: Prometheus. For a rundown of how the two compare, Prometheus’s docs do a good job. The docs overall aren’t bad, but there were a lot of gaps I had to fill, so I present my hello world guide for using Prometheus to get metrics on a host the Docker way. In addition to the official docs, I found Monitor Docker Containers with Prometheus to be useful. For the node exporter, discordianfish’s Prometheus Demo was valuable.

Start a Container Exporter container

This container runs an exporter that Prometheus can query for data about all the containers running on the host. It needs access to cgroups to get the data, and to the Docker socket to know which containers are running.

docker run -d --name container_explorer \
  -v /sys/fs/cgroup:/cgroup:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  prom/container-exporter

Start a Node Exporter container

This container uses the --net=host option so it can get metrics about the host network interface.

docker run -d --name node_exporter --net=host \
  prom/node-exporter

I was afraid that this would result in distorted stats because it’s in a container instead of the host, but after testing against a Node Exporter installed on the bare metal, it looks like it’s accurate.

Start a Prometheus container

This is the container that actually collects data. I’ve mounted my local prometheus.conf so Prometheus uses my configuration, and mounted a data volume so Prometheus data can persist between containers. There’s a link to the container exporter so Prometheus can collect metrics about containers, and an --add-host entry so this container can reach the node exporter’s metrics. Port 9090 is exposed because Prometheus needs to be publicly accessible from the Dashboard app. I’m not sure how to lock it down for security; I may add a referrer check since I don’t want to do IP rules or a VPN.

docker run -d --name $@_1 \
  -v ${PWD}/prom/prometheus.conf:/prometheus.conf:ro \
  -v ${PWD}/prom/data:/prometheus \
  --link container_explorer:container_explorer \
  --add-host=dockerhost:$(ip route | awk '/docker0/ { print $NF }') \
  -p 9090:9090 \
  prom/prometheus

Set up a Prometheus Dashboard database

Here, I’m running the rake db:setup task to set up a sqlite database. The image is my own pending pull request 386.

docker run \
  -e DATABASE_URL=sqlite3:/data/dashboard.sqlite3 \
  -v ${PWD}/prom/dashboard:/data:rw \
  crccheck/promdash ./bin/rake db:setup

Start a Prometheus Dashboard

Now that the dashboard has a database, you can start the Dashboard web app. You can access it on port 3000. You’ll need to create a server, then a site, then a dashboard, and finally, your first graph. Don’t be disappointed when your first graph doesn’t load. You may need to tweak the server settings until the graph preview from the Dashboard opens the right data in Prometheus.

docker run -d --name promdash \
  -e DATABASE_URL=sqlite3:/data/dashboard.sqlite3 \
  -v ${PWD}/prom/dashboard:/data:rw \
  -p 3000:3000 \
  crccheck/promdash

My almost unabridged setup

What’s missing from this gist is the config for my nginx layer in front of the dashboard, which is why there’s no exposed port here. To get started, all you have to do is put prometheus.conf in a prom sub-directory and run make promdash.

Mental Note: Add Category

Patterns: bootstrapping Python/SublimeText projects

I consider myself really lazy. So here’s my basic workflow of home-grown commands I go through whenever I work on a Python project.

Some notes before I dive in:

  • replace $PROJECT with the name of your project. I keep a consistent name between projects so I can use the directory name for other things.
  • some tedious things I alias, some I don’t
# everything has to live somewhere
md $PROJECT
# equivalent to: mkdir $PROJECT && cd $PROJECT
# my alias: md () { mkdir -p "$@" && cd "$@"; }

# Using virtualenvwrapper, make a virtualenv the same name as the project
mkvirtualenv $PROJECT

# These next steps happen automatically in my global postactivate
# Give this virtualenv its own ipython configuration so it has its own ipython history
mkipythonconfig
# change my tab title to the name of this virtualenv so I can tell my tabs apart
echo -e "\033];$(basename $VIRTUAL_ENV)\007"
# my alias: tit() { echo -e "\033];$@\007"; }

# This makes it so when I `workon $PROJECT`, I automatically cd to it.
# Sometimes, this is undesirable, but `cd -` will take you back.
setvirtualenvproject
# likewise I have a function named `work` that activates or creates the virtualenv:
function work {
  # assumes `mkvirtualenv` exists
  local env_name=$(basename $PWD)
  workon $env_name
  if [ $? -ne 0 ]; then
    echo "Shall I create $env_name for you? [Yn]"
    read -n 1 sure
    if [ -z "$sure" ] || [ "$sure" = 'y' ] || [ "$sure" = 'Y' ]; then
      mkvirtualenv $env_name
      setvirtualenvproject
    fi
  fi
}

# create a sublime text 2 project with a SublimeJEDI configuration
mksublconfig > ~/Documents/$PROJECT.sublime-project
# source: https://github.com/crccheck/dotfiles/blob/master/bin/mksublconfig

# Create an .env file to store configuration like DJANGO_SETTINGS_MODULE
# Works in conjunction with autoenv.
touch .env

# If I'm creating a new project, I create an empty initial commit so I can
# squash and rebase things from the beginning.
git init
git initempty
#  initempty = !git init && git commit -m 'initial commit (empty)' --allow-empty
Best Practices Patterns

Patterns: don’t mess up the prod db!

With 12factor style environment configs, it’s very easy to accidentally connect to your production database when you think you’re connecting to dev. Here’s a simple guard you can add to make sure your DATABASE_URL isn’t somehow pointed someplace it’s not supposed to be (assuming your production database lives on Amazon AWS):

bash:

if [[ ${DATABASE_URL} == *"amazonaws"* ]]; then exit -1; fi

Python:

import os

if 'amazonaws' in os.environ['DATABASE_URL']:
    exit('Cannot be run against a production database')

Django:

if 'amazonaws' in settings.DATABASES['default']['HOST']:
    raise CommandError('Cannot be run against a production database')

(thanks to x110dc for the bash syntax and 12factor link)

[Image: "You didn't say the magic word"]

Mental Note: Add Category

TBT: Doom mooD

[Image: Doom mooD]

I posted about this a long time ago, but I was cleaning my hard drive and found this screenshot and I need to brag about how awesome Doom mooD was:

  1. The face animated
  2. The face got bloodier the higher the CPU usage got
  3. The expression changed depending on how quickly CPU usage changed
  4. Doomguy’s face turns to look at your mouse pointer when it enters the widget
  5. A testament to the original id design, this little bar packs a ton of information
  6. iddqd
[Animation: demo of the Doom mooD face]

2006 me was cleverer than I remember. I was mindful of overhead, made my own crude bitmap font, and was doing some pretty clever JavaScript for 2006.

I even found my original README:

This widget was born from boredom. I saw a Doom screenshot and thought
to myself that the status bar of Doom would make an entertaining
system monitor, and so I made it in Konfabulator.

I never thought it would be all that useful, but I must say, I surprise
even myself. Turns out I use this widget ALL the time. And watching
the face is quite entertaining. So entertaining in fact I made a
special face-only mode for the widget. The face mode is excellent
because you can put the face anywhere and he blends in. And a quick
glance tells you how your system is doing. (HINT: blood = bad).

I must warn you, one side effect of using this widget is that I
started to stress my system on purpose, just to get the guy to make
cooler, bloodier, faces.

== Usage Tips ==
To the right of the memory indicator, there are three indicators.
The first one shows if you have a CD or DVD inserted. Clicking will
open it.
The second one will show you if your volume is muted.
The third one will give info about the Trash/Recycle Bin. Clicking
will open it.

== Weapon Slots ==
These are the indicators labelled '2' through '7' directly left of
the face. In Doom mooD, they are used as quick launchers.

To add an entry, simply drag and drop what you want to launch onto
a number. The widget will remember this entry between sessions.

To clear an entry, right click and select 'Clear'

To launch an entry, right click and select 'Launch'
or while the widget is focused, type the number corresponding to entry.

The entry can be anything. A link, a program, a document, a picture, etc.

== Side Meters ==
On the far right, there are four meters. To set these, go into
preferences and check the "Side Meters" tab. The DISK meter will not
show anything until you set a drive for it to monitor. To do this,
right click on the DISK text and select the drive from the context
menu. If you do not see drives in the context menu reposition the
mouse and try again.

Note: The Virtual Memory meter isn't very accurate.

== How To Read the Face ==
Ok, so you know the more beat up and bloody he gets, the worse off
your system is. The blood corresponds to CPU load and memory load.
When he grimaces in pain, your CPU load just went up. There are two
levels of this. And finally, when he's grinning, your CPU load just
went down.

It's fun to enable face-mode, and put the face in the middle of an
analog clock face, or interacting with something on your desktop, etc.
You get the idea.

== Performance Concerns ==
Since I use a lot of custom fonts, Konfabulator uses more memory
than you'd think. For each digit, K! loads a copy of the entire font
table. I might be able to reduce it further by breaking up the images
into separate files, but I'm satisfied with the status quo. Scaling
the graphics larger for the bar exponentially increases the number
of page faults generated.

The graphics are copyright 1993 ID Software, and used with permission.

Changelog

1.4.1 (9.10.2006)
+ fixed zOrder of new custom hovertips
+ turned off debug mode
* updated icon

1.4 (7.23.2006)
+ can Empty Trash from widget
+ new custom hovertips show even when widget is not focused

1.3 (1.29.2006)
+ Updates split up to be more efficient, some meters can be disabled now.
  (fixes problems people had with high utilization when accessing the disk)
* Digit packaging changed, should very slightly reduce pagefaults and
  memory usage, but increases package size.
+ New weird face shows up less often.
- fixed bug where volume didn't change scale.
+ widget now takes a 5 second nap when computer wakes up

1.2 (1.5.2006)
+ New Option: Reverse CPU/Memory. When enabled, CPU/Memory count down from 100,
  akin to health going from 100 to 0
- fixed bug that kept some faces from appearing
+ Weapon Slots now active. See documentation above to see how to use them.
* minor optimizations
+ Tooltips added to side meters
* Changed free space calculation method to Round from Floor

1.1 (12.27.2006)
+ The four meters on the right are now determined by preferences
+ Added WiFi side meter option
+ Disk meter has to be set by the user now (right click on "DISK")
- Redid some code because of the 3.0 Engine 

1.0.1
- bugfix

1.0
- First Public Release

  TODO:
  reduce pagefaults
  battery usage statistics