Tag Archives: django

Nerd

Django Nose without Django-Nose

Tycho Brahe

I’ve grown to dislike Django-Nose. It’s been over three months since Django 1.8 has been released and they still don’t have a release that fully supports it. These are the advantages they currently tout:

  • Testing just your apps by default, not all the standard ones that happen to be in INSTALLED_APPS
    • The Django test runner has been doing this since 1.6 https://docs.djangoproject.com/en/1.8/releases/1.6/#discovery-of-tests-in-any-test-module
  • Running the tests in one or more specific modules (or apps, or classes, or folders, or just running a specific test)
    • They all can do this, even the old Django test runner
  • Obviating the need to import all your tests into tests/__init__.py. This not only saves busy-work but also eliminates the possibility of accidentally shadowing test classes.
    • The Django test runner has this since 1.6
  • Taking advantage of all the useful nose plugins
    • There are some cool plugins
  • Fixture bundling, an optional feature which speeds up your fixture-based tests by a factor of 4
    • Ok, Django doesn’t have this, but you shouldn’t be using fixtures anyways and there are other ways to make fixtures faster
  • Reuse of previously created test DBs, cutting 10 seconds off startup time
    • Django can do this since 1.8 https://docs.djangoproject.com/en/1.8/releases/1.8/#tests
  • Hygienic TransactionTestCases, which can save you a DB flush per test
    • Django has had this since 1.6 https://docs.djangoproject.com/en/1.6/topics/testing/tools/#django.test.TransactionTestCase
  • Support for various databases. Tested with MySQL, PostgreSQL, and SQLite. Others should work as well.
    • Django has had this forever

So what if you need a certain nose plugin? Say, xunit for Jenkins or some other tooling? Well, you still have to use Nose because django-jux hasn’t been updated in 4 years.

Here’s a small script you can use that lets you use Django + Nose while skipping the problematic Django-nose:

Run it like you would Nose:

DJANGO_SETTING_MODULE=settings.test python runtests.py --with-xunit --with-cov

One choice I made is that I use Django 1.8’s --keepdb flag instead of the REUSE_DB environment variable, but you can see how to adapt it if you wanted it to feel more like Nose. Adapting the command above to reuse the database would look like:

DJANGO_SETTING_MODULE=settings.test python runtests.py --with-xunit --with-cov --keepdb

Meh Practices Nerd Patterns

Django management commands and verbosity

Ren and Stimpy

[update: This post has been corrected, thanks to my commenters for your feedback]

Every Django management command gets the verbosity option for free. You may recognize this:

optional arguments:
  -h, --help            show this help message and exit
  -v {0,1,2,3}, --verbosity {0,1,2,3}
                        Verbosity level; 0=minimal output, 1=normal output,
                        2=verbose output, 3=very verbose output

We rarely use it because doing so usually means lots of if statements scattered through our code to support this. If you’re writing quick n’ dirty code, this may look familiar in your management commands:


if options.get('verbosity') == 3:
    print('hi')

In a recent Django project, I came up with a few lines of boilerplate to support the verbosity option, assuming you’re using also the logging library and not relying on print:


import logging
class Command(BaseCommand):
    def handle(self, *args, **options):
        verbosity = options.get('verbosity')
        if verbosity == 0:
            logging.getLogger('my_command').setLevel(logging.WARN)
        elif verbosity == 1:  # default
            logging.getLogger('my_command').setLevel(logging.INFO)
        elif verbosity > 1:
            logging.getLogger('my_command').setLevel(logging.DEBUG)
        if verbosity > 2:
            logging.getLogger().setLevel(logging.DEBUG)

github.com/texas/tx_mixed_beverages/blob/master/mixed_beverages/apps/receipts/management/commands/geocode.py

So what does this do?

At the default verbosity, 1, I display INFO logging statements from my command. Increasing verbosity to 2, I also display DEBUG logs from my command. And going all the way to verbosity 3, I also enable all logging statements that reach the root logger.

Go forth and log!

Best Practices Patterns

Patterns: don’t mess up the prod db!

With 12factor style environment configs, it’s a very easy to accidentally connect to your production database when you think you’re connecting to dev. Here’s a simple guard you can add to make sure your  DATABASE_URL isn’t somehow pointed to someplace it’s not supposed to (assuming you’re using Amazon AWS):

bash:

if [[ ${DATABASE_URL} == *"amazonaws"* ]]; then exit -1; fi

Python:

if 'amazonaws' in os.environ['DATABASE_URL']:
   exit('Cannot be run against a production database')

Django:

if 'amazonaws' in settings.DATABASES['default']['HOST']:
    raise CommandError('Cannot be run against a production database')

(thanks to x110dc for the bash syntax and 12factor link)

You didn't say the magic word

Neato! Nerd

Check yo queries before you wreck yo site

When building high performance Django sites, keeping the number of queries down is essential. And just like controlling technical debt and maintaining test coverage, being successful means making monitoring queries a natural part of your workflow. If your momentum has to be stopped to examine database queries, you’re not going to do it. The solution for most developers is Django-Debug Toolbar, which sits off to the side in your web browser, but I’m going to share the way I do it.

I do like Django-Debug Toolbar, but most of the time, it just gets in my way: it only works on pages that return HTML, the overhead adds to the response time, and it modifies the response (adding to the response size and render time). What I prefer is displaying SQL queries along HTTP requests in runserver’s output:

Terminal Output with Colorizing Output and SQL

There’s three parts to getting this to work:

  1. ColorizingStreamHandler — this lets me distinguish http requests (gray/info) from SQL queries (blue/debug)
  2. ReadableSqlFilter — this reformats output by stripping the SELECT arguments so you can focus on the WHERE clauses
  3. Opt-in — having SQL spat out everywhere can be distracting, so it’s opt-in with an environment variable

Getting started

ColorizingStreamHandler and ReadbleSqlFilter are a logging handler and a logging filter packaged in project_runpy. Add it to your Django project with pip install project_runpy (no installed apps changes needed). They get thrown into your Django logging configuration like:

[python]LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'root': {
        'level': os.environ.get('LOGGING_LEVEL', 'WARNING'),
        'handlers': ['console'],
    },
    'filters': {
        'require_debug_false': {
            '()': 'django.utils.log.RequireDebugFalse',
        },
        'require_debug_true': {
            '()': 'django.utils.log.RequireDebugTrue',
        },
        'readable_sql': {
            '()': 'project_runpy.ReadableSqlFilter',
        },
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'project_runpy.ColorizingStreamHandler',
        },
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG' if env.get('SQL', False) else 'INFO',
            'handlers': ['console'],
            'filters': ['require_debug_true', 'readable_sql'],
            'propagate': False,
        },
        'factory': {
            'level': 'ERROR',
            'propagate': False,
        },
    }
}
[/python]

From: https://github.com/crccheck/crccheck-django-boilerplate/blob/master/project/project_name/settings.py#L84

I found that the logging can get obtrusive. It’ll show up in your runserver, your shell, when you run scripts, and even in your iPython notebooks. So using a short environment variable makes it easy to flip on and off. I’m also using project_runpy’s  env environment variable getter, but if you don’t want that, I suggest using  if 'SQL' in os.environ to avoid how  '0' and  'False' evaluate to  True when reading environment variables.

I was afraid that adding ReadableSqlFilter would add a performance penalty, but I don’t notice any. The extra verbosity is also nice when you have a view with several expensive queries because you can see them run before the page is rendered.

How to use this

Just code as usual! If you’re like me, you keep at least part of your runserver terminal visible. If you are me, hello! Use your peripheral vision to look for long flashes of blue text. When you do, you know you’ve hit something that’s making too many queries. The main culprits will be missing select_relateds and prefetch_relateds. Eventually, you’ll develop a sixth sense for when your querysets could be better, and then you’ll look back at your old code like this:

that-queryset-ain't-right

Continuing

A warning: don’t think too much about the query time reported. You can’t compare database queries done locally with what happens in production. One thing we did confirm is 10 one second queries are much slower than one 10 second query, even in the same AWS availability zone. Something that isn’t as obvious when everything is local. Just pay attention to the number of queries.

Finally, if you really want to keep your database hits low as you continue to develop, document them in your tests. Django makes an assertion available called  assertNumQueries that you can throw in your tests just to document how many queries an operation takes. They don’t even have to be good; for example, I wrote these scraper tests to document that my code currently makes too many database queries. It’s similar to making sure your views return 200s. Make sure you know how many queries you’re getting yourself into.

Too Many Queries

Best Practices Finish Writing Me Plz Nerd

Managing Technical Debt in Django Projects

This is fine

I’ve been thinking about this subject a lot, and I’ve been meaning to write something. Rather than procrastinate until I have a lot of material, I’m going to just continuously edit this post as I discover things. Many of these principles aren’t specific to Django, but most of this experience comes from dealing with Django.

Some of these tips don’t cost any time, but some involve investing extra time to do things differently. It’s in the name of saving time in the long run. If you’re writing a few lines of JavaScript that’s going to be thrown away in a day, then you shouldn’t waste time building up a castle of tests around it. Managing technical debt is a design tradeoff. You’re sacrificing some agility and features for developer happiness.

Don’t reuse app names and model names

You can have Django apps with the same name. And you can have models with the same name, but your life will be easier if you can reference a model without having to think about which app it came from. This makes it easier to understand the code; makes it easier to to use tools (grep) to analyze and search makes it easier to use shell_plus, which automatically loads every model for you in the shell.

Leave XXX comments

You should leave yourself task comments in your code, and you should have three levels (like critical, error, warning or high, medium, low) for the priority. I commonly use  FIXME for problems,  TODO for todos,  DELETEME for things that really should be deleted,  DEPRECATED for things I really ought to look at later, and  XXX for documenting code smells and anti-patterns. Some examples:

  • FIXME this will break if user is anonymous
  • TODO  make this work in IE9
  • DELETEME This code block is impossible to reach anyways
  • DEPRECATED use new_thing.method instead of this
  • WISHLIST make this work in IE8
  • XXX magic constant
  • FIXME DELETEME this needs csrf

The comment should be on the same line, so when you grep TODO, you’ll be able to quickly scan what kind of todos you have. This is what other tools like Jenkin’s Task Scanner expect too. Many people say you shouldn’t add TODO comments to your code. You should just do them. In practice, that is not practical, and leads to huge diffs that are hard to review.

There is such a thing has too many comments. For example, instead of writing a comment to explain a poorly named variable, you should just rename the variable.

Naming Things

One of Phil Karlton’s famous two hard things, finding the write names can make a huge difference. With Django code, I find that I’m happiest when I’m following the same naming conventions as the Django core. You should be familiar with the concepts of Hungarian notation and coding without comments (remember the previous paragraph?).

Single letter variables are almost always a bad idea, except with simple code. They’re un-greppable and except for the following, have no meaning:

  • i (prefer idx) — a counter
  • x — When iterating through a loop
  • k, v — Key/Value when looping through dict items
  • na count/total/summation (think traditional for-loop)
  • a, b — When looking at two elements, like in a reduce function (very rare)

An exception is math, where x and y, and m and n, etc. are commonplace.

If the name of your variable implies a type, it should be that type. You would not believe how common the name of the variable lies.

When writing code, it should read close to English. Getter functions should begin with “get_”. Booleans and functions that return booleans commonly begin with “is_” (though anything that is readable and obvious will work).

Write tests

My current philosophy is “if you liked it then you should have put a test on it”. The worst part of technical debt is accidentally breaking things. To its credit, Django is the best framework I’ve used for testing. Unit tests are good for TDD, but functional tests are probably better for managing technical debt because they verify the output of your system for various inputs. Doing TDD, getting 100% coverage, and taking into account edge cases… never happens in practice. That does not mean you shouldn’t try. Adding tests to preventing regressions is the second best thing you can do. And the best thing you can do is to write those tests to begin with.

Get coverage

Running coverage is commonly done at the same time as tests. I skip the html reports and use  coverage report from the command line to get faster feedback. When you have good coverage, you can have higher confidence in your tests.

Be cognizant of when you’re creating technical debt

Of course, every line of code is technical debt, but I’ve started adding a “Technical Debt Note” too all pull requests. This is inspired by how legislation will have a fiscal note to assess how much it would cost. Bills can get shot down because they cost to much for what they promise. Features should be the same. Hopefully, you’re already catching these before you even write code, and you’re writing small pull requests. If we find that a pull requests increases technical debt to an unreasonable degree, we revise the requirements and the code.

Clean as you cook

Most people dread cooking because of the mountain of mess that has to get cleaned after the meal. But if you can master cleaning as you cook, there’s a much more reasonable and manageable mess. As you experiment with code, don’t leave behind unused code clean up inconsistencies as you go. Don’t worry about deleting something that might be useful or breaking something obscure. That’s what source control and tests are there for. Plan ahead for the full life cycle. That means if you’re experimenting with a concept, don’t stop when it works: update everything else to be consistent across the project. If it didn’t work, tear it down and kill it. Don’t get into a situation where you have to support two (or three or four or more) different paradigms.

The Boy Scout rule

“Always leave the campground leaner then you found it”. Feel free to break paradigm that a pull request should only do one thing. If you happen to clean something while working on a feature, there’s no shame in saying you took a little diversion.

Make it easy for others to jump in

Projects with a complicated bootstrapping processes are probably also difficult to maintain. Wipe your virtualenv once in a while. Wipe your database. If it’s painful, and you have to do it often enough, you’ll make it better. Code that doesn’t get touched often has the most technical debt.

tetris game over

Educate your organization that they can’t just ask for a parade of features

This problem fixes itself, one way or another. Either you keep building features and technical debt until you’re buried and everything comes to a standstill and you yell at each other, or you find a way to balance adding features.

Prevent Dependency Spaghetti

Just like how you should try to avoid spaghetti code, having a lot of third party apps that try to pull in dependencies will come back to bite you later on.

  1. Specify requirements with ==, not >=. Not every package uses semantic versioning. And using semver does not guarantee that a minor or bugfix release won’t break something.
  2. Don’t specify requirements of requirements. This is to avoid an explosion of requirements to keep track of.
  3. There’s no easy way to know when it’s safe to delete a requirement. Even if you have good test coverage, your test environment is not the same as production. For example, you can safely delete psycopg2 and still run your tests, but have fun trying to connect to your PostgreSQL database in production.

Don’t support multiple paths

If you’re writing a library consumed by many people, supporting get_queryset and get_query_set is a good idea. For yourself, only support one thing. If you have an internal library that’s used by multiple parts of the code base and you want add functionality while preserving backwards compatibility, you can write a compatibility layer, but then you should update it all within the same pull request. Or at least create an issue to clean it up. Supporting multiple code paths is technical debt.

Avoid Customizing the Admin

The moment you start writing customizations for the admin, you’ve now pinned yourself to whatever the admin happened to be doing that version. Unless you’re running automated browser tests to verify their functionality, you’re setting yourself up for things to break in the future. The Django admin always changes in major ways every version, and admin customizations always have weak test coverage.

Do Code Review and Peer Programming

Code review makes sure that more than one person’s input goes into a feature, and peer programming takes that even further. It helps make sure that crazy functionality and hard to read code doesn’t get into the main code base that others will then have to maintain. If you’re a team of one, do pull requests anyways. You’ll be amazed at all the mistakes and inconsistencies you’ll find when you view your feature all at once. Even better: sleep on your own pull requests so you can see them in a new light.

Don’t Write Unreadable Code

Code review and peer programming are supposed to keep you from writing unmaintainable code. We’ve embraced linters so that we can write code in the same style, coverage so we know when we need to write tests, but what if we could automatically know when we were writing complicated code? We can, using radon or PyMetric‘s McCabe’s Cyclomatic Complexity metric.

Additional Resources

  • http://youtu.be/SaHbtEeu37M Docker and DevOps by Gene Kim, Dockercon ’14
  • Inheriting a Sloppy Codebase by Casey Kinsey, DjangoCon US ’14

Special thanks to Noah S.

Case Study Nerd

How to Ignore the Needle Docs

At PyCon 2014, I learned about a package called “needle” from Julien Phalip’s talk, Advanced techniques for Web functional testing. When I tried using it with a Django project, I immediately ran into problems:

  1. The needle docs aren’t written for Django, so they don’t explain how to use NeedleTestCase with LiveServerTestCase.
  2. I wasn’t using nose as my test runner, and didn’t want to start using it just to run Needle.

The first problem turned out to be easy; use both:

class SiteTest(NeedleTestCase, LiveServerTestCase):
    pass

The second problem wasn’t that bad either. If you examine the Nose plugin Needle adds, it just adds a  save_baseline attribute to the test case.

There were a lot of random hacks and tweaks I threw together. I think the best way to show them all is with an annotated example:

import os
import unittest

from django.test import LiveServerTestCase
from needle.cases import NeedleTestCase


# This is a configuration variable for whether to save the baseline screenshot
# or not. You can flip it by manually changing it, with an environment variable
# check, or monkey patching.
SAVE_BASELINE = False


# You should be taking screenshots at multiple widths to capture all your
# responsive breakpoints. Only the width really matter,s but I include the
# height for completeness.
SIZES = (
    (1024, 800),  # desktop
    (800, 600),  # tablet
    (320, 540),  # mobile
)


# To keep the test runner from running slow needle tests every time, decorate
# it. In this example, 'RUN_NEEDLE_TESTS' has to exist in your environment for
# these tests to run. So you would run needle tests like:
#
#     RUN_NEEDLE_TESTS=1 python manage.py test python.import.path.to.test_needle
@unittest.skipUnless('RUN_NEEDLE_TESTS' in os.environ, 'expensive tests')
class ScreenshotTest(NeedleTestCase, LiveServerTestCase):
    # You're going to want to make sure your pages look consistent every time.
    fixtures = ['needle.json']

    @classmethod
    def setUpClass(cls):
        """
        Sets `save_baseline`.

        I don't remember why I did it here. Maybe the timing didn't work when
        I put it as an attribute on the test class.
        """
        cls.save_baseline = SAVE_BASELINE
        super(ScreenshotTest, cls).setUpClass()

    def assertResponsive(self, scope, name):
        """Takes a screenshot for every responsive size you set."""
        for width, height in SIZES:
            self.set_viewport_size(width=width, height=height)
            try:
                self.assertScreenshot(
                    scope,
                    # include the name and browser in the filename
                    '{}_{}_firefox'.format(name, width)
                )
            except AssertionError as e:
                print(e)
                # suppress the error so needle keeps making screenshots. Needle
                # is very fickle and we'll have to judge the screenshots by eye
                # anyways instead of relying on needle's pixel perfect
                # judgements.
                pass

    def test_homepage(self):
        urls_to_test = (
            ('/', 'homepage'),
            ('/login/', 'login'),
            ('/hamburger/', 'meat'),
            ('/fries/', 'potatoes'),
            ('/admin/', 'admin'),
        )
        for url, name in urls_to_test:
            self.driver.get(self.live_server_url + url)
            self.assertResponsive(
                # for now, I always want the full page, so I use 'html' as the
                # scope for my screenshots. But as I document more things,
                # that's likely to change.
                'html',
                # passing in a human readable name helps it generate
                # screenshots file names that make more sense to me.
                name,
            )

Well I hope that that made sense.

When you run the tests, they’re saved to the ./screenshots/ directory, which I keep out of source control because storing so many binary files is a heavy burden on git. We experimented with  git-annex but it turned out to be more trouble than it was worth.

My typical workflow goes like this:

  1. Make sure my reference baseline screenshots are up to date: git checkout master && grunt && invoke needle --make
  2. Generate screenshots for my feature branch: git checkout fat-buttons && grunt && invoke needle
  3. Open the screenshots directory and compare screenshots.

In that workflow,  grunt is used to generate css,  invoke is used as my test runner, and  --make is a flag I built into the needle invoke task to make baseline screenshots.

Now I can quickly see if a change has the desired effect for multiple browser widths faster than it takes to actually resize a browser window. Bonus: I can see if a change has undesired effects on pages that I would have been too lazy to test manually.

Next steps: I still haven’t figured out how to run the same test in multiple browsers.

Nerd

Using Travis CI with your GeoDjango project

travis fail

I’m usually very hesitant to use PostGIS because it makes testing more difficult. Instead of an in-memory sqlite database, you have to spin up and down a database instead. I decided to do it anyways, and then I forgot I had Travis CI on this project and every commit emailed me a reminder of my failure.

Luckily, a cursory search uncovered that Travis CI has support for PostGIS. And even luckier for me, it wasn’t that hard to do. I started with this blog post, and ended up with this as my .travis.yml:

language: python
addons:
  postgresql: "9.3"
env:
  PYTHONPATH='.'
install: "pip install -r requirements-dev.txt"
before_script:
  - psql -U postgres -c 'CREATE DATABASE tx_lobbying;'
  - psql -U postgres -c "CREATE EXTENSION postgis" -d tx_lobbying
  - psql -U postgres -c "CREATE EXTENSION postgis_topology" -d tx_lobbying
script: make test

The postgresql: "9.3" line wasn’t even necessary, but according to their docs, Travis CI uses Postgres 9.1 by default, and I’m using 9.3. The only other change I had to make was tangential: I still had my settings set to use a SQLite database by default, but I should have made it default to a PostGIS instead.

I feel like I should mention more about what I did in my settings.py since I’m writing this specifically about GeoDjango, but I don’t think the changes I made really had much to do getting it to run of Travis CI.

According to the Testing GeoDjango docs, the Django test runner would have created a test_tx_lobbying database based off a  template_postgis Postgres template, but I short-circuited that logic on Travis CI and ended up with a solution that did not involve messing with Postgres templates.

Now that I’ve done it once, and it wasn’t horrible, I plan on doing more things with GeoDjango that I’ve hesitated to do before. Now to find a cheap way to host toy PostGIS projects.

Reference: https://github.com/texastribune/tx_lobbying/commit/b7119580e4cd4b6c61fc992cc5a4fc6222c6483e

Case Study Nerd

Dissecting Elevators Part 8: deploying

If you examine the repo, you’ll see it’s a Django app; and the Procfile and requirements.txt would make you think the app itself was deployed on Heroku, but you would be wrong!

You may be surprised to find out that the Elevators Explorer is a static HTML app. Or maybe not if you read the previous seven parts. This was inspired by NPR Apps’s app-templates project and general malaise dealing with servers. At The Texas Tribune, we’ve talked about turning our data apps into static files; the general reasons being:

  1. Data apps are generally not dynamic. Making a server render the same output over and over is a waste of resources and takes away resources from rendering truly dynamic content.
  2. Static sites do not go down. When they do go down, you’ve got big problems.
  3. They don’t need fancy infrastructure. No database, no caching, no app server, no media server. They are the easiest kind of website to deploy.

So how do you turn a site served from a fancy framework like Django into a static site? Let’s start by looking at what I actually do in my terminal:

  1. I make sure my local dev server is running. For now that looks like python manage.py runserver 0.0.0.0:8000.
  2. In another window, I run make site; make upload. I can’t do make site upload right now because I need to handle a non-zero exit status in  make site.

make site

And now looking at the make site command in my Makefile, here’s the abridged version of that command:

site:
    cd site && wget -r localhost:8000 --force-html -e robots=off -nH -nv --max-redirect 0

And the detailed breakdown of that command:

  • cd site: this is the working directory I want the files downloaded to.
  • wget: this is the command that does the actual work. “GNU Wget is a free utility for non-interactive download of files from the Web.”
  • r: recursive
  • localhost:8000: the host to download files from
  • --force-html: Without this, wget would not interpret the django trailing-slash urls as html
  • -e robots=off: Not really necessary, but does prevent a 404 request to robots.txt
  • -nH: Disable generation of host-prefixed directories so the file paths are cleaner
  • -nv: Turn off verbose without being completely quiet, but no so quiet that I wouldn’t see errors
  • --max-redirect 0: Right now I’m using OpenDNS, and when I do hit a 404, this keeps me from downloading OpenDNS’s stupid search page.

If you’re interested in wget, I highly recommend skimming the wget manual to learn about other cool options and checking out commandlinefu’s wget recipes.

Levar Burton enjoys reading UNIX Man pages, so should you!

Out of the box, wget will do a very job of getting everything needed to build a Django site, as long as you aren’t using AJAX to pull things in. But the Elevator Explorer does use AJAX. To trick wget into downloading these, I added hints to the templates:

<script src="{{ STATIC_URL }}tx_elevators/js/search.js"></script>
<a class="prefetch-hint" href="/chart/search/data.json" rel="nofollow" style="display: none;"></a>

In the future, I think I’ll refine this by putting the anchor directly before the script tag, switch to using the {% url %} templatetag, and then you can get at that url with something like this jQuery pseudocode:

<a href="{% url 'my_data_source' %}"></a>
<script>
  var url = $('script:last').prev().attr('href'); $.ajax(url, ...);
</script>

This will hopefully be a decent solution the common problem of trying to use named Django urls in JavaScript. The next problem I discovered is that I needed the json I sent to look like a file to wget so it wouldn’t mangle it to index.html (breaking the requests inside the static site). I just changed the url patterns to end in “.json$” instead of “/$” in 36f276.

Another idea I might try is using the <link> tag, but I’d have to make sure wget still downloaded the resources, and you can only put them in <head>.

make upload

The next part of the process is to upload the site directory someplace. I chose to use S3 and its ability to host a static site. A regular webhost would also work, and might even work better if you had the ability to use rsync instead of what I had to do. But let’s continue and go into what exactly I did, starting with the make command:

upload:
    python $(PROJECT)/manage.py sync_s3 --dir site --gzip

The sync_s3 command here is a basic fork of django-extensions’s  sync_media_s3 command. I only made one tweak to it to so it’ll gzip json and html too. If you don’t need that, you can use the original sync_media_s3 command. To set up the credentials for S3, you can either read django-extensions’s docs or just try it. The error messages will tell you exactly what you need to fix whatever doesn’t work. I will point out that  the source directory is set by the --dir flag, and the destination S3 bucket is controlled by the general configuration.

Performance

We know the performance of a static website is awesome, but the trade-off is it’s slow to do the work up front of generating a static website from a dynamic one. You may have noticed the terrible timing figures documented in the makefile. It takes over half an hour to crawl the site, and over three hours to upload to S3. I fiddle a few knobs in a futile effort to speed things up: turning off debug and using gunicorn, but they didn’t help. There are some other ideas I pondered for improving performance:

  • It would be possible to write a django specific spider instead of wget that could read the url patterns. This is what app-template does for flask if you look at its fab render task.
  • I could make the process multi-threaded. There are some alternatives to wget, but wget is good at what it does and is ubiquitous. I’d rather speed up the upload process. Github user billyvg did some work on making sync_media_s3 multithreaded in issue #161.

Other ideas that are probably bad ideas or not worth the effort:

  • Combine the download/upload steps
  • Prioritize pages so important pages are updated first

And for comparison, using rsync is so much faster it seems silly not to use it:

time rsync -avz site/ remote:elevators
sent 45200666 bytes  received 561995 bytes  92356.53 bytes/sec
total size is 123778877  speedup is 2.70

real    8m14.409s

Conclusion

For staging, I did deploy to Heroku. But I didn’t want to pay to keep a database online (this app exceeds the free database limit of 10,000 rows) and I didn’t know how much traffic I could serve from the free tier. The static site worked really well, except for the 4 hour deploys. I think it’s possible to get it to 2 hours, which is great for a 25,000 page site.