
Case Study Nerd

Dissecting Elevators Part 8: deploying

If you examine the repo, you’ll see it’s a Django app, and the Procfile and requirements.txt might make you think the app itself was deployed on Heroku. But you would be wrong!

You may be surprised to find out that the Elevators Explorer is a static HTML app. Or maybe not, if you’ve read the previous seven parts. This was inspired by NPR Apps’ app-template project and a general malaise from dealing with servers. At The Texas Tribune, we’ve talked about turning our data apps into static files, for these general reasons:

  1. Data apps are generally not dynamic. Making a server render the same output over and over wastes resources that could go to rendering truly dynamic content.
  2. Static sites do not go down. When they do go down, you’ve got big problems.
  3. They don’t need fancy infrastructure. No database, no caching, no app server, no media server. They are the easiest kind of website to deploy.

So how do you turn a site served from a fancy framework like Django into a static site? Let’s start by looking at what I actually do in my terminal:

  1. I make sure my local dev server is running. For now that looks like python manage.py runserver 0.0.0.0:8000.
  2. In another window, I run make site; make upload. I can’t do make site upload right now because make site exits with a non-zero status that I still need to handle.
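One way to handle that exit status is to tolerate only the failures you expect. The sketch below assumes the non-zero status comes from a 404 somewhere in the crawl, which the wget manual reports as exit status 8 (“Server issued an error response”). The crawl and main helpers, and wrapping wget in a Python script at all, are hypothetical and not part of the repo.

```python
import subprocess
import sys

# Assumption: the only expected failure is a 404 during the crawl.
# wget documents exit status 8 as "Server issued an error response",
# so 0 and 8 count as success; anything else is a real failure.
TOLERATED_CODES = frozenset({0, 8})

# The abridged wget invocation from the `site` target, run from inside site/.
WGET_CMD = (
    "wget", "-r", "localhost:8000", "--force-html", "-e", "robots=off",
    "-nH", "-nv", "--max-redirect", "0",
)


def crawl(cmd=WGET_CMD, cwd="site", tolerated=TOLERATED_CODES):
    """Run the crawl; return 0 if its exit status is tolerable, else the status."""
    status = subprocess.run(cmd, cwd=cwd).returncode
    return 0 if status in tolerated else status


def main():
    """Entry point for a hypothetical crawl script that `make site` could call."""
    sys.exit(crawl())
```

A hypothetical crawl.py could call main() from its `if __name__ == "__main__":` block, and the site target could run that script instead of calling wget directly. Make can also ignore a failing recipe line if you prefix it with `-`, but that would swallow genuine failures (say, the dev server not running, which wget reports as status 4) along with the 404s.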

make site

Looking at the site target in my Makefile, here’s an abridged version of the command:

site:
    cd site && wget -r localhost:8000 --force-html -e robots=off -nH -nv --max-redirect 0

And the detailed breakdown of that command:

  • cd site: this is the working directory I want the files downloaded to.
  • wget: this is the command that does the actual work. “GNU Wget is a free utility for non-interactive download of files from the Web.”
  • -r: recursive
  • localhost:8000: the host to download files from
  • --force-html: Without this, wget would not interpret Django’s trailing-slash URLs as HTML
  • -e robots=off: Not really necessary, but does prevent a 404 request to robots.txt
  • -nH: Disable generation of host-prefixed directories so the file paths are cleaner
  • -nv: Turn off verbose output without being completely quiet, so I still see errors
  • --max-redirect 0: Right now I’m using OpenDNS, and when I do hit a 404, this keeps me from downloading OpenDNS’s stupid search page.

If you’re interested in wget, I highly recommend skimming the wget manual to learn about other cool options and checking out commandlinefu’s wget recipes.

LeVar Burton enjoys reading UNIX man pages, and so should you!

Out of the box, wget does a very good job of getting everything needed to build a Django site, as long as you aren’t using AJAX to pull things in. But the Elevator Explorer does use AJAX. To trick wget into downloading these resources, I added hints to the templates:

<script src="{{ STATIC_URL }}tx_elevators/js/search.js"></script>
<a class="prefetch-hint" href="/chart/search/data.json" rel="nofollow" style="display: none;"></a>
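wget doesn’t apply CSS, so a hint anchor hidden with display: none is just another link in the markup. To check which data URLs a rendered page actually exposes as hints, here’s a minimal sketch using Python’s standard html.parser. The prefetch-hint class comes from the template above; PrefetchHintParser and find_prefetch_hints are hypothetical helpers.

```python
from html.parser import HTMLParser
from typing import List


class PrefetchHintParser(HTMLParser):
    """Collect the href of every <a> element carrying the prefetch-hint class."""

    def __init__(self):
        super().__init__()
        self.hints: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Attributes without a value come through as None, so default to "".
        classes = (attr_map.get("class") or "").split()
        if "prefetch-hint" in classes and attr_map.get("href"):
            self.hints.append(attr_map["href"])


def find_prefetch_hints(html: str) -> List[str]:
    """Return hint URLs in document order."""
    parser = PrefetchHintParser()
    parser.feed(html)
    parser.close()
    return parser.hints
```

Running it over a page fetched from the dev server is a quick way to confirm every AJAX endpoint has a matching hint before starting a long crawl.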

In the future, I think I’ll refine this by putting the anchor directly before the script tag and switching to the {% url %} template tag. Then you can get at that URL with something like this jQuery snippet:

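<a class="prefetch-hint" href="{% url 'my_data_source' %}" rel="nofollow" style="display: none;"></a>
<script>
  // While this inline script runs, it is the last <script> parsed so far,
  // so the element just before it is the hint anchor holding the data URL.
  var url = $('script:last').prev().attr('href');
  $.getJSON(url, function (data) {
    // Render the chart from data here.
  });
</script>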

This will hopefully be a decent solution to the common problem of trying to use named Django URLs in JavaScript. The next problem I discovered was that the JSON responses needed to look like files to wget; otherwise it would save them as index.html, breaking the requests inside the static site. I changed the URL patterns to end in “.json$” instead of “/$” in commit 36f276.
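Why the extension matters: wget names each saved file after the last segment of the URL path, and a trailing-slash URL doesn’t have one. This sketch models that default naming rule only; it is a simplification of wget’s behavior, wget_local_path is a hypothetical helper, and the /chart/search/data/ URL is a hypothetical trailing-slash form of the endpoint.

```python
from urllib.parse import urlsplit


def wget_local_path(url: str) -> str:
    """Approximate where `wget -r -nH` saves a crawled URL.

    Simplified model (an assumption, not wget's full logic): query strings,
    --adjust-extension and Content-Disposition are ignored.
    """
    path = urlsplit(url).path.lstrip("/")
    # A URL ending in "/" has no file name, so wget falls back to index.html
    # inside a directory named after the path.
    if path == "" or path.endswith("/"):
        return path + "index.html"
    return path
```

Under this model, a trailing-slash JSON endpoint is saved as an index.html file, while a URL ending in .json keeps its name and extension.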

Another idea I might try is using the <link> tag, but I’d have to make sure wget still downloaded the resources, and you can only put them in <head>.

make upload

The next part of the process is to upload the site directory someplace. I chose to use S3 and its ability to host a static site. A regular webhost would also work, and might even work better if you had the ability to use rsync instead of what I had to do. But let’s continue and go into what exactly I did, starting with the make command:

upload:
    python $(PROJECT)/manage.py sync_s3 --dir site --gzip

The sync_s3 command here is a basic fork of django-extensions’s sync_media_s3 command. I made only one tweak: it’ll gzip JSON and HTML too. If you don’t need that, you can use the original sync_media_s3 command. To set up the S3 credentials, you can either read django-extensions’s docs or just try it; the error messages will tell you exactly what you need to fix. I will point out that the source directory is set by the --dir flag, and the destination S3 bucket is controlled by the general configuration.
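The gzip tweak comes down to compressing text files before upload and storing them with a Content-Encoding: gzip header, because S3 serves objects byte-for-byte and won’t compress them on the fly. Here’s a minimal sketch of that per-file decision. It is not the sync_s3 code: prepare_upload and the GZIP_CONTENT_TYPES set are hypothetical, and the actual S3 call (via boto or similar) is left out.

```python
from __future__ import annotations

import gzip
import mimetypes

# Hypothetical set of content types worth compressing: CSS/JS plus HTML and JSON.
GZIP_CONTENT_TYPES = {
    "text/css",
    "application/javascript",
    "text/javascript",
    "text/html",
    "application/json",
}


def prepare_upload(filename: str, body: bytes) -> tuple[bytes, dict[str, str]]:
    """Return the bytes to upload and the headers to store with the object."""
    content_type, _encoding = mimetypes.guess_type(filename)
    content_type = content_type or "application/octet-stream"
    headers = {"Content-Type": content_type}
    if content_type in GZIP_CONTENT_TYPES:
        # The stored object is the compressed bytes; the header tells
        # browsers to decompress them transparently.
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return body, headers
```

Images and other already-compressed formats fall through untouched, since gzipping them costs CPU without saving much space.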

Performance

We know the performance of a static website is awesome, but the trade-off is that generating a static website from a dynamic one up front is slow. You may have noticed the terrible timing figures documented in the Makefile: it takes over half an hour to crawl the site and over three hours to upload to S3. I fiddled with a few knobs in a futile effort to speed things up, turning off debug and using gunicorn, but they didn’t help. Here are some other ideas I pondered for improving performance:

  • It would be possible to write a Django-specific spider instead of wget that could read the URL patterns. This is what app-template does for Flask if you look at its fab render task.
  • I could make the process multithreaded. There are some alternatives to wget, but wget is good at what it does and is ubiquitous, so I’d rather speed up the upload process. GitHub user billyvg did some work on making sync_media_s3 multithreaded in issue #161.
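A Django-specific spider would skip HTTP crawling entirely: enumerate the site’s URLs, render each one in-process, and write the result to disk. Here’s a framework-agnostic sketch of the write-out half, under stated assumptions. render_static_site is hypothetical, the list of paths is assumed to be known already (enumerating it from the URL patterns is the hard part and isn’t shown), and the render callable stands in for something like Django’s test client. It is not app-template’s render task.

```python
import os
import tempfile
from typing import Callable, Iterable, List


def render_static_site(
    paths: Iterable[str], render: Callable[[str], bytes], out_dir: str
) -> List[str]:
    """Render each known URL path straight to a file under out_dir.

    `render` is an assumption standing in for a framework test client, e.g.
    lambda p: django.test.Client().get(p).content in a configured project.
    Returns the list of files written.
    """
    written = []
    for path in paths:
        rel = path.lstrip("/")
        # Mirror the crawler's layout: directory-style URLs become index.html.
        if rel == "" or rel.endswith("/"):
            rel += "index.html"
        dest = os.path.join(out_dir, *rel.split("/"))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "wb") as fh:
            fh.write(render(path))
        written.append(dest)
    return written


if __name__ == "__main__":
    # Hypothetical pages; a real spider would enumerate them from URL patterns.
    pages = {"/": b"<h1>Elevators</h1>", "/chart/search/data.json": b"[]"}
    with tempfile.TemporaryDirectory() as tmp:
        for filename in render_static_site(pages, pages.__getitem__, tmp):
            print(os.path.relpath(filename, tmp))
```

Because nothing goes over HTTP, there’s no link discovery to trick, so the prefetch hints and the .json URL workaround would become unnecessary for this step.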
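Threading the upload is promising because each S3 PUT spends most of its time waiting on the network. This is not the code from issue #161; it’s a minimal sketch with concurrent.futures in which collect_files, parallel_upload, and the upload_one callback are hypothetical, and the actual S3 client call is left to the caller.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Tuple


def collect_files(root: str) -> List[Tuple[str, str]]:
    """Pair every file under root with the S3-style key it would upload to."""
    jobs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            local = os.path.join(dirpath, name)
            key = os.path.relpath(local, root).replace(os.sep, "/")
            jobs.append((local, key))
    return jobs


def parallel_upload(
    jobs: Iterable[Tuple[str, str]],
    upload_one: Callable[[str, str], None],
    workers: int = 16,  # arbitrary starting point; tune against S3 throttling
) -> List[str]:
    """Run upload_one(local_path, key) for each job on a thread pool.

    Uploads are network-bound, so threads overlap the waiting despite the GIL.
    Returns the uploaded keys, sorted; a failed upload's exception is re-raised.
    """
    uploaded = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(upload_one, local, key): key for local, key in jobs}
        for future in as_completed(futures):
            future.result()  # propagate any exception raised in the worker
            uploaded.append(futures[future])
    return sorted(uploaded)
```

Usage would look like parallel_upload(collect_files("site"), my_put), where my_put is whatever function performs a single S3 upload.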

Other ideas that are probably bad ideas or not worth the effort:

  • Combine the download/upload steps
  • Prioritize pages so important pages are updated first

And for comparison, using rsync is so much faster it seems silly not to use it:

time rsync -avz site/ remote:elevators
sent 45200666 bytes  received 561995 bytes  92356.53 bytes/sec
total size is 123778877  speedup is 2.70

real    8m14.409s
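The “speedup” rsync reports is the total size of the files divided by the bytes that actually crossed the wire, and its bytes/sec figure is roughly that same traffic divided by the elapsed time. Plugging in the numbers above reproduces both, and the implied duration lines up with the 8m14s wall-clock time:

```latex
\[
\text{speedup}
  = \frac{\text{total size}}{\text{bytes sent} + \text{bytes received}}
  = \frac{123{,}778{,}877}{45{,}200{,}666 + 561{,}995}
  = \frac{123{,}778{,}877}{45{,}762{,}661}
  \approx 2.70
\]
\[
\frac{45{,}762{,}661\ \text{bytes}}{92{,}356.53\ \text{bytes/s}}
  \approx 495.5\ \text{s}
  \approx 8\ \text{min}\ 15\ \text{s}
\]
```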

Conclusion

For staging, I did deploy to Heroku. But I didn’t want to pay to keep a database online (this app exceeds the free database limit of 10,000 rows), and I didn’t know how much traffic I could serve from the free tier. The static site worked really well, except for the 4-hour deploys. I think it’s possible to get that down to 2 hours, which is great for a 25,000-page site.