{"id":308,"date":"2013-04-14T22:51:37","date_gmt":"2013-04-15T04:51:37","guid":{"rendered":"http:\/\/www.crccheck.com\/blog\/?p=308"},"modified":"2015-10-11T22:23:22","modified_gmt":"2015-10-12T04:23:22","slug":"dissecting-elevators-part-5-nosql-circa-1986","status":"publish","type":"post","link":"https:\/\/www.crccheck.com\/blog\/dissecting-elevators-part-5-nosql-circa-1986\/","title":{"rendered":"Dissecting Elevators Part 5: nosql circa 1986"},"content":{"rendered":"<blockquote><p><em>Intro<\/em>: This is part five of an eight-part series looking at the <a href=\"http:\/\/elevators.texastribune.org\/\" target=\"_blank\">Elevator Explorer<\/a>, a fun data interactive mostly coded between the hours of 10 PM and 2 AM during the week leading up to April Fools&#8217; Day, 2013. I&#8217;m going to be looking at the things I learned, things I wish I could have done, and the reasoning behind my design choices. The code I&#8217;ll be referring to will be in this <a href=\"https:\/\/github.com\/texastribune\/tx_elevators\/tree\/2013-april-fools\">tagged release on GitHub<\/a>.<\/p><\/blockquote>\n<p>I knew I wanted to geocode all the addresses for the buildings, but I didn&#8217;t quite know how my models would look. I knew from past experience that doing a pass of geocoding, then resetting the database, would mean I would have to start geocoding again from square one. How could I make this better?<\/p>\n<p>If only I had a wrapper around <a href=\"https:\/\/github.com\/geopy\/geopy\">geopy<\/a> that would persist old queries to disk. So I started writing one. At first, I thought I would need to do this in sqlite, but after doing a search for &#8220;python+key+value+store&#8221;, I found <a href=\"http:\/\/docs.python.org\/2\/library\/anydbm.html\">anydbm<\/a>. What is anydbm? Anydbm is a generic interface to any dbm database. What a name. In my case, it was using <a href=\"https:\/\/en.wikipedia.org\/wiki\/Berkeley_DB\">Berkeley DB<\/a>. 
It&#8217;s really easy to use: 1) open a file 2) treat it like a dict. Way easier than trying to get a sqlite database going. But my database kept getting corrupted! I finally figured out that I needed to open and close the file for every transaction. Since the anydbm library is pretty dated and I couldn&#8217;t use it like a context manager, I had to manually close the file.<\/p>\n<p>My working version of the GoogleV3 geocoder looks <a href=\"https:\/\/github.com\/texastribune\/tx_elevators\/blob\/2013-april-fools\/geopydb\/geocoders.py\">like this<\/a>. I also made a script for dumping my existing geo data back to an anydbm database; that&#8217;s <a href=\"https:\/\/github.com\/texastribune\/tx_elevators\/blob\/2013-april-fools\/tx_elevators\/scripts\/dump_geopydb.py\">viewable here<\/a>.<\/p>\n<p>So after all that, I ended up with a library that mimicked the GoogleV3 geocoder. To use it, instead of the standard syntax of:<\/p>\n<pre><code>&gt;&gt;&gt; from geopy import geocoders\r\n&gt;&gt;&gt; g = geocoders.GoogleV3()\r\n&gt;&gt;&gt; place, (lat, lng) = g.geocode(\"10900 Euclid Ave in Cleveland\")\r\n&gt;&gt;&gt; print \"%s: %.5f, %.5f\" % (place, lat, lng)\r\n10900 Euclid Ave, Cleveland, OH 44106, USA: 41.50489, -81.61027\r\n<\/code><\/pre>\n<p>my database cached version of that is:<\/p>\n<pre><code>&gt;&gt;&gt; from geopydb import geocoders\r\n&gt;&gt;&gt; g = geocoders.GoogleV3()\r\n&gt;&gt;&gt; place, (lat, lng) = g.geocode(\"10900 Euclid Ave in Cleveland\")\r\n&gt;&gt;&gt; print \"%s: %.5f, %.5f\" % (place, lat, lng)\r\n10900 Euclid Ave, Cleveland, OH 44106, USA: 41.50489, -81.61027\r\n<\/code><\/pre>\n<p>Pretty convenient, and made my life easier. You may have noticed I&#8217;m not using GeoDjango. That&#8217;s because I wanted to deploy to the free tier at Heroku.<\/p>\n<h2>Improvements<\/h2>\n<p>If I had to write this now, I would switch to using <a href=\"https:\/\/dataset.readthedocs.org\/en\/latest\/\" target=\"_blank\">dataset<\/a>. 
Dataset came out around the same time as the Elevator Explorer. If it had been out a week earlier, I could have used it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Intro: This is part five of an eight-part series looking at the Elevator Explorer, a fun data interactive mostly coded between the hours of 10 PM and 2 AM during the week leading up to April Fools&#8217; Day, 2013. I&#8217;m going to be looking at the things I learned, things I wish I could&hellip;<\/p>\n <a href=\"https:\/\/www.crccheck.com\/blog\/dissecting-elevators-part-5-nosql-circa-1986\/\" title=\"Dissecting Elevators Part 5: nosql circa 1986\" class=\"entry-more-link\"><span>Read More<\/span> <span class=\"screen-reader-text\">Dissecting Elevators Part 5: nosql circa 1986<\/span><\/a>","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"Layout":"","footnotes":""},"categories":[21,4],"tags":[55,48],"class_list":["entry","author-showmewhatyougot","post-308","post","type-post","status-publish","format-standard","category-case-study","category-technical","tag-nosql","tag-python"],"_links":{"self":[{"href":"https:\/\/www.crccheck.com\/blog\/wp-json\/wp\/v2\/posts\/308","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.crccheck.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.crccheck.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.crccheck.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.crccheck.com\/blog\/wp-json\/wp\/v2\/comments?post=308"}],"version-history":[{"count":11,"href":"https:\/\/www.crccheck.com\/blog\/wp-json\/wp\/v2\/posts\/308\/revisions"}],"predecessor-version":[{"id":747,"href":"https:\/\/www.crccheck.com\/blog\/wp-json\/wp\/v2\/posts\/308\/revisions\/747"}],"wp:attachment":[{"href":"https:\/\/www.crccheck.com\/blog\/wp-json\/wp\/v2
\/media?parent=308"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.crccheck.com\/blog\/wp-json\/wp\/v2\/categories?post=308"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.crccheck.com\/blog\/wp-json\/wp\/v2\/tags?post=308"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}