Sunday, March 22, 2015

Worldwide Airport Distribution

Feeling motivated after reading the Let's Make a Map tutorial by Mike Bostock, I decided to look for something to put on the world map. Thanks to the the data collected on OurAirports.com, I have something to drop on the map.
The final result is on http://cheow-thianliang.github.io/airports.html

The following are screenshots of Worldwide Airport Distribution
Worldwide Airport Distribution

Show case the feature of hiding unwanted category of airport and display of airport name when mouse over 

These are the steps taken to map the worldwide airport distribution:

  1. Read and understand Let's Make a Map tutorial by Mike Bostock.
  2. Download  airports.csv from OurAirports.com
  3. Create GeoJSON file for world map (Generate subunits.json as shown by Mike without the where clause)
  4. Convert airports.csv into GeoJSON using Python script
  5. Merge and convert those GeoJSON file created in step 3 and 4 into topojson (Refer to Mike's blog)
  6. Create airports.html with Javascript code to render the result in 5.

References:
  • Let's Make a Map tutorial by Mike Bostock
    http://bost.ocks.org/mike/map/
  • Airport data
    http://ourairports.com/data/
  • GeoJSON specification
    http://geojson.org/geojson-spec.html
  • Using d3-tip to add tooltips by Caged
    http://bl.ocks.org/Caged/6476579

Sunday, March 1, 2015

Migrating Django database from one vendor to another

Overview
Database migration involves
  • Recreating database schema (Database objects like tables, constraints, triggers, ...)
  • Copying of data

All code in this article are tested on Django 1.7.
Recreating database schema
Database objects for Django can be easily recreated in new database by using the migrate command targeting the new db, 

In case the migrate command does not work due to migrations generated using makemigrations command, we can attempt to delete all migrations folders and re-run the migrate command. Please remember to backup the source code before deleting any migrations folder (This will not be a concern if a proper source control has been setup).

Copying of data
There are at least three ways to copy data for Django across database of different vendor
  1. Use dumpdata and loaddata command
  2. Use custom Django script
  3. Use an external tool
1. Use dumpdata and loaddata command
This approach is provided by Django through the dumpdata and loaddata command. However, it might not be efficient when dealing with large database.

2. Use custom Django script
This solution originates from http://macrotoma.blogspot.com/2012/10/solved-move-django-from-sqlite-to-mysql.html
The original idea is to use ContentType model in Django to retrieve all models and copy the data to another db using the multiple databases feature in Django (https://docs.djangoproject.com/en/1.7/topics/db/multi-db/).
With slight modification as following, we are able to copy data to another database with a custom Django script
  1. Use apps.get_app_configs() to retrieve all models instead of using ContentType.
  2. Sort models using sort_dependencies in dumpdata command
  3. Add method to copy tables in ManyToManyFields (The value for primary key might be different, refer to the source code for explanation)
Custom Django script (move_data.py)

To use the script make sure that
  1. The default database connection in the Django settings is pointing to the source database (where the data to be migrated is stored).
  2. A new connection named new_db is pointing to the target database (where the data will be saved).
Example:


3. Use an external tool
The hard way ...
Most database comes with a client application that allows data to be dump in the form of INSERT statement. We can dump all data from source database in the form of INSERT statement and execute it against target database. 
Modification to the generated INSERT statements might be needed.
For example, the INSERT statement generated for django_admin_log table from Postgresql contains action_time '2014-11-05 07:23:07.826945+00', in order to execute this statement against Oracle db, we might need to change it to TO_TIMESTAMP_TZ('2014-11-05 07:23:07.826945+00',  'YYYY-MM-DD HH24.MI:SS.FF TZH:TZM')
Example:

With more advanced client ...
Some database vendors provide client application with feature to copy data from different sources. For example:
  1. SQL Server Import Export Wizard
  2. Oracle SQL Developer (the Migration feature)
Anyway, some form of data massaging similar to previously mentioned might be needed.
If none of the approach works, consider using an ETL (Extract, Transform and Load) tool.

Remarks:
For database vendor like Oracle and Postgresql that uses sequence to support auto-increment number, sequence must be reset to be more than its used value (Auto-increment number is used mostly a surrogate key, thus duplicate value will result in key conflict).

References: