Sunday, March 22, 2015

Worldwide Airport Distribution

Feeling motivated after reading the Let's Make a Map tutorial by Mike Bostock, I decided to look for something to put on the world map. Thanks to the the data collected on OurAirports.com, I have something to drop on the map.
The final result is on http://cheow-thianliang.github.io/airports.html

The following are screenshots of Worldwide Airport Distribution
Worldwide Airport Distribution

Show case the feature of hiding unwanted category of airport and display of airport name when mouse over 

These are the steps taken to map the worldwide airport distribution:

  1. Read and understand Let's Make a Map tutorial by Mike Bostock.
  2. Download  airports.csv from OurAirports.com
  3. Create GeoJSON file for world map (Generate subunits.json as shown by Mike without the where clause)
  4. Convert airports.csv into GeoJSON using Python script
  5. Merge and convert those GeoJSON file created in step 3 and 4 into topojson (Refer to Mike's blog)
  6. Create airports.html with Javascript code to render the result in 5.

References:
  • Let's Make a Map tutorial by Mike Bostock
    http://bost.ocks.org/mike/map/
  • Airport data
    http://ourairports.com/data/
  • GeoJSON specification
    http://geojson.org/geojson-spec.html
  • Using d3-tip to add tooltips by Caged
    http://bl.ocks.org/Caged/6476579

Sunday, March 1, 2015

Migrating Django database from one vendor to another

Overview
Database migration involves
  • Recreating database schema (Database objects like tables, constraints, triggers, ...)
  • Copying of data

All code in this article are tested on Django 1.7.
Recreating database schema
Database objects for Django can be easily recreated in new database by using the migrate command targeting the new db, 

In case the migrate command does not work due to migrations generated using makemigrations command, we can attempt to delete all migrations folders and re-run the migrate command. Please remember to backup the source code before deleting any migrations folder (This will not be a concern if a proper source control has been setup).

Copying of data
There are at least three ways to copy data for Django across database of different vendor
  1. Use dumpdata and loaddata command
  2. Use custom Django script
  3. Use an external tool
1. Use dumpdata and loaddata command
This approach is provided by Django through the dumpdata and loaddata command. However, it might not be efficient when dealing with large database.

2. Use custom Django script
This solution originates from http://macrotoma.blogspot.com/2012/10/solved-move-django-from-sqlite-to-mysql.html
The original idea is to use ContentType model in Django to retrieve all models and copy the data to another db using the multiple databases feature in Django (https://docs.djangoproject.com/en/1.7/topics/db/multi-db/).
With slight modification as following, we are able to copy data to another database with a custom Django script
  1. Use apps.get_app_configs() to retrieve all models instead of using ContentType.
  2. Sort models using sort_dependencies in dumpdata command
  3. Add method to copy tables in ManyToManyFields (The value for primary key might be different, refer to the source code for explanation)
Custom Django script (move_data.py)

To use the script make sure that
  1. The default database connection in the Django settings is pointing to the source database (where the data to be migrated is stored).
  2. A new connection named new_db is pointing to the target database (where the data will be saved).
Example:


3. Use an external tool
The hard way ...
Most database comes with a client application that allows data to be dump in the form of INSERT statement. We can dump all data from source database in the form of INSERT statement and execute it against target database. 
Modification to the generated INSERT statements might be needed.
For example, the INSERT statement generated for django_admin_log table from Postgresql contains action_time '2014-11-05 07:23:07.826945+00', in order to execute this statement against Oracle db, we might need to change it to TO_TIMESTAMP_TZ('2014-11-05 07:23:07.826945+00',  'YYYY-MM-DD HH24.MI:SS.FF TZH:TZM')
Example:

With more advanced client ...
Some database vendors provide client application with feature to copy data from different sources. For example:
  1. SQL Server Import Export Wizard
  2. Oracle SQL Developer (the Migration feature)
Anyway, some form of data massaging similar to previously mentioned might be needed.
If none of the approach works, consider using an ETL (Extract, Transform and Load) tool.

Remarks:
For database vendor like Oracle and Postgresql that uses sequence to support auto-increment number, sequence must be reset to be more than its used value (Auto-increment number is used mostly a surrogate key, thus duplicate value will result in key conflict).

References:

Wednesday, February 11, 2015

Connecting Django to Oracle database using service name

These are steps for connecting Django to Oracle database using service name. It is tested with Django 1.7 running on Ubuntu 14.04.
  1. Ensure that cx_oracle is installed (Please see the References section for details).
  2. Locate/ create file tnsnames.ora
    In case Oracle DB or client is installed, you can try to locate the file under ORACLE_HOME/network/admin/
  3. Add an entry for the target host and service name to tnsnames.ora
    Example:
  4. Ensure that TNS_ADMIN environment variable is set to the folder containing tnsnames.ora
  5. Change the database connection in Django settings file to
  6. Now, you are ready to connect to Oracle db.
References:
  • http://codeforaliving.blogspot.com/2015/02/installing-cxoracle-on-ubuntu_73.html
  • http://www.orafaq.com/wiki/Tnsnames.ora
  • https://docs.djangoproject.com/en/1.7/ref/databases/ (Please see the section related to Oracle)

Installing cx_oracle on Ubuntu

The following instructions were tested on Ubuntu 14.04. The main challenge is installing Oracle Instant Client. It is assumed that no Oracle DB or client were previously installed. Otherwise, you can install cx_oracle with pip straight away.
  1. Download oracle instant client both basic and sdk zipped packages from http://www.oracle.com/technetwork/database/features/instant-client/index-097480.html. Example:
    instantclient-basic-linux.x64-12.1.0.2.0.zip
    instantclient-sdk-linux.x64-12.1.0.2.0.zip
  2. Create folder /opt/oracle
    sudo mkdir -p /opt/oracle
  3. Unzip both the basic and sdk packages to /opt/oracle
    sudo unzip instantclient-basic-linux.x64-12.1.0.2.0.zip -d /opt/oracle
    sudo unzip instantclient-sdk-linux.x64-12.1.0.2.0.zip -d /opt/oracle
  4. Set ORACLE_HOME and LD_LIBRARY_PATH to unzipped folder. You can choose to
    • Use export command
      export ORACLE_HOME=/opt/oracle/instantclient_12_1
      export LD_LIBRARY_PATH=/opt/oracle/instantclient_12_1
    • Put the variables into /etc/environment (You may need to log out from the shell and log in again for the settings to take effect)
      ORACLE_HOME=/opt/oracle/instantclient_12_1
      LD_LIBRARY_PATH=/opt/oracle/instantclient_12_1
  5. Now you can use pip to install cx_oracle
    pip install cx_oracle
References: