Drupal 7: DKAN Distribution


David G - Drupal

Working in higher education, I sometimes have clients who want to release a set of data to the public. These collections of data are typically known as datasets. Example datasets include radar data, fish populations near coastal oil rigs, political election results, and approval ratings. Usually I simply provide a download link for the data, which the client supplies as a CSV or another supported file format. Since I’ve been doing a lot of this type of work lately, I looked into Drupal modules and configurations that would help with these kinds of requests, and I discovered the Drupal DKAN distribution maintained by NuCivic.

To quote the NuCivic DKAN website:

DKAN is a open source open data platform with a full suite of cataloging, publishing and visualization features that allows organizations to easily share data with the public.

So NuCivic provides a collection of pre-configured modules that let you create a website to publicly share your open data. The DKAN project is based on the CKAN project, and as far as I can tell both are accepted by the U.S. government as ways to openly share and exchange datasets. Put another way, CKAN is the original open data platform, written and maintained using Python and Postgres, while DKAN implements the same ideas atop the Drupal platform and its community modules.

DKAN offers a wealth of options out of the box, such as:

  • User Groups
  • Support for custom datasets and meta data associated with those datasets
  • Support for multiple formats of a dataset, such as CSV, JSON, XML, and OPML.
  • Support for organization of your datasets into DataStores and categorizing all that data into Groups.
  • Support for previewing and visualizing datasets using charts and interactive widgets and searching. Support for really large uploaded datasets.
  • Faceted Searches through datasets
  • Fine-grained, out-of-the-box directions on almost every form you’ll use to upload and manage your data.
  • The distribution provides a basic Drupal theme which you’re free to alter to your custom needs.
  • A public API that makes your data reachable from other web services.
  • NuCivic maintains a documentation site which contains an overview of the services, workflows and possibilities afforded by using DKAN.

Whew, that’s a lot of options! Let’s take a look at some example websites that use DKAN. The NuCivic people maintain a list of sites powered by DKAN. One adopter of DKAN for openly sharing data is the city of Oakland in California, USA (just a short drive up from where I live on the Central Coast :D).

City of Oakland open data website.


As you can see on their homepage, they categorize their datasets into Groups. Within the DKAN distribution this capability is provided by the great Organic Groups module. The DataStores on the website are associated with Groups the city can define freely, such as Housing, Education, and Infrastructure.

If we examine the Groups the city of Oakland makes browsable, we get an overview of how many datasets each category currently has, along with a brief description of the category:

Grouped datasets available for the city of Oakland.


If I drill down into, say, the Demographics group of datasets, I am presented with a user-friendly page that lists the various datasets and their format(s), and that provides a wealth of search functionality across the metadata associated with the datasets and group.

Dataset details of the Demographics group of data.
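
The filtering and counting behind a listing page like this can be sketched in a few lines of Python. This is only an illustration of the idea of faceted browsing, not DKAN’s own implementation: the `Dataset` record, the `CATALOG` entries, and both helper functions are hypothetical names made up for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical catalog record: a title, one group, and its formats."""
    title: str
    group: str
    formats: tuple


# Made-up entries for illustration; not real records from any site.
CATALOG = [
    Dataset("Social Services By Tract", "Demographics", ("csv", "json")),
    Dataset("Population By Age", "Demographics", ("csv",)),
    Dataset("School Enrollment", "Education", ("csv", "xml")),
]


def filter_datasets(catalog, group=None, fmt=None):
    """Keep datasets matching the selected group and/or format facet."""
    return [
        d for d in catalog
        if (group is None or d.group == group)
        and (fmt is None or fmt in d.formats)
    ]


def facet_counts(datasets):
    """Count how many datasets fall under each group and each format."""
    groups = Counter(d.group for d in datasets)
    formats = Counter(f for d in datasets for f in d.formats)
    return {"group": dict(groups), "format": dict(formats)}


demographics = filter_datasets(CATALOG, group="Demographics")
print([d.title for d in demographics])
print(facet_counts(demographics))
```

Selecting a facet narrows the list, and the counts are recomputed over whatever remains, which is why the numbers next to each facet change as you drill down.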


Lastly, if I drill down even further into the Demographics -> Social Services set of data, select the 2013 Social Services By Tract dataset, and view the CSV version of the data, I’m presented with a searchable table that previews 50 rows of the dataset at a time. This is a great interactive tool for browsing the data before downloading it myself.

Example of searchable table of previously uploaded CSV data. This data is transferred to a MySQL table for efficiency.
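
To make the idea concrete, here is a minimal sketch of loading CSV data into a database table and previewing a filtered slice of it. It is not how the DKAN Datastore module is written: it uses Python and an in-memory SQLite database in place of Drupal and MySQL, and the sample rows, the table name, and the `load_csv_into_table` and `preview` helpers are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = """tract,service,households
4001,Food Assistance,120
4002,Child Care,85
4003,Food Assistance,240
"""


def load_csv_into_table(conn, table, csv_text):
    """Create `table` from the CSV header and insert every row as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Quote column names so headers with spaces or keywords still work.
    columns = ", ".join(
        '"{}" TEXT'.format(name.replace('"', '""')) for name in header
    )
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    return header


def preview(conn, table, column, term, limit=50):
    """Return at most `limit` rows whose `column` contains `term`."""
    sql = 'SELECT * FROM "{}" WHERE "{}" LIKE ? LIMIT ?'.format(table, column)
    return conn.execute(sql, ("%" + term + "%", limit)).fetchall()


conn = sqlite3.connect(":memory:")
load_csv_into_table(conn, "social_services", SAMPLE_CSV)
rows = preview(conn, "social_services", "service", "Food")
print(rows)
```

Once the rows live in a table, searching and paging are ordinary queries, so the preview never has to re-read the whole file. The table and column names here come from trusted code; a real importer would need to validate them before building SQL from them.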


I think all this great plumbing and infrastructure provided by the DKAN distribution is simply awesome! So why am I discussing this here? Components of the DKAN distribution make good use of Features in Drupal 7, so parts of a DKAN-powered site can be reused in websites outside of a DKAN installation. For example, as a developer I can grab their datastore module and apply it to my own website and project (even if it’s not an open data project). If I really need it, I could even grab their dataset module, which helps to group data into sets.

I intend to try using sub-modules of the DKAN project to store presidential approval ratings and election data in the Presidency project I have been mentioning in various blog posts here. I find the ability to upload a CSV file of data and turn it into a Drupal-aware database table really neat! This is provided by the DKAN Datastore module.

I hope this bit of research and exposure to this great Drupal distribution has shed light on some of the complex workflows and features it’s possible to create using Drupal!


Author Spotlight

David Gurba

I am a web programmer currently employed at UCSB. I have been developing web applications professionally for 8+ years now. For the last 5 years I’ve been actively developing websites primarily in PHP using Drupal. I have experience using LAMP and developing data driven websites for clients in aviation, higher education and e-commerce. If you’d like to contact me I can be reached at david.gurba@arvixe.com
