Drupal 7: Dkan_Datastore module, create SQL tables from file data

David G - DrupalThe DKAN_Datastore module is a submodule of the DKAN distribution by Nucivic. It allows you to take files uploaded to drupal as file fields and create an SQL table which Drupal is aware of using the Data module, and supports Views integration. I’ll give a brief overview of a usecase and example usage outside of DKAN in this blog post.

As an example. I have alot of presdential approval rating per-day that accumulate over time. Barak Obama has ~2,200 rows of data in approval ratings due to his time in office. The legacy data for each president is stored in a table with the following format:

Legacy data_approval database schema example.

Legacy data_approval database schema example.

And there are a number of entries in this dataset, especially for Obama:

Count of approval rating entries per president.

Count of approval rating entries per president.

I would love to take the entries in a legacy DB table and migrate these values into Drupal with the following in mind:

  • Don’t completely change the format of the legacy data if possible. This is because the client is used to a certain workflow for updating this data, this data is updated daily.
  • Make the new Drupal data easy to update by campus or student staff.
  • The data should be available to other subsystems within Drupal. Ideally Views or other modules should be able to represent this dataset succinctly. Yes, I could make a custom Entity, but that too seems like a far amount of work.

In my opinion this module is not very well known. But, if you think about it there can be pieces of a website such as datasets you want to store on a website where a Field Collection or Multifield simply are too cumbersome a user interface and the default backend storage of the drupal Fields is too normalized. As a further example I’d love to make charts of these sets of data — and typically every dataset will have it own unique columns. This again makes using Drupals built-in storage mechanisms difficult.

By this I mean the default Drupal storage engine of SQL makes 1 Field item in a content type. If I made a content type called Approval Ratings it would have about 8 to 9 legacy fields mapped to it. Just for approval ratings that would equate to possibly 8-9 new database tables required in my websites overall database schema. Ouch! Now compound that by other datasets such as electoral polls, and other stats measured over time and you have problem of a million content types and externaneous database tables trying to model all these small collections of data related to an entity in the website.

Note: you may wonder well, why not use a different storage engine such as MongoDB for storing this type of data into a drupal content type. Well, at my university, and for this project, we have a limited number of approved webhosts. I cannot install and maintain the server infrastructure myself — so I have to make due with the technology stack available to me.

So my attempted solution to this type of problem involves using the Datastore module. I want to create a simple Node type called Dataset (I’m using the machine name prezdataset as my project is named internally the “Presidency Project”). In this simple Content Type I have:

  • Title:  a string.
  • Body:  a description and summary description of the set of data.
  • Related Entities:  an Entity Reference field with N allowed values.
  • Data Resource:  a file containing the dataset info. In this case typically a CSV file of a dumb of all, or part of the columns from the legacy database. I will likely dump data_approval ratings per president and make 1 dataset entry per president in this case.

Using this content type the client’s staff can easily upload the data to the website. A 2nd step of their process is to update the datastore (database table) by have drupal read the file and create a database table. That is what Datastore allows you to do. Let me show you with some example pictures:

This is my simple content type in drupal for datasets:

Custom content type that will use dkan_datastore integration.

Custom content type that will use dkan_datastore integration.

So the content type can store entity references to related pieces of drupal content that this dataset belongs to (a President, a Presidential Election) and stores the file containing the (tabular) data. After saving the data you may then “Manage the datastore” for this data. That is to say you can turn the file into an SQL table. This additional tab and interface is what Datastore integrates into a normal Node; by providing plumming for the Data and Feeds modules.

The "Manage Datastore" tab is a new feature added by the dkan_datastore module.

The “Manage Datastore” tab is a new feature added by the dkan_datastore module.

Example having imported a simple, generic CSV of test data.

Example having imported a simple, generic CSV of test data.

After this simple 2-step process our CSV data has been processed by Feeds and is visible to the Data module and vieweable and editable using a Drupal View.

The custom data is imported using a Feeds processor into a custom  table visible by Data and Views.

The custom data is imported using a Feeds processor into a custom table visible by Data and Views.

Example View of the data, note that this is just a View and it could be fine-tuned using the Views admin UI.

Example View of the data, note that this is just a View and it could be fine-tuned using the Views admin UI.

In general, that is an example of what dkan_datatstore can allow you to consume flat files of data can create simple SQL tables of your data.

To make full use of this module in a custom Content Type I did implement 2 hook functions in a custom module, but the module can be installed and used without these hooks if you name your content type Resource and give it a file upload field. I wanted to name my module and fields more specifically for my project — so I am required to implement these hooks:

 * Implements  hook_dkan_datastore_file_upload_field_alter().
function prez_dataset_dkan_datastore_file_upload_field_alter(&$field) {
  $field = 'field_prezdata_resource';

 * Implements  hook_dkan_datastore_node_type_alter().
function prez_dataset_dkan_datastore_node_type_alter(&$node_type) {
  $node_type = 'prez_dataset';

Lastly, the dkan_datastore module is integrated with Drush to update and create datastores. So I believe for instance a client could be given FTP access to the website files directory and upload an updated CSV daily — a Cron task could then call a drush command to update the sql table as needed.

I hope you can see how beneficial this module could be. I also see this module as a good example of Features usage in Drupal 7 and how to extend Drupal functionality with custom code.

Looking for quality web hosting? Look no further than Arvixe Web Hosting!

Tags: , , , , | Posted under Drupal, Drush | RSS 2.0

Author Spotlight

David Gurba

I am a web programmer currently employed at UCSB. I have been developing web applications professionally for 8+ years now. For the last 5 years I’ve been actively developing websites primarily in PHP using Drupal. I have experience using LAMP and developing data driven websites for clients in aviation, higher education and e-commerce. If you’d like to contact me I can be reached at david.gurba@arvixe.com

Leave a Reply

Your email address will not be published. Required fields are marked *