Drupal 7: Feeds Import to bulk insert content

Recently I was asked (again) to create a bulk import mechanism for an upcoming Drupal website I’m currently developing. Truth be told, I typically use the Migrate module to bulk import content into Drupal websites: I’m comfortable with PHP, and I’ve used Migrate in 3+ large projects to move thousands of nodes into Drupal. But for this project the client will likely want an ongoing, more organic import process, so I have opted to try the Feeds module for bulk importing content. This blog post is an overview of the Feeds module.

What is the Feeds module?

From the project page:

Import or aggregate data as nodes, users, taxonomy terms or simple database records.

  • One-off imports and periodic aggregation of content
  • Import or aggregate RSS/Atom feeds
  • Import or aggregate CSV files
  • Import or aggregate OPML files
  • PubSubHubbub support
  • Create nodes, users, taxonomy terms or simple database records from import
  • Extensible to import any other kind of content
  • Granular mapping of input elements to Drupal content elements
  • Exportable configurations
  • Batched import for large files

So, similar to the Migrate module, you can take a bulk amount of remote content and apply mappings from the remote field(s) onto Drupal content fields. The Feeds module supports many of the default formats used to share content across the web, such as RSS feeds, and offers an extensible plugin system for adding new formats and data fetchers.

Extending the module is beyond the scope of this blog post, but at a later date I will personally be attempting to extend Feeds to import from an XSD-validated XML data source into a custom Drupal content type, and I will likely write a follow-up post on that work.
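For a taste of what that extension work involves, here is a minimal, hypothetical sketch of registering a custom parser with Feeds in Drupal 7. The module name (mymodule), class name, and XML element names are my assumptions for illustration, not code from this project:

```php
<?php

/**
 * Implements hook_feeds_plugins().
 *
 * Registers a hypothetical custom XML parser with Feeds.
 */
function mymodule_feeds_plugins() {
  return array(
    'MyXmlParser' => array(
      'name' => 'My XML parser',
      'description' => 'Parses a custom XML document format.',
      'handler' => array(
        'parent' => 'FeedsParser',
        'class' => 'MyXmlParser',
        'file' => 'MyXmlParser.inc',
        'path' => drupal_get_path('module', 'mymodule'),
      ),
    ),
  );
}

/**
 * Hypothetical parser class: turns fetched raw XML into Feeds items.
 */
class MyXmlParser extends FeedsParser {

  public function parse(FeedsSource $source, FeedsFetcherResult $fetcher_result) {
    $items = array();
    // Assumed structure: <documents><document><id/><title/><body/></document></documents>.
    $xml = simplexml_load_string($fetcher_result->getRaw());
    if ($xml !== FALSE) {
      foreach ($xml->document as $doc) {
        $items[] = array(
          'guid' => (string) $doc->id,
          'title' => (string) $doc->title,
          'body' => (string) $doc->body,
        );
      }
    }
    return new FeedsParserResult($items);
  }

  public function getMappingSources() {
    // These keys become the source options on the mapping screen.
    return array(
      'guid' => array('name' => t('GUID')),
      'title' => array('name' => t('Title')),
      'body' => array('name' => t('Body')),
    );
  }

}
```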

Setting up a Simple Import Feed to Drupal

So how can we use this module? For this example I’ll import into a custom Document content type from a CSV file. I’ve come to loathe CSV files because they lack any sort of standard or usable specification; in the future I want to use an XML file.

For my research I created a simple custom content type called Document:

A simple Drupal content type for a Document which contains a Title, Body and Type of document.

Then I attempted to configure a Feed Importer to process CSV entries into this content type.
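To ground the walkthrough, here is the rough shape of CSV data such an importer could consume. The column names (guid, title, body, document_type) are illustrative; they simply need to match the mapping sources configured later:

```
guid,title,body,document_type
DOC-001,"Annual Report","<p>Full <strong>HTML</strong> is allowed in the body.</p>",Report
DOC-002,"Meeting Minutes","<p>Minutes from the spring meeting.</p>",Minutes
```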

My first step in creating my custom feed importer was simply to clone the existing default Node importer. You may wonder why I took this step rather than creating an importer from scratch. I know I want to import into nodes, and the default Node importer has sane defaults I can use (as examples) for my own project. And because I cloned the importer rather than overriding the module’s defaults, if I later use Features or export my feed importer to another site, it hopefully won’t interfere with existing configuration in that project.
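For reference, a cloned importer exported through Features ends up as an implementation of hook_feeds_importer_default(). The sketch below is abbreviated and the machine names are my assumptions; Features generates the real file for you:

```php
/**
 * Implements hook_feeds_importer_default().
 */
function mymodule_feeds_importer_default() {
  $export = array();

  $feeds_importer = new stdClass();
  $feeds_importer->disabled = FALSE;
  $feeds_importer->api_version = 1;
  $feeds_importer->id = 'document_importer';
  $feeds_importer->config = array(
    'name' => 'Document importer',
    'description' => 'Imports Document nodes from uploaded CSV files.',
    // Fetcher, parser, processor and mapping settings are exported here.
  );
  $export['document_importer'] = $feeds_importer;

  return $export;
}
```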

Upon cloning the Feed Importer it shows up in the Administration area for Feeds.

Step 1: Clone the existing Node Importer and name it properly for my project.

I then customized the importer a little beyond the sane defaults of the Node importer:

Master feed configuration page for my custom importer.

Set the basic settings of the importer, such as its name, whether it’s a standalone script or attached to a node page, and whether it’s a periodic feed.

Configure how this feed fetches data. In general my goal is to take file upload(s) and process them. The defaults here are fine for me.

Next we set the File Upload criteria. I want my source data to reside in a remote_documents folder on the server, in order to organize my import dataset.
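As an aside, the upload directory setting takes a Drupal stream-wrapper path; assuming the private file system is configured, a value like the following keeps the import files out of the public docroot:

```
private://remote_documents
```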

Then we select a parser method for our fetched data. I opted for a CSV parser for simplicity.

Since we’re using CSV parsing, Feeds offers some common CSV parser settings, but again the defaults are sane.

Lastly we inform Feeds which bundle this data should be saved as; if we encounter previously processed IDs, update them. I also created a sample user to perform the import as.

Note: in the above process I created a Joe Gaucho user for the importer to use as the author of new content. It’s a habit of mine to give any content ingested into the system a canned system user identity, which makes it easier to backtrack where data in the system came from.

After the above general configuration I set the mappings of the source CSV file fields to the destination Node bundle fields:

We set up a mapping table of source to destination fields. Here I added my custom taxonomy term field and removed a default published date field. (For the Update existing nodes behavior configured earlier to take effect, one source, typically the GUID, must be flagged as Unique in this table.)
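If you export the importer, the mapping table above boils down to a simple array of source-to-target pairs. Roughly, with my assumed field machine names:

```php
// A fragment of the exported importer's processor config; the unique GUID
// is what lets Feeds update, rather than duplicate, previously seen items.
$config['processor']['config']['mappings'] = array(
  array('source' => 'guid', 'target' => 'guid', 'unique' => TRUE),
  array('source' => 'title', 'target' => 'title'),
  array('source' => 'body', 'target' => 'body'),
  array('source' => 'document_type', 'target' => 'field_document_type'),
);
```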

After all the above customization of folder paths and of which bundle (content type) the feed importer was tied to, I was able to visit the global import page provided by the Feeds module at siteurl.com/import.

The global Feed import URL lists out all import mechanisms configured for the system.

I then attempted an import of sample content. … And nothing happened. I switched the Basic Settings: Process in Background option to Import on Submission in order to debug further. It turns out I was receiving an error (which, oddly enough, wasn’t showing up in the Watchdog report; it took some Googling to find a solution):

My helpful system user to import content -- shoots myself in the foot! Hooray for oversights!

So, I altered my permissions and retried the submission:

After amending my user permissions the import succeeded.
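In my case the fix was granting the importer account’s role the ability to create the target content type. For anyone who prefers to script such fixes, here is a hedged sketch using Drupal 7’s role API; the role name and content type machine name are assumptions:

```php
// Grant the importer account's role permission to create Document nodes.
// 'content importer' (role) and 'document' (content type machine name)
// are assumed names for illustration.
$role = user_role_load_by_name('content importer');
if ($role) {
  user_role_grant_permissions($role->rid, array('create document content'));
}
```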

Hooray! Success and there was much rejoicing! Here is proof the import worked and respected the data and mappings configured:

Example of Admin Content listing showing newly created content under the Joe Gaucho user.

An example of the one live node I imported, with actual HTML rendered in the body of the node.

I hope you can see how powerful the Feeds module can be. As I mentioned, I’m accustomed to using the Migrate module, but with some effort I believe clients can make better use of the Feeds module, as it’s (slightly) more UI-driven from within Drupal and doesn’t necessarily require as much custom PHP code as Migrate.
