Recently I was asked (again) to create a bulk import mechanism for an upcoming Drupal website I'm developing. Truth be told, I typically use the Migrate module to bulk import content into Drupal websites. I'm comfortable with PHP, and I've used the Migrate module on three-plus large projects to move thousands of nodes into Drupal. But for this project the client will likely want an ongoing, more organic import process, so I have opted to try the Feeds module for bulk importing content. This blog post is an overview of the Feeds module.
What is the Feeds module?
From the project page:
Import or aggregate data as nodes, users, taxonomy terms or simple database records.
- One-off imports and periodic aggregation of content
- Import or aggregate RSS/Atom feeds
- Import or aggregate CSV files
- Import or aggregate OPML files
- PubSubHubbub support
- Create nodes, users, taxonomy terms or simple database records from import
- Extensible to import any other kind of content
- Granular mapping of input elements to Drupal content elements
- Exportable configurations
- Batched import for large files
So, similar to the Migrate module, Feeds lets you take a bulk amount of remote content and map its remote field(s) onto Drupal content fields. The module supports many common formats used to share content across the web, such as RSS feeds, and offers an extensible plugin system for adding new formats and data fetchers.
Extending the module is beyond the scope of this blog post. But I will personally be attempting to extend Feeds at a later date to import from an XSD-validated XML data source into a custom Drupal content type, and I will likely write a follow-up post on that extension work.
Setting up a Simple Import Feed to Drupal
So how can we use this module? For this example I'll import into a custom Document content type from a CSV file. I've come to loathe CSV files because they lack any real standard or usable specification; in the future I want to use an XML file.
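For reference, here is a minimal sketch of what a source CSV for an importer like this might look like. The column names (guid, title, body, document_category) are my own illustration, not something Feeds mandates; yours will depend on the fields you end up mapping.

```shell
# Create a hypothetical sample CSV for the Document importer.
# Column names here are illustrative only.
cat > sample_documents.csv <<'EOF'
guid,title,body,document_category
doc-001,"Annual Report","Summary of yearly activity.",Reports
doc-002,"Meeting Minutes","Notes from the March meeting.",Minutes
EOF

# Sanity check: one header row plus two records.
wc -l < sample_documents.csv
```

A unique ID column (here `guid`) is worth including from the start, since it is what lets Feeds update previously imported items instead of duplicating them.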
For my research I created a simple custom content type called Document:
Then I attempted to configure a Feed Importer to process CSV entries into this content type.
My first step in creating my custom feed importer was simply to clone the existing default Node importer. You may wonder why I took this step rather than creating an importer from scratch. I know I want to import into nodes, and the default Node importer has sane defaults I can use (as examples) for my own project. By working from a clone, should I choose to use Features or export my feed importer to another site, it hopefully will not interfere with existing configuration in that project, as it could if I had overridden the module's defaults.
Upon cloning the Feed Importer it shows up in the Administration area for Feeds.
I then customized the importer a little beyond the sane defaults of the Node importer:

Set the basic settings of the importer, such as its name, whether it's a standalone script or attached to a node page, and whether it's a periodic feed.

Configure how this feed Fetches data. In general my goals will be to take file upload(s) and process them. The defaults here are fine for me.

Next we set the File Upload criteria. I want my source data to reside in a remote_documents folder on the server, to keep my import dataset organized.
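Creating that folder can be done from the shell. This is only a sketch assuming a standard Drupal public files layout; adjust the path and the web server user for your environment:

```shell
# Assumed path: the default Drupal public files directory. Adjust for your site.
mkdir -p sites/default/files/remote_documents
chmod 775 sites/default/files/remote_documents

# The web server user also needs ownership of this directory, typically
# something like (user name varies by distro; run with appropriate privileges):
#   chown www-data:www-data sites/default/files/remote_documents
```

If the web server cannot write to this directory, imports can fail silently, which foreshadows the debugging detour later in this post.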

Since we're parsing CSV, Feeds offers some common CSV settings, but again the defaults are sane.

Lastly we inform Feeds which bundle this data should be saved as, and that previously processed IDs should be updated when encountered again. I also created a sample user to perform the import as.
Note: in the above process I created a "Joe Gaucho" user for the importer to use as the author of new content. This is a habit of mine: any content ingested into the system is attributed to a canned system user identity, which makes it easier to trace where data in the system came from.
After the above general configuration I set the mappings of the source CSV file fields to the destination Node bundle fields:

We set up a mapping table of source to destination fields. Here I added my custom taxonomy term field and removed a default published date field.
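One small tip: the mapping screen expects source field names to match the CSV header exactly, so it helps to list them out first. A quick shell sketch (the file and column names are hypothetical):

```shell
# Write a hypothetical CSV header, then print one source field name per line,
# ready to copy into the Feeds mapping screen.
printf 'guid,title,body,document_category\n' > header_check.csv
head -1 header_check.csv | tr ',' '\n'
```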
After all the above customization of folder paths and the bundle (content type) the feed importer was tied to, I was able to visit the global import page provided by the Feeds module at siteurl.com/import.
I then attempted an import of sample content... and nothing happened. I switched the Basic Settings from Process in Background to Import on Submission in order to debug further. It turned out I was receiving an error (which, oddly enough, wasn't showing up in the Watchdog report; I had to resort to Googling to find the solution):
So, I altered my permissions and retried the submission:
Hooray! Success and there was much rejoicing! Here is proof the import worked and respected the data and mappings configured:
I hope you can see how powerful the Feeds module can be. As I mentioned, I'm accustomed to using the Migrate module, but with some effort I believe clients can make better use of the Feeds module, as it's (slightly) more UI-driven from within Drupal and doesn't necessarily require as much custom PHP code as Migrate does.