Drupal 7: Migrating Legacy Images with Nodes

David G - Drupal

In the Presidency project I’m working on, the legacy website had many types of photos associated with a person. I’ve moved database fields between legacy sites and Drupal sites before, but I’ve never needed to migrate image files. Releases of the Migrate module newer than version 2.5 improved how file migrations work. Here is how I moved some files over from a legacy site to Drupal 7.

Prior to Migrate 2.4 you had to pass a lot of $arguments into your field mappings to migrate files. I was aware of this but hadn’t had much cause to migrate files between sites. Migrate 2.4 added subfields to field mappings, along with more mechanisms for migrating files, and subfields are now how you define the details of a file field migration.

In my migration class for a Person, which has a pictures field in Drupal, this is the field mapping from source to destination:

    // Map the new drupal field_diet_pictures field from the
    // legacy president_photos database column.
    $this->addFieldMapping('field_diet_pictures', 'president_photos');

    // Migration file subfield configuration.
    // DPG 04-09-2015
    // dest_file is now supplied by the FileField Paths module configuration
    // on the content type. I also use Pathauto and Transliteration to clean up
    // the source filenames from 35.jpg to lyndon-johnson[_N].jpg. Sadly, this
    // still preserves periods found in presidents' names.
    //$this->addFieldMapping('field_diet_pictures:destination_file', 'dest_file1');
    //$this->addFieldMapping('field_diet_pictures:destination_dir')->defaultValue('public://people/photos');

So when migrating images you supply a file_class subfield. This is really cool: if we set it to MigrateFileUri, it expects a legacy server file path and copies the file into Drupal. This file-copy behavior is described in the Migrate 2.4 API changes:

  • MigrateFileUri is the default file class, and does not need to be specified. It accepts a local file path or a remote URI, and copies (if necessary) the file at that location into the Drupal file system.
  • MigrateFileBlob accepts a variable containing the actual file data (presumably coming from a database blob field) and saves it to a file in the Drupal file system.
  • MigrateFileFid is something of a degenerate case, and only applicable to file fields – when you create a separate file migration, and need to link a file field in a later migration to one of the previously-migrated files, you simply pass its fid in the mapping to the file field with this class specified to make the link.

So the subfields describe what type of backend storage the file is coming from, where its destination is, what to do on filename conflicts, and what alt and title attributes to use for the HTML <img> tag.
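
As a rough illustration of those subfields, a mapping block inside the migration’s constructor might look like the sketch below. The subfield names (file_class, source_dir, file_replace, alt, title) come from Migrate’s file and image field handlers. The source directory is borrowed from the lookup code later in this post, and using the commonname column for the alt and title text is purely an assumption for illustration, not necessarily what this project did.

```php
// Fragment for a Migration class constructor. It only runs inside
// Drupal 7 with the Migrate module enabled.
$this->addFieldMapping('field_diet_pictures', 'president_photos');

// MigrateFileUri is the default file class, so this line is optional.
$this->addFieldMapping('field_diet_pictures:file_class')
  ->defaultValue('MigrateFileUri');

// Assumption: the source rows hold bare filenames, so point Migrate at
// the legacy directory they live in.
$this->addFieldMapping('field_diet_pictures:source_dir')
  ->defaultValue('/project/build/legacy/htdocs/www/images');

// What to do when the destination filename already exists.
$this->addFieldMapping('field_diet_pictures:file_replace')
  ->defaultValue(FILE_EXISTS_RENAME);

// Assumption: reuse the person's name for the alt and title attributes.
$this->addFieldMapping('field_diet_pictures:alt', 'commonname');
$this->addFieldMapping('field_diet_pictures:title', 'commonname');
```

The destination_dir and destination_file subfields are left out here on purpose, since in this project FileField Paths decides the final folder and filename.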

The only caveat in this code is the commented-out destination information, which you would usually supply as Migrate subfields. In my project I’m using the FileField Paths module for photo fields. That lets me configure the new filename of each created file from tokens of the node being created, and define which folder the files should live in:

FileField Paths configuration for a president’s general photos. The filename is changed from NUMBER.jpg to NAME.jpg.

The last bit of magic in this migration is the legacy file path information used for the president_photos source field. You’d think this value comes from a database column, but it doesn’t. In the legacy application, a president’s photos sat in a directory with names like [PRESIDENT_NUMBER][a-z]?.jpg. So I needed to:

  1. Tell Migrate in my SQL statement that there are image files.
  2. Build a list of the legacy photo paths and pass them to Migrate for a multi-value image field.

We can do (1) by altering the SQL statement that feeds the person migration:

    $query = Database::getConnection('default', 'legacy_migration')
      ->select('presidents', 'prez')
      ->fields('prez', $diet_fields);
    // Add SQL pseudofields for picture(s). Most presidents have 1 pic, some have
    // more than 1 picture. These pseudofields are for all media files tied to
    // this person in some way.
    // These pseudofields get their values set in prepareRow().
    $query->addExpression("NULL", 'president_photos');
    $query->addExpression("NULL", 'inaugural_photos');
    $query->addExpression("NULL", 'name_photos');
    $query->addExpression("NULL", 'signature_photos');
    // Add a field to store the presidential number per President.
    $query->addExpression('NULL', 'presidential_number');
    $options = array('track_changes' => 1);

    $count_query = Database::getConnection('default', 'legacy_migration')
      ->select('presidents', 'p');
    $count_query->addExpression('COUNT(id)', 'cnt');

    $this->source = new MigrateSourceSQL($query, $diet_fields, $count_query, $options);
    $this->destination = new MigrateDestinationNode('prez_diet');

Then in prepareRow() of the Migrate API I can deduce the file path(s) for this president (step 2):

  public function prepareRow($row) {
    // Always include this fragment at the beginning of every prepareRow()
    // implementation, so parent classes can ignore rows.
    if (parent::prepareRow($row) === FALSE) {
      return FALSE;
    }

    $row->commonname = PrezMigration::html5_tidy($row->commonname);
    $row->presidential_number = $row->id;

    // This is where we provide Drupal with legacy file image path(s) via a lookup.
    $row->president_photos = $this->locate_president_photos($row->id);
    $row->inaugural_photos = $this->locate_president_inaugural_photos($row->id);
    $row->name_photos = $this->locate_president_name_photos($row->id);
    $row->signature_photos = $this->locate_president_signature_photos($row->id);
    return TRUE;
  }

What is my actual file path lookup code? Here is one lookup function. They are all basically the same, except that each image set was stored in a different directory on the legacy server, and some were always JPEGs, some were sometimes GIFs, and some were PNGs, so I changed the regular expression as needed:

  protected function locate_president_photos($legacy_president_id) {
    $files = array();
    $directory = new DirectoryIterator('/project/build/legacy/htdocs/www/images');
    // Match e.g. 35.jpg or 35a.jpg; quote the id and escape the dot so both
    // are treated literally.
    $pattern = '#^' . preg_quote($legacy_president_id, '#') . '[a-z]?\.jpg$#i';
    $it = new RegexIterator($directory, $pattern);
    foreach ($it as $info) {
      if ($info->isDot()) {
        continue;
      }
      if ($info->isFile()) {
        $files[] = $info->getFilename();
      }
    }
    return $files;
  }
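
To show the matching logic on its own, outside of Drupal, here is a minimal standalone sketch of the same kind of lookup. The function name find_legacy_photos() and its parameters are hypothetical, not part of my migration class or the Migrate API. It takes the list of allowed extensions as an argument instead of hard-coding them in the regular expression, and it returns full paths rather than bare filenames.

```php
<?php
// Standalone sketch: list legacy photo files for one president id.
// find_legacy_photos() and its parameters are hypothetical names.
function find_legacy_photos($directory, $legacy_id, array $extensions = array('jpg')) {
  // Quote each extension so it is matched literally.
  $quoted = array();
  foreach ($extensions as $extension) {
    $quoted[] = preg_quote($extension, '#');
  }
  // Matches e.g. 35.jpg and 35a.jpg, but not 350.jpg or 35ajpg.
  $pattern = '#^' . preg_quote((string) $legacy_id, '#')
    . '[a-z]?\.(?:' . implode('|', $quoted) . ')$#i';

  $paths = array();
  foreach (new DirectoryIterator($directory) as $info) {
    // isFile() is FALSE for "." and "..", so no separate isDot() check.
    if ($info->isFile() && preg_match($pattern, $info->getFilename())) {
      $paths[] = $info->getPathname();
    }
  }
  // Directory order is not guaranteed, so sort for a stable result.
  sort($paths);
  return $paths;
}
```

Because this sketch returns full paths, its result could in principle be assigned straight to $row->president_photos in prepareRow() without configuring a separate source directory.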

The one thing I needed a reminder of when writing this is that a multi-value Drupal field can accept an array of values from, say, prepareRow() (or a string list with a separator value supplied).

All these pieces working together allow image(s) per president to be moved from the legacy site into Drupal as fields associated to a newly created Node.

Author Spotlight

David Gurba

I am a web programmer currently employed at UCSB. I have been developing web applications professionally for 8+ years now. For the last 5 years I’ve been actively developing websites primarily in PHP using Drupal. I have experience using LAMP and developing data driven websites for clients in aviation, higher education and e-commerce. If you’d like to contact me I can be reached at david.gurba@arvixe.com
