Drupal 7: Migrate Module Steps and SourceMigrations

David G - Drupal

It’s no mystery that I use the Migrate module a lot on Drupal projects. Some aspects of the Migrate module can be tricky. One example is composing your own primary keys for legacy content that doesn’t have a key yet; another bit of tricky business in Migrate-land is making use of SourceMigrations. Recently I helped someone on StackExchange make proper use of the sourceMigration() method in code.

So what is the sourceMigration() method? Well, let’s take a small step back. A migration in Drupal is the process of moving legacy source content into Drupal. For each item of content, the Migrate framework allows you to:

  • Define the source location (SQL table, CSV file, XML file, JSON file, etc.).
  • Define the destination: what kind of content it becomes (a node, a taxonomy term, a relation, a field collection, a row in a generic SQL table, etc.).
  • Define the mappings between fields from the legacy source data to the destination data type.
  • Sanitize the source data (Migrate’s prepareRow() method).
  • Alter the data as the destination object is created or inserted (Migrate’s prepare() method).
  • Perform tasks before or after a rollback or the completion of a migration.
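To make that lifecycle a bit more concrete, here is a framework-free PHP toy that walks the same steps: an array as the "source", field mappings, a sanitize hook, an alter hook, and an array as the "destination". This is NOT Drupal's Migration class or the Migrate API; the ToyMigration class and its data are hypothetical, and only the hook names prepareRow() and prepare() are borrowed from Migrate 2.x. It does not model rollbacks or pre/post hooks.

```php
<?php
// Toy model of a migration lifecycle. Everything here is hypothetical and
// framework-free; it only borrows the hook names used by Migrate 2.x.
class ToyMigration
{
    /** @var array<string,string> destination field => source field */
    private array $fieldMappings = [];
    /** @var array<int,int> legacy ID => new destination ID (the "map") */
    public array $map = [];
    /** @var array<int,array<string,mixed>> the toy "destination" store */
    public array $destination = [];
    private int $nextId = 1;

    /** @param array<int,array<string,mixed>> $source legacy rows, each with an 'id' */
    public function __construct(private array $source) {}

    public function addFieldMapping(string $destField, string $sourceField): void
    {
        $this->fieldMappings[$destField] = $sourceField;
    }

    /** Sanitize a source row; returning false skips the row. */
    protected function prepareRow(array &$row): bool
    {
        $row['name'] = trim($row['name'] ?? '');
        return $row['name'] !== '';
    }

    /** Alter the destination record just before it is "saved". */
    protected function prepare(array &$entity, array $row): void
    {
        $entity['legacy_id'] = $row['id'];
    }

    public function import(): void
    {
        foreach ($this->source as $row) {
            if (!$this->prepareRow($row)) {
                continue; // row skipped by the sanitize step
            }
            $entity = [];
            foreach ($this->fieldMappings as $dest => $src) {
                $entity[$dest] = $row[$src] ?? null;
            }
            $this->prepare($entity, $row);
            $newId = $this->nextId++;
            $this->destination[$newId] = $entity;
            $this->map[$row['id']] = $newId; // remember legacy ID => new ID
        }
    }
}

$m = new ToyMigration([
    ['id' => 10, 'name' => '  Abraham Lincoln '],
    ['id' => 11, 'name' => '   '],           // blank name: skipped by prepareRow()
    ['id' => 12, 'name' => 'Ben Franklin'],
]);
$m->addFieldMapping('title', 'name');
$m->import();
```

The $map array at the end is the piece that matters for what follows: it records which new ID each legacy ID received, which is what later steps need to look up.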

In short, you have a framework for interacting with the lifecycle of migrating a single piece of content between systems, plus a nice set of commands to run tests or to batch up the process and leave it running until it completes.

In the field-mapping step outlined above, the Migrate module lets you specify a sourceMigration: another migration from which to retrieve the primary keys for the content.

This is useful when a migration has many steps and later steps depend on the new ID a piece of content received in the destination system, while your source data only gives you its old ID.
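Here is a tiny plain-PHP sketch of that dependency, assuming an earlier "people" migration has already left behind a map from legacy IDs to new node IDs. The data and the lookupDestinationId() helper are hypothetical illustrations, not Migrate's own API; they just show a later step translating a legacy foreign key into the new ID.

```php
<?php
// Hypothetical map left behind by an earlier people migration:
// legacy person ID => new destination node ID.
$peopleMap = [101 => 7, 102 => 8];

// Legacy media rows only know the *legacy* ID of the person they belong to.
$legacyMedia = [
    ['file' => 'speech.mp3',   'person_id' => 102],
    ['file' => 'portrait.jpg', 'person_id' => 101],
    ['file' => 'orphan.png',   'person_id' => 999], // person was never migrated
];

/**
 * Hypothetical helper: translate a legacy ID into the destination ID using
 * a migration map. Returns null when the legacy ID was never migrated.
 */
function lookupDestinationId(array $map, int $legacyId): ?int
{
    return $map[$legacyId] ?? null;
}

$migratedMedia = [];
foreach ($legacyMedia as $row) {
    $migratedMedia[] = [
        'file' => $row['file'],
        'nid'  => lookupDestinationId($peopleMap, $row['person_id']),
    ];
}
```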

While I describe the solution in the StackExchange post mentioned above, I also have a real-world example in another StackExchange post on the same subject: Multivalue User Images.

In this code snippet, the legacy site has a list of media files tied to political figures in U.S. history. Sadly, the list of political figures is not in one table. It comes from two legacy sources: an SQL table and hardcoded PHP logic (ewww).

So two migrations first move these individuals over as Drupal nodes, and a third migration moves over the related media. In that media migration, the individual each media item belongs to is resolved like this:

    // Which legacy individual does this media item belong to?
    // Look this legacy ID up through two source migrations.
    $this->addFieldMapping('nid', 'id')->sourceMigration(array('PrezDietNonPresidents', 'PrezDietPresidents'));

This means:

Find the new ID of the political figure by looking first in the list of legacy non-presidents and then in the list of presidents, in that order.

Note that once an ID is found, it is returned immediately. So when using sourceMigration() you must be careful about where your IDs show up, because the first match found is the one used. If needed, you can order the list of source migrations to suit your needs (or return it from, say, a PHP function) so that the lookup order matches your business logic. For example, array('PrezDietPresidents', 'PrezDietNonPresidents') would cause the presidents to be checked for a matching ID before any of the non-presidents. The order of source migrations in this list depends on your own needs.
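The "first found, first used" behaviour can be sketched in plain PHP. This is a hypothetical model of the ordering rule described above, not Migrate's actual implementation: each source migration is reduced to a simple legacy ID => new ID array, and the data deliberately assumes that legacy ID 5 appears in both lists, so the order of the lookup decides which node the media item ends up attached to.

```php
<?php
/**
 * Hypothetical model: look a legacy ID up in several migration maps, in the
 * given order, and return the first destination ID found. Returns null if
 * no map knows the ID.
 *
 * @param array<string, array<int,int>> $mapsInOrder migration name => (legacy ID => new ID)
 */
function lookupThroughSourceMigrations(array $mapsInOrder, int $legacyId): ?int
{
    foreach ($mapsInOrder as $map) {
        if (isset($map[$legacyId])) {
            return $map[$legacyId]; // stop at the first match
        }
    }
    return null;
}

// Hypothetical maps; assume legacy ID 5 exists in BOTH legacy sources.
$nonPresidents = [5 => 200, 6 => 201];
$presidents    = [1 => 100, 5 => 104];

$nonPresidentsFirst = ['PrezDietNonPresidents' => $nonPresidents, 'PrezDietPresidents' => $presidents];
$presidentsFirst    = ['PrezDietPresidents' => $presidents, 'PrezDietNonPresidents' => $nonPresidents];

$a = lookupThroughSourceMigrations($nonPresidentsFirst, 5); // non-presidents win
$b = lookupThroughSourceMigrations($presidentsFirst, 5);    // presidents win
```

Swapping the order changes where ID 5 lands (200 versus 104), while IDs that exist in only one list resolve the same way either way. That is exactly the kind of silent mis-assignment to watch for when your legacy sources have overlapping IDs.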

I think this is an important topic to understand, as the recent inquiry shows: if you don’t understand it, you may end up with content that appears to migrate fine but is attached to the wrong user, or with incorrect foreign key relationships from your legacy data!



Author Spotlight

David Gurba

I am a web programmer currently employed at UCSB. I have been developing web applications professionally for 8+ years now. For the last 5 years I’ve been actively developing websites primarily in PHP using Drupal. I have experience using LAMP and developing data-driven websites for clients in aviation, higher education and e-commerce. If you’d like to contact me, I can be reached at david.gurba@arvixe.com
