Problem/Motivation
Background
Scenario: a source site with N content types. Every content type has a node reference field configured to allow referencing nodes of every other content type.
For example:
- Content types: `news`, `events`, `gallery`.
- Each of them has a node reference field `field_related_content` that is allowed to reference the bundles `news`, `events`, and `gallery`.
So we create a migration setup with three migrations: `myd6_news`, `myd6_events`, `myd6_gallery`. In order to migrate the node reference field, each migration includes this process:
    field_related_content:
      plugin: iterator
      source: field_related_content
      process:
        target_id:
          plugin: migration
          migration:
            - myd6_news
            - myd6_events
            - myd6_gallery
          source: nid
Since any content type can reference any other, we can't enforce an order of precedence between the migrations. As is well known, a stub row is created for each referenced node that has not been migrated yet. Afterwards, as each migration runs, the stub rows are expected to be filled in with the actual content from the source.
Problems
1. The current migration is always used if it is found in the process configuration (ref: https://github.com/drupal/drupal/blob/8.3.x/core/modules/migrate/src/Plu...). In the scenario above: when running the `myd6_events` migration, `myd6_events` itself will be chosen to create the stub row.
2. Because of 1), the `stub_id` configuration is ignored.
3. Because of 1), the stub row is added to the map of the current migration. Later, when the proper migration for the stubbed source row is run, it doesn't find a stub row in its own map, because the stub is in the former migration's map, and the same source row is migrated a second time.
4. On the other hand, the `MigrateExecutable` used to import the stub row is the one of the current migration, regardless of which stub migration was selected. So the stub row is created using a different process pipeline, which leads to errors because that pipeline may not be prepared to create stub rows with default values and so on. (Related: #2800279: Document that migrations used for stubbing need to deal with empty source values)
Let's illustrate this with an example:
- Source: node 2 (news) references node 1 (event).
- Run migration `myd6_news`. It finds a reference to node 1 that it can't resolve, so it creates a stub row for source node 1 with destid 101.
- The stub row is created using the `myd6_news` executable and 1:101 is added to the map of the `myd6_news` migration. This happens for two reasons: first, since `myd6_news` is present in the process configuration, it is always selected; second, regardless of the selected migration, the `MigrateExecutable` of the current migration (`myd6_news`) is always used.
- Source node 2 is migrated with destid 102. 2:102 is added to the map of the `myd6_news` migration.
- Run migration `myd6_events`. This migration doesn't know about the stub row created by `myd6_news` because it is not in its map. Source node 1 is migrated again with destid 103. 1:103 is added to the map of the `myd6_events` migration.
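The bookkeeping in the example above can be reproduced with a small toy model. This is a sketch in Python, not Drupal code: `ToyMigration`, `run`, and the row dictionaries are hypothetical names invented for illustration, and the only behaviour modelled is the one described in this issue (the stub is recorded in the map of the migration that is currently running).

```python
from itertools import count


class ToyMigration:
    """Hypothetical stand-in for a migration: a name, a bundle and an ID map."""

    def __init__(self, name, bundle):
        self.name = name
        self.bundle = bundle
        self.id_map = {}  # source nid -> destination nid


def run(current, listed, source_rows, dest_ids):
    """Import the rows of the current migration's bundle, stubbing unknown references.

    `listed` plays the role of the `migration:` list in the process configuration.
    """
    for row in source_rows:
        if row["type"] != current.bundle:
            continue
        for ref in row["refs"]:
            if not any(ref in m.id_map for m in listed):
                # Behaviour described in this issue: the current migration is in
                # the list, so the stub is recorded in ITS map.
                current.id_map[ref] = next(dest_ids)
        # A migration only consults its own map before importing a row.
        if row["nid"] not in current.id_map:
            current.id_map[row["nid"]] = next(dest_ids)


source_rows = [
    {"nid": 1, "type": "event", "refs": []},
    {"nid": 2, "type": "news", "refs": [1]},
]
news = ToyMigration("myd6_news", "news")
events = ToyMigration("myd6_events", "event")
listed = [news, events]
dest_ids = count(101)

run(news, listed, source_rows, dest_ids)
run(events, listed, source_rows, dest_ids)

print(news.id_map)    # {1: 101, 2: 102}
print(events.id_map)  # {1: 103}
```

Source node 1 ends up with two destination IDs (101 in the `myd6_news` map and 103 in the `myd6_events` map), which is the duplicate described above.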
Proposed resolution
- (?) Respect the `stub_id` configuration or fix the documentation.
- Break up `Drupal\migrate\Plugin\migrate\process\Migration::transform()` into discrete methods, so it is easy to extend and override.
- (?) Add alter hooks to some of the new methods.
- Set the `MigrateExecutable` to use the stubbing migration.
Note: The proposed solution doesn't fix the problem, because we can't know out-of-the-box which migration corresponds to any source row. It only enables developers to build their own solutions.
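To make the idea of discrete, overridable methods more concrete, here is a toy sketch in Python. It is not the proposed patch and not Drupal's API: every class and method name (`ToyMigrationProcess`, `lookup`, `select_stub_migration`, `create_stub`, `BundleAwareProcess`) is hypothetical, and the `bundle_of` mapping stands for site-specific knowledge that a developer would have to supply.

```python
from itertools import count


class ToyMigration:
    """Hypothetical stand-in for a migration: a name, a bundle and an ID map."""

    def __init__(self, name, bundle):
        self.name = name
        self.bundle = bundle
        self.id_map = {}  # source nid -> destination nid


class ToyMigrationProcess:
    """Hypothetical process plugin whose transform() is split into small steps."""

    def __init__(self, current, listed, dest_ids):
        self.current = current
        self.listed = listed
        self.dest_ids = dest_ids

    def lookup(self, source_id):
        # Search the maps of all listed migrations for an existing destination.
        for migration in self.listed:
            if source_id in migration.id_map:
                return migration.id_map[source_id]
        return None

    def select_stub_migration(self, source_id):
        # Default mirrors the reported behaviour: prefer the current migration.
        return self.current if self.current in self.listed else self.listed[0]

    def create_stub(self, source_id):
        # Record the stub in the map of the SELECTED migration.
        target = self.select_stub_migration(source_id)
        target.id_map[source_id] = next(self.dest_ids)
        return target.id_map[source_id]

    def transform(self, source_id):
        found = self.lookup(source_id)
        return found if found is not None else self.create_stub(source_id)


class BundleAwareProcess(ToyMigrationProcess):
    """Site-specific override: pick the stub migration by the row's bundle."""

    def __init__(self, current, listed, dest_ids, bundle_of):
        super().__init__(current, listed, dest_ids)
        self.bundle_of = bundle_of  # assumed mapping: source nid -> bundle

    def select_stub_migration(self, source_id):
        bundle = self.bundle_of[source_id]
        return next(m for m in self.listed if m.bundle == bundle)


news = ToyMigration("myd6_news", "news")
events = ToyMigration("myd6_events", "event")
process = BundleAwareProcess(
    news, [news, events], count(101), {1: "event", 2: "news"}
)

stub_dest = process.transform(1)
print(stub_dest)      # 101
print(events.id_map)  # {1: 101}
print(news.id_map)    # {}
```

With the override in place, the stub for source node 1 lands in the `myd6_events` map, so a later run of that migration could find it there instead of creating a second destination. Only `select_stub_migration` had to change, which is the kind of extension point the break-up of `transform()` is meant to allow.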
Remaining tasks
Agree on a resolution.
Write a patch.
User interface changes
None.
API changes
No API changes.
API additions: new public methods in `Drupal\migrate\Plugin\migrate\process\Migration`, split out of `transform()`.
Data model changes
None.