Note: I'm opening this issue in the Drupal core queue for discussion, but I think any initial implementation should be done first in contrib, to make sure it's robust and well-tested in the wild before adding to core.
Problem/Motivation
See #3375371: [meta] Improve the page building experience for background discussion on improving Drupal's page building experience.
Layout Builder's data model is based on sections and blocks. You can put blocks into sections, but you can't put sections into sections, sections into blocks, or blocks into blocks. There are some workarounds, such as enabling Layout Builder on your block types, which lets you put sections into those blocks and then other blocks into those sections. However, the UI for this is cumbersome and the resulting data model is complex, especially when compared to something like Gutenberg, where you can smoothly add "blocks" (Gutenberg blocks, not Drupal blocks) inside of "blocks", building as deep a hierarchy as you want, all within a single cohesive UI.
Also, Layout Builder's layout_builder__layout_section column is a serialized PHP array, so it can't be queried by the database. In some cases that serialized PHP array can be quite large, which can lead to huge database sizes when there are lots of nodes, revisions, languages, etc. See #3267444: To reduce database size, convert layout_builder__layout_section column to a hash of the content.
A lot of sites prefer to use Paragraphs over Layout Builder for page building, but Paragraphs has its own scaling challenges. Every time you create a new node revision, a new paragraph revision gets created for every paragraph (and for deep structures, every paragraph's child/descendant paragraphs), including all the paragraphs unchanged by that particular node revision. All those new revisions end up writing database records for every translation, even though the revision only changes the content for one of those languages.
There's also the question of whether paragraphs or Layout Builder inline blocks should even be their own entities to begin with. They were modeled that way in order to make them fieldable, and fields can only be added to entities, but modeling them as entities comes with its own baggage (performance issues from them having their own CRUD hooks, highly nested database transactions when saving, having them show up as separate JSON:API resources from the node or needing special code to avoid that, etc.).
Both Paragraphs and Layout Builder were created when Drupal still had to support MySQL versions that lacked JSON support. Their data models are reasonable given the constraints of relational databases. But MySQL, PostgreSQL, and SQLite all support JSON now, which gives us an incredible tool/opportunity to create a more optimal data model.
Requirements
Config vs. content
Layout building needs to support multiple 'modes'. For 'bundle templates', or specific use cases like the navigation module, this only involves writing configuration, not entity data. This issue is primarily concerned with cases where LB/XB is used to create individual entity content.
For individual entity content, layout builder will support both 'fixed' and 'loose' sections. For example, the top section of a bundle could have particular, fixed components sourced from traditional field API fields, such as a lead image and tags field.
However, underneath, there could be 'loose' content where content editors can choose an arbitrary number of components and populate them with content. This can include some level of nesting, e.g. a 'side by side' or 'accordion' component which is then populated by further components.
Data structure
The data structure needs to be consumable by JSON:API. It therefore makes sense for the content of 'loose' sections to be available in the order that it is entered. For example, if there are five text paragraphs, then an image figure, then five more text paragraphs, JSON:API should expose these as 5 text -> 1 image -> 5 text, not 1 image -> 10 text.
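To make the ordering requirement concrete, here is a minimal sketch in Python (used only for illustration; nothing here is Drupal or JSON:API code). The component type names and the grouped_by_type() helper are hypothetical. It contrasts content kept in entry order with what a storage model that groups values per component type would naturally hand back.

```python
import json

# Hypothetical 'loose' section content, kept in the order the editor entered it:
# five text paragraphs, one image figure, five more text paragraphs.
components = (
    [{"type": "text", "body": f"Paragraph {i}"} for i in range(1, 6)]
    + [{"type": "image_figure", "src": "figure.png"}]
    + [{"type": "text", "body": f"Paragraph {i}"} for i in range(6, 11)]
)


def grouped_by_type(items):
    """Simulate a per-type storage model: values come back grouped, order is lost."""
    return sorted(items, key=lambda c: c["type"])


entered_order = [c["type"] for c in components]
grouped_order = [c["type"] for c in grouped_by_type(components)]

print(json.dumps(entered_order))
print(json.dumps(grouped_order))
```

The first list keeps the image between the two runs of text; the second puts the image first and all ten text items after it, which is the outcome this requirement is trying to avoid.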
Alternative rendering modes
Once you have 'loose' sections in layout builder, you have not only layout information, but also 'content', and this content needs to be available in places which are not just the full view mode of the entity.
Examples:
1. Both the core Search module and Search API rely on view modes to define which fields get indexed for full text search. This implies that those view modes will need to present information that was added to 'loose' sections of the full entity layout, because that could be all the content of the article. In addition, the 'search' view mode may want to exclude certain elements such as related articles views blocks, field labels, add to cart language and similar.
Simplenews has a very similar model - where a 'newsletter' view mode can be set up, and this controls the rendering of the content when it appears in a newsletter.
2. A university might have a 'course' content type, and in the 'loose' section of the content want to add 'student testimonials' with a student name, image, and quote. Later in their course listing view, they want to pull out one testimonial into that listing too. This also implies different view modes having access to content added via the default/full view mode.
'Loose' content for multiple bundles?
Most of the discussions assume, implicitly, that 'loose' content will only be possible on the default view mode, but there has not been an explicit decision made or documented about this. For example, would we allow a site builder to configure a content type so that two view modes both allow loose content?
Proposed resolution
A lot of different permutations of the data model have been discussed in the issue; this section tries to summarise some of them.
I think the proposed resolution explains well why we want to avoid nested entities (either blocks or paragraphs), at least as the default way that components are added, due to the data fragmentation and performance issues they involve. The following options therefore concentrate on data models that can be represented by a single entity.
1. Original proposal from effulgentsia and close to the current proof of concept implementation in XB - two single-cardinality JSON fields
Let's assume that what we want to store is a tree of component instances. Since SDCs are now in Drupal core, let's use its terminology of props and slots.
Props hold the properties of the component. For example, for Layout Builder's inline blocks, each of the block type's fields can be thought of as a prop. Props can also include data that we don't currently model with fields, such as block configuration. Ideally, we'd also be able to use components that aren't Drupal blocks, such as perhaps SDC components directly, Paragraphs content, or who knows, maybe even Gutenberg blocks. I propose that the best common denominator is that each prop value just be something that can be represented with a TypedData object.
Meanwhile, slot values are other components, thus creating the tree. For example, Layout Builder sections can be thought of as components where the section's regions are its slots. But, unlike with Layout Builder today, we want a data storage model that allows slot values to be filled with components that themselves have slots, thereby allowing trees of any depth, enabling the easy creation of pages with complex layouts. Also, we want a data model where a component can have both props and slots, as many of the SDC components in Drupal core already do.
- Represent the component instance tree as two single-valued fields: a "layout" field and a "components" field. Conceptually, each field would be a JSON field. The layout field would contain the JSON just representing the tree (which component instance IDs are in which slots) but without any of the prop values. The "components" field would contain the JSON object whose top-level keys are the component instance IDs and values are the prop values for that component instance. This separation allows for either symmetric translations (make the "layout" field not translatable) or asymmetric translations (make the "layout" field translatable).
- However, as an optimization, don't actually store the JSON itself in these field values, but instead have a separate lookup table that maps a short hash (for example, the first 8 characters of a hash) to the JSON value, similar to what is proposed in #3267444: To reduce database size, convert layout_builder__layout_section column to a hash of the content. This way, duplicate values across revisions only duplicate the hash, not the full JSON.
- For the "components" field, instead of hashing the entire JSON object, hash each component instance separately, so that what's stored in the components field is a JSON object containing each component instance ID mapped to a hash of the JSON-encoded props for just that component instance. This way, if a page has 50 component instances, and a given revision only changes the props on one of them, the hash/lookup for each of the unchanged 49 gets to be reused.
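The three bullets above can be sketched roughly as follows. This is an illustrative Python model, not Drupal code: the HashLookup class, store_revision(), the JSON shapes, and the choice of SHA-256 are all assumptions made for the example, and collision handling for an 8-character hash is only flagged, not solved.

```python
import hashlib
import json


def short_hash(value, length=8):
    """First `length` hex characters of a SHA-256 over canonical JSON (assumed scheme)."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


class HashLookup:
    """Stand-in for the shared lookup table: short hash -> JSON value."""

    def __init__(self):
        self.rows = {}

    def put(self, value):
        key = short_hash(value)
        existing = self.rows.get(key)
        if existing is not None and existing != value:
            # A real implementation would need a collision strategy here.
            raise ValueError(f"hash collision on {key}")
        self.rows[key] = value
        return key

    def get(self, key):
        return self.rows[key]


def store_revision(lookup, layout, props_by_id):
    """Build the two field values for one revision: hashes only, no inline JSON."""
    return {
        "layout": lookup.put(layout),
        "components": {cid: lookup.put(props) for cid, props in props_by_id.items()},
    }


# Hypothetical page: one root component with 50 component instances in one slot.
layout = {"id": "root", "slots": {"content": [{"id": f"c{i}", "slots": {}} for i in range(50)]}}
props = {f"c{i}": {"text": f"Block {i}"} for i in range(50)}

lookup = HashLookup()
rev1 = store_revision(lookup, layout, props)
rows_after_rev1 = len(lookup.rows)

# Second revision: only one component instance changes its props.
props_rev2 = dict(props)
props_rev2["c7"] = {"text": "Edited"}
rev2 = store_revision(lookup, layout, props_rev2)

new_rows = len(lookup.rows) - rows_after_rev1
reused = sum(rev1["components"][cid] == rev2["components"][cid] for cid in props)
print(new_rows, reused, rev1["layout"] == rev2["layout"])
```

In this sketch the second revision should add a single lookup row (the edited component), while the layout hash and the other 49 component hashes are reused as-is. Making the "layout" value translatable or not is a field-level decision and is not modelled here.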
I think the above storage model achieves all of our desired goals:
- Scalable to many component instances, revisions, and translations.
- Supports symmetric and asymmetric translations.
- Supports nesting to any depth.
- Doesn't require any entities other than the host entity, except where there's a separate reason to need separate entities, such as references to media.
- Unlike with serialized PHP arrays, all data can be individually queryable thanks to database support for JSON.
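As a rough illustration of the last point, the sketch below uses Python's sqlite3 module and SQLite's json_extract() function to filter on a value inside stored JSON. The table name, column names, hashes, and prop keys are all made up for the example, and it assumes the local SQLite build includes the JSON functions (most current builds do). MySQL and PostgreSQL have their own JSON syntax, which is not shown.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical lookup table: one row per hashed component-instance props blob.
conn.execute("CREATE TABLE component_props (hash TEXT PRIMARY KEY, props TEXT NOT NULL)")

rows = [
    ("a1b2c3d4", {"component": "hero", "heading": "Welcome"}),
    ("e5f6a7b8", {"component": "testimonial", "student": "Sam", "quote": "Great course"}),
    ("c9d0e1f2", {"component": "testimonial", "student": "Alex", "quote": "Learned a lot"}),
]
conn.executemany(
    "INSERT INTO component_props VALUES (?, ?)",
    [(row_hash, json.dumps(props)) for row_hash, props in rows],
)

# Filter and project on individual JSON keys, which a serialized PHP array cannot offer.
testimonial_students = [
    row[0]
    for row in conn.execute(
        "SELECT json_extract(props, '$.student') FROM component_props "
        "WHERE json_extract(props, '$.component') = ? ORDER BY hash",
        ("testimonial",),
    )
]
print(testimonial_students)
```

The same kind of query is what makes checks such as "is this component type still in use?" possible at the database level, although, as noted under Disadvantages below, the SQL gets more involved than with plain columns.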
Advantages
Content is stored linearly in the order that it is entered.
Allows arbitrary amounts of field data to be supplied as props.
Content of the field is controlled by XB, not the manage fields page.
Disadvantages
- Requires complex SQL JSON queries when retrieving anything other than the full JSON blob, for example checking field type or component usage when validating module uninstall.
- Not clear how alternative view modes (search, or the course testimonials example) would be able to reference content stored in this data model without some kind of JSON tree querying syntax.
- Not clear what happens if two view modes allow 'loose' content. Is that another field instance?
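To show what "some kind of JSON tree querying" might involve for the course testimonial case, here is a minimal Python sketch that walks a layout tree depth-first and returns the first component instance of a given type. The tree shape, the "component" key, and the function names are assumptions for illustration only; no such API exists in XB or core as far as this issue states.

```python
def walk(node):
    """Yield component instance IDs depth-first, following slot order."""
    yield node["id"]
    for children in node.get("slots", {}).values():
        for child in children:
            yield from walk(child)


def first_of_type(layout, components, component_type):
    """Return (instance ID, props) for the first instance of a type, or None."""
    for cid in walk(layout):
        props = components.get(cid)
        if props and props.get("component") == component_type:
            return cid, props
    return None


# Hypothetical "layout" field value: only IDs and slots, no prop values.
layout = {
    "id": "root",
    "slots": {
        "main": [
            {"id": "intro"},
            {"id": "accordion", "slots": {"items": [{"id": "t1"}, {"id": "t2"}]}},
        ]
    },
}

# Hypothetical "components" field value: prop values keyed by instance ID.
components = {
    "root": {"component": "page"},
    "intro": {"component": "text", "body": "About this course"},
    "accordion": {"component": "accordion"},
    "t1": {"component": "testimonial", "student": "Sam"},
    "t2": {"component": "testimonial", "student": "Alex"},
}

visit_order = list(walk(layout))
first_testimonial = first_of_type(layout, components, "testimonial")
print(visit_order)
print(first_testimonial)
```

Every alternative view mode that wants to reuse 'loose' content would need something equivalent to this walk, or a declarative syntax that expresses it, which is the cost this disadvantage is pointing at.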
2. Relational field union
https://www.drupal.org/project/field_union allows a single field to combine multiple field types. This would allow each component available to be mapped to a field union, and then each set of slots for a component becomes a field union value.
This would require a field-union field to exist per available component. The 'layout tree' JSON field would then need to reference the field union field + delta each time.
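A minimal sketch of that arrangement, in Python for illustration only: each component has its own multi-value field, and the layout references a field name plus delta. The field names, value keys, and the resolve() helper are hypothetical and are not part of the field_union module.

```python
# Hypothetical field values: one field union field per available component,
# each holding its values by delta (list index).
field_values = {
    "field_union_text": [
        {"text": "Intro paragraph", "format": "basic_html"},
        {"text": "Closing paragraph", "format": "basic_html"},
    ],
    "field_union_testimonial": [
        {"student_name": "Sam", "quote": "Great course"},
    ],
}

# Hypothetical 'layout tree' value (flattened to a list here): every entry
# points at a field union field and a delta within it.
layout = [
    {"field": "field_union_text", "delta": 0},
    {"field": "field_union_testimonial", "delta": 0},
    {"field": "field_union_text", "delta": 1},
]


def resolve(layout_refs, values):
    """Look up each referenced field value, in layout order."""
    return [values[ref["field"]][ref["delta"]] for ref in layout_refs]


rendered = resolve(layout, field_values)
print([sorted(item) for item in rendered])
```

The display order only exists in the layout references: read on their own, the field values are grouped per field, so the page order cannot be recovered from them.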
Advantages
Field union fields would have all of the features currently available to field API fields (because they are). This would cover the search and course testimonial use cases.
Disadvantages
- There has to be a 1:1 mapping between components (which allow data entry) and field union fields.
- Would bloat the Views UI (and possibly elsewhere) with a lot of fields that likely would never need explicit Views support.
- Content in the field union fields is in an arbitrary order, unrelated to its representation in the layout.
- Field union fields have a fixed number of columns, so it is hard to do things like have multiple image field references in a single field value.
3. Field union JSON
This is conceptually based on the field union module but with an important difference.
In this case a single field union field would be a multiple value field API field; however, the field values would be stored in a JSON column. This would allow different 'union types' to be stored in a single field.
The field table would store the usual field columns, plus the 'field union type' to identify which field union is stored, plus a single 'values' JSON column holding the actual entered field values.
For example:
delta 0: 0, paragraph, {text, format}
delta 1: 1, highlighted_content, {entity reference, text}
delta 2: 2, media_gallery, {media reference, media reference, media reference}
delta 3: 3, course_testimonial, {student name, media reference, text, format}
Advantages:
- Content is stored linearly as it is entered.
- Allows nested multiple value columns.
- Can be queried without JSON, allowing content dependencies to be calculated efficiently.
- Possible to reference the field via other view modes, so handles the search and course testimonial cases.
Disadvantages:
- The course and testimonial cases would still require some kind of 'field delta filtering', for example 'render the first occurrence of a testimonial union type from this field'.
- The field configuration would need to control the allowed field union types. This would need to be synced with (or controlled by) the XB interface for allowed components.
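The row layout and the 'field delta filtering' mentioned above could look roughly like this. It is a Python sketch with made-up column names (delta, union_type, values) and helper functions; it only illustrates the idea that the union type sits in a plain column while the entered values sit in JSON.

```python
import json

# Hypothetical rows of the single field table: usual columns (only delta shown),
# a plain 'union_type' column, and one JSON 'values' column.
rows = [
    {"delta": 0, "union_type": "paragraph",
     "values": json.dumps({"text": "Intro", "format": "basic_html"})},
    {"delta": 1, "union_type": "highlighted_content",
     "values": json.dumps({"entity_reference": 12, "text": "Read more"})},
    {"delta": 2, "union_type": "media_gallery",
     "values": json.dumps({"media_references": [3, 4, 5]})},
    {"delta": 3, "union_type": "course_testimonial",
     "values": json.dumps({"student_name": "Sam", "media_reference": 9,
                           "text": "Great course", "format": "plain_text"})},
]


def used_union_types(field_rows):
    """Answer 'which union types are in use?' without parsing any JSON."""
    return {row["union_type"] for row in field_rows}


def first_of_union_type(field_rows, union_type):
    """Delta filtering: first value of a union type, in entry (delta) order."""
    for row in sorted(field_rows, key=lambda r: r["delta"]):
        if row["union_type"] == union_type:
            return {"delta": row["delta"], "values": json.loads(row["values"])}
    return None


types_in_use = used_union_types(rows)
testimonial = first_of_union_type(rows, "course_testimonial")
print(sorted(types_in_use))
print(testimonial["delta"], testimonial["values"]["student_name"])
```

Here the JSON column is only decoded once a matching row has been found; the dependency-style check works purely on the plain union_type column.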
Data model changes
An entirely new data model. So, we'll need some kind of BC layer for or upgrade path from Layout Builder. I'd like to first get feedback on this proposed data model before thinking about the BC layer / upgrade path though.
Remaining tasks
Discuss and poke holes in this proposal.