Problem/Motivation
- The large majority of sites use Pathauto, as the reported installs for January 2025 show: Drupal core 723,408, Pathauto 514,780 (roughly 71% of reporting sites).
- Getting a path such as /node/100 indexed instead of the human-readable URL alias /my-alias is bad for SEO: you want the alias indexed in the first place, not the node/NID path, even if that path gets redirected.
Therefore, it makes sense to disallow all paths under /node from getting crawled by default.
Some sites may have reasons to allow paths under /node to be crawled, but they are the minority, and they can edit robots.txt to re-allow crawling, for example with https://www.drupal.org/project/robotstxt.
Steps to reproduce
See in search engines that paths such as /node/100 are getting indexed instead of the intended human-readable URL alias such as /my-alias, harming SEO.
Proposed resolution
Disallow all paths under /node from getting crawled by default.
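A minimal sketch of what the addition could look like, assuming the rule goes into the existing User-agent: * block of core's robots.txt (exact placement within the file is up to the maintainers):

User-agent: *
# Do not crawl any nodes; the URL aliases are what should be indexed
Disallow: /node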
Remaining tasks
Update the robots.txt file
Workaround
Until robots.txt gets updated, the same rule can be appended per project with Composer scaffolding.
Add this to the project's composer.json:
"extra": {
"drupal-scaffold": {
"locations": {
"web-root": "web/"
},
"file-mapping": {
"[web-root]/robots.txt": {
"append": "assets/my-robots-additions.txt"
}
}
},
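With this mapping, the drupal/core-composer-scaffold plugin appends the contents of the listed file to the scaffolded robots.txt whenever scaffolding runs, so the addition survives core updates.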
Add a file assets/my-robots-additions.txt (the path is relative to the project root), with this in it:
# Do not crawl any nodes
Disallow: /node
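After both changes, re-run composer install to regenerate web/robots.txt (recent versions of the scaffold plugin also provide a composer drupal:scaffold command), then confirm the appended lines appear at the end of that file.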
See Using Drupal's Composer Scaffold > Altering scaffold files.
User interface changes
none
API changes
none
Data model changes
none
Release notes snippet
TBD