
Disallow crawling paths under /node by default in robots.txt


Problem/Motivation

  1. The vast majority of sites use Pathauto, as shown by the install counts for January 2025:

    Drupal core: 723,408
    Pathauto:    514,780

    From https://www.drupal.org/project/usage

  2. Getting a path such as /node/100 indexed instead of the human-readable URL alias /my-alias is bad for SEO: the alias is the path you want indexed in the first place, not the node/NID path, even if the latter redirects to the alias.

Therefore, it makes sense to disallow all paths under /node from getting crawled by default.

There may be reasons why a site wants paths under /node to be crawled, but such sites are in the minority, and they can edit robots.txt to re-allow this with the RobotsTxt module: https://www.drupal.org/project/robotstxt.

Steps to reproduce

Observe in search engines that paths such as /node/100 are indexed instead of the intended human-readable URL alias such as /my-alias, which harms SEO.

Proposed resolution

Disallow all paths under /node from getting crawled by default.

Remaining tasks

Update the robots.txt file
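
A minimal sketch of the addition, assuming the directive mirrors the one used in the workaround below and is placed under the existing "Paths (clean URLs)" section of core's robots.txt (the exact directive and placement are up for discussion):

# Do not crawl node/NID paths; the URL alias is the path that should be indexed
Disallow: /node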

Workaround

Until robots.txt is updated in core, the same change can be applied per project with Composer Scaffold.

Add this to the project's composer.json:

"extra": {
    "drupal-scaffold": {
        "locations": {
            "web-root": "web/"
        },
        "file-mapping": {
            "[web-root]/robots.txt": {
                "append": "assets/my-robots-additions.txt"
            }
        }
    },

Add a file assets/my-robots-additions.txt (relative to the project root) with the following content:

# Do not crawl any nodes
Disallow: /node

See Using Drupal's Composer Scaffold > Altering scaffold files.
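
After the mapping and the file are in place, the scaffold files need to be regenerated so the additions are appended to the scaffolded robots.txt. Assuming the drupal/core-composer-scaffold plugin is installed and web/ is the configured web root, either of the following should work:

# Regenerates scaffold files, including web/robots.txt
composer install

# Or regenerate the scaffold files without reinstalling packages
composer drupal:scaffold

Afterwards, web/robots.txt should end with the contents of assets/my-robots-additions.txt.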

User interface changes

none

API changes

none

Data model changes

none

Release notes snippet

TBD

