Suppose these are the default rules generated by the Yoast SEO plugin installed on a WordPress site for the robots.txt file located in the root directory, https://example.com/robots.txt:
# START YOAST BLOCK
# ---------------------------
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap_index.xml
# ---------------------------
# END YOAST BLOCK
and here are the robots.txt rules that Yoast generates dynamically for the site, as served at https://example.com/?robots=1:
# START YOAST BLOCK
# ---------------------------
User-agent: *
Disallow: /?s=
Disallow: /page/*/?s=
Disallow: /search/
Disallow: /wp-json/
Disallow: /?rest_route=
Sitemap: https://example.com/sitemap_index.xml
# ---------------------------
# END YOAST BLOCK
What do these robots.txt rules and the ?robots=1 rules generated by the Yoast SEO plugin on a WordPress site mean?
This is the default output produced when the Yoast SEO plugin is installed on a WordPress site, and it is the standard rule set for letting the complete site be crawled and indexed.
Lines 1, 2, 6, and 7: Comments
These lines only mark the start and end of the block managed by Yoast; crawlers ignore them.
Line 3:
User-agent: *
The rules apply to all web crawlers, such as Googlebot, Bingbot, DuckDuckBot, Slurp, YandexBot, Baiduspider, and so on. If a specific crawler were named instead, for example User-agent: Googlebot, the rules would apply only to that crawler.
Line 4:
Disallow:
Nothing is specified after the Disallow directive, meaning no pages are blocked from being crawled: web crawlers can crawl the complete site. (By contrast, Disallow: / would block the entire site.)
Line 5:
Sitemap: https://example.com/sitemap_index.xml
This is the location of your WordPress site's XML sitemap, which search engines use to discover and index the pages on your site.
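
To see what these default rules actually permit, here is a minimal sketch using Python's urllib.robotparser; the tested URLs are hypothetical examples, and the rules string simply repeats the directives from the block above.

import urllib.robotparser

# Directives from the default Yoast block in https://example.com/robots.txt
rules = """\
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap_index.xml
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# An empty Disallow blocks nothing, so every URL is crawlable for every bot.
print(parser.can_fetch("*", "https://example.com/"))                   # True
print(parser.can_fetch("Googlebot", "https://example.com/any-page/"))  # True

# Python 3.8+: the Sitemap line is exposed as well.
print(parser.site_maps())  # ['https://example.com/sitemap_index.xml']
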
Here is the explanation of the robots.txt rules generated dynamically by the Yoast plugin and served at ?robots=1.
Lines 1, 2, 10, and 11: Comments
These lines mark the start and end of the Yoast-managed block; crawlers ignore them.
Line 3:
User-agent: *
means the rules apply to all web crawlers.
Line 4:
Disallow: /?s=
This rule does not allow web crawlers to crawl internal site-search result pages, i.e. URLs containing the ?s= query parameter (for example, https://example.com/?s=keyword).
Line 5:
Disallow: /page/*/?s=
This rule does not allow web crawlers to crawl paginated search result pages matching the /page/*/?s= pattern, where * is a wildcard for the page number.
Line 6:
Disallow: /search/
This rule does not allow web crawlers to crawl the alternative search result URLs under the /search/ path.
Line 7:
Disallow: /wp-json/
This rule does not allow web crawlers to crawl the WordPress REST API endpoints under /wp-json/ (for example, https://example.com/wp-json/wp/v2/posts).
Line 8:
Disallow: /?rest_route=
This rule does not allow web crawlers to crawl the same REST API routes accessed through the ?rest_route= query parameter (for example, https://example.com/?rest_route=/wp/v2/posts), which WordPress uses when pretty permalinks are not available.
Line 9:
Sitemap: https://example.com/sitemap_index.xml
This is the location of your WordPress site's XML sitemap, which search engines use to discover and index the pages on your site.
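
As a quick check of what this stricter default set blocks and allows, here is a minimal sketch using Python's urllib.robotparser; the tested URLs are hypothetical examples. Note that urllib.robotparser matches rules by simple prefix and does not interpret the * wildcard in /page/*/?s= the way major search engines do, so that pattern is left out of the checks below.

import urllib.robotparser

# Directives from the Yoast block served at https://example.com/?robots=1
rules = """\
User-agent: *
Disallow: /?s=
Disallow: /page/*/?s=
Disallow: /search/
Disallow: /wp-json/
Disallow: /?rest_route=
Sitemap: https://example.com/sitemap_index.xml
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

def check(url):
    print(url, "->", parser.can_fetch("*", url))

check("https://example.com/")                          # True: normal pages stay crawlable
check("https://example.com/sample-post/")              # True
check("https://example.com/?s=shoes")                  # False: internal search results
check("https://example.com/search/shoes/")             # False: pretty search URLs
check("https://example.com/wp-json/wp/v2/posts")       # False: REST API
check("https://example.com/?rest_route=/wp/v2/posts")  # False: REST API via query string
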