Robots.txt Generator
What it does: Automatically generates a robots.txt file to control search engine crawling
File types: Works with all content files (.md, .html) and category definition files
Events:
- POST_GLOB (priority 150) - Scans files for robots metadata
- POST_LOOP (priority 100) - Generates robots.txt file
How It Works
- Scans all content files for robots metadata during site generation
- Collects paths that should be disallowed
- Generates robots.txt in the output directory after processing all files
- Automatically includes sitemap reference if SITE_BASE_URL is configured
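The two phases above can be sketched roughly as follows. This is an illustrative Python sketch, not StaticForge's actual implementation: the `RobotsCollector` class, its method names, and the shape of the `pages` mapping (URL path to frontmatter dict) are all assumptions made for the example.

```python
from pathlib import Path


class RobotsCollector:
    """Hypothetical two-phase collector mirroring the POST_GLOB / POST_LOOP split."""

    def __init__(self):
        self.disallowed = set()

    def scan(self, pages):
        # Phase 1 (POST_GLOB in the docs): remember every page marked robots=no.
        # `pages` is assumed to map a URL path to its frontmatter dict.
        for url_path, metadata in pages.items():
            if str(metadata.get("robots", "yes")).strip().lower() == "no":
                self.disallowed.add(url_path)

    def write(self, output_dir):
        # Phase 2 (POST_LOOP in the docs): emit robots.txt once all files are processed.
        lines = ["User-agent: *"]
        # An empty "Disallow:" rule means nothing is blocked.
        lines += [f"Disallow: {p}" for p in sorted(self.disallowed)] or ["Disallow:"]
        target = Path(output_dir) / "robots.txt"
        target.write_text("\n".join(lines) + "\n", encoding="utf-8")
        return target
```

Splitting the work this way means the file is written only once, after every page has had a chance to register itself as disallowed.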
Controlling Individual Pages
Add a robots field to any content file's frontmatter:
Disallow a Page (robots=no)
---
title: "Private Page"
robots: "no"
---
# This page won't be crawled by search engines
This content is hidden from search engines via robots.txt.
Allow a Page (robots=yes or omit field)
---
title: "Public Page"
robots: "yes"
---
# This page is visible to search engines
Default behavior - search engines can crawl this page.
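To make the per-page rule concrete, here is a minimal Python sketch of how a robots value might be read from frontmatter like the examples above. The function name `read_robots_flag` and the hand-rolled `key: value` parsing are assumptions for illustration; StaticForge's real parser may behave differently.

```python
def read_robots_flag(text):
    """Return True if the page may be crawled, False if its robots field is "no"."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return True  # no frontmatter: fall back to the documented default (allow)
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter, stop before the page body
        key, sep, value = line.partition(":")
        if sep and key.strip() == "robots":
            # Strip optional quotes and compare case-insensitively.
            return value.strip().strip("\"'").lower() != "no"
    return True  # field omitted: allowed by default
```

For the "Private Page" example this returns False, and for the "Public Page" example (or a page with no robots field) it returns True.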
Controlling Entire Categories
Create a category definition file with type=category and robots=no:
---
type: category
category: private
title: "Private Category"
robots: "no"
---
# Private Category
All files in this category will be disallowed in robots.txt.
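A category-level rule can be pictured as a single directory entry instead of one entry per page. The sketch below is hypothetical: the function name `category_rules` and, in particular, the assumption that a category's pages are published under `/<category>/` are not confirmed by this page.

```python
def category_rules(category_files):
    """Return directory-level Disallow paths for categories marked robots=no."""
    rules = []
    for meta in category_files:
        is_category = meta.get("type") == "category"
        blocked = str(meta.get("robots", "yes")).strip().lower() == "no"
        if is_category and blocked:
            # Assumption: a category's pages live under /<category>/ in the output.
            rules.append(f"/{meta['category']}/")
    return sorted(rules)
```

Because robots.txt rules match by path prefix, one directory rule with a trailing slash covers every file beneath it.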
Generated Robots.txt Examples
When You Have Pages with robots=no
# robots.txt generated by StaticForge
# 2025-01-15 10:30:00
User-agent: *
Disallow: /private-page.html
Disallow: /secret.html
Disallow: /private-category/
# Sitemap location
Sitemap: https://example.com/sitemap.xml
When All Pages Are Allowed
# robots.txt generated by StaticForge
# 2025-01-15 10:30:00
User-agent: *
# No disallowed paths
Disallow:
# Sitemap location
Sitemap: https://example.com/sitemap.xml
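Both outputs above follow the same template, which can be sketched in Python as shown here. The function `render_robots_txt` and its parameters are illustrative assumptions, and paths are emitted in the order given, since this page does not specify how pages and category directories are ordered relative to each other.

```python
from datetime import datetime


def render_robots_txt(disallowed, sitemap_url=None, now=None):
    """Build robots.txt text in the layout shown in the examples above."""
    now = now or datetime.now()
    lines = [
        "# robots.txt generated by StaticForge",
        f"# {now:%Y-%m-%d %H:%M:%S}",
        "User-agent: *",
    ]
    if disallowed:
        lines += [f"Disallow: {path}" for path in disallowed]
    else:
        # An empty Disallow value tells crawlers that nothing is blocked.
        lines += ["# No disallowed paths", "Disallow:"]
    if sitemap_url:
        lines += ["# Sitemap location", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"
```

Passing an empty list produces the "all pages allowed" variant, and omitting `sitemap_url` drops the sitemap lines entirely.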
Key Features
- Automatic: robots.txt is generated on every site build
- Smart defaults: Pages without a robots field default to "yes" (allow)
- Case insensitive: robots="NO", robots="No", and robots="no" all work
- Category support: Disallow entire categories via category definition files
- Sorted output: Paths are alphabetically sorted for consistency
- Sitemap reference: Automatically includes sitemap URL if configured
Configuration
No dedicated configuration is needed. The feature reads SITE_BASE_URL from your .env file to build the sitemap reference:
# .env
SITE_BASE_URL="https://example.com"
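As a rough illustration of how the sitemap reference could be derived from that setting, the Python sketch below parses .env text and builds the Sitemap line. The helper name `sitemap_line`, the simplified .env parsing, and the trailing-slash handling are assumptions, not documented StaticForge behavior.

```python
def sitemap_line(env_text):
    """Return the Sitemap line for robots.txt, or None if SITE_BASE_URL is unset."""
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "SITE_BASE_URL":
            # Assumption: drop surrounding quotes and any trailing slash.
            base = value.strip().strip("\"'").rstrip("/")
            return f"Sitemap: {base}/sitemap.xml" if base else None
    return None
```

With the .env shown above, this yields `Sitemap: https://example.com/sitemap.xml`; when the variable is missing or empty it returns None, matching the documented behavior of including the sitemap only if SITE_BASE_URL is configured.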
Why Use This Feature
- Privacy: Keep development/internal pages out of search results
- SEO control: Prevent duplicate content issues
- Compliance: Follow SEO best practices automatically
- No manual work: robots.txt updates automatically when you add/remove pages
Important Notes
- robots.txt is a request to crawlers, not an enforcement mechanism: compliant search engines honor it, but the pages remain publicly reachable
- For real security, use authentication or don't publish sensitive content
- The feature generates robots.txt in your output directory on every build
- Changes take effect when you deploy the updated robots.txt file