Robots.txt Generator

What it does: Automatically generates a robots.txt file that tells search engine crawlers which paths not to crawl

File types: Works with all content files (.md, .html) and category definition files

Events:

POST_GLOB (priority 150) - Scans files for robots metadata
POST_LOOP (priority 100) - Generates robots.txt file

How It Works

Scans all content files for robots metadata during site generation
Collects paths that should be disallowed
Generates robots.txt in the output directory after processing all files
Automatically includes sitemap reference if SITE_BASE_URL is configured
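
The steps above can be sketched roughly as follows. This is a minimal Python illustration, not StaticForge's actual implementation: the function names (is_disallowed, collect_disallowed, render_robots) and the shape of the pages mapping (output path to frontmatter dict) are assumptions made for the example, and the generated-by and timestamp comment lines are left out.

```python
from typing import Optional


def is_disallowed(metadata: dict) -> bool:
    """Return True when the frontmatter sets robots to "no" (any letter case)."""
    return str(metadata.get("robots", "yes")).strip().lower() == "no"


def collect_disallowed(pages: dict) -> list:
    """Scan step: pages maps an output path to its frontmatter (hypothetical shape)."""
    return sorted(path for path, meta in pages.items() if is_disallowed(meta))


def render_robots(disallowed: list, site_base_url: Optional[str] = None) -> str:
    """Generate step: build the robots.txt text from the collected paths."""
    lines = ["User-agent: *"]
    if disallowed:
        lines += [f"Disallow: {path}" for path in disallowed]
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    if site_base_url:
        lines += ["", f"Sitemap: {site_base_url.rstrip('/')}/sitemap.xml"]
    return "\n".join(lines) + "\n"
```

Keeping the scan and the render as separate functions mirrors the two events listed earlier: paths are collected first, and the file is written only after every content file has been seen.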

Controlling Individual Pages

Add a robots field to any content file's frontmatter:

Disallow a Page (robots=no)

---
title: "Private Page"
robots: "no"
---

# This page is disallowed for search engine crawlers

robots.txt asks crawlers not to fetch this page.

Allow a Page (robots=yes or omit the field)

---
title: "Public Page"
robots: "yes"
---

# This page is visible to search engines

Default behavior - search engines can crawl this page.

Controlling Entire Categories

Create a category definition file with type=category and robots=no:

---
type: category
category: private
title: "Private Category"
robots: "no"
---

# Private Category

All files in this category will be disallowed in robots.txt.
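
As a rough illustration of the category rule, the sketch below turns category definition metadata into directory-level Disallow paths. It assumes each category is published under a directory named after its category value, which this page does not confirm; category_disallow_paths is a hypothetical helper, not part of StaticForge.

```python
def category_disallow_paths(definitions: list) -> list:
    """Return sorted directory paths for category files marked robots=no.

    Assumption: a category is served from "/<category>/" in the output.
    """
    paths = set()
    for meta in definitions:
        # Only category definition files are considered here.
        if str(meta.get("type", "")).strip().lower() != "category":
            continue
        if str(meta.get("robots", "yes")).strip().lower() == "no":
            paths.add("/" + str(meta["category"]).strip("/") + "/")
    return sorted(paths)
```

A single directory rule covers every file in the category, so individual pages do not each need their own robots field.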

Generated Robots.txt Examples

When You Have Pages with robots=no

# robots.txt generated by StaticForge
# 2025-01-15 10:30:00

User-agent: *
Disallow: /private-category/
Disallow: /private-page.html
Disallow: /secret.html

# Sitemap location
Sitemap: https://example.com/sitemap.xml

When All Pages Are Allowed

# robots.txt generated by StaticForge
# 2025-01-15 10:30:00

User-agent: *
# No disallowed paths
Disallow:

# Sitemap location
Sitemap: https://example.com/sitemap.xml
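
One way to sanity-check a generated file is Python's standard urllib.robotparser module. The sketch below parses a copy of the rules from the first example and asks whether a given URL may be fetched. The allowed helper and the ExampleBot user agent are names made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# robots.txt generated by StaticForge
# 2025-01-15 10:30:00

User-agent: *
Disallow: /private-category/
Disallow: /private-page.html
Disallow: /secret.html

# Sitemap location
Sitemap: https://example.com/sitemap.xml
"""


def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt text lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

For instance, allowed(ROBOTS_TXT, "https://example.com/secret.html") reports that the page is blocked, while a file containing only an empty Disallow line permits every URL.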

Key Features

Automatic: robots.txt is generated on every site build
Smart defaults: Pages without robots field default to "yes" (allow)
Case insensitive: robots="NO", robots="No", and robots="no" all work
Category support: Disallow entire categories via category definition files
Sorted output: Paths are alphabetically sorted for consistency
Sitemap reference: Automatically includes sitemap URL if configured

Configuration

No feature-specific configuration is needed. The sitemap reference is built from SITE_BASE_URL in your .env file:

# .env
SITE_BASE_URL="https://example.com"
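
For illustration, the sketch below shows how a sitemap line could be derived from SITE_BASE_URL. It is a Python example under stated assumptions, not the feature's real code: read_env_value and sitemap_line are hypothetical helpers, and the parsing handles only simple KEY="value" lines.

```python
from typing import Optional


def read_env_value(env_text: str, key: str) -> Optional[str]:
    """Return the value for key from .env-style text, or None if absent."""
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            return value.strip().strip('"').strip("'")
    return None


def sitemap_line(env_text: str) -> Optional[str]:
    """Build the Sitemap line, or return None when SITE_BASE_URL is not set."""
    base = read_env_value(env_text, "SITE_BASE_URL")
    if not base:
        return None
    return f"Sitemap: {base.rstrip('/')}/sitemap.xml"
```

Returning None when the variable is missing matches the behavior described here: the sitemap reference appears only if SITE_BASE_URL is configured.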

Why Use This Feature

Privacy: Ask crawlers to skip development/internal pages so they are less likely to appear in search results
SEO control: Prevent duplicate content issues
Compliance: Follow SEO best practices automatically
No manual work: robots.txt updates automatically when you add/remove pages

Important Notes

robots.txt is a request that well-behaved crawlers honor, not an access control mechanism
For real security, use authentication or don't publish sensitive content
The feature generates robots.txt in your output directory on every build
Changes take effect when you deploy the updated robots.txt file
