Robots.txt Generator

What it does: Automatically generates a robots.txt file to control search engine crawling

File types: Works with all content files (.md, .html) and category definition files

Events:

How It Works

  1. Scans all content files for robots metadata during site generation
  2. Collects the paths that should be disallowed
  3. Writes robots.txt to the output directory after all files have been processed
  4. Adds a Sitemap reference if SITE_BASE_URL is configured
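
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not StaticForge's actual implementation: the function name build_robots_txt, the (output path, metadata) input shape, and the rule that any value other than "no" counts as allowed are all hypothetical.

```python
from datetime import datetime


def build_robots_txt(pages, base_url=None, now=None):
    """Return robots.txt text for a list of (output_path, metadata) pairs.

    Hypothetical helper: `pages`, `base_url`, and `now` are illustrative
    names, not part of StaticForge.
    """
    now = now or datetime.now()

    # Steps 1-2: scan the metadata and collect paths marked robots=no.
    # dict.fromkeys removes duplicates while keeping the original order.
    disallowed = list(dict.fromkeys(
        path for path, meta in pages
        if str(meta.get("robots", "yes")).lower() == "no"
    ))

    # Step 3: assemble the file body.
    lines = [
        "# robots.txt generated by StaticForge",
        f"# {now:%Y-%m-%d %H:%M:%S}",
        "",
        "User-agent: *",
    ]
    if disallowed:
        lines += [f"Disallow: {path}" for path in disallowed]
    else:
        lines += ["# No disallowed paths", "Disallow:"]

    # Step 4: add the sitemap reference only when a base URL is known.
    if base_url:
        lines += [
            "",
            "# Sitemap location",
            f"Sitemap: {base_url.rstrip('/')}/sitemap.xml",
        ]
    return "\n".join(lines) + "\n"
```

Calling build_robots_txt([("/secret.html", {"robots": "no"})], "https://example.com") would produce a file with a single Disallow line and a Sitemap line, in the same layout as the generated examples later on this page.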

Controlling Individual Pages

Add a robots field to any content file's frontmatter:

Disallow a Page (robots=no)

---
title: "Private Page"
robots: "no"
---

# This page won't be crawled by search engines

This content is disallowed for search engine crawlers via robots.txt.

Allow a Page (robots=yes or omit field)

---
title: "Public Page"
robots: "yes"
---

# This page is visible to search engines

Default behavior - search engines can crawl this page.

Controlling Entire Categories

Create a category definition file with type=category and robots=no:

---
type: category
category: private
title: "Private Category"
robots: "no"
---

# Private Category

All files in this category will be disallowed in robots.txt.

Generated Robots.txt Examples

When You Have Pages with robots=no

# robots.txt generated by StaticForge
# 2025-01-15 10:30:00

User-agent: *
Disallow: /private-page.html
Disallow: /secret.html
Disallow: /private-category/

# Sitemap location
Sitemap: https://example.com/sitemap.xml

When All Pages Are Allowed

# robots.txt generated by StaticForge
# 2025-01-15 10:30:00

User-agent: *
# No disallowed paths
Disallow:

# Sitemap location
Sitemap: https://example.com/sitemap.xml
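
One way to sanity-check a generated file is to parse it with Python's standard urllib.robotparser module and ask whether specific URLs may be fetched. The sketch below is an optional check, not part of StaticForge; the helper name is_crawlable is hypothetical, and the sample text mirrors the first example above.

```python
from urllib.robotparser import RobotFileParser

# Sample content mirroring the "pages with robots=no" example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-page.html
Disallow: /secret.html
Disallow: /private-category/

Sitemap: https://example.com/sitemap.xml
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if `user_agent` may fetch `url` under `robots_txt`.

    Hypothetical helper built on the standard-library parser.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/index.html",
        "https://example.com/secret.html",
        "https://example.com/private-category/page.html",
    ):
        print(url, is_crawlable(ROBOTS_TXT, url))
```

A Disallow rule ending in a slash, such as /private-category/, matches every path beneath that directory, while an empty Disallow: line (the second example) places no restriction on crawlers.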

Key Features

Configuration

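No feature-specific configuration is needed. The generator reads SITE_BASE_URL from your .env file to build the Sitemap line; if it is not set, the Sitemap reference is omitted: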

# .env
SITE_BASE_URL="https://example.com"

Why Use This Feature

Important Notes

