qsgen3/how-it-works.md
Stig-Ørjan Smelror 81ffa53d70 Add comprehensive theme documentation and improve migration script
- Add THEMES-HOWTO.md: Complete guide for creating and customizing themes
- Remove theme sections from how-it-works.md to avoid duplication
- Update migration script to place all blog posts in single directory
- Streamline documentation structure for better organization
2025-05-31 03:00:50 +02:00

# How qsgen3 Works
## Table of Contents
1. [Philosophy and Design Principles](#philosophy-and-design-principles)
2. [Project Structure](#project-structure)
3. [Configuration System](#configuration-system)
4. [Content Processing Pipeline](#content-processing-pipeline)
5. [Static File Handling](#static-file-handling)
6. [Template System](#template-system)
7. [Output Generation](#output-generation)
8. [Command Line Interface](#command-line-interface)
9. [Dependencies and Requirements](#dependencies-and-requirements)
10. [Detailed Workflow](#detailed-workflow)
11. [Troubleshooting and Debugging](#troubleshooting-and-debugging)
## Philosophy and Design Principles
### Core Philosophy
qsgen3 is designed to be **100% design-agnostic**. It does not impose any specific CSS frameworks, JavaScript libraries, or HTML structures on users. The generator's role is strictly to:
1. Process Markdown content into HTML
2. Combine content with user-chosen templates and styling
3. Generate a complete static site structure
### Key Principles
- **Minimal Dependencies**: Requires only Pandoc for content processing, plus Zsh to run the script itself
- **In-Memory Operations**: All content manipulation occurs in memory to improve performance and reduce storage wear
- **Flexible Theme System**: Supports easy switching between themes via configuration
- **Template Agnostic**: Works with any Pandoc-compatible HTML templates
- **No Forced Assets**: Only automatically links the main theme CSS; all other asset inclusion is explicit
## Project Structure
A typical qsgen3 project follows this structure:
```
project-root/
├── bin/
│   └── qsgen3              # Main generator script
├── site.conf               # Main configuration file
├── .qsgen3_preserve        # Optional: File preservation patterns
├── content/                # Markdown content
│   ├── posts/              # Blog posts
│   │   └── hello-world.md
│   └── pages/              # Static pages
├── layouts/                # Pandoc HTML templates
│   ├── index.html          # Homepage template
│   ├── post.html           # Blog post template
│   └── rss.xml             # RSS feed template
├── static/                 # Static assets (CSS, images, etc.)
│   ├── css/
│   └── images/
└── output/                 # Generated site (created by qsgen3)
    ├── index.html
    ├── rss.xml
    ├── posts/
    └── static/
```
## Configuration System
### Primary Configuration: `site.conf`
The `site.conf` file uses a simple key-value format:
```bash
# Site Metadata
site_name="My Awesome Site"
site_tagline="A brief description of my site"
site_url="http://localhost:8000"
# Directory Paths
paths_content_dir="content"
paths_output_dir="output"
paths_layouts_dir="layouts"
paths_static_dir="static"
# Build Options
build_options_generate_rss=true
build_options_generate_sitemap=true
build_options_process_drafts=false
```
### Configuration Loading Process
1. **File Location**: Defaults to `$PROJECT_ROOT/site.conf`
2. **Override**: Can be specified with `-c <file>` or `--config <file>`
3. **Parsing**: Simple line-by-line parsing of `key="value"` pairs
4. **Storage**: Values stored in `QSG_CONFIG` associative array
5. **Validation**: Basic validation for required keys and file existence
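
The parsing step can be illustrated with a short shell sketch. It is not the actual qsgen3 code: `config_get` is a hypothetical helper, and instead of filling the `QSG_CONFIG` associative array it looks up one key at a time so that it runs in any POSIX shell. It assumes one `key="value"` pair per line, as described above.

```bash
# Hypothetical helper: print the value stored for one key in a site.conf-style file.
# The body runs in a subshell, so its variables do not leak into the caller.
config_get() (
    config_file=$1 wanted=$2
    [ -f "$config_file" ] || { echo "Config not found: $config_file" >&2; exit 1; }
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;   # skip blank lines and comments
            *=*) ;;                # looks like key=value, keep going
            *) continue ;;         # anything else is ignored
        esac
        key=${line%%=*}            # text before the first "="
        value=${line#*=}           # text after the first "="
        value=${value#\"}          # strip one leading double quote
        value=${value%\"}          # strip one trailing double quote
        if [ "$key" = "$wanted" ]; then
            printf '%s\n' "$value"
            exit 0
        fi
    done < "$config_file"
    exit 1                         # key not found
)
```

For example, `config_get site.conf site_name` would print the site name without its surrounding quotes, and a missing key yields a non-zero exit status that a validation step could act on.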
### Key Configuration Variables
- **`site_name`**: Name of the site
- **`site_url`**: Base URL for the site (used in RSS and absolute links)
- **`paths_*`**: Directory paths (can be relative or absolute)
- **`build_options_*`**: Boolean flags for optional features
## Content Processing Pipeline
### Markdown Processing
1. **Discovery**: Recursively scans `paths_content_dir` for `.md` files
2. **Metadata Extraction**: Parses YAML front matter for post metadata
3. **Content Conversion**: Uses Pandoc to convert Markdown to HTML
4. **Template Application**: Applies appropriate template based on content type
5. **Output Generation**: Writes processed HTML to corresponding location in `output/`
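
As a rough illustration of steps 3 to 5, the sketch below wraps a single Pandoc call. The helper name `render_post` and the `PANDOC` override variable are assumptions made for this example, and the exact flags qsgen3 passes may differ; `--standalone`, `--template`, `--css`, and `--output` are standard Pandoc options.

```bash
# Hypothetical helper: convert one Markdown file with a template and a CSS link.
# PANDOC is an assumed override so the command can be swapped out when testing.
render_post() (
    input=$1 template=$2 css_url=$3 output=$4
    # Make sure the target directory exists before Pandoc writes to it.
    mkdir -p "$(dirname "$output")" || exit 1
    "${PANDOC:-pandoc}" "$input" --standalone \
        "--template=$template" "--css=$css_url" "--output=$output"
)
```

A call such as `render_post content/posts/hello-world.md layouts/post.html /static/css/style.css output/posts/hello-world.html` would then produce one post page.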
### Content Types
- **Posts**: Files in `content/posts/` → `output/posts/`
- **Pages**: Files in `content/pages/` → `output/`
- **Index**: Generated from post metadata → `output/index.html`
- **RSS**: Generated from post metadata → `output/rss.xml`
### Metadata Handling
Each Markdown file can include YAML front matter:
```yaml
---
title: "Post Title"
date: "2023-01-01"
author: "Author Name"
description: "Post description"
draft: false
---
# Post Content
Your markdown content here...
```
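
Pandoc reads this front matter itself when it renders a file, but a generator also needs a few values on its own, for example to skip drafts. The sketch below shows one possible way to pull a single top-level value out with `awk`. `front_matter_get` is a hypothetical helper, and it only handles simple `key: value` lines, not nested YAML.

```bash
# Hypothetical helper: print a top-level front matter value from a Markdown file.
front_matter_get() (
    file=$1 key=$2
    awk -v key="$key" '
        NR == 1 && $0 != "---" { exit }              # no front matter block
        NR > 1 && $0 == "---"  { exit }              # closing delimiter reached
        NR > 1 {
            prefix = key ":"
            if (index($0, prefix) == 1) {
                value = substr($0, length(prefix) + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim whitespace
                gsub(/^"|"$/, "", value)             # drop surrounding quotes
                print value
                exit
            }
        }
    ' "$file"
)
```

With the example file above, `front_matter_get post.md draft` would print `false`, which a build could compare against `build_options_process_drafts`.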
## Static File Handling
### Copy Strategy
Static files are copied in a specific order to handle theme overrides:
1. **Root Static Files**: Copy from `paths_static_dir` to `output/static/`
2. **Theme Static Files**: Copy from theme's static source to `output/static/`
3. **Override Behavior**: Theme files overwrite root files with same names
### Copy Implementation
- **Primary Tool**: `rsync` with `-av --delete` flags
- **Fallback**: `cp -R` if rsync is unavailable
- **Preservation**: Maintains directory structure and file permissions
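
A minimal sketch of the rsync-with-fallback idea follows. `copy_static` is a hypothetical name, and the real script may structure this differently.

```bash
# Hypothetical helper: copy the contents of one directory into another,
# preferring rsync and falling back to cp when rsync is not installed.
copy_static() (
    src=$1 dest=$2
    [ -d "$src" ] || { echo "Static source not found: $src" >&2; exit 1; }
    mkdir -p "$dest" || exit 1
    if command -v rsync >/dev/null 2>&1; then
        # Trailing slashes make rsync copy the directory contents.
        rsync -av --delete "$src/" "$dest/"
    else
        # "src/." copies the contents, including dotfiles.
        cp -R "$src/." "$dest/"
    fi
)
```

Keep in mind that `--delete` removes destination files that are absent from the source. In a two-pass copy like the one described under Copy Strategy, a sketch of this kind would need to drop `--delete` for the theme pass if root static files are meant to survive.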
### CSS File Linking
1. **Availability**: Theme CSS files are copied to `output/static/`
2. **Verification**: Script checks for CSS file existence after copying
3. **Pandoc Integration**: CSS path passed to Pandoc via `--css` flag
4. **Path Format**: Uses site-root-relative paths (e.g., `/static/css/style.css`)
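
The verification and path-format steps could look roughly like this. `css_arg` is a hypothetical helper, and the sketch assumes the stylesheet path is given relative to `output/static/`.

```bash
# Hypothetical helper: print the Pandoc --css argument for a theme stylesheet,
# but only if the file really exists in the output tree after copying.
css_arg() (
    output_dir=$1 css_rel=$2     # css_rel is e.g. "css/style.css"
    if [ -f "$output_dir/static/$css_rel" ]; then
        # Site-root-relative URL, as used in the generated HTML.
        printf '%s\n' "--css=/static/$css_rel"
    else
        echo "Warning: CSS file not found: $output_dir/static/$css_rel" >&2
        exit 1
    fi
)
```

Calling `css_arg output css/style.css` would print `--css=/static/css/style.css` when the file is present and fail with a warning otherwise.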
## Template System
### Template Types
qsgen3 uses Pandoc templates with specific purposes:
- **`index.html`**: Homepage template (receives post list metadata)
- **`post.html`**: Individual post template (receives post content and metadata)
- **`rss.xml`**: RSS feed template (receives post list for syndication)
### Template Variables
Templates receive data through Pandoc's variable system:
#### Post Templates
- `$title$`: Post title from front matter
- `$date$`: Post date
- `$author$`: Post author
- `$body$`: Converted HTML content
- Custom variables from YAML front matter
#### Index Template
- `$site_name$`: From site.conf
- `$site_tagline$`: From site.conf
- `$posts$`: Array of post metadata for listing
#### RSS Template
- `$site_url$`: Base URL for absolute links
- `$posts$`: Array of post data with URLs and content
### Template Resolution
1. **Theme Override**: If theme provides templates, use theme's `layouts/`
2. **Default**: Use project's `layouts/` directory
3. **Fallback**: Error if required template not found
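
The lookup order above can be sketched as a small function. `resolve_template` and its argument order are assumptions for illustration only.

```bash
# Hypothetical helper: print the first existing template path,
# checking the theme layouts directory before the project layouts directory.
resolve_template() (
    name=$1 theme_layouts=$2 project_layouts=$3
    for dir in "$theme_layouts" "$project_layouts"; do
        # An empty directory argument means "no theme configured".
        if [ -n "$dir" ] && [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            exit 0
        fi
    done
    echo "Error: template not found: $name" >&2
    exit 1
)
```

Passing an empty string as the theme directory makes the function fall straight through to the project's `layouts/`.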
## Output Generation
### Directory Structure
Generated output maintains a clean, predictable structure:
```
output/
├── index.html              # Homepage
├── rss.xml                 # RSS feed
├── posts/                  # Individual post pages
│   └── post-name.html
├── static/                 # All static assets
│   ├── css/                # Stylesheets
│   ├── js/                 # JavaScript (if provided by theme)
│   └── images/             # Images and media
└── css/                    # Legacy: Index-specific CSS location
    └── theme.css           # Copy of main theme CSS for index page
```
### File Naming
- **Posts**: `content/posts/hello-world.md` → `output/posts/hello-world.html`
- **Pages**: `content/pages/about.md` → `output/about.html`
- **Index**: Generated → `output/index.html`
- **RSS**: Generated → `output/rss.xml`
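
The mapping for posts and pages comes down to a few parameter expansions. The following sketch uses a hypothetical `output_path_for` helper and assumes the source file path starts with the content directory.

```bash
# Hypothetical helper: map a Markdown source path to its output HTML path.
output_path_for() (
    content_dir=$1 output_dir=$2 source_file=$3
    rel=${source_file#"$content_dir"/}     # e.g. posts/hello-world.md
    base=${rel%.md}.html                   # swap the extension
    case $rel in
        # Pages are written to the output root, without the pages/ prefix.
        pages/*) printf '%s\n' "$output_dir/${base#pages/}" ;;
        # Everything else keeps its relative directory.
        *)       printf '%s\n' "$output_dir/$base" ;;
    esac
)
```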
### URL Structure
- **Posts**: `/posts/post-name.html`
- **Pages**: `/page-name.html`
- **Static Assets**: `/static/path/to/asset`
- **CSS**: `/static/css/style.css` (for posts), `/css/theme.css` (for index)
## Command Line Interface
### Basic Usage
```bash
./bin/qsgen3 [options]
```
### Available Options
- **`-h, --help`**: Display usage information and exit
- **`-V, --version`**: Show script name and version, then exit
- **`-c <file>, --config <file>`**: Specify custom configuration file path
### Path Resolution
- **`PROJECT_ROOT`**: Defaults to current working directory (`$PWD`)
- **`CONFIG_FILE`**: Defaults to `$PROJECT_ROOT/site.conf`
- **Relative Paths**: Configuration file path can be relative to project root
### Exit Codes
- **0**: Successful generation
- **1**: Error (missing dependencies, configuration issues, processing failures)
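
To make the option handling and exit codes concrete, here is a sketch of an argument parser that prints the resolved configuration path. `parse_args`, the usage text, and the version placeholder are illustrative assumptions, not output copied from qsgen3.

```bash
# Hypothetical sketch of option handling; prints the resolved config file path.
parse_args() (
    project_root=$PWD
    config_file=$project_root/site.conf
    while [ $# -gt 0 ]; do
        case $1 in
            -h|--help)    echo "Usage: qsgen3 [-h] [-V] [-c <file>]"; exit 0 ;;
            -V|--version) echo "qsgen3 (version placeholder)"; exit 0 ;;
            -c|--config)
                [ $# -ge 2 ] || { echo "Error: $1 requires a file argument" >&2; exit 1; }
                config_file=$2
                shift ;;
            *) echo "Error: unknown option: $1" >&2; exit 1 ;;
        esac
        shift
    done
    # Treat a relative config path as relative to the project root.
    case $config_file in
        /*) ;;
        *)  config_file=$project_root/$config_file ;;
    esac
    printf '%s\n' "$config_file"
)
```

Every error path exits with status 1 and every successful path with status 0, matching the two exit codes listed above.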
## Dependencies and Requirements
### Required Dependencies
- **Pandoc**: Core dependency for Markdown processing and HTML generation
- **Zsh**: Shell interpreter (script written in Zsh)
### Optional Dependencies
- **rsync**: Preferred tool for efficient file copying (falls back to `cp`)
### System Requirements
- **Operating System**: Linux/Unix-like systems
- **File System**: Support for standard Unix file permissions
- **Memory**: Minimal requirements (all processing in memory)
### Environment Setup
The script configures a consistent environment:
```bash
LC_ALL=C
LANG=C
umask 0022
```
## Detailed Workflow
### 1. Initialization Phase
```
Start qsgen3
├── Parse command line arguments
├── Set PROJECT_ROOT (default: $PWD)
├── Determine CONFIG_FILE path
├── Set environment variables (LC_ALL, LANG, umask)
└── Initialize QSG_CONFIG array
```
### 2. Configuration Loading
```
Load Configuration
├── Check if CONFIG_FILE exists
├── Parse key="value" pairs line by line
├── Strip quotes from values
├── Store in QSG_CONFIG associative array
└── Validate required configuration keys
```
### 3. Dependency Checking
```
Check Dependencies
├── Verify Pandoc is available
├── Check Pandoc version compatibility
├── Verify other required tools
└── Exit with error if dependencies missing
```
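
A dependency check of this kind usually relies on `command -v`. The sketch below uses a hypothetical `check_dependencies` helper and leaves out the Pandoc version comparison, since the required version is not stated here.

```bash
# Hypothetical helper: report every missing tool, then fail if any was missing.
check_dependencies() (
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Error: required tool not found: $tool" >&2
            missing=1
        fi
    done
    exit "$missing"
)
```

A script could then run `check_dependencies pandoc || exit 1` before doing any work.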
### 4. Output Preparation
```
Prepare Output Directory
├── Check for .qsgen3_preserve file in project root
├── If preserve file exists:
│ ├── Read file patterns (shell glob patterns)
│ ├── Create temporary backup directory
│ ├── Find and backup matching files from output directory
│ ├── Remove entire output directory
│ ├── Recreate clean output directory
│ ├── Restore preserved files maintaining directory structure
│ └── Clean up temporary backup directory
├── If no preserve file:
│ ├── Remove entire output directory
│ └── Create fresh output directory
└── Log preservation and cleaning operations
```
#### File Preservation System
qsgen3 supports preserving specific files during the cleaning process to handle cases where content has been shared or bookmarked and should remain accessible even after title changes.
**Preserve File Format (`.qsgen3_preserve`):**
- Located in project root directory
- One pattern per line using shell glob patterns (`*`, `?`, `[]`)
- Lines starting with `#` are comments
- Empty lines are ignored
- Patterns are relative to the output directory
**Example preserve patterns:**
```bash
# Preserve specific shared articles
posts/my-important-shared-article.html
posts/viral-blog-post.html
# Preserve files by pattern
posts/legacy-*.html
archive/*
# Preserve all PDFs and downloads
*.pdf
downloads/*
```
**Benefits:**
- Maintains stable URLs for shared content
- Prevents broken links when content is renamed
- Flexible pattern matching for various preservation needs
- Backward compatible (no preserve file = complete cleaning)
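
The backup, clean, and restore cycle can be sketched as follows. Both helpers are hypothetical, and the matching is done with the shell's `case` statement, where `*` also matches `/`, so `*.pdf` matches PDFs at any depth; qsgen3's own matching may behave differently. In Zsh the pattern variable would need to be written as `$~pattern` to be treated as a glob.

```bash
# Hypothetical helper: succeed if a path (relative to the output directory)
# matches any pattern listed in the preserve file.
is_preserved() (
    rel_path=$1 preserve_file=$2
    [ -f "$preserve_file" ] || exit 1
    while IFS= read -r pattern || [ -n "$pattern" ]; do
        case $pattern in
            ''|'#'*) continue ;;          # skip empty lines and comments
        esac
        # Unquoted on purpose: case treats $pattern as a glob here.
        case $rel_path in
            $pattern) exit 0 ;;
        esac
    done < "$preserve_file"
    exit 1
)

# Hypothetical helper: wipe the output directory but keep preserved files.
clean_output() (
    output_dir=$1 preserve_file=$2
    [ -n "$output_dir" ] || exit 1        # never run rm -rf on an empty path
    backup_dir=$(mktemp -d) || exit 1
    if [ -d "$output_dir" ] && [ -f "$preserve_file" ]; then
        find "$output_dir" -type f | while IFS= read -r file; do
            rel=${file#"$output_dir"/}
            if is_preserved "$rel" "$preserve_file"; then
                mkdir -p "$backup_dir/$(dirname "$rel")"
                cp -p "$file" "$backup_dir/$rel"
            fi
        done
    fi
    rm -rf "$output_dir"
    mkdir -p "$output_dir"
    cp -R "$backup_dir/." "$output_dir/"  # restore, keeping directory structure
    rm -rf "$backup_dir"
)
```

Without a preserve file the backup directory stays empty, so the function reduces to a plain remove and recreate, which mirrors the backward-compatible behavior described above.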
### 5. Static File Processing
```
Copy Static Files
├── Copy from paths_static_dir to output/static/
│ ├── Use rsync -av --delete if available
│ └── Fallback to cp -R
├── Copy from theme static source to output/static/
│ ├── Theme files overwrite root files
│ └── Preserve directory structure
└── Log copy operations and results
```
### 6. CSS Path Determination
```
Determine CSS Linking
├── Read site_theme_css_file from configuration
├── Construct expected CSS file path in output/static/
├── Verify CSS file exists after copying
├── Set QSG_CONFIG[pandoc_css_path_arg] for Pandoc
└── Log CSS path decisions and warnings
```
### 7. Content Processing
```
Process Markdown Content
├── Scan paths_content_dir recursively for .md files
├── For each Markdown file:
│ ├── Extract YAML front matter
│ ├── Determine output path and template
│ ├── Run Pandoc with appropriate template and CSS
│ ├── Write generated HTML to output directory
│ └── Log processing results
└── Collect metadata for index and RSS generation
```
### 8. Index Generation
```
Generate Index Page
├── Collect all post metadata
├── Create YAML metadata file for Pandoc
├── Run Pandoc with index template
├── Apply CSS styling
├── Write output/index.html
└── Clean up temporary files
```
### 9. RSS Generation
```
Generate RSS Feed
├── Collect post metadata with URLs
├── Create YAML metadata for RSS template
├── Run Pandoc with RSS template
├── Generate absolute URLs using site_url
├── Write output/rss.xml
└── Clean up temporary files
```
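
Building absolute URLs from `site_url` mostly means joining two strings without doubling the slash. A minimal sketch, with `absolute_url` as a hypothetical helper name:

```bash
# Hypothetical helper: join the configured site URL and a site-relative path.
absolute_url() (
    site_url=${1%/}      # drop one trailing slash, if present
    path=${2#/}          # drop one leading slash, if present
    printf '%s/%s\n' "$site_url" "$path"
)
```

With the example configuration, `absolute_url "http://localhost:8000" "/posts/hello-world.html"` yields `http://localhost:8000/posts/hello-world.html`.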
### 10. Finalization
```
Complete Generation
├── Log final directory structure
├── Report generation success
├── Clean up any remaining temporary files
└── Exit with status code 0
```
## Troubleshooting and Debugging
### Common Issues
#### 1. CSS Not Applied
**Symptoms**: Generated HTML doesn't show theme styling
**Causes**:
- Incorrect `site_theme_css_file` path in site.conf
- CSS file doesn't exist in theme's static assets
- Theme static directory structure mismatch
**Solutions**:
- Verify CSS file path relative to theme's static source
- Check theme directory structure
- Enable debug logging to trace CSS path resolution
#### 2. Template Errors
**Symptoms**: Pandoc errors during HTML generation
**Causes**:
- Missing required templates
- Template syntax errors
- Incompatible template variables
**Solutions**:
- Verify all required templates exist
- Check Pandoc template syntax
- Review template variable usage
#### 3. Static File Copy Issues
**Symptoms**: Assets missing from output directory
**Causes**:
- Permission issues
- Disk space problems
- Path resolution errors
**Solutions**:
- Check file permissions
- Verify available disk space
- Review path configurations for absolute vs. relative paths
#### 4. File Preservation Issues
**Symptoms**: Expected files not preserved during cleaning, or preservation not working
**Causes**:
- Incorrect patterns in `.qsgen3_preserve` file
- File paths don't match patterns
- Permission issues with temporary backup directory
- Malformed preserve file format
**Solutions**:
- Verify patterns use shell glob syntax (`*`, `?`, `[]`)
- Check that patterns are relative to output directory
- Ensure `.qsgen3_preserve` file is in project root
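- Test patterns with `find output -path "output/pattern"` before adding them to the preserve file (`-name` only compares the file name, so it never matches a pattern that contains `/`)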
- Enable debug logging to see preservation process details
- Verify file permissions allow temporary directory creation
**Example debugging:**
```bash
# Test if your pattern matches files (-path compares the whole path, -name only the file name)
find output -path "output/posts/legacy-*.html"
# Enable debug logging to see preservation process
QSG_DEBUG=1 ./bin/qsgen3
```
### Debug Logging
For more detail than the standard messages provide, modify the `_log` function or add debug statements to the script. To enable debug logging for a single run, set `QSG_DEBUG=1`:
```bash
# Enable debug logging
QSG_DEBUG=1 ./bin/qsgen3
```
### Path Debugging
The script includes path resolution logic to handle both relative and absolute paths. If experiencing path issues:
1. Check that `PROJECT_ROOT` is correctly set
2. Verify configuration paths are relative to project root
3. Review log messages for path construction details
### Configuration Validation
Ensure site.conf follows the correct format:
- Use double quotes for string values: `key="value"` (the example `site.conf` above leaves boolean flags such as `true` unquoted)
- No spaces around the equals sign
- One configuration per line
- Comments start with `#`
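
These rules are easy to check mechanically. The sketch below is a hypothetical linter, not part of qsgen3. It accepts comments, blank lines, `key="value"` pairs, and bare unquoted values such as `true`, and prints every other line with its line number.

```bash
# Hypothetical linter: list lines that break the site.conf format rules.
# Exits with status 1 if any offending line was found.
lint_config() (
    config_file=$1
    awk '
        /^[ \t]*$/ || /^[ \t]*#/ { next }    # blank lines and comments are fine
        # key="value" or a bare value such as true, with no spaces around "="
        /^[A-Za-z_][A-Za-z0-9_]*=("[^"]*"|[^" \t]+)$/ { next }
        { print NR ": " $0; bad = 1 }
        END { exit bad + 0 }
    ' "$config_file"
)
```

Running `lint_config site.conf` before a build gives a quick list of lines to fix, for instance entries with spaces around the equals sign.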
---