Redesign docs with Apple-native theme; verify content; route CI to self-hosted runner-02

- VitePress: custom theme (SF system fonts, glass nav, soft surfaces, pill buttons,
  light/dark code blocks, refined feature cards, platform showcase + stat strip).
- Replace every emoji across docs and README with inline SVG icons.
- Verify and fix doc accuracy against actual scripts: JSON schema (category+pattern only),
  env-var configuration for json2*/import_* scripts, owasp2json CLI surface.
- Add public assets (logo.svg, favicon.svg, hero-shield.svg) and Shiki haproxy alias.
- Workflows default to self-hosted runner-02 with a configurable fallback to GitHub
  runners via the RUNS_ON repo variable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fabrizio Salmi
2026-05-01 08:07:04 +02:00
parent 4575736fed
commit 5c654b3da8
32 changed files with 1643 additions and 839 deletions

@@ -1,223 +1,236 @@
# API Reference
# API & Scripts Reference
This page documents the Python scripts that power the Patterns project.
Patterns is a small Python toolchain. Every script does one job and communicates with the rest through plain JSON or files on disk &mdash; no shared state, no daemon, no database.
## Core Scripts
### owasp2json.py
Fetches and parses OWASP Core Rule Set patterns from GitHub.
```bash
python owasp2json.py
```text
owasp2json.py ──▶ owasp_rules.json ──▶ json2{nginx,apache,traefik,haproxy}.py
└▶ badbots.py (independent)
```
**Output**: `owasp_rules.json`
All scripts are configured through **environment variables** (not CLI flags) except `owasp2json.py`, which has a small `argparse` interface.
**Configuration**:
- Uses environment variable `OWASP_REPO` to specify source repository
- Default: `coreruleset/coreruleset`
## Pipeline scripts
**Features**:
- Fetches latest CRS rules from GitHub
- Parses `.conf` files for regex patterns
- Extracts rule metadata (ID, severity, category)
- Outputs structured JSON for conversion scripts
### `owasp2json.py`
Fetches the OWASP Core Rule Set from GitHub and emits a flat JSON rule list.
```bash
python owasp2json.py --ref v4.0 --output owasp_rules.json
```
| Argument / env | Default | Purpose |
|----------------|---------|---------|
| `--output` | `owasp_rules.json` | Output JSON path |
| `--ref` | `v4.0` | Tag prefix to resolve (e.g. `v4.0`, `v3.3`, `dev`) |
| `--dry-run` | off | Fetch and parse without writing |
| `GITHUB_TOKEN` (env) | unset | Raises the GitHub API rate limit while iterating |
The script verifies each blob's SHA against the GitHub-reported value before parsing it.
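The check described above can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes the value being compared is the standard Git blob SHA-1 (the hash of a `blob <size>\0` header followed by the file bytes), and the helper names `git_blob_sha1` and `blob_matches` are hypothetical.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a blob holding `content`."""
    # Git hashes a small header ("blob <byte length>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def blob_matches(content: bytes, reported_sha: str) -> bool:
    """Hypothetical helper: compare downloaded bytes with the reported SHA."""
    return git_blob_sha1(content) == reported_sha.strip().lower()
```

A mismatch would indicate a truncated or altered download, so the file can be skipped before any regex extraction runs.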
---
### json2nginx.py
### `json2nginx.py`
Converts OWASP JSON rules to Nginx WAF configuration.
Converts `owasp_rules.json` into Nginx `map`-based rules.
```bash
python json2nginx.py
INPUT_FILE=custom.json OUTPUT_DIR=/tmp/out python json2nginx.py
```
**Input**: `owasp_rules.json`
**Output**: `waf_patterns/nginx/`
**Generated files** (in `OUTPUT_DIR`):
**Generated Files**:
| File | Purpose |
|------|---------|
| `waf_maps.conf` | Map directives (http block) |
| `waf_rules.conf` | If statements (server block) |
| `README.md` | Integration instructions |
| `waf_maps.conf` | `map` directives &mdash; include in the `http` block |
| `waf_rules.conf` | `if` rules &mdash; include in the `server` block |
| `<category>.conf` | One file per OWASP category, **for inspection only** |
| `README.md` | In-tree usage notes |
**Environment Variables**:
- `INPUT_FILE` - Path to OWASP JSON (default: `owasp_rules.json`)
- `OUTPUT_DIR` - Output directory (default: `waf_patterns/nginx`)
| Env var | Default |
|---------|---------|
| `INPUT_FILE` | `owasp_rules.json` |
| `OUTPUT_DIR` | `waf_patterns/nginx` |
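For readers unfamiliar with env-var configuration, the lookup presumably resembles the sketch below. The function name `load_converter_config` is hypothetical and the real script may structure this differently; only the variable names and defaults come from the table above.

```python
import os
from pathlib import Path


def load_converter_config(env=None):
    """Resolve INPUT_FILE and OUTPUT_DIR, falling back to the documented defaults."""
    # Accept an explicit mapping so the function can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    input_file = Path(env.get("INPUT_FILE", "owasp_rules.json"))
    output_dir = Path(env.get("OUTPUT_DIR", "waf_patterns/nginx"))
    return input_file, output_dir
```

Because unset variables fall back to the defaults, a bare `python json2nginx.py` and an invocation with both variables overridden go through the same code path.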
---
### json2apache.py
### `json2apache.py`
Converts OWASP JSON rules to Apache ModSecurity format.
Converts `owasp_rules.json` into ModSecurity `SecRule` directives, partitioned by attack category.
```bash
python json2apache.py
```
**Input**: `owasp_rules.json`
**Output**: `waf_patterns/apache/`
**Generated files**: one `<category>.conf` per OWASP category (`sqli.conf`, `xss.conf`, `rce.conf`, `lfi.conf`, …) &mdash; each contains pure ModSecurity rules ready to `Include`.
**Generated Files**:
- Category-specific `.conf` files (sqli.conf, xss.conf, etc.)
- Each file contains ModSecurity `SecRule` directives
| Env var | Default |
|---------|---------|
| `INPUT_FILE` | `owasp_rules.json` |
| `OUTPUT_DIR` | `waf_patterns/apache` |
---
### json2traefik.py
### `json2traefik.py`
Converts OWASP JSON rules to Traefik middleware configuration.
Converts `owasp_rules.json` into a Traefik file-provider middleware.
```bash
python json2traefik.py
```
**Input**: `owasp_rules.json`
**Output**: `waf_patterns/traefik/`
**Generated files**:
**Generated Files**:
- `middleware.toml` - Traefik middleware configuration
- `README.md` - Integration instructions
- `middleware.toml` &mdash; complete WAF middleware definition
- `README.md` &mdash; in-tree integration notes
| Env var | Default |
|---------|---------|
| `INPUT_FILE` | `owasp_rules.json` |
| `OUTPUT_DIR` | `waf_patterns/traefik` |
---
### json2haproxy.py
### `json2haproxy.py`
Converts OWASP JSON rules to HAProxy ACL format.
Converts `owasp_rules.json` into HAProxy ACL files.
```bash
python json2haproxy.py
```
**Input**: `owasp_rules.json`
**Output**: `waf_patterns/haproxy/`
**Generated files**:
**Generated Files**:
- `waf.acl` - Main WAF ACL rules
- `README.md` - Integration instructions
- `waf.acl` &mdash; one regex per line, designed for `-f /etc/haproxy/waf.acl`
- `README.md` &mdash; in-tree integration notes
| Env var | Default |
|---------|---------|
| `INPUT_FILE` | `owasp_rules.json` |
| `OUTPUT_DIR` | `waf_patterns/haproxy` |
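The "one regex per line" layout of `waf.acl` can be illustrated with a short sketch. The function `render_acl` is hypothetical, and the de-duplication and blank-line filtering are assumptions rather than documented behaviour of `json2haproxy.py`.

```python
def render_acl(patterns):
    """Render patterns as ACL file content, one regex per line."""
    seen = set()
    lines = []
    for pattern in patterns:
        line = pattern.strip()
        # Skip empty entries, multi-line patterns, and duplicates (assumed policy).
        if not line or "\n" in line or line in seen:
            continue
        seen.add(line)
        lines.append(line)
    return "".join(line + "\n" for line in lines)
```

Keeping each pattern on its own line is what lets HAProxy load the file through the `-f` flag mentioned above.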
---
### badbots.py
### `badbots.py`
Generates bad bot blocking configurations from public bot lists.
Independently fetches public bad-bot User-Agent lists and emits a `bots.*` file in each platform output directory.
```bash
python badbots.py
```
**Output**: Bot configurations in each `waf_patterns/*/` directory
**Generated files** (per platform):
**Features**:
- Fetches from multiple public bot lists
- Includes fallback sources for reliability
- Generates platform-specific configs
| Platform | File |
|----------|------|
| Nginx | `waf_patterns/nginx/bots.conf` |
| Apache | `waf_patterns/apache/bots.conf` |
| Traefik | `waf_patterns/traefik/bots.toml` |
| HAProxy | `waf_patterns/haproxy/bots.acl` |
---
| Env var | Purpose |
|---------|---------|
| `GITHUB_TOKEN` | Raises the GitHub API rate limit when fetching upstream lists |
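As an illustration of how an optional token is typically attached to GitHub API requests, consider the sketch below. The helper `github_headers` is hypothetical; the script itself may build its headers differently.

```python
import os


def github_headers(env=None):
    """Build request headers, adding authentication only when a token is set."""
    env = os.environ if env is None else env
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests get a higher rate limit than anonymous ones.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Leaving `GITHUB_TOKEN` unset still works; requests are simply sent anonymously and are subject to the lower limit.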
## Import Scripts
If a remote source is unreachable, the script falls back to a bundled list.
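The fallback behaviour can be sketched like this. It is an assumption-laden illustration: `load_bot_list`, the injected `fetch` callable, and the placeholder entries in `BUNDLED_BOTS` are all hypothetical and do not reflect the script's real names or data.

```python
from typing import Callable, Iterable

# Hypothetical placeholder entries standing in for the bundled list.
BUNDLED_BOTS = ["ExampleBadBot", "ExampleScraper"]


def load_bot_list(sources: Iterable[str], fetch: Callable[[str], list[str]]) -> list[str]:
    """Return the first non-empty remote list, or the bundled list if all fail."""
    for url in sources:
        try:
            bots = fetch(url)
        except OSError:
            # Network errors from urllib and requests derive from OSError.
            continue
        if bots:
            return bots
    return list(BUNDLED_BOTS)
```

Passing the fetcher in as an argument keeps the fallback logic testable without network access.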
These scripts help import existing WAF configurations.
## Import / install scripts
### import_nginx_waf.py
The `import_*.py` scripts copy generated files into a server's runtime configuration directory and (optionally) splice an `Include` line into the main config. They are configured **entirely** through environment variables.
Import Nginx WAF patterns from external sources.
### `import_nginx_waf.py`
```bash
python import_nginx_waf.py --source /path/to/external/rules
```
| Env var | Default |
|---------|---------|
| `WAF_DIR` | `waf_patterns/nginx` |
| `NGINX_WAF_DIR` | `/etc/nginx/waf/` |
| `NGINX_CONF` | `/etc/nginx/nginx.conf` |
| `BACKUP_DIR` | `/etc/nginx/waf_backup/` |
### import_apache_waf.py
### `import_apache_waf.py`
Import Apache ModSecurity rules.
| Env var | Default |
|---------|---------|
| `WAF_DIR` | `waf_patterns/apache` |
| `APACHE_WAF_DIR` | `/etc/modsecurity.d/` |
| `APACHE_CONF` | `/etc/apache2/apache2.conf` |
| `BACKUP_DIR` | `/etc/modsecurity.d/backup` |
```bash
python import_apache_waf.py --source /path/to/modsec/rules
```
### `import_traefik_waf.py`
### import_traefik_waf.py
| Env var | Default |
|---------|---------|
| `WAF_DIR` | `waf_patterns/traefik` |
| `TRAEFIK_WAF_DIR` | `/etc/traefik/waf/` |
| `TRAEFIK_DYNAMIC_CONF` | `/etc/traefik/dynamic.toml` |
| `BACKUP_DIR` | `/etc/traefik/waf_backup/` |
Import Traefik middleware configurations.
### `import_haproxy_waf.py`
```bash
python import_traefik_waf.py --source /path/to/traefik/config
```
| Env var | Default |
|---------|---------|
| `WAF_DIR` | `waf_patterns/haproxy` |
| `HAPROXY_WAF_DIR` | `/etc/haproxy/waf/` |
| `HAPROXY_CONF` | `/etc/haproxy/haproxy.cfg` |
| `BACKUP_DIR` | `/etc/haproxy/waf_backup/` |
### import_haproxy_waf.py
::: warning Privileged paths
The defaults point at system directories (`/etc/...`). Run the import scripts as root, or override every env var to point at a sandbox before running them.
:::
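One way to follow the sandbox advice is to build the overrides in code before launching an import script. The sketch below uses the Nginx variable names from the table above; the helper `sandbox_env` is hypothetical, and whether `import_nginx_waf.py` tolerates a missing or empty `nginx.conf` in the sandbox is not confirmed here.

```python
import os
from pathlib import Path


def sandbox_env(root: Path, base_env=None) -> dict:
    """Return a copy of the environment with every import path redirected under `root`."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "WAF_DIR": "waf_patterns/nginx",
        "NGINX_WAF_DIR": str(root / "waf"),
        "NGINX_CONF": str(root / "nginx.conf"),
        "BACKUP_DIR": str(root / "waf_backup"),
    })
    return env


# Possible usage (not executed here):
#   import subprocess, sys, tempfile
#   with tempfile.TemporaryDirectory() as tmp:
#       subprocess.run([sys.executable, "import_nginx_waf.py"],
#                      env=sandbox_env(Path(tmp)), check=True)
```

Overriding all four variables together matters: leaving any one at its default would still point the script at a path under `/etc`.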
Import HAProxy ACL rules.
## Data format
```bash
python import_haproxy_waf.py --source /path/to/haproxy/acl
```
### `owasp_rules.json`
---
## Data Structures
### owasp_rules.json Format
A flat JSON array. Each item is a single rule with two required fields:
```json
[
{
"id": "942100",
"pattern": "(?i:union.*select)",
"category": "sqli",
"severity": "critical",
"location": "request-uri",
"description": "SQL Injection Attack Detected"
"category": "SQLI",
"pattern": "(?i:union[\\s\\S]+select)"
},
{
"category": "XSS",
"pattern": "(?i:<script[^>]*>)"
}
]
```
**Fields**:
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | OWASP CRS rule ID |
| `pattern` | string | Regex pattern |
| `category` | string | Attack category (sqli, xss, rce, etc.) |
| `severity` | string | critical, high, medium, low |
| `location` | string | Where to match (request-uri, headers, etc.) |
| `description` | string | Human-readable description |
| `category` | string | OWASP CRS category derived from the source filename (e.g. `SQLI`, `XSS`, `RCE`, `LFI`, `RFI`, `BOTS`) |
| `pattern` | string | Regex extracted from the matching `SecRule` directive |
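A consumer of this file could check the two-field shape before using it. The sketch below is a minimal example; `load_rules` is a hypothetical helper and not part of the toolchain.

```python
import json


def load_rules(text: str) -> list[dict]:
    """Parse rule JSON and confirm every item has string `category` and `pattern` fields."""
    rules = json.loads(text)
    if not isinstance(rules, list):
        raise ValueError("expected a JSON array of rules")
    for index, rule in enumerate(rules):
        for field in ("category", "pattern"):
            if not isinstance(rule, dict) or not isinstance(rule.get(field), str):
                raise ValueError(f"rule {index}: missing string field {field!r}")
    return rules
```

Extra keys are deliberately ignored, so the check stays valid if the format later gains optional fields.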
---
The converters validate each pattern with Python's `re.compile` before emitting platform-specific output, so malformed regexes are dropped rather than propagated.
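The validation step can be sketched as a simple filter. The function name `keep_compilable` is hypothetical; the real converters may log or count dropped patterns instead of skipping them silently.

```python
import re


def keep_compilable(rules):
    """Return only the rules whose `pattern` compiles under Python's `re` module."""
    valid = []
    for rule in rules:
        try:
            re.compile(rule["pattern"])
        except re.error:
            # Malformed regex: drop it rather than emit a broken directive.
            continue
        valid.append(rule)
    return valid
```

Note that this only proves the pattern is valid Python regex syntax; a target engine such as PCRE may still accept or reject slightly different constructs.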
## Extending the Project
## Extending the toolchain
### Adding a New Platform
### Adding a new platform
1. Create `json2<platform>.py` based on existing converters
2. Add output directory in `waf_patterns/<platform>/`
3. Update GitHub Actions workflow
4. Add documentation in `docs/`
1. Copy one of the existing `json2<platform>.py` converters as a starting point.
2. Implement `_sanitize_pattern()` for the target syntax (escape rules differ between Nginx, Apache, HAProxy, …).
3. Emit your output under `waf_patterns/<platform>/`.
4. Add a workflow step in `.github/workflows/update_patterns.yml` to package the result.
5. Add a documentation page under `docs/`.
### Custom Pattern Sources
### Pinning a different OWASP CRS version
Modify `owasp2json.py` to add new pattern sources:
```python
SOURCES = [
"coreruleset/coreruleset",
"your-org/your-rules",
]
```bash
python owasp2json.py --ref v3.3
```
---
### Pulling rules from a fork
`owasp2json.py` hardcodes the upstream repository, `coreruleset/coreruleset`, as a constant. To target a fork, edit `GITHUB_REPO_URL` near the top of the script.
## Dependencies
Listed in `requirements.txt`:
```
requests>=2.28.0
beautifulsoup4>=4.11.0
```
Install with:
Listed in [`requirements.txt`](https://github.com/fabriziosalmi/patterns/blob/main/requirements.txt). Install with:
```bash
pip install -r requirements.txt
```
The pipeline targets Python **3.11+**.