KNEL Infrastructure Overview
Last Updated: [DATE]
Owner: @reachableceo
Review Cadence: Quarterly or after significant changes
Scope: Known Element Enterprises LLC (KNEL) — centralized IT for TSYS Group
Purpose
This post provides a comprehensive overview of KNEL infrastructure across all sites. It serves as the starting point for anyone (human or AI) needing to understand what exists, where it lives, and how it connects.
Business Context
KNEL delivers IT and business systems services to all TSYS Group entities:
- RackRental.net — Network lab rental (separate documentation)
- Starting Line Productions — Engineering/render (minimal infra, Cloudron-hosted)
- Suborbital Systems Development — R&D, specialty electronics
- Side Door Solutions Group / Side Door Group / AFABN — Non-profits
- Other TSYS entities as needed
Sites
SITER — Pflugerville, TX (Residence, Purpose-Built Server Room)
Primary datacenter. All compute and storage.
- Internet: AT&T Fiber, 1 Gbps symmetric
- Network: 192.168.0.0/22, flat network, Tailscale overlay
- Power: Currently a single 20A 125V circuit; a 60A 240V circuit is available but not yet wired
- Cooling: 25k BTU mini split
- UPS: Broken; needs repair or replacement
- Battery: 25kWh base power, upgrading to 50kWh + generator inlet (~6 weeks)
- Generator: 50A 240V on hand
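A rough power and cooling budget helps decide when the 60A circuit and the battery upgrade become necessary. The sketch below is illustrative arithmetic only: the 80% continuous-load derating and the 3.412 BTU/h-per-watt conversion are assumptions added here, not measurements from SITER, and battery runtime ignores inverter losses and any reserve the battery system holds back.

```python
"""Back-of-envelope SITER power and cooling budget (illustrative only)."""

WATTS_TO_BTU_PER_HOUR = 3.412  # 1 W of IT load produces ~3.412 BTU/h of heat
CONTINUOUS_DERATE = 0.8        # assumption: 80% rule for continuous loads


def continuous_watts(amps: float, volts: float, derate: float = CONTINUOUS_DERATE) -> float:
    """Usable continuous load on a branch circuit after derating."""
    return amps * volts * derate


def runtime_hours(battery_kwh: float, load_watts: float) -> float:
    """Ideal battery runtime; ignores inverter losses and reserve limits."""
    return battery_kwh * 1000 / load_watts


def cooling_capacity_watts(btu_per_hour: float) -> float:
    """Heat a cooling unit can remove, expressed as equivalent IT load in watts."""
    return btu_per_hour / WATTS_TO_BTU_PER_HOUR


if __name__ == "__main__":
    today = continuous_watts(20, 125)    # current single circuit
    future = continuous_watts(60, 240)   # 60A circuit once wired
    cooling = cooling_capacity_watts(25_000)
    print(f"20A/125V usable: {today:.0f} W, 60A/240V usable: {future:.0f} W")
    print(f"Mini split covers roughly {cooling:.0f} W of heat")
    for kwh in (25, 50):
        print(f"{kwh} kWh at {today:.0f} W: {runtime_hours(kwh, today):.1f} h")
```

Under these assumptions the current circuit caps the room near 2 kW, well inside the mini split's roughly 7.3 kW, and the battery would carry that load for about 12.5 h (25 kWh) or 25 h (50 kWh). A fully loaded 60A circuit (about 11.5 kW) would exceed the rated cooling, so cooling rather than power may become the constraint after the rewire.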
SITES — San Antonio, TX (~2 months out)
Future site for RackRental and STLP HPC workloads.
- Tailscale router VM
- Proxmox nodes for RackRental and STLP HPC
- RackRental build-out documentation due in May
SITEV — Netcup VPS (Cloudron)
Command, control, orchestration, and public-facing services.
- Cloudron manages all public-facing apps
- Let’s Encrypt for all public TLS certificates
SITER Rack Layout
| Rack | Contents | Entity |
|---|---|---|
| 1 | RackRental switches | RackRental (separate docs) |
| 2 | RackRental routers | RackRental (separate docs) |
| 3 | Tsys 6, 7, 8 (Proxmox) + 6 powered-off spares + dedicated storage switch | Mixed (RR, STLP, Suborbital R&D) |
| 4 | Tsys 1-5 + core switch | KNEL core production |
| 5 | Switch, APs/routers, Pis, Arduinos | Suborbital |
Hypervisor Inventory
| Host | Rack | Role | Type | Notes |
|---|---|---|---|---|
| Tsys 1 | 4 | Compute | Proxmox | |
| Tsys 2 | 4 | Compute + Storage | Windows 10 | NetBoot (DHCP/TFTP/Webmin), Pi-hole |
| Tsys 3 | 4 | Compute | Proxmox | |
| Tsys 4 | 4 | Storage | Proxmox | PBS (5TB), USB toaster drives, ext4/NFS |
| Tsys 5 | 4 | Compute | Proxmox | |
| Tsys 6 | 3 | Compute | Proxmox | |
| Tsys 7 | 3 | Compute | Proxmox | |
| Tsys 8 | 3 | Storage | Proxmox | ZFS, 4 internal drives, NFS. RR/STLP storage. |
Total: 7 Proxmox + 1 Windows 10 = 8 hypervisors
All Proxmox nodes are standalone (no clustering). Managed via Proxmox Data Center.
Six additional powered-off servers in Rack 3 are available to network-boot as Proxmox nodes (general compute pool for RR/STLP).
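Until Snipe-IT or NetBox holds this data, the inventory can also live as a small machine-readable structure that tooling (for example, a future import script, which is hypothetical) can check against the tables. This sketch transcribes the Hypervisor Inventory table and recomputes the totals stated above; the class and function names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypervisor:
    name: str
    rack: int
    role: str
    platform: str


# Transcribed from the Hypervisor Inventory table.
INVENTORY = [
    Hypervisor("Tsys 1", 4, "Compute", "Proxmox"),
    Hypervisor("Tsys 2", 4, "Compute + Storage", "Windows 10"),
    Hypervisor("Tsys 3", 4, "Compute", "Proxmox"),
    Hypervisor("Tsys 4", 4, "Storage", "Proxmox"),
    Hypervisor("Tsys 5", 4, "Compute", "Proxmox"),
    Hypervisor("Tsys 6", 3, "Compute", "Proxmox"),
    Hypervisor("Tsys 7", 3, "Compute", "Proxmox"),
    Hypervisor("Tsys 8", 3, "Storage", "Proxmox"),
]


def count_by_platform(hosts: list[Hypervisor]) -> Counter:
    """Count hypervisors per platform (Proxmox, Windows 10, ...)."""
    return Counter(h.platform for h in hosts)


def hosts_in_rack(hosts: list[Hypervisor], rack: int) -> list[str]:
    """Names of hypervisors installed in the given rack."""
    return [h.name for h in hosts if h.rack == rack]


if __name__ == "__main__":
    counts = count_by_platform(INVENTORY)
    print(dict(counts), "total:", sum(counts.values()))
    print("Rack 3:", hosts_in_rack(INVENTORY, 3))
```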
Network Architecture
- Physical: Flat network, single VLAN (VLAN 1) for management
- Storage: Non-routed storage VLAN on core switch; dedicated storage switch in Rack 3
- Overlay: Tailscale (exit node, subnet router, DNS → Technitium)
- Switches: Dell across the board, monitored via LibreNMS
- RackRental: Layer 3 isolated behind a Cisco router
- LLDP: Enabled on all systems
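The flat management network at SITER is 192.168.0.0/22, which spans four contiguous /24 blocks. Python's standard `ipaddress` module can confirm the boundaries, which is useful when entering ranges into phpIPAM or checking whether an address belongs on the flat network. The helper names below are illustrative.

```python
import ipaddress

MGMT_NET = ipaddress.ip_network("192.168.0.0/22")  # SITER flat network


def describe(net: ipaddress.IPv4Network) -> dict:
    """Summarize the boundaries of an IPv4 network."""
    return {
        "netmask": str(net.netmask),
        "first_host": str(net.network_address + 1),
        "last_host": str(net.broadcast_address - 1),
        "broadcast": str(net.broadcast_address),
        "usable_hosts": net.num_addresses - 2,
        "slash24_blocks": [str(s) for s in net.subnets(new_prefix=24)],
    }


def on_mgmt_net(address: str) -> bool:
    """True if the address falls inside the flat management network."""
    return ipaddress.ip_address(address) in MGMT_NET


if __name__ == "__main__":
    for key, value in describe(MGMT_NET).items():
        print(f"{key}: {value}")
```

A common surprise with a /22 is that 192.168.1.x through 192.168.3.x share a broadcast domain with 192.168.0.x, so a host misconfigured with a /24 mask will treat the other three blocks as off-link.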
DNS
- Technitium DNS (on PFV-Tailscale-router VM) — authoritative for knel.net
- Pi-hole (on Tsys 2) — recursive, forwards to 8.8.8.8, reverse lookups to Technitium
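The Pi-hole behavior described above amounts to a forwarding rule: reverse (PTR) queries for local addresses go to Technitium and everything else goes to 8.8.8.8 (strictly, that makes Pi-hole a forwarding resolver rather than a fully recursive one). In Pi-hole this is typically set through its Conditional Forwarding option. The sketch below only models the decision; it assumes "local" means 192.168.0.0/22, models only the reverse-lookup rule the source describes, and uses `"technitium"` as a placeholder label because the Technitium address is not documented here.

```python
import ipaddress

LOCAL_NET = ipaddress.ip_network("192.168.0.0/22")
TECHNITIUM = "technitium"      # placeholder label; real target address not documented
PUBLIC_UPSTREAM = "8.8.8.8"
REVERSE_SUFFIX = ".in-addr.arpa"


def upstream_for(qname: str) -> str:
    """Return the upstream a query would be forwarded to under the rule above."""
    name = qname.lower().rstrip(".")
    if name.endswith(REVERSE_SUFFIX):
        labels = name[: -len(REVERSE_SUFFIX)].split(".")
        # Only full four-octet PTR names are handled in this sketch.
        if len(labels) == 4 and all(label.isdigit() for label in labels):
            try:
                addr = ipaddress.ip_address(".".join(reversed(labels)))
            except ValueError:
                return PUBLIC_UPSTREAM
            if addr in LOCAL_NET:
                return TECHNITIUM
    return PUBLIC_UPSTREAM


if __name__ == "__main__":
    for q in ("10.1.168.192.in-addr.arpa.", "1.4.168.192.in-addr.arpa", "example.com"):
        print(q, "->", upstream_for(q))
```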
DHCP
- NetBoot VM (Tsys 2) — DHCP with static leases managed via Webmin
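Static leases maintained by hand in Webmin drift easily: duplicate MACs, reused IPs, or addresses outside the subnet. A minimal consistency check might look like the following, assuming the leases can be exported as (hostname, MAC, IP) tuples. That export format and the sample hostnames are hypothetical.

```python
import ipaddress
from collections import Counter

SUBNET = ipaddress.ip_network("192.168.0.0/22")

Lease = tuple[str, str, str]  # (hostname, MAC, IP); hypothetical export format


def lease_problems(leases: list[Lease]) -> list[str]:
    """Return human-readable problems found in a static lease list."""
    problems = []
    for host, _mac, ip in leases:
        addr = ipaddress.ip_address(ip)
        if addr not in SUBNET or addr in (SUBNET.network_address, SUBNET.broadcast_address):
            problems.append(f"{host}: {ip} is not a usable address in {SUBNET}")
    # MACs are compared case-insensitively; IPs as written.
    for mac, n in Counter(mac.lower() for _, mac, _ in leases).items():
        if n > 1:
            problems.append(f"MAC {mac} has {n} leases")
    for ip, n in Counter(ip for _, _, ip in leases).items():
        if n > 1:
            problems.append(f"IP {ip} is assigned {n} times")
    return problems


if __name__ == "__main__":
    sample = [
        ("knel-ca-001", "AA:BB:CC:00:00:01", "192.168.0.10"),
        ("knel-nms-001", "aa:bb:cc:00:00:01", "192.168.0.11"),
        ("knel-awx-001", "AA:BB:CC:00:00:03", "192.168.0.11"),
        ("stray-host", "AA:BB:CC:00:00:04", "192.168.4.5"),
    ]
    for problem in lease_problems(sample):
        print(problem)
```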
Identity & Access
| Layer | Tool | Scope |
|---|---|---|
| Application SSO | Keycloak (Cloudron) | 99% of auth |
| Infrastructure Auth | Univention Corporate Server (UCS) | Server login, switch RADIUS, RackRental Cyclades console |
| Secrets | Bitwarden | Human credentials, shared collections |
| Machine Secrets | HashiCorp Vault | Stood up, not in use yet. Planned for Ansible. |
Onboarding
- Contractor: Cloudron invite → Apache Guacamole → server access
- Albert/Charles: Super admin; access via Ultix-highside (a Surface Laptop Go 2 in the server room) over Tailscale/RDP
Certificate Management
| Layer | Method | Scope |
|---|---|---|
| Public TLS | Cloudron → Let’s Encrypt | 99% of services |
| Internal | Nitrokey HSMs (root + intermediate CA) | SSH certificates, internal TLS. HSMs passed through to the CA Debian VM. |
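For SSH certificates, OpenSSH's `ssh-keygen` can sign with a CA key held on a PKCS#11 token: `-D` names the PKCS#11 library and `-s` takes the CA's public key, while `-I`, `-n`, and `-V` set the identity, principals, and validity. The sketch below only builds the command line for signing a user key. The CA public key path, the OpenSC library path (the Nitrokey HSM is normally driven through OpenSC), and the default validity are assumptions, and this document does not say which HSM-held key acts as the SSH CA. Host certificates would also need `-h`.

```python
import shlex

# Assumed paths; adjust to the CA Debian VM's actual layout.
DEFAULT_CA_PUBKEY = "/etc/ssh/knel_user_ca.pub"
DEFAULT_PKCS11_LIB = "/usr/lib/x86_64-linux-gnu/opensc-pkcs11.so"


def ssh_sign_command(user_pubkey: str, identity: str, principals: list[str],
                     validity: str = "+52w",
                     ca_pubkey: str = DEFAULT_CA_PUBKEY,
                     pkcs11_lib: str = DEFAULT_PKCS11_LIB) -> list[str]:
    """Build an ssh-keygen argv that signs a user key with an HSM-held CA key."""
    return [
        "ssh-keygen",
        "-D", pkcs11_lib,              # PKCS#11 provider holding the CA private key
        "-s", ca_pubkey,               # public half identifies which token key to use
        "-I", identity,                # certificate identity (appears in logs)
        "-n", ",".join(principals),    # allowed login names
        "-V", validity,                # validity interval, e.g. +52w
        user_pubkey,
    ]


if __name__ == "__main__":
    cmd = ssh_sign_command("id_ed25519.pub", "albert@knel", ["albert"])
    print(shlex.join(cmd))  # run it by hand; ssh-keygen prompts for the token PIN
```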
Services Currently Running
Cloudron (SITEV)
- Discourse, Snipe-IT (empty), phpIPAM, Dolibarr (per-entity instances)
- Gitea, PeerTube (runners pending, ~2 months)
- Keycloak, Vault (unused), Beszel
- WordPress sites, Redmine (project management)
- Uptime Kuma (uptime monitoring → Pushover)
- ~30 more apps coming (Jenkins, etc.)
SITER Proxmox VMs
- Wazuh — Security/SIEM (in production)
- AWX (Ansible Tower) — Stood up, not deploying yet
- UCS — Identity/RADIUS (production)
- PFV-Tailscale-router — Exit node, subnet router, Technitium DNS
- NetBoot VM (on Tsys 2, the Windows 10 host) — DHCP/TFTP/Webmin/Pi-hole
- CA Debian VM — Nitrokey HSM internal CA
- LibreNMS + Netdisco — Network monitoring/instrumentation
- PBS — Proxmox Backup Server on Tsys 4 (5TB, not actively backing up yet)
- K8s nodes — Control + worker, no workloads yet (~2 months)
- MPI cluster — No workloads yet (~2 months)
All Proxmox VMs: stock Debian (except the Wazuh VM, which runs Ubuntu Server) + Beszel agent + Tailscale
Toolchain
| Tool | Purpose | Status |
|---|---|---|
| Snipe-IT | Asset lifecycle | Empty, needs populating |
| phpIPAM | IPAM | In use |
| NetBox | DCIM/cabling | Coming in weeks |
| Dolibarr | ERP/business | In use, per-entity |
| LibreNMS + Netdisco | Network monitoring | In use |
| Beszel | Host instrumentation | In use |
| Wazuh | Security/SIEM | In production |
| Uptime Kuma | Uptime alerting | In production → Pushover |
| Proxmox Data Center | Hypervisor management | In use |
| Keycloak | App SSO | In use |
| UCS | Infrastructure auth | In production |
| Technitium DNS | Authoritative DNS | In use |
| Pi-hole | Recursive DNS | In use |
| HashiCorp Vault | Machine secrets | Stood up, unused |
| Redmine | Project management | In use |
| Apache Guacamole | Remote jump access | In use |
Backup & Disaster Recovery
| Target | Method | Destination | Frequency | Tested? |
|---|---|---|---|---|
| Cloudron apps | Cloudron backup | Backblaze | Daily (2-day rolling) + pre-upgrade (indefinite) | |
| Proxmox VMs | PBS | Tsys 4 (5TB) | Not active yet | No |
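The Cloudron policy in the table (two rolling daily backups plus pre-upgrade backups kept indefinitely) can be stated precisely as a retention rule, which is handy when planning the per-app Backblaze policies. This models the documented policy only; it is not how Cloudron implements retention internally, and the sample dates are illustrative.

```python
from datetime import date

Backup = tuple[date, str]  # (backup date, kind) where kind is "daily" or "pre-upgrade"


def backups_to_keep(backups: list[Backup], daily_keep: int = 2) -> list[Backup]:
    """Keep the newest `daily_keep` daily backups plus every pre-upgrade backup."""
    dailies = sorted((b for b in backups if b[1] == "daily"), reverse=True)
    keep = set(dailies[:daily_keep])
    keep.update(b for b in backups if b[1] == "pre-upgrade")
    return sorted(keep)


def backups_to_prune(backups: list[Backup], daily_keep: int = 2) -> list[Backup]:
    """Everything the retention rule does not keep."""
    keep = set(backups_to_keep(backups, daily_keep))
    return sorted(b for b in backups if b not in keep)


if __name__ == "__main__":
    history = [
        (date(2025, 5, 1), "daily"),
        (date(2025, 5, 2), "daily"),
        (date(2025, 5, 3), "daily"),
        (date(2025, 4, 15), "pre-upgrade"),
    ]
    print("keep: ", backups_to_keep(history))
    print("prune:", backups_to_prune(history))
```

Because pre-upgrade backups never expire under this rule, their count (and Backblaze storage) grows with every upgrade; that is the main cost lever a per-app policy would adjust.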
DR Strategy
- Compute nodes: Stateless, rebuilt from Ansible/git
- IT infrastructure VMs: Will back up to PBS (CA, LibreNMS, AWX, NetBoot)
- Cloudron: Restored from Backblaze
- Per-app backup policies: Planned based on data sensitivity to optimize Backblaze costs
Monitoring & Alerting
| Tool | Role | Alerting? |
|---|---|---|
| Uptime Kuma | Up/down monitoring | → Pushover |
| Beszel | Host instrumentation | No |
| LibreNMS | Network instrumentation | No |
| Wazuh | Security events | → Pushover/email |
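Beszel and LibreNMS do not alert today. Both may offer built-in notification options worth checking first; failing that, a small script could post to the same Pushover channel that Uptime Kuma and Wazuh use, via Pushover's message API (a POST to `/1/messages.json` with `token`, `user`, and `message` fields, plus optional `title` and `priority`). The sketch below builds the request with the standard library; the environment variable names and alert text are hypothetical.

```python
import os
import urllib.parse
import urllib.request
from typing import Optional

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"


def build_pushover_request(token: str, user: str, message: str,
                           title: Optional[str] = None,
                           priority: int = 0) -> urllib.request.Request:
    """Build (but do not send) a Pushover message request."""
    fields = {"token": token, "user": user, "message": message, "priority": str(priority)}
    if title:
        fields["title"] = title
    body = urllib.parse.urlencode(fields).encode()
    return urllib.request.Request(PUSHOVER_URL, data=body, method="POST")


if __name__ == "__main__":
    # Hypothetical variable names; only sends when both are set.
    token, user = os.environ.get("PUSHOVER_TOKEN"), os.environ.get("PUSHOVER_USER")
    if token and user:
        req = build_pushover_request(token, user, "Tsys 4 disk usage above threshold",
                                     title="Beszel")
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(resp.status)
```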
Compliance
- Current focus: CIS benchmarks
- Future: FedRAMP, CMMC under consideration
- Automation: Ansible will enforce hardening baselines
- OS baseline: Debian (99%), Ubuntu Server (Wazuh VM only)
Known Gaps
- Patch management: Weak area; no automated patching. Wazuh SCA (Security Configuration Assessment) may help with compliance visibility.
- Incident ticketing: Redmine handles project management; no ops/incident tracking yet. Needed before RackRental go-live.
- PBS backups: Not active, not tested.
- Snipe-IT: Empty. Asset inventory currently lives in Beszel, LibreNMS, and people's heads.
- Tailscale ACLs: None currently. Flat overlay.
- UPS: Broken; needs repair or replacement.
- 60A 240V circuit: Available but not wired.
Timeline
| Item | ETA |
|---|---|
| Battery upgrade (50kWh + generator inlet) | ~6 weeks |
| NetBox standup | Weeks |
| Ansible hardening rollout | Current priority |
| Cloudron app build/package/deploy documentation | Current priority |
| SITES (San Antonio) go-live | ~2 months |
| K8s, MPI, Gitea/PeerTube runners | ~2 months |
| RackRental network build-out documentation | May |
| RackRental production at SITER | TBD |
Workload Ownership (Planned)
Proxmox resource pools for entity tracking:
- RackRental
- STLP
- Suborbital
- KNEL
Complemented by naming convention: [entity]-[function]-[###]
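The naming convention lends itself to a simple validator that also maps a name onto its Proxmox resource pool. This sketch assumes the entity token is the pool name matched case-insensitively, the function token is alphanumeric, and `###` is a zero-padded three-digit sequence; none of those details are fixed by this document, and the example names are hypothetical.

```python
import re

POOLS = ("RackRental", "STLP", "Suborbital", "KNEL")  # planned Proxmox resource pools
NAME = re.compile(r"^(?P<entity>[A-Za-z]+)-(?P<function>[A-Za-z0-9]+)-(?P<seq>\d{3})$")


def parse_name(name: str) -> tuple[str, str, int]:
    """Split '[entity]-[function]-[###]' into (pool, function, sequence number)."""
    match = NAME.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not match [entity]-[function]-[###]")
    entity = match["entity"].lower()
    pool = next((p for p in POOLS if p.lower() == entity), None)
    if pool is None:
        raise ValueError(f"{name!r}: unknown entity {match['entity']!r}")
    return pool, match["function"], int(match["seq"])


if __name__ == "__main__":
    for candidate in ("knel-ca-001", "stlp-render-012", "acme-web-001"):
        try:
            print(candidate, "->", parse_name(candidate))
        except ValueError as err:
            print("rejected:", err)
```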
This document is a living overview. Update it when infrastructure changes. Detailed SOPs, runbooks, and architecture decisions live in their respective posts within this category.