KNEL Infrastructure Overview

Last Updated: [DATE]
Owner: @reachableceo
Review Cadence: Quarterly or after significant changes
Scope: Known Element Enterprises LLC (KNEL) — centralized IT for TSYS Group


Purpose

This post provides a comprehensive overview of KNEL infrastructure across all sites. It serves as the starting point for anyone (human or AI) needing to understand what exists, where it lives, and how it connects.


Business Context

KNEL delivers IT and business systems services to all TSYS Group entities:

  • RackRental.net — Network lab rental (separate documentation)
  • Starting Line Productions — Engineering/render (minimal infra, Cloudron-hosted)
  • Suborbital Systems Development — R&D, specialty electronics
  • Side Door Solutions Group / Side Door Group / AFABN — Non-profits
  • Other TSYS entities as needed

Sites

SITER — Pflugerville, TX (Residence, Purpose-Built Server Room)

Primary datacenter. All compute and storage.

  • Internet: AT&T Fiber, 1 Gbps symmetric
  • Network: 192.168.0.0/22, flat network, Tailscale overlay
  • Power: Currently single 20A 125V circuit; 60A 240V circuit available (not wired yet)
  • Cooling: 25k BTU mini split
  • UPS: Broken — needs repair or replacement
  • Battery: 25kWh base power → upgrading to 50kWh + generator inlet (6 weeks)
  • Generator: 50A 240V on hand

SITES — San Antonio, TX (~2 months out)

Future site for RackRental and STLP HPC workloads.

  • Tailscale router VM
  • Proxmox nodes for RackRental and STLP HPC
  • RackRental build-out documented in May

SITEV — Netcup VPS (Cloudron)

Command, control, orchestration, and public-facing services.

  • Cloudron manages all public-facing apps
  • Let’s Encrypt for all public TLS certificates

SITER Rack Layout

| Rack | Contents | Entity |
|------|----------|--------|
| 1 | RackRental switches | RackRental (separate docs) |
| 2 | RackRental routers | RackRental (separate docs) |
| 3 | Tsys 6, 7, 8 (Proxmox) + 6 powered-off spares + dedicated storage switch | Mixed (RR, STLP, Suborbital R&D) |
| 4 | Tsys 1-5 + core switch | KNEL core production |
| 5 | Switch, APs/routers, Pis, Arduinos | Suborbital |

Hypervisor Inventory

| Host | Rack | Role | Type | Notes |
|------|------|------|------|-------|
| Tsys 1 | 4 | Compute | Proxmox | |
| Tsys 2 | 4 | Compute + Storage | Windows 10 | NetBoot (DHCP/TFTP/Webmin), Pi-hole |
| Tsys 3 | 4 | Compute | Proxmox | |
| Tsys 4 | 4 | Storage | Proxmox | PBS (5TB), USB toaster drives, ext4/NFS |
| Tsys 5 | 4 | Compute | Proxmox | |
| Tsys 6 | 3 | Compute | Proxmox | |
| Tsys 7 | 3 | Compute | Proxmox | |
| Tsys 8 | 3 | Storage | Proxmox | ZFS, 4 internal drives, NFS; RR/STLP storage |

Total: 7 Proxmox + 1 Windows 10 = 8 hypervisors

All Proxmox nodes are standalone (no clustering). Managed centrally via Proxmox Datacenter Manager.

Six additional powered-off servers in Rack 3 are available for network boot as Proxmox nodes (general compute pool for RR/STLP).


Network Architecture

  • Physical: Flat network, single VLAN (VLAN 1) for management
  • Storage: Non-routed storage VLAN on core switch; dedicated storage switch in Rack 3
  • Overlay: Tailscale (exit node, subnet router, DNS → Technitium)
  • Switches: Dell across the board, monitored via LibreNMS
  • RackRental: L3 isolated behind Cisco router
  • LLDP: Enabled on all systems
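The Tailscale overlay described above can be sketched with the standard Tailscale CLI. This is a hedged example: the subnet is the one documented for SITER, but the exact flags and sysctl handling on the PFV-Tailscale-router VM may differ.

```shell
# On the PFV-Tailscale-router VM: enable IP forwarding so the node can
# route for the flat 192.168.0.0/22 LAN (persisting via sysctl.d assumed).
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf

# Advertise the SITER subnet and offer this node as an exit node.
sudo tailscale up \
  --advertise-routes=192.168.0.0/22 \
  --advertise-exit-node
```

Pointing tailnet clients at Technitium for DNS is done in the Tailscale admin console (DNS settings), not on the CLI.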

DNS

  • Technitium DNS (on PFV-Tailscale-router VM) — authoritative for knel.net
  • Pi-hole (on Tsys 2) — forwarding resolver for clients; forwards upstream to 8.8.8.8, reverse lookups to Technitium

DHCP

  • NetBoot VM (Tsys 2) — DHCP with static leases managed via Webmin
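For reference, a static lease plus PXE hand-off in ISC dhcpd (the daemon Webmin typically fronts — an assumption) looks like the fragment below. The hostname, MAC, and IPs are invented for illustration; only the subnet/mask match the documented 192.168.0.0/22 network.

```
# /etc/dhcp/dhcpd.conf fragment — all specifics below are hypothetical
subnet 192.168.0.0 netmask 255.255.252.0 {
  range 192.168.2.100 192.168.2.200;
  option routers 192.168.0.1;          # gateway IP is an assumption
  next-server 192.168.0.50;            # TFTP server on the NetBoot VM (assumed IP)
  filename "pxelinux.0";               # PXE boot image (assumed)
}

host tsys-spare-1 {
  hardware ethernet 00:11:22:33:44:55; # invented MAC
  fixed-address 192.168.1.10;          # invented static lease
}
```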

Identity & Access

| Layer | Tool | Scope |
|-------|------|-------|
| Application SSO | Keycloak (Cloudron) | 99% of auth |
| Infrastructure auth | Univention Corporate Server (UCS) | Server login, switch RADIUS, RackRental Cyclades console |
| Secrets | Bitwarden | Human credentials, shared collections |
| Machine secrets | HashiCorp Vault | Stood up, not in use yet. Planned for Ansible. |

Onboarding

  • Contractor: Cloudron invite → Apache Guacamole → server access
  • Albert/Charles: Super admin + Ultix-highside (Surface Laptop Go 2 in server room) via Tailscale/RDP

Certificate Management

| Layer | Method | Scope |
|-------|--------|-------|
| Public TLS | Cloudron → Let’s Encrypt | 99% of services |
| Internal | Nitrokey HSMs (root + intermediate CA) | SSH certificates, internal TLS. Passed through to Debian VM. |
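The SSH-certificate flow behind the internal CA can be sketched with plain ssh-keygen. This uses throwaway file-based keys; in production the CA key lives on the Nitrokey HSM passed through to the Debian VM, so signing would go through the token rather than a key file. The identity and principal names are invented.

```shell
# Generate a throwaway CA keypair (stand-in for the HSM-held key).
ssh-keygen -t ed25519 -f ca_key -N '' -C 'knel-internal-ca'

# Generate a user keypair and sign it: key ID "alice", principal
# "alice", valid for 52 weeks. Both names are hypothetical.
ssh-keygen -t ed25519 -f alice_key -N ''
ssh-keygen -s ca_key -I alice -n alice -V +52w alice_key.pub

# Inspect the resulting certificate.
ssh-keygen -L -f alice_key-cert.pub
```

Servers then trust the CA via `TrustedUserCAKeys` in sshd_config instead of per-user authorized_keys files.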

Services Currently Running

Cloudron (SITEV)

  • Discourse, Snipe-IT (empty), phpIPAM, Dolibarr (per-entity instances)
  • Gitea, PeerTube (runners pending, ~2 months)
  • Keycloak, Vault (unused), Beszel
  • WordPress sites, Redmine (project management)
  • Uptime Kuma (uptime monitoring → Pushover)
  • ~30 more apps coming (Jenkins, etc.)

SITER Proxmox VMs

  • Wazuh — Security/SIEM (in production)
  • AWX (Ansible Tower) — Stood up, not deploying yet
  • UCS — Identity/RADIUS (production)
  • PFV-Tailscale-router — Exit node, subnet router, Technitium DNS
  • NetBoot VM (Tsys 2) — DHCP/TFTP/Webmin/Pi-hole
  • CA Debian VM — Nitrokey HSM internal CA
  • LibreNMS + Netdisco — Network monitoring/instrumentation
  • PBS — Proxmox Backup Server on Tsys4 (5TB, not actively backing up yet)
  • K8s nodes — Control + worker, no workloads yet (~2 months)
  • MPI cluster — No workloads yet (~2 months)

All Proxmox VMs run stock Debian with the Beszel agent and Tailscale installed.


Toolchain

| Tool | Purpose | Status |
|------|---------|--------|
| Snipe-IT | Asset lifecycle | Empty, needs populating |
| phpIPAM | IPAM | In use |
| Netbox | DCIM/cabling | Coming in weeks |
| Dolibarr | ERP/business | In use, per-entity |
| LibreNMS + Netdisco | Network monitoring | In use |
| Beszel | Host instrumentation | In use |
| Wazuh | Security/SIEM | In production |
| Uptime Kuma | Uptime alerting | In production → Pushover |
| Proxmox Datacenter Manager | Hypervisor management | In use |
| Keycloak | App SSO | In use |
| UCS | Infrastructure auth | In production |
| Technitium DNS | Authoritative DNS | In use |
| Pi-hole | Forwarding DNS | In use |
| HashiCorp Vault | Machine secrets | Stood up, unused |
| Redmine | Project management | In use |
| Apache Guacamole | Remote jump access | In use |

Backup & Disaster Recovery

| Target | Method | Destination | Frequency | Tested? |
|--------|--------|-------------|-----------|---------|
| Cloudron apps | Cloudron backup | Backblaze | Daily (2-day rolling) + pre-upgrade (kept indefinitely) | Yes |
| Proxmox VMs | PBS | Tsys 4 (5TB) | Not active yet | No |

DR Strategy

  • Compute nodes: Stateless, rebuilt from Ansible/git
  • IT infrastructure VMs: Will back up to PBS (CA, LibreNMS, AWX, NetBoot)
  • Cloudron: Restored from Backblaze
  • Per-app backup policies: Planned based on data sensitivity to optimize Backblaze costs
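Once the PBS datastore on Tsys 4 is wired into the Proxmox nodes, a one-off VM backup is a single vzdump call. A hedged sketch — the storage name, repository string, and VM ID below are assumptions, not the actual configuration:

```shell
# On a Proxmox node, after adding the Tsys 4 PBS datastore as storage
# (the name "pbs-tsys4" is invented), back up VM 101 (invented ID):
vzdump 101 --storage pbs-tsys4 --mode snapshot

# Confirm the snapshot landed (repository string is hypothetical):
proxmox-backup-client list --repository backup@pbs@tsys4:datastore1
```

Recurring jobs for the IT infrastructure VMs (CA, LibreNMS, AWX, NetBoot) would then be scheduled under Datacenter → Backup in the PVE UI.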

Monitoring & Alerting

| Tool | Role | Alerting? |
|------|------|-----------|
| Uptime Kuma | Up/down monitoring | Yes → Pushover + mobile app |
| Beszel | Host instrumentation | No |
| LibreNMS | Network instrumentation | No |
| Wazuh | Security events | Yes → Pushover/email |

Compliance

  • Current focus: CIS benchmarks
  • Future: FedRAMP, CMMC under consideration
  • Automation: Ansible will enforce hardening baselines
  • OS baseline: Debian (99%), Ubuntu Server (Wazuh VM only)
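A minimal sketch of the planned Ansible enforcement, under stated assumptions: the role name and variable are invented placeholders, and the actual CIS content would come from a vetted role (e.g. the devsec.hardening collection) or an in-house equivalent.

```yaml
# site.yml — hypothetical hardening play for the Debian baseline
- name: Apply CIS hardening baseline
  hosts: all
  become: true
  roles:
    - role: knel_cis_debian   # invented in-house role name
      vars:
        cis_level: 1          # invented variable: target CIS Level 1 profile
```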

Known Gaps

  • Patch management: Weak area. No automated patching. Wazuh SCA may help with compliance visibility.
  • Incident ticketing: Redmine handles project management; no ops/incident tracking yet. Needed before RackRental go-live.
  • PBS backups: Not active, not tested.
  • Snipe-IT: Empty. Asset inventory lives in Beszel + LibreNMS + head.
  • Tailscale ACLs: None currently. Flat overlay.
  • UPS: Broken; needs repair or replacement.
  • 60A 240V circuit: Available but not wired.
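When the Tailscale ACL gap is closed, a starting policy might look like the following. This is only a sketch: the tag names are invented, and Tailscale policy files are HuJSON (comments and trailing commas allowed) edited in the admin console.

```json
{
  "tagOwners": {
    "tag:infra": ["autogroup:admin"],
    "tag:rr": ["autogroup:admin"]
  },
  "acls": [
    {"action": "accept", "src": ["autogroup:admin"], "dst": ["*:*"]},
    {"action": "accept", "src": ["tag:rr"], "dst": ["tag:rr:*"]}
  ]
}
```

The first rule keeps admin access to everything; the second confines hypothetical RackRental-tagged hosts to their own tag, replacing today's flat overlay incrementally.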

Timeline

| Item | ETA |
|------|-----|
| Solar array (50kWh + generator inlet) | ~6 weeks |
| Netbox standup | Weeks |
| Ansible hardening rollout | Current priority |
| Cloudron app build/package/deploy documentation | Current priority |
| SITES (San Antonio) go-live | ~2 months |
| K8s, MPI, Gitea/PeerTube runners | ~2 months |
| RackRental network build-out documentation | May |
| RackRental production at SITER | TBD |

Workload Ownership (Planned)

Proxmox resource pools for entity tracking:

  • RackRental
  • STLP
  • Suborbital
  • KNEL

Complemented by naming convention: [entity]-[function]-[###]
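The convention is easy to enforce in tooling. A minimal sketch, assuming the entity tokens are the lowercased pool names above and the suffix is exactly three digits (both assumptions, since the convention itself doesn't pin them down):

```shell
# Check a hostname against the [entity]-[function]-[###] convention.
# Entity list mirrors the planned Proxmox resource pools (assumed lowercase).
valid_hostname() {
  printf '%s\n' "$1" |
    grep -Eq '^(rackrental|stlp|suborbital|knel)-[a-z0-9]+-[0-9]{3}$'
}

valid_hostname knel-dns-001 && echo "knel-dns-001: ok"
valid_hostname acme-web-1   || echo "acme-web-1: rejected"
```

A check like this could run as a pre-commit hook on the Ansible inventory or as a phpIPAM/Netbox validation step.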


This document is a living overview. Update it when infrastructure changes. Detailed SOPs, runbooks, and architecture decisions live in their respective posts within this category.