FireMon (2013–Present)

FireMon is a software development company based in Overland Park, KS. As the System Architect, I focus on building a scalable, easy-to-use platform for delivering FireMon software to customers. FMOS, the FireMon Operating System, is both a mechanism for delivering the FireMon SIP to customers and a collection of tools for deploying and managing the software in a wide array of environments, ranging from a single server to massive multi-node ecosystems.

FMOS: FireMon Operating System

Ansible Configuration Policy

  • Configuration policy for deployment of all FireMon software and third-party dependencies
  • Support for single-server and distributed deployments
  • Automatically computes JVM heap sizes for each process based on available resources
  • Configures Elasticsearch in single-node or clustered mode
  • Configures PostgreSQL with optional replication to standby servers
  • Configures Kernel NFS server and client to share filesystem data between machines
  • Configures FireMon application server processes, including connection and authentication information for PostgreSQL and Elasticsearch
  • Configures strongSwan IPsec/IKEv2 key management daemon for opportunistic encryption of Elasticsearch communication
  • Configures operating system login and password policy, including support for external authentication providers such as LDAP or Kerberos
  • Sets up collectd and Carbon (Graphite data storage engine) to track system performance metrics, optionally replicating metrics data to a FireMon-managed central storage for real-time review
  • Optionally configures rsyslog to send log messages to remote destinations over UDP, TCP, or TCP+TLS
  • Configures tmux to automatically launch at user login
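
As an illustration of the JVM heap sizing item above, the following is a minimal sketch of one way to divide a machine's memory among several JVM processes. The process names, weights, reserve fraction, and minimum heap size are hypothetical; the formula the actual policy uses is not described here.

```python
def compute_heap_sizes(total_mb, weights, reserve_fraction=0.25, min_heap_mb=256):
    """Split the memory left after an OS reserve among processes by weight.

    total_mb: total system memory in MiB
    weights: mapping of process name -> relative share (hypothetical values)
    reserve_fraction: portion of memory left for the OS and non-JVM services
    min_heap_mb: floor applied to every process, even on small machines
    """
    usable_mb = int(total_mb * (1 - reserve_fraction))
    total_weight = sum(weights.values())
    return {
        name: max(min_heap_mb, usable_mb * weight // total_weight)
        for name, weight in weights.items()
    }


# Hypothetical process names, used only for illustration.
heaps = compute_heap_sizes(16384, {"app": 2, "indexer": 1, "collector": 1})
for name, size_mb in heaps.items():
    print(f"{name}: -Xmx{size_mb}m")
```

On very small machines the minimum heap floor can push the total above the usable memory, so a real policy would need to decide how to handle that case.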

Deployment and Maintenance Tools

  • Python software for configuring and managing machines running FireMon software (fmos command)
  • Critical functionality for application maintenance:
    • Updating OS and software
    • Backing up and restoring data
    • Capturing diagnostic information for technical support
    • Modifying configuration settings
    • Managing server certificates and private keys
  • D-Bus daemon to handle privileged operations
  • Unprivileged command-line interface
  • HTTP API developed with FastAPI

Generation II Platform

  • Based on CentOS 7
  • Full-disk encryption using LUKS
  • Anaconda installer with custom addon for generating machine-specific LUKS master key passphrase
  • Kickstart script for fully-automated installation
  • Used Koji to build RPM packages for first- and third-party software
  • Distribution included Ansible for configuration management
  • systemd units for controlling FireMon application services

Generation III Platform

  • Based on CentOS 7, later CentOS 8 (Stream)
  • Immutable SquashFS root filesystem image
  • Full-disk encryption using LUKS
  • Custom dracut modules to verify the image's OpenPGP signature, mount the image as the root filesystem, and initialize the LUKS-encrypted persistent data volume with LVM
  • Custom SELinux policy to confine FireMon software

Cloud-Hosted Public Services

  • FMOS Support File Upload Service
    • Deployed in AWS EC2 using Elastic Beanstalk
    • HTTP API for resumable file uploads using a content-addressable chunked data storage system
    • Allows FireMon customers to easily upload FMOS diagnostic packages to FireMon support for analysis
  • FMOS News Service
    • Deployed in AWS EC2 using Elastic Beanstalk
    • HTTP API for providing important notifications (e.g. release announcements, vulnerability disclosures, etc.) to FireMon customers through the FMOS command-line interface
  • VictoriaMetrics
    • Deployed to AWS EC2 using Terraform
    • Clustered deployment to facilitate scalability and reliability
    • Receives Prometheus metrics (via remote write protocol) from FMOS instances deployed at customer sites and in the cloud
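
To illustrate the resumable upload approach mentioned for the Support File Upload Service, here is a minimal sketch of content-addressable chunked storage. It assumes SHA-256 digests as chunk addresses and an in-memory store; the class and function names are hypothetical and do not reflect the service's actual API.

```python
import hashlib


class ChunkStore:
    """In-memory stand-in for a content-addressable chunk store."""

    def __init__(self):
        self._chunks = {}

    def has(self, digest):
        return digest in self._chunks

    def put(self, data):
        # The address of a chunk is the digest of its content.
        digest = hashlib.sha256(data).hexdigest()
        self._chunks.setdefault(digest, data)
        return digest

    def get(self, digest):
        return self._chunks[digest]


def upload(store, data, chunk_size):
    """Store data as chunks; return (manifest, number of chunks actually sent)."""
    manifest = []
    sent = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if not store.has(digest):  # skip chunks the store already holds
            store.put(chunk)
            sent += 1
        manifest.append(digest)
    return manifest, sent


def assemble(store, manifest):
    """Rebuild the original data from the ordered list of chunk digests."""
    return b"".join(store.get(digest) for digest in manifest)


store = ChunkStore()
payload = b"abcdefghij"
manifest, sent_first = upload(store, payload, chunk_size=4)
_, sent_retry = upload(store, payload, chunk_size=4)
print(sent_first, sent_retry, assemble(store, manifest) == payload)
```

Because chunks are addressed by their content, a client that retries after an interruption only transfers the chunks the store does not already have.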

DevOps Team Lead

  • Exclusively managed all resources using Ansible configuration management
  • Deployed and maintained hundreds of internal and cloud systems running RHEL/CentOS Linux (5, 6, 7, 8)
  • PXE provisioning of all on-premises virtual machines
  • All machines Active Directory domain members using Samba/Winbind
  • Zabbix system monitoring
    • Agent installed on all machines
    • Collects system availability and performance metrics
    • Custom templates for basic application availability metrics
  • Atlassian Bitbucket (Stash) Git repository host
  • Jenkins continuous integration platform
    • Integrated with Bitbucket for project discovery and change events
    • Jobs configured using Jenkinsfile pipeline definition files within repositories
    • Build environments defined as container images, jobs run in Docker containers on Jenkins agents
    • Ephemeral agents using vSphere plugin, various virtual machine templates for different project needs
  • Application data backups using BURP: Back Up and Restore Program
  • Graylog log aggregation
    • All machines send system and application logs via syslog over TLS, using rsyslog
    • Custom pipelines for parsing and indexing fields from log messages
    • Alerts based on log message contents, frequency
  • Prometheus application monitoring
    • VictoriaMetrics time-series database
    • Prometheus exporters for many applications (Jenkins, Bitbucket, Elasticsearch, GlusterFS, HAProxy, Nginx, Redis)
    • Custom Grafana dashboards for status display, performance analysis
    • collectd monitors system performance from ephemeral Jenkins worker nodes via multicast, exposes Prometheus metrics
    • Alertmanager notifications to e-mail and Slack for application availability and performance alerts
  • HashiCorp Vault HA cluster for secret storage, including Jenkins credentials

Internal Tools

FMOS Web Tools

  • Internal application used by software developers and support agents
  • Multi-tiered architecture with multiple nodes at each tier to avoid any single point of failure
    • Application Server Tier: Python 3.6/FastAPI
    • Storage Tier: GlusterFS
    • Index Tier: Elasticsearch
    • Cache Tier: Redis
    • Message Tier: RabbitMQ
    • Worker Tier: Python 3.6/Celery
    • Ingress: HAProxy
    • User Interface: TypeScript/Vue+Vuetify

PR Bot

  • Implements a webhook for Atlassian Bitbucket (Stash)
  • Reacts to new and updated Pull Requests
  • Automatically checks Git commits and changed code to enforce style guide and other project-specific requirements
  • Adds comments to Pull Requests indicating check results, marks PR as approved or needs work
  • Written in Python, no external dependencies
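
As an illustration of the kind of commit check described above, the following is a minimal sketch using only the Python standard library. The specific rules (an issue-key prefix, a 72-character subject limit, a blank line before the body) and the status strings are hypothetical examples, not the project's actual style guide.

```python
import re

SUBJECT_MAX = 72  # hypothetical limit
ISSUE_KEY = re.compile(r"^[A-Z][A-Z0-9]+-\d+: ")  # e.g. "FMOS-123: "


def check_commit_message(message):
    """Return a list of problems found in one commit message."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []
    if not ISSUE_KEY.match(subject):
        problems.append("subject does not start with an issue key")
    if len(subject) > SUBJECT_MAX:
        problems.append(f"subject is longer than {SUBJECT_MAX} characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("missing blank line between subject and body")
    return problems


def review_status(messages):
    """Summarize all commits in a pull request as a single review status."""
    if all(not check_commit_message(m) for m in messages):
        return "APPROVED"
    return "NEEDS_WORK"


print(check_commit_message("FMOS-123: Fix backup rotation\n\nRotate daily."))
print(check_commit_message("fix stuff"))
print(review_status(["FMOS-123: Fix backup rotation", "fix stuff"]))
```

In a bot like the one described, the list of problems would become the text of a pull request comment, and the summary status would drive the approval mark.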

QEMU VM Log Socket Proxy

  • Component of FMOS End-to-End tests running on-premises using QEMU/libvirt
  • Uses kernel inotify(7) events to detect virtual machine log channel socket files appearing on the VM host
  • Automatically connects to sockets as they appear
  • Receives all data from channel sockets and writes it to a file in the libvirt storage pool
  • Written in Rust

FMOS ISO Writer

  • Internal application used by development and QA teams to write FMOS installer images to USB disks attached to remote physical appliances
  • Accessible via a purpose-built, ultra-minimal Linux distribution (kernel and BusyBox only) delivered by network boot/PXE
  • Written in Rust

Environment Launcher

  • Internal application that allows FireMon employees to launch FireMon SIP environments quickly, as containers running in Kubernetes
  • Allows users to choose specific feature branches of each front-end and back-end component, to facilitate testing of work in progress
  • Written in Rust, using the Rocket web framework

FireMon-as-a-Service

  • Cloud-hosted FireMon software deployment
  • Deployed backend infrastructure for federated authentication using OpenLDAP and MIT Kerberos
  • Followed Infrastructure-as-Code principles using Ansible
  • Developed a custom integrated authentication solution (Authgate) for FireMon Security Manager software to provide full-featured account and credential management using the Kerberos protocol
  • Python bindings for MIT Kerberos using Cython