FireMon (2013–Present)
FireMon is a software development company based in Overland Park, KS. As the System Architect, I focus on building a scalable, easy-to-use platform for delivering FireMon software to customers. FMOS, the FireMon Operating System, is both a mechanism for delivering the FireMon SIP to customers and a collection of tools for deploying and managing the software in a wide array of environments, ranging from a single server to massive multi-node ecosystems.
FMOS: FireMon Operating System
Ansible Configuration Policy
- Configuration policy for deployment of all FireMon software and third-party dependencies
- Support for single-server and distributed deployments
- Automatically computes JVM heap sizes for each process based on available resources
- Configures Elasticsearch in single-node or clustered mode
- Configures PostgreSQL with optional replication to standby servers
- Configures Kernel NFS server and client to share filesystem data between machines
- Configures FireMon application server processes, including connection and authentication information for PostgreSQL and Elasticsearch
- Configures strongSwan IPsec/IKEv2 key management daemon for opportunistic encryption of Elasticsearch communication
- Configures operating system login and password policy, including support for external authentication providers such as LDAP or Kerberos
- Sets up collectd and Carbon (Graphite data storage engine) to track system performance metrics, optionally replicating metrics data to a FireMon-managed central storage for real-time review
- Optionally configures rsyslog to send log messages to remote destinations over UDP, TCP, or TCP+TLS
- Configures tmux to automatically launch at user login
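As an illustration of the automatic JVM heap sizing listed above, here is a minimal sketch of one way such a computation could work. It is not the actual FMOS policy: the function name, the process names, the weights, the 2 GiB operating system reserve, and the 31 GiB per-process cap are all assumptions chosen for the example.

```python
def compute_heap_sizes(total_mb, weights, reserve_mb=2048, max_heap_mb=31 * 1024):
    """Split memory among JVM processes in proportion to their weights.

    total_mb    -- physical memory on the machine, in MiB
    weights     -- mapping of process name to relative share (hypothetical)
    reserve_mb  -- memory held back for the OS and non-JVM services (assumed)
    max_heap_mb -- upper bound applied to every process (assumed)
    """
    if total_mb <= reserve_mb:
        raise ValueError("not enough memory left after the OS reserve")
    available = total_mb - reserve_mb
    total_weight = sum(weights.values())
    return {
        name: min(max_heap_mb, available * weight // total_weight)
        for name, weight in weights.items()
    }


def heap_flag(size_mb):
    """Render a heap size as a JVM command-line option."""
    return f"-Xmx{size_mb}m"


if __name__ == "__main__":
    # Hypothetical process names and weights on a 16 GiB machine.
    sizes = compute_heap_sizes(16384, {"app": 2, "search": 1, "collector": 1})
    for name, size in sizes.items():
        print(name, heap_flag(size))
```

Proportional weights keep the result stable as machines grow, while the fixed reserve keeps the JVMs from starving the rest of the system on small machines.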
Deployment and Maintenance Tools
- Python software for configuring and managing machines running FireMon software (fmos command)
- Critical functionality for application maintenance:
- Updating OS and software
- Backing up and restoring data
- Capturing diagnostic information for technical support
- Modifying configuration settings
- Managing server certificates and private keys
- D-Bus daemon to handle privileged operations
- Unprivileged command-line interface
- HTTP API developed with FastAPI
Generation II Platform
- Based on CentOS 7
- Full-disk encryption using LUKS
- Anaconda installer with custom addon for generating machine-specific LUKS master key passphrase
- Kickstart script for fully-automated installation
- Used Koji to build RPM packages for first- and third-party software
- Distribution included Ansible for configuration management
- systemd units for controlling FireMon application services
Generation III Platform
- Based on CentOS 7, later CentOS 8 (Stream)
- Immutable SquashFS root filesystem image
- Full-disk encryption using LUKS
- Custom Dracut modules to verify image OpenPGP signature, mount as rootfs, initialize LUKS-encrypted persistent data volume with LVM
- Custom SELinux policy to confine FireMon software
Cloud-Hosted Public Services
- FMOS Support File Upload Service
- Deployed in AWS EC2 using Elastic Beanstalk
- HTTP API for resumable file uploads using a content-addressable chunked data storage system
- Allows FireMon customers to easily upload FMOS diagnostic packages to FireMon support for analysis
- FMOS News Service
- Deployed in AWS EC2 using Elastic Beanstalk
- HTTP API for providing important notifications (e.g. release announcements, vulnerability disclosures) to FireMon customers through the FMOS command-line interface
- VictoriaMetrics
- Deployed to AWS EC2 using Terraform
- Clustered deployment to facilitate scalability and reliability
- Receives Prometheus metrics (via remote write protocol) from FMOS instances deployed at customer sites and in the cloud
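The resumable uploads of the FMOS Support File Upload Service rely on content-addressable chunked storage. The sketch below shows the general idea only and is not the service's real design: the in-memory store, the SHA-256 digest, the chunk size, and every function name are assumptions made for illustration.

```python
import hashlib


class ChunkStore:
    """In-memory stand-in for a content-addressable chunk store."""

    def __init__(self):
        self._chunks = {}

    def put(self, data):
        # The key is derived from the content, so re-sending a chunk is harmless.
        digest = hashlib.sha256(data).hexdigest()
        self._chunks.setdefault(digest, data)
        return digest

    def has(self, digest):
        return digest in self._chunks

    def get(self, digest):
        return self._chunks[digest]


def split_chunks(data, size):
    """Cut a payload into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def missing_chunks(store, manifest):
    """Digests the client still has to send; this is what makes resuming cheap."""
    return [digest for digest in manifest if not store.has(digest)]


def assemble(store, manifest):
    """Rebuild the original payload from an ordered list of digests."""
    return b"".join(store.get(digest) for digest in manifest)


if __name__ == "__main__":
    store = ChunkStore()
    payload = b"diagnostic package contents"
    chunks = split_chunks(payload, 8)
    manifest = [hashlib.sha256(c).hexdigest() for c in chunks]
    store.put(chunks[0])                      # upload interrupted after one chunk
    for chunk in chunks[1:]:                  # resume: send only what is missing
        store.put(chunk)
    print(assemble(store, manifest) == payload)
```

Because each chunk is keyed by its digest, an interrupted upload can resume by asking which digests are absent, and identical chunks are stored only once.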
DevOps Team Lead
- Exclusively managed all resources using Ansible configuration management
- Deployed and maintained hundreds of internal and cloud systems running RHEL/CentOS Linux (5, 6, 7, 8)
- PXE provisioning of all on-premises virtual machines
- All machines Active Directory domain members using Samba/Winbind
- Zabbix system monitoring
- Agent installed on all machines
- Collects system availability and performance metrics
- Custom templates for basic application availability metrics
- Atlassian Bitbucket (Stash) Git repository host
- Jenkins continuous integration platform
- Integrated with Bitbucket for project discovery and change events
- Jobs configured using Jenkinsfile pipeline definition files within repositories
- Build environments defined as container images, jobs run in Docker containers on Jenkins agents
- Ephemeral agents using vSphere plugin, various virtual machine templates for different project needs
- Application data backups using BURP: Back Up and Restore Program
- Graylog log aggregation
- All machines send system, application logs via syslog over TLS, using rsyslog
- Custom pipelines for parsing and indexing fields from log messages
- Alerts based on log message contents, frequency
- Prometheus application monitoring
- VictoriaMetrics time-series database
- Prometheus exporters for many applications (Jenkins, Bitbucket, Elasticsearch, GlusterFS, HAProxy, Nginx, Redis)
- Custom Grafana dashboards for status display, performance analysis
- collectd monitors system performance from ephemeral Jenkins worker nodes via multicast, exposes Prometheus metrics
- Alertmanager notifications to e-mail and Slack for application availability and performance alerts
- HashiCorp Vault HA cluster for secret storage, including Jenkins credentials
Internal Tools
FMOS Web Tools
- Internal application used by software developers and support agents
- Multi-tiered architecture with multiple nodes at each tier to avoid any single point of failure
- Application Server Tier: Python 3.6/FastAPI
- Storage Tier: GlusterFS
- Index Tier: Elasticsearch
- Cache Tier: Redis
- Message Tier: RabbitMQ
- Worker Tier: Python 3.6/Celery
- Ingress: HAProxy
- User Interface: TypeScript/Vue+Vuetify
PR Bot
- Implements a webhook for Atlassian Bitbucket (Stash)
- Reacts to new and updated Pull Requests
- Automatically checks Git commits and changed code to enforce style guide and other project-specific requirements
- Adds comments to Pull Requests indicating check results, marks PR as approved or needs work
- Written in Python, no external dependencies
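To give a flavor of the commit checks described above, here is a minimal standard-library sketch of a commit message check that yields an "approved" or "needs work" verdict. The specific rules (72-character subject, no trailing period, blank second line) and the function names are hypothetical; the actual style guide enforced by PR Bot is not stated here.

```python
def check_commit_message(message, max_subject=72):
    """Return a list of style problems found in one commit message."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["empty subject line"]
    problems = []
    subject = lines[0]
    if len(subject) > max_subject:
        problems.append(f"subject longer than {max_subject} characters")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line is not blank")
    return problems


def verdict(problems):
    """Map check results to the pull request status the bot would set."""
    return "approved" if not problems else "needs work"


if __name__ == "__main__":
    # Hypothetical commit messages from a pull request.
    for msg in ("Fix login timeout\n\nRaise the default to 30s.", "Fix it.\noops"):
        found = check_commit_message(msg)
        print(verdict(found), found)
```

Returning a plain list of problems lets the same result drive both the pull request comment text and the final status.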
QEMU VM Log Socket Proxy
- Component of FMOS End-to-End tests running on-premises using QEMU/libvirt
- Uses kernel inotify(7) events to detect virtual machine log channel socket files appearing on the VM host
- Automatically connects to sockets as they appear
- Receives all data from channel sockets and writes it to a file in the libvirt storage pool
- Written in Rust
FMOS ISO Writer
- Internal application used by development and QA teams to write FMOS installer images to USB disks attached to remote physical appliances
- Accessible via purpose-built, ultra-minimal Linux distribution (Kernel and Busybox only) delivered by network boot/PXE
- Written in Rust
Environment Launcher
- Internal application that allows FireMon employees to launch FireMon SIP environments quickly, as containers running in Kubernetes
- Allows users to choose specific feature branches of each front-end and back-end component, to facilitate testing of work in progress
- Written in Rust, using the Rocket web framework
FireMon-as-a-Service
- Cloud-hosted FireMon software deployment
- Deployed backend infrastructure for federated authentication using OpenLDAP and MIT Kerberos
- Followed Infrastructure-as-Code principles using Ansible
- Developed custom integrated authentication solution for FireMon Security Manager software to provide full-featured account and credential management using Kerberos protocol (Authgate)
- Python bindings for mit-kerberos using Cython