Andrew Shutov · Network & System Administration Portfolio

PGR-IR-01 · P2 · LAN / SWITCHING · RESOLVED
SCOPE · Complete LAN failure, switching loop, network redesign

Complete LAN failure at a stone fabrication facility — switching loop diagnosis and network redesign.

// summary

A stone fabrication facility's entire LAN collapsed during production hours. Every workstation lost connectivity, CNC machines went offline, and phone lines dropped simultaneously. The initial assumption was an upstream ISP or firewall failure — multiple services failing at once usually points to the edge. The actual failure was inside the switching fabric: a switching loop had formed when an unmanaged switch was connected to two wall jacks fed by the same switch stack, creating a Layer 2 broadcast storm that saturated every link on the network. Resolution required physically locating the loop source, breaking the broadcast storm by disabling ports until the loop resolved, then redesigning the flat network with proper VLAN segmentation to prevent recurrence. The redesign was eventually implemented on a UniFi/UDM Pro stack with isolated VLANs for data, voice, production equipment, and guest access.

// environment

Facility: ProGranite Surfaces — stone fabrication plant, ~15,000 sq ft
Switching: Two HP ProCurve switches in stack configuration feeding ~30 desk drops + 15 production-area drops
Edge/WAN: UDM Pro (pre-outage: basic flat config, no VLAN segmentation)
Production equipment: CNC stone-cutting machines on wired Ethernet, VoIP phones, workstation PCs
Network design (pre-incident): Single flat /24 subnet, all devices on the same broadcast domain

// symptoms

Complete loss of network connectivity across the entire facility — no device could reach the gateway, internet, or other devices
VoIP phones displayed "No Network" — no registration, no dial tone
CNC machines stopped receiving job files and reported network errors
All workstations lost mapped drives, internet, and inter-device communication
Activity LEDs on every switch stack port were lit continuously, consistent with sustained saturation on all links
UDM Pro dashboard showed no client activity despite dozens of devices being powered on

// initial hypotheses (ruled out)

~~ISP outage~~ — UDM Pro WAN link showed carrier, modem was sync'd to upstream
~~UDM Pro failure/reboot~~ — device was online, web UI accessible from a directly-connected laptop, LAN-side showed no DHCP activity
~~Switch stack failure~~ — both switches were powered, ports showed link but no traffic forwarded successfully
~~DNS/DHCP server failure~~ — UDM Pro was serving both, and a statically-addressed laptop on the same switch stack could not reach the gateway
~~Cable plant issue~~ — too many concurrent failures across independent drops for a physical cabling problem

// investigation

Establish baseline connectivity from the edge. Connected a laptop directly to the UDM Pro LAN port. DHCP worked, internet was reachable. The WAN edge was healthy — the failure was in the switching fabric.
Check switch health and utilization. Accessed the HP ProCurve CLI via serial console. Port utilization was pegged at 99–100% on every active port. Broadcast counters were increasing at thousands per second. This is the signature of a broadcast storm from a switching loop.
Identify the loop carrier. Disabled downlink ports one at a time, observing link utilization on the remaining ports. When a specific office-area port was disabled, utilization on all other ports dropped to near-zero within seconds. Network connectivity returned immediately for all remaining ports.
Trace the offending port. The disabled port connected to a wall jack in the main office. That jack fed a desk where an unmanaged 5-port switch had been connected — and that switch was plugged into two wall jacks, both terminating at the same HP ProCurve stack. Broadcast frames from one port traveled out through the switch and came back in through the second port, creating a perfect Layer 2 loop.
Break the loop physically. Removed the redundant patch cable at the wall. Re-enabled the switch port. Network stabilized immediately.
Confirm full recovery. All devices reconnected via DHCP. VoIP phones registered. CNC machines resumed communication. No further broadcast storm activity.
Assess the underlying vulnerability. The flat /24 network design meant a loop anywhere would bring down everything. STP was either not enabled on the HP ProCurve stack or not active on the affected ports, so nothing blocked the redundant path. There was no VLAN isolation and no broadcast containment strategy.
Design the remediation. Proposed and later implemented a redesigned network: UDM Pro with VLAN segmentation isolating production equipment (VLAN 10), office workstations (VLAN 20), VoIP (VLAN 30), and guest/IoT (VLAN 40). STP enabled on all managed switch ports. No unmanaged switches without explicit authorization and proper uplink cabling.
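
The counter check in the second step can be sketched in Python. This is a minimal illustration, not the tooling used during the incident: the `CounterSample` structure, the `broadcast_rate` and `storming_ports` helpers, the port names, and the 1,000 frames-per-second threshold are all assumptions chosen for the example. It also ignores counter wraparound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One reading of a port's cumulative broadcast-frame counter."""
    timestamp_s: float      # when the counter was read, in seconds
    broadcast_frames: int   # cumulative broadcast frames seen on the port


def broadcast_rate(first: CounterSample, second: CounterSample) -> float:
    """Return broadcast frames per second between two readings."""
    elapsed = second.timestamp_s - first.timestamp_s
    if elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    return (second.broadcast_frames - first.broadcast_frames) / elapsed


def storming_ports(samples, threshold_fps=1000.0):
    """Return sorted port names whose broadcast rate meets the threshold.

    `samples` maps a port name to a (first, second) pair of readings.
    The threshold is an assumed value, not a vendor recommendation.
    """
    return sorted(
        port
        for port, (first, second) in samples.items()
        if broadcast_rate(first, second) >= threshold_fps
    )


# Hypothetical readings taken five seconds apart on two ports.
readings = {
    "A1": (CounterSample(0.0, 10_000), CounterSample(5.0, 60_000)),
    "A2": (CounterSample(0.0, 500), CounterSample(5.0, 550)),
}
print(storming_ports(readings))
```

During a storm every active port tends to cross the threshold at once, which is why the isolation step still relies on disabling ports one at a time rather than on the counters alone.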

// root cause

RC-1 (immediate): An unmanaged desktop switch was connected to two wall jacks both terminating at the HP ProCurve stack, creating a physical Layer 2 switching loop. Broadcast frames circulated indefinitely, producing a broadcast storm that saturated all switch uplinks and prevented legitimate traffic from being forwarded.

RC-2 (contributing): The flat /24 network design with no VLAN segmentation meant there was no broadcast domain isolation. A loop originating in the office could take down CNC machines on the other side of the facility. STP was not protecting the affected ports on the managed stack, and the unmanaged switch had no STP or loop protection of its own.
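
Why the second wall jack matters can be shown by treating the cabling as a graph: a Layer 2 loop is a cycle, and a cable closes a cycle when both of its ends are already connected. The sketch below uses a union-find structure for that check. The device names and cable labels are hypothetical, and the ProCurve stack is modeled as a single logical bridge, which is a simplifying assumption.

```python
def find_loop_links(links):
    """Return the labels of links that close a Layer 2 loop.

    `links` is a list of (device_a, device_b, label) tuples describing
    physical cables between bridges. A link closes a loop when both of its
    ends are already connected through earlier links (union-find).
    """
    parent = {}

    def find(device):
        parent.setdefault(device, device)
        while parent[device] != device:
            parent[device] = parent[parent[device]]  # path halving
            device = parent[device]
        return device

    redundant = []
    for device_a, device_b, label in links:
        root_a, root_b = find(device_a), find(device_b)
        if root_a == root_b:
            redundant.append(label)      # both ends already connected
        else:
            parent[root_a] = root_b      # merge the two segments
    return redundant


# Hypothetical cabling: the desk switch reaches the stack through two jacks.
cabling = [
    ("udm-pro", "procurve-stack", "uplink"),
    ("procurve-stack", "desk-switch", "wall jack 1"),
    ("procurve-stack", "desk-switch", "wall jack 2"),
]
print(find_loop_links(cabling))
```

Spanning tree makes the same decision at run time: it keeps one path between any two bridges and blocks the rest.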

// resolution

Removed the redundant patch cable that created the loop — immediate restoration of connectivity
Enabled and verified STP (RSTP) on all managed switch ports
Implemented VLAN segmentation design on the UDM Pro: production equipment, office workstations, VoIP, and guest networks isolated into separate VLANs with inter-VLAN firewall rules
Documented the incident and created a written policy: no unmanaged switches without explicit approval and proper single-uplink cabling
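
A segmentation plan like this can be kept as data and checked before it is entered into the controller. The sketch below uses the VLAN IDs from the redesign, but the addressing is purely hypothetical: the `10.0.0.0/16` supernet, the "third octet equals VLAN ID" convention, and the first-address gateway are assumptions, not the subnets actually deployed.

```python
import ipaddress
from itertools import combinations

# VLAN IDs and roles come from the redesign; the addressing is hypothetical.
VLAN_ROLES = {10: "production", 20: "office", 30: "voip", 40: "guest-iot"}


def plan_subnets(vlan_roles, supernet="10.0.0.0/16"):
    """Give each VLAN the /24 inside `supernet` whose index equals its ID."""
    blocks = list(ipaddress.ip_network(supernet).subnets(new_prefix=24))
    plan = {}
    for vlan_id, role in vlan_roles.items():
        subnet = blocks[vlan_id]
        plan[vlan_id] = {
            "role": role,
            "subnet": subnet,
            "gateway": subnet.network_address + 1,  # assumed gateway address
        }
    return plan


def overlapping_pairs(plan):
    """Return VLAN ID pairs whose subnets overlap (should be empty)."""
    return [
        (a, b)
        for a, b in combinations(sorted(plan), 2)
        if plan[a]["subnet"].overlaps(plan[b]["subnet"])
    ]


plan = plan_subnets(VLAN_ROLES)
for vlan_id, entry in plan.items():
    print(vlan_id, entry["role"], entry["subnet"], entry["gateway"])
```

Each VLAN is its own broadcast domain, so a storm in one of these subnets would not reach devices in the others.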

// validation

All devices reconnected via DHCP within minutes of breaking the loop
VoIP phones registered and made test calls successfully
CNC machines resumed receiving job files over the network
Broadcast counters returned to normal baseline levels on all switch ports
VLAN segmentation design was implemented as a permanent architectural fix — no recurrence of broadcast storms in the redesigned network

// lessons

A switching loop can produce symptoms that look just like a total WAN failure. Every device is offline, with no internet and no inter-device communication, so it is easy to blame the ISP when the real problem is inside the switching fabric.
Flat networks amplify any single failure into a facility-wide outage. VLAN segmentation isn't just best practice — it's blast radius containment.
Unmanaged switches are risk vectors. They have no STP, no loop protection, and no management visibility. They belong in environments where you control both ends of every cable.
Bottom-up OSI model diagnosis works. Start with the physical layer, then L2 switching, then L3 routing. If the switch's serial console shows a broadcast storm at L2, there is no need to look at the firewall.

PGR-IR-02 · P2 · SECURITY / INCIDENT RESPONSE · RESOLVED
SCOPE · WordPress compromise, dual backdoor, defense-in-depth recovery

WordPress compromise on shared hosting — backdoor removal and defense-in-depth recovery.

// summary

A WordPress site hosted on shared Hostinger/CageFS infrastructure was compromised. The site displayed a defacement page redirecting visitors to a spam domain, and the hosting provider sent a malware detection alert. Initial response — replacing core files and resetting credentials — appeared to resolve the issue. But the compromise persisted through the cleanup. A second infection vector was discovered: PHP backdoor files embedded in the uploads directory using obfuscated base64_decode and eval() payloads, designed to survive a standard file replacement. Complete remediation required a systematic defense-in-depth approach: forensic file analysis, backdoor file removal, credential rotation, WAF deployment, file integrity monitoring setup, and a hardened WordPress configuration.

// environment

Hosting: Hostinger shared hosting with CageFS filesystem isolation
Application: WordPress (standard install, several third-party plugins and themes)
Access: Admin credentials, SFTP access to public_html
Detection: Hostinger security scan flagged malicious file modifications; site visitors saw a spam redirect
Backup status: Pre-compromise backups existed in Hostinger's auto-backup system

// symptoms

Site homepage replaced with a defacement page — visitors redirected to an external spam domain
Hostinger security panel flagged "Malicious code detected" with specific file paths
WordPress admin dashboard accessible but showed altered site health indicators
Standard WordPress file permissions appeared modified on several directories
Google Search Console alerts about site compromise

// initial hypotheses (ruled out)

~~Simple site defacement via known vulnerability~~ — replacing core WordPress files alone did not resolve the issue; the compromise returned
~~Credential compromise (admin password guess)~~ — password reset alone was insufficient; the compromise persisted after credential rotation
~~Plugin vulnerability only~~ — disabling all plugins and using a default theme did not stop the reinfection
~~Hosting provider server compromise~~ — other sites on the same shared hosting account were not affected, suggesting a site-specific infection

// investigation

Confirm initial compromise and scope. Logged into Hostinger panel — confirmed malware alert on specific PHP files. Site was redirecting to spam-domain[.]com. The defacement was the visible symptom, but the vector was unknown.
Attempt standard recovery. Replaced all WordPress core files with fresh downloads, reset all admin credentials, updated plugins. Site appeared clean for ~2 hours — then returned to defacement state. Something was reinfecting the site.
Deep file inspection. Ran a full recursive grep search across public_html for common backdoor signatures: base64_decode, eval() with encoded strings, preg_replace with /e modifier, system(), exec(), shell_exec(). Found two files in wp-content/uploads/2023/ containing obfuscated PHP — hidden among legitimate uploaded images.
Analyze the backdoor payload. The files used a layered obfuscation chain: base64_decode(str_rot13(gzinflate(...))) executed via eval(). The decoded payload checked for specific GET/POST parameters as a trigger — without the trigger key, the file appeared benign. This meant automated scanners could easily miss it.
Check for additional persistence mechanisms. Found no cron-based reinfection, no .htaccess modifications (in this case), no database-stored payloads. The backdoors were pure file-based and depended on being included or accessed directly.
Determine the entry vector. Evidence pointed to a vulnerable plugin (an outdated version with a known CVE) as the initial entry point. The attackers uploaded the backdoor files through that plugin's unrestricted file upload flaw.
Complete file-level cleanup. Removed both backdoor files. Ran a comprehensive sweep of all PHP files for encoded strings, suspicious file permissions, and unexpected modifications. Verified against known-good hashes from original plugin/theme downloads.
Implement defense-in-depth. Deployed Cloudflare WAF with OWASP rule set and WordPress-specific rate limiting. Installed Wordfence Security with real-time file integrity monitoring and firewall. Removed the vulnerable plugin entirely. Restricted upload directory execution (where hosting allowed). Set up automated offsite backups with 7-day retention.
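
The signature sweep from the deep file inspection step can be approximated in Python when shell access for grep is not available. This is a minimal sketch, not the command that was run: the `scan_tree` helper, the exact pattern list, and the demo file names are assumptions. Legitimate code also calls functions such as `base64_decode`, so every hit is a candidate for manual review rather than a verdict.

```python
import re
import tempfile
from pathlib import Path

# Function calls commonly seen in PHP backdoors; the list is illustrative.
SIGNATURE = re.compile(
    rb"\b(base64_decode|eval|gzinflate|str_rot13|system|exec|shell_exec)\s*\(",
    re.IGNORECASE,
)


def scan_tree(root):
    """Map each matching file (path relative to root) to the calls found.

    Every file is read regardless of extension, because a backdoor may be
    disguised as a media upload.
    """
    root = Path(root)
    hits = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        found = {
            match.group(1).lower().decode("ascii")
            for match in SIGNATURE.finditer(path.read_bytes())
        }
        if found:
            hits[path.relative_to(root).as_posix()] = sorted(found)
    return hits


# Demo on a throwaway directory with one benign and one suspicious file.
with tempfile.TemporaryDirectory() as tmp:
    uploads = Path(tmp, "wp-content", "uploads", "2023")
    uploads.mkdir(parents=True)
    (uploads / "photo.jpg").write_bytes(b"\xff\xd8\xff\xe0 not php")
    (uploads / "thumb.php").write_bytes(
        b"<?php eval(base64_decode($_POST['k'])); ?>"
    )
    demo_hits = scan_tree(tmp)
print(demo_hits)
```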

// root cause

RC-1 (initial compromise): A vulnerable third-party WordPress plugin with an unrestricted file upload vulnerability allowed attackers to upload PHP backdoor files disguised as legitimate media uploads into wp-content/uploads/.

RC-2 (persistence): The backdoor files used layered obfuscation (base64_decode + str_rot13 + gzinflate + eval) to evade signature-based detection. They were hidden among thousands of legitimate uploaded files and activated only when specific HTTP parameters were present, making them invisible to casual inspection and standard automated scanners.
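
Peeling such a chain offline, without ever executing the result, might look like the sketch below. It assumes the nesting order quoted in the investigation, `base64_decode(str_rot13(gzinflate(...)))`, so the innermost `gzinflate` is undone first. The `peel_layers` and `build_sample` helpers and the harmless sample payload are hypothetical. A real sample may nest the functions differently.

```python
import base64
import codecs
import zlib


def peel_layers(blob: bytes) -> bytes:
    """Undo gzinflate, then str_rot13, then base64_decode, in that order.

    The result is returned for reading only. It is never executed.
    """
    inflated = zlib.decompress(blob, -15)  # PHP gzinflate reads raw DEFLATE
    rotated = codecs.encode(inflated.decode("ascii"), "rot_13")  # str_rot13
    return base64.b64decode(rotated)       # base64_decode


def build_sample(plaintext: bytes) -> bytes:
    """Wrap harmless text in the same three layers, for demonstration."""
    encoded = base64.b64encode(plaintext).decode("ascii")
    rotated = codecs.encode(encoded, "rot_13")  # ROT13 is its own inverse
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw DEFLATE
    return compressor.compress(rotated.encode("ascii")) + compressor.flush()


sample = build_sample(b"echo 'harmless demo';")
print(peel_layers(sample))
```

Decoding on a separate machine keeps the analysis away from the live site and makes the trigger logic readable as plain PHP.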

// resolution

Removed two obfuscated PHP backdoor files from wp-content/uploads/2023/
Replaced all WordPress core files with fresh downloads from wordpress.org
Updated/removed all vulnerable plugins and themes
Reset all credentials (admin, SFTP, database, hosting panel)
Deployed Cloudflare WAF with WordPress-specific security rules
Installed and configured Wordfence Security with real-time file integrity monitoring
Established automated offsite backup pipeline with verified restore testing

// validation

Site returned to normal operation — no further defacement, no redirects
Wordfence file integrity scans confirmed no unexpected file modifications
Hostinger security panel showed clean status
Google Search Console re-verified site ownership and reported the site as clean
Cloudflare WAF logged blocked exploit attempts in the days following cleanup, confirming the WAF was intercepting ongoing attack traffic
File integrity monitoring established as an ongoing operational practice — monthly audit cadence implemented
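
The idea behind a file integrity check is a hash baseline compared against the current tree. The sketch below shows that comparison with SHA-256. It does not reproduce how Wordfence works internally, and the `build_manifest` and `diff_manifests` helpers and the demo files are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root):
    """Map every file under root (relative path) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(
            path.read_bytes()
        ).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifests(baseline, current):
    """Report files added, removed, or modified since the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(k for k in shared if baseline[k] != current[k]),
    }


# Demo: take a baseline, tamper with one file, and drop in a new one.
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    (site / "index.php").write_text("<?php // original")
    baseline = build_manifest(site)
    (site / "index.php").write_text("<?php // tampered")
    (site / "shell.php").write_text("<?php // unexpected")
    report = diff_manifests(baseline, build_manifest(site))
print(report)
```

A baseline is only useful if it is taken from a known-good state and stored somewhere the web server cannot write to.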

// lessons

File replacement alone is not incident response. Until you find and remove the persistence mechanism, you're treating symptoms, not the infection.
Backdoor files in upload directories are a standard WordPress attack pattern. Upload directories are writable by design and easy to hide files in among legitimate uploads.
Obfuscated PHP (base64_decode + eval chains) can defeat naive grep searches. String matching only surfaces candidates; you have to decode and understand the payload to know what you are dealing with.
Defense-in-depth is what keeps a cleaned site clean. WAF, file integrity monitoring, restricted file permissions, regular updates, and verified backups — any one layer can fail, but multiple layers make reinfection harder.

Network & System Admin
in deliberate motion.

Infrastructure has been the through-line.
Now deliberately the career.

Hands-on before I had the title.

The homelab is where I prove it.

CCNA, RHCSA, and the deliberate path.

Two incident reports.
Real production, real investigation, real engineering judgment.

Complete LAN failure at a stone fabrication facility — switching loop diagnosis and network redesign.

WordPress compromise on shared hosting — backdoor removal and defense-in-depth recovery.

Before the homelab,
there was a production network to fix.

IT Support Specialist · ProGranite Surfaces · Seattle, WA

IT Support · UW Continuum College · Seattle, WA

The way I'd want a data center
to be operated.

Composite health view

Hypervisors

Storage

Scripting isn't optional
in a modern infrastructure role.

Hermes Agent — AI-Powered Operations

Weekly Security Audit Pipeline

Infrastructure Monitoring & Alerting

NFS Mount Recovery & Backup Orchestration

Hire the operator.
The certifications follow.
