Much of the work we do within the Professional Services team at Axians consists of Proof of Concept projects and a number of different types of testing for large service providers and enterprises, using our Ixia test equipment and kit from multiple vendors such as Juniper, Silver Peak, Versa, Cisco and more.

We currently use Netbox to keep track of devices, tenants, line cards, IP addresses etc. However, because of the ever-changing nature of our lab, we do not track interfaces and cables, as we swap out line cards and interfaces so frequently for different customer projects.

Netbox Overview:

  • 13 Racks
  • 645 Devices (Patch Panels, Line Cards, Chassis etc.)
  • 390 Recorded Cables (Mostly Power and Core Infrastructure)
  • 22 Tenants (We use tenants to track customer projects)

A16 Rack

As you can see this is what a typical rack looks like...

Background

Our parent company Vinci Energies has a green initiative whereby all Business Units should be looking to reduce their carbon footprint where possible. In our case at Axians UK, we are powering 13 racks of equipment that might not all be in use, due to projects not starting, kit not being used for specific customers etc.

More on this can be seen here.

In addition to this, we do not own the facility in which these racks are located, meaning we have to pay for power consumption...

Power Monitoring

Since starting at Axians I have built several lab tools that allow us to monitor the power usage within our lab. Luckily our PDUs are "smart", allowing us to monitor and configure them using the GUI and SNMP. We use the Telegraf, Influx, Grafana (TIG) stack to poll the smart PDUs within each rack for their power consumption.

Previous Experience

In a former role within Cisco Solution Validation Services I worked as an intern within the lab team, racking, stacking and testing customer solutions, and also on the lab tooling team using PHP. Within Cisco we had several hundred racks and thousands of devices split over multiple locations and regions, not all of which needed to be powered on. Because of this we had an internal tool, built in PHP, that power scheduled projects so they were on only when required.

Solution

Before Netbox Plugins, we had basic internal tools that polled Netbox via its API and displayed the information on a web page along with power information. Custom links pointed devices to our external power management tool, which allowed us to see port status for devices and to force a power socket into the "off" state.

OLD PDU APP (FLASK)

Netbox Plugins have allowed us to integrate this into Netbox with the addition of several features. BIG SHOUT OUT TO JEREMY AND THE TEAM.


Firstly here are some of our requirements:

  • Ability to gracefully shut down devices (request system halt)
  • Turn off whole tenants
  • Ability to exclude devices such as servers
  • View PDU port status
  • Power schedule devices (turn off at night, turn on in the morning)
  • Queue Jobs
  • Power schedule reports (Logging)
  • Email Reports

Gracefully shutting down means logging into the device using Napalm to request a shutdown. If the shutdown is successful, the socket is then scheduled to be powered down once the device is fully off.
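
The graceful flow can be sketched roughly as below. This is a minimal illustration, not the plugin's actual code: `graceful_power_down`, `request_halt` and `schedule_outlet_off` are hypothetical names, with the two callables standing in for the Napalm session and for whatever queues the delayed socket power-off. The delay is assumed to be in minutes.

```python
from typing import Callable


def graceful_power_down(
    device_name: str,
    request_halt: Callable[[str], bool],
    schedule_outlet_off: Callable[[str, int], None],
    wait_minutes: int = 1,
) -> bool:
    """Ask a device to halt, then queue its socket to be switched off.

    Both callables are hypothetical hooks: request_halt would wrap the
    Napalm login and halt request, schedule_outlet_off would queue the
    delayed power-off of the device's PDU socket.
    Returns True only if the halt was accepted and the power-off queued.
    """
    try:
        halted = request_halt(device_name)
    except Exception:
        # An unreachable device or failed login counts as a failed halt.
        halted = False

    if not halted:
        # Leave the socket on: cutting power now would not be graceful.
        return False

    schedule_outlet_off(device_name, wait_minutes)
    return True


# Usage with stand-in callables instead of a real device and PDU.
queued = []
accepted = graceful_power_down(
    "lab-router-1",
    request_halt=lambda name: True,
    schedule_outlet_off=lambda name, delay: queued.append((name, delay)),
)
```

Keeping the halt and the socket power-off as separate steps means a device that refuses to halt is never powered off underneath a running system.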

How this was achieved:

Configuration

Netbox Plugins allow for configuration to be included within configuration.py.

PLUGINS_CONFIG = {
    'netbox_axians_pdu': {
        # SNMP credentials used to talk to the PDUs
        'public': 'public',
        'private': 'private',
        'power_down_wait': 1,
        # Custom fields read from device types
        'device_type': {
            'cf_enabled': 'Power Enabled',
            'cf_override': 'Force Override'
        },
        # Custom fields read from tenants, plus the daily schedule
        'tenant': {
            'cf_enabled': 'Power Scheduling',
            'cf_start_date': 'Project Start Date',
            'cf_end_date': 'Project End Date',
            'on_time': '07:00',
            'off_time': '18:30'
        },
        # SNMP OIDs, keyed by PDU model
        'pdu_model': {
            'ap8953': {
                'port_state_oid': '.1.3.6.1.4.1.318.1.1.26.9.2.4.1.5',
                'power_usage_watts': '.1.3.6.1.4.1.318.1.1.12.1.16.0'
            }
        },
        # Report recipients and whether report emails are sent
        'emails': ['alexander.gittings@axians.co.uk'],
        'email': True
    }
}

This configuration includes items like SNMP Credentials, Custom Field Information, PDU SNMP OID information and emails.

I opted to use Netbox Custom Fields on Device Types so that users can decide, without having to go to a separate part of Netbox, whether a device's power can be managed and whether we want to force the socket off instead of performing a graceful power down.
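
As a rough sketch of how those two custom fields could drive the decision: the function below is hypothetical (`power_action` is not a name from the plugin) and assumes the custom fields arrive as a plain dictionary of field name to value, with the field names taken from the `device_type` section of the configuration.

```python
def power_action(custom_fields: dict, plugin_config: dict) -> str:
    """Return "skip", "force" or "graceful" for a device type.

    custom_fields is assumed to map custom field names to their values.
    """
    names = plugin_config["device_type"]
    if not custom_fields.get(names["cf_enabled"]):
        # Power management is not enabled for this device type.
        return "skip"
    if custom_fields.get(names["cf_override"]):
        # Switch the socket off directly, with no graceful shutdown.
        return "force"
    return "graceful"


# Field names copied from the device_type section of the configuration.
config = {
    "device_type": {
        "cf_enabled": "Power Enabled",
        "cf_override": "Force Override",
    }
}

router_action = power_action({"Power Enabled": True}, config)
server_action = power_action({"Power Enabled": False}, config)
```

A device type with neither field set falls through to "skip", so nothing is powered down unless someone has explicitly opted it in.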

A feature that would be neat within Netbox Plugins would be the ability to add additional inputs to existing forms, such as the device type form, instead of using Custom Fields.

Logging

Firstly I needed to build a logging class that would allow me to store logs and display them nicely for users. For this I was able to get a lot of inspiration from the current Netbox Reporting functionality, which allowed me to use the following logging structure within my tasks.

self.log_success(obj, "success message")
self.log_warning(obj, "warning message")
self.log_info(obj, "info message")
self.log_failure(obj, "failure message")
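
A minimal sketch of a class exposing those four methods is shown below. It is an illustration only: `TaskLog`, its `entries` list and its `summary` counts are assumed names, not the plugin's real model, and the real version would store its results in Netbox rather than in memory.

```python
from datetime import datetime, timezone


class TaskLog:
    """Collect per-object log entries and a count for each level."""

    LEVELS = ("success", "info", "warning", "failure")

    def __init__(self):
        self.entries = []
        self.summary = dict.fromkeys(self.LEVELS, 0)

    def _log(self, level, obj, message):
        # Store a timestamp, the level, the object's name and the message.
        self.entries.append((datetime.now(timezone.utc), level, str(obj), message))
        self.summary[level] += 1

    def log_success(self, obj, message):
        self._log("success", obj, message)

    def log_info(self, obj, message):
        self._log("info", obj, message)

    def log_warning(self, obj, message):
        self._log("warning", obj, message)

    def log_failure(self, obj, message):
        self._log("failure", obj, message)

    @property
    def failed(self):
        """True if at least one failure was logged."""
        return self.summary["failure"] > 0


log = TaskLog()
log.log_success("pdu-a16-1", "outlet 4 switched off")
log.log_failure("lab-router-1", "halt request rejected")
```

The per-level counts are what a report overview would need in order to show Success, Info, Warning and Failure totals for each run.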

Tasks

I then needed to break my requirements down into tasks which could be executed.

This is a high-level view of the tasks but shows the basic structure.

  • Set Tenant Power
    • Set Device Power
      • Turn Off Device Gracefully (Napalm)
      • Set Outlet Power
  • Get PDU Outlet Status
  • Get Individual Outlet Status


There is a lot more to this, but this structure allows a device to be powered individually or as part of a group through a tenant. The individual components all come together to minimise repetition.
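
The nesting of the task list can be sketched as plain functions that call each other. Everything here is hypothetical: devices are represented as dictionaries with assumed keys (`name`, `outlets`, `enabled`, `force`), and `graceful_off` and `set_outlet` stand in for the Napalm and SNMP tasks.

```python
def set_device_power(device, state, graceful_off, set_outlet):
    """Power one device: optional graceful halt, then each connected outlet."""
    if state == "off" and not device.get("force", False):
        if not graceful_off(device["name"]):
            # Halt was refused, so leave the outlets untouched.
            return False
    for outlet in device["outlets"]:
        set_outlet(outlet, state)
    return True


def set_tenant_power(devices, state, graceful_off, set_outlet):
    """Power every enabled device of a tenant by reusing the device task."""
    results = {}
    for device in devices:
        if not device.get("enabled", True):
            continue  # excluded devices, such as servers, are skipped
        results[device["name"]] = set_device_power(
            device, state, graceful_off, set_outlet
        )
    return results


calls = []
tenant_devices = [
    {"name": "r1", "outlets": ["pdu1:1", "pdu1:2"]},
    {"name": "srv1", "outlets": ["pdu1:3"], "enabled": False},
]
results = set_tenant_power(
    tenant_devices,
    "off",
    graceful_off=lambda name: True,
    set_outlet=lambda outlet, state: calls.append((outlet, state)),
)
```

Because the tenant task only loops over the device task, a manual action on a single device and a scheduled action on a whole tenant go through the same code path.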

Power Scheduling

Once all the above is in place we then use Django rq-scheduler to run tasks periodically:

  • Get PDU Outlet Status
  • Power Schedule

Checking the outlet status performs a walk of the PDU outlets for their power state and updates a model within Netbox. The Power Schedule checks the current time against the times in the configuration.py settings and determines whether it is a weekend, an evening etc. If it is the evening, it begins to power down tenants, provided that both the tenant and the device types of its devices have been enabled.
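
The time check could look something like the sketch below. `desired_power_state` is a hypothetical helper, and it assumes that weekends count as "off" for the whole day and that the schedule uses the `on_time` and `off_time` strings from the configuration.

```python
from datetime import datetime, time


def desired_power_state(now: datetime, on_time: str = "07:00", off_time: str = "18:30") -> str:
    """Return "on" during weekday working hours and "off" otherwise."""
    start = time.fromisoformat(on_time)
    end = time.fromisoformat(off_time)
    if now.weekday() >= 5:
        # Saturday is 5 and Sunday is 6: assumed to be off all day.
        return "off"
    return "on" if start <= now.time() < end else "off"


# 3 January 2024 was a Wednesday, 6 January 2024 a Saturday.
midday_state = desired_power_state(datetime(2024, 1, 3, 12, 0))
evening_state = desired_power_state(datetime(2024, 1, 3, 19, 0))
weekend_state = desired_power_state(datetime(2024, 1, 6, 12, 0))
```

Returning the desired state, rather than an action, keeps the check idempotent: running it every 30 minutes repeatedly asks for the same state instead of toggling anything.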

Results

Tenant View

We are able to see the power status of any device within the tenant that has power ports connected.

Tenant Power Buttons

If a tenant has devices with connected power ports, buttons appear giving the user the ability to turn them on, off (gracefully) or off (forcefully).

Device View

If a device has connected power ports, the power buttons are shown for it along with the status of its outlets.

PDU Device View

On a device with outlets, the outlet status is displayed along with the time it was last polled.

Power Schedule Reports

This is an overview of the automated power tasks, which run every 30 minutes. The view shows how many devices were affected, whether an email has automatically been sent, and a summary of the results (Success, Info, Warning, Failure).

Power Schedule Report Detail

This view allows us to see what is happening each time the power schedule is run. We also gather the same information for manual power activities, but this is contained within the Django Admin Panel.

Report Email

When an email is sent it looks like the following:

Grafana


This shows the implementation of this tool on one tenant so far and the effect it's having on our power consumption.

On average this one tenant saves around 6 kW of power when turned off. This equates to around 25 kgCO2e saved per day and around £10 in electricity; over a year that surely builds up.
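
As a rough sanity check on those figures, the arithmetic for a single weekday night can be worked through as below. This is a sketch under stated assumptions: the tenant is taken to be off from 18:30 to 07:00 as in the configuration, the £10 is read as a per-day figure, and the per-kWh factors are simply what the quoted numbers imply rather than published rates.

```python
from datetime import datetime, timedelta


def off_hours(off_time: str = "18:30", on_time: str = "07:00") -> float:
    """Hours between switching off in the evening and on the next morning."""
    off = datetime.strptime(off_time, "%H:%M")
    on = datetime.strptime(on_time, "%H:%M")
    if on <= off:
        on += timedelta(days=1)  # the on time falls on the following day
    return (on - off).total_seconds() / 3600


SAVED_KW = 6        # average saving quoted for the tenant
QUOTED_KG_CO2E = 25  # quoted daily carbon saving
QUOTED_GBP = 10      # quoted saving, assumed here to be per day

hours = off_hours()            # 12.5 hours off per weekday night
energy_kwh = SAVED_KW * hours  # 75.0 kWh not consumed

# Factors implied by the quoted figures, not measured values.
implied_kg_per_kwh = QUOTED_KG_CO2E / energy_kwh
implied_gbp_per_kwh = QUOTED_GBP / energy_kwh
```

Weekends, when the tenant stays off all day under the schedule, would add to these weekday-night figures.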

Improvements

There are several improvements to make:

  • Support more vendors
  • Better error handling (if a device fails to be turned off twice, put it into an exclusion table so that it is not attempted on future power schedules)

For now that's all. If you're interested in more details please contact me at alexander.gittings@axians.co.uk.