Once power peaks have been mitigated and power is allocated dynamically, available power can be reallocated; together, peak mitigation and dynamic allocation maximize utilization of available power. To provide a truly hyper-scalable data center, rack power must be able to scale dynamically along with other resources, without excessive overprovisioning or expensive, disruptive power-supply upgrades. This can be achieved with a software-hardware combination that optimizes power provisioning.

Intel’s Rack Scale Design (RSD) supports comprehensive management for all resources in the data center, enabling a true software-defined infrastructure. RSD data-center architecture separates compute, storage, and network resources into groups of components, called pools, that can be efficiently assembled or “composed” on demand to create a precise hardware configuration to match a specific software workload requirement.

Because resources are separated and abstracted, each type of resource can be expanded, replaced, or upgraded on its own refresh cycle. This ensures that users get the latest technology, best performance, and optimum capacity without rendering other resources prematurely obsolete. However, with traditional rack power facilities, module expansions and upgrades can still be limited by power provisioning, resulting in energy waste and potential service interruptions in the data center.

In most data centers, overprovisioning is a response to several factors:

  • Power safety margins required to compensate for the inability to accurately predict power load demands, including peak and seasonal variations.
  • Unpredictability exacerbated by very-high-performance systems, which may drive increasingly large variations in power consumption during peak-performance periods.
  • Power equipment that’s typically only available (or economical) in large increments.

Software-Defined Power

Software-Defined Power (SDP), combined with associated hardware, manages provisioning by employing Intelligent Control of Energy (ICE): power consumed by IT loads is monitored at sub-second intervals by strategically located power sensors, with ICE hardware optimized to provide the sensor data.

CUI partnered with Virtual Power Systems (VPS) to deliver ICE—a solution that uses an existing infrastructure footprint to deliver more power within the data center by elasticizing it as a resource. ICE intelligently and dynamically allocates power to racks, branch circuits, and IT nodes, with constant awareness of power-consumption needs across the data-center topology.

SDP improves the efficiency and flexibility of data-center power distribution, reducing power over-allocation and making it easy to adapt to dynamic power requirements. VPS calls this capability “Capacity Assurance,” and SDP delivers it through that same sub-second monitoring of IT loads.

This increased visibility of immediate power requirements is coupled with active energy storage in the form of batteries at key locations within the data center, which can be used to meet peak demand without overloading the overall power infrastructure. SDP provides power capacity assurance by increasing the effective capacity of the power infrastructure smoothly and economically, avoiding expensive upgrades.

This basic technology is enhanced with learning algorithms that analyze and predict power consumption over time. As a result, it can automatically optimize load-balancing parameters to ensure sustained power availability and marginally higher capacity. Multiple levels of optimization and intelligence ensure that both short- and long-term power-consumption trends are considered.
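As a rough illustration of the short- and long-term trend analysis described above, the sketch below blends two moving averages of rack power into a near-term forecast. The class name, window sizes, and blend weights are illustrative assumptions, not details of the ICE product.

```python
from collections import deque

class PowerPredictor:
    """Blend a short- and a long-term moving average of rack power
    (in watts) into a near-term demand forecast. Hypothetical sketch."""

    def __init__(self, short_window=5, long_window=60):
        self.short = deque(maxlen=short_window)   # recent samples only
        self.long = deque(maxlen=long_window)     # long-term history

    def observe(self, watts):
        self.short.append(watts)
        self.long.append(watts)

    def forecast(self):
        # Weight the short-term trend more heavily, with the long-term
        # average acting as a stabilizer. Requires at least one sample.
        s = sum(self.short) / len(self.short)
        l = sum(self.long) / len(self.long)
        return 0.7 * s + 0.3 * l

p = PowerPredictor()
for w in [800, 820, 810, 900, 950]:
    p.observe(w)
budget_hint = p.forecast()   # forecast used to set the next power budget
```

A real predictor would also model seasonality and peak recurrence; the two-window blend only captures the short-versus-long-term distinction the article mentions.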

A key component of SDP is SourceMix, which addresses the need for dynamic redundancy. This technology complements the static power infrastructure in a typical data center by providing intelligent, software-defined power redundancy through optimized control of power switches. With SDP’s in-rack switches, customers using dual-corded power distribution are able to dynamically assign power redundancy while staying within the overall capacity of the power infrastructure. SourceMix’s dynamic power redundancy allocation can optimize the mix of single and dual corded power distribution to unlock unused power capacity as workload requirements change.

1. A typical ICE system is housed in a standard 19-in., 1U package and carries UL/cUL safety certifications.

Figure 1 shows a conceptual ICE setup between the power distribution unit (PDU) and servers. All ICE hardware devices, including the one in Fig. 1, are monitored and controlled through the ICE Software Controller, connected via Ethernet. The ICE controller puts the PDU and its racks on a desired power budget, which in turn places appropriate peak-shaving limits on the ICE hardware.

Subsequently, the ICE hardware removes peaks past the desired budget through the timely release of stored energy. The ICE controller determines the timely discharge of stored energy to control the peaks, while maintaining the health and charge of the battery. The ICE Switch monitors power and provides dynamic redundancy inside the data center. Consisting of two identical modules with up to 50 A of current per module, the ICE Switch features hot-swap functionality and a configurable single- or three-phase input.
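The budget-and-shave behavior described above can be sketched as a single control step: when measured load exceeds the budget, stored energy makes up the difference; when load is below budget, spare headroom recharges the battery. The function name and limits here are hypothetical simplifications of the controller's logic, and the battery-capacity ceiling on charging is omitted for brevity.

```python
def shave_step(load_w, budget_w, battery_wh, dt_h, max_discharge_w, charge_w):
    """One control interval of hypothetical peak shaving.

    Returns (grid_draw_w, new_battery_wh). load_w is the measured IT
    load, budget_w the grid power budget, battery_wh the stored energy,
    dt_h the interval length in hours.
    """
    if load_w > budget_w:
        # Shave the excess, limited by inverter rating and stored energy.
        discharge = min(load_w - budget_w, max_discharge_w, battery_wh / dt_h)
        return load_w - discharge, battery_wh - discharge * dt_h
    # Below budget: recharge, but keep total grid draw within budget.
    charge = min(charge_w, budget_w - load_w)
    return load_w + charge, battery_wh + charge * dt_h

# A 5-kW load against a 4-kW budget over a one-second interval:
grid_w, soc_wh = shave_step(5000, 4000, 2000, 1 / 3600, 3000, 500)
```

With sufficient stored energy and inverter headroom, the grid draw stays pinned at the budget through the peak, which is the peak-shaving guarantee the controller enforces.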

The software, developed by VPS, works in a scaled-out, distributed mode. It treats ICE Blocks, manufactured by CUI, like batteries: as a pool of resources across the data center as well as separate units. Treating racks as a group (a row, a room, or the whole data center), rather than controlling individual racks, enables more efficient power optimization.

Acting as a unit, the ICE hardware makes it possible to exceed, for short periods, the rack or branch circuit breaker capacity, providing peak assurance beyond breaker limits without tripping any breaker. This group policy control is referred to as ICE Rackshare and acts as an application running on the controller.
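A minimal sketch of the group-policy idea behind Rackshare: enforce one budget over a set of racks and scale individual draws only when the group total exceeds it, instead of capping each rack statically. The function name and proportional-scaling policy are assumptions for illustration, not the actual Rackshare algorithm.

```python
def rackshare_limits(rack_loads, group_budget_w):
    """Divide a group power budget among racks in proportion to their
    current demand (watts). rack_loads maps rack name -> requested load."""
    total = sum(rack_loads.values())
    if total <= group_budget_w:
        return dict(rack_loads)          # everyone gets what it asks for
    scale = group_budget_w / total       # shave all racks proportionally
    return {rack: load * scale for rack, load in rack_loads.items()}

# Two racks asking for 12 kW total against a 9-kW group budget:
limits = rackshare_limits({"rack1": 8000, "rack2": 4000}, 9000)
```

Because the budget is enforced over the group, one rack can briefly exceed what a static per-rack cap would allow, so long as the group stays within the branch circuit's limit.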

2. The ICE system includes the ICE Block and ICE Switch, a power monitoring and switching system that delivers dynamic redundancy capabilities. It allows unutilized power to be provisioned to additional servers, then reallocated to critical equipment if an outage occurs.

The circuit that controls ac power for the compute, storage, and network nodes is shown in Figure 2. Initially, the battery is charged. When called upon by the software-defined power, the battery powers the inverter to produce a sine-wave power output to support a peak demand for power. The ICE Block consists of a current-mode inverter and a battery charger. The device is designed to limit the input power, when required, as defined by the active policy regardless of load power variations.

ICE employs an optimization algorithm that dynamically controls the mix of utility power and local battery power consumed at different points in the data-center topology. By adjusting the mix every few seconds, ICE controls which batteries are charging and those that are discharging at any moment, creating a form of “powersharing” among all batteries. The result is a more dynamic power capacity that can be “moved around” in the data center to where and when it’s needed.
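The per-interval charge/discharge decision behind this “powersharing” can be caricatured as a role assignment driven by each node's budget and battery state of charge. The thresholds, role names, and function below are invented for the sketch and do not reflect ICE's actual optimization algorithm.

```python
def assign_battery_roles(loads_w, budgets_w, socs, min_soc=0.2, max_soc=0.95):
    """For each node, decide whether its local battery should discharge
    (load above budget), charge (headroom available and battery not
    full), or idle for the next interval. socs are 0..1 fractions."""
    roles = {}
    for node in loads_w:
        over_budget = loads_w[node] > budgets_w[node]
        if over_budget and socs[node] > min_soc:
            roles[node] = "discharge"    # shave this node's peak
        elif not over_budget and socs[node] < max_soc:
            roles[node] = "charge"       # bank energy while demand is low
        else:
            roles[node] = "idle"         # protect battery health
    return roles
```

Re-running a decision like this every few seconds is what lets capacity effectively “move around” the data center: nodes under their budget bank energy that nodes over budget spend.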

Hardware and software modules placed at the three power-decision control points (Automatic Transfer Switch; Static Transfer Switch/Power Distribution Units; Rack) help make dynamic, on-demand decisions on capacity and sourcing.

ICE software consists of an ICE Operating System that collects telemetry data from ICE and other infrastructure hardware, making this data presentable for real-time control operations in ICE. ICE applications sit on top of the ICE OS and perform a variety of functions. The most critical is executing power-optimization algorithms that maximize capacity use to deliver various objective functions.

Power Provisioning

Power draw varies between its average and its peaks. Deploying power infrastructure sized for peak loads locks up capacity that sits unused except during those infrequent peaks. ICE works on the simple idea of using stored energy in the form of batteries to remove the peak draw from the infrastructure and thus unlock that capacity for IT use.

ICE software provides peak-shaving capabilities and unlocks underutilized power capacity in data centers. The ICE Block employs peak-shaving principles to reduce the power capacity needed to support a rack or row of servers. By profiling power demand and employing battery storage, ICE meets peak demand with power stored during low-utilization periods. When paired with Virtual Power Systems’ ICE software suite, ICE Block and ICE Switch can save data centers millions of dollars in operating expenses and capital expenditures.

Battery-based peak shaving is a powerful tool because it reacts very quickly to peaks according to objective functions or goals set within the ICE console. This quick reaction in peak shaving also enables the use of other hardware devices to address longer and broader peaks.

3. Characterization of peaks.

Figure 3 shows two kinds of peaks: thin, high peaks and thick, broad peaks. Researchers at Penn State, in their work on data-center cost optimization via workload modulation, characterize peaks in terms of:

  1. The peak-to-average ratio (PAR), which captures the peak-shaving potential of the workload.
  2. A function of peak width and frequency (Px), defined as the percentage of time slots in which the power demand exceeds x% of the peak.

ICE Block is optimized to remove high, thin, spaced peaks characterized by high PAR values and low Px. The spacing between such peaks allows the batteries to recharge while not shaving. Thick, broad peaks (low PAR and high Px) instead require alternative energy sources, such as a genset or higher-capacity batteries, for peak shaving.
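The two metrics, and the thin-versus-broad distinction they support, can be computed directly from a power trace. The helper below interprets Px relative to the observed peak, and the classification thresholds are illustrative guesses, not values from the Penn State paper.

```python
def peak_metrics(trace_w, x_pct=80.0):
    """Return (PAR, Px) for a power trace in watts.

    PAR is peak-to-average ratio; Px is the percentage of time slots
    whose demand exceeds x% of the observed peak.
    """
    peak = max(trace_w)
    par = peak / (sum(trace_w) / len(trace_w))
    threshold = peak * x_pct / 100.0
    px = sum(1 for v in trace_w if v > threshold) / len(trace_w) * 100.0
    return par, px

def battery_shaveable(par, px, par_min=1.3, px_max=10.0):
    """Thin, spaced peaks (high PAR, low Px) suit battery shaving;
    broad peaks (low PAR, high Px) need other sources. Thresholds
    are assumptions for illustration."""
    return par >= par_min and px <= px_max

# One thin spike over a flat 100-W baseline:
par, px = peak_metrics([100] * 9 + [200])
```

A flat trace with a single short spike yields a high PAR and a low Px, exactly the profile batteries handle well, because the long quiet stretches leave time to recharge.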

Dynamic Redundancy

ICE can bring utilization all the way to 80% or higher with peak assurance and dynamic redundancy. This is significant given the study detailed in Schneider’s whitepaper, “Determining Total Cost of Ownership for Data Center and Network Room Infrastructure,” which finds that the single largest driver of data-center TCO is the unabsorbed overhead cost of underutilized infrastructure.

The most obvious TCO metric that ICE impacts is $/watt, where $ is the amount spent on electrical equipment. On average, data centers spend $7 on capacity for every watt added. ICE increases the watts consumed by IT equipment by bringing utilization closer to what’s provisioned, cutting $/watt by as much as 50%. Although $/watt is a widely used metric, it fails to impress IT staff because it doesn’t capture the useful work done by IT equipment.
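The $/watt arithmetic works out roughly as follows; the utilization figures are assumptions chosen to match the ~50% reduction and the 80% utilization cited above, not measured values.

```python
# Illustrative $/watt calculation. The $7/W figure comes from the
# article; the utilization levels are assumed for the example.
cost_per_watt = 7.0             # $ of electrical equipment per watt added
utilization_before = 0.40       # assumed stranded-capacity baseline
utilization_after = 0.80        # with peak assurance / dynamic redundancy

# Cost per watt actually consumed by IT equipment:
effective_cost_before = cost_per_watt / utilization_before
effective_cost_after = cost_per_watt / utilization_after
# Doubling utilization halves the effective cost per useful watt.
```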

4. A 2N redundant data center can now have extra capacity for lower-priority racks.

Figure 4 shows a 2N redundant setup with ICE-enabled dynamic redundancy. Here, power is oversubscribed by adding lower-priority racks. In 2N redundancy, power feeds A and B together serve a 400-kW peak demand (200 kW on each feed in Fig. 4). If one feed fails, the other takes over the full 400-kW peak load.

In Figure 5, an additional 100 kW of lower-priority racks is added, bringing peak demand to 500 kW (a 400-kW mission-critical load plus 100 kW with lower service-level agreements, or SLAs). This setup poses no problem while both Feed A and Feed B are active. However, if either feed fails, the result is a 100-kW shortage in a traditional setup without ICE.

5. Remediation during failure impacts only lower-SLA racks.

Fig. 5 shows the scenario when Feed A fails, putting the complete 500-kW load on Feed B. ICE manages this failure scenario in two ways:

  1. Low-priority racks are instantly switched off the feed via an ICE-enabled switch at the racks.
  2. The load on Feed B is thereby kept at 400 kW, within breaker capacity.

ICE batteries would sustain the low-priority racks for a configurable amount of time before shutting them off. This allows time for a graceful shutdown of those racks, or for other measures, such as workload migration, to move the workload to safe racks.
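The failure-handling policy of Figures 4 and 5 reduces to a small decision rule, sketched below with the article's 400-kW/100-kW numbers. The function name, data shapes, and ride-through default are hypothetical.

```python
def handle_feed_failure(racks, surviving_capacity_w, ride_through_s=300):
    """Plan remediation when one of two 2N feeds fails.

    racks maps name -> (watts, priority), priority "critical" or "low".
    Critical racks stay on the surviving feed; low-priority racks ride
    through on battery for ride_through_s seconds, then shut down.
    """
    critical_w = sum(w for w, p in racks.values() if p == "critical")
    if critical_w > surviving_capacity_w:
        # 2N design invariant: the mission-critical load alone must
        # always fit on a single feed.
        raise RuntimeError("critical load exceeds surviving feed capacity")
    return {name: ("surviving feed" if p == "critical"
                   else f"battery for {ride_through_s}s, then shutdown")
            for name, (w, p) in racks.items()}

# Figure 5 scenario: 400 kW critical + 100 kW low-SLA, Feed A lost.
plan = handle_feed_failure(
    {"row_a": (200_000, "critical"),
     "row_b": (200_000, "critical"),
     "low_sla": (100_000, "low")},
    surviving_capacity_w=400_000)
```

The surviving feed never sees more than its 400-kW breaker rating, while the battery window buys time for graceful shutdown or workload migration off the low-SLA racks.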

References

“Power Provisioning for a Warehouse-Sized Computer,” Xiaobo Fan, Wolf-Dietrich Weber, and Luiz Andre Barroso, ISCA, 2007.

“Cost of Power in Large-Scale Data Centers,” James Hamilton, 2008.

“Data Center Cost Optimization Via Workload Modulation Under Real-World Electricity Pricing,” Cheng Wang, Bhuvan Urgaonkar, Qian Wang, George Kesidis, and Anand Sivasubramaniam, Cornell University Library, 2013.

“Determining Total Cost of Ownership for Data Center and Network Room Infrastructure,” Schneider Electric, http://www.schneider-ele.com.