Scalability and Elasticity for Virtual Application Patterns in IBM PureApplication System

By Andre Tost and José De Jesús

This article describes the various options that exist in IBM PureApplication System and IBM Workload Deployer for the creation of scalable middleware deployments using virtual application patterns. The article addresses the needs of pattern users as well as developers interested in providing scalable patterns.

IBM PureApplication System is an integrated system that combines hardware resources (compute nodes, storage, and networking) with software resources, using virtualization to maximize the utilization of the system. Workloads (for example, transactional applications, web applications, or data-centric applications) are supported via the use of patterns, which offer abstract views of the software components of a solution; for example, web applications, or databases. This document will focus on these patterns, and specifically, how virtual application patterns can be created and configured to be scalable and elastic.

Introduction

How the volume of a workload is measured, and which resources must grow to absorb an increase in it, depends on the nature of the application. For example, if the workload is a video streaming service, then volume is typically measured by the number of users watching a video at any given time. In this case, increasing the network bandwidth between the streaming servers and the user devices would address increases in the workload. Similarly, increasing CPU power would handle an increase in an insurance company’s underwriting workload, which is measured by the size and complexity of the policy being processed. Be aware that elasticity and scalability not only require that a system react to an increase in workload, but also that it equally reacts to a decrease in workload (something that is often overlooked, and sometimes a harder problem to solve).

A common way of differentiating between scalability and elasticity is to look at the time intervals in which changes happen and whether these changes can be predicted and thus dealt with before they actually happen. Scalability is often used to describe the mechanisms and procedures applied to a system with predictable, almost linear changes in workload volume over a period of time. That is, proper capacity planning can happen in advance and adjustments can be made manually and somewhat statically. Elasticity, on the other hand, is used in cases where changes in workload can happen rapidly, with dramatic increases or decreases over a very short period of time, with little to no advanced warning. This type of change requires a much more flexible and automated system that can react to changes without manual intervention.

In the context of cloud computing, it is typically assumed that a cloud service must be elastic. We expect that the service can react to changes in workload in orders of magnitude, within a very short time. However, in reality, not all cloud services meet that requirement. What’s important is that any system makes the best and most efficient use of existing resources based on its workload characteristics.

This article describes the mechanisms that exist in IBM PureApplication System to adjust resources according to a specific workload running on the system. This covers both scalability and elasticity — and often the lines between the two are blurred.

Horizontal and vertical scaling

  • Horizontal scaling means adding additional nodes to a system, such as virtual machines, or removing them as necessary. This is also called scaling out and scaling in, respectively.
  • Vertical scaling refers to adding additional resources to an existing node, for example, adding additional memory or CPUs to a VM, or reducing that capacity accordingly. The terms also used here are scaling up when adding resources, and scaling down when removing them.

Other variants of these scaling types exist. For example, a specific form of vertical scaling might increase the number of running JVMs, or the number of other resources, within a virtual machine. However, below we will assume that vertical scaling is limited to the increase or decrease of memory and CPU allocation per VM.

Virtual application patterns

Auto-scaling framework

IBM PureApplication System auto-scaling framework

As part of the deployment process of a virtual application pattern, IBM PureApplication System automatically installs a set of scripts and agents into each of the virtual machines. One of the included virtual machines acts as the “leader.” That is, it hosts and runs the autoscaling agent. This agent makes the decisions about scaling actions — for example, whether additional VMs are needed or whether any existing ones can be shut down — based on data that is gathered via the monitoring agent, on which it depends for metric data collected at the operating system level. Scaling decisions are then forwarded to the PureApplication System IaaS layer for execution.

The monitoring agent, which also runs on the leader VM, collects data from all VMs involved in the virtual application instance. It reads all available monitoring data via a SON (service over network)-based bulletin board service (BBSON).

Each participating VM contains a collector agent, which acts as the collection point for monitoring data collected within a VM and sends it to the BBSON.

Plug-in collectors monitor the VM itself, as well as any middleware that contributes data to the scaling behavior. Built-in collectors, provided out-of-the-box, collect basic information about CPU and memory consumption, as well as disk usage.

You can also develop and register your own plug-in collectors to monitor any data you consider important for a scaling criterion you want to offer. Collectors can provide information via HTTP (through HTTP collector types) or via script execution (through Script collector types), following a specific format that the monitoring agent parses and transfers to the autoscaling agent as metrics. In addition, you can apply metadata to define how these metrics appear in the virtual application console deployment panel. (See the IBM PureApplication System Information Center for detailed information on how to build your own plug-in collector.)
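As a rough illustration of the Script collector idea, the sketch below gathers a hypothetical middleware metric and emits it as JSON for the collector agent to pick up. The metric name (QueueDepth), the nesting of the output, and the use of stdout are assumptions made for illustration only; consult the Information Center for the exact format a collector must produce.

```python
import json

def collect_metrics():
    """Gather a hypothetical middleware metric.

    A real collector would query the middleware at this point;
    the metric name and dictionary shape here are illustrative.
    """
    return {"MYAPP": {"QueueDepth": 42}}

if __name__ == "__main__":
    # A Script collector emits its metrics so that the collector
    # agent can forward them to the bulletin board service.
    print(json.dumps(collect_metrics()))
```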

Scaling criteria

Horizontal Scaling Criteria

The structure of the attributes for vertical scaling is similar:

Vertical Scaling Criteria

As mentioned above, the autoscaling agent captures this information, compares it to the monitoring data that it receives, and makes decisions accordingly.

This data — that is, the policy attributes shown in the diagrams above — is defined in the topology.json document that is used by the IaaS layer to deploy the topology. This document is created through a series of transformations that exist within a plugin. The topology.json file is often based on a template that contains the basic layout of the VMs that are to be deployed, and this template then gets resolved and completed by transformers for components, links, and policies, and their associated plugins.

Here is an example for specifying horizontal scaling criteria within a VM template:

Sample VM Template with Scaling Definitions
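A minimal sketch of what such scaling attributes might look like in a VM template is shown below. The attribute names and structure are illustrative only; check the Information Center for the exact schema that the IaaS layer expects.

```json
"scaling": {
   "min": 1,
   "max": 10,
   "triggerEvents": [
      {
         "conditions": [
            {
               "metric": "CPU.Used",
               "type": "Constant",
               "value": 80,
               "relation": ">="
            }
         ],
         "result": "grow"
      }
   ],
   "triggerTime": 120
}
```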

In this example, CPU usage is defined. The associated metric, named CPU.Used, is one of the metrics provided by the default collector plug-in, so no extra coding is required to support it.

Once a VM based on the template shown above is deployed, the autoscaling agent will automatically use the defined trigger criteria for making scaling decisions.

Scaling policy

For example, the WebApp pattern that ships with IBM PureApplication System contains such a policy. It offers settings that let you control the scaling behavior of the IBM WebSphere Application Server topology underneath the pattern:

Scaling Policy in the WebApp Pattern

The Response Time Based scaling type, for example, uses the response time of the web application as the criterion for scaling. The platform is scaled in or out when the application’s response time falls outside the specified threshold range. This type of metric is not part of the default collector plugin, which only monitors OS-level data. This means the scaling policy in the WebApp pattern leverages additional collector plugins that monitor WebSphere Application Server specific data, which is collected by the monitoring agent and provided to the autoscaling agent.

Be aware that this scaling policy is embedded into the WebApp pattern and therefore not easily reusable for your own pattern type. The only way today to leverage the same collector plugins, and so on, is to copy the source code into your own policy plugin implementation.

In any case, the implementation in the scaling policy plugin has to ensure that the proper values are filled into the resulting topology.json file, as shown above. To make such a policy widely reusable, it should be a “type 2” policy. In other words, it should be implemented in its own plugin and its own pattern type, which can then be linked with other pattern types that want to leverage it.

Manually scaling a virtual application pattern instance

For example, if you deploy an application based on the WebApp pattern, from the IBM PureApplication System web interface, you can:

  1. Select the virtual application instance.
  2. Click Manage to open the virtual application console.
  3. Click Operations.
  4. Select an application to scale. A twisty on the right panel labeled Manual Scaling contains options for manually scaling the application in and out, as shown below:
Manually Scaling In and Out

In this case, once you submit a scaling request, you can see the status of your request in the operation execution results window:

Operation Execution Results

You will also see the instance count of your virtual application increase or decrease accordingly.

The capability to manually scale a virtual application, though, is not provided automatically. For manual scaling to work, the associated plugin must specifically expose manual scaling operations. You specify the scaling capability of a plugin as scaling policy metadata in the application model. The resulting topology document uses the same attributes for both manual and auto-scaling.

The autoscaling agent exposes a group of scaling APIs for plugins to use in their operation scripts. For example, plugins use the maestro.autoscalingAgent.scale_in and maestro.autoscalingAgent.scale_out functions to trigger scale-in and scale-out actions for a specified role and VM template. These APIs ensure that manual scaling does not interfere with auto scaling, because they permit the autoscaling agent to temporarily suspend auto scaling while a manual scaling operation is still in progress.
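The shape of an operation script that triggers such a scaling action can be sketched as follows. The maestro module is provided by the platform runtime and is not available outside a deployed plugin, so this sketch falls back to a stub purely to stay self-contained; the function signatures (role name plus VM template) are assumed from the description above, not taken from the documented API.

```python
# Hedged sketch of a manual-scaling operation script.
try:
    import maestro  # provided by the PureApplication runtime inside a VM
    agent = maestro.autoscalingAgent
except ImportError:
    # Stub so the sketch can run outside the platform; a real
    # operation script would never need this fallback.
    class _StubAgent:
        def scale_out(self, role, vm_template):
            return "scale_out(%s, %s)" % (role, vm_template)
        def scale_in(self, role, vm_template):
            return "scale_in(%s, %s)" % (role, vm_template)
    agent = _StubAgent()

def manual_scale(direction, role, vm_template):
    """Trigger a manual scale-in or scale-out for the given role."""
    if direction == "out":
        return agent.scale_out(role, vm_template)
    return agent.scale_in(role, vm_template)
```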

Note that autoscaling actions — adding or destroying instances — are based on metrics gathered for all instances of a specific role, not just on one particular instance. The system counts the number of role instances that exist and calculates their average metric to determine whether a scaling operation is necessary. For example, if three instances of a particular role are running, and the CPU threshold is 80%, the average CPU utilization (CPU.Used) of all three instances must be greater than 80% for a scale-out operation to occur. In other words, the following condition must be met:

(RoleAInst1.CPU.Used + RoleAInst2.CPU.Used + RoleAInst3.CPU.Used)/3 > 80%
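This averaging condition can be expressed as a small piece of illustrative logic (not the autoscaling agent's actual implementation):

```python
def should_scale_out(cpu_used_values, threshold=80.0):
    """Return True when the average CPU.Used across all role
    instances exceeds the threshold (illustrative logic only)."""
    return sum(cpu_used_values) / len(cpu_used_values) > threshold

# Three instances: average is (90 + 85 + 70) / 3, about 81.7 -> scale out
print(should_scale_out([90, 85, 70]))  # True
# Average is (90 + 60 + 70) / 3, about 73.3 -> no action
print(should_scale_out([90, 60, 70]))  # False
```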

Finally, not all plugins expose both auto and manual scaling operations. A plugin must explicitly specify and provide support for the kind of scaling operation it wants to offer. Depending on the features and user experience they provide, certain plugins might not have support for manual scaling, while others might enable only manual scaling, and not provide support for auto scaling.

Summary

This article reviewed the options IBM PureApplication System offers for making virtual application pattern deployments scalable and elastic: the auto-scaling framework, with its autoscaling, monitoring, and collector agents; scaling criteria and policies expressed in the topology.json document; and manual scaling operations that plugins can expose through the autoscaling agent’s APIs.

José is a Thought Leader Executive Architect with IBM, specializing in Cloud Architectures and App Modernization.