Scalability and Elasticity for Virtual Application Patterns in IBM PureApplication System

José De Jesús
10 min read · Apr 4, 2021

By Andre Tost and José De Jesús

This article describes the various options that exist in IBM PureApplication System and IBM Workload Deployer for the creation of scalable middleware deployments using virtual application patterns. The article addresses the needs of pattern users as well as developers interested in providing scalable patterns.

IBM PureApplication System is an integrated system that combines hardware resources (compute nodes, storage, and networking) with software resources, using virtualization to maximize the utilization of the system. Workloads (for example, transactional applications, web applications, or data-centric applications) are supported via the use of patterns, which offer abstract views of the software components of a solution; for example, web applications or databases. This article focuses on these patterns, and specifically on how virtual application patterns can be created and configured to be scalable and elastic.

Introduction

The term scalability is often very broadly used. A somewhat similar term people use, especially in the context of cloud computing, is elasticity. While we can argue that these terms mean different things, they both address the ability of an environment to serve changing volumes of workloads by adjusting the capacity of the system. How a workload is quantified depends on how it is processed by the system.

For example, if the workload is a video streaming service, then volume is typically measured by the number of users watching a video at any given time. In this case, increasing the network bandwidth between the streaming servers and the user devices would address increases in the workload. Similarly, increasing CPU power would handle an increase in an insurance company’s underwriting workload, which is measured by the size and complexity of the policy being processed. Be aware that elasticity and scalability not only require that a system react to an increase in workload, but also that it react equally to a decrease in workload (something that is often overlooked, and sometimes a harder problem to solve).

A common way of differentiating between scalability and elasticity is to look at the time intervals in which changes happen and whether these changes can be predicted and thus dealt with before they actually happen. Scalability is often used to describe the mechanisms and procedures applied to a system with predictable, almost linear changes in workload volume over a period of time. That is, proper capacity planning can happen in advance and adjustments can be made manually and somewhat statically. Elasticity, on the other hand, is used in cases where changes in workload can happen rapidly, with dramatic increases or decreases over a very short period of time, with little to no advanced warning. This type of change requires a much more flexible and automated system that can react to changes without manual intervention.

In the context of cloud computing, it is typically assumed that a cloud service must be elastic. We expect that the service can react to order-of-magnitude changes in workload within a very short time. However, in reality, not all cloud services meet that requirement. What’s important is that any system makes the best and most efficient use of existing resources based on its workload characteristics.

This article describes the mechanisms that exist in IBM PureApplication System to adjust resources according to a specific workload running on the system. This covers both scalability and elasticity — and often the lines between the two are blurred.

Horizontal and vertical scaling

There are two main mechanisms to scale a computer system:

  • Horizontal scaling means adding nodes to a system, such as virtual machines, or removing them as necessary. This is also called scaling out and scaling in, respectively.
  • Vertical scaling refers to adding resources to an existing node, for example, adding memory or CPUs to a VM, or reducing that capacity accordingly. The corresponding terms are scaling up when adding resources and scaling down when removing them.

Other variants of these scaling types exist. For example, a specific form of vertical scaling might increase the number of running JVMs, or the number of other resources, within a virtual machine. However, below we will assume that vertical scaling is limited to the increase or decrease of memory and CPU allocation per VM.

Virtual application patterns

As mentioned above, patterns are the main means used to describe desired workloads for IBM PureApplication System. Two types of patterns exist, virtual system patterns and virtual application patterns, and both have different levels of support for scaling and elasticity. Our focus here is on virtual application patterns, which offer a high level of abstraction to describe a solution, with focus typically on an application and its characteristics, instead of on the underlying topology and detailed VM infrastructure.

Auto-scaling framework

IBM PureApplication System offers extensive support for scaling virtual application instances, by injecting components into each deployed virtual machine that work together to achieve the desired behavior. The following figure shows an overview of these components:

IBM PureApplication System auto-scaling framework

The autoscaling agent relies on the monitoring agent to collect metric data at the operating system level. As part of the deployment process of a virtual application pattern, IBM PureApplication System automatically installs scripts and agents into each of the virtual machines. One of these virtual machines acts as the "leader": it hosts and runs the autoscaling agent, which makes the decisions about scaling actions, for example, whether additional VMs are needed or whether existing ones can be shut down, based on the data gathered via the monitoring agent. Scaling decisions are then forwarded to the PureApplication System IaaS layer for execution.

The monitoring agent, which also runs on the leader VM, collects data from all VMs involved in the virtual application instance. It reads all available monitoring data via BBSON, a bulletin board service built on the Service Overlay Network (SON).

Each participating VM contains a collector agent, which acts as the collection point for monitoring data collected within a VM and sends it to the BBSON.

Plug-in collectors monitor the VM itself, as well as any middleware that contributes data to the scaling behavior. Built-in collectors, provided out-of-the-box, collect basic information about CPU and memory consumption, as well as disk usage.

You can also develop and register your own plug-in collectors to monitor any data you consider important for the scaling criteria you want to offer. Collectors can provide information via HTTP (through HTTP collector types) or via script execution (through Script collector types), following a specific format that the monitoring agent parses and transfers to the autoscaling agent as metrics. In addition, you can apply metadata to define how these metrics appear in the virtual application console deployment panel. (See the IBM PureApplication System Information Center for detailed information on how to build your own plug-in collector.)
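As a rough illustration, a script-type collector can be as simple as a script that gathers one value and prints it in a structured form. The metric name (MyApp.QueueDepth), the status file it reads, and the JSON output shape below are assumptions made for this sketch, not the documented contract; the Information Center describes the exact format the monitoring agent expects.

```python
# Illustrative sketch of a script-type collector. The metric name, file path,
# and output format are assumptions for the example, not the documented contract.
import json

def collect_queue_depth():
    """Return a hypothetical application metric gathered inside the VM.
    Here we simply read an integer from a status file; a real collector
    would query the middleware it monitors."""
    status_file = "/tmp/myapp_queue_depth"  # illustrative path
    try:
        with open(status_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return 0

if __name__ == "__main__":
    # Emit the metric under a category/name that a scaling trigger could
    # reference, e.g. "MyApp.QueueDepth" (purely illustrative).
    print(json.dumps({"MyApp": {"QueueDepth": collect_queue_depth()}}))
```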

Scaling criteria

Whether new VMs are added to or removed from a virtual application (horizontal scaling), or whether memory or CPUs are added to or removed from an existing VM (vertical scaling), is based on defined trigger events. Each trigger event consists of the actual metric (for example, the measured utilization of CPU in a VM) and the thresholds for scaling in and out, respectively. You can also define how long the threshold must be exceeded before action is taken. Finally, you can define the minimum and maximum number of instances (VMs) that can exist within the deployed virtual application instance. The following figure shows the scaling criteria for horizontal scaling:

Horizontal Scaling Criteria

The structure of the attributes for vertical scaling is similar:

Vertical Scaling Criteria

As mentioned above, the autoscaling agent captures this information, compares it to the monitoring data that it receives, and makes decisions accordingly.

This data — that is, the policy attributes shown in the diagrams above — is defined in the topology.json document that is used by the IaaS layer to deploy the topology. This document is created through a series of transformations that exist within a plugin. The topology.json file is often based on a template that contains the basic layout of the VMs that are to be deployed, and this template then gets resolved and completed by transformers for components, links, and policies, and their associated plugins.

Here is an example for specifying horizontal scaling criteria within a VM template:

Sample VM Template with Scaling Definitions
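To give a rough idea of the shape of such a definition, here is an illustrative sketch of trigger criteria inside a VM template, written as a Python dictionary that mirrors the JSON structure of topology.json. The attribute names and nesting are assumptions derived from the fields described earlier (metric, scale-in and scale-out thresholds, trigger duration, and instance limits), not the authoritative schema.

```python
# Illustrative fragment (a Python dict mirroring the JSON structure); the
# attribute names are assumptions based on the fields described above, not
# the authoritative topology.json schema.
vm_template_scaling = {
    "name": "application-server",       # VM template name (illustrative)
    "scaling": {
        "min": 1,                       # minimum number of instances
        "max": 5,                       # maximum number of instances
        "triggerEvents": [
            {
                "metric": "CPU.Used",   # built-in metric from the default collector
                "scaleOutThreshold": 80,  # scale out when average CPU exceeds 80%
                "scaleInThreshold": 30,   # scale in when average CPU drops below 30%
                "triggerTime": 120        # threshold must hold for 120 seconds
            }
        ]
    }
}
```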

In this example, a trigger based on CPU usage is defined. The associated metric, named CPU.Used, is one of the metrics provided by the default collector plug-in, so no extra coding is required to support it.

Once a VM based on the template shown above is deployed, the autoscaling agent will automatically use the defined trigger criteria for making scaling decisions.

Scaling policy

The intended non-functional characteristics of a virtual application pattern are typically represented by a policy. Such a policy offers configurable parameters that are then transformed into the appropriate configuration in the resulting topology. This is no different for scaling. Ideally, a pattern type supports a scaling policy that enables users to define the criteria and type of scaling they require from the underlying topology.

For example, the WebApp pattern that ships with IBM PureApplication System contains such a policy. It offers settings that let you control the scaling behavior of the IBM WebSphere Application Server topology underneath the pattern:

Scaling Policy in the WebApp Pattern

The Response Time Based scaling type, for example, uses the response time of the web application as the criterion for scaling. The platform is scaled in or out when the application’s response time falls outside the specified threshold range. This type of metric is not part of the default collector plugin, which only monitors OS-level data. The scaling policy in the WebApp pattern therefore leverages additional collector plugins that monitor WebSphere Application Server-specific data, which is collected by the monitoring agent and provided to the autoscaling agent.

Be aware that this scaling policy is embedded into the WebApp pattern and therefore not easily reusable for your own pattern type. The only way today to leverage the same collector plugins, and so on, is to copy the source code into your own policy plugin implementation.

In any case, the implementation in the scaling policy plugin has to ensure the proper values are filled in the resulting topology.json file, as shown above. To make such a policy widely reusable, it should be a “type 2” policy. In other words, it should be implemented in its own plugin and its own pattern type, which can then be linked with other pattern types that want to leverage it.

Manually scaling a virtual application pattern instance

There are times when you might need to manually scale the platform after deployment, whether to troubleshoot a plugin or its scaling policy, or simply because you might have an administrative need to manually add or remove instances of virtual applications or shared services.

For example, if you deploy an application based on the WebApp pattern, from the IBM PureApplication System web interface, you can:

  1. Select the virtual application instance.
  2. Click Manage to open the virtual application console.
  3. Click Operations.
  4. Select an application to scale. A twisty on the right panel labeled Manual Scaling contains options for manually scaling the application in and out, as shown below:
Manually Scaling In and Out

In this case, once you submit a scaling request, you can see the status of your request in the operation execution results window:

Operation Execution Results

You will also see the instance count of your virtual application increase or decrease accordingly.

The capability to manually scale a virtual application, though, is not provided automatically. For manual scaling to work, the associated plugin must specifically expose manual scaling operations. You specify the scaling capability of a plugin as scaling policy metadata in the application model. The resulting topology document uses the same attributes for both manual and auto-scaling.

The autoscaling agent exposes a group of scaling APIs for plugins to use in their operation scripts. For example, plugins use the maestro.autoscalingAgent.scale_in and maestro.autoscalingAgent.scale_out functions to trigger scale-in and scale-out actions for a specified role and VM template. These APIs ensure that manual scaling does not interfere with auto scaling, because they allow the autoscaling agent to temporarily suspend auto scaling while a manual scaling operation is in progress.
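As a sketch, a plugin operation script that exposes a manual scale-out could look roughly like the following. The role name and VM template name are placeholders, and the exact signature of the autoscaling agent API is assumed from the description above; the plugin development documentation has the authoritative interface.

```python
# Illustrative operation script for a manual scale-out operation.
# The role and template names are placeholders; the API signature is assumed
# from the description above rather than taken from the reference documentation.
import maestro

ROLE = 'WebApplication'             # role whose instance count should grow (placeholder)
VM_TEMPLATE = 'application-server'  # VM template that backs the role (placeholder)

# Ask the autoscaling agent to add an instance. While this manual operation is
# in progress, the agent temporarily suspends automatic scaling decisions.
maestro.autoscalingAgent.scale_out(ROLE, VM_TEMPLATE)
```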

Note that autoscaling actions — adding or destroying instances — are based on metrics gathered for all instances of a specific role, not just on one particular instance. The system counts the number of role instances that exist and calculates their average metric to determine whether a scaling operation is necessary. For example, if three instances of a particular role are running, and the CPU threshold is 80%, the average CPU utilization (CPU.Used) of all three instances must be greater than 80% for a scale-out operation to occur. In other words, the following condition must be met:

(RoleAInst1.CPU.Used + RoleAInst2.CPU.Used + RoleAInst3.CPU.Used)/3 > 80%
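Expressed as a small piece of illustrative logic (not the agent’s actual implementation), the check amounts to the following:

```python
# Illustration of the scale-out check described above, not the agent's real code.
cpu_used_per_instance = [85.0, 78.0, 82.0]  # CPU.Used reported by each role instance
scale_out_threshold = 80.0                  # threshold defined in the scaling criteria

average = sum(cpu_used_per_instance) / len(cpu_used_per_instance)
scale_out_needed = average > scale_out_threshold  # (85 + 78 + 82) / 3 ≈ 81.7 > 80, so True
```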

Finally, not all plugins expose both auto and manual scaling operations. A plugin must explicitly specify and provide support for the kind of scaling operation it wants to offer. Depending on the features and user experience they provide, certain plugins might not have support for manual scaling, while others might enable only manual scaling, and not provide support for auto scaling.

Summary

The ability to elastically scale workloads is a key concept of any cloud computing platform. This article described how IBM PureApplication System supports scalability and elasticity of workloads running in the system as virtual applications. A virtual application pattern can offer automatic scaling via policies. It can also take advantage of built-in monitoring data to make decisions about when and how to scale, or it can define its own custom criteria. Scaling can be done horizontally or vertically, or both, and IBM PureApplication System supports automatic as well as manual scaling.


José De Jesús

José is a Thought Leader Executive Architect with IBM and the CTO of Automation for IBM Expert Labs.