A Case Study of Workload Prioritization

Workload prioritization in information technology is the process of assigning importance to tasks or processes within an IT environment. This helps to ensure that the most important and time-sensitive tasks are completed first and that resources are allocated efficiently.

It also helps to optimize resource utilization, so that the most critical workloads get the attention they need without sacrificing other services or projects.

Manual Workload Prioritization
The process can be done manually, through a set of rules, or using automation techniques such as machine learning algorithms. Read on for a workload prioritization case study that illustrates the importance of the process.

*Our workload prioritization case study company, “Endor Moon Mining Company” (EMMC), is a composite of real customer stories and experiences. As you can imagine, no organization wants its mistakes publicized, so we’ve created a fictitious name to avoid any risk of an Ewok uprising.

At Endor Moon Mining Company* (EMMC), the Business Continuity and Disaster Recovery (BCDR) strategy involved grouping all recoverable servers into two types of backups. One group was a long-term storage solution, and the other was for a fast recovery time objective (RTO) with a tight recovery point objective (RPO). While it may sound prudent to be able to both archive and recover quickly, the company hadn’t factored in the costs related to this style of BCDR.

When it came time to recover, the IT team was quick to engage and spin up the relevant systems. What wasn’t factored in was the additional effort required to perform quality checks on the systems and mitigate any related issues. IT teams often think in terms of individual virtual machines, but today several systems are typically required to bring one application online. That may not sound daunting, but it was for EMMC: several apps relied on shared systems, and each of those apps required multiple teams to verify that the recovered app was fully operational.

When enterprises plan, they often think of their downtime in terms of revenue, but have they properly analyzed the human capital aspect and how it impacts the bottom line with regard to the proper sequencing of recovery?

Workload Prioritization – Recovery Sequencing

EMMC had several public-facing websites and apps that required shared resources. When a failure occurred, every server was restored and powered on simultaneously. As a result, start sequences, service delays, and wait times were out of step with one another. Individual teams would call for a restart of a shared resource, and that restart would bring down another app that was being checked by another team. As you can imagine, the overall downtime was significantly extended.

Ultimately, we helped EMMC to modify its BCDR strategy. First, all servers were recovered to a power-off state. Then, each app was rated within a tiering system based on its relationship to revenue. For example, active order systems were in a higher tier than an accounting system used to bill long-pay customers. Once the proper tiers were in place, boot sequences and timings were adjusted to account for attribute groupings such as directory services, backend, middleware, and front-end servers.
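The tier-then-layer ordering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not EMMC’s actual tooling: the server names, the tier numbers, and the LAYER_ORDER list are hypothetical, and the layer order simply follows the groupings named above (directory services, backend, middleware, front end).

```python
from dataclasses import dataclass

# Assumed layer order, following the groupings named in the text.
LAYER_ORDER = ["directory", "backend", "middleware", "frontend"]


@dataclass(frozen=True)
class Server:
    name: str   # hypothetical server name
    tier: int   # revenue tier; lower number boots first (assumption)
    layer: str  # one of LAYER_ORDER


def boot_sequence(servers):
    """Return servers in power-on order: by tier, then by layer within a tier."""
    return sorted(
        servers,
        key=lambda s: (s.tier, LAYER_ORDER.index(s.layer), s.name),
    )


# All servers start powered off; this list is illustrative only.
servers = [
    Server("orders-web", 1, "frontend"),
    Server("billing-db", 2, "backend"),
    Server("orders-db", 1, "backend"),
    Server("dc01", 0, "directory"),
    Server("orders-mq", 1, "middleware"),
]

for s in boot_sequence(servers):
    print(s.tier, s.layer, s.name)
```

A real runbook would also wait for each layer to pass a health check before powering on the next one. The sketch only captures the ordering.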

In subsequent testing, EMMC was able to recover all systems from a complete failure, and the recovery process was substantially more streamlined. We estimate that in a future incident, downtime would be reduced by up to 70% and revenue loss by 30%.

When it comes to managing IT systems and ensuring a timely recovery after a disruption, workload prioritization is key. Every organization has different needs and tolerances for downtime, and it’s essential to understand your business and its critical components to make informed decisions about recovery solutions.

There are generally two main objectives you’ll need to consider. It’s important to note that not all applications need to have the same values for them.

  1. Recovery Time Objective (RTO), or the maximum acceptable amount of time to recover servers, infrastructure, and applications.
  2. Recovery Point Objective (RPO), or the amount of data loss that can be tolerated, usually expressed as the maximum age of the most recent recoverable data.

These objectives help determine the priority level of different workloads and guide the selection of recovery solutions.
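As a rough illustration of how these two objectives can differ per application, the sketch below records an RTO and RPO for two hypothetical applications and checks a recovery test against them. The class name, function name, and all figures are assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objectives:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable age of the most recent recoverable data


def meets_objectives(obj, recovery_time, data_age):
    """True if a recovery test stayed within both the RTO and the RPO."""
    return recovery_time <= obj.rto and data_age <= obj.rpo


# Hypothetical applications with different objectives.
order_system = Objectives(rto=timedelta(hours=4), rpo=timedelta(hours=2))
training_portal = Objectives(rto=timedelta(days=7), rpo=timedelta(days=2))

# A test that restored service in 6 hours from a 1-hour-old backup.
test_time, test_age = timedelta(hours=6), timedelta(hours=1)
print(meets_objectives(order_system, test_time, test_age))     # False
print(meets_objectives(training_portal, test_time, test_age))  # True
```

The same recovery result fails one application and passes the other, which is why objectives are set per workload rather than once for the whole environment.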

Workload Prioritization Framework

A typical prioritization framework includes four tiers of priority:

Priority 0: These are the core services that are absolutely critical to the business and must be running nearly all the time. Examples include networking and core services like Active Directory and DNS.

Priority 1: These are mission-critical applications that keep the business running and should be recovered within four hours with an RPO of less than two hours. Examples include retail point-of-sale solutions, online banking portals, and healthcare applications.

Priority 2: These are important applications and services that should be recovered within 24-48 hours with an RPO of up to 24 hours. Examples include billing systems, HR systems, and other items critical to the business process.

Priority 3: These are applications that are important for the long-term functioning of the business but are not critical to minute-to-minute operations. They can be recovered within 4-14 days with an RPO of a couple of days to a week. Examples include training and learning management systems (LMS) or historical document retention.

It’s important to note that this is just a general framework and that different organizations may have different priorities and require different solutions. Ultimately, it is crucial to understand your business and its needs to prioritize workloads effectively and ensure timely recovery.
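One way to apply the framework is to encode each tier’s upper bounds and pick the least strict tier that still fits an application’s tolerances. The sketch below is illustrative only. The function name is hypothetical, the bounds are the upper limits quoted above, and treating Priority 0 as zero tolerable downtime is a modeling assumption, since the framework gives no numbers for that tier.

```python
from datetime import timedelta

# (priority, max RTO, max RPO), least strict first.
# Bounds are the upper limits from the framework above.
# Priority 0 has no stated numbers; zero is an assumption for "nearly always running".
TIERS = [
    (3, timedelta(days=14), timedelta(weeks=1)),
    (2, timedelta(hours=48), timedelta(hours=24)),
    (1, timedelta(hours=4), timedelta(hours=2)),
    (0, timedelta(0), timedelta(0)),
]


def assign_priority(tolerable_downtime, tolerable_data_loss):
    """Pick the least strict tier whose targets fit within both tolerances."""
    for priority, max_rto, max_rpo in TIERS:
        if max_rto <= tolerable_downtime and max_rpo <= tolerable_data_loss:
            return priority
    return 0  # fallback; not reached for non-negative tolerances


print(assign_priority(timedelta(days=30), timedelta(days=10)))    # 3
print(assign_priority(timedelta(hours=72), timedelta(hours=24)))  # 2
print(assign_priority(timedelta(hours=8), timedelta(hours=2)))    # 1
print(assign_priority(timedelta(hours=1), timedelta(minutes=5)))  # 0
```

Organizations with different thresholds would change only the TIERS table. The selection logic stays the same.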

With over twenty years of experience in the industry, Thomas May, CEO of Different Dev, is a respected member of the IT community and a Veeam Vanguard. He leads a professional team dedicated to providing the best in disaster recovery and business continuity services.

Thomas May

CEO, Different Dev
