What is Hardware Compatibility Testing?

Hardware compatibility testing verifies that a given application runs on a given set of hardware without any issues stemming from that hardware. In practice, this means the application does not crash, performance is smooth and reasonably consistent, and features that interact directly with the hardware (such as rendering 3D graphics) work without error.  In short, the application should work as well on any given device as on another, within reason.


What Does Compatibility Testing Look Like?

That “within reason” is an important caveat – there are decades’ worth of devices and hardware out there, and it’s not reasonable to assume that any application can work equally well on, say, a Pentium IV with 2GB RAM as it can on a 12th-generation Core i7 with 16GB.  Because of this, the first place we usually start when discussing a compatibility testing project is with the minimum requirements.

Most developers don’t know their minimum requirements offhand without doing some specific, focused testing, but there are always places to start.  Are there any hardware features the application requires, such as a particular DirectX/D3D or OpenGL/CL version?  If so, the beginnings of a graphics processor list can be constructed.  Is the intent to support systems that are or were common within the last five years?  CPU types, frequencies, average memory sizes, storage, etc. begin to take shape.  Is the application intended for general use, a specific niche audience, schools, etc.?  All these details help shape both the beginnings of minimum requirements and where to start compatibility testing; after all, there’s no point in casting too broad a net (although better too broad than too narrow) by testing configurations that are not widely used, just as there is no good reason to ignore the large numbers of mid-range and several-year-old systems still in use.  We’re always happy to offer our thoughts and suggestions, but we need a place to start, even if it is a general one. One helpful resource for finding a starting spot is the Steam Hardware Survey, which provides information about configurations and components in common use. See our past blog about the Steam Survey: https://www.betabreakers.com/blog/stats-sources-steam-hardware-survey/


Once we’re over the initial hurdle of where to begin, things get easier to plan.  Using the provided minimum requirements as a starting point, we build out a list of hardware configurations to test against.  This can range from listing out ‘as-is’ systems in our labs in the required level of detail (common when testing laptops and, generally, Mac systems) to building fully custom configurations by swapping base components around (most commonly, GPU and RAM).  As much as possible, we prefer to work with various ‘off-the-shelf’ systems that we have purchased over the years, using them as the basis for most of the testing; we’ve found this to be both reliable and very close to the ‘real world’, where systems are typically purchased and then either left in their base state, hardware-wise, or have a select few components upgraded. We also often make use of config boards – a bare motherboard, CPU, and hard drive to which any power supply, video card, and RAM quantity can be added. This provides an easily accessible baseline system that promotes efficient component swaps (usually video cards) – see https://www.betabreakers.com/blog/config-boards-vs-standalone-systems-for-compatibility-testing/. The resulting list of configurations becomes the Test Matrix, and establishing it is the foundational step in a compatibility testing project – the matrix lists, in detail, all of the configurations to be tested, including any required component swaps, as well as the metrics to be gathered from each system during testing.
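The matrix itself is essentially tabular data: one row per configuration, with columns for components, required swaps, and metric slots to be filled in during testing. As a rough illustration (our own sketch, not Betabreakers’ actual tooling – the field names and example components are hypothetical), a few rows built around one config board might look like this:

```python
from dataclasses import dataclass, field

@dataclass
class TestConfig:
    """One row of a hypothetical compatibility test matrix."""
    system: str              # e.g. an off-the-shelf desktop or a config board
    cpu: str
    gpu: str
    ram_gb: int
    swaps: list = field(default_factory=list)    # components swapped before testing
    metrics: dict = field(default_factory=dict)  # filled in as testing proceeds

# Build a small matrix by varying GPU and RAM on a single base config board
base = dict(system="Config board A", cpu="Core i5-12400")
matrix = [
    TestConfig(**base, gpu=gpu, ram_gb=ram, swaps=["GPU", "RAM"])
    for gpu in ("GTX 1660", "RTX 3060")
    for ram in (8, 16)
]
print(len(matrix))  # 4 configurations
```

In practice this lives in a spreadsheet rather than code, but the shape is the same: component swaps multiply one baseline into many test rows.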


Wait, metrics?  Part of compatibility testing (a really big part, actually) is determining how well an application runs on a given set of hardware, and this typically requires collecting some performance data so that everyone is looking at results the same way.  Most compatibility testing projects are interested (at least in part) in 3D video rendering – games, applications using game engines for non-game purposes, modeling software, and the like – so frame rate (measured in FPS, Frames Per Second) is probably the single most commonly gathered metric.  For this, we prefer to leverage a built-in frame counter if the application under test has one, but are prepared to use a variety of tools if one is not included, such as the Steam Performance Monitor, Metal HUD, or the ancient-but-reliable FRAPS.  Other commonly gathered metrics include application CPU, GPU, and RAM utilization; these are most often collected using OS-level tools such as Windows Task Manager.  Metrics of interest always differ by product, but taken together, they should paint a picture of how well an application utilizes the resources available to it on an underlying test configuration.
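Under the hood, those frame-counting tools all boil down to recording per-frame render times and deriving FPS from them. A minimal sketch of that arithmetic (the frame times below are made-up sample data, not output from any real capture tool):

```python
# Deriving an average-FPS metric from a list of captured frame times.
# frame_times_ms is illustrative sample data, not a real capture.
frame_times_ms = [16.7, 16.9, 33.4, 16.6, 16.8, 50.1, 16.7, 16.7]

total_seconds = sum(frame_times_ms) / 1000.0
avg_fps = len(frame_times_ms) / total_seconds  # frames rendered per second
print(round(avg_fps, 1))  # → 43.5
```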


Metrics, however, only provide a partial picture – a 3D game, for example, may average 30 FPS on a particular test configuration yet exhibit micro-stuttering and input lag severe enough that the play experience is poor even though the measured performance looks adequate.  Here, subjective appraisals are a valuable addition to metrics, offering an experienced viewpoint on what using the application actually feels like.  We usually provide subjective feedback in the form of descriptors (Excellent, Good, Fair, Poor) as well as written notes that are included as part of the data collected in the test matrix (we also report bugs, naturally, but bugs tend to be relatively clear-cut compared to appraisals of how well an application runs).


It might be helpful to demonstrate what an in-progress test matrix looks like, with the types of information conveyed, the metrics being gathered, and the testing notes that typically accompany them.  This Google Sheet will serve as our example; we actually use Google Sheets intensively for just this purpose, and this matrix is based on an actual testing project.


This project involved a 3D application with a first-person perspective; graphics performance in different, representative 3D environments was the main concern.  Ultimately, most configurations returned quite usable performance and the bug count was rather low, although testing identified several systems that were not supportable, as well as a general performance bottleneck in the application, namely the City Map.


  • The configurations with no results listed reflect what the matrix looks like after it has been composed but before testing has commenced – important configuration details are listed out, along with the structure and outline of the metrics to be captured.


  • The configurations with results and data reflect what the matrix looks like both while testing is in progress and at its conclusion. Notes and data are entered in real time as they are captured, and issue numbers are linked to bugs in our bugbase as they are entered (the links are omitted here, since they would normally point to real software issues protected by NDA).  This is a little cleaner than an actual in-progress matrix would look, since it’s based on a finalized effort, but that is mostly a change in clarity, not detail (we edit out redundancies and unimportant notes as we go along).


  • Gray-shaded cells denote where hardware swaps are needed; these are handled by our IT technicians before or during testing. As much as we can, we prepare our test configurations ahead of time so that testers don’t need to wait on hardware changes and can work smoothly through the available systems.  As the example shows, RAM and GPU swaps are quite common, and pretty much any non-soldered component can be switched if necessary, although we prefer to limit CPU swaps where possible, as CPUs are a little more sensitive to handling than SSDs and video cards.


  • Color coding (red is bad, yellow is problematic) is used to call attention to areas of concern. This particular matrix is less colorful than most, but the major takeaway is that the use of color helps bring some focus to areas of (possible) trouble.


  • The contents of the Notes / Issues column (K) reflect the types of subjective measurements and supporting commentary that we typically include alongside the metrics being captured.


  • Not listed directly on the matrix, but important to note: all operating systems under test are updated to their latest available versions, with any pending patches applied, so as to minimize potential disruptions during testing. Video drivers (or other important component drivers, if being tested) are updated to their latest released versions prior to testing.  We used to test older drivers as a base state, and while realistic, this ultimately produced few useful results – the next step was always to update to the latest drivers and retest, and very few developers have any interest in supporting their applications on non-current hardware drivers.


Over the life of a compatibility testing project, virtually all information of note resides in either the test matrix or the bugbase; taken together, they (alongside a summary report of findings) represent the major deliverables for this type of testing.

Who Benefits From Compatibility Testing?

Everyone, of course!  While there is a degree of truth in that statement, compatibility testing is usually of most interest to developers who deal directly with hardware capabilities (like 3D rendering) or who expect to maintain a specific level of performance under load (like a modeling application with a massive number of triangles loaded, or a video game).  A more general version of compatibility testing is also sometimes of interest when an application will be loaded onto a lot of very diverse hardware and is critical enough that it must be expected to work on everything (some medical or educational applications, for instance); in these cases, performance metrics and hardware swaps often take a back seat to ‘does it work/not crash?’, and more emphasis is placed on out-of-the-box systems, often with very modest CPUs and amounts of RAM.