Straight from the lab: why a benchmark isn’t the answer to everything
Benchmarks promise a lot. Above all, they’re supposed to be a reliable, objective and neutral indication of smartphone performance. This leads some people to the conclusion that they’re better than any other test. As a professional phone tester, I can confirm that’s not the case.
It doesn’t often happen that I have two phones of the same model on my desk. But I’ve struck lucky with the LG V30. Not only do I have an LG V30+, a Korean export, but also an EU version of the LG V30. There are only two things that set these devices apart:
- The LG V30+ features 128 GB internal memory, while the LG V30 only has 64 GB
- The LG V30+ boasts a hybrid dual SIM slot and the LG V30 doesn’t
The rest of the specs are identical. So when I run a benchmark app on both, the scores should be the same.
Let the testing begin: methodology for the benchmarks
I’m using the following devices for my benchmark test:
- LG V30 (EU version)
- LG V30+ (Korea export)
- HTC One M7
- Razer Phone
- Samsung Galaxy Note 8
The app I’m using for the benchmark is Antutu Benchmark with the 3D add-on. There are numerous benchmarks in the Google Play Store, but Antutu had consistently good reviews. We came across it more or less by chance while discussing options with the mobile geeks in the company.
This is where we stumbled across the first problem of benchmark testing. There isn’t just the one benchmark, because anyone can develop and publish their own benchmark app. If benchmarks are to be universal, they’ll have to establish some sort of standard.
But at the moment, this standard doesn’t exist. That’s why the result from any benchmark app can be disputed, for the very good reason that another app can spit out a different figure that carries just as much weight in the benchmark world as the Antutu score.
The result: V30+ wins
I ran ten rounds of Antutu benchmarks. The mobile geeks weren’t in agreement. Each of them thought they knew how to make a benchmark better and therefore more meaningful. Between runs, you’re supposed to leave the phone in the fridge for half an hour so it can cool down. The phone is also meant to be in flight mode so that data transfers don’t interfere with the measurement.
A benchmark that can be influenced by so many environmental factors and that delivers inconsistent data will obviously be doubted. That’s why I decided to carry out the test like this: I’d take both phones and run the benchmark ten times in a row on each. Without taking a break, without putting them in the fridge and without waiting for the right lunar phase.
[Table: the ten individual Antutu scores of the LG V30 and LG V30+]
A quick analysis:
- On average, the LG V30 scored 158 252.60 points
- On average, the LG V30+ scored 161 325.40 points
- The LG V30 scored the highest single value with 173 738.00 points
- The LG V30 scored the lowest single value with 142 798.00 points
This shows that, on average, the LG V30+ won the round. The difference was 3072.80 points, or about 1.9%.
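If you want to double-check the arithmetic, a few lines of Python will do it. This is just a sketch plugging in the two averages reported above; the variable names are my own:

```python
# Average Antutu scores from the ten runs (points)
v30_avg = 158_252.60
v30_plus_avg = 161_325.40

# Absolute gap, and the gap relative to the slower phone
diff_points = v30_plus_avg - v30_avg
diff_pct = diff_points / v30_avg * 100

print(f"{diff_points:.2f} points")  # 3072.80 points
print(f"{diff_pct:.1f} %")          # 1.9 %
```

The same calculation works for any pair of benchmark averages; only the headline percentage changes.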
But something occurred to me as I was carrying out the benchmark tests. Going back to the fridge idea, the purpose is to cool down the phone. The theory is that a cooled phone delivers better and more reliable results. And yet, my test contradicts that. At least anecdotally. To be absolutely sure, I’d need to carry out a lot more tests, and even then I’d have no solid basis for declaring any particular number of them representative. Both phones gave their highest value in the ninth round of testing and their lowest in the eighth.
What benchmarks tell you
Benchmarks do have a certain significance. Here’s what I found when I compared two completely different phones: an old HTC One M7 from 2013 and a brand new Razer Phone.
[Table: Antutu scores of the HTC One M7 and Razer Phone]
A quick analysis:
- On average, the Razer phone scored 176 931.50 points
- On average, the HTC One M7 scored 40 511.50 points
- The Razer Phone scored the highest single value with 181 227 points
- The HTC One M7 scored the lowest single value with 39 611 points
And what does that all mean? The new phone is better than the old one. Who’d have thought it? The difference of 77.10% is completely meaningless.
Right, time for another set of phones to battle it out. Razer Phone versus Samsung Galaxy Note 8.
[Table: Antutu scores of the Razer Phone and Samsung Galaxy Note 8]
The Note 8 gets defeated. But then again, that’s not really surprising when you check the specs. At best, the benchmark test seems to be a game that confirms your theory; at worst, it’s a waste of time.
What benchmarks don’t tell you
At digitec, when we test phones, we go above and beyond the realms of the benchmark tests. At the end of it, you know what they’re like to use day in, day out – you don’t just get a jumble of numbers from an app. Because let’s face it, you’re going to be using your phone on a daily basis, and even the best benchmark score will hide umpteen factors.
It won’t tell you anything about the bit of dirt behind the glass on my LG V30+, which you wouldn’t notice if you just picked it up once for a benchmark test. The camera speed on the Razer phone wouldn’t have been called into question and the durability of the HTC One M7 wouldn’t have been brought to light.
To uncover these points, to assess them, qualify and quantify them, you need human eyes and hands. At the end of the day, it’s you and not an app who’s going to be holding the phone in your hands and using it to call people, take pictures and send WhatsApps to friends and family. Arbitrary values can be as high as they want, but they’ll never communicate all of this.
And on that note, I’ll carry on testing… just without benchmarks most of the time.