Win32 port of Google perftools

Introduction

    Back in 2005, Google released an open-source project named perftools. It claims to include the fastest allocator ever seen, and one that works well with threads.

    Curious as ever, I decided to port the allocator to Windows and check how it compared to the standard run-time library allocator.

    In this short article, I'll present the results of a few performance comparisons and show that the claim that this allocator works well with threads is indeed true.

   Furthermore, I'll give a few pointers for those who want to experiment with the port.

Tests

   The performance tests were run under Windows XP SP2 on a PC with 2 GB of RAM and a 3 GHz Intel P4D with hyper-threading enabled.

   Two performance tests were run:

  1. Test 1: Each thread runs a number of same-sized malloc calls, each followed by a free. This stresses the allocator for multi-thread contention. (source excerpt)
  2. Test 2: Each thread runs a number of same-sized malloc calls, each followed by a free, with the allocated block filled and its content checked before freeing. This also stresses the allocator for multi-thread contention, but adds some computational work between allocations to reduce the stress. (source excerpt)
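
   As an illustration of what the two tests do, here is a minimal sketch. It is not the source used for the measurements: the names (BenchConfig, worker, run_benchmark), the block size and the iteration count are all assumptions, and it uses std::thread, so it needs a C++11 compiler rather than the Visual Studio versions discussed in this article.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical parameters: the article does not state the block size or
// the number of iterations used in the real tests.
struct BenchConfig {
    std::size_t block_size = 64;      // bytes per allocation (assumed)
    std::size_t iterations = 100000;  // malloc/free pairs per thread (assumed)
    bool fill_and_check = false;      // false: Test 1 style, true: Test 2 style
};

// Work done by one thread: same-sized malloc followed by free.
// Returns the number of failed allocations or content mismatches.
std::size_t worker(const BenchConfig& cfg) {
    std::size_t errors = 0;
    for (std::size_t i = 0; i < cfg.iterations; ++i) {
        unsigned char* p = static_cast<unsigned char*>(std::malloc(cfg.block_size));
        if (p == nullptr) { ++errors; continue; }
        // Storing the pointer in a volatile keeps an optimizer from
        // removing the otherwise unused malloc/free pair.
        void* volatile keep_alive = p;
        (void)keep_alive;
        if (cfg.fill_and_check) {
            // Test 2 style: fill the block, then verify its content.
            const unsigned char pattern = static_cast<unsigned char>(i);
            std::memset(p, pattern, cfg.block_size);
            for (std::size_t j = 0; j < cfg.block_size; ++j) {
                if (p[j] != pattern) { ++errors; break; }
            }
        }
        std::free(p);
    }
    return errors;
}

struct BenchResult {
    double seconds;      // wall-clock time for all threads to finish
    std::size_t errors;  // sum of the errors reported by the workers
};

// Starts thread_count workers at once and measures the total elapsed time.
BenchResult run_benchmark(unsigned thread_count, const BenchConfig& cfg) {
    assert(thread_count > 0);
    std::vector<std::size_t> errors(thread_count, 0);
    std::vector<std::thread> threads;
    threads.reserve(thread_count);
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < thread_count; ++t) {
        threads.emplace_back([&errors, &cfg, t] { errors[t] = worker(cfg); });
    }
    for (auto& th : threads) th.join();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    std::size_t total = 0;
    for (std::size_t e : errors) total += e;
    return {elapsed.count(), total};
}
```

   Calling run_benchmark(n, cfg) for n from 1 to 10, once with fill_and_check left false and once with it set to true, would give the kind of data points plotted in the graphs of this article. Which allocator is measured depends only on what malloc and free resolve to at link time.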

   Each test was run at least 5 times; the minimum and maximum execution times were discarded before computing the average.
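
   The averaging rule above can be sketched as follows. The function name trimmed_average is hypothetical, and the sketch assumes that exactly one minimum and one maximum are dropped per series of runs.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Average of the samples after discarding one minimum and one maximum.
// Needs at least 3 samples so that something is left to average.
double trimmed_average(std::vector<double> samples) {
    assert(samples.size() >= 3);
    std::sort(samples.begin(), samples.end());
    // After sorting, skip the first (min) and last (max) element.
    const double sum =
        std::accumulate(samples.begin() + 1, samples.end() - 1, 0.0);
    return sum / static_cast<double>(samples.size() - 2);
}
```

   For instance, with five made-up timings of 10, 12, 11, 30 and 9 seconds, the 9 and the 30 are dropped and the remaining three values average to 11 seconds. A single run disturbed by background activity therefore does not skew the result.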

Test 1 result

Results in text form are available there.

Test 1 execution time: [graph: Test 1 execution time comparison]

   This graph shows that perftools malloc scales well with regard to thread count. On the other hand, it is not clear whether CRT malloc scales as well. It is clear, however, that perftools is much faster than the standard library malloc in the presence of multiple threads.

Test 1 execution time divided by thread count: [graph: Test 1 execution time per thread count]

   This graph is much more interesting. It shows that perftools malloc scales perfectly: the time to process the work of one thread remains perfectly constant.

   On the other hand, the standard library malloc does not scale so well. The time to process the work of one thread increases with each added thread. Interestingly, when two threads run concurrently, the test runs just as fast as when only one thread is running. The most likely cause is hyper-threading working very well in the presence of two threads, though hyper-threading no longer seems effective in the presence of 3 threads or more.

Test 2 result

Results in text form are available there.

Test 2 execution time: [graph: Test 2 execution time comparison]

   This graph is fairly similar to the Test 1 graph. It should be noted, though, that with 10 threads the total time for perftools malloc increased by about 30s compared to Test 1, while for the standard library malloc it stayed nearly the same (173s versus 170s for vc8).

Test 2 execution time divided by thread count: [graph: Test 2 execution time per thread count]

   This graph is more interesting. Compared to Test 1, it shows that the time to process the work of one thread increased by 2.8s for perftools malloc, but by a much smaller amount for the standard library malloc (17.3s versus 17.0s with 10 threads). (Note: for some thread counts it did increase by 2.8s.) With 3 threads, Test 2 is even faster than Test 1 with the standard library malloc. This seems to hint that each thread spends most of its time in contention over memory allocation, and that the additional computational work between allocations helped reduce that contention.
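
   The per-thread figures are simply the total execution time divided by the number of threads. A minimal sketch of that arithmetic, using a hypothetical helper named time_per_thread:

```cpp
#include <cassert>

// Total wall-clock time divided by the number of concurrent threads.
// A value that stays flat as threads are added means the allocator
// scales with the thread count on this machine.
double time_per_thread(double total_seconds, unsigned thread_count) {
    assert(thread_count > 0);
    return total_seconds / static_cast<double>(thread_count);
}
```

   With the vc8 totals quoted above for 10 threads, time_per_thread(173.0, 10) gives 17.3s for Test 2 and time_per_thread(170.0, 10) gives 17.0s for Test 1, a difference of only 0.3s per thread.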

What have we learned?

The performance tests above demonstrated that perftools malloc scales perfectly for our test scenarios and is significantly faster than the standard library malloc, even in the single-threaded case. The tests also demonstrated that the standard library malloc of Visual Studio 7.1 and 8.0 does not scale well in the presence of multiple threads doing intensive memory allocation.

Surprisingly, in case of contention and in the presence of only two threads, hyper-threading seems to work exceptionally well and improves execution time significantly. But no improvement seems to occur when there is no contention between threads.

Doing your own experiment: download

A modified version of the first release of perftools can be found here.

See README-WIN32.txt at the root for instructions. You will also find test results in the statistics/ directory.

You will need either Microsoft Visual Studio 7.1 or 8.0 to compile this project. Note that this port is very experimental and very little testing was done. DO NOT use this for production.


Copyright ©2007, Baptiste Lepilleur . http://gaiacrtn.free.fr/index.html