I was wondering whether retrieving an element in a struct will incur additional overhead. And so I did the following experiment. Here the same array is sorted in two ways: with or without data retrieving from a struct. Both ways yield identical results. The question is whether the compiler knows the two ways are the same and can achieve the same efficiency.
#include
#include
#include
#include “ksort.h”
typedef struct {
int a;
} myint_t;
#define myint_lt(_a, _b) ((_a).a < (_b).a) KSORT_INIT_GENERIC(int) KSORT_INIT(my, myint_t, myint_lt) int main() { int i, N = 10000000; myint_t *a; clock_t t; a = (myint_t*)malloc(sizeof(myint_t) * N); srand48(11); for (i = 0; i != N; ++i) a[i].a = lrand48(); t = clock(); ks_introsort(int, N, (int*)a); printf("%.3lf\n", (double)(clock() - t) / CLOCKS_PER_SEC); srand48(11); for (i = 0; i != N; ++i) a[i].a = lrand48(); t = clock(); ks_introsort(my, N, a); printf("%.3lf\n", (double)(clock() - t) / CLOCKS_PER_SEC); free(a); return 0; } [/sourcecode] Here is the speed with different compilers on different CPUs (first value for without data retrieving and second with):
- Mac-Intel, gcc-4.0, -O2: 1.422 sec vs. 1.802 sec
- Mac-Intel, gcc-4.2, -O2: 1.438 vs. 1.567
- Mac-Intel, gcc-4.0, -O2 -fomit-frame-pointer: 1.425 vs. 1.675
- Mac-Intel, gcc-4.2, -O2 -fomit-frame-pointer: 1.438 vs. 1.448
- Linux-Intel, gcc-4.1, -O2: 1.600 vs. 1.520
- Linux-Intel, gcc-4.1, -O2 -fomit-frame-pointer: 1.620 vs. 1.530
- Linux-Intel, icc, -O2 -fomit-frame-pointer: 1.600 vs. 1.580
The conclusion is retrieving data from a struct may have marginal overhead in comprison to direct data access. However, a good compiler can avoid this and produce nearly optimal machine code. Using “-fomit-frame-pointer” may help for some machines, but not for others. In addition, it is a bit surprising to me that gcc-linux generates faster code for data retrieval in a struct. Swapping the two ways does not change the conclusion.
Leave a Reply