Here is a piece of source codes that compare C arrays and C++ vectors. It tests six scenarios: a) preallocated C array; b) dynamically growing C array; c) dynamical C vector calling kv_a macro (in my kvec.h); d) dynamical C vector calling kv_push macro (in my kvec.h); e) preallocated C++ vector and f) dynamically growing C++ vector. You can find my kvec.h on my blog.
#include
#include
#include
#include
#include “kvec.h”
int main() Such result may vary with different machines/compilers, but not much. Update: My example passed valgrind check on a Linux (Debian etch, g++-4.1), but Tom_ pointed out that it did not pass VC++’s debugger as vector::operator[] writes outside vector::size(). Anyway, it is good not to use operator[] in my way. You can replace reserve()+[] with resize()+[] or reserve()+push_back(). On my machine, the replacement gives a little bit slower speed, but it is safer/more portable in that way. Thanks for all the comments.
{
int M = 10, N = 20000000, i, j;
clock_t t;
t = clock();
for (i = 0; i < M; ++i) {
int *array = (int*)malloc(N * sizeof(int));
for (j = 0; j < N; ++j) array[j] = j;
free(array);
}
printf("C array, preallocated: %.3f sec\n",
(float)(clock() - t) / CLOCKS_PER_SEC);
t = clock();
for (i = 0; i < M; ++i) {
int *array = 0, max = 0;
for (j = 0; j < N; ++j) {
if (j == max) {
max = !max? 1 : max << 1;
array = (int*)realloc(array, sizeof(int)*max);
}
array[j] = j;
}
free(array);
}
printf("C array, dynamic: %.3f sec\n",
(float)(clock() - t) / CLOCKS_PER_SEC);
t = clock();
for (i = 0; i < M; ++i) {
kvec_t(int) array;
kv_init(array);
kv_resize(int, array, N);
for (j = 0; j < N; ++j) kv_a(int, array, j) = j;
kv_destroy(array);
}
printf("C vector, dynamic (kv_a): %.3f sec\n",
(float)(clock() - t) / CLOCKS_PER_SEC);
t = clock();
for (i = 0; i < M; ++i) {
kvec_t(int) array;
kv_init(array);
for (j = 0; j < N; ++j)
kv_push(int, array, j);
kv_destroy(array);
}
printf("C vector, dynamic (kv_push): %.3f sec\n",
(float)(clock() - t) / CLOCKS_PER_SEC);
t = clock();
for (i = 0; i < M; ++i) {
std::vector
array.reserve(N);
for (j = 0; j < N; ++j) array[j] = j;
}
printf("C++ vector, preallocated: %.3f sec\n",
(float)(clock() - t) / CLOCKS_PER_SEC);
t = clock();
for (i = 0; i < M; ++i) {
std::vector
for (j = 0; j < N; ++j) array.push_back(j);
}
printf("C++ vector, dynamic: %.3f sec\n",
(float)(clock() - t) / CLOCKS_PER_SEC);
return 0;
}
[/sourcecode]
Here is the result on two machines (compiled with g++ -O2 -fomit-frame-pointer -finline-functions):
type
MacIntel
LinuxIntel
C array, preallocated
1.589
1.180
C array, dynamic
2.064
1.340
C vector, dynamic (kv_a)
2.051
1.600
C vector, dynamic (kv_push)
1.932
1.250
C++ vector, preallocated
2.119
1.590
C++ vector, dynamic
5.095
3.770
Your code for the preallocated vector is incorrect.
Calling vector::reserve() does not change the size() of a vector, so you must push_back() elements rather than assigning them through operator[].
Alternatively you can use vector::resize() instead of reserve(), but this will default-initialize all the elements.
I admit I am not a competent C++ programmer, and I do not know which way is better. I have tried all the recombinations between reserver()-resize() and push_back()-[]. The current codes seem to give the best performance for preallocated C++ vector. This may vary. A friend of mine told me on his Ubuntu, push_back() gives the best performance, though marginally.
In addition, I may add that STL’s memory pool may help performance on more complex applications. On simple example like this, it only brings overhead. To this end, this simple example is kinda unfair to STL.
I think it’s a good idea for you to make a blog post like this as you are in the early stages of learning C++. But be careful about making conjectures about “STL’s memory pool” and trying to pass those conjectures off as a qualified opinion since, as you say, you are not a competent C++ programmer.
The code might be the most efficient, but it is also the most wrong! It produces *undefined behavior*. You’re probably getting away with it because you’re only using containers of ints. If you had a vector of std::strings, I wouldn’t be surprised if you program crashes.
@Anon
I believe this is caused by the allocator. If you call push_back() on a preallocated vector, it is almost as fast as [], which means if the vector is static, push_back() has little overhead and so the difference mainly comes from increasing the size of the vector. I can confirm (I have read its source code) that the vector class also doubles the memory when there is not enough space, the same as the C array. If the STL allocator was as efficient as realloc(), we would not see the big difference in speed. In fact, vector always allocates new memory when doubling, while realloc() may not change the address when there is enough. This is a potential overhead in STL, but even so we would not expect to see that big difference.
Btw, I have studied some STL source codes about five years ago and then I knew how mediocre I was (and I am). Actually I would enourage all the C/C++ programers to read this masterpiece, not only on how to use it, but on how it works.
@Edd
I only want to achieve the best performance in THIS example. I know the current example would not work in other applications, but I do not want to make others say there is a more efficient way in this example.
Indeed, using reserve() will not initialize the elements; using [] will not change size() which will have problem in iteration. But reserve() and [] have almost no overhead. And even so preallocated vector is slower on some OS/machines (not all). I guess this is because the compiler is not good at optimizing complex codes.
Performance is irrelevant if the code is wrong. The code has a bug, pure and simple. You know what undefined behavior means, right? It’s defined by both the C and C++ standards.
The current example might happen to work on your machine, with your implementation of the standard library and your compiler but you have absolutely no guarantees that it will work on any other configuration. Indeed, you have no guarantee that the program will work tomorrow on your machine.
It’s important that you understand this else it will very likely bite you.
I do not think in this example the code is wrong. It will not cause undefined behaviour in this example. I am pretty sure and to make it even surer, I have read related source codes of vector class once again.
I tried this in Visual C++ 2005, because I am too thick to use Linux.
C array, preallocated: 2.937 sec
C array, dynamic: 9.734 sec
C vector, dynamic (kv_a): 3.438 sec
C vector, dynamic (kv_push): 9.593 sec
Microsoft Visual Studio C Runtime Library has detected a fatal error in attractivechaos.exe.
Press Break to debug the program or Continue to terminate the program.
Hmm… let’s try a debug build…
C array, preallocated: 7.016 sec
C array, dynamic: 16.531 sec
C vector, dynamic (kv_a): 8.031 sec
C vector, dynamic (kv_push): 16.610 sec
And then the output stops, and I get a dialog box:
—————————
Microsoft Visual C++ Debug Library
—————————
Debug Assertion Failed!
Program: z:\tests\attractivechaos\debug\attractivechaos.exe
File: c:\vs2005\vc\include\vector
Line: 756
Expression: vector subscript out of range
And it is quite right to barf, because it is quite rightly checking that the index parameter is less than size(), which it is not, because the code only calls reserve and not resize. You can probably find a good explanation of this in any decent C++ book, or alternatively you can think about what vector must actually be doing under the hood and work it out that way. Probably the worst thing to do is read the code, because it’s full of underscores, and then you’ll misunderstand what’s going on on, and draw incorrect conclusions…
So anyway I made changed reserve to resize, and for good measure added one that uses reserve as (or so I guess) is intended:
t=clock();
for(i=0;i<M;++i) {
std::vector array;
array.reserve(N);
for(j=0;j<N;++j)
array.push_back(j);
}
printf(“C++ vector, reserved: %.3f sec\n”,
float(clock()-t)/CLOCKS_PER_SEC);
And another that uses the method of kings:
t=clock();
for(i=0;i<M;++i) {
std::vector array_storage;
array_storage.resize(N);
int *array=&array_storage[0];
for(j=0;j<N;++j)
array[j]=j;
}
printf(“C++ vector, method of kings: %.3f sec\n”,
float(clock()-t)/CLOCKS_PER_SEC);
And the results are in:
C array, preallocated: 2.875 sec
C array, dynamic: 9.890 sec
C vector, dynamic (kv_a): 3.407 sec
C vector, dynamic (kv_push): 9.703 sec
C++ vector, preallocated: 5.328 sec
C++ vector, reserved: 4.078 sec
C++ vector, method of kings: 4.906 sec
C++ vector, dynamic: 14.281 sec
These results are somewhat surprising. I expected ‘method of kings’ to beat ‘reserved’, for example, but I guess the default initialization costs a lot over 80MB and the increments-plus-shady-inlining must not amount to a right lot.
Anyway, the moral of the story is clearly that kings get where they are today by making sure their parents are royalty, and not by constructing a new vector every time they need it.
Thanks, Tom_. I tried my example again on a Linux, using valgrind (a debugging tool in Linux) to check potential memory problem. It reports no error.
What STL implementation is VC++ using? Is it possible that VC++ is using another implementation of STL and vector::reserve is implemented differently? Another possible explanation is VC++ imposes a stronger check on STL. Writing outside size() is not a good (though does no harm if you are aware of it) thing and so the debugger regards this as an error. The third possible explanation is the underlying memory pool works differently: in Windows, reserve() claims the memory but does not immediately allocate it.
Anyway, your experiment shows that my example is a bad one as it is not portable to VC++.
Running on mac/intel under gdb the error explains it pretty well and concurs with what Tom said that it is ” checking that the index parameter is less than size(), which it is not, because the code only calls reserve and not resize. ”
Running…
C array, preallocated: 0.950 sec
C array, dynamic: 2.037 sec
C vector, dynamic (kv_a): 1.271 sec
C vector, dynamic (kv_push): 1.794 sec
/Developer/SDKs/MacOSX10.6.sdk/usr/include/c++/4.2.1/debug/vector:198:
error: attempt to subscript container with out-of-bounds index 0, but
container only holds 0 elements.
Objects involved in the operation:
sequence “this” @ 0x0x7fff5fbff6d0 {
type = NSt7__debug6vectorIiSaIiEEE;
}
Program received signal: “SIGABRT”.
This post finally becomes the discussion on C++ grammar, or more precisely the STL usage. I admit I misused STL, but I did not do anything wrong in C++. I did not misinterpret the standard STL source code. In addition, what I really want to say here is dynamic vector is bad. What I really want to ask is why vector is that slow while it simply implements a strategy nearly identical to the C code.
@eddieclay
I see why you get SIGABRT. This is because you are using “debug/vector” which is almost entirely different from the standard vector except for the interface. It is less efficient and adds a lot of excessively stringent out-of-boundary assertions to keep your program from potential bugs. I did not have this problem because I always use the command line which links to the “released” version at least by default. If you link to the “release version”, you will not see SIGABRT; valgrind will complain nothing. With the standard STL, there is NO memory problem with the example I have shown. In C++ syntax itself, there is nothing wrong with my program. It is the debug version of STL that imposes the requirement, which is helpful, though.
56. array.reserve(N);
57. for (j = 0; j = size()). That is what your program is doing in line 57, as the size of ‘array’ is zero.
“I did not do anything wrong in C++”
You’ve made a method call that violated the method’s preconditions. That’s generally a programming mistake, whether it’s C++ or not.
Hey, I read a lot of blogs on a daily basis and for the most part, people lack substance but, I just wanted to make a quick comment to say GREAT blog!…..I”ll be checking in on a regularly now….Keep up the good work! 🙂
I’m Out! 🙂
reserve()+push_back() is the correct way to do it. Your issue is that you used an index to loop through a vector (for (i = 0; i < M; ++i) {) which is bad coding practice. Always use iterators to loop through a vector. if you had, the loop would of never worked since it would check against size().
I would encourage you to change your code to reserve()+push_back() so people can take you more seriously. It was very insightful otherwise.
my mistake, I meant to say the loop would check against vector.end() not .size()
@sd
I know what is good coding practice and what is bad. I just want to achieve the fastest speed of C++. If I use reserve+push_back(), one will argue C++ is slow because of the extra thing done in push_back(). In contrast, reserve+[] has no overhead. On my laptop, [] is indeed faster.
Thanks for these useful little C template libraries (kvec and khash); I use them fairly regularly, as they’re painless to incorporate and distribute in new projects.
Thank you. It is great to know someone think them useful.
Your c macro libraries are useful on many levels. They are certainly instructive and useful. Thank you for making them available to people.
RT
These library is very, very useful. Thank you very much!
But if I may give a suggestion to you: when you make a mistake, just correct it. Don’t deny the obvious.
Yeah, I know this is one of my weakness, but I admit my mistake ultimately if it is true. I have made many corrections to my various posts based on others’ suggestions. In this case, however, I still think the original code is right. You just read the STL source code. Using reserve() is perfectly legitimate and causes the least overhead. To me, being right or wrong is determined by the actual source code but not by the cliche imposed by the libraries. The obvious is: on the standard STL, my original code always runs correctly and has the least overhead.
But why do unsafe things when you can do it safely by wasting some clock time more? 😀 One day some people wake up, change the STL code and your program is flawed.
Come on, it’s not worth.
Thanks again for your wonderful libs.
For practical implementations, I wouldn’t do this, but I am showing a benchmark here. I want to get the best out of C++.
Hi. I came across your blog and you have some very interesting ideas.
But I believe that your insistence to deny a simple mistake (as many pointed out above) distracted from the main point of the post which is the less than stellar vector performance in many situations.
On The C++ Programming Language 4th edition, Stroustrup briefly (seriously, he spends one paragraph on it) mentions what he thinks are the 2 main issues with std::vector
1) heap allocation
2) elements are always zero-initialized
I didn’t run your code but I believe that you were trying to bypass problem 2). You shouldn’t do that by cheating!
One thing that you could do is to pass a custom allocator to the vector. This allocator could have empty construct and destroy methods and the compiler could be smart enough to optimize those calls away.
This way you could call resize without paying for the initialization.
But if you do decide to write a custom allocator, you could even try to make it allocate memory from a static buffer (possibly falling back to malloc/new if the buffer is used up). This can lead to a vector that is significantly faster.
Finally, for some weird reason, on my Arch Linux machine, simply having the custom allocator call malloc/free instead of new delete, leads to better performance.
In order to figure out the speed of algorithm it is not very good idea to time your algorithm.
It goes back to Alan Turing machine, you would need to figure out the number of processes and in today architecture, you have L1, L2, L3, virtual memory, kernel and so many things that are involved.
So timing is some statistical result that need to be performed in so many times and then if you are good with statistics you could make some sense of it.
Yo need small, average, large and very large data.
Your processor will execute your code just for some time and if you have files you get suspended program right way…
It would require some program that will test it properly.
Just a “side” note about the kvec_t this code is based on. Its very unsafe code because It uses realloc without checking weather the realloc was successful or not. Its also based on macros using nested the C ternary operator which in my opinion is ugly to read and understand. If you enjoy writing code full of compiler warnings then kvec is for you.
Well, if developers do good job, they would be able to create very fast containers that work very fast, but they don’t….
Using asm, mt, and some other tchniques…
I tried this benchmark against Rust in 2018, and to see where C++ vectors had come along
C array, preallocated: 0.400 sec
C array, dynamic: 1.073 sec
C vector, dynamic (kv_push): 1.077 sec
C vector, dynamic (kv_a): 0.505 sec
C++ vector, preallocated: 0.402 sec
C++ vector, dynamic: 2.070 sec
And with rust
rust, preallocated: 0.664sec
rust: dynamic: 1.244sec
In C++ preallocated vectors are definitely the way to go, having identical performance to raw C and even outperforming kvec. Dynamic allocations though are twice as slow as kvec or plain C arrays.
Rust at just 15% slower than raw C for both preallocated and dynamic arrays; not a bad tradeoff to get 2xC++ performance with greater robustness
[…] reading C Array vs. C++ Vector, I found an interesting implementation of a simple vector container in C, which also includes […]