Tag Archives: c++11

What’s up with TR1? (and C++11, and libc++)

In 2005, the C++ committee issued Technical Report 1 (aka TR1). It added a lot of features to the C++ standard library, mostly taken from Boost: things like shared_ptr, bind, unordered_map, unordered_set, array, and lots of other stuff. This was purely a library extension to the standard; there were no language changes. These extensions were placed into the namespace std::tr1, and the header files were referenced as #include <tr1/xxxx>.

Fast forward a couple of years (ok, more than a couple). The C++11 standard was released, and all the stuff from TR1 (except for some of the esoteric math functions) was incorporated into the standard library.

There is no mention of tr1 in the C++11 standard. There is no C++ header named tr1/unordered_map, for example. (There is a header named unordered_map now).

Some standard library vendors have maintained the tr1 header files for backwards compatibility. Other vendors (such as libc++, which, being C++11 only, has no backwards compatibility to worry about) do not.

What this means is that, as part of updating your code to work in C++11, you should probably examine your use of tr1 facilities, and (quite possibly) update them to use the ones in std.

Needless to say, this can cause problems if you want to maintain compatibility with C++03 build systems. You can switch on the version of C++ that you’re building with, and use a using declaration to pull the features that you want into either the global namespace or a particular namespace.

#if __cplusplus >= 201103L
#include <memory>
using std::shared_ptr;
#else
#include <tr1/memory>
using std::tr1::shared_ptr;
#endif

Alternatively, you can use typedefs:

#if __cplusplus >= 201103L
#include <memory>
typedef std::shared_ptr<int> spInt;
#else
#include <tr1/memory>
typedef std::tr1::shared_ptr<int> spInt;
#endif
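The same switch can pull the names into a namespace of your own instead of the global one. Here is a minimal sketch; the namespace name compat and the helper make_counter are made up for illustration:

```cpp
#if __cplusplus >= 201103L
#include <memory>
namespace compat {               // hypothetical namespace name
    using std::shared_ptr;
}
#else
#include <tr1/memory>
namespace compat {
    using std::tr1::shared_ptr;
}
#endif

// From here on, the type is spelled the same way under both standards.
inline compat::shared_ptr<int> make_counter ( int start ) {
    return compat::shared_ptr<int> ( new int ( start ));
}
```

The rest of the code base then refers only to compat::shared_ptr, so the preprocessor check lives in one header.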

There is a tool called cpp11-migrate being developed using clang, which helps people migrate their code from C++03 to C++11. Currently, it does things like converting loops to C++11-style for loops and finding places where nullptr can be used. It would be really nice if it could also convert uses of std::tr1 facilities to std, but first, someone will have to document what (if any) the differences are between tr1 and C++11’s std. I know there are some; for example, std::tuple does interesting things in C++11 because the language has variadic templates.

Range based for loops and pairs of iterators

In C++11, we now have range-based for loops. Given a container c, it is easy to write:

    for ( auto x : c ) { do something with x }

but the STL deals in pairs of iterators.
Given two iterators, f and l, I still have to write:

    for ( auto it=f; it != l; ++it ) { do something with *it }
  • The first time I typed that, I got it wrong – I put a ‘,’ instead of the second ‘;’
  • The second time I wrote that, I got it wrong again! – I used it < l instead of it != l.
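The second slip is more than a style issue: < is only available for random-access iterators, while != works for every iterator category. A small sketch (the function name count_elements is made up for illustration) that works with both std::vector and std::list only because it compares with !=:

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Counts the elements in [f, l). Comparing with != rather than < keeps this
// usable with std::list, whose iterators do not provide operator<.
template <typename Iterator>
std::size_t count_elements ( Iterator f, Iterator l ) {
    std::size_t n = 0;
    for ( auto it = f; it != l; ++it )
        ++n;
    return n;
}
```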

Anyway, I would like to be able to write something like:

    for ( auto x : f,l ) { do something with x }

But there is no support for that in C++11.

Enter iterator_pair:

    template <typename Iterator>
    class iterator_pair {
    public:
        iterator_pair ( Iterator first, Iterator last ) : f_ (first), l_ (last) {}
        Iterator begin () const { return f_; }
        Iterator end   () const { return l_; }

    private:
        Iterator f_;
        Iterator l_;
    };

With this I can now write:

    for ( auto x : iterator_pair<type of f and l> ( f,l )) { do something with x }

which works, but is still annoying. Why should I have to put the type of the iterators there in my for loop? Worse than that, if this is in a template, I may not know what the type of f and l are!

But a helper function makes it all better:

    template <typename Iterator>
    iterator_pair<Iterator> make_iterator_pair ( Iterator f, Iterator l ) {
        return iterator_pair<Iterator> ( f, l );
    }

Now my code looks like I want:

    for ( auto x : make_iterator_pair ( f,l )) { do something with x }

and I’m happy (for now).

I’m pretty sure that there’s a better name for this, but I’m going with iterator_pair for the moment.
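Putting the pieces together, here is a self-contained sketch of how this might be used on part of a container. The helper sum_interior is made up for illustration; it walks everything except the first and last element of a vector.

```cpp
#include <vector>

template <typename Iterator>
class iterator_pair {
public:
    iterator_pair ( Iterator first, Iterator last ) : f_ (first), l_ (last) {}
    Iterator begin () const { return f_; }
    Iterator end   () const { return l_; }

private:
    Iterator f_;
    Iterator l_;
};

template <typename Iterator>
iterator_pair<Iterator> make_iterator_pair ( Iterator f, Iterator l ) {
    return iterator_pair<Iterator> ( f, l );
}

// Hypothetical helper: sums the elements of v, skipping the first and last.
// Returns 0 when there is no interior to walk.
inline int sum_interior ( const std::vector<int> &v ) {
    if ( v.size () < 2 ) return 0;
    int total = 0;
    for ( auto x : make_iterator_pair ( v.begin () + 1, v.end () - 1 ))
        total += x;
    return total;
}
```

The range-based for loop only needs begin() and end() on the object it is handed, which is all that iterator_pair provides.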

Testing libc++ with -fsanitize=undefined

Soon after I posted my last article, Testing libc++ with Address Sanitizer, I received a tweet from @jurederman, whose Twitter profile identifies him as a “Mozilla security bug hunter”, asking “Will you do -fsanitize=undefined next? :)”.

I responded with “Already running the UBSan tests”.

Address Sanitizer (ASan), which I used in the last post, is not the only “sanitizer” that clang offers. There are “Thread Sanitizer” (TSan), “Undefined Behavior Sanitizer” (UBSan), and others. There’s an integer overflow sanitizer, which I believe is called IOC, coming in the 3.3 release of clang. The documentation for UBSan can be found on the LLVM site.
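To give a flavor of what UBSan looks for: signed integer overflow is undefined behavior, and it is one of the things that -fsanitize=undefined checks at run time. The sketch below is not taken from libc++; the function names are made up, and it only shows the kind of expression that gets flagged next to a version that tests for the problem first.

```cpp
#include <climits>

// Undefined behavior when a + b does not fit in an int; a build with
// -fsanitize=undefined reports this at run time.
inline int add_unchecked ( int a, int b ) { return a + b; }

// Tests for overflow using only arithmetic that cannot itself overflow.
inline bool add_would_overflow ( int a, int b ) {
    if ( b > 0 ) return a > INT_MAX - b;
    return a < INT_MIN - b;
}
```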

Anyway, I have been looking at the results of running the libc++ test suite with UBSan enabled.

The mechanics

Like ASan, UBSan is a compiler pass and a custom runtime library. You enable this by passing -fsanitize=undefined to the compiler and linker. I ran the libc++ test suite like this:

cd $LLVM/libcxx/test
CC=/path/to/tot/clang OPTIONS="-std=c++11 -stdlib=libc++ -fsanitize=undefined" ./testit

Unfortunately, this failed; working with unreleased compilers and libraries, I needed updated versions of both libc++.dylib and libc++abi.dylib. So I built those from sources, and then used DYLD_LIBRARY_PATH to make sure that the test program used the libraries that I’d just built. (I didn’t want to replace the ones in /usr/lib, because lots of things in the system depend on them)

cd $LLVM/libcxx/test
DYLD_LIBRARY_PATH=$LLVM/libcxx/lib:$LLVM/libcxxabi/lib CC=/path/to/tot/clang OPTIONS="-std=c++11 -stdlib=libc++ -fsanitize=undefined -L $LLVM/libcxxabi/lib -lc++abi" ./testit

where, as before “/path/to/tot/clang” is the clang that I just built from source, and $LLVM is where I’ve checked out the various parts of LLVM from Subversion.

The results

And the tests were off and running. In the last article, I noted that these tests take about 30 minutes to run on my MacBook Pro. The ASan tests took about 90 minutes. I was pleasantly surprised when the UBSan tests finished in about 42 minutes, or about 40% slower than the baseline tests.

There were 12 tests (out of more than 4800) that failed under normal circumstances. Using UBSan, 49 tests failed, and there were 48,463 runtime errors reported by UBSan.

The failing tests

Of the 37 additional tests that failed under UBSan, 34 were aborted with “uncaught exception of type XXXX”, where XXXX was from the standard library (std::out_of_range, for example). This is caused by a mismatch between libc++ and libc++abi, specifically by the fact that both my custom-built libc++ and my custom-built libc++abi contained typeinfo records for some of the standard exception classes. Getting this right and getting all the bits of the test infrastructure to use the right libraries turned into a big mess very quickly, and I still don’t have a good solution here. Hopefully this will be the subject of a future blog post. However, I was able to convince myself that these failures were not the result of a bug in either libc++, the test suite or UBSan.

The other three failures were in the std::thread test suite. When I investigated, it turned out that there was a race condition in some of the thread tests. A race condition? In threading code? Inconceivable! Apparently the runtime environment under UBSan was different enough to trigger the (latent) race condition in these three tests. Looking at the test suite, I found the same race condition in 10 other tests as well. I committed revision 178029 to fix this in all 13 tests.

The error messages

48K errors! I can’t look at 48K error messages; so I decided to bin them.

There were 37,675 messages of the form: 0x000106ae3fff: runtime error: value inf is outside the range of representable values of type 'xxxx'

and 10,693 messages of the form: 0x000101a8f244: runtime error: value nan is outside the range of representable values of type 'xxxx'

Here “xxxx” could be “double” or “float”. The first bin also included “-inf”.

There were 52 messages of the form: what.pass.cpp:24:9: runtime error: member call on address 0x7fff5e8f48d0 which does not point to an object of type 'std::logic_error'

There were 29 messages like this: eval.pass.cpp:180:14: runtime error: division by zero

There were 6 messages like this: /Sources/LLVM/libcxx/include/memory:3163:25: runtime error: load of misaligned address 0x7fff569a85c6 for type 'const unsigned long', which requires 8 byte alignment

There were 5 messages like this: 0x0001037a329e: runtime error: load of value 4294967294, which is not a valid value for type 'std::regex_constants::match_flag_type'

There were 2 messages like this: /Sources/LLVM/libcxx/include/locale:3361:48: runtime error: index 40 out of bounds for type 'char_type [10]'

There was one message like this: runtime error: load of value 64, which is not a valid value for type 'bool'

The first thing that I noticed is that sometimes UBSan will give you a file and line number, and other times just a hex address. The file and line number are incredibly useful for tracking stuff down.
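For the curious, binning like this is easy to mechanize. The sketch below is not the tool I used; it is just one possible way to do it, with made-up function names. It keys each line on the text after "runtime error: ", which works whether the line starts with a file and line number or with a hex address. Messages that embed addresses or values would need further normalization before they fall into the same bin.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Returns the text following "runtime error: ", or an empty string when the
// line does not contain that marker.
inline std::string message_of ( const std::string &line ) {
    const std::string marker = "runtime error: ";
    const std::string::size_type pos = line.find ( marker );
    if ( pos == std::string::npos ) return std::string ();
    return line.substr ( pos + marker.size ());
}

// Counts how many lines carry each distinct message.
inline std::map<std::string, int> bin_messages ( const std::vector<std::string> &lines ) {
    std::map<std::string, int> bins;
    for ( std::size_t i = 0; i < lines.size (); ++i ) {
        const std::string msg = message_of ( lines[i] );
        if ( !msg.empty ())
            ++bins[msg];
    }
    return bins;
}
```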

The Analysis

Working from the bottom up:

The load of value 64, which is not a valid value for type 'bool' message came out of one of the atomics tests, where it is trying to clear and set an atomic flag that has been default constructed. I don’t know what the correct behavior is here; still looking at this one.
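For reference, C++11 says that unless a std::atomic_flag is initialized with the macro ATOMIC_FLAG_INIT, it is unspecified whether it starts out set or clear. A minimal sketch of the initialized form follows; the names ready_flag and try_acquire are made up for illustration, and this is not the code from the failing test.

```cpp
#include <atomic>

// Starts in the clear state because it is initialized with ATOMIC_FLAG_INIT.
static std::atomic_flag ready_flag = ATOMIC_FLAG_INIT;

// Returns true if this call changed the flag from clear to set.
inline bool try_acquire ( std::atomic_flag &flag ) {
    return !flag.test_and_set ();
}
```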

The index 40 out of bounds for type 'char_type [10]' errors came from the money formatting tests in libc++, and showed up only in the “wide string” versions of the tests; i.e., with two (or four) byte characters. The offending line turned out to be:

*__nc = __src[find(__atoms, __atoms+sizeof(__atoms), *__w) - __atoms];

and the problem was that sizeof(__atoms) was assumed to be the same as the number of entries in that array. Perfectly fine for character arrays, not so fine for wide character arrays. Fixed in revision 177694.
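The difference is easy to see in isolation. This sketch is not the libc++ fix itself; the helper names element_count and index_of are made up. It lets the compiler deduce the number of elements from the array type, so the search stops at the end of the array regardless of how wide the character type is.

```cpp
#include <algorithm>
#include <cstddef>

// Number of elements in a built-in array, deduced from its type.
template <typename T, std::size_t N>
std::size_t element_count ( T (&)[N] ) { return N; }

// Index of ch in atoms, or N if it is not present. The search stops at
// atoms + N (elements), not at atoms + sizeof(atoms) (bytes).
template <typename CharT, std::size_t N>
std::size_t index_of ( const CharT (&atoms)[N], CharT ch ) {
    return static_cast<std::size_t> ( std::find ( atoms, atoms + N, ch ) - atoms );
}
```

For a char array the two bounds coincide, which is why the narrow-character tests never tripped over this.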

The load of value 4294967294, which is not a valid value for type 'std::regex_constants::match_flag_type' errors turned out to be simple to fix as well, once we decided what the right fix was. This turned out to be complicated, because it involved a close reading of the standards document. The problem was that match_flag_type was an enum, emulating a bitmask. The type also had an operator ~(), which flipped all the bits in the type. But since the type was implemented as an enum, it had an underlying integer type that it was represented as, and the operator ~ just flipped all the bits. This led to values that UBSan didn’t like. A large discussion followed, with sentiments like “does it matter” and “can any code actually tell”, and so on. Eventually, I just changed the operator ~ to only flip the bits that are valid in the enumeration. Fixed in revision 177693.
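Here is a small sketch of the idea using a made-up enumeration (option_flags and its enumerators are hypothetical, not the libc++ source). With enumerators 1, 2 and 4 and no fixed underlying type, the values the enum can hold are 0 through 7, so a plain bitwise complement of the underlying integer lands outside that range. Masking with the union of the enumerators keeps the result inside it.

```cpp
// Hypothetical bitmask enumeration; the valid values are 0 through 7.
enum option_flags {
    option_a    = 1 << 0,
    option_b    = 1 << 1,
    option_c    = 1 << 2,
    all_options = option_a | option_b | option_c
};

// Complement restricted to the bits that name an enumerator, so the result
// is always one of the values 0..7.
inline option_flags operator~ ( option_flags x ) {
    return static_cast<option_flags> ( ~static_cast<int> ( x ) & static_cast<int> ( all_options ));
}
```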

The load of misaligned address 0x7fff569a85c6 for type 'const unsigned long', which requires 8 byte alignment messages came from the hashing code for strings. The misaligned loads are a performance optimization, and I haven’t tried to touch them. Whatever changes are made here will have to be done very carefully, since this will affect the performance of all the associative containers.
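For illustration only, one common way to read a word from an address with no alignment guarantee is to copy the bytes into a properly aligned local with memcpy. This is a sketch of that general technique, not a proposed change to libc++, and I have not measured what it would do to the hashing performance; the name load_unaligned is made up.

```cpp
#include <cstring>

// Reads a T from an address with no alignment guarantee by copying the
// bytes into a properly aligned local object.
template <typename T>
T load_unaligned ( const void *p ) {
    T value;
    std::memcpy ( &value, p, sizeof ( value ));
    return value;
}
```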

The “division by zero” messages came from three different parts of the test suite. There were 3 of them in the numeric limits tests, and they were there on purpose. There were 2 of them in the complex number tests, and they were also on purpose. The other 24 were in the random number test suite, where the tests generate a bunch of random numbers (using various distributions) and check that the mean, variance, standard deviation, skew, etc., are all what the programmer expected. The problem is in the last measurement: skew. It is some calculated value divided by the variance. If the variance is zero, then the skew should be infinity. Many of the tests in the random number suite are testing “edge cases” of the random number generators, and some of these edge cases produce a sequence where all the numbers are the same (and thus the variance is zero). We solved this by commenting out the calculation of the skew for these degenerate cases, and leaving a comment in the test source file. Howard fixed this in revision 177826.
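The actual fix was simply to comment the calculation out. An alternative shape, sketched below, is to guard the division. This assumes the textbook definition of skewness (third central moment divided by the variance raised to the power 3/2), which may not be exactly what the tests compute, and the function name sample_skew is made up.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Computes the skewness of xs. Returns false, leaving 'skew' untouched, when
// the sample is empty or has zero variance, so no division by zero occurs.
inline bool sample_skew ( const std::vector<double> &xs, double &skew ) {
    if ( xs.empty ()) return false;
    const double n = static_cast<double> ( xs.size ());

    double mean = 0;
    for ( std::size_t i = 0; i < xs.size (); ++i )
        mean += xs[i];
    mean /= n;

    double var = 0, third = 0;
    for ( std::size_t i = 0; i < xs.size (); ++i ) {
        const double d = xs[i] - mean;
        var   += d * d;
        third += d * d * d;
    }
    var   /= n;
    third /= n;

    if ( var == 0 ) return false;   // degenerate case, e.g. all values equal
    skew = third / ( var * std::sqrt ( var ));
    return true;
}
```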

The runtime error: member call on address 0x7fff5e8f48d0 which does not point to an object of type 'std::logic_error' messages, as it turned out, were due to a bug in UBSan.

I’m just getting started on the inf/-inf/nan messages (about 48K of those). Most of these come from the complex number regression tests. Since this is a test suite for a library that implements a bunch of numeric routines, a lot of the tests actually do generate and use nan/inf, so I expect that many of these will be “false positives”.

Conclusions

This exercise, while not completed, has already turned up a set of bugs in the libc++ test suite, as well as a bug in libc++ and some undefined behavior in libc++. There’s more to look at here, but I think this was a good exercise. There’s kind of a mismatch of expectations here, especially in the complex and numeric test suites, because UBSan is looking for nan/inf/-inf and the libc++ test code is deliberately generating them.

Thanks to Howard Hinnant for his patience and explanations about the C++ standard and libc++ and the libc++ test suite, and to Richard Smith for his help with UBSan and interpreting the C++ standard.

C++ and Xcode 4.6

So, you’ve installed Xcode 4.6, and you are a C++ programmer.

You want to use the latest and greatest, so you create a new project, and add your sources to the project, and hit Build, and … guess what? Your code doesn’t build!

What’s up with that?

In Xcode 4.6 (and presumably, later versions), the default C++ compiler is clang, the default language is C++11, and the standard library is libc++.

This is a change from previous versions, where the default was gcc 4.2.1, C++03, and libstdc++.

This is good news

Clang is a much more capable compiler than gcc 4.2.1. It’s also better integrated into Xcode.

C++11 is a major upgrade in functionality from C++03. There have been lots of articles written about the new features, so I won’t belabor them here.

However, with a new language, compiler, and standard library, there are some incompatibilities. I’ll try to run through the common ones, and hopefully you will be up and running quickly.

How can I tell if I’m using libc++?

If you’re writing cross-platform code, sometimes you need to know what standard library you are using. In theory, they should all offer equivalent functionality, but that’s just theory. Sometimes you just need to know. The best way to check for libc++ is to look for the preprocessor symbol _LIBCPP_VERSION. If that’s defined, then you’re using libc++.

    #ifdef  _LIBCPP_VERSION
    //  libc++ specific code here
    #else
    //  generic code here
    #endif

Note that this symbol is only defined after you include any of the libc++ header files. If you need a small header file to include just for this, you can do:

    #include <ciso646>

The header file “ciso646” is required by both the C++03 and C++11 standards, and is defined to do nothing.
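Combining the two, a check might look like the sketch below. The function name standard_library_name is made up, and the __GLIBCXX__ branch is my addition: that is the macro libstdc++ defines, and it is not something this article otherwise covers.

```cpp
#include <ciso646>   // does nothing itself, but brings in the library's configuration macros

// Reports which standard library the translation unit was compiled against.
inline const char *standard_library_name () {
#if defined(_LIBCPP_VERSION)
    return "libc++";
#elif defined(__GLIBCXX__)
    return "libstdc++";
#else
    return "unknown";
#endif
}
```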

What happened to TR1?

Technical Report #1 (TR1) was a set of library additions to the C++03 standard. Representing the fact that they were not part of the “official” standard, they were placed in the namespace std::tr1.

In C++11, they are officially part of the standard, and live in the namespace std, just like vector and string. The include files no longer live in the “tr1” folder, either.

So, code like this:

    #include <iostream>
    #include <tr1/unordered_map>
    int main()
    {
        std::tr1::unordered_map <int, int> ma;
        std::cout << ma.size () << std::endl;
        return 0;
    }

Needs to be changed to:

    #include <iostream>
    #include <unordered_map>
    int main()
    {
        std::unordered_map <int, int> ma;
        std::cout << ma.size () << std::endl;
        return 0;
    }

It’s probably easiest to just search your code base for references to tr1 and remove them.

Missing identifiers (include what you use)

“My code used to build with Xcode 4.5, and now I’m getting “unknown identifier” errors with stuff in the standard C (or C++) library!”

Library headers may include other library headers. Sometimes, this is required by the standard, sometimes it is done as an “implementation feature” of the library.

To be portable, you should explicitly include the header files that define the routines that you use. That way, you’re not dependent on the internal details of libc++ (or libstdc++).

For example, if you are calling std::malloc (or malloc), you should really #include <cstdlib> (or #include <stdlib.h>) to make sure that it is defined.
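As a small sketch of what that looks like in practice (the function make_zeroed_buffer is made up for illustration), each routine used has its header named explicitly rather than relying on some other header dragging it in:

```cpp
#include <cstddef>   // std::size_t
#include <cstdlib>   // std::malloc, std::free
#include <cstring>   // std::memset

// Allocates n zeroed bytes, or returns a null pointer on failure.
// The caller releases the buffer with std::free.
inline unsigned char *make_zeroed_buffer ( std::size_t n ) {
    void *p = std::malloc ( n );
    if ( p != 0 )
        std::memset ( p, 0, n );
    return static_cast<unsigned char *> ( p );
}
```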

[ Updated 03-03 ]  In an Xcode-Users mailing list posting, Todd Heberlein writes:

I have my own C++ library. When I link against it in a Cocoa app (with the appropriate files set to .mm), everything works fine.

But when I start a new "Command Line Tool" project and try to link against the library, I get a lot of errors about missing STL symbols.

This is almost certainly because his library was built with gcc/libstdc++, and his new tool with clang/libc++.

More to come.

As I find other differences, I will be adding to this document. If you come across things, please let me know in the comments and I will add them.