Sometimes you get things wrong

A few years ago, Sean Parent challenged me to provide an implementation of Boyer-Moore searching in C++. I did that, first in Boost and then, later, as part of the proposed Library Fundamentals Technical Specification.

The idea here is that you have a searcher object, which encapsulates the actual searching. You construct it with the pattern that you want to search for, and then you call the searcher’s operator() with the corpus that you want to search. It will return the start of the pattern in the corpus if it exists, and the end of the corpus if it does not (the same set of rules that std::search follows).

But this weekend I realized that this is not the right thing to return. The searcher (and std::search, for that matter) should return a “range” (ok, a pair of iterators) denoting the beginning and end of the pattern in the corpus. (Yes, you can get the end of the pattern by incrementing the returned iterator by the length of the pattern, but that’s an O(N) operation if you only have forward iterators.)

I came to this realization when I was trying to write a split algorithm. I needed to find a pattern in a sequence (hello, searchers) repeatedly. But I quickly ran aground, because I needed to advance my internal iterators by the size of the pattern. I looked at the searchers I had written for libc++, and sure enough, all that information was there; it just had to be returned. So, as an experiment, I changed the searcher implementation to return a pair<Iterator, Iterator> instead of just an Iterator.

Now my split algorithm is very simple.

It assumes that the searcher returns pair(last, last) when the pattern is not found.

template <typename InIterator, typename Searcher, typename OutIter>
OutIter split(InIterator first, InIterator last, const Searcher &s, OutIter out)
{
    while (first != last)
    {
        std::pair<InIterator, InIterator> found = s(first, last);
        // We've got three cases here:
        //  The pattern is found in the middle of the input; output a chunk, and go around again
        //  The pattern doesn't exist: output the rest of the input and terminate.
        //  The pattern is found at the end of the input, output an empty chunk and terminate.
        *out++ = std::make_pair(first, found.first);
        if (found.second == last && found.first != found.second) // we found the pattern at the end; output an empty match
            *out++ = std::make_pair(last, last);
        first = found.second;
    }
    return out;
}

It takes a pair of iterators, which are the corpus to be searched, a searcher (which defines the pattern to be searched for), and an iterator to write the results to.

The moral of this story (again) is that your algorithms should not throw away useful information. This is the second time this has bit me (see my post on copy_while from a couple of years ago); I wonder how many more times I will make this mistake before I finally learn.

[Later] I realized that I wasn’t being clear above. The iterator calculation that you want to avoid is determining the distance between the patterns, not the length of the pattern. So, if you have a pattern of '*', and a source of "ABCD*EFG*HIJKL*MNOP", you can avoid calculating the distance between successive asterisks.

Simplifying code and achieving exception safety using unique_ptr

Recently, I had a bug report against libc++’s implementation of deque.

The code in question was (simplified):

    #ifndef _LIBCPP_NO_EXCEPTIONS
        try
        {
    #endif  // _LIBCPP_NO_EXCEPTIONS
            __buf.push_back(__alloc_traits::allocate(__a, __base::__block_size));
    #ifndef _LIBCPP_NO_EXCEPTIONS
        }
        catch (...)
        {
            __alloc_traits::deallocate(__a, __buf.front(), __base::__block_size);
            throw;
        }
    #endif  // _LIBCPP_NO_EXCEPTIONS

and the bug report read “If the allocation fails, then the pointer never gets
added to __buf, and so the call to deallocate will free the wrong thing.”

Thanks to Tavian Barnes for the bug report and the diagnosis.

There are two operations here that can fail; the first is the allocation,
and the second is the call to push_back. We need to deal with both.

Tavian suggested declaring a variable named __block (since this is allocating a
block of memory for the deque to store objects into), and then attempting to
add that to __buf. If that fails, then deallocate __block.

I dismissed this solution immediately, because __block has a special meaning
in Objective-C, and there are people who use libc++ and Objective-C/C++ together.

However, I did try writing with __ablock instead:

    #ifndef _LIBCPP_NO_EXCEPTIONS
        pointer __ablock = nullptr;
        try
        {
    #endif  // _LIBCPP_NO_EXCEPTIONS
            __ablock = __alloc_traits::allocate(__a, __base::__block_size);
            __buf.push_back(__ablock);
    #ifndef _LIBCPP_NO_EXCEPTIONS
        }
        catch (...)
        {
            if (__ablock != nullptr)
                __alloc_traits::deallocate(__a, __ablock, __base::__block_size);
            throw;
        }
    #endif  // _LIBCPP_NO_EXCEPTIONS

Testing this, I found that this solved the problem. Yay!
But the code was still ugly. All those ifdefs and the catch (...) bothered me.

Then I remembered unique_ptr, and realized that it did exactly what I wanted.

Libc++ has an internal class called __allocator_destructor, whose constructor
takes an allocator (by reference) and a size_t, and whose operator() takes a
pointer and deallocates it, using the allocator’s deallocate function (passing
in the size). This is used in several places in libc++.

Suddenly, the code got much simpler:

    typedef __allocator_destructor<_Allocator> _Dp;
    unique_ptr<pointer, _Dp> __hold(
        __alloc_traits::allocate(__a, __base::__block_size),
            _Dp(__a, __base::__block_size));
    __buf.push_back(__hold.get());
    __hold.release();

This looks very different – what’s going on here?

This code allocates the block of memory and immediately stuffs it into a
unique_ptr (__hold). Then it adds it to __buf as before. Then we release
it from __hold, which says that it doesn’t need to be deallocated when __hold
goes out of scope.

What about when an exception is thrown?
* If the allocation throws, __hold is never constructed. Nothing was
allocated, and nothing needs to be deleted.
* If the push_back throws, then __hold’s destructor deallocates the memory for us.

In either case, the right thing happens.

The general pattern for this is:

    unique_ptr<T> holder(new T{/* params */});
    // set up some stuff that uses holder.get()
    holder.release();

If the “stuff” throws, then holder cleans up. Simple, and no need for try/catch
blocks to handle deallocations.

If you’re writing code that does allocations and other operations that can throw,
this pattern could simplify your code a lot.

What’s up with TR1? (and C++11, and libc++)

In 2005, the C++ committee issued Technical Report 1 (aka TR1). It added a lot of features to the C++ standard library, mostly taken from Boost: things like shared_ptr, bind, unordered_map, unordered_set, array, and lots of other stuff. This was a purely library extension to the standard; there were no language changes. These extensions were placed into the namespace std::tr1, and the header files were referenced as #include <tr1/xxxx>.

Fast forward a couple of years (ok, more than a couple). The C++11 standard was released, and all the stuff from TR1 (except for some of the esoteric math functions) was incorporated into the standard library.

There is no mention of tr1 in the C++11 standard. There is no C++ header named tr1/unordered_map, for example. (There is a header named unordered_map now).

Some standard library vendors have maintained the tr1 header files for backwards compatibility. Other vendors (such as libc++, which, being C++11-only, has no backwards compatibility to worry about) do not.

What this means is that, as part of updating your code to work in C++11, you should probably examine your use of tr1 facilities and (quite possibly) update them to use the ones in std.

Needless to say, this can cause problems if you want to maintain compatibility with C++03 build systems. You can switch on the version of C++ that you’re building with, and use a using declaration to pull the features that you want into either the global namespace or a particular namespace.

#if __cplusplus >= 201103L
#include <memory>
using std::shared_ptr;
#else
#include <tr1/memory>
using std::tr1::shared_ptr;
#endif

Alternatively, you can use typedefs:

#if __cplusplus >= 201103L
#include <memory>
typedef std::shared_ptr<int> spInt;
#else
#include <tr1/memory>
typedef std::tr1::shared_ptr<int> spInt;
#endif

There is a tool called cpp11-migrate being developed using clang, which helps people migrate their code from C++03 to C++11. Currently, it does things like convert to C++11-style for loops, and finds places where nullptr can be used. It would be really nice if it could also convert uses of std::tr1 facilities to std, but first, someone will have to document what differences (if any) there are between tr1 and C++11’s std. I know there are some; for example, std::tuple does interesting things in C++11 because the language has variadic templates.

Range based for loops and pairs of iterators

In C++11, we now have range-based for loops. Given a container c, it is easy to write:

    for ( auto x : c ) { do something with x }

but the STL deals in pairs of iterators.
Given two iterators, f and l, I still have to write:

    for ( auto it=f; it != l; ++it ) { do something with *it }
  • The first time I typed that, I got it wrong – I put a ‘,’ instead of the second ‘;’
  • The second time I wrote that, I got it wrong again! – I used it < l instead of it != l.

Anyway, I would like to be able to write something like:

    for ( auto x : f,l ) { do something with x }

But there is no support for that in C++11.

Enter iterator_pair:

    template <typename Iterator>
    class iterator_pair {
    public:
        iterator_pair ( Iterator first, Iterator last ) : f_ (first), l_ (last) {}
        Iterator begin () const { return f_; }
        Iterator end   () const { return l_; }

    private:
        Iterator f_;
        Iterator l_;
    };

With this I can now write:

    for ( auto x : iterator_pair<type of f and l> ( f,l )) { do something with x }

which works, but is still annoying. Why should I have to put the type of the iterators there in my for loop? Worse than that, if this is in a template, I may not know what the type of f and l are!

But a helper function makes it all better:

    template <typename Iterator>
    iterator_pair<Iterator> make_iterator_pair ( Iterator f, Iterator l ) {
        return iterator_pair<Iterator> ( f, l );
    }

Now my code looks like I want:

    for ( auto x : make_iterator_pair ( f,l )) { do something with x }

and I’m happy (for now).

I’m pretty sure that there’s a better name for this, but I’m going with iterator_pair for the moment.

Testing libc++ with -fsanitize=undefined

Soon after I posted my last article, Testing libc++ with Address Sanitizer, I received a tweet from @jurederman, whose profile on Twitter identifies himself as a “Mozilla security bug hunter”, asking “Will you do -fsanitize=undefined next? :)”.

I responded with “Already running the UBSan tests”.

Address Sanitizer (ASan), which I used in the last post, is not the only “sanitizer” that clang offers. There are “Thread Sanitizer” (TSan), “Undefined Behavior Sanitizer” (UBSan), and others. There’s an integer overflow sanitizer, which I believe is called IOC, coming in the 3.3 release of clang. The documentation for UBSan can be found on the LLVM site.

Anyway, I have been looking at the results of running the libc++ test suite with UBSan enabled.

The mechanics

Like ASan, UBSan is a compiler pass and a custom runtime library. You enable this by passing -fsanitize=undefined to the compiler and linker. I ran the libc++ test suite like this:

cd $LLVM/libcxx/test
CC=/path/to/tot/clang OPTIONS="-std=c++11 -stdlib=libc++ -fsanitize=undefined" ./testit

Unfortunately, this failed; working with unreleased compilers and libraries, I needed updated versions of both libc++.dylib and libc++abi.dylib. So I built those from sources, and then used DYLD_LIBRARY_PATH to make sure that the test program used the libraries that I’d just built. (I didn’t want to replace the ones in /usr/lib, because lots of things in the system depend on them)

cd $LLVM/libcxx/test
DYLD_LIBRARY_PATH=$LLVM/libcxx/lib:$LLVM/libcxxabi/lib CC=/path/to/tot/clang OPTIONS="-std=c++11 -stdlib=libc++ -fsanitize=undefined -L $LLVM/libcxxabi/lib -lc++abi" ./testit

where, as before “/path/to/tot/clang” is the clang that I just built from source, and $LLVM is where I’ve checked out the various parts of LLVM from Subversion.

The results

And the tests were off and running. In the last article, I noted that these tests take about 30 minutes to run on my MacBook Pro. The ASan tests took about 90 minutes. I was pleasantly surprised when the UBSan tests finished in about 42 minutes, or about 40% slower than the baseline tests.

There were 12 tests (out of more than 4800) that failed under normal circumstances. Using UBSan, 37 tests failed, and there were 48,463 individual runtime errors reported by UBSan.

The failing tests

Of the 37 tests that failed under UBSan, 34 of them were aborted because of an uncaught exception of type XXXX, where XXXX was from the standard library (std::out_of_range, for example). This is caused by a mismatch between libc++ and libc++abi; specifically, by the fact that both my custom-built libc++ and my custom-built libc++abi contained typeinfo records for some of the standard exception classes. Getting this right and getting all the bits of the test infrastructure to use the right libraries turned into a big mess very quickly, and I still don’t have a good solution here. Hopefully this will be the subject of a future blog post. However, I was able to convince myself that these failures were not the result of a bug in libc++, the test suite, or UBSan.

The other three failures were in the std::thread test suite. When I investigated, it turned out that there was a race condition in some of the thread tests. A race condition? In threading code? Inconceivable! Apparently the runtime environment under UBSan was different enough to trigger the (latent) race condition in these three tests. Looking at the test suite, I found the same race condition in 10 other tests as well. I committed revision 178029 to fix this in all 13 tests.

The error messages

48K errors! I can’t look at 48K error messages; so I decided to bin them.

There were 37,675 messages of the form: 0x000106ae3fff: runtime error: value inf is outside the range of representable values of type 'xxxx'

and 10,693 messages of the form: 0x000101a8f244: runtime error: value nan is outside the range of representable values of type 'xxxx'

where “xxxx” could be “double” or “float”. The first bin also included “-inf”.

There were 52 messages of the form: what.pass.cpp:24:9: runtime error: member call on address 0x7fff5e8f48d0 which does not point to an object of type 'std::logic_error'

There were 29 messages like this: eval.pass.cpp:180:14: runtime error: division by zero

There were 6 messages like this: /Sources/LLVM/libcxx/include/memory:3163:25: runtime error: load of misaligned address 0x7fff569a85c6 for type 'const unsigned long', which requires 8 byte alignment

There were 5 messages like this: 0x0001037a329e: runtime error: load of value 4294967294, which is not a valid value for type 'std::regex_constants::match_flag_type'

There were 2 messages like this: /Sources/LLVM/libcxx/include/locale:3361:48: runtime error: index 40 out of bounds for type 'char_type [10]'

There was one message like this: runtime error: load of value 64, which is not a valid value for type 'bool'

The first thing that I noticed is that sometimes UBSan will give you file and line number, and otherwise just a hex address. The file and line number is incredibly useful for tracking stuff down.

The Analysis

Working from the bottom up:

The load of value 64, which is not a valid value for type 'bool' message came out of one of the atomics tests, where it is trying to clear and set an atomic flag that has been default constructed. I don’t know what the correct behavior is here; still looking at this one.

The index 40 out of bounds for type 'char_type [10]' errors came from the money formatting tests in libc++, and were failing only on “wide string” versions of the tests; i.e., with two (or four) byte characters. The offending line turned out to be:

*__nc = __src[find(__atoms, __atoms+sizeof(__atoms), *__w) - __atoms];

and the problem was that sizeof(__atoms) was assumed to be the same as the number of entries in that array. Perfectly fine for character arrays, not so fine for wide character arrays. Fixed in revision 177694.

The load of value 4294967294, which is not a valid value for type 'std::regex_constants::match_flag_type' errors turned out to be simple to fix as well, once we decided what the right fix was. Deciding turned out to be complicated, because it involved a close reading of the standards document. The problem was that match_flag_type was an enum, emulating a bitmask. The type also had an operator ~(), which flipped all the bits of the underlying integer type that the enum was represented as. This led to values that UBSan didn’t like. A large discussion followed, with sentiments like “does it matter” and “can any code actually tell”, and so on. Eventually, I just changed the operator ~ to flip only the bits that are valid in the enumeration. Fixed in revision 177693.

The load of misaligned address 0x7fff569a85c6 for type 'const unsigned long', which requires 8 byte alignment errors were in the hashing code for strings. They are a performance optimization, and I haven’t tried to touch them. Whatever changes are made here will have to be done very carefully, since this will affect the performance of all the associative containers.

The “division by zero” messages were in three different tests. There were 3 of them in the numeric limits tests, and they were there on purpose. There were 2 of them in the complex number tests, and they were also on purpose. The other 24 of them were in the random number test suite, where the tests were generating a bunch of random numbers (using various distributions) and checking to see that the mean, variance, standard deviation, skew, etc, were all what the programmer expected. The problem is in the last measurement: skew. It is some calculated value divided by the variance. If the variance is zero, then the skew should be infinity. Many of the tests in the random number suite are testing “edge cases” of the random number generators, and some of these edge cases will produce a sequence where all the numbers are the same (and thus, the variance == 0). We solved this by commenting out the calculation of the skew for these degenerate cases, and leaving a comment in the test source file. Howard fixed this in revision 177826.

The runtime error: member call on address 0x7fff5e8f48d0 which does not point to an object of type 'std::logic_error' messages, as it turned out, were due to a bug in UBSan.

I’m just getting started on the inf/-inf/nan messages (about 48K of those). Most of these come from the complex number regression tests. Since this is a test suite for a library that implements a bunch of numeric routines, a lot of the tests actually do generate and use nan/inf, so I expect that many of these will be “false positives”.

Conclusions

This exercise, while not completed, has already turned up a set of bugs in the libc++ test suite, as well as a bug in libc++ and some undefined behavior in libc++. There’s more to look at here, but I think this was a good exercise. There’s kind of a mismatch of expectations here, especially in the complex and numeric test suites, because UBSan is looking for nan/inf/-inf and the libc++ test code is deliberately generating them.

Thanks to Howard Hinnant for his patience and explanations about the C++ standard and libc++ and the libc++ test suite, and to Richard Smith for his help with UBSan and interpreting the C++ standard.

Testing libc++ with Address Sanitizer

I’ve been running the libc++ tests off and on for a while. It’s a quite extensive test suite, but I wondered if there were any bugs that the test suite was not uncovering. In the upcoming clang 3.3, there is a new feature named Address Sanitizer, which inserts a bunch of runtime checks into your executable to see if there are any “out of bounds” reads and writes to memory.

In the back of my head, I’ve always thought that it would be nice to be able to say that libc++ was “ASan clean” (i.e., passed all of the test suite when running with Address Sanitizer).

So I decided to do that. [ All of this work was done on Mac OS X 10.8.2/3, btw ]

How to run the tests:

There’s a script for running the tests. It’s called testit.

    $ cd $LLVM/libcxx/test ; ./testit

where $LLVM/libcxx is where libc++ is checked out.

This takes about 30 minutes to run.

Without Address Sanitizer, libc++ fails 12 out of the 4348 tests.

Running the tests with Address Sanitizer

    $ cd $LLVM/libcxx/test ; CC=/path/to/tot/clang++ OPTIONS="-std=c++11 -stdlib=libc++ -fsanitize=address" ./testit

Note: the default options are “-std=c++11 -stdlib=libc++”; that’s what you get if you don’t specify anything.

This takes about 92 minutes; just a bit more than three times as long.

With Address Sanitizer, libc++ fails 54 tests (again, out of 4348)

What are the failures?

  • In 11 tests, Address Sanitizer detected a one-byte write outside a heap block. All of these involve iostreams. I created a small test program that ASan also fires on, and sent it to Howard Hinnant (who wrote most of libc++), and he found a place where he was allocating a zero-byte buffer by mistake. One bug, multiple failures. He fixed this in revision 177452.
  • 2 tests for std::random were failing. This turned out to be an off-by-one error in the test code, not in libc++. I fixed these in revisions 177355 and 177464.
  • Address Sanitizer detected memory allocations failing in 4 cases. This is expected, since some of the tests are testing the memory allocation system of libc++. However, it appears that ASan does not call the user-supplied new_handler when memory allocation fails (and may not throw std::bad_alloc, either). I have filed PR15544 to track this issue.
  • 25 cases are failing where the program is failing to load, due to a missing symbol. This is most commonly std::__1::__get_sp_mut(void const *), but there are a couple others. Howard says that this was added to libc++ after 10.8 shipped, so it’s not in the dylib in /usr/lib. If the tests are run with a copy of libc++ built from source, they pass.
  • There are the 12 cases that were failing before enabling Address Sanitizer.

Once Howard and I fixed the random tests and the bug in the iostreams code, I re-ran the tests using a recently built dylib.

    $ cd $LLVM/libcxx/test ; DYLD_LIBRARY_PATH=$LLVM/libcxx/lib CC=/path/to/tot/clang++ OPTIONS="-std=c++11 -stdlib=libc++ -fsanitize=address" ./testit

This gave us 16 failures:

  • The 4 failures that have to do with memory allocation failures.
  • The 12 failures that we started with.

Conclusion

I’m glad to see that there were so few problems in the libc++ code. It’s a fundamental building block for applications on Mac OS X. And now it’s better than it was when we started this exercise.

However, we did find a couple of bugs in the test suite, and one heap-smashing bug in libc++. We also found a limitation in Address Sanitizer, which I’m hoping the developers will address soon.

Fixing an interface bug

The problem

In Boost 1.50, I introduced the Boost.Algorithm library.

In it, I included a pair of variations of std::copy, called copy_while and copy_until. I’m pretty sure that you can figure out what they do from the names. In fact, I mentioned boost::copy_while as an answer to a question on Stack Overflow.

Here’s the implementation of boost::copy_while:

template<typename InputIterator, typename OutputIterator, typename Predicate> 
OutputIterator copy_while ( InputIterator first, InputIterator last, 
OutputIterator result, Predicate p )
{
    for ( ; first != last && p(*first); ++first )
        *result++ = *first;
    return result;
}

This matches the definition of std::copy_if and std::copy_n in the C++11 standard.

However, Sean Parent has convinced me that I got the interface wrong. He says, and I (now) agree, that I have to return the modified input iterator as well. The problem is that if you use copy_while to process part of a sequence, how do you know where to pick up the processing from? If you are using real, actual input iterators, then you’re fine, because you can just pick up with the input iterator that you passed to copy_while. But if you are processing a std::list<T>, for example, then you probably want to know which was the first element that was not copied.

I know, this all seems obvious – now.

But I didn’t think of it at the time. I should have. I remember going to a talk by Alex Stepanov several years ago, and he said that when writing the STL, one of his guiding principles was “Don’t throw away information”. Here, we are throwing away the position in the source list.

Not that it makes me feel better, but the C++ standards committee made the same mistake with std::copy_n.

So, how do we fix it?

There’s the “don’t break existing code” school of thought

template<typename InputIterator, typename OutputIterator, typename Predicate> 
OutputIterator copy_while ( InputIterator first, InputIterator last, 
OutputIterator result, Predicate p, InputIterator *pos = NULL )
{
    for ( ; first != last && p(*first); ++first )
        *result++ = *first;
    if ( pos != NULL )
        *pos = first;
    return result;
}

Advantages:

  • Calling code does not have to change
  • Callers who want the position from the input list can get it by passing the address of an iterator (in the second case)

Disadvantages:

  • It’s ugly. The code used to be written in a functional style; all the parameters are input-only, and all the results are in the function return. That’s a powerful argument for me, and the fact that this function is all about side-effects (modifying the things that the iterators ‘point to’) only diminishes the argument somewhat.
  • It reduces our options in the future. A default argument has to be at the end of the parameter list.

How about we just “Make it right”, and screw the callers?

template<typename InputIterator, typename OutputIterator, typename Predicate> 
std::pair<InputIterator, OutputIterator>
copy_while ( InputIterator first, InputIterator last, 
OutputIterator result, Predicate p )
{
    for ( ; first != last && p(*first); ++first )
        *result++ = *first;
    return std::make_pair(first, result);
}

Advantages:

  • Matches the “style” of the STL. Parameters are inputs, results are outputs.
  • No hidden default arguments.
  • The change is visible at compile time; there will be no “silent failures”.

Disadvantages:

  • Calling code may have to change. There are two scenarios to consider:
    • (void) copy_while (first, last, out, pred);
    • out2 = copy_while (first, last, out, pred);

In the first case, nothing has to change. The return value is not being used, so changing its type means nothing. In the second case, however, the calling site must be modified. Fortunately, the change is simple:

 out2 = copy_while (first, last, out, pred).second;

or:

 std::tie(std::ignore,out2) = copy_while (first, last, out, pred);

Can we “make it right” and not screw the callers?

On the boost mailing list, Sebastian Redl suggested using what he called a “biased_pair” type, which works like a std::pair (and is convertible to a std::pair), but also includes an implicit conversion (chosen at compile time) to either the first element of the pair or the second. For copy_while, the conversion would be to OutputIterator (the second element).

[ Added 03-03: Motti Lanzkron notes in the comments that this won’t work with auto anyway. ]

Advantages:

  • Calling code does not have to change
  • The code is still written in a style that appeals to me; there’s no mixing of inputs and outputs in the parameter list.

Disadvantages:

  • That’s a lot of infrastructure to build. std::pair is a distressingly complicated class. If Sebastian’s “biased_pair” were already in Boost, I probably would have used it; but I didn’t want to write two classes and a bunch of boilerplate code. The cost, from my point of view, outweighed the benefit (not having to change calling code).

Conclusion

After some discussion on the boost list, I went with option #2 (break the callers). We’ll see how this works out.