The problem
In Boost 1.50, I introduced the Boost.Algorithm library.
In it, I included a pair of variations of std::copy
, called copy_while
and copy_until
. I’m pretty sure that you can figure out what they do from the names. In fact, I mentioned boost::copy_while
in as an answer to a question on Stack Overflow
Here’s the implementation of boost::copy_while
:
template<typename InputIterator, typename OutputIterator, typename Predicate>
OutputIterator copy_while ( InputIterator first, InputIterator last,
OutputIterator result, Predicate p )
{
for ( ; first != last && p(*first); ++first )
*result++ = *first;
return result;
}
This matches the definition of std::copy_if
and std::copy_n
in the C++11 standard.
However, Sean Parent has convinced me that I got the interface wrong. He says, and I (now) agree that I have to return the modified input iterator as well. The problem is that if you use copy_while
to process part of a sequence, how do you know where to pick up the processing from? If you are using real, actual input iterators, then you’re fine, because then you can just pick up with the input iterator that you passed to copy_while
. But if you are processing a std::list<T>
, for example, then probably want to know which was the first element that was not copied.
I know, this all seems obvious – now.
But I didn’t think of it at the time. I should have. I remember going to a talk by Alex Stepanov several years ago, and he said that when writing the STL, one of his guiding principles was “Don’t throw away information”. Here, we are throwing away the position in the source list.
Not that it makes me feel better, but the C++ standards committee made the same mistake with std::copy_n
.
So, how do we fix it?
There’s the “don’t break existing code” school of thought
template<typename InputIterator, typename OutputIterator, typename Predicate>
OutputIterator copy_while ( InputIterator first, InputIterator last,
OutputIterator result, Predicate p, InputIterator *pos = NULL )
{
for ( ; first != last && p(*first); ++first )
*result++ = *first;
if ( pos != NULL )
*pos = first;
return result;
}
Advantages:
- Calling code does not have to change
- Callers who want the position from the input list can get it by passing the address of an iterator (in the second case)
Disadvantages:
- It’s ugly. The code used to be written in a functional style; all the parameters are input-only, and all the results are in the function return. That’s a powerful argument for me, and the fact that this function is all about side-effects (modifying the things that the iterators ‘point to’) only diminishes the argument somewhat.
- It reduces our options in the future. A default argument has to be at the end of the parameter list.
How about we just “Make it right”, and screw the callers?
template<typename InputIterator, typename OutputIterator, typename Predicate>
std::pair<InputIterator, OutputIterator>
copy_while ( InputIterator first, InputIterator last,
OutputIterator result, Predicate p )
{
for ( ; first != last && p(*first); ++first )
*result++ = *first;
return std::make_pair(first, result);
}
Advantages:
- Matches the “style” of the STL. Parameters are inputs, results are outputs.
- No hidden default arguments.
- The change is visible at compile time; there will be no “silent failures”.
Disadvantages:
- Calling code may have to change
- There are two scenarios to consider.
(void) copy_while (first, last, out, pred);
out2 = copy_while (first, last, out, pred);
In the first case, nothing has to change. The return value is not being used, so changing its type means nothing. In the second case, however, the calling site must be modified. Fortunately, the change is simple:
out2 = copy_while (first, last, out, pred).second;
or:
std::tie(std::ignore,out2) = copy_while (first, last, out, pred);
Can we “Make it right” and don’t screw the callers?
On the boost mailing list, Sebastian Redl suggested using what he called a “biased_pair” type, which works like a std::pair
(and is convertible to a std::pair
), but also includes an implicit conversion (chosen at compile time) to either the first element of the pair or the second. For copy_while
, the conversion would be to OutputIterator
(the second element).
[ Added 03-03: Motti Lanzkron notes in the comments that this won’t work with auto
anyway. ]
Advantages:
- Calling code does not have to change
- The code is still written in a style that appeals to me; there’s no mixing of inputs and outputs in the parameter list.
Disadvantages:
- That’s a lot of infrastructure to build.
std::pair
is a distressingly complicated class. If Sebastian’s “biased_pair” was already in boost, I probably would have used it; but I didn’t want to write two classes and a bunch of boilerplate code. The cost, from my point of view, outweighed the benefit (not having to change calling code)
Conclusion
After some discussion on the boost list, I went with option #2 (break the callers). We’ll see how this works out.