Dr. StrangeCoder or: How I Learned to Stop Worrying and Love C++
A good friend of mine at work (who is probably one of the best coders I've ever met, and a really nice guy to boot) posted the following over at his blog:
http://blog.grumpycoder.net/index.php/post/2012/03/02/Why-C-is-a-good-language-or-why-you-re-wrong-thinking-C-is-better
We've exchanged email about it a bit already, but I wanted to find out if the things I was saying actually made sense, so I took his challenge (he got it from http://usuckatcoding.com/):
Given a string of random characters, return the index of the first nonrepeated character. If all characters repeat, return -1. For example, given the string "abcab", 'c' is the first nonrepeated character and the result would be 2.He did a solution in C to prove his point. He describes it thusly:
BONUS: There are time/space tradeoffs to be considered. The best solution will address any optimal time solution as well as any optimal space solution.
It uses ~1800 bytes on the stack. It's in O(n), and starts with a memset of 258 bytes. It supports strings up to 2^31 bytes.The thing is, that's a very C way of looking at coding.
So I sat down to do this in C++. This took about two minutes to write (I'm omitting testing code) and it uses the identical algorithm to my friend's C solution:
int challenge1(const char *str) {
std::list<int> results;
std::map<char,std::list<int>::iterator> seen_map;
char c;
int ret;
for(ret = 0; (c = str[ret]); ++ret) {
std::map<char,std::list<int>::iterator>::iterator it = seen_map.find(c);
if(seen_map.end() == it) {
// if we haven't seen it before, record it and it's position
results.push_front(ret);
seen_map.insert(std::pair<char,std::list<int>::iterator>(c,results.begin()));
} else {
if(results.end() != it->second) {
// this is the duplicated character, remove it from the results
results.erase(it->second);
it->second = results.end();
}
}
}
results.push_front(-1);
return results.back();
}It's a very C++-ish solution. It does suffer in performance from its extensive heap usage (as my friend asserts) and it's even not O(n) because std::map is an ordered map (and therefore, likely, a tree). However, it is shorter, and the use of the insert/erase function names gives the sense of exactly what it's doing. (Also many of you who already hate C++ syntax are cringing at the iterator declaration inside the for-loop, and I won't disagree, though C++11's auto keyword and a better insert() prototype would fix that.)
Now, we can do better, but I don't think an average C++ programmer would have a lot of difficulty coming to this average solution fairly quickly. Possibly even quicker than the C guy.
So why do I think C++ is a better language? Because it lets you write suboptimal code faster? People who make that argument are (rightly) missing my friend's point. If your desire is to trade performance for ease of coding then use something even easier than C++ for goodness sakes!
I think C++ is a better language than C because it also doesn't hold back the performance. Let's look at iterating on this from a strictly C++ methodology. What do we see? Well, it's very specific to this problem. Let's genericize it (Sharon's gonna kill me for making up words.) We're going to explicitly overdo it to make a point:
template <class Traits>
int challenge2(const typename Traits::Container &container) {
typedef typename Traits::List ListType;
typedef typename ListType::iterator ListIterator;
typedef typename Traits::Map MapType;
typedef typename Traits::MapNode MapNodeType;
typedef typename MapType::iterator MapIterator;
typedef typename Traits::Container ContainerType;
typedef typename ContainerType::const_iterator ContainerConstIterator;
typedef typename Traits::Value ValueType;
ListType results;
MapType seen_map;
int ret = 0;
for(ContainerConstIterator cit = container.begin();
cit != container.end(); ++cit, ++ret) {
const ValueType &v = *cit;
MapIterator mit = seen_map.find(v);
if(seen_map.end() == mit) {
results.push_front(ret);
seen_map.insert(MapNodeType(v,results.begin()));
} else {
if(results.end() != mit->second) {
results.erase(mit->second);
mit->second = results.end();
}
}
}
results.push_front(-1);
return results.back();
}
struct ChallengeTraits
{
typedef std::list<int> List;
typedef char Value;
typedef std::vector<Value> Container;
typedef std::map<Value,List::iterator> Map;
typedef std::pair<Value,List::iterator> MapNode;
};Now here's where the real beauty of C++ kicks in. This version is generic enough that the space/time tradeoff can be done ENTIRELY in the Traits class (and fix that whole ordered_map thing at the same time). I could, for instance, define my ChallengeTraits class to use stack-based fixed size implementations like the C solution and use exactly the same space. I leave the code as an exercise to the reader (though it's not hard) and postulate that the C++ STL SHOULD PROVIDE those classes. Since it doesn't (and in general the STL doesn't provide enough flexibility with resource allocation with it's broken allocator model) I would expect that any C++ coder who cares about performance would write them exactly once in their career and then just use them over and over. The code (outside of Traits) hasn't grown appreciably longer.
But that's not all! Act now, and get a generic search in your generic random access container for the first non-duplicated value. In fact, if you change the function to return an iterator you don't even need the random access part.
Also, if you know things about your value set, you can get even better solutions. For instance, maybe the string only contains printable characters and you could code up your Map class to map your range of (32-126) to (0-94) and use less space than the current C solution. All the complexity is hidden away in your Traits, which you customize to your problem.
Let's review. C++ lets us use it's type and template system to specify an object with the exact minimum behavior we need for our algorithm to run correctly, which then lets you deal with the complexity to run optimally elsewhere and in a reusable way. It's the win-win scenario (though not the win-win-win your Michael Scotts might want.)
Class dismissed.