I’m working on services that process tons of ASCII and/or US English Unicode data and have been going through the code base and putting in some easy wins for better performance, courtesy of the .NET 2.0 framework. Many developers have missed these tricks (although a couple of them I’ll discuss are even valid for the 1.x framework). The code fragments are in C#, but most of them port to VB.NET as-is.
First, here’s an easy trick that works pre-2.0: never pass a string of length 1 to IndexOf() or LastIndexOf(). Instead, pass a character:
Works, but slower: someString.IndexOf("&")
Faster and more self-evident as to intent: someString.IndexOf('&')
For you VB’ers, that’s: someString.IndexOf("&"c)
The exception, of course, is if the character you’re searching for is (or might be) a letter that requires culture-sensitive matching, such as a letter with an accent or ligature. But in most real-world cases I’ve encountered, that’s not true.
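Here is a minimal sketch of the two calls side by side. The class and method names (IndexOfDemo, FindWithString, FindWithChar) are made up for illustration; only the IndexOf() overloads come from the framework.

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class IndexOfDemo
{
    // String overload: culture-sensitive search by default.
    public static int FindWithString(string s)
    {
        return s.IndexOf("&");
    }

    // Char overload: ordinal search, and the intent is clearer.
    public static int FindWithChar(string s)
    {
        return s.IndexOf('&');
    }
}
```

For plain ASCII input such as "a&b", both methods return 1; the char version just gets there with less work.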
Which is a great lead-in to the rest of this post, which is that an awful lot of data scrubbing and parsing is against data that’s never going to need culture-sensitive matching or ordering. Even in data that has some culture-specific content, there are likely to be many columns that have none (for example, most product codes and other natural keys are pure alphanumeric).
When you don’t have to deal with cultural issues, you can make ordinal comparisons, which are straight character-by-character matches. These are significantly less expensive than culture-sensitive comparisons, which have to access lookup tables of characters that are considered matches.
Disclaimer: as with all speed optimization techniques, don’t go crazy. If you slow yourself down to think about every string comparison you make in your code to decide what comparison method to invoke, you’ll lose more than you’ll gain. I’m speaking here of routines that handle strictly simple ASCII data and make more than one or two comparisons.
Given the potential gains for ordinal comparisons, you should study and implement the CLR 2.0 recommendations for string comparisons. The short version, confining the discussion to the use of ordinal comparisons, is summarized below. Note that the last item is the most important to remember.
1) == makes ordinal comparisons.
2) String.Equals() makes ordinal comparisons by default.
3) String.IndexOf(char) and String.LastIndexOf(char) default to ordinal.
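4) String.CompareOrdinal() is, of course, an ordinal compare (and was around prior to CLR 2.0).
5) ToUpper() and ToLower() default to the current culture. They are case conversions rather than comparisons, so they take no StringComparison argument; the culture-independent variants are ToUpperInvariant() and ToLowerInvariant().
6) Most other string comparison methods default to culture-sensitive comparisons but can be overridden by passing StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase. (Contains() is a notable exception: it is always ordinal.)
In other words … IndexOf(string), LastIndexOf(string), StartsWith(), EndsWith() and Compare() all need to be told if you want an ordinal comparison. CompareTo() has no overload that takes a StringComparison, so use Compare() or CompareOrdinal() in its place.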
Works, but slower: someString.IndexOf(someSearchValue)
Faster: someString.IndexOf(someSearchValue, StringComparison.Ordinal)
Yes, it’s a lot more typing, but thank God for Intellisense.
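To make that concrete, here is a rough sketch of the ordinal overloads for the methods listed above. OrdinalDemo and its method names are placeholders of my own; the framework calls inside them are the real 2.0 overloads.

```csharp
using System;

// Hypothetical wrapper class, for illustration only.
static class OrdinalDemo
{
    // Ordinal substring search.
    public static int Find(string s, string value)
    {
        return s.IndexOf(value, StringComparison.Ordinal);
    }

    // Ordinal prefix test.
    public static bool HasPrefix(string s, string prefix)
    {
        return s.StartsWith(prefix, StringComparison.Ordinal);
    }

    // Ordinal suffix test.
    public static bool HasSuffix(string s, string suffix)
    {
        return s.EndsWith(suffix, StringComparison.Ordinal);
    }

    // Ordinal ordering: negative, zero, or positive.
    public static int Order(string a, string b)
    {
        return string.Compare(a, b, StringComparison.Ordinal);
    }
}
```

For example, OrdinalDemo.Find("key=value", "=") returns 3, and OrdinalDemo.HasPrefix("SKU-1001", "SKU-") returns true.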
Slightly less performant but still better than culture-sensitive comparisons is StringComparison.OrdinalIgnoreCase, which will ignore the case of letters. As a bonus, OrdinalIgnoreCase can often save you the extra step of forcing the string to be compared or searched to a known case that matches your comparison terms. In other words, it will generally save you calls to ToUpper() or ToLower().
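As a sketch of what that saves you, compare forcing the case first with letting OrdinalIgnoreCase do the work. The class name, method names, and the "YES" and "<body" literals are just examples I picked, not anything from a real code base.

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class IgnoreCaseDemo
{
    // Old habit: allocate an upper-cased copy, then compare.
    public static bool IsYesViaToUpper(string input)
    {
        return input.ToUpper() == "YES";
    }

    // No temporary string; the comparison ignores case for us.
    public static bool IsYes(string input)
    {
        return string.Equals(input, "YES", StringComparison.OrdinalIgnoreCase);
    }

    // The same idea applied to a search.
    public static int FindBodyTag(string html)
    {
        return html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
    }
}
```

IgnoreCaseDemo.IsYes("yes") and IgnoreCaseDemo.IsYes("Yes") both return true, and IgnoreCaseDemo.FindBodyTag("<HTML><BODY>") returns 6.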
Disclaimer: if there is a good chance that a method may need to at least optionally support culture-sensitive comparisons at some point in the foreseeable future, you will probably want to pass a StringComparison member to your routine just like the new String method overloads do, or use some other means to configure string comparisons.
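One way to leave that door open, sketched under the assumption that ordinal is the right default for your data: offer an overload that takes a StringComparison and have the short form forward to it. KeyMatcher and HasPrefix are hypothetical names.

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class KeyMatcher
{
    // Default: ordinal, which suits plain ASCII keys.
    public static bool HasPrefix(string value, string prefix)
    {
        return HasPrefix(value, prefix, StringComparison.Ordinal);
    }

    // Callers that need culture-sensitive or case-insensitive
    // matching can say so explicitly.
    public static bool HasPrefix(string value, string prefix, StringComparison comparison)
    {
        return value.StartsWith(prefix, comparison);
    }
}
```

KeyMatcher.HasPrefix("abc-123", "ABC") returns false, while KeyMatcher.HasPrefix("abc-123", "ABC", StringComparison.OrdinalIgnoreCase) returns true.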
A few other twists. Consider alternatives like the following that present an easier job to the runtime:
someString.StartsWith("A") vs someString[0] == 'A'
someString.EndsWith("Z") vs someString[someString.Length - 1] == 'Z'
You may or may not like the trade-off in readability. Personally I find someString[0] == someChar more readable than StartsWith, but I find the EndsWith() example a tougher call. Your mileage may vary, but again, you would only even think of these issues when you are in an expensive routine with lots of such calls, or a single such call in a loop.
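One caveat with the indexer versions: they throw on an empty string, where StartsWith() and EndsWith() simply return false. A minimal sketch with a length guard follows; EdgeCharDemo and its methods are names I made up, and it assumes the string is never null.

```csharp
using System;

// Hypothetical helper class, for illustration only.
static class EdgeCharDemo
{
    // Ordinal check of the first character; false for an empty string.
    public static bool StartsWithChar(string s, char c)
    {
        return s.Length > 0 && s[0] == c;
    }

    // Ordinal check of the last character; false for an empty string.
    public static bool EndsWithChar(string s, char c)
    {
        return s.Length > 0 && s[s.Length - 1] == c;
    }
}
```

Wrapping the check in a small helper like this also takes care of the readability concern for the EndsWith() case.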
What tricks do you use to speed up string comparisons?