In the previous installment in this series, we described a method, RemoveConsecutiveCharacters(), that demonstrates some techniques for minimizing string allocation and considerably improving performance as a result. This method begins, as should most any string transformation or concatenation method on a hot path, by creating a buffer in the form of a char[] array of sufficient size to hold the expected result — or, alternatively, the largest possible expected result, some of which can be discarded in the returned string.

We now turn to the matter of eliminating the char[] allocation itself.

You might wonder why the method creates the array via the char[] constructor, and then copies the contents of the string to it — rather than simply using string.ToCharArray() to do both at once. There’s a method to this madness. Although ToCharArray() should be about as fast (an examination of the relevant source code shows that it basically does an unsafe memory copy via pointer techniques and a call to an internal wstrcpy() method), the allocation overhead remains. Every call to this method allocates a char[] and then discards it, leaving it at the garbage collector’s doorstep.

Array allocation for small arrays is quite fast, even with GC overhead, but when we need larger arrays, this overhead becomes problematic. The answer to this is to reuse a preallocated array in some fashion. We can create a non thread-safe system for this quite easily, but let’s go with something more generic and thread-safe: the shared pool managed by System.Buffers and available to us in .NET Core / 5, and via a Nuget package, in CLR 4.5+. We can request an array from the pool of any desired type (in this case, char) and any desired size.

There are some minor limitations. First, any array obtained from the pool must be returned to it after use. While failure to do so does not cause a memory leak as such, it is a performance suck and may eventually cause the pool to run out of resources.

Secondly, while you are guaranteed to get an array of at least your requested size, you usually get a larger size than you ask for. The smallest array I’ve ever seen returned is one of 16 elements.

Thirdly, an array you get from the System.Buffers pool is not guaranteed to have any particular contents. You may get an array of zero/”null” characters, but don’t count on it. It may well contain leftover characters from prior usage. This is no problem for RemoveConsecutiveCharacters() because it immediately populates the array with the characters from the input string anyway.

Lastly, from the published benchmarks I’ve seen, obtaining an array from the pool is actually SLOWER than straight-up allocation if you request less than 1K worth of array (in the case of char[], 512 elements, since each UTF16 character occupies two bytes). At around 1K, the performance is comparable. Larger sizes are faster than conventionally allocating a new array. These benchmarks include GC overhead, so there is not an advantage below 1K requests where reduced GC activity overcomes slower allocation times.

From this, we end up with four scenarios:

  • Thread safe arrays smaller than 1K in size should be conventionally allocated.
  • Thread safe arrays >= 1K in size should be obtained from the array pool.
  • Single threaded scenarios, within a single method or its called private methods, can generally reuse static arrays. This is super fast, faster than either conventional allocation or requesting arrays above a certain size from the buffers pool.
  • Single threaded scenarios with complex requirements might be better served with either conventional allocation or with static pre-allocated and reusable arrays specific to a particular method. As always, long-lived allocations or allocations outside of hot paths (especially of small arrays) can simply be done in the conventional manner.

Let’s first explore how we’d straight-up substitute requesting an array from the array pool, rather than a conventional allocation. Then we’ll abstract away the above decisions about the “cheapest” way to obtain an array by substituting a call to a method of our own that will sort that all out.

So … after adding a using System.Buffers; statement at the top of the file, our simple allocation call:

char[] newChars = new char[s.Length];

… becomes a request to the buffer pool:

char[] newChars = ArrayPool<char>.Shared.Rent(s.Length);

At the very end of the method, we return the buffer to the pool like so:

ArrayPool<char>.Shared.Return(newChars);

And … voila! … there goes another allocation.

There’s an optional bool argument to ArrayPool.Shared.Return, which if true, will clear the array when it goes back into the pool. I don’t recommend using it, however, outside of perhaps returning an array that might contain security-sensitive information like an unencrypted password. Requesting that a returned string be cleared is performance overhead because you’re clearing an array that you probably haven’t even used most of. If you have an algorithm that needs to assume unused bytes are full of zeros, or you simply don’t want random clutter making it harder to single step through your logic during development, use Array.Clear(arrayName,int,int) right after you rent the array, to clear just the portion you actually need to use.

In RemoveConsecutiveCharacters(), we need change no other code due to the extra / unused space of our oversized array, as we are already maintaining the newLength variable and using only the first part of the character array to allocate the final string. In general, we only need to make sure we use the string constructor overload that takes a character array, start position and length argument, so that the resulting string is no larger than the data it needs to hold.

If your method returns a string instance rather than adjusting a by-reference passed string, you’ll need to store the final string in a variable, return the char[] to the pool, and then return the string variable.

Here’s our RemoveConsecutiveCharacters method from our previous article in this series, modified to use the “rented” char array:

using System.Buffers;

public static void RemoveConsecutiveCharacters(ref string s, char c) {
    int state = 0;
    int pos = -1;
    int runLength = 0;
    int newLength = s.Length;
    char[] newChars = ArrayPool<char>.Shared.Rent(s.Length);
    s.AsSpan().CopyTo(newChars.AsSpan());
    int i = 0;
             
    while (i < newLength) {

        if (newChars[i] == c) {
 
            if (state == 0) {
                state = 1; // encountered start of a run of characters
                pos = i; // pointer to first character in run
                runLength = 1;
            } else {
                runLength++;
            }

        } else {

            if (state == 1) {

                if (runLength > 1) {
                    // exited run of characters; copy rest of string over top of all but one
                    int copyLength = newLength - pos - runLength + 1;
                    newLength -= runLength;
                    newLength++;
                    newChars.AsSpan(pos + runLength - 1, copyLength).CopyTo(newChars.AsSpan(pos, copyLength));
                    i -= (runLength - 1); // adjust pointer for next character to examine, as we've shortened the evolving new string value
                    newChars[newLength] = '\0'; // This is just for debugging / visualization ease, it can be commented out after testing
                }
 
                state = 0;
            }
 
         }
 
         i++;
     }
 
     if (newLength < s.Length) {

        if (state == 1 && runLength > 1) {
            // We're inside a run at the end of the string.
            newLength -= runLength;
            newLength++;
        }
 
        s = new string(newChars, 0, newLength);
     }
 
    ArrayPool<char>.Shared.Return(newChars);
}

As you can see, there’s not much difference. Now that you understand the basic concept, we can replace the direct calls into ArrayPool<char>.Shared with methods of our own, which will decide based on the requested size and whether we want thread safety, the best way to return a usable array. We’ll encapsulate these in a static CharArrayPool class, and call these methods Rent(int size) and Return(array).

First. we need an IsThreadSafe property, which you can default to true or false according to your typical need (True is the safer general purpose default). You can set this as appropriate before making any requests for arrays.

When thread safety is requested, we simply do a conventional allocation for requests of arrays smaller than 1K in memory size. Otherwise we return an array from the pool.

When thread safety isn’t needed, we reuse an array of characters of generous size (4K elements or 8K of memory, in our case) which nevertheless will be resized dynamically if a larger request is ever made. Initially our list of reusable arrays is empty. The first array is allocated on the first Rent() call. More arrays are added to the list as needed. Return() calls are assumed to be in reverse order of the original requests, or “last out / first in”.

Here’s how one might implement this within a static library class:

public static class CharArrayPool {
    private static bool _isThreadSafe = false;
    public static bool IsThreadSafe { get => _isThreadSafe; set => _isThreadSafe = value; }

    // _chars is a shared list of char arrays used to obtain temporary buffers.
    // This is used when _isThreadSafe = false for super fast performance.
    private static List<char[]> _chars = new List<char[]>(6); // initially empty, but capacity of 6 so the backing array likely won't have to be re-grown
    private static int _charsPointer = 0; // always points to next _chars index to obtain string from
    private static int MIN_POOL_REQUEST_SIZE = 512;
    private static int SINGLE_THREAD_BASE_SIZE = 4096;

    public static char[] Rent(int minsize) {

        if (_isThreadSafe) {

            if (minsize >= MIN_POOL_REQUEST_SIZE) {
                return ArrayPool<char>.Shared.Rent(minsize);
            } else {
                return new char[minsize];
            }

        } else {

            if (_charsPointer == _chars.Count) {
                // requesting more arrays than currently in the _chars list
                _chars.Add(new char[SINGLE_THREAD_BASE_SIZE]);
            }

            if (minsize > _chars[_charsPointer].Length) {
                // request is larger than current size; grow it.
                int bufferLength = _chars[_charsPointer].Length;

                while (minsize > bufferLength) {
                    bufferLength *= 2;
                }

                _chars[_charsPointer] = new char[bufferLength];
            }

            return _chars[_charsPointer++];
        }

    }

    public static void Return(char[] buffer) {

        if (_isThreadSafe) {

            if (buffer.Length >= MIN_POOL_REQUEST_SIZE) {
                ArrayPool<char>.Shared.Return(buffer);
            }

        } else {

            if (_charsPointer == 0) {
                throw new ApplicationException("ReturnBufferToPool(): no buffers to return.");
            } else {
                _charsPointer--;
            }

        }

    }

}

Now in our RemoveConsecutiveCharacters() method, the ArrayPool<char>.Shared.Rent() call would be replaced with CharArrayPool.Rent() and the ArrayPool<char>.Shared.Return() call would be replaced with CharArrayPool.Return(). In conjunction with the properly set CharArrayPool.IsThreadSafe property, you’ll always get the most efficiently allocated arrays.

We’ve now covered basic string manipulation concepts, as well as some fairly advanced techniques for minimizing allocations that are especially useful on hot paths. In my next post in this series, I’ll discuss some of the other tricks available to us in the latest .NET runtime, and review a useful open source library of routines that implement the techniques I’ve discussed.

{ 0 comments }

In our first installment, we discussed the importance of avoiding allocating new strings, which is devilishly easy to do when manipulating them. We also touched on the use of StringComparison types to avoid allocations via calls to ToUpper() or ToLower() when doing case-insensitive comparisons.

Strings are, deep in the .NET internals, not “normal” reference types and in some ways refuse to behave as such. They are their own special beast, due to their immutability. They are essentially a wrapper around a character array. In this fact lies a clue as to how we avoid code with allocating side effects like these:

string newString = oldString1 + oldString2 + oldString3;
string newString = oldString1.Substring(0, oldString1.IndexOf(' ') + 1) + oldString2;

You might look at these assignments and think that they are easily resolved via the use of StringBuilder. And you would be right. But is that the most efficient way? Not really, not for hot paths.

.NET Core and .NET 5 actually have some optimizations for concatenating two strings similar to what we’re about to discuss, and in the future, may have more optimizations for other concatenation scenarios. There also will doubtless be improvements in StringBuilder and String.Format() as well. We are out on the bleeding edge here with what we’re attempting, for a particular purpose: extensive string manipulation on hot paths. As with any performance optimization, you should test these techniques against conceptually simpler and arguably more readable code, and you should re-test when new major .NET releases occur annually. But for now, these techniques will wring the last bit of performance out of things for you, and they should be fast enough that there would be no particular advantage in ripping them out later, or even delegating to .NET internals really.

The techniques I’ll show you can be encapsulated in method libraries with easy and natural calling interfaces so that they don’t burden your mainline code with weird requirements or awkward calling conventions (beyond passing some string arguments by reference, which may “feel” weird but actually makes perfect sense in context).

So: let’s get to it.

In most cases, when we want to create a new string, we know, or can easily figure out, the total length of the string we’re trying to create. Since a string instance is “just” a wrapper around a character array containing the string’s contents, we can create a character array of the appropriate size and copy the contents of various (sub)strings into it at the appropriate places, then create a new string from that character array using the new string(characterArray) constructor.

The problem is that until .NET Core, we didn’t have very fast tools for doing this. Copying one character at a time from a source character array or string indexer to a target character array isn’t the fastest way to go about this.

This is solved for us in the latest .NET iterations with Span<T> and ReadOnlySpan<T>. For C programmers reading this, you can think of Spans as a mechanism to allow type-safe and bounds checked memcpy() calls that are every bit as fast as directly copying blocks of memory. For the rest of us, think of Spans as a lens through which you can access all or part of an array of any type for the purpose of safe, ultra-fast copying. A Span superimposes itself over the raw data in an array and allows you to examine and copy all or part of it easily.

Let’s start with a simple example routine to create a new string from components of an existing one, and show how we can use spans to move data around. Suppose we want to replace all instances of runs of a given character in a string, with a single instance of that character. You could use this, for example, to normalize a string to never have multiple spaces in it. A simple implementation to replace all character runs with single instances of a character might be:

public void RemoveConsecutiveCharacters(ref string s, char c)
{
    if (s == null || s.Length < 2)
        return;

    string consecutiveRun = new string(c, 2);

    if (s.IndexOf(consecutiveRun, StringComparison.Ordinal) > -1)
    {
        string charAsString = new string(c, 1);

        while (s.Length > 1 && s.IndexOf(consecutiveRun, StringComparison.Ordinal) > -1)
            s = s.Replace(consecutiveRun, charAsString);
    }

}

We can make this, on average, TWICE as fast, though, at the expense of lengthier code. But let’s assume it’s a frequently needed normalization call, and let’s recast this in a more efficient way.

First, let’s count the allocations in the above code. Creating the string consecutiveRun is an allocation. Creating charAsString is another. Each Replace() call is another. The biggest offender is inside the loop, creating a new string every time we encounter another run of characters. This is particularly inefficient if the string contains runs longer than 2. In the case, for example, where there’s a run of 3 characters, that will be changed to 2, then to 1.

The way we’ll approach this is to copy the string’s characters into a character array, then scan that array for character runs of any length, and use copy techniques to shift everything to the right of the character run to the left, overwriting all but one instance of the character in the run. This will leave “garbage” at the end of the array, but we’ll track the shrinking length of the real string data as we go.

At the end, we’ll do just one allocation — the new string — via a new string(char[],start,length) call where start is zero and length is the new, shorter length.

Here’s the entire code; we’ll break this down into steps in the ensuing commentary. In a future installment, we’ll tighten it down still further.

public static void RemoveConsecutiveCharacters(ref string s, char c) {

    if (s == null || s.Length < 2)
        return;

    int state = 0;
    int pos = -1;
    int runLength = 0;
    int newLength = s.Length;
    char[] newChars = new char[s.Length];
    s.AsSpan().CopyTo(newChars.AsSpan());
    int i = 0;
             
    while (i < newLength) {

        if (newChars[i] == c) {
 
            if (state == 0) {
                state = 1; // encountered start of a run of characters
                pos = i; // pointer to first character in run
                runLength = 1;
            } else {
                runLength++;
            }

        } else {

            if (state == 1) {

                if (runLength > 1) {
                    // exited run of characters; copy rest of string over top of all but one
                    int copyLength = newLength - pos - runLength + 1;
                    newLength -= runLength;
                    newLength++;
                    newChars.AsSpan(pos + runLength - 1, copyLength).CopyTo(newChars.AsSpan(pos, copyLength));
                    i -= (runLength - 1); // adjust pointer for next character to examine, as we've shortened the evolving new string value
                    newChars[newLength] = '\0'; // This is just for debugging / visualization ease, it can be commented out after testing
                }
 
                state = 0;
            }
 
         }
 
         i++;
     }
 
     if (newLength < s.Length) {

        if (state == 1 && runLength > 1) {
            // We're inside a run at the end of the string.
            newLength -= runLength;
            newLength++;
        }
 
        s = new string(newChars, 0, newLength);
     }
 
}

Let’s begin by looking at the creation of the char[] array to hold the string’s contents:

char[] newChars = new char[s.Length];
s.AsSpan().CopyTo(newChars.AsSpan());

Let’s break that down into equivalent steps that might be clearer to understand:

char[] newChars = new char[s.Length];
var sourceSpan = new ReadOnlySpan<char>(s);
var targetSpan = new Span<char>(newChars);
sourceSpan.CopyTo(targetSpan);

As the name suggests, a ReadOnlySpan is just a Span that we can’t write to. Since strings are read-only (remember, they’re immutable), and we just want to read from it anyway, we create a ReadOnlySpan. This is a good habit. We can simplify all this, though, using the AsSpan() extension method:

char[] newChars = new char[s.Length];
var sourceSpan = s.AsSpan();
var targetSpan = newChars.AsSpan();
sourceSpan.CopyTo(targetSpan);

String.AsSpan() automatically returns a ReadOnlySpan. Since the need for the source and target spans were ephemeral, we had no real need to store them in variables, so we just shortened those last three lines into one:

s.AsSpan().CopyTo(newChars.AsSpan());

This is saying, in compact form, “directly copy all the characters in s to newChars.” (Note that the character array allocation and copy could be accomplished via s.ToCharArray(), but there’s a reason we’re not doing that which will become clear in our next installment; besides this is an easily grasped example of using Spans).

You might say after looking at this, “but the Span over newChars ceases to exist after this call!” And you’d be right, but it would be no problem: the copying operation is done, and the result is already in the newChars array. Remember that a Span is just a type-safe, bounds-checked mechanism for accessing the memory occupied by some or all the items in an array; it’s only needed for copy operations.

To understand what’s going on in the rest of the method, let’s consider the following string:

"A string  that has some double  spaces here and there.  "

We need to collapse the double spaces between “string” and “that”, between “double” and “spaces”, and at the very end of the string. We’ll do that via this call, assuming variable myString holds the above string:

FastString.RemoveConsecutiveCharacters(ref myString, ' ');

The first run begins at position 8 in the string. The method now creates a Span over the portion of the newChars array beginning at position 9 to the end, and copies it to a Span over the the same array from position 8, for the same length. It then places a “null character” (zero value) just past the end of the “new” string data. Here’s a representation of the characters in the newChars array, initially, and just after each copy operation (“dead” characters represented by asterisks):

"A string  that has some double  spaces here and there.  "
"A string that has some double  spaces here and there.  *"
"A string that has some double spaces here and there.  **"
"A string that has some double spaces here and there. ***"

Note, per the code comments, that the setting of null characters is purely for easy visualization; you can change it to any value (such as asterisks, above) or simply comment it out for a release build.

Eventually, the main loop encounters the end of the actual string data, sparing us (if characters were actually removed) from examining the “junk” portion of the array any further, and the special case of a trailing run of spaces is handled by adjusting the newLength variable. The 56 character string has become a 53 character string with just one string allocation involved! That’s better than the five allocations (at least) that would happen under the first implementation shown above.

Testing under CLR 4.7.2 shows this to have a run time that is 39 to 62 percent of the naive, if straightforward, implementation we began with, and this holds pretty well regardless of whether we do a thousand or ten thousand iterations, detect one consecutive run of characters, or many. That first implementation would be adequate for occasional use, but clearly on hot paths with thousands of calls, the kind of savings represented by our Span-based implementation adds up rapidly. And there’s no difference in the actual call; all the extra effort (and testing) is hidden from the client, as it should be.

There’s one more allocation going on in this routine. It’s not a string allocation; it’s the allocation of the newChars char[] array. In our next installment, we’ll examine ways to eliminate even that!

{ 2 comments }

Text Processing in .NET: #1 – String Comparisons, Ephemeral Allocations, and Pass-By-Reference

December 22, 2020

In my experience, many developers write unintentionally inefficient code for text processing applications. Having presided over the design and maintenance of back end code responsible for processing huge daily volumes of text for much of my career, I have a few suggestions that are relatively easy conceptually and not costly to implement. I’ve seen the […]

Read the full article →

The Custom Software Development “Renaissance”

April 4, 2015

I ran across a twitter comment about custom software development undergoing some sort of resurgence, which linked to this rather pedestrian article on the topic which I think misses the point a bit.  I have no factual quibbles with the article as far as it goes.  But the tradeoffs in custom software today are not […]

Read the full article →

The IT Wall Of Denial

March 11, 2015

One of the character traits that separates okay developers from great developers (or for that matter, okay sysadmins from great ones) is the amount of ego they have mindlessly invested in Being Right.

Read the full article →

The Most Overlooked Skill in Software Development

September 25, 2013

No, it’s not Agile methodologies, Ruby on Rails skills, or Test Driven Development.

Read the full article →

On Not Knowing Until You Know

December 25, 2012

I’m reading Extreme Programming Refactored:  The Case Against XP from Apress Books.  I have to confess that although I generally regard XP as an off-the-rails over-reaction to the perceived shortcomings of traditional software design and specification methods, I have some affinity for the XP concept that “the code is the design” — in this limited sense:  in […]

Read the full article →

On Professionalism

December 25, 2012

In software development, there are — how to put it delicately? — prima donnas.  It’s always been that way, but this year I’ve had more than my share to deal with, and it’s moved me to reflect on the nature of professional conduct.  What, exactly, constitutes “professional conduct” in this field? Accusations of “unprofessionalism”, after […]

Read the full article →

Connecting to a Mirrored Sql Server Database With No Failover Specified

March 10, 2012

I was testing a new development tool today that connects to remote servers using a proprietary connection string builder that doesn’t allow you to specify a failover server.  Since I was connecting to a mirrored database I was curious what the implications were for not specifying a “Failover Partner” in the connection string. The answer […]

Read the full article →

On Solutions Looking For Problems

March 8, 2012

Microsoft has now made Windows 8 and its Metro touch interface available for us all to look at and Neil McAllister over at InfoWorld isn’t mincing words.  He’s calling the Metro value proposition for developers “a con”.  His fellow columnist J. Peter Bruzzese calls it Windows Frankenstein.  Meanwhile Visual Studio 11 is getting a lukewarm […]

Read the full article →