Here’s a micro-pattern that shows up from time to time in string processing: a StringBuilder is created, and then in a loop a string is built with repeated Append() calls (or with the much under-used AppendFormat() method). The string is then pulled out and consumed by something else, and the StringBuilder is emptied and re-used on the next iteration.
For some reason there is no Clear() method on a StringBuilder. I always thought this odd. I’ve generally used:
someStringBuilder.Remove(0,someStringBuilder.Length);
Today, trying to tighten something time-critical, I ran across:
string theString = someStringBuilder.ToString();
someStringBuilder.Replace(theString, "");
I was about to replace this with my usual Remove() call, figuring that eliminating the string matching would be a performance win, when on a sudden impulse I tried this:
someStringBuilder.Length = 0;
Lo and behold, the Length property is not read-only. No wonder there is no Clear() method; one isn’t really needed, except maybe to make how to do this more obvious.
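As a minimal sketch of the reuse pattern described above: one StringBuilder is created outside the loop, filled with Append() and AppendFormat(), read out with ToString(), and emptied by setting Length to zero. The class name ReuseDemo, the method FormatPairs, and the "key = value" output format are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ReuseDemo
{
    // Hypothetical helper: turns each (key, value) pair into one string,
    // reusing a single StringBuilder for every iteration.
    public static List<string> FormatPairs(IEnumerable<KeyValuePair<string, int>> pairs)
    {
        var results = new List<string>();
        var sb = new StringBuilder();

        foreach (var pair in pairs)
        {
            sb.Append(pair.Key);
            sb.AppendFormat(" = {0}", pair.Value);

            // Pull the finished string out and hand it to the consumer.
            results.Add(sb.ToString());

            // Empty the builder for the next iteration.
            sb.Length = 0;
        }

        return results;
    }
}
```

Whether this is measurably faster than the Remove() or Replace() variants depends on the workload, so it is worth timing in the actual loop.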
Nice tip(s).
Did you check the memory after a garbage collection to see if it really saves anything?
Bob responds: No, because the point isn’t to save memory, it’s to clear a string builder for re-use in a way that is simpler. If you need to clear it, you need to clear it. In addition, you beg the question, “saves anything in comparison to what?” Saves time vs newing up a new StringBuilder? Probably. Saves time vs the other methods I mentioned? Very likely. Enough of either to matter? Depends on the particular situation and the mix of factors such as how much other work is being done within the loop, the size of the strings being manipulated, and so on. But I subscribe to the principle that the drop-dead simplest way of doing any one thing tends to be the very best way of doing that thing, even if the only obvious benefit is elegant, self-evident code.
If you mean, does setting the length to zero clear the buffer and free any memory … I doubt it. A StringBuilder has a default initial capacity of 16 characters (you can override the default with one of the overloaded constructors). If the string exceeds the capacity, the internal buffer is doubled in size until it’s big enough. I’m sure that setting the length to zero simply makes the buffer re-usable, and doesn’t shrink it. However, in most cases, freeing up memory would not be a net win because as the new string is built, the buffer would have to be re-expanded, which is a fairly expensive operation (and for all I know, tends to fragment memory).
FYI, .NET 3.5 framework introduces StringBuilder.Clear()
Ahmed … where? I don’t see .Clear() coming up in a newly created solution in VS2008 using 3.5. What assembly/version are you loading that includes that interface for StringBuilder?
Thanks for the tip! I too have always found it odd that there’s no Clear() method, and even though setting the Length property to zero accomplishes the same thing, I’d have still included a Clear() method for clarity/consistency. :)
Thanks again!
You should check garbage collection, because the buffer starts at a small size and gets bigger as you append to it. If you clear it and don’t check the size it uses, this could be a performance hit on small machines like CE and others.
M$ says in an article on MSDN:
Notes to Implementers:
The default capacity for this implementation is 16, and the default maximum capacity is Int32.MaxValue.
http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx
Bob responds: In general, you will clear the buffer in order to re-use the StringBuilder, likely to construct another string of similar size. Nothing is to be gained by shrinking and re-growing the capacity (re-expansion is expensive too) unless you won’t be using the StringBuilder at all, in which case the entire StringBuilder should go out of scope anyway. In addition, garbage collection is non-deterministic and you have only rough control over it. If you must obsess over managing a system that is carefully designed and tuned to manage itself just fine in the vast majority of real-world scenarios, your concern is likely misplaced. You should be more concerned about the design of your object graph. Finally keep in mind that you can very often set the initial capacity when the StringBuilder is constructed. You can set the capacity for the expected use in such a way that the buffer will auto-expand very little if at all during use, thus minimizing memory fragmentation.
Just did a little experiment, and:
someStringBuilder.Length = 0;
does not reduce the capacity as expected.
If you are using .NET 2.0, follow your ‘Length = 0’ with Capacity = 0,
like this:
someStringBuilder.Length = 0;
someStringBuilder.Capacity = 0;
Together Everyone Achieves More
Bob responds: Yes, length and capacity are two separate issues. The default initial capacity is 16, the default initial length is zero. The string grows until it reaches the capacity and triggers an expansion. So if your objective is to reduce the buffer size you have to set capacity separately. However I would never set the capacity to zero. That guarantees that on future Append() calls you will have at least one auto-expansion of the buffer — an expensive and potentially memory-fragmenting operation. Zero should not be a legal value for Capacity. Set Capacity to at least 1 or the minimum length of the string you expect to create, whichever is larger.
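To make the distinction concrete, here is a small sketch that reports Length and Capacity side by side at each step. The class name LengthVsCapacity, the helper Describe, and the particular capacities (64 and 32) are hypothetical. No output is shown because the exact capacity values reported can differ between framework versions.

```csharp
using System;
using System.Text;

public static class LengthVsCapacity
{
    // Hypothetical helper that reports both properties together.
    public static string Describe(StringBuilder sb)
    {
        return string.Format("Length={0}, Capacity={1}", sb.Length, sb.Capacity);
    }

    public static void Demo()
    {
        // Explicit initial capacity instead of the default.
        var sb = new StringBuilder(64);
        sb.Append("hello, world");
        Console.WriteLine(Describe(sb));

        // Empties the contents; this is the "clear" step.
        sb.Length = 0;
        Console.WriteLine(Describe(sb));

        // Resizing the buffer is a separate, explicit step.
        // Capacity may not be set below the current Length.
        sb.Capacity = 32;
        Console.WriteLine(Describe(sb));
    }
}
```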
One last thing:
Just because it’s small doesn’t mean it doesn’t matter (or at least that’s what my gf tells me).
If you save 16 bytes or 2MB, it may seem small until you multiply it by the number of users.
Consider the user count of say Mysp@ce or M$ or the NY Stock Exchange.
Bob responds: There is a balance between thrashing and a stable state to consider. I am simply saying that with a single StringBuilder, if you expect to construct strings of say 50 to 100 characters, with the majority toward the top end, set an initial capacity of 100 and leave it alone. Destroy the StringBuilder when you are done with it. This is better and simpler in almost all real world cases than tamping the capacity back down, unless actual testing reveals a clear advantage, which I predict it won’t.
Lovely tip thanks
Nice tips!
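To expand on this even further: in .NET 3.5 (I’m not sure in which version this was introduced) you can create an extension method like this:
using System.Text;
public static class StringBuilderExtensions
{
    public static void Clear(this StringBuilder stringBuilder)
    {
        stringBuilder.Length = 0;    // discard the current contents
        stringBuilder.Capacity = 16; // shrink the buffer back to the default capacity
    }
}
Now, when you create your StringBuilder, you can call it like any other method. For example: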
StringBuilder sb = new StringBuilder();
sb.Clear();
Bob responds:
Thanks, excellent point. Depending on the reuse pattern, resetting Capacity back to the default of 16 characters may or may not be desirable, because re-expanding capacity is expensive. It may make sense to leave it alone, or set it to a larger value than 16. So my impulse would be to default to leaving it be, and taking an optional argument to set it to a specific value.
To this end — 3.5 introduces a Clear() method anyway, and while it’s not documented what, if anything it does to the Capacity value, if it leaves it alone, creating an extension method that sets it to a specific value might be a useful enhancement.
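A minimal sketch of that idea follows: an extension method that always empties the builder and touches Capacity only when the caller passes a value. The names StringBuilderResetExtensions and Reset are hypothetical. The method is called Reset rather than Clear so it cannot be confused with a built-in Clear() on frameworks that have one. Optional parameters assume a C# 4 or later compiler.

```csharp
using System;
using System.Text;

public static class StringBuilderResetExtensions
{
    // Hypothetical extension method: empties the builder, and resizes the
    // buffer only when an explicit capacity is supplied.
    public static StringBuilder Reset(this StringBuilder sb, int? capacity = null)
    {
        if (sb == null) throw new ArgumentNullException("sb");

        sb.Length = 0;

        if (capacity.HasValue)
        {
            // Safe to shrink here because Length is already zero.
            sb.Capacity = capacity.Value;
        }

        return sb; // returned so that calls can be chained
    }
}
```

Typical use would be sb.Reset() inside a loop to leave the buffer alone, or sb.Reset(100) to pin it to an expected size.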
Thanks for the post. Is this not okay?
StringBuilder sb = new StringBuilder();
sb.Append("hello");
sb = new StringBuilder();
Bob responds:
Presumably newing up another instance is more expensive than simply clearing the buffer of an existing one. Also, creating new instances and abandoning old ones probably means more memory fragmentation and would tend to run the garbage collector more. Can’t say that I’ve tested it, though, and one’s mileage would vary depending on usage patterns. Always remember, most of this discussion ends up not mattering all that much in terms of performance unless you’re in a tight loop with many iterations, and then, of course, it’s best to verify your assumptions with testing.
Hi.
It doesn’t work on NetBeans. Length is not accessible.
This is my info:
Product Version: NetBeans IDE 6.8 (Build 200912041610)
Java: 1.6.0_20; Java HotSpot(TM) Client VM 16.3-b01
System: Windows XP version 5.2 running on x86; Cp1252; es_VE (nb)
Userdir: C:\Documents and Settings\Renzo\.netbeans\6.8
Any idea why it doesn’t work?
Bob responds:
We’re not talking Java here, this applies to the .NET framework.
I’m with Bob on not resetting capacity in normal day-to-day use.
One potential exception may be sensitive information you don’t want floating around in memory, but there you would probably use a dedicated StringBuilder that’s wiped the moment you are done.
On the many-users case (William), you would have to test the impact of having a lot of string builders holding on to resources vs. many clean-up actions using resources.
If you usually need about 100 bytes with an occasional need for 2 MB, you could add a
if size > {usual size} then reset
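The check above could be sketched as follows, assuming the builder is being emptied for reuse at the same moment. The names BoundedReuse, ClearAndTrim, and usualCapacity are hypothetical, and the right threshold depends on the workload.

```csharp
using System;
using System.Text;

public static class BoundedReuse
{
    // Hypothetical helper: empties the builder and shrinks the buffer only
    // when an unusually large string has inflated it past the usual size.
    public static void ClearAndTrim(StringBuilder sb, int usualCapacity)
    {
        if (sb == null) throw new ArgumentNullException("sb");

        // Length must be reduced first: Capacity cannot be set below Length.
        sb.Length = 0;

        if (sb.Capacity > usualCapacity)
        {
            sb.Capacity = usualCapacity;
        }
    }
}
```

In the common case the buffer is left untouched, so the cost of re-expansion is paid only after the occasional oversized string.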
A StringBuilder.Clear() was added in .NET 4.0 (not 3.5, as suggested in an earlier comment).
@Renzo:
Funny. I also needed this for Java, and this solution actually helps.
It’s just that Java doesn’t have properties, so you need to call
myStringBuilder.setLength(0);
Nicholas Hagen indicates in “Java: String Concatenation” some time minimization considerations that are pertinent here. I can see that on some platforms where memory is scarce, memory utilization might directly relate to time efficiencies. But where memory is plentiful, the time spent building objects dominates efficiency concerns. So Bob’s assertion that sizing properly and reusing objects will result in faster code is consistent with Nicholas Hagen’s findings. Thanks Bob.
If you take a look at the implementation of StringBuilder.Clear() in .NET 4.0, you will see:
public StringBuilder Clear()
{
    this.Length = 0;
    return this;
}
I think this approach is appropriate for earlier frameworks.
After sb.Clear the capacity is returned to 16, but you don’t seem to get the memory back until a garbage collection is triggered. Quite likely here that .Clear simply chucks away the old object as garbage and effectively gives you a new object.
I don’t think you can actually get .net managed memory back without a garbage collection. For one thing, this is to avoid memory fragmentation via managed heaps.
Thanks for the suggestion! This definitely did what I needed it to do for the project I’m working on.
What if I wanted to “trim” lines (not characters) from the end of StringBuilder?
My log has 5000 lines. I want to keep the bottom (newest) 1000 lines.
(I’m using VB.NET.)
Bob responds: A StringBuilder deals with all the data as a single string. You need to deal with the individual lines, and the only way to do that is to count the lines and delete all but the last 1K. Probably load the individual lines into a List(Of String) to do that kind of thing, then write them back to a file or append them to a StringBuilder or whatever the work context requires. Maybe it’d be easier to store your log entries in a database table and trim them using SQL. Or delegate the work to some external utility like the UNIX tail utility. Lots of ways to skin the cat but StringBuilder is the wrong tool.
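A rough sketch of the list-based approach, written in C# like the rest of the code on this page (the same calls are available from VB.NET). The names LogTrimmer and KeepLastLines are hypothetical. It assumes the whole log fits in memory as a single string, and it writes the result back with "\n" line breaks.

```csharp
using System;
using System.Collections.Generic;

public static class LogTrimmer
{
    // Hypothetical helper: returns only the last maxLines lines of the log text.
    public static string KeepLastLines(string log, int maxLines)
    {
        if (log == null) throw new ArgumentNullException("log");
        if (maxLines < 0) throw new ArgumentOutOfRangeException("maxLines");

        // Split on Windows and Unix line breaks; "\r\n" is listed first so
        // that it is matched as a single separator.
        var lines = new List<string>(
            log.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None));

        // A trailing line break produces one empty final entry; drop it so
        // that it is not counted as a line.
        if (lines.Count > 0 && lines[lines.Count - 1].Length == 0)
        {
            lines.RemoveAt(lines.Count - 1);
        }

        // Delete everything except the newest maxLines entries.
        int excess = lines.Count - maxLines;
        if (excess > 0)
        {
            lines.RemoveRange(0, excess);
        }

        return string.Join("\n", lines.ToArray());
    }
}
```

The trimmed text can then be written back to a file or appended to a fresh StringBuilder, whichever the work context requires.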