Yesterday, an external API call that I was making failed because one of the values that I was posting contained a trailing "Zero width space" character (\u200b). The value in question was being passed through ColdFusion's native trim() function, which was clearly not removing this whitespace character. As such, it occurred to me that I didn't really know which characters are (and are not) handled by the trim() function. And so, I wanted to run a test.
One of the things that I love about Lucee CFML is that all of the source code is posted right there on GitHub. So, if I want to know how something is working under the hood, I can just go look at it. When we look at Lucee's implementation of the trim() function, we can see that it is handing control off to Java's String.trim() method. And, Java's String.trim() removes all leading and trailing characters from \u0000 up to (and including) \u0020 (the space character).
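To make that rule concrete, here is a rough CFML sketch of the same idea. This is not Lucee's or Java's actual source code; trimLikeJava() is a hypothetical name that I'm using purely for illustration, and it assumes that "at or below \u0020" is the only test being applied at each end of the string.

```cfml
<cfscript>

	/**
	* Hypothetical helper (illustration only): I strip leading and trailing
	* characters whose code point is at or below U+0020 (decimal 32).
	*/
	public string function trimLikeJava( required string input ) {

		var head = 1;
		var tail = len( input );

		// Walk forward past leading characters that are at or below the space.
		while ( ( head <= tail ) && ( asc( mid( input, head, 1 ) ) <= 32 ) ) {
			head++;
		}

		// Walk backward past trailing characters that are at or below the space.
		while ( ( tail >= head ) && ( asc( mid( input, tail, 1 ) ) <= 32 ) ) {
			tail--;
		}

		// If the indices crossed, every character was trimmable.
		return( ( head > tail ) ? "" : mid( input, head, ( tail - head + 1 ) ) );

	}

	// The leading Tab (9) is removed; the trailing No-Break Space (160) is not.
	writeOutput( len( trimLikeJava( chr( 9 ) & "abc" & chr( 160 ) ) ) ); // 4

</cfscript>
```

Notice that nothing in this logic asks whether a character is "whitespace" in the Unicode sense. It only compares code points, which is why anything above \u0020 is left alone.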
Of course, since Adobe ColdFusion's code is closed-source, we can't know what it is doing. We can only test it. And, to do this, I'm collecting all of the "standard" whitespace characters and the non-standard whitespace characters (that I identified in my text-normalization component) and I'm looping over them to see if they survive a call to trim():
<cfscript>

	testCharacters = [
		// Standard "whitespace" characters.
		hexToChar( "0009" ), // Tab.
		hexToChar( "000a" ), // Line Break.
		hexToChar( "000d" ), // Carriage Return.
		hexToChar( "0020" ), // Space.
		// Non-standard "whitespace" characters.
		hexToChar( "00a0" ), // No-Break Space.
		hexToChar( "2000" ), // En Quad (space that is one en wide).
		hexToChar( "2001" ), // Em Quad (space that is one em wide).
		hexToChar( "2002" ), // En Space.
		hexToChar( "2003" ), // Em Space.
		hexToChar( "2004" ), // Thick Space.
		hexToChar( "2005" ), // Mid Space.
		hexToChar( "2006" ), // Six-Per-Em Space.
		hexToChar( "2007" ), // Figure Space.
		hexToChar( "2008" ), // Punctuation Space.
		hexToChar( "2009" ), // Thin Space.
		hexToChar( "200a" ), // Hair Space.
		hexToChar( "200b" ), // Zero Width Space.
		hexToChar( "2028" ), // Line Separator.
		hexToChar( "2029" ), // Paragraph Separator.
		hexToChar( "202f" ), // Narrow No-Break Space.
		hexToChar( "feff" ) // Zero Width No-Break Space.
	];

	// For each test whitespace character, let's see if it survives a trim() call.
	for ( c in testCharacters ) {

		writeOutput( len( trim( c ) ) );
		writeOutput( " , " );

	}

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I convert the given hex-encoded code point to its corresponding character.
	*/
	public string function hexToChar( required string hexEncoded ) {

		return( chr( inputBaseN( hexEncoded, 16 ) ) );

	}

</cfscript>
As you can see, I start with our four most common control-characters and spaces; and then, I follow with a variety of other uncommon whitespace characters. When we run this code in either Lucee CFML or Adobe ColdFusion, we get the same output:
0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
As you can see, the first 4 test characters (Tab, Line-Break, Carriage Return, Space) were all removed by the trim() function, which matches what Java's String.trim() method is documented to do. And, all of the other uncommon whitespace characters remain. As such, I think it would be fair to assume that Adobe ColdFusion's trim() function is likely also handing control off to Java's String.trim() implementation. Which means that both CFML engines appear to remove only leading and trailing characters from \u0000 up to (and including) \u0020 in their trim() function implementations.
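If you need these other characters removed as well, one possible approach is to lean on Java's regular expression engine, since a CFML string is a Java string under the hood. What follows is a minimal sketch and not something built into either engine: trimUnicode() is a hypothetical name, and the character class simply mirrors the list of characters tested above (plus everything that trim() already handles).

```cfml
<cfscript>

	/**
	* Hypothetical helper (not part of Lucee CFML or Adobe ColdFusion): I strip
	* leading and trailing characters from \u0000 to \u0020 as well as the
	* non-standard whitespace characters from the test above.
	*/
	public string function trimUnicode( required string input ) {

		// Java regex character class. CFML does not treat the backslash as an
		// escape character, so the \x and \u sequences reach Java untouched.
		var spaces = "[\x00-\x20\u00a0\u2000-\u200b\u2028\u2029\u202f\ufeff]+";

		// Match a run of these characters at the very start (^) or the very
		// end (\z) of the input.
		var pattern = ( "^" & spaces & "|" & spaces & "\z" );

		return( javaCast( "string", input ).replaceAll( pattern, "" ) );

	}

	// A value with a trailing Zero Width Space, like the one that broke my API call.
	value = ( "hello" & chr( inputBaseN( "200b", 16 ) ) );

	writeOutput( len( trim( value ) ) ); // 6
	writeOutput( " , " );
	writeOutput( len( trimUnicode( value ) ) ); // 5

</cfscript>
```

I'm listing the characters explicitly here so that the behavior stays predictable; whether something like \ufeff should be treated as whitespace at all is a judgment call that will depend on your data.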