[wikka-community] Coding guideline about backslash.

Marjolein Katsma javawoman
Tue Oct 9 19:17:42 GMT 2007


OK... you had me confused. My only (weak) excuse is that I was quite tired 
when I wrote that but wanted to reply anyway. Let me try again.

The real issue here is that:
a) we're not talking about "Coding guidelines" (in the sense of how we 
prefer to write things) but *Grammar* (in the sense of "if you want this 
then you MUST do it this way"); and
b) the single quotes are actually a red herring

At 07:25 2007-10-09, you wrote:
>No Marjolein, I'm sorry.
>  echo '\\n';
>will output \n, a single backslash and the letter n.
>echo '\\\\\\d$' will give \\\d$, that's really why I 
>chose that example.
>It will lead you to confusion if you think backslashes need not be escaped 
>inside single-quote-delimited strings.
>Inside a string delimited by a single quote, there are 2 characters that 
>can (and need) be escaped with a backslash: a single quote or a backslash.

OK, so let's start with strings.
We can have single-quoted strings (literals) and double-quoted strings 
(interpolated). What they have in common is this: if you want to *embed* a 
quote of the same type that surrounds the string, you MUST either escape 
that quote character or concatenate. The escape character for this is a 
backslash.

- if we have 'single quotes' and want to replace the space with a single 
quote, there are two ways to do it:
         - 'single\'quotes'
         - 'single'."'".'quotes' (with the single quote in double quotes)
- and if we have "double quotes" and want to replace the space with a 
double quote, the mechanism is exactly the same:
         - "double\"quotes"
         - "double".'"'."quotes" (with the double quote in single quotes)
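
To make the two mechanisms concrete, here is a minimal sketch, assuming a 
plain PHP script run from the command line (the variable names are only 
illustrative):

```php
<?php
// Embedding a single quote in a single-quoted string.
$escapedSingle = 'single\'quotes';            // escaped with a backslash
$concatSingle  = 'single' . "'" . 'quotes';   // concatenated, quote in double quotes

// Embedding a double quote in a double-quoted string.
$escapedDouble = "double\"quotes";            // escaped with a backslash
$concatDouble  = "double" . '"' . "quotes";   // concatenated, quote in single quotes

var_dump($escapedSingle === $concatSingle);   // bool(true)
var_dump($escapedDouble === $concatDouble);   // bool(true)
```

Both forms of each pair produce the same string; which one reads better 
depends on how many quotes need embedding.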

Now, since the backslash functions as an escape character, it follows that 
to use it as a _real_ character it, *itself*, must be escaped. And it 
becomes obvious from the examples above that this will be no different for 
single-quoted or double-quoted strings either since both use the backslash 
as an escape character in exactly the same way. So to access files on my 
laptop I might write Alan\\Development\Server\ - and to write this as a 
string in PHP each of those backslashes must be escaped, regardless of 
whether we use single or double quotes:
         - 'Alan\\\\Development\\Server\\'
         - "Alan\\\\Development\\Server\\"
(And of course, while both are exactly equivalent, our Coding guidelines 
say that since there is nothing to interpolate here, using a string literal 
(single quotes) is preferred.)
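
A quick way to check this, sketched under the assumption of a plain PHP 
command-line script (the variable names are mine, purely for illustration):

```php
<?php
// The same path, once as a literal and once as an interpolated string.
$literal      = 'Alan\\\\Development\\Server\\';
$interpolated = "Alan\\\\Development\\Server\\";

// Both notations yield the identical string.
var_dump($literal === $interpolated);      // bool(true)

// Eight backslashes typed, four backslashes in the resulting string.
var_dump(substr_count($literal, '\\'));    // int(4)
```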

>The reason is simple: how could you use single quote to delimit a string 
>that ends with a singlequote, and how would you do for a string
>that ends with a backslash?
>For the first one, you write '...\'' (the single quote escaped by 
>backslash represents a single single-quote, and the final single-quote 
>delimits the string)
>and for the second one: '...\\'
>(The backslash needs to be escaped, otherwise, the parser will consider \' 
>as an escaped single-quote, and expects another single-quote to delimit 
>the string.)


The rule is much more general actually, and is the same in many languages: 
if a character is defined as an escape character, then in order to use that 
character *as a normal character* it must itself be escaped. (Actual escape 
mechanisms do differ, in some languages one uses a single escape character 
for everything that needs to be escaped, in others one simply doubles a 
character to escape it.)

>Convinced with this explanation, and as you have mentioned, things get 
>more complicated with regular expressions.
>If you want a regex to test a string starting with the 2 characters \ and 
>n, you really need to represent it with 4 backslashes like
>'/^\\\\n/', __even if you use singlequote__
>. Other representations will fail.
>Ex: '/^\\n' - php will pass the string ^\n to the regexp, and regex will 
>consider it as a pattern that match something starting with a LF.

I think that should be "Ex: '/^\\n/' - php will pass the string ^\n to the 
regexp." But understood.
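
The two layers of escaping can be seen in a small sketch like the following, 
assuming PHP's preg_match() and a made-up subject string:

```php
<?php
// Subject: a backslash, the letter n, then " rest".
$subject = '\\n' . ' rest';

// Four backslashes in the PHP literal: PHP hands /^\\n/ to PCRE,
// which reads \\ as one literal backslash followed by the letter n.
var_dump(preg_match('/^\\\\n/', $subject));   // int(1)

// Two backslashes: PHP hands /^\n/ to PCRE, which reads \n as a line feed.
var_dump(preg_match('/^\\n/', $subject));     // int(0)
var_dump(preg_match('/^\\n/', "\nrest"));     // int(1)
```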

So far so good - but all of the above is nothing but pure syntax: 
grammatical rules.

>preg_quote() is a handy function, but some string representations like 
>\s, \d should not be passed to it (when you intend their use as regexp 
>special characters, in this case a whitespace character or a digit).

And now we get back to Coding guidelines. Because needing those *four* 
backslashes above just to get a pattern that ultimately evaluates to a 
single backslash is of course quite ugly. Regexes are hard enough to read 
without double-doubling backslashes.

So, there are cases where we can avoid that and keep our regex a bit more 
readable:
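1. If you want a regex to test a string starting with the 2 characters \ 
and n, you don't need to write '/^\\\\n/' - instead, you could write 
'/^\x5Cn/' : using a hex escape, which PCRE itself resolves, to represent 
the single backslash. (A character class such as [\] will not do here, 
because PCRE treats the backslash as an escape character inside a class as 
well.)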

2. But of course it isn't always that simple; still, in order to avoid a 
proliferation of doubled or double-doubled backslashes it's much better to 
use preg_quote() which will do the necessary quoting for you.

But you are right that you need to keep the (essentially already-escaped) 
"special characters" that start with a backslash out of this, or they 
themselves will end up being double-escaped, with \\s then interpreted as a 
backslash followed by an s, rather than \s which matches any whitespace 
character.

Still, all this double-escaping should be avoided as much as possible, 
using the mechanisms that regexes themselves (such as character classes and 
hex escapes) or the PCRE library (such as preg_quote()) provide.
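
As a rough sketch of how that division of labour might look, assuming a 
made-up Windows-style path as the literal part and using only preg_quote() 
and preg_match():

```php
<?php
// A literal fragment that contains PCRE special characters.
$literalPart = 'C:\\temp\\file.txt';              // C:\temp\file.txt

// preg_quote() escapes the backslashes and the dot for PCRE;
// the second argument makes it escape our delimiter as well.
$quoted = preg_quote($literalPart, '/');

// \s and \d are meant as regex classes, so they stay outside preg_quote().
$pattern = '/^' . $quoted . '\s+\d+$/';
var_dump(preg_match($pattern, 'C:\\temp\\file.txt   42'));   // int(1)

// Quoting everything turns \s into a literal backslash followed by "s".
$wrong = '/^' . preg_quote($literalPart . '\s+\d+', '/') . '$/';
var_dump(preg_match($wrong, 'C:\\temp\\file.txt   42'));     // int(0)
```

The literal part is written once, readably, and only the PCRE classes are 
spelled out by hand.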


Better? :)


> > Date: Mon, 8 Oct 2007 17:31:41 +0200
> > To: community at wikkawiki.org
> > From: javawoman at wikkawiki.org
> > Subject: Re: [wikka-community] Coding guideline about backslash.
> >
> > At 14:09 2007-10-08, Mahefa wrote:
> > >Which coding guideline about backslash?
> > >
> > >My coding style for writing a backslash inside a string, whether
> > >double quote or single quote is used as delimiter; is to expressly
> > >escape it with another backslash.
> > >
> > >These 2 strings are the same to write a string composed of 2
> > >characters: a backslash and the letter n.
> > >'\\n' and '\n'
> >
> > Actually they are NOT the same: within single quotes, every character is
> > just a literal character, so there is nothing to "escape". So '\\n' reads
> > "two backslashes and the letter n" and '\n' reads "a backslash and the
> > letter n".
> >
> > It's not a matter of a rule for backslashes, but a rule for using single or
> > double quotes:
> > - use single quotes for LITERALS (every character stands for itself)
> > - use double quotes only for strings that need to be INTERPOLATED
> >
> > It's only in "interpolated" strings that you may need an escape character
> > to make a "special" character stand for itself instead of something to be
> > interpolated.
> >
> >
> > >I prefer the first one, and the reasons are:
> > >
> > >1) When in the future, someone changes my singlequote in doublequote,
> > >errors due to this change are minimized.
> >
> > When someone changes the single quotes to double quotes they must have a
> > reason for that - and that brings with it the responsibility to consider
> > whether any character may need to be escaped.
> >
> > >2) clarity: I don't need to think if the character that follow the
> > >backslash has a special meaning when eventually combined with it. I
> > >just have to count the number of backslashes and divide them by 2.
> >
> > Yes, you DO need to think (not to divide by two but to escape special
> > characters), because you need to make a reasoned decision to use double
> > quotes in the first place, instead of the generally preferred single
> > quotes. If you just need one or two interpolated characters, it's better to
> > concatenate them with the rest of the string still in single quotes.
> >
> > So if you start with:
> > echo 'This is a very looong string to be written to the output.';
> > and want to add a newline to that, the solution is NOT
> > echo "This is a very looong string to be written to the output.\n";
> > but instead:
> > echo 'This is a very looong string to be written to the output.'."\n";
> >
> >
> > >Consider you want to write a constant \\\d$, using a single
> > >quote as delimiter.
> > >
> > >If you use '\\\d$', the string will be : \\d$
> >
> > No it won't - it will be \\\d$ because every character stands
> > for itself in single quotes: it's a LITERAL.
> >
> > >You can easily get in trouble if you used to consider that escaping
> > >backslash is not needed within single-quote-delimited strings.
> >
> > That's because escaping a backslash ISN'T needed for literals. The idea is
> > even meaningless, because there is nothing to escape. So if you "used to
> > consider" that, please go right on doing so, because it's true.
> >
> > >To write the string correctly, you must do one of the 4 proposals below.
> > >'\\\\d$' or '\\\\\d$' or '\\\\\d$' or
> > >'\\\\\\d$'
> > >
> > >For me, having in mind that I always escape my backslashes, the 4th
> > >proposal is what I can read and understand more easily.
> >
> > Actually, I find that hard to read - if it would be in double quotes -
> > unless you really intend it to be "four backslashes, an IP address, two
> > backslashes, the letter d and a dollar sign".
> >
> >
> > That said, there is only one special case to consider and that is using
> > such strings as (building blocks for) regular expressions - because regular
> > expressions have their own layer of interpolating and escaping: the PHP
> > engine is handing off the string to the PCRE engine which *itself*
> > interprets the regex string it is handed (essentially only a character
> > class has "literal" values (except for the dash which will be a literal if
> > put at the end), everything else is interpolated).
> >
> > In that case the simple solution is to still write the string as you mean
> > to - as a literal whenever possible -, keeping it readable for humans, and
> > use preg_quote() to let the PCRE library do its own escaping.
> >
> >
> > To summarize:
> > - use single quotes for literals, as much as possible
> > - use double quotes ONLY when you need strings to be interpolated (by PHP),
> > and whenever feasible concatenate only those bits with single-quoted literals
> >
> > That much is pure PHP.
> >
> > For regular expressions you need to consider how those strings are going to
> > be used:
> > - when using regular expressions, use PCRE (preg_*) in PHP: that keeps the
> > syntax consistent, expressions usable as building blocks, and our RE
> > library (in the making) better maintainable
> > - when *creating* regular expressions, use the above rules for strings
> > - when the *resulting* string (after PHP has done any interpolation
> > already!) /might/ contain any character that is a "special character" in
> > PCRE, use preg_quote() to escape it before passing it to the PCRE engine
> > with one of the (other) preg_* functions. See http://php.net/preg-quote.php .
> >
> > I don't think the last bits are on our coding guidelines page yet - I'll
> > add that soon.
> >
> > >--
> > >Mahefa Randimbisoa (aka DotMG)
> > >
> > >_______________________________________________
> > >WikkaWiki Community mailing list
> > >community at wikkawiki.org
> > >http://mail.wikkawiki.org/mailman/listinfo/community_wikkawiki.org
> >
> >
> > --
> > JavaWoman
> > Web Standards Compliance Officer, Wikka Development Crew
> > http://wikkawiki.org/JavaWoman
> > Skype: callto://goneagain
> >
> >

Web Standards Compliance Officer, Wikka Development Crew
Skype: callto://goneagain
