[wikka-community] Coding guideline about backslash.
Tue Oct 9 19:17:42 GMT 2007
OK... you had me confused. My only (weak) excuse is that I was quite tired
when I wrote that but wanted to reply anyway. Let me try again.
The real issue here is that:
a) we're not talking about "Coding guidelines" (in the sense of how we
prefer to write things) but *Grammar* (in the sense of "if you want this
then you MUST do it this way"); and
b) the single quotes are actually a red herring
At 07:25 2007-10-09, you wrote:
>No Marjolein, I'm sorry.
> echo '\\n';
>will output \n, a single backslash and the letter n.
>echo '\\\\192.168.1.2\\d$' will give \\192.168.1.2\d$, that's really why I
>chose that example.
>It will lead you to confusion if you think backslashes need not be escaped
>inside single-quote-delimited strings.
>Inside a string delimited by a single quote, there are 2 characters that
>can (and need to) be escaped with a backslash: a single quote or a backslash.
OK, so let's start with strings.
We can have single-quoted strings (literals) and double-quoted strings
(interpolated). What they have in common is this: if you want to *embed* a
quote of the same type that surrounds the string, you MUST either escape
that quote character or concatenate. The escape character for this is a
backslash:
- if we have 'single quotes' and want to replace the space with a single
quote, there are two ways to do it:
- 'single\'quotes' (with the single quote escaped by a backslash)
- 'single'."'".'quotes' (with the single quote in double quotes)
- and if we have "double quotes" and want to replace the space with a
double quote, the mechanism is exactly the same:
- "double\"quotes" (with the double quote escaped by a backslash)
- "double".'"'."quotes" (with the double quote in single quotes)
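To make the two mechanisms concrete, here is a minimal sketch; the variable names are mine and purely illustrative. It only checks that escaping and concatenating give the same string in each case.

```php
<?php
// Embedding a quote of the same type that delimits the string:
// either escape it with a backslash, or concatenate.
$single_escaped = 'single\'quotes';
$single_concat  = 'single' . "'" . 'quotes';

$double_escaped = "double\"quotes";
$double_concat  = "double" . '"' . "quotes";

var_dump($single_escaped === $single_concat); // bool(true)
var_dump($double_escaped === $double_concat); // bool(true)
echo $single_escaped, "\n"; // single'quotes
echo $double_escaped, "\n"; // double"quotes
```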
Now, since the backslash functions as an escape character, it follows that
to use it as a _real_ character it, *itself*, must be escaped. And it
becomes obvious from the examples above that this will be no different for
single-quoted or double-quoted strings either since both use the backslash
as an escape character in exactly the same way. So to access files on my
laptop I might write Alan\\Development\Server\ - and to write this as a
string in PHP each of those backslashes must be escaped, regardless of
whether we use single or double quotes:
(And of course, while both are exactly equivalent, our Coding guidelines
say that since there is nothing to interpolate here, using a string literal
(single quotes) is preferred.)
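A quick sketch of that rule, assuming a made-up UNC path \\host\share\ (a hypothetical stand-in, not a real path) and illustrative variable names. It also checks the IP-address example quoted earlier, including the lenient form where a backslash before an ordinary character is left alone.

```php
<?php
// Hypothetical UNC path \\host\share\ used only for illustration.
$single = '\\\\host\\share\\';
$double = "\\\\host\\share\\";

var_dump($single === $double);   // bool(true)
echo $single, "\n";              // \\host\share\
echo strlen($single), "\n";      // 13

// The example from the quoted message, with every backslash escaped ...
$full = '\\\\192.168.1.2\\d$';
// ... and a lenient form: in single quotes, a backslash that is not
// followed by a backslash or a single quote is kept as-is.
$lenient = '\\\192.168.1.2\d$';
var_dump($full === $lenient);    // bool(true)
echo $full, "\n";                // \\192.168.1.2\d$
```

The lenient form happens to work, but it is exactly the kind of string where you have to stop and think about what each backslash does.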
>The reason is simple: how could you use single quote to delimit a string
>that ends with a singlequote, and how would you do for a string
>that ends with a backslash?
>For the first one, you write '...\'' (the single quote escaped by
>backslash represents a single single-quote, and the final single-quote
>delimits the string)
>and for the second one: '...\\'
>(The backslash needs to be escaped, otherwise, the parser will consider \'
>as an escaped single-quote, and expects another single-quote to delimit
>the string)
The rule is much more general actually, and is the same in many languages:
if a character is defined as an escape character, then in order to use that
character *as a normal character* it must itself be escaped. (Actual escape
mechanisms do differ: in some languages one uses a single escape character
for everything that needs to be escaped, in others one simply doubles a
character to escape it.)
>Convinced with this explanation, and as you have mentioned, things get
>more complicated with regular expressions.
>If you want a regex to test a string starting with the 2 characters \ and
>n, you really need to represent it with 4 backslashes like
>'/^\\\\n/', __even if you use singlequote__. Other representations will fail.
>Ex: '/^\\n' - php will pass the string ^\n to the regexp, and regex will
>consider it as a pattern that matches something starting with a LF.
I think that should be "Ex: '/^\\n/' - php will pass the string ^\n to the
regexp." But understood.
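A small sketch of the difference, using nothing beyond preg_match(); the variable names are illustrative only. The first pattern reaches PCRE as /^\\n/ (a literal backslash followed by n), the second as /^\n/ (a line feed).

```php
<?php
$backslash_n = '\\n';   // two characters: a backslash and the letter n
$linefeed    = "\n";    // one character: LF

// Four backslashes in the PHP literal: PCRE receives /^\\n/,
// i.e. an escaped backslash followed by the letter n.
var_dump(preg_match('/^\\\\n/', $backslash_n)); // int(1)
var_dump(preg_match('/^\\\\n/', $linefeed));    // int(0)

// Two backslashes: PCRE receives /^\n/, which matches a line feed.
var_dump(preg_match('/^\\n/', $linefeed));      // int(1)
var_dump(preg_match('/^\\n/', $backslash_n));   // int(0)
```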
So far so good - but all of the above is nothing but pure syntax:
>preg_quote() is a handy function, but some string representations like
>\s, \d should not be passed to it (when you intend their use as regexp
>special characters, in this case a whitespace character or a digit).
And now we get back to Coding guidelines. Because those *four* backslashes
above, just to get something that ultimately evaluates to a single backslash,
are of course quite ugly. Regexes are hard enough to read without
double-doubling backslashes.
So, there are cases where we can avoid that and keep our regex a bit more
readable:
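1. If you want a regex to test a string starting with the 2 characters \
and n, you don't need to write '/^\\\\n/' - instead, you could write
'/^\x5Cn/' : using the hex escape \x5C to represent the single backslash.
(Note that a character class does not help here: '/^[\]n/' fails to
compile, because PCRE still treats the backslash as an escape character
inside a class, so \] is read as a literal ] and the class is never closed.)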
2. But of course it isn't always that simple; still, in order to avoid a
proliferation of doubled or double-doubled backslashes it's much better to
use preg_quote() which will do the necessary quoting for you.
But you are right that you need to keep the (essentially already-escaped)
"special characters" that start with a backslash out of this, or they
themselves will end up being double-escaped, with \\s then interpreted as a
backslash followed by an s, rather than \s which matches any whitespace
character.
Still, all this double-escaping should be avoided as much as possible,
using the mechanisms regexes themselves (as in character classes) or the
PCRE library (as in preg_quote()) provide.
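As a rough sketch of how this could look in practice (the literal and the variable names are hypothetical, chosen only for illustration): quote the literal part with preg_quote(), and concatenate regex tokens such as \s outside of it. The last line shows the \x5C hex escape as one way to write a single backslash without doubling.

```php
<?php
// Hypothetical literal text to search for, written as a plain PHP literal.
$literal = '\\\\192.168.1.2\\d$';          // \\192.168.1.2\d$

// preg_quote() escapes every PCRE special character, plus the delimiter
// given as the second argument.
$quoted = preg_quote($literal, '/');
echo $quoted, "\n";  // \\\\192\.168\.1\.2\\d\$

// Regex tokens such as \s are kept OUTSIDE preg_quote() and concatenated.
$pattern = '/^' . $quoted . '\s*$/';
var_dump(preg_match($pattern, $literal . '  '));  // int(1)

// Passing \s through preg_quote() turns it into a literal backslash + s.
var_dump(preg_quote('\s'));  // string(3) "\\s"
var_dump(preg_match('/^' . preg_quote('\s') . '$/', ' '));   // int(0)
var_dump(preg_match('/^' . preg_quote('\s') . '$/', '\s'));  // int(1)

// \x5C is PCRE's hex escape for a backslash, so no doubling is needed.
var_dump(preg_match('/^\x5Cn/', '\n'));  // int(1)
```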
> > Date: Mon, 8 Oct 2007 17:31:41 +0200
> > To: community at wikkawiki.org
> > From: javawoman at wikkawiki.org
> > Subject: Re: [wikka-community] Coding guideline about backslash.
> > At 14:09 2007-10-08, Mahefa wrote:
> > >Which coding guideline about backslash?
> > >
> > >My coding style for writing a backslash inside a string, whether
> > >double quote or single quote is used as delimiter; is to expressly
> > >escape it with another backslash.
> > >
> > >These 2 strings are the same to write a string composed of 2
> > >characters: a backslash and the letter n.
> > >'\\n' and '\n'
> > Actually they are NOT the same: within single quotes, every character is
> > just a literal character, so there is nothing to "escape". So '\\n' reads
> > "two backslashes and the letter n" and '\n' reads "a backslash and the
> > letter n".
> > It's not a matter of a rule for backslashes, but a rule for using single or
> > double quotes:
> > - use single quotes for LITERALS (every character stands for itself)
> > - use double quotes only for strings that need to be INTERPOLATED
> > It's only in "interpolated" strings that you may need an escape character
> > to make a "special" character stand for itself instead of something to be
> > interpolated.
> > >I prefer the first one, and the reasons are:
> > >
> > >1) When in the future, someone changes my singlequote in doublequote,
> > >errors due to this change are minimized.
> > When someone changes the single quotes to double quotes they must have a
> > reason for that - and that brings with it the responsibility to consider
> > whether any character may need to be escaped.
> > >2) clarity: I don't need to think if the character that follow the
> > >backslash has a special meaning when eventually combined with it. I
> > >just have to count the number of backslashes and divide them by 2.
> > Yes, you DO need to think (not to divide by two but to escape special
> > characters), because you need to make a reasoned decision to use double
> > quotes in the first place, instead of the generally preferred single
> > quotes. If you just need one or two interpolated characters, it's better to
> > concatenate them with the rest of the string still in single quotes.
> > So if you start with:
> > echo 'This is a very looong string to be written to the output.';
> > and want to add a newline to that, the solution is NOT
> > echo "This is a very looong string to be written to the output.\n";
> > but instead:
> > echo 'This is a very looong string to be written to the output.'."\n";
> > >Consider you want to write a constant \\192.168.1.2\d$, using a single
> > >quote as delimiter.
> > >
> > >If you use '\\192.168.1.2\d$', the string will be : \192.168.1.2\d$
> > No it won't - it will be \\192.168.1.2\d$ because every character stands
> > for itself in single quotes: it's a LITERAL.
> > >You can easily get in trouble if you used to consider that escaping
> > >backslash is not needed within single-quote-delimited strings.
> > That's because escaping a backslash ISN'T needed for literals. The idea is
> > even meaningless, because there is nothing to escape. So if you "used to
> > consider" that, please go right on doing so, because it's true.
> > >To write the string correctly, you must do one of the 4 proposals below.
> > >'\\\192.168.1.2\d$' or '\\\\192.168.1.2\d$' or '\\\192.168.1.2\\d$' or
> > >'\\\\192.168.1.2\\d$'
> > >
> > >For me, having in mind that I always escape my backslashes, the 4th
> > >proposal is what I can read and understand more easily.
> > Actually, I find that hard to read - if it would be in double quotes -
> > unless you really intend it to be "four backslashes, an IP address, two
> > backslashes, the letter d and a dollar sign".
> > That said, there is only one special case to consider and that is using
> > such strings as (building blocks for) regular expressions - because
> > expressions have their own layer of interpolating and escaping: the PHP
> > engine is handing off the string to the PCRE engine which *itself*
> > interprets the regex string it is handed (essentially only a character
> > class has "literal" values (except for the dash which will be a literal if
> > put at the end), everything else is interpolated).
> > In that case the simple solution is to still write the string as you mean
> > to - as a literal whenever possible -, keeping it readable for humans, and
> > use preg_quote() to let the PCRE library do its own escaping.
> > To summarize:
> > - use single quotes for literals, as much as possible
> > - use double quotes ONLY when you need strings to be interpolated (by
> > PHP), and whenever feasible concatenate only those bits with single-quoted
> > literals
> > That much is pure PHP.
> > For regular expressions you need to consider how those strings are going to
> > be used:
> > - when using regular expressions, use PCRE (preg_*) in PHP: that keeps the
> > syntax consistent, expressions usable as building blocks, and our RE
> > library (in the making) better maintainable
> > - when *creating* regular expressions, use the above rules for strings
> > - when the *resulting* string (after PHP has done any interpolation
> > already!) /might/ contain any character that is a "special character" in
> > PCRE, use preg_quote() to escape it before passing it to the PCRE engine
> > with one of the (other) preg_* functions. See http://php.net/preg-quote.php .
> > I don't think the last bits are on our coding guidelines page yet - I'll
> > add that soon.
> > >--
> > >Mahefa Randimbisoa (aka DotMG)
> > >
> > >_______________________________________________
> > >WikkaWiki Community mailing list
> > >community at wikkawiki.org
> > >http://mail.wikkawiki.org/mailman/listinfo/community_wikkawiki.org
> > --
> > JavaWoman
> > Web Standards Compliance Officer, Wikka Development Crew
> > http://wikkawiki.org/JavaWoman
> > Skype: callto://goneagain
Web Standards Compliance Officer, Wikka Development Crew