Unicode string formatting
Did you know that in Python 2, if any of the values in a string formatting expression with the %
operator is unicode, the resulting string will also be unicode?
>>> "Hello, %s" % u"Alex"
u'Hello, Alex'
>>> "Hello, %s" % u"Алексей"
u'Hello, \u0410\u043b\u0435\u043a\u0441\u0435\u0439'
I am used to working with the .format string method, and its behavior appeals to me more: the type of the source string is preserved, and if the source is a byte string and a unicode parameter contains non-ASCII characters, a UnicodeEncodeError exception is raised.
>>> "Hello, {0}".format(u"Alex")
'Hello, Alex'
>>> "Hello, {0}".format(u"Алексей")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-6: ordinal not in range(128)
>>> u"Hello, {0}".format(u"Алексей")
u'Hello, \u0410\u043b\u0435\u043a\u0441\u0435\u0439'
Is it a big deal which string type is returned? Well, sometimes yes. For example, when working with urlparse.parse_qs, the type of the string matters.
So it is worth keeping in mind that code like:
>>> "Hello, %s" % value
can return a unicode string.
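If a byte string is what you need, one option is to encode unicode values yourself before formatting, so the result type no longer depends on what was passed in. Here is a minimal sketch of that idea, not a definitive recipe: greet_bytes is a hypothetical helper name, and UTF-8 is an assumed encoding (pick whatever your application expects). It handles only text and byte string values, and it also runs on Python 3.5+, where % is supported for bytes.

```python
def greet_bytes(value, encoding="utf-8"):
    """Hypothetical helper: always return a byte string greeting."""
    # type(u"") is `unicode` on Python 2 (and `str` on Python 3).
    if isinstance(value, type(u"")):
        # Encode explicitly so that % never promotes the result to unicode.
        # The utf-8 default is an assumption made for this example.
        value = value.encode(encoding)
    return b"Hello, %s" % value


result = greet_bytes(u"\u0410\u043b\u0435\u043a\u0441\u0435\u0439")
# `bytes` is an alias for `str` on Python 2.
assert isinstance(result, bytes)
```

The point of the design is to make the conversion visible at one place instead of relying on the implicit promotion done by the % operator.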
Some links: