
Why I'm eager for L20n

Work on Mozilla 2 is due to start soon, and as you might have read, there are plans to create a new localization infrastructure for that future generation of Mozilla software. It is currently called L20n and is driven by MoCo's L10n lead Axel Hecht. Though it's still in a draft stage, I can't wait to use this technology.

When Axel talked about L20n at FOSDEM, I was hoping to hear people say how much they love it. Unfortunately, the talk ran out of time before he could get to the interesting part: examples. The discussions in Brussels were quite interesting, though. I can't remember any objections to the need for this or to the general approach, only some criticism of the semantics of the lol file format. Oh, yes, it looks like localizing will be much fun in the future, with L10n info in ".lol" (Localizable Object List) files :)
I actually think that the proposed syntax of those files is good. It feels familiar to most developers but is different enough from other languages to make you realize you're not in JS or XML but in a lol file.

The good thing is that L20n is a format growing out of experience with the problems of current L10n approaches, both the Mozilla approach(es) and the gettext/PO approach. Anyone who has worked with them, on the developer or the localizer side, knows they all have their problems. Mozilla lacks language fallbacks and plural handling. gettext/PO lacks good VCS compatibility and needs long original strings in the source code. Both lack flexible support for declension and other grammatical specialities. L20n is an effort to learn from the strengths and weaknesses of those, and especially from the problems of their users. The goal is an L10n toolkit that satisfies developers, localizers and users alike, also across programming language boundaries. Axel has contacted L10n communities beyond Mozilla's to get feedback. From what I heard, he has received positive responses and wishes for collaboration.

As an active SeaMonkey localizer, I heard about this new approach early on. Being in the Mozilla L10n newsgroup, I was more or less there when it was born (or when the ideas for it were gathered) and actively took part in those discussions.
I'm more eager to work with L20n from a developer's perspective than from a localizer's perspective, though. "My" language (German) doesn't differ from English enough that I regularly run into the big problems of current approaches. But L20n could simplify the code I write as a developer a lot:

Let's take this "simple" snippet of PHP code and German .po file that print out the number of comments (simplified from the source of this blog):

blog.php:
if ($postCount < 1) { print(gettext('no comments')); }
elseif ($postCount == 1) { print(sprintf(gettext('%s comment'), $postCount)); }
else { print(sprintf(gettext('%s comments'), $postCount)); }

blog.po:
msgid "no comments"
msgstr "keine Kommentare"

#, php-format
msgid "%s comment"
msgstr "%s Kommentar"

#, php-format
msgid "%s comments"
msgstr "%s Kommentare"

Now let's look at how this might look with L20n (in principle only; I could be using the wrong function names here):

blog.php:
print($l20n_context->getValue('comm_cnt', array('num'=>$postCount)));

blog.lol:
<plural0: (n) -> {n == 0 ? 0 : (n == 1 ? 1 : 2 ) }>
<comm_cnt[plural0(num)]: ["keine Kommentare", "${num}i Kommentar", "${num}i Kommentare"]>

Note that I'm not sure $postCount would be passed in an array like this, but it would probably be something similar. I also hope I got the "plural0" macro right.
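To make the lookup concrete, here is a minimal Python sketch of what a runtime might do with the two lol entries above. It evaluates plural0 on num, uses the result to index the value list, and then substitutes num into the chosen string. This is not L20n's implementation. ENTITIES and get_value are hypothetical names. The lol file is modeled as plain Python data instead of being parsed, and ${num} is modeled with str.format placeholders (the trailing i is left out).

```python
# Hypothetical model of the two lol entries above; not a real L20n API.

def plural0(n: int) -> int:
    """Mirror of <plural0: (n) -> {n == 0 ? 0 : (n == 1 ? 1 : 2)}>."""
    return 0 if n == 0 else (1 if n == 1 else 2)

# Mirror of <comm_cnt[plural0(num)]: [...]>:
# an index expression paired with the list of variants it selects from.
ENTITIES = {
    "comm_cnt": (
        lambda args: plural0(args["num"]),
        ["keine Kommentare", "{num} Kommentar", "{num} Kommentare"],
    ),
}

def get_value(entity_id: str, args: dict) -> str:
    """Pick the variant chosen by the entity's index, then interpolate args."""
    index_of, variants = ENTITIES[entity_id]
    return variants[index_of(args)].format(**args)

for count in (0, 1, 5):
    print(get_value("comm_cnt", {"num": count}))
```

The caller only supplies the ID and num. How many variants exist and how one is chosen live entirely in the data. That is what would let a localizer add plural forms without touching the code.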
There are multiple good things about that approach. First, the actual code is much shorter and easier to read (even more so with long sentences instead of short strings). Second, there is no "original string" in the source, just an ID ("comm_cnt"). So it doesn't matter which language I do first while writing the code, as L20n doesn't care. Third, while writing the code, I could simply define the string as <comm_cnt[num]: "${num}i Kommentare"> to keep things simple, and refine it to better values later (when the code is stable). Fourth, if some localizer comes along and tells me he needs five different plural forms depending on the number, he can just do that in the lol file. I as a developer don't need to know or care. Fifth, the localization file is shorter. Sixth, I as a developer am actually writing a first localization along with the code, which a localizer can use as an example for his work. And there are probably more.
(And yes, I know gettext has some plural support, so this might be a bad example. It's one of the easiest to grasp, though, for a developer who doesn't speak Finnish, Polish or other grammatically more complicated languages.)
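For comparison, gettext's plural support (ngettext) picks a translation using a Plural-Forms expression from the .po header. The Python sketch below mirrors the German and Polish expressions as published in the GNU gettext manual. It shows why the three hard-coded branches in blog.php can't serve Polish. The FORMS table and the Polish word forms are illustrative, not taken from a real catalog.

```python
# Plural-Forms expressions from .po headers, rewritten as Python functions.
# German:  nplurals=2; plural=(n != 1);
# Polish:  nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4
#                              && (n%100<10 || n%100>=20) ? 1 : 2);

def plural_de(n: int) -> int:
    return 0 if n == 1 else 1

def plural_pl(n: int) -> int:
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2

# Illustrative msgstr lists, indexed by the plural form number.
FORMS = {
    "de": (plural_de, ["{n} Kommentar", "{n} Kommentare"]),
    "pl": (plural_pl, ["{n} komentarz", "{n} komentarze", "{n} komentarzy"]),
}

def comments(lang: str, n: int) -> str:
    """Select the msgstr for n using the language's plural rule."""
    rule, msgstrs = FORMS[lang]
    return msgstrs[rule(n)].format(n=n)

for n in (1, 2, 5, 12, 22, 25):
    print(comments("pl", n))
```

12 and 22 end in the same digit but need different words in Polish. An n == 1 / else split cannot express that.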

It would be so nice to simplify code and get all those long strings (which often require horizontal scrolling here) out of the source. And that's only speaking as a PHP developer working in two languages that don't need declensions or other specialities. It must be even more compelling for a localizer of a language that does need them, who can finally give his users a linguistically correct user experience.

Entry written by KaiRo and posted on April 1st, 2007 00:58 | Tags: L10n, L20n, Mozilla | 6 comments

Comments


Anonymous guest

It is kind of unfair to say that "the actual code is much shorter", since in one example you used if statements and in the other the conditional operator. The PHP example might have looked like:
print(sprintf(gettext($postCount < 1? 'no comments': ($postCount == 1? '%s comment': '%s comments')), $postCount));

If you wanted to write the php in a more "lol way", it might look like:
blog.php
print(l20n_commcnt($postCount));

blog_de.php
function plural0($n) { $a = func_get_args(); return $n == 0? $a[1]: ($n == 1? $a[2]: $a[3]); }
function l20n_commcnt($n) { return plural0($n, "keine Kommentare", "$n Kommentar", "$n Kommentare"); }

Also, personally I don't like the "<" and ">" enclosing the functions in the lol file. It might just be my eyes going into XML-parsing mode and throwing an error when they meet programming syntax :). I think I would just end the functions with ";", if anything is really needed. Also, why would you use "name: (args) -> { body }" in one function definition and "name[args]: body" in the other? Or should the second be read as "name body"? Then "[index]: [element, element, ...]" would be a kind of reversed array indexing, similar to "[element, element, ...][index]" in e.g. JavaScript. In that case it seems odd not to be able to name the formal arguments.
2007-04-01 08:57

Anonymous guest

Hmm... I posted too early. Let me try again:

blog_de.php
function plural0($n, $zero, $one, $many) { return $n == 0? $zero: ($n == 1? $one: $many); }
function l20n_commcnt($n) { return plural0($n, "keine Kommentare", "$n Kommentar", "$n Kommentare"); }

My point was that even though lol is a domain-specific language, there is a lot of superficial syntax. E.g. you could just have used PHP/JavaScript syntax with a few things removed. Drop "function" (since all top-level statements in lol are functions), drop "return" (à la Perl) and drop PHP's "$" (since they of course aren't needed). Then just add curly braces around the variables embedded in strings. That would make the example simpler and shorter.


BTW, Re: "Second, there is no "original string" in the source, just an ID ("comm_cnt"), it actually doesn't matter which language I'm doing first, while writing the code, as L20n doesn't care."
Proponents of PO files might argue that it is actually a good thing to have the English (or any other) version as a guide to how the string is used.
2007-04-01 11:55

KaiRo

Webmaster

For your questions about the lol syntax, you should read the L20n wiki page I linked in the first paragraph. BTW, I didn't invent that syntax, I'm just using it in those examples.
As for the if vs. conditional operator, you're basically right, though it would harm code readability. Also, as long as you want to use xgettext, only plain strings can be inside a gettext() function, so gettext() or its abbreviation _() has to be used three times.
And I think you're missing the fact that lol is a language-independent format. That is important not only for getting localization tools working on top of it, but also for e.g. Mozilla's use in C++, JS and XUL.
The "original string" argument can simply be countered by the need to always have a first lol file, even for the "original" language. That file can easily be used as a guide for localizers.
2007-04-01 13:16

Axel Hecht

Robert, you should really change your gettext example to actually use plurals, i.e., ngettext, as your code is not localizable into Polish, for example.

You'd still want to take out the 0 as a special case, though, as the standard gettext plural forms would lead you to use

Sie haben 0 Kommentare (You have 0 comments)

instead of

Sie haben keine Kommentare. (You have no comments.)
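Axel's suggestion can be sketched in Python (PHP's gettext extension offers ngettext() as well). Zero stays a special case, and ngettext chooses among the catalog's plural forms for everything else. The "blog" domain and "locale" directory in the comment are hypothetical. NullTranslations is used so the sketch runs without a compiled catalog; it simply returns the English msgids.

```python
import gettext

# A real application would load a compiled catalog, e.g. (hypothetical names)
# gettext.translation("blog", localedir="locale", languages=["de"], fallback=True).
# NullTranslations keeps this sketch self-contained and returns the msgids as-is.
translations = gettext.NullTranslations()
_ = translations.gettext
ngettext = translations.ngettext

def comment_count(n: int) -> str:
    # Zero is handled separately so a language can say "no comments"
    # instead of the generic plural form "0 comments".
    if n == 0:
        return _("no comments")
    # ngettext lets the catalog's Plural-Forms rule pick the form,
    # so Polish can supply three variants where English has two.
    return ngettext("%d comment", "%d comments", n) % n

for n in (0, 1, 3):
    print(comment_count(n))
```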

And yikes on that ';' thing; that's VETOed. It's just a huge mistake. Try to find out what happens when you leave out a ';': it has horrible side effects on error recovery and reporting.
2007-04-01 14:08

Anonymous guest

Thanks for the reply

> read the L20n wiki page
Yes, I found that after I'd posted the comments. But even though the syntax is described there, the semantics don't seem to be. And it doesn't explain the need to "reinvent the wheel" (i.e. not stealing as much as possible from other good languages).

> BTW, I didn't invent that syntax, I'm just using it in those examples.
Okay, I didn't mean to knock on you.

> For the if vs. conditional operator, you're basically right though it would harm code readability (and as long as you want to use xgettext, only plain strings can be inside a gettext() function, so that or its abbreviation _() has to be used three times).
But my examples don't use gettext? And after lol, it is apparently allowed to embed translations inside programming languages (since, lol is a programming language -- it looks turing complete).

> And I think you're missing the fact that lol is a language-independent format, which is not only important for getting localization tools working on top if it, but also for e.g. Mozilla's use in C++, JS and XUL.
The fact that the lol language is supposed to be called from many different languages doesn't dictate what syntax the lol language should use. In particular, it isn't an argument for inventing a whole new verbose syntax. To paraphrase a saying: the difference between good and bad programming languages is that the good ones only steal from the best.

> The "original string" argument can be just countered with the need to always have a first lol file even for the "original" language, which can easily be used as a guide for localizers.
Well, then the localizers would need to look at and synchronize two files (the file they are working on and a reference, which ever that is).
2007-04-01 22:27

Axel Hecht

Quote of Anonymous guest:
But even though the syntax is described there, the semantics don't seem to be. And it doesn't explain the need to "reinvent the wheel" (i.e. not stealing as much as possible from other good languages).

Looking at other attempts and many discussions out there, there is a well-defined line between stealing as much as possible and stealing as much as feasible. This might be just me picking at the PO discussions, but whenever I hear a "but you could ...", it essentially comes with quite a lot of additional (and infeasible) baggage.

I don't think there is a whole lot to take from existing attempts on the surface; we're changing paradigms here. The paradigm shift is much more confusing if one makes it look like something it isn't. That might be funky with characters like < or > or " or [, but there are only so many non-letter characters to pick from.
Quote of Anonymous guest:
But my examples don't use gettext? And after lol, it is apparently allowed to embed translations inside programming languages (since, lol is a programming language -- it looks turing complete).

Not exactly sure what you mean here. And to clarify, L20n is not intended to be Turing complete.
Quote of Anonymous guest:
In particular, it isn't an argument for inventing a whole new verbose syntax. To paraphrase a saying: the difference between good and bad programming languages is that the good ones only steal from the best.

"Best" has nothing to do with good. "Best" is not an objective (?) criterion either, in particular when it comes to localization architectures. There is no single target audience, either; I know of at least three (architecture implementers, software authors, localizers). "Best" is really bad. I stole whatever I found good, really, and put it into a legacy-free world. Again, I don't think that the l12y problems we're facing are solvable without a paradigm change, and that's the hard nut to sell. Everything beyond that is just personal taste.

Btw, I'm not sure if there's just one Anonymous posting here. Unless you really need to stay anonymous, giving your name would help a bunch in telling one Anonymous from the other.
2007-04-01 23:52
