The roads I take...
April 1st, 2014
The Crash Rate
The crash rate is the primary stability measure we use at Mozilla. We measure it in "crashes per 100 active daily installations (ADI)", or "crashes / 100 ADI". (ADI is the number of daily requests that Firefox Desktop and Firefox for Android send to update their copy of our add-on blocklist. We consider this value a good enough estimate of usage for our purposes.)
Challenges for a Long-Term Rate
In our daily work, we tend to look at crash rates in terms of short-term changes within a single version, esp. development versions, so we can detect regressions and then dig deeper into what those are. For determining the long-term effectiveness of the program, though, it makes sense to look at cross-version crash rates instead, so we know how our releases (or betas) have improved. So it might make sense to look at all users on the release "channel", i.e. anyone using a stable release. On the other hand, we sometimes have leftover users of old and unsupported versions producing a lot of crashes, but those are not really relevant to the current effectiveness of our stability program, so I wanted some way to age old versions out of this overall rate. To take all that into account, I needed some way to more or less "concatenate" the stability rate graphs of a series of versions. Also, people updating to or installing a release very soon after it's published tend to have somewhat different usage patterns, and therefore different crash rates, than those updating late in the cycle, so I needed to find some way to smooth over that as well. Ideally, I wanted to turn this into an algorithm that can be run automatically and expressed as an SQL query (as the data I base this on is in a PostgreSQL database).
So, I began to think we could always sum up the crash and ADI numbers of the most recent two releases, or of the two that have the most users. But sometimes we release two adjacent versions 6 weeks apart and sometimes we do a fast update after a week, and when the second of those is released, not many people might have updated to the one before yet, so taking only those two might cover only a small portion of users and skew the numbers. So in the end, I decided to go with a moving window that always counts all versions whose builds were created within the last 12 weeks for the Release channel, and within the last 4 weeks for the Beta channel (I had 9 and 3 weeks in the beginning but extended that to smooth over the impact of the 2-week hiatus we had over New Year's this year). The data we have in usable form goes back to the last few days of September 2011, so that's what I could use for the graphs (I'm trying to get some older data, but that is harder to dig out).
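The moving-window rule can be sketched in a few lines. This is a minimal Python illustration, not the actual SQL query run against the PostgreSQL crash-stats data; the `VersionDay` record, the function names, and the assumption that each row holds one version's crash and ADI totals for a single day are made up for the example. Only the 12-week/4-week window lengths and the crashes / 100 ADI definition come from the text above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Window lengths described in the post: 12 weeks for Release, 4 weeks for Beta.
WINDOW_WEEKS = {"release": 12, "beta": 4}


@dataclass
class VersionDay:
    """Hypothetical record: one version's crash and ADI totals for a single day."""
    version: str
    build_date: date
    crashes: int
    adi: int


def crashes_per_100_adi(crashes: int, adi: int) -> float:
    """Crashes per 100 active daily installations; 0.0 if there is no ADI data."""
    return 100.0 * crashes / adi if adi else 0.0


def windowed_rate(rows: list[VersionDay], day: date, channel: str) -> float:
    """Combine all versions built within the channel's window before `day`.

    Crashes and ADI are summed first and divided afterwards, so each version
    is weighted by its ADI instead of every version counting equally.
    """
    cutoff = day - timedelta(weeks=WINDOW_WEEKS[channel])
    in_window = [r for r in rows if cutoff <= r.build_date <= day]
    return crashes_per_100_adi(
        sum(r.crashes for r in in_window),
        sum(r.adi for r in in_window),
    )
```

With made-up numbers such as 27.0 (built 2014-01-28, 9000 crashes, 900,000 ADI) and 27.0.1 (built 2014-02-12, 1500 crashes, 100,000 ADI), `windowed_rate(rows, date(2014, 3, 5), "release")` returns 1.05, while an old 25.0 build from October 2013 falls out of the 12-week window. Summing before dividing is what keeps a freshly released version with few users yet from skewing the combined rate.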
Graphs & Discussion
So, here are some screenshots of the graphs I have created from the data collected with that algorithm (including data up to March 5, which was current when I originally wrote up this post):
The first graph, with data from the Firefox desktop release channel, shows three lines. As the legend says, they include crashes of the browser process, crashes of a plugin process (the vast majority of plugin processes are Adobe Flash), and so-called "hangs", where we kill the plugin process after it hasn't responded to the browser process for a long time (by default, 45 seconds).
For one thing, you'll see that weekends have higher crash rates than weekdays. This could be, for example, because the ADI data isn't as reliable/accurate as one would hope, or because people using Firefox on weekends do things that are more crash-prone (including work/home usage patterns and possibly machine differences).
In this graph we can also clearly see the results of known stability events in this time frame. For example, it nicely shows the Google Doodle crash of August 2012, where almost every startup of the browser crashed when Google was set as the home page, and where we scrambled to get a fix out in a very short time (and Google helped us by putting a workaround into their doodles as well). It's also easy to see a few other sharp spikes where we had ADI (upwards) or crash submission (downwards) issues, as well as the crash- and hang-rich Flash 11.3 release in June 2012 and subsequent fixes for Flash, including the concerted efforts between Adobe and us in May/June 2013 to get back down to the old levels with fixes on both sides. For most of the time span on the graph, you'll see that the browser crash rate didn't change very much (other than the sharp spikes mentioned). In January 2013, though, you can see the rise in crashes that caused us to ship Firefox 18.0.2 with a fix for that. Right after that, at the end of February, you'll see the sharp rise in crashes when we released Firefox 19.0, triggered by a bug in certain AMD CPUs, which we worked around by rebuilding and releasing 19.0.1. Those examples, like anything else that shows up significantly in that graph and is not a data error, have pretty intricate stories; any of them could make up a separate blog post.
That said, the fact that we could keep the crash rate pretty much at 1.0 browser crashes / 100 ADI over that whole time (and even improve slightly to just below that with the Firefox 26 release in December 2013) is a statement of how effective the Mozilla Stability Program is at keeping Firefox crashes down, even though a whole lot of code has been added to support a ton of new features that the web has gained over that time.
Now, let's see how Firefox Beta looks in comparison:
At the end of 2012, we apparently did manage to improve the base level stability of the Beta channel, but you'll see that this channel is noisier, which is expected, as this is where we still see regressions and work on fixes before the issues hit release. For example, you can see that Firefox 27 Beta regressed stability in December 2013. We fixed that only very late in the cycle, so you don't see 27 being worse on the Release channel, but 28 had other regressions in the beginning and a rather large one in 28 Beta 4 (mid-February 2014). Once we fixed that, you can see that we come down to the 1.0 line in the last one or two weeks, so that looks pretty good for the 28 release, which was to be released ~2 weeks after the end of that data.
Also, you'll see that the plugin improvements of early 2013 are about 6 weeks earlier in Beta than in Release, which shows pretty well that there were actual patches in our code that helped with Flash hangs and crashes (as our code is on a 6-week cadence while Adobe's releases hit both channels pretty much at the same time).
Now, let's see how the picture looks when we look at a product that was newly created while we already had the mechanisms in place to record this data, like the current "native UI" Firefox for Android:
The early releases had higher crash rates, but we significantly improved over time due to our efforts in the Stability Program. You can also make out that the sharper changes happen pretty exactly at the edges of the 6-week release cycles. Also, you'll see that Firefox 23 for Android in September 2013 was pretty good, but we became worse in the following months. Because of that, we started a renewed effort to improve the stability of Firefox for Android this January. The current Firefox 27 for Android release is somewhat better than the one before, but it's obviously not where we want to be yet. We didn't have too much time to pound on issues from the start of the year until 27 was released, but Beta can show us whether our newer efforts are pointing in the right direction:
Now this graph looks pretty nice, doesn't it? When we first put this product on Beta, we saw the usual churn of exposing a new product to a wider audience for the first time, but we burned down the issues pretty well. Then we had a big regression, fixed it, and burned down bugs slowly over multiple months again. The regressions of late 2013 look even more dramatic here, as we had even worse issues on Beta but could actually fix the worst parts of those, so that the regressions on the Release channel weren't as bad as the first Betas we had there. Many of the 6-week cycles in this graph look like burn-down charts, high in the beginning, going down over the cycle as we push for bugs being fixed. It's also pretty awesome to see how the efforts since the start of this year have really paid off and current Beta is rivaling the best Beta numbers we had so far - you can imagine how I was looking forward to Firefox 28 for Android hitting Release based on that data!
All that said, we know there's more we can do on both products, and while holding crash rates pretty stable over a long time while adding a ton of features is awesome, we strive to improve overall stability. These graphs are one part of measuring the effectiveness of the stability program. I hope we will be able to put them up in a more dynamic, daily updated form at some point (right now I manually construct them in LibreOffice).
And in case you're interested in digging deeper into the source of the graphs, the code to pull the data from the crash-stats DB is in my crash-report-tools repo and the JSON coming out of that and powering my charts is in my directory on crash-analysis (F*-bytype.json files). Also feel free to contact me for more details.
March 29th, 2014
Still, performance of the map canvas was not good. On phones (esp. the small 320x480 screens like the ZTE Open), where you only have a handful of 256x256 map tiles to draw, panning was slightly choppy, but on larger screens, like my Android tablet or even my pretty fast desktop, it ranged from bad to awful (as in, a noticeable wait between any movement and seeing the moved map being drawn). Also, as it takes a while until images are loaded (from the IndexedDB cache or from the web) and that all happens asynchronously, the positions where the images ended up being drawn often weren't completely correct any more by the time they were drawn. I tried some optimizations, actually grabbing the pixels from the canvas, setting them in their new positions and only redrawing the images at the borders, but that only helped slightly on small screens while making performance on large ones even worse.
Given what I read and heard about how today's graphics chips and pipelines work, I figured that the problem was the drawImage() calls that draw the tiles to the canvas, as well as the getImageData()/putImageData() calls that move the pixels in the optimizations. All those copy image data between JS and graphics memory, which is slow, and doing it a lot doesn't fit well with how graphics stacks work nowadays. The only way I heard of that should improve that a lot would be to switch from 2D canvas to WebGL (or to go with the image-based tile maps that many others are using, but that wouldn't be as much fun). I don't remember all my sources for that, but I just got another pointer to a Mozilla Hacks post that explains some of it. And as Google also seems to be moving their Maps site to WebGL (from image-based tiles, mind you), it can't be a really wrong move.
So, I set out to learn the pieces of WebGL I needed for this app. You'd guess that Mozilla, who invented that API together with Khronos, would have ample docs on it, but the WebGL MDN page only has one tutorial for an animated 3D cube and a list of external links. I have since filed a bug on a WebGL reference, so this may improve in the future, but I started off trying the tutorial that MDN has. I didn't get a lot to work there except some basics, and a lot of the commands in there were not explained very well, but the html5rocks tutorial helped me get things into better shape, and some experimentation and the MSDN WebGL reference helped me understand more and actually get things right.
Another thing that was pretty useful was separating the determination of which tiles should be visible, and loading them into textures, from the actual drawing of the textures to the canvas. By doing the drawing itself on requestAnimationFrame, and by making that the only thing done while panning as long as all tiles are loaded into textures, I save work and should improve performance.
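The split between "which tiles are needed" and "draw what is already in textures" can be illustrated with a small, hypothetical helper. Lantea Maps itself is JavaScript and its code may do this differently; this Python sketch assumes 256x256 tiles in the usual slippy-map layout with 2^zoom tiles per axis, and a map center given in global pixel coordinates at the current zoom. `visible_tiles` and `tiles_to_load` are illustrative names, not functions from the Lantea code.

```python
import math

TILE_SIZE = 256  # map tiles are 256x256 pixels


def visible_tiles(center_x: float, center_y: float,
                  width: int, height: int, zoom: int) -> list[tuple[int, int, int]]:
    """Return (zoom, x, y) for every tile that intersects the viewport.

    Assumes 2**zoom tiles per axis and a center in global pixel coordinates.
    """
    max_index = 2 ** zoom - 1
    left = center_x - width / 2
    top = center_y - height / 2
    # The right/bottom edges are exclusive, hence ceil(...) - 1.
    x_min = max(0, math.floor(left / TILE_SIZE))
    x_max = min(max_index, math.ceil((left + width) / TILE_SIZE) - 1)
    y_min = max(0, math.floor(top / TILE_SIZE))
    y_max = min(max_index, math.ceil((top + height) / TILE_SIZE) - 1)
    return [(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]


def tiles_to_load(visible: list[tuple[int, int, int]],
                  loaded: set[tuple[int, int, int]]) -> list[tuple[int, int, int]]:
    """Only tiles without a texture yet need the slow, asynchronous loading step."""
    return [tile for tile in visible if tile not in loaded]
```

On a 320x480 screen centered at (512, 512) at zoom 2, `visible_tiles` yields the four tiles with x and y in 1..2. As long as every visible tile already has a texture, `tiles_to_load` returns an empty list, so the per-frame work in a requestAnimationFrame callback reduces to drawing existing textures at new offsets.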
2D Canvas (left) and WebGL (right) version of Lantea Maps on the ZTE Open
As you can see from the images, the 2D canvas and WebGL versions of Lantea Maps do not look different, but that was not intended anyway, as the map is the map after all. Right now, you can actually test both versions, though: I have not moved the WebGL version to production yet, so lantea.kairo.at still uses 2D canvas, while the staging version lantea-dev.kairo.at already uses WebGL. You'll notice that panning the map is way more fluid in the new version, and the tile distortions that could happen with delayed loading in the old one do not happen. I still wonder, though, why you sometimes have to wait very long for tiles to load, esp. after zooming. I need to figure that out at some point, but things render after waiting, so I found it OK for now. Also, I found the WebGL app to work fine on Firefox desktop (Linux), Firefox for Android, and Firefox OS (1.1, 1.2, and 1.5/Nightly).
So, I'm happy I did manage the conversion and learn some WebGL, though there's still a lot to be done. And as always, the code to Lantea Maps is available in my public git as well as GitHub if you want to learn or help.
February 27th, 2014
One problem for preserving software is that the original hardware the software ran on might not survive very long. Some people are still keeping some old machines like the C64, Apple ][ and others running, but at some point there won't be many left, as the original ones wear out or get damaged, and other hardware might already be unusable at this point. And for sure, those machines are not broadly available to the public. Ideally, we'd have the hardware and recreate the full experience, e.g. how you connected the machine to your own TV in the living room and played or worked with it there, but that is pretty unlikely or at least hard to do, esp. with the hardware becoming less and less available, as I mentioned.
But there's one way to bring at least part of the experience to users: we can emulate the old machines and let the preserved software run within that emulator. That doesn't give us the living-room-TV experience, but there's a better chance of both preserving that way of running the old pieces of software for a long time and making the experience broadly available. Now, it's not always easy to get emulators running well, but there are a number of projects out there, and we heard about a few interesting solutions at the preserving software event at the LoC, but one was particularly appealing to us as Mozillians.
Since the event in May, a lot of work has been flowing into JSMESS, and as Jason has blogged, there are a thousand cartridges available now in the Historical Software Collection of The Internet Archive, and performance within the browser is pretty decent now.
With that, a whole lot of old software is available for everyone, at any time, to try and experience within their own browser!
That's a powerful way to preserve software for the current world and upcoming generations, isn't it?
December 19th, 2013
I blogged a month ago about how it may affect my customizations, and I have dealt with those to a good degree by now, even though not yet as drastically as I thought when writing that blog post. As always, more will follow. It took me some time until I actually switched over, as I wanted to keep using my theme, but naturally it was not compatible with such a huge redesign.
But after a lot of hours of my free time in the last few weeks, I have experimental support for Australis working in LCARStrek. Having the new changes live together with support for pre-Australis Firefox in the same theme requires quite a few hacks to make a number of styles apply only on one side or the other. But then, I have been doing theme design long enough (about 14 years now) that I know a few tricks and could use those; thankfully, there are a few changes in attributes set on the main toolbox, for example.
There's still a lot to be done in this area to fix some details (and I see a painting issue that is triggered in the submenus of the new main menu, but it is probably Linux-specific and connected to the transparency used in the arrowpanel), but the main things seem to work decently now. See this screenshot:
Given that I'm using it every day, I hope starting now gives me enough experience with it that I can deliver a really decent theme when Australis finally ships, probably with Firefox 29.
December 6th, 2013
While the local just asked me if I'd go there, the Mozilla contacts had been asked by the organizers for a speaker to open the event. We were trying to get someone more used to talking about Firefox OS, but everyone's busy this time of year, so in the end we settled on me doing this keynote.
Now, I have given presentations on different occasions and at various events in the last years, but I had never actually keynoted anything, so that made me somewhat nervous. The other talks lined up for the evening were about app development, partly about very concrete pieces of it, so I figured I should give that some frame and introduce people to Firefox OS, starting with why we are doing it, moving on to what and where it is, and giving a bit of a glimpse into where we want to take it. So I came up with "Firefox OS: Reasons, Status & Plans" as the title (my slides are behind the link).
The audience was supposed to be about 50 people; I guess 30-35 actually showed up (the pictures, taken "in style" with Firefox OS on my Peak, only show one part of the room), but those were an awesome bunch. They were really into the topic, asked interesting questions, and the talks following mine showed that we really had capable developers in the room, from those who do JS in their free time to those who earn their bread and butter by doing apps.
We also had two Mozillians, both of whom I had not met in person before, even though I have spent a lot of time in this city in the last decade!
As the event went on, I was often the voice in the room who had answers from the Mozilla side or could explain our point of view and initiatives, and in quite a few cases, I could loop back to something I said in my keynote. It was really great to see how I had apparently touched on exactly the right things there and given everything else a good base to build on. Interestingly, there was quite a bit of interest in the DeviceStorage API, probably because accessing local files is something people can relate to more easily than storing items in-app. I was thankful someone did a talk on our Marketplace and in-app payment API/services, as that's one area I'm actually weak in, but it also sparked quite a bit of interest. The permission model also got a few questions.
We surely had some people with Firefox OS app experience in there, but I think more of the attendees might pick up web app development, esp. if more similar events come around, which would be cool. And maybe someone should show them how to do simple apps without larger libraries or frameworks, and explain app manifests in more detail. I hope they will organize more of those events and the chance for that will come along!
November 7th, 2013
I blogged a few months ago about what the archive has and can do, and I will probably mention it again when I get to more posts on preserving software.
I think it's in the best interest of everyone, esp. us as Mozillians, to keep this organization going and make the history of the Internet and more openly available to current and future generations.
Please help them rebuild and continue on their way, and make a donation. I will for sure.
November 1st, 2013
One idea would be to award badges automatically to people who leave their email in a crash report (i.e. a badge for taking part in our efforts by delivering data through crash reports), but that would probably end up being a bad badge, because you'd award people for crashing Firefox (and not award people who don't see crashes). We really do not want people to search for ways to crash so that they can get a badge; after all, our mission is to avoid crashes, not provoke them. So, no badges for submitting crashes (thanks to Benjamin for pointing me to that issue right away when I was floating that idea).
So, how can we potentially award a badge for Stability/"CrashKill" work without rewarding bad behavior?
One thing I could think of was "filed x crash bugs that ended up fixed". That should be relatively easy to get out of Bugzilla data and I think there's no doubt that filing bugs that end up with a fixed resolution is a good thing.
This idea would also open up the avenue for different badges for different numbers of fixed bugs (similar to the webdev badges), or for a badge for developers who fixed x crash bugs, and similar things.
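Counting "filed x crash bugs that ended up fixed" could look roughly like this. The bug records are hypothetical dicts shaped loosely after Bugzilla fields (reporter, keywords, resolution); real data would come from a Bugzilla query or export, and the tier thresholds and badge names are made up for illustration.

```python
from collections import Counter
from typing import Optional

# Hypothetical tiers: minimum number of fixed crash bugs filed, highest first.
BADGE_TIERS = [(50, "Crash Reporter Gold"),
               (10, "Crash Reporter Silver"),
               (1, "Crash Reporter Bronze")]


def fixed_crash_bugs_by_reporter(bugs: list[dict]) -> Counter:
    """Count, per reporter, bugs with the "crash" keyword that were resolved FIXED."""
    counts: Counter = Counter()
    for bug in bugs:
        if "crash" in bug["keywords"] and bug["resolution"] == "FIXED":
            counts[bug["reporter"]] += 1
    return counts


def badge_for(count: int) -> Optional[str]:
    """Return the highest tier whose threshold the count reaches, if any."""
    for threshold, name in BADGE_TIERS:
        if count >= threshold:
            return name
    return None
```

Because only fixed bugs count, crashing more does not earn anything, which avoids the problem with a badge for submitting crash reports. Keying the same count on the assignee instead of the reporter would cover the badge for developers who fixed crash bugs.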
What do you think? Which ideas do you have for awarding badges for contributions to the Mozilla Stability program?
October 28th, 2013
Currently there are only commercial providers for something like this, above all Google Location Service, but an open alternative opens the door for implementations that put more emphasis on data protection and privacy while still offering the same convenience (e.g. one could download an extract of the data for a given area and then determine the position locally, without transmitting the exact data).
For such a service, you need the in principle publicly observable data about the "visible" Wi-Fi networks and cell towers at various locations. We want to gather this information via crowdsourcing, and a "Stumbler" app for Android is available that does this work (of course, the sources for the Stumbler and the server are just as open as an API to the service; we are Mozilla, after all).
Help is of course welcome, from submitting data with the Stumbler to improving it, as well as the server side!
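As an illustration of the local lookup idea, here is one simple approach, a signal-weighted centroid, that a client could run on a locally downloaded extract of the data. Both the technique and the data layout (a dict mapping a Wi-Fi BSSID to a known latitude/longitude) are assumptions for this sketch and are not taken from Mozilla's service; real location services use more elaborate models.

```python
from typing import Optional


def estimate_position(known_aps: dict[str, tuple[float, float]],
                      observed: dict[str, float]) -> Optional[tuple[float, float]]:
    """Signal-weighted centroid of the observed access points found in local data.

    known_aps: BSSID -> (latitude, longitude), e.g. from a downloaded area extract.
    observed:  BSSID -> received signal strength in dBm (closer to 0 = stronger).
    Nothing is sent over the network; unknown BSSIDs are simply ignored.
    """
    total_weight = lat_sum = lon_sum = 0.0
    for bssid, dbm in observed.items():
        if bssid not in known_aps:
            continue
        weight = 10 ** (dbm / 10)  # dBm -> milliwatts, so stronger signals count more
        lat, lon = known_aps[bssid]
        lat_sum += weight * lat
        lon_sum += weight * lon
        total_weight += weight
    if total_weight == 0:
        return None
    return lat_sum / total_weight, lon_sum / total_weight
```

Averaging latitude and longitude directly is only reasonable because the access points visible from one spot are close together; the point of the sketch is that the scan results never have to leave the device.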
September 8th, 2013
Now, what I'd really like to see, though, is an app that runs locally on the Firefox OS device in its entirety, and which has a UI that is useful and nice on a phone. Especially the latter might mean not implementing all the fancy functionality that many IRC clients have, but only those parts required for some simple chatting.
We have the technology to run a full IRC client on a Firefox OS phone with the TCPSocket API, and the simplicity of the IRC protocol would make it a nice project for someone who wants to play with this API.
The UI, OTOH, would make a very interesting challenge for someone who likes UX design, as on a phone, you need to be way more minimalistic, and you probably need to consciously decide which functionality and which elements to leave out, or to implement completely differently than what we might be used to.
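To show how little is needed on the protocol side, here is a hedged Python sketch of the line format from RFC 1459/2812: an optional `:prefix`, a command, parameters, and an optional trailing parameter after ` :`, with each line ending in CRLF. A Firefox OS app would do this in JavaScript on top of the TCPSocket API; the function names here are hypothetical, and the sketch does no networking, so only the message handling is shown.

```python
from typing import Optional


def parse_irc_line(line: str) -> tuple[Optional[str], str, list[str]]:
    """Split one IRC line into (prefix, command, params); trailing text is the last param."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
    if " :" in line:
        head, _, trailing = line.partition(" :")
        params = head.split() + [trailing]
    else:
        params = line.split()
    command = params.pop(0).upper() if params else ""
    return prefix, command, params


def registration(nick: str, realname: str) -> list[str]:
    """The NICK/USER pair a client sends right after connecting."""
    return [f"NICK {nick}\r\n", f"USER {nick} 0 * :{realname}\r\n"]


def privmsg(target: str, text: str) -> str:
    """A message to a channel (#name) or to a single user."""
    return f"PRIVMSG {target} :{text}\r\n"


def reply_to(line: str) -> Optional[str]:
    """Answer server PINGs so the connection is not dropped; ignore everything else here."""
    _, command, params = parse_irc_line(line)
    if command == "PING":
        return f"PONG :{params[-1]}\r\n" if params else "PONG\r\n"
    return None
```

A minimal client loop would send the `registration` lines after connecting, pass every received line to `reply_to`, and display PRIVMSG lines whose first parameter is the current channel; that small subset might already be enough for simple chatting.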
I'd really love to be able to have an easy way to tell my manager via IRC that I'll be late for a 1:1 while I'm on my way, or be able to make a quick inquiry in a chat channel while I'm traveling.
Anyone up for the challenge?
August 2nd, 2013
Jason talked about multiple efforts he's involved in, including his early (and ongoing) work on textfiles.com, collecting writing from the time when people first got online, and some other initiatives I'll mention at the end of this post, but the main focus was on The Internet Archive, the non-profit he works for nowadays, which has public collections of historical digital content as its main mission.
The site and organization are probably best known for the Wayback Machine, which has archived "over 240 billion web pages" going back more than 15 years; see e.g. a Mozilla homepage from around the time when I first encountered the project. But beyond that, they have tons of other digital content archived: video, audio, texts, and more. Jason said they are basically seeking to store everything available in digital format that could be of any historical use at some point, preferably first making sure it's stored and worrying about legal questions only as they arise, as it's better to have something and have to take it down than to have the right to publish it but have lost it to history. He went as far as to say they want to be "the hard drive of the Internet" and store everything anyone gives to them, be it personal documents, software that was published at some point, or other digital content. For example, their software collection contains entire FTP servers of the past as well as CD images and terabytes (!) of software and firmware for old systems to run in emulators.
And there's an "Upload" button on the site as well, inviting me, you, and everyone else to contribute content that they can archive!
So, if you have old digital content lying around, go to archive.org and make it available to the public, including the kids of the future, before it gets covered with enough dust or otherwise degrades to the point that the media can't be cleanly read any more.
If you have really important pieces of history on media that you fear are too dusty and old to still be read cleanly, or where it's hard to find any drive to read that media any more, or you know of such things that might otherwise be hard to recover, you might be interested in another project that Jason Scott is involved in: the Archive Team. That group is dedicated to rescuing old digital content where it's not easy, and to saving history before it's actually lost. They have specialized equipment to read even aged disks and tapes, and they are building up communities to save sites before they die; they even archived most of Geocities before it died!
Another really awesome story is how they helped recover the original "Prince of Persia" source code.
All those projects can profit from your help, so if you have anything you can contribute, please do so!