Crash-stats Now Splits Data For Betas And Release
As already mentioned in my recent post about crash-stats, the Socorro team has been busy with more changes to how their server software works, as requested mostly by our team of crash analyzers.
After working late hours last week, working on the weekend for a first deployment on Sunday, and doing a bugfixing all-nighter until this morning, this great group of people made sure that we have better-fitting crash analysis infrastructure in place for today's Firefox release than for the last one six weeks ago.
So, what has changed? Doesn't the crash-stats front page look the same as before? Not entirely. The devil is in the details. The old one was almost, but not quite, entirely unlike the tea we wanted to drink. The new one actually is brewed out of leaves and hot water, to stay with the analogy borrowed from Douglas Adams. In the updated version we're running now, you'll see that the front page currently shows "6.0(beta)" where it used to show "6.0", and over the next days we won't have completely unusable crash rates for the release, like we had for 5.0 six weeks ago.
The reason is that betas and releases are now processed very differently. We now get graphs and reports for every single beta build we push out, and for the final release build separately on the beta channel and on the "release" channel - even though all of those report in with exactly the same version number. When you see or select graphs for "6.0b1" through "6.0b5", Socorro internally looks for a "6.0" version number, the "beta" release channel and the build ID of the matching build; for "6.0b5", that is the fifth build we created on the beta channel for 6.0.
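As a rough illustration of that lookup, here is a minimal Python sketch. It is not Socorro's actual code: the `Lookup` type, the `resolve` function, the `BUILDS` table and all build IDs in it are made up for this example. It only shows the idea that a display label is translated into a version number, a channel and a build ID.

```python
import re
from typing import NamedTuple, Optional


class Lookup(NamedTuple):
    version: str              # version number the build reports, e.g. "6.0"
    channel: Optional[str]    # "beta", or None for "every channel except beta"
    build_id: Optional[str]   # None means no build ID filter


# Hypothetical build IDs, listed in the order the builds were created.
BUILDS = {
    "6.0": {
        "betas": ["20110701000001", "20110708000002", "20110715000003",
                  "20110722000004", "20110729000005"],
        "final": "20110811000006",
    },
}


def resolve(label: str) -> Lookup:
    """Translate a display label into the fields a report would filter on."""
    # "6.0b5": the fifth build created on the beta channel for 6.0.
    m = re.fullmatch(r"(\d+(?:\.\d+)*)b(\d+)", label)
    if m:
        version, number = m.group(1), int(m.group(2))
        return Lookup(version, "beta", BUILDS[version]["betas"][number - 1])
    # "6.0(beta)": the final release build as shipped on the beta channel.
    m = re.fullmatch(r"(\d+(?:\.\d+)*)\(beta\)", label)
    if m:
        version = m.group(1)
        return Lookup(version, "beta", BUILDS[version]["final"])
    # "6.0": the same version number on all other channels.
    return Lookup(label, None, None)


print(resolve("6.0b5"))
print(resolve("6.0(beta)"))
print(resolve("6.0"))
```

All three labels resolve to the same version number, "6.0"; only the channel and build ID tell them apart.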
When we generate the final release builds, we also push them to the beta channel, where they are reported as "6.0(beta)", while "6.0" now only looks at other channels (mostly "release" but also things like the "default" channel used by e.g. Linux distro builds). We process 100% of the crashes from the beta channel but only 10% of those from the other channels, so splitting the two apart lets us account for that difference with a factor and gives both correct crash rates. In the last cycle, we could not do that, and mixing the values for both caused unusable crash rate numbers.
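To make the effect of that factor concrete, here is a small Python sketch. The function name and all numbers are invented for illustration, and I am assuming a rate expressed as crashes per 100 ADU; the real aggregation in Socorro may well look different.

```python
def crashes_per_100_adu(processed_crashes: int, adu: int,
                        processed_fraction: float) -> float:
    """Estimate a crash rate from the share of crashes that got processed.

    processed_fraction is 1.0 when every crash is processed (beta channel)
    and 0.1 when only 10% are (the other channels).
    """
    if adu <= 0:
        raise ValueError("adu must be positive")
    if not 0 < processed_fraction <= 1:
        raise ValueError("processed_fraction must be in (0, 1]")
    estimated_crashes = processed_crashes / processed_fraction
    return 100.0 * estimated_crashes / adu


# Made-up numbers: both populations crash equally often in reality.
beta = crashes_per_100_adu(500, 100_000, 1.0)
release = crashes_per_100_adu(2_000, 4_000_000, 0.1)

# Mixing both into one bucket leaves no single factor to correct with.
mixed = 100.0 * (500 + 2_000) / (100_000 + 4_000_000)

print(f"beta:    {beta:.3f}")
print(f"release: {release:.3f}")
print(f"mixed:   {mixed:.3f}")
```

With the two populations kept apart, each can be scaled by its own factor; the mixed figure comes out far lower than either, which is the kind of unusable number described above.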
In addition, the team fixed a discrepancy between crash counts, which were previously done per Pacific Time day, and ADUs (active daily users), which are done per UTC day. Now both (for betas and releases) are counted per UTC day, making the rates more meaningful.
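The following Python sketch shows why the day boundary matters. The timestamps are invented, and Pacific Time is simplified to a fixed UTC-7 offset (daylight time, as in August); it is not how Socorro does the counting.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Simplification: fixed UTC-7 offset standing in for Pacific Daylight Time.
PDT = timezone(timedelta(hours=-7))


def count_per_day(timestamps, tz):
    """Count events per calendar day as seen in the given time zone."""
    return Counter(ts.astimezone(tz).date() for ts in timestamps)


# Made-up crash submission times, stored in UTC.
crashes = [
    datetime(2011, 8, 16, 3, 0, tzinfo=timezone.utc),   # still Aug 15 in PDT
    datetime(2011, 8, 16, 12, 0, tzinfo=timezone.utc),
    datetime(2011, 8, 17, 1, 0, tzinfo=timezone.utc),   # still Aug 16 in PDT
]

per_utc_day = count_per_day(crashes, timezone.utc)
per_pacific_day = count_per_day(crashes, PDT)

for day, count in sorted(per_utc_day.items()):
    print("UTC    ", day, count)
for day, count in sorted(per_pacific_day.items()):
    print("Pacific", day, count)
```

The same three crashes land in different days depending on the time zone, so dividing Pacific-day crash counts by UTC-day ADUs compares two different 24-hour windows.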
With all that, we will now be able to compare different betas against each other, as well as beta against release, in a meaningful way, look for differences and spot regressions more easily. Still, note that this is for betas and releases only: we have plans for improving Nightly and Aurora reporting as well, but for now those stay with the "old" reports. Also, this is only the first stage, and small glitches are possible, though some of the more visible regressions were already fixed earlier today, as mentioned.
Getting this to work was not "just adding a line of SQL", as someone suggested to me some time ago. It required getting the necessary data into the correct tables, creating new data aggregation tables and mechanisms, fetching the needed data from the proper places, making the UI use the new aggregations and making other parts of the system play together properly with those changed reports. Many thanks to the Socorro team for getting all this done in time for today's Firefox release!
I hope the team now gets some good sleep and rest while we start to actually use their newest work, so they're fit for the future. After all, we have more requests for improvements coming their way as we're trying to get all the data we need for making Firefox even more stable - it's surely not a boring place to work at for any of us...
Entry written by KaiRo and posted on August 16th, 2011 22:42 | Tags: CrashKill, Mozilla, Socorro | no comments