The roads I take...
KaiRo's weBlog
| Zeige Beiträge veröffentlicht am 13.10.2015 und auf Englisch an. Zurück zu allen aktuellen Beiträgen |
13. Oktober 2015
Shortening Crash Signatures: Dropping Argument Lists
Crash signatures derived from the function on the stack are how we group ("bucket") crashes as belonging to a certain issue or bug. They should be precise enough to identify this "bucket" but also short enough so we can handle it as a denominator in lists and when talking about those issues. For some time, we have seen that our signatures are very long-winded and contain parts that make it sometimes even harder to bucket correctly. To fix that, we set out to shorten our crash signatures.
We already completed a first step of this effort in June: After we found that templates in signatures were often fluctuating wildly in crashes that belonged to the same bug, all <sometemplate> parts of crash signatures were replaced by just <T>.
That made a signature like this (from bug 1045509, the [@ …] are our customary delimiters for signatures, not really part of the signature itself though):
[@ nsTArray_base<nsTArrayFallibleAllocator, nsTArray_CopyWithMemutils>::UsesAutoArrayBuffer() | nsTArray_Impl<unsigned char, nsTArrayFallibleAllocator>::SizeOfExcludingThis(unsigned int (*)(void const*)) ]
be shortended to:
[@ nsTArray_base<T>::UsesAutoArrayBuffer() | nsTArray_Impl<T>::SizeOfExcludingThis(unsigned int (*)(void const*)) ]
Which is definitely somewhat better to read and put in tables like topcrash reports, etc. - and we found it did not munge bugs together into the same signature more than previously, at least to our knowledge.
But we found out we can go even further: Different argument lists of functions (mostly due to overloading) did as far as I remember not help us distinguish any bugs in the >4 years I have been working with crashes - but patches changing types of arguments or adding one to a function often made us lose the connection between a bug and the signature. Therefore, we are removing argument lists from the signatures.
The signature listed above will turn out as:
[@ nsTArray_base<T>::UsesAutoArrayBuffer | nsTArray_Impl<T>::SizeOfExcludingThis ]
Today, we have run a script on Bugzilla (see bug 1178094) to update all affected bugs to add the new shortened signature to the Crash Signatures field without sending a ton of bugmail.
We have tested in the last weeks that Socorro crash-stats can create the new shortened signatures fine on their staging setup and that generation of the special "shutdownhang | …" signatures for browser processes that did take more than 60s to shut down and "OOM | …" for out-of-memory crashes do still work in all cases where they worked before.
As all preparation has been done, we will flip the switch on production Socorro crash-stats in the next days, and then those shortened signatures will be created everywhere.
Note that this will impede some stats that are comparing signatures across days, even though we will see to reprocess some crashes to make the watershed be at a UTC day delimiter so that as few stats as possible are disturbed by the change.
Please let me know of any issues with those changes (as well as any other questions about or issues with crash analysis), and thanks to Lars, Byron (glob) and others who helped with those changes!
We already completed a first step of this effort in June: After we found that templates in signatures were often fluctuating wildly in crashes that belonged to the same bug, all <sometemplate> parts of crash signatures were replaced by just <T>.
That made a signature like this (from bug 1045509, the [@ …] are our customary delimiters for signatures, not really part of the signature itself though):
[@ nsTArray_base<nsTArrayFallibleAllocator, nsTArray_CopyWithMemutils>::UsesAutoArrayBuffer() | nsTArray_Impl<unsigned char, nsTArrayFallibleAllocator>::SizeOfExcludingThis(unsigned int (*)(void const*)) ]
be shortended to:
[@ nsTArray_base<T>::UsesAutoArrayBuffer() | nsTArray_Impl<T>::SizeOfExcludingThis(unsigned int (*)(void const*)) ]
Which is definitely somewhat better to read and put in tables like topcrash reports, etc. - and we found it did not munge bugs together into the same signature more than previously, at least to our knowledge.
But we found out we can go even further: Different argument lists of functions (mostly due to overloading) did as far as I remember not help us distinguish any bugs in the >4 years I have been working with crashes - but patches changing types of arguments or adding one to a function often made us lose the connection between a bug and the signature. Therefore, we are removing argument lists from the signatures.
The signature listed above will turn out as:
[@ nsTArray_base<T>::UsesAutoArrayBuffer | nsTArray_Impl<T>::SizeOfExcludingThis ]
Today, we have run a script on Bugzilla (see bug 1178094) to update all affected bugs to add the new shortened signature to the Crash Signatures field without sending a ton of bugmail.
We have tested in the last weeks that Socorro crash-stats can create the new shortened signatures fine on their staging setup and that generation of the special "shutdownhang | …" signatures for browser processes that did take more than 60s to shut down and "OOM | …" for out-of-memory crashes do still work in all cases where they worked before.
As all preparation has been done, we will flip the switch on production Socorro crash-stats in the next days, and then those shortened signatures will be created everywhere.
Note that this will impede some stats that are comparing signatures across days, even though we will see to reprocess some crashes to make the watershed be at a UTC day delimiter so that as few stats as possible are disturbed by the change.
Please let me know of any issues with those changes (as well as any other questions about or issues with crash analysis), and thanks to Lars, Byron (glob) and others who helped with those changes!
Von KaiRo, um 20:14 | Tags: Bugzilla, CrashKill, Mozilla, Socorro | 2 Kommentare | TrackBack: 0