
After Skype suffered a massive outage last week, CIO Lars Rabbe has now detailed what went wrong.
One of the root causes? A bug in the Skype for Windows client (version 5.0.0.152).
Rabbe kicks off by explaining that a cluster of support servers responsible for offline instant messaging became overloaded on Wednesday, December 22.
A number of Skype clients subsequently started receiving delayed responses from said overloaded servers, which weren’t properly processed by the Windows client in question. This ultimately caused the affected version to malfunction.
Initially, users of Skype’s newer and older Windows software, as well as those using the service on Mac, iPhone and their television sets, were unaffected.
Nevertheless, the whole system collapsed because the faulty version of the Windows client, 5.0.0.152, was by far the most popular – Rabbe says 50% of all Skype users globally were running it, and the crashes caused approximately 40% of those clients to fail.
The failed clients included 25–30% of all publicly available supernodes, which went down as a result of the same issue.
From the blog post:
A supernode is important to the P2P network because it takes on additional responsibilities compared to regular nodes, acting like a directory, supporting other Skype clients and establishing connections between them by creating local clusters of several hundred peer nodes per each supernode.
Once a supernode has failed, even when restarted, it takes some time to become available as a resource to the P2P network again. As a result, the P2P network was left with 25–30% fewer supernodes than normal. This caused a disproportionate load on the remaining available supernodes.
Rabbe goes on to explain that many people whose Windows clients crashed started restarting the software, which caused a huge increase in the load on Skype’s P2P cloud network. He adds that traffic to the supernodes was about 100 times what would normally be expected at the time of day the failure occurred.
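To put the reported percentages in perspective, here is a rough back-of-the-envelope sketch in Python. It is not Skype's own model: the function names are made up for illustration, and it assumes that load spreads evenly across the surviving supernodes and ignores the restart surge described above.

```python
def crashed_share(share_on_version: float, crash_rate: float) -> float:
    """Fraction of ALL clients that crashed.

    share_on_version: fraction of users on the faulty version (reported: 0.50)
    crash_rate: fraction of those clients that failed (reported: 0.40)
    """
    return share_on_version * crash_rate


def load_multiplier(fraction_lost: float) -> float:
    """Per-supernode load factor after losing a fraction of supernodes.

    Assumption (not from the blog post): the same total load is shared
    evenly by the supernodes that remain.
    """
    if not 0.0 <= fraction_lost < 1.0:
        raise ValueError("fraction_lost must be in [0, 1)")
    return 1.0 / (1.0 - fraction_lost)


if __name__ == "__main__":
    print(f"Share of all clients that crashed: {crashed_share(0.50, 0.40):.0%}")
    for lost in (0.25, 0.30):
        print(f"{lost:.0%} fewer supernodes: {load_multiplier(lost):.2f}x load each")
```

Under those assumptions, about 20% of all Skype clients crashed, and each remaining supernode would have carried roughly 1.33 to 1.43 times its usual load even before the wave of restarts hit.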
A perfect storm in the P2P clouds, so to speak.
To learn how Skype supported the recovery of its supernode network, and what they’ll be doing to prevent this from happening again, I suggest you go read the full blog post.
And major kudos to the company for being so forthcoming in explaining what happened.