What seemed like it would be a very short period of downtime on Monday has turned into a lengthy outage for Blockchain.info, the most popular bitcoin-related website, which offers a block chain explorer and a web-based wallet service.
The team attributes the outage (via the official Blockchain.info blog) to “a series of technical issues that originated from a database error and resulted in suspension of services.”
The team stresses that the service interruption was not the result of any malicious behavior (such as a hack), and notes that “It was simply the result of an obscure bug and a combination of technical factors that expressed this bug under heavy load.”
The detailed explanation, in Blockchain.info’s own words:
For those tech savvy people, the specific problem that caused our database cluster to crash was MySQL Error code 625 and 2352. The error in our database affected several tables, across several replicated copies of our database system, specifically those tracking information that is also contained in the Bitcoin blockchain ledger. We were able to diagnose the problem quickly, but because of the very large size of the data sets and the extent of the problem, the fix was not easy. The two specific error codes are only sparsely documented on the web. We decided to pursue multiple methods to try to fix the problem, including re-compiling a custom version of MySQL to get around the error, while simultaneously restoring databases from our backup systems onto new servers. Because of the large dataset, each step in testing, restoring or copying data can take a very long time, making the entire process slow.
Noting that Blockchain.info does not have access to users’ unencrypted keys, the team reassured the public that “at no time were the customer funds or wallets at risk of theft or loss.”
Some users have reported seeing double their actual balance in their wallets; Blockchain.info says it is aware of the issue and is repairing it.
Services were partially restored on Tuesday, with restoration work expected to continue for up to 72 hours.
The company says it is bringing more servers online to cope with increased traffic as services come back, and that it will publish a full, in-depth review once recovery is complete.