Thursday, February 22, 2007

Disk failures in data center

A really interesting paper from Google about disk failures in a large disk population (over 100k drives). It also looks at whether SMART can be reliably used to predict disk failures.

There's another paper as well, presented by people from Carnegie Mellon University.

Some observations are really surprising. For example, a disk is more likely to fail if it's running below 20C than if it's running at about 40C. Another interesting thing is that SATA disks seem to have a similar ARR (annual replacement rate) to FC and SCSI disks.

Anyone managing "disks" should read those two papers.

Update: NetApp's response to the above papers.
Also, an interesting paper from Seagate and Microsoft on SATA disks.

This time IBM responded.

