UPDATE: ZFS dedup finally integrated!
With integration of this RFE we are closer (hopefully) to ZFS buil-in de-duplication. Read more on Eric's blog.
Once ZFS re-writer and de-duplication are done in theory one should be able to do a zpool upgrade of current pool and de-dup all data which is already there... we will see :)
Eric mentioned on his blog that in reality we should use sha256 or stronger. I would go even further - two modes, one mode you depend entirely on block checksum and the other one where you actually compare byte-by-byte given block to be 100% sure they are the same. Slower for some workloads but safe.
Now, de-dup which "understands" your data would be even better (analyzing file contents - like emails, attachments and de-dup on attachment level, etc.), nevertheless block level one would be a good start.