This test is part of the RealDB project. See the site at http://www.gillius.org/realdb/

Environment:
 * Generic "ALLIN1 2.0" multi card reader connected by USB 2.0
 * Windows XP SP2
 * SanDisk 128MB CompactFlash card, probably purchased circa mid 2004, copyright date on label is 2000
   * Formatted FAT16
 * Block file with 16384 blocks of 256 bytes each, mode "rwd"
 * Revisions 65 and 66 of CorruptionTest

This newest version of CorruptionTest prints the time each operation took and allows the file mode to be configured. For these tests, I wanted to use the "rwd" setting, which requests that writes be done synchronously. I'm assuming this essentially means that a write has an implicit flush and sync to disk before returning. The API says that supporting this is optional, but on Windows it is definitely doing something, as an "rwd" clear takes 69.583 seconds versus an "rw" write, which takes 3.694 seconds. Similar effects occur on local magnetic drives.
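
A minimal sketch of how the two modes could be compared, assuming the file mode setting is passed straight to Java's RandomAccessFile (whose mode argument accepts "rw" and "rwd"). The class and method names are hypothetical and not taken from CorruptionTest; the loop only writes zero-filled blocks and times the whole pass.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

final class ModeTiming {
    /** Writes numBlocks zero-filled blocks of blockSize bytes and returns the elapsed seconds. */
    static double timeClear(File file, String mode, int numBlocks, int blockSize) {
        byte[] block = new byte[blockSize];
        long start = System.nanoTime();
        try (RandomAccessFile raf = new RandomAccessFile(file, mode)) {
            for (int i = 0; i < numBlocks; i++) {
                raf.seek((long) i * blockSize);
                // In "rwd" mode each update to the file content is requested
                // to be written synchronously to the storage device.
                raf.write(block);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical target; point this at a file on the card to reproduce the comparison.
        File file = File.createTempFile("modetiming", ".bin");
        file.deleteOnExit();
        for (String mode : new String[] {"rw", "rwd"}) {
            double seconds = timeClear(file, mode, 16384, 256);
            System.out.printf("mode %-3s clear took %.3f seconds%n", mode, seconds);
        }
    }
}
```

The numbers will depend heavily on the device and the OS, so this is only useful for comparing the two modes on the same target.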

I ran the test and pulled the entire USB card reader device out of the USB port during the write. Each dot below is 2% progress.

Creating file with timestamp of Wed Jul 16 22:15:28 EDT 2008
...........20%..........40%..........60%..........80%..........100%
Completed clear in 69.582820953 seconds
validate
...........20%..........40%..........60%..........80%..........100%
Blocks good=0, bad_data=0, bad_block_num=0, clear=16384
New timestamp Wed Jul 16 22:15:28 EDT 2008 found at block 0
Completed validate in 0.095350881 seconds
write
Creating file with timestamp of Wed Jul 16 22:21:25 EDT 2008
...........20%..........40%..........60%...Exception in thread "main" java.io.IOException: The system cannot find the file specified

validate
...........20%..........40%..........60%..........80%..........100%
Blocks good=11016, bad_data=0, bad_block_num=0, clear=5368
New timestamp Wed Jul 16 22:21:25 EDT 2008 found at block 0
New timestamp Wed Jul 16 22:15:28 EDT 2008 found at block 11016
Completed validate in 1.433354049 seconds

In this test, no corruption resulted. 67% of the blocks (11016 of 16384) were good, which corresponds with the 66-68% reading on the write progress bar.

I modified the test to print the block that failed when the exception occurs, to determine whether truly only 1 block was lost (Revision 66). The file left over from above was used, so it had 11016 written and 5368 clear blocks. I pulled out the entire USB device during a write:

write
Creating file with timestamp of Wed Jul 16 22:29:49 EDT 2008
..........Write failed while writing block 3211

validate
...........20%..........40%..........60%..........80%..........100%
Blocks good=11016, bad_data=0, bad_block_num=0, clear=5368
New timestamp Wed Jul 16 22:29:49 EDT 2008 found at block 0
New timestamp Wed Jul 16 22:21:25 EDT 2008 found at block 3212
New timestamp Wed Jul 16 22:15:28 EDT 2008 found at block 11016
Completed validate in 1.432589887 seconds

This means that block 3211 was actually written: the previous run's timestamp first appears at block 3212, so blocks 0 through 3211 all carry the new timestamp. Block 3211 was the block being written in the actual write call to the file, and because the CRC passed and the timestamp was right, the block was valid and written, which means that I must have pulled the device out during the actual call to the method, and not in between calls. So, I am surprised not to see corruption. It's possible that there is some logic in the card to keep blocks secure.
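
The "New timestamp ... found at block N" lines can be read as the points where the timestamp changes from one block to the next. The following is a small sketch of that scan, not the actual CorruptionTest code: the class name is hypothetical, and the values 3, 2 and 1 are stand-ins for the three timestamps in the output above.

```java
import java.util.ArrayList;
import java.util.List;

final class TimestampScan {
    /** Returns every block index whose timestamp differs from the block before it (block 0 always counts). */
    static List<Integer> boundaries(long[] blockTimestamps) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < blockTimestamps.length; i++) {
            if (i == 0 || blockTimestamps[i] != blockTimestamps[i - 1]) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in timestamps: 3 = interrupted write, 2 = previous write, 1 = original clear.
        long[] ts = new long[16384];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = i <= 3211 ? 3 : (i < 11016 ? 2 : 1);
        }
        System.out.println(boundaries(ts)); // prints [0, 3212, 11016]
    }
}
```

If block 3211 had not made it to the card, the second boundary would have been reported at 3211 rather than 3212.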

Next, using smaller files, I wanted to repeat the test multiple times, reallocating the file each time. Block size 256, number of blocks 1024 (256KB file). Each time I did:

allocate 1024
clear
write

Pulled CF out of socket during write, left device in: failed at 505. Validate failed reading the drive halfway into the file, and stalled for over a minute. I had to pull out the USB device to stop the process.
Pulled CF out of socket during write, left device in: failed at 424. Validate read 425 good blocks (blocks 0 to 424).
Pulled CF out of socket during write, left device in: failed at 823. Validate read 824 good blocks, so block 823 was written.
Pulled CF out of socket during write, left device in: failed at 339. Validate read 340 good blocks.
Pulled CF out of socket during write, left device in: failed at 455. Validate read 456 good blocks.

In 4 out of 5 cases, the block I was writing when it failed actually made it out to disk. In one case, the OS stalled trying to read the file past the corruption point. I don't know how this is possible unless the FAT file system is modified whenever the file is written. My hope was that it is not -- that a fixed size file would reduce the possibility of corruption by a bad FS (which I cannot control).

----------------------

In an attempt to cause corruption, I increased the block size to 4096, in the hopes that the block size is larger than what the hardware can handle atomically. I should see partially written blocks when I pull the card out, because even if 256 or maybe 512 bytes are correct, the CRC will still fail as a whole.
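
To illustrate why a partially written block should be caught, here is a sketch of a block with a CRC in its last bytes and a simulated torn write. The exact layout (8-byte timestamp, 4-byte block number, random payload, 4-byte CRC32) and all names are assumptions made for this example; only the general idea of timestamp and block number first, CRC last, comes from these notes.

```java
import java.nio.ByteBuffer;
import java.util.Random;
import java.util.zip.CRC32;

final class BlockCodec {
    static final int HEADER = 8 + 4;  // assumed: timestamp (long) + block number (int)
    static final int TRAILER = 4;     // assumed: CRC32 in the last four bytes

    /** Builds one block: header, random payload, then a CRC32 over everything before it. */
    static byte[] encode(long timestamp, int blockNum, int blockSize, Random rng) {
        byte[] payload = new byte[blockSize - HEADER - TRAILER];
        rng.nextBytes(payload);
        ByteBuffer buf = ByteBuffer.allocate(blockSize);
        buf.putLong(timestamp).putInt(blockNum).put(payload);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, blockSize - TRAILER);
        buf.putInt((int) crc.getValue());
        return buf.array();
    }

    /** Recomputes the CRC32 and compares it with the value stored at the end of the block. */
    static boolean isValid(byte[] block) {
        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length - TRAILER);
        int stored = ByteBuffer.wrap(block).getInt(block.length - TRAILER);
        return stored == (int) crc.getValue();
    }

    /** Simulates a torn write: the first `written` bytes come from newBlock, the rest from oldBlock. */
    static byte[] tear(byte[] oldBlock, byte[] newBlock, int written) {
        byte[] torn = oldBlock.clone();
        System.arraycopy(newBlock, 0, torn, 0, written);
        return torn;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        byte[] oldBlock = encode(1L, 7, 4096, rng);
        byte[] newBlock = encode(2L, 7, 4096, rng);
        byte[] torn = tear(oldBlock, newBlock, 512);  // only the first 512 bytes reached the card
        System.out.println("old valid:  " + isValid(oldBlock));
        System.out.println("new valid:  " + isValid(newBlock));
        System.out.println("torn valid: " + isValid(torn));
    }
}
```

The torn block has a new header and the old CRC, so the check is expected to fail except in the roughly 1 in 2^32 case of a CRC collision.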

On a 4MB file, the clear time in "rwd" mode drops from 69.583 seconds to 3.791 seconds. I expected it to be faster since there is less waiting for syncs to occur. Bigger transactions usually mean more throughput.
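
A quick back-of-the-envelope check of the "less waiting for syncs" idea, assuming the 69.583 second clear was 16384 writes of 256 bytes and the 3.791 second clear was 1024 writes of 4096 bytes (the same 4MB either way). The helper names are hypothetical.

```java
final class SyncCost {
    /** Average time per write call, in milliseconds. */
    static double millisPerWrite(double totalSeconds, int numWrites) {
        return totalSeconds * 1000.0 / numWrites;
    }

    /** Overall throughput in bytes per second. */
    static double bytesPerSecond(long fileBytes, double totalSeconds) {
        return fileBytes / totalSeconds;
    }

    public static void main(String[] args) {
        long fileBytes = 4L * 1024 * 1024;
        System.out.printf("256-byte blocks:  %.2f ms/write, %.0f bytes/s%n",
                millisPerWrite(69.583, 16384), bytesPerSecond(fileBytes, 69.583));
        System.out.printf("4096-byte blocks: %.2f ms/write, %.0f bytes/s%n",
                millisPerWrite(3.791, 1024), bytesPerSecond(fileBytes, 3.791));
    }
}
```

Under those assumptions each synchronous write costs about 4.2 ms with 256-byte blocks and about 3.7 ms with 4096-byte blocks, so the per-write cost is nearly constant and the speedup comes almost entirely from making 16 times fewer writes.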

Pulled CF out of socket during write, left device in: failed at 346. I read 347 good blocks.
Pulled CF out of socket during write, left device in: failed at 475. I read 476 good blocks.

I am still amazed that there is no corruption. I increased block size to 64K (65536 bytes). I can't imagine the card could atomically commit this much data on a power loss. The clear time is reduced to 1.962 seconds.

Pulled CF out of socket during write, left device in: failed at 26. I read 27 good blocks.
Pulled CF out of socket during write, left device in: failed at 52. I read 53 good blocks.

I do not believe these results. I can't believe that the drive could write that much data out. Every block I write contains the timestamp, block number, and random data, so this can't be a trick of reading "old" data from previous tests on the card. I even tried removing the device from Windows, thinking it would clear any possible cache and power cycle the reader. The validate does take 1.4 seconds to run the first time, and 0.02 afterwards, so the evidence suggests that the first read is not cached.

----------------------

One last attempt! 1MB block size (1048576 bytes)! 8 blocks!

clear time = 4.355578477
write failed while writing block 3.
I got 4 good blocks and 4 clear blocks. So somehow, again, the write succeeded. The CRC is at the end of the block, so... I don't get it, unless the device/card has a huge buffer and capacitance to continue writing after I pull the card out.

----------------------

I turned off my computer for the night, and the next day I turned it on and reran the validate on the file from last night. Still, 4 blocks were good:

validate
...........20%..........40%..........60%..........80%..........100%
Blocks good=4, bad_data=0, bad_block_num=0, clear=4
New timestamp Wed Jul 16 23:07:03 EDT 2008 found at block 0
New timestamp Wed Jul 16 23:06:02 EDT 2008 found at block 4
Completed validate in 2.714040474 seconds

So, even after a reboot of the OS, the file still reads fine, so the OS can't be doing any buggy caching between insertions of the card. I cannot believe that the flash is programmed in 1MB blocks at a time, and I can't believe that the write cache on the card is anywhere near 1MB, so that it could hold all of that uncommitted data in memory and still have almost half a second of charge left after I unplug it to finish committing the data. I know that the blocks in the FAT FS are not that big -- but even if the OS had some mechanism whereby it doesn't show the file as containing a block until I write it, my program failed on block 3 (the 4th block)! And yet, block 3 has a valid CRC at the end and a valid block number and timestamp at the beginning.

So, one of three things is true:
 1. My assumptions on caching and charge are wrong.
 2. My code is somehow wrong and I see what looks like correct data even though it is not (for example bad handling of CRC).
 3. The card actually writes quickly and the OS spends most of the time waiting on the "sync" operation at the end.
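
One way the third possibility might be checked is to open the file in "rw" mode and time the write call and the explicit sync separately. This is a sketch under that assumption, with hypothetical class and method names; it uses FileDescriptor.sync() to force the data out after the write returns.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

final class WriteSyncSplit {
    /** Returns {seconds spent in write(), seconds spent in sync()} for one block written in "rw" mode. */
    static double[] timeWriteThenSync(File file, long offset, byte[] block) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.seek(offset);
            long t0 = System.nanoTime();
            raf.write(block);      // may return once the OS has buffered the data
            long t1 = System.nanoTime();
            raf.getFD().sync();    // asks the OS to push buffered data to the device
            long t2 = System.nanoTime();
            return new double[] {(t1 - t0) / 1e9, (t2 - t1) / 1e9};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical target; point this at a file on the card for a real measurement.
        File file = File.createTempFile("writesync", ".bin");
        file.deleteOnExit();
        double[] t = timeWriteThenSync(file, 0L, new byte[1048576]);
        System.out.printf("write: %.6f s, sync: %.6f s%n", t[0], t[1]);
    }
}
```

If the sync portion dominates, that would be consistent with the data reaching the card early and the program mostly waiting on the sync. It would not prove it, since the OS may still be holding the data in its own buffers during the write call.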