Friday, August 20, 2010

The Case of the Missing Megabytes

Ever wondered why a DVD with a capacity of 4.7 GB on the label can hold only about 4.38 gigs? Or why your 500 GB external hard disk maxes out at around 466 GB? I got curious one day and did a little rummaging around. Here's a thumbnail sketch of what I inferred.

The differences between the binary and decimal number systems are central to understanding the concept behind this.

To start with, computer main memory (RAM) is built on binary logic, so its sizes are expressed in powers of 2 rather than 10. Accordingly, the prefixes used for main memory have always carried their binary meaning. For example,

1 GB of RAM indicates

1 Gigabyte = 1024 Megabytes = 1024^2 (1,048,576) Kilobytes = 1024^3 (1,073,741,824) bytes of RAM
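If you'd like to check these figures yourself, a few lines of Python will do the powers of 1024 for you. This is just an illustrative sketch; the constant names (KIB, MIB, GIB) are my own labels, not anything standard.

```python
# Binary (RAM-style) interpretation: each step up is a factor of 1024
KIB = 1024          # bytes in a binary kilobyte
MIB = 1024 ** 2     # bytes in a binary megabyte
GIB = 1024 ** 3     # bytes in a binary gigabyte

print(GIB // MIB, "megabytes")   # 1024 megabytes
print(GIB // KIB, "kilobytes")   # 1048576 kilobytes
print(GIB, "bytes")              # 1073741824 bytes
```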

Unlike binary-addressed main memory, however, nothing about a disk drive's design pushes its total capacity toward a power of 1024. Hard disk drive manufacturers have used the decimal system to rate their products since at least 1974.

Thus, a DVD marketed as having 4.7 GB capacity should literally have a storage capacity of

4.7 Gigabytes = 4.7 X 10^9 = 4,700,000,000 bytes (using the SI definition for the prefix ‘Giga’).

Why, then, does a DVD show up in Windows as having only 4.38 GB capacity after formatting?

Well, the answer lies in the question here: it’s due to the Windows OS.

Specifically, it comes down to how Windows counts the bytes on a storage medium. When Windows reports the size of a medium, it expresses it in the binary sense, just as it does for RAM.

Thus 4.7 GB is still 4,700,000,000 bytes, but when converted back to the 'Giga' order in the binary fashion, it yields:

4,700,000,000 / 1024^3 = 4.377 GB, or approx. 4.38 GB

Similarly, a 500 GB hard disk would show up as 465.66 GB or about 466 GB when looked at ‘binari-ly’.
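Here's a minimal Python sketch of the conversion above, assuming the label figure uses 1 GB = 10^9 bytes and the OS divides by 1024^3. The function name decimal_to_binary_gb is hypothetical, just for illustration.

```python
def decimal_to_binary_gb(capacity_gb: float) -> float:
    """Convert a manufacturer's decimal capacity (1 GB = 10**9 bytes)
    into the binary figure an OS like Windows reports (1 GB = 1024**3 bytes)."""
    total_bytes = capacity_gb * 10**9      # what the label actually promises
    return total_bytes / 1024**3           # the same bytes, re-expressed in powers of 1024

for label in (4.7, 500):
    print(f"{label} GB on the label -> {decimal_to_binary_gb(label):.2f} GB in Windows")
```

Run as-is, it prints 4.38 GB for the DVD and 465.66 GB for the 500 GB disk, matching the figures above.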

Why this may be more of an issue in the future…

Currently, the most common unit for digital storage capacity is probably the gigabyte.

For 1 gigabyte, the percentage difference between the SI (metric/decimal) and binary versions is:

[ (1024^3 – 1000^3) / 1000^3 ] X 100% = 7.37%

Calculating the same difference for other orders of magnitude:

Prefix   Decimal (SI)       Binary             Difference (%)
Kilo     1000   (10^3)      1024   (2^10)      2.4
Mega     1000^2 (10^6)      1024^2 (2^20)      4.9
Giga     1000^3 (10^9)      1024^3 (2^30)      7.4
Tera     1000^4 (10^12)     1024^4 (2^40)      10.0
Peta     1000^5 (10^15)     1024^5 (2^50)      12.6
Exa      1000^6 (10^18)     1024^6 (2^60)      15.3
Zetta    1000^7 (10^21)     1024^7 (2^70)      18.1
Yotta    1000^8 (10^24)     1024^8 (2^80)      20.9
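The table can be regenerated with a short loop. The sketch below assumes the same definition as the 1 GB calculation (binary unit minus SI unit, divided by the SI unit); percent_difference is a name I've made up for the example.

```python
PREFIXES = ["Kilo", "Mega", "Giga", "Tera", "Peta", "Exa", "Zetta", "Yotta"]

def percent_difference(power: int) -> float:
    """How much larger the binary unit (1024**power) is than the SI unit (1000**power), in percent."""
    return (1024**power - 1000**power) / 1000**power * 100

# Kilo is power 1, Mega is power 2, and so on up to Yotta (power 8)
for power, prefix in enumerate(PREFIXES, start=1):
    print(f"{prefix:<6} {percent_difference(power):5.1f} %")
```

Because Python's integers never overflow, the powers are computed exactly even at the Yotta scale; only the final division produces a float.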

As time progresses, data storage capacities are burgeoning exponentially. Thus, if the ambiguity continues, the gap between the reported (binary) and actual (decimal) numbers will keep widening, and fast.

Why has nothing been done to resolve all the confusion?

Interestingly, steps have been taken! In 2000, the IEEE (Institute of Electrical and Electronics Engineers) adopted the IEC (International Electrotechnical Commission) proposal for a new set of binary prefixes referring to powers of 1024. SI prefixes would then refer unambiguously to powers of 1000 (i.e., the decimal sense), even when used for data storage capacities.

For example, under the new rules, 1 Megabyte (1 MB) refers to 1 million (1000^2) bytes, whereas 1 Mebibyte (1 MiB) refers to 1024^2 bytes. The 'Me' part is drawn from the SI prefix Mega, and 'bi' denotes the binary sense; the other orders of magnitude follow the same pattern. Here are the prefixes and symbols for the first few orders of magnitude in both systems:

Decimal system:
10^1 = 10
10^3 = 1000 (kilo, k)
10^6 = 1000000 (mega, M)
10^9 = 1000000000 (giga, G)

Binary system:
2^1 = 2
2^10 = 1024 (binary kilo; kibi, Ki)
2^20 = 1048576 (binary mega; mebi, Mi)
2^30 = 1073741824 (binary giga; gibi, Gi)
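To make the two families concrete, here's a hypothetical size parser in Python that treats 'GB' as a power of 1000 and 'GiB' as a power of 1024. It's a sketch under simplifying assumptions: only the handful of units in the MULTIPLIERS table (my own naming) are recognised, and anything else is rejected.

```python
import re

# Multipliers for both prefix families: SI (powers of 1000) and IEC binary (powers of 1024)
MULTIPLIERS = {
    "B": 1,
    "kB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
}

def parse_size(text: str) -> int:
    """Turn a string such as '4.7 GB' or '4 GiB' into a byte count.
    Only the units listed in MULTIPLIERS are recognised (a simplifying assumption)."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]+)\s*", text)
    if not match or match.group(2) not in MULTIPLIERS:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return round(float(number) * MULTIPLIERS[unit])
```

For instance, parse_size('1 MB') gives 1,000,000 while parse_size('1 MiB') gives 1,048,576, so the unit symbol alone settles which interpretation is meant.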

So if there is such a lucid solution, why isn't it visible everywhere? Even though the standard has been set, the IT industry and the press have yet to adopt it in a big way.

However, the new nomenclature is starting to appear in the EU computing industry and marketplace (as is required by a law passed in 2007) and certain US and International Government contexts.

Also, the newest version of Mac OS X, Snow Leopard (10.6), displays the capacity of any data storage device using the decimal system. Some Linux distributions are beginning to display disk sizes in a very clear-cut manner; for example, your 500 gig external HDD would show up as having a capacity of 500 GB / 466 GiB.

Windows, however, has been showing no signs of adopting the new system or using the appropriate prefixes where necessary.

Winding up…

Lawsuits have been filed against disk drive manufacturers (e.g., Western Digital) claiming that they use the terms megabyte and gigabyte incorrectly, and that they should use the traditional definition of 1 MB = 1024^2 bytes instead of 1 MB = 1 million bytes. Even though the manufacturers are technically correct (IEEE and IEC standards define a megabyte as a million bytes), the plaintiffs argue that this definition has not been adopted by the industry at large.

And then there are the multitudes who accuse Microsoft of not implementing the changes in the disk capacity nomenclature within the Windows OS. They are in favour of using, say, GiB instead of GB in reference to disk drive capacity.

So where does this argument go at the end of the day?

I'd say the simplest solution would be for Windows to take a cue from Linux and display sizes from both the decimal and binary perspectives (in terms of both GB/MB and GiB/MiB, respectively). Others may prefer a different approach. Whatever the fix, something should be implemented sooner rather than later.
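As a rough sketch of that dual display, here's a hypothetical helper (describe_capacity is my own name, not a Windows or Linux API) that formats one byte count both ways:

```python
def describe_capacity(num_bytes: int) -> str:
    """Report a byte count in both decimal (GB) and binary (GiB) units, side by side."""
    gb = num_bytes / 1000**3    # SI gigabytes
    gib = num_bytes / 1024**3   # IEC gibibytes
    return f"{gb:.2f} GB / {gib:.2f} GiB"

print(describe_capacity(500 * 1000**3))
```

For a 500 GB disk it prints 500.00 GB / 465.66 GiB, so both the label figure and the binary figure are visible at once.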

________________________________________________________________________________________________________

P.S. In Windows, a disk drive's Properties window displays the size in bytes alongside the GB value. You can use this to work out the decimal size of the disk.


For example, one drive's Properties window shows a capacity of 39 GB and 41,943,089,152 bytes. That byte count is about 42 GB in decimal terms, so the drive's size is 42 GB = 39 GiB.
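The same arithmetic can be applied to the byte count from that Properties window. This short Python sketch simply divides by 10^9 and by 2^30:

```python
size_bytes = 41_943_089_152   # byte count shown in the Properties window

decimal_gb = size_bytes / 10**9     # manufacturer-style gigabytes
binary_gib = size_bytes / 2**30     # what Windows labels "GB"

print(f"{decimal_gb:.2f} GB (decimal) = {binary_gib:.2f} GiB (binary)")
```

It prints 41.94 GB (decimal) = 39.06 GiB (binary).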
