Talk:Byte: Difference between revisions

From Citizendium
imported>Greg Woodhouse
m (significant typo)
imported>Joshua David Williams


It may be that we're on the cusp of a significant shift in terminology (rather like the new consensus that Pluto isn't a planet), but you're still going against the grain of industry practice here, and you do need to be careful not to use CZ articles for advocacy. [[User:Greg Woodhouse|Greg Woodhouse]] 22:41, 13 April 2007 (CDT)
:What do you think of this?
<blockquote>===Conflicting definitions===
{{main|Binary prefix}}
Traditionally, the computer world has used a value of 1024 instead of 1000 when referring to a kilobyte. This was done because programmers needed a number compatible with the base of 2, and 1024 is equal to 2 to the 10th [[Exponentiation|power]]. Due to the large confusion between these two meanings, an effort has been made by the [[International Electrotechnical Commission]] (IEC) to remedy this problem. They have standardized a new system called the '[[binary prefix]]', which replaces the word 'kilobyte' with 'ki'''bi'''byte', abbreviated as KiB. This solution has since been approved by the [[IEEE]] on a trial-use basis, and may prove to one day become a true standard.<ref>{{cite web
| url=http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?tp=&isnumber=26611&arnumber=1186538&punumber=8450
| title=IEEE Trial-Use Standard for Prefixes for Binary Multiples
| date=Accessed April 14th, 2007
}}</ref>
<br><br>
While the difference between 1000 and 1024 may seem trivial, one must note that as the size of a disk increases, ''so does the margin of error''. The difference between 1TB and 1TiB, for instance, is approximately 10%. As hard drives become larger, the need for a distinction between these two prefixes will grow. This has been a problem for hard disk drive manufacturers in particular. For example, one well known disk manufacturer, [[Western Digital]], has recently been taken to court for their use of the base of 10 when labeling the capacity of their drives. This is a problem because labeling a hard drive's capacity with the base of 10 implies a greater storage capacity when the consumer may assume it refers to the base of 2. <ref>{{cite web
| url=http://www.betanews.com/article/Western_Digital_Settles_Capacity_Suit/1151510648
| title=Western Digital Settles Capacity Suit
| author=Nate Mook
| date=2006-06-28
}}</ref></blockquote>
--[[User:Joshua David Williams|Joshua David Williams]] 21:46, 14 April 2007 (CDT)

Revision as of 20:46, 14 April 2007


Article Checklist for "Byte"
Workgroup category or categories Computers Workgroup [Editors asked to check categories]
Article status Developed article: complete or nearly so
Underlinked article? Yes
Basic cleanup done? Yes
Checklist last edited by Joshua David Williams 21:57, 12 April 2007 (CDT); Eric M Gearhart 16:52, 6 April 2007 (CDT)

To learn how to fill out this checklist, please see CZ:The Article Checklist.





missing on purpose?

Hi, I miss info about the small and big-endian. It should IMHO be part of the byte story. Robert Tito |  Talk  20:54, 6 April 2007 (CDT)

I did not mention it because, quite honestly, that's largely outside my scope of knowledge. If you're knowledgeable in that area, we would appreciate a contribution to the article :) --Joshua David Williams 21:03, 6 April 2007 (CDT)
Edit - I did not realize you're an editor when I wrote that. If you're busy, I could find another user to help (Eric may be able to). In answer to your question, no, it was not excluded purposely - that is, to not include it at all. --Joshua David Williams 21:06, 6 April 2007 (CDT)

what is it

Big and small-endian refer to the 'sign'-bit: in big-endian it is at the end of the byte, in small at the beginning (or the other way around - I still look that up). It is used in diverse protocols to discriminate them from others. The best known are IPX/SPX versus TCP/IP. It took quite some problem-solving for Cisco to let these two networks communicate without problems (it gave rise to their IOS version 13 and above - created when I was on the phone with them.) Signs of bytes are of importance for the variables needed to transfer specific information. Robert Tito |  Talk  21:43, 6 April 2007 (CDT)

Should this be a separate article that deserves a mention on Byte? Remember we don't want to overwhelm the average person with too much info stuffed into the Byte article --Eric M Gearhart 04:07, 7 April 2007 (CDT)

I think it is more relevant than all the prefixes as it IS info within a Byte. Robert Tito |  Talk  09:01, 7 April 2007 (CDT)

See this jargon? That's exactly what we want to avoid. I can't make heads or tails of it. Could someone please explain this in layman's terms? --Joshua David Williams 09:47, 7 April 2007 (CDT)

Big and small endians are only the way to tell the machine what the sign of the byte is: signed (+) or unsigned. Signed means only positive values are allowed. Unsigned means the whole range of number space can be used. If your address space allows for file sizes up to 4 GB and you use an unsigned int to address it you CAN access that space. Using a signed variable allows you to address only 2 GB. Endian types only state where that sign is stored in the byte: the low bit or the high bit. Nothing more, nothing less. Some compilers predominantly use big-endian variables, others small-endian. Windows and Unix in general use the two different styles. Robert Tito |  Talk 

I don't think I'd put it that way. First of all, the sign bit is a function of how integer values are encoded, not of bytes themselves. Big endian means most significant bit first, and little endian means least significant bit first. We write numbers in big endian form because 24 is 20 + 4, and the most significant digit comes first. Of common architectures, the i386 (including the Pentium etc.) is little endian, and virtually everything else is big endian. Oh, and you might want to mention the connection to "Gulliver's Travels". Greg Woodhouse 22:40, 12 April 2007 (CDT)
I think I'm finally starting to understand this concept clearly. I'm going to re-write the endianness section of the article to make it a bit clearer, especially of what the "most significant byte" is. --Joshua David Williams 22:49, 12 April 2007 (CDT)
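
For anyone following along, here is a minimal Python sketch of the distinction discussed above. It rests on one assumption that goes slightly beyond the comment: in practice endianness is most often described as the order of the bytes within a multi-byte value, so that is what it shows. The value 0x12345678 is an arbitrary example.

```python
# Arbitrary 32-bit example value; any multi-byte integer would do.
value = 0x12345678

# Serialize the same integer into 4 bytes in each byte order.
big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# Both byte sequences decode to the same number when read with the matching order.
assert int.from_bytes(big, "big") == int.from_bytes(little, "little") == value
```

The sign of the number plays no part here: the same four bytes are simply stored in opposite orders.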

OK I will try and work in a one-liner on Byte, something like an "also worth mentioning is whether a Byte is big-endian or little-endian" and a link to an Endianness article.. maybe with Big endian and Little endian redirecting to it.

And yea holy crap the Wikipedia article looks more like "Look at me I can write terse technical articles" rather than striving to be reachable to the masses.

To clarify on Robert's signed versus unsigned example: You would use an "unsigned" variable for a file system, because you're only going to deal with positive numbers. You would use a "signed" (meaning it can be positive or negative) variable when talking about a number that can be from -2 to positive 2 (for example).
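
As a rough illustration of the signed versus unsigned point, the sketch below assumes 32-bit integers (the width behind the 2 GB and 4 GB figures mentioned earlier) and uses Python's standard `struct` module to show the two ranges. The variable names are only for illustration.

```python
import struct

BITS = 32  # assumed width, matching the 2 GB / 4 GB figures in the discussion

unsigned_max = 2**BITS - 1                 # 4294967295
signed_min = -(2**(BITS - 1))              # -2147483648
signed_max = 2**(BITS - 1) - 1             # 2147483647

# The largest unsigned value fits the unsigned format "I" ...
packed = struct.pack("<I", unsigned_max)

# ... but is out of range for the signed format "i".
try:
    struct.pack("<i", unsigned_max)
    overflowed = False
except struct.error:
    overflowed = True

# The same four bytes read back as a signed integer give -1 (two's complement).
reinterpreted = struct.unpack("<i", packed)[0]
print(unsigned_max, signed_max, overflowed, reinterpreted)
```

Signedness is a property of how the bits are interpreted, which is independent of the byte order used to store them.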

In very very simple terms, "big endian" means you're placing importance on the leftmost numbers first. "Little endian" means you're placing importance on the rightmost numbers first.

For example: Networks generally use big-endian order; the historical reason is that this allowed routing while a telephone number was being composed.

757-421-2233 is big endian, because first comes the area code (Virginia), then 421 is the prefix (Norfolk), and then the last four numbers actually get you to the specific house.

That's the type of explanation we need in the Endianness article in my opinion --Eric M Gearhart 10:43, 7 April 2007 (CDT)

Not totally true but nice as metaphor. Robert Tito |  Talk  11:29, 7 April 2007 (CDT)

bigger better?

Both LaCie and Iomega have single-disk enclosures out with disks of 1 TB below US$500. The density of the data, however, is so high that these disks cannot be used without solid error correction. Bigger is not always better, at most easier. Robert Tito |  Talk  09:36, 7 April 2007 (CDT)

What else should be added?

I did a bit of research on the topic of endianness and added a section for it. If anything I said is inaccurate, please correct it. Also, what else should we add? --Joshua David Williams 19:12, 12 April 2007 (CDT)

Hexer image

Should Image:Hexer.png be in this article? I'm not sure since it shows the data in hexadecimal format instead of binary. --Joshua David Williams 19:19, 12 April 2007 (CDT)

Well bytes can be represented in Hex or binary (or octal or decimal or...). I'd say that the caption of the image should reflect that "these values represent bytes in Hexadecimal." --Eric M Gearhart 20:02, 12 April 2007 (CDT)

Integers

Is this the place to discuss how signed values are encoded (i.e., one's complement vs. two's complement)? Greg Woodhouse 22:42, 12 April 2007 (CDT)

kibibyte?

I'd like to hear what other editors have to say, but kibibyte sounds like a neologism that never really gained acceptance. Certainly, I've never heard it used. A Google search did turn up an interesting page though, [1]. Apparently, there actually was a proposal circulated some years ago, but I don't know how far it went. As a general rule, powers of 2 are used for disk storage. For example a typical block size on modern filesystems is 4K, meaning 4096 bytes, not 4000. On the other hand, data rates are always expressed in powers of 10. The 10 in 10base-T means 10 megabits per second, and the nominal data rate for 100base-T is 100 megabits per second. Greg Woodhouse 23:18, 12 April 2007 (CDT)

Okay, here you go

1541-2002

IEEE Trial-Use Standard for Prefixes for Binary Multiples

Status: Active
Publication Date: 2003
Page(s): 0_1- 4
E-ISBN: 0-7381-3386-8
ISSN: 
ISBN: 0-7381-3385-X
Year: 2003 
Sponsored by: 
   SCC14

OPAC Link: http://ieeexplore.ieee.org/servlet/opac?punumber=8450

Calling this terminology "standard" overstates things, IMO. Greg Woodhouse 23:32, 12 April 2007 (CDT)

See this page as well. --Joshua David Williams 23:34, 12 April 2007 (CDT)

Yes, I saw that, too. IEC might publish a standard, but the IEEE approach is much more, well, realistic. Truth be told, I can't even find the IEC document, so I'm not sure of its status, but I think IEC is just spelling out the meaning of some new words, should you choose to use them. At best, I think this terminology can be called experimental. Greg Woodhouse 23:49, 12 April 2007 (CDT)

So how should we deal with it in this article then? --Joshua David Williams 10:09, 13 April 2007 (CDT)
I've heard it quite a bit (hehe) in the last few months, but never before that. I wouldn't call it a standard now, but it's definitely worth mentioning because the differences are going to be very large. From what I've seen, only the "1337" are using "KiB". Andrew Swinehart 10:22, 13 April 2007 (CDT)

Differences

In a recent edit, Phillip Stewart changed the percentage of difference between a yottabyte and a yobibyte from 1.209% to 17.281%. Is this correct, a mistake, or vandalism? I used the formula (2^80)/(10^24) to calculate my number. --Joshua David Williams 00:00, 13 April 2007 (CDT)

I've reverted Phillip's version for the sake of consistency. I believe that he was incorrect. If not, please post a message regarding this. This is important information that we must know - and agree upon - when writing an article. --Joshua David Williams 00:25, 13 April 2007 (CDT)

I don't think so. See for yourself

1 - (pow(10, 24) / pow(2, 80))
= 0.17281938745
 
0.17281938745 * 100
= 17.281938745

so 17.2819% is right. Greg Woodhouse 00:29, 13 April 2007 (CDT)
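
The same calculation can be repeated for every prefix. The sketch below is only an illustration: the function name `shortfall_percent` and the prefix list are made up here, and the percentage is measured relative to the binary unit, as in the calculation above.

```python
from fractions import Fraction

# Decimal prefix names paired with their binary counterparts: kilo/kibi (n=1) ... yotta/yobi (n=8).
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def shortfall_percent(n):
    """Percent by which 10**(3n) falls short of 2**(10n), i.e. 1 - 10^(3n) / 2^(10n)."""
    # Fraction keeps the ratio exact until the final conversion to float.
    return float((1 - Fraction(10**(3 * n), 2**(10 * n))) * 100)

for n, name in enumerate(PREFIXES, start=1):
    print(f"{name:>5}: {shortfall_percent(n):.4f}%")
```

For n = 8 this gives the 17.2819% figure above. Dividing the other way round, (2^80)/(10^24), gives about 1.2089, which is a ratio and not a percentage: it says the binary unit is roughly 20.89% larger than the decimal one.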

Ah, I see my mistake now. I'll fix the table, but we'll need to check these things very carefully afterwards. --Joshua David Williams 00:32, 13 April 2007 (CDT)
I think we should use the raw numbers and not percents. IMO, they're much easier to understand and calculate. (2^10)-(10^3), (2^20)-(10^6), etc. Thoughts? Andrew Swinehart 10:38, 13 April 2007 (CDT)

I'd stick with percentages, as the point is that the differences can be substantial. (Did you see the footnote about the disk manufacturer that tried to use powers of 10 and the subsequent lawsuit?) Of course, if there's room, raw numbers might be a good thing to include, too. Greg Woodhouse

But the raw numbers actually show the difference more. KB vs. KiB is 24, MB vs. MiB is 48576, GB is 73741824, and they just keep getting bigger. I say the raw numbers show the difference much better than percents. Or, we could just add another column. Andrew Swinehart 11:06, 13 April 2007 (CDT)

Another column would be great if it fits alright. --Joshua David Williams 11:08, 13 April 2007 (CDT)

I went ahead and added a column, but I couldn't find a calculator that could tell me the exact answer for the last row, so I had to use scientific notation. If any of you can get the exact answer, that'd be great. --Joshua David Williams 11:18, 13 April 2007 (CDT)
I found the exact number here. I checked it, and it's correct. --Joshua David Williams 16:36, 13 April 2007 (CDT)
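
For the record, the exact values for the raw-difference column do not need a special calculator, since Python integers have arbitrary precision. This is only a sketch; the function name `binary_minus_decimal` is made up for the example.

```python
def binary_minus_decimal(n):
    """Exact difference in bytes between the n-th binary and decimal multiples."""
    return 2**(10 * n) - 10**(3 * n)

# n = 1 is KiB vs. kB, n = 2 is MiB vs. MB, ... n = 8 is YiB vs. YB.
for n in range(1, 9):
    print(n, binary_minus_decimal(n))
```

The first three rows reproduce the 24, 48576 and 73741824 quoted earlier in this section, and the last row comes out as 208925819614629174706176.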

Nibble and word

Should nibble and word be combined into this article, just as megabyte is? It seems to me that there really isn't much to say about these topics that couldn't be said here briefly. --Joshua David Williams 12:47, 13 April 2007 (CDT)

Finished?

I believe this article is finished. Could an editor please take a final look at it? --Joshua David Williams 17:00, 13 April 2007 (CDT)

1024 vs. 1000 again

I really wish you would revise the section where you discuss units of storage, because the statement you make, that the use of kilobytes as a unit of measurement is "non-standard", is factually incorrect. What is a correct statement is that, due to this potentially confusing terminology, IEC has standardized the terms kibibyte, etc. Standardizing the meaning of word B does not mean that use of word A is no longer standard, though using word B to mean something other than what IEC has defined it to mean would be. The obvious caveat here is that if the meaning of word A is also redefined, then the old use can be considered non-standard. I'm sorry to be a stickler here, but it's important to be precise. By the way, I think your use of the lawsuit to show why it is important to have standard terminology is important. You also might consider citing the IEEE document I mentioned above.

If I were you, I'd say something roughly like this (by all means rephrase and flesh it out as you see fit): Storage is measured in units that are powers of 2, but data rates are measured in units that are powers of 10. This means that in some contexts 1 kB = 1024 B, but in other contexts, 1 kB = 1000 B. This is potentially confusing (mention the lawsuit), so IEC has standardized a set of binary prefixes, and IEEE approved them as a trial use standard.

It may be that we're on the cusp of a significant shift in terminology (rather like the new consensus that Pluto isn't a planet), but you're still going against the grain of industry practice here, and you do need to be careful not to use CZ articles for advocacy. Greg Woodhouse 22:41, 13 April 2007 (CDT)
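
To make the two readings of "kB" concrete, here is a small sketch. The helper `describe` and the 500 GB drive size are hypothetical and chosen only for illustration; they are not taken from the article or from the lawsuit coverage.

```python
def describe(num_bytes):
    """Express a byte count under both the decimal (SI) and binary (IEC) conventions."""
    decimal = num_bytes / 1000   # 1 kB  = 1000 B
    binary = num_bytes / 1024    # 1 KiB = 1024 B
    return f"{num_bytes} B = {decimal:g} kB (SI, 1000 B) = {binary:g} KiB (IEC, 1024 B)"

# A 4K filesystem block, as in the block-size example earlier on this page.
print(describe(4096))

# A hypothetical drive labeled "500 GB" in powers of 10, expressed in GiB.
labeled_bytes = 500 * 10**9
print(f"{labeled_bytes / 2**30:.2f} GiB")
```

The second figure shows the kind of gap a buyer would notice when software reports capacity in powers of 2 while the label uses powers of 10.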

What do you think of this?

===Conflicting definitions===

For more information, see: Binary prefix.

Traditionally, the computer world has used a value of 1024 instead of 1000 when referring to a kilobyte. This was done because programmers needed a number compatible with the base of 2, and 1024 is equal to 2 to the 10th power. Due to the large confusion between these two meanings, an effort has been made by the International Electrotechnical Commission (IEC) to remedy this problem. They have standardized a new system called the 'binary prefix', which replaces the word 'kilobyte' with 'kibibyte', abbreviated as KiB. This solution has since been approved by the IEEE on a trial-use basis, and may prove to one day become a true standard.[1]

While the difference between 1000 and 1024 may seem trivial, one must note that as the size of a disk increases, so does the margin of error. The difference between 1TB and 1TiB, for instance, is approximately 10%. As hard drives become larger, the need for a distinction between these two prefixes will grow. This has been a problem for hard disk drive manufacturers in particular. For example, one well known disk manufacturer, Western Digital, has recently been taken to court for their use of the base of 10 when labeling the capacity of their drives. This is a problem because labeling a hard drive's capacity with the base of 10 implies a greater storage capacity when the consumer may assume it refers to the base of 2. [2]

--Joshua David Williams 21:46, 14 April 2007 (CDT)