Comment by WalterBright
5 hours ago
D made a great leap forward with the following:
1. bytes are 8 bits
2. shorts are 16 bits
3. ints are 32 bits
4. longs are 64 bits
5. arithmetic is 2's complement
6. IEEE floating point
and a big chunk of wasted time trying to abstract these away and getting it wrong anyway was saved. Millions of people cried out in relief!
Oh, and Unicode was the character set. Not EBCDIC, RADIX-50, etc.
Zig is even better:
1. u8 and i8 are 8 bits.
2. u16 and i16 are 16 bits.
3. u32 and i32 are 32 bits.
4. u64 and i64 are 64 bits.
5. Arithmetic is an explicit choice. '+' overflowing is safety-checked illegal behavior (it will trap in Debug and ReleaseSafe builds), '+%' is 2's-complement wrapping, and '+|' is saturating arithmetic. Edit: forgot to mention @addWithOverflow(), which returns a tuple of the result in the original type and a u1 overflow flag; there's also std.math.add(), which returns an error on overflow.
6. f16, f32, f64, f80, and f128 are IEEE floating point types of the respective bit widths.
The question of the length of a byte doesn't even matter. If someone wants to compile for a machine whose bytes are 12 bits, just use u12 and i12.
Same deal with Rust.
This is the way.
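For comparison, a sketch of my own (not from the comments above) of how Rust makes the same distinctions: the explicit choices are methods rather than operators, and plain '+' panics on overflow in debug builds.

```rust
fn main() {
    let x: u8 = 250;

    // Wrapping (2's-complement) addition: 250 + 10 wraps around to 4.
    assert_eq!(x.wrapping_add(10), 4);

    // Saturating addition clamps at the type's maximum.
    assert_eq!(x.saturating_add(10), u8::MAX);

    // Checked addition returns None instead of overflowing.
    assert_eq!(x.checked_add(10), None);
    assert_eq!(x.checked_add(5), Some(255));

    // Overflowing addition returns the wrapped value plus a carry flag,
    // analogous to Zig's @addWithOverflow.
    assert_eq!(x.overflowing_add(10), (4, true));

    println!("all overflow-mode checks passed");
}
```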
How does 5 work in practice? Surely no one is actually checking if their arithmetic overflows, especially from user-supplied or otherwise external values. Is there any use for the normal +?
You think no one checks if their arithmetic overflows?
1 reply →
"1. bytes are 8 bits"
How big is a bit?
A bit is a measure of information-theoretic entropy. Specifically, one bit has been defined as the uncertainty of the outcome of a single fair coin flip. A single less-than-fair coin would have less than one bit of entropy; a coin that always lands heads up has zero bits, n fair coins have n bits of entropy, and so on.
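To make that concrete, here's a small sketch (my illustration, not part of the comment) of the binary entropy function H(p) = -p·log2(p) - (1-p)·log2(1-p) for a coin that lands heads with probability p:

```rust
// Shannon entropy, in bits, of a coin that lands heads with probability p.
fn coin_entropy(p: f64) -> f64 {
    if p == 0.0 || p == 1.0 {
        return 0.0; // a certain outcome carries no information
    }
    -(p * p.log2() + (1.0 - p) * (1.0 - p).log2())
}

fn main() {
    // A fair coin is exactly one bit; a certain coin is zero bits.
    assert!((coin_entropy(0.5) - 1.0).abs() < 1e-12);
    assert_eq!(coin_entropy(1.0), 0.0);
    // A biased coin carries less than one bit.
    assert!(coin_entropy(0.9) < 1.0);
    println!("entropy of a 90/10 coin: {:.4} bits", coin_entropy(0.9));
}
```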
https://en.m.wikipedia.org/wiki/Information_theory
https://en.m.wikipedia.org/wiki/Entropy_(information_theory)
That is a bit in information theory. It has nothing to do with the computer/digital engineering term being discussed here.
1 reply →
This doesn't feel like a serious question, but in case this is still a mystery to you… the name bit is a portmanteau of binary digit, and as indicated by the word "binary", there are only two possible digits that can be used as values for a bit: 0 and 1.
At least 2 or 3
> How big is a bit?
A quarter nybble.
How philosophical do you want to get? Technically, voltage is a continuous signal, but we sample only at clock cycle intervals, and if the sample at some cycle is below a threshold, we call that 0. Above, we call it 1. Our ability to measure whether a signal is above or below a threshold is uncertain, though, so for values where the actual difference is less than our ability to measure, we have to conclude that a bit can actually take three values: 0, 1, and we can't tell but we have no choice but to pick one.
The latter value is clearly less common than 0 and 1, but how much less? I don't know, but we have to conclude that the true size of a bit is probably something more like 1.00000000000000001 bits rather than 1 bit.
A bit is either a 0 or 1. A byte is the smallest addressable piece of memory in your architecture.
Technically the smallest addressable piece of memory is a word.
2 replies →
Which … if your heap always returns N bit aligned values, for some N … is there a name for that? The smallest heap addressable segment?
Yeah, this is something Java got right as well. It got "unsigned" wrong, but it got standardizing primitive bits correct
byte = 8 bits
short = 16
int = 32
long = 64
float = 32 bit IEEE
double = 64 bit IEEE
I like the Rust approach more: usize/isize are the native integer types, and with every other numeric type, you have to mention the size explicitly.
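A quick illustration of that point (mine, not the commenter's): every fixed-size Rust integer carries its width in its name, and only usize/isize follow the target's pointer size.

```rust
use std::mem::size_of;

fn main() {
    // The width is in the name: u8 is 1 byte, i64 is 8 bytes, on every target.
    assert_eq!(size_of::<u8>(), 1);
    assert_eq!(size_of::<i16>(), 2);
    assert_eq!(size_of::<u32>(), 4);
    assert_eq!(size_of::<i64>(), 8);

    // usize/isize are the only "native" sizes: they match the pointer width.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());

    // There is no bare `int`: every declaration names a width (or usize).
    let wheels: u8 = 4;
    let population: u64 = 8_000_000_000;
    println!("{wheels} wheels, {population} people");
}
```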
On the C++ side, I sometimes use an alias that contains the word "short" for 32-bit integers. When I use them, I'm explicitly assuming that the numbers are small enough to fit in a smaller than usual integer type, and that it's critical enough to performance that the assumption is worth making.
<cstdint> has int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, and uint64_t. I still go back and forth between uint64_t, size_t, and unsigned int, but am defaulting to uint64_t more and more, even if it doesn't matter.
hindsight has its advantages
> you have to mention the size explicitly
It's unbelievably ugly. Every piece of code working with any kind of integer screams "I am hardware dependent in some way".
E.g. in a structure representing an automobile, the number of wheels has to be some i8 or i16, which looks ridiculous.
Why would you take a language in which you can write functional pipelines over collections of objects, and make it look like assembler.
4 replies →
Yep. Pity about getting chars / string encoding wrong though. (Java chars are 16 bits).
But it’s not alone in that mistake. All the languages invented in that era made the same mistake. (C#, JavaScript, etc).
Java was just unlucky: it standardised its strings at the wrong time, when Unicode code points were 16 bits. Java was announced in May 1995, and the following comment from the Unicode history wiki page makes it clear what happened: "In 1996, a surrogate character mechanism was implemented in Unicode 2.0, so that Unicode was no longer restricted to 16 bits. ..."
Java strings are now stored as byte[] when their contents contain only Latin-1 values (the first 256 code points of Unicode). This shipped in Java 9.
JEP 254: Compact Strings
https://openjdk.org/jeps/254
What's the right way?
3 replies →