Comment by WalterBright

8 hours ago

D made a great leap forward with the following:

1. bytes are 8 bits

2. shorts are 16 bits

3. ints are 32 bits

4. longs are 64 bits

5. arithmetic is 2's complement

6. IEEE floating point

and a big chunk of wasted time trying to abstract these away and getting it wrong anyway was saved. Millions of people cried out in relief!

Oh, and Unicode was the character set. Not EBCDIC, RADIX-50, etc.

45 comments

WalterBright

Laremere 6 hours ago

Zig is even better:

1. u8 and i8 are 8 bits.

2. u16 and i16 are 16 bits.

3. u32 and i32 are 32 bits.

4. u64 and i64 are 64 bits.

5. Arithmetic is an explicit choice. '+' overflowing is illegal behavior (will crash in debug and releasesafe), '+%' is 2's compliment wrapping, and '+|' is saturating arithmetic. Edit: forgot to mention @addWithOverflow(), which provides a tuple of the original type and a u1; there's also std.math.add(), which returns an error on overflow.

6. f16, f32, f64, f80, and f128 are the respective but length IEEE floating point types.

The question of the length of a byte doesn't even matter. If someone wants to compile to machine whose bytes are 12 bits, just use u12 and i12.

Cloudef 29 minutes ago

Zig allows any uX and iX in the range of 1 - 65,535, as well as u0
__turbobrew__ 5 hours ago

This is the way.
notfed 4 hours ago

Same deal with Rust.
mort96 2 hours ago

Eh I like the nice names. Byte=8, short=16, int=32, long=64 is my preferred scheme when implementing languages. But either is better than C and C++.
Spivak 5 hours ago
How does 5 work in practice? Surely no one is actually checking if their arithmetic overflows, especially from user-supplied or otherwise external values. Is there any use for the normal +?
- dullcrisp 5 hours ago
  
  You think no one checks if their arithmetic overflows?
  
  2 replies →

gerdesj 7 hours ago

"1. bytes are 8 bits"

How big is a bit?

thamer 6 hours ago
This doesn't feel like a serious question, but in case this is still a mystery to you… the name bit is a portmanteau of binary digit, and as indicated by the word "binary", there are only two possible digits that can be used as values for a bit: 0 and 1.
- seoulbigchris 14 minutes ago
  
  So trinary and quaternary digits are trits and quits?
basementcat 5 hours ago
A bit is a measure of information theoretical entropy. Specifically, one bit has been defined as the uncertainty of the outcome of a single fair coin flip. A single less than fair coin would have less than one bit of entropy; a coin that always lands heads up has zero bits, n fair coins have n bits of entropy and so on.
https://en.m.wikipedia.org/wiki/Information_theory
https://en.m.wikipedia.org/wiki/Entropy_(information_theory)
- fourier54 4 hours ago
  
  That is a bit in information theory. It has nothing to do with the computer/digital engineering term being discussed here.
  
  1 reply →
CoastalCoder 7 hours ago

> How big is a bit?
A quarter nybble.
dullcrisp 4 hours ago

At least 2 or 3
nonameiguess 5 hours ago

How philosophical do you want to get? Technically, voltage is a continuous signal, but we sample only at clock cycle intervals, and if the sample at some cycle is below a threshold, we call that 0. Above, we call it 1. Our ability to measure whether a signal is above or below a threshold is uncertain, though, so for values where the actual difference is less than our ability to measure, we have to conclude that a bit can actually take three values: 0, 1, and we can't tell but we have no choice but to pick one.
The latter value is clearly less common than 0 and 1, but how much less? I don't know, but we have to conclude that the true size of a bit is probably something more like 1.00000000000000001 bits rather than 1 bit.
poincaredisk 7 hours ago
A bit is either a 0 or 1. A byte is the smallest addressable piece of memory in your architecture.
- elromulous 7 hours ago
  
  Technically the smallest addressable piece of memory is a word.
  
  3 replies →
- Nevermark 7 hours ago
  
  Which … if your heap always returns N bit aligned values, for some N … is there a name for that? The smallest heap addressable segment?

stkdump 3 hours ago

I mean practically speaking in C++ we have (it just hasn't made it to the standard):

1. char 8 bit

2. short 16 bit

3. int 32 bit

4. long long 64 bit

5. arithmetic is 2s complement

6. IEEE floating point (float is 32, double is 64 bit)

Along with other stuff like little endian, etc.

Some people just mistakenly think they can't rely on such stuff, because it isn't in the standard. But they forget that having an ISO standard comes on top of what most other languages have, which rely solely on the documentation.

mort96 2 hours ago

> (it just hasn't made it to the standard)
That's the problem

cogman10 7 hours ago

Yeah, this is something Java got right as well. It got "unsigned" wrong, but it got standardizing primitive bits correct

byte = 8 bits

short = 16

int = 32

long = 64

float = 32 bit IEEE

double = 64 bit IEEE

jltsiren 7 hours ago
I like the Rust approach more: usize/isize are the native integer types, and with every other numeric type, you have to mention the size explicitly.
On the C++ side, I sometimes use an alias that contains the word "short" for 32-bit integers. When I use them, I'm explicitly assuming that the numbers are small enough to fit in a smaller than usual integer type, and that it's critical enough to performance that the assumption is worth making.
- jonstewart 6 hours ago
  
  <cstdint> has int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, and uint64_t. I still go back and forth between uint64_t, size_t, and unsigned int, but am defaulting to uint64_t more and more, even if it doesn't matter.
- Jerrrrrrry 7 hours ago
  
  hindsight has its advantages
- kazinator 6 hours ago
  
  > you have to mention the size explicitly
  It's unbelievably ugly. Every piece of code working with any kind of integer screams "I am hardware dependent in some way".
  E.g. in a structure representing an automobile, the number of wheels has to be some i8 or i16, which looks ridiculous.
  Why would you take a language in which you can write functional pipelines over collections of objects, and make it look like assembler.
  
  7 replies →
josephg 7 hours ago
Yep. Pity about getting chars / string encoding wrong though. (Java chars are 16 bits).
But it’s not alone in that mistake. All the languages invented in that era made the same mistake. (C#, JavaScript, etc).
- davidgay 4 hours ago
  
  Java was just unlucky, it standardised it's strings at the wrong time (when Unicode was 16-bit code points): Java was announced in May 1995, and the following comment from the Unicode history wiki page makes it clear what happened: "In 1996, a surrogate character mechanism was implemented in Unicode 2.0, so that Unicode was no longer restricted to 16 bits. ..."
- jeberle 5 hours ago
  
  Java strings are byte[]'s if their contents contain only Latin-1 values (the first 256 codepoints of Unicode). This shipped in Java 9.
  JEP 254: Compact Strings
  https://openjdk.org/jeps/254
- paragraft 7 hours ago
  
  What's the right way?
  
  3 replies →