Some bits on malloc(0) in C being allowed to return NULL

Ages ago I worked with a system where malloc(0) incremented a counter and returned -1.

free(-1) decremented the counter.

This way you could check for leaks :p
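For illustration, the scheme described above might look roughly like this. It's only a sketch: `counting_malloc`, `counting_free`, `zero_size_outstanding`, and `ZERO_SIZE_SENTINEL` are made-up names, and the original system's code is not shown anywhere in this thread.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sentinel standing in for the "-1" described above. */
#define ZERO_SIZE_SENTINEL ((void *)UINTPTR_MAX)

/* Number of zero-size "allocations" that have not been released yet. */
static long zero_size_live = 0;

void *counting_malloc(size_t size) {
    if (size == 0) {
        zero_size_live++;
        return ZERO_SIZE_SENTINEL;   /* same value every time */
    }
    return malloc(size);
}

void counting_free(void *p) {
    if (p == ZERO_SIZE_SENTINEL) {
        zero_size_live--;
        return;
    }
    free(p);
}

/* A nonzero result at exit means zero-size malloc/free calls were unbalanced. */
long zero_size_outstanding(void) {
    return zero_size_live;
}
```

Checking `zero_size_outstanding()` at the end of `main` would give the leak check mentioned above, though only for zero-size requests.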

Noncompliant, since `malloc(0)` is specified to return a unique pointer if it's not `NULL`.

On most platforms an implementation could just return adjacent addresses from the top half of the address space. On 32-bit platforms it doesn't take long to run out of such address space however, and you don't want to waste the space for a bitmap allocator. I suppose you could just use a counter for each 64K region or something, so you can reuse it if the right number of elements has been freed ...
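A rough sketch of that per-region counter idea, under stated assumptions: `zero_region` and its two functions are hypothetical names, the region's base address is assumed to lie in a range the platform never maps, and turning an integer into a pointer like this is implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

#define REGION_SIZE 0x10000u   /* 64K addresses per region */

typedef struct {
    uintptr_t base;   /* first address of the region (assumed never mapped) */
    uint32_t next;    /* next unused offset, 0..REGION_SIZE */
    uint32_t live;    /* addresses handed out and not yet released */
} zero_region;

/* Hands out the next unused address, or NULL once the region is exhausted. */
void *zero_region_alloc(zero_region *r) {
    if (r->next == REGION_SIZE)
        return NULL;
    r->live++;
    return (void *)(r->base + r->next++);
}

/* No bitmap: the region only becomes reusable once every address is released. */
void zero_region_free(zero_region *r, void *p) {
    (void)p;               /* the individual address is not tracked */
    if (r->live == 0)
        return;            /* unbalanced release; ignored in this sketch */
    if (--r->live == 0)
        r->next = 0;
}
```

The trade-off is the one mentioned above: a single long-lived zero-size allocation keeps the whole region from being reused.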

Oh but no worries with compliance, it always returned a newly created -1, never repeating the same one!

My next malloc(3) is returning NAN.

> Noncompliant, since `malloc(0)` is specified to return a unique pointer if it's not `NULL`.

I know I've seen that somewhere, but may I ask what standard you're referring to?

If I recall correctly, this was an archaic stackless microcontroller. The heap support was mostly a marketing claim.

C89: https://port70.net/%7Ensz/c/c89/c89-draft.html

> If the size of the space requested is zero, the behavior is implementation-defined; the value returned shall be either a null pointer or a unique pointer.

Isn’t -1 basically 0xffff, which is a constant pointer? What am I misinterpreting?

If you call malloc(0) multiple times (without freeing in between) and get -1 each time, then the pointer is not unique.
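To make that concrete, here is a small probe one could run on a given platform. The names `zero_policy` and `probe_malloc_zero` are invented for this sketch, and an optimizing compiler is allowed to fold the comparison, so treat the result as a hint about the library rather than proof.

```c
#include <stdlib.h>

typedef enum {
    ZERO_RETURNS_NULL,       /* at least one call returned NULL */
    ZERO_RETURNS_UNIQUE,     /* two live results compared unequal */
    ZERO_RETURNS_DUPLICATE   /* two live results compared equal */
} zero_policy;

zero_policy probe_malloc_zero(void) {
    void *a = malloc(0);
    void *b = malloc(0);     /* a has not been freed yet */
    zero_policy result;

    if (a == NULL || b == NULL)
        result = ZERO_RETURNS_NULL;
    else if (a == b)
        result = ZERO_RETURNS_DUPLICATE;
    else
        result = ZERO_RETURNS_UNIQUE;

    free(a);                 /* free(NULL) is a no-op, so this is safe either way */
    free(b);
    return result;
}
```

The "-1 every time" implementation would land in the `ZERO_RETURNS_DUPLICATE` case.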

Null is not a unique pointer, it's a constant like -1.

It returns multiple types of null pointer

But do we need a unique pointer or merely a pointer that is disjoint from all objects?

As per the specification, it has to be a unique pointer.

Being tasked to implement a specification typically means having to pass extensive conformance tests and having to answer for instances of noncompliance. You soon learn to follow the spec to the letter, to the best of your abilities, unless you can make a strong case to your management for each specific deviation.

But the letter is non-specific. It doesn't clarify if unique refers to unique when compared to non-zero allocations, or unique when called multiple times.

The C99 standard[1] seems to have worded it more precisely:

> If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.

[1]: https://rgambord.github.io/c99-doc/sections/7/20/3/index.htm...

This is embedded C where standard abuse is a thing: https://thephd.dev/conformance-should-mean-something-fputc-a...

[deleted]

(you duped your comment under the other subthread)

From C89, §7.10.3 "Memory management functions":

> If the size of the space requested is zero, the behavior is implementation-defined; the value returned shall be either a null pointer or a unique pointer.

The wording is different for C99 and POSIX, but I went back as far as possible (despite the poor source material; unlike later standards C89 is only accessible in scans and bad OCR, and also has catastrophic numbering differences). K&R C specifies nothing (it's often quite useless; people didn't actually write against K&R C but against the common subset of extensions of platforms they cared about), but its example implementation adds a block header without checking for 0 so it ends up doing the "unique non-NULL pointer" thing.

Presumably the ANSI C standard or one of the later editions? They also cover the standard library as well as the language. (Presumably the bit about "Each such allocation shall yield a pointer to an object disjoint from any other object." if the random C99 draft I found via google is accurate to the final standard - I suppose you might question if this special use is technically an allocation of course).

Of course, microcontrollers and the like can have somewhat eccentric language implementations and perhaps aren't strictly compliant, and frankly even standard-compliant stuff like "int can be 16 bits" might surprise some code that doesn't expect it.

Noncompliant, but what could this reasonably impact?

Pointers are frequently used as keys for map-like data structures. This introduces collisions that the programmer can't check for, whereas NULL is very often special-cased.
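A toy example of the collision, assuming a deliberately tiny pointer-keyed set (`ptr_set` and `ptr_set_insert` are hypothetical, not from any real library):

```c
#include <stdbool.h>
#include <stddef.h>

#define PTR_SET_CAPACITY 16

typedef struct {
    const void *keys[PTR_SET_CAPACITY];
    size_t count;
} ptr_set;

/* Returns true if key was newly added, false if it was already present
 * (or the set is full). */
bool ptr_set_insert(ptr_set *s, const void *key) {
    for (size_t i = 0; i < s->count; i++) {
        if (s->keys[i] == key)
            return false;
    }
    if (s->count == PTR_SET_CAPACITY)
        return false;
    s->keys[s->count++] = key;
    return true;
}
```

If two separate `malloc(0)` results are both the same constant, the second `ptr_set_insert` reports "already present" even though the caller holds what it believes are two distinct allocations.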

> Noncompliant, since `malloc(0)` is specified to return a unique pointer if it's not `NULL`.

I know I've seen that somewhere, but may I ask what standard you're referring to?

It's POSIX.

> Each [...] allocation shall yield a pointer to an object disjoint from any other object. The pointer returned points to the start (lowest byte address) of the allocated space. If the space cannot be allocated, a null pointer shall be returned. If the size of the space requested is 0, the behavior is implementation-defined: either a null pointer shall be returned, or the behavior shall be as if the size were some non-zero value, except that the behavior is undefined if the returned pointer is used to access an object.

https://pubs.opengroup.org/onlinepubs/9799919799/functions/m...

Not just POSIX, also the ISO C standard itself. https://en.cppreference.com/w/c/memory/malloc

That doesn't say the pointer has to be unique.

cppreference isn't the standard, and while the text they write looks like it's the same verbiage that would be authoritative, it's not. (And there's some criticism of it from standards committee members in that regard).

The current C standard text says:

> The order and contiguity of storage allocated by successive calls to the aligned_alloc, calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it can be assigned to a pointer to any type of object with a fundamental alignment requirement and size less than or equal to the size requested. It can then be used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object. The pointer returned points to the start (lowest byte address) of the allocated space. If the space cannot be allocated, a null pointer is returned. If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned to indicate an error, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.

So yeah, the allocations are required to be unique (at least until they're freed).

> Each such allocation shall yield a pointer to an object disjoint from any other object.

Phrasing could be slightly more clear to prevent someone from making the argument that -1 is disjoint from all objects as it does not point to an object

And if you use 0 as the value of NULL pointer, then -1 can't ever point to an object (because adding 1 to it should generate a non-NULL pointer, so that pointer comparisons are not UB).

So yeah, C implementations have to reserve at least two addresses, not just one. By the way, the standard to this day allows NULL, when cast to a pointer type, to be something other than the all-bits-zero pattern (and some implementations indeed took this opportunity).

But adding 1 to a pointer will add sizeof(T) to the underlying value, so you actually need to reserve more than two addresses if you want to distinguish the "past-the-end" pointer for every object from NULL.

While it's rare nowadays to find a platform that uses something other than a zero bit pattern for NULL in normal pointer types, it's extremely common in C++ for pointer-to-member types: 0 is the first field at the start of a struct (offset 0), and NULL is instead represented with -1.

> so you actually need to reserve more than two addresses if you want to distinguish the "past-the-end" pointer for every object from NULL.

Well, yes and no. A 4-byte int cannot reside at -4, but a char could; no object can reside at -1, though. So implementations need to take care that one-past-the-end addresses never equal whatever happens to serve as nullptr, but this requirement only makes address -1 completely unavailable for C-native objects.

It is in ANSI 89, under memory management functions.

I might be missing something, but how does this help in checking for leaks? I mean, I guess you could use it to check for leaks specifically of 0-sized allocations, but wouldn’t it be better just to return NULL and guarantee that 0-sized allocations never use any memory at all?

> but wouldn’t it be better just to return NULL and guarantee that 0-sized allocations never use any memory at all?

This works if you are only interested in the overall memory balance. However, if you want to make sure that all malloc() calls are matched by a free() call, you need to distinguish between NULL and a successful zero-sized allocation; otherwise you run into trouble when you call free() on an "actual" NULL pointer (which the standard defines as a no-op).

At the end of main, if the count wasn't balanced, then you knew you had a mismatch between malloc()/free().

If malloc() had returned a real pointer, you'd have to free that too.

> wouldn’t it be better just to return NULL and guarantee that 0-sized allocations never use any memory at all?

Better: takes less memory. Worse: blinds you to this portability issue.

> At the end of main, if the count wasn't balanced, then you knew you had a mismatch between malloc()/free().

A mismatch between malloc(0) and free(-1).

You’d know nothing about calls to malloc with non-zero sizes.

Those are identifiable by the end state of the heap not being empty.

Yeah, exactly, that’s my point. How many programs have memory leaks limited to (or even just materially affected by) 0-sized allocations? I’d have to imagine it’s a very small minority.

They're uncommon for sure. In the past they've been an issue for me on constrained systems where they can frag the heap as badly as any other long lived allocation.

Does this work in practice? Now you have a bunch of invalid but non-NULL pointers flying around. NULL checks which would normally prevent you from accessing invalid pointers now will pass and send you along to deref your bogus pointer.

Even hacking the compiler to treat -1 as equal to NULL as well wouldn't work since lots of software won't free NULL-like pointers.

> NULL checks which would normally prevent you from accessing invalid pointers now will pass and send you along to deref your bogus pointer.

Oddly, this is bog standard implementation specific behavior for standard C - caller accessing any result of malloc(0) is undefined behavior, and malloc(0) isn't required to return NULL - the reference heap didn't, and some probably still don't.

Ah, that's my bad. Another day, another UB :)

Like swimming in a bucket of rusty knives :)

I get the complexity of the standards issue here, but if you cared about this, wouldn't you just wrap malloc with something trivial that provided the semantic you wanted to depend on (NULL or some sentinel pointer).
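Something like the following would do it. This is a minimal sketch with invented names (`malloc_zero_is_null`, `malloc_zero_is_unique`); pick whichever semantic the rest of the code base is written against.

```c
#include <stdlib.h>

/* Zero-size requests always yield NULL, regardless of platform. */
void *malloc_zero_is_null(size_t size) {
    return size == 0 ? NULL : malloc(size);
}

/* Zero-size requests are rounded up to one byte, so a successful call
 * always yields a distinct non-NULL pointer that must be freed. */
void *malloc_zero_is_unique(size_t size) {
    return malloc(size == 0 ? 1 : size);
}
```

With the first wrapper, a NULL result for a nonzero size still means failure; with the second, NULL always means failure.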

I never had the use case to allocate 0 bytes of memory.

If I would allocate 0 bytes of memory and get a pointer to it, I wouldn't care what the value of the pointer is since I am not allowed to dereference it anyways.

But then again, why would I allocate 0 bytes of memory?

Sometimes it shakes out simpler for a generic container.

Ex: a vector using only a counter and pointer - you can use realloc() with fewer pointer validity checks.
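A hedged sketch of what that could look like (`int_vec` and its functions are illustrative names; overflow of the size computation is not handled here). Note that it never calls `realloc` with a size of zero, since that case is its own portability trap.

```c
#include <stdlib.h>

typedef struct {
    int *data;      /* NULL while the vector is empty */
    size_t count;
} int_vec;

/* Appends one element; returns 0 on success, -1 if memory ran out. */
int int_vec_push(int_vec *v, int value) {
    /* realloc(NULL, n) behaves like malloc(n), so the empty case needs no branch. */
    int *grown = realloc(v->data, (v->count + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;          /* v->data is still valid and unchanged */
    grown[v->count++] = value;
    v->data = grown;
    return 0;
}

/* Releases the storage and returns the vector to its empty state. */
void int_vec_clear(int_vec *v) {
    free(v->data);          /* free(NULL) is a no-op */
    v->data = NULL;
    v->count = 0;
}
```

An empty vector is just `{ NULL, 0 }`, and neither function has to test for that state explicitly.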

Can someone tell me a use case where you want multiple allocations of size 0, each one with a unique address, and each one unique from any other allocation (hence necessarily removing that pointer from being allocated to anything else) but can't use malloc(1) instead?

I think it would be much better if malloc(0) just returned 1 or -1 or something constant. If the programmer needs the allocation to have a unique address, they can call malloc(1) instead.

It's occasionally useful to want multiple allocations of size 0, each one with a valid address -- generic containers parsing something as some sort of sequence object and you want all code interacting with it to do something valid. I'd be hard-pressed to see where you'd need those to be unique though. Basically any integer should be fine.

Because zero-size types exist which you might want to take the address of. Possibly as a result of macro substitution or templating mechanism that only appears in certain build configurations.

It means you don't need a bunch of special-case handling if one out of 27 types ends up with zero size in some situation. It just all works the same way. Especially the unique address part because that would be an annoying source of difficult to track bugs.

Yes. I believe zero sized types should be possible and they should all have the same address. Trying to deref the pointer is UB right away because you do not have the byte under that pointer. As it is, the malloc implementation now needs special casing for 0 sized allocations and different implementations special case it differently. C is supposed to be low level so surface this confusion up. Let the programmer decide if they want a unique address and reserve a byte or a non unique one with no overhead.

Not the best choice to begin the title with "some bits" in this context. My mind was trying to understand this sentence in a completely different way...

Why should it be allowed to return a valid pointer anyways? Surely it should always return NULL?

It's not a valid pointer because you can't use the indirection operator on it. Returning a value other than NULL makes sense because an allocation of size zero is still an allocation.

Additionally, the actual amount of memory malloc allocates is implementation-defined so long as it is not less than the amount requested, but accessing this extra memory is undefined behavior since processes don't know whether it exists. A non-NULL return could be interpreted as malloc(0) allocating more than zero bytes.

Some implementations don't actually perform the allocation until there's a page fault from the process writing to or reading from that memory, so in that sense a non-NULL return is valid too.

I'd argue that malloc(0)==NULL makes less sense because there's no distinction between failure and success.

The only real problem is specifying two alternate behaviors and declaring them both to be equally valid.

For instance, because you are prohibited from passing NULL to e.g. memcpy and lots of other library functions from memory.h/string.h, even when you explicitly specify a size of 0.
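If code has to cope with a `malloc(0)` that returns NULL, one defensive option is a thin wrapper that never hands a zero-length request to `memcpy`. `copy_bytes` is a made-up helper name for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Like memcpy, but a zero-length copy is skipped entirely, so NULL
 * arguments are harmless when n == 0. */
void *copy_bytes(void *dst, const void *src, size_t n) {
    if (n == 0)
        return dst;
    return memcpy(dst, src, n);
}
```

For nonzero lengths it behaves exactly like `memcpy`, including the requirement that both pointers are valid and the regions don't overlap.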

Another use was to use it to mint unique cookies/addresses, but malloc(1) works for this just as well.

Mmmmh, cookies

There are three reasonable choices: (a) return the null pointer, (b) return a valid unique pointer, and (c) abort().

The point of the original C Standard was to make rules about these things AND not break existing implementations. They recognized that (a) and (b) were in existing implementations and were reasonable, and they chose not to break the existing implementations when writing the standard.

This is similar to the extremely unfortunate definition of the NULL macro. There were two existing styles of implementation (bare literal 0 and (void *) 0) and the Standard allows either style. Which means the NULL macro is not entirely safe to use in portable code.
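The classic place this bites is a variadic function that expects a null pointer as its terminator: if NULL expands to a bare `0`, an `int` gets passed where the callee reads a pointer. A minimal sketch, with `total_length` as a hypothetical function:

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Sums the lengths of a null-pointer-terminated list of strings. */
size_t total_length(const char *first, ...) {
    va_list ap;
    size_t total = 0;

    va_start(ap, first);
    /* Each variadic argument is read back as a char *, so the caller
     * must really pass pointers, including the terminator. */
    for (const char *s = first; s != NULL; s = va_arg(ap, char *))
        total += strlen(s);
    va_end(ap);
    return total;
}
```

A portable call spells out the terminator's type, e.g. `total_length("ab", "cde", (char *)NULL)`, rather than relying on whatever NULL happens to expand to.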

> return a valid unique pointer

A pointer to what, though? If the requester asked for 0 bytes of memory, you'd either be pointing to memory allocated for another purpose (!) or allocating a few bytes that weren't asked for.

> This makes people unhappy for various reasons

I read through all the links trying to figure out what those reasons might be and came up empty, I'm still curious why anybody would expect or rely on anything except a null pointer in this instance.

Something separate that occurred to me - many systems have empty sections of address space, those addresses can't back `malloc(1)` allocations but they could back `malloc(0)` allocations with a unique address. I doubt any C runtime out there will actually do that, but in theory it could be done.

> allocating a few bytes that weren't asked for.

FWIW the alignment guarantees of `malloc()` mean it often will have to allocate more than you ask for (before C23 anyway). You can't 'legally' use this space, but `malloc()` also can't repurpose it for other allocations because it's not suitably aligned.

That said I still agree it's a hack compared to just using `malloc(1)` for this purpose, it's well-defined and functionally equivalent if you're looking for a unique address. The fact that you don't know what `malloc(0)` is going to do makes it pretty useless anyway.

> before C23 anyway

Did they change "suitably aligned for any object type" to "suitably aligned for any object type with size less than or equal to what was requested" or something like that in C23?

See https://news.ycombinator.com/item?id=44390258 .

The only requirement which seems reasonable to me, is that the address be unique. Since the allocation size is zero, it should never be accessed for read or write, but the address itself may need to be used for comparisons.

If you’re pointing to a zero sized data it shouldn’t matter what it’s pointing to. Even outside valid address space. Because you shouldn’t be reading or writing more than 0 bytes anyway.

> or allocating a few bytes that weren't asked for.

You are always allocating bytes you weren't asked for: the allocation metadata and some extra bytes to satisfy the alignment requirement. If you absolutely don't want to allocate memory, you probably shouldn't have called malloc() in the first place :)

You can copy from a zero sized pointer with memcpy, but not NULL.

That's about to change: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3322.pdf

The behavior of malloc(x) for any positive value x is to either return NULL (meaning that the system was unable to provide a new chunk of memory to use) OR to return a unique pointer to x bytes of data which the program can use.

By extension, if x == 0, doesn't it make sense for the system to either return NULL OR to return a pointer to 0 bytes of memory which the program can use? So the standard promises exactly that: to return either NULL or else a unique pointer where the program has permission to use zero bytes starting at that pointer.

> Why should it be allowed to return a valid pointer anyways?

malloc(0) is allowed to return non-NULL because the standard decrees it.

One way of thinking is that all mallocated pointers must always be freed exactly once. Then you're portable.

It would be interesting to see if there's a difference in how the 0-page is handled in systems under this condition...

I maintained a program which failed on, as I recall, AIX (mentioned in the essay) because malloc(0) returned NULL.

It's been 30 years so I've forgotten the details. My solution was to always allocate size+1 since memory use was far from critical.