The fact that different parts of the file use different endiannesses really added that special Apple tech flavour.
At this point if a file format could have rounded corners I’m sure it would too.
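For readers who have not hit this before, here is a minimal sketch of what mixed endianness means for a parser. The layout is made up for illustration (a big-endian header followed by a little-endian field) and is not the actual .car or BOMStore layout.

```python
import struct

# Hypothetical layout, for illustration only: a big-endian header
# (4-byte magic + 32-bit version) followed by a little-endian 32-bit field.
blob = struct.pack(">4sI", b"DEMO", 2) + struct.pack("<I", 0x11223344)

magic, version = struct.unpack_from(">4sI", blob, 0)  # big-endian part
(value,) = struct.unpack_from("<I", blob, 8)          # little-endian part
(misread,) = struct.unpack_from(">I", blob, 8)        # same bytes, wrong byte order

print(magic, version, hex(value), hex(misread))
```

Reading the last field with the wrong byte order silently yields a different number rather than an error, which is why every field's byte order has to be tracked when reverse engineering a format like this.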
Claude is pretty good at turning (dis)assembly into Objective-C. i went exploring these systems looking for the secrets of glass icon rendering. i used ipsw to unpack all the class metadata in relevant system private frameworks. for each class, i extracted class header/interface, and an assembly file per method in the header. i wrote a ruby script to shell out to claude cli with a custom system prompt to give me readable-ish obj-c. It struggled with some patterns but with code as string-dispatch method-call-heavy as obj-c there’s lots of good hints for the ai.
i learned a lot about lldb debugging when i went spelunking through system service process memory. eventually i got too distracted learning about runtime introspection in Swift and obj-c and ended up building a dynamic object explorer/debugger instead of accomplishing my original goal. obj-c runtime dynamism is fascinating. it’s like, “what if we make C as dynamic as Ruby”. you can invent new classes at runtime, swap method implementations, create a new class that extends a specific existing object. you can even change what class an object is.
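As a rough analogy only, the three tricks mentioned above (inventing a class at runtime, swapping a method implementation, changing an object's class) look like this in Python. This is not Objective-C and not the commenter's tooling; the Objective-C runtime exposes comparable operations through C functions such as `objc_allocateClassPair`, `method_exchangeImplementations` and `object_setClass`. The `Dog` and `Puppy` names are hypothetical.

```python
class Dog:
    def speak(self):
        return "woof"

# 1. Invent a new class at runtime, extending an existing one.
Puppy = type("Puppy", (Dog,), {"speak": lambda self: "yip"})

# 2. Swap a method implementation on an existing class,
#    keeping a reference to the original so it can still be called.
original_speak = Dog.speak
Dog.speak = lambda self: original_speak(self).upper()

# 3. Change what class an existing object is.
pet = Dog()
before = pet.speak()   # uses the swapped-in Dog.speak
pet.__class__ = Puppy
after = pet.speak()    # the same object now dispatches to Puppy.speak

print(before, after)
```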
Swift is a lot less dynamic and a lot less introspectable at runtime :-( (there is a swift reflection api called Mirror but i struggled to do anything interesting with it)
This is cool work. However, the author claims the following:
> This knowledge could be useful for security research and building developer tools that does not rely on Xcode or Apple’s proprietary tools.
Yes it could be. But if you developed it for such altruistic purposes, why tease the code?
> I’m considering open-sourcing these tools, but no promises yet!
Maybe OOP is thinking of selling their reverse engineering tools? Seems like that’s still a proprietary tool, I’m just paying someone else for it
I'm not sure it's about money. This may be increasingly hard to imagine in this age of AI-slop, but some devs actually don't want to publish code that is a terribly embarrassing mess, and prefer to clean it up first.
Looks very much like a format that should just have been gzipped JSON.
Don't use binary formats when it isn't absolutely needed.
It's not new. BOMStore is a format inherited from NeXTStep. JSON didn't exist back then.
Also, it's a format designed to hold binary data. JSON can't do that without hacks like base64 encoding.
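A small sketch of the base64 point, assuming a made-up 3000-byte payload and a hypothetical `icon.png` key: JSON has no bytes type, so the payload has to be re-encoded as text, which costs about a third more space before any compression.

```python
import base64
import json
import os

raw = os.urandom(3000)  # stand-in for image bytes

# JSON cannot serialize raw bytes directly.
try:
    json.dumps({"data": raw})
except TypeError as exc:
    error = str(exc)

# The usual workaround: base64, which turns every 3 bytes into 4 characters.
encoded = base64.b64encode(raw).decode("ascii")
doc = json.dumps({"name": "icon.png", "data": encoded})
restored = base64.b64decode(json.loads(doc)["data"])

print(error)
print(len(raw), len(encoded))
```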
Binary file stores like this are very common in highly optimized software, which operating systems tend to be, especially if you go looking at the older parts. Windows has a similar format embedded in EXE/DLL files. Same concept: a kind of pseudo-filesystem used to hold app icons and other resources.
>Looks very much like a format that should just have been gzipped JSON.
For application file formats that require storing binary blob data such as images, bitmaps, etc., many in the industry have settled on "SQLite db as a file format": https://www.sqlite.org/appfileformat.html
Examples include Mozilla using a SQLite db for favicons, Apple iOS using SQLite to store camera photos, the Kodi media player using SQLite for binary data, the Microsoft Visual C++ IDE storing source code browsing data in SQLite, etc.
A SQLite db would usually be a better choice than binary blobs encoded as Base64 and stuffed into json.gzip files. One area where the less efficient gzipped JSON might be better than SQLite is web-server-to-web-browser data transfers, because the client's JavaScript engine has built-in gzip decompression and JSON manipulation functions.
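A minimal sketch of the "SQLite as a file format" idea using Python's standard `sqlite3` module. The `assets` table, its columns and the sample payloads are assumptions made for illustration; a real application would define its own schema and open a file on disk instead of `:memory:`.

```python
import sqlite3

# In-memory database for the example; pass a path to get a real file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (name TEXT PRIMARY KEY, data BLOB NOT NULL)")

# Hypothetical payloads: raw bytes go in as-is, no base64 step.
assets = {
    "icon.png": b"\x89PNG\r\n\x1a\n" + bytes(16),
    "sound.raw": bytes(range(256)),
}
conn.executemany("INSERT INTO assets (name, data) VALUES (?, ?)", assets.items())
conn.commit()

# Fetch a single asset by name without reading the others.
row = conn.execute(
    "SELECT data FROM assets WHERE name = ?", ("icon.png",)
).fetchone()
icon = row[0]
count = conn.execute("SELECT COUNT(*) FROM assets").fetchone()[0]

print(count, len(icon))
```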
Every format is binary in the end, you are just swapping out the separators.
I personally subscribe to the Unix philosophy. There are really two options, binary or plain text (readable binary due to an agreed standard formatting). All other formats are a variation of the two.
Additionally, a binary format makes sense when the format is to be used on mobile devices, which may not have much storage or bandwidth.
Strong disagree. I like binary formats because I can just fopen(), fseek() and fread() stuff. I don't need json parser dependency, and don't need to deal with compression. Binary formats are simple, fast and I need a way smaller buffer to read/write them normally. I don't like wasting resources.
Binary formats are painful to deal with from a user's perspective
Sane for 3rd party devs
I choose binary formats over JSON almost every time I can. JSON sucks big time.
Uh. You want to store assets in JSON? Why? You generally want asset packs to be seekable so that you can extract just one asset, and why would you want to add the overhead of parsing potentially gigabytes of JSON and then, per asset, decoding potentially tens of megabytes of base64?
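To make the seekability argument concrete, here is a sketch of a toy asset pack with an index up front, so one asset can be pulled out with a seek and a single read. The `PACK` magic, field sizes and helper names are invented for this example and have nothing to do with Apple's actual .car layout.

```python
import io
import struct

HEADER = struct.Struct("<4sI")   # magic, entry count
ENTRY = struct.Struct("<16sII")  # NUL-padded name, data offset, data length

def write_pack(assets):
    """Serialize {name: bytes} as header + index + concatenated data."""
    offset = HEADER.size + ENTRY.size * len(assets)
    index, blobs = [], []
    for name, data in assets.items():
        index.append(ENTRY.pack(name.encode("ascii"), offset, len(data)))
        blobs.append(data)
        offset += len(data)
    return HEADER.pack(b"PACK", len(assets)) + b"".join(index) + b"".join(blobs)

def read_asset(f, wanted):
    """Scan the small index, then seek straight to the one asset we want."""
    f.seek(0)
    magic, count = HEADER.unpack(f.read(HEADER.size))
    if magic != b"PACK":
        raise ValueError("not a pack file")
    for _ in range(count):
        raw_name, offset, length = ENTRY.unpack(f.read(ENTRY.size))
        if raw_name.rstrip(b"\0").decode("ascii") == wanted:
            f.seek(offset)
            return f.read(length)
    raise KeyError(wanted)

pack = io.BytesIO(write_pack({"a.png": b"AAAA", "b.png": b"BBBBBBBB"}))
second = read_asset(pack, "b.png")
print(second)
```

Only the fixed-size index entries are scanned; the bytes of other assets are never read or decoded, which is the property a single large JSON document with base64 payloads does not give you.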
> You want to store assets in JSON? Why?
Why not have both options? Having both .gltf and .glb available for assets has been helpful to me more than once; having the option gives you the best of both worlds :)
What's Apple's incentive for having two different asset pack formats? It seems like more work to support parsing and generating both, and Apple expects you to use their tools to generate asset packs.
Working with binary files really isn't that hard. If Apple documented the .car format, writing your own parser wouldn't be difficult. It's not like it's a complicated format. Still, Apple clearly doesn't intend for people to make their own .car generators; to them, ease of reverse engineering is a bug, not a feature.
Keep all the meta info in JSON and then the big binary files in a zip file. Much easier to parse.
Easier for the developer or easier for the computer?
Computers need to do it a bunch for every program launch for every single user of macOS for decades. The developer just needed to write a generator and a parser for the format once.
Would it have been a bit easier to write a parser for a format that's based around a zip file with a manifest.json? I don't know, maybe. You'd end up with some pretty huge dependencies (a zip file reader, a JSON parser), but maybe it'd be slightly easier. Is it worth it?
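For comparison, a sketch of the zip-plus-manifest approach discussed in this subthread, using only Python's standard library. The manifest schema and file names are hypothetical.

```python
import io
import json
import zipfile

buf = io.BytesIO()

# Write: metadata goes into manifest.json, payloads are stored as raw members.
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    manifest = {"assets": [{"name": "icon", "file": "icon.bin", "scale": 2}]}
    zf.writestr("manifest.json", json.dumps(manifest))
    zf.writestr("icon.bin", b"\x00\x01\x02\x03")

# Read: parse the small manifest, then extract only the member it points to.
with zipfile.ZipFile(buf) as zf:
    meta = json.loads(zf.read("manifest.json"))
    entry = meta["assets"][0]
    payload = zf.read(entry["file"])

print(entry["name"], entry["scale"], payload)
```

The zip central directory plays the same role as a hand-rolled index, so individual members stay seekable; the cost, as the comment above notes, is pulling in a zip reader and a JSON parser.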
Someday JSON will be out of fashion - like XML is now.
... and that is why all 'modern' software is incredibly memory and CPU intensive...
But when things go wrong, you can usually find some random json file and adjust it :)
> _QWORD *__fastcall
Is that WinDOS shit?
Anyway, compiling to WASM is smart. Apple can't kill your tools if they're not on the app store. And you don't have to pay Apple tax for giving access to a free tool. Cool project!
Idea: pass the decompiled code through a "please rename variables according to their purpose" step using a coding agent. Not ideal, but arguably better than v03, v20. And almost zero effort in this day and age.
And have it hallucinate stuff? Nah, this stuff is hard enough without LLMs guessing.
Well, I mean just choosing better names, without touching the actual code. And you can also add a basic human filtering step if you want. You cannot possibly say that "v12" is better than "header.size". I would argue that even hallucinated names are good: you should be able to think "but this position variable is not quite correctly updated, maybe this is not the position", which seems better than "this v12 variable is updated in some complicated way which I will ignore because it has no meaning".
It's a labeling task with benign failure modes, much better suited for an LLM compared to generation
i think for obj-c specifically (can’t speak to other langs) i’ve had a great experience. it does make little mistakes but ai oriented approach makes it faster/easier to find areas of interest to analyze or experiment with.
obj-c's use of objc_msgSend makes it more similar to understanding minified JS than decompiling static C, because it literally calls many methods by string name.