I’m a little confused by this submission. CASTOR is the old system; it has been replaced by the CERN Tape Archive (CTA) since ~2020: https://cta.web.cern.ch/cta/
This is mentioned on the page but it’s easy to miss.
For the current status of tape storage at CERN see: https://indico.cern.ch/event/1471803/contributions/6967379/a...
For reference, most disk storage for physics data uses an in-house solution called EOS: https://eos-web.web.cern.ch/eos-web/
Does tape array replace CASTOR? Just from the names, it sounds like tape array is the actual storage, and CASTOR is an abstraction that automatically decides what's kept on disk and what's kept on tape.
The linked page seems to think it does.
"As of June 29th 2020, CTA, the CERN Tape Archive, started to be operated as the successor of CASTOR and gradually replaced it."
The abstraction isn’t really a thing any more. It was a nice idea, but in practice it’s an operational nightmare not knowing whether data is available and for how long it will stay that way. For reference, staging can take days during intense activity, and you don’t want to lose performance randomly seeking around and switching between tapes.
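To make the seeking point concrete, here is a minimal sketch of how a tape recall scheduler might batch staging requests: group pending recalls by tape so each cartridge is mounted once, then read files in increasing on-tape position so the drive streams forward instead of seeking back and forth. The `RecallRequest` type, its `tape_id`/`position` fields and the "busiest tape first" policy are hypothetical illustrations, not CTA's actual data model or scheduler.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RecallRequest:
    # Hypothetical request: which file, on which tape, at what position on that tape.
    path: str
    tape_id: str
    position: int  # e.g. file sequence number on the tape


def schedule_recalls(requests):
    """Return a list of (tape_id, [paths in read order]) batches.

    Each tape appears once (one mount), tapes with more pending work go first,
    and files on a tape are read in increasing position to avoid back-seeks.
    """
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req.tape_id].append(req)

    # Prioritise tapes with the most queued requests: one mount serves more reads.
    tapes = sorted(by_tape, key=lambda t: (-len(by_tape[t]), t))
    plan = []
    for tape in tapes:
        ordered = sorted(by_tape[tape], key=lambda r: r.position)
        plan.append((tape, [r.path for r in ordered]))
    return plan


if __name__ == "__main__":
    pending = [
        RecallRequest("/data/run1/a.root", "T00042", 17),
        RecallRequest("/data/run2/b.root", "T00007", 3),
        RecallRequest("/data/run1/c.root", "T00042", 2),
        RecallRequest("/data/run3/d.root", "T00042", 9),
    ]
    for tape, paths in schedule_recalls(pending):
        print(tape, paths)
```

A real scheduler would also weigh request age, drive availability and fairness between users; the sketch only shows why batching per tape and reading in tape order matters.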
As others have said, CASTOR has been discontinued, and replaced with CTA:
https://gitlab.cern.ch/cta/CTA
Its memory is still alive in CTA, however:
https://gitlab.cern.ch/cta/CTA/-/blob/main/catalogue/TapeSea...
I was an intern at CERN in the mid-2010s and worked on this!
Fun fact: CERN sells old data tapes as souvenirs, I got myself one of the old LHC tapes :)
With the data included (not wiped)?
looks like the image on the right is broken, but it is supposed to be: https://cta.web.cern.ch/cta/assets/images/namespace_statisti...
(It looks like this submission uses https://castor.web.cern.ch/content/home.html instead of https://castor.web.cern.ch/castor/; the second link does not have the broken image.)
This is actually super useful for real world stuff. Thanks for this.
Tape is boring, but when an intern / AI / tectonic plate accidentally destroys your database setup, it is a huge lifesaver.
Anybody know what these fancy Oracle tapes are? Is it just their implementation of a regular standard?
If it's Oracle Tape, it's proprietary T10000-series 1/2in linear tape and associated drives, that they got when they absorbed Sun (and Sun got when they bought StorageTek). Multiple vendors made tape media for these, but they were not compatible w/ LTO tape nor the IBM 3590-series enterprise tape format.
See this conference talk from last week: https://indico.cern.ch/event/1471803/contributions/6967379/
There isn’t a recording, but slides are linked from that page.
"Castor" was the name of a storage system used for transporting nuclear waste in Germany. There were quite a few protests against shipping nuclear waste through the country.
Wouldn't have been my choice for a software project :-)
It’s also French for beaver, which is more likely the origin of the name.
It's also Latin and Greek for beaver, which is more likely the origin of the name.
Neither Latin nor Greek is one of the working languages at CERN (French and English are).
Also Spanish.
I would say "Italian" :)
Castor oil makes you poop, maybe there’s a data management metaphor in there somewhere.
A few historical additions for anybody interested:
- CASTOR at CERN also had a disk-centric derivative named DPM (Disk Pool Manager) that helped power the Worldwide LHC Computing Grid (WLCG) for multiple decades before being deprecated.
- Interestingly, DPM had an architecture closely aligned with the original Google File System, even though it was developed completely separately: one metadata node and multiple disk nodes, designed for write-once-read-many access with only very partial POSIX semantics.
- The LHC Computing Grid is an association of research centers, each with its own infrastructure. As such, they (historically) had many different storage systems with different protocols and interfaces.
- To unify this madness, an attempt at a "standard" protocol was made in the 2000s: SRM (Storage Resource Manager). In pure XKCD fashion, it went as badly as you can imagine. It relied on the tech of the time (XML, SOAP, WSDL) and is a textbook case of terrible protocol design (bloated, slow, weakly consistent, with massive server overhead, stupidly complex to implement, and quite insecure). The specs are worth a read if you want a good laugh [1].
- After 20 years of struggle, SRM was eventually dropped for a more pragmatic, ad hoc solution based on HTTP + xrootd [2]. EOS itself uses xrootd quite extensively (unless this has changed).
- The history of computing at CERN is interesting overall because it is a pretty good mirror of the evolution of computing and of the "tech fashions" associated with it.
[1]: https://sdm.lbl.gov/srm-wg/doc/SRM.spec.v2.1.1.html
[2]: https://xrootd.org/
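The DPM/GFS-style layout described above (one metadata node, many disk nodes, write-once-read-many) can be sketched as a toy in-memory model. `MetadataNode`, `DiskNode`, the client-side `write`/`read` helpers and the round-robin placement policy are all hypothetical illustrations, not DPM's actual code or API. The key idea is that the metadata node only hands out and looks up locations, while file data goes directly between the client and a disk node.

```python
import itertools


class DiskNode:
    """Holds file contents only; it knows nothing about the namespace."""

    def __init__(self, name):
        self.name = name
        self._blobs = {}

    def put(self, file_id, data):
        self._blobs[file_id] = bytes(data)

    def get(self, file_id):
        return self._blobs[file_id]


class MetadataNode:
    """Single namespace server: maps each path to (disk node, file id)."""

    def __init__(self, disk_nodes):
        self._placement = itertools.cycle(list(disk_nodes))  # naive round-robin placement
        self._ids = itertools.count()
        self._namespace = {}

    def allocate(self, path):
        # Write-once: a path that already exists can never be reassigned.
        if path in self._namespace:
            raise FileExistsError(path)
        entry = (next(self._placement), next(self._ids))
        self._namespace[path] = entry
        return entry

    def locate(self, path):
        # The metadata node only answers "where is it?"; file data never passes through it.
        return self._namespace[path]


def write(meta, path, data):
    # Client side: get a location from the metadata node, then send data to that disk node.
    node, file_id = meta.allocate(path)
    node.put(file_id, data)
    return node.name


def read(meta, path):
    # Client side: look up the location, then fetch directly from the disk node.
    node, file_id = meta.locate(path)
    return node.get(file_id)
```

Keeping the single metadata node out of the data path is what lets such a design scale reads across many disk nodes, at the cost of that node being a bottleneck and single point of failure for namespace operations.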
The various CERN web pages such as this were a treasure trove of information when I was working on my last novel. I actually included a few paragraphs on Castor thinking of using it as a side-plot, but my editor cut the plot out along with a few other technical niceties. Sigh!
Wonder how this compares to Venti[1]. It looks a lot more complicated (not really a good thing).
[1]: https://doc.cat-v.org/plan_9/4th_edition/papers/venti/
You could use tape as a backing for Venti arenas; I don't know if anyone ever did so. The original Bell Labs fileserver used an MO jukebox for WORM archives, of which LTFS tape today is a pretty close approximation.
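For readers unfamiliar with Venti: it addresses each block by the SHA-1 hash of its contents (a "score"), appends blocks to write-once arenas, and keeps an index from score to arena location, so identical blocks are stored only once. The toy below is a simplified illustration, not Plan 9's implementation; the `Arena` and `ContentStore` names are made up, and an in-memory `bytearray` stands in for the arena, which is the piece a tape-backed store would replace.

```python
import hashlib


class Arena:
    """Append-only log of blocks; nothing is ever rewritten in place."""

    def __init__(self):
        self._log = bytearray()

    def append(self, data):
        offset = len(self._log)
        self._log += data
        return offset

    def read(self, offset, length):
        return bytes(self._log[offset:offset + length])


class ContentStore:
    def __init__(self):
        self.arena = Arena()
        self._index = {}  # score (hex SHA-1) -> (offset, length) in the arena

    def write(self, data):
        score = hashlib.sha1(data).hexdigest()
        if score not in self._index:  # identical blocks are stored only once
            offset = self.arena.append(data)
            self._index[score] = (offset, len(data))
        return score

    def read(self, score):
        offset, length = self._index[score]
        data = self.arena.read(offset, length)
        # Content addressing makes every read self-verifying.
        if hashlib.sha1(data).hexdigest() != score:
            raise ValueError("block does not match its score")
        return data
```

Because arenas are only ever appended to, they map naturally onto sequential, write-once media such as WORM disks or tape.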
They now have over an exabyte worth of data on tapes.