I strongly recommend watching/reading the entire report, or the summary by Sal Mercogliano of What's Going On In Shipping [0].
Yes, the loose wire was the immediate cause, but there was far more going wrong here. For example:
- The transformer switchover was set to manual rather than automatic, so it didn't automatically fail over to the backup transformer.
- The crew did not routinely train transformer switchover procedures.
- The two generators were both using a single non-redundant fuel pump (which was never intended to supply fuel to the generators!), which did not automatically restart after power was restored.
- The main engine automatically shut down when the primary coolant pump lost power, rather than using an emergency water supply or letting it overheat.
- The backup generator did not come online in time.
It's a classic Swiss Cheese model. A lot of things had to go wrong for this accident to happen. Focusing on that one wire isn't going to solve all the other issues. Wires, just like all other parts, will occasionally fail. One wire failure should never have caused an incident of this magnitude. Sure, there should probably be slightly better procedures for checking the wiring, but next time it'll be a failed sensor, actuator, or controller board.
If we don't focus on providing and ensuring a defense-in-depth, we will sooner or later see another incident like this.
The problem is that there are a thousand merchant marine vessels operating right now that are all doing great - until the next loose wire. The problem is that nobody knows about that wire and it worked fine on the last trip. The other systems are all just as marginal as they were on the 'Dali' but that one shitty little wire is masking that.
Running a 'tight ship' is great when you have a budget to burn on excellent quality crew. But shipping is so incredibly cut-throat that the crew members make very little money, are effectively modern slaves and tend to carry responsibilities way above their pay grade. They did what they could, and more than that, and for their efforts they were rewarded with what effectively amounted to house arrest while the authorities did their thing. The NTSB of course will focus on the 'hard' causes. But you can see a lot of frustration shine through towards the owners who even in light of the preliminary findings had changed absolutely nothing on the rest of their fleet.
The recommendation to inspect the whole ship with an IR camera had me laughing out loud. We're talking about a couple of kilometers of poorly accessible duct work and cabinets. You can do that while in port, but while you're in port most systems are idle or near idle and so you won't ever find an issue like this until you are underway, when vibration goes up and power consumption shoots up compared to being in port.
There is no shipping company that is realistically going to do a sea trial after every minor repair; usually there is a technician from some supplier who boards the vessel (often while it is underway), makes some fix and then goes off-board again. Vessels that are not moving are money sinks so the goal is to keep turnaround time in port to an absolute minimum.
What should really amaze you is how few of these incidents there are. In spite of this being a regulated industry, this is first and foremost an oversight failure; if the regulators had more budget and more manpower there might be a stronger drive to get things technically in good order (resist temptation: 'shipshape').
> But you can see a lot of frustration shine through towards the owners who even in light of the preliminary findings had changed absolutely nothing on the rest of their fleet.
Between making money, perceived culpability and risks offloaded to insurance companies why would they?
> The problem is that there are a thousand merchant marine vessels operating right now that are all doing great
Are they tho?
I generally think you have good takes on things, but this comes across like systemic fatalistic excuse making.
> The recommendation to inspect the whole ship with an IR camera had me laughing out loud.
Where did this come from? What about the full recommendations from the NTSB? This comment makes it seem like you are calling into question the whole of the NTSB's findings.
"Don't look for a villain in this story. The villain is the system itself, and it's too powerful to change."
> Between making money, perceived culpability and risks offloaded to insurance companies why would they?
Because it is the right thing to do, and the NTSB thinks so too.
>> The problem is that there are a thousand merchant marine vessels operating right now that are all doing great
> Are they tho?
In the sense that they haven't caused an accident yet, yes. But they are accidents waiting to happen and the owners simply don't care. It usually takes a couple of regulatory interventions for such a message to sink in; what the NTSB is getting at there is that they would expect the owners to respond more seriously to these findings.
>> The recommendation to inspect the whole ship with an IR camera had me laughing out loud.
> Where did this come from?
Page 58 of the report.
And no, obviously I am not calling into question the whole of the NTSB's findings, it is just that that particular one seems to miss a lot of the realities involving these vessels.
> "Don't look for a villain in this story. The villain is the system itself, and it's too powerful to change."
I don't understand your goal with this statement, it wasn't mine so the quotes are not appropriate and besides I don't agree with it.
Loose wires are a fact of life. The amount of theoretical redundancy is sufficient to handle a loose wire, but the level of oversight and the combination of ad-hoc work on these vessels (usually under great time pressure) together are what caused this. And I think that NTSB should have pointed the finger at those responsible for that oversight as well, which is 'MARAD', however, MARAD does not even rate a mention in the report.
>> "Don't look for a villain in this story. The villain is the system itself, and it's too powerful to change."
> I don't understand your goal with this statement, it wasn't mine so the quotes are not appropriate and besides I don't agree with it.
fwiw, your first comment left with me the exact same impression as it did sitkack.
Oh, there are plenty of villains here. But they're in offices and wearing ties.
And they should be smacked down hard, but that isn't going to happen because then - inevitably - the role of the regulators would come under scrutiny as well. That is the main issue here. The NTSB did a fantastic job - as they always do - at finding the cause; it never ceases to amaze me how good these people are at finding the technical root cause of accidents. But the bureaucratic issues are the real root cause here: an industry that is running on wafer-thin margins with ships that probably should not be out there, risking people's lives for a miserly wage.
Regulators should step in and level the playing field. Yes, that will cause prices of shipping to rise. But if you really want to solve this that is where I think they should start and I am not at all saying that the system is too powerful to change, just that for some reason they seem to refuse to even name it, let alone force it to change.
Fwiw, and since you received several comments about it, your first comment did not come off to everyone as making excuses. It was pretty clear you were trying to turn people's attention to the real problem.
There was also no fatalistic tone about the system being too powerful to change. Just clear sharing of observations IMO.
It is not unusual to receive this reaction (being blamed for fatalism and making excuses) from observations like these, I have noticed.
I suspect a lot of people commenting in this thread have never been on one of these ships or have any idea of what the typical state of maintenance is, and how inaccessible the tech compartments are when the vessel is underway. This isn't exactly a server room environment. When vessels are new (in the first five years or so) and under the first owners they are usually tip-top. Then, after the first sale, the rot sets in and unless there is a major overhaul you will see a lot of issues like these; usually they do not have such terrible consequences. They tend to last for 25 years or so (barring mishaps) and by then the number of repairs will be in the hundreds and the vessel will have changed hands a couple of times.
Passenger carrying vessels are better, but even there you can come across some pretty weird stuff.
I agree with all of this and everything you've said thus far. I hope my prior comment was not interpreted as some sort of indictment or attack on your motives.
Your original comment comes off like excuse-making and that nothing can possibly be done.
> > Between making money, perceived culpability and risks offloaded to insurance companies why would they?
> Because it is the right thing to do, and the NTSB thinks so too.
Doing great is much different than "accidents waiting to happen".
I don't understand the goal of your changing rhetoric.
[deleted]
You can also look at the problem from the perspective of the bridge. Why was it possible that a ship took it down? Motors can fail ...
Yes, but if you think of a ship whose engine fails while underway as an unguided ballistic missile with a mass that is absolutely mind-boggling (the Dali masses 100,000 tonnes), there isn't much that you could build that would stop it. The best suggestion I've seen is to let the ship run aground, but that ignores the situation around the area where the accident happened.
This ship wasn't towed by a tug, it was underway under its own power and in order for the ship to have any control authority at all it needs water flowing over the rudder.
Without that forward speed you're next to helpless, and these things don't exactly turn on a dime. So even if there had been a place where it could have run aground, it would never have been able to reach it because it was still in the water directly in front of the passageway under the bridge.
100,000 tonnes doing 7 km/h is a tremendous amount of kinetic energy.
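Back-of-the-envelope, using the approximate figures from this thread (a rough sketch, not exact numbers):

    # Kinetic energy = 1/2 * m * v^2 for ~100,000 tonnes at ~7 km/h
    m = 100_000 * 1000           # kg
    v = 7 / 3.6                  # m/s, about 1.94
    ke = 0.5 * m * v**2          # joules
    print(f"{ke / 1e6:.0f} MJ")  # ~189 MJ, on the order of 45 kg of TNT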
The exact moment the systems aboard the Dali failed could not have come at a worse time: it had - as far as I'm aware of the whole saga - just made a slight course correction to better line up with the bridge and the helm had not yet been brought back to neutral. After that it was just Newton taking over; without control I don't think there is much that would have stopped it.
This is a good plot of the trajectory of the vessel from the moment it went under way until the moment it impacted the bridge:
You can clearly see the kink in the trajectory a few hundred meters before it hit the bridge.
Perhaps, but you can also build redundancy into the bridge.
You can, if you're prepared to pay for it. You could halt shipping while people are working on the bridge. You could make tunnels instead of bridges.
The question is simple: who will pay for it? Apparently we are ok with this kind of risk, if we weren't we would not be doing this at all.
There is a similar thing going on in my country with respect to railway crossings. Every year people die on railway crossings. But it took a carriage full of toddlers being hit by a train before the sentiment switched from 'well, they had it coming' to 'hm, maybe we should do something about this'. People don't like to pay for risks they see as small or that they perceive as never going to affect them.
This never was about technology, it always was about financing. Financing for proper regulatory tech oversight (which is vastly understaffed) on the merchant marine fleet, funding for better infrastructure, funding for (mandatory) tug assistance for vessels of this size near sensitive structures, funding for better educated and more capable crew and so on. The loose wire is just a consequence of a whole raft of failures that have nothing to do with a label shroud preventing a wire from making proper contact.
The 'root cause' here isn't really the true root cause, it is just the point at which technology begins and administration ends.
Too build "a redundancy into the bridge" to survive such a overwhelming force would be a very expensive endeavor.
Better to spend the effort in fleet education
It’s not realistically plausible to build bridges that won’t be brought down by that size of ship
This, 100%. I forget the specific numbers but regardless, the kinetic energy of a thing with that much mass, even moving at a very slow speed, is off the charts. Designing a bridge or protections for a bridge to survive that would at a minimum be cost prohibitive, if even possible with today’s materials and construction technologies.
> The NTSB found that the Key Bridge, which collapsed after being struck by the containership Dali on March 26, 2024, was almost 30 times above the acceptable risk threshold for critical or essential bridges, according to guidance established by the American Association of State Highway and Transportation Officials, or AASHTO.
> Over the last year, the NTSB identified 68 bridges that were designed before the AASHTO guidance was established — like the Key Bridge — that do not have a current vulnerability assessment. The recommendations are issued to bridge owners to calculate the annual frequency of collapse for their bridges using AASHTO’s Method II calculation.
This is essentially the same thing that happened with Fukushima Daiichi. The organization running it failed to respond to new information.
[deleted]
Energy doesn't mean squat without a time component over which it's dissipated.
Stopping a car normally vs crashing a car. Skydiving with a parachute vs skydiving without a parachute.
For something like ship vs bridge you have to account for the crunch factor. USS Iowa going the same speed probably would've hit way harder despite having ~1/3 the tonnage.
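To make the "crunch factor" point concrete (illustrative numbers only, reusing the rough energy estimate above): the same kinetic energy dissipated over a shorter crush distance means a proportionally larger average force, which is why a stiffer hull "hits harder".

    # Average force ~ energy / stopping distance (very crude; distances invented)
    ke = 1.89e8                      # J, from the rough estimate above
    for crush_m in (1, 5, 20):       # stiffer structure -> shorter crush distance
        print(crush_m, "m crush:", round(ke / crush_m / 1e6, 1), "MN average force")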
Nah, we definitely can.
Plan the bridge so any ship big enough to hurt it grounds before it gets that close. Don't put pilings in the channel. It's just money. But it's a lot of money so sometimes it's better to just have shipping not suck.
Alternatively, the Chunnel will almost certainly never get hit with a ship.
> Plan the bridge so any ship big enough to hurt it grounds before it gets that close.
Have a look at the trajectory chart that I posted upthread and tell me how in this particular case you would have arranged that.
Yet another idea: if a ship's motors fail, have a ship ready that can push it in the right direction, in time. You'd probably need 2x the amount of horsepower to make up for lost time, but it's not impossible.
Yes, that's called a tug and in plenty of harbors a vessel of this size would not be permitted to do close quarters maneuvers without the mandatory assistance of one, or in this case more likely two, tug boats of a certain minimum size relative to the size of the vessel.
It's a tangent but I don't understand why the dock workers can unionize and earn livable wages but the crew cannot.
The dock can’t move to a jurisdiction that is less union friendly.
The crew could go on strike and cause a comparable amount of disruption to the supply chain as dock workers.
Sometimes when I see vocal but rather uninformed opposition to the Jones Act, I wonder if it isn't partially an aim at union busting.
IIRC the biggest issue with the Jones Act is the supply of US-built vessels, which are expensive; the current fleet is aging and, outside of defence, there's no real domestic shipbuilding industry anymore. This also means that domestic shipping (especially to populated areas that are not part of the lower 48) can't use anywhere near as much in the way of modern container ships. This is anecdotal, but I've heard that people in Puerto Rico and Hawaii routinely order stuff from foreign countries, as even with duties it can be cheaper than ordering it from the mainland.
The act is problematic because it hasn't really been modernized, with the handful of revisions essentially just expanding its scope. The US either has to seriously figure out getting domestic shipbuilding going again (to the point where it can be economical to also export them) or at least whitelist foreign countries (eg South Korea) to allow their ships to be used. But that's unlikely in today's political climate.
The US government used to provide differential subsidies for cargos shipped on US flagged ships. This ensured that US shipping was competitive with the bottom dollar global shippers, at least for some cargos.
This ended under Reagan.
At first lots of people didn't care because Reagan was also doing his 600 ship navy so everyone was busy doing navy work, but after that ended the MM and american shipbuilding entered a death spiral.
Now the only work US flagged vessels can get is supporting the navy, and a tiny sliver of jones act trade. This means there are no economies of scale. If a ship is built, one is built to that class not 10. Orders are highly intermittent and there is no ability to build up a skilled workforce in efficient serial production. On the seagoing side, ships either get run ragged on aggressive schedules (ex: El Faro) or they sit in layup for long stretches rusting away.
If the US wants to fix its merchant marine it needs to provide incentive for increased cargos and increased shipbuilding. As Sal points out, the US is the second-biggest shipowning country in the world. US businesses like owning ships, they just don't want to fly the American flag because their incentives are towards offshoring.
The incentives are also all over the place. The shipping industry uses a lot of labour from "poor" countries, but on bulk shipping the labour costs are often a rounding error. The main issue is, of course, working conditions. Americans don't want to sit on a freighter for 6 month tours away from their families. The US navy has a hard enough problem doing it for people in their early 20s, and even then that's usually to get access to education funding. People from the Philippines will do it because it is life-changing amounts of money and the alternative is abject poverty.
> Americans don't want to sit on a freighter for 6 month tours away from their families
And yet finding crews was never a problem before differential subsidies ended.
In fact crewing US flagged is harder now because the work is intermittent. If people can't find berths they time out on their licenses and go do something else in a different industry.
> People from the Philippines will do it because it is life-changing amounts of money
The international minimum wage for seafarers is about $700/mo. In comparison wages in the Philippines are between 20k-50k pesos a month or $340-$850. Seafaring is an above-average income job in the Philippines but not "life-changing."
IIUC, the only issue with the Jones Act requiring US-built vessels is that previously the US Navy used to buy US-built vessels and lease them out below cost, and they don't do that anymore. It was never economical to use US-built vessels; we've just stopped subsidizing it.
The US navy did not. The US Treasury used to provide "differential subsidies" to allow US flagged vessels the ability to win cargos in international trade versus non-us flagged vessels with lower operating costs.
Thanks for the summary for those of us who can't watch video right now.
There are so many layers of failures that it makes you wonder how many other operations on those ships are only working because those fallbacks, automatic switchovers, emergency supplies, and backup systems save the day. We only see the results when all of them fail and the failure happens to result in some external problem that means we all notice.
It seems to just be standard "normalization of deviance" to use the language of safety engineering. You have 5 layers of fallbacks, so over time skipping any of the middle layers doesn't really have anything fail. So in time you end up with a true safety factor equal only to the last layer. Then that fails and looking back "everything had to go wrong".
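To put rough numbers on that (a toy calculation with invented failure rates, nothing from the report):

    # 5 nominally independent layers, each assumed to miss a fault 1% of the time
    p_miss = 0.01
    print(p_miss ** 5)  # all five layers actually working: 1e-10
    print(p_miss ** 1)  # four layers quietly skipped: 0.01, the last layer alone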
As Sidney Dekker (of Understanding Human Error fame) says: Murphy's Law is wrong - everything that can go wrong will go right. The problem arises from the operators all assuming that it will keep going right.
I remember reading somewhere that part of Qantas's safety record came from the fact that at one time they had the highest number of minor issues. In some sense, you want your error detection curve to be smooth: as you get closer to catastrophe, your warnings should get more severe. On this ship, it appeared everything was A-OK till it bonked a bridge.
This is the most pertinent thing to learn from these NTSB crash investigations - it's not what went wrong at the final disaster, but all the things that went wrong that didn't detect that they were down to one layer of defense.
Your car engaging auto brake to prevent a collision shouldn't be a "whew, glad that didn't happen" and more a "oh shit, I need to work on paying attention more."
I had to disable the auto-brake from RCT[1] sensors because of too many false-positives (like 3 a week) in my car.
1: rear-cross-traffic i.e. when backing up and cars are coming from the side.
One of my car's auto-brake sensors triggers when I back up out of my driveway. I cannot back out of my driveway with the sensor on.
Yes and having 3 O-rings doesn't mean you can have one frozen solid "just this time"
Why then does the NTSB point blame so much at the single wiring issue? Shouldn't they have the context to point to the 5 things that went wrong in the Swiss cheese and not pat themselves on the back with having found the almost-irrelevant detail of
> Our investigators routinely accomplish the impossible, and this investigation is no different...Finding this single wire was like hunting for a loose rivet on the Eiffel Tower.
In the software world, if I had an application that failed when a single DNS query failed, I wouldn't be pointing the blame at DNS and conducting a deep dive into why this particular query timed out. I'd be asking why a single failure was capable of taking down the app for hundreds or thousands of other users.
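To make the analogy concrete, a minimal sketch of the kind of fallback you'd want around that one lookup (the retry count and delay are made up; it only uses the standard library resolver):

    import socket
    import time

    def resolve_with_retry(host, port=443, attempts=3, delay=0.5):
        """Retry DNS resolution so one timed-out query doesn't take the app down."""
        for attempt in range(attempts):
            try:
                return socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
            except socket.gaierror:
                if attempt == attempts - 1:
                    raise               # surface the failure only after all retries
                time.sleep(delay)       # brief pause before the next attempt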
That seems like a difference between the report and the press release. I'm sure it doesn't help that the current administration likes quick, pat answers.
The YouTube animation they published notes that this also wasn't just one wire - they found many wires on the ship that were terminated and labeled in the same (incorrect) way, which points to an error at the ship builder and potentially a lack of adequate documentation or training materials from the equipment manufacturer, which is why WAGO received mention and notice.
> I'm sure it doesn't help that the current administration likes quick, pat answers.
Oh, the wire was blue?
In all seriousness, listing just the triggering event in the headline isn't that far out of line. Like, the Titanic hit an iceberg, but it was also traveling faster than it should have in spite of iceberg warnings, and it did so overloaded and without adequate lifeboats, and it turns out there were design flaws in the hull. But the iceberg still gets first billing.
I think it reads as too cute by half. The wire was just the one of dozens of problems that happened to come last. It's natural to attribute cause in that way, but it's not really helpful in communicating the purpose of these investigations.
If this represents a change in style and/or substance of these kinds of press releases, my hunch would be that the position was previously hired for technical writers but was most recently filled by PR.
It’s also immediately actionable and other similar ships can investigate their wires
The faulty wire is the root cause. If it didn't trigger the sequence of events, all of the other things wouldn't have happened. And it's kind of a tricky thing to find, so that's an exciting find.
The flushing pump not restarting when power resumed did also cause a blackout in port the day before the incident. But you know, looking into why you always have two blackouts when you have one is something anybody could do; open the main system breaker, let the crew restore it and that flushing pump will likely fail in the same way every time... but figuring out why and how the breaker opened is neat, when it's not something obvious.
Operators always like to just clear the fault and move on; they have extremely high pressure to make schedule and low incentive to work safely.
The solution then is observability, to use the computing term; to know the state of every part of the system.
Oh, it gets even worse!
The NTSB also had some comments on the ship's equivalent of a black box. Turns out it was impossible to download the data while it was still inside the ship, the manufacturer's software was awful and the various agencies had a group chat to share 3rd party software(!), the software exported thousands of separate files, audio tracks were mixed to the point of being nearly unusable, and the black box stopped recording some metrics after power loss "because it wasn't required to" - despite the data still being available.
At least they didn't have anything negative to say about the crew: they reacted timely and adequately - they just didn't stand a chance.
It’s pretty common for black boxes to be load shed during an emergency. Kind of funny how that was allowed for a long time.
"they reacted timely and adequately" and yet: they're indefinitely restricted (detained isn't the right word, but you get it) to Baltimore, while the ship is free to resume service.
One of the things Sal Mercogliano stressed is that the crew (and possibly other crews of the same line) modified systems in order to save time.
Rather than doing the process of purging high-sulphur fuel that can't be used in USA waters, they had it set so that some of the generators were fed from USA-approved fuel, resulting in redundancy & automatic failover being compromised.
It seems probable that the wire failure would not have caused catastrophic overall loss of power if the generators had been in the normal configuration.
Also the zeroth failure mode: someone built a bridge that will collapse if any of the many, many large ships that sail beneath it can't steer itself with high precision.
Ships were a lot smaller when the bridge was designed and built.
In 1971 there were ships with almost twice the displacement of the Dali.
They weren't freight ships destined for Baltimore, but it wasn't hard to imagine future freight ship sizes when designing the bridge in the early 1970s.
The London sewer system was designed in the 1850s, when the population was around two million people.
It was so overdesigned that it held up to the 1950s, when the population was over 8 million. It didn't start to become a big problem until the 1990s.
Right? There's an artificial island in that very harbor, which could be rammed by similar ships all day and give nary a fuck. It's called Fort Carroll and it was built in the *1850s*.
Why the bridge piers weren't set into artificial islands, I can't fathom. Sure. Let's build a bridge across a busy port but not make it ship-proof. The bridge was built in the 1970s, had they forgotten how to make artificial islands?
If you design a fort and it actually gets used and turns out to suck, that WILL be the end of your career in the military, even if it only comes out as sucking 20 years later, unless you have an airtight case for why it's not your fault. That's just how the .mil works. Heads MUST roll. This is completely the opposite of big company bureaucracy and on a literal different planet than civil government bureaucracy.
The organizations that made the bridge happen were so much more vast, had so much higher turnover, and were subject to way, way, way looser application of consequences than the one that built the fort that it would be literally impossible to get them to produce something so unnecessarily robust for the average use case.
This sort of "everything I depend on will just have to not suck because my shit will keel right over if it sucks in the slightest" type engineering is all over the modern world and does work well in a lot of places when you consider lifetime cost. But when it fails bridges fall over and cloudflare (disclaimer, didn't actually read that PM, have no idea what happened) goes down or whatever.
> on a literal different planet than civil government
Unless the military was relocated to Mars (or at least the Moon) during the shutdown, I think the word is "metaphorically" instead of "literal".
Or it was just a different plane ...
[dead]
The fuel pump not automatically restarting on power loss may actually have been an intentional safety feature to prevent scenarios like pumping fuel into a fire in or around the generators. Still part of the Swiss cheese model, of course.
It wasn't. They were feeding generators 1 & 2 with the pump intended for flushing the lines while switching between different fuel types.
The regular fuel pumps were set up to automatically restart, which is why a set of them came online to feed generator 3 (which automatically spun up after 1 & 2 failed, and wasn't tied to the fuel-line-flushing pump) after the second blackout.
I have found that 99% of all network problems are bad wires.
I remember that the IT guys at my old company used to immediately throw out every ethernet cable and replace them with ones right out of the bag, first thing.
But these ships tend to be houses of cards. They are not taken care of properly, and run on a shoestring budget. Many of them look like floating wrecks.
If I see a RJ45 plug with a broken locking thingie, or bare wires (not just bare copper - any internal wire), I chop the plug off.
If I come across a CATx (solid core) cable being used as a really long patch lead then I lose my shit or perhaps get a backbox and face plate and modules out along with a POST tool.
I don't look after floating fires.
Chopping the plug is a very good idea, everybody should practice that.
I once had a recurring problem with patch cables between workstations and drops going bad, four or five in one area that had never had that failure rate before. Turns out, every time I replaced one somebody else would grab the "perfectly good" patch cable from the trash can beside my desk. God knows why people felt compelled to do that when they already had perfectly good wires, maybe they thought because it was a different colour it would make their PC faster... So, now every time I throw out a cable that I know to be defective, I always pop the ends off. No more "mystery" problems.
I'd be so tempted to find a source for shiny-looking Cat 3 (10Mbit/sec) patch cables, and start seeding my trash can with those...
> RJ45 plug with a broken locking thingie
You can get replacement clips for those for a quick repair.
Then you kill the visual signal that the cable might have been yanked and is potentially loose.
I recently had a home network outage. The last thing I tested was the in-wall wiring because I just didn't think that would be the cause. It was. Wiring fails!
I remember a customer support call where the hardware they bought from us wasn't working. The last question I asked was "are you sure that the outlet it's plugged into is working?"
Oh yeah had outages recently. Turned out to be corroded connector to box in the street. Not a wire per-se but close.
If I had a nickel for every time someone clobbered some critical connectivity with an ill-advised switch configuration, I wouldn't have to work for a living.
And the physical layer issues I do see are related to ham-fisted people doing unrelated work in the cage.
Actual failures are pretty damn rare.
The ship was 10 years old, not some WW2 hulk.
That's true for almost all electronics. I worked on robotic arms for a few years - if things broke it was always the wiring (well, to be precise - the connectors).
Like you said (and illustrated well in the book), it's never just one thing; these incidents happen when multiple systems interact and often reflect the disinvestment in comprehensive safety schemes.
Shipping, accidents and timeless classics.
I was sure you were going to link to Clarke and Dawe, The Front Fell off.
So much complexity, plenty of redundancy, but not enough adherence to important rules.
I've been in an environment like that.
"Nuisance" issues like that are deferred because they are not really causing a problem, so maintenance spends time on problems with things that make money, rather than what some consider spit and polish on things that have no prior failures.
Tragically, it's the same with modern software development and the growth of technical debt.
All you said is true - but these investigations are often used for the purpose of determining financial liability and often that comes down to figuring out that one, immediate, proximate thing that caused the accident.
A whole bunch of things might have gone wrong, but if only you hadn't done/not-done that one thing, we'd all be fine. So it's all your fault!
Respectfully, have you ever actually read an NTSB report? They're incredibly thorough and consider both causes and contributing factors through a number of lenses with an exclusive focus on preventing accidents from occurring.
Also, they're basically inadmissible in court [49 U.S.C.§1154(b)] so are useless for determining financial liability.
[deleted]
Just insane how much criminal negligence went on. Even Boeing hardly comes close. What needs to change is obviously a major review of how ships are allowed to operate near bridges and other infrastructure. And far stricter safety standards, like the ones aircraft face.
Hopefully the lesson from this will be received by operators: it's way cheaper to invest in personnel, training, and maintenance than to let the shit hit the fan.
From your article - this answered a question I had:
> The settlement does not include any damages for the reconstruction of the Francis Scott Key Bridge. The State of Maryland built, owned, maintained, and operated the bridge, and attorneys on the state’s behalf filed their own claim for those damages. Pursuant to the governing regulation, funds recovered by the State of Maryland for reconstruction of the bridge will be used to reduce the project costs paid for in the first instance by federal tax dollars.
So was the bridge self-insured?
Isn't there a big liability insurance payout on this towards the 5.2 Billion, and if so won't the insurer be more motivated to mandate compliance?
Yes the insurer will likely be able to charge more.
The vessel owner may possibly be able to recover some of that from the manufacturer, as the wiring was almost certainly a manufacturing error, and maybe some of the configurations that continued the blackout were manufacturer choices as well.
At the end of the day we all just pay for it in terms of insurance costs priced into our goods.
What would be a better solution?
Well the current way involves paying for a bunch of non-value producing busy work by insurers, lawyers and a ton of expert parties relevant to the litigation process.
There's probably some combination of "everyone just posts up a bond into a fund to cover this stuff" plus a really high deductible on payout that basically deletes all those expensive man hours without causing any increased incentive for carnage.
Events like these are a VERY rare exception compared to all the shipping activities that go on in an uneventful manner. Doesn't take a genius to do the napkin math here. Whatever the solution is probably ought to try to avoid expending resources in the base case where everything is fine.
Regulations to require work is done correctly the first time. Also inspections.
I like a government that pays workers to look out for my safety.
Informed consumers who actually walk, ever.
A punishment that was felt by decision makers but was unable to be offloaded as a cost to the public, except maybe in the form of rent. Prison :)
But it's important to "punish" (via punitive fines) the right people, so that they will put some effort into not making that mistake again.
Actually, to be even more cynical….
If everyone saved $100M by doing this and it only cost one shipper $100M, then of course everyone else would do it and just hope they aren’t the one who has bad enough luck to hit the bridge.
And statistically, almost all of them will be okay!
This is the calculus that shows why our current civilization is unlikely to pass the filter.
Making the calculus apparent is why we might have a chance.
Because then anyone who owns a bridge/needs to pay for said bridge damage goes, ‘well clearly the costs of running into a bridge on the runs-into-bridges-due-to-negligence-group isn’t high enough, so we need to either create more rules and inspections, or increase the penalties, or find a way to stop these folks from breaking our bridges, or the like - and actually enforce them’.
It’s why airplanes are so safe to fly on, despite all the same financial incentives. If you don’t comply with regulators, you’ll be fined all to hell or flat out forbidden from doing business. And that is enforced.
And the regulators take it all very seriously.
Ships are mostly given a free pass (except passenger liners, ferries, and hazmat carrying ships) because the typical situation if the owner screws up is ‘loses their asset and the assets of anyone who trusted them’, which is a more socially acceptable self correcting problem than ‘kills hundreds of innocent people who were voters and will have families crying, gnashing their teeth, and pointing fingers on live TV about all this’.
I imagine every vessel has its own corporation that owns it which would declare insolvency if this kind of thing happens
That seems like a legal issue. Liability should flow upwards to the owners.
Harbor authorities might ban such uninsured ships from their jurisdictions.
It’s not, though. These situations are extremely rare. When they happen, they just close the company and shed liability.
Yup, nobody wants to admit that regulations and inspections are a reasonable solution
[dead]
Although I was never named to a mishap board, my experience in my prior career in aviation is that the proper way to look at things like this is that while it is valuable to identify and try to fix the ultimate root cause of the mishap, it's also important to keep in mind what we called the "Swiss cheese model."
Basically, the line of causation of the mishap has to pass through a metaphorical block of Swiss cheese, and a mishap only occurs if all the holes in the cheese line up. Otherwise, something happens (planned or otherwise) that allows you to dodge the bullet this time.
Meaning a) it's important to identify places where firebreaks and redundancies can be put in place to guard against failures further upstream, and b) it's important to recognize times when you had a near-miss, and still fix those root causes as well.
Which is why the "retrospectives are useless" crowd spins me up so badly.
> it's important to recognize times when you had a near-miss, and still fix those root causes as well.
I mentioned this principle to the traffic engineer when someone almost crashed into me because of a large sign that blocked their view. The engineer looked into it and said the sight lines were within spec, but just barely, so they weren't going to do anything about it. Technically the person who almost hit me could have pulled up to where they had a good view, and looked both ways as they were supposed to, but that is relying on one layer of the cheese to fix a hole in another, to use your analogy.
Likewise with decorative hedges and other gardenwork; your post brought to mind this one hotel I stay at regularly where a hedge is high enough and close enough to the exit that you have to nearly pull into the street to see if there are oncoming cars. I've mentioned to the FD that it's gonna get someone hurt one day, yet they've done nothing about it for years now.
Send certified letters to the owner of the hedge and whatever government agency would enforce rules about road visibility. That puts them "on notice" legally, so that they can be held accountable for not enforcing their rules or taking precautions.
The problem is that they are legally doing nothing wrong. Everything is done according to the rules, so they can't be held accountable for not following them. After all, they are taking all reasonable precautions, what more could be expected of them?
The fact that the situation on the ground isn't safe in practice is irrelevant to the law. Legally the hedge's owner is doing everything right, so the blame falls on the driver. At best a "tragic accident" will result in a "recommendation" to whatever board is responsible for the rules to review them.
All that applies for criminal cases, but if a civil lawsuit is started and evidence is presented to the jury that the parties being sued had been warned repeatedly that it would eventually occur, it can be quite spicy.
Which is why if you want to be a bastard, you send it to the owners, the city, and both their insurance agencies.
This is stupid. Unless you happen to be the one that crashes it won't be a factor at all.
Discovery’s a bitch which is why they settle.
Well, it could be; you can watch out for accidents at that intersection and offer to support a case arising from one.
If your goal is to get the intersection fixed, this is a reasonable thing to do.
You think it's reasonable to have 24/7 surveillance and then case support to get a hedge trimmed?
@Bombcar is correct. Once they've been legally notified of the potential issue, they have increased exposure to civil liability. Their lawyers and insurance company will strongly encourage them to just fix it (assuming it's not a huge cost to trim back the stupid hedge). A registered letter can create enough impetus to overcome organizational inertia. I've seen it happen.
In my experience (European country) even email with magic words "clear risk to health and life" can jumpstart the process.
People love to rag on Software Engineers for not being "real" engineers, whatever that means, but American "Traffic Engineers" are by far the bigger joke of a profession. No interest in defense in depth, safety, or tradeoffs. Only "maximize vehicular traffic flow speed."
In this case, being a "traffic engineer" with the ability to sign engineering plans means graduating from an ABET-accredited engineering program, passing both the Fundamentals of Engineering exam and the Principles & Practice of Engineering exam, being licensed as a professional engineer, and passing the Professional Traffic Operations Engineer exam. I think they do a little more than "maximize vehicular traffic flow."
Certifications prove that you studied, and are smart and/or diligent enough to pass an exam.
If those certifications teach you bad approaches, then they don't help competence. In fact, they can get people stuck in bad approaches, because it's what they have been taught by the rigorous and unquestionable system. Especially when your job security comes from having those certifications, it becomes harder to say that the certifications teach the wrong things.
It seems quite likely from the outside that this is what happened to US traffic engineering. Specifically that they focus on making it safe to drive fast and with the extra point that safe only means safe for drivers.
This isn't just based on judging their design outcomes to be bad. It's also in the data comparing the US to other countries. This is visible in vehicle deaths per capita, but mostly in pedestrian deaths per capita. Correcting for miles driven makes the vehicle deaths in the US merely high, but correcting for miles walked (data that is not available) likely pushes pedestrian deaths much higher. Which illustrates that a big part of the safety problem is prioritizing driving instead of encouraging and protecting other modes of transport. (And then still doing below average on driving safety.)
> I think they do a little more than "maximize vehicular traffic flow."
You would be mistaken. Traffic engineers are responsible for far, far more deaths than software engineers.
To be fair, there is no way to fix this in the general case—large vehicles and other objects may obstruct your view also. Therefore, you have to learn to be cognisant of line-of-sight blockers and to deal with them anyway. So for a not-terrible driver, the only problem that this presents is that they have to slow down. Not ideal, but not a safety issue per se.
That we allow terrible drivers to drive is another matter...
> there is no way to fix this in the general case—large vehicles and other objects may obstruct your view also
Vehicles are generally temporary. It is actually possible to ensure decent visibility at almost all junctions, as I found when I moved to my current country - it just takes a certain level of effort.
> Which is why the "retrospectives are useless" crowd spins me up so badly.
When I see complaints about retrospectives from software devs they're usually about agile or scrum retrospective meetings, which have evolved to be performative routines. They're done every sprint (or week, if you're unlucky) and even if nothing happens the whole team might have to sit for an hour and come up with things to say to fill the air.
In software, the analysis following a mishap is usually called a post-mortem. I haven't seen many complaints that those have no value; those are usually highly appreciated. Though sometimes the "blameless post-mortem" people take the term a little too literally and try to avoid exploring useful failures if they might cause uncomfortable conversations about individuals making mistakes or even dropping the ball.
Post mortems are absolutely key in creating process improvements. If you think about an organization's most effective processes, they are likely just representations of years of fixed errors.
Regarding blamelessness, I think it was W. Edwards Deming who emphasized the importance of blaming process over people, which is always preferable, but it's critical for individuals to at least be aware of their role in the problem.
Agree. I am obligated to run those retrospectives and the SNR is very poor.
It is nice though (as long as there isn't anyone in there that the team is afraid to be honest in front of), when people can vent about something that has been pissing them off, so that I as their manager know how they feel. But that happens only about 15-20% of the time. The rest is meaningless tripe like "Glad Project X is done" and "$TECHNOLOGY sucks" and "Good job to Bob and Susan for resolving the issue with the Acme account"
>When I see complaints about retrospectives from software devs they're usually about agile or scrum retrospective meetings, which have evolved to be performative routines.
You mean to tell me that this comment section where we spew buzzwords and reference the same tropes we do for every "disaster" isn't performative.
this is essentially the gist of https://how.complexsystems.fail which has been circulating more with discussions of the recent AWS/Azure/Cloudflare outages.
As I said elsewhere, the upshot is that you need to know which holes the bullet went through so you can fix them. Accidents like this happen when someone does not (care to) know the state of the system.
> Swiss cheese model
I always thought that before the "Swiss cheese model" was introduced in the 1990s, the term "Swiss cheese" was used to mean something that had oodles of security holes (flaws).
Perhaps I find the metaphor weird because pre-sliced cheese was introduced later in my life (processed slices were in my childhood, but not packets of pre-sliced cheese which is much more recent).
>Which is why the "retrospectives are useless" crowd spins me up so badly.
As Ops person, I've said that before when talking about software and it's mainly because most companies will refuse to listen to the lessons inside of them so why am I wasting time doing this?
To put it in aviation terms, I'll write up something like (numbers made up): "Hey, V1 for a Hornet loaded at 49,000 pounds needs to be 160 knots, so it needs 10,000 feet for takeoff." Well, the sales team comes back and says NAS Norfolk is only 8,700 ft and the customer demands 49,000+ loads, we are not losing revenue, so quiet, Ops nerd!
Then the 49,000+ Hornet loses an engine, overruns the runway, the fireball I said would happen happens, and everyone is SHOCKED, SHOCKED I TELL YOU, that this is happening.
Except it's software and not aircraft, and the loss was just some money, maybe, so no one really cares.
> All the holes in the cheese line up...
I absolutely heard that in Hoover's voice.
Is there an equivalent to YouTube's Pilot Debrief or other similar channels but for ships?
> Basically, the line of causation of the mishap has to pass through a metaphorical block of Swiss cheese, and a mishap only occurs if all the holes in the cheese line up.
The metaphor relies on you mixing and matching some different batches of presliced Swiss cheese. In a single block, the holes in the cheese are guaranteed to line up, because they are two-dimensional cross sections of three-dimensional gas bubbles. The odds of a hole in one slice of Swiss cheese lining up with another hole in the following slice are very similar to the odds of one step in a staircase being followed by another step.
No, it's a metaphor.
The three-dimensional gas bubbles aren't connected. An attacker has to punch through the thin walls to cross between the bubbles or wear and tear has to erode the walls over time. This doesn't fundamentally change anything.
And there's the archetypal comment on technology-based social media that is simultaneously technically correct and utterly irrelevant to the topic at hand.
Actually the pedantry is meaningful!
You cannot create a swiss cheese safety model with correlated errors, same as how the metaphor fails if the slices all come from the same block of swiss cheese!
You have to ensure your holes come from different processes and systems! You have to ensure your swiss cheese holes come from different blocks of cheese!
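A toy simulation of exactly that point (all numbers invented): independent layers multiply their miss rates, while layers that share one underlying weakness are only as good as a single layer.

    import random

    LAYERS, P_MISS, TRIALS = 4, 0.10, 100_000

    def slip_rate(correlated):
        slips = 0
        for _ in range(TRIALS):
            if correlated:
                # One shared weakness decides every layer at once.
                slipped = random.random() < P_MISS
            else:
                # Independent layers: all of them must miss for the fault to get through.
                slipped = all(random.random() < P_MISS for _ in range(LAYERS))
            slips += slipped
        return slips / TRIALS

    print(slip_rate(correlated=False))  # ~0.0001 (0.1 ** 4)
    print(slip_rate(correlated=True))   # ~0.10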
Note that "Don't make mistakes" is no more actionable for maintenance of a huge cargo ship than for your 10MLoC software project. A successful safety strategy must assume there will be mistakes and deliver safe outcomes nevertheless.
Obviously this is the standard line in any disaster prevention, and it makes sense 99% of the time. But what's the standard line about where this whole protocols-to-catch-mistakes thing bottoms out? Obviously people executing the protocol can make mistakes, or fall victim to normalization of deviance. The same is true for the next level of safety protocol you layer on top of that. At some level, the only answer really is just "don't make mistakes", right? And you're mostly trying to make sure you can do that at a level where it's easier not to make mistakes, like simpler decisions not under time pressure.
Am I missing something? I feel like one of us is crazy when people are talking about improving process instead of assigning blame without addressing the base case.
Normalization of deviance doesn't happen through people "making mistakes", at least not in the conventional sense. It's a deliberate choice, usually a response to bad incentives, or sometimes even a reasonable tradeoff.
I mean, ultimately establishing a good process requires making good choices and not making bad ones, sure. But the kind of bad decisions that you have to avoid are not really "mistakes" in the same way that, like, switching on the wrong generator is a mistake.
Quite, normalization is another failure mode, besides simple mistakes, that process has to account for.
It kind of is though. There's a lot less opportunity for failures at the limit and unforeseen scale. Mechanical things also mostly don't keel over or go haywire with no warning.
Only tangentially related but the debate over whether the Francis Scott Key bridge is or was a bridge got so heated on Wikipedia that the page had to be protected, and I finally have a reason for bringing this up
Edit wars aside, it's a nice philosophical question.
> The seven highway workers and inspector on the Key Bridge at the time were not notified of the Dali’s emergency situation before the bridge collapsed. We found that, had they been notified about the same time the MDTA Police officers were told to block vehicular traffic, the highway workers may have had sufficient time to drive to a portion of the bridge that did not collapse. Further, we found that effective and immediate communication to evacuate the bridge during an emergency is critical to ensuring the safety of bridge workers.
That was super helpful. I was assuming from skimming the text description that it was a failed crimp
A lot of people wildly under-crimp things, but marine vessels not only have nuanced wire requirements, but more stringent crimping requirements that the field at large frustratingly refuses to adhere to despite ABYC and other codes insisting on it
> A lot of people wildly under-crimp things
The good tools will crimp to the proper pressure and make it obvious when it has happened.
Unfortunately the good tools aren't cheap. Even when they are used, some techs will substitute their own ideas of how a crimp should be made when nobody is watching them.
While the US is still very manual at panel building, Europe is not.
So outside of waiting time, I can go from eplan to "send me precrimped and labeled wires that were cut, crimped, and labeled by machine and automatically tested to spec" because this now exists as a service accessible even to random folks.
It is not even expensive.
Can you give an examples of companies that offer this service?
This attitude wherein one thinks they can just spend money and offload responsibility is exactly the problem.
Abdicating responsibility to those "good tools" is why shit never gets crimped right. People just crimp away without a care in the world. Don't get me wrong, they're great for speed and when all you're doing is working on brand new stuff that fits perfectly. But when you're working on something sketchy you really want the feedback of the older styles of tool that have more direct feedback. They have a place, but you have to know what that place is.
See also: "the low level alarm would go off if it was empty"
The big problem was that they didn't have the actual fuel pumps running but were using a different pump that was never intended to fulfill this role. And this pump stays off if the power fails for any reason.
The bad contact with the wire was just the trigger, that should have been recoverable had the regular fuel pumps been running.
We should have federal legislation requiring tugboat assist adequate to recover from complete loss of power and steering, through shipping channels that go under bridges supported by mid span support columns. The mechanism should be that if the Coast Guard catches you without a tug, the ship is permanently banned from the port under threat of seizure and repossession by the US federal government, or your vessel just gets immediately seized and held in port under bond.
Insurance providers insuring ships in US waters should also be required to permanently deny insurance coverage to vessels found to be out of compliance, though I doubt the insurance companies would want to play ball.
In a well engineered control system, any single failure will not result in a loss of control over the system.
Was a FMECA (Failure Mode, Effects, and Criticality Analysis) performed on the design prior to implementation in order to find the single points of failure, and identify and mitigate their system level effects?
Evidence at hand suggests "No."
"Catastrophe requires multiple failures – single point failures are not enough.
The array of defenses works. System operations are generally successful. Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure. Put another way, there are many more failure opportunities than overt system accidents. Most initial failure trajectories are blocked by designed system safety components. Trajectories that reach the operational level are mostly blocked, usually by practitioners."
> In a well engineered control system, any single failure will not result in a loss of control over the system
That's true in this case, as well. There was a long cascade of failures including an automatic switchover that had been disabled and set to manual mode.
The headlines about a loose wire are the media's way of reducing it to an understandable headline.
Most cargo ships have a single main engine with plenty of backup-less failure points. They are sort of engineered so these failures can't happen suddenly, but you can help yourself to a bunch of videos on how substandard fuel and parts shortages cause week-long power-offs in the middle of the ocean.
System designers and regulators are aware that the main engine is a single point of failure, but they generally consider loss of main engine power to not be an immediate emergency. There are redundant systems to retain electrical and hydraulic power, and losing motive power isn't generally an instant emergency. Power and steering together is an emergency, yes, and steering is degraded without power, but had they still been able to use the rudder they wouldn't have hit the bridge.
Steering without power at 8 knots would be pretty inefficient (and was - they tried to steer as the power came back). Loss of power in ports, narrow straits etc is recognized as a major issue which is why an engineer and ETO must be in the engine control room during such passages.
A label placed half an inch wrong, creating a misleading affordance -> 200,000 ton bridge collapse, 6 deaths, tens of billions of dollars of economic damage
Instant classic destined for the engineering-disasters-drilled-into-1st-year-engineers canon (or are the other swiss cheese holes too confounding)
I can't believe I've never seen this. I literally laughed out loud when I got to the image. Thank you! Absolute gold
I love this one.
Someone out there spent ages trying to work this out.
Fucking hell.
I guess this will still be below Therac-25 for CS and CE students, but above it for EE, ME, and Civil Engineering.
[deleted]
It’s been noted that automatic failover systems did not kick in due to shortcuts being taken by the company: https://youtu.be/znWl_TuUPp0
If anyone was curious what is happening with the replacement, I just found this website: https://keybridgerebuild.com/
When shipowners are willing to cut costs with sketchy moves like registering with a random landlocked African country, why should we believe they'll spend any time or effort reading/implementing NTSB guidelines? It isn't like there's some well-respected international body like ICAO calling the shots
I know a little about planes and nothing about ships so maybe this is crazy but it seems to me that if you're moving something that large there should be redundant systems for steering the thing.
There are.[1] Unfortunately they take longer to deploy than the time the crew had.
Shipping is a low-margin business. That business structure does not incentivize paying for careful analysis of failure modes.
Seems to me the only effective and enforceable redundancy that can be easily be imposed by regulation would be mandatory tug boats.
>Seems to me the only effective and enforceable redundancy that can be easily be imposed by regulation would be mandatory tug boats.
The way it worked in Sydney harbour 20+ years ago when I briefly worked on the wharves/tugs was that the big ships had to have both local tugs and a local pilot who would come aboard and run the ship. Which seemed to me to be quite an expensive operation, but I honestly can't recall any big nautical disasters in the harbour so I guess it works.
> mandatory tug boats
Which there are in some places. Where I grew up I'd watch the ships sail into and out of the oil and gas terminals, always accompanied by tugs. More than one in case there's a tug failure.
I was very confused by the word "contact" in the headline, which apparently means "crashed the fuck into and killed six people"
[deleted]
Non-redundant fuel pump that doesn't even restart on power failure. Main engine shutting off when water pressure drops, backup generator not even starting in time AND shoddy wiring that offlines the whole steering system. That's what I call GOATED engineering. Props to Hyundai HI
> Non-redundant fuel pump that doesn't even restart on power failure
The crew weren't using the redundant fuel pumps. They were using the non-redundant fuel line flushing pump as a generator fuel pump, a task it was never designed for and which was not compliant.
That it doesn't restart on restoration of power is by design; you don't want to start flushing your fuel lines when the power returns because this could kill your generators and cause another blackout.
> Main engine shutting off when water pressure drops
Yeah, this is quite bad. There ought to be an override one can activate in an emergency in order to run the engines to the point of overheating, under the assumption that even destroying the engine will cause less catastrophic consequences than not having propulsion at the time.
> backup generator not even starting in time
There were 5 generators on board. Generators 1 through 4 are the main generators on the HV bus side, and the emergency backup generator is on the LV bus side.
When the incident occurred, the ship was being powered by generators 3 and 4, which were receiving their fuel via the non-redundant fuel line flushing pump. These generators powered the HV bus, which powered the LV bus via a transformer. The emergency backup generator was not running, so the LV bus was only receiving power from the HV bus via 1 transformer.
The incident tripped the circuit breaker for this transformer, disconnecting the HV bus from the LV bus, resulting in the first LV bus blackout. This resulted in main engine shutdown (coolant pump failure) and an automatic emergency backup generator startup.
There is an alternate (backup) set of circuit breakers and transformer that could have energised the LV bus, but the transformer switches were left in the manual position, so this failover did not happen automatically and immediately. There were no company procedures or regulations which required them to be left in the automatic position.
The LV bus also powered the fuel line flushing pump, so this pump failed. As a result, generators 3 and 4 started to fail (being supplied with fuel by a pump which was no longer operating). The electrical management system automatically commanded the start of generator 2 in response to the failing performance of generators 3 and 4.
Generator 1 and generator 2 were fed by the standard fuel pumps, which were available. One main generator is capable of powering the entire ship, so there was no need to start generator 1 as well; this would have just put more load on the HV bus (by having to run the fuel pump for generator 1 as well).
Instead of the automatic transformer failover (which was unavailable), the crew manually closed the same circuit breaker that had already tripped, 1 minute after the first LV bus blackout.
This restored power to the LV bus via the same transformer that was originally powering it, but did not restart the fuel line flushing pump supplying generators 3 and 4 (which were still running, but spinning down because they were being fed fuel via gravity only). This also restored full steering control, but this in itself was inadequate to control the vessel's course without the engine-driven propeller.
The main engine was still offline and takes upwards of half a minute to restart, assuming everyone is in place and ready to do so immediately, which was unlikely.
The emergency backup generator finally started 10 seconds later (25 seconds too late by requirements, 70 seconds after the first LV bus blackout).
Generator 2 had not yet gotten up to speed and connected to the HV bus before generators 3 and 4 disconnected (having exhausted the gravity-fed fuel in the line ahead of the inoperative fuel line flushing pump), resulting in an HV bus blackout and the second LV bus blackout. With only the emergency backup generator running on the LV side, only one-third of steering control was available, but again, this was inadequate without the engine.
3 seconds later, generator 2 connected to the HV bus. 26 seconds later, a crew member manually activated the alternate transformer, restoring power to the LV bus for the second time.
The collision was preventable:
- It is no longer a requirement that the engine automatically shuts down due to a loss of coolant pressure. It was at the time the vessel was constructed, but this was never re-evaluated. If it were, the system may have been tweaked to avoid losing the engine.
- If the transformer switches were left in the automatic position, the LV bus would have switched over to being powered by the second transformer automatically, and the engine coolant pumps and fuel line flushing pump would not have been lost.
- Leaving the emergency backup generator running (instead of in standby configuration) would have kept the LV bus energised after the first transformer tripped, and the engine coolant pumps and fuel line flushing pump would not have been lost.
- If the crew had opted to manually activate the second transformer within about half a minute (twice as fast as they reactivated the first one), and restarted the fuel line flushing pump, a second blackout would have been avoided, and the engine could have been restarted in time to steer away.
This shows the importance of leaving recovery systems armed and regularly training on power transfer procedures. It also illustrates why you shouldn't be running your main generators from a fuel pump which isn't designed for that task. This same pump setup was found on another ship they operated.
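To make the switch-position and pump points concrete, here's a rough Python sketch of that first failover decision. The 2-second automatic transfer, 5-second pump ride-through, 30 seconds of gravity-fed fuel, and 60-second crew response are all my own illustrative assumptions, not figures from the report.

```python
# Rough sketch of the first LV-bus failover decision. All timings below are
# illustrative assumptions, not numbers taken from the NTSB report.

AUTO, MANUAL = "auto", "manual"

def lv_outage_s(mode, crew_response_s=60):
    """LV-bus dead time after the primary transformer breaker trips."""
    return 2 if mode == AUTO else crew_response_s   # 2 s auto transfer assumed

def gens_3_and_4_survive(outage_s, pump_restarted_manually_s=None,
                         pump_ride_through_s=5, gravity_feed_s=30):
    """The flushing pump never restarts by itself: generators 3 and 4 live only
    if the pump rides through a very short dip, or someone restarts it before
    the gravity-fed fuel left in the line runs out."""
    if outage_s <= pump_ride_through_s:
        return True
    if pump_restarted_manually_s is not None:
        return pump_restarted_manually_s < gravity_feed_s
    return False

print("switch in MANUAL:", gens_3_and_4_survive(lv_outage_s(MANUAL)))  # False
print("switch in AUTO:  ", gens_3_and_4_survive(lv_outage_s(AUTO)))    # True
```

The whole difference between the two runs is one switch position plus whether anyone remembers the pump, which is exactly why arming the automatic transfer and drilling the manual procedure both matter.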
So there were two big failures: Electrician not doing work to code; inspector just checking the box during the final inspection.
No.
Lots more :
It's because they were abusing a non-redundant pump to supply fuel to the generators. Which then failed, which ....
From the report:
> The low-voltage bus powered the low-voltage switchboard, which supplied power to vessel lighting and other equipment, including steering gear pumps, the fuel oil flushing pump and the main engine cooling water pumps. We found that the loss of power to the low-voltage bus led to a loss of lighting and machinery (the initial underway blackout), including the main engine cooling water pump and the steering gear pumps, resulting in a loss of propulsion and steering.
...
> The second safety concern was the operation of the flushing pump as a service pump for supplying fuel to online diesel generators. The online diesel generators running before the initial underway blackout (diesel generators 3 and 4) depended on the vessel’s flushing pump for pressurized fuel to keep running. The flushing pump, which relied on the low-voltage switchboard for power, was a pump designed for flushing fuel out of fuel piping for maintenance purposes; however, the pump was being utilized as the pump to supply pressurized fuel to diesel generators 3 and 4. Unlike the supply and booster pumps, which were designed for the purpose of supplying fuel to diesel generators, the flushing pump lacked redundancy. Essentially, there was no secondary pump to take over if the flushing pump turned off or failed. Furthermore, unlike the supply and booster pumps, the flushing pump was not designed to restart automatically after a loss of power. As a result, the flushing pump did not restart after the initial underway blackout and stopped supplying pressurized fuel to the diesel generators 3 and 4, thus causing the second underway blackout (low-voltage and high-voltage).
No, there was a larger failure: whoever designed the control system such that a single loose wire on a single terminal block (!) could take down the entire steering system for a 91,000 ton ship.
They didn't.
If you read the report they were misusing this pump to do fuel supply when it wasn't for that. And it was non redundant when fuel supply pumps are.
It's like someone repurposing a Husky air compressor to power a pneumatic fire-suppression system and then saying the issue is someone tripping over the cord and knocking it out.
There's a 3rd failure: the failure to install/upgrade dolphins that could deflect a modern containership, despite the identified need for such. That proposed project seems cheap in retrospect.
Yes, 100%. Lots of failures across the board here. Especially with large ships and how many different nations they might be registered in, I can't imagine it's easy to have a lot of regulatory oversight into their construction, mechanical inspection or maintenance schedules. I'm curious how modern ports handle this problem, feels like it could cause a ton of issues beyond just catastrophic ones like this one.
The terminal blocks could also have been designed to aid visual inspection.
I predicted 10yr & $20B to replace it and stand by that forecast.
You're an optimist!
Worth noting: The MV Dali is a 1000-foot-long ship, weighing 50% more than a nuclear aircraft carrier, with a total crew of twenty-two.
That's everybody - captain, bridge crew, deck crew, cook, etc.
So - how many of those 22 will be your engineering crew? How many of those engineers would be on duty, when this incident happened? And once things start going wrong, and you're sending engineers off to "check why Pump #83, down on Deck H, shows as off-line" or whatever - how many people do you have left in the big, complex engineering control room - trying to figure out what's wrong and fix it, as multiple systems fail, in the maybe 3 1/2 minutes between the first failure and when collision becomes inevitable?
My rule for a couple decades: any failover procedure that only gets run when there's a failure, will not work.
This is a great example of why “small details” matter. How many times do you think an apprentice has been corrected about this? What percentage of the time does the apprentice say “yeah but it’s just a label”. Lots of things went wrong in this case, but if the person that put the label on that wire did it correctly then this whole catastrophe could have been avoided.
I still hate screw terminal blocks. Spring terminals + ferrules are still the way.
Clear plastic viewing windows on the spring terminals are the way to go. It allows for both instant feedback for the installer, and visual inspection or troubleshooting later by a third party.
The spring terminals should also be designed with a secondary latch for this type of (what should be) rugged installation.
Finally, critical circuits should be designed to detect open connections, and act accordingly. A single hardware<->software design for this could be a module to apply across all such wiring inputs/outputs. This is simple and cheap enough to do these days.
A manual tug test on the physical wire would be advisable when installing, to check that the spring terminal has gripped the conductor when latched.
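On detecting open connections: the usual trick, borrowed from supervised fire-alarm loops, is an end-of-line resistor plus a small sense current, so the controller can tell normal, open, and shorted apart. A hedged sketch with made-up thresholds:

```python
# Supervised-loop sketch: an end-of-line resistor lets software distinguish a
# healthy circuit from an open or shorted wire. All thresholds are made up.

NORMAL_MV_RANGE = (400, 600)   # expected drop across the end-of-line resistor
OPEN_MV_MIN = 900              # near supply rail: no current flowing -> open wire
SHORT_MV_MAX = 100             # near zero: conductor shorted

def classify_loop(sense_mv):
    """Classify one supervised input from its sense voltage in millivolts."""
    if sense_mv >= OPEN_MV_MIN:
        return "OPEN"          # broken conductor or loose terminal
    if sense_mv <= SHORT_MV_MAX:
        return "SHORT"
    lo, hi = NORMAL_MV_RANGE
    return "NORMAL" if lo <= sense_mv <= hi else "DEGRADED"

for reading in (500, 950, 40, 700):
    print(reading, "mV ->", classify_loop(reading))
```

A loose wire in a shroud like the one in this report would have shown up as OPEN (or at least DEGRADED) the moment the circuit was energised, instead of waiting to be discovered underway.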
"Contact" is a weird choice of words.
Yeah, when the word “allision” was right there!
Not really, because that's where that part of the investigation ends.
Pre-contact everything is about the ship and why it hit anything, post-contact everything is about the bridge and why it collapsed. The ship part of the investigation wouldn't look significantly different if the bridge had remained (mostly) intact, or if the ship had run aground inside the harbor instead.
Reminds me of "fetched up" describing what happened to the Exxon Valdez.
Thought the same; the bridge fell along its entire length, so "contact" sounds like a way to undersell it. Such an opportunity to pass on clickbait is interesting in this day and age.
I’m not sure that the NTSB is really in the clickbait business. But yes, contact does seem to really be underselling the event.
Right? Like when I read that I thought we're talking a little paint-swapping.
No, we are not talking a little paint-swapping.
"and WAGO Corporation, the electrical component manufacturer"
Sucks to be any of the YouTuber influencers today telling everyone they should use WAGO connectors in all their walls.
Seriously though, impressive to trace the issue down this closely. I am at best an amateur DIY electrician, but I am always super careful about the quality of each connection.
The WAGO connectors typically used in home wiring have a transparent plastic shell which lets you see whether the wire made it all the way through the spring clip. The ones shown in the NTSB video had an opaque shell around the spring clip.
I think my attempt at humor butthurt a lot of WAGO fans. I used "seriously though" after in my actual... serious comment.
I don't see anything in the report that suggests the connector failed. It sounds like the installer failed. Trust me, they can screw up twist connections too :)
The date for bridge completion was bumped from 2028 to 2030 already. I assume it won't be done until 2038. It is absolutely murdering traffic in the Baltimore area, not having a bridge. I would be super interested in seeing where every single dollar goes for this project, I assume at least 1/3 of it will be skimmed off the top.
The consensus seems to be skimming won’t occur. I’d encourage people to research the corruption of elected officials in the Baltimore area.
The consensus is that your comment is way off-topic.
The older I get, the more I trust people over rules.
Does this comment apply to the current crop of American politicians? (Just curious.)
Well, lack of trust in that case.
That's what I was referring to. The concept that comprehensive laws can substitute for leaders with integrity is ridiculous.
> But you can see a lot of frustration shine through towards the owners who even in light of the preliminary findings had changed absolutely nothing on the rest of their fleet.
Between making money, perceived culpability and risks offloaded to insurance companies why would they?
> The problem is that there are a thousand merchant marine vessels operating right now that are all doing great
Are they tho?
I generally think you have good takes on things, but this comes across like systemic fatalistic excuse making.
> The recommendation to inspect the whole ship with an IR camera had me laughing out loud.
Where did this come from? What about the full recommendations from the NTSB? This comment makes it seem like you are calling into question the whole of the NTSB's findings.
"Don't look for a villain in this story. The villain is the system itself, and it's too powerful to change."
https://en.wikipedia.org/wiki/Francis_Scott_Key_Bridge_colla...
> Between making money, perceived culpability and risks offloaded to insurance companies why would they?
Because it is the right thing to do, and the NTSB thinks so too.
>> The problem is that there are a thousand merchant marine vessels operating right now that are all doing great
> Are they tho?
In the sense that they haven't caused an accident yet, yes. But they are accidents waiting to happen and the owners simply don't care. It usually takes a couple of regulatory interventions for such a message to sink in, what the NTSB is getting at there is that they would expect the owners to respond more seriously to these findings.
>> The recommendation to inspect the whole ship with an IR camera had me laughing out loud.
> Where did this come from?
Page 58 of the report.
And no, obviously I am not calling into question the whole of the NTSB's findings, it is just that that particular one seems to miss a lot of the realities involving these vessels.
> "Don't look for a villain in this story. The villain is the system itself, and it's too powerful to change."
I don't understand your goal with this statement, it wasn't mine so the quotes are not appropriate and besides I don't agree with it.
Loose wires are a fact of life. The amount of theoretical redundancy is sufficient to handle a loose wire, but the level of oversight and the combination of ad-hoc work on these vessels (usually under great time pressure) together are what caused this. And I think the NTSB should have pointed the finger at those responsible for that oversight as well, which is 'MARAD'; however, MARAD does not even rate a mention in the report.
>> "Don't look for a villain in this story. The villain is the system itself, and it's too powerful to change."
> I don't understand your goal with this statement, it wasn't mine so the quotes are not appropriate and besides I don't agree with it.
fwiw, your first comment left me with the exact same impression as it did sitkack.
Oh, there are plenty of villains here. But they're in offices and wearing ties.
And they should be smacked down hard, but that isn't going to happen because then - inevitably - the role of the regulators would come under scrutiny as well. That is the main issue here. The NTSB did a fantastic job - as they always do - at finding the cause; it never ceases to amaze me how good these people are at finding the technical root cause of accidents. But the bureaucratic issues are the real root cause here: an industry that is running on wafer-thin margins with ships that probably should not be out there, risking people's lives for a miserly wage.
Regulators should step in and level the playing field. Yes, that will cause prices of shipping to rise. But if you really want to solve this that is where I think they should start and I am not at all saying that the system is too powerful to change, just that for some reason they seem to refuse to even name it, let alone force it to change.
Fwiw and since you received several comments about it, your first comment did not come off to everyone as making excuses. It was pretty clear you were trying to turn people's attention to the real problem.
There was also no fatalistic tone about the system being too powerful to change. Just clear sharing of observations IMO.
It is not unusual to receive this reaction (being blamed for fatalism and making excuses) from observations like these, I have noticed.
I suspect a lot of people commenting in this thread have never been on one of these ships or have any idea of what the typical state of maintenance is, and how inaccessible the tech compartments are when the vessel is underway. This isn't exactly a server room environment. When vessels are new (in the first five years or so) and under the first owners they are usually tip-top. Then, after the first sale the rot sets in and unless there is a major overhaul you will see a lot of issues like these, usually they do not have such terrible consequences. They tend to last for 25 years or so (barring mishaps) and by then the number of repairs will be in the 100's and the vessel has changed hands a couple of times.
Passenger carrying vessels are better, but even there you can come across some pretty weird stuff.
https://eu.usatoday.com/story/travel/cruises/2025/08/27/msc-...
And that one was only three years old, go figure.
I agree with all of this and everything you've said thus far. I hope my prior comment was not interpreted as some sort of indictment or attack on your motives.
Your original comment comes off like excuse-making, as if nothing can possibly be done.
> > Between making money, perceived culpability and risks offloaded to insurance companies why would they?
> Because it is the right thing to do, and the NTSB thinks so too.
Doing great is much different than, "accidents waiting to happen".
I don't understand the goal of your changing rhetoric.
You can also look at the problem from the perspective of the bridge. Why was it possible that a ship took it down? Motors can fail ...
Yes, but if you think of a ship whose engine fails once underway as an unguided ballistic missile with a mass that is absolutely mind-boggling (the Dali masses 100,000 tonnes), there isn't much that you could build that would stop it. The best suggestion I've seen is to let the ship run aground, but that ignores the situation around the area where the accident happened.
This ship wasn't towed by a tug, it was underway under its own power and in order for the ship to have any control authority at all it needs water flowing over the rudder.
Without that forward speed you're next to helpless, and these things don't exactly turn on a dime. So even if there had been a place where it could have run aground it would never have been able to reach it, because it was still in the water directly in front of the passageway under the bridge.
100,000 tonnes doing 7 km/h is a tremendous amount of kinetic energy.
The exact moment the systems aboard the Dali failed could not have come at a worse time, it had - as far as I'm aware of the whole saga - just made a slight course correction to better line up with the bridge and the helm had not yet been brought back to neutral. After that it was just Newton taking over, without control I don't think there is much that would have stopped it.
This is a good plot of the trajectory of the vessel from the moment it went under way until the moment it impacted the bridge:
https://www.pilotonline.com/wp-content/uploads/2024/03/5HVqi...
You can clearly see the kink in the trajectory a few hundred meters before it hit the bridge.
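For a sense of scale, here's the back-of-the-envelope kinetic energy using the figures quoted in this thread (100,000 tonnes, and either the 7 km/h above or the roughly 8 knots mentioned elsewhere); I don't have the exact displacement and speed at impact from the report:

```python
# Back-of-the-envelope kinetic energy for a ~100,000 tonne vessel.
# Mass and speeds are the figures quoted in this thread, not report values.

mass_kg = 100_000 * 1000          # 100,000 tonnes

for label, speed_ms in (("7 km/h", 7 / 3.6), ("8 knots", 8 * 0.5144)):
    ke_mj = 0.5 * mass_kg * speed_ms ** 2 / 1e6
    print(f"{label}: ~{ke_mj:,.0f} MJ (~{ke_mj / 4.184:,.0f} kg of TNT equivalent)")

# 7 km/h  -> ~190 MJ (~45 kg TNT equivalent)
# 8 knots -> ~850 MJ (~200 kg TNT equivalent)
```

Either way it's an enormous amount of energy to try to absorb with a pier.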
Perhaps, but you can also build redundancy into the bridge.
You can, if you're prepared to pay for it. You could halt shipping while people are working on the bridge. You could make tunnels instead of bridges.
The question is simple: who will pay for it? Apparently we are ok with this kind of risk, if we weren't we would not be doing this at all.
There is a similar thing going on in my country with respect to railway crossings. Every year people die on railway crossings. But it took for a carriage full of toddlers to be hit by a train before the sentiment switched from 'well, they had it coming' to 'hm, maybe we should do something about this'. People don't like to pay for risks they see as small or that they perceive as that they're never going to affect them.
This never was about technology, it always was about financing. Financing for proper regulatory tech oversight (which is vastly understaffed) on the merchant marine fleet, funding for better infrastructure, funding for (mandatory) tug assistance for vessels of this size near sensitive structures, funding for better educated and more capable crew and so on. The loose wire is just a consequence of a whole raft of failures that have nothing to do with a label shroud preventing a wire from making proper contact.
The 'root cause' here isn't really the true root cause, it is just the point at which technology begins and administration ends.
To build "redundancy into the bridge" that would survive such an overwhelming force would be a very expensive endeavor.
Better to spend the effort on fleet education.
It’s not realistically plausible to build bridges that won’t be brought down by that size of ship
This, 100%. I forget the specific numbers but regardless, the kinetic energy of a thing with that much mass, even moving at a very slow speed, is off the charts. Designing a bridge or protections for a bridge to survive that would at a minimum be cost prohibitive, if even possible with today’s materials and construction technologies.
Doesn't mean that nothing can be done. https://www.ntsb.gov/news/press-releases/Pages/nr20250320.as...
> The NTSB found that the Key Bridge, which collapsed after being struck by the containership Dali on March 26, 2024, was almost 30 times above the acceptable risk threshold for critical or essential bridges, according to guidance established by the American Association of State Highway and Transportation Officials, or AASHTO.
> Over the last year, the NTSB identified 68 bridges that were designed before the AASHTO guidance was established — like the Key Bridge — that do not have a current vulnerability assessment. The recommendations are issued to bridge owners to calculate the annual frequency of collapse for their bridges using AASHTO’s Method II calculation.
Letters to the 30 bridge owners and their responses https://data.ntsb.gov/carol-main-public/sr-details/H-25-003
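For anyone wondering what "calculate the annual frequency of collapse" means in practice: as I understand the AASHTO vessel-collision method, it boils down to multiplying, for each vessel class, the number of transits by a probability of aberrancy, a geometric probability of hitting a pier, and a probability that a hit causes collapse, then summing. A sketch with entirely invented inputs:

```python
# Sketch of an AASHTO-style annual frequency of collapse calculation:
# AF = sum over vessel classes of N * PA * PG * PC (a protection factor can
# also be applied where fenders/dolphins exist). Every number is invented.

vessel_classes = [
    # (name, transits/year, P(aberrancy), P(geometry: hits pier), P(collapse | hit))
    ("small bulkers",    800, 1.0e-4, 0.05, 0.01),
    ("large container",  300, 1.0e-4, 0.10, 0.50),
]

AF = sum(n * pa * pg * pc for _, n, pa, pg, pc in vessel_classes)
print(f"annual frequency of collapse ~ {AF:.2e} per year")
print(f"return period ~ {1 / AF:,.0f} years")
# AASHTO's acceptance criterion for critical/essential bridges is on the
# order of AF <= 1e-4 per year, i.e. a ~10,000-year return period.
```

The NTSB's "almost 30 times above the acceptable risk threshold" is this AF number compared against that criterion, recomputed with today's traffic and ship sizes rather than 1970s assumptions.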
This is essentially the same thing that happened with Fukushima Daiichi. The organization running it failed to respond to new information.
Energy doesn't mean squat without a time component over which it's dissipated.
Stopping a car normally vs crashing a car. Skydiving with a parachute vs skydiving without a parachute.
For something like ship vs bridge you have to account for the crunch factor. USS Iowa going the same speed probably would've hit way harder despite having ~1/3 the tonnage.
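To put rough numbers on the "crunch factor": average force is on the order of energy divided by the distance over which it is absorbed, so the same energy is survivable when spread over a long grounding and catastrophic over a metre or two of crushing pier. Illustrative numbers only:

```python
# Average-force illustration: same energy, very different loads depending on
# how much crush/stopping distance absorbs it. All numbers are illustrative.

energy_j = 8.5e8   # same order of magnitude as the estimate upthread

for scenario, stop_distance_m in (("rigid pier, ~2 m of crush", 2),
                                  ("fender/dolphin system, ~20 m", 20),
                                  ("grounding on a mudbank, ~200 m", 200)):
    avg_force_mn = energy_j / stop_distance_m / 1e6
    print(f"{scenario}: average force ~ {avg_force_mn:,.0f} MN")
```

Which is exactly why dolphins and sacrificial fendering are the practical answer rather than a pier strong enough to take the hit rigidly.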
Nah, we definitely can.
Plan the bridge so any ship big enough to hurt it grounds before it gets that close. Don't put pilings in the channel. It's just money. But it's a lot of money so sometimes it's better to just have shipping not suck.
Alternatively, the Chunnel will almost certainly never get hit with a ship.
> Plan the bridge so any ship big enough to hurt it grounds before it gets that close.
Have a look at the trajectory chart that I posted upthread and tell me how in this particular case you would have arranged that.
Yet another idea: if a ship's motors fail, have a ship ready that can push it in the right direction, in time. You'd probably need 2x the horsepower to make up for lost time, but it's not impossible.
Yes, that's called a tug and in plenty of harbors a vessel of this size would not be permitted to do close quarters maneuvers without the mandatory assistance of one, or in this case more likely two, tug boats of a certain minimum size relative to the size of the vessel.
It's a tangent but I don't understand why the dock workers can unionize and earn livable wages but the crew cannot.
The dock can’t move to a jurisdiction that is less union friendly.
The crew could go on strike and cause a comparable amount of disruption to the supply chain as dock workers.
They do have a union: https://www.seafarers.org/
At least on US flagged vessels.
Sometimes when I see vocal but rather uninformed opposition to the Jones Act, I wonder if it isn't partially an aim at union busting.
IIRC the biggest issue with the Jones Act is the supply of US-built vessels, which are expensive; the current fleet is aging and, outside of defence, there's no real domestic shipbuilding industry anymore. This also means that domestic shipping (especially to populated areas that are not part of the lower 48) can't use anywhere near as much in the way of modern containers. This is anecdotal, but I've heard that people in Puerto Rico and Hawaii routinely order stuff from foreign countries, as even with duties it can be cheaper than ordering it from the mainland.
The act is problematic because it hasn't really been modernized, with the handful of revisions essentially just expanding its scope. The US either has to seriously figure out getting domestic shipbuilding going again (to the point where it can be economical to also export them) or at least whitelist foreign countries (eg South Korea) to allow their ships to be used. But that's unlikely in today's political climate.
The US government used to provide differential subsidies for cargos shipped on US flagged ships. This ensured that US shipping was competitive with the bottom dollar global shippers, at least for some cargos.
This ended under Reagan.
At first lots of people didn't care because Reagan was also doing his 600 ship navy so everyone was busy doing navy work, but after that ended the MM and american shipbuilding entered a death spiral.
Now the only work US flagged vessels can get is supporting the navy, and a tiny sliver of jones act trade. This means there are no economies of scale. If a ship is built, one is built to that class not 10. Orders are highly intermittent and there is no ability to build up a skilled workforce in efficient serial production. On the seagoing side, ships either get run ragged on aggressive schedules (ex: El Faro) or they sit in layup for long stretches rusting away.
If the US wants to fix its merchant marine it needs to provide incentive for increased cargos and increased shipbuilding. As Sal points out, the US is the second-biggest shipowning country in the world. US business like owning ships, they just don't want to fly the American flag because their incentives are towards offshoring.
The incentives are also all over the place. The shipping industry uses a lot of labour from "poor" countries, but on bulk shipping the labour costs are often a rounding error. The main issue is, of course, working conditions. Americans don't want to sit on a freighter for 6 month tours away from their families. The US navy has a hard enough problem doing it for people in their early 20s, and even then that's usually to get access to education funding. People from the Philippines will do it because it is life-changing amounts of money and the alternative is abject poverty.
> Americans don't want to sit on a freighter for 6 month tours away from their families
And yet finding crews was never a problem before differential subsidies ended.
In fact crewing US flagged is harder now because the work is intermittent. If people can't find berths they time out on their licenses and go do something else in a different industry.
> People from the Philippines will do it because it is life-changing amounts of money
The international minimum wage for seafarers is about $700/mo. In comparison wages in the Philippines are between 20k-50k pesos a month or $340-$850. Seafaring is an above-average income job in the Philippines but not "life-changing."
IIUC, the only issue with the Jones Act requiring US-built vessels is that previously the US Navy used to buy US-built vessels and lease them out below cost, and they don't do that anymore. It was never economical to use US-built vessels; we've just stopped subsidizing it.
The US navy did not. The US Treasury used to provide "differential subsidies" to allow US flagged vessels the ability to win cargos in international trade versus non-us flagged vessels with lower operating costs.
Thanks for the summary for those of us who can't watch video right now.
There are so many layers of failures that it makes you wonder how many other operations on those ships are only working because those fallbacks, automatic switchovers, emergency supplies, and backup systems save the day. We only see the results when all of them fail and the failure happens to result in some external problem that means we all notice.
It seems to just be standard "normalization of deviance" to use the language of safety engineering. You have 5 layers of fallbacks, so over time skipping any of the middle layers doesn't really have anything fail. So in time you end up with a true safety factor equal only to the last layer. Then that fails and looking back "everything had to go wrong".
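The "true safety factor equal only to the last layer" point is easy to see with a toy model. This assumes independent layers, which real defenses are not, so treat it as the shape of the problem rather than real numbers:

```python
# Toy model of defense-in-depth erosion. Assumes independent layers with a
# 1-in-10 chance each of failing on demand; real layers are neither
# independent nor that reliable, so read this as shape, not fact.

p_layer_fails = 0.1

def p_accident(active_layers):
    """Probability that every remaining layer fails on the same demand."""
    return p_layer_fails ** active_layers

print(f"all 5 layers honoured:   {p_accident(5):.0e}")   # 1e-05
print(f"3 quietly skipped:       {p_accident(2):.0e}")   # 1e-02
print(f"only the last layer left: {p_accident(1):.0e}")  # 1e-01
```

Nothing looks different day to day as the layers erode; the exponent just quietly shrinks until the last hole lines up.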
As Sidney Dekker (of Understanding Human Error fame) says: Murphy's Law is wrong - everything that can go wrong will go right. The problem arises from the operators all assuming that it will keep going right.
I remember reading somewhere that part of Qantas's safety record came from the fact that at one time they had the highest number of minor issues. In some sense, you want your error detection curve to be smooth: as you get closer to catastrophe, your warnings should get more severe. On this ship, it appeared everything was A-OK till it bonked a bridge.
This is the most pertinent thing to learn from these NTSB crash investigations - it's not what went wrong at the final disaster, but all the things that went wrong that didn't detect that they were down to one layer of defense.
Your car engaging auto brake to prevent a collision shouldn't be a "whew, glad that didn't happen" and more a "oh shit, I need to work on paying attention more."
I had to disable the auto-brake from RCT[1] sensors because of too many false-positives (like 3 a week) in my car.
1: rear-cross-traffic i.e. when backing up and cars are coming from the side.
One of my car's auto-brake sensors triggers when I back out of my driveway. I cannot back out of my driveway with the sensor on.
Yes and having 3 O-rings doesn't mean you can have one frozen solid "just this time"
https://en.wikipedia.org/wiki/Space_Shuttle_Challenger_disas...
Why then does the NTSB point blame so much at the single wiring issue? Shouldn't they have the context to point to the 5 things that went wrong in the Swiss cheese and not pat themselves on the back with having found the almost-irrelevant detail of
> Our investigators routinely accomplish the impossible, and this investigation is no different...Finding this single wire was like hunting for a loose rivet on the Eiffel Tower.
In the software world, if I had an application that failed when a single DNS query failed, I wouldn't be pointing the blame at DNS and conducting a deep dive into why this particular query timed out. I'd be asking why a single failure was capable of taking down the app for hundreds or thousands of other users.
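Same instinct in code: if one failed lookup can take the app down, the fix isn't a deep dive on that query, it's making a single failure non-fatal. A hedged sketch using Python's standard socket module; the retry counts and the last-known-good cache policy are mine, not a recommendation:

```python
import socket
import time

# Last-known-good cache so one failed lookup can't take the whole app down.
# Retry counts and backoff timings are arbitrary illustrations.
_last_known_good = {}

def resolve(host, attempts=3, backoff_s=0.2):
    """Resolve a hostname with retries, falling back to the last good answer."""
    for attempt in range(attempts):
        try:
            addr = socket.gethostbyname(host)
            _last_known_good[host] = addr
            return addr
        except socket.gaierror:
            time.sleep(backoff_s * (2 ** attempt))
    # All retries failed: serve the last address that worked rather than
    # failing the request outright.
    if host in _last_known_good:
        return _last_known_good[host]
    raise RuntimeError(f"no address available for {host}")
```

The loose wire is the failed query; the interesting question is why there was no equivalent of the cache and the retry.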
That seems like a difference between the report and the press release. I'm sure it doesn't help that the current administration likes quick, pat answers.
The YouTube animation they published notes that this also wasn't just one wire - they found many wires on the ship that were terminated and labeled in the same (incorrect) way, which points to an error at the ship builder and potentially a lack of adequate documentation or training materials from the equipment manufacturer, which is why WAGO received mention and notice.
> I'm sure it doesn't help that the current administration likes quick, pat answers.
Oh, the wire was blue?
In all seriousness, listing just the triggering event in the headline isn't that far out of line. Like the Titanic hit an iceberg, but it was also traveling faster than it should have in spite of iceberg warnings, and it did so overloaded and without adequate lifeboats, and it turns out there were design flaws in the hull. But the iceberg still gets first billing.
Interesting, recent podcast on the subject https://99percentinvisible.org/episode/632-the-titanics-best...
I think it reads as too cute by half. The wire was just the one of dozens of problems that happened to come last. It's natural to attribute cause in that way, but it's not really helpful in communicating the purpose of these investigations.
If this represents a change in style and/or substance of these kinds of press releases, my hunch would be that the position was previously hired for technical writers but was most recently filled by PR.
It’s also immediately actionable and other similar ships can investigate their wires
The faulty wire is the root cause. If it hadn't triggered the sequence of events, none of the other things would have happened. And it's kind of a tricky thing to find, so that's an exciting find.
The flushing pump not restarting when power resumed did also cause a blackout in port the day before the incident. But you know, looking into why you always have two blackouts when you have one is something anybody could do; open the main system breaker, let the crew restore it and that flushing pump will likely fail in the same way every time... but figuring out why and how the breaker opened is neat, when it's not something obvious.
Operators always like to just clear the fault and move on; they are under extremely high pressure to make schedule and have little incentive to work safely.
The solution then is observability, to use the computing term; to know the state of every part of the system.
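In this context "observability" mostly means continuously checking that the recovery systems are still in the state you think they're in, not just that the primaries are up. A small sketch of that kind of configuration watchdog; the specific checks are invented examples, not the ship's actual interlocks:

```python
# Configuration watchdog sketch: alarm when a recovery system has quietly
# drifted out of its armed state. The checks are invented examples.

def check_standby_state(state):
    """Return a list of human-readable problems with the standby configuration."""
    problems = []
    if state.get("alt_transformer_mode") != "auto":
        problems.append("alternate transformer left in manual")
    if not state.get("generator_fuel_pumps_redundant", False):
        problems.append("generators fed from a non-redundant pump")
    if not state.get("emergency_generator_auto_start", False):
        problems.append("emergency generator auto-start disabled")
    return problems

snapshot = {
    "alt_transformer_mode": "manual",
    "generator_fuel_pumps_redundant": False,
    "emergency_generator_auto_start": True,
}
for problem in check_standby_state(snapshot):
    print("ALARM:", problem)
```

The point is that the alarm fires while everything still appears to be working, which is precisely when nobody is otherwise looking.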
Oh, it gets even worse!
The NTSB also had some comments on the ship's equivalent of a black box. Turns out it was impossible to download the data while it was still inside the ship, the manufacturer's software was awful and the various agencies had a group chat to share 3rd party software(!), the software exported thousands of separate files, audio tracks were mixed to the point of being nearly unusable, and the black box stopped recording some metrics after power loss "because it wasn't required to" - despite the data still being available.
At least they didn't have anything negative to say about the crew: they reacted timely and adequately - they just didn't stand a chance.
It’s pretty common for black boxes to be load shed during an emergency. Kind of funny how that was allowed for a long time.
"they reacted timely and adequately" and yet: they're indefinitely restricted (detained isn't the right word, but you get it) to Baltimore, while the ship is free to resume service.
One of the things Sal Mercogliano stressed is that the crew (and possibly other crews of the same line) modified systems in order to save time.
Rather than doing the process of purging high-sulphur fuel that can't be used in USA waters, they had it set so that some of the generators were fed from USA-approved fuel, resulting in redundancy & automatic failover being compromised.
It seems probable that the wire failure would not have caused catastrophic overall loss of power if the generators had been in the normal configuration.
Obligatory: https://how.complexsystems.fail/
Also the zeroth failure mode: someone built a bridge that will collapse if any of the many, many large ships that sail beneath it can't steer itself with high precision.
Ships were a lot smaller when the bridge was designed and built.
In 1971 there were ships with almost twice the displacement of the Dali.
They weren't freight ships destined for Baltimore, but it wasn't hard to imagine future freight ship sizes when designing the bridge in the early 1970s.
The London sewer system was designed in the 1850s, when the population was around two million people.
It was so overdesigned that it held up to the 1950s, when the population was over 8 million. It didn't start to become a big problem until the 1990s.
Right? There's an artificial island in that very harbor, which could be rammed by similar ships all day and give nary a fuck. It's called Fort Carroll and it was built in the *1850s*.
Why the bridge piers weren't set into artificial islands, I can't fathom. Sure. Let's build a bridge across a busy port but not make it ship-proof. The bridge was built in the 1970s, had they forgotten how to make artificial islands?
If you design a fort and it actually gets used and turns out to suck that WILL be the end of your career in the military even if it only comes out as sucking 20yr later unless you have an airtight case why it's not your fault. That's just how the .mil works. Heads MUST roll. This is completely the opposite from big company bureaucracy and on a literal different planet than civil government bureaucracy.
The organizations that made the bridge happen were so much more vast and so much higher turnover and subject to way, way, way looser application of consequences than the one that built the fort it would be literally impossible to get them to produce something so unnecessarily robust for the average use case.
This sort of "everything I depend on will just have to not suck because my shit will keel right over if it sucks in the slightest" type engineering is all over the modern world and does work well in a lot of places when you consider lifetime cost. But when it fails bridges fall over and cloudflare (disclaimer, didn't actually read that PM, have no idea what happened) goes down or whatever.
> on a literal different planet than civil government
Unless the military was relocated to Mars (or at least the Moon) during the shutdown, I think the word is "metaphorically" instead of "literal".
Or it was just a different plane ...
[dead]
The fuel pump not automatically restarting on power loss may actually have been an intentional safety feature to prevent scenarios like pumping fuel into a fire in or around the generators. Still part of the Swiss cheese model, of course.
It wasn't. They were feeding generators 3 & 4 with the pump intended for flushing the lines while switching between different fuel types.
The regular fuel pumps were set up to automatically restart, which is why a set of them came online to feed generator 2 (which automatically spun up as 3 & 4 were failing, and wasn't tied to the fuel-line-flushing pump) after the second blackout.
I have found that 99% of all network problems are bad wires.
I remember that the IT guys at my old company used to immediately throw out every ethernet cable and replace them with ones right out of the bag, first thing.
But these ships tend to be houses of cards. They are not taken care of properly, and run on a shoestring budget. Many of them look like floating wrecks.
If I see a RJ45 plug with a broken locking thingie, or bare wires (not just bare copper - any internal wire), I chop the plug off.
If I come across a CATx (solid core) cable being used as a really long patch lead then I lose my shit or perhaps get a backbox and face plate and modules out along with a POST tool.
I don't look after floating fires.
Chopping the plug is a very good idea, everybody should practice that.
I once had a recurring problem with patch cables between workstations and drops going bad, four or five in one area that had never had that failure rate before. Turns out, every time I replaced one somebody else would grab the "perfectly good" patch cable from the trash can beside my desk. God knows why people felt compelled to do that when they already had perfectly good wires, maybe they thought because it was a different colour it would make their PC faster... So, now every time I throw out a cable that I know to be defective, I always pop the ends off. No more "mystery" problems.
I'd be so tempted to find a source for shiny-looking Cat 3 (10Mbit/sec) patch cables, and start seeding my trash can with those...
> RJ45 plug with a broken locking thingie
You can get replacement clips for those for a quick repair.
https://www.amazon.com/Construct-Pro-RJ-45-Repair-Cat5e/dp/B...
Then you kill the visual signal that the cable might have been yanked and could be loose.
I recently had a home network outage. The last thing I tested was the in-wall wiring because I just didn't think that would be the cause. It was. Wiring fails!
I remember a customer support call where the hardware they bought from us wasn't working. The last question I asked was "are you sure that the outlet it's plugged into is working?"
It wasn't.
That's sort of the joke behind this: https://www.youtube.com/watch?v=nn2FB1P_Mn8
Oh yeah, had outages recently. Turned out to be a corroded connector to the box in the street. Not a wire per se, but close.
If I had a nickel for every time someone clobbered some critical connectivity with an ill-advised switch configuration, I wouldn't have to work for a living.
And the physical layer issues I do see are related to ham fisted people doing unrelated work in the cage.
Actual failures are pretty damn rare.
The ship was 10 years old, not some WW2 hulk.
That's true for almost all electronics. I worked on robotic arms for a few years - if things broke it was always the wiring (well, to be precise - the connectors).
Another case study to add to the maritime chapter of this timeless classic: https://www.amazon.com/Normal-Accidents-Living-High-Risk-Tec...
Like you said (and illustrated well in the book), it's never just one thing; these incidents happen when multiple systems interact, and they often reflect the disinvestment in comprehensive safety schemes.
Shipping, accidents and timeless classics.
I was sure you were going to link to Clarke and Dawe, "The Front Fell Off".
https://m.youtube.com/watch?v=3m5qxZm_JqM
I watched Sal's video yesterday, great summary.
So much complexity, plenty of redundancy, but not enough adherence to important rules.
I've been in an environment like that.
"Nuisance" issues like that are deferred because they are not really causing a problem, so maintenance spends time on problems with things that make money, rather than on what some consider spit and polish on things that have no prior failures.
Tragically, it's the same with modern software development and the growth of technical debt.
All you said is true - but these investigations are often used for the purpose of determining financial liability and often that comes down to figuring out that one, immediate, proximate thing that caused the accident.
A whole bunch of things might have gone wrong, but if only you hadn't done/not-done that one thing, we'd all be fine. So it's all your fault!
Respectfully, have you ever actually read an NTSB report? They're incredibly thorough and consider both causes and contributing factors through a number of lenses with an exclusive focus on preventing accidents from occurring.
Also, they're basically inadmissible in court [49 U.S.C.§1154(b)] so are useless for determining financial liability.
Just insane how much criminal negligence went on. Even Boeing hardly comes close. What needs to change is obviously a major review of how ships are allowed to operate near bridges and other infrastructure, and far stricter safety standards like aircraft face.
Hopefully the lesson from this will be received by operators: it's way cheaper to invest in personnel, training, and maintenance than to let the shit hit the fan.
Why? It's cost them $100M (https://www.justice.gov/archives/opa/pr/us-reaches-settlemen...) but rebuilding the bridge is going to be $5.2 billion, so if gundecking all this maintenance for 20+ years has saved more than $100M, they will do it again.
From your article - this answered a question I had:
> The settlement does not include any damages for the reconstruction of the Francis Scott Key Bridge. The State of Maryland built, owned, maintained, and operated the bridge, and attorneys on the state’s behalf filed their own claim for those damages. Pursuant to the governing regulation, funds recovered by the State of Maryland for reconstruction of the bridge will be used to reduce the project costs paid for in the first instance by federal tax dollars.
So was the bridge self-insured?
Isn't there a big liability insurance payout on this towards the 5.2 Billion, and if so won't the insurer be more motivated to mandate compliance?
Yes the insurer will likely be able to charge more.
The vessel owner may possibly be able to recover some of that from the manufacturer, as the wiring was almost certainly a manufacturing error, and maybe some of the configurations that continued the blackout were manufacturer choices as well.
At the end of the day we all just pay for it in terms of insurance costs priced into our goods.
What would be a better solution?
Well, the current way involves paying for a bunch of non-value-producing busywork by insurers, lawyers, and a ton of expert parties relevant to the litigation process.
There's probably some combination of "everyone just posts up a bond into a fund to cover this stuff" plus a really high deductible on payout that basically deletes all those expensive man hours without causing any increased incentive for carnage.
Events like these are a VERY rare exception compared to all the shipping activities that go on in an uneventful manner. Doesn't take a genius to do the napkin math here. Whatever the solution is probably ought to try to avoid expending resources in the base case where everything is fine.
Regulations to require work is done correctly the first time. Also inspections.
I like a government that pays workers to look out for my safety.
Informed consumers who actually walk, ever.
A punishment that was felt by decision makers but was unable to be offloaded as a cost to the public, except maybe in the form of rent. Prison :)
But it's important to "punish" (via punitive fines) the right people, so that they will put some effort into not making that mistake again.
Actually, to be even more cynical….
If everyone saved $100M by doing this and it only cost one shipper $100M, then of course everyone else would do it and just hope they aren’t the one who has bad enough luck to hit the bridge.
And statistically, almost all of them will be okay!
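The cynical expected-value math, with invented numbers, looks something like this; it's also why capping the owner's downside at a settlement changes behaviour far less than the societal cost would suggest:

```python
# Cynical expected-value sketch. Probabilities and costs are invented; the
# point is the shape of the incentive, not the specific numbers.

annual_savings = 5e6          # per ship, per year, from skipped maintenance
p_catastrophe = 1e-4          # assumed probability of a disaster per ship-year
owner_liability = 100e6       # roughly the settlement figure cited upthread
societal_cost = 5.2e9         # bridge rebuild and then some

print(f"owner:   saves ${annual_savings:,.0f}/yr, "
      f"expected liability ${p_catastrophe * owner_liability:,.0f}/yr")
print(f"society: expected externalised cost "
      f"${p_catastrophe * societal_cost:,.0f}/yr for such a ship")
```

With numbers anything like these, corner-cutting is individually rational right up until you're the one who draws the short straw.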
This is the calculus that shows why our current civilization is unlikely to pass the filter.
Making the calculus apparent is why we might have a chance.
Because then anyone who owns a bridge/needs to pay for said bridge damage goes, ‘well clearly the costs of running into a bridge on the runs-into-bridges-due-to-negligence-group isn’t high enough, so we need to either create more rules and inspections, or increase the penalties, or find a way to stop these folks from breaking our bridges, or the like - and actually enforce them’.
It’s why airplanes are so safe to fly on, despite all the same financial incentives. If you don’t comply with regulators, you’ll be fined all to hell or flat out forbidden from doing business. And that is enforced.
And the regulators take it all very seriously.
Ships are mostly given a free pass (except passenger liners, ferries, and hazmat carrying ships) because the typical situation if the owner screws up is ‘loses their asset and the assets of anyone who trusted them’, which is a more socially acceptable self correcting problem than ‘kills hundreds of innocent people who were voters and will have families crying, gnashing their teeth, and pointing fingers on live TV about all this’.
I imagine every vessel has its own corporation that owns it which would declare insolvency if this kind of thing happens
That seems like a legal issue. Liability should flow upwards to the owners.
Harbor authorities might ban such uninsured ships from their jurisdictions.
It's not, though. These situations are extremely rare. When they happen, they just close the company and shed liability.
Yup, nobody wants to admit that regulations and inspections are a reasonable solution
Although I was never named to a mishap board, my experience in my prior career in aviation is that the proper way to look at things like this is that while it is valuable to identify and try to fix the ultimate root cause of the mishap, it's also important to keep in mind what we called the "Swiss cheese model."
Basically, the line of causation of the mishap has to pass through a metaphorical block of Swiss cheese, and a mishap only occurs if all the holes in the cheese line up. Otherwise, something happens (planned or otherwise) that allows you to dodge the bullet this time.
Meaning a) it's important to identify places where firebreaks and redundancies can be put in place to guard against failures further upstream, and b) it's important to recognize times when you had a near-miss, and still fix those root causes as well.
Which is why the "retrospectives are useless" crowd spins me up so badly.
> it's important to recognize times when you had a near-miss, and still fix those root causes as well.
I mentioned this principle to the traffic engineer when someone almost crashed into me because of a large sign that blocked their view. The engineer looked into it and said the sight lines were within spec, but just barely, so they weren't going to do anything about it. Technically the person who almost hit me could have pulled up to where they had a good view, and looked both ways as they were supposed to, but that is relying on one layer of the cheese to fix a hole in another, to use your analogy.
Likewise with decorative hedges and other gardenwork; your post brought to mind this one hotel I stay at regularly where a hedge is high enough and close enough to the exit that you have to nearly pull into the street to see if there are oncoming cars. I've mentioned to the FD that it's gonna get someone hurt one day, yet they've done nothing about it for years now.
Send certified letters to the owner of the hedge and whatever government agency would enforce rules about road visibility. That puts them "on notice" legally, so that they can be held accountable for not enforcing their rules or taking precautions.
The problem is that they are legally doing nothing wrong. Everything is done according to the rules, so they can't be held accountable for not following them. After all, they are taking all reasonable precautions; what more could be expected of them?
The fact that the situation on the ground isn't safe in practice is irrelevant to the law. Legally, the hedge's owner is doing everything right, so the blame falls on the driver. At best a "tragic accident" will result in a "recommendation" to whatever board is responsible for the rules to review them.
All that applies for criminal cases, but if a civil lawsuit is started and evidence is presented to the jury that the parties being sued had been warned repeatedly that it would eventually occur, it can be quite spicy.
Which is why if you want to be a bastard, you send it to the owners, the city, and both their insurance agencies.
This is stupid. Unless you happen to be the one that crashes, it won't be a factor at all.
Discovery’s a bitch which is why they settle.
Well, it could be; you can watch out for accidents at that intersection and offer to support a case arising from one.
If your goal is to get the intersection fixed, this is a reasonable thing to do.
you think it's reasonable to have 24/7 surveillance and then case support to get a hedge trimmed?
@Bombcar is correct. Once they've been legally notified of the potential issue, they have increased exposure to civil liability. Their lawyers and insurance company will strongly encourage them to just fix it (assuming it's not a huge cost to trim back the stupid hedge). A registered letter can create enough impetus to overcome organizational inertia. I've seen it happen.
In my experience (European country) even email with magic words "clear risk to health and life" can jumpstart the process.
People love to rag on Software Engineers for not being "real" engineers, whatever that means, but American "Traffic Engineers" are by far the bigger joke of a profession. No interest in defense in depth, safety, or tradeoffs. Only "maximize vehicular traffic flow speed."
In this case, being a "traffic engineer" with the ability to sign engineering plans means graduating from an ABET-accredited engineering program, passing both the Fundamentals of Engineering exam and the Principles & Practice of Engineering exam, being licensed as a professional engineer, and passing the Professional Traffic Operations Engineer exam. I think they do a little more than "maximize vehicular traffic flow."
Certifications prove that you studied, and are smart and or diligent enough to pass an exam.
If those certifications teach you bad approaches, then they don't help competence. In fact, they can get people stuck in bad approaches, because that's what they have been taught by the rigorous and unquestionable system. Especially when your job security comes from having those certifications, it becomes harder to say that the certifications teach wrong things.
It seems quite likely from the outside that this is what happened to US traffic engineering. Specifically that they focus on making it safe to drive fast and with the extra point that safe only means safe for drivers.
This isn't just based on judging their design outcomes to be bad. It's also in the data comparing the US to other countries. This is visible in vehicle deaths per capita, but mostly in pedestrian deaths per capita. Correcting for miles driven makes the vehicle deaths in the US merely high. But correcting for miles walked (data not available) likely pushes pedestrian deaths much higher. Which illustrates that a big part of the safety problem is prioritizing driving instead of encouraging and protecting other modes of transport. (And then still doing below average on driving safety.)
> I think they do a little more than "maximize vehicular traffic flow."
You would be mistaken. Traffic engineers are responsible for far, far more deaths than software engineers.
To be fair, there is no way to fix this in the general case—large vehicles and other objects may obstruct your view also. Therefore, you have to learn to be cognisant of line-of-sight blockers and to deal with them anyway. So for a not-terrible driver, the only problem that this presents is that they have to slow down. Not ideal, but not a safety issue per se.
That we allow terrible drivers to drive is another matter...
> there is no way to fix this in the general case—large vehicles and other objects may obstruct your view also
Vehicles are generally temporary. It is actually possible to ensure decent visibility at almost all junctions, as I found when I moved to my current country - it just takes a certain level of effort.
> Which is why the "retrospectives are useless" crowd spins me up so badly.
When I see complaints about retrospectives from software devs they're usually about agile or scrum retrospective meetings, which have evolved to be performative routines. They're done every sprint (or week, if you're unlucky) and even if nothing happens the whole team might have to sit for an hour and come up with things to say to fill the air.
In software, the analysis following a mishap is usually called a post-mortem. I haven't seen many complaints that those have no value; they are usually highly appreciated. Though sometimes the "blameless post-mortem" people take the term a little too literally and try to avoid exploring useful failures if they might cause uncomfortable conversations about individuals making mistakes or even dropping the ball.
Post mortems are absolutely key in creating process improvements. If you think about an organization's most effective processes, they are likely just representations of years of fixed errors.
Regarding blamelessness, I think it was W. Edwards Deming who emphasized the importance of blaming process over people, which is always preferable, but it's critical for individuals to at least be aware of their role in the problem.
Agree. I am obligated to run those retrospectives and the SNR is very poor.
It is nice though (as long as there isn't anyone in there that the team is afraid to be honest in front of), when people can vent about something that has been pissing them off, so that I as their manager know how they feel. But that happens only about 15-20% of the time. The rest is meaningless tripe like "Glad Project X is done" and "$TECHNOLOGY sucks" and "Good job to Bob and Susan for resolving the issue with the Acme account"
>When I see complaints about retrospectives from software devs they're usually about agile or scrum retrospective meetings, which have evolved to be performative routines.
You mean to tell me that this comment section, where we spew buzzwords and reference the same tropes we do for every "disaster", isn't performative?
this is essentially the gist of https://how.complexsystems.fail which has been circulating more with discussions of the recent AWS/Azure/Cloudflare outages.
As I said elsewhere, the upshot is that you need to know which holes the bullet went through so you can fix them. Accidents like this happen when someone does not (care to) know the state of the system.
> Swiss cheese model
I always thought that before the "Swiss cheese model" was introduced in the 1990s, the term "Swiss cheese" was used to mean something that had oodles of security holes (flaws).
Perhaps I find the metaphor weird because pre-sliced cheese was introduced later in my life (processed slices were in my childhood, but not packets of pre-sliced cheese which is much more recent).
>Which is why the "retrospectives are useless" crowd spins me up so badly.
As an Ops person, I've said that before when talking about software, and it's mainly because most companies will refuse to listen to the lessons inside of them, so why am I wasting time doing this?
To put it in aviation terms, I'll write up something like (numbers made up): "Hey, V1 for a Hornet loaded at 49000 pounds needs to be 160 knots, so it needs 10000 feet for takeoff." Well, the sales team comes back and says NAS Norfolk is only 8700 ft and the customer demands 49000+ loads, we are not losing revenue, so quiet, Ops nerd!
Then 49000+ Hornet loses an engine, overruns the runway, the fireball I'd said would happen, happens and everyone is SHOCKED, SHOCKED I TELL YOU this is happening.
Except it's software and not aircraft and loss was just some money, maybe, so no one really cares.
> All the holes in the cheese line up...
I absolutely heard that in Hoover's voice.
Is there an equivalent to YouTube's Pilot Debrief or other similar channels but for ships?
https://www.youtube.com/@pilot-debrief
> Basically, the line of causation of the mishap has to pass through a metaphorical block of Swiss cheese, and a mishap only occurs if all the holes in the cheese line up.
The metaphor relies on you mixing and matching some different batches of presliced Swiss cheese. In a single block, the holes in the cheese are guaranteed to line up, because they are two-dimensional cross sections of three-dimensional gas bubbles. The odds of a hole in one slice of Swiss cheese lining up with another hole in the following slice are very similar to the odds of one step in a staircase being followed by another step.
No, it's a metaphor.
The three-dimensional gas bubbles aren't connected. An attacker has to punch through the thin walls to cross between the bubbles or wear and tear has to erode the walls over time. This doesn't fundamentally change anything.
And there's the archetypal comment on technology-based social media that is simultaneously technically correct and utterly irrelevant to the topic at hand.
Actually the pedantry is meaningful!
You cannot create a swiss cheese safety model with correlated errors, same as how the metaphor fails if the slices all come from the same block of swiss cheese!
You have to ensure your holes come from different processes and systems! You have to ensure your swiss cheese holes come from different blocks of cheese!
Note that "Don't make mistakes" is no more actionable for maintenance of a huge cargo ship than for your 10MLoC software project. A successful safety strategy must assume there will be mistakes and deliver safe outcomes nevertheless.
Obviously this is the standard line in any disaster prevention, and it makes sense 99% of the time. But what's the standard line about where this whole protocols-to-catch-mistakes thing bottoms out? Obviously people executing the protocol can make mistakes, or fall victim to normalization of deviance. The same is true for the next level of safety protocol you layer on top of that. At some level, the only answer really is just "don't make mistakes", right? And you're mostly trying to make sure you can do that at a level where it's easier to not make mistakes, like simpler decisions not under time pressure.
Am I missing something? I feel like one of us is crazy when people are talking about improving process instead of assigning blame without addressing the base case.
Normalization of deviance doesn't happen through people "making mistakes", at least not in the conventional sense. It's a deliberate choice, usually a response to bad incentives, or sometimes even a reasonable tradeoff.
I mean ultimately establishing a good process requires make good choices and not making bad ones, sure. But the kind of bad decisions that you have to avoid are not really "mistakes" the same way that, like, switching on the wrong generator is a mistake.
Quite, normalization is another failure mode, besides simple mistakes, that process has to account for.
It kind of is though. There's a lot less opportunity for failures at the limit and unforeseen scale. Mechanical things also mostly don't keel over or go haywire with no warning.
Only tangentially related but the debate over whether the Francis Scott Key bridge is or was a bridge got so heated on Wikipedia that the page had to be protected, and I finally have a reason for bringing this up
Edit wars aside, it's a nice philosophical question.
https://en.wikipedia.org/wiki/Francis_Scott_Key_Bridge_(Balt...
>The seven highway workers and inspector on the Key Bridge at the time were not notified of the Dali’s emergency situation before the bridge collapsed. We found that, had they been notified about the same time the MDTA Police officers were told to block vehicular traffic, the highway workers may have had sufficient time to drive to a portion of the bridge that did not collapse. Further, we found that effective and immediate communication to evacuate the bridge during an emergency is critical to ensuring the safety of bridge workers.
Video explanation: https://www.youtube.com/watch?v=bu7PJoxaMZg
That was super helpful. I was assuming from skimming the text description that it was a failed crimp
A lot of people wildly under-crimp things, but marine vessels not only have nuanced wire requirements, but more stringent crimping requirements that the field at large frustratingly refuses to adhere to despite ABYC and other codes insisting on it
> A lot of people wildly under-crimp things
The good tools will crimp to the proper pressure and make it obvious when it has happened.
Unfortunately the good tools aren't cheap. Even when they are used, some techs will substitute their own ideas of how a crimp should be made when nobody is watching them.
While the US is still very manual at panel building, Europe is not.
So outside of waiting time, I can go from eplan to "send me precrimped and labeled wires that were cut, crimped, and labeled by machine and automatically tested to spec" because this now exists as a service accessible even to random folks.
It is not even expensive.
Can you give an examples of companies that offer this service?
This attitude wherein one thinks they can just spend money and offload responsibility is exactly the problem.
Abdicating responsibility to those "good tools" is why shit never gets crimped right. People just crimp away without a care in the world. Don't get me wrong, they're great for speed and when all you're doing is working on brand new stuff that fits perfectly. But when you're working on something sketchy you really want the older styles of tool that give more direct feedback. They have a place, but you have to know what that place is.
See also: "the low level alarm would go off if it was empty"
Here's the attached report, it has a lot of additional helpful information: https://www.ntsb.gov/investigations/Documents/Board%20Summar...
The big problem was that they didn't have the actual fuel pumps running but were using a different pump that was never intended to fulfill this role. And this pump stays off if the power fails for any reason.
The bad contact with the wire was just the trigger, that should have been recoverable had the regular fuel pumps been running.
We should have federal legislation requiring tugboat assist adequate to recover from complete loss of power and steering, through shipping channels that go under bridges supported by mid span support columns. The mechanism should be that if the Coast Guard catches you without a tug, the ship is permanently banned from the port under threat of seizure and repossession by the US federal government, or your vessel just gets immediately seized and held in port under bond.
Insurance providers insuring ships in US waters should also be required to permanently deny insurance coverage to vessels found to be out of compliance, though I doubt the insurance companies would want to play ball.
In a well engineered control system, any single failure will not result in a loss of control over the system.
Was a FMECA (Failure Mode, Effects, and Criticality Analysis) performed on the design prior to implementation in order to find the single points of failure, and identify and mitigate their system level effects?
Evidence at hand suggests "No."
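For illustration only, here is a rough sketch of what even a minimal single-point-of-failure pass (the spirit of an FMECA) might flag; the component list is paraphrased from the report's findings, and nothing below is the vessel's actual analysis:

```python
# Simplified, illustrative single-point-of-failure pass; not the vessel's real FMECA.
failure_modes = [
    # (component, redundant?, system-level effect if it fails)
    ("Loose wire at terminal block",  False, "Breaker trip -> low-voltage bus blackout"),
    ("In-service transformer",        True,  "LV bus blackout unless auto-switchover is armed"),
    ("Fuel line flushing pump",       False, "Generators 3/4 starve -> HV bus blackout"),
    ("Main engine coolant pump",      False, "Automatic main engine shutdown"),
]

single_points = [name for name, redundant, effect in failure_modes if not redundant]
for component in single_points:
    print("Single point of failure:", component)
# Each item on this list is a candidate for redundancy, automatic restart,
# or at least a monitored alarm -- before it gets a chance to line up with the others.
```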
"Catastrophe requires multiple failures – single point failures are not enough. The array of defenses works. System operations are generally successful. Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure. Put another way, there are many more failure opportunities than overt system accidents. Most initial failure trajectories are blocked by designed system safety components. Trajectories that reach the operational level are mostly blocked, usually by practitioners."
https://how.complexsystems.fail/#3
> In a well engineered control system, any single failure will not result in a loss of control over the system
That's true in this case, as well. There was a long cascade of failures including an automatic switchover that had been disabled and set to manual mode.
The headlines about a loose wire are the media's way of reducing it to an understandable headline.
Most cargo ships have a single main engine with plenty of backup-less failure points. They are sort of engineered so these failures can't happen suddenly, but you can help yourself to a bunch of videos on how substandard fuel and parts shortages cause week-long power losses in the middle of the ocean.
System designers and regulators are aware that the main engine is a single point of failure, but they generally consider loss of main engine power to not be an immediate emergency. There are redundant systems to retain electrical and hydraulic power, and losing motive power isn't generally an instant emergency. Power and steering together is an emergency, yes, and steering is degraded without power, but had they still been able to use the rudder they wouldn't have hit the bridge.
Steering without power at 8 knots would be pretty inefficient (and was - they tried to steer as the power came back). Loss of power in ports, narrow straits etc is recognized as a major issue which is why an engineer and ETO must be in the engine control room during such passages.
A label placed half an inch wrong, creating a misleading affordance -> 200,000-ton bridge collapse, 6 deaths, tens of billions of dollars of economic damage.
Instant classic destined for the engineering-disasters-drilled-into-1st-year-engineers canon (or are the other Swiss cheese holes too confounding?)
Where do you think it would fit on the list?
The image brings to mind the Cisco ethernet boot infographic: https://www.cisco.com/c/en/us/support/docs/field-notices/636...
I can't believe I've never seen this. I literally laughed out loud when I got to the image. Thank you! Absolute gold
I love this one.
Someone out there spent ages trying to work this out.
Fucking hell.
I guess this will still be below Therac-25 for CS and CE students, but above it for EE, ME, and Civil Engineering.
It’s been noted that automatic failover systems did not kick in due to shortcuts being taken by the company: https://youtu.be/znWl_TuUPp0
If anyone was curious what is happening with the replacement, I just found this website: https://keybridgerebuild.com/
When shipowners are willing to cut costs with sketchy moves like registering with a random landlocked African country, why should we believe they'll spend any time or effort reading/implementing NTSB guidelines? It isn't like there's some well-respected international body like ICAO calling the shots.
I know a little about planes and nothing about ships so maybe this is crazy but it seems to me that if you're moving something that large there should be redundant systems for steering the thing.
There are.[1] Unfortunately they take longer to deploy than the time the crew had.
[1] As it happens I open with an anecdote about steering redundancy on ships in this post: https://www.gkogan.co/simple-systems/
Thanks for this comment!
Shipping is a low-margin business. That business structure does not incentivize paying for careful analysis of failure modes.
Seems to me the only effective and enforceable redundancy that can be easily be imposed by regulation would be mandatory tug boats.
>Seems to me the only effective and enforceable redundancy that can be easily be imposed by regulation would be mandatory tug boats.
The way it worked in Sydney harbour 20+ years ago when I briefly worked on the wharves/tugs was that the big ships had to have both local tugs and a local pilot who would come aboard and run the ship. Which seemed to me to be quite an expensive operation, but I honestly can't recall any big nautical disasters in the harbour, so I guess it works.
> mandatory tug boats
Which there are in some places. Where I grew up I'd watch the ships sail into and out of the oil and gas terminals, always accompanied by tugs. More than one in case there's a tug failure.
I was very confused by the word "contact" in the headline, which apparently means "crashed the fuck into and killed six people"
Non-redundant fuel pump that doesn't even restart on power failure. Main engine shutting off when water pressure drops, backup generator not even starting in time AND shoddy wiring that offlines the whole steering system. That's what I call GOATED engineering. Props to Hyundai HI.
> Non-redundant fuel pump that doesn't even restart on power failure
The crew weren't using the redundant fuel pumps. They were using the non-redundant fuel line flushing pump as a generator fuel pump, a task it was never designed for and which was not compliant.
That it doesn't restart on restoration of power is by design; you don't want to start flushing your fuel lines when the power returns because this could kill your generators and cause another blackout.
> Main engine shutting off when water pressure drops
Yeah, this is quite bad. There ought to be an override one can activate in an emergency in order to run the engines to the point of overheating, under the assumption that even destroying the engine will cause less catastrophic consequences than not having propulsion at the time.
> backup generator not even starting in time
There were 5 generators on board. Generators 1 through 4 are the main generators on the HV bus side, and the emergency backup generator is on the LV bus side.
When the incident occurred, the ship was being powered by generators 3 and 4, which were receiving their fuel via the non-redundant fuel line flushing pump. These generators powered the HV bus, which powered the LV bus via a transformer. The emergency backup generator was not running, so the LV bus was only receiving power from the HV bus via 1 transformer.
The incident tripped the circuit breaker for this transformer, disconnecting the HV bus from the LV bus, resulting in the first LV bus blackout. This resulted in main engine shutdown (coolant pump failure) and an automatic emergency backup generator startup.
There is an alternate (backup) set of circuit breakers and transformer that could have energised the LV bus, but the transformer switches were left in the manual position, so this failover did not happen automatically and immediately. There were no company procedures or regulations which required them to be left in the automatic position.
The LV bus also powered the fuel line flushing pump, so this pump failed. As a result, generators 3 and 4 started to fail (being supplied with fuel by a pump which was no longer operating). The electrical management system automatically commanded the start of generator 2 in response to the failing performance of generators 3 and 4.
Generator 1 and generator 2 were fed by the standard fuel pumps, which were available. One main generator is capable of powering the entire ship, so there was no need to start generator 1 as well; this would have just put more load on the HV bus (by having to run the fuel pump for generator 1 as well).
Instead of the automatic transformer failover (which was unavailable), the crew manually closed the same circuit breaker that had already tripped, 1 minute after the first LV bus blackout.
This restored power to the LV bus via the same transformer that was originally powering it, but did not restart the fuel line flushing pump supplying generators 3 and 4 (which were still running, but spinning down because they were being fed fuel via gravity only). This also restored full steering control, but this in itself was inadequate to control the vessel's course without the engine-driven propeller.
The main engine was still offline and takes upwards of half a minute to restart, assuming everyone is in place and ready to do so immediately, which was unlikely.
The emergency backup generator finally started 10 seconds later (25 seconds too late by requirements, 70 seconds after the first LV bus blackout).
Generator 2 had not yet gotten up to speed and connected to the HV bus before generators 3 and 4 disconnected (having exhausted the gravity-fed fuel in the line ahead of the inoperative fuel line flushing pump), resulting in an HV bus blackout and the second LV bus blackout. With only the emergency backup generator running on the LV side, only one-third of steering control was available, but again, this was inadequate without the engine.
3 seconds later, generator 2 connected to the HV bus. 26 seconds later, a crew member manually activated the alternate transformer, restoring power to the LV bus for the second time.
The collision was preventable:
- It is no longer a requirement that the engine automatically shuts down due to a loss of coolant pressure. It was at the time the vessel was constructed, but this was never re-evaluated. If it were, the system may have been tweaked to avoid losing the engine.
- If the transformer switches were left in the automatic position, the LV bus would have switched over to being powered by the second transformer automatically, and the engine coolant pumps and fuel line flushing pump would not have been lost.
- Leaving the emergency backup generator running (instead of in standby configuration) would have kept the LV bus energised after the first transformer tripped, and the engine coolant pumps and fuel line flushing pump would not have been lost.
- If the crew had opted to manually activate the second transformer within about half a minute (twice as fast as they reactivated the first one), and restarted the fuel line flushing pump, a second blackout would have been avoided, and the engine could have been restarted in time to steer away.
This shows the importance of leaving recovery systems armed and regularly training on power transfer procedures. It also illustrates why you shouldn't be running your main generators from a fuel pump which isn't designed for that task. This same pump setup was found on another ship they operated.
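To make the cascade above explicit, here is a minimal dependency sketch in Python; the names and relationships are paraphrased from the walkthrough above, not the ship's real control logic:

```python
# Paraphrased dependency chain; illustrative, not the actual control system.
deps = {
    "steering_pumps":      ["lv_bus"],
    "engine_coolant_pump": ["lv_bus"],
    "main_engine":         ["engine_coolant_pump"],  # auto-shutdown on coolant loss
    "flushing_pump":       ["lv_bus"],               # misused as generator fuel supply
    "generators_3_4":      ["flushing_pump"],        # and the pump doesn't auto-restart
    "hv_bus":              ["generators_3_4"],
    "lv_bus":              ["hv_bus"],               # via the one in-service transformer
}

def affected(failed, deps):
    """Everything that (transitively) depends on a failed component."""
    down = set(failed)
    changed = True
    while changed:
        changed = False
        for comp, needs in deps.items():
            if comp not in down and any(n in down for n in needs):
                down.add(comp)
                changed = True
    return down

# Tripping the in-service transformer breaker de-energises the LV bus...
print(sorted(affected({"lv_bus"}, deps)))
# ...and because the flushing pump feeds generators 3 and 4, the HV bus follows,
# which is the second blackout. An armed auto-switchover or the redundant fuel
# pumps would have broken the loop between lv_bus and hv_bus.
```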
So there were two big failures: Electrician not doing work to code; inspector just checking the box during the final inspection.
No. Lots more: it's because they were abusing a non-redundant pump to supply fuel to the generators. Which then failed, which ....
From the report:
> The low-voltage bus powered the low-voltage switchboard, which supplied power to vessel lighting and other equipment, including steering gear pumps, the fuel oil flushing pump and the main engine cooling water pumps. We found that the loss of power to the low-voltage bus led to a loss of lighting and machinery (the initial underway blackout), including the main engine cooling water pump and the steering gear pumps, resulting in a loss of propulsion and steering.
...
> The second safety concern was the operation of the flushing pump as a service pump for supplying fuel to online diesel generators. The online diesel generators running before the initial underway blackout (diesel generators 3 and 4) depended on the vessel’s flushing pump for pressurized fuel to keep running. The flushing pump, which relied on the low-voltage switchboard for power, was a pump designed for flushing fuel out of fuel piping for maintenance purposes; however, the pump was being utilized as the pump to supply pressurized fuel to diesel generators 3 and 4. Unlike the supply and booster pumps, which were designed for the purpose of supplying fuel to diesel generators, the flushing pump lacked redundancy. Essentially, there was no secondary pump to take over if the flushing pump turned off or failed. Furthermore, unlike the supply and booster pumps, the flushing pump was not designed to restart automatically after a loss of power. As a result, the flushing pump did not restart after the initial underway blackout and stopped supplying pressurized fuel to the diesel generators 3 and 4, thus causing the second underway blackout (low-voltage and high-voltage).
No, there was a larger failure: whoever designed the control system such that a single loose wire on a single terminal block (!) could take down the entire steering system for a 91,000 ton ship.
They didn't.
If you read the report, they were misusing this pump to do fuel supply when it wasn't for that. And it was non-redundant, while the actual fuel supply pumps are.
It's like someone repurposing a Husky air compressor to power a pneumatic fire suppression system and then saying the issue is someone tripping over the cord and knocking it out.
There's a 3rd failure: the failure to install/upgrade dolphins that could deflect a modern containership, despite the identified need for such. That proposed project seems cheap in retrospect.
Yes, 100%. Lots of failures across the board here. Especially with large ships and how many different nations they might be registered in, I can't imagine it's easy to have a lot of regulatory oversight into their construction, mechanical inspection or maintenance schedules. I'm curious how modern ports handle this problem, feels like it could cause a ton of issues beyond just catastrophic ones like this one.
The terminal blocks could also have been designed to aid visual inspection.
I predicted 10yr & $20B to replace it and stand by that forecast.
You're an optimist!
Worth noting: The MV Dali is a 1000-foot-long ship, weighing 50% more than a nuclear aircraft carrier, with a total crew of twenty-two.
That's everybody - captain, bridge crew, deck crew, cook, etc.
So - how many of those 22 will be your engineering crew? How many of those engineers would be on duty, when this incident happened? And once things start going wrong, and you're sending engineers off to "check why Pump #83, down on Deck H, shows as off-line" or whatever - how many people do you have left in the big, complex engineering control room - trying to figure out what's wrong and fix it, as multiple systems fail, in the maybe 3 1/2 minutes between the first failure and when collision becomes inevitable?
My rule for a couple decades: any failover procedure that only gets run when there's a failure, will not work.
This is a great example of why “small details” matter. How many times do you think an apprentice has been corrected about this? What percentage of the time does the apprentice say “yeah but it’s just a label”. Lots of things went wrong in this case, but if the person that put the label on that wire did it correctly then this whole catastrophe could have been avoided.
I still hate screw terminal blocks. Spring terminals + ferrules are still the way.
Clear plastic viewing windows on the spring terminals are the way to go. It allows for both instant feedback for the installer, and visual inspection or troubleshooting later by a third party.
The spring terminals should also be designed to have a secondary latch on this type of (what should be) rugged installation.
Finally, critical circuits should be designed to detect open connections, and act accordingly. A single hardware<->software design for this could be a module to apply across all such wiring inputs/outputs. This is simple and cheap enough to do these days.
A manual tug-test on the physical would be advisable when installing, to check the spring terminal has gripped the conductor when latched.
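A rough sketch of what that open-connection detection could look like, borrowing the supervised-loop approach used in alarm circuits; the resistor value and thresholds are invented for the example:

```python
# Illustrative open-circuit supervision, the way alarm loops are often monitored:
# a known end-of-line resistor makes "open wire" distinguishable from "normal".
# Values and thresholds here are made up for the sketch.
EOL_OHMS = 4700          # end-of-line resistor fitted at the far terminal

def classify(measured_ohms):
    if measured_ohms > 10 * EOL_OHMS:
        return "OPEN"    # loose/broken wire or unseated spring clamp
    if measured_ohms < 0.1 * EOL_OHMS:
        return "SHORT"   # insulation failure or pinched conductor
    return "NORMAL"

for reading in (4700, 1e9, 20):
    print(reading, classify(reading))
# A controller that raises an alarm on OPEN/SHORT, rather than silently reacting
# to the lost signal, might have flagged a loose control wire before it mattered.
```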
"Contact" is a weird choice of words.
Yeah, when the word “allision” was right there!
Not really, because that's where that part of the investigation ends.
Pre-contact everything is about the ship and why it hit anything, post-contact everything is about the bridge and why it collapsed. The ship part of the investigation wouldn't look significantly different if the bridge had remained (mostly) intact, or if the ship had run aground inside the harbor instead.
Reminds me of "fetched up" describing what happened to the Exxon Valdez.
Thought the same, bridge is fallen on its entire length, sounds like a way to undersell it. Such an opportunity to pass on clickbait is interesting in this day and age.
I’m not sure that the NTSB is really in the clickbait business. But yes, contact does seem to really be underselling the event.
Right? Like when I read that I thought we're talking a little paint-swapping.
No, we are not talking a little paint-swapping.
"and WAGO Corporation, the electrical component manufacturer"
Sucks to be any of the YouTuber influencers today telling everyone they should use WAGO connectors in all their walls.
Seriously though, impressive to trace the issue down this closely. I am at best an amateur DIY electrician, but I am always super careful about the quality of each connection.
The WAGO connectors typically used in home wiring have a transparent plastic shell which lets you see whether the wire made it all the way through the spring clip. The ones shown in the NTSB video had an opaque shell around the spring clip.
I think my attempt at humor butthurt a lot of WAGO fans. I used "seriously though" after in my actual... serious comment.
I don't see anything in the report that suggests the connector failed. It sounds like the installer failed. Trust me, they can screw up twist connections too :)
The date for bridge completion was bumped from 2028 to 2030 already. I assume it won't be done until 2038. It is absolutely murdering traffic in the Baltimore area, not having a bridge. I would be super interested in seeing where every single dollar goes for this project, I assume at least 1/3 of it will be skimmed off the top.
The consensus seems to be skimming won’t occur. I’d encourage people to research the corruption of elected officials in the Baltimore area.
The consensus is that your comment is way off-topic.
The older I get, the more I trust people over rules.
Does this comment apply to the current crop of American politicians? (Just curious.)
Well, lack of trust in that case.
That’s what I was referring to. The concept that comprehensive laws can substitute for leaders with integrity is ridiculous.