Worldwide IT Outage (Crowdstrike update)

Status
Not open for further replies.

Bantamzen

Established Member
Joined
4 Dec 2013
Messages
9,996
Location
Baildon, West Yorkshire
The other thing which has come to the fore is the prevalence of excuses, maybe made by less reputable companies, Ryanair for example, making a point about "because of a failure of third party software, we're unable to ....."

No, you chose to use Windows, in whatever way you did, and if it doesn't work it's your problem and your responsibility. Sort out your issues with third parties in the background, we don't need to know.
I'm not sure what you want businesses to do here? If a third party application, in this case the Crowdstrike software (it wasn't Windows that failed), is bugged, what do you want them to tell you exactly? Businesses choose software based on functionality and cost; they cannot guarantee that there won't be any future issues with something that they have no control over, just as a builder cannot guarantee that their van won't break down on the day you want them to do some work.
 
jfollows

Established Member
Joined
26 Feb 2011
Messages
7,873
Location
Wilmslow
I'm not sure what you want businesses to do here? If a third party application, in this case the Crowdstrike software (it wasn't Windows that failed), is bugged, what do you want them to tell you exactly? Businesses choose software based on functionality and cost; they cannot guarantee that there won't be any future issues with something that they have no control over, just as a builder cannot guarantee that their van won't break down on the day you want them to do some work.
To me, and it's subjective so I'm not disagreeing with you, it's just my interpretation, some companies are too quick to point to others for their failures. Businesses who let their customers down need to be responsive and supportive to their customers, whatever the reason, and not just throw up their hands with a "nothing we can do here, not our fault" kind of attitude, but I sense this too often whether or not it's warranted.
If a builder's van breaks down on their way to me, I'm likely to think that they haven't maintained the van properly, so their approach to this will be important to me. I used to have colleagues who were always late for meetings, for reasons such as "bad traffic on the M6". There's always bad traffic on the M6, they simply never left enough time to get to the meetings on time, the failure was theirs and not someone else's.
Likewise here, the IT implementation was the business's choice, and this time it went wrong, and other businesses making different choices didn't have a problem. So being too quick to point the finger at third parties doesn't fit well with me. I ran a business in which I underpromised and overdelivered all the time and I don't think I was as quick with excuses as some of these were.
 

nlogax

Established Member
Joined
29 May 2011
Messages
5,687
Location
Mostly Glasgow-ish. Mostly.
Businesses will need to reconsider their options and determine if they trust Crowdstrike not to do this again, and if not then consider alternatives.

This isn't Crowdstrike's first global BSOD rodeo. Many CS customers were affected by something similar five years ago if they also happened to be running another vendor's DLP agent at the same time.
 

jfollows

Established Member
Joined
26 Feb 2011
Messages
7,873
Location
Wilmslow
Bloomberg quotes Michael Henry, co-founder and chairman of Plano, Texas-based cybersecurity services firm Accelerynt Inc. saying
“CrowdStrike has done more to disrupt global business than all the ransomware operators combined,” he said. “This is a demonstration of how much risk we’re carrying with this software that we’ve deployed to protect ourselves: If these guys get it wrong, they can take your business down.”
which seems correct, sadly, to me. There is a flaw in something, somewhere, when the supposed solution to a problem ends up being significantly worse than the problem.
The same article ( https://www.bloomberg.com/news/feat...ing-airports-banks-paralyzed?srnd=homepage-uk) says
“It’s time for the industry to grow up and maybe slow down a bit,” said Federico “Fede” Charosky, founder and CEO of Edinburgh-based security services firm Quorum Cyber. “Some developer somewhere made a change and there was no analysis of what impact that change would have. There’s clearly a lack of quality assurance and testing and taking shortcuts in pursuit of speed. What this shows is that we’re delusional in our complete trust in the technologies that are so intrinsic to running everything.”
 

JamesT

Established Member
Joined
25 Feb 2015
Messages
3,532
Businesses will need to reconsider their options and determine if they trust Crowdstrike not to do this again, and if not then consider alternatives.

This isn't Crowdstrike's first global BSOD rodeo. Many CS customers were affected by something similar five years ago if they also happened to be running another vendor's DLP agent at the same time.
And for those on Linux, here’s an example of Crowdstrike managing to ship something that kernel panicked Redhat machines:
 

LYradial

Member
Joined
8 Jun 2024
Messages
184
Location
welsh marches
Am I just naturally suspicious? They were very quick to emphasize this was not a cyber attack, too quick in my view.

How could this seemingly badly tested update have been released? The only explanation I can think of is that the version sent out for automated release was not the one intended.

A typo in the file name? A deliberate act?
 

signed

Established Member
Joined
13 May 2024
Messages
1,447
Location
Paris, France
How could this seemingly badly tested update have been released? The only explanation I can think of is that the version sent out for automated release was not the one intended.
This update came as an emergency fix for the previous borked update that pegged CPU usage at 100%.

A deliberate act?
Extremely unlikely.

I would be more inclined to say they deliberately skipped parts of the testing to get the fix for the 100% CPU issue out quickly, trading safety for deploy speed.

The same happened with the runaway regex bug Cloudflare once shipped through its emergency update channel, which took down much of the internet.
 

Msq71423

Member
Joined
30 Jun 2022
Messages
78
Location
North West
Is it only Crowdstrike themselves saying it wasn't a cyber attack? Has there been any independent verification of this?
 

skyhigh

Established Member
Joined
14 Sep 2014
Messages
6,329
Am I just naturally suspicious? They were very quick to emphasize this was not a cyber attack, too quick in my view.

How could this seemingly badly tested update have been released? The only explanation I can think of is that the version sent out for automated release was not the one intended.

A typo in the file name? A deliberate act?
The thing is with the scale of this incident, every man and their dog will want to know the details. There will almost certainly be legal cases.

If it was to get out that they'd lied about it not being an attack, then that would destroy any remaining confidence in their product - much more so than if they were open about it. They'd have to be stupid to try and cover it up.
 

Bantamzen

Established Member
Joined
4 Dec 2013
Messages
9,996
Location
Baildon, West Yorkshire
To me, and it's subjective so I'm not disagreeing with you, it's just my interpretation, some companies are too quick to point to others for their failures. Businesses who let their customers down need to be responsive and supportive to their customers, whatever the reason, and not just throw up their hands with a "nothing we can do here, not our fault" kind of attitude, but I sense this too often whether or not it's warranted.
If a builder's van breaks down on their way to me, I'm likely to think that they haven't maintained the van properly, so their approach to this will be important to me. I used to have colleagues who were always late for meetings, for reasons such as "bad traffic on the M6". There's always bad traffic on the M6, they simply never left enough time to get to the meetings on time, the failure was theirs and not someone else's.
Likewise here, the IT implementation was the business's choice, and this time it went wrong, and other businesses making different choices didn't have a problem. So being too quick to point the finger at third parties doesn't fit well with me. I ran a business in which I underpromised and overdelivered all the time and I don't think I was as quick with excuses as some of these were.
I'm sorry but I wholeheartedly disagree. I'm not sure what business you ran, but if you rely on a third party, be it for infrastructure, logistics etc, and they fail through errors of their own, then it is they that are responsible. So there is nothing wrong with saying we can't deliver on time because critical systems, in this case computer systems, are having issues as a result. It's not an excuse, it's a fact. This was a piece of security software for networks from a company that previously had a good reputation for delivering, and certainly not for bricking computers. So how can you blame the customers for buying and using said software?

Clearly something went terribly wrong with the testing of the software patch that created this issue. In fact it's pretty much one of the worst ones I've ever known, made worse because in many cases the problem could not be resolved remotely. But this is 100% on CrowdStrike; their devs (and I would not want to be on their team today), to put it bluntly, screwed up. And all that those companies relying on that security software can do is apologise to their customers, explain what's happened, try to resolve their needs, and prepare a whopping multi-billion compensation claim.
 

eoff

Member
Joined
15 Aug 2020
Messages
591
Location
East Lothian
Obviously businesses have to weigh up the tradeoff of using any software against the benefits and those are hard to quantify (what would the impact have been if you did not use such software).
I had an issue in the past when an ESET Nod32 update caused my Windows 7 PC to crash frequently. But ESET support immediately helped me, even getting on a screenshare session so I could show them the diagnostics I had, and they helped me safely do the reinstall to fix the issue. It was also probably a testing problem, as I suspect the problem was caused because I had disabled certain features and the updated code perhaps assumed that was not the case.
 

Bantamzen

Established Member
Joined
4 Dec 2013
Messages
9,996
Location
Baildon, West Yorkshire
Am I just naturally suspicious? They were very quick to emphasize this was not a cyber attack, too quick in my view.

How could this seemingly badly tested update have been released? The only explanation I can think of is that the version sent out for automated release was not the one intended.

A typo in the file name? A deliberate act?
Poor testing could easily be the reason, or perhaps (and this is speculation on my part only) no testing. Maybe this was a small patch that required few changes in the code, so the devs assumed that there was no reason to test. I've seen it in my job, people updating code without testing, then pushing it to production and crashing systems. Needless to say I've had a few sharp and vulgar words with such people!
 

JohnMcL7

Member
Joined
18 Apr 2018
Messages
950
Is it only Crowdstrike themselves saying it wasn't a cyber attack? Has there been any independent verification of this?
I've not seen anything to suggest it's a cyber attack, and analysis of the files suggests it was an error with the file, so when the Falcon software tried to load it, the system crashed. This behaviour is exactly what you don't want with a cyber attack; instead you'd want to sneak the malicious code in so that no-one would notice and it could run quietly in the background, like the recent xzutils backdoor:

 

dastocks

Member
Joined
3 Nov 2021
Messages
218
Location
Hove
I'm not so sure. One of the error screens seen on TV showed HAL_INITIALIZATION_FAILED. I doubt a definition update would need to fiddle with the Hardware Abstraction Layer.
The Windows 11 BSOD I got on the company laptop said something about a device failure. Something has to be wrong at a fairly low level in the OS to be seeing a BSOD in the first place. It's the first one I've seen on Windows 11 and I'm not sure I ever saw one in several years of Windows 10 usage.
 

JamesT

Established Member
Joined
25 Feb 2015
Messages
3,532
The Windows 11 BSOD I got on the company laptop said something about a device failure. Something has to be wrong at a fairly low level in the OS to be seeing a BSOD in the first place. It's the first one I've seen on Windows 11 and I'm not sure I ever saw one in several years of Windows 10 usage.
I think most BSODs tend to be related to dodgy drivers. I'm not sure if the hardware manufacturers have got better at writing drivers, or increasingly hardware can be handled by generic drivers from MS that get better testing.
 

JohnMcL7

Member
Joined
18 Apr 2018
Messages
950
The Windows 11 BSOD I got on the company laptop said something about a device failure. Something has to be wrong at a fairly low level in the OS to be seeing a BSOD in the first place. It's the first one I've seen on Windows 11 and I'm not sure I ever saw one in several years of Windows 10 usage.
You're correct, the software uses kernel mode drivers that are set to run when Windows is booting hence Windows crashing on boot when the kernel mode driver attempts to load the damaged channel update file:


It is normal for security software to run at such a low level, but it means that if they make an error they can damage the entire system. It's not even the first time they've done so, since in the past they've taken down some Linux systems, but it was nowhere near as widespread:

 

davews

Member
Joined
24 Apr 2021
Messages
792
Location
Bracknell
Any updates on the SWR ticket machines? The ones at Martins Heron are in an endless reboot, one showing a Windows boot up screen saying 'please wait'. Ticket office closed, and it seems from Journey Check that many others are closed all day as well. Official SWR advice is to buy online, pick up at the ticket office or use an E-ticket. Or buy on the train. When I passed through just now, several people were trying to buy tickets on their phones and asking others for help. E-tickets of course are no use for out-boundary travel cards or cross London tickets. Wonder how they will manage to reboot these machines in safe mode and browse to delete the offending file.....
 

CaptainHaddock

Established Member
Joined
10 Feb 2011
Messages
2,467
Is it only Crowdstrike themselves saying it wasn't a cyber attack? Has there been any independent verification of this?
As the old "Yes Minister" quote goes, "Never believe anything until it's been officially denied".

If nothing else, it's taught us all a valuable lesson; that as a society we're way too trusting of IT systems and sometimes it's best to question whether technological advances make our lives better and are an improvement on what they're intended to replace. I certainly won't be leaving home without a few £10 notes in my wallet in future!
 

Ediswan

Established Member
Joined
15 Nov 2012
Messages
3,260
Location
Stevenage
Any updates on the SWR ticket machines? The ones at Martins Heron are in an endless reboot, one showing a Windows boot up screen saying 'please wait'. Ticket office closed, and it seems from Journey Check that many others are closed all day as well. Official SWR advice is to buy online, pick up at the ticket office or use an E-ticket. Or buy on the train. When I passed through just now, several people were trying to buy tickets on their phones and asking others for help. E-tickets of course are no use for out-boundary travel cards or cross London tickets. Wonder how they will manage to reboot these machines in safe mode and browse to delete the offending file.....
For something like a ticket machine, it may be simpler to reinstall the full software system.
 

Oxfordblues

Member
Joined
22 Dec 2013
Messages
849
The chances of this happening on the busiest travel day of the year, Friday 19 July, are 365-1. But it happened on 19 July!
 

jfollows

Established Member
Joined
26 Feb 2011
Messages
7,873
Location
Wilmslow
The chances of this happening on the busiest travel day of the year, Friday 19 July, are 365-1. But it happened on 19 July!
Only if the updates are equally likely to be pushed out every day of the year, and the people responsible for them work every day. Less likely on 4 July or 25 December I suggest.
 

The exile

Established Member
Joined
31 Mar 2010
Messages
4,703
Location
Somerset
As the old "Yes Minister" quote goes, "Never believe anything until it's been officially denied".

If nothing else, it's taught us all a valuable lesson; that as a society we're way too trusting of IT systems and sometimes it's best to question whether technological advances make our lives better and are an improvement on what they're intended to replace. I certainly won't be leaving home without a few £10 notes in my wallet in future!
Not only that, we’ve fallen into the trap of believing all those “just in time experts” who have insisted that having something in reserve is almost a sin against the Holy Ghost and their allies who see all “resilience redundancy” as nothing but waste.
 

jfollows

Established Member
Joined
26 Feb 2011
Messages
7,873
Location
Wilmslow
The chances of this happening on the busiest travel day of the year, Friday 19 July, are 365-1. But it happened on 19 July!
Plus it’s 366-1 if I accept your premise, which I don’t! 2024 is a leap year.

(Sorry, but I was a mathematician until I failed my exams at university in 1981)
 

Dai Corner

Established Member
Joined
20 Jul 2015
Messages
6,767
Probably completely unconnected, but the Ticketer machine on the Newport Transport bus I caught earlier wasn't working so passengers got a free ride.

I wish I was still registered with the Agency I signed up with after I retired from the IT trade. I reckon my phone would be ringing and I'd have a few days easy but lucrative work!
 

Strathclyder

Established Member
Joined
12 Jun 2013
Messages
3,436
Location
Clydebank
McAfee has been mentioned a few times in this thread, which is rather funny (not really) as the current Crowdstrike CEO and one of the company founders, George Kurtz, was the CTO (chief technology officer) at the former when it issued a broken update in April 2010 that ended up affecting millions of Windows XP machines; look up DAT 5958 for the details of that particular screw up.

Coincidence, no doubt, that he went on to co-found and was in charge of a company that screwed up in a similar, but far more widespread and serious way. For the record, McAfee was bought by Intel within a year (February 2011) of the DAT 5958 debacle.

 

londonbridge

Established Member
Joined
30 Jun 2010
Messages
1,660
As the Cashless Society thread has been closed I’ll post here that the Mail are, unsurprisingly, using the outage as an excuse to point out the dangers of a cashless society, with an article full of the usual misinformation that the problems were more to do with Microsoft and Windows rather than the Crowdstrike firm.
 

Mcr Warrior

Veteran Member
Joined
8 Jan 2009
Messages
14,602
The chances of this happening on the busiest travel day of the year, Friday 19 July, are 365-1. But it happened on 19 July!

Plus it’s 366-1 if I accept your premise, which I don’t! 2024 is a leap year.

(Sorry, but I was a mathematician until I failed my exams at university in 1981)
Whilst the chance of something happening on a given day in a leap year is very possibly 1 in 366, aren't the correct odds 365-1?
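
To make the distinction concrete, here is a minimal sketch of converting a probability into "odds against". It assumes, purely for illustration, that the event is equally likely on every day of the year (a premise disputed earlier in the thread); the function name `odds_against` is made up for this example.

```python
from fractions import Fraction


def odds_against(probability: Fraction) -> tuple[int, int]:
    """Return the odds against an event as (against, for) in lowest terms."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    # Odds against = chance it does not happen : chance it does happen.
    ratio = (1 - probability) / probability
    return ratio.numerator, ratio.denominator


# Assumption: one equally likely day out of 366 in a leap year such as 2024.
print(odds_against(Fraction(1, 366)))  # (365, 1), i.e. "365-1"
# Same assumption for a non-leap year of 365 days.
print(odds_against(Fraction(1, 365)))  # (364, 1), i.e. "364-1"
```

So a 1 in 366 chance and odds of 365-1 describe the same thing, while 366-1 would correspond to a chance of 1 in 367.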
 

CaptainHaddock

Established Member
Joined
10 Feb 2011
Messages
2,467
As the Cashless Society thread has been closed I’ll post here that the Mail are, unsurprisingly, using the outage as an excuse to point out the dangers of a cashless society, with an article full of the usual misinformation that the problems were more to do with Microsoft and Windows rather than the Crowdstrike firm.
Does it matter which company is to blame if the whole incident demonstrates why a cashless society is a bad idea?
 

WelshBluebird

Established Member
Joined
14 Jan 2010
Messages
5,230
Does it matter which company is to blame if the whole incident demonstrates why a cashless society is a bad idea?
Except, as has already been pointed out to you, it would still cause similar levels of chaos because cash machines and automated tills are also affected.
 

CaptainHaddock

Established Member
Joined
10 Feb 2011
Messages
2,467
Except, as has already been pointed out to you, it would still cause similar levels of chaos because cash machines and automated tills are also affected.
There are other ways of accessing cash. For example, you could walk into your bank's local branch and ask to withdraw cash in person. *

* Which demonstrates the foolishness of banks trying to close down local branches because "Everyone banks online these days".

I doubt many people will be daft enough to do all their banking online after yesterday's events!
 