My area is more navigation software for airplanes, so I can't speak authoritatively, but I can toss out a few vaguely educated ideas.

Metryq wrote:
...if anyone more knowledgeable in computer systems used on space probes could explain this in more detail, I'm curious how this planned-for situation could be allowed to happen in the first place. That is, I realize the computer on a deep space probe has many limitations in power and other system resources, especially when all of it must be designed to function for years. Back to the question: if the probe was designed to default to Safe Mode in this situation, it dumps both the incoming instructions and the job being executed at the time. Why didn't the system give one task priority and, say, bounce back a message, "Sorry, I was busy at the time. Please repeat your message."
Why was such a huge incoming message sent in the first place, unless the probe was behind schedule and mission controllers did not know the computer would be occupied at the time? Considering how precious the computing time must be, I would expect better scheduling of the resources.
Suppose someone, for whatever reason, wanted to foul the mission at the critical moment and took advantage of this "switch to Safe Mode" behavior by DoS attacking the computer?
Or are these stupid questions?
Each processor in a computer can only do one thing at a time. To "do multiple things at once", the processor stores a snapshot of the current task, works on a separate task, and then reloads the previous task at a later point (a vast simplification, but meh). I would have expected something similar to what you suggest: when it is busy compressing data and receives a request for another intensive action, the probe would just say "come back later".
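To make the two behaviours above concrete, here is a toy model (nothing to do with any real probe's flight software). A single processor either snapshots its current task and switches to a light job, or turns away a second heavy job with a "busy, resend later" reply. The `Task`/`Processor` names, the `heavy` flag and the step counts are all made up for illustration, and the "snapshot" is just the task object remembering how far it got, where a real context switch would save registers and stack.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """Hypothetical long-running job (e.g. compressing data) whose progress can be snapshotted."""
    name: str
    total_steps: int
    step: int = 0  # the saved "context": how far the job has got

    def run(self, budget: int) -> bool:
        """Advance up to `budget` steps; return True once the job is finished."""
        self.step = min(self.step + budget, self.total_steps)
        return self.step == self.total_steps


@dataclass
class Processor:
    """One processor: it only ever advances one task at a time."""
    busy_with: Optional[Task] = None
    suspended: List[Task] = field(default_factory=list)

    def submit(self, task: Task, heavy: bool) -> str:
        if self.busy_with is None:
            self.busy_with = task
            return f"accepted {task.name}"
        if heavy:
            # Refuse rather than juggle two resource-hungry jobs at once.
            return f"busy with {self.busy_with.name}, please resend {task.name} later"
        # Light job: park the current task (it keeps its step count) and switch.
        self.suspended.append(self.busy_with)
        self.busy_with = task
        return f"suspended {self.suspended[-1].name}, now running {task.name}"

    def tick(self, budget: int = 10) -> None:
        if self.busy_with is not None and self.busy_with.run(budget):
            # Finished: reload the most recently suspended task, if any.
            self.busy_with = self.suspended.pop() if self.suspended else None


cpu = Processor()
print(cpu.submit(Task("compress", 30), heavy=True))
cpu.tick()
print(cpu.submit(Task("upload", 50), heavy=True))
print(cpu.submit(Task("ping", 5), heavy=False))
```

The catch is that "say come back later" only works if the part of the software doing the refusing is itself still healthy, which leads into the next point.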
One possible reason this isn't happening is that the probe has hardware limitations and must also run fully autonomously. If the probe locks up, you can't send someone out to perform a power reset. If the resources used for compressing data overlap with those used to process commands, then the safe thing may be to just dump everything and reset to a safe condition.
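A common way embedded systems handle "nobody can reach the power switch" is a watchdog timer: healthy software has to check in ("kick" it) regularly, and if it stops doing so the system assumes it is hung and resets itself. I don't know how this particular probe implements it. The sketch below is hypothetical: it counts ticks instead of using real time, and `Probe`, `Watchdog`, `timeout=3` and the "SAFE" mode string are illustrative names. It shows why both the queued commands and the interrupted job get dropped.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Watchdog:
    """Hypothetical watchdog: must be kicked at least every `timeout` ticks."""
    timeout: int
    since_kick: int = 0

    def kick(self) -> None:
        self.since_kick = 0

    def tick(self) -> bool:
        """Advance one tick; return True if the timeout expired (the software looks hung)."""
        self.since_kick += 1
        return self.since_kick > self.timeout


@dataclass
class Probe:
    mode: str = "NOMINAL"
    command_queue: List[str] = field(default_factory=list)
    current_job: str = ""
    watchdog: Watchdog = field(default_factory=lambda: Watchdog(timeout=3))

    def enter_safe_mode(self) -> None:
        # No one can power-cycle the probe, so it resets itself: throw away
        # queued commands and the interrupted job, keep only basic functions.
        self.command_queue.clear()
        self.current_job = ""
        self.mode = "SAFE"
        self.watchdog.kick()

    def step(self, main_loop_alive: bool) -> None:
        if main_loop_alive:
            self.watchdog.kick()  # a healthy main loop proves it is still running
        if self.watchdog.tick():
            self.enter_safe_mode()


probe = Probe(command_queue=["upload"], current_job="compress")
probe.step(main_loop_alive=True)
for _ in range(3):
    probe.step(main_loop_alive=False)  # main loop stuck, e.g. two jobs fighting over a resource
print(probe.mode, probe.command_queue, repr(probe.current_job))
```

The design trade-off is that the reset is blunt on purpose: the watchdog doesn't try to work out which job was at fault, it just gets back to a known state the ground can talk to.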
I suspect this was just a lack of detail in the reporting. I would guess the actual issue was that the software hit a corner case not covered during testing, which led to a conflict between different operations, and the software took the safe route of dropping to a safe state (similar to a Windows blue screen). This kind of thing can happen in aviation software even with the amount of testing we do (in theory we test every branch). Space probe testing is good, but I suspect it is not quite as intense as aviation's.
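One way to picture "a corner case not covered during testing" is a mode table where only the combinations the designers anticipated have an entry, and everything else falls through to a safe state rather than guessing. The modes, event names and table below are hypothetical, not the probe's actual logic, and in this sketch SAFE is sticky: every event maps back to SAFE until some (not modeled) ground command clears it.

```python
from enum import Enum, auto


class Mode(Enum):
    IDLE = auto()
    COMPRESSING = auto()
    RECEIVING = auto()
    SAFE = auto()


# Hypothetical transition table: only the (mode, event) pairs someone planned for.
TRANSITIONS = {
    (Mode.IDLE, "start_compress"): Mode.COMPRESSING,
    (Mode.IDLE, "uplink_start"): Mode.RECEIVING,
    (Mode.COMPRESSING, "compress_done"): Mode.IDLE,
    (Mode.RECEIVING, "uplink_done"): Mode.IDLE,
}


def next_mode(mode: Mode, event: str) -> Mode:
    """Look up the next mode; any combination nobody planned for drops to SAFE."""
    # Blue-screen-style default: stop and wait for the ground instead of improvising.
    return TRANSITIONS.get((mode, event), Mode.SAFE)


print(next_mode(Mode.IDLE, "start_compress"))
print(next_mode(Mode.COMPRESSING, "uplink_start"))  # big uplink arrives mid-compression
```

Keeping the handled cases in one explicit table at least makes them enumerable, so a test can walk every (mode, event) pair. The hard part, in aviation or in space, is knowing which real-world sequences produce the pairs you never listed.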