When we think of failures in information systems, we can be forgiven for assuming they only concern computers. Information systems conjure up images of programmers and computer labs and maybe even a “nerd” or two. To a degree this is true, but an information system is so much more than just a physical computer. As we have seen from our blogs, information systems surround us and concern everybody on a daily basis, whether it is sending a text or checking the bus timetable. A failure could be anything from a laser not hitting its target to a blood monitor sending out an incorrect reading. Once a failure occurs, it is not just a problem for the developers; it is a hindrance for every end-user of the program. This is why it is so important to have a set method for dealing with failure that is effective and quick. By following this chart, we make life easier for the end-user and reduce the number of problems.
As we can clearly see from the chart (provided by http://www.jhberkandassociates.com/systems_failure_analysis.htm), there are multiple stages in dealing with a failure.
1. Firstly, the failure must occur.
2. We need to gather information about the failure we are trying to solve.
3. Once we have gathered this information, it will be easier to define exactly what went wrong. One reason repairs to a system can take so long is simply that what happened is unknown and hard to find.
4. A great way to work towards repairing this fault effectively is to produce a “fault tree analysis”. This involves drawing out a map of the events that could have led to the failure, showing how and when it happened.
5. Depending on the level of failure, the cause may be found and repaired quickly. Other times it could take longer, and further stages may need to be completed. This could mean checking that the program itself is up to standard, checking whether something is going wrong with the hardware rather than just the software, doing a “what’s different?” test, or pre-testing and experimenting on the software.
6. After converging on the root causes of the failure, we need to develop and implement the proper corrective action to make the system work successfully again.
7. Finally, monitoring the system is key to making sure it does not fail again. By monitoring closely, we can tell if our actions were successful or if they created more issues for us and the users. If they were unsuccessful, we clearly need to change our plan for dealing with the failure.
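To make the fault tree idea from step 4 a little more concrete, here is a minimal sketch in Python. It is only an illustration, not the method from the chart: the `Event` class, the `occurs` and `basic_events` functions, and the example tree for the blood monitor are all made up for this post. The top of the tree is the failure we saw, and the branches are the possible causes, joined by "OR" (any one cause is enough) or "AND" (all causes must happen together).

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Event:
    """One node in a (hypothetical) fault tree."""
    name: str
    gate: str = "BASIC"  # "BASIC" (a leaf cause), "OR", or "AND"
    children: List["Event"] = field(default_factory=list)


def occurs(event: Event, observed: Set[str]) -> bool:
    """Return True if this event happens, given the basic causes we observed."""
    if event.gate == "BASIC":
        return event.name in observed
    results = [occurs(child, observed) for child in event.children]
    # An OR gate needs any child to occur; an AND gate needs all of them.
    return any(results) if event.gate == "OR" else all(results)


def basic_events(event: Event) -> List[str]:
    """List every leaf cause in the tree: the things worth investigating."""
    if event.gate == "BASIC":
        return [event.name]
    names: List[str] = []
    for child in event.children:
        names.extend(basic_events(child))
    return names


# Made-up example: why might a blood monitor send out an incorrect reading?
tree = Event("incorrect reading", "OR", [
    Event("sensor fault"),
    Event("software fault", "AND", [
        Event("bad calibration data"),
        Event("no range check"),
    ]),
])
```

Calling `basic_events(tree)` gives us a checklist of causes to gather information about, and `occurs(tree, {"sensor fault"})` tells us whether a given set of observed causes is enough to explain the failure. A real fault tree would normally be drawn as a diagram first; the code is just one way of writing the same map down.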
So there you have it! By completing each stage, we get closer to fixing our issue in the most time-effective and least costly way possible.