Is there really such a thing as a single point of failure? Or could it be that putting all your eggs in one basket actually allows you to build simpler and more effective levels of redundancy?
At North the concept of distributing intelligence around buildings is one that we have long advocated. This can be achieved with our integration products. So for example, if the interface to the energy meters fails, then at least the interfaces between the chiller and the AHU will still work. It makes sense to distribute your intelligence, right?
Well, we are convinced that is the right way to go with many applications. However, new embedded PC technology and solid state memory are dramatically increasing the reliability of the PC platform. It would be churlish not to check out what these incredibly powerful and cost-effective devices can offer. Apart, that is, from bucket loads of processing power and acres of memory, all at incredibly low prices. Certainly, a single box solution for all integration and for controlling dumb IO modules out in the field would be a very cost-effective way to run a building. If only it were reliable. And if only the consultant didn’t insist on there being ‘no single point of failure’.
So why not let your IT people build in redundancy? They do it for applications far more crucial than heating and ventilation. They build redundancy into your enterprise itself. Banks are trusted to use a PC when we pay in a cheque; we trust our building society to use PC and server technology to pay our mortgage. But many will still baulk at the idea of controlling the temperature of a building with PC technology. Why? Well, vested interests are at work of course, and understandably so. PCs have a legacy of being unreliable, too. But things are moving on.
Negatives
Let’s look at the downside. You have one box, running all of the control algorithms, PID loops, integration interfaces to talk to VRF systems, connections to energy meters and also the web services and convergence technologies. You have all of this because you are a forward-thinking organisation that would only buy fully interoperable, open building services. But it’s all in one box. A hard disk failure or a fan clogging up with dust could crash the box and you would be left with no control of the building at all.
Can you fix it?
So what happens now? Do you have to wait for the supplier or maintenance company to come and fix it? Well no, it’s just a PC, and you can fix it.
How about this: you take a PC, any PC, your laptop even, and install the software on it. The install configures that machine to take over completely from the failed box. The failed box was just a PC after all. You plug the replacement into the network and you have control again. Of course, in a real-world situation you would have a redundant backup of the box already running. The example is somewhat flippant. But it does demonstrate that commoditisation and consolidation of these technologies has a huge benefit to the customer. One of your techies will replace the hard disk drive in the failed box, or fix the fan, and hey presto, the whole system is back to normal. At the very worst, you can pop down the road and buy a new machine for a few hundred pounds. The redundant machine will go back on standby and the building is working as normal again. Notice that in all of this, no specialist expertise is needed. It’s just a PC; probably two out of ten people you know could fix it.
But it still failed
Of course, the point is that the device did fail. It failed because it is a PC and we know that PCs fail all the time. Well, do they? Think about it. What really causes a PC to fail? Be honest! It’s you, isn’t it! You clicked the wrong button, lost patience with it. Word won’t open quickly enough for you so you try to do something else. In the meantime it has all gone wrong. Take the keyboard and mouse (the user) away from the machine and see how amazingly stable and reliable your PC becomes. Then take away the fan and go for a fanless processor, remove the hard disk drive and install a flash drive. Now we have no moving parts. Run Windows XP Embedded on it, plug it into a low-cost UPS, and my goodness, this thing just doesn’t fall over. Then buy another one (hey, they are cheap after all) and do the same thing. So you have two ultra-reliable boxes, one just sitting there waiting for the other to go wrong so it can take over. If one of them does fail, you can fix it or even replace it for less than the cost of an engineer call-out. Devices out in the field are all dumb. If one of them fails, so what? It cost next to nothing and you have a stack of them in the spares cupboard: set the DIP switch for the address and swap it out. There is no software to worry about, as that’s all in the box.
Distributed Intelligence
A few years ago I was involved in a project that had 500 unitary controllers across a business park. The controllers all had a mean time between failures (MTBF) figure of 10 years. So, using standard MTBF calculations, you can predict that any one controller will fail on average once every 10 years. That doesn’t mean that in 10 years’ time all the controllers will fail at once. It just doesn’t work like that: the failures are spread out over time. As the controllers are not redundant, but are in fact each an integral part of the system as a whole, the MTBF figure for the system has to be divided by the number of components. That is to say, 10 years / 500 controllers gives the average time between failures on that site: 0.02 years. Yes, that’s right, about one a week! There was nothing intrinsically wrong with these controllers. They were working to spec, but the estate manager was replacing on average one controller a week. The controllers were clever, so they needed configuring and re-testing each time one was replaced. At huge expense. So the ownership cost of this distributed intelligence was considerable. All this pain, to get over the ‘no single point of failure’ issue. Talk about taking a sledgehammer to crack a nut.
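The arithmetic behind that one-a-week figure can be sketched in a few lines of Python. This is only an illustration: the function name is made up for the example, and it assumes every controller is needed (no redundancy) and that all controllers share the same constant failure rate.

```python
DAYS_PER_YEAR = 365.25


def site_mtbf_years(unit_mtbf_years: float, unit_count: int) -> float:
    """Average time between failures across a site of identical,
    non-redundant units: the unit MTBF divided by the number of units.
    (Hypothetical helper, assuming a constant failure rate per unit.)"""
    if unit_count < 1:
        raise ValueError("unit_count must be at least 1")
    return unit_mtbf_years / unit_count


# The business park from the text: 500 controllers, 10-year MTBF each.
mtbf_years = site_mtbf_years(10, 500)
days_between_failures = mtbf_years * DAYS_PER_YEAR
failures_per_year = 1 / mtbf_years

print(f"Site MTBF: {mtbf_years:.2f} years "
      f"(about {days_between_failures:.1f} days between failures, "
      f"roughly {failures_per_year:.0f} replacements a year)")
```

Halving the number of controllers, or doubling their individual MTBF, only doubles the gap between call-outs, which is why the count of intelligent devices matters so much to ownership cost.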
Nowadays if I put a server or PC with an MTBF figure of five years on the network, it really does have an MTBF of five years. If there is a redundant server or PC next to it, the building only loses control when both are down at the same time, so the time between outages is longer still. And when one fails I know how to fix it, cheaply and easily. The IO cards in the field may fail at a much higher rate, but they are cheap and easy to replace.
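How much a standby box helps can be estimated with a textbook reliability model. The sketch below is an assumption-laden illustration, not a figure from the project: it treats the two boxes as a hot-standby pair with independent, constant failure rates, and the function name and the one-day repair time are invented for the example. It ignores common-cause failures such as a power cut or a software fault that hits both boxes, so real-world figures will be lower.

```python
from typing import Optional


def pair_mean_time_to_outage_years(unit_mtbf_years: float,
                                   repair_time_years: Optional[float] = None) -> float:
    """Mean time until both boxes of a two-box hot-standby pair are down
    at once, using the standard Markov-model result
    (3 * failure_rate + repair_rate) / (2 * failure_rate ** 2).
    With repair_time_years=None a failed box is never repaired."""
    failure_rate = 1 / unit_mtbf_years
    repair_rate = 0.0 if repair_time_years is None else 1 / repair_time_years
    return (3 * failure_rate + repair_rate) / (2 * failure_rate ** 2)


single_box = 5.0  # MTBF in years, the figure used in the text
never_repaired = pair_mean_time_to_outage_years(single_box)
fixed_in_a_day = pair_mean_time_to_outage_years(single_box, 1 / 365.25)

print(f"Single box:              {single_box:.1f} years")
print(f"Pair, never repaired:    {never_repaired:.1f} years")
print(f"Pair, fixed within a day: {fixed_in_a_day:.0f} years (idealised)")
```

The point of the comparison is that the gain comes mostly from how quickly the failed box is fixed or replaced: the standby only has to survive that short window on its own.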
So a single box style solution is reliable, powerful and hugely cost-effective, and it is far easier to build in redundancy. Perhaps this is the future for building controls. Here at North we think it may well be. We even suspect that, eventually, the server may not be located on the same site at all, but in a server room somewhere on the Internet. A scary prospect? Only for those who can’t keep up.




