About Replify Products - What is WAN Optimization?
You probably already have some idea what WAN Optimization is, or you wouldn’t be on this site. But if you’ve got as far as this sentence, presumably you would like to know more, or to understand how Replify defines it.
The terms WAN Optimization, WAN Acceleration and Application Acceleration are effectively synonymous for our purposes here. Let’s talk about the generally accepted definition of WAN Optimization before getting into the Replify perspective.
First the context: most application protocols were designed for use on local area networks. By “application protocol” I mean the set of messages, and the rules for exchanging them, which enable a user and a server, or two servers, to carry out some sort of useful interaction. HTTP is an application protocol: originally for allowing end-users (people on PCs) to find and browse text and image content on a remote system; of course HTTP is now used more generally for all kinds of things, including carrying remote procedure calls, email traffic and so on. CIFS (closely related to SMB) is the application protocol for manipulating and accessing folders and files remotely; you use it every time you open Explorer on a Windows PC, for example. And there are others, such as MAPI – the Messaging Application Programming Interface – which supports the conversation between Microsoft Outlook and an Exchange server.

The problem with every one of these protocols, and many others, is that they were designed without enough consideration for the amount of traffic they generate: the number and frequency of messages, and the size of the messages. On a Local Area Network (LAN) they’re mostly fine – 100 or 1000 Mbps (megabits per second) and a latency (message round-trip time) of 1-2 ms (milliseconds) makes the applications seem very responsive to users, and everyone is happy. But there are three major trends pushing users, and the applications they wish to use, far apart and introducing a Wide Area Network (WAN) into the path – and that means bandwidth of perhaps only 1-2 Mbps and latencies anywhere from 10 ms to 200 ms, where really bad things start to happen with application protocols.
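To get a feel for why latency, and not just bandwidth, hurts chatty protocols, here is a rough back-of-the-envelope model. The figures and the 4 KB chunk size are illustrative assumptions, not measurements of any particular protocol:

```python
def transfer_time(size_mb, bandwidth_mbps, round_trips, latency_ms):
    """Rough transfer time: time on the wire plus time spent waiting
    for per-chunk acknowledgements to make their round trips."""
    on_the_wire = (size_mb * 8) / bandwidth_mbps   # seconds sending bits
    waiting = round_trips * latency_ms / 1000.0    # seconds idle, per RTT
    return on_the_wire + waiting

# A 10 MB file moved in 4 KB chunks, each acknowledged before the next
round_trips = (10 * 1024) // 4                     # 2560 chunks

lan = transfer_time(10, 1000, round_trips, 1)      # 1 Gbps LAN, 1 ms RTT
wan = transfer_time(10, 2, round_trips, 100)       # 2 Mbps WAN, 100 ms RTT
print(f"LAN: {lan:.1f} s, WAN: {wan:.1f} s")       # LAN: 2.6 s, WAN: 296.0 s
```

Notice that on the WAN the waiting time (256 seconds) dwarfs the 40 seconds actually spent sending bits – which is why a fatter pipe alone doesn’t fix a chatty protocol.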
The networking trends driving this problem are:
1) Consolidation of servers into corporate data centers. It costs too much to support servers remotely in small sites – utilisation is poor and the cost of site visits to troubleshoot issues is too great, so enterprises are pulling the technology into a few (or one) central sites.
2) Cloud computing. Just as the consolidation above was gathering pace, the industry realised that further efficiency and flexibility could be gained by pushing applications, and the platforms on which they ran, into shared public data centers where you paid less, and paid only for what you used. This is driving the biggest transformation in delivery of IT that we’ve seen since the mainframe was displaced by mini-computers.
3) More people are spending more time outside the “office”. Some are home-workers, some travel on business, and some are off-shore workers.
The net result is that more and more people are trying to access corporate applications and data sources across a network that has far less bandwidth and far greater latency than a head-office LAN. And the applications themselves are often more complex, more collaborative, and involve working with larger documents, databases, and rich content such as video. Consider a construction company working with large CAD documents on remote sites. Every design change is a small tweak to a large CAD file, which needs to be shared with others and preserved as a record on the corporate eDRM system. Those changes may have to be communicated to and from a muddy field in a remote location served only by narrowband, dial-up, or satellite links. Or perhaps you’re a retailer with hundreds of small stores, all of which need to send stock updates and orders each evening. Fat pipes may be available, but they’re expensive and your margins are slim – you really don’t want to put an expensive high-speed link into every site just to handle that burst of information once a day.
So this creates a need to make WAN connections behave more like LAN connections, allowing more data, and faster data, across a connection. Luckily this is indeed possible, and the most efficient way we know how to do it today is to insert an intelligent proxy at each end of the connection that can intercept and “optimize” the traffic using a variety of complementary techniques. The idea is to use approaches which are transparent to both the user and the server, so that nothing else need be changed, no security issues arise, and deployment is quick and easy.
There are three techniques common to nearly all WAN Optimization, and they work together as a “stack” of sorts on each side of the WAN connection.
First: Protocol Optimization. Protocols such as CIFS are called “chatty” because they send a lot of small messages to get the work done. When you copy a file it gets moved in small chunks, and each chunk is acknowledged before the next one is sent. On a LAN – fine. Over a link between London and New York – awful. Protocol optimization involves watching the protocol, deducing what the user is up to, and then using a more efficient protocol to speed things up. Let’s take CIFS as an example. A user of Windows Explorer double-clicks a .doc file on a remote server to launch Word. With protocol optimization in play, we see the initial CIFS request and realise that the whole file is going to have to cross the network; rather than letting the CIFS protocol move it across the network (badly), we use CIFS at the far end to grab the whole file, send it in bulk to the near side using a better protocol, and then respond to the user’s CIFS messages by serving up the file locally. The user and server think they are talking directly to one another, but most of the conversation is “spoofed” at each end. Not all protocols can be optimized to an appreciable degree, but even those that can’t will normally benefit from one or both of the next two techniques.
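As a toy sketch of the read-ahead-and-spoof idea – purely illustrative, and nothing like Replify’s actual implementation – a near-side proxy might look like this:

```python
class ReadAheadProxy:
    """Toy protocol-optimization proxy (illustrative sketch only).

    On the first chunk request for a file it fetches the whole file
    from the far side in one bulk transfer, then answers the client's
    chatty chunk-by-chunk reads from the local copy, so no further
    WAN round trips are needed."""

    def __init__(self, far_side_fetch):
        self.far_side_fetch = far_side_fetch   # callable: filename -> bytes
        self.local_copies = {}
        self.wan_round_trips = 0

    def read_chunk(self, filename, offset, length):
        if filename not in self.local_copies:
            # One bulk transfer replaces many acknowledged chunk reads
            self.local_copies[filename] = self.far_side_fetch(filename)
            self.wan_round_trips += 1
        return self.local_copies[filename][offset:offset + length]

# Usage: reading a 64 KB file in 4 KB chunks costs one WAN round trip
server_files = {"report.doc": bytes(64 * 1024)}
proxy = ReadAheadProxy(lambda name: server_files[name])
chunks = [proxy.read_chunk("report.doc", off, 4096)
          for off in range(0, 64 * 1024, 4096)]
print(proxy.wan_round_trips)   # 1, instead of 16 chatty round trips
```

A real optimizer has to parse the actual CIFS messages, handle writes, locking and coherency, and decide when read-ahead is safe – that is where the engineering effort lives.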
Second: Block-level De-duplication. You’re probably familiar with “caching” – the idea of storing previously accessed data nearby so that it can be served quickly if needed again. Your web browser does this today by keeping images and web pages on your hard drive and presenting the data from there if it believes it’s still fresh. Many enterprises have a web proxy on their network which acts as a gateway and a cache for all web browsing, and thus speeds up access to commonly visited sites. The problem with this kind of cache is that it’s “object” based and deals only with entire images or documents. In the course of normal working, things get modified, renamed, versioned and so on, and these simple changes make the whole “object” unrecognisable to an object cache, so it can’t provide any useful data. Block-level de-duplication works at a finer granularity and caches chunks of data much smaller than whole “objects”. If it sees a new chunk it caches it, and if it sees a chunk again it sends a small reference to the chunk instead of the actual data. This happens at both ends, so the caches of chunks or blocks remain in sync. It doesn’t care about objects at all – so, for example, a document downloaded by CIFS could be edited, renamed and then sent out as an email attachment, and on the outward path block-level de-duplication will find all the chunks that haven’t changed and send tiny references instead. De-duplication can deliver astonishing savings for many work patterns – the more collaboration, the larger the content being shared, and the more data transfers, the greater the savings – and it’s easily possible to remove 90% or more of the data that would otherwise cross the network. I’ll not go into further detail here, but beneath this simple description lie some very elegant mechanisms and cunning implementations, and we believe that Replify has something pretty special in this area.
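A minimal sketch of the mechanism, assuming fixed-size 4 KB chunks and SHA-256 references (real products, Replify’s included, typically use content-defined chunk boundaries and more compact references, but the principle is the same):

```python
import hashlib

CHUNK = 4096  # fixed-size chunks; real systems use content-defined boundaries

def dedup_encode(data, cache):
    """Replace previously seen chunks with short hash references.
    Both ends maintain the same cache, so references resolve remotely."""
    out = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        key = hashlib.sha256(chunk).digest()
        if key in cache:
            out.append(("ref", key))      # 32 bytes instead of up to 4096
        else:
            cache[key] = chunk
            out.append(("raw", chunk))
    return out

def dedup_decode(stream, cache):
    """Rebuild the original data, learning new chunks as they arrive."""
    parts = []
    for kind, payload in stream:
        if kind == "ref":
            parts.append(cache[payload])
        else:
            cache[hashlib.sha256(payload).digest()] = payload
            parts.append(payload)
    return b"".join(parts)

# A document sent twice: the second transfer is entirely references
sender, receiver = {}, {}
doc = b"quarterly figures " * 1000
first = dedup_encode(doc, sender)
assert dedup_decode(first, receiver) == doc
second = dedup_encode(doc, sender)
raw_bytes = sum(len(p) for kind, p in second if kind == "raw")
print(raw_bytes)   # 0 -- every chunk was already cached at both ends
```

Because the cache keys on chunk content rather than file identity, the renamed-and-emailed document in the paragraph above still hits the cache for every unchanged chunk.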
Third: Compression. This is the easy one – we simply take the data emerging from the first and second optimization stages above and squish it using a lossless compression algorithm. This typically gives a 50% reduction in data volumes for many common data types – web pages, XML, Office documents, database queries and so on – but a much smaller saving for content which is already efficiently compressed, such as MPEG video or audio.
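A quick illustration using Python’s standard zlib module (DEFLATE – just one example of a lossless codec; the exact algorithm a given WAN optimizer uses will vary). Random bytes stand in here for already-compressed content, since both are close to incompressible:

```python
import os
import zlib

# Text-like data (repetitive, like XML or Office documents) compresses well
text = b"<row><name>widget</name><qty>7</qty></row>" * 500

# Already-compressed content barely shrinks; random bytes approximate it
random_data = os.urandom(21000)

for label, data in (("text-like", text), ("random", random_data)):
    compressed = zlib.compress(data)
    print(f"{label}: {len(data)} -> {len(compressed)} bytes")
```

The text-like sample shrinks to a small fraction of its size, while the random sample comes out slightly larger than it went in – which is why compressing MPEG streams a second time buys you nothing.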
Some vendors attempt a few other tricks, such as monkeying around with the TCP protocol, or establishing multiple parallel connections and re-ordering the packets which get out of sequence as a result. In Replify’s view these techniques add more complexity (and hence cost) than value, but others will cite them as their secret sauce, so we should acknowledge that point of view. Some other vendors also bundle Quality of Service (QoS) into their products; we quite deliberately leave this to those who specialise in it, and our Accelerator product works very well alongside QoS devices.
So that’s WAN Optimization as a concept – now let’s talk about Replify’s approach to delivering it.