Understanding Automation Manager traffic

By Max Ranzau

 

From the Nuts, Bolts and Bandwidth Dept. As some of you noticed, a while back RES released the Reference Architecture document for Automation Manager, so now we have one for both of the core products. Great. If you have no idea what I’m talking about, go read the document here. The reference doc leads in with the basic deployment scenarios and similar things, but the nitty-gritty stuff is on pages 22 and 23, where we get down to the actual traffic load numbers. Some of these numbers are obviously based on estimates: for example, the network traffic resulting from any job invariably depends on what’s in the job in terms of modules, tasks and resources. Note: The reference architecture doc also specifies packet counts, but since we are interested in the overall bandwidth consumption I’m leaving those numbers out, as they’d only muddle the picture.

One more thing before we get started: the numbers listed below are without Automation Manager bandwidth management involved. I may do another piece on that particular topic in the future, so you get a better idea of what it actually does…

AM Agent-to-Dispatcher idle traffic.

To begin with, let’s talk about idle traffic. It’s a fact of life that all components in a network chat. The question is how much, how often and what the impact would be on the fringes of your infrastructure. Given the numbers below, chances are that you can work out the impact by yourself. While RES is keeping the formula for calculating the precise idle traffic under NDA, it’s pretty easy to make a couple of observations off the provided numbers. According to the examples in the document:

When idle, one AM agent generates about 47.5 kB worth of traffic per 5 min/300 sec. That’s not a whole lot, as it boils down to just below 160 bytes/sec per agent.

Let’s expand on that: Even if you had 3000 agents on the same site, your total background noise from the AM agents, across all switches, would be around 475 kB/sec. This is next to nothing on a regular LAN. Remember, this is the agent-to-dispatcher poll traffic, so it would never have to cross a WAN link when you place your dispatchers strategically on your external subnets.
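The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope helper, not anything from RES: the function and constant names are made up for illustration, it assumes decimal kilobytes (1 kB = 1000 bytes), and it assumes idle traffic scales linearly with the number of agents.

```python
# Figure quoted in the article from the reference architecture doc.
# Assumption: 1 kB = 1000 bytes, and traffic scales linearly per agent.
AGENT_IDLE_BYTES = 47_500   # idle traffic per agent per measurement window
WINDOW_SECONDS = 300        # 5 minute measurement window


def agent_idle_rate(agents: int) -> float:
    """Aggregate agent-to-dispatcher idle traffic in bytes/sec (hypothetical helper)."""
    return agents * AGENT_IDLE_BYTES / WINDOW_SECONDS


for n in (1, 100, 3000):
    print(f"{n:>5} agents: {agent_idle_rate(n) / 1000:8.2f} kB/sec")
```

Swap in your own agent count per site to see what the background noise looks like on your LAN.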

AM Dispatcher-to-Datastore idle traffic.

As most of you know, an AM Agent is wired never to talk to a datastore directly, but always to go by way of at least one dispatcher. How many dispatchers you will need depends entirely on the shape of the infrastructure you want to deploy AM in. Since each Dispatcher+ can handle 1500 concurrent agent connections, you are obviously going to have far fewer dispatchers than agents. The kicker, however, is that the dispatchers will usually be crossing WAN links if you have them. If you go about your dispatcher layout smartly, you should not need more than one dispatcher per WAN link, which makes the following math rather easy. When the dispatchers aren’t doing anything job related, the idle traffic depends on:

A) Are there agents connected?

  • If no, the base impact is about 616 kB over 300 sec †1), which is around 2 kB/sec per dispatcher and negligible on most WAN links.
  • If yes, it seems there is a slight overhead for each connected agent (22 kB over 5 min = 73 bytes/sec). Presuming that number scales linearly, then even with a fully loaded dispatcher with 1500 attached agents, you would be looking at around 110 kB/sec of dispatcher-to-datastore traffic †2)

†1) Disclaimer: As RES is keeping the formula for calculating this under wraps with the NDA stuff, I’m going to have to make a couple of leaps of faith here. Unless somebody with Wireshark and the necessary time on their hands is prepared to run the numbers and verify (or spill the beans in the comments section ;) we can only go by what is stated in the doc. The reference document says the agent idle traffic was measured over a 5 minute interval, but it doesn’t say so for the corresponding dispatcher traffic, so we are going to presume the same measurement interval was used for that test.

†2) The other presumption I had to make is that each additional agent connected to a dispatcher adds, on average, 73 bytes/sec to the idle dispatcher-to-datastore traffic.
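To play with the dispatcher numbers yourself, here is a minimal Python sketch built on the two presumptions from the footnotes: a 300 second measurement window and a linear per-agent overhead. The names are hypothetical and decimal kilobytes are assumed. Note that it adds the 616 kB base load on top of the per-agent overhead, so a full dispatcher comes out slightly above the 110 kB/sec mentioned above.

```python
# Figures quoted in the article; 1 kB = 1000 bytes is an assumption.
WINDOW_SECONDS = 300              # presumed measurement interval (footnote 1)
DISPATCHER_BASE_BYTES = 616_000   # idle dispatcher with no agents connected
PER_AGENT_BYTES = 22_000          # presumed linear overhead per agent (footnote 2)
MAX_AGENTS = 1500                 # concurrent agent connections per Dispatcher+


def dispatcher_idle_rate(agents: int) -> float:
    """Idle dispatcher-to-datastore traffic in bytes/sec (hypothetical helper)."""
    if not 0 <= agents <= MAX_AGENTS:
        raise ValueError(f"agents must be between 0 and {MAX_AGENTS}")
    return (DISPATCHER_BASE_BYTES + agents * PER_AGENT_BYTES) / WINDOW_SECONDS


for n in (0, 1, 500, 1500):
    print(f"{n:>5} agents: {dispatcher_idle_rate(n) / 1000:7.2f} kB/sec")
```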

B) The other item to briefly consider: Is the dispatcher a garbage collector? This role has a very low impact and not much has been posted about it anywhere at present, other than that it’s selected dynamically amongst the dispatchers, supposedly similar to Citrix Zone Data Collectors, as they all know about each other through their datastore. What precisely the purpose of the RES AM garbage collection is, when it happens and what the election criteria are, are still unanswered questions, but hopefully someone will provide that information in the near future. Anyway, based on the released numbers, we may presume the garbage collection role adds approximately 215 kB per 5 min to the idle traffic of the garbage collecting dispatcher, which works out to roughly 720 bytes/sec. So unless part of your infrastructure is running over a 2G cell data connection, you can probably ignore this completely.

AM Dispatcher-to-Datastore traffic, running a job

Since you hopefully won’t have your new Automation Manager installation sitting idle too much, it’s also worthwhile knowing what kind of traffic patterns you are looking at when a job is actually processing. Again we have to do a bit of guesstimating based on the provided numbers in the RES reference architecture guide. The numbers are based on a job that downloads a 5.5 MB AM resource. The only thing they really tell us is that, regardless of whether you are master caching or connecting a dispatcher directly to the datastore, the amount of data going across the wire seems roughly equivalent to the size of the resource payload + 500 kB. If a dispatcher is set up as a master caching dispatcher, the transfer overhead looks to be about 200 kB less than for a regular dispatcher connected to the datastore.

In conclusion, while the overhead isn’t very large, it is not easy to create usable rules of thumb out of this, as the reference document only provides one data point (a 5.5 MB resource and the resulting traffic). In order to make an accurate estimate of traffic size, one would have to know if and how the 500 kB overhead grows with the size of the resources/modules. I would suggest this would be an excellent addition to a revision of the document.
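If you want a rough figure anyway, the single data point can be turned into a naive estimator. The Python sketch below assumes the overhead is a fixed amount regardless of resource size, which is exactly the part the reference document does not confirm, so treat the output as a guess. The function name and the 300 kB master caching figure (500 kB minus the roughly 200 kB difference) are my own illustration, with decimal kilobytes assumed.

```python
# ASSUMPTION: the overhead is constant and does not grow with resource size.
# The reference doc gives only one data point (5.5 MB), so this is unverified.
OVERHEAD_DIRECT_BYTES = 500_000          # regular dispatcher connected to the datastore
OVERHEAD_MASTER_CACHING_BYTES = 300_000  # roughly 200 kB less when master caching


def job_traffic_estimate(resource_bytes: int, master_caching: bool = False) -> int:
    """Naive estimate of dispatcher-to-datastore bytes for one resource download."""
    overhead = OVERHEAD_MASTER_CACHING_BYTES if master_caching else OVERHEAD_DIRECT_BYTES
    return resource_bytes + overhead


resource = 5_500_000  # the 5.5 MB resource from the reference doc
print(f"direct:         {job_traffic_estimate(resource) / 1e6:.1f} MB")
print(f"master caching: {job_traffic_estimate(resource, master_caching=True) / 1e6:.1f} MB")
```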

Other than that, clarifying comments are welcome!

  • By Grant Tiller, July 28, 2014 @ 14:44

    Great article Max. Kudos to the dude that put the reference architectures together ;-)

  • By RESguru, July 28, 2014 @ 17:13

    Thanks Grant, I’d love to give kudo’s to said dude if I knew who it was.
    (I guess that’s one of the reasons I prefer to blog instead of making whitepapers :-)
    /Max
