C Level Engineering Symmetry

Monitoring network traffic and service chatter with Boundary

We recently published a case study with Boundary about how we use their product at Gilt Groupe. I wanted to add some detail on our decision process: what we were looking for, what we evaluated, and why we decided that Boundary was the best choice for us going forward.

Gilt Groupe’s architecture is now very much a micro-service architecture. We have hundreds of JVM-based HTTP services interacting with each other or with backend systems such as PostgreSQL, MongoDB, RabbitMQ, Kafka, Zookeeper, and many more third-party solutions over various data interchange formats and protocols.

A few months ago, we felt we needed more insight into how much traffic was going in and out of every service and backend system. When various teams are working on new features that require more communication patterns and data exchange, capacity planning becomes difficult if you don’t know where you stand.

Moreover, in our experience, most features go from a normal usage pattern for months to a sudden, very large adoption by our business operations. The amount of data can suddenly grow by 1 to 2 orders of magnitude, which brings its own set of challenges.

To get better insight into the amount of data exchanged, we started by monitoring the data transferred out of our HTTP services (we use Jetty) using the excellent Metrics library from Coda Hale. This can be done trivially by extending the existing Metrics InstrumentedHandler for Jetty:


public class CustomInstrumentedHandler extends InstrumentedHandler {

  // Rate of response bytes written, marked once per response
  private final Meter bytesTransferred = Metrics.newMeter(handler.getClass(), "bytes-transferred", "responses", TimeUnit.SECONDS);

  // Distribution of response sizes, in bytes
  private final Histogram bytesResponse = Metrics.newHistogram(handler.getClass(), "bytes-responses");

  // constructor omitted for blog readability
  @Override
  public void handle(String target, Request request, HttpServletRequest httpRequest, HttpServletResponse httpResponse) throws IOException, ServletException {
    final AsyncContinuation continuation = request.getAsyncContinuation();
    try {
      super.handle(target, request, httpRequest, httpResponse);
    } finally {
      // Record once, on the initial dispatch of the request
      if (continuation.isInitial()) {
        // Number of bytes written to the response body
        long count = request.getResponse().getContentCount();
        bytesResponse.update(count);
        bytesTransferred.mark(count);
      }
    }
  }
}

The client side is a bit more challenging, however. In our JVM-based services we end up using a menagerie of HTTP clients: AsyncHTTPClient with Netty 3.x provider, Apache HttpComponents 4.x, Apache Commons HTTPClient 3.x, and the venerable JDK HttpURLConnection.

This is the reality of dealing with various third-party integrations. It makes things more complicated than we would like, but it would be painful to rewrite or extend existing SDKs just to use one and only one HTTP client across the platform (especially when they are non-extensible or, worse, closed-source).

The immediate problem faced is effectively how to instrument *all* those clients.

AsyncHTTPClient can be instrumented easily using a RequestFilter and an AsyncHandler. The code would be something similar to the snippet below. There is not much overhead in doing so, as you just need to count the chunk sizes as the HttpResponseBodyPart objects are received.

public class InstrumentedAsyncHttpClientRequestFilter implements RequestFilter {
  private final Meter bytesTransferred;
  private final Histogram bytesResponses;

  // ... initialization omitted for readability

  public FilterContext filter(FilterContext ctx) throws FilterException {
    return new FilterContext.FilterContextBuilder(ctx)
        .asyncHandler(new MetricsAsyncHandler(ctx.getRequest(), ctx.getAsyncHandler()))
        .build();
  }

  public class MetricsAsyncHandler<T> implements AsyncHandler<T> {
    private AsyncHandler<T> delegate;
    private long totalBytesTransferred = 0;

    // ... initialization omitted for readability

    public STATE onBodyPartReceived(HttpResponseBodyPart bodyPart) throws Exception {
      long bytes = bodyPart.getBodyPartBytes().length;
      totalBytesTransferred += bytes;
      metrics.bytesTransferred.mark(bytes);
      return delegate.onBodyPartReceived(bodyPart);
    }

    public T onCompleted() throws Exception {
      T o = delegate.onCompleted();
      metrics.bytesResponses.update(totalBytesTransferred);
      return o;
    }
  }
}

Note that we tend to give each service client a name that maps to a Metrics scope, which makes it easy to distinguish metrics between clients (some services use a dozen clients).

For all the other clients, instrumentation is a bit too intrusive to be practical. And it doesn’t address how to monitor traffic going in and out directly through the Socket API, as with Zookeeper, Play Framework (Netty server), MongoDB and JDBC drivers, etc.
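To illustrate why this gets intrusive: without a hook like AsyncHTTPClient’s filter, each response stream has to be wrapped by hand wherever it is read. Below is a minimal JDK-only sketch of that idea, not code we run in production. CountingInputStream is a hypothetical name, and a plain counter stands in for the Metrics Meter and Histogram.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical wrapper that counts bytes as they are read from any InputStream. */
class CountingInputStream extends FilterInputStream {
    private final AtomicLong bytesRead = new AtomicLong();

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            bytesRead.incrementAndGet();
        }
        return b;
    }

    // read(byte[]) in FilterInputStream delegates here, so bytes are counted once.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            bytesRead.addAndGet(n);
        }
        return n;
    }

    long getBytesRead() {
        return bytesRead.get();
    }

    public static void main(String[] args) throws IOException {
        // In a real client this would wrap e.g. HttpURLConnection.getInputStream();
        // an in-memory stream is used here so the sketch runs on its own.
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(new byte[1500]));
        byte[] buf = new byte[512];
        while (in.read(buf) != -1) {
            // drain the stream
        }
        System.out.println("bytes read: " + in.getBytesRead());
    }
}
```

The wrapper itself is small, but every call site (and every third-party SDK that opens its own connections) would need to use it, which is exactly the part that does not scale.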

Another solution would be to write a Java agent via the java.lang.instrument API to instrument some well-known libraries (NewRelic uses a similar technique, but doesn’t track traffic). While it may look like the least intrusive solution, it is also a fairly significant undertaking to develop instrumentation for several third-party libraries, which you then have to maintain over time.

Also, we were looking with interest at adding systems such as Riak and Redis, possibly some asynchronous drivers, and we have to deal with multiple versions of Scala. This would have been a cool project to work on technically, but maybe not an especially practical one.
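For readers unfamiliar with the agent approach, here is a bare skeleton of what such an agent starts from. It is only a sketch: TrafficAgent, RecordingTransformer and the list of target prefixes are hypothetical, and the transformer merely records which target classes are loaded. Actually rewriting bytecode to count traffic is the significant part, and would typically need a bytecode library on top of this.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical agent skeleton: it only records which target classes get loaded. */
class TrafficAgent {

    /** Internal (slash-separated) class-name prefixes an agent like this might target. */
    static final String[] TARGET_PREFIXES = {"java/net/Socket", "org/apache/http/"};

    static final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Invoked by the JVM before main() when started with -javaagent and a Premain-Class manifest entry. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new RecordingTransformer());
    }

    static class RecordingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (className != null) {
                for (String prefix : TARGET_PREFIXES) {
                    if (className.startsWith(prefix)) {
                        seen.add(className);
                    }
                }
            }
            // Returning null tells the JVM to keep the class bytes unchanged.
            return null;
        }
    }

    public static void main(String[] args) {
        // Exercise the transformer directly, without attaching an agent.
        new RecordingTransformer().transform(null, "java/net/Socket", null, null, new byte[0]);
        System.out.println(seen);
    }
}
```

Every library listed in TARGET_PREFIXES would need its own instrumentation logic, kept in sync with each version of that library, which is where the maintenance cost comes from.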

What we needed was something similar to nethogs minus the text interface. A tool capable of grouping the bandwidth by process, but ideally it would have some features also found in Wireshark.

We did not find anything matching those requirements.

Until a week or two later, that is. Cliff Moon, Co-Founder and CTO of Boundary, visited our New York office to present Boundary and give a Tech Talk on Distributed Systems (which we blogged about).

We installed Boundary on some of our servers to get a better idea. This was truly a revelation. The installation was painless, with just a single command, and as soon as the agent was up it started reporting data to the dashboard within a second.

image

Each line represents the traffic volume on a given port/protocol across all nodes at a 1 second resolution. Traffic can easily be broken down. For example, you can group servers, either manually or dynamically using pattern matching, which makes it easy to segment your front-end from your backend machines and see traffic flowing between those groups (this is where a descriptive naming policy for your machines comes in handy).

You can further segment your traffic by port/protocol. For example, TCP 5432 would be the traffic to/from PostgreSQL. You can then easily analyze the traffic going from your backend machines (or a subset of those) to PostgreSQL. The same can be done for the chatter around our messaging infrastructure on RabbitMQ.

Many more details on how all of this can be done are in the YouTube video ‘Isolate your traffic with filters and conversations’.
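To make the idea of segmenting by group and port concrete, here is a small sketch of the aggregation involved. This is not how Boundary works internally, which we do not know; Flow and TrafficBreakdown are hypothetical names, and the byte counts are made-up sample values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TrafficBreakdown {

    /** Hypothetical flow sample: bytes seen from a server group to a destination port. */
    static class Flow {
        final String sourceGroup;
        final int destPort;
        final long bytes;

        Flow(String sourceGroup, int destPort, long bytes) {
            this.sourceGroup = sourceGroup;
            this.destPort = destPort;
            this.bytes = bytes;
        }
    }

    /** Sums bytes per destination port; a null sourceGroup means "all groups". */
    static Map<Integer, Long> bytesByPort(List<Flow> flows, String sourceGroup) {
        return flows.stream()
                .filter(f -> sourceGroup == null || f.sourceGroup.equals(sourceGroup))
                .collect(Collectors.groupingBy(f -> f.destPort, TreeMap::new,
                        Collectors.summingLong(f -> f.bytes)));
    }

    public static void main(String[] args) {
        List<Flow> flows = Arrays.asList(
                new Flow("backend", 5432, 12_000),   // PostgreSQL
                new Flow("backend", 5672, 3_000),    // RabbitMQ (default AMQP port)
                new Flow("frontend", 5432, 500),
                new Flow("backend", 5432, 8_000));
        // Traffic from the backend group only, broken down by port.
        System.out.println(bytesByPort(flows, "backend"));
    }
}
```

The point is that once each sample carries a group and a port, the breakdowns described above are simple filters and sums over the same data.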

There is currently a shortcoming for us: we are losing a bit of visibility in our conversations. Traffic to our services always goes through a set of dedicated service load balancers. For example, we reach a service via a canonical URL such as http://svc-product, and the load balancer balances between node1:7501, node2:7501, and node3:7501. This means traffic between the caller and the load balancer happens on port 80, while traffic between the load balancer and the callee is on port 7501.

caller ← port 80 → svc-lb ← port 7501 → callee 

As a result, the traffic flowing on port 80 is basically the aggregate of all service traffic, and we cannot see the traffic directly from caller to callee, only the aggregate from caller to svc-lb and from svc-lb to callee.

This is something that may be alleviated a bit in the future as we are thinking about removing the load balancer and having applications doing the load balancing themselves using information from Zookeeper.
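As a rough sketch of what application-side balancing could look like, here is a simple round-robin picker. RoundRobinBalancer is a hypothetical name, and the node list is hard-coded for illustration; in the setup we are considering it would be kept up to date from Zookeeper, which is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical client-side balancer: rotates over a fixed list of nodes. */
class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = nodes;
    }

    /** Returns the next node in rotation; safe to call from multiple threads. */
    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }

    public static void main(String[] args) {
        RoundRobinBalancer svcProduct =
                new RoundRobinBalancer(Arrays.asList("node1:7501", "node2:7501", "node3:7501"));
        for (int i = 0; i < 4; i++) {
            System.out.println(svcProduct.pick());
        }
    }
}
```

With something like this in the caller, connections go straight to port 7501 on the callee, so the caller-to-callee conversation becomes visible instead of being folded into the port 80 aggregate.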

Boundary settings on the dashboard can be driven entirely from their REST API, which provides the added convenience of being able to integrate with your own configuration management system such as Puppet or Chef and a set of backend applications which may contain metadata about your environment.

The REST API is useful for defining application aliases, which give names to a protocol:port pair (e.g. ‘svc-product’ for TCP 7501), for sending deployment events, or for integrating with other systems (it can subscribe to NewRelic events via RSS).

We have only scratched the surface of Boundary so far, and we are very excited about the direction it is taking and what is being developed. It has already proved extremely useful in identifying traffic volume and patterns occurring between services and databases. Investigative work that would have been far more tedious can now be done in a few minutes, with much more flexibility than we initially imagined, and with no direct investment.

I hope that this (long) blog post will be helpful to some people who are facing the same challenges of not having enough visibility in their network traffic. If however you know of any interesting tool in that space, feel free to drop a note.

On a slightly unrelated note, we are also users of a nice library from Boundary called Ordasity. It is a great way to distribute workload across nodes via Zookeeper. It was brought to our attention during Scott Andreas’s tech talk at Gilt Groupe (another one!), and it might be the topic of another blog post.

— stephane

Mobile Web How To: Development Environment Toolkit

While working on Gilt Mobile Web, these are our development tools of choice to get the job done:

Browsers & Emulators

Android emulator: http://developer.android.com/sdk/index.html

Xcode iOS Simulator: https://developer.apple.com/devcenter/ios/index.action

Chrome Canary + mobile settings: https://www.google.com/intl/en/chrome/browser/canary.html

Page Speeds

HTTP proxy/monitor and bandwidth throttler: http://www.charlesproxy.com/

PageSpeed Insights: https://developers.google.com/speed/docs/insights/using_chrome

Device Testing

User agent strings: http://youruseragent.info/commonua.en.htm

Feature compatibility checks: http://caniuse.com/

Android debug bridge for inspecting your Android device: http://developer.android.com/tools/help/adb.html

Adobe Inspect for inspecting Android Browser: http://html.adobe.com/edge/inspect/

iOS 4.2.1 to 6.1 iPods and iPhones

Android 2.1 to 4.2 phones and tablets

Usability Testing

Reflector: http://www.reflectorapp.com/

So, what tools do you use?

Mobile Web How To: Proxy Local Environment To Devices

When you develop a front end experience for the full screen, the feedback loop between code and test is relatively fast. You code. You refresh your browser. Repeat. When you develop a front end experience for mobile devices, this can become a little cumbersome because the feedback loop can sometimes feel much slower. There are emulators for your machine and browser sizing/user-agent strategies that you can employ to make your development process more efficient. But in the end, you’re going to want to test on your devices.

In this post, I’ll explain how you can proxy your local development (localhost) to both your Android and iOS devices. When you do this, you’ll be able to code, refresh, repeat a lot faster and more efficiently.

Charles Proxy

To get started, you need a strategy to manage your HTTP proxy. I use and recommend using Charles Proxy (http://www.charlesproxy.com/) but there are certainly other alternatives. Charles Proxy is very powerful but I won’t be going into much detail about it here. Instead, we want to set up a port that we can HTTP proxy to.

If you click on PROXY, then PROXY SETTINGS, you’ll see a menu where you can enter in a port that you would like to proxy to. I’ve chosen 8888.

image

Charles Proxy is going to look for incoming connections on this port. When you connect iOS or Android to this port, you will see that Charles Proxy will ask you to allow or deny this connection attempt:

image

Now, let’s connect our devices.

iOS


In iOS, navigate to your WiFi menu and then tap into your connected WiFi. At the bottom of this screen, tap on MANUAL under HTTP Proxy. Where you see computer_ip_address_here, enter your computer’s IP address, and where you see 8888, enter the port that you set up on Charles Proxy.

image


That’s about it. On your device, open localhost:1234 or whatever in a browser and then Charles Proxy will ask you to allow or deny. You can now start coding on your machine and refreshing on your iOS device. You’re done.
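If you are not sure which IP address to enter on the device, something like the following sketch can list candidates. It uses only the JDK, and the class and method names are made up for illustration. On a machine with several network interfaces you still have to pick the address that is on the same WiFi as your device.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical helper: lists IPv4 addresses a device could use to reach this machine. */
class LocalAddresses {

    static List<String> candidateIpv4Addresses() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // no network interfaces available
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            // Skip interfaces that are down and the loopback interface.
            if (!nic.isUp() || nic.isLoopback()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr instanceof Inet4Address) {
                    result.add(addr.getHostAddress());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        for (String address : candidateIpv4Addresses()) {
            System.out.println(address);
        }
    }
}
```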

Android


On Android, tap SETTINGS and then Wi-Fi. Next, tap and hold on the WiFi network that you are currently connected to. On the following menu, tap on MODIFY NETWORK CONFIG.

image

You next see a menu to manage your network config. Scroll to the bottom of this modal and tap on SHOW ADVANCED OPTIONS.

image

In the menu options that appear, you will see configuration settings similar to those found in iOS. Where you see computer_ip_address_here, enter your computer’s IP address. Where you see 8888, enter the port you set up with Charles Proxy.

image

And that’s about it. On your device, open localhost:1234 or whatever in a browser and then Charles Proxy will ask you to allow or deny. You can now start coding on your machine and refreshing on your Android device. You’re done.

Today in Pictures

Mobile Web How-To: Detect Samsung S4 Device On Your Mac

This hasn’t been very well documented on the internet just yet so I thought I’d share how you need to connect your Samsung S4 device to a Mac. When you connect this device to your Mac, it (most likely at the time of this writing) will not be recognized.

image

To have your Mac detect your Samsung S4, you have to do some magic on your device. Tap SETTINGS and then tap ABOUT DEVICE. If you scroll to the bottom, you will see a line entitled BUILD NUMBER.

Tap on the build number 7 times. I’m not kidding.

image

image

Now that you are “a developer,” you’ll see a new menu option in SETTINGS -> MORE entitled DEVELOPER OPTIONS. In this menu, you need to check USB DEBUGGING.

image

Then when you next connect your device to your computer, you’ll see your device:

image

Mobile Web How-To: Inspect Elements On Android’s Internet Browser

I’m building Gilt’s new Android app and a good portion of the website is an Android WebView. As you may or may not know, this WebView uses the default Android Internet Browser to render webpages. You probably know this app best by its logo in the lower right hand corner of this screenshot:

image

When you use Google Chrome on the Android device, inspection is very straightforward — I’ll cover this in a later post. But for Android Internet Browser, there is not to my knowledge a good way to inspect and manipulate the DOM.

I needed to inspect the DOM because I inherited a JavaScript file that allows us to mimic scrolling events on mobile devices via webpages. The scrolling library works as expected in all other browsers on all other devices. So, I needed to better understand what was happening in the Android Internet Browser.

The tool that bridged my device to an inspector tool is Adobe Inspect. To get going, you have to install 3 components:

1. Adobe Inspect on your computer: http://html.adobe.com/edge/inspect/

2. Google Chrome Extension Adobe Inspect: https://chrome.google.com/webstore/detail/adobe-edge-inspect/ijoeapleklopieoejahbpdnhkjjgddem?hl=en

3. Google Play Store Adobe Inspect: https://play.google.com/store/apps/details?id=com.adobe.shadow.android&hl=en


Once you have installed everything, connect your Android device to your computer and make sure you’re on the same WiFi. On your Android Device, open Adobe Inspect and click the plus sign in the upper-right hand corner to get to this screen:

image

Get your IP address from your computer and enter it in Adobe Inspect on your Android device. Skipping ahead a little, you can also find your IP address on your computer by opening Google Chrome and clicking on the Adobe Inspect icon in the nav bar. After you input your IP address into the Android device, you’ll see this screen:

image

Now, open Google Chrome and open the url of your choice with the Adobe Inspect extension enabled. In the upper right hand corner, you’ll see the Adobe Inspect icon with a green plus sign.

image

Click on this icon to reveal a menu that displays your computer, your IP address, and your device name.

image

Enter the Passcode you received from your Android device.

image

Now that you’re connected, click on this button next to your device name:

image

Then, a new window should open that looks like this:

image

You can see that this is a standard Google Chrome inspector with the name of your device and the url that is currently being inspected. Click on this link and then click on Elements.

image

Here, you have a standard inspection workflow similar to what you would use for the full screen experience. You can use the console and other features, though in a more limited manner than on the full screen experience. Adobe Inspect will also highlight on the device which DOM elements are being inspected:

image

There is much more you can do but, hopefully, you’re now set up to debug the Android Internet Browser (not that you needed to debug anything in the first place).

Offsite

image

A few of us took some time to get to know each other a little better, hug trees and talk shop.

image

ade-trenaman:

Erlang, Distributed Systems and Sierra Nevada Pale Ale

Found this photo on my phone - a real moment in time at Gilt New York! Steve Vinoski, who worked with Gregor and me back in the days of the seminal Irish tech startup IONA Technologies, came to Gilt one cold February night to give a talk on how Riak is implemented in Erlang - it’s a great talk, and a super introduction to Erlang. From left to right: Eric Bowman, Steve Vinoski, Gregor Heine, me, and Mike Bryzek. Too. Much. Fun.

ade-trenaman:

I was looking through an article on Java 8. We’re using Scala heavily at Gilt and I was toying with the heretical notion that Java 8 might create a compelling reason to go back to Java. Sacrilege! I am of course biased in this matter as I’ve really enjoyed the last two years of Scala coding at…