
New open source project: scala-fedex

Ryan Caloras open source

We recently made the decision to switch from Newgistics to FedEx SmartPost for customer returns at Gilt. A couple of factors contributed to the decision, but the prettiness of FedEx’s API was not one of them - it’s not exactly the most developer-friendly API you can find.

FedEx Web Services are a collection of APIs for doing everything from generating shipping labels to tracking packages. Unfortunately, they’re still using SOAP and WSDLs! That means poor support for a modern language like Scala. Their sample Java client was full of patterns that don’t translate well to Scala (e.g. XML, blocking requests, and Java classes rather than native Scala ones).

Enter scala-fedex

Using scalaxb we were able to generate a native non-blocking Scala client from FedEx’s WSDL. We then added a wrapper to further reduce the boilerplate of the raw generated client. The final result is a thin async Scala client for FedEx’s Ship Service API. It also provides a much cleaner usage example than the previously mentioned Java code. We can now generate FedEx return labels on Gilt using native reactive Scala code!
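
To give a sense of the wrapper’s shape (the names below are illustrative stand-ins, not the actual scala-fedex API), the idea is to hide the generated SOAP types behind a small async facade:

import scala.concurrent.Future

// Hypothetical stand-ins for the scalaxb-generated request/reply types.
case class WebAuthenticationDetail(key: String, password: String)
case class ProcessShipmentRequest(auth: WebAuthenticationDetail, recipientZip: String, weightKg: BigDecimal)
case class ProcessShipmentReply(trackingNumber: String, labelBytes: Array[Byte])

// The scalaxb-generated service exposes non-blocking SOAP calls, roughly like this.
trait GeneratedShipService {
  def processShipment(request: ProcessShipmentRequest): Future[ProcessShipmentReply]
}

// The wrapper pre-fills the per-request SOAP boilerplate (credentials and the like),
// so callers only supply what actually varies per shipment.
class FedexShipClient(service: GeneratedShipService, auth: WebAuthenticationDetail) {
  def returnLabel(recipientZip: String, weightKg: BigDecimal): Future[ProcessShipmentReply] =
    service.processShipment(ProcessShipmentRequest(auth, recipientZip, weightKg))
}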

To support the community, we decided to open source and publish scala-fedex. You can find more specifics on the scala-fedex repo.


Rookie Summer

Team Rookie internship

The Summer Internship #TeamRookie

What can four interns do in a ten-week internship? A lot, as it turns out.

With some help from the fantastic team led by Kyle Dorman, we built out a complete backend and a frontend iOS app, and had a ton of fun along the way. Here’s a snapshot of our summer experience.

Our internship began with two weeks of heavy learning. We were going to be building a mobile application using Swift, Scala, and the Play framework, but most of us had no prior experience with any of these tools. So our mentors led us through lab assignments to get our feet wet. We built a few simple iOS apps and a small Play application that used the Gilt Public API, becoming familiar with the languages and tools we would eventually use to build our intern project.

After our brief introduction to software development at Gilt, we started on a project of our own. Our mentors introduced us to Agile sprints, and we were off. Our progress was slow at first but picked up near the end, and on the last day of development we finally got our finished app deployed.

While we worked a lot this summer, Gilt made sure that we had tons of fun as well. Every Friday, we left the office early to explore a new part of New York. We went up the Freedom Tower, walked the High Line, and saw a musical on Broadway, among tons of other fun activities. We even had fun inside the office: we had weekly lunches with different teams across the organization, and we enjoyed random festivities like Cheese-mageddon, where the tech team samples more cheese than is probably healthy.

So, what was our actual project for the summer? The Gilt Style Quiz is a fun, playful way for users to start interacting with the Gilt iOS app without making a purchase. At the same time, it gives Gilt a chance to get to know our users better. Through a series of style-related questions, we collect both brand and product category affinities for a user and can, as a result, better personalize the shopping experience. As a team, we took complete ownership of the project and built the app from the ground up. We began by developing the API and data models, and then we split up to tackle the front and back ends of the project.

What about Gilt Tech made the internship so cool?

Microservice architecture

Gilt uses a microservice architecture to back its business operations. Because our service could be small and totally independent, we were able to make all of the design decisions in architecting the backend - super cool as an intern! We created a RESTful API with apidoc, an open-source project for declaring the models and resources of a RESTful API that comes with a suite of client generators.

Mentorship

Gilt provided a lot of resources to help us succeed and grow this summer. Right from the start we were introduced to individual mentors who helped us every step of the way, from learning Scala, Swift, and the magic that is AWS, to polishing our product in the final week. Throughout the summer we had the opportunity to dine with the various tech teams and learn about the architecture supporting the Gilt backend and frontend. Erica also organized lunches for us with several executives from Gilt and HBC, giving us firsthand insight into what drives the company.

From a project perspective, we had the chance to work with an amazing product manager, Hilah Almog, who defined our metrics of success and the scope of the application, as well as with a designer, Jose Salamone. It is easy sometimes to get caught in a cycle of code, test, and deploy without stopping to think of who is going to be using the product. However, getting the chance to work with non-engineers really helped keep the project in perspective. We weren’t writing code just to develop our skills in Scala and Swift or even to gather data for the personalization team. Primarily, we were developing a product to enhance customer satisfaction, to show our users that shopping on Gilt is a fun, enjoyable experience, and to streamline their transition into using their personalized product feed. While developing our technical skills was important, one of the key takeaways from this summer was definitely that it is crucial to keep your users in mind while developing!

Culture

Gilt has a unique, forward-thinking culture. The company constantly evaluates the tools it uses, and it is always open to exploring new technologies. We were exposed to this at the quarterly architecture council, where all the engineers spend a day discussing the current state of the Gilt technology stack and exploring possible new directions for tech.

Gilt is also committed to open source, and we made use of some Gilt open-source technologies in our project. The Cleanroom Initiative is an open-source Swift codebase that helped us with data transactions in our application. We also used an open-source apidoc Swift client generator and worked with its owner to make some additions to the project.

Takeaways

Throughout the summer we were exposed to a host of skills that eased the software development process. The adoption of standardized git workflows and explicit communication about project status through Agile sprints accelerated the development of our project within the short time frame. If we listed all we learned this summer it would take about, oh say, ten weeks, but needless to say this was a summer that we’ll all remember for a long time.


How to convert fully connected layers into equivalent convolutional ones

Pau Carré Cardona deep learning

The Problem

Say we want to build a system to detect dresses in images using a deep convolutional network. What we have is a database of 64x128 pixel images that contain either a dress or some other object (a tree, the sky, a building, a car…). With that data we train a deep convolutional network and end up with a high accuracy rate on the test set.

The problem comes when we try to detect dresses in arbitrarily large images. As images from cameras are usually far larger than 64x128 pixels, the output of the last convolutional layer will also be larger. Thus, the fully connected layer won’t be able to use it, as the dimensions will be incompatible. This happens because a fully connected layer is a matrix multiplication, and it’s not possible to multiply a matrix with vectors or matrices of arbitrary sizes.

Let’s assume we have 1024x512 pixel images taken from a camera. In order to detect dresses in an image, we would need to first forward it through the convolutional layers. This will work, as convolutional layers can adapt to larger input sizes. Assuming the convolutional and max pool layers reduce the input dimensions by a factor of 32, we would get an output of 32x16 units in the last convolutional layer. On the other hand, for the training and test images of 64x128 pixels, we would get an output of 2x4 units. That size of 2x4 units is the only one the fully connected layer matrix is compatible with.
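
Concretely, with a reduction factor of 32, the two input sizes map to:

$$\frac{1024}{32} \times \frac{512}{32} = 32 \times 16 \text{ units}, \qquad \frac{64}{32} \times \frac{128}{32} = 2 \times 4 \text{ units}$$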

Now the question is: how do we convert our camera’s 32x16 units into the fully connected layer’s 2x4 units?

The Wrong Solution

One way to do it is by simply generating all possible 2x4 crops from the 32x16 units. That means generating 403 samples of 2x4 units ((32 - 2 + 1) x (16 - 4 + 1) = 403). Finally, we would forward those 403 samples one by one through the fully connected layers and arrange the results spatially.

The problem with that approach is that the cost of cropping and forwarding images through the fully connected layers can be impractical. On top of that, if the network’s reduction factor is lower or the camera images have a higher resolution, the number of samples grows multiplicatively.

The Right Solution

Fortunately, there is a way to convert a fully connected layer into a convolutional layer. First off, we have to define a topology for our fully connected layers and then convert each fully connected layer one by one. Let’s say we have a first fully connected layer with 4 units and a final single binary unit that outputs the probability of the image being a dress.
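
In symbols, writing $V$ for the flattened output of the last convolutional layer (of some length $n$) and leaving activations out for clarity, the two layers are a matrix $A$ with 4 rows and a matrix $B$ with a single row:

$$V \in \mathbb{R}^{n}, \qquad A \in \mathbb{R}^{4 \times n}, \qquad B \in \mathbb{R}^{1 \times 4}$$

The first fully connected layer computes $AV$, and the final unit computes its output from $B(AV)$.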

Converting the first fully connected layer

The idea here is to transform the matrix A into a convolutional layer. Doing that is pretty straightforward, as the rows of the matrix A can be interpreted as convolution filters applied to the flattened input V.

Let’s first write down the classical deep learning convolution operator:
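
One standard way to write the one-dimensional case, with $V$ a signal of length $n$ and $K$ a filter of length $k$, is:

$$(V * K)[i] = \sum_{j=1}^{k} V[i + j - 1]\,K[j], \qquad i = 1, \dots, n - k + 1$$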

When both the signal and the filter are of the same size, the convolution will generate a vector of size one. Hence, the convolution will be equivalent to the dot product:
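
With $k = n$ there is only one valid position, $i = 1$, so:

$$(V * K)[1] = \sum_{j=1}^{n} V[j]\,K[j] = V \cdot K$$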

Applying this property to our convolutional conversion task, we will be able to transform a linear operator into a vector of convolutions:
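
Writing $A_{i,:}$ for the $i$-th row of $A$, in our example with 4 units:

$$AV = \begin{bmatrix} A_{1,:} \cdot V \\ \vdots \\ A_{4,:} \cdot V \end{bmatrix} = \begin{bmatrix} (A_{1,:} * V)[1] \\ \vdots \\ (A_{4,:} * V)[1] \end{bmatrix}$$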

This gives us the transformed convolutional layer for the first fully connected layer: it has as many feature maps as the matrix A has rows, and the i-th feature map uses the i-th row of the matrix A as its filter.

Here we are assuming that the input of the fully connected layer is flattened and also that the fully connected layer only receives a single feature map from the last convolutional layer. For multidimensional convolutions with many feature maps, the transformation will depend on the way the framework we use encodes the different layer types (convolutional and fully connected).

In the case of Torch, it’s pretty easy, as one simply has to copy the biases and the weights of the fully connected layer into the convolutional layer. The caveat is that the convolutional layer has to be declared with the following parameters:

  • Number of input feature maps: as many as the last convolutional layer has output feature maps.

  • Number of output feature maps: number of outputs the fully connected layer has.

  • Filter dimensions: the dimensions of the output of each feature map in the last convolutional layer (we assume all of the feature maps have the same output dimensions).

Converting the second fully connected layer

After the first transformation, the input to the second fully connected layer will consist of many feature maps of size one.

The equivalent convolutional layer will be the following:

  • Number of input feature maps: as many as the last transformed convolutional layer has output feature maps. This is also equal to the number of input units of the original second fully connected layer.

  • Number of output feature maps: as many as the second fully connected layer has outputs. In our case we have a single output, and therefore the layer will only have a single output feature map. If we had more outputs or an additional fully connected layer, we would need to add more feature maps.

  • Filter values: the filter architecture is pretty simple, as all the input feature maps have units of size one. This implies that the filters will be of size one. The value of the filter in the feature map that connects the n-th input unit with the m-th output unit will be equal to the element in the n-th column and the m-th row of the matrix B, as written out below. For our specific case there is a single output, so m is equal to 1, which makes the transformation even easier. Nevertheless, we should keep in mind that we could potentially have multiple outputs.
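
In symbols, the value of the size-one filter connecting the n-th input feature map to the m-th output feature map is simply:

$$W_{m,n} = B_{m,n}$$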

For our example, the second fully connected layer is converted into a convolutional layer with four input feature maps, a single output feature map, and filters of size one.

Always be convolutional

In this post we’ve discussed how to transform fully connected layers into equivalent convolutional layers. Once the network no longer has fully connected layers, we will be able to get rid of all the problems they cause when dealing with inputs of arbitrary sizes.

Nevertheless, when designing a new neural network from scratch, it’s a good idea to substitute all fully connected layers with convolutional layers from the start. This way, not only is there no need for any conversion, but we also get far more flexibility in our network architecture.


Akka HTTP Talk with Cake Solutions

meetups

We are thrilled to be hosting Aleksandr Ivanov of Cake Solutions on Tuesday, April 12th. He’ll be presenting an excellent talk on Akka HTTP. Who is Aleksandr? We’re glad you asked:

Aleksandr Ivanov is a senior software engineer at Cake Solutions, one of the leading European companies building Reactive Software with Scala. Scala has been his main language of choice since 2011. He’s taken part in various projects, from developing backends for trivial web applications to enterprise-level reactive systems and machine learning apps.

Besides engineering, he takes an active part in the life of the developer community, giving talks at local meetups, writing articles, and helping others on mailing lists, Gitter, and Stack Overflow. He’s always ready to hear about and discuss interesting projects and events, share his experience, or simply have a nice conversation over a drink.

Refreshments will be served!

Important: You must provide First Name & Last Name in order to enter the building.

Please RSVP!


Urgency vs. Panic

Hilah Almog tech blog

My first initiative as a product manager at Gilt was something called “Urgency”. It was formed under the premise that Gilt customers had become numb to the flash model, and that we in tech could find ways to reinvigorate the sense of urgency that once existed while shopping at Gilt: the noon rush wherein products were flying off the virtual shelves, and customers knew that if they liked something they had precious few minutes to purchase it before it’d be gone forever. I came into this initiative not only new to Gilt but also new to e-commerce, and I felt an acute sensitivity towards the customer.

At Gilt we walk a fine line between creating urgency and inciting panic, and it’s something I personally grappled with continuously. The former’s outcome is positive: the shopping experience becomes gamified, and the customer’s win is also ours. The latter’s outcome is negative: the customer has a stressful and unsuccessful shopping experience, and then churns. This fine line meant that we as a team couldn’t just conceive of features; we also had to find the right logical balance as to when they should appear - and more importantly, when they shouldn’t.

Cart Reservation Time

Our first feature reduced the customer’s reservation time by half once they added a product to their cart. This tested well, but felt mean. We therefore held its release until we could build a product marketing campaign around it that communicated the shorter time as an effort in fairness: “if other customers can’t hoard without real intention to buy, then you get the most coveted products faster”. The customer service calls ended once our shoppers felt the feature was there for their protection, not their harm.

Live Inventory Badging

We wanted to continue with this theme of helpful urgency, which led us to our second feature: live inventory badges. When we have fewer than 3 of any given item, a gold badge appears atop the product image saying “Only 3 Left”. It then animates in real time as inventory of that item changes. If you are ever on Gilt right at noon, notice how our sales come alive through these badges. Unlike the cart reservation time, this feature felt like a one-two punch: not only were we creating urgency, but we were also giving the customer something they rarely get while shopping online - a view of the store shelf.

Timer in Nav + Alerts

Our third feature was our biggest challenge with regard to striking the right balance between urgency and panic. We added a persistent cart timer to the navigation, showing how much of your five-minute reservation had elapsed. The timer’s partner in crime is an alert, in the form of a banner that appears at the bottom of the page when only a minute is left on your item’s reservation, urging you to check out before it’s gone.

To keep ourselves on the right side of the line, we implemented stringent rules around when this banner could appear, limiting it to products with low inventory (fewer than 3 in your size) and to once per session.

Live Views

We faced an altogether different challenge with our final feature, live product views. Here, the feature itself wasn’t strong enough on its own; the views had to carry their weight. We again had to think through very specific thresholds, depending on inventory levels and view counts, to determine under which circumstances we show the feature and under which we hide it.

Each of these features was tested individually, and each yielded positive results. After each was released, we saw a combined 4% increase in our revenue KPIs within the first hour of a sale. The line was traversed successfully, without panic but with the intended effect. And to our customers we say: because you’re mine, I walk the line.


Breaking the Mold: Megaservice Architecture at Gilt

Adrian Trenaman aws

Today we announce a novel approach to software and system architecture that we’ve been experimenting with for the last while at Gilt: internally, we’ve been referring to it as ‘mega-service’ architecture, and the name seems to have stuck. We’re pretty excited about it, as it represents a real paradigm shift for us.

In a mega-service architecture, you take all your code and you put it in one single software repository, the mega-service. There are so many advantages to having a single repository: only one code-base; no confusion where anything is; you make a change - it’s done, and will go out with the next deploy. It all compiles, from source, 100% of the time at least 50% of the time. Software ownership is a perpetual challenge for any tech organisation: in the mega-service model, there are many, many owners which means of course that the code is really, really well owned.

The mega-service is deployed to one really big machine: we prefer to run this in our own ‘data centre’ as we believe we can provision and run our hardware more reliably and cost-effectively than existing cloud players. The benefits of having a mega-service application are manifold: there’s one way to do everything and it’s all automated; code instrumentation, configs and metrics are all consistently applied, and, all eyes are on the same project, scripts and code, so people are more familiar with more parts of the system.

We’ve abandoned the sophisticated distributed code control mechanisms of recent lore in favour of a big ‘directory’ hosted on a shared ‘file server’. We’ve resorted to an optimistic, non-locking, non-blocking, zero-merge, high-conflict algorithm called ‘hope’ for contributing code changes: we copy the changes into the directory, and then ‘hope’ that it works. Rather than work with multiple different programming languages and paradigms, we’ve settled on an ‘imperative’ programming style using a framework we’ve recently adopted called Dot Net. Aligning previous lambda-based actor-thinking to a world of mutable variables, for-loops and ‘threads’ has not been easy for us; however, we suspect that the challenges and difficulties we’re experiencing are mere birthing pains and a clear sign that we’re heading in the right direction: if it’s hard, then we must be onto something.

This new architectural approach is an optimization on Neward’s ‘Box-Arrow-Box-Arrow-Cylinder’ pattern, reduced to a profoundly simple ‘Box-Arrow-Cylinder’ diagram (despite forming an elegant visual, the solution is just slightly too large to fit in the margin). We typically draw a box (our monolithic code) on top of a cylinder (our monolithic database), both connected with a line of some fashion; however, some have drawn the box to the left, right or bottom of the cylinder depending on cultural preference. Distinguished Engineers at Gilt have postulated a further simplification towards a single ‘lozenge’ architecture incorporating both code and data store in a single lozenge: while that architecture is theoretically possible, current thinking is that it is unlikely that we will get to prototype this within the next ten years.

New architectures require new thinking about organisational structure: everything so far points to a need for a software organisation of about five Dunbars in size to maintain our code-base, structured with a golden-ratio proportion of about eight non-engineering staff to every five engineers. Additionally, the benefits of really thinking about and formalizing requirements, following through with formal design, code and test over long periods, in a style we refer to as ‘Radical Waterfall’, bring us to a rapid release cycle of one or two releases per solar year.

While most readers will be familiar with open-source contributions from Gilt on http://github.com/gilt and our regular talks and meetups, the innovations described in this post are subject to patent, and available through a proprietary licence and submission of a non-disclosure agreement. We’ll be releasing details of same in our next blog post, due for publication a year from now on April 1st, 2017.


Front End Engineering Lightning Talks with HBC Digital

meetups

Join us for an evening of lightning talks by 4 of HBC Digital’s Front End Engineers and an introduction by Steve Jacobs, SVP, Digital Technology and Demand Marketing.

  • Ozgur Uksal - Front End Engineer: Working with Typescript
  • Lei Zhu - Front End Engineer: Redux
  • Norman Chou - Front End Engineer: Using React router
  • Rinat Ussenov - Front End Engineer: Juggling bits in Javascript

Refreshments will be served!

Important: You must provide First Name & Last Name in order to enter the building.

Please RSVP!


OSX, Docker, NFS and packet filter firewall

Andrey Kartashov docker

The Mobile Services team at Gilt uses Docker to both build and run software. In addition to the usual Docker benefits for software deployments, moving toolchains to Docker has a few advantages:

  • it’s easy to (re)create a development environment
  • the environment is preserved in a stable binary form (libs, configs, CLI tools, etc. don’t bit rot as the main OS packages or the OS itself evolve)
  • it’s easy to support multiple divergent environments where different versions of tools/libs are the default, e.g. java7/8, python, ruby, scala, etc.

We develop primarily on OSX, but since Docker is a Linux-specific tool, we must use docker-machine and VirtualBox to actually run it. Toolchains rely on having access to the host OS’s filesystem, and by default /Users is exposed in the Docker VM. Unfortunately, the default setup uses VBOXFS, which is very slow. This can be really painful when building larger projects or relying on build steps that require a lot of IO, such as the sbt-assembly plugin.

Here’s a great comparison of IO performance.

There’s really no good solution for this problem at the moment, but some folks have come up with a reasonable hack: use NFS.

One of them was even nice enough to wrap it up into a shell script that “just works”.

Indeed, with NFS enabled, project build times begin to approach “native” speeds, so it’s tempting. The issue with NFS continues to be its aging design and its intent to function in a trusted network environment, where access is given to hosts, not to authenticated users. While this is a reasonable access model for secure production networks, it’s hard to guarantee anything about the random networks you may have to connect to with your laptop, and having /Users exposed via NFS on un-trusted networks is a scary prospect.

OSX has not one but two built-in firewalls. There’s a simplified app-centric firewall available from the Preferences panel. Unfortunately, all it can do is either block all NFS traffic (so the docker VM can’t access your exported file system) or open up NFS traffic on all interfaces (insecure), so it doesn’t really work for this case.

Fortunately, under the hood there’s also a much more flexible built-in packet-level firewall that can be configured. It’s called PF (packet filter), and its main CLI tool is pfctl. Here’s a nice intro.

With that, one possible solution is to disable the firewall in the Preferences panel and instead add this section at the end of the /etc/pf.conf file:

# Do not filter anything on private interfaces
set skip on lo0
set skip on vboxnet0
set skip on vboxnet1

# Allow all traffic between host and docker VM
table <docker> const { 192.168.99/24 }
docker_if = "{" bridge0  vboxnet0  vboxnet1 "}"
pass quick on $docker_if inet proto icmp from <docker> to <docker>
pass quick on $docker_if inet proto udp from <docker> to <docker> keep state
pass quick on $docker_if inet proto tcp from <docker> to <docker> keep state

# Allow icmp
pass in quick inet  proto icmp
pass in quick inet6 proto ipv6-icmp

# Bonjour
pass in quick proto udp from any to any port 5353

# DHCP Client
pass in quick proto udp from any to any port 68

# Block all incoming traffic by default
block drop in
pass out quick

Then turn it on at system boot time by adding /Library/LaunchDaemons/com.yourcompany.pfctl.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Disabled</key>
	<false/>
	<key>Label</key>
	<string>com.yourcompany.pfctl</string>
	<key>WorkingDirectory</key>
	<string>/var/run</string>
	<key>Program</key>
	<string>/sbin/pfctl</string>
	<key>ProgramArguments</key>
	<array>
		<string>pfctl</string>
		<string>-E</string>
		<string>-f</string>
		<string>/etc/pf.conf</string>
	</array>
	<key>RunAtLoad</key>
	<true/>
</dict>
</plist>

And configure it to start by running:

sudo launchctl load -w /Library/LaunchDaemons/com.yourcompany.pfctl.plist

The main difference from /System/Library/LaunchDaemons/com.apple.pfctl.plist here is the addition of the -E parameter.

You can check that it starts by default after a reboot with

sudo pfctl -s info

And check the rules with

sudo pfctl -s rules

It should be ‘Enabled’ and you should see the configured rules.

You can verify your overall setup by running nmap from a different node against your laptop, e.g.

sudo nmap -P0 -sU  YOUR_IP
sudo nmap -P0 -sT  YOUR_IP

And check for open ports.

After that configuration you should see a noticeable improvement in Docker performance for filesystem-intensive workloads. Hopefully this will no longer be necessary in future versions of the Docker VM, so check the docs to be sure.


gulp-scan • Find Yourself Some Strings

Andrew Powell gulp

We recently ran across the need to simply scan a file for a particular term during one of our build processes. Surprisingly enough, we didn’t find a Gulp plugin that performed only that one simple task. And so gulp-scan was born; it now resides on npmjs.org.

Simply put - gulp-scan is a Gulp plugin to scan a file for a particular string or (regular) expression.

Setting Up

As per usual, you’ll have to require the module.

var gulp = require('gulp');
var scan = require('gulp-scan');

Doing Something Useful

gulp.task('default', function () {
	return gulp.src('src/file.ext')
		.pipe(scan({ term: '@import', fn: function (match) {
			// do something with {String} match
		}}));
});

Or if regular expressions are more your speed:

gulp.task('default', function () {
	return gulp.src('src/file.ext')
		.pipe(scan({ term: /\@import/gi, fn: function (match) {
			// do something with {String} match
		}}));
});

Pretty simple. There’s always room for improvement, and we welcome contributions on GitHub.


Codedeploy Notifications as a Service

Emerson Loureiro aws

After moving our software stack to AWS, some of us here at Gilt have started deploying our services to production using AWS’s Codedeploy. Before that, in a not-so-distant past, we used an in-house tool for deployments - IonCannon. One of the things IonCannon provided was deployment notifications. In particular, it would:

  1. Send an email to the developer who pushed the deployment, for successful and failed deployments;
  2. Send a new deployment notification to Newrelic;
  3. Optionally, send a Hipchat message to a pre-configured room, also for successful and failed deployments.

These notifications had a few advantages.

  1. If you - like me - prefer to switch to something else while the deployment is ongoing, you would probably want to be notified when it is finished - a “don’t call me, I’ll call you” sort of thing. The email notifications were a good fit for that;
  2. Having notifications sent via more open channels, like Newrelic and Hipchat, meant that anyone on the team - or in the company, really - could quickly check when a given service was released, which version was released, whether it was out on a canary or on all production nodes, etc. In Newrelic, in particular, one can see, for example, all deployments for a given time range and filter out errors based on specific deployments. These can come in handy when trying to identify a potentially broken release.

Codedeploy, however, doesn’t provide anything out-of-the-box for deployment notifications. With that in mind, we started looking at the different options available to achieve that. For example, AWS itself has the necessary components to get that working - e.g., SNS topics and Codedeploy hooks - but that means you have to do the gluing between your application and those components yourself and, with Codedeploy hooks in particular, on an application-by-application basis. Initially, what some of us did was a really simple Newrelic deployment notification, hitting Newrelic’s deployment API from the Codedeploy healthcheck script. This approach worked well for successful deployments: because the healthcheck script is the last hook called by Codedeploy, it was safe to assume the deployment was successful. It was also good for realtime purposes, i.e., the deployment notification would be triggered at the same time as the deployment itself.

Despite that, one can easily think of more complicated workflows. For example, let’s say I want to notify on failed deployments now. Since a failure can happen at any stage of the deployment, the healthcheck hook will not even be called in those cases. Apart from failed deployments, it’s reasonable to think about notifications via email, SNS topics, and so on. All of that essentially means adding various logic to different Codedeploy hooks, triggering the notifications “manually” from there - which for things like sending an email isn’t as simple as hitting an endpoint. Duplication of that logic across different services is then inevitable. An alternative to that would be Cloudtrail and a Lambda. However, given the delay for delivering Cloudtrail log files to S3, we would lose too much on the realtime aspect of the notifications. One good aspect of this approach, though, is that it could handle different applications with a single Lambda.

So, the ideal approach here would be one that could deliver realtime notifications - or as close to that as possible - and handle multiple Codedeploy applications. Given that, the solution we have been using to some extent here at Gilt is to provide deployment notifications in a configurable way, as a service, by talking directly to Codedeploy. In essence, our codedeploy notifications service gets deployments directly from Codedeploy and relies on a number of different channels - e.g., SNS, SES, Newrelic, Hipchat - for sending out deployment notifications. These channels are implemented and plugged in as we need them, so they are not really part of the core of our service. Dynamo DB is used for persisting registrations - more on that below - and successful notifications, to prevent duplicates.

We have decided to require explicit registration for any application that we want to have deployment notifications. There are two reasons for this. First, our service runs in an account where different applications - from different teams - are running, so we wanted the ability to select which of those would have deployment notifications triggered. Second, as part of registering an application, we wanted the ability to define over which channels its notifications would be triggered. So our service provides an endpoint that takes care of registering a Codedeploy application. Here’s what a request to this endpoint looks like:

curl -H 'Content-type: application/json' -X POST -d '{ "codedeploy_application_name": "CODE_DEPLOY_APPLICATION_NAME", "notifications": [ { "newrelic_notification": { "application_name": "NEWRELIC_APPLICATION_NAME" } } ] }' 'http://localhost:9000/registrations'

This will register a Codedeploy application and set it up for Newrelic notifications. The Codedeploy application name - CODE_DEPLOY_APPLICATION_NAME above - is used for fetching deployments, so it needs to be the exact name of the application in Codedeploy. The Newrelic application name - NEWRELIC_APPLICATION_NAME - on the other hand, is used to tell Newrelic which application the deployment notification belongs to. Even though we have only illustrated a single channel above, multiple ones can be provided, each containing setup specific to that channel - e.g., an SMTP server for emails, a topic name for SNS.

For each registered application, the service then queries Codedeploy for all deployments across all of its deployment groups, for a given time window. Any deployment marked as successful will have a notification triggered over all channels configured for that application, and each successful notification is then saved in Dynamo. All of this is done on a scheduled basis - i.e., every time a pre-configured amount of time passes, the service checks again for deployments. This means we can make the deployment notifications as close to realtime as we like, simply by adjusting the scheduling frequency.
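
As a sketch of the kind of query this polling involves (written here directly against the AWS Java SDK; the library’s internals may differ), fetching an application’s successful deployments within a time window looks roughly like this:

import java.util.Date
import scala.collection.JavaConverters._
import com.amazonaws.services.codedeploy.AmazonCodeDeployClientBuilder
import com.amazonaws.services.codedeploy.model.{ListDeploymentsRequest, TimeRange}

val codeDeploy = AmazonCodeDeployClientBuilder.defaultClient()

// Returns the ids of deployments that succeeded since the given instant.
def successfulDeploymentsSince(applicationName: String, windowStart: Date): Seq[String] = {
  val request = new ListDeploymentsRequest()
    .withApplicationName(applicationName)
    .withIncludeOnlyStatuses("Succeeded")
    .withCreateTimeRange(new TimeRange().withStart(windowStart).withEnd(new Date()))
  codeDeploy.listDeployments(request).getDeployments.asScala.toSeq
}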

Our idea of a notification channel is completely generic. In other words, it’s independent with regard to the reason behind the notification. In that sense, it would be perfectly possible to register Newrelic notifications for failed deployments - even though in practical terms that would be a bit of nonsense, given Newrelic notifications are meant for successful deployments only. We leave it up to those registering their applications to make sure the setup is sound.

Even though we have talked about a single service doing all of the above, our solution is in fact split into two projects. One is a library - codedeploy-notifications - which provides an API for adding registrations, listing Codedeploy deployments, and triggering notifications. The service is then separate, simply integrating with the library. For example, for the registration endpoint we described above, the service uses the following API from codedeploy-notifications under the hood:

val amazonDynamoClient = ...
// registrations are persisted in Dynamo DB
val registrationDao = new DynamoDbRegistrationDao(amazonDynamoClient)
// the Newrelic application that deployment notifications will be posted to
val newRelicNotificationSetup = NewrelicNotificationSetup("NewrelicApplicationName")
// register the Codedeploy application together with its notification channels
val newRegistration = NewRegistration("CodedeployApplicationName", Seq(newRelicNotificationSetup))
registrationDao.newRegistration(newRegistration)

Splitting things this way means that the library - being free from anything Gilt-specific - can be open-sourced much more easily. It also gives users the freedom to choose how to integrate with it: whereas we currently have it integrated with a small dedicated service, on a T2 Nano instance, others may find it better to integrate it with a service responsible for doing multiple things. Even though the service itself isn’t something open-sourcable - as it contains API keys, passwords, and such - and is currently owned by a single team, it is still generic enough to be used by other teams.

We have been using this approach for some of our Codedeploy applications and have been quite happy with the results. The notifications are triggered with minimal delay - within a couple of seconds for the vast majority of deployments, and under 10 seconds in the worst-case scenarios. The codedeploy-notifications library has been open source from day 1 and is available here. It currently supports Newrelic notifications for successful deployments, and there is ongoing work to support emails as well as notifications for failed deployments. Suggestions, comments, and contributions are, of course, always welcome.
