
CloudFormation Nanoservice

Ryan Martin

One of the big HBC Digital initiatives for 2017 is “buy online, pickup in store” - somewhat awkwardly nicknamed “BOPIS” internally. This gives the customer the option to pick up an order in a store that has the items in inventory, instead of having it shipped to an address.

A small part of this new feature is the option to be notified of your order status (i.e. when you can pickup the order) via SMS. A further smaller part of the SMS option is what to do when a customer texts “STOP” (or some other similar stop word) in response to one of the SMS notifications. Due to laws such as the Telephone Consumer Protection Act (TCPA) and CAN-SPAM Act, we are required to immediately stop sending additional messages to a phone number, once that person has requested an end to further messaging.

Our SMS provider is able to receive the texted response from the customer and POST it to an endpoint of our choosing. We could wrap such an endpoint into one of our existing microservices, but the one that sends the SMS (our customer-notification-service) is super-simple: it receives order events and sends notifications (via email or SMS) based on the type of event. It is essentially a dumb pipe that doesn’t care about orders or users; it watches for events and sends messages to customers based on those events. Wrapping subscription information into this microservice felt like overstepping the bounds of the simple, clean job that it does.

So this is the story of how I found myself writing a very small service (nanoservice, if you will) that does one thing - and does it with close-to-zero maintenance, infrastructure, and overall investment. Furthermore, I decided to see if I could encapsulate it entirely within a single CloudFormation template.

How we got here

Here are the two things this nanoservice needs to do:

  1. Receive the texted response and unsubscribe the customer if necessary
  2. Allow the customer notification service (CNS) to check the subscription status of a phone number before sending an SMS

In thinking about the volume of traffic for these two requests, we consider the following:

  1. This is available on https://www.saksfifthavenue.com only (for the moment)
  2. Of the online Saks orders, only a subset of inventory is available to be picked up in the store
  3. Of the BOPIS-eligible items, only a subset of customers will choose to pick up in store
  4. Of those who choose to pick up in store, only a subset will opt in for SMS messages
  5. Of those who opt in for SMS, only a subset will attempt to stop messages after opting in

For the service’s endpoints, the request volume for the unsub endpoint (#1 above) is roughly the extreme edge case of #5; the CNS check (#2) is the less-edgy-but-still-low-volume #4 above. So we’re talking about a very small amount of traffic: at most a couple dozen requests per day. This hardly justifies spinning up a microservice - even if it runs on a t2.nano, you still have the overhead of multiple nodes (for redundancy), deployment, monitoring, and everything else that comes with a new microservice. Seems like a perfect candidate for a serverless approach.

The architecture

As mentioned above, a series of order events flows to the customer notification service, which checks to make sure that the destination phone number is not blacklisted. If it is not, CNS sends the SMS message through our partner, who in turn delivers the SMS to the customer. If the customer texts a response, our SMS partner proxies that message back to our blacklist service.

The blacklist service is a few Lambda functions behind API Gateway; those Lambda functions simply write to and read from DynamoDB. Because the stack is so simple, it felt like I could define the entire thing in a single artifact: one CloudFormation template. Not only would that be a geeky because-I-can coding challenge, it also felt really clean to be able to deploy a service using only one resource with no dependencies. It’s open source, so anyone can literally copy-paste the template into CloudFormation and have the fully-functioning service in the amount of time it takes to spin up the resources - with no further knowledge necessary. Plus, the template is in JSON (which I’ll explain later) and the functions are in Node.js, so it’s a bit of a “YO DAWG” situation: Node.js code living inside JSON.

The API

Here at HBC Digital, we’ve really started promoting the idea of API-driven development (ADD). I like it a lot because it forces you to fully think through the most important models in your API, how they’re defined, and how clients should interact with them. You can iron out a lot of the kinks (Do I really need this property? Do I need a search? How does the client edit? What needs to be exposed vs locked-down? etc) before you write a single line of code.

I like to sit down with a good API document editor such as SwaggerHub and define the entire API at the beginning. The ADD approach worked really well for this project because we needed a quick turnaround time: the blacklist was something we weren’t expecting to own internally until very late in the project, so we had to get it in place and fully tested within a week or two. With an API document in hand (particularly one defined in Swagger), I was able to go from API definition to fully mocked endpoints (in API Gateway) in about 30 mins. The team working on CNS could then generate a client (we like the clients in Apidoc, an open-source tool developed internally that supports Swagger import) and immediately start integrating against the API. This then freed me to work on the implementation of the blacklist service without being a blocker for the remainder of the team. We settled on the blacklist approach one day; less than 24 hours later we had a full API defined with no blockers for development.

The API definition is fairly generic: it supports blacklisting any uniquely-defined key for any type of notification. The main family of endpoints looks like this:

/{notification_type}/{blacklist_id}

notification_type currently only supports sms, but could very easily be expanded to support things like email, push, facebook-messenger, etc. With this, you could blacklist phone numbers for sms, independently of email addresses for email, independently of device IDs for push.
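For a sense of what this family of endpoints looks like in the Swagger definition, here is a rough sketch of the GET path (illustrative only - the names and descriptions are not lifted from the actual spec):

{
  "paths": {
    "/{notification_type}/{blacklist_id}": {
      "get": {
        "parameters": [
          { "name": "notification_type", "in": "path", "required": true, "type": "string" },
          { "name": "blacklist_id", "in": "path", "required": true, "type": "string" }
        ],
        "responses": {
          "200": { "description": "Entry is blacklisted" },
          "404": { "description": "Entry not blacklisted" }
        }
      }
    }
  }
}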

A simple GET checks to see if the identifier of the destination is blacklisted for that type of notification:

> curl https://your-blacklist-root/sms/555-555-5555
{"message":"Entry not blacklisted"}

This endpoint is used by CNS to determine whether or not it should send the SMS to the customer. In addition to the GET endpoint, the API defines a PUT and a DELETE for manual debugging/cleanup - though a client could also use them directly to maintain the blacklist.
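Both behave symmetrically to the GET (hypothetical host, as before; the returned id is the sanitized number):

> curl -X PUT https://your-blacklist-root/sms/555-555-5555
{"id":"5555555555"}
> curl -X DELETE https://your-blacklist-root/sms/555-555-5555
{"id":"5555555555"}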

The second important endpoint is a POST that receives an XML document with details about the SMS response:

<?xml version="1.0" encoding="UTF-8"?>
<moMessage messageId="123456789" receiptDate="YYYY-MM-DD HH:MM:SS Z" attemptNumber="1">
    <source address="+15555555555" carrier="" type="MDN" />
    <destination address="12345" type="SC" />
    <message>Stop texting me</message>
</moMessage>

The important bits are the source address (the phone number that sent the message) and the message itself. With those, the API can determine whether or not to add the phone number to the blacklist. If it does, the next time CNS calls the GET endpoint for that phone number, the API will return a positive result for the blacklist and CNS will not send the SMS. The POST to /mo_message lives at the top-level because it is only through coincidence that it results in blacklisting for SMS; one could imagine other endpoints at the top-level that blacklist from other types of notifications - or even multiple (depending on the type of event).
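To make that concrete, the provider’s callback amounts to something like this (hypothetical host, abbreviated document, and assuming the configured stop words match “stop”):

> curl -X POST https://your-blacklist-root/mo_message \
    -H 'Content-Type: application/xml' \
    -d '<moMessage messageId="123456789" attemptNumber="1"><source address="+15555555555" type="MDN" /><message>STOP</message></moMessage>'
{"id":"5555555555"}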

Let’s see some code

First there are a couple functions shared across all the endpoints (and their backing Lambda functions):

// Validate the notification_type path parameter, short-circuiting with a 400 when unsupported.
function withSupportedType(event, context, lambdaCallback, callback) {
  const supportedTypes = ['sms'];
  if (supportedTypes.indexOf(event.pathParameters.notification_type.toLowerCase()) >= 0) {
    callback(event.pathParameters.notification_type.toLowerCase());
  } else {
    lambdaCallback(null, { statusCode: 400, body: JSON.stringify({ message: 'Notification type [' + event.pathParameters.notification_type + '] not supported.' }) });
  }
}

// Normalize a phone number to ten digits: strip formatting and a leading US country code.
function sanitizeNumber(raw) {
  var numbers = raw.replace(/[^\d]+/g, '');
  if (numbers.match(/^1\d{10}$/)) numbers = numbers.substring(1, 11);
  return numbers;
}

These exist to ensure that each Lambda function a) deals with invalid notification_types and b) cleans up the phone number in the same manner as all the others. Given those common functions, the amount of code for each endpoint is fairly minimal.
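For example, all of these inputs normalize to the same ten-digit key:

sanitizeNumber('555-555-5555');      // => '5555555555'
sanitizeNumber('+1 (555) 555-5555'); // => '5555555555'
sanitizeNumber('15555555555');       // => '5555555555'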

The GET endpoint simply queries DynamoDB for the unique combination of notification_type and blacklist_id:

const AWS = require('aws-sdk'),
      dynamo = new AWS.DynamoDB();

exports.handler = (event, context, callback) => {
  const blacklistId = sanitizeNumber(event.pathParameters.blacklist_id);
  withSupportedType(event, context, callback, function(notificationType) {
    dynamo.getItem({
      TableName: event.stageVariables.TABLE_NAME,
      Key: { Id: { S: blacklistId }, Type: { S: notificationType } }
    }, function(err, data) {
      if (err) return callback(err);
      // Blacklisted if an active (non-soft-deleted) entry exists - or, in staging,
      // if the number isn't on the test whitelist.
      if ((data && data.Item && afterNow(data, "DeletedAt")) || !onWhitelist(blacklistId, event.stageVariables.WHITELIST)) {
        callback(null, { statusCode: 200, body: JSON.stringify({ id: blacklistId }) });
      } else {
        callback(null, { statusCode: 404, body: JSON.stringify({ message: "Entry not blacklisted" }) });
      }
    })
  });
}

// True when the property is missing or set to a future date - i.e. the entry
// has not (yet) been soft-deleted.
function afterNow(data, propertyName) {
  if (data && data.Item && data.Item[propertyName] && data.Item[propertyName].S) {
    return Date.parse(data.Item[propertyName].S) >= new Date();
  } else {
    return true;
  }
}

// Set the whitelist in staging to only allow certain entries.
function onWhitelist(blacklistId, whitelist) {
  if (whitelist && whitelist.trim() != '') {
    const whitelisted = whitelist.split(',');
    return whitelisted.findIndex(function(item) { return blacklistId == item.trim(); }) >= 0;
  } else {
    return true;
  }
}

Disregarding the imports at the top and some minor complexity around a whitelist (which we put in place only for staging/test environments so we don’t accidentally spam people while testing), it’s about a dozen lines of code (depending on spacing) - with minimal boilerplate. This is the realization of one of the promises of the serverless approach: very little friction against getting directly to the meat of what you’re trying to do. There is nothing here about request routing or dependency-injection or model deserialization; the meaningful-code-to-boilerplate ratio is extremely high (though we’ll get to deployment later).

The PUT (add an entry to the blacklist, managing soft-deletes correctly)

exports.handler = (event, context, callback) => {
  const blacklistId = sanitizeNumber(event.pathParameters.blacklist_id);
  withSupportedType(event, context, callback, function(notificationType) {
    dynamo.updateItem({
      TableName: event.stageVariables.TABLE_NAME,
      Key: { Id: { S: blacklistId }, Type: { S: notificationType } },
      ExpressionAttributeNames: { '#l': 'Log' },
      ExpressionAttributeValues: {
        ':d': { S: (new Date()).toISOString() },
        ':m': { SS: [ toMessageString(event) ] }
      },
      // Upsert: stamp UpdatedAt, append the raw request to the Log set, and clear any prior soft-delete
      UpdateExpression: 'SET UpdatedAt=:d ADD #l :m REMOVE DeletedAt'
    }, function(err, data) {
      if (err) return callback(err);
      callback(null, { statusCode: 200, body: JSON.stringify({ id: blacklistId }) });
    })
  });
}

and DELETE (soft-delete entries when present)

exports.handler = (event, context, callback) => {
  const blacklistId = sanitizeNumber(event.pathParameters.blacklist_id);
  withSupportedType(event, context, callback, function(notificationType) {
    dynamo.updateItem({
      TableName: event.stageVariables.TABLE_NAME,
      Key: { Id: { S: blacklistId }, Type: { S: notificationType } },
      ExpressionAttributeNames: { '#l': 'Log' },
      ExpressionAttributeValues: {
        ':d': { S: (new Date()).toISOString() },
        ':m': { SS: [ toMessageString(event) ] }
      },
      // Soft-delete: stamp DeletedAt rather than removing the item, preserving the audit Log
      UpdateExpression: 'SET DeletedAt=:d, UpdatedAt=:d ADD #l :m'
    }, function(err, data) {
      if (err) return callback(err);
      callback(null, { statusCode: 200, body: JSON.stringify({ id: blacklistId }) });
    })
  });
}

functions are similarly succinct. The POST endpoint that receives the moMessage XML is a bit more verbose, but only because of a few additional corner cases (e.g. when the origin phone number or the message isn’t present).

exports.handler = (event, context, callback) => {
  const moMessageXml = event.body;
  const messageMatch = moMessageXml.match(/<message>(.*)<\/message>/);
  if (messageMatch) {
    if (messageMatch[1].toLowerCase().match(process.env.STOP_WORDS)) { // STOP_WORDS should be a regex
      const originNumberMatch = moMessageXml.match(/<\s*source\s+.*?address\s*=\s*["'](.*?)["']/);
      if (originNumberMatch) {
        const originNumber = sanitizeNumber(originNumberMatch[1]);
        dynamo.updateItem({
          TableName: event.stageVariables.TABLE_NAME,
          Key: { Id: { S: originNumber }, Type: { S: 'sms' } },
          ExpressionAttributeNames: { '#l': 'Log' },
          ExpressionAttributeValues: {
            ':d': { S: (new Date()).toISOString() },
            ':m': { SS: [ moMessageXml ] }
          },
          UpdateExpression: 'SET UpdatedAt=:d ADD #l :m REMOVE DeletedAt'
        }, function(err, data) {
          if (err) return callback(err);
          callback(null, { statusCode: 200, body: JSON.stringify({ id: originNumber }) });
        });
      } else {
        callback(null, { statusCode: 400, body: JSON.stringify({ message: 'Missing source address' }) });
      }
    } else {
      // Not a stop word: acknowledge receipt but change nothing
      callback(null, { statusCode: 200, body: JSON.stringify({ id: '' }) });
    }
  } else {
    callback(null, { statusCode: 400, body: JSON.stringify({ message: 'Invalid message xml' }) });
  }
}

A couple things to call out here. First - and I know this looks terrible - this function doesn’t parse the XML - it instead uses regular expressions to pull out the data it needs. This is because Node.js doesn’t natively support XML parsing and importing a library to do it is not possible given my chosen constraints (the entire service defined in a CloudFormation template); I’ll explain further below. Second, there is expected to be a Lambda environment variable named STOP_WORDS that contains a regular expression to match the desired stop words (things like stop, unsubscribe, fuck you, etc).

That’s pretty much the extent of the production code.
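(One helper used by the PUT and DELETE handlers, toMessageString, isn’t shown above; it renders the incoming request as a string for the Log set. For completeness, a hypothetical minimal version - not the actual implementation - might look like:)

// Hypothetical sketch only - the real helper may capture more or less of the request
function toMessageString(event) {
  return JSON.stringify({
    method: event.httpMethod,
    path: event.path,
    receivedAt: (new Date()).toISOString()
  });
}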

Deployment - CloudFormation

Here’s where this project gets a little verbose. Feel free to reference the final CloudFormation template as we go through this. In broad strokes, the template matches the simple architecture described above: API Gateway calls Lambda functions, which each interact with the same DynamoDB table. The bottom of the stack (i.e. the top of the template) is fairly simple: two DynamoDB tables (one for prod, one for stage) and an IAM role that allows the Lambda functions to access them.

On top of that are the four Lambda functions - which contain the Node.js code (this is the “YO DAWG” part, since the JavaScript lives inside the JSON template) - plus individual permissions for API Gateway to call each function. This section (at the bottom of the template) is long, but is mostly code-generated (we’ll get to that later).
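Skeletally, each of those function resources looks something like this (resource and role names here are illustrative, not copied from the real template; the runtime is whatever Lambda currently supports):

"BlacklistGetFunction": {
  "Type": "AWS::Lambda::Function",
  "Properties": {
    "Runtime": "nodejs6.10",
    "Handler": "index.handler",
    "Role": { "Fn::GetAtt": [ "LambdaExecutionRole", "Arn" ] },
    "Code": {
      "ZipFile": { "Fn::Join": [ "\n", [
        "exports.handler = (event, context, callback) => {",
        "  // ...the function body from /app, embedded line by line...",
        "}"
      ] ] }
    }
  }
}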

In the middle of the template lie a bunch of CloudFormation resources that define the API Gateway magic: a top-level Api record; resources that define the path components under that Api; methods that define the endpoints and which Lambda functions they call; separate configurations for stage vs prod. At this point, we’re just going to avert our eyes and reluctantly admit that, okay, fine, serverless still requires some boilerplate (just not inline with the code, damn it!). At some level, every service needs to define its endpoints; this is where our blacklist nanoservice does it.
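To give a flavor of that boilerplate, a single endpoint’s method definition looks roughly like this (again with illustrative names - the actual template differs in the details):

"GetEntryMethod": {
  "Type": "AWS::ApiGateway::Method",
  "Properties": {
    "RestApiId": { "Ref": "BlacklistApi" },
    "ResourceId": { "Ref": "BlacklistIdResource" },
    "HttpMethod": "GET",
    "AuthorizationType": "NONE",
    "Integration": {
      "Type": "AWS_PROXY",
      "IntegrationHttpMethod": "POST",
      "Uri": { "Fn::Join": [ "", [
        "arn:aws:apigateway:", { "Ref": "AWS::Region" },
        ":lambda:path/2015-03-31/functions/",
        { "Fn::GetAtt": [ "BlacklistGetFunction", "Arn" ] },
        "/invocations"
      ] ] }
    }
  }
}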

All-in, the CloudFormation template approaches 1000 lines (fully linted, mind you, so there are a bunch of lines with just tabs and curly brackets). “But wait!” you say, “Doesn’t CloudFormation support YAML now?” Why yes, yes it does. I even started writing the template in YAML until I realized I shouldn’t.

Bringing CloudFormation together with Node.js

To write the Node.js functions directly inside the CloudFormation template by hand would have been terrible. How would you run the code? How would you test it? A cycle of tweak the code => deploy the template to the CloudFormation stack => manually QA would be a painful way of working. It’s unequivocally better to write fully isolated, functioning Node.js code, plus unit tests, in the standard manner. The problem is that the Node.js code then needs to be zipped, uploaded to S3, and referenced by the CloudFormation template - which would create a dependency for the template and defeat the goal of defining the entire service in a single template with no dependencies.

To resolve this, I wrote a small packaging script that reads the app’s files and embeds them in the CloudFormation template. It runs after every code change (which, naturally, has unit tests and a passing CI build behind it), keeping the template in sync with the code. The script is written in Node.js (hey, if you’re running the tests locally, you already have Node.js installed), so a CloudFormation template written in JSON (as opposed to YAML) is essentially native - no parsing necessary. The script loads the template as JSON, injects a CloudFormation resource for each function in the /app directory, copies that function’s code into the resource, and repeats.
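A minimal sketch of that packaging step (the file layout and resource-naming convention here are hypothetical, not the project’s actual script):

// package.js - embed each app/<name>.js into the template as an inline Lambda function
const fs = require('fs');

const template = JSON.parse(fs.readFileSync('template.json', 'utf8'));

fs.readdirSync('app').filter(function(f) { return f.endsWith('.js'); }).forEach(function(file) {
  const code = fs.readFileSync('app/' + file, 'utf8');
  const name = file.replace('.js', '');
  // Overwrite (or create) the Lambda resource whose inline code is this file's contents
  template.Resources[name + 'Function'] = {
    Type: 'AWS::Lambda::Function',
    Properties: {
      Runtime: 'nodejs6.10',
      Handler: 'index.handler',
      Role: { 'Fn::GetAtt': ['LambdaExecutionRole', 'Arn'] },
      Code: { ZipFile: code }
    }
  };
});

fs.writeFileSync('template.json', JSON.stringify(template, null, 2));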

The other thing to note about embedding the Node.js code directly in the CloudFormation template (as opposed to packaging it in a zip file): all code for a function must be fully contained within that function definition (other than the natively supported AWS SDK). This has two implications. First, we can’t include external libraries such as an XML parser or a Promise framework (notice all the code around callbacks, which makes the functions a little more verbose than I’d like). Second, we can’t DRY out the functions by pulling common functions into a shared library; they are instead repeated in the code of each individual function.

Conclusion

So that’s it: we end up with a 1000-line CloudFormation template that entirely defines a blacklist nanoservice that exposes four endpoints and runs entirely serverless. It is fully tested, can run as a true Node.js app (if you want), and will likely consume so few resources that it is essentially free. We don’t need to monitor application servers, we don’t need to administer databases, we don’t need any non-standard deployment tooling. And there are even separate stage and production versions.

You can try it out for yourself by building a CloudFormation stack using the template. Enjoy!


The POps Up Plant Shop

HBC Digital

How do we keep our teams happy and high-performing? That’s the focus for the People Operations (POps) team.

The POps mission is:

To build and maintain the best product development teams in the world through establishing the models around how we staff and organize our teams, how we plan and execute our work, and how we develop our people and our culture.

Our work covers how we staff and organize our teams, how we plan and execute our work, and how we develop our people and our culture.

We like to have some fun, too.

Surprise and Delight

This week we coordinated an intercontinental “POps Up Plant Shop” for our people in NYC and Dublin. Between the two offices, we distributed 350 plants. Crotons, ivies, succulents and more were on offer. Everyone loved the surprise. While POps is focused on working with our tech teams, we noticed a few folks from other departments at HBC taking plants for their desks - a good indicator that what we’re doing is working!

Beyond adding a dash of color to the office, office plants are proven to increase happiness and productivity, which aligns perfectly with the mission of the POps team.


Mobile Design Sprint

HBC Digital

HBC Digital is a large organization. We are hundreds of technologists responsible for the retail experiences for many of North America’s largest retailers including Saks Fifth Avenue, Saks OFF 5TH, Gilt, Lord & Taylor and the Bay. Our breadth allows us to work on complex challenges with huge upsides. The number of opportunities available to us, however, requires commitment from our teams to ensure we are focused on the right problems.

Recently our mobile team took part in a week-long design sprint. The goal of the five-day process was to answer critical business questions through designing, prototyping and testing ideas with customers, who are always at the center of our work. The team wanted to make sure it was solving the right problems for our customers.

The design sprint was inspired by past exercises we’ve conducted with Prolific Interactive; this iteration, however, was facilitated by the Senior Program Manager on our mobile team. The goal was to use the Saks Fifth Avenue app to “reduce shopping friction, unifying the customer experience across physical and digital stores”.

The Process

Each day of the five-day sprint had a particular focus:

  • Day 1 - Goal Setting and Mapping the Challenge
  • Day 2 - Sketching Ideas and Setting a Direction
  • Day 3 - Prototyping
  • Day 4 - Prototyping
  • Day 5 - User Testing

The exercise involved experts from across Hudson’s Bay Company including product, engineering, UX, business partners from Saks Fifth Avenue stores and our customers.

Opportunities

Any team embarking on a design sprint should outline their goal and opportunities at the start of the sprint. These help to keep the team focused throughout the exercise. We identified three specific opportunities for our team:

  • Refine the vision for the Saks app
  • Seek out business opportunities to partner with other divisions of HBC
  • Quickly vet ideas in line with Saks’ business themes

What We Learned

The “expert panel” conducted with our business partners from stores was one of the big wins of the week. The group setting allowed for lots of interaction and Q&A. Everyone on the team had the first-hand experience of hearing about the pain points of our partners in stores which paid huge dividends during our storyboarding and prototyping sessions.

Day 5 was “judgement day”. We created a test environment in our Saks Downtown store to mimic the in-store experience we envisioned during our prototyping session. By demoing in-store with Saks Fifth Avenue shoppers, we were able to get real-time feedback from our customers as they interacted with the prototype. The ability to iterate based on customer feedback before entering production will help to reduce our engineering overhead.

An added bonus of the sprint was how it energized our people. The team decided what to focus on, experimented with new technologies and connected directly with our store operations team and customers. All of these opportunities boosted morale and engagement.

Some of the things we plan to change for next time include:

  • adjust the timing of some activities (diligent time keeping of activities will pay off when mapping out the agenda for our next design sprint)
  • involve more people from our engineering team to improve the fluidity of our prototyping sessions
  • invest more time in preparation ahead of the exercise to improve our efficiency

What’s Next

With the design sprint complete, we are moving on to the feasibility/technical discovery process and defining the MVP. The tech discovery process for the MVP will feature a hackathon next month to test and build on some of the themes and technologies we identified as opportunities in the design sprint. The user testing with customers in-store during the design sprint will also heavily influence our work during the hackathon.

Stay tuned to this blog or head over to the App Store and download the Saks Fifth Avenue app to keep an eye on what we’re building.


Meetups: April Recap and What's Happening In May

John Coghlan

April Meetups: 105 guests, 48 seltzers, 45 All Day IPAs, 19 pizzas & 2 great speakers.

On April 20, we hosted the NYC Scrum User Group for the third time in 2017. Rob Purdie, founder of the group and Agile Coach at IBM, gave an update on IBM’s Agile Transformation. The talk repeatedly returned to the theme of ensuring your team is “doing the right work”, warning the room of agilists that becoming very efficient at doing work that doesn’t matter is the fastest way to get nowhere. It reminded me of a quote written on the wall of our office: “Our greatest fear should not be failure, but of succeeding in life at things that don’t really matter.” While every NYC SUG Meetup has been great, this one stood out for its accessibility and high levels of audience engagement.

NY Scala University Meetup

A few days later we hosted Li Haoyi (pictured above) who gave a great talk on ‘Designing Open Source Libraries’ at our NY Scala University Meetup. He focused on intuitiveness, layering and documentation as the three keys to creating an open-source library that will keep engineers happy and drive engagement. Haoyi, the author of many popular libraries and fresh off a talk at Scala Days, drew the biggest turnout yet to our new Lower Manhattan HQ. We had to order more pizza 10 minutes after we opened the doors! His honest insights and great delivery also set a record for laughs.

Looking Ahead

Here are some of the tech events on our calendar in May. Hope to see you there!

  • May 1 - Dana Pylayeva, HBC Digital’s Agile Coach, is organizing Big Apple Scrum Day, a one-day community conference focused on Scrum/Agile principles and practices. The 2017 theme is Always Keep Growing.
  • May 6-7 - We’re sponsoring !!Con (pronounced “bang bang con”), “two days of ten-minute talks (with lots of breaks, of course!) to celebrate the joyous, exciting, and surprising moments in computing”.
  • May 10 - Evan Maloney, Distinguished Engineer at HBC Digital, will be speaking at the Brooklyn Swift Developers Meetup at Work & Co in DUMBO. His talk will trace through the evolution of our project structure and development workflow to arrive at where we are today: a codebase that’s about halfway through a transition to Swift. Some folks from our mobile team will be visiting from Dublin for this one!
  • May 11 - Petr Zapletal of Cake Solutions will deliver a talk on how to avoid common pitfalls when designing reactive applications at our NY Scala University Meetup.
  • May 24 - Demo Day for our ScriptEd class - a group of high school students who have been learning web development from HBC Digital engineers in our offices every week since September.
  • May 25 - NYC PostgreSQL User Group Meetup - details coming shortly.
  • May 26 - Summer Fridays start!

HBC Digital is Sponsoring !!Con

HBC Digital

On May 6-7 one of the year’s most unique tech events is taking place in NYC. !!Con (pronounced “bang bang con”) is two-days of ten-minute talks featuring a diverse array of speakers and topics. You won’t find a lineup like this at your typical tech conference - punch cards, cyborgs, glowing mushrooms, queer feminist cyberpunk manifestos and airplane noise are just a few of the topics on the agenda.

Given the excitement around this conference, tickets went fast - sold-out-in-minutes-fast - but there will be videos and a live stream so the 200+ person waiting list and those unable to be in NYC next weekend will still be able to enjoy the talks. Stay tuned to @bangbangcon on Twitter for more info.

We’re thrilled to be supporting this year’s !!Con as an AWESOME! Sponsor. Be sure to say hi to one of our friendly engineers and snag some HBC Digital swag if you’re there!


Pau Carré Cardona To Speak at O'Reilly AI Conference

HBC Digital

The O’Reilly Artificial Intelligence Conference is coming to New York City in June. From June 27-29, the top minds in AI will be meeting for “a deep dive into emerging AI techniques and technologies with a focus on how to use it in real-world implementations.”

We are excited that one of our software engineers, Pau Carré Cardona, will be leading a session as part of the “Implementing AI” track on June 29. Pau’s talk will expand upon his widely read blog post on how we have applied deep learning at Gilt to complete tasks that require human-level cognitive skills. He’ll touch on how we have leveraged Facebook’s open source Torch implementation of Microsoft’s ResNet for image classification, and on his open-source project TiefVision, which is used to detect image similarity.

You can find more details on Pau’s session here: Deep Learning in the Fashion Industry.


Where to find our team in March

HBC Digital

We have a busy month lined up:

  • March 6 – We’re hosting the NY Scala Meetup featuring Gary Coady, Sr Software Engineer at HBC Digital, leading a talk on “Cleaning Up with Free Monads.” - RSVP
  • March 8 – In honor of International Women’s Day, we’re hosting the Techfest Club Meetup. The Meetup will feature a talk on “The New Face of Personalization” from Cassie Lancellotti-Young, EVP Customer Success at Sailthru. - RSVP
  • March 9 – Heather Fleming, VP, People Operations and Product Delivery, is delivering the keynote address on “The New Work Order” at the Future of Work Summit in Austin, Texas. - MORE INFO
  • March 9 – Ryan Martin, Sr Director, Engineering, is sitting for a fireside chat about Lambda during AWS Loft Architecture Week in NYC. - MORE INFO
  • March 16 – Our Agile Coach, Dana Pylayeva, is leading a workshop on “Growing By Sharing: Transitioning a Group to a Self-Directed Model” with Mary Pratt when we host the NYC Scrum User Group Meetup. - RSVP
  • March 22 – We’re hosting the Elasticsearch User Group Meetup in NYC. HBC Digital Engineers Neil Girardi, Jose Martinez and Ed Perry will highlight some of the innovative ways we have leveraged the Elastic Stack. - RSVP
  • March 25 – We’re hosting the Dublin Microservices Meetup in Dublin. The Meetup will feature a talk on “Solving Service Discovery: How Node.js microservices can find each other without a registry” from Richard Roger, CEO at nearForm. - RSVP

Don’t just read about DevOps culture, play-test it!

Dana Pylayeva

A lot of people talk about DevOps Culture. Yes, you can learn about a culture by reading a book or a blog post. A much more effective and fun way to learn about a culture is by experiencing it. This blog post is your invitation to experience DevOps culture through a simulation game!

My interest in DevOps originated from a very unlikely turn my career took 7 years ago. An opportunity came up to push myself completely out of my comfort zone in a developer’s world: I took on the job of DBA Manager and found myself in a harsh, alerts-driven world of pagers, disaster recoveries and escalation procedures. The sense of urgency and pressure was incredible, and it made me wonder why I had never known about it as a developer.

Fast-forward a few years to my next role as an Agile Coach. I came across “The Phoenix Project”, read it cover to cover, re-living painful moments of the past years, yet growing fond of this new “DevOps” approach. How could I share what I’d learned and make it resonate as strongly with others? Why not turn it into a simulation game? Equipped with a gamification course and “The Art of Game Design”, I put together the first version of the “Chocolate, LEGO and Scrum Game”.

Just like in DevOps, amplifying the feedback loop is extremely important in game development! Over the next two years, I took every opportunity to play the game with different groups, collecting feedback, modifying the game and taking it back into “production” for new rounds of play-testing and learning. What makes this game unique is its focus on DevOps culture and its close-to-real-life quality of simulation.

The game starts with a showcase of a large organization with departmental silos. Development teams use Scrum to manage their work; Operations have their own process. As in a typical bureaucratic culture, the flow of information is broken: information is shared on a “need to know” basis. Each team has its own goals, and the mission of the organization is unclear. During the game, this fictitious organization transitions from silos, to locally optimized silos, to an organization optimized for a continuous flow of value.

[Photos: a DevOps culture simulation with the Chocolate, LEGO and Scrum Game at Scrum Gathering Rio, Brazil, and a “Build your T-shaped skills!” DevOps workshop at Agile Practitioners 2017, Israel]

Every player in the game gets a special role to play individually as well as a part of his/her team. Together players build products with LEGO and learn to respond to ever-changing market demand. They wait for environments to be built by Operations, get interrupted by security upgrades and even get attacked by a hacker! The game engages everyone to the extent that they forget about time. They experience a range of emotions as they go through their DevOps journey and transition toward a generative culture of collaboration and shared goals.

While this DevOps transformation is a gamified simulation, the lessons people learn are very real and can be applied to their actual DevOps transformations! Here are just a few examples of the “A-ha!” moments highlighted by the participants at Scrum Gathering Porto and at Lean-Agile practitioners of NJ meetup:

“Even after DevOps transformation some Ops people want to keep being gate keepers. Hard to give up traditional roles!”

“Potentially shippable” does not equal “in production.”

“Cross-training Dev and Ops streamlined the process of getting products to production.”

“Share skills! Bottleneck is formed when only one person knows it”

Curious about playing this game in your organization?

In the spirit of sharing skills and not being a bottleneck, I have documented detailed facilitation instructions, floor plans, facilitator scripts and the game cards in my new book, “Introduction to DevOps with Chocolate, LEGO and Scrum Game,” recently published by Apress. Go ahead - develop your DevOps transformation muscle memory and experience teams’ behavioral patterns. Feel the difference DevOps culture makes in establishing trust and psychological safety in your organization. Have fun facilitating the game with your teams, and please share your learnings.


Sundial PagerDuty Integration

Giovanni Gargiulo

Sundial

A few months ago, Gilt Tech announced Sundial, an open-source batch job scheduler for Amazon ECS. Over the course of the last few months, Sundial has seen significant adoption both inside and outside of Gilt.

Until Sundial v0.0.10, email was the only means of notification for job failures.

In the beginning, when the number of jobs running on Sundial was small (and so was the number of failures!), it was fairly easy to spot the emails for failed jobs and act accordingly.

Lately though, in the Personalization Team, Sundial schedules about a thousand job executions per day and it’s easy to imagine the amount of noise in our inbox generated by job notifications.

Besides the noise, more than once the failure of a critical job went unnoticed. This was, of course, unacceptable.

Since PagerDuty is the de facto standard at Gilt when it comes to on-call procedures, and since PagerDuty offers a nice, reliable events API, we redesigned the notification mechanism and integrated PagerDuty with Sundial.

Configuring PagerDuty on Sundial

Configuring your job to support both email and PagerDuty notifications is very straightforward and can be done by adding the following JSON snippet to your job definition:

{
  "notifications": [
    {
      "email": {
        "name": "name",
        "email": "email",
        "notify_when": "on_state_change_and_failures"
      }
    },
    {
      "pagerduty": {
        "service_key": "my_pd_service_key",
        "num_consecutive_failures": 3,
        "api_url": "https://events.pagerduty.com"
      }
    }
  ]
}

Where

  • notify_when defines when email notifications will be sent. Possible values are:
    • always, Always notify when a process completes
    • on_failure, Notify when a process fails
    • on_state_change, Notify when a process goes from succeeding to failing and vice versa
    • on_state_change_and_failures, Notify when going from failing to succeeded and on each failure
    • never
  • my_pd_service_key is the key obtained in the Service Page in PagerDuty
  • num_consecutive_failures is the number of consecutive failures after which Sundial will trigger an alert in PagerDuty

Please note that the subscriptions object in the Process Definition JSON has been deprecated, so if you’ve already adopted Sundial and want to start using the new notifications, you will have to update your JSON accordingly.

More details can be found on the Sundial v0.0.10 release page.


Voluntary Adoption in Action: HBC Digital Adopts Slack

Adrian Trenaman

Musings on Decentralised Control and Voluntary Adoption in Large Organisations.

When I think of Slack, I think first of the great book by Tom DeMarco on the organisational “slack” we need to play, innovate, and get big things done. It’s an amazing read, and I recommend it without reservation. More recently, when I think of Slack, I think of the massive grassroots movement at HBC Digital that switched us from HipChat to Slack in just a few short weeks, without any top-down edict or stop-the-world migration. We achieved this by leveraging the simple idea of “voluntary adoption”: if a technology, framework, tool or service is really good, then your teams will adopt it naturally, without coercion. The corollary of voluntary adoption is that if you find you’re having to push a solution on a group of people and they’re resisting, pushing back, or not getting it, then it’s a good sign the solution might not be as good as you previously thought.

Through merger and acquisition, we found ourselves with multiple tech teams using different chat solutions, creating artificial divisions and cross-team awkwardness. We could have mandated a move to one of the incumbent chat solutions at HBC and dragged everyone across the divide: a solution that would have been a long, hard march. Instead, we looked about at the current most-loved tool, Slack, kicked off a couple of channels, invited some of our teams in, and said, “hey, try it out.” Within days we saw some interesting effects: first, people loved it; second, they wanted clarity on whether everyone could just move there together. Without having to force or coerce anyone, we’re now all together on one system: Slack.

So what do we learn from this application of voluntary adoption? First, we got the outcome we wanted, fast, and it stuck. Second, and perhaps more interestingly, we traded one kind of organisational stress for another. Top-down, authoritative control offers clarity and a sense of control, at the expense of individual choice. “Everyone, we’re using Tool X” has a clarity to it, but smart folk quickly reject being told to use a tool they don’t like, and that leads to stress and angst. “Everyone, we don’t have an agreed standard yet, so why not try this as well as the current solutions?” feels rudderless and perhaps somewhat chaotic for those in the midst of it: adoptees are confused and wonder which one to choose. However, this approach unleashes a Darwinian process of natural selection: a decentralised, collaborative optimisation process that will either squash a bad idea up front or elevate a good idea into something grand.

We apply voluntary adoption at multiple levels - in our open-source offerings, internal libraries, tools, and how we work together as teams - and the ramifications for voluntary adoption for us as engineers and product innovators are profound. If you’re going to invest time into building something, spend time on making it a dream for the newcomer: easy to use, surprising, delighting. Think: you are not in a position to force your solution on someone; however, you can make your solution something that’s a dream to adopt. Voluntarily.
