The Gilt technology organization. We make gilt.com work.

Gilt Tech

Where to find our team in March

HBC Digital meetups

We have a busy month lined up:

  • March 6 – We’re hosting the NY Scala Meetup featuring Gary Coady, Sr Software Engineer at HBC Digital, leading a talk on “Cleaning Up with Free Monads.” - RSVP
  • March 8 – In honor of International Women’s Day, we’re hosting the Techfest Club Meetup. The Meetup will feature a talk on “The New Face of Personalization” from Cassie Lancellotti-Young, EVP Customer Success at Sailthru. - RSVP
  • March 9 – Heather Fleming, VP, People Operations and Product Delivery, is delivering the keynote address on “The New Work Order” at the Future of Work Summit in Austin, Texas. - MORE INFO
  • March 9 – Ryan Martin, Sr Director, Engineering, is sitting for a fireside chat about Lambda during AWS Loft Architecture Week in NYC. - MORE INFO
  • March 16 – Our Agile Coach, Dana Pylayeva, is leading a workshop on “Growing By Sharing: Transitioning a Group to a Self-Directed Model” with Mary Pratt when we host the NYC Scrum User Group Meetup. - RSVP
  • March 22 – We’re hosting the Elasticsearch User Group Meetup in NYC. HBC Digital Engineers Neil Girardi, Jose Martinez and Ed Perry will highlight some of the innovative ways we have leveraged the Elastic Stack. - RSVP
  • March 25 – We’re hosting the Dublin Microservices Meetup in Dublin. The Meetup will feature a talk on “Solving Service Discovery: How Node.js microservices can find each other without a registry” from Richard Rodger, CEO at nearForm. - RSVP

Don’t just read about DevOps culture, play-test it!

Dana Pylayeva DevOps

A lot of people talk about DevOps Culture. Yes, you can learn about a culture by reading a book or a blog post. A much more effective and fun way to learn about a culture is by experiencing it. This blog post is your invitation to experience DevOps culture through a simulation game!

My interest in DevOps originated from a very unlikely turn that my career took 7 years ago. An opportunity came up to push myself completely out of my comfort zone in a developer’s world. I’d taken on a job of DBA Manager and found myself in a harsh, alerts-driven world of pagers, disaster recoveries and escalation procedures. The sense of urgency and pressure was incredible and made me wonder why I never knew about it as a developer.

Fast-forward a few years to my next role as an Agile Coach. I came across “The Phoenix Project”. I read the book from cover to cover, re-living painful moments of the past years, yet growing fond of this new “DevOps” approach. How could I share this new learning and make it resonate as strongly with others? Why not turn it into a simulation game? Equipped with a gamification course and “The Art of Game Design”, I put together the first version of the “Chocolate, Lego and Scrum Game”.

Just like in DevOps, amplifying the feedback loop is extremely important in game development! Over the next two years, I took every opportunity to play the game with different groups, collecting feedback, modifying the game and taking it back into “production” for new rounds of play-testing and learning. What made this game unique was its focus on DevOps culture and its “close to real life” quality of simulation.

The game starts with a showcase of a large organization with departmental silos. Development teams use Scrum to manage their work; Operations has its own process. As in a typical bureaucratic culture, the flow of information is broken. Information is shared on a “need to know” basis. Each team has its own goals, and the mission of the organization is unclear. During the game this fictitious organization transitions from silos, to locally optimized silos, to an organization optimized for a continuous flow of value.

Scrum Gathering Rio, Brazil - DevOps culture simulation with the Chocolate, Lego and Scrum Game
Build your T-shaped skills! Agile Practitioners 2017, Israel - DevOps workshop with Chocolate, LEGO and Scrum game

Every player in the game gets a special role to play individually as well as a part of his/her team. Together players build products with LEGO and learn to respond to ever-changing market demand. They wait for environments to be built by Operations, get interrupted by security upgrades and even get attacked by a hacker! The game engages everyone to the extent that they forget about time. They experience a range of emotions as they go through their DevOps journey and transition toward a generative culture of collaboration and shared goals.

While this DevOps transformation is a gamified simulation, the lessons people learn are very real and can be applied to their actual DevOps transformations! Here are just a few examples of the “A-ha!” moments highlighted by the participants at Scrum Gathering Porto and at Lean-Agile practitioners of NJ meetup:

“Even after DevOps transformation some Ops people want to keep being gate keepers. Hard to give up traditional roles!”

“Potentially shippable” does not equal “in production.”

“Cross-training Dev and Ops streamlined the process of getting products to production.”

“Share skills! Bottleneck is formed when only one person knows it”

Curious about playing this game in your organization?

In the spirit of sharing skills and not being a bottleneck, I have documented detailed facilitation instructions, floor plans, facilitator scripts and the game cards in my new book, “Introduction to DevOps with Chocolate, LEGO and Scrum Game”, recently published by Apress. Go ahead - develop your DevOps transformation muscle memory and experience teams’ behavioral patterns. Feel the difference DevOps culture makes in establishing trust and psychological safety in your organization. Have fun facilitating the game with your teams and please share your learnings.


Sundial PagerDuty Integration

Giovanni Gargiulo aws

Sundial

A few months ago, Gilt Tech announced Sundial. Sundial is an open source batch job scheduler for Amazon ECS. Over the course of the last few months, Sundial has seen significant adoption both inside and outside of Gilt.

Until Sundial v0.0.10, email was the only way to be notified of job failures.

In the beginning, when the number of jobs running on Sundial was small (and so was the number of failures!), it was fairly easy to spot emails about failed jobs and act accordingly.

Lately though, in the Personalization Team, Sundial schedules about a thousand job executions per day and it’s easy to imagine the amount of noise in our inbox generated by job notifications.

Besides the noise, failures of critical jobs have gone unnoticed more than once. This was of course unacceptable.

Since PagerDuty is the de facto standard at Gilt when it comes to on-call procedures, and since PagerDuty offers a nice and reliable events API, we’ve redesigned the notification mechanism and integrated PagerDuty with Sundial.

Configuring PagerDuty on Sundial

Configuring your job to support both email and PagerDuty notifications is very straightforward and can be done by adding the following JSON snippet to your job definition:

{
  "notifications": [
    {
      "email": {
        "name": "name",
        "email": "email",
        "notify_when": "on_state_change_and_failures"
      }
    },
    {
      "pagerduty": {
        "service_key": "my_pd_service_key",
        "num_consecutive_failures": 3,
        "api_url": "https://events.pagerduty.com"
      }
    }
  ]
}

Where

  • notify_when defines when email notifications will be sent. Possible values are:
    • always, Always notify when a process completes
    • on_failure, Notify when a process fails
    • on_state_change, Notify when a process goes from succeeding to failing and vice versa
    • on_state_change_and_failures, Notify when going from failing to succeeded and on each failure
    • never
  • my_pd_service_key is the key obtained in the Service Page in PagerDuty
  • num_consecutive_failures is the number of consecutive failures after which Sundial will trigger an alert in PagerDuty

Please note that the subscriptions object in the Process Definition JSON has been deprecated, so if you’ve already adopted Sundial and want to start using the new notifications, you will have to update your JSON accordingly.

More details can be found on the Sundial v0.0.10 release page.
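
For illustration only, here is roughly what a “trigger” call against the (v1) PagerDuty Events API looks like, using the service_key from the configuration above. This is a sketch of the API shape - the helper name and job details are made up - not Sundial’s actual code:

import requests

# Hypothetical helper: post a "trigger" event to the PagerDuty Events API (v1)
# once a job has failed num_consecutive_failures times in a row.
def trigger_pagerduty(service_key, job_name, failures):
    payload = {
        "service_key": service_key,          # from the job definition above
        "event_type": "trigger",
        "incident_key": job_name,            # de-duplicates alerts for the same job
        "description": "Sundial job {} failed {} consecutive times".format(job_name, failures),
    }
    response = requests.post(
        "https://events.pagerduty.com/generic/2010-04-15/create_event.json",
        json=payload,
    )
    response.raise_for_status()

# e.g. trigger_pagerduty("my_pd_service_key", "my-batch-job", 3)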


Voluntary Adoption in Action: HBC Digital Adopts Slack

Adrian Trenaman leadership

Musings on Decentralised Control and Voluntary Adoption in Large Organisations.

When I think of Slack, I think first of the great book by Tom DeMarco on the organisational “slack” we need to play, innovate, and get big things done. It’s an amazing read, and I recommend it without reservation. More recently, when I think of Slack, I think of the massive grassroots movement at HBC Digital that switched us from HipChat to Slack in just a few short weeks, without any top-down edict or stop-the-world migration. We achieved this by leveraging the simple idea of ‘voluntary adoption’: if a technology, framework, tool or service is really good, then your teams will adopt it naturally, without coercion. The corollary of voluntary adoption is that if you find that you’re having to push a solution on a group of people and they’re resisting, pushing back, or not getting it, then it’s a good sign that the solution might not be as good as you previously thought.

Through merger and acquisition, we found ourselves in a position with multiple tech teams using different chat solutions, creating artificial divisions and cross-team awkwardness. We could have mandated a move to one of the incumbent chat solutions at HBC and dragged everyone across the divide: a solution that would have been a long hard march. Instead, we looked about at the current most-loved tool, Slack, kicked off a couple of channels, invited some of our teams in, and said, “hey, try it out.” Within days we encountered some interesting effects: first, people loved it; and second, they wanted clarity to know if everyone could just move there together. Without having to force or coerce anyone, we’re now all together on one system: Slack.

So what do we learn from this application of voluntary adoption? First, we got the outcome we wanted, fast, and it stuck. Second, but perhaps more interestingly, we traded off one kind of organisational stress for another. Top-down, authoritative control offers clarity and a sense of control, at the expense of individual choice. “Everyone, we’re using Tool X” has a clarity to it, but smart folk quickly reject being told to use a tool they don’t like, and that leads to stress and angst. “Everyone, we don’t have an agreed standard yet, so why not try this as well as the current solutions?” feels rudderless and perhaps somewhat chaotic for those in the midst of it: adoptees are confused and wonder which one to choose. However, this approach unleashes a Darwinian process of natural selection: a decentralised, collaborative optimisation process that will either squash a bad idea up front or elevate a good idea into something grand.

We apply voluntary adoption at multiple levels - in our open-source offerings, internal libraries, tools, and how we work together as teams - and the ramifications for voluntary adoption for us as engineers and product innovators are profound. If you’re going to invest time into building something, spend time on making it a dream for the newcomer: easy to use, surprising, delighting. Think: you are not in a position to force your solution on someone; however, you can make your solution something that’s a dream to adopt. Voluntarily.


Keeping an Extra Eye on Your Stack with CloudWatch Events

Emerson Loureiro aws

Why an Extra Eye?

There’s a lot going on in AWS: EC2 instances coming up, new releases being rolled out, services scaling up and down, new services and the underlying infrastructure being set up. If you own software running in production, you probably know the drill: you need to ensure that the software and its supporting infrastructure are healthy and stable. You will also eventually need to diagnose production issues. In short, you need a way to keep an eye on your software. The scale at which we run things here at Gilt, with over 300 services in production, each with multiple instances, means this is even more important.

In AWS, CloudWatch Events is a powerful tool for monitoring your resources. In very simple terms, it allows you to receive notifications about events in your production infrastructure, and then lets you decide what to do with them. This last part is where Lambdas come in, and I’ll go into the details of how that’s done in a minute. First, let’s look at some of the events you can receive with CloudWatch.

  • EC2: instance state changes (starting, running, stopping, stopped, and terminated);
  • CodeDeploy: deployment state changes (started, succeeded, failed);
  • AutoScaling: instance terminated, instance added to AutoScaling Group;
  • ECS: task state changes

The Basic Framework

The framework for consuming these events consists of three parts: a CloudWatch rule, a permission, and a target. In our case, the target is a Lambda, but it can also be an SQS queue, an SNS topic, or a few other things. The CloudWatch rule determines the actual event you are interested in receiving. The Lambda is what will receive the event and allow you to act on it (e.g., send an email notification). Finally, the permission binds the rule to the Lambda, enabling the Lambda invocation whenever the rule is met.

In a little more detail, the CloudWatch rule consists of a source (the AWS service where the event originates), a detail-type (the specific event you are interested in receiving, e.g., failed deployments), and finally a detail, which is essentially a filter. For CodeDeploy deployments, for example, that would be which deployment groups the events should be fired for. Here’s an example of a rule we actually use in production at Gilt. This rule will fire events whenever instances under the production and canary AutoScaling groups of the service some-service are terminated.

{
  "detail-type": [
    "EC2 Instance Terminate Successful"
  ],
  "source": [
    "aws.autoscaling"
  ],
  "detail": {
    "AutoScalingGroupName": [
      "some-service-production-auto-scaling-group",
      "some-service-canary-auto-scaling-group"
    ]
  }
}

You can create the rule in different ways, for example via the console, and the code snippet above is the representation of the rule from CloudWatch’s point of view. Here at Gilt, we usually use CloudFormation stacks for creating resources like that, and I will illustrate how to do that in a little while.

But first, how exactly have we been using that? There are two use cases, actually. In one case, we want to be notified when an instance is terminated due to a healthcheck failure. This essentially means that the instance was running and healthy, but for whatever reason the healthcheck failed for some amount of time, and the instance was killed by the auto scaling group. This is definitely something we want to be aware of, as it may indicate a pattern in certain services - e.g., the instance type for this service is no longer enough, as it keeps dying with out-of-memory errors. The other use case is for deployments: either to know when the deployment for a new release is finished, or to help piece together a timeline of events when investigating production issues.

The CloudWatch and Lambda Setup

Now let’s get into the details of how we have set that up, starting with the instance termination events. As I said earlier, for creating resources in AWS we typically rely on CloudFormation stacks, so it’s no different with our CloudWatch event + Lambda setup. Here’s the CloudFormation template that creates the CloudWatch rule for instance termination events.

InstanceTerminationEvent:
    Type: AWS::Events::Rule
    Properties:
      Name: my-instance-termination-rule
      State: ENABLED
      Targets:
        - Arn: arn:aws:lambda:us-east-1:123456789:function:my-instance-termination-lambda-function
          Id: my-instance-termination-rule-target
      EventPattern:
        source: ['aws.autoscaling']
        detail-type: ['EC2 Instance Terminate Successful']
        detail:
          AutoScalingGroupName:
            - !Ref YourAutoScalingGroup
            - !Ref AnotherOfYourAutoScalingGroup

When the rule is matched to an event, given the conditions stated on the template, it will invoke the target - in this case a Lambda. The ARN of the Lambda is then provided as the value of the target. This rule is essentially stating that, when an EC2 instance is terminated within the auto scaling groups defined, the Lambda should be triggered. The event itself is then used as the input to the Lambda.
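
For reference, the event handed to the Lambda looks roughly like the following. This is an abridged sample with made-up values (the real payload carries more fields), but the field names match what the rule above filters on and what the Lambda shown later reads:

# Abridged, illustrative example of the event passed to the Lambda when an
# instance in one of the watched auto scaling groups is terminated.
sample_event = {
    "source": "aws.autoscaling",
    "detail-type": "EC2 Instance Terminate Successful",
    "region": "us-east-1",
    "detail": {
        "AutoScalingGroupName": "some-service-production-auto-scaling-group",
        "EC2InstanceId": "i-0123456789abcdef0",
        "EndTime": "2017-02-20T10:15:00Z",
        "Cause": "At 2017-02-20T10:15:00Z an instance was taken out of service in response to an ELB system health check failure."
    }
}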

Remember though that you also need a permission to allow the rule to invoke the target. Here’s how we define it on a CloudFormation template.

InstanceTerminationLambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:*
      FunctionName: arn:aws:lambda:us-east-1:123456789:function:my-instance-termination-lambda-function
      Principal: events.amazonaws.com
      SourceArn: !GetAtt ['InstanceTerminationEvent', 'Arn']

The Lambda itself is an exception; instead of creating it via a CloudFormation template, we simply defined it via the console. That way, it’s simpler to test it out and make code changes. Below is our Lambda - in Python - which takes the instance termination events and sends an email to our team with details about the instance that has been terminated, the time, and also the cause. In this particular case, as I mentioned above, we are only interested in instances that have been terminated due to a health check failure, so the cause on the emails will always be the same. It’s worth pointing out though that the email notification is just one option. You can also, for example, integrate your Lambda with something like PagerDuty if you wish to have more real-time alerts.

import boto3

# A wrapper for the instance termination event we get from AWS
class Ec2TerminationEvent:
    def __init__(self, event):
        detail = event['detail']
        self.instance_id = detail['EC2InstanceId']
        self.cause = detail['Cause']
        self.terminated_at = detail['EndTime']
        self.is_health_check_failure = self.cause.endswith('ELB system health check failure.')
        ec2_client = boto3.client('ec2')
        # Fetching the instance name
        instance = (((ec2_client.describe_instances(InstanceIds=[self.instance_id])['Reservations'])[0])['Instances'])[0]
        instance_tags = instance['Tags']
        self.instance_name = None
        for instance_tag in instance_tags:
            tag_key = instance_tag['Key']
            if tag_key == 'Name':
                self.instance_name = instance_tag['Value']

def lambda_handler(event, context):
    print('Received EC2 termination event: {}'.format(event))
    ec2_termination_event = Ec2TerminationEvent(event)
    if ec2_termination_event.is_health_check_failure:
        print('Event for instance {} is a health check failure'.format(ec2_termination_event.instance_id))
        send_email(ec2_termination_event)
    return 'ok'

def send_email(ec2_termination_event):
    ses_client = boto3.client('ses')
    # Simply change the email address below to the one for your
    # team, for example
    destination = { 'ToAddresses': ['your-email-address@your-domain.com'] }
    email_subject = { 'Data': 'EC2 Instance {} Terminated'.format(ec2_termination_event.instance_id) }
    email_body = { 'Html': { 'Data': create_message_text(ec2_termination_event) } }
    email_message = { 'Subject': email_subject, 'Body': email_body }
    # Also change the email addresses below to the ones applicable
    # to your case
    send_email_response = ses_client.send_email(Source='email-address-to-send-to@your-domain.com', Destination=destination, Message=email_message, ReplyToAddresses=['your-email-address@your-domain.com'])
    message_id = send_email_response['MessageId']
    print('Successfully sent email with message id {}'.format(message_id))

def create_message_text(ec2_termination_event):
    instance_id_line = '<b>Instance</b>: <a href=\"https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Instances:search=' + ec2_termination_event.instance_id + ';sort=tag:Name\">' + ec2_termination_event.instance_id + '</a><br>'
    instance_name_line = ''  # default when the instance has no Name tag
    if ec2_termination_event.instance_name is not None:
        instance_name_line = '<b>Instance name</b>: ' + ec2_termination_event.instance_name + '<br>'
    cause_line = '<b>Cause</b>: ' + ec2_termination_event.cause + '<br>'
    terminated_at_line = '<b>Terminated at</b>: ' + ec2_termination_event.terminated_at + '<br>'
    return  instance_id_line + instance_name_line + cause_line + terminated_at_line

For our deployment notifications, the setup is fairly similar. For this, as I said before, we are only interested in events for deployments that have succeeded (even though the setup here can be easily extended to include failed deployments too!). Here is the CloudFormation template snippet for creating the rule and the permission.

CodeDeploySuccessNotificationEvent:
    Type: AWS::Events::Rule
    Properties:
      Name: my-deployment-successful-rule
      State: ENABLED
      Targets:
        - Arn: arn:aws:lambda:us-east-1:123456789:function:my-deployment-successful-lambda-function
          Id: my-deployment-successful-target
      EventPattern:
        source: ['aws.codedeploy']
        detail-type: ['CodeDeploy Deployment State-change Notification']
        detail:
          state: [SUCCESS]
          application: [!Ref YourCodedeployApplication]
CodeDeploySuccessNotificationEventLambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:*
      FunctionName: arn:aws:lambda:us-east-1:123456789:function:my-deployment-successful-lambda-function
      Principal: events.amazonaws.com
      SourceArn: !GetAtt ['CodeDeploySuccessNotificationEvent', 'Arn']

And below is the Lambda that receives the event and sends out an email notification. The email includes the application name, the deployment group where the deployment happened, as well as the release version. Our actual Lambda in production also fires a deployment notification to New Relic. There you have a history of the releases for a given service, and how metrics have changed since each release. That can come in handy when establishing timelines and finding out exactly which release is broken.

import boto3
import httplib

# A wrapper for the successful-deployment event we receive
# from CodeDeploy. Extracts the application, deployment
# group, release version, etc.
class CodeDeployEvent():
    def __init__(self, event):
        detail = event['detail']
        self.application = detail['application']
        self.deployment_id = detail['deploymentId']
        self.deployment_group = detail['deploymentGroup']
        codedeploy_client = boto3.client('codedeploy')
        deployment = codedeploy_client.get_deployment(deploymentId=self.deployment_id)
        self.version = None
        self.release_file = None
        revision = (deployment['deploymentInfo'])['revision']
        if (revision['revisionType'] == 'S3'):
            s3_location = revision['s3Location']
            self.release_file = s3_location['key']
            last_slash_index = self.release_file.rfind('/')
            last_dot_index = self.release_file.rfind('.')
            self.version = self.release_file[last_slash_index + 1:last_dot_index]

def lambda_handler(event, context):
    code_deploy_event = CodeDeployEvent(event)
    print('Received success deploy {} for application {} and deployment group {}'.format(code_deploy_event.deployment_id, code_deploy_event.application, code_deploy_event.deployment_group))
    send_email(code_deploy_event)
    return 'ok'

def send_email(code_deploy_event):
    ses_client = boto3.client('ses')
    # Simply change the email address below to the one for your
    # team, for example
    destination = { 'ToAddresses': ['your-email-address@your-domain.com'] }
    email_subject = { 'Data': 'Deployment {} Successful'.format(code_deploy_event.deployment_id) }
    email_body = { 'Html': { 'Data': create_message_text(code_deploy_event) } }
    email_message = { 'Subject': email_subject, 'Body': email_body }
    send_email_response = ses_client.send_email(Source='email-address-to-send-to@your-domain.com', Destination=destination, Message=email_message, ReplyToAddresses=['your-email-address@your-domain.com'])
    message_id = send_email_response['MessageId']
    print('Successfully sent email with message id {}'.format(message_id))

def create_message_text(code_deploy_event):
    deployment_id_line = '<b>Deployment id</b>: <a href=\"https://console.aws.amazon.com/codedeploy/home?region=us-east-1#/deployments/' + code_deploy_event.deployment_id + '\">' + code_deploy_event.deployment_id + '</a><br>'
    version_line = '<b>Revision</b>: ' + code_deploy_event.version + '<br>'
    application_line = '<b>Application</b>: ' + code_deploy_event.application + '<br>'
    deployment_group_line = '<b>Deployment group</b>: ' + code_deploy_event.deployment_group
    return deployment_id_line + version_line + application_line + deployment_group_line

Final Thoughts

We have had this setup running in one of our teams here at Gilt for quite a few months now, and the results are satisfying. Instance termination events, given their real-time nature, allow us, for example, to act quickly and prevent potential outages in our services. Also, they have already allowed us to identify services that did not have enough memory allocated, and thus needed code changes or a change of instance type in order to stabilize them. In short, it’s giving us a level of visibility we never really had before and enabling us to be more proactive about keeping our services in good shape.

Finally, deployment notifications add more to the debugging side of things. They let us establish a timeline of events - e.g., when releases have gone out - and with that more quickly identify releases and code changes that have broken a particular service in production. Ultimately, this speeds up the process of bringing a service back to a healthy state. We feel like our current setup is enough for our needs, but certainly we will be looking at expanding the range of events we are watching out for when the need arises. At the end of the day, it’s all about having quality information in order to help keep our services running well.


Perfect Overnight Cold Brew

Evan Maloney coffee

When Gilt’s Mobile team worked at 1 Madison Avenue, my morning coffee ritual involved getting a large black iced coffee from myWayCup as I exited the 6 train at 23rd Street. What they served at myWayCup—a private-label version of Intelligentsia Coffee’s House Blend—was so good that I switched to iced coffee year-round—even through brutal New York winters—a trait that often earned me quizzical looks when ordering my preferred drink during a snowstorm.

About a year later when the Mobile team moved back to the 2 Park Avenue office, I searched the neighborhood for iced coffee I liked as much, but came up empty. The cold brews I tried tended to be syrupy and super-concentrated, while the ones made with a hot brew process had all the subtlety scorched out of the beans, leaving a jagged, edgy texture. And too often, stores didn’t turn over iced coffee frequently enough in the winter, so you’d end up with something that had become stale after days of storage.

Without a local favorite, I started experimenting with making my own iced coffee. At times, there were catastrophic failures. At least two glass carafes gave their lives in pursuit of coffee perfection, and an otherwise white wall at the office somehow acquired a coffee streak arcing towards the floor. I even managed to melt one of my coffee grinders on top of a stove I didn’t realize was still hot.

Despite these embarrassing setbacks, the technique continued to evolve and improve, and I eventually switched from the laborious process of rapid cooling a hot brew to the simpler—but far lengthier—process of an overnight cold brew. True, there’s no instant gratification: my coffee intake now requires preparing a day in advance, but the result is a well-balanced brew. It’s not the thick, need-to-dilute-it-with-water cold brew that used to get delivered to our office in metal kegs. (A co-worker once posted notices warning of the jitters that ensue when forgetting to water it down.) Nor does it have the unrefined taste of beans that have had great violence done to them by exposure to extreme heat followed by cooling.

To me, this technique yields the perfect iced coffee.

What I use to brew:

  • Intelligentsia House Blend beans: Delicious coffee, full-bodied but not over-roasted or overly bitter. Using beans as opposed to pre-ground coffee ensures maximum freshness when you brew. Ground coffee oxidizes quickly and will soon taste stale if not used right away. (I signed up for Intelligentsia’s mail order subscription service so I’m never out of fresh beans.)

  • A Bodum Bistro Electric Burr Grinder: This is a conical burr grinder, which means the grounds come out a consistent granularity. Most home coffee grinders are blade grinders that deliver grounds of varying size. In a French press, the finer grounds will not be caught by the filter; those grounds will end up on your tongue like silt, which will not make for pleasant drinking. If you’re a coffee enthusiast, you should seriously consider a conical burr grinder.

  • A 51oz (1.5 liter) Bodum Chambord French Press. This size yields about 2-3 days of coffee at my rate of consumption, which is ideal; beyond 3 days, the coffee would begin tasting stale anyway.

  • A pitcher of NYC tap water filtered through a Brita.

I start the brewing process in the morning, and I keep the French press on the counter until nighttime so I can stir periodically when I get the chance. Then, when I go to bed, I put the carafe in the fridge so it’s cold and ready to drink when I get up in the morning.

The brewing steps I follow:

  1. Set the grinder’s coarseness one notch to the left of the French Press icon on the dial.

  2. Set the grinder to grind for 15 seconds.

  3. Fill the hopper with beans.

  4. Press the grind button and remain calm.

  5. Take the fresh grounds out of the grinder—be sure to stop and smell the coffee!—and dump them in the glass carafe of the French press. Return any beans remaining in the hopper to an airtight container.

  6. Fill the carafe nearly to the top with cold water. Filtered tap water or bottled water is recommended, depending on the desirability of the water coming out of your faucet. (Remember, good beans + bad water = bad coffee.)

  7. Mix the water and grounds together by stirring for 10 seconds.

  8. Cover the top of the carafe with cellophane wrap or aluminum foil to avoid excess exposure to air.

  9. Leave out at room temperature for 16 hours, stirring periodically for 10 seconds if possible. (I usually end up stirring about 5 times or so for the typical batch.)

  10. Put in fridge for the final 8 hours.

  11. Remove from fridge and stir for 10 seconds.

  12. Press the grounds and serve the coffee.

  13. Store what remains in an airtight glass container, and drink within 3 days.

Overall, it’s a 24-hour process, although there are some tricks you can use to cut down on brew time. More frequent, vigorous and longer stirring will help shorten the process, as will repeated pressing at the end. (Pressing the grounds, removing the press, stirring the grounds, and pressing again.) You can adjust according to your equipment, beans and personal taste, but in my experience you need at least 16 hours of brew time to let the flavors fully form.

Hope you enjoy it as much as I do!


NYC Scrum User Group - January 19th

meetups

We’ll be hosting our first meetup of 2017 in partnership with the NYC Scrum User Group on Thursday, January 19th. This is our first time hosting this group and we’re off to a great start: Ken Rubin will be joining us to lead a talk on Agile.

More on Ken:

Ken is the author of Amazon’s #1 best selling book Essential Scrum: A Practical Guide to the Most Popular Agile Process. As an agile thought leader, he founded Innolution where he helps organizations thrive through the application of agile principles in an effective and economically sensible way. He has coached over 200 companies ranging from startups to Fortune 10, and is an angel investor and mentor to numerous exciting startups. As a Certified Scrum Trainer, Ken has trained over 24,000 people in agile / Scrum as well as object-oriented technology. He was the first managing director of the worldwide Scrum Alliance, a nonprofit organization focused on transforming the world of work using Scrum.

The talk, “Agile Transition Lessons That Address Practical Questions”, will address questions like:

  • Is there a way to quantify the cost of the transition?
  • How many teams or what scope should the initial transition effort cover?
  • Should we use an internal coach or hire an external coach?
  • How does training fit in to the adoption?
  • How do we measure our success?
  • Should we use an existing scaling framework or develop our own?

If you plan to attend, please RSVP on the NYC Scrum User Group Meetup page. As always there will be refreshments, networking opportunities and a chance to chat with the speaker. We hope to see you there!


BackOffice Hike

Ryan Martin culture

Now that winter is here and the cold is keeping us inside, I thought it would be good to dream of warm days and look back at a BackOffice team outing from October.

On what turned out to be an unseasonably warm and beautiful Monday, a bunch of us took the day off and hiked the Breakneck Ridge Trail. It’s a 3.5-mile loop that climbs quickly (read: steep scrambling) to amazing views of the Hudson River Valley. It’s accessible via Metro-North out of NYC, so it’s a great target for a day trip from the City.

Here are some photos of the trip - can’t wait for the next one.



Deep Learning at GILT

Pau Carré Cardona machine learning, deep learning

Cognitive Fashion Industry Challenges

In the fashion industry there are many tasks that require human-level cognitive skills, such as detecting similar products or identifying facets in products (e.g. sleeve length or silhouette types in dresses).

At GILT we are building automated cognitive systems to detect dresses based on their silhouette, neckline, sleeve type and occasion. On top of that, we are also developing systems to detect dress similarity, which can be useful for product recommendations. Furthermore, when integrated with automated tagging, our customers will be able to find similar products with different facets. For instance, a customer might be very interested in a particular dress, but with a different neckline or sleeve length.

For these automated cognitive tasks we are leveraging the power of a technology called Deep Learning that recently managed to achieve groundbreaking results thanks to mathematical and algorithmic advances together with the massive parallel processing power of modern GPUs.

GILT Automated dress faceting

GILT Automated dress similarity

Deep Learning

Deep learning is based on what are called deep neural networks. A neural network is a sequence of numerical parameters that transform an input into an output. The input can be the raw pixels in an image, and the output can be the probability that the image is of a specific type (for example, a dress with a boat neckline).

To achieve these results it’s necessary to set the right numerical parameters in the network so it can make accurate predictions. This process is called neural network training and, most times, involves different forms of a base algorithm called backpropagation. The training is done using a set of inputs (e.g. images of dresses) and known output targets (e.g. the probability of each dress to be of a given silhouette) called the training set. The training set is used by the backpropagation algorithm to update the network parameters: given an input image, backpropagation refines the parameters so that the output gets closer to the target. Iterating many times through backpropagation will lead to a model that is able to produce, for a given input, outputs very close to the target.
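
As a toy illustration of that update loop, here is a deliberately tiny example: a single linear “neuron” with two parameters and a squared-error loss, not one of GILT’s actual models. The idea of nudging parameters against the gradient of the error is the same one backpropagation applies layer by layer in a deep network.

import numpy as np

# A toy "network" with two parameters (w, b) trained by gradient descent on
# squared error. Real deep networks have millions of parameters and many
# layers, but the update rule is the same idea.
x = np.array([0.0, 1.0, 2.0, 3.0])        # inputs
target = np.array([1.0, 3.0, 5.0, 7.0])   # known targets (here: 2x + 1)
w, b, lr = 0.0, 0.0, 0.1                  # parameters and learning rate

for step in range(200):
    prediction = w * x + b                # forward pass
    error = prediction - target
    grad_w = 2 * np.mean(error * x)       # gradient of the loss w.r.t. w
    grad_b = 2 * np.mean(error)           # gradient of the loss w.r.t. b
    w -= lr * grad_w                      # nudge parameters against the gradient
    b -= lr * grad_b

print(round(w, 2), round(b, 2))           # approaches 2.0 and 1.0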

Once the training is done, if accuracy is high and the model is not affected by overfitting, the network should be able to produce accurate predictions whenever it is fed a brand new image.

For example, say that we train a neural network to detect necklines in dresses using a dataset of images of dresses with known necklines. We’d expect that if the network parameters are properly set, when we feed the network with an image of a cowl neckline, the output probability for the cowl neckline should be close to 1 (100% confidence). The accuracy of the model can be computed using a set of inputs and expected targets called test set. The test set is never used during training and thus it provides an objective view of how the network would behave with new data.

Neural networks are structured in layers which are atomic forms of neural networks. Each layer gets as an input the output of the previous layer, computes a new output with its numerical parameters and feeds it forward into the next layer’s input. The first layers usually extract low level features in images such as edges, corners and curves. The deeper the layer is, the more high level features it extracts. Deep neural networks have many layers, usually one stacked on top of the other.

Deep Neural Network Diagram

Dress Faceting

Automatic dress faceting is one of the new initiatives GILT is working on. GILT is currently training deep neural networks to tag occasion, silhouette, neckline and sleeve type in dresses.

Dress Faceting Model

The model used for training is Facebook’s open source Torch implementation of Microsoft’s ResNet. Facebook’s project is an image classifier, with models already trained on ImageNet. We’ve added a few additional features to the original open source project:

  • Selection of dataset for training and testing (silhouette, occasion, neckline…)

  • Weighted loss for imbalanced datasets

  • Inference given a file path of an image

  • Store and load models in/from AWS S3

  • Automatic synchronization of image labels with the imported dataset

  • Tolerance to corrupted or invalid images

  • Custom ordering of labels

  • Test and train F1 Score accuracy computation for each class as well as individual predictions for each image across all tags.

  • Spatial transformer attachment in existing networks

The models are trained on P2 GPU EC2 instances deployed using CloudFormation, with EBS volumes attached. We plan to substitute EBS with EFS (Elastic File System) to be able to share data across many GPU instances.

We are also investing effort in trying to achieve similar results using TensorFlow and GoogleNet v3.

Data and Quality Management

To keep track of the results that our model is generating we’ve built a Play web application to analyze results, keep a persistent dataset, and change the tags of the samples if we detect they are wrong.

Model Accuracy Analysis

The most basic view to analyze machine learning results is the F1 Score, which provides a good metric that takes into account both false positive and false negative errors.
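
For a single class, the F1 Score is the harmonic mean of precision and recall; here is a minimal sketch, with made-up counts:

# Minimal sketch of the F1 Score for a single class, computed from true
# positives, false positives and false negatives (the counts are made up).
def f1_score(tp, fp, fn):
    precision = float(tp) / (tp + fp)  # of the items predicted as the class, how many really are
    recall = float(tp) / (tp + fn)     # of the real items of the class, how many we caught
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=80, fp=10, fn=20))   # ~0.84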

On top of that, we provide a few views to be able to analyze results, specifically intended to make sure samples are properly tagged.

F1 Score View

Image Tagging Refining

The accuracy analysis allows us to detect which images the model is struggling to properly classify. Oftentimes, these images are mistagged; they have to be manually retagged and the model retrained with the new test and training sets. Once the model is retrained, very often its accuracy increases and it’s possible to spot further mistagged images.

It’s important to note here that images in either the test or the training set always remain in test or in train. It’s only the tag that is changed: for example, a long sleeve could be retagged to three-quarters sleeve.

To scale the system we are attempting to automate the retagging using Amazon Mechanical Turk.

False Negatives View

Image Tagging Refining Workflow

Alternatives using SaaS

There are other alternatives to image tagging from SaaS companies. We’ve tried them without success. The problem with most of these platforms is that, at this point in time, they are neither accurate nor detailed enough when it comes to fashion tagging.

Amazon Rekognition short sleeve dress image tagging results

Dress Similarity

Product similarity will allow us to offer our customers recommendations based on similar products. It’ll also allow our customers to find visually similar products with other facets.

Dress Similarity Model

For the machine learning model we are using TiefVision.

TiefVision is based on reusing an existing network already trained to classify the ImageNet dataset, and swapping its last layers with a new network specialized for another purpose. This technique is known as transfer learning.
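
As a rough illustration of the idea (sketched here in PyTorch, not the code TiefVision actually uses):

import torch.nn as nn
import torchvision.models as models

# Illustrative transfer learning sketch: take a network pre-trained on
# ImageNet, freeze its weights, and swap the last fully-connected layer
# for one specialized to the new task.
model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False                # keep the pre-trained weights fixed
model.fc = nn.Linear(model.fc.in_features, 2)  # e.g. two classes: background vs. dress patch
# Only the new final layer is then trained on the (much smaller) fashion dataset.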

The first trained networks are used to locate the dress in the image, following Yann LeCun’s OverFeat paper. This location algorithm trains two networks using transfer learning:

  • Background detection: detects background and foreground (dress) patches.

  • Dress location network: locates a dress in an image given a patch of a dress.

Combination of Dress Location and Background detection to accurately detect the Location of the dress

Once the dress is located, the next step is to detect whether two dresses are similar or not. This can be done using unsupervised learning on the embeddings from the output of one of the last layers. Another approach is to train a network to learn dress similarity (supervised learning).

For the supervised side, we follow Google’s DeepRank paper. The supervised learning network takes three images as input: a reference dress, a dress similar to the reference, and another dissimilar to the reference. Using a siamese network trained with a hinge loss function, the network learns to detect dress similarities.
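
A minimal sketch of that triplet hinge loss over embeddings; the Euclidean distance and margin value here are illustrative assumptions rather than the exact choices from the paper:

import numpy as np

# The loss is zero only when the reference is closer to the similar dress
# than to the dissimilar one by at least the margin.
def triplet_hinge_loss(reference, similar, dissimilar, margin=1.0):
    d_pos = np.linalg.norm(reference - similar)     # distance to the similar dress
    d_neg = np.linalg.norm(reference - dissimilar)  # distance to the dissimilar dress
    return max(0.0, margin + d_pos - d_neg)

# The embeddings would come from one of the last layers of the network;
# random vectors stand in for them here.
reference, similar, dissimilar = (np.random.rand(128) for _ in range(3))
print(triplet_hinge_loss(reference, similar, dissimilar))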

Similarity Network Topology

To compute the similarity between a dress and the other dresses we have in our database, TiefVision follows two steps:

  • The dress is first cropped using the location and background detection networks.

  • Finally the dress similarity network computes the similarity between the cropped dress and the cropped dresses we have in our database. It’s also possible to compute similarity using unsupervised learning.

For more information about TiefVision you can take a look at this presentation.


From Monolithic to Microservices - Gilt's Journey to Microservices on AWS

John Coghlan conferences

Watch Emerson Loureiro’s talk from AWS re:Invent 2016

At AWS re:Invent, Emerson Loureiro, Senior Software Engineer at Gilt, led two well-received sessions on our journey from a single monolithic Rails application to more than 300 Scala/Java microservices deployed in the cloud. We learned many important lessons along the way, so we’re very happy to share this video of Emerson’s talk with you.
