Speed Wins
CIOREVIEW >> Database >>

Matt Spilich, Director - Database and Warehouse Operations, TripAdvisor [NASDAQ: TRIP]

I lead a team that manages a service organization, supporting multiple teams reliant on infrastructure and database services. In my opinion, reducing time to delivery for new features and functionality is the largest business challenge that we face in my industry. Improving operational stability and consistency is another challenge that we face as data warehouse professionals, along with working to reduce the risk of unexpected delays when rolling out infrastructure changes (both hardware and software).

Operations teams have historically tended to be conservative and slow moving. While these traits may help to reduce risk, they typically come at the cost of speed of delivery for new features, hardware, software upgrades, and support for new products. By contrast, at TripAdvisor we have a “speed wins” culture that prioritizes getting quality work done quickly. By investing in database and data-warehouse initialization, configuration, and deployment as first-class “infrastructure as code,” we allow our data service organization to scale at the speed of the business without sacrificing operational rigor or standards.

The availability of cloud providers gives individual teams the opportunity to provision their own infrastructure on demand. While these services offer tremendous benefits in elasticity and reduced spin-up time, there is some risk of creating silos across teams that manage their own infrastructure. Initial time to market is reduced, but not all teams are going to have the same level of operational rigor supporting cloud services. Teams that go around their respective operations organizations will likely need to pay similar costs in terms of automation once they reach a certain scale. A hybrid solution, combining cloud-provided and on-premises infrastructure in partnership with internal operations, is likely to provide the best outcome in terms of speed, security, and service to the larger product organization. In short, cloud can be a part of the solution, but without changes in how we think of our infrastructure internally, it’s not by itself a solution to all of these challenges.

The slowness of traditional database services organizations is often due to reliance on operations personnel to accomplish tasks through manual tweaking and configuration. At TripAdvisor, our investment in automation and ‘infrastructure as code’ allows our service organization to perform software and hardware upgrades more effectively, without incurring delays or impeding the business. We use multiple open source configuration management tools to achieve these goals: Puppet for underlying OS configuration and Ansible for higher-level application service configuration. This investment was born out of the need to manage ever-increasing amounts of infrastructure without a comparable increase in staffing. By focusing initially on some quick wins, we were able to show the value of this approach while iterating towards more comprehensive solutions.
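
To make the idea of ‘infrastructure as code’ concrete, here is a minimal sketch in Python of the desired-state model that configuration management tools are generally built around. It is not Puppet or Ansible code and not our actual tooling. The `Host` class, the `converge` function, and the setting names are all hypothetical, chosen only to show that applying the same declared state twice changes nothing the second time.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical in-memory stand-in for a managed database host."""
    name: str
    settings: dict = field(default_factory=dict)


def converge(host: Host, desired: dict) -> dict:
    """Apply only the settings that differ and return what was changed.

    The result maps each changed key to an (old_value, new_value) pair.
    """
    changes = {}
    for key, value in desired.items():
        current = host.settings.get(key)
        if current != value:
            changes[key] = (current, value)
            host.settings[key] = value
    return changes


# Assumed example values; real settings would live in source control.
desired_state = {"max_connections": 500, "shared_buffers": "8GB"}
db1 = Host("db1", {"max_connections": 200})

first_run = converge(db1, desired_state)   # brings the host in line
second_run = converge(db1, desired_state)  # nothing left to change
print(first_run)
print(second_run)
```

Because the second run reports no changes, the same declaration can be applied repeatedly and safely, which is what removes the need for manual, host-by-host tweaking.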

From an operational rigor perspective, the benefits are clear. With infrastructure state checked into source control and deployed automatically, we eliminate systematic differences between individual components of a system. Large-scale changes in configuration can be rolled out in a managed and controlled fashion to a subset of systems. Performance signatures can be compared between different states. New systems can be brought online in minutes rather than days, which lets us, as a service-oriented organization, exceed our customers’ expectations for time to delivery. The benefits are most immediately apparent during large-scale hardware refresh cycles: we have reduced the time to initialize and bring into service a datacenter refresh from weeks to days. The benefit is also visible on a small scale, with individual new product-facing requests.
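
One way to roll a change out to a controlled subset of systems is to assign every host to a stable bucket and widen the rollout by percentage. The Python sketch below illustrates that general technique under stated assumptions. It does not describe our deployment tooling, and the function names and hostnames are made up for illustration.

```python
import hashlib


def rollout_bucket(hostname: str) -> int:
    """Map a hostname to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(hostname.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def select_hosts(hosts: list[str], percent: int) -> list[str]:
    """Return the hosts whose bucket falls below `percent`."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return [h for h in hosts if rollout_bucket(h) < percent]


# Hypothetical fleet; these hostnames are invented for the example.
fleet = [f"db{n:02d}.example.internal" for n in range(1, 21)]

canary = select_hosts(fleet, 10)  # first wave
wider = select_hosts(fleet, 50)   # second wave, a superset of the first
print(canary)
print(wider)
```

Hashing the hostname keeps the selection deterministic, so the hosts that received the change first remain in every later wave and their performance can be compared against the untouched remainder. With a small fleet the first wave may contain very few hosts, or none, so a real rollout would likely enforce a minimum.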

Testing best practices that software engineers have long taken for granted now apply more easily to our infrastructure and configuration. Infrastructure as code can be validated locally on a developer’s workstation using containers or virtual machines. Automated tests can be written and changes can be validated in continuous integration. Breaking changes are more likely to be found as they are committed rather than during release, which increases stability and reduces outages. Through investment in automation and configuration management, we’ve made progress in all of these areas. We can deliver more stable, more consistent changes to our customers while reducing the risk of making those changes. This in turn allows us to move at the speed of the business customers that we support. Keeping up with the pace of business is an ever-changing challenge for operations. We’re pleased with the progress that our teams have made in this area and look forward to new challenges to be solved in the coming year.
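
As a small illustration of what an automated check on configuration might look like, the Python sketch below validates a configuration dictionary and includes two tests that a continuous integration job could run on every commit. The keys, rules, and function names are assumptions made for this example and do not reflect our actual configuration schema.

```python
# Hypothetical required keys for a database service configuration.
REQUIRED_KEYS = {"max_connections", "data_dir", "replication_enabled"}


def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(config))]

    max_conn = config.get("max_connections")
    if max_conn is not None and (not isinstance(max_conn, int) or max_conn <= 0):
        errors.append("max_connections must be a positive integer")

    data_dir = config.get("data_dir")
    if data_dir is not None and not str(data_dir).startswith("/"):
        errors.append("data_dir must be an absolute path")

    return errors


def test_valid_config_passes():
    config = {
        "max_connections": 500,
        "data_dir": "/var/lib/db",
        "replication_enabled": True,
    }
    assert validate_config(config) == []


def test_relative_data_dir_is_rejected():
    config = {
        "max_connections": 500,
        "data_dir": "var/lib/db",
        "replication_enabled": True,
    }
    assert validate_config(config) == ["data_dir must be an absolute path"]
```

The test functions follow the plain `test_` naming convention, so a runner such as pytest would collect them without extra setup. A breaking change to the configuration then fails at commit time rather than during a release.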
