
How to migrate to Snowflake without getting ‘data drunk’


In case you haven’t heard, the cloud is booming. And when it comes to cloud storage specifically, data warehouse Snowflake is making the most of the snowfall. In its latest financial disclosure, the company reported 4,532 customers and 110% year-over-year revenue growth.

Although migration is only the first step when it comes to embracing the cloud, getting it right is crucial to setting any business up for success. And there’s a lot to consider: governance, customizations, aligning stakeholders, and building out a team to make it happen. On top of that, Snowflake’s unlimited storage and compute make it easy to rack up a big bill.

To get a better idea of how one company prepared for its migration to Snowflake, we chatted with Salim Syed, senior director of data engineering at Capital One. He pulled back the curtain on the company’s migration, which kicked off in 2017. The team has made several updates over the years, he says, and it’s been a successful journey resulting in almost 27% cost savings.

This interview has been edited for brevity and clarity.

VentureBeat: Capital One set out to migrate to Snowflake because you saw some potential benefits, of course. But what challenges did you anticipate? Were there any drawbacks you felt you had to solve for regarding how this would impact Capital One’s data and way of working with that data?

Salim Syed: It’s a good question. And yes. Snowflake’s architecture, which separates storage and compute, was different from any other data warehouse we had worked with. Previously, we didn’t have to manage compute separately; we just gave our users access to the database. But we knew that Snowflake provides unlimited storage and unlimited compute, and if we didn’t manage how to provision that and build proper controls and governance around it, then we would lose track of the cost and of governance. So that’s one thing.

The other thing is we didn’t want the centralized team to be a bottleneck, with 6,000 users requesting access to the data warehouse and compute separately. So we started thinking about how we could make this more self-service and give the ownership of data and infrastructure to the businesses to manage their own environments, but also make sure governance, cost control, and best practices are built in. And so that led to our journey building all these tools that help us manage Snowflake better.

VentureBeat: And what’s this concern about becoming “data drunk”?

Syed: As we move to the cloud, the amount of data we’re seeing now is, I can’t even … maybe 50 times more than what we ever had on-premise. So the amount of data and the variety of data is just constantly increasing, and Snowflake allows you to basically store as much data as you want and run as much analytics as you want. So that’s the term we came up with about how our analysts will basically use whatever resources we give them. When analysts work with data, they’re basically creating subsets of data and storing them in their personal sandboxes. And what happens is when you allow analysts, data scientists, or whoever to just continue to create more and more storage, you lose control of that data. And so we also very specifically wanted to make sure that any data that is created outside of our production systems by our users is well-governed. We know exactly what that data is, who should have access to it, how it’s shared, how long to keep the data, and the metadata. We require that all of this is captured so that we’re still growing really fast but also making sure we’re still well-governed.
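To make the kind of record-keeping Syed describes concrete, here is a minimal sketch of a governance record for a user-created sandbox dataset. It is not Capital One's tooling; the class name `SandboxDataset`, its fields, and the retention rule are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class SandboxDataset:
    """Hypothetical governance record for a dataset created outside production."""
    name: str
    owner: str
    description: str = ""                                    # what the data is
    allowed_roles: list[str] = field(default_factory=list)   # who may access it
    retention_days: int = 0                                  # how long to keep it
    created_on: date = field(default_factory=date.today)

    def missing_metadata(self) -> list[str]:
        """Return the governance fields that have not been filled in yet."""
        missing = []
        if not self.description:
            missing.append("description")
        if not self.allowed_roles:
            missing.append("allowed_roles")
        if self.retention_days <= 0:
            missing.append("retention_days")
        return missing

    def is_expired(self, today: date) -> bool:
        """True once the dataset has outlived its retention period."""
        return today > self.created_on + timedelta(days=self.retention_days)
```

A self-service workflow could refuse to create a sandbox table until `missing_metadata()` returns an empty list, and a scheduled job could flag anything for which `is_expired()` is true.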

VentureBeat: So the concern around getting “data drunk” is more about the control than the amount of data?

Syed: It’s both. The cost is one side because you can end up spending a lot, whereas previously, you didn’t. It wasn’t pay-as-you-go; rather, you bought a license for a year and just used it, and it didn’t matter how much. With Snowflake and AWS cloud, the more you use it, the more you end up paying. So it’s important to make sure you’re using the compute as efficiently as possible. On the other side, governance and control are also important when you have so much data and so many different types of data. In order for us to be well governed, we have to satisfy not only the cyber people but also regulators, the database administration team, and all the other stakeholders.
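The arithmetic behind "the more you use it, the more you pay" can be sketched in a few lines. The snippet below is only an illustration: the credits-per-hour table and the price per credit are assumptions made for the example, and the function name `usage_cost` is hypothetical. Actual rates depend on the contract, edition, and region.

```python
# Assumed credits consumed per hour for each warehouse size (illustrative only).
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8}


def usage_cost(size: str, hours_running: float, price_per_credit: float) -> float:
    """Estimate compute spend: credits per hour x hours running x price per credit."""
    return CREDITS_PER_HOUR[size] * hours_running * price_per_credit


# Assumed price of 3.0 per credit; a MEDIUM warehouse left running for 10 hours.
estimate = usage_cost("MEDIUM", hours_running=10, price_per_credit=3.0)
```

Under these assumptions, doubling either the running time or the warehouse size doubles the bill, which is why idle or oversized compute shows up directly in cost under a usage-based model but not under a flat annual license.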

VentureBeat: Speaking of regulators, does the fact that Capital One sits in a heavily regulated industry have any impact?

Syed: I think Capital One was in a better place because we’re such a heavily regulated company, so we understand risk management better than others. But what really changed as part of our migration was scaling governance because now we’re just dealing with exponentially more data. Historically, governance can become a bottleneck and can stifle your innovation because everyone has to deal with the central team that enforces governance, and everyone has to follow that. So our challenge was how do we federate and simplify governance? And how do we hide all the bureaucracy that goes on and make it transparent so our users can still access the data and innovate while making sure that all the governance activities are taken care of behind the scenes? That’s what we really focused on during our migration. And you asked about other companies. Even if it’s not a regulated company, it’s becoming such an important part of every organization. All that information is going to be super valuable regardless of [whether] it’s regulated.

VentureBeat: So let’s get into your solutions. How did you go about not just setting up controls, but streamlining the process?

Syed: We built the tools because we knew cost would become a big issue if we didn’t. But the idea was that you are federating the ownership and management to the business while enforcing central policies and using centralized tools. So the question was how can you make it still be flexible so that the lines of business can still manage and they don’t just reject it? That’s where it really started.

Then the journey went from infrastructure management in Snowflake to data management. We wanted to make sure that on the producer side, for example, the experience was seamless: that you could ingest data from all the different sources and make sure one single workflow would get your data and register metadata, identify the sensitivity of columns, and classify columns and fields. And then, beyond where the data will be stored, make sure of how it will get updated and what transformations will happen. We just wanted to make that whole experience easy. And then while that was happening, we basically enabled all the data governance things so businesses don’t have to reinvest and can just configure their workflow and use our ingestion process.
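As a rough illustration of a single workflow that registers metadata and classifies columns at ingestion time, consider the sketch below. It is a hypothetical example, not the application Syed describes: the pattern table, the labels, and the function names `classify_column` and `register_dataset` are all assumptions, and a real classifier would also inspect the values rather than rely on column names alone.

```python
import re

# Hypothetical name patterns mapped to sensitivity labels (illustrative only).
SENSITIVE_PATTERNS = {
    "pii": re.compile(r"(ssn|email|phone|birth|address)", re.IGNORECASE),
    "financial": re.compile(r"(account|card|balance|salary)", re.IGNORECASE),
}


def classify_column(column_name: str) -> str:
    """Return the first sensitivity label whose pattern matches the column name."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(column_name):
            return label
    return "general"


def register_dataset(name: str, source: str, columns: list[str]) -> dict:
    """Build one metadata record for a dataset, classifying every column."""
    return {
        "dataset": name,
        "source": source,
        "columns": {col: classify_column(col) for col in columns},
    }


record = register_dataset(
    "customers", "crm", ["customer_email", "card_number", "signup_channel"]
)
```

Because classification happens inside the same call that registers the dataset, a producer only configures the source and columns, and the governance metadata comes out as a side effect of ingestion.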

We really thought about the data discovery part too. We needed to build a system where you can find the data just by seeing what other people in your role have searched for, so we used machine learning to figure that out. And then once you find the data that’s relevant to you, we give you information around whether you can trust the data, how often it has been updated, when the last time was, what the values are, who accesses the data, etc. We wanted to remove all that bureaucracy and make a seamless end-to-end application.

VentureBeat: And what did this all look like in terms of the people involved? Did you have a dedicated team? Which types of experts would you say are a must to have involved in this kind of undertaking?

Syed: It all starts from leadership. You have to have leadership’s buy-in so all the lines of business understand it’s the way you’re going. And yeah, absolutely. You have to build a team of data engineers, application developers, UI architects, and people who understand governance and the pain points. A huge product team. So it was definitely a mix of teams that were brought in, and we also constantly engaged with the lines of business to make sure we were addressing their needs as well.

VentureBeat: Has all this carried you well up to today? Have you had to make any updates or changes?

Syed: We’ve definitely learned a lot along the way and made adjustments. For example, we had initially created some patterns for data producers to, for example, load the data. And we gave the lines of business the rules of the road and said they can do it on their own. But over time, we found it was really hard to enforce this and know who was or wasn’t following the rules. So we made centralized tooling for this, but also addressed the concerns of the lines of business by making sure it would be highly configurable and flexible. But I feel like we’re now in a really good position and seeing the benefits. Almost 50,000 hours of manual work we used to do is now done by this application, and we’ve seen almost 27% cost savings. And we’re seeing usage continue to go up, with 5-6 times more queries being run.

VentureBeat: What takeaways do you have from this experience? Is there anything you wish you had known earlier on in the process?

Syed: For anyone who’s trying to make a migration or data transformation to the cloud, know it’s hard to put the genie back in the bottle. So it’s really important to think ahead on how you’re going to deploy the governance.

