One of the most satisfying parts of running the data team at GitLab was helping other people do their jobs more effectively. GitLab was growing rapidly when I joined. They’d just closed a Series C, I was around the 200th person to join the company, and by the end of my first year we were at ~600 people. This caused a huge increase in the number of Things That Were Being Done™.
With the increase in people, there was an increase in the amount of complexity in the overall system of the company. Multiple SaaS tools were being spun up and used by different teams and departments, while workflows and processes were created, run, and improved. We needed a way to validate and audit these critical processes to ensure accuracy in reported metrics and compliance with rules and regulations.
Bring It Together
Centralized data stores are uniquely positioned to audit and improve systems because they sit outside the systems they observe. When we pull data out of existing systems and free it from the constraints of their designers, we’re able to ask questions about the system as it’s represented in the data. We’re also able to bridge the gaps between systems, which lets us verify the integrity of the connections between them.
To ground this in reality, I’ll draw on a simplified version of my GitLab experience. We had 2 systems that shared data and represented reality in their own way: Zuora and Salesforce. Different teams (Finance and Sales, respectively) used these 2 systems for different reasons. One of the goals for both of these teams was to understand the customer reality of a subscription lifecycle: when it started, re-upped, changed plans, ended, and whether the customer restarted it.
Neither of these tools was, by default, able to accurately represent what was actually possible with subscriptions. Take a customer who signed up for a year, cancelled for 4 months, then started again. How do we talk about that? Is it the same subscription? Are those 2 subscriptions? The answer affects monthly recurring revenue, retention, and churn numbers. And, importantly, does it look the same in both tools?
Each team had done their best to represent the subscriptions within the systems but there was always a gap in metrics due to how the data was stored. Enter the data team and our centralized warehouse.
I’ll spare you the gory details, but there was an iterative process over several weeks to do the following:
- recreate the representation of subscriptions within both systems
- document and agree on a single definition of what a subscription actually is (both the numbers and the actual definition)
- get the source of truth 100% reflected in Zuora as the primary, and Salesforce as the secondary
- lastly, set up the warehouse to act as the audit mechanism
The audit step was, in my view, the most powerful and satisfying. Once we put the constraints on the system in the form of definitions, we were then able to test it to ensure it conformed to those constraints. When a test failed and something was out of compliance, it would then trigger a work process for the person responsible to fix the data in the system. Once done, the tests would pass and everyone could feel confident that the data was right and the numbers we were looking at were accurate.
The confidence came from the tests. We observed the system, defined the expected states, and then tested those states. Passing tests meant we didn’t need to worry about the system. Which meant we could worry about other things.
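To make the audit idea concrete, here is a minimal sketch in Python. It is not the code we ran at GitLab. The `Subscription` shape, its field names, and the `audit_subscriptions` function are all hypothetical, and real Zuora and Salesforce records look different. The point is only the pattern: define the expected state, compare the secondary system against the primary, and treat every mismatch as a failed test that someone needs to fix.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Subscription:
    # Hypothetical, simplified shape; real source records differ.
    subscription_id: str
    start_date: date
    end_date: Optional[date]  # None means the subscription is still active
    mrr_cents: int


def audit_subscriptions(primary, secondary):
    """Compare the secondary system's records against the primary's.

    Returns a sorted list of (subscription_id, problem) tuples.
    An empty list means the audit passes.
    """
    failures = []
    primary_by_id = {s.subscription_id: s for s in primary}
    secondary_by_id = {s.subscription_id: s for s in secondary}

    for sub_id, p in primary_by_id.items():
        s = secondary_by_id.get(sub_id)
        if s is None:
            failures.append((sub_id, "missing in secondary"))
            continue
        # Every agreed-upon field must match exactly.
        for field in ("start_date", "end_date", "mrr_cents"):
            if getattr(p, field) != getattr(s, field):
                failures.append((sub_id, f"{field} mismatch"))

    # Records that exist only in the secondary system are also failures.
    for sub_id in secondary_by_id.keys() - primary_by_id.keys():
        failures.append((sub_id, "missing in primary"))

    return sorted(failures)


# Made-up example data: the primary stands in for Zuora, the secondary for Salesforce.
zuora = [
    Subscription("S-1", date(2019, 1, 1), date(2020, 1, 1), 10000),
    Subscription("S-2", date(2020, 5, 1), None, 10000),
]
salesforce = [
    Subscription("S-1", date(2019, 1, 1), date(2020, 1, 1), 10000),
    Subscription("S-2", date(2020, 5, 1), None, 12000),
    Subscription("S-3", date(2020, 6, 1), None, 5000),
]

failures = audit_subscriptions(zuora, salesforce)
for sub_id, problem in failures:
    print(f"{sub_id}: {problem}")
```

In a real warehouse these checks would more likely live as scheduled SQL or data tests, with each failure routed to the person responsible for the source record.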
Sitting Outside
The most important element of this whole process was that it sat outside the system it was observing. This would not have worked within the system itself because it was too constrained by the software of the system. Only by moving the critical elements out of the system (i.e. the data) were we able to effectively constrain it.
Moving data outside the system is now fully in the realm of data engineering, a field and discipline unto itself. But it is a critical part of enabling this audit process because it’s predicated on the fact that the data outside the system is the same as the data inside the system.
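One way to check that assumption is to reconcile an extract from the source against what landed in the warehouse. The sketch below is illustrative only: the `reconcile` and `row_fingerprint` helpers, the `id` key, and the sample rows are all made up for this example, and rows are assumed to be plain dictionaries.

```python
import hashlib


def row_fingerprint(row):
    """Hash a row's fields in a stable order so two copies can be compared."""
    canonical = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source_rows, warehouse_rows, key="id"):
    """Report rows that are missing, unexpected, or different in the warehouse."""
    src = {row[key]: row_fingerprint(row) for row in source_rows}
    wh = {row[key]: row_fingerprint(row) for row in warehouse_rows}
    return {
        "missing_in_warehouse": sorted(src.keys() - wh.keys()),
        "unexpected_in_warehouse": sorted(wh.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & wh.keys() if src[k] != wh[k]),
    }


# Made-up rows standing in for a source extract and its warehouse copy.
source = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "cancelled"},
    {"id": 3, "status": "active"},
]
warehouse = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "active"},
]

report = reconcile(source, warehouse)
print(report)
```

If every list in the report is empty, the copy matches the source for that extract, and the audits built on top of it are testing the real thing.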
Closing Thoughts
An outside observer for any process can dramatically help the overall efficiency of a system. This is one of the bull cases for Private Equity. PE is able to come into a company and use its experience and outsider status to observe and then improve the system. That improvement comes in several forms, whether it’s strategy changes or talent training. Many of those improvements can be tied back to data in some way, and once that data is accurate and centralized, it can then be audited. If this auditing power is interesting to you, we’d love to talk.