Data and Analytics

Why the Modern Data Stack sucks for data consultancies looking to productize

Douwe Maan
January 10, 2024

If you’re a data consultancy primarily serving Small to Medium-sized Business (SMB) clients, chances are that you’ve seen a lot of overlap in your clients’ needs and are thinking about productizing your services.

Data Platform as a Service, not a deliverable

Instead of building each client a bespoke-but-largely-the-same data platform for an hourly rate, you can reach many more clients with a fully-managed two-thirds-standard/one-third-custom “data platform as a service” (DPaaS) with analytics support attached, aimed at a specific common use case or industry and priced based on the value it provides them on an ongoing basis.

After all, what clients are buying from you is the ability to improve their business through the power of data, not the implementation detail of a specific data stack.

Compared to the traditional bill-by-the-hour business model, a productized service has many advantages for you and your clients:

  • Value-based pricing: Selling largely the same solution to many clients means lower cost and higher margins.
  • Long-term engagements: Selling a monthly subscription means reliable, recurring revenue and opportunities for future consulting work, and clients get lower upfront cost and ongoing analytics support.
  • Scale to more clients: When the first two-thirds of any client’s needs are already taken care of, your team can handle many more of them in parallel (and do less boring repetitive work!), and more clients get a chance to become data-driven.
  • Shorter sales cycles: Having a standard solution ready to go means clients can see value immediately and get to yes more quickly — especially as the setup fee to customize the standard solution to their needs will be lower than a bespoke project would’ve been.

What it takes to productize

Selling a product is different from selling consulting services in terms of marketing, project management, contracts, invoices, and support, and you shouldn’t underestimate the shift in mindset and processes it requires. But all of that can be figured out with the help of tools like Common Paper and Stripe. The biggest obstacle to making this business model work economically is the technology.

As you’ll be offering a service for a fixed subscription fee, you need the marginal cost of taking on a new client to be as low and predictable as possible, so that you can scale to as many clients as want your product while keeping margins high. Your per-client costs are in three places:

  1. The amount of time it takes your team to onboard a client and customize the platform to their needs (which you could cover with a setup fee, at the risk of losing clients, or expect to make back through future recurring charges).
  2. The amount of time it takes your team on an ongoing basis to monitor and maintain their data platform and provide ad-hoc (analytics) support.
  3. The cost of ingesting, transforming, storing, and showing reports of their data.
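
To make these three cost buckets concrete, here is a minimal sketch of the per-client unit economics. The class, the function names, and every number in it are hypothetical and chosen only for illustration; plug in your own rates and estimates.

```python
from dataclasses import dataclass


@dataclass
class ClientCosts:
    """Hypothetical per-client cost inputs; every figure is illustrative."""
    onboarding_hours: float        # cost 1: one-time setup and customization
    monthly_support_hours: float   # cost 2: monitoring, maintenance, ad-hoc support
    monthly_infra_cost: float      # cost 3: ingesting, transforming, storing, reporting
    hourly_team_cost: float        # internal cost of one hour of your team's time


def monthly_margin(costs: ClientCosts, subscription_fee: float) -> float:
    """Recurring margin per client per month, ignoring onboarding."""
    recurring = (costs.monthly_support_hours * costs.hourly_team_cost
                 + costs.monthly_infra_cost)
    return subscription_fee - recurring


def payback_months(costs: ClientCosts, subscription_fee: float,
                   setup_fee: float = 0.0) -> float:
    """Months of recurring margin needed to recover unbilled onboarding cost."""
    unrecovered = costs.onboarding_hours * costs.hourly_team_cost - setup_fee
    if unrecovered <= 0:
        return 0.0
    margin = monthly_margin(costs, subscription_fee)
    if margin <= 0:
        return float("inf")  # onboarding is never recovered at this price
    return unrecovered / margin


example = ClientCosts(onboarding_hours=20, monthly_support_hours=4,
                      monthly_infra_cost=300.0, hourly_team_cost=100.0)
print(monthly_margin(example, subscription_fee=1500.0))                    # 800.0
print(payback_months(example, subscription_fee=1500.0))                    # 2.5
print(payback_months(example, subscription_fee=1500.0, setup_fee=2000.0))  # 0.0
```

With these made-up inputs, skipping the setup fee costs two and a half months of margin per client, which is the trade-off described in the first item above.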

To minimize these costs, you need a multi-tenant technology stack that lets you:

  • Templatize your standard ELT and data warehousing solutions
  • Instantiate a new data platform from a template and auto-provision all the necessary infrastructure quickly
  • Customize the data platform to the client’s needs
  • Reuse data models and reports across clients
  • Monitor all clients’ data platforms and ELT pipelines in one place
  • Bulk-apply new and fixed data models to all clients on a given template
  • Offer clients a white-label portal to connect and manage their data sources
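
As a rough illustration of the templating, customization, and bulk-apply requirements, the sketch below stores one shared template plus only the per-client differences, and re-renders every client whenever the template changes. The config shape, source names, and function names are all hypothetical; this is not how any particular tool works, just one simple way to model the idea.

```python
from copy import deepcopy
from typing import Any

Config = dict[str, Any]


def deep_merge(base: Config, overrides: Config) -> Config:
    """Return a copy of base with overrides applied; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale, not combined.
            merged[key] = deepcopy(value)
    return merged


def render_all(template: Config, overrides_by_client: dict[str, Config]) -> dict[str, Config]:
    """Re-render every client's platform config from one shared template."""
    return {name: deep_merge(template, ov) for name, ov in overrides_by_client.items()}


# Hypothetical template for an e-commerce analytics product.
template_v1: Config = {
    "sources": {"shopify": {"enabled": True}, "stripe": {"enabled": True}},
    "models": ["orders", "revenue"],
    "schedule": "daily",
}

# Only the per-client differences are stored.
client_overrides: dict[str, Config] = {
    "acme": {"sources": {"hubspot": {"enabled": True}}},
    "globex": {"schedule": "hourly"},
}

platforms = render_all(template_v1, client_overrides)

# Bulk-apply: ship a new model in the template, then re-render every client.
template_v2 = deep_merge(template_v1, {"models": ["orders", "revenue", "churn"]})
platforms = render_all(template_v2, client_overrides)

print(platforms["acme"]["models"])           # ['orders', 'revenue', 'churn']
print(platforms["globex"]["schedule"])       # hourly
print(sorted(platforms["acme"]["sources"]))  # ['hubspot', 'shopify', 'stripe']
```

Because customizations live apart from the template, updating the template cannot silently overwrite them. The catch in this sketch is that a client who overrides a list such as `models` stops receiving template changes to that list, so a real system would need a finer-grained merge policy.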

And last but not least, you need the cost of ELT and the data warehouse to be predictable and reasonably low so that your margins can never go negative when a client has a busy month or a specific high-volume source that they simply can’t do without.
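
A small worked example shows how usage-based infrastructure pricing can push a fixed-fee client into negative margin. The pricing model (a base fee plus a per-million-rows rate) and all figures are assumptions made up for illustration, not any vendor's actual pricing.

```python
def usage_based_cost(rows: int, base_fee: float, cost_per_million_rows: float) -> float:
    """Monthly infrastructure bill under a hypothetical volume-based pricing model."""
    return base_fee + cost_per_million_rows * rows / 1_000_000


def margin(rows: int, subscription_fee: float, support_cost: float,
           base_fee: float, cost_per_million_rows: float) -> float:
    """Monthly margin for one client at a given data volume."""
    infra = usage_based_cost(rows, base_fee, cost_per_million_rows)
    return subscription_fee - support_cost - infra


def break_even_rows(subscription_fee: float, support_cost: float,
                    base_fee: float, cost_per_million_rows: float) -> float:
    """Row volume at which the margin reaches zero."""
    budget = subscription_fee - support_cost - base_fee
    return max(budget, 0.0) / cost_per_million_rows * 1_000_000


# Illustrative inputs only.
pricing = dict(subscription_fee=1500.0, support_cost=400.0,
               base_fee=100.0, cost_per_million_rows=50.0)

print(margin(5_000_000, **pricing))    # 750.0  (a normal month)
print(margin(30_000_000, **pricing))   # -500.0 (a busy month)
print(break_even_rows(**pricing))      # 20000000.0
```

Under these assumptions, any month above 20 million rows loses money on that client, which is why a flat or capped cost structure matters when the subscription fee is fixed.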

The Modern Data Stack is not up to the task

When you’re building a client their very own data platform that they’ll need to manage themselves once your engagement is over, it makes a ton of sense to set them up on the Modern Data Stack with subscriptions to Fivetran/Stitch, dbt Cloud, Astronomer/Dagster Cloud, Snowflake/BigQuery, and some BI tool.

But if you want to reproduce that stack for any number of additional clients, you run into the fact that these tools are fundamentally built for a single tenant, and not for a data consultancy managing many. It takes a lot of time and effort to set up, monitor, and maintain 10 clients’ data platforms when they’re spread out over 40+ tools and accounts, and the 40+ bills you’ll receive can vary widely in size between different clients and different months, even though your goal is to charge them all the same.

Out of the box, these tools meet none of the requirements we laid out above. They’re all programmable to some degree using APIs and Terraform, so you could automate the manual tasks one by one and effectively build your own multi-tenant Modern Data Stack meta-orchestrator that gives you the centralized control and templating abilities you need. But that would take a huge amount of work, it’s a task more suited to software and infrastructure engineers than to the data experts who make up your team, and it’s ultimately undifferentiated. Your clients care about the quality of your canned reports, your help in customizing them to their needs, and your ongoing support in interpreting them and answering specific questions, not about the fancy multi-tenant tech you spent months building to make scaling easier for yourself.

A variation on this approach is to say goodbye to the SaaS Modern Data Stack and go the self-managed open source route instead, with Airflow or Dagster for orchestration, Meltano for Extract & Load, dbt Core for Transformation, ClickHouse or Postgres+Hydra for data warehousing, and Metabase or Superset for BI. You can bring these together on an EC2 box or Kubernetes cluster and copy this setup to each client with relative ease (if you’re an infrastructure or software engineer). While this solves the SaaS cost issue, it does make you responsible for managing all the underlying infrastructure, not just the pipelines that run on top of it. You’re also still left with the task of building the multi-tenant meta-orchestration platform to centrally manage all these instances and keep them up to date as your template evolves (without accidentally overriding any per-client customizations).

Data consultancies deserve better

With the current state of the Modern Data Stack, the productized services business model is great in theory but a pain to put into practice, to the point that every data consultancy we’ve spoken to is aware of and intrigued by the opportunity, but few have actually been able to implement it successfully.

The sad reality is that the needs of data consultancies managing similar data platforms for multiple clients are simply not on the radar of the Modern Data Stack vendors, as their ideal customers are Fortune 500 enterprises that all have in-house data teams. No matter how large your Fivetran and Snowflake bills may seem to you and your clients, they’re just a drop in the bucket to those vendors: fundamentally, their products are built to help them scale to more enterprise customers, not to help you scale to more clients.

We think it’s time to finally give data consultancies and their clients the productized services they want, by building new (post-Modern?) data infrastructure technology that embraces what makes data consultancies different from in-house data teams, instead of treating them like an annoying edge case.

To this end, we’ve been working closely with a number of consultancies from the Meltano community over the past 6 months to build Arch: the world’s first Multi-Tenant Data Platform that aims to meet all of the requirements laid out above.

If you’re also looking to productize your services and feeling frustrated with the limitations of single-tenant data tools, we’d love to work with you as well and make sure Arch meets all your needs. Feel free to reach out to douwe@arch.dev, book a slot directly on my calendar, or sign up for the waitlist to stay up to date, and let’s make your productized service a reality together!