Start up to Scale up

Claudia Chitu
7 min read · Feb 28, 2021

Scaling is foundational. The growing opportunities to generate data, together with the availability of open source machine learning and open source software frameworks, bring challenges for startups that want to scale with confidence and reliability. It is therefore important to understand the nature of the startup and the business when deciding how to scale up your business with data crunching and data science.

In this article you will find how to grow from a startup to an enterprise-size company in terms of analytics team and platform. Let’s start with a simple definition for each of the stages:

1. Startup: a small team with no one responsible for the analytics function

2. Scale up: a couple of employees in charge of the analytics function

3. Enterprise: mature teams responsible for the entire data life cycle in the organization and the end-to-end processing flow.

This article covers the Needs, the Options and the Team for each of these stages: what the team needs from the data, how to get the data, and who is needed to get it.

1. Startup:

a. Size: usually, fewer than 20 people in the company

b. A startup needs tools to log events quickly (especially startups that have a mobile app or generate time series events from other types of devices). Such a tool can be Google Analytics or Facebook Analytics, even in the free version. For a business with users (B2C), it provides metrics such as MAU (Monthly Active Users), DAU (Daily Active Users) and retention, which are vital for understanding the direction of the business. Besides that, spreadsheets are commonly used to visualize trends and assess growth year over year (YoY), month over month (MoM), etc.

c. Team: no dedicated resource; analytics is everyone’s responsibility
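To make these metrics concrete, here is a minimal sketch of how DAU, MAU, retention and MoM growth could be computed from a raw event log. The event data and the function names (`active_users`, `retention`, `growth`) are made up for illustration; a real startup would usually read these numbers from its analytics tool or a spreadsheet.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log: (user_id, day on which the user was active).
events = [
    ("u1", date(2021, 1, 5)),
    ("u2", date(2021, 1, 5)),
    ("u1", date(2021, 1, 6)),
    ("u1", date(2021, 2, 1)),
    ("u3", date(2021, 2, 1)),
    ("u4", date(2021, 2, 2)),
]


def active_users(events, key):
    """Count distinct users per period; `key` maps a date to its period."""
    users = defaultdict(set)
    for user_id, day in events:
        users[key(day)].add(user_id)
    return {period: len(ids) for period, ids in sorted(users.items())}


def retention(events, first_month, next_month):
    """Share of users active in `first_month` who are active again in `next_month`."""
    first = {u for u, d in events if (d.year, d.month) == first_month}
    nxt = {u for u, d in events if (d.year, d.month) == next_month}
    return len(first & nxt) / len(first)


def growth(previous, current):
    """Relative change between two periods (works for MoM and YoY alike)."""
    return (current - previous) / previous


dau = active_users(events, key=lambda d: d)
mau = active_users(events, key=lambda d: (d.year, d.month))
mom = growth(mau[(2021, 1)], mau[(2021, 2)])
jan_retention = retention(events, (2021, 1), (2021, 2))

print("DAU:", dau)
print("MAU:", mau)
print(f"MoM growth in MAU: {mom:.0%}")
print(f"January users retained in February: {jan_retention:.0%}")
```

The same `growth` helper covers YoY if you pass the values of the same month in two consecutive years.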

2. Scale up — here is where it starts to hurt

a. Size: 20–100 people

b. The business gets more data than fits in an Excel spreadsheet or a report. More data means more challenges, more problems, and more opportunities. The company needs:

i. Data warehouse (DWH): for storing data for longer-term analysis, advanced analytics and machine learning.

ii. Effective visualization: ideally, non-engineers have a self-service platform for quick visualizations

iii. KPI tracking: enable each team to be responsible for its own KPIs

But what is a data warehouse? It is an information system that provides a single version of the truth and consists of multiple layers. The first layer is dedicated to data ingestion: it collects data from multiple internal and external systems and brings it into a landing/staging area, where consistency can be ensured by applying standardization to all collected data. From this layer, the data is extracted, transformed and loaded into the warehouse as raw data, in what are known as “fact” tables. This raw data, combined with the metadata (data about your data, which changes less often than the events or facts), produces the aggregated version: a summary of the data, enriched and transformed. From this point, the data flows further downstream into data marts. A data mart is itself a database, an access layer used to serve data to the users. Don’t forget that one of the primary objectives of the DWH is to enable businesses to make strategic decisions. Therefore, there are data marts for each team or business unit, such as the Sales department in the image. The last layer in the system is the visualization layer, which allows users to interact with the data warehouse. This gives quick access to historical data to understand where the strategy can be optimized, and it improves the speed and efficiency of decision making.
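The layers described above can be sketched on a very small scale with an in-memory SQLite database. This is only an illustration of the flow (staging, fact table, aggregation with metadata, data mart); all table names, column names and rows are hypothetical, and a real DWH would run on a platform built for this purpose.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Staging: rows as they arrive from source systems (note the inconsistent casing).
CREATE TABLE staging_orders (order_id INTEGER, customer_id INTEGER,
                             country TEXT, amount REAL, order_date TEXT);
INSERT INTO staging_orders VALUES
  (1, 10, 'de', 20.0, '2021-01-03'),
  (2, 11, 'DE', 35.0, '2021-01-04'),
  (3, 12, 'fr', 15.0, '2021-01-04');

-- Metadata: changes less often than the events themselves.
CREATE TABLE dim_customer (customer_id INTEGER, segment TEXT);
INSERT INTO dim_customer VALUES (10, 'premium'), (11, 'free'), (12, 'premium');

-- Raw layer: the standardized "fact" table loaded from staging.
CREATE TABLE fact_orders AS
SELECT order_id, customer_id, UPPER(country) AS country, amount, order_date
FROM staging_orders;

-- Aggregated layer: facts enriched with metadata and summarized.
CREATE TABLE agg_revenue AS
SELECT f.country, d.segment, SUM(f.amount) AS revenue, COUNT(*) AS order_count
FROM fact_orders f JOIN dim_customer d USING (customer_id)
GROUP BY f.country, d.segment;

-- Data mart: the access layer exposed to one team (Sales in this sketch).
CREATE VIEW mart_sales AS
SELECT country, SUM(revenue) AS revenue FROM agg_revenue GROUP BY country;
""")

# The visualization layer would query the mart, never the staging tables.
sales = con.execute(
    "SELECT country, revenue FROM mart_sales ORDER BY country"
).fetchall()
print(sales)
```

The point of the sketch is the direction of the flow: dashboards read from the small, standardized mart rather than from the raw events.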

Just as important as giving the product access to standardized data, fact-based insights and historical data are the internal policies. As the team grows, pains start to appear around data access, because different roles have different needs. With a DWH, it is easier to grant minimum access only to the roles that need a certain data mart. PII data can also be protected much more easily with proper techniques and strategies, and data retention can be applied to all the sources to stay compliant with regulations.

On the market, more and more businesses are moving their data warehouse to the cloud. One option is a BigQuery (BQ) DWH with Cloud Storage (other options such as BQ storage exist but are more expensive) and a pay-as-you-go plan. Choosing a DWH from other providers, such as Snowflake or Teradata, usually means bringing in consultants. A preview of the ecosystem for building a DWH on Google Cloud Platform (GCP) is shown in the following image:
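As a rough illustration of these three policies (minimum access per role, PII protection and data retention), here is a small sketch. Everything in it is an assumption made for the example: the role names, the mart names, the secret key and the retention window. In practice these rules would be configured in the DWH platform itself rather than in application code.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret; in practice it would come from a secret manager.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical role-to-data-mart grants: each role sees only what it needs.
GRANTS = {
    "sales_analyst": {"mart_sales"},
    "finance_analyst": {"mart_finance"},
    "data_engineer": {"mart_sales", "mart_finance"},
}


def can_read(role, mart):
    """Minimum access: unknown roles get nothing by default."""
    return mart in GRANTS.get(role, set())


def pseudonymize(value):
    """Replace a PII value with a keyed hash so rows stay joinable but unreadable."""
    digest = hmac.new(SECRET_KEY, value.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def is_expired(created, today, retention_days):
    """Data retention: flag records older than the allowed window."""
    return today - created > timedelta(days=retention_days)


print(can_read("sales_analyst", "mart_finance"))
print(pseudonymize("Jane.Doe@example.com")[:12])
print(is_expired(date(2020, 1, 1), date(2021, 2, 28), retention_days=365))
```

A keyed hash is only one possible technique; tokenization or column-level encryption are alternatives with different trade-offs.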

In terms of visualization options, Google Data Studio works best with the BQ DWH, and it is a great choice for scale-ups to publish their KPIs per team, so that each team can keep track of its own metrics and own them. To use Google Data Studio efficiently with data from BigQuery, an expert analyst is the right resource to design the aggregated sources, so that the cost of powering the dashboards is kept under control and the dashboards do not read more data than necessary. Of course, there are alternatives, such as Mixpanel or Tableau, but they are more expensive (starting with a license per user) and demand more capabilities. In the case of Tableau, for instance, fitting the data to the tool would require more data manipulation (ETL tasks: extract, transform, load).

Having data in Google Cloud Platform is also an opportunity to get started with BigQuery ML at this stage. BQ ML democratizes machine learning, making it very easy for SQL practitioners to use built-in machine learning models. The first step is to log your data and store it in BigQuery (storing the data in BigQuery storage is more expensive, but query results come back much faster thanks to advancements such as columnar storage). The second step is to start crunching your data with GCP services such as Dataflow and Dataprep to extract meaning from it. What meaning can be helpful for your business? A clustering algorithm that shows how your customers or users group into classes with similar patterns and interests is definitely a valuable insight. Using an ARIMA model to predict your KPIs, detect anomalies, or forecast your premium subscribers is also a great boost to your business. With these insights, you have the right information to update your product!
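Why do aggregated sources matter for cost? With on-demand querying, the bill grows with the amount of data each dashboard refresh reads. The back-of-the-envelope sketch below compares a dashboard that reads a raw table with one that reads a small summary table. All numbers, including `PRICE_PER_TIB`, are hypothetical placeholders and not quoted prices; check the current pricing page before relying on any figure.

```python
def monthly_scan_cost(bytes_per_query, queries_per_day, price_per_tib, days=30):
    """Estimated monthly cost when billing is proportional to bytes read."""
    tib_per_query = bytes_per_query / 2**40
    return tib_per_query * queries_per_day * days * price_per_tib


PRICE_PER_TIB = 5.0        # hypothetical placeholder, not a quoted price
QUERIES_PER_DAY = 200      # hypothetical number of dashboard refreshes

raw_cost = monthly_scan_cost(50 * 2**30, QUERIES_PER_DAY, PRICE_PER_TIB)   # 50 GiB raw table
agg_cost = monthly_scan_cost(100 * 2**20, QUERIES_PER_DAY, PRICE_PER_TIB)  # 100 MiB summary

print(f"raw table:     {raw_cost:10.2f} per month")
print(f"summary table: {agg_cost:10.2f} per month")
print(f"ratio:         {raw_cost / agg_cost:10.0f}x")
```

Whatever the actual price is, the ratio depends only on how much smaller the aggregated source is than the raw table.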
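To give a feeling for what this looks like for a SQL practitioner, the sketch below builds the kind of statements BigQuery ML expects for a clustering model and a time series model. The dataset, table and column names are hypothetical, and the option names (`model_type='kmeans'`, `model_type='ARIMA_PLUS'`, `ML.FORECAST`) reflect my understanding of the BigQuery ML documentation, so verify them against the current reference before use. The code only assembles the SQL text; it does not connect to GCP.

```python
def kmeans_model_sql(model, source_table, features, num_clusters):
    """SQL that would train a k-means model to group similar users."""
    columns = ", ".join(features)
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS(model_type='kmeans', num_clusters={num_clusters}) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{source_table}`"
    )


def arima_model_sql(model, source_table, timestamp_col, value_col):
    """SQL that would train a time series model on one KPI."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS(model_type='ARIMA_PLUS',\n"
        f"        time_series_timestamp_col='{timestamp_col}',\n"
        f"        time_series_data_col='{value_col}') AS\n"
        f"SELECT {timestamp_col}, {value_col}\n"
        f"FROM `{source_table}`"
    )


def forecast_sql(model, horizon):
    """SQL that would forecast `horizon` future points from a trained model."""
    return f"SELECT * FROM ML.FORECAST(MODEL `{model}`, STRUCT({horizon} AS horizon))"


# Hypothetical dataset, tables and columns.
segments_sql = kmeans_model_sql(
    "analytics.user_segments", "analytics.user_features",
    ["sessions_per_week", "avg_session_minutes", "purchases"], num_clusters=4,
)
subscribers_sql = arima_model_sql(
    "analytics.premium_forecast", "analytics.daily_premium_subscribers",
    "day", "premium_subscribers",
)
print(segments_sql)
print(subscribers_sql)
print(forecast_sql("analytics.premium_forecast", horizon=30))
```

Each string could then be pasted into the BigQuery console or submitted through a client library of your choice.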

c. Team: To begin the discovery process, whether by applying data science techniques or by using BQ ML on the data stored in the DWH, the right experts for your team are a data analyst and a data scientist. They can communicate actionable insights from data to non-technical audiences.

3. Enterprise

a. Size: typically, hundreds of people in the company

b. An enterprise will be ready to use real-time events, to have scalable machine learning algorithms, to perform the entire orchestration with managed services, and perhaps even to use graph algorithms, depending on the scale and type of the business. In this scenario, the company’s development efforts go into building (or integrating) an ML platform that reduces the time needed to create ML models and helps with experimentation, reproducibility, model deployment and model registry. The benefits of owning such a platform are a highly competitive product and business awareness of real events and of the way the enterprise reacts to them.

c. Team: ML engineers are vital at this stage: they manage the ML lifecycle, track ML experiments, deploy and scale models, and obtain fantastic results.
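To illustrate what experiment tracking and a model registry mean in practice, here is a deliberately tiny in-memory sketch. The class names, parameters and metric values are all invented for the example; a real ML platform would persist this information and add deployment, lineage and access control on top.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One training run: the parameters used and the metrics obtained."""
    run_id: int
    params: dict
    metrics: dict


@dataclass
class ExperimentTracker:
    runs: list = field(default_factory=list)
    registry: dict = field(default_factory=dict)  # model name -> registered runs

    def log_run(self, params, metrics):
        """Record a run so that it can be compared and reproduced later."""
        run = Run(len(self.runs) + 1, dict(params), dict(metrics))
        self.runs.append(run)
        return run

    def best_run(self, metric):
        """Pick the run with the highest value for the given metric."""
        return max(self.runs, key=lambda run: run.metrics[metric])

    def register(self, name, run):
        """Add a run to the registry and return its version number."""
        versions = self.registry.setdefault(name, [])
        versions.append(run)
        return len(versions)


tracker = ExperimentTracker()
tracker.log_run({"learning_rate": 0.1}, {"auc": 0.81})   # illustrative values only
tracker.log_run({"learning_rate": 0.01}, {"auc": 0.86})  # illustrative values only

best = tracker.best_run("auc")
version = tracker.register("churn_model", best)
print(f"registered run {best.run_id} as churn_model v{version}")
```

Even this small structure shows the core idea: every model in production can be traced back to the exact parameters and metrics of the run that produced it.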

In summary, this article presented what a startup, a scale-up and an enterprise need from the data, what tools to use to process and consume the data, and what experts are needed in the team to stay highly competitive in the market and compliant with regulations. This would be the strategy to scale up your business and your team:

Congratulations on making it to the end!

With these definitions and information, and depending on where your company is right now, you know where to start working and making informed decisions.

Claudia Chitu

Hi! This is Claudia, data strategist and data science evangelist! I love working on changing organizational cultures to make data-driven decisions.