Tooling for data scientists

How can organisations unlock the value of their enormous datasets to innovate new products?

Background

Citizen data scientists need to understand and work with data to make informed decisions about the strategic direction of their organisations. These employees want to leverage data to do their jobs more effectively but lack the data literacy to make this a reality.

The innovation lab of a leading financial services company wanted this cohort of employees to explore their business data with an intuitive software tool that reduced the overhead of traditional data analysis. The company wanted to discover if this drove more effective product innovation.

They asked us to create an experience vision for a pilot version of the software, using the concept of a customer’s “financial well-being” as the cornerstone of the design process. The pilot was codenamed CitiFin.

Synthetic data generation

To create the vision, we first needed a dataset to work with.

We started by building a baseline financial dataset from six months of real data, provided by volunteers who matched the key financial personas.

We knew that the larger the dataset, the more useful and accurate the behavioural insights would be, so we needed much more data. From this small initial dataset, we used Python, pandas and machine learning to generate the larger synthetic dataset that the pilot required.

Synthetic data is usually generated by passing real-world data through a noise-adding process. This creates a fresh dataset that retains the statistical features of the original while obfuscating the sensitive information it contained, which avoids the privacy risks that would have made the data impossible to work with.
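As a rough illustration of a noise-adding process, the sketch below resamples rows from a small seed dataset and perturbs each numeric value with Gaussian noise. It is not the pipeline used for the pilot: the function name `synthesise`, the `noise_scale` parameter, and the income/spend columns are all assumptions made for the example, and it uses only the Python standard library to stay self-contained.

```python
import random
import statistics


def synthesise(records, n_rows, noise_scale=0.1, seed=0):
    """Build n_rows synthetic records by resampling real rows and adding noise.

    records: list of dicts with numeric values (hypothetical schema).
    noise_scale: noise standard deviation, as a fraction of each column's
        own standard deviation.
    """
    rng = random.Random(seed)
    columns = list(records[0].keys())
    # Per-column spread, used to scale the noise to each column's range
    spread = {c: statistics.pstdev([r[c] for r in records]) for c in columns}
    synthetic = []
    for _ in range(n_rows):
        base = rng.choice(records)  # pick a real row as the starting point
        synthetic.append(
            {c: base[c] + rng.gauss(0.0, noise_scale * spread[c]) for c in columns}
        )
    return synthetic


# Hypothetical seed data: monthly income and spend for a handful of volunteers
seed_rows = [
    {"income": 2400.0, "spend": 1900.0},
    {"income": 3100.0, "spend": 2500.0},
    {"income": 1800.0, "spend": 1750.0},
    {"income": 4200.0, "spend": 2900.0},
]
synthetic_rows = synthesise(seed_rows, n_rows=1000)
```

Noise of this kind blurs individual values but does not, on its own, provide a formal privacy guarantee. A production approach would normally fit a model to the seed data rather than resample it directly.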

Much of the appeal of synthetic data comes from what’s known in the industry as differential privacy, a formal guarantee that limits how much any single individual’s record can influence the generated output. When data is produced under this guarantee, anyone mining it can form reliable hypotheses, knowing that it remains statistically similar to the real-world data from which it was produced while revealing very little about any one person.
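Differential privacy is typically achieved by adding noise calibrated to how much one person's record can change a result. A minimal sketch of the Laplace mechanism for a simple count query follows. The function names, the `epsilon` value and the account balances are assumptions for illustration, not details from the project.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on zero with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng):
    """Count matching records, then add Laplace noise.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)
balances = [120.0, 560.0, 30.0, 980.0, 15.0, 410.0]  # hypothetical balances
noisy_low_balance_count = private_count(
    balances, lambda b: b < 100.0, epsilon=0.5, rng=rng
)
```

A smaller `epsilon` means more noise and stronger privacy. A larger one keeps results closer to the true count.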

Data visualisation

We then categorised the transactional data points. This helped us to visualise an individual customer and create a financial persona that the citizen data scientists could begin to hypothesise about in the context of new product development.

In this context, the visualisation provides a point-in-time snapshot of a customer’s financial well-being, highlighting habits and trends. Controls are provided to allow the user to adjust key metrics. How can the customer save more? How does one element of the spider chart impact another?
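To make the spider chart concrete, the sketch below maps a set of metric scores to the polygon vertices a chart would draw, then recomputes them after one control is adjusted. The metric names and the assumption that each score is already normalised to the range 0 to 1 are illustrative. The sketch only redraws the shape and does not model how one metric influences another.

```python
import math


def radar_vertices(scores, radius=1.0):
    """Map metric scores in [0, 1] to (x, y) polygon vertices on a spider chart."""
    n = len(scores)
    vertices = {}
    for i, (name, score) in enumerate(scores.items()):
        # First axis points straight up; later axes proceed clockwise
        angle = math.pi / 2 - 2 * math.pi * i / n
        r = radius * max(0.0, min(1.0, score))  # clamp to the chart area
        vertices[name] = (r * math.cos(angle), r * math.sin(angle))
    return vertices


# Hypothetical well-being metrics, each already normalised to [0, 1]
snapshot = {
    "savings": 0.4,
    "spending": 0.7,
    "debt": 0.3,
    "income": 0.6,
    "investments": 0.2,
}
before = radar_vertices(snapshot)
# The user nudges the savings control; only that vertex moves
after = radar_vertices({**snapshot, "savings": 0.6})
```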

Tooling

The design process produced a number of interesting requirements:

  • Financial personas

    Changing parameters (marital status, home ownership, income, etc.) allowed our user to create a financial persona and a related visual representation of that persona’s financial well-being.

  • Seed and Generate Data

    As the counterpart to the first use case, a persona can be used to seed and generate a dataset, which the user can then explore further.

  • At-a-glance Insights

    The tool needed to be easily populated from a data source to generate the visual representation that provides “at-a-glance” insights into the data.
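The first two requirements can be sketched together: a persona is a small set of parameters, and a seeded generator turns it into a repeatable dataset. The `Persona` fields follow the examples named above. The `generate_transactions` function, the transaction categories and the spending shares are assumptions made for illustration, not the tool's actual model.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    # Hypothetical parameters, based on the examples named in the text
    marital_status: str
    home_owner: bool
    monthly_income: float


def generate_transactions(persona, months, seed=0):
    """Seed a pseudo-random generator and emit monthly transactions for a persona."""
    rng = random.Random(seed)
    housing_label = "mortgage" if persona.home_owner else "rent"
    rows = []
    for month in range(1, months + 1):
        rows.append(
            {"month": month, "category": "salary", "amount": persona.monthly_income}
        )
        # Assumed, illustrative spending shares; not taken from the project
        housing = -persona.monthly_income * rng.uniform(0.25, 0.35)
        living = -persona.monthly_income * rng.uniform(0.30, 0.50)
        rows.append(
            {"month": month, "category": housing_label, "amount": round(housing, 2)}
        )
        rows.append({"month": month, "category": "living", "amount": round(living, 2)})
    return rows


persona = Persona(marital_status="single", home_owner=False, monthly_income=2500.0)
transactions = generate_transactions(persona, months=6)
```

Because the generator is seeded, the same persona and seed always produce the same dataset, which keeps explorations reproducible.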

Results

Citizen data science has a bright future. Organisations want to empower their employees with better business intelligence software, helping them to interpret the data that’s available to them more effectively.

CitiFin exemplifies the type of tool that will become popular as AI and machine learning in the business intelligence space become ever more ubiquitous.

What we did

Discovery workshop
Concept development
Synthetic data generation
Machine learning
Data visualisation