Providing Data to Slices

Note

This help should be accurate and comprehensive. If you see anything missing or that needs to be fixed, see How to Contribute or let us know in the Juice Slack #documentation channel.

Now that we know how to add slices to a stack, we need to know how to provide data to those slices. There are two ways to do this, the first is to provide a json fixture file that provides the data in the proper response format. We’ll talk about this more in a later section. The second method, which is more common is to build a Python module that has a class for each slice to provide the data needed for the slice.

These Python modules all follow the same basic structure as shown here:

# Setup # Data Tables # Recipe Service # Filter Service # Slice Services

The setup section is where we import all the needed components and establish database connections. Next, we define all the tables that hold the data we will be using in our slices. With our tables defined, we can build a Recipe Service that houses all our data “ingredients”. (Don’t worry about all these new words, we will define and explore them all in future sections.) Finally, we build classes based off our Recipe Service that provide the data for each slice.

Setup

In the setup section, we need to import all the data types and functions we use in our Tables and slices. You will always need Table and Column. If you are going to sum or count things, you will need func as well. After that you will need to import the data types present in your table, a full list of these is available in the SQLAlchemy Docs

from sqlalchemy import func, Table, Column, String, Float

Next, we’ll import the declarative base, which provides a container to store all the metadata about our data tables. This is used by the recipes we’ll talk more about soon.

from sqlalchemy.ext.declarative import declarative_base

Most Juicebox stacks require a connection to Redshift, a type of database, to access their data. We’ve built an easier way to use a preconfigured engine to create connections.

from dataservices.redshift_connectionbase import redshift_create_engine

When we start creating data services to power our slices, we are going to use a recipe, which is a way of defining how we get and use data in our slice. All recipes are built from metrics and dimensions. A metric is a single unit of data that we are displaying in the slice. Dimensions are just like metrics, except that they are used to group or aggregate data in our slice. Recipes can also have filters that are used to limit the data output in our slice. More on this later, but for now we just need to import these components so we can use them in our slices.

from dataservices.recipe import Metric, Dimension, Filter

Recipes are part of the secret sauce that makes Juicebox awesome. The RecipeServiceBaseV3 is the default object that all data services inherit from. It provides most of the services needed by a Juicebox data services.

from dataservices.servicebasev3 import RecipeServiceBaseV3

With all the items we need imported, we start by connecting to the database. This is done by creating an engine, we’ll use in our BaseService.

engine = redshift_create_engine()

Next, we initialize a Base to use as the foundation of our data tables. This base stores important metadata about our tables, and how they function inside the database.

Base = declarative_base()

Data Tables

We begin the main part of our application by defining the data tables that will be used. The data tables are represented by classes, and those classes must inherit from the Base we created in the previous step in order to have access to all the query features available in the platform.

class Census(Base):
    __table__ = Table('census', Base.metadata,
                      Column('state', String(30), primary_key=True),
                      Column('sex', String(1)),
                      Column('age', Float()),
                      Column('pop2000', Float()),
                      Column('pop2008', Float()),
                      schema='demo', extend_existing=True)

Tables are defined and assigned to the __table__ attribute of the class. The Table constructor takes a table name, an instance of Base.metadata, a collection of Column objects, a schema name that we will provide to you, and a setting extend_existing=True to ensure that if the table is already defined that our new configuration is still applied. The name must match the table name present in Redshift.

Column objects are defined by a name, type, and other attributes. You can learn more about Column object types in the SQLAlchemy docs for types. At least one of the Column objects just be marked as a primary_key, and is used by the metadata and other Juicebox features.

Ingredients

Ingredients represent the data we will be using later in data services for our slices. There are three main types of ingredients: Metrics, Dimensions, and Filters. Metrics are values, which are often numeric calculations. Dimensions are typically categorical values that represent a grouping or aggregation. For example, if you wanted to know the average height by gender, “average height” would be a Metric, and “gender” would be a Dimension. It’s important to note that multiple columns from a table can be used to compute a Metric or Dimension. A Filter is used to limit the results of a recipe based on some metric or dimension.

Metrics and Dimensions are all defined with a reference to a column from a data table or a function operating on one or more columns. They also use keywords that define the visual formatting of the Ingredient. The singular and plural keywords specify what the singular and plural descriptions of the ingredient should be. To control the format of numbers, you can use the format keyword. If you need an expression to be output that includes the ingredient everywhere it appears, you can use the formatter keyword; however, this is often unnecessary.

An example Metric:

Metric(func.sum(Census.pop2008 - Census.pop2000), format=".3s",
       singular='Population Growth')

An example Dimension:

Dimension(Census.state, singular='State', plural='States',
          format="")

Recipe Service

Recipe services are the core elements of a Juicebox application. They are made up of collections of Metrics and Dimensions coupled with additional configuration settings. All Juicebox 3 recipe services inherit from the RecipeServiceBaseV3 we imported earlier, and define a metric_shelf and a dimension_shelf which contain the Ingredients we wish to use in our data services. Here is an example of a basic service.

class BasicService(RecipeServiceBaseV3):
    metric_shelf = {
        'pop2000': Metric(func.sum(Census.pop2000),
                          singular='Population 2000'),
        'pop2008': Metric(func.sum(Census.pop2008),
                          singular='Population 2008'),
        'popdiff': Metric(func.sum(Census.pop2008 - Census.pop2000),
                          singular='Population Growth'),
        'avgage': Metric(func.sum(Census.pop2008 * Census.age) /
                         func.sum(Census.pop2008), singular='Average Age'),

        # A metric using a complex expression
        'pctfemale': Metric(func.sum(case([(Census.sex == 'F',
                                            Census.pop2008)], else_=0)) /
                            func.sum(Census.pop2008),
                            singular='% Female'),

        # a metric using a formatter
        'pctdiff': Metric(func.sum(Census.pop2008 - Census.pop2000) /
                          func.sum(Census.pop2000),
                          singular='Population Pct Change',
                          formatters=[lambda x: "Change is {0:0.1%} "
                                                "percent".format(x)]),
    }

    dimension_shelf = {
        # Simplest possible dimension, a SQLAlchemy expression and a label.
        'state': Dimension(Census.state, singular='State'),

        # This will use the lookup to get display values of "M" and "F"
        'sex': LookupDimension(Census.sex, singular='Sex',
                               lookup={'M': 'Menfolk', "F": "Womenfolk"}),
    }

Our recipe service can also specify some values to filter on automatically. By default every slice listens to the slice above it to adjust its view of the data. They are a list of keys on which every slice should be automatically filtered if it is present in the request. It typically matches the dimensions and metrics used in the FilterService, which we will discuss in the next chapter.

An example of using automatic_filter_keys.

automatic_filter_keys = ('state',)

It is also possible to target a specific database by using the database attribute on a recipe service. This allows you to specify which database the recipes should be run against. Most commonly this is used to target the juicebox or healthstream databases. This is not required and by default targets the juicebox redshift instance.

An example of specifying the database.

database = 'juicebox'

We’re all set for building our global filters now.