Glossary

Aggregate

An aggregate is a cross-table statistical summary calculation, such as Min, Max, Sum, Average, or Count. It summarizes data for linked records. For example, if a customer table is linked to a transaction table via customer ID, an aggregate could be calculated on the customer table to provide Total Spending for each customer.
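As a rough illustration of the Total Spending example, the sketch below computes a Sum aggregate in plain Python. It is not DataJet syntax; the table contents and the names `customers`, `transactions`, and `total_spending` are hypothetical.

```python
from collections import defaultdict

# Hypothetical customer table (the table the aggregate is calculated on).
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cy"},
]

# Hypothetical transaction table, linked to customers via customer_id.
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 5.5},
    {"customer_id": 2, "amount": 12.0},
]


def total_spending(customers, transactions):
    """Return a Sum aggregate of amount for each customer_id."""
    totals = defaultdict(float)
    for row in transactions:
        totals[row["customer_id"]] += row["amount"]
    # Customers with no linked transactions are given 0.0 (an assumption).
    return {c["customer_id"]: totals.get(c["customer_id"], 0.0) for c in customers}


spending = total_spending(customers, transactions)
```

Swapping the `+=` accumulation for `min`, `max`, or a running count would give the other aggregate types in the same shape: one value per record on the customer side.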

Cardinality

Data cardinality refers to the uniqueness of the values contained in a field. If most of the values are distinct (i.e., different from each other), the field is considered to have high cardinality. If the field contains mostly repeated values, it is a low cardinality field (or column).

For the purposes of shortcut thresholds, a field is considered to have low cardinality if its number of discrete values is 15 or fewer.

There are significant differences in functionality depending on whether a field is high cardinality (continuous) or low cardinality (discrete). See DataTypes for more details.
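The shortcut threshold can be pictured with a small Python sketch. This is only an illustration of the rule of 15 or fewer discrete values; the function names and sample fields are hypothetical and are not part of DataJet.

```python
LOW_CARDINALITY_MAX = 15  # shortcut threshold from the definition above


def cardinality(values):
    """Count the distinct values in a field."""
    return len(set(values))


def is_low_cardinality(values, limit=LOW_CARDINALITY_MAX):
    """True when the field has `limit` or fewer distinct values."""
    return cardinality(values) <= limit


# Hypothetical fields: a repeated code and a unique key.
gender = ["F", "M", "F", "F", "M"]
customer_ids = list(range(1000))

gender_is_low = is_low_cardinality(gender)
ids_are_low = is_low_cardinality(customer_ids)
```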

Continuous Field

A continuous field is a field that can have an unknown number of unique values, and for which it is not valid to calculate frequency counts, profiles, etc. Descriptive data are usually continuous, as are DateTime fields, keys, names, addresses, measurements, etc. Continuous fields can be of any field type.

Some continuous fields can be analyzed using Field Analyzer. Depending on the number of discrete values in a field (discoverable using Data Audit), it may be possible to convert a field from continuous to discrete using the CTOD function.

Dimension

A dimension is a qualitative variable that provides context and meaning to a metric or measure. It is used to group data. A dimension is usually a discrete field (or column). Dimensions provide row groupings for tabulations and profiles, and are similar to "row fields" in a spreadsheet pivot table.

Discrete Field

A discrete field is a field that has an index applied to it so that frequency counts within the field are easily discoverable. Some database systems refer to discrete fields as categorical fields. In DataJet, discrete fields can be of any data type, although date fields and dataset fields are always discrete. If a field is discrete, it:

  • is expandable in the Database Tree-View
  • can be profiled
  • can be used as a parameter in a variety of engineering functions
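The frequency counts that a discrete field makes available can be pictured with a short Python sketch. It shows the idea of a profile only, not how DataJet builds or stores its index; `frequency_profile` and the sample `region` field are hypothetical.

```python
from collections import Counter


def frequency_profile(values):
    """Return (value, count) pairs, most frequent value first."""
    return Counter(values).most_common()


# Hypothetical discrete field with three distinct values.
region = ["North", "South", "North", "East", "North", "South"]

profile = frequency_profile(region)
```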

Depending on the number of unique values in a field, there may be an overhead in loading a field as discrete as opposed to continuous. For this reason, the DataJet system uses an automatic discrete threshold. By default this is 300,000, but in practice a discrete field with 300,000 unique values is unwieldy and should only be used if absolutely required.

Whether a field is discrete or not is determined at load time, by specifying the DataType. Load wizards will auto-detect fields that are viable as discrete, although it is possible to override this.
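A load-time decision of this kind might look roughly like the Python sketch below. It is an assumption-laden illustration, not DataJet's detection logic: the function name, the `override` parameter, and the choice to treat the threshold as inclusive are all hypothetical.

```python
DEFAULT_DISCRETE_THRESHOLD = 300_000  # default value stated above


def detect_field_kind(values, threshold=DEFAULT_DISCRETE_THRESHOLD, override=None):
    """Return "discrete" or "continuous" for a field being loaded.

    override: hypothetical manual choice that wins over auto-detection.
    The threshold is treated as inclusive, which is an assumption.
    """
    if override is not None:
        return override
    unique = set()
    for value in values:
        unique.add(value)
        if len(unique) > threshold:
            # Stop early once the field has too many unique values.
            return "continuous"
    return "discrete"


status_kind = detect_field_kind(["open", "closed", "open"])
key_kind = detect_field_kind(range(10), threshold=5)
forced_kind = detect_field_kind(range(10), threshold=5, override="discrete")
```

Stopping as soon as the threshold is exceeded avoids collecting every unique value of a key-like field just to reject it.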

Field

Foreign Table

Measure

Measures are quantitative measurements, sometimes referred to as metrics. They control what is being counted or summarized, as well as how it is being summarized (e.g., Min, Max, Average). The term measure is often used as shorthand for a numeric field that is being summarized.
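The relationship between a dimension (what the rows are grouped by) and a measure (what is summarized, and how) can be sketched in Python. The rows and the names `summarize`, `region`, and `amount` are hypothetical and do not reflect DataJet syntax.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: "region" acts as the dimension, "amount" as the measure.
rows = [
    {"region": "North", "amount": 10.0},
    {"region": "North", "amount": 30.0},
    {"region": "South", "amount": 8.0},
]


def summarize(rows, dimension, measure):
    """Group rows by the dimension and summarize the measure per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    return {
        key: {"min": min(vals), "max": max(vals), "avg": mean(vals)}
        for key, vals in groups.items()
    }


summary = summarize(rows, "region", "amount")
```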

Primary Table

A primary table is the table on the ONE side of a ONE to MANY join. For example, if the CUSTOMER table is joined to the TRANSACTION table via CustomerID, each record in the CUSTOMER table relates to one or MANY records in the TRANSACTION table.
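The CUSTOMER to TRANSACTION relationship can be sketched in Python as a lookup from each record on the ONE side to its records on the MANY side. The sample records, the `TransactionID` field, and the `link_one_to_many` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical CUSTOMER table (the ONE side).
customer = [{"CustomerID": 1}, {"CustomerID": 2}]

# Hypothetical TRANSACTION table (the MANY side), joined via CustomerID.
transaction = [
    {"TransactionID": 100, "CustomerID": 1},
    {"TransactionID": 101, "CustomerID": 1},
    {"TransactionID": 102, "CustomerID": 2},
]


def link_one_to_many(one_side, many_side, key):
    """Map each key on the ONE side to its list of MANY-side records."""
    by_key = defaultdict(list)
    for row in many_side:
        by_key[row[key]].append(row)
    return {row[key]: by_key.get(row[key], []) for row in one_side}


linked = link_one_to_many(customer, transaction, "CustomerID")
```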


Table

Templates

DataJet has 3 kinds of template:


