Multi-Field Statistics

Multi-Field Statistics allows statistics for multiple fields to be calculated and displayed in the same chart and/or grid.

It also supports the rapid creation of a dashboard of multiple individual statistics slots from up to 12 fields.

Results from the Multi-Field Statistics report can be exported into a table, ready for further analysis and engineering.

Report Layout


Quick Tip:

Drop a low cardinality field onto a measure and select Add Repeated Categorical Values from the popup menu to repeat the measure with a filter for each value.


Quick Tip:

Drop a table or field template onto the setup grid and select from:

  • Add All Discrete Numeric Fields - Adds all numeric fields in the table with datatype of discrete (up to 1024 fields)
  • Add All Numeric fields - Adds all numeric fields (up to 1024 fields)


Quick Tip:

Drop a filter on the result set to recalculate all fields with new dataset applied:


Report Setup



Adding Fields and filters

Fields

Add fields to the calculation by dragging and dropping from the Project Explorer.  Drop onto the field list area:

Drag the following object types:

  • Field(s)  - add any field type - 
    • most statistics will only be calculated for numeric fields, and some will only be calculated for discrete numeric fields.
    • drop a field onto a field already in the list and the helper menu will be displayed
  • Table- choose one of the following:
    • Add all Discrete Field numeric fields
    • Add all numeric fields
  • Field Template
    • Drop an existing field template into the field list area and choose "Add All Fields"
Note!
All fields must come from the same table



Filters

Each row that is added to the field list can have a separate filter applied to it.  To apply a filter to a row, drop the dataset onto the row in the field list.  Select a row to see the filter details. 

To apply a filter to the entire calcualtion, first calculate without the filter, and then drop the dataset onto the results grid.



Helper Menu


Add Field 

Adds selected field to the list


Replace Field

 Replaces the target field with the new field


Add Repeated Categorical Values

Displays the "Select Values" dialog.  Select the values to apply as filters.  A row will be added for each value selected.   Use the [...] help to control selection of values.


Set as Measure & Category

{TODO - What does this do?}


Select Fields

Hold CTRL or SHIFT keys to select multiple fields.


Re-arrange Fields

Use the arrow keys to move fields up and down in the calculation order


Remove Fields

Select the trashcan or right click and choose "Delete" to remove (a) field(s) from the calculation.


Create Template

Create a field template from selected fields by right-clicking and choosing "Create Template"

Transpose Results

Flip the orientation of the grid to have field names as either columns or rows.


Extended Stats

Calculate Extended statistics - on by default.  (Note - Extended statistics are only calculated for numeric fields)

Available extended statistics (see Available Statistics for more details):

  • range
  • trimean
  • q1
  • q2 (median)
  • q3
  • iqmean
  • igrange
  • midhinge
  • kurtosis
  • skew

Calculate

There is no auto-calc on the multi-field statistics report. Once setup is complete, select the calculate button:

The result tab will automatically be displayed once the calculation is complete. 


Results

Once the results tab is displayed, explore results further via the following options:

Right-Click

Right click a row in the grid and a helper menu will be displayed:

  • Clear Data Set - removes the base dataset if one is present.  To check whether a dataset has been applied, open the info panel.
  • -----
  • View Statistics - Opens the statistics report for the selected row.  If a filter has been applied, it will be included in the report.    Use the statistics report to see the detailed statistics, box-plot and histogram for the selected row.  The statistics report can also be used to create a new banding or grouping from the field. 
  • View Profile - Open the profile report for the selected row.
  • Navigate To - Open the selected field in Project Explorer.   From here, view underlying data, create profiles, and engineer data.

Filter

Recalculate all statistics for a sub set of data by dropping a dataset onto the result grid.


Sort and Filter Results

The existing result set can be sorted and results from the results excluded or included by modifying column filters.

  • To sort: Click on a column header
  • To Filter: Select the table or field name to filter on


Controlling Display Options

See Micro Toolbar for details of the following:

  • Display options
    • Grid only
    • Chart only
    • Grid and Chart
  • Sort Mode
  • Relative Width/Height
  • Grid/Chart display orientation
  • Export options
Box Plots
By default a box plot of all rows will be displayed.   However, if specific statistics are being plotted via the graphing menu, a different kind of plot may be displayed.  To return to a box-plot, select "Featured Report" from the charting option dropdown.



Visualisation

Visualisation can be controlled:

  • Via the usual charting and display options (see Micro-toolbar
  • As part of setup.   
  • By the right-click | Graphing option.


Create Dashboard

Automatically create a dashboard showing individual statistics for the first 12 rows in the result grid by choosing Automation (Open as Dashboard) from the micro-toolbar.  To inspect a single variable in more detail, select the desired slot and choose "Open Copy"



Available Statistics

(The exclusive method is used to calculate the Interquartile range)

StatisticDescriptionCalculationNotes
avgMean
The arithmetic mean value for the field in the Dataset
Sum of Values / Count of Values

μ = (Σxᵢ) / N

Where:

  • μ (mu) is the population mean
  • Σ (sigma) represents the summation (adding up)
  • xáµ¢ represents each individual value in the population
  • N is the total number of individuals in the population
Core Stats
Describes the central tendency of the entire group being studied and is calculated by summing all the values and dividing by the total number of individuals in the population.

countThe number of records (or observations for the field) in the DataSet (including null)number of observationsCore Stats
Basic level of summarization, often represented in frequency charts and visualisations
iqmeanInterquartile mean
Average of all the data points that lie within the interquartile range (IQR)
(Sum of data points between Q1 and Q3) / (Number of data points between Q1 and Q3)Extended Stats
A robust measure of central tendency that calculates the average of the data points within the interquartile range, providing a more representative "typical" value when outliers are present.
  • IQMean and mean are close: suggests data is relatively symmetrical without significant outliers. 
  • IQMean noticeably different from mean: indicates data is likely skewed or has outliers
  • IQMean and median are close: suggests middle 50% of data is relatively symmetrical.
  • IQMean different from median: indicates middle 50% of data is not perfectly symmetrical. 
  • IQMean and trimean are similar: suggests relatively consistent distribution within the IQR
iqrangeIQR
Represents the spread of the middle 50% of the data
IQR = Q3 – Q1Extended Stats
A robust measure of dispersion. Less sensitive to extreme values (outliers) than the range or standard deviation. This is because it focuses on the middle 50% of the data and ignores the values in the tails.
kurtosisDescribes the shape of the probability distribution.  A measure of how "peaked" or "flat" the distribution is.Extended Stats
Kurtosis indicates the "tailedness" of the distribution (i.e., how often outliers occur)
~3 : normal distribution
>3 : fat tails (leptokurtic)
<3 : thin tails (platykurtic)
  • High Kurtosis (Leptokurtic): Indicates a higher probability of extreme values and a sharper peak. This might suggest a greater risk or volatility in the data.
  • Low Kurtosis (Platykurtic): Indicates a lower probability of extreme values and a flatter peak. This might suggest a more stable or predictable dataset.
  • Kurtosis near 3 (Mesokurtic): Suggests the distribution is similar in shape to a normal distribution.
maxThe maximum value for the field in the Datasetmaximum valueCore Stats
The highest value in the dataset
midhingeMid-Quartile Range
A measure of central tendency that is less sensitive to extreme values (outliers) than the mean
(Q1 + Q3) / 2Extended Stats
Represents the centre of the Middle 50%: The midhinge represents the midpoint of the interquartile range (IQR). The IQR is the range between Q1 and Q3, encompassing the middle 50% of the data. Therefore, the midhinge is the centre of that middle 50%.
minThe minimum value for the field in the Datasetminimum valueCore Stats
The lowest value in the dataset
modeThe most commonly occurring valueCount occurrence of each value and select most commonly occurring valueCore Stats
A measure of central tendency that represents the value or values that appear most frequently in a dataset.
The mode is the most appropriate measure of central tendency for categorical data.
nullsThe number of null or empty records/observations
Core Stats
The number of records with no observations
q1First quartile. The 25th percentile. Lower quartile.Arrange data in ascending order
position = (n+1) * 0.25
Extended Stats
Represented on a box plot as the lower edge of the box
q2 (median)Middle value of the DatasetArrange data in ascending order
position = (n+1)/2
Extended Stats
Middle number in an ordered dataset.  50% of the data is below this value, 50% above.
q3Third quartile. Upper Quartile. Middle value of second half of DatasetArrange data in ascending order
position = (n+1) * 0.75
Extended Stats
The value that separates lowest 75% of the data from highest 25%
Represented on a box plot as the upper edge of the box
rangethe total spread of values for the distributionmax - minExtended Stats
A simple measure of dispersion.  Sensitive to outliers.  
skewskewness
Where:
  • σ = population standard deviation
  • X = data point
  • µ = population mean
  • N = population count
Extended Stats
Skewness provides a measure of the "asymmetry" of a distribution.
no skew (= 0) :  mean = median
right (+ve) skew (tail on the right) : mode < median < mean
left (-ve) skew (tail on the left ) : mean < median

stdevThe standard deviation from the mean for all values in dataset(Square root of variance)
Square Root of  
[(Sum of square of differences from mean) / (number of data points -1) ]

Where:

  • σ = population standard deviation
  • X = data point
  • µ = population mean
  • n = population count
Core Stats

Measures dispersion.


standard errorEstimate of variability of the calculated mean from the true population mean.

  • σx  = standard deviation
  • N = total number of observations

Core Stats
A measure of how accurately the mean represents the true mean of the population.
sumTotal of all non-null values in the datasetSum of all valuesCore Stats
trimeanA measure of central tendency - i.e., a representation of the "centre" of the data that gives the median twice the weight of the other quartiles(Q1 + 2 * Q2 + Q3) / 4Extended Stats
Weighted average of the first quartile (Q1), the median (Q2), and the third quartile (Q3), giving more weight to the median.
If trimean and mean are close, data is likely to be relatively symmetrical. If trimean noticeable different from mean, data is likely skewed or has outliers.
variancePopulation variance


[(Sum of square of differences from mean) / (number of data points) ]

Where:

  • σ = population standard deviation
  • X = data point
  • µ = population mean
  • N = population count

Core Stats
Variance is an indication of the "dispersion" of the data from the mean