AnalyseSegments

Note!
AnalyseSegments is considered an INTENSIVE process: it can take some time to execute, depending on the size of the segmentation data being analysed. AnalyseSegments is often scheduled as part of a load process.

Creates a data grid of intersection counts for a base filter and set of target filters for segmentation data.  Use this method to filter all campaign segmentation records by selected datasets.

Available output formats:

  • Workbook - see "workbook"
  • Table - see "table"
  • Data Table - see "saveModelAs"
  • Parquet - see "parquet"
  • Multi-Field Statistics - see "multiFieldStats"

Python Integration:

  • custom plots - see "plotter"
  • model processing - see "secondaryProcess"
Key | Value(s) | Description
method | "AnalyseSegments" | Creates a data grid of intersection counts for a base filter and set of target filters for segmentation data.
workbook | "WorkBookName.xlsx" | Required. Output file. Must be *.xlsx.
Data will be output to a worksheet named "segments".
table | "tablename" | Optional.
Target table for output results. If provided, a copy of the contents of workbook will be added to a new table.
targetDataSets [] | [{Dataset1Definition}, {Dataset2Definition}, {Dataset3Definition}] | Required. Target datasets to use when calculating overlaps.
A minimum of 1 target dataset must be included.
Target datasets can be added either by dragging and dropping a dataset collection onto the JSON window, or by dragging and dropping a dataset.

NOTE: All target datasets must be on the primary contact table.  (The Primary Contact Table is a DataJet table containing at minimum Hash-Keys and their matching Integer-Keys.  It is created by loading the Master Hash-Key File generated by ProcessSegments.  See Campaign Prototype: How to make a system campaign enabled for further details).
To use data from other tables, first copy the required fields to the primary contact table using either CopyUp or CopyDown.

NOTE: By default, a maximum of 32 target datasets can be processed in each call to AnalyseSegments.
It may be possible to add more datasets, depending on the size of the segment data and the memory available to the DataJet Server. For example, a machine with 64GB of RAM could handle 128 datasets. If large numbers of datasets need to be processed, adding as many target datasets as possible to a single AnalyseSegments call reduces overall compute, because the bulk of the processing time is spent scanning the segment data. This scan happens once per call to AnalyseSegments, regardless of how many target datasets have been specified.
To exceed 32 target datasets, set "ignoreMaxDataSets" to true.
dataSet {} | {DatasetDefinition} | Optional. Base dataset.
If provided, filters all segments to include only records in the primary contact table that are also in the base dataset.

NOTE: The base dataset must be on the primary contact table.

"dataSet": {
    "logic": "or",
    "name": "a_gender_code_Female",
    "strict": true,
    "set": [
      {
        "logic": "and",
        "stype": "FIELD",
        "entity": {
          "type": "field",
          "name": "DATA_EYE_FULL.a_gender_code",
          "table": "DATA_EYE_FULL",
          "valueFilter": []
        },
        "op": "=",
        "values": [
          "Female"
        ]
      }
    ]
  }

segments [] | [NNNN, NNNN, NNNN] | Optional. If not present, all segments in the system will be analysed (i.e., all segments listed in the Audiences or Campaigns interface).
Provide a list of segment IDs to include in the analysis:
[
10001,
10002,
10003
]

NOTE: A list of segment IDs can be obtained from the Campaign | Audiences report. Filter the grid to show the required segments, and then use the Copy/Paste functionality to transfer segment IDs into the AnalyseSegments API.
NOTE: Only one of segments or segmentFilter can be used.
  "segments": [
    10846,
    10847,
    10848,
    10849,
    10850,
    10851,
    10852,
    10853,
    10854,
    10855
  ],


segmentFilter {} | {FilterDefinition} | Optional.
Only includes segments that meet the filter criteria.

Required input:
  • field: full name of field containing segment IDs  (Tip: The source table for the segmentFilter query can be created from another AnalyseSegments call or from CreateSegmentModel)
  • dataSet{}: dataSet JSON for the query that will select the segment IDs.

NOTE: Only one of segments or segmentFilter can be used.

"segmentFilter": {
    "field": "DailyLoad.Id",
    "dataSet": {
      "logic": "or",
      "name": "DataSet",
      "strict": true,
      "set": [
        {
          "logic": "and",
          "stype": "FIELD",
          "entity": {
            "type": "field",
            "name": "DailyLoad.records",
            "table": "DailyLoad",
            "valueFilter": []
          },
          "op": ">=",
          "values": [
            "200000"
          ]
        }
      ]
    }
  },

project | "ProjectName"
description | "Description"
lowerLimit | N | Optional. Integer. Default = 0.
If non-zero, segments with fewer than N rows in the base dataset will not be included in the analysis output.
NOTE: This filters outputs only. To filter inputs (and reduce calculation time), use either segments or segmentFilter.
ignoreMaxDataSets | true/false | Optional. Default = false.
If true, more than 32 target datasets can be specified.
NOTE: AnalyseSegments may fail if the engine server does not have enough memory.
threads | N | Reserved for future use.
0,2,4,6,8
mode | "modeName" | Optional. String. Default = ""
Supported modes:
  • "index" - returns counts and index
  • "index only" - returns index only

See Index Example for details of Index calculation 

Available from v7.03.28.01
saveModelAs | "modelName" | Optional. String. Default = ""
If provided, the output from AnalyseSegments will be saved as a "hot model" and will be accessible from the Data Model Viewer.
An optional path can be included if required in which case a JSON data table model will be written to file.
Note: Set "overwrite": true to overwrite an existing model of the same name.
See "Deep Dive - Data Table Model Format" for further details of model format.
Available from v7.03.28.01



formatting {} | {FormattingDefinition} | Optional.
Only applies when "mode" is "index only".
Displays colorScale or colorBars in Data Table Viewer | Sheet.

"formatting": {
    "colorBars": {
      "border": "Blue",
      "fill": "LightBlue"
    },
    "colorScale": {
      "lowColor": "Cyan",
      "highColor": "Red"
    },
    "freezePane": {
      "row": 1,
      "column": 1
    }
  },

Available from v7.03.28.01
parquet | "parquetFilename" | Optional.
Name of the Parquet file to output to.
plotter {}
"plotter": {
    "filename": "filename.py",
    "chart":
  },
Optional.
Outputs the image generated by the Python script in "filename".
  • filename: name of the Python script containing image processing code which, when run against a data table model, produces a matplotlib image
  • chart: - TODO
secondaryProcess {}
"secondaryProcess": {
    "snippet": "KMeans Param",
    "language": "python",
    "parameters": [
      {
        "name": "Clusters",
        "value": "5",
        "description": "Number Of Clusters"
      }
    ]
  }
Optional.
Sends a Standard Data Table Model to the python Model Processor and runs the specified snippet.
(See Model Processing - Quick Reference for details of how to configure a snippet.)
Only snippets which have been pushed into the snippet library are accessible via secondaryProcess.
Tip!
Use Secondary Processing to apply results filters to AnalyseSegments output - for example, selecting the top N segments by Index value for each target. 
Data which is output from secondaryProcess is accessed via the "Details" tab of the Data Table Viewer.
multiFieldStats | "MFSReportName" | Optional.
Name of the multi-field statistics report to create from target dataset outputs.
NOTE: Only supported in "index only" mode.
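
The constraints stated in the table above can be gathered into a small pre-flight check before a request is submitted. The following is a minimal sketch only: check_request is a hypothetical helper, not part of the DataJet API, and it encodes nothing beyond the rules listed on this page (required workbook, at least 1 target dataset, the 32-dataset limit, segments versus segmentFilter, and the "index only" restrictions).

```python
MAX_TARGET_DATASETS = 32  # default limit stated in the parameter table


def check_request(request: dict) -> list:
    """Return a list of problems found in an AnalyseSegments request dict.

    Hypothetical helper: it only mirrors the rules documented on this page.
    """
    problems = []

    # workbook is required and must be an .xlsx file
    workbook = str(request.get("workbook", ""))
    if not workbook.lower().endswith(".xlsx"):
        problems.append('"workbook" is required and must be *.xlsx')

    # at least 1 target dataset; more than 32 needs ignoreMaxDataSets
    targets = request.get("targetDataSets") or []
    if not isinstance(targets, list) or len(targets) < 1:
        problems.append('"targetDataSets" must be a list with at least 1 dataset')
    elif len(targets) > MAX_TARGET_DATASETS and not request.get("ignoreMaxDataSets", False):
        problems.append('more than 32 target datasets requires "ignoreMaxDataSets": true')

    # segments and segmentFilter are mutually exclusive
    if request.get("segments") and request.get("segmentFilter"):
        problems.append('only one of "segments" or "segmentFilter" can be used')

    # mode must be one of the documented values
    mode = request.get("mode", "")
    if mode not in ("", "index", "index only"):
        problems.append(f"unsupported mode: {mode!r}")

    # formatting and multiFieldStats are documented for "index only" mode
    for key in ("formatting", "multiFieldStats"):
        if request.get(key) and mode != "index only":
            problems.append(f'"{key}" is only supported in "index only" mode')

    return problems


request = {
    "method": "AnalyseSegments",
    "workbook": "out.xlsx",
    "segments": [10001],
    "segmentFilter": {"field": "DailyLoad.Id", "dataSet": {}},
    "targetDataSets": [{"name": "t1"}],
    "formatting": {"freezePane": {"row": 1, "column": 1}},
}
for problem in check_request(request):
    print(problem)
```

The example request is deliberately invalid: it supplies both segments and segmentFilter, and uses formatting without "index only" mode, so two problems are reported.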

Sample Output:


JSON right-click

  • Remove DataSet: deletes base dataset
  • Manage DataSets: Displays Dataset Collection Builder for target datasets
  • Mode: 


AnalyseSegments will take approximately 30 minutes to run on 1800 segments against 1 billion primary contacts (approximately 230 billion foreign records). Against an individual segment, runtime will be under 10 seconds.
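
Because the segment scan happens once per call, a long list of target datasets is cheapest when spread over as few calls as possible. The sketch below shows one way to do that in Python. The helper batch_requests and the numbered workbook naming are assumptions made for illustration; they are not part of the DataJet API.

```python
import copy


def batch_requests(base_request: dict, target_datasets: list, batch_size: int = 32) -> list:
    """Split target datasets across as few AnalyseSegments calls as possible.

    Hypothetical helper. batch_size defaults to the documented limit of 32.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    # Strip the extension once so each call can get a numbered workbook name.
    stem = base_request["workbook"].rsplit(".xlsx", 1)[0]

    requests = []
    for start in range(0, len(target_datasets), batch_size):
        request = copy.deepcopy(base_request)
        request["targetDataSets"] = target_datasets[start:start + batch_size]
        # Assumption: a separate workbook per call keeps the outputs apart.
        request["workbook"] = f"{stem}_{start // batch_size + 1}.xlsx"
        requests.append(request)
    return requests


base = {"method": "AnalyseSegments", "workbook": "overlaps.xlsx", "segments": [10117]}
targets = [{"name": f"target_{i}"} for i in range(70)]
calls = batch_requests(base, targets)
print([len(call["targetDataSets"]) for call in calls])  # [32, 32, 6]
```

If the server has enough memory, passing a larger batch_size together with "ignoreMaxDataSets": true in base_request would reduce the number of scans further.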


Blank method

{
  "method": "AnalyseSegments",
  "workbook": "",
  "dataSet": {},
  "segments": [],
  "targetDataSets": [],
  "description": "latest API",
  "project": "eyeota-audience",
  "lowerLimit": 0
}

Sample Method

{
  "method": "AnalyseSegments",
  "workbook": "%DATAPATH%REGRESS_AnalyseSegments_small.xlsx",
  "lowerLimit": 0,
  "segments": [
    10117
  ],
  "dataSet": {
    "logic": "or",
    "name": "Cadillac_1",
    "strict": true,
    "set": [
      {
        "logic": "and",
        "stype": "FIELD",
        "entity": {
          "type": "field",
          "name": "DATA_EYE_FULL.Cadillac",
          "table": "DATA_EYE_FULL",
          "valueFilter": []
        },
        "op": "=",
        "values": [
          "1"
        ]
      }
    ]
  },
  "targetDataSets": [
    {
      "logic": "or",
      "name": "New Homeowners",
      "strict": true,
      "set": [
        {
          "logic": "or",
          "stype": "FIELD",
          "entity": {
            "type": "field",
            "name": "DATA_EYE_FULL.Demo - Life Events - New Homeowners",
            "table": "DATA_EYE_FULL",
            "valueFilter": []
          },
          "op": "=",
          "values": [
            "1"
          ]
        }
      ],
      "NodeType": "DataSet"
    },
    {
      "logic": "or",
      "name": "New Parents",
      "strict": true,
      "set": [
        {
          "logic": "or",
          "stype": "FIELD",
          "entity": {
            "type": "field",
            "name": "DATA_EYE_FULL.Demo - Life Events - New Parents",
            "table": "DATA_EYE_FULL",
            "valueFilter": []
          },
          "op": "=",
          "values": [
            "1"
          ]
        }
      ],
      "NodeType": "DataSet"
    },
    {
      "logic": "or",
      "name": "Newly Single",
      "strict": true,
      "set": [
        {
          "logic": "or",
          "stype": "FIELD",
          "entity": {
            "type": "field",
            "name": "DATA_EYE_FULL.Demo - Life Events - Newly Single",
            "table": "DATA_EYE_FULL",
            "valueFilter": []
          },
          "op": "=",
          "values": [
            "1"
          ]
        }
      ],
      "NodeType": "DataSet"
    },
    {
      "logic": "or",
      "name": "Expectant Parents (non-US)",
      "strict": true,
      "set": [
        {
          "logic": "or",
          "stype": "FIELD",
          "entity": {
            "type": "field",
            "name": "DATA_EYE_FULL.Demo - Life Events - Expectant Parents (non-US)",
            "table": "DATA_EYE_FULL",
            "valueFilter": []
          },
          "op": "=",
          "values": [
            "1"
          ]
        }
      ],
      "NodeType": "DataSet"
    }
  ],
  "description": "One Segment - 10117",
  "project": "eyeota-pivot"
}
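
The four target datasets in the sample method differ only in field name and display name, so they can be generated rather than written by hand. The following Python sketch builds the same structure. flag_dataset is a hypothetical helper, and the assumption that every target is a simple field = "1" test comes from this sample only.

```python
import json


def flag_dataset(table: str, field: str, name: str, value: str = "1") -> dict:
    """Build one dataset definition shaped like the entries in the sample method."""
    return {
        "logic": "or",
        "name": name,
        "strict": True,
        "set": [
            {
                "logic": "or",
                "stype": "FIELD",
                "entity": {
                    "type": "field",
                    "name": f"{table}.{field}",  # full field name: Table.Field
                    "table": table,
                    "valueFilter": [],
                },
                "op": "=",
                "values": [value],
            }
        ],
        "NodeType": "DataSet",
    }


life_events = ["New Homeowners", "New Parents", "Newly Single", "Expectant Parents (non-US)"]
target_datasets = [
    flag_dataset("DATA_EYE_FULL", f"Demo - Life Events - {event}", event)
    for event in life_events
]

# Serialise the first definition to check it against the sample method.
print(json.dumps(target_datasets[0], indent=2))
```

The resulting list can be assigned to "targetDataSets" in the request before it is serialised to JSON.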


Index Example

This type of calculation is common in audience analysis. The Index represents how concentrated the Target group is within the Base group compared to how concentrated the Target group is within the overall Universe.

An index greater than 1 means the Target group is over-represented in the Base group compared to the Universe, i.e., Target members are more likely to be found in the Base group than in the comparison population as a whole.



mode: Index 
mode: index only

 

Formula:

Index = (Percentage of Base that is Target) / (Percentage of Universe that is Target)


Percentage of Base that is Target:

This tells us what proportion of the Base group also falls into the Target group.

Calculation: Target Overlap / Base

14015 / 1384161 ≈ 0.01012527 (or about 1.01%)


Percentage of Universe that is Target:

This tells us the overall prevalence of the Target group in the entire population.

Calculation: Target / Universe

41667 / 4520401 ≈ 0.00921755 (or about 0.92%)


Calculating Index:

Calculation:  Percentage of Base that is Target / Percentage of Universe that is Target

Index ≈ 0.01012527 / 0.00921755

Index ≈ 1.09848


Summary:

The Index calculation is:

Index = (Target Overlap / Base) / (Target / Universe)

Index = (14015 / 1384161) / (41667 / 4520401)

Index ≈ 0.01012527 / 0.00921755

Index ≈ 1.09848


An index greater than 1 (like this one) means the Target audience is more likely to be found within the Base group than within the Universe as a whole. In this case, it is about 1.1 times as likely (about 9.8% more likely).
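
The worked example can be reproduced with a few lines of Python. This is a minimal sketch of the formula as stated above, not the engine's implementation. The function name audience_index and its argument names are chosen here for illustration.

```python
def audience_index(target_overlap: int, base: int, target: int, universe: int) -> float:
    """Index = (Target Overlap / Base) / (Target / Universe)."""
    if base <= 0 or universe <= 0 or target <= 0:
        raise ValueError("base, universe and target must all be positive")
    pct_of_base = target_overlap / base    # proportion of Base that is Target
    pct_of_universe = target / universe    # proportion of Universe that is Target
    return pct_of_base / pct_of_universe


index = audience_index(target_overlap=14015, base=1384161, target=41667, universe=4520401)
print(round(index, 4))  # 1.0985
```

Working from the unrounded fractions avoids the small differences that appear when the two percentages are rounded before dividing.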


See Also:

Index Profile


Trouble-shooting

Issue | Cause | Resolution
error cannot find key | Mismatch between the active primary contact table key and the campaign configuration file. | The primary contact table is the table that contains the dynamic integer key (generated as part of ProcessSegments). Usually this is something like Table.key

The campaign configuration file is the json file that enables campaign functionality for the project.   By default it is stored in the Admin | Remote Files | Campaign directory and has the name [projectname].campaign.json

The key entry in this file should match the primary contact table key:
{
  "key": "DATA_EYE_FULL.key",
...
}


campaign Audiences report does not support drag and drop of segments

campaign report not available

dataset does not belong to key table
Nothing to output
cannot export model to table
Cause: Either dataSet or targetDataSets contains a dataset from a table that is not linked to the Primary Contact Table.
Resolution: Remove the dataset, or create a link between the dataset and the Primary Contact Table.