AnalyseSegments

Note!

AnalyseSegments is considered an INTENSIVE process - it can take some time to execute depending on the size of segmentation data being analysed. Often AnalyseSegments will be scheduled as part of a load process.

Creates a data grid of intersection counts for a base filter and set of target filters for segmentation data. Use this method to filter all campaign segmentation records by selected datasets.

Available output formats:

Workbook - see "workbook"
Table - see "table"
Data Table - see "saveModelAs"
Parquet - see "parquet"
Multi-Field Statistics - see "multiFieldStats"

Python Integration:

custom plots - see "plotter"
model processing - see "secondaryProcess"

Key	Value(s)	Description
method	"AnalyseSegments"	Creates a data grid of intersection counts for a base filter and set of target filters for segmentation data
workbook	"WorkBookName.xlsx"	Required. Output file. Must be *.xlsx. Data will be output to a worksheet named "segments"
table	"tablename"	Optional. Target table for output results. If provided, a copy of the contents of workbook will be added to a new table.
targetDataSets []	[ {Dataset1Definition}, {Dataset2Definition}, {Dataset3Definition} ]	Required. Target datasets to use when calculating overlaps. At least 1 target dataset must be included. Target datasets can be added by dragging and dropping either a dataset collection or a single dataset onto the JSON window. NOTE: All target datasets must be on the primary contact table. (The Primary Contact Table is a DataJet table containing at minimum Hash-Keys and their matching Integer-Keys. It is created by loading the Master Hash-Key File generated by ProcessSegments. See "Campaign Prototype: How to make a system campaign enabled" for further details.) To use data from other tables, first copy the required fields to the primary contact table using either CopyUp or CopyDown. NOTE: By default, a maximum of 32 target datasets can be processed in each call to AnalyseSegments. More datasets may be possible, depending on the size of the segment data and the memory available to the DataJet Server; for example, a machine with 64GB of RAM could handle 128 datasets. To exceed 32 target datasets, set "ignoreMaxDataSets" to true. When large numbers of datasets need to be processed, include as many target datasets as possible in each call: the bulk of the processing time is spent scanning the segment data, which happens once per call regardless of how many target datasets are specified.
dataSet {}	{DatasetDefinition}	Optional. Base Dataset. If provided, filters all segments to include only records in the primary contact table that are also in the base dataset. NOTE: The base dataset must be on the primary contact table. `"dataSet": { "logic": "or", "name": "a_gender_code_Female", "strict": true, "set": [ { "logic": "and", "stype": "FIELD", "entity": { "type": "field", "name": "DATA_EYE_FULL.a_gender_code", "table": "DATA_EYE_FULL", "valueFilter": [] }, "op": "=", "values": [ "Female" ] } ] }`
segments []	[NNNN, NNNN, NNNN]	Optional. List of segment IDs to include in the analysis, e.g. [ 10001, 10002, 10003 ]. If not present, all segments in the system will be analysed (i.e., all segments listed in the Audiences or Campaigns interface). NOTE: A list of segment IDs can be obtained from the Campaign \| Audiences report. Filter the grid to show the required segments, and then use the Copy/Paste functionality to transfer segment IDs into the AnalyseSegments API. NOTE: Only one of segments or segmentFilter can be used. `"segments": [ 10846, 10847, 10848, 10849, 10850, 10851, 10852, 10853, 10854, 10855 ],`
segmentFilter {}	{FilterDefinition}	Optional. Only includes segments that meet criteria. Required input: field: full name of field containing segment IDs (Tip: The source table for the segmentFilter query can be created from another AnalyseSegments call or from CreateSegmentModel) dataSet{} : dataSet JSON for the query that will select the Ids. NOTE: Only one of segments or segmentFilter can be used. `"segmentFilter": { "field": "DailyLoad.Id", "dataSet": { "logic": "or", "name": "DataSet", "strict": true, "set": [ { "logic": "and", "stype": "FIELD", "entity": { "type": "field", "name": "DailyLoad.records", "table": "DailyLoad", "valueFilter": [] }, "op": ">=", "values": [ "200000" ] } ] } },`
project	"ProjectName"
description	"Description"
lowerLimit	N	Optional. Integer. Default = 0. If non-zero, segments with fewer than N rows in the base dataset will not be included in the analysis output. NOTE: This filters outputs only. To filter inputs (and reduce calculation time), use either segments or segmentFilter.
ignoreMaxDataSets	true/false	Optional. Default = false. If true, more than 32 target datasets can be specified. NOTE: AnalyseSegments may fail if the engine server does not have enough memory.
threads	N	Reserved for future use 0,2,4,6,8
mode	"modeName"	Optional. String. Default = "". Supported modes: "index" - returns counts and index; "index only" - returns index only. See Index Example for details of the Index calculation. Available from v7.03.28.01
saveModelAs	"modelName"	Optional. String. Default = "". If provided, the output from AnalyseSegments will be saved as a "hot model" and will be accessible from the Data Model Viewer. An optional path can be included if required, in which case a JSON data table model will be written to file. NOTE: Set "overwrite": true to overwrite an existing model of the same name. See "Deep Dive - Data Table Model Format" for further details of the model format. Available from v7.03.28.01

formatting{}	{FormattingDefinition}	Optional. Only applies when mode is "index only". Displays colorScale or colorBars in the Data Table Viewer \| Sheet. `"formatting": { "colorBars": { "border": "Blue", "fill": "LightBlue" }, "colorScale": { "lowColor": "Cyan", "highColor": "Red" }, "freezePane": { "row": 1, "column": 1 } },` Available from v7.03.28.01
parquet	"parquetFilename"	Optional. Name of parquet file to output to.
plotter{}	"plotter": { "filename": "filename.py", "chart": },	Optional. Outputs the image generated by the python script in "filename". filename: - name of python script containing image processing code which if run against a data table model will produce a matplotlib chart: - TODO
secondaryProcess{}	"secondaryProcess": { "snippet": "KMeans Param", "language": "python", "parameters": [ { "name": "Clusters", "value": "5", "description": "Number Of Clusters" } ] }	Optional. Sends a Standard Data Table Model to the python Model Processor and runs the specified snippet. (See Model Processing - Quick Reference for details of how to configure a snippet.) Only snippets which have been pushed into the snippet library are accessible via secondaryProcess. Tip! Use Secondary Processing to apply results filters to AnalyseSegments output - for example, selecting the top N segments by Index value for each target. Data which is output from secondaryProcess is accessed via the "Details" tab of the Data Table Viewer.
multiFieldStats	"MFSReportName"	Optional. Name of the multi-field statistics report to create from target dataset outputs. NOTE: Only supported in "index only" mode.
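
The constraints listed in the table above can be checked on the client before a request is submitted. The following is a minimal Python sketch, not part of DataJet: `validate_analyse_segments` is a hypothetical helper name, and it encodes only the rules stated in the table (workbook required and ending in .xlsx, at least one target dataset, at most 32 unless ignoreMaxDataSets is true, segments and segmentFilter mutually exclusive, multiFieldStats and formatting only in "index only" mode). The server may apply further checks that are not covered here.

```python
MAX_TARGET_DATASETS = 32  # documented default limit per call
SUPPORTED_MODES = {"", "index", "index only"}


def validate_analyse_segments(request):
    """Return a list of problems found in an AnalyseSegments request dict.

    Hypothetical client-side helper; an empty list means no documented
    rule was violated.
    """
    problems = []

    if request.get("method") != "AnalyseSegments":
        problems.append('method must be "AnalyseSegments"')

    workbook = request.get("workbook", "")
    if not workbook.lower().endswith(".xlsx"):
        problems.append("workbook is required and must be *.xlsx")

    targets = request.get("targetDataSets")
    if not isinstance(targets, list) or len(targets) < 1:
        problems.append("targetDataSets must be a list with at least 1 dataset")
    elif len(targets) > MAX_TARGET_DATASETS and not request.get("ignoreMaxDataSets", False):
        problems.append("more than 32 target datasets requires ignoreMaxDataSets: true")

    # An empty segments list is treated the same as "not present".
    if request.get("segments") and request.get("segmentFilter"):
        problems.append("only one of segments or segmentFilter can be used")

    mode = request.get("mode", "")
    if mode not in SUPPORTED_MODES:
        problems.append("unsupported mode: %r" % mode)

    if mode != "index only":
        for key in ("multiFieldStats", "formatting"):
            if key in request:
                problems.append('%s is only supported when mode is "index only"' % key)

    return problems


# Example: a small request that satisfies every rule checked above.
example = {
    "method": "AnalyseSegments",
    "workbook": "out.xlsx",  # illustrative file name
    "targetDataSets": [{}],  # placeholder dataset definition
    "segments": [10117],
}
print(validate_analyse_segments(example))  # []
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a request at once.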

Sample Output:

JSON right-click

Remove DataSet: deletes base dataset
Manage DataSets: Displays Dataset Collection Builder for target datasets
Mode:

AnalyseSegments takes approximately 30 minutes to run on 1,800 segments against 1 billion primary contacts (approximately 230 billion foreign records). For a single segment, runtime is under 10 seconds.

Blank method

{
  "method": "AnalyseSegments",
  "workbook": "",
  "dataSet": {},
  "segments": [],
  "targetDataSets": [],
  "description": "latest API",
  "project": "eyeota-audience",
  "lowerLimit": 0
}

Sample Method

{
  "method": "AnalyseSegments",
  "workbook": "%DATAPATH%REGRESS_AnalyseSegments_small.xlsx",
  "lowerLimit": 0,
  "segments": [
    10117
  ],
  "dataSet": {
    "logic": "or",
    "name": "Cadillac_1",
    "strict": true,
    "set": [
      {
        "logic": "and",
        "stype": "FIELD",
        "entity": {
          "type": "field",
          "name": "DATA_EYE_FULL.Cadillac",
          "table": "DATA_EYE_FULL",
          "valueFilter": []
        },
        "op": "=",
        "values": [
          "1"
        ]
      }
    ]
  },
  "targetDataSets": [
    {
      "logic": "or",
      "name": "New Homeowners",
      "strict": true,
      "set": [
        {
          "logic": "or",
          "stype": "FIELD",
          "entity": {
            "type": "field",
            "name": "DATA_EYE_FULL.Demo - Life Events - New Homeowners",
            "table": "DATA_EYE_FULL",
            "valueFilter": []
          },
          "op": "=",
          "values": [
            "1"
          ]
        }
      ],
      "NodeType": "DataSet"
    },
    {
      "logic": "or",
      "name": "New Parents",
      "strict": true,
      "set": [
        {
          "logic": "or",
          "stype": "FIELD",
          "entity": {
            "type": "field",
            "name": "DATA_EYE_FULL.Demo - Life Events - New Parents",
            "table": "DATA_EYE_FULL",
            "valueFilter": []
          },
          "op": "=",
          "values": [
            "1"
          ]
        }
      ],
      "NodeType": "DataSet"
    },
    {
      "logic": "or",
      "name": "Newly Single",
      "strict": true,
      "set": [
        {
          "logic": "or",
          "stype": "FIELD",
          "entity": {
            "type": "field",
            "name": "DATA_EYE_FULL.Demo - Life Events - Newly Single",
            "table": "DATA_EYE_FULL",
            "valueFilter": []
          },
          "op": "=",
          "values": [
            "1"
          ]
        }
      ],
      "NodeType": "DataSet"
    },
    {
      "logic": "or",
      "name": "Expectant Parents (non-US)",
      "strict": true,
      "set": [
        {
          "logic": "or",
          "stype": "FIELD",
          "entity": {
            "type": "field",
            "name": "DATA_EYE_FULL.Demo - Life Events - Expectant Parents (non-US)",
            "table": "DATA_EYE_FULL",
            "valueFilter": []
          },
          "op": "=",
          "values": [
            "1"
          ]
        }
      ],
      "NodeType": "DataSet"
    }
  ],
  "description": "One Segment - 10117",
  "project": "eyeota-pivot"
}
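
When many target datasets follow the same shape as those in the sample above (a single FIELD test with op "=" against one value), the definitions can be generated rather than written by hand. The following is a minimal Python sketch under that assumption. `make_target_dataset` and `batch_requests` are hypothetical helper names and are not part of the DataJet API, and the workbook name in the usage lines is illustrative. The batching follows the documented default limit of 32 target datasets per call.

```python
import copy
import json


def make_target_dataset(table, field, value="1", name=None):
    """Build one target dataset definition mirroring the sample's structure."""
    return {
        "logic": "or",
        "name": name or field,
        "strict": True,
        "set": [
            {
                "logic": "or",
                "stype": "FIELD",
                "entity": {
                    "type": "field",
                    "name": "%s.%s" % (table, field),
                    "table": table,
                    "valueFilter": [],
                },
                "op": "=",
                "values": [value],
            }
        ],
        "NodeType": "DataSet",
    }


def batch_requests(base_request, targets, batch_size=32):
    """Return one request per batch of at most batch_size target datasets."""
    requests = []
    for start in range(0, len(targets), batch_size):
        request = copy.deepcopy(base_request)  # leave the caller's dict untouched
        request["targetDataSets"] = targets[start:start + batch_size]
        requests.append(request)
    return requests


# Usage: two of the fields from the sample method above.
fields = ["Demo - Life Events - New Homeowners", "Demo - Life Events - New Parents"]
targets = [
    make_target_dataset("DATA_EYE_FULL", f, name=f.split(" - ")[-1]) for f in fields
]
base = {
    "method": "AnalyseSegments",
    "workbook": "%DATAPATH%example.xlsx",  # illustrative output name
    "segments": [10117],
}
print(json.dumps(batch_requests(base, targets)[0], indent=2))
```

Every request produced by `batch_requests` carries the same workbook value as the base request, so the caller would need to set a distinct output name per batch to avoid one batch overwriting another.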

Index Example

This type of calculation is common in audience analysis. The Index represents how concentrated the Target group is within the Base group compared to how concentrated the Target group is within the overall Universe.

An index greater than 1 means the Target group is over-represented in the Base group compared to the Universe, i.e., a record in the Base group is more likely to be in the Target group than a record drawn from the overall population.

mode: Index

mode: index only

Formula:

Index = (Percentage of Base that is Target) / (Percentage of Universe that is Target)

Percentage of Base that is Target:

This tells us what proportion of the Base group also falls into the Target group.

Calculation: Target Overlap / Base

14015 / 1384161 ≈ 0.01012527 (or about 1.01%)

Percentage of Universe that is Target:

This tells us the overall prevalence of the Target group in the entire population.

Calculation: Target / Universe

41667 / 4520401 ≈ 0.00921755 (or about 0.92%)

Calculating Index:

Calculation: Percentage of Base that is Target / Percentage of Universe that is Target

Index ≈ 0.01012527 / 0.00921755

Index ≈ 1.098478

Summary:

The Index calculation is:

Index = (Target Overlap / Base) / (Target / Universe)

Index = (14015 / 1384161) / (41667 / 4520401)

Index ≈ 0.01012527 / 0.00921755

Index ≈ 1.098478

An index greater than 1 (like this one) means the Target audience is more likely to be found within the Base group than within the Universe as a whole. In this case, it is about 1.1 times as likely (9.8% more likely).
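
The worked example can be reproduced with a few lines of Python. This is a sketch of the formula as written in this section, not the engine's implementation; `audience_index` is a hypothetical function name.

```python
def audience_index(target_overlap, base, target, universe):
    """Index = (target_overlap / base) / (target / universe)."""
    if base <= 0 or target <= 0 or universe <= 0:
        raise ValueError("base, target and universe must be positive")
    pct_of_base = target_overlap / base      # share of Base that is Target
    pct_of_universe = target / universe      # share of Universe that is Target
    return pct_of_base / pct_of_universe


# Figures from the example above.
index = audience_index(14015, 1384161, 41667, 4520401)
print(round(index, 4))  # 1.0985
```

The guard against non-positive inputs avoids a division by zero when a base or target dataset is empty.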

See Also:

Index Profile

Trouble-shooting

Issue	Cause	Resolution
error cannot find key	Mismatch between the active primary contact table key and the campaign configuration file.	The primary contact table is the table that contains the dynamic integer key (generated as part of ProcessSegments). Usually this is something like Table.key. The campaign configuration file is the JSON file that enables campaign functionality for the project. By default it is stored in the Admin \| Remote Files \| Campaign directory and has the name [projectname].campaign.json. The key entry in this file should match the primary contact table key: `{ "key": "DATA_EYE_FULL.key", ... }`
campaign Audiences report does not support drag and drop of segments
campaign report not available
dataset does not belong to key table; Nothing to output; cannot export model to table	Either dataSet or targetDataSets contains a dataset from a table that is not linked to the Primary Contact Table.	Remove the dataset, or create a link between the dataset and the Primary Contact Table.
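
For the "cannot find key" issue in the table above, the comparison between the campaign configuration file and the primary contact table key can be scripted. The following is a minimal Python sketch, assuming the "key" entry sits at the top level of the configuration JSON, as in the excerpt shown in the resolution. `campaign_key_matches` is a hypothetical helper name, and the configuration text is passed in as a string rather than read from a server path.

```python
import json


def campaign_key_matches(config_text, primary_key):
    """Return True if the "key" entry in the campaign configuration JSON
    equals the primary contact table key, e.g. "DATA_EYE_FULL.key"."""
    config = json.loads(config_text)
    return config.get("key") == primary_key


# Example using the key shown in the resolution text.
sample_config = '{ "key": "DATA_EYE_FULL.key" }'
print(campaign_key_matches(sample_config, "DATA_EYE_FULL.key"))  # True
```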

Multi-Field Statistics