Audience Selection Overview
Article In Progress - Content May Change...
Overview
The Audience Selection solution loads and processes daily segmentation files, received from a vendor, into a DataSource that is then used to:
- Create and schedule regular audience data extracts as part of the Audience Extract Solution
- Allow datasource segmentation to be accessed by DataHub customers via a DataHub Package
Audience Extract Solution
The Audience Extract Solution supports the creation and extraction of "audiences". An Audience is a set of selection rules generated from vendor-supplied segment data. These segment data are accessed via the campaign selection interface, either in DataJet Desktop or in the Audiences Web App.
Load Projects & Scripts
The system requires two main load projects, referred to here as dsource1-incremental and dsource1-audience.
- dsource1-incremental: the project that processes today's data and contains the data used to create daily audience extracts.
- dsource1-audience: the project that contains the data used to create new audiences. This is the data sitting underneath the Audience Builder Solution and DataHub Package and contains data for a multi-day period.
Project: dsource1-incremental
dsource1-incremental is the project used to create daily audience extracts - it contains only the data received for the active day's processing. The following process is used:
- The active day's raw data files are processed (using ProcessSegments)
- Segment data for those files are stored in the campaign storage system (...dsource1-incremental/segs)
- Unique Hash-keys are loaded into the PRIMARY CONTACT TABLE (DATA_XXX_DAILY) (using CreateTableFromFile), where XXX is the vendor's three-digit code
- A daily load history file is created and loaded into the DailyLoad table (using GetCampaignHistory and targetTable; see the sketch after this list)
- Existing audiences are re-imported into the project (using ImportAudienceDefinitions)
- Audience Definitions are used to create extract files and metadata (via BulkExportAudiences)
- Today's Load History is added to the .../DataSource1Daily history folder (DailyLoad_YYMMDD...dat)
- Audience extract files are sent back to vendor
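As a rough sketch of the load-history step above, a GetCampaignHistory call in the daily script might look like the following. Only the method name and the targetTable parameter are confirmed by the steps above; the reportFile parameter and the values shown are illustrative assumptions.
{
"method": "GetCampaignHistory",
"targetTable": "DailyLoad",
"-note": "targetTable is referenced in the step above; reportFile and the values here are illustrative assumptions",
"reportFile": "%OUTPUT%DailyLoadReport.json",
"description": "Create daily segmentation and audience overview report",
"project": "dsource1-incremental"
}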
Project: dsource1-audience
dsource1-audience is the multi-day project used to create new audiences and the package that is injected as part of the DataHub solution. The following process is used:
- Raw data files are processed (usually 7-15 days' worth, up to a limit of 1.8 billion "Hash-Keys", where a Hash is an encoded email) (using ProcessSegments)
- Segment data are stored in the campaign storage system (...dsource1-audience/segs)
- Unique Hash-keys are loaded into the PRIMARY CONTACT TABLE (DATA_XXX_FULL)
- A daily load history table (DailyLoad) is created (using GetCampaignHistory and targetTable)
- An Extract history table (LoadHistory) is created by loading all daily load reports generated by project dsource1-incremental (using CreateTableFromFile and folder DSource1Daily)
- The Audience Builder app connects to the project and uses segmentation data to build, edit and schedule audiences for extract.
- A package is built and distributed so that the segmentation data can be used by other projects (e.g., DataHub)
The .../segs locations referenced in the two processes above are configured in each project's campaign definition file, for example:
{
...
"alternatePath" : "/home/engine/datasources/campaignRoot/onetouch-dev01/dsource1-audience/segs",
...
}
{
...
"alternatePath" : "/home/engine/datasources/campaignRoot/onetouch-dev01/dsource1-incremental/segs",
...
}
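Combining these excerpts with the export excerpt shown later in this article, a project's campaign definition file might look roughly like the sketch below. This is an assumption-laden illustration: only the alternatePath, key and export entries appear in this article, and the full set of keys is described in Campaign Prototype: How to make a system campaign enabled.
{
"key": "DATA_XXX.key",
"alternatePath": "/home/engine/datasources/campaignRoot/onetouch-dev01/dsource1-audience/segs",
"export": {
"fields": [ "DATA_XXX.hash" ],
"root": "D:/Datajet/campaignExports/"
}
}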
For more detail on campaign definition files, see Campaign Prototype: How to make a system campaign enabled
API Call Overview
The following API calls support these processes and applications; a minimal example of the common call shape follows the list:
- AudienceDelete - Deletes an audience
- AudienceLS - Provides a list of audiences
- AudiencePush - Adds or updates audience details
- AudienceRename - Renames an existing audience
- BulkExportAudiences - Creates audience extract files and summary metadata
- CountCampaignDataset - Returns number of records in an audience
- CreateCampaignDataset - Adds a dataset field to the Primary Contact Table that shows membership of an audience
- ExportAudience - Exports the hash-keys for an audience to file
- GetCampaignHistory - Creates a segmentation and audience overview report, listing segmentations and the number of records
- ImportAudienceDefinitions - Imports a set of audience definitions from file
- ExportAudienceDefinitions - Exports a set of audience definitions to file
- ProcessSegments - Processes raw Eyeota segmentation files into campaign data files
- SetAudienceState - Sets the active state on an audience
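All of these calls share the same basic JSON shape used throughout this article: a method name plus a project, with method-specific parameters and an optional description. As a minimal illustration, a call to list the audiences in the incremental project might look like this (only the method name and the method/project/description keys are taken from this article; anything else would be an assumption):
{
"method": "AudienceLS",
"-note": "method-specific parameters vary by call; see the fuller examples later in this article",
"description": "List all audiences in the incremental project",
"project": "dsource1-incremental"
}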
Functional Areas
- Automated Data Load and Refresh - Daily processing of two load projects: dsource1-audience (10-15 days of data) and dsource1-incremental (today's data)
- Audience Solution
- Audience creation - a web-app for audience creation (accessed externally by the Audience creation team) using the dsource1_audience project
- Automated Audience Export and Distribution - an automated process that extracts and distributes daily audience files from the dsource1_incremental project. Note: this is just the data received in the last cycle.
- DataHub Solution
- Automated datasource package creation and deployment - a DataJet package of the data contained in dsource1_audience
- Datasource segment integration in Marketplace Extracts - Injection and linking of the dsource1_audience data for use in the DataHub
Automated Data Load and Refresh
Both the Audiences and DataHub solutions depend on the Automated Load and Refresh Process, which consists of two parts:
- DSource1_Daily_Audience_Extract.json - this script processes today's files and creates the audience extract files.
- Dsource1_Load_And_Package.json - this script creates the full DataSource that is used by Audience Builder and DataHub
Process Overview
Each project uses the same methodology to process the raw data using the ProcessSegments method. This method takes raw data files from the data supplier and processes them to create the DATA_XXX_* tables.
The workflow is as follows:
- Receipt of Raw Data
- Transfer and decompression of raw data into subfolders of the Source Folder - called Daily Source Folders
- DataJet Segment Processing of specified Daily Source Folders, generating
- Segment data files - binary campaign data files used to calculate which ids are present in which segments and audiences
- Hash-Key files - list of unique ids plus unique email hashes.
- Creation of Primary Contact Table by Loading Master Hash-Key file - e.g., DATA_XXX_FULL, DATA_XXX_EXTRACT
Receipt of Raw Data
Data comes from the vendor as compressed *.tsv files which are stored in Daily Source Folders.
Each file has the format:
HEMSHA2_Key Key_Type {Array of Segment IDs}
Each Raw Data file contains a list of HEMSHA2 hashes of email addresses - each called a Hash-Key - together with an array of the segments that apply to that Hash-Key. For example:
0009416759a1f763ceac6e24c152335aa61ad1128da9a3610a0703ccde4e24bd hemsha2 11470,11225,13529,16540,19176,21584
Compressed raw data files may be supplied in any of the following formats:
- *.gz
- *.zip
- *.rar
Transfer and de-compression of raw data
Raw Data files are organised into Daily Source Folders. These Daily Source Folders are stored under the Source Input Folder (e.g. ...DSource1\...\US) and take the format YYYYMMDD.
The HEMSHA2 raw data files are stored in a sub directory:
...\YYYYMMDD\HEMSHA2
Either this needs to be specified as a childfolder in the ProcessSegments call, or the data needs to be unzipped into the daily source folder:
{
"method": "ProcessSegments",
...
"childfolder": "HEMSHA2",
...
}
Segment Processing
Data is processed using the ProcessSegments method. This method:
- Associates each unique hash key in the Daily Source Folders with a system generated integer key
- For each Daily Source Folder, creates a binary data file (called segX.dat) for each segment. Each data file contains a list of all the unique keys that are in the segment. These binary segment files are stored in Daily Segment Folders within the Segment Root Folder. (Note: the Segment Root Folder is defined in the project's campaign definition file.)
- Creates a Master Hash-Key file that contains the mapping between the HEMSHA2 key and the system generated integer key
- Hash-Keys are the hash keys supplied by the DataSource supplier in the raw data files
- Integer-Keys are the keys generated each day by the ProcessSegments method. These are used to minimise disk space requirements for the processed segment data and to improve system response times, and do not persist between data updates.
{
"method": "ProcessSegments",
"sourcepath": "/home/.../datasources/.../dsource1/.../US/",
"targetpath": "/home/.../datasources/campaignRoot/realm-name/dsource1-audience",
"segspath": "/home/engine/datasources/campaignRoot/realm-name/dsource1-audience/segs",
"childfolder": "HEMSHA2",
"reportFile": "%OUTPUT%SegmentProcessReport.json",
"latestFolders": 15,
"-note": "latestFolders 1 just loads most recent daily source folder",
"maxLines": 1500000000,
"numericFoldersOnly": true,
"ignoreCompressedFolders": true,
"finalConvert": true,
"cleanOnStart": true,
"sample": false,
"writekeys": true,
"description": "Process DSource1 Segments",
"project": "dsource1-audience"
}
Creation of Primary Contact Table
The Primary Contact Table is a DataJet table containing at minimum Hash-Keys and their matching Integer-Keys. It is created by loading the Master Hash-Key File generated by ProcessSegments.
The following shows the CreateTableFromFile method used to load the DATA_XXX Primary Contact table:
{
"method": "CreateTableFromFile",
"action": "LOAD",
"table": "DATA_XXX",
"filename": "%DATAPATH%.../US/maindatafile.txt",
"definition": [
"SHA256|CONTINUOUS|STRING|",
"Key|CONTINUOUS|INTEGER|"
],
"loading": [
"SHA256",
"Key"
],
"preHeaderSkip": 0,
"dateFormat": "YYYY-MM-DD",
"fileFormat": "ASCII8",
"skipFirstLine": false,
"delimiter": "TAB",
"stripCharacter": "",
"dateTimeFormat": "YYYY-MM-DD HH:MM:SS",
"project": "DSource1_Test",
"sample": false,
"ignoreErrors": false
}
Audience Extract Solution
To access the Audiences App, go to:
webhome.allantgroup.com/amp/{realm-name}
Select the Audience Extraction project (VENDOR_Audience), and the Audiences app will open.
Audience Creation
For the Audience Extraction Solution, audiences will be created in the web app.
Storage of Audiences
Audience details are stored in Mongo on the active Realm.
Automated Audience Export and Distribution
Export Methods
Live audiences are exported daily - note that audiences are exported from the dsource1-incremental project. Audiences can be exported using one of two methods:
- ExportAudience - exports the hash-keys for the specified audience id to an individual *.txt file.
- BulkExportAudiences - exports all audiences of specified status into appended files of a specified size (default = 1 million hash-keys)
ExportAudience output contains hash-keys only, whereas BulkExportAudiences output contains a configurable set of fields, e.g., Hash|CountryCode|AudienceID. Examples of both calls follow.
{
"method": "BulkExportAudiences",
"status": "*",
"fileStub": "testexport",
"prefix": "US\t",
"suffix": "\t%AUDIENCEID%",
"recordsPerFile": 50000,
"project": "dsource1-incremental",
"description": "Export all audiences of all status"
}
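For comparison, a minimal ExportAudience call might look like the sketch below. Only the method name and its purpose are confirmed above; the parameter names used to identify the audience and the output file, and the audience name itself, are hypothetical.
{
"method": "ExportAudience",
"-note": "the audience and filename parameter names and values are hypothetical illustrations",
"audience": "example-audience-id",
"filename": "%OUTPUT%example-audience-id.txt",
"description": "Export hash-keys for a single audience",
"project": "dsource1-incremental"
}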
Location of Export Files
The location of the export files is determined by the project's Campaign Definition File (see Campaign Prototype: How to make a system campaign enabled for more details on system and project configuration), for example:
{
"key": "DATA_XXX.key",
"export": {
"fields": [ "DATA_XXX.hash" ],
"root": "D:/Datajet/campaignExports/"
}
}
Sample Scripts
eyeota-incremental
See sample script: eyeota-incremental for further details
eyeota-audience
See sample script: eyeota-audience for further details
DataSource Marketplace Solution
The DataSource Marketplace Solution makes the vendor segmentation data available for use in client projects, or the DataHub.
There are two ways that this type of vendor data can be integrated into other projects:
- simple injection
- campaign injection
Simple Injection
Simple Injection is where a package is created from the primary contact table (DATA_DSOURCE_FULL) and then this package is injected into other projects. This injected table can then be linked via hashed email to an identification table - for example DATA_AWI or a derivative.
Appending Segmentation Data
For the injected DATA_DSOURCE table to be useful in this scenario, it usually needs to have had segmentation information appended to it as DataSet Fields or Datasets.
Adding a large number of dataset fields to the Primary Contact Table is not recommended, but in many scenarios a small set of fields provides sufficient value to the consumer project.
To add DataSet Fields to the DATA_DSOURCE table, do the following (a scripted alternative is sketched after these steps):
- Make the project campaign enabled by adding a campaign definition file for the project to the realm folder in campaignRoot (see How to make a system campaign enabled for further details)
- Close and re-open the project
- Open the Campaign Selector report (Campaign | Selection | Remote)
- Create an audience by clicking on the New Audience icon and giving the audience a name
- Double-click a row in the Segment list to add the row to the audience
- Calculate the audience by clicking Run
- Add the audience as a field to the Primary Contact Table (DATA_DSOURCE) by selecting the Create Dataset icon
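The same result can also be produced from a script using the CreateCampaignDataset call listed earlier, which adds a dataset field to the Primary Contact Table showing membership of an audience. Only the method name and its purpose are confirmed in this article; the parameter names below are hypothetical.
{
"method": "CreateCampaignDataset",
"-note": "parameter names and values here are hypothetical illustrations",
"audience": "example-audience",
"table": "DATA_DSOURCE",
"description": "Add a dataset field showing membership of an audience",
"project": "dsource1-audience"
}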
Campaign Injection
Campaign Injection is where the DATA_DSOURCE table is injected into a project that has also been campaign enabled.
A project is campaign enabled if:
- It has a campaign definition file in the campaignRoot folder.
- It has access to the datasource segmentation data
Campaign Injection allows a user to access all Datasource segments and create datasets from them.
{TODO: What can usefully be done with these datasets?}
- Combining DataSource segments to build custom audiences/datasets
- Combining DataSource segments with data from other fields
- Using DataSource segments in analysis reports
{TODO: What needs to be done with these datasets?}
Automated DataSource package creation and deployment
{TODO: How to create a package}
{TODO: How to deploy and inject}
DataSource segment integration in Marketplace Extracts
{DataJet Desktop UI}
{TODO: Show selecting from campaign}
{TODO: Show Combined Segment}