ProcessSegments

Loads a file containing arrays of segmentation data into a segment table and creates a file of unique segment IDs.

| Key | Value(s) | Description |
|---|---|---|
| method | "ProcessSegments" | Loads a file containing arrays of segmentation data into a segment table and creates a file of unique segment IDs. |
| sourcepath | "Path" | Path to the directory containing raw data to be processed. If data are stored in more than one folder, this should be the root folder immediately above the individual data folders. |
| targetpath | "Path" | Root folder for storage of processed data. Generally the same as sourcepath. Location where the hash-key file is stored. Filename is maindatafile.dat. |
| segspath | "Path" | Root folder for processed segment files. |
| dirs[] | ["subfolder1", "subfolder2", "..."] | List of folders containing raw data. |
| childfolder | "FolderName" | Optional. Name of the child folders, if the daily folders contain them. |
| verbose | true/false | Default = false. If true, provides additional logging in the "info" section of the API response. |
| finalConvert | true/false | Default = true. If true, creates maindatafile.txt from maindatafile.dat, ready for loading into a datajet table. |
| cleanOnStart | true/false | Default = false. If true, removes maindatafile.dat before starting processing. |
| sample | true/false | Default = false. If true, loads the first file in each specified folder. |
| writekeys | true/false | Default = true. If true, writes out the segment files. Set to false to test generation of the hash-key file only. |
| numericFoldersOnly | true/false | Ignore non-numeric folders when identifying sub-folders to process. |
| ignoreCompressedFolders | true/false | Ignore folders containing only compressed data (*.gz, *.zip, *.rar). |
| project | | |
| lastFolder | | Maximum number of folders to process, starting with the most recent folder name and going backwards (assuming that folders have date names, e.g., 20240213). |
| maxLines | | Deprecated in v 6.11.11.01. Maximum number of lines to process, up to 2,000,000,000 (2 billion). |
| collated | true/false | Deprecated in v 6.11.11.01. Default = true. Processing optimizes audience calculation performance. |
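
The folder-selection options (numericFoldersOnly, ignoreCompressedFolders, lastFolder) can be easier to reason about with a small model. The Python sketch below is an illustration only, not the engine's implementation: the function name select_folders, its arguments, and the order in which the filters are applied are assumptions made for this example.

```python
from typing import Dict, List, Optional

# File suffixes treated as compressed data, as listed for ignoreCompressedFolders.
COMPRESSED_SUFFIXES = (".gz", ".zip", ".rar")


def select_folders(
    folder_files: Dict[str, List[str]],
    numeric_folders_only: bool = False,
    ignore_compressed_folders: bool = False,
    last_folder: Optional[int] = None,
) -> List[str]:
    """Return folder names to process, most recent name first.

    folder_files maps each folder name to the file names it contains.
    Hypothetical helper: it mirrors the documented option descriptions only.
    """
    # Date-style names such as 20240213 sort chronologically as strings.
    names = sorted(folder_files, reverse=True)

    if numeric_folders_only:
        names = [n for n in names if n.isdigit()]

    if ignore_compressed_folders:
        # Skip a folder only when every file in it is compressed.
        # Assumption: an empty folder is not treated as "compressed only".
        names = [
            n for n in names
            if not (
                folder_files[n]
                and all(f.lower().endswith(COMPRESSED_SUFFIXES) for f in folder_files[n])
            )
        ]

    if last_folder is not None:
        # Assumption: the limit is applied after the other filters.
        names = names[:last_folder]

    return names
```

For example, given folders 20240318 (one .csv file), 20240319 (one .gz file), 20240320 (mixed files) and tmp, enabling both filters with a limit of 2 leaves 20240320 and 20240318 under these assumptions.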

The following shows a basic ProcessSegments request:

{
  "method": "ProcessSegments",
  "sourcepath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
  "targetpath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
  "segspath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/segs/",
  "dirs": [
    "20240318",
    "20240319",
    "20240320",
    "20240321",
    "20240322",
    "20240323",
    "20240324"
  ],
  "finalConvert": true,
  "cleanOnStart": true,
  "sample": true,
  "writekeys": true,
  "numericFoldersOnly": true,
  "ignoreCompressedFolders": true,
  "description": "Process Eyeota Segments",
  "project": "eyeota",
  "tooltip": "Takes raw data in unzipped format and turns into segment files and hash key file"
}
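
When requests like the one above are generated by a script, it can help to apply the documented defaults in one place. The sketch below assembles the request body as a Python dictionary; build_request, DEFAULTS and OPTIONAL_KEYS are hypothetical names for this example, and how the body is submitted to the API is outside the scope of this page.

```python
import json
from typing import Any, Dict, List, Optional

# Defaults as stated in the parameter table.
DEFAULTS: Dict[str, Any] = {
    "verbose": False,
    "finalConvert": True,
    "cleanOnStart": False,
    "sample": False,
    "writekeys": True,
}

# Optional keys that appear in the table or in the examples on this page.
# The deprecated keys (maxLines, collated) are deliberately left out.
OPTIONAL_KEYS = set(DEFAULTS) | {
    "childfolder", "numericFoldersOnly", "ignoreCompressedFolders",
    "project", "lastFolder", "description", "tooltip",
}


def build_request(
    sourcepath: str,
    segspath: str,
    dirs: List[str],
    targetpath: Optional[str] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Assemble a ProcessSegments request body (illustrative helper)."""
    unknown = set(options) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"Unknown ProcessSegments option(s): {sorted(unknown)}")

    request: Dict[str, Any] = {
        "method": "ProcessSegments",
        "sourcepath": sourcepath,
        # targetpath is generally the same as sourcepath.
        "targetpath": targetpath if targetpath is not None else sourcepath,
        "segspath": segspath,
        "dirs": list(dirs),
    }
    request.update(DEFAULTS)
    request.update(options)  # explicit options override the defaults
    return request


body = build_request(
    "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
    "/home/engine/datasources/OneTouch/Eyeota/mft/US/segs/",
    ["20240318", "20240319"],
    sample=True,
    project="eyeota",
)
print(json.dumps(body, indent=2))
```

Rejecting unknown keys is a design choice of this sketch, intended to catch misspelled option names early; it does not describe how the API itself treats unrecognized keys.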

The following shows how to use ProcessSegments with data stored in a sub-folder of the primary folders:

{
  "method": "ProcessSegments",
  "sourcepath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
  "targetpath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
  "segspath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/segs/",
  "dirs": [
    "20240318",
    "20240319",
    "20240320",
    "20240321",
    "20240322",
    "20240323"
  ],
  "childfolder": "HEMSHA2",
  "finalConvert": true,
  "cleanOnStart": true,
  "sample": false,
  "verbose": false,
  "writekeys": true,
  "description": "Process Eyeota Segments",
  "project": "Q1Patch1Eyeota_Pro"
}
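
One way to picture the effect of childfolder is as an extra path component appended to each entry in dirs, which matches the form of the folders_processed entries in the sample report (for example, "20240707/HEMSHA2"). The sketch below is an assumption-based illustration; resolve_data_folders is a hypothetical helper, not part of the API.

```python
import posixpath
from typing import List, Optional


def resolve_data_folders(
    sourcepath: str,
    dirs: List[str],
    childfolder: Optional[str] = None,
) -> List[str]:
    """Return the folders that would be read for a request (illustrative).

    Assumes raw data sits in sourcepath/<dir>/ or, when childfolder is set,
    in sourcepath/<dir>/<childfolder>/.
    """
    resolved = []
    for d in dirs:
        parts = [sourcepath, d]
        if childfolder:
            parts.append(childfolder)
        resolved.append(posixpath.join(*parts))
    return resolved


for path in resolve_data_folders(
    "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
    ["20240318", "20240319"],
    childfolder="HEMSHA2",
):
    print(path)
```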


Report File Content

| Key | Value | Description |
|---|---|---|
| ProcessSegment | "6.11.6.1" | Version of the segment processor used to generate the report. |
| sourcepath | "/home/engine/datasources/US/" | Location of the source data. |
| targetpath | "/home/engine/campaignRoot/[realm]/[project]/" | Root folder for storage of processed data (i.e., main hash file). Generally the same as sourcepath. |
| segspath | "/home/engine/campaignRoot/[realm]/[project]/segs" | Root folder for processed segment files. |
| childfolder | "childfolder_name" | Name of the child folder, if used. For example, "HEMSHA2". |
| dirs[] | ["dir1", "dir2", "..."] | List of source folders containing raw data that are eligible for processing. |
| writekeys | true/false | If true, segment files have been written. |
| checking | true/false | Reserved for future use. |
| sample | true/false | If true, only a sample of the data was processed (the first file in each specified folder). |
| verbose | true/false | If true, logging is detailed. |
| count | true/false | Reserved for future use. |
| followonly | true/false | Reserved for future use. |
| collated | true/false | Reserved for future use. |
| files_processed | N | Number of files processed in total. |
| folders_processed | ["Dir1", "Dir2", ...] | List of folders included in processing (a subset of dirs[]). |
| folders_processed_lines | [N1, N2, ...] | Total number of lines processed in each folder (corresponds to folders_processed). |
| processedDirs[] | ["dirN"] | Name of the folder that stores the segment files (underneath segspath). |
| totalLines | 2000000000 | Total number of lines processed (usually 2 billion). |
| uniqueLines | 997217069 | Total number of unique IDs extracted from all processed folders (this is the number of records in the primary contact table). |
| maxLines | 2000000000 | |
| start | "2024-11-06 11:43:30" | Timestamp of the start of processing. |
| end | "2024-11-06 18:35:53" | Timestamp of the end of processing. |
| duration | N | Number of seconds taken to process segments. |
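
A report with these fields can be checked with a few lines of Python. The sketch below is a minimal example under stated assumptions: summarize_report is a hypothetical helper, the timestamps are assumed to use the "YYYY-MM-DD HH:MM:SS" format shown in the table, and duration is assumed to be the difference between end and start in whole seconds.

```python
from datetime import datetime
from typing import Any, Dict

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def summarize_report(report: Dict[str, Any]) -> Dict[str, Any]:
    """Derive a few convenience figures from a parsed report (illustrative)."""
    start = datetime.strptime(report["start"], TIMESTAMP_FORMAT)
    end = datetime.strptime(report["end"], TIMESTAMP_FORMAT)
    elapsed = int((end - start).total_seconds())

    # folders_processed_lines corresponds to folders_processed entry by entry.
    lines_per_folder = dict(
        zip(report["folders_processed"], report["folders_processed_lines"])
    )

    total = report["totalLines"]
    return {
        "elapsed_seconds": elapsed,
        "duration_matches": elapsed == report.get("duration"),
        "lines_per_folder": lines_per_folder,
        # Share of processed lines that produced a unique ID.
        "unique_ratio": report["uniqueLines"] / total if total else None,
    }
```

To use it, load the report file with json.load and pass the resulting dictionary to summarize_report.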

Sample reportFile contents:

{
  "ProcessSegment": "2024.7.22.1",
  "sourcepath": "...mft/US/",
  "targetpath": ".../datasource-audience/",
  "segspath": ".../datasource-audience/segs/",
  "childfolder": "HEMSHA2",
  "dirs": [
    "20240707",
    "20240706",
    "20240705",
    "20240704",
    "20240703",
    "20240702",
    "20240701"
  ],
  "writekeys": true,
  "checking": false,
  "sample": false,
  "verbose": false,
  "count": false,
  "followonly": false,
  "collated": true,
  "files_processed": 757,
  "folders_processed": [
    "20240707/HEMSHA2",
    "20240706/HEMSHA2",
    "20240705/HEMSHA2",
    "20240704/HEMSHA2",
    "20240703/HEMSHA2",
    "20240702/HEMSHA2",
    "20240701/HEMSHA2"
  ],
  "folders_processed_lines": [
    535469843,
    836427346,
    109439092,
    598006467,
    738713871,
    46902468,
    93804936
  ],
  "processedDirs": [
    "20240707"
  ],
  "totalLines": 2000000000,
  "uniqueLines": 1179447368,
  "maxLines": 2000000000,
  "start": "2024-11-06 11:43:30",
  "end": "2024-11-06 18:35:53",
  "duration": 24743
}

A second sample of reportFile contents:

{
  "sourcepath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
  "targetpath": "/home/engine/datasources/campaignRoot/onetouch-dev01/eyeota-audience/",
  "segspath": "/home/engine/datasources/campaignRoot/onetouch-dev01/eyeota-audience/segs/",
  "childfolder": "HEMSHA2",
  "dirs": [
    "20240508",
    "20240507",
    "20240506",
    "20240505",
    "20240504",
    "20240503",
    "20240502",
    "20240501",
    "20240430",
    "20240429",
    "20240428",
    "20240427",
    "20240426",
    "20240425",
    "20240424"
  ],
  "writekeys": true,
  "checking": false,
  "sample": false,
  "verbose": false,
  "count": false,
  "followonly": false,
  "processedDirs": [
    "20240508",
    "20240507",
    "20240506",
    "20240505",
    "20240504",
    "20240503",
    "20240502"
  ]
}
