ProcessSegments

Loads a file containing arrays of segmentation data into a segment table and creates a file of unique segment IDs.

Key	Value(s)	Description
method	"ProcessSegments"	Loads a file containing arrays of segmentation data into a segment table and creates a file of unique segment IDs.
sourcepath	"Path"	Path to directory containing raw data to be processed. If data are stored in more than one folder, this should be the root folder immediately above the individual data folders.
targetpath	"Path"	Root folder for storage of processed data, including the hash-key file (maindatafile.dat). Generally the same as sourcepath.
segspath	"Path"	Root folder for processed segment files.
dirs[]	[ "subfolder1" "subfolder2" "..." ]	List of folders containing raw data.
childfolder	"FolderName"	Optional. Name of the child folder, if the daily folders contain one.
verbose	true/false	Default = false. If true, provides additional logging in the "info" section of the API response.
finalConvert	true/false	Default = true. If true, creates maindatafile.txt from maindatafile.dat, ready for loading into a datajet table.
cleanOnStart	true/false	Default = false. If true, removes maindatafile.dat before processing starts.
sample	true/false	Default = false. If true, loads only the first file in each specified folder.
writekeys	true/false	Default = true. If true, writes out the segment files. Set to false to test generation of the hash-key file only.
numericFoldersOnly	true/false	If true, ignores folders with non-numeric names when identifying sub-folders to process.
ignoreCompressedFolders	true/false	If true, ignores folders that contain only compressed data (.gz, .zip, .rar).
project	"ProjectName"	Project name (for example, "eyeota").
lastFolder	N	Maximum number of folders to process, starting with the most recent folder name and working backwards (assumes folders are named by date, e.g., 20240213).
maxLines	N	Deprecated in v6.11.11.01. Maximum number of lines to process, up to 2,000,000,000 (2 billion).
collated	true/false	Deprecated in v6.11.11.01. Default = true. If true, processing is optimized for audience calculation performance.
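
The folder-selection parameters (numericFoldersOnly, ignoreCompressedFolders, lastFolder) can be illustrated with a minimal Python sketch. It assumes folders are named YYYYMMDD, so that a reverse string sort puts the most recent folder first, and it works on an in-memory mapping of folder names to file names rather than on the file system. select_folders and is_compressed_only are hypothetical names, and the order in which the real processor applies these filters is an assumption.

```python
# Hypothetical sketch of folder selection; not the product's implementation.
COMPRESSED_SUFFIXES = (".gz", ".zip", ".rar")


def is_compressed_only(files):
    """True if the folder has at least one file and every file is compressed."""
    return bool(files) and all(f.lower().endswith(COMPRESSED_SUFFIXES) for f in files)


def select_folders(contents, last_folder=None, numeric_only=False, ignore_compressed=False):
    """Return the folder names to process, most recent first.

    contents maps folder name -> list of file names in that folder.
    The keyword defaults are illustrative; the table does not document them
    for numericFoldersOnly, ignoreCompressedFolders or lastFolder.
    """
    names = list(contents)
    if numeric_only:  # numericFoldersOnly
        names = [n for n in names if n.isdigit()]
    if ignore_compressed:  # ignoreCompressedFolders
        names = [n for n in names if not is_compressed_only(contents[n])]
    # Date names such as 20240213 sort chronologically as strings,
    # so a reverse sort puts the most recent folder first.
    names.sort(reverse=True)
    if last_folder is not None:  # lastFolder
        names = names[:last_folder]
    return names


contents = {
    "20240318": ["part-0001.csv"],
    "20240319": ["part-0001.csv.gz"],
    "20240320": ["part-0001.csv"],
    "tmp": ["notes.txt"],
}
print(select_folders(contents, last_folder=2, numeric_only=True, ignore_compressed=True))
# ['20240320', '20240318']
```

Here "tmp" is dropped as non-numeric, 20240319 is dropped because it holds only a .gz file, and lastFolder = 2 keeps the two most recent of the remaining folders.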

The following example runs ProcessSegments in sample mode over seven daily folders:

{
  "method": "ProcessSegments",
  "sourcepath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
  "targetpath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
  "segspath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/segs/",
  "dirs": [
    "20240318",
    "20240319",
    "20240320",
    "20240321",
    "20240322",
    "20240323",
    "20240324"
  ],
  "finalConvert": true,
  "cleanOnStart": true,
  "sample": true,
  "writekeys": true,
  "numericFoldersOnly": true,
  "ignoreCompressedFolders": true,
  "description": "Process Eyeota Segments",
  "project": "eyeota",
  "tooltip": "Takes raw data in unzipped format and turns into segment files and hash key file"
}
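
The example above sets most flags explicitly but omits verbose, which therefore takes its documented default of false. A minimal Python sketch of how the documented defaults combine with a request follows. It assumes that omitted keys simply fall back to the defaults in the parameter table. effective_config and DEFAULTS are hypothetical names, and the service's own merging logic is not documented.

```python
# Defaults documented in the parameter table; other keys (sourcepath, dirs, ...)
# have no documented default and are left to the caller.
DEFAULTS = {
    "verbose": False,
    "finalConvert": True,
    "cleanOnStart": False,
    "sample": False,
    "writekeys": True,
}


def effective_config(request):
    """Return a copy of the request with documented defaults filled in for omitted keys."""
    merged = dict(DEFAULTS)
    merged.update(request)  # values given explicitly in the request take precedence
    return merged


request = {
    "method": "ProcessSegments",
    "sourcepath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
    "dirs": ["20240318", "20240319"],
    "cleanOnStart": True,
    "sample": True,
}
config = effective_config(request)
print(config["verbose"], config["sample"], config["writekeys"])  # False True True
```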

The following shows how to use ProcessSegments with data stored in a sub-folder of the primary folders:

{
  "method": "ProcessSegments",
  "sourcepath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
  "targetpath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
  "segspath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/segs/",
  "dirs": [
    "20240318",
    "20240319",
    "20240320",
    "20240321",
    "20240322",
    "20240323"
  ],
  "childfolder": "HEMSHA2",
  "finalConvert": true,
  "cleanOnStart": true,
  "sample": false,
  "verbose": false,
  "writekeys": true,
  "description": "Process Eyeota Segments",
  "project": "Q1Patch1Eyeota_Pro"
}
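
When childfolder is set, the data for each entry in dirs is read from a child folder beneath it. The report's folders_processed entries take the same form (for example, "20240707/HEMSHA2"). The Python sketch below illustrates that path layout. It assumes that each folder read is simply sourcepath/dir/childfolder. source_folders is a hypothetical helper.

```python
import posixpath


def source_folders(sourcepath, dirs, childfolder=None):
    """Return the folder read for each entry in dirs, descending into childfolder if set.

    Hypothetical helper: assumes the layout is sourcepath/<dir>/<childfolder>.
    """
    paths = []
    for d in dirs:
        parts = [sourcepath, d]
        if childfolder:
            parts.append(childfolder)
        paths.append(posixpath.join(*parts))
    return paths


for p in source_folders("/home/engine/datasources/OneTouch/Eyeota/mft/US/",
                        ["20240318", "20240319"], "HEMSHA2"):
    print(p)
# /home/engine/datasources/OneTouch/Eyeota/mft/US/20240318/HEMSHA2
# /home/engine/datasources/OneTouch/Eyeota/mft/US/20240319/HEMSHA2
```

posixpath.join is used so that a sourcepath with or without a trailing slash produces the same result.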

Report File Content

Key	Value	Description
ProcessSegment	"6.11.6.1"	Version of the segment processor that generated the report.
sourcepath	"/home/engine/datasources/US/"	Location of the source data.
targetpath	"/home/engine/campaignRoot/[realm]/[project]/"	Root folder for storage of processed data (i.e., the main hash file). Generally the same as sourcepath.
segspath	"/home/engine/campaignRoot/[realm]/[project]/segs"	Root folder for processed segment files.
childfolder	"childfolder_name"	Name of the child folder, if used (for example, "HEMSHA2").
dirs[]	[ "dir1" "dir2" "..." ]	List of source folders containing raw data that are eligible for processing.
writekeys	true/false	If true, segment files were written.
checking	true/false	Reserved for future use.
sample	true/false	If true, only a sample (the first file in each folder) was processed.
verbose	true/false	If true, detailed logging was enabled.
count	true/false	Reserved for future use.
followonly	true/false	Reserved for future use.
collated	true/false	Reserved for future use.
files_processed	N	Total number of files processed.
folders_processed	[ "Dir1" "Dir2" ... ]	List of folders included in processing (a subset of dirs[]).
folders_processed_lines	[ N1 N2 ... ]	Number of lines processed in each folder, in the same order as folders_processed.
processedDirs[]	[ "dirN" ]	Name of the folder(s), under segspath, that store the segment files.
totalLines	2000000000	Total number of lines processed (usually 2 billion).
uniqueLines	997217069	Total number of unique IDs extracted from all processed folders (this is the number of records in the primary contact table).
maxLines	2000000000	Maximum number of lines to process (see the deprecated maxLines parameter).
start	"2024-11-06 11:43:30"	Timestamp at which processing started.
end	"2024-11-06 18:35:53"	Timestamp at which processing ended.
duration	N	Processing time in seconds (end minus start).
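
The start and end values shown in the table are 6 h 52 min 23 s apart, which is 24,743 s. That matches the duration in the sample report. The Python sketch below checks this relationship and also confirms that folders_processed and folders_processed_lines have matching lengths. It assumes that the report file has already been parsed into a dict (for example, with json.load) and that both timestamps are in the same time zone. check_report is a hypothetical helper.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # format of the start and end fields


def check_report(report):
    """Return a list of consistency problems in a parsed report (empty if none)."""
    problems = []
    start = datetime.strptime(report["start"], TIMESTAMP_FORMAT)
    end = datetime.strptime(report["end"], TIMESTAMP_FORMAT)
    elapsed = int((end - start).total_seconds())
    if elapsed != report["duration"]:
        problems.append(f"duration is {report['duration']}, but end - start is {elapsed}")
    if len(report["folders_processed"]) != len(report["folders_processed_lines"]):
        problems.append("folders_processed and folders_processed_lines differ in length")
    return problems


report = {
    "start": "2024-11-06 11:43:30",
    "end": "2024-11-06 18:35:53",
    "duration": 24743,
    "folders_processed": ["20240707/HEMSHA2"],
    "folders_processed_lines": [535469843],
}
print(check_report(report))  # []
```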

Sample report file contents:

{
  "ProcessSegment": "2024.7.22.1",
  "sourcepath": "...mft/US/",
  "targetpath": ".../datasource-audience/",
  "segspath": ".../datasource-audience/segs/",
  "childfolder": "HEMSHA2",
  "dirs": [
    "20240707",
    "20240706",
    "20240705",
    "20240704",
    "20240703",
    "20240702",
    "20240701"
  ],
  "writekeys": true,
  "checking": false,
  "sample": false,
  "verbose": false,
  "count": false,
  "followonly": false,
  "collated": true,
  "files_processed": 757,
  "folders_processed": [
    "20240707/HEMSHA2",
    "20240706/HEMSHA2",
    "20240705/HEMSHA2",
    "20240704/HEMSHA2",
    "20240703/HEMSHA2",
    "20240702/HEMSHA2",
    "20240701/HEMSHA2"
  ],
  "folders_processed_lines": [
    535469843,
    836427346,
    109439092,
    598006467,
    738713871,
    46902468,
    93804936
  ],
  "processedDirs": [
    "20240707"
  ],
  "totalLines": 2000000000,
  "uniqueLines": 1179447368,
  "maxLines": 2000000000,
  "start": "2024-11-06 11:43:30",
  "end": "2024-11-06 18:35:53",
  "duration": 24743
}
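
Because folders_processed_lines follows the order of folders_processed, the two lists can be zipped into a per-folder breakdown. The following short Python sketch uses the first three entries of the sample above. lines_by_folder is a hypothetical helper.

```python
def lines_by_folder(report):
    """Pair each processed folder with its line count (both lists share one order)."""
    return dict(zip(report["folders_processed"], report["folders_processed_lines"]))


report = {
    "folders_processed": ["20240707/HEMSHA2", "20240706/HEMSHA2", "20240705/HEMSHA2"],
    "folders_processed_lines": [535469843, 836427346, 109439092],
}
counts = lines_by_folder(report)
busiest = max(counts, key=counts.get)  # folder with the most lines
print(busiest, counts[busiest])  # 20240706/HEMSHA2 836427346
```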

A second sample report, in which several of the fields listed above do not appear:

{
  "sourcepath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
  "targetpath": "/home/engine/datasources/campaignRoot/onetouch-dev01/eyeota-audience/",
  "segspath": "/home/engine/datasources/campaignRoot/onetouch-dev01/eyeota-audience/segs/",
  "childfolder": "HEMSHA2",
  "dirs": [
    "20240508",
    "20240507",
    "20240506",
    "20240505",
    "20240504",
    "20240503",
    "20240502",
    "20240501",
    "20240430",
    "20240429",
    "20240428",
    "20240427",
    "20240426",
    "20240425",
    "20240424"
  ],
  "writekeys": true,
  "checking": false,
  "sample": false,
  "verbose": false,
  "count": false,
  "followonly": false,
  "processedDirs": [
    "20240508",
    "20240507",
    "20240506",
    "20240505",
    "20240504",
    "20240503",
    "20240502"
  ]
}
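
In this sample, dirs lists 15 candidate folders, but processedDirs lists only the seven most recent (20240508 to 20240502). The Python sketch below identifies which candidates were not processed, using an abbreviated copy of the sample. unprocessed_dirs is a hypothetical helper. The report does not say why the remaining folders were skipped.

```python
def unprocessed_dirs(report):
    """List entries of dirs that do not appear in processedDirs, keeping their order."""
    done = set(report.get("processedDirs", []))
    return [d for d in report["dirs"] if d not in done]


report = {
    "dirs": ["20240508", "20240507", "20240502", "20240501", "20240430"],
    "processedDirs": ["20240508", "20240507", "20240502"],
}
print(unprocessed_dirs(report))  # ['20240501', '20240430']
```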