ProcessSegments
Loads a file containing arrays of segmentation data into a segment table and creates a file of unique segment IDs.
Key | Value(s) | Description |
---|---|---|
method | "ProcessSegments" | Loads a file containing arrays of segmentation data into a segment table and creates a file of unique segment IDs |
sourcepath | "Path" | Path to directory containing raw data to be processed. If data are stored in more than one folder, this should be the root folder immediately above the individual data folders. |
targetpath | "Path" | Root folder for storage of processed data, generally the same as sourcepath. The hash-key file, maindatafile.dat, is stored here. |
segspath | "Path" | Root folder for processed segment files. |
dirs[] | [ "subfolder1" "subfolder2" "..." ] | List of folders containing raw data. |
childfolder | "FolderName" | Optional. Name of the child folders, if the daily folders contain them. |
verbose | true/false | default = false. If true, provides additional logging in "info" section of API response. |
finalConvert | true/false | default = true. If true, creates maindatafile.txt from maindatafile.dat, ready for loading into a datajet table. |
cleanOnStart | true/false | default = false. If true, removes maindatafile.dat before starting processing. |
sample | true/false | default = false. If true, loads only the first file in each specified folder. |
writekeys | true/false | default = true. If true, writes out the segment files. Set to false in order to just test generation of the hash-key file. |
numericFoldersOnly | true/false | If true, ignores non-numeric folders when identifying sub-folders to process. |
ignoreCompressedFolders | true/false | If true, ignores folders containing only compressed data (*.gz, *.zip, *.rar). |
project | | |
lastFolder | | Maximum number of folders to process, starting with the most recent folder name and working backwards (assumes that folders have date names, e.g., 20240213). |
maxLines | | Deprecated in v 6.11.11.01. Maximum number of lines to process, up to 2,000,000,000 (2 billion). |
collated | true/false | Deprecated in v 6.11.11.01. default = true. Processing optimizes audience calculation performance. |
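The way numericFoldersOnly and lastFolder narrow down the folders to process can be pictured with a small Python sketch. The function name `select_folders` and its arguments are hypothetical, and the engine's actual selection logic is not documented here. The sketch only assumes that folder names are dates such as 20240213, so that sorting the names in descending order puts the most recent first.

```python
def select_folders(folder_names, last_folder=None, numeric_only=True):
    """Hypothetical illustration of numericFoldersOnly and lastFolder."""
    if numeric_only:
        # Keep only names made up entirely of digits, e.g. "20240213".
        names = [name for name in folder_names if name.isdigit()]
    else:
        names = list(folder_names)
    # Date-style names (YYYYMMDD) sort chronologically as strings; newest first.
    names.sort(reverse=True)
    if last_folder is None:
        return names
    return names[:last_folder]


selected = select_folders(
    ["20240211", "segs", "20240213", "20240212", "tmp"],
    last_folder=2,
)
print(selected)  # ['20240213', '20240212']
```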
Sample request:
{
"method": "ProcessSegments",
"sourcepath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
"targetpath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
"segspath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/segs/",
"dirs": [
"20240318",
"20240319",
"20240320",
"20240321",
"20240322",
"20240323",
"20240324"
],
"finalConvert": true,
"cleanOnStart": true,
"sample": true,
"writekeys": true,
"numericFoldersOnly": true,
"ignoreCompressedFolders": true,
"description": "Process Eyeota Segments",
"project": "eyeota",
"tooltip": "Takes raw data in unzipped format and turns into segment files and hash key file"
}
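A request like the one above can also be assembled programmatically. The following Python sketch is one possible way to do that; the helper `build_process_segments_request` is hypothetical, and the sketch deliberately stops at producing the JSON body because how the request is submitted is not covered here. The default values are the ones listed in the parameter table.

```python
import json

# Defaults as listed in the parameter table.
DOCUMENTED_DEFAULTS = {
    "verbose": False,
    "finalConvert": True,
    "cleanOnStart": False,
    "sample": False,
    "writekeys": True,
}


def build_process_segments_request(sourcepath, segspath, dirs, targetpath=None, **options):
    """Hypothetical helper: assemble a ProcessSegments request body as a dict."""
    request = {
        "method": "ProcessSegments",
        "sourcepath": sourcepath,
        # targetpath is generally the same as sourcepath.
        "targetpath": sourcepath if targetpath is None else targetpath,
        "segspath": segspath,
        "dirs": list(dirs),
    }
    request.update(DOCUMENTED_DEFAULTS)
    request.update(options)  # explicit options override the defaults
    return request


request = build_process_segments_request(
    "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
    "/home/engine/datasources/OneTouch/Eyeota/mft/US/segs/",
    ["20240318", "20240319"],
    sample=True,
    project="eyeota",
)
print(json.dumps(request, indent=2))
```

Setting sample to true with writekeys left at its default gives a quick trial run that still writes segment files; passing `writekeys=False` instead would only test generation of the hash-key file.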
The following shows how to use ProcessSegments with data stored in a sub-folder of the primary folders:
{
"method": "ProcessSegments",
"sourcepath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
"targetpath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
"segspath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/segs/",
"dirs": [
"20240318",
"20240319",
"20240320",
"20240321",
"20240322",
"20240323"
],
"childfolder": "HEMSHA2",
"finalConvert": true,
"cleanOnStart": true,
"sample": false,
"verbose": false,
"writekeys": true,
"description": "Process Eyeota Segments",
"project": "Q1Patch1Eyeota_Pro"
}
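To make the effect of childfolder concrete, the Python sketch below lists the folders such a request would point at. It assumes the raw data sits at sourcepath/dir/childfolder, which matches the "20240707/HEMSHA2" style of the folders_processed entries in the report file; the function `resolve_data_folders` is hypothetical and not part of the API.

```python
from pathlib import PurePosixPath


def resolve_data_folders(sourcepath, dirs, childfolder=None):
    """Hypothetical helper: build sourcepath/dir[/childfolder] for each dir."""
    root = PurePosixPath(sourcepath)
    folders = []
    for name in dirs:
        folder = root / name
        if childfolder:
            # Raw data lives one level down when a child folder is used.
            folder = folder / childfolder
        folders.append(str(folder))
    return folders


folders = resolve_data_folders(
    "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
    ["20240318", "20240319"],
    childfolder="HEMSHA2",
)
for folder in folders:
    print(folder)
```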
Report File Content
key | value | description |
---|---|---|
ProcessSegment | "6.11.6.1" | version of the segment processor used to generate the report |
sourcepath | "/home/engine/datasources/US/" | location of the source data |
targetpath | "/home/engine/campaignRoot/[realm]/[project]/" | Root folder for storage of processed data (i.e., main hash file). Generally the same as sourcepath. |
segspath | "/home/engine/campaignRoot/[realm]/[project]/segs" | Root folder for processed segment files. |
childfolder | "childfolder_name" | name of childfolder, if used. For example "HEMSHA2" |
dirs[] | [ "dir1" "dir2" "..." ] | list of source folders containing raw data that are eligible for processing |
writekeys | true/false | If true, segment files have been written |
checking | true/false | reserved for future use |
sample | true/false | If true, only the first file in each specified folder was loaded |
verbose | true/false | If true, logging is detailed |
count | true/false | reserved for future use |
followonly | true/false | reserved for future use |
collated | true/false | reserved for future use |
files_processed | N | number of files processed in total |
folders_processed | [ "Dir1" "Dir2" ... ] | list of folders included in processing (a subset of dirs[]) |
folders_processed_lines | [ N1 N2 ... ] | total number of lines processed in each folder (corresponds to folders_processed) |
processedDirs[] | ["dirN"] | name of the folder that stores the segment files (underneath segspath) |
totalLines | 2000000000 | total number of lines processed (usually 2 billion) |
uniqueLines | 997217069 | total number of unique ids extracted from all processed folders (this is the number of records in the primary contact table) |
maxLines | 2000000000 | maximum number of lines to process |
start | "2024-11-06 11:43:30" | time-stamp of start processing |
end | "2024-11-06 18:35:53" | time-stamp of end processing |
duration | N | number of seconds to process segments |
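Because folders_processed_lines corresponds entry by entry to folders_processed, the two arrays can be paired to get a line count per folder. The Python sketch below shows one way to do this; the function name `lines_per_folder` is hypothetical and the values are illustrative only, not taken from a real report.

```python
def lines_per_folder(report):
    """Hypothetical helper: map each processed folder to its line count."""
    folders = report["folders_processed"]
    counts = report["folders_processed_lines"]
    if len(folders) != len(counts):
        raise ValueError(
            "folders_processed and folders_processed_lines differ in length"
        )
    return dict(zip(folders, counts))


# Illustrative values only.
report = {
    "folders_processed": ["Dir1", "Dir2"],
    "folders_processed_lines": [1200, 800],
}
per_folder = lines_per_folder(report)
print(per_folder)  # {'Dir1': 1200, 'Dir2': 800}
```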
Sample reportFile contents:
{
"ProcessSegment": "2024.7.22.1",
"sourcepath": "...mft/US/",
"targetpath": ".../datasource-audience/",
"segspath": ".../datasource-audience/segs/",
"childfolder": "HEMSHA2",
"dirs": [
"20240707",
"20240706",
"20240705",
"20240704",
"20240703",
"20240702",
"20240701"
],
"writekeys": true,
"checking": false,
"sample": false,
"verbose": false,
"count": false,
"followonly": false,
"collated": true,
"files_processed": 757,
"folders_processed": [
"20240707/HEMSHA2",
"20240706/HEMSHA2",
"20240705/HEMSHA2",
"20240704/HEMSHA2",
"20240703/HEMSHA2",
"20240702/HEMSHA2",
"20240701/HEMSHA2"
],
"folders_processed_lines": [
535469843,
836427346,
109439092,
598006467,
738713871,
46902468,
93804936
],
"processedDirs": [
"20240707"
],
"totalLines": 2000000000,
"uniqueLines": 1179447368,
"maxLines": 2000000000,
"start": "2024-11-06 11:43:30",
"end": "2024-11-06 18:35:53",
"duration": 24743
}
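A report like this can be sanity-checked with a few lines of Python. The sketch below recomputes the elapsed time from the start and end time-stamps and compares it with duration, then derives the share of unique lines. The timestamp format string is inferred from the sample values, and the function name `elapsed_seconds` is hypothetical.

```python
from datetime import datetime

# Format inferred from the sample time-stamps, e.g. "2024-11-06 11:43:30".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(report):
    """Hypothetical helper: seconds between the report's start and end."""
    start = datetime.strptime(report["start"], TIMESTAMP_FORMAT)
    end = datetime.strptime(report["end"], TIMESTAMP_FORMAT)
    return int((end - start).total_seconds())


# Values copied from the sample report.
report = {
    "totalLines": 2000000000,
    "uniqueLines": 1179447368,
    "start": "2024-11-06 11:43:30",
    "end": "2024-11-06 18:35:53",
    "duration": 24743,
}

elapsed = elapsed_seconds(report)
print(elapsed == report["duration"])  # True

unique_share = report["uniqueLines"] / report["totalLines"]
print(f"{unique_share:.1%}")  # 59.0%
```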
{
"sourcepath": "/home/engine/datasources/OneTouch/Eyeota/mft/US/",
"targetpath": "/home/engine/datasources/campaignRoot/onetouch-dev01/eyeota-audience/",
"segspath": "/home/engine/datasources/campaignRoot/onetouch-dev01/eyeota-audience/segs/",
"childfolder": "HEMSHA2",
"dirs": [
"20240508",
"20240507",
"20240506",
"20240505",
"20240504",
"20240503",
"20240502",
"20240501",
"20240430",
"20240429",
"20240428",
"20240427",
"20240426",
"20240425",
"20240424"
],
"writekeys": true,
"checking": false,
"sample": false,
"verbose": false,
"count": false,
"followonly": false,
"processedDirs": [
"20240508",
"20240507",
"20240506",
"20240505",
"20240504",
"20240503",
"20240502"
]
}
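In the report above, processedDirs contains fewer entries than dirs. A short Python sketch can list the difference. The function name `dirs_missing_from_processed` is hypothetical, and the sketch only compares the two arrays; why a folder is absent from processedDirs is not stated here, so the result should be read as a prompt for further checking rather than as an error list.

```python
def dirs_missing_from_processed(report):
    """Hypothetical helper: entries of dirs that do not appear in processedDirs."""
    processed = set(report.get("processedDirs", []))
    return [name for name in report["dirs"] if name not in processed]


# A shortened excerpt of the arrays in the report above.
report = {
    "dirs": ["20240503", "20240502", "20240501", "20240430"],
    "processedDirs": ["20240503", "20240502"],
}
missing = dirs_missing_from_processed(report)
print(missing)  # ['20240501', '20240430']
```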