Checks a file or a set of files in a directory for consistent delimiters on each input line.
Key | Value(s) | Description |
---|---|---|
method | "MultiDelimiterCheck" | Checks a file or a set of files in a directory for consistent delimiters on each input line |
path | "Path" | Required. The path to the directory containing the files to check for delimiter consistency. Supports environment variables (e.g., %DATAPATH%). Example: "%DATAPATH%Regression2/Rg2-Audience/target/*" |
filters[] | [ "ext1", "ext2" ] | Optional. An array of strings used to filter files within the path (e.g., file extensions like "dat", "csv", or wildcard patterns). Example: ["csv", "dat"] or ["*.csv"] |
delimiter | "Delimiter" | Required. The delimiter to check for in the file. Can be specified as a Unicode character (e.g., "\u0001" for SOH) or a string (e.g., "TAB" for tab character). |
sampleSize | "SampleSizeNumber" | Optional. Number of lines to sample for the delimiter consistency check. If 0 or omitted, all lines are read. |
qualifier | "QualifierString" | Optional. The character used to qualify (wrap) fields, typically a quote (e.g., "\""). Helps handle delimiters within fields (e.g., commas in quoted strings) and issues like quotes inside quotes or unbalanced quotes. NOTE: First run MultiDelimiterCheck without a qualifier. If "result" returns more than one set of counts for a file, rerun with the qualifier; if the counts then agree, the data must be loaded with qualifiers to ensure consistent loading. |
expected | "ExpectedDelimiterCount" | Optional. An integer specifying the exact number of delimiters expected on each line. If provided, lines with a different count are flagged. |
throwError | true/false | Optional. Boolean indicating whether to throw an error if delimiter inconsistencies are found. If true, the process generates an error; if false (or omitted), inconsistencies are reported in the output but processing continues. |
Scans the directory specified by path, filtering files based on the optional filters array. For each matching file, reads lines (up to sampleSize, or all lines if sampleSize is 0 or omitted) and counts the occurrences of the specified delimiter.
Returns a result object detailing the check for each file, including delimiter counts found, line counts, file size, and a success flag per file. An overall success flag indicates if all checked files passed. If expected is used and inconsistencies are found, the result array "unexpected" lists the line numbers and their deviating delimiter counts for failing files.
"result": [
{
"file": "D:\\datajet\\datasources\\preload\\transactions\\s_data_0_2_0.csv",
"result": [
[19, 3993919] // 3,993,919 lines have 19 delimiters
],
"fileSize": 1439981805,
"lines": 3993919,
"success": true
}
],
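The per-file counting described above can be illustrated with a short Python sketch. This is not the DataJet implementation; the function name `delimiter_histogram` and its arguments are hypothetical, and treating "success" as "every line has the same count (or the expected count)" is an assumption inferred from the example output.

```python
from collections import Counter

def delimiter_histogram(lines, delimiter, sample_size=0, expected=None):
    """Count delimiter occurrences per line and group lines by count.

    Hypothetical helper: mirrors the shape of the "result" entries shown
    above, but is only a sketch of the idea, not the engine's code.
    """
    counts = Counter()
    unexpected = []  # (line_number, delimiter_count) pairs that deviate from `expected`
    checked = 0
    for line_number, line in enumerate(lines, start=1):
        # sample_size of 0 means "read every line"
        if sample_size and line_number > sample_size:
            break
        n = line.rstrip("\r\n").count(delimiter)
        counts[n] += 1
        checked += 1
        if expected is not None and n != expected:
            unexpected.append((line_number, n))
    # One [delimiter_count, number_of_lines] pair per distinct count
    result = sorted([count, total] for count, total in counts.items())
    # Assumption: a file passes when all lines agree (or all match `expected`)
    success = (len(result) <= 1) if expected is None else not unexpected
    return {"result": result, "lines": checked,
            "success": success, "unexpected": unexpected}

sample = ["a|b|c\n", "d|e|f\n", "g|h|i|j\n"]
report = delimiter_histogram(sample, "|", expected=2)
print(report["result"])      # [[2, 2], [3, 1]]
print(report["unexpected"])  # [(3, 3)]
```

In the sample, two lines carry 2 delimiters and one carries 3, so the histogram has two entries and the third line is listed as deviating from the expected count.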
Qualifiers
If a delimiter check reports inconsistent delimiter counts across lines (e.g., MultiDelimiterCheck fails), verify whether text qualifiers (like quotes " enclosing fields) are the actual cause. Delimiters appearing inside these qualified fields can be miscounted if the qualifier parameter isn't specified in the check. Rerun the check providing the correct qualifier (e.g., "qualifier": "\""). If the check then succeeds, the initial failure was due to qualifier handling, not truly inconsistent delimiters; ensure this same qualifier is also specified when subsequently loading the data (e.g., using CreateTableFromFile).
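To see why a missing qualifier inflates the count, the following Python sketch counts delimiters with and without qualifier handling. The function `count_delimiters` is hypothetical and makes a simplifying assumption: every qualifier character toggles an "inside a field" state, which also copes with doubled quotes but not with genuinely unbalanced ones.

```python
def count_delimiters(line, delimiter, qualifier=None):
    """Count delimiters in a line, skipping any that sit inside qualified fields.

    Hypothetical sketch; the real check may treat edge cases differently.
    """
    if not qualifier:
        return line.count(delimiter)
    count = 0
    inside = False  # True while scanning between an opening and a closing qualifier
    i = 0
    while i < len(line):
        if line.startswith(qualifier, i):
            inside = not inside
            i += len(qualifier)
        elif not inside and line.startswith(delimiter, i):
            count += 1
            i += len(delimiter)
        else:
            i += 1
    return count

row = 'id|"Smith | Sons"|42'
print(count_delimiters(row, "|"))                 # 3
print(count_delimiters(row, "|", qualifier='"'))  # 2
```

The pipe inside the quoted field is counted as a delimiter in the first call and ignored in the second, which is the difference between an inconsistent and a consistent result for rows like this one.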
Files run without the qualifier parameter (delimiters inside qualified fields are counted, so the counts are inconsistent):
{
"method": "MultiDelimiterCheck",
"path": "D:/datajet/datasources/preload/transactions",
"delimiter": "|",
"sampleSize": 0,
"throwError": true
}
{
"result": [
{
"file": "D:\\datajet\\datasources\\preload\\transactions\\s_data_0_2_0.csv",
"result": [
[
19,
3993768
],
[
20,
148
],
[
21,
2
],
[
22,
1
]
],
"fileSize": 1439981805,
"lines": 3993919,
"success": false
},
{
"file": "D:\\datajet\\datasources\\preload\\transactions\\s_data_0_3_0.csv",
"result": [
[
19,
4153219
],
[
20,
125
],
[
22,
2
]
],
"fileSize": 1456094688,
"lines": 4153346,
"success": false
},
{
"file": "D:\\datajet\\datasources\\preload\\transactions\\s_data_1_6_1.csv",
"result": [
[
19,
7710310
],
[
20,
257
],
[
21,
2
],
[
22,
7
]
],
"fileSize": 2666346502,
"lines": 7710576,
"success": false
}
],
"success": false,
"files": 3,
"executed": true,
"errors": [],
"millis": 11895,
"projectEpoch": 638801677496440520,
"method": "MultiDelimiterCheck",
"exmillis": 11916
}
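For comparison, when "qualifier" is specified the entry for s_data_0_2_0.csv reports a single count:
{
"file": "D:\\datajet\\datasources\\preload\\transactions\\s_data_0_2_0.csv",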
"result": [
[19, 3993919] // All 3,993,919 lines have 19 delimiters
],
"fileSize": 1439981805,
"lines": 3993919,
"success": true
}
Files run with the qualifier parameter (delimiters inside qualified fields are ignored, so the counts are consistent):
{
"method": "MultiDelimiterCheck",
"path": "D:/datajet/datasources/preload/transactions",
"delimiter": "|",
"sampleSize": 0,
"qualifier": "\"",
"throwError": true
}
{
"result": [
{
"file": "D:\\datajet\\datasources\\preload\\transactions\\s_data_0_2_0.csv",
"result": [
[
19,
3993919
]
],
"fileSize": 1439981805,
"lines": 3993919,
"success": true
},
{
"file": "D:\\datajet\\datasources\\preload\\transactions\\s_data_0_3_0.csv",
"result": [
[
19,
4153346
]
],
"fileSize": 1456094688,
"lines": 4153346,
"success": true
},
{
"file": "D:\\datajet\\datasources\\preload\\transactions\\s_data_1_6_1.csv",
"result": [
[
19,
7710576
]
],
"fileSize": 2666346502,
"lines": 7710576,
"success": true
}
],
"success": true,
"files": 3,
"executed": true,
"errors": [],
"millis": 13290,
"projectEpoch": 638801677496440520,
"method": "MultiDelimiterCheck",
"exmillis": 13307
}
For comparison, when "qualifier" is omitted the entry for s_data_0_2_0.csv reports several counts:
{
"file": "D:\\datajet\\datasources\\preload\\transactions\\s_data_0_2_0.csv",
"result": [
[19, 3993768], // 3,993,768 lines have 19 delimiters
[20, 148], // 148 lines have 20 delimiters
[21, 2], // 2 lines have 21 delimiters
[22, 1] // 1 line has 22 delimiters
],
"fileSize": 1439981805,
"lines": 3993919,
"success": false
}
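When many files are checked, picking out the ones that report more than one set of counts can be scripted. The Python sketch below assumes the response has already been parsed into a dictionary shaped like the examples above; the helper name `files_needing_review` is hypothetical, and the sample data is a trimmed copy of the example output with shortened file names.

```python
def files_needing_review(response):
    """Return (file, counts) pairs for files whose lines disagree on delimiter count."""
    flagged = []
    for entry in response.get("result", []):
        counts = entry.get("result", [])
        # More than one [delimiter_count, line_count] pair means inconsistent lines
        if len(counts) > 1:
            flagged.append((entry["file"], counts))
    return flagged

# Trimmed from the example responses above (file names shortened for illustration)
response = {
    "result": [
        {"file": "s_data_0_2_0.csv",
         "result": [[19, 3993768], [20, 148], [21, 2], [22, 1]],
         "success": False},
        {"file": "s_data_0_3_0.csv",
         "result": [[19, 4153346]],
         "success": True},
    ],
    "success": False,
}

for name, counts in files_needing_review(response):
    print(name, counts)
```

Files returned by the helper are candidates for a second run with "qualifier" set, as described in the Qualifiers section.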