- 3 Minutes to read
- Print
- DarkLight
- PDF
HOW TO - Upsert data
- 3 Minutes to read
- Print
- DarkLight
- PDF
This article demonstrates using simple scripting methods to upsert data. Upsert is the process of inserting rows into a table if they do not already exist (DataJet refers to this as "appending") and updating rows if they already exist. As an analytical database, DataJet does not support updating records in the way that a transactional database would, however, records can be updated using the following process.
Upsert Data Example Script
The following demonstrates, using 2 source files (file1.xlsx and file2.xlsx), how to create a modified table that has a mixture or unchanged, updated and new records.
- Data is initially loaded into table1.
- Updated data is loaded into table2 - this contains a mixture of new records and updated records.
- Table2 is linked to Table1
- Dataset manipulation is used to identify records in table1 that are unchanged in table2, and these are dumped to a new table called "upserted" - these are the KEEP records.
- All records in Table2 are appended to the new table (ExportIntoTable with append=true). This adds CHANGED records and NEW records to the upserted table.
- At this point, Table1 and Table2 can be deleted.
(Note in the example below a SortTable is executed, so that the upserted records can be clearly seen. SortTable should only be used on very small tables (<100m records, not too many fields))
Pseudo-Code:
Check base table BASE exists
Load UPSERT table
(create keys and join)
//Calculate the records to keep on the BASE table
create KEEPDATASET as field on BASE
//Export Records to keep from BASE into NEWTABLE
ExportIntoTable KEEPDATASET of BASE into NEWTABLE
//Append new and modified records in the UPSERT table into NEWTABLE
ExportIntoTable (APPEND) UPSERT into NEWTABLE
//Rename NEWTABLE to BASE
DeleteTable BASE
ExportIntoTable NEWTABLE into BASE
//Clean up temporary tables
DeleteTable NEWTABLE
DeleteTable UPSERT
Script overview
- NOTE: Most likely CreateTableFromFile will be used in place of CreateTableFromWorkbook
- Do not try to sort very large tables
Script
[
{
"method": "DropProject",
"deleteFiles": true,
"project": "Upsert",
"name": "Upsert"
},
{
"method": "remark",
"text": "notes",
"description": "table1 is the original 20 records.\r\n\r\ntable2 has 10 new records and an update for 5 of the original",
"project": "Upsert"
},
{
"method": "CreateProject",
"project": "Upsert",
"name": "Upsert"
},
{
"method": "OpenProject",
"project": "Upsert",
"name": "Upsert"
},
{
"method": "CreateTableFromWorkbook",
"action": "LOAD",
"table": "table1",
"filename": "%DATAPATH%upsert/file1.xlsx",
"worksheet": "Sheet1",
"headerRow": 1,
"startRow": 0,
"endRow": 0,
"worksheetRowName": "",
"ignoreEmptyRows": true,
"allStringTypes": false,
"leadingZeroAsStringType": false,
"project": "Upsert"
},
{
"method": "remark",
"text": "load upsert file",
"description": "",
"project": "Upsert"
},
{
"method": "CreateTableFromWorkbook",
"action": "LOAD",
"table": "table2",
"filename": "%DATAPATH%upsert/file2.xlsx",
"worksheet": "Sheet1",
"headerRow": 1,
"startRow": 0,
"endRow": 0,
"worksheetRowName": "",
"ignoreEmptyRows": true,
"allStringTypes": false,
"leadingZeroAsStringType": false,
"project": "Upsert"
},
{
"method": "AddJoin",
"primaryTable": "table2",
"primaryField": "custid",
"foreignTable": "table1",
"foreignField": "custid",
"project": "Upsert"
},
{
"method": "remark",
"text": "dump records from table1 that have not changed",
"description": "",
"project": "Upsert"
},
{
"method": "ExportIntoTable",
"table": "upserted",
"append": false,
"fields": [
"table1.custid",
"table1.gender",
"table1.age",
"table1.seq"
],
"dataSet": {
"logic": "or",
"name": "SubSet",
"strict": true,
"set": [
{
"logic": "or",
"stype": "TABLE",
"entity": "table1"
},
{
"logic": "remove",
"stype": "TABLE",
"entity": "table2"
}
]
},
"project": "Upsert"
},
{
"method": "remark",
"text": "append all records from table2",
"description": "",
"project": "Upsert"
},
{
"method": "ExportIntoTable",
"table": "upserted",
"append": true,
"fields": [
"table2.custid",
"table2.gender",
"table2.age",
"table2.seq"
],
"dataSet": {
"logic": "or",
"name": "SubSet",
"strict": true,
"set": [
{
"logic": "or",
"stype": "TABLE",
"entity": "table2"
}
]
},
"project": "Upsert"
},
{
"method": "remark",
"text": "optional sort for clarity of view..",
"description": "",
"project": "Upsert"
},
{
"method": "SortTable",
"table": "upserted",
"sortKey": "custid",
"project": "Upsert"
}
]
Data Files
File1.xlsx
custid | gender | age | seq |
1 | m | 38 | 1 |
2 | f | 24 | 1 |
3 | f | 2 | 1 |
4 | m | 79 | 1 |
5 | m | 54 | 1 |
6 | m | 4 | 1 |
7 | f | 21 | 1 |
8 | m | 58 | 1 |
9 | f | 28 | 1 |
10 | m | 53 | 1 |
11 | m | 72 | 1 |
12 | f | 63 | 1 |
13 | f | 31 | 1 |
14 | f | 77 | 1 |
15 | m | 15 | 1 |
16 | m | 67 | 1 |
17 | m | 39 | 1 |
18 | f | 76 | 1 |
19 | f | 36 | 1 |
20 | f | 43 | 1 |
File2.xlsx
custid | gender | age | seq |
5 | m | 54 | 2 |
6 | m | 4 | 2 |
11 | m | 72 | 2 |
13 | f | 31 | 2 |
14 | f | 77 | 2 |
21 | m | 32 | 2 |
22 | m | 69 | 2 |
23 | m | 66 | 2 |
23 | f | 24 | 2 |
25 | f | 26 | 2 |
26 | m | 54 | 2 |
27 | m | 25 | 2 |
28 | m | 18 | 2 |
29 | f | 85 | 2 |
30 | f | 70 | 2 |