HOW TO - Upsert data
  • 3 Minutes to read
  • Dark
    Light
  • PDF

HOW TO - Upsert data

  • Dark
    Light
  • PDF

Article summary

This article demonstrates using simple scripting methods to upsert data.   Upsert is the process of inserting rows into a table if they do not already exist (DataJet refers to this as "appending") and updating rows if they already exist.   As an analytical database, DataJet does not support updating records in the way that a transactional database would, however, records can be updated using the following process.


Upsert Data Example Script

The following demonstrates, using 2 source files (file1.xlsx and file2.xlsx), how to create a modified table that has a mixture or unchanged, updated and new records.

  1. Data is initially loaded into table1.    
  2. Updated data is loaded into table2 - this contains a mixture of new records and updated records.
  3. Table2 is linked to Table1
  4. Dataset manipulation is used to identify records in table1 that are unchanged in table2, and these are dumped to a new table called "upserted" - these are the KEEP records.
  5. All records in Table2 are appended to the new table (ExportIntoTable with append=true).   This adds CHANGED records and NEW records to the upserted table.
  6. At this point, Table1 and Table2 can be deleted.

(Note in the example below a SortTable is executed, so that the upserted records can be clearly seen.   SortTable should only be used on very small tables (<100m records, not too many fields))

Pseudo-Code:

Check base table BASE exists 

Load UPSERT table 

(create keys and join) 

//Calculate the records to keep on the BASE table
create KEEPDATASET as field on BASE

//Export Records to keep from BASE into NEWTABLE
ExportIntoTable KEEPDATASET of BASE into NEWTABLE

//Append new and modified records in the UPSERT table into NEWTABLE
ExportIntoTable (APPEND) UPSERT into NEWTABLE

//Rename NEWTABLE to BASE
DeleteTable BASE
ExportIntoTable NEWTABLE into BASE

//Clean up temporary tables
DeleteTable NEWTABLE
DeleteTable UPSERT


Script overview

  • NOTE:  Most likely CreateTableFromFile will be used in place of CreateTableFromWorkbook
  • Do not try to sort very large tables

Script

[
  {
    "method": "DropProject",
    "deleteFiles": true,
    "project": "Upsert",
    "name": "Upsert"
  },
  {
    "method": "remark",
    "text": "notes",
    "description": "table1 is the original 20 records.\r\n\r\ntable2 has 10 new records and an update for 5 of the original",
    "project": "Upsert"
  },
  {
    "method": "CreateProject",
    "project": "Upsert",
    "name": "Upsert"
  },
  {
    "method": "OpenProject",
    "project": "Upsert",
    "name": "Upsert"
  },
  {
    "method": "CreateTableFromWorkbook",
    "action": "LOAD",
    "table": "table1",
    "filename": "%DATAPATH%upsert/file1.xlsx",
    "worksheet": "Sheet1",
    "headerRow": 1,
    "startRow": 0,
    "endRow": 0,
    "worksheetRowName": "",
    "ignoreEmptyRows": true,
    "allStringTypes": false,
    "leadingZeroAsStringType": false,
    "project": "Upsert"
  },
  {
    "method": "remark",
    "text": "load upsert file",
    "description": "",
    "project": "Upsert"
  },
  {
    "method": "CreateTableFromWorkbook",
    "action": "LOAD",
    "table": "table2",
    "filename": "%DATAPATH%upsert/file2.xlsx",
    "worksheet": "Sheet1",
    "headerRow": 1,
    "startRow": 0,
    "endRow": 0,
    "worksheetRowName": "",
    "ignoreEmptyRows": true,
    "allStringTypes": false,
    "leadingZeroAsStringType": false,
    "project": "Upsert"
  },
  {
    "method": "AddJoin",
    "primaryTable": "table2",
    "primaryField": "custid",
    "foreignTable": "table1",
    "foreignField": "custid",
    "project": "Upsert"
  },
  {
    "method": "remark",
    "text": "dump records from table1 that have not changed",
    "description": "",
    "project": "Upsert"
  },
  {
    "method": "ExportIntoTable",
    "table": "upserted",
    "append": false,
    "fields": [
      "table1.custid",
      "table1.gender",
      "table1.age",
      "table1.seq"
    ],
    "dataSet": {
      "logic": "or",
      "name": "SubSet",
      "strict": true,
      "set": [
        {
          "logic": "or",
          "stype": "TABLE",
          "entity": "table1"
        },
        {
          "logic": "remove",
          "stype": "TABLE",
          "entity": "table2"
        }
      ]
    },
    "project": "Upsert"
  },
  {
    "method": "remark",
    "text": "append all records from table2",
    "description": "",
    "project": "Upsert"
  },
  {
    "method": "ExportIntoTable",
    "table": "upserted",
    "append": true,
    "fields": [
      "table2.custid",
      "table2.gender",
      "table2.age",
      "table2.seq"
    ],
    "dataSet": {
      "logic": "or",
      "name": "SubSet",
      "strict": true,
      "set": [
        {
          "logic": "or",
          "stype": "TABLE",
          "entity": "table2"
        }
      ]
    },
    "project": "Upsert"
  },
  {
    "method": "remark",
    "text": "optional sort for clarity of view..",
    "description": "",
    "project": "Upsert"
  },
  {
    "method": "SortTable",
    "table": "upserted",
    "sortKey": "custid",
    "project": "Upsert"
  }
]

Data Files

File1.xlsx                                                                                                                                                                     

custidgenderageseq
1m381
2f241
3f21
4m791
5m541
6m41
7f211
8m581
9f281
10m531
11m721
12f631
13f311
14f771
15m151
16m671
17m391
18f761
19f361
20f431

File2.xlsx                                                                                                                             

custidgenderageseq
5m542
6m42
11m722
13f312
14f772
21m322
22m692
23m662
23f242
25f262
26m542
27m252
28m182
29f852
30f702

Results



Was this article helpful?