Deep Dive - Datajet Joins
  • 3 Minutes to read
  • Dark
    Light
  • PDF

Deep Dive - Datajet Joins

  • Dark
    Light
  • PDF

Article summary

Article In Progress...

DataJet is a NoSQL Analytical database, as opposed to a Transactional Database.   This means that where it is necessary to work across tables - for example when aggregating, copying or querying data - tables need to be joined ahead of time rather than as part of a SQL statement.  Usually this joining happens as part of the database build, although depending on user permissions, joins can still be created once the database has been "published".

Once a join has been added between tables, cross-table calculations will be very efficient and there will be no need to explicitly provide any further join information.   This strategy is by design to support rapid NoSQL analytics, although the DataJet architecture allows for over-riding default joins on a case by case basis.

NOTE:  Please speak to your system administrator if you need to use something other than the default join provided as part of your data model.

Types of Join

Direct Join

A direct join is a join that goes directly from a Primary Table (ONE side of a join) to a Foreign Table (MANY side of a join).

This is the equivalent of a SQL Inner Join:


Direct Joins support the following cross-table operations:

  • Queries - building query trees to filter data and select records
  • Dataset manipulation - using dataset manipulation and intrinsic functions to calculate inner and outer joins and sample and filter data
  • Aggregation - summarizing linked foreign fields on the primary table - e.g., sum of transactions, customer profit...
  • Copy Up - copying linked records from the FOREIGN table to the PRIMARY table
  • Copy Down - copying linked records from the PRIMARY table to the FOREIGN table


Non-Discrete Primary Keys

If a primary key is not discrete, the primary record that is joined will be arbitrary.   Record selection can be forced by using a filter to build a primary key that keeps the desired record and removes all others.   (See Concat, LMATH (A) and DMATH (A) for more information)


Duplicate values when copying up

To control which record is copied up when using the Copy Up function, use the foreign key filter.  Build a filter that prioritises the correct record, and then apply the filter to the copyup method:

Without a filter, the record that is selected will be arbitrary.  It may be necessary to build a rank field using Engineering | Extensions | Scan Rank.


Compound Join

Compound Joins link 2 foreign tables via an intermediate table:

Intermediate tables can be created (if they don't already exist) using Engineering | Make Key Table.    

Compound Joins are more limited than direct joins and support the following cross-table operations:

  • Queries - building query trees to filter data and select records

Other operations, such as copying data and aggregating will require multiple step operations if crossing a compound join.

Join Keys

A join is created from 2 matching keys, one on the primary table, one on the foreign table.    The most efficient joins are built using Integer keys.

General advice for optimising the building of joins:

  • Integer keys are best.  Next best is LongInteger.    Strings are always slowest.
  • Maximum size of primary key is 1 billion discrete values
  • No practical limit on Foreign Key but greater than 16 billion may have general decrease in performance
  • If a primary key is not unique, join will be to record with first occurrence of a value
  • Field Types must match (ie, string must be joined to string, integer to integer etc)
  • Data types should match.    
  • Use casting functions to create matching FieldDataTypes. 

Join Layers

It is possible to build layered joins between tables via the Engineering | Joins dialog.  Up to 15 layers of join can be added between tables.   To create join layers:

  1. Join the tables (using Add Join, or API:AddJoin)
  2. Add layers using Add Layer or API:AddJoinLayer.   

Note that when resolving a join, the order in which the layers were added will affect results.

For further details on this, see Join Layers

Creating and Deleting Joins

Joins are created and deleted:

  • via the Engineering | Joins dialog
  • in script using AddJoin, DeleteJoin and AddJoinLayer.  

Note it is not possible to delete individual Join Layers.


Key Tables

A Key table is the Primary Table in a Primary -< Foreign relationship.  To create a Key Table, use the Make Key Table functionality.  See Make Key Table for further details.

Join As

This is reserved for future use in order to provide backwards compatibility.

Limits and Constraints

TODO:



Was this article helpful?