Introduction to Data Injection

Overview

What is injected data?

Injected data is data that is stored in one project but accessed, read-only, by other projects (potentially on other realms). Injecting data therefore allows data to be shared across projects and realms. Depending on the system architecture, data may be injected across:

  • projects on the same realm
  • projects on different realms/DJ servers.

Injecting data avoids loading multiple instances of the same dataset, because multiple projects can use the same data source. This is particularly useful when working with large data sources that take several hours to load and require a lot of storage.

Once the injection has completed, the injected table appears in the consumer project and can then be used to engineer and derive further data. Injected tables are displayed in green, and injected fields are read-only within the consumer project. However, new fields can be created on an injected table.

Example: sales table injected from Demonstration project into BikeData2 project:

Terminology

Core concepts

  • Project - a project is a collection of data objects (tables, datasets, templates, collections), reports, joins, dashboards and scripts.
  • Package - a package is a set of files containing instructions for using data from one project within another.
  • Realm - a realm is a collection of data (consisting of projects, metadata and data objects) and users that together provide a functional, siloed domain of operation.
  • Storage Hub - the storage hub is a file server that stores packages and scripts for sharing across realms and projects.

Projects

Projects can be of the following types:

  1. Native projects - projects that do not contain any injected data. They contain only local data.
  2. Consumer projects - projects that inject or share data from one or more packages. They may contain a mix of local and injected data.
  3. Source projects - projects that have been used to create a package. They may be native projects or consumer projects.

Projects can have:

  1. Input Packages - packages that are injected into a project
  2. Output Packages - packages that are created from a project
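Because a source project is also either native or consumer, a project can play more than one role at once. The following is a minimal sketch of these definitions, assuming a project can be summarised by its lists of input and output packages; `ProjectInfo` and `project_roles` are hypothetical names for illustration, not part of the product API.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectInfo:
    """Hypothetical summary of a project; the field names are illustrative only."""
    name: str
    input_packages: list = field(default_factory=list)   # packages injected into the project
    output_packages: list = field(default_factory=list)  # packages created from the project


def project_roles(project: ProjectInfo) -> set:
    """Return the roles a project plays, following the definitions above."""
    # Every project is either native (local data only) or a consumer (has input packages).
    roles = {"consumer" if project.input_packages else "native"}
    # Any project, native or consumer, becomes a source once a package is created from it.
    if project.output_packages:
        roles.add("source")
    return roles


print(sorted(project_roles(ProjectInfo("Demonstration", output_packages=["Sales_Pkg"]))))
# ['native', 'source']
print(sorted(project_roles(ProjectInfo("BikeData2", input_packages=["Sales_Pkg"]))))
# ['consumer']
```

The roles are returned as a set rather than a single label because "source" is orthogonal to the native/consumer distinction.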

Packages

Packages can be of the following composition types:

  1. Simple/1st Tier - a package created from a native source project (i.e., a project containing only local data).
  2. Compound/Multi-Tier - a package created from a source project that contains at least one injected package.

An orphan package is a consumer (compound) package, i.e., a package created from a consumer project, that contains an injected package whose source project or data is no longer valid or no longer exists.
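The composition rules and the orphan definition can be sketched as a small tree walk. This is a hypothetical model, not the product's package format: it assumes each package records its source project and the packages that were injected into that project when it was built. Checking lower tiers recursively is an assumption of the sketch; the definition above only mentions the directly injected package. The package names `Sales_Pkg` and `Bike_Pkg` are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical model: a package, its source project, and the packages
    that were injected into that source project when the package was built."""
    name: str
    source_project: str
    injected: list = field(default_factory=list)


def composition(pkg: Package) -> str:
    """'simple' (1st tier) if built from a native project, else 'compound' (multi-tier)."""
    return "compound" if pkg.injected else "simple"


def is_orphan(pkg: Package, valid_projects: set) -> bool:
    """True if an injected package's source project is no longer valid.

    Walking down through lower tiers is an assumption of this sketch.
    """
    for child in pkg.injected:
        if child.source_project not in valid_projects or is_orphan(child, valid_projects):
            return True
    return False


sales = Package("Sales_Pkg", "Demonstration")
bikes = Package("Bike_Pkg", "BikeData2", injected=[sales])
print(composition(sales), composition(bikes))            # simple compound
print(is_orphan(bikes, {"Demonstration", "BikeData2"}))  # False
print(is_orphan(bikes, {"BikeData2"}))                   # True
```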

Data Types

The following types of data can be seen in projects:

  1. Local - data that has been loaded directly into the project, either from a file or from another data source. Local data is owned by the project where it was loaded.
  2. Injected - data that has been injected via a package and is actually owned by the package's source project. Injected data is displayed in green.


Realms

Two types of realm configuration are possible:

  1. Single-realm - packages are created and used on the same server.
  2. Cross-realm - packages are created on one server and consumed on another. This is only available if a storage hub is configured.

Usage Notes

  • Deleting a table in the source project also removes the corresponding injected table from all consumer projects.
  • To remove an injected table, right-click the table in Database Explorer and choose "detach". Alternatively, use the DetachTable API.
  • Derived data can be created on an injected table within a consumer project.
  • Derived data created on an injected table remains on the table even after the table has been detached. Detaching a table therefore may not delete the table from the consumer project; instead, the table changes from green to black.
  • Additions and changes to the source table are NOT reflected in the injected table. It is important to have a process in place that prevents source data from being modified once it has been injected into another project.
  • From v6.11.11.01 onwards, consumer projects display a notification if a package they are using has been modified or deleted (see GetDependencies and the Release Notes for more information).
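The read-only and detach rules in the notes above can be summarised as a small state model. This is a minimal, hypothetical sketch (the `ConsumerTable` class and its methods are illustrative, not a product API), assuming a table in a consumer project is just a set of injected fields plus a set of locally derived fields.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConsumerTable:
    """Hypothetical model of a table in a consumer project (not a product API)."""
    name: str
    injected_fields: set                              # read-only, owned by the source project
    derived_fields: set = field(default_factory=set)  # created locally in the consumer project

    @property
    def colour(self) -> str:
        # Injected tables are shown in green; local tables in black.
        return "green" if self.injected_fields else "black"

    def add_derived_field(self, field_name: str) -> None:
        # Injected fields are read-only, but new fields may be created on the table.
        if field_name in self.injected_fields:
            raise PermissionError(f"{field_name!r} is injected and read-only")
        self.derived_fields.add(field_name)

    def detach(self) -> Optional["ConsumerTable"]:
        """Drop the injected fields; the table survives only if it carries derived data."""
        self.injected_fields = set()
        return self if self.derived_fields else None


sales = ConsumerTable("sales", {"amount", "order_date"})
sales.add_derived_field("amount_band")
print(sales.colour)                           # green
print(sales.detach() is sales, sales.colour)  # True black
```

Returning `None` from `detach` stands in for the table disappearing from the consumer project when no derived data was created on it.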

Approach

The overall approach to sharing data is always the same:

  1. Prepare source data
    1. Create, load and engineer the data.
  2. Create injection packages
    1. Create a source project.
    2. Make sure data files are accessible from consumer machines.
    3. Use the CreatePackage method to add a package to the storage hub.
  3. Prepare the consumer project to receive data
    1. Create the consumer project.
  4. Inject data into the consumer project(s)
    1. Use InjectPackage to inject a set of tables, joins and reports into a project on the source realm or on other connected realms.


Package Creation

Packages are created from source projects and can be created either from the user interface, or in script. The source project's data is not changed in any way by the creation of a package, other than to flag the project as having been used as a package source.

The creation of a package is very quick, as no data is altered or moved.

User Interface

To create a package from the application, do the following:

  1. Open the project that contains the data to add to the package.
  2. Select Project | Create Package
  3. Provide a name for the package
  4. The package will be added to the storage hub.

Scripting

To create a package using script, use the CreatePackage API call.

{
  "method": "CreatePackage",
  "includeProjectMetadata": true,
  "createMetafile": true,
  "autoLock": false,
  "push": true,
  "name": "BProject22024-10-31-13-57-26",
  "targetProject": "BProject2",
  "project": "BProject2"
}
Tip:
To see the CreatePackage API call that is generated by Project | Create Package:
  1. Open Script | Script Editor.
  2. Select an existing script or create a new one (File | New).
  3. Press RECORD.
  4. Select Project | Create Package.
  5. Enter the project name.
  6. Save.
  7. Press STOP.

The CreatePackage API call will be added to the script.

The data made available in the package is controlled via the tables[] and reports[] properties. By default, all project information is included in the package file.
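When packages are created from an automated rebuild process rather than by recording, it can help to generate the request. The sketch below builds a CreatePackage request with the same keys and values as the recorded example, and reproduces its apparent naming pattern (project name followed by a yyyy-mm-dd-hh-mm-ss timestamp). Assumption: tables[] and reports[] take lists of object names; the exact element format is not shown here, so check the CreatePackage reference. `create_package_call` is a hypothetical helper.

```python
import json
from datetime import datetime
from typing import Optional, Sequence


def create_package_call(project: str,
                        tables: Optional[Sequence[str]] = None,
                        reports: Optional[Sequence[str]] = None,
                        when: Optional[datetime] = None) -> dict:
    """Build a CreatePackage request shaped like the recorded example.

    Assumption: tables[] and reports[] take lists of object names.
    """
    when = when or datetime.now()
    call = {
        "method": "CreatePackage",
        "includeProjectMetadata": True,
        "createMetafile": True,
        "autoLock": False,
        "push": True,  # store the package in the storage hub as a named package
        # The recorded example names the package <project><yyyy-mm-dd-hh-mm-ss>.
        "name": project + when.strftime("%Y-%m-%d-%H-%M-%S"),
        "targetProject": project,
        "project": project,
    }
    # Omitting tables/reports keeps the default: all project information is packaged.
    if tables is not None:
        call["tables"] = list(tables)
    if reports is not None:
        call["reports"] = list(reports)
    return call


print(json.dumps(create_package_call("BProject2", when=datetime(2024, 10, 31, 13, 57, 26)), indent=2))
```

Leaving `tables` and `reports` as `None` keeps the request identical in content to the recorded one.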

Package Injection

Packages are injected using the InjectPackage API:

{
  "method": "InjectPackage",
  "name": "Consumer_Package1",
  "targetProject": "Injection_Test_Basic",
  "includeProjectMetaData": true,
  "sourceName": "Test Consumer Project",
  "sortMode": 0,
  "errorOnMissingDataFiles": false,
  "pull": true,
  "project": "CONSUMER_Test1"
}

This method injects a static set of data objects and data into the target (or consumer) project. InjectPackage works across:

  • projects on the same realm
  • projects on different servers, and hence different realms (assuming all realms have access to the repository data)

The process is as follows:

  1. Prepare data - create a project containing the data that is to be injected.
  2. Create the package, and make sure the project data (i.e., the project repository) is accessible from all consumer realms.
  3. Run InjectPackage on the target realm.

See InjectPackage for more details.
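Step 3 can also be scripted. The sketch below builds an InjectPackage request for a named package using the keys from the InjectPackage example. The example uses different values for "targetProject" and "project", so the hypothetical helper `inject_package_call` passes both through rather than guessing at their relationship. The key is spelled "includeProjectMetaData" here because that is how the InjectPackage example spells it, which differs from the CreatePackage example.

```python
import json


def inject_package_call(package_name: str, target_project: str, project: str,
                        source_name: str, error_on_missing_files: bool = False) -> dict:
    """Build an InjectPackage request for a named package pulled from the storage hub."""
    return {
        "method": "InjectPackage",
        "name": package_name,  # named package stored in the hub
        "targetProject": target_project,
        # Spelled "MetaData" here, as in the InjectPackage example.
        "includeProjectMetaData": True,
        "sourceName": source_name,
        "sortMode": 0,
        "errorOnMissingDataFiles": error_on_missing_files,
        "pull": True,  # fetch the named package from the storage hub
        "project": project,
    }


call = inject_package_call("Consumer_Package1", "Injection_Test_Basic",
                           "CONSUMER_Test1", "Test Consumer Project")
print(json.dumps(call, indent=2))
```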

Package Injection/Creation Combinations

Option: Package Name
  • CreatePackage: to store a package in the hub, use the "name" key and set "push" to true.
  • InjectPackage: to use a package created using the name method, use the "name" key in InjectPackage. "pull" must be set to true.
  • Notes: the storage hub must be set up in the Client Configuration File in order to use named packages.

Option: Package File and/or Path
  • CreatePackage: "path" must be included, and "push" must be false or omitted.
    • If "path" is a short filename, the default path (%OUTPUT%) is used, e.g., "path": "NewPackageFileName.zip"
    • Otherwise, a full path and filename can be specified, e.g., "path": "%DATAPATH%NewPackageFileName.zip"
  • InjectPackage: "path" must be included.
    • If "path" is a short filename, the default path (%OUTPUT%) is used, e.g., "path": "NewPackageFileName.zip"
    • Otherwise, a full path and filename can be specified, e.g., "path": "%DATAPATH%NewPackageFileName.zip"
    • "homePath" must also be provided, e.g., "homePath": "%REPOS%Projectname"
  • Notes: to inject everything, omit Reports[], Tables[], etc.
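The combination rules above lend themselves to a pre-flight check before a script is run. This is a hypothetical sketch: `check_combination` and `resolve_path` are illustrative helpers, and the rule for spotting a "short filename" (no path separator and no %VARIABLE%) is an assumption, not documented behaviour.

```python
def resolve_path(path: str) -> str:
    """Apply the default %OUTPUT% location to a short filename.

    Heuristic (an assumption): a value with no separator and no %VARIABLE% is a short filename.
    """
    if "/" in path or "\\" in path or "%" in path:
        return path
    return "%OUTPUT%" + path


def check_combination(create: dict, inject: dict, hub_configured: bool) -> list:
    """List rule violations for a CreatePackage/InjectPackage pair."""
    problems = []
    if create.get("push"):
        # Named package: stored in, and pulled from, the storage hub.
        if not hub_configured:
            problems.append("named packages need a storage hub in the Client Configuration File")
        if "name" not in create:
            problems.append('CreatePackage: set "name" when "push" is true')
        if "name" not in inject:
            problems.append('InjectPackage: use "name" to inject a named package')
        if inject.get("pull") is not True:
            problems.append('InjectPackage: "pull" must be true')
    else:
        # Package file and/or path: "push" is false or omitted.
        if "path" not in create:
            problems.append('CreatePackage: "path" must be included')
        if "path" not in inject:
            problems.append('InjectPackage: "path" must be included')
        if "homePath" not in inject:
            problems.append('InjectPackage: "homePath" must be provided')
    return problems


named = check_combination({"name": "Pkg1", "push": True}, {"name": "Pkg1", "pull": True},
                          hub_configured=True)
by_file = check_combination({"path": "NewPackageFileName.zip"},
                            {"path": "NewPackageFileName.zip"}, hub_configured=False)
print(named)    # []
print(by_file)  # ['InjectPackage: "homePath" must be provided']
print(resolve_path("NewPackageFileName.zip"))  # %OUTPUT%NewPackageFileName.zip
```

The check branches on "push" because that flag is what separates the two options in the table.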

API Calls

Invalid and Out of Date Packages

Dependencies are added to a project by the following methods:

Once a dependency has been added to a project, if the underlying file (package file or Excel file) changes, the file will be flagged as a changed dependency.

The following are all classed as a changed dependency:

  • Changed date-time stamp on file
  • Missing workbook or package file
  • Changed size of file
  • Mismatch between package and underlying source project. NOTE: this is not detected at the time of injection, but results in the error message "View failed, incomplete table" when attempting to view a DataView from an out-of-sync injected table. To resolve this, rebuild the package file and re-inject it into the consumer project. This usually happens if a source project has been rebuilt without rebuilding or removing the packages that depended on data in the old project.
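The first three criteria above (changed date-time stamp, missing file, changed size) can be illustrated with a file fingerprint. This is a minimal sketch of the idea, not the product's dependency tracker: `Fingerprint`, `fingerprint` and `dependency_changes` are hypothetical names, and the fourth criterion (package/source mismatch) cannot be detected this way.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fingerprint:
    """Timestamp and size recorded for a dependency file (hypothetical bookkeeping)."""
    mtime_ns: int
    size: int


def fingerprint(path: str) -> Optional[Fingerprint]:
    """Return the file's fingerprint, or None if the file is missing."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return Fingerprint(st.st_mtime_ns, st.st_size)


def dependency_changes(recorded: Fingerprint, path: str) -> list:
    """Compare a file with its recorded fingerprint."""
    current = fingerprint(path)
    if current is None:
        return ["missing file"]
    changes = []
    if current.mtime_ns != recorded.mtime_ns:
        changes.append("changed date-time stamp")
    if current.size != recorded.size:
        changes.append("changed size")
    return changes


# Demonstration on a temporary stand-in for a package file.
with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "NewPackageFileName.zip")
    with open(pkg, "wb") as f:
        f.write(b"v1")
    recorded = fingerprint(pkg)
    print(dependency_changes(recorded, pkg))                    # []
    with open(pkg, "ab") as f:
        f.write(b" rebuilt")
    print("changed size" in dependency_changes(recorded, pkg))  # True
    os.remove(pkg)
    print(dependency_changes(recorded, pkg))                    # ['missing file']
```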
Tip!
Keep a record of packages and their hierarchies; there is no technical protection against ad-hoc data injection processes.

Use the following to track down issues with compound packages:
  • GetDependencies - shows which packages have been injected into a particular project
  • VerifyPackage - shows any data objects (fields + joins) in a package which do not have underlying valid data files.


If a package file is rebuilt, all consumer projects that are using that file will display a dependency notification when they are opened:

 


A notification icon will be displayed in the project explorer:


To get a detailed breakdown of a project's package and file dependencies, run GetDependencies in Script Editor or Runner:


To troubleshoot or verify the status of packages, use VerifyPackage:


Refreshing a package

If a notification is displayed when opening a project, this means some source data on which the project relies has potentially changed.  The following scenarios are all possible when working with injected data:

Scenario: Source project is missing data
  • Symptoms: NO NOTIFICATION WILL BE SHOWN. Attempting to view data gives the message "View failed, incomplete table". Note that the project explorer may still display data accurately, so it is important to view actual records.
  • Resolution:
    1. Rebuild source project
    2. Rebuild package
    3. Rebuild consumer project

Scenario: Source project has been rebuilt since it was last injected
  • Symptoms: tables will be missing from the consumer project, or will have changed from injected (green) to local (black).
  • Resolution:
    1. Rebuild package
    2. Rebuild consumer project

Scenario: Source project has been deleted
  • Symptoms: tables will be missing from the consumer project, or will have changed from injected (green) to local (black).
  • Resolution:
    1. Rebuild source project
    2. Rebuild package
    3. Rebuild consumer project

Scenario: Source project has a mismatch between metadata and package file
  • Symptoms: if there has been a process error during package creation, package files and their metadata description can become disconnected. If this happens, tree-view counts will not match the live counts created in datasets or analysis reports, and there could be an issue injecting the package.
  • Resolution:
    1. Rebuild package
    2. Rebuild consumer project

Scenario: Source project has reached its expiry date
  • NOT YET IMPLEMENTED

Scenario: New packages are available (i.e., the package has been rebuilt since it was last injected)
  • Symptoms: a dependency notification will be displayed.
  • Resolution:
    • The consumer project needs to re-inject the package(s) and re-engineer any data derived from them.
    • NOTE: the dependency notifications will remain unless the project is dropped as part of the rebuild. If a consumer project is NOT dropped (i.e., the rebuild uses one or more of DetachTable, DetachAllTables, DeleteTable, DetachPackage), project dependencies must be reset using DeleteDependencies. This should happen at the start of the consumer project rebuild, before any packages are re-injected.

Scenario: Package has been deleted or moved
  • Symptoms: a dependency notification will be displayed.
  • Notes: the consumer project will continue to work, so long as the underlying source data has not been deleted, moved or edited.
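The ordering constraint in the "New packages are available" scenario is easy to get wrong in a rebuild script. The sketch below encodes the resolutions above as data and produces an ordered step list. It is hypothetical: the scenario keys and step wording are illustrative labels, and only DeleteDependencies and InjectPackage are names taken from the text.

```python
# Hypothetical summary of the scenarios above; the keys are illustrative labels.
RESOLUTIONS = {
    "source project is missing data": ["rebuild source project", "rebuild package", "rebuild consumer project"],
    "source project rebuilt since last injection": ["rebuild package", "rebuild consumer project"],
    "source project deleted": ["rebuild source project", "rebuild package", "rebuild consumer project"],
    "metadata/package file mismatch": ["rebuild package", "rebuild consumer project"],
}


def consumer_rebuild_plan(packages: list, drop_project: bool) -> list:
    """Ordered steps for re-injecting rebuilt packages into a consumer project."""
    if drop_project:
        # Dropping the project also clears its dependency notifications.
        steps = ["drop and recreate the consumer project"]
    else:
        # Detach/delete-based rebuilds must reset dependencies first,
        # before any package is re-injected.
        steps = ["DeleteDependencies", "detach or delete the previously injected tables"]
    steps += [f"InjectPackage {name}" for name in packages]
    steps.append("re-engineer derived data")
    return steps


for step in consumer_rebuild_plan(["Consumer_Package1"], drop_project=False):
    print(step)
```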

Package structure and architecture

For more information on package structure and content, see Packages.

Package Management and Administration

Packages can have the following statuses:

  • Online - package file is ready for use
  • Invalid - package meets one of the following criteria:
    • Source project is missing
    • Source project has been reloaded since package was built
    • Package depends on another package that is itself invalid
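The three Invalid criteria combine recursively: a package is only Online if its whole dependency chain is. A minimal sketch, assuming each package records when it was built and a map of when each project was last (re)loaded is available; `PackageRecord` and `package_status` are hypothetical and do not reflect the product's package metadata format.

```python
from dataclasses import dataclass, field


@dataclass
class PackageRecord:
    """Hypothetical bookkeeping record; not the product's package metadata format."""
    name: str
    source_project: str
    built_at: float                                  # when the package was built
    depends_on: list = field(default_factory=list)  # packages injected into its source project


def package_status(pkg: PackageRecord, last_loaded: dict) -> str:
    """Return 'Online' or 'Invalid'; last_loaded maps project name -> last (re)load time."""
    if pkg.source_project not in last_loaded:
        return "Invalid"  # source project is missing
    if last_loaded[pkg.source_project] > pkg.built_at:
        return "Invalid"  # source project reloaded since the package was built
    if any(package_status(dep, last_loaded) == "Invalid" for dep in pkg.depends_on):
        return "Invalid"  # depends on a package that is itself invalid
    return "Online"


base = PackageRecord("Sales_Pkg", "Demonstration", built_at=10)
top = PackageRecord("Bike_Pkg", "BikeData2", built_at=20, depends_on=[base])
print(package_status(top, {"Demonstration": 5, "BikeData2": 15}))   # Online
print(package_status(top, {"Demonstration": 12, "BikeData2": 15}))  # Invalid
```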

The recommended package injection/creation method is to use named packages and the storage hub, as this makes package information available to users who want to inject via the package manager:


Packages and A-B switching

Packages and A-B switching are used when source projects need to be rebuilt without causing any down-time to client/consumer projects that are using the data.

A container package (DSource_Package) is used as a consistent consumer project that will be injected into other projects.
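The switching steps are not spelled out here, so the following is only one possible interpretation, written as a hypothetical sketch. It assumes two alternating source projects (the names "Source_A" and "Source_B" are invented), a container project that injects whichever source is active, and a package built from that container under the stable name DSource_Package. Consumers then always inject the same package name while the standby source is rebuilt.

```python
from dataclasses import dataclass


@dataclass
class ABSwitch:
    """Hypothetical model of A-B switching between two source projects.

    "Source_A", "Source_B" and the container project are illustrative names;
    only DSource_Package comes from the text.
    """
    active: str = "A"

    @property
    def standby(self) -> str:
        return "B" if self.active == "A" else "A"

    def refresh_plan(self) -> list:
        """Steps to refresh data while consumers keep using the active source."""
        s = self.standby
        return [
            f"rebuild Source_{s} and create its package",
            f"detach the Source_{self.active} package from the container project "
            f"and inject the Source_{s} package",
            "rebuild DSource_Package from the container project",
            "consumer projects re-inject DSource_Package (same package name)",
        ]

    def switch(self) -> None:
        """Make the freshly rebuilt source the active one."""
        self.active = self.standby


ab = ABSwitch()
for step in ab.refresh_plan():
    print(step)
ab.switch()
print("active source:", ab.active)  # active source: B
```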


Metadata

Metadata that is applied in a source project will be automatically available in all consumer projects that inject the package.

Troubleshooting

Symptom: "View Failed. Incomplete Table" when viewing a dataset from an injected table
  • Cause: the underlying source project has changed.
  • Resolution:
    1. Rebuild source project
    2. Rebuild package
    3. Rebuild consumer project

Symptom: Package tables take a long time to inject

Symptom: Can't detach injected table

Symptom: Can't delete injected table

Symptom: A dependency notification is displayed when the project is opened








