Introduction to Data Injection
Overview
What is injected data?
Injected data is data that sits in one project but is accessed by other projects (and potentially realms) in a read-only capacity. Injecting data therefore allows data to be shared across projects and realms. Depending on system architecture, data may be injected across:
- projects on the same realm,
- projects on different realms/DJ servers.
Injecting data avoids loading multiple instances of the same dataset, so multiple projects can use the same data source. This is particularly useful when working with large data sources that take multiple hours to load and require a lot of storage.
Once the injection has completed, the injected table will appear in the consumer project and can then be used to engineer and derive further data. Injected tables are green, and the injected fields are read-only within the consumer project. However, new fields can be created on an injected table.
Example: sales table injected from Demonstration project into BikeData2 project:
Terminology
Core concepts
- Project - a collection of data objects (tables, datasets, templates, collections), reports, joins, dashboards and scripts.
- Package - a set of files containing instructions for using data from one project within another.
- Realm - a collection of data (consisting of projects, metadata and data objects) and users that together provide a functional and siloed domain of operation.
- Storage Hub - a file server that stores packages and scripts for sharing across realms and projects.
Projects
Projects can be of the following types:
- Native projects - these are projects that do not contain any injected data. They contain only local data.
- Consumer projects - these are projects that are injecting or sharing data from one or more packages. They may contain a mix of local and injected data.
- Source projects - these are projects that have been used to create a package. They may be native projects, or consumer projects.
Projects can have:
- Input Packages - packages that are injected into a project
- Output Packages - packages that are created from a project
Packages
Packages can be of the following composition types:
- Simple/1st Tier - the package is created from a native source project (i.e., a project containing only local data).
- Compound/Multi-Tier - the package is created from a source project that contains at least one injected package.
An orphan package is a consumer package that contains an injected package whose source project or data is no longer valid or no longer exists.
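The project and package types above follow mechanically from whether a project has input packages. The following is a minimal Python sketch of that rule. The `Project` class and both function names are hypothetical and exist only for illustration; they are not part of the product's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Project:
    """Hypothetical in-memory model of a project (illustration only)."""
    name: str
    # Names of packages injected into this project (its input packages).
    input_packages: List[str] = field(default_factory=list)


def project_type(project: Project) -> str:
    """Native projects hold only local data; consumer projects inject at least one package."""
    return "consumer" if project.input_packages else "native"


def package_composition(source: Project) -> str:
    """Simple/1st Tier if the source project is native, otherwise Compound/Multi-Tier."""
    return "simple" if project_type(source) == "native" else "compound"
```

For example, a package built from a project with no input packages is classified as simple, while a package built from a project that already injects another package is classified as compound.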
Data Types
The following types of data can be seen in projects:
- Local - data that has been loaded directly into the project, either from file or other data source. Local data is owned by the project where it was loaded.
- Injected - data that has been injected via a package, and is actually owned by the package's source project. Injected data is displayed as green.
Realms
Two types of realm configuration are possible:
- Single-realm - packages are created and used on the same server.
- Cross-realm - packages are created on one server and consumed on another. This is only available if a storage hub is configured.
Usage Notes
- Deleting the injected table in the source project will also remove it from all consumer projects
- To remove an injected table, right-click the table in Database Explorer and choose "detach". Alternatively use the DetachTable API.
- It is possible to create derived data on an injected table within a consumer project.
- Derived data that has been created on an injected table will remain on the table even after the table has been detached. This means that detaching a table may not delete the table in the consumer project, but it will change from green to black.
- Additions and changes to the source table will NOT be reflected in the injected table. It is important to have a process in place that prevents source data from being modified once it has been injected in another project.
- From v 6.11.11.01 onwards, consumer projects will display a notification if a package they are using has been modified or deleted. (See GetDependencies and Release Notes for more information)
Approach
The overall approach to sharing data is always the same:
- Prepare Source Data
  - Create, Load, Engineer
- Create Injection Packages
  - Create a source project
  - Make sure data files are accessible from Consumer machines.
  - Use CreatePackage method to add a package to the storage hub
- Prepare Consumer Project to receive data
  - Create Consumer Project
- Inject Data into Consumer Project(s)
  - InjectPackage - Injects a set of tables, joins and reports into a project on source realm or other connected realms
Package Creation
Packages are created from source projects and can be created either from the user interface, or in script. The source project's data is not changed in any way by the creation of a package, other than to flag the project as having been used as a package source.
The creation of a package is very quick, as no data is altered or moved.
User Interface
To create a package from the application, do the following:
- Open the project that contains the data to add to the package.
- Select Project | Create Package
- Provide a name for the package
- The package will be added to the storage hub.
Scripting
To create a package using script, use the CreatePackage API call.
{
  "method": "CreatePackage",
  "includeProjectMetadata": true,
  "createMetafile": true,
  "autoLock": false,
  "push": true,
  "name": "BProject22024-10-31-13-57-26",
  "targetProject": "BProject2",
  "project": "BProject2"
}
To record the call from the user interface rather than writing it by hand:
- Open Script | Script Editor
- Select or create a new script (File | New)
- Press RECORD
- Select Project | CreatePackage
- Enter project name
- Save
- Press STOP
The CreatePackage API call will be added to the new script.
The data made available in the package is controlled via the tables[] and reports[] properties. By default, all project information is included in the package file.
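As an illustration of restricting a package's contents, here is a minimal Python sketch that builds a CreatePackage call as a dictionary and serializes it to JSON. The fixed keys mirror the recorded example above. The lowercase key spelling `tables`/`reports`, the assumption that each takes a list of object names, and the package and table names used are all assumptions for illustration; check the CreatePackage reference for the actual shape.

```python
import json
from typing import List, Optional


def build_create_package(project: str, name: str,
                         tables: Optional[List[str]] = None,
                         reports: Optional[List[str]] = None) -> dict:
    """Build a CreatePackage call; fixed keys mirror the recorded example."""
    call = {
        "method": "CreatePackage",
        "includeProjectMetadata": True,
        "createMetafile": True,
        "autoLock": False,
        "push": True,  # store the named package in the storage hub
        "name": name,
        "targetProject": project,
        "project": project,
    }
    # Assumption: tables[] and reports[] are lists of object names.
    # Leaving them out keeps the default (all project information included).
    if tables is not None:
        call["tables"] = tables
    if reports is not None:
        call["reports"] = reports
    return call


# Hypothetical package that exposes only a "sales" table.
print(json.dumps(build_create_package("BProject2", "BProject2_sales", tables=["sales"]), indent=2))
```

Omitting both arguments produces the same call as the recorded example, apart from the package name.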
Package Injection
Packages are injected using the InjectPackage API:
{
  "method": "InjectPackage",
  "name": "Consumer_Package1",
  "targetProject": "Injection_Test_Basic",
  "includeProjectMetaData": true,
  "sourceName": "Test Consumer Project",
  "sortMode": 0,
  "errorOnMissingDataFiles": false,
  "pull": true,
  "project": "CONSUMER_Test1"
}
This method injects a static set of Data Objects and Data into the target (or consumer) project. InjectPackage works across:
- Projects on the same realm
- Projects on different servers (and hence different realms), assuming all realms have access to the repository data
The process is as follows:
- Prepare Data - create a project containing the data that is to be injected
- Create the package and make sure the project data (i.e., the project repository) is accessible from all consumer realms
- Run InjectPackage on the target realm
See InjectPackage for more details.
Package Injection/Creation Combinations
Option | CreatePackage | InjectPackage | Notes |
---|---|---|---|
Package Name | To store a package in the hub, use the "name" key and set "push" to true. | To use a package created with the name method, use the "name" key in InjectPackage and set "pull" to true. | The storage hub must be set up in the Client Configuration File in order to use named packages. |
Package File and/or Path | "path" must be included. | "path" must be included. | To inject everything, omit Reports[], Tables[] etc. |
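The combinations in the table can be checked before a script is run. The sketch below is a hypothetical pre-flight validator in Python, not a product feature. It assumes that the presence of a "path" key selects file/path mode, and it checks only the rules stated in the table.

```python
from typing import List


def check_package_call(call: dict) -> List[str]:
    """Return a list of problems found in a CreatePackage or InjectPackage call."""
    method = call.get("method")
    if method not in ("CreatePackage", "InjectPackage"):
        return [f"unsupported method: {method!r}"]
    # CreatePackage pushes to the storage hub; InjectPackage pulls from it.
    hub_flag = "push" if method == "CreatePackage" else "pull"
    if "path" in call:
        # Assumption: a "path" key means file/path mode, which needs nothing else.
        return []
    if "name" not in call:
        return ['either "name" or "path" is required']
    if call.get(hub_flag) is not True:
        return [f'named packages need "{hub_flag}": true']
    return []
```

An empty list means the call satisfies the table. For instance, an InjectPackage call with a "name" but "pull" set to false is reported as needing "pull": true.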
API Calls
- CreatePackage - Creates a set of files and a zip file that can be used to inject shared data definitions into consumer projects.
- DeleteDependencies - Deletes dependencies from a project's dependency list
- DeletePackage - Deletes a named package from the storage hub
- DetachAllTables - Detaches all injected tables from a project.
- DetachPackage - Detaches all items injected from the named package
- DetachTable - Removes an injected/shared table from a project
- GetDependencies - Returns a list of project dependencies
- InjectPackage - Injects a shared data package that has been created using CreatePackage.
- VerifyPackage - Verifies a package's status
Invalid and Out of Date Packages
Dependencies are added to a project by the following methods:
- InjectPackage - a dependency is created on the package file
- exceldecode (from worksheet) - a dependency is created on the Excel workbook (key: workbook)
- BulkDecode - a dependency is created on the Excel workbook
- BulkBanding - a dependency is created on the Excel workbook
- CreateTableFromWorkbook - a dependency is created on the Excel workbook
Once a dependency has been added to a project, if the underlying file (package file or Excel file) changes, the file will be flagged as a changed dependency.
The following are all classed as a changed dependency:
- Changed date-time stamp on file
- Missing workbook or package file
- Changed size of file
- Mismatch between package and underlying source project. NOTE: this is not detected at the time of injection, but results in the error message "View failed, incomplete table" when attempting to view a DataView from an out-of-sync injected table. It usually happens when a source project has been rebuilt without rebuilding or removing the packages that depended on data in the old project. To resolve it, rebuild the package file and re-inject it into the consumer project.
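The first three criteria can be checked from file-system information alone. The following Python sketch shows one way such a check could work, by comparing a file against a snapshot taken when the dependency was added. It is an illustration of the idea, not the product's implementation; `DependencySnapshot`, `snapshot` and `changed_reasons` are hypothetical names. The fourth criterion (package and source project out of sync) cannot be detected this way.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass
class DependencySnapshot:
    """Hypothetical record of a dependency file as it looked when it was added."""
    path: str
    size: int
    mtime: float


def snapshot(path: str) -> DependencySnapshot:
    """Capture the size and modification time of a package or workbook file."""
    st = os.stat(path)
    return DependencySnapshot(path=path, size=st.st_size, mtime=st.st_mtime)


def changed_reasons(dep: DependencySnapshot) -> List[str]:
    """Return why the dependency counts as changed; an empty list means unchanged."""
    if not os.path.exists(dep.path):
        return ["missing file"]
    st = os.stat(dep.path)
    reasons = []
    if st.st_mtime != dep.mtime:
        reasons.append("changed date-time stamp")
    if st.st_size != dep.size:
        reasons.append("changed size")
    return reasons
```

Comparing only metadata keeps the check cheap, which matters for large package files, but it also means that rewriting a file with identical content still counts as a change because the time stamp moves.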
Use the following to track down issues with compound packages:
- GetDependencies - shows which packages have been injected into a particular project
- VerifyPackage - shows any data objects (fields + joins) in a package which do not have underlying valid data files.
If a package file is rebuilt, all consumer projects that are using that file will display a dependency notification when they are opened:
A notification icon will be displayed in the project explorer:
To get a detailed breakdown of a project's package and file dependencies, run GetDependencies in Script Editor or Runner:
To troubleshoot or verify the status of packages, use VerifyPackage:
Refreshing a package
If a notification is displayed when opening a project, this means some source data on which the project relies has potentially changed. The following scenarios are all possible when working with injected data:
Scenario | Symptoms | Resolution |
---|---|---|
Source project is missing data | NO NOTIFICATION WILL BE SHOWN. Attempting to view data will give the message "View failed, incomplete table". Note: project explorer may still display data accurately, so it is important to view actual records. | |
Source project has been rebuilt since it was last injected | Tables will be missing from the consumer project, or will have changed from injected (green) to local (black). | |
Source project has been deleted | Tables will be missing from the consumer project, or will have changed from injected (green) to local (black). | |
Source project has a mismatch between metadata and package file | If there has been a process error during package creation, package files and their metadata description could become disconnected. If this happens, there will be a mismatch between tree-view counts and live counts created in datasets or analysis reports, and there could be an issue injecting the package. | |
Source project has reached its expiry date | NOT YET IMPLEMENTED | |
New Packages are available (i.e., Package has been rebuilt since it was last injected) | A dependency notification will be displayed. | |
Package has been deleted or moved | A dependency notification will be displayed. | |
Package structure and architecture
For more information on package structure and content, see Packages.
Package Management and Administration
Packages can have the following status:
- Online - package file is ready for use
- Invalid - package fits one of the following criteria:
  - Source project is missing
  - Source project has been reloaded since package was built
  - Package depends on another package that is itself invalid
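The status rules above are recursive: a compound package is invalid if any package it depends on is invalid. The following Python sketch shows how those rules combine. `PackageInfo` and its fields are hypothetical and for illustration only; the sketch also assumes that an unknown dependency counts as invalid and that dependencies contain no cycles.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class PackageInfo:
    """Hypothetical package record (illustration only)."""
    name: str
    built_at: datetime                    # when the package was built
    source_exists: bool                   # False if the source project is missing
    source_loaded_at: Optional[datetime]  # last (re)load of the source project
    depends_on: List[str] = field(default_factory=list)


def package_status(name: str, packages: Dict[str, PackageInfo]) -> str:
    """Return "Online" or "Invalid" for the named package."""
    pkg = packages.get(name)
    # Assumption: a dependency that cannot be found is treated as invalid.
    if pkg is None or not pkg.source_exists:
        return "Invalid"
    # Source project reloaded after the package was built.
    if pkg.source_loaded_at is not None and pkg.source_loaded_at > pkg.built_at:
        return "Invalid"
    # Invalidity propagates up through compound packages (assumes no cycles).
    if any(package_status(dep, packages) == "Invalid" for dep in pkg.depends_on):
        return "Invalid"
    return "Online"
```

Under these rules, a compound package built on a stale first-tier package is reported as invalid even when its own source project is untouched.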
The recommended Package Injection/Creation method is to use named packages and the storage hub, as this gives information to users who want to inject via the package manager:
Packages and A-B switching
Packages and A-B switching are used when source projects need to be rebuilt without causing any down-time to client/consumer projects that are using the data.
A container package (DSource_Package) is used as a consistent consumer project that will be injected into other projects.
Metadata
Metadata that is applied in a source project will be automatically available in all consumer projects that inject the package.
Troubleshooting
Symptom | Cause | Resolution |
---|---|---|
"View Failed. Incomplete Table" when viewing a dataset from an injected table | Underlying source project has changed. | |
Package takes a long time to inject | | |
Can't detach injected table | | |
Can't delete injected table | | |
A dependency notification is displayed when project is opened | | |