Model Processing - Quick Reference

Article In Progress...

Before processing models in R, make sure that R is installed on the DataJet server (or on the desktop if working on a local machine).

Before processing models in Python, make sure that Python is installed on the DataJet server.

Model Processing Overview

TODO: Video - Integrated Modelling in DataJet

Prepare Source Data

Source data can be either of the following:
The output of any profile report
Any discrete field from the data catalog

Develop Model

Use Engineering | User Defined Fields | Discrete Processor | R to open the R Model Processor, or Engineering | User Defined Fields | Discrete Processor | python to open the Python Model Processor
Select a model template from the SYSTEM or USER library and load it into the code panel using <- use
Modify the model template as necessary, or create a new model from a blank template by editing the code in the processing panel
Test the model by dropping source data into the source panel, selecting Test, and verifying the model output in the output panel
Make the model available to analysis reports by using Push Model to Library

Apply Model

Open an existing Multi-Function Profile report from the Reports tab, or create a new Analysis | Multi-Function Profile report
Select the model from the Model drop-down list
Calculate the model
Save the report and/or export the results to a table or file

Model Processor Console

Source Panel

Drag and drop DataJet objects into the panel to provide source data for the model. The following objects can be added to the source panel:

Reports:
- Profile
- Index Profile
- Multi-Function Profile
Database fields – discrete fields only

Processing Panel

Load an existing model from the SYSTEM or USER library. To load a model:

Select the Library tab
Select a library (USER or SYSTEM)
Select a model
Select <- use
Edit or develop the model in the Code tab

TODO: Review {

Libraries

SYSTEM Library

The SYSTEM library is the standard library and contains various example models.

SYSTEM models are stored in GitHub and are refreshed automatically whenever the Model Processor starts up.

Only models for the selected programming language (i.e. Python or R) are displayed in the Library tab.

If the DataJet server is not connected to the internet, no SYSTEM models will be available.

For SYSTEM models to be available in the Multi-Function Profile report, they must first be “loaded” into the system:

From the Library tab, select a SYSTEM model
Select <- use
From the Code tab, select Push To Library

USER Library

The USER library contains models that are specific to the current installation. These can be:

Custom imported models
SYSTEM models that have been configured for the active installation or otherwise edited
New models that have been developed using the R Model Processing report

USER models are stored in the DataJet MongoDB database on the DataJet server.

For USER models to be available in the Multi-Function Profile report, they must first be “loaded” into the system:

From the Library tab, select a USER model
Select <- use
From the Code tab, select Push To Library

Output Panel

The output panel displays the model output, as generated by the Test option.

TODO: What does it mean if Test is disabled? How do you fix this?

Results Panel

Displays the content of the “grid” object for the model output:

dataModel$grid = grid

finalModel = toJSON(dataModel)

write_file(finalModel,myargs[2])

Python

TODO
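
A minimal Python sketch of the R lines above, offered as an illustration and not as a documented DataJet interface. It assumes that the Python model receives the output file path in the same way as the R model (as the second command-line argument) and that the data model can be treated as a plain dictionary. The function name write_grid and its parameters are hypothetical.

```python
import json


def write_grid(data_model, grid, out_path):
    """Attach a grid to the data model and write the model out as JSON.

    Hypothetical helper: the dictionary layout is assumed, not confirmed.
    """
    # Counterpart of the R line: dataModel$grid = grid
    data_model["grid"] = grid
    # Counterpart of toJSON(dataModel) followed by write_file(finalModel, myargs[2])
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(data_model, handle)
    return data_model
```

In a model script this might be called as write_grid(data_model, grid, sys.argv[2]). If the R model uses jsonlite, note that its toJSON writes length-one vectors as one-element arrays unless auto_unbox = TRUE, so scalar values may be shaped differently from the output of json.dump.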

Data Panel

Displays the content of the “associatedData” item for the model output:

dataModel$associatedData = adata

finalModel = toJSON(dataModel)

write_file(finalModel,myargs[2])

Python

TODO
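
As a hedged illustration for Python, the sketch below packs clustering results under some of the associatedData property names listed in the Data Model Overview table (DataRows, DataColumns, cluster_centers, inertia). The shape of each value is an assumption, and both function names are hypothetical; plottableCentroids is left out because its content is not described here.

```python
import json


def build_associated_data(rows, columns, centers, inertia):
    """Collect clustering output under associatedData property names.

    The value shapes (lists of rows, list of column names, list of
    centre coordinates, single number) are assumptions for illustration.
    """
    return {
        "DataRows": rows,
        "DataColumns": columns,
        "cluster_centers": centers,
        "inertia": inertia,
    }


def attach_associated_data(data_model, associated):
    """Counterpart of dataModel$associatedData = adata; returns the JSON text."""
    data_model["associatedData"] = associated
    return json.dumps(data_model)
```

The returned JSON text would then be written to the model's output file, in the same way as the R example writes finalModel.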

Data Model Overview

Load Data from command line/file
Create JSON data model object
Access dataModel contents

TODO: Complete Grid

object	description	properties
grid	Content of the results grid: grid[[1]] = setup grid; grid[[2]] = results grid; grid[[3]] = ??? TODO	name, tag, headers, data (TODO: Is there more than 1 grid?)
suggestedChart
chart		objectType, name, categories, values, chartType, x, y, z
associatedData		DataRows, DataColumns, cluster_centers, plottableCentroids, inertia
headerInfo		name, type [label, value], datatype, fieldType, axisOverride, graphAble
rows
hasTotalRow		[TRUE, FALSE]
headers
hasTotalColumn
hasNullRow
hasNullColumn

Process data
Update dataModel contents
Write JSON data out to file
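
The steps in this overview can be sketched end to end as follows. This is an illustrative Python outline under stated assumptions: the input is a JSON file whose top-level object is the data model, and the results to be processed sit under a grid key. The names run_model and process are hypothetical.

```python
import json


def run_model(in_path, out_path, process):
    """Load a data model, process its grid, and write the model back out.

    `process` is any function that takes the current grid and returns
    the updated grid. The "grid" key is assumed from the table above.
    """
    # Load data from file and create the JSON data model object
    with open(in_path, encoding="utf-8") as handle:
        data_model = json.load(handle)
    # Access dataModel contents
    grid = data_model.get("grid")
    # Process data
    new_grid = process(grid)
    # Update dataModel contents
    data_model["grid"] = new_grid
    # Write JSON data out to file
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(data_model, handle)
    return data_model
```

Passing the processing step in as a function keeps the load and write steps identical across models, so only the modelling logic changes from one model to the next.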

Working with R

Accessing dataModel Input

Source panel data reaches the model as JSON input. Read it either from the file path supplied as the first command-line argument (myargs[1]) or directly from a file:

#Ex1.1 Using readLines and commandArgs to load data from source panel:
myargs = commandArgs(trailingOnly=TRUE)
data = readLines(myargs[1])

#Ex1.2 Using read_file and commandArgs to load data from source panel:
library(readr)  # provides read_file and default_locale
myargs = commandArgs(trailingOnly=TRUE)
data = read_file(myargs[1], locale = default_locale())

#Ex1.3 Reading data directly from file:
library(readr)  # provides read_file and default_locale
data = read_file("d:/model_in_6379.json", locale = default_locale())

Additional input parameters can be accessed using myargs[3], myargs[4] etc…
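
For the Python model processor, a comparable way to pick up the extra parameters might look like the sketch below. It assumes that Python models receive the same argument order as R models, so that R's myargs[1], myargs[2] and myargs[3] correspond to sys.argv[1], sys.argv[2] and sys.argv[3]; this mapping is an assumption, and split_arguments is a hypothetical helper.

```python
import sys


def split_arguments(argv=None):
    """Split an argv-style list into (input path, output path, extras).

    argv[0] is the script name, so the input and output paths are
    expected at positions 1 and 2 (assumed to match the R convention).
    """
    if argv is None:
        argv = sys.argv
    if len(argv) < 3:
        raise ValueError("expected at least an input path and an output path")
    # Everything after the two paths is treated as an additional parameter
    return argv[1], argv[2], list(argv[3:])
```

Additional parameters arrive as strings, so numeric values need converting (for example with int or float) before use.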

Model Processing in R

R objects such as dataModels and dataFrames are used to apply modelling to source data:

#Ex2.1 Using a for loop to process data
for (i in seq_along(data)) {
  # Placeholder: replace the right-hand side with the required transformation
  data[i] = data[i]
}
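
The loop above assigns each element to itself, so it stands in for real processing. As a sketch of what such a step could look like in the Python model processor, the example below turns grid rows into data-frame-like records and averages one column. It assumes that a grid exposes headers as a list of column names and data as a list of rows; that shape is an assumption, and both function names are hypothetical.

```python
from statistics import mean


def rows_to_records(headers, data):
    """Pair each row with the column names, giving one dict per row."""
    return [dict(zip(headers, row)) for row in data]


def column_mean(records, column):
    """Average a numeric column, skipping rows where the value is missing."""
    values = [r[column] for r in records if r.get(column) is not None]
    return mean(values) if values else None
```

For example, rows_to_records(["segment", "count"], [["A", 10], ["B", 30]]) gives two records, and column_mean over "count" on those records gives 20.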

Model Output in R

Data is returned to the model interface (the output panel) by writing it to a file, typically the output path supplied in the commandArgs character vector as myargs[2]:

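#Ex3.1 Outputting data using commandArgs and writeLines:
writeLines(data, myargs[2])

#Ex3.2 Outputting data using commandArgs and write_file:
library(jsonlite)  # toJSON (jsonlite assumed; rjson and RJSONIO also export toJSON)
library(readr)     # provides write_file
finalModel = toJSON(dataModel)
write_file(finalModel, myargs[2])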