Model Processing - Quick Reference
Article In Progress...
Before processing models in R, make sure that R is installed on the DataJet server (or on the desktop if you are working on a local machine).
Before processing models in Python, make sure that Python is installed on the DataJet server.
Model Processing Overview
TODO: Video - Integrated Modelling in DataJet
Prepare Source Data
- The output of any profile report
- Any discrete field from the data catalog
Develop Model
- Use Engineering | User Defined Fields | Discrete Processor | R to open the R Model Processor, or Engineering | User Defined Fields | Discrete Processor | python to open the Python Model Processor
- Select a model template from the SYSTEM or USER library and load it into the code panel using <- use
- Modify the model template as necessary, or create a new model from a blank template by editing the code in the processing panel
- Test the model: drop source data into the source panel, select Test, and verify the model output in the output panel
- Make model available to analysis reports by using Push Model to Library
Apply Model
- Open an existing Multi-Function Profile report from the Reports tab
or
- Create a new Analysis | Multi-Function Profile report
- Select the model from the Model drop-down list
- Calculate Model
- Save Report, and/or Export Results to table or file
Model Processor Console
Source Panel
Drag and drop DataJet objects into the panel to provide source data for the model. The following objects can be added to the source panel:
- Reports:
  - Profile
  - Index Profile
  - Multi-Function Profile
- Database fields – discrete fields only
Processing Panel
Load an existing model from the SYSTEM or USER libraries. To load a model:
- Select the Libraries tab
- Select a library – USER or SYSTEM
- Select a model
- Select <- use.
- Edit/Develop model in the Code Tab
TODO: Review {
Libraries
SYSTEM Library
The SYSTEM library is the standard library and contains a set of example models.
SYSTEM models are stored in GitHub and are refreshed automatically whenever the Model Processor starts up.
Only models for the selected programming language (Python or R) are displayed in the Libraries tab.
If the DataJet server is not connected to the internet, no SYSTEM models will be available.
For SYSTEM models to be available in the Multi-Function Profile report, they must first be “loaded” into the system:
- From the Libraries tab, select a SYSTEM model
- Select <- use
- From the Code tab, select Push To Library
USER Library
The USER library contains models that are specific to the current installation. These can include:
- custom imported models
- SYSTEM models that have been configured for the active installation or edited in some way
- new models that have been developed using the R Model Processing report
USER models are stored in the DataJet MongoDB database on the DataJet server.
For USER models to be available in the Multi-Function Profile report, they must first be “loaded” into the system:
- From the Libraries tab, select a USER model
- Select <- use
- From the Code tab, select Push To Library
Output Panel
The output panel displays the model output, as generated by the Test option.
TODO: What does it mean if Test is disabled? How do you fix this?
Results Panel
R | Displays the content of the “grid” object for the model output:
dataModel$grid = grid
finalModel = toJSON(dataModel)
write_file(finalModel, myargs[2]) |
Python | TODO |
Data Panel
R | Displays the content of the “associatedData” item for the model output:
dataModel$associatedData = adata
finalModel = toJSON(dataModel)
write_file(finalModel, myargs[2]) |
Python | TODO |
Data Model Overview
- Load Data from command line/file
- Create JSON data model object
- Access dataModel contents
TODO: Complete Grid
object | description | properties |
grid | Content of results grid: grid[[1]] = setup grid, grid[[2]] = results grid, grid[[3]] = ??? TODO | TODO: Is there more than 1 grid? |
suggestedChart | | |
chart | | |
associatedData | | |
headerInfo | | |
rows | | |
hasTotalRow | | [TRUE, FALSE] |
headers | | |
hasTotalColumn | | |
hasNullRow | | |
hasNullColumn | | |
- Process data
- Update dataModel contents
- Write JSON data out to file
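The steps above can be sketched end to end in R. This is a minimal sketch under stated assumptions, not DataJet's reference template: it assumes the jsonlite package supplies fromJSON and toJSON, and that the input file holds a single JSON object. The demo input, the `note` item and the temporary file paths are hypothetical stand-ins so that the sketch runs outside DataJet.

```r
library(jsonlite)  # assumption: jsonlite supplies fromJSON/toJSON

# Stand-in for the arguments DataJet passes: input path, then output path.
# Inside the Model Processor use: myargs = commandArgs(trailingOnly = TRUE)
myargs = c(tempfile(fileext = ".json"), tempfile(fileext = ".json"))

# Hypothetical demo input so the sketch runs on its own.
writeLines('{"hasTotalRow": false, "rows": ["a", "b"]}', myargs[1])

# 1. Load data from file
raw = paste(readLines(myargs[1], warn = FALSE), collapse = "\n")

# 2. Create the JSON data model object (kept as nested lists)
dataModel = fromJSON(raw, simplifyVector = FALSE)

# 3. Access dataModel contents
itemNames = names(dataModel)

# 4. Process data (placeholder: count the rows)
rowCount = length(dataModel$rows)

# 5. Update dataModel contents ("note" is a hypothetical item)
dataModel$note = paste("rows seen:", rowCount)

# 6. Write JSON data out to file
finalModel = toJSON(dataModel, auto_unbox = TRUE)
writeLines(finalModel, myargs[2])
```

Keeping the parsed model as nested lists (simplifyVector = FALSE) is a design choice for the sketch: it avoids jsonlite reshaping arrays into vectors or data frames, so the structure written back mirrors the structure read in.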
Working with R
Accessing dataModel Input
Data from the source panel is passed to the model as JSON input. Read it from the file path held in the first command-line argument (myargs[1], obtained with commandArgs), or read it directly from a known file:
# Ex1.1: Use readLines and commandArgs to load data from the source panel
myargs = commandArgs(trailingOnly = TRUE)
data = readLines(myargs[1])
# Ex1.2: Use read_file (readr package) and commandArgs to load data from the source panel
library(readr)
myargs = commandArgs(trailingOnly = TRUE)
data = read_file(myargs[1], locale = default_locale())
# Ex1.3: Read data directly from a file
data = read_file("d:/model_in_6379.json", locale = default_locale())
Additional input parameters can be accessed using myargs[3], myargs[4], and so on (myargs[2] holds the output file path).
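As a minimal sketch of turning the raw input into an R object and picking up an optional extra parameter: the example below assumes the jsonlite package supplies fromJSON. The helper `get_arg`, the meaning given to the third argument (`threshold`), the demo input and the simulated argument vector are all hypothetical and only illustrate the pattern.

```r
library(jsonlite)  # assumption: jsonlite supplies fromJSON

# Hypothetical helper: return argument i, or a default when it was not supplied.
get_arg = function(args, i, default = NULL) {
  if (length(args) >= i && nzchar(args[i])) args[i] else default
}

# Simulated arguments for illustration; in the Model Processor use
# myargs = commandArgs(trailingOnly = TRUE)
inFile = tempfile(fileext = ".json")
writeLines('{"headers": ["Segment", "Count"]}', inFile)  # hypothetical input
myargs = c(inFile, tempfile(fileext = ".json"), "0.5")

# Parse the JSON text into a list so items can be accessed with $
data = paste(readLines(myargs[1], warn = FALSE), collapse = "\n")
dataModel = fromJSON(data)

# Command-line arguments arrive as character strings, so convert explicitly
threshold = as.numeric(get_arg(myargs, 3, default = "0"))
missing4 = get_arg(myargs, 4, default = "none")
```

Supplying a default for each optional argument lets the same script run both when extra parameters are passed and when they are not.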
Model Processing in R
Use R objects such as the dataModel and data frames to apply modelling to the source data:
# Ex2.1: Use a for loop to process data (placeholder: replace the loop body with the model logic)
for (i in seq_along(data)) {
  data[i] = data[i]
}
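Ex2.1 is a placeholder that leaves the data unchanged. As a sketch of what a processing step might look like with a data frame, the example below works on a hypothetical results table. The column names, values and the 0.25 threshold are invented for illustration and are not part of the DataJet data model.

```r
# Hypothetical results table; in a real model this would be built from the
# parsed dataModel rather than typed in.
results = data.frame(
  segment = c("A", "B", "C"),
  count = c(50, 30, 20),
  stringsAsFactors = FALSE
)

# Vectorised processing: each segment's share of the total count
results$share = results$count / sum(results$count)

# Flag segments whose share exceeds a hypothetical threshold
results$flag = results$share > 0.25
```

Vectorised column operations like these usually replace an explicit for loop in R and are easier to read and verify.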
Model Output in R
Model output is returned to the output panel by writing it to a file, normally the path held in myargs[2], the second element of the commandArgs character vector:
# Ex3.1: Output data using commandArgs and writeLines
writeLines(data, myargs[2])
# Ex3.2: Output the dataModel as JSON using commandArgs and write_file (readr package)
finalModel = toJSON(dataModel)
write_file(finalModel, myargs[2])
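Before writing, it can help to confirm that the serialised model is well-formed JSON. The sketch below assumes the jsonlite package (toJSON and validate). The `grid` and `adata` values are hypothetical placeholders, since the real structure of these items is defined by DataJet, and `outFile` stands in for myargs[2].

```r
library(jsonlite)  # assumption: jsonlite supplies toJSON and validate

outFile = tempfile(fileext = ".json")  # stand-in for myargs[2]

# Hypothetical model results; the real structure of grid and
# associatedData is defined by DataJet and is not reproduced here.
grid = list(list("Segment", "Score"), list("A", 0.8))
adata = list(modelName = "demo")

dataModel = list()
dataModel$grid = grid
dataModel$associatedData = adata

finalModel = toJSON(dataModel, auto_unbox = TRUE)

# Fail early rather than hand malformed JSON back to the output panel
stopifnot(validate(finalModel))

writeLines(finalModel, outFile)
```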