- 9 Minutes to read
- Print
- DarkLight
- PDF
Categories and Metadata
- 9 Minutes to read
- Print
- DarkLight
- PDF
Overview
Metadata is descriptive data attached to tables and fields in order to provide the user with more detailed information about the underlying record data. It is also used to enhance searching and grouping of data in the project explorer and context panel.
The following methods are used when loading data into a table and attaching metadata (i.e., descriptive data about the loaded fields):
- CreateTableFromFile , CreateTableFromWorkbook, CreateKeyTable, ExportIntoTable, CopySourceTable, CopyUp, CopyDown, CopyUpFromTemplate, CopyDownFromTemplate - load, create or copy data
- SetProjectProperties - configures the project to use descriptive data, searching, categorisation and export substitutions
- SetPropertiesFromWorkbook - attaches metadata from an excel spreadsheet to existing database objects.
- SetPropertiesForTableFromWorkbook - attaches a set of metadata from an excel spreadsheet to a one or more specified tables
Once attached, Metadata is displayed in the Project Explorer and Context Panel:
If field metadata is attached to a project, depending on the metadata, some or all of the following features will be enabled:
- Database Category- the categories tab will be enabled, and fields will be grouped into specified categories
- Note: A field can belong to more than one category. Provide the list of categories for the field as a comma separated strings e.g.
- Address, Postal, Personal
- Personal, Finance
- CAT1, CAT2, CAT7
- Note: A field can belong to more than one category. Provide the list of categories for the field as a comma separated strings e.g.
- Second Degree Search - Full Text search will operate on the content of the specified metadata column, as well as the field name (e.g., "DisplayText"
- Property Metadata - extendable metadata columns, defined by SetProjectProperties, and contained in the excel metadata definition file.
See Configuring Metadata for further details.
The process of attaching metadata is as follows:
Preparing metadata
File Format
Metadata is specific to a project, and can be attached either during database load and build, or later via Script | Script Editor.
2 kinds of metadata are supported:
- Explicit - the metadata file contains data specific to a particular table name
- General - the same metadata file can be applied across multiple tables (v6.11.7 onwards)
Explicit Metadata
Metadata details for one or more tables are loaded from an excel spreadsheet of the following format:
- Table Name - required, table must exist or the row will be ignored. Case Sensitive. Must be first column.
- Field Name - required, field must exist or the row will be ignored. Case Sensitive. Must be second column.
- [Col3] - Any number of additional columns may be present in the spreadsheet. Each column header must have a unique name.
- ...
Object properties are displayed in the context panel, and also in the Project Explorer | Right-click Object | Properties dialog.
{
"method": "SetPropertiesFromWorkbook",
"filename": "%DATAPATH%RegressionTest/FieldPropertiesWorkbook.xlsx",
"properties": [
"Description",
"Categories"
],
"project": "Demonstration"
}
General Metadata
In the situation where a single metadata file is going to be used to apply metadata across multiple tables, the following file format is required:
- Field - required, field must exist or the row will be ignored. Case Sensitive. Must be first column.
- [Col2] - Any number of additional columns may be present in the spreadsheet. Each column header must have a unique name. At least one additional column must be provided.
- ...
{
"method": "SetPropertiesForTableFromWorkbook",
"table": "Supplies",
"filename": "%DATAPATH%Regression2/Rg2-Supplies/SuppliesPropertiesWorkbook.xlsx",
"properties": [
"Description",
"Categories",
"Code"
],
"onErrorIgnore": true,
"project": "Rg2-Supplies"
}
Metadata Types
Each column in the spreadsheet is one of the following types:
- Item Identifier (e.g., Tablename and Fieldname)
- Code - If configured (see Configuring Metadata), data in the code column will be displayed in the code column of the Database Category tab (note this column does not have to be called code). Data in this metadata column can be searched via the Category Search option.
- Categories - If configured (see Configuring Metadata), data in the categories column will be used to control which categories fields are assigned to
- Description - If a description column is present, the content of that column will be displayed as the field's description property
- Other - Any other metadata column
Attaching and Configuring Metadata
Configuring Metadata
To specify which column(s) correspond to which metadata types, use the SetProjectProperties method.
- Set the categoryProperty key to point to the spreadsheet column that contains category information
- In this spreadsheet the categoryProperty column is called "Categories":
- "categoryProperty":"Categories",
- Set the "secondDegreeSearchProperty" to point to the column that contains the data to be displayed in the code column.
- "secondDegreeSearchProperty":"Code",
- List all metadata properties which are to be loaded from file in the "fieldPropertyNames" list - this list must include the categoryProperty and secondDegreeSearchProperty column headers if they have been specified.
{
"method": "SetProjectProperties",
"categoryProperty": "Categories",
"secondDegreeSearchProperty": "Code",
"fieldPropertyNames": [
"Code",
"Categories"
],
"project": "Demonstration"
}
Attaching Metadata
To load configured metadata from file and attach it to the database schema, use one of the following methods:
- SetPropertiesFromWorkbook - table names are included in the definition file
- SetPropertiesForTableFromWorkbook - uses a generic definition file which can be applied to multiple table
Both methods require the following:
- filename - points to the spreadsheet that contains the metadata
- properties - list of all columns
- SetPropertiesFromWorkbook:
- Columns 1 and 2, - tablename and fieldname - must be present in that order to be loaded into metadata properties.
- Properties list must include the column headers specified for SetProjectProperties: categoryProperty and SetProjectProperties: secondDegreeSearchProperty
- SetPropertiesFromWorkbook:
{
"method": "SetPropertiesFromWorkbook",
"filename": "%DATAPATH%RegressionTest/FieldPropertiesWorkbook.xlsx",
"properties": [
"Description",
"Categories"
],
"project": "Demonstration"
}
{
"method": "SetPropertiesForTableFromWorkbook",
"table": "Base1_AWIData_Foreign",
"filename": "%DATAPATH%REGRESS/OT/OT_Definitions/DEF_OT_HYGIENE_Metadata_20241128.xlsx",
"delimiter": ",",
"properties": [
"Displayname",
"MetadataType",
"Code",
"Categories",
"Value Type",
"DSARDelete",
"DisplayText"
],
"project": "REGRESS_Base1_Create_Sample"
}
Once metadata has been attached, re-open the project. The category button will now be visible in the Project Explorer:
Copying and Packaging Metadata
Once attached to a field, the metadata properties for that field will stay with any copies of the data that are made. The following methods will all create new fields with metadata attached:
- CopySourceTable
- ExportIntoTable
- CopyUp
- copydowndiscrete
- CopyUpFromTemplate
- CopyDownFromTemplate
- CreateKeyTable
In addition to this, CreatePackage can be used to build a package from a project that contains metadata. Injecting the package (using InjectPackage) will bring the metadata into the consumer project.
Example Script and File:
[
{
"method": "DropProject",
"deleteFiles": true,
"project": "Metadata_Test",
"name": "Metadata_Test"
},
{
"method": "CreateProject",
"description": "",
"project": "Metadata_Test",
"name": "Metadata_Test"
},
{
"method": "OpenProject",
"project": "Metadata_Test",
"name": "Metadata_Test"
},
{
"method": "CreateTableFromFile",
"action": "LOAD",
"table": "framingham",
"filename": "%DATAPATH%%SourceDirectory%/framingham_figshare_cleaned.csv",
"definition": [
"RecordNo|DISCRETE|INTEGER|BYTE",
"RANDID|CONTINUOUS|INTEGER|",
"SEX|DISCRETE|INTEGER|BYTE",
"TOTCHOL|DISCRETE|INTEGER|BYTE",
"AGE|DISCRETE|INTEGER|BYTE",
"SYSBP|DISCRETE|DOUBLE|BYTE",
"DIABP|DISCRETE|DOUBLE|BYTE",
"CURSMOKE|DISCRETE|INTEGER|BYTE",
"CIGPDAY|DISCRETE|INTEGER|BYTE",
"BMI|DISCRETE|INTEGER|BYTE",
"DIABETES|DISCRETE|INTEGER|BYTE",
"BPMEDS|DISCRETE|INTEGER|BYTE",
"HEARTRTE|DISCRETE|INTEGER|BYTE",
"GLUCOSE|DISCRETE|INTEGER|BYTE",
"educ|DISCRETE|INTEGER|BYTE",
"PREVCHD|DISCRETE|INTEGER|BYTE",
"PREVAP|DISCRETE|INTEGER|BYTE",
"PREVMI|DISCRETE|INTEGER|BYTE",
"PREVSTRK|DISCRETE|INTEGER|BYTE",
"PREVHYP|DISCRETE|INTEGER|BYTE",
"TIME|DISCRETE|INTEGER|BYTE",
"PERIOD|DISCRETE|INTEGER|BYTE",
"HDLC|DISCRETE|INTEGER|BYTE",
"LDLC|DISCRETE|INTEGER|BYTE",
"DEATH|DISCRETE|INTEGER|BYTE",
"ANGINA|DISCRETE|INTEGER|BYTE",
"HOSPMI|DISCRETE|INTEGER|BYTE",
"MI_FCHD|DISCRETE|INTEGER|BYTE",
"ANYCHD|DISCRETE|INTEGER|BYTE",
"STROKE|DISCRETE|INTEGER|BYTE",
"CVD|DISCRETE|INTEGER|BYTE",
"HYPERTEN|DISCRETE|INTEGER|BYTE",
"TIMEAP|DISCRETE|INTEGER|BYTE",
"TIMEMI|DISCRETE|INTEGER|BYTE",
"TIMEMIFC|DISCRETE|INTEGER|BYTE",
"TIMECHD|DISCRETE|INTEGER|BYTE",
"TIMESTRK|DISCRETE|INTEGER|BYTE",
"TIMECVD|DISCRETE|INTEGER|BYTE",
"TIMEDTH|DISCRETE|INTEGER|BYTE",
"TIMEHYP|DISCRETE|INTEGER|BYTE"
],
"loading": [
"RecordNo",
"RANDID",
"SEX",
"TOTCHOL",
"AGE",
"SYSBP",
"DIABP",
"CURSMOKE",
"CIGPDAY",
"BMI",
"DIABETES",
"BPMEDS",
"HEARTRTE",
"GLUCOSE",
"educ",
"PREVCHD",
"PREVAP",
"PREVMI",
"PREVSTRK",
"PREVHYP",
"TIME",
"PERIOD",
"HDLC",
"LDLC",
"DEATH",
"ANGINA",
"HOSPMI",
"MI_FCHD",
"ANYCHD",
"STROKE",
"CVD",
"HYPERTEN",
"TIMEAP",
"TIMEMI",
"TIMEMIFC",
"TIMECHD",
"TIMESTRK",
"TIMECVD",
"TIMEDTH",
"TIMEHYP"
],
"sample": false,
"preHeaderSkip": 0,
"dateFormat": "YYYY-MM-DD",
"fileFormat": "ASCII8",
"skipFirstLine": true,
"delimiter": "COMMA",
"stripCharacter": "",
"dateTimeFormat": "YYYY-MM-DD HH:MM:SS",
"project": "Metadata_Test"
},
{
"method": "SetProjectProperties",
"categoryProperty": "ObservationType",
"secondDegreeSearchProperty": "Unit",
"fieldPropertyNames": [
"Description",
"ObservationType",
"Unit"
],
"project": "Metadata_Test"
},
{
"method": "SetPropertiesFromWorkbook",
"filename": "%DATAPATH%%SourceDirectory%/META_framingham_figsure.xlsx",
"delimiter": ",",
"properties": [
"Description",
"ObservationType",
"Unit"
],
"project": "Metadata_Test"
}
]
Metadata file
Table | Name | Description | ObservationType | Unit |
framingham | RecordNo | Unique Record number | id | |
framingham | RANDID | Random participant ID | id | |
framingham | SEX | Biological sex of participant: 1 = male 2 = female | Status | |
framingham | TOTCHOL | Serum Total Cholesterol (mg/dL) 107-696 | Measurement | mg/dL |
framingham | AGE | Age at exam (years) | Measurement | |
framingham | SYSBP | Systolic Blood Pressure (mean of last two of three measurements) (mmHg) | Measurement | mmHg |
framingham | DIABP | Diastolic Blood Pressure (mean of last two of three measurements) (mmHg) | Measurement | mmHg |
framingham | CURSMOKE | Current cigarette smoking at exam | Status | |
framingham | CIGPDAY | Number of cigarettes smoked each day | Measurement | |
framingham | BMI | Body Mass Index weight in kilograms/height meters squared | Measurement | kg/m^2 |
framingham | DIABETES | Diabetic according to criteria of first exam treated or first exam with casual glucose of 200 mg/dL or more: 0 = not diabetic 1=diabetic | Status | |
framingham | BPMEDS | Use of anti-hypertensive medication: 0 = not used | Status | |
framingham | HEARTRTE | Heart rate (Ventricular rate) in beats/min | Measurement | beats/min |
framingham | GLUCOSE | Casual serum glucose (mg/dL) | Measurement | mg/dL |
framingham | educ | Attained Education: 1=0-11 years 2=High School Diploma or GED 3=Some College or Vocational School 4=College (BS BA) degree or more | Status | |
framingham | PREVCHD | Prevalent Coronary Heart Disease defined as pre-existing Angina Pectoris Myocardial Infarction (hospitalized silent or unrecognized) or Coronary Insufficiency (unstable angina) | Status | |
framingham | PREVAP | Prevalent Angina Pectoris at exam: 0 = free of disease 1 = prevalent disease | Status | |
framingham | PREVMI | Prevalent Myocardial Infarction | Status | |
framingham | PREVSTRK | Prevalent Stroke | Status | |
framingham | PREVHYP | Prevalent Hypertensive. Subject was defined as hypertensive if treated or if second exam at which mean systolic was >=140 mmHg or mean Diastolic >=90 | Status | |
framingham | TIME | Number of days since baseline exam | Time | days |
framingham | PERIOD | Examination cycle: 1 = first 2 = mid 3 = final | Marker | |
framingham | HDLC | High Density Lipoprotein Cholesterol (mg/dL) (available for period 3 only) | Measurement | mg/dL |
framingham | LDLC | Low Density Lipoprotein Cholesterol (mg/dL) (available for period 3 only) | Measurement | mg/dL |
framingham | DEATH | Death from any cause | Event | |
framingham | ANGINA | Angina Pectoris | Event | |
framingham | HOSPMI | Hospitalized Myocardial Infarction | Event | |
framingham | MI_FCHD | Hospitalized Myocardial Infarction or Fatal Coronary Heart Disease | Event | |
framingham | ANYCHD | Angina Pectoris Myocardial infarction (Hospitalized and silent or unrecognized) Coronary Insufficiency (Unstable Angina) or Fatal Coronary Heart Disease | Event | |
framingham | STROKE | Atherothrombotic infarction Cerebral Embolism Intracerebral Hemorrhage or Subarachnoid Hemorrhage or Fatal Cerebrovascular Disease | Event | |
framingham | CVD | Myocardial infarction (Hospitalized and silent or unrecognized) Fatal Coronary Heart Disease Atherothrombotic infarction Cerebral Embolism Intracerebral Hemorrhage or Subarachnoid Hemorrhage or Fatal Cerebrovascular Disease | Event | |
framingham | HYPERTEN | Hypertensive. Defined as the first exam treated for high blood pressure or second exam in which either Systolic is $ 140 mmHg or Diastolic $ 90mmHg | Event | |
framingham | TIMEAP | Number of days from Baseline exam to first Angina during the followup or Number of days from Baseline to censor date. Censor date may be end of followup death or last known contact date if subject is lost to followup | Time | |
framingham | TIMEMI | Number of days from Baseline exam to first HOSPMI event (Hospitalized Myocardial Infarction) during the followup or Number of days from Baseline to censor date. Censor date may be end of followup death or last known contact date if subject is lost to followup | Time | |
framingham | TIMEMIFC | Number of days from Baseline exam to first MI_FCHD event (Hospitalized Myocardial Infarction or Fatal Coronary Heart Disease) during the followup or Number of days from Baseline to censor date. Censor date may be end of followup death or last known contact date if subject is lost to followup | Time | |
framingham | TIMECHD | Number of days from Baseline exam to first ANYCHD event (Angina Pectoris Myocardial infarction (Hospitalized and silent or unrecognized) Coronary Insufficiency (Unstable Angina) or Fatal Coronary Heart Disease) during the followup or Number of days from Baseline to censor date. Censor date may be end of followup death or last known contact date if subject is lost to followup | Time | |
framingham | TIMESTRK | Number of days from Baseline exam to first STROKE event (Atherothrombotic infarction Cerebral Embolism Intracerebral Hemorrhage or Subarachnoid Hemorrhage or Fatal Cerebrovascular Disease) during the followup or Number of days from Baseline to censor date. Censor date may be end of followup death or last known contact date if subject is lost to followup | Time | |
framingham | TIMECVD | Number of days from Baseline exam to first CVD event (Myocardial infarction (Hospitalized and silent or unrecognized) Fatal Coronary Heart Disease Atherothrombotic infarction Cerebral Embolism Intracerebral Hemorrhage or Subarachnoid Hemorrhage or Fatal Cerebrovascular Disease) during the followup or Number of days from Baseline to censor date. Censor date may be end of followup death or last known contact date if subject is lost to followup | Time | |
framingham | TIMEDTH | Number of days from Baseline exam to death if occurring during followup or Number of days from Baseline to censor date. Censor date may be end of followup or last known contact date if subject is lost to followup | Time | |
framingham | TIMEHYP | Number of days from Baseline exam to first HYPERTEN event during the followup or Number of days from Baseline to censor date. Censor date may be end of followup death or last known contact date if subject is lost to followup | Time |