Categories and Metadata
  • 9 Minutes to read
  • Dark
    Light
  • PDF

Categories and Metadata

  • Dark
    Light
  • PDF

Article summary

Overview

Metadata is descriptive data attached to tables and fields in order to provide the user with more detailed information about the underlying record data.   It is also used to enhance searching and grouping of data in the project explorer and context panel.

The following methods are used when loading data into a table and attaching metadata (i.e., descriptive data about the loaded fields):

Once attached, Metadata is displayed in the Project Explorer and Context Panel:

If field metadata is attached to a project, depending on the metadata, some or all of the following features will be enabled:

  • Database Category- the categories tab will be enabled, and fields will be grouped into specified categories
    • Note: A field can belong to more than one category.  Provide the list of categories for the field as a comma separated strings e.g.  
      • Address, Postal, Personal
      • Personal, Finance
      • CAT1, CAT2, CAT7
  • Second Degree Search - Full Text search will operate on the content of the specified metadata column, as well as the field name (e.g., "DisplayText"
  • Property Metadata - extendable metadata columns, defined by SetProjectProperties, and contained in the excel metadata definition file.

See Configuring Metadata for further details.

The process of attaching metadata is as follows:

Preparing metadata

File Format

Metadata is specific to a project, and can be attached either during database load and build, or later via Script | Script Editor.   

2 kinds of metadata are supported:

  • Explicit - the metadata file contains data specific to a particular table name
  • General - the same metadata file can be applied across multiple tables (v6.11.7 onwards)

Explicit Metadata

Metadata details for one or more tables are loaded from an excel spreadsheet of the following format:

  1. Table Name - required, table must exist or the row will be ignored.  Case Sensitive. Must be first column.
  2. Field Name - required, field must exist or the row will be ignored.  Case Sensitive.  Must be second column.
  3. [Col3] - Any number of additional columns may be present in the spreadsheet.  Each column header must have a unique name.
  4. ...
Tip!
The description for a field can be set in metadata, or it can be set using SetObjectDescription
Object properties are displayed in the context panel, and also in the Project Explorer | Right-click Object | Properties dialog.

JSON
{
  "method": "SetPropertiesFromWorkbook",
  "filename": "%DATAPATH%RegressionTest/FieldPropertiesWorkbook.xlsx",
  "properties": [
    "Description",
    "Categories"
  ],
  "project": "Demonstration"
}

General Metadata

In the situation where a single metadata file is going to be used to apply metadata across multiple tables, the following file format is required:

  1. Field - required, field must exist or the row will be ignored.  Case Sensitive.  Must be first column.
  2. [Col2] - Any number of additional columns may be present in the spreadsheet.  Each column header must have a unique name. At least one additional column must be provided.
  3. ...

JSON
{
  "method": "SetPropertiesForTableFromWorkbook",
  "table": "Supplies",
  "filename": "%DATAPATH%Regression2/Rg2-Supplies/SuppliesPropertiesWorkbook.xlsx",
  "properties": [
    "Description",
    "Categories",
    "Code"
  ],
  "onErrorIgnore": true,
  "project": "Rg2-Supplies"
}

Metadata Types

Each  column in the spreadsheet is one of the following types:

  • Item Identifier  (e.g., Tablename and Fieldname) 
  • Code - If configured (see Configuring Metadata), data in the code column will be displayed in the code column of the Database Category tab (note this column does not have to be called code).  Data in this metadata column can be searched via the Category Search option.
  • Categories - If configured (see Configuring Metadata), data in the categories column will be used to control which categories fields are assigned to
  • Description - If a description column is present, the content of that column will be displayed as the field's description property
  • Other - Any other metadata column


Attaching and Configuring Metadata 

Configuring Metadata

To specify which column(s) correspond to which metadata types, use the SetProjectProperties method.   

  • Set the categoryProperty key to point to the spreadsheet column that contains category information
    • In this spreadsheet the categoryProperty column is called "Categories":
    • "categoryProperty":"Categories",
  • Set the "secondDegreeSearchProperty" to point to the column that contains the data to be displayed in the code column.   
    • "secondDegreeSearchProperty":"Code",
  • List all metadata properties which are to be loaded from file in the "fieldPropertyNames" list - this list must include the categoryProperty and secondDegreeSearchProperty column headers if they have been specified. 
JSON
{
  "method": "SetProjectProperties",
  "categoryProperty": "Categories",
  "secondDegreeSearchProperty": "Code",
  "fieldPropertyNames": [
    "Code",
    "Categories"
  ],
  "project": "Demonstration"
}
Categories
All fields will be displayed in the ALL category.    Fields can belong to multiple categories - by default a comma is used to separate categories.   Other delimiters can be used if necessary.

Attaching Metadata

To load configured metadata from file and attach it to the database schema, use one of the following methods:

Both methods require the following:

  • filename - points to the spreadsheet that contains the metadata
  • properties - list of all columns 
    • SetPropertiesFromWorkbook:  
      • Columns 1 and 2, - tablename and fieldname - must be present in that order to be loaded into metadata properties.    
    • Properties list must include the column headers specified for SetProjectProperties: categoryProperty and SetProjectProperties: secondDegreeSearchProperty
JSON
{
  "method": "SetPropertiesFromWorkbook",
  "filename": "%DATAPATH%RegressionTest/FieldPropertiesWorkbook.xlsx",
  "properties": [
    "Description",
    "Categories"
  ],
  "project": "Demonstration"
}


{
  "method": "SetPropertiesForTableFromWorkbook",
  "table": "Base1_AWIData_Foreign",
  "filename": "%DATAPATH%REGRESS/OT/OT_Definitions/DEF_OT_HYGIENE_Metadata_20241128.xlsx",
  "delimiter": ",",
  "properties": [
    "Displayname",
    "MetadataType",
    "Code",   
    "Categories",
    "Value Type",
    "DSARDelete",
    "DisplayText"
  ],
  "project": "REGRESS_Base1_Create_Sample"
}


Once metadata has been attached, re-open the project.    The category button will now be visible in the Project Explorer:


Copying and Packaging Metadata

Once attached to a field, the metadata properties for that field will stay with any copies of the data that are made.  The following methods will all create new fields with metadata attached:

In addition to this, CreatePackage can be used to build a package from a project that contains metadata.  Injecting the package (using InjectPackage) will bring the metadata into the consumer project.


Example Script and File:

JSON
[
  {
    "method": "DropProject",
    "deleteFiles": true,
    "project": "Metadata_Test",
    "name": "Metadata_Test"
  },
  {
    "method": "CreateProject",
    "description": "",
    "project": "Metadata_Test",
    "name": "Metadata_Test"
  },
  {
    "method": "OpenProject",
    "project": "Metadata_Test",
    "name": "Metadata_Test"
  },
  {
    "method": "CreateTableFromFile",
    "action": "LOAD",
    "table": "framingham",
    "filename": "%DATAPATH%%SourceDirectory%/framingham_figshare_cleaned.csv",
    "definition": [
      "RecordNo|DISCRETE|INTEGER|BYTE",
      "RANDID|CONTINUOUS|INTEGER|",
      "SEX|DISCRETE|INTEGER|BYTE",
      "TOTCHOL|DISCRETE|INTEGER|BYTE",
      "AGE|DISCRETE|INTEGER|BYTE",
      "SYSBP|DISCRETE|DOUBLE|BYTE",
      "DIABP|DISCRETE|DOUBLE|BYTE",
      "CURSMOKE|DISCRETE|INTEGER|BYTE",
      "CIGPDAY|DISCRETE|INTEGER|BYTE",
      "BMI|DISCRETE|INTEGER|BYTE",
      "DIABETES|DISCRETE|INTEGER|BYTE",
      "BPMEDS|DISCRETE|INTEGER|BYTE",
      "HEARTRTE|DISCRETE|INTEGER|BYTE",
      "GLUCOSE|DISCRETE|INTEGER|BYTE",
      "educ|DISCRETE|INTEGER|BYTE",
      "PREVCHD|DISCRETE|INTEGER|BYTE",
      "PREVAP|DISCRETE|INTEGER|BYTE",
      "PREVMI|DISCRETE|INTEGER|BYTE",
      "PREVSTRK|DISCRETE|INTEGER|BYTE",
      "PREVHYP|DISCRETE|INTEGER|BYTE",
      "TIME|DISCRETE|INTEGER|BYTE",
      "PERIOD|DISCRETE|INTEGER|BYTE",
      "HDLC|DISCRETE|INTEGER|BYTE",
      "LDLC|DISCRETE|INTEGER|BYTE",
      "DEATH|DISCRETE|INTEGER|BYTE",
      "ANGINA|DISCRETE|INTEGER|BYTE",
      "HOSPMI|DISCRETE|INTEGER|BYTE",
      "MI_FCHD|DISCRETE|INTEGER|BYTE",
      "ANYCHD|DISCRETE|INTEGER|BYTE",
      "STROKE|DISCRETE|INTEGER|BYTE",
      "CVD|DISCRETE|INTEGER|BYTE",
      "HYPERTEN|DISCRETE|INTEGER|BYTE",
      "TIMEAP|DISCRETE|INTEGER|BYTE",
      "TIMEMI|DISCRETE|INTEGER|BYTE",
      "TIMEMIFC|DISCRETE|INTEGER|BYTE",
      "TIMECHD|DISCRETE|INTEGER|BYTE",
      "TIMESTRK|DISCRETE|INTEGER|BYTE",
      "TIMECVD|DISCRETE|INTEGER|BYTE",
      "TIMEDTH|DISCRETE|INTEGER|BYTE",
      "TIMEHYP|DISCRETE|INTEGER|BYTE"
    ],
    "loading": [
      "RecordNo",
      "RANDID",
      "SEX",
      "TOTCHOL",
      "AGE",
      "SYSBP",
      "DIABP",
      "CURSMOKE",
      "CIGPDAY",
      "BMI",
      "DIABETES",
      "BPMEDS",
      "HEARTRTE",
      "GLUCOSE",
      "educ",
      "PREVCHD",
      "PREVAP",
      "PREVMI",
      "PREVSTRK",
      "PREVHYP",
      "TIME",
      "PERIOD",
      "HDLC",
      "LDLC",
      "DEATH",
      "ANGINA",
      "HOSPMI",
      "MI_FCHD",
      "ANYCHD",
      "STROKE",
      "CVD",
      "HYPERTEN",
      "TIMEAP",
      "TIMEMI",
      "TIMEMIFC",
      "TIMECHD",
      "TIMESTRK",
      "TIMECVD",
      "TIMEDTH",
      "TIMEHYP"
    ],
    "sample": false,
    "preHeaderSkip": 0,
    "dateFormat": "YYYY-MM-DD",
    "fileFormat": "ASCII8",
    "skipFirstLine": true,
    "delimiter": "COMMA",
    "stripCharacter": "",
    "dateTimeFormat": "YYYY-MM-DD HH:MM:SS",
    "project": "Metadata_Test"
  },
  {
    "method": "SetProjectProperties",
    "categoryProperty": "ObservationType",
    "secondDegreeSearchProperty": "Unit",
    "fieldPropertyNames": [
      "Description",
      "ObservationType",
      "Unit"
    ],
    "project": "Metadata_Test"
  },
  {
    "method": "SetPropertiesFromWorkbook",
    "filename": "%DATAPATH%%SourceDirectory%/META_framingham_figsure.xlsx",
    "delimiter": ",",
    "properties": [
      "Description",
      "ObservationType",
      "Unit"
    ],
    "project": "Metadata_Test"
  }
]


Metadata file 

TableNameDescriptionObservationTypeUnit
framinghamRecordNoUnique Record numberid
framinghamRANDIDRandom participant IDid
framinghamSEXBiological sex of participant: 1 = male  2 = femaleStatus
framinghamTOTCHOLSerum Total Cholesterol (mg/dL) 107-696 Measurementmg/dL
framinghamAGEAge at exam (years)Measurement
framinghamSYSBPSystolic Blood Pressure (mean of last two of three measurements) (mmHg)MeasurementmmHg
framinghamDIABPDiastolic Blood Pressure (mean of last two of three measurements) (mmHg)MeasurementmmHg
framinghamCURSMOKECurrent cigarette smoking at examStatus
framinghamCIGPDAYNumber of cigarettes smoked each dayMeasurement
framinghamBMIBody Mass Index  weight in kilograms/height meters squaredMeasurementkg/m^2
framinghamDIABETESDiabetic according to criteria of first exam treated or first exam with casual glucose of 200 mg/dL or more: 0 = not diabetic  1=diabeticStatus
framinghamBPMEDSUse of anti-hypertensive medication: 0 = not used  Status
framinghamHEARTRTEHeart rate (Ventricular rate) in beats/minMeasurementbeats/min
framinghamGLUCOSECasual serum glucose (mg/dL)Measurementmg/dL
framinghameducAttained Education:  1=0-11 years  2=High School Diploma or GED  3=Some College or Vocational School  4=College (BS BA) degree or more Status
framinghamPREVCHDPrevalent Coronary Heart Disease defined as pre-existing Angina Pectoris     Myocardial Infarction (hospitalized     silent or unrecognized)  or Coronary Insufficiency (unstable angina)Status
framinghamPREVAPPrevalent Angina Pectoris at exam: 0 = free of disease  1 = prevalent diseaseStatus
framinghamPREVMIPrevalent Myocardial Infarction Status
framinghamPREVSTRKPrevalent Stroke Status
framinghamPREVHYPPrevalent Hypertensive. Subject was defined as hypertensive if treated or if second exam at which mean systolic was >=140 mmHg or mean Diastolic >=90 Status
framinghamTIMENumber of days since baseline examTimedays
framinghamPERIODExamination cycle: 1 = first  2 = mid  3 = finalMarker
framinghamHDLCHigh Density Lipoprotein Cholesterol (mg/dL)  (available for period 3 only)Measurementmg/dL
framinghamLDLCLow Density Lipoprotein Cholesterol (mg/dL)  (available for period 3 only)Measurementmg/dL
framinghamDEATHDeath from any cause Event
framinghamANGINAAngina PectorisEvent
framinghamHOSPMIHospitalized Myocardial InfarctionEvent
framinghamMI_FCHDHospitalized Myocardial Infarction or Fatal Coronary Heart DiseaseEvent
framinghamANYCHDAngina Pectoris  Myocardial infarction (Hospitalized and silent or unrecognized)  Coronary Insufficiency (Unstable Angina)  or Fatal Coronary Heart DiseaseEvent
framinghamSTROKEAtherothrombotic infarction  Cerebral Embolism  Intracerebral Hemorrhage  or Subarachnoid Hemorrhage or Fatal Cerebrovascular DiseaseEvent
framinghamCVDMyocardial infarction (Hospitalized and silent or unrecognized)     Fatal Coronary Heart Disease     Atherothrombotic infarction     Cerebral Embolism  Intracerebral Hemorrhage  or Subarachnoid Hemorrhage or Fatal Cerebrovascular DiseaseEvent
framinghamHYPERTENHypertensive. Defined as the first exam treated for high blood pressure or second exam in which either Systolic is $ 140 mmHg or Diastolic $ 90mmHg Event
framinghamTIMEAPNumber of days from Baseline exam to first Angina during the followup or Number of days from Baseline to censor date. Censor date may be end of followup  death or last known contact date if subject is lost to followup Time
framinghamTIMEMINumber of days from Baseline exam to first HOSPMI  event (Hospitalized Myocardial Infarction) during the followup or Number of days from Baseline to censor date. Censor date may be end of followup  death or last known contact date if subject is lost to followup Time
framinghamTIMEMIFCNumber of days from Baseline exam to first MI_FCHD event (Hospitalized Myocardial Infarction or Fatal Coronary Heart Disease) during the followup or Number of days from Baseline to censor date. Censor date may be end of followup  death or last known contact date if subject is lost to followup Time
framinghamTIMECHDNumber of days from Baseline exam to first ANYCHD  event (Angina Pectoris  Myocardial infarction (Hospitalized and silent or unrecognized)     Coronary Insufficiency (Unstable Angina)  or Fatal Coronary Heart Disease) during the followup or Number of days from Baseline to censor date. Censor date may be end of followup  death or last known contact date if subject is lost to followup Time
framinghamTIMESTRKNumber of days from Baseline exam to first STROKE event (Atherothrombotic infarction  Cerebral Embolism  Intracerebral Hemorrhage  or Subarachnoid Hemorrhage or Fatal Cerebrovascular Disease) during the followup or Number of days from Baseline to censor date. Censor date may be end of followup  death or last known contact date if subject is lost to followup Time
framinghamTIMECVDNumber of days from Baseline exam to first CVD event (Myocardial infarction (Hospitalized and silent or unrecognized)  Fatal Coronary Heart Disease  Atherothrombotic infarction  Cerebral Embolism  Intracerebral Hemorrhage  or Subarachnoid Hemorrhage or Fatal Cerebrovascular Disease) during the followup or Number of days from Baseline to censor date. Censor date may be end of followup  death or last known contact date if subject is lost to followup Time
framinghamTIMEDTHNumber of days from Baseline exam to death if occurring during followup or Number of days from Baseline to censor date. Censor date may be end of followup  or last known contact date if subject is lost to followup Time
framinghamTIMEHYPNumber of days from Baseline exam to first HYPERTEN event during the followup or Number of days from Baseline to censor date. Censor date may be end of followup  death or last known contact date if subject is lost to followup Time



Was this article helpful?