Chapter 3 Concepts

Spending some time on the important concepts helps in the understanding the logic behind Tercen. The following questions highlight some core concepts.

What is a measurement, variable and observation?

What is a measurement, variable and observation? This question seems so obvious but it helps to understand how Tercen structures data. Tercen is measurement orientated. For example, think of a laboratory instrument generating measurements, say a protein or gene measurement value mt, and each measurement is taken with an annotation context, i.e. the sample name (sn), the temperature(tm), the time(ti), the Ph (Ph) etc. So each mt (i.e. a measurement) value is annotated with many other values sn, tm, ti, Ph. Describing mt can be done by using a data table using a long format such as:

mt sn tm ti Ph
200.3 sample1 37 60 7.7
413.1 sample2 37 70 8.0
273.0 sample3 37 80 7.0

Each row represents a individual measurement with all its corresponding annotation values. The number of observations are the number of rows. The number of columns is the number of variables. This format may seem verbose but it is measurement orientated and comes with the complete annotation of each measurement (i.e. rows). Internally Tercen uses such a row like structure. Each measurement is recorded with a full experimental context of annotation values. In the above example there is one measurement variable mt, three observations and four annotation variables in one data table.

Tercen is geared to work with measurements. Visuals and computations are performed on measurements. Remembering this helps in understanding Tercen.

In Tercen measurements are usually placed on the y-axis of most projection views (defined by the cross-tab window) and the others are used to define the row, col, x-axis etc.

Statistician traditionally use column headers to describe a variable and rows to describe an observation. The Tercen data model allows the measurement, variable and observation to be positioned anywhere in the visuals (col, row, x-axis, y-axis, color, label) and therefore there is great flexibility. However most Tercen operators require to know what is the variable in order to compute the appropriate results. Therefore we have adopted the convention of using variable on the row. This is inspired by the genomics world where the gene (i.e. a variable) is a row and the samples are the observation. Most Tercen operators or apps use the row as a variable and columns as observations and the y-axis for the measurement. The documentation associated with the operator indicates what the operator is expecting in terms of position and input. Note, there are Tercen operators which do not follow these conventions.

What is a step, app and workflow?

Typically, researchers perform many computations on their measurements. Tercen supports this by having the concepts of step, apps and workflows. An individual step is a projection and potentially a computation of the data. Projections are defined visually using the cross-tab window associated with a step. A workflow app is composed of several steps. A workflow is composed of several apps and steps.

What is visual computation?

It allows researchers to define a visual projection of the data, this projection also defines the computation to be executed. This done by interacting with the Tercen cross-tab window. For those readers who are interested, these projections are in fact using a universal table and relational algebra (see below).

What is an universal table and relational algebra?

A researcher avoids complexity by using the Tercen’s universal table concept. A universal table is essentially an abstract table. It presents itself as one table even though underneath it is composed of several actual tables. The universal table allows Tercen to always represent a unified data model throughout the Tercen workflow and visual projections. The underlying tables are all linked using relational algebra. The researchers can make these sophisticated relationships without requiring to understand relational algebra.