14 Data Input and Output Patterns
This chapter covers advanced patterns for handling data input and output in Tercen operators. Building on the basic concepts from previous chapters, we’ll explore sophisticated techniques for data manipulation, multiple output types, and complex data relationships.
Before proceeding, ensure you have:
- Completed the Development Environment Setup chapter, so that your development environment is ready
- Completed the Basic Implementation chapter, which covers core operator concepts
- A working understanding of Tercen’s projection system
14.1 Understanding Tercen’s Data Structure
Tercen organizes data using a projection system with specific index columns:
Column | Purpose | Usage |
---|---|---|
.ri | Row index | Identifies specific rows in the data projection |
.ci | Column index | Identifies specific columns in the data projection |
.y | Data values | The actual measurement or observation values |
.x | X-axis values | Independent variable values (when applicable) |
These special columns enable flexible data aggregation and output patterns.
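To make the index columns concrete, the sketch below uses a small hand-written table in place of a real projection. The values and the `to_matrix` helper are illustrative assumptions, not part of the Tercen API, and the zero-based indexing is likewise assumed. It shows how `.ri` and `.ci` locate each `.y` value in a rows-by-columns grid.

```python
# Stand-in for a projection in long format: one record per cell.
# Zero-based indices are an assumption of this sketch.
records = [
    {".ri": 0, ".ci": 0, ".y": 1.0},
    {".ri": 0, ".ci": 1, ".y": 2.0},
    {".ri": 1, ".ci": 0, ".y": 3.0},
    {".ri": 1, ".ci": 1, ".y": 4.0},
    {".ri": 2, ".ci": 0, ".y": 5.0},
    # cell (2, 1) is intentionally missing
]


def to_matrix(rows):
    """Pivot long-format records into a dense grid, with None for empty cells."""
    n_rows = max(r[".ri"] for r in rows) + 1
    n_cols = max(r[".ci"] for r in rows) + 1
    grid = [[None] * n_cols for _ in range(n_rows)]
    for r in rows:
        grid[r[".ri"]][r[".ci"]] = r[".y"]
    return grid


matrix = to_matrix(records)
for row in matrix:
    print(row)
```

Because the data arrives in long format, a missing cell is simply an absent record; it only becomes an explicit gap once the data is pivoted.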
14.2 Basic Output Patterns
14.2.1 Per-Cell Output (Default)
Most operators output one result per cell, maintaining the original data structure:
library(tercen)
library(dplyr)

# Connect to Tercen context
ctx <- tercenCtx()

# Per-cell calculation (e.g., log transformation)
result <- ctx %>%
  select(.ri, .ci, .y) %>%
  mutate(log_value = log(.y + 1)) %>%
  select(.ri, .ci, log_value) %>%
  ctx$addNamespace()

ctx$save(result)
from tercen.client import context as ctx
import polars as pl

# Connect to Tercen context
tercenCtx = ctx.TercenContext()

# Per-cell calculation
df = (
    tercenCtx
    .select(['.ri', '.ci', '.y'], df_lib="polars")
    .with_columns([
        (pl.col('.y') + 1).log().alias('log_value')
    ])
    .select(['.ri', '.ci', 'log_value'])
)

df = tercenCtx.add_namespace(df)
tercenCtx.save(df)
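A per-cell transformation returns exactly one value for each input cell, so the output keeps the same `.ri`/`.ci` pairs as the input. The sketch below reproduces the `log(.y + 1)` step on plain Python records (stand-in data, no Tercen connection). It also maps values at or below -1, where the logarithm is undefined, to `None`; that guard and the `log1p_or_none` helper are assumptions of this sketch, not behavior of the examples above.

```python
import math

# Stand-in cells; the last one has no defined log(.y + 1).
cells = [
    {".ri": 0, ".ci": 0, ".y": 0.0},
    {".ri": 0, ".ci": 1, ".y": 9.0},
    {".ri": 1, ".ci": 0, ".y": -2.0},
]


def log1p_or_none(y):
    """Return log(y + 1), or None when y + 1 is not positive."""
    return math.log1p(y) if y > -1 else None


# One output record per input cell, keyed by the same indices.
result = [
    {".ri": c[".ri"], ".ci": c[".ci"], "log_value": log1p_or_none(c[".y"])}
    for c in cells
]
```

Whether undefined values should become missing values, be clipped, or raise an error is a design choice that depends on the operator.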
14.2.2 Per-Row Output
Aggregate data across columns for each row:
# Calculate statistics per row
row_stats <- ctx %>%
  select(.ri, .ci, .y) %>%
  group_by(.ri) %>%
  summarise(
    mean_value = mean(.y, na.rm = TRUE),
    sd_value = sd(.y, na.rm = TRUE),
    count = n(),
    .groups = "drop"
  ) %>%
  ctx$addNamespace()

ctx$save(row_stats)
# Calculate statistics per row
df = (
    tercenCtx
    .select(['.ri', '.ci', '.y'], df_lib="polars")
    .group_by(['.ri'])
    .agg([
        pl.col('.y').mean().alias('mean_value'),
        pl.col('.y').std().alias('sd_value'),
        pl.col('.y').count().alias('count')
    ])
)

df = tercenCtx.add_namespace(df)
tercenCtx.save(df)
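The per-row aggregation can be checked by hand on a small table. The following sketch uses only the Python standard library and stand-in data (no Tercen connection) to compute the same three statistics. `statistics.stdev` is the sample standard deviation (n - 1 denominator), which is the convention R's `sd()` uses; the explicit fallback for single-value rows is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import mean, stdev

cells = [
    {".ri": 0, ".ci": 0, ".y": 2.0},
    {".ri": 0, ".ci": 1, ".y": 4.0},
    {".ri": 0, ".ci": 2, ".y": 6.0},
    {".ri": 1, ".ci": 0, ".y": 1.0},
    {".ri": 1, ".ci": 1, ".y": 3.0},
]

# Collect the .y values of each row; the column index is dropped.
by_row = defaultdict(list)
for c in cells:
    by_row[c[".ri"]].append(c[".y"])

row_stats = [
    {
        ".ri": ri,
        "mean_value": mean(values),
        # Sample standard deviation needs at least two values.
        "sd_value": stdev(values) if len(values) > 1 else None,
        "count": len(values),
    }
    for ri, values in sorted(by_row.items())
]
```

The output has one record per distinct `.ri`, so it carries a row index but no column index.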
14.2.3 Per-Column Output
Aggregate data across rows for each column:
# Calculate statistics per column
col_stats <- ctx %>%
  select(.ri, .ci, .y) %>%
  group_by(.ci) %>%
  summarise(
    median_value = median(.y, na.rm = TRUE),
    q25 = quantile(.y, 0.25, na.rm = TRUE),
    q75 = quantile(.y, 0.75, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  ctx$addNamespace()

ctx$save(col_stats)
# Calculate statistics per column
df = (
    tercenCtx
    .select(['.ri', '.ci', '.y'], df_lib="polars")
    .group_by(['.ci'])
    .agg([
        pl.col('.y').median().alias('median_value'),
        pl.col('.y').quantile(0.25).alias('q25'),
        pl.col('.y').quantile(0.75).alias('q75')
    ])
)

df = tercenCtx.add_namespace(df)
tercenCtx.save(df)
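Quantiles that fall between data points depend on the interpolation rule, so the R and Python versions above may not agree digit for digit. R's `quantile()` defaults to linear interpolation (type 7), while polars' `quantile()` is documented with a default of `interpolation="nearest"`; verify both defaults against the versions you use. The sketch below computes linearly interpolated quartiles with the standard library on stand-in values, which should correspond to the type 7 convention.

```python
from statistics import median, quantiles

# Stand-in for the .y values of a single column (.ci).
values = [1.0, 2.0, 3.0, 4.0]

# method="inclusive" treats the data as including the minimum and maximum
# and interpolates linearly between neighbouring sorted values.
q25, q50, q75 = quantiles(values, n=4, method="inclusive")

print(q25, q50, q75)
```

If the two implementations must match, setting the interpolation explicitly on both sides is safer than relying on defaults.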
14.3 Advanced Output Patterns
14.3.1 Multiple Output Tables
Some operators need to return multiple related datasets:
# Generate multiple output tables
summary_stats <- ctx %>%
  select(.ri, .ci, .y) %>%
  summarise(
    overall_mean = mean(.y, na.rm = TRUE),
    overall_sd = sd(.y, na.rm = TRUE)
  ) %>%
  ctx$addNamespace()

row_stats <- ctx %>%
  select(.ri, .y) %>%
  group_by(.ri) %>%
  summarise(row_mean = mean(.y, na.rm = TRUE)) %>%
  ctx$addNamespace()

col_stats <- ctx %>%
  select(.ci, .y) %>%
  group_by(.ci) %>%
  summarise(col_mean = mean(.y, na.rm = TRUE)) %>%
  ctx$addNamespace()

# Save multiple tables
ctx$save(list(summary_stats, row_stats, col_stats))
# Generate multiple output tables
summary_stats = (
    tercenCtx
    .select(['.y'], df_lib="polars")
    .select([
        pl.col('.y').mean().alias('overall_mean'),
        pl.col('.y').std().alias('overall_sd')
    ])
)

row_stats = (
    tercenCtx
    .select(['.ri', '.y'], df_lib="polars")
    .group_by(['.ri'])
    .agg([pl.col('.y').mean().alias('row_mean')])
)

col_stats = (
    tercenCtx
    .select(['.ci', '.y'], df_lib="polars")
    .group_by(['.ci'])
    .agg([pl.col('.y').mean().alias('col_mean')])
)

# Add namespaces and save
summary_stats = tercenCtx.add_namespace(summary_stats)
row_stats = tercenCtx.add_namespace(row_stats)
col_stats = tercenCtx.add_namespace(col_stats)

tercenCtx.save([summary_stats, row_stats, col_stats])
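The three tables differ in granularity: one record overall, one per row, and one per column. The sketch below rebuilds them from stand-in data with the standard library only. The `group_mean` helper is hypothetical and exists only for this illustration; it is not part of the Tercen client.

```python
from collections import defaultdict
from statistics import mean

cells = [
    {".ri": 0, ".ci": 0, ".y": 1.0},
    {".ri": 0, ".ci": 1, ".y": 3.0},
    {".ri": 1, ".ci": 0, ".y": 5.0},
    {".ri": 1, ".ci": 1, ".y": 7.0},
]


def group_mean(rows, key, out_name):
    """Average .y per distinct value of `key`, one output record per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[".y"])
    return [{key: k, out_name: mean(v)} for k, v in sorted(groups.items())]


summary_stats = [{"overall_mean": mean(r[".y"] for r in cells)}]
row_stats = group_mean(cells, ".ri", "row_mean")
col_stats = group_mean(cells, ".ci", "col_mean")

# Each table keeps only the index column that matches its granularity.
tables = [summary_stats, row_stats, col_stats]
```

Keeping only the relevant index in each table is what distinguishes an overall, a per-row, and a per-column result.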
14.3.2 Working with Factor Variables
When your projection includes factors (categorical variables), incorporate them into your analysis:
# Include factors in analysis
ctx <- tercenCtx()

# Get factor columns
factors <- ctx$rselect() # Get all row factors

# Note: the join assumes `factors` contains the .ri and .ci key columns;
# add them first if your factor table does not carry them.
result <- ctx %>%
  select(.ri, .ci, .y) %>%
  left_join(factors, by = c(".ri", ".ci")) %>%
  group_by(.ri, .ci, factor_column) %>% # Include relevant factors
  summarise(
    group_mean = mean(.y, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  ctx$addNamespace()

ctx$save(result)
# Include factors in analysis
factors_df = tercenCtx.rselect(df_lib="polars")

# Note: the join assumes factors_df contains the .ri and .ci key columns;
# add them first if your factor table does not carry them.
result = (
    tercenCtx
    .select(['.ri', '.ci', '.y'], df_lib="polars")
    .join(factors_df, on=['.ri', '.ci'], how='left')
    .group_by(['.ri', '.ci', 'factor_column'])
    .agg([pl.col('.y').mean().alias('group_mean')])
)

result = tercenCtx.add_namespace(result)
tercenCtx.save(result)
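The join-then-group logic can be traced on a small example. The sketch below uses stand-in data and a hypothetical `treatment` factor keyed by `.ri`; both the factor name and the assumption that the factor table carries `.ri` are illustrative. It attaches each cell's factor level and then averages `.y` per column and level.

```python
from collections import defaultdict
from statistics import mean

cells = [
    {".ri": 0, ".ci": 0, ".y": 1.0},
    {".ri": 1, ".ci": 0, ".y": 3.0},
    {".ri": 2, ".ci": 0, ".y": 10.0},
]

# Hypothetical row-factor table: one record per row index.
row_factors = [
    {".ri": 0, "treatment": "control"},
    {".ri": 1, "treatment": "control"},
    {".ri": 2, "treatment": "drug"},
]

# Left join: look up each cell's factor level by .ri (None when absent).
level_by_ri = {f[".ri"]: f["treatment"] for f in row_factors}
joined = [{**c, "treatment": level_by_ri.get(c[".ri"])} for c in cells]

# Group by column index and factor level, then average .y.
groups = defaultdict(list)
for r in joined:
    groups[(r[".ci"], r["treatment"])].append(r[".y"])

result = [
    {".ci": ci, "treatment": level, "group_mean": mean(values)}
    for (ci, level), values in sorted(groups.items())
]
```

Grouping by the factor level rather than by `.ri` is what collapses several rows into one group; grouping by both `.ri` and `.ci` would leave one value per cell.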
14.4 Specialized Output Types
14.4.1 File Output
Tercen operators can generate and output files (plots, reports, data exports) that users can download or view directly in the interface. This is particularly useful for visualization operators, report generators, and data export tools. Typical file outputs include:
- Plots and Visualizations: PNG, PDF, SVG graphics
- Reports: HTML, PDF documents with analysis results
- Data Exports: CSV, Excel files with processed data
- Configuration Files: JSON, YAML files for downstream tools
See the Patterns for Plot Operators chapter for a detailed tutorial on how to output files in Tercen.
14.4.2 Relations Output
Relations in Tercen support linking and joining tables. They are useful for operators whose results span different data dimensions, for example:
- PCA analysis with loadings and scores
- Clustering with cluster assignments and centroids
- Complex statistical models with multiple output components
Key relation functions:
- as_relation(): Convert data frames to relations
- left_join_relation(): Join relations together
- save_relation(): Save relations to Tercen
- as_join_operator(): Create join operators for complex relationships
# Example: Simple relation output
library(tibble)

# Create a relation with results
result_relation <- tibble(
  component = c("PC1", "PC2", "PC3"),
  variance_explained = c(0.45, 0.32, 0.15),
  eigenvalue = c(4.5, 3.2, 1.5)
) %>%
  ctx$addNamespace() %>%
  as_relation()

# Save relation
ctx$save_relation(result_relation)
import polars as pl

# Create a relation with results
result_data = pl.DataFrame({
    'component': ['PC1', 'PC2', 'PC3'],
    'variance_explained': [0.45, 0.32, 0.15],
    'eigenvalue': [4.5, 3.2, 1.5]
})

result_relation = tercenCtx.add_namespace(result_data)
result_relation = tercenCtx.as_relation(result_relation)

# Save relation
tercenCtx.save_relation(result_relation)
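Conceptually, a relation lets several output components be linked through shared keys. The sketch below does not call the Tercen relation API; it only illustrates the idea with plain Python records, joining a component-level table to a hypothetical scores table on the `component` key. The `scores` values are made up for illustration.

```python
# Component-level table (one record per principal component).
components = [
    {"component": "PC1", "variance_explained": 0.45},
    {"component": "PC2", "variance_explained": 0.32},
]

# Hypothetical scores table: one record per column index and component.
scores = [
    {".ci": 0, "component": "PC1", "score": 1.2},
    {".ci": 0, "component": "PC2", "score": -0.4},
    {".ci": 1, "component": "PC1", "score": -0.7},
]

# Link the two tables through the shared "component" key.
variance_by_component = {
    c["component"]: c["variance_explained"] for c in components
}
linked = [
    {**s, "variance_explained": variance_by_component[s["component"]]}
    for s in scores
]
```

Storing the component-level values once and linking them by key avoids repeating them on every score record, which is the motivation for keeping such outputs as separate, related tables.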
14.5 Advanced Input Patterns
14.5.1 Reading Project Files
Sometimes operators need to access additional files stored in the same project:
# Get workflow and project information
workflow <- ctx$context$client$workflowService$get(ctx$context$workflowId)
project_id <- ctx$schema$projectId

# Find project files
# "\ufff0" is a high Unicode code point used as the upper bound of the key range
project_files <- ctx$client$projectDocumentService$findProjectObjectsByFolderAndName(
  c(project_id, "\ufff0", "\ufff0"),
  c(project_id, "", ""),
  useFactory = FALSE,
  limit = 25000
)

# Find specific file
target_file <- "config.csv"
file_names <- sapply(project_files, function(f) f$name)
# fixed = TRUE matches the name literally instead of as a regular expression
file_index <- which(grepl(target_file, file_names, fixed = TRUE))[1]

if (!is.na(file_index)) {
  pf <- project_files[[file_index]]

  # Download and read file
  response <- ctx$context$client$fileService$download(pf$id)
  file_content <- response$read()

  # Process as needed
  if (is.raw(file_content)) {
    file_content <- rawToChar(file_content)
  }

  # Use file content in analysis...
}
# Get project information
project_id = tercenCtx.schema.projectId

# Find project files
# "\ufff0" is a high Unicode code point used as the upper bound of the key range
project_files = tercenCtx.client.projectDocumentService.findProjectObjectsByFolderAndName(
    [project_id, "\ufff0", "\ufff0"],
    [project_id, "", ""],
    useFactory=False,
    limit=25000
)

# Find specific file
target_file = 'config.csv'
fnames = [f.name for f in project_files]
matching_files = [i for i, name in enumerate(fnames) if target_file in name]

if matching_files:
    pf = project_files[matching_files[0]]

    # Download and read file
    resp = tercenCtx.context.client.fileService.download(pf.id)
    file_content = resp.read()

    # Process as needed
    if isinstance(file_content, bytes):
        file_content = file_content.decode('utf-8')

    # Use file content in analysis...
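Once the content has been downloaded, it still needs to be parsed. As a minimal sketch of that last step, assuming the file is a small two-column CSV with `parameter` and `value` headers (a hypothetical layout), the standard library is enough. The `parse_csv_content` helper and the sample bytes are illustrative stand-ins for the download result, not part of the Tercen client.

```python
import csv
import io


def parse_csv_content(file_content):
    """Decode bytes if needed and parse CSV text into a list of dicts."""
    if isinstance(file_content, bytes):
        # utf-8-sig also strips a leading byte-order mark, if present.
        file_content = file_content.decode("utf-8-sig")
    reader = csv.DictReader(io.StringIO(file_content))
    return list(reader)


# Stand-in for the bytes returned by the download step.
downloaded = b"parameter,value\nthreshold,0.05\nmethod,median\n"

# Turn the rows into a simple lookup; values stay strings until converted.
config = {
    row["parameter"]: row["value"] for row in parse_csv_content(downloaded)
}
threshold = float(config["threshold"])
```

All values come back as strings, so numeric settings need an explicit conversion, as shown for `threshold`.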
Avoid manual file retrieval when possible. Instead, include files directly in the workflow input projection for better reproducibility and user experience.
14.6 Next Steps
With these input and output patterns mastered, you can:
- Create Complex Operators: Combine multiple patterns for sophisticated analyses
- Handle Edge Cases: Build robust operators that gracefully handle data issues
- Optimize Performance: Use efficient data processing techniques
- Integrate with Workflows: Design operators that work seamlessly in Tercen pipelines
The next chapter covers continuous integration and deployment strategies for your operators.