13 Input and Output Patterns
This chapter provides a walkthrough for using advanced input and output operations in Tercen. It demonstrates how to output data frames and relations, save files as Tercen tables, and read data from projects. This guide will help you better manage and interact with data in Tercen workflows.
1. Setting Up the Tercen Context
Before working with any Tercen workflow, start by setting up the context with TercenContext
, which connects to your specific Tercen instance.
2. Output Patterns
Data Frame Output
Data frames in Tercen are outputted per row, per column, or per cell. Each type of output requires to include a specific column: .ri
for rows and .ci
for columns.
2.1 Row-Wise Output
The following code outputs the mean value for each row. It groups the data by .ri
, calculates the mean, and saves it back to Tercen.
3. Tercen Relation Output
Relations in Tercen support complex data linking, such as joining tables and managing non-standard row/column associations. Relations allow for left joins and merging tables.
Here are the relevant API calls:
-
Create Relations using
as_relation()
. -
Join Relations with
left_join_relation()
. -
Save Relations with
save_relation()
.
Example: Generating a PCA relation with components.
data.matrix = t(ctx %>% as.matrix())
aPca = data.matrix %>% prcomp(scale = scale, center = center, tol = tol)
maxComp = ifelse(maxComp > 0, min(maxComp, nrow(aPca$rotation)), nrow(aPca$rotation))
npc = length(aPca$sdev)
# pad left pc names with 0 to ensure alphabetic order
pcRelation = tibble(PC = sprintf(paste0("PC%0", nchar(as.character(npc)), "d"), 1:npc)) %>%
ctx$addNamespace() %>%
as_relation()
eigenRelation = tibble(pc.eigen.values = aPca$sdev^2) %>%
mutate(var_explained = .$pc.eigen.values / sum(.$pc.eigen.values))%>%
ctx$addNamespace() %>%
as_relation()
loadingRelation = aPca$rotation[,1:maxComp] %>%
as_tibble() %>%
setNames(0:(ncol(.)-1)) %>%
mutate(.var.rids = 0:(nrow(.) - 1)) %>%
pivot_longer(-.var.rids,
names_to = ".pc.rids",
values_to = "pc.loading",
names_transform=list(.pc.rids=as.integer)) %>%
ctx$addNamespace() %>%
as_relation() %>%
left_join_relation(ctx$rrelation, ".var.rids", ctx$rrelation$rids)
scoresRelation = aPca$x[,1:maxComp] %>%
as_tibble() %>%
setNames(0:(ncol(.)-1)) %>%
mutate(.i=0:(nrow(.)-1)) %>%
pivot_longer(-.i,
names_to = ".pc.rids",
values_to = "pc.value",
names_transform=list(.pc.rids=as.integer)) %>%
ctx$addNamespace() %>%
as_relation() %>%
left_join_relation(ctx$crelation, ".i", ctx$crelation$rids)
# link all 4 relation into one and save
rels <- pcRelation %>%
left_join_relation(scoresRelation, pcRelation$rids, ".pc.rids") %>%
left_join_relation(eigenRelation, pcRelation$rids, eigenRelation$rids) %>%
left_join_relation(loadingRelation, pcRelation$rids, ".pc.rids") %>%
as_join_operator(ctx$cnames, ctx$cnames)
4. File Output
Tercen supports outputting files, such as images or documents, by first saving them temporarily and then converting them to a Tercen-compatible format.
from tempfile import NamedTemporaryFile
import matplotlib.pyplot as plt
# Save plot as a temporary file
tmp = NamedTemporaryFile(delete=True, suffix='.png')
data_np = df_per_cell["mean_cells"].to_numpy()
plt.hist(data_np, bins=5, edgecolor="black")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram")
plt.savefig(tmp)
# Convert file to Tercen table
from tercen.util.helper_functions import as_relation
df_plot = as_relation(tmp.name)
ctx.save_relation(df_plot)
5. Advanced Input: Reading from Project
Tercen enables reading files and documents directly from a project.
-
Identify Project and Folder IDs using
workflowId
andfolderId
. - Locate Files by matching the file name in the project.
-
Download Documents using
download()
.
Example: Retrieving a document from the project.
# Get the workflow and Folder ID that contains it
wf = ctx.context.client.workflowService.get(ctx.context.workflowId)
wf.folderId
# Get project ID
projectId = ctx.schema.projectId
project = ctx.context.client.projectService.get(projectId)
projectUser = project.acl.owner
projectFiles = ctx.client.projectDocumentService.findProjectObjectsByFolderAndName(\
[projectId, "ufff0", "ufff0"],\
[projectId, "", ""], useFactory=False, limit=25000 )
fnames = [f.name for f in projectFiles]
## Download a document
my_file_id = [index for index, fname in enumerate(fnames) if 'crabs_file.csv' in fname][0] # First match
# Get Document properties
pf = projectFiles[my_file_id]
pf.name # file name
pf.folderId # folder ID
pf.id # document ID
# Download data and read response
resp = ctx.context.client.fileService.download(pf.id)
resp.read()
# If binary response, must be decoded
Recommendation: Avoid manual file retrieval by setting document IDs directly in the workflow input projection.
This chapter provides essential steps for managing inputs and outputs within Tercen. Utilize these methods to streamline workflows and enhance data integration capabilities in your Tercen projects.