15  Operator Improvements and Best Practices

This chapter covers advanced techniques for improving operator quality, reliability, and maintainability. You’ll learn essential practices for logging, error handling, testing, and optimization that ensure your operators work robustly in production environments.

Learning Objectives

By the end of this chapter, you will be able to:
  • Implement comprehensive logging and debugging strategies
  • Build robust error handling and input validation
  • Create comprehensive test suites for operators
  • Apply performance optimization techniques
  • Follow best practices for production-ready operators

Prerequisites

Before proceeding, ensure you’ve completed:
  • Basic Implementation chapter for core operator concepts
  • Advanced Features chapter for complex functionality
  • Data Input and Output Patterns for data handling

15.1 Logging and Debugging

Effective logging is essential for monitoring operator behavior and diagnosing issues in production environments.

15.1.1 Basic Logging

Implement logging for production operators:

R:

ctx$log("Your message.")

Python:

ctx.log("Your message.")

Logging Best Practices
  • Log key milestones: Start/end of major operations
  • Include data metrics: Row counts, processing times, memory usage
  • Log parameter values: Help reproduce issues with specific inputs
  • Use structured formats: Enable easier log parsing and analysis
  • Avoid logging sensitive data: Protect user privacy and security
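
As a sketch of these practices in a Python operator (assuming the ctx object shown above; the metric names and message formats are illustrative):

# A minimal milestone-logging sketch with data metrics (names illustrative)
import time

start = time.time()
df = ctx.select(['.ri', '.ci', '.y'], df_lib="polars")
ctx.log(f"[INFO] analysis started - rows={df.height}")

# ... main computation here ...

elapsed = time.time() - start
ctx.log(f"[INFO] analysis finished - elapsed={elapsed:.2f}s")

Keeping a consistent key=value format across log lines makes them easy to grep and parse later.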

15.2 Error Handling

Implement comprehensive error handling that provides helpful feedback to users:

# R: comprehensive error handling with user-friendly messages
robust_operator <- function(ctx) {
  tryCatch({
    # Validate inputs first (validate_inputs() is a user-defined helper)
    validate_inputs(ctx)
    
    # Main processing with progress logging
    ctx$log(paste("[INFO]", Sys.time(), "- Beginning data analysis"))
    
    # Check for edge cases
    data <- ctx$select(c(".ri", ".ci", ".y"))
    
    if (any(is.infinite(data$.y))) {
      ctx$log(paste("[WARNING]", Sys.time(), "- Infinite values detected, removing them"))
      data <- data[is.finite(data$.y), ]
    }
    
    if (nrow(data) == 0) {
      stop("No valid data remaining after cleaning")
    }
    
    # Perform analysis
    result <- perform_analysis(data)
    
    ctx$log(paste("[INFO]", Sys.time(), "- Analysis completed successfully"))
    return(result)
    
  }, error = function(e) {
    # Log the technical error
    ctx$log(paste("[ERROR]", Sys.time(), "- Technical error:", e$message))
    
    # Provide user-friendly error message
    if (grepl("projection.*required", e$message)) {
      stop("Please ensure you have dragged the required data columns to the appropriate axes.")
    } else if (grepl("data points required", e$message)) {
      stop("This analysis requires at least 3 data points. Please check your data selection.")
    } else if (grepl("values must vary", e$message)) {
      stop("The data values do not vary enough for this analysis. Please check your input data.")
    } else {
      stop(paste("An error occurred during analysis:", e$message))
    }
  })
}

# Python: comprehensive error handling with user-friendly messages
from datetime import datetime
import polars as pl

def robust_operator(tercen_ctx):
    """Operator with comprehensive error handling."""
    
    try:
        # Validate inputs first (validate_inputs is a user-defined helper; see the sketch below)
        validate_inputs(tercen_ctx)
        
        # Main processing with progress logging
        tercen_ctx.log(f"[INFO] {datetime.now()} - Beginning data analysis")
        
        # Check for edge cases
        df = tercen_ctx.select(['.ri', '.ci', '.y'], df_lib="polars")
        
        # Handle infinite values
        infinite_count = df.filter(pl.col('.y').is_infinite()).height
        if infinite_count > 0:
            tercen_ctx.log(f"[WARNING] {datetime.now()} - Infinite values detected ({infinite_count}), removing them")
            df = df.filter(pl.col('.y').is_finite())
        
        if len(df) == 0:
            raise ValueError("No valid data remaining after cleaning")
        
        # Perform analysis
        result = perform_analysis(df)
        
        tercen_ctx.log(f"[INFO] {datetime.now()} - Analysis completed successfully")
        return result
        
    except ValueError as ve:
        tercen_ctx.log(f"[ERROR] {datetime.now()} - Validation error: {str(ve)}")
        
        # Provide user-friendly error messages
        if "projection" in str(ve) and "required" in str(ve):
            raise ValueError("Please ensure you have dragged the required data columns to the appropriate axes.")
        elif "data points required" in str(ve):
            raise ValueError("This analysis requires at least 3 data points. Please check your data selection.")
        elif "values must vary" in str(ve):
            raise ValueError("The data values do not vary enough for this analysis. Please check your input data.")
        else:
            raise ValueError(f"An error occurred during analysis: {str(ve)}")
            
    except Exception as e:
        tercen_ctx.log(f"[ERROR] {datetime.now()} - Unexpected error: {str(e)}")
        raise ValueError(f"An unexpected error occurred. Please check your data and try again. Error: {str(e)}")

Error Handling Guidelines
  • Validate early: Check inputs before expensive computations
  • Fail gracefully: Provide clear, actionable error messages
  • Log technical details: Help with debugging while keeping user messages simple
  • Handle edge cases: Account for missing data, infinite values, empty datasets
  • Test error scenarios: Ensure error handling works as expected
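
The validate_inputs() helper called in the examples above is user-defined. A minimal Python sketch might look like this; the thresholds and messages are illustrative, chosen to match the patterns the error handlers above test for:

# A minimal sketch of a user-defined validate_inputs helper (thresholds illustrative)
def validate_inputs(tercen_ctx):
    """Raise a descriptive error before any expensive computation."""
    df = tercen_ctx.select(['.y'], df_lib="polars")
    if df.height == 0:
        raise ValueError("A y-axis projection is required for this analysis.")
    if df.height < 3:
        raise ValueError("At least 3 data points required.")
    if df['.y'].drop_nulls().n_unique() < 2:
        raise ValueError("The y-axis values must vary for this analysis.")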

15.3 Testing and Validation

Comprehensive testing ensures your operator works correctly across different data scenarios and edge cases.

Testing Approaches

Tercen supports two main testing frameworks:
  1. Unit Tests: Simple data files with expected input/output and test specifications
  2. Integration Tests: Actual Tercen workflows triggered to perform computations

15.3.1 Unit Test Structure

Create a tests directory in your operator repository with the following structure:

tests/
├── input.csv          # Sample input data
├── output.csv         # Expected output data  
└── test.json          # Test configuration

For multiple test scenarios, use numbered files:
  • test_1.json, test_2.json for different parameter settings
  • input_1.csv, input_2.csv for different data scenarios

15.3.2 Creating Comprehensive Test Data

Design test cases that cover various scenarios:

Test Scenario    Purpose               Example Data
-------------    -------------------   -------------------------------------------
Normal Case      Standard operation    Regular numeric data with good distribution
Edge Cases       Boundary conditions   Minimum data points, extreme values
Error Cases      Invalid inputs        Missing data, wrong data types
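
As a sketch, the normal and edge-case inputs above could be generated with a short script. The column names mirror the projections selected earlier and the values are illustrative; adapt both to what your operator actually consumes:

# Hypothetical generation of numbered test inputs (values illustrative)
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Normal case: well-distributed numeric data
pd.DataFrame({
    '.ri': np.repeat(range(10), 5),
    '.ci': np.tile(range(5), 10),
    '.y': rng.normal(loc=10, scale=2, size=50),
}).to_csv('tests/input_1.csv', index=False)

# Edge case: minimum data points with extreme values
pd.DataFrame({
    '.ri': [0, 0, 0],
    '.ci': [0, 1, 2],
    '.y': [1e-12, 1.0, 1e12],
}).to_csv('tests/input_2.csv', index=False)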

15.3.3 Test Configuration File

Create comprehensive test.json files to specify how Tercen should run your tests:

Basic Test Configuration:

{
  "kind": "OperatorUnitTest",
  "name": "regression_test_basic",
  "namespace": "test",
  "inputDataUri": "input.csv",
  "outputDataUri": ["output.csv"],
  "columns": [],
  "rows": [],
  "colors": [],
  "labels": [],
  "yAxis": ".y",
  "xAxis": ".x",
  "properties": {
    "intercept.omit": false,
    "confidence.level": 0.95
  }
}
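
A numbered variant for a second parameter setting follows the same schema (the file names and property values below are illustrative):

{
  "kind": "OperatorUnitTest",
  "name": "regression_test_no_intercept",
  "namespace": "test",
  "inputDataUri": "input_2.csv",
  "outputDataUri": ["output_2.csv"],
  "columns": [],
  "rows": [],
  "colors": [],
  "labels": [],
  "yAxis": ".y",
  "xAxis": ".x",
  "properties": {
    "intercept.omit": true,
    "confidence.level": 0.99
  }
}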

This completes our comprehensive guide to Tercen operator development. You now have all the tools and knowledge needed to create robust, efficient, and user-friendly operators that extend Tercen’s analytical capabilities!