Operations Overview
The Operations section provides real-time visibility and control over your layline.io deployments, from cluster health to individual engine states.
Purpose
Once you've designed and deployed your workflows, the Operations section becomes your mission control. This is where you monitor live systems, diagnose issues, and manage the day-to-day running of your data pipelines. Unlike the Assets section (where you build) or the Deployment section (where you configure), Operations is about observing and interacting with what's actually happening right now.
The Operations section is organized around three core concepts:
- Cluster Management — The infrastructure view: nodes, deployments, and system health
- Engine State — The runtime view: what's executing, what's connected, what's flowing
- Audit Trail — The history view: who did what, when, and with what result
Who Uses Operations
- Operations Engineers — Monitor cluster health, respond to alarms, manage deployments
- Developers — Debug running workflows, inspect live state, trace data flow
- Administrators — Manage user access, review audit logs, configure system settings
Main Areas
Cluster Management
The cluster is the foundation — a collection of nodes running layline.io engines. This section covers:
- Cluster Login — How to connect to and authenticate with a cluster
- Cluster Tab Overview — Navigating the cluster-level interface
- Alarm Center — Real-time alerts, thresholds, and notification routing
- Deployment Storage — Where deployment configurations live and how to manage them
- Scheduler — Workflow scheduling and execution history
- Stream Monitor — Controller to observe and manage data streams, throughput, and backpressure
- Sniffer Directory — Controller to observe and manage message sniffing
- Access Coordinator — Managing access to sources and resources
- Operations User Storage — User- and role-specific operational data and preferences
- Operations Secret Storage — Secure credential management for operations
- AI Storage — Storage for AI/ML model artifacts and training data
- Cluster Node Detail — Deep dive into individual node metrics and logs, and switch the debugging context to a specific node
Engine State
While Cluster Management shows you the infrastructure, Engine State shows you what's actually running on it. This is the live runtime view:
- Engine State Overview — The main dashboard for runtime monitoring
- Workflow State — Active workflows, their status, and execution context
- Service State — Running services and their health
- Connection State — Active connections to external systems
- Source State — Input sources and their folders, read positions, etc.
- Sink State — Output sinks and their write status
- Format State — Format parsers and logs
- Resource State — Resource status and detail configs
Engine State is particularly useful for debugging: you can check whether all Assets are running as expected, and inspect the detailed state and configuration of each.
Audit Trail
The Audit Trail provides a comprehensive record of all workflow- and stream-related actions taken within the system:
- Audit Trail Overview — Understanding the audit log structure and retention
Audit logs capture:
- Workflow executions (start, completion, failure)
- Stream events (data arrival, processing milestones)
Other logging for system events (alarms, node status changes, etc.) can be found in the respective sections of Cluster Controllers and Engine State.
Navigating the Operations UI
The Operations section uses a three-level navigation pattern:
- Section Tabs — Switch between Cluster, Engine State, and Audit Trail
- Category Sidebar — Within each section, navigate between specific tools (e.g., Alarm Center, Scheduler)
- Detail Panels — Drill into specific entities (a node, a workflow, a log entry)
Most operational screens follow a similar layout:
- Top bar — Context selector (cluster, environment, time range)
- Main panel — Primary data (lists, graphs, diagrams)
- Sidebar — Filters, quick actions, related links
Common Workflows
Investigating an Alarm
- Alarm fires → Notification sent (email/Teams)
- Open Alarm Center to see the alert details
- Check the Cluster Tab Overview for node health
- Drill into Engine State to find the affected workflow
- Review Audit Trail for recent changes
- Take corrective action (restart, redeploy, or escalate)
Tracing a Data Flow Issue
- Start in Audit Trail Workflow to identify workflow instances with errors
- Check Audit Trail Stream to confirm data is arriving and is being processed
- Review Engine State to check workflow and service health
- Use Cluster Node Detail to inspect logs and metrics on the node running the affected workflow
- Identify bottlenecks or failures and take action (e.g., restart workflow, adjust resources, or fix configuration)
Key Concepts
Cluster vs. Engine
- Cluster — The physical or virtual infrastructure (nodes, networks, storage)
- Engine — The layline.io runtime process executing workflows
A cluster can run multiple engines. An engine belongs to one cluster.
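The one-to-many relationship above can be pictured with a small data model. This is only an illustrative sketch: the `Cluster` and `Engine` classes, their fields, and the names `prod`, `engine-1`, and `engine-2` are hypothetical and do not correspond to any layline.io API.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    """A runtime process executing workflows (hypothetical model)."""
    name: str
    cluster_name: str  # an engine belongs to exactly one cluster


@dataclass
class Cluster:
    """The infrastructure hosting engines (hypothetical model)."""
    name: str
    engines: dict[str, Engine] = field(default_factory=dict)

    def add_engine(self, engine_name: str) -> Engine:
        # Engine names are assumed to be unique within a cluster.
        if engine_name in self.engines:
            raise ValueError(f"engine {engine_name!r} already exists in {self.name!r}")
        engine = Engine(name=engine_name, cluster_name=self.name)
        self.engines[engine_name] = engine
        return engine


# One cluster running multiple engines.
cluster = Cluster("prod")
cluster.add_engine("engine-1")
cluster.add_engine("engine-2")
```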
Live State vs. Configuration
- Configuration (Assets section) — What should be running (the blueprint)
- Live State (Operations section) — What is running right now (the reality)
Operations shows live state. If you see a discrepancy (e.g., a workflow that is configured in a project but is not running), it usually means that a deployment has failed or is still pending, or that an error has occurred.
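Conceptually, spotting such a discrepancy is a comparison between two sets of names: what the configuration says should run, and what is actually running. The sketch below illustrates that idea only. The function `find_drift` and the workflow names are hypothetical, and how you obtain the two sets (for example, by reading them off the Operations UI) is left open.

```python
def find_drift(configured: set[str], running: set[str]) -> dict[str, list[str]]:
    """Compare the blueprint (configured) with reality (running)."""
    return {
        # Configured but not running: failed or pending deployment, or an error.
        "missing": sorted(configured - running),
        # Running but not configured: possibly a stale deployment.
        "unexpected": sorted(running - configured),
    }


# Hypothetical workflow names, for illustration only.
configured = {"ingest-orders", "enrich-orders", "export-billing"}
running = {"ingest-orders", "export-billing"}

drift = find_drift(configured, running)
print(drift)  # {'missing': ['enrich-orders'], 'unexpected': []}
```

Here `enrich-orders` is the workflow to look up in Engine State and the Audit Trail.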
Alarms vs. Logs
- Alarms — Notifications about current problems requiring attention
- Logs — Historical record of past events for analysis
Alarms are actionable now. Logs are searchable history.
Security Considerations
Operations provides powerful visibility into running systems. Access is typically restricted:
- Read-only access — View metrics, logs, and state (typical for developers)
- Operational access — Restart workflows, acknowledge alarms, trigger deployments (typical for ops engineers)
- Administrative access — Full control including user management and audit log access (typical for admins)
See Access Coordinator for details on permission management.
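As a rough mental model, the three access tiers can be treated as ordered levels, where each level includes everything the lower ones allow. The sketch below is an assumption made for illustration: the `AccessLevel` enum, the action names, and the cumulative ordering are hypothetical and are not taken from the Access Coordinator's actual permission model.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical ordered tiers; a higher value includes the lower ones."""
    READ_ONLY = 1
    OPERATIONAL = 2
    ADMINISTRATIVE = 3


# Minimum level assumed for each action (action names are illustrative).
REQUIRED_LEVEL = {
    "view_metrics": AccessLevel.READ_ONLY,
    "view_state": AccessLevel.READ_ONLY,
    "restart_workflow": AccessLevel.OPERATIONAL,
    "acknowledge_alarm": AccessLevel.OPERATIONAL,
    "trigger_deployment": AccessLevel.OPERATIONAL,
    "manage_users": AccessLevel.ADMINISTRATIVE,
    "read_audit_log": AccessLevel.ADMINISTRATIVE,
}


def is_allowed(level: AccessLevel, action: str) -> bool:
    """Return True if the given level meets the action's minimum level."""
    required = REQUIRED_LEVEL.get(action)
    # Unknown actions are denied by default.
    return required is not None and level >= required


print(is_allowed(AccessLevel.READ_ONLY, "restart_workflow"))    # False
print(is_allowed(AccessLevel.OPERATIONAL, "restart_workflow"))  # True
```

Denying unknown actions by default keeps the sketch on the safe side: an action missing from the table is never granted by accident.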
See Also
- Deployment — How deployments are configured and created
- Workflow — Workflow design and configuration
- Assets Overview — Building the components that run in Operations