On-Prem Observability for Background Jobs with OpenTelemetry and SigNoz
Bring every cron script, legacy EXE, PowerShell task, and Python job into one dashboard you host yourself -- without sending a single byte to a SaaS vendor.
Part 3 of 3 in a series on observing a .NET system with OpenTelemetry and SigNoz (series index). It reuses the Collector, Compose file, and SigNoz install from Part 1.
Web apps are the easy part of observability. The hard part is everything else that keeps your business running after hours: a nightly cron job you inherited, a Python ETL a data team owns, a PowerShell backup task, a 12-year-old exporter that writes nothing but export-20260529.txt. You can't dotnet add package your way out of those -- and they're often exactly the jobs that fail silently at 3 a.m.
This post shows how to get all of them into SigNoz, running entirely on your own infrastructure. By the end you'll be able to observe a job no matter where it sits on the spectrum -- from "only writes text files" to "full OpenTelemetry SDK." Every script and config is included at the end -- there's no repo to clone.
What you'll learn
- Why on-prem observability is the right fit for background jobs
- How the OpenTelemetry Collector acts as a universal on-ramp for anything
- Case A: collect a job that only writes .txt files (no code changes)
- Case B: instrument a Python job with the OpenTelemetry SDK
- Case C: get telemetry out of PowerShell, which has no official SDK
- How they all show up side by side in SigNoz
Why on-prem?
SigNoz is fully open-source and self-hosted, which matters more for background jobs than for almost anything else:
- Your data stays in your network. Batch jobs touch your most sensitive data -- financial exports, PII, backups. With self-hosted SigNoz, the telemetry about that work never leaves your infrastructure. That's a real advantage in regulated or air-gapped environments where shipping logs to a SaaS is a non-starter.
- No per-GB surprise bill. Jobs are noisy -- verbose logs, high-frequency runs. On a usage-priced SaaS, job logs are where the bill explodes. Self-hosted, the cost is the box it runs on.
- It works where the jobs work. Plenty of these jobs run on an on-prem server or a locked-down VM with no outbound internet. The Collector and SigNoz run right there next to them.
Everything in this post runs locally: the apps and the Collector in your network, and SigNoz storing data in its own ClickHouse database on your infrastructure. Nothing egresses.
The one idea: the Collector is the on-ramp
Here's the whole mental model. The OpenTelemetry Collector is a small service that ingests telemetry from many sources and forwards it to SigNoz. Apps you control speak OTLP to it directly. Apps you don't control get adapted at the Collector:
in-solution .NET worker ──OTLP──┐
Python job (OTEL SDK) ──OTLP────┤
PowerShell (OTLP/HTTP) ─────────┤──▶ OpenTelemetry Collector ──▶ SigNoz (on your infra)
legacy job (.txt files) ─(filelog reads the files)─┘
Think of jobs on a maturity ladder, and the Collector handles every rung:
- No telemetry, only log files → the Collector reads the files (filelog receiver) and turns each line into a log record.
- Can write structured lines or POST a payload, but no SDK → JSON-lines files, or OTLP/HTTP over the wire.
- Has a real SDK (Python, Java, Go, Node) → traces + metrics + logs natively.
Where a job sits only changes how its signal gets in -- never where it lands. After a quick look at the top rung for contrast, we'll start at the bottom of the ladder, which is the hardest and most common.
The easy case (for contrast): an in-solution .NET worker
If you own the code and it's .NET, it takes a single call. A background worker calls the same AddObservability helper the web apps use (full source in Part 1), points it at its own WorkerTelemetry class of custom instruments (full source in the appendix), and just turns off the web-server instrumentation:
builder.AddObservability("worker-jobs", options =>
{
    options.InstrumentAspNetCore = false; // not a web server
    options.ActivitySources.Add(WorkerTelemetry.ActivitySourceName);
    options.Meters.Add(WorkerTelemetry.MeterName);
});
That's the gold standard: custom spans, custom metrics, and -- because its HttpClient is instrumented -- automatic distributed traces (worker-jobs → backend-api → db). Everything below is what you do when you can't make that one call.
Case A -- a job that only writes .txt files
This is the job you'll meet most often: a legacy exporter, no SDK, no source you can change. It only appends human-readable lines:
2026-05-29 12:00:00 [INFO] run #3: wrote 161 records to dataset
2026-05-29 12:00:21 [ERROR] run #7: export failed: connection reset by peer
at LegacyExporter.Flush(batchId=7)
at LegacyExporter.Run()
at main()
You collect it with the Collector's filelog receiver, which tails the files and turns each entry into a log record. Here's the config, with each part explained:
receivers:
  filelog/legacy:
    include: [/var/log/legacy/*.txt] # 1. which files (glob handles daily rotation)
    start_at: beginning
    multiline: # 2. keep stack traces together
      line_start_pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
    operators:
      - type: regex_parser # 3. split each entry into fields
        regex: '(?s)^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<sev>\w+)\] (?P<msg>.*)$'
        timestamp: { parse_from: attributes.ts, layout: '%Y-%m-%d %H:%M:%S' }
        severity: { parse_from: attributes.sev }
      - type: move # 4. put the message in the log body
        from: attributes.msg
        to: body
What each numbered part does:
1. include is a glob, so export-20260529.txt, …30.txt, … are all picked up -- no config change when the date rolls over.
2. multiline.line_start_pattern says "a new log entry begins only on a line that starts with a timestamp." Without it, a four-line stack trace becomes four useless records. With it, the [ERROR] line and its at … frames become one record. This is the single most important setting for legacy logs.
3. regex_parser pulls out the timestamp, severity, and message. The leading (?s) matters: Go's regex engine needs it so that the dot (.) can span newlines -- without it the multiline error entries fail to parse. The timestamp block makes SigNoz use the log's own time, and severity maps INFO/WARN/ERROR so severity filtering works.
4. move promotes the clean message into the log body.
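To see what the multiline and regex_parser steps do to the sample above, here's a minimal Python sketch of the same two steps. It's an illustration, not how the Collector is implemented: group_entries and parse_entry are made-up helper names, and Python's re module stands in for Go's regex engine (both accept the (?P<name>...) and (?s) syntax used in the config).

```python
import re
from datetime import datetime

# The same two patterns as the Collector config, in Python's `re` syntax.
LINE_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
ENTRY = re.compile(
    r"(?s)^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<sev>\w+)\] (?P<msg>.*)$"
)


def group_entries(lines):
    """Start a new entry only on a line that begins with a timestamp."""
    entries = []
    for line in lines:
        if LINE_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += "\n" + line  # continuation line, e.g. a stack frame
    return entries


def parse_entry(entry):
    """Split one grouped entry into timestamp, severity, and body."""
    match = ENTRY.match(entry)
    if match is None:
        return None
    return {
        "timestamp": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S"),
        "severity": match["sev"],
        "body": match["msg"],
    }


sample = [
    "2026-05-29 12:00:00 [INFO] run #3: wrote 161 records to dataset",
    "2026-05-29 12:00:21 [ERROR] run #7: export failed: connection reset by peer",
    "    at LegacyExporter.Flush(batchId=7)",
    "    at LegacyExporter.Run()",
]
records = [parse_entry(entry) for entry in group_entries(sample)]
```

Here records holds two entries: the INFO line, and a single ERROR record whose body carries the at ... frames. Remove the (?s) from ENTRY and parse_entry returns None for the ERROR entry, because the dot stops at the first newline.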
One more step: a file has no service name, so we stamp one with a resource processor and route it through its own logs pipeline:
processors:
  resource/legacy:
    attributes:
      - { key: service.name, value: legacy-batch-job, action: upsert }
      - { key: service.namespace, value: blazor-signoz, action: upsert }
service:
  pipelines:
    logs/filelog:
      receivers: [filelog/legacy]
      processors: [resource/legacy, batch]
      exporters: [otlp/signoz]
(This receiver, processor, and pipeline are part of the full Collector config in Part 1's appendix.)
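If upsert is unfamiliar: it sets the attribute whether or not it already exists, where insert only adds a missing key and update only overwrites an existing one. A small Python sketch of those three semantics follows; apply_resource_actions is a hypothetical helper written for this illustration, not part of the Collector.

```python
def apply_resource_actions(resource, actions):
    """Return a copy of the resource attributes with each action applied in order."""
    result = dict(resource)
    for action in actions:
        key, value, kind = action["key"], action["value"], action["action"]
        exists = key in result
        if kind == "upsert":
            result[key] = value  # set it either way
        elif kind == "insert" and not exists:
            result[key] = value  # only fills a gap
        elif kind == "update" and exists:
            result[key] = value  # only overwrites an existing value
    return result


actions = [
    {"key": "service.name", "value": "legacy-batch-job", "action": "upsert"},
    {"key": "service.namespace", "value": "blazor-signoz", "action": "upsert"},
]

# Records read from a file arrive with no service identity at all.
stamped = apply_resource_actions({}, actions)
# upsert also replaces a value that is already there.
restamped = apply_resource_actions({"service.name": "unknown_service"}, actions)
```

Both calls end with service.name set to legacy-batch-job, which is why upsert is the safe default when you want a file's records to land under exactly one name.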
Now the file-only job groups under legacy-batch-job in SigNoz, right next to your real services. Here's the payoff -- that multiline error as a single, parsed record:
A job that only writes .txt files, in SigNoz. The [ERROR] line and its three at … stack frames are one record body; severity is parsed to ERROR, the timestamp is the log's own, and log.file.name points back to the source file -- all done in the Collector, with zero changes to the job.
On-prem note -- don't lose your place. The demo uses start_at: beginning on purpose, so you see the existing .txt lines the first time the Collector starts. But on its own that re-reads from the top on every restart (duplicates), while start_at: end skips anything written while the Collector was down (gaps). For production, add a file_storage extension so the read offsets survive restarts:
extensions:
  file_storage: { directory: /var/lib/otelcol/storage }
receivers:
  filelog/legacy: { include: [/var/log/legacy/*.txt], start_at: end, storage: file_storage }
The extension also has to be listed under service.extensions, or the Collector will not start it.
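The idea behind stored offsets fits in a few lines of Python. This is a sketch of the concept only, assuming a single file and a JSON checkpoint; the Collector's file_storage extension uses its own on-disk format, and the filelog receiver also copes with rotated files, which this does not. The read_new_lines helper and the offsets.json name are invented for the example.

```python
import json
import os
import tempfile


def read_new_lines(log_path, checkpoint_path):
    """Return complete lines appended since the last call, persisting the byte offset."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as fh:
            offset = json.load(fh)["offset"]

    with open(log_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()

    # Only consume up to the last newline, so a half-written line is left for next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()

    with open(checkpoint_path, "w", encoding="utf-8") as fh:
        json.dump({"offset": offset + end}, fh)
    return lines


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "export-20260529.txt")
    checkpoint_path = os.path.join(tmp, "offsets.json")

    with open(log_path, "w", encoding="utf-8") as fh:
        fh.write("line 1\nline 2\n")
    first = read_new_lines(log_path, checkpoint_path)

    # Simulated restart: nothing is kept in memory, only the checkpoint file survives.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write("line 3\n")
    second = read_new_lines(log_path, checkpoint_path)
```

The second call returns only line 3: no duplicates from re-reading the top, and no gap for what was written in between.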
Case B -- Python, with the OpenTelemetry SDK
When the job's language has a real SDK, use it. The Python ETL job produces traces, metrics, and logs identical in shape to the C# services, from just three packages:
opentelemetry-api
opentelemetry-sdk
opentelemetry-exporter-otlp-proto-grpc
You build one Resource (the job's identity) and share it across the three providers, so everything groups under one service:
resource = Resource.create({"service.name": "python-etl-job", "service.namespace": "blazor-signoz"})
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=ENDPOINT, insecure=True)))
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer("etl_job")
You set up a meter_provider (for the python.etl.* metrics) and a logger_provider (for logs) in exactly the same shape -- the full script is in the appendix.
Then your work nests spans naturally, so SigNoz shows an extract → transform → load waterfall:
with tracer.start_as_current_span("python.etl.run"):
    with tracer.start_as_current_span("extract"): ...
    with tracer.start_as_current_span("transform"): ...
    with tracer.start_as_current_span("load"): ...
One ETL run as a trace. The root python.etl.run span (484 ms) contains extract, transform, and load, exactly as the with blocks nest them. This run is one of the roughly 15% that fail at load -- the span is red and the header reads Errors: 1, so you can see the stage and the timing of the failure without opening a log file on the box. A Python script you can read top to bottom, shown the same way a distributed microservice trace would be.
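If you're curious why nested with blocks turn into a parent/child waterfall, here's a toy tracer in plain Python. It is not the OpenTelemetry SDK and makes no claim about its internals; start_span and finished are names invented for this sketch. The only idea it demonstrates is that each span records whichever span was current when it started.

```python
import time
from contextlib import contextmanager
from contextvars import ContextVar

_current_span = ContextVar("current_span", default=None)
finished = []  # spans are appended here as they end


@contextmanager
def start_span(name):
    """Open a span whose parent is whatever span is current right now."""
    parent = _current_span.get()
    span = {
        "name": name,
        "parent": parent["name"] if parent else None,
        "start": time.perf_counter(),
    }
    token = _current_span.set(span)  # this span is now the current one
    try:
        yield span
    finally:
        _current_span.reset(token)  # hand "current" back to the parent
        span["duration_ms"] = (time.perf_counter() - span["start"]) * 1000
        finished.append(span)


with start_span("python.etl.run"):
    with start_span("extract"):
        pass
    with start_span("transform"):
        pass
    with start_span("load"):
        pass

parents = {span["name"]: span["parent"] for span in finished}
```

Children finish before the root, so finished lists extract, transform, load, and then python.etl.run, with all three stages pointing at the root as their parent.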
The one line short-lived jobs must not skip: batch processors buffer telemetry and flush on a timer. A job that exits before the next flush can drop whatever is still buffered -- your last spans and logs vanish. Always flush before exit:
finally:
    tracer_provider.shutdown()
    meter_provider.shutdown()
    logger_provider.shutdown()
Skipping this is the #1 cause of "my cron job ran but I see nothing."
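Here's the failure mode in miniature. BatchBuffer is a toy stand-in written for this post, not an SDK class, and to keep the example deterministic it flushes on batch size rather than on a timer. The shape of the problem is the same: whatever sits in the buffer at exit is lost unless something flushes it.

```python
class BatchBuffer:
    """Toy batch processor: holds items and exports them in batches."""

    def __init__(self, export, max_batch=512):
        self._export = export
        self._max_batch = max_batch
        self._pending = []

    def add(self, item):
        self._pending.append(item)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self):
        if self._pending:
            self._export(list(self._pending))
            self._pending.clear()

    def shutdown(self):
        self.flush()  # last chance to send what is still buffered


exported = []
buffer = BatchBuffer(export=exported.extend, max_batch=3)
try:
    for span_name in ["extract", "transform", "load", "python.etl.run"]:
        buffer.add(span_name)
    before_shutdown = list(exported)  # what a job that just exits would have sent
finally:
    buffer.shutdown()
```

Before shutdown only the first full batch has gone out; the root span, which ends last, is still in the buffer. That matches what a missing flush looks like in practice: the tail of the run is what disappears.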
Don't want to touch the script? Use zero-code auto-instrumentation instead:
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
OTEL_SERVICE_NAME=python-etl-job \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:5317 OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
opentelemetry-instrument python etl_job.py
(Set OTEL_EXPORTER_OTLP_PROTOCOL=grpc explicitly so the protocol always matches the port: 5317 is the Collector's gRPC port in this demo, and an exporter speaking OTLP/HTTP would need the 5318 port instead.)
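That gives you traces from any instrumented libraries the script uses and, once OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true is set, logs from the standard logging module, all without code changes; drop to the manual SDK only when you want custom spans like extract/transform/load and custom metrics.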
Case C -- PowerShell, which has no official SDK
PowerShell has no official OpenTelemetry SDK, so a small dot-sourceable helper (OtelExport.ps1, in the appendix) offers two pragmatic paths.
Option A -- write JSON-lines to a file (most robust). The script just appends one JSON object per line; the Collector's filelog receiver picks it up with a json_parser:
Write-JobLog -Message "Backed up $files files" -Level INFO -Attributes @{ files = $files }
# -> {"time":"2026-05-29T12:00:00...","level":"INFO","msg":"Backed up 231 files","files":231}
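The Collector side is a second filelog receiver, just like the legacy one in Case A but with a json_parser instead of a regex. Add it next to filelog/legacy and wire it into a logs pipeline with its own resource processor; reusing resource/legacy would upsert service.name to legacy-batch-job and file these lines under the wrong service: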
receivers:
  filelog/powershell:
    include: [/var/log/powershell/*.log]
    start_at: beginning
    operators:
      - type: json_parser
        timestamp: { parse_from: attributes.time, layout: '%Y-%m-%dT%H:%M:%S.%L%z' }
        severity: { parse_from: attributes.level }
      - type: move # promote the message to the log body
        from: attributes.msg
        to: body
This is the safest option: the script never blocks on the network, and if the Collector is down it catches up later. Best for batch jobs.
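The JSON-lines contract is simple enough to sketch from both ends in Python. format_job_log plays the role of Write-JobLog and parse_job_log mimics json_parser plus move; both names are hypothetical. One simplification to be aware of: the sketch drops time and level from the attributes after parsing them, which may differ from what the Collector keeps.

```python
import json
from datetime import datetime, timezone


def format_job_log(message, level="INFO", attributes=None, now=None):
    """Build one JSON-lines entry: a single JSON object on a single line."""
    now = now or datetime.now(timezone.utc)
    entry = {"time": now.isoformat(), "level": level, "msg": message}
    entry.update(attributes or {})
    return json.dumps(entry)


def parse_job_log(line):
    """Parse one line: known fields are promoted, the rest stay as attributes."""
    attributes = json.loads(line)
    return {
        "timestamp": datetime.fromisoformat(attributes.pop("time")),
        "severity": attributes.pop("level"),
        "body": attributes.pop("msg"),
        "attributes": attributes,
    }


line = format_job_log(
    "Backed up 231 files",
    attributes={"files": 231},
    now=datetime(2026, 5, 29, 12, 0, 0, tzinfo=timezone.utc),
)
record = parse_job_log(line)
```

Because every entry is one self-contained line, the writer never needs the network and the reader never needs multiline rules; files stays a number instead of being buried in a text message.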
Option B -- POST OTLP/HTTP directly (real-time). Send-OtelLog / Send-OtelTrace POST JSON straight to the Collector's OTLP/HTTP port (:4318 inside the Compose network, :5318 from the host in this demo) with Invoke-RestMethod. You get real spans as the job runs -- but OTLP/JSON has sharp edges, and the helper handles each one:
- 64-bit timestamps must be quoted strings. timeUnixNano is built as a string (([long]$ms * 1000000).ToString()), because JSON numbers can't safely hold int64.
- Trace/span IDs are hex, not base64 -- 32 hex chars for the trace, 16 for the span.
- Enums are integers -- severityNumber (INFO = 9), span kind, status code.
- ConvertTo-Json -Depth 12 -- the default depth of 2 silently truncates the nested resourceLogs → scopeLogs → logRecords structure.
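The same rules apply in any language that hand-rolls OTLP/JSON. Here's a minimal Python sketch of a logs payload that follows them, using only the standard library. build_log_payload is a hypothetical helper, the scope name is arbitrary, and the trace/span IDs on the log record are optional correlation fields included only to show the hex encoding.

```python
import json
import secrets

SEVERITY_NUMBER = {"TRACE": 1, "DEBUG": 5, "INFO": 9, "WARN": 13, "ERROR": 17, "FATAL": 21}


def build_log_payload(message, severity, service, time_unix_nano):
    """Build an OTLP/JSON logs payload as a plain dict."""
    return {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeLogs": [{
                "scope": {"name": "sketch"},
                "logRecords": [{
                    "timeUnixNano": str(time_unix_nano),  # int64 as a quoted string
                    "severityNumber": SEVERITY_NUMBER[severity],  # enum as an integer
                    "severityText": severity,
                    "body": {"stringValue": message},
                    "traceId": secrets.token_hex(16),  # 16 random bytes -> 32 hex chars
                    "spanId": secrets.token_hex(8),  # 8 random bytes -> 16 hex chars
                }],
            }],
        }]
    }


nanos = 1780056000123456789  # an example nanosecond timestamp
# Why the quoting matters: a 64-bit float cannot hold this integer exactly.
survives_as_float = int(float(nanos)) == nanos

payload = build_log_payload("Backed up 231 files", "INFO", "powershell-backup-job", nanos)
body = json.dumps(payload)
record = json.loads(body)["resourceLogs"][0]["scopeLogs"][0]["logRecords"][0]
```

survives_as_float comes out False, which is the whole argument for quoting: any JSON parser that reads numbers as doubles would silently change the timestamp.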
Which to use? File (Option A) for batch jobs where durability beats latency; OTLP/HTTP (Option B) when you want spans in real time. Doing both costs almost nothing -- the example job does exactly that.
The backup job's logs in SigNoz, filtered to service.name = powershell-backup-job. Each "Backed up N files" line is a structured record POSTed straight from PowerShell over OTLP/HTTP, with severity (INFO) and the files attribute preserved -- not a flat text line. The service shows up in the left-hand filter right next to backend-api, blazor-frontend, and python-etl-job, even though PowerShell has no SDK.
And it is not just logs. The hand-rolled OTLP/JSON from Option B produces a genuine span, so the same script shows up in Traces:
A real distributed-tracing span emitted by a PowerShell script. The backup span (463 ms) is the whole job, on service powershell-backup-job, built by hand in Send-OtelTrace and POSTed as OTLP/JSON -- hex IDs, quoted nanosecond timestamps and all. It is a single span here because the job does one unit of work, but nothing stops you from nesting child spans the same way the Python ETL does. The point: even a language with no SDK lands in the same Traces view as the C# services.
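Nesting child spans by hand comes down to two fields: every span in the trace shares one traceId, and each child names its parent in parentSpanId. A minimal Python sketch of that payload shape, under the same OTLP/JSON rules as above; make_span and the copy-files span name are made up for illustration and are not part of OtelExport.ps1.

```python
import secrets
import time


def make_span(name, trace_id, start_ns, end_ns, parent_span_id=None):
    """Build one OTLP/JSON span; the parent link is the only thing that makes it a child."""
    span = {
        "traceId": trace_id,
        "spanId": secrets.token_hex(8),
        "name": name,
        "kind": 1,  # INTERNAL
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "status": {"code": 1},  # OK
    }
    if parent_span_id is not None:
        span["parentSpanId"] = parent_span_id
    return span


trace_id = secrets.token_hex(16)
t0 = time.time_ns()
root = make_span("backup", trace_id, t0, t0 + 463_000_000)
child = make_span("copy-files", trace_id, t0 + 10_000_000, t0 + 400_000_000, root["spanId"])

payload = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "powershell-backup-job"}},
        ]},
        "scopeSpans": [{"scope": {"name": "sketch"}, "spans": [root, child]}],
    }]
}
```

Both spans can go out in one POST. The root has no parentSpanId at all, and the child's start and end should fall inside the root's so the waterfall renders sensibly.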
See it all in SigNoz
Bring the stack up, generate some traffic, and run the jobs (commands below). The payoff: a C# worker, a Python script, and a PowerShell script all land in the same Services list (they share a service.namespace), and the .txt-only legacy job shows up in Logs:
python-etl-job and powershell-backup-job sit in the Services list right next to backend-api, blazor-frontend, and worker-jobs. (The .txt-only legacy job appears in Logs, not the APM list, since it emits no spans.)
Then:
- Logs -- filter service.name = legacy-batch-job, then severity ERROR, and open one: the connection reset by peer message and its three at … frames are one grouped record.
- Traces -- filter service.name = python-etl-job, open a python.etl.run trace, and see the extract → transform → load waterfall (about 15% of runs fail at load, on purpose).
- Metrics -- chart python.etl.rows_processed and python.etl.runs grouped by success.
What it costs to keep
On-prem flips the cost model: there is no per-GB ingest bill, just disk you already own. The trade is that you decide how long data lives, and a chatty filelog pipeline can fill a disk if you let it.
Retention in SigNoz is set per signal under Settings → General (it becomes a ClickHouse TTL under the hood). Tune each signal to its value and volume:
- Logs are the highest-volume signal -- keep them short (say 15 days). The legacy .txt pipeline alone can be noisy.
- Traces are bursty; a week or two is usually plenty for incident forensics.
- Metrics are tiny once aggregated -- keep them longest (a quarter or more) for capacity trends and year-over-year comparisons.
ClickHouse compresses telemetry hard (often around 10x), so sizing is far cheaper than raw volume suggests, but the discipline is the same as any self-hosted store: set retention deliberately, watch the disk, and sample the noisy sources (see Part 2's note on sampling) before they become the problem.
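A back-of-the-envelope sizing, as a sketch: disk needed is roughly raw volume per day times retention days, divided by the compression ratio. The daily volumes below are invented for illustration, and 10x is the rough figure quoted above rather than a guarantee, so measure your own before buying disks.

```python
def disk_needed_gb(raw_gb_per_day, retention_days, compression_ratio=10.0):
    """Steady-state disk use for one signal, ignoring indexes and merge overhead."""
    return raw_gb_per_day * retention_days / compression_ratio


# Hypothetical raw volumes; retention follows the guidance above.
signals = {
    "logs": {"raw_gb_per_day": 20.0, "retention_days": 15},
    "traces": {"raw_gb_per_day": 5.0, "retention_days": 14},
    "metrics": {"raw_gb_per_day": 0.5, "retention_days": 90},
}

estimate = {name: disk_needed_gb(**cfg) for name, cfg in signals.items()}
total_gb = sum(estimate.values())
```

With these made-up inputs, logs need about 30 GB, traces 7 GB, and metrics 4.5 GB, or 41.5 GB in total. Note how the logs retention setting dominates the result even though metrics are kept six times longer.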
Cheat sheet -- which path for which job
| The job… | Use | You get |
|---|---|---|
| Only writes .txt/log files, can't change it | Collector filelog receiver | Logs (with severity + multiline) |
| Writes structured lines, no SDK | JSON-lines file + json_parser | Logs with parsed fields |
| Can do an HTTP POST, no SDK | OTLP/HTTP via Invoke-RestMethod | Logs + real-time spans |
| Has a real OTEL SDK (Python, etc.) | The SDK + OTLP exporter | Traces + metrics + logs |
| Is your own .NET worker | AddObservability(...) | Everything, plus distributed traces |
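Read top to bottom, the table is a decision ladder: take the richest path the job can support. As a sketch in Python, with choose_path and its flag names invented for this illustration:

```python
def choose_path(is_own_dotnet=False, has_sdk=False, can_http_post=False, writes_structured=False):
    """Pick the richest telemetry path a job supports, checked from best to most basic."""
    if is_own_dotnet:
        return "AddObservability(...)"
    if has_sdk:
        return "SDK + OTLP exporter"
    if can_http_post:
        return "OTLP/HTTP POST"
    if writes_structured:
        return "JSON-lines file + json_parser"
    return "Collector filelog receiver"


legacy_exporter = choose_path()
python_etl = choose_path(has_sdk=True)
powershell_backup = choose_path(can_http_post=True, writes_structured=True)
```

The fallback with no flags set is the filelog receiver, which is the point of the whole post: there is always a rung to stand on.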
The complete code
Everything for the jobs in this post. The shared AddObservability bootstrap, the full Collector config (which already includes the filelog/legacy receiver shown above), the docker-compose.yml, and the SigNoz install are in Part 1's appendix -- reuse them as-is.
Run it
# 1. Start SigNoz (one-time, self-hosted) -- full install in Part 1
git clone -b main https://github.com/SigNoz/signoz.git
cd signoz/deploy/docker && docker compose up -d # UI at http://localhost:8080
cd -
# 2. Start the stack (docker-compose.yml + collector from Part 1).
# The legacy .txt job runs by default and starts filling /var/log/legacy/*.txt
docker compose up -d --build
docker compose --profile jobs up -d --build # add the Python job
# 3. Run the PowerShell job on your host (PowerShell 7+), pointed at the collector's HTTP port:
$env:OTEL_EXPORTER_OTLP_ENDPOINT='http://localhost:5318'
pwsh ./backup-job.ps1
docker-compose services for the jobs
Add these two services to the docker-compose.yml from Part 1 (the legacy job runs by default; the Python job is behind a jobs profile):
  # A job with NO OpenTelemetry awareness -- only writes .txt files. The collector's filelog
  # receiver (in Part 1's collector config) reads them. Shares the job-logs volume with the collector.
  legacy-job:
    image: alpine:3.20
    command: ['sh', '/opt/job/run-batch.sh']
    environment: { LOG_DIR: /var/log/legacy, INTERVAL_SECONDS: '20' }
    volumes:
      - ./external-jobs/legacy-batch/run-batch.sh:/opt/job/run-batch.sh:ro
      - job-logs:/var/log/legacy
    networks: [blazorsignoz]
  python-job:
    build: { context: ./external-jobs/python }
    profiles: ['jobs']
    environment:
      OTEL_SERVICE_NAME: python-etl-job
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
      JOB_INTERVAL_SECONDS: '30'
    depends_on: [otel-collector]
    networks: [blazorsignoz]
external-jobs/legacy-batch/run-batch.sh
The stand-in legacy job. No SDK -- just .txt lines, including occasional multi-line stack traces.
#!/usr/bin/env sh
# A stand-in for a legacy batch job that has NO telemetry SDK and cannot be changed:
# it only appends human-readable lines to a .txt log file.
#
# Run locally: LOG_DIR=./out INTERVAL_SECONDS=5 ./run-batch.sh
set -eu
LOG_DIR="${LOG_DIR:-./out}"
INTERVAL="${INTERVAL_SECONDS:-20}"
mkdir -p "$LOG_DIR"
logfile() { echo "$LOG_DIR/export-$(date '+%Y%m%d').txt"; }
emit() {
  level="$1"; shift
  printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$*" >> "$(logfile)"
}
emit INFO "legacy batch job started (pid $$)"
count=0
while true; do
  count=$((count + 1))
  records=$(( (count * 37) % 500 + 50 ))
  emit INFO "run #$count: exporting nightly inventory snapshot"
  emit INFO "run #$count: wrote $records records to dataset"
  if [ $((count % 4)) -eq 0 ]; then
    emit WARN "run #$count: 3 records skipped (failed validation)"
  fi
  # A multi-line error. The continuation lines do not start with a timestamp, so the
  # collector's multiline rule attaches them to the [ERROR] entry as one record.
  if [ $((count % 7)) -eq 0 ]; then
    file="$(logfile)"
    {
      printf '%s [ERROR] run #%s: export failed: connection reset by peer\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$count"
      printf ' at LegacyExporter.Flush(batchId=%s)\n' "$count"
      printf ' at LegacyExporter.Run()\n'
      printf ' at main()\n'
    } >> "$file"
  fi
  sleep "$INTERVAL"
done
external-jobs/python/requirements.txt
opentelemetry-api>=1.29,<2
opentelemetry-sdk>=1.29,<2
opentelemetry-exporter-otlp-proto-grpc>=1.29,<2
external-jobs/python/etl_job.py
"""
A standalone Python ETL job instrumented with the OpenTelemetry SDK. It exports traces, metrics,
and logs over OTLP to the collector (which forwards to SigNoz).
Env vars:
OTEL_SERVICE_NAME default "python-etl-job"
OTEL_EXPORTER_OTLP_ENDPOINT default "http://localhost:5317" (the demo collector's host port)
JOB_INTERVAL_SECONDS 0 = run once and exit; >0 = loop forever
"""
import logging
import os
import random
import time
from opentelemetry import metrics, trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.trace import Status, StatusCode
SERVICE_NAME = os.getenv("OTEL_SERVICE_NAME", "python-etl-job")
ENDPOINT = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:5317")
INTERVAL = int(os.getenv("JOB_INTERVAL_SECONDS", "0"))
resource = Resource.create(
    {
        "service.name": SERVICE_NAME,
        "service.namespace": "blazor-signoz",
        "service.instance.id": os.getenv("HOSTNAME", "local"),
    }
)
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=ENDPOINT, insecure=True)))
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer("etl_job")
metric_reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint=ENDPOINT, insecure=True),
    export_interval_millis=5000,
)
meter_provider = MeterProvider(resource=resource, metric_readers=[metric_reader])
metrics.set_meter_provider(meter_provider)
meter = metrics.get_meter("etl_job")
rows_counter = meter.create_counter("python.etl.rows_processed", unit="{row}", description="Rows processed by the ETL job")
runs_counter = meter.create_counter("python.etl.runs", unit="{run}", description="ETL job executions, tagged by outcome")
logger_provider = LoggerProvider(resource=resource)
set_logger_provider(logger_provider)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(OTLPLogExporter(endpoint=ENDPOINT, insecure=True)))
logging.basicConfig(
    level=logging.INFO,
    handlers=[LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider), logging.StreamHandler()],
)
log = logging.getLogger("etl_job")
def run_once(run_id: int) -> None:
    with tracer.start_as_current_span("python.etl.run") as span:
        span.set_attribute("etl.run", run_id)
        log.info("ETL run %s starting", run_id)
        with tracer.start_as_current_span("extract"):
            time.sleep(random.uniform(0.05, 0.25))
            rows = random.randint(100, 1000)
        with tracer.start_as_current_span("transform"):
            time.sleep(random.uniform(0.05, 0.25))
        with tracer.start_as_current_span("load") as load_span:
            time.sleep(random.uniform(0.05, 0.25))
            if random.random() < 0.15:
                load_span.set_status(Status(StatusCode.ERROR, "load failed"))
                log.error("ETL run %s: load step failed", run_id)
                runs_counter.add(1, {"success": "false"})
                return
        rows_counter.add(rows)
        runs_counter.add(1, {"success": "true"})
        span.set_attribute("etl.rows", rows)
        log.info("ETL run %s finished: %s rows processed", run_id, rows)


def main() -> None:
    run_id = 0
    try:
        if INTERVAL > 0:
            log.info("Looping every %ss; exporting to %s", INTERVAL, ENDPOINT)
            while True:
                run_id += 1
                run_once(run_id)
                time.sleep(INTERVAL)
        else:
            run_once(1)
    except KeyboardInterrupt:
        pass
    finally:
        # Critical for short-lived jobs: flush batched telemetry before the process exits.
        tracer_provider.shutdown()
        meter_provider.shutdown()
        logger_provider.shutdown()


if __name__ == "__main__":
    main()
external-jobs/python/Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY etl_job.py .
ENV JOB_INTERVAL_SECONDS=30
ENTRYPOINT ["python", "etl_job.py"]
external-jobs/powershell/OtelExport.ps1
Dot-source this; it provides both PowerShell telemetry paths.
<#
OtelExport.ps1 -- minimal OpenTelemetry helpers for PowerShell (7+).
Send-OtelLog / Send-OtelTrace : POST OTLP/HTTP+JSON straight to a collector (port 4318).
Write-JobLog : append JSON-lines to a file for the collector's filelog receiver.
OTLP/JSON gotchas handled: timeUnixNano as quoted strings, hex trace/span ids, integer enums.
#>
function Get-OtelNano {
    $ms = [DateTimeOffset]::UtcNow.ToUnixTimeMilliseconds()
    return ([long]$ms * 1000000).ToString()
}
function New-OtelId {
    param([int]$Bytes)
    $buffer = New-Object byte[] $Bytes
    [System.Security.Cryptography.RandomNumberGenerator]::Fill($buffer)
    return (($buffer | ForEach-Object { $_.ToString('x2') }) -join '')
}
function ConvertTo-OtelAttributes {
    param([hashtable]$Attributes)
    $list = @()
    foreach ($key in $Attributes.Keys) {
        $list += @{ key = $key; value = @{ stringValue = [string]$Attributes[$key] } }
    }
    return , $list # leading comma forces an array even for 0/1 elements
}
function Send-OtelLog {
    param(
        [Parameter(Mandatory)][string]$Message,
        [ValidateSet('TRACE', 'DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL')][string]$Severity = 'INFO',
        [string]$Service = 'powershell-job',
        [hashtable]$Attributes = @{},
        [string]$Endpoint = $env:OTEL_EXPORTER_OTLP_ENDPOINT
    )
    if (-not $Endpoint) { $Endpoint = 'http://localhost:5318' }
    $severityNumber = @{ TRACE = 1; DEBUG = 5; INFO = 9; WARN = 13; ERROR = 17; FATAL = 21 }[$Severity]
    $payload = @{
        resourceLogs = @(@{
            resource = @{ attributes = @(
                @{ key = 'service.name'; value = @{ stringValue = $Service } },
                @{ key = 'service.namespace'; value = @{ stringValue = 'blazor-signoz' } }
            ) }
            scopeLogs = @(@{
                scope = @{ name = 'powershell' }
                logRecords = @(@{
                    timeUnixNano = (Get-OtelNano)
                    severityNumber = $severityNumber
                    severityText = $Severity
                    body = @{ stringValue = $Message }
                    attributes = (ConvertTo-OtelAttributes $Attributes)
                })
            })
        })
    }
    $json = $payload | ConvertTo-Json -Depth 12 -Compress
    try {
        Invoke-RestMethod -Uri "$Endpoint/v1/logs" -Method Post -ContentType 'application/json' -Body $json | Out-Null
    }
    catch {
        Write-Warning "OTLP log export failed: $($_.Exception.Message)"
    }
}
function Send-OtelTrace {
    param(
        [Parameter(Mandatory)][string]$Name,
        [int]$DurationMs = 100,
        [ValidateSet('UNSET', 'OK', 'ERROR')][string]$Status = 'OK',
        [string]$Service = 'powershell-job',
        [hashtable]$Attributes = @{},
        [string]$Endpoint = $env:OTEL_EXPORTER_OTLP_ENDPOINT
    )
    if (-not $Endpoint) { $Endpoint = 'http://localhost:5318' }
    $endNano = [long]([DateTimeOffset]::UtcNow.ToUnixTimeMilliseconds()) * 1000000
    $startNano = $endNano - ([long]$DurationMs * 1000000)
    $statusCode = @{ UNSET = 0; OK = 1; ERROR = 2 }[$Status]
    $payload = @{
        resourceSpans = @(@{
            resource = @{ attributes = @(
                @{ key = 'service.name'; value = @{ stringValue = $Service } },
                @{ key = 'service.namespace'; value = @{ stringValue = 'blazor-signoz' } }
            ) }
            scopeSpans = @(@{
                scope = @{ name = 'powershell' }
                spans = @(@{
                    traceId = (New-OtelId 16)
                    spanId = (New-OtelId 8)
                    name = $Name
                    kind = 1 # INTERNAL
                    startTimeUnixNano = $startNano.ToString()
                    endTimeUnixNano = $endNano.ToString()
                    attributes = (ConvertTo-OtelAttributes $Attributes)
                    status = @{ code = $statusCode }
                })
            })
        })
    }
    $json = $payload | ConvertTo-Json -Depth 12 -Compress
    try {
        Invoke-RestMethod -Uri "$Endpoint/v1/traces" -Method Post -ContentType 'application/json' -Body $json | Out-Null
    }
    catch {
        Write-Warning "OTLP trace export failed: $($_.Exception.Message)"
    }
}
function Write-JobLog {
    param(
        [Parameter(Mandatory)][string]$Message,
        [string]$Level = 'INFO',
        [string]$Path = './out/powershell-job.log',
        [hashtable]$Attributes = @{}
    )
    $dir = Split-Path -Parent $Path
    if ($dir -and -not (Test-Path $dir)) { New-Item -ItemType Directory -Path $dir -Force | Out-Null }
    $entry = [ordered]@{ time = (Get-Date).ToString('o'); level = $Level; msg = $Message }
    foreach ($key in $Attributes.Keys) { $entry[$key] = $Attributes[$key] }
    ($entry | ConvertTo-Json -Compress) | Add-Content -Path $Path
}
external-jobs/powershell/backup-job.ps1
An example job that uses both paths -- belt and suspenders.
. "$PSScriptRoot/OtelExport.ps1"
$ErrorActionPreference = 'Stop'
$service = 'powershell-backup-job'
$logFile = Join-Path $PSScriptRoot 'out/powershell-job.log'
$start = Get-Date
Send-OtelLog -Service $service -Severity INFO -Message 'Backup job started' -Attributes @{ host = $env:COMPUTERNAME }
Write-JobLog -Path $logFile -Level INFO -Message 'Backup job started (file path)'
try {
    Start-Sleep -Milliseconds 400
    $files = Get-Random -Minimum 50 -Maximum 500
    Send-OtelLog -Service $service -Severity INFO -Message "Backed up $files files" -Attributes @{ files = $files }
    Write-JobLog -Path $logFile -Level INFO -Message "Backed up $files files" -Attributes @{ files = $files }
    $duration = [int]((Get-Date) - $start).TotalMilliseconds
    Send-OtelTrace -Service $service -Name 'backup' -DurationMs $duration -Status OK -Attributes @{ files = $files }
    Write-Host "Backup complete: $files files in ${duration}ms"
}
catch {
    $message = $_.Exception.Message
    Send-OtelLog -Service $service -Severity ERROR -Message "Backup failed: $message"
    Write-JobLog -Path $logFile -Level ERROR -Message "Backup failed: $message"
    $duration = [int]((Get-Date) - $start).TotalMilliseconds
    Send-OtelTrace -Service $service -Name 'backup' -DurationMs $duration -Status ERROR
    throw
}
The in-solution .NET worker (the easy case)
For completeness, here is the worker that gets full telemetry from the shared bootstrap. It's a plain Microsoft.NET.Sdk.Worker app with a <ProjectReference> to Shared.Telemetry.
src/Worker.Jobs/Program.cs
using Shared.Telemetry;
using Worker.Jobs.Jobs;
using Worker.Jobs.Telemetry;
var builder = Host.CreateApplicationBuilder(args);
builder.Services.AddSingleton<WorkerTelemetry>();
var backendBaseUrl = builder.Configuration["Backend:BaseUrl"] ?? "http://localhost:5081";
builder.Services.AddHttpClient("backend", client => client.BaseAddress = new Uri(backendBaseUrl));
builder.Services.AddHostedService<InventoryReconciliationJob>();
// Same shared bootstrap as the web apps. A worker is not a web server, so ASP.NET Core
// instrumentation is off; HttpClient + runtime instrumentation stay on.
builder.AddObservability("worker-jobs", options =>
{
    options.InstrumentAspNetCore = false;
    options.ActivitySources.Add(WorkerTelemetry.ActivitySourceName);
    options.Meters.Add(WorkerTelemetry.MeterName);
});
var host = builder.Build();
host.Run();
src/Worker.Jobs/Telemetry/WorkerTelemetry.cs
using System.Diagnostics;
using System.Diagnostics.Metrics;
namespace Worker.Jobs.Telemetry;
public sealed class WorkerTelemetry : IDisposable
{
    public const string ActivitySourceName = "Worker.Jobs";
    public const string MeterName = "Worker.Jobs";
    public static readonly ActivitySource ActivitySource = new(ActivitySourceName);
    private readonly Meter _meter = new(MeterName, "1.0.0");
    private readonly Counter<long> _jobRuns;
    private readonly Histogram<double> _jobDuration;
    private readonly Counter<long> _itemsProcessed;

    public WorkerTelemetry()
    {
        _jobRuns = _meter.CreateCounter<long>("worker.job.runs", unit: "{run}",
            description: "Number of background job executions, tagged by job name and outcome.");
        _jobDuration = _meter.CreateHistogram<double>("worker.job.duration", unit: "ms",
            description: "Duration of background job executions.");
        _itemsProcessed = _meter.CreateCounter<long>("worker.job.items_processed", unit: "{item}",
            description: "Items processed by background jobs.");
    }

    public Activity? StartActivity(string name) => ActivitySource.StartActivity(name, ActivityKind.Internal);

    public void RecordRun(string jobName, bool success, TimeSpan duration, int itemsProcessed)
    {
        var tags = new TagList { { "job.name", jobName }, { "success", success } };
        _jobRuns.Add(1, tags);
        _jobDuration.Record(duration.TotalMilliseconds, tags);
        if (itemsProcessed > 0) _itemsProcessed.Add(itemsProcessed, new TagList { { "job.name", jobName } });
    }

    public void Dispose() => _meter.Dispose();
}
src/Worker.Jobs/Jobs/InventoryReconciliationJob.cs
using System.Diagnostics;
using System.Net.Http;
using System.Net.Http.Json;
using Worker.Jobs.Telemetry;
namespace Worker.Jobs.Jobs;
/// <summary>A periodic job. Each run starts a root span, calls the backend over an instrumented
/// HttpClient (→ worker-jobs → backend-api → db in SigNoz), logs, and records run metrics.</summary>
public sealed class InventoryReconciliationJob(
    IHttpClientFactory httpClientFactory,
    WorkerTelemetry telemetry,
    IConfiguration configuration,
    ILogger<InventoryReconciliationJob> logger) : BackgroundService
{
    private const string JobName = "inventory-reconciliation";
    private readonly TimeSpan _interval =
        TimeSpan.FromSeconds(Math.Clamp(configuration.GetValue("Worker:IntervalSeconds", 15), 1, 3600));

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        logger.LogInformation("{JobName} starting; interval {IntervalSeconds}s", JobName, _interval.TotalSeconds);
        try { await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken); } // let the backend come up
        catch (OperationCanceledException) { return; }
        // Run once immediately, then start the timer so its clock begins AFTER the warm-up run.
        await RunOnceAsync(stoppingToken);
        using var timer = new PeriodicTimer(_interval);
        while (await WaitForNextTickAsync(timer, stoppingToken))
        {
            await RunOnceAsync(stoppingToken);
        }
    }

    private async Task RunOnceAsync(CancellationToken ct)
    {
        using var activity = telemetry.StartActivity($"job.{JobName}");
        activity?.SetTag("job.name", JobName);
        var stopwatch = Stopwatch.StartNew();
        try
        {
            var client = httpClientFactory.CreateClient("backend");
            var stats = await client.GetFromJsonAsync<ProductStats>("/api/products/stats", ct);
            var count = stats?.TotalCount ?? 0;
            activity?.SetTag("job.items", count);
            logger.LogInformation("Reconciled {ProductCount} products (inventory value {InventoryValue})",
                count, stats?.InventoryValue);
            telemetry.RecordRun(JobName, success: true, stopwatch.Elapsed, itemsProcessed: count);
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            return; // graceful shutdown -- not a failure
        }
        catch (Exception ex)
        {
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            logger.LogError(ex, "{JobName} run failed", JobName);
            telemetry.RecordRun(JobName, success: false, stopwatch.Elapsed, itemsProcessed: 0);
        }
    }

    private static async Task<bool> WaitForNextTickAsync(PeriodicTimer timer, CancellationToken ct)
    {
        try { return await timer.WaitForNextTickAsync(ct); }
        catch (OperationCanceledException) { return false; }
    }

    private sealed record ProductStats(int TotalCount, int TotalQuantity, decimal InventoryValue, decimal AveragePrice);
}
Wrapping up
The lesson is the same across every case: the Collector is the seam. Apps you own export OTLP; apps you don't own get adapted at the Collector -- a filelog receiver for text, a resource processor to give files an identity, a json_parser for structured lines, OTLP/HTTP for anything that can POST. Stamp a consistent service.name / service.namespace, and every job -- C#, Python, PowerShell, or a decade-old batch script -- shows up side by side in a dashboard you host yourself.
For the rest of the series: Part 1 -- Blazor Server observability (which also carries the shared foundation code) and Part 2 -- full-stack observability for a C# API with Postgres/SQL Server, or jump back to the series index.