Concept Lesson · Operational Safety

Watchdogs: the processes that watch the processes.

Your strategy can fail silently: a hung loop, a crashed process, a position with no stop-loss (SL) ever placed against it. A watchdog is a second, independent program whose only job is to notice that and shout.

Two watchdogs: Hedge Runner · MCX SL Watchdog
Both: cron-driven, separate process, separate Telegram bot

Why a watchdog

A strategy can fail in ways it cannot report

Silent failures

  • Process crashed — no Telegram is sent because there's no process to send it.
  • Event loop hung — the bot is "running" but doing nothing.
  • SL order rejected at the exchange — entry filled, SL didn't, no one noticed.
  • Position created by a different actor (manual, external strategy) — the running bot doesn't know it exists.

What a watchdog adds

  • Runs as a separate process — survives the main bot dying.
  • Looks at broker truth (positions, orders) — not the bot's in-memory state.
  • Wakes on a cron schedule — no event loop to hang.
  • Alerts on a separate Telegram bot — even if the main alert channel is muted.
A self-reporting bot can only report what it knows. A watchdog reports what the bot couldn't.

Mental model

The night watchman walks the perimeter

Imagine a warehouse with sensors, cameras, alarms — all wired to a central panel. The panel works. Until it doesn't. The cable gets cut, the power dies, the screen says everything is fine because nothing is sending updates.

So you also hire a night watchman. They don't trust the panel. Every fifteen minutes they walk the perimeter, count the doors, check the locks. If something is wrong they shout — through a different phone, on a different line.

That's exactly what these two processes do. They don't ask the trading bot "are you OK?" They go directly to the broker and the exchange and check what's actually there.

A watchdog never trusts the thing it's watching. It checks ground truth.

Inventory

We run two watchdogs, each guarding a different leg

Hedge Runner

Watches short positions held overnight on the equity index account. If a short is open, a hedge is placed at 15:16 IST, re-checked until 15:29 IST, and sold back at 09:16 IST the next morning.

File: hedge_runner.py

Bot: RedRisk

MCX SL Watchdog

Watches MCX commodity short positions. Every short must have a live SL order (regular or GTT). Naked shorts, bad-state SLs, product-type mismatches, and orphan SLs all raise an alert.

File: scripts/mcx_sl_watchdog.py

Bot: RedRisk

Different segments, different failure modes, but the same shape: an outside actor checking broker truth on a clock.

Hedge Runner · purpose

Overnight unhedged shorts are a tail-risk grenade

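A short option held overnight has no defined worst case if the market gaps against you: the loss on a short call is unbounded, and on a short put it is limited only by the underlying falling to zero. The fix is well-known: buy a far out-of-the-money (OTM) option in the same series. That's the hedge. It caps the worst-case loss.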

The hedge is for shorts we did not place

This watchdog isn't protecting the intraday strategies (they all close by squareoff). It protects external shorts — positions opened manually by the operator or by a separate positional strategy that has no concept of "place the hedge".

The invariant

If a short exists overnight, a matching long (the hedge) must exist alongside it — same expiry, same quantity, far enough OTM to be cheap. Hedge quantity must always equal short quantity.

The strategy that opens the short isn't the one that defends it. The hedge runner closes that gap.

Hedge Runner · schedule

Three cron entries, one daily cycle

09:16 IST · Squareoff
Sell back yesterday's hedge once the new session is open and liquid.

15:16 IST · Place
Look at the current short book. Buy a matching hedge for every open short.

15:20 / 15:25 / 15:29 IST · Monitor x3
Re-check. If a hedge is missing (rejection, slip), re-place it. Last chance before close.

Any mode · Crash alert
If the runner itself fails to start, a CRITICAL goes to RedRisk before the process exits.

Each entry is a one-shot. The runner is not a long-lived process — cron starts it, it does one job, it exits. There's nothing to crash overnight, nothing to leak memory, nothing to forget.
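
The one-shot shape and the crash alert can be sketched roughly as follows. This is a minimal illustration, not the actual runner: only `--mode place` and `--mode monitor` appear in this lesson, so the `squareoff` mode name, the `run_once` and `send_alert` helpers, and the crontab lines in the comment (which assume the server clock is set to IST) are all assumptions.

```python
import argparse
import traceback

# Hypothetical crontab, assuming the server clock is IST and weekdays only:
#   16 9  * * 1-5        python hedge_runner.py --mode squareoff
#   16 15 * * 1-5        python hedge_runner.py --mode place
#   20,25,29 15 * * 1-5  python hedge_runner.py --mode monitor

MODES = ("squareoff", "place", "monitor")  # "squareoff" is an assumed name


def run_once(mode: str) -> str:
    """Placeholder for the real job: read the broker, act, return a summary."""
    return f"{mode}: done"


def main(argv, job=run_once, send_alert=print) -> int:
    """Run one job and return an exit code. Nothing stays resident."""
    parser = argparse.ArgumentParser(prog="hedge_runner.py")
    parser.add_argument("--mode", choices=MODES, required=True)
    args = parser.parse_args(argv)
    try:
        summary = job(args.mode)
    except Exception:
        # Alert before exiting so a failed run is never silent.
        send_alert(
            f"CRITICAL hedge_runner --mode {args.mode} failed:\n"
            + traceback.format_exc(limit=3)
        )
        return 1
    send_alert(f"INFO {summary}")
    return 0

# Entry point in a real script (with `import sys`):
#   raise SystemExit(main(sys.argv[1:]))
```

Passing `job` and `send_alert` as parameters keeps the wrapper testable without a broker or a Telegram connection; the non-zero exit code also lets cron's own mail or logging pick up the failure.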

A daemon can die mid-day. A cron job that's already finished can't.

Hedge Runner · logic

It reads the broker, not its own memory

hedge_runner.py --mode place
# 15:16 IST — cron fires this entry on weekdays

1. Fetch live positions from the broker.
2. Keep only equity-index shorts that this account will hold overnight.
3. For each short:
     compute the hedge strike (far OTM, same expiry).
     compute the required hedge quantity = abs(short qty).
4. For each hedge that is missing or under-sized:
     place a BUY order to top it up.
5. Persist what got placed to logs/hedge_state.json.
6. Alert RedRisk: "Hedges placed, all shorts covered."

# 15:20 / 15:25 / 15:29 IST — same logic, in --mode monitor
# If any hedge is still missing, re-place it. No state from earlier runs is trusted.

The state file is a record, not a source of truth. Every run re-reads positions from the broker. If you manually flatten a short between 15:16 and 15:20, the next monitor pass simply notices the hedge is now too large and stops there — it doesn't act on stale state.
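
Steps 3 and 4 above reduce to a pure function over the broker's position list. The sketch below assumes hypothetical field names (`underlying`, `expiry`, `option_type`, `quantity`, with a negative quantity meaning short) and, for simplicity, treats any long in the same series as hedge quantity; the real runner's matching rules and strike selection are not shown in this lesson.

```python
from collections import defaultdict


def hedge_shortfalls(positions):
    """Return {(underlying, expiry, option_type): quantity still to buy}.

    Field names are assumptions for this sketch; quantity < 0 means short.
    """
    required = defaultdict(int)  # total short quantity per series
    held = defaultdict(int)      # total long (hedge) quantity per series
    for p in positions:
        key = (p["underlying"], p["expiry"], p["option_type"])
        if p["quantity"] < 0:
            required[key] += -p["quantity"]
        elif p["quantity"] > 0:
            held[key] += p["quantity"]
    # Only missing or under-sized hedges need an order. An over-sized hedge
    # (for example after a short was flattened by hand) yields no action.
    return {k: need - held[k] for k, need in required.items() if need > held[k]}
```

Because the result depends only on what the broker reports at that moment, calling it again after the top-up orders fill returns an empty dict, which is what makes the monitor passes safe to repeat.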

Truth lives at the broker. The bot is just an opinion about what the broker probably looks like.

MCX SL Watchdog · purpose

Every MCX short must have a live stop-loss

MCX commodity options run from 09:00 to 23:30 IST. That's fourteen and a half hours of exposure per day. A short crude option with no stop-loss order on the exchange is a position with no defined worst case.

The trading bot already places SLs — why a watchdog?

Because placement can quietly fail: the SL order gets rejected, cancelled by the broker, or never accepted by the exchange. The main bot will alert on most of these, but not all — and a crashed bot alerts on none of them.

The watchdog also catches manual SLs

The operator may have placed a GTT stop-loss through the Upstox app. The bot doesn't see those. The watchdog checks both the regular order book (v2) and the GTT book (v3) — a manual GTT counts as protection.

An unprotected MCX short between 09:00 and 23:30 is the single highest-loss scenario in the whole system.

MCX SL Watchdog · five conditions

What the watchman is looking for

1. Naked short

A short MCX position with no matching SL order in either the regular or GTT book. The most dangerous failure: pure undefined risk.

2. Bad-state SL

An SL exists but its status is rejected, cancelled, failed or expired. From the exchange's point of view, there is no protection.

3. Wrong product type

The position is opened as MIS (intraday, auto-square-off) but the SL is NRML, or vice versa. Different product types don't protect each other, so the SL won't trigger against the position you think it's guarding.

4. Non-MIS MCX position

MCX entries are required to be MIS so the broker auto-squares anything we forget. A delivery-product MCX short bypasses that safety net, so it is flagged.

5. Orphan SL

A live SL exists but the underlying short is gone. If price moves to the trigger, the SL fires as an unintended long — you wake up with an inverted position.
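
The five conditions can be expressed as one pass over positions and stop-loss records. This is a sketch under stated assumptions, not the watchdog's actual code: the record fields (`token`, `quantity`, `product`, `state`) are hypothetical, and each SL is assumed to have been reduced to a `state` of `"live"` or `"bad"` before this function sees it.

```python
def audit(positions, stop_losses):
    """Return a sorted list of (condition, instrument_token) findings."""
    findings = set()
    shorts = {p["token"]: p for p in positions if p["quantity"] < 0}

    for token, pos in shorts.items():
        sls = [s for s in stop_losses if s["token"] == token]
        live = [s for s in sls if s["state"] == "live"]
        bad = [s for s in sls if s["state"] == "bad"]

        if pos["product"] != "MIS":
            findings.add(("non_mis_position", token))       # condition 4
        if live:
            # A live SL only protects a position of the same product type.
            if all(s["product"] != pos["product"] for s in live):
                findings.add(("wrong_product_type", token))  # condition 3
        elif bad:
            findings.add(("bad_state_sl", token))            # condition 2
        else:
            findings.add(("naked_short", token))             # condition 1

    for s in stop_losses:
        if s["state"] == "live" and s["token"] not in shorts:
            findings.add(("orphan_sl", s["token"]))          # condition 5

    return sorted(findings)
```

Returning a sorted, de-duplicated list means two runs against the same broker state produce the same findings in the same order, so the resulting alert text is stable.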

MCX SL Watchdog · both books

Regular v2 & GTT v3 — two places an SL can live

Placed by
  • Regular order book (v2): the trading bot (automated), via the broker API.
  • GTT order book (v3): usually the operator, manually, via the Upstox app.

Lives until
  • Regular order book (v2): end of session, or until cancelled.
  • GTT order book (v3): until triggered or cancelled. Survives sessions.

"Live" status
  • Regular order book (v2): trigger_pending, open, pending
  • GTT order book (v3): SCHEDULED

Bad states
  • Regular order book (v2): rejected, cancelled
  • GTT order book (v3): FAILED, CANCELLED, EXPIRED

Matched by
  • Regular order book (v2): instrument token. One short can be protected by either type.
  • GTT order book (v3): instrument token, same as the regular book.

Skipping the GTT book would mean false alarms every time the operator manually protected a position. Including it means a manual safety net counts.
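
One way to treat both books uniformly is to map each raw status to a common label before matching. The sketch below uses only the status names listed in this lesson; the `"v2"` and `"v3"` keys and the `sl_state` helper are illustrative, and real API payloads may use different casing or additional states, which is why anything unrecognised falls through to `"other"`.

```python
# Status names are taken from the comparison above, lower-cased for matching.
LIVE = {
    "v2": {"trigger_pending", "open", "pending"},
    "v3": {"scheduled"},
}
BAD = {
    "v2": {"rejected", "cancelled"},
    "v3": {"failed", "cancelled", "expired"},
}


def sl_state(book: str, status: str) -> str:
    """Map a raw order status from one book to 'live', 'bad', or 'other'."""
    s = status.strip().lower()
    if s in LIVE[book]:
        return "live"
    if s in BAD[book]:
        return "bad"
    return "other"
```

Keeping the two status vocabularies in data rather than in branching code makes it cheap to add a state if the broker introduces one.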

A check that doesn't model how the operator actually behaves will be muted within a week.

MCX SL Watchdog · rhythm

Every fifteen minutes during MCX hours

*/15 · 09:00–23:30 · Wake
Cron fires the script. If outside MCX hours, exit silently. If a non-trading day, exit silently.

~5–15 sec · Inspect
Pull positions, regular orders, GTT orders. Match shorts to SLs by instrument token.

One alert · Speak
CRITICAL if any of the five conditions hit. Quiet INFO "all clear" if every short is protected and nothing is orphaned.

The all-clear is deliberately quiet but present. If RedRisk goes silent for an hour during MCX hours, that itself is a signal — the watchdog has stopped running.
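
The "exit silently" gate in the Wake step might look like the sketch below. It assumes weekends are non-trading days and that holidays arrive as a set of dates from some calendar the lesson does not describe; `should_run` and `holidays` are hypothetical names. A fixed UTC+05:30 offset is used because IST has no daylight saving.

```python
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # India has no DST
MCX_OPEN, MCX_CLOSE = time(9, 0), time(23, 30)


def should_run(now: datetime, holidays=frozenset()) -> bool:
    """True only on a trading day inside MCX hours (both ends inclusive).

    `now` must be timezone-aware; `holidays` is a set of datetime.date.
    """
    local = now.astimezone(IST)
    if local.weekday() >= 5 or local.date() in holidays:
        return False
    return MCX_OPEN <= local.time() <= MCX_CLOSE
```

A script would call `should_run(datetime.now(timezone.utc))` first and return immediately when it is false, so a broad cron schedule never produces alerts outside the session.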

Silence from a watchdog is also data. A heartbeat is what lets you trust the absence of an alert.

The pattern

Five properties both watchdogs share

A. External process

Neither runs inside the trading bot. If the bot is dead, hung, or in a bad deploy state, the watchdog still wakes up.

B. Cron-driven, not daemonized

Each invocation is a fresh process. Nothing to keep alive, nothing to restart, nothing to leak.

C. Broker truth, not local state

Both query the broker for positions and orders on every run. No JSON cache is trusted as source of truth.

D. Separate Telegram bot (RedRisk)

The main trading-info bot can get noisy and muted. RedRisk is a quiet channel reserved for "something is structurally wrong." Two channels = two attention budgets.

E. Idempotent

Running twice produces the same alert (or none). No double-counting, no double-placement. Safe to re-run by hand at any time.
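
Property D, the separate channel, mostly comes down to the watchdog holding its own bot credentials. The sketch below builds a request for the Telegram Bot API `sendMessage` method using only the standard library; the environment variable names `REDRISK_BOT_TOKEN` and `REDRISK_CHAT_ID` and the helper name are assumptions, not the project's actual configuration.

```python
import os
import urllib.parse
import urllib.request


def build_alert_request(text: str, env=os.environ) -> urllib.request.Request:
    """Build (but do not send) a sendMessage request for the RedRisk bot."""
    token = env["REDRISK_BOT_TOKEN"]  # assumed variable name
    chat_id = env["REDRISK_CHAT_ID"]  # assumed variable name
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    return urllib.request.Request(url, data=body, method="POST")
```

Sending is then `urllib.request.urlopen(build_alert_request("CRITICAL ..."), timeout=10)`. Because the token is read from the watchdog's own environment, muting or breaking the main bot's credentials has no effect on this channel.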

When you'd build the next one

Three signals that a feature needs its own watchdog

1. The failure is silent

If a critical step can fail without anyone noticing for an hour, you need an outside check. Examples: token refresh, websocket reconnect, file-handle leak.

2. The failure is expensive

If the worst case is "lose the day" or "lose the account", you can't rely on the main process to self-report. Examples: missing SL, missing hedge, wrong-side position.

3. The check can be done from outside

A watchdog only works if the invariant is observable from the broker, the filesystem, or some external state. "Did the strategy think correctly?" is not checkable. "Is the SL order alive?" is.

Don't add a watchdog for everything, only for the failures you cannot afford to discover late.

The rule

A bot reports what it knows.

A watchdog reports what the bot couldn't.

Build one for every loss you can't afford.

External · cron-driven · broker-truth · separate channel · idempotent.
