One `kill wineserver` and four prop accounts go dark

writing · 2026-06-14 · Wine · MT5 · Linux ops

I run four MT5 terminals on one Linux VPS under Wine: one per broker, each in its own install folder, each with its own bridge. They look isolated. They are not. They share a single wineserver. A bridge-restart script aimed at one python process grabbed the wineserver PID instead, and the whole rig went dark on one signal.

TL;DR

On Linux, wineserver is the shared kernel of a Wine prefix. Killing it cascades to every process running in that prefix: terminals, EAs, helper python, everything.
The trap: ss -ltnp on a wine-hosted listener lists both the python.exe owner and the wineserver in users:((…)). The order is not stable. A grep -oE 'pid=[0-9]+' | head -1 picks one or the other — coin flip.
Fix: match on the process name, not the position. grep -oE '"python\.exe",pid=[0-9]+', then extract the number. Or safer: tag each bridge's argv with port=$PORT and use pkill -f "port=$PORT".
Recovery: DISPLAY=:1 wine "C:\Program Files\MetaTrader N\terminal64.exe" & per terminal. With "remember password", they auto-login. Allow ~60s per broker for IPC to come back. Any login popup = VNC required.

1 · The setup that looks isolated

Four prop accounts, four brokers, one VPS. Each broker is pinned to its own MT5 installation under the Wine prefix — literally MetaTrader 5, MetaTrader 6, MetaTrader 7, MetaTrader 8 as separate program-files folders. Each terminal has its own login. In front of each terminal is a tiny rpyc bridge: a Wine-hosted python.exe binding 127.0.0.1:18812…18815, exposing the MT5 API to the Nautilus engine running natively on Linux.

From the Linux side it looks like four independent stacks. Different binaries. Different ports. Different systemd units (nautilus-journal@e8, @derrick, @fn, @lf). If one breaks, the other three should keep running. That's how isolation works, right?

Almost. There was one shared dependency I’d stopped thinking about: the Wine prefix.

2 · What wineserver actually is

Wine is not an emulator (the name says so: WINE is a recursive acronym for “Wine Is Not an Emulator”). It’s a re-implementation of the Win32 API on top of Linux syscalls. To pull that off, every wine-hosted process needs a coordination layer that fakes the things Windows processes assume exist: a registry, cross-process kernel objects (events, mutexes, named pipes), the desktop, file-handle semantics. That coordinator is wineserver.

One wineserver per Wine prefix. Every wine-hosted process attached to that prefix talks to the same wineserver over a unix-domain socket. If wineserver disappears, those processes lose their fake-Windows backbone, and Wine’s client code treats a lost server connection as fatal: they exit.

Which means the four MetaTrader 5/6/7/8 installations are separated only at the folder level. They’re all attached to /root/.wine, they all share one wineserver, and they share one blast radius.
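A quick way to see that shared blast radius on your own box is to count coordinators against hosted terminals. The sketch below is a minimal, hypothetical helper (`blast_radius_summary` is my name, not a Wine tool) that reads `ps -eo pid=,comm=` output. It assumes Wine processes report their .exe name as the process name, and that wineserver shows up as `wineserver` (some packages ship it as `wineserver64`).

```shell
#!/bin/sh
# Hypothetical helper: count Wine coordinators vs. hosted MT5 terminals.
# Input: `ps -eo pid=,comm=` lines ("PID NAME"). Assumes comm is the .exe
# name for Wine processes and that wineserver may be named "wineserver64".
blast_radius_summary() {
  awk '
    $2 ~ /^wineserver/     { servers++ }    # one per active prefix
    $2 == "terminal64.exe" { terminals++ }  # every running MT5 terminal
    END { printf "%d wineserver(s), %d terminal(s)\n", servers, terminals }'
}

# Usage on a live box:
#   ps -eo pid=,comm= | blast_radius_summary
```

A count of one wineserver against four terminals is the single blast radius described above, made visible.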

3 · The regex that did it

The bridge-restart script does what you’d guess: find the python process listening on the port, kill it, restart it. The lookup looked like this:

× Broken — coin flip between python and wineserver

PORT=18814
PID=$(ss -ltnp \
  | grep ":$PORT" \
  | grep -oE 'pid=[0-9]+' \
  | head -1 \
  | cut -d= -f2)
kill "$PID"

✓ Fixed — match the python.exe owner explicitly

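PORT=18814
PID=$(ss -ltnp \
  | awk -v p=":$PORT" '$4 ~ (p "$")' \
  | grep -oE '"python\.exe",pid=[0-9]+' \
  | head -1 \
  | cut -d= -f2)
# Never signal a guess: bail out if no python.exe owns the port.
[ -n "$PID" ] || { echo "no python.exe listener on :$PORT" >&2; exit 1; }
kill "$PID"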

# Or, safer still:
pkill -f "port=$PORT"

Run ss -ltnp on a wine-hosted listener and the users: column for that socket lists every process holding a handle to it. On a Wine python listener that’s typically two entries: the python.exe itself and the wineserver coordinator. The order in the output is not stable across kernel versions, ss versions, and even across consecutive runs.
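The coin flip is easy to reproduce offline. This sketch feeds two hand-written `ss -ltnp` lines for the same listener through a positional extractor and a name-matched one. The PIDs and fd numbers are made up, and the only difference between the two lines is the order of the `users:` entries.

```shell
#!/bin/sh
# Two hand-written `ss -ltnp` lines for the same Wine listener.
# PIDs and fds are made up; only the order of the users:(( )) entries differs.
line_a='LISTEN 0 5 127.0.0.1:18814 0.0.0.0:* users:(("python.exe",pid=3001,fd=5),("wineserver",pid=412,fd=77))'
line_b='LISTEN 0 5 127.0.0.1:18814 0.0.0.0:* users:(("wineserver",pid=412,fd=77),("python.exe",pid=3001,fd=5))'

# Positional pick (the broken script): first pid= on the line, whoever owns it.
positional_pid() { grep -oE 'pid=[0-9]+' | head -1 | cut -d= -f2; }

# Name-matched pick (the fix): only the pid attached to the python.exe entry.
named_pid() { grep -oE '"python\.exe",pid=[0-9]+' | head -1 | cut -d= -f2; }

for line in "$line_a" "$line_b"; do
  printf 'positional=%s named=%s\n' \
    "$(printf '%s\n' "$line" | positional_pid)" \
    "$(printf '%s\n' "$line" | named_pid)"
done
# prints:
# positional=3001 named=3001
# positional=412 named=3001
```

Both extractors agree on the first line. On the second, the positional one returns the wineserver PID, and a script that passes that straight to kill takes down the whole prefix.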

The script had been running fine for weeks. On 2026-05-24, on a routine bridge restart for the E8 book on port 18814, the regex grabbed wineserver. kill returned 0. Within a second, every terminal under /root/.wine — E8, Derrick, FundedNext, LiteFinance — was gone. Four books offline at the same time.

4 · What the cascade actually looks like

From journalctl during the incident:

... bridge-restart[12345]: killing port=18814 pid=8821
... systemd[1]: nautilus-journal@e8.service: rpyc connection lost, restarting
... systemd[1]: nautilus-journal@derrick.service: rpyc connection lost, restarting
... systemd[1]: nautilus-journal@fn.service: rpyc connection lost, restarting
... systemd[1]: nautilus-journal@lf.service: rpyc connection lost, restarting
... spread-recorder-universe[33301]: ECONNRESET on :18812, :18813, :18814, :18815

One kill. Four journal services losing their rpyc peer in the same second. The spread recorder dropping all four lanes. No live order routing on any book. If any account had been mid-trade, the journal would catch up on reconnect — but the strategy can’t arm new stops on a terminal that isn’t there.

The recovery is mechanical but slow. Per terminal:

DISPLAY=:1 wine "C:\Program Files\MetaTrader 5\terminal64.exe" &
DISPLAY=:1 wine "C:\Program Files\MetaTrader 6\terminal64.exe" &
DISPLAY=:1 wine "C:\Program Files\MetaTrader 7\terminal64.exe" &
DISPLAY=:1 wine "C:\Program Files\MetaTrader 8\terminal64.exe" &

With cached creds and “remember password” ticked on each install, the terminals auto-login on cold start. Allow ~60 seconds per terminal for broker IPC to converge. If any of them throws a login dialog — expired session, server-side reset, password rotation — you need VNC into :1 to click through. There’s no headless recovery once a modal is up.
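The four launch lines above can be wrapped in a loop that spaces the terminals out. This is a minimal sketch, assuming the `MetaTrader 5` to `MetaTrader 8` folder names, X display `:1`, and the ~60-second settle time from the text. `WINE` and `SETTLE` are hypothetical overrides I added so the loop can be dry-run, and staggering the launches instead of firing all four at once is my own choice, not part of the recovery described here.

```shell
#!/bin/sh
# Sketch: cold-start the four terminals one after another.
# WINE and SETTLE are hypothetical overrides (e.g. WINE=true for a dry run).
relaunch_terminals() {
  wine_cmd=${WINE:-wine}
  settle=${SETTLE:-60}          # seconds to let broker IPC converge
  start=$(date +%s)
  for n in 5 6 7 8; do
    exe="C:\\Program Files\\MetaTrader $n\\terminal64.exe"
    echo "starting MetaTrader $n"
    # Send the launcher's output elsewhere so it does not hold our stdout open.
    DISPLAY=:1 "$wine_cmd" "$exe" >/dev/null 2>&1 &
    sleep "$settle"
  done
  echo "all four launched after $(( $(date +%s) - start ))s"
}

# Usage:
#   relaunch_terminals                         # real run under Wine
#   WINE=true SETTLE=0 relaunch_terminals      # dry run, touches nothing
```

This only brings the terminals back. The bridges were Wine-hosted python.exe processes in the same prefix, so they died in the same cascade and need restarting once the terminals are logged in.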

5 · Why this is a class of bug, not a one-off

The mistake feels stupid in hindsight (it always does). But the shape of it generalises:

Wine prefix: one wineserver, many processes. Kill the coordinator, kill everything.
Docker container: one PID 1, many child processes. kill -9 PID 1 from the host and the kernel kills everything else in its PID namespace.
systemd cgroup: one unit, many processes. systemctl kill --kill-who=all hits every member.
k8s pod: one sandbox (the pause container) holding the shared network and IPC namespaces. Lose the sandbox and the kubelet tears the pod down and rebuilds it, every container included.
tmux server: one server, many sessions and panes. tmux kill-server nukes every pane that server hosts, not just the one you’re in.

The pattern is: any time you have a shared-lifecycle container of processes, the unit-of-isolation is the container, not the leaf process. If your tooling reasons at the leaf level (one port, one python, one connection) but the kernel reasons at the container level (one wineserver, one PID 1, one cgroup), you have a blast-radius mismatch waiting to surface.

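The defense is not better kill commands. The defense is teaching your scripts about the container they’re inside, and refusing to send signals to anything that isn’t a leaf of it. pkill -f "port=$PORT" works because it matches on the command line instead of socket ownership: only your own python.exe carries that port= token in its argv, and the wineserver’s argv never does. It is still a substring match, though (port=1881 would also hit port=18814), so anchor the pattern if your port numbers can prefix one another.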
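One way to turn “refuse to signal anything but a leaf” into code is a guard that picks a target only when exactly one process qualifies. This is a sketch, assuming bridges are started with a standalone `port=NNNN` argument and that `pgrep -af` shows that argument in the command line; `pick_bridge_pid` is a hypothetical name. It compares whole tokens rather than substrings, refuses on zero or several matches, and refuses anything that mentions wineserver.

```shell
#!/bin/sh
# Hypothetical guard: print the one PID whose argv carries the exact token
# "port=$1", reading `pgrep -af` style lines ("PID full command line").
# Refuses (returns 1) on zero matches, several matches, or a wineserver hit.
pick_bridge_pid() {
  port=$1
  # Whole-token comparison: port=1881 does not match port=18814.
  matches=$(awk -v tok="port=$port" '{
      for (i = 2; i <= NF; i++) if ($i == tok) { print; next }
    }')
  count=$(printf '%s' "$matches" | grep -c .)
  if [ "$count" -ne 1 ]; then
    echo "refusing: $count processes carry port=$port" >&2
    return 1
  fi
  case $matches in
    *wineserver*) echo "refusing: target looks like wineserver" >&2; return 1 ;;
  esac
  printf '%s\n' "${matches%% *}"   # first field is the PID
}

# Usage:
#   PID=$(pgrep -af "port=$PORT" | pick_bridge_pid "$PORT") && kill "$PID"
```

A caller whose own argv contains the same token (say, a restart script invoked as `bridge-restart.sh port=18814`) also counts as a match, so the guard refuses instead of guessing. That is deliberate: two candidates is exactly the situation where a positional pick goes wrong.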

6 · The audit, if you run this stack

If you have a Wine-hosted MT5 rig and you want to be sure this can’t bite you:

Grep your ops scripts for ss -ltnp | grep | head -1 patterns. Any positional pick on a wine-hosted port is a coin flip. Replace with name-matched extraction or pkill -f.
Tag your bridges in argv. Start each bridge as python.exe bridge.py port=18814 broker=e8 so pkill -f "port=18814" uniquely identifies one process.
Order your service shutdown. Stop journal + recorder services before touching bridges. Restart in reverse. A bridge restart with live consumers attached just trips reconnect storms.
Document the prefix as the unit of blast. “Never kill wineserver” goes next to “never kill MT5” in your ops runbook. They’re the same class of rule.
Drill the cold-start. Time how long four terminals take from wine terminal64.exe to first journal trade. Mine is around four minutes. If yours is “I don’t know, I’ve never had to do it,” you’ll find out at the worst moment.
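For the first audit item, a plain grep misses pipelines split across backslash-continued lines, like the broken lookup in section 3. This sketch (`audit_positional_picks` is a hypothetical name) joins continuation lines with awk, then flags any joined line that calls `ss` with a `-p`-style flag group and pipes into `head -1`, skipping lines that already name an `.exe`. It is a crude heuristic that flags candidates for a human to read, not a shell parser.

```shell
#!/bin/sh
# Hypothetical audit: flag positional PID picks from `ss ... -p` output.
# Joins backslash-continued lines first, because these pipelines usually
# span several lines. Lines mentioning an .exe name are treated as matched.
audit_positional_picks() {
  awk '
    FNR == 1 { joined = "" }                 # new file: drop dangling state
    { joined = joined $0 }
    /\\$/ { sub(/\\$/, "", joined); next }   # continued: keep accumulating
    {
      if (joined ~ /ss -[a-z]*p/ && joined ~ /head -(n ?)?1/ && joined !~ /\.exe/)
        print FILENAME ": " joined
      joined = ""
    }' "$@"
}

# Usage (path is a placeholder):
#   audit_positional_picks /path/to/ops/*.sh
```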

7 · The bigger principle

I’ve been writing a lot lately about success-shaped failures — bugs where every layer returns OK and only the P&L knows something went wrong. This is a different beast: a scope-shaped failure. The signal you sent was correct. It was just sent to a process whose scope was wider than you thought.

The fix isn’t cleverer code. It’s respecting that the kernel groups processes by lifecycle, not by intent. Your intent was to bounce one bridge. The kernel saw a signal to the prefix coordinator. The kernel was right; the intent was wrong.

If you’re running multi-broker prop infra on Linux and your ops scripts have grown organically from “one terminal that worked once” into “four terminals nobody’s touched in months,” I’d bet there’s at least one shared-lifecycle trap like this hiding in your runbook. Audit before you find out the way I did.

SVENN MIVEDOR

AI-native quant. Live prop trading systems, multi-broker MT5 plumbing, agent ops on production books.