Lab — five playable thought experiments

Interactive research arcade

Aim. Fire. Get graded.

Five test chambers from my research: Bitcoin consensus, AI watermarks, training-run forensics, fault tolerance, optimization. Each one drops you into a real scenario; you pick the strategy, the simulation grades the result from Off-target to Frontier 🏆. The dial does not flatter. The cake, as always, is a lie. 5★ means you'd survive in production.

Mine, attack, defend Bitcoin · Catch a stolen AI model · Crash-test redundancy

Five labs, each a scenario with a measurable goal. 5★ means you found the minimum play that beats it, and it unlocks the badge.


Blockchain · Nakamoto consensus

Block Race

Scientific Reference: Based on Nakamoto consensus as specified in the Bitcoin whitepaper. For a review of blockchain integration in AI, see the author's survey: "Blockchain-Enhanced Machine Learning" (IEEE Access 2023).

💰 Someone's paying you $10,000 in Bitcoin. They might try to reverse the transaction after you ship the goods. How long should you wait?

Adversary:

Wait 6 confirmations ≈ 1 hour · each block takes about 10 min

✅

0.85% chance the payment reverses. …

🔬 Open the full simulator (three roles · chain race · scoring · 14 named endings)

⛏️ Bitcoin's entire security argument fits on a napkin Satoshi never published: longest chain wins, honest hashrate outpaces malicious hashrate, and attack probability decays exponentially in z, the number of confirmations. Three roles, one protocol: mine honestly, attempt a heist, or hold the line. The math is whitepaper §11; the bad decisions are entirely yours.

Your move: Pick a rig, mine for a span, and hit the expected block yield. Each tier sets a target block count over a fixed window. 5★ means you found at least the expected number with the smallest rig that can hit it.

5★ Heisenberg-grade · 4★ Mr. White budget · 3★ Pinkman bracket

Pick your rig

Your strategy: blocks to mine? N 24

… Pick a role, choose your strategy, hit Run.

Expected blocks … your hashrate share × blocks in the window

90th-percentile yield … variance is real, plan for it

Pick a rig and a mining window. Hashrate share decides how often you win the next block; variance decides how lumpy your weeks feel.

🧠 What did you just learn? (also: is this a Black Mirror episode?)

Nakamoto consensus is a foot race, not a vote. Every ~10 minutes, every miner is racing to solve a hash puzzle. Whoever wins extends the chain. Your chance of winning the next block equals your share of total hashrate. Over many blocks, your yield converges to that share — but any individual week can be wildly lucky or unlucky (Poisson variance, λ = q·N). Pools exist because solo variance is brutal. Heisenberg's uncertainty principle has nothing on the variance of one ASIC.
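A minimal Python sketch of the yield math, assuming the Poisson model stated above (λ = q·N, where q is your hashrate share and N is the number of network blocks in the window). `mining_yield` is a hypothetical helper, not the simulator's code; it reports the expected count plus the 10th and 90th percentiles so you can see how lumpy a week can be.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def mining_yield(q: float, n_blocks: int) -> tuple[float, int, int]:
    """Expected blocks, plus the 10th and 90th percentile, for a miner with
    hashrate share q over a window of n_blocks network blocks (Poisson model)."""
    lam = q * n_blocks
    cdf, k = 0.0, 0
    p10 = None
    while True:
        cdf += poisson_pmf(k, lam)
        if p10 is None and cdf >= 0.10:
            p10 = k  # first count where the CDF reaches 10%
        if cdf >= 0.90:
            return lam, p10, k  # first count where the CDF reaches 90%
        k += 1


# One week of ~10-minute blocks is about 1008 blocks; a 1% rig expects ~10.
expected, low, high = mining_yield(0.01, 1008)
print(f"expected {expected:.2f} blocks, 10th-90th percentile: [{low}, {high}]")
```

For that 1% rig the expected yield is about 10 blocks, yet the 10th-to-90th percentile band runs from 6 to 14. Strictly, wins out of a fixed number of network blocks are binomial; Poisson is the usual approximation when q is small.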

The double-spend math. If an attacker controls fraction q of the network (honest miners hold p = 1 − q) and the honest chain is z blocks ahead, the chance they ever catch up is, per Satoshi's whitepaper §11, $P = 1 - \sum_{k=0}^{z} \frac{\lambda^k e^{-\lambda}}{k!}\left(1 - (q/p)^{z-k}\right)$ with $\lambda = zq/p$. At q = 30%, six confirmations still leave the attacker about a 13.2% chance (the whitepaper's 17.7% figure is for five). At q = 10%, the same six confirmations cut it to 0.024%. That's the entire reason exchanges wait six blocks. There is no spoon. There is only Poisson.
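A Python transcription of the §11 formula, assuming p = 1 − q as in the whitepaper. The Poisson term is updated incrementally instead of calling a factorial; the name `attacker_success` is illustrative.

```python
import math


def attacker_success(q: float, z: int) -> float:
    """Probability that an attacker with hashrate share q ever catches up
    from z blocks behind (Bitcoin whitepaper, section 11)."""
    p = 1.0 - q                      # honest share
    if q >= p:
        return 1.0                   # a majority attacker eventually wins
    lam = z * q / p                  # expected attacker blocks while honest miners find z
    prob = 1.0
    poisson = math.exp(-lam)         # Poisson(0; lam)
    for k in range(z + 1):
        if k > 0:
            poisson *= lam / k       # Poisson(k; lam) from Poisson(k - 1; lam)
        # Subtract the chance the attacker has k blocks AND never closes the remaining gap.
        prob -= poisson * (1.0 - (q / p) ** (z - k))
    return prob


for q, z in [(0.10, 6), (0.30, 5), (0.30, 6)]:
    print(f"q={q:.2f}, z={z}: P = {attacker_success(q, z):.7f}")
```

`attacker_success(0.1, 6)` comes out near 0.00024 and `attacker_success(0.3, 5)` near 0.177, matching the whitepaper's table.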

Defense is exponential. Each extra confirmation costs you one block-time (~10 minutes) and divides the attacker's remaining odds by a roughly constant factor (about 3.8 at q = 10%). The longer you wait, the less plausible the heist becomes, which is why patient merchants rarely get robbed and impatient ones get to star in their own Black Mirror episode. Tread lightly. Better Call Saul. Wubba lubba dub dub.
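To turn "how long should you wait?" into a number, search for the smallest z that pushes the attacker's odds below a risk budget. A minimal sketch, assuming a 0.1% budget (the threshold the whitepaper tabulates) and the same §11 formula; `confirmations_needed` is a hypothetical helper.

```python
import math


def attacker_success(q: float, z: int) -> float:
    """Whitepaper section 11: chance an attacker with share q catches up from z behind."""
    p = 1.0 - q
    if q >= p:
        return 1.0
    lam = z * q / p
    prob, poisson = 1.0, math.exp(-lam)
    for k in range(z + 1):
        if k > 0:
            poisson *= lam / k
        prob -= poisson * (1.0 - (q / p) ** (z - k))
    return prob


def confirmations_needed(q: float, risk: float = 0.001, max_z: int = 1000) -> int:
    """Smallest z whose attacker success probability is below `risk` (hypothetical helper)."""
    for z in range(max_z + 1):
        if attacker_success(q, z) < risk:
            return z
    raise ValueError("no z up to max_z meets the risk budget; is q >= 0.5?")


for q in (0.10, 0.20, 0.30):
    print(f"q={q:.2f}: wait {confirmations_needed(q)} blocks")
```

With a 0.1% budget this gives 5 confirmations at q = 10%, 11 at q = 20%, and 24 at q = 30%, in line with the whitepaper's table. The count climbs steeply as q approaches 50%, because the per-block factor shrinks toward 1.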

Scientific Context: Blockchain networks provide decentralized consensus mechanisms. To explore the broader integration of blockchain mechanisms to secure distributed machine learning pipelines, read the author's comprehensive survey: "Blockchain-Enhanced Machine Learning" (IEEE Access 2023).

ML security · Model provenance

Model Heist Detector

🕵️ Someone leaked your AI and ran it through a disguise. Before publishing, you'd spread a faint statistical signature across thousands of weights, too small to notice individually but together a fingerprint only you can read. The thief tries to scrub it. You have to read it through the noise anyway.

Scientific Reference: Implements robust statistical watermarking. See the author's paper: "Feature-Based Model Watermarking for PoL" (IEEE Access 2024).

Your move: Catch the thief without breaking the model. Each thief level adds more noise. Tune signature strength (ε) and number of marks (k). Push ε too far and the model behaves oddly; undershoot k and the signal drowns in the noise floor.

5★ beat the catch target · 4★ catch with collateral · 3★ partial detection

Pick the thief

Your strategy: how bold is your signature? ε 0.20 · how many hidden marks? k 8

… Pick a thief, set your strategy, hit Run.

Caught the thief … verifier with the secret key

False alarm rate … on an innocent model

Drag the sliders.

🧠 What did you just learn?

Tiny secrets in many places beat one big secret. An AI model has hundreds of millions of internal numbers. A watermark is a faint statistical pattern spread across many of them. Each individual mark is way too small to notice, but together they form a signature only you can recognize.
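A toy version of the idea, to make "faint pattern across many weights" concrete. Everything here is an assumption for illustration: the model is a flat list of zero-mean Gaussian weights, the secret key seeds which positions carry a mark and whether each is nudged up or down by ε, and the thief's disguise is added Gaussian noise. It is a generic sketch, not a reproduction of the paper's method; `keyed_marks`, `embed`, and `detect_z` are hypothetical names.

```python
import math
import random
import statistics


def keyed_marks(secret_key: int, n_weights: int, k: int) -> list[tuple[int, float]]:
    """Derive k secret (position, sign) pairs from the key. Hypothetical scheme."""
    rng = random.Random(secret_key)
    positions = rng.sample(range(n_weights), k)
    return [(i, rng.choice((-1.0, 1.0))) for i in positions]


def embed(weights: list[float], marks, eps: float) -> list[float]:
    """Nudge each marked weight up or down by eps."""
    marked = list(weights)
    for i, sign in marks:
        marked[i] += eps * sign
    return marked


def detect_z(weights: list[float], marks) -> float:
    """Sum the sign-aligned weights and normalise. Assuming zero-mean weights, a model
    unrelated to the key gives roughly N(0, 1); a watermarked model is shifted upward."""
    sigma = statistics.pstdev(weights)
    aligned = sum(sign * weights[i] for i, sign in marks)
    return aligned / (sigma * math.sqrt(len(marks)))


rng = random.Random(0)
n, k, eps, key = 100_000, 5_000, 0.2, 1234
original = [rng.gauss(0.0, 1.0) for _ in range(n)]
marks = keyed_marks(key, n, k)
leaked = embed(original, marks, eps)
disguised = [w + rng.gauss(0.0, 0.5) for w in leaked]   # thief's scrubbing noise
innocent = [rng.gauss(0.0, 1.0) for _ in range(n)]      # someone else's model

print(f"disguised copy: Z = {detect_z(disguised, marks):.1f}")
print(f"innocent model: Z = {detect_z(innocent, marks):.1f}")
```

Each mark shifts a weight by only 0.2 standard deviations, yet pooling 5,000 of them puts the disguised copy's Z score far above anything the innocent model produces.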

This is one way an AI lab could back up the claim "yes, that model is ours" if a model leaks. Related statistical watermarks are used to flag AI-generated text and images. And scrubbing any single mark barely helps the thief: the verifier pools evidence across all of them, so the signature survives unless a large share of the marks is degraded.

Scientific Context: This simulation implements features similar to the watermark detection schemes analyzed in the author's paper, where aggregate detection over $k$ weights is modeled using a Gaussian Z-test. Read the details in the IEEE Access paper: "Feature-Based Model Watermarking for PoL" (IEEE Access 2024).
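Under the Gaussian Z-test framing, detection and false-alarm rates have closed forms. A minimal sketch, assuming each of the k marks shifts its weight by ε against noise of standard deviation σ, so Z has mean ε√k/σ on a watermarked model and 0 on an innocent one, and that the verifier flags a model when Z exceeds the one-sided threshold for a chosen false-alarm rate α. `z_test_rates` and `min_marks` are hypothetical helpers, not the paper's notation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_test_rates(eps: float, k: int, sigma: float, alpha: float = 0.01) -> tuple[float, float]:
    """(detection rate, false-alarm rate) for a one-sided Z-test over k marks."""
    threshold = STD_NORMAL.inv_cdf(1.0 - alpha)   # flag the model if Z exceeds this
    shift = eps * math.sqrt(k) / sigma             # mean of Z on a watermarked model
    detection = 1.0 - STD_NORMAL.cdf(threshold - shift)
    return detection, alpha                        # innocent models trip the test at rate alpha


def min_marks(eps: float, sigma: float, alpha: float = 0.01, power: float = 0.95) -> int:
    """Smallest k whose detection rate reaches `power` at false-alarm rate alpha:
    k >= ((z_{1-alpha} + z_{power}) * sigma / eps) ** 2."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sigma / eps) ** 2)


print(z_test_rates(0.20, 8, 1.0))   # the lab's default sliders, with sigma = 1
print(min_marks(0.20, 1.0))
```

At the lab's default ε = 0.20 with k = 8 (and σ = 1), detection sits near 4%; the same ε needs about 395 marks to catch 95% of thieves at a 1% false-alarm rate. Because k scales with 1/ε², a bolder signature cuts the required marks quadratically, which is exactly the tension between a loud watermark and a model that still behaves.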

Machine Learning · Model provenance

Proof-of-Learning (SecurePoL)

🔬 Anyone can download a model and claim they trained it. The proof is in the journey: a real training run leaves a wobbly, monotone-ish loss curve that's almost impossible to forge after the fact. Tune the run; see if the trajectory would survive an audit. (The auditor is unforgiving but fair. Mostly fair.)

Scientific Reference: Implements loss curve auditing and proof verification. See the author's paper: "SecurePoL: Integration of Watermarking With Proof-of-Learning" (IEEE Access 2025) and his Ph.D. dissertation.

Your move: Earn a Gold proof. Pick learning rate, batch size, and data noise, then hit Train. The detector scores you on monotonicity, smoothness, distance from the fake-flat baseline, and how close your hyperparameters are to a credible regime.
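A rough sketch of how such an audit could fold the four criteria into a 0 to 100 score. The metric definitions and the equal 25-point weights are assumptions made for illustration; only the credible-regime ranges come from this lab's own hint (α 0.008 to 0.018, batch 64 to 256, noise 0.02 to 0.08). `audit_curve` is a hypothetical name, and the real detector's formula may differ.

```python
def audit_curve(losses: list[float], lr: float, batch: int, noise: float) -> float:
    """Score a logged loss curve from 0 to 100 (hypothetical weighting)."""
    if len(losses) < 3 or losses[0] <= 0:
        raise ValueError("need at least three losses, starting above zero")
    diffs = [b - a for a, b in zip(losses, losses[1:])]

    # 1. Monotonicity: share of steps where the loss did not rise.
    monotone = sum(d <= 0 for d in diffs) / len(diffs)

    # 2. Distance from the fake-flat baseline: relative total drop, clipped to [0, 1].
    drop = min(1.0, max(0.0, (losses[0] - losses[-1]) / losses[0]))

    # 3. Smoothness: slope changes that are small relative to the typical step.
    #    A flat curve has no steps at all and earns 0 here.
    mean_step = sum(abs(d) for d in diffs) / len(diffs)
    jitter = sum(abs(b - a) for a, b in zip(diffs, diffs[1:])) / (len(diffs) - 1)
    smooth = 1.0 / (1.0 + jitter / mean_step) if mean_step > 0 else 0.0

    # 4. Credible hyperparameters, using the Gold ranges quoted in the lab text.
    credible = sum([0.008 <= lr <= 0.018, 64 <= batch <= 256, 0.02 <= noise <= 0.08]) / 3

    return 25.0 * (monotone + drop + smooth + credible)


# A decaying curve with a small periodic wobble vs. a suspiciously flat one.
real = [2.0 * 0.98**t + (0.01 if t % 3 == 0 else 0.0) for t in range(100)]
flat = [2.0] * 100
print(round(audit_curve(real, 0.01, 128, 0.05), 1), round(audit_curve(flat, 0.01, 128, 0.05), 1))
```

A perfectly flat forged curve earns full marks on monotonicity but zero on drop and smoothness, so it tops out at 50 even with credible hyperparameters.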

5★ score ≥ 94 · 4★ ≥ 82 · 3★ ≥ 68

Gold lives near α 0.008 to 0.018, batch 64 to 256, and noise 0.02 to 0.08. The lab will not admit this was helpful.

Learning speed α 0.01 · Examples per step B 32

Faster training

… Set hyperparameters, hit Train, watch the curve.

Verification score 0 target: Gold ≥ 94

Badge … Bronze · Silver · Gold

Adjust sliders and hit Train. Real training should descend with controlled chaos, not flatline like a suspiciously convenient download from the desert of the real.

🧠 What did you just learn?

The journey is harder to fake than the destination. The final weights of a trained AI are just a giant list of numbers, copy-pasteable in seconds. But the path the loss took during training? It's noisy, bumpy, has little flat spots when learning stalls, sudden drops when it finds a shortcut. That fingerprint is almost impossible to forge after the fact.

This is called Proof-of-Learning, and it matters for: patent disputes ("did you really invent this?"), competitive intelligence ("did they actually train, or did they distill ours?"), and verifying open-source claims. Same idea as a digital signature, but for the training process instead of the model.

Scientific Context: In a real training run, verification relies on checking the monotonicity, smoothness, and hyperparameter consistency of the logged epoch states. Spoofing attacks can be detected by coupling these logs with model watermarks, as proposed in the author's paper: "SecurePoL: Integration of Watermarking With Proof-of-Learning to Enhance Security Against Spoofing Attacks" (IEEE Access 2025) and detailed in his Ph.D. Dissertation.

Aerospace · Fault tolerance

Redundancy Reactor

✈️ Your A320 has three flight computers and a majority voter. One fails, the other two outvote it. But "three computers" is only "three independent failure paths" if they fail differently. Identical software hits the same overflow at the same millisecond. (Ariane 5, 1996. The rocket disagreed with reality and disassembled itself 39 seconds in.)

Scientific Reference: Based on Triple Modular Redundancy (TMR). Inspired by real-time safety-critical aerospace and flight simulator architecture developed by the author.

Your move: Pick the mission, then pick the minimum N that beats the safety target. Each mission carries its own correlation (ρ) and per-channel failure rate (q). 5★ means hitting the safety multiplier with the fewest backups; overspending earns 4★.

5★ minimum N · 4★ overspent win · 3★ close miss

Pick the mission

Your strategy: how many backup computers? N 3

… Pick a mission, choose backup count, hit Run.

Safety multiplier … vs a single computer

Whole-system failure rate … after majority voting

Drag the sliders. The strip is live; the curves are closed-form; déjà vu is still a bug, not a feature.

🧠 What did you just learn?

Three of the same thing is still one thing. Backup computers only protect you if they fail in different ways. Three identical computers running the same software will hit the same bug at the same moment, and then majority voting gives the wrong answer with confidence.
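A sketch of why "three of the same thing is still one thing", using the common beta-factor model as a stand-in for the lab's correlation ρ (an assumption: the simulator's exact correlation model is not shown here). A fraction β of each channel's failure probability q comes from a shared cause that takes out every channel at once; the rest is independent. `system_failure`, `safety_multiplier`, and `min_channels` are hypothetical helpers.

```python
from math import comb


def system_failure(q: float, n: int, beta: float) -> float:
    """Failure probability of n-channel majority voting under a beta-factor
    common-cause model (assumption)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("use an odd number of channels so the majority is well defined")
    common = beta * q                        # one shared event takes out every channel
    indep = (q - common) / (1.0 - common)    # chosen so one channel alone still fails with prob q
    need = n // 2 + 1                        # failed channels needed to outvote the healthy ones
    majority_fails = sum(
        comb(n, j) * indep**j * (1.0 - indep) ** (n - j) for j in range(need, n + 1)
    )
    return common + (1.0 - common) * majority_fails


def safety_multiplier(q: float, n: int, beta: float) -> float:
    """How many times safer the voted system is than a single computer."""
    return q / system_failure(q, n, beta)


def min_channels(q: float, beta: float, target: float, max_n: int = 15):
    """Smallest odd n reaching the target multiplier, or None if none up to max_n does."""
    for n in range(1, max_n + 1, 2):
        if safety_multiplier(q, n, beta) >= target:
            return n
    return None


for beta in (0.0, 0.1):
    print(beta, [round(safety_multiplier(0.01, n, beta), 1) for n in (1, 3, 5, 7)])
```

With independent channels (β = 0) and q = 1%, three computers cut failures by about 34×. With β = 0.1 the shared-cause term alone is βq, so the multiplier can never reach 1/β = 10 no matter how many backups you add.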

This is what destroyed the first Ariane 5 rocket in 1996: its two inertial reference systems were redundant, but both ran the same code, so both overflowed the same variable (a 64-bit floating-point value converted to a 16-bit integer) a fraction of a second apart. To get real safety, aircraft use dissimilar redundancy: computers built on different hardware, with software written by different teams, sometimes in different programming languages. Same idea protects nuclear reactors, Mars rovers, and the secure chip in your phone.

Scientific Context: Triple Modular Redundancy (TMR) is a foundational pattern in high-availability and safety-critical computing. This model shows how correlated failures (common-mode faults) degrade reliability. Similar redundancy and low-latency safety-critical mechanisms are essential in real-time avionics software and flight simulator data pipelines developed by the author.

Deep Learning · Optimization

Gradient Pinball

⛰️ Every modern model (GPT, Stable Diffusion, your phone's autocorrect) learns by rolling a ball down a high-dimensional loss landscape. The deepest valley is the answer; the smaller dips are traps. Too cautious and you settle in a side valley convinced you've won; too aggressive and the ball leaves the landscape entirely. There is no spoon, only a basin.

Scientific Reference: Interactive optimization landscape visualizer. Represents the core mathematical optimization mechanism of the deep neural networks studied in the author's machine learning research.

Your move: Land the optimizer in the global minimum. Pick step size and momentum, then hit Train. Momentum carries the ball over small hills the way inertia carries a real ball over ridges. 5★ only when you land in the real valley, not a side valley or a saddle.

5★ global minimum · 4★ close miss · 3★ stuck in a side valley

Step size α 0.02 · Momentum β 0.00

Faster epochs


… Pick step size and momentum, hit Train.

Final loss … lower = closer to the answer

Epochs 0 steps the ball took

Set parameters. Hit Train. Try not to bend the manifold; that never ends well in any dimension.

🧠 What did you just learn?

Learning is rolling downhill. Every modern AI (GPT, image generators, the search ranking on your phone) learns by repeatedly nudging its parameters in whichever direction makes the loss smaller. That's literally rolling a ball downhill on a high-dimensional landscape. Step size matters: too small and you crawl forever, too big and you bounce out of the valley you wanted.
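The step-size trade-off already shows up on the simplest bowl, f(x) = x², a toy loss chosen for illustration. Each gradient step multiplies x by (1 − 2α), so a tiny α crawls, a moderate α converges fast, and α > 1 makes the ball bounce out with growing amplitude.

```python
def gradient_descent(grad, x0: float, lr: float, steps: int) -> float:
    """Plain gradient descent on a one-dimensional loss."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)   # step against the slope
    return x


def bowl_grad(x: float) -> float:
    return 2.0 * x          # gradient of the toy loss f(x) = x**2, minimum at x = 0


for lr in (0.01, 0.4, 1.1):
    print(f"lr={lr}: x after 20 steps = {gradient_descent(bowl_grad, 5.0, lr, 20):.3g}")
```

After 20 steps from x = 5, α = 0.01 is still above 3, α = 0.4 is essentially at 0, and α = 1.1 has blown past 100.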

Momentum is the killer feature. It lets the ball carry forward through small ridges that would otherwise trap it. The idea borrows from physics (a heavy ball keeps rolling) and was formalized as Polyak's heavy-ball method in 1964; momentum-based optimizers such as SGD with momentum and Adam are one reason large models train in days instead of years. The same family of algorithms trains your phone's autocorrect and runs in every AI lab on Earth.
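A minimal heavy-ball sketch on a hypothetical one-dimensional landscape, f(x) = x⁴ − 3x² + x, which has a shallow side valley near x ≈ 1.13, a ridge near x ≈ 0.17, and a deeper global valley near x ≈ −1.30. Starting from x = 2 with the same step size, plain gradient descent (β = 0) settles in the side valley, while momentum β = 0.9 carries the ball over the ridge. The landscape and constants are chosen for illustration, not taken from the lab.

```python
def heavy_ball(grad, x0: float, lr: float, beta: float, steps: int) -> float:
    """Gradient descent with momentum; beta = 0 reduces to plain gradient descent."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)   # keep a fraction of the old velocity, add the new push
        x = x + v
    return x


def tilted_double_well_grad(x: float) -> float:
    # Gradient of f(x) = x**4 - 3*x**2 + x
    return 4 * x**3 - 6 * x + 1


plain = heavy_ball(tilted_double_well_grad, 2.0, 0.01, 0.0, 500)
momentum = heavy_ball(tilted_double_well_grad, 2.0, 0.01, 0.9, 500)
print(f"plain GD ends at x = {plain:.3f}; momentum ends at x = {momentum:.3f}")
```

Too much momentum can also overshoot the global valley, which is why β is a dial rather than a switch.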

Scientific Context: High-dimensional optimization landscapes govern the training dynamics of deep neural networks. Understanding how the learning rate and momentum steer an optimizer past local minima and saddle points is essential for designing secure training frameworks like those in the author's machine learning security research.

Dr. Ozgur Ural
