Building NodeSeek Rulings Search: A Full-Stack Moderation Transparency Tool

From a weekend Telegram bot to a multi-surface platform with a browser userscript, a hardened HTTP API, and Cloudflare Turnstile integration — this is how it got built.


Background

NodeSeek is a Chinese-language tech community where moderators record every administrative action taken against users — bans, coin deductions, post locks, and more — in a public-ish ruling log. The data is technically accessible through the platform’s admin API, but there’s no search interface exposed to regular users. If you want to know whether someone has a history of rule violations, you have to scroll through pages of raw records manually.

I thought that was fixable in an afternoon. It took about six weeks.


What the Project Actually Is

NodeSeek Rulings Search is a multi-surface query tool for NodeSeek’s moderation records. It has four main components that all share the same SQLite database:

Component          Role
scan.py            Admin-mode Telegram bot: crawls records + handles queries
scan_public.py     Public-mode Telegram bot: query-only with global rate limiting
query_backend.py   Threaded HTTP API server for the browser extension
Tampermonkey.js    Userscript that injects query buttons directly into NodeSeek pages

The end result: anyone browsing NodeSeek can click a "🔍 查询管理记录" ("look up moderation records") button next to any username and instantly see that user's full moderation history in a polished modal — without ever leaving the page.


The Development Journey

Week 1: Getting the Data (March 6–7)

The first commit was deceptively simple — a single scan.py and a README. The core loop was already there: authenticate with a session cookie, hit the NodeSeek admin ruling API incrementally starting from MAX(id)+1, persist records into SQLite with INSERT OR IGNORE, and surface them via a Telegram bot’s /search command.
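
The loop is small enough to sketch in full. The endpoint path, response field names, and cookie handling below are illustrative stand-ins rather than the real API contract; only the SQLite side matches the actual schema (shown later):

import json
import sqlite3
from curl_cffi import requests  # browser-like TLS fingerprint, see below

def crawl_once(cookie: str) -> None:
    conn = sqlite3.connect("nodeseek_ruling.db")
    # Resume from the highest id already stored (MAX(id) + 1).
    start_id = conn.execute(
        "SELECT COALESCE(MAX(id), 0) + 1 FROM rulings").fetchone()[0]
    resp = requests.get(
        "https://www.nodeseek.com/api/admin/rulings",  # hypothetical path
        params={"from": start_id},
        headers={"Cookie": cookie},
        impersonate="chrome",
    )
    for rec in resp.json().get("records", []):  # field names are guesses
        conn.execute(
            "INSERT OR IGNORE INTO rulings VALUES (?, ?, ?, ?, ?, ?, ?, "
            "datetime('now'))",
            (rec["id"], rec.get("admin"), rec.get("target"),
             rec.get("postId"), json.dumps(rec.get("action")),
             rec.get("createdAt"), json.dumps(rec)),
        )
    conn.commit()
    conn.close()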

But “simple” started to unravel the moment I looked at what the API actually returned. The action_request field is raw JSON — a nested structure that describes what the moderator did. Something like:

{"type": "moveTo", "targetSection": "tech", "readLevel": 0}

This is utterly unreadable to a user. So the very first real work was writing translate_action_request() — a function that turns that blob into something human-readable in Chinese (a condensed sketch follows the list below). This ended up being one of the most frequently iterated pieces of the whole codebase:

  • Distinguishing posts from comments
  • Parsing coin adjustments (chickenLeg) with a +/- prefix
  • Handling stardust currency separately
  • Mapping English section identifiers to Chinese board names
  • Differentiating lock from unlock actions
  • Handling comment pinning
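
A condensed sketch covering just two of those cases; the field names come from the example above, while the board-name map and the coin amount field are invented for illustration:

import json

SECTION_NAMES = {"tech": "技术"}  # hypothetical English-to-Chinese board map

def translate_action_request(raw: str) -> str:
    action = json.loads(raw)
    kind = action.get("type")
    if kind == "moveTo":
        board = SECTION_NAMES.get(action["targetSection"],
                                  action["targetSection"])
        return f"移动到「{board}」版块"  # "moved to the <board> section"
    if kind == "chickenLeg":  # coin adjustment; amount field is a guess
        amount = action.get("amount", 0)
        sign = "+" if amount >= 0 else ""
        return f"鸡腿 {sign}{amount}"  # always carries a +/- prefix
    return raw  # unknown action types fall back to the raw JSON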

By the end of day two on March 7, the bot could parse nine distinct action types, display the "busiest moderation day" in /static, and require a password to update the crawler cookie via /setcookie. A lot happened fast.

Week 1 Continued: Going Public (March 8–10)

The private bot was useful, but I wanted to let others query it too without handing out admin access to the crawler. That meant a clean split: scan_public.py would be a stripped-down version with no cookie management, no manual crawl trigger, and a sliding-window rate limiter to protect the database from abuse.

The rate limiter is a collections.deque-based rolling window — track query timestamps, evict entries older than 60 seconds, reject requests when the count exceeds the threshold. Simple, zero-dependency, and effective enough for a Telegram bot with modest usage.
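
A minimal version of that limiter (the threshold here is illustrative, not the bot's real number):

import time
from collections import deque

WINDOW_SECONDS = 60
MAX_QUERIES = 10          # illustrative threshold
_timestamps: deque = deque()

def allow_query() -> bool:
    """Sliding-window check: True if this query may proceed."""
    now = time.time()
    # Evict timestamps that have fallen out of the 60-second window.
    while _timestamps and now - _timestamps[0] > WINDOW_SECONDS:
        _timestamps.popleft()
    if len(_timestamps) >= MAX_QUERIES:
        return False
    _timestamps.append(now)
    return True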

Around the same time I fixed a subtle but important bug: the original /search was doing a LIKE fuzzy match, which meant searching for user abc would also return xabcy. Switched to exact match (target_name = ?) for /search and introduced a dedicated /partial_match command for cases where you only know part of a username. The partial match flow handles the multi-user case gracefully — if more than one user matches, it renders inline keyboard buttons so the user can select exactly who they’re looking for.
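
The two commands differ only in their WHERE clauses; a sketch against the schema shown later:

import sqlite3

def search_exact(conn: sqlite3.Connection, name: str) -> list:
    # /search: exact match, so "abc" no longer matches "xabcy".
    return conn.execute(
        "SELECT * FROM rulings WHERE target_name = ? ORDER BY id DESC",
        (name,)).fetchall()

def search_partial(conn: sqlite3.Connection, keyword: str) -> list[str]:
    # /partial_match: distinct usernames containing the keyword; if more
    # than one comes back, the bot renders inline keyboard buttons.
    rows = conn.execute(
        "SELECT DISTINCT target_name FROM rulings WHERE target_name LIKE ?",
        (f"%{keyword}%",)).fetchall()
    return [r[0] for r in rows]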

I also bumped the crawl interval from once-daily to every 6 hours. Moderation happens around the clock.

Week 2: The AI Prompt Feature (March 11)

This one is a bit unusual. The last page of any user’s search results now includes a pre-formatted AI analysis prompt — something you can copy directly into an LLM to get a structured “integrity score” across two dimensions: honesty and rule adherence, each worth 50 points.

It reads all the user’s ruling records and feeds them into the prompt template as context. Whether this is genuinely useful or a novelty is debatable, but it’s a good example of how the data model — having all records in one place with normalized action descriptions — enables features that would be hard to build otherwise.
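
Assembling the prompt is plain string work over the already-translated records. The template wording below is a paraphrase for illustration, not the bot's exact text:

def build_ai_prompt(username: str, actions: list[str]) -> str:
    # actions are the human-readable strings from translate_action_request().
    history = "\n".join(f"- {a}" for a in actions)
    return (
        f"Below are all moderation records for NodeSeek user {username}:\n"
        f"{history}\n\n"
        "Score the user on two dimensions worth 50 points each: honesty "
        "and rule adherence. Give a total and a short justification."
    )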

Week 3: The Browser Extension (March 24)

This was the biggest architectural leap. The Telegram bot was fine for power users, but it required leaving the NodeSeek page, opening Telegram, typing a command, and waiting. That’s friction. The ideal UX is: see a username on the page, click a button, read the results — all without context switching.

So I built two things in parallel on March 24:

query_backend.py — a pure-Python ThreadingHTTPServer that exposes three endpoints:

GET /api/search?target=<username>&page=1&per_page=5
GET /api/captcha/config
GET /api/captcha/verify?token=<turnstile_token>

The security model is layered in the request handling path, with four checks applied in order (a sketch follows the list):

  1. Per-IP exponential-backoff ban (burst detection → ban, repeated violations → longer bans up to 24 hours)
  2. Per-IP minute-window rate limit
  3. Global minute-window rate limit
  4. Optional Cloudflare Turnstile CAPTCHA gate
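
A runnable sketch of the first three layers; the limits and backoff constants are illustrative, and the CAPTCHA layer is shown separately below:

import time
from collections import defaultdict, deque

IP_LIMIT, GLOBAL_LIMIT, WINDOW = 6, 60, 60.0   # illustrative numbers
_ip_hits = defaultdict(deque)                  # ip -> recent timestamps
_global_hits = deque()
_bans = {}                                     # ip -> (banned_until, strikes)

def _window_ok(hits: deque, limit: int, now: float) -> bool:
    while hits and now - hits[0] > WINDOW:
        hits.popleft()
    if len(hits) >= limit:
        return False
    hits.append(now)
    return True

def check_request(ip: str) -> tuple[bool, str]:
    now = time.time()
    until, strikes = _bans.get(ip, (0.0, 0))
    if now < until:                                      # layer 1: active ban
        return False, "banned"
    if not _window_ok(_ip_hits[ip], IP_LIMIT, now):      # layer 2: per-IP
        # Exponential backoff: each violation doubles the ban,
        # capped at 24 hours.
        strikes += 1
        _bans[ip] = (now + min(60 * 2 ** strikes, 86400), strikes)
        return False, "ip rate limited"
    if not _window_ok(_global_hits, GLOBAL_LIMIT, now):  # layer 3: global
        return False, "global rate limited"
    return True, "ok"  # layer 4 (Turnstile) is checked separately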

I deliberately avoided a static API key. The userscript is public JavaScript — any key embedded in it would be immediately visible to anyone who reads the source. The Turnstile approach is strictly better: it validates that requests come from a real browser session on nodeseek.com, without exposing any secret.
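
The fourth layer verifies the token server-side against Cloudflare's siteverify endpoint; a stdlib-only sketch:

import json
import urllib.parse
import urllib.request

SITEVERIFY = "https://challenges.cloudflare.com/turnstile/v0/siteverify"

def turnstile_ok(token: str, secret: str, remote_ip: str) -> bool:
    # secret is the server-side Turnstile key from the config file;
    # token is what the userscript obtained in the browser.
    body = urllib.parse.urlencode({
        "secret": secret,
        "response": token,
        "remoteip": remote_ip,
    }).encode()
    with urllib.request.urlopen(SITEVERIFY, data=body, timeout=10) as resp:
        return bool(json.load(resp).get("success"))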

Tampermonkey.js — a userscript that uses MutationObserver to watch for new post/comment nodes being added to the DOM (NodeSeek renders dynamically), injects the "🔍 查询管理记录" button next to each username, and handles the full query flow in a modal overlay.

The modal itself is a hand-rolled frosted glass panel — backdrop-filter: blur(18px) saturate(160%) — with CSS custom properties for full light/dark mode support. No framework, no build step. Just a self-contained IIFE with ~730 lines of vanilla JavaScript.

One thing I was careful about: URL validation. The API_BASE_URL constant goes through validateApiBaseUrl() at script initialization — it must be HTTPS, and the hostname must appear in a hard-coded TRUSTED_API_HOSTS set. This prevents a class of attacks where a tampered version of the script could be configured to exfiltrate queries to an attacker-controlled server.

Week 4–6: UI Polish (March 25 – April 19)

The last stretch was almost entirely UI work:

  • v1.3 → v1.4: Fixed a duplicate-button bug where clicking “load more comments” would inject a second query button next to the same username. Added a WeakSet to track processed nodes, and a guard that checks node.nextElementSibling?.classList.contains('custom-search-btn') before injecting.
  • Dark mode via GM_addStyle: Moved all CSS into a GM_addStyle injection (emitted as a <style> tag with id="ns-ruling-style") to avoid conflicts with NodeSeek’s own stylesheet cascade.
  • Source attribution: The modal footer now displays the API hostname and the last database update time, so users know how fresh the data is.
  • Code refactoring (April 19): Cleaned up variable names, extracted helper functions, improved readability throughout all three Python files.

Architecture Overview

NodeSeek Admin API
        │
        ▼
   scan.py (crawler)
        │  INSERT OR IGNORE
        ▼
nodeseek_ruling.db (SQLite)
        │
   ┌────┴────────────────┐
   │                     │
   ▼                     ▼
scan_public.py    query_backend.py
(Telegram bot)    (HTTP API server)
                        │
                        ▼
                  Tampermonkey.js
                  (browser userscript)

Every component ultimately works against the same rulings table: scan.py writes it, the bots and the API read it, and the userscript reads it indirectly through the API. The schema is intentionally minimal:

CREATE TABLE rulings (
    id          INTEGER PRIMARY KEY,
    admin_name  TEXT,
    target_name TEXT,
    post_id     INTEGER,
    action_request TEXT,
    created_at  TEXT,
    raw_data    TEXT,
    fetch_time  TEXT
)

translate_action_request() exists in all three Python files. This is a known duplication tradeoff — keeping the files independently deployable is worth the maintenance cost of syncing the translation logic manually.


Key Technical Decisions

Why SQLite? The dataset is write-once, append-only, and query patterns are simple exact/fuzzy matches on target_name. SQLite handles this perfectly with zero operational overhead. A proper database server would be overkill.

Why curl_cffi instead of requests? NodeSeek uses Cloudflare protection. curl_cffi can mimic browser TLS fingerprints, which bypasses the basic bot detection that would reject a standard requests call.
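
The switch is close to drop-in:

from curl_cffi import requests  # pip install curl_cffi

resp = requests.get(
    "https://www.nodeseek.com/",
    impersonate="chrome",        # present a Chrome TLS fingerprint
)
print(resp.status_code)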

Why a deque-based rate limiter instead of something like Redis? The public bot runs on a single process. A thread-safe deque with maxlen is literally three lines and has no dependencies. Redis would introduce an entire additional service for no real benefit at this scale.

Why no static API key in the userscript? Anything embedded in a Greasyfork-published userscript is public. Cloudflare Turnstile solves the same problem — “is this a real browser on the right domain?” — without any secret that could be extracted.


Setup

Telegram Bots

pip install "python-telegram-bot[job-queue]" curl_cffi

# Full mode (crawler + queries)
python scan.py

# Public mode (queries only)
python scan_public.py

On first run, each script auto-generates its config.json with placeholder values, then exits. Fill in your bot token and admin chat ID, then rerun.
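
The bootstrap pattern looks roughly like this; the key names are placeholders, and the real ones are whatever the script writes on first run:

import json
import sys
from pathlib import Path

CONFIG = Path("config.json")

def load_config() -> dict:
    if not CONFIG.exists():
        # First run: write placeholders and exit so the operator can
        # fill in real values before restarting.
        CONFIG.write_text(json.dumps(
            {"bot_token": "YOUR_BOT_TOKEN", "admin_chat_id": 0},
            indent=2))
        sys.exit("config.json created; fill it in and rerun.")
    return json.loads(CONFIG.read_text())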

HTTP API Backend

python query_backend.py

Same pattern — first run generates query_backend_config.json and exits. Configure your Turnstile keys if you want CAPTCHA protection, then rerun.

Tampermonkey Userscript

Install Tampermonkey.js as a userscript. The @connect metadata directive must include your API server’s domain for GM_xmlhttpRequest to be allowed to reach it.


Bot Commands

Command                          Description                            Mode
/start                           Show status and feature overview       All
/search <username>               Exact-match moderation record lookup   All
/partial_match <keyword>         Fuzzy search, inline user selection    All
/static                          Global database statistics             All
/setcookie <password> <cookie>   Update crawler session cookie          Full only
/run                             Trigger an immediate crawl             Full only

Lessons Learned

Normalization pays off early. Writing translate_action_request() on day one meant every feature built on top of it — Telegram messages, HTTP API responses, AI prompts — automatically got human-readable action text. The raw JSON stays in raw_data for auditability, but nothing in the UI ever has to parse it again.

MutationObserver is the right tool for dynamic pages. NodeSeek loads comments asynchronously. A setTimeout-only approach would miss late-loaded content. The observer + debounce pattern (200ms debounce on childList mutations) is reliable without being expensive.

Layered rate limiting beats a single gate. The HTTP backend enforces four distinct limits: burst detection, per-IP minute window, global minute window, and CAPTCHA. Each layer catches a different abuse pattern. A single global rate limit would either be too permissive (one user can exhaust it) or too restrictive (legitimate users get blocked when traffic is high).

Dark mode is not an afterthought. The userscript modal lives inside a page it doesn’t control. Using CSS custom properties with a @media (prefers-color-scheme: dark) override block — scoped under a unique panel ID — means the modal matches the user’s system preference without touching anything on the host page.


What’s Next

A few things I’m thinking about:

  • A web frontend — the HTTP API is already there; a simple static page would make the data accessible without needing Telegram or Tampermonkey.
  • Admin analytics dashboard — the data is rich enough to surface interesting patterns: which admins are most active, what action types are most common, week-over-week trends.
  • Multi-database sync — right now the crawler runs on a single machine. A read replica or periodic export would make the public query surface more resilient.

The full source is available on GitHub: Z1rconium/NodeSeek-Rulings-Search

Disclaimer: This project is for educational and research purposes only. Use responsibly and in accordance with NodeSeek’s terms of service.
