Show and Tell: Streaming 50,000 records into a Web Grid with Zero-Latency Scrolling (and how we built it without a backend)_健趣网_健康去哪儿

Show and Tell: Streaming 50,000 records into a Web Grid with Zero-Latency Scrolling (and how we built it without a backend)

作者：发布日期：2026-03-01 16:21:08 浏览次数：56 标签：阳痿锻炼

I've always been frustrated by the lack of an accurate ranking for top open-source contributors on GitHub. The available lists either cap out early or are highly localized, completely missing people with tens or hundreds of thousands of contributions.

So, I decided to build a true global index: DevIndex. It ranks the top 50,000 most active developers globally based on their lifetime contributions.

But from an engineering perspective, building an index of this scale revealed a massive technical challenge: How do you render, sort, and filter 50,000 data-rich records in a browser without it locking up or crashing?

To make it harder, DevIndex is a Free and Open Source project. I didn't want to pay for a massive database or API server cluster. It had to be a pure "Fat Client" hosted on static GitHub Pages. The entire 50k-record dataset (~23MB of JSON) had to be managed directly in the browser.

We ended up having to break and rewrite our own UI framework (Neo.mjs) to achieve this. Here are the core architectural changes we made to make it possible:

1. Engine-Level Streaming (O(1) Memory Parsing)

You can't download a 23MB JSON file and call JSON.parse() on it without freezing the UI. Instead, we built a Stream Proxy. It fetches the users.jsonl (Newline Delimited JSON) file and uses ReadableStream and TextDecoderStream to parse the data incrementally. As chunks of records arrive, they are instantly pumped into the App Worker and rendered. You can browse the first 500 users instantly while the remaining 49,500 load in the background.

2. Turbo Mode & Virtual Fields (Zero-Overhead Records)

If we instantiated a full Record class for all 50,000 developers, the memory overhead would be catastrophic. We enabled "Turbo Mode", meaning the Store holds onto the raw, minified POJOs exactly as parsed from the stream. To allow the Grid to sort by complex calculated fields (like "Total Commits 2024" which maps to an array index), we generate prototype-based getters on the fly. Adding 60 new year-based data columns to the grid adds 0 bytes of memory overhead to the individual records.

3. The "Fixed-DOM-Order" Grid (Zero-Mutation Scrolling)

Traditional Virtual DOM frameworks struggle with massive lists. Even with virtualization, scrolling fast causes thousands of structural DOM changes (insertBefore, removeChild), triggering severe layout thrashing and Garbage Collection pauses. We rewrote the Grid to use a strict DOM Pool. The VDOM children array and the actual DOM order of the rows never change. If your viewport fits 20 rows, the grid creates exactly 26 Row instances. As you scroll, rows leaving the viewport are simply recycled in place using hardware-accelerated CSS translate3d.

A 60fps vertical scroll across 50,000 records generates 0 insertNode and 0 removeNode commands. It is pure attribute updates.

4. The Quintuple-Threaded Architecture

To keep the UI fluid while sorting 50k records and rendering live "Living Sparkline" charts in the cells, we aggressively split the workload: - Main Thread: Applies minimal DOM updates only. - App Worker: Manages the entire 50k dataset, streaming, sorting, and VDOM generation. - Data Worker: (Offloads heavy array reductions). - VDom Worker: Calculates diffs in parallel. - Canvas Worker: Renders the Sparkline charts independently at 60fps using OffscreenCanvas.

To prove the Main Thread is unblocked, we added a "Performance Theater" effect: when you scroll the grid, the complex 3D header animation intentionally speeds up. The Canvas worker accelerates while the grid scrolls underneath it, proving visually that heavy canvas operations cannot block the scrolling logic.

The Autonomous "Data Factory" Backend

Because the GitHub API doesn't provide "Lifetime Contributions," we built a massive Node.js Data Factory. It features a "Spider" (discovery engine) that uses network graph traversal to find hidden talent, and an "Updater" that fetches historical data.

Privacy & Ethics: We enforce a strict "Opt-Out-First" privacy policy using a novel "Stealth Star" architecture. If a developer doesn't want to be indexed, they simply star a specific repository. The Data Factory detects this cryptographically secure action, instantly purges them, adds them to a blocklist, and encourages them to un-star it. Zero email requests, zero manual intervention.

We released this major rewrite last night as Neo.mjs v12.0.0. The DevIndex backend, the streaming UI, and the complete core engine rewrite were completed in exactly one month by myself and my AI agent.

Because we basically had to invent a new architecture to make this work, we wrote 26 dedicated guides (living right inside the repo) explaining every part of the system—from the Node.js Spider that finds the users, to the math behind the OffscreenCanvas physics, to our Ethical Manifesto on making open-source labor visible.

Check out the live app (and see where you rank!): ? https://neomjs.com/apps/devindex/

Read the architectural deep-dive guides (directly in the app's Learn tab): ? https://neomjs.com/apps/devindex/#/learn

Would love to hear how it performs on different machines or if anyone has tackled similar "Fat Client" scaling issues!

我要点评：	好评(有用) 差评(没用)
评价内容：
	10~500个字