
How Ramaris Detects Coordinated Wallet Activity

Ramaris uses temporal fingerprinting, funding source analysis, and nonce correlation to identify when multiple wallets belong to the same entity. Learn how our 5-signal sybil detection pipeline transforms raw wallet counts into real entity estimates.

By Ramaris Team • February 23, 2026 • 9 min read • Updated February 23, 2026

Quick Answer: When Ramaris detects an accumulation cluster — 3 or more wallets buying the same token within a 72-hour window — it runs a 5-signal sybil detection pipeline to estimate how many real, independent entities are behind those wallets. The result transforms “13 wallets accumulating cbBTC” into “approximately 3 entities controlling 13 wallets,” which is a fundamentally different piece of intelligence.

TL;DR:

Raw wallet counts in accumulation clusters can be misleading — one entity can operate dozens of wallets
Ramaris runs a 5-signal pipeline (temporal fingerprinting, funding source analysis, nonce correlation, gas station detection, contract overlap) to estimate real entity counts
Wallet pairs scoring above 0.60 composite are merged into the same entity using a Union-Find algorithm
The pipeline refreshes every 15 minutes for active clusters
Sybil entity data is available in the Agent API and on ULTRA plan dashboards

The Problem With Raw Wallet Counts

Accumulation clusters are only useful if you know how many real entities are behind them.

“13 wallets accumulating cbBTC” could mean two very different things:

One entity running 13 wallets, building a large position while obscuring its true size
13 independent traders who each found the same opportunity on their own

The difference is significant. Scenario one suggests a single actor with a specific thesis and target position. Scenario two suggests organic, distributed conviction from multiple independent sources — a stronger signal by most analytical frameworks.

Raw wallet counts can’t distinguish these cases. An entity running a coordinated accumulation strategy has strong incentives to split across multiple wallets: lower visibility, reduced slippage on entry, and exit timing that is harder to track. Sophisticated operators on Base routinely use 5-20 wallets for a single coordinated position.

Sybil detection exists to answer the question the raw count cannot: how many genuinely independent actors are in this cluster?

The 5-Signal Detection Pipeline

Ramaris analyzes five distinct behavioral signals for every wallet pair in a cluster. Each signal produces a score between 0.0 and 1.0. The signals are combined into a composite score using weighted averaging, and wallet pairs with composite scores above 0.60 are merged into the same entity.

1. Gas Station Detection (weight: 0.30)

The highest-weighted signal. A “gas station” wallet is an address that distributes small amounts of ETH to multiple wallets within a short window before they all start trading — the classic batch wallet creation pattern.

The detector looks back 24 hours from a cluster’s first buy timestamp and checks whether any sender funded multiple cluster member wallets in that window. When the gas amounts are identical (for example, exactly 0.005 ETH to each wallet), confidence approaches certainty.

Legitimate independent traders do not receive ETH from the same address minutes before buying the same token. The false positive rate for this signal is near zero when the funding window is tight.
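The lookback check described above can be sketched in a few lines. This is a minimal illustration, not Ramaris's implementation: the function name find_gas_stations, the transfer tuple layout (sender, recipient, amount in wei, Unix timestamp), and the choice to flag any sender that funded two or more members are all assumptions made for the example.

```python
from collections import defaultdict

LOOKBACK_SECONDS = 24 * 60 * 60  # 24-hour window before the cluster's first buy


def find_gas_stations(transfers, cluster_wallets, first_buy_ts):
    """Find senders that funded 2+ cluster members inside the lookback window.

    `transfers` is an iterable of (sender, recipient, amount_wei, timestamp)
    tuples. This layout is an assumption made for the sketch.
    """
    members = set(cluster_wallets)
    window_start = first_buy_ts - LOOKBACK_SECONDS
    funded = defaultdict(dict)  # sender -> {recipient: amount_wei}
    for sender, recipient, amount_wei, ts in transfers:
        if window_start <= ts < first_buy_ts and recipient in members:
            funded[sender][recipient] = amount_wei

    stations = {}
    for sender, by_wallet in funded.items():
        if len(by_wallet) >= 2:
            stations[sender] = {
                "wallets": sorted(by_wallet),
                # Identical amounts are the strongest form of the pattern.
                "identical_amounts": len(set(by_wallet.values())) == 1,
            }
    return stations
```

Amounts are kept as integer wei so that the identical-amount comparison does not depend on floating-point rounding.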

2. Funding Source Analysis (weight: 0.25)

Traces where each wallet’s initial ETH came from by looking up the first incoming ETH transfer to each address. Wallets funded by the same non-CEX address are highly likely to be the same entity.

The key filter here is CEX hot wallets. Coinbase, Binance, OKX, and Bybit withdrawal addresses fund enormous numbers of unrelated wallets — pairing wallets because they both withdrew from Coinbase would generate false positives constantly. Known CEX hot wallet addresses are filtered before scoring, so only shared funding from personal or operational addresses counts as a signal.

Funding source data is cached permanently per wallet. Once looked up, it does not need to be re-fetched.
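A rough sketch of the CEX filter and the per-wallet cache follows. The article does not publish the scoring scale for this signal, so the binary 1.0 / 0.0 result is an assumption, as are the names first_funder and funding_score, the placeholder address in CEX_HOT_WALLETS, and the caller-supplied lookup function.

```python
# Hypothetical placeholder. A real list would hold known exchange
# withdrawal addresses.
CEX_HOT_WALLETS = {"0xcex"}

_funder_cache = {}  # wallet -> first funder; never invalidated in this sketch


def first_funder(wallet, lookup):
    """Return the sender of the wallet's first incoming ETH transfer.

    `lookup` is an assumed caller-supplied function that queries an indexer.
    It is called at most once per wallet because the result is cached.
    """
    if wallet not in _funder_cache:
        _funder_cache[wallet] = lookup(wallet)
    return _funder_cache[wallet]


def funding_score(funder_a, funder_b, cex_wallets=CEX_HOT_WALLETS):
    """Score a wallet pair on shared funding (assumed binary scale)."""
    if funder_a is None or funder_b is None:
        return None  # data not fetched yet, so the signal is unavailable
    if funder_a in cex_wallets or funder_b in cex_wallets:
        return 0.0  # shared exchange withdrawals carry no information
    return 1.0 if funder_a == funder_b else 0.0
```

Returning None rather than 0.0 for missing data keeps "not yet fetched" distinct from "fetched and unrelated".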

3. Temporal Fingerprinting (weight: 0.15)

Analyzes timing patterns between wallet buy transactions within the same cluster. Wallets controlled by the same automated system trade with characteristic timing signatures: same block or consecutive blocks, fixed-delay intervals, or burst patterns where all trades land within seconds.

Four metrics feed into the temporal score:

minDelta: the smallest time gap between any two swaps from this wallet pair
avgDelta: average inter-swap spacing
burstCount: number of swap pairs that occurred within 30 seconds of each other
Regularity coefficient of variation: low CV across multiple swaps indicates bot-like regular intervals

A wallet pair trading within 30 seconds of each other scores 1.0 on the delta component. A pair trading more than 2 minutes apart on average scores 0.0. Temporal fingerprinting is the cheapest signal to compute — it runs on existing swap timestamp data with no external calls required.
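As an illustration of how these four metrics might be computed from swap timestamps, here is a minimal sketch. Several details are assumptions, since the article does not specify them: gaps are measured from each swap of wallet A to the nearest swap of wallet B, the coefficient of variation is taken over those gaps, and the delta score falls linearly between the 30-second and 2-minute thresholds.

```python
from itertools import product
from statistics import mean, pstdev

BURST_WINDOW_S = 30    # pairs of swaps this close count as a burst
MAX_AVG_DELTA_S = 120  # average gaps beyond this score 0.0


def temporal_metrics(swaps_a, swaps_b):
    """Timing metrics for a wallet pair; inputs are non-empty lists of Unix timestamps."""
    # Gap from each swap of A to the closest swap of B (assumed definition).
    nearest = [min(abs(a - b) for b in swaps_b) for a in swaps_a]
    avg = mean(nearest)
    return {
        "minDelta": min(nearest),
        "avgDelta": avg,
        "burstCount": sum(
            abs(a - b) <= BURST_WINDOW_S for a, b in product(swaps_a, swaps_b)
        ),
        # Low CV means the delay between the two wallets is nearly constant.
        "cv": pstdev(nearest) / avg if avg else 0.0,
    }


def delta_score(avg_delta):
    """1.0 at or under 30 s, 0.0 at or over 2 min, linear in between (assumed)."""
    if avg_delta <= BURST_WINDOW_S:
        return 1.0
    if avg_delta >= MAX_AVG_DELTA_S:
        return 0.0
    return (MAX_AVG_DELTA_S - avg_delta) / (MAX_AVG_DELTA_S - BURST_WINDOW_S)
```

For example, two wallets that each swap three times, with wallet B always trailing wallet A by 10 seconds, yield a minDelta of 10, a burstCount of 3, and a CV of 0.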

4. Nonce Correlation (weight: 0.15)

Examines transaction counts for each wallet. A wallet’s nonce (the number of transactions it has sent) is a proxy for age and reuse. Freshly created wallets with nonces below 5 appearing together in the same cluster are significantly more suspicious than wallets with hundreds of transactions.

The scoring is direct: two wallets in the same cluster that both have nonces below 5 score 0.8. Two wallets both created within the past 7 days score 0.4. Established wallets with high nonces score 0.0 on this signal, meaning the signal does not penalize genuine accumulation by long-running wallets.

Nonce data is fetched via batched eth_getTransactionCount RPC calls and cached with a 7-day refresh window.
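The scoring rule and the batched lookup can be sketched as follows. The eth_getTransactionCount method and JSON-RPC batch format are standard. The helper names, the use of the "latest" block tag, and the wallet-age inputs in days are assumptions, because the article does not say how wallet age is obtained.

```python
def build_nonce_batch(addresses):
    """Build a JSON-RPC batch body: one eth_getTransactionCount call per address."""
    return [
        {
            "jsonrpc": "2.0",
            "id": i,
            "method": "eth_getTransactionCount",
            "params": [address, "latest"],
        }
        for i, address in enumerate(addresses)
    ]


def parse_nonce(result_hex):
    """The RPC returns a hex quantity such as '0x1a'."""
    return int(result_hex, 16)


def nonce_score(nonce_a, nonce_b, age_days_a, age_days_b):
    """Apply the thresholds from the article to a wallet pair."""
    if nonce_a < 5 and nonce_b < 5:
        return 0.8  # both wallets are nearly unused
    if age_days_a <= 7 and age_days_b <= 7:
        return 0.4  # both wallets were created within the past week
    return 0.0      # established wallets are not penalized
```

The batch body would be sent as a single POST to an RPC endpoint, and each entry in the response array is matched back to its address by id.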

5. Contract Interaction Overlap (weight: 0.15)

Computes Jaccard similarity on the set of contract addresses each wallet has interacted with across its history. Wallets operated by the same entity tend to use the same DEX routers, the same bridges, the same approval targets, and the same set of protocols.

The Jaccard formula is straightforward: the size of the intersection divided by the size of the union of two wallets’ contract sets. A score above 0.70 is treated as meaningful. Universal contracts — WETH, common DEX routers used by virtually every wallet — are filtered out before computation to avoid false positives from wallets that merely share mainstream DeFi tooling.

Contract overlap is the most computationally expensive signal and serves primarily as a tiebreaker when other signals are ambiguous.
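The Jaccard computation with the universal-contract filter can be sketched like this. The function name contract_overlap and the universal parameter are illustrative, and the decision to return 0.0 when both filtered sets are empty is an assumption.

```python
def contract_overlap(contracts_a, contracts_b, universal=frozenset()):
    """Jaccard similarity of two contract sets, ignoring universal contracts."""
    a = set(contracts_a) - universal
    b = set(contracts_b) - universal
    union = a | b
    if not union:
        return 0.0  # nothing left to compare after filtering (assumed behavior)
    return len(a & b) / len(union)
```

Two wallets that share a router and a bridge but each use one protocol the other does not score 2 / 4 = 0.5 once WETH is filtered out, which is below the 0.70 level the article treats as meaningful.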

Entity Grouping: How Pairs Become Clusters

Computing pairwise scores across all wallets in a cluster produces a similarity graph. The next step is grouping that graph into entities.

Ramaris uses a Union-Find (Disjoint Set Union) data structure for this. The algorithm is simple:

Compute composite scores for all wallet pairs in the cluster
For any pair with a composite score above 0.60, merge them into the same entity
Count the resulting connected components

Each connected component is one estimated entity. The estimatedEntities field on the clusters API is this component count.
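The three steps above can be sketched with a small Union-Find. This is a generic implementation of the technique, not Ramaris's code. The function name group_entities and the input shape (a dict mapping wallet pairs to composite scores) are assumptions.

```python
def group_entities(wallets, pair_scores, threshold=0.60):
    """Merge wallet pairs scoring above the threshold; return entity groups."""
    parent = {w: w for w in wallets}

    def find(w):
        # Walk up to the root, halving the path along the way.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for (a, b), score in pair_scores.items():
        if score > threshold:
            parent[find(a)] = find(b)  # union the two components

    groups = {}
    for w in wallets:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())
```

The number of groups returned corresponds to the component count. Because merging is transitive, wallets 1 and 3 end up in the same entity when pairs (1, 2) and (2, 3) both clear the threshold, even if pair (1, 3) does not.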

The weights auto-normalize based on which signals are available. If funding source data has not yet been fetched for some wallets, the composite is computed from the available signals only, with weights re-normalized to sum to 1.0. This means a cluster can return a useful entity estimate even before all signals have been collected.
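A minimal sketch of the weighted composite with re-normalization is shown below. The weights are the ones listed in the signal sections and the signal keys follow the signalsUsed names in the API response. The function name and the use of None to mark a signal that has not been collected are assumptions.

```python
WEIGHTS = {
    "gasStation": 0.30,
    "funding": 0.25,
    "temporal": 0.15,
    "nonce": 0.15,
    "contractOverlap": 0.15,
}


def composite_score(signals):
    """Weighted average over the signals that are present (not None)."""
    available = {name: s for name, s in signals.items() if s is not None}
    total_weight = sum(WEIGHTS[name] for name in available)
    if total_weight == 0:
        return None  # no signals collected yet
    weighted = sum(WEIGHTS[name] * s for name, s in available.items())
    # Dividing by the available weight re-normalizes the weights to sum to 1.0.
    return weighted / total_weight
```

With funding data missing, a pair that scores 1.0 on gas station and temporal and 0.0 on the other two signals gets (0.30 + 0.15) / 0.75 = 0.60 rather than 0.45, so a missing signal neither helps nor hurts the pair.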

Confidence levels are derived from the maximum composite score seen across all pairs:

high: at least one pair scored above 0.80
medium: the highest pair score is above 0.60 but not above 0.80
low: no pair scored above 0.60
unknown: insufficient data to score
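The mapping from the maximum pair score to a confidence level can be written as a short function. This sketch assumes that a score of exactly 0.60 or 0.80 falls into the lower band and that an empty score list means there was insufficient data.

```python
def confidence_level(pair_scores):
    """Map the highest composite pair score to a confidence label."""
    if not pair_scores:
        return "unknown"  # nothing could be scored
    top = max(pair_scores)
    if top > 0.80:
        return "high"
    if top > 0.60:
        return "medium"
    return "low"
```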

The analysis pipeline runs as a background job every 15 minutes for all active ACCUMULATING clusters, and also triggers immediately when a new cluster is created.

A Worked Example

A cluster appears: 13 wallets buying cbBTC within a 48-hour window, $45,000 total volume.

The sybil pipeline runs and finds:

Wallets 1, 2, 5, 8, 9, 10, 11, and 12 were all funded by the same address within a 2-hour window before the first buy. Five of them also traded within 30-second windows of each other. Composite scores for pairs within this group exceed 0.85. They merge into Entity 1.
Wallets 3, 4, and 6 share the same funding source (different from Entity 1). No temporal correlation. Composite scores around 0.65. They merge into Entity 2.
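Wallets 7 and 13 have funding sources unrelated to either group, high nonces, and no temporal overlap with them. They score below 0.60 against every wallet in Entity 1 and Entity 2, so neither is merged into those groups; the API response reports the two together as the third group.

Result: 13 wallets, 3 estimated entities, confidence high.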

The Agent API response for this cluster:

{
  "data": [{
    "id": 1,
    "token": { "address": "0x...", "symbol": "cbBTC" },
    "status": "ACCUMULATING",
    "walletCount": 13,
    "totalUsdVolume": 45000,
    "sybil": {
      "estimatedEntities": 3,
      "confidence": 0.82,
      "entityGroups": [[1,2,5,8,9,10,11,12], [3,4,6], [7,13]],
      "signalsUsed": ["temporal", "funding", "nonce", "gasStation", "contractOverlap"],
      "analyzedAt": "2026-02-23T12:00:00Z"
    }
  }]
}

The same cluster — 13 wallets, $45,000 volume — reads very differently once you know it represents 3 entities rather than 13. The largest entity controls 8 wallets and was funded from a single source. That is a concentrated, operationally coordinated position, not broad organic accumulation.

Why This Matters for Agent Integrations

AI agents querying the Agent API receive sybil entity data in every clusters response. The endpoint is GET https://api.ramaris.app/agent/v1/clusters and each request costs $0.05 via x402 micropayment — no subscription required, no API key setup.

The entity-level data transforms what agents can reason about. An agent making decisions based on raw wallet counts is working with potentially misleading inputs. An agent that can compare walletCount: 13 against estimatedEntities: 3 can ask the right question: is this genuine distributed accumulation, or one actor spreading a position across wallets?
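As a sketch of that comparison, an agent could reduce each cluster object to a few concentration figures. The field names come from the sample response above. The function name summarize_cluster and the derived metrics are illustrative choices, not part of the API.

```python
def summarize_cluster(cluster):
    """Derive simple concentration figures from one clusters-response item."""
    sybil = cluster["sybil"]
    wallet_count = cluster["walletCount"]
    largest_group = max(sybil["entityGroups"], key=len)
    return {
        "walletCount": wallet_count,
        "estimatedEntities": sybil["estimatedEntities"],
        # Average number of wallets behind each estimated entity.
        "walletsPerEntity": wallet_count / sybil["estimatedEntities"],
        # Share of the cluster's wallets controlled by the largest entity.
        "largestGroupShare": len(largest_group) / wallet_count,
    }


sample = {
    "walletCount": 13,
    "sybil": {
        "estimatedEntities": 3,
        "entityGroups": [[1, 2, 5, 8, 9, 10, 11, 12], [3, 4, 6], [7, 13]],
    },
}
summary = summarize_cluster(sample)
```

For the sample cluster, the largest group holds 8 of 13 wallets, which is the kind of concentration an agent would weigh before treating the cluster as broad accumulation.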

This distinction is particularly relevant for agents tracking:

Token accumulation before protocol events
Whale positioning across multiple addresses
Patterns that appear organic but are operationally centralized

For the full API schema, pass https://www.ramaris.app/llms-full.txt to your AI assistant of choice.

Availability

Sybil detection data is available on:

ULTRA plan: Entity estimates visible directly in the cluster view on the Ramaris dashboard
Agent API: Included in all /agent/v1/clusters responses at $0.05/request — no plan required
FREE and PRO plans: Raw wallet counts only (no entity estimates)

Cluster detection itself — identifying that accumulation is occurring — is available on all plans. Sybil entity analysis is an ULTRA and Agent API feature.

Explore accumulation clusters at Browse Strategies or get started with the Agent API at https://api.ramaris.app/agent/v1/clusters. The Base Smart Money Playbook includes a dedicated chapter on sybil detection methodology and how entity estimates change the way you interpret accumulation clusters.

For informational purposes only. Not financial advice. On-chain data reflects historical wallet activity and does not indicate future performance. Entity estimates are probabilistic heuristics, not certainties. Always conduct your own research (DYOR) before making any financial decisions.
