sampling

package module
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 22, 2025 License: Apache-2.0 Imports: 2 Imported by: 0

README

NOTE: This file is generated with the use of AI.

sampling-go

A high-performance, hash-based sampling library for Go that provides deterministic, stateless sampling without external dependencies.

Overview

sampling-go is a generic sampling library that uses consistent hashing to make deterministic decisions about whether to include or exclude items from a sample.

It's particularly useful for:

  • Rate limiting: Reduce load on downstream services by sampling a percentage of requests
  • Logging and metrics: Sample log entries or metrics to reduce storage costs
  • A/B testing: Consistently assign users to test groups
  • Data processing: Sample large datasets for analysis
  • Distributed systems: Make consistent sampling decisions across multiple service instances

Note: This library provides deterministic sampling based on hash values. The quality of distribution depends on the hash function and input characteristics. For cryptographic use cases, consider using cryptographically secure hash functions.

Key Features
  • Deterministic: Same input always produces the same sampling decision
  • Stateless: No external storage or coordination required
  • High performance: All operations are in-memory with minimal overhead
  • Generic: Works with any Go type through generics
  • Configurable: Flexible hash functions and sampling rates
  • Zero dependencies: No third-party dependencies for core functionality

Installation

go get github.com/block/sampling-go

Quick Start

package main

import (
    "fmt"
    "github.com/block/sampling-go"
    "github.com/block/sampling-go/hashing"
)

func main() {
    // Create a sampler for strings with 30% sampling rate
    sampler := sampling.NewSampler[string](
        sampling.WithHashFunc(hashing.DefaultStringHasher),
        sampling.WithStaticRate[string](0.3), // 30% sampling rate
    )

    messages := []string{
        "user-123-login",
        "user-456-purchase", 
        "user-789-logout",
        "user-123-login", // Same message - will have same result
    }

    for _, msg := range messages {
        if sampler.Sample(msg) {
            fmt.Printf("✓ Sampled: %s\n", msg)
        } else {
            fmt.Printf("✗ Skipped: %s\n", msg)
        }
    }
}

How It Works

The library uses a hash-based approach to sampling:

  1. Hash Generation: Each input is converted to a uint32 hash value using a configurable hash function
  2. Threshold Calculation: The sampling rate is converted to a "max hash" threshold
  3. Comparison: If hash ≤ maxHash, the item is included in the sample
  4. Consistency: Same input always produces the same hash, ensuring deterministic behavior
Visual Representation

For a 5% sampling rate: Max Hash = max(uint32) * 0.05 = 4,294,967,295 * 0.05 = 214,748,364

0             214,748,364                                 4,294,967,295
|-----------------|------------- ... ---------------------------|
| 1 | 2 | 3 | 4 | 5 |            ...                 | 99 | 100 |    <-- bucket (metrics, logging) 
^^^^^^^^^^^^^^^^^^                                               
Items in this area ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
are INCLUDED (5%)               Items in this area  
                                are EXCLUDED (95%)
                  ^
                  |
               Max Hash

Usage Examples

Basic String Sampling
sampler := sampling.NewSampler[string](
    sampling.WithHashFunc(hashing.DefaultStringHasher),
    sampling.WithStaticRate[string](0.1), // 10% sampling
)

result := sampler.Sample("user-request-123")
// Returns true/false consistently for the same input
Sampling with Metadata
sampler := sampling.NewSampler[string](
    sampling.WithHashFunc(hashing.DefaultStringHasher),
    sampling.WithStaticRate[string](0.2),
)

shouldTake, meta := sampler.SampleWithMeta("user-action-456")

fmt.Printf("Take: %v\n", shouldTake)
fmt.Printf("Hash: %d\n", meta.Hash)
fmt.Printf("Bucket: %d\n", meta.Bucket)
fmt.Printf("Valid: %v\n", meta.IsHashValid)
Custom Struct Sampling
type Event struct {
    ID       string
    UserID   string
    Type     string
    Priority int
}

// Custom hash function
func eventHasher(event Event) (uint32, bool) {
    if event.ID == "" {
        return 0, false // Invalid event
    }
    composite := fmt.Sprintf("%s:%s:%s", event.ID, event.UserID, event.Type)
    return crc32.ChecksumIEEE([]byte(composite)), true
}

// Priority-based sampling rates
func priorityMaxHash(event Event) uint32 {
    switch {
    case event.Priority >= 9:
        return hashing.ToMaxHash(1.0, 1.0) // 100% for critical
    case event.Priority >= 7:
        return hashing.ToMaxHash(0.8, 1.0) // 80% for high
    default:
        return hashing.ToMaxHash(0.1, 1.0) // 10% for normal
    }
}

sampler := sampling.NewSampler[Event](
    sampling.WithHashFunc(eventHasher),
    sampling.WithMaxHashFunc(priorityMaxHash),
)
Request Rate Limiting
type Request struct {
    RequestID string
    Method    string
    Path      string
    UserAgent string
}

func requestHasher(req Request) (uint32, bool) {
    if req.RequestID == "" {
        return 0, false
    }
    return crc32.ChecksumIEEE([]byte(req.RequestID)), true
}

// Sample 30% of requests to reduce downstream load
sampler := sampling.NewSampler[Request](
    sampling.WithHashFunc(requestHasher),
    sampling.WithStaticRate[Request](0.3),
)

func handleRequest(req Request) {
    if sampler.Sample(req) {
        // Forward to expensive downstream service
        processRequest(req)
    } else {
        // Log and skip
        log.Printf("Skipped request %s", req.RequestID)
    }
}

Configuration Options

Hash Functions
  • hashing.DefaultStringHasher: CRC32 hash for strings
  • hashing.DefaultBytesHasher: CRC32 hash for byte slices
  • hashing.ZeroHash[T](): Always returns 0, false (for testing)
  • Custom functions implementing hashing.HashFunc[T]
Sampling Rates
  • WithStaticRate[T](rate): Fixed sampling rate (0.0 to 1.0)
  • WithMaxHashFunc[T](func): Dynamic sampling based on input
  • Default: 100% sampling rate
Additional Options
  • WithSkipInvalid[T](): Skip items that can't be hashed (default: include them)

API Reference

Core Types
type Sampler[T any] interface {
    Sample(T) bool
    SampleWithMeta(T) (bool, Meta)
}

type Meta struct {
    IsHashValid bool   // Whether hash was successfully generated
    Bucket      int32  // Hash bucket (1-100) for distribution analysis
    Hash        uint32 // Generated hash value
    MaxHash     uint32 // Sampling threshold
}
Constructor
func NewSampler[T any](opts ...SamplerOption[T]) *sampler[T]
Configuration Options
func WithHashFunc[T any](hasher hashing.HashFunc[T]) SamplerOption[T]
func WithMaxHashFunc[T any](maxHash hashing.MaxHashFunc[T]) SamplerOption[T]
func WithStaticRate[T any](rate float64) SamplerOption[T]
func WithSkipInvalid[T any]() SamplerOption[T]

Examples

See the examples/ directory for comprehensive examples:

  • Basic usage: Simple string sampling
  • Configuration options: Various sampling configurations
  • Custom structs: Advanced patterns with business logic
  • Request limiting: Real-world rate limiting scenario

Run examples:

cd examples
go run .
Development Setup
  1. Clone the repository:

    git clone https://github.com/block/sampling-go.git
    cd sampling-go
    
  2. Install dependencies:

    go mod download
    
  3. Run tests:

    go test ./...
    

License

LICENSE

Acknowledgments

  • Built with ❤️ by the Block team
  • Inspired by consistent hashing algorithms used in distributed systems
  • Uses CRC32 hashing for fast, deterministic hash generation

Support

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewSampler

func NewSampler[T any](opts ...SamplerOption[T]) *sampler[T]

NewSampler creates a new Sampler instance with the provided configuration options. If no hash function is provided, it defaults to ZeroHash (always returns 0, false). If no max hash function is provided, it defaults to 100% sampling rate.

Example usage:

sampler := NewSampler[string](
    WithHashFunc(hashing.DefaultStringHasher),
    WithStaticRate(0.1), // 10% sampling rate
    WithSkipInvalid(),
)

Types

type Meta

type Meta struct {
	// IsHashValid indicates whether the hash function successfully generated
	// a valid hash for the input value. False for empty/zero values.
	IsHashValid bool

	// Bucket is a number from 1 to 100 which represents even "chunks" of the uint32 keyspace.
	// Bucket helps to make sure that the "hash" generated by Sample is evenly distributed.
	// Ideally, if a value used for sampling is evenly distributed (or random),
	// Bucket should be evenly distributed too.
	// "bucket" with number 0 is reserved and should not be used.
	Bucket int32

	// Hash is the uint32 hash value generated for the input.
	// This value is compared against MaxHash to determine sampling decision.
	Hash uint32

	// MaxHash is the threshold value computed from the sampling rate.
	// Values with hash <= MaxHash are included in the sample.
	MaxHash uint32
}

Meta contains metadata about a sampling operation, providing insights into the internal calculations and decisions made during sampling.

type Sampler

type Sampler[T any] interface {
	// Sample returns "true" for objects that "survived" sampling
	// and should be taken and "false" for object that should be skipped.
	Sample(T) bool

	// SampleWithMeta returns "true" (take) or "false" (skip), just like the call to Sample.
	// Additionally, SampleWithMeta return Meta struct with the internals of the sampling call.
	// Meta can be used for metrics, logging, debugging, etc.
	SampleWithMeta(T) (bool, Meta)
}

Sampler is a hash-based in-memory sampling.

When a message is "sampled" - the algorithm decides if the message should be "taken" (true) or "skipped" (false).

Sampler does not have any dependencies and does not require a third-party storage. All operations are performed locally.

On the high level, Sample does the following: * it uses uint32 number as a "full" keyspace: [0, max(uint32)]. * clients set "sampling rate", i.e. "0.05" is 5%, "0.5" is 50%. * "sampling rate" must be between 0.0 and 1.0 * "sampling rate" is converted to a "max-hash" number, e.g. "0.05" becomes "214,748,364" * Sample generates a "hash" (uint32 number) for each message being sampled * after, the hash is compared with the "max-hash" * if "hash" is less than "max-hash" - the message added to the result, if not - skipped.

Here's an illustration of how it all works:

If Sampling rate is 5% (0.05) Then, the "max hash" is: 214,748,364

0 214,748,364 4,294,967,295 |-----------------|------------- ... ---------------------------| | 1 | 2 | 3 | 4 | 5 | ... | 99 | 100 | <- bucket

///////////////// \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
       ^                                     ^
       |                                     |
if the hash (number)                   if the hash (number)
is in this area,                       is in this area,
call to Sample() will return           call to Sample() will return
"true" (should take)                   "false" (should skip)

type SamplerOption

type SamplerOption[T any] func(c *sampler[T])

SamplerOption is a functional option type for configuring a Sampler instance. It follows the functional options pattern to provide flexible configuration.

func WithHashFunc

func WithHashFunc[T any](hasher hashing.HashFunc[T]) SamplerOption[T]

WithHashFunc configures the sampler to use a custom hash function. The hash function determines how input values are converted to uint32 hashes. See: hashing.HashFunc

func WithMaxHashFunc

func WithMaxHashFunc[T any](maxHash hashing.MaxHashFunc[T]) SamplerOption[T]

WithMaxHashFunc configures the sampler to use a custom max hash function. The max hash function determines the sampling threshold for each input. See: hashing.MaxHashFunc

func WithSkipInvalid

func WithSkipInvalid[T any]() SamplerOption[T]

WithSkipInvalid configures the sampler to skip inputs that cannot be hashed (e.g., empty strings, nil values). By default, invalid inputs are included, e.g. a call to Sample() returns "true".

func WithStaticRate

func WithStaticRate[T any](rate float64) SamplerOption[T]

WithStaticRate configures the sampler with a fixed sampling rate. The rate should be between 0.0 (0%) and 1.0 (100%). This creates a static max hash function that applies the same rate to all inputs.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL