class HClust::DistanceMatrix

Overview

Stores the pairwise distances between the elements of a set.

A distance matrix is a square, hollow, symmetric, two-dimensional matrix of distances. The distances are assumed to form a metric, that is, to satisfy non-negativity, identity of indiscernibles, and the triangle inequality [1]. However, these properties are not checked.

To avoid redundancy, the matrix is stored in condensed form, i.e., as a one-dimensional array of size (n * (n - 1)) // 2 that holds the upper triangular portion of the matrix (excluding the diagonal). The position in the array of the distance between the elements i and j, with i < j, is computed as ((2 * n - 3 - i) * i >> 1) + j - 1. Refer to the Notes section in the SciPy documentation of the squareform function [2]. The condensed form is useful, among other things, for implementing optimized clustering functions.
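The index formula above can be checked with a small standalone sketch. The helper name condensed_index is hypothetical and not part of HClust; it only restates the formula for a 4x4 matrix.

```crystal
# Hypothetical helper (not part of HClust): maps a pair (i, j) with i < j
# of an n x n matrix to its position in the condensed array.
def condensed_index(i : Int32, j : Int32, n : Int32) : Int32
  (((2 * n - 3 - i) * i) >> 1) + j - 1
end

n = 4
# Walk the upper triangle (diagonal excluded) and print each position.
(0...n).each do |i|
  (i + 1...n).each do |j|
    puts "(#{i}, #{j}) -> #{condensed_index(i, j, n)}"
  end
end
# Prints positions 0 through 5 for the pairs
# (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3).
```

The positions are consecutive, so the pairs are laid out row by row with no gaps.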

Example

# 5x5 distance matrix
mat = HClust::DistanceMatrix.new(5) do |i, j|
  # compute distance between elements i and j
  10 * (i + 1) + j + 1
end
mat[0, 0] # => 0.0 (the distance between the same elements is zero)
mat[1, 1] # => 0.0
mat[0, 1] # => 12.0
mat[1, 0] # => 12.0 (symmetry)
mat[2, 3] # => 34.0

Defined in:

hclust/distance.cr

Constructor Detail

def self.from_condensed(values : Array(Float64)) : self

Creates a new DistanceMatrix from the given condensed distance matrix (one-dimensional array). Raises ArgumentError if the given array cannot be interpreted as a condensed matrix (its number of elements is not of the form n * (n - 1) / 2), or Enumerable::EmptyError if it is empty.

NOTE distance values must be valid (non-NaN).
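As a rough illustration of what a valid number of elements means, the sketch below recovers the matrix size n from the length of a condensed array. The helper matrix_size_for is hypothetical and is not how HClust necessarily performs the check; it returns nil where the constructor would raise.

```crystal
# Hypothetical helper (not part of HClust): returns the matrix size n for a
# condensed array of the given length, or nil if no such n exists.
def matrix_size_for(length : Int32) : Int32?
  return nil if length < 1
  # Solve n * (n - 1) / 2 == length for n, then verify with integer math.
  n = ((1 + Math.sqrt(1.0 + 8.0 * length)) / 2).round.to_i
  n * (n - 1) // 2 == length ? n : nil
end

matrix_size_for(6) # => 4
matrix_size_for(5) # => nil (no n satisfies n * (n - 1) / 2 == 5)
```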


def self.new(size : Int32)

Creates a new DistanceMatrix of the given size filled with zeros.


def self.new(elements : Indexable(T), & : T -> Number) : self forall T

Creates a new DistanceMatrix from the given elements by invoking the given block once for each pair of elements, using the block's return value as the distance between the elements.

Raises Enumerable::EmptyError if elements is empty or ArgumentError if any distance value is NaN.

dm = HClust::DistanceMatrix.new([1, 2, 3, 4]) { |a, b| 10 * a + b }
dm.to_a # => [12.0, 13.0, 14.0, 23.0, 24.0, 34.0]

def self.new(size : Int32, & : Int32, Int32 -> Number)

Creates a new DistanceMatrix of the given size and invokes the given block once for each pair of elements (indexes), using the block's return value as the distance between the given elements.

Raises ArgumentError if any distance value is NaN.

HClust::DistanceMatrix.new(5) do |i, j|
  # compute distance between elements i and j
  10 * (i + 1) + j + 1
end


Instance Method Detail

def ==(rhs : self) : Bool

Returns true if the distances of the matrices are equal, else false.


def ==(rhs) : Bool

Returns true if the distances of the matrices are equal, else false.


def [](i : Int, j : Int) : Float64

Returns the distance between the elements at i and j. Raises IndexError if any of the indexes is out of bounds.


def [](indexes : Indexable(Int)) : self

Returns the submatrix containing the distances between the elements at the given indexes. Raises Enumerable::EmptyError if indexes is empty or IndexError if any of the indexes is out of bounds.
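To make the notion of a submatrix concrete, here is a standalone sketch that selects the distances among a subset of elements directly from a condensed array. The helper condensed_submatrix is hypothetical, and the output order (pairs taken in the order of the given indexes) is an assumption rather than documented HClust behavior.

```crystal
# Hypothetical helper (not part of HClust): extracts the condensed submatrix
# for the given element indexes from the condensed array of an n x n matrix.
def condensed_submatrix(values : Array(Float64), n : Int32, indexes : Array(Int32)) : Array(Float64)
  sub = [] of Float64
  indexes.each_with_index do |a, k|
    (k + 1...indexes.size).each do |l|
      b = indexes[l]
      # The condensed form only stores pairs with i < j.
      i = Math.min(a, b)
      j = Math.max(a, b)
      sub << values[(((2 * n - 3 - i) * i) >> 1) + j - 1]
    end
  end
  sub
end

full = [12.0, 13.0, 14.0, 23.0, 24.0, 34.0] # condensed 4x4 matrix
condensed_submatrix(full, 4, [0, 2, 3])     # => [13.0, 14.0, 34.0]
```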


def []=(i : Int, j : Int, value : Float64) : Float64

Sets the distance between the elements at i and j to value. Returns value.

Negative indexes can be used to count from the end of the elements. Raises IndexError if either i or j is out of bounds, or if i == j and value is not zero.


def []?(i : Int, j : Int) : Float64 | Nil

Returns the distance between the elements at i and j, or nil if any of the indexes is out of bounds.
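The behavior of a checked lookup over the condensed form can be sketched as follows. This is an illustration under assumptions, not the library's source: the helper fetch_distance is hypothetical, and it combines the properties described on this page (negative indexes, zero diagonal, symmetry, nil when out of bounds).

```crystal
# Hypothetical helper (not part of HClust): checked lookup of the distance
# between elements i and j in the condensed array of an n x n matrix.
def fetch_distance(values : Array(Float64), n : Int32, i : Int32, j : Int32) : Float64?
  # Negative indexes count from the end.
  i += n if i < 0
  j += n if j < 0
  return nil unless 0 <= i < n && 0 <= j < n
  # The diagonal is not stored: the matrix is hollow.
  return 0.0 if i == j
  # Symmetry: only pairs with i < j are stored.
  i, j = j, i if i > j
  values[(((2 * n - 3 - i) * i) >> 1) + j - 1]
end

full = [12.0, 13.0, 14.0, 23.0, 24.0, 34.0] # condensed 4x4 matrix
fetch_distance(full, 4, 1, 0)  # => 12.0
fetch_distance(full, 4, -1, 0) # => 14.0
fetch_distance(full, 4, 2, 2)  # => 0.0
fetch_distance(full, 4, 4, 0)  # => nil
```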


def []?(indexes : Indexable(Int)) : self | Nil

Returns the submatrix containing the distances between the elements at the given indexes, or nil if indexes is empty or any of the indexes is out of bounds.


def centroid : Int32

Returns the index of the element with the smallest average distance to all others.
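One way to compute such an index from a condensed array is sketched below. The helper centroid_index is hypothetical, and the tie-breaking rule (first minimum wins) is an assumption, not documented HClust behavior.

```crystal
# Hypothetical helper (not part of HClust): index of the element with the
# smallest summed (and hence average) distance to all other elements.
def centroid_index(values : Array(Float64), n : Int32) : Int32
  sums = Array.new(n, 0.0)
  k = 0
  (0...n).each do |i|
    (i + 1...n).each do |j|
      # Each stored distance contributes to both of its elements.
      sums[i] += values[k]
      sums[j] += values[k]
      k += 1
    end
  end
  # Dividing every sum by n - 1 would not change the ordering.
  (0...n).min_by { |i| sums[i] }
end

# Distances: d(0, 1) = 1.0, d(0, 2) = 4.0, d(1, 2) = 2.0
centroid_index([1.0, 4.0, 2.0], 3) # => 1
```

Element 1 has summed distance 3.0, against 5.0 for element 0 and 6.0 for element 2.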


def clone : self

Returns a new DistanceMatrix with the same distances as this matrix (deep copy).


def map(& : Float64 -> Float64) : self

Returns a new DistanceMatrix with the results of running the block against each distance of the matrix.


def map!(& : Float64 -> Float64) : self

Invokes the given block for each distance of the matrix, replacing the distance with the value returned by the block. Returns self.


def matrix_to_condensed_index(row : Int32, col : Int32) : Int32

Returns the condensed matrix index of the distance between the elements at row and col.


def size : Int32

Returns the size of the encoded matrix, i.e., the number of elements n (rows or columns), not the length of the condensed array.


def to_a : Array(Float64)

Returns the condensed distance matrix as an array.


def to_unsafe(row : Int32, col : Int32) : Pointer(Float64)

Returns a pointer into the internal buffer, positioned at the location of the distance between the elements at row and col.


def to_unsafe : Pointer(Float64)

Returns a pointer to the internal buffer.


def unsafe_fetch(i : Int32, j : Int32) : Float64

Returns the distance between the elements at i and j, without doing any bounds check.

This should be called with i and j within 0...size and i != j. Use #[](i, j) and #[]?(i, j) instead for bounds checking and support for negative indexes.

NOTE This method should only be directly invoked if you are absolutely sure i and j are in bounds, to avoid a bounds check for a small boost of performance.


def unsafe_fetch(index : Int) : Float64

Returns the distance at the given index of the condensed distance matrix (one-dimensional), without doing any bounds check.

This should be called with index within 0...((size * (size - 1)) // 2).

NOTE This method should only be directly invoked if you are absolutely sure the index is in bounds, to avoid a bounds check for a small boost of performance.


def unsafe_put(i : Int32, j : Int32, value : Float64) : Float64

Sets the distance between the elements at i and j to value, without doing any bounds check.

This should be called with i and j within 0...size and i != j. Use #[]=(i, j, value) instead for bounds checking and support for negative indexes.

NOTE This method should only be directly invoked if you are absolutely sure i and j are in bounds, to avoid a bounds check for a small boost of performance.


def unsafe_put(index : Int32, value : Float64) : Float64

Sets the distance at the given index of the condensed distance matrix (one-dimensional) to value, without doing any bounds check.

This should be called with index within 0...((size * (size - 1)) // 2).

NOTE This method should only be directly invoked if you are absolutely sure the index is in bounds, to avoid a bounds check for a small boost of performance.

