class HClust::DistanceMatrix
- HClust::DistanceMatrix
- Reference
- Object
Overview
Stores the pairwise distances between the elements of a set.
A distance matrix is a square, hollow, symmetric, two-dimensional matrix of distances. The latter are assumed to be a metric, which is defined by the properties of non-negativity, identity of indiscernibles, and triangle inequality [1]. However, these properties are not checked.
To avoid redundancy, the matrix is stored in the condensed form, i.e., a one-dimensional array of size (n * (n - 1)) // 2 that holds the upper triangular portion of the matrix. The position of the distance between the elements i and j in the array is then computed as ((2 * n - 3 - i) * i >> 1) + j - 1 with i < j. Refer to the Notes section in the SciPy documentation of the squareform function [2].
The condensed form is useful for implementing optimized clustering functions, among other things.
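As an illustration of the index formula above, here is a minimal, self-contained sketch; it is not the library's implementation. The helper name condensed_index is hypothetical, and it swaps i and j when needed so that the i < j precondition holds.

```crystal
# Hypothetical helper (not part of HClust): maps the pair (i, j) of an
# n x n distance matrix to its position in the condensed array.
# Assumes i != j, since the diagonal is not stored.
def condensed_index(n, i, j)
  i, j = j, i if i > j # symmetry: only the i < j half is stored
  ((2 * n - 3 - i) * i >> 1) + j - 1
end

# Walk the upper triangle of a 4x4 matrix row by row.
n = 4
(0...n).each do |i|
  ((i + 1)...n).each do |j|
    puts "(#{i}, #{j}) -> #{condensed_index(n, i, j)}"
  end
end
```

For n = 4 the loop visits the six stored positions 0 through 5 in order, which shows that the condensed array is simply the upper triangle laid out row by row.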
Example
# 5x5 distance matrix
mat = HClust::DistanceMatrix.new(5) do |i, j|
  # compute distance between elements i and j
  10 * (i + 1) + j + 1
end
mat[0, 0] # => 0.0 (the distance between the same elements is zero)
mat[1, 1] # => 0.0
mat[0, 1] # => 12.0
mat[1, 0] # => 12.0 (symmetry)
mat[2, 3] # => 34.0
Defined in:
hclust/distance.cr
Constructors
-
.from_condensed(values : Array(Float64)) : self
Creates a new DistanceMatrix from the given condensed distance matrix (one-dimensional array).
-
.new(size : Int32)
Creates a new DistanceMatrix of the given size filled with zeros.
-
.new(elements : Indexable(T), & : T -> Number) : self forall T
Creates a new DistanceMatrix from the given elements by invoking the given block once for each pair of elements, using the block's return value as the distance between the elements.
-
.new(size : Int32, & : Int32, Int32 -> Number)
Creates a new DistanceMatrix of the given size and invokes the given block once for each pair of elements (indexes), using the block's return value as the distance between the given elements.
Instance Method Summary
-
#==(rhs : self) : Bool
Returns true if the distances of the matrices are equal, else false.
-
#==(rhs) : Bool
Returns true if the distances of the matrices are equal, else false.
-
#[](i : Int, j : Int) : Float64
Returns the distance between the elements at i and j.
-
#[](indexes : Indexable(Int)) : self
Returns the submatrix containing the distances between the elements at the given indexes.
-
#[]=(i : Int, j : Int, value : Float64) : Float64
Sets the distance between the elements at i and j to value.
-
#[]?(i : Int, j : Int) : Float64 | Nil
Returns the distance between the elements at i and j, or nil if any of the indexes is out of bounds.
-
#[]?(indexes : Indexable(Int)) : self | Nil
Returns the submatrix containing the distances between the elements at the given indexes, or nil if indexes is empty or any of the indexes is out of bounds.
-
#centroid : Int32
Returns the index of the element with the smallest average distance to all others.
-
#clone : self
Returns a new DistanceMatrix with the same elements as the matrix (deep copy).
-
#map(& : Float64 -> Float64) : self
Returns a new DistanceMatrix with the results of running the block against each element of the matrix.
-
#map!(& : Float64 -> Float64) : self
Invokes the given block for each element of the distance matrix, replacing the element with the value returned by the block.
-
#matrix_to_condensed_index(row : Int32, col : Int32) : Int32
Returns the condensed matrix index of the distance between the elements at row and col.
-
#size : Int32
Returns the size of the encoded matrix.
-
#to_a : Array(Float64)
Returns the condensed distance matrix as an array.
-
#to_unsafe(row : Int32, col : Int32) : Pointer(Float64)
Returns a pointer to the internal buffer placed at the specified location.
-
#to_unsafe : Pointer(Float64)
Returns a pointer to the internal buffer.
-
#unsafe_fetch(i : Int32, j : Int32) : Float64
Returns the distance between the elements at i and j, without doing any bounds check.
-
#unsafe_fetch(index : Int) : Float64
Returns the distance at the given index of the condensed distance matrix (one-dimensional), without doing any bounds check.
-
#unsafe_put(i : Int32, j : Int32, value : Float64) : Float64
Sets the distance between the elements at i and j to value, without doing any bounds check.
-
#unsafe_put(index : Int32, value : Float64) : Float64
Sets the distance at the given index of the condensed distance matrix (one-dimensional) to value, without doing any bounds check.
Constructor Detail
.from_condensed(values : Array(Float64)) : self
Creates a new DistanceMatrix from the given condensed distance matrix (one-dimensional array). Raises ArgumentError if the given array cannot be interpreted as a condensed matrix (it contains an invalid number of elements) or Enumerable::EmptyError if it's empty.
NOTE Distance values must be valid (non-NaN).
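To make the "invalid number of elements" condition of from_condensed concrete: a condensed array is only interpretable if its length equals n * (n - 1) / 2 for some matrix size n. The following sketch shows one way such a check could look. The helper matrix_size_for is hypothetical, and the library may validate its input differently.

```crystal
# Hypothetical helper (not part of HClust): returns the matrix size n
# for a condensed array of the given length, or nil if no n >= 2
# satisfies n * (n - 1) / 2 == length.
def matrix_size_for(length)
  n = 2
  # Grow n until the condensed size reaches or passes the length.
  while (n * (n - 1) >> 1) < length
    n += 1
  end
  size = n * (n - 1) >> 1
  size == length ? n : nil
end

p matrix_size_for(6) # => 4 (six distances fit a 4x4 matrix)
p matrix_size_for(5) # => nil (no square matrix has five pairs)
```

The valid lengths are the triangular numbers 1, 3, 6, 10, and so on; any other length cannot be the upper triangle of a square matrix.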
.new(elements : Indexable(T), & : T -> Number) : self forall T
Creates a new DistanceMatrix from the given elements by invoking the given block once for each pair of elements, using the block's return value as the distance between the elements.
Raises Enumerable::EmptyError if elements is empty or ArgumentError if any distance value is NaN.
dm = HClust::DistanceMatrix.new([1, 2, 3, 4]) { |a, b| 10 * a + b }
dm.to_a # => [12.0, 13.0, 14.0, 23.0, 24.0, 34.0]
.new(size : Int32, & : Int32, Int32 -> Number)
Creates a new DistanceMatrix of the given size and invokes the given block once for each pair of elements (indexes), using the block's return value as the distance between the given elements.
Raises ArgumentError if any distance value is NaN.
HClust::DistanceMatrix.new(5) do |i, j|
  # compute distance between elements i and j
  10 * (i + 1) + j + 1
end
Instance Method Detail
#[](i : Int, j : Int) : Float64
Returns the distance between the elements at i and j. Raises IndexError if any of the indexes is out of bounds.
#[](indexes : Indexable(Int)) : self
Returns the submatrix containing the distances between the elements at the given indexes. Raises Enumerable::EmptyError if indexes is empty or IndexError if any of the indexes is out of bounds.
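The submatrix lookup can be pictured as collecting, in condensed order, the distances between every pair of the selected elements. The sketch below works on a plain array and is only an illustration under stated assumptions. The helper condensed_submatrix is hypothetical, the indexes are assumed to be distinct and in bounds, and the library's own implementation may differ.

```crystal
# Hypothetical helper (not part of HClust): builds the condensed form
# of the submatrix for the given element indexes.
# condensed is the condensed array of an n x n matrix.
def condensed_submatrix(condensed, n, indexes)
  m = indexes.size
  (0...m).flat_map do |a|
    ((a + 1)...m).map do |b|
      i, j = indexes[a], indexes[b]
      i, j = j, i if i > j # the stored half requires i < j
      condensed[((2 * n - 3 - i) * i >> 1) + j - 1]
    end
  end
end

# Condensed 4x4 matrix where the distance between i and j (i < j)
# is 10 * (i + 1) + (j + 1).
condensed = [12.0, 13.0, 14.0, 23.0, 24.0, 34.0]
p condensed_submatrix(condensed, 4, [0, 2, 3]) # => [13.0, 14.0, 34.0]
```

The result is itself a valid condensed array, here of a 3x3 matrix, so it could be passed on to a constructor such as from_condensed.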
#[]=(i : Int, j : Int, value : Float64) : Float64
Sets the distance between the elements at i and j to value. Returns value. Negative indexes can be used to start counting from the end of the elements. Raises IndexError if either i or j is out of bounds, or if i == j and value is not zero.
#[]?(i : Int, j : Int) : Float64 | Nil
Returns the distance between the elements at i and j, or nil if any of the indexes is out of bounds.
#[]?(indexes : Indexable(Int)) : self | Nil
Returns the submatrix containing the distances between the elements at the given indexes, or nil if indexes is empty or any of the indexes is out of bounds.
#centroid : Int32
Returns the index of the element with the smallest average distance to all others.
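To illustrate what the centroid description means, the sketch below computes the same quantity from a plain condensed array. It is an assumption-laden illustration, not the library's code. The helper centroid_index is hypothetical, and ties are resolved in favor of the lowest index, which may or may not match the library. Because every element has the same number of neighbors (n - 1), comparing distance sums is equivalent to comparing averages.

```crystal
# Hypothetical helper (not part of HClust): returns the index of the
# element whose total (and hence average) distance to all other
# elements is smallest. Assumes n >= 1 and a well-formed condensed array.
def centroid_index(condensed, n)
  sums = Array.new(n, 0.0)
  k = 0
  (0...n).each do |i|
    ((i + 1)...n).each do |j|
      # Each stored distance contributes to both of its endpoints.
      sums[i] += condensed[k]
      sums[j] += condensed[k]
      k += 1
    end
  end
  best = 0
  (1...n).each do |i|
    best = i if sums[i] < sums[best]
  end
  best
end

# Three points on a line at 0, 1 and 5: d(0,1) = 1, d(0,2) = 5, d(1,2) = 4.
p centroid_index([1.0, 5.0, 4.0], 3) # => 1
```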
#map(& : Float64 -> Float64) : self
Returns a new DistanceMatrix with the results of running the block against each element of the matrix.
#map!(& : Float64 -> Float64) : self
Invokes the given block for each element of the distance matrix, replacing the element with the value returned by the block. Returns self.
#matrix_to_condensed_index(row : Int32, col : Int32) : Int32
Returns the condensed matrix index of the distance between the elements at row and col.
#to_unsafe(row : Int32, col : Int32) : Pointer(Float64)
Returns a pointer to the internal buffer placed at the specified location.
#unsafe_fetch(i : Int32, j : Int32) : Float64
Returns the distance between the elements at i and j, without doing any bounds check.
This should be called with i and j within 0...size and i != j. Use #[](i, j) and #[]?(i, j) instead for bounds checking and support for negative indexes.
NOTE This method should only be directly invoked if you are absolutely sure i and j are in bounds, to avoid a bounds check for a small boost of performance.
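For contrast with the unchecked access described above, the following sketch shows the kind of work a checked lookup has to do before it can touch the condensed array: resolving negative indexes, rejecting out-of-bounds values, and handling the diagonal, which is not stored. The helper checked_fetch is hypothetical and operates on a plain array. It only illustrates the contract described in this section, not how #[] is implemented.

```crystal
# Hypothetical helper (not part of HClust): bounds-checked lookup in a
# condensed array of an n x n distance matrix.
def checked_fetch(condensed, n, i, j)
  # Negative indexes count from the end of the elements.
  i += n if i < 0
  j += n if j < 0
  unless 0 <= i && i < n && 0 <= j && j < n
    raise IndexError.new("index out of bounds")
  end
  return 0.0 if i == j # the diagonal is implicit (hollow matrix)
  i, j = j, i if i > j # symmetry: only the i < j half is stored
  condensed[((2 * n - 3 - i) * i >> 1) + j - 1]
end

condensed = [12.0, 13.0, 14.0, 23.0, 24.0, 34.0]
p checked_fetch(condensed, 4, 2, 3)  # => 34.0
p checked_fetch(condensed, 4, -1, 0) # => 14.0 (same as the pair 0, 3)
p checked_fetch(condensed, 4, 1, 1)  # => 0.0
```

Skipping these steps is what makes the unchecked variant faster, and also why calling it with i == j or with out-of-range values reads an unrelated position.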
#unsafe_fetch(index : Int) : Float64
Returns the distance at the given index of the condensed distance matrix (one-dimensional), without doing any bounds check.
This should be called with index within 0...((size * (size - 1)) // 2).
NOTE This method should only be directly invoked if you are absolutely sure the index is in bounds, to avoid a bounds check for a small boost of performance.
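Access by condensed index is mainly useful when scanning all pairs in order, as clustering routines typically do. As a rough sketch of that pattern, the code below finds the closest pair by walking a plain condensed array with a running index instead of recomputing the position for each (i, j). The helper closest_pair is hypothetical, assumes n >= 2, and is not taken from the library.

```crystal
# Hypothetical helper (not part of HClust): returns the pair [i, j]
# (with i < j) that has the smallest stored distance.
def closest_pair(condensed, n)
  best_i, best_j, best_k = 0, 1, 0
  k = 0 # running position in the condensed array
  (0...n).each do |i|
    ((i + 1)...n).each do |j|
      if condensed[k] < condensed[best_k]
        best_i, best_j, best_k = i, j, k
      end
      k += 1
    end
  end
  [best_i, best_j]
end

# Pairs in condensed order: (0,1) (0,2) (0,3) (1,2) (1,3) (2,3)
p closest_pair([5.0, 2.0, 9.0, 7.0, 1.0, 3.0], 4) # => [1, 3]
```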
#unsafe_put(i : Int32, j : Int32, value : Float64) : Float64
Sets the distance between the elements at i and j to value, without doing any bounds check.
This should be called with i and j within 0...size and i != j. Use #[]=(i, j, value) instead for bounds checking and support for negative indexes.
NOTE This method should only be directly invoked if you are absolutely sure i and j are in bounds, to avoid a bounds check for a small boost of performance.
#unsafe_put(index : Int32, value : Float64) : Float64
Sets the distance at the given index of the condensed distance matrix (one-dimensional) to value, without doing any bounds check.
This should be called with index within 0...((size * (size - 1)) // 2).
NOTE This method should only be directly invoked if you are absolutely sure the index is in bounds, to avoid a bounds check for a small boost of performance.