Notes on Error-Correcting Codes
Last Modified 2022-03-17
Definition: Alphabet, Letter, Symbol
An alphabet is a finite set. We call the elements of an alphabet letters or symbols.
Definition: Word, Length, Kleene Star, Kleene Closure, $\Sigma^*$
A word over an alphabet $\Sigma$ of length $n$ is an element of $\Sigma^n$ (the $n$-fold Cartesian product of $\Sigma$ with itself). The set of all words of any length over $\Sigma$ is called the Kleene star or Kleene closure of $\Sigma$, and is denoted by $\Sigma^*$.
In other words, $\Sigma^* = \bigcup_{n \ge 0} \Sigma^n$, and length defines a function $\Sigma^* \to \mathbb{Z}_{\ge 0}$ that maps each element of $\Sigma^n$ to $n$.
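For concreteness, here is a small Python sketch (an illustrative aside; the names `SIGMA` and `words_of_length` are made up for this example) that enumerates a finite slice of $\Sigma^*$ for $\Sigma = \{0, 1\}$:

```python
from itertools import product

# A toy alphabet: a finite set of symbols.
SIGMA = ("0", "1")

def words_of_length(alphabet, n):
    """All words of length n over the alphabet, i.e., the elements of Sigma^n."""
    return ["".join(w) for w in product(alphabet, repeat=n)]

# A finite slice of the Kleene star Sigma^*: all words of length at most 2.
for n in range(3):
    print(n, words_of_length(SIGMA, n))
# 0 ['']
# 1 ['0', '1']
# 2 ['00', '01', '10', '11']
```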
Definition: Code, Block Length, Codeword
A code over an alphabet $\Sigma$ of block length $n$ is a subset of $\Sigma^n$. The elements of a code are called its codewords.
Definition: Systematic Encoding Map
Let $\Sigma$ be an alphabet, and let $k, n \in \mathbb{N}$ with $k \le n$. A systematic encoding map is a function $E : \Sigma^k \to \Sigma^n$ that has the following property: for any $m \in \Sigma^k$, the first $k$ symbols of $E(m)$ coincide with $m$.
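Here is a minimal Python sketch of one such map (an illustrative example with $\Sigma = \{0, 1\}$, $k = 4$, $n = 5$; the function name `systematic_parity_encode` is made up): it copies the message into the first $k$ positions and appends a single parity symbol.

```python
def systematic_parity_encode(message):
    """A systematic encoding map E: {0,1}^k -> {0,1}^(k+1).

    The first k symbols of E(m) coincide with m; the final symbol is the
    sum of the message bits modulo 2 (a single parity check).
    """
    parity = sum(message) % 2
    return list(message) + [parity]

m = [1, 0, 1, 1]
c = systematic_parity_encode(m)
print(c)           # [1, 0, 1, 1, 1]
print(c[:4] == m)  # True: the first k symbols of the codeword are the message
```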
Definition: Indicator Function, $\mathbf{1}$
The indicator function of a logical proposition $P$ is the quantity
$$\mathbf{1}\{P\} := \begin{cases} 1 & \text{if } P \text{ is true}, \\ 0 & \text{if } P \text{ is false}. \end{cases}$$
Definition: Hamming Distance, $\Delta$, Relative Hamming Distance, $\delta$
Let $\Sigma$ be an alphabet, and let $n \in \mathbb{N}$. The Hamming distance between two words $x, y \in \Sigma^n$ is the quantity
$$\Delta(x, y) := \sum_{i=1}^{n} \mathbf{1}\{x_i \neq y_i\}.$$
Similarly, the relative Hamming distance between $x$ and $y$ is the quantity
$$\delta(x, y) := \frac{\Delta(x, y)}{n}.$$
The Hamming distance and relative Hamming distance are metrics on $\Sigma^n$. In particular, they satisfy the triangle inequality.
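Both distances are easy to compute directly from the definitions; the following Python sketch (illustrative, with made-up function names) does exactly that:

```python
def hamming_distance(x, y):
    """Delta(x, y): the number of coordinates in which x and y differ."""
    assert len(x) == len(y), "both words must have the same length n"
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

def relative_hamming_distance(x, y):
    """delta(x, y) = Delta(x, y) / n."""
    return hamming_distance(x, y) / len(x)

x, y = "10110", "11100"
print(hamming_distance(x, y))           # 2
print(relative_hamming_distance(x, y))  # 0.4
```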
Definition: Message Length, Dimension, Rate, Minimum Distance, Distance, $(n, k, d)_q$-Code
Let $\Sigma$ be an alphabet with $q := |\Sigma|$, and let $n \in \mathbb{N}$.
- The message length or dimension of a code $C \subseteq \Sigma^n$ is the quantity $k := \log_q |C|$.
- The rate of a code $C \subseteq \Sigma^n$ is the quantity $R := \frac{k}{n} = \frac{\log_q |C|}{n}$.
- The minimum distance (or simply distance) of a code $C \subseteq \Sigma^n$ is the minimum Hamming distance between its distinct codewords, i.e., $d(C) := \min \{ \Delta(c_1, c_2) : c_1, c_2 \in C,\ c_1 \neq c_2 \}$.
A code with message length $k$ and minimum distance $d$ is called an $(n, k, d)_q$-code.
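As a small worked example (added for illustration), consider the binary parity-check code of block length 4, i.e., all words in $\{0, 1\}^4$ with an even number of ones. It has $|C| = 8$, hence $k = 3$, $R = 3/4$, and $d = 2$, making it a $(4, 3, 2)_2$-code; the Python sketch below computes these parameters by brute force.

```python
from itertools import combinations, product
from math import log2

def hamming_distance(x, y):
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

# The binary parity-check code of block length 4: all words with evenly many ones.
n = 4
C = [w for w in product([0, 1], repeat=n) if sum(w) % 2 == 0]

k = log2(len(C))  # message length (dimension); here q = |Sigma| = 2
R = k / n         # rate
d = min(hamming_distance(c1, c2) for c1, c2 in combinations(C, 2))  # minimum distance

print(len(C), k, R, d)  # 8 3.0 0.75 2
```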
Definition: Hamming Ball, $B$, Volume, $\mathrm{Vol}$
Let $\Sigma$ be an alphabet with $q := |\Sigma|$, and let $n \in \mathbb{N}$. The Hamming ball in $\Sigma^n$ of radius $r$ about a word $x \in \Sigma^n$ is the set
$$B(x, r) := \{ y \in \Sigma^n : \Delta(x, y) \le r \}.$$
The volume of a Hamming ball is simply its cardinality, and is denoted by
$$\mathrm{Vol}_q(r, n) := |B(x, r)|.$$
Note that this quantity is independent of the choice of word $x$.
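In fact, the volume has the closed form $\mathrm{Vol}_q(r, n) = \sum_{i=0}^{r} \binom{n}{i} (q - 1)^i$, since a word at distance exactly $i$ from the center is determined by choosing which $i$ positions differ and, for each such position, one of the $q - 1$ other symbols. The Python sketch below (illustrative; function names are made up) evaluates this formula and cross-checks it against brute-force enumeration.

```python
from itertools import product
from math import comb

def ball_volume(q, r, n):
    """Vol_q(r, n): the number of words within Hamming distance r of a fixed center."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

def ball_volume_bruteforce(q, r, n):
    """Cross-check: enumerate Sigma^n and count the words within distance r of 0^n."""
    center = (0,) * n
    distance = lambda x, y: sum(1 for xi, yi in zip(x, y) if xi != yi)
    return sum(1 for w in product(range(q), repeat=n) if distance(w, center) <= r)

q, r, n = 3, 2, 5
print(ball_volume(q, r, n))             # 1 + 5*2 + 10*4 = 51
print(ball_volume_bruteforce(q, r, n))  # 51 (independent of the chosen center)
```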
Definition: Erasure, Error
TODO
Minimum Distance is a Proxy for Robustness
Theorem: Let $\Sigma$ be an alphabet, and let $n \in \mathbb{N}$. If a code $C \subseteq \Sigma^n$ has minimum distance $d$, then:
- $C$ can correct $d - 1$ erasures.
- $C$ can detect $d - 1$ errors.
- $C$ can correct $\lfloor (d - 1)/2 \rfloor$ errors.
Proof: TODO
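As a sanity check on these claims (an illustrative aside, not a proof), the binary repetition code of block length 5 has $d = 5$, so it should correct up to $4$ erasures and up to $2$ errors. The Python sketch below decodes by brute force over the two codewords; the helper names are made up.

```python
def hamming_distance(x, y):
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

# The binary repetition code of block length 5: minimum distance d = 5.
C = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]

def correct_erasures(received):
    """Erased positions are marked None; with at most d - 1 erasures, exactly
    one codeword agrees with the received word on the unerased positions."""
    consistent = [c for c in C
                  if all(r is None or r == ci for r, ci in zip(received, c))]
    assert len(consistent) == 1
    return consistent[0]

def correct_errors(received):
    """Nearest-codeword decoding; succeeds when at most floor((d-1)/2) errors occur."""
    return min(C, key=lambda c: hamming_distance(received, c))

print(correct_erasures((1, None, None, None, None)))  # (1, 1, 1, 1, 1): 4 erasures fixed
print(correct_errors((1, 0, 1, 1, 0)))                # (1, 1, 1, 1, 1): 2 errors fixed
```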
Hamming Bound
Theorem: Let $\Sigma$ be an alphabet with $q := |\Sigma|$, and let $n \in \mathbb{N}$. If a code $C \subseteq \Sigma^n$ has minimum distance $d$, then its rate is bounded above by
$$R \le 1 - \frac{\log_q \mathrm{Vol}_q\!\left(\left\lfloor \frac{d-1}{2} \right\rfloor,\, n\right)}{n}.$$
Proof: Let $\mathcal{B}$ denote the collection of Hamming balls of radius $\lfloor \frac{d-1}{2} \rfloor$ about each codeword in $C$. Observe that the members of $\mathcal{B}$ are pairwise disjoint. Indeed, if there were two distinct codewords $c_1, c_2 \in C$ whose Hamming balls intersect, say at $x \in \Sigma^n$, then by definition, we would have $\Delta(c_1, x) \le \lfloor \frac{d-1}{2} \rfloor$ and $\Delta(c_2, x) \le \lfloor \frac{d-1}{2} \rfloor$. The triangle inequality would then imply
$$\Delta(c_1, c_2) \le \Delta(c_1, x) + \Delta(x, c_2) \le 2 \left\lfloor \frac{d-1}{2} \right\rfloor \le d - 1,$$
contradicting the hypothesis that $C$ has minimum distance $d$. Now, the members of $\mathcal{B}$ are all subsets of $\Sigma^n$, so it follows that the total cardinality of all of the balls in $\mathcal{B}$ is bounded above by $|\Sigma^n| = q^n$:
$$|C| \cdot \mathrm{Vol}_q\!\left(\left\lfloor \tfrac{d-1}{2} \right\rfloor,\, n\right) \le q^n.$$
Taking the base-$q$ logarithm of both sides, we obtain:
$$\log_q |C| + \log_q \mathrm{Vol}_q\!\left(\left\lfloor \tfrac{d-1}{2} \right\rfloor,\, n\right) \le n.$$
Dividing both sides by $n$ yields the desired result. ∎
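For a concrete instance of the bound (an illustrative check, not part of the proof), take the binary repetition code of block length 4: it has $q = 2$, $|C| = 2$, $d = 4$, hence rate $1/4$, while the right-hand side of the bound evaluates to $1 - \log_2 \mathrm{Vol}_2(1, 4)/4 = 1 - \log_2(5)/4 \approx 0.42$.

```python
from math import comb, log2

def ball_volume(q, r, n):
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

# Binary repetition code of block length 4: two codewords, minimum distance d = 4.
q, n, d = 2, 4, 4
rate = log2(2) / n                                      # R = k/n = 1/4
bound = 1 - log2(ball_volume(q, (d - 1) // 2, n)) / n   # Hamming bound

print(rate, bound)    # 0.25 versus roughly 0.4195
print(rate <= bound)  # True
```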
Definition: Perfect Code
Let $\Sigma$ be an alphabet, and let $n \in \mathbb{N}$. A code $C \subseteq \Sigma^n$ is perfect if the collection of Hamming balls $\left\{ B\!\left(c, \left\lfloor \frac{d-1}{2} \right\rfloor\right) : c \in C \right\}$ covers $\Sigma^n$, where $d$ denotes the minimum distance of $C$.
In other words, a code is perfect if it saturates the Hamming bound.
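For example, the binary repetition code of block length 3 is perfect: it has $d = 3$, and the two balls of radius 1 about its codewords each contain 4 words, together covering all $2^3 = 8$ words. The Python sketch below (illustrative) verifies the covering condition by brute force.

```python
from itertools import product

def hamming_distance(x, y):
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

# Binary repetition code of block length 3: d = 3, so the decoding radius is 1.
n, radius = 3, 1
C = [(0, 0, 0), (1, 1, 1)]

covered = all(
    any(hamming_distance(w, c) <= radius for c in C)
    for w in product([0, 1], repeat=n)
)
print(covered)  # True: the radius-1 balls about the codewords cover {0,1}^3
```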
Definition: Linear Code
Let $\mathbb{F}_q$ be a finite field. A linear code of dimension $k$ and block length $n$ over $\mathbb{F}_q$ is a $k$-dimensional linear subspace of $\mathbb{F}_q^n$.
Note that the linear-algebraic notion of dimension coincides with our previous definition of the dimension of a code. If $C$ is a $k$-dimensional linear subspace of $\mathbb{F}_q^n$, then $|C| = q^k$, which implies $\log_q |C| = k$.
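This is easy to see on a small example (an illustrative sketch; the chosen basis is arbitrary): over $\mathbb{F}_2$, the span of two linearly independent vectors in $\mathbb{F}_2^4$ contains exactly $2^2 = 4$ codewords.

```python
from itertools import product

# A basis of a 2-dimensional linear code C inside F_2^4.
basis = [(1, 0, 1, 1), (0, 1, 0, 1)]
q, k, n = 2, len(basis), 4

# Enumerate the span of the basis: all F_2-linear combinations.
C = {
    tuple(sum(a * b[i] for a, b in zip(coeffs, basis)) % q for i in range(n))
    for coeffs in product(range(q), repeat=k)
}
print(len(C), q ** k)  # 4 4, so log_q |C| = k = 2
```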
Definition: Generator Matrix, Parity-Check Matrix, Dual, $C^\perp$
Let $\mathbb{F}_q$ be a finite field, let $n \in \mathbb{N}$, and let $C \subseteq \mathbb{F}_q^n$ be a linear code of dimension $k$.
- A generator matrix for $C$ is a matrix $G \in \mathbb{F}_q^{n \times k}$ whose column space is $C$.
- A parity-check matrix for $C$ is a matrix $H \in \mathbb{F}_q^{(n - k) \times n}$ whose kernel is $C$.
- The dual of $C$ is the linear code $C^\perp := \{ x \in \mathbb{F}_q^n : \langle x, c \rangle = 0 \text{ for all } c \in C \}$.
Here, $\langle \cdot, \cdot \rangle$ denotes the standard inner product on $\mathbb{F}_q^n$, i.e., $\langle x, y \rangle := \sum_{i=1}^{n} x_i y_i$.
Because every vector space has a basis, every linear code admits a generator matrix (whose columns are the basis vectors for $C$). In fact, up to permutations of symbol positions within codewords, every linear code admits a generator matrix whose first $k$ rows coincide with the $k \times k$ identity matrix. Such a generator matrix $G$ defines a systematic encoding map $m \mapsto Gm$.
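As a concrete instance (an illustrative sketch using the column-space convention above; the particular matrices are one systematic form of the binary $(7, 4, 3)_2$ Hamming code), the generator matrix $G \in \mathbb{F}_2^{7 \times 4}$ below has the $4 \times 4$ identity in its first four rows, and the parity-check matrix $H \in \mathbb{F}_2^{3 \times 7}$ satisfies $HG = 0$, so every encoded word $Gm$ lies in the kernel of $H$.

```python
from itertools import product

def mat_vec(M, v, q=2):
    """Matrix-vector product over F_q (here q = 2)."""
    return tuple(sum(m_ij * v_j for m_ij, v_j in zip(row, v)) % q for row in M)

# Generator matrix G in F_2^{7 x 4}: its column space is the code, and its
# first four rows form the 4 x 4 identity, so m -> G m is systematic.
G = [
    (1, 0, 0, 0),
    (0, 1, 0, 0),
    (0, 0, 1, 0),
    (0, 0, 0, 1),
    (1, 1, 0, 1),
    (1, 0, 1, 1),
    (0, 1, 1, 1),
]

# Parity-check matrix H in F_2^{3 x 7}: its kernel is the code.
H = [
    (1, 1, 0, 1, 1, 0, 0),
    (1, 0, 1, 1, 0, 1, 0),
    (0, 1, 1, 1, 0, 0, 1),
]

messages = list(product((0, 1), repeat=4))
codewords = [mat_vec(G, m) for m in messages]

print(all(mat_vec(H, c) == (0, 0, 0) for c in codewords))    # True: the code lies in ker H
print(all(c[:4] == m for m, c in zip(messages, codewords)))  # True: the encoding is systematic
print(min(sum(1 for a, b in zip(c1, c2) if a != b)           # 3: the minimum distance
          for c1, c2 in product(codewords, repeat=2) if c1 != c2))
```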
Gilbert–Varshamov Bound
Theorem: Let $q$ be a prime power, and let $d, n \in \mathbb{N}$ with $d \le n$. There exists a linear code $C \subseteq \mathbb{F}_q^n$ of minimum distance at least $d$ and rate
$$R \ge 1 - \frac{\log_q \mathrm{Vol}_q(d - 1,\, n)}{n} - \frac{1}{n}.$$
Proof: Let $k := \lfloor n - \log_q \mathrm{Vol}_q(d - 1,\, n) \rfloor$, and let $C$ be a uniformly random linear subspace of $\mathbb{F}_q^n$ of dimension $k$. Note that there are only finitely many such subspaces, so this uniform distribution makes sense. We will show that $C$ has minimum distance at least $d$ with positive probability.
Let $G \in \mathbb{F}_q^{n \times k}$ be a uniformly random generator matrix for $C$. Observe that, for any fixed nonzero $m \in \mathbb{F}_q^k$, the vector $Gm$ is uniformly distributed in $\mathbb{F}_q^n \setminus \{0\}$. Indeed, all of the random choices we have made so far are invariant under a linear isomorphism of $\mathbb{F}_q^n$, and there exist isomorphisms of $\mathbb{F}_q^n$ that send any nonzero vector to any other nonzero vector, so this random process cannot favor any nonzero vector.
TODO
Remark: The Gilbert–Varshamov bound actually holds for any alphabet size $q \ge 2$, not just prime powers. A proof of this stronger result requires a different construction, and the resulting codes are not linear codes.
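To complement the theorem empirically (an illustrative Monte Carlo sketch; the parameters $q = 2$, $n = 16$, $d = 5$ and the trial count are arbitrary choices of mine), one can sample random generator matrices at the dimension suggested by the Gilbert–Varshamov argument and check that some sample attains minimum distance at least $d$:

```python
from itertools import product
from math import comb, log2
from random import randrange, seed

def ball_volume(q, r, n):
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

def min_distance(G, k, n):
    """Minimum weight of G m over nonzero messages m (0 if the encoding is not injective).
    For a linear code, this minimum nonzero weight equals the minimum distance."""
    best = n
    for m in product((0, 1), repeat=k):
        if any(m):
            c = [sum(G[i][j] * m[j] for j in range(k)) % 2 for i in range(n)]
            best = min(best, sum(c))
    return best

seed(0)
q, n, d = 2, 16, 5
k = int(n - log2(ball_volume(q, d - 1, n)))  # dimension suggested by the GV argument

# Sample random n x k generator matrices over F_2 and record the best minimum distance.
best = max(
    min_distance([[randrange(2) for _ in range(k)] for _ in range(n)], k, n)
    for _ in range(200)
)
print(k, best)  # k = 4; the best sample typically attains minimum distance >= d = 5
```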