
The task of CTC is to turn the per-timestep predictions produced by a sequence model such as an LSTM or Transformer into the final output character sequence.

CTC is a loss function used for training deep learning models on sequence tasks. Its goal is to score the alignment between an input X and an output Y. It does not require frame-aligned training data, because it sums the probability of every possible alignment from X to Y. Training therefore needs only the input image (the image's feature matrix) and the corresponding ground-truth text.
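In practice this loss is rarely implemented by hand; PyTorch, for instance, ships it as `torch.nn.CTCLoss`. The sketch below (shapes and random values are illustrative, not taken from the text) shows that training needs only unaligned targets plus sequence lengths:

```python
import torch
import torch.nn as nn

T, N, C = 50, 2, 20   # input timesteps, batch size, classes (blank at index 0)
S = 10                # target length

# Log-probabilities over classes at every timestep, as a network would emit them.
log_probs = torch.randn(T, N, C).log_softmax(dim=2)

# Unaligned targets: just label indices, no per-frame alignment required.
targets = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)  # class index 0 is reserved for the blank
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```

The loss internally sums over all valid alignments, which is why no aligned data is ever passed in.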
- Handling duplicate characters: CTC merges consecutive duplicate characters into one. For example, "ffoooccusss" becomes "focus".
- Managing intrinsic duplicates: for words that genuinely contain repeated characters, CTC inserts a blank character ("-") between the repeats. For example, "meet" can be encoded as "mm-ee-ee-t" or "mmm-e-ee-tt". During decoding, the blank indicates that the characters on both sides should both be retained.
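The two rules above amount to a simple "collapse" step at decode time: first merge consecutive repeats, then delete blanks. A minimal sketch in Python (the function name is my own):

```python
def ctc_collapse(alignment: str, blank: str = "-") -> str:
    """Merge consecutive duplicates, then drop blanks (best-path CTC decoding)."""
    out = []
    prev = None
    for ch in alignment:
        if ch != prev:          # rule 1: merge consecutive repeats
            out.append(ch)
        prev = ch
    return "".join(c for c in out if c != blank)  # rule 2: remove blanks

print(ctc_collapse("ffoooccusss"))   # -> "focus"
print(ctc_collapse("mm-ee-ee-t"))    # -> "meet"
print(ctc_collapse("mmm-e-ee-tt"))   # -> "meet"
```

Note that the order matters: removing blanks before merging would wrongly collapse "mm-ee-ee-t" to "met".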


For example, suppose the target text is the single character "a", the input has three timesteps, and the network assigns p(a) = 0.4, 0.3, 0.4 and p(-) = 0.1, 0.7, 0.6 at timesteps 1 to 3. The valid alignments for "a" are aaa, a--, -a-, aa-, -aa, and --a, so the score of "a" is:

a = 0.4×0.3×0.4 + 0.4×0.7×0.6 + 0.1×0.3×0.6 + 0.4×0.3×0.6 + 0.1×0.3×0.4 + 0.1×0.7×0.4 = 0.346

$$ Loss = -\log_{10} 0.346 \approx 0.461 $$
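To make the arithmetic reproducible, here is a brute-force sketch (my own helper, not an efficient implementation; real systems use the CTC forward algorithm) that enumerates every length-3 alignment over {a, -} and sums the probabilities of those that collapse to "a":

```python
from itertools import product

# Per-timestep probabilities implied by the example: p(a) and p(-) at t = 1..3.
probs = [{"a": 0.4, "-": 0.1},
         {"a": 0.3, "-": 0.7},
         {"a": 0.4, "-": 0.6}]

def collapse(path):
    """Merge consecutive repeats, then drop blanks."""
    merged = [c for i, c in enumerate(path) if i == 0 or c != path[i - 1]]
    return "".join(c for c in merged if c != "-")

score = 0.0
for path in product("a-", repeat=3):        # all 2^3 = 8 alignments
    if collapse(path) == "a":
        p = 1.0
        for t, c in enumerate(path):
            p *= probs[t][c]
        score += p

print(round(score, 3))
```

Brute force is exponential in the number of timesteps; the forward algorithm computes the same sum with dynamic programming in linear time.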

The decoding process for an unseen image occurs as follows: