5.5 - Coding Systems
- VDU = visual display unit. Aka a monitor
- ASCII = american standard code for information interchange
- contained non printable characters, for controlling things (E.g: carriage return, etc)
- Unicode
- Downfalls of ASCII
- only provides 128 numeric values, 33 of which are reserved for special functions
- many control codes obsolete
- does not cater for many European langs, which have accented characters and special symbols
- Asian langs
- Inadequate as a universal standard for information interchange
- Extends ASCII to include other langs, mathematical symbols, etc
- UTF-16 uses 1-2 16-bit codes. 2^16 different codes per unit
- UTF-32 uses 32 bit code units. 2^32 different codes per unit
- UTF-8 uses 1-4 bytes to encode each of the 1,112,064 valid unicode code points.
- the first 128 characters of unicode correspond 1:1 with ASCII, and are represented in exactly the same way (using 1 byte). Backwards compat
- Character code form of decimal digits
- numbers are mapped to certain codes, as they are characters just like anything else.
- pure binary representation
- when number itself actually encoded in binary
Error Checking and Correction
- information may get corrupted in transmission - may cause errors
- electrical interference
- faulty hardware
- how prevent
- extra info for use for error checking and correction
- Majority Voting
- Each bit is time sent 3 times in a row
- the bit is taken to be the one that appears the most
- does not guarantee absolute reliability
- unlikely but all 3 might corrupted
- prob can be minimised further by larger value for n
- size increase, but not increase information = additional redundancy
- Parity bits
- for even parity: an additional bit appended onto packet, such that there is an even number of 1s in the packet
- for odd: same but enforce odd 1s in the packet
- if odd number of bits flipped in transmission, the total number of bits in the packet will not be the decided parity, thus error detected
- does not work for even number of flips
- good for checking asynchronous serial transmission of data over short distances
- not good for synchronous serial transmission over long distances
- use of single parity bit is usually sufficient, though
- except when circumstances dictate that full error-detection is required
- for even parity, the parity bit can be calculated by XORing the bits together (because XOR does modulo-2 addition)
- e.g: for 0101: 0 xor 1 = 1 xor 0 = 1 xor 1 = 0 <– parity bit
- for odd parity, same method can be used but negated at the end.
- Checksums
- checking applied to block of data
- data treated as fixed size numbers (e.g each one byte in size)
- numbers then added together (in some way) to form some total
- sum truncated to size of byte, often by hashing
- truncated total is known as the checksum
- checksum appended end of block
- receiver regenerates checksum and compares
- Checksum - LRC = longitudinal redundancy check
- checksum made up of parity bit for each column
- by using this in conjunction of horizontal parity bits as well, some of the errors can be located and corrected
- Check digits
- special cases of checksums
- operates on decimal numbers
- an additional digit calculated using a standard algorithm
- can therefore be verified for mistakes by regenerating digit using algorithm
- try to detect the errors:
- corrupted digit
- transposition of digit
- usually use modular arithmetic
- Basic Check Digit
- given a number \(N\) made up of digits \(d_1, d_2, ...\)
- check digit \(C\) calculated by solving \((C+d_1+d_2+...)\mod p=0\), choosing an appropriate value for p
- \(\implies (C+d_1+d_2+...)\) must be a multiple of \(p\)
- therefore, let \(S=(d_1+d_2+...) mod p\), then using the fact that \(C\) is restricted to \(0\) to \(p-1\)
- \(C+S=p\).
- \(C=p-S\).
- ISBN