Representation of Characters in a Computer
A computer represents every character using just two symbols, 0 and 1. All data to be stored and processed in a computer is coded as strings of these two symbols, with one symbol representing each of two states.
The symbols 0 and 1 are known as bits, an abbreviation for binary digits. Two bits have four unique combinations:
00 | 01 | 10 | 11 |
Three bits have 8 unique combinations:
000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
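The lists above can be reproduced with a few lines of Python. This is only a small sketch; the function name `bit_strings` is made up for this illustration and does not come from any library.

```python
from itertools import product


def bit_strings(n):
    """Return every string of n bits, in counting order."""
    # product("01", repeat=n) yields every n-tuple built from '0' and '1'.
    return ["".join(bits) for bits in product("01", repeat=n)]


print(bit_strings(2))       # ['00', '01', '10', '11']
print(len(bit_strings(3)))  # 8
```

Each extra bit doubles the number of strings, so n bits give 2 multiplied by itself n times.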
Each unique string of bits can be used to code a symbol. The English alphabet has 26 letters, so coding the 26 capital (upper-case) letters needs at least 26 unique bit strings. Strings of 4 bits give only 16 (2x2x2x2) unique strings, which is not enough. Strings of 5 bits give 32 (2x2x2x2x2) unique strings, which is enough. We can therefore pick 26 of these 32 combinations, one for each capital letter, as in the following table.
Bit string | Letter | Bit string | Letter |
---|---|---|---|
00000 | A | 10000 | Q |
00001 | B | 10001 | R |
00010 | C | 10010 | S |
00011 | D | 10011 | T |
00100 | E | 10100 | U |
00101 | F | 10101 | V |
00110 | G | 10110 | W |
00111 | H | 10111 | X |
01000 | I | 11000 | Y |
01001 | J | 11001 | Z |
01010 | K | 11010 | (unused) |
01011 | L | 11011 | (unused) |
01100 | M | 11100 | (unused) |
01101 | N | 11101 | (unused) |
01110 | O | 11110 | (unused) |
01111 | P | 11111 | (unused) |
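The table can be built and used in code. The sketch below assumes the letters are assigned to the 5-bit strings in alphabetical order, as the table does. The names `LETTER_TO_BITS`, `BITS_TO_LETTER`, `UNUSED`, `encode` and `decode` are hypothetical and chosen only for this example.

```python
import string

# Give A..Z the 5-bit strings 00000..11001, in alphabetical order.
LETTER_TO_BITS = {
    letter: format(index, "05b")
    for index, letter in enumerate(string.ascii_uppercase)
}
BITS_TO_LETTER = {bits: letter for letter, bits in LETTER_TO_BITS.items()}

# The 6 strings left over out of the 32 possible ones.
UNUSED = [format(index, "05b") for index in range(26, 32)]


def encode(word):
    """Turn a word of capital letters into a list of 5-bit strings."""
    return [LETTER_TO_BITS[ch] for ch in word]


def decode(codes):
    """Turn a list of 5-bit strings back into a word."""
    return "".join(BITS_TO_LETTER[code] for code in codes)


print(encode("CAB"))  # ['00010', '00000', '00001']
print(UNUSED)         # ['11010', '11011', '11100', '11101', '11110', '11111']
```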
There is a problem, however. Data processing requires not only the 26 capital English letters but also the 26 lower-case letters, 10 digits, and around 32 other characters such as punctuation marks, arithmetic operator symbols, and parentheses. The total number of characters to be coded is therefore 26 + 26 + 10 + 32 = 94.

Strings of 6 bits can code only 64 (2x2x2x2x2x2) characters, so 6 bits are insufficient for these 94 characters. Strings of 7 bits give 128 (2x2x2x2x2x2x2) unique bit strings and can code up to 128 characters, so 7 bits are sufficient.
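The same reasoning works for any number of characters: keep adding bits until the number of unique strings is at least the number of characters. A minimal sketch, with a helper name `bits_needed` invented for this example:

```python
def bits_needed(count):
    """Return the smallest n such that n bits give at least `count` strings."""
    n = 0
    while 2 ** n < count:
        n += 1
    return n


total = 26 + 26 + 10 + 32   # capitals + lower case + digits + other characters
print(total)                # 94
print(bits_needed(26))      # 5
print(bits_needed(total))   # 7
```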
The most popular standard is known as ASCII (American Standard Code for Information Interchange). This standard uses 7 bits to code each character, as shown in the table below. Each column gives the three most significant bits (b6 b5 b4) of the code, and each row gives the four least significant bits (b3 b2 b1 b0).

b3 b2 b1 b0 | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
---|---|---|---|---|---|---|---|---|
0 0 0 0 | NUL | DLE | SPACE | 0 | @ | P | ` | p |
0 0 0 1 | SOH | DC1 | ! | 1 | A | Q | a | q |
0 0 1 0 | STX | DC2 | " | 2 | B | R | b | r |
0 0 1 1 | ETX | DC3 | # | 3 | C | S | c | s |
0 1 0 0 | EOT | DC4 | $ | 4 | D | T | d | t |
0 1 0 1 | ENQ | NAK | % | 5 | E | U | e | u |
0 1 1 0 | ACK | SYN | & | 6 | F | V | f | v |
0 1 1 1 | BEL | ETB | ' | 7 | G | W | g | w |
1 0 0 0 | BS | CAN | ( | 8 | H | X | h | x |
1 0 0 1 | HT | EM | ) | 9 | I | Y | i | y |
1 0 1 0 | LF | SUB | * | : | J | Z | j | z |
1 0 1 1 | VT | ESC | + | ; | K | [ | k | { |
1 1 0 0 | FF | FS | , | < | L | \ | l | \| |
1 1 0 1 | CR | GS | - | = | M | ] | m | } |
1 1 1 0 | SO | RS | . | > | N | ^ | n | ~ |
1 1 1 1 | SI | US | / | ? | O | _ | o | DEL |
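To read a code from the table, write the three column bits first and the four row bits after them. The sketch below does this lookup in both directions using Python's built-in `chr` and `ord`. The function names `char_at` and `position_of` are made up for this illustration.

```python
def char_at(column_bits, row_bits):
    """Return the character at a table position.

    column_bits is b6 b5 b4 (for example "100"); row_bits is b3 b2 b1 b0
    (for example "0 0 0 1", with or without spaces).
    """
    row_bits = row_bits.replace(" ", "")
    if len(column_bits) != 3 or len(row_bits) != 4:
        raise ValueError("expected 3 column bits and 4 row bits")
    return chr(int(column_bits + row_bits, 2))


def position_of(ch):
    """Return (column_bits, row_bits) for a 7-bit ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    bits = format(code, "07b")
    return bits[:3], bits[3:]


print(char_at("100", "0 0 0 1"))     # A
print(position_of("z"))              # ('111', '1010')
print(repr(char_at("010", "0000")))  # ' '
```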
For example, if we type RAMA J into the computer, the bit representation of this string is:
1010010 | 1000001 | 1001101 | 1000001 | 0100000 | 1001010 |
R | A | M | A | SPACE | J |
The blank between RAMA and J also needs a code. This code is essential so that a blank is left between RAMA and J when the string is printed.
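The RAMA J example can be checked with a short sketch that converts each character to its 7-bit ASCII string and back again. The helper names `to_ascii_bits` and `from_ascii_bits` are hypothetical, and the sketch assumes the text contains only 7-bit ASCII characters.

```python
def to_ascii_bits(text):
    """Return the 7-bit ASCII string for each character of text."""
    return [format(ord(ch), "07b") for ch in text]


def from_ascii_bits(codes):
    """Rebuild the text from a list of 7-bit strings."""
    return "".join(chr(int(code, 2)) for code in codes)


codes = to_ascii_bits("RAMA J")
print(codes)
# ['1010010', '1000001', '1001101', '1000001', '0100000', '1001010']

# Dropping the code for the blank joins the two words together.
without_blank = [code for code in codes if code != "0100000"]
print(from_ascii_bits(without_blank))  # RAMAJ
```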
A string of bits used to represent a character is known as a byte. Characters coded in ASCII need only 7 bits. However, the need to accommodate characters of languages other than English was foreseen when ASCII was designed, and 8 bits were therefore specified to represent characters. A byte is thus commonly understood as a string of 8 bits.
The International Organization for Standardization (ISO) standardized an 8-bit code (ISO 8859-1) for the Latin script used in Europe in addition to English letters. This code was widely used in Europe, and ASCII is a proper subset of it. In 1991 the Unicode Consortium proposed a standard called Unicode, which was a 16-bit code. The primary idea of Unicode is to separate the coding of characters from their graphical representations, called glyphs.
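Python strings are Unicode, so the relation between these codes can be explored with the built-in `ord`. This is only a rough sketch: the function name `describe` is invented for this example, and `utf-16-be` is just one way of storing Unicode characters in 16-bit units.

```python
def describe(ch):
    """Report the code number of one character and which codes can hold it."""
    code_point = ord(ch)
    return {
        "code_point": code_point,
        "fits_7_bit_ascii": code_point < 128,
        "fits_8_bit_latin": code_point < 256,
        "utf16_bytes": len(ch.encode("utf-16-be")),
    }


print(describe("A"))
# {'code_point': 65, 'fits_7_bit_ascii': True, 'fits_8_bit_latin': True, 'utf16_bytes': 2}
print(describe("\u00e9")["fits_7_bit_ascii"])  # False (e with an acute accent)
print(describe("\u0905")["code_point"])        # 2309 (a Devanagari letter)
```

The letter A keeps the number 65 in all three codes, which is what it means for ASCII to be a subset of the larger codes.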