Representation of Characters in a Computer

YASH PAL · July 4, 2021 (updated November 25, 2024)

To represent a character in a computer we use two symbols, 0 and 1. All data to be stored and processed in computers is transformed or coded as strings of these two symbols, one symbol to represent each state.

Representation of Characters

0 and 1 are known as bits, an abbreviation for binary digits. There are four unique combinations of two bits:

00, 01, 10, 11

and eight unique combinations, or strings, of three bits:

000, 001, 010, 011, 100, 101, 110, 111

Each added bit doubles the number of combinations, so strings of n bits give 2^n unique strings, and each unique string of bits may be used to represent, or code, a symbol. We have 26 letters in the alphabet, so to code the 26 capital (upper case) letters of English, at least 26 unique strings of bits are needed. With 4 bits (2x2x2x2) we have only 16 unique strings, which is not sufficient. But with 5 bits (2x2x2x2x2) we have 32 unique strings, which is sufficient. So we pick 26 unique combinations of bits to represent the English letters in the computer, as you can see in the given table (a short sketch after this section enumerates these codes).

| Bit string | Letter | Bit string | Letter |
|------------|--------|------------|--------|
| 00000 | A | 10000 | Q |
| 00001 | B | 10001 | R |
| 00010 | C | 10010 | S |
| 00011 | D | 10011 | T |
| 00100 | E | 10100 | U |
| 00101 | F | 10101 | V |
| 00110 | G | 10110 | W |
| 00111 | H | 10111 | X |
| 01000 | I | 11000 | Y |
| 01001 | J | 11001 | Z |
| 01010 | K | | |
| 01011 | L | | |
| 01100 | M | | |
| 01101 | N | | |
| 01110 | O | | |
| 01111 | P | | |

But there is a problem: data processing using computers requires the processing of not only the 26 capital English letters but also the 26 lower case English letters, 10 digits, and around 32 other characters such as punctuation marks, arithmetic operator symbols, and parentheses. Thus the total number of characters to be coded is 26 + 26 + 10 + 32 = 94. With strings of 6 bits each, it is possible to code only 64 characters; thus 6 bits are insufficient for coding these 94 characters. But strings of 7 bits each give (2x2x2x2x2x2x2) 128 unique bit strings and can thus code up to 128 characters, so strings of 7 bits each are sufficient to code the 94 characters.

The most popular standard is known as ASCII (American Standard Code for Information Interchange). This standard uses 7 bits, b6 b5 b4 b3 b2 b1 b0, to code each character, as in the table below. The row is selected by the least significant bits b3 b2 b1 b0 and the column by the most significant bits b6 b5 b4.

| b3 b2 b1 b0 | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
|-------------|-----|-----|-------|-----|-----|-----|-----|-----|
| 0 0 0 0 | NUL | DLE | SPACE | 0 | @ | P | \` | p |
| 0 0 0 1 | SOH | DC1 | ! | 1 | A | Q | a | q |
| 0 0 1 0 | STX | DC2 | " | 2 | B | R | b | r |
| 0 0 1 1 | ETX | DC3 | # | 3 | C | S | c | s |
| 0 1 0 0 | EOT | DC4 | $ | 4 | D | T | d | t |
| 0 1 0 1 | ENQ | NAK | % | 5 | E | U | e | u |
| 0 1 1 0 | ACK | SYN | & | 6 | F | V | f | v |
| 0 1 1 1 | BEL | ETB | ' | 7 | G | W | g | w |
| 1 0 0 0 | BS | CAN | ( | 8 | H | X | h | x |
| 1 0 0 1 | HT | EM | ) | 9 | I | Y | i | y |
| 1 0 1 0 | LF | SUB | * | : | J | Z | j | z |
| 1 0 1 1 | VT | ESC | + | ; | K | [ | k | { |
| 1 1 0 0 | FF | FS | , | < | L | \ | l | \| |
| 1 1 0 1 | CR | GS | - | = | M | ] | m | } |
| 1 1 1 0 | SO | RS | . | > | N | ^ | n | ~ |
| 1 1 1 1 | SI | US | / | ? | O | _ | o | DEL |

For example, if we type RAMA J into the computer, the bit representation of this string is 1010010 (R) 1000001 (A) 1001101 (M) 1000001 (A) 0100000 (SPACE) 1001010 (J). The blank between RAMA and J also needs a code; this code is essential to leave a blank between RAMA and J when the string is printed.

A string of bits used to represent a character is known as a byte. Characters coded in ASCII need only 7 bits, but the need to accommodate characters of languages other than English was foreseen, and thus 8 bits were specified to represent characters. Thus a byte is commonly understood as a string of 8 bits. The International Organization for Standardization standardized an 8-bit code (ISO 8859-1, Latin-1) for the Latin scripts used in Europe in addition to English letters. This was widely used in Europe, and the ASCII code is a proper subset of it.
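To make the counting argument concrete, here is a minimal Python sketch (the names are ours, not from the article) that enumerates the first 26 of the 2^5 = 32 five-bit strings and assigns them to the letters A to Z, reproducing the letter table above:

```python
import string

# Map each letter A..Z to a 5-bit string: A -> 00000, B -> 00001, ...
# format(i, "05b") renders the integer i as a zero-padded 5-bit binary string.
codes = {letter: format(i, "05b")
         for i, letter in enumerate(string.ascii_uppercase)}

print(len(codes))    # 26 letters coded, out of 2**5 = 32 possible strings
print(codes["A"])    # 00000
print(codes["Z"])    # 11001
```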
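The ASCII encoding of the RAMA J example can be checked the same way. A minimal sketch using Python's built-in ord(), which returns a character's code point:

```python
# Encode a string into 7-bit ASCII, one 7-bit group per character.
def to_ascii_bits(text: str) -> str:
    return " ".join(format(ord(ch), "07b") for ch in text)

print(to_ascii_bits("RAMA J"))
# 1010010 1000001 1001101 1000001 0100000 1001010
#    R       A       M       A     SPACE     J
```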
In 1991, the Unicode Consortium proposed a 16-bit code called Unicode. The primary idea of Unicode is to separate the coding of characters from their graphical representations, called glyphs.
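A minimal Python sketch of this separation: ord() returns a character's numeric code point regardless of which glyph a font draws for it, and encoding to UTF-16 shows the 16-bit form the original proposal used (later versions of Unicode grew beyond 16 bits):

```python
# Each character has a numeric code point, independent of its glyph.
# utf-16-be encodes basic-plane characters as one 16-bit unit each.
for ch in ("A", "é", "अ"):
    print(ch, hex(ord(ch)), ch.encode("utf-16-be").hex())
# A 0x41 0041
# é 0xe9 00e9
# अ 0x905 0905
```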