Representation of Characters in a Computer

To represent a character, a computer uses two symbols, 0 and 1. All data to be stored and processed in a computer is transformed, or coded, as strings of these two symbols, with each symbol representing one of two states.

The symbols 0 and 1 are known as bits, an abbreviation for binary digits. There are four unique combinations of two bits: 00, 01, 10 and 11.

There are 8 unique combinations, or strings, of 3 bits:

000

001

010

011

100

101

110

111
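
The strings listed above can be generated mechanically, since each extra bit doubles the number of combinations (n bits give 2^n strings). The following is a minimal Python sketch; the function name bit_strings is only an illustrative choice.

```python
from itertools import product

def bit_strings(n):
    """Return all 2**n bit strings of length n, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

print(bit_strings(2))       # ['00', '01', '10', '11']
print(len(bit_strings(3)))  # 8
```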

Each unique string of bits may be used to represent, or code, a symbol. The English alphabet has 26 letters, so coding the 26 capital (upper case) letters requires at least 26 unique strings of bits.

With 4 bits we have only 2x2x2x2 = 16 unique strings, which is not sufficient. With 5 bits we have 2x2x2x2x2 = 32 unique strings, which is enough to represent the English letters. So we pick 26 of these 32 bit strings and assign one to each English letter, as shown in the table below.

Bit string	Letter	Bit string	Letter
00000	A	10000	Q
00001	B	10001	R
00010	C	10010	S
00011	D	10011	T
00100	E	10100	U
00101	F	10101	V
00110	G	10110	W
00111	H	10111	X
01000	I	11000	Y
01001	J	11001	Z
01010	K	11010	(unused)
01011	L	11011	(unused)
01100	M	11100	(unused)
01101	N	11101	(unused)
01110	O	11110	(unused)
01111	P	11111	(unused)
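
A table like this one can be built by numbering the letters from 0 to 25 and writing each number as a 5-bit binary string. The sketch below assumes exactly that assignment (A = 0, B = 1, and so on); the names LETTER_CODE and encode_letters are hypothetical and chosen only for illustration.

```python
import string

# Assign each capital letter the 5-bit binary form of its position (A=0, ..., Z=25).
LETTER_CODE = {
    letter: format(index, "05b")
    for index, letter in enumerate(string.ascii_uppercase)
}

def encode_letters(word):
    """Code a word made of capital letters as space-separated 5-bit strings."""
    return " ".join(LETTER_CODE[ch] for ch in word)

print(LETTER_CODE["A"], LETTER_CODE["Q"], LETTER_CODE["Z"])  # 00000 10000 11001
print(encode_letters("CAB"))                                 # 00010 00000 00001
```

Only 26 of the 32 possible 5-bit strings are used; the remaining six have no letter assigned.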

But there is a problem. Data processing using computers requires not only the 26 capital English letters but also the 26 lower case English letters, 10 digits, and around 32 other characters such as punctuation marks, arithmetic operator symbols, and parentheses. Thus the total number of characters to be coded is 26 + 26 + 10 + 32 = 94.

With strings of 6 bits each, it is possible to code only 2x2x2x2x2x2 = 64 characters, so 6 bits are insufficient for coding these 94 characters. Strings of 7 bits each give 2x2x2x2x2x2x2 = 128 unique bit strings and can thus code up to 128 characters. So strings of 7 bits each are sufficient to code 94 characters.
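
The reasoning above amounts to finding the smallest n for which 2^n is at least the number of characters to be coded. A minimal Python sketch of that calculation follows; the function name bits_needed is an illustrative assumption.

```python
def bits_needed(symbol_count):
    """Return the smallest n such that 2**n >= symbol_count."""
    n = 0
    while 2 ** n < symbol_count:
        n += 1
    return n

print(bits_needed(26))  # 5
print(bits_needed(94))  # 7
```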

The most popular standard is known as ASCII (American Standard Code for Information Interchange). This standard uses 7 bits to code each character, as shown in the table below.

Least significant bits of code	Most significant bits of code (b6 b5 b4)
b3 b2 b1 b0	000	001	010	011	100	101	110	111
0 0 0 0	NUL	DLE	SPACE	0	@	P	`	p
0 0 0 1	SOH	DC1	!	1	A	Q	a	q
0 0 1 0	STX	DC2	"	2	B	R	b	r
0 0 1 1	ETX	DC3	#	3	C	S	c	s
0 1 0 0	EOT	DC4	$	4	D	T	d	t
0 1 0 1	ENQ	NAK	%	5	E	U	e	u
0 1 1 0	ACK	SYN	&	6	F	V	f	v
0 1 1 1	BEL	ETB	'	7	G	W	g	w
1 0 0 0	BS	CAN	(	8	H	X	h	x
1 0 0 1	HT	EM	)	9	I	Y	i	y
1 0 1 0	LF	SUB	*	:	J	Z	j	z
1 0 1 1	VT	ESC	+	;	K	[	k	{
1 1 0 0	FF	FS	,	<	L	\	l	|
1 1 0 1	CR	GS	-	=	M	]	m	}
1 1 1 0	SO	RS	.	>	N	^	n	~
1 1 1 1	SI	US	/	?	O	_	o	DEL
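
To read the table, join the three most significant bits (the column) with the four least significant bits (the row) to get the 7-bit code. The Python sketch below illustrates this lookup using the built-in chr function; the helper name ascii_from_table is hypothetical.

```python
def ascii_from_table(high_bits, low_bits):
    """Join the column bits (b6 b5 b4) and row bits (b3 b2 b1 b0), then look up the character."""
    code = int(high_bits + low_bits, 2)  # interpret the 7-bit string as a number
    return chr(code)

print(ascii_from_table("100", "0001"))  # A
print(ascii_from_table("011", "1001"))  # 9
print(ascii_from_table("110", "0001"))  # a
```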

For example, if we type RAMA J into the computer, the bit representation of this string is

1010010	1000001	1001101	1000001	0100000	1001010
R	A	M	A	SPACE	J

The blank between RAMA and J also needs a code; this code is essential to leave a blank between RAMA and J when the string is printed.
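
The RAMA J example can be reproduced with a short Python sketch that formats each character's ASCII code as a 7-bit string. The function names to_ascii_bits and from_ascii_bits are illustrative assumptions.

```python
def to_ascii_bits(text):
    """Return the 7-bit ASCII code of each character as a bit string."""
    return [format(ord(ch), "07b") for ch in text]

def from_ascii_bits(bit_strings):
    """Turn a list of 7-bit strings back into text."""
    return "".join(chr(int(bits, 2)) for bits in bit_strings)

coded = to_ascii_bits("RAMA J")
print(" ".join(coded))
# 1010010 1000001 1001101 1000001 0100000 1001010
print(from_ascii_bits(coded))  # RAMA J
```

Note that the blank is coded like any other character (0100000), which is why it survives the round trip.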

A string of bits used to represent a character is known as a byte. Characters coded in ASCII need only 7 bits. However, the need to accommodate characters of languages other than English was foreseen while designing ASCII, and thus 8 bits were specified to represent characters. A byte is therefore commonly understood as a string of 8 bits.

The International Organization for Standardization (ISO) standardized an 8-bit code (ISO 8859-1) for the Latin script used in Europe in addition to English letters. This code was widely used in Europe, and ASCII is a proper subset of it. In 1991 the Unicode Consortium proposed a standard called Unicode, which was originally a 16-bit code. The primary idea of Unicode is to separate the coding of characters from their graphical representations, called glyphs.
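
The subset relationship can be checked in Python, whose built-in "latin-1" codec implements the 8-bit ISO 8859-1 Latin code. This is only a sketch; the variable names are illustrative.

```python
ascii_bytes = bytes(range(128))  # every 7-bit ASCII code, 0 to 127

# Each ASCII code decodes to the same character under the 8-bit Latin code.
same_in_both = ascii_bytes.decode("ascii") == ascii_bytes.decode("latin-1")
print(same_in_both)  # True

# Codes above 127 are used for additional European letters such as e-acute.
e_acute = "\u00e9"
print(e_acute.encode("latin-1"))  # b'\xe9'

# Unicode assigns each character a number (code point), independent of its glyph.
print(ord("A"), ord(e_acute))  # 65 233
```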
