Character Sets

Text is a collection of characters that can be represented in binary, which is the language that computers use to process information
To represent text in binary, a computer uses a character set, which is a collection of characters and the corresponding binary codes that represent them
One of the most commonly used character sets is the American Standard Code for Information Interchange (ASCII), which assigns a unique 7-bit binary code to each character, including uppercase and lowercase letters, digits, punctuation marks, and control characters
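E.g. The ASCII code for the uppercase letter 'A' is 65, which is 1000001 in 7 bits and is usually stored in a byte with a leading zero as 01000001, while the code for the character '?' is 63, stored as 00111111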
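The ASCII codes above can be checked with a short Python sketch. The helper name `ascii_bits` is made up for this illustration; `ord()` and `format()` are Python built-ins.

```python
def ascii_bits(ch, width=8):
    """Return the ASCII code of a single character as a binary string.

    `ascii_bits` is a hypothetical helper for this example only.
    """
    code = ord(ch)  # ord() gives the numeric code of the character
    if code > 127:
        raise ValueError("not an ASCII character")
    # Pad with leading zeros up to `width` bits
    return format(code, f"0{width}b")


print(ord("A"), ascii_bits("A"))      # 65 01000001
print(ord("?"), ascii_bits("?"))      # 63 00111111
print(ascii_bits("A", width=7))       # 1000001 (the 7-bit form)
```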
With only 7 bits, ASCII can represent just 128 characters, so it does not support accented letters or the characters of languages other than English
To address these limitations, Unicode was developed as a character encoding standard that allows for a greater range of characters and symbols than ASCII, including different languages and emojis
Unicode assigns a unique code (called a code point) to each character, and encoding schemes such as UTF-8 store that code point in binary form using a variable number of bytes (one to four in UTF-8)
E.g. The Unicode code point for the heart symbol is U+2665, which is represented in UTF-8 as the three bytes 11100010 10011001 10100101
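As a rough illustration, the Python sketch below prints the code point of the heart symbol and the bytes produced when it is encoded as UTF-8. The helper name `utf8_bits` is an assumption for this example; `str.encode` is the standard Python method.

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of a string as space-separated 8-bit groups.

    `utf8_bits` is a hypothetical helper for this example only.
    """
    return " ".join(format(byte, "08b") for byte in ch.encode("utf-8"))


heart = "\u2665"  # the heart symbol, code point U+2665

print(f"U+{ord(heart):04X}")   # U+2665
print(utf8_bits(heart))        # 11100010 10011001 10100101
print(utf8_bits("A"))          # 01000001 (ASCII characters stay one byte)
```

Characters in the ASCII range keep their single-byte codes in UTF-8, while other characters take two to four bytes.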
As Unicode encodings can require more bits per character than ASCII (up to 32 bits), they can result in larger file sizes and slower processing times when working with text-based data
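The effect on size can be sketched by encoding the same text in different ways and counting the bytes. The sample strings and the helper name `encoded_size` are chosen only for illustration; the encoding names are standard Python codecs.

```python
def encoded_size(text, encoding):
    """Return the number of bytes needed to store `text` in `encoding`.

    `encoded_size` is a hypothetical helper for this example only.
    """
    return len(text.encode(encoding))


sample = "Hello"  # ASCII-only sample text

print(encoded_size(sample, "ascii"))      # 5  (one byte per character)
print(encoded_size(sample, "utf-8"))      # 5  (same as ASCII for these characters)
print(encoded_size(sample, "utf-16-le"))  # 10 (two bytes per character)
print(encoded_size(sample, "utf-32-le"))  # 20 (four bytes per character)

# Adding a non-ASCII character increases the UTF-8 size by more than one byte
print(encoded_size("Hello \u2665", "utf-8"))  # 9 (6 one-byte characters + 3 bytes)
```

How much larger a file becomes therefore depends on which Unicode encoding is used and which characters the text contains.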

Character Sets (CIE IGCSE Computer Science)