Level 2 Computer Science

Data Representation

Computers work with information so that you can view, listen to, create and edit documents, images, sound and much more. However, a computer must represent this information in some way, whether it is stored on a hard disk or sent over a network.

Computers use binary to represent this information, which uses only two values: 0 and 1. You will not normally see the 0s and 1s on your screen, but they are what the computer reads, because everything is represented by two states. For example, a 0 can mean off and a 1 can mean on. Light switches and power buttons marked with a 1 and a 0 symbol are a simple illustration of this: the symbols show whether the device is turned on (1) or off (0). This is not a real use of binary, but it is a helpful example of what binary is like, because it has the same two states. Inside a computer, hard drives use magnetism to represent 1s and 0s, while RAM uses electric charges stored in tiny capacitors to indicate “on” or “off”, 1 or 0.

As humans, we use the base-10 system for numbers. This means that in a number with several digits, each digit is worth 10 times as much as the digit to the right of it, e.g. 923 is 900 + 20 + 3 and 1277 is 1000 + 200 + 70 + 7. As you can see, we have the thousands, hundreds, tens and ones digits, where each place is worth more than the one before. Since computers read binary, which only has 0s and 1s, they can only use two digits to represent numbers. This means that when the computer represents numbers in binary, each digit is worth twice as much as the digit to the right of it, so the places are worth 1, 2, 4, 8, 16, 32, 64 and 128.

00000000 = 0
00000001 = 1
00000010 = 2
00000011 = 3
00000111 = 7
11000000 = 192
11111111 = 255
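
The values in the list above can be checked by adding up the place value of every 1. The short Python sketch below does this; the function name binary_to_decimal is only chosen for illustration and is not part of any standard library.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the place value of every 1 in a string of binary digits."""
    total = 0
    # Start from the rightmost digit, which is worth 1 (2 to the power 0).
    for position, digit in enumerate(reversed(bits)):
        if digit == "1":
            # Each place is worth twice as much as the place to its right.
            total += 2 ** position
    return total


for bits in ["00000000", "00000001", "00000010", "00000011",
             "00000111", "11000000", "11111111"]:
    print(bits, "=", binary_to_decimal(bits))
```

For 11000000 the two 1s sit in the 128 and 64 places, so the result is 128 + 64 = 192.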

So, what about 256?

Using 8 bits, which is one byte, the highest value we can reach in base-2 is 255 (11111111 in binary = 255); any higher and we need more bits. This is why the original Pac-Man arcade game breaks down at level 256: the level number was stored in a single 8-bit byte, which cannot hold the number 256. When a player gets past level 255, the counter overflows, the game glitches and it cannot be played any further. To allow more levels, the game would need to store the level number in more bits, e.g. 16 bits or 32 bits.
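
The overflow can be imitated in Python. This is only a model of an 8-bit counter, not the actual Pac-Man code: Python numbers do not overflow by themselves, so the sketch assumes a hypothetical helper, add_one_8bit, that throws away everything above the lowest 8 bits.

```python
def add_one_8bit(value: int) -> int:
    """Add 1 to a counter that only has 8 bits of storage."""
    # 0b11111111 is 255; the & keeps only the lowest 8 bits of the result.
    return (value + 1) & 0b11111111


print(add_one_8bit(254))  # 255, still fits in 8 bits
print(add_one_8bit(255))  # 0, because 256 needs a 9th bit that is not there
```

The second result shows why 256 cannot be stored: 256 is 100000000 in binary, and once the ninth digit is lost only the eight zeros remain.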

Character Representation

Character representation in binary works in a similar way to number representation, because binary is base-2 and every extra digit doubles what we can do. Every time we add a bit, we double the total number of characters we can represent. For example, 1 bit means we can only have 2 characters in total, 2 bits allow 4 characters, 3 bits allow 8 characters, and so on up to 8 bits, which allow 256 characters. So, what is the least number of bits we would need to make a non-standard keyboard with 255 characters work?

0 = 2 characters
00 = 4 characters
000 = 8 characters
0000 = 16 characters
00000 = 32 characters
000000 = 64 characters
0000000 = 128 characters
00000000 = 256 characters (8 bits are required)

Since you cannot use part of a bit (5 and a half bits is not possible, for example), you need at least 8 bits for a keyboard that has 255 characters: 7 bits only give 128 characters, which is not enough, while 8 bits give 256. By adding one bit, you double the number of characters that can be used. The computer gives every character its own binary code, which is just a number stored in the same way as the numbers we looked at earlier.
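
The doubling rule can be turned into a small calculation. The sketch below keeps adding bits until there are enough patterns for the characters needed; bits_needed is a made-up name used only for this example.

```python
def bits_needed(characters: int) -> int:
    """Return the smallest number of whole bits that gives enough patterns."""
    bits = 1
    # Each extra bit doubles the number of patterns: 2, 4, 8, 16, ...
    while 2 ** bits < characters:
        bits += 1
    return bits


print(bits_needed(128))  # 7
print(bits_needed(255))  # 8
print(bits_needed(257))  # 9
```

A keyboard with 255 characters needs 8 bits, and so does one with 256, but a single extra character after that would need a ninth bit.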

ASCII vs Unicode

ASCII has 128 characters in total, numbered 0 to 127, with 256 in the extended sets. Each character fits in a single 8-bit byte. However, there are far more characters in the world than that. ASCII is also not the most efficient use of space, because each character is stored in 8 bits but only 7 of them are used; the 8th is kept at zero, which is wasted space but necessary because computers read data in bytes.
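
To see the unused 8th bit, the sketch below prints the ASCII code of a few characters as 8 binary digits. It uses Python's built-in ord and format functions; the helper name ascii_bits is an assumption made for this example.

```python
def ascii_bits(character: str) -> str:
    """Return the code of one ASCII character as 8 binary digits."""
    code = ord(character)        # the character's number, e.g. 72 for "H"
    return format(code, "08b")   # pad with zeros on the left to 8 digits


for character in "Hi!":
    print(character, ord(character), ascii_bits(character))
```

Every pattern printed starts with 0, because all ASCII codes are 127 or less and so never need the leftmost bit.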

Unicode came along to solve this issue. It has a number of encoding schemes, of which UTF-8 is the most popular. UTF-8 is the most widely used character encoding in software and on web pages because it is the most flexible: it can use one to four bytes to represent a character.

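UTF-32 uses four bytes for every character. 32 bits could hold 4,294,967,296 different values, far more than the 1,114,112 code points that Unicode defines. (fixed)

UTF-16 uses two or four bytes. 16 bits give 65,536 values, and the four-byte form covers the rest of the Unicode range. (flexible)

UTF-8 uses one, two, three or four bytes. Some bits in each byte are used to mark how long the character is, so one byte covers the 128 ASCII characters and four bytes are enough to reach every Unicode code point. (flexible, the most flexible)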

Since UTF-8 is the most flexible, it is commonly used on websites written in English, as an English character only requires one byte to store. Many Asian characters, however, need three bytes in UTF-8 but only two bytes in UTF-16, so UTF-16 can take less space for Asian text. A lot of Asian websites do use UTF-8, however.
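
The sizes can be measured with Python's built-in str.encode method. In the sketch below, byte_lengths is a hypothetical helper written for this example, and the four sample characters are only an illustration. The "utf-16-be" and "utf-32-be" codec names are used so that no byte order mark is added to the count.

```python
def byte_lengths(character: str) -> dict:
    """Count the bytes one character takes in each Unicode encoding."""
    return {
        "UTF-8": len(character.encode("utf-8")),
        "UTF-16": len(character.encode("utf-16-be")),
        "UTF-32": len(character.encode("utf-32-be")),
    }


samples = [
    "A",           # English letter
    "\u00e9",      # e with an acute accent
    "\u4e2d",      # a Chinese character
    "\U0001F600",  # an emoji outside the first 65,536 code points
]
for character in samples:
    # Print the code point rather than the character itself.
    print(f"U+{ord(character):04X}", byte_lengths(character))
```

The letter A takes 1, 2 and 4 bytes, the Chinese character takes 3, 2 and 4 bytes, and the emoji takes 4 bytes in all three encodings.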

Negative Binary Numbers

Positive numbers can be represented in binary, and so can negative numbers. There are two methods of doing it.

The first method is called Two’s complement, which takes a positive number and turns it into the representation of the matching negative number. So how do we use Two’s complement?

Convert a selected number into binary (pretend it is a positive number)
Invert all of the digits (change 0 to 1 and 1 to 0)
Add 1 to the result

19 = 00010011, inverted = 11101100, +1 = 11101101. The two’s complement representation of -19 is 11101101.

96 = 01100000, inverted = 10011111, +1 = 10100000. The two’s complement representation of -96 is 10100000.

Adding 1 to a 1 gives 0 with a carry to the next digit on the left, and the carry keeps going until it reaches a 0. A shortcut for adding 1 to the inverted code is therefore: find the rightmost 0 and make it 1, then turn every 1 to the right of it into 0. Everything to the left of that 0 stays unchanged.

What about the number 0?

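0 = 00000000, inverted = 11111111, +1 = 100000000. That is nine digits, so the extra 1 at the front does not fit in 8 bits and is dropped, leaving 00000000 (the same binary code). 0 stays the same.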
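
The three steps can be written out as a short Python sketch. The function name twos_complement and the default of 8 bits are assumptions made for this example.

```python
def twos_complement(number: int, bits: int = 8) -> str:
    """Return the two's complement pattern for the negative of a positive number."""
    # Step 1: write the positive number in binary, padded to the full width.
    binary = format(number, f"0{bits}b")
    # Step 2: invert every digit (0 becomes 1 and 1 becomes 0).
    inverted = "".join("1" if digit == "0" else "0" for digit in binary)
    # Step 3: add 1, dropping any carry that does not fit in the width.
    result = (int(inverted, 2) + 1) % (2 ** bits)
    return format(result, f"0{bits}b")


print(twos_complement(19))  # 11101101
print(twos_complement(96))  # 10100000
print(twos_complement(0))   # 00000000
```

The results match the worked examples for 19, 96 and 0.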

The second method is a simple sign bit, which means using the leftmost bit (the 8th bit) to show the sign, e.g. +96 = 01100000 and -96 = 11100000 (the 8th bit becomes 1 to represent a negative number). The problem is that this gives two representations for 0 (+0 = 00000000 and -0 = 10000000). Two’s complement avoids this, and computers use Two’s complement because it is the more sophisticated way of representing negative numbers.
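
The two-zeros problem with a simple sign bit can be shown in a few lines of Python. Both function names below are made up for this sketch, which assumes 8-bit patterns with the leftmost bit as the sign.

```python
def to_sign_bit(number: int, bits: int = 8) -> str:
    """Write a number with the leftmost bit as the sign and the rest as the size."""
    sign = "1" if number < 0 else "0"
    return sign + format(abs(number), f"0{bits - 1}b")


def from_sign_bit(pattern: str) -> int:
    """Read a sign bit pattern back into an ordinary number."""
    size = int(pattern[1:], 2)  # everything after the sign bit
    return -size if pattern[0] == "1" else size


print(to_sign_bit(96))            # 01100000
print(to_sign_bit(-96))           # 11100000
print(from_sign_bit("00000000"))  # 0
print(from_sign_bit("10000000"))  # 0 again, from a different pattern
```

Two different patterns decode to the same value, which is the wasted pattern that Two’s complement avoids.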

Hexadecimal

Hexadecimal (hex) is a shorthand for binary numbers. It breaks long binary numbers like 1111111111111111 into 4-bit groups, in this case 1111 1111 1111 1111, and writes one hex digit for each group. The groups 0000 to 1001 are written as their denary value, so 0000 = 0, 0001 = 1, 0010 = 2 and so on up to 1001 = 9. When the value reaches 1010, which is 10, hex switches to letters: 1010 = A, 1011 = B, 1100 = C, 1101 = D, 1110 = E and 1111 = F, so A to F is the letter range for hex. If you had a big number like 62236, it would be 1111001100011100 in binary, which is long and tedious to read, but broken into 4-bit pieces it is 1111 0011 0001 1100, which in hex is F31C and is much easier to look at. Hexadecimal is commonly used for colour codes in HTML, where you put a hex code into the page to produce a colour.
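
The grouping into 4-bit pieces can be sketched in Python as follows. The names HEX_DIGITS and binary_to_hex are chosen for this example only; Python's built-in format is used at the end just to check the answer.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Convert a string of binary digits to hex, one 4-bit group at a time."""
    # Pad with zeros on the left so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    # Cut the string into 4-bit groups, e.g. 1111 0011 0001 1100.
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # Each group is a value from 0 to 15, which picks one hex digit.
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


print(binary_to_hex("1111001100011100"))  # F31C
print(binary_to_hex("1111111111111111"))  # FFFF
print(format(62236, "X"))                 # F31C, the built-in check
```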