Data structures

A data structure organizes and stores data in memory for efficient access and modification. A collection is a group of data items treated as a single unit, focusing on operations like adding, removing, and iterating over elements. While collections rely on data structures, they offer a more abstract way to manage groups of objects. All collections are based on data structures, but not all data structures are collections.

There are a lot of data structures:

  • Text structure: str
  • Numeric structures: int, float, complex
  • Sequence structures: list, tuple
  • Mapping structure: dict
  • Set structures: set, frozenset
  • Boolean structure: bool
  • Null structure: None
  • Binary structures: bytes, bytearray
  • From the collections module: deque

In this lesson, we will also learn about multidimensional structures and type casting. Some of the names above are links to specific sections of this page.

We already know a few structures (str, int, float, bool, and None), and I will introduce the rest in this lesson. Data structures have different qualities based on how they are declared and used. We can go with a loop through sequence, mapping, and set structures, just like with strings (e.g., for x in "abcde":). They are called iterable data structures.

A list is mutable, which means that if we assign it to an x variable, we will have two names referring to the same object (they will have the same ID). However, if we assign a string to an x variable, a second variable will be created (because it is immutable). This variable will contain the same data, but while editing the original version, we will not edit the copy (which was the case with the list). All the structures we already know are immutable.


list1 = [1, 2, 3] # list is mutable
x = list1
print(x)
print(list1)
x.append(4)
print(x)
print(list1)
                                    

string = "string" # string is immutable
x = string
print(x)
print(string)
x += "s"
print(x)
print(string)
                                    

All mutable types, like lists, when passed as arguments to a function and modified inside it, will change their value in the outer scope (the parameter holds a reference to the original object). Meanwhile, all immutable types, like strings, must be returned from a function for the changes made inside to take effect.


def modify_list(numbers):
    numbers.append(4)

def modify_string(text):
    text = text + " world"

numbers = [1, 2, 3]
modify_list(numbers)
print(numbers) # [1, 2, 3, 4]

text = "hello"
modify_string(text)
print(text) # "hello"
                                    

is

The is keywords check whether two objects are the same (in terms of memory addresses), while the == operator checks whether two objects have the same values.


x = "x"
y = x
print(x is y, x == y)
y = "y"
print(x is y, x == y)
                                    

Mutable data structures

Lists

We use lists to store many objects of all types. We create them using square brackets. We can immediately fill the array with objects after commas or leave it empty. Remember that we can't name a list "list" because it is its keyword, and we use it in type casting (I will explain it at the end of this lesson). A list index is the position number used to access or reference a specific element within an array, starting from 0. Just like with strings, we can use the index() method to get the index of any item in the list and check how many elements it has using the len() method.

append()


# Adding an element to a list
x = []
print("List before:", x)
x.append("Python")
print("List after:", x)
                                    

reverse() and reversed()


# Reversing the order of elements in a list
x = [3, 3.5, "Python"]
print("List before:", x)
x.reverse()
print("List after:", x)

y = list(reversed(x)) # creating a new reversed list without modifying the original list
print("A new reversed list:", y)
                                    

remove()


# Deleting an element from a list
x = ["Python"]
print("List before:", x)
x.remove("Python")
print("List after:", x)
                                    

List indexing

The rules are the same as for string indexing. We can index all iterable data structures.


x = ["a", "b", "c", "d", "e", "f"]
print(x[2]) # the third element - "c"
print(x[1:3]) # from the 2nd element (included) to the 4th (excluded) - ["b", "c"]
                                    

Iterating over a list


# Accessing every single list element independently
x = [3, 3.5, "Python"]
for y in x:
    print(y)
                                    

Unpacking a list


# Unpacking list values to individual variables (also works with, e.g., tuples)
a, b, c = [1, 2, 3]
                                    

Modifying a list with indexing


x = [1, 2, 3, 4, 5, 6]
x[2:4] = [7, 8]
print(x) # [1, 2, 7, 8, 5, 6]

y = [1, 2, 3, 4, 5, 6]
y[2:3] = [7, 8]
print(y) # [1, 2, 7, 8, 4, 5, 6]
                                    

enumerate()


# Accessing the item and its index simultaneously while iterating a list (it works with all iterable data structures)
x = ["a", "b", "c", "d"]
for index, item in enumerate(x):
    print(index, item)
                                    

x = enumerate(["a", "b", "c", "d", "e", "f"])
print(next(x)) # (0, "a")
print(next(x)) # (1, "b")
                                    

x = [["a", 1], ["b", 2], ["c", 3]]
for index, [name, number] in enumerate(x):
    print(f"{index}: {number} belongs to {name}")
                                    

Replace range() with enumerate() as often as possible (by PEP8 guidelines).

sorted()


# Sorting a list (alphabetically by default)
x = ["b", "e", "r", "z", "a"]
y = [6, 1, 4, 3, 5, 7]
print(sorted(x))
print(sorted(y))
                                    

max() and min()


x = [1, 2, 3, 4, 5]
print(max(x)) # the biggest number from a list
print(min(x)) # the smallest number from a list
                                    

If the list contains letters, the sorting is alphabetical.

insert()


# Adding an element to a list on a particular index (the rest of the items will move)
x = ["a", "b", "c", "d"]
x.insert(0, "e")
print(x)
                                    

pop()

This method also works with sets and dictionaries.


# Deleting an element chosen by index from a list (if no argument is given - the last element)
x = ["a", "b", "c", "d"]
x.pop(2) # if we assign this method to a variable, it will equal this deleted value (this method returns it)
print(x)
                                    

count()


# Counting the number of times an element occurs in a list
x = ["a", "b", "c", "d"]
print(x.count("a")
                                    

extend()


# Extending a list with an another iterable data structure
x = ["a", "b", "c", "d"]
y = ["e", "f"]
x.extend(y)
print(x)
                                    

"Truly" copying a list

Come back here after learning what a module is.

Check out the copy module. These methods also work with sets and dictionaries.

clear()

This method also works with sets and dictionaries.


# Purging a list
x = ["a", "b", "c", "d"]
x.clear()
print(x)
                                    

any() and all()

any() checks if any element in a list is True, and all() scans if all of them are.


x = int(input())
y = int(input())
z = [x == 2, y == 4]
if any(z):
    print("The x variable equals 2, or/and the y variable equals 4.")
                                    

x = int(input())
y = int(input())
z = [x == 2, y == 4]
if all(z):
    print("The x variable eqauls 2, and the y variable equals 4.")
                                    

Unpacking a list

This method also works with sets and dictionaries.


# Purging a list
x = ["a", "b", "c", "d"]
x.clear()
print(x)
                                    

Sets

Sets are like lists, but their elements are unique. Even if we add, e.g., the number 4 twice, it will keep only one. Its elements can mix indexes (they are stored in random order). We write its elements in braces, but if we want to create an empty set, we have to write x = set() because empty braces are reserved for dictionaries.

We can achieve better performance when searching sets than other data structures, e.g., lists.

add()


# Adding an element to a set
x = {3, 3.5}
x.add("Python")
print(x)
                                    

union()


# Merging two sets
x = {1, 2}
y = {3, 4}
print(x.union(y))
                                    

discard()


# Deleting an element from a set
x = {1, 2, 3, 4}
x.discard(2)
print(x)
                                    

issuperset() and issubset()

Set x is the superset of set y if all elements of y are in x. A set is a subset of another set if it is not its superset.


x = {1, 2, 3, 4}
y = {1, 2}
print(x.issuperset(y))
print(y.issuperset(x))
print(x.issubset(y))
print(y.issubset(x))
                                    

intersection()


# Creating a new set with elements that occur in both sets
x = {1, 2, 3, 4}
y = {1, 3, 5}
print(x.intersection(y))
                                    

difference()


# Creating a new set with elements that occur only in the first set
x = {1, 2, 3, 4, 8}
y = {1, 3, 5, 7}
print(x.difference(y))
                                    

isdisjoint()


# Checking if two sets have any common elements
x = {1, 2, 3, 4}
y = {1, 3, 5, 7}
z = {9, 10}
print(x.isdisjoint(y))
print(x.isdisjoint(z))
                                    

Dictionaries

Dictionaries work on the key-value coding rule, which states that searching for the key will give us the value. Every data structure can be a value. Using the dictionary keys, we can fetch the assigned values. In the example below, "Python" and "list" are keys, and "programming language" and "data structure" are values.


x = {"Python": "programming language", "list": "data type"}
x["list"] = "mutable data structure" # editing the value under a given key
print(x["list"]) # displaying the value under a given key
                                    

We can add objects to the dictionary using the same method.


x = {"Python": "programming language", "list": "data structure"}
x["tuple"] = "immutable data structure"
print(x)
                                    

The keys() method gives us a list of keys in the dictionary, while the values() method provides a list of its values. The items() method creates a list of tuples with keys and values inside them.


x = {"Python": "programming language", "list": "data structure"}
print(x.keys())
print(x.values())
print(x.items())
                                    

del()


# Deleting a key (and its value) from the dictionary
x = {"Python": "programming language", "list": "data structure"}
del x["Python"]
print(x)
                                    

get()


# Displaying the value under a given key, and if it doesn't exist - the default value
x = {"first": 1, "second": 2}
print(x.get("second", "default"))
print(x.get("third", "default"))
                                    

update()


# Merging and overriding dictionaries
x = {"Python": "programming language", "list": "data structure"}
y = {"C++": "programming language"}
z = {"list": "mutable data structure"}
x.update(y)
x.update(z)
print(x)
                                    

setdefault()


# Providing a default value for a dictionary key (it returns the value if the key exists, otherwise sets it to the default)
inventory = {"apples": 5, "oranges": 3}
print(inventory)

apples = inventory.setdefault("apples", 0) # key exists, returns 5
bananas = inventory.setdefault("bananas", 10) # key doesn't exist, sets to 10
print(inventory)
print(apples)
print(bananas)
                                    

popitem()


# Deleting the last element from a dictionary (if we assign this method to a variable, it will equal this deleted value)
x = {"a": 1, "b": 2, "c": 3}
x.popitem()
print(x)
y = x.popitem()
print(y)
print(x)
                                    

True and 1


z = {True: "a", 1: "b", 1.0: "c"}
print(z[True]) # True = 1 = 1.0 so True = "c"
                                    

Bytearray

I recommend you read about bytes first and then come back. We use bytearray, e.g., like this: bytearray([65, 66]). It has the same properties as the bytes structure, except it is mutable. This makes it suitable for situations where we need to change the byte data during processing.

We can create a bytearray object using the bytearray() method, passing a list of integers between 0 and 255. For example: bytearray([65, 66]). The result would be b'AB' (it decodes from Unicode). We can modify its contents like this: data[0] = 67 (change from "A" to "C").

We handle binary files using different symbols, such as rb instead of r and wb instead of w.


data = bytearray(10)
for i in range(len(data)):
    data[i] = 10 + i

with open("file.bin", "wb") as file:
    file.write(data)                                        
                                    

Two ways of reading from a binary file:


data = bytearray(10)

with open("file.bin", "rb") as file:
    file.readinto(data)
    
for x in data:
    print(hex(x), end = " ")
                                    

with open("file.bin", "rb") as file:
    data = bytearray(file.read())
    
for x in data:
    print(hex(x), end = " ")
                                    

Bytes is immutable, making it ideal for storing fixed binary data. Bytearray, however, allows modifying the data, which makes it more suitable for processing or manipulating data in memory.

Immutable data structures

Tuples

Tuples are similar to lists, but they are immutable. When we add an object to a tuple, Python creates another tuple with the same name and items plus the added one, while the previous one deletes itself. We write its elements in parentheses. Tuples share most methods with lists.


x = (3, 3.5)
x += ("Python",) # the comma at the end is necessary (otherwise, this would be considered a string because there is only one object inside)
print(x)
                                    

x = ("a", "b",)
first, second = x
print(first, "and", second)
                                    

Frozenset

Frozenset is an immutable version of a set.


x = ["a", "b", "c", "d"] # this could be any iterable data structure
fSet = frozenset(x)
print(fSet)
                                    

Bytes

Bytes is sequence of integers, where each integer is in the range of 0 to 255. This makes it efficient to store binary data since each byte is only 8 bits. This data structure is particularly useful for handling binary data in files or transmitting data over networks. We can create a bytes object using the bytes() method, passing a list of integers between 0 and 255. For example: data = bytes([65, 66]). The result would be: b'AB' (it decodes from Unicode).

Strings have no binary encoding assigned, and bytes has no text encoding assigned as it is used to store raw binary data. We can convert Unicode to binary data using the encode() method and vice versa using the decode() method.


text = "CPUcademy"
x = text.encode("utf-8")
print(x)
                                    

Go back to bytearray.

Complex

Complex numbers are expressed as a+bj, where a and b are real numbers, and j is an imaginary number, e.g., 100+3j. They are used in mathematics.

Deque

Deque stands for Double-Ended Queue. With it, we can easily add and remove elements not only at the end (like with lists) but also at the beginning. Its elements can be accessed by index. This data structure has to be imported from the collections module.


from collections import deque

integers = deque()
integers.append(2) # adding an element at the end
integers.append(4)
integers.pop() # deleting an element at the end
integers.appendleft(1) # adding an element at the beginning
integers.popleft() # deleting an element at the beginning
print(integers)                                          
                                    

These data structures don't exist in Python but can be implemented using deque:

Stacks and Queues are data structures for storing data in specific orders. Their elements can't be accessed by index. Stacks operate in a Last-In-First-Out (LIFO) order, where elements can only be added and deleted from the top of the stack. Queues operate in a First-In-First-Out (FIFO) order, where elements can be added at the end of the queue and removed from the front.

Multidimensional structures

A multidimensional structure is a structure that has another structure inside it, e.g., a list in a list or a tuple in a list, etc. A two-dimensional list is called a matrix.


list1 = [[1, 2, 3], [4, 5, 6]]
print(list1[1])
                                    

To access an element from the internal list, we do as in the example below. The first index means which list we are entering (in this case, the second one), and the second index means which element of the internal list we want to access (in this case, the first one).


list1 = [[1, 2, 3], [4, 5, 6]]
print(list1[1][0])
                                    

The code below checks which workers make over 90000 dollars annually. I will explain the one-lined instructions in the upcoming lessons, and now, please focus on the multidimensional structures.


workers = [("John", 110000), ("Amelia", 100000), ("Josh", 80000), ("James", 90000)]
list1 = [x for x, y in workers if y > 90000]
print(list1)
                                    

# Jumping every other element in every list
numbers = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
numbers2 = [line[::2] for line in numbers]
print(numbers2)
                                    

Type casting

Type casting, also known as type conversion, is the process of converting one data type to another (e.g., list to str). We can't perform a conversion from, e.g., dict to list because it wouldn't make sense. We can check the variable's type using the type(x) method. If we write type(x).__qualname__(), we will get only, e.g., int instead of <class 'int'>. Here are a few examples of type casting:


# String to list (every character as a separate value)
string = "Python"
list1 = list(string)
print(list1)
                                    

# String to list using split() (breaking a string into a list by dividing it every time a given delimiter is encountered)
a = input() # the string can be entered by the user
b = a.split("-") # a space is the default delimiter
print(b)
                                    

# List to string (joining the elements of a list into a single string by inserting a delimiter between each element)
list1 = ["Python", "Coding"] # this list cannot contain numbers
string = " ".join(list1) # the space is the delimiter (we can choose a different one)
print(string)
                                    

# String to tuple
string = "Python"
tup = tuple(string)
print(tup)    
                                    

# Tuple to string
tup = ("Python", "Coding")
string = str(tup)
print(string)
                                    

# List to tuple
list1 = ["Python", 3, 3.5]
tup = tuple(list1)
print(tup)    
                                    

# Tuple to list
tup = ("Python", 3, 3.5)
list1 = list(tup)
print(list1)    
                                    

Type casting can be implicit, where a type is automatically converted to another (e.g., when dividing an integer by another integer, resulting in a floating-point number), or explicit, as shown above.