Data structure

A data structure in programming is a container object that stores information. The data within the container is structured according to a specific system, which varies depending on the type of structure. This is how developers optimize access to information.

A data structure can be thought of as a variable that stores not just one value, but several. Each value—or almost each—can be accessed through this variable using a specific algorithm. The types of these variables and how they are structured depend on the specific entity.

Almost all developers use basic data structures. These structures exist in almost all modern programming languages; they may simply be called differently.

What are data structures for?

Modern programs can handle large amounts of information, sometimes quite diverse. It would be difficult to assign a separate variable to each unit of such data: it would waste memory, and the abundance of variables would make it too easy to get confused.

Therefore, developers use data structures—unified objects that store a set of information. Access rules for this information can vary—there are special operations for this, sometimes specific to a particular structure type. Each structure is suited to its own use cases: there are tasks where it is convenient to use, and others where it is unnecessary.

Every developer should be able to use basic data structures. They simplify code writing and make it possible to solve problems that would be very difficult to accomplish without structured data.

Data structure characteristics

A data structure description can be roughly divided into an interface and an implementation. An interface is a set of operations that describes the behavior of a specific structure. That is, it describes what can be done with a specific data structure. And the implementation is how the data is structured internally.

In addition, a typical data structure meets several criteria:

  • correctness, that is, competent implementation of its technical description;
  • spatial complexity – the implementation should take up a minimum of space;
  • Time complexity – operations should be performed in a minimum amount of time.

“Minimum space” and “minimum time” may differ for different structures. This is normal: each has its own purpose and optimal use.

What types of data structures are there?

Data structures can generally be divided into linear and nonlinear. The difference lies in how they store their elements—the data contained within them.

  • In linear structures, elements are arranged in a “chain” fashion, one after another. They form a sequence. For example, a word can be thought of as a linear data structure, where the data are the letters.
  • In nonlinear data structures, elements can branch, forming tables or diagrams. A real-life example of a nonlinear data structure is a road map.

Below, we will briefly discuss the most popular data structures: their features, operations with them, and situations when they should be used.

Arrays

This is the simplest example of a data structure. An array is a linear sequence of values, each with its own number. These numbers are called indices—they can be used to access any element in the array.

The indices are sequential. They are integers, the sequence of which starts with 0.

Arrays can be classified as one-dimensional or multidimensional. One-dimensional arrays are simply a series of numbered values. Multidimensional arrays are like matrices: each array value is itself an array. Consequently, an element can be accessed using two indices—row and column.

Classic arrays are static, meaning they have a finite and unchanging length. If you create such an array with a length of 5, it will always have room for exactly 5 elements, no more and no less. There are also dynamic arrays: their length changes depending on how many values ​​are placed in them.

Typically, an array can only contain data of one type. For example, an array of numbers or an array of strings.

Operations. You can add one or more elements to an array—either to the end or to a specific section of a sequence. You can also return an element by index, and in some cases, remove it. By directly accessing an element by index, you can modify it. There are also operations for finding the length of an array and operations for returning the entire array.

Application. Arrays are used in most situations where organized data storage is required. They are not used only in complex structures: in those cases, a different format would be more logical. But even here, arrays are needed as building blocks for implementing more complex structures.

Queues

A queue is similar to an array. It’s also a linear structure consisting of a sequence of elements. The difference is that these elements can only be accessed using the FIFO (First In, First Out) principle. This means that the element at the very beginning of the queue, which has been there the longest, can be quickly and easily removed. Access from the end or middle may not be possible at all. Elements, on the other hand, are added only to the end—just like a real-life queue at a store.

Operations. The basic operations characteristic of a queue are adding an element to the end, retrieving an element from the beginning with or without deleting it, and checking whether the queue is empty.

Application. Queues are used to organize multithreaded processes when several actions are performed simultaneously. This data structure allows you to specify the sequence and priority of execution for actions. This balances the program load and prevents freezing. Queues are also used in a similar way when executing requests, if there are many of them and they arrive very quickly.

Stacks

A stack is a structure that’s the reverse of a queue. It’s a sequence accessed using the LIFO (Last In, First Out) principle. Elements are added to the end, and can be quickly retrieved and removed from the end. This means that the later an element is added to the stack, the easier it is to access.

The English word “stack” means a “pile”—for example, a stack of papers. A stack can be thought of in the same way: as a pile of papers, where the topmost element is taken first.

Operations. Among the operations specifically for stacks are push, which adds an element to the end, and pop, which removes an element from the end. Popping means removing the element from the stack but returning it as a value.

There is also a check for stack emptiness and an operation that returns the last element without removing it.

Usage: The stack is used to allocate memory for programs and processes. Functions running within a program are “stacked” onto the system stack. The popular expression “stack overflow” is a common error that occurs when the stack overflows due to improper memory management. This most often occurs with recursion.

Another use for the stack is browsing history: it makes it easier to access the data a person viewed most recently.

Decks

Decks are double-ended queues that combine the capabilities of both a queue and a stack. These data structures can operate on either a FIFO or LIFO basis. Elements can be accessed from either end.

Operations. Decks can be considered to combine the operations typical of queues and stacks. In some ways, these data structures resemble arrays and are close to them in functionality.

Application: Decks are used when it’s important to ensure access to both the first and last elements. For example, when optimizing process execution.

Related lists

Another common example of a linear data structure is a sequence of elements, each of which stores data and a reference to the next or previous element. As a result, one element can quickly access its neighbor.

The elements themselves, unlike an array, are stored separately from each other. They have no numbers. Sequence is achieved solely through “next” and “previous” pointers. The last element will point to an “empty” value.

Singly linked lists have links only to the next element, while doubly linked lists have links to both the next and the previous element. They are also called unidirectional or doubly linked lists. There are also circular lists—they form a loop, with the last element pointing to the first.

The first element is called the head of the list, and the last one is called the tail.

Operations. You can add an element to a list—to the head, tail, or a specified location. You can also update the value of an element, delete it, and search the data structure. There’s also an operation to check whether the list is empty. The next element can be accessed using the next pointer, and the previous element can be accessed using the prev pointer.

Applications. Linked lists are used to build more complex structures, such as graphs and hash tables, which we’ll discuss later. They are also important when allocating memory to programs in dynamic processes and operating systems.

Sets

A set is another linear data structure. Its unique feature is that its elements have no clear order, but are unique in value. It can be thought of not as a sequence or a list, but as a “tag cloud”: an unordered collection of unique values. This definition is close to the mathematical concept of a set. And it’s used in the same way: not sorted or structured, but rather used to store, retrieve, and analyze data.

A set is also called a multitude. In English, the name sounds like Set.

Operations. The operations specific to this data structure are designed to work with two sets. For example, these include set union, comparison, and intersection searches between two sets. These are similar to Euler circles, but implemented in software. 

There’s also an operation that returns only the elements of a set that don’t intersect another, and a subset operation that determines whether one set is contained within another.

Application. Sets are used to store collections of data that must be unique but don’t require ordering. These are typically collections that require specific operations, such as searching for subsets, combining them, comparing them with others, and identifying intersections.

Cards

In English, this data structure is called a Map. It’s also known as an associative array or dictionary. It’s indeed similar to an array, but instead of ordered numeric indices, it uses “keys”—user-specified numbers or strings. The entire array is a set of key-value pairs.

You can think of this data structure as a phone book. The key is the name a person uses to search for a number. And the value is the phone number itself.

The developer defines the keys themselves or generates them automatically. Keys are often strings, making it easier to understand and find information.

Associative arrays can typically store data of different types. The keys can also vary.

Operations. Maps can be manipulated like arrays, but they are accessed by key name rather than index. You can add a key-value pair, delete one, or change the value. You can also access an element by its key.

Associative arrays implement traversal of the entire structure differently than regular arrays: you can’t simply increment the index by 1 at each step, since the key can be anything. Therefore, programming languages ​​typically provide special operations for traversing a dictionary.

Application. Associative arrays are used to store data that needs to be quickly retrieved by a fixed key. They are also used for data transfer—the key acts as the parameter name, and the element itself stores its value. For example, when a user fills out a form on a website and submits it, the data is sent to the server in the form of a dictionary like this:

“name: Ivan,

phone: +255234567,

isAgree: true»

The key’s name, phone, and isAgree describe what exactly is being transmitted: name, phone number, and whether or not consent to receive messages. Ivan, true, and number are the values ​​that these parameters accept.

Hash tables

This data structure is similar to an associative array and is sometimes implemented using one. The difference is that the key for each value is automatically generated using a special formula— a hash formula. This formula is applied to the value itself, resulting in a key based on it.

A hash table can be used to generate keys automatically, for example, in situations where their names shouldn’t carry any useful information. This is more convenient and faster than a dictionary when dealing with large amounts of data. In addition, the use of hashes allows you to encrypt information; however, one table is not enough for this.

When working with hash tables, it’s important to avoid collisions—situations where applying a formula to different values ​​yields identical keys. To avoid this, you need to carefully select the formula for each case. There are also special strategies for preventing collisions.

Operations. You can add an element to a table, delete it, or search for it by a particular attribute. When creating a table, you can and should also specify a formula for generating hashes.

Applications. Hash tables are used to store large amounts of information in databases, as well as to create caches or build more complex structures. They are most often used when fast access to information is required.

Counts

A graph is a nonlinear data structure consisting of “vertices” and “edges” between them. Each vertex represents a value, and the edges represent the paths that connect the vertices. The result is a kind of “grid,” similar to a road map or a constellation.

Graphs can be undirected, where the edges have no specific direction, or directed. In the latter case, edges can only be traversed in one direction, like a one-way street. There are also mixed graphs, which contain both directed and undirected edges.

There are also weighted graphs, whose edges have a “weight”—a particular value. For example, in a road map, the weight of a road edge could be defined as its length.

Graphs are usually implemented using linked lists or matrices – two-dimensional arrays.

Operations. These structures support basic operations: adding a vertex or edge, displaying a vertex or the entire graph, estimating the “cost” of traversing a weighted graph, and so on.

There are several graph traversal algorithms for searching information or finding the shortest path from one vertex to another. For example, DFSBFSDijkstra’s algorithm, and others. These don’t always have dedicated commands, so you may need to implement these algorithms yourself.

Applications. Graphs are widely used to store machine learning models and for mapping, such as plotting routes through online services. Software emulation of electrical circuits also uses graphs. Graph theory is also used in search algorithms and social networks—for example, to calculate the “reach” of a person’s friends. Graphs can be used to allocate resources within a system or to organize complex calculations.

Trees

Trees can be considered a special case of graphs. They are also structures of vertices and edges, but they have a tree-like format. The vertices of trees are called nodes. A single node can have several descendant vertices, but each node can only have one parent. This is how a tree-like hierarchical structure is created.

Several types of trees are used in programming. Most of them are implemented using graphs.

  • Binary search trees are the most common tree format. Each node can have no more than two children. If a child node’s value is less than its parent’s, it is placed on the left. If it is equal to or greater than, it is placed on the right. These trees are convenient for sorting data and searching for a desired value using binary search.
  • Prefix trees, also known as loaded trees or forests, are trees that store data in a chain-like fashion. Parent nodes are prefixes needed to navigate to descendants. As the tree is traversed, the complete data is collected. For example, each node contains a letter. Traversing from beginning to end yields words. Such trees are typically used to store repetitive data or, for example, to implement autocompletion as you type.
  • Binary heaps are filled binary trees. Each node has two children. Depending on the tree type, children must be strictly greater than, less than, or equal to their parents.

There are also N-ary trees, red-black trees, AVL trees, balanced trees, 2-3-4 trees, and so on.

Operations. Trees can be traversed bottom-up or top-down, breadth-first or depth-first. Special algorithms exist for this, just like for graphs. Since the location of a node in trees typically depends on its value, it’s impossible to arbitrarily add a node anywhere. Therefore, adding, removing, or modifying a node causes a rebalancing—a change in the structure based on the added value.

Applications: Trees are used to store information where a hierarchical structure is important. They can also be used for efficient sorting, searching for specific values, or storing repeated data, as in the case of prefix trees. Trees are often used in machine learning algorithms.

Data structures exist in every programming language. You’ll first encounter them early in your learning process, when you start working with arrays. As your program becomes more complex, you’ll encounter other structures—and perhaps even write your own implementations.


Explore More IT Terms


Share this term: Facebook X LinkedIn WhatsApp Email

Leave a Reply

Your email address will not be published. Required fields are marked *