Algorithms and data structure in Java
Software development is always connected with the processing and storage of data. We process, modify or store in the memory of this data.
As engineers, most likely we use the term data, instead of information.
For data storage, we use data structure, for its processing – algorithms. Regardless of the programming language, domain area, we always face two things: data structure and algorithms.
The standard of the programs we create depends directly on how significantly we understand them.
The objective of this article is to give a simple understanding of what data structures are and what a protocol is. We look at the method of comparison of methods in time and storage used.
Data structure
Very first, let’s turn to the meaning of the data structure:
The data structure is a software unit which allows you to store and process many of the same type or logically related data in computing.
As we see, all of this is extremely abstract and we want more specifics.
An example of a data structure can be either a familiar array, or a list, or the Developer class, which allows you to store data associated with a developer, etc.
It means that data structure can be represented in the form of some kind of data warehouse, connected logically.
But for us, as engineers, it is important not only to store data but also to work with them. This means that we need to have an efficient and convenient way of processing information that is stored in these data structures.
Thus, we come to such a concept as an algorithm.
Algorithm
Turning to Wikipedia, we will see the following definition:
The algorithm is a set of instructions that describe the procedure for the performer to achieve a certain result.
But, from the point of view of software development, we get additional requirements for the algorithm:
-
It has a finite number of steps.
-
Contains clear and understandable instructions.
-
Gives the result
To describe the algorithm, there are 2 generally accepted trips:
-
Block diagram
-
Pseudocode
Flow chart
The flow chart is a visual description of the algorithm using certain characters.
Let’s consider an example of constructing a flowchart for linear search.
It means we have an array of integers and want to find a component in it by value. To do this, we iterate over the array and compare each component with the desired value.
If an item is found, we return its index; if not, we report it.
It is worth noting that the flowcharts are convenient for describing algorithms, but one should not use them to describe the operation of a big system as a whole. For these purposes, UML (Unified Modeling Language) is much better suited.
Now let’s try to describe this algorithm using pseudocode.
Pseudocode
Pseudocode is an algorithm description language that uses the keywords of programming languages but omits the details and specific syntax.
Task description remains the same.
METHOD linearSearch(integerArray, searchItem):
START METHOD
FOR index FROM 0 -> length(integerArray):
IF integerArray[index] == searchItem
THEN RETURN index
END IF
END LOOP
RETURN -1
END METHOD
Even as we can see, pseudocode is usually extremely just like ordinary programming language and is not well suited for compilation.
However, a creator who reads it may easily understand what needs in order to be implemented.
If we all are talking about commonly used data structures (array, checklist, graph), then you are usually unlikely to often create your own personal data structures (except classes) and algorithms regarding basic businesses (write, study, delete, etc. ). Yet, you will require the ability in order to opt for the most efficient methods and data structures for each and every specific situation.
Search algorithms
For basic research of algorithms, an incredibly useful data structure this kind of as an array.
The array is a data structure in the sort of the set of pieces (array elements) found in memory straight one after the additional, which allows accessing factors with a numerical index.
The particular most primitive search protocol is a linear search (the pseudocode which we regarded above).
Let’s consider the following example:
class LinearSearchDemo {
public static void main(String args[]) {
int[] integerArray = {800, 345, 977, 40, 12, -183, 234, 15};
int elementToFind = 234;
System.out.println("Element " + elementToFind + " found, index: " + linerSearch(integerArray, elementToFind));
}
public static int linerSearch(int[] integerArray, int key) {
int arraySize = integerArray.length;
for (int i = 0; i < arraySize; i++) {
if (integerArray[i] == key) {
return i;
}
}
return -1;
}
}
Within this case, the subsequent happens:
-
Within the linearSearch method, we take an array and the value we have been looking for.
-
Begin sorting through the array’s elements with a loop
-
Compare 234 and 800 – all of us get false
-
Compare 234 and 345 – all of us get false
-
Compare 234 and 977 – all of us get false
-
Compare 234 and 40 – all of us get false
-
Compare 234 and 12 – all of us get false
-
Compare 234 and -183 – all of us get false
-
Compare 234 and 234 – all of us get true
-
We return the element index – 6
The worst case, when the needed element is not in the array. In this case, we iterate over all the sun and rain. All those. linear complexity – O(n).
Let’s consider the following example:
class BinarySearchDemo {
public static void main(String args[]) {
int[] integerArray = {-183, 12, 15, 40, 234, 345, 800, 977800, 345, 977};
int elementToFind = 977800;
System.out.println("Element " + elementToFind + " found, index: " + binarySearch(integerArray, elementToFind, 0, integerArray.length - 1));
}
public static int binarySearch(int[] sortedIntegerArray, int elementToFind, int low, int high) {
if (low > high) {
return -1;
}
int mid = low + (high - low) / 2;
if (elementToFind < sortedIntegerArray[mid]) {
return binarySearch(sortedIntegerArray, elementToFind, low, mid - 1);
} else if (elementToFind > sortedIntegerArray[mid]) {
return binarySearch(sortedIntegerArray, elementToFind, mid + 1, high);
} else {
return mid;
}
}
}
Here, we can see the following:
-
In the binarySearch method, we accept an ordered (sorted) array of integers.
-
Check that low is less than high (low = 0, high = 9) – we get false
-
We get the middle of the array – 234
-
Compare 234 and the desired value of 977800 – 977800 more.
-
We cut the left side of the array and get the new low – 5 and high – 9.
-
New median – 977800
-
We fall into the else condition – we return the index 7.
The most important factor when choosing algorithm needs to be an understanding of how this works and exactly what strengths plus weaknesses it has.
Choice of data structure and algorithm
Usually, when selecting data structures and methods for working with information in them, we stability between saving memory (data structures) and processor load (algorithms). Most often, the particular less memory we make use of, the higher the PROCESSOR load and vice versa.
And here comes the idea of algorithm efficiency. Some methods are more efficient regarding a specific situation, several less.
It is far from totally right to use the particular calculation time to determine the particular efficiency of the protocol since it is extremely dependent upon the hardware and the particular amount of data. Consequently, it is customary in order to use the idea of algorithm difficulty.
memory complexity — just how much memory is going to be needed for calculations
time complexity – how long will certainly we do the calculations
Both of these troubles are highly dependent on the input data. Usually, they may be denoted as and.
Suppose we have a few an array of integers [10, 6, 4, 23, 87, 1, 1, 1, 100] and we would like to get the sum of all the components of the array.
class IntegerArraySumDemo
public static void main (String [] args)
int [] integerArray = 10,6,4,23,87,1,1,1,1004;
int sum = 0;
for (int i = 0; i <integerArray.length; i ++)
sum + = integerArray [i];
System.out.println (sum);
In cases like this, the number of iterations depends on the number of array’s elements. It means that if we all have 10 elements, we all have to get accessibility to the array plus perform the addition operation 10 times. If a hundred items – 100 times.
Such complexity is known as linear and is denoted by O(n).
Being a further example, let’s look at the situation when we need to access an array’s element by index:
class GetIntegerArrayElementByIndex {
public static void main(String[] args) {
int[] integerArray = {10, 6, 4, 23, 87, 1, 12, 1004};
int indexOfRequiredElement = 5;
if (indexOfRequiredElement < integerArray.length) {
int element = integerArray[indexOfRequiredElement];
System.out.println(element);
}
}
}
In this case, regardless of the number of elements in the array, we perform a fixed number of operations – roughly speaking, these are 3 operations (checking the index, getting the element by index, output to the console). This number of operations will remain the same with 100 elements in the array, and at 1,000,000.
Such complexity is called constant and is denoted by O (1).
It should be noted that if the number of operations is always 100, the designation will still be O (1).
And if we are O (10n) complexity, then the final notation will be O (n). This is called simplifying O-notation.
Examples of time complexity
Suppose we have a company. It has departments (a unique name) and employees (not unique names). Both employees and departments have an identifier (unique) and detailed information. Entities are ordered by name. We can drive into the list only one parameter in the search – the name. And in our system, intermittent failures occur (the search will fly, then the sort).
-
O (1) – search for detailed information of the department by name. Since we know for sure that the name of the department is unique, by typing the name of the department into the search, we will get all the data on it.
-
O (1) best case – we are looking for an employee by name and the first employee in the list is the one we need.
-
O (n) – search for employee details by name. Sort broke. In the worst case, in our system, only employees and all employees have the same name (the world of Edward). In this case, we are directly dependent on the number of employees in the system. We will have to go through all the employees and compare by identifier and look for detailed information.
-
O (log n) – we broke down the search and we just see the pages with employees and departments. Search by name Cyril. Suppose we have 100 pages, each of which has 50 employees/departments. Order by name is preserved. We open to page 51 and look at the first entry and see that this is Oleg. This means pages 51-100 do not fit exactly. We turn to page 25 – first entry – Igor. Drop the pages 1-25. We open to page 13 and the first name is Kirill with the identifier we need.
This way, we finish discussing the algorithms and data structure.