Data types and statically and dynamically typed languages in programming

In mathematics, logic, and computer science, a type theory is the formal presentation of a specific type system, and in general, type theory is the academic study of type systems. Some type theories serve as alternatives to set theory as a foundation of mathematics. Two influential type theories that were proposed as foundations are Alonzo Church’s typed λ-calculus and Per Martin-Löf’s intuitionistic type theory. Most computerized proof-writing systems use a type theory for their foundation. A common one is Thierry Coquand’s Calculus of Inductive Constructions.

The article “Artificial Anencephalon Speaks of Zen and Buddha Bodhisattva” further discusses the relationship between the types of words and their meaning in the context of Zen philosophy, drawing on Wittgenstein, who had read Russell’s “Principia Mathematica.”

A further mathematical approach relates to category theory, as described in “Gendai Shiso, July 2020, Special Issue: The World of Category Theory: Frontiers of Modern Mathematics (Reading Notes).”

Based on the May 2020 issue of Software Design, this article discusses data types in programming and statically and dynamically typed languages as the basics of how computers handle types.

How do computers handle data?

“Data types” are an integral part of programming languages. First, let’s look at how computers store “data.”

Basically, computers can only handle numbers. Characters, for example, are each assigned a number (a character code) and handled as that number. In decimal, when a digit goes past 9 it returns to 0 and a 1 is carried into the next digit; in binary, the carry happens when a digit goes past 1. Binary therefore needs only the digits 0 and 1, and an 8-digit binary number, for example, can represent the values 0 to 255.
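
A minimal C sketch of the idea above, assuming an ASCII-compatible character set. The helper names byte_to_binary and show_char_code are hypothetical, chosen only for illustration:

```c
#include <stdio.h>
#include <string.h>

/* Writes the 8-bit binary form of value into out (8 digits plus '\0'),
   most significant bit first. */
void byte_to_binary(unsigned char value, char out[9])
{
    for (int bit = 7; bit >= 0; bit--) {
        out[7 - bit] = ((value >> bit) & 1u) ? '1' : '0';
    }
    out[8] = '\0';
}

/* Prints a character, its character code, and that code in binary. */
void show_char_code(char c)
{
    char bits[9];
    byte_to_binary((unsigned char)c, bits);
    printf("'%c' -> %d -> %s\n", c, (unsigned char)c, bits);
}
```

On an ASCII system, show_char_code('A') reports the character code 65, whose binary form is 01000001, and byte_to_binary(255, ...) yields 11111111, the largest 8-bit value.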

The main memory of today’s computers is DRAM, which stores data as electric charge in a vast number of very small capacitors. Memory is usually managed in units of 8 bits, called a byte.

So far, only integers can be represented. Numbers with fractional parts are usually represented as floating-point numbers, which keep roughly the same relative precision regardless of the magnitude of the number. Today, the formats specified in the IEEE 754 standard are used almost everywhere: the 32-bit (4-byte) format is called single precision, and the 64-bit (8-byte) format is called double precision.
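
The difference between single and double precision can be observed directly. This sketch assumes a platform where float and double are the IEEE 754 32-bit and 64-bit formats, as on mainstream PCs; the function names are hypothetical:

```c
#include <stdio.h>

/* Prints the size of float and double and the value each actually stores
   for 0.1, which has no exact binary representation. */
void compare_precision(void)
{
    float f = 0.1f;
    double d = 0.1;
    printf("float : %zu bytes, 0.1 stored as %.20f\n", sizeof f, f);
    printf("double: %zu bytes, 0.1 stored as %.20f\n", sizeof d, d);
}

/* float has a 24-bit significand, so 2^24 + 1 = 16777217 cannot be stored
   exactly and is rounded to 16777216. Returns 1 if that happens. */
int float_loses_integer(void)
{
    float f = 16777217;   /* converted to the nearest representable float */
    return f == 16777216.0f;
}
```

The float line shows an error in the 9th decimal place, while the double stays accurate to about 16 significant digits.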

Memory is numbered byte by byte, and each byte’s number is called its address.
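
Because addresses count bytes, two adjacent int elements sit sizeof(int) bytes apart, while adjacent bytes differ by exactly one. A sketch with hypothetical helper names:

```c
#include <stddef.h>
#include <stdio.h>

/* Returns the distance in bytes between two adjacent int elements,
   measured by converting their addresses to byte (char) pointers. */
ptrdiff_t int_stride_in_bytes(void)
{
    int values[2] = {0, 0};
    return (char *)&values[1] - (char *)&values[0];
}

/* Prints the address of each byte in a small buffer; consecutive
   bytes have consecutive addresses. */
void print_byte_addresses(void)
{
    unsigned char buf[4] = {0};
    for (int i = 0; i < 4; i++) {
        printf("buf[%d] at %p\n", i, (void *)&buf[i]);
    }
}
```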

Types in C, Java, C#, etc.

As mentioned above, integers can be represented in binary, and floating-point numbers in the IEEE 754 format. Variables are used to store such values, and in statically typed languages such as C, C++, Java, and C#, variables have types. Typical types in C are listed below.

  • char : 1 byte
  • short : 2 bytes
  • int : 4 bytes
  • long : 8 bytes
  • float : 4 bytes
  • double : 8 bytes

C does not fix the exact size of these types: the standard guarantees only that sizeof(char) is 1 and sets minimum ranges for the others, so the values above are those used by today’s typical PC compilers (on 64-bit Windows, for example, long is 4 bytes). A variable is like a box that holds a value, and types such as char and short determine the size of that box.
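
To check the sizes on your own compiler, a sketch like the following prints them. The output depends on the platform; the only size the C standard fixes is sizeof(char) == 1:

```c
#include <stdio.h>

/* Prints the size in bytes of each basic type on the current platform. */
void print_type_sizes(void)
{
    printf("char  : %zu\n", sizeof(char));
    printf("short : %zu\n", sizeof(short));
    printf("int   : %zu\n", sizeof(int));
    printf("long  : %zu\n", sizeof(long));
    printf("float : %zu\n", sizeof(float));
    printf("double: %zu\n", sizeof(double));
}
```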

For example, the 32-bit pattern 00111111100011001100110011001101 means 1066192077 when interpreted as an integer (int), but 1.1 (more precisely, the float closest to 1.1) when interpreted as a floating-point number (float). Specifying the type lets the processor retrieve the value correctly.
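
This reinterpretation can be reproduced in C by copying the bytes of a float into a 32-bit unsigned integer. The sketch assumes a 32-bit IEEE 754 float; memcpy is used because it is the portable way to reinterpret bytes without breaking C’s aliasing rules, and the function names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Refuse to compile on platforms where float is not 32 bits. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "this sketch assumes a 32-bit float");

/* Returns the raw bit pattern of a float, read as an unsigned integer. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Inverse operation: builds a float from a 32-bit pattern. */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

On such a platform, float_bits(1.1f) returns 0x3F8CCCCD, which is 1066192077 in decimal.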

Enumerated types

For example, suppose you are writing a program that displays strings, such as a word processor or a browser, and you want to store in a variable whether each string is left-justified, centered, or right-justified within a line (its horizontal position).

In this case, if we decide that left-justified is 0, centered is 1, and right-justified is 2, we can express the horizontal position in a variable of type int. Instead of writing 0, 1, and 2 directly, we might also define constants, as in the following C code.

const int H_ALIGN_LEFT = 0;
const int H_ALIGN_CENTER = 1;
const int H_ALIGN_RIGHT = 2;

This works, but what we want to express here is only a “horizontal position,” and a horizontal position is not inherently an int. For such cases, C provides the enumerated type, “a type that takes one of several values.” In C, an enumerated type is declared as follows.

typedef enum {
  H_ALIGN_LEFT,
  H_ALIGN_CENTER,
  H_ALIGN_RIGHT,
} HorizontalAlignment;

Once declared this way, a variable of this horizontal position type (alignment below) can be declared as follows.

HorizontalAlignment alignment;

Types such as int and double are built into the language from the start, whereas enumerated types are defined by the programmer. Even though an enumerated type may internally be just an int, using one makes a program more readable.
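
As a sketch of how the enumerated type is then used, the hypothetical functions below take a HorizontalAlignment instead of a bare int, so the parameter itself documents which values are expected:

```c
typedef enum {
    H_ALIGN_LEFT,    /* 0 */
    H_ALIGN_CENTER,  /* 1 */
    H_ALIGN_RIGHT,   /* 2 */
} HorizontalAlignment;

/* Returns a readable name for an alignment value. */
const char *alignment_name(HorizontalAlignment alignment)
{
    switch (alignment) {
    case H_ALIGN_LEFT:   return "left";
    case H_ALIGN_CENTER: return "center";
    case H_ALIGN_RIGHT:  return "right";
    }
    return "unknown";  /* reached only if an out-of-range int was cast in */
}

/* Computes the x position of a string of width text_width inside a line
   of width line_width, according to the alignment. */
int aligned_x(HorizontalAlignment alignment, int line_width, int text_width)
{
    switch (alignment) {
    case H_ALIGN_CENTER: return (line_width - text_width) / 2;
    case H_ALIGN_RIGHT:  return line_width - text_width;
    case H_ALIGN_LEFT:
    default:             return 0;
    }
}
```

With GCC or Clang, -Wall enables -Wswitch, which warns when a switch on an enum misses one of its enumerators and has no default label, a check a plain int cannot get.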

Java did not have enumerated types until Java 5 (Tiger). As a result, AWT (Abstract Window Toolkit), Java’s old library for building GUIs, specifies the horizontal position of a label string as an integer, and the method for setting it is declared as follows.

public void setAlignment(int alignment)

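From this signature alone, a caller cannot tell which values are valid (AWT defines the constants Label.LEFT, Label.CENTER, and Label.RIGHT for this purpose), and the compiler accepts any int. Enumerated types, introduced in Java 5, address this problem.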

Composite types

The types so far have all been primitive types, in which a single variable represents a single value. In contrast, a type that contains multiple values is called a composite type. In languages such as C, Java, and C#, an array type contains multiple values of a single type.

Other composite types are structs in C and classes in C++, Java, and C#. Structs and classes combine values of several types into a new type. For example, a Person type that represents information about a “person” looks like this:

<In C>
typedef struct {
   char name[32];
   int age;
} Person;

<In Java>
class Person {
  String name;
  int age;
}

The Person type defined here holds a person’s name and age. In the C version, the name array is 32 bytes, so a name can be at most 31 bytes long, since the last byte is needed for the terminating null character ('\0').
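
Because the 32-byte name array must also hold the terminating '\0', code that fills it has to guard against longer names. A minimal sketch with a hypothetical person_init helper that truncates safely using snprintf:

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    char name[32];  /* up to 31 characters plus the terminating '\0' */
    int age;
} Person;

/* Initializes a Person, truncating names that do not fit. snprintf always
   writes a terminating '\0' when the buffer size is nonzero, so p->name
   stays a valid C string even for overlong input. */
void person_init(Person *p, const char *name, int age)
{
    snprintf(p->name, sizeof p->name, "%s", name);
    p->age = age;
}
```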

Value and reference types

Looking at the example above, C and Java do not appear to differ much, but declaring a variable p of type Person means something very different in each language.

In C, Person is a value type: a variable of type Person contains name and age itself. In terms of the box analogy, the variable itself is one large box. Therefore, once p is declared as follows, values can be stored in it immediately.

Person p;

/* Copy "C Taro" to p.name*/
strcpy(p.name, "C Taro");

/* Set p.age to 48 (age of C)*/
p.age = 48;
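
Value semantics also show up when a struct is copied: assignment copies every member, so changing the copy leaves the original untouched. A minimal sketch (copy_is_independent is a hypothetical name):

```c
#include <string.h>

typedef struct {
    char name[32];
    int age;
} Person;

/* Returns 1 if modifying a copied struct leaves the original unchanged. */
int copy_is_independent(void)
{
    Person p;
    strcpy(p.name, "C Taro");
    p.age = 48;

    Person q = p;          /* copies name[] and age into a separate box */
    q.age = 50;            /* changes only q */
    strcpy(q.name, "Copy");

    return p.age == 48 && strcmp(p.name, "C Taro") == 0;
}
```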

In Java, by contrast, a variable holds only a reference value that points to a “box.” So in Java, simply declaring p is not enough to store values in p.name and the other fields; you must allocate memory for a Person with new and make p point to it, as shown below.

Person p;

//Allocate memory for person and direct p to it
p = new Person();

//Point p.name to "Java Taro".
p.name = "Java Taro";

//Set p.age to 25 (age in Java)
p.age = 25;

This is illustrated in the following figure.

In Java, classes and arrays are reference types, and reference-type variables store only reference values. A reference value “points” to the target data and is drawn as an arrow in the figure above. Strings are also reference types, because they are instances of the String class. On the other hand, int and double are primitive types, whose variables hold the value directly.

C# supports both classes and structs, so in C# it is also possible to create variables that directly contain composite values such as a Person. Conversely, to handle a Person through a reference in C as Java does, you allocate its memory with the malloc() function instead of new:

Person *p = malloc(sizeof(Person));

Semantically, a reference is the same as a pointer, although Java, unlike C, does not allow arithmetic on references.
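
A sketch of Java-like reference behavior in C using malloc and pointers. person_new and pointer_copy_shares_object are hypothetical names; unlike in Java, the object has to be released explicitly with free():

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char name[32];
    int age;
} Person;

/* Allocates a Person on the heap, roughly like `new Person()` in Java.
   Returns NULL if allocation fails. */
Person *person_new(const char *name, int age)
{
    Person *p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    strncpy(p->name, name, sizeof p->name - 1);
    p->name[sizeof p->name - 1] = '\0';
    p->age = age;
    return p;
}

/* Copying a pointer copies only the address (the "reference value"), so both
   pointers refer to the same Person. Returns p->age after changing it via q. */
int pointer_copy_shares_object(void)
{
    Person *p = person_new("Java Taro", 25);
    if (p == NULL) return -1;
    Person *q = p;     /* q and p now hold the same address */
    q->age = 30;       /* visible through p as well */
    int result = p->age;
    free(p);           /* no GC in C: heap memory is freed by hand */
    return result;
}
```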

A C pointer type stores a memory address. On today’s 64-bit PCs, a pointer is 8 bytes (64 bits), and reference types in languages such as Java are likewise implemented by storing addresses. The memory used by modern programming languages can be roughly divided into the static area, the stack, and the heap.

The static area holds the values of global variables and of variables declared static. In principle, it exists from the start of the program to its end.
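
A minimal sketch of the static area: the hypothetical next_id function keeps its static counter across calls, alongside an ordinary global variable:

```c
/* Global variable: lives in the static area for the whole program run. */
int global_calls = 0;

/* Returns 1, 2, 3, ... on successive calls. The static local is
   initialized only once and keeps its value between calls, because it
   is stored in the static area rather than on the stack. */
int next_id(void)
{
    static int counter = 0;
    global_calls++;
    return ++counter;
}
```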

The stack, whose structure is described in “Basic Algorithms for Graph Data (DFS, BFS, Bipartite Graph Determination, Shortest Path Problem, and Minimum Spanning Tree),” is the area that holds local variables. Local variables are needed only until their method returns, so the stack grows when a method is called and shrinks again when it returns. Keeping local variables on the stack means no space is held for the locals of methods that are not currently running, and it also makes recursive calls possible, since each call gets its own set of locals.
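
The following sketch illustrates why the stack allows recursion: every call of the hypothetical sum_to function gets its own frame with its own n. show_frames prints the address of a local at each depth; the direction in which the stack grows depends on the platform:

```c
#include <stdio.h>

/* Computes 1 + 2 + ... + n recursively. Each call has its own copy of n
   and partial on the stack; the frame is discarded when the call returns. */
long sum_to(int n)
{
    if (n == 0) return 0;
    long partial = sum_to(n - 1);  /* the inner call cannot touch this n */
    return partial + n;
}

/* Prints the address of a local variable at each recursion depth. */
void show_frames(int depth)
{
    int local = depth;
    printf("depth %d: local at %p\n", depth, (void *)&local);
    if (depth > 0) show_frames(depth - 1);
}
```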

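The heap is the area from which memory is allocated by the malloc() function in C, or by new in Java and C#. In garbage-collected languages such as Java and C#, an object on the heap is considered no longer needed once it can no longer be reached by following references from the static area or the stack.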
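
The key property of the heap is that memory outlives the function that allocated it. In this sketch, the hypothetical make_squares returns a heap array, which the caller must later free because C has no garbage collector:

```c
#include <stdlib.h>

/* Returns a heap-allocated array holding 0, 1, 4, 9, ... (n elements),
   or NULL on allocation failure. A local array would vanish when this
   function returns, so returning its address would be a bug (a dangling
   pointer); heap memory stays valid until free() is called. */
int *make_squares(size_t n)
{
    int *squares = malloc(n * sizeof *squares);
    if (squares == NULL) return NULL;
    for (size_t i = 0; i < n; i++) {
        squares[i] = (int)(i * i);
    }
    return squares;
}
```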

Note that the word “heap” also names a different concept, the heap data structure: a tree with the constraint that child elements are always greater than or equal to (or always less than or equal to) their parent. Heap sort exploits this property, as described in “Data Sorting.”
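
For reference, here is a compact heap sort in C using a max-heap stored in an array (each parent greater than or equal to its children). It is a standard textbook formulation, not necessarily the one in “Data Sorting”; the names heap_sort and sift_down are our own:

```c
#include <stddef.h>

/* Restores the max-heap property for the subtree rooted at index root,
   considering only the first n elements. Children of i are 2i+1 and 2i+2. */
static void sift_down(int a[], size_t n, size_t root)
{
    for (;;) {
        size_t largest = root;
        size_t left = 2 * root + 1;
        size_t right = left + 1;
        if (left < n && a[left] > a[largest]) largest = left;
        if (right < n && a[right] > a[largest]) largest = right;
        if (largest == root) return;
        int tmp = a[root]; a[root] = a[largest]; a[largest] = tmp;
        root = largest;
    }
}

/* Sorts a[0..n-1] in ascending order: build a max-heap, then repeatedly
   move the root (the maximum) to the end and shrink the heap by one. */
void heap_sort(int a[], size_t n)
{
    if (n < 2) return;
    for (size_t i = n / 2; i-- > 0;) {
        sift_down(a, n, i);
    }
    for (size_t end = n - 1; end > 0; end--) {
        int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
        sift_down(a, end, 0);
    }
}
```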

In C, arrays and structs can be placed directly in the static area or on the stack, but in Java, arrays and class objects are all allocated on the heap. The figure above draws objects on the heap as floating circles, but in real memory they are laid out in a one-dimensional address space numbered byte by byte. An object’s address is its reference value, and that is what a reference-type variable stores.

From this perspective, the heap is much harder to manage than the static area, which always exists, or the stack, which simply grows and shrinks with method calls. Objects on the heap become unnecessary in no particular order, so the language implementation must keep track of which parts of memory are currently in use and which are free; Java and C# do this automatically with a garbage collector (GC). Rust, which has attracted attention in recent years, takes a different approach that avoids garbage collection: its compiler enforces ownership and borrowing rules through what is called the “borrow checker” (I will discuss Rust separately).

Because a reference-type variable holds only a reference value, assigning one such variable to another copies only the reference value. Thus, the following Java program displays “a[1]..10”.

int[] a = new int[]{1, 2, 3};
int[] b = a;
b[1] = 10;
System.out.println("a[1].." + a[1]);

The program assigns 10 only to b[1], yet a[1] also becomes 10 even though nothing was done to array a. This happens because a and b point to the same array, as shown in the figure below; this is called the alias problem.
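
The same alias problem occurs in C when a pointer refers to an array. This sketch (hypothetical function names) contrasts it with copying the elements, which gives b its own storage:

```c
#include <string.h>

/* b points at the first element of a, so writing through b modifies a.
   Returns a[1] after the write. */
int alias_demo(void)
{
    int a[] = {1, 2, 3};
    int *b = a;     /* nothing is copied: b refers to a's storage */
    b[1] = 10;
    return a[1];
}

/* Copying the elements avoids aliasing: b is an independent array.
   Returns a[1] after writing to b[1]. */
int copy_demo(void)
{
    int a[] = {1, 2, 3};
    int b[3];
    memcpy(b, a, sizeof a);
    b[1] = 10;
    return a[1];
}
```

alias_demo() returns 10, just like the Java program, while copy_demo() returns the untouched value 2.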

Everything so far applies to “statically typed” languages such as C, Java, and C#.

Dynamically typed languages

Languages such as JavaScript, Ruby, and Python are “dynamically typed” languages. In a nutshell, their variables have no types, so a variable can store a value of any type (the values themselves still have types). For example, in Ruby you can write code like the following:

class Person
  attr_reader :name, :age

  def initialize(name,age)
    @name = name
    @age = age
  end
end

# Assign an integer value to a
a = 5
# Assign string to a
a = "abc"
# Assign a Person object to a
a = Person.new("RubyTaro", 25)

The same variable a is assigned an integer, a string, and a Person object in turn.

Earlier, we said that a variable is “like a box” and that types such as char and short determine the size of the box. If a value of any type can be stored in a variable, the size of the box cannot be fixed in advance, so every variable can be thought of as a reference type.
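
One way to picture a variable that can hold any type is a “tagged” value that carries its type at run time. The following C sketch is a simplified model, not how Ruby is actually implemented; Value, ValueType, and print_value are hypothetical names:

```c
#include <stdio.h>

typedef struct {
    const char *name;
    int age;
} Person;

/* The type is recorded in the value itself and checked at run time. */
typedef enum { VAL_INT, VAL_STRING, VAL_PERSON } ValueType;

typedef struct {
    ValueType type;
    union {
        long i;
        const char *s;
        Person *p;       /* objects are held by reference */
    } as;
} Value;

/* Dispatches on the run-time tag to print whatever the value holds. */
void print_value(Value v)
{
    switch (v.type) {
    case VAL_INT:    printf("%ld\n", v.as.i); break;
    case VAL_STRING: printf("%s\n", v.as.s); break;
    case VAL_PERSON: printf("Person(%s, %d)\n", v.as.p->name, v.as.p->age); break;
    }
}
```

Mirroring the Ruby example, one Value variable can hold 5, then "abc", then a pointer to a Person, as long as the tag is updated together with the payload.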

A reference only points to an object in memory, so its size is constant no matter what it points to. In this sense, Ruby and Python have no concept of primitive types. However, if every value were a heap object reached through a reference, the heap would fill up quickly and the GC would have more work reclaiming memory. Many implementations, such as Ruby’s, therefore avoid heap allocation for frequently used values like small integers and store them directly in the variable.
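
The trick of storing small integers directly in the variable can be sketched with a tagged machine word. This is a hypothetical simplification, similar in spirit to CRuby’s small-integer encoding: if the lowest bit is 1, the rest of the word is an integer; if it is 0, the word is the address of a heap object, which is at least 2-byte aligned. It assumes two’s-complement integers and an arithmetic right shift, as on mainstream compilers:

```c
#include <stdbool.h>
#include <stdint.h>

/* One machine word that holds either an immediate integer or a pointer. */
typedef uintptr_t Value;

/* Encodes n as (n << 1) | 1; n must fit in one bit less than a word. */
Value int_to_value(intptr_t n)   { return ((uintptr_t)n << 1) | 1u; }

/* Lowest bit 1 means "immediate integer", 0 means "pointer". */
bool value_is_int(Value v)       { return (v & 1u) != 0; }

/* Decodes by shifting the tag bit out (arithmetic shift keeps the sign). */
intptr_t value_to_int(Value v)   { return (intptr_t)v >> 1; }

/* Aligned object addresses already end in 0, so they need no encoding. */
Value ptr_to_value(void *obj)    { return (uintptr_t)obj; }
void *value_to_ptr(Value v)      { return (void *)v; }
```

An integer stored this way never touches the heap, so creating and discarding it costs the GC nothing.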

In the next article, we will discuss the difference between statically and dynamically typed languages.
