GNU Astronomy Utilities manual


10.1 Why C programming language?

Currently, the programming languages most commonly used in scientific applications are C++ and, more recently, Python. One of the main reasons behind this choice is that, through the object-oriented programming paradigm, they offer a much higher level of abstraction. However, GNU Astronomy Utilities are written in the C programming language. The reasons can be summarized as simplicity, portability and speed. All three are very important in scientific software.

Simplicity can best be demonstrated by comparing the main books of C++ and C. The “C programming language”97 book, written by the authors of C, is only 286 pages long and covers a very good fraction of the language; it has also remained unchanged since 1988. C is the main programming language of nearly all operating systems and there is no plan for any significant update. The most recent “C++ programming language”98 book, also written by its author, has 1366 pages on the other hand, and its fourth edition came out in 2013! As discussed in Science and its tools, it is very important for other scientists to be able to readily read the code of a program at will, with minimum requirements.

In C++, inherited objects and their internal functions make the code very easy to write for a programmer who is deeply invested in those objects and understands all their relations well. But they simultaneously make reading the program extremely hard for a first-time reader (a curious scientist who only wants to know how a small step was done). Before understanding the methods, the scientist has to invest a lot of time in understanding those objects and their relations. But in C, if only simple structures are used, all variables can be given as basic language types (for example ints or floats), with pointers to them defining arrays. So when an outside reader is only interested in one part of the program, that part is all they have to understand.
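
As a hedged illustration of this point, the following minimal sketch shows a C function whose interface uses only a pointer to floats and a size. The name pixel_mean is hypothetical and is not taken from Gnuastro; it only shows that a reader interested in this one step needs to understand nothing beyond the function itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Gnuastro function): return the mean of
   the 'n' values that 'pixels' points to. Only basic types and one
   pointer are involved, so there is no object hierarchy to learn. */
double
pixel_mean(const float *pixels, size_t n)
{
  double sum = 0.0;
  size_t i;

  /* The caller must provide a valid array. */
  assert(pixels != NULL);

  /* An empty array has no mean; this sketch returns 0.0 by choice. */
  if (n == 0)
    return 0.0;

  /* Accumulate in double precision to limit rounding errors. */
  for (i = 0; i < n; ++i)
    sum += pixels[i];

  return sum / (double)n;
}
```

Calling pixel_mean on a four-element array that holds 1, 2, 3 and 6 returns 3.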

Recently it is also becoming common to write scientific software in Python, or in a combination of Python with C or C++. Python is a high-level scripting language which doesn’t need compilation. It is very useful when you want to do something on the go and don’t want to be halted by the troubles of compiling, linking, memory checking, etc. When the data sets are small and the job is temporary, this ability of Python is great and its use is highly encouraged. A very good example is plotting, for which Python is undoubtedly one of the best tools.

But as the data sets increase in size and the processing becomes very complicated, the speed of Python scripts decreases significantly. So when the program doesn’t change too often and is widely used in a large community, mostly on large data sets (like astronomical images), using Python will waste a lot of valuable research-hours. Some use Python as a wrapper for C or C++ functions to fix the speed issue. However, because such actions allow separate programs to share memory (through Python), the code in such programs tends to become extremely complicated very soon, which is contrary to the principles in Science and its tools.

Like C++, Python is object oriented, so as explained above, a reader needs a high level of experience with a particular program to fully understand its inner workings. To make things worse, since Python is mainly for fast and on-the-go programming, it constantly undergoes significant changes, such that Python 2.x and Python 3.x are not compatible. Lots of research teams that invested heavily in Python 2.x cannot benefit from Python 3.x or future versions any more. Some converters are available, but since they are automatic, lots of complications might arise in the conversion. Thus, re-writing the affected code by hand would be the only truly reliable option. If a research project begins using Python 3.x today, there is no telling how compatible its investments will be when Python 4.x or 5.x comes out. This stems from the core principles of Python, which are very useful on the ‘on the go’ basis described before, but not for future usage.

The portability of C is best demonstrated by the fact that both C++ and Python are part of the C-family of programming languages, which also includes Java, Perl, and many other languages. C libraries can be immediately included in C++, and with tools like SWIG99 it is easy to use C libraries in programs that are written in those other languages. This will allow other scientists to use the libraries in Gnuastro with any of those languages. Gnuastro’s libraries are currently static and not installed, but we are working on making them shared and installable100. Following that, we will be working on allowing the creation of libraries in different languages at configure time101.
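
As a rough sketch of what “immediately included in C++” means in practice, a C header commonly wraps its declarations in an extern "C" guard so that a C++ compiler links against the plain C symbol names. Everything named below (example_sum and the EXAMPLE_SUM_H guard) is hypothetical and only for illustration; it is not Gnuastro’s actual interface. In a real library the declaration and the definition would live in separate files, and they are shown together here only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* ---- What would normally be the public header ---- */
#ifndef EXAMPLE_SUM_H
#define EXAMPLE_SUM_H

/* A C++ compiler defines __cplusplus, so only it sees the linkage
   guard. A C compiler skips these lines entirely. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical library function: sum of 'n' double values. */
double example_sum(const double *values, size_t n);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_SUM_H */

/* ---- What would normally be the C source file ---- */
double
example_sum(const double *values, size_t n)
{
  double sum = 0.0;
  size_t i;

  /* A NULL pointer is only acceptable for an empty input. */
  assert(values != NULL || n == 0);

  for (i = 0; i < n; ++i)
    sum += values[i];

  return sum;
}
```

With such a guard in place, the same header can be included unchanged from both C and C++ source files.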

The final reason is speed. This is another very important aspect of C, and it is not independent of simplicity (the first reason discussed above). The abstractions provided by the higher-level languages (which also make learning them harder for a newcomer) come at the cost of speed. Since C is a low-level language102 (closer to the hardware), it is much less complex for both the human reader and the computer. The former was discussed above under simplicity, and the latter helps in making the program run more efficiently (faster). This allows for a closer relation between the scientist/programmer (program) and the actual data/processing. The GNU coding standards103 also encourage the use of C over all other languages when generality of usage and “high speed” are desired.


Footnotes

(97)

Brian Kernighan, Dennis Ritchie. The C programming language. Prentice Hall, Inc., Second edition, 1988. It is also commonly known as K&R and is based on the ANSI C and ISO C90 standards.

(98)

Bjarne Stroustrup. The C++ programming language. Addison-Wesley Professional, Fourth edition, 2013.

(99)

http://swig.org/

(100)

http://savannah.gnu.org/task/?13765

(101)

http://savannah.gnu.org/task/?13786

(102)

Low-level languages are those that directly operate the hardware, like assembly languages. So C is actually a high-level language, but it can be considered the lowest-level high-level language.

(103)

http://www.gnu.org/prep/standards/


GNU Astronomy Utilities manual, November 2015.