Programming language

From Citizendium
Revision as of 17:19, 10 May 2010 by imported>Johan Förberg (Changed title of section 'type systems')
This article is developing and not approved.

A programming language is a human-readable lexicon and grammar that a programmer uses to instruct a computer how to operate. Programs written in a programming language have to be translated into machine code, usually by a compiler program. Machine code consists of low-level instructions that the computer's processor can execute directly. Use of a programming language allows programmers to work at a higher level than machine code (which is not readily human-readable).

Language categories

The following are some of the ways that people have categorized different computer programming languages, although there is not always agreement on the precise meaning of the categories, or which languages belong in them. This article will attempt to describe the more common contradictory uses of the following terms.

Compiled vs. interpreted

One way in which various programming languages have traditionally been categorized is as compiled vs. interpreted languages. The traditional view was that compiled languages were first translated, by a compiler program, from human-readable source code into binary machine code. Some widely used early languages such as Fortran and C use pure compilation.

Conversely, interpreted languages rely on a special runtime application, called the interpreter, to read the source code and carry out its instructions while the program runs. An example of an early purely interpreted language is Snobol. Purely interpreted programs tend to execute more slowly because the interpreter must analyze the source code while the program is "executing". HTML is sometimes described as a special-purpose interpreted language, although it describes documents rather than computations; its interpreter is the web browser, which parses the HTML and renders a web page for display to a user based on the HTML code.

The division between compiled languages and interpreted languages has blurred since the 1990s with the advent of hybrid platforms such as Java and the .NET Framework (C# and VB.NET). These hybrid languages are compiled down to an intermediate language at the time a program is built (Java bytecode and Microsoft Intermediate Language, later renamed Common Intermediate Language, respectively). When the program is later run, the intermediate code is loaded into a sophisticated, optimized runtime engine for execution. Such runtime engines could be implemented as interpreters (early ones were), but nowadays they typically use Just-In-Time compilers to generate native machine code from the intermediate language on an as-needed basis. So multiple compilers are involved: one used by programmers to create programs that contain intermediate code, and another used at runtime to "interpret" the intermediate language (or in actuality, to just-in-time compile portions of intermediate code to native code on the fly as needed).
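The two-stage model can be observed in CPython, the reference implementation of Python, which compiles source text to bytecode and then runs that bytecode on a virtual machine. The following is a minimal sketch using the standard compile, exec and dis facilities; the source string and the variable names are made up for illustration, and the instructions listed vary between Python versions.

```python
import dis

source = "total = price * quantity"

# Stage 1: translate the source text into a code object holding bytecode.
code = compile(source, "<example>", "exec")

# Inspect the intermediate instructions (their names differ between versions).
instructions = [ins.opname for ins in dis.get_instructions(code)]

# Stage 2: the virtual machine executes the bytecode in a given namespace.
namespace = {"price": 4, "quantity": 3}
exec(code, namespace)

print(instructions)
print(namespace["total"])
```

The same code object can be executed many times without recompiling the source, which is the main benefit of the intermediate step.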

High-level vs. low-level

Another way in which programming languages are sometimes categorized is into "high-level" versus "low-level" languages. "High-level" programming languages have one high-level command or statement corresponding to many machine code instructions. "Low-level" programming languages, including especially assembly languages, may have approximately one human-readable instruction per binary machine instruction. A "high-level" language may also sometimes be called "low-level" if it permits a programmer to perform certain (possibly risky) hardware or operating system operations. C is technically "high-level" but is sometimes regarded as "low-level" as well because it imposes few, if any, restrictions on what a programmer can do in terms of accessing the computer's raw hardware capabilities.

General purpose vs. special purpose

A third categorization for programming languages is whether the language is "general purpose" or "special purpose". A language is considered general-purpose if any program at all can be coded in the language. Conversely, if the language is targeted towards making certain kinds of things possible, but does not do everything that other languages might, it is considered "special purpose". Examples of general-purpose languages are Fortran, C, Java and C#. An example of a special-purpose programming language is SQL (used to interact with database programs).

Markup languages (special purpose)

Markup languages contain a lexicon and grammar, but they are limited in purpose. Their purpose is to mark up text information into segments, and label each segment so that another program, sometime in the future, can "render" or display this information in a useful manner (instead of as one large blob of text). Examples commonly cited are HTML, LaTeX, SGML, XML and PostScript, although LaTeX and PostScript are also full programming languages in their own right. HTML marks up information intended to be displayed later in a web browser; HTML tells the browser where paragraphs begin and end, which text to make into hyperlinks (and the target for those), what color to make the background, and things like that. Web browsers later "interpret" the markup commands within HTML pages and then format the page for display to human readers.

Markup languages often express more than simply the display of documents; they can also express their meaning or role. HTML allows for the expression of some semantic information regarding the meaning of the text on the page. This is slowly growing with the use of microformats and RDFa, and allows parsers to do more intelligent things with content on the Web, such as extracting telephone numbers or event details and loading them into software designed for handling and tracking calendar events and contacts.

PostScript commands are used to tell printers how to print documents; printers act as the "interpreter" for PostScript commands embedded within documents to be printed.

PDF is a derivative of PostScript and serves many of the same functions but now can be embedded with JavaScript and other features. XML takes the markup approach one step further. Not only can it be used for human-readable presentations, but it also provides a simple, consistent format that other programs can use to store and transfer data across platforms. There are special-purpose languages which are used to define the structure of XML-based languages (namely DTDs, XSD and RELAX NG schemas) as well as the transformation of one XML-based language into another (XSLT).
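As a rough illustration of XML used as a data format rather than for display, the sketch below parses a small document with Python's standard xml.etree.ElementTree module and extracts telephone numbers. The element names (contacts, person, name, tel) and the sample values are invented for this example.

```python
import xml.etree.ElementTree as ET

# A made-up XML document: the tags label each segment of the data.
document = """
<contacts>
  <person><name>Ada</name><tel>555-0100</tel></person>
  <person><name>Max</name><tel>555-0199</tel></person>
</contacts>
""".strip()

root = ET.fromstring(document)

# Because every segment is labelled, a program can pick out exactly what it needs.
phone_book = {
    person.findtext("name"): person.findtext("tel")
    for person in root.findall("person")
}
print(phone_book)
```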

Object-oriented, procedural and functional

Java is an example of a strict object-oriented language. Every method (function) and every attribute (variable) must live within some class; Java allows no global variables or functions, although it does retain a few primitive types that are not objects. By contrast, Python and C++ both provide objects but do not require their use; such languages are often called multi-paradigm. Objects can be composed out of other objects, and an object can be based on an existing object using inheritance. Thus one avoids 'reinventing the wheel' to solve general problems. In modern times, most large programming projects use object-oriented programming methods to manage complexity and to tame side effects. Note that nearly any language can be used with an object-oriented methodology. With great effort, C or even assembly language (for example, the GEOS project) can use object techniques. A modern programming language that takes the idea of object orientation further than Java is Ruby.
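Composition and inheritance can be sketched in a few lines of Python. The class names Engine, Vehicle and Truck and their attributes are hypothetical and serve only to illustrate the two relationships.

```python
class Engine:
    def __init__(self, power_kw):
        self.power_kw = power_kw


class Vehicle:
    """Base class holding behaviour shared by all vehicles."""

    def __init__(self, name, engine):
        self.name = name
        self.engine = engine  # composition: a Vehicle has an Engine

    def describe(self):
        return f"{self.name} ({self.engine.power_kw} kW)"


class Truck(Vehicle):  # inheritance: a Truck is a Vehicle
    def __init__(self, name, engine, payload_t):
        super().__init__(name, engine)
        self.payload_t = payload_t

    def describe(self):
        # Reuse the inherited behaviour and extend it.
        return f"{super().describe()}, carries {self.payload_t} t"


truck = Truck("Hauler", Engine(300), 12)
print(truck.describe())
```

Truck reuses the constructor and the describe() method of Vehicle instead of reimplementing them, which is the "avoid reinventing the wheel" benefit mentioned above.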

An alternative approach to programming, which does not rule out the others, is functional programming. In functional programming, a program is regarded as a set of functions which live in their own bubbles and return a well-defined value for each set of arguments, and which try to change the state of the program as little as possible. This can be compared with object-oriented programming, where functions typically act on the state of objects and that state is often hidden from the programmer in private variables. The idea is that a problem can be reduced to a set of functions which do simple tasks and do not interact with each other more than absolutely required, reducing the risk of errors. This shares some of the positive effects of object-oriented programming, including reusability and managing complexity. Python is an example of a language that supports a functional style, while languages such as Haskell are built around it.
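The contrast between hidden state and pure functions can be sketched as follows. The Counter class and the helper functions square and is_even are invented for this example.

```python
from functools import reduce


class Counter:
    """Object-oriented style: the running total is hidden state."""

    def __init__(self):
        self._total = 0

    def add(self, n):
        self._total += n  # the result depends on everything added before
        return self._total


def square(n):
    """Pure function: the result depends only on the argument."""
    return n * n


def is_even(n):
    return n % 2 == 0


counter = Counter()
first_call = counter.add(2)
second_call = counter.add(2)  # same argument, different result

numbers = [1, 2, 3, 4, 5, 6]
# Functional style: combine small functions without modifying `numbers`.
sum_of_even_squares = reduce(
    lambda acc, n: acc + n, map(square, filter(is_even, numbers)), 0
)
print(first_call, second_call, sum_of_even_squares)
```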

Different type systems

Static type-checking

In a statically typed language, almost all types are determined at compile time by the compiler. The compiler tries to ensure that a function which expects arguments of a certain type will never be called with variables of an unexpected type, and that variables are not accidentally cast into other types (potentially losing information or producing strange effects). This gives rather good type safety, at the expense of requiring more explicitness from the programmer and making programs less flexible. An example of code which should raise an error in a statically typed system:

func foo(int i, str s):
    print i, s
str physicist = "Max Planck"
float hbar = 1.05
foo(hbar, physicist) // Wrong because foo() expects (int, str) and gets (float, str)

This might not always be for the best, since it is very possible that foo() could have used a decimal number (a 'float') as well as an integer. This potentially requires the programmer to write two or more near-identical functions for essentially the same purpose. Examples of statically typed languages are C, C++ and Java.
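One common way around the duplication is to declare a parameter type that covers several concrete types. The sketch below uses Python's optional type annotations, which the interpreter itself does not enforce; the assumption here is that a separate static checker (mypy is one such tool) is run over the code. The alias name Number is hypothetical. Had the parameter been annotated as int, such a checker would be expected to reject the call with hbar, mirroring the error in the pseudocode above.

```python
from typing import Union

# Hypothetical alias for this sketch: a value that is an int or a float.
Number = Union[int, float]


def foo(i: Number, s: str) -> str:
    # One definition serves both numeric types.
    return f"{i} {s}"


physicist = "Max Planck"
hbar = 1.05

line = foo(hbar, physicist)   # accepted: float is part of Number
other = foo(3, physicist)     # accepted: int is part of Number
print(line)
print(other)
```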

Dynamic type-checking

A dynamic type-checking scheme, as opposed to a static one, makes fewer assumptions at compile time about how the variables will be used. A principle known as duck typing is often applied: "Whatever walks like a duck and quacks like a duck, is a duck." This means that type safety is determined based upon usage in the code. For example, the print() function in the above example might call the variable.tostr() function of each argument to determine its string representation. This would not be a problem at all, since (in this example) both int.tostr() and float.tostr() are proper, well-defined functions. The result is more flexible code, at the expense of putting some of the burden of type checking on the programmer and increasing the risk of strange behaviour due to unexpected types. Note that this does not imply that values in dynamically typed systems lack well-defined types (that would be weak typing): the type of each value is known precisely at run time, and in a strongly typed dynamic language operations such as this one will fail:

s = "3" // at run time, s holds a value of type str
n = 2 // at run time, n holds a value of type int
s * n // In this hypothetical language the str type has no facility for 'multiplication' => run-time error (a duck does not know how to 'moo'!)

Examples of dynamically typed languages are JavaScript, Python and Ruby.
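Duck typing can be sketched in Python as follows. The classes Duck and Robot and the function announce are invented for illustration; the point is that announce never declares what type it expects.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def announce(thing):
    # No declared parameter type: anything with a speak() method is accepted.
    return thing.speak().upper()


sounds = [announce(Duck()), announce(Robot())]

try:
    announce(42)  # an int has no speak() method
    failure = None
except AttributeError as err:
    # The mismatch is only detected when the line actually runs.
    failure = type(err).__name__

print(sounds, failure)
```

Duck and Robot share no base class, yet both work with announce; the unsuitable argument is caught only at run time, which is the trade-off described above.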

Strong and weak types

Perl is weakly typed and allows a value to be treated as a number or as a string depending on the operators involved. By contrast, strict type checking at compile time in a strongly typed language such as Java can help one avoid many errors. Having a strongly typed language does not necessarily mean that the type must be declared explicitly. In Java, one might write:

String x = "foo";

This would explicitly set x's type to String. But other languages like Scala and C# 3.0 allow the compiler to infer the type, rather than requiring an explicit type definition from the programmer.
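Strong typing without declarations can also be seen in Python, which is dynamically but strongly typed: no types are written down, yet a string and a number are not silently mixed. A minimal sketch, with variable names chosen only for illustration:

```python
s = "3"  # no declaration: the value carries its own type (str)
n = 2    # int

try:
    s + n  # a strongly typed language refuses to guess what is meant
    mixed = "allowed"
except TypeError:
    mixed = "rejected"

# The programmer must state the intended conversion explicitly.
as_number = int(s) + n   # numeric addition
as_text = s + str(n)     # string concatenation

print(mixed, as_number, as_text)
```

In a weakly typed language such as Perl, the operator alone would decide whether the operands are treated as numbers or as strings.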

Type casting

Casting is the process by which a value is converted or re-interpreted into another type. For example, it might be necessary to cast a decimal number into an integer. Most systems handle this by simply cutting off the decimal part, that is, truncating toward zero (which is the same as flooring only for non-negative numbers). Casting is often destructive, causing loss of information. This is less of a problem in weakly typed languages where some 'logic' is built into the compiler or interpreter, which effectively converts between types as needed.
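The difference between truncation and flooring, and the loss of information, can be checked with a short Python sketch (the variable names are illustrative only):

```python
import math

truncated_positive = int(2.7)       # fractional part cut off
truncated_negative = int(-2.7)      # truncation goes toward zero
floored_negative = math.floor(-2.7) # flooring goes toward minus infinity

# Converting back does not restore the lost fractional part.
round_trip = float(int(2.7))

print(truncated_positive, truncated_negative, floored_negative, round_trip)
```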

A special case is the void pointer, present in the C programming language among others. This is a pointer which can point to an object of any type, and which can be converted to any other object pointer type. This code is valid, although not practical, C:

#include <stdio.h>
int main(void) {
   int x = 10; int *px = &x; // there exists an int x and an int-pointer px which points to x
   void *vp = (void*)px; // there exists a void-pointer vp which points identically to px
   char *cp = (char*)vp; // there exists a char-pointer cp which points identically to vp
   putc(*cp, stdout); // print the first byte of x, interpreted as a char, to standard output
   return 0;
}

At the point where the programmer requests a void pointer, the compiler loses all knowledge of the type of the object referenced. Note that the actual content of x is not changed, only the set of operations which the compiler thinks are valid for that particular variable. This can cause very strange behaviour: in the above example, which character is printed depends on how the machine lays out the bytes of an int in memory. Such low-level programming is likely to confuse, and the use of void* is usually not recommended practice. There are situations where it is useful, however, such as generic library routines like malloc() and qsort(), which must work with data of any type.
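What the C example prints can be reasoned about by looking at the bytes that make up the integer 10. The sketch below uses Python's standard struct module and assumes a 4-byte int, which is common but not guaranteed by the C standard.

```python
import struct
import sys

x = 10

# The same integer laid out in memory under the two common byte orders.
little = struct.pack("<i", x)  # least significant byte first
big = struct.pack(">i", x)     # most significant byte first

# Layout on the machine running this script.
native = little if sys.byteorder == "little" else big

# Reading the first byte as a character is what *cp does in the C code.
first_byte = native[0]
print(little, big, repr(chr(first_byte)))
```

On a little-endian machine the first byte is 10, the newline character; on a big-endian machine it is 0, so the same C source behaves differently on different hardware.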

Declarative vs. Imperative

Examples of declarative languages would be SQL, Prolog and, to a degree, Erlang. Most other widely used languages are mostly imperative. Declarative languages tend to be very terse and describe only what task the programmer wishes to be done, but do not include the details of how to do the task. Imperative languages tell the machine both "what" to do and "how" to do it. For instance in SQL:
select * from people order by last_name;
gives a sorted list of people but does not specify the type of sorting algorithm used. One could argue that libraries of functions that abstract out the details of execution are declarative. Prolog and SQL code do specify some details, so the boundary between declarative and imperative is not strict.
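The difference can be made concrete by producing the same sorted list twice: once declaratively through SQL, using Python's standard sqlite3 module with an in-memory database, and once imperatively with a hand-written insertion sort. The table layout and the sample rows are assumptions made for this sketch.

```python
import sqlite3

people = [("Max", "Planck"), ("Niels", "Bohr"), ("Marie", "Curie")]

# Declarative: state what is wanted and let the database decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", people)
declarative = conn.execute(
    "SELECT first_name, last_name FROM people ORDER BY last_name"
).fetchall()
conn.close()

# Imperative: spell out every step of an insertion sort on last_name.
imperative = []
for person in people:
    position = 0
    while position < len(imperative) and imperative[position][1] <= person[1]:
        position += 1
    imperative.insert(position, person)

print(declarative)
print(imperative)
```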

Strict vs Lazy

Real-time vs non-Real-time

Serial vs Parallel

Few languages are designed to be parallel. Occam and Erlang are languages designed around parallelism. More often, serial languages are extended with libraries that give access to parallel hardware. An example of a parallel library is PVM (Parallel Virtual Machine). Sometimes libraries provide a data coordination language such as Linda or Gamma. Often parallel programs use either shared memory or message passing. Linda and Gamma combine shared memory and message passing in a framework called tuple space. A tuple space is a pool of data or tasks that many processors work on at the same time. JavaSpaces is a Java version of Linda. Major categories of parallel programming are SIMD (single instruction, multiple data) and MIMD (multiple instruction, multiple data). See: Parallel computation for more details. The RenderMan Shading Language and GLSL are examples of special-purpose SIMD parallel languages designed for rendering images on GPUs or render farms.
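The idea of a shared pool of tasks that several workers draw from can be sketched with Python's standard threading and queue modules. This is only a loose analogy to a tuple space, not an implementation of Linda, and in CPython threads illustrate the coordination pattern rather than guarantee a speed-up. The worker function and the squaring task are invented for the example.

```python
import queue
import threading

tasks = queue.Queue()    # shared pool of work items
results = queue.Queue()  # workers send their answers back as messages

for n in range(10):
    tasks.put(n)


def worker():
    # Each worker repeatedly takes a task from the pool until it is empty.
    while True:
        try:
            n = tasks.get_nowait()
        except queue.Empty:
            return
        results.put((n, n * n))


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Collect the ten (input, output) pairs; arrival order is not fixed.
squares = dict(results.get() for _ in range(10))
print(sorted(squares.items()))
```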

In languages not specifically designed for concurrency, concurrency is often provided through a specific library or language construct, often tied to a design pattern. For instance, Scala supports concurrency through Actors.

Dynamic languages

Scripting languages

Scripting languages tend to be interpreted, trading execution speed for convenience. There is a category of shell scripting languages for the command line interfaces of Unix-like systems such as Linux, including sh, csh, tcsh, bash and zsh. Python is considered a scripting language even though it is compiled to bytecode before execution. There are scripting languages for applications, such as Lua for SciTE and Emacs Lisp for Emacs. Scheme and other languages can be used to script GIMP. JavaScript/ECMAScript is used as a standard language to script web browsers (although it can be used elsewhere, for instance in the Rhino interpreter). Scripting languages tend to have automatic memory management, dynamic typing, associative arrays and other rapid prototyping features.
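The rapid prototyping features listed above can be seen together in a short Python word count. The sample text is made up; the script needs no type declarations or manual memory management, and it uses an associative array (a dict).

```python
text = "the quick brown fox jumps over the lazy dog the end"

# Associative array mapping each word to the number of times it occurs.
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1

# The word with the highest count.
most_common = max(counts, key=counts.get)
print(most_common, counts[most_common])
```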

Assemblers

In the first computers, programmers had to work with binary machine code, which was very tedious and difficult. It was a huge breakthrough when someone wrote the first "assembler", a program which translated human-readable mnemonic words (written in plain text) into binary machine code. There is usually a one-to-one correspondence between assembler source code mnemonics (commands) and machine code instructions. A different assembler had to be written for each kind of computer, because each kind of computer has a different machine instruction set, so there are many different assembler languages in existence (they are more properly called assembly languages). Assemblers were precursors to high-level programming languages. In fact, compilers often translate high-level program source code in two stages, first from human-readable high-level instructions to assembly language, then from barely-human-readable assembly language to machine code.
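The one-to-one translation performed by an assembler can be sketched with a toy example. The instruction set below (LOAD, ADD, STORE, HALT and their opcode numbers) is entirely hypothetical and does not correspond to any real processor; each instruction is encoded as one opcode byte followed by one operand byte.

```python
# Hypothetical instruction set: mnemonic -> opcode byte.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}


def assemble(source):
    """Translate each mnemonic line into exactly one two-byte instruction."""
    machine_code = bytearray()
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mnemonic, *operand = line.split()
        machine_code.append(OPCODES[mnemonic])
        machine_code.append(int(operand[0]) if operand else 0)
    return bytes(machine_code)


program = """
LOAD 10   ; put 10 in the accumulator
ADD 32    ; add 32 to it
STORE 0   ; write the result to address 0
HALT
"""

binary = assemble(program)
print(binary.hex())
```

Four lines of source yield four instructions, which illustrates why a separate assembler is needed for every instruction set: only the table of opcodes and the encoding rules change.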


Popularity of programming languages

It is very hard to know the true popularity of programming languages, because objective information is lacking. C (together with C++, its object-oriented derivative) and Java seem to be the most popular languages, ahead of PHP and Perl, which are nevertheless very active in the internet community. The TIOBE Programming Community index [1] calculates the popularity of programming languages every month, based on search engine criteria. ohloh.net [2] presents a graphical statistical comparison based on coding metrics (such as the number of projects and lines of code). In October 2007, the number of projects stored in the freshmeat.net repository [3] and in the Sourceforge.net repository [4] showed the same tendency.

Some people wishing to track trends in programming languages use sales statistics from technical book publishers like O'Reilly to infer the popularity of the relevant programming languages[5]. There are problems with using this as a measure: some programming languages provide more comprehensive free, online documentation and so do not require programmers to purchase books in order to learn them. Additionally, for smaller languages, where only a small number of books get published, the users of that language do not necessarily purchase books from all the different publishers.

Statistics of this kind inform us about current technical tendencies, making it possible to follow market trends that can be important for anticipating requirements in training, qualified employment, etc. That said, some programmers have criticised over-reliance on statistics about programming language popularity as being driven by fashion rather than technical excellence, a criticism based on the view that programming languages are standards rather than languages[6].

References