Translation system

From Citizendium
imported>Nick Johnson
In [[computer science]], a '''translation system''' is software that translates text from one [[language]] into another.  The most common example of a translation system is a compiler, which translates from one [[formal language]] to another, generally from a human-readable program description to a machine-readable one.
{{subpages}}


== Compilers ==
In [[computer science]], a '''translation system''' is a [[program]] that translates one [[language]] into another.  If both the source and target languages are [[formal language|formal languages]], which are precisely specified so that there is no ambiguity, the program is called a [[compiler]].  Software that translates between two [[natural language|natural languages]], by contrast, must use many more techniques and [[heuristics]] than a compiler, because natural language is inherently ambiguous.  Translating natural languages with a [[computer]] [[program]] draws on the academic disciplines of both [[computer science]] and [[linguistics]].


As mentioned above, a compiler is a program that translates from one formal language into another.  Generally, the source language is a high-level, human-readable language (such as [[Java]] or [[Ada]]), though it could be any unambiguous program representation, such as a [[flow chart]] or a [[finite state machine]] description of an algorithm.  The target language is generally [[machine language]], though it could just as well be another high-level language or even a presentation language (such as [[HTML]]).
[[Category:Suggestion Bot Tag]]
 
=== Structure of Compilers ===
 
As with any class of software, compiler implementations vary widely.  However, most compilers follow this pattern:
 
# [[lexical analysis|Lexical Analysis or Scanning]], in which the input characters are matched against a set of [[regular expressions]] and grouped into a sequence of [[token|tokens]].
# [[syntactic analysis|Syntactic Analysis or Parsing]], in which the token sequence is recognized by a [[pushdown automatons|pushdown automaton]], which triggers a sequence of semantic actions.
# [[semantic analysis|Semantic Analysis]], in which the semantic actions build an internal or intermediate representation of the source program, and [[context sensitive|context-sensitive]] errors (errors that cannot be detected by a recognizer for a [[context-free language]]) are found.
# [[optimization|Optimization]], in which the compiler attempts to replace computationally expensive portions of the program with less expensive equivalents, provided that no substitution changes the observable behavior of the program.
# [[code generation|Code Generation]], in which the intermediate representation is translated, piece by piece, into the target language.
# [[peephole optimizations|Peephole Optimization]], a final optimization pass that examines the output code through a small window (the peephole), searching for very localized optimizations.
 
In practice, there may be multiple optimization stages scattered throughout this process.  Additionally, most modern compilers repeatedly translate the program from one intermediate representation into a simpler, lower-level one, in order to accommodate a wide range of optimizations that operate at different levels of detail.
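
As a rough illustration of such lowering, the Python sketch below translates a small, hypothetical structured representation (with <code>while</code> and <code>if</code> statements) into a simpler one that contains only assignments, labels, and conditional or unconditional jumps. The statement shapes, the label names, and the function <code>lower</code> are assumptions made for this example, not the representation used by any particular compiler.

```python
import itertools

# Hypothetical high-level IR (statement shapes are assumptions for this sketch):
#   ("assign", target, expr)      expr is kept as an opaque string
#   ("while", cond, [stmt, ...])
#   ("if", cond, [stmt, ...])
# Lower-level IR: assignments, ("label", name), ("jump", name),
# and ("jump_if_false", cond, name).


def lower(stmts, labels=None):
    """Flatten structured statements into a list of labels and jumps."""
    if labels is None:
        labels = itertools.count()  # one shared counter keeps label names unique
    out = []
    for stmt in stmts:
        kind = stmt[0]
        if kind == "assign":
            out.append(stmt)
        elif kind == "while":
            _, cond, body = stmt
            top, end = f"L{next(labels)}", f"L{next(labels)}"
            out.append(("label", top))
            out.append(("jump_if_false", cond, end))  # leave the loop
            out.extend(lower(body, labels))
            out.append(("jump", top))                 # go back and re-test cond
            out.append(("label", end))
        elif kind == "if":
            _, cond, body = stmt
            end = f"L{next(labels)}"
            out.append(("jump_if_false", cond, end))  # skip the body
            out.extend(lower(body, labels))
            out.append(("label", end))
        else:
            raise ValueError(f"unknown statement kind {kind!r}")
    return out
```

For instance, <code>lower([("while", "i < n", [("assign", "i", "i + 1")])])</code> returns <code>[("label", "L0"), ("jump_if_false", "i < n", "L1"), ("assign", "i", "i + 1"), ("jump", "L0"), ("label", "L1")]</code>. Later phases then only need to understand jumps and labels, not loops.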
 
==== Lexical Analysis ====
 
During lexical analysis, a set of regular expressions translates the input sequence (generally characters) into an output sequence of tokens.  One popular tool for building lexical analyzers is [[lex]], which generates a scanner from a specification of regular expressions.
 
Readers accustomed to programming may benefit from a few examples of errors that can be detected during this phase.  A lexical analyzer can detect errors within a single token, for instance a number that has the letter 'y' in it, or a string with a missing closing quote.
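
The following Python sketch shows a regular-expression-based scanner for a small, hypothetical C-like language and reports both kinds of error mentioned above. The token classes and the names <code>TOKEN_SPEC</code>, <code>tokenize</code>, and <code>LexError</code> are illustrative assumptions, not the interface of lex. One design point differs from lex: Python's <code>re</code> module tries alternatives from left to right instead of preferring the longest match, which is why <code>BAD_NUMBER</code> is listed before <code>NUMBER</code>.

```python
import re

# Token classes for a small, hypothetical C-like language. Python's re tries
# alternatives left to right, so an error pattern that must beat a shorter
# valid match (BAD_NUMBER vs. NUMBER) is listed first.
TOKEN_SPEC = [
    ("BAD_NUMBER", r"\d+[A-Za-z_]\w*"),  # a number with letters in it, e.g. 12y
    ("NUMBER",     r"\d+"),
    ("STRING",     r'"[^"\n]*"'),
    ("BAD_STRING", r'"[^"\n]*'),         # an opening quote with no closing quote
    ("IDENT",      r"[A-Za-z_]\w*"),
    ("OP",         r"[+\-*/=;(){}]"),
    ("SKIP",       r"\s+"),
    ("MISMATCH",   r"."),                # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


class LexError(Exception):
    pass


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, raising LexError on a bad token."""
    tokens = []
    for m in MASTER.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "BAD_NUMBER":
            raise LexError(f"malformed number {lexeme!r}")
        if kind == "BAD_STRING":
            raise LexError(f"unterminated string {lexeme!r}")
        if kind == "MISMATCH":
            raise LexError(f"unexpected character {lexeme!r}")
        tokens.append((kind, lexeme))
    return tokens
```

Here <code>tokenize('x = 12 + y;')</code> returns six tokens, while <code>tokenize('x = 12y;')</code> and <code>tokenize('s = "hi;')</code> both raise <code>LexError</code>.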
 
==== Syntactic Analysis ====
 
During syntactic analysis, the input sequence of tokens is matched against a set of grammatical rules called [[productions]].  As each production is matched, a semantic action routine is called.  The role of each semantic action is to build an intermediate representation of the input program, such as a list of variables and functions and the sequence of instructions comprising each function.
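
One common way to write a parser by hand is recursive descent, in which each production becomes a function and the program's call stack plays the role of the pushdown automaton's stack. The sketch below parses arithmetic expressions for a small, hypothetical grammar; the line that builds each tuple is that production's semantic action, and the resulting nested tuples serve as the intermediate representation. The grammar, the class name <code>Parser</code>, and the tuple format are illustrative assumptions.

```python
import re

# Hypothetical expression grammar:
#   expr   -> term   (("+" | "-") term)*
#   term   -> factor (("*" | "/") factor)*
#   factor -> NUMBER | IDENT | "(" expr ")"
# Each method recognizes one production; building a tuple is its semantic action.


class Parser:
    def __init__(self, text):
        # Minimal lexer; assumes the input is already lexically valid.
        self.tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            node = (op, node, self.term())    # semantic action
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.pos += 1
            node = (op, node, self.factor())  # semantic action
        return node

    def factor(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        if tok == "(":
            node = self.expr()
            self.expect(")")
            return node
        if tok.isdigit():
            return ("num", int(tok))          # semantic action
        if tok[0].isalpha() or tok[0] == "_":
            return ("var", tok)               # semantic action
        raise SyntaxError(f"unexpected token {tok!r}")
```

For example, <code>Parser("2 * (x + 3)").parse()</code> returns <code>("*", ("num", 2), ("+", ("var", "x"), ("num", 3)))</code>. Because the loops in <code>expr</code> and <code>term</code> build the tree from left to right, <code>1 - 2 - 3</code> is grouped as <code>(1 - 2) - 3</code>.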
 
Readers accustomed to programming may benefit from a few examples of errors that can be detected during this phase.  A syntactic analyzer can detect syntax errors, such as a missing semicolon or curly brace.  It cannot, however, detect the use of an undeclared variable.  This is because declaring a variable before its use is a [[context sensitive|context-sensitive]] language requirement, whereas syntactic analyzers are generally [[context-free]] language recognizers.
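
The distinction can be seen in the following sketch of a parser for a hypothetical, context-free statement grammar. It rejects a statement with a missing semicolon, but accepts an assignment that reads a variable <code>z</code> which is never declared or assigned, because nothing in the grammar records which names exist. The grammar and the names <code>parse_program</code> and <code>is_ident</code> are assumptions for illustration.

```python
import re

# Hypothetical context-free statement grammar for this sketch:
#   program -> stmt*
#   stmt    -> IDENT "=" operand ("+" operand)* ";"
#   operand -> IDENT | NUMBER
# Nothing here records which identifiers have been declared.


def is_ident(tok):
    return tok[0].isalpha() or tok[0] == "_"


def parse_program(text):
    """Parse statements into (target, operands) pairs or raise SyntaxError."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[=+;]", text)
    pos = 0

    def expect(ok, what):
        # Consume one token if ok(token) holds, otherwise report a syntax error.
        nonlocal pos
        if pos >= len(tokens) or not ok(tokens[pos]):
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {what}, found {found!r}")
        pos += 1
        return tokens[pos - 1]

    def is_operand(tok):
        return is_ident(tok) or tok.isdigit()

    stmts = []
    while pos < len(tokens):
        target = expect(is_ident, "an identifier")
        expect(lambda t: t == "=", "'='")
        operands = [expect(is_operand, "an operand")]
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            operands.append(expect(is_operand, "an operand"))
        expect(lambda t: t == ";", "';'")
        stmts.append((target, operands))
    return stmts
```

Thus <code>parse_program("x = 1; y = x + z;")</code> returns <code>[("x", ["1"]), ("y", ["x", "z"])]</code> without complaint, while <code>parse_program("x = 1 y = 2;")</code> raises a <code>SyntaxError</code> reporting that <code>';'</code> was expected but <code>'y'</code> was found.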
 
==== Semantic Analysis ====
 
During semantic analysis, a compiler builds and examines an intermediate representation of the source program and checks it for consistency.
 
Readers accustomed to programming may benefit from a few examples of errors that can be detected during this phase.  A semantic analyzer can detect errors such as the use of undeclared variables or functions.
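
Catching such errors requires context that a context-free parser does not keep, typically a symbol table of the names declared so far. The sketch below assumes a hypothetical intermediate representation in which each statement is either a declaration or an assignment, and reports every use of a name that was not declared earlier in the program. The statement format and the function name <code>check_declarations</code> are illustrative.

```python
# Hypothetical IR for this sketch; each statement is one of
#   ("decl", name)                 a declaration such as "int x;"
#   ("assign", target, operands)   e.g. "x = y + 1;" -> ("assign", "x", ["y", "1"])


def check_declarations(program):
    """Return error messages for names used before they are declared."""
    declared = set()  # a minimal symbol table: names declared so far
    errors = []
    for index, stmt in enumerate(program):
        if stmt[0] == "decl":
            declared.add(stmt[1])
            continue
        _, target, operands = stmt
        for name in [target, *operands]:
            # Numeric literals such as "1" are not identifiers, so skip them.
            if name.isidentifier() and name not in declared:
                errors.append(f"statement {index}: '{name}' is not declared")
    return errors
```

For the program <code>[("decl", "x"), ("assign", "x", ["1"]), ("assign", "x", ["x", "y"])]</code>, the checker returns <code>["statement 2: 'y' is not declared"]</code>.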
 
==== Optimization ====
 
During optimization, a compiler attempts to alter its internal representation of the input program so as to improve execution speed, code size, or other characteristics of the generated code.
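
A classic example is constant folding, in which subexpressions whose operands are all constants are evaluated at compile time. The sketch below applies it, together with two simple algebraic identities, to expression trees in a hypothetical tuple format (<code>("num", n)</code>, <code>("var", name)</code>, <code>(op, left, right)</code>); the function name <code>fold_constants</code> is an assumption. Division is deliberately left alone, since folding it could change the program's behavior, for example by turning a run-time division by zero into a compile-time failure.

```python
import operator

# Operators that are safe to evaluate at compile time in this sketch.
# Division is deliberately excluded so that, e.g., 1 / 0 still fails at run time.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(node):
    """Return an equivalent tree with constant subexpressions evaluated.

    Trees use a hypothetical tuple format: ("num", n), ("var", name),
    or (op, left, right).
    """
    if node[0] in ("num", "var"):
        return node
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)  # fold bottom-up
    if op in FOLDABLE and left[0] == "num" and right[0] == "num":
        return ("num", FOLDABLE[op](left[1], right[1]))
    # Identities that never change an integer result: x + 0 and x * 1.
    if op == "+" and right == ("num", 0):
        return left
    if op == "*" and right == ("num", 1):
        return left
    return (op, left, right)
```

For example, the tree for <code>x + 2 * 3</code> becomes the tree for <code>x + 6</code>, and the tree for <code>y * (3 - 2)</code> collapses to just <code>("var", "y")</code>.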
 
==== Code Generation ====
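In this phase the intermediate representation is translated, piece by piece, into the target language. The Python sketch below assumes a hypothetical stack machine as the target, with instructions <code>PUSH</code>, <code>LOAD</code>, <code>ADD</code>, <code>SUB</code>, <code>MUL</code>, and <code>DIV</code>, and emits code for one expression-tree node at a time. The tree format, the instruction set, and the names <code>generate</code> and <code>run</code> are assumptions; the small interpreter <code>run</code> is included only so that the generated code can be checked. Real compilers usually target register-based machine instructions instead.

```python
import operator

# Hypothetical source trees: ("num", n), ("var", name), or (op, left, right).
# Hypothetical stack-machine target:
#   ("PUSH", n)  push the constant n
#   ("LOAD", x)  push the value of variable x
#   ("ADD",), ("SUB",), ("MUL",), ("DIV",)  pop two values, push the result
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
BINOPS = {"ADD": operator.add, "SUB": operator.sub,
          "MUL": operator.mul, "DIV": operator.truediv}


def generate(node, code=None):
    """Translate an expression tree into stack-machine code, node by node."""
    if code is None:
        code = []
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    elif kind == "var":
        code.append(("LOAD", node[1]))
    else:
        _, left, right = node
        generate(left, code)   # the left operand is pushed first ...
        generate(right, code)  # ... so it sits below the right operand
        code.append((OPCODES[kind],))
    return code


def run(code, env):
    """Execute stack-machine code; env maps variable names to values."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINOPS[instr[0]](left, right))
    return stack.pop()
```

For the tree of <code>2 * (x + 3)</code>, <code>generate</code> produces <code>[("PUSH", 2), ("LOAD", "x"), ("PUSH", 3), ("ADD",), ("MUL",)]</code>, and running that code with <code>x</code> set to 4 yields 14.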
 
==== Peephole Optimization ====
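A peephole optimizer slides a small window over the generated code and replaces short instruction sequences with cheaper equivalents. The sketch below works on code for a hypothetical stack machine whose instructions push constants (<code>PUSH</code>), load variables (<code>LOAD</code>), or pop two values and push their sum or product (<code>ADD</code>, <code>MUL</code>). It uses a window of two or three instructions; the patterns and the function name <code>peephole</code> are illustrative, not a standard set. Because one rewrite can expose another, it repeats passes until nothing changes.

```python
import operator

# Constant operations this sketch may evaluate ahead of time.
FOLDABLE = {"ADD": operator.add, "MUL": operator.mul}
# Two-instruction sequences that leave the value beneath them unchanged.
NO_OPS = ([("PUSH", 0), ("ADD",)], [("PUSH", 1), ("MUL",)])


def peephole(code):
    """Rewrite small windows of stack-machine code until nothing changes."""
    code = list(code)
    changed = True
    while changed:
        changed = False
        out = []
        i = 0
        while i < len(code):
            pair, triple = code[i:i + 2], code[i:i + 3]
            if pair in NO_OPS:
                i += 2                       # x + 0 and x * 1 are just x
                changed = True
            elif (len(triple) == 3 and triple[0][0] == "PUSH"
                  and triple[1][0] == "PUSH" and triple[2][0] in FOLDABLE):
                value = FOLDABLE[triple[2][0]](triple[0][1], triple[1][1])
                out.append(("PUSH", value))  # PUSH a; PUSH b; op  ->  PUSH (a op b)
                i += 3
                changed = True
            else:
                out.append(code[i])
                i += 1
        code = out
    return code
```

For instance, the code for <code>x * (1 * 1)</code>, <code>[("LOAD", "x"), ("PUSH", 1), ("PUSH", 1), ("MUL",), ("MUL",)]</code>, is reduced in two passes: the first folds the constants into <code>("PUSH", 1)</code>, and the second removes the now-redundant multiplication by one, leaving <code>[("LOAD", "x")]</code>.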
 
== Translation Systems other than Compilers ==
 
One example is [[Babel Fish]], a web-based machine translation service originally offered by [[AltaVista]].
 
{{stub}}
 
[[Category:Computers Workgroup]]
[[Category:CZ_Live]]

Latest revision as of 07:01, 30 October 2024

This article is a stub and thus not approved.