
Chapter 1 Yu Wang: A Survey of Software Distribution Formats
Nowadays a wide variety of computer architectures and operating systems are in use, and software development no longer targets any single one of them. Instead, software companies and distributors are more concerned with how their software can be deployed as universally as possible. We refer to both architectures and operating systems as platforms. Without a multi-platform solution, given M instruction set architectures, software that must run on all of them has to be compiled M times, or even more once combinations of different hardware components are taken into account. With a multi-platform solution, the software needs to be written only once, and the techniques of the solution let it achieve the same functionality as if it had been compiled M times. We define a software distribution format to be any form in which software is distributed. In this article, we discuss several software distribution formats: fat binaries, software for virtual machines, and source code distribution, where virtual machines are further divided into application oriented and platform oriented ones. In terms of compilation, these formats can be considered fully compiled, half compiled and not compiled, respectively. Finally, the formats are compared in the concluding remarks.

1.1 Fat Binary

An immediate solution to universal software distribution is to compile the source code into several binaries for different architectures and make all of them available to the end user's computer, so that the binary matching that computer's architecture is selectively installed. This solution is known as a fat binary [25] or universal binary [1]: the compiled binaries are archived and compressed into a single package, and at install time the system chooses which binary to install according to its architecture. Because a fat binary packages code for multiple architectures, one disadvantage is that the package to be distributed tends to be larger than with the other solutions introduced in this article.


Yu Wang wangy22@mcmaster.ca

A fat binary for software of size s often ends up with a size of M × s for M instruction set architectures. Nevertheless, because the fat binary format is technically easy to apply when producing software, it is still widely used; for instance, it made Apple Computer's migration from the PowerPC architecture to the x86 architecture, which began in 2005, much smoother [26]. A solution similar to the fat binary is the deployment of PocketPC applications using the Microsoft Installer (MSI) format [2]. A PocketPC is a handheld computer running an operating system based on Microsoft's Windows CE, which was first released in November 1996. The typical steps for installing an application on a PocketPC are to first install the setup package on a desktop workstation, and then transfer the cabinet (CAB) binary file to the device when it is connected and synchronized [14]. For historical reasons, current PocketPC devices come with processors of different architectures, such as Intel XScale¹, Hitachi SH3² and MIPS³, which raises a compatibility problem when distributing software applications. Since handheld devices are far less powerful than desktop computers, the solution should demand as little processing on the device as possible, which argues against translating code on the device itself. In this case, a fat binary is the most appropriate choice. In terms of compilation, a fat binary can be considered a fully compiled distribution format. Besides universal distribution across architectures, fat binaries are also used for other purposes, such as shipping applications in different localized versions. Such cases are rare and are outside the scope of our topic.
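As a minimal sketch of the install-time selection step, the C code below models a fat package as a table of per-architecture slices and picks the one matching the host. The manifest layout, the slice names such as app.x86_64 and the functions select_slice(), package_size() and host_arch() are hypothetical illustrations, not the layout of Apple's universal binaries or of any real installer; host_arch() relies on the GCC/Clang predefined macros __x86_64__, __aarch64__ and __powerpc__. The total size also shows where the M × s growth comes from.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical manifest entry: one pre-compiled slice per architecture. */
struct slice {
    const char *arch;   /* architecture name, e.g. "x86_64" */
    const char *file;   /* hypothetical name of the slice inside the package */
    size_t      size;   /* size of the slice in bytes */
};

/* Hypothetical package holding M = 3 slices of the same size s = 1000. */
static const struct slice package[] = {
    { "x86_64",  "app.x86_64",  1000 },
    { "aarch64", "app.aarch64", 1000 },
    { "ppc",     "app.ppc",     1000 },
};
static const size_t package_len = sizeof package / sizeof package[0];

/* Architecture this installer was compiled for, taken from
   compiler-predefined macros (GCC/Clang spellings). */
const char *host_arch(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__powerpc__) || defined(__ppc__)
    return "ppc";
#else
    return "unknown";
#endif
}

/* Install-time selection: return the slice matching arch, or NULL. */
const struct slice *select_slice(const char *arch)
{
    for (size_t i = 0; i < package_len; i++)
        if (strcmp(package[i].arch, arch) == 0)
            return &package[i];
    return NULL;
}

/* Total package size: with M slices of size s this is M * s. */
size_t package_size(void)
{
    size_t total = 0;
    for (size_t i = 0; i < package_len; i++)
        total += package[i].size;
    return total;
}
```

An installer would call select_slice(host_arch()) and copy only the returned slice to disk; the other M − 1 slices are downloaded but never installed, which is exactly the size overhead discussed above.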

1.2 Application Oriented Virtualization

To make software adaptable to different platforms, one can consider a machine that exists virtually on top of each platform, so that the software only needs to be written for this virtual machine and, from the users' point of view, can be executed on every platform. Here we classify modern virtual machines into two types: application oriented virtualization, and platform oriented virtualization, which is discussed in the next section. We define an application oriented virtual machine to be a virtual machine that provides a set of non-native instructions and allows applications compiled against this instruction set to be launched and executed in the virtual machine. The language of such an instruction set is commonly referred to as an intermediate language, and code in an intermediate language is called intermediate code. An application oriented virtual machine bridges the underlying platforms and its applications, allowing applications to run on multiple platforms. Modern application oriented virtual machines are either abstract stack machines or register-based machines. A stack machine models and manipulates memory spaces as stacks: structured data and function calls are pushed onto the stacks and popped as needed. A register-based machine performs operations on a finite set of registers, from which input data is read and in which computed results are placed. To narrow the scope of our topic, register-based machines are not discussed here; for the detailed differences between the two types of machines, one can refer to [18].

¹ See www.intel.com/design/intelxscale
² See http://www.superh.com
³ See http://www.mips.com

As software is distributed in intermediate code, it eventually has to be compiled again, by the virtual machine's built-in compiler, into the native machine language understood by the computer. We refer to the compilation from intermediate language to native machine language as code generation. Depending on when the compilation takes place, two types of code generation are considered. The first is called install-time code generation: as the name suggests, the intermediate code is compiled when the software gets installed. After this stage, the program is completely compiled into the machine language of the target platform and runs natively. One major weakness of install-time code generation is that, if the software is relatively large in code size, the installation stage will probably take a long time to complete. In that case, we switch to the other type of code generation, called just-in-time (JIT) compilation. In this mode, intermediate code is selectively compiled while the software is running: the JIT compiler only compiles the intermediate code reached in the current program state, such as the code following a conditional branch that has just been taken. Since compiling a single block of code takes much less time than compiling all of the code, large software such as that mentioned above can be loaded and started faster while remaining usable. In JIT compilation, once a piece of intermediate code has been compiled, the compiler output is kept in memory for possible future calls. As the program's lifetime progresses, more and more intermediate code is translated.
Since native code executes directly on the underlying hardware, it runs faster than code blocks that have not yet been compiled. As a result, the performance of software running in JIT mode gradually improves relative to when the program was first loaded. Eventually all the code blocks that are actually executed have been compiled into native code, and performance reaches its maximum. We call the stage before the executed intermediate code has been completely compiled the JIT warm-up stage. In terms of compilation, software for application oriented virtual machines can be considered a half compiled distribution format. It is suggested in [4] that the earliest proposal of the JIT concept is [13], from the 1960s. Its author believed that functions could be compiled into machine code on the fly, so that no compiler output needs to be saved in physical storage, which conforms to the idea of JIT. Since then, JIT compilers for several different languages have been researched, such as [15], [10] and [22].
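The compile-on-first-use behaviour of JIT compilation, and the code cache that makes later calls cheap, can be sketched in portable C. Portable C cannot emit machine code, so this sketch assumes a stand-in: "compiling" a block means translating its hypothetical byte code once into an array of pointers to native C routines (a form of threaded code), which is then cached. A real JIT compiler would generate machine instructions instead; the opcode set, the block table, run_block() and compile_count are illustrative only.

```c
#include <stddef.h>

/* Hypothetical byte code with only two operations plus an end marker. */
enum { OP_INC, OP_DBL, OP_END };

/* Stand-in for generated machine code: a pointer to a native routine. */
typedef long (*native_op)(long);

static long op_inc(long x) { return x + 1; }
static long op_dbl(long x) { return x * 2; }

#define MAX_BLOCKS 4
#define MAX_OPS    16

/* Hypothetical program: each block is a short byte-code sequence. */
static const unsigned char blocks[MAX_BLOCKS][MAX_OPS] = {
    { OP_INC, OP_END },
    { OP_DBL, OP_INC, OP_END },
    { OP_DBL, OP_DBL, OP_END },
    { OP_END },
};

/* Code cache: translated form of each block, filled in on first use. */
static native_op cache[MAX_BLOCKS][MAX_OPS];
static size_t    cache_len[MAX_BLOCKS];
static int       compiled[MAX_BLOCKS];
int compile_count = 0;   /* how many blocks have been translated so far */

/* "Compile" one block: map every opcode to its native routine. */
static void compile_block(int id)
{
    size_t n = 0;
    for (const unsigned char *pc = blocks[id]; *pc != OP_END; pc++)
        cache[id][n++] = (*pc == OP_INC) ? op_inc : op_dbl;
    cache_len[id] = n;
    compiled[id] = 1;
    compile_count++;
}

/* Run a block, translating it only the first time it is reached. */
long run_block(int id, long x)
{
    if (!compiled[id])
        compile_block(id);
    for (size_t i = 0; i < cache_len[id]; i++)
        x = cache[id][i](x);
    return x;
}
```

Calling run_block(1, 3) translates block 1 and returns 7; calling it again reuses the cached translation, so compile_count does not grow, and block 2, which is never reached, is never compiled. This mirrors the warm-up stage: only the first execution of each block pays the translation cost.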

Java Virtual Machine
JIT became widely known when Sun Microsystems released Java in the 1990s. Java includes a virtual machine called the Java Virtual Machine (JVM), which was originally designed to run only applications written in the Java language. Third-party implementations do exist that compile source code in other programming languages into intermediate code executed by the JVM [11], but its designers integrated the JVM tightly with the whole Java development environment, so it directly supports Java's object model, such as inheritance and interfaces. Low level methods of objects include static methods, virtual methods and interface methods. Aggregate data such as an object is stored only after memory has been dynamically allocated for it, and is collected when it is no longer accessible. Scalar data is stored in a local variable, in an object field, or on the stack of the abstract machine. When invoking methods, scalar data and/or references to aggregate data are pushed onto the evaluation stack, and the return value of a method is left on the top of the stack. In the file system, compiled Java intermediate code exists as class files or as a single compressed JAR file. Each class file is named after the class it contains, and classes can be organized using the directory structure of the file system. Each class file is in byte code format, where JVM instructions are represented as one-byte opcodes ranging from 0 to 255. Since an opcode occupies a single byte, the JVM intermediate language can have at most 256 instructions, of which roughly 200 are actually defined. These opcodes include instructions for load and store, arithmetic, type conversion, object creation, operand stack management, control transfer, method invocation, etc. In the JVM, instructions are type specific, so type checking is necessary before a value is passed to an instruction. Primitive types include boolean, byte, short, int, long, char, float and double. Type specific instructions with the same functionality differ in the first character of the instruction name. For instance, iadd and fadd are both arithmetic instructions that calculate the sum of two values, but the former takes integer operands and the latter takes floating point operands.
The JVM is considered architecture neutral in [19] because its byte code emulates low level instructions for its virtual architecture, in order to minimize the gap with the underlying physical architecture. Because of this, on some architectures it is even possible to execute byte code instructions directly at the machine level, as on the ARM926EJ-S 32-bit RISC CPU⁴. On September 30, 2004, Sun Microsystems released Java 1.5, which added language features including generics, metadata, autoboxing/unboxing and enumerations. For the detailed specification of the JVM, one can refer to [21].
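To illustrate the evaluation stack and the type-specific instructions described above, here is a toy stack machine in C with int and float variants of load-constant and add. The opcode names mirror JVM mnemonics such as iadd and fadd, but the numeric encoding, the insn struct and run() are hypothetical. Note that a stack slot carries no type of its own: the instruction decides how the bits are interpreted, which is why type checking has to happen before execution.

```c
#include <stddef.h>

/* Hypothetical opcodes named after JVM mnemonics; encoding is arbitrary. */
enum op { ICONST, FCONST, IADD, FADD, HALT };

/* One instruction with an optional immediate operand. */
struct insn {
    enum op op;
    union { int i; float f; } imm;
};

/* A stack slot holds an int or a float; the slot itself records no type,
   so the typed instruction that reads it decides how to interpret it. */
union slot { int i; float f; };

/* Execute a program on an evaluation stack and return the top slot. */
union slot run(const struct insn *code)
{
    union slot stack[64];
    size_t sp = 0;                       /* number of occupied slots */
    for (const struct insn *pc = code; ; pc++) {
        switch (pc->op) {
        case ICONST: stack[sp++].i = pc->imm.i; break;
        case FCONST: stack[sp++].f = pc->imm.f; break;
        case IADD:   sp--; stack[sp - 1].i += stack[sp].i; break;  /* int add */
        case FADD:   sp--; stack[sp - 1].f += stack[sp].f; break;  /* float add */
        case HALT:   return stack[sp - 1];
        }
    }
}
```

For example, the program { ICONST 2, ICONST 3, IADD, HALT } leaves 5 on the stack, while the same bits fed to FADD would be misread as floats, which is the kind of error the JVM's type checking rules out.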

Common Language Runtime
Building on the experience of the Java Development Kit, Microsoft released its competing .NET Framework in early 2002. It contains a virtual machine known as the common language runtime (CLR), where "common language" is the name given to the intermediate language of this virtual machine. Like the JVM, it is an abstract stack machine whose instructions are specific to stack operations, as described previously. In contrast to the architecture neutral JVM, the CLR is not restricted to a single programming language: any language can be used as long as it can be compiled to the common language. Therefore the CLR is considered language neutral, and objects are not tied to a particular object oriented language but are defined generally in terms of the common language. Because all data structures and message passing follow a set of rules known as the common language specification (CLS), one object can exchange information with another object even when the two were originally implemented in different languages [6].

⁴ See http://www.arm.com/products/CPUs/families/ARM9EFamily.html

Applications for the CLR, or .NET applications, are mostly packaged as one or more portable executable (PE) files. PE is originally the executable format for native programs under Windows operating systems, and such files can be dynamically loaded. To fit the applications into the virtual machine domain, metadata is included in each PE file. The metadata describes the application, including its types, members, references and class information, which the runtime can use for memory allocation, method invocation, object location, code verification, etc. Based on the class information, a class can be loaded either as a value class, such as a struct, or as a reference class, such as a class. Like the JVM, the CLR uses byte code to represent its virtual machine instructions. The instructions are specific to stack manipulation: values are pushed onto and popped from the abstract stack during execution. The difference is that, unlike in the JVM, instructions in the CLR are not type specific. For an add instruction, the JIT compiler itself determines the types of the values in the stack slots, since information about variable types is already included in the metadata. Designing the common language this way widens the value passing semantics, so multi-language interoperability is well supported [11]. When we say CLR, we refer to the implementation of the virtual machine, while the specification is defined in the common language infrastructure (CLI).
The CLI is now an international standard accepted by Ecma International⁵, which allows parties other than Microsoft to implement a corresponding CLR environment, such as the Mono Project⁶ and the DotGNU Project⁷. Their implementations are not required to contain Windows-specific libraries such as Windows Forms, which are not part of the CLI specification.
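By contrast with the JVM's typed instructions, the CLR has a single add instruction whose operand types are known from metadata. The toy C interpreter below models this with one ADD opcode that dispatches on a type tag stored with each stack slot. The mnemonics are loosely modelled on CIL (ldc.i4, ldc.r4, add, ret), but the data structures and run() are hypothetical, and checking the tag at every execution is a simplification: a real CLR JIT compiler resolves the types once, ahead of execution, and emits the matching native add.

```c
#include <stddef.h>

/* Mnemonics loosely modelled on CIL; encoding is hypothetical. */
enum op { LDC_I4, LDC_R4, ADD, RET };
enum type { T_INT, T_FLOAT };

/* A stack slot tagged with its type. In a real CLR the JIT compiler derives
   these types ahead of execution from metadata; the runtime tag here is a
   simplification for illustration. */
struct slot {
    enum type t;
    union { int i; float f; } v;
};

struct insn {
    enum op op;
    union { int i; float f; } imm;
};

struct slot run(const struct insn *code)
{
    struct slot stack[64];
    size_t sp = 0;                       /* number of occupied slots */
    for (const struct insn *pc = code; ; pc++) {
        switch (pc->op) {
        case LDC_I4:
            stack[sp].t = T_INT;   stack[sp].v.i = pc->imm.i; sp++; break;
        case LDC_R4:
            stack[sp].t = T_FLOAT; stack[sp].v.f = pc->imm.f; sp++; break;
        case ADD: {
            /* One untyped add: the operand type selects the operation.
               Both operands are assumed to have the same type. */
            struct slot b = stack[--sp];
            struct slot *a = &stack[sp - 1];
            if (a->t == T_INT)
                a->v.i += b.v.i;
            else
                a->v.f += b.v.f;
            break;
        }
        case RET:
            return stack[sp - 1];
        }
    }
}
```

The same ADD opcode yields 42 for the integer constants 2 and 40 and 0.75 for the floats 0.5 and 0.25; only the type information, not the instruction, differs.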

Garbage Collection In Virtual Machines
While a program is running, the memory it occupies is not always freed by the program itself, even though freeing it is the program's responsibility. This is often due to poor programming habits. Without a mechanism that automatically cleans up memory, memory would eventually be exhausted and no other program could obtain memory resources. Such a mechanism is called garbage collection. In object oriented virtual machines such as the JVM and the CLR, whether the memory of an object is garbage is decided by its reachability. At runtime, the virtual machine periodically scans the list of objects in its hash table to see whether an inspected memory space belongs to any reachable object or its sub-objects. If no such object is found, the memory space is considered unreachable and can safely be collected. In general, it is not realistic to scan the whole memory every time, as this might affect the usability of the program. To do the task more efficiently, modern virtual machines divide memory segments into generations, where generation 1 is the newest generation, and so on. Experience shows that newly allocated memory is more likely to become unused during the program's life cycle, while older memory segments tend to remain in use. The basic algorithm is as follows [7]:

• after the garbage collector scans the objects for the first time, the surviving memory slots are labeled generation 2, while newly allocated memory slots are labeled generation 1;
• in the following rounds of collection, only garbage in generation 1 is scanned and collected, until no more memory can be released.

When distributing cross-platform software applications that run on virtual machines, it is particularly important for garbage collection to be well implemented in terms of the performance and stability of the running application, since the memory management techniques at the bottom layer may vary across platforms.

⁵ See http://www.ecma-international.org/publications/standards/Ecma-335.htm
⁶ See http://www.mono-project.com
⁷ See http://www.gnu.org/projects/dotgnu/
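The two-step generational scheme listed above can be sketched as a minor collection over a small fixed heap. In this hypothetical C model each object records its generation and up to two references; collect_young() marks everything reachable from the roots, frees unreachable generation-1 objects and promotes the survivors to generation 2, which later minor collections never free. For simplicity it traces through the whole heap from the roots, whereas production collectors avoid that with remembered sets; none of the names come from the JVM or the CLR.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SIZE 8
#define NO_REF    (-1)

/* A hypothetical heap object with up to two outgoing references. */
struct object {
    bool in_use;
    int  gen;        /* 1 = newest generation, 2 = survived a collection */
    bool marked;
    int  ref[2];     /* indices of referenced objects, or NO_REF */
};

struct object heap[HEAP_SIZE];

/* Allocate a new object in generation 1; returns its index or -1. */
int gc_alloc(void)
{
    for (int i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].in_use) {
            heap[i] = (struct object){ true, 1, false, { NO_REF, NO_REF } };
            return i;
        }
    }
    return -1;
}

/* Mark everything reachable from object i (depth-first). */
static void mark(int i)
{
    if (i == NO_REF || heap[i].marked)
        return;
    heap[i].marked = true;
    mark(heap[i].ref[0]);
    mark(heap[i].ref[1]);
}

/* Minor collection: free unreachable generation-1 objects and promote the
   survivors to generation 2. Generation-2 objects are never freed here.
   Returns the number of objects freed. */
int collect_young(const int *roots, size_t nroots)
{
    for (int i = 0; i < HEAP_SIZE; i++)
        heap[i].marked = false;
    for (size_t r = 0; r < nroots; r++)
        mark(roots[r]);

    int freed = 0;
    for (int i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].in_use || heap[i].gen != 1)
            continue;
        if (heap[i].marked) {
            heap[i].gen = 2;          /* survivor: promote */
        } else {
            heap[i].in_use = false;   /* unreachable: reclaim */
            freed++;
        }
    }
    return freed;
}
```

In a heap where root a references b and c is unreferenced, the first collect_young() frees c and promotes a and b; if b later becomes unreachable, further minor collections leave it alone, because it now lives in generation 2.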

1.3 Platform Oriented Virtualization

While application oriented virtual machines allow applications to run correctly without knowing the architecture they are running on, in other cases people do care about the architecture for particular purposes. Software engineers and developers may need to test the software they are implementing on different operating systems. Some users want to run a second operating system without repartitioning their hard drive. Many other such cases can be thought of, and we define a virtual machine serving such purposes to be a platform oriented virtual machine. To distinguish platform oriented virtual machines from application oriented ones more easily, we use the name platform emulator instead. The essential difference between the two is that a platform emulator provides a virtual computer environment whose hardware components are either mapped from the physical system or virtualized by software modules. We call each such computer environment a guest computer, and the actual computer running the emulator is called the host computer. A guest computer is supposed to be no different from an actual physical computer: it follows the standard steps to power on, boot and shut down. In most cases, any software, including operating systems, can be executed on the guest computer without problems.


To a host computer, guest computers are just software instances running independently in different processes. This is achieved by giving each guest computer separate resources, such as memory space and storage space, so that more than one guest computer can run on the host computer at once. In some emulators, such as VMware [24], storage space is saved as a separate partition on the host computer's hard drive or as a disk image in the file system. Before the guest computer is turned on, the platform emulator maps the image file as a hard drive device to be accessed by the guest computer. An image file containing a software system is called a virtual appliance, which provides great convenience for distributing large software systems [17]. Companies that produce operating systems, firewalls or databases can distribute corresponding virtual appliances for users to try and run without disturbing their existing platform⁸. Other commercial platform emulators, such as Microsoft Virtual PC⁹, provide the same functionality as VMware. The emulator Win4Lin¹⁰ provides only emulation specific to Windows systems under Linux. Open source platform emulators include bochs¹¹ and QEMU¹², which provide emulation of multiple architectures. For PowerPC emulation, one could consider PearPC¹³, which is mainly aimed at running Mac OS X. More emulators, together with a good comparison of them, can be found at [28].
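Mapping an image file as the guest's hard drive amounts to translating each sector access by the guest into a file access on the host. The sketch below assumes a raw (flat) image in which guest sector n starts at byte n × 512; read_sector() and write_sector() are hypothetical names, and real formats such as VMware's disk images can add headers and sparse allocation on top of this simple layout.

```c
#include <stdio.h>

#define SECTOR_SIZE 512

/* Read guest sector 'lba' from a raw disk image into buf.
   Assumes sector n starts at byte offset n * 512 in the image file.
   Returns 0 on success, -1 on failure (including reads past the end). */
int read_sector(FILE *image, long lba, unsigned char buf[SECTOR_SIZE])
{
    if (fseek(image, lba * SECTOR_SIZE, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, SECTOR_SIZE, image) == SECTOR_SIZE ? 0 : -1;
}

/* Write guest sector 'lba' into the raw disk image.
   Returns 0 on success, -1 on failure. */
int write_sector(FILE *image, long lba, const unsigned char buf[SECTOR_SIZE])
{
    if (fseek(image, lba * SECTOR_SIZE, SEEK_SET) != 0)
        return -1;
    return fwrite(buf, 1, SECTOR_SIZE, image) == SECTOR_SIZE ? 0 : -1;
}
```

Calling fseek() before every transfer also satisfies the C rule that a stream switching between writing and reading must be repositioned, so the guest can freely interleave reads and writes on the same image.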

1.4 Distribution In Source Code

A special architecture-neutral format is source code distribution. For software companies and distributors, it is important to gain as many business clients and end users as possible on different architectures and platforms in order to increase the market share of their software. Setting aside concerns about copyright and the protection of intellectual property, we consider the scenario of distributing software in the form of source code. This scenario is usually found in the free and open source software (FOSS) community. Such software systems and applications are mostly licensed under the BSD license, the GNU General Public License or the MIT License [12], which all make their source code publicly viewable and allow people to study and even modify it. It is believed that such licenses save software developers a significant amount of time and manpower when building new software, since they can either use or learn from the opened source code [3]. There are two types of source code: code to be compiled, such as code written in C and C++, and code to be interpreted, such as code written in Perl and Python. In the following paragraphs we call code to be interpreted a script, to distinguish it from code to be compiled.

⁸ Virtual appliances for VMware can be downloaded at http://www.vmware.com/vmtn/appliances/. QEMU appliances can be downloaded at http://free.oszoo.org
⁹ See http://www.microsoft.com/windows/virtualpc
¹⁰ See http://www.win4lin.com/
¹¹ See http://bochs.sourceforge.net
¹² See http://fabrice.bellard.free.fr/qemu/
¹³ See http://pearpc.sourceforge.net

Code To Be Compiled
Consider compiled code whose binary runs natively only on a specific architecture or operating system rather than on a virtual machine. During software deployment, it may be more feasible to distribute its source code in addition to the binary form of the program, provided the code follows particular standards supported by the operating systems on which the program is intended to run. One example is POSIX, which stands for Portable Operating System Interface, with the X signifying that its application programming interfaces (APIs) are inherited from Unix. Together with the ANSI C standard, the POSIX standard is widely supported by Unix systems as well as by Unix-like systems such as Linux. Windows NT based operating systems support POSIX only partially [27]. To run applications that are not native to Windows but conform to POSIX, one can install either the Cygwin environment¹⁴ or Windows Services for UNIX¹⁵, which add POSIX compatibility to the Windows operating system. Any source code that conforms to the POSIX standard can be compiled into the corresponding native binary format. This is very similar to install-time compilation in virtual machines, except that source code, rather than intermediate code, is compiled. It is not necessary to use the same compiler everywhere, as long as the compiler being used supports the ANSI C standard. A famous example is the Hello World program below, which can be compiled almost anywhere.

    #include <stdio.h>

    int main(void)
    {
        printf("Hello, World!\n");
        return 0;
    }

As we have discussed previously, executable and linkable binaries are limited in where they can run because they lack cross-platform support. Even though some software applications can run on virtual machines, and each platform has its own implementation of such virtual machines, this does not remove the limitation.
This is because a large number of software applications have not yet been ported to virtual machines. The idea of source code distribution is therefore as illustrated above: software companies and distributors provide standard compliant source code to end users, who compile it on their own computers. We give an example in a later section.
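Going one step beyond Hello World, the following sketch calls the POSIX uname() interface declared in <sys/utsname.h>. Because it uses only ANSI C and POSIX, the same source would be expected to compile unchanged on POSIX-conforming systems such as Linux, the BSDs or Cygwin, each producing its own native binary; describe_system() is a hypothetical helper name.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Write "<sysname> <machine>" of the running system into buf, e.g. the
   operating system name followed by the hardware architecture.
   Uses only the POSIX uname() interface. Returns 0 on success, -1 if
   uname() fails or the text does not fit into buf. */
int describe_system(char *buf, size_t len)
{
    struct utsname u;
    if (uname(&u) == -1)
        return -1;
    int n = snprintf(buf, len, "%s %s", u.sysname, u.machine);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Each target system compiles this source with its own compiler and C library, so the printed sysname and machine fields differ, but the source code stays the same.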
¹⁴ See http://www.cygwin.com
¹⁵ See http://www.microsoft.com/windows/sfu/


Script To Be Interpreted
Compared with compiled programs, scripts are typically more portable and tend to be smaller in file size, because the statements in scripts are usually at a higher level than the low level code of compiled programs while carrying the same semantic information. The difference between a compiler and an interpreter is that a compiler translates the given code into machine code with the same semantics as the source code, which is then executed directly by the computer, whereas an interpreter goes through each statement of the given script and executes it immediately, without knowing which statement will be executed next. While a script runs, the interpreter accesses run-time information, such as input and output and the outcome of conditional branches, and keeps the program state consistent with the semantics specified by the statements of the script. In most cases, source code is the only form in which software written in an interpreted language exists; in other words, the script itself is the software. Scripts can be executed on multiple platforms if their interpreters are implemented on those platforms. Interpreters can generally be divided into two kinds: stand-alone interpreters and integrated interpreters. For example, most interpreters for the previously mentioned languages Perl and Python are stand-alone interpreters. These two languages are widely used on operating systems that support the POSIX standard, mainly for system administration and small applications. Provided that the interpreter is pre-installed, a script can be written so that it can be called directly. An example Hello World program written in Perl, saved as hw.pl, is given below.

    #!/usr/bin/perl
    print "Hello, World!\n";
    __END__

The first line of hw.pl specifies the path of the desired interpreter, so that when the script is called, the system automatically locates the interpreter that will run it.
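What the system does with the first line of a script such as hw.pl can be modelled in a few lines of C. The helper parse_shebang() is hypothetical and simplified: it only extracts the interpreter path after "#!", while real Unix kernels also pass an optional argument that follows the path and enforce their own length limits.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Extract the interpreter path from the first line of a script.
   Returns 0 and fills interp on success, or -1 if the script does not
   start with "#!" or the path does not fit into len bytes. */
int parse_shebang(const char *script, char *interp, size_t len)
{
    if (strncmp(script, "#!", 2) != 0)
        return -1;
    const char *p = script + 2;
    while (*p == ' ' || *p == '\t')          /* optional blanks after "#!" */
        p++;
    size_t n = 0;
    while (p[n] != '\0' && !isspace((unsigned char)p[n]))
        n++;                                 /* path ends at first blank/newline */
    if (n == 0 || n >= len)
        return -1;
    memcpy(interp, p, n);
    interp[n] = '\0';
    return 0;
}
```

Given the Perl example above, parse_shebang() yields "/usr/bin/perl"; a file without a "#!" line is rejected, which corresponds to having to pass the script to the interpreter explicitly.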
Similar syntax applies to Python. Under operating systems that do not natively provide a shell environment for this purpose, one can either install a third-party shell environment, such as the previously mentioned Cygwin environment, or pass the path of the script as a parameter to the interpreter. Integrated interpreters mostly refer to the script engines built into browsers. Gecko based browsers, such as Mozilla Firefox, support JavaScript¹⁶. Microsoft's Internet Explorer supports both JavaScript and VBScript, the latter being derived from the Visual Basic language¹⁷. Compared with Internet Explorer, Gecko browsers depend more heavily on JavaScript. Combined with the XML User Interface Language (XUL), a markup language for building graphical user interfaces within the browser, JavaScript allows applications other than web applets to be created [29]. For example, many of the dialogs in the Mozilla Firefox browser are themselves written in XUL. Such applications are normally shipped in a particular format called Cross-Platform Install, or XPI. With supported browsers, installing from an XPI file is extremely easy: just clicking a link to it automatically launches the installation process¹⁸.

¹⁶ For more information about JavaScript, visit http://www.mozilla.org/js/
¹⁷ For more information about VBScript, visit http://www.microsoft.com/vbscript/

Adaptive Compilation
For given source code to be successfully deployed on a group of computers with different platforms, the code has to follow the supported standards. Moreover, during installation or deployment, we want to help the build system adjust the build process adaptively, so that the source code compiles successfully and is therefore executable on the system. The problem to solve is that, for the same functionality, the corresponding API may differ between platforms. For instance, the memcpy() function found in the GNU C Library corresponds to bcopy() in the BSD system library, with the source and destination arguments in reversed order. Such adjustment usually works as follows: after inspecting the system environment, the build system defines the related identifiers and passes them to the code preprocessor. Based on the defined identifiers, the preprocessor conditionally selects the correct branches in the source code, ensuring that the correct API is used. See the following C code segment as an example.

    ...
    #ifdef BSD_MEM
    #define memcpy(_dest, _src, _l) bcopy(_src, _dest, _l)
    #endif
    ...

The statements above are macros, which are understood by the C preprocessor. They say that if the identifier BSD_MEM is defined, then memcpy() is defined as a macro expanding to bcopy(), with the destination and source pointers swapped. If the build system finds that the current operating system uses the BSD system library, it compiles the source code with the option -DBSD_MEM, which defines the identifier for the preprocessor. Almost all C compilers support this option, and the preprocessor runs automatically before each compilation. Once the source code is preprocessed, every occurrence of memcpy() has been replaced by bcopy(), so compatibility is ensured. This example describes a solution for code to be compiled, but the same technique applies to code to be interpreted, that is, scripts.
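A slightly more defensive variant of the same idea is sketched below. Instead of redefining the standard memcpy() (defining a standard library name as a macro after including its header is undefined behaviour in C), the code routes calls through a project-specific macro, COPY_BYTES, whose expansion depends on whether the build system defined BSD_MEM. COPY_BYTES and copy_record() are hypothetical names; bcopy() is declared in <strings.h> on BSD and most Unix systems.

```c
#include <stddef.h>
#include <string.h>

/* The build system is assumed to pass -DBSD_MEM (or write it into a
   generated config.h) when only the BSD bcopy() is available. Wrapping
   the call in a project-specific name avoids redefining memcpy itself. */
#ifdef BSD_MEM
#include <strings.h>                      /* declares bcopy() */
#define COPY_BYTES(dest, src, n) bcopy((src), (dest), (n))
#else
#define COPY_BYTES(dest, src, n) memcpy((dest), (src), (n))
#endif

/* Example caller: copy a fixed-size record without caring which API
   the platform provides. Buffers must not overlap. */
void copy_record(char *dst, const char *src, size_t n)
{
    COPY_BYTES(dst, src, n);
}
```

Building with cc -DBSD_MEM selects the bcopy() branch, and building without it selects memcpy(). Note that bcopy() tolerates overlapping buffers, like memmove(), whereas memcpy() does not, so the wrapper should only be used for non-overlapping copies.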
In both cases, build environment information, such as the architecture type and the paths to the required libraries, is detected before preprocessing. To detect the build environment, the GNU tools Autoconf, Automake and Libtool, which follow the conventions of the GNU Coding Standards [16], interact with the operating system to obtain this information [23]. The build proceeds in the following steps:

• Autoconf generates a portable shell script, configure, which tests the system and sets build parameters such as the macro identifiers shown above;
• Automake produces makefile templates (Makefile.in) from which the final makefiles are generated;
• the configure script is then run to adjust the compiler parameters, and the makefiles are generated according to the environment information;
• to compile the source code, the make command executes the makefiles, with the help of Libtool for producing portable libraries.

Adaptive compilation can be found on highly customizable operating systems from open source communities. Popular systems such as Gentoo Linux¹⁹ and FreeBSD²⁰ provide similar source code package systems, named Portage and the Ports System respectively. Each system keeps a catalogue of software, much like a telephone book, and allows end users to search for particular applications. The source code of a selected application is retrieved remotely, then compiled and installed in the steps above. In terms of compilation, source code can be considered a not compiled distribution format. One project, Linux From Scratch (LFS)²¹, provides no actual code at all. Instead, it provides manuals on how to build and configure a complete operating system by obtaining the required source code manually. Even the compiler itself has to be downloaded and compiled by an existing compiler obtained elsewhere.

¹⁸ For more XUL applications, visit http://addons.mozilla.org

1.5 Other Distribution Formats

Slim Binaries
In 1997, slim binaries were introduced in [9], which proposed translating source code into intermediate code represented as a tree structure. At execution time, the time otherwise spent accessing slow mechanical storage, mostly hard drives and floppy drives, can be used to compile the intermediate code into native binary code. It is a form of just-in-time compilation. Compared with byte code, a tree representation conveniently preserves semantic structure, such as conditional branches. We have defined application oriented virtualization as essentially an abstract machine executing byte code, analogous to a real computer running native machine code, so the mechanism of slim binaries does not belong to this category. Nor is it a source code distribution format, since by our definition source code should be human readable.
¹⁹ See http://www.gentoo.org
²⁰ See http://www.freebsd.org
²¹ See http://www.linuxfromscratch.org

Yu Wang wangy22@mcmaster.ca

In [8], slim binaries are put forward as a replacement for Java. The author points out that a virtual machine cannot verify its byte code efficiently, because the program's semantics have to be recovered by analyzing the byte code again, which is redundant work; with code stored in a tree representation, by contrast, the semantic structure of a slim binary is easy to verify. Another advantage of slim binaries, mentioned in [5], is that their code size is even smaller than that of native machine code, owing to the tree-structured representation of their semantics.
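To illustrate why a tree is easy to verify, here is a hypothetical verifier over a toy tree encoding (nested tuples, assumed for this sketch). A single recursive pass, visiting each node once, assigns a type to every node and rejects ill-formed programs; a byte code verifier typically needs data-flow analysis over the instruction stream to recover the same structure.

```python
# Toy tree encoding assumed for this sketch:
#   ("const", n)              integer literal
#   ("var", name)             integer variable
#   (op, left, right)         op in "+", "-", "*" (int result) or "<" (bool result)
#   ("if", cond, then, else)  cond must be bool, branches must agree
ARITH_OPS = {"+", "-", "*"}
COMPARE_OPS = {"<"}


class VerifyError(Exception):
    """Raised when a tree is not a well-formed, well-typed program."""


def verify(node: tuple, declared: set[str]) -> str:
    """Return the static type ("int" or "bool") of node, or raise VerifyError."""
    if not isinstance(node, tuple) or not node:
        raise VerifyError(f"malformed node: {node!r}")
    kind = node[0]
    if kind == "const" and len(node) == 2:
        if type(node[1]) is not int:
            raise VerifyError(f"constant must be an int: {node!r}")
        return "int"
    if kind == "var" and len(node) == 2:
        if node[1] not in declared:
            raise VerifyError(f"undeclared variable: {node[1]!r}")
        return "int"
    if (kind in ARITH_OPS or kind in COMPARE_OPS) and len(node) == 3:
        for operand in node[1:]:
            if verify(operand, declared) != "int":
                raise VerifyError(f"operands of {kind!r} must be int")
        return "int" if kind in ARITH_OPS else "bool"
    if kind == "if" and len(node) == 4:
        if verify(node[1], declared) != "bool":
            raise VerifyError("condition of 'if' must be bool")
        then_type = verify(node[2], declared)
        else_type = verify(node[3], declared)
        if then_type != else_type:
            raise VerifyError("branches of 'if' must have the same type")
        return then_type
    raise VerifyError(f"unknown or malformed node: {node!r}")


if __name__ == "__main__":
    max_tree = ("if", ("<", ("var", "x"), ("var", "y")), ("var", "y"), ("var", "x"))
    print(verify(max_tree, {"x", "y"}))  # int
```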

Architecture Neutral Distribution Format
As mentioned in both [9] and [11], there used to be a specification commissioned by the Open Software Foundation called the Architecture Neutral Distribution Format (ANDF). It was an attempt to distribute software in the form of intermediate code to be executed on a stack-based virtual machine. Instead of being compiled just in time, the intermediate code was to be compiled at installation, but install-time code generation is not as time-efficient as JIT compilation, as mentioned in Section 1.2. On modern platforms, static variables and functions are stored at corresponding memory offsets in binary files, and likewise in intermediate code for virtual machines. One major reason ANDF faded after the 1990s is that it stores variables and functions symbolically in the intermediate code, which makes it easier to reverse-engineer the source code. For commercial software companies, using such a format amounts to disclosing their intellectual property to the public. After 2000, free and open source software (FOSS) became much more popular than before. FOSS developers care more about how widely their software can be distributed than about protecting intellectual property. For them, ANDF can be a good alternative to distributing software as source code [20]. In fact, some ANDF-based projects are still active, such as the TenDRA project²², which provides C/C++ compilers for ANDF.
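The reverse-engineering concern can be made concrete with a toy stack-machine instruction list. The format below is hypothetical and is not actual ANDF: in the symbolic form every variable and function keeps its source-level name, as described for ANDF above, while in the offset form names are replaced by storage slots and function-table indices, as in the binary files of modern platforms. A simple scan shows which identifiers each form gives away.

```python
# Hypothetical stack-machine program in symbolic (ANDF-like) form:
# the operands of load/store/call are source-level names.
SYMBOLIC = [
    ("load", "customer_balance"),
    ("load", "secret_discount_rate"),
    ("call", "apply_discount"),
    ("store", "final_price"),
]


def to_offset_form(code):
    """Replace names with indices: variables -> storage slots, functions -> table entries."""
    var_slots: dict[str, int] = {}
    func_slots: dict[str, int] = {}
    out = []
    for op, arg in code:
        if op in ("load", "store"):
            # A new name gets the next free slot; a known name reuses its slot.
            out.append((op, var_slots.setdefault(arg, len(var_slots))))
        elif op == "call":
            out.append((op, func_slots.setdefault(arg, len(func_slots))))
        else:
            out.append((op, arg))
    return out


def leaked_identifiers(code):
    """Names that anyone can read straight out of the distributed code."""
    return {arg for op, arg in code
            if op in ("load", "store", "call") and isinstance(arg, str)}


if __name__ == "__main__":
    print(sorted(leaked_identifiers(SYMBOLIC)))
    print(leaked_identifiers(to_offset_form(SYMBOLIC)))  # set()
```

Both forms describe the same computation; only the symbolic one reveals names such as `secret_discount_rate`, which is the kind of disclosure that made the format unattractive to commercial vendors.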

1.6

Concluding Remarks

In the above sections, we have discussed four forms in which software can be distributed, known as software distribution formats: fat binaries, intermediate code for application-oriented virtual machines, appliances for platform-oriented virtual machines, and software distributed as source code. These formats are commonly found in existing software distributions.

A fat binary includes fully compiled binaries for the target platforms in a single file. Because it typically archives more than one binary, its file size is relatively large. Since the only operation required before execution is extracting the binary that corresponds to the correct platform, it is considered convenient for average users. It is used for software distribution on the Mac OS X operating system from Apple Computer and on PocketPC devices promoted by Microsoft.

An application-oriented virtual machine provides an environment for running applications in the form of half-compiled intermediate code. The intermediate code has to be compiled either at install time or just in time for execution. Just-in-time compilation is considered efficient, as only the code branches actually reached while running need to be compiled. Popular examples include the Java Virtual Machine designed by Sun Microsystems and the Common Language Runtime of the Microsoft .NET Framework.

A platform-oriented virtual machine loads appliances as guest computers running on top of a host computer. Guest computers are separate process instances and can run without interfering with each other or with their host computer, which is good for risky tasks such as product testing. Since appliances are computer images and have nothing to do with compilers, they fall into none of the three categories: fully compiled, half compiled or not compiled. A number of appliances can be found at the VMware and FreeOSZoo websites listed above.

Software in the form of source code, which is not compiled, can also be distributed. Source code has the smallest file size of all the distribution formats, because it is written in plain text, which compresses efficiently. Its weakness is that compilation is required before the software can run. Distribution in source code is common in open source communities such as Gentoo Linux, FreeBSD and Linux From Scratch, whose users tend to be advanced.

There are other software distribution formats not introduced here, such as Flash media from Adobe²³. Flash is a format for universally distributable multimedia applications, made up mainly of scripts, intermediate code and, usually, multimedia content. Similar to application-oriented virtual machines, a Flash file is executed by Flash Player, which can run either stand-alone or as a browser plug-in.

The variety of software distribution formats we have described makes multi-platform software support possible. As computer performance improves over time, the efficiency gap between these formats, in both compilation and execution, is expected to narrow, and combinations of existing formats are expected to appear.

22 See http://www.tendra.org

1.7

Exam Question

1. What are fat binaries and slim binaries?
2. What are the differences between application-oriented and platform-oriented virtual machines?
3. Compare all the formats discussed in this paper.
23 Originally designed by Macromedia, which is now part of Adobe. See http://www.macromedia.com


Bibliography
[1] Apple Computer, Inc. Universal Binary Programming Guidelines. 1 Infinite Loop, Cupertino, CA 95014, 408-996-1010, 2nd edition, March 2006.
[2] Ralph Arvesen. Developing and deploying Pocket PC setup applications. http://msdn.microsoft.com/library/en-us/dnnetcomp/html/netcfdeployment.asp.
[3] Sami Asiri. Open source software. SIGCAS Comput. Soc., 33(1):2, 2003.
[4] John Aycock. A brief history of just-in-time. ACM Comput. Surv., 35(2):97–113, 2003.
[5] Arpad Beszedes, Rudolf Ferenc, Tibor Gyimothy, and Andre Dolenc. Survey of code-size reduction methods. ACM Comput. Surv., 35(3):223–267, 2003.
[6] Microsoft Corporation. Automatic memory management, 2006. [Online at MSDN; accessed 7-March-2006].
[7] Microsoft Corporation. Common language runtime overview, 2006. [Online at MSDN; accessed 7-March-2006].
[8] Michael Franz. The Java Virtual Machine: A passing fad? IEEE Software, 15(6):26–??, November/December 1998.
[9] Michael Franz and Thomas Kistler. Slim binaries. Commun. ACM, 40(12):87–94, 1997.
[10] Adele Goldberg and David Robson. Smalltalk-80: the language and its implementation. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1983.
[11] K John Gough. Stacking them up: a comparison of virtual machines. In ACSAC ’01: Proceedings of the 6th Australasian conference on Computer systems architecture, pages 55–61, Washington, DC, USA, 2001. IEEE Computer Society.
[12] Michael K. Johnson. Licenses and copyright. Linux J., 1996(29es):3, 1996.
[13] John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I. Commun. ACM, 3(4):184–195, 1960.
[14] Brad A. Myers. Using handhelds and PCs together. Commun. ACM, 44(11):34–41, 2001.
[15] T. Pittman. Two-level hybrid interpreter/native code execution for combined space-time program efficiency. In SIGPLAN ’87: Papers of the Symposium on Interpreters and interpretive techniques, pages 150–152, New York, NY, USA, 1987. ACM Press.
[16] Arnold Robbins. What’s GNU? GNU coding standards. Linux J., 1995(16es):8, 1995.
[17] Constantine Sapuntzakis, David Brumley, Ramesh Chandra, Nickolai Zeldovich, Jim Chow, Monica S. Lam, and Mendel Rosenblum. Virtual appliances for deploying and maintaining software. In LISA ’03: Proceedings of the 17th USENIX conference on System administration, pages 181–194, Berkeley, CA, USA, 2003. USENIX Association.
[18] Yunhe Shi, David Gregg, Andrew Beatty, and M. Anton Ertl. Virtual machine showdown: stack versus registers. In VEE ’05: Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments, pages 153–163, New York, NY, USA, 2005. ACM Press.
[19] Jeremy Singer. JVM versus CLR: a comparative study. In PPPJ ’03: Proceedings of the 2nd international conference on Principles and practice of programming in Java, pages 167–169, New York, NY, USA, 2003. Computer Science Press, Inc.
[20] Paul Tanner. Software portability: still an open issue? StandardView, 4(2):88–93, 1996.
[21] Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification. Addison-Wesley Professional, 2nd edition, 1999.
[22] David Ungar and Randall B. Smith. Self: The power of simplicity. In OOPSLA ’87: Conference proceedings on Object-oriented programming systems, languages and applications, pages 227–242, New York, NY, USA, 1987. ACM Press.
[23] Gary V. Vaughan, Ben Elliston, Tom Tromey, and Ian Lance Taylor. GNU Autoconf, Automake and Libtool. New Riders, 2000.
[24] Brian Walters. VMware virtual platform. Linux J., 1999(63es):6, 1999.
[25] Wikipedia. Fat binary. http://en.wikipedia.org/wiki/Fat_binary.
[26] Wikipedia. Comparison of virtual machines. Wikipedia, the free encyclopedia, 2006. [Online; accessed 17-March-2006].
[27] Wikipedia. POSIX. Wikipedia, the free encyclopedia, 2006. [Online; accessed 18-March-2006].
[28] Wikipedia. Universal binary. Wikipedia, the free encyclopedia, 2006. [Online; accessed 17-March-2006].
[29] Louie Zhao, Jay Yan, and Kyle Yuan. Mozilla accessibility on Unix/Linux. In W4A ’05: Proceedings of the 2005 International Cross-Disciplinary Workshop on Web Accessibility (W4A), pages 90–98, New York, NY, USA, 2005. ACM Press.

