Open Source Database

From DrugPedia: A Wikipedia for Drug discovery

Revision as of 06:43, 1 September 2008 by Ravi (Talk | contribs)
Jump to: navigation, search

In 1990, there was the era of Open Source Databases e.g MySQL, PostgreSQL.

MySQL

MySQL is a relational database management system (RDBMS) which has more than 11 million installations. The program runs as a server providing multi-user access to a number of databases.

MySQL is owned and sponsored by a single for-profit firm, the Swedish company MySQL AB, now a subsidiary of Sun Microsystems,which holds the copyright to most of the codebase. The project's source code is available under terms of the GNU General Public License, as well as under a variety of proprietary agreements.

Uses

MySQL is popular for web applications and acts as the database component of the LAMP, BAMP, MAMP, and WAMP platforms (Linux/BSD/Mac/Windows-Apache-MySQL-PHP/Perl/Python), and for open-source bug tracking tools like Bugzilla. Its popularity for use with web applications is closely tied to the popularity of PHP and Ruby on Rails, which are often combined with MySQL. PHP and MySQL are essential components for running popular content management systems such as Drupal, e107, Joomla!, WordPress and some BitTorrent trackers. Wikipedia runs on MediaWiki software, which is written in PHP and uses a MySQL database. Platforms and interfaces

MySQL is written in C and C++. The SQL parser uses yacc and a home-brewed lexer.

MySQL works on many different system platforms, including AIX, BSDi, FreeBSD, HP-UX, i5/OS, Linux, Mac OS X, NetBSD, Novell NetWare, OpenBSD, eComStation , OS/2 Warp, QNX, IRIX, Solaris, SunOS, SCO OpenServer, SCO UnixWare, Sanos, Tru64, Windows 95, Windows 98, Windows ME, Windows NT, Windows 2000, Windows XP, and Windows Vista. A port of MySQL to OpenVMS is also available.

Libraries for accessing MySQL databases are available in all major programming languages with language-specific APIs. In addition, an ODBC interface called MyODBC allows additional programming languages that support the ODBC interface to communicate with a MySQL database, such as ASP or ColdFusion. The MySQL server and official libraries are mostly implemented in ANSI C/ANSI C++.

To administer MySQL databases one can use the included command-line tool (commands: mysql and mysqladmin). Also downloadable from the MySQL site are GUI administration tools: MySQL Administrator and MySQL Query Browser. Both of the GUI tools are now included in one package called tools/5.0.html MySQL GUI Tools.

In addition to the above-mentioned tools developed by MySQL AB, there are several other commercial and non-commercial tools available. Examples include phpMyAdmin, a free Web-based administration interface implemented in PHP, or SQLyog Community Edition, a free desktop based GUI tool.

Features

As of August 2007, MySQL offers MySQL 5.0 in two different variants: the MySQL Community Server and Enterprise Server. They have a common code base and include the following features:

  • A broad subset of ANSI SQL 99, as well as extensions
  • Cross-platform support
  • Stored procedures
  • Triggers
  • Cursors
  • Updatable Views
  • True VARCHAR support
  • INFORMATION_SCHEMA
  • Strict mode
  • X/Open XA distributed transaction processing (DTP) support; two phase commit as part of this, using Oracle's InnoDB engine
  • Independent storage engines (MyISAM for read speed, InnoDB for transactions and referential integrity, MySQL Archive for storing historical data in little space)
  • Transactions with the InnoDB, BDB and Cluster storage engines; savepoints with InnoDB
  • SSL support
  • Query caching
  • Sub-SELECTs (i.e. nested SELECTs)
  • Replication with one master per slave, many slaves per master, no automatic support for multiple masters per slave.
  • Full-text indexing and searching using MyISAM engine
  • Embedded database library
  • Partial Unicode support (UTF-8 sequences longer than 3 bytes are not supported; UCS-2 encoded strings are also limited to the BMP)
  • ACID compliance using the InnoDB, BDB and Cluster engines
  • Shared-nothing clustering through MySQL Cluster

The MySQL Enterprise Server is released once per month and the sources can be obtained either from MySQL's customer-only Enterprise site or from MySQL's Bazaar repository, both under the GPL license. The MySQL Community Server is published on an unspecified schedule under the GPL and contains all bug fixes that were shipped with the last MySQL Enterprise Server release. Binaries are no longer provided by MySQL for every release of the Community Server.[8][9]

  • Replication support (i.e. Master-Master Replication & Master-Slave Replication

Distinguishing features

The following features are implemented by MySQL but not by some other RDBMS software:

  • Multiple storage engines, allowing you to choose the one which is most effective for each table in the application (in MySQL 5.0, storage engines must be compiled in; in MySQL 5.1, storage engines can be dynamically loaded at run time):
  • Native storage engines (MyISAM, Falcon, Merge, Memory (heap), Federated, Archive, CSV, Blackhole, Cluster, Berkeley DB, EXAMPLE, and Maria)
  • Partner-developed storage engines (InnoDB, solidDB, NitroEDB, BrightHouse)
  • Community-developed storage engines (memcached, httpd, PBXT)
  • Custom storage engines
  • Commit grouping, gathering multiple transactions from multiple connections together to increase the number of commits per second.

Server compilation type

There are 3 types of MySQL Server Compilations for Enterprise and Community users:

  • Standard: The MySQL-Standard binaries are recommended for most users, and include the InnoDB storage engine.
  • Max: (not MaxDB, which is a cooperation with SAP AG) is mysqld-max Extended MySQL Server. The MySQL-Max binaries include additional features that may not have been as extensively tested or are not required for general usage.
  • The MySQL-Debug binaries have been compiled with extra debug information, and are not intended for production use, because the included debugging code may cause reduced performance.

Beginning with MySQL 5.1, MySQL AB has stopped providing these different package variants. There will only be one MySQL server package, which includes a mysqld binary with all functionality and storage engines enabled. Instead of providing a separate debug package, a server binary with extended debugging information is also included in the standard package.

History

Milestones in MySQL development include:

  • MySQL was first released internally on 23 May 1995
  • Windows version was released on January 8, 1998 for Windows 95 and NT
  • Version 3.23: beta from June 2000, production release January 2001
  • Version 4.0: beta from August 2002, production release March 2003 (unions)
  • Version 4.1: beta from June 2004, production release October 2004 (R-trees and B-trees, subqueries, prepared statements)
  • Version 5.0: beta from March 2005, production release October 2005 (cursors, stored procedures, triggers, views, XA transactions)
  • Version 5.1: currently pre-production (since November 2005) (event scheduler, partitioning, plugin API, row-based replication, server log tables)
  • Sun Microsystems acquired MySQL AB on 26 February 2008.

PostgreSQL

PostgreSQL is an object-relational database management system (ORDBMS). It is released under a BSD-style license and is thus free software. As with many other open-source programs, PostgreSQL is not controlled by any single company, but relies on a global community of developers

Product name

The mixed-capitalization of the PostgreSQL name can confuse some people on first viewing. The several pronunciations of 'SQL' can lead to this confusion. PostgreSQL's developers pronounce it /poːst ɡɹɛs kjuː ɛl/; (Audio sample, 5.6k MP3). It is also common to hear it abbreviated as simply "postgres", which was its original name. Because of ubiquitous support for the SQL Standard amongst all relational databases, the community considered changing the name back to Postgres. However, the PostgreSQL Core Team announced in 2007 that the product would continue to be named PostgreSQL[2]. The name refers to the project's origins as a "post-Ingres" database, the original authors having also developed the Ingres database.

History

PostgreSQL evolved from the Ingres project at University of California, Berkeley. In 1982, the project leader, Michael Stonebraker, left Berkeley to commercialize Ingres. He returned to Berkeley in 1985 and started a post-Ingres project to address the problems with contemporary database systems that had become increasingly clear during the early 1980s. The new project, Postgres, aimed to add the fewest features needed to completely support types. These features included the ability to define types and to fully describe relationships – something used widely before but maintained entirely by the user. In Postgres, the database "understood" relationships, and could retrieve information in related tables in a natural way using rules. Postgres used many ideas of Ingres but not its code.

Starting in 1986, the team published a number of papers describing the basis of the system, and by 1988 had a prototype version. The team released version 1 to a small number of users in June 1989, then version 2 with a re-written rules system in June 1990. Version 3, released in 1991, again re-wrote the rules system, and added support for multiple storage managers and an improved query engine. By 1993 the great number of users began to overwhelm the project with requests for support and features. After releasing version 4 — primarily a cleanup — the project ended.

But open-source developers could obtain copies and develop the system further, because Berkeley had released Postgres under the BSD license. In 1994, Berkeley graduate students Andrew Yu and Jolly Chen replaced the Ingres-based QUEL query language interpreter with one for the SQL query language, creating Postgres95. The code was released on the web.

In July 1996, Marc Fournier at Hub.Org Networking Services provided the first non-university development server for the open source development effort. Along with Bruce Momjian and Vadim B. Mikheev, work began to stabilize the code inherited from Berkeley. The first open source version was released on August 1, 1996.

In 1996, the project was renamed to PostgreSQL to reflect its support for SQL. The first PostgreSQL release formed version 6.0 in January 1997. Since then, the software was maintained by a group of database developers and volunteers around the world, coordinating via the Internet.

Although the license allowed for the commercialization of Postgres, the code did not develop commercially at first — somewhat surprisingly considering the advantages Postgres offered. The main offshoot originated when Paula Hawthorn (an original Ingres team member who moved from Ingres) and Michael Stonebraker formed Illustra Information Technologies to commercialize Postgres.

In 2000, former Red Hat investors created the company Great Bridge to commercialize PostgreSQL and compete against commercial database vendors. Great Bridge sponsored several PostgreSQL developers and donated many resources back to the community but by late 2001 closed due to tough competition from companies like Red Hat and to poor market conditions.

In 2001, Command Prompt, Inc. released Mammoth PostgreSQL, the oldest surviving commercial PostgreSQL distribution. It continues to actively support the PostgreSQL community through developer sponsorships and projects including PL/Perl, PL/php, and hosting of community projects such as the PostgreSQL Build Farm.

In January 2005, PostgreSQL received backing by database vendor Pervasive Software, known for its Btrieve product which was ubiquitous on the Novell NetWare platform. Pervasive announced commercial support and community participation and achieved some success. But in July 2006, it left the PostgreSQL support market.

In mid-2005 two other companies announced plans to commercialize PostgreSQL with focus on separate niche markets. EnterpriseDB added functionality to allow applications written to work with Oracle to be more readily run with PostgreSQL. Greenplum contributed enhancements directed at data warehouse and business intelligence applications, including the BizGres project.

In October 2005, John Loiacono, executive vice president of software at Sun Microsystems, commented: "We're not going to OEM Microsoft but we are looking at PostgreSQL right now," although no specifics were released at that time. By November 2005, Sun had announced support for PostgreSQL. As of June 2006, Sun Solaris 10 6/06 ships PostgreSQL.

In August, 2007, EnterpriseDB announced[8] the Postgres Resource Center and EnterpriseDB Postgres, designed to be a fully configured distribution of PostgreSQL including many contrib modules and add-on components. EnterpriseDB Postgres was renamed to Postgres Plus in March, 2008.

The PostgreSQL project continues to make yearly major releases and minor "bugfix" releases, all available under the BSD license, based on contributions from both commercial vendors, support companies, and open source programmers at large.

Functions

Functions allow blocks of code to be executed by the server. Although these blocks can be written in SQL, the lack of basic programming operations, such as branching and looping, has driven the adoption of other languages inside of functions. Some of the languages can even execute inside of triggers. Functions in PostgreSQL can be written in the following languages:

  • A built-in language called PL/pgSQL resembles Oracle's procedural language PL/SQL.
  • Scripting languages are supported through PL/Lua, PL/LOLCODE, PL/Perl, plPHP, PL/Python, PL/Ruby, PL/sh, PL/Tcl and PL/Scheme.
  • Compiled languages C, C++, or Java (via PL/Java).
  • The statistical language R through PL/R.

PostgreSQL supports row-returning functions, where the output of the function is a set of values which can be treated much like a table within queries.

Functions can be defined to execute with the privileges of either the caller or the user who defined the function. Functions are sometimes referred to as stored procedures, although there is a slight technical distinction between the two.

Indexes

PostgreSQL includes built-in support for B+-tree, hash, GiST and GiN indexes. In addition, user-defined index methods can be created, although this is quite an involved process. Indexes in PostgreSQL also support the following features:

  • PostgreSQL is capable of scanning indexes backwards when needed; a separate index is never needed to support ORDER BY field DESC.
  • Expression indexes can be created with an index of the result of an expression or function, instead of simply the value of a column.
  • Partial indexes, which only index part of a table, can be created by adding a WHERE clause to the end of the CREATE INDEX statement. This allows a smaller index to be created.
  • The planner is capable of using multiple indexes together to satisfy complex queries, using temporary in-memory bitmap index operations.

Triggers

Triggers are events triggered by the action of SQL DML statements. For example, an INSERT statement might activate a trigger that checked if the values of the statement were valid. Most triggers are only activated by either INSERT or UPDATE statements.

Triggers are fully supported and can be attached to tables but not to views. Views can have rules, though. Multiple triggers are fired in alphabetical order. In addition to calling functions written in the native PL/PgSQL, triggers can also invoke functions written in other languages like PL/Perl.

MVCC

PostgreSQL manages concurrency through a system known as Multi-Version Concurrency Control (MVCC), which gives each user a "snapshot" of the database, allowing changes to be made without being visible to other users until a transaction is committed. This largely eliminates the need for read locks, and ensures the database maintains the ACID principles in an efficient manner.

Rules

Rules allow the "query tree" of an incoming query to be rewritten. One common usage is to implement updatable views.

Data types

A wide variety of native data types are supported, including:

  • Arrays
  • Arbitrary precision numerics
  • Variable length text
  • Geometric primitives
  • IPv4 and IPv6 addresses
  • CIDR blocks and MAC addresses
  • XML (as of 8.3)
  • UUID (as of 8.3)

In addition, users can create their own data types which can usually be made fully indexable via PostgreSQL's GiST infrastructure. Examples of these are the geographic information system (GIS) data types from the PostGIS project for PostgreSQL.

User-defined objects

New types of almost all objects inside the database can be created, including:

  • Casts
  • Conversions
  • Data types
  • Domains
  • Functions, including aggregate functions
  • Indexes
  • Operators (existing ones can be overloaded)
  • Procedural languages

Inheritance

Tables can be set to inherit their characteristics from a "parent" table. Data is shared between "parent" and "child" tables. Tuples inserted or deleted in the "child" table will respectively be inserted or deleted in the "parent" table. Also adding a column in the parent table will cause that column to appear in the child table as well. This feature is not fully supported yet—in particular, table constraints are not currently inheritable. This means that attempting to insert the id of a row from a child table into a table that has a foreign key constraint referencing a parent table will fail because Postgres doesn't recognize that the id from the child table is also a valid id in the parent table.

Inheritance provides a way to map the features of generalization hierarchies depicted in Entity Relationship Diagrams (ERD) directly into the PostgreSQL database.

Other features

  • Referential integrity constraints including foreign key constraints, column constraints, and row checks
  • Views. Although native support for updateable views has not been implemented, the same functionality can be achieved using the rules system.
  • Full, inner, and outer (left and right) joins
  • Sub-selects
  • Transactions
  • Supports most of the major features of SQL:2003 standard [2] unsupported supported <-- lead to documentation for the next release of PostgreSQL, follow this link to find manuals for already released versions of PostgreSQL
  • Encrypted connections via SSL
  • Binary and textual large-object storage
  • Online backup
  • Domains
  • Tablespaces
  • Savepoints
  • Point-in-time recovery
  • Two-phase commit
  • TOAST (The Oversized-Attribute Storage Technique) is used to transparently store large table attributes (such as big MIME attachments or XML messages) in a separate area, with automatic compression.
  • Regular expressions [3]
  • XML Datatype
  • XPath

Add-ons

  • Geographic objects via PostGIS. GPL.
  • Shortest-Path-Algorithms with pgRouting using PostGIS. GPL.
  • Full text search via Tsearch2 and OpenFTS. (As of version 8.3, Tsearch2 is included in core PostgreSQL)
  • Some synchronous multi-master derivatives or extensions exist, including

o pgcluster (BSD license)

o Postgres-R (in early stages of development)

  • Several asynchronous master/slave replication packages, including

o Slony-I (BSD license)

o Mammoth Replicator. (Proprietary)

o Bucardo

  • There are proxy (middleware) tools that enable replication, failover or load management and balacing for postgresql:

o PGPool-II.

o Sequoia available for a number of different server besides postgresql.

Database administration front-ends

psql

The primary front-end for PostgreSQL is the psql command-line program, which can be used to enter SQL queries directly, or execute them from a file. In addition, psql provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks; for example tab completion of object names and SQL syntax.

pgAdmin

pgAdmin is a graphical front-end administration tool for PostgreSQL, which is supported on most popular computer platforms. The program is available in more than a dozen languages, and is free software released under the Artistic License. The first prototype, named pgManager, was written for PostgreSQL 6.3.2 from 1998. The stable release (named pgAdmin II) was first released on 16 January 2002. The current version is pgAdmin III, which is released under the Artistic License. phpPgAdmin

phpPgAdmin is a web-based administration tool for PostgreSQL written in PHP and based on the popular phpMyAdmin interface originally written for MySQL administration. Benchmarks

Many informal performance studies of PostgreSQL have been done[10] but the first industry-standard and peer-validated benchmark was completed in June 2007 using the Sun Java System Application Server (commercial version of GlassFish) 9.0 Platform Edition, UltraSPARC T1 based Sun Fire server and Postgres 8.2. This result of 778.14 SPECjAppServer2004 JOPS@Standard compares favourably with the 874 JOPS@Standard with Oracle 10 on an Itanium based HP-UX

In August 2007, Sun submitted an improved benchmark score of 813.73 SPECjAppServer2004 JOPS@Standard. With the system under test at a reduced price, the price/performance improved from $US 84.98/JOPS to $US 70.57/JOPS. [12]

Prominent users

  • Yahoo! for web user behavioral analysis, storing 2 petabytes and claimed to be the largest data warehouse using a modified version of PostgreSQL.
  • Afilias, domain registries for .org, .info and others.
  • Sony Online multiplayer online games.
  • BASF, shopping platform for their agribusiness portal.
  • hi5.com social networking portal.
  • Skype VoIP application, central business databases.
  • Sun xVM, Sun's virtualization and datacenter automation suite
  • Evergreen, an open source integrated library system providing Online Public Access Catalog for many public library systems
  • NextBus, provider of arrival-time prediction and GPS tracking systems for public transportation