Univa Grid Engine
Univa Grid Engine is a batch-queuing system, forked from Sun Grid Engine. The software schedules resources in a data center, applying user-configurable policies to improve resource sharing and throughput by maximizing resource utilization. The product can be deployed on premises, on IaaS cloud infrastructure, or in a hybrid cloud environment.
History
The roots of Grid Engine as a commercial product date back to 1993. A more comprehensive genealogy of the product is described in Sun Grid Engine. Grid Engine was first distributed by Genias Software and, from 1999, after a company merger, by Gridware, Inc. In 2000, Sun Microsystems acquired Gridware. Sun renamed CODINE/GRD to Sun Grid Engine later that year and released it as open source in 2001.
In 2010, Oracle Corporation acquired Sun and subsequently renamed SGE to Oracle Grid Engine. Oracle Grid Engine moved to a closed-source model, providing binaries with the distribution but no source code. As a result, the project's open-source repository no longer reflected changes made by Oracle, and users were prevented from contributing code changes. In response, the Grid Engine community started the Open Grid Scheduler and Son of Grid Engine projects to continue developing and maintaining a free implementation of Grid Engine.
On January 18, 2011, Univa announced that it had hired the principal engineers from the Sun Grid Engine team. Univa Grid Engine development was led by CTO Fritz Ferstl, who founded the Grid Engine project and had run the business within Sun and Oracle for the preceding decade.
On October 22, 2013, Univa announced that it had acquired the Oracle Grid Engine assets and intellectual property, making it the sole commercial provider of Grid Engine software.
Between 2011 and 2013, Univa added new capabilities to Univa Grid Engine, including Univa Unisight and Univa License Orchestrator.
Univa Unisight provided new reporting and analytics capabilities for Univa Grid Engine workloads and infrastructure. Univa License Orchestrator extended Univa Grid Engine scheduling policies to support allocation and optimization of commercial software licenses, an important capability in Electronic Design Automation and other industries.
On June 24, 2018, Univa announced that it had demonstrated operation of a single Univa Grid Engine cluster with more than 1 million cores on AWS.
Releases
Univa Grid Engine 8.0 was Univa's first commercial release of Grid Engine, released on April 12, 2011. It was forked from SGE 6.2u5, the last open-source release, and added improved third-party application integration, license and policy management, enhanced support for software and hardware platforms, and cloud management tools.
Univa Grid Engine 8.0.1 was released on October 4, 2011. It added improved support for multi-core hardware, integration with NVIDIA GPUs, new job submission verifier (JSV) extensions, and additional bug fixes.
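A job submission verifier is a script that inspects each job at submission time and can accept, modify, or reject it. The following is a minimal sketch of a shell JSV using the helper library shipped with Grid Engine; the 1000-task cap is an illustrative policy, not a product default:

  #!/bin/sh
  # Load the JSV shell helper library distributed with Grid Engine.
  . ${SGE_ROOT}/util/resources/jsv/jsv_include.sh

  jsv_on_start()
  {
     return   # no extra data (such as the job environment) is requested
  }

  jsv_on_verify()
  {
     # Illustrative policy: reject array jobs with more than 1000 tasks.
     t_max=`jsv_get_param t_max`
     if [ "$t_max" != "" ] && [ "$t_max" -gt 1000 ]; then
        jsv_reject "array jobs are limited to 1000 tasks"
        return
     fi
     jsv_accept "job accepted"
  }

  jsv_main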
Univa Grid Engine 8.1.0 was announced on May 2, 2012, and improvements and bug fixes followed in a series of 8.1.x releases through 8.1.3, announced on November 15, 2012. The 8.1.x releases delivered important new functionality such as Job Classes, PostgreSQL Spooling, Resource Maps, and improvements to Share Tree Policies. The releases also introduced a new Fair Urgency Policy, deterministic Wildcard PE Selection, improved Diagnostics, pre-configured MPI integrations, and an improved Apache Hadoop Integration, as well as many bug fixes and performance improvements.
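Job Classes act as named submission templates: an administrator defines a class with preset (and optionally fixed) job options, and users reference the class at submission time instead of repeating those options. A brief sketch, assuming an administrator has defined a class named short (the class name is illustrative):

  # Submit a binary job through the "short" job class; the class's preset
  # options (queues, limits, defaults) are applied automatically.
  qsub -jc short -b y -cwd ./run_analysis.sh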
Univa Grid Engine 8.1.6 was announced on October 14, 2013. This update included scheduler improvements aimed at larger clusters, along with Qmaster stability and scalability improvements.
Univa Grid Engine 8.2.0 was released on September 2, 2014. It was the first release to provide native support for Microsoft Windows environments.
Univa Grid Engine 8.3.0 was released on June 22, 2015. The new Preemption feature allowed users to set priorities on different work so that, if a higher-priority application needed resources allocated to a lower-priority application, the lower-priority application would effectively be "paused", not lost, and would automatically resume once the higher-priority application completed. Among the other new features added in 8.3.0 to improve overall reliability and efficiency was Run Time Modification of Resources, which enabled cluster administrators to make configuration changes "on the fly", improving cluster availability and overall efficiency.
Univa Grid Engine 8.3.1 was released on August 28, 2015. This release contained additional fixes and enhancements identified since the release of 8.3.0.
Univa Grid Engine 8.4.0 was released on May 31, 2016. This release added support for Docker containers, automatically dispatching and running jobs within a user-specified Docker image.
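With this integration, a job opts into containerized execution through ordinary resource requests. A minimal sketch, assuming the docker boolean and docker_images resource names used by the Docker integration (treat the exact resource names as an assumption):

  # Run a simple binary job inside an Ubuntu 14.04 container on any
  # Docker-capable host whose image list matches the wildcard pattern.
  qsub -l docker,docker_images="*ubuntu:14.04*" -b y -cwd hostname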
Univa Grid Engine 8.5.0 was released on March 7, 2017. This release provided, on average, roughly twice the scheduling speed of open-source Grid Engine 6.2u5. Univa Grid Engine 8.5.0 also delivered significant improvements to Docker support, including mobility of GPU applications within a cluster.
Univa Grid Engine 8.6.0 was released on July 17, 2018. This release added support for NVIDIA Docker 2.0, providing more flexibility when running Docker containers in a Univa Grid Engine environment.
Univa Grid Engine 8.6.1 was released on August 8, 2018, providing improved control over GPU devices, and new affinity features, allowing jobs to gravitate towards, or away from, certain compute nodes.
Univa Grid Engine 8.6.2 was released on August 16, 2018. This release improved Univa Grid Engine performance and scalability in several key areas including network communications, job submission, memory allocation, and scheduler optimizations. This update also improved Univa Grid Engine job dispatch information.
Univa Grid Engine 8.6.3 was released on September 27, 2018. This update introduced bulk configuration changes for Univa Grid Engine hosts. Bulk configuration changes perform operations on many hosts simultaneously, making it easier to manage large Univa Grid Engine clusters.
Univa Grid Engine 8.6.4 was released on November 23, 2018, introducing new core binding strategies that made it easier and more flexible to specify how jobs are placed on nodes and cores. A new affinity-based job placement policy was included in this update: jobs submitted using affinity can be packed close together or spread across the cluster based on the resources requested. New Resource Maps syntax provided more granular control over application access to host devices such as NVIDIA GPUs, allowing jobs to request GPUs and ensuring that GPUs are exclusively assigned to a specific job. Univa Grid Engine was also enhanced to communicate directly with the NVIDIA Data Center GPU Manager (DCGM) to collect GPU metrics for scheduling and accounting.
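Resource Maps (the RSMAP complex type) let an administrator enumerate individual device instances on a host, so the scheduler assigns specific devices rather than bare counts. A minimal sketch, assuming a host node01 exposing two GPUs through an RSMAP complex named gpu (both names are illustrative):

  # In the execution host configuration (edited via "qconf -me node01"),
  # declare two concrete GPU instances:
  #   complex_values   gpu=2(gpu0 gpu1)
  #
  # A job then requests one GPU; the scheduler picks a free instance
  # (e.g. gpu0) and assigns it exclusively to that job:
  qsub -l gpu=1 -b y -cwd ./gpu_job.sh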
Univa Grid Engine 8.6.5 was released on May 6, 2019. Key new features were:
- Support for IBM Power 9 on Linux
- Improvements to Docker support on Univa Grid Engine
- Hostname management for very large clusters, where the IP address of each host is encoded in the hostname
- Integration with the Linux out-of-memory notification API, ensuring that Univa Grid Engine is automatically notified of jobs terminated by the Linux kernel
- Improved responsiveness of Grid Engine administrator commands in heavily loaded clusters
- Performance improvements to Univa Grid Engine automatic job rescheduling
- Thread deadlock detection for Univa Grid Engine Qmaster
- Updated support for NVIDIA DCGM versions up to 1.6.3
- Ability to specify GPU/CPU affinity as hard or soft requests (see the sketch following this list)
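The last item builds on qsub's long-standing hard/soft request semantics: resource requests given after -hard must be satisfied before a job can start, while requests given after -soft are honored on a best-effort basis. A brief sketch, reusing the illustrative gpu resource from the Resource Maps example above:

  # Hard request: the job is not dispatched until a GPU instance is free.
  qsub -hard -l gpu=1 -b y -cwd ./train.sh

  # Soft request: the scheduler prefers a host with a free GPU but will
  # still run the job without one if none is available.
  qsub -soft -l gpu=1 -b y -cwd ./train.sh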
Univa Grid Engine 8.6.8 was released on December 12, 2019, providing new parameters to fine-tune the scheduling of wildcard requests, support for Linux mount namespaces, and GPU usage reporting.
Univa Grid Engine 8.6.9 was released on February 10, 2020, providing enhancements to the qconf command and improved information and messaging collection.
Univa Grid Engine 8.6.11 was released on March 17, 2020, delivering improved Docker compatibility, improved job reporting and monitoring, and support for the latest version of NVIDIA DCGM.