Note: This technote is not yet published.
Notes and recommendations of an investigation into third-party tools for consolidating system deployment and management across LSST enclaves and physical sites
1 Hosting, Managing, and Consuming Yum Repos: Pakrat, Katello, and More¶
Managed Yum repositories are important for reproducibility and control.
- The specific nature of the LSST project does not always allow us to rebuild nodes (e.g., from xCAT) in order to update them, so we must be able to apply Yum updates from a controlled source.
- We need to be able to (re)build and patch each node up to a state that is consistent with other nodes, so locking repos into "snapshots" is important.
- We may need to be able to roll back to a previous set of patches in order to recover from an issue, so retaining previous repo "snapshots" is important.
- We need to be able to "branch" our repos so that dev and test machines see newer "snapshots" while production machines see slightly older "snapshots", so having multiple active "snapshots" is important.
1.1 Solutions for Managing and Hosting Yum Repos¶
1.1.1 Pakrat + createrepo + web server¶
1.1.1.1 Overview¶
Pakrat <https://github.com/ryanuber/pakrat> is a Python-based tool for mirroring and versioning Yum repositories. In our investigation/setup we run it with a wrapper script via cron (originally weekly; we have since moved to daily). Each time the wrapper script runs, Pakrat syncs several repos and then uses createrepo to rebuild the Yum metadata for each repo. Apache is used to serve out the repos.
Each repo synced by Pakrat consists of:
- a top-level ‘Packages’ directory - stores RPMs
- sub-folders for each versioned snapshot - looks like a Yum repo and contains metadata for a given point in time
- a ‘latest’ symlink pointing to the most recent snapshot (we’re not using this symlink)
Each repo snapshot consists of a symlink to the top-level 'Packages' directory and a unique 'repodata' metadata sub-folder. The 'repodata' is created immediately after syncing on a given date and only refers to RPMs that were available in 'Packages' at that time. As long as the 'repodata' in a given snapshot folder is not recreated, machines using that snapshot will not see additional RPMs added to 'Packages' in the intervening time.
Here’s a diagram of the structure of a repo:
- repobase/
  - 2017-07-10/
    - Packages -> ../Packages
    - repodata/
  - 2017-07-17/
    - Packages -> ../Packages
    - repodata/
  - ...
  - 2017-07-31/
    - Packages -> ../Packages
    - repodata/
  - latest -> 2017-07-31
  - Packages/
In our investigation/setup we point Yum clients to a specific snapshot so that they are in a consistent, repeatable state. We have the ability to point test clients to a newer snapshot. We have Pakrat set to sync monthly.
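For illustration, a client pinned to a specific snapshot might carry a repo definition like the following minimal sketch (the server hostname is hypothetical; the path mirrors the snapshot layout shown above):

    # Hypothetical client-side repo definition pinned to the 2017-07-10 snapshot.
    # The server name is an assumption; the path mirrors the Pakrat layout above.
    cat > /etc/yum.repos.d/centos-updates-managed.repo <<'EOF'
    [centos-updates-managed]
    name=CentOS 7 updates (managed snapshot 2017-07-10)
    baseurl=http://yum-server.example.org/repos/centos/7/x86_64/updates/2017-07-10/
    enabled=1
    gpgcheck=1
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
    EOF

    # Clear cached metadata so the client picks up the pinned snapshot.
    yum clean metadata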
1.1.1.2 Storage baselining (~7/3/2017 - 9/20/2017; includes CentOS 7.3 & 7.4)¶
repo | raw size, single day | synced daily (~117 days) | comparative raw size (raw X 117 days) | synced weekly (17 weeks) | comparative raw size (raw X 17 weeks) | synced monthly (4 months) | comparative raw size (raw X 4 months) |
---|---|---|---|---|---|---|---|
CentOS: base | 7.4G | 14G | ~865.8G | 12G | ~125.8G | 7.4G | ~29.6G |
CentOS: centosplus | 979M | 2.2G | ~111.9G | 1.6G | ~16.3G | 1.1G | ~3.8G |
CentOS: extras | 1.4G | 1.7G | ~163.8G | 1.6G | ~23.8G | 1.4G | ~5.6G |
CentOS: updates | 6.5G | 11G | ~760.5G | 9.4G | ~110.5G | 7.2G | ~26G |
EPEL: epel | 13G | 19G | ~1.49T | 17G | ~221G | 16G | ~52G |
Puppet Labs: puppetlabs-pc1 | 2.3G | 2.7G | ~269.1G | 2.5G | ~39.1G | 2.4G | ~9.2G |
TOTAL | 31G | 50G | 3.54 - 3.61T | 43G | 527 - 536.5G | 36G | 124 - 126.2G |
1.1.1.3 Storage baselining (~7/3/2017 - 7/8/2017)¶
repo | raw size, single day | synced daily (~35 days) | comparative raw size (raw X 35 days) | synced weekly (6 weeks) | comparative raw size (raw X 6 weeks) | synced monthly (2 months) | comparative raw size (raw X 2 months) |
---|---|---|---|---|---|---|---|
CentOS: base | 7.4G | 8.3G | ~269G | 7.5G | ~44.4G | 7.4G | ~14.8G |
CentOS: centosplus | 979M | 1.4G | ~33.5G | 1.1G | ~5.7G | 1.1G | ~1.9G |
CentOS: extras | 1.4G | 1.5G | ~49G | 1.4G | ~8.4G | 1.4G | ~2.8G |
CentOS: updates | 6.5G | 7.9G | ~227.5G | 7.3G | ~39G | 7.2G | ~13G |
EPEL: epel | 13G | 15G | ~455G | 14G | ~78G | 14G | ~26G |
Puppet Labs: puppetlabs-pc1 | 2.3G | 2.4G | ~80.5G | 2.3G | ~13.8G | 2.3G | ~4.6G |
TOTAL | 31G | 36G | 1,085 - 1,114.5G | 34G | 186 - 189.3G | 33G | 62 - 63.1G |
1.1.1.4 Puppet Implementation¶
- modules:
  - 'apache', from Puppet Forge
  - 'apache_config', includes default config, firewall, and vhost
  - 'pakrat', includes base installation, wrapper, cron, and storage config
- profiles:
  - 'pakrat', includes the pakrat module
  - 'yum_server', includes elements of apache_config
- roles:
  - 'pakrat_yum_server', uses profile::pakrat and profile::yum_server
1.1.1.5 Daily Ops¶
- Note: This should be fleshed out a little more in the near term, as necessary. If we elect to stick with Pakrat long-term, we can expand it further.
- When/how to run the Pakrat repo sync?
  - The Pakrat repo sync wrapper script is installed at /root/cron/pakrat.sh.
  - It depends on a pakrat.config file in the same directory.
  - The wrapper script is run daily by cron at 4:25pm.
  - The wrapper script can also be run manually.
  - Resiliency/details (see the wrapper sketch at the end of this section):
    - Repo paths end with a Unix epoch timestamp, so there should be no problem with running the script more than once per day.
    - The wrapper script will exit if it detects that it is already running (just in case there are issues with Pakrat/Yum under the hood that would make simultaneous runs problematic).
- How to add additional repos for Pakrat to sync?
  - Recommended procedure:
    - Establish the client configuration for the repository on the Pakrat-Yum server.
    - XXXXXXXXX
    - NOTE: If/when we start dealing more with GPG keys we will need to update this procedure slightly. See also LSST-1031 <https://jira.ncsa.illinois.edu/browse/LSST-1031>.
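For reference, a minimal sketch of what the cron wrapper's core logic can look like. This is illustrative only, not the actual /root/cron/pakrat.sh; the config variable names and the Pakrat flags are assumptions based on the upstream README and should be verified against the installed version:

    #!/bin/bash
    # Illustrative sketch only, not the actual /root/cron/pakrat.sh.
    # REPO_NAME, REPO_BASEURL, and OUTDIR are assumed to be defined in pakrat.config.
    set -euo pipefail
    source /root/cron/pakrat.config

    # Refuse to run if another sync is already in progress.
    exec 9>/var/run/pakrat-sync.lock
    flock -n 9 || { echo "Pakrat sync already running; exiting." >&2; exit 1; }

    # A Unix epoch timestamp keeps each snapshot path unique, even for multiple runs per day.
    SNAPSHOT="$(date +%s)"

    # Flags follow the upstream Pakrat README; verify against the installed version.
    pakrat --name "${REPO_NAME}" \
           --baseurl "${REPO_BASEURL}" \
           --outdir "${OUTDIR}" \
           --repoversion "${SNAPSHOT}"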
1.1.1.6 Improvements - High Priority¶
GPFS
- overall:
  - size: Dan suggests ~50TB, but look at the baselining data from object-data06
    - synced daily for ~117 days leads to 50G of storage
  - location: Andy says just inside the GPFS root for now; mkdir -p pakrat/production (just in case)
  - refactor the Puppet code (apache_config) and Pakrat scripts to look for this location
  - implement GPFS code in Puppet to make sure it is mounted
- add error checking to the Pakrat script to handle the case where GPFS is not available
- after further consideration, it is probably best to back up to GPFS but still store on local disk (what happens if GPFS is broken and our goal is to push out a patch...?)
create a more verbose timestamp via the wrapper so that we can run Pakrat multiple times a day if necessary
- ran it twice in one day once (into the same snapshot) and encountered the errors described below for the elasticsearch-1.7 and influxdb repos
- initially thought the errors were related to running Pakrat twice into the same output repo path, but they persist on the regular weekly runs and after adding the Unix epoch timestamp to the repo paths
fix the following issue: packages with unexpected filenames do not appear in local Pakrat-generated metadata
- the particular metadata issue we are concerned about is as follows and (so far) only affects the elasticsearch-1.7 and influxdb repos:
  - it results in errors in the Pakrat output such as this:
    - Cannot read file: /repos/centos/7/x86_64/influxdb/2017-08-14/Packages/chronograf-1.3.0-1.x86_64.rpm
  - these errors correspond to the following scenario, as listed in the *primary.xml metadata from the SOURCE repository:
    - the version/release info in the 'href' parameter of the 'location' key does not match the versions shown in:
      - rpm:sourcerpm (hard to imagine this is relevant)
      - rpm:provides -> rpm:entry (e.g., rel=)
    - more specifically, the RPM filename does NOT have a release segment in it
      - e.g., 'elasticsearch-1.7.0.noarch.rpm' is the RPM and it does not have a release in its name (e.g., *1.7.0-1.noarch.rpm), but the SOURCE metadata indicates it is release 1:
        - <rpm:sourcerpm>elasticsearch-1.7.0-1.src.rpm</rpm:sourcerpm><rpm:header-range start="880" end="19168"/><rpm:provides><rpm:entry name="elasticsearch" flags="EQ" epoch="0" ver="1.7.0" rel="1"/><rpm:entry name="config(elasticsearch)" flags="EQ" epoch="0" ver="1.7.0" rel="1"/></rpm:provides>
  - Pakrat downloads the RPMs but does not include them in its local metadata (e.g., the only elasticsearch RPM that appears in Pakrat's metadata is 1.7.4-1, because that is the only RPM that has a properly formatted name, including the release)
    - thus they would be unknown to Yum clients going through Pakrat
possible fixes:
- work with the vendor to release properly named RPMs
- improve Pakrat to address this scenario (i.e., use the source metadata to fix its local metadata)
  - or is this an issue for the createrepo command?
- see if Katello has the same issue or not
- mv or cp (or make symlinks for) the badly named RPMs after Pakrat downloads them; this may ensure that Pakrat includes them in its metadata
  - could probably script this fix, i.e., when a Pakrat sync uncovers one of these errors, look for the RPM without a release in its name and copy it to the name that Pakrat is looking for so that the next run can include it in its metadata (perhaps even schedule another run of the repo at the end); see the sketch after this list
  - if we start cleaning out old "snapshots" and RPMs that are no longer used, then we may also have to build a workaround into that process
    - although it's possible that the worst that would happen is that, after a clean-out, several badly named RPMs are re-downloaded during the next Pakrat sync
    - using symlinks may help us here:
      - register the targets of all symlinks ahead of the cleanup
      - only remove a target if you are also going to remove the symlink
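A possible shape for that scripted fix is sketched below. It is untested; the log path is hypothetical and the filename-mangling rule is an assumption based on the errors described above:

    #!/bin/bash
    # Untested sketch: repair Pakrat metadata gaps caused by RPMs whose filenames
    # lack a release segment (e.g., chronograf-1.3.0.x86_64.rpm on disk while the
    # metadata expects chronograf-1.3.0-1.x86_64.rpm). Reads "Cannot read file:"
    # errors from a saved Pakrat log and symlinks the expected name to the
    # release-less file so the next sync can pick it up. Only handles simple
    # numeric releases; the log path and the mangling rule are assumptions.
    LOG="${1:?usage: $0 /path/to/pakrat-sync.log}"

    grep -o 'Cannot read file: .*\.rpm' "$LOG" | sed 's/^Cannot read file: //' |
    while read -r expected; do
        [ -e "$expected" ] && continue
        dir=$(dirname "$expected")
        base=$(basename "$expected")
        # Drop the "-<release>" segment just before the arch suffix.
        candidate="$dir/$(echo "$base" | sed -E 's/-[0-9]+(\.[A-Za-z0-9_]+\.rpm)$/\1/')"
        if [ -e "$candidate" ] && [ "$candidate" != "$expected" ]; then
            ln -s "$(basename "$candidate")" "$expected"
            echo "linked $expected -> $(basename "$candidate")"
        fi
    done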
find and implement additional repos (see the survey sketch after this list)
- search /etc/yum.repos.d using xdsh
- search for the following terms in Puppet:
  - yum
  - adm::puppetdb
  - base::puppet
  - rpm
  - package
  - tar
  - wget
  - curl
  - .com
  - .edu
  - git
- sync all repos in Pakrat
- redo Puppet implementation for Yum clients
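A rough sketch of the survey step referenced above, assuming xdsh access from the management node and a local checkout of the Puppet code (the 'all' node group, checkout path, and grep terms are illustrative):

    # Survey repo definitions currently present on the nodes
    # (xCAT xdsh fans the command out; the 'all' node group is an assumption).
    xdsh all 'ls /etc/yum.repos.d/' | sort

    # Look for repo/package handling scattered through the Puppet code
    # (checkout path is hypothetical; extend the pattern with the terms listed above).
    cd /path/to/puppet-control-repo
    grep -rniE 'yumrepo|baseurl|rpm|wget|curl|\.com|\.edu' .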
1.1.1.7 Improvements - Low Priority (e.g., only if we adopt Pakrat as a permanent solution)¶
- Apache:
  - move vhost stuff into Hiera
  - move firewall networks into Hiera
  - should I eliminate the apache_config module? move all Hiera references and 'apache' module references into the profile?
- Pakrat:
  - move config (.config file, cron stuff) into Hiera
  - is my approach for installing OK?
  - how to handle the dependency that fails to install initially?
  - improve verification/notification/fix when the Pakrat sync is broken
    - fix postfix for cron (this is a larger issue)
    - are we sure that cron scheduling via crontab (as opposed to file-based /etc/cron.d scheduling) will result in emails for any output? yes
  - how to know which RPM versions are included in each snapshot?
    - look at *-primary.xml.gz / *-other.xml.gz; zcat piped to some XML parser? (see the sketch after this list)
  - document troubleshooting/monitoring for Pakrat
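One rough way to inspect a snapshot's contents from the shell, listing the package files recorded in its primary metadata (the snapshot path is illustrative):

    # List the package files recorded in a snapshot's primary metadata.
    zcat /repos/centos/7/x86_64/updates/2017-07-10/repodata/*-primary.xml.gz \
      | grep -o '<location href="[^"]*"' \
      | sed -e 's/<location href="//' -e 's/"$//' \
      | sort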
1.1.2 Katello¶
1.1.2.1 Overview¶
Katello <https://theforeman.org/plugins/katello/> is a plug-in for Foreman that is used to manage content, specifically local Yum and Puppet repositories. Katello is an integrated control interface and UI for Pulp and Candlepin (Red Hat subscription management). These products are all components of the Red Hat Satellite platform.
1.1.2.2 Decision to Not Use Katello (October 2017)¶
Areas where it possibly offers benefits or at least different features as compared to the alternative (Puppet w/ Git and Pakrat, then Foreman or xCAT):
- Integrated change control for Yum and Puppet.
- Ability to schedule releases of content.
- GUI for managing Yum repo syncing and management.
- Flexibility in managing which RPMs are offered in Yum repos.
- Ability to discard old Yum RPMs.
- Manages RHEL subscriptions.
- Handles syncing from the Foreman/Katello 'master' to Katello 'capsules' (a Foreman Smart Proxy with Katello content services).
Reasons we have elected not to investigate Katello further at this time:
- Install and design seems overly complicated.
- You must install Katello before installing Foreman, then run the foreman-installer with a special flag in order to install Foreman for use with Katello (link <https://theforeman.org/plugins/katello/nightly/installation/index.html>).
- Creates the need to consult both Katello’s documentation and Foreman’s documentation for some considerations.
- The above features don't seem to offer anything critical that we need and which we haven't already solved with Pakrat and our current Puppet/Git change control process.
- We already have integrated change control, via Git, for Yum and Puppet. In fact, it’s not clear whether or not Katello’s state can be captured by Git.
- We don’t really need to schedule the release of content. Our focus is more likely to be on scheduling patching or allowing a NHC process to do rolling patching.
- A GUI is probably not necessary. Our Git/Puppet work is done in the CL already. We will likely investigate the Hammer CLI for Foreman as well.
- Flexibility in managing which RPMs are offered is a little tricky with Pakrat, although presumably we could set certain RPMs to the side and recreate/edit the metadata.
- We can generate a manual process for discarding old Yum RPMs from Pakrat, although it might not be worth it. Space is cheap.
- We do not currently use RHEL.
- We could set up a Yum-Pakrat ‘master’ and have each Smart Proxy/Yum-Pakrat slave sync from it.
In summary, it doesn’t appear that the benefits of Katello outweigh the extra complications it seems to present.
1.1.3 Other Considerations¶
If we ever decide that Pakrat seems lacking in some area we should consider Pulp <http://docs.pulpproject.org/> (which is used by Katello) and also survey the landscape to see if anything else is available besides Katello.
1.2 Yum Client Config and Puppet Best Practices¶
1.2.1 Overview¶
- All of our nodes must be configured to look at our managed Yum repos:
  - during or immediately after deployment (by xCAT, Foreman, etc.)
  - before any attempt by Puppet or other actors to go out and get an RPM by running Yum
- We need to implement other things in Puppet in such a way that they only use Yum to get RPMs.
- Anything that is not an RPM should either be built into an RPM and hosted locally, stashed in Git, or hosted and versioned in some other way.
- All needed Yum repos should be managed (ideally Puppet would disable or uninstall unmanaged repos).
1.2.2 Current Practice¶
- The EPEL repo hostname is configured by a resource from the 'epel' module from Puppet Forge, using Hiera
  - but where is the 'epel' module declared for each node? only in other modules that happen to cover all nodes?
- extra::yum was created to manage other repos (CentOS and Puppet Labs) using the 'file' resource
  - it also turns off delta RPMs
- profile::yum_client was created to utilize the extra::yum manifest
  - all roles reference this profile
- various other modules install repos using the 'yumrepo' resource type or by installing RPMs that install repos
1.2.3 Improvements - High Priority (these are needed whether we use Pakrat or Katello)¶
- Yum:
  - stop managing yumrepo files and use one or both of the following:
    - 'yum' module (3rd-party Yum module)
      - this might only be needed to manage aspects of Yum configuration beyond which repos are present, enabled, etc. (e.g., turn off delta RPMs, throw out old kernels)
    - 'yumrepo' resource type
  - put all repo URLs and other data in Hiera
  - manage all repos that are needed, pulling updates from Pakrat/Katello
  - will we need to install/manage GPG keys? which repos use them (EPEL does but this is handled)? how about Puppet Labs, etc.? how do we manage them?
    - GPG keys are often installed by the RPMs that also install the .repo files, no (e.g., ZFS <https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS>)?
    - key files are placed in /etc/pki/rpm-gpg (they could be hosted in/installed by Puppet) and then imported using a command like "rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux" (see the key-import sketch at the end of this section)
    - can the 'yumrepo' Puppet resource help with this? does the 'yum' Puppet module handle it better?
  - disable any unmanaged repos (or even uninstall the files for unmanaged repos? which is better/easier?)
    - the xCAT provisioning repos can be removed after deployment:
      - xCAT-centos7-path0
      - xcat-otherpkgs0
    - the following repos can be removed from adm01:
      - centosplus-source/7
      - dell-system-update_independent
      - gitlab_gitlab-ce-source
  - document daily procedures for pointing Yum clients at specific snapshots (this is *probably* needed for Katello as well, but possibly not)
  - consider explicitly including the epel module in profile::yum_client
- Other Puppet refactoring/updates:
  - anything that requires a pkg MUST also require the appropriate Yum resources / EPEL module, etc., so that any managed repo is configured first; update and document
- xCAT (or Foreman):
  - install basic Yum config (CentOS, Puppet Labs, EPEL at a minimum); kind of a belt-and-suspenders thing, just in case some Puppet thing would otherwise sneak in an external RPM
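A minimal sketch of the manual key-trust flow discussed above. The ZFS on Linux key name follows the example above; how the key file reaches the node (e.g., via a Puppet 'file' resource) is left open:

    # Place the vendor key in the conventional location (however it is delivered to the node).
    install -m 0644 ./RPM-GPG-KEY-zfsonlinux /etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux

    # Import it into the RPM database so repos with gpgcheck=1 can verify packages.
    rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux

    # Confirm which public keys the RPM database now trusts.
    rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{SUMMARY}\n'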
2 Foreman¶
Purpose and Background
ITS is already using this (for non-LSST resources) for Puppet ENC and reporting.
Security is using this for their machines (largely VMs).
Investigation on LSST Test Cluster
Foreman is being installed on lsst-test-adm01. More info:
- Foreman Feature Matrix and Evaluation
- Foreman on test cluster
Resources
project website: theforeman.org <https://theforeman.org/>
slideshare: Host Orchestration with Foreman, Puppet and Gitlab <https://www.slideshare.net/tullis/linux-host-orchestration-with-foreman-with-puppet-and-gitlab>
2.1 Foreman Feature Matrix and Evaluation¶
- Overview
- Feature Matrix
- Deployment
- BMC/firmware management
- Integration w/ Puppet
- Yum repo hosting/management
- Distributed architecture and scalability
- Reliability
- Interface / workflow / ease of use
- Documentation and support
- Summary Evaluation
- Addendum 1: Possible end states
- Addendum 2: Other considerations for making a decision
2.2 Overview¶
The purpose of this page is to help us enumerate the features of a Foreman-based solution vs. an xCAT-based solution to deployment and management of nodes. It may pay to consider a hybrid solution, namely a Foreman-based solution that also uses pieces of xCAT (or Confluent).
NOTE: We also need to indicate which of the listed features are requirements. Some may not be.
2.3 Feature Matrix¶
Priority key:
3) requirement - must have this or we cannot deliver for the project and/or common/critical admin tasks would be hopelessly inefficient
2) very helpful to have - not a requirement but would increase admin efficiency considerably around a common task, decrease risk, or harden security further
1) somewhat helpful to have - not a requirement but would increase admin efficiency in a minor fashion
0) not needed - not necessary and of little usefulness, to the point that it is not worth the time
?) unknown
Feature | Priority | xCAT-oriented | Foreman-oriented |
---|---|---|---|
Deployment | – | | |
DHCP for mgmt networks | 3 | Yes - tested | Yes - tested |
PXE & TFTP | 3 | Yes - tested both Dell and Lenovo | Preliminary yes |
Anaconda installs for CentOS: kickstart, partition, etc. | 3 | Yes - meeting our needs so far | Preliminary yes |
Support for other distros or OSes | ??? | Other NCSA clusters are using RHEL w/ xCAT. Should support others, including (apparently) Windows. | Should support others, including (apparently) Windows (via vSphere templates) |
Deploys ESXi on bare metal | 1 | Yes, appears to install ESXi on bare metal (xCAT wiki) | Yes, appears to install ESXi on bare metal (Foreman wiki) |
Local DNS for location-specific mgmt and svc networks | ??? | Yes, although we haven't been using it | Yes - tested |
Manage DNS hosted on external system (e.g., make local DNS authoritative or have mgmt system interact with external DNS via an API) | 1 | Probably not. | Possibly... but needs investigation |
Bare-metal deployment | 3 | Yes - tested | Yes - tested |
OS deployment to VMs | 2 | Yes, but not yet tested: https://sourceforge.net/p/xcat/wiki/XCAT_Virtualization_with_VMWare/ | Yes, but not yet tested: https://theforeman.org/manuals/1.15/#5.2.9VMwareNotes |
Provisioning of VMs within VMware | 1 | Yes, but not yet tested: https://sourceforge.net/p/xcat/wiki/XCAT_Virtualization_with_VMWare/ | Yes, but not yet tested: https://theforeman.org/manuals/1.15/#5.2.9VMwareNotes |
Provisioning of cloud resources (e.g., AWS EC2, GCE, etc.) | ??? | Not really; the xCAT documentation recommends using Chef to interact with these resources. | Some support (manual provisioning with image-based deployment of the OS). |
Diskless install / stateless nodes | ??? | Yes, in use on various NCSA clusters | Unsure... it seems possible (just PXE-boot from your desired boot image rather than an Anaconda-based install image) but there don't seem to be any specific how-tos or tutorials on this, and no sign that anyone asking has ever gotten detailed help with it |
Node discovery (w/o interacting with switches) | 2 | Yes, but haven't pursued enough to get it to work | Offers this feature (Discovery Plugin <https://theforeman.org/plugins/foreman_discovery/9.1/index.html>), but not tested |
Switch-based discovery (i.e., SNMP query of switches) | 1 | Yes | No? |
Configure Ethernet switch ports | 0.5 | Yes? | No? |
BMC/firmware management | – | Strong focus of xCAT. | Need to investigate what the BMC Smart Proxy offers us. Also investigate how we can use IBM/Lenovo Confluent (next generation of xCAT) with Foreman. |
Remote power | 3 | Yes - rpower | |
Remote console and console capture | 3 | Yes - xCAT's rcons and conserver | |
Manage BIOS settings out-of-band (ideally w/o reboot) and programmatically | 3 | Yes - Lenovo: xCAT's pasu, but sometimes requires a reboot. Yes - Dell: must use racadm, probably with a wrapper | |
Install firmware outside of OS | 3 | Lenovo: supported via xCAT Genesis boot + Lenovo onecli | |
Integration w/ Puppet | 2 | Not integrated... However, the main thing missing right now is better Puppet reporting, although in theory this is already available in NPCF via centralized logging and is being looked at via our monitoring stack. | High level of integration with Puppet; provides: |
Yum repo hosting/management | 3 | Pakrat: | Pakrat (or perhaps Pulp/Katello) |
Distributed architecture and scalability | – | Allows for distributed management via Service Nodes: https://xcat-docs.readthedocs.io/en/2.13.8/advanced/hierarchy/index.html | Allows for distributed management via Foreman Smart Proxies: https://theforeman.org/manuals/1.15/#1.Foreman1.15Manual (the Foreman Master controls deployments: DHCP, local DNS, TFTP) |
Central execution of remote deployments / central updating of node settings on remote deployment infrastructure (i.e., configure deployment settings on a master deployment server at NCSA to affect how a node deploys in Chile; handle things like DHCP, PXE, kickstart, etc.) | 1 | No, does not seem to support this out of the box (doesn't support remote infrastructure at all) | Yes, definitely handles updating node settings (stored in the Foreman Master) |
Central management of remote deployment infrastructure (across WAN) (i.e., how do we keep remote deployment servers up-to-date) | 2 | No, does not seem to support this directly | A little bit...? |
Initiate IPMI/firmware/hardware management commands on remote machines from a central location (e.g., set to PXE, reboot, install firmware, configure BMC, etc.) | 2 | No, does not support this out of the box | Maybe... |
Distributed Puppet architecture | 3 or 1 | An xCAT-based solution offers no assistance here, but it should all be possible. | A Foreman-based solution may make some of this easier: |
Distributed environments can operate during WAN cut | 3 | Yes, but investigate Puppet (esp. ENC and CA). | Yes, but investigate Puppet (esp. ENC and CA). |
PXE over WAN | 1 | No, xCAT does not seem to support PXE over WAN. | |
Local kickstart server or encryption of kickstart communication | 3 | Yes, each xCAT master would be local. | Yes, Foreman has a "Templates" Smart Proxy feature that supports distributed kickstart sources. |
Other security considerations (encryption of other command data across WAN; authentication/authorization; etc.) | 3 | | |
Scalability | 3 | Yes, an xCAT-based solution should be able to scale to meet our needs. | Yes, a Foreman-based solution should be able to scale to meet our needs. |
Reliability | – | Yes, seems solid overall as evidenced by previous use at NCSA, including LSST. | Probably... |
Ability to backup and restore | 3 | | |
High availability - is this necessary? | 3 or 1 | Possible roadmap: information on xCAT high availability <http://xcat-docs.readthedocs.io/en/stable/advanced/hamn/index.html> | Possible roadmap: HA case study <https://theforeman.org/2015/12/journey_to_high_availability.html> |
Interface / workflow / ease of use | – | | |
Reporting/central logging | 1 | Yes. Adequate logging including console logs. | Yes. Also includes a centralized reporting console for Puppet. |
Support for change control: Git integration, rollback, and auditing procedures | 3 or 2 | No Git integration by default, but we could easily customize. No built-in undo. Auditing may be less than desired since we tend to do everything as root in xCAT. | No Git integration by default; custom functionality may be harder to implement and enforce. No built-in undo. Has decent auditing of actions performed via the Foreman master (likely includes CLI), and may display the executing user effectively (esp. in the web UI; not sure about CLI, etc.) |
Overall ease of use / efficiency | 2 | | |
Specifically: ease of (re)deploying the OS on a node (incl. Puppet ENC, NICs, disk partitioning) | 2 | | |
Specifically: ease of configuring new hardware (i.e., modifying BIOS settings, other firmware, possibly "discovery" process) | 1 | | |
Command-line interface (and other scriptable APIs) | 3 | Extensive and fairly well developed CLI. | |
GUI admin console | 1 | No... | Yes. |
Granular permissions (levels of access, buckets of resources) | 3 or 1 | Not built in. | Yes, but need to evaluate further if this is important. |
Specifically: allow developers to reprovision specific groups of machines | 3 or 1 | Not built in. | Seems to be built in. |
Notifications | 1 | No, does not seem to be built in. | Yes, seems to be built in. |
Documentation and support | 2 | xCAT documentation is decent (both comprehensive and specific, although there seem to be quite a few new features that are not yet documented). The xcat-user list on SourceForge has been reasonably useful. A current vendor relationship with NETSource/Lenovo allows us somewhat privileged access to the xCAT team. NCSA is already using xCAT for Systems (Industry and ICC in addition to LSST) and has a few team members with extensive xCAT experience. | Foreman documentation is decent (it is a really big product and the documentation sometimes lacks specificity and/or concrete examples). The foreman-users Google group had about 2.5 times more messages than the xcat-user list in a representative time frame (the Google group is now defunct; use ...). Using Red Hat Satellite (Foreman + Katello & more) might get us support but would almost certainly require using RHEL and additional cost. NCSA is already using Foreman for ITS (basic UI/Puppet reports & ENC only, so far) and Security (more extensive use, including Katello). Security's person with the most experience recently left. |
2.4 Summary Evaluation¶
Both products, xCAT and Foreman, or a combination of the two would seem to meet our needs at a fundamental level. In any case we would be using the product(s) for IPMI functions, (possibly) bare-metal discovery / VMware provisioning, and PXE-boot OS installs, keeping the install-time configuration as minimal as possible and letting Puppet handle as much of the configuration as possible.
Foreman is a newer tool but seems to have broader functionality and appears to have a larger user community. It also appears to be a more complex tool, which could lead to greater management overhead.
Foreman also appears to offer better out-of-the-box support for a distributed architecture with centralized control and secure communication between the deployment servers. On the other hand, pursuing a more centralized point of control would likely push us more strongly towards high availability of the central/master resources, which could introduce even more complexity/management overhead.
The actual design and implementation of our solution, or future shifts in our design/implementation, may be influenced by a few outstanding questions about project requirements and architecture (e.g., will we need to support stateless nodes? will we need to manage DNS with our solution? will we need to offer role-based access to admins or the capability for non-admins to view/update configuration? will we need to support cloud resources?).
2.5 Addendum 1: Possible end states¶
(1) Use current NPCF model (xCAT for deployment and IPMI functions, Puppet for configuration management, Pakrat for Yum repo management, new monitoring stack, possibly Confluent for IPMI functions)
(2) Same + use Foreman for Puppet integration (ENC, reporting, certificates) alongside xCAT, etc.
- It may not be possible for xCAT and a Foreman Master to live on the same server. A Foreman Master includes a TFTP server by default, as does xCAT, and their settings according to /etc/xinetd.d/tftp seem to conflict. We could ask online to see if it is possible to install a Foreman Master without TFTP. Also see the Foreman Manual for customization of TFTP <https://www.theforeman.org/manuals/1.16/index.html#4.3.9TFTP>.
- If pursuing (2) it might make sense to have general admin/xCAT/IPMI/bastion functions on one node and Foreman/Puppet (CA, Master, ENC, reporting)/GitLab on another node.
- Our GitLab on lsst-adm01 uses PostgreSQL as does Foreman (by default). Handle Foreman + GitLab with care.
(3) Same + use Foreman for node deployment (DHCP/PXE, kickstart, possibly DNS) instead of xCAT (still use xCAT/Confluent for IPMI functions, Pakrat for Yum repo management).
(4) Same + use Foreman BMC Smart Proxy for IPMI functions (still use Pakrat for Yum repo management)
NOTE: (2), (3), and (4) also offer the possibility of using Katello for Git/Puppet branch management and/or Yum repo management.
- We could also look at using Katello components (esp. Pulp) directly w/ (1), (2), (3), or (4).
2.6 Addendum 2: Other considerations for making a decision¶
- We would save some time up front by going with (1) because we're basically already there with NPCF.
- There are quite a few improvements we should make, however.
- And we should rebuild our current xCAT/Puppet master/management node (lsst-adm01) at some point. Do we want to rebuild more-or-less as-is or rebuild with Foreman, whether (2), (3), or (4)?
- By sticking with (1), merging NCSA 3003 into a shared environment can be a stronger focus more immediately (and there are many benefits to getting this done sooner rather than later).
- Standing up another xCAT master for NCSA 3003 would take very little time and would offer a good opportunity for refining our backup/rebuild procedures for our xCAT master at NPCF.
- (2), (3), and (4) could be pursued later on (with more awareness of both project requirements and of Foreman) and also pursued incrementally, e.g., (1)->(2)->(3)->(4)->....