Introducing Fedora CoreOS

The Fedora CoreOS team is excited to announce the first preview release of Fedora CoreOS, a new Fedora edition built specifically for running containerized workloads securely and at scale. It’s the successor to both Fedora Atomic Host and CoreOS Container Linux. Fedora CoreOS combines the provisioning tools, automatic update model, and philosophy of Container Linux with the packaging technology, OCI support, and SELinux security of Atomic Host.

Read on for more details about this exciting new release.

Why Fedora CoreOS?

Containers allow workloads to be reproducibly deployed to production and automatically scaled to meet demand. The isolation provided by a container means that the host OS can be small. It only needs a Linux kernel, systemd, a container runtime, and a few additional services such as an SSH server.

While containers can be run on a full-sized server OS, an operating system built specifically for containers can provide functionality that a general purpose OS cannot. Since the required software is minimal and uniform, the entire OS can be deployed as a unit with little customization. And, since containers are deployed across multiple nodes for redundancy, the OS can update itself automatically and then reboot without interrupting workloads.

Fedora CoreOS is built to be the secure and reliable host for your compute clusters. It’s designed specifically for running containerized workloads without regular maintenance, automatically updating itself with the latest OS improvements, bug fixes, and security updates. It provisions itself with Ignition, runs containers with Podman and Moby, and updates itself atomically and automatically with rpm-ostree.
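
For example, once a machine is provisioned, running a containerized workload is a single Podman command. Here is a minimal illustration (the image reference below is an assumption, not something specified by the release):

$ sudo podman run --rm registry.fedoraproject.org/fedora:30 echo 'Hello from Fedora CoreOS'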

Provisioning immutable infrastructure

Whether you run in the cloud, virtualized, or on bare metal, a Fedora CoreOS machine always begins from the same place: a generic OS image. Then, during the first boot, Fedora CoreOS uses Ignition to provision the system. Ignition reads an Ignition config from cloud user data or a remote URL, and uses it to create disk partitions and file systems, users, files and systemd units.

To provision a machine:

  1. Write a Fedora CoreOS Config (FCC), a YAML document that specifies the desired configuration of a machine. FCCs support all Ignition functionality, and also provide additional syntax (“sugar”) that makes it easier to specify typical configuration changes.
  2. Use the Fedora CoreOS Config Transpiler to validate your FCC and convert it to an Ignition config.
  3. Launch a Fedora CoreOS machine and pass it the Ignition config. If the machine boots successfully, provisioning has completed without errors.
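
As a sketch of steps 1 and 2, here’s a minimal FCC that authorizes an SSH key for the core user, converted to an Ignition config with the transpiler. (The spec version and the exact fcct invocation are assumptions that may differ between releases; treat this as illustrative.)

cat << 'EOF' > example.fcc
variant: fcos
version: 1.0.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-rsa AAAA...your-public-key...
EOF
fcct --input example.fcc --output example.ign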

Fedora CoreOS is designed to be managed as immutable infrastructure. After a machine is provisioned, you should not modify /etc or otherwise reconfigure the machine. Instead, modify the FCC and use it to provision a replacement machine.

This is similar to how you’d manage a container: container images are not updated in place, but rebuilt from scratch and redeployed. This approach makes it easy to scale out when load increases. Simply use the same Ignition config to launch additional machines.

Automatic updates

By default, Fedora CoreOS automatically downloads new OS releases, atomically installs them, and reboots into them. Releases roll out gradually over time. We can even stop a rollout if we discover a problem in a new release. Upgrades between Fedora releases are treated as any other update, and are automatically applied without user intervention.

The Linux ecosystem evolves quickly, and software updates can bring undesired behavior changes. However, for automatic updates to be trustworthy, they cannot break existing machines. To avoid this, Fedora CoreOS takes a two-pronged approach. First, we automatically test each change to the OS. However, automatic testing can’t catch all regressions, so Fedora CoreOS also ships multiple independent release streams:

  • The testing stream is a regular snapshot of the current Fedora release, plus updates.
  • After a testing release has been available for two weeks, it is sent to the stable stream. Bugs discovered in testing will be fixed before a release is sent to stable.
  • The next stream is a regular snapshot of the upcoming Fedora release, allowing additional time for testing larger changes.

All three streams receive security updates and critical bugfixes, and are intended to be safe for production use. Most machines should run the stable stream, since that receives the most testing. However, users should run a few percent of their nodes on the next and testing streams, and report problems to the issue tracker. This helps ensure that bugs that only affect certain workloads or certain hardware are fixed before they reach stable.

Telemetry

To help direct our development efforts, Fedora CoreOS performs some telemetry by default. A service called fedora-coreos-pinger periodically collects non-identifying information about the machine, such as the OS version, cloud platform, and instance type, and reports it to servers controlled by the Fedora project.

No unique identifiers are reported or collected, and the data is only used in aggregate to answer questions about how Fedora CoreOS is being used. We prominently document that this collection is occurring and how to disable it. We also tell you how to help the project by reporting additional detail, including information that might identify the machine.

Current status of Fedora CoreOS

Fedora CoreOS is still under active development, and some planned functionality is not available in the first preview release:

  • Only the testing stream currently exists; the next and stable streams are not yet available.
  • Several cloud and virtualization platforms are not yet available. Only x86_64 is currently supported.
  • Booting a live Fedora CoreOS system via network (PXE) or CD is not yet supported.
  • We are actively discussing plans for closer integration with Kubernetes distributions, including OKD.
  • Fedora CoreOS Config Transpiler will gain more sugar over time.
  • Telemetry is not yet active.
  • Documentation is still under development.

While Fedora CoreOS is intended for production use, preview releases should not be used in production. Fedora CoreOS may change in incompatible ways during the preview period. There is no guarantee that a preview release will successfully update to a later preview release, or to a stable release.

The future

We expect the preview period to continue for about six months. At the end of the preview, we will declare Fedora CoreOS stable and encourage its use in production.

CoreOS Container Linux will be maintained until about six months after Fedora CoreOS is declared stable. We’ll announce the exact timing later this year. During the preview period, we’ll publish tools and documentation to help Container Linux users migrate to Fedora CoreOS.

Fedora Atomic Host will be maintained until the end of life of Fedora 29, expected in late November. Before then, Fedora Atomic Host users should migrate to Fedora CoreOS.

Getting involved in Fedora CoreOS

To try out the new release, head over to the download page to get OS images or cloud image IDs. Then use the quick start guide to get a machine running quickly. Finally, get involved! You can report bugs and missing features to the issue tracker. You can also discuss Fedora CoreOS in Fedora Discourse, the development mailing list, or in #fedora-coreos on Freenode.

Welcome to Fedora CoreOS, and let us know what you think!


Bond WiFi and Ethernet for easier networking mobility

Sometimes one network interface isn’t enough. Network bonding allows multiple network connections to act together with a single logical interface. You might do this because you want more bandwidth than a single connection can handle. Or maybe you want to switch back and forth between your wired and wireless networks without losing your network connection.

The latter applies to me. One of the benefits of working from home is that when the weather is nice, it’s enjoyable to work from a sunny deck instead of inside. But every time I did that, I lost my network connections. IRC, SSH, VPN — everything went away, at least for a moment while some clients reconnected. This article describes how I set up network bonding on my Fedora 30 laptop to seamlessly move between the wired connection on my laptop dock and a WiFi connection.

In Linux, interface bonding is handled by the bonding kernel module. Fedora does not ship with this enabled by default, but it is included in the kernel-core package. This means that enabling interface bonding is only a command away:

sudo modprobe bonding

Note that this will only have effect until you reboot. To permanently enable interface bonding, create a file called bonding.conf in the /etc/modules-load.d directory that contains only the word “bonding”.
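
For example, this one-liner creates that file with the required content:

echo bonding | sudo tee /etc/modules-load.d/bonding.conf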

Now that you have bonding enabled, it’s time to create the bonded interface. First, you must get the names of the interfaces you want to bond. To list the available interfaces, run:

sudo nmcli device status

You will see output that looks like this:

DEVICE          TYPE      STATE         CONNECTION         
enp12s0u1       ethernet  connected     Wired connection 1
tun0            tun       connected     tun0               
virbr0          bridge    connected     virbr0             
wlp2s0          wifi      disconnected  --      
p2p-dev-wlp2s0  wifi-p2p  disconnected  --      
enp0s31f6       ethernet  unavailable   --      
lo              loopback  unmanaged     --                 
virbr0-nic      tun       unmanaged     --       

In this case, there are two (wired) Ethernet interfaces available. enp12s0u1 is on a laptop docking station, and you can tell that it’s connected from the STATE column. The other, enp0s31f6, is the built-in port in the laptop. There is also a WiFi connection called wlp2s0. enp12s0u1 and wlp2s0 are the two interfaces we’re interested in here. (Note that it’s not necessary for this exercise to understand how network devices are named, but if you’re interested you can see the systemd.net-naming-scheme man page.)

The first step is to create the bonded interface:

sudo nmcli connection add type bond ifname bond0 con-name bond0

In this example, the bonded interface is named bond0. The “con-name bond0” sets the connection name to bond0; leaving this off would result in a connection named bond-bond0. You can also set the connection name to something more human-friendly, like “Docking station bond” or “Ben”.

The next step is to add the interfaces to the bonded interface:

sudo nmcli connection add type ethernet ifname enp12s0u1 master bond0 con-name bond-ethernet
sudo nmcli connection add type wifi ifname wlp2s0 master bond0 ssid Cotton con-name bond-wifi

As above, the connection name is specified to be more descriptive. Be sure to replace enp12s0u1 and wlp2s0 with the appropriate interface names on your system. For the WiFi interface, use your own network name (SSID) where I use “Cotton”. If your WiFi connection has a password (and of course it does!), you’ll need to add that to the configuration, too. The following assumes you’re using WPA2-PSK authentication:

sudo nmcli connection modify bond-wifi wifi-sec.key-mgmt wpa-psk
sudo nmcli connection edit bond-wifi

The second command will bring you into the interactive editor where you can enter your password without it being logged in your shell history. Enter the following, replacing password with your actual password:

set wifi-sec.psk password
save
quit

Now you’re ready to start your bonded interface and the secondary interfaces you created:

sudo nmcli connection up bond0
sudo nmcli connection up bond-ethernet
sudo nmcli connection up bond-wifi

You should now be able to disconnect your wired or wireless connections without losing your network connections.

A caveat: using other WiFi networks

This configuration works well when moving around on the specified WiFi network, but when away from this network, the SSID used in the bond is not available. Theoretically, one could add an interface to the bond for every WiFi connection used, but that doesn’t seem reasonable. Instead, you can disable the bonded interface:

sudo nmcli connection down bond0

When back on the defined WiFi network, simply start the bonded interface as above.

Fine-tuning your bond

By default, the bonded interface uses the “load balancing (round-robin)” mode. This spreads the load equally across the interfaces. But if you have a wired and a wireless connection, you may want to prefer the wired connection. The “active-backup” mode enables this. You can specify the mode and primary interface when you are creating the interface, or afterward using this command (the bonded interface should be down):

sudo nmcli connection modify bond0 +bond.options "mode=active-backup,primary=enp12s0u1"
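
When running in active-backup mode, you can check which interface is currently active by reading the status file exposed by the bonding kernel module (this is standard bonding behavior, not specific to this setup):

cat /proc/net/bonding/bond0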

The kernel documentation has much more information about bonding options.


Manage your shell environment

Some time ago, Fedora Magazine published an article introducing ZSH — an alternative to Fedora’s default shell, bash. This time, we’re going to look into customizing it to use it more effectively. All of the concepts shown in this article also work in other shells such as bash.

Alias

Aliases are shortcuts for commands. This is useful for creating short commands for actions that are performed often, but require a long command that would take too much time to type. The syntax is:

$ alias yourAlias='complex command with arguments'

They don’t always need to be used for shortening long commands. What’s important is that you use them for tasks you perform often. An example could be:

$ alias dnfUpgrade='dnf -y upgrade'

That way, to do a system upgrade, I just type dnfUpgrade instead of the whole dnf command.

The problem with setting aliases directly in the console is that once the terminal session is closed, the alias is lost. To set them permanently, resource files are used.

Resource Files

Resource files (or rc files) are configuration files that are loaded per user at the beginning of a session or a process (when a new terminal window is opened, or a new program like vim is started). In the case of ZSH, the resource file is .zshrc, and for bash it’s .bashrc.

To make the aliases permanent, put them in your resource file. You can edit your resource file with a text editor of your choice. This example uses vim:

$ vim $HOME/.zshrc

Or for bash:

$ vim $HOME/.bashrc

Note that the location of the resource file is specified relative to the home directory — that’s where ZSH (or bash) looks for the file by default for each user.

Another option is to put your configuration in any other file, and then source it:

$ source /path/to/your/rc/file

Again, sourcing it in your session only applies it to that session, so to make it permanent, add the source command to your resource file. The advantage of keeping your configuration in a separate file is that you can source it at any time, or anywhere, which is especially useful in shared environments.
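
For example, to load a shared configuration file on every new session, append a source line to your resource file (the file name here is only an illustration):

$ echo 'source $HOME/.shared_aliases' >> $HOME/.zshrc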

Environment Variables

Environment variables are values assigned to a specific name which can then be used in scripts and commands. They are referenced with a dollar sign ($). One of the most common is $HOME, which references your home directory.

As the name suggests, environment variables are a part of your environment. Set a variable using the following syntax:

$ http_proxy="http://your.proxy"

And to make it an environment variable, export it with the following command:

$ export http_proxy

To see all the environment variables that are currently set, use the env command:

$ env

The command outputs all the variables available in your session. To demonstrate how to use them in a command, try running the following echo commands:

$ echo $PWD
/home/fedora
$ echo $USER
fedora

What happens here is variable expansion — the value stored in the variable is used in your command.

Another useful variable is $PATH, which defines the directories your shell searches for binaries.

The $PATH variable

There are many directories, or folders (as they are called in graphical environments), that are important to the OS. Some directories are set to hold binaries you can use directly in your shell. These directories are defined in the $PATH variable.

$ echo $PATH
/usr/lib64/qt-3.3/bin:/usr/share/Modules/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/usr/libexec/sdcc:/usr/libexec/sdcc:/usr/bin:/bin:/sbin:/usr/sbin:/opt/FortiClient

This will help you when you want to have your own binaries (or scripts) accessible in the shell.
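
For example, a common convention is to keep personal scripts in a bin directory under your home directory and prepend it to $PATH from your resource file (the directory name is a convention, not a requirement):

$ echo 'export PATH=$HOME/bin:$PATH' >> $HOME/.zshrc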


Securing telnet connections with stunnel

Telnet is a client-server protocol that connects to a remote server through TCP over port 23. Telnet does not encrypt data, so it is considered insecure: passwords can easily be sniffed because data is sent in the clear. However, there are still legacy systems that need to use it. This is where stunnel comes to the rescue.

Stunnel is designed to add SSL encryption to programs that have insecure connection protocols. This article shows you how to use it, with telnet as an example.

Server Installation

Install stunnel along with the telnet server and client using sudo:

sudo dnf -y install stunnel telnet-server telnet

Add a firewall rule, entering your password when prompted:

firewall-cmd --add-service=telnet --perm
firewall-cmd --reload

Next, generate an RSA private key and an SSL certificate:

openssl genrsa 2048 > stunnel.key
openssl req -new -key stunnel.key -x509 -days 90 -out stunnel.crt

You will be prompted for the following information one line at a time. When asked for Common Name you must enter the correct host name or IP address, but everything else you can skip through by hitting the Enter key.

You are about to be asked to enter information that will be
incorporated into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:
State or Province Name (full name) []:
Locality Name (eg, city) [Default City]:
Organization Name (eg, company) [Default Company Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (eg, your name or your server's hostname) []:
Email Address []:

Merge the RSA key and SSL certificate into a single .pem file, and copy that to the SSL certificate directory:

cat stunnel.crt stunnel.key > stunnel.pem
sudo cp stunnel.pem /etc/pki/tls/certs/

Now it’s time to define the service and the ports to use for encrypting your connection. Choose a port that is not already in use. This example uses port 450 for tunneling telnet. Edit or create the /etc/stunnel/telnet.conf file:

cert = /etc/pki/tls/certs/stunnel.pem
sslVersion = TLSv1
chroot = /var/run/stunnel
setuid = nobody
setgid = nobody
pid = /stunnel.pid
socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
[telnet]
accept = 450
connect = 23

The accept option is the port the server will listen to for incoming telnet requests. The connect option is the internal port the telnet server listens to.

Next, make a copy of the systemd unit file that allows you to override the packaged version:

sudo cp /usr/lib/systemd/system/stunnel.service /etc/systemd/system

Edit the /etc/systemd/system/stunnel.service file to add two lines. These lines create a chroot jail for the service when it starts.

[Unit]
Description=TLS tunnel for network daemons
After=syslog.target network.target

[Service]
ExecStart=/usr/bin/stunnel
Type=forking
PrivateTmp=true
ExecStartPre=-/usr/bin/mkdir /var/run/stunnel
ExecStartPre=/usr/bin/chown -R nobody:nobody /var/run/stunnel

[Install]
WantedBy=multi-user.target

Next, configure SELinux to listen to telnet on the new port you just specified:

sudo semanage port -a -t telnetd_port_t -p tcp 450

Finally, add a new firewall rule:

firewall-cmd --add-port=450/tcp --perm
firewall-cmd --reload

Now you can enable and start telnet and stunnel.

systemctl enable telnet.socket stunnel@telnet.service --now

A note on the systemctl command is in order. Systemd and the stunnel package provide an additional template unit file by default. The template lets you drop multiple configuration files for stunnel into /etc/stunnel, and use the filename to start the service. For instance, if you had a foobar.conf file, you could start that instance of stunnel with systemctl start stunnel@foobar.service, without having to write any unit files yourself.

If you want, you can set this stunnel template service to start on boot:

systemctl enable stunnel@telnet.service
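
To confirm the tunnel is listening on the port you chose, you can run a quick sanity check with ss (not part of the original setup, just a verification step):

sudo ss -tlnp | grep 450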

Client Installation

This part of the article assumes you are logged in as a normal user (with sudo privileges) on the client system. Install stunnel and the telnet client:

sudo dnf -y install stunnel telnet

Copy the stunnel.pem file from the remote server to your client /etc/pki/tls/certs directory. In this example, the IP address of the remote telnet server is 192.168.1.143.

sudo scp myuser@192.168.1.143:/etc/pki/tls/certs/stunnel.pem \
/etc/pki/tls/certs/

Create the /etc/stunnel/telnet.conf file:

cert = /etc/pki/tls/certs/stunnel.pem
client=yes
[telnet]
accept=450
connect=192.168.1.143:450

The accept option is the port that will be used for telnet sessions. The connect option is the IP address of your remote server and the port it’s listening on.

Next, enable and start stunnel:

systemctl enable stunnel@telnet.service --now

Test your connection. Since you have a connection established, you will telnet to localhost instead of the hostname or IP address of the remote telnet server:

[user@client ~]$ telnet localhost 450
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Kernel 5.0.9-301.fc30.x86_64 on an x86_64 (0)
server login: myuser
Password: XXXXXXX
Last login: Sun May  5 14:28:22 from localhost
[myuser@server ~]$

Manage business documents with OpenAS2 on Fedora

Business documents often require special handling. Enter Electronic Document Interchange, or EDI. EDI is more than simply transferring files using email or http (or ftp), because these are documents like orders and invoices. When you send an invoice, you want to be sure that:

1. It goes to the right destination, and is not intercepted by competitors.
2. Your invoice cannot be forged by a 3rd party.
3. Your customer can’t claim in court that they never got the invoice.

The first two goals can be accomplished by HTTPS or email with S/MIME, and in some situations, a simple HTTPS POST to a web API is sufficient. What EDI adds is the last part.

This article does not cover the messy topic of formats for the files exchanged. Even when using a standardized format like ANSI or EDIFACT, it is ultimately up to the business partners. It is not uncommon for business partners to use an ad-hoc CSV file format. This article shows you how to configure Fedora to send and receive in an EDI setup.

Centralized EDI

The traditional solution is to use a Value Added Network, or VAN. The VAN is a central hub that transfers documents between its customers. Most importantly, it keeps a secure record of the documents exchanged that can be used as evidence in disputes. The VAN can use different transfer protocols for each of its customers.

AS Protocols and MDN

The AS protocols are a specification for adding a digital signature with optional encryption to an electronic document. What it adds over HTTPS or S/MIME is the Message Disposition Notification, or MDN. The MDN is a signed and dated response that says, in essence, “We got your invoice.” It uses a secure hash to identify the specific document received. This addresses point #3 without involving a third party.

The AS2 protocol uses HTTP or HTTPS for transport. Other AS protocols target FTP and SMTP. AS2 is used by companies big and small to avoid depending on (and paying) a VAN.

OpenAS2

OpenAS2 is an open source Java implementation of the AS2 protocol. It has been available in Fedora since Fedora 28, and is installed with:

$ sudo dnf install openas2
$ cd /etc/openas2

Configuration is done with a text editor, and the config files are in XML. The first order of business before starting OpenAS2 is to change the factory passwords.

Edit /etc/openas2/config.xml and search for ChangeMe. Change those passwords. The default password on the certificate store is testas2, but that doesn’t matter much as anyone who can read the certificate store can read config.xml and get the password.
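
A quick way to locate every remaining factory placeholder (a convenience command, not from the OpenAS2 documentation):

$ sudo grep -n ChangeMe /etc/openas2/config.xml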

What to share with AS2 partners

There are 3 things you will exchange with an AS2 peer.

AS2 ID

Don’t bother looking up the official AS2 standard for legal AS2 IDs. While OpenAS2 implements the standard, your partners will likely be using a proprietary product which doesn’t. While AS2 allows much longer IDs, many implementations break with more than 16 characters. Using otherwise legal AS2 ID chars like ‘:’ that can appear as path separators on a proprietary OS is also a problem. Restrict your AS2 ID to upper and lower case alpha, digits, and ‘_’ with no more than 16 characters.

SSL certificate

For real use, you will want to generate a certificate with SHA256 and RSA. OpenAS2 ships with two factory certs to play with. Don’t use these for anything real, obviously. The certificate file is in PKCS12 format. Java ships with keytool which can maintain your PKCS12 “keystore,” as Java calls it. This article skips using openssl to generate keys and certificates. Simply note that sudo keytool -list -keystore as2_certs.p12 will list the two factory practice certs.

AS2 URL

This is an HTTP URL that will access your OpenAS2 instance. HTTPS is also supported, but is redundant. To use it you have to uncomment the https module configuration in config.xml, and supply a certificate signed by a public CA. This requires another article and is entirely unnecessary here.

By default, OpenAS2 listens on 10080 for HTTP and 10443 for HTTPS. OpenAS2 can talk to itself, so it ships with two partnerships using http://localhost:10080 as the AS2 URL. If you don’t find this a convincing demo, and can install a second instance (on a VM, for instance), you can use private IPs for the AS2 URLs. Or install Cjdns to get IPv6 mesh addresses that can be used anywhere, resulting in AS2 URLs like http://[fcbf:fc54:e597:7354:8250:2b2e:95e6:d6ba]:10080.

Most businesses will also want a list of IPs to add to their firewall. This is actually bad practice. An AS2 server has the same security risk as a web server, meaning you should isolate it in a VM or container. Also, the difficulty of keeping mutual lists of IPs up to date grows with the list of partners. The AS2 server rejects requests not signed by a configured partner.

OpenAS2 Partners

With that in mind, open partnerships.xml in your editor. At the top is a list of “partners.” Each partner has a name (referenced by the partnerships below as “sender” or “receiver”), AS2 ID, certificate, and email. You need a partner definition for yourself and those you exchange documents with. You can define multiple partners for yourself. OpenAS2 ships with two partners, OpenAS2A and OpenAS2B, which you’ll use to send a test document.

OpenAS2 Partnerships

Next is a list of “partnerships,” one for each direction. Each partnership configuration includes the sender, receiver, and the AS2 URL used to send the documents. By default, partnerships use synchronous MDN. The MDN is returned on the same HTTP transaction. You could uncomment the as2_receipt_option for asynchronous MDN, which is sent some time later. Use synchronous MDN whenever possible, as tracking pending MDNs adds complexity to your application.

The other partnership options select encryption, signature hash, and other protocol options. A fully implemented AS2 receiver can handle any combination of options, but AS2 partners may have incomplete implementations or policy requirements. For example, DES3 is a comparatively weak encryption algorithm, and may not be acceptable. It is the default because it is almost universally implemented.

If you went to the trouble to set up a second physical or virtual machine for this test, designate one as OpenAS2A and the other as OpenAS2B. Modify the as2_url on the OpenAS2A-to-OpenAS2B partnership to use the IP (or hostname) of OpenAS2B, and vice versa for the OpenAS2B-to-OpenAS2A partnership. Unless they are using the FedoraWorkstation firewall profile, on both machines you’ll need:

# sudo firewall-cmd --zone=public --add-port=10080/tcp

Now start the openas2 service (on both machines if needed):

# sudo systemctl start openas2

Resetting the MDN password

The first start of the service initializes the MDN log database with the factory password, not the one you changed it to. This is a packaging bug to be fixed in the next release. To avoid frustration, here’s how to change the h2 database password:

$ sudo systemctl stop openas2
$ cat >h2passwd <<'DONE'
#!/bin/bash
AS2DIR="/var/lib/openas2"
java -cp "$AS2DIR"/lib/h2* org.h2.tools.Shell \
-url jdbc:h2:"$AS2DIR"/db/openas2 \
-user sa -password "$1" <<EOF
alter user sa set password '$2';
exit
EOF
DONE
$ sudo sh h2passwd ChangeMe yournewpasswordsetabove
$ sudo systemctl start openas2

Testing the setup

With that out of the way, let’s send a document. Assuming you are on the OpenAS2A machine:

$ cat >testdoc <<'DONE'
This is not a real EDI format, but is nevertheless a document.
DONE
$ sudo chown openas2 testdoc
$ sudo mv testdoc /var/spool/openas2/toOpenAS2B
$ sudo journalctl -f -u openas2
... log output of sending file, Control-C to stop following log
^C

OpenAS2 does not send a document until it is writable by the openas2 user or group. As a consequence, your actual business application will copy, or generate in place, the document. Then it changes the group or permissions to send it on its way, to avoid sending a partial document.
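
Here’s a minimal sketch of that pattern in shell (the file name is hypothetical): copy the document in with ownership that keeps OpenAS2 from picking it up, then hand it over by changing the owner:

$ sudo install -m 644 -o root -g root invoice.edi /var/spool/openas2/toOpenAS2B/
$ sudo chown openas2 /var/spool/openas2/toOpenAS2B/invoice.edi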

Now, on the OpenAS2B machine, /var/spool/openas2/OpenAS2A_OID-OpenAS2B_OID/inbox shows the message received. That should get you started!


Photo by Beatriz Pérez Moya on Unsplash.


Check storage performance with dd

This article includes some example commands to show you how to get a rough estimate of hard drive and RAID array performance using the dd command. Accurate measurements would have to take into account things like write amplification and system call overhead, which this guide does not. For a tool that might give more accurate results, you might want to consider using hdparm.

To factor out performance issues related to the file system, these examples show how to test the performance of your drives and arrays at the block level by reading and writing directly to/from their block devices. WARNING: The write tests will destroy any data on the block devices against which they are run. Do not run them against any device that contains data you want to keep!
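
As a precaution before any write test, you can confirm the target device is not mounted or otherwise in use (this check is not part of the tests themselves; $MY_DISK is set in the examples below):

# lsblk $MY_DISK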

Four tests

Below are four example dd commands that can be used to test the performance of a block device:

  1. One process reading from $MY_DISK:
    # dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache
  2. One process writing to $MY_DISK:
    # dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct
  3. Two processes reading concurrently from $MY_DISK:
    # (dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache &); (dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache skip=200 &)
  4. Two processes writing concurrently to $MY_DISK:
    # (dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct &); (dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct skip=200 &)

– The iflag=nocache and oflag=direct parameters are important when performing the read and write tests (respectively) because without them the dd command will sometimes show the resulting speed of transferring the data to/from RAM rather than the hard drive.

– The values for the bs and count parameters are somewhat arbitrary and what I have chosen should be large enough to provide a decent average in most cases for current hardware.

– The null and zero devices are used for the destination and source (respectively) in the read and write tests because they are fast enough that they will not be the limiting factor in the performance tests.

– The skip=200 parameter on the second dd command in the concurrent read and write tests is to ensure that the two copies of dd are operating on different areas of the hard drive.

16 examples

Below are demonstrations showing the results of running each of the above four tests against each of the following four block devices:

  1. MY_DISK=/dev/sda2 (used in examples 1-X)
  2. MY_DISK=/dev/sdb2 (used in examples 2-X)
  3. MY_DISK=/dev/md/stripped (used in examples 3-X)
  4. MY_DISK=/dev/md/mirrored (used in examples 4-X)

A video demonstration of these tests being run on a PC is provided at the end of this guide.

Begin by putting your computer into rescue mode to reduce the chances that disk I/O from background services might randomly affect your test results. WARNING: This will shutdown all non-essential programs and services. Be sure to save your work before running these commands. You will need to know your root password to get into rescue mode. The passwd command, when run as the root user, will prompt you to (re)set your root account password.

$ sudo -i
# passwd
# setenforce 0
# systemctl rescue

You might also want to temporarily disable logging to disk:

# sed -r -i.bak 's/^#?Storage=.*/Storage=none/' /etc/systemd/journald.conf
# systemctl restart systemd-journald.service

If you have a swap device, it can be temporarily disabled and used to perform the following tests:

# swapoff -a
# MY_DEVS=$(mdadm --detail /dev/md/swap | grep active | grep -o "/dev/sd.*")
# mdadm --stop /dev/md/swap
# mdadm --zero-superblock $MY_DEVS

Example 1-1 (reading from sda)

# MY_DISK=$(echo $MY_DEVS | cut -d ' ' -f 1)
# dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.7003 s, 123 MB/s

Example 1-2 (writing to sda)

# MY_DISK=$(echo $MY_DEVS | cut -d ' ' -f 1)
# dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.67117 s, 125 MB/s

Example 1-3 (reading concurrently from sda)

# MY_DISK=$(echo $MY_DEVS | cut -d ' ' -f 1)
# (dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache &); (dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache skip=200 &)
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 3.42875 s, 61.2 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 3.52614 s, 59.5 MB/s

Example 1-4 (writing concurrently to sda)

# MY_DISK=$(echo $MY_DEVS | cut -d ' ' -f 1)
# (dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct &); (dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct skip=200 &)
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 3.2435 s, 64.7 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 3.60872 s, 58.1 MB/s

Example 2-1 (reading from sdb)

# MY_DISK=$(echo $MY_DEVS | cut -d ' ' -f 2)
# dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.67285 s, 125 MB/s

Example 2-2 (writing to sdb)

# MY_DISK=$(echo $MY_DEVS | cut -d ' ' -f 2)
# dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.67198 s, 125 MB/s

Example 2-3 (reading concurrently from sdb)

# MY_DISK=$(echo $MY_DEVS | cut -d ' ' -f 2)
# (dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache &); (dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache skip=200 &)
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 3.52808 s, 59.4 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 3.57736 s, 58.6 MB/s

Example 2-4 (writing concurrently to sdb)

# MY_DISK=$(echo $MY_DEVS | cut -d ' ' -f 2)
# (dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct &); (dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct skip=200 &)
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 3.7841 s, 55.4 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 3.81475 s, 55.0 MB/s

Example 3-1 (reading from RAID0)

# mdadm --create /dev/md/stripped --homehost=any --metadata=1.0 --level=0 --raid-devices=2 $MY_DEVS
# MY_DISK=/dev/md/stripped
# dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.837419 s, 250 MB/s

Example 3-2 (writing to RAID0)

# MY_DISK=/dev/md/stripped
# dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.823648 s, 255 MB/s

Example 3-3 (reading concurrently from RAID0)

# MY_DISK=/dev/md/stripped
# (dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache &); (dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache skip=200 &)
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.31025 s, 160 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.80016 s, 116 MB/s

Example 3-4 (writing concurrently to RAID0)

# MY_DISK=/dev/md/stripped
# (dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct &); (dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct skip=200 &)
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.65026 s, 127 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.81323 s, 116 MB/s

Example 4-1 (reading from RAID1)

# mdadm --stop /dev/md/stripped
# mdadm --create /dev/md/mirrored --homehost=any --metadata=1.0 --level=1 --raid-devices=2 --assume-clean $MY_DEVS
# MY_DISK=/dev/md/mirrored
# dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.74963 s, 120 MB/s

Example 4-2 (writing to RAID1)

# MY_DISK=/dev/md/mirrored
# dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.74625 s, 120 MB/s

Example 4-3 (reading concurrently from RAID1)

# MY_DISK=/dev/md/mirrored
# (dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache &); (dd if=$MY_DISK of=/dev/null bs=1MiB count=200 iflag=nocache skip=200 &)
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.67171 s, 125 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.67685 s, 125 MB/s

Example 4-4 (writing concurrently to RAID1)

# MY_DISK=/dev/md/mirrored
# (dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct &); (dd if=/dev/zero of=$MY_DISK bs=1MiB count=200 oflag=direct skip=200 &)
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 4.09666 s, 51.2 MB/s
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 4.1067 s, 51.1 MB/s

Restore your swap device and journald configuration

# mdadm --stop /dev/md/stripped /dev/md/mirrored
# mdadm --create /dev/md/swap --homehost=any --metadata=1.0 --level=1 --raid-devices=2 $MY_DEVS
# mkswap /dev/md/swap
# swapon -a
# mv /etc/systemd/journald.conf.bak /etc/systemd/journald.conf
# systemctl restart systemd-journald.service
# reboot

Interpreting the results

Examples 1-1, 1-2, 2-1, and 2-2 show that each of my drives read and write at about 125 MB/s.

Examples 1-3, 1-4, 2-3, and 2-4 show that when two reads or two writes are done in parallel on the same drive, each process gets about half the drive’s bandwidth (60 MB/s).

The 3-x examples show the performance benefit of putting the two drives together in a RAID0 (data striping) array. The numbers, in all cases, show that the RAID0 array performs about twice as fast as either drive on its own. The trade-off is that you are twice as likely to lose everything, because each drive only contains half the data. A three-drive array would perform three times as fast as a single drive (all drives being equal), but it would be thrice as likely to suffer a catastrophic failure.

The 4-x examples show that the performance of the RAID1 (data mirroring) array is similar to that of a single disk, except for the case where multiple processes are reading concurrently (example 4-3). In that case, the performance of the RAID1 array is similar to that of the RAID0 array. This means you will see a performance benefit with RAID1, but only when processes read concurrently. For example, this can happen when a process accesses a large number of files in the background while you use a web browser or email client in the foreground. The main benefit of RAID1 is that your data is unlikely to be lost if a drive fails.

Video demo

Testing storage throughput using dd

Troubleshooting

If the above tests aren’t performing as you expect, you might have a bad or failing drive. Most modern hard drives have built-in Self-Monitoring, Analysis and Reporting Technology (SMART). If your drive supports it, the smartctl command can be used to query your hard drive for its internal statistics:

# smartctl --health /dev/sda
# smartctl --log=error /dev/sda
# smartctl -x /dev/sda

Another way that you might be able to tune your PC for better performance is by changing your I/O scheduler. Linux systems support several I/O schedulers and the current default for Fedora systems is the multiqueue variant of the deadline scheduler. The default performs very well overall and scales extremely well for large servers with many processors and large disk arrays. There are, however, a few more specialized schedulers that might perform better under certain conditions.

To view which I/O scheduler your drives are using, issue the following command:

$ for i in /sys/block/sd?/queue/scheduler; do echo "$i: $(<$i)"; done

You can change the scheduler for a drive by writing the name of the desired scheduler to the /sys/block/<device name>/queue/scheduler file:

# echo bfq > /sys/block/sda/queue/scheduler

You can make your changes permanent by creating a udev rule for your drive. The following example shows how to create a udev rule that will set all rotational drives to use the BFQ I/O scheduler:

# cat << END > /etc/udev/rules.d/60-ioscheduler-rotational.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
END

Here is another example that sets all solid-state drives to use the none I/O scheduler (the multiqueue equivalent of the classic noop scheduler):

# cat << END > /etc/udev/rules.d/60-ioscheduler-solid-state.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
END
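
Newly written udev rules take effect on the next device event. To apply them immediately rather than waiting for a reboot, you can reload the rules and replay events for block devices:

# udevadm control --reload
# udevadm trigger --subsystem-match=block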

Changing your I/O scheduler won’t affect the raw throughput of your devices, but it might make your PC seem more responsive by prioritizing the bandwidth for the foreground tasks over the background tasks or by eliminating unnecessary block reordering.


Photo by James Donovan on Unsplash.


Use udica to build SELinux policy for containers

While modern IT environments move towards Linux containers, the need to secure these environments is as relevant as ever. Containers are a process isolation technology. While containers can be a defense mechanism, they only excel when combined with SELinux.

Fedora SELinux engineering built a new standalone tool, udica, to generate SELinux policy profiles for containers by automatically inspecting them. This article focuses on why udica is needed in the container world, and how it makes SELinux and containers work better together. You’ll find examples of SELinux separation for containers that let you avoid turning protection off because the generic SELinux type container_t is too tight. With udica you can easily customize the policy with limited SELinux policy writing skills.

SELinux technology

SELinux is a security technology that brings proactive security to Linux systems. It’s a labeling system that assigns a label to all subjects (processes and users) and objects (files, directories, sockets, etc.). These labels are then used in a security policy that controls access throughout the system. It’s important to mention that what’s not allowed in an SELinux security policy is denied by default. The policy rules are enforced by the kernel. This security technology has been in use on Fedora for several years. A real example of such a rule is:

allow httpd_t httpd_log_t: file { append create getattr ioctl lock open read setattr };

The rule allows any process labeled as httpd_t to create, append, read and lock files labeled as httpd_log_t. Using the ps command, you can list all processes with their labels:

$ ps -efZ | grep httpd
system_u:system_r:httpd_t:s0 root 13911 1 0 Apr14 ? 00:05:14 /usr/sbin/httpd -DFOREGROUND
...

To see which objects are labeled as httpd_log_t, use semanage:

# semanage fcontext -l | grep httpd_log_t
/var/log/httpd(/.*)? all files system_u:object_r:httpd_log_t:s0
/var/log/nginx(/.*)? all files system_u:object_r:httpd_log_t:s0
...

The SELinux security policy for Fedora is shipped in the selinux-policy RPM package.

SELinux vs. containers

In Fedora, the container-selinux RPM package provides a generic SELinux policy for all containers started by engines like podman or docker. Its main purposes are to protect the host system against a container process, and to separate containers from each other. For instance, containers confined by SELinux with the process type container_t can only read/execute files in /usr and write to files labeled with the container_file_t type on the host file system. To prevent attacks by containers on each other, Multi-Category Security (MCS) is used.

Using only one generic policy for containers is problematic, because of the huge variety of container usage. On one hand, the default container type (container_t) is often too strict. For example:

  • Fedora Silverblue needs containers to read/write a user’s home directory
  • Fluentd project needs containers to be able to read logs in the /var/log directory

On the other hand, the default container type could be too loose for certain use cases:

  • It has no SELinux network controls — all container processes can bind to any network port
  • It has no SELinux control on Linux capabilities — all container processes can use all capabilities

There is one solution to handle both use cases: write a custom SELinux security policy for the container. This can be tricky, because SELinux expertise is required. For this purpose, the udica tool was created.
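
Udica is packaged for Fedora (see the conclusion for supported releases), so installing it is a single command:

# dnf install udica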

Introducing udica

Udica generates SELinux security profiles for containers. Its concept is based on the “block inheritance” feature inside the common intermediate language (CIL) supported by SELinux userspace. The tool creates a policy that combines:

  • Rules inherited from specified CIL blocks (templates), and
  • Rules discovered by inspecting the container’s JSON file, which contains mountpoint and port definitions

You can load the final policy immediately, or move it to another system to load into the kernel. Here’s an example, using a container that:

  • Mounts /home as read only
  • Mounts /var/spool as read/write
  • Exposes port tcp/21

The container starts with this command:

# podman run -v /home:/home:ro -v /var/spool:/var/spool:rw -p 21:21 -it fedora bash

The default container type (container_t) doesn’t allow any of these three actions. To prove it, you can use the sesearch tool to check whether the corresponding allow rules are present on the system:

# sesearch -A -s container_t -t home_root_t -c dir -p read 

There’s no allow rule present that lets a process labeled as container_t access a directory labeled home_root_t (like the /home directory). The same situation occurs with /var/spool, which is labeled var_spool_t:

# sesearch -A -s container_t -t var_spool_t -c dir -p read

On the other hand, the default policy completely allows network access.

# sesearch -A -s container_t -t port_type -c tcp_socket
allow container_net_domain port_type:tcp_socket { name_bind name_connect recv_msg send_msg };
allow sandbox_net_domain port_type:tcp_socket { name_bind name_connect recv_msg send_msg };

Securing the container

It would be great to restrict this access and allow the container to bind only to TCP port 21, or to other ports with the same SELinux label. Imagine you find an example container using podman ps whose ID is 37a3635afb8f:

# podman ps -q
37a3635afb8f

You can now inspect the container and pass the inspection file to the udica tool. The name for the new policy is my_container.

# podman inspect 37a3635afb8f > container.json
# udica -j container.json my_container
Policy my_container with container id 37a3635afb8f created!

Please load these modules using:
# semodule -i my_container.cil /usr/share/udica/templates/{base_container.cil,net_container.cil,home_container.cil}

Restart the container with: "--security-opt label=type:my_container.process" parameter

That’s it! You just created a custom SELinux security policy for the example container. Now you can load this policy into the kernel and make it active. The udica output above even tells you the command to use:

# semodule -i my_container.cil /usr/share/udica/templates/{base_container.cil,net_container.cil,home_container.cil}

Now you must restart the container to allow the container engine to use the new custom policy:

# podman run --security-opt label=type:my_container.process -v /home:/home:ro -v /var/spool:/var/spool:rw -p 21:21 -it fedora bash

The example container is now running in the newly created my_container.process SELinux process type:

# ps -efZ | grep my_container.process
unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 root 2275 434 1 13:49 pts/1 00:00:00 podman run --security-opt label=type:my_container.process -v /home:/home:ro -v /var/spool:/var/spool:rw -p 21:21 -it fedora bash
system_u:system_r:my_container.process:s0:c270,c963 root 2317 2305 0 13:49 pts/0 00:00:00 bash

Seeing the results

The command sesearch now shows allow rules for accessing /home and /var/spool:

# sesearch -A -s my_container.process -t home_root_t -c dir -p read
allow my_container.process home_root_t:dir { getattr ioctl lock open read search };
# sesearch -A -s my_container.process -t var_spool_t -c dir -p read
allow my_container.process var_spool_t:dir { add_name getattr ioctl lock open read remove_name search write }

The new custom SELinux policy also allows my_container.process to bind only to TCP/UDP ports labeled the same as TCP port 21:

# semanage port -l | grep 21 | grep ftp
ftp_port_t tcp 21, 989, 990
# sesearch -A -s my_container.process -c tcp_socket -p name_bind
allow my_container.process ftp_port_t:tcp_socket name_bind;

Conclusion

The udica tool helps you create SELinux policies for containers based on an inspection file without any SELinux expertise required. Now you can increase the security of containerized environments. Sources are available on GitHub, and an RPM package is available in Fedora repositories for Fedora 28 and later.


Photo by Samuel Zeller on Unsplash.


How to Build a Netboot Server, Part 4

One significant limitation of the netboot server built in this series is that the operating system image being served is read-only. Some use cases may require the end user to modify the image. For example, an instructor may want to have the students install and configure software packages like MariaDB and Node.js as part of their course walk-through.

An added benefit of writable netboot images is that the end user’s “personalized” operating system can follow them to different workstations they may use at later times.

Change the Bootmenu Application to use HTTPS

Create a self-signed certificate for the bootmenu application:

$ sudo -i
# MY_NAME=$(</etc/hostname)
# MY_TLSD=/opt/bootmenu/tls
# mkdir $MY_TLSD
# openssl req -newkey rsa:2048 -nodes -keyout $MY_TLSD/$MY_NAME.key -x509 -days 3650 -out $MY_TLSD/$MY_NAME.pem

Verify your certificate’s values. Make sure the “CN” value in the “Subject” line matches the DNS name that your iPXE clients use to connect to your bootmenu server:

# openssl x509 -text -noout -in $MY_TLSD/$MY_NAME.pem

Next, update the bootmenu application’s listen directive to use the HTTPS port and the newly created certificate and key:

# sed -i "s#listen => .*#listen => ['https://$MY_NAME:443?cert=$MY_TLSD/$MY_NAME.pem\&key=$MY_TLSD/$MY_NAME.key\&ciphers=AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA'],#" /opt/bootmenu/bootmenu.conf

Note the ciphers have been restricted to those currently supported by iPXE.

GnuTLS requires the “CAP_DAC_READ_SEARCH” capability, so add it to the bootmenu application’s systemd service:

# sed -i '/^AmbientCapabilities=/ s/$/ CAP_DAC_READ_SEARCH/' /etc/systemd/system/bootmenu.service
# sed -i 's/Serves iPXE Menus over HTTP/Serves iPXE Menus over HTTPS/' /etc/systemd/system/bootmenu.service
# systemctl daemon-reload

Now, add an exception for the bootmenu service to the firewall and restart the service:

# MY_SUBNET=192.0.2.0
# MY_PREFIX=24
# firewall-cmd --add-rich-rule="rule family='ipv4' source address='$MY_SUBNET/$MY_PREFIX' service name='https' accept"
# firewall-cmd --runtime-to-permanent
# systemctl restart bootmenu.service

Use wget to verify it’s working:

$ MY_NAME=server-01.example.edu
$ MY_TLSD=/opt/bootmenu/tls
$ wget -q --ca-certificate=$MY_TLSD/$MY_NAME.pem -O - https://$MY_NAME/menu

Add HTTPS to iPXE

Update init.ipxe to use HTTPS. Then recompile the ipxe bootloader with options to embed and trust the self-signed certificate you created for the bootmenu application:

$ echo '#define DOWNLOAD_PROTO_HTTPS' >> $HOME/ipxe/src/config/local/general.h
$ sed -i 's/^chain http:/chain https:/' $HOME/ipxe/init.ipxe
$ cp $MY_TLSD/$MY_NAME.pem $HOME/ipxe
$ cd $HOME/ipxe/src
$ make clean
$ make bin-x86_64-efi/ipxe.efi EMBED=../init.ipxe CERT="../$MY_NAME.pem" TRUST="../$MY_NAME.pem"

You can now copy the HTTPS-enabled iPXE bootloader out to your clients and test that everything is working correctly:

$ cp $HOME/ipxe/src/bin-x86_64-efi/ipxe.efi $HOME/esp/efi/boot/bootx64.efi

Add User Authentication to Mojolicious

Create a PAM service definition for the bootmenu application:

# dnf install -y pam_krb5
# echo 'auth required pam_krb5.so' > /etc/pam.d/bootmenu

Add a library to the bootmenu application that uses the Authen-PAM perl module to perform user authentication:

# dnf install -y perl-Authen-PAM;
# MY_MOJO=/opt/bootmenu
# mkdir $MY_MOJO/lib
# cat << 'END' > $MY_MOJO/lib/PAM.pm
package PAM;

use Authen::PAM;

sub auth {
   my $success = 0;

   my $username = shift;
   my $password = shift;

   my $callback = sub {
      my @res;

      while (@_) {
         my $code = shift;
         my $msg = shift;
         my $ans = "";

         $ans = $username if ($code == PAM_PROMPT_ECHO_ON());
         $ans = $password if ($code == PAM_PROMPT_ECHO_OFF());

         push @res, (PAM_SUCCESS(), $ans);
      }

      push @res, PAM_SUCCESS();

      return @res;
   };

   my $pamh = new Authen::PAM('bootmenu', $username, $callback);

   {
      last unless ref $pamh;
      last unless $pamh->pam_authenticate() == PAM_SUCCESS;
      $success = 1;
   }

   return $success;
}

return 1;
END

The above code is taken almost verbatim from the Authen::PAM::FAQ man page.

Redefine the bootmenu application so it returns a netboot template only if a valid username and password are supplied:

# cat << 'END' > $MY_MOJO/bootmenu.pl
#!/usr/bin/env perl

use lib 'lib';

use PAM;
use Mojolicious::Lite;
use Mojolicious::Plugins;
use Mojo::Util ('url_unescape');

plugin 'Config';

get '/menu';

get '/boot' => sub {
   my $c = shift;

   my $instance = $c->param('instance');
   my $username = $c->param('username');
   my $password = $c->param('password');

   my $template = 'menu';

   {
      last unless $instance =~ /^fc[[:digit:]]{2}$/;
      last unless $username =~ /^[[:alnum:]]+$/;
      last unless PAM::auth($username, url_unescape($password));

      $template = $instance;
   }

   return $c->render(template => $template);
};

app->start;
END

The bootmenu application now looks for the lib directory relative to its WorkingDirectory. However, by default the working directory is set to the root directory of the server for systemd units. Therefore, you must update the systemd unit to set WorkingDirectory to the root of the bootmenu application instead:

# sed -i "/^RuntimeDirectory=/ a WorkingDirectory=$MY_MOJO" /etc/systemd/system/bootmenu.service
# systemctl daemon-reload

Update the templates to work with the redefined bootmenu application:

# cd $MY_MOJO/templates
# MY_BOOTMENU_SERVER=$(</etc/hostname)
# MY_FEDORA_RELEASES="28 29"
# for i in $MY_FEDORA_RELEASES; do echo '#!ipxe' > fc$i.html.ep; grep "^kernel\|initrd" menu.html.ep | grep "fc$i" >> fc$i.html.ep; echo "boot || chain https://$MY_BOOTMENU_SERVER/menu" >> fc$i.html.ep; sed -i "/^:f$i$/,/^boot /c :f$i\nlogin\nchain https://$MY_BOOTMENU_SERVER/boot?instance=fc$i\&username=\${username}\&password=\${password:uristring} || goto failed" menu.html.ep; done

The result of the last command above should be three files similar to the following:

menu.html.ep:

#!ipxe

set timeout 5000

:menu
menu iPXE Boot Menu
item --key 1 lcl 1. Microsoft Windows 10
item --key 2 f29 2. RedHat Fedora 29
item --key 3 f28 3. RedHat Fedora 28
choose --timeout ${timeout} --default lcl selected || goto shell
set timeout 0
goto ${selected}

:failed
echo boot failed, dropping to shell...
goto shell

:shell
echo type 'exit' to get back to the menu
set timeout 0
shell
goto menu

:lcl
exit

:f29
login
chain https://server-01.example.edu/boot?instance=fc29&username=${username}&password=${password:uristring} || goto failed

:f28
login
chain https://server-01.example.edu/boot?instance=fc28&username=${username}&password=${password:uristring} || goto failed

fc29.html.ep:

#!ipxe
kernel --name kernel.efi ${prefix}/vmlinuz-4.19.5-300.fc29.x86_64 initrd=initrd.img ro ip=dhcp rd.peerdns=0 nameserver=192.0.2.91 nameserver=192.0.2.92 root=/dev/disk/by-path/ip-192.0.2.158:3260-iscsi-iqn.edu.example.server-01:fc29-lun-1 netroot=iscsi:192.0.2.158::::iqn.edu.example.server-01:fc29 console=tty0 console=ttyS0,115200n8 audit=0 selinux=0 quiet
initrd --name initrd.img ${prefix}/initramfs-4.19.5-300.fc29.x86_64.img
boot || chain https://server-01.example.edu/menu

fc28.html.ep:

#!ipxe
kernel --name kernel.efi ${prefix}/vmlinuz-4.19.3-200.fc28.x86_64 initrd=initrd.img ro ip=dhcp rd.peerdns=0 nameserver=192.0.2.91 nameserver=192.0.2.92 root=/dev/disk/by-path/ip-192.0.2.158:3260-iscsi-iqn.edu.example.server-01:fc28-lun-1 netroot=iscsi:192.0.2.158::::iqn.edu.example.server-01:fc28 console=tty0 console=ttyS0,115200n8 audit=0 selinux=0 quiet
initrd --name initrd.img ${prefix}/initramfs-4.19.3-200.fc28.x86_64.img
boot || chain https://server-01.example.edu/menu

Now, restart the bootmenu application and verify authentication is working:

# systemctl restart bootmenu.service
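
One way to verify authentication end to end is to request a boot template directly with wget, substituting a real account for the placeholder credentials below. A good login should return the fc29 iPXE script; a bad one should return the menu template instead:

$ MY_NAME=server-01.example.edu
$ MY_TLSD=/opt/bootmenu/tls
$ wget -q --ca-certificate=$MY_TLSD/$MY_NAME.pem -O - "https://$MY_NAME/boot?instance=fc29&username=jsmith&password=secret"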

Make the iSCSI Target Writeable

Now that user authentication works through iPXE, you can create per-user, writeable overlays on top of the read-only image on demand when users connect. Using a copy-on-write overlay has three advantages over simply copying the original image file for each user:

  1. The copy can be created very quickly. This allows creation on-demand.
  2. The copy does not increase the disk usage on the server. Only what the user writes to their personal copy of the image is stored in addition to the original image.
  3. Since most sectors for each copy are the same sectors on the server’s storage, they’ll likely already be loaded in RAM when subsequent users access their copies of the operating system. This improves the server’s performance because RAM is faster than disk I/O.

One potential pitfall of using copy-on-write is that once overlays are created, the images on which they are overlayed must not be changed. If they are changed, all the overlays will be corrupted. Then the overlays must be deleted and replaced with new, blank overlays. Even simply mounting the image file in read-write mode can cause sufficient filesystem updates to corrupt the overlays.

Due to the potential for the overlays to be corrupted if the original image is modified, mark the original image as immutable by running:

# chattr +i </path/to/file>

You can use lsattr </path/to/file> to view the status of the immutable flag, and chattr -i </path/to/file> to unset it. While the immutable flag is set, even the root user or a system process running as root cannot modify or delete the file.
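
As a quick demonstration on a scratch file (assuming the filesystem supports file attributes, as ext4 and XFS do):

# touch /root/demo
# chattr +i /root/demo
# rm -f /root/demo
rm: cannot remove '/root/demo': Operation not permitted
# chattr -i /root/demo && rm -f /root/demo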

Begin by stopping the tgtd.service so you can change the image files:

# systemctl stop tgtd.service

It’s normal for this command to take a minute or so to stop when there are connections still open.

Now, remove the read-only iSCSI export. Then update the readonly-root configuration file in the template so the image is no longer read-only:

# MY_FC=fc29
# rm -f /etc/tgt/conf.d/$MY_FC.conf
# TEMP_MNT=$(mktemp -d)
# mount /$MY_FC.img $TEMP_MNT
# sed -i 's/^READONLY=yes$/READONLY=no/' $TEMP_MNT/etc/sysconfig/readonly-root
# sed -i 's/^Storage=volatile$/#Storage=auto/' $TEMP_MNT/etc/systemd/journald.conf
# umount $TEMP_MNT

Journald was changed from logging to volatile memory back to its default (log to disk if /var/log/journal exists) because a user reported that their clients would freeze with an out-of-memory error due to an application generating excessive system logs. The downside of logging to disk is that the clients generate extra write traffic, which can burden your netboot server with unnecessary I/O. Decide whether logging to memory or logging to disk is preferable for your environment.
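
If you do opt for persistent logging, journald's SystemMaxUse option caps how much disk space the journal may consume, which bounds the extra write traffic. A sketch, applied by temporarily remounting the template before making it immutable below (the 64M cap is an arbitrary example):

# mount /$MY_FC.img $TEMP_MNT
# sed -i 's/^#SystemMaxUse=.*/SystemMaxUse=64M/' $TEMP_MNT/etc/systemd/journald.conf
# umount $TEMP_MNT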

Since you won’t make any further changes to the template image, set the immutable flag on it and restart the tgtd.service:

# chattr +i /$MY_FC.img
# systemctl start tgtd.service

Now, update the bootmenu application:

# cat << 'END' > $MY_MOJO/bootmenu.pl
#!/usr/bin/env perl

use lib 'lib';
use PAM;
use Mojolicious::Lite;
use Mojolicious::Plugins;
use Mojo::Util ('url_unescape');

plugin 'Config';

get '/menu';

get '/boot' => sub {
   my $c = shift;

   my $instance = $c->param('instance');
   my $username = $c->param('username');
   my $password = $c->param('password');

   my $chapscrt;
   my $template = 'menu';

   {
      last unless $instance =~ /^fc[[:digit:]]{2}$/;
      last unless $username =~ /^[[:alnum:]]+$/;
      last unless PAM::auth($username, url_unescape($password));
      last unless $chapscrt = `sudo scripts/mktgt $instance $username`;
      $template = $instance;
   }

   return $c->render(template => $template, username => $username, chapscrt => $chapscrt);
};

app->start;
END

This new version of the bootmenu application calls a custom mktgt script which, on success, returns a random CHAP password for each new iSCSI target that it creates. The CHAP password prevents one user from mounting another user’s iSCSI target by indirect means. The app only returns the correct iSCSI target password to a user who has successfully authenticated.

The mktgt script is prefixed with sudo because it needs root privileges to create the target.

The $username and $chapscrt variables are also passed to the render command so they can be incorporated into the templates returned to the user when necessary.

Next, update our boot templates so they can read the username and chapscrt variables and pass them along to the end user. Also update the templates to mount the root filesystem in rw (read-write) mode:

# cd $MY_MOJO/templates
# sed -i "s/:$MY_FC/:$MY_FC-<%= \$username %>/g" $MY_FC.html.ep
# sed -i "s/ netroot=iscsi:/ netroot=iscsi:<%= \$username %>:<%= \$chapscrt %>@/" $MY_FC.html.ep
# sed -i "s/ ro / rw /" $MY_FC.html.ep

After running the above commands, you should have boot templates like the following:

#!ipxe
kernel --name kernel.efi ${prefix}/vmlinuz-4.19.5-300.fc29.x86_64 initrd=initrd.img rw ip=dhcp rd.peerdns=0 nameserver=192.0.2.91 nameserver=192.0.2.92 root=/dev/disk/by-path/ip-192.0.2.158:3260-iscsi-iqn.edu.example.server-01:fc29-<%= $username %>-lun-1 netroot=iscsi:<%= $username %>:<%= $chapscrt %>@192.0.2.158::::iqn.edu.example.server-01:fc29-<%= $username %> console=tty0 console=ttyS0,115200n8 audit=0 selinux=0 quiet
initrd --name initrd.img ${prefix}/initramfs-4.19.5-300.fc29.x86_64.img
boot || chain https://server-01.example.edu/menu

NOTE: If you need to view the boot template after the variables have been interpolated, you can insert the “shell” command on its own line just before the “boot” command. Then, when you netboot your client, iPXE gives you an interactive shell where you can enter “imgstat” to view the parameters being passed to the kernel. If everything looks correct, you can type “exit” to leave the shell and continue the boot process.

Now allow the bootmenu user to run the mktgt script (and only that script) as root via sudo:

# echo "bootmenu ALL = NOPASSWD: $MY_MOJO/scripts/mktgt *" > /etc/sudoers.d/bootmenu

The bootmenu user should not have write access to the mktgt script or any other files under its home directory. All the files under /opt/bootmenu should be owned by root, and should not be writable by any user other than root.

Sudo does not work well with systemd’s DynamicUser option, so create a normal user account and set the systemd service to run as that user:

# useradd -r -c 'iPXE Boot Menu Service' -d /opt/bootmenu -s /sbin/nologin bootmenu
# sed -i 's/^DynamicUser=true$/User=bootmenu/' /etc/systemd/system/bootmenu.service
# systemctl daemon-reload
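
At this point you can sanity-check both changes by listing what the new user may run through sudo and confirming the unit now runs as that user:

# sudo -l -U bootmenu
# systemctl show -p User bootmenu.service
User=bootmenu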

Finally, create a directory for the copy-on-write overlays and create the mktgt script that manages the iSCSI targets and their overlayed backing stores:

# mkdir /$MY_FC.cow
# mkdir $MY_MOJO/scripts
# cat << 'END' > $MY_MOJO/scripts/mktgt
#!/usr/bin/env perl

# if another instance of this script is running, wait for it to finish
"$ENV{FLOCKER}" eq 'MKTGT' or exec "env FLOCKER=MKTGT flock /tmp $0 @ARGV";

# use "RETURN" to print to STDOUT; everything else goes to STDERR by default
open(RETURN, '>&', STDOUT);
open(STDOUT, '>&', STDERR);

my $instance = shift or die "instance not provided";
my $username = shift or die "username not provided";

my $img = "/$instance.img";
my $dir = "/$instance.cow";
my $top = "$dir/$username";

-f "$img" or die "'$img' is not a file";
-d "$dir" or die "'$dir' is not a directory";

my $base;
die unless $base = `losetup --show --read-only --nooverlap --find $img`;
chomp $base;

my $size;
die unless $size = `blockdev --getsz $base`;
chomp $size;

# create the per-user sparse file if it does not exist
if (! -e "$top") {
   die unless system("dd if=/dev/zero of=$top status=none bs=512 count=0 seek=$size") == 0;
}

# create the copy-on-write overlay if it does not exist
my $cow = "$instance-$username";
my $dev = "/dev/mapper/$cow";
if (! -e "$dev") {
   my $over;
   die unless $over = `losetup --show --nooverlap --find $top`;
   chomp $over;
   die unless system("echo 0 $size snapshot $base $over p 8 | dmsetup create $cow") == 0;
}

my $tgtadm = '/usr/sbin/tgtadm --lld iscsi';

# get textual representations of the iscsi targets
my $text = `$tgtadm --op show --mode target`;
my @targets = $text =~ /(?:^T.*\n)(?:^ .*\n)*/mg;

# convert the textual representations into a hash table
my $targets = {};
foreach (@targets) {
   my $tgt;
   my $sid;

   foreach (split /\n/) {
      /^Target (\d+)(?{ $tgt = $targets->{$^N} = [] })/;
      /I_T nexus: (\d+)(?{ $sid = $^N })/;
      /Connection: (\d+)(?{ push @{$tgt}, [ $sid, $^N ] })/;
   }
}

my $hostname;
die unless $hostname = `hostname`;
chomp $hostname;

my $target = 'iqn.' . join('.', reverse split('\.', $hostname)) . ":$cow";

# find the target id corresponding to the provided target name and
# close any existing connections to it
my $tid = 0;
foreach (@targets) {
   next unless /^Target (\d+)(?{ $tid = $^N }): $target$/m;
   foreach (@{$targets->{$tid}}) {
      die unless system("$tgtadm --op delete --mode conn --tid $tid --sid $_->[0] --cid $_->[1]") == 0;
   }
}

# create a new target if an existing one was not found
if ($tid == 0) {
   # find an available target id
   my @ids = (0, sort keys %{$targets});
   $tid = 1;
   while ($ids[$tid] == $tid) { $tid++ }

   # create the target
   die unless -e "$dev";
   die unless system("$tgtadm --op new --mode target --tid $tid --targetname $target") == 0;
   die unless system("$tgtadm --op new --mode logicalunit --tid $tid --lun 1 --backing-store $dev") == 0;
   die unless system("$tgtadm --op bind --mode target --tid $tid --initiator-address ALL") == 0;
}

# (re)set the provided target's chap password
my $password = join('', map(chr(int(rand(26))+65), 1..8));
my $accounts = `$tgtadm --op show --mode account`;
if ($accounts =~ / $username$/m) {
   die unless system("$tgtadm --op delete --mode account --user $username") == 0;
}
die unless system("$tgtadm --op new --mode account --user $username --password $password") == 0;
die unless system("$tgtadm --op bind --mode account --tid $tid --user $username") == 0;

# return the new password to the iscsi target on stdout
print RETURN $password;
END
# chmod +x $MY_MOJO/scripts/mktgt

The above script does five things:

  1. It creates the /<instance>.cow/<username> sparse file if it does not already exist.
  2. It creates the /dev/mapper/<instance>-<username> device node that serves as the copy-on-write backing store for the iSCSI target if it does not already exist.
  3. It creates the iqn.<reverse-hostname>:<instance>-<username> iSCSI target if it does not exist. Or, if the target does exist, it closes any existing connections to it because the image can only be opened in read-write mode from one place at a time.
  4. It (re)sets the chap password on the iqn.<reverse-hostname>:<instance>-<username> iSCSI target to a new random value.
  5. It prints the new chap password on standard output if all of the previous tasks completed successfully.

You should be able to test the mktgt script from the command line by running it with valid test parameters. For example:

# echo `$MY_MOJO/scripts/mktgt fc29 jsmith`

When run from the command line, the mktgt script should print out either the eight-character random password for the iSCSI target if it succeeded or the line number on which something went wrong if it failed.

On occasion, you may want to delete an iSCSI target without having to stop the entire service. For example, a user might inadvertently corrupt their personal image, in which case you would need to systematically undo everything that the above mktgt script does so that the next time they log in they will get a copy of the original image.

Below is an rmtgt script that undoes, in reverse order, what the above mktgt script did:

# mkdir $HOME/bin
# cat << 'END' > $HOME/bin/rmtgt
#!/usr/bin/env perl

@ARGV >= 2 or die "usage: $0 <instance> <username> [+d|+f]\n";

my $instance = shift;
my $username = shift;

my $rmd = ($ARGV[0] eq '+d'); # remove device node if +d flag is set
my $rmf = ($ARGV[0] eq '+f'); # remove sparse file if +f flag is set

my $cow = "$instance-$username";

my $hostname;
die unless $hostname = `hostname`;
chomp $hostname;

my $tgtadm = '/usr/sbin/tgtadm';
my $target = 'iqn.' . join('.', reverse split('\.', $hostname)) . ":$cow";

my $text = `$tgtadm --op show --mode target`;
my @targets = $text =~ /(?:^T.*\n)(?:^ .*\n)*/mg;

my $targets = {};
foreach (@targets) {
   my $tgt;
   my $sid;

   foreach (split /\n/) {
      /^Target (\d+)(?{ $tgt = $targets->{$^N} = [] })/;
      /I_T nexus: (\d+)(?{ $sid = $^N })/;
      /Connection: (\d+)(?{ push @{$tgt}, [ $sid, $^N ] })/;
   }
}

my $tid = 0;
foreach (@targets) {
   next unless /^Target (\d+)(?{ $tid = $^N }): $target$/m;
   foreach (@{$targets->{$tid}}) {
      die unless system("$tgtadm --op delete --mode conn --tid $tid --sid $_->[0] --cid $_->[1]") == 0;
   }
   die unless system("$tgtadm --op delete --mode target --tid $tid") == 0;
   print "target $tid deleted\n";
   sleep 1;
}

my $dev = "/dev/mapper/$cow";
if ($rmd or ($rmf and -e $dev)) {
   die unless system("dmsetup remove $cow") == 0;
   print "device node $dev deleted\n";
}

if ($rmf) {
   my $sf = "/$instance.cow/$username";
   die "sparse file $sf not found" unless -e "$sf";
   die unless system("rm -f $sf") == 0;
   die unless not -e "$sf";
   print "sparse file $sf deleted\n";
}
END
# chmod +x $HOME/bin/rmtgt

For example, to use the above script to completely remove the fc29-jsmith target including its backing store device node and its sparse file, run the following:

# rmtgt fc29 jsmith +f

Once you’ve verified that the mktgt script is working properly, you can restart the bootmenu service. The next time someone netboots, they should receive a personal copy of the netboot image they can write to:

# systemctl restart bootmenu.service

Users should now be able to modify the root filesystem of their netboot image.


How to Build a Netboot Server, Part 3

The How to Build a Netboot Server, Part 1 article provided a minimal iPXE boot script for your netboot image. Many users probably have a local operating system that they want to use in addition to the netboot image. But switching bootloaders using the typical workstation’s BIOS can be cumbersome. This part of the series shows how to set up some more complex iPXE configurations. These allow the end user to easily choose which operating system they want to boot. They also let the system administrator manage the boot menus from a central server.

An interactive iPXE boot menu

The commands below redefine the netboot image’s boot.cfg as an interactive iPXE boot menu with a 5-second countdown timer:

$ MY_FVER=29
$ MY_KRNL=$(ls -c /fc$MY_FVER/lib/modules | head -n 1)
$ MY_DNS1=192.0.2.91
$ MY_DNS2=192.0.2.92
$ MY_NAME=server-01.example.edu
$ MY_EMAN=$(echo $MY_NAME | tr '.' "\n" | tac | tr "\n" '.' | cut -b -${#MY_NAME})
$ MY_ADDR=$(host -t A $MY_NAME | awk '{print $4}')
$ cat << END > $HOME/esp/linux/boot.cfg
#!ipxe

set timeout 5000

:menu
menu iPXE Boot Menu
item --key 1 lcl 1. Microsoft Windows 10
item --key 2 f$MY_FVER 2. RedHat Fedora $MY_FVER
choose --timeout \${timeout} --default lcl selected || goto shell
set timeout 0
goto \${selected}

:failed
echo boot failed, dropping to shell...
goto shell

:shell
echo type 'exit' to get back to the menu
set timeout 0
shell
goto menu

:lcl
exit

:f$MY_FVER
kernel --name kernel.efi \${prefix}/vmlinuz-$MY_KRNL initrd=initrd.img ro ip=dhcp rd.peerdns=0 nameserver=$MY_DNS1 nameserver=$MY_DNS2 root=/dev/disk/by-path/ip-$MY_ADDR:3260-iscsi-iqn.$MY_EMAN:fc$MY_FVER-lun-1 netroot=iscsi:$MY_ADDR::::iqn.$MY_EMAN:fc$MY_FVER console=tty0 console=ttyS0,115200n8 audit=0 selinux=0 quiet
initrd --name initrd.img \${prefix}/initramfs-$MY_KRNL.img
boot || goto failed
END

The above menu has five sections:

  • menu defines the actual menu that will be shown on the screen.
  • failed notifies the user that something went wrong and drops the user to a shell so they can troubleshoot the problem.
  • shell provides an interactive command prompt. You can reach it either by pressing the Esc key while at the boot menu or if the “boot” command returns with a failure code.
  • lcl contains a single command that tells iPXE to exit and return control back to the BIOS. Whatever you want to boot by default (e.g. the workstation’s local hard drive) must be listed as the next boot item right after iPXE in your workstation’s BIOS.
  • f29 contains the same netboot code used earlier but with the final exit replaced with goto failed.

Copy the updated boot.cfg from your $HOME/esp/linux directory out to the ESPs of all your client systems. If all goes well, the interactive menu should appear the next time your clients boot.

A server-hosted boot menu

Another feature you can add to the netboot server is the ability to manage all the client boot menus from one central location. This feature can be especially useful when rolling out a new version of the OS. It lets you perform a sort of atomic transaction to switch all clients over to the new OS after you’ve copied the new kernel and initramfs out to the ESPs of all the clients.

Install Mojolicious:

$ sudo -i
# dnf install -y perl-Mojolicious

Define the “bootmenu” app:

# mkdir /opt/bootmenu
# cat << END > /opt/bootmenu/bootmenu.pl
#!/usr/bin/env perl

use Mojolicious::Lite;
use Mojolicious::Plugins;

plugin 'Config';

get '/menu';

app->start;
END
# chmod 755 /opt/bootmenu/bootmenu.pl

Define the configuration file for the bootmenu app:

# cat << END > /opt/bootmenu/bootmenu.conf
{
   hypnotoad => {
      listen => ['http://*:80'],
      pid_file => '/run/bootmenu/bootmenu.pid',
   }
}
END

This is an extremely simple Mojolicious application that listens on port 80 and only answers to /menu requests. If you want a quick introduction to what Mojolicious can do, run man Mojolicious::Guides::Growing to view the manual. Use the Q key to quit the manual.

Move boot.cfg over to our netboot app as a template named menu.html.ep:

# mkdir /opt/bootmenu/templates
# mv $HOME/esp/linux/boot.cfg /opt/bootmenu/templates/menu.html.ep
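
Before handing the application over to systemd, you can smoke-test it with morbo, the development server that ships with Mojolicious. It listens on port 3000 by default (this assumes the port is free; the kill cleans up the background server when you are done):

# morbo /opt/bootmenu/bootmenu.pl &
# sleep 2; wget -q -O - http://localhost:3000/menu
# kill %1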

Define a systemd service to manage the bootmenu app:

# cat << END > /etc/systemd/system/bootmenu.service
[Unit]
Description=Serves iPXE Menus over HTTP
After=network-online.target

[Service]
Type=forking
DynamicUser=true
RuntimeDirectory=bootmenu
PIDFile=/run/bootmenu/bootmenu.pid
ExecStart=/usr/bin/hypnotoad /opt/bootmenu/bootmenu.pl
ExecReload=/usr/bin/hypnotoad /opt/bootmenu/bootmenu.pl
AmbientCapabilities=CAP_NET_BIND_SERVICE
KillMode=process

[Install]
WantedBy=multi-user.target
END

Add an exception for the HTTP service to the local firewall and start the bootmenu service:

# firewall-cmd --add-service http
# firewall-cmd --runtime-to-permanent
# systemctl enable bootmenu.service
# systemctl start bootmenu.service

Test it with wget:

$ sudo dnf install -y wget
$ MY_BOOTMENU_SERVER=server-01.example.edu
$ wget -q -O - http://$MY_BOOTMENU_SERVER/menu

The above command should output something similar to the following:

#!ipxe

set timeout 5000

:menu
menu iPXE Boot Menu
item --key 1 lcl 1. Microsoft Windows 10
item --key 2 f29 2. RedHat Fedora 29
choose --timeout ${timeout} --default lcl selected || goto shell
set timeout 0
goto ${selected}

:failed
echo boot failed, dropping to shell...
goto shell

:shell
echo type 'exit' to get back to the menu
set timeout 0
shell
goto menu

:lcl
exit

:f29
kernel --name kernel.efi ${prefix}/vmlinuz-4.19.4-300.fc29.x86_64 initrd=initrd.img ro ip=dhcp rd.peerdns=0 nameserver=192.0.2.91 nameserver=192.0.2.92 root=/dev/disk/by-path/ip-192.0.2.158:3260-iscsi-iqn.edu.example.server-01:fc29-lun-1 netroot=iscsi:192.0.2.158::::iqn.edu.example.server-01:fc29 console=tty0 console=ttyS0,115200n8 audit=0 selinux=0 quiet
initrd --name initrd.img ${prefix}/initramfs-4.19.4-300.fc29.x86_64.img
boot || goto failed

Now that the boot menu server is working, rebuild the ipxe.efi bootloader with an init script that points to it.

First, update the init.ipxe script created in part one of this series:

$ MY_BOOTMENU_SERVER=server-01.example.edu
$ cat << END > $HOME/ipxe/init.ipxe
#!ipxe

dhcp || exit
set prefix file:///linux
chain http://$MY_BOOTMENU_SERVER/menu || exit
END

Now, rebuild the boot loader:

$ cd $HOME/ipxe/src
$ make clean
$ make bin-x86_64-efi/ipxe.efi EMBED=../init.ipxe

Copy the updated bootloader to your ESP:

$ cp $HOME/ipxe/src/bin-x86_64-efi/ipxe.efi $HOME/esp/efi/boot/bootx64.efi

After you’ve copied the updated bootloader to all your clients, you can make future updates to the boot menu simply by editing /opt/bootmenu/templates/menu.html.ep and running:

$ sudo systemctl restart bootmenu.service

Making further changes

If the boot menu server is working properly, you no longer need the boot.cfg file on your client systems.

For example, re-add the Fedora 28 image to the boot menu:

$ sudo -i
# MY_FVER=28
# MY_KRNL=$(ls -c /fc$MY_FVER/lib/modules | head -n 1)
# MY_DNS1=192.0.2.91
# MY_DNS2=192.0.2.92
# MY_NAME=$(</etc/hostname)
# MY_EMAN=$(echo $MY_NAME | tr '.' "\n" | tac | tr "\n" '.' | cut -b -${#MY_NAME})
# MY_ADDR=$(host -t A $MY_NAME | awk '{print $4}')
# cat << END >> /opt/bootmenu/templates/menu.html.ep
:f$MY_FVER
kernel --name kernel.efi \${prefix}/vmlinuz-$MY_KRNL initrd=initrd.img ro ip=dhcp rd.peerdns=0 nameserver=$MY_DNS1 nameserver=$MY_DNS2 root=/dev/disk/by-path/ip-$MY_ADDR:3260-iscsi-iqn.$MY_EMAN:fc$MY_FVER-lun-1 netroot=iscsi:$MY_ADDR::::iqn.$MY_EMAN:fc$MY_FVER console=tty0 console=ttyS0,115200n8 audit=0 selinux=0 quiet
initrd --name initrd.img \${prefix}/initramfs-$MY_KRNL.img
boot || goto failed
END
# sed -i "/item --key 2/a item --key 3 f$MY_FVER 3. RedHat Fedora $MY_FVER" /opt/bootmenu/templates/menu.html.ep
# systemctl restart bootmenu.service

If all goes well, your clients should see the new Fedora 28 entry in the boot menu the next time they boot.


How to Build a Netboot Server, Part 2

The article How to Build a Netboot Server, Part 1 showed you how to create a netboot image with a “liveuser” account whose home directory lives in volatile memory. Most users probably want to preserve files and settings across reboots, though. So this second part of the netboot series shows how to reconfigure the netboot image from part one so that Active Directory user accounts can log in and their home directories can be automatically mounted from an NFS server.

Part 3 of this series will show how to make an interactive and centrally-configurable iPXE boot menu for the netboot clients.

Set Up NFS4 Home Directories with KRB5 Authentication

Follow the directions from the previous post “Share NFS Home Directories Securely with Kerberos,” then return here.

Remove the Liveuser Account

Remove the “liveuser” account created in part one of this series:

$ sudo -i
# sed -i '/automaticlogin/Id' /fc28/etc/gdm/custom.conf
# rm -f /fc28/etc/sudoers.d/liveuser
# for i in passwd shadow group gshadow; do sed -i '/^liveuser:/d' /fc28/etc/$i; done

Configure NTP, KRB5 and SSSD

Next, duplicate the NTP, KRB5, and SSSD configuration that you set up on the server into the client image so that the same accounts will be available:

# MY_HOSTNAME=$(</etc/hostname)
# MY_DOMAIN=${MY_HOSTNAME#*.}
# dnf -y --installroot=/fc28 install ntp krb5-workstation sssd
# cp /etc/ntp.conf /fc28/etc
# chroot /fc28 systemctl enable ntpd.service
# cp /etc/krb5.conf.d/${MY_DOMAIN%%.*} /fc28/etc/krb5.conf.d
# cp /etc/sssd/sssd.conf /fc28/etc/sssd

Reconfigure sssd to provide authentication services, in addition to the identification service already configured:

# sed -i '/services =/s/$/, pam/' /fc28/etc/sssd/sssd.conf

Also, ensure none of the clients attempt to update the computer account password:

# sed -i '/id_provider/a \ \ ad_maximum_machine_account_password_age = 0' /fc28/etc/sssd/sssd.conf 

Also, copy the nfsnobody definitions:

# for i in passwd shadow group gshadow; do grep "^nfsnobody:" /etc/$i >> /fc28/etc/$i; done

Join Active Directory

Next, you’ll perform a chroot to join the client image to Active Directory. Begin by deleting any pre-existing computer account with the same name your netboot image will use:

# MY_USERNAME=jsmith
# MY_CLIENT_HOSTNAME=$(</fc28/etc/hostname)
# adcli delete-computer "${MY_CLIENT_HOSTNAME%%.*}" -U "$MY_USERNAME"

Also delete the krb5.keytab file from the netboot image if it exists:

# rm -f /fc28/etc/krb5.keytab

Perform a chroot into the netboot image:

# for i in dev dev/pts dev/shm proc sys run; do mount -o bind /$i /fc28/$i; done
# chroot /fc28 /usr/bin/bash --login

Perform the join:

# MY_USERNAME=jsmith
# MY_HOSTNAME=$(</etc/hostname)
# MY_DOMAIN=${MY_HOSTNAME#*.}
# MY_REALM=${MY_DOMAIN^^}
# MY_OU="cn=computers,dc=${MY_DOMAIN//./,dc=}"
# adcli join $MY_DOMAIN --login-user="$MY_USERNAME" --computer-name="${MY_HOSTNAME%%.*}" --host-fqdn="$MY_HOSTNAME" --user-principal="host/$MY_HOSTNAME@$MY_REALM" --domain-ou="$MY_OU"
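
While still inside the chroot, you can confirm that the join produced host credentials by listing the keytab (klist comes with the krb5-workstation package installed earlier):

# klist -k /etc/krb5.keytab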

Now log out of the chroot and clear the root user’s command history:

# logout
# for i in run sys proc dev/shm dev/pts dev; do umount /fc28/$i; done
# > /fc28/root/.bash_history

Install and Configure PAM Mount

We want our clients to automatically mount the user’s home directory when they log in. To accomplish this, we’ll use the “pam_mount” module. Install and configure pam_mount:

# dnf install -y --installroot=/fc28 pam_mount
# cat << END > /fc28/etc/security/pam_mount.conf.xml
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE pam_mount SYSTEM "pam_mount.conf.xml.dtd">

<pam_mount>
   <debug enable="0" />
   <volume uid="1400000000-1499999999" fstype="nfs4" server="$MY_HOSTNAME" path="/home/%(USER)" mountpoint="/home/%(USER)" options="sec=krb5" />
   <mkmountpoint enable="1" remove="0" />
   <msg-authpw>Password:</msg-authpw>
</pam_mount>
END

Reconfigure PAM to use pam_mount:

# dnf install -y patch
# cp -r /fc28/usr/share/authselect/default/sssd /fc28/etc/authselect/custom
# echo 'initgroups: files' >> /fc28/etc/authselect/custom/sssd/nsswitch.conf
# patch /fc28/etc/authselect/custom/sssd/system-auth << END
@@ -12 +12,2 @@
-auth sufficient pam_sss.so forward_pass
+auth requisite pam_mount.so {include if "with-pammount"}
+auth sufficient pam_sss.so {if "with-pammount":use_first_pass|forward_pass}
@@ -35,2 +36,3 @@
 session required pam_unix.so
+session optional pam_mount.so {include if "with-pammount"}
 session optional pam_sss.so
END
# patch /fc28/etc/authselect/custom/sssd/password-auth << END
@@ -9 +9,2 @@
-auth sufficient pam_sss.so forward_pass
+auth requisite pam_mount.so {include if "with-pammount"}
+auth sufficient pam_sss.so {if "with-pammount":use_first_pass|forward_pass}
@@ -32,2 +33,3 @@
 session required pam_unix.so
+session optional pam_mount.so {include if "with-pammount"}
 session optional pam_sss.so
END
# chroot /fc28 authselect select custom/sssd with-pammount --force
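
To confirm that the custom profile and the with-pammount feature are active inside the image, query authselect through the chroot; the output should resemble:

# chroot /fc28 authselect current
Profile ID: custom/sssd
Enabled features:
- with-pammount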

Also ensure the NFS server’s hostname is always resolvable from the client:

# MY_IP=$(host -t A $MY_HOSTNAME | awk '{print $4}')
# echo "$MY_IP $MY_HOSTNAME ${MY_HOSTNAME%%.*}" >> /fc28/etc/hosts
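
A quick check that the name resolves from inside the image:

# chroot /fc28 getent hosts $MY_HOSTNAME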

Optionally, allow all users to run sudo:

# echo '%users ALL=(ALL) NOPASSWD: ALL' > /fc28/etc/sudoers.d/users

Convert the NFS Root to an iSCSI Backing-Store

Current versions of nfs-utils may have difficulty establishing a second connection from the client back to the NFS server for home directories when an nfsroot connection is already established. The client hangs when attempting to access the home directory. So, we will work around the problem by using a different protocol (iSCSI) for sharing our netboot image.

First chroot into the image to reconfigure its initramfs for booting from an iSCSI root:

# for i in dev dev/pts dev/shm proc sys run; do mount -o bind /$i /fc28/$i; done
# chroot /fc28 /usr/bin/bash --login
# dnf install -y iscsi-initiator-utils
# sed -i 's/nfs/iscsi/' /etc/dracut.conf.d/netboot.conf
# echo 'omit_drivers+=" qedi "' > /etc/dracut.conf.d/omit-qedi.conf
# echo 'blacklist qedi' > /etc/modprobe.d/blacklist-qedi.conf
# KERNEL=$(ls -c /lib/modules | head -n 1)
# INITRD=$(find /boot -name 'init*' | grep -m 1 $KERNEL)
# dracut -f $INITRD $KERNEL
# logout
# for i in run sys proc dev/shm dev/pts dev; do umount /fc28/$i; done
# > /fc28/root/.bash_history

The qedi driver broke iSCSI booting during testing, so it has been disabled here.

Next, create a fc28.img sparse file. This file serves as the iSCSI target’s backing store:

# FC28_SIZE=$(du -ms /fc28 | cut -f 1)
# dd if=/dev/zero of=/fc28.img bs=1MiB count=0 seek=$(($FC28_SIZE*2))

(If you have one available, a separate partition or disk drive can be used instead of creating a file.)
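
Because dd was invoked with count=0 and a large seek value, the file is sparse: it reports an apparent size of twice the template's size but allocates blocks only as data is written. You can compare apparent and allocated sizes with:

# ls -lh /fc28.img
# du -h /fc28.img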

Next, format the image with a filesystem, mount it, and copy the netboot image into it:

# mkfs -t xfs -L NETROOT /fc28.img
# TEMP_MNT=$(mktemp -d)
# mount /fc28.img $TEMP_MNT
# cp -a /fc28/* $TEMP_MNT
# umount $TEMP_MNT

During testing using SquashFS, the client would occasionally stutter. It seems that SquashFS does not perform well when doing random I/O from a multiprocessor client. (See also The curious case of stalled squashfs reads.) If you want to improve throughput performance with filesystem compression, ZFS is probably a better option.

If you need extremely high throughput from the iSCSI server (say, for hundreds of clients), it might be possible to load balance a Ceph cluster. For more information, see Load Balancing Ceph Object Gateway Servers with HAProxy and Keepalived.

Install and Configure iSCSI

Install the scsi-target-utils package, which provides the iSCSI daemon for serving the image out to your clients:

# dnf install -y scsi-target-utils

Configure the iSCSI daemon to serve the fc28.img file:

# MY_REVERSE_HOSTNAME=$(echo $MY_HOSTNAME | tr '.' "\n" | tac | tr "\n" '.' | cut -b -${#MY_HOSTNAME})
# cat << END > /etc/tgt/conf.d/fc28.conf
<target iqn.$MY_REVERSE_HOSTNAME:fc28>
   backing-store /fc28.img
   readonly 1
</target>
END

The leading iqn. is expected by /usr/lib/dracut/modules.d/40network/net-lib.sh.

Add an exception to the firewall and enable and start the service:

# firewall-cmd --add-service=iscsi-target
# firewall-cmd --runtime-to-permanent
# systemctl enable tgtd.service
# systemctl start tgtd.service

You should now be able to see the image being shared with the tgtadm command:

# tgtadm --mode target --op show

The above command should output something similar to the following:

Target 1: iqn.edu.example.server-01:fc28
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET 00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: IET 00010001
            SCSI SN: beaf11
            Size: 10488 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: Yes
            SWP: No
            Thin-provisioning: No
            Backing store type: rdwr
            Backing store path: /fc28.img
            Backing store flags:
    Account information:
    ACL information:
        ALL

We can now remove the NFS share that we created in part one of this series:

# rm -f /etc/exports.d/fc28.exports
# exportfs -rv
# umount /export/fc28
# rmdir /export/fc28
# sed -i '/^\/fc28 /d' /etc/fstab

You can also delete the /fc28 filesystem, but you may want to keep it for performing future updates.

Update the ESP to use the iSCSI Kernel

Update the ESP to contain the iSCSI-enabled initramfs:

$ rm -vf $HOME/esp/linux/*.fc28.*
$ MY_KRNL=$(ls -c /fc28/lib/modules | head -n 1)
$ cp $(find /fc28/lib/modules -maxdepth 2 -name 'vmlinuz' | grep -m 1 $MY_KRNL) $HOME/esp/linux/vmlinuz-$MY_KRNL
$ cp $(find /fc28/boot -name 'init*' | grep -m 1 $MY_KRNL) $HOME/esp/linux/initramfs-$MY_KRNL.img

Update the boot.cfg file to pass the new root and netroot parameters:

$ MY_NAME=server-01.example.edu
$ MY_EMAN=$(echo $MY_NAME | tr '.' "\n" | tac | tr "\n" '.' | cut -b -${#MY_NAME})
$ MY_ADDR=$(host -t A $MY_NAME | awk '{print $4}')
$ sed -i "s! root=[^ ]*! root=/dev/disk/by-path/ip-$MY_ADDR:3260-iscsi-iqn.$MY_EMAN:fc28-lun-1 netroot=iscsi:$MY_ADDR::::iqn.$MY_EMAN:fc28!" $HOME/esp/linux/boot.cfg

Now you just need to copy the updated files from your $HOME/esp/linux directory out to the ESPs of all your client systems. If all goes well, your clients should boot from the iSCSI-backed image just as they did before.

Upgrading the Image

First, make a copy of the current image:

# cp -a /fc28 /fc29

Chroot into the new copy of the image:

# for i in dev dev/pts dev/shm proc sys run; do mount -o bind /$i /fc29/$i; done
# chroot /fc29 /usr/bin/bash --login

Allow updating the kernel:

# sed -i 's/^exclude=kernel-\*$/#exclude=kernel-*/' /etc/dnf/dnf.conf

Perform the upgrade:

# dnf distro-sync -y --releasever=29

Prevent the kernel from being updated:

# sed -i 's/^#exclude=kernel-\*$/exclude=kernel-*/' /etc/dnf/dnf.conf

The above command is optional, but saves you from having to copy a new kernel out to the clients if you add or update a few packages in the image at some future time.

Clean up dnf’s package cache:

# dnf clean all 

Exit the chroot and clear root’s command history:

# logout
# for i in run sys proc dev/shm dev/pts dev; do umount /fc29/$i; done
# > /fc29/root/.bash_history

Create the iSCSI image:

# FC29_SIZE=$(du -ms /fc29 | cut -f 1)
# dd if=/dev/zero of=/fc29.img bs=1MiB count=0 seek=$(($FC29_SIZE*2))
# mkfs -t xfs -L NETROOT /fc29.img
# TEMP_MNT=$(mktemp -d)
# mount /fc29.img $TEMP_MNT
# cp -a /fc29/* $TEMP_MNT
# umount $TEMP_MNT

Define a new iSCSI target that points to our new image and export it:

# MY_HOSTNAME=$(</etc/hostname)
# MY_REVERSE_HOSTNAME=$(echo $MY_HOSTNAME | tr '.' "\n" | tac | tr "\n" '.' | cut -b -${#MY_HOSTNAME})
# cat << END > /etc/tgt/conf.d/fc29.conf
<target iqn.$MY_REVERSE_HOSTNAME:fc29>
   backing-store /fc29.img
   readonly 1
</target>
END
# tgt-admin --update ALL

Add the new kernel and initramfs to the ESP:

$ MY_KRNL=$(ls -c /fc29/lib/modules | head -n 1)
$ cp $(find /fc29/lib/modules -maxdepth 2 -name 'vmlinuz' | grep -m 1 $MY_KRNL) $HOME/esp/linux/vmlinuz-$MY_KRNL
$ cp $(find /fc29/boot -name 'init*' | grep -m 1 $MY_KRNL) $HOME/esp/linux/initramfs-$MY_KRNL.img

Update the boot.cfg in the ESP:

$ MY_DNS1=192.0.2.91
$ MY_DNS2=192.0.2.92
$ MY_NAME=server-01.example.edu
$ MY_EMAN=$(echo $MY_NAME | tr '.' "\n" | tac | tr "\n" '.' | cut -b -${#MY_NAME})
$ MY_ADDR=$(host -t A $MY_NAME | awk '{print $4}')
$ cat << END > $HOME/esp/linux/boot.cfg
#!ipxe

kernel --name kernel.efi \${prefix}/vmlinuz-$MY_KRNL initrd=initrd.img ro ip=dhcp rd.peerdns=0 nameserver=$MY_DNS1 nameserver=$MY_DNS2 root=/dev/disk/by-path/ip-$MY_ADDR:3260-iscsi-iqn.$MY_EMAN:fc29-lun-1 netroot=iscsi:$MY_ADDR::::iqn.$MY_EMAN:fc29 console=tty0 console=ttyS0,115200n8 audit=0 selinux=0 quiet
initrd --name initrd.img \${prefix}/initramfs-$MY_KRNL.img
boot || exit
END

Finally, copy the files from your $HOME/esp/linux directory out to the ESPs of all your client systems and enjoy!