
Tune up your sound with PulseEffects: Speakers

Audio components for your computer don’t always produce the quality of sound you want. For instance, your laptop speakers may be a bit “tinny” sounding, or a set of speakers for your desktop may be too boomy for your room. Or if you use a desk or headset microphone, you may find that recordings you make are not as high quality as you’d like. Enter PulseEffects!

PulseAudio and Gstreamer

The PulseAudio sound server comes with Fedora Workstation by default. It’s highly flexible and easily modified. PulseAudio can deal with many different inputs and outputs. For instance, it lets you switch between different inputs known as sources (such as microphones, or sound files) or outputs known as sinks (such as speakers or headphones).

By default, PulseAudio manages sound as streams, digitally sampled at a specific rate and bit depth with a defined number of channels — two for most stereo streams. It handles different sample rates for you, so you don’t have to know the details of the stream. PulseAudio simply deals with moving sound from one point to another.

The Gstreamer multimedia framework, on the other hand, provides myriad ways to modify audio and video data on their way through a pipeline. Gstreamer comes with plugins that allow it to attach to PulseAudio. This means that you can use Gstreamer to make a pipeline between your inputs and outputs to change audio streams.

PulseEffects manages this process with a nice, graphical front end. It lets you select and order different effects for your sound.

Installing PulseEffects

To install PulseEffects, use the Software tool and type pulseeffects to find the package. Fedora carries this software in its official repositories. So if you need to, you can switch the source from the Flatpak version to the Fedora one. Then click Install.

If you’re using a command line, you can use the sudo command with dnf to do the same thing:

$ sudo dnf install pulseeffects

Start playing some audio. This can come from your Videos or Rhythmbox media player, a website such as YouTube, a music streaming app such as Spotify, or something else. The best source to use is a full-fidelity digital audio source, like a CD or a FLAC file. (MP3 and online digital streams cut out some frequency information to reduce data size.) Ideally, it should be music that you are used to listening to in many places, and know well, like an album or playlist. Put it on repeat so you can use it while you tune your sound.

Then launch the program using either the Software tool’s Launch control, or your desktop’s application launcher. On Fedora Workstation, go to the Activities hotspot, use the Show Applications control to locate PulseEffects in the list, and click to launch. You should see your application in the list of active sound streams, with color bars that show an average frequency response:

PulseEffects initial screen with one sound stream from Videos

Notice all the sound modules available on the left. None are running by default when you first start PulseEffects. Any enabled modules are applied to your sound in order from top to bottom. You can use the up/down controls next to the module names to alter the order.

What does “better” mean?

Before we get started, realize that what constitutes “better” will usually be different based on many factors:

  • Specific hardware like the model of speakers or microphone
  • The environment the sound device is in (your room)
  • What your ears prefer

There is no magic cure for bad sound that works universally everywhere. So the examples you’ll see are based on some common problems. But you will need to use your ears to determine what’s best for your hardware, in the place you’re using it.

Making desktop speakers sound better

Often desktop speaker sets consist of a subwoofer and small satellite speakers. These tend to be both excessively “boomy,” meaning too much very low frequency sound, and “honky” or “boxy,” meaning too much of some middle (or “mid”) frequencies. To fix frequencies that are over- or under-represented, we can use an equalizer.

By default, all the effects are off in PulseEffects. Since you want to modify a sound output — your speakers — make sure the “speaker” icon at the upper left is selected. Locate the Equalizer control in PulseEffects and select it. Select the toggle switch at the top of the equalizer controls to turn it on.

The default equalizer appears as a 30-band graphic EQ. Older readers might be familiar with seeing physical equipment like this. Each band alters not just that specific frequency, but a fairly narrow band of frequencies around it. Think of it like a “dip” or “bump” in the frequency graph, depending on whether you lower or raise the slider.

PulseEffects default EQ, with a roll-off of extreme low frequencies and some reduction of unpleasant “boxiness” around 450Hz

If you’re not sure how to alter frequencies to get better sound, click the “tools” icon under the on/off toggle. Under Presets you can select different EQ settings to find something closest to what you like. Then you can modify those settings as you like. If things get out of control, under Settings use the Flat response control to zero out all the EQ.

Using the “tools” icon, you can also choose a different number of bands to simplify your choices. Using the “gear” icon above each band, you can choose different types of EQ, as well as the width and Q. Additional filter types like a low/high pass or low/high shelf are also available. Feel free to play with the EQ to see how it works, but be careful not to increase EQ levels too high if your speakers are above a moderate volume, because you can damage speakers that way.

Tips from a mix engineer

These guidelines may help you find the optimal sound for your situation.

  • It’s almost always better to reduce a problem frequency than to boost other things. If you boost too much, your music can start to distort.
  • To fix excessive boominess, apply a high pass filter somewhere between 30 and 50 Hz. You may also want to try a bell EQ reduction somewhere between 40 and 100 Hz.
  • If you want to fix a boxy sound (reminds you of a cardboard box), try a bell EQ to reduce some frequencies between 300 and 500 Hz.
  • To fix a honky or nasal sound, try reducing some frequencies between 650 and 900 Hz.
  • If guitar/keyboard solos or vocals seem a bit muffled, try a gentle boost centered somewhere between 1 and 2 kHz to make them a little more present.
  • If your speakers sound overly tinny, apply a high shelf reduction starting somewhere between 4 and 8 kHz — start at a high frequency and dial back to where it’s helpful. To fix a dull sound, apply a high shelf boost using the same approach.

Remember that a little EQ goes a long way. Try keeping your bell boosts or cuts between +4 and -4 on the sliders. The goal is not to make the music sound extreme, but to make slight corrections. Otherwise your ears will get tired more quickly, or in extreme cases you may even get headaches.

Watch the Input and Output meters at the bottom of every module. If you see a lot of green on one or both, the sound module is overloading at that stage. You’ll often find MP3 files, especially of modern music, have this issue. You may also see “warning” icons flashing over the check mark on enabled modules.

One way to cure this is to use the Limiter module at the beginning of the chain, and simply turn the input gain on the chain down by about 3dB, leaving the limit at 0dB. This lowers the overall signal level without applying any actual limiting. Then you can run other modules without worrying quite as much about distortion or overload in later stages.

Making laptop speakers sound better

While the above guidelines might be good for bigger speakers, laptops have the additional burden of being very small. Typically they lack bass response, because more and/or larger speakers, and more powerful magnets, are needed to produce those frequencies well.

However, you can correct this using the Bass Enhancer module in PulseEffects. You may want to move this module downward in the stack after your EQ for best results. Rather than turning the amount up excessively, try a modest change of +3 or +4dB, and then move the Scope frequency around until you find where you start to notice good results. Don’t be tempted to amplify too much because again, if it’s too high you could start to damage your laptop speakers over time.

Storing your work

First, set PulseEffects to run whenever you log in. Use the “hamburger” tool at the top right to open up the General settings. Set Start Service at Login to enabled, and also enable the option to Process All Outputs. This does not mean all devices will get the same settings. Instead, it means that PulseEffects will run a chain for any sound output device you have connected. You can apply different chains to different devices.

Next, select the Presets button, and in the text box, type a name for your preset. One recommendation is to use the name of the device for which you’ve created a chain. Then click the “+” icon to add the preset. If you make changes, you can either use the “save” icon to save the changes to the selected preset, or click Apply to throw them away and re-apply the saved preset.

Finally, you can click the “cycle” icon if you want the preset to be applied every time the currently used sound output is detected. This is almost always a good idea. If you want to set up different presets for other outputs, first connect the output. Then make a new preset as described above, and select that to be auto-applied.

One final note: When you close the PulseEffects application, your active chain of effects does not stop. It will stay running unless you reset or stop the service. PulseEffects will consume a few percent of CPU time (depending on processor speed). On all but the oldest systems the load should not be noticeable. However, if you are sensitive to power use such as on a laptop, you may want to stop the service using this command:

$ pulseeffects -q
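
PulseEffects itself doesn’t provide a status command, but since the service is an ordinary process, a generic check with pgrep tells you whether it is still running:

$ pgrep -a pulseeffects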

Conclusion

Remember that every environment and person’s hearing is different, so beware of the overly dogmatic. Finally, you can’t make terrible speakers into great ones. But you can usually make them sound not so terrible — and if you have decent speakers, you usually can make them sound quite good!

The PulseEffects author also has both a LiberaPay donation site and a Patreon account, so if you find the software useful, you might want to consider contributing.

In the next installment, you’ll learn how to set up better sound on a desktop or headset microphone, to improve your teleconference meetings or make better audio or video spoken content. Until then, enjoy your new sound possibilities.


Photo by Paul Esch-Laurent on Unsplash.


Create a wifi hotspot with Raspberry Pi 3 and Fedora

If you’re already running Fedora on your Pi, you’re already most of the way to a wifi hotspot. A Raspberry Pi has a wifi interface that’s usually set up to join an existing wifi network. This interface can be reconfigured to provide a new wifi network. If a room has a good network cable and a bad wifi signal (a brick wall, foil-backed plasterboard, and even a window with a metal oxide coating are all obstacles), fix it with your Pi.

This article describes the procedure for setting up the hotspot. It was tested on third generation Pis – a Model B v1.2, and a Model B+ (the older 2 and the new 4 weren’t tested). These are the credit-card size Pis that have been around a few years.

This article also delves a little way into the network concepts behind the scenes. For instance, “hotspot” is the term that’s caught on in public places around the world, but it’s more accurate to use the term WLAN AP (Wireless Local Area Network Access Point). In fact, if you want to annoy your friendly neighborhood network administrator, call a hotspot a “wifi router”. The inaccuracy will make their eyes cross.

A few nmcli commands configure the Raspberry Pi as a wifi AP. The nmcli command-line tool controls the NetworkManager daemon. It’s not the only network configuration system available. More complex solutions are available for the adventurous. Check out the hostapd RPM package and the OpenWRT distro. Have a look at Internet connection sharing with NetworkManager for more ideas.

A dive into network administration

The hotspot is a routed AP (Access Point). It sits between two networks, the current wired network and its new wireless network, and takes care of the post-office-style forwarding of IP packets between them.

Routing and interfaces

The wireless interface on the Raspberry Pi is named wlan0 and the wired one is eth0. The new wireless network uses one range of IP addresses and the current wired network uses another. In this example, the current network range is 192.168.0.0/24 and the new network range is 10.42.0.0/24. If these numbers make no sense, that’s OK. You can carry on without getting to grips with IP subnets and netmasks. The Raspberry Pi’s two interfaces have IP addresses from these ranges.
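
If you want to check which range your own wired network uses, the kernel routing table shows it. This is plain iproute2, nothing hotspot-specific, and your numbers will likely differ from the 192.168.0.0/24 example:

$ ip route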

Packets are sent to local computers or remote destinations based on their IP addresses. This is routing work, and it’s where the routed part of routed AP name comes from. If you’d like to build a more complex router with DHCP and DNS, pick up some tips from the article How to use Fedora Server to create a router / gateway.

It’s not a bridged AP

Network bridging is another way of extending a network, but it’s not how this Pi is set up. This routed AP is not a bridged AP. To understand the difference between routing and bridging, you have to know a little about the networking layers of the OSI network model. A good place to start is the beginner’s guide to network troubleshooting in Linux. Here’s the short answer.

  • layer 3, network ← Yes, our routed AP is here.
  • layer 2, data link ← No, it’s not a bridged AP.
  • layer 1, physical ← Radio transmission is covered here.

A bridge works at a lower layer of the network stack – it uses ethernet MAC addresses to send data. If this was a bridged AP, it wouldn’t have two sets of IP addresses; the new wireless network and the current wired network would use the same IP subnet.

IP masquerading

You won’t find an IP address starting with 10. anywhere on the Internet. It’s a private address, not a public address. To get an IP packet routed out of the wifi network and back in again, packet addresses have to be changed. IP masquerading is a way of making this routing work. The masquerade name is used because the packets’ real addresses are hidden: the wired network never sees any addresses from the wireless network.

IP masquerading is set up automatically by NetworkManager. NetworkManager adds nftables rules to handle IP masquerading.
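
Once the hotspot is running, you can peek at what NetworkManager generated. The grep here is only a convenience to narrow the ruleset listing down to the masquerading part:

$ sudo nft list ruleset | grep -B3 masquerade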

The Pi’s network stack

A stack of network hardware and software makes wifi work.

  • Network hardware
  • Kernel space software
  • User space software

You can see the network hardware. The Raspberry Pi has two main hardware components – a tiny antenna and a Broadcom wifi chip. MagPi magazine has some great photos.

Kernel software provides the plumbing. There’s no need to work on these directly – it’s all good to go in the Fedora distribution.

  • Broadcom driver modules talk to the hardware. List these with the command lsmod | grep brcm.
  • A TCP/IP stack handles protocols.
  • The netfilter framework filters packets.
  • A network system ties these all together.

User space software customizes the system. It’s full of utilities that either help the user, talk to the kernel, or connect other utilities together. For instance, the firewall-cmd tool talks to the firewalld service, firewalld talks to the nftables tool, and nftables talks to the netfilter framework in the kernel. The nmcli commands talk to NetworkManager. And NetworkManager talks to pretty much everything.

Create the AP

That’s enough theory — let’s get practical. Fire up your Raspberry Pi running Fedora and run these commands.

Install software

Nearly all the required software is included with the Fedora Minimal image. The only thing missing is the dnsmasq package. This handles the DHCP and IP address part of the new wifi network, automatically. Run this command using sudo:

$ sudo dnf install dnsmasq

Create a new NetworkManager connection

NetworkManager sets up one network connection automatically, Wired connection 1. Use the nmcli tool to tell NetworkManager how to add a wifi connection. NetworkManager saves these settings, and a bunch more, in a new config file.

The new configuration file is created in the directory /etc/sysconfig/network-scripts/. At first, it’s empty; the image has no configuration files for network interfaces. If you want to find out more about how NetworkManager uses the network-scripts directory, the gory details are in the nm-settings-ifcfg-rh man page.

[nick@raspi ~]$ ls /etc/sysconfig/network-scripts/
[nick@raspi ~]$

The first nmcli command, to create a network connection, looks like this. There’s more to do — the Pi won’t work as a hotspot after running this.

nmcli con add \
    type wifi \
    ifname wlan0 \
    con-name 'raspi hotspot' \
    autoconnect yes \
    ssid 'raspi wifi'

The following commands complete several more steps:

  • Create a new connection.
  • List the connections.
  • Take another look at the network-scripts folder. NetworkManager added a config file.
  • List available APs to connect to.

This requires running several commands as root using sudo:

$ sudo nmcli con add type wifi ifname wlan0 con-name 'raspi hotspot' autoconnect yes ssid 'raspi wifi'
Connection 'raspi hotspot' (13ea67a7-a8e6-480c-8a46-3171d9f96554) successfully added.
$ sudo nmcli connection show
NAME                UUID                                  TYPE      DEVICE
Wired connection 1  59b7f1b5-04e1-3ad8-bde8-386a97e5195d  ethernet  eth0
raspi hotspot       13ea67a7-a8e6-480c-8a46-3171d9f96554  wifi      wlan0
$ ls /etc/sysconfig/network-scripts/
ifcfg-raspi_hotspot
$ sudo nmcli device wifi list
IN-USE  BSSID              SSID          MODE   CHAN  RATE        SIGNAL  BARS  SECURITY
        01:0B:03:04:C6:50  APrivateAP    Infra  6     195 Mbit/s  52      ▂▄__  WPA2
        02:B3:54:05:C8:51  SomePublicAP  Infra  6     195 Mbit/s  52      ▂▄__  --

You can remove the new config and start again with this command:

$ sudo nmcli con delete 'raspi hotspot'

Change the connection mode

A NetworkManager connection has many configuration settings. You can see these with the command nmcli con show ‘raspi hotspot’. Some of these settings start with the label 802-11-wireless. This is to do with industry standards that make wifi work – the IEEE organization specified many protocols for wifi, named 802.11. This new wifi connection is in infrastructure mode, ready to connect to a wifi access point. The Pi isn’t supposed to connect to another AP; it’s supposed to be the AP that others connect to.
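
For example, you can narrow the output down to just the wireless settings mentioned here; the grep pattern is only for readability:

$ nmcli con show 'raspi hotspot' | grep 802-11-wireless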

This command changes the mode from infrastructure to AP. It also sets a few other wireless properties. The bg value tells NetworkManager to follow two old IEEE standards – 802.11b and 802.11g. Basically it configures the radio to use the 2.4GHz frequency band, not the 5GHz band. ipv4.method shared means this connection will be shared with others.

  • Change the connection to a hotspot by changing the mode to ap.
sudo nmcli connection \
    modify "raspi hotspot" \
    802-11-wireless.mode ap \
    802-11-wireless.band bg \
    ipv4.method shared

The connection starts automatically. NetworkManager gives the wlan0 interface the IP address 10.42.0.1, and the dnsmasq application hands out addresses from the same range to wifi clients. The manual commands to start and stop the hotspot are:

$ sudo nmcli con up "raspi hotspot"
$ sudo nmcli con down "raspi hotspot"
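
Once the connection is up, you can confirm that the interface really received the shared address:

$ ip -4 addr show wlan0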

Connect a device

The next steps are to:

  • Watch the log.
  • Connect a smartphone.
  • When you’ve seen enough, type ^C ([control][c]) to stop watching the log.

$ journalctl --follow
-- Logs begin at Wed 2020-04-01 18:23:45 BST. --
...

Use a wifi-enabled device, like your phone. The phone can find the new raspi wifi network.

Messages about an associating client appear in the activity log:

Jun 10 18:08:05 raspi wpa_supplicant[662]: wlan0: AP-STA-CONNECTED 94:b0:1f:2e:d2:bd
Jun 10 18:08:05 raspi wpa_supplicant[662]: wlan0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
Jun 10 18:08:05 raspi dnsmasq-dhcp[713]: DHCPREQUEST(wlan0) 10.42.0.125 94:b0:1f:2e:d2:bd
Jun 10 18:08:05 raspi dnsmasq-dhcp[713]: DHCPACK(wlan0) 10.42.0.125 94:b0:1f:2e:d2:bd nick

Examine the firewall

A new security zone named nm-shared has appeared. This is stopping some wifi access.

$ sudo firewall-cmd --get-active-zones
[sudo] password for nick:
nm-shared
  interfaces: wlan0
public
  interfaces: eth0

The new zone is set up to accept everything because the target is ACCEPT. Clients are able to use web, mail and SSH to get to the Internet.

$ sudo firewall-cmd --zone=nm-shared --list-all
nm-shared (active)
  target: ACCEPT
  icmp-block-inversion: no
  interfaces: wlan0
  sources:
  services: dhcp dns ssh
  ports:
  protocols: icmp ipv6-icmp
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
        rule priority="32767" reject

This big list of config settings takes a little examination.

The first line, the innocent-until-proven-guilty option target: ACCEPT, says all traffic is allowed through unless a rule says otherwise. It’s the same as saying these types of traffic are all OK.

  • inbound packets – requests sent from wifi clients to the Raspberry Pi
  • forwarded packets – requests from wifi clients to the Internet
  • outbound packets – requests sent by the Pi to wifi clients

However, there’s a hidden gotcha: requests from wifi clients (like your workstation) to the Raspberry Pi may be rejected. The final line — the mysterious rule in the rich rules section — refers to the routing policy database. The rule stops you from connecting from your workstation to your Pi with a command like this: ssh 10.42.0.1. This rule only affects traffic sent to the Raspberry Pi, not traffic sent to the Internet, so browsing the web works fine.

If an inbound packet matches something in the services and protocols lists, it’s allowed through. NetworkManager automatically adds ICMP, DHCP and DNS (Internet infrastructure services and protocols). An SSH packet doesn’t match, gets as far as the post-processing stage, and is rejected — priority="32767" translates as “do this after all the other processing is done.”

If you want to know what’s happening behind the scenes, that rich rule creates an nftables rule. The nftables rule looks like this.

$ sudo nft list chain inet firewalld filter_IN_nm-shared_post
table inet firewalld {
    chain filter_IN_nm-shared_post {
        reject
    }
}

Fix SSH login

Connect from your workstation to the Raspberry Pi using SSH. This won’t work because of the rich rule. A protocol that’s not on the list gets instantly rejected.

Check that SSH is blocked:

$ ssh 10.42.0.1
ssh: connect to host 10.42.0.1 port 22: Connection refused

Next, add SSH to the list of allowed services. If you don’t remember what services are defined, list them all with firewall-cmd --get-services. For SSH, use option --add-service ssh (or --remove-service ssh to undo it). Don’t forget to make the change permanent. Note that a change made with --permanent only affects the saved configuration; run firewall-cmd --reload afterwards (or repeat the command without --permanent) so it also applies to the running firewall.

$ sudo firewall-cmd --add-service ssh --permanent --zone=nm-shared
success

Now test with SSH again.

$ ssh 10.42.0.1
The authenticity of host '10.42.0.1 (10.42.0.1)' can't be established.
ECDSA key fingerprint is SHA256:dDdgJpDSMNKR5h0cnpiegyFGAwGD24Dgjg82/NUC3Bc.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '10.42.0.1' (ECDSA) to the list of known hosts.
Last login: Tue Jun 9 18:58:36 2020 from 10.0.1.35
nick@10.42.0.1's password:

SSH access is no longer blocked.

Test as a headless computer

The Raspberry Pi runs fine as a headless computer. From here on, you can use SSH to work on your Pi.

  • Power off.
  • Remove keyboard and video monitor.
  • Power on.
  • Wait a couple minutes.
  • Connect from your workstation to the Raspberry Pi using SSH. Use either the wired interface or the wireless one; both work.

Increase security with WPA-PSK

The WPA-PSK (Wifi Protected Access with Pre-Shared Key) system is designed for home users and small offices. It is password protected. Use nmcli again to add WPA-PSK:

$ sudo nmcli con modify "raspi hotspot" wifi-sec.key-mgmt wpa-psk
$ sudo nmcli con modify "raspi hotspot" wifi-sec.psk "hotspot-password"
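
Settings changed with nmcli con modify only take effect the next time the connection is activated, so bring the hotspot down and up again as shown earlier. To double-check the result, ask nmcli to show the profile including secrets; the -s flag makes the PSK visible in clear text, and the grep just trims the output:

$ sudo nmcli -s con show "raspi hotspot" | grep 802-11-wireless-security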

Troubleshooting

The bad news is, there are no troubleshooting tips here. There are so many things that can go wrong that there’s no way to cover them all.

Troubleshooting a network stack is tricky. If one component goes wrong, it may all go wrong. And making changes like reloading firewall rules can upset services like NetworkManager and sshd. You know you’re in the weeds when you find yourself running nftables commands like nft list ruleset and firewalld commands like firewall-cmd --set-log-denied=all.

Play with your new platform

Add value to your new AP. Since you’re running a Pi, there are many hardware add-ons. Since it’s running Fedora, you have thousands of packages available. Try turning it into a mini-NAS, or adding battery back-up, or perhaps a music player.


Photo by Uriel SC on Unsplash.


TCP window scaling, timestamps and SACK

The Linux TCP stack has a myriad of sysctl knobs that allow you to change its behavior. This includes the amount of memory that can be used for receive or transmit operations, the maximum number of sockets, and optional features and protocol extensions.

Multiple articles recommend disabling TCP extensions, such as timestamps or selective acknowledgments (SACK), for various “performance tuning” or “security” reasons.

This article provides background on what these extensions do, why they are enabled by default, how they relate to one another and why it is normally a bad idea to turn them off.
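
All three extensions discussed below are controlled by sysctl knobs, and on a stock Fedora kernel they all default to enabled. You can verify this on your own system:

$ sysctl net.ipv4.tcp_window_scaling net.ipv4.tcp_timestamps net.ipv4.tcp_sack
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1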

TCP Window scaling

The data transmission rate that TCP can sustain is limited by several factors. Some of these are:

  • Round trip time (RTT).  This is the time it takes for a packet to get to the destination and a reply to come back. Lower is better.
  • lowest link speed of the network paths involved
  • frequency of packet loss
  • the speed at which new data can be made available for transmission
    For example, the CPU needs to be able to pass data to the network adapter fast enough. If the CPU needs to encrypt the data first, the adapter might have to wait for new data. In similar fashion disk storage can be a bottleneck if it can’t read the data fast enough.
  • The maximum possible size of the TCP receive window. The receive window determines how much data (in bytes) TCP can transmit before it has to wait for the receiver to report reception of that data. This is announced by the receiver. The receiver will constantly update this value as it reads and acknowledges reception of the incoming data. The receive window’s current value is contained in the TCP header that is part of every segment sent by TCP. The sender is thus aware of the current receive window whenever it receives an acknowledgment from the peer. This means that the higher the round-trip time, the longer it takes for the sender to get receive window updates.

Without window scaling, TCP is limited to at most 64 kilobytes of unacknowledged (in-flight) data. This is not even close to what is needed to sustain a decent data rate in most networking scenarios. Let us look at some examples.

Theoretical data rate

With a round-trip-time of 100 milliseconds, TCP can transfer at most 640 kilobytes per second. With a 1 second delay, the maximum theoretical data rate drops down to only 64 kilobytes per second.
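
The arithmetic behind these figures is simply the maximum amount of in-flight data divided by the time it takes for an acknowledgment to come back:

maximum throughput = receive window / round-trip time
                   = 64 KB / 0.1 s = 640 KB/s   (100 ms RTT)
                   = 64 KB / 1.0 s =  64 KB/s   (1 s RTT)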

This is because of the receive window. Once 64kbyte of data have been sent the receive window is already full.  The sender must wait until the peer informs it that at least some of the data has been read by the application. 

The first segment sent reduces the TCP window by the size of that segment. It takes one round-trip before an update of the receive window value will become available. When updates arrive with a 1 second delay, this results in a 64 kilobyte limit even if the link has plenty of bandwidth available.

In order to fully utilize a fast network with several milliseconds of delay, a window size larger than what classic TCP supports is a must. The ’64 kilobyte limit’ is an artifact of the protocol’s specification: the TCP header reserves only 16 bits for the receive window size. This allows receive windows of up to 64 KByte. When the TCP protocol was originally designed, this size was not seen as a limit.

Unfortunately, it’s not possible to just change the TCP header to support a larger maximum window value. Doing so would mean all implementations of TCP would have to be updated simultaneously or they wouldn’t understand one another anymore. To solve this, the interpretation of the receive window value is changed instead.

The ‘window scaling option’ makes this possible while keeping compatibility with existing implementations.

TCP Options: Backwards-compatible protocol extensions

TCP supports optional extensions. This makes it possible to enhance the protocol with new features without the need to update all implementations at once. When a TCP initiator connects to the peer, it also sends a list of supported extensions. All extensions follow the same format: a unique option number followed by the length of the option and the option data itself.

The TCP responder checks all the option numbers contained in the connection request. If it does not understand an option number, it skips ‘length’ bytes of data and checks the next option number. The responder omits those it did not understand from the reply. This allows both the sender and receiver to learn the common set of supported options.

With window scaling, the option data always consists of a single number.

The window scaling option

 
Window Scale option (WSopt): Kind: 3, Length: 3
    +---------+---------+---------+
    | Kind=3  |Length=3 |shift.cnt|
    +---------+---------+---------+
         1         1         1

The window scaling option tells the peer that the receive window value found in the TCP header should be scaled by the given number to get the real size.

For example, a TCP initiator that announces a window scaling factor of 7 tries to instruct the responder that any future packets that carry a receive window value of 512 really announce a window of 65536 bytes. This is an increase by a factor of 128 (2^7). This would allow a maximum TCP window of 8 Megabytes.

A TCP responder that does not understand this option ignores it. The TCP packet sent in reply to the connection request (the syn-ack) then does not contain the window scale option. In this case both sides can only use a 64k window size. Fortunately, almost every TCP stack supports and enables this option by default, including Linux.

The responder includes its own desired scaling factor. Both peers can use a different number. It’s also legitimate to announce a scaling factor of 0. This means the peer should treat the receive window value it receives verbatim, but it allows scaled values in the reply direction — the recipient can then use a larger receive window.

Unlike SACK or TCP timestamps, the window scaling option only appears in the first two packets of a TCP connection; it cannot be changed afterwards. It is also not possible to determine the scaling factor from a packet capture of a connection that does not contain the initial three-way handshake.

The largest supported scaling factor is 14, which allows TCP window sizes of up to one gigabyte (64 KB × 2^14 = 2^30 bytes).

Window scaling downsides

Window scaling can cause data corruption in very special cases. Before you disable the option, know that this corruption is impossible under normal circumstances, and there is also a solution in place that prevents it in the abnormal ones. Unfortunately, some people disable that solution without realizing its relationship with window scaling. First, let’s have a look at the actual problem that needs to be addressed. Imagine the following sequence of events:

  1. The sender transmits segments: s_1, s_2, s_3, … s_n
  2. The receiver sees: s_1, s_3, … s_n and sends an acknowledgment for s_1.
  3. The sender considers s_2 lost and sends it a second time. It also sends new data contained in segment s_n+1.
  4. The receiver then sees: s_2, s_n+1, s_2: the packet s_2 is received twice.

This can happen for example when a sender triggers re-transmission too early. Such erroneous re-transmits are never a problem in normal cases, even with window scaling. The receiver will just discard the duplicate.

Old data to new data

The TCP sequence number counts bytes, and it can be at most 4 gigabytes: if it becomes larger than this, the sequence wraps back to 0 and then increases again. This is not a problem in itself, but if this occurs fast enough, the above scenario can create an ambiguity.

If a wrap-around occurs at the right moment, the sequence number s_2 (the re-transmitted packet) can already be larger than s_n+1. Thus, in the last step (4), the receiver may interpret this as: s_2, s_n+1, s_n+m, i.e. it could view the ‘old’ packet s_2 as containing new data.

Normally, this won’t happen, because a ‘wrap around’ occurs only every couple of seconds or minutes even on high-bandwidth links. The interval between the original and an unneeded re-transmit will be a lot smaller.

For example, with a transmit speed of 50 megabytes per second, a duplicate needs to arrive more than one minute late for this to become a problem. The sequence numbers do not wrap fast enough for small delays to induce this problem.

Once TCP approaches ‘gigabyte per second’ throughput rates, the sequence numbers can wrap so fast that even a delay of only a few milliseconds can create duplicates that TCP cannot detect anymore. By solving the problem of the too-small receive window, TCP can now be used for network speeds that were impossible before – and that creates a new, albeit rare, problem. To safely use gigabytes-per-second speeds in environments with very low RTT, receivers must be able to detect such old duplicates without relying on the sequence number alone.

TCP time stamps

A best-before date

In the simplest terms, TCP timestamps just add a time stamp to the packets to resolve the ambiguity caused by very fast sequence number wrap-around. If a segment appears to contain new data, but its timestamp is older than the last in-window packet, then the sequence number has wrapped and the “new” packet is actually an older duplicate. This resolves the ambiguity of re-transmits even for extreme corner cases.

But this extension allows for more than just detection of old packets. The other major feature made possible by TCP timestamps is more precise round-trip time measurement (RTTm).

A need for precise round-trip-time estimation

When both peers support timestamps,  every TCP segment carries two additional numbers: a timestamp value and a timestamp echo.

 
TCP Timestamp option (TSopt): Kind: 8, Length: 10
+-------+----+----------------+-----------------+
|Kind=8 | 10 |TS Value (TSval)|EchoReply (TSecr)|
+-------+----+----------------+-----------------+
    1      1         4                4

An accurate RTT estimate is crucial for TCP performance. TCP automatically re-sends data that was not acknowledged. Re-transmission is triggered by a timer: If it expires, TCP considers one or more packets that it has not yet received an acknowledgment for to be lost. They are then sent again.

But “has not been acknowledged” does not mean the segment was lost. It is also possible that the receiver did not send an acknowledgment so far or that the acknowledgment is still in flight. This creates a dilemma: TCP must wait long enough for such slight delays to not matter, but it can’t wait for too long either.

Low versus high network delay

In networks with a high delay, if the timer fires too fast, TCP frequently wastes time and bandwidth with unneeded re-sends.

In networks with a low delay however, waiting for too long causes reduced throughput when a real packet loss occurs. Therefore, the timer should expire sooner in low-delay networks than in those with a high delay. The TCP retransmit timeout therefore cannot be a fixed constant value. It needs to adapt based on the delay the connection experiences in the network.

Round-trip time measurement

TCP picks a retransmit timeout that is based on the expected round-trip time (RTT). The RTT is not known in advance. RTT is estimated by measuring the delta between the time a segment is sent and the time TCP receives an acknowledgment for the data carried by that segment.

This is complicated by several factors.

  • For performance reasons, TCP does not generate a new acknowledgment for every packet it receives. It waits for a very small amount of time: if more segments arrive, their reception can be acknowledged with a single ACK packet. This is called “cumulative ACK”.
  • The round-trip time is not constant. This is because of a myriad of factors. For example, a client might be a mobile phone switching to different base stations as it’s moved around. It’s also possible that packet switching takes longer when link or CPU utilization increases.
  • A packet that had to be re-sent must be ignored during the computation. This is because the sender cannot tell if the ACK for the re-transmitted segment is acknowledging the original transmission (that arrived after all) or the re-transmission.

This last point is significant: when TCP is busy recovering from a loss, it may only receive ACKs for re-transmitted segments. It then can’t measure (update) the RTT during this recovery phase. As a consequence, it can’t adjust the re-transmission timeout, which then keeps growing exponentially. That’s a pretty specific case (it assumes that other mechanisms such as fast retransmit or SACK did not help). Nevertheless, with TCP timestamps, RTT evaluation is done even in this case.

If the extension is used, the peer reads the timestamp value from the TCP segment’s extension space and stores it locally. It then places this value in all the segments it sends back as the “timestamp echo”.

Therefore the option carries two timestamps: the sender’s own timestamp and the most recent timestamp it received from the peer. The “echo timestamp” is used by the original sender to compute the RTT: it’s the delta between its current timestamp clock and what was reflected in the “timestamp echo”.

Other timestamp uses

TCP timestamps even have uses beyond PAWS (Protection Against Wrapped Sequences, the duplicate detection described above) and RTT measurements. For example, it becomes possible to detect if a retransmission was unnecessary. If the acknowledgment carries an older timestamp echo, the acknowledgment was for the initial packet, not the re-transmitted one.

Another, more obscure use case for TCP timestamps is related to the TCP syn cookie feature.

TCP connection establishment on server side

When connection requests arrive faster than a server application can accept the new incoming connection, the connection backlog will eventually reach its limit. This can occur because of a mis-configuration of the system or a bug in the application. It also happens when one or more clients send connection requests without reacting to the ‘syn ack’ response. This fills the connection queue with incomplete connections. It takes several seconds for these entries to time out. This is called a “syn flood attack”.

TCP timestamps and TCP syn cookies

Some TCP stacks make it possible to accept new connections even if the queue is full. When this happens, the Linux kernel prints a prominent message to the system log:

Possible SYN flooding on port P. Sending Cookies. Check SNMP counters.

This mechanism bypasses the connection queue entirely. The information that is normally stored in the connection queue is encoded into the SYN/ACK response’s TCP sequence number. When the ACK comes back, the queue entry can be rebuilt from the sequence number.

The sequence number only has limited space to store information. Connections established using the ‘TCP syn cookie’ mechanism cannot support TCP options for this reason.

The TCP options that are common to both peers can be stored in the timestamp, however. The ACK packet reflects the value back in the timestamp echo field, which makes it possible to recover the agreed-upon TCP options as well. Otherwise, cookie connections are restricted to the standard 64 kbyte receive window.

Common myths – timestamps are bad for performance

Unfortunately some guides recommend disabling TCP timestamps to reduce the number of times the kernel needs to access the timestamp clock to get the current time. This is not correct. As explained before, RTT estimation is a necessary part of TCP. For this reason, the kernel always takes a microsecond-resolution time stamp when a packet is received/sent.

Linux re-uses the clock timestamp taken for the RTT estimation for the remainder of the packet processing step. This also avoids the extra clock access to add a timestamp to an outgoing TCP packet.

The entire timestamp option only requires 10 bytes of TCP option space in each packet; this is not a significant decrease in the space available for packet payload.

Common myths – timestamps are a security problem

Some security audit tools and (older) blog posts recommend disabling TCP timestamps because they allegedly leak system uptime: this would then make it possible to estimate the patch level of the system/kernel. This was true in the past: the timestamp clock is based on a constantly increasing value that starts at a fixed value on each system boot. A timestamp value would give an estimate as to how long the machine has been running (uptime).

As of Linux 4.12 TCP timestamps do not reveal the uptime anymore. All timestamp values sent use a peer-specific offset. Timestamp values also wrap every 49 days.

In other words, connections from or to address “A” see a different timestamp than connections to the remote address “B”.

Run sysctl net.ipv4.tcp_timestamps=2 (as root) to disable the randomization offset. This makes analyzing packet traces recorded by tools like Wireshark or tcpdump easier – packets sent from the host then all have the same clock base in their TCP option timestamp. For normal operation the default setting should be left as-is.
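
To make a setting like that survive a reboot, drop it into a sysctl configuration file; the file name below is arbitrary:

# /etc/sysctl.d/90-tcp-timestamps.conf (example file name)
net.ipv4.tcp_timestamps = 2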

Selective Acknowledgments

TCP has problems if several packets in the same window of data are lost. This is because TCP acknowledgments are cumulative, but only for packets that arrived in-sequence. Example:

  • Sender transmits segments s_1, s_2, s_3, … s_n
  • Sender receives ACK for s_2
  • This means that both s_1 and s_2 were received and the sender no longer needs to keep these segments around.
  • Should s_3 be re-transmitted? What about s_4? s_n?

The sender waits for a “retransmission timeout” or ‘duplicate ACKs’ for s_2 to arrive. If a retransmit timeout occurs or several duplicate ACKs for s_2 arrive, the sender transmits s_3 again.

If the sender receives an acknowledgment for s_n, s_3 was the only missing packet. This is the ideal case. Only the single lost packet was re-sent.

If the sender receives an acknowledgment for a segment smaller than s_n, for example s_4, that means that more than one packet was lost. The sender needs to re-transmit the next segment as well.

Re-transmit strategies

It’s possible to just repeat the same sequence: re-send the next packet until the receiver indicates it has processed all packets up to s_n. The problem with this approach is that it requires one RTT until the sender knows which packet it has to re-send next. While such a strategy avoids unnecessary re-transmissions, it can take several seconds or more until TCP has re-sent the entire window of data.

The alternative is to re-send several packets at once. This approach allows TCP to recover more quickly when several packets have been lost. In the above example TCP re-sends s_3, s_4, s_5, …, while it can only be sure that s_3 has been lost.

From a latency point of view, neither strategy is optimal. The first strategy is fast if only a single packet has to be re-sent, but takes too long when multiple packets were lost.

The second one is fast even if multiple packets have to be re-sent, but at the cost of wasting bandwidth. In addition, such a TCP sender could have transmitted new data already while it was doing the unneeded re-transmissions.

With the available information TCP cannot know which packets were lost. This is where TCP Selective Acknowledgments (SACK) come in. Just like window scaling and timestamps, it is another optional, yet very useful TCP feature.

The SACK option

 
   TCP Sack-Permitted Option: Kind: 4, Length 2
   +---------+---------+
   | Kind=4  | Length=2|
   +---------+---------+

A sender that supports this extension includes the “Sack Permitted” option in the connection request. If both endpoints support the extension, then a peer that detects a packet is missing in the data stream can inform the sender about this.

 
   TCP SACK Option: Kind: 5, Length: Variable
                     +--------+--------+
                     | Kind=5 | Length |
   +--------+--------+--------+--------+
   |      Left Edge of 1st Block       |
   +--------+--------+--------+--------+
   |      Right Edge of 1st Block      |
   +--------+--------+--------+--------+
   |                                   |
   /            . . .                  /
   |                                   |
   +--------+--------+--------+--------+
   |      Left Edge of nth Block       |
   +--------+--------+--------+--------+
   |      Right Edge of nth Block      |
   +--------+--------+--------+--------+

A receiver that encounters s_2 followed by s_5…s_n will include a SACK block when it sends the acknowledgment for s_2:

 
                  +--------+--------+
                  | Kind=5 |   10   |
+--------+--------+--------+--------+
|           Left edge: s_5          |
+--------+--------+--------+--------+
|           Right edge: s_n         |
+--------+--------+--------+--------+

This tells the sender that segments up to s_2 arrived in-sequence, and it also lets the sender know that the segments s_5 to s_n were received. The sender can then re-transmit the two missing segments, s_3 and s_4, and proceed to send new data.

The mythical lossless network

In theory, SACK provides no advantage if the connection cannot experience packet loss, or if the connection has such a low latency that even waiting one full RTT does not matter.

In practice lossless behavior is virtually impossible to ensure. Even if the network and all its switches and routers have ample bandwidth and buffer space, packets can still be lost:

  • The host operating system might be under memory pressure and drop packets. Remember that a host might be handling tens of thousands of packet streams simultaneously.
  • The CPU might not be able to drain incoming packets from the network interface fast enough. This causes packet drops in the network adapter itself.
  • If TCP timestamps are not available, even a connection with a very small RTT can stall momentarily during loss recovery.

Use of SACK does not increase the size of TCP packets unless a connection experiences packet loss. Because of this, there is hardly a reason to disable this feature. Almost all TCP stacks support SACK – it is typically only absent on low-power IoT-like devices that are not doing TCP bulk data transfers.

When a Linux system accepts a connection from such a device, TCP automatically disables SACK for the affected connection.

Summary

The three TCP extensions examined in this post are all related to TCP performance and should best be left at their default setting: enabled.

The TCP handshake ensures that only extensions that are understood by both parties are used, so there is never a need to disable an extension globally just because a peer might not support it.

Turning these extensions off results in severe performance penalties, especially in case of TCP Window Scaling and SACK. TCP timestamps can be disabled without an immediate disadvantage, however there is no compelling reason to do so anymore. Keeping them enabled also makes it possible to support TCP options even when SYN cookies come into effect.


install Fedora on a Raspberry Pi 3

Fire up a Raspberry Pi with Fedora.

The Raspberry Pi Foundation has produced quite a few models over the years. This procedure was tested on third generation Pis – a Model B v1.2, and a Model B+ (the older 2 and the new 4 weren’t tested). These are the credit-card size Pis that have been around a few years.

get hardware

You do need a few hardware items, including the Pi. You don’t need any HaT (Hardware Attached on Top) boards or USB antennas. If you have used your Pi in the past, you probably have all these items.

  • current network. Perhaps this is your home lab.
  • ethernet cable. This connects the current network to the Raspberry Pi.
  • Raspberry Pi 3, model B or B+.
  • power supply.
  • micro-SD card, 8GB or larger.
  • keyboard and video monitor.

The keyboard and video monitor together make up the local console. It’s possible – though complicated – to get by without a console, such as setting up an automated install then connecting over the network. A local console makes it easy to answer the configuration questions during Fedora’s first boot. Also, a mistake during AP configuration may break the network, locking out remote users.

download Fedora Minimal

The Fedora Minimal image, one of Fedora’s alt downloads, has all the core packages and network packages required (well, nearly – check out dnsmasq below). The image contains a ready-made file system, with over 400 packages already installed. This minimal image does not include popular packages like a development environment, Internet service or desktop. These types of software aren’t required for this work, and may well use too much memory if you install them.

The Fedora Minimal raw image fits on a small SD card and runs in less than 1 GB of memory (these old Pis have 1GB RAM).

The name of the downloaded file is something like Fedora-Minimal-32-1.6.aarch64.raw.xz. The file is compressed and is about 700MB in size. When the file is uncompressed, it’s 5GB. It’s an ext4 file system that’s mostly empty – about 1GB is used and 4GB is empty. All that empty space is the reason the compressed download is so much smaller than the uncompressed raw image.

copy to the micro-SD card

  • Copy the image to a micro-SD card.

This can be more complex than it sounds, and a painful experience. Finding a good micro-SD card takes work. Then there’s the challenge of physically attaching the card to your computer. Perhaps your laptop has a full SD card slot and you need a card adapter, or perhaps you need a USB adapter. Then, when it comes to copying, the OS may either help or get in your way. You may have luck with Fedora Media Writer, or with these Linux commands.

unxz ./Fedora-Minimal-32-1.6.aarch64.raw.xz
sudo dd if=./Fedora-Minimal-32-1.6.aarch64.raw of=/dev/mmcblk0 bs=8M status=progress oflag=direct
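
If you’re not sure the card really is /dev/mmcblk0 on your machine (the device name depends on the card reader, and may be something like /dev/sdb with a USB adapter), compare the output of lsblk before and after inserting the card. dd will silently overwrite whatever device you point it at:

$ lsblk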

set up Fedora

  • Connect the Pi, power cable, network cable and micro-SD card.
  • Hit the power.
  • See the colored box as the graphics chip powers up.
  • Wait for the anaconda installer to start.
  • Answer anaconda’s setup questions.

Initial configuration of the OS takes a few minutes – a couple minutes waiting for boot-up, and a couple to fill out the spokes of anaconda’s text-based installer. In the examples below, the user is named nick and is an administrator (a member of the wheel group).

Congratulations! Your Fedora Pi is up and operational.

update software

  • Update packages with `dnf update`.
  • Reboot the machine with `systemctl reboot`.

Over the years, a lot of people have put a lot of work into making the Raspberry Pi devices work. Use the latest software to make sure you get the benefit of their hard work. If you skip this step, you may find some things just don’t work.

The update downloads and installs about a hundred packages. Since the storage is a micro-SD card, writing new software is a slow process. This is what using computing storage felt like in the 1990s.

things to play with

There are a few other things that can be set up at this point, if you want to play around. It’s all optional. Try things like this.

  • Replace the localhost hostname with the command `sudo hostnamectl set-hostname raspi`.
  • Find the IP address with `ip addr`.
  • Try an SSH login, or even set up key-based login with `ssh-copy-id`.
  • Power down with `systemctl poweroff`.

Backup and restore Toolboxes

Toolboxes started life often described as disposable containers – and that is still one of their major uses: install stuff, try it out in the relative safety of a container, and lastly, cleanly dispose of it. Minimal risk and fuss, and no pesky residual libraries and applications hanging around on the host long after you have finished.

So why would you back up a Toolbox? Sometimes they have more permanent uses, contain complex and lengthy installs, or are used for critical applications. For example, Toolboxes can serve as a development environment, containing hardware-associated drivers and applications. Or they could be used for an application you want to run in a container for which there is no Flatpak, or one that has requirements a Flatpak doesn’t satisfy. While they can be handy on Fedora Workstation, toolbox containers are often essential for Silverblue users, since they offer an easy solution for installing applications that can’t successfully be installed by rpm-ostree, or that don’t have a Flatpak version readily available. In these situations a busted Toolbox can be a major headache. But if a backup exists, you can quickly restore a Toolbox or move it to another workstation.

The backup process uses Podman to create an image of an existing toolbox container, and save that image to an archive file. To restore the toolbox container, load the image from the archive file and then create a Toolbox from that image. The new toolbox container will be an identical copy of your backed up toolbox container.

It is important to note this process does not back up data, just what you have installed in the toolbox container. This includes packages installed from repositories, or from a local rpm file using dnf. If you need to back up data as well, Podman’s commit command, which will be used below to capture an image of the toolbox container, has an option to include volumes attached to the container.
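
As a sketch, a commit that also captures attached volumes could look like the following; it uses the same container and image names as the example below, and the --include-volumes option is documented in podman-commit(1):

$ podman container commit --include-volumes -p go go-backup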

Creating a backup

To back up a toolbox container you will need its name and container ID, which you can get with toolbox list. For this example I am going to back up my golang development toolbox container, imaginatively named go.

$ toolbox list
CONTAINER ID  CONTAINER NAME  CREATED      STATUS  IMAGE NAME
00ff783a102f  go              5 weeks ago  exited  registry.fedoraproject.org/f32/fedora-toolbox:32

If the container’s status shows as running, you should stop it using podman container stop container_name. Although the commit command has a -p (pause) option, making sure the Toolbox is not running helps it initialize correctly when restored from backup.

$ podman container stop go

To create an image of the toolbox container use

podman container commit -p container_ID backup-image-name

Depending on the complexity of the Toolbox, this can take a little while.

 $ podman container commit -p 00ff783a102f go-backup

Now to confirm the image has been created type…

$ toolbox list

You should get output similar to what is below…

IMAGE ID      IMAGE NAME                  CREATED
cfcb13046db7  localhost/go-backup:latest  About a minute ago

CONTAINER ID  CONTAINER NAME  CREATED      STATUS  IMAGE NAME
00ff783a102f  go              5 weeks ago  exited  registry.fedoraproject.org/f32/fedora-toolbox:32

Now save the backup image to a tar archive file using podman save -o backup-filename.tar backup-image-name.

$ podman save -o go.tar go-backup

Confirm the archive file, our toolbox container backup, was created.

$ ls go.tar 

Do some tidying up: remove the backup image and, if needed, remove the original Toolbox.

$ podman rmi go-backup
$ toolbox rm go

Restore a backup

To create an image from the backup file that was made above, use the command podman load -i backup_filename.

$ podman load -i go.tar

Then you can confirm the image was created with…

$ toolbox list
IMAGE ID      IMAGE NAME                  CREATED
cfcb13046db7  localhost/go-backup:latest  17 minutes ago

Now create a toolbox container from the restored image with toolbox create --container container_name --image image_name, specifying the full repository and version tag as the image name.

$ toolbox create --container go --image localhost/go-backup:latest

Confirm that the toolbox was created.

$ toolbox list
IMAGE ID      IMAGE NAME                  CREATED
cfcb13046db7  localhost/go-backup:latest  20 minutes ago

CONTAINER ID  CONTAINER NAME  CREATED         STATUS      IMAGE NAME
34cef6b7e28d  go              21 seconds ago  configured  localhost/go-backup:latest

Finally, you can test that the restored Toolbox works…

$ toolbox enter --container go

If you can enter the newly created toolbox container and see the toolbox prompt, you have successfully backed up and restored your pet toolbox container.

Posted on Leave a comment

Spam Classification with ML-Pack

Introduction

ML-Pack is a small-footprint C++ machine learning library that can be easily integrated into other programs. It is an actively developed open source project, released under a BSD-3 license. Machine learning has gained popularity due to the large amount of electronic data that can be collected. Other popular machine learning frameworks include TensorFlow, MXNet, PyTorch, Chainer and PaddlePaddle; however, these are designed for more complex workflows than ML-Pack. On Fedora, ML-Pack is packaged by its lead developer Ryan Curtin. In addition to a command line interface, ML-Pack has bindings for Python and Julia. Here, we will focus on the command line interface, since this may be useful for system administrators to integrate into their workflows.

Installation

You can install ML-Pack on the Fedora command line using

$ sudo dnf -y install mlpack mlpack-bin

You can also install the documentation, development headers and Python bindings by using …

$ sudo dnf -y install mlpack-doc \
mlpack-devel mlpack-python3

though they will not be used in this introduction.

Example

As an example, we will train a machine learning model to classify spam SMS messages. To keep this article brief, the Linux commands used will not be fully explained, but you can find out more about them using the man command. For example, for the first command used below, wget:

$ man wget

will tell you that wget downloads files from the web, and list the options you can use with it.

Get a dataset

We will use an example spam dataset in Indonesian provided by Yudi Wibisono:

$ wget https://drive.google.com/file/d/1-stKadfTgJLtYsHWqXhGO3nTjKVFxm_Q/view
$ unzip dataset_sms_spam_bhs_indonesia_v1.zip

Pre-process dataset

We will try to classify a message as spam or ham by the number of occurrences of a word in a message. We first change the file line endings, remove line 243 which is missing a label and then remove the header from the dataset. Then, we split our data into two files, labels and messages. Since the labels are at the end of the message, the message is reversed and then the label removed and placed in one file. The message is then removed and placed in another file.

$ tr '\r' '\n' < dataset_sms_spam_v1.csv > dataset.txt
$ sed '243d' dataset.txt > dataset1.csv
$ sed '1d' dataset1.csv > dataset.csv
$ rev dataset.csv | cut -c1 | rev > labels.txt
$ rev dataset.csv | cut -c2- | rev > messages.txt
$ rm dataset.csv
$ rm dataset1.csv
$ rm dataset.txt

Machine learning works on numeric data, so we will use labels of 0 for ham and 1 for spam. The dataset contains three labels: 0 for a normal SMS (ham), 1 for fraud (spam), and 2 for a promotion (spam). Since we treat all spam the same, both fraud and promotions are relabelled as 1.

$ tr '2' '1' < labels.txt > labels.csv
$ rm labels.txt

The next step is to convert all text in the messages to lower case and, for simplicity, remove punctuation and any symbols that are not spaces, line endings or in the range a-z (one would need to expand this range of symbols for production use):

$ tr '[:upper:]' '[:lower:]' < \
messages.txt > messagesLower.txt
$ tr -Cd 'abcdefghijklmnopqrstuvwxyz \n' < \
messagesLower.txt > messagesLetters.txt
$ rm messagesLower.txt

We now obtain a sorted list of unique words used (this step may take a few minutes, so use nice to give it a low priority while you continue with other tasks on your computer).

$ nice -20 xargs -n1 < messagesLetters.txt > temp.txt
$ sort temp.txt > temp2.txt
$ uniq temp2.txt > words.txt
$ rm temp.txt
$ rm temp2.txt

We then create a matrix where, for each message, the frequency of word occurrences is counted (Wikipedia covers this approach under the document-term matrix and the bag-of-words model). This requires a few lines of code, so the full script, which should be saved as ‘makematrix.sh’, is below:

#!/bin/bash
declare -a words=()
declare -a letterstartind=()
declare -a letterstart=()
letter=" "
i=0
lettercount=0
# Read the labels (mlpack reads labels.csv directly, so they
# are not written into the matrix below)
while IFS= read -r line; do
  labels[$((i))]=$line
  let "i++"
done < labels.csv
# Record where each initial letter starts in the sorted word
# list, so word lookups only scan one letter's section
i=0
while IFS= read -r line; do
  words[$((i))]=$line
  firstletter="$( echo $line | head -c 1 )"
  if [ "$firstletter" != "$letter" ]; then
    letterstartind[$((lettercount))]=$((i))
    letterstart[$((lettercount))]=$firstletter
    letter=$firstletter
    let "lettercount++"
  fi
  let "i++"
done < words.txt
letterstartind[$((lettercount))]=$((i))
echo "Created list of letters"
rm -f wordfrequency.txt
touch wordfrequency.txt
messagenum=0
# For each message, count the occurrences of every word using
# the letter index, then write one row of counts
# (reconstructed logic)
while IFS= read -r line; do
  let "messagenum++"
  declare -a wordcount=()
  declare -a wordarray=()
  read -r -a wordarray <<< "$line"
  for ((j = 0; j < ${#words[@]}; j++)); do wordcount[$((j))]=0; done
  for word in "${wordarray[@]}"; do
    firstletter="$( echo $word | head -c 1 )"
    for ((j = 0; j < lettercount; j++)); do
      if [ "$firstletter" == "${letterstart[$((j))]}" ]; then
        for ((k = letterstartind[j]; k < letterstartind[j+1]; k++)); do
          if [ "$word" == "${words[$((k))]}" ]; then let "wordcount[$((k))]++"; fi
        done
      fi
    done
  done
  echo "${wordcount[*]}" >> wordfrequency.txt
  echo "Processed message ""$messagenum"
done < messagesLetters.txt
# Create csv file
tr ' ' ',' < wordfrequency.txt > data.csv

Since Bash is an interpreted language, this simple implementation can take up to 30 minutes to complete. If you run the above Bash script on your primary workstation, run it as a task with low priority so that you can continue with other work while you wait:

$ nice -20 bash makematrix.sh

Once the script has finished running, split the data into testing (30%) and training (70%) sets:

$ mlpack_preprocess_split \
--input_file data.csv \
--input_labels_file labels.csv \
--training_file train.data.csv \
--training_labels_file train.labels.csv \
--test_file test.data.csv \
--test_labels_file test.labels.csv \
--test_ratio 0.3 \
--verbose

Train a model

Now train a logistic regression model:

$ mlpack_logistic_regression \
--training_file train.data.csv \
--labels_file train.labels.csv \
--lambda 0.1 \
--output_model_file lr_model.bin

Test the model

Finally we test our model by producing predictions,

$ mlpack_logistic_regression \
--input_model_file lr_model.bin \
--test_file test.data.csv \
--output_file lr_predictions.csv

and comparing the predictions with the exact results,

$ export incorrect=$(diff -U 0 lr_predictions.csv \
test.labels.csv | grep '^@@' | wc -l)
$ export tests=$(wc -l < lr_predictions.csv)
$ echo "scale=2; 100 * ( 1 - $((incorrect)) \
/ $((tests)))" | bc

This gives a validation rate of approximately 90%, similar to results others have reported for this dataset.

The dataset is composed of approximately 50% spam messages, so this validation rate is quite good without much parameter tuning. In typical cases, however, datasets are unbalanced, with many more entries in some categories than in others. In these cases a good validation rate can be obtained simply by mispredicting the classes with few entries. To better evaluate such models, one can therefore compare the number of misclassifications of spam with the number of misclassifications of ham. Of particular importance in applications is the number of false positive spam results, as such messages are typically not transmitted. The script below produces a confusion matrix, which gives a better indication of misclassification; save it as ‘confusion.sh’:

#!/bin/bash
declare -a labels
declare -a lr
i=0
# Read the true labels
while IFS= read -r line; do
  labels[i]=$line
  let "i++"
done < test.labels.csv
i=0
# Read the logistic regression predictions
while IFS= read -r line; do
  lr[i]=$line
  let "i++"
done < lr_predictions.csv
TruePositiveLR=0
FalsePositiveLR=0
TrueZeroLR=0
FalseZeroLR=0
Positive=0
Zero=0
# Tally predictions against true labels (1 = spam, 0 = ham)
for i in "${!labels[@]}"; do
  if [ "${labels[$i]}" == "1" ]; then
    let "Positive++"
    if [ "${lr[$i]}" == "1" ]; then
      let "TruePositiveLR++"
    else
      let "FalseZeroLR++"
    fi
  fi
  if [ "${labels[$i]}" == "0" ]; then
    let "Zero++"
    if [ "${lr[$i]}" == "0" ]; then
      let "TrueZeroLR++"
    else
      let "FalsePositiveLR++"
    fi
  fi
done
echo "Logistic Regression"
echo "Total spam" $Positive
echo "Total ham" $Zero
echo "Confusion matrix"
echo "                 Predicted class"
echo "                 Ham   | Spam"
echo "                ----------------"
echo " Actual | Ham  | " $TrueZeroLR "  | " $FalsePositiveLR
echo " class  | Spam | " $FalseZeroLR "   | " $TruePositiveLR
echo ""

then run the script

$ bash confusion.sh

You should get output similar to

Logistic Regression
Total spam 183
Total ham 159
Confusion matrix

                 Predicted class
                 Ham   | Spam
                ----------------
 Actual | Ham  |  128  |  31
 class  | Spam |  26   |  157

which indicates a reasonable level of classification. Other methods you can try in ML-Pack for this problem include Naive Bayes, random forest, decision tree, AdaBoost and perceptron.
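If you want to try one of these alternatives, the invocation follows the same pattern as the logistic regression commands above. Below is a minimal sketch for a random forest; the option names are assumptions based on the same CLI conventions, so confirm them with mlpack_random_forest --help:

$ mlpack_random_forest \
--training_file train.data.csv \
--labels_file train.labels.csv \
--num_trees 50 \
--output_model_file rf_model.bin
$ mlpack_random_forest \
--input_model_file rf_model.bin \
--test_file test.data.csv \
--predictions_file rf_predictions.csv

The confusion matrix script above can then be pointed at rf_predictions.csv to compare the two classifiers.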

To improve the error rate, you can try other pre-processing methods on the initial data set. Neural networks can give up to 99.95% validation rates; however, using these techniques with ML-Pack cannot be done on the command line interface at present and is best covered in another post.

For more on ML-Pack, please see the documentation.

Posted on Leave a comment

How to configure an SSH proxy server with Squid

Sometimes you can’t connect to an SSH server from your current location. Other times, you may want to add an extra layer of security to your SSH connection. In these cases connecting to another SSH server via a proxy server is one way to get through.

Squid is a full-featured proxy server application that provides caching and proxy services. It’s normally used to help improve response times and reduce network bandwidth by reusing and caching previously requested web pages during browsing.

However, for this setup you’ll configure Squid as an SSH proxy server, since it’s a robust, trusted proxy server that is easy to configure.

Installation and configuration

Install the squid package using dnf:

$ sudo dnf install squid -y

The squid configuration file is quite extensive but there are only a few things we need to configure. Squid uses access control lists to manage connections.

Edit the /etc/squid/squid.conf file to make sure you have the two lines explained below.

First, specify your local IP network. The default configuration file already has a list of the most common ones but you will need to add yours if it’s not there. For example, if your local IP network range is 192.168.1.X, this is how the line would look:

acl localnet src 192.168.1.0/24

Next, add the SSH port as a safe port by adding the following line:

acl Safe_ports port 22

Save that file. Now enable and restart the squid proxy service:

$ sudo systemctl enable squid
$ sudo systemctl restart squid

By default, the squid proxy listens on port 3128. Configure firewalld to allow this:

$ sudo firewall-cmd --add-service=squid --permanent
$ sudo firewall-cmd --reload

Testing the SSH proxy connection

To connect to a server via SSH through a proxy server, we’ll use netcat.

Install nmap-ncat if it’s not already installed:

$ sudo dnf install nmap-ncat -y

Here is an example of a standard ssh connection:

$ ssh user@remote-server

Here is how you would connect to that same server using the squid proxy server as a gateway.

This example assumes the squid proxy server’s IP address is 192.168.1.63. You can also use the host-name or the FQDN of the squid proxy server:

$ ssh user@remote-server -o "ProxyCommand nc --proxy 192.168.1.63:3128 %h %p"

Here are the meanings of the options:

  • ProxyCommand – Tells ssh to use a proxy command to reach the server.
  • nc – The command used to establish the connection through the proxy server. This is the netcat command.
  • %h – The placeholder for the host name or IP address of the server you are connecting to.
  • %p – The placeholder for the port number of the server you are connecting to.
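To avoid typing the ProxyCommand option every time, you can store it in your SSH client configuration. Here is a minimal sketch of an ~/.ssh/config entry, assuming the same proxy address as above (the host alias and server name are placeholders):

Host remote-server
    HostName remote-server.example.com
    User user
    ProxyCommand nc --proxy 192.168.1.63:3128 %h %p

With this in place, ssh remote-server connects through the Squid proxy automatically.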

There are many ways to configure an SSH proxy server but this is a simple way to get started.

Posted on Leave a comment

Use DNS over TLS

The Domain Name System (DNS) that modern computers use to find resources on the internet was designed 35 years ago without consideration for user privacy. It is exposed to security risks and attacks like DNS Hijacking. It also allows ISPs to intercept the queries.

Luckily, DNS over TLS and DNSSEC are available. DNS over TLS and DNSSEC allow safe and encrypted end-to-end tunnels to be created from a computer to its configured DNS servers. On Fedora, the steps to implement these technologies are easy and all the necessary tools are readily available.

This guide will demonstrate how to configure DNS over TLS on Fedora using systemd-resolved. Refer to the documentation for further information about the systemd-resolved service.

Step 1: Set up systemd-resolved

Modify /etc/systemd/resolved.conf so that it is similar to what is shown below. Be sure to enable DNS over TLS and to configure the IP addresses of the DNS servers you want to use.

$ cat /etc/systemd/resolved.conf
[Resolve]
DNS=1.1.1.1 9.9.9.9
DNSOverTLS=yes
DNSSEC=yes
FallbackDNS=8.8.8.8 1.0.0.1 8.8.4.4
#Domains=~.
#LLMNR=yes
#MulticastDNS=yes
#Cache=yes
#DNSStubListener=yes
#ReadEtcHosts=yes

A quick note about the options:

  • DNS: A space-separated list of IPv4 and IPv6 addresses to use as the system DNS servers.
  • FallbackDNS: A space-separated list of IPv4 and IPv6 addresses to use as fallback DNS servers.
  • Domains: These domains are used as search suffixes when resolving single-label host names; ~. means the system DNS servers defined with DNS= are to be preferred for all domains.
  • DNSOverTLS: If true, all connections to the server will be encrypted. Note that this mode requires a DNS server that supports DNS-over-TLS and has a valid certificate for its IP.

NOTE: The DNS servers listed in the above example are my personal choices. You should decide which DNS servers you want to use; being mindful of whom you are asking IPs for internet navigation.

Step 2: Tell NetworkManager to push info to systemd-resolved

Create a file in /etc/NetworkManager/conf.d named 10-dns-systemd-resolved.conf.

$ cat /etc/NetworkManager/conf.d/10-dns-systemd-resolved.conf
[main]
dns=systemd-resolved

The setting shown above (dns=systemd-resolved) will cause NetworkManager to push DNS information acquired from DHCP to the systemd-resolved service. This will override the DNS settings configured in Step 1. This is fine on a trusted network, but feel free to set dns=none instead to use the DNS servers configured in /etc/systemd/resolved.conf.
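For example, a minimal sketch of the same file with dns=none, so that only the servers from /etc/systemd/resolved.conf are ever used:

$ cat /etc/NetworkManager/conf.d/10-dns-systemd-resolved.conf
[main]
dns=none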

Step 3: Start and restart services

To make the settings configured in the previous steps take effect, start and enable systemd-resolved. Then restart NetworkManager.

CAUTION: This will lead to a loss of connection for a few seconds while NetworkManager is restarting.

$ sudo systemctl start systemd-resolved
$ sudo systemctl enable systemd-resolved
$ sudo systemctl restart NetworkManager

NOTE: Currently, the systemd-resolved service is disabled by default and its use is opt-in. There are plans to enable systemd-resolved by default in Fedora 33.

Step 4: Check if everything is fine

Now you should be using DNS over TLS. Confirm this by checking DNS resolution status with:

$ resolvectl status
MulticastDNS setting: yes
  DNSOverTLS setting: yes
      DNSSEC setting: yes
    DNSSEC supported: yes
  Current DNS Server: 1.1.1.1
         DNS Servers: 1.1.1.1
                      9.9.9.9
Fallback DNS Servers: 8.8.8.8
                      1.0.0.1
                      8.8.4.4

/etc/resolv.conf should now point to the local stub resolver at 127.0.0.53:

$ cat /etc/resolv.conf
# Generated by NetworkManager
search lan
nameserver 127.0.0.53

To see the local address and port where the systemd-resolved stub listener accepts queries, run:

$ sudo ss -lntp | grep '\(State\|:53 \)'
State    Recv-Q   Send-Q   Local Address:Port    Peer Address:Port   Process
LISTEN   0        4096     127.0.0.53%lo:53      0.0.0.0:*           users:(("systemd-resolve",pid=10410,fd=18))

To make a secure query, run:

$ resolvectl query fedoraproject.org
fedoraproject.org: 8.43.85.67    -- link: wlp58s0
                   8.43.85.73    -- link: wlp58s0
[..]
-- Information acquired via protocol DNS in 36.3ms.
-- Data is authenticated: yes

Bonus Step 5: Use Wireshark to verify the configuration

First, install and run Wireshark:

$ sudo dnf install wireshark
$ sudo wireshark

It will ask you which link device it should begin capturing packets on. In my case, because I use a wireless interface, I will go ahead with wlp58s0. Set up a filter in Wireshark like tcp.port == 853 (853 is the DNS over TLS port). You need to flush the local DNS caches before you can capture a DNS query:

$ sudo resolvectl flush-caches

Now run:

$ nslookup fedoramagazine.org

You should see a TLS-encrypted exchange between your computer and your configured DNS server:


Posted on Leave a comment

Running Rosetta@home on a Raspberry Pi with Fedora IoT

The Rosetta@home project is a not-for-profit distributed computing project created by the Baker laboratory at the University of Washington. The project uses idle compute capacity from volunteer computers to study protein structure, which is used in research into diseases such as HIV, Malaria, Cancer, and Alzheimer’s.

In common with many other scientific organizations, Rosetta@home is currently expending significant resources on the search for vaccines and treatments for COVID-19.

Rosetta@home uses the open source BOINC platform to manage donated compute resources. BOINC was originally developed to support the SETI@home project searching for Extraterrestrial Intelligence. These days, it is used by a number of projects in many different scientific fields. A single BOINC client can contribute compute resources to many such projects, though not all projects support all architectures.

For the example shown in this article a Raspberry Pi 3 Model B was used, which is one of the tested reference devices for Fedora IoT. This device, with only 1GB of RAM, is only just powerful enough to make a meaningful contribution to Rosetta@home, and there's certainly no way the Raspberry Pi can be used for anything else, such as running a desktop environment, at the same time.

It’s also worth mentioning at this point that the first rule of Raspberry Pi computing is to get the recommended power supply. It is important to get as close to the specified 2.5A as you can, and to use a good quality micro-USB cable.

Getting Fedora IoT

To install Fedora IoT on a Raspberry Pi, the first step is to download the aarch64 Raw Image from the iot.fedoraproject.org download page.

Then use the arm-image-installer utility (sudo dnf install arm-image-installer) to write the image to the SD card. As always, be very sure which device name corresponds to your SD card before continuing. Check the device with the lsblk command like this:

$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 1 59.5G 0 disk
└─sdb1 8:17 1 59.5G 0 part /run/media/gavin/154F-1CEC
nvme0n1 259:0 0 477G 0 disk
├─nvme0n1p1 259:1 0 600M 0 part
...

If you’re still not sure, try running lsblk with the SD card removed, then again with the SD card inserted and comparing the outputs. In this case it lists the SD card as /dev/sdb. If you’re really unsure, there are some more tips described in the Getting Started guide.

We need to tell arm-image-installer which image file to use, what type of device we’re going to be using, and the device name – determined above – to use for writing the image. The arm-image-installer utility is also able to expand the filesystem to use the entire SD card at the point of writing the image.

Since we’re not going to use the zezere provisioning server to deploy SSH keys to the Raspberry Pi, we need to specify the option to remove the root password so that we can log in and set it at first boot.

In my case, the full command was:

sudo arm-image-installer --image ~/Downloads/Fedora-IoT-32-20200603.0.aarch64.raw.xz --target=rpi3 --media=/dev/sdb --resizefs --norootpass

After a final confirmation prompt:

= Selected Image:
= /var/home/gavin/Downloads/Fedora-IoT-32-20200603.0.aarc...
= Selected Media : /dev/sdb
= U-Boot Target : rpi3
= Root Password will be removed.
= Root partition will be resized
=====================================================
*****************************************************
*****************************************************
******** WARNING! ALL DATA WILL BE DESTROYED ********
*****************************************************
*****************************************************
 Type 'YES' to proceed, anything else to exit now

the image is written to the SD Card.

...
= Installation Complete! Insert into the rpi3 and boot.

Booting the Raspberry Pi

For the initial setup, you’ll need to attach a keyboard and mouse to the Raspberry Pi. Alternatively, you can follow the instructions for connecting with a USB-to-Serial cable.

When the Raspberry Pi boots up, just type root at the login prompt and press enter.

localhost login: root
[root@localhost~]#

The first task is to set a password for the root user.

[root@localhost~]# passwd
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully
[root@localhost~]#

Verifying Network Connectivity

To verify network connectivity, follow the checklist in the Fedora IoT Getting Started guide. This system uses a wired ethernet connection, which shows up as eth0. If you need to set up a wireless connection instead, you can do so with nmcli, as sketched below.
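A minimal sketch of connecting to a wireless network with nmcli (the network name and password here are placeholders):

[root@localhost ~]# nmcli device wifi connect "MyHomeWiFi" password "my-wifi-password"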

ip addr will allow you to check that you have a valid IP address.

[root@localhost ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether b8:27:eb:9d:6e:13 brd ff:ff:ff:ff:ff:ff
inet 192.168.178.60/24 brd 192.168.178.255 scope global dynamic noprefixroute eth0
valid_lft 863928sec preferred_lft 863928sec
inet6 fe80::ba27:ebff:fe9d:6e13/64 scope link
valid_lft forever preferred_lft forever
3: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
link/ether fe:d3:c9:dc:54:25 brd ff:ff:ff:ff:ff:ff

ip route will check that the network has a default gateway configured.

[root@localhost ~]# ip route
default via 192.168.178.1 dev eth0 proto dhcp metric 100
192.168.178.0/24 dev eth0 proto kernel scope link src 192.168.178.60 metric 100

To verify internet access and name resolution, use ping:

[root@localhost ~]# ping -c3 iot.fedoraproject.org
PING wildcard.fedoraproject.org (8.43.85.67) 56(84) bytes of data.
64 bytes from proxy14.fedoraproject.org (8.43.85.67): icmp_seq=1 ttl=46 time=93.4 ms
64 bytes from proxy14.fedoraproject.org (8.43.85.67): icmp_seq=2 ttl=46 time=90.0 ms
64 bytes from proxy14.fedoraproject.org (8.43.85.67): icmp_seq=3 ttl=46 time=91.3 ms

--- wildcard.fedoraproject.org ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 90.043/91.573/93.377/1.374 ms

Optional: Configuring sshd so we can disconnect the keyboard and monitor

Before disconnecting the keyboard and monitor, we need to ensure that we can connect to the Raspberry Pi over the network.

First we verify that sshd is running

[root@localhost~]# systemctl is-active sshd
active

and that there is a firewall rule present to allow ssh.

[root@localhost ~]# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources:
  services: dhcpv6-client mdns ssh
  ports:
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

In the file /etc/ssh/sshd_config, find the section named

# Authentication

and add the line

PermitRootLogin yes

There will already be a line

#PermitRootLogin prohibit-password

which you can edit by removing the # comment character and changing the value to yes.

Restart the sshd service to pick up the change

[root@localhost ~]# systemctl restart sshd

If all this is in place, we should be able to ssh to the Raspberry Pi.

[gavin@desktop ~]$ ssh root@192.168.178.60
The authenticity of host '192.168.178.60 (192.168.178.60)' can't be established.
ECDSA key fingerprint is SHA256:DLdFaYbvKhB6DG2lKmJxqY2mbrbX5HDRptzWMiAUgBM.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.178.60' (ECDSA) to the list of known hosts.
root@192.168.178.60's password:
Boot Status is GREEN - Health Check SUCCESS
Last login: Wed Apr 1 17:24:50 2020
[root@localhost ~]#

It’s now safe to log out from the console (exit) and disconnect the keyboard and monitor.

Disabling unneeded services

Since we’re right on the lower limit of viable hardware for Rosetta@home, it’s worth disabling any unneeded services. Fedora IoT is much more lightweight than desktop distributions, but there are still a few optimizations we can do.

These include disabling Bluetooth, ModemManager (used for cellular data connections), WPA supplicant (used for Wi-Fi), and the zezere services, which are used to centrally manage a fleet of Fedora IoT devices.

[root@localhost /]# for serviceName in bluetooth ModemManager wpa_supplicant \
    zezere_ignition zezere_ignition.timer zezere_ignition_banner; do
    systemctl stop $serviceName
    systemctl disable $serviceName
    systemctl mask $serviceName
done

Getting the BOINC client

Instead of installing the BOINC client directly onto the operating system with rpm-ostree, we’re going to use podman to run the containerized version of the client.

This image uses a volume mount to store its data, so we create the directories it needs in advance.

[root@localhost ~]# mkdir -p /opt/appdata/boinc/slots /opt/appdata/boinc/locale

We also need to add a firewall rule to allow the container to resolve external DNS names.

[root@localhost ~]# firewall-cmd --permanent --zone=trusted --add-interface=cni-podman0
success
[root@localhost ~]# systemctl restart firewalld

Finally we are ready to pull and run the BOINC client container.

[root@localhost ~]# podman run --name boinc -dt -p 31416:31416 \
    -v /opt/appdata/boinc:/var/lib/boinc:Z \
    -e BOINC_GUI_RPC_PASSWORD="blah" \
    -e BOINC_CMD_LINE_OPTIONS="--allow_remote_gui_rpc" \
    boinc/client:arm64v8
Trying to pull...
...
787a26c34206e75449a7767c4ad0dd452ec25a501f719c2e63485479f...

We can inspect the container logs to make sure everything is working as expected:

[root@localhost ~]# podman logs boinc
20-Jun-2020 09:02:44 [---] cc_config.xml not found - using defaults
20-Jun-2020 09:02:44 [---] Starting BOINC client version 7.14.12 for aarch64-unknown-linux-gnu
...
...
...
20-Jun-2020 09:02:44 [---] Checking presence of 0 project files
20-Jun-2020 09:02:44 [---] This computer is not attached to any projects
20-Jun-2020 09:02:44 Initialization completed

Configuring the BOINC container to run at startup

We can automatically generate a systemd unit file for the container with podman generate systemd.

[root@localhost ~]# podman generate systemd --files --name boinc
/root/container-boinc.service

This creates a systemd unit file in root’s home directory.

[root@localhost ~]# cat container-boinc.service
# container-boinc.service
# autogenerated by Podman 1.9.3
# Sat Jun 20 09:13:58 UTC 2020

[Unit]
Description=Podman container-boinc.service
Documentation=man:podman-generate-systemd(1)
Wants=network.target
After=network-online.target

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=on-failure
ExecStart=/usr/bin/podman start boinc
ExecStop=/usr/bin/podman stop -t 10 boinc
PIDFile=/var/run/containers/storage/overlay-containers/787a26c34206e75449a7767c4ad0dd452ec25a501f719c2e63485479fbe21631/userdata/conmon.pid
KillMode=none
Type=forking

[Install]
WantedBy=multi-user.target default.target

We install the file by moving it to the appropriate directory.

[root@localhost ~]# mv -Z container-boinc.service /etc/systemd/system
[root@localhost ~]# systemctl enable /etc/systemd/system/container-boinc.service
Created symlink /etc/systemd/system/multi-user.target.wants/container-boinc.service → /etc/systemd/system/container-boinc.service.
Created symlink /etc/systemd/system/default.target.wants/container-boinc.service → /etc/systemd/system/container-boinc.service.

Connecting to the Rosetta@home project

You need to create an account at the Rosetta@home signup page, and retrieve your account key from your account home page. The key to copy is the “Weak Account Key”.

Finally, we execute the boinccmd configuration utility inside the container using podman exec, passing the Rosetta@home url and our account key.

[root@localhost ~]# podman exec boinc boinccmd --project_attach https://boinc.bakerlab.org/rosetta/ 2160739_cadd20314e4ef804f1d95ce2862c8f73

Running podman logs --follow boinc will allow us to see the container connecting to the project. You will probably see errors of the form

20-Jun-2020 10:18:40 [Rosetta@home] Rosetta needs 1716.61 MB RAM but only 845.11 MB is available for use.

This is because most, but not all, of the work units in Rosetta@home require more memory than we have to offer. However, if you leave the device running for a while, it should eventually get some jobs to process; the polling interval seems to be approximately 10 minutes. We can also tweak the memory settings using BOINC Manager to allow BOINC to use slightly more memory. This will increase the probability that Rosetta@home will be able to find tasks for us.

Installing BOINC Manager for remote access

You can use dnf to install the BOINC manager component to remotely manage the BOINC client on the Raspberry Pi.

[gavin@desktop ~]$ sudo dnf install boinc-manager

If you switch to “Advanced View”, you will be able to select “File -> Select Computer” and connect to your Raspberry Pi, using the IP address of the Pi and the value supplied for BOINC_GUI_RPC_PASSWORD in the podman run command, in my case “blah”.

Press Shift+Ctrl+I to connect BOINC manager to a remote computer

Under “Options -> Computing Preferences”, increase the value for “When Computer is not in use, use at most _ %”. I’ve been using 93%; this seems to allow Rosetta@home to schedule work on the Pi, whilst still leaving it just about usable. It is possible that further fine tuning of the operating system might allow this percentage to be increased.

Using the Computing Preferences dialog to set the memory threshold

These settings can also be changed through the Rosetta@home website settings page, but bear in mind that changes made through the BOINC Manager client override preferences set in the web interface.

Wait

It may take a while, possibly several hours, for Rosetta@home to send work to our newly installed client, particularly as most work units are too big to run on a Raspberry Pi. COVID-19 has resulted in a large number of new computers being joined to the Rosetta@home project, which means that there are times when there isn’t enough work to do.

When we are assigned some work units, BOINC will download several hundred megabytes of data. This will be stored on the SD Card and can be viewed using BOINC manager.

We can also see the tasks running in the Tasks pane:

The client has downloaded four tasks, but only one of them is currently running due to memory constraints. At times, two tasks can run simultaneously, but I haven’t seen more than that. This is OK as long as the tasks are completed by the deadline shown on the right. I’m fairly confident these will be completed as long as the Raspberry Pi is left running. I have found that the additional memory overhead created by the BOINC Manager connection and sshd services can reduce parallelism, so I try to disconnect these when I’m not using them.

Conclusion

Rosetta@home, in common with many other distributed computing projects, is currently experiencing a large spike in participation due to COVID-19. That aside, the project has been doing valuable work for many years to combat a number of other diseases.

Whilst a Raspberry Pi is never going to appear at the top of the contribution chart, I think this is a worthwhile project to undertake with a spare Raspberry Pi. The existence of work units aimed at low-spec ARM devices indicates that the project organizers agree with this sentiment. I’ll certainly be leaving mine running for the foreseeable future.