# Command line quick tips: Searching with grep

If you use your Fedora system for more than just browsing the web, you have probably needed to search for text in your files. For instance, you might be a developer that can’t remember where you left some code snippet. Or you might be looking for a setting stored in your system configuration files. Whatever the reason, there are plenty of ways to search for text on your Fedora system. This article will show you how, including using the built-in utility grep.

## Introducing grep

The grep utility allows you to search for text, or more specifically text patterns, on your file system. The name grep comes from global regular expression print. Yikes, what a mouthful! This is because a regular expression (or regex) is a way of defining text patterns.

The grep utility lets you find and print out matches on these patterns — thus the name. It’s a powerful system, and you can even find it in modern code editors like Visual Studio Code or Atom.

## Regular expressions

Harnessing all the power of regular expressions is a topic bigger than this article, for sure. The simplest kind of regex can be just a word, or a portion of a word. That pattern is simply “the following characters, in the same order.” The pattern is searched line by line. For example:

• pciutil – matches any time the 7 characters pciutil appear together — including pciutil, pciutils, pciutil123, and foopciutil.
• ^pciutil – matches any time the 7 characters pciutil appear together immediately at the beginning of a line (that’s what the ^ stands for)
• pciutil$– matches any time the 7 characters pciutil appear together immediately before the end of a line (that’s what the$ stands for)

More complicated expressions are also possible. Special characters are used in a regex as wildcards, or to change the way the regex works. If you want to match on one of these characters, use a \ (backslash) before the character.

For instance, the . (period or full stop) is a wildcard that matches any single character. If you use it in the expression pci.til, it matches pciutil, pci4til, or pci!til, but does not match pcitil. There must be a character to match the . in the regular expression.

The ? is a marker in a regex that marks the previous element as optional. So if you built on the previous example, the expression pci.?til would also match on pcitil because there need not be a character between i and t for a valid match.

The + and * are markers that stand for repetition. While + stands for one or more of the previous element, * stands for zero or more. So the regex pci.+til would match any of these: pciutil, pci4til, pci!til, pciuuuuuutil, pci423til. However, it wouldn’t match pcitil — but the regex pci.*til would.

## Examples of grep

Now that you know a little about regex, let’s put it to work. Imagine that you’re trying to find a configuration file that mentions a user account jpublic. You tried a bunch of files already, but none were the correct one, and you’re sure it’s there. So, try searching the /etc folder (using sudo because some subfolders are not readable outside the root account):

$sudo grep -r jpublic /etc/ The -r switch searches the folder recursively. The utility prints a list of matching files, and the line where the hit occurred. In most modern terminal environments, the hit is color highlighted for better readability. Imagine you have a much larger selection of files in /home/shared and you need to establish which ones mention the name MacNulty. However, you’re not sure whether the capitalization will be consistent, and you’re just looking for names of files, not the context. Also, you believe someone may have misspelled the name as McNulty in some places. Use the -l switch to only output filenames with a match, a ? marker for optional a in the name, and -i to make the search case-insensitive: $ sudo grep -irl 'ma\?cnulty' /home/shared

This command will match on strings like Macnulty, McNulty, Mcnulty, and macNulty with no problem. You’ll get a simple list of filenames where the match was found in the contents.

These are only the simplest ways to use grep and regular expressions. You can learn a lot more about both using the info grep command.

## But wait, there’s more…

The grep command is venerable but in some situations may not be as efficient as newer search utilities. For instance, the ripgrep utility is engineered to be a fast search utility that can take the place of grep. We covered ripgrep as part of an article on Rust and Rust applications previously in the Magazine:

It’s important to note that ripgrep has its own command line switches and syntax. For example, it has simple switches to print only filename matches, invert searches, and many other useful functions. It can also ignore based on .rgignore files placed in any subdirectories. (It’s also noteworthy that the -r switch is used differently for ripgrep, because it is automatically recursive.)

To install, use this command:

$sudo dnf install ripgrep To explore the options, use the manual page (man rg). You’ll find that many, but not all, options are the same as grep. Have fun searching! # Use a drop-down terminal for fast commands in Fedora A drop-down terminal lets you tap a key and quickly enter any command on your desktop. Often it creates a terminal in a smooth way, sometimes with effects. This article demonstrates how it helps to improve and speed up daily tasks, using drop-down terminals like Yakuake, Tilda, Guake and a GNOME extension. ## Yakuake Yakuake is a drop-down terminal emulator based on KDE Konsole techonology. It is distributed under the terms of the GNU GPL Version 2. It includes features such as: • Smoothly rolls down from the top of your screen • Tabbed interface • Configurable dimensions and animation speed • Skinnable • Sophisticated D-Bus interface To install Yakuake, use the following command: $ sudo dnf install -y yakuake

### Startup and configuration

If you’re runnign KDE, open the System Settings and go to Startup and Shutdown. Add yakuake to the list of programs under Autostart, like this:

It’s easy to configure Yakuake while running the app. To begin, launch the program at the command line:

### Startup and configuration

Most users prefer to have a drop-down terminal available behind the scenes when they login. To set this option, first go to the app launcher in your desktop, search for Tilda, and open it.

Next, open up the Tilda Config window. Select Start Tilda hidden, which means it will not display a terminal immediately when started.

Next, you’ll set your desktop to start Tilda automatically. If you’re using KDE, go to System Settings > Startup and Shutdown > Autostart and use Add a Program.

If you’re using GNOME, you can run this command in a terminal:

$ln -s /usr/share/applications/tilda.desktop ~/.config/autostart/ When you run for the first time, a wizard shows up to set your preferences. If you need to change something, right click and go to Preferences in the menu. You can also create multiple configuration files, and bind other keys to open new terminals at different places on the screen. To do that, run this command: $ tilda -C

Every time you use the above command, Tilda creates a new config file located in the ~/.config/tilda/ folder called config_0, config_1, and so on. You can then map a key combination to open a new Tilda terminal with a specific set of options.

### Using Tilda

The main shortcuts are:

• F1 = Pull Down Terminal Tilda (Note: If you have more than one config file, the shortcuts are the same, with a diferent open/retract shortcut like F1, F2, F3, and so on)
• F11 = Full Screen Mode
• F12 = Toggle Transparency
• Ctrl+Page Up = Go to Next Tab
• Ctrl+Page Down = Go to Previous Tab

## GNOME Extension

The Drop-down Terminal GNOME Extension lets you use this useful tool in your GNOME Shell. It is easy to install and configure, and gives you fast access to a terminal session.

### Installation

Open a browser and go to the site for this GNOME extension. Enable the extension setting to On, as shown here:

Then select Install to install the extension on your system.

Once you do this, there’s no reason to set any autostart options. The extension will automatically run whenever you login to GNOME!

### Configuration

After install, the Drop Down Terminal configuration window opens to set your preferences. For example, you can set the size of the terminal, animation, transparency, and scrollbar use.

If you need change some preferences in the future, run the gnome-shell-extension-prefs command and choose Drop Down Terminal.

### Using the extension

The shortcuts are simple:

•  (usually the key above Tab) = Open/Retract Terminal
• F12 (customize as you prefer) = Open/Retract Terminal

# Trace code in Fedora with bpftrace

bpftrace is a new eBPF-based tracing tool that was first included in Fedora 28. It was developed by Brendan Gregg, Alastair Robertson and Matheus Marchini with the help of a loosely-knit team of hackers across the Net. A tracing tool lets you analyze what a system is doing behind the curtain. It tells you which functions in code are being called, with which arguments, how many times, and so on.

#### eBPF (extended Berkeley Packet Filter)

eBPF is a tiny virtual machine, or a virtual CPU to be more precise, in the Linux Kernel. The eBPF can load and run small programs in a safe and controlled way in kernel space. This makes it safer to use, even in production systems. This virtual machine has its own instruction set architecture (ISA) resembling a subset of modern processor architectures. The ISA makes it easy to translate those programs to the real hardware. The kernel performs just-in-time translation to native code for main architectures to improve the performance.

The eBPF virtual machine allows the kernel to be extended programmatically. Nowadays several kernel subsystems take advantage of this new powerful Linux Kernel capability. Examples include networking, seccomp, tracing, and more. The main idea is to attach eBPF programs into specific code points, and thereby extend the original kernel behavior.

eBPF machine language is very powerful. But writing code directly in it is extremely painful, because it’s a low level language. This is where bpftrace comes in. It provides a high-level language to write eBPF tracing scripts. The tool then translates these scripts to eBPF with the help of clang/LLVM libraries, and then attached to the specified code points.

## Installation and quick start

To install bpftrace, run the following command in a terminal using sudo:

$sudo dnf install bpftrace Try it out with a “hello world” example: $ sudo bpftrace -e 'BEGIN { printf("hello world\n"); }'

Note that you must run bpftrace as root due to the privileges required. Use the -e option to specify a program, and to construct the so-called “one-liners.” This example only prints hello world, and then waits for you to press Ctrl+C.

BEGIN is a special probe name that fires only once at the beginning of execution. Every action inside the curly braces { } fires whenever the probe is hit — in this case, it’s just a printf.

Let’s jump now to a more useful example:

$sudo bpftrace -e 't:syscalls:sys_enter_execve { printf("%s called %s\n", comm, str(args->filename)); }' This example prints the parent process name (comm) and the name of every new process being created in the system. t:syscalls:sys_enter_execve is a kernel tracepoint. It’s a shorthand for tracepoint:syscalls:sys_enter_execve, but both forms can be used. The next section shows you how to list all available tracepoints. comm is a bpftrace builtin that represents the process name. filename is a field of the t:syscalls:sys_enter_execve tracepoint. You can access these fields through the args builtin. All available fields of the tracepoint can be listed with this command: bpftrace -lv "t:syscalls:sys_enter_execve" ## Example usage ### Listing probes A central concept for bpftrace are probe points. Probe points are instrumentation points in code (kernel or userspace) where eBPF programs can be attached. They fit into the following categories: • kprobe – kernel function start • kretprobe – kernel function return • uprobe – user-level function start • uretprobe – user-level function return • tracepoint – kernel static tracepoints • usdt – user-level static tracepoints • profile – timed sampling • interval – timed output • software – kernel software events • hardware – processor-level events All available kprobe/kretprobe, tracepoints, software and hardware probes can be listed with this command: $ sudo bpftrace -l

The uprobe/uretprobe and usdt probes are userspace probes specific to a given executable. To use them, use the special syntax shown later in this article.

The profile and interval probes fire at fixed time intervals. Fixed time intervals are not covered in this article.

### Counting system calls

Maps are special BPF data types that store counts, statistics, and histograms. You can use maps to summarize how many times each syscall is being called:

$sudo bpftrace -e 't:syscalls:sys_enter_* { @[probe] = count(); }' Some probe types allow wildcards to match multiple probes. You can also specify multiple attach points for an action block using a comma separated list. In this example, the action block attaches to all tracepoints whose name starts with t:syscalls:sys_enter_, which means all available syscalls. The bpftrace builtin function count() counts the number of times this function is called. @[] represents a map (an associative array). The key of this map is probe, which is another bpftrace builtin that represents the full probe name. Here, the same action block is attached to every syscall. Then, each time a syscall is called the map will be updated, and the entry is incremented in the map relative to this same syscall. When the program terminates, it automatically prints out all declared maps. This example counts the syscalls called globally, it’s also possible to filter for a specific process by PID using the bpftrace filter syntax: $ sudo bpftrace -e 't:syscalls:sys_enter_* / pid == 1234 / { @[probe] = count(); }'

### Write bytes by process

Using these concepts, let’s analyze how many bytes each process is writing:

$sudo bpftrace -e 't:syscalls:sys_exit_write /args->ret > 0/ { @[comm] = sum(args->ret); }' bpftrace attaches the action block to the write syscall return probe (t:syscalls:sys_exit_write). Then, it uses a filter to discard the negative values, which are error codes (/args->ret > 0/). The map key comm represents the process name that called the syscall. The sum() builtin function accumulates the number of bytes written for each map entry or process. args is a bpftrace builtin to access tracepoint’s arguments and return values. Finally, if successful, the write syscall returns the number of written bytes. args->ret provides access to the bytes. ### Read size distribution by process (histogram): bpftrace supports the creation of histograms. Let’s analyze one example that creates a histogram of the read size distribution by process: $ sudo bpftrace -e 't:syscalls:sys_exit_read { @[comm] = hist(args->ret); }'

Histograms are BPF maps, so they must always be attributed to a map (@). In this example, the map key is comm.

The example makes bpftrace generate one histogram for every process that calls the read syscall. To generate just one global histogram, attribute the hist() function just to ‘@’ (without any key).

bpftrace automatically prints out declared histograms when the program terminates. The value used as base for the histogram creation is the number of read bytes, found through args->ret.

### Tracing userspace programs

You can also trace userspace programs with uprobes/uretprobes and USDT (User-level Statically Defined Tracing). The next example uses a uretprobe, which probes to the end of a user-level function. It gets the command lines issued in every bash running in the system:

$sudo bpftrace -e 'uretprobe:/bin/bash:readline { printf("readline: \"%s\"\n", str(retval)); }' To list all available uprobes/uretprobes of the bash executable, run this command: $ sudo bpftrace -l "uprobe:/bin/bash"

uprobe instruments the beginning of a user-level function’s execution, and uretprobe instruments the end (its return). readline() is a function of /bin/bash, and it returns the typed command line. retval is the return value for the instrumented function, and can only be accessed on uretprobe.

When using uprobes, you can access arguments with arg0..argN. A str() call is necessary to turn the char * pointer to a string.

## Shipped Scripts

There are many useful scripts shipped with bpftrace package. You can find them in the /usr/share/bpftrace/tools/ directory.

Among them, you can find:

• killsnoop.bt – Trace signals issued by the kill() syscall.
• tcpconnect.bt – Trace all TCP network connections.
• pidpersec.bt – Count new procesess (via fork) per second.
• opensnoop.bt – Trace open() syscalls.
• vfsstat.bt – Count some VFS calls, with per-second summaries.

You can directly use the scripts. For example:

$sudo /usr/share/bpftrace/tools/killsnoop.bt You can also study these scripts as you create new tools. ## Links Photo by Roman Romashov on Unsplash. # 4 cool new projects to try in COPR for August 2019 COPR is a collection of personal repositories for software that isn’t carried in Fedora. Some software doesn’t conform to standards that allow easy packaging. Or it may not meet other Fedora standards, despite being free and open source. COPR can offer these projects outside the Fedora set of packages. Software in COPR isn’t supported by Fedora infrastructure or signed by the project. However, it can be a neat way to try new or experimental software. Here’s a set of new and interesting projects in COPR. ### Duc Duc is a collection of tools for disk usage inspection and visualization. Duc uses an indexed database to store sizes of files on your system. Once the indexing is done, you can then quickly overview your disk usage either by its command-line interface or the GUI. #### Installation instructions The repo currently provides duc for EPEL 7, Fedora 29 and 30. To install duc, use these commands: sudo dnf copr enable terrywang/duc sudo dnf install duc ### MuseScore MuseScore is a software for working with music notation. With MuseScore, you can create sheet music either by using a mouse, virtual keyboard or a MIDI controller. MuseScore can then play the created music or export it as a PDF, MIDI or MusicXML. Additionally, there’s an extensive database of sheet music created by Musescore users. #### Installation instructions The repo currently provides MuseScore for Fedora 29 and 30. To install MuseScore, use these commands: sudo dnf copr enable jjames/MuseScore sudo dnf install musescore ### Dynamic Wallpaper Editor Dynamic Wallpaper Editor is a tool for creating and editing a collection of wallpapers in GNOME that change in time. This can be done using XML files, however, Dynamic Wallpaper Editor makes this easy with its graphical interface, where you can simply add pictures, arrange them and set the duration of each picture and transitions between them. #### Installation instructions The repo currently provides dynamic-wallpaper-editor for Fedora 30 and Rawhide. To install dynamic-wallpaper-editor, use these commands: sudo dnf copr enable atim/dynamic-wallpaper-editor sudo dnf install dynamic-wallpaper-editor ### Manuskript Manuskript is a tool for writers and is aimed to make creating large writing projects easier. It serves as an editor for writing the text itself, as well as a tool for organizing notes about the story itself, characters of the story and individual plots. #### Installation instructions The repo currently provides Manuskript for Fedora 29, 30 and Rawhide. To install Manuskript, use these commands: sudo dnf copr enable notsag/manuskript sudo dnf install manuskript # Use Postfix to get email from your Fedora system Communication is key. Your computer might be trying to tell you something important. But if your mail transport agent (MTA) isn’t properly configured, you might not be getting the notifications. Postfix is a MTA that’s easy to configure and known for a strong security record. Follow these steps to ensure that email notifications sent from local services will get routed to your internet email account through the Postfix MTA. ## Install packages Use dnf to install the required packages (you configured sudo, right?): $ sudo -i
# dnf install postfix mailx

If you previously had a different MTA configured, you may need to set Postfix to be the system default. Use the alternatives command to set your system default MTA:

## Enable, start, and test Postfix

After you have updated the main.cf file, enable and start the Postfix service:

# systemctl enable --now postfix.service

You can then exit your sudo session as root using the exit command or Ctrl+D. You should now be able to test your configuration with the mail command:

$echo 'It worked!' | mail -s "Test:$(date)" glb@gmail.com

## Update services

If you have services like logwatch, mdadm, fail2ban, apcupsd or certwatch installed, you can now update their configurations so that their email notifications will go to your internet email address.

Optionally, you may want to configure all email that is sent to your local system’s root account to go to your internet email address. Add this line to the /etc/aliases file on your system (you’ll need to use sudo to edit this file, or switch to the root account first):

root: glb+root@gmail.com

Now run this command to re-read the aliases:

# newaliases
• TIP: If you are using Gmail, you can add an alpha-numeric mark between your username and the @ symbol as demonstrated above to make it easier to identify and filter the email that you will receive from your computer(s).

## Troubleshooting

View the mail queue:

$mailq Clear all email from the queues: # postsuper -d ALL Filter the configuration settings for interesting values: $ postconf | grep "^relayhost\|^smtp_"

View the postfix/smtp logs:

$journalctl --no-pager -t postfix/smtp Reload postfix after making configuration changes: $ systemctl reload postfix

Photo by Sharon McCutcheon on Unsplash.

This can make you more secure by having one strong password instead of many weak passwords. You can also sync your passwords across devices if you have a cloud-based password manager like LastPass, 1Password, or Dashlane. Unfortunately, none of these products are open source. Luckily there are open source alternatives available.

Each of these three apps has its own downsides. Bitwarden stores everything in one place and is exposed to the web through its API and website interface. LessPass can’t store custom passwords since it’s stateless, so you need to use their derived passwords. KeePass, a file-based password manager, can’t easily sync between devices. You can utilize a cloud-storage provider together with WebDAV to get around this, but a lot of clients do not support it and you might get file conflicts if devices do not sync correctly.

## Running an unofficial Bitwarden implementation

There is a community implementation of the server and its API called bitwarden_rs. This implementation is fully open source as it can use SQLite or MariaDB/MySQL, instead of the proprietary Microsoft SQL Server that the official server uses.

It’s important to recognize some differences exist between the official and the unofficial version. For instance, the official server has been audited by a third-party, whereas the unofficial one hasn’t. When it comes to implementations, the unofficial version lacks email confirmation and support for two-factor authentication using Duo or email codes.

Let’s get started running the server with SELinux in mind. Following the documentation for bitwarden_rs you can construct a Podman command as follows:

$podman run -d \ --userns=keep-id \ --name bitwarden \ -e SIGNUPS_ALLOWED=false \ -e ROCKET_PORT=8080 \ -v /home/egustavs/Bitwarden/bw-data/:/data/:Z \ -p 8080:8080 \ bitwardenrs/server:latest This downloads the bitwarden_rs image and runs it in a user container under the user’s namespace. It uses a port above 1024 so that non-root users can bind to it. It also changes the volume’s SELinux context with :Z to prevent permission issues with read-write on /data. If you host this under a domain, it’s recommended to put this server under a reverse proxy with Apache or Nginx. That way you can use port 80 and 443 which points to the container’s 8080 port without running the container as root. ## Running under systemd With Bitwarden now running, you probably want to keep it that way. Next, create a unit file that keeps the container running, automatically restarts if it doesn’t respond, and starts running after a system restart. Create this file as /etc/systemd/system/bitwarden.service: [Unit]Description=Bitwarden Podman containerWants=syslog.service[Service]User=egustavsGroup=egustavsTimeoutStartSec=0ExecStart=/usr/bin/podman run 'bitwarden'ExecStop=-/usr/bin/podman stop -t 10 'bitwarden'Restart=alwaysRestartSec=30sKillMode=none[Install]WantedBy=multi-user.target Now, enable and start it using sudo: $ sudo systemctl enable bitwarden.service && sudo systemctl start bitwarden.service$systemctl status bitwarden.servicebitwarden.service - Bitwarden Podman container Loaded: loaded (/etc/systemd/system/bitwarden.service; enabled; vendor preset: disabled) Active: active (running) since Tue 2019-07-09 20:23:16 UTC; 1 day 14h ago Main PID: 14861 (podman) Tasks: 44 (limit: 4696) Memory: 463.4M Success! Bitwarden is now running under system and will keep running. ## Adding LetsEncrypt It’s strongly recommended to run your Bitwarden instance through an encrypted channel with something like LetsEncrypt if you have a domain. Certbot is a bot that creates LetsEncrypt certificates for us, and they have a guide for doing this through Fedora. After you generate a certificate, you can follow the bitwarden_rs guide about HTTPS. Just remember to append :Z to the LetsEncrypt volume to handle permissions while not changing the port. Photo by CMDR Shane on Unsplash. # Manage your shell environment Some time ago, the Fedora Magazine has published an article introducing ZSH — an alternative shell to Fedora’s default, bash. This time, we’re going to look into customizing it to use it in a more effective way. All of the concepts shown in this article also work in other shells such as bash. ### Alias Aliases are shortcuts for commands. This is useful for creating short commands for actions that are performed often, but require a long command that would take too much time to type. The syntax is: $ alias yourAlias='complex command with arguments'

They don’t always need to be used for shortening long commands. Important is that you use them for tasks that you do often. An example could be:

$alias dnfUpgrade='dnf -y upgrade' That way, to do a system upgrade, I just type dnfUpgrade instead of the whole dnf command. The problem of setting aliases right in the console is that once the terminal session is closed, the alias would be lost. To set them permanently, resource files are used. ### Resource Files Resource files (or rc files) are configuration files that are loaded per user in the beginning of a session or a process (when a new terminal window is opened, or a new program like vim is started). In the case of ZSH, the resource file is .zshrc, and for bash it’s .bashrc. To make the aliases permanent, you can either put them in your resource. You can edit your resource file with a text editor of your choice. This example uses vim: $ vim $HOME/.zshrc Or for bash: $ vim $HOME/.bashrc Note that the location of the resource file is specified relatively to a home directory — and that’s where ZSH (or bash) are going to look for the file by default for each user. Other option is to put your configuration in any other file, and then source it: $ source /path/to/your/rc/file

Again, sourcing it right in your session will only apply it to the session, so to make it permanent, add the source command to your resource file. The advantage of having your source file in a different location is that you can source it any time. Or anywhere which is especially useful in shared environments.

### Environment Variables

Environment variables are values assigned to a specific name which can be then called in scripts and commands. They start with the $dollar sign. One of the most common is$HOME that references the home directory.

As the name suggests, environment variables are a part of your environment. Set a variable using the following syntax:

$http_proxy="http://your.proxy" And to make it an environment variable, export it with the following command: $ export $http_proxy To see all the environment variables that are currently set, use the env command: $ env

The command outputs all the variables available in your session. To demonstrate how to use them in a command, try running the following echo commands:

$echo$PWD/home/fedora$echo$USERfedora

What happens here is variable expansion — the value stored in the variable is used in your command.

There are many directories, or folders (the way they are called in graphical environments) that are important to the OS. Some directories are set to hold binaries you can use directly in your shell. And these directories are defined in the $PATH variable. $ echo $PATH/usr/lib64/qt-3.3/bin:/usr/share/Modules/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/usr/libexec/sdcc:/usr/libexec/sdcc:/usr/bin:/bin:/sbin:/usr/sbin:/opt/FortiClient This will help you when you want to have your own binaries (or scripts) accessible in the shell. # Jupyter and data science in Fedora In the past, kings and leaders used oracles and magicians to help them predict the future — or at least get some good advice due to their supposed power to perceive hidden information. Nowadays, we live in a society obsessed with quantifying everything. So we have data scientists to do this job. Data scientists use statistical models, numerical techniques and advanced algorithms that didn’t come from statistical disciplines, along with the data that exist on databases, to find, to infer, to predict data that doesn’t exist yet. Sometimes this data is about the future. That is why we do a lot of predictive analytics and prescriptive analytics. Here are some questions to which data scientists help find answers: 1. Who are the students with high propensity to abandon the class? For each one, what are the reasons for leaving? 2. Which house has a price above or below the fair price? What is the fair price for a certain house? 3. What are the hidden groups that my clients classify themselves? 4. Which future problems this premature child will develop? 5. How many calls will I get in my call center tomorrow 11:43 AM? 6. My bank should or should not lend money to this customer? Note how the answer to all these question is not sitting in any database waiting to be queried. These are all data that still doesn’t exist and has to be calculated. That is part of the job we data scientists do. Throughout this article you’ll learn how to prepare a Fedora system as a Data Scientist’s development environment and also a production system. Most of the basic software is RPM-packaged, but the most advanced parts can only be installed, nowadays, with Python’s pip tool. ## Jupyter — the IDE Most modern data scientists use Python. And an important part of their work is EDA (exploratory data analysis). EDA is a manual and interactive process that retrieves data, explores its features, searches for correlations, and uses plotted graphics to visualize and understand how data is shaped and prototypes predictive models. Jupyter is a web application perfect for this task. Jupyter works with Notebooks, documents that mix rich text including beautifully rendered math formulas (thanks to mathjax), blocks of code and code output, including graphics. Notebook files have extension .ipynb, which means Interactive Python Notebook. ### Setting up and running Jupyter First, install essential packages for Jupyter (using sudo): $ sudo dnf install python3-notebook mathjax sscg

You might want to install additional and optional Python modules commonly used by data scientists:

$sudo dnf install python3-seaborn python3-lxml python3-basemap python3-scikit-image python3-scikit-learn python3-sympy python3-dask+dataframe python3-nltk Set a password to log into Notebook web interface and avoid those long tokens. Run the following command anywhere on your terminal: $ mkdir -p $HOME/.jupyter$ jupyter notebook password

Now, type a password for yourself. This will create the file $HOME/.jupyter/jupyter_notebook_config.json with your encrypted password. Next, prepare for SSLby generating a self-signed HTTPS certificate for Jupyter’s web server: $ cd $HOME/.jupyter; sscg Finish configuring Jupyter by editing your$HOME/.jupyter/jupyter_notebook_config.json file. Make it look like this:

{ "NotebookApp": { "password": "sha1:abf58...87b", "ip": "*", "allow_origin": "*", "allow_remote_access": true, "open_browser": false, "websocket_compression_options": {}, "certfile": "/home/aviram/.jupyter/service.pem", "keyfile": "/home/aviram/.jupyter/service-key.pem", "notebook_dir": "/home/aviram/Notebooks" }} 

The parts in red must be changed to match your folders. Parts in blue were already there after you created your password. Parts in green are the crypto-related files generated by sscg.

Create a folder for your notebook files, as configured in the notebook_dir setting above:

$mkdir$HOME/Notebooks

Now you are all set. Just run Jupyter Notebook from anywhere on your system by typing:

$jupyter notebook Or add this line to your$HOME/.bashrc file to create a shortcut command called jn:

alias jn='jupyter notebook'

After running the command jn, access https://your-fedora-host.com:8888 from any browser on the network to see the Jupyter user interface. You’ll need to use the password you set up earlier. Start typing some Python code and markup text. This is how it looks:

In addition to the IPython environment, you’ll also get a web-based Unix terminal provided by terminado. Some people might find this useful, while others find this insecure. You can disable this feature in the config file.

## JupyterLab — the next generation of Jupyter

JupyterLab is the next generation of Jupyter, with a better interface and more control over your workspace. It’s currently not RPM-packaged for Fedora at the time of writing, but you can use pip to get it installed easily:

$pip3 install jupyterlab --user$ jupyter serverextension enable --py jupyterlab

Then run your regular jupiter notebook command or jn alias. JupyterLab will be accessible from http://your-linux-host.com:8888/lab.

## Tools used by data scientists

In this section you can get to know some of these tools, and how to install them. Unless noted otherwise, the module is already packaged for Fedora and was installed as prerequisites for previous components.

### Numpy

Numpy is an advanced and C-optimized math library designed to work with large in-memory datasets. It provides advanced multidimensional matrix support and operations, including math functions as log(), exp(), trigonometry etc.

### Pandas

In this author’s opinion, Python is THE platform for data science mostly because of Pandas. Built on top of numpy, Pandas makes easy the work of preparing and displaying data. You can think of it as a no-UI spreadsheet, but ready to work with much larger datasets. Pandas helps with data retrieval from a SQL database, CSV or other types of files, columns and rows manipulation, data filtering and, to some extent, data visualization with matplotlib.

### Matplotlib

Matplotlib is a library to plot 2D and 3D data. It has great support for notations in graphics, labels and overlays

### Seaborn

Built on top of matplotlib, Seaborn’s graphics are optimized for a more statistical comprehension of data. It automatically displays regression lines or Gauss curve approximations of plotted data.

### StatsModels

StatsModels provides algorithms for statistical and econometrics data analysis such as linear and logistic regressions. Statsmodel is also home for the classical family of time series algorithms known as ARIMA.

### Scikit-learn

The central piece of the machine-learning ecosystem, scikit provides predictor algorithms for regression (Elasticnet, Gradient Boosting, Random Forest etc) and classification and clustering (K-means, DBSCAN etc). It features a very well designed API. Scikit also has classes for advanced data manipulation, dataset split into train and test parts, dimensionality reduction and data pipeline preparation.

### XGBoost

XGBoost is the most advanced regressor and classifier used nowadays. It’s not part of scikit-learn, but it adheres to scikit’s API. XGBoost is not packaged for Fedora and should be installed with pip. XGBoost can be accelerated with your nVidia GPU, but not through its pip package. You can get this if you compile it yourself against CUDA. Get it with:

### NLTK

The Natural Language toolkit, or NLTK, helps you work with human language data for the purpose of building chatbots (just to cite an example).

### SHAP

Machine learning algorithms are very good on predicting, but aren’t good at explaining why they made a prediction. SHAP solves that, by analyzing trained models.

Install it with pip:

$pip3 install shap --user ### Keras Keras is a library for deep learning and neural networks. Install it with pip: $ sudo dnf instal python3-h5py$pip3 install keras --user ### TensorFlow TensorFlow is a popular neural networks builder. Install it with pip: $ pip3 install tensorflow --user

Photo courtesy of FolsomNatural on Flickr (CC BY-SA 2.0).

# RPM packages explained

Perhaps the best known way the Fedora community pursues its mission of promoting free and open source software and content is by developing the Fedora software distribution. So it’s not a surprise at all that a very large proportion of our community resources are spent on this task. This post summarizes how this software is “packaged” and the underlying tools such as rpm that make it all possible.

## RPM: the smallest unit of software

The editions and flavors (spins/labs/silverblue) that users get to choose from are all very similar. They’re all composed of various software that is mixed and matched to work well together. What differs between them is the exact list of tools that goes into each. That choice depends on the use case that they target. The basic unit of all of these is an RPM package file.

RPM files are archives that are similar to ZIP files or tarballs. In fact, they uses compression to reduce the size of the archive. However, along with files, RPM archives also contain metadata about the package. This can be queried using the rpm tool:

 $rpm -q fpaste fpaste-0.3.9.2-2.fc30.noarch$ rpm -qi fpaste
Name        : fpaste
Version     : 0.3.9.2
Release     : 2.fc30
Architecture: noarch
Install Date: Tue 26 Mar 2019 08:49:10 GMT
Group       : Unspecified
Size        : 64144
Signature   : RSA/SHA256, Thu 07 Feb 2019 15:46:11 GMT, Key ID ef3c111fcfc659b9
Source RPM  : fpaste-0.3.9.2-2.fc30.src.rpm
Build Date  : Thu 31 Jan 2019 20:06:01 GMT
Build Host  : buildhw-07.phx2.fedoraproject.org
Relocations : (not relocatable)
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : https://pagure.io/fpaste
Bug URL     : https://bugz.fedoraproject.org/fpaste
Summary     : A simple tool for pasting info onto sticky notes instances
Description :
It is often useful to be able to easily paste text to the Fedora
Pastebin at http://paste.fedoraproject.org and this simple script
will do that and return the resulting URL so that people may
examine the output. This can hopefully help folks who are for
some reason stuck without X, working remotely, or any other
reason they may be unable to paste something into the pastebin

$rpm -ql fpaste /usr/bin/fpaste /usr/share/doc/fpaste /usr/share/doc/fpaste/README.rst /usr/share/doc/fpaste/TODO /usr/share/licenses/fpaste /usr/share/licenses/fpaste/COPYING /usr/share/man/man1/fpaste.1.gz  When an RPM package is installed, the rpm tools know exactly what files were added to the system. So, removing a package also removes these files, and leaves the system in a consistent state. This is why installing software using rpm is preferred over installing software from source whenever possible. ## Dependencies Nowadays, it is quite rare for software to be completely self-contained. Even fpaste, a simple one file Python script, requires that the Python interpreter be installed. So, if the system does not have Python installed (highly unlikely, but possible), fpaste cannot be used. In packager jargon, we say that “Python is a run-time dependency of fpaste“. When RPM packages are built (the process of building RPMs is not discussed in this post), the generated archive includes all of this metadata. That way, the tools interacting with the RPM package archive know what else must must be installed so that fpaste works correctly: $ rpm -q --requires fpaste
/usr/bin/python3
python3
rpmlib(CompressedFileNames) &lt;= 3.0.4-1
rpmlib(FileDigests) &lt;= 4.6.0-1

$rpm -q --provides fpaste fpaste = 0.3.9.2-2.fc30$ rpm -qi python3
Name        : python3
Version     : 3.7.3
Release     : 3.fc30
Architecture: x86_64
Install Date: Thu 16 May 2019 18:51:41 BST
Group       : Unspecified
Size        : 46139
Signature   : RSA/SHA256, Sat 11 May 2019 17:02:44 BST, Key ID ef3c111fcfc659b9
Source RPM  : python3-3.7.3-3.fc30.src.rpm
Build Date  : Sat 11 May 2019 01:47:35 BST
Build Host  : buildhw-05.phx2.fedoraproject.org
Relocations : (not relocatable)
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : https://www.python.org/
Bug URL     : https://bugz.fedoraproject.org/python3
Summary     : Interpreter of the Python programming language
Description :
Python is an accessible, high-level, dynamically typed, interpreted programming
language, designed with an emphasis on code readability.
It includes an extensive standard library, and has a vast ecosystem of
third-party libraries.

The python3 package provides the "python3" executable: the reference
interpreter for the Python language, version 3.
The majority of its standard library is provided in the python3-libs package,
which should be installed automatically along with python3.
The remaining parts of the Python standard library are broken out into the
python3-tkinter and python3-test packages, which may need to be installed
separately.

Documentation for Python is provided in the python3-docs package.

Packages containing additional libraries for Python are generally named with
the "python3-" prefix.

$rpm -q --provides python3 python(abi) = 3.7 python3 = 3.7.3-3.fc30 python3(x86-64) = 3.7.3-3.fc30 python3.7 = 3.7.3-3.fc30 python37 = 3.7.3-3.fc30  ## Resolving RPM dependencies While rpm knows the required dependencies for each archive, it does not know where to find them. This is by design: rpm only works on local files and must be told exactly where they are. So, if you try to install a single RPM package, you get an error if rpm cannot find the package’s run-time dependencies. This example tries to install a package downloaded from the Fedora package set: $ ls
python3-elephant-0.6.2-3.fc30.noarch.rpm

$rpm -qpi python3-elephant-0.6.2-3.fc30.noarch.rpm Name : python3-elephant Version : 0.6.2 Release : 3.fc30 Architecture: noarch Install Date: (not installed) Group : Unspecified Size : 2574456 License : BSD Signature : (none) Source RPM : python-elephant-0.6.2-3.fc30.src.rpm Build Date : Fri 14 Jun 2019 17:23:48 BST Build Host : buildhw-02.phx2.fedoraproject.org Relocations : (not relocatable) Packager : Fedora Project Vendor : Fedora Project URL : http://neuralensemble.org/elephant Bug URL : https://bugz.fedoraproject.org/python-elephant Summary : Elephant is a package for analysis of electrophysiology data in Python Description : Elephant - Electrophysiology Analysis Toolkit Elephant is a package for the analysis of neurophysiology data, based on Neo.$ rpm -qp --requires python3-elephant-0.6.2-3.fc30.noarch.rpm
python(abi) = 3.7
python3.7dist(neo) >= 0.7.1
python3.7dist(numpy) >= 1.8.2
python3.7dist(quantities) >= 0.10.1
python3.7dist(scipy) >= 0.14.0
python3.7dist(six) >= 1.10.0
rpmlib(CompressedFileNames) &lt;= 3.0.4-1
rpmlib(FileDigests) &lt;= 4.6.0-1

$sudo rpm -i ./python3-elephant-0.6.2-3.fc30.noarch.rpm error: Failed dependencies: python3.7dist(neo) >= 0.7.1 is needed by python3-elephant-0.6.2-3.fc30.noarch python3.7dist(quantities) >= 0.10.1 is needed by python3-elephant-0.6.2-3.fc30.noarch  In theory, one could download all the packages that are required for python3-elephant, and tell rpm where they all are, but that isn’t convenient. What if python3-neo and python3-quantities have other run-time requirements and so on? Very quickly, the dependency chain can get quite complicated. ### Repositories Luckily, dnf and friends exist to help with this issue. Unlike rpm, dnf is aware of repositories. Repositories are collections of packages, with metadata that tells dnf what these repositories contain. All Fedora systems come with the default Fedora repositories enabled by default: $ sudo dnf repolist
repo id              repo name                             status
fedora               Fedora 30 - x86_64                    56,582
fedora-modular       Fedora Modular 30 - x86_64               135
updates-testing      Fedora 30 - x86_64 - Test Updates      8,458 

There’s more information on these repositories, and how they can be managed on the Fedora quick docs.

dnf can be used to query repositories for information on the packages they contain. It can also search them for software, or install/uninstall/upgrade packages from them:

 $sudo dnf search elephant Last metadata expiration check: 0:05:21 ago on Sun 23 Jun 2019 14:33:38 BST. ============================================================================== Name &amp; Summary Matched: elephant ============================================================================== python3-elephant.noarch : Elephant is a package for analysis of electrophysiology data in Python python3-elephant.noarch : Elephant is a package for analysis of electrophysiology data in Python$ sudo dnf list \*elephant\*
Last metadata expiration check: 0:05:26 ago on Sun 23 Jun 2019 14:33:38 BST.
Available Packages
python3-elephant.noarch      0.6.2-3.fc30              updates 

### Installing dependencies

When installing the package using dnf now, it resolves all the required dependencies, then calls rpm to carry out the transaction:

 $sudo dnf install python3-elephant Last metadata expiration check: 0:06:17 ago on Sun 23 Jun 2019 14:33:38 BST. Dependencies resolved. ============================================================================================================================================================================================== Package Architecture Version Repository Size ============================================================================================================================================================================================== Installing: python3-elephant noarch 0.6.2-3.fc30 updates-testing 456 k Installing dependencies: python3-neo noarch 0.8.0-0.1.20190215git49b6041.fc30 fedora 753 k python3-quantities noarch 0.12.2-4.fc30 fedora 163 k Installing weak dependencies: python3-igor noarch 0.3-5.20150408git2c2a79d.fc30 fedora 63 k Transaction Summary ============================================================================================================================================================================================== Install 4 Packages Total download size: 1.4 M Installed size: 7.0 M Is this ok [y/N]: y Downloading Packages: (1/4): python3-igor-0.3-5.20150408git2c2a79d.fc30.noarch.rpm 222 kB/s | 63 kB 00:00 (2/4): python3-elephant-0.6.2-3.fc30.noarch.rpm 681 kB/s | 456 kB 00:00 (3/4): python3-quantities-0.12.2-4.fc30.noarch.rpm 421 kB/s | 163 kB 00:00 (4/4): python3-neo-0.8.0-0.1.20190215git49b6041.fc30.noarch.rpm 840 kB/s | 753 kB 00:00 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Total 884 kB/s | 1.4 MB 00:01 Running transaction check Transaction check succeeded. Running transaction test Transaction test succeeded. Running transaction Preparing : 1/1 Installing : python3-quantities-0.12.2-4.fc30.noarch 1/4 Installing : python3-igor-0.3-5.20150408git2c2a79d.fc30.noarch 2/4 Installing : python3-neo-0.8.0-0.1.20190215git49b6041.fc30.noarch 3/4 Installing : python3-elephant-0.6.2-3.fc30.noarch 4/4 Running scriptlet: python3-elephant-0.6.2-3.fc30.noarch 4/4 Verifying : python3-elephant-0.6.2-3.fc30.noarch 1/4 Verifying : python3-igor-0.3-5.20150408git2c2a79d.fc30.noarch 2/4 Verifying : python3-neo-0.8.0-0.1.20190215git49b6041.fc30.noarch 3/4 Verifying : python3-quantities-0.12.2-4.fc30.noarch 4/4 Installed: python3-elephant-0.6.2-3.fc30.noarch python3-igor-0.3-5.20150408git2c2a79d.fc30.noarch python3-neo-0.8.0-0.1.20190215git49b6041.fc30.noarch python3-quantities-0.12.2-4.fc30.noarch Complete!  Notice how dnf even installed python3-igor, which isn’t a direct dependency of python3-elephant. ## DnfDragora: a graphical interface to DNF While technical users may find dnf straightforward to use, it isn’t for everyone. Dnfdragora addresses this issue by providing a graphical front end to dnf. From a quick look, dnfdragora appears to provide all of dnf‘s main functions. There are other tools in Fedora that also manage packages. GNOME Software, and Discover are two examples. GNOME Software is focused on graphical applications only. You can’t use the graphical front end to install command line or terminal tools such as htop or weechat. However, GNOME Software does support the installation of Flatpaks and Snap applications which dnf does not. So, they are different tools with different target audiences, and so provide different functions. This post only touches the tip of the iceberg that is the life cycle of software in Fedora. This article explained what RPM packages are, and the main differences between using rpm and using dnf. In future posts, we’ll speak more about: • The processes that are needed to create these packages • How the community tests them to ensure that they are built correctly • The infrastructure that the community uses to get them to community users in future posts. # Personal assistant with Mycroft and Fedora Looking for an open source personal assistant ? Mycroft is allowing you to run an open source service which gives you better control of your data. ## Install Mycroft on Fedora Mycroft is currently not available in the official package collection, but it can be easily installed from the project source. The first step is to download the source from Mycroft’s GitHub repository. $ git clone https://github.com/MycroftAI/mycroft-core.git

Mycroft is a Python application and the project provides a script that takes care of creating a virtual environment before installing Mycroft and its dependencies.

$cd mycroft-core$ ./dev_setup.sh

The installation script prompts the user to help him with the installation process. It is recommended to run the stable version and get automatic updates.

When prompted to install locally the Mimic text-to-speech engine, answer No. Since as described in the installation process this can take a long time and Mimic is available as an rpm package in Fedora so it can be installed using dnf.

$sudo dnf install mimic ## Starting Mycroft After the installation is complete, the Mycroft services can be started using the following script. $ ./start-mycroft.sh all

In order to start using Mycroft the device running the service needs to be registered. To do that an account is needed and can be created at https://home.mycroft.ai/.

Once the account created, it is possible to add a new device at the following address https://account.mycroft.ai/devices. Adding a new device requires a pairing code that will be spoken to you by your device after starting all the services.

The device is now ready to be used.

## Using Mycroft

Mycroft provides a set of skills that are enabled by default or can be downloaded from the Marketplace. To start you can simply ask Mycroft how is doing, or what the weather is.

Hey Mycroft, how are you ?Hey Mycroft, what's the weather like ?`

If you are interested in how things works, the start-mycroft.sh script provides a cli option that lets you interact with the services using the command line. It is also displaying logs which is really useful for debugging.

Mycroft is always trying to learn new skills, and there are many way to help by contributing the Mycroft community.

Photo by Przemyslaw Marczynski on Unsplash