Posted on Leave a comment

Using data from spreadsheets in Fedora with Python

Python is one of the most popular and powerful programming languages available. Because it’s free and open source, it’s available to everyone — and most Fedora systems come with the language already installed. Python is useful for a wide variety of tasks, but among them is processing comma-separated value (CSV) data. CSV files often start off life as tables or spreadsheets. This article shows how to get started working with CSV data in Python 3.

CSV data is precisely what it sounds like. A CSV file includes one row of data at a time, with data values separated by commas. Each row is defined by the same fields. Short CSV files are often easily read and understood. But longer data files, or those with more fields, may be harder to parse with the naked eye, so computers work better in those cases.

Here’s a simple example where the fields are Name, Email, and Country. In this example, the CSV data includes a field definition as the first row, although that is not always the case.

Name,Email,Country
John Q. Smith,[email protected],USA
Petr Novak,[email protected],CZ
Bernard Jones,[email protected],UK

Reading CSV from spreadsheets

Python helpfully includes a csv module that has functions for reading and writing CSV data. Most spreadsheet applications, both native like Excel or Numbers, and web-based such as Google Sheets, can export CSV data. In fact, many other services that can publish tabular reports will also export as CSV (PayPal for instance).

The Python csv module has a built in reader method called DictReader that can deal with each data row as an ordered dictionary (OrderedDict). It expects a file object to access the CSV data. So if our file above is called example.csv in the current directory, this code snippet is one way to get at this data:

f = open('example.csv', 'r')
from csv import DictReader
d = DictReader(f)
data = []
for row in d: data.append(row)

Now the data object in memory is a list of OrderedDict objects :

[OrderedDict([('Name', 'John Q. Smith'), ('Email', '[email protected]'), ('Country', 'USA')]), OrderedDict([('Name', 'Petr Novak'), ('Email', '[email protected]'), ('Country', 'CZ')]), OrderedDict([('Name', 'Bernard Jones'), ('Email', '[email protected]'), ('Country', 'UK')])]

Referencing each of these objects is easy:

>>> print(data[0]['Country'])
USA
>>> print(data[2]['Email'])
[email protected]

By the way, if you have to deal with a CSV file with no header row of field names, the DictReader class lets you define them. In the example above, add the fieldnames argument and pass a sequence of the names:

d = DictReader(f, fieldnames=['Name', 'Email', 'Country'])

A real world example

I recently wanted to pick a random winner from a long list of individuals. The CSV data I pulled from spreadsheets was a simple list of names and email addresses.

Fortunately, Python also has a helpful random module good for generating random values. The randrange function in the Random class from that module was just what I needed. You can give it a regular range of numbers — like integers — and a step value between them. The function then generates a random result, meaning I could get a random integer (or row number!) back within the total number of rows in my data.

So this small program worked well:

from csv import DictReader
from random import Random d = DictReader(open('mydata.csv'))
data = []
for row in d: data.append(row) r = Random()
winner = data[r.randrange(0, len(data), 1)]
print('The winner is:', winner['Name'])
print('Email address:', winner['Email'])

Obviously this example is extremely simple. Spreadsheets themselves include sophisticated ways to analyze data. However, if you want to do something outside the realm of your spreadsheet app, Python may be just the trick!


Photo by Isaac Smith on Unsplash.

Posted on Leave a comment

Best of 2019: Fedora for developers

With the end of the year approaching fast, it is a good time to look back at 2019 and go through the most popular articles on Fedora Magazine written by our contributors.

In this article of the “Best of 2019” series, we are looking at developers and how to use Fedora to be a great developer workstation

Make your Python code look good with Black on Fedora

Black made quite a big impact in the Python ecosystem this year. The project is now part of the Python Software Foundation and it is used by many different projects. So if you write or maintain some Python code and want to stop having to care about code style and code formatting you should check out this article.

How to run virtual machines with virt-manager

Setting up a development environment, running integration tests, testing a new feature, or running an older version of software for all these use cases being able to create and run a virtual machine is a must have knowledge for a developer. This article will walk you through how you can achieve that using virt-manager on your Fedora workstation.

Jupyter and data science in Fedora

With the rise of Data science and machine learning, the Jupyter IDE has become of very popular choice to share or present a program and its results. This article goes into the details of installing and using Jupyter and the different libraries and tools useful for data science.

Building Smaller Container Images

Fedora provides different container images, one of which is a minimal base image. The following article demonstrate how one can use this image to build smaller container images.

Getting Started with Go on Fedora

In 2019 the Go programming language turned 10 year old. In ten years the language has managed to become the default choice for cloud native applications and the cloud ecosystems. Fedora is providing an easy way to start developing in Go, this article takes you through the first step needed to get started.

Stay tuned to the Magazine for other upcoming “Best of 2019” categories. All of us at the Magazine hope you have a great end of year and holiday season.

Posted on Leave a comment

Manage business documents with OpenAS2 on Fedora

Business documents often require special handling. Enter Electronic Document Interchange, or EDI. EDI is more than simply transferring files using email or http (or ftp), because these are documents like orders and invoices. When you send an invoice, you want to be sure that:

1. It goes to the right destination, and is not intercepted by competitors.
2. Your invoice cannot be forged by a 3rd party.
3. Your customer can’t claim in court that they never got the invoice.

The first two goals can be accomplished by HTTPS or email with S/MIME, and in some situations, a simple HTTPS POST to a web API is sufficient. What EDI adds is the last part.

This article does not cover the messy topic of formats for the files exchanged. Even when using a standardized format like ANSI or EDIFACT, it is ultimately up to the business partners. It is not uncommon for business partners to use an ad-hoc CSV file format. This article shows you how to configure Fedora to send and receive in an EDI setup.

Centralized EDI

The traditional solution is to use a Value Added Network, or VAN. The VAN is a central hub that transfers documents between their customers. Most importantly, it keeps a secure record of the documents exchanged that can be used as evidence in disputes. The VAN can use different transfer protocols for each of its customers

AS Protocols and MDN

The AS protocols are a specification for adding a digital signature with optional encryption to an electronic document. What it adds over HTTPS or S/MIME is the Message Disposition Notification, or MDN. The MDN is a signed and dated response that says, in essence, “We got your invoice.” It uses a secure hash to identify the specific document received. This addresses point #3 without involving a third party.

The AS2 protocol uses HTTP or HTTPS for transport. Other AS protocols target FTP and SMTP. AS2 is used by companies big and small to avoid depending on (and paying) a VAN.

OpenAS2

OpenAS2 is an open source Java implemention of the AS2 protocol. It is available in Fedora since 28, and installed with:

$ sudo dnf install openas2
$ cd /etc/openas2

Configuration is done with a text editor, and the config files are in XML. The first order of business before starting OpenAS2 is to change the factory passwords.

Edit /etc/openas2/config.xml and search for ChangeMe. Change those passwords. The default password on the certificate store is testas2, but that doesn’t matter much as anyone who can read the certificate store can read config.xml and get the password.

What to share with AS2 partners

There are 3 things you will exchange with an AS2 peer.

AS2 ID

Don’t bother looking up the official AS2 standard for legal AS2 IDs. While OpenAS2 implements the standard, your partners will likely be using a proprietary product which doesn’t. While AS2 allows much longer IDs, many implementations break with more than 16 characters. Using otherwise legal AS2 ID chars like ‘:’ that can appear as path separators on a proprietary OS is also a problem. Restrict your AS2 ID to upper and lower case alpha, digits, and ‘_’ with no more than 16 characters.

SSL certificate

For real use, you will want to generate a certificate with SHA256 and RSA. OpenAS2 ships with two factory certs to play with. Don’t use these for anything real, obviously. The certificate file is in PKCS12 format. Java ships with keytool which can maintain your PKCS12 “keystore,” as Java calls it. This article skips using openssl to generate keys and certificates. Simply note that sudo keytool -list -keystore as2_certs.p12 will list the two factory practice certs.

AS2 URL

This is an HTTP URL that will access your OpenAS2 instance. HTTPS is also supported, but is redundant. To use it you have to uncomment the https module configuration in config.xml, and supply a certificate signed by a public CA. This requires another article and is entirely unnecessary here.

By default, OpenAS2 listens on 10080 for HTTP and 10443 for HTTPS. OpenAS2 can talk to itself, so it ships with two partnerships using http://localhost:10080 as the AS2 URL. If you don’t find this a convincing demo, and can install a second instance (on a VM, for instance), you can use private IPs for the AS2 URLs. Or install Cjdns to get IPv6 mesh addresses that can be used anywhere, resulting in AS2 URLs like http://[fcbf:fc54:e597:7354:8250:2b2e:95e6:d6ba]:10080.

Most businesses will also want a list of IPs to add to their firewall. This is actually bad practice. An AS2 server has the same security risk as a web server, meaning you should isolate it in a VM or container. Also, the difficulty of keeping mutual lists of IPs up to date grows with the list of partners. The AS2 server rejects requests not signed by a configured partner.

OpenAS2 Partners

With that in mind, open partnerships.xml in your editor. At the top is a list of “partners.” Each partner has a name (referenced by the partnerships below as “sender” or “receiver”), AS2 ID, certificate, and email. You need a partner definition for yourself and those you exchange documents with. You can define multiple partners for yourself. OpenAS2 ships with two partners, OpenAS2A and OpenAS2B, which you’ll use to send a test document.

OpenAS2 Partnerships

Next is a list of “partnerships,” one for each direction. Each partnership configuration includes the sender, receiver, and the AS2 URL used to send the documents. By default, partnerships use synchronous MDN. The MDN is returned on the same HTTP transaction. You could uncomment the as2_receipt_option for asynchronous MDN, which is sent some time later. Use synchronous MDN whenever possible, as tracking pending MDNs adds complexity to your application.

The other partnership options select encryption, signature hash, and other protocol options. A fully implemented AS2 receiver can handle any combination of options, but AS2 partners may have incomplete implementations or policy requirements. For example, DES3 is a comparatively weak encryption algorithm, and may not be acceptable. It is the default because it is almost universally implemented.

If you went to the trouble to set up a second physical or virtual machine for this test, designate one as OpenAS2A and the other as OpenAS2B. Modify the as2_url on the OpenAS2A-to-OpenAS2B partnership to use the IP (or hostname) of OpenAS2B, and vice versa for the OpenAS2B-to-OpenAS2A partnership. Unless they are using the FedoraWorkstation firewall profile, on both machines you’ll need:

# sudo firewall-cmd --zone=public --add-port=10080/tcp

Now start the openas2 service (on both machines if needed):

# sudo systemctl start openas2

Resetting the MDN password

This initializes the MDN log database with the factory password, not the one you changed it to. This is a packaging bug to be fixed in the next release. To avoid frustration, here’s how to change the h2 database password:

$ sudo systemctl stop openas2
$ cat >h2passwd <<'DONE'
#!/bin/bash
AS2DIR="/var/lib/openas2"
java -cp "$AS2DIR"/lib/h2* org.h2.tools.Shell \
-url jdbc:h2:"$AS2DIR"/db/openas2 \
-user sa -password "$1" <<EOF
alter user sa set password '$2';
exit
EOF
DONE
$ sudo sh h2passwd ChangeMe yournewpasswordsetabove
$ sudo systemctl start openas2

Testing the setup

With that out of the way, let’s send a document. Assuming you are on OpenAS2A machine:

$ cat >testdoc <<'DONE'
This is not a real EDI format, but is nevertheless a document.
DONE
$ sudo chown openas2 testdoc
$ sudo mv testdoc /var/spool/openas2/toOpenAS2B
$ sudo journalctl -f -u openas2
... log output of sending file, Control-C to stop following log
^C

OpenAS2 does not send a document until it is writable by the openas2 user or group. As a consequence, your actual business application will copy, or generate in place, the document. Then it changes the group or permissions to send it on its way, to avoid sending a partial document.

Now, on the OpenAS2B machine, /var/spool/openas2/OpenAS2A_OID-OpenAS2B_OID/inbox shows the message received. That should get you started!


Photo by Beatriz Pérez Moya on Unsplash.