
The Type of User Input

In console applications, user input mostly arrives as arguments supplied when the user invokes the program. We will explore how argparse, the argument parser in Python 3’s standard library, converts this input to an appropriate type. It has a few quirks that offer additional insight into (duck) typing and user input validation.

Basic argparse

Not everyone knows argparse, or Python for that matter, so we start with a basic example. Consult the official Python 3 documentation about argparse for details about this handy module.

Here is a very basic example of an argument parser using argparse.

import argparse

# Construct a parser
parser = argparse.ArgumentParser()

# Add arguments
parser.add_argument('my_positional_arg')
parser.add_argument('--my-other-arg')

# Parse the argv
args = parser.parse_args()

# Print the arguments received
print(args.my_positional_arg)
print(args.my_other_arg)

Calling this script using python .\argparser.py 1 --my-other-arg 2 will yield the following result.

1
2

Nothing exceptional happened here. However, the example silently assumes that both arguments are integers. Their actual type is str, because that is what argparse defaults to. In fact, any other string could be provided and argparse would gullibly accept it as well.
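
To see this without a shell, parse_args also accepts an explicit list of strings in place of sys.argv. A minimal sketch, reusing the argument names from the script above; the input values are made up for illustration:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('my_positional_arg')
parser.add_argument('--my-other-arg')

# Pass the arguments explicitly instead of reading sys.argv
args = parser.parse_args(['1', '--my-other-arg', 'banana'])

# Both values are plain strings; nothing was validated
print(type(args.my_positional_arg))
print(repr(args.my_other_arg))
```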

We should fix that.

argparse type and choices

argparse comes with some built-in functionality for handling types. From the argparse documentation:

By default, ArgumentParser objects read command-line arguments in as simple strings. However, quite often the command-line string should instead be interpreted as another type, like a float or int. The type keyword argument of add_argument() allows any necessary type-checking and type conversions to be performed. Common built-in types and functions can be used directly as the value of the type argument:

https://docs.python.org/3/library/argparse.html#type

We might also want to limit the user’s choices, e.g. to only 0 and 1. Again we have a look at the documentation and bingo!

Some command-line arguments should be selected from a restricted set of values. These can be handled by passing a container object as the choices keyword argument to add_argument(). When the command line is parsed, argument values will be checked, and an error message will be displayed if the argument was not one of the acceptable values:

https://docs.python.org/3/library/argparse.html#choices

The extended script looks like this.

import argparse

parser = argparse.ArgumentParser()

allowed_values = [0, 1]

parser.add_argument('my_positional_arg',
                    type=int, choices=allowed_values)
parser.add_argument('--my-other-arg',
                    type=int, choices=allowed_values)
args = parser.parse_args()

print(args.my_positional_arg)
print(type(args.my_positional_arg))
print(args.my_other_arg)
print(type(args.my_other_arg))

We call it again with python .\argparser.py 1 --my-other-arg 1 and receive the following output:

1
<class 'int'>
1
<class 'int'>

Calling it with a value that’s not in the choices gives a helpful tip about it:

usage: argparser.py [-h] [--my-other-arg {0,1}] {0,1}
argparser.py: error: argument --my-other-arg: invalid choice: 2 (choose from 0, 1)
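
The order of operations matters here: argparse applies type first and then compares the converted value against choices, which is why a list of integers works. A small sketch of this, assuming the same optional argument as above. On failure argparse prints the usage message to stderr and exits with status 2, which surfaces in Python as SystemExit:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--my-other-arg', type=int, choices=[0, 1])

# '1' is converted to the integer 1 before it is compared with choices
args = parser.parse_args(['--my-other-arg', '1'])
print(args.my_other_arg == 1)

# A value outside choices makes argparse print the usage text and exit
try:
    parser.parse_args(['--my-other-arg', '2'])
    exit_code = None
except SystemExit as exc:
    exit_code = exc.code
print(exit_code)
```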

Okay, but how can we deal with more complex scenarios? For example, we might want to accept only even numbers, or accept yes and no for True and False.

Duck Typing at its Best

What would we typically expect the type keyword argument to accept? Probably classes and primitives. In fact, anything we can get via type() will probably work just fine if it can be constructed from a single string.
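
A few standard-library classes that can be constructed from a single string illustrate this. The sketch below uses float, pathlib.Path and fractions.Fraction, which are examples of my own choosing, as are the option names. It also shows one trap: bool can be constructed from a string too, but every non-empty string is truthy, so bool is a poor choice for type.

```python
import argparse
from fractions import Fraction
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument('--ratio', type=Fraction)
parser.add_argument('--scale', type=float)
parser.add_argument('--out', type=Path)
parser.add_argument('--flag', type=bool)  # looks plausible, but see below

args = parser.parse_args(
    ['--ratio', '3/4', '--scale', '1.5',
     '--out', 'build/report.txt', '--flag', 'False'])

print(args.ratio)     # a Fraction built from the string '3/4'
print(args.scale)     # a float
print(args.out.name)  # a Path, so path attributes are available
print(args.flag)      # True, because bool('False') is True
```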

But what more is a type to Python than the return value of a function? In fact,

type= can take any callable that takes a single string argument and returns the converted value

https://docs.python.org/3/library/argparse.html#type

Typically, these callables are classes: by convention, calling a class constructs an instance and runs its __init__. But nothing prevents us from using a lambda or any other function as a type.

parser.add_argument('my_positional_arg',
                    type=lambda x:
                         int(x) if int(x) % 2 == 0 else None)

This keeps only even numbers; for odd numbers None is stored instead. The type of the result will therefore be reported as either <class 'int'> or <class 'NoneType'>. It is essentially a union of these two.
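
The yes/no case mentioned earlier works the same way with a named function. A minimal sketch, where yes_no and the --confirm option are hypothetical names chosen for illustration. Like the lambda above, it returns None for anything it does not recognise rather than rejecting it:

```python
import argparse

def yes_no(text):
    # Map the accepted spellings to booleans; unknown input yields None
    mapping = {'yes': True, 'no': False}
    return mapping.get(text.strip().lower())

parser = argparse.ArgumentParser()
parser.add_argument('--confirm', type=yes_no)

print(parser.parse_args(['--confirm', 'Yes']).confirm)
print(parser.parse_args(['--confirm', 'no']).confirm)
print(parser.parse_args(['--confirm', 'maybe']).confirm)
```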

This allows parsing user input into a type we can use later. In general, if this kind of input validation is required for command-line arguments, it makes sense to raise a ValueError describing which constraints must be fulfilled; argparse catches a ValueError raised by the type callable and reports it as a usage error. If more reliability is required, raising probably still makes sense, but the error will likely have to be caught and handled explicitly (think of answering a POST request with an HTTP error). Here is a simple example:

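import argparse

def construct_even(x):
    # Convert once; int() itself raises ValueError for non-numeric input
    value = int(x)
    if value % 2 != 0:
        # argparse reports this as "invalid construct_even value"
        raise ValueError('argument must be even')
    return value

parser = argparse.ArgumentParser()
parser.add_argument('my_positional_arg', type=construct_even)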
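
One detail worth knowing: when a type callable raises ValueError or TypeError, argparse reports a generic message of the form invalid construct_even value: '3', and the text passed to the exception is not shown. Raising argparse.ArgumentTypeError instead puts the custom text into the error output. A sketch of that variant, where even_number is a hypothetical name:

```python
import argparse

def even_number(text):
    # Translate conversion failures into an error whose text argparse displays
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f'{text!r} is not an integer')
    if value % 2 != 0:
        raise argparse.ArgumentTypeError(f'{value} is not even')
    return value

parser = argparse.ArgumentParser()
parser.add_argument('my_positional_arg', type=even_number)

print(parser.parse_args(['4']).my_positional_arg)
```

Called with 3, this parser prints the usage line followed by error: argument my_positional_arg: 3 is not even.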

This pattern is one of the simplest forms of input validation: try to construct a proper type from the user input, raise an error (or signal it by other means) if that fails, and handle the error at a level where a response to the user can be made. That is also why it makes sense to use it. It is easy to implement, catches input issues early, and can be applied to a wide range of user input scenarios.
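
Outside of argparse the same pattern looks roughly like this. The sketch assumes a hypothetical handle_submission function standing in for a request handler; the status codes and the (status, body) return shape are illustrative and not tied to any particular web framework.

```python
def construct_even(x):
    value = int(x)  # raises ValueError for non-numeric input
    if value % 2 != 0:
        raise ValueError('argument must be even')
    return value

def handle_submission(raw_value):
    # Validate at the boundary and turn a failure into a response for the user
    try:
        number = construct_even(raw_value)
    except ValueError as err:
        return 400, f'Bad Request: {err}'
    return 200, f'accepted {number}'

print(handle_submission('8'))
print(handle_submission('7'))
print(handle_submission('seven'))
```

The conversion function knows nothing about how errors are presented; only the handler decides what the user gets to see.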

By Tilmann Matthaei

I'm an aspiring software professional looking to share what I learn about reliable software along the way.

I hold a Bachelor's Degree in Applied Computer Science - Digital Media and Games Development and have been working in software development since 2018.
I have experience in embedded development (mostly in C++) as well as Continuous Integration and IT Security.

Feel free to contact me via tilmann@matthaei.dev.