iccas package


iccas.checks module

Sanity checks.

iccas.checks.totals_not_less_than_sum_of_sexes(data, variable)
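The name totals_not_less_than_sum_of_sexes suggests a check that, for a given variable (e.g. "cases" or "deaths"), the total is never smaller than the male count plus the female count. This is the same condition that fix_monotonicity preserves. The stdlib sketch below illustrates that condition on a list of row dicts standing in for DataFrame rows. The helper rows_violating_totals and its input format are hypothetical and not part of iccas.

```python
from typing import List, Mapping


def rows_violating_totals(rows: List[Mapping[str, int]], variable: str) -> List[int]:
    """Return the indices of rows where male_<variable> + female_<variable> > <variable>.

    Hypothetical helper: rows are plain dicts standing in for DataFrame rows.
    """
    violations = []
    for i, row in enumerate(rows):
        sex_sum = row[f"male_{variable}"] + row[f"female_{variable}"]
        if sex_sum > row[variable]:
            violations.append(i)
    return violations


rows = [
    {"cases": 10, "male_cases": 4, "female_cases": 5},  # ok: 9 <= 10 (1 of unknown sex)
    {"cases": 10, "male_cases": 6, "female_cases": 5},  # violation: 11 > 10
]
print(rows_violating_totals(rows, "cases"))  # [1]
```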

iccas.loading module


Returns the latest version of the ICCAS dataset in a pandas.DataFrame (in the same form as returned by load()).

This function uses RemoteFolderCache.get(), which caches the downloaded dataset in cache_dir.

  • requests.exceptions.ConnectionError – if the server is unreachable and no dataset is available in cache_dir

Return type

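The documented behavior (return the latest dataset, fall back on cache_dir when the server is unreachable, and raise only if nothing is cached) follows a common download-with-cache-fallback pattern. The following is a minimal stdlib sketch, assuming a caller-supplied fetch callable that raises the built-in ConnectionError (the real function surfaces requests.exceptions.ConnectionError instead). The names get_with_cache and latest.csv are hypothetical, and RemoteFolderCache's actual logic (for example any version checks) is not reproduced.

```python
from pathlib import Path
from typing import Callable


def get_with_cache(fetch: Callable[[], bytes], cache_dir: Path,
                   filename: str = "latest.csv") -> bytes:
    """Download via fetch(); on ConnectionError, fall back on the cached copy.

    Hypothetical sketch: raises ConnectionError only if the server is
    unreachable AND no cached copy exists in cache_dir.
    """
    cached = cache_dir / filename
    try:
        data = fetch()
    except ConnectionError:
        if cached.exists():
            return cached.read_bytes()  # offline: serve the cached dataset
        raise  # offline and nothing cached
    # Online: refresh the cache with the freshly downloaded dataset.
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return data
```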

iccas.loading.get_by_date(date, keep_date=False, cache_dir=PosixPath('/home/docs/.iccas'))
Return type

Tuple[DataFrame, Timestamp]


Returns a DataFrame with “age” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0)

Return type


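Assuming that "percentage" is each group's share of the total count (the documentation only says it is <= 1.0), the two columns relate as in this stdlib sketch. The helper with_percentages is hypothetical, uses a plain dict in place of the DataFrame, and the age labels in the example are assumed.

```python
from typing import Dict, Tuple


def with_percentages(counts: Dict[str, int]) -> Dict[str, Tuple[int, float]]:
    """Map each age (group) to (value, percentage), with percentage = value / total."""
    total = sum(counts.values())
    return {
        age: (value, value / total if total else 0.0)  # avoid division by zero
        for age, value in counts.items()
    }


print(with_percentages({"0-9": 1, "10-19": 3}))
# {'0-9': (1, 0.25), '10-19': (3, 0.75)}
```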

Returns a DataFrame with “age_group” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0)

Return type


iccas.loading.get_url(date=None, fmt='csv')

Returns the URL of a dataset in the given format. If date is None, returns the URL of the full dataset.

Return type


Return type


iccas.loading.load_single_date(path, keep_date=False)

Loads a dataset containing data for a single date.

By default (keep_date=False), the date column is dropped and the datetime is stored in the attrs of the DataFrame. If instead keep_date=True, the returned dataset has a MultiIndex (date, age_group).

  • path (Union[str, Path]) –

  • keep_date (bool) – whether to keep the date column (containing a single datetime value) as the first level of a (date, age_group) MultiIndex instead of dropping it

Return type


iccas.processing module

iccas.processing.fix_monotonicity(data, method='pchip', **interpolation)

Replaces the stretches of each cases and deaths time series that break its non-decreasing trend with interpolated data. The function also ensures that the following conditions still hold after the “correction”:

male_cases + female_cases <= cases
male_deaths + female_deaths <= deaths

Non-integer columns, if present, are ignored and returned as they are in the output DataFrame.

  • data (DataFrame) – a DataFrame containing all the integer columns for cases and deaths

  • method – interpolation method


a DataFrame with all integer time series (columns) modified so that they are non-decreasing time series


Sets to NaN all elements s[i] such that s[i] > s[i+k] for some k > 0.
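The NaN rule above (an element is discarded when it is greater than any later element) can be computed in one backward pass by tracking the minimum of the elements to the right. The stdlib sketch below applies it and then fills the gaps by linear interpolation. Note that fix_monotonicity uses PCHIP by default and also enforces the male/female constraints, neither of which is reproduced here. Both helper names are hypothetical.

```python
import math
from typing import List


def nan_breaking_monotonicity(s: List[float]) -> List[float]:
    """Set s[i] to NaN if s[i] > s[i+k] for some k > 0 (single backward pass)."""
    out = list(s)
    min_after = math.inf  # minimum of s[i+1:], updated while moving left
    for i in range(len(s) - 1, -1, -1):
        if s[i] > min_after:
            out[i] = math.nan
        min_after = min(min_after, s[i])
    return out


def fill_linear(s: List[float]) -> List[float]:
    """Linearly interpolate interior NaN runs; leading/trailing NaNs are kept."""
    out = list(s)
    known = [i for i, v in enumerate(s) if not math.isnan(v)]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = s[a] + (s[b] - s[a]) * (i - a) / (b - a)
    return out


cumulative = [0, 5, 3, 4, 6]  # 5 breaks the non-decreasing trend
print(fill_linear(nan_breaking_monotonicity(cumulative)))  # [0, 1.5, 3, 4, 6]
```

The surviving elements are each no greater than every later element, so they already form a non-decreasing sequence, and interpolating between them keeps it non-decreasing.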

iccas.processing.reindex_by_interpolating(data, new_index, preserve_ints=True, method='pchip', **interpolation)

Reindexes data and fills new values by interpolation (PCHIP, by default).

This function was motivated by the fact that pandas.DataFrame.resample() followed by pandas.DataFrame.interpolate() doesn’t take misaligned datetimes into account.

  • data (~PandasObj) – a DataFrame or Series with a datetime index

  • new_index (DatetimeIndex) –

  • preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int

  • method – interpolation method (see pandas.DataFrame.interpolate())

  • **interpolation – other keyword arguments (besides method) passed to pandas.DataFrame.interpolate()

Return type



a new DataFrame/Series
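The idea behind reindexing by interpolation is to interpolate in actual time between the original observations, so that information carried by misaligned original datetimes is not lost. The following stdlib sketch uses linear interpolation on plain lists (the function's default method is PCHIP). The name reindex_linear is hypothetical, and values outside the original time range are left as NaN rather than extrapolated.

```python
import math
from bisect import bisect_left
from datetime import datetime
from typing import List, Sequence


def reindex_linear(index: Sequence[datetime], values: Sequence[float],
                   new_index: Sequence[datetime],
                   preserve_ints: bool = True) -> List[float]:
    """Linearly interpolate (index, values) at the datetimes of new_index."""
    origin = index[0]
    xs = [(t - origin).total_seconds() for t in index]  # sorted time axis
    out: List[float] = []
    for t in new_index:
        x = (t - origin).total_seconds()
        j = bisect_left(xs, x)
        if j < len(xs) and xs[j] == x:
            out.append(values[j])  # exact match: keep the original value
        elif 0 < j < len(xs):
            x0, x1 = xs[j - 1], xs[j]
            y0, y1 = values[j - 1], values[j]
            out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
        else:
            out.append(math.nan)  # outside the original range: no extrapolation
    if preserve_ints and all(isinstance(v, int) for v in values):
        # Mirror preserve_ints: round integer series back to int (NaNs stay float).
        out = [v if math.isnan(v) else round(v) for v in out]
    return out


index = [datetime(2020, 3, 1, 18), datetime(2020, 3, 2, 18)]
print(reindex_linear(index, [10, 20], [datetime(2020, 3, 2, 6), datetime(2020, 3, 2, 18)]))
# [15, 20]
```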

iccas.processing.resample(data, freq='1D', hour=18, preserve_ints=True, method='pchip', **interpolation)

Resamples data and fills missing values by interpolation.

The resulting index is a pandas.DatetimeIndex whose elements are spaced according to freq, with the time set to {hour}:00.

For day frequencies (‘{num}D’), the index always includes the latest date (data.index[-1]), because the new index is a datetime range built backwards from the latest date.

This function was motivated by the fact that pandas.DataFrame.resample() followed by pandas.DataFrame.interpolate() doesn’t take misaligned datetimes into account. If you want to back-fill or forward-fill, use DataFrame.resample() directly.

  • data (~PandasObj) – a DataFrame or Series with a datetime index

  • freq (Union[int, str]) – resampling frequency in pandas notation

  • hour (int) – reference hour; all datetimes in the new index will have this hour

  • preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int

  • method – interpolation method (see pandas.DataFrame.interpolate())

  • **interpolation – other keyword arguments (besides method) passed to pandas.DataFrame.interpolate()

Return type



a new DataFrame/Series with index elements spaced according to freq
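For day frequencies, the documented index construction (start from the latest date at {hour}:00 and step backwards by the frequency) can be sketched with the standard library as follows. backward_day_index is a hypothetical helper, and where the range stops at the start of the data is an assumption of this sketch: it keeps only datetimes not earlier than the first observation.

```python
from datetime import datetime, timedelta
from typing import List


def backward_day_index(first: datetime, last: datetime,
                       days: int = 1, hour: int = 18) -> List[datetime]:
    """Datetimes spaced `days` apart, at {hour}:00, built backwards from last's date."""
    end = last.replace(hour=hour, minute=0, second=0, microsecond=0)
    step = timedelta(days=days)
    index = []
    t = end
    while t >= first:  # assumption: stop before the first observation
        index.append(t)
        t -= step
    return index[::-1]  # chronological order


idx = backward_day_index(datetime(2020, 3, 1, 17), datetime(2020, 3, 10, 17), days=3)
print([t.strftime("%m-%d %H:%M") for t in idx])
# ['03-01 18:00', '03-04 18:00', '03-07 18:00', '03-10 18:00']
```

When the latest observation is earlier than {hour}:00, the last element of this sketch's index lies slightly after it.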

iccas.queries module

iccas.queries.age_grouper(cuts, fmt_last='>={}')
Return type

Dict[str, str]

iccas.queries.aggregate_age_groups(counts, cuts, fmt_last='>={}')

Aggregates the counts of different age groups by summing them together.

  • counts (~PandasObj) – either a Series with age groups as index or a DataFrame with age groups as columns, in a simple Index or in a MultiIndex (at any level)

  • cuts (Union[int, Sequence[int]]) – a single integer N means “cut each N years”; a sequence of integers determines the start ages of new age groups; 0 is implicitly the start age of the first group, even if not present in cuts.

  • fmt_last (str) – format string for the last “unbounded” age group

Return type



A Series/DataFrame with the same “structure” as the input but with aggregated age groups.
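To illustrate what cuts means, assume the original age groups are labelled "a-b" with a final open group ">=n" (an assumption; the label format is not stated here). The hypothetical helpers below build an old-label to new-label mapping (compare the Dict[str, str] returned by age_grouper) and then sum counts per new group. They also assume every cut falls on an original group boundary.

```python
from typing import Dict, List, Sequence, Union


def _start(label: str) -> int:
    """Start age of a label such as '10-19' or '>=90' (assumed format)."""
    return int(label[2:]) if label.startswith(">=") else int(label.split("-")[0])


def make_age_grouper(labels: Sequence[str], cuts: Union[int, Sequence[int]],
                     fmt_last: str = ">={}") -> Dict[str, str]:
    max_start = max(_start(lab) for lab in labels)
    if isinstance(cuts, int):
        starts: List[int] = list(range(0, max_start + 1, cuts))  # "cut each N years"
    else:
        starts = sorted({0, *cuts})  # 0 is always the first start age
    mapping = {}
    for lab in labels:
        # Index of the last new group starting at or before this label's start age.
        i = max(k for k, st in enumerate(starts) if st <= _start(lab))
        if i == len(starts) - 1:
            mapping[lab] = fmt_last.format(starts[i])  # last, unbounded group
        else:
            mapping[lab] = f"{starts[i]}-{starts[i + 1] - 1}"
    return mapping


def aggregate(counts: Dict[str, int], grouper: Dict[str, str]) -> Dict[str, int]:
    totals: Dict[str, int] = {}
    for lab, n in counts.items():
        totals[grouper[lab]] = totals.get(grouper[lab], 0) + n
    return totals


labels = ["0-9", "10-19", "20-29", "30-39", ">=40"]
grouper = make_age_grouper(labels, cuts=[20])
print(aggregate(dict(zip(labels, [1, 2, 3, 4, 5])), grouper))
# {'0-19': 3, '>=20': 12}
```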

iccas.queries.average_by_period(counts, freq)

Returns a new Series/DataFrame with average counts (cases/deaths) by period (e.g. months, weeks, n days, etc.)

  • counts (~PandasObj) –

  • freq (Union[str, int]) – period frequency parameter (whatever accepted by pandas)


Return type


iccas.queries.cols(prefixes, fields='*')

Generates a list of columns by combining prefixes with fields.

  • prefixes (str) – string containing one or more of the following characters: ‘m’ for males, ‘f’ for females, ‘t’ for totals (no prefix), ‘*’ for all

  • fields (Union[str, Sequence[str]]) – values: ‘cases’, ‘deaths’, ‘cases_percentage’, ‘deaths_percentage’, ‘fatality_rate’, ‘*’

Return type



a list of strings
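Assuming the column naming used elsewhere on this page (e.g. ‘cases’, ‘male_cases’, ‘female_cases’), the prefix/field combination can be sketched as below. The hypothetical make_cols assumes that every field exists for every prefix and that ‘*’ expands to all prefixes in the order t, m, f. The order of the actual cols() output is not documented here.

```python
from typing import List, Sequence, Union

PREFIX_MAP = {"t": "", "m": "male_", "f": "female_"}  # 't' = totals, no prefix
ALL_FIELDS = ["cases", "deaths", "cases_percentage", "deaths_percentage", "fatality_rate"]


def make_cols(prefixes: str, fields: Union[str, Sequence[str]] = "*") -> List[str]:
    if "*" in prefixes:
        prefixes = "tmf"
    if fields == "*":
        fields = ALL_FIELDS
    elif isinstance(fields, str):
        fields = [fields]  # a single field name
    return [PREFIX_MAP[p] + field for p in prefixes for field in fields]


print(make_cols("mf", "cases"))  # ['male_cases', 'female_cases']
```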

iccas.queries.count_by_period(counts, freq)

Returns a new Series/DataFrame with counts (cases/deaths) by period (e.g. months, weeks, n days, etc.)

  • counts (~PandasObj) –

  • freq (Union[str, int]) – period frequency parameter (whatever accepted by pandas)


Return type

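If counts holds cumulative totals (the non-decreasing series handled by fix_monotonicity suggest so, but this is an assumption here), the count for a period is the last cumulative value in that period minus the last value of the previous period. The following is a stdlib sketch for monthly periods. count_by_month is hypothetical and treats the count before the first period as zero.

```python
from datetime import date
from typing import Dict, List, Tuple


def count_by_month(series: List[Tuple[date, int]]) -> Dict[str, int]:
    """New counts per calendar month from (date, cumulative count) pairs."""
    last_value: Dict[str, int] = {}
    for day, cumulative in sorted(series):
        last_value[f"{day.year}-{day.month:02d}"] = cumulative  # last value wins
    counts, previous = {}, 0  # assumption: nothing counted before the first month
    for month, cumulative in last_value.items():
        counts[month] = cumulative - previous
        previous = cumulative
    return counts


series = [(date(2020, 3, 30), 10), (date(2020, 3, 31), 15),
          (date(2020, 4, 1), 20), (date(2020, 4, 30), 40)]
print(count_by_month(series))  # {'2020-03': 15, '2020-04': 25}
```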

iccas.queries.fatality_rate(counts, shift)

Computes the fatality rate as the ratio between the total number of deaths and the total number of cases shift days earlier.

counts is resampled with interpolation if needed.
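Following the definition above, the fatality rate on a given day is the total deaths on that day divided by the total cases shift days earlier. The stdlib sketch below works on daily dict data. fatality_rate_on is hypothetical and simply returns None when the shifted date is missing, whereas the real function resamples counts with interpolation when needed.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def fatality_rate_on(day: date, deaths: Dict[date, int], cases: Dict[date, int],
                     shift: int) -> Optional[float]:
    """deaths[day] / cases[day - shift days], or None if undefined."""
    earlier = day - timedelta(days=shift)
    if earlier not in cases or cases[earlier] == 0 or day not in deaths:
        return None
    return deaths[day] / cases[earlier]


cases = {date(2020, 3, 1): 100, date(2020, 3, 8): 200}
deaths = {date(2020, 3, 8): 10}
print(fatality_rate_on(date(2020, 3, 8), deaths, cases, shift=7))  # 0.1
```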

iccas.queries.get_unknown_sex_count(counts, variable)

Returns cases/deaths of unknown sex for each age group

Return type

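Given the constraint male + female <= total stated for fix_monotonicity, the unknown-sex count per age group is presumably the total minus the two sex-specific counts. The following stdlib sketch uses rows keyed by age group. unknown_sex_by_age is hypothetical, and the age labels in the example are assumed.

```python
from typing import Dict


def unknown_sex_by_age(rows: Dict[str, Dict[str, int]], variable: str) -> Dict[str, int]:
    """For each age group: <variable> - male_<variable> - female_<variable>."""
    return {
        age: row[variable] - row[f"male_{variable}"] - row[f"female_{variable}"]
        for age, row in rows.items()
    }


rows = {
    "0-9": {"deaths": 2, "male_deaths": 1, "female_deaths": 1},
    "10-19": {"deaths": 12, "male_deaths": 7, "female_deaths": 4},
}
print(unknown_sex_by_age(rows, "deaths"))  # {'0-9': 0, '10-19': 1}
```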


Returns only columns [‘cases’, ‘female_cases’, ‘male_cases’]

Return type



Returns only cases and deaths columns (including sex-specific columns), dropping all other columns that are computable from these.

Return type



Returns only columns [‘deaths’, ‘female_deaths’, ‘male_deaths’]

Return type


iccas.queries.product_join(*string_iterables, sep='')
Return type


iccas.queries.running_average(counts, window, step=1, **resample_kwargs)

Given counts for cases/deaths, returns the average daily number of new cases/deaths inside a temporal window of window days, moving the window forward by step days at a time.

  • counts (~PandasObj) –

  • window (int) –

  • step (int) –


Return type


iccas.queries.running_count(counts, window, step=1, **resample_kwargs)

Given counts for cases and/or deaths, returns the number of new cases/deaths inside a temporal window of window days that moves forward by steps of step days.

  • counts (~PandasObj) –

  • window (int) –

  • step (int) –


Return type

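With cumulative daily totals, the number of new cases in a window of window days ending on day t is the cumulative value on t minus the value window days earlier, and the running average divides that by window. The stdlib sketch below anchors the windows on the last day and moves them back step days at a time; this alignment is an assumption. running_counts and running_averages are hypothetical names, and the resampling controlled by **resample_kwargs is omitted.

```python
from typing import List, Tuple


def running_counts(cumulative: List[int], window: int, step: int = 1) -> List[Tuple[int, int]]:
    """(end_day, new counts in days end_day-window+1 .. end_day), oldest first."""
    result = []
    end = len(cumulative) - 1  # assumption: the last window ends on the last day
    while end - window >= 0:
        result.append((end, cumulative[end] - cumulative[end - window]))
        end -= step
    return result[::-1]


def running_averages(cumulative: List[int], window: int, step: int = 1) -> List[Tuple[int, float]]:
    """Average daily new counts for the same windows."""
    return [(end, n / window) for end, n in running_counts(cumulative, window, step)]


cumulative = [0, 1, 3, 6, 10, 15]  # daily new counts: 1, 2, 3, 4, 5
print(running_counts(cumulative, window=2, step=2))    # [(3, 5), (5, 9)]
print(running_averages(cumulative, window=2, step=2))  # [(3, 2.5), (5, 4.5)]
```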

iccas.types module

Module contents

iccas.age_grouper(cuts, fmt_last='>={}')
Return type

Dict[str, str]

iccas.aggregate_age_groups(counts, cuts, fmt_last='>={}')

Aggregates the counts of different age groups by summing them together.

  • counts (~PandasObj) – either a Series with age groups as index or a DataFrame with age groups as columns, in a simple Index or in a MultiIndex (at any level)

  • cuts (Union[int, Sequence[int]]) – a single integer N means “cut each N years”; a sequence of integers determines the start ages of new age groups; 0 is implicitly the start age of the first group, even if not present in cuts.

  • fmt_last (str) – format string for the last “unbounded” age group

Return type



A Series/DataFrame with the same “structure” as the input but with aggregated age groups.

iccas.average_by_period(counts, freq)

Returns a new Series/DataFrame with average counts (cases/deaths) by period (e.g. months, weeks, n days, etc.)

  • counts (~PandasObj) –

  • freq (Union[str, int]) – period frequency parameter (whatever accepted by pandas)


Return type


iccas.cols(prefixes, fields='*')

Generates a list of columns by combining prefixes with fields.

  • prefixes (str) – string containing one or more of the following characters: ‘m’ for males, ‘f’ for females, ‘t’ for totals (no prefix), ‘*’ for all

  • fields (Union[str, Sequence[str]]) – values: ‘cases’, ‘deaths’, ‘cases_percentage’, ‘deaths_percentage’, ‘fatality_rate’, ‘*’

Return type



a list of strings

iccas.count_by_period(counts, freq)

Returns a new Series/DataFrame with counts (cases/deaths) by period (e.g. months, weeks, n days, etc.)

  • counts (~PandasObj) –

  • freq (Union[str, int]) – period frequency parameter (whatever accepted by pandas)


Return type


iccas.fatality_rate(counts, shift)

Computes the fatality rate as the ratio between the total number of deaths and the total number of cases shift days earlier.

counts is resampled with interpolation if needed.

iccas.fix_monotonicity(data, method='pchip', **interpolation)

Replaces the stretches of each cases and deaths time series that break its non-decreasing trend with interpolated data. The function also ensures that the following conditions still hold after the “correction”:

male_cases + female_cases <= cases
male_deaths + female_deaths <= deaths

Non-integer columns, if present, are ignored and returned as they are in the output DataFrame.

  • data (DataFrame) – a DataFrame containing all the integer columns for cases and deaths

  • method – interpolation method


a DataFrame with all integer time series (columns) modified so that they are non-decreasing time series


Returns the latest version of the ICCAS dataset in a pandas.DataFrame (in the same form as returned by load()).

This function uses RemoteFolderCache.get(), which caches the downloaded dataset in cache_dir.

  • requests.exceptions.ConnectionError – if the server is unreachable and no dataset is available in cache_dir

Return type


iccas.get_by_date(date, keep_date=False, cache_dir=PosixPath('/home/docs/.iccas'))
Return type

Tuple[DataFrame, Timestamp]


Returns a DataFrame with “age” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0)

Return type



Returns a DataFrame with “age_group” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0)

Return type


iccas.get_unknown_sex_count(counts, variable)

Returns cases/deaths of unknown sex for each age group

Return type


iccas.get_url(date=None, fmt='csv')

Returns the URL of a dataset in the given format. If date is None, returns the URL of the full dataset.

Return type


Return type


iccas.load_single_date(path, keep_date=False)

Loads a dataset containing data for a single date.

By default (keep_date=False), the date column is dropped and the datetime is stored in the attrs of the DataFrame. If instead keep_date=True, the returned dataset has a MultiIndex (date, age_group).

  • path (Union[str, Path]) –

  • keep_date (bool) – whether to keep the date column (containing a single datetime value) as the first level of a (date, age_group) MultiIndex instead of dropping it

Return type



Returns only columns [‘cases’, ‘female_cases’, ‘male_cases’]

Return type



Returns only cases and deaths columns (including sex-specific columns), dropping all other columns that are computable from these.

Return type



Returns only columns [‘deaths’, ‘female_deaths’, ‘male_deaths’]

Return type


iccas.reindex_by_interpolating(data, new_index, preserve_ints=True, method='pchip', **interpolation)

Reindexes data and fills new values by interpolation (PCHIP, by default).

This function was motivated by the fact that pandas.DataFrame.resample() followed by pandas.DataFrame.interpolate() doesn’t take misaligned datetimes into account.

  • data (~PandasObj) – a DataFrame or Series with a datetime index

  • new_index (DatetimeIndex) –

  • preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int

  • method – interpolation method (see pandas.DataFrame.interpolate())

  • **interpolation – other keyword arguments (besides method) passed to pandas.DataFrame.interpolate()

Return type



a new DataFrame/Series

iccas.resample(data, freq='1D', hour=18, preserve_ints=True, method='pchip', **interpolation)

Resamples data and fills missing values by interpolation.

The resulting index is a pandas.DatetimeIndex whose elements are spaced according to freq, with the time set to {hour}:00.

For day frequencies (‘{num}D’), the index always includes the latest date (data.index[-1]), because the new index is a datetime range built backwards from the latest date.

This function was motivated by the fact that pandas.DataFrame.resample() followed by pandas.DataFrame.interpolate() doesn’t take misaligned datetimes into account. If you want to back-fill or forward-fill, use DataFrame.resample() directly.

  • data (~PandasObj) – a DataFrame or Series with a datetime index

  • freq (Union[int, str]) – resampling frequency in pandas notation

  • hour (int) – reference hour; all datetimes in the new index will have this hour

  • preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int

  • method – interpolation method (see pandas.DataFrame.interpolate())

  • **interpolation – other keyword arguments (besides method) passed to pandas.DataFrame.interpolate()

Return type



a new DataFrame/Series with index elements spaced according to freq

iccas.running_average(counts, window, step=1, **resample_kwargs)

Given counts for cases/deaths, returns the average daily number of new cases/deaths inside a temporal window of window days, moving the window forward by step days at a time.

  • counts (~PandasObj) –

  • window (int) –

  • step (int) –


Return type


iccas.running_count(counts, window, step=1, **resample_kwargs)

Given counts for cases and/or deaths, returns the number of new cases/deaths inside a temporal window of window days that moves forward by steps of step days.

  • counts (~PandasObj) –

  • window (int) –

  • step (int) –


Return type



Sets the language. Supported languages: Italian (“it”) and English (“en”)


Apart from setting the package's internal language, it also sets the locale accordingly, so that pandas and matplotlib display translated dates.