iccas package

Submodules

iccas.caching module

class iccas.caching.RemoteFolderProxy(folder_url, local_path)

Bases: object

get(relative_path, force_download=False)

Ensures the latest version of a remote file is available locally in the cache, downloading it only if needed. If no internet connection is available (or the server is unreachable), the file available in the cache is returned with a warning; if the file is not in the cache, a ConnectionError is raised.

Parameters

relative_path –
force_download (bool) –

Return type

Path

Returns

full local path of the file
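
The fallback behaviour described for get() can be illustrated with a minimal sketch. This is not the library's implementation: get_cached and the fetch_if_newer callable are hypothetical names introduced only for this example, and how the real class decides whether the cached copy is current is not covered here.

```python
import warnings
from pathlib import Path
from typing import Callable, Optional


def get_cached(
    relative_path: str,
    cache_dir: Path,
    fetch_if_newer: Callable[[str, bool], Optional[bytes]],
    force_download: bool = False,
) -> Path:
    """Return the local path of a file, downloading it only if needed.

    `fetch_if_newer(relative_path, have_current_copy)` is a hypothetical
    callable: it returns the remote bytes, or None when the local copy is
    already current, and raises ConnectionError if the server is unreachable.
    """
    local_path = cache_dir / relative_path
    have_local = local_path.exists()
    try:
        content = fetch_if_newer(relative_path, have_local and not force_download)
    except ConnectionError:
        if have_local:
            # Offline, but a cached copy exists: return it with a warning.
            warnings.warn(f"Server unreachable; using cached {relative_path}")
            return local_path
        raise  # nothing in the cache to fall back on
    if content is not None:
        local_path.parent.mkdir(parents=True, exist_ok=True)
        local_path.write_bytes(content)
    return local_path
```

Passing the download step in as a callable keeps the sketch testable without network access.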

get_path_of(relative_url)

Return type: Path

iccas.checks module

Sanity checks.

iccas.checks.is_non_decreasing(df)

iccas.checks.totals_not_less_than_sum_of_sexes(data, variable)

iccas.loading module

iccas.loading.get(cache_dir=PosixPath('/home/docs/.iccas'))

Returns the latest version of the ICCAS dataset as a pandas.DataFrame (in the form returned by load()).

This function uses RemoteFolderProxy.get(), which caches the downloaded file locally.

Raises

requests.exceptions.ConnectionError – if the server is unreachable and no dataset is available in cache_dir

Return type

DataFrame

iccas.loading.get_by_date(date, keep_date=False, cache_dir=PosixPath('/home/docs/.iccas'))

Return type: Tuple[DataFrame, Timestamp]

iccas.loading.get_population_by_age(cache_dir=PosixPath('/home/docs/.iccas'))

Returns a DataFrame with “age” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0)

Return type: DataFrame

iccas.loading.get_population_by_age_group(cache_dir=PosixPath('/home/docs/.iccas'))

Returns a DataFrame with “age_group” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0)

Return type: DataFrame

iccas.loading.get_url(date=None, fmt='csv')

Returns the URL of a dataset in a given format. If date is None, returns the URL of the full dataset.

Return type: str

iccas.loading.load(path)

Return type: DataFrame

iccas.loading.load_single_date(path, keep_date=False)

Loads a dataset containing data for a single date.

By default (keep_date=False), the date column is dropped and the datetime is stored in the attrs of the DataFrame. If instead keep_date=True, the returned dataset has a MultiIndex (date, age_group).

Parameters

path (Union[str, Path]) –
keep_date (bool) – whether to keep the date column (containing a single datetime value) instead of dropping it

Return type

DataFrame

iccas.processing module

iccas.processing.fix_monotonicity(data, method='pchip', **interpolation)

Replaces the stretches of the “cases” and “deaths” time series that break monotonicity with interpolated data, ensuring that the sum of male and female counts is less than or equal to the total count.

Parameters

data (DataFrame) – a DataFrame containing all integer columns about cases and deaths
method – interpolation method

Returns:

iccas.processing.nullify_local_bumps(df)

iccas.processing.nullify_series_local_bumps(series)

Sets to NaN all elements s[i] such that s[i] > s[i+k] for some k > 0.
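
As an illustration of the rule "s[i] > s[i+k]", here is a minimal pure-Python sketch, assuming k ranges over all later positions and the input contains no missing values. The name nullify_bumps_sketch is hypothetical, and None stands in for NaN; the library itself operates on pandas objects.

```python
from typing import List, Optional


def nullify_bumps_sketch(values: List[float]) -> List[Optional[float]]:
    """Replace with None every element that is greater than some later element."""
    result: List[Optional[float]] = list(values)
    running_min = float("inf")
    # Scan right to left, tracking the minimum of the elements already visited.
    for i in range(len(values) - 1, -1, -1):
        if values[i] > running_min:
            result[i] = None  # a later, smaller value exists: this is a bump
        else:
            running_min = values[i]
    return result


print(nullify_bumps_sketch([1, 2, 5, 3, 4]))  # [1, 2, None, 3, 4]
```

The surviving values form a non-decreasing sequence, so the gaps can then be filled by interpolation.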

iccas.processing.reindex_by_interpolating(data, new_index, preserve_ints=True, method='pchip', **interpolation)

Reindexes data and fills new values by interpolation (PCHIP, by default).

This function was motivated by the fact that pandas.DataFrame.resample() followed by interpolate() doesn’t take misaligned datetimes into account.

Parameters

data (~PandasObj) – a DataFrame or Series with a datetime index
new_index (DatetimeIndex) –
preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int
method – interpolation method (see pandas.DataFrame.interpolate())
**interpolation – other keyword arguments (besides method) passed to pandas.DataFrame.interpolate()

Return type

~PandasObj

Returns

a new DataFrame/Series

See also

resample()
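
To make the idea of reindexing onto misaligned datetimes concrete, here is a minimal sketch that uses plain linear interpolation instead of PCHIP and Python lists instead of pandas objects. The function name reindex_linear is hypothetical, and raising on points outside the original range is an assumption of this sketch, not documented behaviour.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Sequence


def reindex_linear(
    index: Sequence[datetime],
    values: Sequence[float],
    new_index: Sequence[datetime],
    preserve_ints: bool = True,
) -> List[float]:
    """Evaluate a sorted time series at new datetimes by linear interpolation."""
    as_int = preserve_ints and all(isinstance(v, int) for v in values)
    out = []
    for t in new_index:
        i = bisect_left(index, t)
        if i < len(index) and index[i] == t:
            y = values[i]  # exact match: no interpolation needed
        elif i == 0 or i == len(index):
            raise ValueError(f"{t} is outside the original index range")
        else:
            t0, t1 = index[i - 1], index[i]
            weight = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float
            y = values[i - 1] + weight * (values[i] - values[i - 1])
        # Mirror preserve_ints: round back to int if the input was all integers.
        out.append(round(y) if as_int else y)
    return out


old = [datetime(2020, 3, 1, 16), datetime(2020, 3, 3, 16)]
print(reindex_linear(old, [10, 20], [datetime(2020, 3, 2, 16)]))  # [15]
```

Because every new point is computed from the original observations on either side of it, the original datetimes do not need to line up with the new index.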

iccas.processing.resample(data, freq='1D', hour=18, preserve_ints=True, method='pchip', **interpolation)

Resamples data and fills missing values by interpolation.

The resulting index is a pandas.DatetimeIndex whose elements are spaced according to freq and have the time set to {hour}:00.

In the case of “day frequencies” (‘{num}D’), the index always includes the latest date (data.index[-1]): the new index is a datetime range built going backwards from the latest date.

This function was motivated by the fact that pandas.DataFrame.resample() followed by interpolate() doesn’t take misaligned datetimes into account. If you want to back-fill or forward-fill, just use DataFrame.resample().

Parameters

data (~PandasObj) – a DataFrame or Series with a datetime index
freq (Union[int, str]) – resampling frequency in pandas notation
hour (int) – reference hour; all datetimes in the new index will have this hour
preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int
method – interpolation method (see pandas.DataFrame.interpolate())
**interpolation – other keyword arguments (besides method) passed to pandas.DataFrame.interpolate()

Return type

~PandasObj

Returns

a new DataFrame/Series with index elements spaced according to freq

See also

reindex_by_interpolating()
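
The way a “day frequency” index is built backwards from the latest date can be sketched with the standard library alone. The function backward_daily_index is a hypothetical name, and the stopping rule (stop once the range steps before the calendar day of the first observation) is an assumption of this sketch; the source does not state how the start of the range is handled.

```python
from datetime import datetime, timedelta
from typing import List


def backward_daily_index(
    first: datetime, last: datetime, days: int = 1, hour: int = 18
) -> List[datetime]:
    """Datetimes spaced `days` days apart, built backwards from `last`."""
    current = last.replace(hour=hour, minute=0, second=0, microsecond=0)
    index = []
    # Assumption: stop once we step before the calendar day of `first`.
    while current.date() >= first.date():
        index.append(current)
        current -= timedelta(days=days)
    return index[::-1]  # return in chronological order


index = backward_daily_index(
    datetime(2020, 3, 1, 16), datetime(2020, 3, 8, 11), days=3
)
print([d.isoformat() for d in index])
```

Anchoring the range on the latest date guarantees that the most recent day always appears in the result, whatever the spacing.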

iccas.queries module

iccas.queries.age_grouper(cuts, fmt_last='>={}')

Return type: Dict[str, str]

iccas.queries.aggregate_age_groups(counts, cuts, fmt_last='>={}')

Aggregates counts for different age groups by summing them together.

Parameters

counts (~PandasObj) – can be a Series with age groups as index or a DataFrame with age groups as columns, either in a simple Index or in a MultiIndex (at any level)
cuts (Union[int, Sequence[int]]) – a single integer N means “cut every N years”; a sequence of integers gives the start ages of the new age groups.
fmt_last (str) – format string for the last “unbounded” age group

Return type

~PandasObj

Returns

A Series/DataFrame with the same “structure” as the input but with aggregated age groups.
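
A rough sketch of how cuts could translate into aggregated groups, using a plain dict instead of pandas objects. Everything here is an assumption made for illustration: the name aggregate_by_cuts, keying the input by each group's start age, the “a-b” label format, the implicit first group starting at 0, and the max_start bound used when cuts is a single integer.

```python
from bisect import bisect_right
from typing import Dict, Sequence, Union


def aggregate_by_cuts(
    counts_by_start: Dict[int, int],
    cuts: Union[int, Sequence[int]],
    max_start: int = 90,
    fmt_last: str = ">={}",
) -> Dict[str, int]:
    """Sum counts (keyed by each original group's start age) into wider groups."""
    if isinstance(cuts, int):
        starts = list(range(0, max_start + 1, cuts))  # a new group every N years
    else:
        starts = sorted({0, *cuts})  # assumption: the first group starts at 0
    labels = [f"{a}-{b - 1}" for a, b in zip(starts, starts[1:])]
    labels.append(fmt_last.format(starts[-1]))  # the last group is unbounded
    totals = {label: 0 for label in labels}
    for start, count in counts_by_start.items():
        # Find the last new group whose start age is <= this group's start age.
        totals[labels[bisect_right(starts, start) - 1]] += count
    return totals


ten_year_groups = {start: i + 1 for i, start in enumerate(range(0, 100, 10))}
print(aggregate_by_cuts(ten_year_groups, [30, 60]))
# {'0-29': 6, '30-59': 15, '>=60': 34}
```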

iccas.queries.average_by_period(counts, freq)

Return type: ~PandasObj

iccas.queries.cols(prefixes, fields='*')

Generates a list of columns by combining prefixes with fields.

Parameters

prefixes (str) – string containing one or more of the following characters: ‘m’ for males, ‘f’ for females, ‘t’ for totals (no prefix), ‘*’ for all
fields (Union[str, Sequence[str]]) – allowed values: ‘cases’, ‘deaths’, ‘cases_percentage’, ‘deaths_percentage’, ‘fatality_rate’, ‘*’

Return type

List[str]

Returns

a list of strings
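
The prefix/field combination can be sketched as a Cartesian product. This is an illustration only: cols_sketch is a hypothetical name, the ‘male_’/‘female_’ prefixes are inferred from column names such as ‘male_cases’ and ‘female_cases’ mentioned elsewhere in this reference, and the ordering of the result is an assumption.

```python
from itertools import product
from typing import List, Sequence, Union

# Assumed mapping from prefix character to column-name prefix.
PREFIXES = {"m": "male_", "f": "female_", "t": ""}
FIELDS = ["cases", "deaths", "cases_percentage", "deaths_percentage", "fatality_rate"]


def cols_sketch(prefixes: str, fields: Union[str, Sequence[str]] = "*") -> List[str]:
    """Combine every requested prefix with every requested field."""
    chars = "mft" if "*" in prefixes else prefixes
    if fields == "*":
        names = FIELDS
    elif isinstance(fields, str):
        names = [fields]
    else:
        names = list(fields)
    return [PREFIXES[char] + name for char, name in product(chars, names)]


print(cols_sketch("mf", "cases"))            # ['male_cases', 'female_cases']
print(cols_sketch("t", ["cases", "deaths"]))  # ['cases', 'deaths']
```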

iccas.queries.count_by_period(counts, freq)

Return type: ~PandasObj

iccas.queries.fatality_rate(counts, shift)

Computes the fatality rate as the ratio between the total number of deaths and the total number of cases shift days before.

counts is resampled with interpolation if needed.
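
The shifted ratio can be illustrated with plain lists, assuming one cumulative value per day (that is, data already resampled to daily frequency). The name fatality_rate_sketch and the choice to return None where the rate is undefined are assumptions of this sketch.

```python
from typing import List, Optional


def fatality_rate_sketch(
    cases: List[int], deaths: List[int], shift: int
) -> List[Optional[float]]:
    """Total deaths on each day divided by total cases `shift` days earlier."""
    rates: List[Optional[float]] = []
    for day, total_deaths in enumerate(deaths):
        if day < shift or cases[day - shift] == 0:
            rates.append(None)  # no earlier observation, or no cases yet
        else:
            rates.append(total_deaths / cases[day - shift])
    return rates


print(fatality_rate_sketch([100, 200, 400, 800], [1, 2, 10, 20], shift=2))
# [None, None, 0.1, 0.1]
```

Shifting the denominator accounts for the delay between diagnosis and death: deaths recorded today are compared with the cases known shift days ago.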

iccas.queries.get_unknown_sex_count(counts, variable)

Returns cases/deaths of unknown sex for each age group

Return type: DataFrame

iccas.queries.only_cases(data)

Returns only columns [‘cases’, ‘female_cases’, ‘male_cases’]

Return type: DataFrame

iccas.queries.only_counts(data)

Returns only cases and deaths columns (including sex-specific columns), dropping all other columns that are computable from these.

Return type: DataFrame

iccas.queries.only_deaths(data)

Returns only columns [‘deaths’, ‘female_deaths’, ‘male_deaths’]

Return type: DataFrame

iccas.queries.product_join(*string_iterables, sep='')

Return type: Iterable[str]

iccas.queries.running_average(counts, window=7, step=1, **resample_kwargs)

Given counts for cases/deaths, returns the average daily number of new cases/deaths within a time window of window days, moving the window forward step days at a time.

Parameters

counts (~PandasObj) –
window (int) –
step (int) –

Returns:

Return type: ~PandasObj

iccas.queries.running_count(counts, window=7, step=1, **resample_kwargs)

Given counts for cases and/or deaths, returns the number of new cases/deaths within a time window of window days that moves forward in steps of step days.

Parameters

counts (~PandasObj) –
window (int) –
step (int) –

Returns:

Return type: ~PandasObj
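
The relationship between running counts and running averages can be sketched on a list of cumulative totals, assuming one value per day. The function names are hypothetical, and placing the windows backwards from the latest day is an assumption of this sketch rather than documented behaviour.

```python
from typing import List


def running_count_sketch(totals: List[int], window: int = 7, step: int = 1) -> List[int]:
    """New events inside each `window`-day window, from cumulative daily totals."""
    last = len(totals) - 1
    # Window end positions, built backwards so the latest day is always included.
    ends = list(range(last, window - 1, -step))[::-1]
    # New events in a window = total at its end minus total `window` days earlier.
    return [totals[end] - totals[end - window] for end in ends]


def running_average_sketch(totals: List[int], window: int = 7, step: int = 1) -> List[float]:
    """Average daily number of new events inside each window."""
    return [count / window for count in running_count_sketch(totals, window, step)]


totals = [0, 1, 3, 6, 10, 15, 21]
print(running_count_sketch(totals, window=3, step=2))    # [9, 15]
print(running_average_sketch(totals, window=3, step=2))  # [3.0, 5.0]
```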

iccas.types module

Module contents

iccas.age_grouper(cuts, fmt_last='>={}')

Return type: Dict[str, str]

iccas.aggregate_age_groups(counts, cuts, fmt_last='>={}')

Aggregates counts for different age groups by summing them together.

Parameters

counts (~PandasObj) – can be a Series with age groups as index or a DataFrame with age groups as columns, either in a simple Index or in a MultiIndex (at any level)
cuts (Union[int, Sequence[int]]) – a single integer N means “cut every N years”; a sequence of integers gives the start ages of the new age groups.
fmt_last (str) – format string for the last “unbounded” age group

Return type

~PandasObj

Returns

A Series/DataFrame with the same “structure” as the input but with aggregated age groups.

iccas.cols(prefixes, fields='*')

Generates a list of columns by combining prefixes with fields.

Parameters

prefixes (str) – string containing one or more of the following characters: ‘m’ for males, ‘f’ for females, ‘t’ for totals (no prefix), ‘*’ for all
fields (Union[str, Sequence[str]]) – allowed values: ‘cases’, ‘deaths’, ‘cases_percentage’, ‘deaths_percentage’, ‘fatality_rate’, ‘*’

Return type

List[str]

Returns

a list of strings

iccas.fatality_rate(counts, shift)

Computes the fatality rate as the ratio between the total number of deaths and the total number of cases shift days before.

counts is resampled with interpolation if needed.

iccas.fix_monotonicity(data, method='pchip', **interpolation)

Replaces the stretches of the “cases” and “deaths” time series that break monotonicity with interpolated data, ensuring that the sum of male and female counts is less than or equal to the total count.

Parameters

data (DataFrame) – a DataFrame containing all integer columns about cases and deaths
method – interpolation method

Returns:

iccas.get(cache_dir=PosixPath('/home/docs/.iccas'))

Returns the latest version of the ICCAS dataset as a pandas.DataFrame (in the form returned by load()).

This function uses RemoteFolderProxy.get(), which caches the downloaded file locally.

Raises

requests.exceptions.ConnectionError – if the server is unreachable and no dataset is available in cache_dir

Return type

DataFrame

iccas.get_by_date(date, keep_date=False, cache_dir=PosixPath('/home/docs/.iccas'))

Return type: Tuple[DataFrame, Timestamp]

iccas.get_population_by_age(cache_dir=PosixPath('/home/docs/.iccas'))

Returns a DataFrame with “age” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0)

Return type: DataFrame

iccas.get_population_by_age_group(cache_dir=PosixPath('/home/docs/.iccas'))

Returns a DataFrame with “age_group” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0)

Return type: DataFrame

iccas.get_unknown_sex_count(counts, variable)

Returns cases/deaths of unknown sex for each age group

Return type: DataFrame

iccas.get_url(date=None, fmt='csv')

Returns the URL of a dataset in a given format. If date is None, returns the URL of the full dataset.

Return type: str

iccas.load(path)

Return type: DataFrame

iccas.load_single_date(path, keep_date=False)

Loads a dataset containing data for a single date.

By default (keep_date=False), the date column is dropped and the datetime is stored in the attrs of the DataFrame. If instead keep_date=True, the returned dataset has a MultiIndex (date, age_group).

Parameters

path (Union[str, Path]) –
keep_date (bool) – whether to keep the date column (containing a single datetime value) instead of dropping it

Return type

DataFrame

iccas.only_cases(data)

Returns only columns [‘cases’, ‘female_cases’, ‘male_cases’]

Return type: DataFrame

iccas.only_counts(data)

Returns only cases and deaths columns (including sex-specific columns), dropping all other columns that are computable from these.

Return type: DataFrame

iccas.only_deaths(data)

Returns only columns [‘deaths’, ‘female_deaths’, ‘male_deaths’]

Return type: DataFrame

iccas.reindex_by_interpolating(data, new_index, preserve_ints=True, method='pchip', **interpolation)

Reindexes data and fills new values by interpolation (PCHIP, by default).

This function was motivated by the fact that pandas.DataFrame.resample() followed by interpolate() doesn’t take misaligned datetimes into account.

Parameters

data (~PandasObj) – a DataFrame or Series with a datetime index
new_index (DatetimeIndex) –
preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int
method – interpolation method (see pandas.DataFrame.interpolate())
**interpolation – other keyword arguments (besides method) passed to pandas.DataFrame.interpolate()

Return type

~PandasObj

Returns

a new DataFrame/Series

See also

resample()

iccas.resample(data, freq='1D', hour=18, preserve_ints=True, method='pchip', **interpolation)

Resamples data and fills missing values by interpolation.

The resulting index is a pandas.DatetimeIndex whose elements are spaced according to freq and have the time set to {hour}:00.

In the case of “day frequencies” (‘{num}D’), the index always includes the latest date (data.index[-1]): the new index is a datetime range built going backwards from the latest date.

This function was motivated by the fact that pandas.DataFrame.resample() followed by interpolate() doesn’t take misaligned datetimes into account. If you want to back-fill or forward-fill, just use DataFrame.resample().

Parameters

data (~PandasObj) – a DataFrame or Series with a datetime index
freq (Union[int, str]) – resampling frequency in pandas notation
hour (int) – reference hour; all datetimes in the new index will have this hour
preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int
method – interpolation method (see pandas.DataFrame.interpolate())
**interpolation – other keyword arguments (besides method) passed to pandas.DataFrame.interpolate()

Return type

~PandasObj

Returns

a new DataFrame/Series with index elements spaced according to freq

See also

reindex_by_interpolating()

iccas.running_average(counts, window=7, step=1, **resample_kwargs)

Given counts for cases/deaths, returns the average daily number of new cases/deaths within a time window of window days, moving the window forward step days at a time.

Parameters

counts (~PandasObj) –
window (int) –
step (int) –

Returns:

Return type: ~PandasObj

iccas.running_count(counts, window=7, step=1, **resample_kwargs)

Given counts for cases and/or deaths, returns the number of new cases/deaths within a time window of window days that moves forward in steps of step days.

Parameters

counts (~PandasObj) –
window (int) –
step (int) –

Returns:

Return type: ~PandasObj