iccas package
Subpackages
Submodules
iccas.checks module
Sanity checks.
iccas.loading module
-
iccas.loading.get(cache_dir=PosixPath('/home/docs/.iccas')) Returns the latest version of the ICCAS dataset in a pandas.DataFrame (as it’s returned by load()).
This function uses RemoteFolderCache.get(), which caches the downloaded data in cache_dir.
- Raises
requests.exceptions.ConnectionError – if the server is unreachable and no dataset is available in cache_dir
- Return type
DataFrame
-
iccas.loading.get_by_date(date, keep_date=False, cache_dir=PosixPath('/home/docs/.iccas'))
- Return type
Tuple[DataFrame, Timestamp]
-
iccas.loading.get_population_by_age(cache_dir=PosixPath('/home/docs/.iccas')) Returns a DataFrame with “age” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0).
- Return type
DataFrame
-
iccas.loading.get_population_by_age_group(cache_dir=PosixPath('/home/docs/.iccas')) Returns a DataFrame with “age_group” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0).
- Return type
DataFrame
-
iccas.loading.get_url(date=None, fmt='csv') Returns the URL of a dataset in a given format. If date is None, returns the URL of the full dataset.
- Return type
str
-
iccas.loading.load_single_date(path, keep_date=False) Loads a dataset containing data for a single date.
By default (keep_date=False), the date column is dropped and the datetime is stored in the attrs of the DataFrame. If instead keep_date=True, the returned dataset has a MultiIndex (date, age_group).
- Parameters
path (Union[str, Path]) –
keep_date (bool) – whether to keep the date column (containing a single datetime value) instead of dropping it
- Return type
DataFrame
iccas.processing module
-
iccas.processing.fix_monotonicity(data, method='pchip', **interpolation) Replaces the stretches of each cases and deaths time series that break the series’ non-decreasing trend with interpolated data. This function also ensures that the following conditions still hold after the “correction”:
male_cases + female_cases <= cases
male_deaths + female_deaths <= deaths
Non-integer columns, if present, are ignored and returned unchanged in the output DataFrame.
- Parameters
data (DataFrame) – a DataFrame containing all integer columns about cases and deaths
method – interpolation method
- Returns
a DataFrame with all integer time series (columns) modified so that they are non-decreasing
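The guarantees listed for fix_monotonicity can be expressed as a small check. The following is a minimal sketch in plain Python rather than pandas; satisfies_invariants and the dict-of-lists layout are hypothetical and not part of iccas, while the column names are the ones used elsewhere in this reference.

```python
from typing import Dict, List


def is_non_decreasing(series: List[int]) -> bool:
    """True if every value is >= the one before it."""
    return all(a <= b for a, b in zip(series, series[1:]))


def satisfies_invariants(columns: Dict[str, List[int]]) -> bool:
    """Check the properties fix_monotonicity is documented to guarantee.

    `columns` maps a column name to its time series (hypothetical layout).
    """
    # Every integer series must be non-decreasing over time.
    if not all(is_non_decreasing(s) for s in columns.values()):
        return False
    # Sex-specific counts may not exceed the totals at any time step.
    for variable in ("cases", "deaths"):
        triples = zip(columns[f"male_{variable}"],
                      columns[f"female_{variable}"],
                      columns[variable])
        if any(m + f > total for m, f, total in triples):
            return False
    return True


good = {
    "cases": [10, 12, 15], "male_cases": [4, 5, 7], "female_cases": [5, 6, 8],
    "deaths": [1, 1, 2], "male_deaths": [0, 0, 1], "female_deaths": [1, 1, 1],
}
bad = dict(good, cases=[10, 9, 15])  # total cases drop on the second day
print(satisfies_invariants(good), satisfies_invariants(bad))
```

A check like this could be run on the output of fix_monotonicity to confirm that the correction behaved as documented.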
-
iccas.processing.nullify_series_local_bumps(series) Sets to NaN all elements s[i] such that s[i] > s[i+k].
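The rule can be illustrated on a plain list. This sketch assumes that k ranges over all positive offsets (the docstring does not say so explicitly) and that the input contains no NaN; nullify_local_bumps is a hypothetical stand-in, not the library function.

```python
import math
from typing import List


def nullify_local_bumps(series: List[float]) -> List[float]:
    """Replace s[i] with NaN whenever some later element is smaller than s[i]."""
    result = list(series)
    min_later = math.inf  # smallest value kept so far, scanning right to left
    for i in range(len(series) - 1, -1, -1):
        if series[i] > min_later:
            result[i] = math.nan  # s[i] exceeds a later value: it is a bump
        else:
            min_later = series[i]
    return result


cleaned = nullify_local_bumps([1, 5, 3, 4])
print(cleaned)
```

In this example only the value 5 is replaced, because the later value 3 is smaller; the remaining values already form a non-decreasing series.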
-
iccas.processing.reindex_by_interpolating(data, new_index, preserve_ints=True, method='pchip', **interpolation) Reindexes data and fills new values by interpolation (PCHIP, by default).
This function was motivated by the fact that pandas.DataFrame.resample() followed by pandas.DataFrame.interpolate() doesn’t take into account misaligned datetimes.
- Parameters
data (~PandasObj) – a DataFrame or Series with a datetime index
new_index (DatetimeIndex) –
preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int
method – interpolation method (see pandas.DataFrame.interpolate())
**interpolation – other keyword arguments, besides method, passed to pandas.DataFrame.interpolate()
- Return type
~PandasObj
- Returns
a new DataFrame/Series
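To show what reindexing onto misaligned datetimes involves, here is a minimal sketch in plain Python. It uses linear interpolation in place of PCHIP to stay dependency-free, does not extrapolate, and reindex_linear is a hypothetical name; the library's behaviour at the edges of the index may differ.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Sequence


def reindex_linear(times: Sequence[datetime], values: Sequence[float],
                   new_times: Sequence[datetime],
                   preserve_ints: bool = True) -> List[float]:
    """Evaluate a time series at new timestamps by linear interpolation."""
    out = []
    for t in new_times:
        j = bisect_left(times, t)
        if j < len(times) and times[j] == t:
            v = float(values[j])  # timestamp already present
        elif j == 0 or j == len(times):
            v = float("nan")  # outside the original range: no extrapolation
        else:
            t0, t1 = times[j - 1], times[j]
            weight = (t - t0) / (t1 - t0)  # timedelta ratio, a float in (0, 1)
            v = values[j - 1] + weight * (values[j] - values[j - 1])
        out.append(v)
    if preserve_ints and all(isinstance(v, int) for v in values):
        # Round back to integers, leaving NaN (v != v) untouched.
        out = [v if v != v else round(v) for v in out]
    return out


times = [datetime(2020, 1, 1, 12), datetime(2020, 1, 3, 12)]
print(reindex_linear(times, [10, 30], [datetime(2020, 1, 2, 0)]))
```

The new timestamp lies a quarter of the way between the two observations, so the interpolated value is taken a quarter of the way between 10 and 30 and then converted back to an integer.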
-
iccas.processing.resample(data, freq='1D', hour=18, preserve_ints=True, method='pchip', **interpolation) Resamples data and fills missing values by interpolation.
The resulting index is a pandas.DatetimeIndex whose elements are spaced according to freq and have the time set to {hour}:00.
In the case of “day frequencies” (‘{num}D’), the index always includes the latest date (data.index[-1]): the new index is a datetime range built going backwards from the latest date.
This function was motivated by the fact that pandas.DataFrame.resample() followed by pandas.DataFrame.interpolate() doesn’t take into account misaligned datetimes. If you want to back-fill or forward-fill, just use DataFrame.resample().
- Parameters
data (~PandasObj) – a DataFrame or Series with a datetime index
freq (Union[int, str]) – resampling frequency in pandas notation
hour (int) – reference hour; all datetimes in the new index will have this hour
preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int
method – interpolation method (see pandas.DataFrame.interpolate())
**interpolation – other keyword arguments, besides method, passed to pandas.DataFrame.interpolate()
- Return type
~PandasObj
- Returns
a new DataFrame/Series with index elements spaced according to freq
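The backward construction of the index for day frequencies can be sketched with the standard library alone. backward_day_index is a hypothetical helper; how the library treats the time of day of the latest record, and where exactly it stops, are assumptions here.

```python
from datetime import datetime, timedelta
from typing import List


def backward_day_index(first: datetime, latest: datetime,
                       days: int = 1, hour: int = 18) -> List[datetime]:
    """Build timestamps `days` apart, walking backwards from the latest date."""
    # Assumption: the latest date is kept and its time is set to {hour}:00.
    t = latest.replace(hour=hour, minute=0, second=0, microsecond=0)
    index = []
    while t >= first:  # assumption: stop before the first available record
        index.append(t)
        t -= timedelta(days=days)
    return index[::-1]  # chronological order


index = backward_day_index(datetime(2020, 3, 1, 10), datetime(2020, 3, 8, 18), days=3)
print([t.isoformat() for t in index])
```

Building the range from the end means the most recent data point is always present, whatever the frequency; the earliest dates are the ones that may be left out.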
iccas.queries module
-
iccas.queries.aggregate_age_groups(counts, cuts, fmt_last='>={}') Aggregates counts for different age groups by summing them together.
- Parameters
counts (~PandasObj) – can be a Series with age groups as index or a DataFrame with age groups as columns, either in a simple Index or in a MultiIndex (at any level)
cuts (Union[int, Sequence[int]]) – a single integer N means “cut every N years”; a sequence of integers determines the start ages of the new age groups; 0 is implicitly the start age of the first group, even if not present in cuts
fmt_last (str) – format string for the last “unbounded” age group
- Return type
~PandasObj
- Returns
A Series/DataFrame with the same “structure” as the input but with aggregated age groups.
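The meaning of cuts and fmt_last can be illustrated without pandas. In this sketch the input is a hypothetical dict mapping the start age of each original group to a count, and the "lo-hi" label format for bounded groups is an assumption; only the '>={}' default comes from the signature above.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Union


def group_starts(cuts: Union[int, Sequence[int]], last_start: int) -> List[int]:
    """Start ages of the aggregated groups; 0 is always the first start."""
    if isinstance(cuts, int):
        return list(range(0, last_start + 1, cuts))  # "cut every N years"
    return sorted(set(cuts) | {0})


def aggregate_sketch(counts: Dict[int, int], cuts: Union[int, Sequence[int]],
                     fmt_last: str = ">={}") -> Dict[str, int]:
    """Sum counts of the original groups into the coarser groups."""
    starts = group_starts(cuts, last_start=max(counts))
    labels = [f"{lo}-{hi - 1}" for lo, hi in zip(starts, starts[1:])]
    labels.append(fmt_last.format(starts[-1]))  # last group is unbounded
    totals = dict.fromkeys(labels, 0)
    for start_age, n in counts.items():
        group = bisect_right(starts, start_age) - 1  # last start <= start_age
        totals[labels[group]] += n
    return totals


# Ten-year groups keyed by start age: 0-9, 10-19, ..., >=90
counts = {0: 1, 10: 2, 20: 3, 30: 4, 40: 5, 50: 6, 60: 7, 70: 8, 80: 9, 90: 10}
print(aggregate_sketch(counts, [20, 60]))
print(aggregate_sketch(counts, 30))
```

Passing the sequence [20, 60] yields three groups starting at 0, 20 and 60, while the integer 30 yields groups starting every 30 years up to the last original group.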
-
iccas.queries.average_by_period(counts, freq) Returns a new Series/DataFrame with average counts (cases/deaths) by period (e.g. months, weeks, n days, etc.).
- Parameters
counts (~PandasObj) –
freq (Union[str, int]) – period frequency parameter (whatever is accepted by pandas)
- Return type
~PandasObj
-
iccas.queries.cols(prefixes, fields='*') Generates a list of columns by combining prefixes with fields.
- Parameters
prefixes (str) – string containing one or more of the following characters:
- ‘m’ for males
- ‘f’ for females
- ‘t’ for totals (no prefix)
- ‘*’ for all
fields (Union[str, Sequence[str]]) – values: ‘cases’, ‘deaths’, ‘cases_percentage’, ‘deaths_percentage’, ‘fatality_rate’, ‘*’
- Return type
List[str]
- Returns
a list of strings
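The combination of prefixes and fields can be sketched as follows. The 'male_' and 'female_' prefixes are inferred from column names such as 'male_cases' that appear in this reference; the output ordering and the expansion of '*' to 'mft' are assumptions, and cols_sketch is not the library function.

```python
from typing import List, Sequence, Union

PREFIXES = {"m": "male_", "f": "female_", "t": ""}  # 't' means totals: no prefix
FIELDS = ["cases", "deaths", "cases_percentage", "deaths_percentage", "fatality_rate"]


def cols_sketch(prefixes: str, fields: Union[str, Sequence[str]] = "*") -> List[str]:
    """Combine every requested prefix with every requested field."""
    if "*" in prefixes:
        prefixes = "mft"  # assumed expansion order
    if fields == "*":
        fields = FIELDS
    elif isinstance(fields, str):
        fields = [fields]  # a single field name
    return [PREFIXES[p] + field for p in prefixes for field in fields]


print(cols_sketch("mf", "cases"))
print(cols_sketch("t", ["cases", "deaths"]))
```

A list built this way can be used directly to select columns from a DataFrame.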
-
iccas.queries.count_by_period(counts, freq) Returns a new Series/DataFrame with counts (cases/deaths) by period (e.g. months, weeks, n days, etc.).
- Parameters
counts (~PandasObj) –
freq (Union[str, int]) – period frequency parameter (whatever is accepted by pandas)
- Return type
~PandasObj
-
iccas.queries.fatality_rate(counts, shift) Computes the fatality rate as the ratio between the total number of deaths and the total number of cases shift days before. counts is resampled with interpolation if needed.
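The shifted ratio can be made concrete with daily cumulative totals held in plain lists. This is a sketch under the assumption that the data is already sampled once per day; fatality_rate_sketch is a hypothetical name, and returning None where the ratio is undefined is a choice made for this example only.

```python
from typing import List, Optional


def fatality_rate_sketch(cases: List[int], deaths: List[int],
                         shift: int) -> List[Optional[float]]:
    """deaths[day] / cases[day - shift], with None where it is undefined."""
    rates: List[Optional[float]] = []
    for day, dead in enumerate(deaths):
        past = day - shift
        if past < 0 or cases[past] == 0:
            rates.append(None)  # no case count available `shift` days before
        else:
            rates.append(dead / cases[past])
    return rates


cases = [100, 200, 400, 800]   # cumulative cases per day
deaths = [1, 2, 10, 20]        # cumulative deaths per day
print(fatality_rate_sketch(cases, deaths, shift=2))
```

Shifting the denominator compares deaths with the cases recorded shift days earlier, which accounts for the delay between diagnosis and death.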
-
iccas.queries.get_unknown_sex_count(counts, variable) Returns cases/deaths of unknown sex for each age group.
- Return type
DataFrame
-
iccas.queries.only_cases(data) Returns only the columns [‘cases’, ‘female_cases’, ‘male_cases’].
- Return type
DataFrame
-
iccas.queries.only_counts(data) Returns only cases and deaths columns (including sex-specific columns), dropping all other columns that are computable from these.
- Return type
DataFrame
-
iccas.queries.only_deaths(data) Returns only the columns [‘deaths’, ‘female_deaths’, ‘male_deaths’].
- Return type
DataFrame
-
iccas.queries.running_average(counts, window, step=1, **resample_kwargs) Given counts for cases/deaths, returns the average daily number of new cases/deaths inside a temporal window of window days, moving the window step days at a time.
- Parameters
counts (~PandasObj) –
window (int) –
step (int) –
- Return type
~PandasObj
-
iccas.queries.running_count(counts, window, step=1, **resample_kwargs) Given counts for cases and/or deaths, returns the number of new cases/deaths inside a temporal window of window days that moves forward by steps of step days.
- Parameters
counts (~PandasObj) –
window (int) –
step (int) –
- Return type
~PandasObj
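The relation between running_count and running_average can be sketched on a list of daily cumulative counts. Anchoring the windows to the latest day is an assumption made here for consistency with resample(); both function names are hypothetical stand-ins.

```python
from typing import List


def running_count_sketch(cumulative: List[int], window: int, step: int = 1) -> List[int]:
    """New counts inside each `window`-day window, moving `step` days at a time."""
    # Window end positions, chosen backwards from the latest day (assumption).
    ends = range(len(cumulative) - 1, window - 1, -step)
    return [cumulative[i] - cumulative[i - window] for i in reversed(ends)]


def running_average_sketch(cumulative: List[int], window: int, step: int = 1) -> List[float]:
    """Average daily number of new counts inside each window."""
    return [count / window for count in running_count_sketch(cumulative, window, step)]


cumulative = [0, 1, 3, 6, 10, 15, 21]  # one cumulative total per day
print(running_count_sketch(cumulative, window=2, step=2))
print(running_average_sketch(cumulative, window=2, step=2))
```

Because the input is cumulative, the number of new cases in a window is the difference between the totals at its two ends, and the running average is that difference divided by the window length in days.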
iccas.types module
Module contents
The following functions are also available directly from the iccas namespace. Each one is documented above under its submodule.
-
iccas.aggregate_age_groups(counts, cuts, fmt_last='>={}') See iccas.queries.aggregate_age_groups.
-
iccas.average_by_period(counts, freq) See iccas.queries.average_by_period.
-
iccas.cols(prefixes, fields='*') See iccas.queries.cols.
-
iccas.count_by_period(counts, freq) See iccas.queries.count_by_period.
-
iccas.fatality_rate(counts, shift) See iccas.queries.fatality_rate.
-
iccas.fix_monotonicity(data, method='pchip', **interpolation) See iccas.processing.fix_monotonicity.
-
iccas.get(cache_dir=PosixPath('/home/docs/.iccas')) See iccas.loading.get.
-
iccas.get_by_date(date, keep_date=False, cache_dir=PosixPath('/home/docs/.iccas')) See iccas.loading.get_by_date.
-
iccas.get_population_by_age(cache_dir=PosixPath('/home/docs/.iccas')) See iccas.loading.get_population_by_age.
-
iccas.get_population_by_age_group(cache_dir=PosixPath('/home/docs/.iccas')) See iccas.loading.get_population_by_age_group.
-
iccas.get_unknown_sex_count(counts, variable) See iccas.queries.get_unknown_sex_count.
-
iccas.get_url(date=None, fmt='csv') See iccas.loading.get_url.
-
iccas.load_single_date(path, keep_date=False) See iccas.loading.load_single_date.
-
iccas.only_cases(data) See iccas.queries.only_cases.
-
iccas.only_counts(data) See iccas.queries.only_counts.
-
iccas.only_deaths(data) See iccas.queries.only_deaths.
-
iccas.reindex_by_interpolating(data, new_index, preserve_ints=True, method='pchip', **interpolation) See iccas.processing.reindex_by_interpolating.
-
iccas.resample(data, freq='1D', hour=18, preserve_ints=True, method='pchip', **interpolation) See iccas.processing.resample.
-
iccas.running_average(counts, window, step=1, **resample_kwargs) See iccas.queries.running_average.
-
iccas.running_count(counts, window, step=1, **resample_kwargs) See iccas.queries.running_count.