iccas package
Subpackages
Submodules
iccas.checks module
Sanity checks.
iccas.loading module
-
iccas.loading.get(cache_dir=PosixPath('/home/docs/.iccas'))
Returns the latest version of the ICCAS dataset in a pandas.DataFrame (as it’s returned by load()).
This function uses RemoteFolderCache.get(), which caches the downloaded files in cache_dir.
- Raises
requests.exceptions.ConnectionError – if the server is unreachable and no dataset is available in cache_dir
- Return type
DataFrame
-
iccas.loading.get_by_date(date, keep_date=False, cache_dir=PosixPath('/home/docs/.iccas'))
- Return type
Tuple[DataFrame, Timestamp]
-
iccas.loading.get_population_by_age(cache_dir=PosixPath('/home/docs/.iccas'))
Returns a DataFrame with “age” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0).
- Return type
DataFrame
-
iccas.loading.get_population_by_age_group(cache_dir=PosixPath('/home/docs/.iccas'))
Returns a DataFrame with “age_group” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0).
- Return type
DataFrame
-
iccas.loading.get_url(date=None, fmt='csv')
Returns the URL of a dataset in a given format. If date is None, returns the URL of the full dataset.
- Return type
str
-
iccas.loading.load_single_date(path, keep_date=False)
Loads a dataset containing data for a single date.
By default (keep_date=False), the date column is dropped and the datetime is stored in the attrs of the DataFrame. If instead keep_date=True, the returned dataset has a MultiIndex (date, age_group).
- Parameters
path (Union[str, Path])
keep_date (bool) – whether to keep the date column (containing a single datetime value)
- Return type
DataFrame
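The two keep_date behaviours can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the CSV layout, the helper name load_single_date_sketch and the attrs key "date" are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical single-date CSV; the column names are assumptions.
CSV = """date,age_group,cases,deaths
2020-10-01 11:00,0-9,10,0
2020-10-01 11:00,10-19,25,1
"""


def load_single_date_sketch(buffer, keep_date=False):
    df = pd.read_csv(buffer, parse_dates=["date"])
    if keep_date:
        # Keep the date as the outer level of a (date, age_group) MultiIndex.
        return df.set_index(["date", "age_group"])
    # Drop the constant date column and remember its value in attrs.
    date = df["date"].iloc[0]
    df = df.drop(columns="date").set_index("age_group")
    df.attrs["date"] = date  # the key name "date" is an assumption
    return df


flat = load_single_date_sketch(io.StringIO(CSV))
multi = load_single_date_sketch(io.StringIO(CSV), keep_date=True)
```

Here flat is indexed by age_group only and carries the timestamp in flat.attrs, while multi keeps the date as an index level.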
iccas.processing module
-
iccas.processing.fix_monotonicity(data, method='pchip', **interpolation)
Replaces the segments of each cases and deaths time series that break its non-decreasing trend with interpolated data. This function also ensures that the following conditions still hold after the “correction”:
male_cases + female_cases <= cases
male_deaths + female_deaths <= deaths
Non-integer columns, if present, are ignored and returned unchanged in the output DataFrame.
- Parameters
data (DataFrame) – a DataFrame containing all integer columns about cases and deaths
method – interpolation method
- Returns
a DataFrame with all integer time series (columns) modified so that they are non-decreasing
-
iccas.processing.nullify_series_local_bumps(series)
Sets to NaN all elements s[i] such that s[i] > s[i+k].
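The rule above, combined with the interpolation step described for fix_monotonicity, can be sketched as follows. The sketch reads the condition as "for some k > 0", which is an assumption, and it uses linear interpolation only to avoid a SciPy dependency (the documented default is 'pchip'). The helper name is hypothetical.

```python
import pandas as pd


def nullify_bumps_sketch(series: pd.Series) -> pd.Series:
    # Minimum of each element and everything after it (reverse cumulative min).
    future_min = series[::-1].cummin()[::-1]
    # s[i] > s[i+k] for some k > 0 is equivalent to s[i] > min(s[i:]).
    return series.where(series <= future_min)


s = pd.Series([1.0, 5.0, 3.0, 4.0, 6.0])
holes = nullify_bumps_sketch(s)  # the 5.0 "bump" becomes NaN
# Linear interpolation stands in for PCHIP here (assumption of this sketch).
filled = holes.interpolate(method="linear")
```

After the bump is removed and the hole is filled, the series is non-decreasing again.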
-
iccas.processing.reindex_by_interpolating(data, new_index, preserve_ints=True, method='pchip', **interpolation)
Reindexes data and fills new values by interpolation (PCHIP, by default).
This function was motivated by the fact that pandas.DataFrame.resample() followed by pandas.DataFrame.interpolate() doesn’t take into account misaligned datetimes.
- Parameters
data (~PandasObj) – a DataFrame or Series with a datetime index
new_index (DatetimeIndex)
preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int
method – interpolation method (see pandas.DataFrame.interpolate())
**interpolation – other interpolation keyword arguments (besides method) passed to pandas.DataFrame.interpolate()
- Return type
~PandasObj
- Returns
a new DataFrame/Series
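One way to interpolate onto timestamps that do not line up with the original ones is to interpolate over the union of both indexes and then keep only the new index. The sketch below shows that idea for DataFrames only; it is not the library's code, the helper name is hypothetical, and it uses method="time" instead of PCHIP to avoid a SciPy dependency.

```python
import pandas as pd


def reindex_by_interpolating_sketch(data, new_index, preserve_ints=True):
    # Interpolate over old and new timestamps together, so values are weighted
    # by real time distance even when the two indexes are misaligned.
    union = data.index.union(new_index)
    out = data.reindex(union).interpolate(method="time").reindex(new_index)
    if preserve_ints:
        int_cols = [
            c for c in data.columns if pd.api.types.is_integer_dtype(data[c])
        ]
        # Assumes new_index lies inside the original time range (no NaN left).
        out = out.round({c: 0 for c in int_cols}).astype({c: int for c in int_cols})
    return out


data = pd.DataFrame(
    {"cases": [100, 200]},
    index=pd.to_datetime(["2020-10-01 12:00", "2020-10-02 12:00"]),
)
new_index = pd.to_datetime(["2020-10-01 18:00", "2020-10-02 06:00"])
result = reindex_by_interpolating_sketch(data, new_index)
```

The new timestamps fall 6 and 18 hours after the first observation, so the interpolated values sit a quarter and three quarters of the way between 100 and 200.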
-
iccas.processing.resample(data, freq='1D', hour=18, preserve_ints=True, method='pchip', **interpolation)
Resamples data and fills missing values by interpolation.
The resulting index is a pandas.DatetimeIndex whose elements are spaced according to freq and have the time set to {hour}:00.
In the case of “day frequencies” (‘{num}D’), the index always includes the latest date (data.index[-1]): the new index is a datetime range built going backwards from the latest date.
This function was motivated by the fact that pandas.DataFrame.resample() followed by pandas.DataFrame.interpolate() doesn’t take into account misaligned datetimes. If you want to back-fill or forward-fill, just use DataFrame.resample().
- Parameters
data (~PandasObj) – a DataFrame or Series with a datetime index
freq (Union[int, str]) – resampling frequency in pandas notation
hour (int) – reference hour; all datetimes in the new index will have this hour
preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int
method – interpolation method (see pandas.DataFrame.interpolate())
**interpolation – other interpolation keyword arguments (besides method) passed to pandas.DataFrame.interpolate()
- Return type
~PandasObj
- Returns
a new DataFrame/Series with index elements spaced according to freq
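The "built going backwards from the latest date" rule for day frequencies can be illustrated with pandas.date_range. This is a sketch under assumptions: the helper name is hypothetical, and the choice to stop before the first original timestamp is a guess at reasonable behaviour, not something the documentation states.

```python
import pandas as pd


def backward_daily_index(index: pd.DatetimeIndex, days: int = 1, hour: int = 18):
    # Anchor on the latest date, with the time of day forced to {hour}:00.
    end = index[-1].normalize() + pd.Timedelta(hours=hour)
    # Number of steps of `days` days that fit without going before the first
    # original timestamp (assumption of this sketch).
    periods = (end - index[0]) // pd.Timedelta(days=days) + 1
    return pd.date_range(end=end, periods=periods, freq=f"{days}D")


idx = pd.to_datetime(["2020-10-01 12:00", "2020-10-08 15:00"])
new_idx = backward_daily_index(idx, days=3)
```

With a 3-day frequency the resulting index ends on 2020-10-08 at 18:00 and steps back to 2020-10-05 and 2020-10-02, rather than starting from the first date and possibly missing the latest one.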
iccas.queries module
-
iccas.queries.aggregate_age_groups(counts, cuts, fmt_last='>={}')
Aggregates counts for different age groups by summing them together.
- Parameters
counts (~PandasObj) – can be a Series with age groups as index or a DataFrame with age groups as columns, either in a simple Index or in a MultiIndex (at any level)
cuts (Union[int, Sequence[int]]) – a single integer N means “cut every N years”; a sequence of integers determines the start ages of the new age groups; 0 is implicitly the start age of the first group, even if not present in cuts
fmt_last (str) – format string for the last “unbounded” age group
- Return type
~PandasObj
- Returns
A Series/DataFrame with the same “structure” as the input but with aggregated age groups.
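The semantics of cuts can be sketched on a toy Series. Several things here are assumptions made for illustration: the original groups are represented by their start age (0, 10, ..., 90), the new labels use an "a-b" format, and the helper names are hypothetical.

```python
import pandas as pd


def group_labels(cuts, max_age, fmt_last=">={}"):
    # A single integer N means "cut every N years".
    if isinstance(cuts, int):
        cuts = range(0, int(max_age) + 1, cuts)
    starts = sorted(set(cuts) | {0})  # 0 always starts the first group
    labels = [f"{a}-{b - 1}" for a, b in zip(starts, starts[1:])]
    labels.append(fmt_last.format(starts[-1]))  # last group is unbounded
    return starts, labels


def aggregate_age_groups_sketch(counts: pd.Series, cuts, fmt_last=">={}"):
    starts, labels = group_labels(cuts, counts.index.max(), fmt_last)
    bins = starts + [float("inf")]
    groups = pd.cut(counts.index.to_numpy(), bins=bins, right=False, labels=labels)
    return counts.groupby(groups, observed=False).sum()


counts = pd.Series(range(1, 11), index=range(0, 100, 10))
by_list = aggregate_age_groups_sketch(counts, [20, 60])
by_step = aggregate_age_groups_sketch(counts, 20)
```

Passing [20, 60] yields three groups starting at 0, 20 and 60, while passing 20 yields a group every 20 years with the last one left open-ended.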
-
iccas.queries.average_by_period(counts, freq)
Returns a new Series/DataFrame with average counts (cases/deaths) by period (e.g. months, weeks, n days, etc.).
- Parameters
counts (~PandasObj)
freq (Union[str, int]) – period frequency (anything accepted by pandas)
- Return type
~PandasObj
-
iccas.queries.cols(prefixes, fields='*')
Generates a list of columns by combining prefixes with fields.
- Parameters
prefixes (str) – string containing one or more of the following characters:
- ‘m’ for males
- ‘f’ for females
- ‘t’ for totals (no prefix)
- ‘*’ for all
fields (Union[str, Sequence[str]]) – values: ‘cases’, ‘deaths’, ‘cases_percentage’, ‘deaths_percentage’, ‘fatality_rate’, ‘*’
- Return type
List[str]
- Returns
a list of strings
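The prefix/field combination can be sketched in a few lines. The "male_" and "female_" prefixes follow the column names listed for only_cases and only_deaths; the output order and the helper name are assumptions of this sketch.

```python
from itertools import product

PREFIXES = {"m": "male_", "f": "female_", "t": ""}
FIELDS = ["cases", "deaths", "cases_percentage", "deaths_percentage", "fatality_rate"]


def cols_sketch(prefixes, fields="*"):
    if "*" in prefixes:
        prefixes = "mft"  # '*' expands to every prefix
    if fields == "*":
        fields = FIELDS  # '*' expands to every field
    elif isinstance(fields, str):
        fields = [fields]
    # Prefix-major order is an assumption, not documented behaviour.
    return [PREFIXES[p] + f for p, f in product(prefixes, fields)]


sex_cases = cols_sketch("mf", "cases")
totals = cols_sketch("t", ["cases", "deaths"])
everything = cols_sketch("*")
```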
-
iccas.queries.count_by_period(counts, freq)
Returns a new Series/DataFrame with counts (cases/deaths) by period (e.g. months, weeks, n days, etc.).
- Parameters
counts (~PandasObj)
freq (Union[str, int]) – period frequency (anything accepted by pandas)
- Return type
~PandasObj
-
iccas.queries.fatality_rate(counts, shift)
Computes the fatality rate as the ratio between the total number of deaths and the total number of cases shift days before. counts is resampled with interpolation if needed.
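The shifted ratio can be sketched on data that already has a daily index, so no resampling is needed. The helper name is hypothetical and the figures are invented for illustration.

```python
import pandas as pd


def fatality_rate_sketch(counts: pd.DataFrame, shift: int) -> pd.Series:
    # Deaths today divided by cases `shift` days earlier; assumes a daily index.
    return (counts["deaths"] / counts["cases"].shift(shift)).dropna()


days = pd.date_range("2020-10-01 18:00", periods=4, freq="D")
counts = pd.DataFrame(
    {"cases": [100, 200, 400, 800], "deaths": [1, 2, 10, 20]}, index=days
)
rate = fatality_rate_sketch(counts, shift=2)
```

The first shift rows have no earlier cases to compare against, so they are dropped from the result.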
-
iccas.queries.get_unknown_sex_count(counts, variable)
Returns cases/deaths of unknown sex for each age group.
- Return type
DataFrame
-
iccas.queries.only_cases(data)
Returns only columns [‘cases’, ‘female_cases’, ‘male_cases’].
- Return type
DataFrame
-
iccas.queries.only_counts(data)
Returns only cases and deaths columns (including sex-specific columns), dropping all other columns that are computable from these.
- Return type
DataFrame
-
iccas.queries.only_deaths(data)
Returns only columns [‘deaths’, ‘female_deaths’, ‘male_deaths’].
- Return type
DataFrame
-
iccas.queries.running_average(counts, window, step=1, **resample_kwargs)
Given counts for cases/deaths, returns the average daily number of new cases/deaths inside a temporal window of window days, moving the window step days at a time.
- Parameters
counts (~PandasObj)
window (int)
step (int)
- Return type
~PandasObj
-
iccas.queries.running_count(counts, window, step=1, **resample_kwargs)
Given counts for cases and/or deaths, returns the number of new cases/deaths inside a temporal window of window days that moves forward in steps of step days.
- Parameters
counts (~PandasObj)
window (int)
step (int)
- Return type
~PandasObj
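The relation between running_count and running_average can be sketched on a cumulative daily series. This is not the library's implementation: it assumes the input is cumulative and already daily, that windows are anchored on the latest date, and the helper names are hypothetical.

```python
import pandas as pd


def running_count_sketch(cumulative: pd.Series, window: int, step: int = 1):
    # New events in the `window` days ending at each date.
    new = (cumulative - cumulative.shift(window)).dropna()
    # Keep every `step`-th value, counting backwards from the latest date.
    return new.iloc[::-1].iloc[::step].iloc[::-1]


def running_average_sketch(cumulative: pd.Series, window: int, step: int = 1):
    # Average daily number of new events inside each window.
    return running_count_sketch(cumulative, window, step) / window


days = pd.date_range("2020-10-01 18:00", periods=8, freq="D")
cases = pd.Series([0, 10, 30, 60, 100, 150, 210, 280], index=days)
count = running_count_sketch(cases, window=3, step=2)
average = running_average_sketch(cases, window=3, step=2)
```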
iccas.types module
Module contents
-
iccas.aggregate_age_groups(counts, cuts, fmt_last='>={}')
Aggregates counts for different age groups by summing them together.
- Parameters
counts (~PandasObj) – can be a Series with age groups as index or a DataFrame with age groups as columns, either in a simple Index or in a MultiIndex (at any level)
cuts (Union[int, Sequence[int]]) – a single integer N means “cut every N years”; a sequence of integers determines the start ages of the new age groups; 0 is implicitly the start age of the first group, even if not present in cuts
fmt_last (str) – format string for the last “unbounded” age group
- Return type
~PandasObj
- Returns
A Series/DataFrame with the same “structure” as the input but with aggregated age groups.
-
iccas.average_by_period(counts, freq)
Returns a new Series/DataFrame with average counts (cases/deaths) by period (e.g. months, weeks, n days, etc.).
- Parameters
counts (~PandasObj)
freq (Union[str, int]) – period frequency (anything accepted by pandas)
- Return type
~PandasObj
-
iccas.cols(prefixes, fields='*')
Generates a list of columns by combining prefixes with fields.
- Parameters
prefixes (str) – string containing one or more of the following characters:
- ‘m’ for males
- ‘f’ for females
- ‘t’ for totals (no prefix)
- ‘*’ for all
fields (Union[str, Sequence[str]]) – values: ‘cases’, ‘deaths’, ‘cases_percentage’, ‘deaths_percentage’, ‘fatality_rate’, ‘*’
- Return type
List[str]
- Returns
a list of strings
-
iccas.count_by_period(counts, freq)
Returns a new Series/DataFrame with counts (cases/deaths) by period (e.g. months, weeks, n days, etc.).
- Parameters
counts (~PandasObj)
freq (Union[str, int]) – period frequency (anything accepted by pandas)
- Return type
~PandasObj
-
iccas.fatality_rate(counts, shift)
Computes the fatality rate as the ratio between the total number of deaths and the total number of cases shift days before. counts is resampled with interpolation if needed.
-
iccas.fix_monotonicity(data, method='pchip', **interpolation)
Replaces the segments of each cases and deaths time series that break its non-decreasing trend with interpolated data. This function also ensures that the following conditions still hold after the “correction”:
male_cases + female_cases <= cases
male_deaths + female_deaths <= deaths
Non-integer columns, if present, are ignored and returned unchanged in the output DataFrame.
- Parameters
data (DataFrame) – a DataFrame containing all integer columns about cases and deaths
method – interpolation method
- Returns
a DataFrame with all integer time series (columns) modified so that they are non-decreasing
-
iccas.get(cache_dir=PosixPath('/home/docs/.iccas'))
Returns the latest version of the ICCAS dataset in a pandas.DataFrame (as it’s returned by load()).
This function uses RemoteFolderCache.get(), which caches the downloaded files in cache_dir.
- Raises
requests.exceptions.ConnectionError – if the server is unreachable and no dataset is available in cache_dir
- Return type
DataFrame
-
iccas.get_by_date(date, keep_date=False, cache_dir=PosixPath('/home/docs/.iccas'))
- Return type
Tuple[DataFrame, Timestamp]
-
iccas.get_population_by_age(cache_dir=PosixPath('/home/docs/.iccas'))
Returns a DataFrame with “age” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0).
- Return type
DataFrame
-
iccas.get_population_by_age_group(cache_dir=PosixPath('/home/docs/.iccas'))
Returns a DataFrame with “age_group” as index and two columns: “value” (absolute counts) and “percentage” (<=1.0).
- Return type
DataFrame
-
iccas.get_unknown_sex_count(counts, variable)
Returns cases/deaths of unknown sex for each age group.
- Return type
DataFrame
-
iccas.get_url(date=None, fmt='csv')
Returns the URL of a dataset in a given format. If date is None, returns the URL of the full dataset.
- Return type
str
-
iccas.load_single_date(path, keep_date=False)
Loads a dataset containing data for a single date.
By default (keep_date=False), the date column is dropped and the datetime is stored in the attrs of the DataFrame. If instead keep_date=True, the returned dataset has a MultiIndex (date, age_group).
- Parameters
path (Union[str, Path])
keep_date (bool) – whether to keep the date column (containing a single datetime value)
- Return type
DataFrame
-
iccas.only_cases(data)
Returns only columns [‘cases’, ‘female_cases’, ‘male_cases’].
- Return type
DataFrame
-
iccas.only_counts(data)
Returns only cases and deaths columns (including sex-specific columns), dropping all other columns that are computable from these.
- Return type
DataFrame
-
iccas.only_deaths(data)
Returns only columns [‘deaths’, ‘female_deaths’, ‘male_deaths’].
- Return type
DataFrame
-
iccas.reindex_by_interpolating(data, new_index, preserve_ints=True, method='pchip', **interpolation)
Reindexes data and fills new values by interpolation (PCHIP, by default).
This function was motivated by the fact that pandas.DataFrame.resample() followed by pandas.DataFrame.interpolate() doesn’t take into account misaligned datetimes.
- Parameters
data (~PandasObj) – a DataFrame or Series with a datetime index
new_index (DatetimeIndex)
preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int
method – interpolation method (see pandas.DataFrame.interpolate())
**interpolation – other interpolation keyword arguments (besides method) passed to pandas.DataFrame.interpolate()
- Return type
~PandasObj
- Returns
a new DataFrame/Series
-
iccas.resample(data, freq='1D', hour=18, preserve_ints=True, method='pchip', **interpolation)
Resamples data and fills missing values by interpolation.
The resulting index is a pandas.DatetimeIndex whose elements are spaced according to freq and have the time set to {hour}:00.
In the case of “day frequencies” (‘{num}D’), the index always includes the latest date (data.index[-1]): the new index is a datetime range built going backwards from the latest date.
This function was motivated by the fact that pandas.DataFrame.resample() followed by pandas.DataFrame.interpolate() doesn’t take into account misaligned datetimes. If you want to back-fill or forward-fill, just use DataFrame.resample().
- Parameters
data (~PandasObj) – a DataFrame or Series with a datetime index
freq (Union[int, str]) – resampling frequency in pandas notation
hour (int) – reference hour; all datetimes in the new index will have this hour
preserve_ints (bool) – after interpolation, columns containing integers in the original dataframe are rounded and converted back to int
method – interpolation method (see pandas.DataFrame.interpolate())
**interpolation – other interpolation keyword arguments (besides method) passed to pandas.DataFrame.interpolate()
- Return type
~PandasObj
- Returns
a new DataFrame/Series with index elements spaced according to freq
-
iccas.running_average(counts, window, step=1, **resample_kwargs)
Given counts for cases/deaths, returns the average daily number of new cases/deaths inside a temporal window of window days, moving the window step days at a time.
- Parameters
counts (~PandasObj)
window (int)
step (int)
- Return type
~PandasObj
-
iccas.running_count(counts, window, step=1, **resample_kwargs)
Given counts for cases and/or deaths, returns the number of new cases/deaths inside a temporal window of window days that moves forward in steps of step days.
- Parameters
counts (~PandasObj)
window (int)
step (int)
- Return type
~PandasObj