Welcome!

This project is ongoing. If you have a problem, please open an issue on the github page. If you would like to add functionality, then open an issue and we can discuss it.

Github Page

About The Renamer

What does The Renamer do?

The Renamer is used to rename entries in pandas dataframe. It first reads in a csv file that has codes, or already known names to support identification and renaming. It uses a dictionary, with zip and map to simplify the process.

What does The Renamer do for me?

It helps separate data from code to keep your code base small and makes it easier to work with renaming a lot entries since you can edit the ids.csv in a full featured editor like excel. It also makes renaming things more repeatable since you just put the csv containing the ids in your code base, import build_ids and renamer and go.

If you do exploratory data analysis a lot, then you probably already have a system. Keep it, and augment it with The Renamer, or better yet make a pull request, and I can integrate your system.

Notes

To see a nice implementation, go to my Medium Article .

Excel Usage

The file used to contain the codes and known names is a csv file. Excel generally does well with csv files; however, it has a nasty habit of converting 0230 to an integer then 230 when you open the file by double clicking it. To get around this, open Excel first and use Excel to open the file. By default it will give you an option to pick the deliminator and assign the text type to each column.

Best Way to Use

Currently, the best way to use The Renamer is by cloning the master branch and placing the_renamer.py and ids.csv in your working directory. The code is well thought out, but I did not want to clutter PyPI since it is only ~200 lines. If there is enough desire, then I am happy to.

Github Master Branch

Functions

the_renamer.build_ids(path_to_ids, check_keys_unique=True)

Makes dictionaries for for all columns in ids csv file.

Parameters
  • path_to_ids (path or string) – Csv file containing known names and codes. Cannot have any blank entries.

  • check_keys_unique (boolean) – Prevents duplicate keys from being overwritten with last value.

Returns

Same order as the codes in the ids csv.

Return type

list of dicts

the_renamer.renamer(df, rename_col, code_col, rename_dict, return_original=True, replacement_value=None, in_place=False)

Renames column entries based on rename dictionary.

Parameters
  • df (pd.DataFrame) – Dataframe in.

  • rename_col (str) – Name of column for entries to rename.

  • code_col (str) – Name of column for entries that will be used identify the row. May be the same as rename_col.

  • rename_dict (dict) – Map from code_col value to rename_col new value.

  • return_original (bool) – Whether to alter values not specified in ids.csv.

  • replacement_value (any) – If return_original equals False, value to replace unspecified values.

  • in_place (bool) – Make changes in place or return a copy of the Dataframe.

Returns

Transformed dataframe.

Return type

pd.DataFrame

IDs.csv

This just a normal csv file with a few rules:
  1. Row 1, enter the names of the columns. The names are for your own book keeping. They don’t appear anywhere else. They must be unique within row 1.

  2. Row 2, enter you must have ‘destination’, ‘one’, or ‘many’ based on the type of column.

    • ‘Destination’ type columns contain the desired name.

    • ‘One’ type columns contains only single entries.

    • ‘Many’ type columns contain multiple entries separated by a comma.

  3. You can only have one ‘destination’ column. You can have arbitrarily many ‘one’ or ‘many’ columns. The columns can be arbitrarily long.

  4. The entries in the ‘destination’ or ‘one’ column can be any type, but the ‘many’ columns entries must be str type.

  5. No entries can be blank. If no value is desired, use ‘<NONE>’. If the value of ‘<NONE>’ is desired you change this by updating the ‘none_value’ variable at the top of the_renamer.py.

US States Example ids.csv

(displayed as if editing in Excel or Google sheets)

desired

most_common

all

destination

single

many

AL

Alabama

alabama, ALABAMA, Alabama, BAMA

AK

<NONE>

Arkansas, arkansas, ARKANSAS, The Other Kansas

AZ

Arizona

Arizona, arizona, ARIZONA

Test Coverage

There is pretty good test coverage. If you are having an issue, the first thing I will ask you to do is clone the most recent ‘master’ branch, run the tests, and then send me a screen shot.

Github Master Branch

class test_the_renamer.TestRenamer(methodName='runTest')
setUp()

Hook method for setting up the test fixture before exercising it.

test_many_keys_replacement()

Test using dictionary built from ‘many’ column from ids.csv. Returns ‘non-state’ for values not found in dictionary.

test_many_keys_return_original()

Test using dictionary built from ‘many’ column from ids.csv. Returns original values for values not found in dictionary.

test_single_key_return_original()

Test using dictionary built from ‘single’ column from ids.csv. Returns original values for values not found in dictionary.