Question: Please help with this function
Suppose I have a recommendation DataFrame like this:
                     event_day   firm               action
Date
2013-12-09 16:00:00  2013-12-09  CRT Capital        init
2013-12-18 16:00:00  2013-12-18  UBS                init
2013-12-19 16:00:00  2013-12-19  Evercore Partners  init
2014-01-29 08:00:00  2014-01-29  Bank of America    up
2014-01-29 10:52:18  2014-01-29  Deutsche Bank      main
...                  ...         ...                ...
2020-06-15 11:34:19  2020-06-15  Citigroup          main
2020-07-24 14:50:25  2020-07-24  Stifel             main
2020-07-27 10:34:07  2020-07-27  Raymond James      up
2020-09-02 09:14:41  2020-09-02  Berenberg          down
2020-10-13 10:42:53  2020-10-13  Susquehanna        down
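To reproduce the problem locally, a small DataFrame with the same layout can be built from the visible rows of the table above (the elided rows are left out). This is a sketch under assumptions: the index is taken to be a `DatetimeIndex` named `Date`, and `event_day` is assumed to hold datetimes, since the actual output of `read_rec_csv` is not shown.

```python
import pandas as pd

# Visible rows from the table in the question: (timestamp, event_day, firm, action).
# The elided "..." rows are not reproduced.
rows = [
    ("2013-12-09 16:00:00", "2013-12-09", "CRT Capital", "init"),
    ("2013-12-18 16:00:00", "2013-12-18", "UBS", "init"),
    ("2013-12-19 16:00:00", "2013-12-19", "Evercore Partners", "init"),
    ("2014-01-29 08:00:00", "2014-01-29", "Bank of America", "up"),
    ("2014-01-29 10:52:18", "2014-01-29", "Deutsche Bank", "main"),
    ("2020-06-15 11:34:19", "2020-06-15", "Citigroup", "main"),
    ("2020-07-24 14:50:25", "2020-07-24", "Stifel", "main"),
    ("2020-07-27 10:34:07", "2020-07-27", "Raymond James", "up"),
    ("2020-09-02 09:14:41", "2020-09-02", "Berenberg", "down"),
    ("2020-10-13 10:42:53", "2020-10-13", "Susquehanna", "down"),
]

rec_df = pd.DataFrame(
    [r[1:] for r in rows],
    columns=["event_day", "firm", "action"],
    index=pd.DatetimeIndex([r[0] for r in rows], name="Date"),
)
# Assumption: event_day is a datetime column (read_rec_csv may store it differently).
rec_df["event_day"] = pd.to_datetime(rec_df["event_day"])

print(rec_df)
```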
How can I sort the firms by number of recommendations and pick the top 30? Please help with this function:
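A minimal sketch of the top-30 selection, assuming `rec_df` has a `firm` column as in the table above. It follows the tie-break rule from the Notes in the docstring: sort by count descending, then by firm name ascending within equal counts, then keep the first `n`. `top_firms` and its `n` parameter are hypothetical names used only for illustration. Note that in the attempt in the function, `sort_values` and `sort_index` return new Series that are never assigned, and sorting by index after sorting by values would discard the count ordering anyway; a single two-key `sort_values` avoids both problems.

```python
import pandas as pd


def top_firms(rec_df: pd.DataFrame, n: int = 30) -> list:
    """Return the n firms with the most rows in rec_df.

    Ties in the count are broken alphabetically by firm name.
    """
    counts = (
        rec_df["firm"]
        .value_counts()                  # firm -> number of recommendations
        .rename_axis("firm")             # name the index so it becomes a column
        .reset_index(name="count")       # columns: firm, count
        .sort_values(["count", "firm"], ascending=[False, True])
    )
    return counts["firm"].head(n).tolist()


# Usage with a tiny example: B has 3 rows, A and C tie with 2, D has 1.
example = pd.DataFrame({"firm": ["B", "A", "C", "B", "A", "C", "B", "D"]})
print(top_firms(example, n=3))  # ['B', 'A', 'C']
```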
    def proc_rec_df(rec_df):
        """ This function takes a dataframe with the recommendations for a given
        ticker and performs the following operations **in this order**:

        1. Keep only the top 30 firms (in terms of number of recommendations over
           the entire sample period) for this ticker (see Notes below).
        2. Keep only recommendations that represent either an upgrade or a
           downgrade (that is, the values of `rec_df['action']` are either 'up'
           or 'down'). If there are no observations that match this criterion,
           this function will return an empty dataframe with the index and
           columns specified in the Returns section below.
        3. Return the dataframe as described in the Returns section below.

        Parameters
        ----------
        rec_df : dataframe
            Dataframe produced by the function `read_rec_csv` created above.

        Returns
        -------
        df
            A Pandas dataframe with the same structure as `rec_df` but only
            including upgrades and downgrades:
            - df.index : index of the same type as rec_df.index, but not
              necessarily of the same length.
            - df.columns : columns as in rec_df.columns
            - df['action']: if dataframe is not empty, this column should only
              contain values 'up' or 'down'.

        Notes
        -----
        - To select the top 30 firms:
            1. Count the number of observations for each individual value of
               the column 'firm'.
            2. Sort the result by counts in descending order.
            3. For each count value, sort the firms in the group alphabetically.
            4. Keep the first 30 firms only.
          The procedure above means that if there is a tie for the 30th highest
          count, the firm name will be used when deciding which firms to keep
          (alphabetical priority).
        - The output of this function is a dataframe that looks like this (the
          contents of the df below are for illustration purposes only and will
          **not** necessarily represent the actual contents of the dataframe you
          create. In addition, the actual dataframe returned could be empty):

            |                     | event_day  | firm           | action |
            | index (datetime64)  |            |                |        |
            |---------------------+------------+----------------+--------|
            | 2012-02-16 13:53:00 | 2012-02-16 | Wunderlich     | down   |
            | 2012-03-26 07:31:00 | 2012-03-26 | Wunderlich     | up     |
            | 2020-07-28 09:57:21 | 2020-07-28 | Bernstein      | down   |
            | 2020-08-14 09:19:00 | 2020-08-14 | Morgan Stanley | up     |
        """
        # --------------------------------------------------------
        #   Get top 30 firms
        # --------------------------------------------------------
        # Earlier attempt (commented out):
        # df_firm_count = rec_df.value_counts('firm', ascending=False)
        # return df_firm_count
        # top30_firm = df_firm_count[0:30]
        # print(top30_firm.index)
        groups = rec_df.groupby(by='firm')
        res = groups.apply(len)                    # number of rows per firm
        res.sort_values(axis=0, ascending=False)   # returns a new Series; result is not assigned
        res.sort_index(ascending=True)             # returns a new Series; result is not assigned
        print(res)
        return groups

        # --------------------------------------------------------
        #   Subset the DF to include only these firms
        # --------------------------------------------------------

        # --------------------------------------------------------
        #   Keep only the columns we want
        #   cols = ['event_day', 'firm', 'action']
        # --------------------------------------------------------

        # --------------------------------------------------------
        #   Keep only values of 'action' that are either 'up' or 'down'
        # --------------------------------------------------------

        # --------------------------------------------------------
        #   Return the dataframe
        # --------------------------------------------------------
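One possible way to fill in all three steps of `proc_rec_df`, sketched under the assumption that `rec_df` has `firm` and `action` columns and a datetime index as shown in the question. It ranks firms with a two-key sort (count descending, firm name ascending), keeps the first 30, and then filters to `'up'`/`'down'` with boolean masks. This is a sketch, not a reference solution.

```python
import pandas as pd


def proc_rec_df(rec_df: pd.DataFrame) -> pd.DataFrame:
    """Keep the top 30 firms, then only 'up'/'down' recommendations."""
    # Step 1: rank firms by number of recommendations (descending),
    # breaking ties alphabetically, and keep the first 30.
    counts = (
        rec_df["firm"]
        .value_counts()
        .rename_axis("firm")
        .reset_index(name="count")
        .sort_values(["count", "firm"], ascending=[False, True])
    )
    top30 = counts["firm"].head(30)
    df = rec_df[rec_df["firm"].isin(top30)]

    # Step 2: keep only upgrades and downgrades. Boolean indexing keeps the
    # original columns and index type, so an empty result still has them.
    df = df[df["action"].isin(["up", "down"])]

    # Step 3: return a copy so later edits do not modify rec_df.
    return df.copy()
```

Because both filters use boolean indexing, an empty result still carries the original columns and an empty index of the same type, which covers the empty-dataframe case in step 2. The docstring asks for the same columns as `rec_df.columns`, so the sketch does not apply the `cols` subset from the template comment; if `read_rec_csv` returns extra columns, `df[cols]` could be added before returning.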
