Dear Community
Good Afternoon!
I have been working on some code to parse and analyze Hamilton TRC files per method. I would like to share what I have done so far and what my main goal is.
Code
import glob
import re
import os.path
import pandas as pd


class HAMILTONLogParser:
    @staticmethod
    def get_data(path: str):
        time = r"[0-9][0-9]:[0-9][0-9]"
        date = r"\d{4}-\d{2}-\d{2}"
        runtime = []
        rundate = []
        results = []
        for file in glob.glob(path):
            # Read each trace file once.
            with open(file, "r") as f:
                text = f.read()
            # Skip runs that logged an error or were aborted.
            if "Error" in text or "Abort" in text:
                continue
            results.append(file)
            runtime.append(re.findall(time, text))
            rundate.append(re.findall(date, text))
        output = {}
        output["times"] = runtime
        # Keep only the first date found in each file.
        output["dates"] = [dates[0] for dates in rundate]
        output["filename"] = results
        return output

    @staticmethod
    def hami_dataframe(data, output: str, raw=True, verbose=False):
        # One row per file, one column per HH:MM match, indexed by run date.
        table = pd.DataFrame(data["times"], index=data["dates"])
        table.insert(0, column="filename", value=data["filename"])
        table.insert(1, column="records", value=table.count(axis=1))
        table.sort_index(ascending=False, inplace=True)
        if not raw:
            # Convert the "HH:MM" columns to timedeltas; other columns are reported and skipped.
            for col in table.columns:
                try:
                    table[col] = pd.to_timedelta(table[col] + ':00')
                except (TypeError, ValueError):
                    print(f"({col})")
            # Intended result: last recorded time minus first recorded time, per run.
            # Note: each pass assigns the whole "duration" column at once.
            for i in table["records"] - 1:
                try:
                    table["duration"] = table[i] - table[0]
                except Exception:
                    print(i)
            table = table[['filename', 'records', 'duration']]
            print(table)
        table.to_excel(output)
        if verbose:
            print("NUMBER OF FILES: ", len(data["filename"]))
            print("NUMBER OF RECORDS IN TIME: ", len(data["times"]))
            print("NUMBER OF RECORDS IN DATE: ", len(data["dates"]))
            table.info()
        if os.path.exists(output):
            print("File Written")
        else:
            print("FILE NOT WRITTEN")
        return table
Run as:
f1 = HAMILTONLogParser.get_data(path="path/to/protocol_*")
f2 = HAMILTONLogParser.hami_dataframe(f1, output="path/outputname/file.xlsx", verbose=True, raw=True)
My code works but is far from ideal. With raw=True it returns the whole table with multiple hour:minute values per run, so you can calculate the duration manually (last column with a value minus the first column with a value) using a tool such as Excel. With raw=False the duration is supposed to be calculated automatically, but some rows are not being calculated.
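One possible way to get a duration for every run is to compute it per file, straight from the list of times, instead of from the padded table. The sketch below is only an illustration: it assumes the dictionary layout returned by get_data above and that a run does not cross midnight. The names run_duration and duration_table are hypothetical helpers, and the sample values are made up.

```python
import pandas as pd


def run_duration(times):
    """Return last time minus first time for one run.

    `times` is the list of "HH:MM" strings found in a single trace file,
    in the order they appear in the log.
    """
    stamps = [t for t in times if isinstance(t, str)]
    if len(stamps) < 2:
        # Not enough timestamps to measure anything.
        return pd.NaT
    first = pd.to_timedelta(stamps[0] + ":00")
    last = pd.to_timedelta(stamps[-1] + ":00")
    return last - first


def duration_table(data):
    """Build one row per run from the dict produced by get_data."""
    return pd.DataFrame(
        {
            "filename": data["filename"],
            "records": [len(t) for t in data["times"]],
            "duration": [run_duration(t) for t in data["times"]],
        },
        index=data["dates"],
    )


# Made-up sample in the same shape as the get_data output.
sample = {
    "times": [["09:00", "09:10", "09:45"], ["13:05", "13:30"]],
    "dates": ["2023-01-02", "2023-01-03"],
    "filename": ["protocol_a.trc", "protocol_b.trc"],
}
print(duration_table(sample))
```

Because each run is handled on its own list, the number of timestamps in one file no longer affects the result for another file.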
I would like to hear whether a better, more mature tool already exists and whether this concept would be viable in your daily routine. Where I work we improve our methods continuously, and so far we have had no way to compare multiple runs before and after a change in the method code (Venus 4).
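For the before/after comparison, a minimal sketch could split the runs at the date of a method change and summarise the durations on each side. It assumes a table with ISO dates in the index and a timedelta "duration" column; compare_runs, change_date and the sample values are illustrative only.

```python
import numpy as np
import pandas as pd


def compare_runs(table, change_date):
    """Summarise run durations (in minutes) before and after change_date."""
    dates = pd.to_datetime(table.index)
    # Label each run by which side of the change it falls on.
    period = np.where(dates < pd.Timestamp(change_date), "before", "after")
    minutes = table["duration"].dt.total_seconds() / 60
    return minutes.groupby(period).agg(["count", "mean", "min", "max"])


# Made-up durations for three runs.
runs = pd.DataFrame(
    {"duration": pd.to_timedelta(["00:45:00", "00:25:00", "00:35:00"])},
    index=["2023-01-02", "2023-01-03", "2023-01-10"],
)
print(compare_runs(runs, "2023-01-05"))
```

Runs dated on the change date itself are counted as "after" here; that cut-off is a choice, not a rule.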
I tried to stay as close as possible to PEP 8 and KISS. It is entirely possible that I committed some sins, and I am open to suggestions and fixes if you believe this tool could be useful! If it turns out to be relevant, I will create a repository on GitHub.
All the best!