I have a cron job running a Python script that pulls data from a website, parses it, and inserts it into my database. The cron job runs once a day at the same time. The data I get from the website is in a dataframe and grows every day, but the old content does not change. I would like a way to store the last dataframe index that was inserted into the database, so that I do not insert duplicate rows. Basically, if today I inserted 20 rows, the number 20 is stored; tomorrow the program reads the number 20, drops the first 20 rows from the dataframe, inserts the remaining rows, and updates the stored "last row inserted" value.
The easiest way to do this would be a JSON file, but that does not work here: on each run the JSON file is reset to the version that was deployed from GitHub, so the stored value does not survive between runs.
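For reference, this is roughly the logic I have in mind, as a minimal sketch rather than my actual code. The file name `last_row.json`, the function names, and the `insert` callback are all hypothetical, and a plain list stands in for the dataframe (with pandas the slice would be `df.iloc[last_row:]`).

```python
import json
from pathlib import Path

# Hypothetical location of the state file (an assumption for this sketch).
STATE_FILE = Path("last_row.json")


def load_last_row(path=STATE_FILE):
    """Return the number of rows already inserted, or 0 if nothing is stored."""
    if not path.exists():
        return 0
    return json.loads(path.read_text())["last_row"]


def save_last_row(count, path=STATE_FILE):
    """Persist the number of rows inserted so far."""
    path.write_text(json.dumps({"last_row": count}))


def run_once(rows, insert, path=STATE_FILE):
    """Insert only the rows not seen in earlier runs, then update the count.

    `rows` is the full data pulled today; `insert` is a placeholder for
    whatever writes rows to the database.
    """
    last_row = load_last_row(path)
    new_rows = rows[last_row:]  # with a DataFrame: df.iloc[last_row:]
    insert(new_rows)
    save_last_row(len(rows), path)
    return new_rows
```

This works as long as the state file persists between runs, which is exactly the part that fails in my setup.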
For various reasons, I would also like to avoid querying the database itself to find the last row inserted.
Any ideas on how best to achieve this, please?