
Code to select one record from a number of duplicates

I’m trying to write code that will identify site records to use in a habitat species model.

There are 4000+ site records of presence and absence of a specific species. Some of these site records are duplicates (the same location sampled multiple times over a number of years); other sites were sampled only once. At the sites that were sampled multiple times, the organism was present in some cases and absent in others. For the habitat model we want to use each site location only once, and if the organism was ever recorded at the site we want to use one of the 'present' records.

I can write code that identifies duplicate sites and code that identifies whether the site was occupied, but I somehow need to combine these based on the following:
- If there are no duplicate sites: return 1 (whether occupied or not)
- If there are duplicate sites: check all duplicate site records
  - If all duplicate site records are occurrences: assign one of the records = 1, all others 0
  - If all duplicate site records are absences: assign one of the records = 1, all others 0
  - If there is a combination of occurrences and non-occurrences: assign one of the occurrence records = 1, all others 0
My data is in the following format, where the Modelled column is the column I am trying to populate:
Attachment 29622
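
The rules listed above seem to collapse into a single one: for each site, flag one 'present' record if the site has any, otherwise flag any one record. A minimal sketch of that idea in plain Python follows, with no nested loops over pairs of records. The field names `site_id` and `present` are hypothetical stand-ins (the actual column names are only shown in the attachment), and the sample rows are made up for illustration.

```python
from collections import defaultdict

def flag_modelled(records, site_key="site_id", presence_key="present"):
    """Return one 0/1 flag per record, with exactly one 1 per site.

    site_key and presence_key are assumed field names, not taken
    from the real table.
    """
    # Group row positions by site so duplicates end up together.
    by_site = defaultdict(list)
    for idx, rec in enumerate(records):
        by_site[rec[site_key]].append(idx)

    modelled = [0] * len(records)
    for indices in by_site.values():
        # Prefer the first record where the species was present;
        # fall back to the first record if the site was never occupied.
        chosen = next(
            (i for i in indices if records[i][presence_key]),
            indices[0],
        )
        modelled[chosen] = 1
    return modelled

# Made-up sample rows covering each case in the list above.
records = [
    {"site_id": "A", "present": 0},  # mixed site, absence
    {"site_id": "A", "present": 1},  # mixed site, occurrence
    {"site_id": "B", "present": 0},  # sampled once
    {"site_id": "C", "present": 1},  # all occurrences
    {"site_id": "C", "present": 1},
    {"site_id": "D", "present": 0},  # all absences
    {"site_id": "D", "present": 0},
]
flags = flag_modelled(records)
print(flags)  # [0, 1, 1, 1, 0, 1, 0]
```

The flags come back in the same order as the input rows, so they could be written into the Modelled column row by row. Which duplicate gets the 1 here is simply the first qualifying one; if a particular year should be preferred, the rows would need sorting first.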

Does anyone have any thoughts as to how to approach this problem? I've been trying to write nested loops but I can't seem to figure out how to get the logic to work.
