Forum Archive

Kangaroo

Oct 23, 2016 - 13:52

I have a file of 100 lines. Each line contains 5 elements, such as

Aberdeen,City,189120,57.14369,-2.09814

These correspond to place, type, population, latitude and longitude respectively.

I split them into 5 arrays and want to use sorted(population) to rewrite this file ordered by increasing population size.

However, the rows no longer match up after I call sorted(population), e.g. Aberdeen,City,59122,57.14369,-2.09814
Only the population column ends up in increasing order.

Niall

Oct 23, 2016 - 16:04

Because the items in your list are dicts rather than simple values (I'm assuming you used csv.DictReader to read in your file, and I'm going to call it list_of_dicts for clarity), there's no reliable automatic sorting. What you need is a "key function" that tells sorted how to determine what the order is.

You need to either:

1) define a function (csv.DictReader returns every field as a string, so convert to int to sort numerically):
def getPopulation(record): return int(record["population"])
and then call sorted with this as the named parameter "key":
sortedValues = sorted(list_of_dicts, key=getPopulation)

2) provide an anonymous function (lambda) as the key parameter:
sortedValues = sorted(list_of_dicts, key=lambda x: int(x["population"]))
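To make the two options concrete, here is a minimal sketch that runs both on in-memory data. It assumes the file has no header row, so the field names are passed in by hand. The FIELDS list and the SAMPLE text are illustrative: only the Aberdeen row comes from the question, the other two rows are made up.

```python
import csv
import io

# Only the Aberdeen row is from the question; the other rows are made up.
SAMPLE = """Aberdeen,City,189120,57.14369,-2.09814
Exampleton,Town,59122,50.0,-1.0
Sampleville,Village,850,51.0,-2.0
"""

# Assumed column names (the file itself has no header line).
FIELDS = ["place", "type", "population", "latitude", "longitude"]

def get_population(record):
    # DictReader yields strings, so convert before comparing.
    return int(record["population"])

list_of_dicts = list(csv.DictReader(io.StringIO(SAMPLE), fieldnames=FIELDS))

# Option 1: named key function.
by_function = sorted(list_of_dicts, key=get_population)
# Option 2: the same key written as a lambda.
by_lambda = sorted(list_of_dicts, key=lambda r: int(r["population"]))

print([r["place"] for r in by_function])
# ['Sampleville', 'Exampleton', 'Aberdeen']
```

Each record stays whole, so the place, type and coordinates travel with their population.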

Niall

Oct 23, 2016 - 16:11

Ah wait, sorry... I see what's happening here...

"I splited them in 5 arrays"

You don't want to do that. Each row of your file is a single entity and should not be split.

You should read each row as a single array, tuple or dictionary. If you read them as a dictionary, use the solution I supplied above. If you're reading them as an array or tuple, modify the key in the code above to be 2 (the third column, counting from 0) rather than "population".

Check out the Python docs for the csv module to read your files in -- that does most of the string handling work for you.
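As a rough sketch of reading each row as one record with the csv module: csv.reader hands back every row as a list of strings, so the helper below (parse_row, a made-up name) turns it into a tuple with numeric fields. The second sample row is invented for illustration.

```python
import csv
import io

# First row is from the question; the second is made up.
SAMPLE = (
    "Aberdeen,City,189120,57.14369,-2.09814\n"
    "Exampleton,Town,59122,50.0,-1.0\n"
)

def parse_row(fields):
    # Unpack the five columns and convert the numeric ones.
    place, kind, population, latitude, longitude = fields
    return (place, kind, int(population), float(latitude), float(longitude))

# Each row is kept together as a single tuple.
geodata = [parse_row(fields) for fields in csv.reader(io.StringIO(SAMPLE))]

print(geodata[0])
# ('Aberdeen', 'City', 189120, 57.14369, -2.09814)
```

With a real file you would pass an open file object (opened with newline="") to csv.reader instead of the io.StringIO wrapper.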

dgelessus

Oct 23, 2016 - 17:10

Sorting a list of CSV lines is easier if you make every line a tuple (the csv module gives you each row as a list of strings, which works just as well once the population is converted to an int) and put the tuples in a list. Then you can sort the list with a key to sort by a specific tuple element:

geodata = ... # List of rows (tuples) from the CSV file

# Make a sorted copy of geodata, sorted by the third entry in each row (the population)
sorted_geodata = sorted(geodata, key=lambda row: row[2])

# Or sort geodata in place (this overwrites geodata with the sorted version)
geodata.sort(key=lambda row: row[2])

If you don't know what lambda means, it's a shorthand for defining a function. You could also write:

def get_pop_from_row(row):
    return row[2]

sorted_geodata = sorted(geodata, key=get_pop_from_row)

Here you can also see what get_pop_from_row does. For example, if you run get_pop_from_row(geodata[0]) in the console (after running your script), it returns 189120 (assuming that Aberdeen is first in the list). The sorted function does this for every row, and compares the values returned by get_pop_from_row.

But for very short functions, like sort keys, lambda is more convenient.
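Putting the pieces together, a small sketch of sorting the rows and writing them back out in CSV form. It assumes the population has already been converted to int. The rows other than Aberdeen are made up, and the output goes to an in-memory buffer rather than a real file.

```python
import csv
import io

# Rows as tuples with numeric fields already converted.
geodata = [
    ("Aberdeen", "City", 189120, 57.14369, -2.09814),
    ("Exampleton", "Town", 59122, 50.0, -1.0),    # made-up row
    ("Sampleville", "Village", 850, 51.0, -2.0),  # made-up row
]

# Numeric sort on the third column.
sorted_geodata = sorted(geodata, key=lambda row: row[2])

# Pitfall: if the population were still a string, the order would be
# alphabetical ("189120" < "59122" < "850"), not numeric.
as_strings = sorted(geodata, key=lambda row: str(row[2]))

# Write the sorted rows back out as comma-separated lines.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(sorted_geodata)
print(out.getvalue())
```

The as_strings list shows why the int conversion matters: sorted as text, Aberdeen would wrongly come first.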

Phuket2

Oct 23, 2016 - 17:27

I think a small problem is that it's being assumed that @Kangaroo is using csv to write his files. He doesn't actually say he is. OK, it's a comma-delimited file, but he may not be writing it out with csv. I also had a quick look at his problem, but soon decided that if I don't know how he is writing it out, it's difficult to help him. So what is the real problem: the sort, or loading the file into a list/variable that is sortable? Of course it's a different problem if there are 1 million lines rather than 100 lines.
Anyway, just saying.
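If the csv module is not being used at all, one possible approach is to keep every line as a plain string and only split it inside the key function. This is a sketch that assumes no field contains a comma of its own. The function name population_of and the second sample line are made up.

```python
# Lines as they might come from f.read().splitlines().
lines = [
    "Aberdeen,City,189120,57.14369,-2.09814",
    "Exampleton,Town,59122,50.0,-1.0",  # made-up line
]

def population_of(line):
    # Third comma-separated field, compared as a number.
    return int(line.split(",")[2])

# The lines themselves are untouched, only their order changes.
sorted_lines = sorted(lines, key=population_of)

print("\n".join(sorted_lines))
```

Because the original text of each line is preserved, writing sorted_lines back out reproduces the file exactly, just reordered.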

Kangaroo

Oct 23, 2016 - 19:06

Thank you guys! I solved my problem by importing numpy! Thanks again!

Function Sorted()