Efficient Method for inserting integers into a sorted list

D.V.D

Make a wish
Reaction score
73
Hey all, I'm playing around on my school computers and I was wondering, whats a efficient method for inserting new values into a sorted list? I read up that you could use a binary search every time you do insert but I went about a different way of doing it.

I'm using python because that's all our computers have, and I decided to instead of store the values in a list, I'd store them in a dictionary (unsortedlist i think in c++), then i would have the key's be the actual values stored, while the values would be the amount of instances of this value found in the list. So in code, this is what it would look like:

Code:
class sorted_list:
    buckets = {}
 
    def insert (self,value):
        # Assume value is a positive number
        if self.buckets.get(value,0) == 0:
            self.buckets[value] = 1
        else:
            self.buckets[value] += 1
 
    def remove (self,value):
        if self.buckets.get(value,0) == 0:
            #error
            return
        self.buckets[value] -= 1

This seems pretty basic, and i'm not exactly sure if its quick because you do have to use hash table which might be slower than some other alternatives. The other issue is you don't have all the properties of a list for example, its a lot harder to loop through the sorted list since you don't always know the available index's that you can choose from. So to make that work, you might have to check each time you insert or delete a value to get the range of the list, which gets a bit tricky and more expensive. Looping through would also require a get method for each index you loop through which isn't so bad if your index is 100 but if its in the thousands to millions (point clouds), this could become a issue.

Are there other quick ways of doing this or is a binary search the best option out there for large lists of integers with large ranges( were assuming the values are all positive )?
 

Accname

2D-Graphics enthusiast
Reaction score
1,462
Efficient is relative.
Your way might be very fast for very large sets of data (> 100.000), but its definitely not efficient for small sets and its probably wasteful in terms of memory usage.
If you never have more then 10.000 elements anyway then just go with a simple linear search. It doesnt even matter anymore today. (or if you feel fancy do quick sort or heap sort)

Other then that there is also AVL-trees which are very efficient in performance and memory usage but are quite a lot more work to implement.
 

D.V.D

Make a wish
Reaction score
73
Well if you want to end up working with point clouds, you'll always have upwards of hundred thousand points for practically all objects (they're super memory intensive). Hmm yeah, I was thinking that making a dictionary would use a lot more memory, integer for key and for the value.

Well I made a quick sort just now and a radix sort, but I just got a idea, what if you already have your sorted list, and then when you quick sort it, you make the pivot the value you just inserted? Damn actually that wouldn't work as well as i thought it would because the index would still be at the end so you'd loop through the whole data set anyways.

Ill check the AVL tree, I've heard of it but I've never actually looked at what it is. Id imagine that the quickest way was if you could approximately find a value in the list close to the one your trying to insert, and then move up or down the list and compare the next values to see where your new one should go. It seems hashtables are a lot quicker in this case but I wonder how the AVL tree would work out.
 

Accname

2D-Graphics enthusiast
Reaction score
1,462
Id imagine that the quickest way was if you could approximately find a value in the list close to the one your trying to insert
That is how quicksort works.

And hashtables arent all that fast either. The actual hash function needs to be calculated and then you have to deal with collisions. The worst-case time for insertion and look-up into/from hashtables is O(n) which is quite bad. Of course the average case is much better then that.

Quicksort is, already, really damn fast. But what I have recently read is that you could try to use parallel sorting if you are working with very large data sets. That is, like in quicksort, you always try to split your data up into smaller parts, but everytime you split up you use a new, separate thread for the work to be done. This way you make use of multi-core processors for sorting the data.
 

Darthfett

Aerospace/Cybersecurity Software Engineer
Reaction score
615
The problem with using a dictionary is that it is not necessarily sorted. The underlying implementation can do whatever it wants with the order, when you ask for the keys or items.

As far as your code, you're currently making that buckets value into a class attribute, not an instance attribute. Remove it as is, and add an initializer method:

Code:
def __init__(self):
    self.buckets = {}

Additionally, as with all programming optimization problems, the question is do you really *need* to optimize this? You should first profile your code, and determine that this is indeed your bottleneck. Python's sorting method, Timsort is actually quite fast, and works well with real data. It may be good enough to simply append the value, and then use sorted or list.sort, either just after each value is inserted, or when some code requests access to the list or one of its values.

What ends up being the most efficient algorithm is highly dependent upon your data set, both its size and whether there are any patterns in the values being inserted (see again Timsort, which takes advantage of the fact that there are often sorted sequences).

Finally, what data structure you need can depend on your use case. For example, a heapq might be good enough, if all you need to do is get the smallest/largest values (accessing these is O(log(n), and it is O(log(n)) to insert).

TLDR; Now that you understand why the answer is not simple, I will give you an answer: For general use, with large data sets, I would recommend using binary search insertion (see the bisect module).
 

monoVertex

I'm back!
Reaction score
460
Your example doesn't really work if you need the actual items stored in the list and you just keep tabs on the number of times an integer is found in your array, that dictionary is not sorted (and couldn't be sorted in any way). So it is not really a solution for the problem "insert integer into sorted list".

EDIT: Damn, Darth beat me.

If you absolutely have to keep your list as a list, then I think binary search is the best option. With an heuristic for the index choice, it's going to be even faster. Now, this heuristic depends a lot on the data set you're working with. The parallel search Accname suggested sounds really interesting, I'm going to test that :D.
 

D.V.D

Make a wish
Reaction score
73
Yeah I know, a dictionary isn't ever sorted because it has no order, but if you retrieve the data like you normally do by going from index 0 to whatever number, then your process of retrieving the info will make your result be a sorted list(if you wanted to print or return a list type from the dictionary). So in the end the result is the same.

Yeah I need to read up more on how a hash function actually looks like, I know the concept of it but I want to implement my own version so that I know I get it enough to make a working version of it.

The issue with QuickSort as I see it at least is predicting where the pivot should be. The way I made it, I chose the pivot as a random index in the list, which isn't efficient at all, but if you have a large list and your inserting a arbitrary value into it, making your pivot half doesn't seem as that good of a solution anyways. But I think I know how to quickly check if a large chunk was already pre-sorted so I'll definitely play around with this. So far, it seems like the best option.

Okay so I got AVL Tree, Timer Sort, Binary Search, and parallel sorts. Im probably going to leave parallel sorts for last since I have never done multi threaded programming and this whole thing is really for fun, but Ill do that one last. Guess I got a lot to do in my computer science lab tomorrow :D.

The point is that I wanted to avoid resorting a huge sorted list if only one element isn't sorted inside of it, where as the rest is. It looks like a divide and conquer algorithm seems the best fit and checking if the range of each divided part of the list can let you check if your value would fit inside that section. If not, that part is sorted so you can skip it. Reminds me a lot of quad tree's and octree's.

BTW, im still first year and we haven't gotten into any cool new concepts yet except dictionary's but could someone link a paper or webpage that explains what all the O(n) and log n stuff means and how you come up with it? I heard a bit is the negative log or ln of something but I barely know anything on this topic, even though its in every paper I read.
 

monoVertex

I'm back!
Reaction score
460
The big O notation and its friends are one of the first things you should have studied, I have no idea why wasn't it taught at your school.

It is basically a method to approximate the performance of an algorithm. Wikipedia has a pretty documented page about this topic. I don't know any other documents, as I was taught this concept, but I'm sure you can find a lot of them on the Internet. As you'll soon see, it's not just a cool concept to learn, it's something indispensable from a programmer's knowledge.

PS: it has nothing to do with bits, I have no idea what that "1 bit = -log" is and it doesn't sound correct either :D.
 

D.V.D

Make a wish
Reaction score
73
Nah, we do a course about proofs and proving statments which have a lot in common with programs. I might learn those stuff next semester in statistics but Im not sure :p

Thanks for the link, yeah it seems super important if its everywhere XD

Yeah probably, just something I heard or maybe it is, doesn't really matter though :p and it was the -log of something, not sure what it was :p .
 

seph ir oth

Mod'n Dat News Jon
Reaction score
262
BOGO SORT BEST SORT


Edit: if memory isn't an issue, try Radix sorting, especially with lots of data.
 

monoVertex

I'm back!
Reaction score
460
Random sorting is a more general class of algorithms based on random numbers, I think. For example, Bozo Sort, which I found here is another random sort algorithm.
 

D.V.D

Make a wish
Reaction score
73
Thanks a lot for the video, now I want to implement all of them just for the hell of it XDD. Really helpful with the visuals too :)
 
General chit-chat
Help Users
  • No one is chatting at the moment.

      The Helper Discord

      Members online

      No members online now.

      Affiliates

      Hive Workshop NUON Dome World Editor Tutorials

      Network Sponsors

      Apex Steel Pipe - Buys and sells Steel Pipe.
      Top