Efficient Method for inserting integers into a sorted list

D.V.D

Make a wish
Reaction score
73
Hey all, I'm playing around on my school computers and I was wondering, whats a efficient method for inserting new values into a sorted list? I read up that you could use a binary search every time you do insert but I went about a different way of doing it.

I'm using python because that's all our computers have, and I decided to instead of store the values in a list, I'd store them in a dictionary (unsortedlist i think in c++), then i would have the key's be the actual values stored, while the values would be the amount of instances of this value found in the list. So in code, this is what it would look like:

Code:
class sorted_list:
    buckets = {}
 
    def insert (self,value):
        # Assume value is a positive number
        if self.buckets.get(value,0) == 0:
            self.buckets[value] = 1
        else:
            self.buckets[value] += 1
 
    def remove (self,value):
        if self.buckets.get(value,0) == 0:
            #error
            return
        self.buckets[value] -= 1

This seems pretty basic, and i'm not exactly sure if its quick because you do have to use hash table which might be slower than some other alternatives. The other issue is you don't have all the properties of a list for example, its a lot harder to loop through the sorted list since you don't always know the available index's that you can choose from. So to make that work, you might have to check each time you insert or delete a value to get the range of the list, which gets a bit tricky and more expensive. Looping through would also require a get method for each index you loop through which isn't so bad if your index is 100 but if its in the thousands to millions (point clouds), this could become a issue.

Are there other quick ways of doing this or is a binary search the best option out there for large lists of integers with large ranges( were assuming the values are all positive )?
 

Accname

2D-Graphics enthusiast
Reaction score
1,462
Efficient is relative.
Your way might be very fast for very large sets of data (> 100.000), but its definitely not efficient for small sets and its probably wasteful in terms of memory usage.
If you never have more then 10.000 elements anyway then just go with a simple linear search. It doesnt even matter anymore today. (or if you feel fancy do quick sort or heap sort)

Other then that there is also AVL-trees which are very efficient in performance and memory usage but are quite a lot more work to implement.
 

D.V.D

Make a wish
Reaction score
73
Well if you want to end up working with point clouds, you'll always have upwards of hundred thousand points for practically all objects (they're super memory intensive). Hmm yeah, I was thinking that making a dictionary would use a lot more memory, integer for key and for the value.

Well I made a quick sort just now and a radix sort, but I just got a idea, what if you already have your sorted list, and then when you quick sort it, you make the pivot the value you just inserted? Damn actually that wouldn't work as well as i thought it would because the index would still be at the end so you'd loop through the whole data set anyways.

Ill check the AVL tree, I've heard of it but I've never actually looked at what it is. Id imagine that the quickest way was if you could approximately find a value in the list close to the one your trying to insert, and then move up or down the list and compare the next values to see where your new one should go. It seems hashtables are a lot quicker in this case but I wonder how the AVL tree would work out.
 

Accname

2D-Graphics enthusiast
Reaction score
1,462
Id imagine that the quickest way was if you could approximately find a value in the list close to the one your trying to insert
That is how quicksort works.

And hashtables arent all that fast either. The actual hash function needs to be calculated and then you have to deal with collisions. The worst-case time for insertion and look-up into/from hashtables is O(n) which is quite bad. Of course the average case is much better then that.

Quicksort is, already, really damn fast. But what I have recently read is that you could try to use parallel sorting if you are working with very large data sets. That is, like in quicksort, you always try to split your data up into smaller parts, but everytime you split up you use a new, separate thread for the work to be done. This way you make use of multi-core processors for sorting the data.
 

Darthfett

Aerospace/Cybersecurity Software Engineer
Reaction score
615
The problem with using a dictionary is that it is not necessarily sorted. The underlying implementation can do whatever it wants with the order, when you ask for the keys or items.

As far as your code, you're currently making that buckets value into a class attribute, not an instance attribute. Remove it as is, and add an initializer method:

Code:
def __init__(self):
    self.buckets = {}

Additionally, as with all programming optimization problems, the question is do you really *need* to optimize this? You should first profile your code, and determine that this is indeed your bottleneck. Python's sorting method, Timsort is actually quite fast, and works well with real data. It may be good enough to simply append the value, and then use sorted or list.sort, either just after each value is inserted, or when some code requests access to the list or one of its values.

What ends up being the most efficient algorithm is highly dependent upon your data set, both its size and whether there are any patterns in the values being inserted (see again Timsort, which takes advantage of the fact that there are often sorted sequences).

Finally, what data structure you need can depend on your use case. For example, a heapq might be good enough, if all you need to do is get the smallest/largest values (accessing these is O(log(n), and it is O(log(n)) to insert).

TLDR; Now that you understand why the answer is not simple, I will give you an answer: For general use, with large data sets, I would recommend using binary search insertion (see the bisect module).
 

monoVertex

I'm back!
Reaction score
460
Your example doesn't really work if you need the actual items stored in the list and you just keep tabs on the number of times an integer is found in your array, that dictionary is not sorted (and couldn't be sorted in any way). So it is not really a solution for the problem "insert integer into sorted list".

EDIT: Damn, Darth beat me.

If you absolutely have to keep your list as a list, then I think binary search is the best option. With an heuristic for the index choice, it's going to be even faster. Now, this heuristic depends a lot on the data set you're working with. The parallel search Accname suggested sounds really interesting, I'm going to test that :D.
 

D.V.D

Make a wish
Reaction score
73
Yeah I know, a dictionary isn't ever sorted because it has no order, but if you retrieve the data like you normally do by going from index 0 to whatever number, then your process of retrieving the info will make your result be a sorted list(if you wanted to print or return a list type from the dictionary). So in the end the result is the same.

Yeah I need to read up more on how a hash function actually looks like, I know the concept of it but I want to implement my own version so that I know I get it enough to make a working version of it.

The issue with QuickSort as I see it at least is predicting where the pivot should be. The way I made it, I chose the pivot as a random index in the list, which isn't efficient at all, but if you have a large list and your inserting a arbitrary value into it, making your pivot half doesn't seem as that good of a solution anyways. But I think I know how to quickly check if a large chunk was already pre-sorted so I'll definitely play around with this. So far, it seems like the best option.

Okay so I got AVL Tree, Timer Sort, Binary Search, and parallel sorts. Im probably going to leave parallel sorts for last since I have never done multi threaded programming and this whole thing is really for fun, but Ill do that one last. Guess I got a lot to do in my computer science lab tomorrow :D.

The point is that I wanted to avoid resorting a huge sorted list if only one element isn't sorted inside of it, where as the rest is. It looks like a divide and conquer algorithm seems the best fit and checking if the range of each divided part of the list can let you check if your value would fit inside that section. If not, that part is sorted so you can skip it. Reminds me a lot of quad tree's and octree's.

BTW, im still first year and we haven't gotten into any cool new concepts yet except dictionary's but could someone link a paper or webpage that explains what all the O(n) and log n stuff means and how you come up with it? I heard a bit is the negative log or ln of something but I barely know anything on this topic, even though its in every paper I read.
 

monoVertex

I'm back!
Reaction score
460
The big O notation and its friends are one of the first things you should have studied, I have no idea why wasn't it taught at your school.

It is basically a method to approximate the performance of an algorithm. Wikipedia has a pretty documented page about this topic. I don't know any other documents, as I was taught this concept, but I'm sure you can find a lot of them on the Internet. As you'll soon see, it's not just a cool concept to learn, it's something indispensable from a programmer's knowledge.

PS: it has nothing to do with bits, I have no idea what that "1 bit = -log" is and it doesn't sound correct either :D.
 

D.V.D

Make a wish
Reaction score
73
Nah, we do a course about proofs and proving statments which have a lot in common with programs. I might learn those stuff next semester in statistics but Im not sure :p

Thanks for the link, yeah it seems super important if its everywhere XD

Yeah probably, just something I heard or maybe it is, doesn't really matter though :p and it was the -log of something, not sure what it was :p .
 

seph ir oth

Mod'n Dat News Jon
Reaction score
262
BOGO SORT BEST SORT


Edit: if memory isn't an issue, try Radix sorting, especially with lots of data.
 

monoVertex

I'm back!
Reaction score
460
Random sorting is a more general class of algorithms based on random numbers, I think. For example, Bozo Sort, which I found here is another random sort algorithm.
 

D.V.D

Make a wish
Reaction score
73
Thanks a lot for the video, now I want to implement all of them just for the hell of it XDD. Really helpful with the visuals too :)
 
General chit-chat
Help Users
  • No one is chatting at the moment.
  • Varine Varine:
    How can you tell the difference between real traffic and indexing or AI generation bots?
  • The Helper The Helper:
    The bots will show up as users online in the forum software but they do not show up in my stats tracking. I am sure there are bots in the stats but the way alot of the bots treat the site do not show up on the stats
  • Varine Varine:
    I want to build a filtration system for my 3d printer, and that shit is so much more complicated than I thought it would be
  • Varine Varine:
    Apparently ABS emits styrene particulates which can be like .2 micrometers, which idk if the VOC detectors I have can even catch that
  • Varine Varine:
    Anyway I need to get some of those sensors and two air pressure sensors installed before an after the filters, which I need to figure out how to calculate the necessary pressure for and I have yet to find anything that tells me how to actually do that, just the cfm ratings
  • Varine Varine:
    And then I have to set up an arduino board to read those sensors, which I also don't know very much about but I have a whole bunch of crash course things for that
  • Varine Varine:
    These sensors are also a lot more than I thought they would be. Like 5 to 10 each, idk why but I assumed they would be like 2 dollars
  • Varine Varine:
    Another issue I'm learning is that a lot of the air quality sensors don't work at very high ambient temperatures. I'm planning on heating this enclosure to like 60C or so, and that's the upper limit of their functionality
  • Varine Varine:
    Although I don't know if I need to actually actively heat it or just let the plate and hotend bring the ambient temp to whatever it will, but even then I need to figure out an exfiltration for hot air. I think I kind of know what to do but it's still fucking confusing
  • The Helper The Helper:
    Maybe you could find some of that information from AC tech - like how they detect freon and such
  • Varine Varine:
    That's mostly what I've been looking at
  • Varine Varine:
    I don't think I'm dealing with quite the same pressures though, at the very least its a significantly smaller system. For the time being I'm just going to put together a quick scrubby box though and hope it works good enough to not make my house toxic
  • Varine Varine:
    I mean I don't use this enough to pose any significant danger I don't think, but I would still rather not be throwing styrene all over the air
  • The Helper The Helper:
    New dessert added to recipes Southern Pecan Praline Cake https://www.thehelper.net/threads/recipe-southern-pecan-praline-cake.193555/
  • The Helper The Helper:
    Another bot invasion 493 members online most of them bots that do not show up on stats
  • Varine Varine:
    I'm looking at a solid 378 guests, but 3 members. Of which two are me and VSNES. The third is unlisted, which makes me think its a ghost.
    +1
  • The Helper The Helper:
    Some members choose invisibility mode
    +1
  • The Helper The Helper:
    I bitch about Xenforo sometimes but it really is full featured you just have to really know what you are doing to get the most out of it.
  • The Helper The Helper:
    It is just not easy to fix styles and customize but it definitely can be done
  • The Helper The Helper:
    I do know this - xenforo dropped the ball by not keeping the vbulletin reputation comments as a feature. The loss of the Reputation comments data when we switched to Xenforo really was the death knell for the site when it came to all the users that left. I know I missed it so much and I got way less interested in the site when that feature was gone and I run the site.
  • Blackveiled Blackveiled:
    People love rep, lol
    +1
  • The Helper The Helper:
    The recipe today is Sloppy Joe Casserole - one of my faves LOL https://www.thehelper.net/threads/sloppy-joe-casserole-with-manwich.193585/
  • The Helper The Helper:
    Decided to put up a healthier type recipe to mix it up - Honey Garlic Shrimp Stir-Fry https://www.thehelper.net/threads/recipe-honey-garlic-shrimp-stir-fry.193595/

      The Helper Discord

      Members online

      No members online now.

      Affiliates

      Hive Workshop NUON Dome World Editor Tutorials

      Network Sponsors

      Apex Steel Pipe - Buys and sells Steel Pipe.
      Top