Efficient Method for inserting integers into a sorted list

D.V.D

Make a wish
Reaction score
73
Hey all, I'm playing around on my school computers and I was wondering, whats a efficient method for inserting new values into a sorted list? I read up that you could use a binary search every time you do insert but I went about a different way of doing it.

I'm using python because that's all our computers have, and I decided to instead of store the values in a list, I'd store them in a dictionary (unsortedlist i think in c++), then i would have the key's be the actual values stored, while the values would be the amount of instances of this value found in the list. So in code, this is what it would look like:

Code:
class sorted_list:
    buckets = {}
 
    def insert (self,value):
        # Assume value is a positive number
        if self.buckets.get(value,0) == 0:
            self.buckets[value] = 1
        else:
            self.buckets[value] += 1
 
    def remove (self,value):
        if self.buckets.get(value,0) == 0:
            #error
            return
        self.buckets[value] -= 1

This seems pretty basic, and i'm not exactly sure if its quick because you do have to use hash table which might be slower than some other alternatives. The other issue is you don't have all the properties of a list for example, its a lot harder to loop through the sorted list since you don't always know the available index's that you can choose from. So to make that work, you might have to check each time you insert or delete a value to get the range of the list, which gets a bit tricky and more expensive. Looping through would also require a get method for each index you loop through which isn't so bad if your index is 100 but if its in the thousands to millions (point clouds), this could become a issue.

Are there other quick ways of doing this or is a binary search the best option out there for large lists of integers with large ranges( were assuming the values are all positive )?
 
Efficient is relative.
Your way might be very fast for very large sets of data (> 100.000), but its definitely not efficient for small sets and its probably wasteful in terms of memory usage.
If you never have more then 10.000 elements anyway then just go with a simple linear search. It doesnt even matter anymore today. (or if you feel fancy do quick sort or heap sort)

Other then that there is also AVL-trees which are very efficient in performance and memory usage but are quite a lot more work to implement.
 
Well if you want to end up working with point clouds, you'll always have upwards of hundred thousand points for practically all objects (they're super memory intensive). Hmm yeah, I was thinking that making a dictionary would use a lot more memory, integer for key and for the value.

Well I made a quick sort just now and a radix sort, but I just got a idea, what if you already have your sorted list, and then when you quick sort it, you make the pivot the value you just inserted? Damn actually that wouldn't work as well as i thought it would because the index would still be at the end so you'd loop through the whole data set anyways.

Ill check the AVL tree, I've heard of it but I've never actually looked at what it is. Id imagine that the quickest way was if you could approximately find a value in the list close to the one your trying to insert, and then move up or down the list and compare the next values to see where your new one should go. It seems hashtables are a lot quicker in this case but I wonder how the AVL tree would work out.
 
Id imagine that the quickest way was if you could approximately find a value in the list close to the one your trying to insert
That is how quicksort works.

And hashtables arent all that fast either. The actual hash function needs to be calculated and then you have to deal with collisions. The worst-case time for insertion and look-up into/from hashtables is O(n) which is quite bad. Of course the average case is much better then that.

Quicksort is, already, really damn fast. But what I have recently read is that you could try to use parallel sorting if you are working with very large data sets. That is, like in quicksort, you always try to split your data up into smaller parts, but everytime you split up you use a new, separate thread for the work to be done. This way you make use of multi-core processors for sorting the data.
 
The problem with using a dictionary is that it is not necessarily sorted. The underlying implementation can do whatever it wants with the order, when you ask for the keys or items.

As far as your code, you're currently making that buckets value into a class attribute, not an instance attribute. Remove it as is, and add an initializer method:

Code:
def __init__(self):
    self.buckets = {}

Additionally, as with all programming optimization problems, the question is do you really *need* to optimize this? You should first profile your code, and determine that this is indeed your bottleneck. Python's sorting method, Timsort is actually quite fast, and works well with real data. It may be good enough to simply append the value, and then use sorted or list.sort, either just after each value is inserted, or when some code requests access to the list or one of its values.

What ends up being the most efficient algorithm is highly dependent upon your data set, both its size and whether there are any patterns in the values being inserted (see again Timsort, which takes advantage of the fact that there are often sorted sequences).

Finally, what data structure you need can depend on your use case. For example, a heapq might be good enough, if all you need to do is get the smallest/largest values (accessing these is O(log(n), and it is O(log(n)) to insert).

TLDR; Now that you understand why the answer is not simple, I will give you an answer: For general use, with large data sets, I would recommend using binary search insertion (see the bisect module).
 
Your example doesn't really work if you need the actual items stored in the list and you just keep tabs on the number of times an integer is found in your array, that dictionary is not sorted (and couldn't be sorted in any way). So it is not really a solution for the problem "insert integer into sorted list".

EDIT: Damn, Darth beat me.

If you absolutely have to keep your list as a list, then I think binary search is the best option. With an heuristic for the index choice, it's going to be even faster. Now, this heuristic depends a lot on the data set you're working with. The parallel search Accname suggested sounds really interesting, I'm going to test that :D.
 
Yeah I know, a dictionary isn't ever sorted because it has no order, but if you retrieve the data like you normally do by going from index 0 to whatever number, then your process of retrieving the info will make your result be a sorted list(if you wanted to print or return a list type from the dictionary). So in the end the result is the same.

Yeah I need to read up more on how a hash function actually looks like, I know the concept of it but I want to implement my own version so that I know I get it enough to make a working version of it.

The issue with QuickSort as I see it at least is predicting where the pivot should be. The way I made it, I chose the pivot as a random index in the list, which isn't efficient at all, but if you have a large list and your inserting a arbitrary value into it, making your pivot half doesn't seem as that good of a solution anyways. But I think I know how to quickly check if a large chunk was already pre-sorted so I'll definitely play around with this. So far, it seems like the best option.

Okay so I got AVL Tree, Timer Sort, Binary Search, and parallel sorts. Im probably going to leave parallel sorts for last since I have never done multi threaded programming and this whole thing is really for fun, but Ill do that one last. Guess I got a lot to do in my computer science lab tomorrow :D.

The point is that I wanted to avoid resorting a huge sorted list if only one element isn't sorted inside of it, where as the rest is. It looks like a divide and conquer algorithm seems the best fit and checking if the range of each divided part of the list can let you check if your value would fit inside that section. If not, that part is sorted so you can skip it. Reminds me a lot of quad tree's and octree's.

BTW, im still first year and we haven't gotten into any cool new concepts yet except dictionary's but could someone link a paper or webpage that explains what all the O(n) and log n stuff means and how you come up with it? I heard a bit is the negative log or ln of something but I barely know anything on this topic, even though its in every paper I read.
 
The big O notation and its friends are one of the first things you should have studied, I have no idea why wasn't it taught at your school.

It is basically a method to approximate the performance of an algorithm. Wikipedia has a pretty documented page about this topic. I don't know any other documents, as I was taught this concept, but I'm sure you can find a lot of them on the Internet. As you'll soon see, it's not just a cool concept to learn, it's something indispensable from a programmer's knowledge.

PS: it has nothing to do with bits, I have no idea what that "1 bit = -log" is and it doesn't sound correct either :D.
 
Nah, we do a course about proofs and proving statments which have a lot in common with programs. I might learn those stuff next semester in statistics but Im not sure :p

Thanks for the link, yeah it seems super important if its everywhere XD

Yeah probably, just something I heard or maybe it is, doesn't really matter though :p and it was the -log of something, not sure what it was :p .
 
BOGO SORT BEST SORT


Edit: if memory isn't an issue, try Radix sorting, especially with lots of data.
 
Thanks a lot for the video, now I want to implement all of them just for the hell of it XDD. Really helpful with the visuals too :)
 
General chit-chat
Help Users
  • No one is chatting at the moment.
  • Varine Varine:
    I would be there for days, even with my camera set up slides can take a long time, and if they want perfect captures I really need to use my scanners that are professionally made for that. My camera rig works well for what it is, but for enlargements and things it's not as good.
  • Varine Varine:
    I've only had a couple clients with that so far, though. I don't have a website or anything yet though.
  • Varine Varine:
    Console repair can be worthwhile, but it's also not a thing I can do at scale in my house. I just don't have room for the equipment. I need an office that I can segregate out for archival and then electronic restoration.
  • Varine Varine:
    But in order for that to be real, I need more time, and for more time I need to work less, and to work less I need a different job, and for a different job I need more money to fall back on so that I can make enough to just pay like, rent and utilities and use my savings to find these projects
    +1
  • Varine Varine:
    Another couple years. I just need to take it slow and it'll get there.
  • jonas jonas:
    any chance to get that stolen money back?
  • jonas jonas:
    Maybe you can do console repair just as a side thing, especially if there's so much competition business might be slow. Or do you need a lot of special equipment for that?
  • jonas jonas:
    I recently bought a used sauna and the preowner told me some component is broken, I took a look and it was just a burnt fuse, really cheap to fix. I was real proud of my self since I usually have two left hands for this kinda stuff :p
  • tom_mai78101 tom_mai78101:
    I am still playing Shapez 2. What an awful thing to happen, Varine, and hopefully everything has been sorted out soon. Always use multi-factor authentication whenever you have the opportunity to do so.
    +1
  • Varine Varine:
    I think all of the money is accounted for now, and all the cards have been changed out, so I think for the most part it's taken care of now. Just need to go through and make sure all of my accounts are secured again, it's just time consuming.
  • Varine Varine:
    And yeah everything has 2 factor turned on now, or at least everything I can think of at the moment.
  • Varine Varine:
    The consoles don't need too much equipment that I don't already have. I would like to get a reflow oven, but I don't really want to buy one so I'm thinking about modifying a toaster oven I have to make something that will work for what I'm doing.
  • Varine Varine:
    I have the soldering irons and reflow and all that, but without an oven it's kind of hard to build mod chips and things like that. I made a handful of them with a hot air station, but it's a pain.
  • Varine Varine:
    The only thing I'm not really set up for is BGA rework. I've done it before a little bit, but not reliably, and that equipment is wildly expensive. You need X-rays and shit.
  • Varine Varine:
    I also have a couple 3D printers. I'm not super good with those and need to get an enclosure built, but they'll be useful for some aesthetic mods I've been thinking about. At least I can use them to do designs and then just have someone else print out the parts for me once I know they work.
  • Varine Varine:
    I also use them to make adapters for all my camera shit, but that's also an on the side thing for now. Lens adapters get really expensive.
  • Varine Varine:
    I've been trying to do some little art pieces as well, but I'm not much an engineer so they haven't gone great. I got some new things showing up to try and play with
  • Varine Varine:
    I want to build this tesserect kind of thing with mirrors, and I've been trying to make this like black hole diorama. In my head it looks really cool, but I kind of thought I could form polarizing lenses into a sphere but I tend to just destroy them every time I try.
  • Varine Varine:
    So I got a new idea, but I'm not sure how to make it work like I want without being able to get a polarizer curved. I think they are made out of PVA typically, and I thought I could just heat it up a little bit to soften the film, but that clearly isn't working. So I'm going to try a few other things, I'm thinking if I put a mirror film over the polarizing film I might get something cool. I have some polarized LED's as well, and I think if I make a central light source I can use the mirrors combined with the polarizers to make that central light APPEAR black. I have next week off so I'm going to spend some time trying to figure it out
  • Varine Varine:
    The tesserect works, at least. I just need to figure out how to be able to assemble it, but I think I have a pretty good idea of how to go about it. Or at least a prototype of it. I'll post some pictures next week
  • jonas jonas:
    That last bit sounds like the last entry in a scientist's journal in a destroyed research facility in a post-apocalyptic video game
  • Varine Varine:
    lol it's not that exciting
  • Varine Varine:
    Shiny tho
  • Varine Varine:
    Basically it's a cube with a two way mirror on the inside, and then a smaller cube suspended in that with a mirrors on the outside of it. Kind of like those infinity pictures where they use two mirrors to go forever. Only it's twelve mirrors
  • Varine Varine:
    And the tiniest LED strip I could find

      The Helper Discord

      Members online

      No members online now.

      Affiliates

      Hive Workshop NUON Dome World Editor Tutorials

      Network Sponsors

      Apex Steel Pipe - Buys and sells Steel Pipe.
      Top