# Efficient Method for inserting integers into a sorted list

#### D.V.D

##### Make a wish
Hey all, I'm playing around on my school computers and I was wondering, whats a efficient method for inserting new values into a sorted list? I read up that you could use a binary search every time you do insert but I went about a different way of doing it.

I'm using python because that's all our computers have, and I decided to instead of store the values in a list, I'd store them in a dictionary (unsortedlist i think in c++), then i would have the key's be the actual values stored, while the values would be the amount of instances of this value found in the list. So in code, this is what it would look like:

Code:
``````class sorted_list:
buckets = {}

def insert (self,value):
# Assume value is a positive number
if self.buckets.get(value,0) == 0:
self.buckets[value] = 1
else:
self.buckets[value] += 1

def remove (self,value):
if self.buckets.get(value,0) == 0:
#error
return
self.buckets[value] -= 1``````
This seems pretty basic, and i'm not exactly sure if its quick because you do have to use hash table which might be slower than some other alternatives. The other issue is you don't have all the properties of a list for example, its a lot harder to loop through the sorted list since you don't always know the available index's that you can choose from. So to make that work, you might have to check each time you insert or delete a value to get the range of the list, which gets a bit tricky and more expensive. Looping through would also require a get method for each index you loop through which isn't so bad if your index is 100 but if its in the thousands to millions (point clouds), this could become a issue.

Are there other quick ways of doing this or is a binary search the best option out there for large lists of integers with large ranges( were assuming the values are all positive )?

#### Accname

##### 2D-Graphics enthusiast
Efficient is relative.
Your way might be very fast for very large sets of data (> 100.000), but its definitely not efficient for small sets and its probably wasteful in terms of memory usage.
If you never have more then 10.000 elements anyway then just go with a simple linear search. It doesnt even matter anymore today. (or if you feel fancy do quick sort or heap sort)

Other then that there is also AVL-trees which are very efficient in performance and memory usage but are quite a lot more work to implement.

#### D.V.D

##### Make a wish
Well if you want to end up working with point clouds, you'll always have upwards of hundred thousand points for practically all objects (they're super memory intensive). Hmm yeah, I was thinking that making a dictionary would use a lot more memory, integer for key and for the value.

Well I made a quick sort just now and a radix sort, but I just got a idea, what if you already have your sorted list, and then when you quick sort it, you make the pivot the value you just inserted? Damn actually that wouldn't work as well as i thought it would because the index would still be at the end so you'd loop through the whole data set anyways.

Ill check the AVL tree, I've heard of it but I've never actually looked at what it is. Id imagine that the quickest way was if you could approximately find a value in the list close to the one your trying to insert, and then move up or down the list and compare the next values to see where your new one should go. It seems hashtables are a lot quicker in this case but I wonder how the AVL tree would work out.

#### Accname

##### 2D-Graphics enthusiast
Id imagine that the quickest way was if you could approximately find a value in the list close to the one your trying to insert
That is how quicksort works.

And hashtables arent all that fast either. The actual hash function needs to be calculated and then you have to deal with collisions. The worst-case time for insertion and look-up into/from hashtables is O(n) which is quite bad. Of course the average case is much better then that.

Quicksort is, already, really damn fast. But what I have recently read is that you could try to use parallel sorting if you are working with very large data sets. That is, like in quicksort, you always try to split your data up into smaller parts, but everytime you split up you use a new, separate thread for the work to be done. This way you make use of multi-core processors for sorting the data.

#### Darthfett

##### Aerospace/Cybersecurity Software Engineer
The problem with using a dictionary is that it is not necessarily sorted. The underlying implementation can do whatever it wants with the order, when you ask for the keys or items.

As far as your code, you're currently making that buckets value into a class attribute, not an instance attribute. Remove it as is, and add an initializer method:

Code:
``````def __init__(self):
self.buckets = {}``````
Additionally, as with all programming optimization problems, the question is do you really *need* to optimize this? You should first profile your code, and determine that this is indeed your bottleneck. Python's sorting method, Timsort is actually quite fast, and works well with real data. It may be good enough to simply append the value, and then use sorted or list.sort, either just after each value is inserted, or when some code requests access to the list or one of its values.

What ends up being the most efficient algorithm is highly dependent upon your data set, both its size and whether there are any patterns in the values being inserted (see again Timsort, which takes advantage of the fact that there are often sorted sequences).

Finally, what data structure you need can depend on your use case. For example, a heapq might be good enough, if all you need to do is get the smallest/largest values (accessing these is O(log(n), and it is O(log(n)) to insert).

TLDR; Now that you understand why the answer is not simple, I will give you an answer: For general use, with large data sets, I would recommend using binary search insertion (see the bisect module).

#### monoVertex

##### I'm back!
Your example doesn't really work if you need the actual items stored in the list and you just keep tabs on the number of times an integer is found in your array, that dictionary is not sorted (and couldn't be sorted in any way). So it is not really a solution for the problem "insert integer into sorted list".

EDIT: Damn, Darth beat me.

If you absolutely have to keep your list as a list, then I think binary search is the best option. With an heuristic for the index choice, it's going to be even faster. Now, this heuristic depends a lot on the data set you're working with. The parallel search Accname suggested sounds really interesting, I'm going to test that .

#### D.V.D

##### Make a wish
Yeah I know, a dictionary isn't ever sorted because it has no order, but if you retrieve the data like you normally do by going from index 0 to whatever number, then your process of retrieving the info will make your result be a sorted list(if you wanted to print or return a list type from the dictionary). So in the end the result is the same.

Yeah I need to read up more on how a hash function actually looks like, I know the concept of it but I want to implement my own version so that I know I get it enough to make a working version of it.

The issue with QuickSort as I see it at least is predicting where the pivot should be. The way I made it, I chose the pivot as a random index in the list, which isn't efficient at all, but if you have a large list and your inserting a arbitrary value into it, making your pivot half doesn't seem as that good of a solution anyways. But I think I know how to quickly check if a large chunk was already pre-sorted so I'll definitely play around with this. So far, it seems like the best option.

Okay so I got AVL Tree, Timer Sort, Binary Search, and parallel sorts. Im probably going to leave parallel sorts for last since I have never done multi threaded programming and this whole thing is really for fun, but Ill do that one last. Guess I got a lot to do in my computer science lab tomorrow .

The point is that I wanted to avoid resorting a huge sorted list if only one element isn't sorted inside of it, where as the rest is. It looks like a divide and conquer algorithm seems the best fit and checking if the range of each divided part of the list can let you check if your value would fit inside that section. If not, that part is sorted so you can skip it. Reminds me a lot of quad tree's and octree's.

BTW, im still first year and we haven't gotten into any cool new concepts yet except dictionary's but could someone link a paper or webpage that explains what all the O(n) and log n stuff means and how you come up with it? I heard a bit is the negative log or ln of something but I barely know anything on this topic, even though its in every paper I read.

#### monoVertex

##### I'm back!
The big O notation and its friends are one of the first things you should have studied, I have no idea why wasn't it taught at your school.

It is basically a method to approximate the performance of an algorithm. Wikipedia has a pretty documented page about this topic. I don't know any other documents, as I was taught this concept, but I'm sure you can find a lot of them on the Internet. As you'll soon see, it's not just a cool concept to learn, it's something indispensable from a programmer's knowledge.

PS: it has nothing to do with bits, I have no idea what that "1 bit = -log" is and it doesn't sound correct either .

#### D.V.D

##### Make a wish
Nah, we do a course about proofs and proving statments which have a lot in common with programs. I might learn those stuff next semester in statistics but Im not sure

Thanks for the link, yeah it seems super important if its everywhere XD

Yeah probably, just something I heard or maybe it is, doesn't really matter though and it was the -log of something, not sure what it was .

#### seph ir oth

##### Mod'n Dat News Jon
Staff member
BOGO SORT BEST SORT

Edit: if memory isn't an issue, try Radix sorting, especially with lots of data.

#### monoVertex

##### I'm back!
Lol, I never heard of bogo sort before, I though it was some uber cool algorithm.

#### Accname

##### 2D-Graphics enthusiast
Isnt that the same as random sort?

#### monoVertex

##### I'm back!
Random sorting is a more general class of algorithms based on random numbers, I think. For example, Bozo Sort, which I found here is another random sort algorithm.

#### D.V.D

##### Make a wish
Thanks a lot for the video, now I want to implement all of them just for the hell of it XDD. Really helpful with the visuals too

General chit-chat
Help Users
• No one is chatting at the moment.
• The Helper:
Happy Friday!
+1
• tom_mai78101:
Starting this upcoming Thursday, I will be in Japan for 10 days.
• tom_mai78101:
Thursday - Friday will be my Japan arrival flight. 9 days later, on a Sunday, will be my return departure flight.
+2
• The Helper:
Hope you have safe travels my friend!
+1
• vypur85:
Wow spring time in Japan is awesome. Enjoy!
• The Helper:
Hopefully it will be more pleasure than work
• vypur85:
Recently tried out ChatGPT about WE triggering. Wow it's capable of giving a somewhat legitimate response.
• The Helper:
I am sure it has read all the info on the forums here
• The Helper:
i think triggering is just scripting and chatgpt is real good at code
• vypur85:
Yeah I suppose so. It's interesting how it can explain in so much detail.
• vypur85:
But yet it won't work.
• The Helper:
it does a bad ass job doing excel vba code it has leveled me up at my job when I deal with excel that is for sure
• vypur85:
Nice! I love Excel coding as well. Has always been using Google to help me. Maybe I'll use ChatGPT next time when I need it.
• The Helper:
yeah whatever it puts out even if it is not perfect I can fix it and the latest version of chatgpt can create websites from pictures it will not be long until it can do that with almost all the tools
+1
• The Helper:
These new Chat AI programs are going to change everything everyone better Buckle the Fuck Up!
• The Helper:
oh and Happy Tuesday Evening!
+1
• jonas:
Im worried they'll change things for worse
• jonas:
A lot more low quality content, a lot more half-baked stuff.
• jonas:
If you're good enough to spot the mistakes of the answers you don't need it in the first place. If you aren't good enough, you're gonna rely on some half-correct stuff
• The Helper:
the earlier AI is and has been used extensively for publishing news and other content for a while now
• jonas:
I used to be active on quora, it's now flooded with extremely similar, superficial answers that often miss the point of the question
• N NJJ:
hi
• N NJJ:
Hello, gathering all my old accounts…
+1

### Members online

No members online now.