All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* Linked List versus Hashed Linked iIst
@ 2014-08-18 20:20 Nick Krause
  2014-08-18 20:22 ` Mohammed Gamal
  2014-08-18 23:01 ` Greg Freemyer
  0 siblings, 2 replies; 8+ messages in thread
From: Nick Krause @ 2014-08-18 20:20 UTC (permalink / raw
  To: kernelnewbies

What are the advantages of the hashed linked list version over the
standard one and does it
increase the memory usage and overhead of the linked list more if I
use a hashed version?
Cheers Nick

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Linked List versus Hashed Linked iIst
  2014-08-18 20:20 Linked List versus Hashed Linked iIst Nick Krause
@ 2014-08-18 20:22 ` Mohammed Gamal
  2014-08-18 23:01 ` Greg Freemyer
  1 sibling, 0 replies; 8+ messages in thread
From: Mohammed Gamal @ 2014-08-18 20:22 UTC (permalink / raw
  To: kernelnewbies

Google it!

On Mon, Aug 18, 2014 at 10:20 PM, Nick Krause <xerofoify@gmail.com> wrote:
> What are the advantages of the hashed linked list version over the
> standard one and does it
> increase the memory usage and overhead of the linked list more if I
> use a hashed version?
> Cheers Nick
>
> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies at kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Linked List versus Hashed Linked iIst
  2014-08-18 20:20 Linked List versus Hashed Linked iIst Nick Krause
  2014-08-18 20:22 ` Mohammed Gamal
@ 2014-08-18 23:01 ` Greg Freemyer
  2014-08-19  2:53   ` Nick Krause
  1 sibling, 1 reply; 8+ messages in thread
From: Greg Freemyer @ 2014-08-18 23:01 UTC (permalink / raw
  To: kernelnewbies

On Mon, Aug 18, 2014 at 4:20 PM, Nick Krause <xerofoify@gmail.com> wrote:
> What are the advantages of the hashed linked list version over the
> standard one and does it
> increase the memory usage and overhead of the linked list more if I
> use a hashed version?

Seriously?  Do you know what a hash is?

A hash is a well-defined many to one algorithm.

If I have a universe of a million items that hash down to 100 unique
hashes, then I can group those million items by hash and have 100
groups of roughly 10,000 items each.

The better the hashing algorithm versus my original universe of 1
million items, the more even the distribution.

Now that I have 100 segregated groups I can build an array of 100
linked lists all maintained separately.

Thus:

hash_index = my_hash(item)

add_item(linked_list[hash_item], item) is how I add my item to the
hashed linked list.

is_in_list(linked_list[hash_item], item) is how I check to see if my
item is already in the list.

So in my example I have to have 100 linked lists, but each list is on
average 100x smaller than a simple linked list would be.

Is adding an item to the hashed linked list faster?

Absolutely not, I have to hash the item first then do a normal linked
list insertion.  That will always be slower.

Is finding the item faster?

That is the whole point of the exercise.  The theory is you ONLY use a
hashed linked list if the overhead of hashing the item is less than
the amount of time saved by traversing shorter lists when you search.

It is the job of the programmer to make the determination if a hashed
list is a better choice or not on a case by case basis.  It depends on
the length of the list without breaking it into pieces and how well
the hash algorithm can do at generating roughly similar segregated
groups.

For the size question, write yourself a userspace app and test it.
Obviously that is more work than asking here, but it is ASSUMED you
are doing research on your OWN before you post questions here.

fyi: this question has little to do with the linux kernel.  It is part
of what people mean when they say you need to go learn c before you
start on the kernel.  Using linked lists and hashed linked lists is
stuff you can fully explore in userspace.

Greg

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Linked List versus Hashed Linked iIst
  2014-08-18 23:01 ` Greg Freemyer
@ 2014-08-19  2:53   ` Nick Krause
  2014-08-19  3:16     ` Ashok Babu
  2014-08-19 11:45     ` Greg Freemyer
  0 siblings, 2 replies; 8+ messages in thread
From: Nick Krause @ 2014-08-19  2:53 UTC (permalink / raw
  To: kernelnewbies

On Mon, Aug 18, 2014 at 7:01 PM, Greg Freemyer <greg.freemyer@gmail.com> wrote:
> On Mon, Aug 18, 2014 at 4:20 PM, Nick Krause <xerofoify@gmail.com> wrote:
>> What are the advantages of the hashed linked list version over the
>> standard one and does it
>> increase the memory usage and overhead of the linked list more if I
>> use a hashed version?
>
> Seriously?  Do you know what a hash is?
>
> A hash is a well-defined many to one algorithm.
>
> If I have a universe of a million items that hash down to 100 unique
> hashes, then I can group those million items by hash and have 100
> groups of roughly 10,000 items each.
>
> The better the hashing algorithm versus my original universe of 1
> million items, the more even the distribution.
>
> Now that I have 100 segregated groups I can build an array of 100
> linked lists all maintained separately.
>
> Thus:
>
> hash_index = my_hash(item)
>
> add_item(linked_list[hash_item], item) is how I add my item to the
> hashed linked list.
>
> is_in_list(linked_list[hash_item], item) is how I check to see if my
> item is already in the list.
>
> So in my example I have to have 100 linked lists, but each list is on
> average 100x smaller than a simple linked list would be.
>
> Is adding an item to the hashed linked list faster?
>
> Absolutely not, I have to hash the item first then do a normal linked
> list insertion.  That will always be slower.
>
> Is finding the item faster?
>
> That is the whole point of the exercise.  The theory is you ONLY use a
> hashed linked list if the overhead of hashing the item is less than
> the amount of time saved by traversing shorter lists when you search.
>
> It is the job of the programmer to make the determination if a hashed
> list is a better choice or not on a case by case basis.  It depends on
> the length of the list without breaking it into pieces and how well
> the hash algorithm can do at generating roughly similar segregated
> groups.
>
> For the size question, write yourself a userspace app and test it.
> Obviously that is more work than asking here, but it is ASSUMED you
> are doing research on your OWN before you post questions here.
>
> fyi: this question has little to do with the linux kernel.  It is part
> of what people mean when they say you need to go learn c before you
> start on the kernel.  Using linked lists and hashed linked lists is
> stuff you can fully explore in userspace.
>
> Greg
No I known what the advantages are for user space was wondering if
there were any issues that differ in
kernel space.
Nick

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Linked List versus Hashed Linked iIst
  2014-08-19  2:53   ` Nick Krause
@ 2014-08-19  3:16     ` Ashok Babu
  2014-08-19  3:41       ` Nick Krause
  2014-08-19 11:45     ` Greg Freemyer
  1 sibling, 1 reply; 8+ messages in thread
From: Ashok Babu @ 2014-08-19  3:16 UTC (permalink / raw
  To: kernelnewbies

Nick,

The mailing list is not answered by ATM or Google search engine, where you
can play around with any questions you like.
It takes so much efforts for a person to understand the questions, and
reply to that. You need to respect the value of a reply and the value of
time spent by REAL people.

Look at your first question, and after a long reply from GREG, you are
asking a totally different question.


Thanks
Ashok


On Tue, Aug 19, 2014 at 10:53 AM, Nick Krause <xerofoify@gmail.com> wrote:

> On Mon, Aug 18, 2014 at 7:01 PM, Greg Freemyer <greg.freemyer@gmail.com>
> wrote:
> > On Mon, Aug 18, 2014 at 4:20 PM, Nick Krause <xerofoify@gmail.com>
> wrote:
> >> What are the advantages of the hashed linked list version over the
> >> standard one and does it
> >> increase the memory usage and overhead of the linked list more if I
> >> use a hashed version?
> >
> > Seriously?  Do you know what a hash is?
> >
> > A hash is a well-defined many to one algorithm.
> >
> > If I have a universe of a million items that hash down to 100 unique
> > hashes, then I can group those million items by hash and have 100
> > groups of roughly 10,000 items each.
> >
> > The better the hashing algorithm versus my original universe of 1
> > million items, the more even the distribution.
> >
> > Now that I have 100 segregated groups I can build an array of 100
> > linked lists all maintained separately.
> >
> > Thus:
> >
> > hash_index = my_hash(item)
> >
> > add_item(linked_list[hash_item], item) is how I add my item to the
> > hashed linked list.
> >
> > is_in_list(linked_list[hash_item], item) is how I check to see if my
> > item is already in the list.
> >
> > So in my example I have to have 100 linked lists, but each list is on
> > average 100x smaller than a simple linked list would be.
> >
> > Is adding an item to the hashed linked list faster?
> >
> > Absolutely not, I have to hash the item first then do a normal linked
> > list insertion.  That will always be slower.
> >
> > Is finding the item faster?
> >
> > That is the whole point of the exercise.  The theory is you ONLY use a
> > hashed linked list if the overhead of hashing the item is less than
> > the amount of time saved by traversing shorter lists when you search.
> >
> > It is the job of the programmer to make the determination if a hashed
> > list is a better choice or not on a case by case basis.  It depends on
> > the length of the list without breaking it into pieces and how well
> > the hash algorithm can do at generating roughly similar segregated
> > groups.
> >
> > For the size question, write yourself a userspace app and test it.
> > Obviously that is more work than asking here, but it is ASSUMED you
> > are doing research on your OWN before you post questions here.
> >
> > fyi: this question has little to do with the linux kernel.  It is part
> > of what people mean when they say you need to go learn c before you
> > start on the kernel.  Using linked lists and hashed linked lists is
> > stuff you can fully explore in userspace.
> >
> > Greg
> No I known what the advantages are for user space was wondering if
> there were any issues that differ in
> kernel space.
> Nick
>
> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies at kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.kernelnewbies.org/pipermail/kernelnewbies/attachments/20140819/9124d9b1/attachment.html 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Linked List versus Hashed Linked iIst
  2014-08-19  3:16     ` Ashok Babu
@ 2014-08-19  3:41       ` Nick Krause
  0 siblings, 0 replies; 8+ messages in thread
From: Nick Krause @ 2014-08-19  3:41 UTC (permalink / raw
  To: kernelnewbies

On Mon, Aug 18, 2014 at 11:16 PM, Ashok Babu <ashok3d@gmail.com> wrote:
> Nick,
>
> The mailing list is not answered by ATM or Google search engine, where you
> can play around with any questions you like.
> It takes so much efforts for a person to understand the questions, and reply
> to that. You need to respect the value of a reply and the value of time
> spent by REAL people.
>
> Look at your first question, and after a long reply from GREG, you are
> asking a totally different question.
>
>
> Thanks
> Ashok
>
>
> On Tue, Aug 19, 2014 at 10:53 AM, Nick Krause <xerofoify@gmail.com> wrote:
>>
>> On Mon, Aug 18, 2014 at 7:01 PM, Greg Freemyer <greg.freemyer@gmail.com>
>> wrote:
>> > On Mon, Aug 18, 2014 at 4:20 PM, Nick Krause <xerofoify@gmail.com>
>> > wrote:
>> >> What are the advantages of the hashed linked list version over the
>> >> standard one and does it
>> >> increase the memory usage and overhead of the linked list more if I
>> >> use a hashed version?
>> >
>> > Seriously?  Do you know what a hash is?
>> >
>> > A hash is a well-defined many to one algorithm.
>> >
>> > If I have a universe of a million items that hash down to 100 unique
>> > hashes, then I can group those million items by hash and have 100
>> > groups of roughly 10,000 items each.
>> >
>> > The better the hashing algorithm versus my original universe of 1
>> > million items, the more even the distribution.
>> >
>> > Now that I have 100 segregated groups I can build an array of 100
>> > linked lists all maintained separately.
>> >
>> > Thus:
>> >
>> > hash_index = my_hash(item)
>> >
>> > add_item(linked_list[hash_item], item) is how I add my item to the
>> > hashed linked list.
>> >
>> > is_in_list(linked_list[hash_item], item) is how I check to see if my
>> > item is already in the list.
>> >
>> > So in my example I have to have 100 linked lists, but each list is on
>> > average 100x smaller than a simple linked list would be.
>> >
>> > Is adding an item to the hashed linked list faster?
>> >
>> > Absolutely not, I have to hash the item first then do a normal linked
>> > list insertion.  That will always be slower.
>> >
>> > Is finding the item faster?
>> >
>> > That is the whole point of the exercise.  The theory is you ONLY use a
>> > hashed linked list if the overhead of hashing the item is less than
>> > the amount of time saved by traversing shorter lists when you search.
>> >
>> > It is the job of the programmer to make the determination if a hashed
>> > list is a better choice or not on a case by case basis.  It depends on
>> > the length of the list without breaking it into pieces and how well
>> > the hash algorithm can do at generating roughly similar segregated
>> > groups.
>> >
>> > For the size question, write yourself a userspace app and test it.
>> > Obviously that is more work than asking here, but it is ASSUMED you
>> > are doing research on your OWN before you post questions here.
>> >
>> > fyi: this question has little to do with the linux kernel.  It is part
>> > of what people mean when they say you need to go learn c before you
>> > start on the kernel.  Using linked lists and hashed linked lists is
>> > stuff you can fully explore in userspace.
>> >
>> > Greg
>> No I known what the advantages are for user space was wondering if
>> there were any issues that differ in
>> kernel space.
>> Nick
>>
>> _______________________________________________
>> Kernelnewbies mailing list
>> Kernelnewbies at kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>
>
I understand that, sorry my email matters need improving `.).
To Greg , sorry about the misunderstanding with this question.
Nick

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Linked List versus Hashed Linked iIst
  2014-08-19  2:53   ` Nick Krause
  2014-08-19  3:16     ` Ashok Babu
@ 2014-08-19 11:45     ` Greg Freemyer
  2014-08-19 15:28       ` Valdis.Kletnieks at vt.edu
  1 sibling, 1 reply; 8+ messages in thread
From: Greg Freemyer @ 2014-08-19 11:45 UTC (permalink / raw
  To: kernelnewbies



On August 18, 2014 10:53:12 PM EDT, Nick Krause <xerofoify@gmail.com> wrote:
>On Mon, Aug 18, 2014 at 7:01 PM, Greg Freemyer
><greg.freemyer@gmail.com> wrote:
>> On Mon, Aug 18, 2014 at 4:20 PM, Nick Krause <xerofoify@gmail.com>
>wrote:
>>> What are the advantages of the hashed linked list version over the
>>> standard one and does it
>>> increase the memory usage and overhead of the linked list more if I
>>> use a hashed version?
>>
>> Seriously?  Do you know what a hash is?
>>
>> A hash is a well-defined many to one algorithm.
>>
>> If I have a universe of a million items that hash down to 100 unique
>> hashes, then I can group those million items by hash and have 100
>> groups of roughly 10,000 items each.
>>
>> The better the hashing algorithm versus my original universe of 1
>> million items, the more even the distribution.
>>
>> Now that I have 100 segregated groups I can build an array of 100
>> linked lists all maintained separately.
>>
>> Thus:
>>
>> hash_index = my_hash(item)
>>
>> add_item(linked_list[hash_item], item) is how I add my item to the
>> hashed linked list.
>>
>> is_in_list(linked_list[hash_item], item) is how I check to see if my
>> item is already in the list.
>>
>> So in my example I have to have 100 linked lists, but each list is on
>> average 100x smaller than a simple linked list would be.
>>
>> Is adding an item to the hashed linked list faster?
>>
>> Absolutely not, I have to hash the item first then do a normal linked
>> list insertion.  That will always be slower.
>>
>> Is finding the item faster?
>>
>> That is the whole point of the exercise.  The theory is you ONLY use
>a
>> hashed linked list if the overhead of hashing the item is less than
>> the amount of time saved by traversing shorter lists when you search.
>>
>> It is the job of the programmer to make the determination if a hashed
>> list is a better choice or not on a case by case basis.  It depends
>on
>> the length of the list without breaking it into pieces and how well
>> the hash algorithm can do at generating roughly similar segregated
>> groups.
>>
>> For the size question, write yourself a userspace app and test it.
>> Obviously that is more work than asking here, but it is ASSUMED you
>> are doing research on your OWN before you post questions here.
>>
>> fyi: this question has little to do with the linux kernel.  It is
>part
>> of what people mean when they say you need to go learn c before you
>> start on the kernel.  Using linked lists and hashed linked lists is
>> stuff you can fully explore in userspace.
>>
>> Greg
>No I known what the advantages are for user space was wondering if
>there were any issues that differ in
>kernel space.
>Nick

1) your original question needs either a highly generic answer like I gave, or a highly specific one that depends on the exact nature of the data, the number of the items tracked, the ratio of searches vs. adds, and how smooth the hash grouping is.  Since you didn't provide the exact use case, only the generic answer is possible.  In fact your question implies that the answer is relatively straight forward.  A much better question would have been "for a specific use case, how is the choice of a normal linked list vs a hashed linked list performed?"

Note the answer to that has nothing to do with user space vs kernel space.

2) The kernel is not a magic place.  Sure there are issues like locking and interrupts that make the kernel more complex than user space, but for data algorithms it is just that the quality of the code is pretty universally excellent.  It is excellent because it has been open for 20+ years and some great developers have worked on it during that time.  Poorly written code in any of the core areas was eradicated long ago.

You can take that excellent code into your user space app and test it to your heart's content.  Not only can you do that, for something like a linked list evaluation, you should do that.  You have implied "tested" code is code that compiles.  If a developer wanted to replace the hashed linked link implementation it would be expected that they had done significant testing of the new code in user space with highly varied loads to show what they work well on and when the new code performs less well.  Then do an analysis of the existing kernel data structures which use hashed linked lists and prove that the new method is an improvement for the actual kernel use cases.  It would be months of work, but that is what it takes if you actually want to improve the kernel in a meaningful way.

Greg


-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Linked List versus Hashed Linked iIst
  2014-08-19 11:45     ` Greg Freemyer
@ 2014-08-19 15:28       ` Valdis.Kletnieks at vt.edu
  0 siblings, 0 replies; 8+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2014-08-19 15:28 UTC (permalink / raw
  To: kernelnewbies

On Tue, 19 Aug 2014 07:45:48 -0400, Greg Freemyer said:
> You can take that excellent code into your user space app and test it to your
> heart's content.  Not only can you do that, for something like a linked list
> evaluation, you should do that.  You have implied "tested" code is code that
> compiles.  If a developer wanted to replace the hashed linked link
> implementation it would be expected that they had done significant testing of
> the new code in user space with highly varied loads to show what they work well
> on and when the new code performs less well.

And in today's data structures lesson:

1) How is the kernel's linked-list implementation different than what
they usually teach in Data Structures 101?

2) Why do we do this?

(Consider it a Eudyptula-type challenge - figure it out *for yourself*
rather than spamming the list. ;)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 848 bytes
Desc: not available
Url : http://lists.kernelnewbies.org/pipermail/kernelnewbies/attachments/20140819/0fd34d01/attachment.bin 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-08-19 15:28 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-18 20:20 Linked List versus Hashed Linked iIst Nick Krause
2014-08-18 20:22 ` Mohammed Gamal
2014-08-18 23:01 ` Greg Freemyer
2014-08-19  2:53   ` Nick Krause
2014-08-19  3:16     ` Ashok Babu
2014-08-19  3:41       ` Nick Krause
2014-08-19 11:45     ` Greg Freemyer
2014-08-19 15:28       ` Valdis.Kletnieks at vt.edu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.