From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Tue, 4 Apr 2000 20:03:40 +0200 (CEST) From: Andrea Arcangeli Subject: Re: PG_swap_entry bug in recent kernels In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org Return-Path: To: Ben LaHaise Cc: riel@nl.linux.org, Linus Torvalds , linux-mm@kvack.org List-ID: On Tue, 4 Apr 2000, Ben LaHaise wrote: >try running a swap test under any recent devel kernel -- eventually you >will see an invalid swap entry show up in someone's pte causing a random acquire_swap_entry() should be just doing all the safety checking to make sure the page->index is caching a _valid_ swap entry. If it's garbage get_swap_page() will be recalled and a valid entry will be allocated (and if get_swap_page() returns an invalid-entry that's definitely not related to the PG_swap_entry logic anymore). Could you explain me how acquire_swap_entry() can return an invalid swap entry (starting with a random page->index of course)? I can't exclude there's a bug, but acquire_swap_entry was meant to return only valid entries despite of the page->index possible garbage and it seems it's doing that. >SIGSEGV (it's as if the entry was marked PROT_NONE -- present, but no >user access). I'm not saying you are not seeing that, I'm only trying to understand how can it be related to the PG_swap_entry logic. >> We're here talking about PG_swap_entry. The only object of that bit is to >> remains set on anonymous pages that aren't in the swap cache, so next time >> we'll re-add them to the swap cache we'll try to swap out them in the same >> swap entry as the page were before. > >Which is bogus. If it's an anonymous page that we want to swap out to the >same swap entry, leave it in the swap cache. There's a very basic reason that we can't left it in the swap cache. We can't left it in the swap cache simply because the swap cache is a read only entity and if you do a write access you can't left the page in the swap cache and change it without updating its on-disk counterpart. So we always remove the anonymous pages from the swap cache upon swapin-_write_-faults. That's also how 2.2.x works. Then I noticed it was possible to give persistence on the swap also to the dirtified pages without making the swap-cache dirty, by adding the PG_swap_entry bit that tell us if the page->index is currently caching the last swap entry where the page was allocated on the swap. That does all the job and we don't need dirty swap cache anymore. >> >If __delete_from_swap_cache() is called from a wrong code path, >> >that's something that should be fixed, of course (but that's >> >orthogonal to this). >> >> __delete_from_swap_cache is called by delete_from_swap_cache_nolock that >> is called by do_swap_page that does the swapin. > >As well as from shrink_mmap. I would not be complaining your patch if you would put the clear_bit within shrink_mmap :). BTW, also the SHM unmap points have to be checked to make sure the PG_swap_entry gets cleared. Also SHM uses swap cache and shares the do_swap_page code. >> >To quote from memory.c::do_swap_page() : >> > >> > if (write_access && !is_page_shared(page)) { >> > delete_from_swap_cache_nolock(page); >> > UnlockPage(page); >> > >> >If you think this is a bug, please fix it here... >> >> The above quoted code is correct. > >The code path that this patch really affects is shrink_mmap -> >__delete_from_swap_cache. Clearing the bit from shrink_mmap is an option, >but it doesn't make that much sense to me; if we're removing a page from ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >the swap cache, why aren't we clearing the PG_swap_entry bit? I'd rather ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ As just said the whole point of the PG_swap_entry is to be set for regular anonymous swappable pages, _not_ for the swap cache at all. So it really make no sense to clear the PG_swap_entry bit while removing a page from the swap cache and I don't see your arguemnt. We have instead to cover the points where a swappable page become not a swappable page anymore. There's also to cover the shrink_mmap case where we free the page. I'm thinking about it to see if we can skip to cover the shrink_mmap case changing the point where we set the PG_swap_entry bit. I think it would be possible if we would avoid the double page fault in the write_swapin_access+page-is-shared case. But right now I think it's cleaner to keep the double page fault and to clear the PG_swap_entry within shrink_mmap while dropping page cache. >leave the page in the swap cache and set PG_dirty on it that have such >obscure sematics. If the page is shared you can't simply set the PG_dirty bit and change the contents of the page or you'll screwup all other in-core and on-disk users of the page. You have at least to do a COW and re-add the COWed page to a new swap cache entry so by definition failing to give persistence to the page in the swap space. In general (also for non-shared pages) setting the PG_dirty is inferior because it would waste swap space by keeping busy on the swap side a swap entry that it not uptodate and that's really not cache (there's no way somebody else can fault in the same swap cache page since the user who did the write-access is now the only user of the page and it has it just mapped in its pte so it can't fault on it). The reason we keep the page mapped in the read-only access is to skip a write to disk if we run low on memory but we can't skip the write to disk in the write-access case... In the case of multiple users on the swap side (for example when a process does a fork with some page in the swap) the PG_swap_entry can allow the two tasks to share the same swap entry if they gets swapped out and swapped in at different times. A PG_dirty swap cache wouldn't allow that because of the is-page-shared issue mentioned two paragraphs above. Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/