All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-24 21:55 Rafael J. Wysocki
  2006-04-24 22:16 ` Pavel Machek
  2006-04-25 10:28 ` Nick Piggin
  0 siblings, 2 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-24 21:55 UTC (permalink / raw
  To: Pavel Machek; +Cc: Linux PM, LKML, Nigel Cunningham

Hi,

The appended patch allows swsusp to break the "50% of the normal zone" limit.
This is achieved by using the observation that pages mapped by frozen
userland tasks need not be copied before saving.

The pages mapped by the frozen tasks are selected by checking if
page_mapped(page) returns true for them and whether they are not mapped
by the current task.  These pages are included in the snapshot image without
copying, because we wouldn't be able to rollback the suspend if they were
modified before or in the process of saving the image.

[I'm tempted to treat the page cache pages in a similar way, for the following
reason: AFAIK the page cache pages are only modified during block I/O
operations (or when they are reclaimed, but that's covered below).  The only
process that can perform such operations after the suspend image has been
created is the current task (ie. the suspending process).  However, the
current task must not perform any filesystem operations after creating the
image or it would probably corrupt some filesystems and thus its page cache
pages related to filesystems are not modified during suspend.  Of course
it may perform block I/O operations using block devices directly (eg. it
does this to save the image), but then such operations cannot be started
_before_ creating the image and _continued_ after the image has been created
or there's a risk of leaving the block device in an inconsistent state if the
suspend is successful.]

Still, it seems to be possible, although very unlikely, that these pages will
be reclaimed by try_to_free_pages() if there's not enough memory while
saving the image (at least I couldn't convince myself that it was impossible).
Therefore the patch takes these pages out of reach of try_to_free_pages()
by (temporarily) moving them out of their respective LRU lists.  This is done
after the image has been created so the LRU lists have to be restored only
if the suspend fails.

With this patch applied I was able to save (and restore ;-)) ~800 MB suspend
images on a box with 1 GB of RAM.

[Please don't beat me very hard, just couldn't resist. ;-)]

Greetings,
Rafael

---
 include/linux/rmap.h    |    6 +
 kernel/power/power.h    |   10 +
 kernel/power/snapshot.c |  259 +++++++++++++++++++++++++++++++++++-------------
 kernel/power/swsusp.c   |   15 +-
 kernel/power/user.c     |    4 
 mm/rmap.c               |   19 +++
 6 files changed, 238 insertions(+), 75 deletions(-)

Index: linux-2.6.17-rc1-mm3/mm/rmap.c
===================================================================
--- linux-2.6.17-rc1-mm3.orig/mm/rmap.c	2006-04-22 10:34:33.000000000 +0200
+++ linux-2.6.17-rc1-mm3/mm/rmap.c	2006-04-23 00:41:19.000000000 +0200
@@ -851,3 +851,22 @@ int try_to_unmap(struct page *page, int 
 	return ret;
 }
 
+#ifdef CONFIG_SOFTWARE_SUSPEND
+int page_mapped_by_current(struct page *page)
+{
+	struct vm_area_struct *vma;
+	int ret = 0;
+
+	spin_lock(&current->mm->page_table_lock);
+
+	for (vma = current->mm->mmap; vma; vma = vma->vm_next)
+		if (page_address_in_vma(page, vma) != -EFAULT) {
+			ret = 1;
+			break;
+		}
+
+	spin_unlock(&current->mm->page_table_lock);
+
+	return ret;
+}
+#endif /* CONFIG_SOFTWARE_SUSPEND */
Index: linux-2.6.17-rc1-mm3/include/linux/rmap.h
===================================================================
--- linux-2.6.17-rc1-mm3.orig/include/linux/rmap.h	2006-04-22 10:34:33.000000000 +0200
+++ linux-2.6.17-rc1-mm3/include/linux/rmap.h	2006-04-22 10:34:45.000000000 +0200
@@ -104,6 +104,12 @@ pte_t *page_check_address(struct page *,
  */
 unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
 
+#ifdef CONFIG_SOFTWARE_SUSPEND
+int page_mapped_by_current(struct page *);
+#else
+static inline int page_mapped_by_current(struct page *page) { return 0; }
+#endif /* CONFIG_SOFTWARE_SUSPEND */
+
 #else	/* !CONFIG_MMU */
 
 #define anon_vma_init()		do {} while (0)
Index: linux-2.6.17-rc1-mm3/kernel/power/snapshot.c
===================================================================
--- linux-2.6.17-rc1-mm3.orig/kernel/power/snapshot.c	2006-04-22 10:34:33.000000000 +0200
+++ linux-2.6.17-rc1-mm3/kernel/power/snapshot.c	2006-04-23 11:40:13.000000000 +0200
@@ -13,6 +13,8 @@
 #include <linux/version.h>
 #include <linux/module.h>
 #include <linux/mm.h>
+#include <linux/mm_inline.h>
+#include <linux/rmap.h>
 #include <linux/suspend.h>
 #include <linux/smp_lock.h>
 #include <linux/delay.h>
@@ -78,7 +80,7 @@ static int save_arch_mem(void)
 	void *kaddr;
 	struct arch_saveable_page *tmp = arch_pages;
 
-	pr_debug("swsusp: Saving arch specific memory");
+	pr_debug("swsusp: Saving arch specific memory\n");
 	while (tmp) {
 		tmp->data = (void *)__get_free_page(GFP_ATOMIC);
 		if (!tmp->data)
@@ -251,6 +253,14 @@ int restore_special_mem(void)
 	return ret;
 }
 
+/* Represents a stacked allocated page to be used in the future */
+struct res_page {
+	struct res_page *next;
+	char padding[PAGE_SIZE - sizeof(void *)];
+};
+
+static struct res_page *page_list;
+
 static int pfn_is_nosave(unsigned long pfn)
 {
 	unsigned long nosave_begin_pfn = __pa(&__nosave_begin) >> PAGE_SHIFT;
@@ -259,7 +269,7 @@ static int pfn_is_nosave(unsigned long p
 }
 
 /**
- *	saveable - Determine whether a page should be cloned or not.
+ *	saveable - Determine whether a page should be saved or not.
  *	@pfn:	The page
  *
  *	We save a page if it's Reserved, and not in the range of pages
@@ -267,9 +277,8 @@ static int pfn_is_nosave(unsigned long p
  *	isn't part of a free chunk of pages.
  */
 
-static int saveable(struct zone *zone, unsigned long *zone_pfn)
+static int saveable(unsigned long pfn)
 {
-	unsigned long pfn = *zone_pfn + zone->zone_start_pfn;
 	struct page *page;
 
 	if (!pfn_valid(pfn))
@@ -286,20 +295,46 @@ static int saveable(struct zone *zone, u
 	return 1;
 }
 
-unsigned int count_data_pages(void)
+/**
+ *	need_to_copy - determine if a page needs to be copied before saving.
+ *	Returns false if the page can be saved without copying.
+ */
+
+static int need_to_copy(struct page *page)
+{
+	if (!PageLRU(page) || PageCompound(page))
+		return 1;
+	if (page_mapped(page))
+		return page_mapped_by_current(page);
+
+	return 1;
+}
+
+unsigned int count_data_pages(unsigned int *total)
 {
 	struct zone *zone;
 	unsigned long zone_pfn;
-	unsigned int n = 0;
+	unsigned int n, m;
 
+	n = m = 0;
 	for_each_zone (zone) {
 		if (is_highmem(zone))
 			continue;
 		mark_free_pages(zone);
-		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn)
-			n += saveable(zone, &zone_pfn);
+		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
+			unsigned long pfn = zone_pfn + zone->zone_start_pfn;
+
+			if (saveable(pfn)) {
+				n++;
+				if (!need_to_copy(pfn_to_page(pfn)))
+					m++;
+			}
+		}
 	}
-	return n;
+	if (total)
+		*total = n;
+
+	return n - m;
 }
 
 static void copy_data_pages(struct pbe *pblist)
@@ -307,25 +342,51 @@ static void copy_data_pages(struct pbe *
 	struct zone *zone;
 	unsigned long zone_pfn;
 	struct pbe *pbe, *p;
+	struct res_page *ptr;
 
 	pbe = pblist;
 	for_each_zone (zone) {
 		if (is_highmem(zone))
 			continue;
+
 		mark_free_pages(zone);
-		/* This is necessary for swsusp_free() */
+		/* This is necessary for free_image() */
 		for_each_pb_page (p, pblist)
 			SetPageNosaveFree(virt_to_page(p));
+
 		for_each_pbe (p, pblist)
-			SetPageNosaveFree(virt_to_page(p->address));
+			if (p->address && p->orig_address != p->address)
+				SetPageNosaveFree(virt_to_page(p->address));
+
+		ptr = page_list;
+		while (ptr) {
+			SetPageNosaveFree(virt_to_page(ptr));
+			ptr = ptr->next;
+		}
+
 		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
-			if (saveable(zone, &zone_pfn)) {
+			unsigned long pfn = zone_pfn + zone->zone_start_pfn;
+
+			if (saveable(pfn)) {
 				struct page *page;
-				page = pfn_to_page(zone_pfn + zone->zone_start_pfn);
+
 				BUG_ON(!pbe);
+				page = pfn_to_page(pfn);
 				pbe->orig_address = (unsigned long)page_address(page);
-				/* copy_page is not usable for copying task structs. */
-				memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE);
+				if (need_to_copy(page)) {
+					BUG_ON(!page_list);
+					pbe->address = (unsigned long)page_list;
+					page_list = page_list->next;
+					/*
+					 * copy_page is not usable for copying
+					 * task structs.
+					 */
+					memcpy((void *)pbe->address,
+						(void *)pbe->orig_address,
+						PAGE_SIZE);
+				} else {
+					pbe->address = pbe->orig_address;
+				}
 				pbe = pbe->next;
 			}
 		}
@@ -409,7 +470,7 @@ static inline void *alloc_image_page(gfp
 	res = (void *)get_zeroed_page(gfp_mask);
 	if (safe_needed)
 		while (res && PageNosaveFree(virt_to_page(res))) {
-			/* The page is unsafe, mark it for swsusp_free() */
+			/* The page is unsafe, mark it for free_image() */
 			SetPageNosave(virt_to_page(res));
 			unsafe_pages++;
 			res = (void *)get_zeroed_page(gfp_mask);
@@ -462,12 +523,74 @@ struct pbe *alloc_pagedir(unsigned int n
 	return pblist;
 }
 
+static LIST_HEAD(active_pages);
+static LIST_HEAD(inactive_pages);
+
+/**
+ *	protect_data_pages - move data pages that need to be protected from
+ *	being reclaimed out of their respective LRU lists
+ */
+
+static void protect_data_pages(struct pbe *pblist)
+{
+	struct pbe *p;
+
+	for_each_pbe (p, pblist)
+		if (p->address == p->orig_address) {
+			struct page *page = virt_to_page(p->address);
+			struct zone *zone = page_zone(page);
+
+			spin_lock(&zone->lru_lock);
+			if (PageActive(page)) {
+				del_page_from_active_list(zone, page);
+				list_add(&page->lru, &active_pages);
+			} else {
+				del_page_from_inactive_list(zone, page);
+				list_add(&page->lru, &inactive_pages);
+			}
+			spin_unlock(&zone->lru_lock);
+			ClearPageLRU(page);
+		}
+}
+
+/**
+ *	restore_active_inactive_lists - if suspend fails, the pages protected
+ *	with protect_data_pages() have to be moved back to their respective
+ *	lists
+ */
+
+static void restore_active_inactive_lists(void)
+{
+	struct page *page;
+	struct zone *zone;
+
+	while(!list_empty(&active_pages)) {
+		page = list_entry(active_pages.prev, struct page, lru);
+		zone = page_zone(page);
+		list_del(&page->lru);
+		SetPageLRU(page);
+		spin_lock_irq(&zone->lru_lock);
+		add_page_to_active_list(zone, page);
+		spin_unlock_irq(&zone->lru_lock);
+	}
+
+	while(!list_empty(&inactive_pages)) {
+		page = list_entry(inactive_pages.prev, struct page, lru);
+		zone = page_zone(page);
+		list_del(&page->lru);
+		SetPageLRU(page);
+		spin_lock_irq(&zone->lru_lock);
+		add_page_to_inactive_list(zone, page);
+		spin_unlock_irq(&zone->lru_lock);
+	}
+}
+
 /**
  * Free pages we allocated for suspend. Suspend pages are alocated
  * before atomic copy, so we need to free them after resume.
  */
 
-void swsusp_free(void)
+void free_image(void)
 {
 	struct zone *zone;
 	unsigned long zone_pfn;
@@ -490,6 +613,11 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+void swsusp_free(void)
+{
+	free_image();
+	restore_active_inactive_lists();
+}
 
 /**
  *	enough_free_mem - Make sure we enough free memory to snapshot.
@@ -498,32 +626,28 @@ void swsusp_free(void)
  *	free pages.
  */
 
-static int enough_free_mem(unsigned int nr_pages)
+static int enough_free_mem(unsigned int nr_pages, unsigned int copy_pages)
 {
 	struct zone *zone;
-	unsigned int n = 0;
+	long n = 0;
+
+	pr_debug("swsusp: pages needed: %u + %lu + %lu, free: %u\n",
+		 copy_pages,
+		 (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE,
+		 EXTRA_PAGES, nr_free_pages());
 
 	for_each_zone (zone)
-		if (!is_highmem(zone))
+		if (!is_highmem(zone) && populated_zone(zone)) {
 			n += zone->free_pages;
-	pr_debug("swsusp: available memory: %u pages\n", n);
-	return n > (nr_pages + PAGES_FOR_IO +
-		(nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
-}
-
-static int alloc_data_pages(struct pbe *pblist, gfp_t gfp_mask, int safe_needed)
-{
-	struct pbe *p;
+			n -= zone->lowmem_reserve[ZONE_NORMAL];
+		}
 
-	for_each_pbe (p, pblist) {
-		p->address = (unsigned long)alloc_image_page(gfp_mask, safe_needed);
-		if (!p->address)
-			return -ENOMEM;
-	}
-	return 0;
+	pr_debug("swsusp: available memory: %ld pages\n", n);
+	return n > (long)(copy_pages + EXTRA_PAGES +
+		(nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
 }
 
-static struct pbe *swsusp_alloc(unsigned int nr_pages)
+static struct pbe *swsusp_alloc(unsigned int nr_pages, unsigned int copy_pages)
 {
 	struct pbe *pblist;
 
@@ -532,10 +656,19 @@ static struct pbe *swsusp_alloc(unsigned
 		return NULL;
 	}
 
-	if (alloc_data_pages(pblist, GFP_ATOMIC | __GFP_COLD, 0)) {
-		printk(KERN_ERR "suspend: Allocating image pages failed.\n");
-		swsusp_free();
-		return NULL;
+	page_list = NULL;
+	while (copy_pages--) {
+		struct res_page *ptr;
+
+		ptr = alloc_image_page(GFP_ATOMIC | __GFP_COLD, 0);
+		if (!ptr) {
+			printk(KERN_ERR
+				"suspend: Allocating image pages failed.\n");
+			free_image();
+			return NULL;
+		}
+		ptr->next = page_list;
+		page_list = ptr;
 	}
 
 	return pblist;
@@ -543,25 +676,21 @@ static struct pbe *swsusp_alloc(unsigned
 
 asmlinkage int swsusp_save(void)
 {
-	unsigned int nr_pages;
+	unsigned int nr_pages, copy_pages;
 
 	pr_debug("swsusp: critical section: \n");
 
 	drain_local_pages();
-	nr_pages = count_data_pages();
-	printk("swsusp: Need to copy %u pages\n", nr_pages);
-
-	pr_debug("swsusp: pages needed: %u + %lu + %u, free: %u\n",
-		 nr_pages,
-		 (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE,
-		 PAGES_FOR_IO, nr_free_pages());
+	copy_pages = count_data_pages(&nr_pages);
+	printk("swsusp: Need to save %u pages\n", nr_pages);
+	printk("swsusp: %u pages to copy\n", copy_pages);
 
-	if (!enough_free_mem(nr_pages)) {
+	if (!enough_free_mem(nr_pages, copy_pages)) {
 		printk(KERN_ERR "swsusp: Not enough free memory\n");
 		return -ENOMEM;
 	}
 
-	pagedir_nosave = swsusp_alloc(nr_pages);
+	pagedir_nosave = swsusp_alloc(nr_pages, copy_pages);
 	if (!pagedir_nosave)
 		return -ENOMEM;
 
@@ -571,6 +700,9 @@ asmlinkage int swsusp_save(void)
 	drain_local_pages();
 	copy_data_pages(pagedir_nosave);
 
+	/* Make sure the pages that we have not copied won't be reclaimed */
+	protect_data_pages(pagedir_nosave);
+
 	/*
 	 * End of critical section. From now on, we can write to memory,
 	 * but we should not touch disk. This specially means we must _not_
@@ -580,7 +712,7 @@ asmlinkage int swsusp_save(void)
 	nr_copy_pages = nr_pages;
 	nr_meta_pages = (nr_pages * sizeof(long) + PAGE_SIZE - 1) >> PAGE_SHIFT;
 
-	printk("swsusp: critical section/: done (%d pages copied)\n", nr_pages);
+	printk("swsusp: critical section/: done (%d pages)\n", nr_pages);
 	return 0;
 }
 
@@ -643,7 +775,7 @@ int snapshot_read_next(struct snapshot_h
 	if (handle->page > nr_meta_pages + nr_copy_pages)
 		return 0;
 	if (!buffer) {
-		/* This makes the buffer be freed by swsusp_free() */
+		/* This makes the buffer be freed by free_image() */
 		buffer = alloc_image_page(GFP_ATOMIC, 0);
 		if (!buffer)
 			return -ENOMEM;
@@ -799,13 +931,6 @@ static inline struct pbe *unpack_orig_ad
  *	of "safe" which will be used later
  */
 
-struct safe_page {
-	struct safe_page *next;
-	char padding[PAGE_SIZE - sizeof(void *)];
-};
-
-static struct safe_page *safe_pages;
-
 static int prepare_image(struct snapshot_handle *handle)
 {
 	int error = 0;
@@ -822,21 +947,21 @@ static int prepare_image(struct snapshot
 		if (!pblist)
 			error = -ENOMEM;
 	}
-	safe_pages = NULL;
+	page_list = NULL;
 	if (!error && nr_pages > unsafe_pages) {
 		nr_pages -= unsafe_pages;
 		while (nr_pages--) {
-			struct safe_page *ptr;
+			struct res_page *ptr;
 
-			ptr = (struct safe_page *)get_zeroed_page(GFP_ATOMIC);
+			ptr = (struct res_page *)get_zeroed_page(GFP_ATOMIC);
 			if (!ptr) {
 				error = -ENOMEM;
 				break;
 			}
 			if (!PageNosaveFree(virt_to_page(ptr))) {
 				/* The page is "safe", add it to the list */
-				ptr->next = safe_pages;
-				safe_pages = ptr;
+				ptr->next = page_list;
+				page_list = ptr;
 			}
 			/* Mark the page as allocated */
 			SetPageNosave(virt_to_page(ptr));
@@ -847,7 +972,7 @@ static int prepare_image(struct snapshot
 		pagedir_nosave = pblist;
 	} else {
 		handle->pbe = NULL;
-		swsusp_free();
+		free_image();
 	}
 	return error;
 }
@@ -871,8 +996,8 @@ static void *get_buffer(struct snapshot_
 	 * The "original" page frame has not been allocated and we have to
 	 * use a "safe" page frame to store the read page
 	 */
-	pbe->address = (unsigned long)safe_pages;
-	safe_pages = safe_pages->next;
+	pbe->address = (unsigned long)page_list;
+	page_list = page_list->next;
 	if (last)
 		last->next = pbe;
 	handle->last_pbe = pbe;
@@ -908,7 +1033,7 @@ int snapshot_write_next(struct snapshot_
 	if (handle->prev && handle->page > nr_meta_pages + nr_copy_pages)
 		return 0;
 	if (!buffer) {
-		/* This makes the buffer be freed by swsusp_free() */
+		/* This makes the buffer be freed by free_image() */
 		buffer = alloc_image_page(GFP_ATOMIC, 0);
 		if (!buffer)
 			return -ENOMEM;
Index: linux-2.6.17-rc1-mm3/kernel/power/power.h
===================================================================
--- linux-2.6.17-rc1-mm3.orig/kernel/power/power.h	2006-04-22 10:34:33.000000000 +0200
+++ linux-2.6.17-rc1-mm3/kernel/power/power.h	2006-04-22 10:34:45.000000000 +0200
@@ -48,7 +48,14 @@ extern dev_t swsusp_resume_device;
 extern asmlinkage int swsusp_arch_suspend(void);
 extern asmlinkage int swsusp_arch_resume(void);
 
-extern unsigned int count_data_pages(void);
+#ifdef CONFIG_BLK_DEV_INITRD
+#define EXTRA_PAGES	(PAGES_FOR_IO + \
+			(2 * CONFIG_BLK_DEV_RAM_SIZE * 1024) / PAGE_SIZE)
+#else
+#define EXTRA_PAGES	PAGES_FOR_IO
+#endif /* CONFIG_BLK_DEV_INITRD */
+
+extern unsigned int count_data_pages(unsigned int *total);
 
 struct snapshot_handle {
 	loff_t		offset;
@@ -111,6 +118,7 @@ extern int restore_special_mem(void);
 
 extern int swsusp_check(void);
 extern int swsusp_shrink_memory(void);
+extern void free_image(void);
 extern void swsusp_free(void);
 extern int swsusp_suspend(void);
 extern int swsusp_resume(void);
Index: linux-2.6.17-rc1-mm3/kernel/power/swsusp.c
===================================================================
--- linux-2.6.17-rc1-mm3.orig/kernel/power/swsusp.c	2006-04-22 10:34:33.000000000 +0200
+++ linux-2.6.17-rc1-mm3/kernel/power/swsusp.c	2006-04-22 10:34:45.000000000 +0200
@@ -177,28 +177,29 @@ int swsusp_shrink_memory(void)
 	long size, tmp;
 	struct zone *zone;
 	unsigned long pages = 0;
-	unsigned int i = 0;
+	unsigned int to_save, i = 0;
 	char *p = "-\\|/";
 
 	printk("Shrinking memory...  ");
 	do {
 		size = 2 * count_special_pages();
-		size += size / 50 + count_data_pages();
-		size += (size + PBES_PER_PAGE - 1) / PBES_PER_PAGE +
-			PAGES_FOR_IO;
+		size += size / 50 + count_data_pages(&to_save);
+		size += (to_save + PBES_PER_PAGE - 1) / PBES_PER_PAGE +
+			EXTRA_PAGES;
 		tmp = size;
 		for_each_zone (zone)
 			if (!is_highmem(zone) && populated_zone(zone)) {
 				tmp -= zone->free_pages;
 				tmp += zone->lowmem_reserve[ZONE_NORMAL];
 			}
+		size = to_save * (PAGE_SIZE + sizeof(long));
 		if (tmp > 0) {
 			tmp = __shrink_memory(tmp);
 			if (!tmp)
 				return -ENOMEM;
 			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
+		} else if (size > image_size) {
+			tmp = __shrink_memory(size - image_size);
 			pages += tmp;
 		}
 		printk("\b%c", p[i++%4]);
@@ -261,7 +262,7 @@ int swsusp_resume(void)
 	 * very tight, so we have to free it as soon as we can to avoid
 	 * subsequent failures
 	 */
-	swsusp_free();
+	free_image();
 	restore_processor_state();
 	restore_special_mem();
 	touch_softlockup_watchdog();
Index: linux-2.6.17-rc1-mm3/kernel/power/user.c
===================================================================
--- linux-2.6.17-rc1-mm3.orig/kernel/power/user.c	2006-04-22 10:34:33.000000000 +0200
+++ linux-2.6.17-rc1-mm3/kernel/power/user.c	2006-04-22 10:34:45.000000000 +0200
@@ -153,6 +153,10 @@ static int snapshot_ioctl(struct inode *
 	case SNAPSHOT_UNFREEZE:
 		if (!data->frozen)
 			break;
+		if (data->ready) {
+			error = -EPERM;
+			break;
+		}
 		down(&pm_sem);
 		thaw_processes();
 		enable_nonboot_cpus();

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-24 21:55 [RFC][PATCH] swsusp: support creating bigger images Rafael J. Wysocki
@ 2006-04-24 22:16 ` Pavel Machek
  2006-04-25  8:26     ` Rafael J. Wysocki
  2006-04-25 10:28 ` Nick Piggin
  1 sibling, 1 reply; 53+ messages in thread
From: Pavel Machek @ 2006-04-24 22:16 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Linux PM, LKML, Nigel Cunningham

Hi!

> The appended patch allows swsusp to break the "50% of the normal zone" limit.
> This is achieved by using the observation that pages mapped by frozen
> userland tasks need not be copied before saving.

I've not followed the patch too carefully, but it is not as bad as I
expected...

> With this patch applied I was able to save (and restore ;-)) ~800 MB suspend
> images on a box with 1 GB of RAM.

And did it also work second time? ;-).

Okay, so it can be done, and patch does not look too bad. It still
scares me. Is 800MB image more responsive than 500MB after resume?

I guess we can revert to old behaviour by simply returning 1 from
need_to_copy, right?

I assume that need_to_copy returns 1 in case page is shared by current
and some other process?

> [Please don't beat me very hard, just couldn't resist. ;-)]

Well, you are about to force me to learn about mm internals. Plus you
force everyone who tries to modify swsusp. ... it may be okay if
benefit is great enough and if it gets proper testing. Not 2.6.17
material.

Is benefit worth it?


> --- linux-2.6.17-rc1-mm3.orig/include/linux/rmap.h	2006-04-22 10:34:33.000000000 +0200
> +++ linux-2.6.17-rc1-mm3/include/linux/rmap.h	2006-04-22 10:34:45.000000000 +0200
> @@ -104,6 +104,12 @@ pte_t *page_check_address(struct page *,
>   */
>  unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
>  
> +#ifdef CONFIG_SOFTWARE_SUSPEND
> +int page_mapped_by_current(struct page *);
> +#else
> +static inline int page_mapped_by_current(struct page *page) { return 0; }
> +#endif /* CONFIG_SOFTWARE_SUSPEND */

I'd leave it undefined. That will prevent nasty surprise when someone
tries to use it w/o CONFIG_SOFTWARE_SUSPEND set.

> @@ -251,6 +253,14 @@ int restore_special_mem(void)
>  	return ret;
>  }
>  
> +/* Represents a stacked allocated page to be used in the future */
> +struct res_page {
> +	struct res_page *next;
> +	char padding[PAGE_SIZE - sizeof(void *)];
> +};
> +
> +static struct res_page *page_list;
> +
>  static int pfn_is_nosave(unsigned long pfn)
>  {
>  	unsigned long nosave_begin_pfn = __pa(&__nosave_begin) >>
> PAGE_SHIFT;

Is this part of some other patch?


> @@ -490,6 +613,11 @@ void swsusp_free(void)
>  	buffer = NULL;
>  }
>  
> +void swsusp_free(void)
> +{
> +	free_image();
> +	restore_active_inactive_lists();
> +}

This still scares me. Nice test would be to
save/restore_active_inactive_lists repeatedly in a loop ... on running
system ... aha, but it probably can't work outside of refrigerator?

>  	case SNAPSHOT_UNFREEZE:
>  		if (!data->frozen)
>  			break;
> +		if (data->ready) {
> +			error = -EPERM;
> +			break;
> +		}
>  		down(&pm_sem);
>  		thaw_processes();
>  		enable_nonboot_cpus();

Error from UNFREEZE is not nice:

Unfreeze:
        unfreeze(snapshot_fd);
        return error;
}
... we don't handle it. OTOH I guess it will be all fixed on exit()?

								Pavel

-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-24 22:16 ` Pavel Machek
@ 2006-04-25  8:26     ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-25  8:26 UTC (permalink / raw
  To: Pavel Machek; +Cc: Linux PM, LKML, Nigel Cunningham

Hi,

On Tuesday 25 April 2006 00:16, Pavel Machek wrote:
> > The appended patch allows swsusp to break the "50% of the normal zone" limit.
> > This is achieved by using the observation that pages mapped by frozen
> > userland tasks need not be copied before saving.
> 
> I've not followed the patch too carefully, but it is not as bad as I
> expected...
> 
> > With this patch applied I was able to save (and restore ;-)) ~800 MB suspend
> > images on a box with 1 GB of RAM.
> 
> And did it also work second time? ;-).

Yes, somethinkg like 10 times in a row. :-)

> Okay, so it can be done, and patch does not look too bad. It still
> scares me. Is 800MB image more responsive than 500MB after resume?

Yes, it is, slightly, but I think 800 meg images are impractical for
performance reasons (like IMO everything above 500 meg with the current
hardware).  However this means we can save 80% of RAM with the patch
and that should be 400 meg instead of 250 on a 500 meg machine, or
200 meg instead of 125 on a 250 meg machine.

Apart from this we save some memory allocations and copyings during
suspend.

> I guess we can revert to old behaviour by simply returning 1 from
> need_to_copy, right?

Yes.

> I assume that need_to_copy returns 1 in case page is shared by current
> and some other process?

Yes.  [If the page is shared with the current task it has to mapped by it, in
which case it will be copied.]

> > [Please don't beat me very hard, just couldn't resist. ;-)]
> 
> Well, you are about to force me to learn about mm internals. Plus you
> force everyone who tries to modify swsusp. ... it may be okay if
> benefit is great enough and if it gets proper testing. Not 2.6.17
> material.

Obviously not, and yes it requires a _lot_ of testing especially on boxes
with 512 meg of RAM or less.

> Is benefit worth it?

Well, that depends.  I think for boxes with 1 GB of RAM or more it's just
unnecessary (as of today, but this may change if faster disks are available).
On boxes with 512 MB of RAM or less it may change a lot as far as the
responsiveness after resume is concerned.

Anyway do you think it may go into -mm (unless Andrew shoots it down,
that is ;-))?
 
> > --- linux-2.6.17-rc1-mm3.orig/include/linux/rmap.h	2006-04-22 10:34:33.000000000 +0200
> > +++ linux-2.6.17-rc1-mm3/include/linux/rmap.h	2006-04-22 10:34:45.000000000 +0200
> > @@ -104,6 +104,12 @@ pte_t *page_check_address(struct page *,
> >   */
> >  unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
> >  
> > +#ifdef CONFIG_SOFTWARE_SUSPEND
> > +int page_mapped_by_current(struct page *);
> > +#else
> > +static inline int page_mapped_by_current(struct page *page) { return 0; }
> > +#endif /* CONFIG_SOFTWARE_SUSPEND */
> 
> I'd leave it undefined. That will prevent nasty surprise when someone
> tries to use it w/o CONFIG_SOFTWARE_SUSPEND set.

OK

> > @@ -251,6 +253,14 @@ int restore_special_mem(void)
> >  	return ret;
> >  }
> >  
> > +/* Represents a stacked allocated page to be used in the future */
> > +struct res_page {
> > +	struct res_page *next;
> > +	char padding[PAGE_SIZE - sizeof(void *)];
> > +};
> > +
> > +static struct res_page *page_list;
> > +
> >  static int pfn_is_nosave(unsigned long pfn)
> >  {
> >  	unsigned long nosave_begin_pfn = __pa(&__nosave_begin) >>
> > PAGE_SHIFT;
> 
> Is this part of some other patch?

No.  That's because I use a linked list of allocated pages that has been
defined in a slightly different way in
swsusp-use-less-memory-during-resume.patch,
so I just redefine and reuse it.

> > @@ -490,6 +613,11 @@ void swsusp_free(void)
> >  	buffer = NULL;
> >  }
> >  
> > +void swsusp_free(void)
> > +{
> > +	free_image();
> > +	restore_active_inactive_lists();
> > +}
> 
> This still scares me. Nice test would be to
> save/restore_active_inactive_lists repeatedly in a loop ... on running
> system ... aha, but it probably can't work outside of refrigerator?

restore_active_inactive_lists() only does something if suspend fails
(otherwise the lists active_pages and inactive_pages are empty).

The suspend failure may be simulated by running swapoff and suspendig
with "echo disk > /sys/power/state".  However it wouldn't be interesting to
do this in a loop because it will make the same set of pages go out of the
LRU lists and back in a round robin fashion.

I did the the following test:
- ran swapoff
- tried to suspend with "echo disk > /sys/power/state" (which failed,
obviously)
- checked the numbers of active/inactive pages in /proc/meminfo
- ran swapon
- set image_size to 0
- suspended with "echo disk > /sys/power/state" to verify if the right number
of pages would be reclaimed
- resumed

at it worked, apparently.

> >  	case SNAPSHOT_UNFREEZE:
> >  		if (!data->frozen)
> >  			break;
> > +		if (data->ready) {
> > +			error = -EPERM;
> > +			break;
> > +		}
> >  		down(&pm_sem);
> >  		thaw_processes();
> >  		enable_nonboot_cpus();
> 
> Error from UNFREEZE is not nice:
> 
> Unfreeze:
>         unfreeze(snapshot_fd);
>         return error;
> }
> ... we don't handle it.

It will only occur if we don't call free_snapshot() before unfreeze() which is
a bug anyway.  We prevent it from occuring by always calling free_snapshot()
before unfreeze(). :-)

> OTOH I guess it will be all fixed on exit()?

Well, sort of, but I think the LRU lists always have to be restored before
thawing processes (eg. kswapd).

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-25  8:26     ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-25  8:26 UTC (permalink / raw
  To: Pavel Machek; +Cc: Nigel Cunningham, Linux PM, LKML

Hi,

On Tuesday 25 April 2006 00:16, Pavel Machek wrote:
> > The appended patch allows swsusp to break the "50% of the normal zone" limit.
> > This is achieved by using the observation that pages mapped by frozen
> > userland tasks need not be copied before saving.
> 
> I've not followed the patch too carefully, but it is not as bad as I
> expected...
> 
> > With this patch applied I was able to save (and restore ;-)) ~800 MB suspend
> > images on a box with 1 GB of RAM.
> 
> And did it also work second time? ;-).

Yes, somethinkg like 10 times in a row. :-)

> Okay, so it can be done, and patch does not look too bad. It still
> scares me. Is 800MB image more responsive than 500MB after resume?

Yes, it is, slightly, but I think 800 meg images are impractical for
performance reasons (like IMO everything above 500 meg with the current
hardware).  However this means we can save 80% of RAM with the patch
and that should be 400 meg instead of 250 on a 500 meg machine, or
200 meg instead of 125 on a 250 meg machine.

Apart from this we save some memory allocations and copyings during
suspend.

> I guess we can revert to old behaviour by simply returning 1 from
> need_to_copy, right?

Yes.

> I assume that need_to_copy returns 1 in case page is shared by current
> and some other process?

Yes.  [If the page is shared with the current task it has to mapped by it, in
which case it will be copied.]

> > [Please don't beat me very hard, just couldn't resist. ;-)]
> 
> Well, you are about to force me to learn about mm internals. Plus you
> force everyone who tries to modify swsusp. ... it may be okay if
> benefit is great enough and if it gets proper testing. Not 2.6.17
> material.

Obviously not, and yes it requires a _lot_ of testing especially on boxes
with 512 meg of RAM or less.

> Is benefit worth it?

Well, that depends.  I think for boxes with 1 GB of RAM or more it's just
unnecessary (as of today, but this may change if faster disks are available).
On boxes with 512 MB of RAM or less it may change a lot as far as the
responsiveness after resume is concerned.

Anyway do you think it may go into -mm (unless Andrew shoots it down,
that is ;-))?
 
> > --- linux-2.6.17-rc1-mm3.orig/include/linux/rmap.h	2006-04-22 10:34:33.000000000 +0200
> > +++ linux-2.6.17-rc1-mm3/include/linux/rmap.h	2006-04-22 10:34:45.000000000 +0200
> > @@ -104,6 +104,12 @@ pte_t *page_check_address(struct page *,
> >   */
> >  unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
> >  
> > +#ifdef CONFIG_SOFTWARE_SUSPEND
> > +int page_mapped_by_current(struct page *);
> > +#else
> > +static inline int page_mapped_by_current(struct page *page) { return 0; }
> > +#endif /* CONFIG_SOFTWARE_SUSPEND */
> 
> I'd leave it undefined. That will prevent nasty surprise when someone
> tries to use it w/o CONFIG_SOFTWARE_SUSPEND set.

OK

> > @@ -251,6 +253,14 @@ int restore_special_mem(void)
> >  	return ret;
> >  }
> >  
> > +/* Represents a stacked allocated page to be used in the future */
> > +struct res_page {
> > +	struct res_page *next;
> > +	char padding[PAGE_SIZE - sizeof(void *)];
> > +};
> > +
> > +static struct res_page *page_list;
> > +
> >  static int pfn_is_nosave(unsigned long pfn)
> >  {
> >  	unsigned long nosave_begin_pfn = __pa(&__nosave_begin) >>
> > PAGE_SHIFT;
> 
> Is this part of some other patch?

No.  That's because I use a linked list of allocated pages that has been
defined in a slightly different way in
swsusp-use-less-memory-during-resume.patch,
so I just redefine and reuse it.

> > @@ -490,6 +613,11 @@ void swsusp_free(void)
> >  	buffer = NULL;
> >  }
> >  
> > +void swsusp_free(void)
> > +{
> > +	free_image();
> > +	restore_active_inactive_lists();
> > +}
> 
> This still scares me. Nice test would be to
> save/restore_active_inactive_lists repeatedly in a loop ... on running
> system ... aha, but it probably can't work outside of refrigerator?

restore_active_inactive_lists() only does something if suspend fails
(otherwise the lists active_pages and inactive_pages are empty).

The suspend failure may be simulated by running swapoff and suspendig
with "echo disk > /sys/power/state".  However it wouldn't be interesting to
do this in a loop because it will make the same set of pages go out of the
LRU lists and back in a round robin fashion.

I did the the following test:
- ran swapoff
- tried to suspend with "echo disk > /sys/power/state" (which failed,
obviously)
- checked the numbers of active/inactive pages in /proc/meminfo
- ran swapon
- set image_size to 0
- suspended with "echo disk > /sys/power/state" to verify if the right number
of pages would be reclaimed
- resumed

at it worked, apparently.

> >  	case SNAPSHOT_UNFREEZE:
> >  		if (!data->frozen)
> >  			break;
> > +		if (data->ready) {
> > +			error = -EPERM;
> > +			break;
> > +		}
> >  		down(&pm_sem);
> >  		thaw_processes();
> >  		enable_nonboot_cpus();
> 
> Error from UNFREEZE is not nice:
> 
> Unfreeze:
>         unfreeze(snapshot_fd);
>         return error;
> }
> ... we don't handle it.

It will only occur if we don't call free_snapshot() before unfreeze() which is
a bug anyway.  We prevent it from occuring by always calling free_snapshot()
before unfreeze(). :-)

> OTOH I guess it will be all fixed on exit()?

Well, sort of, but I think the LRU lists always have to be restored before
thawing processes (eg. kswapd).

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25  8:26     ` Rafael J. Wysocki
@ 2006-04-25 10:04       ` Pavel Machek
  -1 siblings, 0 replies; 53+ messages in thread
From: Pavel Machek @ 2006-04-25 10:04 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Linux PM, LKML, Nigel Cunningham

Hi!

> > Okay, so it can be done, and patch does not look too bad. It still
> > scares me. Is 800MB image more responsive than 500MB after resume?
> 
> Yes, it is, slightly, but I think 800 meg images are impractical for
> performance reasons (like IMO everything above 500 meg with the current
> hardware).  However this means we can save 80% of RAM with the patch
> and that should be 400 meg instead of 250 on a 500 meg machine, or
> 200 meg instead of 125 on a 250 meg machine.

Could we get few people trying it on such small machines to see if it
is really that noticeable?

> > Is benefit worth it?
> 
> Well, that depends.  I think for boxes with 1 GB of RAM or more it's just
> unnecessary (as of today, but this may change if faster disks are available).
> On boxes with 512 MB of RAM or less it may change a lot as far as the
> responsiveness after resume is concerned.
> 
> Anyway do you think it may go into -mm (unless Andrew shoots it down,
> that is ;-))?

I'd really like to hear that it helps someone before going to
-mm. It looks clean enough but still it is 300 lines... 

									Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-25 10:04       ` Pavel Machek
  0 siblings, 0 replies; 53+ messages in thread
From: Pavel Machek @ 2006-04-25 10:04 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Nigel Cunningham, Linux PM, LKML

[-- Attachment #1: Type: text/plain, Size: 1152 bytes --]

Hi!

> > Okay, so it can be done, and patch does not look too bad. It still
> > scares me. Is 800MB image more responsive than 500MB after resume?
> 
> Yes, it is, slightly, but I think 800 meg images are impractical for
> performance reasons (like IMO everything above 500 meg with the current
> hardware).  However this means we can save 80% of RAM with the patch
> and that should be 400 meg instead of 250 on a 500 meg machine, or
> 200 meg instead of 125 on a 250 meg machine.

Could we get few people trying it on such small machines to see if it
is really that noticeable?

> > Is benefit worth it?
> 
> Well, that depends.  I think for boxes with 1 GB of RAM or more it's just
> unnecessary (as of today, but this may change if faster disks are available).
> On boxes with 512 MB of RAM or less it may change a lot as far as the
> responsiveness after resume is concerned.
> 
> Anyway do you think it may go into -mm (unless Andrew shoots it down,
> that is ;-))?

I'd really like to hear that it helps someone before going to
-mm. It looks clean enough but still it is 300 lines... 

									Pavel
-- 
Thanks for all the (sleeping) penguins.

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-24 21:55 [RFC][PATCH] swsusp: support creating bigger images Rafael J. Wysocki
  2006-04-24 22:16 ` Pavel Machek
@ 2006-04-25 10:28 ` Nick Piggin
  2006-04-25 15:39   ` Rafael J. Wysocki
  1 sibling, 1 reply; 53+ messages in thread
From: Nick Piggin @ 2006-04-25 10:28 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Pavel Machek, Linux PM, LKML, Nigel Cunningham

pageRafael J. Wysocki wrote:

> Index: linux-2.6.17-rc1-mm3/mm/rmap.c
> ===================================================================
> --- linux-2.6.17-rc1-mm3.orig/mm/rmap.c	2006-04-22 10:34:33.000000000 +0200
> +++ linux-2.6.17-rc1-mm3/mm/rmap.c	2006-04-23 00:41:19.000000000 +0200
> @@ -851,3 +851,22 @@ int try_to_unmap(struct page *page, int 
>  	return ret;
>  }
>  
> +#ifdef CONFIG_SOFTWARE_SUSPEND
> +int page_mapped_by_current(struct page *page)
> +{
> +	struct vm_area_struct *vma;
> +	int ret = 0;
> +
> +	spin_lock(&current->mm->page_table_lock);
> +
> +	for (vma = current->mm->mmap; vma; vma = vma->vm_next)
> +		if (page_address_in_vma(page, vma) != -EFAULT) {
> +			ret = 1;
> +			break;
> +		}
> +
> +	spin_unlock(&current->mm->page_table_lock);
> +
> +	return ret;
> +}
> +#endif /* CONFIG_SOFTWARE_SUSPEND */

Why not just pass in the task struct? It might make the interface more
useful elsewhere.

> Index: linux-2.6.17-rc1-mm3/include/linux/rmap.h
> ===================================================================
> --- linux-2.6.17-rc1-mm3.orig/include/linux/rmap.h	2006-04-22 10:34:33.000000000 +0200
> +++ linux-2.6.17-rc1-mm3/include/linux/rmap.h	2006-04-22 10:34:45.000000000 +0200
> @@ -104,6 +104,12 @@ pte_t *page_check_address(struct page *,
>   */
>  unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
>  
> +#ifdef CONFIG_SOFTWARE_SUSPEND
> +int page_mapped_by_current(struct page *);
> +#else
> +static inline int page_mapped_by_current(struct page *page) { return 0; }
> +#endif /* CONFIG_SOFTWARE_SUSPEND */

This just wrong. What if some random unrelated code calls
page_mapped_by_current? You should only fold inlines if their result
folds accordingly.

For now just leave it out completely if !CONFIG_SOFTWARE_SUSPEND.

  > -unsigned int count_data_pages(void)
> +/**
> + *	need_to_copy - determine if a page needs to be copied before saving.
> + *	Returns false if the page can be saved without copying.
> + */
> +
> +static int need_to_copy(struct page *page)
> +{
> +	if (!PageLRU(page) || PageCompound(page))
> +		return 1;
> +	if (page_mapped(page))
> +		return page_mapped_by_current(page);
> +
> +	return 1;
> +}

I'd much rather VM internal type stuff get moved *out* of kernel/power :(

It needs more comments too. Also, how important is it for the page to be
off the LRU? What if kswapd has taken it off the LRU temporarily (or is it
guanteed not to at this point)? What if it is in an lru_add pagevec?

> +
> +unsigned int count_data_pages(unsigned int *total)
>  {
>  	struct zone *zone;
>  	unsigned long zone_pfn;
> -	unsigned int n = 0;
> +	unsigned int n, m;
>  
> +	n = m = 0;
>  	for_each_zone (zone) {
>  		if (is_highmem(zone))
>  			continue;
>  		mark_free_pages(zone);
> -		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn)
> -			n += saveable(zone, &zone_pfn);
> +		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
> +			unsigned long pfn = zone_pfn + zone->zone_start_pfn;
> +
> +			if (saveable(pfn)) {
> +				n++;
> +				if (!need_to_copy(pfn_to_page(pfn)))
> +					m++;
> +			}
> +		}
>  	}
> -	return n;
> +	if (total)
> +		*total = n;
> +
> +	return n - m;
>  }

Maybe we could add some commenting to these functions while changing them
around.

> +static LIST_HEAD(active_pages);
> +static LIST_HEAD(inactive_pages);
> +
> +/**
> + *	protect_data_pages - move data pages that need to be protected from
> + *	being reclaimed out of their respective LRU lists
> + */
> +
> +static void protect_data_pages(struct pbe *pblist)
> +{
> +	struct pbe *p;
> +
> +	for_each_pbe (p, pblist)
> +		if (p->address == p->orig_address) {
> +			struct page *page = virt_to_page(p->address);
> +			struct zone *zone = page_zone(page);
> +
> +			spin_lock(&zone->lru_lock);
> +			if (PageActive(page)) {
> +				del_page_from_active_list(zone, page);
> +				list_add(&page->lru, &active_pages);
> +			} else {
> +				del_page_from_inactive_list(zone, page);
> +				list_add(&page->lru, &inactive_pages);
> +			}
> +			spin_unlock(&zone->lru_lock);
> +			ClearPageLRU(page);
> +		}
> +}

a) How do you know the page is on the LRU?

b) Why are you clearing PageLRU outside the spinlock?

c) This code isn't going outside mm/ even if it is correct.

d) swsusp seems to be quite under documented as far as race conditions
    vs the rest of the kernel go. In some places only a single CPU is
    running with interrupts off, in other places all processes have
    frozen, etc. etc. So it is hard to even know if synchronisation is
    correct or not.

    Suggest more documentation goes into this.

> +
> +/**
> + *	restore_active_inactive_lists - if suspend fails, the pages protected
> + *	with protect_data_pages() have to be moved back to their respective
> + *	lists
> + */
> +
> +static void restore_active_inactive_lists(void)

Ditto, mostly.

> +static int enough_free_mem(unsigned int nr_pages, unsigned int copy_pages)
>  {
>  	struct zone *zone;
> -	unsigned int n = 0;
> +	long n = 0;
> +
> +	pr_debug("swsusp: pages needed: %u + %lu + %lu, free: %u\n",
> +		 copy_pages,
> +		 (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE,
> +		 EXTRA_PAGES, nr_free_pages());
>  
>  	for_each_zone (zone)
> -		if (!is_highmem(zone))
> +		if (!is_highmem(zone) && populated_zone(zone)) {
>  			n += zone->free_pages;
> -	pr_debug("swsusp: available memory: %u pages\n", n);
> -	return n > (nr_pages + PAGES_FOR_IO +
> -		(nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
> -}
> -
> -static int alloc_data_pages(struct pbe *pblist, gfp_t gfp_mask, int safe_needed)
> -{
> -	struct pbe *p;
> +			n -= zone->lowmem_reserve[ZONE_NORMAL];

Comment for this? Eg. why are you taking the ZONE_NORMAL watermark
in particular?

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 10:04       ` Pavel Machek
@ 2006-04-25 10:31         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-25 10:31 UTC (permalink / raw
  To: Pavel Machek; +Cc: Linux PM, LKML, Nigel Cunningham

Hi,

On Tuesday 25 April 2006 12:04, Pavel Machek wrote:
> > > Okay, so it can be done, and patch does not look too bad. It still
> > > scares me. Is 800MB image more responsive than 500MB after resume?
> > 
> > Yes, it is, slightly, but I think 800 meg images are impractical for
> > performance reasons (like IMO everything above 500 meg with the current
> > hardware).  However this means we can save 80% of RAM with the patch
> > and that should be 400 meg instead of 250 on a 500 meg machine, or
> > 200 meg instead of 125 on a 250 meg machine.
> 
> Could we get few people trying it on such small machines to see if it
> is really that noticeable?

OK, I'll try to run some tests on a machine with smaller RAM (and slower CPU).

> > > Is benefit worth it?
> > 
> > Well, that depends.  I think for boxes with 1 GB of RAM or more it's just
> > unnecessary (as of today, but this may change if faster disks are available).
> > On boxes with 512 MB of RAM or less it may change a lot as far as the
> > responsiveness after resume is concerned.
> > 
> > Anyway do you think it may go into -mm (unless Andrew shoots it down,
> > that is ;-))?
> 
> I'd really like to hear that it helps someone before going to
> -mm. It looks clean enough but still it is 300 lines... 

Oh, it's not that bad.  Adds ~240 lines and removes 75. :-)

Greetings,
Rafael


> 
> 									Pavel

-- 
Beer is proof that God loves us and wants us to be happy - Benjamin Franklin

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-25 10:31         ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-25 10:31 UTC (permalink / raw
  To: Pavel Machek; +Cc: Nigel Cunningham, Linux PM, LKML

Hi,

On Tuesday 25 April 2006 12:04, Pavel Machek wrote:
> > > Okay, so it can be done, and patch does not look too bad. It still
> > > scares me. Is 800MB image more responsive than 500MB after resume?
> > 
> > Yes, it is, slightly, but I think 800 meg images are impractical for
> > performance reasons (like IMO everything above 500 meg with the current
> > hardware).  However this means we can save 80% of RAM with the patch
> > and that should be 400 meg instead of 250 on a 500 meg machine, or
> > 200 meg instead of 125 on a 250 meg machine.
> 
> Could we get few people trying it on such small machines to see if it
> is really that noticeable?

OK, I'll try to run some tests on a machine with smaller RAM (and slower CPU).

> > > Is benefit worth it?
> > 
> > Well, that depends.  I think for boxes with 1 GB of RAM or more it's just
> > unnecessary (as of today, but this may change if faster disks are available).
> > On boxes with 512 MB of RAM or less it may change a lot as far as the
> > responsiveness after resume is concerned.
> > 
> > Anyway do you think it may go into -mm (unless Andrew shoots it down,
> > that is ;-))?
> 
> I'd really like to hear that it helps someone before going to
> -mm. It looks clean enough but still it is 300 lines... 

Oh, it's not that bad.  Adds ~240 lines and removes 75. :-)

Greetings,
Rafael


> 
> 									Pavel

-- 
Beer is proof that God loves us and wants us to be happy - Benjamin Franklin

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 10:28 ` Nick Piggin
@ 2006-04-25 15:39   ` Rafael J. Wysocki
  2006-04-25 20:32     ` Pavel Machek
  2006-04-27 19:53       ` Rafael J. Wysocki
  0 siblings, 2 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-25 15:39 UTC (permalink / raw
  To: Nick Piggin; +Cc: Pavel Machek, Linux PM, LKML, Nigel Cunningham

Hi,

First, thanks for reviewing.

On Tuesday 25 April 2006 12:28, Nick Piggin wrote:
> pageRafael J. Wysocki wrote:
> 
> > Index: linux-2.6.17-rc1-mm3/mm/rmap.c
> > ===================================================================
> > --- linux-2.6.17-rc1-mm3.orig/mm/rmap.c	2006-04-22 10:34:33.000000000 +0200
> > +++ linux-2.6.17-rc1-mm3/mm/rmap.c	2006-04-23 00:41:19.000000000 +0200
> > @@ -851,3 +851,22 @@ int try_to_unmap(struct page *page, int 
> >  	return ret;
> >  }
> >  
> > +#ifdef CONFIG_SOFTWARE_SUSPEND
> > +int page_mapped_by_current(struct page *page)
> > +{
> > +	struct vm_area_struct *vma;
> > +	int ret = 0;
> > +
> > +	spin_lock(&current->mm->page_table_lock);
> > +
> > +	for (vma = current->mm->mmap; vma; vma = vma->vm_next)
> > +		if (page_address_in_vma(page, vma) != -EFAULT) {
> > +			ret = 1;
> > +			break;
> > +		}
> > +
> > +	spin_unlock(&current->mm->page_table_lock);
> > +
> > +	return ret;
> > +}
> > +#endif /* CONFIG_SOFTWARE_SUSPEND */
> 
> Why not just pass in the task struct? It might make the interface more
> useful elsewhere.

Good idea, I'll do that.

> > Index: linux-2.6.17-rc1-mm3/include/linux/rmap.h
> > ===================================================================
> > --- linux-2.6.17-rc1-mm3.orig/include/linux/rmap.h	2006-04-22 10:34:33.000000000 +0200
> > +++ linux-2.6.17-rc1-mm3/include/linux/rmap.h	2006-04-22 10:34:45.000000000 +0200
> > @@ -104,6 +104,12 @@ pte_t *page_check_address(struct page *,
> >   */
> >  unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
> >  
> > +#ifdef CONFIG_SOFTWARE_SUSPEND
> > +int page_mapped_by_current(struct page *);
> > +#else
> > +static inline int page_mapped_by_current(struct page *page) { return 0; }
> > +#endif /* CONFIG_SOFTWARE_SUSPEND */
> 
> This just wrong. What if some random unrelated code calls
> page_mapped_by_current? You should only fold inlines if their result
> folds accordingly.
> 
> For now just leave it out completely if !CONFIG_SOFTWARE_SUSPEND.

OK

>   > -unsigned int count_data_pages(void)
> > +/**
> > + *	need_to_copy - determine if a page needs to be copied before saving.
> > + *	Returns false if the page can be saved without copying.
> > + */
> > +
> > +static int need_to_copy(struct page *page)
> > +{
> > +	if (!PageLRU(page) || PageCompound(page))
> > +		return 1;
> > +	if (page_mapped(page))
> > +		return page_mapped_by_current(page);
> > +
> > +	return 1;
> > +}
> 
> I'd much rather VM internal type stuff get moved *out* of kernel/power :(

Well, I kind of agree, but I don't know where to place it under mm/.

> It needs more comments too. Also, how important is it for the page to be
> off the LRU?

Hm, I'm not sure if that's what you're asking about, but the pages off the LRU
are handled in a usual way, ie. copied when snapshotting the system.  The
pages _on_ the LRU may be included in the snapshot image without
copying, but I require them additionally to be (a) mapped by someone and
(b) not mapped by the current task.

> What if kswapd has taken it off the LRU temporarily (or is it 
> guanteed not to at this point)? What if it is in an lru_add pagevec?

At this point kswapd is frozen as well as everything else except for the
current task (ie. us).  IOW no one is expected to process the LRU lists.

> > +
> > +unsigned int count_data_pages(unsigned int *total)
> >  {
> >  	struct zone *zone;
> >  	unsigned long zone_pfn;
> > -	unsigned int n = 0;
> > +	unsigned int n, m;
> >  
> > +	n = m = 0;
> >  	for_each_zone (zone) {
> >  		if (is_highmem(zone))
> >  			continue;
> >  		mark_free_pages(zone);
> > -		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn)
> > -			n += saveable(zone, &zone_pfn);
> > +		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
> > +			unsigned long pfn = zone_pfn + zone->zone_start_pfn;
> > +
> > +			if (saveable(pfn)) {
> > +				n++;
> > +				if (!need_to_copy(pfn_to_page(pfn)))
> > +					m++;
> > +			}
> > +		}
> >  	}
> > -	return n;
> > +	if (total)
> > +		*total = n;
> > +
> > +	return n - m;
> >  }
> 
> Maybe we could add some commenting to these functions while changing them
> around.

OK, I will.

> > +static LIST_HEAD(active_pages);
> > +static LIST_HEAD(inactive_pages);
> > +
> > +/**
> > + *	protect_data_pages - move data pages that need to be protected from
> > + *	being reclaimed out of their respective LRU lists
> > + */
> > +
> > +static void protect_data_pages(struct pbe *pblist)
> > +{
> > +	struct pbe *p;
> > +
> > +	for_each_pbe (p, pblist)
> > +		if (p->address == p->orig_address) {
> > +			struct page *page = virt_to_page(p->address);
> > +			struct zone *zone = page_zone(page);
> > +
> > +			spin_lock(&zone->lru_lock);
> > +			if (PageActive(page)) {
> > +				del_page_from_active_list(zone, page);
> > +				list_add(&page->lru, &active_pages);
> > +			} else {
> > +				del_page_from_inactive_list(zone, page);
> > +				list_add(&page->lru, &inactive_pages);
> > +			}
> > +			spin_unlock(&zone->lru_lock);
> > +			ClearPageLRU(page);
> > +		}
> > +}
> 
> a) How do you know the page is on the LRU?

Because in copy_data_pages() p->address is only set to p->orig_address for
pages that are on the LRU.  No one can change the state of the pages in
between because everyone else is frozen and we have local IRQs disabled.

> b) Why are you clearing PageLRU outside the spinlock?

Well, good question. ;-)  Moreover it seems I don't need to acquire the
spinlock at all, because this is done on one CPU with IRQs disabled and the
other tasks frozen.

> c) This code isn't going outside mm/ even if it is correct.

OK, so could you please suggest me where to put it (or a corrected version)
under mm/?

> d) swsusp seems to be quite under documented as far as race conditions
>     vs the rest of the kernel go. In some places only a single CPU is
>     running with interrupts off, in other places all processes have
>     frozen, etc. etc. So it is hard to even know if synchronisation is
>     correct or not.
> 
>     Suggest more documentation goes into this.

OK

> > +
> > +/**
> > + *	restore_active_inactive_lists - if suspend fails, the pages protected
> > + *	with protect_data_pages() have to be moved back to their respective
> > + *	lists
> > + */
> > +
> > +static void restore_active_inactive_lists(void)
> 
> Ditto, mostly.
> 
> > +static int enough_free_mem(unsigned int nr_pages, unsigned int copy_pages)
> >  {
> >  	struct zone *zone;
> > -	unsigned int n = 0;
> > +	long n = 0;
> > +
> > +	pr_debug("swsusp: pages needed: %u + %lu + %lu, free: %u\n",
> > +		 copy_pages,
> > +		 (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE,
> > +		 EXTRA_PAGES, nr_free_pages());
> >  
> >  	for_each_zone (zone)
> > -		if (!is_highmem(zone))
> > +		if (!is_highmem(zone) && populated_zone(zone)) {
> >  			n += zone->free_pages;
> > -	pr_debug("swsusp: available memory: %u pages\n", n);
> > -	return n > (nr_pages + PAGES_FOR_IO +
> > -		(nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
> > -}
> > -
> > -static int alloc_data_pages(struct pbe *pblist, gfp_t gfp_mask, int safe_needed)
> > -{
> > -	struct pbe *p;
> > +			n -= zone->lowmem_reserve[ZONE_NORMAL];
> 
> Comment for this? Eg. why are you taking the ZONE_NORMAL watermark
> in particular?

This is to sync enough_free_mem() with swsusp_shrink_memory() (in swsusp.c).
[We allocate with GFP_ATOMIC ie. from the normal zone and the lowmem reserves
in the lower zones are effectively unavailable to us because of the check in
zone_watermark_ok().]

Thanks a lot for the comments.  I'll do my best to follow your suggestions
in the next version of the patch.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 15:39   ` Rafael J. Wysocki
@ 2006-04-25 20:32     ` Pavel Machek
  2006-04-25 21:12         ` Rafael J. Wysocki
  2006-04-27 19:53       ` Rafael J. Wysocki
  1 sibling, 1 reply; 53+ messages in thread
From: Pavel Machek @ 2006-04-25 20:32 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Nick Piggin, Linux PM, LKML, Nigel Cunningham

Hi!

> >   > -unsigned int count_data_pages(void)
> > > +/**
> > > + *	need_to_copy - determine if a page needs to be copied before saving.
> > > + *	Returns false if the page can be saved without copying.
> > > + */
> > > +
> > > +static int need_to_copy(struct page *page)
> > > +{
> > > +	if (!PageLRU(page) || PageCompound(page))
> > > +		return 1;
> > > +	if (page_mapped(page))
> > > +		return page_mapped_by_current(page);
> > > +
> > > +	return 1;
> > > +}
> > 
> > I'd much rather VM internal type stuff get moved *out* of kernel/power :(
> 
> Well, I kind of agree, but I don't know where to place it under mm/.
> 
> > It needs more comments too. Also, how important is it for the page to be
> > off the LRU?
> 
> Hm, I'm not sure if that's what you're asking about, but the pages off the LRU
> are handled in a usual way, ie. copied when snapshotting the system.  The
> pages _on_ the LRU may be included in the snapshot image without
> copying, but I require them additionally to be (a) mapped by someone and
> (b) not mapped by the current task.

Why do you _want_ them mapped by someone?

> > b) Why are you clearing PageLRU outside the spinlock?
> 
> Well, good question. ;-)  Moreover it seems I don't need to acquire the
> spinlock at all, because this is done on one CPU with IRQs disabled and the
> other tasks frozen.

Well, it is probably better to still take the spinlock. That way, if
something goes wrong we get deadlock, not anything worse.
								Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 20:32     ` Pavel Machek
@ 2006-04-25 21:12         ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-25 21:12 UTC (permalink / raw
  To: Pavel Machek; +Cc: Nick Piggin, Linux PM, LKML, Nigel Cunningham

Hi,

On Tuesday 25 April 2006 22:32, Pavel Machek wrote:
> > >   > -unsigned int count_data_pages(void)
> > > > +/**
> > > > + *	need_to_copy - determine if a page needs to be copied before saving.
> > > > + *	Returns false if the page can be saved without copying.
> > > > + */
> > > > +
> > > > +static int need_to_copy(struct page *page)
> > > > +{
> > > > +	if (!PageLRU(page) || PageCompound(page))
> > > > +		return 1;
> > > > +	if (page_mapped(page))
> > > > +		return page_mapped_by_current(page);
> > > > +
> > > > +	return 1;
> > > > +}
> > > 
> > > I'd much rather VM internal type stuff get moved *out* of kernel/power :(
> > 
> > Well, I kind of agree, but I don't know where to place it under mm/.
> > 
> > > It needs more comments too. Also, how important is it for the page to be
> > > off the LRU?
> > 
> > Hm, I'm not sure if that's what you're asking about, but the pages off the LRU
> > are handled in a usual way, ie. copied when snapshotting the system.  The
> > pages _on_ the LRU may be included in the snapshot image without
> > copying, but I require them additionally to be (a) mapped by someone and
> > (b) not mapped by the current task.
> 
> Why do you _want_ them mapped by someone?

Because this means they belong to a task that is frozen and won't touch them
(of course unless it's us).  The kernel has no reason to access them either
(even after we resume devices) except for reclaiming, but that's handled
explicitly.  Thus it's safe to include them in the image without copying.

As I said before, I think the page cache pages may be treated this way too.
It probably applies to all of the LRU pages, but there may be some corner
cases.  The mapped pages are just easy to single out.

> > > b) Why are you clearing PageLRU outside the spinlock?
> > 
> > Well, good question. ;-)  Moreover it seems I don't need to acquire the
> > spinlock at all, because this is done on one CPU with IRQs disabled and the
> > other tasks frozen.
> 
> Well, it is probably better to still take the spinlock. That way, if
> something goes wrong we get deadlock, not anything worse.

OK, so I'll just clear/set the LRU flag under the spinlock.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-25 21:12         ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-25 21:12 UTC (permalink / raw
  To: Pavel Machek; +Cc: Nigel Cunningham, Nick Piggin, LKML, Linux PM

Hi,

On Tuesday 25 April 2006 22:32, Pavel Machek wrote:
> > >   > -unsigned int count_data_pages(void)
> > > > +/**
> > > > + *	need_to_copy - determine if a page needs to be copied before saving.
> > > > + *	Returns false if the page can be saved without copying.
> > > > + */
> > > > +
> > > > +static int need_to_copy(struct page *page)
> > > > +{
> > > > +	if (!PageLRU(page) || PageCompound(page))
> > > > +		return 1;
> > > > +	if (page_mapped(page))
> > > > +		return page_mapped_by_current(page);
> > > > +
> > > > +	return 1;
> > > > +}
> > > 
> > > I'd much rather VM internal type stuff get moved *out* of kernel/power :(
> > 
> > Well, I kind of agree, but I don't know where to place it under mm/.
> > 
> > > It needs more comments too. Also, how important is it for the page to be
> > > off the LRU?
> > 
> > Hm, I'm not sure if that's what you're asking about, but the pages off the LRU
> > are handled in a usual way, ie. copied when snapshotting the system.  The
> > pages _on_ the LRU may be included in the snapshot image without
> > copying, but I require them additionally to be (a) mapped by someone and
> > (b) not mapped by the current task.
> 
> Why do you _want_ them mapped by someone?

Because this means they belong to a task that is frozen and won't touch them
(of course unless it's us).  The kernel has no reason to access them either
(even after we resume devices) except for reclaiming, but that's handled
explicitly.  Thus it's safe to include them in the image without copying.

As I said before, I think the page cache pages may be treated this way too.
It probably applies to all of the LRU pages, but there may be some corner
cases.  The mapped pages are just easy to single out.

> > > b) Why are you clearing PageLRU outside the spinlock?
> > 
> > Well, good question. ;-)  Moreover it seems I don't need to acquire the
> > spinlock at all, because this is done on one CPU with IRQs disabled and the
> > other tasks frozen.
> 
> Well, it is probably better to still take the spinlock. That way, if
> something goes wrong we get deadlock, not anything worse.

OK, so I'll just clear/set the LRU flag under the spinlock.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 21:12         ` Rafael J. Wysocki
@ 2006-04-25 21:18           ` Nigel Cunningham
  -1 siblings, 0 replies; 53+ messages in thread
From: Nigel Cunningham @ 2006-04-25 21:18 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Pavel Machek, Nick Piggin, Linux PM, LKML

[-- Attachment #1: Type: text/plain, Size: 2424 bytes --]

Hi.

On Wednesday 26 April 2006 07:12, Rafael J. Wysocki wrote:
> Hi,
>
> On Tuesday 25 April 2006 22:32, Pavel Machek wrote:
> > > >   > -unsigned int count_data_pages(void)
> > > > >
> > > > > +/**
> > > > > + *	need_to_copy - determine if a page needs to be copied before
> > > > > saving. + *	Returns false if the page can be saved without copying.
> > > > > + */
> > > > > +
> > > > > +static int need_to_copy(struct page *page)
> > > > > +{
> > > > > +	if (!PageLRU(page) || PageCompound(page))
> > > > > +		return 1;
> > > > > +	if (page_mapped(page))
> > > > > +		return page_mapped_by_current(page);
> > > > > +
> > > > > +	return 1;
> > > > > +}
> > > >
> > > > I'd much rather VM internal type stuff get moved *out* of
> > > > kernel/power :(
> > >
> > > Well, I kind of agree, but I don't know where to place it under mm/.
> > >
> > > > It needs more comments too. Also, how important is it for the page to
> > > > be off the LRU?
> > >
> > > Hm, I'm not sure if that's what you're asking about, but the pages off
> > > the LRU are handled in a usual way, ie. copied when snapshotting the
> > > system.  The pages _on_ the LRU may be included in the snapshot image
> > > without copying, but I require them additionally to be (a) mapped by
> > > someone and (b) not mapped by the current task.
> >
> > Why do you _want_ them mapped by someone?
>
> Because this means they belong to a task that is frozen and won't touch
> them (of course unless it's us).  The kernel has no reason to access them
> either (even after we resume devices) except for reclaiming, but that's
> handled explicitly.  Thus it's safe to include them in the image without
> copying.
>
> As I said before, I think the page cache pages may be treated this way too.
> It probably applies to all of the LRU pages, but there may be some corner
> cases.  The mapped pages are just easy to single out.

It does apply to all of the LRU pages. This is what I've been doing for years 
now. The only corner case I've come across is XFS. It still wants to write 
data even when there's nothing to do and it's threads are frozen (IIRC - 
haven't looked at it for a while). I got around that by freezing bdevs when 
freezing processes.

Regards,

Nigel

-- 
See our web page for Howtos, FAQs, the Wiki and mailing list info.
http://www.suspend2.net                IRC: #suspend2 on Freenode

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-25 21:18           ` Nigel Cunningham
  0 siblings, 0 replies; 53+ messages in thread
From: Nigel Cunningham @ 2006-04-25 21:18 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Nick Piggin, LKML, Pavel Machek, Linux PM


[-- Attachment #1.1: Type: text/plain, Size: 2424 bytes --]

Hi.

On Wednesday 26 April 2006 07:12, Rafael J. Wysocki wrote:
> Hi,
>
> On Tuesday 25 April 2006 22:32, Pavel Machek wrote:
> > > >   > -unsigned int count_data_pages(void)
> > > > >
> > > > > +/**
> > > > > + *	need_to_copy - determine if a page needs to be copied before
> > > > > saving. + *	Returns false if the page can be saved without copying.
> > > > > + */
> > > > > +
> > > > > +static int need_to_copy(struct page *page)
> > > > > +{
> > > > > +	if (!PageLRU(page) || PageCompound(page))
> > > > > +		return 1;
> > > > > +	if (page_mapped(page))
> > > > > +		return page_mapped_by_current(page);
> > > > > +
> > > > > +	return 1;
> > > > > +}
> > > >
> > > > I'd much rather VM internal type stuff get moved *out* of
> > > > kernel/power :(
> > >
> > > Well, I kind of agree, but I don't know where to place it under mm/.
> > >
> > > > It needs more comments too. Also, how important is it for the page to
> > > > be off the LRU?
> > >
> > > Hm, I'm not sure if that's what you're asking about, but the pages off
> > > the LRU are handled in a usual way, ie. copied when snapshotting the
> > > system.  The pages _on_ the LRU may be included in the snapshot image
> > > without copying, but I require them additionally to be (a) mapped by
> > > someone and (b) not mapped by the current task.
> >
> > Why do you _want_ them mapped by someone?
>
> Because this means they belong to a task that is frozen and won't touch
> them (of course unless it's us).  The kernel has no reason to access them
> either (even after we resume devices) except for reclaiming, but that's
> handled explicitly.  Thus it's safe to include them in the image without
> copying.
>
> As I said before, I think the page cache pages may be treated this way too.
> It probably applies to all of the LRU pages, but there may be some corner
> cases.  The mapped pages are just easy to single out.

It does apply to all of the LRU pages. This is what I've been doing for years 
now. The only corner case I've come across is XFS. It still wants to write 
data even when there's nothing to do and it's threads are frozen (IIRC - 
haven't looked at it for a while). I got around that by freezing bdevs when 
freezing processes.

Regards,

Nigel

-- 
See our web page for Howtos, FAQs, the Wiki and mailing list info.
http://www.suspend2.net                IRC: #suspend2 on Freenode

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 21:18           ` Nigel Cunningham
@ 2006-04-25 22:21             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-25 22:21 UTC (permalink / raw
  To: Nigel Cunningham; +Cc: Pavel Machek, Nick Piggin, Linux PM, LKML

Hi,

On Tuesday 25 April 2006 23:18, Nigel Cunningham wrote:
> On Wednesday 26 April 2006 07:12, Rafael J. Wysocki wrote:
> > On Tuesday 25 April 2006 22:32, Pavel Machek wrote:
> > > > >   > -unsigned int count_data_pages(void)
> > > > > >
> > > > > > +/**
> > > > > > + *	need_to_copy - determine if a page needs to be copied before
> > > > > > saving. + *	Returns false if the page can be saved without copying.
> > > > > > + */
> > > > > > +
> > > > > > +static int need_to_copy(struct page *page)
> > > > > > +{
> > > > > > +	if (!PageLRU(page) || PageCompound(page))
> > > > > > +		return 1;
> > > > > > +	if (page_mapped(page))
> > > > > > +		return page_mapped_by_current(page);
> > > > > > +
> > > > > > +	return 1;
> > > > > > +}
> > > > >
> > > > > I'd much rather VM internal type stuff get moved *out* of
> > > > > kernel/power :(
> > > >
> > > > Well, I kind of agree, but I don't know where to place it under mm/.
> > > >
> > > > > It needs more comments too. Also, how important is it for the page to
> > > > > be off the LRU?
> > > >
> > > > Hm, I'm not sure if that's what you're asking about, but the pages off
> > > > the LRU are handled in a usual way, ie. copied when snapshotting the
> > > > system.  The pages _on_ the LRU may be included in the snapshot image
> > > > without copying, but I require them additionally to be (a) mapped by
> > > > someone and (b) not mapped by the current task.
> > >
> > > Why do you _want_ them mapped by someone?
> >
> > Because this means they belong to a task that is frozen and won't touch
> > them (of course unless it's us).  The kernel has no reason to access them
> > either (even after we resume devices) except for reclaiming, but that's
> > handled explicitly.  Thus it's safe to include them in the image without
> > copying.
> >
> > As I said before, I think the page cache pages may be treated this way too.
> > It probably applies to all of the LRU pages, but there may be some corner
> > cases.  The mapped pages are just easy to single out.
> 
> It does apply to all of the LRU pages. This is what I've been doing for years 
> now. The only corner case I've come across is XFS. It still wants to write 
> data even when there's nothing to do and it's threads are frozen (IIRC - 
> haven't looked at it for a while). I got around that by freezing bdevs when 
> freezing processes.

This means if we freeze bdevs, we'll be able to save all of the LRU pages,
except for the pages mapped by the current task, without copying.  I think we
can try to do this, but we'll need a patch to freeze bdevs for this purpose. ;-)

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-25 22:21             ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-25 22:21 UTC (permalink / raw
  To: Nigel Cunningham; +Cc: Nick Piggin, LKML, Pavel Machek, Linux PM

[-- Attachment #1: Type: text/plain, Size: 2609 bytes --]

Hi,

On Tuesday 25 April 2006 23:18, Nigel Cunningham wrote:
> On Wednesday 26 April 2006 07:12, Rafael J. Wysocki wrote:
> > On Tuesday 25 April 2006 22:32, Pavel Machek wrote:
> > > > >   > -unsigned int count_data_pages(void)
> > > > > >
> > > > > > +/**
> > > > > > + *	need_to_copy - determine if a page needs to be copied before
> > > > > > saving. + *	Returns false if the page can be saved without copying.
> > > > > > + */
> > > > > > +
> > > > > > +static int need_to_copy(struct page *page)
> > > > > > +{
> > > > > > +	if (!PageLRU(page) || PageCompound(page))
> > > > > > +		return 1;
> > > > > > +	if (page_mapped(page))
> > > > > > +		return page_mapped_by_current(page);
> > > > > > +
> > > > > > +	return 1;
> > > > > > +}
> > > > >
> > > > > I'd much rather VM internal type stuff get moved *out* of
> > > > > kernel/power :(
> > > >
> > > > Well, I kind of agree, but I don't know where to place it under mm/.
> > > >
> > > > > It needs more comments too. Also, how important is it for the page to
> > > > > be off the LRU?
> > > >
> > > > Hm, I'm not sure if that's what you're asking about, but the pages off
> > > > the LRU are handled in a usual way, ie. copied when snapshotting the
> > > > system.  The pages _on_ the LRU may be included in the snapshot image
> > > > without copying, but I require them additionally to be (a) mapped by
> > > > someone and (b) not mapped by the current task.
> > >
> > > Why do you _want_ them mapped by someone?
> >
> > Because this means they belong to a task that is frozen and won't touch
> > them (of course unless it's us).  The kernel has no reason to access them
> > either (even after we resume devices) except for reclaiming, but that's
> > handled explicitly.  Thus it's safe to include them in the image without
> > copying.
> >
> > As I said before, I think the page cache pages may be treated this way too.
> > It probably applies to all of the LRU pages, but there may be some corner
> > cases.  The mapped pages are just easy to single out.
> 
> It does apply to all of the LRU pages. This is what I've been doing for years 
> now. The only corner case I've come across is XFS. It still wants to write 
> data even when there's nothing to do and it's threads are frozen (IIRC - 
> haven't looked at it for a while). I got around that by freezing bdevs when 
> freezing processes.

This means if we freeze bdevs, we'll be able to save all of the LRU pages,
except for the pages mapped by the current task, without copying.  I think we
can try to do this, but we'll need a patch to freeze bdevs for this purpose. ;-)

Greetings,
Rafael

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 22:21             ` Rafael J. Wysocki
@ 2006-04-25 22:24               ` Nigel Cunningham
  -1 siblings, 0 replies; 53+ messages in thread
From: Nigel Cunningham @ 2006-04-25 22:24 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Pavel Machek, Nick Piggin, Linux PM, LKML

[-- Attachment #1: Type: text/plain, Size: 3208 bytes --]

Hi.

On Wednesday 26 April 2006 08:21, Rafael J. Wysocki wrote:
> On Tuesday 25 April 2006 23:18, Nigel Cunningham wrote:
> > On Wednesday 26 April 2006 07:12, Rafael J. Wysocki wrote:
> > > On Tuesday 25 April 2006 22:32, Pavel Machek wrote:
> > > > > >   > -unsigned int count_data_pages(void)
> > > > > > >
> > > > > > > +/**
> > > > > > > + *	need_to_copy - determine if a page needs to be copied
> > > > > > > before saving. + *	Returns false if the page can be saved
> > > > > > > without copying. + */
> > > > > > > +
> > > > > > > +static int need_to_copy(struct page *page)
> > > > > > > +{
> > > > > > > +	if (!PageLRU(page) || PageCompound(page))
> > > > > > > +		return 1;
> > > > > > > +	if (page_mapped(page))
> > > > > > > +		return page_mapped_by_current(page);
> > > > > > > +
> > > > > > > +	return 1;
> > > > > > > +}
> > > > > >
> > > > > > I'd much rather VM internal type stuff get moved *out* of
> > > > > > kernel/power :(
> > > > >
> > > > > Well, I kind of agree, but I don't know where to place it under
> > > > > mm/.
> > > > >
> > > > > > It needs more comments too. Also, how important is it for the
> > > > > > page to be off the LRU?
> > > > >
> > > > > Hm, I'm not sure if that's what you're asking about, but the pages
> > > > > off the LRU are handled in a usual way, ie. copied when
> > > > > snapshotting the system.  The pages _on_ the LRU may be included in
> > > > > the snapshot image without copying, but I require them additionally
> > > > > to be (a) mapped by someone and (b) not mapped by the current task.
> > > >
> > > > Why do you _want_ them mapped by someone?
> > >
> > > Because this means they belong to a task that is frozen and won't touch
> > > them (of course unless it's us).  The kernel has no reason to access
> > > them either (even after we resume devices) except for reclaiming, but
> > > that's handled explicitly.  Thus it's safe to include them in the image
> > > without copying.
> > >
> > > As I said before, I think the page cache pages may be treated this way
> > > too. It probably applies to all of the LRU pages, but there may be some
> > > corner cases.  The mapped pages are just easy to single out.
> >
> > It does apply to all of the LRU pages. This is what I've been doing for
> > years now. The only corner case I've come across is XFS. It still wants
> > to write data even when there's nothing to do and it's threads are frozen
> > (IIRC - haven't looked at it for a while). I got around that by freezing
> > bdevs when freezing processes.
>
> This means if we freeze bdevs, we'll be able to save all of the LRU pages,
> except for the pages mapped by the current task, without copying.  I think
> we can try to do this, but we'll need a patch to freeze bdevs for this
> purpose. ;-)

Isn't that a coincidence? I just happen to have the code right here! Not in a 
patch form that you'd want though - do you have any refrigerator changes in 
mm at the moment, or can I use vanilla 2.6.17-rc2 process.c?

Regards,

Nigel
-- 
See our web page for Howtos, FAQs, the Wiki and mailing list info.
http://www.suspend2.net                IRC: #suspend2 on Freenode

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-25 22:24               ` Nigel Cunningham
  0 siblings, 0 replies; 53+ messages in thread
From: Nigel Cunningham @ 2006-04-25 22:24 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Nick Piggin, LKML, Pavel Machek, Linux PM


[-- Attachment #1.1: Type: text/plain, Size: 3208 bytes --]

Hi.

On Wednesday 26 April 2006 08:21, Rafael J. Wysocki wrote:
> On Tuesday 25 April 2006 23:18, Nigel Cunningham wrote:
> > On Wednesday 26 April 2006 07:12, Rafael J. Wysocki wrote:
> > > On Tuesday 25 April 2006 22:32, Pavel Machek wrote:
> > > > > >   > -unsigned int count_data_pages(void)
> > > > > > >
> > > > > > > +/**
> > > > > > > + *	need_to_copy - determine if a page needs to be copied
> > > > > > > before saving. + *	Returns false if the page can be saved
> > > > > > > without copying. + */
> > > > > > > +
> > > > > > > +static int need_to_copy(struct page *page)
> > > > > > > +{
> > > > > > > +	if (!PageLRU(page) || PageCompound(page))
> > > > > > > +		return 1;
> > > > > > > +	if (page_mapped(page))
> > > > > > > +		return page_mapped_by_current(page);
> > > > > > > +
> > > > > > > +	return 1;
> > > > > > > +}
> > > > > >
> > > > > > I'd much rather VM internal type stuff get moved *out* of
> > > > > > kernel/power :(
> > > > >
> > > > > Well, I kind of agree, but I don't know where to place it under
> > > > > mm/.
> > > > >
> > > > > > It needs more comments too. Also, how important is it for the
> > > > > > page to be off the LRU?
> > > > >
> > > > > Hm, I'm not sure if that's what you're asking about, but the pages
> > > > > off the LRU are handled in a usual way, ie. copied when
> > > > > snapshotting the system.  The pages _on_ the LRU may be included in
> > > > > the snapshot image without copying, but I require them additionally
> > > > > to be (a) mapped by someone and (b) not mapped by the current task.
> > > >
> > > > Why do you _want_ them mapped by someone?
> > >
> > > Because this means they belong to a task that is frozen and won't touch
> > > them (of course unless it's us).  The kernel has no reason to access
> > > them either (even after we resume devices) except for reclaiming, but
> > > that's handled explicitly.  Thus it's safe to include them in the image
> > > without copying.
> > >
> > > As I said before, I think the page cache pages may be treated this way
> > > too. It probably applies to all of the LRU pages, but there may be some
> > > corner cases.  The mapped pages are just easy to single out.
> >
> > It does apply to all of the LRU pages. This is what I've been doing for
> > years now. The only corner case I've come across is XFS. It still wants
> > to write data even when there's nothing to do and it's threads are frozen
> > (IIRC - haven't looked at it for a while). I got around that by freezing
> > bdevs when freezing processes.
>
> This means if we freeze bdevs, we'll be able to save all of the LRU pages,
> except for the pages mapped by the current task, without copying.  I think
> we can try to do this, but we'll need a patch to freeze bdevs for this
> purpose. ;-)

Isn't that a coincidence? I just happen to have the code right here! Not in a 
patch form that you'd want though - do you have any refrigerator changes in 
mm at the moment, or can I use vanilla 2.6.17-rc2 process.c?

Regards,

Nigel
-- 
See our web page for Howtos, FAQs, the Wiki and mailing list info.
http://www.suspend2.net                IRC: #suspend2 on Freenode

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 22:21             ` Rafael J. Wysocki
@ 2006-04-25 22:25               ` Pavel Machek
  -1 siblings, 0 replies; 53+ messages in thread
From: Pavel Machek @ 2006-04-25 22:25 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Nigel Cunningham, Nick Piggin, Linux PM, LKML

Hi!

> > It does apply to all of the LRU pages. This is what I've been doing for years 
> > now. The only corner case I've come across is XFS. It still wants to write 
> > data even when there's nothing to do and it's threads are frozen (IIRC - 
> > haven't looked at it for a while). I got around that by freezing bdevs when 
> > freezing processes.
> 
> This means if we freeze bdevs, we'll be able to save all of the LRU pages,
> except for the pages mapped by the current task, without copying.  I think we
> can try to do this, but we'll need a patch to freeze bdevs for this purpose. ;-)

...adding more dependencies to how vm/blockdevs work. I'd say current
code is complex enough...
							Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-25 22:25               ` Pavel Machek
  0 siblings, 0 replies; 53+ messages in thread
From: Pavel Machek @ 2006-04-25 22:25 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Nigel Cunningham, Nick Piggin, LKML, Linux PM

[-- Attachment #1: Type: text/plain, Size: 748 bytes --]

Hi!

> > It does apply to all of the LRU pages. This is what I've been doing for years 
> > now. The only corner case I've come across is XFS. It still wants to write 
> > data even when there's nothing to do and it's threads are frozen (IIRC - 
> > haven't looked at it for a while). I got around that by freezing bdevs when 
> > freezing processes.
> 
> This means if we freeze bdevs, we'll be able to save all of the LRU pages,
> except for the pages mapped by the current task, without copying.  I think we
> can try to do this, but we'll need a patch to freeze bdevs for this purpose. ;-)

...adding more dependencies to how vm/blockdevs work. I'd say current
code is complex enough...
							Pavel
-- 
Thanks for all the (sleeping) penguins.

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 22:25               ` Pavel Machek
@ 2006-04-25 22:30                 ` Nigel Cunningham
  -1 siblings, 0 replies; 53+ messages in thread
From: Nigel Cunningham @ 2006-04-25 22:30 UTC (permalink / raw
  To: Pavel Machek; +Cc: Rafael J. Wysocki, Nick Piggin, Linux PM, LKML

[-- Attachment #1: Type: text/plain, Size: 1014 bytes --]

HI.

On Wednesday 26 April 2006 08:25, Pavel Machek wrote:
> Hi!
>
> > > It does apply to all of the LRU pages. This is what I've been doing for
> > > years now. The only corner case I've come across is XFS. It still wants
> > > to write data even when there's nothing to do and it's threads are
> > > frozen (IIRC - haven't looked at it for a while). I got around that by
> > > freezing bdevs when freezing processes.
> >
> > This means if we freeze bdevs, we'll be able to save all of the LRU
> > pages, except for the pages mapped by the current task, without copying. 
> > I think we can try to do this, but we'll need a patch to freeze bdevs for
> > this purpose. ;-)
>
> ...adding more dependencies to how vm/blockdevs work. I'd say current
> code is complex enough...

You assume too much. This is just using existing code.

Regards,

Nigel

-- 
See our web page for Howtos, FAQs, the Wiki and mailing list info.
http://www.suspend2.net                IRC: #suspend2 on Freenode

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-25 22:30                 ` Nigel Cunningham
  0 siblings, 0 replies; 53+ messages in thread
From: Nigel Cunningham @ 2006-04-25 22:30 UTC (permalink / raw
  To: Pavel Machek; +Cc: Nick Piggin, LKML, Linux PM


[-- Attachment #1.1: Type: text/plain, Size: 1014 bytes --]

HI.

On Wednesday 26 April 2006 08:25, Pavel Machek wrote:
> Hi!
>
> > > It does apply to all of the LRU pages. This is what I've been doing for
> > > years now. The only corner case I've come across is XFS. It still wants
> > > to write data even when there's nothing to do and it's threads are
> > > frozen (IIRC - haven't looked at it for a while). I got around that by
> > > freezing bdevs when freezing processes.
> >
> > This means if we freeze bdevs, we'll be able to save all of the LRU
> > pages, except for the pages mapped by the current task, without copying. 
> > I think we can try to do this, but we'll need a patch to freeze bdevs for
> > this purpose. ;-)
>
> ...adding more dependencies to how vm/blockdevs work. I'd say current
> code is complex enough...

You assume too much. This is just using existing code.

Regards,

Nigel

-- 
See our web page for Howtos, FAQs, the Wiki and mailing list info.
http://www.suspend2.net                IRC: #suspend2 on Freenode

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 22:30                 ` Nigel Cunningham
@ 2006-04-25 22:36                   ` Pavel Machek
  -1 siblings, 0 replies; 53+ messages in thread
From: Pavel Machek @ 2006-04-25 22:36 UTC (permalink / raw
  To: Nigel Cunningham; +Cc: Rafael J. Wysocki, Nick Piggin, Linux PM, LKML

On St 26-04-06 08:30:51, Nigel Cunningham wrote:
> HI.
> 
> On Wednesday 26 April 2006 08:25, Pavel Machek wrote:
> > Hi!
> >
> > > > It does apply to all of the LRU pages. This is what I've been doing for
> > > > years now. The only corner case I've come across is XFS. It still wants
> > > > to write data even when there's nothing to do and it's threads are
> > > > frozen (IIRC - haven't looked at it for a while). I got around that by
> > > > freezing bdevs when freezing processes.
> > >
> > > This means if we freeze bdevs, we'll be able to save all of the LRU
> > > pages, except for the pages mapped by the current task, without copying. 
> > > I think we can try to do this, but we'll need a patch to freeze bdevs for
> > > this purpose. ;-)
> >
> > ...adding more dependencies to how vm/blockdevs work. I'd say current
> > code is complex enough...
> 
> You assume too much. This is just using existing code.

Let's see the patch, then.
								Pavel

-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-25 22:36                   ` Pavel Machek
  0 siblings, 0 replies; 53+ messages in thread
From: Pavel Machek @ 2006-04-25 22:36 UTC (permalink / raw
  To: Nigel Cunningham; +Cc: Nick Piggin, LKML, Linux PM

[-- Attachment #1: Type: text/plain, Size: 1007 bytes --]

On St 26-04-06 08:30:51, Nigel Cunningham wrote:
> HI.
> 
> On Wednesday 26 April 2006 08:25, Pavel Machek wrote:
> > Hi!
> >
> > > > It does apply to all of the LRU pages. This is what I've been doing for
> > > > years now. The only corner case I've come across is XFS. It still wants
> > > > to write data even when there's nothing to do and it's threads are
> > > > frozen (IIRC - haven't looked at it for a while). I got around that by
> > > > freezing bdevs when freezing processes.
> > >
> > > This means if we freeze bdevs, we'll be able to save all of the LRU
> > > pages, except for the pages mapped by the current task, without copying. 
> > > I think we can try to do this, but we'll need a patch to freeze bdevs for
> > > this purpose. ;-)
> >
> > ...adding more dependencies to how vm/blockdevs work. I'd say current
> > code is complex enough...
> 
> You assume too much. This is just using existing code.

Let's see the patch, then.
								Pavel

-- 
Thanks for all the (sleeping) penguins.

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 22:24               ` Nigel Cunningham
@ 2006-04-25 22:38                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-25 22:38 UTC (permalink / raw
  To: Nigel Cunningham; +Cc: Pavel Machek, Nick Piggin, Linux PM, LKML

Hi,

On Wednesday 26 April 2006 00:24, Nigel Cunningham wrote:
> On Wednesday 26 April 2006 08:21, Rafael J. Wysocki wrote:
> > On Tuesday 25 April 2006 23:18, Nigel Cunningham wrote:
> > > On Wednesday 26 April 2006 07:12, Rafael J. Wysocki wrote:
> > > > On Tuesday 25 April 2006 22:32, Pavel Machek wrote:
> > > > > > >   > -unsigned int count_data_pages(void)
> > > > > > > >
> > > > > > > > +/**
> > > > > > > > + *	need_to_copy - determine if a page needs to be copied
> > > > > > > > before saving. + *	Returns false if the page can be saved
> > > > > > > > without copying. + */
> > > > > > > > +
> > > > > > > > +static int need_to_copy(struct page *page)
> > > > > > > > +{
> > > > > > > > +	if (!PageLRU(page) || PageCompound(page))
> > > > > > > > +		return 1;
> > > > > > > > +	if (page_mapped(page))
> > > > > > > > +		return page_mapped_by_current(page);
> > > > > > > > +
> > > > > > > > +	return 1;
> > > > > > > > +}
> > > > > > >
> > > > > > > I'd much rather VM internal type stuff get moved *out* of
> > > > > > > kernel/power :(
> > > > > >
> > > > > > Well, I kind of agree, but I don't know where to place it under
> > > > > > mm/.
> > > > > >
> > > > > > > It needs more comments too. Also, how important is it for the
> > > > > > > page to be off the LRU?
> > > > > >
> > > > > > Hm, I'm not sure if that's what you're asking about, but the pages
> > > > > > off the LRU are handled in a usual way, ie. copied when
> > > > > > snapshotting the system.  The pages _on_ the LRU may be included in
> > > > > > the snapshot image without copying, but I require them additionally
> > > > > > to be (a) mapped by someone and (b) not mapped by the current task.
> > > > >
> > > > > Why do you _want_ them mapped by someone?
> > > >
> > > > Because this means they belong to a task that is frozen and won't touch
> > > > them (of course unless it's us).  The kernel has no reason to access
> > > > them either (even after we resume devices) except for reclaiming, but
> > > > that's handled explicitly.  Thus it's safe to include them in the image
> > > > without copying.
> > > >
> > > > As I said before, I think the page cache pages may be treated this way
> > > > too. It probably applies to all of the LRU pages, but there may be some
> > > > corner cases.  The mapped pages are just easy to single out.
> > >
> > > It does apply to all of the LRU pages. This is what I've been doing for
> > > years now. The only corner case I've come across is XFS. It still wants
> > > to write data even when there's nothing to do and it's threads are frozen
> > > (IIRC - haven't looked at it for a while). I got around that by freezing
> > > bdevs when freezing processes.
> >
> > This means if we freeze bdevs, we'll be able to save all of the LRU pages,
> > except for the pages mapped by the current task, without copying.  I think
> > we can try to do this, but we'll need a patch to freeze bdevs for this
> > purpose. ;-)
> 
> Isn't that a coincidence? I just happen to have the code right here! Not in a 
> patch form that you'd want though - do you have any refrigerator changes in 
> mm at the moment, or can I use vanilla 2.6.17-rc2 process.c?

I think you can use the 2.6.17-rc2 one safely.

Greetings,
Rafael


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-25 22:38                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-25 22:38 UTC (permalink / raw
  To: Nigel Cunningham; +Cc: Nick Piggin, LKML, Pavel Machek, Linux PM

[-- Attachment #1: Type: text/plain, Size: 3233 bytes --]

Hi,

On Wednesday 26 April 2006 00:24, Nigel Cunningham wrote:
> On Wednesday 26 April 2006 08:21, Rafael J. Wysocki wrote:
> > On Tuesday 25 April 2006 23:18, Nigel Cunningham wrote:
> > > On Wednesday 26 April 2006 07:12, Rafael J. Wysocki wrote:
> > > > On Tuesday 25 April 2006 22:32, Pavel Machek wrote:
> > > > > > >   > -unsigned int count_data_pages(void)
> > > > > > > >
> > > > > > > > +/**
> > > > > > > > + *	need_to_copy - determine if a page needs to be copied
> > > > > > > > before saving. + *	Returns false if the page can be saved
> > > > > > > > without copying. + */
> > > > > > > > +
> > > > > > > > +static int need_to_copy(struct page *page)
> > > > > > > > +{
> > > > > > > > +	if (!PageLRU(page) || PageCompound(page))
> > > > > > > > +		return 1;
> > > > > > > > +	if (page_mapped(page))
> > > > > > > > +		return page_mapped_by_current(page);
> > > > > > > > +
> > > > > > > > +	return 1;
> > > > > > > > +}
> > > > > > >
> > > > > > > I'd much rather VM internal type stuff get moved *out* of
> > > > > > > kernel/power :(
> > > > > >
> > > > > > Well, I kind of agree, but I don't know where to place it under
> > > > > > mm/.
> > > > > >
> > > > > > > It needs more comments too. Also, how important is it for the
> > > > > > > page to be off the LRU?
> > > > > >
> > > > > > Hm, I'm not sure if that's what you're asking about, but the pages
> > > > > > off the LRU are handled in a usual way, ie. copied when
> > > > > > snapshotting the system.  The pages _on_ the LRU may be included in
> > > > > > the snapshot image without copying, but I require them additionally
> > > > > > to be (a) mapped by someone and (b) not mapped by the current task.
> > > > >
> > > > > Why do you _want_ them mapped by someone?
> > > >
> > > > Because this means they belong to a task that is frozen and won't touch
> > > > them (of course unless it's us).  The kernel has no reason to access
> > > > them either (even after we resume devices) except for reclaiming, but
> > > > that's handled explicitly.  Thus it's safe to include them in the image
> > > > without copying.
> > > >
> > > > As I said before, I think the page cache pages may be treated this way
> > > > too. It probably applies to all of the LRU pages, but there may be some
> > > > corner cases.  The mapped pages are just easy to single out.
> > >
> > > It does apply to all of the LRU pages. This is what I've been doing for
> > > years now. The only corner case I've come across is XFS. It still wants
> > > to write data even when there's nothing to do and it's threads are frozen
> > > (IIRC - haven't looked at it for a while). I got around that by freezing
> > > bdevs when freezing processes.
> >
> > This means if we freeze bdevs, we'll be able to save all of the LRU pages,
> > except for the pages mapped by the current task, without copying.  I think
> > we can try to do this, but we'll need a patch to freeze bdevs for this
> > purpose. ;-)
> 
> Isn't that a coincidence? I just happen to have the code right here! Not in a 
> patch form that you'd want though - do you have any refrigerator changes in 
> mm at the moment, or can I use vanilla 2.6.17-rc2 process.c?

I think you can use the 2.6.17-rc2 one safely.

Greetings,
Rafael


[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 22:25               ` Pavel Machek
@ 2006-04-25 22:43                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-25 22:43 UTC (permalink / raw
  To: Pavel Machek; +Cc: Nigel Cunningham, Nick Piggin, Linux PM, LKML

Hi,

On Wednesday 26 April 2006 00:25, Pavel Machek wrote:
> > > It does apply to all of the LRU pages. This is what I've been doing for years 
> > > now. The only corner case I've come across is XFS. It still wants to write 
> > > data even when there's nothing to do and it's threads are frozen (IIRC - 
> > > haven't looked at it for a while). I got around that by freezing bdevs when 
> > > freezing processes.
> > 
> > This means if we freeze bdevs, we'll be able to save all of the LRU pages,
> > except for the pages mapped by the current task, without copying.  I think we
> > can try to do this, but we'll need a patch to freeze bdevs for this purpose. ;-)
> 
> ...adding more dependencies to how vm/blockdevs work. I'd say current
> code is complex enough...

Well, why don't we see the patch?  If it's too complex, we can just decide not
to use it. :-)

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-25 22:43                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-25 22:43 UTC (permalink / raw
  To: Pavel Machek; +Cc: Nigel Cunningham, Nick Piggin, LKML, Linux PM

Hi,

On Wednesday 26 April 2006 00:25, Pavel Machek wrote:
> > > It does apply to all of the LRU pages. This is what I've been doing for years 
> > > now. The only corner case I've come across is XFS. It still wants to write 
> > > data even when there's nothing to do and it's threads are frozen (IIRC - 
> > > haven't looked at it for a while). I got around that by freezing bdevs when 
> > > freezing processes.
> > 
> > This means if we freeze bdevs, we'll be able to save all of the LRU pages,
> > except for the pages mapped by the current task, without copying.  I think we
> > can try to do this, but we'll need a patch to freeze bdevs for this purpose. ;-)
> 
> ...adding more dependencies to how vm/blockdevs work. I'd say current
> code is complex enough...

Well, why don't we see the patch?  If it's too complex, we can just decide not
to use it. :-)

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 22:43                 ` Rafael J. Wysocki
  (?)
@ 2006-04-26  0:49                 ` Nigel Cunningham
  2006-04-30 12:27                     ` Rafael J. Wysocki
  -1 siblings, 1 reply; 53+ messages in thread
From: Nigel Cunningham @ 2006-04-26  0:49 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Pavel Machek, Nick Piggin, Linux PM, LKML

[-- Attachment #1: Type: text/plain, Size: 3727 bytes --]

Hi.

On Wednesday 26 April 2006 08:43, Rafael J. Wysocki wrote:
> Hi,
>
> On Wednesday 26 April 2006 00:25, Pavel Machek wrote:
> > > > It does apply to all of the LRU pages. This is what I've been doing
> > > > for years now. The only corner case I've come across is XFS. It still
> > > > wants to write data even when there's nothing to do and it's threads
> > > > are frozen (IIRC - haven't looked at it for a while). I got around
> > > > that by freezing bdevs when freezing processes.
> > >
> > > This means if we freeze bdevs, we'll be able to save all of the LRU
> > > pages, except for the pages mapped by the current task, without
> > > copying.  I think we can try to do this, but we'll need a patch to
> > > freeze bdevs for this purpose. ;-)
> >
> > ...adding more dependencies to how vm/blockdevs work. I'd say current
> > code is complex enough...
>
> Well, why don't we see the patch?  If it's too complex, we can just decide
> not to use it. :-)

In Suspend2, I'm still using a different version of process.c to what you guys 
have. In my version, I thaw kernelspace, then thaw bdevs, then thaw userspace. 
The version below just thaws bdevs after thawing all processes. I think that 
might need modification, but thought I'd post this now so you can see how 
complicated or otherwise it is.

Regards,

Nigel

diff -ruN linux-2.6.17-rc2/kernel/power/process.c bdev-freeze/kernel/power/process.c
--- linux-2.6.17-rc2/kernel/power/process.c	2006-04-19 14:27:36.000000000 +1000
+++ bdev-freeze/kernel/power/process.c	2006-04-26 10:44:56.000000000 +1000
@@ -19,6 +19,56 @@
  */
 #define TIMEOUT	(20 * HZ)
 
+struct frozen_fs
+{
+	struct list_head fsb_list;
+	struct super_block *sb;
+};
+
+LIST_HEAD(frozen_fs_list);
+
+void freezer_make_fses_rw(void)
+{
+	struct frozen_fs *fs, *next_fs;
+
+	list_for_each_entry_safe(fs, next_fs, &frozen_fs_list, fsb_list) {
+		thaw_bdev(fs->sb->s_bdev, fs->sb);
+
+		list_del(&fs->fsb_list);
+		kfree(fs);
+	}
+}
+
+/* 
+ * Done after userspace is frozen, so there should be no danger of
+ * fses being unmounted while we're in here.
+ */
+int freezer_make_fses_ro(void)
+{
+	struct frozen_fs *fs;
+	struct super_block *sb;
+
+	/* Generate the list */
+	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (!sb->s_root || !sb->s_bdev ||
+		    (sb->s_frozen == SB_FREEZE_TRANS) ||
+		    (sb->s_flags & MS_RDONLY))
+			continue;
+
+		fs = kmalloc(sizeof(struct frozen_fs), GFP_ATOMIC);
+		fs->sb = sb;
+		list_add_tail(&fs->fsb_list, &frozen_fs_list);
+	};
+
+	/* Do the freezing in reverse order so filesystems dependant
+	 * upon others are frozen in the right order. (Eg loopback
+	 * on ext3). */
+	list_for_each_entry_reverse(fs, &frozen_fs_list, fsb_list)
+		freeze_bdev(fs->sb->s_bdev);
+
+	return 0;
+}
+
 
 static inline int freezeable(struct task_struct * p)
 {
@@ -77,6 +127,7 @@
 	printk( "Stopping tasks: " );
 	start_time = jiffies;
 	user_frozen = 0;
+	bdevs_frozen = 0;
 	do {
 		nr_user = todo = 0;
 		read_lock(&tasklist_lock);
@@ -107,6 +158,10 @@
 			start_time = jiffies;
 		}
 		user_frozen = !nr_user;
+
+		if (user_frozen && !bdevs_frozen)
+			freezer_make_fses_ro();
+
 		yield();			/* Yield is okay here */
 		if (todo && time_after(jiffies, start_time + TIMEOUT))
 			break;
@@ -156,6 +211,8 @@
 			printk(KERN_INFO " Strange, %s not stopped\n", p->comm );
 	} while_each_thread(g, p);
 
+	freezer_make_fses_rw();
+
 	read_unlock(&tasklist_lock);
 	schedule();
 	printk( " done\n" );


-- 
See our web page for Howtos, FAQs, the Wiki and mailing list info.
http://www.suspend2.net                IRC: #suspend2 on Freenode

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 22:21             ` Rafael J. Wysocki
@ 2006-04-26  2:24               ` Nick Piggin
  -1 siblings, 0 replies; 53+ messages in thread
From: Nick Piggin @ 2006-04-26  2:24 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Nigel Cunningham, Pavel Machek, Linux PM, LKML

Rafael J. Wysocki wrote:

>This means if we freeze bdevs, we'll be able to save all of the LRU pages,
>except for the pages mapped by the current task, without copying.  I think we
>can try to do this, but we'll need a patch to freeze bdevs for this purpose. ;-)
>

Why not the current task? Does it exit the kernel? Or go through some
get_uesr_pages path?

--

Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-26  2:24               ` Nick Piggin
  0 siblings, 0 replies; 53+ messages in thread
From: Nick Piggin @ 2006-04-26  2:24 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Nigel Cunningham, Linux PM, LKML, Pavel Machek

[-- Attachment #1: Type: text/plain, Size: 439 bytes --]

Rafael J. Wysocki wrote:

>This means if we freeze bdevs, we'll be able to save all of the LRU pages,
>except for the pages mapped by the current task, without copying.  I think we
>can try to do this, but we'll need a patch to freeze bdevs for this purpose. ;-)
>

Why not the current task? Does it exit the kernel? Or go through some
get_uesr_pages path?

--

Send instant messages to your online friends http://au.messenger.yahoo.com 


[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-26  2:24               ` Nick Piggin
  (?)
@ 2006-04-26  3:41               ` Nigel Cunningham
  2006-04-26 16:22                 ` Nick Piggin
  -1 siblings, 1 reply; 53+ messages in thread
From: Nigel Cunningham @ 2006-04-26  3:41 UTC (permalink / raw
  To: Nick Piggin; +Cc: Rafael J. Wysocki, Pavel Machek, Linux PM, LKML

[-- Attachment #1: Type: text/plain, Size: 875 bytes --]

Hi Nick.

On Wednesday 26 April 2006 12:24, Nick Piggin wrote:
> Rafael J. Wysocki wrote:
> >This means if we freeze bdevs, we'll be able to save all of the LRU pages,
> >except for the pages mapped by the current task, without copying.  I think
> > we can try to do this, but we'll need a patch to freeze bdevs for this
> > purpose. ;-)
>
> Why not the current task? Does it exit the kernel? Or go through some
> get_uesr_pages path?

I think Rafael is asleep at the mo, so I'll answer for him - he's wanting it 
to be compatible with using userspace to control what happens (uswsusp). In 
that case, current will be the userspace program that's calling the ioctls to 
get processes frozen etc.

Regards,

Nigel

-- 
See our web page for Howtos, FAQs, the Wiki and mailing list info.
http://www.suspend2.net                IRC: #suspend2 on Freenode

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-26  2:24               ` Nick Piggin
@ 2006-04-26  8:10                 ` Pavel Machek
  -1 siblings, 0 replies; 53+ messages in thread
From: Pavel Machek @ 2006-04-26  8:10 UTC (permalink / raw
  To: Nick Piggin; +Cc: Rafael J. Wysocki, Nigel Cunningham, Linux PM, LKML

On St 26-04-06 12:24:43, Nick Piggin wrote:
> Rafael J. Wysocki wrote:
> 
> >This means if we freeze bdevs, we'll be able to save all of the LRU pages,
> >except for the pages mapped by the current task, without copying.  I think 
> >we
> >can try to do this, but we'll need a patch to freeze bdevs for this 
> >purpose. ;-)
> >
> 
> Why not the current task? Does it exit the kernel? Or go through some
> get_uesr_pages path?

It does exit kernel, and does writing to devices in userspace (so it
can compress, etc).
								Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-26  8:10                 ` Pavel Machek
  0 siblings, 0 replies; 53+ messages in thread
From: Pavel Machek @ 2006-04-26  8:10 UTC (permalink / raw
  To: Nick Piggin; +Cc: Nigel Cunningham, LKML, Linux PM

[-- Attachment #1: Type: text/plain, Size: 575 bytes --]

On St 26-04-06 12:24:43, Nick Piggin wrote:
> Rafael J. Wysocki wrote:
> 
> >This means if we freeze bdevs, we'll be able to save all of the LRU pages,
> >except for the pages mapped by the current task, without copying.  I think 
> >we
> >can try to do this, but we'll need a patch to freeze bdevs for this 
> >purpose. ;-)
> >
> 
> Why not the current task? Does it exit the kernel? Or go through some
> get_uesr_pages path?

It does exit kernel, and does writing to devices in userspace (so it
can compress, etc).
								Pavel
-- 
Thanks for all the (sleeping) penguins.

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-26  3:41               ` Nigel Cunningham
@ 2006-04-26 16:22                 ` Nick Piggin
  2006-04-26 21:16                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 53+ messages in thread
From: Nick Piggin @ 2006-04-26 16:22 UTC (permalink / raw
  To: Nigel Cunningham; +Cc: Rafael J. Wysocki, Pavel Machek, Linux PM, LKML

Hi Nigel,
thanks

Nigel Cunningham wrote:
> Hi Nick.
> 
> On Wednesday 26 April 2006 12:24, Nick Piggin wrote:
> 
>>Rafael J. Wysocki wrote:
>>
>>>This means if we freeze bdevs, we'll be able to save all of the LRU pages,
>>>except for the pages mapped by the current task, without copying.  I think
>>>we can try to do this, but we'll need a patch to freeze bdevs for this
>>>purpose. ;-)
>>
>>Why not the current task? Does it exit the kernel? Or go through some
>>get_uesr_pages path?
> 
> 
> I think Rafael is asleep at the mo, so I'll answer for him - he's wanting it 
> to be compatible with using userspace to control what happens (uswsusp). In 
> that case, current will be the userspace program that's calling the ioctls to 
> get processes frozen etc.

OK, so what happens if, upon exit from kernel, that userspace task
subsequently changes some pagecache but doesn't have that mapped? Or
mmaps something then changes it?

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-26 16:22                 ` Nick Piggin
@ 2006-04-26 21:16                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-26 21:16 UTC (permalink / raw
  To: Nick Piggin; +Cc: Nigel Cunningham, Pavel Machek, Linux PM, LKML

Hi,

On Wednesday 26 April 2006 18:22, Nick Piggin wrote:
> Nigel Cunningham wrote:
> > On Wednesday 26 April 2006 12:24, Nick Piggin wrote:
> > 
> >>Rafael J. Wysocki wrote:
> >>
> >>>This means if we freeze bdevs, we'll be able to save all of the LRU pages,
> >>>except for the pages mapped by the current task, without copying.  I think
> >>>we can try to do this, but we'll need a patch to freeze bdevs for this
> >>>purpose. ;-)
> >>
> >>Why not the current task? Does it exit the kernel? Or go through some
> >>get_uesr_pages path?
> > 
> > 
> > I think Rafael is asleep at the mo, so I'll answer for him - he's wanting it 
> > to be compatible with using userspace to control what happens (uswsusp). In 
> > that case, current will be the userspace program that's calling the ioctls to 
> > get processes frozen etc.

Thanks Nigel. :-)

> OK, so what happens if, upon exit from kernel, that userspace task
> subsequently changes some pagecache but doesn't have that mapped? Or
> mmaps something then changes it?

The page cache that is not mapped by anyone is copied before creating the
image and the copy is included in the image.  However there would be a problem
if some page cache mapped by someone else were mapped by the suspending
task after it had exited the kernel.

Fortunately this task is forbidden to open() or mmap() any files at that time
anyway, because it would be likely to corrupt some filesystems if it did.
In fact it must not _touch_ any filesystems after the image has been created.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-26 21:16                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-26 21:16 UTC (permalink / raw
  To: Nick Piggin; +Cc: Nigel Cunningham, Linux PM, LKML, Pavel Machek

Hi,

On Wednesday 26 April 2006 18:22, Nick Piggin wrote:
> Nigel Cunningham wrote:
> > On Wednesday 26 April 2006 12:24, Nick Piggin wrote:
> > 
> >>Rafael J. Wysocki wrote:
> >>
> >>>This means if we freeze bdevs, we'll be able to save all of the LRU pages,
> >>>except for the pages mapped by the current task, without copying.  I think
> >>>we can try to do this, but we'll need a patch to freeze bdevs for this
> >>>purpose. ;-)
> >>
> >>Why not the current task? Does it exit the kernel? Or go through some
> >>get_uesr_pages path?
> > 
> > 
> > I think Rafael is asleep at the mo, so I'll answer for him - he's wanting it 
> > to be compatible with using userspace to control what happens (uswsusp). In 
> > that case, current will be the userspace program that's calling the ioctls to 
> > get processes frozen etc.

Thanks Nigel. :-)

> OK, so what happens if, upon exit from kernel, that userspace task
> subsequently changes some pagecache but doesn't have that mapped? Or
> mmaps something then changes it?

The page cache that is not mapped by anyone is copied before creating the
image and the copy is included in the image.  However there would be a problem
if some page cache mapped by someone else were mapped by the suspending
task after it had exited the kernel.

Fortunately this task is forbidden to open() or mmap() any files at that time
anyway, because it would be likely to corrupt some filesystems if it did.
In fact it must not _touch_ any filesystems after the image has been created.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-25 10:31         ` Rafael J. Wysocki
@ 2006-04-27 15:27           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-27 15:27 UTC (permalink / raw
  To: Pavel Machek; +Cc: Linux PM, LKML, Nigel Cunningham

Hi,

On Tuesday 25 April 2006 12:31, Rafael J. Wysocki wrote:
> On Tuesday 25 April 2006 12:04, Pavel Machek wrote:
> > > > Okay, so it can be done, and patch does not look too bad. It still
> > > > scares me. Is 800MB image more responsive than 500MB after resume?
> > > 
> > > Yes, it is, slightly, but I think 800 meg images are impractical for
> > > performance reasons (like IMO everything above 500 meg with the current
> > > hardware).  However this means we can save 80% of RAM with the patch
> > > and that should be 400 meg instead of 250 on a 500 meg machine, or
> > > 200 meg instead of 125 on a 250 meg machine.
> > 
> > Could we get few people trying it on such small machines to see if it
> > is really that noticeable?
> 
> OK, I'll try to run some tests on a machine with smaller RAM (and slower CPU).

Done, although it was not so easy to find the box.  This was a PII 350MHz w/
256 MB of RAM.

I invented the following test:
- ran KDE with 4 desktops,
- ran Firefox, OpenOffice.org 1.1 (with a simple spreadsheet), and GIMP (with
2 pictures) each on its own desktop,
- ran the memory meter from the KDE's Info Center and two konsoles
on the remaining desktop - one konsole with a kernel compilation and the
other with a root session used for suspending the box (the built-in swsusp
was used),
so the box's RAM was almost fully occupied with ~30% taken by the page cache.

Then I suspended the box and measured the time from the start of resume
(ie. leaving GRUB) to the point at which I had all of the application windows
fully rendered on their respective desktops (I always switched the desktops
in the same order, starting from the memory meter's one, through the OOo's
and Firefox's, finishing on the GIMP's one and I always switched the
desktop as soon as the window(s) on it were fully rendered).

I ran it a couple of times on the 2.6.17-rc1-mm2 kernel with and without
the patch and the results (on the average) are the following:

(a) 25-28s with the patch
(b) 30-33s without it

Moreover, with the patch the image size was about 49000 of pages (only
~7000 - 12000 pages had to be copied during suspend) and without the patch
it was about 30000 of pages, so with the patch much smaller number of
pages was in the swap after resume.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-27 15:27           ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-27 15:27 UTC (permalink / raw
  To: Pavel Machek; +Cc: Nigel Cunningham, Linux PM, LKML

Hi,

On Tuesday 25 April 2006 12:31, Rafael J. Wysocki wrote:
> On Tuesday 25 April 2006 12:04, Pavel Machek wrote:
> > > > Okay, so it can be done, and patch does not look too bad. It still
> > > > scares me. Is 800MB image more responsive than 500MB after resume?
> > > 
> > > Yes, it is, slightly, but I think 800 meg images are impractical for
> > > performance reasons (like IMO everything above 500 meg with the current
> > > hardware).  However this means we can save 80% of RAM with the patch
> > > and that should be 400 meg instead of 250 on a 500 meg machine, or
> > > 200 meg instead of 125 on a 250 meg machine.
> > 
> > Could we get few people trying it on such small machines to see if it
> > is really that noticeable?
> 
> OK, I'll try to run some tests on a machine with smaller RAM (and slower CPU).

Done, although it was not so easy to find the box.  This was a PII 350MHz w/
256 MB of RAM.

I invented the following test:
- ran KDE with 4 desktops,
- ran Firefox, OpenOffice.org 1.1 (with a simple spreadsheet), and GIMP (with
2 pictures) each on its own desktop,
- ran the memory meter from the KDE's Info Center and two konsoles
on the remaining desktop - one konsole with a kernel compilation and the
other with a root session used for suspending the box (the built-in swsusp
was used),
so the box's RAM was almost fully occupied with ~30% taken by the page cache.

Then I suspended the box and measured the time from the start of resume
(ie. leaving GRUB) to the point at which I had all of the application windows
fully rendered on their respective desktops (I always switched the desktops
in the same order, starting from the memory meter's one, through the OOo's
and Firefox's, finishing on the GIMP's one and I always switched the
desktop as soon as the window(s) on it were fully rendered).

I ran it a couple of times on the 2.6.17-rc1-mm2 kernel with and without
the patch and the results (on the average) are the following:

(a) 25-28s with the patch
(b) 30-33s without it

Moreover, with the patch the image size was about 49000 of pages (only
~7000 - 12000 pages had to be copied during suspend) and without the patch
it was about 30000 of pages, so with the patch much smaller number of
pages was in the swap after resume.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [RFC][PATCH] swsusp: support creating bigger images (rev. 2)
  2006-04-25 15:39   ` Rafael J. Wysocki
@ 2006-04-27 19:53       ` Rafael J. Wysocki
  2006-04-27 19:53       ` Rafael J. Wysocki
  1 sibling, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-27 19:53 UTC (permalink / raw
  To: Nick Piggin; +Cc: Nigel Cunningham, Linux PM, LKML, Pavel Machek

Hi,

On Tuesday 25 April 2006 17:39, Rafael J. Wysocki wrote:
}-- snip --{
> 
> Thanks a lot for the comments.  I'll do my best to follow your suggestions
> in the next version of the patch.

The corrected version of the patch is appended.

A short summary of changes:
- the code that refers to the mm internals moved to mm/rmap.c and mm/vmscan.c
- added some comments (if you think more are needed or the existing ones should
be clarified, please let me know)
- the PageLRU flags are set/reset under spinlocks

Could you please have a look at it?

Greetings,
Rafael

---
 include/linux/rmap.h    |    8 +
 include/linux/swap.h    |    4 
 kernel/power/power.h    |   10 +-
 kernel/power/snapshot.c |  225 +++++++++++++++++++++++++++++++++---------------
 kernel/power/swsusp.c   |   15 +--
 kernel/power/user.c     |    4 
 mm/rmap.c               |   40 ++++++++
 mm/vmscan.c             |   63 +++++++++++++
 8 files changed, 293 insertions(+), 76 deletions(-)

Index: linux-2.6.17-rc2-mm1/mm/rmap.c
===================================================================
--- linux-2.6.17-rc2-mm1.orig/mm/rmap.c
+++ linux-2.6.17-rc2-mm1/mm/rmap.c
@@ -851,3 +851,43 @@ int try_to_unmap(struct page *page, int 
 	return ret;
 }
 
+#ifdef CONFIG_SOFTWARE_SUSPEND
+static int page_mapped_by_task(struct page *page, struct task_struct *task)
+{
+	struct vm_area_struct *vma;
+	int ret = 0;
+
+	spin_lock(&task->mm->page_table_lock);
+
+	for (vma = task->mm->mmap; vma; vma = vma->vm_next)
+		if (page_address_in_vma(page, vma) != -EFAULT) {
+			ret = 1;
+			break;
+		}
+
+	spin_unlock(&task->mm->page_table_lock);
+
+	return ret;
+}
+
+/**
+ *	page_to_copy - determine if a page can be included in the
+ *	suspend image without copying (returns false if that's the case).
+ *
+ *	It is safe to include the page in the suspend image without
+ *	copying if (a) it's on the LRU and (b) it's mapped by a frozen task
+ *	(all tasks except for the current task should be frozen when it's
+ *	called).  Otherwise the page should be copied for this purpose
+ *	(compound pages are avoided for simplicity).
+ */
+
+int page_to_copy(struct page *page)
+{
+	if (!PageLRU(page) || PageCompound(page))
+		return 1;
+	if (page_mapped(page))
+		return page_mapped_by_task(page, current);
+
+	return 1;
+}
+#endif /* CONFIG_SOFTWARE_SUSPEND */
Index: linux-2.6.17-rc2-mm1/include/linux/rmap.h
===================================================================
--- linux-2.6.17-rc2-mm1.orig/include/linux/rmap.h
+++ linux-2.6.17-rc2-mm1/include/linux/rmap.h
@@ -104,6 +104,14 @@ pte_t *page_check_address(struct page *,
  */
 unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
 
+#ifdef CONFIG_SOFTWARE_SUSPEND
+/*
+ * Used to determine if the page can be included in the suspend image without
+ * copying
+ */
+int page_to_copy(struct page *);
+#endif
+
 #else	/* !CONFIG_MMU */
 
 #define anon_vma_init()		do {} while (0)
Index: linux-2.6.17-rc2-mm1/kernel/power/snapshot.c
===================================================================
--- linux-2.6.17-rc2-mm1.orig/kernel/power/snapshot.c
+++ linux-2.6.17-rc2-mm1/kernel/power/snapshot.c
@@ -13,6 +13,7 @@
 #include <linux/version.h>
 #include <linux/module.h>
 #include <linux/mm.h>
+#include <linux/rmap.h>
 #include <linux/suspend.h>
 #include <linux/smp_lock.h>
 #include <linux/delay.h>
@@ -261,6 +262,14 @@ int restore_special_mem(void)
 	return ret;
 }
 
+/* Represents a stacked allocated page to be used in the future */
+struct res_page {
+	struct res_page *next;
+	char padding[PAGE_SIZE - sizeof(void *)];
+};
+
+static struct res_page *page_list;
+
 static int pfn_is_nosave(unsigned long pfn)
 {
 	unsigned long nosave_begin_pfn = __pa(&__nosave_begin) >> PAGE_SHIFT;
@@ -269,7 +278,7 @@ static int pfn_is_nosave(unsigned long p
 }
 
 /**
- *	saveable - Determine whether a page should be cloned or not.
+ *	saveable - Determine whether a page should be saved or not.
  *	@pfn:	The page
  *
  *	We save a page if it's Reserved, and not in the range of pages
@@ -277,9 +286,8 @@ static int pfn_is_nosave(unsigned long p
  *	isn't part of a free chunk of pages.
  */
 
-static int saveable(struct zone *zone, unsigned long *zone_pfn)
+static int saveable(unsigned long pfn)
 {
-	unsigned long pfn = *zone_pfn + zone->zone_start_pfn;
 	struct page *page;
 
 	if (!pfn_valid(pfn))
@@ -296,46 +304,103 @@ static int saveable(struct zone *zone, u
 	return 1;
 }
 
-unsigned int count_data_pages(void)
+/**
+ *	count_data_pages - returns the number of pages that need to be copied
+ *	in order to be included in the suspend snapshot image.  If @total is
+ *	not null, the total number of pages that should be included in the
+ *	snapshot image is stored in the location pointed to by it.
+ */
+
+unsigned int count_data_pages(unsigned int *total)
 {
 	struct zone *zone;
 	unsigned long zone_pfn;
-	unsigned int n = 0;
+	unsigned int n, m;
 
+	n = m = 0;
 	for_each_zone (zone) {
 		if (is_highmem(zone))
 			continue;
 		mark_free_pages(zone);
-		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn)
-			n += saveable(zone, &zone_pfn);
+		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
+			unsigned long pfn = zone_pfn + zone->zone_start_pfn;
+
+			if (saveable(pfn)) {
+				n++;
+				if (!page_to_copy(pfn_to_page(pfn)))
+					m++;
+			}
+		}
 	}
-	return n;
+	if (total)
+		*total = n;
+
+	return n - m;
 }
 
+/**
+ *	copy_data_pages - populates the page backup list @pblist with
+ *	the addresses of the pages that should be included in the
+ *	suspend snapshot image.  The pages that cannot be included in the
+ *	image without copying are copied into empty page frames allocated
+ *	earlier and available from the list page_list (the addresses of
+ *	these page frames are also stored in the page backup list).
+ *
+ *	This function is only called from the critical section, ie. when
+ *	processes are frozen, there's only one CPU online and local IRQs
+ *	are disabled on it.
+ */
+
 static void copy_data_pages(struct pbe *pblist)
 {
 	struct zone *zone;
 	unsigned long zone_pfn;
 	struct pbe *pbe, *p;
+	struct res_page *ptr;
 
 	pbe = pblist;
 	for_each_zone (zone) {
 		if (is_highmem(zone))
 			continue;
+
 		mark_free_pages(zone);
-		/* This is necessary for swsusp_free() */
+		/* This is necessary for free_image() */
 		for_each_pb_page (p, pblist)
 			SetPageNosaveFree(virt_to_page(p));
+
 		for_each_pbe (p, pblist)
-			SetPageNosaveFree(virt_to_page(p->address));
+			if (p->address && p->orig_address != p->address)
+				SetPageNosaveFree(virt_to_page(p->address));
+
+		ptr = page_list;
+		while (ptr) {
+			SetPageNosaveFree(virt_to_page(ptr));
+			ptr = ptr->next;
+		}
+
 		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
-			if (saveable(zone, &zone_pfn)) {
+			unsigned long pfn = zone_pfn + zone->zone_start_pfn;
+
+			if (saveable(pfn)) {
 				struct page *page;
-				page = pfn_to_page(zone_pfn + zone->zone_start_pfn);
+
 				BUG_ON(!pbe);
+				page = pfn_to_page(pfn);
 				pbe->orig_address = (unsigned long)page_address(page);
-				/* copy_page is not usable for copying task structs. */
-				memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE);
+				if (page_to_copy(page)) {
+					BUG_ON(!page_list);
+					pbe->address = (unsigned long)page_list;
+					page_list = page_list->next;
+					/*
+					 * copy_page is not usable for copying
+					 * task structs.
+					 */
+					memcpy((void *)pbe->address,
+						(void *)pbe->orig_address,
+						PAGE_SIZE);
+				} else {
+					pbe->address = pbe->orig_address;
+				}
 				pbe = pbe->next;
 			}
 		}
@@ -419,7 +484,7 @@ static inline void *alloc_image_page(gfp
 	res = (void *)get_zeroed_page(gfp_mask);
 	if (safe_needed)
 		while (res && PageNosaveFree(virt_to_page(res))) {
-			/* The page is unsafe, mark it for swsusp_free() */
+			/* The page is unsafe, mark it for free_image() */
 			SetPageNosave(virt_to_page(res));
 			unsafe_pages++;
 			res = (void *)get_zeroed_page(gfp_mask);
@@ -474,11 +539,32 @@ static struct pbe *alloc_pagedir(unsigne
 }
 
 /**
+ *	protect_data_pages - move data pages that need to be protected from
+ *	being reclaimed (ie. those included in the suspend image without
+ *	copying) out of their respective LRU lists.  This is done after the
+ *	image has been created so the LRU lists only have to be restored if
+ *	the suspend fails.
+ *
+ *	This function is only called from the critical section, ie. when
+ *	processes are frozen, there's only one CPU online and local IRQs
+ *	are disabled on it.
+ */
+
+static void protect_data_pages(struct pbe *pblist)
+{
+	struct pbe *p;
+
+	for_each_pbe (p, pblist)
+		if (p->address == p->orig_address)
+			move_out_of_lru(virt_to_page(p->address));
+}
+
+/**
  * Free pages we allocated for suspend. Suspend pages are alocated
  * before atomic copy, so we need to free them after resume.
  */
 
-void swsusp_free(void)
+void free_image(void)
 {
 	struct zone *zone;
 	unsigned long zone_pfn;
@@ -501,6 +587,11 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+void swsusp_free(void)
+{
+	free_image();
+	restore_active_inactive_lists();
+}
 
 /**
  *	enough_free_mem - Make sure we enough free memory to snapshot.
@@ -509,32 +600,33 @@ void swsusp_free(void)
  *	free pages.
  */
 
-static int enough_free_mem(unsigned int nr_pages)
+static int enough_free_mem(unsigned int nr_pages, unsigned int copy_pages)
 {
 	struct zone *zone;
-	unsigned int n = 0;
+	long n = 0;
+
+	pr_debug("swsusp: pages needed: %u + %lu + %lu\n",
+		 copy_pages,
+		 (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE,
+		 EXTRA_PAGES);
 
 	for_each_zone (zone)
-		if (!is_highmem(zone))
+		if (!is_highmem(zone) && populated_zone(zone)) {
 			n += zone->free_pages;
-	pr_debug("swsusp: available memory: %u pages\n", n);
-	return n > (nr_pages + PAGES_FOR_IO +
-		(nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
-}
-
-static int alloc_data_pages(struct pbe *pblist, gfp_t gfp_mask, int safe_needed)
-{
-	struct pbe *p;
+			/*
+			 * We're going to use atomic allocations, so we
+			 * shouldn't count the lowmem reserves in the lower
+			 * zones as available to us
+			 */
+			n -= zone->lowmem_reserve[ZONE_NORMAL];
+		}
 
-	for_each_pbe (p, pblist) {
-		p->address = (unsigned long)alloc_image_page(gfp_mask, safe_needed);
-		if (!p->address)
-			return -ENOMEM;
-	}
-	return 0;
+	pr_debug("swsusp: available memory: %ld pages\n", n);
+	return n > (long)(copy_pages + EXTRA_PAGES +
+		(nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
 }
 
-static struct pbe *swsusp_alloc(unsigned int nr_pages)
+static struct pbe *swsusp_alloc(unsigned int nr_pages, unsigned int copy_pages)
 {
 	struct pbe *pblist;
 
@@ -543,10 +635,19 @@ static struct pbe *swsusp_alloc(unsigned
 		return NULL;
 	}
 
-	if (alloc_data_pages(pblist, GFP_ATOMIC | __GFP_COLD, 0)) {
-		printk(KERN_ERR "suspend: Allocating image pages failed.\n");
-		swsusp_free();
-		return NULL;
+	page_list = NULL;
+	while (copy_pages--) {
+		struct res_page *ptr;
+
+		ptr = alloc_image_page(GFP_ATOMIC | __GFP_COLD, 0);
+		if (!ptr) {
+			printk(KERN_ERR
+				"suspend: Allocating image pages failed.\n");
+			free_image();
+			return NULL;
+		}
+		ptr->next = page_list;
+		page_list = ptr;
 	}
 
 	return pblist;
@@ -554,25 +655,21 @@ static struct pbe *swsusp_alloc(unsigned
 
 asmlinkage int swsusp_save(void)
 {
-	unsigned int nr_pages;
+	unsigned int nr_pages, copy_pages;
 
 	pr_debug("swsusp: critical section: \n");
 
 	drain_local_pages();
-	nr_pages = count_data_pages();
-	printk("swsusp: Need to copy %u pages\n", nr_pages);
-
-	pr_debug("swsusp: pages needed: %u + %lu + %u, free: %u\n",
-		 nr_pages,
-		 (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE,
-		 PAGES_FOR_IO, nr_free_pages());
+	copy_pages = count_data_pages(&nr_pages);
+	printk("swsusp: Need to save %u pages\n", nr_pages);
+	printk("swsusp: %u pages to copy\n", copy_pages);
 
-	if (!enough_free_mem(nr_pages)) {
+	if (!enough_free_mem(nr_pages, copy_pages)) {
 		printk(KERN_ERR "swsusp: Not enough free memory\n");
 		return -ENOMEM;
 	}
 
-	pagedir_nosave = swsusp_alloc(nr_pages);
+	pagedir_nosave = swsusp_alloc(nr_pages, copy_pages);
 	if (!pagedir_nosave)
 		return -ENOMEM;
 
@@ -582,6 +679,9 @@ asmlinkage int swsusp_save(void)
 	drain_local_pages();
 	copy_data_pages(pagedir_nosave);
 
+	/* Make sure the pages that we have not copied won't be reclaimed */
+	protect_data_pages(pagedir_nosave);
+
 	/*
 	 * End of critical section. From now on, we can write to memory,
 	 * but we should not touch disk. This specially means we must _not_
@@ -591,7 +691,7 @@ asmlinkage int swsusp_save(void)
 	nr_copy_pages = nr_pages;
 	nr_meta_pages = (nr_pages * sizeof(long) + PAGE_SIZE - 1) >> PAGE_SHIFT;
 
-	printk("swsusp: critical section/: done (%d pages copied)\n", nr_pages);
+	printk("swsusp: critical section/: done (%d pages)\n", nr_pages);
 	return 0;
 }
 
@@ -654,7 +754,7 @@ int snapshot_read_next(struct snapshot_h
 	if (handle->page > nr_meta_pages + nr_copy_pages)
 		return 0;
 	if (!buffer) {
-		/* This makes the buffer be freed by swsusp_free() */
+		/* This makes the buffer be freed by free_image() */
 		buffer = alloc_image_page(GFP_ATOMIC, 0);
 		if (!buffer)
 			return -ENOMEM;
@@ -810,13 +910,6 @@ static inline struct pbe *unpack_orig_ad
  *	of "safe" which will be used later
  */
 
-struct safe_page {
-	struct safe_page *next;
-	char padding[PAGE_SIZE - sizeof(void *)];
-};
-
-static struct safe_page *safe_pages;
-
 static int prepare_image(struct snapshot_handle *handle)
 {
 	int error = 0;
@@ -833,21 +926,21 @@ static int prepare_image(struct snapshot
 		if (!pblist)
 			error = -ENOMEM;
 	}
-	safe_pages = NULL;
+	page_list = NULL;
 	if (!error && nr_pages > unsafe_pages) {
 		nr_pages -= unsafe_pages;
 		while (nr_pages--) {
-			struct safe_page *ptr;
+			struct res_page *ptr;
 
-			ptr = (struct safe_page *)get_zeroed_page(GFP_ATOMIC);
+			ptr = (struct res_page *)get_zeroed_page(GFP_ATOMIC);
 			if (!ptr) {
 				error = -ENOMEM;
 				break;
 			}
 			if (!PageNosaveFree(virt_to_page(ptr))) {
 				/* The page is "safe", add it to the list */
-				ptr->next = safe_pages;
-				safe_pages = ptr;
+				ptr->next = page_list;
+				page_list = ptr;
 			}
 			/* Mark the page as allocated */
 			SetPageNosave(virt_to_page(ptr));
@@ -858,7 +951,7 @@ static int prepare_image(struct snapshot
 		pagedir_nosave = pblist;
 	} else {
 		handle->pbe = NULL;
-		swsusp_free();
+		free_image();
 	}
 	return error;
 }
@@ -882,8 +975,8 @@ static void *get_buffer(struct snapshot_
 	 * The "original" page frame has not been allocated and we have to
 	 * use a "safe" page frame to store the read page
 	 */
-	pbe->address = (unsigned long)safe_pages;
-	safe_pages = safe_pages->next;
+	pbe->address = (unsigned long)page_list;
+	page_list = page_list->next;
 	if (last)
 		last->next = pbe;
 	handle->last_pbe = pbe;
@@ -919,7 +1012,7 @@ int snapshot_write_next(struct snapshot_
 	if (handle->prev && handle->page > nr_meta_pages + nr_copy_pages)
 		return 0;
 	if (!buffer) {
-		/* This makes the buffer be freed by swsusp_free() */
+		/* This makes the buffer be freed by free_image() */
 		buffer = alloc_image_page(GFP_ATOMIC, 0);
 		if (!buffer)
 			return -ENOMEM;
Index: linux-2.6.17-rc2-mm1/kernel/power/power.h
===================================================================
--- linux-2.6.17-rc2-mm1.orig/kernel/power/power.h
+++ linux-2.6.17-rc2-mm1/kernel/power/power.h
@@ -48,7 +48,14 @@ extern dev_t swsusp_resume_device;
 extern asmlinkage int swsusp_arch_suspend(void);
 extern asmlinkage int swsusp_arch_resume(void);
 
-extern unsigned int count_data_pages(void);
+#if (defined(CONFIG_BLK_DEV_INITRD) && defined(CONFIG_BLK_DEV_RAM))
+#define EXTRA_PAGES	(PAGES_FOR_IO + \
+			(2 * CONFIG_BLK_DEV_RAM_SIZE * 1024) / PAGE_SIZE)
+#else
+#define EXTRA_PAGES	PAGES_FOR_IO
+#endif
+
+extern unsigned int count_data_pages(unsigned int *total);
 
 struct snapshot_handle {
 	loff_t		offset;
@@ -111,6 +118,7 @@ extern int restore_special_mem(void);
 
 extern int swsusp_check(void);
 extern int swsusp_shrink_memory(void);
+extern void free_image(void);
 extern void swsusp_free(void);
 extern int swsusp_suspend(void);
 extern int swsusp_resume(void);
Index: linux-2.6.17-rc2-mm1/kernel/power/swsusp.c
===================================================================
--- linux-2.6.17-rc2-mm1.orig/kernel/power/swsusp.c
+++ linux-2.6.17-rc2-mm1/kernel/power/swsusp.c
@@ -177,28 +177,29 @@ int swsusp_shrink_memory(void)
 	long size, tmp;
 	struct zone *zone;
 	unsigned long pages = 0;
-	unsigned int i = 0;
+	unsigned int to_save, i = 0;
 	char *p = "-\\|/";
 
 	printk("Shrinking memory...  ");
 	do {
 		size = 2 * count_special_pages();
-		size += size / 50 + count_data_pages();
-		size += (size + PBES_PER_PAGE - 1) / PBES_PER_PAGE +
-			PAGES_FOR_IO;
+		size += size / 50 + count_data_pages(&to_save);
+		size += (to_save + PBES_PER_PAGE - 1) / PBES_PER_PAGE +
+			EXTRA_PAGES;
 		tmp = size;
 		for_each_zone (zone)
 			if (!is_highmem(zone) && populated_zone(zone)) {
 				tmp -= zone->free_pages;
 				tmp += zone->lowmem_reserve[ZONE_NORMAL];
 			}
+		size = to_save * (PAGE_SIZE + sizeof(long));
 		if (tmp > 0) {
 			tmp = __shrink_memory(tmp);
 			if (!tmp)
 				return -ENOMEM;
 			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
+		} else if (size > image_size) {
+			tmp = __shrink_memory(size - image_size);
 			pages += tmp;
 		}
 		printk("\b%c", p[i++%4]);
@@ -261,7 +262,7 @@ int swsusp_resume(void)
 	 * very tight, so we have to free it as soon as we can to avoid
 	 * subsequent failures
 	 */
-	swsusp_free();
+	free_image();
 	restore_processor_state();
 	restore_special_mem();
 	touch_softlockup_watchdog();
Index: linux-2.6.17-rc2-mm1/kernel/power/user.c
===================================================================
--- linux-2.6.17-rc2-mm1.orig/kernel/power/user.c
+++ linux-2.6.17-rc2-mm1/kernel/power/user.c
@@ -153,6 +153,10 @@ static int snapshot_ioctl(struct inode *
 	case SNAPSHOT_UNFREEZE:
 		if (!data->frozen)
 			break;
+		if (data->ready && in_suspend) {
+			error = -EPERM;
+			break;
+		}
 		down(&pm_sem);
 		thaw_processes();
 		enable_nonboot_cpus();
Index: linux-2.6.17-rc2-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.17-rc2-mm1.orig/mm/vmscan.c
+++ linux-2.6.17-rc2-mm1/mm/vmscan.c
@@ -1437,7 +1437,68 @@ out:
 
 	return ret;
 }
-#endif
+
+static LIST_HEAD(saved_active_pages);
+static LIST_HEAD(saved_inactive_pages);
+
+/**
+ *	move_out_of_lru - software suspend includes some pages in the
+ *	suspend snapshot image without copying and these pages should be
+ *	procected from being reclaimed, which can be done by (temporarily)
+ *	moving them out of their respective LRU lists
+ *
+ *	It is to be called with local IRQs disabled.
+ */
+
+void move_out_of_lru(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+
+	spin_lock(&zone->lru_lock);
+	if (PageActive(page)) {
+		del_page_from_active_list(zone, page);
+		list_add(&page->lru, &saved_active_pages);
+	} else {
+		del_page_from_inactive_list(zone, page);
+		list_add(&page->lru, &saved_inactive_pages);
+	}
+	ClearPageLRU(page);
+	spin_unlock(&zone->lru_lock);
+}
+
+
+/**
+ *	restore_active_inactive_lists - used by the software suspend to move
+ *	the pages taken out of the LRU by take_page_out_of_lru() back to
+ *	their respective active/inactive lists (if the suspend fails)
+ */
+
+void restore_active_inactive_lists(void)
+{
+	struct page *page;
+	struct zone *zone;
+
+	while(!list_empty(&saved_active_pages)) {
+		page = lru_to_page(&saved_active_pages);
+		zone = page_zone(page);
+		list_del(&page->lru);
+		spin_lock_irq(&zone->lru_lock);
+		SetPageLRU(page);
+		add_page_to_active_list(zone, page);
+		spin_unlock_irq(&zone->lru_lock);
+	}
+
+	while(!list_empty(&saved_inactive_pages)) {
+		page = lru_to_page(&saved_inactive_pages);
+		zone = page_zone(page);
+		list_del(&page->lru);
+		spin_lock_irq(&zone->lru_lock);
+		SetPageLRU(page);
+		add_page_to_inactive_list(zone, page);
+		spin_unlock_irq(&zone->lru_lock);
+	}
+}
+#endif /* CONFIG_PM */
 
 #ifdef CONFIG_HOTPLUG_CPU
 /* It's optimal to keep kswapds on the same CPUs as their memory, but
Index: linux-2.6.17-rc2-mm1/include/linux/swap.h
===================================================================
--- linux-2.6.17-rc2-mm1.orig/include/linux/swap.h
+++ linux-2.6.17-rc2-mm1/include/linux/swap.h
@@ -184,7 +184,9 @@ extern void swap_setup(void);
 
 /* linux/mm/vmscan.c */
 extern unsigned long try_to_free_pages(struct zone **, gfp_t);
-extern unsigned long shrink_all_memory(unsigned long nr_pages);
+extern unsigned long shrink_all_memory(unsigned long);
+extern void move_out_of_lru(struct page *page);
+extern void restore_active_inactive_lists(void);
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [RFC][PATCH] swsusp: support creating bigger images (rev. 2)
@ 2006-04-27 19:53       ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-27 19:53 UTC (permalink / raw
  To: Nick Piggin; +Cc: Nigel Cunningham, Linux PM, LKML

Hi,

On Tuesday 25 April 2006 17:39, Rafael J. Wysocki wrote:
}-- snip --{
> 
> Thanks a lot for the comments.  I'll do my best to follow your suggestions
> in the next version of the patch.

The corrected version of the patch is appended.

A short summary of changes:
- the code that refers to the mm internals moved to mm/rmap.c and mm/vmscan.c
- added some comments (if you think more are needed or the existing ones should
be clarified, please let me know)
- the PageLRU flags are set/reset under spinlocks

Could you please have a look at it?

Greetings,
Rafael

---
 include/linux/rmap.h    |    8 +
 include/linux/swap.h    |    4 
 kernel/power/power.h    |   10 +-
 kernel/power/snapshot.c |  225 +++++++++++++++++++++++++++++++++---------------
 kernel/power/swsusp.c   |   15 +--
 kernel/power/user.c     |    4 
 mm/rmap.c               |   40 ++++++++
 mm/vmscan.c             |   63 +++++++++++++
 8 files changed, 293 insertions(+), 76 deletions(-)

Index: linux-2.6.17-rc2-mm1/mm/rmap.c
===================================================================
--- linux-2.6.17-rc2-mm1.orig/mm/rmap.c
+++ linux-2.6.17-rc2-mm1/mm/rmap.c
@@ -851,3 +851,43 @@ int try_to_unmap(struct page *page, int 
 	return ret;
 }
 
+#ifdef CONFIG_SOFTWARE_SUSPEND
+static int page_mapped_by_task(struct page *page, struct task_struct *task)
+{
+	struct vm_area_struct *vma;
+	int ret = 0;
+
+	spin_lock(&task->mm->page_table_lock);
+
+	for (vma = task->mm->mmap; vma; vma = vma->vm_next)
+		if (page_address_in_vma(page, vma) != -EFAULT) {
+			ret = 1;
+			break;
+		}
+
+	spin_unlock(&task->mm->page_table_lock);
+
+	return ret;
+}
+
+/**
+ *	page_to_copy - determine if a page can be included in the
+ *	suspend image without copying (returns false if that's the case).
+ *
+ *	It is safe to include the page in the suspend image without
+ *	copying if (a) it's on the LRU and (b) it's mapped by a frozen task
+ *	(all tasks except for the current task should be frozen when it's
+ *	called).  Otherwise the page should be copied for this purpose
+ *	(compound pages are avoided for simplicity).
+ */
+
+int page_to_copy(struct page *page)
+{
+	if (!PageLRU(page) || PageCompound(page))
+		return 1;
+	if (page_mapped(page))
+		return page_mapped_by_task(page, current);
+
+	return 1;
+}
+#endif /* CONFIG_SOFTWARE_SUSPEND */
Index: linux-2.6.17-rc2-mm1/include/linux/rmap.h
===================================================================
--- linux-2.6.17-rc2-mm1.orig/include/linux/rmap.h
+++ linux-2.6.17-rc2-mm1/include/linux/rmap.h
@@ -104,6 +104,14 @@ pte_t *page_check_address(struct page *,
  */
 unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
 
+#ifdef CONFIG_SOFTWARE_SUSPEND
+/*
+ * Used to determine if the page can be included in the suspend image without
+ * copying
+ */
+int page_to_copy(struct page *);
+#endif
+
 #else	/* !CONFIG_MMU */
 
 #define anon_vma_init()		do {} while (0)
Index: linux-2.6.17-rc2-mm1/kernel/power/snapshot.c
===================================================================
--- linux-2.6.17-rc2-mm1.orig/kernel/power/snapshot.c
+++ linux-2.6.17-rc2-mm1/kernel/power/snapshot.c
@@ -13,6 +13,7 @@
 #include <linux/version.h>
 #include <linux/module.h>
 #include <linux/mm.h>
+#include <linux/rmap.h>
 #include <linux/suspend.h>
 #include <linux/smp_lock.h>
 #include <linux/delay.h>
@@ -261,6 +262,14 @@ int restore_special_mem(void)
 	return ret;
 }
 
+/* Represents a stacked allocated page to be used in the future */
+struct res_page {
+	struct res_page *next;
+	char padding[PAGE_SIZE - sizeof(void *)];
+};
+
+static struct res_page *page_list;
+
 static int pfn_is_nosave(unsigned long pfn)
 {
 	unsigned long nosave_begin_pfn = __pa(&__nosave_begin) >> PAGE_SHIFT;
@@ -269,7 +278,7 @@ static int pfn_is_nosave(unsigned long p
 }
 
 /**
- *	saveable - Determine whether a page should be cloned or not.
+ *	saveable - Determine whether a page should be saved or not.
  *	@pfn:	The page
  *
  *	We save a page if it's Reserved, and not in the range of pages
@@ -277,9 +286,8 @@ static int pfn_is_nosave(unsigned long p
  *	isn't part of a free chunk of pages.
  */
 
-static int saveable(struct zone *zone, unsigned long *zone_pfn)
+static int saveable(unsigned long pfn)
 {
-	unsigned long pfn = *zone_pfn + zone->zone_start_pfn;
 	struct page *page;
 
 	if (!pfn_valid(pfn))
@@ -296,46 +304,103 @@ static int saveable(struct zone *zone, u
 	return 1;
 }
 
-unsigned int count_data_pages(void)
+/**
+ *	count_data_pages - returns the number of pages that need to be copied
+ *	in order to be included in the suspend snapshot image.  If @total is
+ *	not null, the total number of pages that should be included in the
+ *	snapshot image is stored in the location pointed to by it.
+ */
+
+unsigned int count_data_pages(unsigned int *total)
 {
 	struct zone *zone;
 	unsigned long zone_pfn;
-	unsigned int n = 0;
+	unsigned int n, m;
 
+	n = m = 0;
 	for_each_zone (zone) {
 		if (is_highmem(zone))
 			continue;
 		mark_free_pages(zone);
-		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn)
-			n += saveable(zone, &zone_pfn);
+		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
+			unsigned long pfn = zone_pfn + zone->zone_start_pfn;
+
+			if (saveable(pfn)) {
+				n++;
+				if (!page_to_copy(pfn_to_page(pfn)))
+					m++;
+			}
+		}
 	}
-	return n;
+	if (total)
+		*total = n;
+
+	return n - m;
 }
 
+/**
+ *	copy_data_pages - populates the page backup list @pblist with
+ *	the addresses of the pages that should be included in the
+ *	suspend snapshot image.  The pages that cannot be included in the
+ *	image without copying are copied into empty page frames allocated
+ *	earlier and available from the list page_list (the addresses of
+ *	these page frames are also stored in the page backup list).
+ *
+ *	This function is only called from the critical section, ie. when
+ *	processes are frozen, there's only one CPU online and local IRQs
+ *	are disabled on it.
+ */
+
 static void copy_data_pages(struct pbe *pblist)
 {
 	struct zone *zone;
 	unsigned long zone_pfn;
 	struct pbe *pbe, *p;
+	struct res_page *ptr;
 
 	pbe = pblist;
 	for_each_zone (zone) {
 		if (is_highmem(zone))
 			continue;
+
 		mark_free_pages(zone);
-		/* This is necessary for swsusp_free() */
+		/* This is necessary for free_image() */
 		for_each_pb_page (p, pblist)
 			SetPageNosaveFree(virt_to_page(p));
+
 		for_each_pbe (p, pblist)
-			SetPageNosaveFree(virt_to_page(p->address));
+			if (p->address && p->orig_address != p->address)
+				SetPageNosaveFree(virt_to_page(p->address));
+
+		ptr = page_list;
+		while (ptr) {
+			SetPageNosaveFree(virt_to_page(ptr));
+			ptr = ptr->next;
+		}
+
 		for (zone_pfn = 0; zone_pfn < zone->spanned_pages; ++zone_pfn) {
-			if (saveable(zone, &zone_pfn)) {
+			unsigned long pfn = zone_pfn + zone->zone_start_pfn;
+
+			if (saveable(pfn)) {
 				struct page *page;
-				page = pfn_to_page(zone_pfn + zone->zone_start_pfn);
+
 				BUG_ON(!pbe);
+				page = pfn_to_page(pfn);
 				pbe->orig_address = (unsigned long)page_address(page);
-				/* copy_page is not usable for copying task structs. */
-				memcpy((void *)pbe->address, (void *)pbe->orig_address, PAGE_SIZE);
+				if (page_to_copy(page)) {
+					BUG_ON(!page_list);
+					pbe->address = (unsigned long)page_list;
+					page_list = page_list->next;
+					/*
+					 * copy_page is not usable for copying
+					 * task structs.
+					 */
+					memcpy((void *)pbe->address,
+						(void *)pbe->orig_address,
+						PAGE_SIZE);
+				} else {
+					pbe->address = pbe->orig_address;
+				}
 				pbe = pbe->next;
 			}
 		}
@@ -419,7 +484,7 @@ static inline void *alloc_image_page(gfp
 	res = (void *)get_zeroed_page(gfp_mask);
 	if (safe_needed)
 		while (res && PageNosaveFree(virt_to_page(res))) {
-			/* The page is unsafe, mark it for swsusp_free() */
+			/* The page is unsafe, mark it for free_image() */
 			SetPageNosave(virt_to_page(res));
 			unsafe_pages++;
 			res = (void *)get_zeroed_page(gfp_mask);
@@ -474,11 +539,32 @@ static struct pbe *alloc_pagedir(unsigne
 }
 
 /**
+ *	protect_data_pages - move data pages that need to be protected from
+ *	being reclaimed (ie. those included in the suspend image without
+ *	copying) out of their respective LRU lists.  This is done after the
+ *	image has been created so the LRU lists only have to be restored if
+ *	the suspend fails.
+ *
+ *	This function is only called from the critical section, ie. when
+ *	processes are frozen, there's only one CPU online and local IRQs
+ *	are disabled on it.
+ */
+
+static void protect_data_pages(struct pbe *pblist)
+{
+	struct pbe *p;
+
+	for_each_pbe (p, pblist)
+		if (p->address == p->orig_address)
+			move_out_of_lru(virt_to_page(p->address));
+}
+
+/**
  * Free pages we allocated for suspend. Suspend pages are alocated
  * before atomic copy, so we need to free them after resume.
  */
 
-void swsusp_free(void)
+void free_image(void)
 {
 	struct zone *zone;
 	unsigned long zone_pfn;
@@ -501,6 +587,11 @@ void swsusp_free(void)
 	buffer = NULL;
 }
 
+void swsusp_free(void)
+{
+	free_image();
+	restore_active_inactive_lists();
+}
 
 /**
  *	enough_free_mem - Make sure we enough free memory to snapshot.
@@ -509,32 +600,33 @@ void swsusp_free(void)
  *	free pages.
  */
 
-static int enough_free_mem(unsigned int nr_pages)
+static int enough_free_mem(unsigned int nr_pages, unsigned int copy_pages)
 {
 	struct zone *zone;
-	unsigned int n = 0;
+	long n = 0;
+
+	pr_debug("swsusp: pages needed: %u + %lu + %lu\n",
+		 copy_pages,
+		 (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE,
+		 EXTRA_PAGES);
 
 	for_each_zone (zone)
-		if (!is_highmem(zone))
+		if (!is_highmem(zone) && populated_zone(zone)) {
 			n += zone->free_pages;
-	pr_debug("swsusp: available memory: %u pages\n", n);
-	return n > (nr_pages + PAGES_FOR_IO +
-		(nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
-}
-
-static int alloc_data_pages(struct pbe *pblist, gfp_t gfp_mask, int safe_needed)
-{
-	struct pbe *p;
+			/*
+			 * We're going to use atomic allocations, so we
+			 * shouldn't count the lowmem reserves in the lower
+			 * zones as available to us
+			 */
+			n -= zone->lowmem_reserve[ZONE_NORMAL];
+		}
 
-	for_each_pbe (p, pblist) {
-		p->address = (unsigned long)alloc_image_page(gfp_mask, safe_needed);
-		if (!p->address)
-			return -ENOMEM;
-	}
-	return 0;
+	pr_debug("swsusp: available memory: %ld pages\n", n);
+	return n > (long)(copy_pages + EXTRA_PAGES +
+		(nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE);
 }
 
-static struct pbe *swsusp_alloc(unsigned int nr_pages)
+static struct pbe *swsusp_alloc(unsigned int nr_pages, unsigned int copy_pages)
 {
 	struct pbe *pblist;
 
@@ -543,10 +635,19 @@ static struct pbe *swsusp_alloc(unsigned
 		return NULL;
 	}
 
-	if (alloc_data_pages(pblist, GFP_ATOMIC | __GFP_COLD, 0)) {
-		printk(KERN_ERR "suspend: Allocating image pages failed.\n");
-		swsusp_free();
-		return NULL;
+	page_list = NULL;
+	while (copy_pages--) {
+		struct res_page *ptr;
+
+		ptr = alloc_image_page(GFP_ATOMIC | __GFP_COLD, 0);
+		if (!ptr) {
+			printk(KERN_ERR
+				"suspend: Allocating image pages failed.\n");
+			free_image();
+			return NULL;
+		}
+		ptr->next = page_list;
+		page_list = ptr;
 	}
 
 	return pblist;
@@ -554,25 +655,21 @@ static struct pbe *swsusp_alloc(unsigned
 
 asmlinkage int swsusp_save(void)
 {
-	unsigned int nr_pages;
+	unsigned int nr_pages, copy_pages;
 
 	pr_debug("swsusp: critical section: \n");
 
 	drain_local_pages();
-	nr_pages = count_data_pages();
-	printk("swsusp: Need to copy %u pages\n", nr_pages);
-
-	pr_debug("swsusp: pages needed: %u + %lu + %u, free: %u\n",
-		 nr_pages,
-		 (nr_pages + PBES_PER_PAGE - 1) / PBES_PER_PAGE,
-		 PAGES_FOR_IO, nr_free_pages());
+	copy_pages = count_data_pages(&nr_pages);
+	printk("swsusp: Need to save %u pages\n", nr_pages);
+	printk("swsusp: %u pages to copy\n", copy_pages);
 
-	if (!enough_free_mem(nr_pages)) {
+	if (!enough_free_mem(nr_pages, copy_pages)) {
 		printk(KERN_ERR "swsusp: Not enough free memory\n");
 		return -ENOMEM;
 	}
 
-	pagedir_nosave = swsusp_alloc(nr_pages);
+	pagedir_nosave = swsusp_alloc(nr_pages, copy_pages);
 	if (!pagedir_nosave)
 		return -ENOMEM;
 
@@ -582,6 +679,9 @@ asmlinkage int swsusp_save(void)
 	drain_local_pages();
 	copy_data_pages(pagedir_nosave);
 
+	/* Make sure the pages that we have not copied won't be reclaimed */
+	protect_data_pages(pagedir_nosave);
+
 	/*
 	 * End of critical section. From now on, we can write to memory,
 	 * but we should not touch disk. This specially means we must _not_
@@ -591,7 +691,7 @@ asmlinkage int swsusp_save(void)
 	nr_copy_pages = nr_pages;
 	nr_meta_pages = (nr_pages * sizeof(long) + PAGE_SIZE - 1) >> PAGE_SHIFT;
 
-	printk("swsusp: critical section/: done (%d pages copied)\n", nr_pages);
+	printk("swsusp: critical section/: done (%d pages)\n", nr_pages);
 	return 0;
 }
 
@@ -654,7 +754,7 @@ int snapshot_read_next(struct snapshot_h
 	if (handle->page > nr_meta_pages + nr_copy_pages)
 		return 0;
 	if (!buffer) {
-		/* This makes the buffer be freed by swsusp_free() */
+		/* This makes the buffer be freed by free_image() */
 		buffer = alloc_image_page(GFP_ATOMIC, 0);
 		if (!buffer)
 			return -ENOMEM;
@@ -810,13 +910,6 @@ static inline struct pbe *unpack_orig_ad
  *	of "safe" which will be used later
  */
 
-struct safe_page {
-	struct safe_page *next;
-	char padding[PAGE_SIZE - sizeof(void *)];
-};
-
-static struct safe_page *safe_pages;
-
 static int prepare_image(struct snapshot_handle *handle)
 {
 	int error = 0;
@@ -833,21 +926,21 @@ static int prepare_image(struct snapshot
 		if (!pblist)
 			error = -ENOMEM;
 	}
-	safe_pages = NULL;
+	page_list = NULL;
 	if (!error && nr_pages > unsafe_pages) {
 		nr_pages -= unsafe_pages;
 		while (nr_pages--) {
-			struct safe_page *ptr;
+			struct res_page *ptr;
 
-			ptr = (struct safe_page *)get_zeroed_page(GFP_ATOMIC);
+			ptr = (struct res_page *)get_zeroed_page(GFP_ATOMIC);
 			if (!ptr) {
 				error = -ENOMEM;
 				break;
 			}
 			if (!PageNosaveFree(virt_to_page(ptr))) {
 				/* The page is "safe", add it to the list */
-				ptr->next = safe_pages;
-				safe_pages = ptr;
+				ptr->next = page_list;
+				page_list = ptr;
 			}
 			/* Mark the page as allocated */
 			SetPageNosave(virt_to_page(ptr));
@@ -858,7 +951,7 @@ static int prepare_image(struct snapshot
 		pagedir_nosave = pblist;
 	} else {
 		handle->pbe = NULL;
-		swsusp_free();
+		free_image();
 	}
 	return error;
 }
@@ -882,8 +975,8 @@ static void *get_buffer(struct snapshot_
 	 * The "original" page frame has not been allocated and we have to
 	 * use a "safe" page frame to store the read page
 	 */
-	pbe->address = (unsigned long)safe_pages;
-	safe_pages = safe_pages->next;
+	pbe->address = (unsigned long)page_list;
+	page_list = page_list->next;
 	if (last)
 		last->next = pbe;
 	handle->last_pbe = pbe;
@@ -919,7 +1012,7 @@ int snapshot_write_next(struct snapshot_
 	if (handle->prev && handle->page > nr_meta_pages + nr_copy_pages)
 		return 0;
 	if (!buffer) {
-		/* This makes the buffer be freed by swsusp_free() */
+		/* This makes the buffer be freed by free_image() */
 		buffer = alloc_image_page(GFP_ATOMIC, 0);
 		if (!buffer)
 			return -ENOMEM;
Index: linux-2.6.17-rc2-mm1/kernel/power/power.h
===================================================================
--- linux-2.6.17-rc2-mm1.orig/kernel/power/power.h
+++ linux-2.6.17-rc2-mm1/kernel/power/power.h
@@ -48,7 +48,14 @@ extern dev_t swsusp_resume_device;
 extern asmlinkage int swsusp_arch_suspend(void);
 extern asmlinkage int swsusp_arch_resume(void);
 
-extern unsigned int count_data_pages(void);
+#if (defined(CONFIG_BLK_DEV_INITRD) && defined(CONFIG_BLK_DEV_RAM))
+#define EXTRA_PAGES	(PAGES_FOR_IO + \
+			(2 * CONFIG_BLK_DEV_RAM_SIZE * 1024) / PAGE_SIZE)
+#else
+#define EXTRA_PAGES	PAGES_FOR_IO
+#endif
+
+extern unsigned int count_data_pages(unsigned int *total);
 
 struct snapshot_handle {
 	loff_t		offset;
@@ -111,6 +118,7 @@ extern int restore_special_mem(void);
 
 extern int swsusp_check(void);
 extern int swsusp_shrink_memory(void);
+extern void free_image(void);
 extern void swsusp_free(void);
 extern int swsusp_suspend(void);
 extern int swsusp_resume(void);
Index: linux-2.6.17-rc2-mm1/kernel/power/swsusp.c
===================================================================
--- linux-2.6.17-rc2-mm1.orig/kernel/power/swsusp.c
+++ linux-2.6.17-rc2-mm1/kernel/power/swsusp.c
@@ -177,28 +177,29 @@ int swsusp_shrink_memory(void)
 	long size, tmp;
 	struct zone *zone;
 	unsigned long pages = 0;
-	unsigned int i = 0;
+	unsigned int to_save, i = 0;
 	char *p = "-\\|/";
 
 	printk("Shrinking memory...  ");
 	do {
 		size = 2 * count_special_pages();
-		size += size / 50 + count_data_pages();
-		size += (size + PBES_PER_PAGE - 1) / PBES_PER_PAGE +
-			PAGES_FOR_IO;
+		size += size / 50 + count_data_pages(&to_save);
+		size += (to_save + PBES_PER_PAGE - 1) / PBES_PER_PAGE +
+			EXTRA_PAGES;
 		tmp = size;
 		for_each_zone (zone)
 			if (!is_highmem(zone) && populated_zone(zone)) {
 				tmp -= zone->free_pages;
 				tmp += zone->lowmem_reserve[ZONE_NORMAL];
 			}
+		size = to_save * (PAGE_SIZE + sizeof(long));
 		if (tmp > 0) {
 			tmp = __shrink_memory(tmp);
 			if (!tmp)
 				return -ENOMEM;
 			pages += tmp;
-		} else if (size > image_size / PAGE_SIZE) {
-			tmp = __shrink_memory(size - (image_size / PAGE_SIZE));
+		} else if (size > image_size) {
+			tmp = __shrink_memory(size - image_size);
 			pages += tmp;
 		}
 		printk("\b%c", p[i++%4]);
@@ -261,7 +262,7 @@ int swsusp_resume(void)
 	 * very tight, so we have to free it as soon as we can to avoid
 	 * subsequent failures
 	 */
-	swsusp_free();
+	free_image();
 	restore_processor_state();
 	restore_special_mem();
 	touch_softlockup_watchdog();
Index: linux-2.6.17-rc2-mm1/kernel/power/user.c
===================================================================
--- linux-2.6.17-rc2-mm1.orig/kernel/power/user.c
+++ linux-2.6.17-rc2-mm1/kernel/power/user.c
@@ -153,6 +153,10 @@ static int snapshot_ioctl(struct inode *
 	case SNAPSHOT_UNFREEZE:
 		if (!data->frozen)
 			break;
+		if (data->ready && in_suspend) {
+			error = -EPERM;
+			break;
+		}
 		down(&pm_sem);
 		thaw_processes();
 		enable_nonboot_cpus();
Index: linux-2.6.17-rc2-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.17-rc2-mm1.orig/mm/vmscan.c
+++ linux-2.6.17-rc2-mm1/mm/vmscan.c
@@ -1437,7 +1437,68 @@ out:
 
 	return ret;
 }
-#endif
+
+static LIST_HEAD(saved_active_pages);
+static LIST_HEAD(saved_inactive_pages);
+
+/**
+ *	move_out_of_lru - software suspend includes some pages in the
+ *	suspend snapshot image without copying and these pages should be
+ *	procected from being reclaimed, which can be done by (temporarily)
+ *	moving them out of their respective LRU lists
+ *
+ *	It is to be called with local IRQs disabled.
+ */
+
+void move_out_of_lru(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+
+	spin_lock(&zone->lru_lock);
+	if (PageActive(page)) {
+		del_page_from_active_list(zone, page);
+		list_add(&page->lru, &saved_active_pages);
+	} else {
+		del_page_from_inactive_list(zone, page);
+		list_add(&page->lru, &saved_inactive_pages);
+	}
+	ClearPageLRU(page);
+	spin_unlock(&zone->lru_lock);
+}
+
+
+/**
+ *	restore_active_inactive_lists - used by the software suspend to move
+ *	the pages taken out of the LRU by take_page_out_of_lru() back to
+ *	their respective active/inactive lists (if the suspend fails)
+ */
+
+void restore_active_inactive_lists(void)
+{
+	struct page *page;
+	struct zone *zone;
+
+	while(!list_empty(&saved_active_pages)) {
+		page = lru_to_page(&saved_active_pages);
+		zone = page_zone(page);
+		list_del(&page->lru);
+		spin_lock_irq(&zone->lru_lock);
+		SetPageLRU(page);
+		add_page_to_active_list(zone, page);
+		spin_unlock_irq(&zone->lru_lock);
+	}
+
+	while(!list_empty(&saved_inactive_pages)) {
+		page = lru_to_page(&saved_inactive_pages);
+		zone = page_zone(page);
+		list_del(&page->lru);
+		spin_lock_irq(&zone->lru_lock);
+		SetPageLRU(page);
+		add_page_to_inactive_list(zone, page);
+		spin_unlock_irq(&zone->lru_lock);
+	}
+}
+#endif /* CONFIG_PM */
 
 #ifdef CONFIG_HOTPLUG_CPU
 /* It's optimal to keep kswapds on the same CPUs as their memory, but
Index: linux-2.6.17-rc2-mm1/include/linux/swap.h
===================================================================
--- linux-2.6.17-rc2-mm1.orig/include/linux/swap.h
+++ linux-2.6.17-rc2-mm1/include/linux/swap.h
@@ -184,7 +184,9 @@ extern void swap_setup(void);
 
 /* linux/mm/vmscan.c */
 extern unsigned long try_to_free_pages(struct zone **, gfp_t);
-extern unsigned long shrink_all_memory(unsigned long nr_pages);
+extern unsigned long shrink_all_memory(unsigned long);
+extern void move_out_of_lru(struct page *page);
+extern void restore_active_inactive_lists(void);
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-27 15:27           ` Rafael J. Wysocki
  (?)
@ 2006-04-27 20:55           ` Pavel Machek
  2006-04-28  9:19               ` Rafael J. Wysocki
  -1 siblings, 1 reply; 53+ messages in thread
From: Pavel Machek @ 2006-04-27 20:55 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Linux PM, LKML, Nigel Cunningham

Hi!
> On Tuesday 25 April 2006 12:31, Rafael J. Wysocki wrote:
> > On Tuesday 25 April 2006 12:04, Pavel Machek wrote:
> > > > > Okay, so it can be done, and patch does not look too bad. It still
> > > > > scares me. Is 800MB image more responsive than 500MB after resume?
> > > > 
> > > > Yes, it is, slightly, but I think 800 meg images are impractical for
> > > > performance reasons (like IMO everything above 500 meg with the current
> > > > hardware).  However this means we can save 80% of RAM with the patch
> > > > and that should be 400 meg instead of 250 on a 500 meg machine, or
> > > > 200 meg instead of 125 on a 250 meg machine.
> > > 
> > > Could we get few people trying it on such small machines to see if it
> > > is really that noticeable?
> > 
> > OK, I'll try to run some tests on a machine with smaller RAM (and slower CPU).
> 
> Done, although it was not so easy to find the box.  This was a PII 350MHz w/
> 256 MB of RAM.
> 
> I invented the following test:
> - ran KDE with 4 desktops,
> - ran Firefox, OpenOffice.org 1.1 (with a simple spreadsheet), and GIMP (with
> 2 pictures) each on its own desktop,
> - ran the memory meter from the KDE's Info Center and two konsoles
> on the remaining desktop - one konsole with a kernel compilation and the
> other with a root session used for suspending the box (the built-in swsusp
> was used),
> so the box's RAM was almost fully occupied with ~30% taken by the page cache.
> 
> Then I suspended the box and measured the time from the start of resume
> (ie. leaving GRUB) to the point at which I had all of the application windows
> fully rendered on their respective desktops (I always switched the desktops
> in the same order, starting from the memory meter's one, through the OOo's
> and Firefox's, finishing on the GIMP's one and I always switched the
> desktop as soon as the window(s) on it were fully rendered).
> 
> I ran it a couple of times on the 2.6.17-rc1-mm2 kernel with and without
> the patch and the results (on the average) are the following:
> 
> (a) 25-28s with the patch
> (b) 30-33s without it

Ook, thanks for testing. I guess it is ready for -mm when Nick is
happy with it ...
							Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-27 20:55           ` Pavel Machek
@ 2006-04-28  9:19               ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-28  9:19 UTC (permalink / raw
  To: Pavel Machek; +Cc: Linux PM, LKML, Nigel Cunningham

Hi,

On Thursday 27 April 2006 22:55, Pavel Machek wrote:
> > On Tuesday 25 April 2006 12:31, Rafael J. Wysocki wrote:
> > > On Tuesday 25 April 2006 12:04, Pavel Machek wrote:
> > > > > > Okay, so it can be done, and patch does not look too bad. It still
> > > > > > scares me. Is 800MB image more responsive than 500MB after resume?
> > > > > 
> > > > > Yes, it is, slightly, but I think 800 meg images are impractical for
> > > > > performance reasons (like IMO everything above 500 meg with the current
> > > > > hardware).  However this means we can save 80% of RAM with the patch
> > > > > and that should be 400 meg instead of 250 on a 500 meg machine, or
> > > > > 200 meg instead of 125 on a 250 meg machine.
> > > > 
> > > > Could we get few people trying it on such small machines to see if it
> > > > is really that noticeable?
> > > 
> > > OK, I'll try to run some tests on a machine with smaller RAM (and slower CPU).
> > 
> > Done, although it was not so easy to find the box.  This was a PII 350MHz w/
> > 256 MB of RAM.
> > 
> > I invented the following test:
> > - ran KDE with 4 desktops,
> > - ran Firefox, OpenOffice.org 1.1 (with a simple spreadsheet), and GIMP (with
> > 2 pictures) each on its own desktop,
> > - ran the memory meter from the KDE's Info Center and two konsoles
> > on the remaining desktop - one konsole with a kernel compilation and the
> > other with a root session used for suspending the box (the built-in swsusp
> > was used),
> > so the box's RAM was almost fully occupied with ~30% taken by the page cache.
> > 
> > Then I suspended the box and measured the time from the start of resume
> > (ie. leaving GRUB) to the point at which I had all of the application windows
> > fully rendered on their respective desktops (I always switched the desktops
> > in the same order, starting from the memory meter's one, through the OOo's
> > and Firefox's, finishing on the GIMP's one and I always switched the
> > desktop as soon as the window(s) on it were fully rendered).
> > 
> > I ran it a couple of times on the 2.6.17-rc1-mm2 kernel with and without
> > the patch and the results (on the average) are the following:
> > 
> > (a) 25-28s with the patch
> > (b) 30-33s without it
> 
> Ook, thanks for testing. I guess it is ready for -mm when Nick is
> happy with it ...

OK, I'm waiting for Nick to respond, then. :-)

Still I'd like to change one more thing in the final patch.  Namely,
instead of this:

@@ -153,6 +153,10 @@ static int snapshot_ioctl(struct inode *
        case SNAPSHOT_UNFREEZE:
                if (!data->frozen)
                        break;
+               if (data->ready && in_suspend) {
+                       error = -EPERM;
+                       break;
+               }
                down(&pm_sem);
                thaw_processes();
                enable_nonboot_cpus();

I'd like to do:

 	case SNAPSHOT_UNFREEZE:
 		if (!data->frozen)
 			break;
+               	if (data->ready)
+			swsusp_free();
 		down(&pm_sem);
		thaw_processes();
		enable_nonboot_cpus();

so unfreeze() won't return the error.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-28  9:19               ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-28  9:19 UTC (permalink / raw
  To: Pavel Machek; +Cc: Nigel Cunningham, Linux PM, LKML

Hi,

On Thursday 27 April 2006 22:55, Pavel Machek wrote:
> > On Tuesday 25 April 2006 12:31, Rafael J. Wysocki wrote:
> > > On Tuesday 25 April 2006 12:04, Pavel Machek wrote:
> > > > > > Okay, so it can be done, and patch does not look too bad. It still
> > > > > > scares me. Is 800MB image more responsive than 500MB after resume?
> > > > > 
> > > > > Yes, it is, slightly, but I think 800 meg images are impractical for
> > > > > performance reasons (like IMO everything above 500 meg with the current
> > > > > hardware).  However this means we can save 80% of RAM with the patch
> > > > > and that should be 400 meg instead of 250 on a 500 meg machine, or
> > > > > 200 meg instead of 125 on a 250 meg machine.
> > > > 
> > > > Could we get few people trying it on such small machines to see if it
> > > > is really that noticeable?
> > > 
> > > OK, I'll try to run some tests on a machine with smaller RAM (and slower CPU).
> > 
> > Done, although it was not so easy to find the box.  This was a PII 350MHz w/
> > 256 MB of RAM.
> > 
> > I invented the following test:
> > - ran KDE with 4 desktops,
> > - ran Firefox, OpenOffice.org 1.1 (with a simple spreadsheet), and GIMP (with
> > 2 pictures) each on its own desktop,
> > - ran the memory meter from the KDE's Info Center and two konsoles
> > on the remaining desktop - one konsole with a kernel compilation and the
> > other with a root session used for suspending the box (the built-in swsusp
> > was used),
> > so the box's RAM was almost fully occupied with ~30% taken by the page cache.
> > 
> > Then I suspended the box and measured the time from the start of resume
> > (ie. leaving GRUB) to the point at which I had all of the application windows
> > fully rendered on their respective desktops (I always switched the desktops
> > in the same order, starting from the memory meter's one, through the OOo's
> > and Firefox's, finishing on the GIMP's one and I always switched the
> > desktop as soon as the window(s) on it were fully rendered).
> > 
> > I ran it a couple of times on the 2.6.17-rc1-mm2 kernel with and without
> > the patch and the results (on the average) are the following:
> > 
> > (a) 25-28s with the patch
> > (b) 30-33s without it
> 
> Ook, thanks for testing. I guess it is ready for -mm when Nick is
> happy with it ...

OK, I'm waiting for Nick to respond, then. :-)

Still I'd like to change one more thing in the final patch.  Namely,
instead of this:

@@ -153,6 +153,10 @@ static int snapshot_ioctl(struct inode *
        case SNAPSHOT_UNFREEZE:
                if (!data->frozen)
                        break;
+               if (data->ready && in_suspend) {
+                       error = -EPERM;
+                       break;
+               }
                down(&pm_sem);
                thaw_processes();
                enable_nonboot_cpus();

I'd like to do:

 	case SNAPSHOT_UNFREEZE:
 		if (!data->frozen)
 			break;
+               	if (data->ready)
+			swsusp_free();
 		down(&pm_sem);
		thaw_processes();
		enable_nonboot_cpus();

so unfreeze() won't return the error.

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-28  9:19               ` Rafael J. Wysocki
@ 2006-04-28  9:23                 ` Pavel Machek
  -1 siblings, 0 replies; 53+ messages in thread
From: Pavel Machek @ 2006-04-28  9:23 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Linux PM, LKML, Nigel Cunningham

Hi!

> OK, I'm waiting for Nick to respond, then. :-)
> 
> Still I'd like to change one more thing in the final patch.  Namely,
> instead of this:
> 
> @@ -153,6 +153,10 @@ static int snapshot_ioctl(struct inode *
>         case SNAPSHOT_UNFREEZE:
>                 if (!data->frozen)
>                         break;
> +               if (data->ready && in_suspend) {
> +                       error = -EPERM;
> +                       break;
> +               }
>                 down(&pm_sem);
>                 thaw_processes();
>                 enable_nonboot_cpus();
> 
> I'd like to do:
> 
>  	case SNAPSHOT_UNFREEZE:
>  		if (!data->frozen)
>  			break;
> +               	if (data->ready)
> +			swsusp_free();
>  		down(&pm_sem);
> 		thaw_processes();
> 		enable_nonboot_cpus();
> 
> so unfreeze() won't return the error.

Seems okay to me...
						Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-28  9:23                 ` Pavel Machek
  0 siblings, 0 replies; 53+ messages in thread
From: Pavel Machek @ 2006-04-28  9:23 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Nigel Cunningham, Linux PM, LKML

Hi!

> OK, I'm waiting for Nick to respond, then. :-)
> 
> Still I'd like to change one more thing in the final patch.  Namely,
> instead of this:
> 
> @@ -153,6 +153,10 @@ static int snapshot_ioctl(struct inode *
>         case SNAPSHOT_UNFREEZE:
>                 if (!data->frozen)
>                         break;
> +               if (data->ready && in_suspend) {
> +                       error = -EPERM;
> +                       break;
> +               }
>                 down(&pm_sem);
>                 thaw_processes();
>                 enable_nonboot_cpus();
> 
> I'd like to do:
> 
>  	case SNAPSHOT_UNFREEZE:
>  		if (!data->frozen)
>  			break;
> +               	if (data->ready)
> +			swsusp_free();
>  		down(&pm_sem);
> 		thaw_processes();
> 		enable_nonboot_cpus();
> 
> so unfreeze() won't return the error.

Seems okay to me...
						Pavel
-- 
Thanks for all the (sleeping) penguins.
_______________________________________________
linux-pm mailing list
linux-pm@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/linux-pm

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-26  0:49                 ` Nigel Cunningham
@ 2006-04-30 12:27                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-30 12:27 UTC (permalink / raw
  To: Nigel Cunningham; +Cc: Pavel Machek, Nick Piggin, Linux PM, LKML

Hi,

On Wednesday 26 April 2006 02:49, Nigel Cunningham wrote:
> On Wednesday 26 April 2006 08:43, Rafael J. Wysocki wrote:
> > On Wednesday 26 April 2006 00:25, Pavel Machek wrote:
> > > > > It does apply to all of the LRU pages. This is what I've been doing
> > > > > for years now. The only corner case I've come across is XFS. It still
> > > > > wants to write data even when there's nothing to do and it's threads
> > > > > are frozen (IIRC - haven't looked at it for a while). I got around
> > > > > that by freezing bdevs when freezing processes.
> > > >
> > > > This means if we freeze bdevs, we'll be able to save all of the LRU
> > > > pages, except for the pages mapped by the current task, without
> > > > copying.  I think we can try to do this, but we'll need a patch to
> > > > freeze bdevs for this purpose. ;-)
> > >
> > > ...adding more dependencies to how vm/blockdevs work. I'd say current
> > > code is complex enough...
> >
> > Well, why don't we see the patch?  If it's too complex, we can just decide
> > not to use it. :-)
> 
> In Suspend2, I'm still using a different version of process.c to what you guys 
> have. In my version, I thaw kernelspace, then thaw bdevs, then thaw userspace. 
> The version below just thaws bdevs after thawing all processes. I think that 
> might need modification, but thought I'd post this now so you can see how 
> complicated or otherwise it is.

IMHO it doesn't look so scary. :-)

> diff -ruN linux-2.6.17-rc2/kernel/power/process.c bdev-freeze/kernel/power/process.c
> --- linux-2.6.17-rc2/kernel/power/process.c	2006-04-19 14:27:36.000000000 +1000
> +++ bdev-freeze/kernel/power/process.c	2006-04-26 10:44:56.000000000 +1000
> @@ -19,6 +19,56 @@
>   */
>  #define TIMEOUT	(20 * HZ)
>  
> +struct frozen_fs
> +{
> +	struct list_head fsb_list;
> +	struct super_block *sb;
> +};
> +
> +LIST_HEAD(frozen_fs_list);
> +
> +void freezer_make_fses_rw(void)
> +{
> +	struct frozen_fs *fs, *next_fs;
> +
> +	list_for_each_entry_safe(fs, next_fs, &frozen_fs_list, fsb_list) {
> +		thaw_bdev(fs->sb->s_bdev, fs->sb);
> +
> +		list_del(&fs->fsb_list);
> +		kfree(fs);
> +	}
> +}
> +
> +/* 
> + * Done after userspace is frozen, so there should be no danger of
> + * fses being unmounted while we're in here.
> + */
> +int freezer_make_fses_ro(void)
> +{
> +	struct frozen_fs *fs;
> +	struct super_block *sb;
> +
> +	/* Generate the list */
> +	list_for_each_entry(sb, &super_blocks, s_list) {
> +		if (!sb->s_root || !sb->s_bdev ||
> +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> +		    (sb->s_flags & MS_RDONLY))
> +			continue;
> +
> +		fs = kmalloc(sizeof(struct frozen_fs), GFP_ATOMIC);

Shouldn't we check for kmalloc() failures here?

> +		fs->sb = sb;
> +		list_add_tail(&fs->fsb_list, &frozen_fs_list);
> +	};
> +
> +	/* Do the freezing in reverse order so filesystems dependant
> +	 * upon others are frozen in the right order. (Eg loopback
> +	 * on ext3). */
> +	list_for_each_entry_reverse(fs, &frozen_fs_list, fsb_list)
> +		freeze_bdev(fs->sb->s_bdev);
> +
> +	return 0;
> +}
> +
>  
>  static inline int freezeable(struct task_struct * p)
>  {
> @@ -77,6 +127,7 @@
>  	printk( "Stopping tasks: " );
>  	start_time = jiffies;
>  	user_frozen = 0;
> +	bdevs_frozen = 0;
>  	do {
>  		nr_user = todo = 0;
>  		read_lock(&tasklist_lock);
> @@ -107,6 +158,10 @@
>  			start_time = jiffies;
>  		}
>  		user_frozen = !nr_user;
> +
> +		if (user_frozen && !bdevs_frozen)
> +			freezer_make_fses_ro();
> +
>  		yield();			/* Yield is okay here */
>  		if (todo && time_after(jiffies, start_time + TIMEOUT))
>  			break;
> @@ -156,6 +211,8 @@
>  			printk(KERN_INFO " Strange, %s not stopped\n", p->comm );
>  	} while_each_thread(g, p);
>  
> +	freezer_make_fses_rw();
> +
>  	read_unlock(&tasklist_lock);
>  	schedule();
>  	printk( " done\n" );

Greetings,
Rafael

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-04-30 12:27                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-04-30 12:27 UTC (permalink / raw
  To: Nigel Cunningham; +Cc: Nick Piggin, LKML, Pavel Machek, Linux PM

[-- Attachment #1: Type: text/plain, Size: 3824 bytes --]

Hi,

On Wednesday 26 April 2006 02:49, Nigel Cunningham wrote:
> On Wednesday 26 April 2006 08:43, Rafael J. Wysocki wrote:
> > On Wednesday 26 April 2006 00:25, Pavel Machek wrote:
> > > > > It does apply to all of the LRU pages. This is what I've been doing
> > > > > for years now. The only corner case I've come across is XFS. It still
> > > > > wants to write data even when there's nothing to do and it's threads
> > > > > are frozen (IIRC - haven't looked at it for a while). I got around
> > > > > that by freezing bdevs when freezing processes.
> > > >
> > > > This means if we freeze bdevs, we'll be able to save all of the LRU
> > > > pages, except for the pages mapped by the current task, without
> > > > copying.  I think we can try to do this, but we'll need a patch to
> > > > freeze bdevs for this purpose. ;-)
> > >
> > > ...adding more dependencies to how vm/blockdevs work. I'd say current
> > > code is complex enough...
> >
> > Well, why don't we see the patch?  If it's too complex, we can just decide
> > not to use it. :-)
> 
> In Suspend2, I'm still using a different version of process.c to what you guys 
> have. In my version, I thaw kernelspace, then thaw bdevs, then thaw userspace. 
> The version below just thaws bdevs after thawing all processes. I think that 
> might need modification, but thought I'd post this now so you can see how 
> complicated or otherwise it is.

IMHO it doesn't look so scary. :-)

> diff -ruN linux-2.6.17-rc2/kernel/power/process.c bdev-freeze/kernel/power/process.c
> --- linux-2.6.17-rc2/kernel/power/process.c	2006-04-19 14:27:36.000000000 +1000
> +++ bdev-freeze/kernel/power/process.c	2006-04-26 10:44:56.000000000 +1000
> @@ -19,6 +19,56 @@
>   */
>  #define TIMEOUT	(20 * HZ)
>  
> +struct frozen_fs
> +{
> +	struct list_head fsb_list;
> +	struct super_block *sb;
> +};
> +
> +LIST_HEAD(frozen_fs_list);
> +
> +void freezer_make_fses_rw(void)
> +{
> +	struct frozen_fs *fs, *next_fs;
> +
> +	list_for_each_entry_safe(fs, next_fs, &frozen_fs_list, fsb_list) {
> +		thaw_bdev(fs->sb->s_bdev, fs->sb);
> +
> +		list_del(&fs->fsb_list);
> +		kfree(fs);
> +	}
> +}
> +
> +/* 
> + * Done after userspace is frozen, so there should be no danger of
> + * fses being unmounted while we're in here.
> + */
> +int freezer_make_fses_ro(void)
> +{
> +	struct frozen_fs *fs;
> +	struct super_block *sb;
> +
> +	/* Generate the list */
> +	list_for_each_entry(sb, &super_blocks, s_list) {
> +		if (!sb->s_root || !sb->s_bdev ||
> +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> +		    (sb->s_flags & MS_RDONLY))
> +			continue;
> +
> +		fs = kmalloc(sizeof(struct frozen_fs), GFP_ATOMIC);

Shouldn't we check for kmalloc() failures here?

> +		fs->sb = sb;
> +		list_add_tail(&fs->fsb_list, &frozen_fs_list);
> +	};
> +
> +	/* Do the freezing in reverse order so filesystems dependant
> +	 * upon others are frozen in the right order. (Eg loopback
> +	 * on ext3). */
> +	list_for_each_entry_reverse(fs, &frozen_fs_list, fsb_list)
> +		freeze_bdev(fs->sb->s_bdev);
> +
> +	return 0;
> +}
> +
>  
>  static inline int freezeable(struct task_struct * p)
>  {
> @@ -77,6 +127,7 @@
>  	printk( "Stopping tasks: " );
>  	start_time = jiffies;
>  	user_frozen = 0;
> +	bdevs_frozen = 0;
>  	do {
>  		nr_user = todo = 0;
>  		read_lock(&tasklist_lock);
> @@ -107,6 +158,10 @@
>  			start_time = jiffies;
>  		}
>  		user_frozen = !nr_user;
> +
> +		if (user_frozen && !bdevs_frozen)
> +			freezer_make_fses_ro();
> +
>  		yield();			/* Yield is okay here */
>  		if (todo && time_after(jiffies, start_time + TIMEOUT))
>  			break;
> @@ -156,6 +211,8 @@
>  			printk(KERN_INFO " Strange, %s not stopped\n", p->comm );
>  	} while_each_thread(g, p);
>  
> +	freezer_make_fses_rw();
> +
>  	read_unlock(&tasklist_lock);
>  	schedule();
>  	printk( " done\n" );

Greetings,
Rafael

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-04-30 12:27                     ` Rafael J. Wysocki
@ 2006-05-01  1:49                       ` Nigel Cunningham
  -1 siblings, 0 replies; 53+ messages in thread
From: Nigel Cunningham @ 2006-05-01  1:49 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Pavel Machek, Nick Piggin, Linux PM, LKML

[-- Attachment #1: Type: text/plain, Size: 5025 bytes --]

Hi.

Sorry for the slow response - I only have internet access at work now. This is 
going to be a pattern for the next few weeks - I'm off work next week and.the 
week after I'll also be off apart from Monday and Tuesday (those are my last 
two days working for Cyclades - I then get my sweetheart and little one back, 
and we drive down to Victoria over the rest of the week).

On Sunday 30 April 2006 22:27, Rafael J. Wysocki wrote:
> Hi,
>
> On Wednesday 26 April 2006 02:49, Nigel Cunningham wrote:
> > On Wednesday 26 April 2006 08:43, Rafael J. Wysocki wrote:
> > > On Wednesday 26 April 2006 00:25, Pavel Machek wrote:
> > > > > > It does apply to all of the LRU pages. This is what I've been
> > > > > > doing for years now. The only corner case I've come across is
> > > > > > XFS. It still wants to write data even when there's nothing to do
> > > > > > and it's threads are frozen (IIRC - haven't looked at it for a
> > > > > > while). I got around that by freezing bdevs when freezing
> > > > > > processes.
> > > > >
> > > > > This means if we freeze bdevs, we'll be able to save all of the LRU
> > > > > pages, except for the pages mapped by the current task, without
> > > > > copying.  I think we can try to do this, but we'll need a patch to
> > > > > freeze bdevs for this purpose. ;-)
> > > >
> > > > ...adding more dependencies to how vm/blockdevs work. I'd say current
> > > > code is complex enough...
> > >
> > > Well, why don't we see the patch?  If it's too complex, we can just
> > > decide not to use it. :-)
> >
> > In Suspend2, I'm still using a different version of process.c to what you
> > guys have. In my version, I thaw kernelspace, then thaw bdevs, then thaw
> > userspace. The version below just thaws bdevs after thawing all
> > processes. I think that might need modification, but thought I'd post
> > this now so you can see how complicated or otherwise it is.
>
> IMHO it doesn't look so scary. :-)

:)

> > diff -ruN linux-2.6.17-rc2/kernel/power/process.c
> > bdev-freeze/kernel/power/process.c ---
> > linux-2.6.17-rc2/kernel/power/process.c	2006-04-19 14:27:36.000000000
> > +1000 +++ bdev-freeze/kernel/power/process.c	2006-04-26
> > 10:44:56.000000000 +1000 @@ -19,6 +19,56 @@
> >   */
> >  #define TIMEOUT	(20 * HZ)
> >
> > +struct frozen_fs
> > +{
> > +	struct list_head fsb_list;
> > +	struct super_block *sb;
> > +};
> > +
> > +LIST_HEAD(frozen_fs_list);
> > +
> > +void freezer_make_fses_rw(void)
> > +{
> > +	struct frozen_fs *fs, *next_fs;
> > +
> > +	list_for_each_entry_safe(fs, next_fs, &frozen_fs_list, fsb_list) {
> > +		thaw_bdev(fs->sb->s_bdev, fs->sb);
> > +
> > +		list_del(&fs->fsb_list);
> > +		kfree(fs);
> > +	}
> > +}
> > +
> > +/*
> > + * Done after userspace is frozen, so there should be no danger of
> > + * fses being unmounted while we're in here.
> > + */
> > +int freezer_make_fses_ro(void)
> > +{
> > +	struct frozen_fs *fs;
> > +	struct super_block *sb;
> > +
> > +	/* Generate the list */
> > +	list_for_each_entry(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY))
> > +			continue;
> > +
> > +		fs = kmalloc(sizeof(struct frozen_fs), GFP_ATOMIC);
>
> Shouldn't we check for kmalloc() failures here?

Good point. Just because I've never seen it fail, doesn't mean it can't :)

Before I roll a new version, what did you think splitting the thawing and 
thawing bdevs in the middle? I think it's the right thing (TM) to do :>

Nigel

> > +		fs->sb = sb;
> > +		list_add_tail(&fs->fsb_list, &frozen_fs_list);
> > +	};
> > +
> > +	/* Do the freezing in reverse order so filesystems dependant
> > +	 * upon others are frozen in the right order. (Eg loopback
> > +	 * on ext3). */
> > +	list_for_each_entry_reverse(fs, &frozen_fs_list, fsb_list)
> > +		freeze_bdev(fs->sb->s_bdev);
> > +
> > +	return 0;
> > +}
> > +
> >
> >  static inline int freezeable(struct task_struct * p)
> >  {
> > @@ -77,6 +127,7 @@
> >  	printk( "Stopping tasks: " );
> >  	start_time = jiffies;
> >  	user_frozen = 0;
> > +	bdevs_frozen = 0;
> >  	do {
> >  		nr_user = todo = 0;
> >  		read_lock(&tasklist_lock);
> > @@ -107,6 +158,10 @@
> >  			start_time = jiffies;
> >  		}
> >  		user_frozen = !nr_user;
> > +
> > +		if (user_frozen && !bdevs_frozen)
> > +			freezer_make_fses_ro();
> > +
> >  		yield();			/* Yield is okay here */
> >  		if (todo && time_after(jiffies, start_time + TIMEOUT))
> >  			break;
> > @@ -156,6 +211,8 @@
> >  			printk(KERN_INFO " Strange, %s not stopped\n", p->comm );
> >  	} while_each_thread(g, p);
> >
> > +	freezer_make_fses_rw();
> > +
> >  	read_unlock(&tasklist_lock);
> >  	schedule();
> >  	printk( " done\n" );
>
> Greetings,
> Rafael

-- 
See our web page for Howtos, FAQs, the Wiki and mailing list info.
http://www.suspend2.net                IRC: #suspend2 on Freenode

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
@ 2006-05-01  1:49                       ` Nigel Cunningham
  0 siblings, 0 replies; 53+ messages in thread
From: Nigel Cunningham @ 2006-05-01  1:49 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Nick Piggin, LKML, Pavel Machek, Linux PM


[-- Attachment #1.1: Type: text/plain, Size: 5025 bytes --]

Hi.

Sorry for the slow response - I only have internet access at work now. This is 
going to be a pattern for the next few weeks - I'm off work next week and.the 
week after I'll also be off apart from Monday and Tuesday (those are my last 
two days working for Cyclades - I then get my sweetheart and little one back, 
and we drive down to Victoria over the rest of the week).

On Sunday 30 April 2006 22:27, Rafael J. Wysocki wrote:
> Hi,
>
> On Wednesday 26 April 2006 02:49, Nigel Cunningham wrote:
> > On Wednesday 26 April 2006 08:43, Rafael J. Wysocki wrote:
> > > On Wednesday 26 April 2006 00:25, Pavel Machek wrote:
> > > > > > It does apply to all of the LRU pages. This is what I've been
> > > > > > doing for years now. The only corner case I've come across is
> > > > > > XFS. It still wants to write data even when there's nothing to do
> > > > > > and it's threads are frozen (IIRC - haven't looked at it for a
> > > > > > while). I got around that by freezing bdevs when freezing
> > > > > > processes.
> > > > >
> > > > > This means if we freeze bdevs, we'll be able to save all of the LRU
> > > > > pages, except for the pages mapped by the current task, without
> > > > > copying.  I think we can try to do this, but we'll need a patch to
> > > > > freeze bdevs for this purpose. ;-)
> > > >
> > > > ...adding more dependencies to how vm/blockdevs work. I'd say current
> > > > code is complex enough...
> > >
> > > Well, why don't we see the patch?  If it's too complex, we can just
> > > decide not to use it. :-)
> >
> > In Suspend2, I'm still using a different version of process.c to what you
> > guys have. In my version, I thaw kernelspace, then thaw bdevs, then thaw
> > userspace. The version below just thaws bdevs after thawing all
> > processes. I think that might need modification, but thought I'd post
> > this now so you can see how complicated or otherwise it is.
>
> IMHO it doesn't look so scary. :-)

:)

> > diff -ruN linux-2.6.17-rc2/kernel/power/process.c
> > bdev-freeze/kernel/power/process.c ---
> > linux-2.6.17-rc2/kernel/power/process.c	2006-04-19 14:27:36.000000000
> > +1000 +++ bdev-freeze/kernel/power/process.c	2006-04-26
> > 10:44:56.000000000 +1000 @@ -19,6 +19,56 @@
> >   */
> >  #define TIMEOUT	(20 * HZ)
> >
> > +struct frozen_fs
> > +{
> > +	struct list_head fsb_list;
> > +	struct super_block *sb;
> > +};
> > +
> > +LIST_HEAD(frozen_fs_list);
> > +
> > +void freezer_make_fses_rw(void)
> > +{
> > +	struct frozen_fs *fs, *next_fs;
> > +
> > +	list_for_each_entry_safe(fs, next_fs, &frozen_fs_list, fsb_list) {
> > +		thaw_bdev(fs->sb->s_bdev, fs->sb);
> > +
> > +		list_del(&fs->fsb_list);
> > +		kfree(fs);
> > +	}
> > +}
> > +
> > +/*
> > + * Done after userspace is frozen, so there should be no danger of
> > + * fses being unmounted while we're in here.
> > + */
> > +int freezer_make_fses_ro(void)
> > +{
> > +	struct frozen_fs *fs;
> > +	struct super_block *sb;
> > +
> > +	/* Generate the list */
> > +	list_for_each_entry(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY))
> > +			continue;
> > +
> > +		fs = kmalloc(sizeof(struct frozen_fs), GFP_ATOMIC);
>
> Shouldn't we check for kmalloc() failures here?

Good point. Just because I've never seen it fail, doesn't mean it can't :)

Before I roll a new version, what did you think splitting the thawing and 
thawing bdevs in the middle? I think it's the right thing (TM) to do :>

Nigel

> > +		fs->sb = sb;
> > +		list_add_tail(&fs->fsb_list, &frozen_fs_list);
> > +	};
> > +
> > +	/* Do the freezing in reverse order so filesystems dependant
> > +	 * upon others are frozen in the right order. (Eg loopback
> > +	 * on ext3). */
> > +	list_for_each_entry_reverse(fs, &frozen_fs_list, fsb_list)
> > +		freeze_bdev(fs->sb->s_bdev);
> > +
> > +	return 0;
> > +}
> > +
> >
> >  static inline int freezeable(struct task_struct * p)
> >  {
> > @@ -77,6 +127,7 @@
> >  	printk( "Stopping tasks: " );
> >  	start_time = jiffies;
> >  	user_frozen = 0;
> > +	bdevs_frozen = 0;
> >  	do {
> >  		nr_user = todo = 0;
> >  		read_lock(&tasklist_lock);
> > @@ -107,6 +158,10 @@
> >  			start_time = jiffies;
> >  		}
> >  		user_frozen = !nr_user;
> > +
> > +		if (user_frozen && !bdevs_frozen)
> > +			freezer_make_fses_ro();
> > +
> >  		yield();			/* Yield is okay here */
> >  		if (todo && time_after(jiffies, start_time + TIMEOUT))
> >  			break;
> > @@ -156,6 +211,8 @@
> >  			printk(KERN_INFO " Strange, %s not stopped\n", p->comm );
> >  	} while_each_thread(g, p);
> >
> > +	freezer_make_fses_rw();
> > +
> >  	read_unlock(&tasklist_lock);
> >  	schedule();
> >  	printk( " done\n" );
>
> Greetings,
> Rafael

-- 
See our web page for Howtos, FAQs, the Wiki and mailing list info.
http://www.suspend2.net                IRC: #suspend2 on Freenode

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-05-01  1:49                       ` Nigel Cunningham
  (?)
@ 2006-05-01 11:20                       ` Rafael J. Wysocki
  2006-05-01 22:56                         ` Nigel Cunningham
  -1 siblings, 1 reply; 53+ messages in thread
From: Rafael J. Wysocki @ 2006-05-01 11:20 UTC (permalink / raw
  To: Nigel Cunningham; +Cc: Linux PM, Pavel Machek

[-- Attachment #1: Type: text/plain, Size: 3995 bytes --]

[Dropped Nick and LKML from the Cc list.]

Hi,

On Monday 01 May 2006 03:49, Nigel Cunningham wrote:
> Hi.
> 
> Sorry for the slow response - I only have internet access at work now.

No problem at all. :-)

> This is  going to be a pattern for the next few weeks - I'm off work next
> week and.the week after I'll also be off apart from Monday and Tuesday
> (those are my last two days working for Cyclades - I then get my sweetheart
> and little one back, and we drive down to Victoria over the rest of the week).
> 
> On Sunday 30 April 2006 22:27, Rafael J. Wysocki wrote:
> > On Wednesday 26 April 2006 02:49, Nigel Cunningham wrote:
> > > On Wednesday 26 April 2006 08:43, Rafael J. Wysocki wrote:
> > > > On Wednesday 26 April 2006 00:25, Pavel Machek wrote:
> > > > > > > It does apply to all of the LRU pages. This is what I've been
> > > > > > > doing for years now. The only corner case I've come across is
> > > > > > > XFS. It still wants to write data even when there's nothing to do
> > > > > > > and it's threads are frozen (IIRC - haven't looked at it for a
> > > > > > > while). I got around that by freezing bdevs when freezing
> > > > > > > processes.
> > > > > >
> > > > > > This means if we freeze bdevs, we'll be able to save all of the LRU
> > > > > > pages, except for the pages mapped by the current task, without
> > > > > > copying.  I think we can try to do this, but we'll need a patch to
> > > > > > freeze bdevs for this purpose. ;-)
> > > > >
> > > > > ...adding more dependencies to how vm/blockdevs work. I'd say current
> > > > > code is complex enough...
> > > >
> > > > Well, why don't we see the patch?  If it's too complex, we can just
> > > > decide not to use it. :-)
> > >
> > > In Suspend2, I'm still using a different version of process.c to what you
> > > guys have. In my version, I thaw kernelspace, then thaw bdevs, then thaw
> > > userspace. The version below just thaws bdevs after thawing all
> > > processes. I think that might need modification, but thought I'd post
> > > this now so you can see how complicated or otherwise it is.
> >
> > IMHO it doesn't look so scary. :-)
> 
> :)
> 
> > > diff -ruN linux-2.6.17-rc2/kernel/power/process.c
> > > bdev-freeze/kernel/power/process.c ---
> > > linux-2.6.17-rc2/kernel/power/process.c	2006-04-19 14:27:36.000000000
> > > +1000 +++ bdev-freeze/kernel/power/process.c	2006-04-26
> > > 10:44:56.000000000 +1000 @@ -19,6 +19,56 @@
> > >   */
> > >  #define TIMEOUT	(20 * HZ)
> > >
> > > +struct frozen_fs
> > > +{
> > > +	struct list_head fsb_list;
> > > +	struct super_block *sb;
> > > +};
> > > +
> > > +LIST_HEAD(frozen_fs_list);
> > > +
> > > +void freezer_make_fses_rw(void)
> > > +{
> > > +	struct frozen_fs *fs, *next_fs;
> > > +
> > > +	list_for_each_entry_safe(fs, next_fs, &frozen_fs_list, fsb_list) {
> > > +		thaw_bdev(fs->sb->s_bdev, fs->sb);
> > > +
> > > +		list_del(&fs->fsb_list);
> > > +		kfree(fs);
> > > +	}
> > > +}
> > > +
> > > +/*
> > > + * Done after userspace is frozen, so there should be no danger of
> > > + * fses being unmounted while we're in here.
> > > + */
> > > +int freezer_make_fses_ro(void)
> > > +{
> > > +	struct frozen_fs *fs;
> > > +	struct super_block *sb;
> > > +
> > > +	/* Generate the list */
> > > +	list_for_each_entry(sb, &super_blocks, s_list) {
> > > +		if (!sb->s_root || !sb->s_bdev ||
> > > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > > +		    (sb->s_flags & MS_RDONLY))
> > > +			continue;
> > > +
> > > +		fs = kmalloc(sizeof(struct frozen_fs), GFP_ATOMIC);
> >
> > Shouldn't we check for kmalloc() failures here?
> 
> Good point. Just because I've never seen it fail, doesn't mean it can't :)
> 
> Before I roll a new version, what did you think splitting the thawing and 
> thawing bdevs in the middle? I think it's the right thing (TM) to do :>

Do you mean to thaw kernel threads first, thaw bdevs next and thaw user
space processes at the end?  I think it should be done in that order if
the bdevs are frozen.

Greetings,
Rafael

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC][PATCH] swsusp: support creating bigger images
  2006-05-01 11:20                       ` Rafael J. Wysocki
@ 2006-05-01 22:56                         ` Nigel Cunningham
  0 siblings, 0 replies; 53+ messages in thread
From: Nigel Cunningham @ 2006-05-01 22:56 UTC (permalink / raw
  To: Rafael J. Wysocki; +Cc: Linux PM, Pavel Machek


[-- Attachment #1.1: Type: text/plain, Size: 4576 bytes --]

Hi.

On Monday 01 May 2006 21:20, Rafael J. Wysocki wrote:
> [Dropped Nick and LKML from the Cc list.]
>
> Hi,
>
> On Monday 01 May 2006 03:49, Nigel Cunningham wrote:
> > Hi.
> >
> > Sorry for the slow response - I only have internet access at work now.
>
> No problem at all. :-)
>
> > This is  going to be a pattern for the next few weeks - I'm off work next
> > week and.the week after I'll also be off apart from Monday and Tuesday
> > (those are my last two days working for Cyclades - I then get my
> > sweetheart and little one back, and we drive down to Victoria over the
> > rest of the week).
> >
> > On Sunday 30 April 2006 22:27, Rafael J. Wysocki wrote:
> > > On Wednesday 26 April 2006 02:49, Nigel Cunningham wrote:
> > > > On Wednesday 26 April 2006 08:43, Rafael J. Wysocki wrote:
> > > > > On Wednesday 26 April 2006 00:25, Pavel Machek wrote:
> > > > > > > > It does apply to all of the LRU pages. This is what I've been
> > > > > > > > doing for years now. The only corner case I've come across is
> > > > > > > > XFS. It still wants to write data even when there's nothing
> > > > > > > > to do and it's threads are frozen (IIRC - haven't looked at
> > > > > > > > it for a while). I got around that by freezing bdevs when
> > > > > > > > freezing processes.
> > > > > > >
> > > > > > > This means if we freeze bdevs, we'll be able to save all of the
> > > > > > > LRU pages, except for the pages mapped by the current task,
> > > > > > > without copying.  I think we can try to do this, but we'll need
> > > > > > > a patch to freeze bdevs for this purpose. ;-)
> > > > > >
> > > > > > ...adding more dependencies to how vm/blockdevs work. I'd say
> > > > > > current code is complex enough...
> > > > >
> > > > > Well, why don't we see the patch?  If it's too complex, we can just
> > > > > decide not to use it. :-)
> > > >
> > > > In Suspend2, I'm still using a different version of process.c to what
> > > > you guys have. In my version, I thaw kernelspace, then thaw bdevs,
> > > > then thaw userspace. The version below just thaws bdevs after thawing
> > > > all processes. I think that might need modification, but thought I'd
> > > > post this now so you can see how complicated or otherwise it is.
> > >
> > > IMHO it doesn't look so scary. :-)
> > >
> > :)
> > :
> > > > diff -ruN linux-2.6.17-rc2/kernel/power/process.c
> > > > bdev-freeze/kernel/power/process.c ---
> > > > linux-2.6.17-rc2/kernel/power/process.c	2006-04-19 14:27:36.000000000
> > > > +1000 +++ bdev-freeze/kernel/power/process.c	2006-04-26
> > > > 10:44:56.000000000 +1000 @@ -19,6 +19,56 @@
> > > >   */
> > > >  #define TIMEOUT	(20 * HZ)
> > > >
> > > > +struct frozen_fs
> > > > +{
> > > > +	struct list_head fsb_list;
> > > > +	struct super_block *sb;
> > > > +};
> > > > +
> > > > +LIST_HEAD(frozen_fs_list);
> > > > +
> > > > +void freezer_make_fses_rw(void)
> > > > +{
> > > > +	struct frozen_fs *fs, *next_fs;
> > > > +
> > > > +	list_for_each_entry_safe(fs, next_fs, &frozen_fs_list, fsb_list) {
> > > > +		thaw_bdev(fs->sb->s_bdev, fs->sb);
> > > > +
> > > > +		list_del(&fs->fsb_list);
> > > > +		kfree(fs);
> > > > +	}
> > > > +}
> > > > +
> > > > +/*
> > > > + * Done after userspace is frozen, so there should be no danger of
> > > > + * fses being unmounted while we're in here.
> > > > + */
> > > > +int freezer_make_fses_ro(void)
> > > > +{
> > > > +	struct frozen_fs *fs;
> > > > +	struct super_block *sb;
> > > > +
> > > > +	/* Generate the list */
> > > > +	list_for_each_entry(sb, &super_blocks, s_list) {
> > > > +		if (!sb->s_root || !sb->s_bdev ||
> > > > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > > > +		    (sb->s_flags & MS_RDONLY))
> > > > +			continue;
> > > > +
> > > > +		fs = kmalloc(sizeof(struct frozen_fs), GFP_ATOMIC);
> > >
> > > Shouldn't we check for kmalloc() failures here?
> >
> > Good point. Just because I've never seen it fail, doesn't mean it can't
> > :)
> >
> > Before I roll a new version, what did you think splitting the thawing and
> > thawing bdevs in the middle? I think it's the right thing (TM) to do :>
>
> Do you mean to thaw kernel threads first, thaw bdevs next and thaw user
> space processes at the end?  I think it should be done in that order if
> the bdevs are frozen.

Ok. Will seek to find time to prepare a patch to do that.

Regards,

Nigel

-- 
See our web page for Howtos, FAQs, the Wiki and mailing list info.
http://www.suspend2.net                IRC: #suspend2 on Freenode

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2006-05-01 22:56 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-24 21:55 [RFC][PATCH] swsusp: support creating bigger images Rafael J. Wysocki
2006-04-24 22:16 ` Pavel Machek
2006-04-25  8:26   ` Rafael J. Wysocki
2006-04-25  8:26     ` Rafael J. Wysocki
2006-04-25 10:04     ` Pavel Machek
2006-04-25 10:04       ` Pavel Machek
2006-04-25 10:31       ` Rafael J. Wysocki
2006-04-25 10:31         ` Rafael J. Wysocki
2006-04-27 15:27         ` Rafael J. Wysocki
2006-04-27 15:27           ` Rafael J. Wysocki
2006-04-27 20:55           ` Pavel Machek
2006-04-28  9:19             ` Rafael J. Wysocki
2006-04-28  9:19               ` Rafael J. Wysocki
2006-04-28  9:23               ` Pavel Machek
2006-04-28  9:23                 ` Pavel Machek
2006-04-25 10:28 ` Nick Piggin
2006-04-25 15:39   ` Rafael J. Wysocki
2006-04-25 20:32     ` Pavel Machek
2006-04-25 21:12       ` Rafael J. Wysocki
2006-04-25 21:12         ` Rafael J. Wysocki
2006-04-25 21:18         ` Nigel Cunningham
2006-04-25 21:18           ` Nigel Cunningham
2006-04-25 22:21           ` Rafael J. Wysocki
2006-04-25 22:21             ` Rafael J. Wysocki
2006-04-25 22:24             ` Nigel Cunningham
2006-04-25 22:24               ` Nigel Cunningham
2006-04-25 22:38               ` Rafael J. Wysocki
2006-04-25 22:38                 ` Rafael J. Wysocki
2006-04-25 22:25             ` Pavel Machek
2006-04-25 22:25               ` Pavel Machek
2006-04-25 22:30               ` Nigel Cunningham
2006-04-25 22:30                 ` Nigel Cunningham
2006-04-25 22:36                 ` Pavel Machek
2006-04-25 22:36                   ` Pavel Machek
2006-04-25 22:43               ` Rafael J. Wysocki
2006-04-25 22:43                 ` Rafael J. Wysocki
2006-04-26  0:49                 ` Nigel Cunningham
2006-04-30 12:27                   ` Rafael J. Wysocki
2006-04-30 12:27                     ` Rafael J. Wysocki
2006-05-01  1:49                     ` Nigel Cunningham
2006-05-01  1:49                       ` Nigel Cunningham
2006-05-01 11:20                       ` Rafael J. Wysocki
2006-05-01 22:56                         ` Nigel Cunningham
2006-04-26  2:24             ` Nick Piggin
2006-04-26  2:24               ` Nick Piggin
2006-04-26  3:41               ` Nigel Cunningham
2006-04-26 16:22                 ` Nick Piggin
2006-04-26 21:16                   ` Rafael J. Wysocki
2006-04-26 21:16                     ` Rafael J. Wysocki
2006-04-26  8:10               ` Pavel Machek
2006-04-26  8:10                 ` Pavel Machek
2006-04-27 19:53     ` [RFC][PATCH] swsusp: support creating bigger images (rev. 2) Rafael J. Wysocki
2006-04-27 19:53       ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.