From: Huang Ying <ying.huang@intel.com>
ANBZ: #80
cherry-picked from
https://git.kernel.org/pub/scm/linux/kernel/git/vishal/tiering.git/commit/?…
When testing the memory tiering optimization solution with an
application using THP, we found that the demotion/promotion between
the fast/slow memory nodes sometimes pauses for several hundred
seconds. After some digging, we found a live lock as below.
1. In the NUMA hint page fault handler, if the latency is less than
the threshold, the THP in the slow memory node will be promoted.

2. In migrate_misplaced_transhuge_page(), if there is only 1 order-9
free page (and 0 order-10 free pages) in the fast memory node, 1 THP
will be allocated successfully.

3. In migrate_misplaced_transhuge_page(), numamigrate_isolate_page()
will be called to check the number of free pages in the fast memory
node and to isolate the original THP in the slow memory node. Because
there are not enough free pages available in the fast memory node,
numamigrate_isolate_page() will return failure and wake up kswapd of
the DRAM node.

4. In migrate_misplaced_transhuge_page(), the THP allocated in DRAM
will be freed, so there is 1 order-9 free page in the fast memory
node again.

5. In kswapd of the DRAM node, because the watermark is OK and there
is 1 order-9 free page, kswapd will go back to sleep without demoting
(reclaiming) anything.
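
That is, the failed migration attempt itself frees the one order-9
page that kswapd then checks, so neither side makes progress.
Roughly, the old ordering in migrate_misplaced_transhuge_page() looks
like this (a simplified sketch taken from the code removed below;
locking and the other failure paths are elided):

	new_page = alloc_pages_node(node,
		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
		HPAGE_PMD_ORDER);	/* consumes the last order-9 page */
	if (!new_page)
		goto out_fail;
	prep_transhuge_page(new_page);

	isolated = numamigrate_isolate_page(pgdat, page);
	if (!isolated) {		/* too few free pages: fail, wake kswapd */
		put_page(new_page);	/* frees the order-9 page again, so */
		goto out_fail;		/* kswapd sees OK watermarks and sleeps */
	}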
To resolve the issue, this patch calls numamigrate_isolate_page() in
migrate_misplaced_transhuge_page() before allocating the THP. The
remaining 1 order-9 free page can then be consumed successfully, and
for the following THP migration requests, kswapd can demote (reclaim)
and compact to make some order-9 free pages available.
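
After the change, the same simplified path becomes:

	isolated = numamigrate_isolate_page(pgdat, page);
	if (!isolated)			/* kswapd is woken while the node */
		goto out_fail;		/* is genuinely short of pages */

	new_page = alloc_pages_node(node,
		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
		HPAGE_PMD_ORDER);	/* may consume the remaining order-9 page */
	if (!new_page)
		goto out_fail;		/* the isolated THP is put back at out_unlock */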
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com>
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
---
mm/migrate.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index ccb561d..0ac0e86 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2118,6 +2118,10 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	unsigned long mmun_start = address & HPAGE_PMD_MASK;
 	unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
 
+	isolated = numamigrate_isolate_page(pgdat, page);
+	if (!isolated)
+		goto out_fail;
+
 	new_page = alloc_pages_node(node,
 		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
 		HPAGE_PMD_ORDER);
@@ -2125,12 +2129,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 		goto out_fail;
 	prep_transhuge_page(new_page);
 
-	isolated = numamigrate_isolate_page(pgdat, page);
-	if (!isolated) {
-		put_page(new_page);
-		goto out_fail;
-	}
-
 	/* Prepare a page as a migration target */
 	__SetPageLocked(new_page);
 	if (PageSwapBacked(page))
@@ -2158,12 +2156,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 		unlock_page(new_page);
 		put_page(new_page);		/* Free it */
 
-		/* Retake the callers reference and putback on LRU */
-		get_page(page);
-		putback_lru_page(page);
-		mod_node_page_state(page_pgdat(page),
-			NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR);
-
 		goto out_unlock;
 	}
 
@@ -2234,6 +2226,13 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	spin_unlock(ptl);
 
 out_unlock:
+	if (isolated) {
+		/* Retake the callers reference and putback on LRU */
+		get_page(page);
+		putback_lru_page(page);
+		mod_node_page_state(page_pgdat(page),
+			NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR);
+	}
 	unlock_page(page);
 	put_page(page);
 	return 0;
--
1.8.3.1