Baolin Wang <baolin.wang(a)linux.alibaba.com> writes:
zhong jiang <zhongjiang-ali(a)linux.alibaba.com> writes:
On 2022/2/10 8:58 H, Huang, Ying wrote:
zhongjiang-ali <zhongjiang-ali(a)linux.alibaba.com> writes:
> Currently, a MySQL test case shows that a large number of THPs are migrated
> from the PMEM node to the toptier node, which increases pgpromote_demoted
> and migration failures. Because PMEM node memory is marked as prot_none,
> a hot page is migrated on CPU access as soon as possible, but it is
> unnecessary to migrate a THP to DRAM when DRAM memory is not sufficient,
> since that only causes more demotion and promotion.
>
> Hence, this patch forbids THP allocation on the PMEM node. The results show
> about a 3% improvement. The relevant statistics are as follows.
>
> Before applying the patch:
> mysql prepare:
> pgpromote_demoted 908267
> pgmigrate_fail_dst_node_fail 428223
> pgmigrate_fail_numa_isolate_fail 460480
>
> mysql run:
> pgpromote_demoted 2901105
> pgmigrate_fail_dst_node_fail 5653776
> pgmigrate_fail_numa_isolate_fail 5686052
>
> After applying the patch:
> mysql prepare:
> pgpromote_demoted 839297
> pgmigrate_fail_dst_node_fail 36585
> pgmigrate_fail_numa_isolate_fail 36585
>
> mysql run:
> pgpromote_demoted 913828
> pgmigrate_fail_dst_node_fail 235863
> pgmigrate_fail_numa_isolate_fail 235870
>
> Signed-off-by: zhongjiang-ali <zhongjiang-ali(a)linux.alibaba.com>
> ---
> mm/page_alloc.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8cfce92..4fff3cd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -461,6 +461,17 @@ static __always_inline int get_pfnblock_migratetype(struct page *page, unsigned
>  	return __get_pfnblock_flags_mask(page, pfn, PB_migrate_end, MIGRATETYPE_MASK);
>  }
> +static inline bool allow_hugepage_allocation(int nid, unsigned int order)
> +{
> +	if (node_is_toptier(nid))
> +		return true;
> +
> +	if (order != HPAGE_PMD_ORDER)
> +		return true;
> +
> +	return false;
> +}
> +
>  /**
>   * set_pfnblock_flags_mask - Set the requested group of flags for a pageblock_nr_pages block of pages
>   * @page: The page within the block of interest
> @@ -3689,6 +3700,9 @@ static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
>  			}
>  		}
> +		if (!allow_hugepage_allocation(zone_to_nid(zone), order))
> +			continue;
> +
It appears that this will disable node reclaiming for THP allocation.
So more pages will be allocated in PMEM node because of allocation
fallback?
We only allow normal pages to be allocated on the PMEM node; hence, a THP
allocation will fall back to allocating normal pages.
The MySQL test case shows that too many THPs are promoted to the toptier node.
Because toptier memory is not sufficient, this increases the
pgpromote_demoted and dst_node_full counters. In that case, we prefer
remote access over frequently migrating THPs between the PMEM and toptier
nodes, which degrades performance.
Maybe we are looking at different source code :-). In the latest upstream
code, zone_allows_reclaim() controls node reclaiming (or zone reclaim)
only. Which repo should I look at?
I think you misunderstood the change: it is in
get_page_from_freelist(), not in zone_allows_reclaim().
OK, I see. I think the `diff` program fooled me:
@@ -3689,6 +3700,9 @@ static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone)
 			}
 		}
+		if (!allow_hugepage_allocation(zone_to_nid(zone), order))
+			continue;
+
 		if (no_fallback && nr_online_nodes > 1 &&
 		    zone != ac->preferred_zoneref->zone) {
 			int local_nid;
From my understanding, Zhongjiang is trying to disable the memory
allocation fallback for THP, right?
I think so too now.
But won't that cause more demotion if we cannot fall back to the PMEM node?
If a THP fails to be allocated, normal pages will be allocated instead.
And it appears that if a THP fails to be demoted (with this patch, it
will always fail), the THP will be split too. So we may have far fewer THPs
in the system with the patch. Zhongjiang, can you check it?
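To make the fallback behaviour discussed here concrete, below is a minimal user-space sketch in C. It is not kernel code: `struct toy_node`, `toy_alloc()` and `toy_fault()` are hypothetical stand-ins for the zonelist walk in get_page_from_freelist() and for the THP fault path, and HPAGE_PMD_ORDER is assumed to be 9 (2 MiB THP on 4 KiB base pages). Only the decision inside allow_hugepage_allocation() mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_PMD_ORDER 9	/* assumption: 2 MiB THP, 4 KiB base pages */
#define NR_NODES 2

/* Hypothetical toy node, not the kernel's pg_data_t. */
struct toy_node {
	bool toptier;			/* true for DRAM, false for PMEM */
	unsigned long free_pages;	/* free base pages left on the node */
};

/* Same decision as the helper in the patch, with the tier passed in directly. */
static bool allow_hugepage_allocation(const struct toy_node *node,
				      unsigned int order)
{
	if (node->toptier)
		return true;

	if (order != HPAGE_PMD_ORDER)
		return true;

	return false;
}

/* Walk the nodes in fallback order; return the node id used, or -1. */
static int toy_alloc(struct toy_node *nodes, unsigned int order)
{
	unsigned long need = 1UL << order;

	assert(order <= HPAGE_PMD_ORDER);	/* the toy model stops at PMD order */

	for (int nid = 0; nid < NR_NODES; nid++) {
		/* PMD-order requests are never served from a lower tier. */
		if (!allow_hugepage_allocation(&nodes[nid], order))
			continue;

		if (nodes[nid].free_pages >= need) {
			nodes[nid].free_pages -= need;
			return nid;
		}
	}
	return -1;
}

/* Fault-path stand-in: try a THP first, then fall back to one base page. */
static int toy_fault(struct toy_node *nodes, unsigned int *order_used)
{
	int nid = toy_alloc(nodes, HPAGE_PMD_ORDER);

	if (nid >= 0) {
		*order_used = HPAGE_PMD_ORDER;
		return nid;
	}

	*order_used = 0;
	return toy_alloc(nodes, 0);
}
```

In this model, when the DRAM node holds fewer than 512 free pages, toy_fault() cannot get a PMD-order page from either node and returns an order-0 page instead. That page comes from DRAM while any base page is free there, and from PMEM afterwards.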
Another choice is to split the THP if migration fails. Whether to prefer
THP or local/hot normal pages is always an open question.
Memory tiering is challenging for THP. There are not many free pages (a
little more than the high watermark) in the DRAM node, so it's hard to
allocate THP there except when workloads start up.
Just curious: does disabling THP help your workload?
From our previous testing, enabling THP benefits performance, and our
ECS environment enables THP by default.
Thanks!
Best Regards,
Huang, Ying