On 2/23/2022 4:44 PM, Huang, Ying wrote:
Baolin Wang <baolin.wang(a)linux.alibaba.com> writes:
On 2/23/2022 3:51 PM, Huang, Ying wrote:
> Baolin Wang <baolin.wang(a)linux.alibaba.com> writes:
>
>> ANBZ: #80
>>
>> Add a page promotion throttle statistic, which can be used to check
>> how many cold pages were considered for promotion to DRAM, and to
>> help tune the latency threshold.
> Can this be calculated via the following formula?
> numa_hint_faults - pgpromote_candidate
They are not the same. numa_hint_faults does not include file page
promotion statistics, whereas should_numa_migrate_memory() checks the
latency of file pages before promotion. Here are some statistics I
observed:
pgpromote_candidate 3890100
pgpromote_cold_throttle 50342001
numa_hint_faults 35341839
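
To make the comparison concrete, here is a small sketch that applies the
proposed formula to the three values above. The helper names are
hypothetical and exist only for this illustration; they are not kernel
functions.

```c
#include <assert.h>
#include <stdint.h>

/* Counter values copied from the statistics quoted above. */
#define PGPROMOTE_CANDIDATE     UINT64_C(3890100)
#define PGPROMOTE_COLD_THROTTLE UINT64_C(50342001)
#define NUMA_HINT_FAULTS        UINT64_C(35341839)

/*
 * Hypothetical helper: the estimate suggested in the review,
 * numa_hint_faults - pgpromote_candidate, clamped at zero.
 */
static uint64_t estimated_cold_throttle(uint64_t hint_faults,
					uint64_t candidates)
{
	return hint_faults > candidates ? hint_faults - candidates : 0;
}

/*
 * Hypothetical helper: signed difference between the measured
 * counter and the estimate (positive if the estimate is too low).
 */
static int64_t throttle_gap(uint64_t measured, uint64_t estimate)
{
	assert(measured <= (uint64_t)INT64_MAX);
	assert(estimate <= (uint64_t)INT64_MAX);
	return (int64_t)measured - (int64_t)estimate;
}
```

With the values above, the estimate is 31451739, which is 18890262
lower than the measured pgpromote_cold_throttle. The measured counter
is also larger than numa_hint_faults itself, which is consistent with
throttled accesses that are not counted as hint faults.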
Then how about adding a counter for the unmapped page accesses we
checked?
And I think these kinds of counters are for debugging only, at least
for now. Should we merge them formally?
Do you have an example of how to use this new counter or a similar
one?
We can add a counter for unmapped pages, but that would still leave
the cold throttling count somewhat unclear. Please also consider that
numa_hint_faults includes local faults, as well as some page
promotions when DRAM has enough free space.
I think the cold throttling counter is not only for debugging, but
also for making decisions in the future. We can know how many cold
pages are accessed, and we can measure the cold and hot memory
distribution of the workload to decide whether it is suitable for
tiered memory. So a clear and accurate cold throttling counter will
be helpful. What do you think?
If our target is the page temperature distribution, then a histogram may
be even better. I have implemented one before. The histogram counts
the number of hint page faults within each access latency range, and
we can output and clear it every
sysctl_numa_balancing_scan_period_max.
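
A rough user-space sketch of such a histogram might look like the
following. It is not the implementation mentioned above. The struct,
the function names, the bucket count, and the power-of-two millisecond
ranges are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NR_BUCKETS 8	/* assumed bucket count, not from the patch */

/* Hypothetical per-node histogram of hint fault access latency. */
struct latency_hist {
	unsigned long count[NR_BUCKETS];
};

/*
 * Assumed bucketing: bucket 0 holds latencies below 1 ms, bucket i
 * holds [2^(i-1), 2^i) ms, and the last bucket catches the rest.
 */
static unsigned int latency_to_bucket(unsigned int latency_ms)
{
	unsigned int bucket = 0;

	while (latency_ms > 0 && bucket < NR_BUCKETS - 1) {
		latency_ms >>= 1;
		bucket++;
	}
	return bucket;
}

/* Account one hint page fault with the given access latency. */
static void hist_record(struct latency_hist *h, unsigned int latency_ms)
{
	assert(h != NULL);
	h->count[latency_to_bucket(latency_ms)]++;
}

/*
 * Print every bucket and reset the histogram. Returns the total
 * number of faults that were recorded since the previous flush.
 */
static unsigned long hist_flush(struct latency_hist *h, FILE *out)
{
	unsigned long total = 0;
	unsigned int i;

	assert(h != NULL && out != NULL);
	for (i = 0; i < NR_BUCKETS; i++) {
		unsigned int lower_ms = i ? 1u << (i - 1) : 0;

		fprintf(out, "bucket %u (from %u ms): %lu\n",
			i, lower_ms, h->count[i]);
		total += h->count[i];
	}
	memset(h, 0, sizeof(*h));
	return total;
}
```

In this sketch, hist_record() would be called once per hint page fault,
and hist_flush() would be called once per
sysctl_numa_balancing_scan_period_max interval to output and clear the
counts.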