How to observe the block-layer bandwidth of a pid through cgroup stats?
在 2022-09-30 14:36:46,"Joseph Qi" <joseph.qi@linux.alibaba.com> 写道: >So you observed the bandwidth from dd? >That's not wright. dd returns the page cache write bandwidth, while iocost >controls block layer. You have to observe block layer bandwidth through >cgroup stats. > >Thanks, >Joseph > >On 9/29/22 8:14 PM, 王传国 wrote: >> Hi hongyun, >> Still no effect. My operation steps are as follows: >> 1. Add a storage with SCSI - 60G at my qemu-vm, format as ext4; >> 2.python3 iocost_coef_gen.py --testdev /dev/sda got: >> 8:0 rbps=14335615141 rseqiops=93650 rrandiops=90693 wbps=1178201578 wseqiops=82077 wrandiops=77142 >> 3.re-format sda as ext4 >> 4.init as: >> mount /dev/sda1 /wcg/data2/ >> echo "8:0 rbps=14335615141 rseqiops=93650 rrandiops=90693 wbps=1178201578 wseqiops=82077 wrandiops=77142" > /sys/fs/cgroup/blkio/blkio.cost.model >> echo "8:0 enable=1 ctrl=user rpct=95.00 rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/blkio/blkio.cost.qos >> cd /sys/fs/cgroup/blkio >> mkdir blkcg_be blkcg_lc >> echo "8:0 50" > /sys/fs/cgroup/blkio/blkcg_be/blkio.cost.weight >> echo "8:0 1000" > /sys/fs/cgroup/blkio/blkcg_lc/blkio.cost.weight >> echo 0 > /sys/block/sda/queue/rotational >> 5. Executing 2 commands in two terminals at the same time: >> echo $$ > /sys/fs/cgroup/blkio/blkcg_be/cgroup.procs >> dd if=/dev/zero of=/wcg/data2/ddfile1 bs=1M count=20480 >> ------------------- >> echo $$ > /sys/fs/cgroup/blkio/blkcg_lc/cgroup.procs >> dd if=/dev/zero of=/wcg/data2/ddfile2 bs=1M count=20480 >> 6. 5 times,2 dd all got about 550 MB/s which I think was wrong. >> (blkcg_be should do almost when blkcg_lc was done, so the speed should be nearly 1:2) >> >> >> What did I do wrong???!!! >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> 在 2022-09-29 16:38:41,"钱君(弘云)" <hongyun.qj@alibaba-inc.com> 写道: >> >> 使用说明 >> 第一步:为对应的磁盘生成相应的cost model数据 >> >> 在进行IO评测的时候需要获取iocost的model模型数据,iocost_coed_gen.py用来获取model数据,这个脚本可以使用内核源码中的tools/cgroup/iocost_coef_gen.py脚本来生成,或者从以下网址可以获取脚本的源码https://github.com/gregkh/linux/blob/master/tools/cgroup/iocost_coef_gen.py,这个脚本会通过直接读写磁盘来获取对应的磁盘模型数据,所以会破坏磁盘上的数据或文件系统,可以在磁盘被格式化前进行,只需要获取一次即可,比如我这边获取/dev/vdc盘的数据,可以用如下命令获取,最后一行输出的数据就是我们需要的数据 >> >> [root@iZbp14ah12fefuzd6rh5rkZ ~]# python3 iocost_coed_gen.py --testdev /dev/vdc >> Test target: vdc(253:32) >> Temporarily disabling elevator and merges >> Determining rbps... >> Jobs: 1 (f=1): [R(1)][100.0%][r=128MiB/s,w=0KiB/s][r=1,w=0 IOPS][eta 00m:00s] >> rbps=179879083, determining rseqiops... >> Jobs: 1 (f=1): [R(1)][100.0%][r=26.5MiB/s,w=0KiB/s][r=6791,w=0 IOPS][eta 00m:00s] >> rseqiops=6862, determining rrandiops... >> Jobs: 1 (f=1): [r(1)][100.0%][r=26.6MiB/s,w=0KiB/s][r=6800,w=0 IOPS][eta 00m:00s] >> rrandiops=6830, determining wbps... >> Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=128MiB/s][r=0,w=1 IOPS][eta 00m:00s] >> wbps=179882078, determining wseqiops... >> Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=26.6MiB/s][r=0,w=6798 IOPS][eta 00m:00s] >> wseqiops=6862, determining wrandiops... 
>> On 2022-09-29 16:38:41, "钱君(弘云)" <hongyun.qj@alibaba-inc.com> wrote:
>>
>> Usage instructions
>>
>> Step 1: generate the cost model data for the target disk
>>
>> For IO evaluation you first need the iocost model data. The script iocost_coef_gen.py
>> produces it; use tools/cgroup/iocost_coef_gen.py from the kernel source, or fetch it from
>> https://github.com/gregkh/linux/blob/master/tools/cgroup/iocost_coef_gen.py.
>> The script measures the disk by reading and writing it directly, so it will destroy the
>> data or filesystem on the disk; run it before the disk is formatted. It only needs to be
>> run once. For example, to measure /dev/vdc I run the command below; the last line of
>> output is the data we need:
>>
>> [root@iZbp14ah12fefuzd6rh5rkZ ~]# python3 iocost_coef_gen.py --testdev /dev/vdc
>> Test target: vdc(253:32)
>> Temporarily disabling elevator and merges
>> Determining rbps...
>> Jobs: 1 (f=1): [R(1)][100.0%][r=128MiB/s,w=0KiB/s][r=1,w=0 IOPS][eta 00m:00s]
>> rbps=179879083, determining rseqiops...
>> Jobs: 1 (f=1): [R(1)][100.0%][r=26.5MiB/s,w=0KiB/s][r=6791,w=0 IOPS][eta 00m:00s]
>> rseqiops=6862, determining rrandiops...
>> Jobs: 1 (f=1): [r(1)][100.0%][r=26.6MiB/s,w=0KiB/s][r=6800,w=0 IOPS][eta 00m:00s]
>> rrandiops=6830, determining wbps...
>> Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=128MiB/s][r=0,w=1 IOPS][eta 00m:00s]
>> wbps=179882078, determining wseqiops...
>> Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=26.6MiB/s][r=0,w=6798 IOPS][eta 00m:00s]
>> wseqiops=6862, determining wrandiops...
>> Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=26.6MiB/s][r=0,w=6806 IOPS][eta 00m:00s]
>> wrandiops=6830
>> Restoring elevator to none and nomerges to 0
>>
>> 253:32 rbps=179879083 rseqiops=6862 rrandiops=6830 wbps=179882078 wseqiops=6862 wrandiops=6830
>>
>> Then write that last line into the cost model file for the disk, like this:
>>
>> echo "253:32 rbps=179879083 rseqiops=6862 rrandiops=6830 wbps=179882078 wseqiops=6862 wrandiops=6830" > /sys/fs/cgroup/blkio/blkio.cost.model
>>
>> Note: you do not need to do this on every machine. Identical disks have identical model
>> data, so measure once and then write the result into the blkio.cost.model interface file
>> in the blkio root directory.
>>
>> Step 2: configure the disk QoS and enable blk-iocost
>>
>> Here we use the cost.qos interface to enable blk-iocost for device 253:32, treating the
>> disk as saturated when 95% of requests see a read/write latency (rlat|wlat) above 5 ms.
>> The kernel then adjusts the rate at which requests are issued to the disk, between 50%
>> and 150% of the baseline rate:
>>
>> echo "253:32 enable=1 ctrl=user rpct=95.00 rlat=5000 wpct=95.00 wlat=5000 min=50.00 max=150.00" > /sys/fs/cgroup/blkio/blkio.cost.qos
>>
>> Step 3: assign io weights to containers
>>
>> Different io weights can be assigned according to each container's latency class.
>> Suppose the be blkcg gets a weight of 50 and the lc blkcg a weight of 1000:
>>
>> echo "253:32 50" > /sys/fs/cgroup/blkio/blkcg_be/blkio.cost.weight
>> echo "253:32 1000" > /sys/fs/cgroup/blkio/blkcg_lc/blkio.cost.weight
>>
>> When IO usage is saturated, IO is then distributed according to the blkcg weights.
>>
>> Notes
>>
>> When enabling blk-iocost with ctrl=auto on an ECS instance, if the disk is an ultra
>> disk, SSD cloud disk, ESSD cloud disk, or local NVMe SSD, you must manually set the
>> disk's rotational attribute to 0:
>>
>> # [$DISK_NAME] is the disk name
>> echo 0 > /sys/block/[$DISK_NAME]/queue/rotational
>>
>> Also, never configure these interfaces with a partition's major and minor numbers;
>> use the whole disk's major and minor numbers.
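Two small checks that may help with the notes above, sketched under the guide's assumptions (disk /dev/vdc with major:minor 253:32, and the blkcg_be/blkcg_lc groups): read the whole-disk major:minor from sysfs rather than a partition's, and read the cost interfaces back to confirm the kernel accepted the settings.

    # Whole-disk maj:min (use this, not a partition's such as vdc1)
    cat /sys/block/vdc/dev                # prints e.g. 253:32
    lsblk -d -o NAME,MAJ:MIN /dev/vdc     # same information via lsblk

    # Read the iocost interfaces back to verify the configuration took effect
    cat /sys/fs/cgroup/blkio/blkio.cost.model
    cat /sys/fs/cgroup/blkio/blkio.cost.qos
    cat /sys/fs/cgroup/blkio/blkcg_be/blkio.cost.weight
    cat /sys/fs/cgroup/blkio/blkcg_lc/blkio.cost.weight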
>> ------------------------------------------------------------------
>> From: Joseph Qi <joseph.qi@linux.alibaba.com>
>> Date: 2022-09-29 16:26:55
>> To: 王传国 <wangchuanguo@163.com>; 钱君(弘云) <hongyun.qj@alibaba-inc.com>
>> Cc: <cloud-kernel@lists.openanolis.cn>; <storage@lists.openanolis.cn>
>> Subject: Re: [ck]Re: cgroup2 io weight has no effect
>>
>> Sure.
>> Hi, Jun Qian, could you please share sample iocost steps for buffer
>> io weight control?
>>
>> Thanks,
>> Joseph
>>
>> On 9/29/22 4:09 PM, 王传国 wrote:
>>> Hi,
>>> I'm using v1.
>>> I don't need bfq, but mq-deadline has no effect either.
>>> Buffered IO is my target (I've added cgwb_v1 to the grub parameters). Without a
>>> throttle setting it would be too fast to observe.
>>> Can you give a demo shell script with two different blkio.cost.weight values under
>>> /sys/fs/cgroup/blkio?
>>> Thanks very much!
>>>
>>> On 2022-09-29 15:52:16, "Joseph Qi" <joseph.qi@linux.alibaba.com> wrote:
>>>> Hi,
>>>>
>>>> Which cgroup version do you use? cgroup v1 or v2?
>>>>
>>>> For bfq, I don't have any experience with weight control.
>>>> For iocost, it is better to specify the qos and model, per the documentation
>>>> suggested before.
>>>>
>>>> It seems you've mixed bfq, iocost, and block throttle together. I'd suggest you
>>>> evaluate them individually and use direct io first.
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>> On 9/29/22 3:26 PM, 王传国 wrote:
>>>>> Hi Joseph,
>>>>> Thanks for your reply! But I have 2 questions:
>>>>> 1. Why does blkio.bfq.weight have no effect after "echo bfq > /sys/block/vdb/queue/scheduler"?
>>>>> 2. iocost didn't work either: both fio runs got 5M, but 3M and 6M is what I want.
>>>>>    Please point out my mistakes!
>>>>> Thanks very much!
>>>>> And my shell script is as below:
>>>>>
>>>>> mount /dev/vdb1 /wcg/data2/
>>>>> cd /sys/fs/cgroup/blkio
>>>>> echo bfq > /sys/block/vdb/queue/scheduler
>>>>> echo 0 > /sys/block/vdb/queue/iosched/low_latency
>>>>> echo "253:16 10485760" > blkio.throttle.write_bps_device
>>>>> echo "253:16 enable=1" > blkio.cost.qos
>>>>> echo "253:16 ctrl=auto" > blkio.cost.model
>>>>> echo 0 > /sys/block/vdb/queue/rotational
>>>>> mkdir fio1 fio2
>>>>> echo "253:16 100" > fio1/blkio.cost.weight
>>>>> echo "253:16 200" > fio2/blkio.cost.weight
>>>>>
>>>>> echo $$ > /sys/fs/cgroup/blkio/fio1/cgroup.procs
>>>>> fio -rw=write -ioengine=libaio -bs=4k -size=1G -numjobs=1 -name=/wcg/data2/fio_test1.log
>>>>>
>>>>> # do the following in another console
>>>>> echo $$ > /sys/fs/cgroup/blkio/fio2/cgroup.procs
>>>>> fio -rw=write -ioengine=libaio -bs=4k -size=1G -numjobs=1 -name=/wcg/data2/fio_test2.log
>>>>>
>>>>> On 2022-09-28 16:41:40, "Joseph Qi" <joseph.qi@linux.alibaba.com> wrote:
>>>>>> 'io.weight' is for the cfq io scheduler, while 'io.bfq.weight' is for the bfq io
>>>>>> scheduler, as its name indicates.
>>>>>> So you may need to configure the corresponding io scheduler as well.
>>>>>>
>>>>>> BTW, if you want io weight control, I recommend another approach called iocost.
>>>>>> The following documentation may help with the details:
>>>>>> https://help.aliyun.com/document_detail/155863.html
>>>>>>
>>>>>> Thanks,
>>>>>> Joseph
>>>>>>
>>>>>> On 9/28/22 1:50 PM, 王传国 wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I see that cgroup2 has both io.weight and io.bfq.weight; what is the difference?
>>>>>>>
>>>>>>> My understanding is that they control the IO weights of sibling groups under a
>>>>>>> parent group. I tested on the version below and the result doesn't look right.
>>>>>>> Could someone point out what I'm missing? Many thanks!
>>>>>>>
>>>>>>> # uname -a
>>>>>>> Linux localhost.localdomain 4.19.91-26.an8.x86_64 #1 SMP Tue May 24 13:10:09 CST 2022 x86_64 x86_64 x86_64 GNU/Linux
>>>>>>>
>>>>>>> My test script:
>>>>>>>
>>>>>>> # change to cgroup2 by adding cgroup_no_v1=all to the grub parameters
>>>>>>> mkdir -p /aaa/cg2
>>>>>>> mkdir -p /aaa/data2
>>>>>>> mount -t cgroup2 nodev /aaa/cg2
>>>>>>> mount /dev/sdb1 /aaa/data2/
>>>>>>> echo bfq > /sys/block/vdb/queue/scheduler  # with or without this step
>>>>>>>
>>>>>>> mkdir /aaa/cg2/test
>>>>>>> echo "+io +memory" > /aaa/cg2/cgroup.subtree_control
>>>>>>> echo "+io +memory" > /aaa/cg2/test/cgroup.subtree_control
>>>>>>> cat /aaa/cg2/test/cgroup.controllers
>>>>>>> echo "8:16 wbps=10485760" > /aaa/cg2/test/io.max
>>>>>>> echo $$ > /aaa/cg2/test/cgroup.procs
>>>>>>>
>>>>>>> mkdir -p /aaa/cg2/test/dd1
>>>>>>> mkdir -p /aaa/cg2/test/dd2
>>>>>>> echo 200 > /aaa/cg2/test/dd1/io.weight
>>>>>>> #echo 200 > /aaa/cg2/test/dd1/io.bfq.weight  # tried both options
>>>>>>>
>>>>>>> # run the following 2 tests in 2 other terminals:
>>>>>>> echo $$ > /aaa/cg2/test/dd1/cgroup.procs
>>>>>>> dd if=/dev/zero of=/aaa/data2/ddfile1 bs=128M count=1
>>>>>>>
>>>>>>> echo $$ > /aaa/cg2/test/dd2/cgroup.procs
>>>>>>> dd if=/dev/zero of=/aaa/data2/ddfile2 bs=128M count=1
>>>>>>>
>>>>>>> I got two results of about 500K+ each, instead of the expected 300K+ and 600K!
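To make the scheduler dependency concrete: io.bfq.weight only takes effect while bfq is the active scheduler on the disk that actually backs the test files (the script above mounts /dev/sdb1 but switches the scheduler on vdb, which is worth double-checking). A minimal sketch for the cgroup v2 test, assuming a single disk vdb, BFQ group scheduling built into the kernel, and direct IO so the weights act on the submitting cgroups:

    # The active scheduler is shown in brackets; switch to bfq on the backing disk
    cat /sys/block/vdb/queue/scheduler
    echo bfq > /sys/block/vdb/queue/scheduler

    # io.bfq.weight is bfq's interface; io.weight belongs to cfq (see Joseph's note)
    echo 200 > /aaa/cg2/test/dd1/io.bfq.weight
    echo 100 > /aaa/cg2/test/dd2/io.bfq.weight

    # Weights only show up under contention, so run both writers concurrently with
    # direct IO; each background dd keeps the cgroup it was started in.
    echo $$ > /aaa/cg2/test/dd1/cgroup.procs
    dd if=/dev/zero of=/aaa/data2/ddfile1 bs=1M count=1024 oflag=direct &
    echo $$ > /aaa/cg2/test/dd2/cgroup.procs
    dd if=/dev/zero of=/aaa/data2/ddfile2 bs=1M count=1024 oflag=direct &
    wait

A single 128M buffered dd per cgroup, as in the original test, mostly measures the page cache and finishes before the weights can bite, which is consistent with both runs reporting the same speed.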
>_______________________________________________
>Cloud Kernel mailing list -- cloud-kernel@lists.openanolis.cn
>To unsubscribe send an email to cloud-kernel-leave@lists.openanolis.cn