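Computer science
Benchmarking
Firmware
Tree (set theory)
Process (computing)
Implementation
Data structure
Throughput
Tree structure
Key (lock)
Distributed computing
Computer engineering
Embedded system
Software engineering
Operating system
Wireless
Business
Mathematical analysis
Marketing
Mathematics
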
Authors

Diego Didona, Nikolas Ioannou, Radu Stoica, Kornilios Kourtis

Identifier

DOI: 10.14778/3430915.3430926

Abstract

Solid-state drives (SSDs) are extensively used to deploy persistent data stores, as they provide low-latency random access, high write throughput, high data density, and low cost. Tree-based data structures are widely used to build persistent data stores, and indeed they form the backbone of many of the data management systems used in production and research today. We show that benchmarking a persistent tree-based data structure on an SSD is a complex process, which may easily incur subtle pitfalls that can lead to an inaccurate performance assessment. At a high level, these pitfalls stem from the interaction of complex software running on complex hardware. On the one hand, tree structures implement internal operations that have non-trivial effects on performance. On the other hand, SSDs employ firmware logic to deal with the idiosyncrasies of the underlying flash memory, which are also well known to lead to complex performance dynamics. We identify seven benchmarking pitfalls using RocksDB and WiredTiger, two widespread implementations of an LSM-Tree and a B+Tree, respectively. We show that such pitfalls can lead to incorrect measurements of key performance indicators, hinder the reproducibility and the representativeness of the results, and lead to suboptimal deployments in production environments. We also provide guidelines on how to avoid these pitfalls to obtain more reliable performance measurements, and to perform more thorough and fair comparisons among different design points.
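The abstract notes that SSD firmware and internal tree operations produce complex performance dynamics, so a benchmark can report misleading numbers depending on which part of a run it averages. As a hedged illustration only, the Python sketch below assumes a benchmark that logs throughput once per interval and reports the mean only over a trailing region in which consecutive window means agree within a relative tolerance. The function name `steady_state_throughput`, the window size of 5 samples, the 5% tolerance, and the synthetic trace are all hypothetical choices for this example, not the authors' methodology.

```python
from statistics import mean


def steady_state_throughput(samples, window=5, tolerance=0.05):
    """Hypothetical helper: average throughput over the stable tail of a run.

    `samples` are per-interval throughput readings (e.g., ops/s) in time order.
    Samples are grouped into non-overlapping windows; the stable tail is the
    longest suffix of windows whose means stay within `tolerance` (relative)
    of the final window's mean. Returns (first_sample_index, tail_mean), or
    None if fewer than two windows form a stable tail.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # Only full, non-overlapping windows are used; a partial last window is ignored.
    n_windows = len(samples) // window
    if n_windows < 2:
        return None
    means = [mean(samples[i * window:(i + 1) * window]) for i in range(n_windows)]
    ref = means[-1]
    start = n_windows - 1
    # Extend the tail backward while earlier windows agree with the final one.
    while start > 0 and abs(means[start - 1] - ref) <= tolerance * abs(ref):
        start -= 1
    if n_windows - start < 2:
        return None  # no evidence that throughput has settled
    first = start * window
    return first, mean(samples[first:n_windows * window])


# Synthetic trace (made-up numbers): a fast initial phase, then a lower plateau.
trace = [100_000] * 10 + [70_000] * 5 + [40_000] * 15

print("first 10 samples:", mean(trace[:10]))
print("whole run:       ", mean(trace))
print("stable tail:     ", steady_state_throughput(trace))
```

On this synthetic trace, averaging only the first 10 samples reports 100,000 ops/s and averaging the whole run reports 65,000 ops/s, while the stable tail averages 40,000 ops/s. Walking backward from the final window is one simple way to exclude warm-up and transition phases; it cannot, by itself, tell whether the run lasted long enough for the final plateau to reflect long-term behavior.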
         
            
 
                 
                
                    