Chengwei LEI, Ph.D.    Associate Professor

Department of Computer and Electrical Engineering and Computer Science
California State University, Bakersfield

Research Interests

Google Scholar Link

"Incipient Stage" Research Ideas

My research interests lie in the broad areas of big data analysis, large-scale data mining, bioinformatics, network topology analysis, and power grid risk evaluation. I am interested in developing data-analytical methods and tools that make complex data more understandable and useful.

Completed Projects for Download


Specifically, I am currently conducting research in the following areas:


System Modeling, Data Analysis and Risk Evaluation for Power Grid System Failures

  • Nuisance Tripping Forecasting for Outdated Residential Power Network  
         Heating, ventilation, and air conditioning (HVAC) loads contribute significantly to the energy consumption of a residential power network. Environmental factors, such as ambient temperature, humidity, wind, and solar radiation, determine the power demand of the HVAC loads. Due to the constant variation of these factors throughout the day, the daily power demand curve of HVAC loads exhibits significant fluctuations, especially during peak hours. These fluctuations can impact the power quality of the entire distribution network.
        My related publication: GreenTech 2023, ECCE 2023

    Furthermore, certain power distribution systems are deployed in outdoor environments, for example, outdated residential buildings and expeditionary power networks. In these scenarios, extreme operating temperatures can lead to circuit breakers tripping at unexpected load levels, posing risks to the resilience of the power distribution system.
        My related publication: PESGM 2024  

  • Building HVAC energy consumption prediction
        Heating, ventilation, and air conditioning (HVAC) loads account for a significant portion of the energy that a building consumes. Especially in microgrid power systems that operate in island mode, the HVAC loads during daily peak hours can affect the power quality of the whole grid. Therefore, studying the load behavior of HVAC systems can reduce energy waste and the risk of overload.
        Preliminary results: CSCI 2018

  • Probability Based Circuit Breaker Modeling and Risk Evaluation  
         Circuit breakers are widely used in power system protection to interrupt fault current. Previous research treats the tripping time as a constant, which ignores the differences between individual circuit breakers. In this work, we propose a probability-based model to describe the properties of circuit breakers.

        i) A new simulation model that incorporates probabilistic tools is developed to realistically describe the tripping characteristics; a product failure rate is also included to reflect conditions that can arise in real-world applications of thermal-magnetic circuit breakers.
        My related publication: APEC 2017

       ii) Circuit breakers can be manually or automatically reset after tripping. Over time, some mechanisms inside a circuit breaker may age and fatigue, and some connections and contacts may become loose or misaligned. In this work, we propose a probability-based simulation modeling methodology for worn circuit breakers.
        My related publication: IEEE CYBER 2017

     iii) Overall, customizable simulation models for protective devices are needed to conduct system-level reliability analyses effectively. In this work, thermal energy-based data analysis methodologies are applied to the protective devices’ physical properties, based on the manufacturer’s time/current data sheet.
        My related publication: Energies (2022)
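
        A minimal sketch of the idea behind a probability-based trip model, not the model from the publications above: the inverse-time curve, the constant k, the lognormal spread sigma, and the failure rate are all hypothetical placeholders chosen only for illustration. It shows how a trip time can be treated as a random draw around a nominal curve instead of a single constant, with a small chance that the breaker does not trip at all.

```python
import math
import random


def nominal_trip_time(current_a, rated_a, k=60.0):
    """Hypothetical inverse-time curve: larger overloads trip faster."""
    overload = current_a / rated_a
    if overload <= 1.0:
        return math.inf  # no trip at or below rated current
    return k / (overload ** 2 - 1.0)


def sample_trip_time(current_a, rated_a, sigma=0.15, failure_rate=0.01, rng=random):
    """Draw one trip time in seconds; math.inf means the breaker did not trip."""
    if rng.random() < failure_rate:
        return math.inf  # assumed product failure: breaker stays closed
    t_nominal = nominal_trip_time(current_a, rated_a)
    if math.isinf(t_nominal):
        return math.inf
    # Unit-to-unit variation modeled as a lognormal factor with median 1.
    return t_nominal * rng.lognormvariate(0.0, sigma)


def simulate(current_a, rated_a, n=10000, seed=0):
    """Monte Carlo summary of trip behavior at one fault current."""
    rng = random.Random(seed)
    samples = [sample_trip_time(current_a, rated_a, rng=rng) for _ in range(n)]
    tripped = sorted(t for t in samples if math.isfinite(t))
    return {
        "trip_fraction": len(tripped) / n,
        "median_s": tripped[len(tripped) // 2],
    }


if __name__ == "__main__":
    print(simulate(current_a=40.0, rated_a=20.0))
```

        In a system-level study, a sampler of this kind would be called once per breaker per simulated fault, so that two breakers of the same rating can respond differently to the same current.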

  • Fuse modeling and fault study
         Fuses act as sacrificial protective devices against over-current faults in AC and DC electronic circuits, and are widely applied in modern power generation, transmission and distribution, and load service systems. To study the thermal energy involved in fuse melting, data analysis is conducted on the time/current curve provided by the manufacturer.

        i) We present a new modeling method that provides an optimal equation to describe the relationship between time, different levels of fault current, and the thermal energy that melts the fuse.
        My related publication: PESGM 2016

        ii) The melting of a fuse depends on current and time. In practice, however, ambient temperature is a major factor, especially when fuses are exposed to seasonal temperature extremes. In this work, we analyze the time/current curve while taking the temperature effect into account.
        My related publication: GreenTech 2023
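
        To illustrate the kind of analysis a time/current curve allows, here is a small sketch under stated assumptions. It is not the fitting method from the publications above. The (current, melting time) points in the usage note are synthetic, generated from t = 1000 / I^2 so that the expected fit is known in advance; a real data sheet will not follow a pure power law. The sketch computes the melting I²t (proportional to thermal energy for a fixed resistance) and fits t = a * I^b by least squares in log-log space.

```python
import math


def melting_i2t(points):
    """Return (current, I^2 * t) pairs for (current_A, melt_time_s) points."""
    return [(i, i * i * t) for i, t in points]


def fit_power_law(points):
    """Least-squares fit of t = a * I**b on log-transformed data."""
    xs = [math.log(i) for i, _ in points]
    ys = [math.log(t) for _, t in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


if __name__ == "__main__":
    # Synthetic illustration only: these are not manufacturer data.
    curve = [(10.0, 10.0), (20.0, 2.5), (50.0, 0.4), (100.0, 0.1)]
    print(melting_i2t(curve))
    print(fit_power_law(curve))
```

        An exponent b close to -2 corresponds to a roughly constant melting I²t; deviations from that on real curves, and their shift with ambient temperature, are the behavior such a model would need to capture.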



Identifying topological properties to characterize networks

  • Topological profile based network denoising
        We present a network topological profile based algorithm to remove spurious interactions and recover missing ones by computational prediction, and to increase the accuracy of protein complex prediction by reducing the impact of hub nodes.

        i) To obtain accurate topological profiles, we introduce two types of resistance into the simple random walk model to develop a new algorithm: random walk with resistance (RWS). The resistances ensure that the topological profiles of different starting nodes differ, and effectively control the impact of hub nodes.
        My related publication: BIBM 2012, Bioinformatics (2013)

    ii) Random walk with resistance (RWS) works well when its parameters are set properly, but choosing the parameters is difficult for the user. To address this problem, we developed a fully automated algorithm to predict protein complexes.
        My related publication: Proteome Science (2013)
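
        For readers unfamiliar with topological profiles, the sketch below shows a standard random walk with restart, which is a related baseline and explicitly not the RWS algorithm: the two resistance terms are not reproduced here, and the toy graph, the restart probability, and the function name are assumptions made for illustration. It computes, for one starting node, the stationary visiting probability of every node, which serves as that node's profile.

```python
def random_walk_with_restart(adj, start, restart=0.3, iters=200):
    """Return visiting probabilities for a walk that restarts at `start`.

    `adj` maps each node to a list of its neighbors; every node must have
    at least one neighbor.
    """
    nodes = sorted(adj)
    p = {v: 1.0 if v == start else 0.0 for v in nodes}
    for _ in range(iters):
        # With probability `restart`, jump back to the starting node.
        nxt = {v: (restart if v == start else 0.0) for v in nodes}
        # Otherwise, move to a uniformly chosen neighbor.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


if __name__ == "__main__":
    # Toy graph: hub "h" linked to four nodes, plus one edge a-b.
    toy = {
        "h": ["a", "b", "c", "d"],
        "a": ["h", "b"],
        "b": ["h", "a"],
        "c": ["h"],
        "d": ["h"],
    }
    print(random_walk_with_restart(toy, "c"))
```

        In this baseline the hub receives a large share of the probability mass from every starting node, which is the kind of hub effect the resistances in RWS are described as controlling.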

  • Bio-network topological based cancer metastasis prediction
        The abundance of molecular profiling data for breast cancer tissues has prompted active research on molecular marker-based early diagnosis of metastasis. To detect biomarkers that are significant for the prediction and to compare the robustness of different feature types, we propose an unbiased and novel procedure to measure feature importance that eliminates the potential bias from factors such as different sample sizes, numbers of features, and class distributions.
        My related publication: BMC Bioinformatics (2020)



Network function prediction and pathway discovery

  • Cancer subnetworks discovery
        A robust definition of cancer subnetworks can lead to better patient prognosis and more effective treatment plans. We develop methods to re-analyze the gene expression data of several independent cancer patient cohorts on which the current subnetworks were defined.
        My related publication: Genomics (2019)

  • Network-based classification
        We develop a novel computational method to analyze whole-genome DNA methylation data for endometrial tumors within the context of a human protein-protein interaction network, in order to identify subnetworks as potential epigenetic biomarkers for predicting tumor recurrence.
        My related publication: ACMBCB 2012




Biological Sequence Analysis

  • Identifying TF binding sites in DNA sequences
        We propose a novel motif finding algorithm that finds consensus patterns using a population-based stochastic optimization technique called Particle Swarm Optimization. We use a word dissimilarity graph to remap the neighborhood structure of the solution space of DNA motifs. The experimental results show that our method is both more efficient and more accurate than several existing algorithms.
        My related publication: International Journal of Computational Biology and Drug Design (2009), BIBM'08 Workshops

        We further modify the standard PSO algorithm to handle discrete values, such as characters in DNA sequences. Our algorithm uses both consensus and position-specific weight matrix representations, models gaps explicitly, and finds gapped motifs without any detailed prior knowledge of the gaps.
        My related publication: BioData Mining (2010), EvoBio'10
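
        The two motif representations mentioned above can be illustrated with a short sketch. This is not the PSO-based algorithm itself; it only shows, for a set of already aligned ungapped sites, how a position-specific weight matrix and its consensus are derived and how a candidate word is scored. The example sites, the pseudocount, and the uniform background are assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Column-wise base frequencies of equal-length aligned sites."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1.0
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in ALPHABET})
    return pwm


def consensus(pwm):
    """Most frequent base at each position."""
    return "".join(max(ALPHABET, key=lambda base: col[base]) for col in pwm)


def log_odds(word, pwm, background=0.25):
    """Log2 likelihood ratio of `word` under the PWM vs. a uniform background."""
    return sum(math.log2(col[base] / background) for base, col in zip(word, pwm))


if __name__ == "__main__":
    # Made-up aligned sites, for illustration only.
    sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
    pwm = build_pwm(sites)
    print(consensus(pwm))
    print(log_odds("TATAAT", pwm), log_odds("GCGCGC", pwm))
```

        A stochastic search such as PSO would then explore candidate motifs and use a score of this kind as its fitness function; how gaps enter the representation is specific to the published method and is not modeled here.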

  • Next-generation DNA sequencing data analysis
        To demonstrate that Nelf-b plays important roles in multiple aspects of transcriptional regulation in mammalian genomes, we processed the ChIP-Seq data and analyzed the peak information. The results show that genetic ablation of Nelf-b leads to deregulation of pol II pausing and defects in cell growth and survival.
        My related publication: Journal of Biological Chemistry (2011)

  • Cis-regulatory elements identification and analysis
        We propose a completely parameter-free and systematic method for constructing gene co-expression networks and predicting functional modules as well as cis-regulatory elements.
        My related publication: BMC Bioinformatics (2011)