Data-centric AI Foundation Models for Fleet-wide Data Imputation, Performance and Degradation Analysis of PV Systems
Roger French, Yangxin Fan, Raymond Wieser, Arafath Nihar, Thomas Ciardi, Alexander Bradley, Priyan Rajamohan, Benjamin Pierce, Pawan Tripathi, Erika Barcelo, Laura Bruckman, Yinghui Wu; Case Western Reserve University
PV systems represent a complex interplay of materials and the environment, and they generate intrinsically multi-modal data. By leveraging existing data streams, technologically informed “AI for PV” can be developed using a “data-centric AI” approach, as opposed to the “model-centric AI” of, for example, Large Language Models. Data-centric AI creates models that naturally address real-world questions of PV system design, performance, and degradation. The challenges of data-centric AI can be addressed through CRADLE distributed and high-performance computing, data FAIRification, and the recognition that real-world PV fleets are networks of geospatiotemporal systems, best represented by spatiotemporal graph neural network (st-GNN) models with metadata feature vectors. An st-GNN model trained on geospatially distributed time-series data exploits the intrinsic spatial and temporal coherence of real-world PV systems. The trained model therefore encapsulates all the information about the PV fleet; it is both an st-graph and a knowledge graph (k-graph). This trained PV k/st-graph foundation model is a data-driven Digital Twin of the fleet of PV systems it was trained on, and it encompasses the full lifecycle of those systems. Data-centric k/st-graph deep learning AI enables generative data imputation, power forecasting, and performance loss rate determination, and it can also be used to predict the performance of PV systems not yet constructed.
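
To make the idea of a PV fleet as a spatial graph concrete, the following is a minimal sketch, not the authors' st-GNN. It assumes hypothetical site names, coordinates, and normalized power readings, links sites that lie within an assumed distance radius, and fills a missing reading with the distance-weighted mean of neighboring sites at the same timestamp. A trained st-GNN would learn this spatial and temporal dependence from data rather than use a fixed weighting rule.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def build_graph(sites, radius_km):
    """Link every pair of sites closer than radius_km; edge weight is 1 / (1 + d)."""
    graph = {name: {} for name in sites}
    names = list(sites)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            d = haversine_km(sites[u], sites[v])
            if d <= radius_km:
                graph[u][v] = graph[v][u] = 1.0 / (1.0 + d)
    return graph

def impute(series, graph):
    """Fill None gaps with the weighted mean of neighbors at the same timestamp."""
    filled = {name: list(values) for name, values in series.items()}
    for name, values in series.items():
        for t, value in enumerate(values):
            if value is not None:
                continue
            # Only neighbors that actually reported a value at time t contribute.
            pairs = [(w, series[nb][t]) for nb, w in graph[name].items()
                     if series[nb][t] is not None]
            if pairs:
                total = sum(w for w, _ in pairs)
                filled[name][t] = sum(w * x for w, x in pairs) / total
    return filled

# Hypothetical fleet: three nearby sites and one distant site (lat, lon in degrees).
sites = {
    "A": (41.50, -81.61),
    "B": (41.51, -81.60),
    "C": (41.49, -81.62),
    "D": (39.96, -83.00),
}
# Hypothetical normalized power (fraction of rated capacity); None marks a gap.
series = {
    "A": [0.10, None, 0.62],
    "B": [0.12, 0.40, 0.60],
    "C": [0.08, 0.44, None],
    "D": [0.50, 0.50, 0.50],
}

graph = build_graph(sites, radius_km=50.0)  # 50 km radius is an assumed choice
filled = impute(series, graph)
print(filled["A"], filled["C"])
```

In this sketch the distant site D has no edges, so it neither contributes to nor receives imputed values; the radius and the weighting function are illustrative choices only.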