How Does Big Data Relate to Smart-City Solutions?
Smart city projects generally use sensors and connected devices to collect and analyze data. This data is used to optimize city operations, manage resources, and improve the everyday life of citizens. Smart cities use technology to…
- Improve access to public transport
- Manage traffic
- Optimize water usage and power supply
- Improve law enforcement services, schools, and hospitals
- and many more…
Big data plays a significant role in smart city solutions: data gathered via IoT devices is processed so that further analysis can identify trends and requirements in the city. Sensors deployed across the city create massive quantities of data, and if that data is used efficiently, it can lead to numerous improvements in the city.
Many governments are considering adopting smart city solutions, implementing big data applications that support smart city components in order to reach the required level of sustainability and improve living standards.
Smart cities use multiple technologies to improve the performance of health, transportation, energy, education, and water services, leading to higher levels of comfort for their citizens. This also involves reducing costs and resource consumption, and engaging more effectively and actively with citizens.
As digitization has become an integral part of everyday life, data collection has led to the accumulation of huge amounts of data that can be used to improve many things. Effective analysis and utilization of big data is a key factor for success in many business and service domains, including the smart city domain.
The main strength of the big data concept is the strong influence it can have on numerous aspects of a smart city and, consequently, on people’s lives. Many governments have started to use big data to support the development and sustainability of smart cities around the world. This has allowed cities to maintain the standards, principles, and requirements of smart city applications by realizing the main smart city characteristics. These characteristics include sustainability, resilience, governance, enhanced quality of life, and intelligent management of natural resources and city facilities.
A smart city has well-defined components, such as mobility, governance, environment, and people, as well as applications and services such as healthcare, transportation, smart education, and energy. To facilitate such applications and services, large computational and storage facilities are needed. One way to provide such platforms is to rely on cloud computing and use the many advantages of cloud services to support smart city big data management and applications.
How does the city become smart?
Smart cities have three fundamental layers of operations:
- Technological Layer: A large number of sensors and connected devices used to provide a wide range of services.
- Dedicated Applications: Information systems used by city officials and citizens to improve city operations.
- Application Usage: Implementation and usage of the applications inside the city by designated users.
The technological layer of a smart city includes:
- Internet of Things (IoT): Sensors and internet-connected devices can communicate with each other and send data to management systems. Smart cities use IoT to collect data and to actively solve problems.
- Information and Communication Technology (ICT): This platform is used to enable communication between citizens. Smart cities can use ICT to make changes to city services by analyzing data and feedback from citizens.
- Sensors: Sensors can collect different types of information, such as light, pressure, temperature, and the number of vehicles and people.
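To make the idea of sensors sending data to management systems more concrete, here is a minimal sketch of how a single reading might be packaged as a JSON message. The `SensorReading` class, its field names, and the sample values are assumptions made for illustration, not a real smart-city protocol.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SensorReading:
    """One measurement from a hypothetical city sensor."""
    sensor_id: str
    kind: str        # e.g. "temperature" or "vehicle_count"
    value: float
    timestamp: str   # ISO 8601 text


def encode(reading: SensorReading) -> str:
    """Serialize a reading to JSON so it can be sent over a network."""
    return json.dumps(asdict(reading), sort_keys=True)


def decode(message: str) -> SensorReading:
    """Rebuild the reading on the receiving (management system) side."""
    return SensorReading(**json.loads(message))


reading = SensorReading("junction-12", "vehicle_count", 37.0, "2024-01-01T08:00:00Z")
message = encode(reading)
```

The management system would call `decode(message)` to get the same reading back. A real deployment would also need a transport layer and authentication, which are left out here.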
A few examples of city activities that can be made smart:
- Smart Transport: Improves traffic management through the use of navigation apps, smart cards, and signal control systems. Smart transport technologies notify travelers about traffic and other road conditions, guide drivers to available parking spots, and detect traffic accidents.
- Smart Water and Energy: Based on smart meters, which gather data about energy and water demand and use. This data enables cities to regulate supplies, for instance by supplying more water or energy to the parts of the city that consume more resources.
- Smart Healthcare: Improves health treatment and diagnosis through technology and smart devices. For instance, smart sensors can detect air or water pollution before it turns into a public health risk. Sensors can also gather data from medical facilities and detect the spread of diseases.
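As a rough illustration of how smart-meter data could drive supply decisions, the sketch below sums consumption per district and splits an available supply in proportion to demand. The function names, the `(district, litres)` record shape, and the figures are hypothetical, and a real utility would apply many more constraints.

```python
from collections import defaultdict


def demand_by_district(readings):
    """Sum metered consumption per district.

    `readings` is a list of (district, litres) tuples; this shape is an
    assumption for illustration, not a real smart-meter format.
    """
    totals = defaultdict(float)
    for district, litres in readings:
        totals[district] += litres
    return dict(totals)


def allocate_supply(totals, available):
    """Split `available` supply across districts in proportion to demand."""
    overall = sum(totals.values())
    if overall == 0:
        return {district: 0.0 for district in totals}
    return {district: available * used / overall for district, used in totals.items()}


readings = [("north", 120.0), ("south", 60.0), ("north", 180.0), ("east", 40.0)]
totals = demand_by_district(readings)
plan = allocate_supply(totals, available=1000.0)
```

With these sample readings, the "north" district accounts for three quarters of the demand and therefore receives three quarters of the supply.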
What is Big Data?
Big data can be defined as data sets whose size or type is beyond the ability of traditional relational databases to capture, manage and process with low latency. The characteristics of big data include the three Vs…
- Volume of data being stored and used by organizations;
- Variety of data being generated by IoT devices; and
- Velocity, or the speed at which that data is generated and updated.
These days, sources of data are becoming more complex than those for traditional data because they are being driven by artificial intelligence (AI), mobile devices, social media and the Internet of Things (IoT).
For example, different types of data originate from sensors, devices, video/audio, networks, log files, transactional applications, web and social media, much of it generated in real time and at a very large scale.
The data types involved are structured, semi-structured and unstructured data. This data is mined for useful information and used in machine learning projects, predictive modeling and other advanced analytics applications. Big data analytics allows companies to address the issues they face in their business more effectively. Big data volumes also tend to grow rapidly over time.
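The difference between structured, semi-structured and unstructured data can be sketched with three tiny parsers that each turn their input into a sensor/value record. The function names, field names and sample inputs are assumptions for illustration; real unstructured data needs far more than one regular expression.

```python
import csv
import io
import json
import re


def from_structured(csv_text):
    """Structured: fixed columns, so every row parses the same way."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [{"sensor": r["sensor"], "value": float(r["value"])} for r in rows]


def from_semi_structured(json_text):
    """Semi-structured: self-describing, but fields may be nested or missing."""
    doc = json.loads(json_text)
    raw = doc.get("reading", {}).get("value")
    return {"sensor": doc.get("sensor", "unknown"),
            "value": float(raw) if raw is not None else None}


def from_unstructured(text):
    """Unstructured: free text, so the value has to be searched for."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*degrees", text)
    if match is None:
        return None
    return {"sensor": "citizen-report", "value": float(match.group(1))}


structured = from_structured("sensor,value\nair-1,12.5\n")
semi = from_semi_structured('{"sensor": "air-2", "reading": {"value": 30}}')
free_text = from_unstructured("Feels like 31.5 degrees near the station")
```

All three end up in the same record shape, which is roughly what a preparation step has to achieve before the data can be analyzed together.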
Where Do Smart Cities Store Their Big Data (and Other Data)?
Smart cities need a huge amount of archived and real-time data to function properly. Some example data types are…
- Smart Cars Data: Autonomous and smart cars are becoming more integrated with mobile systems. As a result, cars can communicate with each other and with cities. This communication can reduce congestion, prevent road incidents, and improve navigation.
- Camera Systems Data: Camera surveillance can help with traffic enforcement, improve public safety, support intelligent lighting systems, and help detect crime.
- Environmental Sensor Data: Air quality sensors enable cities to find and take action against polluters, locate green areas with low air quality, and provide air quality status alerts to citizens.
Smart cities can store their data in three main locations:
- Cloud Storage – Smart cities require large amounts of data for analytical purposes. Cloud data systems typically use solid-state drives in their data centers, remove redundant data, and encrypt data in transit. Cloud-based solutions usually have more flexible payment options than on-premises data centers.
- Edge Computing – Edge computing enables you to process data close to the source. Edge computing can be cheaper than streaming data to a remote storage location and then on to the relevant city authorities. Edge computing functionalities such as artificial intelligence (AI) traffic management are already under development in some cities. AI traffic management uses intelligent automation to detect traffic congestion and accidents and to respond faster to different conditions.
- Hybrid Data Storage – Hybrid data storage systems combine the benefits of cloud and edge storage. Hybrid data storage enables cities to make new decisions based on real-time alerts on conditions as well as rich data stores.
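The edge and hybrid ideas above can be illustrated with a small sketch: an edge node reduces a window of raw readings to a compact summary, raises an alert locally, and only the summary would be forwarded to cloud storage. The function name, the threshold and the sample numbers are hypothetical.

```python
from statistics import mean


def summarize_at_edge(samples, alert_threshold):
    """Reduce a non-empty window of raw readings to one summary record.

    Only this summary would be sent upstream; the raw samples stay at the
    edge. The record layout is an assumption for illustration.
    """
    peak = max(samples)
    return {
        "count": len(samples),
        "mean": mean(samples),
        "max": peak,
        "alert": peak > alert_threshold,  # real-time decision made locally
    }


raw_window = [42, 40, 45, 95, 41]  # e.g. vehicles per minute at one junction
summary = summarize_at_edge(raw_window, alert_threshold=90)
```

Here five raw values become one record, and the spike to 95 triggers an alert without waiting for a round trip to a remote data center.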
What is Big Data Analytics?
Big data analytics is the process of examining large amounts of heterogeneous data to uncover information, such as hidden patterns, correlations, market trends and customer preferences, that can help organizations make informed decisions and improve business outcomes.
Business intelligence (BI) queries answer basic questions about business operations and performance. Big data analytics goes further: it involves complex analysis with elements such as predictive models, statistical algorithms and what-if analysis powered by analytics systems.
In general, big data analytics is the use of advanced analytic techniques against very large, diverse data sets that include structured, semi-structured and unstructured data, from different sources, and in sizes ranging from terabytes to zettabytes.
With big data analytics, we can ultimately achieve better and faster decision-making, better modelling and prediction of future outcomes, and enhanced business intelligence.
When you build your big data solution, consider open source software and platforms such as…
- Apache Hadoop
- Apache Spark
- The wider Hadoop ecosystem, which contains cost-effective and flexible data processing and storage tools designed to handle the volume of data being generated today.
Impact of Big Data on Smart-Cities
A smart city uses sensors and connected devices to collect and analyze data. This data is used to…
- Optimize city operations
- Manage resources
- Improve the everyday life of citizens
Smart cities use technology to…
- Improve access to public transport
- Manage traffic
- Optimize water and power supply
- Improve law enforcement services, schools, hospitals, and many more.
Big data solutions enable the use of advanced capabilities in smart cities, which include…
- IoT technology
- Smart sensors
- Smart transport
- and so on.
Big data solutions provide administrative controls for large amounts of data including…
- Storage
- Backups
- Analysis
- Visualization
Big Data Analytics can process data from IoT devices and sensors to recognize patterns and needs. Such analysis can…
- Reduce the number of road accidents and traffic congestion
- Help drivers find a parking spot
- Reduce crime
- Improve smart urban lighting
- Improve water and energy systems
- and so on.
Big data systems introduce efficiency into a complex data infrastructure. Big data can have an impact on various sectors of a city, including transport, public safety, city budgets, and more.
Public Safety
Smart cities must provide security for their citizens. Cities can use predictive big data analytics to identify which areas are prone to becoming crime hotspots and to estimate where crimes are most likely to occur. Information such as historical and geographical data helps cities create a much safer environment.
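A very simple form of hotspot analysis on historical and geographical data can be sketched as follows: past incidents are binned into grid cells and the busiest cells are ranked. The coordinate scheme (metre offsets in a local grid), the 500 m cell size and the sample incidents are assumptions for illustration, and counting past incidents is only a starting point, not a predictive model.

```python
from collections import Counter


def cell_of(x_m, y_m, cell_size_m=500):
    """Map a position (metres in a local grid) to its grid cell."""
    return (x_m // cell_size_m, y_m // cell_size_m)


def hotspots(incidents, top_n=3, cell_size_m=500):
    """Rank grid cells by the number of historical incidents in each."""
    counts = Counter(cell_of(x, y, cell_size_m) for x, y in incidents)
    return counts.most_common(top_n)


incidents = [(120, 340), (480, 90), (510, 20), (130, 400), (1700, 2300), (260, 10)]
top = hotspots(incidents, top_n=2)
```

With these sample points, four of the six incidents fall into cell `(0, 0)`, so it ranks first.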
Transportation
Traffic congestion is a major problem in many cities since it can cost cities millions in revenue. Cities can manage transportation by analyzing data from transport authorities. The analyzed data can uncover patterns that help reduce traffic congestion and help authorities implement data-driven road optimization.
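One way such patterns might be uncovered is to group speed observations by hour of day and flag the hours where average speed drops well below the limit. The function name, the `(hour, speed_kmh)` record shape, the 50% ratio and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def congested_hours(observations, speed_limit_kmh, ratio=0.5):
    """Return the hours whose average speed is below ratio * speed limit.

    `observations` is a list of (hour, speed_kmh) pairs for one road.
    """
    by_hour = defaultdict(list)
    for hour, speed in observations:
        by_hour[hour].append(speed)
    threshold = ratio * speed_limit_kmh
    return sorted(h for h, speeds in by_hour.items() if mean(speeds) < threshold)


observations = [(8, 18), (8, 22), (9, 20), (13, 45), (13, 50), (18, 15), (18, 25)]
peak = congested_hours(observations, speed_limit_kmh=50)
```

For this sample, the morning hours 8 and 9 and the evening hour 18 are flagged, while the early afternoon is not.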
Cost Reduction
Cities invest a lot of money in becoming smart cities. These investments can be for either remodeling or renovation. Analysis of big data can suggest which areas require transformation and what kind of transformation they need. As a result, cities can make targeted investments in the areas that need them.
Sustainable Growth
Regular analysis of the growth of a smart city enables city officials to get continuous updates about needed changes. Continuous updates are the key growth drivers of sustainability because they provide a clear idea regarding the required developments. Data plays a key role in determining the outcomes of development in a smart city.
Big Data flow path in Smart-City projects (or any IoT project)
Big data generally flows through the following four steps:
1. Data professionals collect data from a variety of different sources. Often, it is a mix of semi-structured and unstructured data. While each organization will use different data streams, some common sources include:
- internet clickstream data;
- web server logs;
- cloud applications;
- mobile applications;
- social media content;
- text from customer emails and survey responses;
- mobile phone records; and
- machine data captured by sensors connected to the internet of things (IoT).
2. Data is prepared and processed. After data is collected and stored in a data warehouse or data lake, data professionals must organize, configure and partition the data properly for analytical queries. Thorough data preparation and processing makes for higher performance from analytical queries.
3. Data is cleansed to improve its quality. Data professionals scrub the data using scripting tools or data quality software. They look for any errors or inconsistencies, such as duplications or formatting mistakes, and organize and tidy up the data.
4. The collected, processed and cleaned data is analyzed with analytics software. This includes tools for:
- Data mining, which sifts through data sets in search of patterns and relationships
- Predictive analytics, which builds models to forecast customer behavior and other future actions, scenarios and trends
- Machine learning, which taps various algorithms to analyze large data sets
- Deep learning, which is a more advanced offshoot of machine learning
- Text mining and statistical analysis software
- Artificial intelligence (AI)
- Mainstream business intelligence (BI) software
- Data visualization tools
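The four steps above can be sketched as a tiny pipeline. This is a minimal illustration that assumes made-up air-quality records and hypothetical function names; a real pipeline would read from a data lake or warehouse rather than a hard-coded list.

```python
from collections import defaultdict
from statistics import mean


def collect():
    """Step 1: gather raw records from the sources (hard-coded here)."""
    return [
        {"sensor": "air-1", "t": 1, "pm25": "12.5"},
        {"sensor": "air-1", "t": 1, "pm25": "12.5"},  # duplicate record
        {"sensor": "air-2", "t": 1, "pm25": "n/a"},   # formatting mistake
        {"sensor": "air-2", "t": 2, "pm25": "30.0"},
        {"sensor": "air-1", "t": 2, "pm25": "17.5"},
    ]


def prepare(records):
    """Step 2: partition records by sensor for later queries."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record["sensor"]].append((record["t"], record["pm25"]))
    return partitions


def cleanse(partitions):
    """Step 3: drop duplicate timestamps and values that are not numbers."""
    clean = {}
    for sensor, rows in partitions.items():
        seen, kept = set(), []
        for t, raw in rows:
            if t in seen:
                continue  # duplicate reading for this timestamp
            try:
                kept.append(float(raw))
            except ValueError:
                continue  # unparseable value such as "n/a"
            seen.add(t)
        clean[sensor] = kept
    return clean


def analyze(clean):
    """Step 4: a trivial analysis, the average reading per sensor."""
    return {sensor: mean(values) for sensor, values in clean.items() if values}


report = analyze(cleanse(prepare(collect())))
```

The analysis step here is deliberately trivial. In practice it is where the data mining, predictive analytics or machine learning tools listed above would come in.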
Big Data Analytics – Key Challenges
Accessibility of data – With larger amounts of data, storage and processing become more complicated. Big data should be stored and maintained properly to ensure it can be used by less experienced data scientists and analysts.
Data quality maintenance – With high volumes of data coming in from a variety of sources and in different formats, data quality management for big data requires significant time, effort and resources to properly maintain it.
Data security – The complexity of big data systems presents unique security challenges. Properly addressing security concerns within such a complicated big data ecosystem can be a complex undertaking.
Choosing the right tools – Selecting from the vast array of big data analytics tools and platforms available on the market can be confusing, so organizations must know how to pick the best tool that aligns with users’ needs and infrastructure.
Skills gap – With a potential lack of internal analytics skills and the high cost of hiring experienced data scientists and engineers, some organizations are finding it hard to fill the gaps.
Big Data Analytics – Key Technologies & Tools
Many different types of tools and technologies are used to support big data analytics processes. Common ones include:
- Hadoop, which is an open source framework for storing and processing big data sets. Hadoop can handle large amounts of structured and unstructured data.
- Predictive analytics hardware and software, which process large amounts of complex data, and use machine learning and statistical algorithms to make predictions about future event outcomes. Organizations use predictive analytics tools for fraud detection, marketing, risk assessment and operations.
- Stream analytics (real-time analytics) tools, which are used to filter, aggregate and analyze big data that may be stored in many different formats or platforms.
- Distributed storage, in which data is replicated, generally on a non-relational database. Replication can serve as a safeguard against independent node failures and lost or corrupted big data, or provide low-latency access.
- NoSQL databases, which are non-relational data management systems that are useful when working with large sets of distributed data. They do not require a fixed schema, which makes them ideal for raw and unstructured data.
- A data lake, which is a large storage repository that holds raw data in its native format until it is needed. Data lakes use a flat architecture.
- A data warehouse, which is a repository that stores large amounts of data collected by different sources. Data warehouses typically store data using predefined schemas.
- Knowledge discovery/big data mining tools, which enable businesses to mine large amounts of structured and unstructured big data.
- In-memory data fabric, which distributes large amounts of data across system memory resources. This helps provide low latency for data access and processing.
- Data virtualization, which enables data access without technical restrictions.
- Data integration software, which enables big data to be streamlined across different platforms, including Apache Hadoop, MongoDB and Amazon EMR.
- Data quality software, which cleanses and enriches large data sets.
- Data preprocessing software, which prepares data for further analysis. Data is formatted and unstructured data is cleansed.
- Spark, which is an open source cluster computing framework used for batch and stream data processing.
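To give a feel for what stream analytics tools do when they aggregate data in real time, here is a minimal sketch of a tumbling-window average written in plain Python. It is not the API of Spark or any other engine; the function name and the `(timestamp_s, value)` event shape are assumptions for illustration.

```python
def tumbling_window_avg(events, window_s):
    """Yield (window_start, average) for consecutive fixed-size windows.

    `events` is an iterable of (timestamp_s, value) pairs in time order.
    Windows that receive no events are simply skipped.
    """
    current_start, total, count = None, 0.0, 0
    for ts, value in events:
        start = ts - ts % window_s  # start of the window this event falls in
        if current_start is None:
            current_start = start
        if start != current_start:
            # The previous window is complete, so emit its average.
            yield current_start, total / count
            current_start, total, count = start, 0.0, 0
        total += value
        count += 1
    if count:
        yield current_start, total / count


events = [(0, 10), (20, 20), (59, 30), (60, 40), (130, 50)]
result = list(tumbling_window_avg(events, 60))
```

Because the function is a generator, each window's average is produced as soon as the first event of the next window arrives, which mirrors how streaming results are emitted continuously rather than after all data has been stored.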
A few of the key tools and technologies are described briefly below:
HDFS – The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. In HDFS, data is distributed over several machines and replicated to ensure durability against failures and high availability to parallel applications.
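The idea of distributing and replicating data can be sketched in a few lines. This toy example splits data into blocks and assigns each block to several nodes in round-robin order; it is not HDFS's actual placement policy (which is rack-aware), and the function names, node names and tiny block size are assumptions for illustration.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=3)
still_ok = readable_after_failure(placement, failed_node="n3")
```

Because every block lives on three different nodes, losing any single node leaves all blocks readable, which is the durability property the HDFS description refers to.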
SCALA – Scala is used in data processing, distributed computing, and web development. It powers the data engineering infrastructure of many companies. In terms of performance, Scala is often reported to be considerably faster than Python (figures of up to ten times are sometimes quoted), although the actual difference depends heavily on the workload. Scala compiles to bytecode that runs on the Java Virtual Machine (JVM), which contributes to its speed. Generally, compiled languages perform faster than interpreted languages. Because Python is dynamically typed and interpreted, type checks happen at run time, which tends to reduce execution speed.
PIG – Pig represents big data as data flows. Pig is a high-level platform used to process large datasets. It provides a high level of abstraction over MapReduce, together with a high-level scripting language, known as Pig Latin, which is used to write data analysis code.
HIVE – Hive is an easy-to-use software application that lets us analyze large-scale data through batch processing. Hive is used for fully structured data, whereas Pig is used for semi-structured data. Hive is used mainly by data analysts, whereas Pig is generally used by researchers and programmers.
SQOOP – Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into HDFS, and to export data from HDFS back to relational databases.
FLUME – Flume is an open-source distributed data collection service used for transferring data from source to destination. It is a reliable and highly available service for collecting, aggregating, and transferring huge amounts of log data into HDFS.
YARN (Yet Another Resource Negotiator) – YARN is a generic job scheduling framework. It is also responsible for managing the resources amongst applications in the cluster.
OOZIE – Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
AMBARI – Apache Ambari is an open-source administration tool deployed on top of Hadoop clusters, and it is responsible for keeping track of the running applications.
MAPREDUCE – MapReduce facilitates concurrent processing by splitting petabytes of data into smaller chunks and processing them in parallel on Hadoop commodity servers.
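The MapReduce pattern can be illustrated with a small, single-process sketch: each chunk is mapped to key/value pairs, the pairs are grouped by key, and each group is reduced to one result. Here the chunks are processed one after another, whereas Hadoop would run the map and reduce tasks in parallel on different servers. The function names and the energy-usage records are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn each record in one chunk into a (key, value) pair."""
    return [(record["district"], record["kwh"]) for record in chunk]


def shuffle(mapped_chunks):
    """Shuffle: group values from all chunks by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


chunks = [
    [{"district": "north", "kwh": 5}, {"district": "south", "kwh": 3}],
    [{"district": "north", "kwh": 7}],
]
mapped = [map_phase(chunk) for chunk in chunks]  # parallel across nodes in Hadoop
totals = reduce_phase(shuffle(mapped))
```

Since each chunk is mapped independently of the others, the map step is the part that can be spread across many machines.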
Spark & Storm – Apache Storm and Spark are platforms for big data processing that work with real-time data streams. The core difference between the two technologies is in the way they handle data processing. Storm parallelizes task computation while Spark parallelizes data computations.
IMPORTANT NOTES:
- Big data analytics applications often include data from both internal systems and external sources, such as weather data or demographic data on consumers compiled by third-party information services providers.
- Streaming analytics applications are becoming common in big data environments as users look to perform real-time analytics on data fed into Hadoop systems through stream processing engines, such as Spark, Storm, …
- Early big data systems were mostly deployed on premises, particularly in large organizations that collected, organized and analyzed massive amounts of data. But cloud platform vendors, such as Amazon Web Services (AWS), Google and Microsoft, have made it easier to set up and manage Hadoop clusters in the cloud.
- The same goes for Hadoop suppliers such as Cloudera, which supports the distribution of the big data framework on the AWS, Google and Microsoft Azure clouds. Users can now spin up clusters in the cloud, run them for as long as they need and then take them offline with usage-based pricing that doesn’t require ongoing software licenses.
In fact, plenty of tools and technology stacks are available for big data analytics; the diagram below highlights only some of them:
Factors to Consider While Selecting Big Data Tools / Technologies
You should consider the following factors before selecting a big data tool:
- License Cost, if applicable.
- Quality of Customer support.
- The cost involved in training employees on the tool.
- Hardware/Software requirements of the big data tool.
- Support and Update policy of the big data tool vendor.
- Reviews of the company.
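One possible way to compare candidates against the factors above is a simple weighted score. The sketch below assumes each factor is rated from 1 (poor) to 5 (good); the weights, tool names and ratings are entirely made up for illustration and should be replaced with your own.

```python
def score_tool(ratings, weights):
    """Weighted average of the ratings, using the given factor weights."""
    total_weight = sum(weights.values())
    return sum(ratings[factor] * w for factor, w in weights.items()) / total_weight


# Hypothetical weights: a higher number means the factor matters more.
weights = {
    "license_cost": 3, "support": 2, "training_cost": 2,
    "requirements": 1, "update_policy": 1, "reviews": 1,
}

# Hypothetical candidates and ratings, for illustration only.
candidates = {
    "tool_a": {"license_cost": 5, "support": 3, "training_cost": 4,
               "requirements": 4, "update_policy": 3, "reviews": 4},
    "tool_b": {"license_cost": 2, "support": 5, "training_cost": 3,
               "requirements": 5, "update_policy": 5, "reviews": 5},
}

ranking = sorted(candidates,
                 key=lambda name: score_tool(candidates[name], weights),
                 reverse=True)
```

A score like this does not replace a hands-on evaluation, but it makes the trade-offs between factors explicit.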
Big Data Analytics - Reference Architectures
Big Data Analytics Solution Layers
High-Level Logical View of Architecture
Microsoft Azure Cloud-Based Big Data Analytics Solution – The figure below shows the emerging big data architectural pattern in the Azure cloud
AWS Cloud-Based Big Data Analytics Solution – The figure below shows real-time operational monitoring of renewable energy assets with AWS IoT
On-Premise Big Data Analytics Solution