Using TigerVNC in Ubuntu: A Comprehensive Guide

If you are looking for a way to remotely access your Ubuntu desktop, TigerVNC is a great option. This open-source software allows you to connect to your Ubuntu machine from another device, such as a Windows or macOS computer. In this article, we will explore how to install and use TigerVNC in Ubuntu, with code examples and explanations of related concepts.

What is TigerVNC?

TigerVNC is a high-performance, platform-neutral implementation of Virtual Network Computing (VNC), a protocol that allows you to view and control the desktop of another computer remotely. TigerVNC is free and open-source software, available under the GNU General Public License.

TigerVNC provides several benefits, including:

High performance: TigerVNC is designed for efficient remote desktop access over low-bandwidth networks.

Security: TigerVNC supports encryption and authentication, ensuring that your remote desktop connection is secure.

Cross-platform compatibility: TigerVNC can be used to connect to Ubuntu from Windows, macOS, and other operating systems.

Installing TigerVNC in Ubuntu

Before we can use TigerVNC, we need to install it on our Ubuntu machine. Here are the steps to do so:

Open a terminal window by pressing Ctrl+Alt+T.

Install TigerVNC by running the following command:

sudo apt-get install tigervnc-standalone-server tigervnc-xorg-extension tigervnc-viewer

This command will install the TigerVNC server and viewer components.

Configuring TigerVNC in Ubuntu

After installing TigerVNC, we need to configure it to allow remote desktop access. Here are the steps to do so:

Open a terminal window by pressing Ctrl+Alt+T.

Run the following command to create a new VNC password:

vncpasswd

This command will prompt you to enter and confirm a new VNC password. This password will be used to authenticate remote desktop connections.

Edit the TigerVNC configuration file by running the following command:

sudo nano /etc/vnc.conf

Add the following lines to the end of the file:

Authentication=VncAuth

This tells TigerVNC to use VNC authentication with the password file we created earlier.

Save and close the file by pressing Ctrl+X, then Y, then Enter.

Starting the TigerVNC Server Now that we have installed and configured TigerVNC, we can start the server and begin accepting remote desktop connections. Here are the steps to do so:

Open a terminal window by pressing Ctrl+Alt+T.

Start the TigerVNC server by running the following command:

vncserver

This command will start the TigerVNC server and generate a unique desktop environment for each new connection.

Note the display number that is output by the command. It should be in the format :1, :2, etc. We will need this display number to connect to the remote desktop later.
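As a rough example (the display number and resolution below are only illustrative), you can also start a specific display with an explicit geometry, list the running sessions, and stop a session when you are finished:

vncserver :1 -geometry 1920x1080 -depth 24
vncserver -list
vncserver -kill :1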

Connecting to the Remote Desktop with TigerVNC Viewer

Now that the TigerVNC server is running, we can connect to the remote desktop using TigerVNC Viewer. Here are the steps to do so:

Download and install TigerVNC Viewer on the device you want to connect from. You can download it from the official website.

Open TigerVNC Viewer and enter the IP address or hostname of the Ubuntu machine in the "Remote Host" field.

Enter the display number we noted earlier in the "Display" field. For example, if the display number was :1, enter 1.

Enter the VNC password we created earlier in the "Password" field.

Conclusion

TigerVNC is a powerful and flexible tool for remotely accessing Ubuntu desktops. By following the steps outlined in this article, you should now be able to install, configure, and use TigerVNC in Ubuntu. With TigerVNC, you can easily work on your Ubuntu machine from anywhere in the world, using any device that supports the VNC protocol.


Understanding Umask: A Comprehensive Guide

As a developer or system administrator, it’s essential to understand the concept of umask. Umask is a command-line utility that determines the default file permissions for newly created files and directories. In this article, we’ll take a closer look at what umask is, how it works, and how to use it in Linux and Unix systems.

What is Umask?

In Unix and Linux systems, every file and directory has a set of permissions that determine who can read, write, and execute them. These permissions are represented by three digits, each representing the permissions for a specific group of users: the owner of the file, the group owner of the file, and everyone else.

For example, if a file has permissions set to 644, it means that the owner of the file can read and write to it, while the group owner and everyone else can only read it.

The umask command determines the default permissions that are assigned to newly created files and directories. It works by masking (clearing) the permission bits named in the umask value from the base permissions the system would otherwise grant: 666 for new files and 777 for new directories. In practice the result often looks like a simple subtraction, which is how it is usually described.

Understanding Umask Values

The umask value is represented by a three-digit octal number. Each digit represents the permissions that are removed from the default permissions for the owner, group owner, and everyone else.

For example, if the umask value is set to 022, it means that the write permission is removed for the group owner and everyone else. The default permissions for newly created files will be 644 (owner can read and write, group owner and everyone else can read), and for directories, it will be 755 (owner can read, write, and execute, group owner and everyone else can read and execute).
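If you want to check the arithmetic yourself, the short sketch below (plain Python, no external libraries) applies a umask to the base permissions the same way the system does:

# Compute the default mode a new file or directory receives for a given umask.
def default_mode(base, umask):
    # The umask bits are cleared from the base permissions; this is a bit mask,
    # not an arithmetic subtraction, although the result often looks the same.
    return base & ~umask

print(oct(default_mode(0o666, 0o022)))  # files:       0o644
print(oct(default_mode(0o777, 0o022)))  # directories: 0o755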

Using Umask in Linux and Unix Systems

To set the umask value, you can use the umask command followed by the desired value. For example, to set the umask value to 022, you can run the following command:

umask 022

You can also set the umask value in the shell startup file (e.g., ~/.bashrc or ~/.bash_profile) to make it persistent across sessions.

Once you set the umask value, any new files or directories you create will have the default permissions calculated based on the umask value.

Umask Examples

Let’s take a look at some examples to understand how umask works in practice.

Example 1: Setting the Umask Value

Suppose you want to set the umask value to 027. You can run the following command:

umask 027

This will set the umask value to 027, which means that no permissions are removed for the owner, the write permission is removed for the group owner, and all permissions (read, write, and execute) are removed for everyone else.

Example 2: Creating a New File

Suppose you create a new file named example.txt after setting the umask value to 027. The default permissions for the file will be 640 (owner can read and write, group owner can read, and everyone else has no permissions).

touch example.txt
ls -l example.txt

Output:
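The owner, group, size, and timestamp below are placeholders and will differ on your system, but the permission bits should look like this:

-rw-r----- 1 user user 0 Jan 10 10:00 example.txt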

Example 3: Creating a New Directory

Suppose you create a new directory named example after setting the umask value to 027. The default permissions for the directory will be 750 (owner can read, write, and execute, group owner can read and execute, and everyone else has no permissions).

mkdir example
ls -ld example

Output:
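Again, the owner, group, and timestamp are placeholders; the permission bits should look like this:

drwxr-x--- 2 user user 4096 Jan 10 10:00 example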

Conclusion

In summary, umask is a command-line utility that determines the default file permissions for newly created files and directories in Unix and Linux systems. Understanding how umask works is essential for developers and system administrators to ensure that the correct permissions are set for files and directories. By using umask, you can easily set the default permissions for newly created files and directories based on your specific requirements.

A Comprehensive Guide To Sharding In Data Engineering For Beginners

This article was published as a part of the Data Science Blogathon.

Big Data is a very commonly heard term these days. A reasonably large volume of data that cannot be handled on a small capacity configuration of servers can be called ‘Big Data’ in that particular context. In today’s competitive world, every business organization relies on decision-making based on the outcome of the analyzed data they have on hand. The data pipeline starting from the collection of raw data to the final deployment of machine learning models based on this data goes through the usual steps of cleaning, pre-processing, processing, storage, model building, and analysis. Efficient handling and accuracy depend on resources like software, hardware, technical workforce, and costs. Answering queries requires specific data probing in either static or dynamic mode with consistency, reliability, and availability. When data is large, inadequacy in handling queries due to the size of data and low capacity of machines in terms of speed, memory may prove problematic for the organization. This is where sharding steps in to address the above problems.

This guide explores the basics and various facets of data sharding, the need for sharding, and its pros and cons.

What is Data Sharding?

With the increasing use of IT technologies, data is accumulating at an overwhelmingly faster pace. Companies leverage this big data for data-driven decision-making. However, with the increased size of the data, system performance suffers due to queries becoming very slow if the dataset is entirely stored in a single database. This is why data sharding is required.


In simple terms, sharding is the process of dividing and storing a single logical dataset into databases that are distributed across multiple computers. This way, when a query is executed, a few computers in the network may be involved in processing the query, and the system performance is faster. With increased traffic, scaling the databases becomes non-negotiable to cope with the increased demand. Furthermore, several sharding solutions allow for the inclusion of additional computers. Sharding allows a database cluster to grow with the amount of data and traffic received.

Let’s look at some key terms used in the sharding of databases.

Scale-out and Scaling up: Scale-out is the process of adding (or removing) database servers horizontally to improve performance and increase capacity. Scaling up refers to the practice of adding physical resources to an existing database server, like memory, storage, and CPU, to improve performance.

Sharding: Sharding distributes similarly-formatted large data over several separate databases.

Chunk: A chunk is a subset of the sharded data, bounded by a lower and an upper range of the shard key.

Shard: A shard is a horizontally distributed portion of data in a database. Data collections with the same partition keys are called logical shards, which are then distributed across separate database nodes.

Sharding Key: A sharding key is the column (or columns) of the database used to partition the data. It can be a single indexed column or a combination of columns whose value determines how rows are divided between the shards. A primary key can be used as a sharding key, but a sharding key is not necessarily the primary key. The choice of the sharding key depends on the application. For example, userID could be used as a sharding key in banking or social media applications.

Logical shard and Physical Shard: A chunk of the data with the same shard key is called a logical shard. When a single server holds one or more than one logical shard, it is called a physical shard.

Shard replicas: These are the copies of the shard and are allotted to different nodes.

Partition Key: It is a key that defines the pattern of data distribution in the database. Using this key, it is possible to direct the query to the concerned database for retrieving and manipulating the data. Data having the same partition key is stored in the same node.

Replication: It is a process of copying and storing data from a central database at more than one node.

Resharding: It is the process of redistributing the data across shards to adapt to the growing size of data.

Are Sharding and Partitioning the same?

Both sharding and partitioning split a database into smaller datasets, but they are not the same. Sharding distributes the data across several machines, whereas partitioning groups subsets of data within a single unsharded database. The phrases are used interchangeably only when qualified with "horizontal" or "vertical"; as a result, "horizontal sharding" and "horizontal partitioning" mean essentially the same thing.

Vertical Sharding:

Entire columns are split and placed in new, different tables in a vertically partitioned table. The data in one vertical split is different from the data in the others, and each contains distinct rows and columns.

Horizontal Sharding:

Horizontal sharding or horizontal partitioning divides a table’s rows into multiple tables or partitions. Every partition has the same schema and columns but distinct rows. Here, the data stored in each partition is distinct and independent of the data stored in other partitions.

A single table can thus be partitioned both horizontally and vertically.

The Process

Before sharding a database, it is essential to evaluate the requirements for selecting the type of sharding to be implemented.

At the start, we need to have a clear idea about the data and how the data will be distributed across shards. The answer is crucial as it will directly impact the performance of the sharded database and its maintenance strategy.

Next, the nature of queries that need to be routed through these shards should also be known. For read queries, replication is a better and more cost-effective option than sharding the database. On the other hand, workload involving writing queries or both read and write queries would require sharding of the database. And the final point to be considered is regarding shard maintenance. As the accumulated data increases, it needs to be distributed, and the number of shards keeps on growing over time. Hence, the distribution of data in shards requires a strategy that needs to be planned ahead to keep the sharding process efficient.

Types of Sharding Architectures

Once you have decided to shard the existing database, the following step is to figure out how to achieve it. It is crucial that during query execution or distributing the incoming data to sharded tables/databases, it goes to the proper shard. Otherwise, there is a possibility of losing the data or experiencing noticeably slow searches. In this section, we will look at some commonly used sharding architectures, each of which has a distinct way of distributing data between shards. There are three main types of sharding architectures – Key or Hash-Based, Range Based, and Directory-Based sharding.

To understand these sharding strategies, say there is a company that handles databases for its clients, who sell their products in different countries. The database being handled might hold customer and order details and can often extend to more than a million rows.

We will take a few rows from the above table to explain each sharding strategy.

So, to store and query these databases efficiently, we need to implement sharding on these databases for low latency, fault tolerance, and reliability.

Key Based Sharding

Key Based Sharding or Hash-Based Sharding, uses a value from the column data — like customer ID, customer IP address, a ZIP code, etc. to generate a hash value to shard the database. This selected table column is the shard key. Next, all row values in the shard key column are passed through the hash function.

This hash function is a mathematical function that takes an input of any size (usually a combination of numbers and strings) and returns a fixed-size output called a hash value. The hash value depends on the chosen algorithm (which in turn depends on the data and application) and on the total number of available shards, and it indicates which shard number the data should be sent to.

It is important to remember that a shard key needs to be both unique and static, i.e., it should not change over a period of time. Otherwise, it would increase the amount of work required for update operations, thus slowing down performance.

The Key Based Sharding process looks like this:

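As a minimal sketch of the idea (the shard count and the use of an MD5 hash here are illustrative; production systems typically use a stable, well-distributed hash function), routing a row by customer ID could look like this:

import hashlib

NUM_SHARDS = 4

def shard_for(customer_id):
    # Hash the shard key and map the hash value onto one of the available shards.
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for(10001))  # e.g. 2 -> this row is stored on shard 2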

Features of Key Based Sharding are-

Hash keys are easy to generate algorithmically, and a good hash function spreads the data evenly among the available shards, which makes this approach effective for load balancing.

As all shards share the same load, it helps to avoid database hotspots (when one shard contains excessive data as compared to the rest of the shards).

Additionally, in this type of sharding, there is no need to have any additional map or table to hold the information of where the data is stored.

However, it is not dynamic sharding, and it can be difficult to add or remove extra servers from a database depending on the application requirement. The adding or removing of servers requires recalculating the hash key. Since the hash key changes due to a change in the number of shards, all the data needs to be remapped and moved to the appropriate shard number. This is a tedious task and often challenging to implement in a production environment.

To address the above shortcoming of Key Based Sharding, a ‘Consistent Hashing’ strategy can be used.

Consistent Hashing-

In this strategy, hash values are generated both for the data input and for each shard, based on a number derived from the data and on the IP address of the shard machine, respectively. Both sets of hash values are placed around a ring (conceptually, the 360 degrees of a circle). Each data item is then assigned to the nearest shard on the ring, moving in a fixed direction, either clockwise or anti-clockwise.

The data is loaded according to this pairing of hash values. Whenever a shard is removed, the keys that were mapped to it are reassigned to the nearest remaining shard on the ring, and a similar procedure is followed when a shard is added. This largely removes the remapping and reorganization problem of plain hash-key sharding, because only a small fraction of keys move: where key-based hashing might force you to reshuffle data on 3 out of 4 shards after a change in the hash function, consistent hashing typically requires shuffling only the keys adjacent to the shard that changed. Moreover, any overloading problem is taken care of by adding replicas of the shard.
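The sketch below illustrates the idea; it is a bare-bones ring without virtual nodes or replicas, which real deployments would add:

import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    def __init__(self, shards):
        # Place each shard on the ring at the hash of its identifier (e.g. its IP).
        self.ring = sorted((self._hash(s), s) for s in shards)

    @staticmethod
    def _hash(key):
        # Map any key to an integer position on the ring.
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16)

    def get_shard(self, key):
        # Walk clockwise to the first shard at or after the key's position,
        # wrapping around to the start of the ring if necessary.
        positions = [position for position, _ in self.ring]
        index = bisect_right(positions, self._hash(key))
        return self.ring[index % len(self.ring)][1]

ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
print(ring.get_shard("customer-10001"))  # the shard this key maps to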

Range Based Sharding

Range Based Sharding is the process of sharding data based on value ranges. Using our previous database example, we can make a few distinct shards using the Order value amount as a range (lower value and higher value) and divide customer information according to the price range of their order value, as seen below:

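As a sketch (the order-value cut-offs below are made up purely for illustration), routing by range needs nothing more than a lookup over the configured boundaries:

# Hypothetical order-value ranges; the boundaries are illustrative only.
ORDER_VALUE_RANGES = [
    (0, 100, "shard-1"),
    (100, 500, "shard-2"),
    (500, float("inf"), "shard-3"),
]

def shard_for_order_value(order_value):
    # Send the row to the shard whose range contains the order value.
    for low, high, shard in ORDER_VALUE_RANGES:
        if low <= order_value < high:
            return shard

print(shard_for_order_value(250))  # 'shard-2'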

Features of Range Based Sharding are-

There is no hashing function involved, so it is easy to add or remove machines, and there is no need to shuffle or reorganize the data.

On the other hand, this type of sharding does not ensure evenly distributed data. It can result in overloading a particular shard, commonly referred to as a database hotspot.

Directory-Based Sharding

This type of sharding relies on a lookup table (keyed by the shard key) that keeps track of which shard holds which data. It tells us exactly where the queried data is stored, and on which specific shard. This lookup table is maintained separately and does not live on the shards or inside the application itself.
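A minimal sketch of the idea (the shard key and the mapping below are hypothetical) is simply a lookup table consulted before every read or write:

# Hypothetical directory: maps the shard key (here, a country code) to a shard.
LOOKUP_TABLE = {
    "US": "shard-1",
    "IN": "shard-2",
    "DE": "shard-3",
}

def shard_for_country(country_code):
    # The directory is consulted before every query is routed.
    return LOOKUP_TABLE[country_code]

print(shard_for_country("IN"))  # 'shard-2'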

Features of Directory-Based Sharding are –

The directory-Based Sharding strategy is highly adaptable. While Range Based Sharding is limited to providing ranges of values, Key Based Sharding is heavily dependent on a fixed hash function, making it challenging to alter later if application requirements change. Directory-Based Sharding enables us to use any method or technique to allocate data entries to shards, and it is convenient to add or reduce shards dynamically.

The only downside of this type of sharding architecture is that there is a need to connect to the lookup table before each query or write every time, which may increase the latency.

Furthermore, if the lookup table gets corrupted, it can cause a complete failure at that instant, known as a single point of failure. This can be overcome by ensuring the security of the lookup table and creating a backup of the table for such events.

Other than the three main sharding strategies discussed above, there can be many more sharding strategies that are usually a combination of these three.

After this detailed sharding architecture overview, we will now understand the pros and cons of sharding databases.

Benefits of Sharding

Horizontal Scaling: For any non-distributed database on a single server, there will always be a limit to storage and processing capacity. The ability of sharding to extend horizontally makes the arrangement flexible to accommodate larger amounts of data.

Speed: One more reason a sharded database design is preferred is improved query response times. When you submit a query to a non-sharded database, it likely has to search every row in the table before finding the result set you're searching for. Queries can become prohibitively slow in an application with an unsharded single database. By sharding a single table into multiple tables, however, queries pass through fewer rows, and their results are returned considerably faster.

Reliability: Sharding can help to improve application reliability by reducing the effect of system failures. If a program or website is dependent on an unsharded database, a failure might render the entire application inoperable. An outage in a sharded database, on the other hand, is likely to affect only one shard. Even if this causes certain users to be unable to access some areas of the program or website, the overall impact would be minimal.

Challenges in Sharding

While sharding a database might facilitate growth and enhance speed, it can also impose certain constraints. We will go through some of them and why they could be strong reasons to avoid sharding entirely.

Increased complexity: Companies usually face a challenge of complexity when designing shared database architecture. There is a risk that the sharding operation will result in lost data or damaged tables if done incorrectly. Even if done correctly, shard maintenance and organization are likely to significantly influence the outcome.

Shard Imbalancing: Depending on the sharding architecture, distribution on different shards can get imbalanced due to incoming traffic. This results in remapping or reorganizing the data amongst different shards. Obviously, it is time-consuming and expensive.

Unsharding or restoring the database: Once a database has been sharded, it can be complicated to restore it to its earlier version. Backups of the database produced before it was sharded will not include data written after partitioning. As a result, reconstructing the original unsharded architecture would require either integrating the new partitioned data with the old backups or merging the partitioned database back into a single database, both of which would be undesirable.

Not supported by all databases: It should be noted that not every database engine natively supports sharding. There are several databases currently available; some popular ones are MySQL, PostgreSQL, Cassandra, MongoDB, HBase, and Redis. Some of them, such as MongoDB, offer built-in auto-sharding, while for others the sharding strategy has to be customized to suit the application's requirements.

Now that we have discussed the pros and cons of sharding databases, let us explore situations when one should select sharding.

When should one go for Sharding?

When the application data outgrows the storage capacity and can no longer be stored as a single database, sharding becomes essential.

When the volume of reading/writing exceeds the current database capacity, this results in a higher query response time or timeouts.

When a slow response is experienced while reading the read replicas, it indicates that the network bandwidth required by the application is higher than the available bandwidth.

Excluding the above situations, it is often possible to optimize the database instead. Options include carrying out server upgrades, implementing caches, creating one or more read replicas, and setting up remote databases. Only when these options cannot solve the problem of growing data should sharding be considered.

Conclusion

We have covered the fundamentals of sharding in data engineering and by now have developed a good understanding of the topic.

With sharding, businesses can add horizontal scalability to the existing databases. However, it comes with a set of challenges that need to be addressed. These include considerable complexity, possible failure points, a requirement for additional resources, and more. Thus, sharding is essential only in certain situations.

I hope you enjoyed reading this guide! In the next part of this guide, we will cover how sharding is implemented step-by-step using MongoDB.


Comprehensive Guide To Itil Lifecycle

Overview of ITIL Lifecycle

Information Technology Infrastructure Library (ITIL) is a planned framework whose main purpose is to improve the efficiency of a company's IT department. Under ITIL, the department does not just remain back-office support; IT officers become service partners of the business. In this topic, we are going to learn about the ITIL Lifecycle.

The ITIL is designed in such a way that the planning, selection, maintenance, and delivery of IT services of any business is systematized and standardized.


When a company decides to adopt ITIL, it requires trained and certified personnel to maintain it and to guide the company and its IT department. Microsoft, IBM, and Hewlett Packard Enterprise are companies that are already using ITIL successfully.

Evolution of ITIL

In 1989 ITIL was introduced to standardize IT service management. This helped in streamlining services in organizations.

In 2001 ITIL v2 was introduced which included actual processes and a sound support system to benefit organizations.

2007 brought us ITIL v3, which provided guidelines for service design, transition, and operation, and introduced a feedback loop for continual improvement.

In 2011, an updated ITIL v3 gave a broader perspective and added more focus on strategy.

2019 gave us ITIL v4, which provides an improved role for IT management in a service economy. It aims to give practical guidance while drawing connections between ITIL and new technologies like DevOps.

Stages of ITIL Lifecycle

1. The Strategy of Service

This stage is of most importance as it is the crux of ITIL Lifecycle services. A uniform and rational strategy is the key to superior service management provided by any organization. The business goals of a company and the procedures followed by the IT department should be in sync. The objectives should be in alignment with the strategies.

So, the initial questions to be answered here are:

To find out who the customers are?

What are the services required?

What sort of skill or qualifications are needed?

From where will the funds come and how will the delivery be done?

How will the monetary worth be determined?

Who will take the responsibility of the business relations?

What is the purpose of IT service management?

2. Design of the Service

In this stage, the strategies of stage 1 are converted into activity. Now the ideas are taken to the next step and planning and designing takes place. A time period is also pre-decided within which the service needs to be executed.

This process includes:

Understanding the strategy.

Creating a prospectus for the service

Ensuring that the policies and services are cost-efficient

Look into the security system

3. Transition of Service

Once designed, the strategy is tested so that it is ready to be actually performed or we can say ready to be executed. This is the stage where the procedure is thoroughly checked so that there is no issue when it is finally presented to the customer.

This transition includes:

A new service to be implemented for every design.

Every design to be tested and displayed.

Any changes required for services to be managed.

Any risks to the services also are looked into.

Accomplishing the business target.

4. Service Operation

This is where the service is presented to the customer and is ready for operation. Customer satisfaction should be ensured by the service provider here and it is his duty to see how the service is performing. If there are any issues, they need to be reported.

Services are delivered to sanctioned users.

The cost should be effective and quality enhanced.

The satisfaction of the user.

Business Enhancement.

5. Continued Improvement of Service

Though the planning, designing and implementing services is done meticulously, continuous monitoring is required so that all the strategic targets of that IT service are reached. Once these are reached, new targets can be set and the process can start again.

By ensuring the proper execution of each stage of the ITIL lifecycle, the company knows that their services and their business strategies are on the same page.

Guiding Principles of ITIL

These are a few rules of ITIL which might be common to other methods too.

Focusing on creating value directly or indirectly

Acknowledge what is good and work on the weaker aspects

Work on small projects, improvising while the job is being done and measuring the work done for future reference.

Transparency amongst the team members as well as the shareholders and owners of the company as always proves beneficial to all concerned.

The undertaking of a project until its completion should have a holistic approach to it as this is the responsibility of the service value system (SVS)

The employment of resources, tools, procedures should be optimum and practical as time and finances both matter.

Human Resources should be involved only when necessary as it is easier with software.

ITIL looks at how the knowledge of the admins can be utilized for the benefit of the organization at a larger scale.

ITIL is beneficial as stated under:

The business and its IT department are aligned better as far as the goal is concerned.

Service is provided within the timeline and there are happy customers.

Resources are optimally utilized resulting in cost reduction.

There is more clarity on the cost and assets of IT.

The environment is more adjustable and open to changes which are very helpful.

ITIL proves to be a good infrastructure for businesses that don’t have a fixed service foundation but can pursue specialists to do the job in the best way possible.

This has been a guide to the ITIL Lifecycle. Here we discussed the evolution, stages, and guiding principles of ITIL, and the main purpose of using it in the IT department of a company.

Mount Google Drive In Ubuntu Using Google

More people than ever are relying on Google for document writing and storage. It's especially useful for Linux users who need seamless compatibility with other platforms, and Google's widespread adoption means compatibility is rarely a problem.

Thankfully, Linux developers, specifically the ones working on GNOME, realized just how useful integration with Google Docs can be and built functionality into the desktop environment itself. That integration is going to make this whole process a lot easier, and you’ll have complete access to your Google Drive on Ubuntu in no time. While this guide is tailored to Ubuntu, the process can easily be adapted to any distribution running GNOME.

Install Online Accounts

Everything here is dependent on GNOME’s online accounts feature. It’s probably installed by default, but it can’t hurt to be sure. Install it with Apt.

sudo apt install gnome-online-accounts

It’s just a single package, so it won’t take very long.

Connect to Your Google Account

Now, you’ll be able to connect your Google account to GNOME through your settings. Open your app listing and locate Settings to open it.

Enter your Google username and password after a new window pops open and asks you to.

Next, once you’re alerted to the permissions within your Google account that GNOME is requesting, accept them.

In the last stage of the account setup, GNOME may ask for your keyring password. This is usually the same as your user account password on the system. This password is the one you’ll use to unlock your local password stores in GNOME.

When you’re done, GNOME will drop you back to the settings menu with your new Google account listed at the top.

Enable Drive Access

You’re also going to need to enable file access to your Google account to get access to your Drive, a simply process.

Mount Your Drive

You’ll see all of the folders that you’ve created on your Google Drive along with any individual files. You can open them up and edit them normally with LibreOffice or any other compatible program. You’re also free to add any new files to that drive, and they’ll automatically be uploaded to your Google Drive, to be accessed from anywhere.

You’re officially ready to use your Google Drive in a new and much more convenient way. Treating your Drive files like they’re local eliminates a lot of the potential awkwardness of working through a browser and uploading files. Sure, you still need an Internet connection to access and upload files, but the process is smoother, and it feels more rewarding to use Drive with Linux.

Nick Congleton

Nick is a freelance tech. journalist, Linux enthusiast, and a long time PC gamer.


Become A Data Visualization Whiz With This Comprehensive Guide To Seaborn In Python

Overview

Seaborn is a popular data visualization library for Python

Seaborn combines aesthetic appeal and technical insights – two crucial cogs in a data science project

Learn how it works and the different plots you can generate using seaborn

Introduction

There is just something extraordinary about a well-designed visualization. The colors stand out, the layers blend nicely together, the contours flow throughout, and the overall package not only has a nice aesthetic quality, but it provides meaningful insights to us as well.

This is quite important in data science where we often work with a lot of messy data. Having the ability to visualize it is critical for a data scientist. Our stakeholders or clients will more often than not rely on visual cues rather than the intricacies of a machine learning model.

There are plenty of excellent Python visualization libraries available, including the built-in matplotlib. But seaborn stands out for me. It combines aesthetic appeal seamlessly with technical insights, as we’ll soon see.

In this article, we’ll learn what seaborn is and why you should use it ahead of matplotlib. We’ll then use seaborn to generate all sorts of different data visualizations in Python. So put your creative hats on and let’s get rolling!


Table of Contents

What is Seaborn?

Why should you use Seaborn versus matplotlib?

Setting up the Environment

Data Visualization using Seaborn

Visualizing Statistical Relationships

Plotting with Categorical Data

Visualizing the Distribution of a Dataset

What is Seaborn?

Have you ever used the ggplot2 library in R? It’s one of the best visualization packages in any tool or language. Seaborn gives me the same overall feel.

Seaborn is an amazing Python visualization library built on top of matplotlib.

It gives us the capability to create amplified data visuals. This helps us understand the data by displaying it in a visual context to unearth any hidden correlations between variables or trends that might not be obvious initially. Seaborn has a high-level interface as compared to the low level of Matplotlib.

Why should you use Seaborn versus matplotlib?

I’ve been talking about how awesome seaborn is so you might be wondering what all the fuss is about.

I’ll answer that question comprehensively in a practical manner when we generate plots using seaborn. For now, let’s quickly talk about how seaborn feels like it’s a step above matplotlib.

Seaborn makes our charts and plots look engaging and enables some of the common data visualization needs (like mapping color to a variable or using faceting). Basically, it makes the data visualization and exploration easy to conquer. And trust me, that is no easy task in data science.

“If Matplotlib “tries to make easy things easy and hard things possible”, seaborn tries to make a well-defined set of hard things easy too.” – Michael Waskom (Creator of Seaborn)

There are essentially a couple of (big) limitations in matplotlib that Seaborn fixes:

Seaborn comes with a large number of high-level interfaces and customized themes that matplotlib lacks, where it's not easy to figure out the settings that make plots attractive

Matplotlib functions don’t work well with dataframes, whereas seaborn does

Setting up the Environment

The seaborn library has four mandatory dependencies you need to have: NumPy, SciPy, pandas, and Matplotlib.

To install Seaborn and use it effectively, we first need to install the aforementioned dependencies. Once this step is done, we are all set to install Seaborn and enjoy its mesmerizing plots.

To install the latest release of seaborn, you can use pip:

pip install seaborn

You can also use conda to install the latest version of seaborn:

conda install seaborn

To import the dependencies and seaborn itself in your code, you can use the following code-

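A typical import block looks like the following sketch; the snippets later in this article assume these imports (and the aliases pd, sns, and plt) are in place:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns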

That’s it! We are all set to explore seaborn in detail.

Datasets Used for Data Visualization

We’ll be working primarily with two datasets:

I’ve picked these two because they contain a multitude of variables so we have plenty of options to play around with. Both these datasets also mimic real-world scenarios so you’ll get an idea of how data visualization and exploration work in the industry.

You can check out this and other high-quality datasets and hackathons on the DataHack platform. So go ahead and download the above two datasets before you proceed. We’ll be using them in tandem.

Data Visualization using Seaborn

Let’s get started! I have divided this implementation section into two categories:

Visualizing statistical relationships

Plotting categorical data

We’ll look at multiple examples of each category and how to plot it using seaborn.

Visualizing statistical relationships

A statistical relationship denotes a process of understanding relationships between different variables in a dataset and how that relationship affects or depends on other variables.

Here, we’ll be using seaborn to generate the below plots:

Scatter plot

sns.relplot

Hue plot

I have picked the ‘Predict the number of upvotes‘ project for this. So, let’s start by importing the dataset in our working environment:

Scatterplot using Seaborn

A scatterplot is perhaps the most common example of visualizing relationships between two variables. Each point shows an observation in the dataset and these observations are represented by dot-like structures. The plot shows the joint distribution of two variables using a cloud of points.

To draw the scatter plot, we’ll be using the relplot() function of the seaborn library. It is a figure-level role for visualizing statistical relationships. By default, using a relplot produces a scatter plot:

Python Code:
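The snippet below is a sketch of what that code might look like, assuming the upvotes data has been downloaded as a CSV file named train.csv with Views and Upvotes columns (the file name and exact column names are assumptions):

# Load the 'Predict the number of upvotes' dataset; the file name is hypothetical.
upvotes = pd.read_csv("train.csv")

# relplot() draws a scatter plot by default.
sns.relplot(x="Views", y="Upvotes", data=upvotes)
plt.show()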



sns.relplot using Seaborn

sns.relplot() is the relplot() function called through sns, the conventional alias under which the seaborn library is imported above along with the other dependencies.

The parameters – x, y, and data – represent the variables on X-axis, Y-axis and the data we are using to plot respectively. Here, we’ve found a relationship between the views and upvotes.

Next, if we want to see the tag associated with the data, we can use the below code:

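A sketch of that call, assuming the column holding the tag is named Tag:

# Color each point by its tag.
sns.relplot(x="Views", y="Upvotes", hue="Tag", data=upvotes)
plt.show()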

Hue Plot

We can add another dimension in our plot with the help of hue as it gives color to the points and each color has some meaning attached to it.

In the above plot, the hue semantic is categorical. That’s why it has a different color palette. If the hue semantic is numeric, then the coloring becomes sequential.


We can also change the size of each point:

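For example (again a sketch using the same assumed column names):

# Scale each point by the number of views.
sns.relplot(x="Views", y="Upvotes", hue="Tag", size="Views", data=upvotes)
plt.show()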

We can also change the size manually by using another parameter sizes as sizes = (15, 200).

Plotting Categorical Data

Jitter

Hue

Boxplot

Violin Plot

Pointplot

In the above section, we saw how we can use different visual representations to show the relationship between multiple variables. We drew the plots between two numeric variables. In this section, we’ll see the relationship between two variables of which one would be categorical (divided into different groups).

We’ll be using catplot() function of seaborn library to draw the plots of categorical data. Let’s dive in

Jitter Plot

For jitter plot we’ll be using another dataset from the problem HR analysis challenge, let’s import the dataset now.

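A sketch of the import, assuming the HR analytics data has been downloaded as hr_train.csv (the file name is an assumption; the column names used below are the ones referenced in this article):

# Load the HR analytics dataset; the file name is hypothetical.
hr = pd.read_csv("hr_train.csv")
print(hr[["education", "gender", "age", "avg_training_score", "is_promoted"]].head())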

Now, we’ll see the plot between the columns education and avg_training_score by using catplot() function.

Since the plot is quite scattered, we can handle that by turning jitter off. Jitter is a small random offset added to each point along the categorical axis so that points don't pile up on one another; we disable it with the jitter parameter.
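A sketch of both calls (catplot() draws a jittered strip plot by default):

# Default categorical scatter (strip plot) with jitter.
sns.catplot(x="education", y="avg_training_score", data=hr)

# The same plot with jitter disabled.
sns.catplot(x="education", y="avg_training_score", jitter=False, data=hr)
plt.show()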

Hue Plot

Next, if we want to introduce another variable or dimension in our plot, we can use the hue parameter just like we did in the section above; say we want to see the gender distribution in the plot of education versus avg_training_score. This, together with the swarm variants described next, is shown in the combined sketch below.

In the above plots, we can see that the points are overlapping each other, to eliminate this situation, we can set kind = “swarm”, swarm uses an algorithm that prevents the points from overlapping and adjusts the points along the categorical axis. Let’s see how it looks like-

Pretty amazing, right? What if we want to see the swarmed version of the plot as well as a third dimension? Let’s see how it goes if we introduce is_promoted as a new variable
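A sketch of this progression, adding the hue semantic, switching to a swarm, and then colouring by is_promoted:

# Add gender as a third dimension via hue.
sns.catplot(x="education", y="avg_training_score", hue="gender", data=hr)

# Swarm the points so they no longer overlap.
sns.catplot(x="education", y="avg_training_score", hue="gender", kind="swarm", data=hr)

# Use promotion status as the hue instead.
sns.catplot(x="education", y="avg_training_score", hue="is_promoted", kind="swarm", data=hr)
plt.show()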

Clearly people with higher scores got a promotion.

Boxplot using seaborn

Another kind of plot we can draw is a boxplot which shows three quartile values of the distribution along with the end values. Each value in the boxplot corresponds to actual observation in the data. Let’s draw the boxplot now-

When we use the hue semantic with a boxplot, the boxes for each hue level are dodged (offset) along the categorical axis so they don't overlap.
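A sketch of both boxplots:

# Basic boxplot.
sns.catplot(x="education", y="avg_training_score", kind="box", data=hr)

# Boxplot with a hue semantic; boxes are dodged so they don't overlap.
sns.catplot(x="education", y="avg_training_score", hue="is_promoted", kind="box", data=hr)
plt.show()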

Violin Plot using seaborn

We can also represent the above variables differently by using violin plots. Let’s try it out

Violin plots combine the boxplot with a kernel density estimate to give a richer description of the distribution of values; the quartile values are displayed inside the violin. We can also split the violin when the hue semantic has only two levels, which helps save space on the plot. Let's look at the violin plot with split levels.
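A sketch of the violin plot and its split variant (split=True needs a hue variable with exactly two levels, such as is_promoted):

# Violin plot combining a boxplot with a kernel density estimate.
sns.catplot(x="education", y="avg_training_score", kind="violin", data=hr)

# Split the violins by a two-level hue variable to save space.
sns.catplot(x="education", y="avg_training_score", hue="is_promoted",
            kind="violin", split=True, data=hr)
plt.show()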

These amazing plots are the reason why I started using seaborn. It gives you a lot of options to display the data. Another one coming down the line is the bar plot.

Barplot using seaborn

A barplot operates on the full dataset and, by default, shows the mean value of the numeric variable for each category. Let's see it now.
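A sketch, using the same assumed HR columns as before:

# Bar plot showing the mean avg_training_score for each education level.
sns.catplot(x="education", y="avg_training_score", kind="bar", data=hr)
plt.show()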

Pointplot using seaborn

Another type of plot coming up is the pointplot, which shows the point estimate and confidence interval for each category. A pointplot connects points from the same hue category, which helps in seeing how the relationship changes within a particular hue level. You can check out how a pointplot displays this information below.
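A sketch of a pointplot with a hue semantic:

# Point estimates and confidence intervals, connected within each hue level.
sns.catplot(x="education", y="avg_training_score", hue="is_promoted",
            kind="point", data=hr)
plt.show()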

As is clear from the above plot, employees with higher average training scores are more likely to be promoted.

This is not the end, seaborn is a huge library with a lot of plotting functions for different purposes. One such purpose is to introduce multiple dimensions. We can visualize higher dimension relationships as well. Let’s check it out using swarm plot.

Swarm plot using seaborn

It becomes so easy to visualize the insights when we combine multiple concepts into one. Here, the swarm plot uses the is_promoted attribute as the hue semantic and the gender attribute as a faceting variable.

Visualizing the Distribution of a Dataset

Whenever we are dealing with a dataset, we want to know how the data or the variables are being distributed. Distribution of data could tell us a lot about the nature of the data, so let’s dive into it.

Plotting Univariate Distributions

Histogram

One of the most common plots you’ll come across while examining the distribution of a variable is distplot. By default, distplot() function draws histogram and fits a Kernel Density Estimate. Let’s check out how age is distributed across the data.

This clearly shows that the majority of people are in their late twenties and early thirties.

Histogram using Seaborn

Another kind of plot that we use for univariate distribution is a histogram.

A histogram represents the distribution of data in the form of bins and uses bars to show the number of observations falling under each bin. We can also add a rugplot in it instead of using KDE (Kernel Density Estimate), which means at every observation, it will draw a small vertical stick.
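A sketch of a plain histogram with a rug plot instead of the KDE curve:

# Histogram only (no KDE), with a small tick drawn at every observation.
sns.distplot(hr["age"], kde=False, rug=True)
plt.show()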

Plotting Bivariate Distributions

Hexplot

KDE plot

Boxen plot

Ridge plot (Joyplot)

Apart from visualizing the distribution of a single variable, we can see how two independent variables are distributed with respect to each other. Bivariate means joint, so to visualize it, we use jointplot() function of seaborn library. By default, jointplot draws a scatter plot. Let’s check out the bivariate distribution between age and avg_training_score.
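A sketch of the default (scatter) jointplot:

# Joint distribution of age and average training score, with marginal histograms.
sns.jointplot(x="age", y="avg_training_score", data=hr)
plt.show()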

There are multiple ways to visualize bivariate distribution. Let’s look at a couple of more.

Hexplot using Seaborn

A hexplot is the bivariate analogue of a histogram, as it shows the number of observations falling within hexagonal bins. It works well with large datasets. To draw a hexplot, we set the kind attribute to hex. Let's check it out now.

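A sketch:

# Hexbin version of the joint plot, better suited to large datasets.
sns.jointplot(x="age", y="avg_training_score", data=hr, kind="hex")
plt.show()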

KDE Plot using Seaborn

That’s not the end of this, next comes KDE plot. It’s another very awesome method to visualize the bivariate distribution. Let’s see how the above observations could also be achieved by using jointplot() function and setting the attribute kind to KDE.

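A sketch:

# Smooth two-dimensional density estimate of the same two variables.
sns.jointplot(x="age", y="avg_training_score", data=hr, kind="kde")
plt.show()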

Heatmaps using Seaborn

Now let’s talk about my absolute favorite plot, the heatmap. Heatmaps are graphical representations in which each variable is represented as a color.

Let’s go ahead and generate one:

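A common use is a correlation heatmap of the numeric columns (a sketch):

# Correlation matrix of the numeric columns, drawn as a heatmap.
corr = hr.select_dtypes(include="number").corr()
sns.heatmap(corr, annot=True, cmap="viridis")
plt.show()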

Boxen Plot using Seaborn

Another plot that we can use to show how a variable is distributed across categories is the boxen plot. Boxen plots were originally called letter-value plots because they show a large number of quantiles (the letter values) of a variable. Plotting a large number of quantiles gives more insight into the shape of the distribution. They are similar to box plots; let's see how they can be used.

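A sketch:

# Boxen (letter-value) plot showing many quantiles of the distribution.
sns.catplot(x="education", y="avg_training_score", kind="boxen", data=hr)
plt.show()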

Ridge Plot using seaborn

The next plot is quite fascinating. It’s called ridge plot. It is also called joyplot. Ridge plot helps in visualizing the distribution of a numeric value for several groups. These distributions could be represented by using KDE plots or histograms. Now, let’s try to plot a ridge plot for age with respect to gender.

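Seaborn has no single ridge-plot function; a common sketch builds it from a FacetGrid of KDE plots, one row per group:

# One KDE of age per gender, stacked as rows to form a ridge-style plot.
g = sns.FacetGrid(hr, row="gender", hue="gender", aspect=4, height=1.5)
g.map(sns.kdeplot, "age")
plt.show()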

Visualizing Pairwise Relationships in a Dataset

We can also plot multiple bivariate distributions in a dataset by using the pairplot() function of the seaborn library. It shows the relationship between each pair of columns in the dataset and draws the univariate distribution of each variable on the diagonal. Let's see how it looks.
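A sketch, restricted to the numeric columns:

# Pairwise scatter plots for every pair of numeric columns,
# with univariate distributions on the diagonal.
sns.pairplot(hr.select_dtypes(include="number"))
plt.show()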

End Notes

We’ve covered a lot of plots here. We saw how the seaborn library can be so effective when it comes to visualizing and exploring data (especially large datasets). We also discussed how we can plot different functions of the seaborn library for different kinds of data.

Like I mentioned earlier, the best way to learn seaborn (or any concept or library) is by practicing it. The more you generate new visualizations on your own, the more confident you’ll become. Go ahead and try your hand at any practice problem on the DataHack platform and start becoming a data visualization master!

