Pneumonia Detection Using CNN With Implementation In Python


Hey there! I just finished another deep learning project several hours ago, and now I want to share what I actually did there. The objective of this challenge is to determine whether a person suffers from pneumonia or not, and if yes, whether it's caused by bacteria or a virus. Well, I think this project should be called classification instead of detection.

Several x-ray images in the dataset used in this project.

In other words, this task is going to be a multiclass classification problem where the label names are: normal, virus, and bacteria. In order to solve this problem, I will use a CNN (Convolutional Neural Network), thanks to its excellent ability to perform image classification. Not only that, but here I also implement the image augmentation technique as an approach to improve model performance. By the way, I obtained 80% accuracy on the test data, which is pretty impressive to me.

The dataset used in this project can be downloaded from this Kaggle link. The size of the entire dataset itself is around 1 GB, so it might take a while to download. Or, we can also directly create a Kaggle Notebook and code the entire project there, so we don’t even need to download anything. Next, if you explore the dataset folder, you will see that there are 3 subfolders, namely train, test and val.

Well, I think those folder names are self-explanatory. In addition, the data in the train folder consists of 1341, 1345, and 2530 samples for normal, virus and bacteria class respectively. I think that’s all for the intro, let’s now jump into the code!

Note: I put the entire code used in this project at the end of this article.

Loading modules and train images

The very first thing to do when working on a computer vision project is to load all required modules and the image data itself. I use the tqdm module to display a progress bar; you'll see later why it is useful. The last import here is ImageDataGenerator from Keras, which will help us implement the image augmentation technique during the training process.

import os
import cv2
import pickle
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import confusion_matrix
from keras.models import Model, load_model
from keras.layers import Dense, Input, Conv2D, MaxPool2D, Flatten
from keras.preprocessing.image import ImageDataGenerator

np.random.seed(22)

Next, I define two functions to load the image data from each folder. The two functions below might look identical at a glance, but there is actually a small difference in the line that builds the labels: the first assigns the constant label 'normal', while the second parses the label (virus or bacteria) out of each filename. This is done because the filename structures in the NORMAL and PNEUMONIA folders are slightly different. Apart from that, both functions do essentially the same thing. First, all images are resized to 200 by 200 pixels.

This is important since the images in the folders come in different dimensions, while a neural network can only accept data with a fixed array size. Next, all images are stored with 3 color channels, which I think is just redundant for x-ray images. So the idea here is to convert all those color images to grayscale.

# Do not forget to include the last slash
def load_normal(norm_path):
    norm_files = np.array(os.listdir(norm_path))
    norm_labels = np.array(['normal']*len(norm_files))

    norm_images = []
    for image in tqdm(norm_files):
        image = cv2.imread(norm_path + image)
        image = cv2.resize(image, dsize=(200,200))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        norm_images.append(image)

    norm_images = np.array(norm_images)

    return norm_images, norm_labels

def load_pneumonia(pneu_path):
    pneu_files = np.array(os.listdir(pneu_path))
    pneu_labels = np.array([pneu_file.split('_')[1] for pneu_file in pneu_files])

    pneu_images = []
    for image in tqdm(pneu_files):
        image = cv2.imread(pneu_path + image)
        image = cv2.resize(image, dsize=(200,200))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        pneu_images.append(image)

    pneu_images = np.array(pneu_images)

    return pneu_images, pneu_labels

Now that the two functions above have been declared, we can use them to load the train data. If you run the code below, you'll also see why I chose to use the tqdm module in this project.

norm_images, norm_labels = load_normal('/kaggle/input/chest-xray-pneumonia/chest_xray/train/NORMAL/')

pneu_images, pneu_labels = load_pneumonia('/kaggle/input/chest-xray-pneumonia/chest_xray/train/PNEUMONIA/')

The progress bar displayed using tqdm module.

Up to this point, we already have several arrays: norm_images, norm_labels, pneu_images, and pneu_labels. The arrays with the _images suffix contain the preprocessed images, while the ones with the _labels suffix store the ground truths (a.k.a. labels). In other words, norm_images and pneu_images are going to be our X data, while the rest is going to be the y data. To make things more straightforward, I concatenate the values of those arrays and store them in the X_train and y_train arrays.

X_train = np.append(norm_images, pneu_images, axis=0)
y_train = np.append(norm_labels, pneu_labels)

The shape of the features (X) and labels (y).

By the way, I obtain the number of images of each class using the following code:
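A minimal way to get those counts from the y_train array built above (an illustrative sketch, not necessarily the exact code used) is:

# Count how many samples belong to each class in the training labels
class_names, class_counts = np.unique(y_train, return_counts=True)
print(dict(zip(class_names, class_counts)))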

Finding out the number of unique values in our training set
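From here, the remaining steps mentioned in the intro (one-hot encoding the labels, image augmentation with ImageDataGenerator, and the CNN itself) can be sketched roughly as follows, reusing the modules imported earlier. The layer sizes, augmentation settings, and epoch count below are illustrative assumptions only, not necessarily the exact configuration behind the 80% result:

# Add a channel axis and scale pixel values to the 0-1 range (an assumed preprocessing step)
X_train = X_train.reshape(-1, 200, 200, 1) / 255.0

# One-hot encode the string labels (normal, virus, bacteria)
one_hot_encoder = OneHotEncoder()
y_train_one_hot = one_hot_encoder.fit_transform(y_train.reshape(-1, 1)).toarray()

# Image augmentation: small random rotations, shifts, and zooms
datagen = ImageDataGenerator(rotation_range=10, zoom_range=0.1,
                             width_shift_range=0.1, height_shift_range=0.1)
train_gen = datagen.flow(X_train, y_train_one_hot, batch_size=32)

# A small CNN built from the layers imported earlier
input_layer = Input(shape=(200, 200, 1))
x = Conv2D(16, (3, 3), activation='relu')(input_layer)
x = MaxPool2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu')(x)
x = MaxPool2D((2, 2))(x)
x = Flatten()(x)
x = Dense(100, activation='relu')(x)
output_layer = Dense(3, activation='softmax')(x)

model = Model(input_layer, output_layer)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.fit(train_gen, epochs=30)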


Beginners Guide To Convolutional Neural Network With Implementation In Python

This article was published as a part of the Data Science Blogathon

We have learned about the Artificial Neural Network and its applications in the last few articles. This blog will be all about another deep learning model, the Convolutional Neural Network. As always, this will be a beginner's guide and will be written in such a manner that a starter in the Data Science field will be able to understand the concepts, so keep on reading 🙂

1. Introduction to Convolutional Neural Network

2. Its Components

Input layer

Convolutional Layer

Pooling Layer

Fully Connected Layer

3. Practical Implementation of CNN on a dataset

Introduction to CNN

Convolutional Neural Network is a Deep Learning algorithm specially designed for working with Images and videos. It takes images as inputs, extracts and learns the features of the image, and classifies them based on the learned features.

This algorithm is inspired by the working of a part of the human brain called the visual cortex. The visual cortex is the part of the human brain responsible for processing visual information from the outside world. It has various layers, and each layer has its own function, i.e., each layer extracts some information from the image or visual; at last, all the information received from each layer is combined and the image/visual is interpreted or classified.

Similarly, CNN has various filters, and each filter extracts some information from the image such as edges, different kinds of shapes (vertical, horizontal, round), and then all of these are combined to identify the image.

So why not simply use an ANN for images? One reason is that it is too much computation for an ANN model to train large-size images and different types of image channels.

Another reason is that ANN is sensitive to the location of the object in the image i.e if the location or place of the same object changes, it will not be able to classify properly.

Components of CNN

The CNN model works in two steps: feature extraction and Classification

Feature extraction is the phase where various filters and layers are applied to the images to extract information and features from them; once that is done, the result is passed on to the next phase, classification, where the images are classified based on the target variable of the problem.

A typical CNN model looks like this:

Input layer

Convolution layer + Activation function

Pooling layer

Fully Connected Layer

Let’s learn about each layer in detail.

Input layer

As the name says, it's our input image, which can be grayscale or RGB. Every image is made up of pixels with values ranging from 0 to 255. We need to normalize them, i.e., scale the values to the range 0 to 1, before passing the image to the model.

Below is an example of an input image of size 4*4 with 3 channels (RGB) and its pixel values.
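As a tiny illustration of that normalization step (the 4*4 toy image here is made up just for the example):

import numpy as np

# A toy 4*4 RGB image with pixel values in the 0-255 range
image = np.random.randint(0, 256, size=(4, 4, 3))

# Scale the values to the 0-1 range before feeding them to the model
normalized = image / 255.0
print(normalized.min(), normalized.max())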

Convolution Layer

The convolution layer is the layer where the filter is applied to our input image to extract or detect its features. A filter is applied to the image multiple times and creates a feature map which helps in classifying the input image. Let’s understand this with the help of an example. For simplicity, we will take a 2D input image with normalized pixels.

In the above figure, we have an input image of size 6*6 and applied a filter of 3*3 on it to detect some features. In this example, we have applied only one filter but in practice, many such filters are applied to extract information from the image.

The result of applying the filter to the image is that we get a Feature Map of 4*4 which has some information about the input image. Many such feature maps are generated in practical applications.

Let’s get into some maths behind getting the feature map in the above image.

As presented in the above figure, in the first step the filter is applied to the green highlighted part of the image, and the pixel values of the image are multiplied with the values of the filter (as shown in the figure using lines) and then summed up to get the final value.

In the next step, the filter is shifted by one column as shown in the below figure. This jump to the next column or row is known as stride and in this example, we are taking a stride of 1 which means we are shifting by one column.

Similarly, the filter passes over the entire image and we get our final Feature Map. Once we get the feature map, an activation function is applied to it for introducing nonlinearity.

A point to note here is that the Feature map we get is smaller than the size of our image. As we increase the value of stride the size of the feature map decreases.

This is how a filter passes through the entire image with the stride of 1
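The same computation can be sketched in a few lines of NumPy: slide a 3*3 filter over a 6*6 image with a stride of 1, multiply elementwise, and sum. The filter values below are arbitrary and only for illustration:

import numpy as np

# A 6*6 input image (normalized pixels) and a 3*3 filter, as in the example above
image = np.random.rand(6, 6)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

stride = 1
out_size = (image.shape[0] - kernel.shape[0]) // stride + 1   # (6 - 3) / 1 + 1 = 4
feature_map = np.zeros((out_size, out_size))

# Slide the filter over the image: elementwise multiply and sum at each position
for i in range(out_size):
    for j in range(out_size):
        patch = image[i*stride:i*stride + 3, j*stride:j*stride + 3]
        feature_map[i, j] = np.sum(patch * kernel)

# Apply an activation function (ReLU) to introduce nonlinearity
feature_map = np.maximum(feature_map, 0)
print(feature_map.shape)   # (4, 4)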

Pooling Layer

The pooling layer is applied after the Convolutional layer and is used to reduce the dimensions of the feature map which helps in preserving the important information or features of the input image and reduces the computation time.

Using pooling, a lower resolution version of input is created that still contains the large or important elements of the input image.

The most common types of pooling are Max Pooling and Average Pooling. The below figure shows how Max Pooling works. We will use the feature map from the above example to apply pooling, with a pooling layer of size 2*2 and a stride of 2.

The maximum value from each highlighted area is taken and a new version of the input image is obtained which is of size 2*2 so after applying Pooling the dimension of the feature map has reduced. 
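Continuing the NumPy sketch above (it reuses the feature_map array from the convolution example), 2*2 max pooling with a stride of 2 looks like this:

# 2*2 max pooling with a stride of 2, applied to the 4*4 feature map from above
pool_size, stride = 2, 2
pooled = np.zeros((feature_map.shape[0] // stride, feature_map.shape[1] // stride))

for i in range(pooled.shape[0]):
    for j in range(pooled.shape[1]):
        window = feature_map[i*stride:i*stride + pool_size, j*stride:j*stride + pool_size]
        pooled[i, j] = window.max()   # keep only the largest value in each window

print(pooled.shape)   # (2, 2)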

Fully Connected Layer

Till now we have performed the Feature Extraction steps, now comes the Classification part. The Fully connected layer (as we have in ANN) is used for classifying the input image into a label. This layer connects the information extracted from the previous steps (i.e Convolution layer and Pooling layers) to the output layer and eventually classifies the input into the desired label.

The complete process of a CNN model can be seen in the below image.
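To make the complete process concrete in code, here is a minimal Keras sketch of such a model. It is purely illustrative: the MNIST digit dataset, the layer sizes, and the training settings are assumptions, not part of this article. The model.evaluate(X_test, y_test) call under the implementation heading below then reports loss and accuracy for a model trained this way.

import numpy as np
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

# Load a small, well-known image dataset: 28x28 grayscale handwritten digits
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Scale pixel values to the 0-1 range and add an explicit channel dimension
X_train = X_train[..., np.newaxis] / 255.0
X_test = X_test[..., np.newaxis] / 255.0

# Feature extraction (convolution + pooling) followed by classification (dense layers)
model = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPool2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),   # one output unit per digit class
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=3, batch_size=128, validation_split=0.1)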

How to Implement CNN in Python?

#evaluating the model
model.evaluate(X_test, y_test)

Frequently Asked Questions

Q1. What is CNN in Python?

A. A Convolutional Neural Network (CNN) is a type of deep neural network used for image recognition and classification tasks in machine learning. Python libraries like TensorFlow, Keras, PyTorch, and Caffe provide pre-built CNN architectures and tools for building and training them on specific datasets.

Q2. What are the 4 types of CNN?

A. The four common types of Convolutional Neural Networks (CNNs) are LeNet, AlexNet, VGGNet, and ResNet. LeNet is the first CNN architecture used for handwritten digit recognition, while AlexNet, VGGNet, and ResNet are deep CNNs that achieved top performance on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

Q3. What is tensorflow in python?

A. TensorFlow is an open-source machine learning and artificial intelligence library developed by Google Brain Team. It is written in Python and provides high-level APIs like Keras, as well as low-level APIs, for building and training machine learning models. TensorFlow also offers tools for data preprocessing, visualization, and distributed computing.

End notes:

We have covered some important elements of CNN in this blog, while many topics, such as padding, data augmentation, and more details on stride, are still left; since deep learning is a deep and never-ending topic, I will try to discuss them in future blogs. I hope you found this article helpful and worth the time you invested in it.

In the next few blogs, you can expect a detailed implementation of CNN with explanations and concepts like Data augmentation and Hyperparameter tuning.

About the Author

I am Deepanshi Dhingra, currently working as a Data Science Researcher, and I possess knowledge of Analytics, Exploratory Data Analysis, Machine Learning, and Deep Learning. Feel free to connect with me on LinkedIn for any feedback and suggestions.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


Guide To Gradient Descent And Its Variants With Python Implementation

This article was published as a part of the Data Science Blogathon

Introduction

Table of Contents:

Need for Optimization

Gradient Descent

Stochastic Gradient Descent (SGD)

Mini-batch Gradient Descent

Momentum-based Gradient Descent

Adagrad (short for adaptive gradient)

Adadelta

Adam (Adaptive Moment Estimation)

Conclusion

Need for Optimization

The main purpose of machine learning or deep learning is to create a model that performs well and gives accurate predictions in a particular set of cases. In order to achieve that, we need optimization.

Optimization starts with defining some kind of loss function/cost function (objective function) and ends with minimizing it using one or the other optimization routine. The choice of an optimization algorithm can make a difference between getting a good accuracy in hours or days.

To know more about the Optimization algorithm refer to this article

1. Gradient Descent

Gradient descent is one of the most popular and widely used optimization algorithms.

Gradient descent is not only applicable to neural networks but is also used in situations where we need to find the minimum of the objective function.

Python Implementation:

Note: We will be using MSE (Mean Squared Error) as the loss function.
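For reference, the standard MSE loss for a line y = mx + b and its gradients, which the numbered equations (2), (3), and (4) in the code comments below refer to, are:

J(m, b) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - (m x_i + b)\bigr)^2 \quad (2)

\frac{\partial J}{\partial m} = -\frac{2}{N}\sum_{i=1}^{N} x_i \bigl(y_i - (m x_i + b)\bigr) \quad (3)

\frac{\partial J}{\partial b} = -\frac{2}{N}\sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr) \quad (4)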

We generate some random data points with 500 rows and 2 columns (x and y) and use them for training

import numpy as np

data = np.random.randn(500, 2)   ## Column one = X values; Column two = Y values
theta = np.zeros(2)              ## Model parameters (weights)

Calculate the loss function using MSE

def loss_function(data, theta):
    # get m and b
    m = theta[0]
    b = theta[1]
    loss = 0
    # on each data point
    for i in range(0, len(data)):
        # get x and y
        x = data[i, 0]
        y = data[i, 1]
        # predict the value of y
        y_hat = (m*x + b)
        # compute loss as given in equation (2)
        loss = loss + ((y - (y_hat)) ** 2)
    # mean squared loss
    mean_squared_loss = loss / float(len(data))
    return mean_squared_loss

Calculate the Gradient of loss function for model parameters

def compute_gradients(data, theta):
    gradients = np.zeros(2)
    # total number of data points
    N = float(len(data))
    m = theta[0]
    b = theta[1]
    # for each data point
    for i in range(len(data)):
        x = data[i, 0]
        y = data[i, 1]
        # gradient of loss function with respect to m as given in (3)
        gradients[0] += - (2 / N) * x * (y - ((m * x) + b))
        # gradient of loss function with respect to b as given in (4)
        gradients[1] += - (2 / N) * (y - ((theta[0] * x) + b))
    # add epsilon to avoid division by zero error
    epsilon = 1e-6
    gradients = np.divide(gradients, N + epsilon)
    return gradients

After computing gradients, we need to update our model parameter.

theta = np.zeros(2)
gr_loss = []
for t in range(50000):
    # compute gradients
    gradients = compute_gradients(data, theta)
    # update parameter
    theta = theta - (1e-2 * gradients)
    # store the loss
    gr_loss.append(loss_function(data, theta))

2. Stochastic Gradient Descent (SGD)

In gradient descent, to perform a single parameter update, we go through all the data points in our training set. Updating the parameters only after iterating through the entire training set makes convergence very slow and increases the training time, especially when we have a large dataset. To combat this, we use Stochastic Gradient Descent (SGD).

In Stochastic Gradient Descent (SGD), we don't wait until we have iterated through all the data points in the training set; instead, we update the parameters of the model after every single data point.

Since we update the parameters of the model in SGD after iterating every single data point, it will learn the optimal parameter of the model faster hence faster convergence, and this will minimize the training time as well.
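The article shows code for batch gradient descent above and mini-batch gradient descent below, but not for plain SGD; a minimal sketch in the same style, reusing the loss_function and compute_gradients helpers defined earlier (the epoch count and learning rate here are arbitrary choices), could look like this:

def SGD(data, theta, lr=1e-2, num_epochs=10):
    loss = []
    for epoch in range(num_epochs):
        # visit the data points in a random order each epoch
        np.random.shuffle(data)
        for i in range(len(data)):
            # update the parameters after every single data point
            single_point = data[i].reshape(1, 2)
            gradients = compute_gradients(single_point, theta)
            theta = theta - (lr * gradients)
        # store the loss on the full dataset once per epoch
        loss.append(loss_function(data, theta))
    return loss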

3. Mini-batch Gradient Descent

In Mini-batch gradient descent, we update the parameters after iterating some batches of data points.

Let’s say the batch size is 10, which means that we update the parameter of the model after iterating through 10 data points instead of updating the parameter after iterating through each individual data point.

Now we will calculate the loss function and update parameters

import math

def minibatch(data, theta, lr=1e-2, minibatch_ratio=0.01, num_iterations=5000):
    loss = []
    minibatch_size = int(math.ceil(len(data) * minibatch_ratio))   ## calculate batch size
    for t in range(num_iterations):
        # sample a batch of data points
        np.random.shuffle(data)
        sample_data = data[0:minibatch_size, :]
        # compute gradients on the mini-batch
        grad = compute_gradients(sample_data, theta)
        # update parameters
        theta = theta - (lr * grad)
        loss.append(loss_function(data, theta))
    return loss

4. Momentum-based Gradient Descent

The problem with Stochastic Gradient Descent (SGD) and Mini-batch Gradient Descent was that during convergence they had oscillations.

From the above plot, we can see oscillations represented with dotted lines in the case of Mini-batch Gradient Descent.

Now you must be wondering where these oscillations come from.

They arise because each update in SGD or Mini-batch Gradient Descent is computed from only a subset of the data, so successive gradient steps can point in slightly different directions. Momentum helps us avoid taking directions that do not lead us toward convergence.

In other words, we take a fraction of the parameter update from the previous gradient step and add it to the current gradient step.
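For reference, the standard momentum updates that equations (8) and (9) in the code comments refer to are:

v_t = \gamma\, v_{t-1} + \eta\, \nabla_\theta J(\theta) \quad (8)

\theta = \theta - v_t \quad (9)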

Python Implementation

def Momentum(data, theta, lr=1e-2, gamma=0.9, num_iterations=5000):
    loss = []
    # initialize vt with zeros
    vt = np.zeros(theta.shape[0])
    for t in range(num_iterations):
        # compute gradients with respect to theta
        gradients = compute_gradients(data, theta)
        # update vt by equation (8)
        vt = gamma * vt + lr * gradients
        # update model parameter theta by equation (9)
        theta = theta - vt
        # store loss of every iteration
        loss.append(loss_function(data, theta))
    return loss

From the above plot, we can see that Momentum reduces the oscillations produced in MiniBatch Gradient Descent

5. Adagrad (short for adaptive gradient)

In the case of deep learning, we have many model parameters (Weights) and many layers to train. Our goal is to find the optimal values for all these weights.

In all of the previous methods, we observed that the learning rate was a constant value for all the parameters of the network.

However, Adagrad adaptively sets the learning rate according to a parameter hence the name adaptive gradient.

In the given equation, the denominator represents the sum of the squares of the past gradients for the given parameter. Notice that this denominator effectively scales the learning rate.

– That is, when the sum of the squared past gradients has a high value, we are basically dividing the learning rate by a high value, so our learning rate will become less.

– Similarly, if the sum of the squared past gradients has a low value, we are dividing the learning rate by a lower value, so our learning rate value will become high.
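For reference, the standard AdaGrad update that equation (12) in the code comments refers to is:

\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\sum_{\tau=1}^{t} g_\tau^2 + \epsilon}}\; g_t \quad (12)

where g_t is the current gradient and the sum in the denominator accumulates the squared past gradients for that parameter.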

Python Implementation

def AdaGrad(data, theta, lr=1e-2, epsilon=1e-8, num_iterations=100):
    loss = []
    # initialize gradients_sum for storing sum of gradients
    gradients_sum = np.zeros(theta.shape[0])
    for t in range(num_iterations):
        # compute gradients with respect to theta
        gradients = compute_gradients(data, theta)
        # compute square of sum of gradients
        gradients_sum += gradients ** 2
        # update gradients
        gradient_update = gradients / (np.sqrt(gradients_sum + epsilon))
        # update model parameter according to equation (12)
        theta = theta - (lr * gradient_update)
        loss.append(loss_function(data, theta))
    return loss

As we can see that for every iteration, we are accumulating and summing all the past squared gradients. So, on every iteration, our sum of the squared past gradients value will increase. When the sum of the squared past gradient value is high, we will have a large number in the denominator. When we divide the learning rate by a very large number, then the learning rate will become very small.

That is, our learning rate will keep decreasing. When the learning rate reaches a very low value, it takes a long time to attain convergence.

6. Adadelta

We can see that in the case of Adagrad we had a vanishing learning rate problem. To deal with this we generally use Adadelta.

In Adadelta, instead of taking the sum of all the squared past gradients, we take the exponentially decaying running average or weighted average of gradients.
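For reference, the standard Adadelta updates that equations (13) through (16) in the code comments refer to are:

E[g^2]_t = \gamma\, E[g^2]_{t-1} + (1-\gamma)\, g_t^2 \quad (13)

\Delta\theta_t = -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\; g_t \quad (14)

E[\Delta\theta^2]_t = \gamma\, E[\Delta\theta^2]_{t-1} + (1-\gamma)\, \Delta\theta_t^2 \quad (15)

\theta_{t+1} = \theta_t + \Delta\theta_t \quad (16)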

Python Implementation

def AdaDelta(data, theta, gamma=0.9, epsilon=1e-5, num_iterations=500):
    loss = []
    # initialize running average of gradients
    E_grad2 = np.zeros(theta.shape[0])
    # initialize running average of parameter update
    E_delta_theta2 = np.zeros(theta.shape[0])
    for t in range(num_iterations):
        # compute gradients of loss with respect to theta
        gradients = compute_gradients(data, theta)
        # compute running average of gradients as given in equation (13)
        E_grad2 = (gamma * E_grad2) + ((1. - gamma) * (gradients ** 2))
        # compute delta_theta as given in equation (14)
        delta_theta = - (np.sqrt(E_delta_theta2 + epsilon)) / (np.sqrt(E_grad2 + epsilon)) * gradients
        # compute running average of parameter updates as given in equation (15)
        E_delta_theta2 = (gamma * E_delta_theta2) + ((1. - gamma) * (delta_theta ** 2))
        # update the model parameter, theta, as given in equation (16)
        theta = theta + delta_theta
        # store the loss
        loss.append(loss_function(data, theta))
    return loss

Note: The main idea behind Adadelta and RMSprop is mostly the same, namely to deal with the vanishing learning rate by taking a weighted (exponentially decaying) average of the squared gradients instead of their full sum.

To know more about RMSprop refer to this article

7. Adam (Adaptive Moment Estimation)

In Adam, we compute exponentially decaying running averages of both the gradients (the first moment) and the squared gradients (the second moment).

In the equations below, beta1 and beta2 denote the decay rates.
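For reference, the standard Adam updates that equations (19) through (23) in the code comments refer to are:

m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t \quad (19)

v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \quad (20)

\hat m_t = \frac{m_t}{1-\beta_1^t} \quad (21) \qquad \hat v_t = \frac{v_t}{1-\beta_2^t} \quad (22)

\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat v_t} + \epsilon}\, \hat m_t \quad (23)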

From the above equation, we can see that we are combining the equations from both Momentum and RMSProp.

Python Implementation

def Adam(data, theta, lr=1e-2, beta1=0.9, beta2=0.9, epsilon=1e-6, num_iterations=1000):
    loss = []
    # initialize first moment mt
    mt = np.zeros(theta.shape[0])
    # initialize second moment vt
    vt = np.zeros(theta.shape[0])
    for t in range(num_iterations):
        # compute gradients with respect to theta
        gradients = compute_gradients(data, theta)
        # update first moment mt as given in equation (19)
        mt = beta1 * mt + (1. - beta1) * gradients
        # update second moment vt as given in equation (20)
        vt = beta2 * vt + (1. - beta2) * gradients ** 2
        # compute bias-corrected estimate of mt (21)
        mt_hat = mt / (1. - beta1 ** (t+1))
        # compute bias-corrected estimate of vt (22)
        vt_hat = vt / (1. - beta2 ** (t+1))
        # update the model parameter as given in (23)
        theta = theta - (lr / (np.sqrt(vt_hat) + epsilon)) * mt_hat
        loss.append(loss_function(data, theta))
    return loss

Using AWS S3 With Python Boto3

This article was published as a part of the Data Science Blogathon.

Introduction

AWS S3 is one of the object storage services that allows users to store and retrieve files quickly and securely from anywhere. Users can combine S3 with other AWS services to build numerous applications. Boto3 is the AWS SDK for Python: it helps developers create, configure, and manage AWS services, including S3, making it easy to integrate them with Python applications, libraries, or scripts. This article covers how boto3 works and how it helps interact with S3 operations such as creating, listing, and deleting buckets and objects.

What is boto3

Boto3 is a Python SDK or library that can manage and access various AWS services, such as Amazon S3, EC2, DynamoDB, SQS, CloudWatch, etc., through Python scripts. Boto3 has a data-driven approach for generating classes at runtime from JSON description files shared between SDKs. Because Boto3 is generated from these shared JSON files, users get fast updates to the latest services and a consistent API across services. It provides an object-oriented, easy-to-use API as well as low-level direct service access.

Key Features of boto3

It is built on top of botocore, a Python library used to send API requests to AWS and receive responses from the service.

Supports Python 2.7+ and 3.4+ natively.

Boto3 provides sessions and per-session credentials & configuration, along with essential components like authentication, parameter, and response handling.

Has a consistent and Up-to-date Interface

Working with AWS S3 and Boto3

Using the Boto3 library or SDK with Amazon S3 allows users to create, delete, and update S3 buckets, objects, S3 bucket policies, etc., from Python programs or scripts in a faster way. Boto3 has two abstractions, namely client and resource. Users can choose the client abstraction if they want to work with single S3 files, or the resource abstraction if they want to work with multiple S3 buckets. Clients provide a low-level interface to the AWS services, whereas resources are a higher-level abstraction than clients.
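To see the difference in practice, here is a small sketch comparing the two abstractions; it assumes AWS credentials are already configured (for example via aws configure or environment variables):

import boto3

# Low-level client: methods map one-to-one to S3 API operations
s3_client = boto3.client("s3")
for bucket in s3_client.list_buckets()["Buckets"]:
    print(bucket["Name"])

# Higher-level resource: object-oriented wrappers around the same API
s3_resource = boto3.resource("s3")
for bucket in s3_resource.buckets.all():
    print(bucket.name)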

Installation of boto3 and Building AWS S3 Client

Installing boto3 to your application:

On the Terminal, use the code

pip list

The above code will list the installed packages. If Boto3 is not installed, install it by the following code.

pip install boto3

Build an S3 client to access the service methods:

Create an S3 client that helps access the objects stored in the S3 environment and set credentials, including aws_access_key_id and aws_secret_access_key. It is essential to have credentials such as Access Key and Secret Key to access the S3 bucket and to run the following code.

# Import the necessary packages
import boto3

# Now, build a client
s3 = boto3.client(
    's3',
    aws_access_key_id = 'enter your_aws_access_key_id',
    aws_secret_access_key = 'enter your_aws_secret_access_key',
    region_name = 'enter your_aws_region_name'
)

AWS S3 Operations With boto3

Creating buckets:

To create an S3 bucket, use the create_bucket() method with the Bucket and ACL parameters. ACL represents Access Control List which manages access to S3 buckets and objects. It is important to note that Bucket names should be unique throughout the whole AWS platform.

my_bucket = "enter your s3 bucket name that has to be created" bucket = s3.create_bucket( ACL='private', Bucket= my_bucket )

Listing buckets:

To list all the available buckets, use the list_buckets() method.

bucket_response = s3.list_buckets()

# Output the bucket names
print('Existing buckets are:')
for bucket in bucket_response['Buckets']:
    print(f' {bucket["Name"]}')

Deleting Buckets:

A bucket in S3 can be deleted using the delete_bucket() method. The bucket must be empty, meaning it does not contain any objects to perform the deletion.

my_bucket = "enter your s3 bucket name that has to be deleted" response = s3.delete_bucket(Bucket= my_bucket) print("Bucket has been deleted successfully !!!")

Listing the files from a bucket:

Files or objects from an S3 bucket can be listed using the list_objects method or the list_objects_v2 method.

my_bucket = "enter your s3 bucket name from which objects or files has to be listed out" response = s3.list_objects(Bucket= my_bucket, MaxKeys=10, Preffix="only_files_starting_with_this_string")

The MaxKeys argument represents the maximum number of objects to be listed. The Prefix argument limits the listing to objects whose keys (names) start with a specific prefix.

Another way to list objects:

s3 = boto3.client("s3") my_bucket = " enter your s3 bucket name from which objects or files has to be listed out " response = s3.list_objects_v2(Bucket=my_bucket) files = response.get("Contents") for file in files: print(f"file_name: {file['Key']}, size: {file['Size']}")

Uploading files:

To upload a file to an S3 bucket, use the upload_file() method with the following parameters:

Filename: the path of the file to be uploaded

Key: it represents the unique identifier for an object within a bucket

Bucket: bucket name to which file has to be uploaded

my_bucket = "enter your bucket name to which files has to be uploaded" file_name = "enter your file path name to be uploaded" key_name = "enter unique identifier" s3.upload_file(Filename= file_name, Bucket= my_bucket, Key= key_name)

Downloading files:

To download a file or object locally from a bucket, use the download_file() method with Key, Bucket, and Filename parameters.

my_bucket = "enter your s3 bucket name from which object or files has to be downloaded" file_name = "enter file to be downloaded" key_name = "enter unique identifier" s3.download_file(Filename= file_name, Bucket= my_bucket, Key= key_name)

Deleting files:

To delete a file or object from a bucket, use the delete_object() method with Key and Bucket parameters.

my_bucket = "enter your s3 bucket name from which objects or files has to be deleted" key_name = "enter unique identifier" s3.delete_object(Bucket= my_bucket, Key= key_name)

Get the object’s metadata:

To get the file or object’s details, such as last modification time, storage class, content length, size in bytes, etc., use the head_object() method with Key and Bucket parameters.

my_bucket = "enter your s3 bucket name from which objects or file's metadata has to be obtained" key_name = "enter unique identifier" response = s3.head_object(Bucket= my_bucket, Key= key_name) Conclusion

AWS S3 is one of the most reliable, flexible, and durable object storage systems, allowing users to store and retrieve data. AWS defines boto3 as a Python library or SDK (Software Development Kit) to create, manage, and configure AWS services, including S3. Boto3 lets you operate AWS services programmatically from your applications and services.

Key Takeaways:

AWS S3 is an object storage service that helps store and retrieve files quickly.

Boto3 is a Python SDK or library that can manage Amazon S3, EC2, Dynamo DB, SQS, Cloudwatch, etc.

Boto3 clients provide a low-level interface to the AWS services, whereas resources are a higher-level abstraction than clients.

Using the Boto3 library with Amazon S3 allows users to create, list, delete, and update S3 Buckets, Objects, S3 Bucket Policies, etc., from Python programs or scripts in a faster way.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


Data Analysis Using Python Pandas

In this tutorial, we are going to see data analysis using the Python pandas library. The performance-critical parts of pandas are implemented in C and Cython, so we don't run into speed problems. Pandas is famous for data analysis, and it has two main data storage structures: Series and DataFrame. Let's see them one by one.

1. Series

Series is a 1D array with a customizable index and values. We can create a Series object using the pandas.Series(data, index) class. Series accepts integers, lists, and dictionaries as data. Let's see some examples.

Example

# importing the pandas library
import pandas as pd

# data
data = [1, 2, 3]

# creating Series object
# Series automatically takes the default index
series = pd.Series(data)
print(series)

Output

If you run the above program, you will get the following result.

0    1
1    2
2    3
dtype: int64

How to have a customized index? See the example.

Example

# importing the pandas library
import pandas as pd

# data
data = [1, 2, 3]

# index
index = ['a', 'b', 'c']

# creating Series object
series = pd.Series(data, index)
print(series)

Output

If you run the above program, you will get the following result.

a    1
b    2
c    3
dtype: int64

When we give the data as a dictionary to the Series class, then it takes keys as index and values as actual data. Let’s see one example.

Example

# importing the pandas library
import pandas as pd

# data
data = {'a':97, 'b':98, 'c':99}

# creating Series object
series = pd.Series(data)
print(series)

Output

If you run the above program, you will get the following results.

a    97
b    98
c    99
dtype: int64

We can access the data from the Series using an index. Let’s see the examples.

Example

# importing the pandas library
import pandas as pd

# data
data = {'a':97, 'b':98, 'c':99}

# creating Series object
series = pd.Series(data)

# accessing the data from the Series using indexes
print(series['a'], series['b'], series['c'])

Output

If you run the above code, you will get the following results.

97 98 99

2. DataFrame

We have seen how to use the Series class in pandas. Let's now see how to use the DataFrame class. DataFrame is a data structure class in pandas that contains rows and columns.

We can create DataFrame objects using lists, dictionaries, Series, etc. Let's create a DataFrame using lists.

Example

# importing the pandas library
import pandas as pd

# lists
names = ['Tutorialspoint', 'Mohit', 'Sharma']
ages = [25, 32, 21]

# creating a DataFrame
data_frame = pd.DataFrame({'Name': names, 'Age': ages})

# printing the DataFrame
print(data_frame)

Output

If you run the above program, you will get the following results.

             Name  Age
0  Tutorialspoint   25
1           Mohit   32
2          Sharma   21

Let’s see how to create a data frame object using the Series.

Example

# importing the pandas library
import pandas as pd

# Series
_1 = pd.Series([1, 2, 3])
_2 = pd.Series([1, 4, 9])
_3 = pd.Series([1, 8, 27])

# creating a DataFrame
data_frame = pd.DataFrame({"a":_1, "b":_2, "c":_3})

# printing the DataFrame
print(data_frame)

Output

If you run the above code, you will get the following results.

   a  b   c
0  1  1   1
1  2  4   8
2  3  9  27

We can access the data from the DataFrames using the column name. Let’s see one example.

Example

# importing the pandas library
import pandas as pd

# Series
_1 = pd.Series([1, 2, 3])
_2 = pd.Series([1, 4, 9])
_3 = pd.Series([1, 8, 27])

# creating a DataFrame
data_frame = pd.DataFrame({"a":_1, "b":_2, "c":_3})

# accessing the entire column with name 'a'
print(data_frame['a'])

Output

If you run the above code, you will get the following results.

0    1
1    2
2    3
Name: a, dtype: int64

How To Read Text Files Using Linecache In Python

Solution

The linecache module implements a cache that holds the contents of files, parsed into separate lines, in memory. It returns lines by indexing into a list, which saves time over repeatedly reading the file and parsing lines to find the one desired.

The linecache module is very useful when looking for multiple lines from the same file.

Prepare test data. You can get this text by just using Google and searching for sample text.

Lorem ipsum dolor sit amet, causae apeirian ea his, duo cu congue prodesset. Ut epicuri invenire duo, novum ridens eu has, in natum meliore noluisse sea. Has ei stet explicari. No nam eirmod deterruisset, nusquam electram rationibus ad sea, interesset delicatissimi et sit. Purto molestiae cu eum, in per hinc periculis intellegam.

Id porro facete cum. No est veritus detraxit facilisis, sit ea clita decore essent. Ut eam labores fuisset menandri, ex sit brute viderer eleifend, altera argumentum vel ex. Duo at zril sensibus, eu vim ullum assentior, quando possit at his.

Te nam tempor posidonium scripserit, eam mundi reprimique dissentias ne. Vim te soleat offendit democritum. Nam an diam elaboraret, quaeque dissentias an has. Autem legendos dignissim ad vis, sea ex amet petentium reprehendunt, inermis constituam philosophia ne mel. Esse noster lobortis usu ne.

Nec reque postea urbanitas ut, mea in nulla invidunt ocurreret. Ei duo iuvaret numquam. Ferri nemore audire te est, mel et detracto noluisse. Nec eu habeo justo, id pro posse apeirian volutpat. Mea sonet quaestio ne.

Atqui quaeque alienum te vim. Graeco aliquip liberavisse pro ut. Te similique reformidans usu, te mundi aliquando ius. Meis scripta minimum quo no, meis prima fabellas eu eam, laoreet delicata forensibus ut vim. Et quo vocibus mediocritatem, atqui summo an eam.

Example

import os
import tempfile

text = """
Lorem ipsum dolor sit amet, causae apeirian ea his, duo cu congue prodesset. Ut epicuri invenire duo, novum ridens eu has, in natum meliore noluisse sea. Has ei stet explicari. No nam eirmod deterruisset, nusquam electram rationibus ad sea, interesset delicatissimi et sit. Purto molestiae cu eum, in per hinc periculis intellegam.
Id porro facete cum. No est veritus detraxit facilisis, sit ea clita decore essent. Ut eam labores fuisset menandri, ex sit brute viderer eleifend, altera argumentum vel ex. Duo at zril sensibus, eu vim ullum assentior, quando possit at his.
Te nam tempor posidonium scripserit, eam mundi reprimique dissentias ne. Vim te soleat offendit democritum. Nam an diam elaboraret, quaeque dissentias an has. Autem legendos dignissim ad vis, sea ex amet petentium reprehendunt, inermis constituam philosophia ne mel. Esse noster lobortis usu ne.
Nec reque postea urbanitas ut, mea in nulla invidunt ocurreret. Ei duo iuvaret numquam. Ferri nemore audire te est, mel et detracto noluisse. Nec eu habeo justo, id pro posse apeirian volutpat. Mea sonet quaestio ne.
Atqui quaeque alienum te vim. Graeco aliquip liberavisse pro ut. Te similique reformidans usu, te mundi aliquando ius. Meis scripta minimum quo no, meis prima fabellas eu eam, laoreet delicata forensibus ut vim. Et quo vocibus mediocritatem, atqui summo an eam.
"""

1. Create a function to create a temporary file and delete it after usage.

def make_tempfile():
    """
    Function: Create a temporary file.
              mkstemp() and mkdtemp() to create temporary files and directories.
    args: None
    return: Temp file name.
    """
    fd, temp_file = tempfile.mkstemp()
    os.close(fd)
    with open(temp_file, 'wt') as f:
        f.write(text)
    return temp_file

def cleanup(temp_file):
    os.unlink(temp_file)

3. Read specific lines using linecache. The line numbers of files read by the linecache module start with 1, unlike lists which start indexing the array from 0. This is an important point to remember.

import os
import tempfile
import linecache

text = """
Lorem ipsum dolor sit amet, causae apeirian ea his, duo cu congue prodesset. Ut epicuri invenire duo, novum ridens eu has, in natum meliore noluisse sea. Has ei stet explicari. No nam eirmod deterruisset, nusquam electram rationibus ad sea, interesset delicatissimi et sit. Purto molestiae cu eum, in per hinc periculis intellegam.
Id porro facete cum. No est veritus detraxit facilisis, sit ea clita decore essent. Ut eam labores fuisset menandri, ex sit brute viderer eleifend, altera argumentum vel ex. Duo at zril sensibus, eu vim ullum assentior, quando possit at his.
Te nam tempor posidonium scripserit, eam mundi reprimique dissentias ne. Vim te soleat offendit democritum. Nam an diam elaboraret, quaeque dissentias an has. Autem legendos dignissim ad vis, sea ex amet petentium reprehendunt, inermis constituam philosophia ne mel. Esse noster lobortis usu ne.
Nec reque postea urbanitas ut, mea in nulla invidunt ocurreret. Ei duo iuvaret numquam. Ferri nemore audire te est, mel et detracto noluisse. Nec eu habeo justo, id pro posse apeirian volutpat. Mea sonet quaestio ne.
Atqui quaeque alienum te vim. Graeco aliquip liberavisse pro ut. Te similique reformidans usu, te mundi aliquando ius. Meis scripta minimum quo no, meis prima fabellas eu eam, laoreet delicata forensibus ut vim. Et quo vocibus mediocritatem, atqui summo an eam.
"""

def make_tempfile():
    """
    Function: Create a temporary file.
              mkstemp() and mkdtemp() to create temporary files and directories.
    args: None
    return: Temp file name.
    """
    directory = os.getcwd()
    fd, temp_file = tempfile.mkstemp(dir=directory)
    os.close(fd)
    with open(temp_file, 'wt') as f:
        f.write(text)
    return temp_file

def cleanup(temp_file):
    os.unlink(temp_file)

# Make a file with ipsum data.
filename = make_tempfile()
print(f"Output \n {filename}")

split_line = '\n'

# Pick the lines from source.
print(f"*** Displaying first 5 lines directly from the source \n {text.split(split_line)[4]}")

# pick out the same line from cache
print(f" \n *** Displaying first 5 lines from the cache \n {linecache.getline(filename, 5)}")

# cleanup the tempfile by using unlink
cleanup(filename)

Output

C:\Users\sasan\PycharmProjects\blog\TutorialPoints\Updated_Code\tmpazax_yne

*** Displaying first 5 lines directly from the source
 Id porro facete cum. No est veritus detraxit facilisis, sit ea clita decore essent. Ut eam labores fuisset menandri, ex sit brute viderer eleifend, altera argumentum vel ex. Duo at zril sensibus, eu vim ullum assentior, quando possit at his.

 *** Displaying first 5 lines from the cache
 Id porro facete cum. No est veritus detraxit facilisis, sit ea clita decore essent. Ut eam labores fuisset menandri, ex sit brute viderer eleifend, altera argumentum vel ex. Duo at zril sensibus, eu vim ullum assentior, quando possit at his.

4. Linecache always includes the newline at the end of the line. Therefore, if the line is empty, the return value is just the newline.

See below.

import linecache

# Make a file with ipsum data.
filename = make_tempfile()
print(f"Output \n {filename}")

# Blank lines include the newline.
print(f"\n *** The number of lines in the text is 13.")
print(" \n *** Displaying the lastline from Linecache which should be a new line\n {!r}".format(linecache.getline(filename, 8)))

cleanup(filename)

Output

C:\Users\sasan\PycharmProjects\blog\TutorialPoints\Updated_Code\tmp352zirvn

 *** The number of lines in the text is 13.

 *** Displaying the lastline from Linecache which should be a new line
 '\n'

5. Conclusion – When an application needs random access to files, linecache makes it easy to read lines by their line number. The contents of the file are maintained in a cache, so be careful of memory consumption.
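If memory consumption becomes a concern, the cache can be refreshed or emptied explicitly; a short sketch:

import linecache

linecache.checkcache()   # refresh entries whose underlying files changed on disk
linecache.clearcache()   # drop all cached file contents from memory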
