This is the sixth post in my series of machine learning best practices. If you've come across the series for the first time, you can go back to the beginning or read the whole series. Aristotle was likely one of the first data scientists, studying empiricism by learning through observation.
What is data mining? Data mining, the analysis stage of "Knowledge Discovery in Databases" (KDD), is a field of statistics and computer science concerned with discovering patterns in large datasets. It uses methods from artificial intelligence, machine learning, statistics, and database systems.
data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction, which naturally expresses asynchronous, dynamic, graph-parallel computation while ensuring data consistency and achieving a high degree of parallel performance in the
Underfitting: What does underfitting mean? Underfitting, the counterpart of overfitting, happens when a machine learning model is not complex enough to accurately capture the relationships between a dataset's features and a target variable. An underfitted model produces problematic or erroneous outcomes on new data, that is, data it wasn't trained on, and often performs poorly
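To make the idea concrete, here is a minimal sketch, assuming a toy dataset that follows y = 2x + 1 (the data, the function names, and the choice of models are illustrative and not from the original article). A constant predictor is too simple to capture the linear relationship, so its error is high on both the training points and the held-out points, while a straight-line fit captures the relationship.

```python
def fit_constant(xs, ys):
    """Underfit model: always predicts the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a model on a set of points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: the target depends linearly on the feature.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [2 * x + 1 for x in train_x]
test_x = [6, 7, 8]
test_y = [2 * x + 1 for x in test_x]

constant_model = fit_constant(train_x, train_y)
line_model = fit_line(train_x, train_y)

print("constant: train MSE", mse(constant_model, train_x, train_y),
      "test MSE", mse(constant_model, test_x, test_y))
print("line:     train MSE", mse(line_model, train_x, train_y),
      "test MSE", mse(line_model, test_x, test_y))
```

The telltale sign in this sketch is that the constant model is poor on the training points as well as on the held-out points, which is what separates underfitting from overfitting.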
data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction, which naturally expresses asynchronous, dynamic, graph-parallel computation while ensuring data consistency and achieving a high degree of parallel performance in the shared-memory
In one of my previous posts I talked about the biases that are to be expected in machine learning and can actually help build a better model. Here is the follow-up post, covering some of the biases to be avoided. 1. Sample Bias: We all have to consider sampling bias in our training data as a result of human input
This is due to the IDF part, which gives more weight to words that are distinctive. In other words, 'day' is an important word for Document1 in the context of the entire corpus. The Python scikit-learn library provides efficient tools for text data mining, including functions to calculate the TF-IDF of a text vocabulary given a text corpus.
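The effect of the IDF term can be sketched in plain Python. This is a minimal illustration, assuming the textbook weighting tf * log(N / df) and a made-up two-document corpus (the excerpt does not show the original documents). scikit-learn's TfidfVectorizer uses a smoothed, normalized variant by default, so its numbers would differ from these.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: score} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical corpus, chosen only to illustrate the weighting.
corpus = [
    "it is going to rain today",
    "today is a beautiful day",
]
scores = tf_idf(corpus)

# 'today' occurs in every document, so its IDF (and score) is zero;
# 'day' occurs in only one document, so it gets a positive weight there.
print(scores[1]["today"], scores[1]["day"])
```

A term that appears in every document carries no discriminating information under this weighting, which is why the distinctive word ends up with the higher score.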
Data mining has become an imperative tool in any business process. Technology today can store large volumes of data, unlike a few decades ago, when many considered storing data a wasteful expenditure. The situation has changed thanks to the data mining tools now available on the market, many of which can mine large volumes of data.
The topics of data mining, big data, and machine learning are becoming more and more popular, but there aren't many resources about them available online. Even though a few videos on YouTube talk about data science and machine learning, they are segments with scattered content rather than a set of videos systematically introducing the entire picture of the data science field.
Data mining is the process of finding anomalies, patterns, and correlations within large data sets to predict outcomes. Using a broad range of techniques, you can use this information to increase revenues, cut costs, improve customer relationships, reduce risks, and more.
Among current and emerging applications in the medical record data mining industry, our research finds that machine learning applications show a trend. While the general objectives of these platforms are mostly similar (to gain useful insights from medical data in order to improve patient outcomes), there are slight differences worth highlighting.
Data Mining and Its Relevance to Business: Data mining uses well-established statistical and machine learning techniques to predict customer behaviour. Technology today can store large volumes of data, unlike a few decades ago, when many considered storing data a wasteful expenditure.
Data Mining Tutorial: Process, Techniques, Tools, Examples. Details and potentially useful patterns in huge data sets. Data mining is all about discovering unsuspected or previously unknown relationships in the data. It is a multi-disciplinary skill that uses machine learning, statistics
OneR always outperforms (or at worst equals) the baseline when evaluated on the training data, although evaluating on the training data doesn't reflect performance on independent test data. On independent test data, ZeroR sometimes outperforms OneR: if the target distribution is skewed or limited data is available, predicting the majority class can yield better results than basing a rule on a single attribute.
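The two classifiers can be sketched as follows. This is a minimal illustration, assuming the baseline mentioned above is ZeroR and using a made-up categorical dataset; the function names `zero_r` and `one_r` and the sample rows are hypothetical, not taken from any particular toolkit.

```python
from collections import Counter, defaultdict


def zero_r(labels):
    """ZeroR: ignore all attributes and predict the majority class."""
    return Counter(labels).most_common(1)[0][0]


def one_r(rows, labels):
    """OneR: pick the single attribute whose value -> majority-class rule
    makes the fewest errors on the training data.

    Returns (attribute, rule, training_errors).
    """
    best = None
    for attr in rows[0]:
        # Count class labels for each value of this attribute.
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[attr]][label] += 1
        rule = {value: counts.most_common(1)[0][0]
                for value, counts in by_value.items()}
        errors = sum(rule[row[attr]] != label
                     for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical training data for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay", "play", "play"]

majority = zero_r(labels)
zero_r_errors = sum(label != majority for label in labels)
attribute, rule, one_r_errors = one_r(rows, labels)

print("ZeroR predicts", majority, "with", zero_r_errors, "training errors")
print("OneR uses", attribute, "with", one_r_errors, "training errors")
```

Because each attribute value is mapped to its own majority class, a OneR rule can never make more training errors than predicting the overall majority class everywhere, which is why the comparison only becomes interesting on held-out data.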
Which one is better: a boat, a car, or an airplane? There's no possible answer without context. An airplane is faster, but you need a way to take off and land. A car is standard for land, and a boat is great for water on the surface if you want s
The 7 Steps of Machine Learning: Now it's time for the next step of machine learning: data preparation, where we load our data into a suitable place and prepare it for use in our machine learning training. We'll first put all our data together and then randomize the ordering.
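The combine-then-randomize step might look like the sketch below. The helper name `combine_and_shuffle`, the fixed seed, and the sample records are assumptions made for illustration; the excerpt does not say how the original data was stored.

```python
import random


def combine_and_shuffle(*sources, seed=0):
    """Concatenate several lists of examples and shuffle the result.

    A seeded generator keeps the ordering reproducible between runs.
    """
    combined = [example for source in sources for example in source]
    rng = random.Random(seed)
    rng.shuffle(combined)
    return combined


# Hypothetical (feature, label) records collected from two sources.
source_a = [(5.0, "a"), (5.5, "a"), (6.0, "a")]
source_b = [(40.0, "b"), (42.0, "b"), (45.0, "b")]

dataset = combine_and_shuffle(source_a, source_b)
print(dataset)
```

Shuffling matters because examples gathered source by source tend to arrive grouped by class, and the order in which a model sees them should not be part of what it learns.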
Feature Selection and Data Visualization, 2 years ago, in Breast Cancer Wisconsin (Diagnostic) Data Set, 1,112 votes
What Causes Heart Disease? Explaining the Model, a year ago, in Heart Disease UCI, 600 votes
APTOS 2019: DenseNet Keras Starter, 9 months ago, with multiple data sources, 327 votes
Basic Machine Learning with Cancer, 3 years ago, in Breast Cancer Wisconsin (Diagnostic) Data