Andrew Ng's Machine Learning
Part 1
Problem 1: WarmUp
Let's write a function that returns an identity matrix.
There is really not much to say here.
function A = warmUpExercise()
%WARMUPEXERCISE Example function in octave
% A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix
A = [];
% ============= YOUR CODE HERE ==============
% Instructions: Return the 5x5 identity matrix
% In octave, we return values by defining which variables
% represent the return values (at the top of the file)
% and then set them accordingly.
A = eye(5);
% ===========================================
end
Problems 2 & 5: Computing the Cost Function
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
% J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
% parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
% =================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
% You should set J to the cost.
f = X * theta;               % predictions h_theta(x) for all m examples
f = f - y;                   % error vector
J = 1/(2*m) * (f' * f);      % sum of squared errors via the inner product
% ============================================================
end
Recall the formula:

$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 $$

Since $h_\theta(x) = \theta^T x$, i.e. $h = X\theta$ in vectorized form, we can use this definition to compute the predicted values and then subtract y.
Note that the hypothesis here is homogeneous (the intercept is absorbed into the all-ones first column of X); we will not discuss the inhomogeneous case for now.
How do we sum? Use the inner-product property of vectors: with the error vector f = Xθ − y, the product f'*f is exactly the sum of squared differences over all training examples.
What if the hypothesis is inhomogeneous? Then simply subtract the offset vector as well.
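As a quick sanity check, here is a hypothetical call with made-up toy data (not part of the assignment):

X = [1 1; 1 2; 1 3];           % m = 3 examples, first column is the intercept term
y = [1; 2; 3];
theta = [0; 1];                % fits this toy data exactly
J = computeCost(X, y, theta)   % prints J = 0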
Problems 3 & 6: Gradient Descent
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
% theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
% taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ================ YOUR CODE HERE ======================
% Instructions: Perform a single gradient step on the parameter vector
% theta.
%
% Hint: While debugging, it can be useful to print out the values
% of the cost function (computeCostMulti) and gradient here.
%
hyp = X*theta;                   % predictions for all m examples at once
dJ = X'*(hyp-y);                 % matrix multiplication sums the error terms over every example for each parameter
theta = theta - (alpha/m)*dJ;    % simultaneous update of all parameters
% ============================================================
% Save the cost J in every iteration
J_history(iter) = computeCost(X, y, theta);
end
end
Ng's starter code keeps a running history: J_history stores the value of J at every iteration. In practice you should always record how J changes so you can tell when gradient descent has converged and should stop.
At each step we first compute the predictions hyp = X*theta, form the error vector hyp − y, and then multiply by X' so that every feature of every example contributes to the gradient of its parameter.
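A minimal usage sketch (alpha and num_iters are placeholder values you would tune yourself):

alpha = 0.01;
num_iters = 400;
theta = zeros(size(X, 2), 1);
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);
plot(1:num_iters, J_history);    % J should decrease steadily if alpha is small enough
xlabel('Iteration'); ylabel('Cost J');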
Problem 4: Feature Normalization
mu = mean(X);      % row vector of per-feature means
sigma = std(X);    % row vector of per-feature standard deviations
X_norm = (X - ones(size(X,1),1)*mu) ./ (ones(size(X,1),1)*sigma);   % replicate mu and sigma across all m rows, then normalize element-wise
mean: computes the mean of each column (feature) of X; mean(X, dim) selects the dimension.
std: computes the standard deviation (not the range); std(X, 0, dim) selects the dimension.
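One point worth remembering (a sketch with a made-up example, assuming theta was trained on the normalized data): the same mu and sigma computed on the training set must be reused when normalizing any later input, e.g. before a prediction:

x = [1650 3];                  % hypothetical new example: house size and number of bedrooms
x_norm = (x - mu) ./ sigma;    % reuse the training-set statistics
price = [1, x_norm] * theta;   % prepend the intercept term, then predict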
Problem 7: Normal Equation
function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression
% NORMALEQN(X,y) computes the closed-form solution to linear
% regression using the normal equations.
theta = zeros(size(X, 2), 1);
% ==================== YOUR CODE HERE ======================
% Instructions: Complete the code to compute the closed form solution
% to linear regression and put the result in theta.
%
theta = pinv(X' * X) * X' * y;
% ============================================================
end
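For reference, the closed-form solution the code implements is the normal equation:

$$ \theta = (X^T X)^{-1} X^T y $$

pinv is used instead of inv so the computation still works when $X^T X$ is singular (e.g. when features are redundant).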
Part 2
Note: X is an m × (n+1) matrix whose first column is all ones! theta is an (n+1) × 1 vector that stores the parameters we want to learn.
Problem 1: The sigmoid Function
Given a matrix as input, it returns the result of applying $g(z) = \frac{1}{1 + e^{-z}}$ to every element.
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
% g = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly
g = zeros(size(z));

% ==================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
% vector or scalar).
g = 1 ./ (ones(size(z)) + exp(-z));   % element-wise division handles any shape
% ============================================================
end
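A quick element-wise check (illustrative values only):

sigmoid(0)                % ans = 0.5
sigmoid([-100, 0, 100])   % ans is approximately [0, 0.5, 1]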
Problem 2: Cost and Gradient without Regularization
Returns the cost J and the gradient grad.
function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
% J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
% parameter for logistic regression and the gradient of the cost
% w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ==================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%
h = sigmoid(X * theta);
J = (-log(h.') * y - log(ones(1, m) - h.') * (ones(m, 1) - y)) / m;
grad = (X.' * (h - y)) / m;
% ============================================================
end
First compute the predictions using the sigmoid function from Problem 1 (note: X*theta is the same linear expression from the linear regression in the first assignment). Note that h.' means the transpose of h; writing it this way lets us evaluate the cost function J with pure matrix products.
grad is the amount subtracted at each descent step; note that it does not include the learning rate (the optimizer applies that itself).
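For reference, the formulas being vectorized are:

$$ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] $$

$$ \frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} $$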
Problem 3: Thresholding to 0 or 1
Here we apply a threshold comparison to every element of the matrix at once.
Like exp and log, comparison operators such as >= act element-wise on a matrix, so no explicit if or loop is needed.
function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
% p = PREDICT(theta, X) computes the predictions for X using a
% threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

m = size(X, 1); % Number of training examples

% You need to return the following variables correctly
p = zeros(m, 1);

% ===================== YOUR CODE HERE =====================
% Instructions: Complete the following code to make predictions using
% your learned logistic regression parameters.
% You should set p to a vector of 0's and 1's
%
h = sigmoid(X * theta);
p = (h >= 0.5);   % element-wise comparison yields a logical 0/1 vector
% ============================================================
end
Problem 4: Cost Function with Regularization
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
% J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ==================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
h = sigmoid(X * theta);
J = (-log(h.') * y - log(ones(1, m) - h.') * (ones(m, 1) - y)) / m + (lambda / (2 * m)) * sum(theta(2:end).^2);
grad(1) = (X(:, 1).' * (h - y)) / m;
grad(2:end) = (X(:, 2:end).' * (h - y)) / m + (lambda / m) * theta(2:end);
% ============================================================
end
The key point here is the distinction between grad(1) and grad(2:end): the intercept term theta(1) is not regularized, while all other parameters are. Everything else matches the unregularized formula exactly.
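The regularized formulas for reference (note the regularization sums start at j = 1, skipping the intercept):

$$ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2 $$

$$ \frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \quad (j \geq 1) $$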
Part 3
Multi-class Classification
X is an m × n matrix storing the m training examples, one per row (the column of ones is appended inside the functions below).
Problem 1: Regularized Logistic Regression
function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
% J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ==================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
% efficiently vectorized. For example, consider the computation
%
% sigmoid(X * theta)
%
% Each row of the resulting matrix will contain the value of the
% prediction for that example. You can make use of this to vectorize
% the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
% there're many possible vectorized solutions, but one solution
% looks like:
% grad = (unregularized gradient for logistic regression)
% temp = theta;
% temp(1) = 0; % because we don't add anything for j = 0
% grad = grad + YOUR_CODE_HERE (using the temp variable)
%
h = sigmoid(X * theta);
J = (-log(h.') * y - log(ones(1, m) - h.') * (ones(m, 1) - y)) / m + (lambda / (2 * m)) * sum(theta(2:end).^2);
grad(1) = (X(:, 1).' * (h - y)) / m;
grad(2:end) = (X(:, 2:end).' * (h - y)) / m + (lambda / m) * theta(2:end);
% ============================================================

grad = grad(:);
end
Problem 1 is the regularized cost function again, and the computation is essentially identical to the previous part. One word on the data format: X is the input data set, each row vector is one example, and h comes out as a vector of values between 0 and 1, one per example.
Problem 2: One-vs-All Classification
function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
% [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
% logistic regression classifiers and returns each of these classifiers
% in a matrix all_theta, where the i-th row of all_theta corresponds
% to the classifier for label i

% Some useful variables
m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ==================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
% logistic regression classifiers with regularization
% parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
% whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
% function. It is okay to use a for-loop (for c = 1:num_labels) to
% loop over the different classes.
%
% fmincg works similarly to fminunc, but is more efficient when we
% are dealing with large number of parameters.
%
% Example Code for fmincg:
%
% % Set Initial theta
% initial_theta = zeros(n + 1, 1);
%
% % Set options for fminunc
% options = optimset('GradObj', 'on', 'MaxIter', 50);
%
% % Run fmincg to obtain the optimal theta
% % This function will return theta and the cost
% [theta] = ...
% fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
% initial_theta, options);
%
options = optimset('GradObj', 'on', 'MaxIter', 50);
initial_theta = zeros(size(X, 2), 1);
for c = 1:num_labels
    [all_theta(c, :)] = fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), initial_theta, options);
end
% ============================================================
end
options = optimset('param1', value1, 'param2', value2, ...)  % sets the listed parameters; anything not set keeps its default value

Parameter | Value | Description
--- | --- | ---
Display | 'off' / 'iter' / 'final' / 'notify' | 'off' shows no output; 'iter' shows the result of every iteration; 'final' shows only the final result; 'notify' shows output only when the function fails to converge.
MaxFunEvals | positive integer | Maximum number of function evaluations.
MaxIter | positive integer | Maximum number of iterations.
TolFun | positive scalar | Termination tolerance on the function value.
TolX | positive scalar | Termination tolerance on x.
fmincg returns the optimized theta together with its cost value; how it is implemented internally is beyond our scope. For each class c it produces the optimal parameter vector, and each pass of the loop assigns that vector to the corresponding row of all_theta, which is what the function returns.
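A minimal usage sketch (assuming X and y have been loaded from the exercise's data file; num_labels and lambda here are the values used in the course's digit-recognition exercise, but treat them as placeholders):

num_labels = 10;    % digits, with "0" mapped to label 10 in the exercise
lambda = 0.1;
all_theta = oneVsAll(X, y, num_labels, lambda);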
Problem 3: Prediction (Discretization)
Here all_theta is a num_labels × (n+1) matrix.
function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels
%are in the range 1..K, where K = size(all_theta, 1).
% p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
% for each example in the matrix X. Note that X contains the examples in
% rows. all_theta is a matrix where the i-th row is a trained logistic
% regression theta vector for the i-th class. You should set p to a vector
% of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
% for 4 examples)

m = size(X, 1);
num_labels = size(all_theta, 1);

% You need to return the following variables correctly
p = zeros(size(X, 1), 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ==================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned logistic regression parameters (one-vs-all).
% You should set p to a vector of predictions (from 1 to
% num_labels).
%
% Hint: This code can be done all vectorized using the max function.
% In particular, the max function can also return the index of the
% max element, for more information see 'help max'. If your examples
% are in rows, then, you can use max(A, [], 2) to obtain the max
% for each row.
%
[~, p] = max(X * all_theta.', [], 2);
% ============================================================
end
C = max(A,[],dim)
returns the maxima of A along dimension dim. For example, C = max(A, [], 2) takes the maximum of each row (it reduces along dimension 2, the columns), while max(A, [], 1) takes the maximum of each column.
max has two outputs here, and the caller only needs the second one, which it assigns to p; the unneeded first output is written as ~.
The first output is the maximum values themselves, which we skip because we do not need them; the second output is the index at which each row's maximum occurs, and that index is exactly the predicted class label.
max therefore finds the most likely class: just take the maximum of each row.
X * all_theta.' is an m × num_labels matrix whose entry (i, c) scores how likely example i is to belong to class c (sigmoid is monotonic, so taking the max of these raw scores gives the same answer as taking the max of the probabilities).
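A tiny illustration of the two-output form of max (the values are arbitrary):

A = [0.1 0.9 0.3;
     0.7 0.2 0.6];
[vals, idx] = max(A, [], 2);   % vals = [0.9; 0.7], idx = [2; 1]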
Part 2: Neural Networks
Problem 4: Neural Network Feed-Forward (backpropagation not required)
function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
% p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
% trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

% You need to return the following variables correctly
p = zeros(size(X, 1), 1);

% ==================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned neural network. You should set p to a
% vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
% function can also return the index of the max element, for more
% information see 'help max'. If your examples are in rows, then, you
% can use max(A, [], 2) to obtain the max for each row.
%
X = [ones(m, 1), X];                 % Add the bias column to the input layer
X1 = sigmoid(X * Theta1.');          % Hidden-layer activations
X1 = [ones(size(X1, 1), 1), X1];     % Add the bias column to the hidden layer
[~, p] = max(X1 * Theta2.', [], 2);  % Pick the output unit with the largest activation
% ============================================================
end
Pure computation: the Theta matrices are already trained, so we just evaluate the network layer by layer. Note that each layer's activations need a bias unit added before being multiplied by the next weight matrix, i.e. prepend a column of all ones, ones(m, 1).
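For reference, the feed-forward computation being vectorized is (with g the sigmoid function):

$$ a^{(1)} = \begin{bmatrix} 1 \\ x \end{bmatrix}, \qquad a^{(2)} = \begin{bmatrix} 1 \\ g(\Theta^{(1)} a^{(1)}) \end{bmatrix}, \qquad h_\Theta(x) = g(\Theta^{(2)} a^{(2)}) $$

and the predicted label is the index of the largest output unit.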