群馬大学 | 医学部 | サイトトップ | 医学情報処理演習
テキスト表4-1に示す成人男性100万人の母集団についての身長データは,次のコードで入力できる。Xに100万人の身長データが付値される。
HT <- 144:191 # Set heights into variable HT in cm. NUM <- c(2,3,6,17,19,56,125,219,463,915,1609,2649,4550,7214,11005,16081, 22098,29903,39048,48312,57703,66639,73332,78051,79829,77866,73767,66321, 57993,48410,39081,29967,22055,15810,10875,7309,4596,2726,1519,939,462, 224,128,50,31,14,5,4) # Set the number of individuals with each height. X <- rep(HT,NUM) # Make a new variable X as heights for a million people.
次に,ランダムな標本抽出を行うために,まずRNGkind("Mersenne-Twister")によって乱数生成アルゴリズムをメルセンヌツイスター法に指定し(注:乱数については,このページを参照されたい),set.seed(54321)によって乱数を初期化する。このことによって,解答者全員がまったく同じサンプリングを行うことができる。
RNGkind("Mersenne-Twister") # Specify how to make pseudo-random numbers (random number generator; RNG) set.seed(54321) # Initialize the RNG
なお,ここまでのRコードは,c04enter.Rにも示すので,コピー&ペーストできる。
ここからが問題だが,10人,100人,1000人の非復元抽出を2回ずつ行い,それぞれの平均と不偏標準偏差を,母集団の平均と標準偏差とともに求め,それぞれヒストグラムを描画するコードは,次のようになる。空白に適切な文字列または数字を入れよ。また,結果から何がいえるか考察せよ。
meansdhist <- function(Y, ...) { # Definition of function to draw histogram and calculate mean and sd [A](Y, right=FALSE, [B]=c(140, 200), ...) # draw a histogram, where each interval # including lower limit and excluding upper limit, # with ranges between 140 and 200. RES <- c(mean(Y), sd(Y)) # Calculating mean and sd as a numeric vector, named RES [C](RES) <- c("mean","sd") # Naming the two elements of RES return(RES) # Return the vector of mean and sd } S10M <- sample(X,10,rep=FALSE) # Randomly sampling 10 from the population X without replacement S10N <- sample(X,10,rep=FALSE) # Randomly sampling 10 from the population X without replacement S100M <- sample(X,100,rep=FALSE) # Randomly sampling 100 from the population X without replacement S100N <- sample(X,100,rep=FALSE) # Randomly sampling 100 from the population X without replacement S1000M <- sample(X,1000,rep=FALSE) # Randomly sampling 1000 from the population X without replacement S1000N <- sample(X,1000,rep=FALSE) # Randomly sampling 1000 from the population X without replacement layout(1:7) meansdhist(S10M, main="A histogram of randomly sampled 10 values of heights, 1st trial") meansdhist(S10N, main="A histogram of randomly sampled 10 values of heights, 2nd trial") meansdhist(S100M, main="A histogram of randomly sampled 100 values of heights, 1st trial") meansdhist(S100N, main="A histogram of randomly sampled 100 values of heights, 2nd trial") meansdhist(S1000M, main="A histogram of randomly sampled 1000 values of heights, 1st trial") meansdhist(S1000N, main="A histogram of randomly sampled 1000 values of heights, 2nd trial") hist(X, right=FALSE, xlim=c(140, 200), main="The histogram of heights for the population X") print(EX <- mean(X)) # Mean of X, assigned to EX sqrt(sum((X-EX)^[D])/length(X)) # Standard deviation for the population X
(The program listed above is to enter the heights for one million people (the whole population) as a variable X, to initialize random number generator, and to do several kinds of random sampling with viewing the nature of those samples by calculating means and sds and by drawing histograms. Please fill the boxes from [A] to [D] and discuss the results.)