This paper scrutinizes the process of testing market efficiency, the data generation process, and the feasibility of market prediction using a detailed, coherent, statistical approach. It also attempts to extract knowledge from S&P 500 market data, with an emphasis on feature engineering. To that end, different data representations are produced through different procedures, and their performance in knowledge extraction is discussed. Among neural networks, Long Short-Term Memory (LSTM) has not been adequately studied. By design, LSTM accounts for both long-term and short-term memory in its computations. This paper therefore examines LSTM further for return prediction and tests different preprocessing methods to improve its accuracy. The study is conducted on market data from September 2000 to February 2021. To extend the amount of knowledge extracted from financial time series and to select the best input features, Principal Component Analysis (PCA), Random Forest, wavelets, and the LSTM's own deep feature extraction procedure are employed, and four models are built. To validate the performance of the models, MAE, MSE, MAPE, CSP, and CDCP are calculated. Results from the Diebold-Mariano test imply that, although the LSTM neural network has gained considerable attention recently, it does not perform significantly better than the benchmark method in S&P 500 index return prediction. However, results from the Wilcoxon signed-rank test show a significant improvement in the predictions produced by the combination of PCA and LSTM.
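As a minimal sketch of three of the error measures named above (MAE, MSE, and MAPE), the following uses their standard textbook definitions in plain Python. The function names and the sample return values are hypothetical and are not taken from the paper; CSP and CDCP are omitted because the abstract does not define them.

```python
def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error: average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, which can occur with
    returns, so that case is rejected explicitly.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = (abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * sum(ratios) / len(actual)


# Hypothetical daily returns and forecasts, for illustration only.
actual_returns = [0.02, -0.01, 0.04]
predicted_returns = [0.01, -0.02, 0.02]

print("MAE :", mae(actual_returns, predicted_returns))
print("MSE :", mse(actual_returns, predicted_returns))
print("MAPE:", mape(actual_returns, predicted_returns))
```

Because returns are often close to zero, MAPE can become very large or undefined, which is one reason to report it alongside MAE and MSE rather than alone.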